Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MINOR: [R] Simplify compare_dplyr_binding test helper #14676

Merged
merged 1 commit into from
Nov 21, 2022

Conversation

nealrichardson
Copy link
Member

A long time ago, dplyr expressions on Tables and RecordBatches were evaluated by calling compute functions on (Chunked)Arrays, calling Slice or Filter methods on the Tables/RBs, etc. So to make sure that all C++ bindings were exposed correctly, we needed to test that operations worked on both Tables and RecordBatches.

Today, everything goes through ExecPlans, and RecordBatches get wrapped in Tables in creating TableSourceNodes: https://github.com/apache/arrow/blob/master/r/R/query-engine.R#L63. So as long as we are able to create a Table from a RecordBatch (tested elsewhere), the query evaluation is identical. This means we don't need to test every dplyr query twice.

On my machine, this cuts off a little more than 1/3 of the running time of the dplyr tests, or about 20 seconds. The bigger benefit IMO is that when there is a failure in one of these expectations, you'll only get it once instead of twice, so it will be less confusing to see what's up.

Copy link
Member

@thisisnic thisisnic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this PR, always good to simplify the codebase and speed things up! :D

@thisisnic thisisnic merged commit 31dca2b into apache:master Nov 21, 2022
@nealrichardson nealrichardson deleted the simplify-compare-dplyr branch November 21, 2022 20:25
@ursabot
Copy link

ursabot commented Nov 22, 2022

Benchmark runs are scheduled for baseline = e9222ae and contender = 31dca2b. 31dca2b is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0% ⚠️ Contender and baseline run contexts do not match] ec2-t3-xlarge-us-east-2
[Failed ⬇️0.4% ⬆️0.23%] test-mac-arm
[Failed ⬇️0.0% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.42% ⬆️0.0%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] 31dca2b2 ec2-t3-xlarge-us-east-2
[Failed] 31dca2b2 test-mac-arm
[Finished] 31dca2b2 ursa-i9-9960x
[Finished] 31dca2b2 ursa-thinkcentre-m75q
[Finished] e9222ae0 ec2-t3-xlarge-us-east-2
[Failed] e9222ae0 test-mac-arm
[Failed] e9222ae0 ursa-i9-9960x
[Finished] e9222ae0 ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants