MINOR: [R] Simplify compare_dplyr_binding test helper #14676

nealrichardson · 2022-11-18T18:41:55Z

A long time ago, dplyr expressions on Tables and RecordBatches were evaluated by calling compute functions on (Chunked)Arrays, calling Slice or Filter methods on the Tables/RBs, etc. So to make sure that all C++ bindings were exposed correctly, we needed to test that operations worked on both Tables and RecordBatches.

Today, everything goes through ExecPlans, and RecordBatches get wrapped in Tables in creating TableSourceNodes: https://github.com/apache/arrow/blob/master/r/R/query-engine.R#L63. So as long as we are able to create a Table from a RecordBatch (tested elsewhere), the query evaluation is identical. This means we don't need to test every dplyr query twice.

On my machine, this cuts off a little more than 1/3 of the running time of the dplyr tests, or about 20 seconds. The bigger benefit IMO is that when there is a failure in one of these expectations, you'll only get it once instead of twice, so it will be less confusing to see what's up.

thisisnic

Thanks for this PR, always good to simplify the codebase and speed things up! :D

ursabot · 2022-11-22T21:52:00Z

Benchmark runs are scheduled for baseline = e9222ae and contender = 31dca2b. 31dca2b is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0% ⚠️ Contender and baseline run contexts do not match] ec2-t3-xlarge-us-east-2
[Failed ⬇️0.4% ⬆️0.23%] test-mac-arm
[Failed ⬇️0.0% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.42% ⬆️0.0%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] 31dca2b2 ec2-t3-xlarge-us-east-2
[Failed] 31dca2b2 test-mac-arm
[Finished] 31dca2b2 ursa-i9-9960x
[Finished] 31dca2b2 ursa-thinkcentre-m75q
[Finished] e9222ae0 ec2-t3-xlarge-us-east-2
[Failed] e9222ae0 test-mac-arm
[Failed] e9222ae0 ursa-i9-9960x
[Finished] e9222ae0 ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

MINOR: [R] Simplify compare_dplyr_binding test helper

22c4843

nealrichardson requested a review from thisisnic November 18, 2022 18:41

github-actions bot added the Component: R label Nov 18, 2022

thisisnic approved these changes Nov 21, 2022

View reviewed changes

thisisnic merged commit 31dca2b into apache:master Nov 21, 2022

nealrichardson deleted the simplify-compare-dplyr branch November 21, 2022 20:25

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

MINOR: [R] Simplify compare_dplyr_binding test helper #14676

MINOR: [R] Simplify compare_dplyr_binding test helper #14676

nealrichardson commented Nov 18, 2022

thisisnic left a comment

ursabot commented Nov 22, 2022

MINOR: [R] Simplify compare_dplyr_binding test helper #14676

MINOR: [R] Simplify compare_dplyr_binding test helper #14676

Conversation

nealrichardson commented Nov 18, 2022

thisisnic left a comment

Choose a reason for hiding this comment

ursabot commented Nov 22, 2022