MINOR: [R] Simplify compare_dplyr_binding test helper #14676
Merged
A long time ago, dplyr expressions on Tables and RecordBatches were evaluated by calling compute functions on (Chunked)Arrays, calling Slice or Filter methods on the Tables/RBs, etc. So to make sure that all C++ bindings were exposed correctly, we needed to test that operations worked on both Tables and RecordBatches.
Today, everything goes through ExecPlans, and RecordBatches get wrapped in Tables when TableSourceNodes are created: https://github.com/apache/arrow/blob/master/r/R/query-engine.R#L63. So as long as we are able to create a Table from a RecordBatch (tested elsewhere), the query evaluation is identical for both input types. This means we don't need to test every dplyr query twice.
On my machine, this cuts off a little more than 1/3 of the running time of the dplyr tests, or about 20 seconds. The bigger benefit IMO is that when there is a failure in one of these expectations, you'll only get it once instead of twice, so it will be less confusing to see what's up.
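
For illustration, here is a minimal sketch of the equivalence described above. It is not the code from this PR: the data frame, the `query()` helper, and the variable names are hypothetical, and it assumes the `arrow` and `dplyr` packages are installed. It runs the same dplyr pipeline on a Table and on a RecordBatch and checks that the collected results agree.

```r
library(arrow)
library(dplyr)

# Hypothetical example data, not taken from the arrow test suite.
df <- data.frame(x = c(1, 2, 3, 4))

# The same data as a RecordBatch and as a Table.
batch <- record_batch(df)
tab <- Table$create(df)

# A RecordBatch can be wrapped in a Table directly.
tab_from_batch <- Table$create(batch)

# Hypothetical helper: one dplyr pipeline, applied to whatever input it gets.
query <- function(.input) {
  .input %>%
    filter(x > 1) %>%
    mutate(y = x * 2) %>%
    collect()
}

from_table <- query(tab)
from_batch <- query(batch)
from_wrapped <- query(tab_from_batch)

# All three inputs should collect to the same result.
stopifnot(isTRUE(all.equal(from_table, from_batch)))
stopifnot(isTRUE(all.equal(from_table, from_wrapped)))
```

If the two inputs always agree like this, checking the Table case in the test helper should be enough, provided Table creation from a RecordBatch is covered elsewhere.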