[R] Create a field ref to a field in a struct #18818

Closed
asfimport opened this issue Sep 2, 2021 · 2 comments · Fixed by #19706

Comments

asfimport commented Sep 2, 2021

See also ARROW-11259. This probably needs to be a `$` and `[[` method on Expression.

Reporter: Neal Richardson / @nealrichardson

Note: This issue was originally created as ARROW-13858. Please see the migration documentation for further details.

Neal Richardson / @nealrichardson:
It turns out this doesn't work in C++. Created ARROW-13987 for that.

Dewey Dunnington / @paleolimbot:
Just doing some exploring:

Where the field reference is created in /R: https://github.com/apache/arrow/blob/master/r/R/expression.R#L170-L173

Where the field reference is created in /src: https://github.com/apache/arrow/blob/master/r/src/expression.cpp#L72-L74

Example of creating a nested field reference from a test (but it appears it wasn't implemented?) https://github.com/lidavidm/arrow/blob/master/cpp/src/arrow/compute/exec/expression_test.cc#L506-L509

You can do this kind of thing in dplyr using $:

```r
library(arrow, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)

RecordBatch$create(df_col = tibble(a = 1)) %>% 
  mutate(df_col_a = df_col$a) %>% 
  collect()
#> # A tibble: 1 × 1
#>   df_col$a
#>      <dbl>
#> 1        1

tibble(df_col = tibble(a = 1)) %>% 
  mutate(df_col_a = df_col$a)
#> # A tibble: 1 × 2
#>   df_col$a df_col_a
#>      <dbl>    <dbl>
#> 1        1        1
```

nealrichardson added a commit that referenced this issue Jan 18, 2023
This PR implements `$.Expression` and `[[.Expression` methods, such that if the Expression is a FieldRef, it returns a nested FieldRef. This required revising some assumptions in a few places, particularly that if an Expression is a FieldRef, it has a `name`, and that all FieldRefs correspond to a Field in a Schema. In the case where the Expression is not a FieldRef, it will create an Expression call to `struct_field` to extract the field, iff the Expression has a knowable `type`, the type is `StructType`, and the field name exists in the struct. 

Things not done because they weren't needed to get this working:

  * `Expression$field_ref()` taking a vector to construct a nested ref
  * Method to return vector of nested components of a field ref in R

Next steps for future PRs:

* Wrap this in [tidyr::unpack()](https://tidyr.tidyverse.org/reference/pack.html) method (but unfortunately, unpack() is not a generic)
* #33756
* #33757
* #33760

* Closes: #18818

Authored-by: Neal Richardson <neal.p.richardson@gmail.com>
Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
@nealrichardson nealrichardson added this to the 12.0.0 milestone Jan 18, 2023
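
The dispatch rule described in the commit message above can be illustrated with a small base-R toy model. This is only a sketch under stated assumptions, not the arrow implementation: the `toy_expr` class and the `field_ref()` and `call_expr()` constructors are hypothetical names invented for this example, and a struct type is modelled as a named list of child types (`NULL` meaning the type is not knowable). It shows the two branches: a field ref yields a nested field ref, and any other expression yields a `struct_field` call only when the type is known, is a struct, and contains the requested field.

```r
# Toy model only: "toy_expr", field_ref() and call_expr() are hypothetical
# names for illustration and are not part of the arrow package.
field_ref <- function(path) {
  structure(list(kind = "field_ref", path = path), class = "toy_expr")
}

# `type` is a named list of child types for a struct, a string for any
# other type, or NULL when the type cannot be determined.
call_expr <- function(fun, args, type = NULL) {
  structure(list(kind = "call", fun = fun, args = args, type = type),
            class = "toy_expr")
}

`$.toy_expr` <- function(x, name) {
  expr <- x
  x <- unclass(x)  # plain list, so `$` below does not re-dispatch here

  # Branch 1: a field ref becomes a nested field ref.
  if (identical(x$kind, "field_ref")) {
    return(field_ref(c(x$path, name)))
  }

  # Branch 2: any other expression needs a known struct type that
  # contains the requested field.
  if (is.null(x$type)) {
    stop("cannot extract a field: the expression type is not knowable")
  }
  if (!is.list(x$type)) {
    stop("cannot extract a field: the expression is not a struct")
  }
  if (!name %in% names(x$type)) {
    stop("field '", name, "' does not exist in the struct")
  }
  call_expr("struct_field", list(expr, name), type = x$type[[name]])
}

# `[[` follows the same rule as `$`.
`[[.toy_expr` <- function(x, i) `$.toy_expr`(x, i)

nested <- field_ref("df_col")$a
made <- call_expr("make_struct", list(), type = list(a = "double"))
extracted <- made[["a"]]
```

Here `nested` carries the path `c("df_col", "a")`, while `extracted` is a call to `struct_field` whose modelled type is the child type `"double"`. Asking for a field that is not in the struct, or on an expression whose type is `NULL`, raises an error instead of building a call.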