Add "example" capability to pyspark based data schemas #1853

Open
pascalwhoop opened this issue Nov 13, 2024 · 0 comments
Labels
enhancement New feature or request

Comments

@pascalwhoop

Is your feature request related to a problem? Please describe.

  • We use Spark for all our heavy data wrangling.
  • We want to use pandera to generate test / fake data for those Spark pipelines.

Describe the solution you'd like
The ability to use .example(size=20) with pyspark-based schemas as well, not only with pandas-based ones.

Describe alternatives you've considered

  • Automatically convert the pyspark schema to a pandas one and then call example on that.
  • I could not figure out whether there is a way to do so.
  • Ideally we'd also define the schema in an "agnostic" format (e.g. protobuf / custom DSL / YAML / pydantic based) and derive the pyspark / pandas schema code from it.
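
To make the "agnostic" idea above a bit more concrete, here is a minimal sketch using only the Python standard library. It is not pandera API and not a proposal for how pandera should implement this: `FieldSpec`, `example_rows`, the dtype names, and the default value ranges are all hypothetical names and choices made up for illustration. It only shows the shape of "one schema description, fake rows derived from it".

```python
import random
import string
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class FieldSpec:
    """Hypothetical backend-agnostic description of one column."""
    dtype: str                          # one of: "int", "float", "str", "bool"
    nullable: bool = False
    min_value: Optional[float] = None   # only used for "int" and "float"
    max_value: Optional[float] = None


def _example_value(field: FieldSpec, rng: random.Random) -> Any:
    """Draw one fake value that respects the field's dtype and bounds."""
    # Assumption: nullable columns get a null roughly 10% of the time.
    if field.nullable and rng.random() < 0.1:
        return None
    # Assumption: unbounded numeric columns default to the range [0, 100].
    lo = 0 if field.min_value is None else field.min_value
    hi = 100 if field.max_value is None else field.max_value
    if field.dtype == "int":
        return rng.randint(int(lo), int(hi))
    if field.dtype == "float":
        return rng.uniform(float(lo), float(hi))
    if field.dtype == "str":
        return "".join(rng.choices(string.ascii_lowercase, k=8))
    if field.dtype == "bool":
        return rng.random() < 0.5
    raise ValueError(f"unsupported dtype: {field.dtype!r}")


def example_rows(
    spec: Dict[str, FieldSpec], size: int, seed: Optional[int] = None
) -> List[Dict[str, Any]]:
    """Generate `size` fake rows as plain dicts, one key per column."""
    rng = random.Random(seed)
    return [
        {name: _example_value(field, rng) for name, field in spec.items()}
        for _ in range(size)
    ]


# Hypothetical schema, defined once and independent of pandas or Spark.
SPEC = {
    "id": FieldSpec("int", min_value=1, max_value=1000),
    "score": FieldSpec("float", min_value=0.0, max_value=1.0),
    "name": FieldSpec("str", nullable=True),
    "active": FieldSpec("bool"),
}

rows = example_rows(SPEC, size=20, seed=42)
```

Because the rows are plain dicts, they can be passed to `pandas.DataFrame(rows)`, and they should also be usable with `spark.createDataFrame` when an explicit schema is supplied (explicit because a column that happens to contain only nulls cannot have its type inferred). A real solution inside pandera would presumably reuse its existing checks and data-synthesis strategies rather than a hand-rolled generator like this.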
@pascalwhoop added the enhancement label Nov 13, 2024