Add "example" capability to pyspark based data schemas #1853

Open

pascalwhoop opened this issue Nov 13, 2024 · 0 comments

pascalwhoop commented Nov 13, 2024

Is your feature request related to a problem? Please describe.

We use Spark for all our heavy data wrangling.
We want to use pandera to generate test / fake data.

Describe the solution you'd like
The ability to use .example(size=20) with pyspark-based schemas as well.

Describe alternatives you've considered

Convert the pyspark schema automatically to a pandas one and then use example.
I could not figure out whether there is a way to do so.
Ideally we'd also define the schema in an "agnostic" format (e.g. protobuf / custom DSL / YAML / pydantic-based) and derive the pyspark / pandas schema code from it.
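
To make the "agnostic" idea above more concrete, here is a rough sketch using only the Python standard library. It is not pandera's API and it does not honor pandera checks. The names Field, SPARK_DDL_TYPES, to_spark_ddl and example_rows are hypothetical and exist only for illustration. The one Spark fact it relies on is that createDataFrame accepts a DDL-formatted string such as "id BIGINT, name STRING" as its schema.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """One column in a backend-neutral schema (hypothetical format)."""
    name: str
    kind: str  # one of "int", "float", "str", "bool"


# Assumed mapping from the neutral kinds to Spark SQL DDL type names.
SPARK_DDL_TYPES = {
    "int": "BIGINT",
    "float": "DOUBLE",
    "str": "STRING",
    "bool": "BOOLEAN",
}


def to_spark_ddl(fields):
    """Render the neutral schema as a Spark DDL schema string."""
    return ", ".join(f"{f.name} {SPARK_DDL_TYPES[f.kind]}" for f in fields)


def example_rows(fields, size, seed=0):
    """Generate `size` fake rows as tuples, one value per field."""
    rng = random.Random(seed)  # seeded so test data is reproducible
    generators = {
        "int": lambda: rng.randint(0, 1000),
        "float": lambda: rng.uniform(0.0, 1.0),
        "str": lambda: "".join(rng.choices("abcdefghijklmnopqrstuvwxyz", k=8)),
        "bool": lambda: rng.random() < 0.5,
    }
    return [tuple(generators[f.kind]() for f in fields) for _ in range(size)]


SCHEMA = [
    Field("id", "int"),
    Field("score", "float"),
    Field("name", "str"),
    Field("active", "bool"),
]

ddl = to_spark_ddl(SCHEMA)
rows = example_rows(SCHEMA, size=20)

# With a SparkSession named `spark` available (not created here), the rows
# could be loaded with: spark.createDataFrame(rows, schema=ddl)
```

A built-in .example() for pyspark schemas would presumably also respect column checks and nullability, which this sketch ignores; it only shows that a single neutral definition can drive both the Spark schema and the fake data.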

pascalwhoop added the enhancement label
