





















































Hi,
Welcome to a brand new issue of PythonPro!
In today’s Expert Insight, we bring you an excerpt from the recently published book, Python Data Cleaning and Preparation Best Practices, which compares Pandas profiling and Great Expectations for data profiling and analysis.
News Highlights: DJP, a Pluggy-based plugin system for Django, launches, easing integration; and PondRAT malware, hidden in Python packages, targets developers in a supply chain attack.
Here are my top 5 picks from our learning resources today:
And today’s Featured Study introduces sbijax, a Python package built on JAX for neural simulation-based inference (SBI), offering a wide range of algorithms, a user-friendly interface, and tools for efficient and scalable Bayesian analysis.
Stay awesome!
Divya Anne Selvaraj
Editor-in-Chief
P.S.: This month's survey is now live. Do take the opportunity to leave us your feedback, request a learning resource, and earn your one Packt credit for this month.
django-plugin-blog
A pytest fixture using Tree-sitter to parse function definitions and Jedi to rename identifiers
withColumn within loops (PySpark)

"Simulation-based Inference with the Python Package sbijax" by Dirmeier et al. introduces sbijax, a Python package for neural simulation-based inference (SBI). The paper outlines the package’s implementation of advanced Bayesian inference methodologies, using JAX for computational efficiency.
SBI is a technique for Bayesian inference when the likelihood function is too complex to compute directly. By using neural networks as surrogates, SBI approximates complex Bayesian posterior distributions, which describe the probability of model parameters given observed data. Neural density estimation, a modern approach to SBI, refers to using neural networks to model these complex distributions accurately. The sbijax package enables this inference process by offering a range of neural inference methods, and it is built on JAX. JAX is a Python library that provides efficient automatic differentiation and parallel computation on both CPUs and GPUs. This makes sbijax particularly relevant for statisticians, data scientists, and modellers working with complex Bayesian models.
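To make the likelihood-free idea concrete, here is a minimal sketch of the simplest SBI-adjacent method, rejection ABC, written in plain JAX. It is illustrative only and does not use sbijax’s API; sbijax replaces the crude accept/reject step below with trained neural density estimators.

# Minimal rejection-ABC sketch in JAX: infer a parameter using only a
# simulator, never an explicit likelihood. Not sbijax's API.
import jax
import jax.numpy as jnp

key = jax.random.PRNGKey(0)

def prior_sample(key, n):
    # Prior over the unknown mean: theta ~ Normal(0, 2)
    return 2.0 * jax.random.normal(key, (n,))

def simulator(key, theta, n_obs=20):
    # Each dataset is n_obs draws from Normal(theta, 1)
    noise = jax.random.normal(key, (theta.shape[0], n_obs))
    return theta[:, None] + noise

# "Observed" data generated with a true mean of 1.5
key, k_obs = jax.random.split(key)
x_obs = 1.5 + jax.random.normal(k_obs, (20,))

# Draw many candidate parameters, simulate, and keep those whose summary
# statistic (the sample mean) lands close to the observed one.
key, k_prior, k_sim = jax.random.split(key, 3)
theta = prior_sample(k_prior, 100_000)
x_sim = simulator(k_sim, theta)
distance = jnp.abs(x_sim.mean(axis=1) - x_obs.mean())
posterior_samples = theta[distance < 0.05]

print(posterior_samples.mean(), posterior_samples.std())

The accepted samples approximate the posterior over the mean; neural SBI methods reach the same goal far more sample-efficiently by learning a surrogate of the likelihood or posterior instead of thresholding distances.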
The package returns posterior samples as InferenceData objects (the ArviZ data structure) for easy exploration and analysis. sbijax is most useful for computational modellers, data scientists, and statisticians who require efficient and flexible tools for Bayesian inference. Its user-friendly interface, coupled with computational efficiency, makes it practical for those working with high-dimensional or complex simulation models.
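Whatever algorithm produced the posterior samples, having them in an InferenceData object makes summaries and plots one-liners. The sketch below uses ArviZ directly on synthetic samples; it illustrates the data structure rather than sbijax’s exact return values.

# Exploring posterior samples through an ArviZ InferenceData object.
# The samples are synthetic stand-ins for whatever an SBI run produces.
import numpy as np
import arviz as az

rng = np.random.default_rng(0)

# Pretend these came from an SBI algorithm: 4 chains x 1000 draws for a
# 2-parameter model.
posterior = {
    "theta": rng.normal(loc=[1.5, -0.5], scale=0.2, size=(4, 1000, 2)),
}

idata = az.from_dict(posterior=posterior)

# Tabular summary (means, credible intervals, diagnostics)...
print(az.summary(idata))

# ...and standard plots work out of the box.
az.plot_posterior(idata)
az.plot_pair(idata)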
The authors validate sbijax by showcasing its implementation in different SBI methods and comparing performance against conventional tools. The package provides sequential inference capabilities, combining neural density estimation techniques with traditional approximate Bayesian computation (ABC). The authors demonstrate sbijax’s functionality by training models using real and synthetic data, then sampling from the posterior distributions. In a benchmark example with a bivariate Gaussian model, sbijax successfully approximates complex posterior distributions using various algorithms, such as neural likelihood estimation (NLE) and sequential Monte Carlo ABC (SMC-ABC).
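For a sense of what such a benchmark involves, here is a hedged sketch of the two ingredients every SBI algorithm in the comparison consumes, a prior and a simulator, for a bivariate Gaussian toy model in JAX. The paper’s exact model specification and sbijax’s function signatures may differ.

# The two ingredients an SBI benchmark needs: a prior over the parameters
# and a simulator that maps parameters to data. A bivariate Gaussian toy
# problem in spirit; not the paper's exact specification.
import jax
import jax.numpy as jnp

def prior_sample(key, n):
    # theta in R^2, e.g. the unknown mean of the Gaussian
    return 2.0 * jax.random.normal(key, (n, 2))

def simulator(key, theta, n_obs=10):
    # Each parameter draw yields n_obs bivariate observations with
    # identity covariance, centred at theta.
    noise = jax.random.normal(key, (theta.shape[0], n_obs, 2))
    return theta[:, None, :] + noise

key = jax.random.PRNGKey(1)
k1, k2 = jax.random.split(key)
theta = prior_sample(k1, 5)
x = simulator(k2, theta)
print(theta.shape, x.shape)  # (5, 2) (5, 10, 2)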
The paper details the efficiency and accuracy of sbijax, backed by empirical evaluations that show JAX's computational advantage over other libraries like PyTorch. Its consistent performance across various SBI tasks underscores its reliability and broad applicability in Bayesian analysis.
You can learn more by reading the entire paper or accessing the sbijax documentation here.
Here’s an excerpt from “Chapter 3: Data Profiling – Understanding Data Structure, Quality, and Distribution” in the book, Python Data Cleaning and Preparation Best Practices by Maria Zervou, published in September 2024.
Pandas profiling and Great Expectations are both valuable tools for data profiling and analysis, but they have different strengths and use cases.
Here’s a comparison between the two tools.
Table 3.2 – Great Expectations and pandas profiler comparison
Pandas profiling is well suited for quick data exploration and initial insights, while Great Expectations excels in data validation, documentation, and enforcing data quality rules. Pandas profiling is more beginner-friendly and provides immediate insights, while Great Expectations offers more advanced customization options and scalability for larger datasets. The choice between the two depends on the specific requirements of the project and the level of data quality control needed.
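To make the contrast concrete, the sketch below runs both tools on the same small DataFrame. It assumes the ydata-profiling package (the successor to pandas-profiling) and the legacy Great Expectations pandas API (ge.from_pandas); recent Great Expectations releases organise the same checks around a data context and validators instead.

# Minimal sketch contrasting the two tools on one DataFrame. Assumes
# ydata-profiling and the legacy Great Expectations pandas API.
import pandas as pd

df = pd.DataFrame({
    "age": [25, 31, None, 47],
    "income": [32000, 45000, 51000, 38000],
})

# --- Pandas profiling: one call, an exploratory HTML report ---
from ydata_profiling import ProfileReport

profile = ProfileReport(df, title="Customer data profile")
profile.to_file("profile.html")

# --- Great Expectations: explicit, reusable data quality rules ---
import great_expectations as ge

gdf = ge.from_pandas(df)
print(gdf.expect_column_values_to_not_be_null("age"))
print(gdf.expect_column_values_to_be_between("income", 0, 1_000_000))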
As the volume of data increases, we need to make sure that the choice of tools we’ve made can scale as well. Let’s have a look at how we can do this with Great Expectations.
Note
While Great Expectations offers scalability options, the specific scalability measures may depend on the underlying infrastructure, data storage systems, and distributed processing frameworks employed in your big data environment.
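As one illustration of scaling the same checks, the hedged sketch below wraps a Spark DataFrame in Great Expectations’ legacy SparkDFDataset so expectations are evaluated by the Spark engine rather than in pandas. It assumes the legacy dataset API and a running Spark session; the file path is hypothetical, and newer releases expose Spark through datasource configuration instead.

# Hedged sketch: run the same expectations on a Spark DataFrame so the
# work is done by the cluster rather than in pandas. Assumes the legacy
# great_expectations.dataset API.
from pyspark.sql import SparkSession
from great_expectations.dataset import SparkDFDataset

spark = SparkSession.builder.appName("ge-scaling").getOrCreate()

sdf = spark.read.parquet("s3://bucket/customers/")  # hypothetical path
gdf = SparkDFDataset(sdf)

print(gdf.expect_column_values_to_not_be_null("age"))
print(gdf.expect_column_values_to_be_between("income", 0, 1_000_000))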
Packt library subscribers can continue reading the entire book for free. You can buy Python Data Cleaning and Preparation Best Practices here.
And that’s a wrap.
We have an entire range of newsletters with focused content for tech pros. Subscribe to the ones you find the most useful here. The complete PythonPro archives can be found here.
If you have any suggestions or feedback, or would like us to find you a Python learning resource on a particular subject, take the survey or just respond to this email!