





















































🗞️ Welcome to this week’s edition of BIPro #94, where we bring you the most exciting advancements shaping business intelligence, analytics, and AI.
From fully automated data cleaning to streamlined data pipelines and cutting-edge AI innovations, this curated list covers everything you need to stay ahead in the fast-moving world of data.
🔍 In This Edition:
✅ Automate messy data cleaning with Python to save time and boost accuracy
✅ Avoid common Power BI pitfalls for scalable, high-performance dashboards
✅ Supercharge SQL Server queries with anti-pattern detection and optimization
✅ Streamline Terraform and OpenTofu workflows for better infrastructure-as-code management
✅ Leverage Databricks for efficient data streaming in Azure
✅ Bring Salesforce insights into BigQuery for unified analytics
📚 Must-Read Books for Data & BI Professionals
📖 Causal Inference and Discovery in Python: Go beyond predictions with causal effect estimation in fraud, healthcare & more.
📖 The Definitive Guide to Power Query (M): Automate data prep, optimize workflows & streamline analytics.
📖 Bayesian Analysis with Python: Build Bayesian models with PyMC for smarter decisions, no stats needed!
📖 Mastering PyTorch: Learn CNNs, transformers, AutoML & cloud deployment.
📖 The Machine Learning Solutions Architect Handbook: Design & scale AI/ML like a pro.
📖 Mastering Tableau 2023: AI-powered visualizations & governance for BI analysts.
🌟 BI & AI on the Rise
This week, we highlight AWS Pi Day 2025, Microsoft OneLake’s Iceberg integration, and Google’s Cloud Composer 3, all pushing the boundaries of data management, automation, and AI-driven insights. Plus, see how Definity Insurance transformed its analytics with BigQuery and Vertex AI, cutting migration time in half while unlocking real-time insights and AI-driven decision-making.
⚡ Ready to dive in? Scroll down for the latest trends and expert insights!
Cheers,
Merlyn Shelley
Growth Lead, Packt
Understanding why something happens is key for data professionals. This hands-on Python guide covers causal effect estimation, discovery, and ML applications in fraud, healthcare, and more. Elevate your models beyond prediction. Get your copy and master causal inference today!
Tired of manual data cleaning? Master Power Query to automate, optimize, and speed up workflows. This guide covers fundamentals, advanced M language, and performance optimization, helping analysts and BI pros streamline prep, save time, and enhance analytics. Get your copy today!
Go beyond traditional stats with Bayesian analysis for confident, data-driven decisions. This Python guide covers modeling with PyMC, real-world applications, and model evaluation, and is ideal for data scientists, researchers, and developers. No prior stats experience needed. Get your copy today!
Master PyTorch for cutting-edge AI! This guide covers CNNs, transformers, diffusion models, multi-GPU training, AutoML, and deployment to mobile, cloud, and production. It is ideal for data scientists, ML engineers, and researchers. Get your copy and level up today!
Design, deploy, and scale ML like an expert! Written by AWS’s David Ping, this guide covers the ML lifecycle, enterprise AI architecture, and generative AI. It is perfect for ML engineers, architects, and data scientists. Get your copy and master ML solutions today!
Master Tableau and transform raw data into insights! This guide covers data prep, visualization, AI integration, and governance. It is perfect for analysts, BI pros, and data scientists who want to build impactful dashboards and optimize performance. Get your copy today!
⏩ How to Fully Automate Data Cleaning with Python in 5 Steps: As a Business Intelligence professional, you often deal with messy data. This blog helps you automate data cleaning using Python’s pandas library, covering missing values, standardization, outlier handling, and validation, so you can build a reliable, repeatable pipeline for accurate analysis.
⏩ Top 5 Power BI Common Pitfalls: This blog highlights five common mistakes in Power BI projects and how to avoid them. It covers data modeling, ETL best practices, naming conventions, report performance, and source control, helping BI professionals build scalable, efficient, and well-structured Power BI solutions.
⏩ Identify Anti-Patterns in SQL Server Queries: This blog explores how SQL Server 2022’s Query_AntiPattern Extended Event helps identify inefficient query patterns. It covers common anti-patterns like non-sargable queries, parameter sniffing, and implicit conversions, guiding you in optimizing queries for better performance and resource utilization.
⏩ Digitally Signing a SQL Stored Procedure: This blog explains how to digitally sign SQL Server stored procedures using self-signed certificates. It covers creating certificates, adding signatures, verifying integrity, and detecting unauthorized modifications, helping database professionals ensure security and authenticity of SQL objects against accidental or malicious changes.
⏩ Optimize Delta Tables with VACUUM in Microsoft Fabric: This blog explains how to optimize Delta tables in Microsoft Fabric using the VACUUM operation. It covers identifying stale files, automating cleanup, preventing storage bloat, and maintaining partitioned data efficiently, helping data engineers improve performance and reduce unnecessary storage costs.
⏩ Python Modules for Developing Data Engineering Workloads: This blog explores essential Python modules for building data engineering pipelines, focusing on attrs, SQLAlchemy, and pandas. It covers their installation, use cases, examples, and caveats, helping data engineers develop scalable, efficient, and maintainable ETL/ELT workflows.
⏩ Gauss-Seidel Method SQL Function to Solve Linear Equations: This blog demonstrates how to implement the Gauss-Seidel method in SQL Server to solve systems of linear equations. It explains the function logic, input format, and practical examples, helping database professionals apply iterative numerical solutions directly within SQL.
⏩ Attribute-Level Governance Using Apache Iceberg Tables: This blog explains how to implement attribute-level governance using Apache Iceberg tables and AWS Lake Formation. It covers fine-grained access control, column and row-level security, and efficient data cataloging, helping organizations manage secure, scalable, and compliant data access across cloud environments.
⏩ Top Terraform and OpenTofu Tools to Use in 2025: Explore the top Terraform and OpenTofu tools for 2025, designed to enhance infrastructure management, security, and collaboration. This guide covers version control, automation, security scanning, cost estimation, and state management tools, helping DevOps teams optimize Infrastructure-as-Code workflows efficiently.
⏩ Queries for Optimizing and Debugging PostgreSQL Replication: Learn how to monitor, optimize, and debug PostgreSQL replication with key SQL queries. This guide covers tracking replication lag, managing slots, cleaning up unused subscriptions, and improving logical replication performance, helping database administrators maintain efficient and reliable PostgreSQL replication setups.
⏩ Data Streaming Databricks in Azure: This blog explores data streaming in Azure Databricks, comparing structured streaming and Auto Loader for ingesting files into Delta Lake. It covers implementation steps, best practices, performance considerations, and real-world examples to help data engineers build scalable streaming pipelines efficiently.
⏩ Using SQL Server Stored Procedures with the Django ORM: This blog explores integrating SQL Server stored procedures with Django’s ORM. It covers calling procedures, handling parameters, managing transactions, capturing multiple result sets, and dealing with output parameters, all with step-by-step explanations and code snippets for practical implementation.
⏩ Unlock the power of your Iceberg data in OneLake: This blog introduces Microsoft OneLake’s integration with Snowflake and Apache Iceberg tables, enabling seamless data sharing without duplication. It covers the latest updates, steps to get started, and upcoming features that enhance interoperability, performance, and schema-level data management in Fabric.
⏩ AWS Data & AI Day Copenhagen showcases the latest innovations in analytics and machine learning: AWS Data & AI Day Copenhagen brought together industry leaders to showcase cutting-edge innovations in data analytics and AI. The event featured success stories from Basware, Novo Nordisk, and Casper’s Ice Cream, illustrating how businesses leverage Amazon QuickSight, SageMaker, and AWS AI services to drive transformation.
⏩ Accelerate analytics and AI innovation with the next generation of Amazon SageMaker: Amazon SageMaker has evolved into a unified data and AI development environment, streamlining how organizations manage analytics, machine learning, and generative AI. With SageMaker Unified Studio, teams can access, analyze, and act on data seamlessly, integrating AWS services like Redshift, Athena, and Amazon Bedrock to accelerate innovation.
⏩ Streamlined Multiomics Data Analysis Leveraging Illumina Software on AWS: Multiomics research is transforming biomedical science, but managing vast genomic, transcriptomic, and proteomic datasets presents challenges. Illumina’s AWS-powered informatics solutions, including DRAGEN, Illumina Connected Analytics, and Correlation Engine, help researchers analyze, integrate, and visualize complex multiomics data efficiently, unlocking new insights into disease mechanisms and biomarker discovery.
⏩ AWS Pi Day 2025: Data foundation for analytics and AI: AWS Pi Day 2025 showcased the latest advancements in cloud data management, analytics, and AI, with a focus on Amazon S3 Tables, SageMaker Unified Studio, and SageMaker Lakehouse. These innovations streamline data access, accelerate AI development, and unify analytics workflows for seamless, scalable insights.
⏩ Datastream extracts Salesforce Data Cloud data: Google Cloud has expanded Datastream to support Salesforce Data Cloud, enabling seamless real-time data replication into BigQuery, Cloud Storage, and other destinations. This integration eliminates data silos, enhances analytics, and empowers businesses with unified insights across operational and SaaS data for better decision-making.
⏩ Cloud Composer 3 for Apache Airflow: Google Cloud has announced Cloud Composer 3, the next-generation managed Apache Airflow service, designed to simplify data pipeline orchestration. With hidden infrastructure, enhanced performance, simplified networking, and per-task resource control, data teams can focus on workflows rather than maintenance, boosting efficiency, security, and scalability.
⏩ Definity's leap to data agility with BigQuery and Vertex AI: Definity Insurance successfully modernized its data infrastructure by migrating to Google Cloud’s BigQuery and Vertex AI, replacing its legacy Cloudera platform in just 10 months. This transformation reduced costs, improved scalability, accelerated AI adoption, and enabled real-time analytics, enhancing customer experiences and operational efficiency.