By Wendy S. Batchelder
With 2.5 quintillion bytes of data generated daily, effective data governance is more crucial than ever. The Data Governance Handbook equips data professionals with practical strategies to ensure trustworthy, business-aligned data solutions.
Written by a three-time Fortune 500 Chief Data Officer, this guide helps CDOs, data leaders, engineers, and IT professionals:
✅ Build a strong governance foundation and drive real impact.
✅ Secure executive buy-in with measurable business results.
✅ Scale governance programs effectively using real-world insights.
✅ Enable data-driven transformation with actionable use cases.
No coding or sales expertise needed, just a clear, results-driven approach to mastering data governance. Ready to transform your data strategy? This book is for you.
🗞️ Welcome to BIPro #93: Your Weekly Business Intelligence Boost! 🚀
The world of data never stands still. New tools, techniques, and challenges constantly reshape how we work, pushing us to stay ahead. In this edition, we’re focusing on practical insights that can make a real impact, whether you're fine-tuning performance, automating workflows, or harnessing real-time analytics.
From Streaming SQL on Kafka for processing live data to Snowflake optimizations on AWS that improve efficiency and cost management, we’re tackling the challenges that matter. We also explore SQL best practices, Python automation for data cleaning, and AI-driven enhancements in BI, all designed to make your workflows smoother and smarter.
It’s not just about keeping up with trends; it’s about making data work better for you.
Inside this issue:
📊 Real-time data analytics with Streaming SQL on Kafka ~ because batch processing shouldn’t slow you down.
⚡ Optimizing Snowflake on AWS ~ faster queries, leaner warehouses, and smarter cost management.
💡 Automating data cleaning with Python ~ eliminate repetitive tasks and boost efficiency.
🔍 BI hacks & SQL insights ~ from fine-tuning SQL Server indexes to deploying DACPACs in Azure, get practical, time-saving tips.
Whether you’re building real-time pipelines, optimizing performance, or exploring the latest AI-powered BI tools, there’s something here for you.
So, grab a coffee, dive in, and let’s make data work smarter. ☕📈
Cheers,
Merlyn Shelley
Growth Lead, Packt
Sponsored
By Kirill Kolodiazhnyi
Harness the power of machine learning and deep learning using C++ with this hands-on guide. Written by an experienced software engineer, this book walks you through data processing, model selection, and performance optimization, equipping you with the skills to build and deploy efficient ML models on mobile and embedded devices.
Whether you're a developer, data scientist, or analyst, you’ll learn how to:
✅ Leverage C++ libraries for machine learning and deep learning tasks.
✅ Build smart models for recommendations, anomaly detection, and sentiment analysis.
✅ Optimize ML models using hyperparameter tuning and experiment tracking.
✅ Deploy models to mobile and embedded platforms for real-time applications.
With practical examples, real-world use cases, and step-by-step guidance, this book ensures you can apply ML techniques effectively in C++. Master ML with C++ and take your models to production!
Creating Automated Data Cleaning Pipelines Using Python and Pandas
Tired of repeating the same data cleaning steps? This blog shows you how to automate the process using Python and Pandas. From standardizing imports to building cleaning pipelines and tracking data quality, you'll save time, reduce errors, and work more efficiently.
10 Python One-Liners for Scikit-learn
This blog is all about writing cleaner, more efficient machine learning code using Scikit-learn. It introduces 10 powerful Python one-liners that simplify essential tasks like data loading, preprocessing, model training, evaluation, and pipeline creation. Whether you're experimenting, prototyping, or streamlining your workflow, these concise snippets will help you cut down unnecessary code while keeping things clear and effective.
INFO.VIEW DAX Functions Usage and Examples
This blog explores the INFO.VIEW DAX functions in Power BI, introduced in the October 2024 update, which allow users to auto-document their semantic models. It explains how INFO.VIEW.COLUMNS(), INFO.VIEW.TABLES(), INFO.VIEW.RELATIONSHIPS(), and INFO.VIEW.MEASURES() work, providing syntax, usage examples, and real-world applications. Unlike the traditional INFO DAX functions, these can be used in calculated tables, making models more transparent and easier to maintain.
By Arshad Ali, Schacht
Microsoft Fabric is the ultimate unified analytics solution for the AI era, seamlessly integrating data engineering, real-time analytics, AI, and visualization in one platform.
This book equips data professionals, analysts, engineers, and AI/ML experts with the knowledge to:
✅ Build scalable data solutions for lakehouses, warehouses, and real-time analytics.
✅ Integrate and transform data using Spark, Notebooks, and T-SQL.
✅ Monitor, manage, and secure Fabric environments with best practices.
✅ Leverage AI-powered analytics with Copilot for enhanced productivity.
No matter your data role, this book provides a practical, hands-on guide to mastering Microsoft Fabric. Future-proof your data analytics journey today!
Index Rebuilds Make Even Less Sense with ADR & RCSI
This blog explores why index rebuilds are often unnecessary in SQL Server when Accelerated Database Recovery (ADR) and Read Committed Snapshot Isolation (RCSI) are enabled. It demonstrates how these features increase table size due to row versioning but also explains why rebuilding indexes doesn’t provide lasting space savings. Instead of outdated index maintenance practices, the article encourages a shift in mindset, understanding why table sizes grow and focusing on the real problem before applying traditional solutions.
SQL Server Backup or Restore using Network Share with SSMS
This blog walks through how to back up or restore a SQL Server database using a network share in SQL Server Management Studio (SSMS) when local storage is limited. It explains how to map a network drive, enable xp_cmdshell to make it visible to SQL Server, and perform a database backup or restore directly from the network location. The guide also includes steps to verify the mapped drive and remove it once the process is complete.
Deploy DACPAC to Azure SQL Database using Visual Studio
This blog explains how to deploy a DACPAC file to an Azure SQL Database using Visual Studio. It covers creating a DACPAC, configuring deployment settings, selecting the correct target platform, and verifying the deployment. Additionally, it provides troubleshooting tips and best practices.
By Greg Deckler, Powell
The Power BI Cookbook is the go-to resource for BI professionals and data analysts looking to master data integration, visualization, and advanced reporting in Power BI. This updated edition brings the latest Microsoft Data Fabric capabilities, Hybrid tables, and AI-driven enhancements, helping you build powerful, future-ready BI solutions.
Whether you're a BI developer, analyst, or data professional, this book will help you:
✅ Leverage Microsoft Data Fabric for deeper insights and robust data strategies.
✅ Create Hybrid tables, scorecards, and shared cloud connections with ease.
✅ Turn complex data into clear, actionable reports using updated visualization tools.
✅ Enhance security, governance, and real-time processing for enterprise-ready BI.
Packed with step-by-step guidance and real-world use cases, this book ensures you stay ahead in the evolving Power BI landscape. Take your Power BI expertise to the next level!
Announcing AI functions for seamless data engineering with GenAI
This blog introduces AI functions in Microsoft Fabric, now in preview. It explains how to use LLM-powered transformations like summarization, classification, sentiment analysis, translation, and text generation on OneLake data with just a single line of code. It covers setup, usage, customization, and prerequisites.
Nubank elevates customer experiences with OpenAI
This blog highlights how Nubank leverages OpenAI’s AI solutions to enhance customer service, fraud prevention, and internal efficiency. It covers enterprise search, call center AI copilots, AI-powered assistants, and GPT-4o vision for fraud detection, improving response times and customer satisfaction.
Datastream extracts Salesforce Data Cloud data
This blog introduces Datastream’s new support for Salesforce Data Cloud, enabling real-time data replication to BigQuery and other Google Cloud destinations. It explains how businesses can unify SaaS and operational data for advanced analytics, improve decision-making, and simplify integration without infrastructure management.
This blog explains how to build a secure data visualization application using the Amazon Redshift Data API with AWS IAM Identity Center. It covers single sign-on (SSO), role-based access control (RBAC), row-level security (RLS), and trusted identity propagation to ensure secure, role-based data access in Streamlit applications.
By Bojan Kolosnjaji, Huang Xiao, Peng Xu, Apostolis Zarras
Artificial Intelligence is transforming cybersecurity, enabling faster threat detection, smarter authentication, and more resilient defenses. This book bridges the gap between AI and cybersecurity, providing practical guidance, step-by-step exercises, and real-world applications to help professionals design, implement, and evaluate AI-driven security solutions.
Whether you're a machine learning practitioner or a cybersecurity professional, you’ll gain the skills to:
✅ Understand AI methods and their role in cybersecurity.
✅ Design AI-powered security solutions to detect and prevent cyber threats.
✅ Apply AI techniques using hands-on exercises and code examples.
✅ Avoid common pitfalls and optimize AI implementation for real-world scenarios.
Packed with practical insights and expert guidance, this book ensures you can confidently integrate AI into your cybersecurity strategy. Stay ahead of cyber threats with AI-powered defense strategies!
Harnessing Real-Time Insights With Streaming SQL on Kafka
This blog explores Streaming SQL on Kafka, enabling real-time data processing with SQL-based queries on Kafka topics. It covers key components, streaming SQL tools like ksqlDB, Flink, and Spark Structured Streaming, practical use cases, benefits, and challenges, helping businesses simplify real-time analytics and decision-making.
Database Query Service With OpenAI and PostgreSQL in .NET
This blog explains how to build a database query service in .NET using OpenAI’s GPT-4 and PostgreSQL. It covers natural language to SQL conversion, schema retrieval, secure query execution, and SQL validation to ensure safe, efficient, and user-friendly database interactions without manual query writing.
Publish a Fabric SQL Database with Pre- or Post-Deployment Scripts
This blog explains how to publish a Fabric SQL Database using Azure Data Studio (ADS), including pre- and post-deployment scripts. It covers creating a database project, connecting to Microsoft Fabric, adding objects like tables, and configuring deployment settings to streamline database management and automation.
Exploring Scalar Solutions to Complex Data Math
This blog explores efficient date calculations in SQL Server, focusing on counting specific weekdays within a date range. It evaluates iterative vs. optimized approaches, ultimately presenting a bitwise math solution that eliminates loops, significantly improving performance. The approach ensures scalability for large datasets and complex date-matching scenarios.
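To give a feel for loop-free weekday counting, here is a minimal Python sketch. It is not the blog's SQL Server code or its exact bitwise method; it uses plain modular arithmetic, and the function names and the bitmask convention (bit 0 = Monday) are assumptions made for illustration.

```python
from datetime import date


def count_weekday(start: date, end: date, weekday: int) -> int:
    """Count dates in [start, end] (inclusive) that fall on `weekday`.

    `weekday` follows date.weekday(): Monday = 0 ... Sunday = 6.
    No per-day loop: every full week contributes exactly one match,
    and the leftover days contribute at most one more.
    """
    if end < start:
        return 0
    total_days = (end - start).days + 1
    full_weeks, remainder = divmod(total_days, 7)
    # Days from `start` until the first occurrence of the target weekday.
    offset = (weekday - start.weekday()) % 7
    return full_weeks + (1 if offset < remainder else 0)


def count_weekdays_in_mask(start: date, end: date, mask: int) -> int:
    """Count dates whose weekday bit is set in `mask` (hypothetical
    convention: bit 0 = Monday, bit 6 = Sunday). Work is constant (7 checks)
    regardless of how long the date range is."""
    return sum(count_weekday(start, end, d) for d in range(7) if mask & (1 << d))


if __name__ == "__main__":
    jan_start, jan_end = date(2024, 1, 1), date(2024, 1, 31)
    print(count_weekday(jan_start, jan_end, 0))               # Mondays in January 2024
    print(count_weekdays_in_mask(jan_start, jan_end, 0b0011111))  # Monday to Friday
```

Because the cost does not grow with the length of the range, the same idea carries over to large datasets, which is the scalability point the blog makes.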
Improve Query Performance when SQL Server Ignores Nonclustered Index
This blog explains how to improve SQL Server query performance when the optimizer ignores a nonclustered index. It explores key lookups, covering indexes, and query optimization techniques to reduce logical reads and execution time, ultimately ensuring efficient index usage and resource optimization for better database performance.
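The covering-index idea can be tried out without a SQL Server instance. The sketch below uses Python's built-in sqlite3 module as a stand-in, so it is an analogy rather than the blog's approach: SQLite has no INCLUDE clause, so the extra columns are added as index keys, and the table, column, and index names are made up for illustration.

```python
import sqlite3

# Hypothetical table and data, used only to show the effect of a covering index.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, customer_id INTEGER, status TEXT, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, status, total) VALUES (?, ?, ?)",
    [(i % 50, "open" if i % 3 else "closed", i * 1.5) for i in range(1000)],
)

QUERY = "SELECT status, total FROM orders WHERE customer_id = ?"


def query_plan(connection, sql, params):
    """Return the planner's description of how it will run `sql`."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column is the detail text


# 1) Narrow index: locating rows is cheap, but status/total still require a
#    trip back to the table for each match (the analogue of a key lookup).
conn.execute("CREATE INDEX idx_customer ON orders (customer_id)")
narrow_plan = query_plan(conn, QUERY, (7,))

# 2) Covering index: every column the query needs lives in the index itself,
#    so the table is never touched.
conn.execute("DROP INDEX idx_customer")
conn.execute(
    "CREATE INDEX idx_customer_covering ON orders (customer_id, status, total)"
)
covering_plan = query_plan(conn, QUERY, (7,))

if __name__ == "__main__":
    print("narrow index  :", narrow_plan)
    print("covering index:", covering_plan)
```

In SQL Server the equivalent fix would typically keep `customer_id` as the key and list the remaining columns in an `INCLUDE` clause; comparing the two execution plans before and after is the same diagnostic step.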
Performance Optimization Techniques for Snowflake on AWS
This blog explores performance optimization techniques for Snowflake on AWS, covering storage, compute, and query efficiency. It provides best practices, SQL examples, and strategies for warehouse tuning, query optimization, clustering, caching, ETL efficiency, and cost control, ensuring high performance and cost-effective data operations.