This is the code repository for Databricks Certified Associate Developer for Apache Spark Using Python, published by Packt.
The ultimate guide to getting certified in Apache Spark using practical examples with Python
This guide gets you ready for certification with expert-backed content, key exam concepts, and topic reviews. You’ll also learn tools and techniques for using Apache Spark 3.0 to modernize workloads.
This book covers the following exciting features:
- Create and manipulate SQL queries in Spark
- Build complex Spark functions using Spark UDFs
- Architect big data apps with Spark fundamentals for optimal design
- Apply techniques to manipulate and optimize big data applications
- Build real-time or near-real-time applications using Spark Streaming
- Work with Apache Spark for machine learning applications
If you feel this book is for you, get your copy today!
All of the code is organized into folders, for example, Chapter04.
The code will look like the following:
# Perform an aggregation to calculate the average salary
average_salary = spark.sql("SELECT AVG(Salary) AS average_salary FROM employees")
Here is what you need for this book: This book is for you if you’re a professional looking to venture into the world of big data and data engineering, a data professional who wants to validate your knowledge of Spark, or a student. A working knowledge of Python is required, but no prior Spark knowledge is needed. Experience with PySpark will be beneficial.
With the following software and hardware list, you can run all code files present in the book (Chapters 4-8).
| Chapter | Software required | OS required |
|---|---|---|
| 4-8 | Python | Windows, Mac OS X, and Linux |
| 4-8 | Spark | Windows, Mac OS X, and Linux |
Saba Shah is a Data and AI Architect and Evangelist with a wide technical breadth and deep understanding of big data and machine learning technologies. She has experience leading data science and data engineering teams in Fortune 500 companies as well as startups. She started her career as a software engineer but soon transitioned to big data. She is currently a solutions architect at Databricks, where she works with enterprises to build their data strategy and helps them create a vision for the future with machine learning and predictive analytics. Saba graduated with a degree in Computer Science and later earned an MS degree in Advanced Web Technologies. She is passionate about all things data and cricket. She currently resides in RTP, NC.