Databricks Certified Associate Developer for Apache Spark Using Python, Published by Packt


Databricks Certified Associate Developer for Apache Spark Using Python


This is the code repository for Databricks Certified Associate Developer for Apache Spark Using Python, published by Packt.

The ultimate guide to getting certified in Apache Spark using practical examples with Python

What is this book about?

This guide gets you ready for certification with expert-backed content, key exam concepts, and topic reviews. Additionally, you’ll be able to make the most of Apache Spark 3.0 to modernize workloads and more using specific tools and techniques.

This book covers the following exciting features:

  • Create and manipulate SQL queries in Spark
  • Build complex Spark functions using Spark UDFs
  • Architect big data apps with Spark fundamentals for optimal design
  • Apply techniques to manipulate and optimize big data applications
  • Build real-time or near-real-time applications using Spark Streaming
  • Work with Apache Spark for machine learning applications

If you feel this book is for you, get your copy today!

https://www.packtpub.com/

Instructions and Navigations

All of the code is organized into folders. For example, Chapter04.

The code will look like the following:

# Perform an aggregation to calculate the average salary
average_salary = spark.sql("SELECT AVG(Salary) AS average_salary FROM employees")

Following is what you need for this book: This book is for you if you’re a professional looking to venture into the world of big data and data engineering, a data professional who wants to validate your knowledge of Spark, or a student. A working knowledge of Python is required, but no prior Spark knowledge is needed, although experience with PySpark will be beneficial.

With the following software and hardware list, you can run all the code files present in the book (Chapters 4-8).

Software and Hardware List

| Chapter | Software required | OS required |
|---------|-------------------|-------------|
| 4-8 | Python | Windows, Mac OS X, and Linux |
| 4-8 | Spark | Windows, Mac OS X, and Linux |


Get to Know the Author

Saba Shah is a Data and AI Architect and Evangelist with a wide technical breadth and deep understanding of big data and machine learning technologies. She has experience leading data science and data engineering teams in Fortune 500s as well as startups. She started her career as a software engineer but soon transitioned to big data. She is currently a solutions architect at Databricks and works with enterprises building their data strategy and helping them create a vision for the future with machine learning and predictive analytics. Saba graduated with a degree in Computer Science and later earned an MS degree in Advanced Web Technologies. She is passionate about all things data and cricket. She currently resides in RTP, NC.
