Big Data Processing Using Spark & Airflow
Spark
- Deployed Spark in a Docker container
- Defined a Spark session, created temporary views & executed SQL queries on flight departure-delay data
Airflow
- Deployed Airflow in a Docker container
- Created a workflow using directed acyclic graphs (DAGs) & tasks for executing a simple Python function
- Inspected task logs to verify the DAG ran successfully