Small demo on Big Data concepts
It should cover as many of the following technologies as possible:
- Linux basics
- MapReduce
- Scala Spark
- PySpark
- Kafka
- NiFi
- Hue
- Sqoop
- Kotlin
- Java
The program should ideally:
- Download a CSV file from HDFS (Local)
- Load it into a DataFrame with a proper header
- Reduce the number of partitions
- Filter the desired data from the main DataFrame
- Join several DataFrames together
- Export the new DataFrames to Parquet
- (Bonus) Produce a visual report of the data