delta-io / delta
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
See what the GitHub community is most excited about today.
Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
Source code for Twitter's Recommendation Algorithm
Scala 2 compiler and standard library. Scala 2 bugs at https://github.com/scala/bug; Scala 3 at https://github.com/scala/scala3
The Community Maintained High Velocity Web Framework For Java and Scala.
Apache Spark - A unified analytics engine for large-scale data processing
Removes large or troublesome blobs like git-filter-branch does, but faster. And written in Scala
Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.
The Scala 3 compiler, also known as Dotty.
Open-source high-performance RISC-V processor
Rocket Chip Generator
Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
State of the Art Natural Language Processing
Spark RAPIDS plugin - accelerate Apache Spark with GPUs
The Daml smart contract language
A Spark plugin for reading and writing Excel files
Chisel: A Modern Hardware Design Language
♞ lichess.org: the forever free, adless and open source chess server ♞
Spark: The Definitive Guide's Code Repository
sbt, the interactive build tool
FEEL parser and interpreter written in Scala
Scientific workflow engine designed for simplicity & scalability. Trivially transition between one off use cases to massive scale production environments