# datalake

Here are 20 public repositories matching this topic:

trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

java distributed-systems data-science sql database big-data presto hive hadoop analytics jdbc databases distributed-database query-engine iceberg datalake prestodb trino delta-lake

Updated Apr 15, 2025
Java

StarRocks / starrocks

The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance for multi-dimensional analytics, real-time analytics, and ad-hoc queries. A Linux Foundation project.

Updated Apr 15, 2025
Java

apache / hudi

Upserts, Deletes And Incremental Processing on Big Data.

bigdata stream-processing data-integration datalake apachespark hudi apachehudi incremental-processing apacheflink

Updated Apr 15, 2025
Java

DataLinkDC / dinky

Dinky is a real-time data development platform based on Apache Flink, enabling agile data development, deployment and operation.

sql olap flink datawarehouse datalake flinksql flinkcdc real-time-computing-platform

Updated Apr 11, 2025
Java

lakesoul-io / LakeSoul

LakeSoul is an end-to-end, real-time, cloud-native lakehouse framework with fast data ingestion, concurrent updates, and incremental data analytics on cloud storage for both BI and AI applications.

python rust streaming sql big-data spark arrow postgresql pytorch flink datalake vectorized velox huggingface datafusion lakehouse lakesoul

Updated Apr 10, 2025
Java

apache / gravitino

The world's most powerful open data catalog for building a high-performance, geo-distributed, and federated metadata lake.

metadata data-catalog datalake stratosphere federated-query lakehouse model-catalog metalake skycomputing ai-catalog opendatacatalog

Updated Apr 15, 2025
Java

zinggAI / zingg

Scalable identity resolution, entity resolution, data mastering and deduplication using ML

Updated Apr 15, 2025
Java

apache / amoro

Apache Amoro (incubating) is a Lakehouse management system built on open data lake formats.

bigdata datalake lakehouse

Updated Apr 14, 2025
Java

linkedin / openhouse

Open control plane for tables in a data lakehouse

big-data catalog management declarative tables iceberg datalake datalakehouse

Updated Apr 10, 2025
Java

WeBankFinTech / Streamis

A streaming application development and management system based on Linkis and DSS, which plans to provide workflow-like graphical drag-and-drop development.

streaming kafka warehouse flink iceberg datalake hudi deltalake linkis dataspherestudio wedatasphere streamis

Updated Apr 8, 2025
Java

ismailsimsek / iceberg-examples

Apache Iceberg Spark S3 examples

sql s3 iceberg datalake sql-merge

Updated Mar 1, 2024
Java

bihaiyang / datalake-example

Data lake implementation demo, including Iceberg on Flink, Iceberg on Spark, Hudi on Flink, and Hudi on Spark

spark sparksql flink iceberg datalake alluxio hudi flinksql

Updated Nov 19, 2023
Java

memiiso / debezium-server-batch

Debezium server batch consumers

spark batch parquet cdc hacktoberfest datalake debezium hacktoberfest2021

Updated Jul 20, 2022
Java

aboudnik / ariadne

A new data lake: virtual data platform, catalog, and resource-driven processing

bigdata datalake spark-sql

Updated Jun 17, 2022
Java

KaveIO / LocalCatalogManager

open-source metadata data-science mongodb hadoop bigdata restful-api datalake

Updated Mar 6, 2018
Java

gpism / OpenDataCore

Welcome to the fascinating intersection of Web3, Artificial Intelligence (AI), Open Data Core (ODC), and Composable Enterprise Fabric: a nexus of modern technologies that are significantly reshaping the enterprise landscape.

data fabric iot-platform ingestion web3 datalake

Updated Jul 12, 2023
Java

bloomberg / aws-proxy

Proxy for AWS

java aws hive analytics s3 sts iceberg datalake trino

Updated Oct 31, 2024
Java

liyichencc / incubator-paimon

Apache Paimon (incubating) is a streaming data lake platform that supports high-speed data ingestion, change data tracking, and efficient real-time analytics.

apache datalake

Updated Nov 7, 2023
Java

KaveIO / Service

Standalone service for easily exposing DataLake contents

java mongodb rest-api datalake

Updated Apr 26, 2017
Java

cuiyuheng / hudi

Upserts, Deletes And Incremental Processing on Big Data.

database datalake

Updated Dec 4, 2024
Java