Documentation | Performance Benchmarks
Vortex is a next-generation columnar file format and toolkit designed for high-performance data analytics. It provides:
- Blazing Fast Performance
  - 100-200x faster random access reads than Apache Parquet
  - 2-10x faster scans with similar compression ratios and write throughput
  - Efficient support for wide tables with zero-copy/zero-parse metadata
- Extensible Architecture
  - Modeled after Apache DataFusion's extensible approach
  - Pluggable encoding system
  - Zero-copy compatibility with Apache Arrow
Development Status: This project is under active development. APIs and file formats may change, and some features are still being implemented.
- Logical Types - Clean separation between logical schema and physical layout
- Zero-Copy Arrow Integration - Seamless conversion to/from Apache Arrow arrays
- Extensible Encodings - Pluggable physical layouts with built-in optimizations
- Cascading Compression - Support for nested encoding schemes
- High-Performance Computing - Optimized compute kernels for encoded data
- Rich Statistics - Lazy-loaded summary statistics for optimization
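To make "cascading compression" and "compute kernels for encoded data" concrete, here is a minimal, standard-library-only sketch. It does not use the Vortex API: the function names (`dict_encode`, `rle_encode`, `decode`, `count_eq`) are hypothetical and chosen only for illustration. Strings are first dictionary-encoded, the resulting integer codes are then run-length encoded, and a count is computed directly on the encoded form.

```rust
use std::collections::HashMap;

/// First layer: dictionary encoding.
/// Returns the unique values (in first-seen order) and one code per input row.
fn dict_encode(values: &[&str]) -> (Vec<String>, Vec<u32>) {
    let mut index: HashMap<&str, u32> = HashMap::new();
    let mut dict: Vec<String> = Vec::new();
    let mut codes: Vec<u32> = Vec::with_capacity(values.len());
    for &v in values {
        let next = dict.len() as u32;
        let code = *index.entry(v).or_insert_with(|| {
            dict.push(v.to_string());
            next
        });
        codes.push(code);
    }
    (dict, codes)
}

/// Second layer: run-length encoding applied to the dictionary codes.
/// Each pair is (code, run length).
fn rle_encode(codes: &[u32]) -> Vec<(u32, u32)> {
    let mut runs: Vec<(u32, u32)> = Vec::new();
    for &c in codes {
        if let Some(last) = runs.last_mut() {
            if last.0 == c {
                last.1 += 1;
                continue;
            }
        }
        runs.push((c, 1));
    }
    runs
}

/// Undo both layers: expand the runs, then look each code up in the dictionary.
fn decode(dict: &[String], runs: &[(u32, u32)]) -> Vec<String> {
    runs.iter()
        .flat_map(|&(code, n)| std::iter::repeat(code).take(n as usize))
        .map(|code| dict[code as usize].clone())
        .collect()
}

/// A "kernel" that works on the encoded data: count rows equal to `needle`
/// by summing run lengths, without materializing any decoded rows.
fn count_eq(dict: &[String], runs: &[(u32, u32)], needle: &str) -> u64 {
    match dict.iter().position(|d| d.as_str() == needle) {
        Some(code) => runs
            .iter()
            .filter(|run| run.0 as usize == code)
            .map(|run| run.1 as u64)
            .sum::<u64>(),
        None => 0,
    }
}
```

For the input `["a", "a", "a", "b", "b", "a"]`, the dictionary is `["a", "b"]`, the codes are `[0, 0, 0, 1, 1, 0]`, and the runs are `[(0, 3), (1, 2), (0, 1)]`; `count_eq` for `"a"` visits three runs instead of six rows.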
Vortex strictly separates logical and physical concerns:
- Logical Layer: Defines data types and schema
- Physical Layer: Handles encoding and storage implementation
  - Built-in Encodings: Compatible with Apache Arrow's memory format
  - Extension Encodings: Optimized compression schemes (RLE, dictionary, etc.)
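The separation described above can be sketched in plain Rust. This is an illustration of the idea only, not Vortex's actual types: `LogicalType`, `Encoding`, `Plain`, and `RunLength` are hypothetical names. Two physical encodings expose the same logical type and the same values, and one of them overrides a computation with a version that works on its encoded form.

```rust
/// Logical layer: what the values mean, independent of how they are stored.
/// (Hypothetical type with a single variant to keep the sketch short.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LogicalType {
    Int64,
}

/// Physical layer: every encoding reports its logical type and can
/// materialize its values in a plain, uncompressed form.
trait Encoding {
    fn logical_type(&self) -> LogicalType;
    fn len(&self) -> usize;
    fn to_plain(&self) -> Vec<i64>;

    /// Default implementation decodes first; an encoding may override this
    /// with a faster version that operates on the encoded data.
    fn sum(&self) -> i64 {
        self.to_plain().iter().sum()
    }
}

/// A plain, uncompressed layout.
struct Plain(Vec<i64>);

/// A run-length layout: each pair is (value, run length).
struct RunLength {
    runs: Vec<(i64, usize)>,
}

impl Encoding for Plain {
    fn logical_type(&self) -> LogicalType {
        LogicalType::Int64
    }
    fn len(&self) -> usize {
        self.0.len()
    }
    fn to_plain(&self) -> Vec<i64> {
        self.0.clone()
    }
}

impl Encoding for RunLength {
    fn logical_type(&self) -> LogicalType {
        LogicalType::Int64
    }
    fn len(&self) -> usize {
        self.runs.iter().map(|&(_, n)| n).sum()
    }
    fn to_plain(&self) -> Vec<i64> {
        self.runs
            .iter()
            .flat_map(|&(v, n)| std::iter::repeat(v).take(n))
            .collect()
    }
    /// One multiplication per run instead of one addition per row.
    fn sum(&self) -> i64 {
        self.runs.iter().map(|&(v, n)| v * n as i64).sum()
    }
}

/// Callers depend only on the trait, so the physical layout can change
/// without touching this code.
fn total(columns: &[Box<dyn Encoding>]) -> i64 {
    columns.iter().map(|c| c.sum()).sum()
}
```

A `Plain(vec![7, 7, 7, 2, 2])` and a `RunLength { runs: vec![(7, 3), (2, 2)] }` describe the same logical column, so both report `LogicalType::Int64` and both sum to 25; only the amount of work differs.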
All features are exported through the main vortex crate.
# Rust
cargo add vortex
# Python
uv add vortex-array
For browsing the structure of Vortex files, you can use the vx command-line tool.
# Install latest release
cargo install vortex-tui --locked
# Or build from source
cargo install --path vortex-tui --locked
# Usage
vx browse <file>
# Optional but recommended dependencies
brew install flatbuffers protobuf # For .fbs and .proto files
brew install duckdb # For benchmarks
# Install Rust toolchain
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
# or
brew install rustup
# Initialize submodules
git submodule update --init --recursive
# Setup dependencies with uv
uv sync --all-packages
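For optimal performance, set MiMalloc as the application's global allocator:
use mimalloc::MiMalloc; // requires the mimalloc crate as a dependency
#[global_allocator]
static GLOBAL_ALLOC: MiMalloc = MiMalloc;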
Licensed under the Apache License, Version 2.0
Vortex is committed to remaining open-source, following governance models inspired by the Substrait project and Apache Software Foundation.
See CONTRIBUTING.md for guidelines.
This project builds upon groundbreaking work from the academic and open-source communities:
- BtrBlocks - Efficient columnar compression
- FastLanes - High-performance integer compression
- FSST - Fast random access string compression
- ALP - Adaptive lossless floating-point compression
- Procella - YouTube's unified data system
- Cloud Object Storage Analytics - High-performance analytics
- ClickHouse - Fast analytics for everyone
- Apache Arrow & Apache DataFusion
- parquet2 by Jorge Leitao
- DuckDB
- Velox & Nimble
Thanks to all contributors who have shared their knowledge and code with the community!