Open-Source Projects
Discover top open-source software, updated regularly with real-world adoption signals.

Blazing-fast database replication to Apache Iceberg tables
OLake replicates PostgreSQL, MySQL, MongoDB, and Oracle to Apache Iceberg at high throughput with CDC support and no Spark or Flink required. Kafka source support is in development.

OLake is a high-performance connector designed for data engineers who need to replicate transactional databases into Apache Iceberg without the overhead of traditional streaming infrastructure. It supports PostgreSQL, MySQL, MongoDB, and Oracle sources with full load, CDC (change data capture), and incremental sync modes, though the available modes vary by source.
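To make the difference between the three sync modes concrete, here is a minimal in-memory sketch in Python. It is not OLake's implementation or API: the function names `full_load`, `incremental_sync`, and `apply_cdc`, the `id` primary key, the `updated_at` cursor column, and the `(log_position, op, key, row)` event layout are all hypothetical. In practice, CDC reads change events from the database's own log (for example the PostgreSQL write-ahead log or the MySQL binlog) rather than from a Python list.

```python
from typing import Any, Iterable, Optional

Row = dict[str, Any]    # one record, keyed by column name
Table = dict[Any, Row]  # hypothetical target table: primary key -> row


def full_load(source_rows: Iterable[Row], pk: str = "id") -> Table:
    """Full load: snapshot every source row into a fresh target table."""
    return {row[pk]: dict(row) for row in source_rows}


def incremental_sync(source_rows: Iterable[Row], target: Table, last_cursor: Any,
                     cursor_col: str = "updated_at", pk: str = "id") -> Any:
    """Incremental: copy only rows whose cursor column exceeds the last synced
    value, and return the new high-water mark for the next run.
    Rows deleted in the source are NOT detected by this mode."""
    new_cursor = last_cursor
    for row in source_rows:
        if row[cursor_col] > last_cursor:
            target[row[pk]] = dict(row)
            new_cursor = max(new_cursor, row[cursor_col])
    return new_cursor


def apply_cdc(target: Table,
              events: Iterable[tuple[int, str, Any, Optional[Row]]]) -> Table:
    """CDC: replay change events (log_position, op, key, row) in log order,
    so inserts, updates, and deletes all reach the target."""
    for _pos, op, key, row in sorted(events, key=lambda e: e[0]):
        if op in ("insert", "update"):
            target[key] = dict(row)
        elif op == "delete":
            target.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return target
```

The sketch shows the usual trade-off: incremental sync is cheap but cannot see deletes, while CDC replays every change in log order so the target can mirror the source exactly.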
The project's benchmarks report 235K RPS for PostgreSQL full loads (15.9× faster than Debezium) and 64K RPS for MySQL, achieved on minimal infrastructure. OLake eliminates dependencies on Spark, Flink, Kafka, and Debezium, reducing operational complexity and cost. The self-serve web UI and Docker Compose deployment let teams configure and launch pipelines in minutes.
OLake writes directly to Apache Iceberg tables and supports Glue, Hive, JDBC, and REST catalogs (including Nessie, Polaris, Unity Catalog, Lakekeeper, and AWS S3 Tables). It also outputs Parquet to filesystems, with Delta Lake and Hudi support planned. Advanced users can use the CLI for automation and orchestration with Airflow or Kubernetes.
OLTP to Iceberg Migration
Replicate PostgreSQL or MySQL transactional databases to Iceberg tables without deploying Spark or Flink, reducing infrastructure cost and complexity.
Real-Time BI on CDC Data
Stream change data capture events into Iceberg and query fresh data with Athena, Trino, Presto, or Snowflake for near real-time analytics.
Cost-Efficient Data Lakehouse
Build a lakehouse on S3, ADLS, or GCS with high-throughput ingestion, leveraging Iceberg's open format for multi-engine access.
Self-Service Data Pipelines
Enable analysts and engineers to configure and launch replication jobs via the web UI, accelerating time-to-insight without custom code.
OLake supports PostgreSQL, MySQL, MongoDB (full load and CDC), and Oracle (full load and incremental). Kafka source support is in development.
OLake does not require Spark, Flink, Kafka, or Debezium. It is infrastructure-light, which reduces operational overhead and cost.
OLake supports AWS Glue, Hive, JDBC, and REST catalogs, including Nessie, Polaris, Unity Catalog, Lakekeeper, and AWS S3 Tables.
Use Docker Compose for a quickstart with the web UI, or deploy with Helm on Kubernetes, as standalone Docker containers, or through Airflow on EC2 or Kubernetes.
OLake achieves 235K RPS for PostgreSQL full loads (15.9× faster than Debezium) and 64K RPS for MySQL (9× faster than Airbyte). Fully reproducible reports are forthcoming.
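The quoted speedup ratios can be turned back into implied baseline throughputs with simple arithmetic. The sketch below uses only the figures stated above; the Debezium and Airbyte rates it prints are back-calculations from rounded numbers, not measurements, and the 1-billion-row table is a hypothetical size chosen for illustration. It also assumes a constant rate for the whole load, which real pipelines rarely sustain.

```python
# Figures quoted in the project description (rounded, e.g. "235K").
OLAKE_POSTGRES_RPS = 235_000
OLAKE_MYSQL_RPS = 64_000
SPEEDUP_VS_DEBEZIUM = 15.9  # PostgreSQL full load
SPEEDUP_VS_AIRBYTE = 9.0    # MySQL


def implied_baseline_rps(olake_rps: float, speedup: float) -> float:
    """Back-calculate a competitor's throughput from a quoted speedup ratio."""
    return olake_rps / speedup


def hours_to_load(rows: int, rps: float) -> float:
    """Wall-clock hours to move `rows` records at a constant `rps`."""
    return rows / rps / 3600


if __name__ == "__main__":
    ROWS = 1_000_000_000  # hypothetical table size
    debezium = implied_baseline_rps(OLAKE_POSTGRES_RPS, SPEEDUP_VS_DEBEZIUM)
    airbyte = implied_baseline_rps(OLAKE_MYSQL_RPS, SPEEDUP_VS_AIRBYTE)
    print(f"Implied Debezium rate: {debezium:,.0f} RPS")
    print(f"Implied Airbyte rate:  {airbyte:,.0f} RPS")
    print(f"1B rows, OLake (PostgreSQL): {hours_to_load(ROWS, OLAKE_POSTGRES_RPS):.1f} h")
    print(f"1B rows, implied Debezium:   {hours_to_load(ROWS, debezium):.1f} h")
```

Run as a script, this gives an implied Debezium rate of about 14,780 RPS and an implied Airbyte rate of about 7,111 RPS, and puts a 1-billion-row PostgreSQL full load at roughly 1.2 hours versus about 18.8 hours for the implied Debezium rate.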