OLake

Blazing-fast database replication to Apache Iceberg tables

OLake replicates PostgreSQL, MySQL, MongoDB, and Oracle to Apache Iceberg at high throughput with CDC support, with no Spark or Flink required. A Kafka source is in development.

Overview

Fast, Infrastructure-Light Data Replication

OLake is a high-performance connector designed for data engineers who need to replicate transactional databases into Apache Iceberg without the overhead of traditional streaming infrastructure. It supports PostgreSQL, MySQL, MongoDB, and Oracle sources and offers full load, CDC (change data capture), and incremental sync modes; mode availability varies by source.

Built for Speed and Simplicity

With benchmarks showing 235K RPS for PostgreSQL full loads (15.9× faster than Debezium) and 64K RPS for MySQL, OLake delivers high throughput on minimal infrastructure. It removes the dependency on Spark, Flink, Kafka, and Debezium, which reduces operational complexity and cost. The self-serve web UI and Docker Compose deployment let teams configure and launch pipelines in minutes.

Iceberg-Native Architecture

OLake writes directly to Apache Iceberg tables and supports Glue, Hive, JDBC, and REST catalogs (including Nessie, Polaris, Unity Catalog, and AWS S3 Tables). It also outputs Parquet to filesystems, with Delta Lake and Hudi support planned. Advanced users can leverage the CLI for automation and orchestration with Airflow or Kubernetes.

Highlights

235K RPS PostgreSQL throughput, 15.9× faster than Debezium in benchmarks

Full load, CDC, and incremental sync for PostgreSQL, MySQL, MongoDB, Oracle

Native Apache Iceberg writer supporting Glue, Hive, JDBC, and REST catalogs

Self-serve web UI and CLI with Docker Compose, Kubernetes, and Airflow deployment

Pros

Exceptional throughput with minimal infrastructure: no Spark, Flink, or Kafka required
Automatic schema discovery and CDC replication simplify pipeline setup
Supports multiple Iceberg catalogs and cloud storage backends (S3, ADLS, GCS, MinIO)
Docker Compose quickstart and web UI lower the barrier to entry

Considerations

MongoDB and Oracle CDC support marked as work-in-progress
Kafka source and pgoutput for PostgreSQL still under development
Delta Lake and Hudi destinations planned but not yet available
Benchmark reproducibility and full reports pending publication

Fit guide

Great for

Data teams migrating OLTP databases to Iceberg without heavy streaming infrastructure
Organizations requiring high-throughput CDC replication on cost-efficient object storage
Teams using Athena, Trino, Presto, Dremio, Databricks, or Snowflake for BI on Iceberg
Engineers seeking self-serve UI and CLI automation for data lakehouse pipelines

Not ideal when

Projects requiring production-ready Kafka source or pgoutput CDC for PostgreSQL today
Teams needing Delta Lake or Hudi destinations in the immediate term
Use cases demanding fully published, reproducible benchmark validation before adoption
Environments where MongoDB or Oracle CDC must be production-stable now

How teams use it

OLTP to Iceberg Migration

Replicate PostgreSQL or MySQL transactional databases to Iceberg tables without deploying Spark or Flink, reducing infrastructure cost and complexity.

Real-Time BI on CDC Data

Stream change data capture events into Iceberg and query fresh data with Athena, Trino, Presto, or Snowflake for near real-time analytics.

Cost-Efficient Data Lakehouse

Build a lakehouse on S3, ADLS, or GCS with high-throughput ingestion, leveraging Iceberg's open format for multi-engine access.

Self-Service Data Pipelines

Enable analysts and engineers to configure and launch replication jobs via the web UI, accelerating time-to-insight without custom code.

Tech snapshot

Go: 86%
Java: 11%
Shell: 2%
Dockerfile: 1%
Makefile: 1%

Frequently asked questions

What databases does OLake support as sources?

OLake supports PostgreSQL, MySQL, MongoDB (full load and CDC), and Oracle (full load and incremental). Kafka source support is in development.

Does OLake require Spark, Flink, or Kafka?

No. OLake is infrastructure-light and does not depend on Spark, Flink, Kafka, or Debezium, reducing operational overhead and cost.

Which Iceberg catalogs are supported?

OLake supports AWS Glue, Hive, JDBC, and REST catalogs, including Nessie, Polaris, Unity Catalog, Lakekeeper, and AWS S3 Tables.

How do I deploy OLake?

Use Docker Compose for a quickstart with the web UI. Other options are Kubernetes with Helm, standalone Docker, and Airflow on EC2 or Kubernetes.

What are the benchmark results?

OLake achieves 235K RPS for PostgreSQL full loads (15.9× faster than Debezium) and 64K RPS for MySQL (9× faster than Airbyte). Fully reproducible reports are forthcoming.
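
To put these figures in perspective, the short Python sketch below works out the baseline throughput each speedup claim implies and how long a full load would take at each rate. It rests on assumptions that are not stated in the listing: RPS is read as rows per second, "N× faster" is read as a throughput ratio of N, and the one-billion-row table size is a hypothetical value chosen only for illustration. The function names are likewise illustrative and are not part of OLake.

```python
# Back-of-the-envelope check of the benchmark figures quoted above.
# Assumptions (not from OLake's docs): RPS means rows per second, and
# "N× faster" means a throughput ratio of N.


def implied_baseline_rps(rps: float, speedup: float) -> float:
    """Throughput of the comparison tool implied by a speedup claim."""
    return rps / speedup


def full_load_seconds(rows: int, rps: float) -> float:
    """Time to copy `rows` rows at a constant `rps` (ignores ramp-up)."""
    return rows / rps


# (quoted OLake throughput, quoted speedup), taken from the answer above.
CLAIMS = {
    "PostgreSQL vs Debezium": (235_000, 15.9),
    "MySQL vs Airbyte": (64_000, 9.0),
}

ROWS = 1_000_000_000  # hypothetical table size, for illustration only

if __name__ == "__main__":
    for name, (rps, speedup) in CLAIMS.items():
        baseline = implied_baseline_rps(rps, speedup)
        fast_min = full_load_seconds(ROWS, rps) / 60
        slow_min = full_load_seconds(ROWS, baseline) / 60
        print(f"{name}: implied baseline ~{baseline:,.0f} RPS; "
              f"{ROWS:,} rows in ~{fast_min:,.0f} min vs ~{slow_min:,.0f} min")
```

Under these assumptions the quoted numbers imply roughly 14.8K RPS for Debezium and 7.1K RPS for Airbyte, so a one-billion-row PostgreSQL full load would take about 71 minutes at the quoted rate versus nearly 19 hours at the implied baseline. Since the full reports are still pending, treat this as arithmetic on the published claims, not as an independent measurement.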

Project at a glance

Active


Stars: 1,410
Watchers: 1,410
Forks: 239

License: Apache-2.0
Repo age: 1 year old
Last commit: 14 hours ago
Primary language: Go

Last synced 4 hours ago