BemiDB

Postgres-compatible analytical database with built-in data sync connectors

BemiDB combines data ingestion and analytical querying in one Docker image. It syncs data from Postgres, Amplitude, or Attio, stores it as compressed columnar data in S3, and, according to the project, runs complex queries 2000x faster than regular Postgres.

Overview

What is BemiDB?

BemiDB is an analytical database that bundles data ingestion and query capabilities into a single deployment. It connects to operational databases and SaaS platforms, syncs data to S3-compatible object storage in compressed columnar format, and exposes a Postgres-compatible query interface for analytics.

Who Should Use It?

BemiDB suits engineering teams that need to centralize data from multiple sources without managing separate ETL pipelines and data warehouses. Because it speaks the Postgres wire protocol, existing BI tools, notebooks, and ORMs can connect without modification.

Core Capabilities

BemiDB ships with built-in connectors for Postgres, Amplitude, and Attio. Data is stored separately from compute in an open table format with 4x compression. The analytical query engine is reported to run complex queries 2000x faster than regular Postgres. All components run as stateless processes in a single Docker image, which simplifies deployment and horizontal scaling.

Deployment Model

BemiDB requires a Postgres database for catalog metadata and S3-compatible object storage (AWS S3, MinIO, etc.). Syncer processes pull data from sources on demand, while the server process handles query execution. The two run independently and can be scaled separately.

Highlights

Analytical query engine 2000x faster than regular Postgres for complex queries

Built-in connectors for Postgres, Amplitude, and Attio with table-level filtering

Compressed columnar storage in S3 with 4x compression using open table format

Single Docker image deployment with stateless, independently scalable processes

Pros

Postgres wire protocol compatibility works with existing tools and ORMs
Simplified deployment with single Docker image and minimal infrastructure dependencies
Storage and compute separation enables independent scaling and cost optimization
Open table format prevents vendor lock-in for stored data

Considerations

Requires external Postgres database for catalog metadata management
Limited connector ecosystem compared to established ETL platforms
Performance benchmarks lack detailed methodology and comparison context
Early-stage project with evolving feature set and potential breaking changes

Managed products teams compare with

When teams consider BemiDB, these hosted platforms usually appear on the same shortlist.

Amazon Redshift

Fully managed, petabyte-scale cloud data warehouse for analytics and reporting

Azure Synapse Analytics

Limitless analytics platform unifying enterprise data warehousing and big data analytics in a single service

Google BigQuery

Serverless, highly scalable cloud data warehouse


Fit guide

Great for

Teams consolidating data from operational databases and SaaS tools for analytics
Organizations with existing Postgres tooling investments seeking analytical performance
Use cases requiring historical data archival with query access
Environments needing S3-compatible storage with compute flexibility

Not ideal when

You need real-time streaming analytics with sub-second data freshness
Your workloads need connectors beyond Postgres, Amplitude, and Attio
Your team requires enterprise support contracts and SLAs
Your workloads are transactional, with frequent writes and updates

How teams use it

Centralized analytics without ETL complexity

Query data from multiple Postgres databases and SaaS platforms through a single endpoint without building custom pipelines

Historical data archival and analysis

Offload old records from production databases to compressed S3 storage while maintaining query access for compliance and reporting

BI tool integration with analytical performance

Connect existing Postgres-compatible dashboards and notebooks to run complex aggregations without impacting production databases

Cost-optimized data warehousing

Store large datasets in S3 with 4x compression and scale query compute independently based on workload demands

Tech snapshot

Go 97%

Shell 2%

Makefile 1%

Dockerfile 1%

Frequently asked questions

What data sources does BemiDB support?

BemiDB currently includes built-in connectors for Postgres databases, Amplitude analytics, and Attio CRM. The Postgres connector supports table-level inclusion and exclusion filters.
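
As a rough illustration of what table-level inclusion and exclusion filters mean in practice, here is a minimal Python sketch. The function name `select_tables`, its parameters, and the table names are hypothetical; this is not BemiDB's implementation or its configuration syntax, only one plausible reading of how such filters behave.

```python
def select_tables(available, include=None, exclude=None):
    """Return the tables to sync, preserving the order of `available`.

    Hypothetical semantics, assumed for illustration only:
    - if `include` is given, only the listed tables are kept;
    - any table listed in `exclude` is then dropped.
    """
    include_set = set(include) if include else None
    exclude_set = set(exclude or [])
    selected = []
    for table in available:
        if include_set is not None and table not in include_set:
            continue  # not on the inclusion list
        if table in exclude_set:
            continue  # explicitly excluded
        selected.append(table)
    return selected


# Hypothetical source tables.
tables = ["public.users", "public.orders", "public.audit_log"]

print(select_tables(tables, exclude=["public.audit_log"]))
# ['public.users', 'public.orders']
print(select_tables(tables, include=["public.orders"]))
# ['public.orders']
```

Check the project's own documentation for the actual option names and for how inclusion and exclusion interact when both are set.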

How does BemiDB achieve 2000x faster query performance?

BemiDB uses an analytical query engine optimized for columnar data stored in S3, separating storage from compute. This architecture is designed for read-heavy analytical workloads rather than transactional operations.
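
To see why a columnar layout tends to favor analytical queries, consider the generic Python sketch below. It is not BemiDB's engine and does not reproduce the 2000x figure; it only counts how many values an aggregation has to touch in a row-oriented layout versus a column-oriented one. The sample data and function names are made up for illustration.

```python
# Three hypothetical order records in a row-oriented layout.
rows = [
    {"id": 1, "country": "US", "amount": 120.0, "note": "first order"},
    {"id": 2, "country": "DE", "amount": 80.5, "note": "repeat customer"},
    {"id": 3, "country": "US", "amount": 45.25, "note": "discount applied"},
]


def to_columns(rows):
    """Pivot row records into one list of values per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def sum_row_store(rows, field):
    """Sum one field; every row is loaded in full to reach it."""
    total, values_read = 0.0, 0
    for row in rows:
        values_read += len(row)  # all fields of the row are read
        total += row[field]
    return total, values_read


def sum_column_store(columns, field):
    """Sum one field; only that column is read."""
    column = columns[field]
    return sum(column), len(column)


print(sum_row_store(rows, "amount"))                 # (245.75, 12)
print(sum_column_store(to_columns(rows), "amount"))  # (245.75, 3)
```

Both layouts give the same total, but the columnar one reads 3 values instead of 12. On wide tables with many rows that gap grows, and values of the same type stored together also tend to compress better.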

What infrastructure does BemiDB require?

You need a Postgres database for catalog metadata, S3-compatible object storage (AWS S3, MinIO, etc.), and a container runtime such as Docker. Syncer and server processes run as separate stateless containers.

Can I use BemiDB with tools like Tableau or Metabase?

Yes. BemiDB exposes a Postgres-compatible wire protocol, so any tool or library that connects to Postgres can query BemiDB without modification.
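
In practice this means a client needs nothing more than a standard Postgres connection URL. The Python sketch below builds one with the standard library. The host, user, password, port 54321, and database name `bemidb` are assumptions for illustration; use the values from your own deployment.

```python
from urllib.parse import quote


def bemidb_dsn(host, port, database, user, password=""):
    """Build a postgresql:// connection URL, percent-encoding credentials."""
    credentials = quote(user, safe="")
    if password:
        credentials += ":" + quote(password, safe="")
    return f"postgresql://{credentials}@{host}:{port}/{quote(database, safe='')}"


# Hypothetical connection details; replace with your deployment's values.
dsn = bemidb_dsn("localhost", 54321, "bemidb", "analyst", "s3cr3t!")
print(dsn)
# postgresql://analyst:s3cr3t%21@localhost:54321/bemidb

# Any Postgres driver or BI tool should accept this URL, for example
# (third-party package, not imported here):
#   import psycopg
#   with psycopg.connect(dsn) as conn:
#       print(conn.execute("SELECT 1").fetchone())
```

The same URL can be pasted into a dashboard tool's Postgres connection form or passed to an ORM's database setting.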

How is data compressed and stored?

BemiDB stores data in an open columnar table format in S3 with 4x compression. The catalog Postgres database tracks metadata while actual data resides in object storage.
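
To make the 4x figure concrete, the following Python sketch estimates the stored size and monthly storage cost for a given raw data volume. The 4x ratio comes from the description above; the raw size and the price per GB are hypothetical inputs, and real compression depends heavily on the data.

```python
def compressed_size_gb(raw_gb, ratio=4.0):
    """Estimated size in object storage for a given compression ratio."""
    if ratio <= 0:
        raise ValueError("compression ratio must be positive")
    return raw_gb / ratio


def monthly_storage_cost(raw_gb, price_per_gb_month, ratio=4.0):
    """Estimated monthly storage cost after compression."""
    return compressed_size_gb(raw_gb, ratio) * price_per_gb_month


raw_gb = 2000.0   # hypothetical: 2 TB of uncompressed source data
price = 0.02      # hypothetical price per GB-month; check your provider

print(compressed_size_gb(raw_gb))  # 500.0
print(round(monthly_storage_cost(raw_gb, price), 2))
```

This covers storage only. Request charges, data transfer, and query compute are billed separately and are not modeled here.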

Project at a glance

Active

Stars: 1,520
Watchers: 1,520
Forks: 43

License: AGPL-3.0

Repo age: 1 year old

Last commit: 2 months ago

Primary language: Go

Last synced 5 hours ago