ByConity

Cloud-native data warehouse with compute-storage separation for large-scale analytics

ByConity is a cloud-native data warehouse derived from ClickHouse, featuring compute-storage separation, advanced query optimization, and unified batch and streaming data ingestion for high-performance analytics.

Overview

Modern Cloud Data Warehouse Architecture

ByConity is a database management system built on ClickHouse v21.8 and rearchitected around compute-storage separation, an approach inspired by Snowflake's architecture. It targets organizations with large-scale analytical workloads, delivering high-performance queries while breaking down data silos through unified batch and streaming ingestion.

Key Capabilities

The platform introduces stateless workers, a sophisticated query optimizer, and a shared-storage framework that enables independent scaling of compute and storage resources. This architecture allows teams to extract insights from vast datasets quickly and accurately, without maintaining separate processes for different data ingestion patterns.
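
To make the idea of stateless workers over shared storage concrete, here is a minimal toy sketch. It is not ByConity's implementation: the names `SharedStorage`, `StatelessWorker`, and `run_query` are hypothetical, and an in-memory dictionary stands in for the shared storage layer. It only illustrates that workers holding no data of their own can be added or removed without changing the query result.

```python
from dataclasses import dataclass, field


@dataclass
class SharedStorage:
    """Hypothetical stand-in for a shared storage layer; holds all data parts."""
    parts: dict[str, list[int]] = field(default_factory=dict)

    def write_part(self, name: str, rows: list[int]) -> None:
        self.parts[name] = list(rows)

    def read_part(self, name: str) -> list[int]:
        return self.parts[name]


class StatelessWorker:
    """Keeps no table data locally; every read goes to shared storage."""

    def __init__(self, storage: SharedStorage) -> None:
        self.storage = storage

    def sum_parts(self, part_names: list[str]) -> int:
        return sum(sum(self.storage.read_part(p)) for p in part_names)


def run_query(storage: SharedStorage, worker_count: int) -> int:
    """Split the parts across `worker_count` workers and combine partial sums."""
    workers = [StatelessWorker(storage) for _ in range(worker_count)]
    names = sorted(storage.parts)
    # Round-robin assignment: worker i takes parts i, i+n, i+2n, ...
    assignments = [names[i::worker_count] for i in range(worker_count)]
    return sum(w.sum_parts(a) for w, a in zip(workers, assignments))


storage = SharedStorage()
storage.write_part("part_0", [1, 2, 3])
storage.write_part("part_1", [4, 5])
storage.write_part("part_2", [6])

# Compute scales independently of storage: the answer does not depend
# on how many workers are attached.
total_one_worker = run_query(storage, worker_count=1)
total_three_workers = run_query(storage, worker_count=3)
```

Because the workers carry no state, changing `worker_count` only changes how the parts are divided, not where the data lives.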

Deployment Flexibility

ByConity's cloud-native design runs on both Kubernetes and physical clusters, so deployment can match your existing infrastructure. It is written in C++ and released under the Apache 2.0 license, combining enterprise-grade performance with the extensibility and community collaboration of open-source software. The system requires the FoundationDB client library, and Docker Compose can be used to bring up a cluster conveniently.

Highlights

Compute-storage separation architecture for independent resource scaling

Advanced query optimizer delivering fast analytics on large-scale datasets

Unified ingestion for both batch-loaded and streaming data sources

Cloud-native design supporting Kubernetes and physical cluster deployments

Pros

High-performance querying capabilities optimized for large-scale data
Eliminates data silos by handling batch and streaming data uniformly
Flexible deployment across Kubernetes and physical infrastructure
Built on proven ClickHouse foundation with architectural innovations

Considerations

Requires the FoundationDB client library at runtime
Architectural divergence from ClickHouse may limit upstream compatibility
Relatively new independent project with a smaller community than its parent, ClickHouse
Docker-based build environment may add complexity for local development

Managed products teams compare it with

When teams consider ByConity, these hosted platforms usually appear on the same shortlist.

Amazon Redshift

Fully managed, petabyte-scale cloud data warehouse for analytics and reporting

Azure Synapse Analytics

Limitless analytics platform unifying enterprise data warehousing and big data analytics in a single service

Google BigQuery

Serverless, highly scalable cloud data warehouse


Fit guide

Great for

Organizations running large-scale analytical workloads in cloud environments
Teams needing unified batch and streaming data processing pipelines
Enterprises requiring independent compute and storage scaling
Data platforms seeking ClickHouse performance with cloud-native architecture

Not ideal when

Small-scale deployments where simpler databases suffice
Teams requiring full ClickHouse upstream compatibility and ecosystem
Projects avoiding FoundationDB infrastructure dependencies
Organizations needing mature, long-established community support

How teams use it

Real-Time Analytics on Streaming Data

Ingest and query streaming events alongside historical batch data without maintaining separate systems, enabling unified analytics across all data sources.

Elastic Cloud Data Warehouse

Scale compute resources independently during peak query loads while maintaining cost-efficient storage, optimizing infrastructure spend for variable workloads.

Breaking Down Enterprise Data Silos

Consolidate isolated batch and streaming data sources into a single queryable platform, improving cross-functional insights and reducing operational complexity.

Large-Scale Log Analytics

Query petabyte-scale log datasets with sub-second response times using advanced query optimization and distributed compute architecture.

Tech snapshot

C++: 87%
Python: 5%
Assembly: 5%
Shell: 1%
CMake: 1%
C: 1%

Frequently asked questions

How does ByConity differ from ClickHouse?

ByConity builds on ClickHouse v21.8 but introduces compute-storage separation, an advanced query optimizer, stateless workers, and shared-storage architecture inspired by Snowflake. These architectural changes are substantial enough that integration into upstream ClickHouse was not feasible.

What infrastructure dependencies does ByConity require?

ByConity requires the FoundationDB client library (libfdb_c.so) to run. It can be deployed on Kubernetes clusters or physical machines, with Docker Compose recommended for convenient cluster deployment.
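
As a quick pre-flight check, the sketch below uses Python's standard `ctypes` module to test whether the FoundationDB C client library can be located and loaded on the current machine. The helper name `fdb_client_available` is hypothetical and this is not part of ByConity's tooling. `ctypes.util.find_library` searches the platform's usual library locations, so a library installed in a non-standard path may be reported as missing.

```python
import ctypes
import ctypes.util


def fdb_client_available(lib_name: str = "fdb_c") -> bool:
    """Return True if the named shared library can be located and loaded.

    With the default name "fdb_c" this looks for libfdb_c.so on Linux.
    """
    path = ctypes.util.find_library(lib_name)
    if path is None:
        # The loader could not find the library in its search locations.
        return False
    try:
        ctypes.CDLL(path)
    except OSError:
        # Found, but not loadable (wrong architecture, missing dependencies, ...).
        return False
    return True
```

Calling `fdb_client_available()` before starting a deployment gives an early, readable failure instead of a loader error at startup.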

Can ByConity handle both batch and streaming data?

Yes, ByConity seamlessly ingests both batch-loaded data and streaming data, eliminating the need for separate processing pipelines and helping break down data silos for unified analytics.

Is ByConity truly cloud-native?

Yes, ByConity is designed with a cloud-native approach featuring compute-storage separation, stateless workers, and support for Kubernetes deployments, allowing it to leverage cloud scalability and resilience while also supporting physical clusters.

What license does ByConity use?

ByConity is released under the Apache 2.0 license, making it open source and available for community collaboration, contribution, and customization.

Project at a glance

Active


Stars: 2,237
Watchers: 2,237
Forks: 312

License: Apache-2.0
Repo age: 3 years old
Last commit: last month
Primary language: C++
