ByConity

Cloud-native data warehouse with compute-storage separation for large-scale analytics

ByConity is a cloud-native data warehouse derived from ClickHouse, featuring compute-storage separation, advanced query optimization, and unified batch and streaming data ingestion for high-performance analytics.

Overview

Modern Cloud Data Warehouse Architecture

ByConity is a database management system built on ClickHouse v21.8 and reworked around compute-storage separation, an approach inspired by Snowflake's architecture. It targets organizations running large-scale analytical workloads, and it ingests batch and streaming data through one unified path so the two no longer sit in separate silos.

Key Capabilities

The platform introduces stateless workers, a sophisticated query optimizer, and a shared-storage framework that enables independent scaling of compute and storage resources. This architecture allows teams to extract insights from vast datasets quickly and accurately, without maintaining separate processes for different data ingestion patterns.
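As a rough illustration of the idea, the toy model below shows why stateless workers over shared storage can be scaled independently. It is a sketch only, not ByConity's implementation: `SharedStorage`, `StatelessWorker`, and `run_query` are hypothetical names invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class SharedStorage:
    """Hypothetical stand-in for the shared storage layer; holds all data parts."""
    parts: dict = field(default_factory=dict)  # part_id -> list of numeric rows

    def read(self, part_id):
        return self.parts[part_id]


class StatelessWorker:
    """Keeps no data of its own; every scan reads from shared storage."""

    def __init__(self, storage):
        self.storage = storage

    def sum_part(self, part_id):
        return sum(self.storage.read(part_id))


def run_query(storage, num_workers):
    """Sum all rows, assigning parts to workers round-robin."""
    workers = [StatelessWorker(storage) for _ in range(num_workers)]
    part_ids = sorted(storage.parts)
    partials = [
        workers[i % num_workers].sum_part(part_id)
        for i, part_id in enumerate(part_ids)
    ]
    return sum(partials)


storage = SharedStorage({"p0": [1, 2, 3], "p1": [4, 5], "p2": [6]})

# Changing the worker count changes only how the work is divided;
# the stored data and the query result stay the same.
small_cluster = run_query(storage, num_workers=1)
large_cluster = run_query(storage, num_workers=3)
```

Because no worker owns any data, adding or removing workers requires no data movement in this model, which is the property the paragraph above describes.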

Deployment Flexibility

ByConity's cloud-native design runs on both Kubernetes and physical clusters, so deployment can match your infrastructure. It is written in C++ and released under the Apache 2.0 license, which keeps it open to extension and community collaboration. The system requires the FoundationDB client library, and Docker Compose can be used to deploy a cluster.

Highlights

Compute-storage separation architecture for independent resource scaling
Advanced query optimizer delivering fast analytics on large-scale datasets
Unified ingestion for both batch-loaded and streaming data sources
Cloud-native design supporting Kubernetes and physical cluster deployments

Pros

  • High-performance querying capabilities optimized for large-scale data
  • Eliminates data silos by handling batch and streaming data uniformly
  • Flexible deployment across Kubernetes and physical infrastructure
  • Built on a proven ClickHouse foundation with architectural innovations

Considerations

  • Requires the FoundationDB client library at runtime
  • Architectural divergence from ClickHouse may limit upstream compatibility
  • Relatively new independent project with a smaller community than its parent, ClickHouse
  • Docker-based build environment may add complexity for local development

Managed products teams compare with

When teams consider ByConity, these hosted platforms usually appear on the same shortlist.

Amazon Redshift

Fully managed, petabyte-scale cloud data warehouse for analytics and reporting

Azure Synapse Analytics

Limitless analytics platform unifying enterprise data warehousing and big data analytics in a single service

Google BigQuery

Serverless, highly scalable cloud data warehouse

Fit guide

Great for

  • Organizations running large-scale analytical workloads in cloud environments
  • Teams needing unified batch and streaming data processing pipelines
  • Enterprises requiring independent compute and storage scaling
  • Data platforms seeking ClickHouse performance with cloud-native architecture

Not ideal when

  • Small-scale deployments where simpler databases suffice
  • Teams requiring full compatibility with upstream ClickHouse and its ecosystem
  • Projects avoiding FoundationDB infrastructure dependencies
  • Organizations needing mature, long-established community support

How teams use it

Real-Time Analytics on Streaming Data

Ingest and query streaming events alongside historical batch data without maintaining separate systems, enabling unified analytics across all data sources.

Elastic Cloud Data Warehouse

Scale compute resources independently during peak query loads while maintaining cost-efficient storage, optimizing infrastructure spend for variable workloads.

Breaking Down Enterprise Data Silos

Consolidate isolated batch and streaming data sources into a single queryable platform, improving cross-functional insights and reducing operational complexity.

Large-Scale Log Analytics

Query petabyte-scale log datasets with sub-second response times using advanced query optimization and distributed compute architecture.

Tech snapshot

C++: 87%
Python: 5%
Assembly: 5%
Shell: 1%
CMake: 1%
C: 1%

Tags

snowflake, clickhouse, tiktok, olap, sql, cloud, s3, lakehouse, clickhouse-database, kubernets, bytedance

Frequently asked questions

How does ByConity differ from ClickHouse?

ByConity builds on ClickHouse v21.8 but introduces compute-storage separation, an advanced query optimizer, stateless workers, and shared-storage architecture inspired by Snowflake. These architectural changes are substantial enough that integration into upstream ClickHouse was not feasible.

What infrastructure dependencies does ByConity require?

ByConity requires the FoundationDB client library (libfdb_c.so) to run. It can be deployed on Kubernetes clusters or physical machines, with Docker Compose recommended for convenient cluster deployment.
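A quick pre-flight check for this dependency can be sketched with Python's standard `ctypes` module. This is a minimal example and not part of ByConity's tooling. The helper name `find_fdb_client` is hypothetical, and the check only asks whether the dynamic loader can find and load `libfdb_c`, not whether the version matches your FoundationDB cluster.

```python
import ctypes
import ctypes.util


def find_fdb_client(name="fdb_c"):
    """Return the library name or path the loader resolves for lib<name>, or None.

    Hypothetical helper: it confirms only that the shared library can be
    located and loaded on this machine.
    """
    candidate = ctypes.util.find_library(name)
    if candidate is None:
        return None
    try:
        ctypes.CDLL(candidate)  # attempt an actual load, not just a lookup
    except OSError:
        return None
    return candidate


if __name__ == "__main__":
    found = find_fdb_client()
    if found is None:
        print("libfdb_c not found; install the FoundationDB client library first")
    else:
        print("FoundationDB client library available:", found)
```

Running this on each host before starting the services gives an early, readable failure instead of a loader error at startup.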

Can ByConity handle both batch and streaming data?

Yes, ByConity seamlessly ingests both batch-loaded data and streaming data, eliminating the need for separate processing pipelines and helping break down data silos for unified analytics.

Is ByConity truly cloud-native?

Yes, ByConity is designed with a cloud-native approach featuring compute-storage separation, stateless workers, and support for Kubernetes deployments, allowing it to leverage cloud scalability and resilience while also supporting physical clusters.

What license does ByConity use?

ByConity is released under the Apache 2.0 license, making it open source and available for community collaboration, contribution, and customization.

Project at a glance

Status: Stable
Stars: 2,227
Watchers: 2,227
Forks: 312
License: Apache-2.0
Repo age: 3 years old
Last commit: 10 months ago
Primary language: C++