CubeFS

Cloud-native distributed file and object storage system

CNCF-graduated distributed storage platform offering POSIX, HDFS, and S3 protocols with scalable metadata, multi-tenancy, and hybrid cloud acceleration for modern data infrastructure.

Overview

Purpose and Audience

CubeFS is a cloud-native distributed file and object storage system for organizations building scalable data infrastructure. A CNCF graduated project, it targets enterprises that need a datacenter filesystem, data lake storage, or a hybrid cloud storage layer. It is particularly useful for teams running databases, search systems, and AI/ML workloads that benefit from separating storage from compute.

Core Capabilities

The platform exposes several access protocols (POSIX, HDFS, S3, and its own REST API), so it can plug into existing toolchains. Its highly scalable metadata service preserves strong consistency, and the system is optimized for both large and small files under sequential and random write patterns. CubeFS also supports multi-tenancy with isolation between tenants, storage policies ranging from high-performance replication to lower-cost erasure coding, and multi-level caching to accelerate I/O in hybrid cloud setups.

Deployment Context

CubeFS runs on-premises as datacenter infrastructure, in private or hybrid cloud environments, and can layer atop public cloud storage like S3 to provide filesystem semantics and cache acceleration. Built in Go and licensed under Apache-2.0, it's designed for Kubernetes-native deployments and large-scale container platforms.

Highlights

  • Multi-protocol access: POSIX, HDFS, S3, and REST API support
  • Scalable metadata service with strong consistency guarantees
  • Flexible storage policies: replication or erasure coding
  • Hybrid cloud acceleration with multi-level caching

Pros

  • CNCF graduated project with proven production maturity
  • Optimized for both large and small files and for varied I/O patterns
  • Multi-tenancy with strong isolation and high resource utilization
  • Separation of storage and compute for modern architectures

Considerations

  • The master branch may be unstable; production deployments should use tagged releases
  • Complexity inherent in distributed storage deployment
  • Learning curve for teams new to distributed filesystems
  • Requires careful capacity planning for metadata services

Managed products teams compare with

When teams consider CubeFS, these hosted platforms usually appear on the same shortlist.

Dropbox

Cloud file storage and sync for teams and individuals

Google Drive

Cloud file storage, sync, and sharing

MEGA

Encrypted cloud storage and file sharing

Looking for a hosted option? These are the services engineering teams benchmark against before choosing open source.

Fit guide

Great for

  • Organizations building data lakes or datacenter filesystems
  • AI/ML platforms requiring scalable storage/compute separation
  • Hybrid cloud deployments needing S3 cache acceleration
  • Multi-tenant environments with strict isolation requirements

Not ideal when

  • Small-scale deployments with simple storage needs
  • Teams lacking distributed systems operational expertise
  • Use cases requiring Windows-native filesystem integration
  • Projects that need to deploy to production directly from the master branch

How teams use it

Data Lake Storage Infrastructure

Scalable, multi-protocol storage foundation supporting analytics workloads with HDFS and S3 compatibility

AI/ML Training Pipeline

Decoupled storage and compute enabling elastic scaling of training jobs with POSIX filesystem semantics

Hybrid Cloud Acceleration

Multi-level caching layer over public cloud S3 reducing latency and egress costs for on-premises applications

Multi-Tenant SaaS Platform

Isolated storage namespaces with flexible policies balancing performance and cost per tenant

Tech snapshot

Go 98%
Shell 1%
Python 1%
Java 1%
C 1%
JavaScript 1%

Tags

kubernetes, hybrid-cloud, cloud-native-storage, cncf, object-storage, ai-native-storage, erasure-coding, data-orchestration, distributed-file-system, distributed-storage, fuse, cloud-storage

Frequently asked questions

What protocols does CubeFS support?

CubeFS supports POSIX, HDFS, S3, and its own REST API, enabling integration with diverse application ecosystems and toolchains.
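
In practice, POSIX access means that once a volume is mounted, applications can use ordinary file calls with no storage-specific SDK. The sketch below illustrates that idea with Python's standard library only. The function name `roundtrip` and the mount path `/mnt/cubefs` mentioned afterwards are hypothetical, chosen for illustration; nothing here is a CubeFS API.

```python
import os
import tempfile


def roundtrip(mount_point: str, name: str, payload: bytes) -> bytes:
    """Write payload under mount_point with ordinary file calls, then read it back.

    mount_point is assumed to be a directory where a POSIX filesystem is
    mounted; any local directory behaves the same way for this sketch.
    """
    path = os.path.join(mount_point, name)
    with open(path, "wb") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())  # ask the filesystem to persist the data
    with open(path, "rb") as f:
        return f.read()


if __name__ == "__main__":
    # A temporary directory stands in for a real mount point here.
    with tempfile.TemporaryDirectory() as stand_in_mount:
        data = roundtrip(stand_in_mount, "sample.bin", b"hello")
        print(len(data), "bytes read back")
```

Pointing `roundtrip` at a real mount (for example a hypothetical `/mnt/cubefs`) would need no code changes, which is the practical benefit of POSIX semantics.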

Is CubeFS suitable for production use?

Yes, CubeFS is a CNCF graduated project with production deployments. Use stable releases rather than the master branch for production environments.

How does CubeFS handle metadata at scale?

CubeFS provides a highly scalable metadata service with strong consistency, designed to handle large-scale deployments efficiently.

Can CubeFS work with public cloud storage?

Yes, CubeFS can run atop public cloud storage like S3, providing filesystem semantics and cache acceleration for hybrid cloud architectures.
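
To make the idea of multi-level cache acceleration concrete, here is a minimal conceptual sketch of a read-through cache with several LRU tiers in front of a slow remote store. It is not CubeFS's implementation: the class names `LRUTier` and `TieredReader`, the tier sizes, and the `fetch_remote` callback are all assumptions made for illustration.

```python
from collections import OrderedDict
from typing import Callable, List, Optional


class LRUTier:
    """One cache tier holding at most `capacity` objects, evicting least recently used."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.items: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, key: str) -> Optional[bytes]:
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key: str, value: bytes) -> None:
        self.items[key] = value
        self.items.move_to_end(key)
        while len(self.items) > self.capacity:
            self.items.popitem(last=False)  # drop the least recently used entry


class TieredReader:
    """Read-through lookup: fastest tier first, remote store last."""

    def __init__(self, tiers: List[LRUTier], fetch_remote: Callable[[str], bytes]) -> None:
        self.tiers = tiers                # ordered fastest to slowest
        self.fetch_remote = fetch_remote  # hypothetical stand-in for an S3 GET
        self.remote_reads = 0

    def read(self, key: str) -> bytes:
        for i, tier in enumerate(self.tiers):
            value = tier.get(key)
            if value is not None:
                for faster in self.tiers[:i]:
                    faster.put(key, value)  # promote the hit into faster tiers
                return value
        self.remote_reads += 1
        value = self.fetch_remote(key)
        for tier in self.tiers:
            tier.put(key, value)  # populate every tier on a full miss
        return value


if __name__ == "__main__":
    remote = {"a": b"1", "b": b"2"}
    reader = TieredReader([LRUTier(1), LRUTier(2)], remote.__getitem__)
    for key in ["a", "a", "b", "a"]:
        reader.read(key)
    print("remote reads:", reader.remote_reads)
```

Every read served from a tier avoids a round trip to the remote store, which is where the latency and egress savings of a caching layer come from.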

What storage policies does CubeFS offer?

CubeFS supports flexible policies including high-performance replication for speed and low-cost erasure coding for capacity optimization.
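
The cost difference between the two policies comes down to simple arithmetic on raw capacity. The sketch below works it through; the 3-copy and 6+3 figures are illustrative assumptions, not CubeFS defaults, and the function names are hypothetical.

```python
def replication_overhead(copies: int) -> float:
    """Raw bytes stored per logical byte when keeping `copies` full replicas."""
    return float(copies)


def erasure_coding_overhead(data_shards: int, parity_shards: int) -> float:
    """Raw bytes stored per logical byte for a (data + parity) erasure code."""
    return (data_shards + parity_shards) / data_shards


def raw_capacity_needed(logical_tb: float, overhead: float) -> float:
    """Raw capacity in TB required to hold `logical_tb` of user data."""
    return logical_tb * overhead


if __name__ == "__main__":
    logical_tb = 100.0
    # Illustrative parameters only: 3 replicas versus 6 data + 3 parity shards.
    rep = replication_overhead(3)           # 3.0x
    ec = erasure_coding_overhead(6, 3)      # 1.5x
    print("replication:", raw_capacity_needed(logical_tb, rep), "TB raw")
    print("erasure coding:", raw_capacity_needed(logical_tb, ec), "TB raw")
```

Under these assumed parameters, 100 TB of data needs 300 TB of raw capacity with replication and 150 TB with erasure coding. Replication with 3 copies survives the loss of 2 of them, and a 6+3 code survives the loss of any 3 shards, but erasure coding pays for its lower capacity cost with extra encoding and reconstruction work.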

Project at a glance

Status: Active
Stars: 5,437
Watchers: 5,437
Forks: 690
License: Apache-2.0
Repo age: 6 years old
Last commit: last week
Self-hosting: Supported
Primary language: Go
