
Ultra-performant data transformation framework for AI pipelines
Rust-powered data transformation framework for AI with incremental processing, data lineage, and a declarative Python API. Build vector indexes, knowledge graphs, and custom transformations.

CocoIndex is a high-performance data transformation framework designed for AI workloads. Built on a Rust core engine with a declarative Python API, it lets developers build production-ready data pipelines in roughly 100 lines of code.
CocoIndex follows a dataflow programming model: transformations are pure functions that create new fields, with no hidden state or mutation. This keeps every intermediate value observable and lets the framework track data lineage automatically. Developers declare transformations on source data and do not write manual CRUD operations.
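The dataflow idea described above can be illustrated in plain Python. This is a minimal sketch of the concept only, not the CocoIndex API: `Row`, `derive`, `count_words`, and `shout` are hypothetical names invented for this example. Each transformation is a pure function that adds a field to a new, read-only row, and the row records which function and inputs produced each field.

```python
from dataclasses import dataclass, field
from types import MappingProxyType
from typing import Any, Callable, Mapping, Sequence


@dataclass(frozen=True)
class Row:
    """Hypothetical immutable row: field values plus per-field lineage."""
    fields: Mapping[str, Any]
    # Maps each derived field to (function name, input field names).
    lineage: Mapping[str, tuple] = field(default_factory=dict)


def derive(row: Row, name: str, fn: Callable, inputs: Sequence[str]) -> Row:
    """Return a NEW row with `name` computed from `inputs`; never mutate `row`."""
    if name in row.fields:
        raise KeyError(f"field {name!r} already exists; fields are only added")
    value = fn(*(row.fields[i] for i in inputs))
    return Row(
        fields=MappingProxyType({**row.fields, name: value}),
        lineage=MappingProxyType({**row.lineage, name: (fn.__name__, tuple(inputs))}),
    )


def count_words(text: str) -> int:
    return len(text.split())


def shout(title: str) -> str:
    return title.upper()


source = Row(fields=MappingProxyType({"title": "intro", "content": "hello dataflow world"}))
enriched = derive(source, "word_count", count_words, ["content"])
enriched = derive(enriched, "title_upper", shout, ["title"])

print(dict(enriched.fields))
print(dict(enriched.lineage))
```

Because `derive` only ever adds fields, the original `source` row is untouched and the `lineage` mapping can answer which function and inputs produced any derived field.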
The framework excels at keeping source and target data in sync through intelligent incremental processing. When source data or transformation logic changes, CocoIndex automatically recomputes only the necessary portions while reusing cached results wherever possible. This minimizes computational overhead and ensures data freshness.
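One common way to get this behavior is to fingerprint each source item together with a version of the transformation logic, and recompute only when the fingerprint changes. The sketch below assumes that approach for illustration; it is not a description of CocoIndex internals, and `IncrementalRunner`, `fingerprint`, and `logic_version` are hypothetical names.

```python
import hashlib


def fingerprint(*parts: str) -> str:
    """Stable hash over the logic version and the source content."""
    h = hashlib.sha256()
    for part in parts:
        h.update(part.encode("utf-8"))
        h.update(b"\x00")  # separator so ("ab", "c") differs from ("a", "bc")
    return h.hexdigest()


class IncrementalRunner:
    """Hypothetical runner that reuses cached results for unchanged inputs."""

    def __init__(self, transform, logic_version: str):
        self.transform = transform
        self.logic_version = logic_version
        self.cache = {}       # source key -> (fingerprint, result)
        self.recomputed = []  # keys recomputed during the last run

    def run(self, sources: dict) -> dict:
        self.recomputed = []
        results = {}
        for key, content in sources.items():
            fp = fingerprint(self.logic_version, content)
            cached = self.cache.get(key)
            if cached is not None and cached[0] == fp:
                results[key] = cached[1]  # unchanged: reuse cached result
            else:
                results[key] = self.transform(content)
                self.recomputed.append(key)
                self.cache[key] = (fp, results[key])
        # Drop cache entries for sources that no longer exist.
        for key in list(self.cache):
            if key not in sources:
                del self.cache[key]
        return results


runner = IncrementalRunner(str.upper, logic_version="v1")
docs = {"a.md": "alpha", "b.md": "beta"}

first = runner.run(docs)
first_recomputed = list(runner.recomputed)   # both files are new

docs["b.md"] = "beta v2"
second = runner.run(docs)
second_recomputed = list(runner.recomputed)  # only the edited file

runner.logic_version = "v2"                  # transformation logic changed
third = runner.run(docs)
third_recomputed = list(runner.recomputed)   # everything is stale again
```

Including the logic version in the fingerprint is what makes a change to the transformation invalidate cached results, mirroring the "source data or transformation logic changes" case in the text.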
CocoIndex provides plug-and-play building blocks for diverse sources (local files, S3, Azure Blob, Google Drive), transformations (embeddings, LLM extraction, chunking), and targets (Postgres, Qdrant, LanceDB, knowledge graphs). Standardized interfaces make switching components as simple as changing a single line of code. Whether building RAG vector indexes, extracting structured data with LLMs, or constructing knowledge graphs, CocoIndex delivers exceptional developer velocity without sacrificing performance.
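The single-line swap relies on every component of a given kind exposing the same interface. The sketch below shows that pattern in plain Python under assumed names: the `Target` protocol, `InMemoryTarget`, `JsonLinesTarget`, and `export` are hypothetical stand-ins and are not CocoIndex classes.

```python
import json
from typing import Protocol


class Target(Protocol):
    """Hypothetical shared interface that every target implements."""

    def upsert(self, key: str, row: dict) -> None: ...


class InMemoryTarget:
    def __init__(self):
        self.rows = {}

    def upsert(self, key: str, row: dict) -> None:
        self.rows[key] = row


class JsonLinesTarget:
    def __init__(self):
        self.lines = []

    def upsert(self, key: str, row: dict) -> None:
        self.lines.append(json.dumps({"key": key, **row}, sort_keys=True))


def export(rows: dict, target: Target) -> Target:
    """Pipeline code depends only on the interface, not on a concrete target."""
    for key, row in rows.items():
        target.upsert(key, row)
    return target


rows = {"doc1": {"text": "hello"}}

target = InMemoryTarget()  # swapping targets means changing only this line
export(rows, target)

other = export(rows, JsonLinesTarget())
```

Since `export` is written against `Target` alone, replacing one backend with another does not touch the rest of the pipeline.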
Semantic Search with Live Updates
Build vector indexes from document collections that automatically stay synchronized as source files change, with minimal recomputation overhead
Knowledge Graph Construction
Extract entities and relationships from documents using LLMs and maintain an up-to-date knowledge graph as content evolves
Multi-Modal AI Indexing
Process images with vision models, generate embeddings, and build searchable indexes that incrementally update when new images arrive
Structured Data Extraction
Use LLMs to extract structured information from unstructured documents like PDFs and forms, with automatic reprocessing on schema changes
Postgres stores metadata and state needed for incremental processing, enabling CocoIndex to track which data has changed and minimize recomputation while maintaining data lineage.
CocoIndex automatically detects changes in source data or transformation logic, then reprocesses only affected portions while reusing cached results for unchanged data, significantly reducing compute costs.
CocoIndex provides built-in targets for Postgres, Qdrant, and LanceDB, plus a custom target API for integrating with other databases or storage systems.
CocoIndex uses a declarative dataflow model optimized for AI workloads, with automatic incremental updates and data lineage. Traditional ETL tools typically require manual orchestration and lack AI-specific transformations.
CocoIndex is designed to be production-ready from day zero: its Rust core engine provides high performance and reliability for demanding AI data pipelines.