

SQL‑driven feature platform delivering millisecond real‑time ML features
OpenMLDB lets data teams define, test, and deploy ML features with SQL, ensuring consistent offline training and online inference while delivering ultra‑low latency real‑time features at scale.
OpenMLDB targets the data‑centric work that accounts for an estimated 95% of the effort in AI projects: engineers write feature logic once in SQL and use it for both offline model training and online inference. A unified execution plan ensures that the same feature definitions produce identical results offline and online, which avoids data leakage and costly back‑filling.
A purpose‑built real‑time SQL engine processes time‑series data in a few milliseconds, while a batch engine (based on a tailored Spark distribution) handles large‑scale offline jobs. Deployment follows three simple steps: develop features offline with SQL, deploy them online with a single command, and configure a real‑time data source. Built‑in enterprise capabilities—distributed storage, fault recovery, high availability, seamless scaling, monitoring, and heterogeneous memory support—make OpenMLDB production‑ready for recommendation systems, risk analytics, finance, IoT, and more.
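The "write once, use offline and online" idea can be sketched in plain Python. This is only an illustration under stated assumptions, not OpenMLDB's API or SQL syntax: the `Trip` record, the 60-second window, and the function names are all hypothetical. A single feature definition (a windowed sum per rider) is evaluated in a batch pass over history and again for a single incoming request, and both paths return the same value for the same row.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Trip:
    rider: str   # partition key (hypothetical)
    ts: int      # event time in seconds
    fare: float


WINDOW_SECONDS = 60  # hypothetical window length


def window_fare_sum(stored: Sequence[Trip], request: Trip) -> float:
    """Single feature definition: the rider's fares in the last
    WINDOW_SECONDS, counting the request row itself."""
    start = request.ts - WINDOW_SECONDS
    in_window = [
        t.fare
        for t in stored
        if t.rider == request.rider and start <= t.ts <= request.ts
    ]
    return sum(in_window) + request.fare


def offline_features(trips: Sequence[Trip]) -> List[float]:
    """Batch mode: featurize every historical row, seeing only earlier rows."""
    ordered = sorted(trips, key=lambda t: t.ts)
    return [window_fare_sum(ordered[:i], row) for i, row in enumerate(ordered)]


def online_feature(stored: Sequence[Trip], request: Trip) -> float:
    """Request mode: featurize one incoming row against stored history."""
    return window_fare_sum(stored, request)


trips = [
    Trip("a", 0, 5.0),
    Trip("a", 30, 7.0),
    Trip("b", 40, 3.0),
    Trip("a", 100, 4.0),
]

batch = offline_features(trips)
live = online_feature(trips[:3], trips[3])
print(batch, live)
```

Because both paths call the same `window_fare_sum`, the value computed for the last trip during training matches the value computed when that trip arrives as a live request. In OpenMLDB the shared definition would be a SQL statement rather than a Python function.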
When teams consider OpenMLDB, these hosted platforms usually appear on the same shortlist.

Amazon SageMaker Feature Store: Fully managed repository to create, store, share, and serve ML features

Feature registry with governance, lineage, and MLflow integration

Central hub to manage, govern, and serve ML features across batch, streaming, and real time
NYC Taxi Trip Duration Prediction
End‑to‑end ML pipeline built with OpenMLDB and LightGBM to predict ride duration, demonstrating rapid feature development and deployment.
Real‑time Data Ingestion from Apache Kafka
Seamless import of streaming events into OpenMLDB via the Kafka connector, enabling millisecond‑level feature computation for online services.
Real‑time Data Ingestion from Apache Pulsar
Pulsar streams are ingested through the OpenMLDB‑Pulsar connector, supporting low‑latency feature serving in cloud‑native environments.
End‑to‑end ML Pipelines in DolphinScheduler
Automated scheduling of feature engineering, model training, and deployment using DolphinScheduler integrated with OpenMLDB.
OpenMLDB serves as a feature platform for ML applications requiring ultra‑low latency real‑time features, and also functions as a time‑series database for finance, IoT, and similar domains.
OpenMLDB goes beyond a traditional feature store by generating real‑time features in a few milliseconds, whereas most stores only serve pre‑computed offline features.
SQL offers a concise yet powerful syntax, and OpenMLDB's SQL extensions flatten the learning curve and make collaboration across data teams easier.
A unified execution plan generator creates identical execution plans for both batch and real‑time engines, guaranteeing feature consistency and preventing data leakage.
Develop features offline with SQL, deploy them online with a single command, and configure a real‑time data source to start serving features.
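The data-leakage point made above can be illustrated with a small Python sketch. It is not OpenMLDB code, and the event data and function names are hypothetical. It contrasts a feature that averages over all rows, including rows later than the one being featurized, with a point-in-time feature that only sees rows up to the current timestamp (timestamps are assumed unique).

```python
from typing import List, Tuple

# (timestamp_seconds, amount) for one hypothetical account
Event = Tuple[int, float]


def leaky_mean(events: List[Event]) -> List[float]:
    """Mean over ALL rows, so early rows see amounts from the future."""
    overall = sum(amount for _, amount in events) / len(events)
    return [overall for _ in events]


def point_in_time_mean(events: List[Event]) -> List[float]:
    """Mean over rows at or before each row's timestamp only."""
    ordered = sorted(events)  # sort by timestamp
    features = []
    running = 0.0
    for count, (_, amount) in enumerate(ordered, start=1):
        running += amount
        features.append(running / count)
    return features


events = [(1, 10.0), (2, 20.0), (3, 60.0)]

print(leaky_mean(events))          # every row gets the same, future-aware value
print(point_in_time_mean(events))  # each row uses only its own past
```

A model trained on the leaky feature would see information that is not available at serving time, which is why window definitions bounded by the current row matter for consistency between training and inference.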