

Geo-distributed federated metadata lake for unified data governance
Apache Gravitino manages metadata across diverse sources, regions, and clouds through a unified API, enabling federated discovery, multi-region sync, and end-to-end governance for data and AI assets.

Apache Gravitino is a high-performance, geo-distributed metadata lake designed for organizations managing data and AI assets across multiple sources, regions, and clouds. It provides a single API and model to access metadata from Hive, MySQL, HDFS, S3, and other systems, eliminating silos and enabling federated discovery without migrating data.
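As a rough illustration of what "a single API and model" can mean in practice, the sketch below addresses a Hive table and a MySQL table through the same metalake, catalog, schema, table hierarchy. The names (demo_lake, hive_prod, mysql_crm) are hypothetical, and the REST path layout is an assumption that should be checked against the Gravitino API reference before use.

```python
from dataclasses import dataclass
from urllib.parse import quote


@dataclass(frozen=True)
class TableRef:
    """One table addressed through a unified hierarchy (all names hypothetical)."""
    metalake: str
    catalog: str
    schema: str
    table: str

    def rest_path(self) -> str:
        # Assumed path layout, not confirmed by this page; verify against
        # the Gravitino REST API reference.
        parts = [
            "api", "metalakes", self.metalake,
            "catalogs", self.catalog,
            "schemas", self.schema,
            "tables", self.table,
        ]
        # Percent-encode each segment so unusual names cannot break the path.
        return "/" + "/".join(quote(p, safe="") for p in parts)


# The same model describes tables that live in different backing systems.
hive_orders = TableRef("demo_lake", "hive_prod", "sales", "orders")
mysql_users = TableRef("demo_lake", "mysql_crm", "crm", "users")
```

The point of the sketch is only that the caller never needs to know which system backs a catalog; the addressing scheme stays the same.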
Gravitino connects directly to the underlying metadata systems, so changes are reflected immediately rather than after a batch synchronization. It integrates with query engines such as Trino and Spark, allowing teams to run federated queries without modifying SQL dialects. Built-in support for the Iceberg REST catalog, together with evolving support for AI model lineage standards, makes it suitable for modern lakehouse and AI workflows.
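A federated query of the kind described above might look like the following minimal sketch, which builds one SQL join across a Hive-backed and a MySQL-backed table. The catalog, schema, table, and column names are hypothetical, and how Gravitino-managed catalogs are named inside Trino depends on the connector configuration, so treat the three-part names as an assumption.

```python
def federated_join(left: str, right: str, key: str, columns: list[str]) -> str:
    """Build a SQL join across two catalog-qualified tables (catalog.schema.table)."""
    for name in (left, right):
        # Require fully qualified names so each side resolves to a specific catalog.
        if name.count(".") != 2:
            raise ValueError(f"expected catalog.schema.table, got {name!r}")
    select_list = ", ".join(columns)
    return (
        f"SELECT {select_list} "
        f"FROM {left} AS l "
        f"JOIN {right} AS r ON l.{key} = r.{key}"
    )


# Hypothetical tables: orders in a Hive catalog, users in a MySQL catalog.
query = federated_join(
    "hive_prod.sales.orders",
    "mysql_crm.crm.users",
    "user_id",
    ["l.order_id", "r.email"],
)
```

The resulting string is ordinary SQL; nothing in it is specific to either backing system, which is the property the paragraph above describes.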
The platform delivers end-to-end data governance with unified access control, auditing, and discovery across all metadata assets. Geo-distribution capabilities enable metadata sharing across hybrid and multi-cloud environments, supporting global teams and compliance requirements. Licensed under Apache 2.0, Gravitino is built with Gradle and offers Docker Compose–based playground environments for rapid evaluation.
Multi-Cloud Data Lake Federation
Unified metadata access across AWS S3, Azure Data Lake, and on-premises HDFS, enabling cross-cloud analytics without data migration.
Global Metadata Synchronization
Geo-distributed teams share consistent metadata views across regions, supporting compliance and reducing query latency for local users.
Federated Query with Trino
Data engineers run SQL queries spanning Hive, MySQL, and Iceberg tables through Gravitino's Trino connector without rewriting queries.
Unified Data Governance
Centralized access control and audit logs across all metadata assets, simplifying compliance reporting and security policy enforcement.
Gravitino integrates with Hive, MySQL, HDFS, S3, Iceberg, and other systems through direct connectors, with changes reflected immediately in the unified metadata layer.
Gravitino shares metadata across regions and clouds, enabling global architectures where teams in different locations access consistent metadata views without replication delays.
Gravitino provides native connectors for Trino and Spark, allowing federated queries without modifying SQL dialects or migrating existing workflows.
AI asset management, including model and feature tracking, is still a work in progress. Check the documentation for the latest status and roadmap.
The Gravitino playground is a Docker Compose-based environment that provides a full-stack Gravitino experience for evaluation, including sample data sources and query engines.