
OpenMetadata
Unified platform for data discovery, governance, and observability
Stars: 14,120
License: Apache-2.0
Last commit: 3 hours ago
Metadata catalogs with governance, discovery and lineage across data assets.
Data catalogs and governance platforms provide a centralized repository for metadata about an organization's data assets. They enable discovery, lineage tracing, and policy enforcement across databases, data lakes, and analytics tools. Both open-source and commercial solutions exist, offering varying levels of community support, integration breadth, and hosted versus self-managed deployment options. Organizations choose based on factors such as scalability, compliance needs, and total cost of ownership.


Google‑style search engine for data assets across your organization

Geo-distributed federated metadata lake for unified data governance
DataHub provides a centralized catalog and real‑time metadata graph, enabling teams to discover, understand, and govern data across the modern data stack with extensible connectors and Kubernetes‑ready deployment.
Assess the breadth of asset types (tables, files, streams, ML models) that the platform can ingest and catalog.
Evaluate the relevance of search results, support for faceted filters, and natural-language querying.
Look for clear, interactive lineage graphs that trace data movement from source to downstream consumers.
Check for role-based access, policy definition, and audit logging to enforce data stewardship.
Consider native connectors to data warehouses, lakes, BI tools, and orchestration platforms.
For open-source projects, review star count, contribution frequency, and release cadence as proxies for health.
Most tools in this category support these baseline capabilities.
Data catalog platform for data discovery, governance, and lineage
Unified data management platform combining catalog, governance, data quality, and MDM
Modern data catalog and collaborative metadata platform for data discovery and governance
Centralized metadata repository with crawlers, schema management, and lineage
Unified data governance service for discovering, classifying, and governing on-premises and cloud data assets across the enterprise
Data intelligence platform with catalog, governance, and data quality
Alation helps teams find, trust, and use data with search, business glossary, lineage, and AI-assisted curation across cloud and on-prem sources.
Analysts use the catalog to locate relevant datasets without needing direct assistance from data engineers.
Regulatory teams query lineage and access logs to demonstrate data handling practices.
Engineers trace upstream dependencies to assess the effect of schema changes on downstream jobs.
Data stewards add business glossaries, quality scores, and ownership tags to improve asset context.
Catalog APIs feed metadata into data quality tooling and catalog-driven data mesh implementations.
What is a data catalog?
A data catalog is a searchable inventory of an organization's data assets, enriched with metadata such as schema, lineage, ownership, and usage statistics.
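As a rough illustration of that definition, the sketch below models a catalog as a list of entries with schema, lineage, ownership, and usage fields, plus a naive substring search. The class name, field names, and sample assets are all hypothetical; real catalogs use a search index and far richer metadata models.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One data asset plus the metadata a catalog typically attaches to it."""
    name: str
    owner: str
    schema: dict[str, str]  # column name -> column type
    upstream: list[str] = field(default_factory=list)  # lineage: direct sources
    query_count_30d: int = 0  # usage statistic (hypothetical metric)


def search(entries: list[CatalogEntry], term: str) -> list[str]:
    """Return names of assets whose name, owner, or column names contain the term."""
    term = term.lower()
    hits = []
    for entry in entries:
        haystack = [entry.name, entry.owner, *entry.schema.keys()]
        if any(term in text.lower() for text in haystack):
            hits.append(entry.name)
    return hits


# Hypothetical sample assets, for illustration only.
catalog = [
    CatalogEntry("sales.orders", "finance-team",
                 {"order_id": "int", "customer_id": "int"}),
    CatalogEntry("crm.customers", "growth-team",
                 {"customer_id": "int", "email": "string"},
                 upstream=["sales.orders"]),
]
```

Calling `search(catalog, "customer")` matches both assets here, because the term appears in a column name of the first and in the name of the second.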
How does data lineage benefit data teams?
Lineage shows how data flows from source systems through transformations to downstream reports, helping teams assess the impact of changes, troubleshoot issues, and meet compliance requirements.
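Impact assessment over lineage can be pictured as a graph traversal. The sketch below assumes lineage is available as a simple mapping from each asset to the assets it feeds; the asset names and the `downstream` helper are hypothetical and not taken from any particular catalog's API.

```python
from collections import deque

# Hypothetical lineage edges: each key feeds the assets in its list.
LINEAGE = {
    "raw.orders": ["staging.orders"],
    "staging.orders": ["marts.revenue", "marts.order_facts"],
    "marts.revenue": ["dashboard.weekly_revenue"],
    "marts.order_facts": [],
    "dashboard.weekly_revenue": [],
}


def downstream(graph, start):
    """Breadth-first walk returning every asset reachable from start, sorted."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in seen:  # guard against revisiting shared or cyclic paths
                seen.add(child)
                queue.append(child)
    return sorted(seen)
```

Under these assumptions, a schema change to `staging.orders` would flag both marts and the weekly revenue dashboard as potentially affected.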
What are the main differences between open-source and SaaS data catalog solutions?
Open-source catalogs are self-hosted, customizable, and often free but require internal maintenance. SaaS offerings provide managed hosting, SLAs, and integrated support, typically at a subscription cost.
How is governance enforced within a data catalog?
Governance is applied through role-based access controls, policy definitions (e.g., masking, retention), and audit logs that record who accessed or modified metadata.
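A minimal sketch of how role-based access and audit logging fit together is shown below. The role names, permitted actions, and the `authorize` function are assumptions made for illustration; actual platforms define their own roles and policy models.

```python
from datetime import datetime, timezone

# Hypothetical mapping from role to permitted actions.
ROLE_PERMISSIONS = {
    "viewer": {"read"},
    "steward": {"read", "edit_metadata"},
    "admin": {"read", "edit_metadata", "edit_policy"},
}

# In-memory audit trail; a real system would persist this.
audit_log = []


def authorize(user, role, action, asset):
    """Check the role's permissions and record the attempt, allowed or not."""
    allowed = action in ROLE_PERMISSIONS.get(role, set())
    audit_log.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "role": role,
        "action": action,
        "asset": asset,
        "allowed": allowed,
    })
    return allowed
```

Recording denied attempts alongside permitted ones is a deliberate choice in this sketch, since the log is meant to show who tried to access or modify metadata, not only who succeeded.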
Which data sources are commonly supported out of the box?
Most platforms include connectors for relational databases, cloud data warehouses, data lakes (e.g., S3, ADLS), streaming platforms, BI tools, and machine-learning model registries.
How can organizations measure ROI from a data catalog?
ROI can be tracked by reduced time to find data, fewer duplicate datasets, lower compliance risk, and improved data quality leading to faster analytics delivery.