DataHub

Unified metadata platform for modern data discovery and governance

DataHub provides a centralized catalog and real‑time metadata graph, enabling teams to discover, understand, and govern data across the modern data stack with extensible connectors and Kubernetes‑ready deployment.

Overview

DataHub is a centralized metadata platform for data discovery, lineage, and governance across the modern data stack. It ingests metadata from many sources through extensible connectors and stores it in a metadata graph that is updated in real time, so search results and impact analysis reflect the current state of the data estate.

Who It Serves & How to Deploy

Designed for data engineers, analysts, and governance teams, DataHub can be run locally with a single‑command Docker quickstart or scaled in production using the provided Helm charts on Kubernetes. The platform includes a web UI, GraphQL API, and a suite of actions that react to metadata changes in real time.
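As a rough illustration of the GraphQL API mentioned above, the following Python sketch builds and sends a dataset search request. The endpoint URL (`http://localhost:9002/api/graphql`) and the optional bearer token are assumptions about a local quickstart deployment, and the helper names `build_search_payload` and `run_search` are hypothetical; check the GraphQL schema of your own instance before relying on the field names.

```python
import json
import urllib.request

# Assumption: a local quickstart serves GraphQL at this URL; adjust for your deployment.
GRAPHQL_URL = "http://localhost:9002/api/graphql"

# Search query using a single input variable, so the payload is easy to build and inspect.
SEARCH_QUERY = """
query search($input: SearchInput!) {
  search(input: $input) {
    total
    searchResults { entity { urn type } }
  }
}
"""


def build_search_payload(text, entity_type="DATASET", count=10):
    """Build the JSON body for a search request (hypothetical helper, no network access)."""
    return {
        "query": SEARCH_QUERY,
        "variables": {
            "input": {"type": entity_type, "query": text, "start": 0, "count": count}
        },
    }


def run_search(text, token=None, url=GRAPHQL_URL):
    """POST the search request and return the decoded JSON response."""
    body = json.dumps(build_search_payload(text)).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if token:
        # Assumption: the instance accepts a personal access token as a bearer token.
        headers["Authorization"] = f"Bearer {token}"
    request = urllib.request.Request(url, data=body, headers=headers, method="POST")
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

Calling `run_search("orders")` against a running instance would return the raw GraphQL response; `build_search_payload` can be used on its own to inspect the request without a server.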

Community & Adoption

Originally developed at LinkedIn and supported by a growing open‑source community, DataHub is used by enterprises such as LinkedIn, Expedia, and Udemy. Documentation, Slack support, monthly town halls, and a hosted demo environment help teams get up to speed quickly.

Highlights

  • Real‑time metadata graph for instant search and impact analysis
  • Extensible connectors covering databases, pipelines, and BI tools
  • Docker quickstart and Helm charts for flexible deployment
  • Community‑driven actions framework for automated metadata reactions

Pros

  • Strong enterprise backing and active open‑source community
  • Scalable architecture with both local and Kubernetes options
  • Rich metadata model supports lineage, governance, and discovery
  • Extensive documentation, Slack channel, and monthly town halls

Considerations

  • Initial setup can require infrastructure expertise
  • User interface may be complex for newcomers
  • Large graph deployments may need performance tuning
  • Documentation is spread across multiple sites

Managed products teams compare with

When teams consider DataHub, these hosted platforms usually appear on the same shortlist.

Alation

Data catalog platform for data discovery, governance, and lineage

Ataccama

Unified data management platform combining catalog, governance, data quality, and MDM

Atlan

Modern data catalog and collaborative metadata platform for data discovery and governance

Fit guide

Great for

  • Data engineering teams needing unified data discovery
  • Organizations implementing data mesh or data governance
  • Enterprises with Kubernetes expertise looking for a scalable catalog
  • Teams that want real‑time, metadata‑driven automation

Not ideal when

  • Small teams without ops resources for Docker/Kubernetes
  • Ad‑hoc data discovery without a need for governance
  • Environments that cannot run Java (JVM) services
  • Organizations preferring fully managed SaaS catalog solutions

How teams use it

Cross‑source Data Discovery

Analysts can search across databases, dashboards, and pipelines from a single UI, reducing time to find relevant assets.

Data Governance and Lineage

Compliance teams visualize end‑to‑end data flow, enabling impact analysis and policy enforcement.
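Impact analysis over lineage can be pictured as a graph traversal. The sketch below is only an illustration of the idea, not DataHub's implementation: the `downstream_impact` function, the `LINEAGE` edge list, and the asset names are all hypothetical, and lineage is modeled as simple (upstream, downstream) pairs held in memory.

```python
from collections import deque


def downstream_impact(edges, start):
    """Return every asset reachable downstream of `start`, in breadth-first order."""
    # Index the edge list so each upstream asset maps to its direct downstream assets.
    children = {}
    for upstream, downstream in edges:
        children.setdefault(upstream, []).append(downstream)

    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in seen:  # guard against revisiting shared or cyclic paths
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


# Hypothetical lineage: a raw table feeds a staging table, two marts, and a dashboard.
LINEAGE = [
    ("raw.orders", "staging.orders"),
    ("staging.orders", "mart.revenue"),
    ("staging.orders", "mart.churn"),
    ("mart.revenue", "dashboard.finance"),
]

print(downstream_impact(LINEAGE, "raw.orders"))
# ['staging.orders', 'mart.revenue', 'mart.churn', 'dashboard.finance']
```

Breadth-first order lists the closest dependents first, which is usually the order in which a reviewer wants to check what a change might break.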

Data Mesh Catalog

Domain teams publish and consume metadata in a shared graph, fostering federated ownership and discoverability.

CI/CD Impact Analysis for dbt

A GitHub Action comments on pull requests with the downstream impact of a change, helping developers avoid breaking changes.

Tech snapshot

Java: 41%
Python: 30%
TypeScript: 27%
JavaScript: 1%
Shell: 1%
Mustache: 1%

Tags

data-governance, hacktoberfest, metadata, datahub, data-catalog, data-discovery

Frequently asked questions

What programming languages are used in DataHub?

The core platform is written primarily in Java, with substantial portions in Python and TypeScript.

Can I try DataHub without installing it?

Yes, a hosted demo environment is available at demo.datahub.com.

What deployment options does DataHub support?

You can run a local Docker quickstart or deploy to production using Helm charts on Kubernetes.

Is there a community Slack for support?

Yes, a Slack workspace is provided for discussions, announcements, and help.

Does DataHub support real‑time metadata updates?

Yes, the platform includes a real‑time metadata graph and actions framework that react to changes as they occur.

Project at a glance

Status: Active
Stars: 11,457
Watchers: 11,457
Forks: 3,341
License: Apache-2.0
Repo age: 10 years old
Last commit: 12 hours ago
Primary language: Java

Last synced 12 hours ago