

Centralized metadata service for data lineage and lifecycle
Marquez captures, aggregates, and visualizes dataset, job, and run metadata, offering provenance tracking, lineage graphs, and lifecycle management via a web UI and HTTP API.

Marquez provides a unified platform to collect, store, and explore metadata across a data ecosystem. By ingesting OpenLineage events through its HTTP API, it records dataset provenance, job executions, and run details, enabling teams to trace how data moves and transforms.
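As a rough illustration of the ingestion path described above, the sketch below builds a minimal OpenLineage run event and posts it over HTTP using only the Python standard library. The address `http://localhost:5000`, the `/api/v1/lineage` path, the `schemaURL` value, and the producer URI are assumptions based on a local quick-start setup, and the helper names are hypothetical; check them against your deployment before relying on them.

```python
import json
from datetime import datetime, timezone
from urllib import request

# Assumed local quick-start address; adjust to your deployment.
MARQUEZ_URL = "http://localhost:5000/api/v1/lineage"


def build_run_event(event_type, run_id, namespace, job_name, inputs=(), outputs=()):
    """Build a minimal OpenLineage run event as a plain dict (hypothetical helper)."""
    return {
        "eventType": event_type,  # e.g. "START" or "COMPLETE"
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": run_id},
        "job": {"namespace": namespace, "name": job_name},
        "inputs": [{"namespace": namespace, "name": n} for n in inputs],
        "outputs": [{"namespace": namespace, "name": n} for n in outputs],
        # Placeholder producer URI, not a real emitter.
        "producer": "https://example.com/hypothetical-producer",
        # Assumed schema reference for spec version 2-0-2.
        "schemaURL": "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent",
    }


def send_event(event, url=MARQUEZ_URL):
    """POST one event to the lineage endpoint and return the HTTP status code."""
    req = request.Request(
        url,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return resp.status


# Example usage (requires a running server):
#   import uuid
#   event = build_run_event("START", str(uuid.uuid4()), "my_namespace", "daily_etl",
#                           inputs=["raw.orders"], outputs=["clean.orders"])
#   send_event(event)
```

A matching "COMPLETE" event that reuses the same `runId` would normally follow the "START" event so the run is recorded as finished.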
Designed for data engineers, analysts, and ops teams that need visibility into pipeline health and compliance, Marquez runs on Java 17 with PostgreSQL 14. A Docker‑based quick‑start gets the service up in minutes, and a Helm chart supports Kubernetes deployments. The web UI offers interactive lineage graphs, while a beta GraphQL endpoint allows flexible queries.
Beyond provenance, Marquez aggregates runtime metrics, supports admin health checks, and exposes Prometheus‑compatible metrics. Authentication is not built in, but it can be layered on with a reverse proxy. Compatibility with OpenLineage 2‑0‑2 lets existing lineage emitters send events without modification.
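The admin health checks mentioned above can be polled from a script. The sketch below assumes the admin interface listens on `http://localhost:5001` and serves a `/healthcheck` endpoint that returns a JSON object mapping each check name to a result with a boolean `healthy` field; both the address and the response shape are assumptions to confirm against your deployment, and the function names are hypothetical.

```python
import json
from urllib import error, request

# Assumed admin address for a local setup.
ADMIN_URL = "http://localhost:5001"


def failing_checks(payload):
    """Return the sorted names of health checks that do not report healthy."""
    return sorted(
        name for name, result in payload.items()
        if not result.get("healthy", False)
    )


def fetch_health(admin_url=ADMIN_URL):
    """GET /healthcheck and return the parsed JSON body.

    An unhealthy service may answer with an HTTP error status; the body
    is still parsed in that case so the failing checks can be listed.
    """
    try:
        with request.urlopen(f"{admin_url}/healthcheck") as resp:
            return json.load(resp)
    except error.HTTPError as err:
        return json.load(err)


# Example usage (requires a running server):
#   print(failing_checks(fetch_health()))
```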
Data pipeline debugging
Visual lineage graphs let engineers pinpoint failing jobs and understand upstream dataset impacts.
Compliance audit
Provenance records support regulatory audits by showing which jobs produced and consumed each dataset.
Cross‑team data catalog
Unified view of datasets and jobs enables data discovery across multiple squads.
Performance monitoring
Run metadata and frequency metrics help ops track job runtimes and dataset access patterns.
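For the pipeline debugging case above, the same upstream question the lineage graph answers visually can be asked in code. The sketch below assumes a lineage query of the form `/api/v1/lineage?nodeId=...&depth=...` and a response whose `graph` list holds nodes with an `id` and an `inEdges` list of `origin`/`destination` pairs. Both are assumptions about the API shape, and the function names and sample node ids are hypothetical.

```python
from collections import deque
from urllib.parse import urlencode


def lineage_url(base_url, node_id, depth=20):
    """Build the lineage query URL for a node id such as 'dataset:<namespace>:<name>'."""
    return f"{base_url}/api/v1/lineage?{urlencode({'nodeId': node_id, 'depth': depth})}"


def upstream_nodes(graph, start_id):
    """Return the sorted ids of every node reachable by following inEdges back from start_id."""
    by_id = {node["id"]: node for node in graph}
    seen = set()
    queue = deque([start_id])
    while queue:
        current = queue.popleft()
        for edge in by_id.get(current, {}).get("inEdges", []):
            origin = edge["origin"]
            if origin not in seen:
                seen.add(origin)
                queue.append(origin)
    return sorted(seen)


# Hand-written sample in the assumed response shape: raw -> etl job -> clean.
sample_graph = [
    {"id": "dataset:ns:raw", "inEdges": []},
    {"id": "job:ns:etl",
     "inEdges": [{"origin": "dataset:ns:raw", "destination": "job:ns:etl"}]},
    {"id": "dataset:ns:clean",
     "inEdges": [{"origin": "job:ns:etl", "destination": "dataset:ns:clean"}]},
]

print(upstream_nodes(sample_graph, "dataset:ns:clean"))
```

With the sample graph, the walk from the clean dataset reaches the job that wrote it and the raw dataset that job read, which is the set of places to inspect when the clean dataset looks wrong.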
Marquez is compatible with OpenLineage 2‑0‑2 and maintains backward compatibility with older spec versions.
The HTTP API does not require authentication by default; you can add auth via a reverse proxy or external gateway.
A Helm chart is provided for Kubernetes deployments.
Any language with an OpenLineage client library (e.g., Python, Java) can send events to Marquez.
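When authentication is added in front of the API by a reverse proxy or gateway, as described above, clients need to attach whatever credential that proxy expects. The sketch below assumes the proxy validates a bearer token in the `Authorization` header; the header scheme, the host `marquez.example.com`, and the function name are hypothetical and depend entirely on how the proxy is configured.

```python
import json
from urllib import request


def authed_lineage_request(event, base_url, token):
    """Build (but do not send) a POST request carrying a bearer token for the proxy."""
    return request.Request(
        f"{base_url}/api/v1/lineage",
        data=json.dumps(event).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumed scheme; the proxy, not Marquez, checks this header.
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


# Example usage (requires a reachable, proxied server):
#   req = authed_lineage_request(event, "https://marquez.example.com", "my-token")
#   with request.urlopen(req) as resp:
#       print(resp.status)
```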