Marquez

Centralized metadata service for data lineage and lifecycle

Marquez captures, aggregates, and visualizes dataset, job, and run metadata, offering provenance tracking, lineage graphs, and lifecycle management via a web UI and HTTP API.

Overview

Marquez provides a unified platform to collect, store, and explore metadata across a data ecosystem. By ingesting OpenLineage events through its HTTP API, it records dataset provenance, job executions, and run details, enabling teams to trace how data moves and transforms.
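As a rough illustration of that ingestion path, the sketch below builds a minimal OpenLineage run event and posts it over HTTP using only the Python standard library. The base URL (`http://localhost:5000`) and the `/api/v1/lineage` path are assumptions taken from the Docker quick-start defaults; the helper names, the producer URI, and the dataset names are hypothetical.

```python
import json
import uuid
from datetime import datetime, timezone
from urllib import request

# Assumption: the Docker quick-start exposes the HTTP API on localhost:5000.
MARQUEZ_URL = "http://localhost:5000"


def build_run_event(event_type, namespace, job_name, run_id=None, inputs=(), outputs=()):
    """Build a minimal OpenLineage run event as a plain dict."""

    def dataset(name):
        return {"namespace": namespace, "name": name}

    return {
        "eventType": event_type,  # e.g. "START" or "COMPLETE"
        "eventTime": datetime.now(timezone.utc).isoformat(),
        # Reuse the same runId for the START and COMPLETE events of one run.
        "run": {"runId": run_id or str(uuid.uuid4())},
        "job": {"namespace": namespace, "name": job_name},
        "inputs": [dataset(n) for n in inputs],
        "outputs": [dataset(n) for n in outputs],
        # Hypothetical URI identifying the code that emits the event.
        "producer": "https://example.com/my-pipeline",
        "schemaURL": "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent",
    }


def post_event(event, base_url=MARQUEZ_URL):
    """POST one event to the lineage endpoint and return the HTTP status code."""
    req = request.Request(
        f"{base_url}/api/v1/lineage",  # assumed path, see the note above
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return resp.status
```

With a local instance running, calling `post_event(build_run_event("START", "my-namespace", "my-job", inputs=["raw.orders"], outputs=["clean.orders"]))` should make the job and its datasets visible in the lineage graph. In practice an OpenLineage client library would normally replace this hand-rolled request.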

Audience & Deployment

Marquez is designed for data engineers, analysts, and ops teams that need visibility into pipeline health and compliance. It runs on Java 17 with PostgreSQL 14. A Docker‑based quick‑start gets the service up in minutes, and a Helm chart supports Kubernetes deployments. The web UI offers interactive lineage graphs, while a beta GraphQL endpoint allows flexible queries.

Capabilities

Beyond provenance, Marquez aggregates runtime metrics, provides admin health checks, and exposes Prometheus‑compatible metrics. Authentication is not built in, but it can be layered on with a reverse proxy. Compatibility with OpenLineage 2‑0‑2 eases integration with existing lineage emitters.
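To show what consuming those metrics might look like, here is a small sketch that parses Prometheus text exposition output into a dictionary. The admin address (`http://localhost:5001`) and the `/metrics` path are assumptions based on the quick-start defaults and may differ in your deployment; the function names are hypothetical, and the parser is deliberately simplified (it ignores comment lines and optional timestamps).

```python
from urllib import request

# Assumption: admin endpoints are served on a separate port (5001 in the
# quick-start). Adjust the URL and path to match your deployment.
ADMIN_URL = "http://localhost:5001"


def parse_prometheus_text(body):
    """Parse Prometheus text exposition into {sample_name_with_labels: float}."""
    samples = {}
    for raw in body.splitlines():
        line = raw.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        if "}" in line:
            # Labelled sample: everything up to the last "}" is the name.
            head, _, tail = line.rpartition("}")
            name = head + "}"
        else:
            name, _, tail = line.partition(" ")
        fields = tail.split()  # value, then an optional timestamp
        if fields:
            samples[name] = float(fields[0])
    return samples


def fetch_metrics(admin_url=ADMIN_URL, path="/metrics"):
    """Fetch and parse metrics from the (assumed) admin metrics endpoint."""
    with request.urlopen(f"{admin_url}{path}") as resp:
        return parse_prometheus_text(resp.read().decode("utf-8"))
```

A production setup would more likely point a Prometheus server at the endpoint directly; a parser like this is mainly useful for ad hoc checks or smoke tests.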

Highlights

Collects OpenLineage events via HTTP API
Interactive web UI with lineage graph visualization
Beta GraphQL endpoint for flexible metadata queries
Docker quick‑start with sample data and admin metrics

Pros

  • Provides end‑to‑end data provenance
  • Supports both API and UI access
  • Compatible with OpenLineage specification
  • Easy Docker deployment for local testing

Considerations

  • No built‑in authentication or authorization
  • Requires Java 17 and PostgreSQL 14 runtime
  • GraphQL API is still in beta
  • Native integrations removed; relies on OpenLineage clients

Managed products teams compare with

When teams consider Marquez, these hosted platforms usually appear on the same shortlist.

Alation

Data catalog platform for data discovery, governance, and lineage

Ataccama

Unified data management platform combining catalog, governance, data quality, and MDM

Atlan

Modern data catalog and collaborative metadata platform for data discovery and governance

Fit guide

Great for

  • Teams needing centralized lineage tracking across heterogeneous pipelines
  • Organizations adopting OpenLineage as a metadata standard
  • Developers who want a ready‑made UI to explore dataset dependencies
  • Ops teams able to provision Java and PostgreSQL services

Not ideal for

  • Environments that require out‑of‑the‑box RBAC or auth
  • Projects without Java runtime or PostgreSQL support
  • Use cases demanding real‑time streaming of lineage at massive scale
  • Users looking for a fully managed SaaS solution

How teams use it

Data pipeline debugging

Visual lineage graphs let engineers pinpoint failing jobs and understand upstream dataset impacts.
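The idea behind that kind of debugging can be sketched as a graph walk: starting from a failing job, follow lineage edges backwards to collect everything upstream of it. The example below uses a hypothetical in-memory edge list rather than data fetched from Marquez, and the node names are made up for illustration.

```python
from collections import deque


def upstream_of(node, edges):
    """Return every node that feeds into `node`, found by breadth-first search.

    `edges` is an iterable of (source, target) pairs, where data flows
    from source to target.
    """
    # Index edges by target so each node's direct parents are a quick lookup.
    parents = {}
    for src, dst in edges:
        parents.setdefault(dst, []).append(src)

    seen, queue = set(), deque([node])
    while queue:
        current = queue.popleft()
        for parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


# Hypothetical lineage: two raw datasets feed a report through one cleaning job.
edges = [
    ("dataset:raw.orders", "job:clean_orders"),
    ("job:clean_orders", "dataset:clean.orders"),
    ("dataset:clean.orders", "job:daily_report"),
    ("dataset:raw.users", "job:daily_report"),
]
suspects = upstream_of("job:daily_report", edges)
```

Here `suspects` holds the four upstream nodes of `job:daily_report`, which is the set an engineer would inspect first when that job fails.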

Compliance audit

Provenance records support regulatory audits by showing which jobs produced and consumed each dataset.

Cross‑team data catalog

Unified view of datasets and jobs enables data discovery across multiple squads.

Performance monitoring

Run metadata and frequency metrics help ops track job runtimes and dataset access patterns.

Tech snapshot

Java: 74%
TypeScript: 19%
Python: 3%
Shell: 1%
HTML: 1%
JavaScript: 1%

Tags

metadata-service, data-dictionary, data-governance, data-ops, metadata, data-ecosystem-metadata, marquez, data-lineage, data-provenance, data-discovery

Frequently asked questions

Which OpenLineage versions does Marquez support?

Marquez is compatible with OpenLineage 2‑0‑2 and maintains backward compatibility with older spec versions.

How is authentication handled?

The HTTP API does not require authentication by default; you can add auth via a reverse proxy or external gateway.

Can Marquez be deployed on Kubernetes?

Yes, a Helm chart is provided for Kubernetes deployments.

What languages can emit lineage events?

Any language with an OpenLineage client library (e.g., Python, Java) can send events to Marquez.

Project at a glance

Active
Stars: 2,094
Watchers: 2,094
Forks: 380
License: Apache-2.0
Repo age: 7 years old
Last commit: 6 days ago
Primary language: Java

Last synced yesterday