Open-source alternatives to AWS Glue Data Catalog

Compare community-driven replacements for AWS Glue Data Catalog in data catalog and governance workflows. We curate active, self-hostable options with transparent licensing so you can evaluate the right fit quickly.

AWS Glue Data Catalog

AWS Glue Data Catalog stores and organizes table metadata, auto-discovers data with crawlers, computes column statistics, and tracks lineage for ETL and analytics.

Key stats

  • 9 alternatives
  • 1 supports self-hosting

    Run on infrastructure you control

  • 9 in active development

    Recent commits in the last 6 months

  • 8 with permissive licenses

    MIT, Apache, and similar licenses

Counts reflect projects currently indexed as alternatives to AWS Glue Data Catalog.

Start with these picks

These projects match the most common migration paths for teams replacing AWS Glue Data Catalog.

Apache Gravitino
Best for self-hosting

Why teams pick it

Organizations with data spread across multiple clouds, regions, or on-premises systems

OpenMetadata
Privacy-first alternative

Why teams pick it

Organizations adopting data mesh or decentralized data ownership

All open-source alternatives

CKAN

Powerful platform for publishing, sharing, and managing open data

Active development · Integration-friendly · AI-powered workflows · Python

Why teams choose it

  • Rich searchable catalog with versioning
  • Full RESTful API for metadata and resources
  • Built‑in data visualisation widgets
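
As a rough illustration of the RESTful API mentioned above, the sketch below builds a request URL for CKAN's Action API dataset search (`package_search`, with its `q` and `rows` parameters). The host `ckan.example.org` and the helper name `build_package_search_url` are assumptions made for this example; the function only constructs the URL and sends nothing.

```python
from urllib.parse import urlencode, urljoin


def build_package_search_url(base_url: str, query: str, rows: int = 10) -> str:
    """Return a URL for CKAN's package_search action (hypothetical helper)."""
    # Normalise the base so urljoin appends to it instead of replacing a path segment.
    endpoint = urljoin(base_url.rstrip("/") + "/", "api/3/action/package_search")
    # q is the search text; rows limits how many datasets are returned.
    return f"{endpoint}?{urlencode({'q': query, 'rows': rows})}"


# ckan.example.org is a placeholder host, not a real portal.
url = build_package_search_url("https://ckan.example.org", "air quality", rows=5)
print(url)
```

Fetching that URL with any HTTP client should return a JSON document describing the matching datasets, which is typically the starting point for exporting catalog entries from an existing portal.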

Watch for

Requires Python and PostgreSQL expertise to deploy

Migration highlight

National open data portal

Provides a centralized catalog, public API, and searchable interface for all government datasets, improving transparency and citizen engagement.

ODD Platform

Unified data discovery, lineage, and observability platform

Active development · Permissive license · Fast to deploy · Java

Why teams choose it

  • Federated data catalog with end‑to‑end lineage across heterogeneous sources
  • Built‑in data quality dashboard compatible with Great Expectations and dbt
  • Automatic ML experiment tracking and parameter logging

Watch for

Requires a PostgreSQL backend to store metadata

Migration highlight

Accelerate dashboard creation

Analysts quickly locate source tables and understand lineage, reducing time to build reliable BI reports.

Apache Gravitino

Geo-distributed federated metadata lake for unified data governance

Self-host friendly · Active development · Permissive license · Java

Why teams choose it

  • Unified API for managing metadata across Hive, MySQL, HDFS, S3, and more
  • Geo-distributed architecture for multi-region and multi-cloud metadata sharing
  • Direct connector integration with immediate reflection of upstream changes

Watch for

Windows builds are not currently supported

Migration highlight

Multi-Cloud Data Lake Federation

Unified metadata access across AWS S3, Azure Data Lake, and on-premises HDFS, enabling cross-cloud analytics without data migration.

Apache Atlas

Unified metadata governance for Hadoop and enterprise data ecosystems

Active development · Permissive license · Fast to deploy · Java

Why teams choose it

  • Centralized metadata store with shared access
  • Automated lineage tracking enriched by business taxonomies
  • Integrated security via Apache Ranger (RBAC & ABAC)

Watch for

Primarily focused on Hadoop ecosystems

Migration highlight

Regulatory compliance reporting

Generate audit‑ready lineage reports to demonstrate data handling compliance.

OpenMetadata

Unified platform for data discovery, governance, and observability

Active development · Permissive license · Privacy-first · TypeScript

Why teams choose it

  • Unified data catalog with keyword search and advanced queries
  • Column‑level lineage visualization with a no‑code editor
  • 84+ pluggable connectors for diverse data services

Watch for

Self‑hosting requires operational expertise and resources

Migration highlight

Centralized Data Catalog

Teams locate tables, dashboards, and pipelines from a single searchable UI, reducing time spent searching for assets.

Amundsen

Google‑style search engine for data assets across your organization

Active development · Permissive license · Integration-friendly · Python

Why teams choose it

  • Relevance‑ranked search across tables, dashboards, ML features, and people
  • Extensible ingestion pipeline with 30+ built‑in connectors
  • Pluggable metadata stores: Neo4j, Apache Atlas, relational DBs, AWS Neptune

Watch for

Requires multiple services (frontend, search, metadata) to run

Migration highlight

Find frequently queried tables for ad‑hoc analysis

Analysts locate high‑usage tables instantly, cutting discovery time from days to minutes.

Magda

Federated data catalog with scalable search and Kubernetes‑native deployment

Active development · Permissive license · Fast to deploy · JavaScript

Why teams choose it

  • Scalable OpenSearch‑based search across federated data sources
  • Kubernetes‑orchestrated microservices with Helm charts for cloud‑agnostic deployment
  • Extensible metadata registry using dynamic JSON aspects and plug‑in connectors/minions

Watch for

Requires Kubernetes expertise for installation and upgrades

Migration highlight

Cross‑agency open data portal

Aggregates datasets from multiple government portals, providing citizens a single searchable interface.

Marquez

Centralized metadata service for data lineage and lifecycle

Active development · Permissive license · Fast to deploy · Java

Why teams choose it

  • Collects OpenLineage events via HTTP API
  • Interactive web UI with lineage graph visualization
  • Beta GraphQL endpoint for flexible metadata queries
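
To make the first point more concrete, here is a minimal sketch of the kind of OpenLineage run event a pipeline could send to Marquez over HTTP, usually as a POST to `/api/v1/lineage`. The helper name `build_run_event`, the namespace, job, and dataset names, the `producer` URI, and the spec version in `schemaURL` are all assumptions for illustration; check them against the OpenLineage spec version your deployment expects. The code only builds and prints the JSON payload.

```python
import json
import uuid
from datetime import datetime, timezone


def build_run_event(event_type, namespace, job_name, inputs, outputs, run_id=None):
    """Build a minimal OpenLineage-style run event as a dict (hypothetical helper)."""
    return {
        "eventType": event_type,  # e.g. "START" or "COMPLETE"
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": run_id or str(uuid.uuid4())},
        "job": {"namespace": namespace, "name": job_name},
        "inputs": [{"namespace": namespace, "name": name} for name in inputs],
        "outputs": [{"namespace": namespace, "name": name} for name in outputs],
        # Placeholder producer URI; identifies the code that emitted the event.
        "producer": "https://example.com/my-pipeline",
        # Assumed spec version; adjust to the one your tooling targets.
        "schemaURL": "https://openlineage.io/spec/1-0-5/OpenLineage.json#/definitions/RunEvent",
    }


event = build_run_event(
    "COMPLETE",
    namespace="analytics",            # hypothetical namespace
    job_name="daily_orders_rollup",   # hypothetical job
    inputs=["raw.orders"],
    outputs=["reporting.orders_daily"],
)
payload = json.dumps(event, indent=2)
print(payload)
```

Because Marquez ships without built-in authentication (see "Watch for" below), an endpoint accepting these events is normally kept behind a private network or a reverse proxy.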

Watch for

No built‑in authentication or authorization

Migration highlight

Data pipeline debugging

Visual lineage graphs let engineers pinpoint failing jobs and understand upstream dataset impacts.

DataHub

Unified metadata platform for modern data discovery and governance

Active development · Permissive license · Privacy-first · Java

Why teams choose it

  • Real‑time metadata graph for instant search and impact analysis
  • Extensible connectors covering databases, pipelines, and BI tools
  • Docker quickstart and Helm charts for flexible deployment

Watch for

Initial setup can require infrastructure expertise

Migration highlight

Cross‑source Data Discovery

Analysts can search across databases, dashboards, and pipelines from a single UI, reducing time to find relevant assets.

Choosing a data catalog & governance alternative

Teams replacing AWS Glue Data Catalog in data catalog and governance workflows typically weigh self-hosting needs, integration coverage, and licensing obligations.

  • 1 project lets you self-host and keep customer data on infrastructure you control.
  • 9 options are actively maintained with recent commits.

Tip: shortlist one hosted and one self-hosted option so stakeholders can compare trade-offs before migrating away from AWS Glue Data Catalog.