Open-source alternatives to AWS Glue Data Catalog

Compare community-driven replacements for AWS Glue Data Catalog in data catalog and governance workflows. We curate actively maintained options with transparent licensing so you can evaluate the right fit quickly.

AWS Glue Data Catalog

AWS Glue Data Catalog stores and organizes table metadata, auto-discovers data with crawlers, computes column statistics, and tracks lineage for ETL and analytics.

Data Catalogs & Governance

Key stats

9 alternatives
1 supports self-hosting (run on infrastructure you control)
9 in active development (recent commits in the last 6 months)
8 with permissive licenses (MIT, Apache, and similar licenses)

Counts reflect projects currently indexed as alternatives to AWS Glue Data Catalog.

All open-source alternatives

Amundsen

Google‑style search engine for data assets across your organization

Active development, Permissive license, Integration-friendly, Python

Why teams choose it

Relevance‑ranked search across tables, dashboards, ML features, and people
Extensible ingestion pipeline with 30+ built‑in connectors
Pluggable metadata stores: Neo4j, Apache Atlas, relational DBs, AWS Neptune

Watch for

Requires multiple services (frontend, search, metadata) to run

Migration highlight

Find frequently queried tables for ad‑hoc analysis

Analysts locate high‑usage tables instantly, cutting discovery time from days to minutes.

Apache Atlas

Unified metadata governance for Hadoop and enterprise data ecosystems

Active development, Permissive license, Fast to deploy, Java

Why teams choose it

Centralized metadata store with shared access
Automated lineage tracking enriched by business taxonomies
Integrated security via Apache Ranger (RBAC & ABAC)

Watch for

Primarily focused on Hadoop ecosystems

Migration highlight

Regulatory compliance reporting

Generate audit‑ready lineage reports to demonstrate data handling compliance.

Apache Gravitino

Geo-distributed federated metadata lake for unified data governance

Self-host friendly, Active development, Permissive license, Java

Why teams choose it

Unified API for managing metadata across Hive, MySQL, HDFS, S3, and more
Geo-distributed architecture for multi-region and multi-cloud metadata sharing
Direct connector integration with immediate reflection of upstream changes

Watch for

Windows builds are not currently supported

Migration highlight

Multi-Cloud Data Lake Federation

Unified metadata access across AWS S3, Azure Data Lake, and on-premises HDFS, enabling cross-cloud analytics without data migration.

CKAN

Powerful platform for publishing, sharing, and managing open data

Active development, Integration-friendly, AI-powered workflows, Python

Why teams choose it

Rich searchable catalog with versioning
Full RESTful API for metadata and resources
Built‑in data visualisation widgets

Watch for

Requires Python and PostgreSQL expertise to deploy

Migration highlight

National open data portal

Provides a centralized catalog, public API, and searchable interface for all government datasets, improving transparency and citizen engagement.
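
As a rough illustration of the RESTful API mentioned above, the sketch below builds a dataset search request against CKAN's Action API (`package_search`) and extracts titles from a response. The host `ckan.example.org`, the helper names, and the canned response are placeholders for illustration, not real data; check your CKAN version's API documentation before relying on the details.

```python
import json
from urllib.parse import urlencode


def build_search_url(base_url: str, query: str, rows: int = 5) -> str:
    """Build a package_search URL for a CKAN site (base_url is a placeholder)."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{params}"


def dataset_titles(response_body: str) -> list[str]:
    """Pull dataset titles out of an Action API JSON response body."""
    payload = json.loads(response_body)
    if not payload.get("success"):
        return []
    # Fall back to the dataset's URL-safe name when no title is set.
    return [d.get("title") or d.get("name", "") for d in payload["result"]["results"]]


# Canned response in the general shape the Action API returns; illustrative only.
SAMPLE_RESPONSE = json.dumps({
    "success": True,
    "result": {
        "count": 2,
        "results": [
            {"name": "air-quality-2023", "title": "Air Quality 2023"},
            {"name": "road-traffic-counts", "title": ""},
        ],
    },
})

if __name__ == "__main__":
    print(build_search_url("https://ckan.example.org", "air quality"))
    print(dataset_titles(SAMPLE_RESPONSE))
```

To query a live portal, the URL can be fetched with any HTTP client and the response body passed to `dataset_titles`.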

DataHub

Unified metadata platform for modern data discovery and governance

Active development, Permissive license, Privacy-first, Python

Why teams choose it

Real‑time metadata graph for instant search and impact analysis
Extensible connectors covering databases, pipelines, and BI tools
Docker quickstart and Helm charts for flexible deployment

Watch for

Initial setup can require infrastructure expertise

Migration highlight

Cross‑source Data Discovery

Analysts can search across databases, dashboards, and pipelines from a single UI, reducing time to find relevant assets.

Magda

Federated data catalog with scalable search and Kubernetes‑native deployment

Active development, Permissive license, Fast to deploy, JavaScript

Why teams choose it

Scalable OpenSearch‑based search across federated data sources
Kubernetes‑orchestrated microservices with Helm charts for cloud‑agnostic deployment
Extensible metadata registry using dynamic JSON aspects and plug‑in connectors/minions

Watch for

Requires Kubernetes expertise for installation and upgrades

Migration highlight

Cross‑agency open data portal

Aggregates datasets from multiple government portals, providing citizens a single searchable interface.

Marquez

Centralized metadata service for data lineage and lifecycle

Active development, Permissive license, Fast to deploy, Java

Why teams choose it

Collects OpenLineage events via HTTP API
Interactive web UI with lineage graph visualization
Beta GraphQL endpoint for flexible metadata queries

Watch for

No built‑in authentication or authorization

Migration highlight

Data pipeline debugging

Visual lineage graphs let engineers pinpoint failing jobs and understand upstream dataset impacts.
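
To make the "OpenLineage events via HTTP API" point above more concrete, here is a minimal sketch of the kind of run event a pipeline might send to Marquez. It assumes the lineage endpoint is `POST /api/v1/lineage` and that the OpenLineage spec version in `schemaURL` suits your deployment; the job, dataset, namespace, and producer values are hypothetical. The code only builds the payload and does not send it.

```python
import json
import uuid
from datetime import datetime, timezone

# Assumed Marquez lineage endpoint path; verify against your deployment.
LINEAGE_PATH = "/api/v1/lineage"


def build_run_event(event_type, job_name, inputs, outputs,
                    namespace="example-namespace", run_id=None):
    """Build an OpenLineage-style run event as a plain dict (illustrative names)."""
    return {
        "eventType": event_type,  # e.g. "START", "COMPLETE", "FAIL"
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": run_id or str(uuid.uuid4())},
        "job": {"namespace": namespace, "name": job_name},
        "inputs": [{"namespace": namespace, "name": name} for name in inputs],
        "outputs": [{"namespace": namespace, "name": name} for name in outputs],
        # Hypothetical producer URI identifying the emitting system.
        "producer": "https://example.com/hypothetical-producer",
        # Spec version is an assumption; use the one your tooling targets.
        "schemaURL": "https://openlineage.io/spec/1-0-5/OpenLineage.json#/definitions/RunEvent",
    }


if __name__ == "__main__":
    event = build_run_event("COMPLETE", "daily_orders",
                            inputs=["raw.orders"], outputs=["analytics.orders"])
    # The JSON body below is what would be POSTed to LINEAGE_PATH.
    print(json.dumps(event, indent=2))
```

Sending a START event and a later COMPLETE event with the same `runId` is what lets the lineage graph tie a job run to its input and output datasets.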

ODD Platform

Unified data discovery, lineage, and observability platform

Active development, Permissive license, Fast to deploy, Java

Why teams choose it

Federated data catalog with end‑to‑end lineage across heterogeneous sources
Built‑in data quality dashboard compatible with Great Expectations and dbt
Automatic ML experiment tracking and parameter logging

Watch for

Requires a PostgreSQL backend to store metadata

Migration highlight

Accelerate dashboard creation

Analysts quickly locate source tables and understand lineage, reducing time to build reliable BI reports.

OpenMetadata

Unified platform for data discovery, governance, and observability

Active development, Permissive license, Privacy-first, TypeScript

Why teams choose it

Unified data catalog with keyword search and advanced queries
Column‑level lineage visualization with a no‑code editor for manual edits
84+ pluggable connectors for diverse data services

Watch for

Self‑hosting requires operational expertise and resources

Migration highlight

Centralized Data Catalog

Teams locate tables, dashboards, and pipelines from a single searchable UI, reducing time spent searching for assets.

Choosing a data catalog & governance alternative

Teams replacing AWS Glue Data Catalog in data catalog and governance workflows typically weigh self-hosting needs, integration coverage, and licensing obligations.

1 project lets you self-host and keep customer data on infrastructure you control.
9 options are actively maintained with recent commits.

Tip: shortlist one hosted and one self-hosted option so stakeholders can compare trade-offs before migrating away from AWS Glue Data Catalog.

Common questions

What languages or platforms does OpenMetadata support?

It offers 84+ connectors for databases, data warehouses, dashboard services, messaging systems, and pipeline tools, plus APIs for custom integration.

Answer surfaced from OpenMetadata

What programming languages are used in DataHub?

The core platform is written in Java, with supporting services in Python, TypeScript, and Scala.

Answer surfaced from DataHub

What programming language is CKAN built with?

CKAN is written primarily in Python and uses PostgreSQL for data storage.

Answer surfaced from CKAN