
Alation
Data catalog platform for data discovery, governance, and lineage

Federated data catalog with scalable search and Kubernetes‑native deployment
Magda unifies datasets, files, APIs, and databases across an organization, offering federated search, automated metadata enrichment, and cloud‑agnostic deployment via Kubernetes.

Magda provides a single, federated view of all data assets—whether they reside in files, databases, APIs, or external portals. By crawling sources, enriching metadata, and tracking changes, it enables users to discover, prioritize, and trust the data they need.
Built as Kubernetes‑orchestrated microservices, Magda is deployed via Helm charts and runs on any cloud or on‑premises environment. Its unopinionated Registry stores records as JSON aspects, while connectors and minions—packaged as Docker images—allow custom ingestion, validation, and enrichment in any language. Search is powered by OpenSearch, delivering fast, scalable results.
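To make the "records as JSON aspects" idea concrete, here is a minimal sketch of a record that carries several independent aspects. The record shape, the helper functions, and the aspect names (`basic-info`, `link-check`) are assumptions chosen for illustration, not Magda's actual schema or API.

```python
import json


def make_record(record_id, name):
    """Create an empty record: an ID, a display name, and no aspects yet."""
    return {"id": record_id, "name": name, "aspects": {}}


def put_aspect(record, aspect_id, data):
    """Return a copy of the record with one aspect attached or replaced.

    Each aspect is an independent JSON document keyed by its name, so
    different services can write different aspects of the same record.
    """
    aspects = dict(record["aspects"])
    aspects[aspect_id] = data
    return {**record, "aspects": aspects}


# Hypothetical aspect names and contents, for illustration only.
record = make_record("ds-1", "Rainfall observations")
record = put_aspect(record, "basic-info", {"title": "Rainfall observations", "keywords": ["weather"]})
record = put_aspect(record, "link-check", {"status": "active"})

print(json.dumps(record, indent=2))
```

Keeping each aspect as a separate document means a connector can own the descriptive metadata while an enrichment service owns the link-check result, without either needing to know the other's schema.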
Used in production by data.gov.au, Magda supports federated authentication (Google, Facebook, WSFed, AAF, CKAN, custom) and is designed for large, heterogeneous environments. Ongoing development adds automated cataloguing, policy‑based authorization with OPA, and native dataset storage.
Cross‑agency open data portal
Aggregates datasets from multiple government portals, providing citizens a single searchable interface.
Enterprise data discovery platform
Indexes internal databases, file shares, and APIs, enabling analysts to locate and assess data assets quickly.
Automated metadata enrichment pipeline
Minions validate links, assess data quality, and enrich records, improving search relevance without manual effort.
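The link-validation step of such a pipeline might look roughly like the sketch below. It is not Magda's minion implementation: the record shape, the `distribution` and `link-status` aspect names, and the `fetch_status` callback are all hypothetical. The HTTP call is injected as a function so the logic can be exercised without network access.

```python
from urllib.parse import urlparse


def classify_link(url, fetch_status):
    """Classify one URL.

    fetch_status is a caller-supplied function (url -> HTTP status code);
    it is an assumption of this sketch, not a real Magda interface.
    """
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return "invalid"
    try:
        status = fetch_status(url)
    except OSError:
        return "unreachable"
    return "active" if 200 <= status < 400 else "broken"


def enrich(record, fetch_status):
    """Return a copy of the record with a hypothetical 'link-status' aspect added."""
    links = record["aspects"].get("distribution", {}).get("urls", [])
    report = {url: classify_link(url, fetch_status) for url in links}
    return {**record, "aspects": {**record["aspects"], "link-status": report}}


def fake_fetch(url):
    """Stand-in for a real HTTP client, used only for this example."""
    if "gone" in url:
        return 404
    if "down" in url:
        raise OSError("connection refused")
    return 200


record = {
    "id": "ds-1",
    "aspects": {
        "distribution": {
            "urls": [
                "https://example.org/data.csv",
                "https://example.org/gone.csv",
                "https://down.example.org/x",
                "ftp://example.org/file",
            ]
        }
    },
}
enriched = enrich(record, fake_fetch)
for url, state in enriched["aspects"]["link-status"].items():
    print(state, url)
```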
Custom compliance enforcement
Integrates Open Policy Agent to restrict dataset visibility based on user roles and regulatory rules.
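As an illustration of the kind of decision such a policy encodes, the sketch below filters datasets by the viewer's roles. Real OPA policies are written in Rego and evaluated by the OPA engine; this Python version only mirrors the decision logic, and the role names, classification levels, and field names are assumptions made for the example.

```python
# Hypothetical ordering of data classifications, lowest to highest sensitivity.
CLASSIFICATION_LEVEL = {"public": 0, "internal": 1, "restricted": 2}

# Hypothetical mapping from role to the highest classification it may view.
ROLE_CLEARANCE = {"anonymous": 0, "staff": 1, "compliance": 2}


def max_clearance(user):
    """Highest clearance granted by any of the user's roles (0 if none match)."""
    return max((ROLE_CLEARANCE.get(role, 0) for role in user["roles"]), default=0)


def can_view(user, dataset):
    """Owners always see their own datasets; everyone else needs enough clearance."""
    if dataset.get("owner") == user["id"]:
        return True
    return max_clearance(user) >= CLASSIFICATION_LEVEL[dataset["classification"]]


def visible_datasets(user, datasets):
    """Filter a dataset list down to what this user is allowed to see."""
    return [d for d in datasets if can_view(user, d)]


datasets = [
    {"id": "a", "classification": "public", "owner": "u9"},
    {"id": "b", "classification": "internal", "owner": "u9"},
    {"id": "c", "classification": "restricted", "owner": "u1"},
]
analyst = {"id": "u1", "roles": ["staff"]}
visitor = {"id": "u2", "roles": ["anonymous"]}

print([d["id"] for d in visible_datasets(analyst, datasets)])
print([d["id"] for d in visible_datasets(visitor, datasets)])
```

Centralizing the rule in one function is the point of the exercise: search and record lookup can both call the same check instead of each reimplementing the visibility rules.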
Magda runs as a set of Docker containers orchestrated by Kubernetes; a Helm chart is provided for installation on any K8s cluster, including local Minikube.
Authentication is federated through Passport.js, supporting providers such as Google, Facebook, WSFed, AAF, CKAN, and custom OAuth/OpenID Connect services.
Custom connectors are supported: each new connector is implemented as a Docker‑based microservice that crawls the source and imports metadata into the registry.
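The crawl-and-import flow of a connector can be sketched as a paging loop plus a mapping step. Everything here is an assumption made for illustration: the `fetch_page(offset, limit)` callback stands in for a source system's API, and the output record shape and aspect names are hypothetical rather than the registry's real schema.

```python
def crawl(fetch_page, page_size=2):
    """Yield every item from a paged source until an empty page is returned."""
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        if not page:
            return
        yield from page
        offset += len(page)


def to_record(source_id, item):
    """Map one source item to a registry-style record (hypothetical shape)."""
    return {
        # Prefix the ID with the source so records from different sources cannot collide.
        "id": f"{source_id}-{item['id']}",
        "name": item["title"],
        "aspects": {
            "source": {"id": source_id, "url": item["url"]},
            "dataset-info": {"title": item["title"], "description": item.get("notes", "")},
        },
    }


# In-memory stand-in for a remote portal, used only for this example.
SOURCE_ITEMS = [
    {"id": "1", "title": "Roads", "url": "https://portal.example.org/d/1", "notes": "Road network"},
    {"id": "2", "title": "Parks", "url": "https://portal.example.org/d/2"},
    {"id": "3", "title": "Schools", "url": "https://portal.example.org/d/3"},
]


def fetch_page(offset, limit):
    return SOURCE_ITEMS[offset:offset + limit]


records = [to_record("portal-a", item) for item in crawl(fetch_page)]
for r in records:
    print(r["id"], r["name"])
```

In a real connector the final step would send each record to the registry over its API rather than printing it; that call is omitted here because its exact form is not described in this document.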
Dataset storage is currently under development; Magda presently catalogs metadata and links to external data locations.
All searchable aspects are indexed in an OpenSearch cluster, delivering fast, scalable full‑text and faceted search.
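A combined full-text and faceted request can be expressed with the standard OpenSearch query DSL, as in the sketch below. The query clauses (`bool`, `multi_match`, `term`, `terms` aggregation) are standard DSL, but the field names (`title`, `description`, `publisher.keyword`, `format.keyword`) are assumptions and do not reflect Magda's actual index mapping.

```python
import json


def build_search_body(text, publisher=None, size=10):
    """Build a search request body: full-text match, optional filter, one facet."""
    query = {
        "bool": {
            "must": [
                # Full-text match across two fields; "^2" boosts title matches.
                {"multi_match": {"query": text, "fields": ["title^2", "description"]}}
            ]
        }
    }
    if publisher:
        # Filters narrow the result set without affecting relevance scoring.
        query["bool"]["filter"] = [{"term": {"publisher.keyword": publisher}}]
    return {
        "size": size,
        "query": query,
        # A terms aggregation returns per-value counts, which a UI can show as facets.
        "aggs": {"by_format": {"terms": {"field": "format.keyword", "size": 5}}},
    }


body = build_search_body("rainfall", publisher="Bureau of Examples")
print(json.dumps(body, indent=2))
```

The resulting dictionary would be sent as the JSON body of a search request; how Magda constructs its own queries internally is not covered by this document.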