
Alation
Data catalog platform for data discovery, governance, and lineage

Google‑style search engine for data assets across your organization
Amundsen indexes tables, dashboards, ML features, and people, delivering relevance‑ranked search powered by usage patterns, with a Flask/React UI and integrations for major data stores.

Amundsen is a data discovery and metadata engine that helps analysts, data scientists, and engineers locate the data they need quickly. By indexing tables, dashboards, ML features, and people, it provides a Google‑style search experience where frequently used assets surface first.
The platform consists of three microservices: a frontend (Flask + React), a search service (Elasticsearch), and a metadata service (Neo4j, Apache Atlas, relational DBs, or AWS Neptune), plus an ingestion library (databuilder). Ingestion can be run via Python scripts or Airflow DAGs, with over 30 connectors such as Hive, Redshift, Snowflake, and BigQuery. Deployment requires Python ≥ 3.8 and Node 12, and each service can be containerized for scalable operation.
Amundsen is designed for organizations with diverse data ecosystems that need a unified, extensible catalog. The active LF AI & Data community provides support, documentation, and a Slack channel for collaboration.
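To make the idea of usage-ranked search concrete, here is a minimal sketch in plain Python. It is not Amundsen's or Elasticsearch's actual scoring: the Asset class, the query_count field, and the scoring formula are all hypothetical, chosen only to show how a usage signal can lift frequently queried assets above rarely used ones that match the same search term.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Asset:
    name: str
    description: str
    query_count: int  # hypothetical usage signal: how often the asset is queried


def score(asset: Asset, query: str) -> float:
    """Combine a crude substring match with a log-damped usage boost."""
    terms = query.lower().split()
    text = f"{asset.name} {asset.description}".lower()
    matched = sum(1 for term in terms if term in text)
    if matched == 0:
        return 0.0
    # log1p damps the boost so popularity cannot fully override text relevance
    return matched * (1.0 + math.log1p(asset.query_count))


def search(assets: List[Asset], query: str) -> List[Asset]:
    """Return matching assets, highest score first."""
    scored = [(score(asset, query), asset) for asset in assets]
    hits = [pair for pair in scored if pair[0] > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [asset for _, asset in hits]


# Toy catalog with made-up usage counts
CATALOG = [
    Asset("orders_raw", "unprocessed order events", 3),
    Asset("orders", "cleaned order facts", 1200),
    Asset("users", "registered user accounts", 800),
]

if __name__ == "__main__":
    for asset in search(CATALOG, "order"):
        print(asset.name, asset.query_count)
```

In this toy catalog both orders tables match the term "order", and the heavily queried one ranks first. A real deployment would delegate matching and scoring to the search service rather than to application code like this.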
Find frequently queried tables for ad‑hoc analysis
Analysts locate high‑usage tables instantly, cutting discovery time from days to minutes.
Catalog dashboards across BI tools
Data consumers search and navigate to relevant Superset, Tableau, or Looker dashboards.
Expose ML feature metadata to model developers
Feature engineers retrieve feature definitions and lineage, ensuring consistency across models.
Integrate with Airflow to keep metadata fresh
Automated DAGs run databuilder jobs, continuously updating the search index as new tables appear.
The metadata service can use Neo4j, Apache Atlas, relational databases via SQLAlchemy, or AWS Neptune through the Gremlin library.
The search service uses Elasticsearch and ranks results by usage signals such as query frequency.
Custom data sources are supported: the databuilder library lets you write a Python extractor and loader for any dbapi or SQLAlchemy‑compatible source.
The architecture comprises the frontend (Flask + React), the search service, the metadata service, and the ingestion library; the three services each run as a separate microservice.
Amundsen is hosted by the LF AI & Data Foundation; users can join the Slack workspace and contribute via GitHub.
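The extractor and loader pattern mentioned above for custom sources can be sketched with nothing but the standard library. This is an illustration under stated assumptions, not the databuilder API: the class names SqliteTableExtractor and InMemoryLoader and the run_job helper are hypothetical, and the real library provides its own base classes and job wiring. SQLite stands in for a dbapi-compatible source.

```python
import sqlite3
from typing import Dict, Iterator, List


class SqliteTableExtractor:
    """Hypothetical extractor: yields one record per column of every table."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn

    def extract(self) -> Iterator[Dict[str, str]]:
        tables = self.conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
        for (table,) in tables:
            # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk)
            for row in self.conn.execute(f"PRAGMA table_info('{table}')"):
                yield {"table": table, "column": row[1], "type": row[2]}


class InMemoryLoader:
    """Hypothetical loader: collects records instead of writing to a metadata store."""

    def __init__(self) -> None:
        self.records: List[Dict[str, str]] = []

    def load(self, record: Dict[str, str]) -> None:
        self.records.append(record)


def run_job(extractor: SqliteTableExtractor, loader: InMemoryLoader) -> int:
    """Stream every extracted record into the loader and return the count."""
    count = 0
    for record in extractor.extract():
        loader.load(record)
        count += 1
    return count


def demo() -> List[Dict[str, str]]:
    """Build a throwaway in-memory database and run one extract/load pass."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
    loader = InMemoryLoader()
    run_job(SqliteTableExtractor(conn), loader)
    conn.close()
    return loader.records


if __name__ == "__main__":
    for record in demo():
        print(record)
```

Keeping extraction and loading behind separate small interfaces is what lets the same job be triggered from a one-off script or from a scheduler; swapping the source or the destination only touches one class.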