

Modern, low-code platform for high-performance data workflow orchestration
Apache DolphinScheduler enables agile, low-code creation of high-performance data pipelines, offering a drag-and-drop UI, a Python SDK, an Open API, and a multi-master architecture with cloud-native deployment options.

Apache DolphinScheduler is a data orchestration platform designed for engineers who need to build, version, and run complex pipelines quickly. Its low‑code approach lets users compose workflows via a drag‑and‑drop Web UI, Python SDK, or Open API, while supporting a wide range of built‑in task types such as Shell, Spark, Hive, MySQL, PostgreSQL, and Trino.
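For the code-first path, a pipeline can be declared with the community Python SDK (the pydolphinscheduler package). The sketch below is illustrative rather than canonical: it assumes a recent SDK release where the entry class is named Workflow (older releases call it ProcessDefinition), assumes the SDK is already configured to reach your API server, and uses placeholder names, commands, and schedule values.

    # Minimal pydolphinscheduler sketch; the exact API surface varies by release.
    from pydolphinscheduler.core.workflow import Workflow
    from pydolphinscheduler.tasks.shell import Shell

    # Workflow name, schedule (Quartz cron syntax), and start time are placeholders.
    with Workflow(
        name="demo_pipeline",
        schedule="0 0 0 * * ? *",
        start_time="2024-01-01",
    ) as workflow:
        extract = Shell(name="extract", command="echo extracting")
        transform = Shell(name="transform", command="echo transforming")
        load = Shell(name="load", command="echo loading")

        # Declare dependencies: extract -> transform -> load
        extract >> transform
        transform >> load

        # Push the workflow definition to the DolphinScheduler API server.
        workflow.submit()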
The system can be deployed in four modes—Standalone, Cluster, Docker, and Kubernetes—making it suitable for on‑premise, cloud, or hybrid environments. A decentralized multi‑master and multi‑worker architecture provides horizontal scaling and high availability, allowing the platform to process tens of millions of tasks per day. Features like workflow versioning, backfill, multi‑tenant isolation, and fine‑grained permission control help teams maintain reliability and governance across large, distributed data teams.
Looking for a hosted option? These are the services engineering teams benchmark against before choosing open source.
Astronomer
Managed Apache Airflow service for orchestrating and monitoring data pipelines in the cloud

Typical use cases
ETL pipeline orchestration across hybrid clouds
Automates data extraction, transformation, and loading across AWS, GCP, and on-premise clusters with versioned workflows and fail-over handling.
Batch job scheduling for machine-learning model training
Coordinates data preprocessing, model training, and evaluation tasks, leveraging parallel workers to process millions of tasks per day.
Data quality monitoring and backfill
Runs periodic data validation jobs, detects failures, and automatically backfills missing data using the built-in backfill UI.
Multi-tenant analytics platform for SaaS providers
Isolates each tenant’s workflows and data sources, enforcing permission policies while scaling horizontally as usage grows.
Frequently asked questions
How do I deploy DolphinScheduler on Kubernetes?
Use the provided Helm chart or Docker images to create master, worker, and database pods; the chart includes configuration for auto‑scaling and high availability.
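As a rough sketch of that flow, the commands below follow the 3.x source layout described in the project's Kubernetes docs; the <version> placeholder and chart paths may differ for your release.

    $ tar -zxvf apache-dolphinscheduler-<version>-src.tar.gz
    $ cd apache-dolphinscheduler-<version>-src/deploy/kubernetes/dolphinscheduler
    $ helm repo add bitnami https://charts.bitnami.com/bitnami
    $ helm dependency update .
    $ helm install dolphinscheduler . --set image.tag=<version>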
Can workflows be defined as code?
Yes, workflows can be defined and submitted via the Python SDK or the Open API, enabling code‑first pipeline development.
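For the Open API route, any HTTP client works. The sketch below is assumption-heavy: the default port 12345, the /dolphinscheduler context path, the token header, and the /projects paging endpoint reflect a 3.x deployment and may differ in other versions, and the token value is a placeholder generated in the web UI's Security Center.

    # Illustrative only; endpoint layout assumed from a 3.x deployment.
    import requests

    DS_URL = "http://localhost:12345/dolphinscheduler"  # default API server address
    TOKEN = "<access-token>"  # placeholder; create one under Security -> Token Manage

    # List projects page by page to verify the token and connectivity.
    resp = requests.get(
        f"{DS_URL}/projects",
        headers={"token": TOKEN},
        params={"pageNo": 1, "pageSize": 10},
    )
    resp.raise_for_status()
    print(resp.json())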
Which task types are supported out of the box?
Built‑in tasks include Shell, Spark, Hive, MySQL, PostgreSQL, Trino, and more; custom tasks can be added through plugins.
How does the platform achieve high availability?
Its multi‑master and multi‑worker architecture provides decentralized coordination; masters can fail over without interrupting running tasks.
Does DolphinScheduler version workflows?
Yes, each workflow and its instances are versioned, allowing rollback, audit, and reproducible execution.
Project at a glance
Active · Last synced 4 days ago