Apache DolphinScheduler

Modern, low-code platform for high-performance data workflow orchestration

Apache DolphinScheduler enables agile, low-code creation of high-performance data pipelines, offering drag-and-drop UI, Python SDK, API, and multi-master architecture with cloud-native deployment options.

Overview

Apache DolphinScheduler is a data orchestration platform designed for engineers who need to build, version, and run complex pipelines quickly. Its low‑code approach lets users compose workflows via a drag‑and‑drop Web UI, Python SDK, or Open API, while supporting a wide range of built‑in task types such as Shell, Spark, Hive, MySQL, PostgreSQL, and Trino.
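
Workflows defined through the Python SDK read much like ordinary Python. The sketch below is illustrative only, based on the pydolphinscheduler tutorial: class names such as Workflow and Shell, the submit() call, and the schedule values may differ between SDK versions.

    # Illustrative sketch with the pydolphinscheduler SDK; verify names
    # against the version you install (pip install apache-dolphinscheduler).
    from pydolphinscheduler.core.workflow import Workflow
    from pydolphinscheduler.tasks.shell import Shell

    with Workflow(name="etl_demo",
                  schedule="0 0 0 * * ? *",   # daily at midnight (cron expression)
                  start_time="2024-01-01") as workflow:
        extract = Shell(name="extract", command="echo extracting")
        load = Shell(name="load", command="echo loading")
        extract >> load      # load runs only after extract succeeds
        workflow.submit()    # register the workflow with the API server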

Deployment & Scalability

The system can be deployed in four modes (Standalone, Cluster, Docker, and Kubernetes), making it suitable for on‑premises, cloud, or hybrid environments. A decentralized multi‑master and multi‑worker architecture provides horizontal scaling and high availability, allowing the platform to process tens of millions of tasks per day. Features like workflow versioning, backfill, multi‑tenant isolation, and fine‑grained permission control help teams maintain reliability and governance across large, distributed data teams.

Highlights

Four deployment modes: Standalone, Cluster, Docker, Kubernetes
Drag-and-drop Web UI plus Python SDK and Open API
Decentralized multi-master/worker architecture with horizontal scaling
Workflow versioning, backfill, and fine-grained permission control

Pros

  • High throughput – handles tens of millions of tasks daily
  • Cloud‑native deployment support across multiple clouds
  • Robust multi‑tenant isolation
  • Extensible with custom task types

Considerations

  • Java‑centric codebase may steepen the learning curve for non‑Java teams
  • Cluster setup can be complex for large deployments
  • Limited built‑in connectors compared to some competitors
  • Documentation can be fragmented across UI, SDK, and API

Managed products teams compare with

When teams consider Apache DolphinScheduler, these hosted platforms usually appear on the same shortlist.

Astronomer

Managed Apache Airflow service for orchestrating and monitoring data pipelines in the cloud

Dagster

Data orchestration framework for building reliable pipelines

ServiceNow

Enterprise workflow and IT service management

Looking for a hosted option? These are the services engineering teams benchmark against before choosing open source.

Fit guide

Great for

  • Data engineering teams needing scalable, low-code pipeline authoring
  • Enterprises requiring multi-tenant isolation and fine-grained permissions
  • Organizations deploying on Kubernetes or Docker environments
  • Projects that benefit from workflow versioning and backfill

Not ideal when

  • Small scripts that only need simple cron-style scheduling
  • Teams without Java expertise seeking pure Python solutions
  • Environments where a lightweight scheduler is preferred over a full orchestration platform
  • Use cases demanding an extensive out‑of‑the‑box connector library

How teams use it

ETL pipeline orchestration across hybrid clouds

Automates data extraction, transformation, and loading across AWS, GCP, and on‑premises clusters with versioned workflows and failover handling.

Batch job scheduling for machine-learning model training

Coordinates data preprocessing, model training, and evaluation tasks, leveraging parallel workers to process millions of tasks per day.

Data quality monitoring and backfill

Runs periodic data validation jobs, detects failures, and automatically backfills missing data using the built-in backfill UI.

Multi-tenant analytics platform for SaaS providers

Isolates each tenant’s workflows and data sources, enforcing permission policies while scaling horizontally as usage grows.

Tech snapshot

Java 83%
TypeScript 16%
HCL 1%
SCSS 1%
Shell 1%
PLpgSQL 1%

Tags

data-pipelines, workflow, airflow, workflow-schedule, cloud-native, orchestration, task-scheduler, job-scheduler, azkaban, workflow-orchestration, powerful-data-pipelines

Frequently asked questions

How do I deploy DolphinScheduler in a Kubernetes cluster?

Use the provided Helm chart or Docker images to create master, worker, and database pods; the chart includes configuration for auto‑scaling and high availability.

Can I create workflows programmatically?

Yes, workflows can be defined and submitted via the Python SDK or the Open API, enabling code‑first pipeline development.
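
For teams that prefer the Open API, a plain HTTP call works as well. The following sketch assumes a locally running API server; the port, endpoint path, token header, and response shape are assumptions to verify against the API documentation for your server version.

    # Hedged sketch: list projects via the DolphinScheduler Open API.
    import requests

    BASE = "http://localhost:12345/dolphinscheduler"  # assumed API server address
    headers = {"token": "YOUR_ACCESS_TOKEN"}          # token generated in Security settings

    resp = requests.get(f"{BASE}/projects",
                        params={"pageNo": 1, "pageSize": 10},
                        headers=headers)
    resp.raise_for_status()
    for project in resp.json()["data"]["totalList"]:  # response shape is an assumption
        print(project["name"])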

What task types are supported out of the box?

Built‑in tasks include Shell, Spark, Hive, MySQL, PostgreSQL, Trino, and more; custom tasks can be added through plugins.
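
Different task types can be mixed in a single workflow. Below is a hedged sketch combining a Shell task with a SQL task via the Python SDK; the Sql module path, the datasource name, and the constructor arguments are assumptions to check against your SDK version.

    # Hedged sketch: mixing built-in task types in one workflow.
    from pydolphinscheduler.core.workflow import Workflow
    from pydolphinscheduler.tasks.shell import Shell
    from pydolphinscheduler.tasks.sql import Sql   # module path is an assumption

    with Workflow(name="mixed_tasks") as workflow:
        prepare = Shell(name="prepare", command="echo staging files ready")
        load = Sql(name="load",
                   datasource_name="analytics_pg",  # hypothetical datasource
                   sql="INSERT INTO fact_sales SELECT * FROM staging_sales")
        prepare >> load
        workflow.submit()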

How does DolphinScheduler ensure high availability?

Its multi‑master and multi‑worker architecture provides decentralized coordination; masters can fail over without interrupting running tasks.

Is there built-in support for workflow versioning?

Yes, each workflow and its instances are versioned, allowing rollback, audit, and reproducible execution.

Project at a glance

Active
Stars
14,107
Watchers
14,107
Forks
4,979
License
Apache-2.0
Repo age
6 years old
Last commit
11 hours ago
Primary language
Java

Last synced 3 hours ago