

Mara Pipelines
Lightweight Python ETL framework with PostgreSQL and web UI
A transparent data transformation framework that defines pipelines as Python code, uses PostgreSQL for processing, and provides an extensive web interface for debugging and execution.
Mara Pipelines is a lightweight data transformation framework designed for teams building ETL workflows who value transparency and simplicity over distributed complexity. It positions itself between plain scripts and heavyweight orchestrators like Apache Airflow.
Pipelines, tasks, and commands are defined declaratively in Python code. The framework uses PostgreSQL as its data processing engine and relies on command-line tools rather than in-app data processing. Execution follows GNU make semantics: a node runs once all of its upstream nodes have completed, so dependencies express ordering rather than data flow. Single-machine execution via Python's multiprocessing eliminates the need for distributed task queues and makes debugging straightforward.
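The make-style execution model can be sketched with the standard library's graphlib (a simplified illustration, not Mara's actual API; the pipeline shape and node names are invented):

```python
# Illustration of make-style scheduling: each node runs only after all of
# its upstream dependencies have completed. Dependencies express ordering,
# not data flow, so a "runner" only needs a topological order.
from graphlib import TopologicalSorter

# Hypothetical pipeline: extract -> clean -> {aggregate, export}
dependencies = {
    'clean': {'extract'},
    'aggregate': {'clean'},
    'export': {'clean'},
}

def run_pipeline(deps):
    order = []
    for node in TopologicalSorter(deps).static_order():
        # A real runner would execute the node's commands here;
        # static_order() guarantees upstream nodes always come first.
        order.append(node)
    return order
```

`run_pipeline(dependencies)` always yields `extract` before `clean`, and `clean` before both `aggregate` and `export`.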
The extensive web UI serves as the primary interface for inspecting, running, and debugging pipelines. Each pipeline displays dependency graphs, 30-day runtime charts, node priority tables, and execution logs. Tasks show upstream/downstream relationships, historical performance, and command output. Cost-based priority queues automatically run expensive nodes first based on recorded runtimes.
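The cost-based priority queue can be sketched with stdlib heapq (a simplified illustration, not Mara's internals; node names and runtimes are invented):

```python
# Among nodes whose dependencies are satisfied, dispatch the one with the
# longest recorded runtime first, so expensive work starts early.
import heapq

# Hypothetical recorded runtimes in seconds
recorded_runtimes = {'load_facts': 620.0, 'load_dims': 45.0, 'build_report': 12.0}

def dispatch_order(ready_nodes, runtimes):
    # max-heap via negated cost: the most expensive node pops first
    heap = [(-runtimes.get(node, 0.0), node) for node in ready_nodes]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]

print(dispatch_order(['build_report', 'load_facts', 'load_dims'], recorded_runtimes))
# → ['load_facts', 'load_dims', 'build_report']
```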
Install via pip and integrate the UI into a Flask application. A PostgreSQL database stores runtime information and incremental-processing state. Because the framework makes heavy use of forking, it does not run natively on Windows; use Docker or WSL instead.
Daily data warehouse refresh
Schedule nightly ETL jobs that extract from sources, transform via SQL, and load to PostgreSQL with automatic retry and performance tracking
Multi-stage reporting pipeline
Chain data extraction, cleaning, aggregation, and export tasks with dependency management and visual progress monitoring
Database migration orchestration
Coordinate complex schema changes and data backfills across multiple tables with rollback-friendly task isolation
Incremental data synchronization
Track file dependencies and timestamps to process only changed data sources, reducing runtime and resource consumption
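The incremental-synchronization idea above can be sketched with timestamp tracking (a standalone illustration, not Mara's API; in Mara Pipelines this kind of incremental-processing state lives in PostgreSQL):

```python
# Remember each input file's modification time and reprocess only files
# that are new or have changed since the last run.
import os

def files_to_process(paths, state):
    """Return the paths whose mtime differs from the recorded state,
    updating the state in place."""
    changed = []
    for path in paths:
        mtime = os.path.getmtime(path)
        if state.get(path) != mtime:
            changed.append(path)
            state[path] = mtime
    return changed
```

A second run with an unchanged state dictionary returns an empty list, so untouched sources are skipped entirely.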
Does Mara Pipelines support distributed execution? No. It uses Python's multiprocessing for single-machine execution, eliminating the need for distributed task queues and simplifying debugging.
Does it run on Windows? Not natively, due to its heavy use of forking. Use Docker or the Windows Subsystem for Linux (WSL) to run it on Windows.
Which database does it require? PostgreSQL is recommended for storing runtime information, execution logs, and incremental-processing state; the framework is designed with PostgreSQL as its data processing engine.
How does cost-based scheduling work? Mara Pipelines records historical runtimes for each node and schedules higher-cost (longer-running) nodes first to optimize overall pipeline completion time.
Can the web UI be embedded in an existing Flask app? Yes. The framework provides web UI components designed for Flask integration; see the mara example projects for implementation patterns.
Project at a glance
Dormant · Last synced 4 days ago