Pathway

Unified Python framework for real‑time, batch, and LLM pipelines

Pathway lets Python developers build scalable batch and streaming pipelines, add AI/LLM steps, and deploy via Docker or Kubernetes, leveraging a high‑performance Rust engine.

Overview

Highlights

  • Pythonic API backed by a Rust differential dataflow engine
  • Unified batch‑and‑stream processing with stateful transforms
  • 300+ connectors via Airbyte, plus native connectors for Kafka, PostgreSQL, Google Drive, and SharePoint
  • Built‑in LLM/RAG utilities and an in‑memory vector index

Pros

  • High throughput thanks to the Rust engine
  • Same code runs in development, CI, and production
  • Extensive connector ecosystem
  • Native support for LLM and RAG workflows

Considerations

  • Officially supports only macOS and Linux (Windows requires a VM)
  • Exactly‑once consistency is available only in the enterprise edition
  • Requires Python 3.10+
  • Learning curve around differential dataflow concepts

Managed products teams compare it with

When teams consider Pathway, these hosted platforms usually appear on the same shortlist.

Aiven for Apache Flink

Fully managed Apache Flink service by Aiven.

Amazon Managed Service for Apache Flink

Serverless Apache Flink for real-time stream processing on AWS.

Azure Stream Analytics

Serverless real-time analytics with SQL on streams.

Fit guide

Great for

  • Data teams building real‑time analytics pipelines
  • ML engineers integrating LLMs with live data
  • Organizations deploying pipelines on Docker/Kubernetes
  • Projects needing both batch and streaming with a single codebase

Not ideal when

  • You only need simple one‑off scripts with no performance constraints
  • You work in Windows‑only environments without VM access
  • You need a built‑in GUI for ETL design
  • You require out‑of‑the‑box exactly‑once guarantees without an enterprise license

How teams use it

Real‑time ETL from Kafka to PostgreSQL

Continuously ingest Kafka events, transform them, and upsert the results into PostgreSQL with at‑least‑once guarantees.
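
At‑least‑once delivery means an event may be replayed, for example after a restart, so the write to the database should be idempotent. The sketch below illustrates that idea only; it is not Pathway's connector API. It uses Python's built‑in sqlite3 module as a stand‑in for PostgreSQL (which accepts the same ON CONFLICT ... DO UPDATE clause), and the `orders` table, its columns, and the `upsert_event` helper are hypothetical.

```python
import sqlite3


def upsert_event(conn, event):
    """Insert the event, or overwrite the existing row that has the same key."""
    conn.execute(
        """
        INSERT INTO orders (order_id, status, amount)
        VALUES (:order_id, :status, :amount)
        ON CONFLICT(order_id) DO UPDATE SET
            status = excluded.status,
            amount = excluded.amount
        """,
        event,
    )


# In-memory SQLite database standing in for the PostgreSQL sink (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id TEXT PRIMARY KEY, status TEXT, amount REAL)"
)

events = [
    {"order_id": "a1", "status": "new", "amount": 10.0},
    {"order_id": "a1", "status": "paid", "amount": 10.0},
    {"order_id": "a1", "status": "paid", "amount": 10.0},  # redelivered duplicate
]
for event in events:
    upsert_event(conn, event)
conn.commit()

rows = conn.execute("SELECT order_id, status, amount FROM orders").fetchall()
```

Because each write is keyed on `order_id`, the redelivered event leaves the table unchanged: `rows` holds a single row for `a1` in its latest state.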

Event‑driven alerting pipeline

Detect threshold breaches in streaming data and push alerts to Slack or email in seconds.
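
The detection logic can be sketched in plain Python, independent of any framework. This minimal example assumes readings arrive as `(sensor, value)` pairs and fires one alert each time a sensor crosses above the threshold, rather than on every reading that stays above it. The `breach_alerts` function and the sample data are hypothetical; delivery to Slack or email is left out.

```python
def breach_alerts(readings, threshold):
    """Yield one alert each time a sensor crosses above the threshold."""
    above = {}  # sensor -> whether its previous reading was above the threshold
    for sensor, value in readings:
        is_above = value > threshold
        # Alert only on the transition from "not above" to "above".
        if is_above and not above.get(sensor, False):
            yield f"ALERT {sensor}: {value} > {threshold}"
        above[sensor] = is_above


readings = [("s1", 70), ("s1", 82), ("s1", 85), ("s2", 90), ("s1", 60), ("s1", 81)]
alerts = list(breach_alerts(readings, threshold=80))
```

Tracking the previous state per sensor keeps a sustained breach from flooding the alert channel.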

Adaptive RAG with live document updates

Index new documents on the fly and serve up‑to‑date answers via an LLM without retraining.
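
The idea of a live index can be shown with a toy example: when a document is replaced, the next retrieval returns the new text, and no model is retrained. This is a sketch under simplifying assumptions, not Pathway's vector index. It uses word counts and cosine similarity in place of real embeddings, and the `LiveIndex` class and sample documents are hypothetical.

```python
import math
from collections import Counter


class LiveIndex:
    """Toy in-memory index: documents can be added or replaced at any time."""

    def __init__(self):
        self.texts = {}    # doc_id -> latest text
        self.vectors = {}  # doc_id -> word-count vector of that text

    def upsert(self, doc_id, text):
        self.texts[doc_id] = text
        self.vectors[doc_id] = Counter(text.lower().split())

    def top(self, question):
        """Return the text of the document most similar to the question."""
        q = Counter(question.lower().split())

        def cosine(d):
            dot = sum(q[t] * d[t] for t in q)
            norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
                sum(v * v for v in d.values())
            )
            return dot / norm if norm else 0.0

        best = max(self.vectors, key=lambda doc_id: cosine(self.vectors[doc_id]))
        return self.texts[best]


index = LiveIndex()
index.upsert("shipping", "shipping takes 5 days")
index.upsert("policy", "refunds are accepted within 14 days")
before = index.top("how long are refunds accepted")

# The policy document changes; only the index entry is replaced.
index.upsert("policy", "refunds are accepted within 30 days")
after = index.top("how long are refunds accepted")
```

In a RAG pipeline, the retrieved text would be placed in the LLM prompt, so the answer reflects the 30‑day policy as soon as the document is re‑indexed.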

Batch CSV processing with incremental updates

Process large CSV dumps, then apply incremental changes as new rows arrive, keeping results current.
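
A plain‑Python sketch of the incremental idea: keep a running aggregate and apply only the new rows, instead of recomputing over the full dump. It does not use Pathway; the `region` and `amount` columns and the `apply_rows` helper are hypothetical.

```python
import csv
import io
from collections import defaultdict

totals = defaultdict(float)  # region -> running sum of amounts


def apply_rows(rows):
    """Fold new rows into the running totals without rereading old data."""
    for row in rows:
        totals[row["region"]] += float(row["amount"])


# Initial batch: the full CSV dump.
initial_dump = "region,amount\neu,10\nus,5\neu,2.5\n"
apply_rows(csv.DictReader(io.StringIO(initial_dump)))
after_batch = dict(totals)

# Later: one new row arrives and only that row is processed.
apply_rows([{"region": "us", "amount": "4"}])
after_increment = dict(totals)
```

The same function handles both the batch load and the later increment, which mirrors the single‑codebase approach described above.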

Tech snapshot

Python 67%
Rust 33%
Shell 1%

Tags

dataflow, kafka, data-pipelines, real-time, batch-processing, data-processing, iot-analytics, etl, time-series-analysis, python, data-analytics, rust, machine-learning-algorithms, etl-framework, pathway, stream-processing, streaming

Frequently asked questions

How do I install Pathway?

Run `pip install -U pathway` on Python 3.10+; Docker images are also provided.

Which operating systems are supported?

Official binaries run on macOS and Linux; Windows users can run Pathway inside a virtual machine.

What consistency guarantees does Pathway provide?

The free version offers at‑least‑once processing; the enterprise edition adds exactly‑once guarantees.

Can I add a connector for a data source not listed?

Yes, you can build a custom Python connector or use the Airbyte connector to access over 300 sources.

How does scaling work in Kubernetes?

Package your pipeline in the provided Docker image and deploy multiple replicas; the Rust engine handles multithreading and distributed execution.

Project at a glance

Active
Stars: 57,749
Watchers: 57,749
Forks: 1,548
Repo age: 3 years old
Last commit: yesterday
Primary language: Python
