

Scalable, fault-tolerant stream processing with Kafka and YARN
Apache Samza delivers scalable, fault‑tolerant stream processing with a simple API, managed state, and tight integration with Kafka and YARN for Java and Scala workloads.
Apache Samza is a distributed stream processing framework that leverages Apache Kafka for ordered, replayable messaging and Apache YARN for resource isolation, security, and fault tolerance. It offers a callback‑based API that feels like MapReduce, making it easy for Java and Scala developers to write stateful jobs.
Samza manages state snapshots and restores them consistently after failures, supports large per‑partition state, and guarantees message durability. It runs on YARN clusters (both 2.x and 3.x) under Java 8 or Java 11, and builds via Gradle against Scala 2.11 or 2.12. While Kafka is the default source, the pluggable architecture lets you connect other messaging systems. Deployment involves building with `./gradlew clean build` and launching jobs via the Samza shell tools.
Samza is ideal for teams already invested in the Hadoop ecosystem who need reliable, exactly‑once processing at scale, and who require managed state without writing custom checkpoint logic.
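To make the callback-based style described above concrete, here is a minimal sketch in plain Java. It does not use the Samza library: `Collector`, `MessageTask`, `ErrorFilterTask`, and the `alerts` stream name are hypothetical stand-ins chosen for illustration, and Samza's own task interface has its own types and signatures. The point is only the shape of the model: the framework calls one method per message, and the task emits output through a collector.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: every type here is a hypothetical stand-in, not a Samza class.
class CallbackSketch {

    /** Receives output messages emitted by a task. */
    interface Collector {
        void send(String stream, String message);
    }

    /** One callback per incoming message, similar in spirit to a map function. */
    interface MessageTask {
        void process(String key, String message, Collector collector);
    }

    /** Forwards only messages containing "ERROR" to an assumed "alerts" stream. */
    static class ErrorFilterTask implements MessageTask {
        @Override
        public void process(String key, String message, Collector collector) {
            if (message.contains("ERROR")) {
                collector.send("alerts", key + ": " + message);
            }
        }
    }

    /** Drives the task the way a framework would: one call per message, in order. */
    static List<String> run(MessageTask task, String[][] keyedMessages) {
        List<String> out = new ArrayList<>();
        Collector collector = (stream, message) -> out.add(stream + " <- " + message);
        for (String[] keyed : keyedMessages) {
            task.process(keyed[0], keyed[1], collector);
        }
        return out;
    }

    public static void main(String[] args) {
        String[][] input = {
            {"host-1", "INFO started"},
            {"host-2", "ERROR disk full"},
        };
        run(new ErrorFilterTask(), input).forEach(System.out::println);
    }
}
```

The task holds no threading or I/O code of its own. In this model the surrounding framework owns message delivery and ordering, so the business logic stays small.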
Real‑time fraud detection
Processes transaction events from Kafka, maintains per‑account state, and flags suspicious activity with exactly‑once guarantees.
Clickstream aggregation
Aggregates website click events into hourly counts, persisting state to Kafka changelogs for fault‑tolerant roll‑ups.
IoT sensor enrichment
Joins incoming sensor streams with reference data, storing enriched results while handling node failures transparently.
Log processing for alerting
Consumes log streams, applies pattern matching, and triggers alerts without losing messages, even during cluster outages.
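As a rough illustration of the clickstream aggregation use case above, the following plain-Java sketch buckets click events into hourly counts. It is an assumption-laden toy rather than Samza code: the `HourlyClickCounts` class, the `page@hourStart` key format, and the in-memory map are all hypothetical, and a real job would keep these counts in a managed, changelog-backed store.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: an in-memory stand-in for a per-partition state store.
class HourlyClickCounts {
    private static final long HOUR_MS = 3_600_000L;

    // Key format "page@hourStartMillis" is an assumption made for this sketch.
    private final Map<String, Long> counts = new TreeMap<>();

    /** Truncates an epoch-millisecond timestamp to the start of its hour. */
    static long hourStart(long epochMillis) {
        return Math.floorDiv(epochMillis, HOUR_MS) * HOUR_MS;
    }

    /** Records one click for the page in the hour that contains the timestamp. */
    void onClick(String page, long epochMillis) {
        String key = page + "@" + hourStart(epochMillis);
        counts.merge(key, 1L, Long::sum);
    }

    /** Returns a copy of the current counts so callers cannot mutate the state. */
    Map<String, Long> snapshot() {
        return new TreeMap<>(counts);
    }

    public static void main(String[] args) {
        HourlyClickCounts agg = new HourlyClickCounts();
        agg.onClick("/home", 1_000L);      // first hour
        agg.onClick("/home", 3_599_999L);  // still the first hour
        agg.onClick("/home", 3_600_000L);  // second hour begins
        System.out.println(agg.snapshot()); // {/home@0=2, /home@3600000=1}
    }
}
```

Keying by page and hour keeps each update a single read-modify-write on one entry, which is the access pattern a per-partition key-value store handles well.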
Kafka is the default and fully supported source, but Samza’s pluggable API allows integration with other messaging systems.
Samza runs on Java 8 and Java 11; Java 11 requires YARN 3.3.4+ and the `samza-yarn3` module.
State is checkpointed to Kafka changelog topics, enabling automatic restoration after failures.
Samza does not natively support Kubernetes; you would need to run YARN on Kubernetes or build a custom integration.
To build Samza, use the Gradle wrapper: `./gradlew clean build`. Select the Scala version with `-PscalaSuffix=2.12`.
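The changelog-based restoration mentioned above can be sketched in a few lines of plain Java. This is a conceptual model only, not Samza's implementation: `ChangelogRestoreSketch`, `Change`, and the in-memory list standing in for a durable Kafka changelog topic are all hypothetical. It shows the core idea: every write is appended to a log, and a restarted task rebuilds its state by replaying that log in order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a list stands in for a durable changelog topic.
class ChangelogRestoreSketch {

    /** One changelog record: the value written for a key at that point in time. */
    static final class Change {
        final String key;
        final long value;

        Change(String key, long value) {
            this.key = key;
            this.value = value;
        }
    }

    private final Map<String, Long> store = new HashMap<>();
    private final List<Change> changelog;

    /** Rebuilds local state by replaying the log; later records overwrite earlier ones. */
    ChangelogRestoreSketch(List<Change> changelog) {
        this.changelog = changelog;
        for (Change change : changelog) {
            store.put(change.key, change.value);
        }
    }

    /** Appends the write to the changelog, then applies it to the local store. */
    void put(String key, long value) {
        changelog.add(new Change(key, value));
        store.put(key, value);
    }

    /** Returns the current value for the key, or null if it was never written. */
    Long get(String key) {
        return store.get(key);
    }

    public static void main(String[] args) {
        List<Change> durableLog = new ArrayList<>();
        ChangelogRestoreSketch first = new ChangelogRestoreSketch(durableLog);
        first.put("acct-1", 3L);
        first.put("acct-1", 4L);
        first.put("acct-2", 1L);

        // Simulate a failure: discard the first instance and restore from the log.
        ChangelogRestoreSketch restored = new ChangelogRestoreSketch(durableLog);
        System.out.println(restored.get("acct-1")); // 4
        System.out.println(restored.get("acct-2")); // 1
    }
}
```

Because replay applies records in write order, the restored store ends with the latest value per key, regardless of how many intermediate updates the log contains.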