Apache Kafka

Scalable distributed event streaming platform for real‑time data pipelines

Apache Kafka delivers high‑throughput, fault‑tolerant event streaming, powering real‑time analytics, data integration, and mission‑critical applications at thousands of companies.

Overview

Apache Kafka is a distributed event streaming platform for building high‑performance data pipelines and real‑time analytics applications. It stores streams as immutable, partitioned logs, providing durability, replayability, and exactly‑once processing guarantees. The core is written in Java and Scala (client modules target Java 11, the remaining modules Java 17, with Scala 2.13), and client libraries are available for many languages.
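
To show the shape of the Java client API, here is a minimal producer sketch; the broker address and the `events` topic are placeholders, not part of the project itself:

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    import java.util.Properties;

    public class EventsProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            // Placeholder broker address; point this at your cluster.
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // Records sharing a key land in the same partition, preserving their order.
                producer.send(new ProducerRecord<>("events", "user-42", "page_view"));
            }
        }
    }

Because the log is immutable and retained per topic policy, consumers can replay these records long after they are written.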

Deployment & Operations

Kafka can be run from the compiled binaries, via the provided Docker image, or built and tested in CI/CD pipelines using Gradle tasks such as `jar`, `test`, and `releaseTarGz`. The project includes extensive testing support, coverage reporting, and auto‑generated documentation. For production, broker storage is formatted with kafka-storage.sh and the broker is launched with kafka-server-start.sh, as sketched below. The ecosystem adds connectors, stream processing APIs, and tooling for monitoring and scaling across clusters.
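
A minimal single‑node sketch of that production path, assuming the extracted binary distribution (the exact properties file location varies across versions):

    KAFKA_CLUSTER_ID="$(bin/kafka-storage.sh random-uuid)"
    bin/kafka-storage.sh format -t $KAFKA_CLUSTER_ID -c config/kraft/server.properties
    bin/kafka-server-start.sh config/kraft/server.properties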

Highlights

  • Distributed log architecture with partitioned topics
  • Exactly‑once processing guarantees
  • Native support for Java 11/17 and Scala 2.13
  • Extensive tooling: Gradle builds, Docker images, and comprehensive test suite

Pros

  • Proven scalability to millions of messages per second
  • Robust ecosystem of clients and connectors
  • Strong community and enterprise support
  • Flexible deployment: on‑prem, cloud, or containerized

Considerations

  • Operational complexity requires careful capacity planning
  • Brokers depend on a JVM runtime, even where client applications do not
  • Steep learning curve for stream processing APIs
  • Upgrading major versions can involve breaking changes

Managed products teams compare with

When teams consider Apache Kafka, these hosted platforms usually appear on the same shortlist.

Aiven for Apache Kafka

Managed Kafka with tiered storage and built-in schema registry.

Amazon Kinesis Data Streams

Fully managed service for real-time event streaming on AWS.

Amazon MSK

Fully managed Apache Kafka on AWS.

Looking for a hosted option? These are the services engineering teams benchmark against before choosing open source.

Fit guide

Great for

  • Teams building real‑time analytics pipelines
  • Enterprises needing durable, replayable event logs
  • Developers integrating heterogeneous data sources
  • Organizations adopting microservice architectures

Not ideal when

  • Simple batch jobs without streaming requirements
  • Projects with strict low‑latency single‑node constraints
  • Environments lacking Java 11+ runtime
  • Teams without resources for cluster management

How teams use it

Log aggregation for microservices

A centralized, ordered event store enables traceability and fault‑tolerant communication between services.

Real‑time fraud detection

Stream processing on Kafka topics identifies suspicious patterns within seconds, allowing immediate response.
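
A minimal Kafka Streams sketch of this pattern; the topic names and the threshold are illustrative, not prescriptive:

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.KStream;

    import java.util.Properties;

    public class FraudFilter {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "fraud-filter"); // illustrative id
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

            StreamsBuilder builder = new StreamsBuilder();
            KStream<String, String> payments = builder.stream("payments"); // hypothetical topic
            // Flag transactions above an illustrative threshold and publish them downstream.
            payments.filter((account, amount) -> Double.parseDouble(amount) > 10_000)
                    .to("suspicious-payments"); // hypothetical topic

            new KafkaStreams(builder.build(), props).start();
        }
    }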

Change data capture (CDC) pipelines

Database change events are published to Kafka, feeding downstream analytics and search indexes.

IoT sensor data ingestion

High‑volume sensor streams are buffered durably in Kafka, decoupling bursty ingestion from downstream processing.
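
On the consuming side, a minimal Java poll loop for such a pipeline might look like this (the group id and topic are hypothetical):

    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;

    public class SensorConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "sensor-processors"); // hypothetical group
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(List.of("sensor-readings")); // hypothetical topic
                while (true) {
                    // Records accumulate durably in Kafka even if this loop falls behind.
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                    for (ConsumerRecord<String, String> record : records) {
                        System.out.printf("%s -> %s%n", record.key(), record.value());
                    }
                }
            }
        }
    }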

Tech snapshot

Java 87%
Scala 11%
Python 2%
Shell 1%
Roff 1%
Batchfile 1%

Tags

scala, kafka

Frequently asked questions

What Java versions are supported?

Client modules are built with a Java 11 release target and the remaining modules with Java 17; a Java 11+ runtime is therefore required.

Can I run Kafka with Docker?

Yes, the official Docker image (`apache/kafka:latest`) can be started with a simple port mapping.
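
For example, following the image's quickstart:

    docker run -p 9092:9092 apache/kafka:latest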

How do I generate documentation?

Gradle tasks `javadocJar` and `scaladocJar` produce Javadoc and Scaladoc jars; `aggregatedJavadoc` creates a combined site.
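
From the repository root, the corresponding invocations would be:

    ./gradlew javadocJar scaladocJar   # per-module doc jars
    ./gradlew aggregatedJavadoc        # combined Javadoc across modules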

Is there a way to run only specific tests?

Use Gradle’s `--tests` option with the module’s test task, e.g., `./gradlew clients:test --tests RequestResponseTest`.

Where are test coverage reports located?

Coverage HTML reports appear under each module’s `build/reports` directory, e.g., `core/build/reports/scoverageTest/index.html`.
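
A sketch of generating those reports, assuming the `reportCoverage` task and `enableTestCoverage` property from the project README:

    ./gradlew reportCoverage -PenableTestCoverage=true -Dorg.gradle.parallel=false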

Project at a glance

Active
Stars: 31,746
Watchers: 31,746
Forks: 14,911
License: Apache-2.0
Repo age: 14 years
Last commit: yesterday
Primary language: Java

Last synced yesterday