Apache Kafka

Scalable distributed event streaming platform for real‑time data pipelines

Apache Kafka delivers high‑throughput, fault‑tolerant event streaming, powering real‑time analytics, data integration, and mission‑critical applications at thousands of companies.

Overview

Apache Kafka is a distributed event streaming platform for building high‑performance data pipelines and real‑time analytics applications. It stores streams as immutable, partitioned logs, providing durability, replayability, and exactly‑once processing guarantees. The core is written in Java and Scala (client modules target Java 11, the remaining modules Java 17, with Scala 2.13), and client libraries are available for many languages.
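
To show the shape of the Java client API, here is a minimal producer sketch; the broker address and the `events` topic are placeholders, not part of the project itself:

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    import java.util.Properties;

    public class EventsProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            // Placeholder broker address; point this at your cluster.
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // Records sharing a key land in the same partition, preserving their order.
                producer.send(new ProducerRecord<>("events", "user-42", "page_view"));
            }
        }
    }

Because the log is immutable and retained per topic policy, consumers can replay these records long after they are written.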

Deployment & Operations

Kafka can be run from the compiled binaries, via the provided Docker image, or built and tested in CI/CD pipelines using Gradle tasks such as `jar`, `test`, and `releaseTarGz`. The project includes extensive testing support, coverage reporting, and auto‑generated documentation. For production, broker storage is formatted with kafka-storage.sh and the broker is launched with kafka-server-start.sh, as sketched below. The ecosystem adds connectors, stream processing APIs, and tooling for monitoring and scaling across clusters.
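
A minimal single‑node sketch of that production path, assuming the extracted binary distribution (the exact properties file location varies across versions):

    KAFKA_CLUSTER_ID="$(bin/kafka-storage.sh random-uuid)"
    bin/kafka-storage.sh format -t $KAFKA_CLUSTER_ID -c config/kraft/server.properties
    bin/kafka-server-start.sh config/kraft/server.properties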

Highlights

  • Distributed log architecture with partitioned topics
  • Exactly‑once processing guarantees
  • Native support for Java 11/17 and Scala 2.13
  • Extensive tooling: Gradle builds, Docker images, and comprehensive test suite

Pros

  • Proven scalability to millions of messages per second
  • Robust ecosystem of clients and connectors
  • Strong community and enterprise support
  • Flexible deployment: on‑prem, cloud, or containerized

Considerations

  • Operational complexity requires careful capacity planning
  • Brokers depend on a JVM runtime, even where client applications do not
  • Steep learning curve for stream processing APIs
  • Upgrading major versions can involve breaking changes

Managed products teams compare with

When teams consider Apache Kafka, these hosted platforms usually appear on the same shortlist.

Aiven for Apache Kafka

Managed Kafka with tiered storage and built-in schema registry.

Amazon Kinesis Data Streams

Fully managed service for real-time event streaming on AWS.

Amazon MSK

Fully managed Apache Kafka on AWS.

Looking for a hosted option? These are the services engineering teams benchmark against before choosing open source.

Fit guide

Great for

  • Teams building real‑time analytics pipelines
  • Enterprises needing durable, replayable event logs
  • Developers integrating heterogeneous data sources
  • Organizations adopting microservice architectures

Not ideal when

  • Simple batch jobs without streaming requirements
  • Projects with strict low‑latency single‑node constraints
  • Environments lacking Java 11+ runtime
  • Teams without resources for cluster management

How teams use it

Log aggregation for microservices

A centralized, ordered event store enables traceability and fault‑tolerant communication between services.

Real‑time fraud detection

Stream processing on Kafka topics identifies suspicious patterns within seconds, allowing immediate response.
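
A minimal Kafka Streams sketch of this pattern; the topic names and the threshold are illustrative, not prescriptive:

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.KStream;

    import java.util.Properties;

    public class FraudFilter {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "fraud-filter"); // illustrative id
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

            StreamsBuilder builder = new StreamsBuilder();
            KStream<String, String> payments = builder.stream("payments"); // hypothetical topic
            // Flag transactions above an illustrative threshold and publish them downstream.
            payments.filter((account, amount) -> Double.parseDouble(amount) > 10_000)
                    .to("suspicious-payments"); // hypothetical topic

            new KafkaStreams(builder.build(), props).start();
        }
    }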

Change data capture (CDC) pipelines

Database change events are published to Kafka, feeding downstream analytics and search indexes.

IoT sensor data ingestion

High‑volume sensor streams are buffered durably in Kafka, decoupling bursty ingestion from downstream processing.
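
On the consuming side, a minimal Java poll loop for such a pipeline might look like this (the group id and topic are hypothetical):

    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;

    public class SensorConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "sensor-processors"); // hypothetical group
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(List.of("sensor-readings")); // hypothetical topic
                while (true) {
                    // Records accumulate durably in Kafka even if this loop falls behind.
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                    for (ConsumerRecord<String, String> record : records) {
                        System.out.printf("%s -> %s%n", record.key(), record.value());
                    }
                }
            }
        }
    }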

Tech snapshot

Java 87%
Scala 11%
Python 2%
Shell 1%
Roff 1%
Batchfile 1%

Tags

scala, kafka

Frequently asked questions

What Java versions are supported?

Client modules are built with a Java 11 release target and the remaining modules with Java 17; a Java 11+ runtime is therefore required.

Can I run Kafka with Docker?

Yes, the official Docker image (`apache/kafka:latest`) can be started with a simple port mapping.
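
For example, following the image's quickstart:

    docker run -p 9092:9092 apache/kafka:latest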

How do I generate documentation?

Gradle tasks `javadocJar` and `scaladocJar` produce Javadoc and Scaladoc jars; `aggregatedJavadoc` creates a combined site.
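
From the repository root, the corresponding invocations would be:

    ./gradlew javadocJar scaladocJar   # per-module doc jars
    ./gradlew aggregatedJavadoc        # combined Javadoc across modules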

Is there a way to run only specific tests?

Use Gradle’s `--tests` option with the module’s test task, e.g., `./gradlew clients:test --tests RequestResponseTest`.

Where are test coverage reports located?

Coverage HTML reports appear under each module’s `build/reports` directory, e.g., `core/build/reports/scoverageTest/index.html`.
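
A sketch of generating those reports, assuming the `reportCoverage` task and `enableTestCoverage` property from the project README:

    ./gradlew reportCoverage -PenableTestCoverage=true -Dorg.gradle.parallel=false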

Project at a glance

Active
Stars: 31,746
Watchers: 31,746
Forks: 14,911
License: Apache-2.0
Repo age: 14 years
Last commit: yesterday
Primary language: Java

Last synced yesterday