Apache Kafka
Kafka is not just a pub-sub system. It’s an event streaming platform.
- It collects, stores, and processes events in real time.
- Kafka supports distributed logging, pub-sub messaging, and stream processing (via Kafka Streams and ksqlDB).
- Kafka helps build event-driven architectures, where systems react to changes as they occur, not via polling or batch jobs.
- The Kafka ecosystem includes tools such as Kafka Connect, Kafka Streams, and Schema Registry.
- It is built for real-time workloads: low latency, high throughput, and fault tolerance.
Event
An event is something that happened: a change in state, recorded as both a notification and the associated state.
Examples:
- Thermostat reports temperature → Event
- Invoice becomes overdue → Event
- Mouse hovers over button → Event
- Microservice logs completion → Event
Every event consists of:
- Notification: “This thing happened.”
- State: A structured snapshot of the data related to that occurrence (e.g., JSON, Avro, Protobuf)
Kafka Representation:
Events are represented as key/value pairs (both are byte arrays under the hood):
- Key: Identifier (e.g., user ID, order ID) — helps with partitioning & ordering.
- Value: Actual event data (payload).
Serialization:
- Our applications use a structured format (e.g., JSON), which is serialized into byte arrays before being sent to Kafka.
- Deserialization happens when the data is read back.
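As a minimal sketch of this round trip in Python (standard-library json only; the event fields are illustrative):

```python
import json

# Serialize a structured event (a dict) into the byte array Kafka expects.
def serialize(event: dict) -> bytes:
    return json.dumps(event).encode("utf-8")

# Deserialize the byte array back into a dict when reading the event back.
def deserialize(raw: bytes) -> dict:
    return json.loads(raw.decode("utf-8"))

event = {"order_id": "o-42", "status": "PLACED"}
assert deserialize(serialize(event)) == event  # lossless round trip
```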
Topic
A topic is a named stream of events in Kafka. It acts like a channel or feed to which events are published. Producers send events to topics; consumers read from topics.
Producers:
Producers are clients/apps that write events to Kafka topics.
- Serialize data into byte arrays.
- Choose a topic (and optionally, a partition or key).
- Send the event using Kafka APIs (Java, Python, REST, etc.).
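For example, a sketch of producing a keyed JSON event with the kafka-python client (the broker address, topic name, and field names are assumptions for illustration):

```python
import json
from kafka import KafkaProducer

# Producer that serializes keys as UTF-8 and values as JSON byte arrays.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",  # assumed local broker
    key_serializer=lambda k: k.encode("utf-8"),
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# Events with the same key always land in the same partition.
producer.send("orders", key="user-123", value={"order_id": "o-42", "status": "PLACED"})
producer.flush()  # block until buffered events are actually delivered
```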
Consumers:
Consumers are clients/apps that read events from topics. They deserialize events and track their read position via offsets.
Organized into consumer groups:
- Kafka guarantees each message is read by only one consumer per group.
- Multiple consumers allow for parallel processing.
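A matching consumer sketch with kafka-python (the group id and topic are illustrative). Running two copies of this script with the same group_id splits the topic's partitions between them:

```python
import json
from kafka import KafkaConsumer

# All consumers sharing group_id="order-processors" divide the topic's
# partitions among themselves; each event is handled by exactly one of them.
consumer = KafkaConsumer(
    "orders",
    bootstrap_servers="localhost:9092",
    group_id="order-processors",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    auto_offset_reset="earliest",  # start from the beginning if no offset is stored
)

for message in consumer:
    # Offsets are committed automatically by default, tracking the read position.
    print(message.partition, message.offset, message.value)
```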
Topic Structure and Management
How Kafka topics are structured, and how they are created, configured, and managed.
Key Characteristics of a Topic
- Immutable Log: Messages in a topic are append-only; new events are added at the end and, once written, are never changed.
- Durable Storage: Messages are persisted on disk (7 days by default, configurable).
- Retention Policy: Time-based (e.g., 7 days) or size-based (e.g., 1 GB per partition)
- Topic-Level ACLs: Kafka allows authorization rules that restrict who can produce to or consume from a topic.
- Compacted Topics: Store only the latest event per key; useful for state storage (e.g., user profiles). Enabled with cleanup.policy=compact.
- Compaction vs Deletion (see the sketch after this list):
  - delete: Default; messages are removed after the retention period
  - compact: Only the latest message per key is retained
- Partitioned:
- Topics are split into partitions for scalability and parallelism.
- Each partition is an ordered sequence of events.
- Partitioning enables Kafka to scale horizontally.
- Parallelism: Multiple consumers can read in parallel.
- Ordering: Kafka guarantees message order within a partition (not across partitions).
- Key-based Routing: Events with the same key always go to the same partition.
- Replicated: Each partition can be replicated across multiple brokers, ensuring fault tolerance.
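To make the compaction behavior above concrete, a purely conceptual Python sketch (not how the broker is implemented): compaction eventually behaves like building a dict from an append-only log, keeping only the newest value per key.

```python
# Append-only log of (key, value) records, oldest first.
log = [
    ("user-1", {"plan": "free"}),
    ("user-2", {"plan": "pro"}),
    ("user-1", {"plan": "pro"}),  # supersedes the earlier user-1 record
]

# With cleanup.policy=compact, only the latest record per key survives.
compacted = {}
for key, value in log:
    compacted[key] = value

print(compacted)  # {'user-1': {'plan': 'pro'}, 'user-2': {'plan': 'pro'}}
```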
Kafka is designed for horizontal scalability:
- Topics are partitioned → Events are distributed across partitions.
- Each partition can be handled by a separate broker (Kafka server).
- Producers and consumers can operate in parallel: Each consumer in a consumer group can read from different partitions.
- Keys help route messages consistently to partitions, ensuring related data is processed together.
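A conceptual sketch of key-based routing (the real default partitioner hashes the key bytes with murmur2; a trivial stand-in hash is used here only to show the invariant):

```python
def pick_partition(key: bytes, num_partitions: int) -> int:
    # Stand-in for murmur2(key) % num_partitions used by the default partitioner.
    return sum(key) % num_partitions

# The same key always maps to the same partition, so per-key ordering holds.
assert pick_partition(b"user-123", 3) == pick_partition(b"user-123", 3)
print(pick_partition(b"user-123", 3), pick_partition(b"user-456", 3))
```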
Real-World Use Cases for Topics
| Topic Name     | Event Type                 | Producer          | Consumer             |
|----------------|----------------------------|-------------------|----------------------|
| orders         | Order placed               | Web app           | Order processor      |
| user-activity  | Clicks, views, page visits | Frontend services | Analytics service    |
| payment-status | Payment updates            | Payment gateway   | Billing microservice |
Topic Configuration Options
When creating or managing topics, we can set:
- partitions: Number of partitions (default: 1)
- replication.factor: Number of replicas (usually ≥ 2 for high availability)
- retention.ms: How long to retain messages, in milliseconds
- cleanup.policy: delete (default) or compact
- min.insync.replicas: Minimum number of in-sync replicas required for a successful write
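These settings can also be applied programmatically; a sketch with kafka-python's admin client (topic name, broker address, and values are illustrative):

```python
from kafka.admin import KafkaAdminClient, NewTopic

admin = KafkaAdminClient(bootstrap_servers="localhost:9092")  # assumed local broker

topic = NewTopic(
    name="orders",
    num_partitions=3,
    replication_factor=2,
    topic_configs={
        "retention.ms": "604800000",  # 7 days in milliseconds
        "cleanup.policy": "delete",
        "min.insync.replicas": "2",
    },
)

admin.create_topics([topic])
```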
| Task           | Command |
|----------------|---------|
| Create Topic   | kafka-topics.sh --create --topic my-topic --partitions 3 --replication-factor 2 --bootstrap-server localhost:9092 |
| Describe Topic | kafka-topics.sh --describe --topic my-topic --bootstrap-server localhost:9092 |
| Delete Topic   | kafka-topics.sh --delete --topic my-topic --bootstrap-server localhost:9092 |
Topic Naming Conventions (Best Practices)
- Use descriptive names: order-created, user-signup
- Use dash-separated lowercase words
- Avoid overly generic names like events, data, logs
- Design partitioning strategy based on consumer scaling and key usage
- Monitor topic size and performance regularly
Kafka Connect — Integrating with External Systems
- Kafka Connect is a framework for moving large amounts of data in and out of Kafka.
- Used to integrate Kafka with external systems (e.g., databases, cloud services).
- Two types of connectors:
- Source Connectors: Pull data into Kafka.
- Sink Connectors: Push data out of Kafka.
- Comes with many pre-built connectors (e.g., JDBC, Elasticsearch, S3).
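Connectors are typically registered through the Connect REST API. A hedged sketch in Python (connector name, database details, and the Connect worker address are all assumptions; the JDBC source connector shown is one of the common pre-built connectors):

```python
import requests

connector = {
    "name": "orders-jdbc-source",
    "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
        "connection.url": "jdbc:postgresql://localhost:5432/shop",
        "table.whitelist": "orders",
        "mode": "incrementing",            # track new rows by an incrementing column
        "incrementing.column.name": "id",
        "topic.prefix": "db-",             # rows land in the topic "db-orders"
    },
}

# Register the connector with a Connect worker assumed at localhost:8083.
resp = requests.post("http://localhost:8083/connectors", json=connector)
print(resp.status_code, resp.json())
```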