Kafka Part 1

Apache Kafka

Kafka is not just a pub-sub system. It’s an event streaming platform.

  • It collects, stores, and processes events in real time.
  • Kafka supports: Distributed Logging, Pub-Sub Messaging, and Stream Processing (via Kafka Streams, ksqlDB).
  • Kafka helps build event-driven architectures, where systems react to changes as they occur, not via polling or batch jobs.
  • Ecosystem tools: Kafka Connect, Kafka Streams, Schema Registry.
  • Real-time capabilities: low latency, high throughput, fault tolerance.

Event

An event is a record of something that happened: a change in state. It combines a notification with the state itself.

Examples:

  • Thermostat reports temperature → Event
  • Invoice becomes overdue → Event
  • Mouse hovers over button → Event
  • Microservice logs completion → Event

Every event consists of:

  1. Notification: “This thing happened.”
  2. State: A structured snapshot of the data related to that occurrence (e.g., JSON, Avro, Protobuf); see the sketch below.
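
For instance, an "order placed" event might look like the following minimal sketch (the field names are illustrative, not a Kafka requirement):

# An illustrative "order placed" event: the notification (what happened)
# together with a structured state snapshot of the related data.
order_placed_event = {
    "event_type": "order_placed",      # the notification
    "order_id": "order-1001",          # state snapshot of the occurrence
    "user_id": "user-42",
    "amount": 59.90,
    "currency": "USD",
    "occurred_at": "2024-01-15T10:30:00Z",
}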

Kafka Representation:

Events are represented as key/value pairs (both are byte arrays under the hood):

  1. Key: Identifier (e.g., user ID, order ID) — helps with partitioning & ordering.
  2. Value: Actual event data (payload).

Serialization:

  • Our applications use a structured format (e.g., JSON), which is serialized into byte arrays before being sent to Kafka.
  • Deserialization happens when reading data back, as sketched below.
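
A minimal round-trip sketch using Python's standard json module (the event fields are illustrative):

import json

# Serialize: structured event -> byte array (what Kafka actually stores)
event = {"order_id": "order-1001", "status": "PLACED"}
value_bytes = json.dumps(event).encode("utf-8")

# Deserialize: bytes read back from Kafka -> structured event
decoded = json.loads(value_bytes.decode("utf-8"))
assert decoded == event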

Topic

A topic is a named stream of events in Kafka. It acts like a channel or feed to which events are published. Producers send events to topics; consumers read from topics.

Producers:

Producers are clients/apps that write events to Kafka topics (see the sketch after this list).

  • Serialize data into byte arrays.
  • Choose a topic (and optionally, a partition or key).
  • Send the event using Kafka APIs (Java, Python, REST, etc.).
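
A minimal producer sketch, assuming the confluent-kafka Python client and a broker at localhost:9092 (the topic name and payload are illustrative):

import json
from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})

def on_delivery(err, msg):
    # Invoked once the broker acknowledges (or rejects) the write.
    if err is not None:
        print(f"Delivery failed: {err}")
    else:
        print(f"Delivered to {msg.topic()}[{msg.partition()}] at offset {msg.offset()}")

event = {"order_id": "order-1001", "status": "PLACED"}
producer.produce(
    topic="orders",
    key="order-1001",                         # the key drives partitioning and ordering
    value=json.dumps(event).encode("utf-8"),  # serialized payload
    callback=on_delivery,
)
producer.flush()  # block until all queued messages are delivered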

Consumers:

Consumers are clients/apps that read events from topics. They deserialize events and track their read position via offsets.

Consumers are organized into consumer groups (sketched after this list):

  • Kafka guarantees each message is read by only one consumer per group.
  • Multiple consumers allow for parallel processing.
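
A minimal consumer sketch, again assuming the confluent-kafka client; every consumer sharing the same group.id divides the topic's partitions among themselves:

import json
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "order-processors",   # consumers in this group share partitions
    "auto.offset.reset": "earliest",  # where to start if no committed offset exists
})
consumer.subscribe(["orders"])

try:
    while True:
        msg = consumer.poll(timeout=1.0)  # returns None if no message arrives in time
        if msg is None:
            continue
        if msg.error():
            print(f"Consumer error: {msg.error()}")
            continue
        event = json.loads(msg.value().decode("utf-8"))  # deserialize the payload
        print(f"partition={msg.partition()} offset={msg.offset()} event={event}")
finally:
    consumer.close()  # commit final offsets and leave the group cleanly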

Topics in Detail

This section covers the details and structure of Kafka topics: how they’re created, configured, and managed.

Key Characteristics of a Topic

  • Append-Only, Immutable Log: Messages are appended at the end of the topic and, once written, are never changed.
  • Durable Storage: Messages are stored on disk (by default for 7 days, configurable).
  • Retention Policy: time-based (e.g., 7 days) or size-based (e.g., 1 GB per partition).
  • Topic-Level ACLs: Kafka allows authorization rules to restrict who can produce/consume a topic.
  • Compacted Topics: store only the latest event per key (cleanup.policy=compact). Useful for state storage (e.g., user profiles).
  • Compaction vs. Deletion:
    1. delete: the default; messages are removed after the retention period.
    2. compact: only the latest message per key is retained.
  • Partitioned:
    1. Topics are split into partitions for scalability and parallelism.
    2. Each partition is an ordered sequence of events.
    3. Partitioning enables Kafka to scale horizontally.
  • Parallelism: Multiple consumers can read from different partitions in parallel.
  • Ordering: Kafka guarantees message order within a partition (not across partitions).
  • Key-based Routing: Events with the same key always go to the same partition.
  • Replicated: Each partition can be replicated across multiple brokers, ensuring fault tolerance.

Kafka is designed for horizontal scalability:

  • Topics are partitioned → Events are distributed across partitions.
  • Each partition can be handled by a separate broker (Kafka server).
  • Producers and consumers can operate in parallel: Each consumer in a consumer group can read from different partitions.
  • Keys help route messages consistently to partitions, ensuring related data is processed together (see the sketch below).
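
To illustrate, here is a simplified sketch of key-based routing. This is not Kafka's actual algorithm (the Java client hashes the serialized key with murmur2); a plain CRC32 stands in to show that partition choice is a deterministic function of the key:

import zlib

NUM_PARTITIONS = 3

def choose_partition(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Deterministic hash: the same key always maps to the same partition.
    return zlib.crc32(key.encode("utf-8")) % num_partitions

for key in ["user-42", "user-7", "user-42", "user-42"]:
    print(f"{key} -> partition {choose_partition(key)}")
# Every "user-42" event lands in the same partition, preserving per-key order.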

Real-World Use Cases for Topics

Topic Name       Event Type                   Producer            Consumer
orders           Order placed                 Web app             Order processor
user-activity    Clicks, views, page visits   Frontend services   Analytics service
payment-status   Payment updates              Payment gateway     Billing microservice

Topic Configuration Options

When creating or managing topics, we can set the following (a creation sketch follows the list):

  • partitions: Number of partitions (default: 1)
  • replication.factor: Number of replicas (usually ≥ 2 for high availability)
  • retention.ms: How long to retain messages
  • cleanup.policy: delete (default) or compact
  • min.insync.replicas: Minimum number of in-sync replicas required for a successful write
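
A sketch of creating a topic with these options programmatically, assuming the confluent-kafka Python client (names and values are illustrative):

from confluent_kafka.admin import AdminClient, NewTopic

admin = AdminClient({"bootstrap.servers": "localhost:9092"})

new_topic = NewTopic(
    "orders",
    num_partitions=3,
    replication_factor=2,
    config={
        "retention.ms": "604800000",  # 7 days, in milliseconds
        "cleanup.policy": "delete",   # or "compact" for a compacted topic
        "min.insync.replicas": "2",
    },
)

# create_topics() is asynchronous: it returns one future per topic.
for topic, future in admin.create_topics([new_topic]).items():
    try:
        future.result()  # raises on failure (e.g., the topic already exists)
        print(f"Created topic {topic}")
    except Exception as exc:
        print(f"Failed to create topic {topic}: {exc}")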

Task             Command
Create Topic     kafka-topics.sh --create --topic my-topic --partitions 3 --replication-factor 2 --bootstrap-server localhost:9092
Describe Topic   kafka-topics.sh --describe --topic my-topic --bootstrap-server localhost:9092
Delete Topic     kafka-topics.sh --delete --topic my-topic --bootstrap-server localhost:9092

Topic Naming Conventions (Best Practices)

  • Use descriptive names: order-created, user-signup
  • Use dash-separated lowercase words
  • Avoid overly generic names like events, data, logs
  • Design partitioning strategy based on consumer scaling and key usage
  • Monitor topic size and performance regularly

Kafka Connect — Integrating with External Systems

  • Kafka Connect is a framework for moving large amounts of data in and out of Kafka.
  • Used to integrate Kafka with external systems (e.g., databases, cloud services).
  • Two types of connectors:
    • Source Connectors: Pull data into Kafka.
    • Sink Connectors: Push data out of Kafka.
  • Comes with many pre-built connectors (e.g., JDBC, Elasticsearch, S3); a registration sketch follows.
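
Connectors are registered through Kafka Connect's REST API (port 8083 by default). A hedged sketch using Python's requests library; the connector class and config keys shown are those of the Confluent JDBC source connector and are illustrative:

import requests

# Illustrative JDBC source connector definition; the class name, connection
# URL, and option keys depend on the connector actually installed.
connector = {
    "name": "orders-db-source",
    "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
        "connection.url": "jdbc:postgresql://localhost:5432/shop",
        "mode": "incrementing",            # pull new rows by an incrementing column
        "incrementing.column.name": "id",
        "topic.prefix": "db-",             # tables are published as db-<table>
        "tasks.max": "1",
    },
}

resp = requests.post("http://localhost:8083/connectors", json=connector)
resp.raise_for_status()
print(resp.json())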
