Kafka 101: Understanding Event Streaming
Apache Kafka Overview
Event-driven architecture
Technical Deep Dive
What is Apache Kafka?
Kafka is a distributed event streaming platform.
Originally developed at LinkedIn, now an Apache Software Foundation project.
Used for building real-time data pipelines and streaming apps.
Why Kafka?
High throughput and low latency.
Scalable and fault-tolerant.
Decouples producers and consumers.
Ideal for real-time analytics, monitoring, and event sourcing.
Core Concepts
Producer: sends messages to topics.
Topic: category where records are published.
Partition: an ordered, append-only log within a topic; the unit of parallelism.
Consumer: reads messages from topics.
Broker: Kafka server managing data storage and replication.
Kafka Architecture
Cluster of brokers handles message distribution.
Producers push data to topics.
Consumers pull data from topics.
ZooKeeper (legacy) or KRaft manages metadata and coordination.
Data Flow in Kafka
Producer → Topic → Partition → Consumer.
Messages stored with offsets for replayability.
Consumers commit offsets; committing before or after processing yields at-most-once or at-least-once delivery (exactly-once needs more, see the guarantees slide).
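A minimal sketch of how commit timing drives delivery semantics, assuming a consumer configured as in the consumer example later in this deck (handle() is a hypothetical processing method):

props.put("enable.auto.commit", "false");   // take manual control of offsets
ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
for (ConsumerRecord<String, String> record : records) {
    handle(record);        // process first...
}
consumer.commitSync();     // ...then commit → at-least-once
// Committing before processing instead would give at-most-once.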
Kafka Ecosystem
Kafka Connect – integrates with external systems.
Kafka Streams – processes data in real time.
Schema Registry – manages message structure.
KSQL (now ksqlDB) – SQL-like stream processing.
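For a taste of Kafka Streams, a minimal sketch that reads the 'orders' topic and republishes uppercased values (the application id and output topic name are assumptions for illustration):

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "orders-uppercase");   // assumed id
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

StreamsBuilder builder = new StreamsBuilder();
KStream<String, String> orders = builder.stream("orders");
orders.mapValues(v -> v.toUpperCase()).to("orders-upper");   // assumed output topic
new KafkaStreams(builder.build(), props).start();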
Example Scenario: Order Processing
E-commerce system generates 'OrderCreated' events.
Producer: Order service publishes events.
Topic: 'orders'.
Consumer: Billing, Inventory, and Notification services process events.
Java Producer Example
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
// Serializer classes must be fully qualified
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
Producer<String, String> producer = new KafkaProducer<>(props);
producer.send(new ProducerRecord<>("orders", "order123", "New Order"));
producer.close();
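Note that send() is asynchronous and returns immediately; a sketch of attaching a callback so failures surface (the error handling shown is illustrative only):

producer.send(new ProducerRecord<>("orders", "order123", "New Order"),
    (metadata, exception) -> {
        if (exception != null) {
            exception.printStackTrace();          // illustrative error handling
        } else {
            System.out.println("Stored at partition " + metadata.partition()
                + ", offset " + metadata.offset());
        }
    });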
Java Consumer Example
import java.time.Duration;
import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "order-group");
// Deserializer classes must be fully qualified
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Arrays.asList("orders"));
while (true) {
    for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofMillis(100))) {
        System.out.println(record.key() + ": " + record.value());
    }
}
Kafka Retention & Replay
Kafka retains messages for a configured time or size limit (retention.ms / retention.bytes).
Consumers can replay messages from offsets.
Supports debugging, recovery, and auditing.
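What replay looks like in code, as a minimal sketch (assumes the consumer from the earlier example; the partition number and offset are illustrative):

import org.apache.kafka.common.TopicPartition;

TopicPartition tp = new TopicPartition("orders", 0);  // partition 0, illustrative
consumer.assign(Arrays.asList(tp));                   // assign() instead of subscribe()
consumer.seek(tp, 0L);                                // re-read from offset 0
// Or: consumer.seekToBeginning(Arrays.asList(tp)) for the earliest retained offset.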
Scaling Kafka
Add more partitions to increase parallelism.
Replicate partitions for fault tolerance.
Producers can set message keys so related records land on the same partition, preserving their order.
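A sketch of creating a topic with explicit partition and replication counts via the AdminClient (the counts shown are illustrative):

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
try (AdminClient admin = AdminClient.create(props)) {
    // 6 partitions for parallelism, replication factor 3 for fault tolerance
    admin.createTopics(Collections.singletonList(new NewTopic("orders", 6, (short) 3)))
         .all().get();   // get() throws checked exceptions; handle or declare them
}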
Kafka Guarantees
At-least-once delivery (default).
At-most-once (possible data loss).
Exactly-once semantics (idempotent producer plus transactions).
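On the producer side these guarantees come from configuration; a sketch (the transactional.id is an assumed name, and all settings must be in place before constructing the producer):

props.put("enable.idempotence", "true");            // retries cannot create duplicates
props.put("acks", "all");                           // wait for all in-sync replicas
props.put("transactional.id", "order-service-tx");  // assumed id; enables transactions
Producer<String, String> producer = new KafkaProducer<>(props);
producer.initTransactions();
producer.beginTransaction();
producer.send(new ProducerRecord<>("orders", "order123", "New Order"));
producer.commitTransaction();   // all-or-nothing across the transaction's sends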
When to Use Kafka
Event-driven microservices.
Streaming analytics and ETL pipelines.
Log aggregation and monitoring.
Asynchronous workflows.
When NOT to Use Kafka
Small-scale apps without high throughput needs.
Low-latency transactional requests.
When simple REST APIs suffice.
Event-driven vs Synchronous Calls
Event-driven: producer doesn’t wait for response.
Synchronous: immediate response needed.
Kafka decouples services; REST tightly couples them.
Event-based = resilient; Sync calls = predictable.
Comparison Table
Kafka (Async)   | REST API (Sync)
----------------|----------------------
Decoupled       | Coupled
Asynchronous    | Request-response
High throughput | Limited scalability
Event replay    | No replay capability
Best Practices
Use a schema registry to manage message formats.
Partition data logically for load distribution.
Enable replication for fault tolerance.
Monitor consumer lag.
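Consumer lag can also be checked programmatically; a sketch using the AdminClient from the scaling example (requires Kafka clients 2.5+; exception handling omitted):

import java.util.Map;
import java.util.stream.Collectors;
import org.apache.kafka.clients.admin.ListOffsetsResult;
import org.apache.kafka.clients.admin.OffsetSpec;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

Map<TopicPartition, OffsetAndMetadata> committed =
    admin.listConsumerGroupOffsets("order-group")
         .partitionsToOffsetAndMetadata().get();
Map<TopicPartition, ListOffsetsResult.ListOffsetsResultInfo> latest =
    admin.listOffsets(committed.keySet().stream()
         .collect(Collectors.toMap(tp -> tp, tp -> OffsetSpec.latest())))
         .all().get();
committed.forEach((tp, c) ->
    System.out.println(tp + " lag = " + (latest.get(tp).offset() - c.offset())));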
Troubleshooting
Check broker logs and consumer offsets.
Monitor using Kafka Manager or Prometheus.
Watch for under-replicated partitions.
Use idempotent producers to avoid duplicates.
Security in Kafka
Use SSL/TLS for encryption.
SASL authentication for clients.
ACLs for topic access control.
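On the client these map to a few standard configs; a sketch with placeholder values (the username, password, and paths are illustrative, not real):

props.put("security.protocol", "SASL_SSL");  // TLS encryption + SASL authentication
props.put("sasl.mechanism", "PLAIN");        // or SCRAM-SHA-512, OAUTHBEARER, etc.
props.put("sasl.jaas.config",
    "org.apache.kafka.common.security.plain.PlainLoginModule required "
    + "username=\"order-service\" password=\"<secret>\";");      // placeholders
props.put("ssl.truststore.location", "/path/to/truststore.jks"); // illustrative path
props.put("ssl.truststore.password", "<truststore-secret>");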
Kafka in Production
Deploy across multiple availability zones.
Automate with Kubernetes or Confluent tools.
Ensure monitoring and alerting are in place.
Real-World Use Cases
Netflix: Real-time monitoring & recommendations.
Uber: Stream processing for ETAs.
LinkedIn: Activity streams & metrics collection.
Summary
Kafka enables scalable, reliable event streaming.
Decouples systems via pub/sub architecture.
Ideal for modern distributed systems.
Key Takeaways
Event-driven design improves scalability.
Kafka ensures durability and replayability.
Use Kafka wisely — not every system needs it.
Resources
Apache Kafka Documentation
Confluent Kafka Tutorials
Kafka: The Definitive Guide (O’Reilly)
Q&A
Thank you!
Questions?
