What is Apache Kafka?
Kafka is a distributed event streaming platform.
Developed by LinkedIn, now under Apache Foundation.
Used for building real-time data pipelines and streaming apps.
Why Kafka?
High throughput and low latency.
Scalable and fault-tolerant.
Decouples producers and consumers.
Ideal for real-time analytics, monitoring, and event sourcing.
Core Concepts
Producer: sends messages to topics.
Topic: category where records are published.
Partition: parallel unit of storage within a topic.
Consumer: reads messages from topics.
Broker: Kafka server managing data storage and replication.
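To make these pieces concrete, here is a minimal sketch (assuming a broker on localhost:9092 and the kafka-clients library on the classpath) that creates a topic with several partitions through the AdminClient API; the partition and replication counts are illustrative:

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");

try (AdminClient admin = AdminClient.create(props)) {
    // "orders" topic with 3 partitions and replication factor 1 (illustrative values)
    NewTopic orders = new NewTopic("orders", 3, (short) 1);
    admin.createTopics(Collections.singletonList(orders)).all().get();
}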
Kafka Architecture
Cluster of brokers handles message distribution.
Producers push data to topics.
Consumers pull data from topics.
ZooKeeper (legacy) or KRaft manages metadata and coordination.
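For a feel of the broker side, a client can list the brokers it is connected to via the AdminClient; a small sketch, assuming the same localhost broker:

import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.DescribeClusterResult;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");

try (AdminClient admin = AdminClient.create(props)) {
    DescribeClusterResult cluster = admin.describeCluster();
    System.out.println("Cluster id: " + cluster.clusterId().get());
    // Each node is one broker in the cluster
    cluster.nodes().get().forEach(node -> System.out.println("Broker: " + node));
}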
Data Flow in Kafka
Producer → Topic → Partition → Consumer.
Messages stored with offsets for replayability.
Consumers track offsets; committing after processing gives at-least-once delivery, while exactly-once requires Kafka's transactional APIs.
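For instance, at-least-once delivery falls out of committing offsets only after processing; a minimal sketch (props as in the consumer example later in this section; process() is a hypothetical handler):

props.put("enable.auto.commit", "false");  // we commit manually
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Arrays.asList("orders"));
ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
records.forEach(record -> process(record));  // process() is a hypothetical handler
consumer.commitSync();  // a crash before this line re-delivers the batch rather than losing it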
Kafka Ecosystem
Kafka Connect – integrates with external systems.
Kafka Streams – processes data in real-time.
Schema Registry – manages message structure.
KSQL – SQL-like stream processing.
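As a taste of Kafka Streams, this sketch (illustrative application id, topic names, and filter predicate; standard Streams API) filters the 'orders' topic into a new one:

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "order-filter");  // illustrative id
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

StreamsBuilder builder = new StreamsBuilder();
KStream<String, String> orders = builder.stream("orders");
// Keep only records matching a hypothetical marker and forward them to a new topic
orders.filter((key, value) -> value.contains("LARGE")).to("large-orders");

new KafkaStreams(builder.build(), props).start();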
Example Scenario: Order Processing
E-commerce system generates 'OrderCreated' events.
Producer: Order service publishes events.
Topic: 'orders'.
Consumer: Billing, Inventory, and Notification services process events.
Java Producer Example
import java.util.Properties;
import org.apache.kafka.clients.producer.*;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
// Serializer classes must be given by their fully qualified names
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

Producer<String, String> producer = new KafkaProducer<>(props);
producer.send(new ProducerRecord<>("orders", "order123", "New Order"));
producer.close();
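send() is asynchronous and returns before the broker acknowledges; passing a callback, as in this sketch, is the usual way to surface delivery errors:

producer.send(new ProducerRecord<>("orders", "order123", "New Order"),
    (metadata, exception) -> {
        if (exception != null) {
            exception.printStackTrace();  // delivery failed after retries
        } else {
            System.out.println("Stored at offset " + metadata.offset());
        }
    });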
Java Consumer Example
import java.time.Duration;
import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.consumer.*;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "order-group");
// Deserializer classes must be given by their fully qualified names
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Arrays.asList("orders"));
while (true) {
    // poll() returns the batch of records fetched since the last call
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
    for (ConsumerRecord<String, String> record : records) {
        System.out.println(record.key() + ": " + record.value());
    }
}
Kafka Retention & Replay
Kafka retains messages for a defined period or size.
Consumers can replay messages from offsets.
Supports debugging, recovery, and auditing.
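Replay is just repositioning the offset; a sketch against partition 0 of the 'orders' topic (the specific offset value is illustrative):

import java.util.Collections;
import org.apache.kafka.common.TopicPartition;

TopicPartition partition = new TopicPartition("orders", 0);
consumer.assign(Collections.singletonList(partition));
consumer.seekToBeginning(Collections.singletonList(partition));  // replay everything retained
consumer.seek(partition, 42L);  // or jump to a specific offset (42 is illustrative)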
Scaling Kafka
Add more partitions to increase parallelism.
Replicate partitions for fault tolerance.
Producers can use partition keys for ordering guarantees.
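Ordering is guaranteed only within a partition; with the default partitioner, records sharing a key hash to the same partition, as in this sketch (customerId is a hypothetical key):

// Events for one customer share a key, so they land in one partition, in order
String customerId = "customer-42";  // hypothetical key
producer.send(new ProducerRecord<>("orders", customerId, "OrderCreated"));
producer.send(new ProducerRecord<>("orders", customerId, "OrderShipped"));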
Comparison Table
Kafka (Async)   | REST API (Sync)
----------------|----------------------
Decoupled       | Coupled
Asynchronous    | Request-response
High throughput | Limited scalability
Event replay    | No replay capability
Best Practices
Use a schema registry to manage message formats.
Partition data logically for load distribution.
Enable replication for fault tolerance.
Monitor consumer lag.
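Lag can also be checked in code; a rough sketch (group id from the earlier example; props and consumer as configured above):

import java.util.Map;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

try (AdminClient admin = AdminClient.create(props)) {
    Map<TopicPartition, OffsetAndMetadata> committed =
        admin.listConsumerGroupOffsets("order-group")
             .partitionsToOffsetAndMetadata().get();
    Map<TopicPartition, Long> end = consumer.endOffsets(committed.keySet());
    // Lag per partition = log end offset minus last committed offset
    committed.forEach((tp, om) ->
        System.out.println(tp + " lag: " + (end.get(tp) - om.offset())));
}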
Troubleshooting
Check broker logs and consumer offsets.
Monitor using Kafka Manager or Prometheus.
Watch for under-replicated partitions.
Use idempotent producers to avoid duplicates.
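Idempotence is a one-line producer setting; a minimal sketch using standard producer configs:

// The broker deduplicates internal producer retries
props.put("enable.idempotence", "true");
props.put("acks", "all");  // required by idempotence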
Security in Kafka
Use SSL/TLS for encryption.
SASL authentication for clients.
ACLs for topic access control.
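On the client side these map to standard configuration keys; a sketch with placeholder credentials and paths:

props.put("security.protocol", "SASL_SSL");  // TLS encryption + SASL auth
props.put("sasl.mechanism", "PLAIN");
props.put("sasl.jaas.config",
    "org.apache.kafka.common.security.plain.PlainLoginModule required "
    + "username=\"app\" password=\"secret\";");  // placeholder credentials
props.put("ssl.truststore.location", "/path/to/truststore.jks");  // placeholder path
props.put("ssl.truststore.password", "changeit");  // placeholder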
Kafka in Production
Deploy across multiple availability zones.
Automate with Kubernetes or Confluent tools.
Ensure monitoring and alerting are in place.
Real-World Use Cases
Netflix: Real-time monitoring & recommendations.
Uber: Stream processing for ETAs.
LinkedIn: Activity streams & metrics collection.
Summary
Kafka enables scalable, reliable event streaming.
Decouples systems via pub/sub architecture.
Ideal for modern distributed systems.
Key Takeaways
Event-driven design improves scalability.
Kafka ensures durability and replayability.
Use Kafka wisely — not every system needs it.