Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

The best of Apache Kafka Architecture


Published on

Big data event streaming is very common part of any big data Architecture. Of the available open source big data streaming technologies Apache Kafka stands out because of it realtime, distributed, and reliable characteristics. This is possible because of the Kafka Architecture. This talk highlights those features.

Published in: Data & Analytics
  • Writing a good research paper isn't easy and it's the fruit of hard work. For help you can check writing expert. Check out, please ⇒ ⇐ I think they are the best
    Are you sure you want to  Yes  No
    Your message goes here

The best of Apache Kafka Architecture

  1. 1. The Best of Apache Kafka Architecture Ranganathan Balashanmugam @ran_than Apache: Big Data 2015
  2. 2. Helló Budapest
  3. 3. About Me ❏ Graduated as Civil Engineer. ❏ <dev> 10+ years </dev> ❏ <Thoughtworker from=”India”/> ❏ Organizer of Hyderabad Scalability Meetup with 2000+ members.
  4. 4. “Form follows function.” - Louis Sullivan
  5. 5. Gravity Dam Indirasagar Dam, India img src:
  6. 6. Forces on a gravity dam Dam weight Head Water Tail Water Uplift
  7. 7. ❏ publish-subscribe messaging service ❏ distributed commit/write-ahead log “producers produce, consumers consume, in large distributed reliable way -- real time”
  8. 8. ❏ DBs ❏ Logs ❏ Brokers ❏ HDFS “For highly distributed messages, Kafka stands out.” Why Kafka?
  9. 9. Kafka Vs ________ src:
  10. 10. Timeline 2011 2012 2013 2014 2015 Open sourced by LinkedIn, as version 0.6 Graduated from Apache Latest stable - Several Engineers who built Kakfa create Confluent
  11. 11. A Kafka Message CRC attributes key length key message message length message content kafka.message.Message magic Change requested:KAFKA-2511
  12. 12. Producers - push Kafka Broker org.apache.kafka.clients.producer.KafkaProducer Response => [TopicName [Partition ErrorCode Offset]] Request => RequiredAcks Timeout [TopicName [Partition MessageSetSize MessageSet]]
  13. 13. Topic number of messages time size Remove messages based on kafka.common.Topic
  14. 14. Partitions kafka.cluster.Partition Serves: Horizontal scaling, Parallel consumer reads
  15. 15. Consumers - pull kafka.consumer.ConsumerConnector, kafka.consumer.SimpleConsumer Consumer 1 Consumer 2
  16. 16. Consumer offsets committing and fetching consumer offsets img src:
  17. 17. kafka:// - protocol ● Metadata ● Send ● Fetch ● Offsets ● Offset commit ● Offset fetch “Binary protocol over TCP”
  18. 18. Mechanical Sympathy "The most amazing achievement of the computer software industry is its continuing cancellation of the steady and staggering gains made by the computer hardware industry." - Henry Peteroski Image source:
  19. 19. Persistence “Everything is faster till the disk IO.”
  20. 20. Disk faster than RAM src:
  21. 21. Linear Read & Writes On high level there are only two operations: Append to end of log fetch messages from a partition beginning from a particular message id sequential file I/O
  22. 22. “Let us play pictionary”
  23. 23. Linux Page Cache “Kafka ate my RAM”
  24. 24. ZeroCopy src:
  25. 25. Batching small latency to improve throughput img src:
  26. 26. Compression bandwidth is more expensive per-byte to scale than disk I/O, CPU, or network bandwidth capacity within a facility kafka.message.CompressionCodec
  27. 27. Log compaction img src: kafka.log.LogCleaner, LogCleanerManager
  28. 28. Message Delivery Atleast once Atmost once Exactly once
  29. 29. Replication un-replicated = replication factor of one
  30. 30. Quorum based ● Better latency ● To tolerate “f” failures, need “2f+1” replicas
  31. 31. Primary-backup replication Broker 1 Broker 2 Broker 3 Broker 4 Topic 1 Topic 1 Topic 1 Topic 2 Topic 2 Topic 2 Topic 3 Topic 3Topic 3
  32. 32. ZooKeeper cluster coordinator
  33. 33. THANK YOU For questions or suggestions: B @ran_than