Introduction to Kafka and Zookeeper

A short presentation giving an overview of Kafka and Zookeeper, so that beginners can understand the basic concepts of both in a lucid manner.

Published in: Technology

  1. Introduction to Kafka and Zookeeper
     June Hadoop Meetup
     Rahul Jain, @rahuldausa
  2. Who am I?
     • Software Engineer
     • Member of Core technology @ IVY Comptech, Hyderabad, India
     • 6 years of programming experience
     • Areas of expertise/interest
       – High traffic web applications
       – JAVA/J2EE
       – Big data, NoSQL
       – Information-Retrieval, Machine learning
  3. Agenda
     • Overview
     • Zookeeper
     • Messaging System (Basic Concepts)
     • Kafka
     • Q&A
  4. Apache Zookeeper™
  5. What is a Distributed System
     “A distributed system consists of multiple computers that communicate and coordinate their actions by passing messages. The components interact with each other in order to achieve a common goal.” - Wikipedia
  6. What is Zookeeper
     • An open source, high-performance coordination service for distributed applications
     • Centralized service for
       – Configuration management
       – Locks and synchronization, providing coordination between distributed systems
       – Naming service (registry)
       – Group membership
     • Features
       – Hierarchical namespace
       – Provides watchers on a znode
       – Allows forming a cluster of nodes
     • Supports a high volume of data retrieval and update requests
     • Source: http://zookeeper.apache.org/
  7. Zookeeper Use Cases
     • Configuration management
       – Cluster member nodes bootstrapping their configuration from a central source
     • Distributed cluster management
       – Node join/leave
       – Node status in real time
     • Naming service, e.g. DNS
     • Distributed synchronization: locks, barriers
     • Leader election
     • Centralized and highly reliable registry
  8. Zookeeper Data Model
     • Hierarchical namespace
     • Each node is called a “znode”
     • Each znode has data (stored as a byte[] array) and can have child znodes
       – Maintains a “Stat” structure with version numbers for data changes and ACL changes, plus timestamps
       – The version number increases with each change
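
The data model on this slide can be illustrated with a small in-memory toy. This is only a sketch under stated assumptions, not the ZooKeeper client API: `ToyTree`, `ZNode`, and `Stat` are hypothetical names invented here, the `Stat` fields are a simplified subset, and the optional `expected_version` check is added to show how a version number could guard an update.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Stat:
    """Simplified stand-in for a znode's Stat structure (assumed subset)."""
    version: int = 0    # incremented on every data change
    aversion: int = 0   # incremented on every ACL change (unused in this toy)
    mtime: float = field(default_factory=time.time)  # last modification time


@dataclass
class ZNode:
    """A node in the hierarchy: holds bytes and may have children."""
    data: bytes = b""
    stat: Stat = field(default_factory=Stat)
    children: Dict[str, "ZNode"] = field(default_factory=dict)


class ToyTree:
    """Hypothetical in-memory tree; real ZooKeeper is a replicated service."""

    def __init__(self) -> None:
        self.root = ZNode()

    def lookup(self, path: str) -> ZNode:
        node = self.root
        for part in filter(None, path.split("/")):
            node = node.children[part]  # raises KeyError if the path is missing
        return node

    def create(self, path: str, data: bytes = b"") -> None:
        parent_path, _, name = path.rpartition("/")
        parent = self.lookup(parent_path)
        if name in parent.children:
            raise ValueError(f"{path} already exists")
        parent.children[name] = ZNode(data=data)

    def set_data(self, path: str, data: bytes,
                 expected_version: Optional[int] = None) -> Stat:
        node = self.lookup(path)
        # Reject the write if the caller saw an older version of the data.
        if expected_version is not None and expected_version != node.stat.version:
            raise ValueError("version mismatch")
        node.data = data
        node.stat.version += 1
        node.stat.mtime = time.time()
        return node.stat


tree = ToyTree()
tree.create("/app")
tree.create("/app/config", b"v1")
tree.set_data("/app/config", b"v2")  # data version goes from 0 to 1
```

Passing the version last read as `expected_version` makes a stale writer fail instead of silently overwriting a newer value.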
  9. Let’s recall the basic concepts of a Messaging System
  10. Point-to-Point Messaging (Queue)
      Credit: http://fusesource.com/docs/broker/5.3/getting_started/FuseMBStartedKeyJMS.html
  11. Publish-Subscribe Messaging (Topic)
      Credit: http://fusesource.com/docs/broker/5.3/getting_started/FuseMBStartedKeyJMS.html
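
The difference between the two messaging styles recalled on slides 10 and 11 can be sketched in a few lines. `ToyQueue` and `ToyTopic` are hypothetical names, not part of JMS or Kafka, and the round-robin choice of consumer in the queue is an assumption made for illustration.

```python
from typing import Callable, List

Handler = Callable[[str], None]


class ToyQueue:
    """Point-to-point: each message is delivered to exactly one consumer."""

    def __init__(self) -> None:
        self._consumers: List[Handler] = []
        self._next = 0

    def subscribe(self, handler: Handler) -> None:
        self._consumers.append(handler)

    def send(self, message: str) -> None:
        if not self._consumers:
            raise RuntimeError("no consumers registered")
        # Assumed policy: rotate through consumers in registration order.
        handler = self._consumers[self._next % len(self._consumers)]
        self._next += 1
        handler(message)


class ToyTopic:
    """Publish-subscribe: each message is delivered to every subscriber."""

    def __init__(self) -> None:
        self._subscribers: List[Handler] = []

    def subscribe(self, handler: Handler) -> None:
        self._subscribers.append(handler)

    def publish(self, message: str) -> None:
        for handler in self._subscribers:
            handler(message)
```

With two consumers on a `ToyQueue`, three messages are split between them; with two subscribers on a `ToyTopic`, both receive all three.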
  12. Apache Kafka
  13. Overview
      • An Apache project initially developed at LinkedIn
      • Distributed publish-subscribe messaging system
      • Designed for processing real-time activity stream data, e.g. logs and metrics collection
      • Written in Scala
      • Does not follow the JMS standard and does not use JMS APIs
      • Features
        – Persistent messaging
        – High throughput
        – Supports both queue and topic semantics
        – Uses Zookeeper for forming a cluster of nodes (producer/consumer/broker)
        and many more…
      • http://kafka.apache.org/
  14. How it works
      Credit: http://kafka.apache.org/design.html
  15. Real time transfer
      [Diagram labels: Producer; Kafka Broker (two); Zookeeper; Consumer1 (Group1); Consumer2 (Group1); Consumer3 (Group2); Consumer4 (Group2); “Update Consumed Message offset”; Queue Topology; Topic Topology]
  16. Design Elements
      • Uses the filesystem cache
      • Zero-copy transfer of messages
      • Batching of messages
      • Batch compression
      • Automatic producer load balancing
      • The broker does not push messages to the consumer; the consumer polls messages from the broker
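
Three of the design elements above (batching, batch compression, and consumer-side polling) can be sketched together. This is a toy under stated assumptions, not Kafka's producer or consumer API: `ToyBroker` and `BatchingProducer` are hypothetical, JSON plus `zlib` stands in for Kafka's wire format and codecs, and the toy broker decompresses each batch on arrival purely to keep the example short.

```python
import json
import zlib
from typing import List


class ToyBroker:
    """Hypothetical broker: an append-only list of messages."""

    def __init__(self) -> None:
        self.log: List[str] = []

    def append_batch(self, payload: bytes) -> None:
        # One network call carries a whole compressed batch.
        self.log.extend(json.loads(zlib.decompress(payload)))

    def fetch(self, offset: int, max_messages: int) -> List[str]:
        # Pull model: the broker only answers requests, it never pushes.
        return self.log[offset:offset + max_messages]


class BatchingProducer:
    """Buffers messages and ships them as one compressed batch."""

    def __init__(self, broker: ToyBroker, batch_size: int = 3) -> None:
        self.broker = broker
        self.batch_size = batch_size
        self._buffer: List[str] = []

    def send(self, message: str) -> None:
        self._buffer.append(message)
        if len(self._buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        if not self._buffer:
            return
        payload = zlib.compress(json.dumps(self._buffer).encode("utf-8"))
        self.broker.append_batch(payload)
        self._buffer = []


def poll_all(broker: ToyBroker, max_messages: int = 2) -> List[str]:
    """Consumer loop: keep polling from the last offset until nothing is left."""
    offset, received = 0, []
    while True:
        batch = broker.fetch(offset, max_messages)
        if not batch:
            return received
        received.extend(batch)
        offset += len(batch)
```

Because the consumer decides when to call `fetch` and how many messages to ask for, a slow consumer simply polls less often rather than being overwhelmed.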
  17. Design Elements (Contd.)
      • Cluster formation of brokers/consumers using Zookeeper
        – So more consumers and brokers can be introduced on the fly. Rebalancing of the resulting cluster is coordinated through Zookeeper
      • Data is persisted in the broker
        – It is not removed on consumption (only after the retention period), so if a consumer fails while consuming, the same message can be re-consumed later from the broker
      • Simplified storage mechanism for messages
        – State is not kept for each message per consumer
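
The persistence points above can be sketched as a log that drops messages by age rather than on consumption, with a single offset per consumer group as the only consumer state. `RetainedLog` is a hypothetical name, timestamps are passed in explicitly so the behavior is deterministic, and the `committed` dictionary merely stands in for wherever offsets are stored (the deck's diagram shows consumers updating their offset in Zookeeper).

```python
from typing import Dict, List, Tuple


class RetainedLog:
    """Hypothetical log: messages expire by age, not when they are read."""

    def __init__(self, retention_seconds: float) -> None:
        self.retention = retention_seconds
        self._entries: List[Tuple[int, float, str]] = []  # (offset, time, message)
        self._next_offset = 0

    def append(self, message: str, now: float) -> int:
        offset = self._next_offset
        self._entries.append((offset, now, message))
        self._next_offset += 1
        return offset

    def expire(self, now: float) -> None:
        # Drop only entries older than the retention period.
        self._entries = [e for e in self._entries if now - e[1] < self.retention]

    def read(self, offset: int, max_messages: int = 10) -> List[Tuple[int, str]]:
        # Reading never deletes anything; any offset can be read again.
        hits = [(o, m) for (o, _, m) in self._entries if o >= offset]
        return hits[:max_messages]


log = RetainedLog(retention_seconds=60)
log.append("a", now=0)
log.append("b", now=10)

# One number per consumer group is all the state that is tracked.
committed: Dict[str, int] = {"group1": 0}

first_try = log.read(committed["group1"], max_messages=1)
# Suppose the consumer crashes here, before committing its offset.
retry = log.read(committed["group1"], max_messages=1)  # same message again
committed["group1"] = retry[-1][0] + 1  # commit only after successful processing
```

Committing the offset only after processing means a crash leads to a message being read again rather than lost.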
  18. Performance Numbers
      Credit: http://research.microsoft.com/en-us/UM/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf
      [Charts: Producer Performance; Consumer Performance]
  19. Questions?
      @rahuldausa on Twitter and SlideShare
      http://www.linkedin.com/in/rahuldausa