Slide 2 (www.edureka.co/apache-Kafka)
Agenda
By the end of this webinar, you will know about:
• The million-dollar question: why do we need Kafka?
• What is Kafka?
• Kafka Architecture
• Kafka with Hadoop
• Kafka with Spark
• Kafka with Storm
• Companies using Kafka
• Demo on Kafka Messaging Service
Slide 3
The Million-Dollar Question: Why Do We Need Kafka?
Slide 4
Why Kafka Cluster?
Why is Kafka preferred over more traditional brokers such as JMS and AMQP?
Slide 5
Kafka Producer Performance Compared with Other Systems
Slide 6
Kafka Consumer Performance Compared with Other Systems
Slide 7
Salient Features of Kafka

Feature | Description
High Throughput | Support for millions of messages with modest hardware
Scalability | A highly scalable distributed system that can grow with no downtime
Replication | Messages can be replicated across the cluster, which supports multiple subscribers and rebalances consumers in case of failure
Durability | Messages can be persisted to disk, which also enables batch consumption
Stream Processing | Kafka can be used alongside real-time streaming frameworks such as Spark and Storm
Zero Data Loss | With the proper configuration, Kafka can ensure zero data loss
Slide 8
Kafka Advantages
• With Kafka we can easily handle hundreds of thousands of messages per second
• The cluster can be expanded with no downtime, making Kafka highly scalable
• Messages are replicated, which provides reliability and durability
• Fault tolerant
• Scalable
Slide 10
What is Kafka?
• A distributed publish-subscribe messaging system
• Developed at LinkedIn Corporation
• Provides a solution for handling all activity stream data
• Well supported within the Hadoop platform
• Partitions real-time consumption across a cluster of machines
• Provides a mechanism for parallel load into Hadoop
Slide 11
Apache Kafka: Overview
[Diagram: front-end applications and background services (producers) publish into Kafka, with external traffic arriving via a tracking proxy; background services (consumers), Hadoop, and a data warehouse (DWH) read from Kafka]
Slide 14
Kafka Core Components
The table below lists the core components of Kafka:

Component | Description
Topic | A category or feed name to which messages are published
Producer | Publishes messages to a Kafka topic
Consumer | Subscribes to and consumes messages from a Kafka topic
Broker | A server in the Kafka cluster; handles hundreds of megabytes of reads and writes
Slide 15
Kafka Topic
• A user-defined category to which messages are published
• For each topic, a partitioned log is maintained
• Each partition contains an ordered, immutable sequence of messages, where each message is assigned a sequential ID number called the offset
• Writes to a partition are generally sequential, which reduces the number of hard-disk seeks
• Reads from a partition can be random
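The partition behaviour described above can be sketched in plain Python. This is an illustrative model, not Kafka code: an append-only log in which the offset is simply the message's position.

```python
# Minimal sketch of a topic partition: an ordered, immutable sequence
# of messages, each assigned a sequential offset (no Kafka dependency).

class Partition:
    """An append-only log; offsets are just list indices."""

    def __init__(self):
        self._log = []  # messages are only appended, never mutated

    def append(self, message):
        """Sequential write: returns the offset assigned to the message."""
        self._log.append(message)
        return len(self._log) - 1

    def read(self, offset):
        """Reads can be random: any retained offset can be fetched."""
        return self._log[offset]

p = Partition()
assert p.append("first") == 0    # offsets are assigned sequentially
assert p.append("second") == 1
assert p.read(0) == "first"      # random read by offset
```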
Slide 16
Kafka Producers
• Applications that publish messages to topics in the Kafka cluster
• Can be of any kind: front-end applications, streaming applications, etc.
• While writing messages, it is also possible to attach a key to a message
• Messages with the same key arrive in the same partition
• A producer does not wait for acknowledgement from the Kafka cluster
• Publishes messages as fast as the brokers in the cluster can handle
[Diagram: several producers publishing into the Kafka cluster]
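The key-to-partition routing mentioned above can be sketched as follows. This is an illustrative stand-in, not Kafka's actual partitioner (which uses a murmur2 hash of the key bytes); the partition count and key names are made up.

```python
# Illustrative keyed partitioning: a stable hash of the key modulo the
# partition count, so every message with the same key lands in the same
# partition and per-key ordering is preserved.
import hashlib

NUM_PARTITIONS = 4  # hypothetical topic with 4 partitions

def partition_for(key: str) -> int:
    # hashlib is used instead of built-in hash(), which is randomized
    # per process for strings and therefore not stable across runs.
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PARTITIONS

# The same key always maps to the same partition:
assert partition_for("user-42") == partition_for("user-42")
```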
Slide 17
Kafka Consumers
• Applications that subscribe to and consume messages from brokers in the Kafka cluster
• Can be of any kind: real-time consumers, NoSQL consumers, etc.
• When consuming from a topic, a consumer group can be configured with multiple consumers
• Each consumer in a consumer group reads messages from a unique subset of partitions in each topic it subscribes to
• Messages with the same key arrive at the same consumer
• Supports both queuing and publish-subscribe semantics
• Consumers have to keep track of the number of messages they have consumed
[Diagram: several consumers reading from the Kafka cluster]
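The partition-to-consumer division of work described above can be sketched in plain Python. This is an illustrative round-robin assignment, not Kafka's actual assignor implementation; the consumer and partition names are made up.

```python
# Sketch of how a consumer group divides work: each consumer in the
# group gets a unique subset of the topic's partitions, so no message
# is delivered to two consumers of the same group.

def assign_partitions(partitions, consumers):
    """Round-robin partitions across consumers; returns {consumer: [partitions]}."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

assignment = assign_partitions([0, 1, 2, 3, 4, 5], ["c1", "c2", "c3"])
# Each partition goes to exactly one consumer in the group:
assert assignment == {"c1": [0, 3], "c2": [1, 4], "c3": [2, 5]}
```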
Slide 18
Kafka Brokers
• Each server in the cluster is called a broker
• Handles hundreds of MBs of writes from producers and reads from consumers
• Retains all published messages, whether or not they have been consumed
• Retention is configured for n days
• Published messages are available for consumption for the configured n days, after which they are discarded
• Works like a queue if consumer instances belong to the same consumer group; otherwise works like publish-subscribe
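For example, the retention window is set per broker in `server.properties`. The keys below are standard Kafka broker settings; the values shown are only illustrative.

```properties
# Broker retention settings (server.properties); values are illustrative.
# Messages are retained for 7 days (n = 7), consumed or not,
# and discarded afterwards.
log.retention.hours=168
# Optional size-based cap per partition (-1 disables the size limit)
log.retention.bytes=-1
# How often the broker checks for log segments eligible for deletion
log.retention.check.interval.ms=300000
```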
Slide 20
How Kafka can be used with Hadoop
Slide 21
Kafka with Hadoop using Camus
• Camus is LinkedIn's Kafka-to-HDFS pipeline
• It is a MapReduce job
• Distributes data loads out of Kafka
• At LinkedIn, it processes tens of billions of messages per day
• All of this work is done with one single Hadoop job
Courtesy: Confluent
Slide 22
How Kafka can be used with Spark
Slide 23
Kafka with Spark Streaming
• If messages are stored in n partitions, reading them in parallel makes things faster
• In Kafka, messages are generally stored across multiple partitions
• Parallel reads can be achieved effectively with Spark Streaming
• Parallel reads are achieved by integrating Spark's KafkaInputDStream with Kafka's high-level consumer API
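The effect of partition-parallel reads can be sketched in plain Python, with a thread pool standing in for Spark Streaming's parallel receivers; the topic layout below is made up.

```python
# Sketch of why partitioned storage enables parallel consumption: with
# n partitions, n workers can each read one partition independently,
# which is the idea Spark Streaming exploits when reading from Kafka.
from concurrent.futures import ThreadPoolExecutor

partitions = {  # hypothetical topic with 3 partitions
    0: ["m0", "m1"],
    1: ["m2"],
    2: ["m3", "m4", "m5"],
}

def read_partition(pid):
    # Each worker reads one partition independently of the others.
    return pid, list(partitions[pid])

with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
    results = dict(pool.map(read_partition, partitions))

# All messages are consumed, one partition per parallel worker:
assert sorted(m for msgs in results.values() for m in msgs) == \
       ["m0", "m1", "m2", "m3", "m4", "m5"]
```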
Slide 30
Survey
Your feedback is vital for us, be it a compliment, a suggestion, or a complaint. It helps us make your experience better!
Please spare a few minutes to take the survey after the webinar.