www.edureka.co/r-for-analytics
www.edureka.co/apache-Kafka
How Apache Kafka is transforming Hadoop, Spark & Storm
Slide 2
Agenda
By the end of this webinar, you will know about:
• The million-dollar question: why do we need Kafka?
• What is Kafka?
• Kafka architecture
• Kafka with Hadoop
• Kafka with Spark
• Kafka with Storm
• Companies using Kafka
• Demo of the Kafka messaging service …
Slide 3
The Million-Dollar Question: Why Do We Need Kafka?
Slide 4
Why is Kafka preferred over more traditional JMS- and AMQP-based brokers?
Why a Kafka cluster?
Slide 5
Kafka Producer Performance Compared with Other Systems
Slide 6
Kafka Consumer Performance Compared with Other Systems
Slide 7
Salient Features of Kafka
Feature            Description
High Throughput    Supports millions of messages on modest hardware
Scalability        Highly scalable distributed system that can be expanded with no downtime
Replication        Messages are replicated across the cluster, so they remain available to subscribers if a broker fails; consumers are rebalanced after a failure
Durability         Messages are persisted to disk, so they can also be used for batch consumption
Stream Processing  Kafka can be used with real-time stream-processing frameworks such as Spark and Storm
Data Loss          With the proper configuration, Kafka can ensure zero data loss
Slide 8
Kafka Advantages
• Kafka can easily handle hundreds of thousands of messages per second
• The cluster can be expanded with no downtime, making Kafka highly scalable
• Messages are replicated, which provides reliability and durability
• Fault tolerant
• Scalable
Slide 9
What Is Kafka?
Slide 10
What Is Kafka?
• A distributed publish-subscribe messaging system
• Originally developed at LinkedIn
• Provides a solution for handling activity-stream data
• Fully supported on the Hadoop platform
• Partitions real-time consumption across a cluster of machines
• Provides a mechanism for loading data into Hadoop in parallel
Slide 11
Apache Kafka – Overview
[Figure: frontends, an external tracking proxy, and background services (producers) publish to Kafka; background services (consumers), Hadoop, and the data warehouse (DWH) consume from it]
Slide 12
Kafka Architecture
Slide 13
Kafka Architecture
[Figure: producers (front end, services, proxies, adapters, and others) publish to a cluster of Kafka brokers coordinated by ZooKeeper; consumers (real-time, NoSQL, Hadoop, and warehouses) read from the brokers]
Slide 14
Kafka Core Components
• The table below lists the core concepts of Kafka
Component  Description
Topic      A category or feed to which messages are published
Producer   Publishes messages to a Kafka topic
Consumer   Subscribes to and consumes messages from a Kafka topic
Broker     A server in the Kafka cluster; a single broker can handle hundreds of megabytes of reads and writes per second
Slide 15
Kafka Topic
• A user-defined category to which messages are published
• For each topic, Kafka maintains a partitioned log
• Each partition is an ordered, immutable sequence of messages, and each message is assigned a sequential ID number called its offset
• Writes to a partition are sequential appends, which reduces the number of disk seeks
• Messages can be read from any position in a partition (random access)
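To make the partition-log idea concrete, here is a minimal in-memory sketch in Python. It is a toy model under stated assumptions, not Kafka's storage engine: `PartitionLog` and `Topic` are hypothetical names, offsets are simply list indices, and nothing is written to disk. It shows the two properties listed above: writes only append to the end of a partition, and reads can start at any offset without removing anything.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class PartitionLog:
    """Hypothetical toy model of one partition: an append-only list in which
    a message's index is its offset (no disk, no retention)."""
    messages: List[bytes] = field(default_factory=list)

    def append(self, message: bytes) -> int:
        # Writes only ever go to the end of the log (sequential append).
        self.messages.append(message)
        return len(self.messages) - 1  # offset assigned to the new message

    def read(self, offset: int, max_messages: int = 10) -> List[Tuple[int, bytes]]:
        # Reads may start at any offset; reading never removes messages.
        if offset < 0:
            raise ValueError("offset must be non-negative")
        end = min(offset + max_messages, len(self.messages))
        return [(o, self.messages[o]) for o in range(offset, end)]


@dataclass
class Topic:
    """Hypothetical named category holding a fixed number of partition logs."""
    name: str
    num_partitions: int
    partitions: List[PartitionLog] = field(init=False)

    def __post_init__(self) -> None:
        self.partitions = [PartitionLog() for _ in range(self.num_partitions)]


if __name__ == "__main__":
    topic = Topic("page-views", num_partitions=2)
    log = topic.partitions[0]
    for m in [b"a", b"b", b"c"]:
        log.append(m)
    print(log.read(1, max_messages=5))  # [(1, b'b'), (2, b'c')]
```

Because reading is non-destructive, several consumers can read the same partition from different offsets at the same time.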
Slide 16
Kafka Producers
• Applications that publish messages to topics in the Kafka cluster
• Can be of any kind: front-end applications, streaming services, etc.
• A key can optionally be attached to a message when it is written
• Messages with the same key are written to the same partition
• Can be configured not to wait for an acknowledgement from the Kafka cluster (acks=0), trading durability for throughput
• Can publish messages as fast as the brokers in the cluster can handle
[Figure: several producers publishing to a Kafka cluster]
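The "same key, same partition" rule can be sketched as hashing the key modulo the partition count. This is an illustrative model only: `HashPartitioner` is a hypothetical name, and it uses `zlib.crc32` as a stand-in hash, whereas Kafka's Java client uses murmur2, so the partition numbers will not match a real cluster. Unkeyed messages are spread round-robin in this sketch; real clients use different strategies depending on version.

```python
import itertools
import zlib
from typing import Iterator, Optional


class HashPartitioner:
    """Hypothetical partitioner: keyed messages go to hash(key) % n.

    zlib.crc32 is a stand-in; Kafka's Java client hashes keys with murmur2.
    """

    def __init__(self, num_partitions: int) -> None:
        if num_partitions <= 0:
            raise ValueError("num_partitions must be positive")
        self.num_partitions = num_partitions
        # Messages without a key are spread round-robin in this sketch.
        self._round_robin: Iterator[int] = itertools.cycle(range(num_partitions))

    def partition(self, key: Optional[bytes]) -> int:
        if key is None:
            return next(self._round_robin)
        # In Python 3, crc32 returns an unsigned int, so the result is never negative.
        return zlib.crc32(key) % self.num_partitions


if __name__ == "__main__":
    p = HashPartitioner(4)
    # Every message keyed "user-42" lands in the same partition.
    print({p.partition(b"user-42") for _ in range(100)})
```

One design consequence: because the mapping is `hash(key) % n`, it only stays stable while the number of partitions stays fixed; adding partitions can move a key to a different partition.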
Slide 17
Kafka Consumers
• Applications that subscribe to and consume messages from brokers in the Kafka cluster
• Can be of any kind: real-time consumers, NoSQL consumers, etc.
• A consumer group can be configured with multiple consumers that share the consumption of a topic
• Each consumer in a group reads messages from a unique subset of the partitions in each topic the group subscribes to
• Within a group, messages with the same key arrive at the same consumer, because they are stored in the same partition
• Supports both queuing and publish-subscribe
• Each consumer must track how far it has read (its offset) in each partition
[Figure: a Kafka cluster delivering messages to several consumers]
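The consumer-group rules above can be modelled in a few lines. This is a hypothetical sketch, not the Kafka client: `assign_round_robin` and `GroupMember` are illustrative names, and real Kafka clients ship their own assignment strategies (such as range and round-robin) and commit offsets back to the cluster. Here each partition is given to exactly one member of the group, and each member remembers the next offset to read in each of its partitions.

```python
from typing import Dict, List


def assign_round_robin(partitions: List[int], consumers: List[str]) -> Dict[str, List[int]]:
    """Hypothetical helper: give each partition to exactly one group member."""
    if not consumers:
        raise ValueError("a consumer group needs at least one consumer")
    members = sorted(consumers)  # deterministic member order
    assignment: Dict[str, List[int]] = {m: [] for m in members}
    for i, partition in enumerate(sorted(partitions)):
        assignment[members[i % len(members)]].append(partition)
    return assignment


class GroupMember:
    """Toy consumer that tracks, per partition, the next offset to read."""

    def __init__(self, name: str, partitions: List[int]) -> None:
        self.name = name
        self.positions: Dict[int, int] = {p: 0 for p in partitions}

    def poll(self, logs: Dict[int, List[str]]) -> List[str]:
        batch: List[str] = []
        for p, offset in self.positions.items():
            new_messages = logs[p][offset:]
            batch.extend(new_messages)
            self.positions[p] = offset + len(new_messages)  # advance the position
        return batch


if __name__ == "__main__":
    logs = {0: ["a", "b"], 1: ["c"], 2: ["d", "e"]}
    plan = assign_round_robin([0, 1, 2], ["c1", "c2"])
    print(plan)  # {'c1': [0, 2], 'c2': [1]}
    for name, parts in plan.items():
        print(name, GroupMember(name, parts).poll(logs))
```

Within one group, each message reaches only one member, which is queue behaviour; a `GroupMember` belonging to a different group would be assigned all partitions and receive every message again, which is publish-subscribe behaviour.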
Slide 18
Kafka Brokers
• Each server in the cluster is called a broker
• A broker can handle hundreds of megabytes per second of writes from producers and reads from consumers
• Retains all published messages, whether or not they have been consumed
• Retention is configurable, for example to n days
• A published message remains available for consumption for the configured n days and is discarded after that
• Works like a queue if all consumer instances belong to the same consumer group; otherwise, works like publish-subscribe
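The retention rule (keep everything for n days, consumed or not) can be sketched as a time-based filter. This is a simplification under stated assumptions: `StoredMessage` and `apply_retention` are hypothetical names, and a real broker deletes whole log segments rather than individual messages, with the window set by configuration such as the broker's `log.retention.hours` or a topic's `retention.ms`.

```python
import time
from dataclasses import dataclass
from typing import List, Optional

SECONDS_PER_DAY = 86_400


@dataclass
class StoredMessage:
    offset: int
    timestamp: float  # seconds since the epoch when the message was appended
    value: str


def apply_retention(log: List[StoredMessage], retention_days: float,
                    now: Optional[float] = None) -> List[StoredMessage]:
    """Keep only messages newer than the retention window.

    Consumption is never checked: retention here is purely time-based.
    """
    current = time.time() if now is None else now
    cutoff = current - retention_days * SECONDS_PER_DAY
    # Surviving messages keep their original offsets.
    return [m for m in log if m.timestamp >= cutoff]


if __name__ == "__main__":
    now = time.time()
    log = [StoredMessage(0, now - 8 * SECONDS_PER_DAY, "old"),
           StoredMessage(1, now - 1 * SECONDS_PER_DAY, "recent")]
    print([m.value for m in apply_retention(log, retention_days=7, now=now)])
```

Because survivors keep their offsets, a consumer that falls behind by more than the retention window simply finds that its next offset no longer exists.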
Slide 19
Kafka Producer-Broker-Consumer
Slide 20
How Kafka Can Be Used with Hadoop
Slide 21
Kafka with Hadoop using Camus
• Camus is LinkedIn's Kafka-to-HDFS pipeline
• It runs as a MapReduce job
• It distributes the work of loading data out of Kafka across the job's tasks
• At LinkedIn, it processes tens of billions of messages per day
• All the work is done in a single Hadoop job
Courtesy: Confluent
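One way to picture how a single job can distribute the load is to split the pending data into per-partition offset ranges and balance them across tasks. The sketch below is hypothetical and is not Camus's actual planning code: `WorkUnit`, `plan_work_units`, and `distribute` are illustrative names, a partition that was never copied is assumed to start at offset 0, and the balancing is a simple greedy largest-first heuristic.

```python
import heapq
from dataclasses import dataclass
from typing import Dict, List, Tuple

PartitionId = Tuple[str, int]  # (topic, partition)


@dataclass(frozen=True)
class WorkUnit:
    """One slice of work: copy offsets [start_offset, end_offset) of a partition."""
    topic: str
    partition: int
    start_offset: int  # first offset not yet copied to HDFS
    end_offset: int    # latest offset in Kafka when the job started (exclusive)

    @property
    def size(self) -> int:
        return self.end_offset - self.start_offset


def plan_work_units(committed: Dict[PartitionId, int],
                    latest: Dict[PartitionId, int]) -> List[WorkUnit]:
    """Create a work unit for every partition that has new messages."""
    units: List[WorkUnit] = []
    for (topic, partition), end in latest.items():
        start = committed.get((topic, partition), 0)  # assumption: never copied -> 0
        if end > start:
            units.append(WorkUnit(topic, partition, start, end))
    return units


def distribute(units: List[WorkUnit], num_mappers: int) -> List[List[WorkUnit]]:
    """Greedy balancing: the largest remaining unit goes to the least-loaded mapper."""
    if num_mappers <= 0:
        raise ValueError("num_mappers must be positive")
    heap = [(0, i) for i in range(num_mappers)]  # (messages assigned, mapper index)
    heapq.heapify(heap)
    plan: List[List[WorkUnit]] = [[] for _ in range(num_mappers)]
    for unit in sorted(units, key=lambda u: u.size, reverse=True):
        load, idx = heapq.heappop(heap)
        plan[idx].append(unit)
        heapq.heappush(heap, (load + unit.size, idx))
    return plan
```

After such a job finishes, each unit's `end_offset` would become the committed offset for the next run, so every run copies only new data.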
Slide 22
How Kafka Can Be Used with Spark
Slide 23
Kafka With Spark Streaming
• In Kafka, messages are generally stored in multiple partitions
• If messages are stored in n partitions, reading them in parallel makes consumption faster
• Spark Streaming can achieve this read parallelism effectively
• Read parallelism is achieved by integrating Spark's KafkaInputDStream with Kafka's high-level consumer API
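The benefit of parallel reads can be illustrated with one task per partition on a thread pool. This is a conceptual sketch, not Spark code: `read_partitions_in_parallel` is a hypothetical helper and `fetch` stands in for whatever reads one Kafka partition. In Spark's direct Kafka approach, the analogous idea is that each Kafka partition maps to one RDD partition and is processed by its own task.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def read_partitions_in_parallel(partitions: List[int],
                                fetch: Callable[[int], List[str]],
                                max_workers: int = 4) -> Dict[int, List[str]]:
    """Run one fetch task per partition on a thread pool and collect the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as `partitions`.
        results = list(pool.map(fetch, partitions))
    return dict(zip(partitions, results))


if __name__ == "__main__":
    # Hypothetical stand-in for reading one Kafka partition.
    fake_partitions = {0: ["a", "b"], 1: ["c"], 2: ["d", "e", "f"]}
    batches = read_partitions_in_parallel(list(fake_partitions), fake_partitions.__getitem__)
    print(batches)
```

With n partitions and at least n workers, all partitions are read concurrently; with fewer partitions than workers, the extra workers sit idle, which is why the partition count caps read parallelism.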
Slide 24
Kafka With Spark Streaming
[Figure: apps send events to Kafka, which feeds them to a streaming engine]
• In Kafka, messages are generally stored in multiple partitions
Slide 25
How Kafka Can Be Used with Storm
Slide 26
Kafka With Storm
Slide 27
Companies Using Kafka
Slide 28
Questions
Slide 29
Slide 30
Your feedback is vital to us, whether it is a compliment, a suggestion, or a complaint. It helps us make your experience better!
Please spare a few minutes to take the survey after the webinar.
Survey

More Related Content

What's hot

Kafka Cluster Federation at Uber (Yupeng Fui & Xiaoman Dong, Uber) Kafka Summ...
Kafka Cluster Federation at Uber (Yupeng Fui & Xiaoman Dong, Uber) Kafka Summ...Kafka Cluster Federation at Uber (Yupeng Fui & Xiaoman Dong, Uber) Kafka Summ...
Kafka Cluster Federation at Uber (Yupeng Fui & Xiaoman Dong, Uber) Kafka Summ...
confluent
 
Building High-Throughput, Low-Latency Pipelines in Kafka
Building High-Throughput, Low-Latency Pipelines in KafkaBuilding High-Throughput, Low-Latency Pipelines in Kafka
Building High-Throughput, Low-Latency Pipelines in Kafka
confluent
 
From Newbie to Highly Available, a Successful Kafka Adoption Tale (Jonathan S...
From Newbie to Highly Available, a Successful Kafka Adoption Tale (Jonathan S...From Newbie to Highly Available, a Successful Kafka Adoption Tale (Jonathan S...
From Newbie to Highly Available, a Successful Kafka Adoption Tale (Jonathan S...
confluent
 
The Good, The Bad, and The Avro (Graham Stirling, Saxo Bank and David Navalho...
The Good, The Bad, and The Avro (Graham Stirling, Saxo Bank and David Navalho...The Good, The Bad, and The Avro (Graham Stirling, Saxo Bank and David Navalho...
The Good, The Bad, and The Avro (Graham Stirling, Saxo Bank and David Navalho...
confluent
 

What's hot (20)

Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Hello, kafka! (an introduction to apache kafka)
Hello, kafka! (an introduction to apache kafka)Hello, kafka! (an introduction to apache kafka)
Hello, kafka! (an introduction to apache kafka)
 
Kafka Tutorial - basics of the Kafka streaming platform
Kafka Tutorial - basics of the Kafka streaming platformKafka Tutorial - basics of the Kafka streaming platform
Kafka Tutorial - basics of the Kafka streaming platform
 
Twitter’s Apache Kafka Adoption Journey | Ming Liu, Twitter
Twitter’s Apache Kafka Adoption Journey | Ming Liu, TwitterTwitter’s Apache Kafka Adoption Journey | Ming Liu, Twitter
Twitter’s Apache Kafka Adoption Journey | Ming Liu, Twitter
 
... No it's Apache Kafka!
... No it's Apache Kafka!... No it's Apache Kafka!
... No it's Apache Kafka!
 
An Introduction to Apache Kafka
An Introduction to Apache KafkaAn Introduction to Apache Kafka
An Introduction to Apache Kafka
 
Kafka Cluster Federation at Uber (Yupeng Fui & Xiaoman Dong, Uber) Kafka Summ...
Kafka Cluster Federation at Uber (Yupeng Fui & Xiaoman Dong, Uber) Kafka Summ...Kafka Cluster Federation at Uber (Yupeng Fui & Xiaoman Dong, Uber) Kafka Summ...
Kafka Cluster Federation at Uber (Yupeng Fui & Xiaoman Dong, Uber) Kafka Summ...
 
Can Kafka Handle a Lyft Ride? (Andrey Falko & Can Cecen, Lyft) Kafka Summit 2020
Can Kafka Handle a Lyft Ride? (Andrey Falko & Can Cecen, Lyft) Kafka Summit 2020Can Kafka Handle a Lyft Ride? (Andrey Falko & Can Cecen, Lyft) Kafka Summit 2020
Can Kafka Handle a Lyft Ride? (Andrey Falko & Can Cecen, Lyft) Kafka Summit 2020
 
Introduction to Apache Kafka and why it matters - Madrid
Introduction to Apache Kafka and why it matters - MadridIntroduction to Apache Kafka and why it matters - Madrid
Introduction to Apache Kafka and why it matters - Madrid
 
Building High-Throughput, Low-Latency Pipelines in Kafka
Building High-Throughput, Low-Latency Pipelines in KafkaBuilding High-Throughput, Low-Latency Pipelines in Kafka
Building High-Throughput, Low-Latency Pipelines in Kafka
 
Fault Tolerance with Kafka
Fault Tolerance with KafkaFault Tolerance with Kafka
Fault Tolerance with Kafka
 
Kafka Summit SF 2017 - Kafka Connect Best Practices – Advice from the Field
Kafka Summit SF 2017 - Kafka Connect Best Practices – Advice from the FieldKafka Summit SF 2017 - Kafka Connect Best Practices – Advice from the Field
Kafka Summit SF 2017 - Kafka Connect Best Practices – Advice from the Field
 
A Modern C++ Kafka API | Kenneth Jia, Morgan Stanley
A Modern C++ Kafka API | Kenneth Jia, Morgan StanleyA Modern C++ Kafka API | Kenneth Jia, Morgan Stanley
A Modern C++ Kafka API | Kenneth Jia, Morgan Stanley
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
From Newbie to Highly Available, a Successful Kafka Adoption Tale (Jonathan S...
From Newbie to Highly Available, a Successful Kafka Adoption Tale (Jonathan S...From Newbie to Highly Available, a Successful Kafka Adoption Tale (Jonathan S...
From Newbie to Highly Available, a Successful Kafka Adoption Tale (Jonathan S...
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
The Good, The Bad, and The Avro (Graham Stirling, Saxo Bank and David Navalho...
The Good, The Bad, and The Avro (Graham Stirling, Saxo Bank and David Navalho...The Good, The Bad, and The Avro (Graham Stirling, Saxo Bank and David Navalho...
The Good, The Bad, and The Avro (Graham Stirling, Saxo Bank and David Navalho...
 
Kafka 101
Kafka 101Kafka 101
Kafka 101
 
Monitoring and Resiliency Testing our Apache Kafka Clusters at Goldman Sachs ...
Monitoring and Resiliency Testing our Apache Kafka Clusters at Goldman Sachs ...Monitoring and Resiliency Testing our Apache Kafka Clusters at Goldman Sachs ...
Monitoring and Resiliency Testing our Apache Kafka Clusters at Goldman Sachs ...
 
Devoxx Morocco 2016 - Microservices with Kafka
Devoxx Morocco 2016 - Microservices with KafkaDevoxx Morocco 2016 - Microservices with Kafka
Devoxx Morocco 2016 - Microservices with Kafka
 

Viewers also liked

Reduce Side Joins
Reduce Side Joins Reduce Side Joins
Reduce Side Joins
Edureka!
 

Viewers also liked (20)

Introduction to Big Data and Hadoop
Introduction to Big Data and HadoopIntroduction to Big Data and Hadoop
Introduction to Big Data and Hadoop
 
Hadoop- A Highly Available and Secure Enterprise DataWarehousing solution
Hadoop- A Highly Available and Secure Enterprise DataWarehousing solutionHadoop- A Highly Available and Secure Enterprise DataWarehousing solution
Hadoop- A Highly Available and Secure Enterprise DataWarehousing solution
 
Bulk Loading Into HBase With MapReduce
Bulk Loading Into HBase With MapReduceBulk Loading Into HBase With MapReduce
Bulk Loading Into HBase With MapReduce
 
Is It A Right Time For Me To Learn Hadoop. Find out ?
Is It A Right Time For Me To Learn Hadoop. Find out ?Is It A Right Time For Me To Learn Hadoop. Find out ?
Is It A Right Time For Me To Learn Hadoop. Find out ?
 
Improve Customer service with Big Data
Improve Customer service with Big DataImprove Customer service with Big Data
Improve Customer service with Big Data
 
Introduction to Big Data & Hadoop
Introduction to Big Data & HadoopIntroduction to Big Data & Hadoop
Introduction to Big Data & Hadoop
 
HealthCare and Big Data with Hadoop
HealthCare and Big Data with HadoopHealthCare and Big Data with Hadoop
HealthCare and Big Data with Hadoop
 
Introduction to Big data & Hadoop -I
Introduction to Big data & Hadoop -IIntroduction to Big data & Hadoop -I
Introduction to Big data & Hadoop -I
 
Hadoop Developer
Hadoop DeveloperHadoop Developer
Hadoop Developer
 
Hadoop : The Pile of Big Data
Hadoop : The Pile of Big DataHadoop : The Pile of Big Data
Hadoop : The Pile of Big Data
 
Hadoop for Java Professionals
Hadoop for Java ProfessionalsHadoop for Java Professionals
Hadoop for Java Professionals
 
Hadoop Adminstration with Latest Release (2.0)
Hadoop Adminstration with Latest Release (2.0)Hadoop Adminstration with Latest Release (2.0)
Hadoop Adminstration with Latest Release (2.0)
 
Understanding Big Data And Hadoop
Understanding Big Data And HadoopUnderstanding Big Data And Hadoop
Understanding Big Data And Hadoop
 
Hadoop for Data Warehousing professionals
Hadoop for Data Warehousing professionalsHadoop for Data Warehousing professionals
Hadoop for Data Warehousing professionals
 
Administer Hadoop Cluster
Administer Hadoop ClusterAdminister Hadoop Cluster
Administer Hadoop Cluster
 
Hadoop Career Path and Interview Preparation
Hadoop Career Path and Interview PreparationHadoop Career Path and Interview Preparation
Hadoop Career Path and Interview Preparation
 
Webinar: Mastering Python - An Excellent tool for Web Scraping and Data Anal...
Webinar:  Mastering Python - An Excellent tool for Web Scraping and Data Anal...Webinar:  Mastering Python - An Excellent tool for Web Scraping and Data Anal...
Webinar: Mastering Python - An Excellent tool for Web Scraping and Data Anal...
 
A day in the life of hadoop administrator!
A day in the life of hadoop administrator!A day in the life of hadoop administrator!
A day in the life of hadoop administrator!
 
Reduce Side Joins
Reduce Side Joins Reduce Side Joins
Reduce Side Joins
 
Advanced Security In Hadoop Cluster
Advanced Security In Hadoop ClusterAdvanced Security In Hadoop Cluster
Advanced Security In Hadoop Cluster
 

Similar to How kafka is transforming hadoop, spark & storm

Unlocking the Power of Apache Kafka: How Kafka Listeners Facilitate Real-time...
Unlocking the Power of Apache Kafka: How Kafka Listeners Facilitate Real-time...Unlocking the Power of Apache Kafka: How Kafka Listeners Facilitate Real-time...
Unlocking the Power of Apache Kafka: How Kafka Listeners Facilitate Real-time...
Denodo
 
Apache Kafka - Scalable Message Processing and more!
Apache Kafka - Scalable Message Processing and more!Apache Kafka - Scalable Message Processing and more!
Apache Kafka - Scalable Message Processing and more!
Guido Schmutz
 
Kafka Connect & Kafka Streams/KSQL - the ecosystem around Kafka
Kafka Connect & Kafka Streams/KSQL - the ecosystem around KafkaKafka Connect & Kafka Streams/KSQL - the ecosystem around Kafka
Kafka Connect & Kafka Streams/KSQL - the ecosystem around Kafka
Guido Schmutz
 

Similar to How kafka is transforming hadoop, spark & storm (20)

How Apache Kafka is transforming Hadoop, Spark and Storm
How Apache Kafka is transforming Hadoop, Spark and StormHow Apache Kafka is transforming Hadoop, Spark and Storm
How Apache Kafka is transforming Hadoop, Spark and Storm
 
Apache Kafka with Spark Streaming: Real-time Analytics Redefined
Apache Kafka with Spark Streaming: Real-time Analytics RedefinedApache Kafka with Spark Streaming: Real-time Analytics Redefined
Apache Kafka with Spark Streaming: Real-time Analytics Redefined
 
Understanding kafka
Understanding kafkaUnderstanding kafka
Understanding kafka
 
Unlocking the Power of Apache Kafka: How Kafka Listeners Facilitate Real-time...
Unlocking the Power of Apache Kafka: How Kafka Listeners Facilitate Real-time...Unlocking the Power of Apache Kafka: How Kafka Listeners Facilitate Real-time...
Unlocking the Power of Apache Kafka: How Kafka Listeners Facilitate Real-time...
 
Trivadis TechEvent 2016 Apache Kafka - Scalable Massage Processing and more! ...
Trivadis TechEvent 2016 Apache Kafka - Scalable Massage Processing and more! ...Trivadis TechEvent 2016 Apache Kafka - Scalable Massage Processing and more! ...
Trivadis TechEvent 2016 Apache Kafka - Scalable Massage Processing and more! ...
 
Apache kafka configuration-guide
Apache kafka configuration-guideApache kafka configuration-guide
Apache kafka configuration-guide
 
Apache Kafka - Scalable Message Processing and more!
Apache Kafka - Scalable Message Processing and more!Apache Kafka - Scalable Message Processing and more!
Apache Kafka - Scalable Message Processing and more!
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !
 
Kafka for data scientists
Kafka for data scientistsKafka for data scientists
Kafka for data scientists
 
Kafka Explainaton
Kafka ExplainatonKafka Explainaton
Kafka Explainaton
 
Kafka Tutorial - introduction to the Kafka streaming platform
Kafka Tutorial - introduction to the Kafka streaming platformKafka Tutorial - introduction to the Kafka streaming platform
Kafka Tutorial - introduction to the Kafka streaming platform
 
Python Kafka Integration: Developers Guide
Python Kafka Integration: Developers GuidePython Kafka Integration: Developers Guide
Python Kafka Integration: Developers Guide
 
Kafka at the Edge: an IoT scenario with OpenShift Streams for Apache Kafka | ...
Kafka at the Edge: an IoT scenario with OpenShift Streams for Apache Kafka | ...Kafka at the Edge: an IoT scenario with OpenShift Streams for Apache Kafka | ...
Kafka at the Edge: an IoT scenario with OpenShift Streams for Apache Kafka | ...
 
Applying ML on your Data in Motion with AWS and Confluent | Joseph Morais, Co...
Applying ML on your Data in Motion with AWS and Confluent | Joseph Morais, Co...Applying ML on your Data in Motion with AWS and Confluent | Joseph Morais, Co...
Applying ML on your Data in Motion with AWS and Confluent | Joseph Morais, Co...
 
[Big Data Spain] Apache Spark Streaming + Kafka 0.10: an Integration Story
[Big Data Spain] Apache Spark Streaming + Kafka 0.10:  an Integration Story[Big Data Spain] Apache Spark Streaming + Kafka 0.10:  an Integration Story
[Big Data Spain] Apache Spark Streaming + Kafka 0.10: an Integration Story
 
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
 
Kafka overview
Kafka overviewKafka overview
Kafka overview
 
Apache Kafka - Strakin Technologies Pvt Ltd
Apache Kafka - Strakin Technologies Pvt LtdApache Kafka - Strakin Technologies Pvt Ltd
Apache Kafka - Strakin Technologies Pvt Ltd
 
Kafka Connect & Kafka Streams/KSQL - the ecosystem around Kafka
Kafka Connect & Kafka Streams/KSQL - the ecosystem around KafkaKafka Connect & Kafka Streams/KSQL - the ecosystem around Kafka
Kafka Connect & Kafka Streams/KSQL - the ecosystem around Kafka
 
Building Event-Driven Systems with Apache Kafka
Building Event-Driven Systems with Apache KafkaBuilding Event-Driven Systems with Apache Kafka
Building Event-Driven Systems with Apache Kafka
 

More from Edureka!

More from Edureka! (20)

What to learn during the 21 days Lockdown | Edureka
What to learn during the 21 days Lockdown | EdurekaWhat to learn during the 21 days Lockdown | Edureka
What to learn during the 21 days Lockdown | Edureka
 
Top 10 Dying Programming Languages in 2020 | Edureka
Top 10 Dying Programming Languages in 2020 | EdurekaTop 10 Dying Programming Languages in 2020 | Edureka
Top 10 Dying Programming Languages in 2020 | Edureka
 
Top 5 Trending Business Intelligence Tools | Edureka
Top 5 Trending Business Intelligence Tools | EdurekaTop 5 Trending Business Intelligence Tools | Edureka
Top 5 Trending Business Intelligence Tools | Edureka
 
Tableau Tutorial for Data Science | Edureka
Tableau Tutorial for Data Science | EdurekaTableau Tutorial for Data Science | Edureka
Tableau Tutorial for Data Science | Edureka
 
Python Programming Tutorial | Edureka
Python Programming Tutorial | EdurekaPython Programming Tutorial | Edureka
Python Programming Tutorial | Edureka
 
Top 5 PMP Certifications | Edureka
Top 5 PMP Certifications | EdurekaTop 5 PMP Certifications | Edureka
Top 5 PMP Certifications | Edureka
 
Top Maven Interview Questions in 2020 | Edureka
Top Maven Interview Questions in 2020 | EdurekaTop Maven Interview Questions in 2020 | Edureka
Top Maven Interview Questions in 2020 | Edureka
 
Linux Mint Tutorial | Edureka
Linux Mint Tutorial | EdurekaLinux Mint Tutorial | Edureka
Linux Mint Tutorial | Edureka
 
How to Deploy Java Web App in AWS| Edureka
How to Deploy Java Web App in AWS| EdurekaHow to Deploy Java Web App in AWS| Edureka
How to Deploy Java Web App in AWS| Edureka
 
Importance of Digital Marketing | Edureka
Importance of Digital Marketing | EdurekaImportance of Digital Marketing | Edureka
Importance of Digital Marketing | Edureka
 
RPA in 2020 | Edureka
RPA in 2020 | EdurekaRPA in 2020 | Edureka
RPA in 2020 | Edureka
 
Email Notifications in Jenkins | Edureka
Email Notifications in Jenkins | EdurekaEmail Notifications in Jenkins | Edureka
Email Notifications in Jenkins | Edureka
 
EA Algorithm in Machine Learning | Edureka
EA Algorithm in Machine Learning | EdurekaEA Algorithm in Machine Learning | Edureka
EA Algorithm in Machine Learning | Edureka
 
Cognitive AI Tutorial | Edureka
Cognitive AI Tutorial | EdurekaCognitive AI Tutorial | Edureka
Cognitive AI Tutorial | Edureka
 
AWS Cloud Practitioner Tutorial | Edureka
AWS Cloud Practitioner Tutorial | EdurekaAWS Cloud Practitioner Tutorial | Edureka
AWS Cloud Practitioner Tutorial | Edureka
 
Blue Prism Top Interview Questions | Edureka
Blue Prism Top Interview Questions | EdurekaBlue Prism Top Interview Questions | Edureka
Blue Prism Top Interview Questions | Edureka
 
Big Data on AWS Tutorial | Edureka
Big Data on AWS Tutorial | Edureka Big Data on AWS Tutorial | Edureka
Big Data on AWS Tutorial | Edureka
 
A star algorithm | A* Algorithm in Artificial Intelligence | Edureka
A star algorithm | A* Algorithm in Artificial Intelligence | EdurekaA star algorithm | A* Algorithm in Artificial Intelligence | Edureka
A star algorithm | A* Algorithm in Artificial Intelligence | Edureka
 
Kubernetes Installation on Ubuntu | Edureka
Kubernetes Installation on Ubuntu | EdurekaKubernetes Installation on Ubuntu | Edureka
Kubernetes Installation on Ubuntu | Edureka
 
Introduction to DevOps | Edureka
Introduction to DevOps | EdurekaIntroduction to DevOps | Edureka
Introduction to DevOps | Edureka
 

Recently uploaded

EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
Earley Information Science
 

Recently uploaded (20)

Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
đŸŦ The future of MySQL is Postgres 🐘
đŸŦ  The future of MySQL is Postgres   🐘đŸŦ  The future of MySQL is Postgres   🐘
đŸŦ The future of MySQL is Postgres 🐘
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 

How kafka is transforming hadoop, spark & storm

  • 2. Slide 2Slide 2Slide 2 www.edureka.co/apache-Kafka Agenda At the end of this webinar you will be able to know about : īƒŧ Million Dollar Question! Why we need Kafka īƒŧ What is Kafka? īƒŧ Kafka Architecture īƒŧ Kafka with Hadoop īƒŧ Kafka with Spark īƒŧ Kafka with Storm īƒŧ Companies using Kafka īƒŧ Demo on Kafka Messaging Service â€Ļ
  • 3. Slide 3Slide 3Slide 3 www.edureka.co/apache-Kafka Million Dollar Question! Why we need Kafka??
  • 4. Slide 4Slide 4Slide 4 www.edureka.co/apache-Kafka Why Kafka is preferred in place of more traditional brokers like JMS and AMQP Why Kafka Cluster?
  • 5. Slide 5Slide 5Slide 5 www.edureka.co/apache-Kafka Kafka Producer Performance with Other Systems
  • 6. Slide 6Slide 6Slide 6 www.edureka.co/apache-Kafka Kafka Consumer Performance with Other Systems
  • 7. Slide 7Slide 7Slide 7 www.edureka.co/apache-Kafka Salient Features of Kafka Feature Description High Throughput Support for millions of messages with modest hardware Scalability Highly scalable distributed systems with no downtime Replication Messages can be replicated across cluster, which provides support for multiple subscribers and also in case of failure balances the consumers Durability Provides support for persistence of messages to disk which can be further used for batch consumption Stream Processing Kafka can be used along with real time streaming applications like spark and storm Data Loss Kafka with the proper configurations can ensure zero data loss
  • 8. Slide 8Slide 8Slide 8 www.edureka.co/apache-Kafka ī‚Ž With Kafka we can easily handle hundreds of thousands of messages in a second, ī‚Ž The cluster can be expanded with no downtime, making Kafka highly scalable ī‚Ž Messages are replicated, which provides reliability and durability ī‚Ž Fault tolerant ī‚ŽScalable Kafka Advantages
  • 9. Slide 9Slide 9Slide 9 www.edureka.co/apache-Kafka What is Kafka
  • 10. Slide 10Slide 10Slide 10 www.edureka.co/apache-Kafka ī‚Ž A distributed publish-subscribe messaging system ī‚Ž Developed at LinkedIn Corporation ī‚Ž Provides solution to handle all activity stream data ī‚Ž Fully supported in Hadoop platform ī‚Ž Partitions real time consumption across cluster of machines ī‚Ž Provides a mechanism for parallel load into Hadoop What is Kafka ?
  • 11. Slide 11Slide 11Slide 11 www.edureka.co/apache-Kafka Apache Kafka – Overview Kafka External Tracking Proxy Frontend FrontendFrontend Background Service (Consumer) Background Service (Consumer) Hadoop DWH Background Service (Producer) Background Service (Producer)
  • 12. Slide 12Slide 12Slide 12 www.edureka.co/apache-Kafka Kafka Architecture
  • 13. Kafka Architecture [diagram: producers (front end, services, proxies, adapters, other producers) write to a cluster of Kafka brokers coordinated by ZooKeeper; consumers (real time, NoSQL, Hadoop, warehouses, others) read from the brokers]
  • 14. Kafka Core Components
      The core concepts of Kafka are:
      ◦ Topic: a category or feed to which messages are published
      ◦ Producer: publishes messages to a Kafka topic
      ◦ Consumer: subscribes to a Kafka topic and consumes its messages
      ◦ Broker: a Kafka server; handles hundreds of megabytes of reads and writes per second
  • 15. Kafka Topic
      ◦ A user-defined category to which messages are published
      ◦ For each topic, Kafka maintains a partitioned log
      ◦ Each partition is an ordered, immutable sequence of messages in which each message is assigned a sequential ID number called the offset
      ◦ Writes to a partition are sequential appends, which minimizes the number of hard-disk seeks
      ◦ Messages can be read from a partition starting at any offset (random access)
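A partition's behaviour (append-only writes, a sequential offset per message, reads from any offset) can be sketched with a toy in-memory class. `PartitionLog` is a hypothetical name, and real brokers store partitions as segment files on disk rather than in a Python list.

```python
from typing import List, Tuple


class PartitionLog:
    """Toy in-memory model of one partition (an illustrative assumption).
    Messages can only be appended; there is no method to change them."""

    def __init__(self) -> None:
        self._messages: List[bytes] = []

    def append(self, message: bytes) -> int:
        """Append to the end of the log and return the new message's offset."""
        self._messages.append(message)
        return len(self._messages) - 1

    def read(self, offset: int, max_messages: int = 10) -> List[Tuple[int, bytes]]:
        """Return up to max_messages (offset, message) pairs starting at any offset."""
        if offset < 0:
            raise ValueError("offset must be non-negative")
        batch = self._messages[offset:offset + max_messages]
        return [(offset + i, m) for i, m in enumerate(batch)]


log = PartitionLog()
for value in (b"login", b"click", b"logout"):
    log.append(value)
print(log.read(1))  # [(1, b'click'), (2, b'logout')]
```

Because offsets only grow and existing entries never change, a consumer can remember a single number per partition and resume from it later.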
  • 16. Kafka Producers
      ◦ Applications publish messages to topics in the Kafka cluster
      ◦ Producers can be any kind of application, such as front ends or streaming services
      ◦ A key can optionally be attached to each message when it is written
      ◦ Messages with the same key are written to the same partition
      ◦ Can be configured not to wait for an acknowledgement from the Kafka cluster
      ◦ Publishes messages as fast as the brokers in the cluster can handle
      [diagram: several producers publishing to a Kafka cluster]
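Key-based routing can be illustrated by hashing the key and taking the result modulo the partition count. This is a sketch under stated assumptions: Kafka's default partitioner hashes the serialized key with murmur2, whereas `zlib.crc32` is used here only as a stand-in, so the partition numbers will not match a real cluster. `SketchPartitioner` is a hypothetical name.

```python
import itertools
import zlib
from typing import Optional


class SketchPartitioner:
    """Illustrative partitioner (not Kafka's implementation)."""

    def __init__(self, num_partitions: int) -> None:
        if num_partitions <= 0:
            raise ValueError("num_partitions must be positive")
        self.num_partitions = num_partitions
        self._round_robin = itertools.count()

    def partition(self, key: Optional[bytes]) -> int:
        if key is None:
            # Keyless messages are spread round-robin here; this is a
            # simplification, since newer Kafka clients use a "sticky" strategy.
            return next(self._round_robin) % self.num_partitions
        # Same key -> same hash -> same partition, as long as the
        # partition count stays fixed. crc32 stands in for murmur2.
        return zlib.crc32(key) % self.num_partitions


partitioner = SketchPartitioner(num_partitions=6)
p1 = partitioner.partition(b"user-42")
p2 = partitioner.partition(b"user-42")
print(p1 == p2)  # True
```

The mapping depends on the partition count, so adding partitions to a topic can send an existing key to a different partition.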
  • 17. Kafka Consumers
      ◦ Applications subscribe to topics and consume messages from the brokers in the Kafka cluster
      ◦ Consumers can be any kind of application, such as real-time or NoSQL consumers
      ◦ When consuming from a topic, a consumer group can be configured with multiple consumers
      ◦ Each consumer in a group reads messages from a unique subset of the partitions of each topic it subscribes to
      ◦ Messages with the same key therefore arrive at the same consumer within a group
      ◦ Supports both queuing and publish-subscribe
      ◦ Consumers must keep track of how many messages they have consumed, i.e. their offset in each partition
      [diagram: a Kafka cluster delivering messages to several consumers]
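How a consumer group divides partitions among its members can be sketched with a round-robin assignment for a single topic. It is loosely modelled on the idea behind Kafka's RoundRobinAssignor; the function name `assign_round_robin` is hypothetical and this is not the client's implementation. Calling it once per group also shows the queue/publish-subscribe distinction: within a group each partition has exactly one reader (queue), while a second group gets its own full assignment and therefore sees every message too (publish-subscribe).

```python
from typing import Dict, List


def assign_round_robin(partitions: List[int], consumers: List[str]) -> Dict[str, List[int]]:
    """Spread a topic's partitions over the members of ONE consumer group.
    Each partition goes to exactly one member; sorting makes it deterministic."""
    if not consumers:
        raise ValueError("a consumer group needs at least one member")
    members = sorted(consumers)
    assignment: Dict[str, List[int]] = {m: [] for m in members}
    for i, p in enumerate(sorted(partitions)):
        assignment[members[i % len(members)]].append(p)
    return assignment


print(assign_round_robin([0, 1, 2, 3, 4, 5], ["c1", "c2", "c3"]))
# {'c1': [0, 3], 'c2': [1, 4], 'c3': [2, 5]}
```

With more consumers than partitions, the extra members receive an empty list and sit idle, which is why a topic's partition count caps the useful parallelism of one group.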
  • 18. Kafka Brokers
      ◦ Each server in the cluster is called a broker
      ◦ Handles hundreds of megabytes per second of writes from producers and reads from consumers
      ◦ Retains all published messages, whether or not they have been consumed
      ◦ Retention is configured as a period of n days
      ◦ Published messages are available for consumption for the configured n days and are discarded afterwards
      ◦ Works like a queue if the consumer instances belong to the same consumer group; otherwise works like publish-subscribe
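Time-based retention can be sketched as a filter on message age that never looks at whether anyone has read a message. This is a simplification: real brokers delete whole log segments once they age out (controlled by settings such as `log.retention.hours` or the topic-level `retention.ms`), not individual messages. `StoredMessage` and `apply_retention` are hypothetical names.

```python
import time
from dataclasses import dataclass
from typing import List, Optional

SECONDS_PER_DAY = 24 * 60 * 60


@dataclass
class StoredMessage:
    offset: int
    timestamp: float  # seconds since the epoch when the message was written
    value: bytes


def apply_retention(messages: List[StoredMessage], retention_days: float,
                    now: Optional[float] = None) -> List[StoredMessage]:
    """Keep only messages younger than retention_days.
    Consumption status is never consulted: read and unread messages expire alike."""
    now = time.time() if now is None else now
    cutoff = now - retention_days * SECONDS_PER_DAY
    return [m for m in messages if m.timestamp >= cutoff]


now = 10 * SECONDS_PER_DAY
stored = [
    StoredMessage(0, 0.0, b"a"),
    StoredMessage(1, 5.0 * SECONDS_PER_DAY, b"b"),
    StoredMessage(2, 9.0 * SECONDS_PER_DAY, b"c"),
]
kept = apply_retention(stored, retention_days=7, now=now)
print([m.offset for m in kept])  # [1, 2]
```

Surviving messages keep their original offsets, so a consumer that fell behind the retention window simply finds its old offsets gone.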
  • 19. Kafka Producer-Broker-Consumer
  • 20. How Kafka can be used with Hadoop
  • 21. Kafka with Hadoop using Camus
      ◦ Camus is LinkedIn's Kafka-to-HDFS pipeline
      ◦ It runs as a MapReduce job
      ◦ It distributes the loading of data out of Kafka across map tasks
      ◦ At LinkedIn, it processes tens of billions of messages per day
      ◦ All the work is done in a single Hadoop job
      Courtesy: Confluent
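The idea of one job distributing the load out of Kafka can be illustrated by planning which offset range of each partition every map task should pull: from where the previous run stopped up to the latest offset. This is a hypothetical planner, not Camus's code. The names `OffsetRange` and `plan_pull_tasks`, the greedy largest-first balancing, and starting new partitions at offset 0 are all assumptions made for the sketch.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

TopicPartition = Tuple[str, int]


@dataclass(frozen=True)
class OffsetRange:
    topic: str
    partition: int
    start: int  # first offset to pull: where the previous run stopped
    end: int    # exclusive: latest offset when this run started

    @property
    def size(self) -> int:
        return self.end - self.start


def plan_pull_tasks(committed: Dict[TopicPartition, int],
                    latest: Dict[TopicPartition, int],
                    num_tasks: int) -> List[List[OffsetRange]]:
    """Split the new data in every partition across num_tasks map tasks."""
    if num_tasks <= 0:
        raise ValueError("num_tasks must be positive")
    ranges = [
        # Assumption: a partition with no committed offset starts at 0.
        OffsetRange(topic, part, committed.get((topic, part), 0), end)
        for (topic, part), end in sorted(latest.items())
        if end > committed.get((topic, part), 0)  # skip partitions with nothing new
    ]
    # Greedy balancing (our choice): biggest range goes to the least-loaded task.
    ranges.sort(key=lambda r: r.size, reverse=True)
    tasks: List[List[OffsetRange]] = [[] for _ in range(num_tasks)]
    loads = [0] * num_tasks
    for r in ranges:
        i = loads.index(min(loads))
        tasks[i].append(r)
        loads[i] += r.size
    return tasks


committed = {("events", 0): 100, ("events", 1): 50}
latest = {("events", 0): 400, ("events", 1): 150, ("events", 2): 30}
for i, task in enumerate(plan_pull_tasks(committed, latest, num_tasks=2)):
    print(i, [(r.partition, r.start, r.end) for r in task])
# 0 [(0, 100, 400)]
# 1 [(1, 50, 150), (2, 0, 30)]
```

After the tasks finish writing to HDFS, the `end` offsets would become the committed offsets for the next run.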
  • 22. How Kafka can be used with Spark
  • 23. Kafka With Spark Streaming
      ◦ If messages are stored in n partitions, reading them in parallel makes processing faster
      ◦ In Kafka, messages are generally stored in multiple partitions
      ◦ Spark Streaming can effectively read these partitions in parallel
      ◦ Parallel reading is achieved by integrating Spark's KafkaInputDStream with the Kafka High Level Consumer API
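The benefit of parallel reads can be shown in plain Python (this is not Spark code): one task per partition, run concurrently, each preserving the order of its own partition. `read_partition` and `read_all_partitions` are hypothetical names, and upper-casing the strings stands in for real fetching and processing.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def read_partition(partition_id: int, messages: List[str]) -> List[str]:
    """Stand-in for fetching and processing one partition's messages."""
    return [m.upper() for m in messages]


def read_all_partitions(partitions: Dict[int, List[str]],
                        max_workers: int = 4) -> Dict[int, List[str]]:
    """Read every partition concurrently, one task per partition.
    Order is preserved within each partition, not across partitions."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pid: pool.submit(read_partition, pid, msgs)
                   for pid, msgs in partitions.items()}
        return {pid: f.result() for pid, f in futures.items()}


result = read_all_partitions({0: ["a", "b"], 1: ["c"], 2: []})
print(result)  # {0: ['A', 'B'], 1: ['C'], 2: []}
```

Because ordering is only guaranteed inside a partition, downstream logic that needs per-key order relies on same-key messages sharing a partition.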
  • 24. Kafka With Spark Streaming [diagram: apps, events, Kafka and a streaming engine]
      ◦ In Kafka, messages are generally stored in multiple partitions
  • 25. How Kafka can be used with Storm
  • 26. Kafka With Storm
  • 27. Companies Using Kafka
  • 30. Survey: Your feedback is vital to us, be it a compliment, a suggestion or a complaint. It helps us make your experience better! Please spare a few minutes to take the survey after the webinar.