SlideShare a Scribd company logo
1 of 7
Download to read offline
STREAMING WITH
KAFKA
Publish/Subscribe Messaging with Kafka
What is streaming?
■ So far we’ve really just talked about processing historical, existing big data
– Sitting on HDFS
– Sitting in a database
■ But how does new data get into your cluster? Especially if it’s “Big data”?
– New log entries from your web servers
– New sensor data from your IoT system
– New stock trades
■ Streaming lets you publish this data, in real time, to your cluster.
– And you can even process it in real time as it comes in!
Two problems
■ How to get data from many different sources flowing into your cluster
■ Processing it when it gets there
■ First, let’s focus on the first problem
Enter Kafka
■ Kafka is a general-purpose publish/subscribe messaging system
■ Kafka servers store all incoming messages from publishers for some period of
time, and publishes them to a stream of data called a topic.
■ Kafka consumers subscribe to one or more topics, and receive data as it’s
published
■ A stream / topic can have many different consumers, all with their own
position in the stream maintained
■ It’s not just for Hadoop
Kafka architecture
Kafka Cluster
App App App
App
App
App App App
DB
DB
Producers
Consumers
Stream
Processors
Connectors
How Kafka scales
Image: kafka.apache.org
■ Kafka itself may be distributed among
many processes on many servers
– Will distribute the storage of stream
data as well
■ Consumers may also be distributed
– Consumers of the same group will
have messages distributed amongst
them
– Consumers of different groups will get
their own copy of each message
Let’s play
■ Start Kafka on our sandbox
■ Set up a topic
– Publish some data to it, and watch it get consumed
■ Set up a file connector
– Monitor a log file and publish additions to it

More Related Content

Similar to STREAMING WITH KAFKA Publish/Subscribe Messaging with Kafka

introductiontoapachekafka-201102140206.pdf
introductiontoapachekafka-201102140206.pdfintroductiontoapachekafka-201102140206.pdf
introductiontoapachekafka-201102140206.pdf
TarekHamdi8
 

Similar to STREAMING WITH KAFKA Publish/Subscribe Messaging with Kafka (20)

Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Python Kafka Integration: Developers Guide
Python Kafka Integration: Developers GuidePython Kafka Integration: Developers Guide
Python Kafka Integration: Developers Guide
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Distributed messaging through Kafka
Distributed messaging through KafkaDistributed messaging through Kafka
Distributed messaging through Kafka
 
Apache storm
Apache stormApache storm
Apache storm
 
Introduction to Kafka Streams Presentation
Introduction to Kafka Streams PresentationIntroduction to Kafka Streams Presentation
Introduction to Kafka Streams Presentation
 
Kafka for Scale
Kafka for ScaleKafka for Scale
Kafka for Scale
 
Kafka
KafkaKafka
Kafka
 
Kafka presentation
Kafka presentationKafka presentation
Kafka presentation
 
Hello, kafka! (an introduction to apache kafka)
Hello, kafka! (an introduction to apache kafka)Hello, kafka! (an introduction to apache kafka)
Hello, kafka! (an introduction to apache kafka)
 
introductiontoapachekafka-201102140206.pdf
introductiontoapachekafka-201102140206.pdfintroductiontoapachekafka-201102140206.pdf
introductiontoapachekafka-201102140206.pdf
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
 
Kafka syed academy_v1_introduction
Kafka syed academy_v1_introductionKafka syed academy_v1_introduction
Kafka syed academy_v1_introduction
 
Kafka tutorial
Kafka tutorialKafka tutorial
Kafka tutorial
 
Kafka Streams for Java enthusiasts
Kafka Streams for Java enthusiastsKafka Streams for Java enthusiasts
Kafka Streams for Java enthusiasts
 
Building Streaming Data Applications Using Apache Kafka
Building Streaming Data Applications Using Apache KafkaBuilding Streaming Data Applications Using Apache Kafka
Building Streaming Data Applications Using Apache Kafka
 
Trivadis TechEvent 2016 Apache Kafka - Scalable Massage Processing and more! ...
Trivadis TechEvent 2016 Apache Kafka - Scalable Massage Processing and more! ...Trivadis TechEvent 2016 Apache Kafka - Scalable Massage Processing and more! ...
Trivadis TechEvent 2016 Apache Kafka - Scalable Massage Processing and more! ...
 
Current and Future of Apache Kafka
Current and Future of Apache KafkaCurrent and Future of Apache Kafka
Current and Future of Apache Kafka
 
Building streaming data applications using Kafka*[Connect + Core + Streams] b...
Building streaming data applications using Kafka*[Connect + Core + Streams] b...Building streaming data applications using Kafka*[Connect + Core + Streams] b...
Building streaming data applications using Kafka*[Connect + Core + Streams] b...
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 

Recently uploaded

一比一原版(Griffith毕业证书)格里菲斯大学毕业证成绩单学位证书
一比一原版(Griffith毕业证书)格里菲斯大学毕业证成绩单学位证书一比一原版(Griffith毕业证书)格里菲斯大学毕业证成绩单学位证书
一比一原版(Griffith毕业证书)格里菲斯大学毕业证成绩单学位证书
c3384a92eb32
 
21P35A0312 Internship eccccccReport.docx
21P35A0312 Internship eccccccReport.docx21P35A0312 Internship eccccccReport.docx
21P35A0312 Internship eccccccReport.docx
rahulmanepalli02
 

Recently uploaded (20)

Diploma Engineering Drawing Qp-2024 Ece .pdf
Diploma Engineering Drawing Qp-2024 Ece .pdfDiploma Engineering Drawing Qp-2024 Ece .pdf
Diploma Engineering Drawing Qp-2024 Ece .pdf
 
Introduction-to- Metrology and Quality.pptx
Introduction-to- Metrology and Quality.pptxIntroduction-to- Metrology and Quality.pptx
Introduction-to- Metrology and Quality.pptx
 
Working Principle of Echo Sounder and Doppler Effect.pdf
Working Principle of Echo Sounder and Doppler Effect.pdfWorking Principle of Echo Sounder and Doppler Effect.pdf
Working Principle of Echo Sounder and Doppler Effect.pdf
 
Independent Solar-Powered Electric Vehicle Charging Station
Independent Solar-Powered Electric Vehicle Charging StationIndependent Solar-Powered Electric Vehicle Charging Station
Independent Solar-Powered Electric Vehicle Charging Station
 
What is Coordinate Measuring Machine? CMM Types, Features, Functions
What is Coordinate Measuring Machine? CMM Types, Features, FunctionsWhat is Coordinate Measuring Machine? CMM Types, Features, Functions
What is Coordinate Measuring Machine? CMM Types, Features, Functions
 
Databricks Generative AI FoundationCertified.pdf
Databricks Generative AI FoundationCertified.pdfDatabricks Generative AI FoundationCertified.pdf
Databricks Generative AI FoundationCertified.pdf
 
一比一原版(Griffith毕业证书)格里菲斯大学毕业证成绩单学位证书
一比一原版(Griffith毕业证书)格里菲斯大学毕业证成绩单学位证书一比一原版(Griffith毕业证书)格里菲斯大学毕业证成绩单学位证书
一比一原版(Griffith毕业证书)格里菲斯大学毕业证成绩单学位证书
 
litvinenko_Henry_Intrusion_Hong-Kong_2024.pdf
litvinenko_Henry_Intrusion_Hong-Kong_2024.pdflitvinenko_Henry_Intrusion_Hong-Kong_2024.pdf
litvinenko_Henry_Intrusion_Hong-Kong_2024.pdf
 
Adsorption (mass transfer operations 2) ppt
Adsorption (mass transfer operations 2) pptAdsorption (mass transfer operations 2) ppt
Adsorption (mass transfer operations 2) ppt
 
21P35A0312 Internship eccccccReport.docx
21P35A0312 Internship eccccccReport.docx21P35A0312 Internship eccccccReport.docx
21P35A0312 Internship eccccccReport.docx
 
Augmented Reality (AR) with Augin Software.pptx
Augmented Reality (AR) with Augin Software.pptxAugmented Reality (AR) with Augin Software.pptx
Augmented Reality (AR) with Augin Software.pptx
 
Signal Processing and Linear System Analysis
Signal Processing and Linear System AnalysisSignal Processing and Linear System Analysis
Signal Processing and Linear System Analysis
 
Raashid final report on Embedded Systems
Raashid final report on Embedded SystemsRaashid final report on Embedded Systems
Raashid final report on Embedded Systems
 
Path loss model, OKUMURA Model, Hata Model
Path loss model, OKUMURA Model, Hata ModelPath loss model, OKUMURA Model, Hata Model
Path loss model, OKUMURA Model, Hata Model
 
Call for Papers - Journal of Electrical Systems (JES), E-ISSN: 1112-5209, ind...
Call for Papers - Journal of Electrical Systems (JES), E-ISSN: 1112-5209, ind...Call for Papers - Journal of Electrical Systems (JES), E-ISSN: 1112-5209, ind...
Call for Papers - Journal of Electrical Systems (JES), E-ISSN: 1112-5209, ind...
 
Presentation on Slab, Beam, Column, and Foundation/Footing
Presentation on Slab,  Beam, Column, and Foundation/FootingPresentation on Slab,  Beam, Column, and Foundation/Footing
Presentation on Slab, Beam, Column, and Foundation/Footing
 
Artificial Intelligence in due diligence
Artificial Intelligence in due diligenceArtificial Intelligence in due diligence
Artificial Intelligence in due diligence
 
Maximizing Incident Investigation Efficacy in Oil & Gas: Techniques and Tools
Maximizing Incident Investigation Efficacy in Oil & Gas: Techniques and ToolsMaximizing Incident Investigation Efficacy in Oil & Gas: Techniques and Tools
Maximizing Incident Investigation Efficacy in Oil & Gas: Techniques and Tools
 
Dynamo Scripts for Task IDs and Space Naming.pptx
Dynamo Scripts for Task IDs and Space Naming.pptxDynamo Scripts for Task IDs and Space Naming.pptx
Dynamo Scripts for Task IDs and Space Naming.pptx
 
Developing a smart system for infant incubators using the internet of things ...
Developing a smart system for infant incubators using the internet of things ...Developing a smart system for infant incubators using the internet of things ...
Developing a smart system for infant incubators using the internet of things ...
 

STREAMING WITH KAFKA Publish/Subscribe Messaging with Kafka

  • 2. What is streaming? ■ So far we’ve really just talked about processing historical, existing big data – Sitting on HDFS – Sitting in a database ■ But how does new data get into your cluster? Especially if it’s “Big data”? – New log entries from your web servers – New sensor data from your IoT system – New stock trades ■ Streaming lets you publish this data, in real time, to your cluster. – And you can even process it in real time as it comes in!
  • 3. Two problems ■ How to get data from many different sources flowing into your cluster ■ Processing it when it gets there ■ First, let’s focus on the first problem
  • 4. Enter Kafka ■ Kafka is a general-purpose publish/subscribe messaging system ■ Kafka servers store all incoming messages from publishers for some period of time, and publishes them to a stream of data called a topic. ■ Kafka consumers subscribe to one or more topics, and receive data as it’s published ■ A stream / topic can have many different consumers, all with their own position in the stream maintained ■ It’s not just for Hadoop
  • 5. Kafka architecture Kafka Cluster App App App App App App App App DB DB Producers Consumers Stream Processors Connectors
  • 6. How Kafka scales Image: kafka.apache.org ■ Kafka itself may be distributed among many processes on many servers – Will distribute the storage of stream data as well ■ Consumers may also be distributed – Consumers of the same group will have messages distributed amongst them – Consumers of different groups will get their own copy of each message
  • 7. Let’s play ■ Start Kafka on our sandbox ■ Set up a topic – Publish some data to it, and watch it get consumed ■ Set up a file connector – Monitor a log file and publish additions to it