SlideShare a Scribd company logo
1 of 25
APACHE KAFKA
WHAT IS KAFKA?
• KAFKA is a distributed messaging system that was
originally developed at LinkedIn to serve as the
foundation for LinkedIn's activity stream and operational
data processing pipeline. It is now used at a variety of
different companies for various data pipeline and
messaging uses.
• LinkedIn developed Kafka for collecting and delivering
high volumes of log data with low latency for real-time
log processing.
• Scala and Zookeeper based.
WHAT IS KAFKA? CONTD..
• KAFKA can be used for both ONLINE(realtime) and
OFFLINE (integration with hadoop) analysis of log data.
• Distributed and scalable .
• High Throughput.
• More Priority to efficiency then fancy features.
• Simple API.
• Low Overhead.
APACHE KAFKA
=
+LOG AGGREGATOR MESSAGING SYSTEM
Where to use APACHE KAFKA?
• user activity events corresponding to logins, page-views,
clicks, “likes” ,sharing, comments, and search queries
• operational metrics (health monitoring)
• search relevance.
• Recommendations.
• Ad targeting and reporting.
• Protection against abnormal behavior.
• newsfeed
• Etc.
FLAWS in Traditional Messaging Systems.
• More focus on delivery guarantee.
• Less focus on throughput.
• Weak in distributed support.
• Performance degrades as Queue increases.
KAFKA DESIGN
PRINCIPLES
ARCHITECTURE
ARCHITECTURE contd..
• A stream of messages of particular type is defined by a topic.
• A topic is then divided into multiple partitions
• Each partition is a directory that has list of files and the files store
the message data.
• A producer can publish messages to a topic (push).
• Published messages are then stored on a set of servers called
brokers.
• Each broker stores one or more partitions
• Consumers can subscribe to any of the topics and can consume
messages by pulling data from the brokers.
• Supports both Point-to-point Delivery model and Publish/subscribe
Delivery model.
• Uses “Pull Based” system model.
WHY “PULL MODEL”?
• A “push-based” system has difficulty dealing with diverse
consumers as the broker controls the rate at which data
is transferred, (DOS attack = loss of data).
• In a Pull based system, a consumer simply falls behind
and catches up when it can.
• Consumer can rewind back to consume old messages.
DEPLOYMENT
Storage
• Simple Storage
• One each partition corresponds to a logical log.(a log ==
a list of files.)
• Physically a log is implemented using segmentation
where each segment is approximately of equal size.
• When a message is published , the broker simply
appends the message to the last segment file.
• Messages are flushed on to disk in a batch for better
performance.
• Messages is only exposed to consumers after they are
flushed.
• Messages are addressed by log offset. #lessoverhead.
Storage contd..
• Id of next message = length of current message + id of current
message.
Efficient Transfer
• Producer can submit multiple messages in a single send
request.( #End-to-endmessagebatching).
• Each pull request can consume multiple messages up to
a certain size.
• No caching of messages on the Kafta process layer.
Messages are only cached in the page cache, --“no
double buffering”. Hence very little overhead in garbage
collecting its memory.(#filesystemcaching)#lessoverhead
• Producer and Consumer access the data files
Sequentially. (disks are fast when accessed
sequentially).
Efficient Transfer :from local file to remote
socket
1. read data from the storage media to the page cache in an OS.
2. copy data in the page cache to an application buffer.
3. copy application buffer to another kernel buffer.
4. send the kernel buffer to the socket
• This process usually takes 4 data copying and one system call.
SENDFILE API
avoids 2 copies call and 1 system call
#zerocopytransfer
broker consumer
Stateless Broker
• Consumer maintains its own state hence reducing
complexity and overhead on the broker.
• Disadvantage is that broker doesn’t know whether all
subscriber’s have consumed the message , this problem
is solved by using a simple time based SLA for the
retention policy. A message is automatically deleted if it
has been
• A consumer can deliberately rewind back to an old offset
and re-consume data. #violateslogicofqueue.
Distributed Model
Producer 1 Producer 2
Broker 1 Broker 2 Broker 3
Consumer 1 Consumer 2
Zookeeper
Distributed Model contd..
• Consumer Group consists of one or more consumers.
• All messages from a partition are consumed by a single
consumer in a consumer group. #lessoverheadonbroker.
• No master node  less complexity and no master
failures to worry about.
• Zookeeper co-ordinates between the producers,
consumers and brokers.
Zookeeper functions.
• Detection of addition/ removal of brokers, producers, consumers.
• Triggering rebalance process in effect to above detection.
• Tracking.
Broker registry hostname, port and
set of topics of broker.
ephemeral
Consumer Registry consumer group Ephemeral
Owner registry consumer currently
consuming a partition
persistent
Offset registry stores the offset of last
consumed partition for
each subscribed
partition.
persistent
Delivery Guarantee
• At least once delivery. #cancauseduplication.#cost-
effective
• Messages from single partition in order.
• Messages from multiple partitions not necessarily in
order. #noguarantee
• CRC for each message in the logs #avoidlogcorruption
• If I/O error on broker then kafka removes messages with
inconsistent CRCs.
• If the storage system is completely damaged and
consumer have not consumed the message then the
message is lost forever. #futureplanstoaddreplication.
Recent Developments
• End-to-End Batch Level Compression.
• Improved Stream Processing Libraries.
• Hadoop Consumer.
• Hadoop Producer.
Apache Kafka @
References.
• http://incubator.apache.org/kafka/
• http://research.microsoft.com/en-us/um/people/srikan
• http://vimeo.com/27592622

More Related Content

What's hot

[WSO2Con EU 2017] WSO2 Unleashed: Full Stack Automation, Pitfalls and Solutions
[WSO2Con EU 2017] WSO2 Unleashed: Full Stack Automation, Pitfalls and Solutions[WSO2Con EU 2017] WSO2 Unleashed: Full Stack Automation, Pitfalls and Solutions
[WSO2Con EU 2017] WSO2 Unleashed: Full Stack Automation, Pitfalls and SolutionsWSO2
 
Azure Application insights - An Introduction
Azure Application insights - An IntroductionAzure Application insights - An Introduction
Azure Application insights - An IntroductionMatthias Güntert
 
Real User Monitoring (RUM)
Real User Monitoring (RUM)Real User Monitoring (RUM)
Real User Monitoring (RUM)Site24x7
 
Site24x7 Cloud Monitoring
Site24x7 Cloud MonitoringSite24x7 Cloud Monitoring
Site24x7 Cloud MonitoringSite24x7
 
[WSO2Con EU 2017] How a Large Organization Weighted on a WSO2 Integration Pla...
[WSO2Con EU 2017] How a Large Organization Weighted on a WSO2 Integration Pla...[WSO2Con EU 2017] How a Large Organization Weighted on a WSO2 Integration Pla...
[WSO2Con EU 2017] How a Large Organization Weighted on a WSO2 Integration Pla...WSO2
 
Modernizing Cloud and Hyperconverged Infrastructure monitoring
Modernizing Cloud and Hyperconverged Infrastructure monitoringModernizing Cloud and Hyperconverged Infrastructure monitoring
Modernizing Cloud and Hyperconverged Infrastructure monitoringManageEngine, Zoho Corporation
 
Cloud Bursting with A10 Lightning ADS
Cloud Bursting with A10 Lightning ADSCloud Bursting with A10 Lightning ADS
Cloud Bursting with A10 Lightning ADSAkshay Mathur
 
"What database can tell about application issues? What application can tell a...
"What database can tell about application issues? What application can tell a..."What database can tell about application issues? What application can tell a...
"What database can tell about application issues? What application can tell a...Fwdays
 
One Azure Monitor to Rule Them All? - Marius Zaharia
One Azure Monitor to Rule Them All? - Marius ZahariaOne Azure Monitor to Rule Them All? - Marius Zaharia
One Azure Monitor to Rule Them All? - Marius ZahariaITCamp
 
Azure API Manegement Introduction and Integeration with BizTalk
Azure API Manegement Introduction and Integeration with BizTalkAzure API Manegement Introduction and Integeration with BizTalk
Azure API Manegement Introduction and Integeration with BizTalkShailesh Dwivedi
 
Full Stack Reactive In Practice
Full Stack Reactive In PracticeFull Stack Reactive In Practice
Full Stack Reactive In PracticeLightbend
 
Nava SIEM Agent Datasheet
Nava SIEM Agent DatasheetNava SIEM Agent Datasheet
Nava SIEM Agent DatasheetLinkgard
 
Restful Asynchronous Notification
Restful Asynchronous NotificationRestful Asynchronous Notification
Restful Asynchronous NotificationMichael Koster
 
Cloud monitoring
Cloud monitoringCloud monitoring
Cloud monitoringGang Tao
 
Building Lightweight Microservices With Redis & Hydra
Building Lightweight Microservices With Redis & HydraBuilding Lightweight Microservices With Redis & Hydra
Building Lightweight Microservices With Redis & HydraRedis Labs
 
Flink Forward Berlin 2018: Yonatan Most & Avihai Berkovitz - "Anomaly Detecti...
Flink Forward Berlin 2018: Yonatan Most & Avihai Berkovitz - "Anomaly Detecti...Flink Forward Berlin 2018: Yonatan Most & Avihai Berkovitz - "Anomaly Detecti...
Flink Forward Berlin 2018: Yonatan Most & Avihai Berkovitz - "Anomaly Detecti...Flink Forward
 
Keynote : évolution et vision d'Elastic Observability
Keynote : évolution et vision d'Elastic ObservabilityKeynote : évolution et vision d'Elastic Observability
Keynote : évolution et vision d'Elastic ObservabilityElasticsearch
 

What's hot (20)

[WSO2Con EU 2017] WSO2 Unleashed: Full Stack Automation, Pitfalls and Solutions
[WSO2Con EU 2017] WSO2 Unleashed: Full Stack Automation, Pitfalls and Solutions[WSO2Con EU 2017] WSO2 Unleashed: Full Stack Automation, Pitfalls and Solutions
[WSO2Con EU 2017] WSO2 Unleashed: Full Stack Automation, Pitfalls and Solutions
 
Blr hadoop meetup
Blr hadoop meetupBlr hadoop meetup
Blr hadoop meetup
 
Monitor Cloud Resources using Alerts & Insights
Monitor Cloud Resources using Alerts & InsightsMonitor Cloud Resources using Alerts & Insights
Monitor Cloud Resources using Alerts & Insights
 
Azure Application insights - An Introduction
Azure Application insights - An IntroductionAzure Application insights - An Introduction
Azure Application insights - An Introduction
 
Real User Monitoring (RUM)
Real User Monitoring (RUM)Real User Monitoring (RUM)
Real User Monitoring (RUM)
 
Site24x7 Cloud Monitoring
Site24x7 Cloud MonitoringSite24x7 Cloud Monitoring
Site24x7 Cloud Monitoring
 
[WSO2Con EU 2017] How a Large Organization Weighted on a WSO2 Integration Pla...
[WSO2Con EU 2017] How a Large Organization Weighted on a WSO2 Integration Pla...[WSO2Con EU 2017] How a Large Organization Weighted on a WSO2 Integration Pla...
[WSO2Con EU 2017] How a Large Organization Weighted on a WSO2 Integration Pla...
 
Modernizing Cloud and Hyperconverged Infrastructure monitoring
Modernizing Cloud and Hyperconverged Infrastructure monitoringModernizing Cloud and Hyperconverged Infrastructure monitoring
Modernizing Cloud and Hyperconverged Infrastructure monitoring
 
Cloud Bursting with A10 Lightning ADS
Cloud Bursting with A10 Lightning ADSCloud Bursting with A10 Lightning ADS
Cloud Bursting with A10 Lightning ADS
 
"What database can tell about application issues? What application can tell a...
"What database can tell about application issues? What application can tell a..."What database can tell about application issues? What application can tell a...
"What database can tell about application issues? What application can tell a...
 
One Azure Monitor to Rule Them All? - Marius Zaharia
One Azure Monitor to Rule Them All? - Marius ZahariaOne Azure Monitor to Rule Them All? - Marius Zaharia
One Azure Monitor to Rule Them All? - Marius Zaharia
 
Azure API Manegement Introduction and Integeration with BizTalk
Azure API Manegement Introduction and Integeration with BizTalkAzure API Manegement Introduction and Integeration with BizTalk
Azure API Manegement Introduction and Integeration with BizTalk
 
Full Stack Reactive In Practice
Full Stack Reactive In PracticeFull Stack Reactive In Practice
Full Stack Reactive In Practice
 
implementing the right website monitoring strategy
 implementing the right website monitoring strategy implementing the right website monitoring strategy
implementing the right website monitoring strategy
 
Nava SIEM Agent Datasheet
Nava SIEM Agent DatasheetNava SIEM Agent Datasheet
Nava SIEM Agent Datasheet
 
Restful Asynchronous Notification
Restful Asynchronous NotificationRestful Asynchronous Notification
Restful Asynchronous Notification
 
Cloud monitoring
Cloud monitoringCloud monitoring
Cloud monitoring
 
Building Lightweight Microservices With Redis & Hydra
Building Lightweight Microservices With Redis & HydraBuilding Lightweight Microservices With Redis & Hydra
Building Lightweight Microservices With Redis & Hydra
 
Flink Forward Berlin 2018: Yonatan Most & Avihai Berkovitz - "Anomaly Detecti...
Flink Forward Berlin 2018: Yonatan Most & Avihai Berkovitz - "Anomaly Detecti...Flink Forward Berlin 2018: Yonatan Most & Avihai Berkovitz - "Anomaly Detecti...
Flink Forward Berlin 2018: Yonatan Most & Avihai Berkovitz - "Anomaly Detecti...
 
Keynote : évolution et vision d'Elastic Observability
Keynote : évolution et vision d'Elastic ObservabilityKeynote : évolution et vision d'Elastic Observability
Keynote : évolution et vision d'Elastic Observability
 

Viewers also liked

3450 - Writing and optimising applications for performance in a hybrid messag...
3450 - Writing and optimising applications for performance in a hybrid messag...3450 - Writing and optimising applications for performance in a hybrid messag...
3450 - Writing and optimising applications for performance in a hybrid messag...Timothy McCormick
 
3425 - Using publish/subscribe to integrate applications
3425 - Using publish/subscribe to integrate applications3425 - Using publish/subscribe to integrate applications
3425 - Using publish/subscribe to integrate applicationsTimothy McCormick
 
Introducing IBM Message Hub: Cloud-scale messaging based on Apache Kafka
Introducing IBM Message Hub: Cloud-scale messaging based on Apache KafkaIntroducing IBM Message Hub: Cloud-scale messaging based on Apache Kafka
Introducing IBM Message Hub: Cloud-scale messaging based on Apache KafkaAndrew Schofield
 
#1922 rest-push2 ap-im-v6
#1922 rest-push2 ap-im-v6#1922 rest-push2 ap-im-v6
#1922 rest-push2 ap-im-v6Jack Carnes
 
HHM-3540: The IBM MQ Light API: From Developer Laptop to Enterprise Data Cen...
 HHM-3540: The IBM MQ Light API: From Developer Laptop to Enterprise Data Cen... HHM-3540: The IBM MQ Light API: From Developer Laptop to Enterprise Data Cen...
HHM-3540: The IBM MQ Light API: From Developer Laptop to Enterprise Data Cen...Matt Leming
 
3429 How to transform your messaging environment to a secure messaging envi...
3429   How to transform your messaging environment to a secure messaging envi...3429   How to transform your messaging environment to a secure messaging envi...
3429 How to transform your messaging environment to a secure messaging envi...Robert Parker
 
WhatsNewIBMIntegrationBus10FP4
WhatsNewIBMIntegrationBus10FP4WhatsNewIBMIntegrationBus10FP4
WhatsNewIBMIntegrationBus10FP4bthomps1979
 
ConnectorsForIntegration
ConnectorsForIntegrationConnectorsForIntegration
ConnectorsForIntegrationbthomps1979
 
IBM Messaging Security - Why securing your environment is important : IBM Int...
IBM Messaging Security - Why securing your environment is important : IBM Int...IBM Messaging Security - Why securing your environment is important : IBM Int...
IBM Messaging Security - Why securing your environment is important : IBM Int...Leif Davidsen
 
Hia 1691-using iib-to_support_api_economy
Hia 1691-using iib-to_support_api_economyHia 1691-using iib-to_support_api_economy
Hia 1691-using iib-to_support_api_economyAndrew Coleman
 
Hia 1689-techinical introduction-to_iib
Hia 1689-techinical introduction-to_iibHia 1689-techinical introduction-to_iib
Hia 1689-techinical introduction-to_iibAndrew Coleman
 
Java zone 2015 How to make life with kafka easier.
Java zone 2015 How to make life with kafka easier.Java zone 2015 How to make life with kafka easier.
Java zone 2015 How to make life with kafka easier.Krzysztof Debski
 
Microservices: Where do they fit within a rapidly evolving integration archit...
Microservices: Where do they fit within a rapidly evolving integration archit...Microservices: Where do they fit within a rapidly evolving integration archit...
Microservices: Where do they fit within a rapidly evolving integration archit...Kim Clark
 

Viewers also liked (13)

3450 - Writing and optimising applications for performance in a hybrid messag...
3450 - Writing and optimising applications for performance in a hybrid messag...3450 - Writing and optimising applications for performance in a hybrid messag...
3450 - Writing and optimising applications for performance in a hybrid messag...
 
3425 - Using publish/subscribe to integrate applications
3425 - Using publish/subscribe to integrate applications3425 - Using publish/subscribe to integrate applications
3425 - Using publish/subscribe to integrate applications
 
Introducing IBM Message Hub: Cloud-scale messaging based on Apache Kafka
Introducing IBM Message Hub: Cloud-scale messaging based on Apache KafkaIntroducing IBM Message Hub: Cloud-scale messaging based on Apache Kafka
Introducing IBM Message Hub: Cloud-scale messaging based on Apache Kafka
 
#1922 rest-push2 ap-im-v6
#1922 rest-push2 ap-im-v6#1922 rest-push2 ap-im-v6
#1922 rest-push2 ap-im-v6
 
HHM-3540: The IBM MQ Light API: From Developer Laptop to Enterprise Data Cen...
 HHM-3540: The IBM MQ Light API: From Developer Laptop to Enterprise Data Cen... HHM-3540: The IBM MQ Light API: From Developer Laptop to Enterprise Data Cen...
HHM-3540: The IBM MQ Light API: From Developer Laptop to Enterprise Data Cen...
 
3429 How to transform your messaging environment to a secure messaging envi...
3429   How to transform your messaging environment to a secure messaging envi...3429   How to transform your messaging environment to a secure messaging envi...
3429 How to transform your messaging environment to a secure messaging envi...
 
WhatsNewIBMIntegrationBus10FP4
WhatsNewIBMIntegrationBus10FP4WhatsNewIBMIntegrationBus10FP4
WhatsNewIBMIntegrationBus10FP4
 
ConnectorsForIntegration
ConnectorsForIntegrationConnectorsForIntegration
ConnectorsForIntegration
 
IBM Messaging Security - Why securing your environment is important : IBM Int...
IBM Messaging Security - Why securing your environment is important : IBM Int...IBM Messaging Security - Why securing your environment is important : IBM Int...
IBM Messaging Security - Why securing your environment is important : IBM Int...
 
Hia 1691-using iib-to_support_api_economy
Hia 1691-using iib-to_support_api_economyHia 1691-using iib-to_support_api_economy
Hia 1691-using iib-to_support_api_economy
 
Hia 1689-techinical introduction-to_iib
Hia 1689-techinical introduction-to_iibHia 1689-techinical introduction-to_iib
Hia 1689-techinical introduction-to_iib
 
Java zone 2015 How to make life with kafka easier.
Java zone 2015 How to make life with kafka easier.Java zone 2015 How to make life with kafka easier.
Java zone 2015 How to make life with kafka easier.
 
Microservices: Where do they fit within a rapidly evolving integration archit...
Microservices: Where do they fit within a rapidly evolving integration archit...Microservices: Where do they fit within a rapidly evolving integration archit...
Microservices: Where do they fit within a rapidly evolving integration archit...
 

Similar to Apache kafka- Onkar Kadam

Distributed messaging with Apache Kafka
Distributed messaging with Apache KafkaDistributed messaging with Apache Kafka
Distributed messaging with Apache KafkaSaumitra Srivastav
 
Removing dependencies between services: Messaging and Apache Kafka
Removing dependencies between services: Messaging and Apache KafkaRemoving dependencies between services: Messaging and Apache Kafka
Removing dependencies between services: Messaging and Apache KafkaDaniel Muñoz Garrido
 
Building an Event Bus at Scale
Building an Event Bus at ScaleBuilding an Event Bus at Scale
Building an Event Bus at Scalejimriecken
 
apachekafka-160907180205.pdf
apachekafka-160907180205.pdfapachekafka-160907180205.pdf
apachekafka-160907180205.pdfTarekHamdi8
 
Apache Kafka Introduction
Apache Kafka IntroductionApache Kafka Introduction
Apache Kafka IntroductionAmita Mirajkar
 
kafka simplicity and complexity
kafka simplicity and complexitykafka simplicity and complexity
kafka simplicity and complexityPaolo Platter
 
Zookeeper Tutorial for beginners
Zookeeper Tutorial for beginnersZookeeper Tutorial for beginners
Zookeeper Tutorial for beginnersjeetendra mandal
 
Kafka Overview
Kafka OverviewKafka Overview
Kafka Overviewiamtodor
 
Introduction to Microservices
Introduction to MicroservicesIntroduction to Microservices
Introduction to MicroservicesMahmoudZidan41
 
Fundamentals and Architecture of Apache Kafka
Fundamentals and Architecture of Apache KafkaFundamentals and Architecture of Apache Kafka
Fundamentals and Architecture of Apache KafkaAngelo Cesaro
 
OnPrem Monitoring.pdf
OnPrem Monitoring.pdfOnPrem Monitoring.pdf
OnPrem Monitoring.pdfTarekHamdi8
 
Unleashing Real-time Power with Kafka.pptx
Unleashing Real-time Power with Kafka.pptxUnleashing Real-time Power with Kafka.pptx
Unleashing Real-time Power with Kafka.pptxKnoldus Inc.
 
AWS Study Group - Chapter 07 - Integrating Application Services [Solution Arc...
AWS Study Group - Chapter 07 - Integrating Application Services [Solution Arc...AWS Study Group - Chapter 07 - Integrating Application Services [Solution Arc...
AWS Study Group - Chapter 07 - Integrating Application Services [Solution Arc...QCloudMentor
 
Metrics are Not Enough: Monitoring Apache Kafka / Gwen Shapira (Confluent)
Metrics are Not Enough: Monitoring Apache Kafka / Gwen Shapira (Confluent)Metrics are Not Enough: Monitoring Apache Kafka / Gwen Shapira (Confluent)
Metrics are Not Enough: Monitoring Apache Kafka / Gwen Shapira (Confluent)Ontico
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache KafkaJeff Holoman
 
WSO2 Message Broker - Product Overview
WSO2 Message Broker - Product OverviewWSO2 Message Broker - Product Overview
WSO2 Message Broker - Product OverviewWSO2
 
Microservice message routing on Kubernetes
Microservice message routing on KubernetesMicroservice message routing on Kubernetes
Microservice message routing on KubernetesFrans van Buul
 

Similar to Apache kafka- Onkar Kadam (20)

Distributed messaging with Apache Kafka
Distributed messaging with Apache KafkaDistributed messaging with Apache Kafka
Distributed messaging with Apache Kafka
 
Removing dependencies between services: Messaging and Apache Kafka
Removing dependencies between services: Messaging and Apache KafkaRemoving dependencies between services: Messaging and Apache Kafka
Removing dependencies between services: Messaging and Apache Kafka
 
Building an Event Bus at Scale
Building an Event Bus at ScaleBuilding an Event Bus at Scale
Building an Event Bus at Scale
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Kafka tutorial
Kafka tutorialKafka tutorial
Kafka tutorial
 
apachekafka-160907180205.pdf
apachekafka-160907180205.pdfapachekafka-160907180205.pdf
apachekafka-160907180205.pdf
 
Apache Kafka Introduction
Apache Kafka IntroductionApache Kafka Introduction
Apache Kafka Introduction
 
kafka simplicity and complexity
kafka simplicity and complexitykafka simplicity and complexity
kafka simplicity and complexity
 
Zookeeper Tutorial for beginners
Zookeeper Tutorial for beginnersZookeeper Tutorial for beginners
Zookeeper Tutorial for beginners
 
Kafka Overview
Kafka OverviewKafka Overview
Kafka Overview
 
Introduction to Microservices
Introduction to MicroservicesIntroduction to Microservices
Introduction to Microservices
 
Fundamentals and Architecture of Apache Kafka
Fundamentals and Architecture of Apache KafkaFundamentals and Architecture of Apache Kafka
Fundamentals and Architecture of Apache Kafka
 
OnPrem Monitoring.pdf
OnPrem Monitoring.pdfOnPrem Monitoring.pdf
OnPrem Monitoring.pdf
 
Unleashing Real-time Power with Kafka.pptx
Unleashing Real-time Power with Kafka.pptxUnleashing Real-time Power with Kafka.pptx
Unleashing Real-time Power with Kafka.pptx
 
AWS Study Group - Chapter 07 - Integrating Application Services [Solution Arc...
AWS Study Group - Chapter 07 - Integrating Application Services [Solution Arc...AWS Study Group - Chapter 07 - Integrating Application Services [Solution Arc...
AWS Study Group - Chapter 07 - Integrating Application Services [Solution Arc...
 
Metrics are Not Enough: Monitoring Apache Kafka / Gwen Shapira (Confluent)
Metrics are Not Enough: Monitoring Apache Kafka / Gwen Shapira (Confluent)Metrics are Not Enough: Monitoring Apache Kafka / Gwen Shapira (Confluent)
Metrics are Not Enough: Monitoring Apache Kafka / Gwen Shapira (Confluent)
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
 
WSO2 Message Broker - Product Overview
WSO2 Message Broker - Product OverviewWSO2 Message Broker - Product Overview
WSO2 Message Broker - Product Overview
 
Microservices deck
Microservices deckMicroservices deck
Microservices deck
 
Microservice message routing on Kubernetes
Microservice message routing on KubernetesMicroservice message routing on Kubernetes
Microservice message routing on Kubernetes
 

Recently uploaded

Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 

Recently uploaded (20)

Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 

Apache kafka- Onkar Kadam

  • 2. WHAT IS KAFKA? • KAFKA is a distributed messaging system that was originally developed at LinkedIn to serve as the foundation for LinkedIn's activity stream and operational data processing pipeline. It is now used at a variety of different companies for various data pipeline and messaging uses. • LinkedIn developed Kafka for collecting and delivering high volumes of log data with low latency for real-time log processing. • Scala and Zookeeper based.
  • 3. WHAT IS KAFKA? CONTD.. • KAFKA can be used for both ONLINE(realtime) and OFFLINE (integration with hadoop) analysis of log data. • Distributed and scalable . • High Throughput. • More Priority to efficiency then fancy features. • Simple API. • Low Overhead.
  • 5. Where to use APACHE KAFKA? • user activity events corresponding to logins, page-views, clicks, “likes” ,sharing, comments, and search queries • operational metrics (health monitoring) • search relevance. • Recommendations. • Ad targeting and reporting. • Protection against abnormal behavior. • newsfeed • Etc.
  • 6. FLAWS in Traditional Messaging Systems. • More focus on delivery guarantee. • Less focus on throughput. • Weak in distributed support. • Performance degrades as Queue increases.
  • 9. ARCHITECTURE contd.. • A stream of messages of particular type is defined by a topic. • A topic is then divided into multiple partitions • Each partition is a directory that has list of files and the files store the message data. • A producer can publish messages to a topic (push). • Published messages are then stored on a set of servers called brokers. • Each broker stores one or more partitions • Consumers can subscribe to any of the topics and can consume messages by pulling data from the brokers. • Supports both Point-to-point Delivery model and Publish/subscribe Delivery model. • Uses “Pull Based” system model.
  • 10. WHY “PULL MODEL”? • A “push-based” system has difficulty dealing with diverse consumers as the broker controls the rate at which data is transferred, (DOS attack = loss of data). • In a Pull based system, a consumer simply falls behind and catches up when it can. • Consumer can rewind back to consume old messages.
  • 12.
  • 13.
  • 14. Storage • Simple Storage • One each partition corresponds to a logical log.(a log == a list of files.) • Physically a log is implemented using segmentation where each segment is approximately of equal size. • When a message is published , the broker simply appends the message to the last segment file. • Messages are flushed on to disk in a batch for better performance. • Messages is only exposed to consumers after they are flushed. • Messages are addressed by log offset. #lessoverhead.
  • 15. Storage contd.. • Id of next message = length of current message + id of current message.
  • 16. Efficient Transfer • Producer can submit multiple messages in a single send request.( #End-to-endmessagebatching). • Each pull request can consume multiple messages up to a certain size. • No caching of messages on the Kafta process layer. Messages are only cached in the page cache, --“no double buffering”. Hence very little overhead in garbage collecting its memory.(#filesystemcaching)#lessoverhead • Producer and Consumer access the data files Sequentially. (disks are fast when accessed sequentially).
  • 17. Efficient Transfer :from local file to remote socket 1. read data from the storage media to the page cache in an OS. 2. copy data in the page cache to an application buffer. 3. copy application buffer to another kernel buffer. 4. send the kernel buffer to the socket • This process usually takes 4 data copying and one system call. SENDFILE API avoids 2 copies call and 1 system call #zerocopytransfer broker consumer
  • 18. Stateless Broker • Consumer maintains its own state hence reducing complexity and overhead on the broker. • Disadvantage is that broker doesn’t know whether all subscriber’s have consumed the message , this problem is solved by using a simple time based SLA for the retention policy. A message is automatically deleted if it has been • A consumer can deliberately rewind back to an old offset and re-consume data. #violateslogicofqueue.
  • 19. Distributed Model Producer 1 Producer 2 Broker 1 Broker 2 Broker 3 Consumer 1 Consumer 2 Zookeeper
  • 20. Distributed Model contd.. • Consumer Group consists of one or more consumers. • All messages from a partition are consumed by a single consumer in a consumer group. #lessoverheadonbroker. • No master node  less complexity and no master failures to worry about. • Zookeeper co-ordinates between the producers, consumers and brokers.
  • 21. Zookeeper functions. • Detection of addition/ removal of brokers, producers, consumers. • Triggering rebalance process in effect to above detection. • Tracking. Broker registry hostname, port and set of topics of broker. ephemeral Consumer Registry consumer group Ephemeral Owner registry consumer currently consuming a partition persistent Offset registry stores the offset of last consumed partition for each subscribed partition. persistent
  • 22. Delivery Guarantee • At least once delivery. #cancauseduplication.#cost- effective • Messages from single partition in order. • Messages from multiple partitions not necessarily in order. #noguarantee • CRC for each message in the logs #avoidlogcorruption • If I/O error on broker then kafka removes messages with inconsistent CRCs. • If the storage system is completely damaged and consumer have not consumed the message then the message is lost forever. #futureplanstoaddreplication.
  • 23. Recent Developments • End-to-End Batch Level Compression. • Improved Stream Processing Libraries. • Hadoop Consumer. • Hadoop Producer.