SlideShare a Scribd company logo
ZAPR MEDIA LABS
Presented by: Chhavi Parasher and Ankit Timbadia
Agenda
● What is Apache Kafka?
● Why Apache Kafka?
● Fundamentals
○ Terminologies
○ Architecture
○ Protocol
● What does kafka offer?
● Where Kafka is used?
What is Apache Kafka?
What is kafka
“A high throughput distributed pub-sub messaging system.
What is kafka
“A pub-sub messaging system rethought as a distributed commit log
storage.”
Why Apache Kafka?
Why Kafka
Client Source
Data Pipelines Start like this.
Why Kafka
Client Source
Client
Client
Client
Then we reuse them
Why Kafka
Client Backend1
Client
Client
Client
Then we add additional endpoints to the
existing sources
Backend2
Why Kafka
Client Backend1
Client
Client
Client
Then it starts to look like this
Backend2
Backend3
Backend4
Birth of Kafka
Kafka decouples Data Pipelines
Source System Source System Source System Source System
Hadoop Security Systems
Real-time
monitoring
Data Warehouse
Kafka
Producers
Brokers
Consumers
Kafka Fundamentals
Terminologies
Terminologies
• Topic is a message stream (queue)
• Partitions - Topic is divided into partitions
• Ordered, immutable, a structured commit log
• Segments - Partition has log segments on disk
• Partitions broken down into physical files.
• Offset - Segment has messages
• Each message is assigned a sequential id
Anatomy of a topic
0 1 2 3 4 5 6 7 8 9
1
0
1
1
1
2
1
3
0 1 2 3 4 5 6 7 8 9
1
0
1
1
0 1 2 3 4 5 6 7 8 9
1
0
1
1
1
2
1
3
Partition
1
Partition
2
Partition
3
Writes
Old New
Segment1 Segment2
Kafka Message (Write once read many)
Partitions and Brokers
• Each broker can be
a leader or a replica
for a partition
• All writes & reads to
a topic go through
the leader
Producer writing to partition
• Producers write to a
single leader
• Each write can be
serviced by a
separate broker
Producer writing to second partition.
Keyed Messages and Partitioning
Producer
Partitioning Distributes clients across consumers
Consumer
Consumer
Consumer
Consumer
Consumer
Consumer
Consumer
Producer
Producer
Producer
Producer
Producer
Architecture
What does kafka offer?
What does kafka offer?
• Fast
• Scalable
• Durable
• Distributed by Design
What does kafka offer?
• Fast
• A single Kafka broker can handle 100Mbps of reads & writes from 1000 of clients.
• Scalable
• Durable
• Distributed by Design
What does kafka offer?
• Fast
• Scalable
• Data streams are partitioned and spread over a cluster of machines to allow data
streams larger than the capability of any single machine
• Durable
• Distributed by Design
What does kafka offer?
• Fast
• Scalable
• Durable
• Messages are persisted on disk and replicated within the cluster to prevent data
loss. Each broker can handle terabytes of messages without performance impact.
• Distributed by Design
What does kafka offer
• Fast
• Scalable
• Durable
• Distributed by Design
• Kafka has a modern cluster-centric design that offers strong durability and
fault-tolerance guarantees.
Efficiency - high throughput & low latency
• Disks are fast when used sequentially
• Append to end of log.
• Fetch messages from a partition beginning from a particular message id.
• Batching makes best use of network/IO
• Batched send and receive
• Batched compression (GZIP, Snappy and LZ4)
Efficiency - high throughput & low latency
• No message caching in JVM
• Zero-copy from file to socket (Java NIO)
●
●
●
●
●
●
Comparison to other messaging systems
Where is Kafka used?
Who uses Apache Kafka?
What do people use it for?
• Activity tracking
• page views or click tracking, profile updates
• Real-time event processing
• Provides low latency
• Good integration with spark, samza. storm etc.
• Collecting Operational Metrics
• Monitoring & alerting
• Commit log
• database changes can be published to Kafka
Why Replication?
• Broker can go down
• controlled: rolling restart for code/config push
• uncontrolled: isolated broker failure
• If broker is down
• some partitions are unavailable
• could be permanent data loss
• Replication ensures higher availability & durability
Caveats
• Not designed for large payloads.
• Decoding is done for a whole message (no streaming decoding).
• Rebalancing can screw things up.
• if you're doing any aggregation in the consumer by the partition key
• Number of partitions cannot be easily changed (chose wisely).
• Lots of topics can hurt I/O performance.
Guarantees
• At least once delivery
• In order delivery, per partition
• For a topic with replication factor N, Kafka can tolerate up to N-1 server
failures without “losing” any messages committed to the log
Summary
• A high throughput distributed messaging system rethought as commit log
• Originally developed by LinkedIn, Open Sourced in 2011
• Written in Scala, Clients for every popular language
• Used by LinkedIn, Twitter, Netflix and many more companies.
• When should we use Kafka ?
• When should we not use kafka?
Kafka in Production @ Zapr
Agenda
• Kafka Clusters at @Zapr
• Kafka in AWS
• Challenges Managing Kafka Brokers.
• Monitoring of Kafka Clusters.
Kafka Version: 0.8.2
Total Kafka Clusters: 5
Number of Topics: 185
Brokers: 17
Partitions: 7827
Regions: 2 (us-east-1,ap-southeast-1)
Kafka at Zapr
Kafka Clusters on Amazon Web Services.
Quorum of 3 Zookeeper Nodes for High Availability
Kafka Brokers
Resource Utilization
Family Type vCPU Memory CPU Pricing
Compute
Optimized
C4.2xlarge 8 16 ~75% 338$
General
Purpose
M4.2xlarge 8 32 ~25% 366$
After moving from C4 Class to M4 Class Machines 40% Improvement in CPU Usage
Production Kafka Broker Configurations
#Default Partitions and Replication Factor for Kafka Topics
num.partitions=10
default.replication.factor=2
#Controlled shutdown for the proper shutdown of Kafka Broker for maintenance.
num.recovery.threads.per.data.dir=4
controlled.shutdown.enable=true
controlled.shutdown.max.retries=5
controlled.shutdown.retry.backoff.ms=60000
#Creation and Deletion of Kafka Topics
delete.topic.enable=true
auto.create.topics.enable=true
Production Kafka Broker
Configurations
Challenges
Dynamic Load
• Kafka Scales Horizontally
• For a topic, Kafka has N (Replicated Partitions)
• Adding a Broker causes Partition to rebalanced
Current State 2 Brokers, 1 Topic and 6
Partitions & replication 2
Desired State after adding 3rd
Broker
Kafka approach
Introducing Kafka Manager
https://github.com/yahoo/kafka-manager
It supports the following :
● Manage multiple clusters.
● Easy inspection of cluster state (topics, consumers, offsets, brokers)
● Generate partition assignments with option to select brokers to use
● Run reassignment of partition (based on generated assignments)
● Create a topic with optional topic configs (0.8.1.1 has different configs than 0.8.2+)
● Delete topic (only supported on 0.8.2+ and remember set delete.topic.enable=true in
broker config)
● Add partitions to existing topic
● Update config for existing topic
● Optionally enable JMX polling for broker level and topic level metrics.
Clusters
Topic List
Topic View
Consumed Topic View
Broker List
Broker View
Reassign Partition
Monitoring
Kafka Graphite Metric Plugin
https://github.com/damienclaveau/kafka-graphite
#Graphite Reporter
kafka.metrics.reporters=com.criteo.kafka.KafkaGraphiteM
etricsReporter
kafka.graphite.metrics.reporter.enabled=true
kafka.metrics.polling.interval.secs = 60
kafka.graphite.metrics.host= graphite.zapr.com
kafka.graphite.metrics.port=2003
kafka.graphite.metrics.group=PROD.kafka-cluster-1.10-0-1-
90
Metrics @Graphite
Kafka Offset Monitoring
https://github.com/quantifind/KafkaOffsetMonitor
● This is an app to monitor your kafka consumers and their position (offset) in
the queue.
● You can see the current consumer groups, for each group the topics that they
are consuming
● Offset are useful to understand how quick you are consuming from a queue
and how fast the queue is growing
Kafka Offset Checker
Kafka offset monitoring @Graphite
Future plans
• Upgrading EC2 instances to next generations from M4 to M5
• Upgrading Kafka Version from 0.8.2 to latest version (1.2)
• Using ST1 (throughput optimized HDD) EBS
Thank you!

More Related Content

What's hot

Apache Kafka Introduction
Apache Kafka IntroductionApache Kafka Introduction
Apache Kafka Introduction
Amita Mirajkar
 
Kafka 101
Kafka 101Kafka 101
Kafka 101
Aparna Pillai
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
AIMDek Technologies
 
Kafka 101
Kafka 101Kafka 101
Kafka 101
Clement Demonchy
 
Stream processing using Kafka
Stream processing using KafkaStream processing using Kafka
Stream processing using Kafka
Knoldus Inc.
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
Kumar Shivam
 
Apache Kafka at LinkedIn
Apache Kafka at LinkedInApache Kafka at LinkedIn
Apache Kafka at LinkedIn
Discover Pinterest
 
Apache Kafka - Martin Podval
Apache Kafka - Martin PodvalApache Kafka - Martin Podval
Apache Kafka - Martin Podval
Martin Podval
 
Apache Kafka
Apache KafkaApache Kafka
Apache Kafka
emreakis
 
Spring Boot+Kafka: the New Enterprise Platform
Spring Boot+Kafka: the New Enterprise PlatformSpring Boot+Kafka: the New Enterprise Platform
Spring Boot+Kafka: the New Enterprise Platform
VMware Tanzu
 
Introduction to apache kafka
Introduction to apache kafkaIntroduction to apache kafka
Introduction to apache kafka
Dimitris Kontokostas
 
Apache Kafka
Apache KafkaApache Kafka
Apache Kafka
Saroj Panyasrivanit
 
Apache kafka
Apache kafkaApache kafka
Introduction to Kafka connect
Introduction to Kafka connectIntroduction to Kafka connect
Introduction to Kafka connect
Knoldus Inc.
 
How Apache Kafka® Works
How Apache Kafka® WorksHow Apache Kafka® Works
How Apache Kafka® Works
confluent
 
Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?
confluent
 
Kafka
KafkaKafka
Kafka
shrenikp
 
Apache Kafka Best Practices
Apache Kafka Best PracticesApache Kafka Best Practices
Apache Kafka Best Practices
DataWorks Summit/Hadoop Summit
 
A visual introduction to Apache Kafka
A visual introduction to Apache KafkaA visual introduction to Apache Kafka
A visual introduction to Apache Kafka
Paul Brebner
 

What's hot (20)

Apache Kafka Introduction
Apache Kafka IntroductionApache Kafka Introduction
Apache Kafka Introduction
 
Kafka 101
Kafka 101Kafka 101
Kafka 101
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
 
Kafka 101
Kafka 101Kafka 101
Kafka 101
 
Stream processing using Kafka
Stream processing using KafkaStream processing using Kafka
Stream processing using Kafka
 
Kafka basics
Kafka basicsKafka basics
Kafka basics
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Apache Kafka at LinkedIn
Apache Kafka at LinkedInApache Kafka at LinkedIn
Apache Kafka at LinkedIn
 
Apache Kafka - Martin Podval
Apache Kafka - Martin PodvalApache Kafka - Martin Podval
Apache Kafka - Martin Podval
 
Apache Kafka
Apache KafkaApache Kafka
Apache Kafka
 
Spring Boot+Kafka: the New Enterprise Platform
Spring Boot+Kafka: the New Enterprise PlatformSpring Boot+Kafka: the New Enterprise Platform
Spring Boot+Kafka: the New Enterprise Platform
 
Introduction to apache kafka
Introduction to apache kafkaIntroduction to apache kafka
Introduction to apache kafka
 
Apache Kafka
Apache KafkaApache Kafka
Apache Kafka
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Introduction to Kafka connect
Introduction to Kafka connectIntroduction to Kafka connect
Introduction to Kafka connect
 
How Apache Kafka® Works
How Apache Kafka® WorksHow Apache Kafka® Works
How Apache Kafka® Works
 
Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?
 
Kafka
KafkaKafka
Kafka
 
Apache Kafka Best Practices
Apache Kafka Best PracticesApache Kafka Best Practices
Apache Kafka Best Practices
 
A visual introduction to Apache Kafka
A visual introduction to Apache KafkaA visual introduction to Apache Kafka
A visual introduction to Apache Kafka
 

Similar to Fundamentals of Apache Kafka

Fundamentals and Architecture of Apache Kafka
Fundamentals and Architecture of Apache KafkaFundamentals and Architecture of Apache Kafka
Fundamentals and Architecture of Apache Kafka
Angelo Cesaro
 
Unleashing Real-time Power with Kafka.pptx
Unleashing Real-time Power with Kafka.pptxUnleashing Real-time Power with Kafka.pptx
Unleashing Real-time Power with Kafka.pptx
Knoldus Inc.
 
World of Tanks Experience of Using Kafka
World of Tanks Experience of Using KafkaWorld of Tanks Experience of Using Kafka
World of Tanks Experience of Using Kafka
Levon Avakyan
 
Kafka Explainaton
Kafka ExplainatonKafka Explainaton
Kafka Explainaton
NguyenChiHoangMinh
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
Srikrishna k
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !
Guido Schmutz
 
apachekafka-160907180205.pdf
apachekafka-160907180205.pdfapachekafka-160907180205.pdf
apachekafka-160907180205.pdf
TarekHamdi8
 
Columbus mule soft_meetup_aug2021_Kafka_Integration
Columbus mule soft_meetup_aug2021_Kafka_IntegrationColumbus mule soft_meetup_aug2021_Kafka_Integration
Columbus mule soft_meetup_aug2021_Kafka_Integration
MuleSoft Meetup
 
Building an Event Bus at Scale
Building an Event Bus at ScaleBuilding an Event Bus at Scale
Building an Event Bus at Scale
jimriecken
 
Kafka tutorial
Kafka tutorialKafka tutorial
Kafka tutorial
Srikrishna k
 
How is Kafka so Fast?
How is Kafka so Fast?How is Kafka so Fast?
How is Kafka so Fast?
Ricardo Paiva
 
Copy of Kafka-Camus
Copy of Kafka-CamusCopy of Kafka-Camus
Copy of Kafka-CamusDeep Shah
 
Building High-Throughput, Low-Latency Pipelines in Kafka
Building High-Throughput, Low-Latency Pipelines in KafkaBuilding High-Throughput, Low-Latency Pipelines in Kafka
Building High-Throughput, Low-Latency Pipelines in Kafka
confluent
 
Python Kafka Integration: Developers Guide
Python Kafka Integration: Developers GuidePython Kafka Integration: Developers Guide
Python Kafka Integration: Developers Guide
Inexture Solutions
 
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
Athens Big Data
 
Reducing Microservice Complexity with Kafka and Reactive Streams
Reducing Microservice Complexity with Kafka and Reactive StreamsReducing Microservice Complexity with Kafka and Reactive Streams
Reducing Microservice Complexity with Kafka and Reactive Streams
jimriecken
 
Unlocking the Power of Apache Kafka: How Kafka Listeners Facilitate Real-time...
Unlocking the Power of Apache Kafka: How Kafka Listeners Facilitate Real-time...Unlocking the Power of Apache Kafka: How Kafka Listeners Facilitate Real-time...
Unlocking the Power of Apache Kafka: How Kafka Listeners Facilitate Real-time...
Denodo
 
Hands-on Workshop: Apache Pulsar
Hands-on Workshop: Apache PulsarHands-on Workshop: Apache Pulsar
Hands-on Workshop: Apache Pulsar
Sijie Guo
 
Distributed messaging through Kafka
Distributed messaging through KafkaDistributed messaging through Kafka
Distributed messaging through Kafka
Dileep Kalidindi
 
Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...
Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...
Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...
Erik Onnen
 

Similar to Fundamentals of Apache Kafka (20)

Fundamentals and Architecture of Apache Kafka
Fundamentals and Architecture of Apache KafkaFundamentals and Architecture of Apache Kafka
Fundamentals and Architecture of Apache Kafka
 
Unleashing Real-time Power with Kafka.pptx
Unleashing Real-time Power with Kafka.pptxUnleashing Real-time Power with Kafka.pptx
Unleashing Real-time Power with Kafka.pptx
 
World of Tanks Experience of Using Kafka
World of Tanks Experience of Using KafkaWorld of Tanks Experience of Using Kafka
World of Tanks Experience of Using Kafka
 
Kafka Explainaton
Kafka ExplainatonKafka Explainaton
Kafka Explainaton
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !
 
apachekafka-160907180205.pdf
apachekafka-160907180205.pdfapachekafka-160907180205.pdf
apachekafka-160907180205.pdf
 
Columbus mule soft_meetup_aug2021_Kafka_Integration
Columbus mule soft_meetup_aug2021_Kafka_IntegrationColumbus mule soft_meetup_aug2021_Kafka_Integration
Columbus mule soft_meetup_aug2021_Kafka_Integration
 
Building an Event Bus at Scale
Building an Event Bus at ScaleBuilding an Event Bus at Scale
Building an Event Bus at Scale
 
Kafka tutorial
Kafka tutorialKafka tutorial
Kafka tutorial
 
How is Kafka so Fast?
How is Kafka so Fast?How is Kafka so Fast?
How is Kafka so Fast?
 
Copy of Kafka-Camus
Copy of Kafka-CamusCopy of Kafka-Camus
Copy of Kafka-Camus
 
Building High-Throughput, Low-Latency Pipelines in Kafka
Building High-Throughput, Low-Latency Pipelines in KafkaBuilding High-Throughput, Low-Latency Pipelines in Kafka
Building High-Throughput, Low-Latency Pipelines in Kafka
 
Python Kafka Integration: Developers Guide
Python Kafka Integration: Developers GuidePython Kafka Integration: Developers Guide
Python Kafka Integration: Developers Guide
 
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
 
Reducing Microservice Complexity with Kafka and Reactive Streams
Reducing Microservice Complexity with Kafka and Reactive StreamsReducing Microservice Complexity with Kafka and Reactive Streams
Reducing Microservice Complexity with Kafka and Reactive Streams
 
Unlocking the Power of Apache Kafka: How Kafka Listeners Facilitate Real-time...
Unlocking the Power of Apache Kafka: How Kafka Listeners Facilitate Real-time...Unlocking the Power of Apache Kafka: How Kafka Listeners Facilitate Real-time...
Unlocking the Power of Apache Kafka: How Kafka Listeners Facilitate Real-time...
 
Hands-on Workshop: Apache Pulsar
Hands-on Workshop: Apache PulsarHands-on Workshop: Apache Pulsar
Hands-on Workshop: Apache Pulsar
 
Distributed messaging through Kafka
Distributed messaging through KafkaDistributed messaging through Kafka
Distributed messaging through Kafka
 
Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...
Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...
Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...
 

Recently uploaded

BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024
Ortus Solutions, Corp
 
top nidhi software solution freedownload
top nidhi software solution freedownloadtop nidhi software solution freedownload
top nidhi software solution freedownload
vrstrong314
 
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoamOpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
takuyayamamoto1800
 
Explore Modern SharePoint Templates for 2024
Explore Modern SharePoint Templates for 2024Explore Modern SharePoint Templates for 2024
Explore Modern SharePoint Templates for 2024
Sharepoint Designs
 
Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024
Globus
 
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Globus
 
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume MontevideoVitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke
 
Multiple Your Crypto Portfolio with the Innovative Features of Advanced Crypt...
Multiple Your Crypto Portfolio with the Innovative Features of Advanced Crypt...Multiple Your Crypto Portfolio with the Innovative Features of Advanced Crypt...
Multiple Your Crypto Portfolio with the Innovative Features of Advanced Crypt...
Hivelance Technology
 
Visitor Management System in India- Vizman.app
Visitor Management System in India- Vizman.appVisitor Management System in India- Vizman.app
Visitor Management System in India- Vizman.app
NaapbooksPrivateLimi
 
SOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar Research Team: Latest Activities of IntelBrokerSOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar
 
Advanced Flow Concepts Every Developer Should Know
Advanced Flow Concepts Every Developer Should KnowAdvanced Flow Concepts Every Developer Should Know
Advanced Flow Concepts Every Developer Should Know
Peter Caitens
 
Large Language Models and the End of Programming
Large Language Models and the End of ProgrammingLarge Language Models and the End of Programming
Large Language Models and the End of Programming
Matt Welsh
 
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns
 
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
informapgpstrackings
 
How Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptxHow Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptx
wottaspaceseo
 
Cyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdfCyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdf
Cyanic lab
 
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERRORTROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
Tier1 app
 
Understanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSageUnderstanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSage
Globus
 
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus
 
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Globus
 

Recently uploaded (20)

BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024
 
top nidhi software solution freedownload
top nidhi software solution freedownloadtop nidhi software solution freedownload
top nidhi software solution freedownload
 
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoamOpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
 
Explore Modern SharePoint Templates for 2024
Explore Modern SharePoint Templates for 2024Explore Modern SharePoint Templates for 2024
Explore Modern SharePoint Templates for 2024
 
Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024
 
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
 
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume MontevideoVitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume Montevideo
 
Multiple Your Crypto Portfolio with the Innovative Features of Advanced Crypt...
Multiple Your Crypto Portfolio with the Innovative Features of Advanced Crypt...Multiple Your Crypto Portfolio with the Innovative Features of Advanced Crypt...
Multiple Your Crypto Portfolio with the Innovative Features of Advanced Crypt...
 
Visitor Management System in India- Vizman.app
Visitor Management System in India- Vizman.appVisitor Management System in India- Vizman.app
Visitor Management System in India- Vizman.app
 
SOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar Research Team: Latest Activities of IntelBrokerSOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar Research Team: Latest Activities of IntelBroker
 
Advanced Flow Concepts Every Developer Should Know
Advanced Flow Concepts Every Developer Should KnowAdvanced Flow Concepts Every Developer Should Know
Advanced Flow Concepts Every Developer Should Know
 
Large Language Models and the End of Programming
Large Language Models and the End of ProgrammingLarge Language Models and the End of Programming
Large Language Models and the End of Programming
 
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology Solutions
 
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
 
How Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptxHow Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptx
 
Cyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdfCyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdf
 
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERRORTROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
 
Understanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSageUnderstanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSage
 
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024
 
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...
 

Fundamentals of Apache Kafka

  • 1. ZAPR MEDIA LABS Presented by: Chhavi Parasher and Ankit Timbadia
  • 2. Agenda ● What is Apache Kafka? ● Why Apache Kafka? ● Fundamentals ○ Terminologies ○ Architecture ○ Protocol ● What does kafka offer? ● Where Kafka is used?
  • 3. What is Apache Kafka?
  • 4. What is kafka “A high throughput distributed pub-sub messaging system.
  • 5. What is kafka “A pub-sub messaging system rethought as a distributed commit log storage.”
  • 7. Why Kafka Client Source Data Pipelines Start like this.
  • 9. Why Kafka Client Backend1 Client Client Client Then we add additional endpoints to the existing sources Backend2
  • 10. Why Kafka Client Backend1 Client Client Client Then it starts to look like this Backend2 Backend3 Backend4
  • 11. Birth of Kafka Kafka decouples Data Pipelines Source System Source System Source System Source System Hadoop Security Systems Real-time monitoring Data Warehouse Kafka Producers Brokers Consumers
  • 14. Terminologies • Topic is a message stream (queue) • Partitions - Topic is divided into partitions • Ordered, immutable, a structured commit log • Segments - Partition has log segments on disk • Partitions broken down into physical files. • Offset - Segment has messages • Each message is assigned a sequential id
  • 15. Anatomy of a topic 0 1 2 3 4 5 6 7 8 9 1 0 1 1 1 2 1 3 0 1 2 3 4 5 6 7 8 9 1 0 1 1 0 1 2 3 4 5 6 7 8 9 1 0 1 1 1 2 1 3 Partition 1 Partition 2 Partition 3 Writes Old New Segment1 Segment2
  • 16. Kafka Message (Write once read many)
  • 17. Partitions and Brokers • Each broker can be a leader or a replica for a partition • All writes & reads to a topic go through the leader
  • 18. Producer writing to partition • Producers write to a single leader • Each write can be serviced by a separate broker
  • 19. Producer writing to second partition.
  • 20. Keyed Messages and Partitioning Producer
  • 21. Partitioning Distributes clients across consumers Consumer Consumer Consumer Consumer Consumer Consumer Consumer Producer Producer Producer Producer Producer
  • 23. What does kafka offer?
  • 24. What does kafka offer? • Fast • Scalable • Durable • Distributed by Design
  • 25. What does kafka offer? • Fast • A single Kafka broker can handle 100Mbps of reads & writes from 1000 of clients. • Scalable • Durable • Distributed by Design
  • 26. What does kafka offer? • Fast • Scalable • Data streams are partitioned and spread over a cluster of machines to allow data streams larger than the capability of any single machine • Durable • Distributed by Design
  • 27. What does kafka offer? • Fast • Scalable • Durable • Messages are persisted on disk and replicated within the cluster to prevent data loss. Each broker can handle terabytes of messages without performance impact. • Distributed by Design
  • 28. What does kafka offer • Fast • Scalable • Durable • Distributed by Design • Kafka has a modern cluster-centric design that offers strong durability and fault-tolerance guarantees.
  • 29. Efficiency - high throughput & low latency • Disks are fast when used sequentially • Append to end of log. • Fetch messages from a partition beginning from a particular message id. • Batching makes best use of network/IO • Batched send and receive • Batched compression (GZIP, Snappy and LZ4)
  • 30. Efficiency - high throughput & low latency • No message caching in JVM • Zero-copy from file to socket (Java NIO)
  • 32. Where is Kafka used?
  • 33. Who uses Apache Kafka?
  • 34. What do people use it for? • Activity tracking • page views or click tracking, profile updates • Real-time event processing • Provides low latency • Good integration with spark, samza. storm etc. • Collecting Operational Metrics • Monitoring & alerting • Commit log • database changes can be published to Kafka
  • 35. Why Replication? • Broker can go down • controlled: rolling restart for code/config push • uncontrolled: isolated broker failure • If broker is down • some partitions are unavailable • could be permanent data loss • Replication ensures higher availability & durability
  • 36. Caveats • Not designed for large payloads. • Decoding is done for a whole message (no streaming decoding). • Rebalancing can screw things up. • if you're doing any aggregation in the consumer by the partition key • Number of partitions cannot be easily changed (chose wisely). • Lots of topics can hurt I/O performance.
  • 37. Guarantees • At least once delivery • In order delivery, per partition • For a topic with replication factor N, Kafka can tolerate up to N-1 server failures without “losing” any messages committed to the log
  • 38. Summary • A high throughput distributed messaging system rethought as commit log • Originally developed by LinkedIn, Open Sourced in 2011 • Written in Scala, Clients for every popular language • Used by LinkedIn, Twitter, Netflix and many more companies. • When should we use Kafka ? • When should we not use kafka?
  • 40. Agenda • Kafka Clusters at @Zapr • Kafka in AWS • Challenges Managing Kafka Brokers. • Monitoring of Kafka Clusters.
  • 41. Kafka Version: 0.8.2 Total Kafka Clusters: 5 Number of Topics: 185 Brokers: 17 Partitions: 7827 Regions: 2 (us-east-1,ap-southeast-1) Kafka at Zapr
  • 42. Kafka Clusters on Amazon Web Services. Quorum of 3 Zookeeper Nodes for High Availability Kafka Brokers
  • 43. Resource Utilization Family Type vCPU Memory CPU Pricing Compute Optimized C4.2xlarge 8 16 ~75% 338$ General Purpose M4.2xlarge 8 32 ~25% 366$ After moving from C4 Class to M4 Class Machines 40% Improvement in CPU Usage
  • 44. Production Kafka Broker Configurations
  • 45. #Default Partitions and Replication Factor for Kafka Topics num.partitions=10 default.replication.factor=2 #Controlled shutdown for the proper shutdown of Kafka Broker for maintenance. num.recovery.threads.per.data.dir=4 controlled.shutdown.enable=true controlled.shutdown.max.retries=5 controlled.shutdown.retry.backoff.ms=60000 #Creation and Deletion of Kafka Topics delete.topic.enable=true auto.create.topics.enable=true Production Kafka Broker Configurations
  • 47. Dynamic Load • Kafka Scales Horizontally • For a topic, Kafka has N (Replicated Partitions) • Adding a Broker causes Partition to rebalanced
  • 48. Current State 2 Brokers, 1 Topic and 6 Partitions & replication 2 Desired State after adding 3rd Broker
  • 50. Introducing Kafka Manager https://github.com/yahoo/kafka-manager It supports the following : ● Manage multiple clusters. ● Easy inspection of cluster state (topics, consumers, offsets, brokers) ● Generate partition assignments with option to select brokers to use ● Run reassignment of partition (based on generated assignments) ● Create a topic with optional topic configs (0.8.1.1 has different configs than 0.8.2+) ● Delete topic (only supported on 0.8.2+ and remember set delete.topic.enable=true in broker config) ● Add partitions to existing topic ● Update config for existing topic ● Optionally enable JMX polling for broker level and topic level metrics.
  • 59. Kafka Graphite Metric Plugin https://github.com/damienclaveau/kafka-graphite #Graphite Reporter kafka.metrics.reporters=com.criteo.kafka.KafkaGraphiteM etricsReporter kafka.graphite.metrics.reporter.enabled=true kafka.metrics.polling.interval.secs = 60 kafka.graphite.metrics.host= graphite.zapr.com kafka.graphite.metrics.port=2003 kafka.graphite.metrics.group=PROD.kafka-cluster-1.10-0-1- 90
  • 61. Kafka Offset Monitoring https://github.com/quantifind/KafkaOffsetMonitor ● This is an app to monitor your kafka consumers and their position (offset) in the queue. ● You can see the current consumer groups, for each group the topics that they are consuming ● Offset are useful to understand how quick you are consuming from a queue and how fast the queue is growing
  • 64. Future plans • Upgrading EC2 instances to next generations from M4 to M5 • Upgrading Kafka Version from 0.8.2 to latest version (1.2) • Using ST1 (throughput optimized HDD) EBS