SlideShare a Scribd company logo
Apache Kafka
CMSC 491
Hadoop-Based Distributed Computing
Spring 2016
Adam Shook
Overview
• Kafka is a “publish-subscribe messaging
rethought as a distributed commit log”
• Fast
• Scalable
• Durable
• Distributed
Kafka adoption and use cases
• LinkedIn: activity streams, operational metrics, data bus
– 400 nodes, 18k topics, 220B msg/day (peak 3.2M msg/s), May 2014
• Netflix: real-time monitoring and event processing
• Twitter: as part of their Storm real-time data pipelines
• Spotify: log delivery (from 4h down to 10s), Hadoop
• Loggly: log collection and processing
• Mozilla: telemetry data
• Airbnb, Cisco, Gnip, InfoChimps, Ooyala, Square, Uber, …
3
How fast is Kafka?
• “Up to 2 million writes/sec on 3 cheap machines”
– Using 3 producers on 3 different machines, 3x async replication
• Only 1 producer/machine because NIC already saturated
• Sustained throughput as stored data grows
– Slightly different test config than 2M writes/sec above.
4
Why is Kafka so fast?
• Fast writes:
– While Kafka persists all data to disk, essentially all writes go to the
page cache of OS, i.e. RAM.
• Fast reads:
– Very efficient to transfer data from page cache to a network socket
– Linux: sendfile() system call
• Combination of the two = fast Kafka!
– Example (Operations): On a Kafka cluster where the consumers are mostly
caught up you will see no read activity on the disks as they will be serving data
entirely from cache.
5
A first look
• The who is who
– Producers write data to
brokers.
– Consumers read data from
brokers.
– All this is distributed.
• The data
– Data is stored in topics.
– Topics are split into
partitions, which are
replicated.
6
A first look
7
Broker(s)
Topics
8
new
Producer A1
Producer A2
Producer An
…
Producers always append to “tail”
(think: append to a file)
…
Kafka prunes “head” based on age or max size or “key”
Older msgs Newer msgs
Kafka topic
• Topic: feed name to which messages are published
– Example: “zerg.hydra”
Broker(s)
Topics
9
new
Producer A1
Producer A2
Producer An
…
Producers always append to “tail”
(think: append to a file)
…
Older msgs Newer msgs
Consumer group C1 Consumers use an “offset pointer” to
track/control their read progress
(and decide the pace of consumption)
Consumer group C2
Partitions
10
• A topic consists of partitions.
• Partition: ordered + immutable sequence of messages
that is continually appended to
Partitions
11
• #partitions of a topic is configurable
• #partitions determines max consumer (group) parallelism
– cf. parallelism of Storm’s KafkaSpout via builder.setSpout(,,N)
– Consumer group A, with 2 consumers, reads from a 4-partition topic
– Consumer group B, with 4 consumers, reads from the same topic
Partition offsets
12
• Offset: messages in the partitions are each assigned a unique
(per partition) and sequential id called the offset
– Consumers track their pointers via (offset, partition, topic) tuples
Consumer group C1
Replicas of a partition
• Replicas: “backups” of a partition
– They exist solely to prevent data loss.
– Replicas are never read from, never written to.
• They do NOT help to increase producer or consumer
parallelism!
– Kafka tolerates (numReplicas - 1) dead brokers
before losing data
• LinkedIn: numReplicas == 2  1 broker can die
13
Kafka Quickstart
• Steps for downloading Kafka, starting a server,
and creating a console-based
consumer/producer
• Requires ZooKeeper to be installed and
running
• https://kafka.apache.org/documentation.html
#quickstart
• https://github.com/adamjshook/hadoop-
demos/tree/master/kafka

More Related Content

Similar to 04-Kafka.pptx

Kafka Summit NYC 2017 Introduction to Kafka Streams with a Real-life Example
Kafka Summit NYC 2017 Introduction to Kafka Streams with a Real-life ExampleKafka Summit NYC 2017 Introduction to Kafka Streams with a Real-life Example
Kafka Summit NYC 2017 Introduction to Kafka Streams with a Real-life Example
confluent
 
Fundamentals of Apache Kafka
Fundamentals of Apache KafkaFundamentals of Apache Kafka
Fundamentals of Apache Kafka
Chhavi Parasher
 
Apache Kafka DC Meetup: Replicating DB Binary Logs to Kafka
Apache Kafka DC Meetup: Replicating DB Binary Logs to KafkaApache Kafka DC Meetup: Replicating DB Binary Logs to Kafka
Apache Kafka DC Meetup: Replicating DB Binary Logs to Kafka
Mark Bittmann
 
kafka_session_updated.pptx
kafka_session_updated.pptxkafka_session_updated.pptx
kafka_session_updated.pptx
Koiuyt1
 
Introduction to apache kafka
Introduction to apache kafkaIntroduction to apache kafka
Introduction to apache kafka
Dimitris Kontokostas
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
Kumar Shivam
 
Apache Kafka with Spark Streaming: Real-time Analytics Redefined
Apache Kafka with Spark Streaming: Real-time Analytics RedefinedApache Kafka with Spark Streaming: Real-time Analytics Redefined
Apache Kafka with Spark Streaming: Real-time Analytics Redefined
Edureka!
 
Kafka
KafkaKafka
Kafka
shrenikp
 
Unlocking the Power of Apache Kafka: How Kafka Listeners Facilitate Real-time...
Unlocking the Power of Apache Kafka: How Kafka Listeners Facilitate Real-time...Unlocking the Power of Apache Kafka: How Kafka Listeners Facilitate Real-time...
Unlocking the Power of Apache Kafka: How Kafka Listeners Facilitate Real-time...
Denodo
 
CouchbasetoHadoop_Matt_Michael_Justin v4
CouchbasetoHadoop_Matt_Michael_Justin v4CouchbasetoHadoop_Matt_Michael_Justin v4
CouchbasetoHadoop_Matt_Michael_Justin v4
Michael Kehoe
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !
Guido Schmutz
 
Kafka overview
Kafka overviewKafka overview
Kafka overview
Shanki Singh Gandhi
 
Kafka and ibm event streams basics
Kafka and ibm event streams basicsKafka and ibm event streams basics
Kafka and ibm event streams basics
Brian S. Paskin
 
Trivadis TechEvent 2016 Apache Kafka - Scalable Massage Processing and more! ...
Trivadis TechEvent 2016 Apache Kafka - Scalable Massage Processing and more! ...Trivadis TechEvent 2016 Apache Kafka - Scalable Massage Processing and more! ...
Trivadis TechEvent 2016 Apache Kafka - Scalable Massage Processing and more! ...
Trivadis
 
Kafka Explainaton
Kafka ExplainatonKafka Explainaton
Kafka Explainaton
NguyenChiHoangMinh
 
introductiontoapachekafka-201102140206.pdf
introductiontoapachekafka-201102140206.pdfintroductiontoapachekafka-201102140206.pdf
introductiontoapachekafka-201102140206.pdf
TarekHamdi8
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
AIMDek Technologies
 
Kafka 10000 feet view
Kafka 10000 feet viewKafka 10000 feet view
Kafka 10000 feet view
younessx01
 
Kafka Deep Dive
Kafka Deep DiveKafka Deep Dive
Kafka Deep Dive
Knoldus Inc.
 
Copy of Kafka-Camus
Copy of Kafka-CamusCopy of Kafka-Camus
Copy of Kafka-Camus
Deep Shah
 

Similar to 04-Kafka.pptx (20)

Kafka Summit NYC 2017 Introduction to Kafka Streams with a Real-life Example
Kafka Summit NYC 2017 Introduction to Kafka Streams with a Real-life ExampleKafka Summit NYC 2017 Introduction to Kafka Streams with a Real-life Example
Kafka Summit NYC 2017 Introduction to Kafka Streams with a Real-life Example
 
Fundamentals of Apache Kafka
Fundamentals of Apache KafkaFundamentals of Apache Kafka
Fundamentals of Apache Kafka
 
Apache Kafka DC Meetup: Replicating DB Binary Logs to Kafka
Apache Kafka DC Meetup: Replicating DB Binary Logs to KafkaApache Kafka DC Meetup: Replicating DB Binary Logs to Kafka
Apache Kafka DC Meetup: Replicating DB Binary Logs to Kafka
 
kafka_session_updated.pptx
kafka_session_updated.pptxkafka_session_updated.pptx
kafka_session_updated.pptx
 
Introduction to apache kafka
Introduction to apache kafkaIntroduction to apache kafka
Introduction to apache kafka
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Apache Kafka with Spark Streaming: Real-time Analytics Redefined
Apache Kafka with Spark Streaming: Real-time Analytics RedefinedApache Kafka with Spark Streaming: Real-time Analytics Redefined
Apache Kafka with Spark Streaming: Real-time Analytics Redefined
 
Kafka
KafkaKafka
Kafka
 
Unlocking the Power of Apache Kafka: How Kafka Listeners Facilitate Real-time...
Unlocking the Power of Apache Kafka: How Kafka Listeners Facilitate Real-time...Unlocking the Power of Apache Kafka: How Kafka Listeners Facilitate Real-time...
Unlocking the Power of Apache Kafka: How Kafka Listeners Facilitate Real-time...
 
CouchbasetoHadoop_Matt_Michael_Justin v4
CouchbasetoHadoop_Matt_Michael_Justin v4CouchbasetoHadoop_Matt_Michael_Justin v4
CouchbasetoHadoop_Matt_Michael_Justin v4
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !
 
Kafka overview
Kafka overviewKafka overview
Kafka overview
 
Kafka and ibm event streams basics
Kafka and ibm event streams basicsKafka and ibm event streams basics
Kafka and ibm event streams basics
 
Trivadis TechEvent 2016 Apache Kafka - Scalable Massage Processing and more! ...
Trivadis TechEvent 2016 Apache Kafka - Scalable Massage Processing and more! ...Trivadis TechEvent 2016 Apache Kafka - Scalable Massage Processing and more! ...
Trivadis TechEvent 2016 Apache Kafka - Scalable Massage Processing and more! ...
 
Kafka Explainaton
Kafka ExplainatonKafka Explainaton
Kafka Explainaton
 
introductiontoapachekafka-201102140206.pdf
introductiontoapachekafka-201102140206.pdfintroductiontoapachekafka-201102140206.pdf
introductiontoapachekafka-201102140206.pdf
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
 
Kafka 10000 feet view
Kafka 10000 feet viewKafka 10000 feet view
Kafka 10000 feet view
 
Kafka Deep Dive
Kafka Deep DiveKafka Deep Dive
Kafka Deep Dive
 
Copy of Kafka-Camus
Copy of Kafka-CamusCopy of Kafka-Camus
Copy of Kafka-Camus
 

Recently uploaded

What is Augmented Reality Image Tracking
What is Augmented Reality Image TrackingWhat is Augmented Reality Image Tracking
What is Augmented Reality Image Tracking
pavan998932
 
E-commerce Development Services- Hornet Dynamics
E-commerce Development Services- Hornet DynamicsE-commerce Development Services- Hornet Dynamics
E-commerce Development Services- Hornet Dynamics
Hornet Dynamics
 
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Łukasz Chruściel
 
原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样
原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样
原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样
mz5nrf0n
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI AppAI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
Google
 
Hand Rolled Applicative User Validation Code Kata
Hand Rolled Applicative User ValidationCode KataHand Rolled Applicative User ValidationCode Kata
Hand Rolled Applicative User Validation Code Kata
Philip Schwarz
 
OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024
OpenMetadata
 
Microservice Teams - How the cloud changes the way we work
Microservice Teams - How the cloud changes the way we workMicroservice Teams - How the cloud changes the way we work
Microservice Teams - How the cloud changes the way we work
Sven Peters
 
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
Łukasz Chruściel
 
openEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain SecurityopenEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain Security
Shane Coughlan
 
Artificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension FunctionsArtificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension Functions
Octavian Nadolu
 
ALGIT - Assembly Line for Green IT - Numbers, Data, Facts
ALGIT - Assembly Line for Green IT - Numbers, Data, FactsALGIT - Assembly Line for Green IT - Numbers, Data, Facts
ALGIT - Assembly Line for Green IT - Numbers, Data, Facts
Green Software Development
 
Webinar On-Demand: Using Flutter for Embedded
Webinar On-Demand: Using Flutter for EmbeddedWebinar On-Demand: Using Flutter for Embedded
Webinar On-Demand: Using Flutter for Embedded
ICS
 
SWEBOK and Education at FUSE Okinawa 2024
SWEBOK and Education at FUSE Okinawa 2024SWEBOK and Education at FUSE Okinawa 2024
SWEBOK and Education at FUSE Okinawa 2024
Hironori Washizaki
 
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CDKuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
rodomar2
 
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code
A Study of Variable-Role-based Feature Enrichment in Neural Models of CodeA Study of Variable-Role-based Feature Enrichment in Neural Models of Code
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code
Aftab Hussain
 
APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)
Boni García
 
GOING AOT WITH GRAALVM FOR SPRING BOOT (SPRING IO)
GOING AOT WITH GRAALVM FOR  SPRING BOOT (SPRING IO)GOING AOT WITH GRAALVM FOR  SPRING BOOT (SPRING IO)
GOING AOT WITH GRAALVM FOR SPRING BOOT (SPRING IO)
Alina Yurenko
 
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Crescat
 

Recently uploaded (20)

What is Augmented Reality Image Tracking
What is Augmented Reality Image TrackingWhat is Augmented Reality Image Tracking
What is Augmented Reality Image Tracking
 
E-commerce Development Services- Hornet Dynamics
E-commerce Development Services- Hornet DynamicsE-commerce Development Services- Hornet Dynamics
E-commerce Development Services- Hornet Dynamics
 
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
 
原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样
原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样
原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI AppAI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
 
Hand Rolled Applicative User Validation Code Kata
Hand Rolled Applicative User ValidationCode KataHand Rolled Applicative User ValidationCode Kata
Hand Rolled Applicative User Validation Code Kata
 
OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024
 
Microservice Teams - How the cloud changes the way we work
Microservice Teams - How the cloud changes the way we workMicroservice Teams - How the cloud changes the way we work
Microservice Teams - How the cloud changes the way we work
 
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
 
openEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain SecurityopenEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain Security
 
Artificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension FunctionsArtificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension Functions
 
ALGIT - Assembly Line for Green IT - Numbers, Data, Facts
ALGIT - Assembly Line for Green IT - Numbers, Data, FactsALGIT - Assembly Line for Green IT - Numbers, Data, Facts
ALGIT - Assembly Line for Green IT - Numbers, Data, Facts
 
Webinar On-Demand: Using Flutter for Embedded
Webinar On-Demand: Using Flutter for EmbeddedWebinar On-Demand: Using Flutter for Embedded
Webinar On-Demand: Using Flutter for Embedded
 
SWEBOK and Education at FUSE Okinawa 2024
SWEBOK and Education at FUSE Okinawa 2024SWEBOK and Education at FUSE Okinawa 2024
SWEBOK and Education at FUSE Okinawa 2024
 
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CDKuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
 
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code
A Study of Variable-Role-based Feature Enrichment in Neural Models of CodeA Study of Variable-Role-based Feature Enrichment in Neural Models of Code
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code
 
APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)
 
GOING AOT WITH GRAALVM FOR SPRING BOOT (SPRING IO)
GOING AOT WITH GRAALVM FOR  SPRING BOOT (SPRING IO)GOING AOT WITH GRAALVM FOR  SPRING BOOT (SPRING IO)
GOING AOT WITH GRAALVM FOR SPRING BOOT (SPRING IO)
 
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
 

04-Kafka.pptx

  • 1. Apache Kafka CMSC 491 Hadoop-Based Distributed Computing Spring 2016 Adam Shook
  • 2. Overview • Kafka is a “publish-subscribe messaging rethought as a distributed commit log” • Fast • Scalable • Durable • Distributed
  • 3. Kafka adoption and use cases • LinkedIn: activity streams, operational metrics, data bus – 400 nodes, 18k topics, 220B msg/day (peak 3.2M msg/s), May 2014 • Netflix: real-time monitoring and event processing • Twitter: as part of their Storm real-time data pipelines • Spotify: log delivery (from 4h down to 10s), Hadoop • Loggly: log collection and processing • Mozilla: telemetry data • Airbnb, Cisco, Gnip, InfoChimps, Ooyala, Square, Uber, … 3
  • 4. How fast is Kafka? • “Up to 2 million writes/sec on 3 cheap machines” – Using 3 producers on 3 different machines, 3x async replication • Only 1 producer/machine because NIC already saturated • Sustained throughput as stored data grows – Slightly different test config than 2M writes/sec above. 4
  • 5. Why is Kafka so fast? • Fast writes: – While Kafka persists all data to disk, essentially all writes go to the page cache of OS, i.e. RAM. • Fast reads: – Very efficient to transfer data from page cache to a network socket – Linux: sendfile() system call • Combination of the two = fast Kafka! – Example (Operations): On a Kafka cluster where the consumers are mostly caught up you will see no read activity on the disks as they will be serving data entirely from cache. 5
  • 6. A first look • The who is who – Producers write data to brokers. – Consumers read data from brokers. – All this is distributed. • The data – Data is stored in topics. – Topics are split into partitions, which are replicated. 6
  • 8. Broker(s) Topics 8 new Producer A1 Producer A2 Producer An … Producers always append to “tail” (think: append to a file) … Kafka prunes “head” based on age or max size or “key” Older msgs Newer msgs Kafka topic • Topic: feed name to which messages are published – Example: “zerg.hydra”
  • 9. Broker(s) Topics 9 new Producer A1 Producer A2 Producer An … Producers always append to “tail” (think: append to a file) … Older msgs Newer msgs Consumer group C1 Consumers use an “offset pointer” to track/control their read progress (and decide the pace of consumption) Consumer group C2
  • 10. Partitions 10 • A topic consists of partitions. • Partition: ordered + immutable sequence of messages that is continually appended to
  • 11. Partitions 11 • #partitions of a topic is configurable • #partitions determines max consumer (group) parallelism – cf. parallelism of Storm’s KafkaSpout via builder.setSpout(,,N) – Consumer group A, with 2 consumers, reads from a 4-partition topic – Consumer group B, with 4 consumers, reads from the same topic
  • 12. Partition offsets 12 • Offset: messages in the partitions are each assigned a unique (per partition) and sequential id called the offset – Consumers track their pointers via (offset, partition, topic) tuples Consumer group C1
  • 13. Replicas of a partition • Replicas: “backups” of a partition – They exist solely to prevent data loss. – Replicas are never read from, never written to. • They do NOT help to increase producer or consumer parallelism! – Kafka tolerates (numReplicas - 1) dead brokers before losing data • LinkedIn: numReplicas == 2  1 broker can die 13
  • 14. Kafka Quickstart • Steps for downloading Kafka, starting a server, and creating a console-based consumer/producer • Requires ZooKeeper to be installed and running • https://kafka.apache.org/documentation.html #quickstart • https://github.com/adamjshook/hadoop- demos/tree/master/kafka