Apache Kafka
Free Friday
Luiza Souza / Otávio Carvalho
lsouza@thoughtworks.com
ocarvalh@thoughtworks.com
Apache Kafka
● Apache Kafka is a distributed messaging system
○ Provides fast, highly scalable and redundant messaging through a pub-sub model
● It was built at LinkedIn as a central hub for all messaging between their systems
● Focuses on scalability and fault tolerance
Motivation
● Microservices
○ "In short, the microservice architectural style is an approach to developing a
single application as a suite of small services, each running in its own process
and communicating with lightweight mechanisms, often an HTTP resource
API. These services are built around business capabilities and independently
deployable by fully automated deployment machinery." - Martin Fowler
● Monolith First
○ Using microservices as a way to decompose monolithic infrastructures
● Message Queues
○ Asynchronous processing
○ Decoupling
○ Load balancing
○ Scalability
How is it different?
● High throughput
○ Millions of events per second per node
● Fault-tolerance guarantees
○ Relies on Apache ZooKeeper for node failure detection and leader election
○ Maintains an ISR (In-Sync Replica set) for each partition in order to tolerate node failures
○ (Claims to) tolerate up to f failures with f+1 replicas without losing committed data
● Distributed
○ More nodes can be included and the system keeps its
high-performance and fault-tolerance capabilities
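The "f failures with f+1 replicas" claim can be checked by brute force. The sketch below is a simplified model, not Kafka code: it assumes a write counts as committed only once every current ISR member has it (the definition the Kafka documentation uses) and uses hypothetical broker names b1, b2, b3. It also shows that the guarantee depends on the ISR size, not just the replica count.

```python
from itertools import combinations


def committed_write_survives(isr: set[str], crashed: set[str]) -> bool:
    # A committed write is on every ISR member, so it survives
    # as long as at least one ISR member is still alive.
    return bool(isr - crashed)


def tolerated_failures(replicas: list[str], isr: set[str]) -> int:
    """Largest f such that every combination of f crashed brokers
    still leaves the committed write on a surviving replica."""
    tolerated = 0
    for f in range(1, len(replicas) + 1):
        if all(committed_write_survives(isr, set(crashed))
               for crashed in combinations(replicas, f)):
            tolerated = f
        else:
            break
    return tolerated
```

With a full ISR of three replicas (b1, b2, b3) the result is 2, i.e. f = 2 for f+1 = 3; if the ISR has shrunk to {b1}, it drops to 0.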
Comparison with AMQP
● Broker-centric (AMQP)
○ AMQP implementations are usually broker-centric
○ Focus on delivery guarantees between producers and consumers
○ Transient messages preferred over durable ones
○ The broker itself tracks what has been consumed (via message acknowledgements)
● Producer-centric (Kafka)
○ Partitions a fire hose of event data into durable message brokers with cursors (pointers)
○ Supports batch consumers that may be offline as well as online consumers that want messages at low latency
○ No per-message consumer acknowledgements: each consumer tracks what it has consumed so far
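The "durable log with cursors" idea can be illustrated with a toy model. This is a minimal in-memory sketch, not Kafka's client API: `PartitionLog` and `Consumer` are hypothetical classes. The log only stores messages; each consumer keeps its own offset, so an offline batch consumer can catch up later and any consumer can rewind.

```python
class PartitionLog:
    """Append-only log: stores messages, but not who has consumed them."""

    def __init__(self) -> None:
        self._messages: list[str] = []

    def append(self, message: str) -> int:
        self._messages.append(message)
        return len(self._messages) - 1  # offset of the new message

    def read(self, offset: int, max_messages: int = 10) -> list[str]:
        return self._messages[offset:offset + max_messages]


class Consumer:
    """Keeps its own cursor (offset); nothing is acknowledged to the log."""

    def __init__(self, log: PartitionLog) -> None:
        self.log = log
        self.offset = 0

    def poll(self, max_messages: int = 10) -> list[str]:
        batch = self.log.read(self.offset, max_messages)
        self.offset += len(batch)  # advance the cursor locally
        return batch
```

Because the log never deletes a message on consumption, two consumers at different offsets read the same data independently, and setting `offset = 0` replays the partition from the start.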
Kafka Terminology
● Producers
○ Processes that publish messages to topics
● Consumers
○ Processes that read messages from topics
● Topic
○ Name of the feed to which messages are published
● Broker
○ A Kafka server process running on a single machine
● Cluster
○ Group of brokers working together
Kafka Terminology
● Partitions
○ Subdivisions of a topic
■ Scalability
■ Load balancing
○ Consumers control their own offsets
● Replication
○ In-Sync Replica (ISR) sets
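Partitions are also the unit of load balancing between consumers. The sketch below is a simplified, range-style assignment written for illustration (not Kafka's actual assignor code; `range_assign` is a hypothetical name): partitions are split into contiguous ranges over the sorted consumer names, with the first consumers taking one extra when the division is uneven.

```python
def range_assign(num_partitions: int, consumers: list[str]) -> dict[str, list[int]]:
    """Split partitions 0..num_partitions-1 into contiguous ranges,
    one range per consumer (sorted by name)."""
    if not consumers:
        raise ValueError("need at least one consumer")
    ordered = sorted(consumers)
    per_consumer, extra = divmod(num_partitions, len(ordered))
    assignment: dict[str, list[int]] = {}
    start = 0
    for i, consumer in enumerate(ordered):
        # The first `extra` consumers each take one additional partition.
        count = per_consumer + (1 if i < extra else 0)
        assignment[consumer] = list(range(start, start + count))
        start += count
    return assignment
```

With 5 partitions and consumers c1, c2, c1 gets partitions 0 to 2 and c2 gets 3 and 4. With 2 partitions and 3 consumers, one consumer stays idle: parallelism within a group is bounded by the number of partitions.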
Kafka Terminology
Figure 1. A Kafka cluster with 4 brokers, 1 topic and 2 partitions, each
with 3 replicas
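The layout in Figure 1 (4 brokers, 2 partitions, 3 replicas each) can be reproduced with a simple round-robin placement. This is an assumed, simplified scheme for illustration only; Kafka's real assignment also randomizes the starting broker and shifts follower positions. Here the first broker in each list plays the role of the partition leader.

```python
def place_replicas(num_partitions: int, replication_factor: int,
                   num_brokers: int) -> dict[int, list[int]]:
    """Round-robin placement: partition p gets brokers p, p+1, ... (mod N).
    The first broker listed for each partition acts as its leader."""
    if replication_factor > num_brokers:
        raise ValueError("replication factor cannot exceed the number of brokers")
    return {
        p: [(p + r) % num_brokers for r in range(replication_factor)]
        for p in range(num_partitions)
    }
```

For the figure's configuration this gives partition 0 on brokers 0, 1, 2 and partition 1 on brokers 1, 2, 3, so no broker ever holds two copies of the same partition.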
Use Cases
● Messaging
● Distributed log / Log aggregation
● Change Data Capture
● Stream Processing / Event Sourcing
Use Cases - Messaging
● Messaging
○ Simple Queueing
■ e.g. a queue for sending e-mails
○ Tracking user events
○ Near real-time metrics
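One Kafka topic can serve both as a work queue (e-mail sending) and as a pub-sub feed (metrics). The sketch below models this with consumer groups under a simplifying assumption: each partition is owned by `members[partition % len(members)]`, which is not Kafka's real assignment logic. The group names and consumer names are hypothetical.

```python
def deliver(messages: list[tuple[int, str]],
            groups: dict[str, list[str]]) -> dict[str, dict[str, list[str]]]:
    """Route (partition, payload) messages to consumer groups.

    Every group sees every message (publish-subscribe); inside a group,
    each partition is owned by exactly one member (queue semantics).
    """
    received: dict[str, dict[str, list[str]]] = {}
    for group, members in groups.items():
        per_member: dict[str, list[str]] = {m: [] for m in members}
        for partition, payload in messages:
            owner = members[partition % len(members)]  # simplified ownership rule
            per_member[owner].append(payload)
        received[group] = per_member
    return received
```

With two workers in an "email-senders" group, each e-mail message is handled by exactly one worker, while a separate "metrics" group with one member still receives every message.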
Use Cases - Distributed Log
● Distributed log / Log aggregation
○ LinkedIn usage
■ The whole platform is built around a central log
■ 13 million messages/sec, 2.75 gigabytes per sec (at peak)
■ Over 1100 brokers in more than 60 clusters
Use Cases - Change Data Capture
Use Cases - Stream Processing
● Stream Processing / Event Sourcing
(Diagrams: LinkedIn's example, Netflix's example)
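In event sourcing, the log of events is the source of truth and current state is derived by replaying it. The sketch below uses hypothetical account events ("deposited", "withdrawn") purely for illustration; in a Kafka setting the list would be the records read from a topic in offset order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    account: str
    kind: str  # hypothetical event types: "deposited" or "withdrawn"
    amount: int


def replay(events: list[Event]) -> dict[str, int]:
    """Rebuild account balances by folding over the event log in order."""
    balances: dict[str, int] = {}
    for e in events:
        if e.kind == "deposited":
            delta = e.amount
        elif e.kind == "withdrawn":
            delta = -e.amount
        else:
            raise ValueError(f"unknown event kind: {e.kind}")
        balances[e.account] = balances.get(e.account, 0) + delta
    return balances
```

Replaying only a prefix of the log yields the state as of that point in time, which is one reason a durable, replayable log fits this pattern.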
DEMO
ISSUES
Issues
● CAP theorem (Consistency, Availability, Partition tolerance)
○ "You can't sacrifice partition tolerance"
● Jepsen tests (@aphyr)
○ To make Kafka lose data, the test shrinks the ISR (In-Sync Replica Set) to a single node (the leader) and then loses the leader itself
■ This triggers a leader election, and a replica that was not in sync becomes the new leader
● As a result, Kafka loses ~50% of the writes acknowledged during the partition
■ Kafka users commonly set a replication factor of 2 or 3 for each partition of a topic, so only one or two node failures are needed to shrink the ISR to a single node
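The failure sequence above can be traced with a small simulation. This is a toy model of the scenario, not Jepsen or Kafka code; the replica names a, b, c and the write counts are hypothetical. The share of lost writes equals the share acknowledged while the ISR contained only the leader, so the exact percentage depends entirely on the workload.

```python
def unclean_election_scenario(writes_before: int, writes_during: int) -> tuple[int, int]:
    """Return (acknowledged_writes, writes_surviving_on_new_leader)."""
    logs: dict[str, list[str]] = {"a": [], "b": [], "c": []}
    leader = "a"
    isr = {"a", "b", "c"}
    acknowledged: list[str] = []

    def write(value: str, in_sync: set[str]) -> None:
        # The leader acknowledges once every current ISR member has the write.
        for replica in in_sync:
            logs[replica].append(value)
        acknowledged.append(value)

    for i in range(writes_before):
        write(f"before-{i}", isr)

    # Followers are partitioned away: the ISR shrinks to the leader alone,
    # and the leader keeps acknowledging writes by itself.
    isr = {leader}
    for i in range(writes_during):
        write(f"during-{i}", isr)

    # The leader is then lost and an out-of-sync follower is elected
    # ("unclean" leader election); its log becomes the new truth.
    new_leader = "b"
    surviving = [w for w in acknowledged if w in logs[new_leader]]
    return len(acknowledged), len(surviving)
```

With 10 writes before and 10 during the partition, 20 writes are acknowledged but only 10 survive. Kafka exposes the `unclean.leader.election.enable` setting, and producers using `acks=all` together with the topic setting `min.insync.replicas` can have writes rejected when the ISR is too small; both options trade availability for durability.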
THANK YOU
Luiza Souza / Otávio Carvalho
lsouza@thoughtworks.com
ocarvalh@thoughtworks.com
Links
● https://aphyr.com/posts/315-jepsen-rabbitmq
● https://aphyr.com/posts/293-jepsen-kafka
● https://thoughtworks.jiveon.com/people/tbartlet/blog/2015/11/02/project-metamorphosis-with-kafka-spark
● https://thoughtworks.jiveon.com/message/1013489
● https://medium.com/@ikem/event-sourcing-and-cqrs-a-look-at-kafka-e0c1b90d17d8#.x4f9ezrwn
● https://martin.kleppmann.com/2016/01/29/event-sourcing-stream-processing-at-ddd-europe.html
● http://microservices.io/patterns/microservices.html
● http://martinfowler.com/articles/microservices.html
● https://engineering.linkedin.com/kafka/running-kafka-scale
● https://engineering.linkedin.com/kafka/intra-cluster-replication-apache-kafka
● http://martinfowler.com/bliki/MonolithFirst.html
● https://www.oreilly.com/learning/making-sense-of-stream-processing/page/3/integrating-databases-and-kafka-with-change-data-capture
● http://kafka.apache.org/documentation.html
● https://github.com/toddpalino/kafkafromscratch/blob/master/Apache%20Kafka%20from%20Scratch.pdf
● http://www.javaworld.com/article/3060078/big-data/big-data-messaging-with-kafka-part-1.html
● https://sookocheff.com/post/kafka/kafka-in-a-nutshell/
Use Cases - Change Data Capture
● Log compaction
○ Kafka + Kafka Connect
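Log compaction is what lets a topic hold the current state of each database row for change data capture: only the latest record per key is retained. The sketch below is a simplified model (records are hypothetical `(offset, key, value)` tuples); unlike real Kafka, which keeps delete markers (tombstones) for a retention period before removing them, it drops keys whose latest value is `None` immediately.

```python
from typing import Optional

Record = tuple[int, str, Optional[str]]  # (offset, key, value); None = tombstone


def compact(log: list[Record]) -> list[Record]:
    """Keep only the latest record per key, preserving offset order.
    Keys whose latest record is a tombstone are removed entirely."""
    latest_offset: dict[str, int] = {}
    for offset, key, _ in log:
        latest_offset[key] = offset
    return [
        (offset, key, value)
        for offset, key, value in log
        if latest_offset[key] == offset and value is not None
    ]
```

Offsets are kept unchanged, so a consumer that reads the compacted topic from the beginning still sees records in their original order, just without the superseded versions.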
Partitioning
● Custom Partitioner
○ Write your own logic
● Default Partitioner
○ Manual
■ The producer specifies the partition explicitly
○ Hashing
■ The most common approach
■ Messages with the same key go to the same partition
○ Spraying
■ Random partitioning
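The three default strategies can be sketched in a single function. This is an illustrative sketch, not Kafka's partitioner: `choose_partition` is a hypothetical name, and it hashes keys with `zlib.crc32` for determinism, whereas Kafka's Java client uses murmur2. The property that matters is the same: a given key always maps to the same partition as long as the partition count does not change.

```python
import random
import zlib
from typing import Optional


def choose_partition(num_partitions: int,
                     key: Optional[bytes] = None,
                     partition: Optional[int] = None,
                     rng: Optional[random.Random] = None) -> int:
    # Manual: the producer names the partition explicitly.
    if partition is not None:
        if not 0 <= partition < num_partitions:
            raise ValueError("partition out of range")
        return partition
    # Hashing: same key -> same partition (crc32 here; Kafka uses murmur2).
    if key is not None:
        return zlib.crc32(key) % num_partitions
    # Spraying: no key, so pick a partition at random.
    rng = rng or random.Random()
    return rng.randrange(num_partitions)
```

Hashing preserves per-key ordering because all messages for a key land in one partition; spraying spreads load evenly but gives no ordering across messages.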