Apache Kafka at LinkedIn
How LinkedIn Customizes Kafka to Work at the Trillion Scale
Jon Lee, Staff Software Engineer, LinkedIn
Wesley Wu, Senior Software Engineer, LinkedIn
Agenda
1 Apache Kafka @ LinkedIn
2 Development Workflow
3 Patch Examples
4 Release Process
1 Apache Kafka @ LinkedIn
Apache Kafka
• Distributed stream processing platform
• Publish and subscribe to persistent messages
• High throughput and low latency
• Developed at LinkedIn
• Top-level Apache project
Kafka @ LinkedIn: Ecosystem
[Diagram: Kafka brokers and clients with the surrounding ecosystem components: REST Proxy, Schema Registry, Brooklin, Cruise Control, Completeness Audit, and Usage Monitor]
Kafka @ LinkedIn: Running at Scale
• 7 trillion messages per day
• 100+ clusters, 4K+ brokers
• 100K+ topics, 7M+ partitions
• Constant scalability and operability challenges
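To put these figures in perspective, a rough back-of-the-envelope calculation can turn them into per-second and per-broker averages. The sketch below treats the "+" figures as exact values, so the results are only order-of-magnitude estimates; real clusters, topics, and brokers are unlikely to be evenly loaded.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Figures from the slide; the "+" values are used as if they were exact.
messages_per_day = 7_000_000_000_000  # 7 trillion
clusters = 100
brokers = 4_000
topics = 100_000
partitions = 7_000_000

avg_messages_per_second = messages_per_day / SECONDS_PER_DAY
avg_brokers_per_cluster = brokers / clusters
avg_partitions_per_topic = partitions / topics
# Counts partitions only; replicas would multiply this by the replication factor.
avg_partitions_per_broker = partitions / brokers

print(f"{avg_messages_per_second:,.0f} messages/second on average")
print(f"{avg_brokers_per_cluster:.0f} brokers per cluster on average")
print(f"{avg_partitions_per_topic:.0f} partitions per topic on average")
print(f"{avg_partitions_per_broker:.0f} partitions per broker on average")
```

Under these assumptions the fleet averages roughly 81 million messages per second.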
LinkedIn Kafka Release Branch
• Source of releases running in LinkedIn production
• Branched from an Apache Kafka release branch
• Contains hotfix patches and upstream cherry-picks
• Tailored to operations and scale at LinkedIn
2 Development Workflow
Tracking Upstream Closely
“Upstream Everything”
Upstream First
• Commit to upstream first (file a KIP if necessary)
• Cherry-pick it onto the current LinkedIn release branch, or pick it up when a new branch containing the upstream patch is created
• Suitable for patches with low to medium urgency
LinkedIn First (a.k.a. hotfix approach)
• Commit to LinkedIn branch first
• Double-commit to upstream (best effort)
• Suitable for patches with high urgency
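The choice between the two approaches above can be sketched as a small function keyed on urgency. This is only an illustration of the rule stated on the slide; the `Urgency` enum and `commit_plan` function are hypothetical names, not part of any LinkedIn tooling.

```python
from enum import Enum
from typing import List


class Urgency(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


def commit_plan(urgency: Urgency) -> List[str]:
    """Return the ordered steps for a patch of the given urgency."""
    if urgency is Urgency.HIGH:
        # LinkedIn First (hotfix approach).
        return [
            "commit to LinkedIn branch",
            "double-commit to upstream (best effort)",
        ]
    # Upstream First, for low and medium urgency.
    return [
        "commit to upstream (file a KIP if necessary)",
        "cherry-pick onto LinkedIn branch, or wait for the next branch",
    ]


for level in Urgency:
    print(level.value, "->", commit_plan(level))
```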
Tale of Three Patches
Cherry-pick
• Cherry-picked from upstream
• Kept until a new LinkedIn release branch containing the original upstream patch is created
Double-committed Hotfix
• Hotfix eventually committed to upstream
• Kept until a new LinkedIn branch containing the corresponding upstream patch is created
LinkedIn-private Hotfix
• Hotfix not of interest to upstream (e.g., temporary debug patches)
• OR double-commit attempted but not accepted by upstream
• Kept in LinkedIn branches until they are no longer needed
Close Look at a LinkedIn Release Branch
[Diagram: commit timelines for Apache Kafka trunk, an Apache Kafka release branch, and the LinkedIn release branch. Legend: upstream patch (before branching point), cherry-pick, hotfix (double-committed), hotfix (LinkedIn-private)]
Development Workflow
[Flowchart. Entry points: New Issue, New Feature. Decisions: Already fixed in upstream? Intend to commit to upstream? Commit to upstream first? Can be cherry-picked? KIP required? Actions: File upstream ticket; File KIP / upstream ticket; Upstream patching; Hotfix patching; Cherry-pick. Outcomes: Patch will be picked up at next rebase; Fixed in upstream and patch exists in LI branch; Patch exists only in LI branch. Edge labels: Y, N, Rejected]
3 Patch Examples
Scalability Support
• Challenges
  • 140+ brokers and 1M+ replicas on a single cluster
  • Controller failure leads to site unavailability
  • Slowness in bouncing a broker causes deployment delay
• Solutions
  • Reuse UpdateMetadataRequest object to reduce controller memory footprint
  • Improve broker shutdown time by reducing lock contention
  • Avoid excessive logging
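The idea behind reusing a request object can be illustrated with a toy model. The sketch below assumes that the same metadata payload is sent to every broker, so one shared object can replace one copy per broker; `MetadataUpdate` is a hypothetical stand-in and does not reflect the actual Kafka controller code.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# (topic, partition, leader broker id); a simplified, hypothetical record.
PartitionState = Tuple[str, int, int]


@dataclass(frozen=True)
class MetadataUpdate:
    """Hypothetical stand-in for the payload of an UpdateMetadataRequest."""
    partition_states: Tuple[PartitionState, ...]


def build_per_broker(broker_ids: List[int],
                     states: List[PartitionState]) -> Dict[int, MetadataUpdate]:
    # Naive approach: a separate payload object for each destination broker.
    return {b: MetadataUpdate(tuple(states)) for b in broker_ids}


def build_shared(broker_ids: List[int],
                 states: List[PartitionState]) -> Dict[int, MetadataUpdate]:
    # Reuse approach: build the payload once and share it across all brokers.
    shared = MetadataUpdate(tuple(states))
    return {b: shared for b in broker_ids}


brokers = [0, 1, 2]
states = [("events", 0, 1), ("events", 1, 2)]
per_broker = build_per_broker(brokers, states)
shared = build_shared(brokers, states)
print(len({id(v) for v in per_broker.values()}), "payload objects without reuse")
print(len({id(v) for v in shared.values()}), "payload object with reuse")
```

With many brokers and a large payload, keeping a single object in memory instead of one per broker is where the saving would come from.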
Operability Support
• Challenges
  • Broker removal for maintenance requires moving out all replicas.
  • New replicas can get assigned to brokers that are going to be removed.
• Solutions
  • Add the broker to a maintenance broker list.
  • New replicas do not get assigned to maintenance brokers.
  • Integrated with Kafka Cruise Control to automate the broker removal process.
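A minimal sketch of the maintenance-list idea: replica assignment simply skips brokers that are marked for maintenance. The round-robin placement and the `assign_replicas` function are assumptions made for illustration; they are not the assignment algorithm used by Kafka or by LinkedIn's branch.

```python
from typing import Dict, Iterable, List, Set


def assign_replicas(num_partitions: int,
                    replication_factor: int,
                    brokers: Iterable[int],
                    maintenance: Set[int]) -> Dict[int, List[int]]:
    """Assign replicas round-robin, skipping brokers under maintenance."""
    eligible = sorted(b for b in brokers if b not in maintenance)
    if replication_factor > len(eligible):
        raise ValueError("not enough eligible brokers for the replication factor")
    assignment: Dict[int, List[int]] = {}
    for partition in range(num_partitions):
        # Consecutive eligible brokers, so replicas of one partition are distinct.
        assignment[partition] = [
            eligible[(partition + r) % len(eligible)]
            for r in range(replication_factor)
        ]
    return assignment


# Brokers 1 and 3 are about to be removed, so they receive no new replicas.
result = assign_replicas(num_partitions=4, replication_factor=2,
                         brokers=[0, 1, 2, 3, 4], maintenance={1, 3})
print(result)
```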
Features
• Observer for billing
  • Provide accounting information
• Enforce minimum replication factor
  • Minimize data loss risk in case of broker failure
• New offset reset policy
  • Help consumer navigate to the closest offset
We are considering (WIP):
• CPU optimization (e.g., using the OpenSSL library)
• Separate controller node from data broker nodes
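Two of the features listed above, the minimum replication factor check and the closest-offset reset policy, can be sketched in a few lines. Both functions are hypothetical, and the reading of "closest" as clamping an out-of-range offset to the nearest end of the log is an assumption; the slide does not spell out the exact semantics.

```python
def validate_replication_factor(requested: int, minimum: int) -> None:
    """Reject topic settings whose replication factor is below the minimum."""
    if requested < minimum:
        raise ValueError(
            f"replication factor {requested} is below the required minimum {minimum}"
        )


def closest_offset(requested: int, log_start: int, log_end: int) -> int:
    """Clamp an offset to the valid range [log_start, log_end].

    An offset below the range maps to the log start, one above the range
    maps to the log end, and an in-range offset is returned unchanged.
    """
    return max(log_start, min(requested, log_end))


print(closest_offset(5, log_start=100, log_end=200))    # below range
print(closest_offset(250, log_start=100, log_end=200))  # above range
print(closest_offset(150, log_start=100, log_end=200))  # in range
```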
Direct Contributions to Upstream
• KIP-219: Improve quota communication
• KIP-291: Separating controller connections and requests from the data plane
• KIP-354: Add a maximum log compaction lag
• KIP-380: Detect outdated control requests and bounced brokers using broker generation
4 Release Process
Creating a New LinkedIn Release Branch
[Diagram: Apache Kafka trunk with releases Apache Kafka 2.0.0 and Apache Kafka 2.3.0 marked. Legend: cherry-pick, hotfix (LinkedIn-private)]
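Combining this slide with the three patch types described earlier, the set of patches that must be re-applied on a new LinkedIn branch could be computed roughly as follows. This is a sketch under stated assumptions: the `Patch` record, the `upstream_id` field, and the ticket-style identifiers are hypothetical, and the deck does not describe the actual tooling.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Set


class Kind(Enum):
    CHERRY_PICK = "cherry-pick"
    DOUBLE_COMMITTED_HOTFIX = "hotfix (double-committed)"
    PRIVATE_HOTFIX = "hotfix (LinkedIn-private)"


@dataclass(frozen=True)
class Patch:
    name: str
    kind: Kind
    upstream_id: Optional[str] = None  # hypothetical id of the matching upstream commit
    still_needed: bool = True          # only meaningful for private hotfixes


def patches_to_carry_over(old_branch: List[Patch],
                          upstream_ids_in_new_base: Set[str]) -> List[Patch]:
    """Return the patches that must be re-applied on the new branch."""
    carried = []
    for patch in old_branch:
        if patch.kind is Kind.PRIVATE_HOTFIX:
            # Private hotfixes are kept until they are no longer needed.
            if patch.still_needed:
                carried.append(patch)
        elif patch.upstream_id not in upstream_ids_in_new_base:
            # Cherry-picks and double-committed hotfixes are dropped only
            # once the new base already contains the upstream patch.
            carried.append(patch)
    return carried


old = [
    Patch("a", Kind.CHERRY_PICK, upstream_id="UPSTREAM-1"),
    Patch("b", Kind.DOUBLE_COMMITTED_HOTFIX, upstream_id="UPSTREAM-2"),
    Patch("c", Kind.PRIVATE_HOTFIX),
    Patch("d", Kind.PRIVATE_HOTFIX, still_needed=False),
]
carried = patches_to_carry_over(old, upstream_ids_in_new_base={"UPSTREAM-1"})
print([p.name for p in carried])
```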
Certifying a Release
[Diagram: two certification clusters, Baseline and Release, each with Broker 0 through Broker N and each receiving produce and consume traffic]
• Identical setup with 30+ brokers
• Production traffic
• Automated compare run
• Detailed report
• Certification covers rebalance, deployment, rolling bounce, stability, and downgrade
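An automated compare run between the baseline and release clusters might look like the sketch below. The metric names, the 5% tolerance, and the assumption that lower values are better for every metric are all illustrative choices; the deck does not describe the real certification framework.

```python
from typing import Dict


def compare_run(baseline: Dict[str, float],
                release: Dict[str, float],
                tolerance: float = 0.05) -> Dict[str, str]:
    """Flag metrics where the release is worse than the baseline.

    Assumes lower is better for every metric. A metric is a regression when
    its relative increase over the baseline exceeds the tolerance.
    """
    report: Dict[str, str] = {}
    for name in sorted(baseline):
        base = baseline[name]
        if name not in release:
            report[name] = "missing"
            continue
        new = release[name]
        if base == 0:
            regressed = new > 0
        else:
            regressed = (new - base) / base > tolerance
        report[name] = "regression" if regressed else "ok"
    return report


# Hypothetical metric names and values.
baseline_metrics = {"p99_produce_latency_ms": 10.0, "cpu_utilization": 0.40}
release_metrics = {"p99_produce_latency_ms": 12.0, "cpu_utilization": 0.41}
report = compare_run(baseline_metrics, release_metrics)
for metric, status in report.items():
    print(f"{metric}: {status}")
```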
Please Check Out
• Source code available at GitHub: http://github.com/linkedin/kafka
• NOT a fork
• Branches are named as <Apache Kafka Release>-li (e.g., 2.0-li and 2.3-li)
• We are not accepting external contributions. Please contribute directly to upstream.
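The branch naming convention can be expressed as a small helper. It assumes that the "Apache Kafka Release" part is the major.minor version, which matches the two examples given (2.0-li and 2.3-li); `li_branch_name` is a hypothetical function, not part of the repository.

```python
import re


def li_branch_name(apache_version: str) -> str:
    """Map an Apache Kafka version such as "2.3.0" to a branch name such as "2.3-li"."""
    match = re.fullmatch(r"(\d+)\.(\d+)(?:\.\d+)*", apache_version)
    if match is None:
        raise ValueError(f"unrecognized version: {apache_version!r}")
    major, minor = match.group(1), match.group(2)
    return f"{major}.{minor}-li"


print(li_branch_name("2.0.0"))
print(li_branch_name("2.3.0"))
```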
Thank You
