SlideShare a Scribd company logo
1 of 27
Download to read offline
Organic Growth and a
Good Night Sleep:
Effective Kafka
Operations at Pinterest
Kafka Summit 2020 Vahid Hashemian August 24, 2020Ambud Sharma
1
2
3
4
5
6
7
Agenda Who We Are
What is Pinterest
Kafka at Pinterest
Challenges/Lessons Learned
Automation
The Upgrade Story
The Road Ahead
1 Who we are
Vahid Hashemian
Software Engineer
Logging Platform, Pinterest
Committer, PMC Member
Ambud Sharma
EM & TL
Logging Platform, Pinterest
And other members of Logging Platform team:
Eric Lopez, Heng Zhang, Henry Cai, Jeff Xiang, Ping-Min Lin
2 What is Pinterest
Mission
To bring everyone the
inspiration to create a life
they love
2 What is Pinterest
● 367 million Global Monthly Active Users
● 240 billion Pins saved
● 5B+ boards
● 82% of all Pinners use Pinterest on mobile
● 91% of Pinners say that Pinterest is filled with positivity.
● 89% of Pinners report that they leave the site feeling empowered.
2 What is Pinterest
3 Kafka at Pinterest
Scale of Data Ingestion Pipelines:
● Hosted on AWS EC2
● 50+ Kafka clusters (in production)
● 2,500+ Kafka brokers
● 3,000+ Kafka topics, 150K+ Kafka partitions
● Inbound traffic: ~ 20 GB/s (max)
● Outbound traffic: ~ 50 GB/s (max)
Current Version:
● 2.3.1 with cherry-picked commits
3 Kafka at Pinterest
● Performance Issues
○ Magnetic disks to NVMe SSDs
○ Dynamic rebalancing -> Static rebalancing
○ Message conversions (implicit throttling)
● Cost control
○ Data transfer -> Rack awareness
○ Compressed topics
○ Retentions and Replication factor tuning
● Placement and auto-balancing partitions (brokersets)
○ New topics
○ Traffic pattern changes
4 Challenges / Lessons Learned
5 Automation
Wizards Git Repo Orion
Brokerset 1 Brokerset 2
Brokerset
3
Topic 1 Topic 2
Topic 3
Topic 4 Topic 5
Cluster X
● Each brokerset is a group of 1 or more
brokers
● Brokerset is defined using broker id
ranges
● Brokersets can overlap
● Topic partition counts are a multiple of
brokersets
● Currently there are 2 types of
brokersets (Capacity & Static)
● Topics are assigned based on
capacity requirements & capacity
available
● All topics are managed via config
check-in (infra as code)
● Wizard provides a simple interface
for users to generate topic code
Topic 6
5 Automation (lineage, schema registry and wizard)
5 Automation
We developed Orion as a Unified Management System for Stateful Distributed Systems
1. UI
2. Automation
3. Cluster Management
4. Alerting
Orion manages all Kafka clusters at Pinterest.
5 Automation
We developed Orion as a Unified Management System for Stateful Distributed Systems
1. UI
2. Automation
3. Cluster Management
4. Alerting
Orion manages all Kafka clusters at Pinterest.
5 Automation
We developed Orion as a Unified Management System for Stateful Distributed Systems
1. UI
2. Automation
3. Cluster Management
4. Alerting
Orion manages all Kafka clusters at Pinterest.
5 Automation
We developed Orion as a Unified Management System for Stateful Distributed Systems
1. UI
2. Automation
3. Cluster Management
4. Alerting
Orion manages all Kafka clusters at Pinterest.
● All clusters were upgraded earlier this year to 2.3.1+
● Post-upgrade versions:
○ Broker: 2.3.1+ [2.3.1 + cherry picked commits]
○ Inter broker protocol: 2.3-IV1
○ Log message format: 2.3-IV1
6 The Upgrade Story
● Broker version
○ The version of the Kafka binary the broker runs.
● Inter broker protocol version
○ The version on which brokers communicate with each other.
○ Maximum allowed value is the minimum broker version in the cluster.
○ Once upgraded, cannot be rolled back.
● Log message format version
○ The version of format used by brokers to store messages.
○ Can be granular to per topic.
○ Guarantees that messages on disk are of smaller or equal version to the
version configured for a topic.
○ If set incorrectly, could break old consumers (prior to 0.10.2 that are not
forward compatible).
6 The Upgrade Story
6 The Upgrade Story
Worthy lessons to share
1. Not every release of
Apache Kafka may work
for your specific use
cases. Do your due
diligence before choosing
the version to upgrade to.
6 The Upgrade Story
Worthy lessons to share
2. Check all blocker / critical /
major bugs reported against
your upgrade version of choice.
6 The Upgrade Story
6 The Upgrade Story
Worthy lessons to share
3. Bug fix releases are usually
safer options for upgrade,
but the risk is never 0.
Worthy lessons to share
4. If you have clusters on old log message format versions, make sure their
clients would not be affected by the upgrade.
6 The Upgrade Story
Scala / Java
Worthy lessons to share
4. If you have clusters on old log message format versions, make sure their
clients would not be affected by the upgrade.
6 The Upgrade Story
librdkafka
Worthy lessons to share
4. If you have clusters on old log message format versions, make sure their
clients would not be affected by the upgrade.
6 The Upgrade Story
kafka-python
Worthy lessons to share
4. If you have clusters on old log message format versions, make sure their
clients would not be affected by the upgrade.
6 The Upgrade Story
confluent-kafka-python
● Interoperability
○ Abstract implementation (Kafka internals etc.) from client applications.
● Scaling and efficiency
○ Provide true horizontal and dynamic scaling.
○ Reduce cost per GB moved.
● Reliability
○ Automate on-call support
○ Measure & pinpoint data loss
8 The Road Ahead
Pinterest Engineering Blog (on Medium)
● Using graph algorithms to optimize Kafka operations (Part 1, Part 2)
● Optimizing Kafka for the cloud
● Open sourcing Singer, Pinterest’s performant and reliable logging agent
● How Pinterest runs Kafka at scale
Pinterest is hiring
● https://careers.pinterest.com/
Additional Resources

More Related Content

What's hot

Kafka Summit NYC 2017 - Cloud Native Data Streaming Microservices with Spring...
Kafka Summit NYC 2017 - Cloud Native Data Streaming Microservices with Spring...Kafka Summit NYC 2017 - Cloud Native Data Streaming Microservices with Spring...
Kafka Summit NYC 2017 - Cloud Native Data Streaming Microservices with Spring...confluent
 
How Zillow Unlocked Kafka to 50 Teams in 8 months | Shahar Cizer Kobrinsky, Z...
How Zillow Unlocked Kafka to 50 Teams in 8 months | Shahar Cizer Kobrinsky, Z...How Zillow Unlocked Kafka to 50 Teams in 8 months | Shahar Cizer Kobrinsky, Z...
How Zillow Unlocked Kafka to 50 Teams in 8 months | Shahar Cizer Kobrinsky, Z...HostedbyConfluent
 
From Postgres to Event-Driven: using docker-compose to build CDC pipelines in...
From Postgres to Event-Driven: using docker-compose to build CDC pipelines in...From Postgres to Event-Driven: using docker-compose to build CDC pipelines in...
From Postgres to Event-Driven: using docker-compose to build CDC pipelines in...confluent
 
Everything you ever needed to know about Kafka on Kubernetes but were afraid ...
Everything you ever needed to know about Kafka on Kubernetes but were afraid ...Everything you ever needed to know about Kafka on Kubernetes but were afraid ...
Everything you ever needed to know about Kafka on Kubernetes but were afraid ...HostedbyConfluent
 
Keeping Your Data Close and Your Caches Hotter (Ricardo Ferreira, Confluent) ...
Keeping Your Data Close and Your Caches Hotter (Ricardo Ferreira, Confluent) ...Keeping Your Data Close and Your Caches Hotter (Ricardo Ferreira, Confluent) ...
Keeping Your Data Close and Your Caches Hotter (Ricardo Ferreira, Confluent) ...confluent
 
Building Retry Architectures in Kafka with Compacted Topics | Matthew Zhou, V...
Building Retry Architectures in Kafka with Compacted Topics | Matthew Zhou, V...Building Retry Architectures in Kafka with Compacted Topics | Matthew Zhou, V...
Building Retry Architectures in Kafka with Compacted Topics | Matthew Zhou, V...HostedbyConfluent
 
Look how easy it is to go from events to blazing-fast analytics! | Neha Pawar...
Look how easy it is to go from events to blazing-fast analytics! | Neha Pawar...Look how easy it is to go from events to blazing-fast analytics! | Neha Pawar...
Look how easy it is to go from events to blazing-fast analytics! | Neha Pawar...HostedbyConfluent
 
Kafka for Real-Time Event Processing in Serverless Environments
Kafka for Real-Time Event Processing in Serverless EnvironmentsKafka for Real-Time Event Processing in Serverless Environments
Kafka for Real-Time Event Processing in Serverless Environmentsconfluent
 
Tips & Tricks for Apache Kafka®
Tips & Tricks for Apache Kafka®Tips & Tricks for Apache Kafka®
Tips & Tricks for Apache Kafka®confluent
 
Intro to AsyncAPI
Intro to AsyncAPIIntro to AsyncAPI
Intro to AsyncAPIconfluent
 
Introduction to Apache Kafka and why it matters - Madrid
Introduction to Apache Kafka and why it matters - MadridIntroduction to Apache Kafka and why it matters - Madrid
Introduction to Apache Kafka and why it matters - MadridPaolo Castagna
 
Distributed Enterprise Monitoring and Management of Apache Kafka (William McL...
Distributed Enterprise Monitoring and Management of Apache Kafka (William McL...Distributed Enterprise Monitoring and Management of Apache Kafka (William McL...
Distributed Enterprise Monitoring and Management of Apache Kafka (William McL...HostedbyConfluent
 
3 Ways to Deliver an Elastic, Cost-Effective Cloud Architecture (ANZ)
3 Ways to Deliver an Elastic, Cost-Effective Cloud Architecture (ANZ)3 Ways to Deliver an Elastic, Cost-Effective Cloud Architecture (ANZ)
3 Ways to Deliver an Elastic, Cost-Effective Cloud Architecture (ANZ)confluent
 
Kafka error handling patterns and best practices | Hemant Desale and Aruna Ka...
Kafka error handling patterns and best practices | Hemant Desale and Aruna Ka...Kafka error handling patterns and best practices | Hemant Desale and Aruna Ka...
Kafka error handling patterns and best practices | Hemant Desale and Aruna Ka...HostedbyConfluent
 
Tradeoffs in Distributed Systems Design: Is Kafka The Best? (Ben Stopford and...
Tradeoffs in Distributed Systems Design: Is Kafka The Best? (Ben Stopford and...Tradeoffs in Distributed Systems Design: Is Kafka The Best? (Ben Stopford and...
Tradeoffs in Distributed Systems Design: Is Kafka The Best? (Ben Stopford and...HostedbyConfluent
 
Not Your Mother's Kafka - Deep Dive into Confluent Cloud Infrastructure | Gwe...
Not Your Mother's Kafka - Deep Dive into Confluent Cloud Infrastructure | Gwe...Not Your Mother's Kafka - Deep Dive into Confluent Cloud Infrastructure | Gwe...
Not Your Mother's Kafka - Deep Dive into Confluent Cloud Infrastructure | Gwe...HostedbyConfluent
 
Multi-DC Kafka
Multi-DC KafkaMulti-DC Kafka
Multi-DC Kafkaconfluent
 
Kafka Summit NYC 2017 - Every Message Counts: Kafka as a Foundation for Highl...
Kafka Summit NYC 2017 - Every Message Counts: Kafka as a Foundation for Highl...Kafka Summit NYC 2017 - Every Message Counts: Kafka as a Foundation for Highl...
Kafka Summit NYC 2017 - Every Message Counts: Kafka as a Foundation for Highl...confluent
 
Confluent On Azure: Why you should add Confluent to your Azure toolkit | Alic...
Confluent On Azure: Why you should add Confluent to your Azure toolkit | Alic...Confluent On Azure: Why you should add Confluent to your Azure toolkit | Alic...
Confluent On Azure: Why you should add Confluent to your Azure toolkit | Alic...HostedbyConfluent
 
Creating Connector to Bridge the Worlds of Kafka and gRPC at Wework (Anoop Di...
Creating Connector to Bridge the Worlds of Kafka and gRPC at Wework (Anoop Di...Creating Connector to Bridge the Worlds of Kafka and gRPC at Wework (Anoop Di...
Creating Connector to Bridge the Worlds of Kafka and gRPC at Wework (Anoop Di...confluent
 

What's hot (20)

Kafka Summit NYC 2017 - Cloud Native Data Streaming Microservices with Spring...
Kafka Summit NYC 2017 - Cloud Native Data Streaming Microservices with Spring...Kafka Summit NYC 2017 - Cloud Native Data Streaming Microservices with Spring...
Kafka Summit NYC 2017 - Cloud Native Data Streaming Microservices with Spring...
 
How Zillow Unlocked Kafka to 50 Teams in 8 months | Shahar Cizer Kobrinsky, Z...
How Zillow Unlocked Kafka to 50 Teams in 8 months | Shahar Cizer Kobrinsky, Z...How Zillow Unlocked Kafka to 50 Teams in 8 months | Shahar Cizer Kobrinsky, Z...
How Zillow Unlocked Kafka to 50 Teams in 8 months | Shahar Cizer Kobrinsky, Z...
 
From Postgres to Event-Driven: using docker-compose to build CDC pipelines in...
From Postgres to Event-Driven: using docker-compose to build CDC pipelines in...From Postgres to Event-Driven: using docker-compose to build CDC pipelines in...
From Postgres to Event-Driven: using docker-compose to build CDC pipelines in...
 
Everything you ever needed to know about Kafka on Kubernetes but were afraid ...
Everything you ever needed to know about Kafka on Kubernetes but were afraid ...Everything you ever needed to know about Kafka on Kubernetes but were afraid ...
Everything you ever needed to know about Kafka on Kubernetes but were afraid ...
 
Keeping Your Data Close and Your Caches Hotter (Ricardo Ferreira, Confluent) ...
Keeping Your Data Close and Your Caches Hotter (Ricardo Ferreira, Confluent) ...Keeping Your Data Close and Your Caches Hotter (Ricardo Ferreira, Confluent) ...
Keeping Your Data Close and Your Caches Hotter (Ricardo Ferreira, Confluent) ...
 
Building Retry Architectures in Kafka with Compacted Topics | Matthew Zhou, V...
Building Retry Architectures in Kafka with Compacted Topics | Matthew Zhou, V...Building Retry Architectures in Kafka with Compacted Topics | Matthew Zhou, V...
Building Retry Architectures in Kafka with Compacted Topics | Matthew Zhou, V...
 
Look how easy it is to go from events to blazing-fast analytics! | Neha Pawar...
Look how easy it is to go from events to blazing-fast analytics! | Neha Pawar...Look how easy it is to go from events to blazing-fast analytics! | Neha Pawar...
Look how easy it is to go from events to blazing-fast analytics! | Neha Pawar...
 
Kafka for Real-Time Event Processing in Serverless Environments
Kafka for Real-Time Event Processing in Serverless EnvironmentsKafka for Real-Time Event Processing in Serverless Environments
Kafka for Real-Time Event Processing in Serverless Environments
 
Tips & Tricks for Apache Kafka®
Tips & Tricks for Apache Kafka®Tips & Tricks for Apache Kafka®
Tips & Tricks for Apache Kafka®
 
Intro to AsyncAPI
Intro to AsyncAPIIntro to AsyncAPI
Intro to AsyncAPI
 
Introduction to Apache Kafka and why it matters - Madrid
Introduction to Apache Kafka and why it matters - MadridIntroduction to Apache Kafka and why it matters - Madrid
Introduction to Apache Kafka and why it matters - Madrid
 
Distributed Enterprise Monitoring and Management of Apache Kafka (William McL...
Distributed Enterprise Monitoring and Management of Apache Kafka (William McL...Distributed Enterprise Monitoring and Management of Apache Kafka (William McL...
Distributed Enterprise Monitoring and Management of Apache Kafka (William McL...
 
3 Ways to Deliver an Elastic, Cost-Effective Cloud Architecture (ANZ)
3 Ways to Deliver an Elastic, Cost-Effective Cloud Architecture (ANZ)3 Ways to Deliver an Elastic, Cost-Effective Cloud Architecture (ANZ)
3 Ways to Deliver an Elastic, Cost-Effective Cloud Architecture (ANZ)
 
Kafka error handling patterns and best practices | Hemant Desale and Aruna Ka...
Kafka error handling patterns and best practices | Hemant Desale and Aruna Ka...Kafka error handling patterns and best practices | Hemant Desale and Aruna Ka...
Kafka error handling patterns and best practices | Hemant Desale and Aruna Ka...
 
Tradeoffs in Distributed Systems Design: Is Kafka The Best? (Ben Stopford and...
Tradeoffs in Distributed Systems Design: Is Kafka The Best? (Ben Stopford and...Tradeoffs in Distributed Systems Design: Is Kafka The Best? (Ben Stopford and...
Tradeoffs in Distributed Systems Design: Is Kafka The Best? (Ben Stopford and...
 
Not Your Mother's Kafka - Deep Dive into Confluent Cloud Infrastructure | Gwe...
Not Your Mother's Kafka - Deep Dive into Confluent Cloud Infrastructure | Gwe...Not Your Mother's Kafka - Deep Dive into Confluent Cloud Infrastructure | Gwe...
Not Your Mother's Kafka - Deep Dive into Confluent Cloud Infrastructure | Gwe...
 
Multi-DC Kafka
Multi-DC KafkaMulti-DC Kafka
Multi-DC Kafka
 
Kafka Summit NYC 2017 - Every Message Counts: Kafka as a Foundation for Highl...
Kafka Summit NYC 2017 - Every Message Counts: Kafka as a Foundation for Highl...Kafka Summit NYC 2017 - Every Message Counts: Kafka as a Foundation for Highl...
Kafka Summit NYC 2017 - Every Message Counts: Kafka as a Foundation for Highl...
 
Confluent On Azure: Why you should add Confluent to your Azure toolkit | Alic...
Confluent On Azure: Why you should add Confluent to your Azure toolkit | Alic...Confluent On Azure: Why you should add Confluent to your Azure toolkit | Alic...
Confluent On Azure: Why you should add Confluent to your Azure toolkit | Alic...
 
Creating Connector to Bridge the Worlds of Kafka and gRPC at Wework (Anoop Di...
Creating Connector to Bridge the Worlds of Kafka and gRPC at Wework (Anoop Di...Creating Connector to Bridge the Worlds of Kafka and gRPC at Wework (Anoop Di...
Creating Connector to Bridge the Worlds of Kafka and gRPC at Wework (Anoop Di...
 

Similar to Organic Growth and Good Sleep: Effective Kafka Operations at Pinterest

Strimzi - Where Apache Kafka meets OpenShift - OpenShift Spain MeetUp
Strimzi - Where Apache Kafka meets OpenShift - OpenShift Spain MeetUpStrimzi - Where Apache Kafka meets OpenShift - OpenShift Spain MeetUp
Strimzi - Where Apache Kafka meets OpenShift - OpenShift Spain MeetUpJosé Román Martín Gil
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Guido Schmutz
 
Streaming Processing with a Distributed Commit Log
Streaming Processing with a Distributed Commit LogStreaming Processing with a Distributed Commit Log
Streaming Processing with a Distributed Commit LogJoe Stein
 
Apache Kafka's Common Pitfalls & Intricacies: A Customer Support Perspective
Apache Kafka's Common Pitfalls & Intricacies: A Customer Support PerspectiveApache Kafka's Common Pitfalls & Intricacies: A Customer Support Perspective
Apache Kafka's Common Pitfalls & Intricacies: A Customer Support PerspectiveHostedbyConfluent
 
Mule soft meetup_chandigarh_#7_25_sept_2021
Mule soft meetup_chandigarh_#7_25_sept_2021Mule soft meetup_chandigarh_#7_25_sept_2021
Mule soft meetup_chandigarh_#7_25_sept_2021Lalit Panwar
 
Devfest uk & ireland using apache nifi with apache pulsar for fast data on-r...
Devfest uk & ireland  using apache nifi with apache pulsar for fast data on-r...Devfest uk & ireland  using apache nifi with apache pulsar for fast data on-r...
Devfest uk & ireland using apache nifi with apache pulsar for fast data on-r...Timothy Spann
 
Real-Time Distributed and Reactive Systems with Apache Kafka and Apache Accumulo
Real-Time Distributed and Reactive Systems with Apache Kafka and Apache AccumuloReal-Time Distributed and Reactive Systems with Apache Kafka and Apache Accumulo
Real-Time Distributed and Reactive Systems with Apache Kafka and Apache AccumuloJoe Stein
 
Accumulo Summit 2015: Real-Time Distributed and Reactive Systems with Apache ...
Accumulo Summit 2015: Real-Time Distributed and Reactive Systems with Apache ...Accumulo Summit 2015: Real-Time Distributed and Reactive Systems with Apache ...
Accumulo Summit 2015: Real-Time Distributed and Reactive Systems with Apache ...Accumulo Summit
 
Apache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & Partitioning
Apache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & PartitioningApache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & Partitioning
Apache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & PartitioningGuido Schmutz
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Guido Schmutz
 
Being Ready for Apache Kafka - Apache: Big Data Europe 2015
Being Ready for Apache Kafka - Apache: Big Data Europe 2015Being Ready for Apache Kafka - Apache: Big Data Europe 2015
Being Ready for Apache Kafka - Apache: Big Data Europe 2015Michael Noll
 
Sanger, upcoming Openstack for Bio-informaticians
Sanger, upcoming Openstack for Bio-informaticiansSanger, upcoming Openstack for Bio-informaticians
Sanger, upcoming Openstack for Bio-informaticiansPeter Clapham
 
Create a PHP Library the right way
Create a PHP Library the right wayCreate a PHP Library the right way
Create a PHP Library the right wayChristian Varela
 
lessons from managing a pulsar cluster
 lessons from managing a pulsar cluster lessons from managing a pulsar cluster
lessons from managing a pulsar clusterShivji Kumar Jha
 
Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?Anton Nazaruk
 
Distributed messaging through Kafka
Distributed messaging through KafkaDistributed messaging through Kafka
Distributed messaging through KafkaDileep Kalidindi
 
kafka simplicity and complexity
kafka simplicity and complexitykafka simplicity and complexity
kafka simplicity and complexityPaolo Platter
 
Current and Future of Apache Kafka
Current and Future of Apache KafkaCurrent and Future of Apache Kafka
Current and Future of Apache KafkaJoe Stein
 
Apache Pulsar: Why Unified Messaging and Streaming Is the Future - Pulsar Sum...
Apache Pulsar: Why Unified Messaging and Streaming Is the Future - Pulsar Sum...Apache Pulsar: Why Unified Messaging and Streaming Is the Future - Pulsar Sum...
Apache Pulsar: Why Unified Messaging and Streaming Is the Future - Pulsar Sum...StreamNative
 

Similar to Organic Growth and Good Sleep: Effective Kafka Operations at Pinterest (20)

Strimzi - Where Apache Kafka meets OpenShift - OpenShift Spain MeetUp
Strimzi - Where Apache Kafka meets OpenShift - OpenShift Spain MeetUpStrimzi - Where Apache Kafka meets OpenShift - OpenShift Spain MeetUp
Strimzi - Where Apache Kafka meets OpenShift - OpenShift Spain MeetUp
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !
 
Streaming Processing with a Distributed Commit Log
Streaming Processing with a Distributed Commit LogStreaming Processing with a Distributed Commit Log
Streaming Processing with a Distributed Commit Log
 
Apache Kafka's Common Pitfalls & Intricacies: A Customer Support Perspective
Apache Kafka's Common Pitfalls & Intricacies: A Customer Support PerspectiveApache Kafka's Common Pitfalls & Intricacies: A Customer Support Perspective
Apache Kafka's Common Pitfalls & Intricacies: A Customer Support Perspective
 
Mule soft meetup_chandigarh_#7_25_sept_2021
Mule soft meetup_chandigarh_#7_25_sept_2021Mule soft meetup_chandigarh_#7_25_sept_2021
Mule soft meetup_chandigarh_#7_25_sept_2021
 
Devfest uk & ireland using apache nifi with apache pulsar for fast data on-r...
Devfest uk & ireland  using apache nifi with apache pulsar for fast data on-r...Devfest uk & ireland  using apache nifi with apache pulsar for fast data on-r...
Devfest uk & ireland using apache nifi with apache pulsar for fast data on-r...
 
Real-Time Distributed and Reactive Systems with Apache Kafka and Apache Accumulo
Real-Time Distributed and Reactive Systems with Apache Kafka and Apache AccumuloReal-Time Distributed and Reactive Systems with Apache Kafka and Apache Accumulo
Real-Time Distributed and Reactive Systems with Apache Kafka and Apache Accumulo
 
Accumulo Summit 2015: Real-Time Distributed and Reactive Systems with Apache ...
Accumulo Summit 2015: Real-Time Distributed and Reactive Systems with Apache ...Accumulo Summit 2015: Real-Time Distributed and Reactive Systems with Apache ...
Accumulo Summit 2015: Real-Time Distributed and Reactive Systems with Apache ...
 
Apache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & Partitioning
Apache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & PartitioningApache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & Partitioning
Apache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & Partitioning
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !
 
Being Ready for Apache Kafka - Apache: Big Data Europe 2015
Being Ready for Apache Kafka - Apache: Big Data Europe 2015Being Ready for Apache Kafka - Apache: Big Data Europe 2015
Being Ready for Apache Kafka - Apache: Big Data Europe 2015
 
Sanger, upcoming Openstack for Bio-informaticians
Sanger, upcoming Openstack for Bio-informaticiansSanger, upcoming Openstack for Bio-informaticians
Sanger, upcoming Openstack for Bio-informaticians
 
Flexible compute
Flexible computeFlexible compute
Flexible compute
 
Create a PHP Library the right way
Create a PHP Library the right wayCreate a PHP Library the right way
Create a PHP Library the right way
 
lessons from managing a pulsar cluster
 lessons from managing a pulsar cluster lessons from managing a pulsar cluster
lessons from managing a pulsar cluster
 
Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?
 
Distributed messaging through Kafka
Distributed messaging through KafkaDistributed messaging through Kafka
Distributed messaging through Kafka
 
kafka simplicity and complexity
kafka simplicity and complexitykafka simplicity and complexity
kafka simplicity and complexity
 
Current and Future of Apache Kafka
Current and Future of Apache KafkaCurrent and Future of Apache Kafka
Current and Future of Apache Kafka
 
Apache Pulsar: Why Unified Messaging and Streaming Is the Future - Pulsar Sum...
Apache Pulsar: Why Unified Messaging and Streaming Is the Future - Pulsar Sum...Apache Pulsar: Why Unified Messaging and Streaming Is the Future - Pulsar Sum...
Apache Pulsar: Why Unified Messaging and Streaming Is the Future - Pulsar Sum...
 

More from confluent

Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...confluent
 
Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flinkconfluent
 
Unlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsUnlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsconfluent
 
Workshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con FlinkWorkshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con Flinkconfluent
 
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...confluent
 
AWS Immersion Day Mapfre - Confluent
AWS Immersion Day Mapfre   -   ConfluentAWS Immersion Day Mapfre   -   Confluent
AWS Immersion Day Mapfre - Confluentconfluent
 
Eventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkEventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkconfluent
 
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent CloudQ&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent Cloudconfluent
 
Citi TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep DiveCiti TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep Diveconfluent
 
Build real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with ConfluentBuild real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with Confluentconfluent
 
Q&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service MeshQ&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service Meshconfluent
 
Citi Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka MicroservicesCiti Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka Microservicesconfluent
 
Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3confluent
 
Citi Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging ModernizationCiti Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging Modernizationconfluent
 
Citi Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataCiti Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataconfluent
 
Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2confluent
 
Data In Motion Paris 2023
Data In Motion Paris 2023Data In Motion Paris 2023
Data In Motion Paris 2023confluent
 
Confluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with SynthesisConfluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with Synthesisconfluent
 
The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023confluent
 
The Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data StreamsThe Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data Streamsconfluent
 

More from confluent (20)

Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flink
 
Unlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsUnlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insights
 
Workshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con FlinkWorkshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con Flink
 
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
 
AWS Immersion Day Mapfre - Confluent
AWS Immersion Day Mapfre   -   ConfluentAWS Immersion Day Mapfre   -   Confluent
AWS Immersion Day Mapfre - Confluent
 
Eventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkEventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalk
 
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent CloudQ&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
 
Citi TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep DiveCiti TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep Dive
 
Build real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with ConfluentBuild real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with Confluent
 
Q&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service MeshQ&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service Mesh
 
Citi Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka MicroservicesCiti Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka Microservices
 
Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3
 
Citi Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging ModernizationCiti Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging Modernization
 
Citi Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataCiti Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time data
 
Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2
 
Data In Motion Paris 2023
Data In Motion Paris 2023Data In Motion Paris 2023
Data In Motion Paris 2023
 
Confluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with SynthesisConfluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with Synthesis
 
The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023
 
The Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data StreamsThe Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data Streams
 

Recently uploaded

Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfjimielynbastida
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentationphoebematthew05
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 

Recently uploaded (20)

Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdf
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 

Organic Growth and Good Sleep: Effective Kafka Operations at Pinterest

  • 1. Organic Growth and a Good Night Sleep: Effective Kafka Operations at Pinterest Kafka Summit 2020 Vahid Hashemian August 24, 2020Ambud Sharma
  • 2. 1 2 3 4 5 6 7 Agenda Who We Are What is Pinterest Kafka at Pinterest Challenges/Lessons Learned Automation The Upgrade Story The Road Ahead
  • 3. 1 Who we are Vahid Hashemian Software Engineer Logging Platform, Pinterest Committer, PMC Member Ambud Sharma EM & TL Logging Platform, Pinterest And other members of Logging Platform team: Eric Lopez, Heng Zhang, Henry Cai, Jeff Xiang, Ping-Min Lin
  • 4. 2 What is Pinterest
  • 5. Mission To bring everyone the inspiration to create a life they love 2 What is Pinterest
  • 6. ● 367 million Global Monthly Active Users ● 240 billion Pins saved ● 5B+ boards ● 82% of all Pinners use Pinterest on mobile ● 91% of Pinners say that Pinterest is filled with positivity. ● 89% of Pinners report that they leave the site feeling empowered. 2 What is Pinterest
  • 7. 3 Kafka at Pinterest
  • 8. Scale of Data Ingestion Pipelines: ● Hosted on AWS EC2 ● 50+ Kafka clusters (in production) ● 2,500+ Kafka brokers ● 3,000+ Kafka topics, 150K+ Kafka partitions ● Inbound traffic: ~ 20 GB/s (max) ● Outbound traffic: ~ 50 GB/s (max) Current Version: ● 2.3.1 with cherry-picked commits 3 Kafka at Pinterest
  • 9. ● Performance Issues ○ Magnetic disks to NVMe SSDs ○ Dynamic rebalancing -> Static rebalancing ○ Message conversions (implicit throttling) ● Cost control ○ Data transfer -> Rack awareness ○ Compressed topics ○ Retentions and Replication factor tuning ● Placement and auto-balancing partitions (brokersets) ○ New topics ○ Traffic pattern changes 4 Challenges / Lessons Learned
  • 10. 5 Automation Wizards Git Repo Orion Brokerset 1 Brokerset 2 Brokerset 3 Topic 1 Topic 2 Topic 3 Topic 4 Topic 5 Cluster X ● Each brokerset is a group of 1 or more brokers ● Brokerset is defined using broker id ranges ● Brokersets can overlap ● Topic partition counts are a multiple of brokersets ● Currently there are 2 types of brokersets (Capacity & Static) ● Topics are assigned based on capacity requirements & capacity available ● All topics are managed via config check-in (infra as code) ● Wizard provides a simple interface for users to generate topic code Topic 6
  • 11. 5 Automation (lineage, schema registry and wizard)
  • 12. 5 Automation We developed Orion as a Unified Management System for Stateful Distributed Systems 1. UI 2. Automation 3. Cluster Management 4. Alerting Orion manages all Kafka clusters at Pinterest.
  • 13. 5 Automation We developed Orion as a Unified Management System for Stateful Distributed Systems 1. UI 2. Automation 3. Cluster Management 4. Alerting Orion manages all Kafka clusters at Pinterest.
  • 14. 5 Automation We developed Orion as a Unified Management System for Stateful Distributed Systems 1. UI 2. Automation 3. Cluster Management 4. Alerting Orion manages all Kafka clusters at Pinterest.
  • 15. 5 Automation We developed Orion as a Unified Management System for Stateful Distributed Systems 1. UI 2. Automation 3. Cluster Management 4. Alerting Orion manages all Kafka clusters at Pinterest.
  • 16. ● All clusters were upgraded earlier this year to 2.3.1+ ● Post-upgrade versions: ○ Broker: 2.3.1+ [2.3.1 + cherry picked commits] ○ Inter broker protocol: 2.3-IV1 ○ Log message format: 2.3-IV1 6 The Upgrade Story
  • 17. ● Broker version ○ The version of the Kafka binary the broker runs. ● Inter broker protocol version ○ The version on which brokers communicate with each other. ○ Maximum allowed value is the minimum broker version in the cluster. ○ Once upgraded, cannot be rolled back. ● Log message format version ○ The version of format used by brokers to store messages. ○ Can be granular to per topic. ○ Guarantees that messages on disk are of smaller or equal version to the version configured for a topic. ○ If set incorrectly, could break old consumers (prior to 0.10.2 that are not forward compatible). 6 The Upgrade Story
  • 18. 6 The Upgrade Story
  • 19. Worthy lessons to share 1. Not every release of Apache Kafka may work for your specific use cases. Do your due diligence before choosing the version to upgrade to. 6 The Upgrade Story
  • 20. Worthy lessons to share 2. Check all blocker / critical / major bugs reported against your upgrade version of choice. 6 The Upgrade Story
  • 21. 6 The Upgrade Story Worthy lessons to share 3. Bug fix releases are usually safer options for upgrade, but the risk is never 0.
  • 22. Worthy lessons to share 4. If you have clusters on old log message format versions, make sure their clients would not be affected by the upgrade. 6 The Upgrade Story Scala / Java
  • 23. Worthy lessons to share 4. If you have clusters on old log message format versions, make sure their clients would not be affected by the upgrade. 6 The Upgrade Story librdkafka
  • 24. Worthy lessons to share 4. If you have clusters on old log message format versions, make sure their clients would not be affected by the upgrade. 6 The Upgrade Story kafka-python
  • 25. Worthy lessons to share 4. If you have clusters on old log message format versions, make sure their clients would not be affected by the upgrade. 6 The Upgrade Story confluent-kafka-python
  • 26. ● Interoperability ○ Abstract implementation (Kafka internals etc.) from client applications. ● Scaling and efficiency ○ Provide true horizontal and dynamic scaling. ○ Reduce cost per GB moved. ● Reliability ○ Automate on-call support ○ Measure & pinpoint data loss 8 The Road Ahead
  • 27. Pinterest Engineering Blog (on Medium) ● Using graph algorithms to optimize Kafka operations (Part 1, Part 2) ● Optimizing Kafka for the cloud ● Open sourcing Singer, Pinterest’s performant and reliable logging agent ● How Pinterest runs Kafka at scale Pinterest is hiring ● https://careers.pinterest.com/ Additional Resources