SlideShare a Scribd company logo
Kafka used at scale
to deliver real-time notifications
Sergio Nunes
Infrastructure Engineer @ Zendesk
@sbrnunes
Agenda
Monitoring
Delivering real-time notifications
Some challenges
Kafka for real-time
notifications
Kafka for real-time notifications
Some initial requirements
● Listen to any database event that could translate into a notification
● Process and aggregate those events into transactions
● Translate transactions into notifications
● Push those notifications to mobile devices
○ In real-time
○ In a sane order (for the User)
○ With an accurate badge count (number of unread notifications)
Kafka for real-time notifications
Architecture of the system
Maxwell
Transactions
Stream
Events
Stream
Notifications
Stream
Notifications Service
MySQL
Kafka for real-time notifications
Architecture of the system
Maxwell
Transactions
Stream
Events
Stream
Notifications
Stream
Notifications Service
Partitioned by database === account
Partitioned by account
MySQL
Kafka for real-time notifications
Architecture of the system
Maxwell
Transactions
Stream
Events
Stream
Notifications
Stream
Notifications Service
MySQL
Partitioned by user
Kafka for real-time notifications
Architecture of the system
Maxwell
Transactions
Stream
Events
Stream
Notifications
Stream
Notifications Service
MySQL
API Server
Mark
as “read”
Badge Update
Notification
Kafka for real-time notifications
Conclusions
● Crazy fast!!!
○ Easily streaming at 7 - 10K database events /s
○ We’ve seen it handle up to 20 - 25K /s ( > 1M /min)
● Ordering guarantees provided really matched our needs
● Highly configurable
● Easily scalable (horizontally)
○ We can “easily” add or remove nodes to the Kafka cluster
○ We can easily add and remove consumer instances
Questions ?
Monitoring
Monitoring
How to monitor a service like this ?
● We need to be able to answer a few questions
○ Are we getting messages from Kafka ?
○ How fast are we reading those messages?
○ How much are we lagging ?
● We need to capture metrics!
● We can use those metrics to create alerts!
Monitoring
Application metrics
● Some metrics can be easily captured in the application and then reported to some monitoring
service
○ Examples: events produced /s, events consumed /s, etc.
Monitoring
Kafka metrics
● Other metrics are hard to get
● Consumer Lag: probably the most important metric
○ Size of partition (last offset) - consumer offset (last committed)
○ Brokers know about partition sizes
○ Consumer owns the consumed offsets
● Consumers acknowledge offsets by “committing” them back into a special Kafka
topic
○ What do we do with this?
Monitoring
Burrow
● Burrow for the rescue!!!
○ Monitoring application open sourced by Linkedin
https://github.com/linkedin/Burrow
○ Can be deployed as a sidecar application for Kafka
○ Keeps track of committed offsets, as well as the last
offsets known by the brokers
○ Exposes an REST API to gather all this information
Monitoring
Datadog
● Open source plugin for Burrow
○ https://github.com/packetloop/datadog-agent-burrow
○ Uses Burrow’s API to fetch some metrics (including the
consumer lag)
○ Publishes the metrics to Datadog
Some challenges
Some challenges
The problem of (not understanding) Kafka horizontal scaling
● Partition is considered the unit of parallelism in Kafka
○ So… we created our topics with 200 partitions!!! The more the merrier!
○ How many consumer instances ?
■ Service was running in 2 hosts, 4 cores each, maybe 4 instances in each machine ?
■ This gives us a total of 8 consumer instances for 200 partitions (?!?)
○ Did we need 200 partitions ? No !
https://www.confluent.io/blog/how-to-choose-the-number-of-topicspartitions-in-a-kafka-cluster/
● The problem: we can add partitions to a topic, but we can’t remove them
Some challenges
The problem of (not understanding) Kafka horizontal scaling
Some challenges
Reducing the number of partitions
Producers
Topic V1 Topic V2
Consumers
Process followed to reduce the partitions:
1. Create topic V2
2. Make the consumer read from V1 and V2
3. Make the producer write only to V2
4. Wait until V1 is drained
5. Re-create V1 with the new number of partitions
6. Make the producer write only to V1
7. When V2 is drained, make the consumer read only from V1
Some challenges
● Advantages of partitioning the data per account:
○ Accounts data gets load balanced across the cluster
○ Guarantees the ordering per account
● Disadvantages:
○ Partitions can become heavily unbalanced
■ Some accounts are bigger than others
■ We can’t guarantee that big accounts won’t end up in the same partition
■ Some accounts may be under load
● The problem: this can slow down the assigned consumer a lot!!!
Unbalanced partitions
Some challenges
● In some rare occasions (load spikes), we had the need to increase the throughput
● Recommended solution: to add more consumer instances
● The problem: we only had two hosts, adding more consumer instances had to be done at the
application level
○ Our implementation with Akka Streams was not helping
○ To add a new consumer we had to replicate the entire stream (significant increase in the
number of threads)
○ Was causing us some issues and actually reducing the throughput
● The solution: keep one or two consumer instances per host, within the app, but consider adding
more more nodes
Increasing the number of consumers
Some challenges
Upgrading Kafka clients
● A few surprises upgrading to 0.9
○ Processing big chunks of data can lead to delays on the heartbeat mechanism which may cause
constant rebalances
○ Possible solutions:
■ reducing the maximum size of data consumed
■ Another possible solution: increase the consumer session timeout (prevent it from expiring
before processing the data)
Questions ?
TM and © 2017 Zendesk Inc. All rights reserved.

More Related Content

What's hot

Data Pipelines with Kafka Connect
Data Pipelines with Kafka ConnectData Pipelines with Kafka Connect
Data Pipelines with Kafka ConnectKaufman Ng
 
From Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
From Message to Cluster: A Realworld Introduction to Kafka Capacity PlanningFrom Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
From Message to Cluster: A Realworld Introduction to Kafka Capacity Planningconfluent
 
Stream processing using Kafka
Stream processing using KafkaStream processing using Kafka
Stream processing using KafkaKnoldus Inc.
 
A visual introduction to Apache Kafka
A visual introduction to Apache KafkaA visual introduction to Apache Kafka
A visual introduction to Apache KafkaPaul Brebner
 
ksqlDB: A Stream-Relational Database System
ksqlDB: A Stream-Relational Database SystemksqlDB: A Stream-Relational Database System
ksqlDB: A Stream-Relational Database Systemconfluent
 
Apache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals ExplainedApache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals Explainedconfluent
 
Apache Pulsar First Overview
Apache PulsarFirst OverviewApache PulsarFirst Overview
Apache Pulsar First OverviewRicardo Paiva
 
Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?confluent
 
Apache Kafka Fundamentals for Architects, Admins and Developers
Apache Kafka Fundamentals for Architects, Admins and DevelopersApache Kafka Fundamentals for Architects, Admins and Developers
Apache Kafka Fundamentals for Architects, Admins and Developersconfluent
 
Building Streaming Data Applications Using Apache Kafka
Building Streaming Data Applications Using Apache KafkaBuilding Streaming Data Applications Using Apache Kafka
Building Streaming Data Applications Using Apache KafkaSlim Baltagi
 
How is Kafka so Fast?
How is Kafka so Fast?How is Kafka so Fast?
How is Kafka so Fast?Ricardo Paiva
 
Hello, kafka! (an introduction to apache kafka)
Hello, kafka! (an introduction to apache kafka)Hello, kafka! (an introduction to apache kafka)
Hello, kafka! (an introduction to apache kafka)Timothy Spann
 
A Deep Dive into Kafka Controller
A Deep Dive into Kafka ControllerA Deep Dive into Kafka Controller
A Deep Dive into Kafka Controllerconfluent
 
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Flink Forward
 
Using Modular Topologies in Kafka Streams to scale ksqlDB’s persistent querie...
Using Modular Topologies in Kafka Streams to scale ksqlDB’s persistent querie...Using Modular Topologies in Kafka Streams to scale ksqlDB’s persistent querie...
Using Modular Topologies in Kafka Streams to scale ksqlDB’s persistent querie...HostedbyConfluent
 
Producer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache KafkaProducer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache KafkaJiangjie Qin
 
Introduction to Kafka Streams
Introduction to Kafka StreamsIntroduction to Kafka Streams
Introduction to Kafka StreamsGuozhang Wang
 

What's hot (20)

Data Pipelines with Kafka Connect
Data Pipelines with Kafka ConnectData Pipelines with Kafka Connect
Data Pipelines with Kafka Connect
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
From Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
From Message to Cluster: A Realworld Introduction to Kafka Capacity PlanningFrom Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
From Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
 
Stream processing using Kafka
Stream processing using KafkaStream processing using Kafka
Stream processing using Kafka
 
A visual introduction to Apache Kafka
A visual introduction to Apache KafkaA visual introduction to Apache Kafka
A visual introduction to Apache Kafka
 
ksqlDB: A Stream-Relational Database System
ksqlDB: A Stream-Relational Database SystemksqlDB: A Stream-Relational Database System
ksqlDB: A Stream-Relational Database System
 
Apache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals ExplainedApache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals Explained
 
Apache Pulsar First Overview
Apache PulsarFirst OverviewApache PulsarFirst Overview
Apache Pulsar First Overview
 
Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?
 
Kafka
KafkaKafka
Kafka
 
Apache Kafka Fundamentals for Architects, Admins and Developers
Apache Kafka Fundamentals for Architects, Admins and DevelopersApache Kafka Fundamentals for Architects, Admins and Developers
Apache Kafka Fundamentals for Architects, Admins and Developers
 
Building Streaming Data Applications Using Apache Kafka
Building Streaming Data Applications Using Apache KafkaBuilding Streaming Data Applications Using Apache Kafka
Building Streaming Data Applications Using Apache Kafka
 
How is Kafka so Fast?
How is Kafka so Fast?How is Kafka so Fast?
How is Kafka so Fast?
 
Hello, kafka! (an introduction to apache kafka)
Hello, kafka! (an introduction to apache kafka)Hello, kafka! (an introduction to apache kafka)
Hello, kafka! (an introduction to apache kafka)
 
kafka
kafkakafka
kafka
 
A Deep Dive into Kafka Controller
A Deep Dive into Kafka ControllerA Deep Dive into Kafka Controller
A Deep Dive into Kafka Controller
 
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
 
Using Modular Topologies in Kafka Streams to scale ksqlDB’s persistent querie...
Using Modular Topologies in Kafka Streams to scale ksqlDB’s persistent querie...Using Modular Topologies in Kafka Streams to scale ksqlDB’s persistent querie...
Using Modular Topologies in Kafka Streams to scale ksqlDB’s persistent querie...
 
Producer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache KafkaProducer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache Kafka
 
Introduction to Kafka Streams
Introduction to Kafka StreamsIntroduction to Kafka Streams
Introduction to Kafka Streams
 

Similar to Kafka used at scale to deliver real-time notifications

Netflix Data Pipeline With Kafka
Netflix Data Pipeline With KafkaNetflix Data Pipeline With Kafka
Netflix Data Pipeline With KafkaSteven Wu
 
OSDC 2018 | From Monolith to Microservices by Paul Puschmann_
OSDC 2018 | From Monolith to Microservices by Paul Puschmann_OSDC 2018 | From Monolith to Microservices by Paul Puschmann_
OSDC 2018 | From Monolith to Microservices by Paul Puschmann_NETWAYS
 
Event driven architectures with Kinesis
Event driven architectures with KinesisEvent driven architectures with Kinesis
Event driven architectures with KinesisMark Harrison
 
iFood on Delivering 100 Million Events a Month to Restaurants with Scylla
iFood on Delivering 100 Million Events a Month to Restaurants with ScyllaiFood on Delivering 100 Million Events a Month to Restaurants with Scylla
iFood on Delivering 100 Million Events a Month to Restaurants with ScyllaScyllaDB
 
Domain events & Kafka in Ruby applications
Domain events & Kafka in Ruby applicationsDomain events & Kafka in Ruby applications
Domain events & Kafka in Ruby applicationsSpyros Livathinos
 
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...Athens Big Data
 
Twitter’s Apache Kafka Adoption Journey | Ming Liu, Twitter
Twitter’s Apache Kafka Adoption Journey | Ming Liu, TwitterTwitter’s Apache Kafka Adoption Journey | Ming Liu, Twitter
Twitter’s Apache Kafka Adoption Journey | Ming Liu, TwitterHostedbyConfluent
 
Our Multi-Year Journey to a 10x Faster Confluent Cloud
Our Multi-Year Journey to a 10x Faster Confluent CloudOur Multi-Year Journey to a 10x Faster Confluent Cloud
Our Multi-Year Journey to a 10x Faster Confluent CloudHostedbyConfluent
 
Apache Kafka's Common Pitfalls & Intricacies: A Customer Support Perspective
Apache Kafka's Common Pitfalls & Intricacies: A Customer Support PerspectiveApache Kafka's Common Pitfalls & Intricacies: A Customer Support Perspective
Apache Kafka's Common Pitfalls & Intricacies: A Customer Support PerspectiveHostedbyConfluent
 
Architectual Comparison of Apache Apex and Spark Streaming
Architectual Comparison of Apache Apex and Spark StreamingArchitectual Comparison of Apache Apex and Spark Streaming
Architectual Comparison of Apache Apex and Spark StreamingApache Apex
 
Devops - Microservice and Kubernetes
Devops - Microservice and KubernetesDevops - Microservice and Kubernetes
Devops - Microservice and KubernetesNodeXperts
 
Citi Tech Talk: Monitoring and Performance
Citi Tech Talk: Monitoring and PerformanceCiti Tech Talk: Monitoring and Performance
Citi Tech Talk: Monitoring and Performanceconfluent
 
Docebo: history of a journey from legacy to serverless
Docebo: history of a journey from legacy to serverlessDocebo: history of a journey from legacy to serverless
Docebo: history of a journey from legacy to serverlessAWS User Group Italy
 
How Pulsar Enables Netdata to Offer Unlimited Infrastructure Monitoring for F...
How Pulsar Enables Netdata to Offer Unlimited Infrastructure Monitoring for F...How Pulsar Enables Netdata to Offer Unlimited Infrastructure Monitoring for F...
How Pulsar Enables Netdata to Offer Unlimited Infrastructure Monitoring for F...StreamNative
 
LINE's Private Cloud - Meet Cloud Native World
LINE's Private Cloud - Meet Cloud Native WorldLINE's Private Cloud - Meet Cloud Native World
LINE's Private Cloud - Meet Cloud Native WorldLINE Corporation
 
9th docker meetup 2016.07.13
9th docker meetup 2016.07.139th docker meetup 2016.07.13
9th docker meetup 2016.07.13Amrita Prasad
 
Ultimate Guide to Microservice Architecture on Kubernetes
Ultimate Guide to Microservice Architecture on KubernetesUltimate Guide to Microservice Architecture on Kubernetes
Ultimate Guide to Microservice Architecture on Kuberneteskloia
 
Stream processing with Apache Flink @ OfferUp
Stream processing with Apache Flink @ OfferUpStream processing with Apache Flink @ OfferUp
Stream processing with Apache Flink @ OfferUpBowen Li
 
Introduction to Akka Streams
Introduction to Akka StreamsIntroduction to Akka Streams
Introduction to Akka StreamsKnoldus Inc.
 

Similar to Kafka used at scale to deliver real-time notifications (20)

Netflix Data Pipeline With Kafka
Netflix Data Pipeline With KafkaNetflix Data Pipeline With Kafka
Netflix Data Pipeline With Kafka
 
Netflix Data Pipeline With Kafka
Netflix Data Pipeline With KafkaNetflix Data Pipeline With Kafka
Netflix Data Pipeline With Kafka
 
OSDC 2018 | From Monolith to Microservices by Paul Puschmann_
OSDC 2018 | From Monolith to Microservices by Paul Puschmann_OSDC 2018 | From Monolith to Microservices by Paul Puschmann_
OSDC 2018 | From Monolith to Microservices by Paul Puschmann_
 
Event driven architectures with Kinesis
Event driven architectures with KinesisEvent driven architectures with Kinesis
Event driven architectures with Kinesis
 
iFood on Delivering 100 Million Events a Month to Restaurants with Scylla
iFood on Delivering 100 Million Events a Month to Restaurants with ScyllaiFood on Delivering 100 Million Events a Month to Restaurants with Scylla
iFood on Delivering 100 Million Events a Month to Restaurants with Scylla
 
Domain events & Kafka in Ruby applications
Domain events & Kafka in Ruby applicationsDomain events & Kafka in Ruby applications
Domain events & Kafka in Ruby applications
 
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
 
Twitter’s Apache Kafka Adoption Journey | Ming Liu, Twitter
Twitter’s Apache Kafka Adoption Journey | Ming Liu, TwitterTwitter’s Apache Kafka Adoption Journey | Ming Liu, Twitter
Twitter’s Apache Kafka Adoption Journey | Ming Liu, Twitter
 
Our Multi-Year Journey to a 10x Faster Confluent Cloud
Our Multi-Year Journey to a 10x Faster Confluent CloudOur Multi-Year Journey to a 10x Faster Confluent Cloud
Our Multi-Year Journey to a 10x Faster Confluent Cloud
 
Apache Kafka's Common Pitfalls & Intricacies: A Customer Support Perspective
Apache Kafka's Common Pitfalls & Intricacies: A Customer Support PerspectiveApache Kafka's Common Pitfalls & Intricacies: A Customer Support Perspective
Apache Kafka's Common Pitfalls & Intricacies: A Customer Support Perspective
 
Architectual Comparison of Apache Apex and Spark Streaming
Architectual Comparison of Apache Apex and Spark StreamingArchitectual Comparison of Apache Apex and Spark Streaming
Architectual Comparison of Apache Apex and Spark Streaming
 
Devops - Microservice and Kubernetes
Devops - Microservice and KubernetesDevops - Microservice and Kubernetes
Devops - Microservice and Kubernetes
 
Citi Tech Talk: Monitoring and Performance
Citi Tech Talk: Monitoring and PerformanceCiti Tech Talk: Monitoring and Performance
Citi Tech Talk: Monitoring and Performance
 
Docebo: history of a journey from legacy to serverless
Docebo: history of a journey from legacy to serverlessDocebo: history of a journey from legacy to serverless
Docebo: history of a journey from legacy to serverless
 
How Pulsar Enables Netdata to Offer Unlimited Infrastructure Monitoring for F...
How Pulsar Enables Netdata to Offer Unlimited Infrastructure Monitoring for F...How Pulsar Enables Netdata to Offer Unlimited Infrastructure Monitoring for F...
How Pulsar Enables Netdata to Offer Unlimited Infrastructure Monitoring for F...
 
LINE's Private Cloud - Meet Cloud Native World
LINE's Private Cloud - Meet Cloud Native WorldLINE's Private Cloud - Meet Cloud Native World
LINE's Private Cloud - Meet Cloud Native World
 
9th docker meetup 2016.07.13
9th docker meetup 2016.07.139th docker meetup 2016.07.13
9th docker meetup 2016.07.13
 
Ultimate Guide to Microservice Architecture on Kubernetes
Ultimate Guide to Microservice Architecture on KubernetesUltimate Guide to Microservice Architecture on Kubernetes
Ultimate Guide to Microservice Architecture on Kubernetes
 
Stream processing with Apache Flink @ OfferUp
Stream processing with Apache Flink @ OfferUpStream processing with Apache Flink @ OfferUp
Stream processing with Apache Flink @ OfferUp
 
Introduction to Akka Streams
Introduction to Akka StreamsIntroduction to Akka Streams
Introduction to Akka Streams
 

Recently uploaded

Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backElena Simperl
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Alison B. Lowndes
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsPaul Groth
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Product School
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform EngineeringJemma Hussein Allen
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Jeffrey Haguewood
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...Product School
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
 
IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024IoTAnalytics
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaRTTS
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Product School
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...Product School
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...Sri Ambati
 
НАДІЯ ФЕДЮШКО БАЦ «Професійне зростання QA спеціаліста»
НАДІЯ ФЕДЮШКО БАЦ  «Професійне зростання QA спеціаліста»НАДІЯ ФЕДЮШКО БАЦ  «Професійне зростання QA спеціаліста»
НАДІЯ ФЕДЮШКО БАЦ «Професійне зростання QA спеціаліста»QADay
 

Recently uploaded (20)

Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
НАДІЯ ФЕДЮШКО БАЦ «Професійне зростання QA спеціаліста»
НАДІЯ ФЕДЮШКО БАЦ  «Професійне зростання QA спеціаліста»НАДІЯ ФЕДЮШКО БАЦ  «Професійне зростання QA спеціаліста»
НАДІЯ ФЕДЮШКО БАЦ «Професійне зростання QA спеціаліста»
 

Kafka used at scale to deliver real-time notifications

  • 1. Kafka used at scale to deliver real-time notifications
  • 5. Kafka for real-time notifications Some initial requirements ● Listen to any database event that could translate into a notification ● Process and aggregate those events into transactions ● Translate transactions into notifications ● Push those notifications to mobile devices ○ In real-time ○ In a sane order (for the User) ○ With an accurate badge count (number of unread notifications)
  • 6. Kafka for real-time notifications Architecture of the system Maxwell Transactions Stream Events Stream Notifications Stream Notifications Service MySQL
  • 7. Kafka for real-time notifications Architecture of the system Maxwell Transactions Stream Events Stream Notifications Stream Notifications Service Partitioned by database === account Partitioned by account MySQL
  • 8. Kafka for real-time notifications Architecture of the system Maxwell Transactions Stream Events Stream Notifications Stream Notifications Service MySQL Partitioned by user
  • 9. Kafka for real-time notifications Architecture of the system Maxwell Transactions Stream Events Stream Notifications Stream Notifications Service MySQL API Server Mark as “read” Badge Update Notification
  • 10. Kafka for real-time notifications Conclusions ● Crazy fast!!! ○ Easily streaming at 7 - 10K database events /s ○ We’ve seen it handle up to 20 - 25K /s ( > 1M /min) ● Ordering guarantees provided really matched our needs ● Highly configurable ● Easily scalable (horizontally) ○ We can “easily” add or remove nodes to the Kafka cluster ○ We can easily add and remove consumer instances
  • 13. Monitoring How to monitor a service like this ? ● We need to be able to answer a few questions ○ Are we getting messages from Kafka ? ○ How fast are we reading those messages? ○ How much are we lagging ? ● We need to capture metrics! ● We can use those metrics to create alerts!
  • 14. Monitoring Application metrics ● Some metrics can be easily captured in the application and then reported to some monitoring service ○ Examples: events produced /s, events consumed /s, etc.
  • 15. Monitoring Kafka metrics ● Other metrics are hard to get ● Consumer Lag: probably the most important metric ○ Size of partition (last offset) - consumer offset (last committed) ○ Brokers know about partition sizes ○ Consumer owns the consumed offsets ● Consumers acknowledge offsets by “committing” them back into a special Kafka topic ○ What do we do with this?
  • 16. Monitoring Burrow ● Burrow for the rescue!!! ○ Monitoring application open sourced by Linkedin https://github.com/linkedin/Burrow ○ Can be deployed as a sidecar application for Kafka ○ Keeps track of committed offsets, as well as the last offsets known by the brokers ○ Exposes an REST API to gather all this information
  • 17. Monitoring Datadog ● Open source plugin for Burrow ○ https://github.com/packetloop/datadog-agent-burrow ○ Uses Burrow’s API to fetch some metrics (including the consumer lag) ○ Publishes the metrics to Datadog
  • 19. Some challenges The problem of (not understanding) Kafka horizontal scaling ● Partition is considered the unit of parallelism in Kafka ○ So… we created our topics with 200 partitions!!! The more the merrier! ○ How many consumer instances ? ■ Service was running in 2 hosts, 4 cores each, maybe 4 instances in each machine ? ■ This gives us a total of 8 consumer instances for 200 partitions (?!?) ○ Did we need 200 partitions ? No ! https://www.confluent.io/blog/how-to-choose-the-number-of-topicspartitions-in-a-kafka-cluster/ ● The problem: we can add partitions to a topic, but we can’t remove them
  • 20. Some challenges The problem of (not understanding) Kafka horizontal scaling
  • 21. Some challenges Reducing the number of partitions Producers Topic V1 Topic V2 Consumers Process followed to reduce the partitions: 1. Create topic V2 2. Make the consumer read from V1 and V2 3. Make the producer write only to V2 4. Wait until V1 is drained 5. Re-create V1 with the new number of partitions 6. Make the producer write only to V1 7. When V2 is drained, make the consumer read only from V1
  • 22. Some challenges ● Advantages of partitioning the data per account: ○ Accounts data gets load balanced across the cluster ○ Guarantees the ordering per account ● Disadvantages: ○ Partitions can become heavily unbalanced ■ Some accounts are bigger than others ■ We can’t guarantee that big accounts won’t end up in the same partition ■ Some accounts may be under load ● The problem: this can slow down the assigned consumer a lot!!! Unbalanced partitions
  • 23. Some challenges ● In some rare occasions (load spikes), we had the need to increase the throughput ● Recommended solution: to add more consumer instances ● The problem: we only had two hosts, adding more consumer instances had to be done at the application level ○ Our implementation with Akka Streams was not helping ○ To add a new consumer we had to replicate the entire stream (significant increase in the number of threads) ○ Was causing us some issues and actually reducing the throughput ● The solution: keep one or two consumer instances per host, within the app, but consider adding more more nodes Increasing the number of consumers
  • 24. Some challenges Upgrading Kafka clients ● A few surprises upgrading to 0.9 ○ Processing big chunks of data can lead to delays on the heartbeat mechanism which may cause constant rebalances ○ Possible solutions: ■ reducing the maximum size of data consumed ■ Another possible solution: increase the consumer session timeout (prevent it from expiring before processing the data)
  • 26. TM and © 2017 Zendesk Inc. All rights reserved.