Brooklin Mirror Maker
How and Why we moved away from Kafka Mirror Maker
Shun-ping Chiu
Software engineer @ LinkedIn Data Pipelines
Agenda
Kafka Mirroring Use Cases
Limitations for Kafka Mirror Maker
Brooklin Mirror Maker
Future Work
Mirroring Use Cases
● Aggregating data from all data centers
● Moving data between LinkedIn and external
cloud services
Tremendous Kafka Data
● Kafka data at LinkedIn continues to grow rapidly
● We are at 5T messages and 1.4 PB every day
Big Scale to Operate
40+ Kafka src clusters in different DCs
100+ pipelines
2T messages/day
KMM Limitations
Kafka Mirror Maker (KMM) Topology
[Diagram: Datacenter A and Datacenter B, each with a tracking cluster and an aggregate tracking cluster; two KMM clusters are shown per data center]
● Each KMM pipeline
○ mirrors data from 1 source cluster to 1
destination cluster
○ constitutes its own KMM cluster
[Diagram: Datacenters A, B, C, ..., each with tracking, metrics, aggregate tracking, and aggregate metrics clusters, connected by many separate KMM clusters]
KMM Setup
● # of KMM clusters = # of data centers x # of Kafka src clusters
● Need to operate 100+ KMM clusters
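The cluster-count formula above can be made concrete with a small sketch. The data center and source cluster names are made up for illustration, and the sketch assumes every source cluster is mirrored into every data center, which is what the formula implies.

```python
from itertools import product

# Hypothetical names, chosen only for illustration.
data_centers = ["DC-A", "DC-B", "DC-C"]
source_clusters = ["DC-A/tracking", "DC-A/metrics",
                   "DC-B/tracking", "DC-B/metrics"]


def kmm_pipelines(dcs, sources):
    """Return one (destination data center, source cluster) pair per KMM cluster."""
    return list(product(dcs, sources))


pipelines = kmm_pipelines(data_centers, source_clusters)
print(len(pipelines))  # number of KMM clusters to deploy and operate
```

Every additional data center or source cluster multiplies the number of clusters, which is how a deployment with 40+ source clusters ends up with 100+ KMM clusters.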
Example - Add a Topic in KMM
● Static configuration file per KMM cluster requires every change to be deployed
● Let’s say we have a pipeline (a KMM cluster) with 100+ hosts
● And 100+ pipelines?
KMM Pain Points
● Hard to operate
○ hard to add new topic
○ difficult to split the pipeline
● One bad partition brings down the pipeline
○ deleted topic
○ ACL issue
● Performance issues
○ Unable to catch up with traffic
○ Increased lag
:(
Your Kafka Mirror Maker ran into a problem and needs to restart. We’re just collecting some error info, and then we’ll restart for you. (0% complete)
Brooklin Mirror Maker
Brooklin - Stream Ingestion Service
[Diagram: Brooklin sits between sources and destinations; both sides include data stores and messaging systems, such as Microsoft EventHubs]
BMM is built on Brooklin
[Diagram: Brooklin sits between sources and destinations; both sides include data stores and messaging systems, such as Microsoft EventHubs]
Brooklin Mirror Maker
● Built on top of our stream ingestion service, Brooklin
○ Better operability
○ Fault isolation
○ Performance optimizations
● BMM has fully replaced KMM at LinkedIn today
Better Operability
KMM vs BMM
[Diagram: with KMM, Datacenter A and Datacenter B each run two KMM clusters between their tracking and aggregate tracking clusters; with BMM, each data center runs a single BMM cluster]
● BMM is one cluster per data center
BMM Topology
[Diagram: Datacenters A, B, C, ..., each with tracking, metrics, aggregate tracking, and aggregate metrics clusters, served by one BMM cluster per data center]
100+ KMM clusters
~10 BMM clusters
Dynamic Management API
[Diagram components: Brooklin Engine, Kafka src connector, Kafka dest connector, Management REST API, Diagnostics REST API, ZooKeeper, management/monitoring portal, SRE/op dashboards]
RESTful API - Creating a Pipeline
[Diagram components: Brooklin Engine, Management REST API, ZooKeeper]
create: POST /datastream
  name: mm_DC1-tracking_DC2-aggregate-tracking
  connectorName: KafkaMirrorMaker
  source:
    connectionString: kafkassl://DC1-tracking-vip:12345/topicA|topicB
  destination:
    connectionString: kafkassl://DC2-aggregate-tracking-vip:12345
  metadata:
    taskNums: 5
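As a rough illustration of the create call, the sketch below builds (but does not send) the HTTP request with Python's standard library. The field names come from the slide; the host name, the JSON encoding, and the helper name `build_create_request` are assumptions, not Brooklin's documented API.

```python
import json
import urllib.request

# Hypothetical management endpoint; the slides do not give a host or port.
BROOKLIN_URL = "http://brooklin.example.com"


def build_create_request(name, source, destination, num_tasks):
    """Build a POST /datastream request without sending it.

    Field names mirror the slide; encoding the body as JSON is an assumption.
    """
    payload = {
        "name": name,
        "connectorName": "KafkaMirrorMaker",
        "source": {"connectionString": source},
        "destination": {"connectionString": destination},
        "metadata": {"taskNums": num_tasks},
    }
    return urllib.request.Request(
        url=BROOKLIN_URL + "/datastream",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_create_request(
    name="mm_DC1-tracking_DC2-aggregate-tracking",
    source="kafkassl://DC1-tracking-vip:12345/topicA|topicB",
    destination="kafkassl://DC2-aggregate-tracking-vip:12345",
    num_tasks=5,
)
print(request.get_method(), request.full_url)
```

The point of the example is that a pipeline is plain data sent to a running service, so creating one requires no configuration-file change and no deployment.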
RESTful API - Updating a Pipeline
[Diagram components: Brooklin Engine, Management REST API, ZooKeeper]
update: PUT /datastream/mm_DC1-tracking_DC2-aggregate-tracking
  name: mm_DC1-tracking_DC2-aggregate-tracking
  connectorName: KafkaMirrorMaker
  source:
    connectionString: kafkassl://DC1-tracking-vip:12345/topicA|topicB|topicC|topicD
  destination:
    connectionString: kafkassl://DC2-aggregate-tracking-vip:12345
  metadata:
    taskNums: 10
^topic*.
Pause a Pipeline
● Manually pause and resume mirroring for each pipeline
● BMM can automatically pause mirroring for bad partitions for fault isolation
○ Flow of messages from healthy partitions continues
○ Auto-resumes the partitions after a configurable duration
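The auto-pause behavior described above can be sketched as a small bookkeeping class. This is an illustration only: the class name `AutoPauser`, its methods, and the injected clock are hypothetical and are not taken from Brooklin's code.

```python
import time


class AutoPauser:
    """Pauses bad partitions and resumes them after a configurable duration."""

    def __init__(self, resume_after_seconds, clock=time.monotonic):
        self._resume_after = resume_after_seconds
        self._clock = clock          # injectable so the behavior can be tested
        self._paused = {}            # partition -> (reason, time to resume)

    def pause(self, partition, reason):
        """Record a partition as auto-paused, e.g. after a send error."""
        self._paused[partition] = (reason, self._clock() + self._resume_after)

    def is_paused(self, partition):
        """Report whether a partition is paused, resuming it once its time is up."""
        entry = self._paused.get(partition)
        if entry is None:
            return False
        _, resume_at = entry
        if self._clock() >= resume_at:
            del self._paused[partition]   # auto-resume
            return False
        return True

    def mirrorable(self, partitions):
        """Healthy partitions keep flowing while paused ones are skipped."""
        return [p for p in partitions if not self.is_paused(p)]
```

A task would call `pause("topicA-3", "SEND_ERROR")` when a partition fails and use `mirrorable(...)` to decide what to keep consuming, so one bad partition no longer stops the whole pipeline.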
Diagnostic API
[Diagram components: Brooklin Engine, Kafka src connector, Kafka dest connector, Management REST API, Diagnostics REST API, ZooKeeper, management/monitoring portal, SRE/op dashboards]
RESTful API - On-demand Diagnostics
[Diagram components: Brooklin Engine, Diagnostics REST API, ZooKeeper]
getAllStatus: GET /diag?datastream=mm_DC1-tracking_DC2-aggregate-tracking
  host1.prod.linkedin.com:
    datastream: mm_DC1-tracking_DC2-aggregate-tracking
    assignedTopicPartitions: [topicA-0, topicA-3, topicB-0, topicB-2]
    autoPausedPartitions: [{topicA-3: {reason: SEND_ERROR, description: failed to produce messages from this partition}}]
    manuallyPausedPartitions: []
  host2.prod.linkedin.com:
    datastream: mm_DC1-tracking_DC2-aggregate-tracking
    assignedTopicPartitions: [topicA-1, topicA-2, topicB-1, topicB-3]
    autoPausedPartitions: []
    manuallyPausedPartitions: []
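To show how such a diagnostics response might be consumed, the sketch below lists every auto-paused partition with its host and reason. It assumes the response has already been parsed into a Python dict shaped like the slide's output; the real wire format may differ, and `auto_paused` is a hypothetical helper.

```python
def auto_paused(status):
    """Return (host, partition, reason) for every auto-paused partition."""
    found = []
    for host, info in status.items():
        for entry in info.get("autoPausedPartitions", []):
            for partition, detail in entry.items():
                found.append((host, partition, detail["reason"]))
    return found


# Sample data transcribed from the slide.
status = {
    "host1.prod.linkedin.com": {
        "datastream": "mm_DC1-tracking_DC2-aggregate-tracking",
        "assignedTopicPartitions": ["topicA-0", "topicA-3", "topicB-0", "topicB-2"],
        "autoPausedPartitions": [
            {"topicA-3": {"reason": "SEND_ERROR",
                          "description": "failed to produce messages from this partition"}}
        ],
        "manuallyPausedPartitions": [],
    },
    "host2.prod.linkedin.com": {
        "datastream": "mm_DC1-tracking_DC2-aggregate-tracking",
        "assignedTopicPartitions": ["topicA-1", "topicA-2", "topicB-1", "topicB-3"],
        "autoPausedPartitions": [],
        "manuallyPausedPartitions": [],
    },
}

for host, partition, reason in auto_paused(status):
    print(f"{host}: {partition} paused ({reason})")
```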
Performance Improvements
Brooklin Mirroring Pseudocode
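while (!shutdown) {
    records = consumer.poll();    // fetch a batch from the source cluster
    producer.send(records);       // asynchronous produce to the destination cluster
    if (timeToCommit) {
        producer.flush();         // block until all in-flight sends complete
        consumer.commit();        // then commit the consumed source offsets
    }
}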
Producer flush can be expensive
Flushless Produce
Only commit “safe” acknowledged checkpoints:
Flushless: consumer.poll() → producer.send(records) → consumer.commit(offsets)
With flush: consumer.poll() → producer.send(records) → producer.flush() → consumer.commit()
Flushless Produce
[Diagram: the consumer reads offsets o1 and o2 from source partition sp0 and passes them to the producer, which sends them to destination partitions dp0 and dp1; the checkpoint manager tracks o1 and o2 and receives ack(sp0, o2)]
● Checkpoint manager maintains producer-acknowledged offsets for each source partition
Source partition sp0
  in-flight: [o1]
  acked: [o2]
  safe checkpoint: --
Flushless Produce
[Diagram: the consumer reads offsets o3 and o4 from source partition sp0 and passes them to the producer, which sends them to destination partitions dp0 and dp1; the checkpoint manager tracks o3 and o4 and receives ack(sp0, o1)]
● Update safe checkpoint to largest acknowledged offset that is less than oldest in-flight (if any)
Source partition sp0
  in-flight: [o3, o4]
  acked: [o1, o2]
  safe checkpoint: o2
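The safe-checkpoint rule can be sketched in a few lines. This is a minimal illustration of the rule as stated on the slides, not Brooklin's implementation; the class and method names are hypothetical, and offsets are modeled as plain integers (o1 = 1, o2 = 2, and so on).

```python
from collections import defaultdict


class CheckpointManager:
    """Tracks in-flight and acknowledged offsets per source partition."""

    def __init__(self):
        self._in_flight = defaultdict(set)
        self._acked = defaultdict(set)
        self._safe = {}

    def on_send(self, partition, offset):
        """Record an offset handed to the producer but not yet acknowledged."""
        self._in_flight[partition].add(offset)

    def on_ack(self, partition, offset):
        """Record a producer acknowledgement and recompute the safe checkpoint."""
        self._in_flight[partition].discard(offset)
        self._acked[partition].add(offset)
        in_flight = self._in_flight[partition]
        acked = self._acked[partition]
        if in_flight:
            oldest = min(in_flight)
            candidates = [o for o in acked if o < oldest]
        else:
            candidates = list(acked)
        if candidates:
            # Largest acknowledged offset below the oldest in-flight offset.
            self._safe[partition] = max(candidates)

    def safe_checkpoint(self, partition):
        """Return the offset that is safe to commit, or None if there is none yet."""
        return self._safe.get(partition)


manager = CheckpointManager()
manager.on_send("sp0", 1)
manager.on_send("sp0", 2)
manager.on_ack("sp0", 2)            # o1 is still in flight
print(manager.safe_checkpoint("sp0"))
manager.on_send("sp0", 3)
manager.on_send("sp0", 4)
manager.on_ack("sp0", 1)            # o1 and o2 are now both acknowledged
print(manager.safe_checkpoint("sp0"))
```

Because the checkpoint never passes an unacknowledged offset, the consumer can commit at any time without calling `producer.flush()` first.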
Manage Performance through Task
● Datastream task
○ Consists of a dedicated Kafka consumer and uses a shared producer pool to produce the data
○ Performance is controlled by the # of tasks
○ Tasks are assigned to each host within the BMM cluster
● BMM uses sticky assignment to speed up task allocation
Sticky Task Assignment
[Diagram: four BMM hosts coordinated through ZooKeeper, one of them the leader, run Tasks 1-6; after a fifth BMM host is added, Task 6 moves to it while Tasks 1-5 stay on their original hosts]
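One way to picture sticky assignment is the sketch below: tasks stay on their previous host whenever that host is still alive, and only as many tasks move as are needed to even out the load. The function `sticky_assign` and its balancing rule are assumptions made for illustration; the slides do not describe the actual algorithm.

```python
def sticky_assign(previous, hosts, tasks):
    """Assign tasks to hosts while moving as few tasks as possible.

    previous: dict mapping host -> list of tasks from the last assignment.
    hosts:    non-empty list of hosts that are currently alive.
    tasks:    list of all tasks that must be running.
    """
    # Keep each task on its previous host if that host is still alive.
    assignment = {h: [t for t in previous.get(h, []) if t in tasks] for h in hosts}
    placed = {t for assigned in assignment.values() for t in assigned}

    # Place new or orphaned tasks on the least-loaded host.
    for task in tasks:
        if task not in placed:
            target = min(hosts, key=lambda h: len(assignment[h]))
            assignment[target].append(task)

    # Move single tasks from the busiest to the idlest host until balanced.
    while True:
        busiest = max(hosts, key=lambda h: len(assignment[h]))
        idlest = min(hosts, key=lambda h: len(assignment[h]))
        if len(assignment[busiest]) - len(assignment[idlest]) <= 1:
            return assignment
        assignment[idlest].append(assignment[busiest].pop())


previous = {"host1": ["task1", "task6"], "host2": ["task2", "task5"],
            "host3": ["task3"], "host4": ["task4"]}
tasks = ["task1", "task2", "task3", "task4", "task5", "task6"]
hosts = ["host1", "host2", "host3", "host4", "host5"]   # host5 just joined
print(sticky_assign(previous, hosts, tasks))
```

With a fifth host added, a single task moves to it and the other five keep their consumers running, which is what makes reallocation fast compared with reshuffling everything.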
BMM Performance Numbers
● Testing environment
○ Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz, 12 cores, 64GB RAM
● Performance Metrics with 20 datastream tasks:
○ Throughput: compressed bytes up to 28 MB/s
○ Memory utilization: 70%
○ CPU utilization: ~100%
Passthrough Compression
● BMM is CPU bound: 70%+ of CPU time is spent in decompression & re-compression
○ GZIPInputStream.read(): ~10%
○ GZIPOutputStream.write(): ~61%
● “Passthrough” mirroring - skip the decompression & recompression
○ Throughput ~ 100MB/s
○ CPU utilization drops to 50%
Future Work
● Better workload distribution - workload-based assignment
● Auto-scaling - adjust number of tasks based on throughput
Performance & Stability
Open Source
Expected at EOM, April 2019
Questions
Thank you

Brooklin Mirror Maker - How and why we moved away from Kafka Mirror Maker

  • 1. Brooklin Mirror Maker How and Why we moved away from Kafka Mirror Maker Shun-ping Chiu Software engineer @ LinkedIn Data Pipelines
  • 2. Agenda Kafka Mirroring Use Cases Limitations for Kafka Mirror Maker Brooklin Mirror Maker Future Work
  • 4. ● Aggregating data from all data centers ● Moving data between LinkedIn and external cloud services Mirroring Use Cases
  • 5. Tremendous Kafka Data ● Kafka data at LinkedIn continues to grow rapidly ● We are at 5T messages and 1.4 PB every day
  • 6. Big Scale to Operate ● 40+ Kafka src clusters in different DCs ● 100+ pipelines ● 2T messages/day
  • 8. Kafka Mirror Maker (KMM) Topology [Diagram: Datacenters A and B each contain a tracking cluster and an aggregate tracking cluster, linked by KMM clusters] ● Each KMM pipeline ○ mirrors data from 1 source cluster to 1 destination cluster ○ constitutes its own KMM cluster
  • 9. KMM Setup [Diagram: Datacenters A, B, C, ..., each with tracking, aggregate tracking, metrics, and aggregate metrics clusters, linked by many KMM clusters] ● # of KMM clusters = # of data centers x # of Kafka src clusters ● Need to operate 100+ KMM clusters
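
The cluster-count formula on this slide can be worked through with a small example. The layout below (three data centers, each hosting a tracking and a metrics source cluster) is an assumption chosen to resemble the diagram, not a figure taken from the talk; the function name is hypothetical.

```python
def kmm_cluster_count(num_data_centers, num_source_clusters):
    """Slide formula: # of KMM clusters = # of data centers x # of Kafka src clusters."""
    return num_data_centers * num_source_clusters

# Assumed layout for illustration only: 3 data centers, 2 source clusters in each.
data_centers = ["A", "B", "C"]
source_kinds = ["tracking", "metrics"]

# Every data center hosts one source cluster of each kind.
num_source_clusters = len(data_centers) * len(source_kinds)

kmm_clusters = kmm_cluster_count(len(data_centers), num_source_clusters)
print(kmm_clusters)
```

Even this small layout needs 18 separately configured and deployed KMM clusters, which suggests how the real deployment reached 100+.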
  • 10. Example - Add a Topic in KMM ● Static configuration file per KMM cluster requires every change to be deployed ● Let’s say we have a pipeline (a KMM cluster) with 100+ hosts ● And 100+ pipelines?
  • 11. KMM Pain Points ● Hard to operate ○ hard to add new topic ○ difficult to split the pipeline ● One bad partition brings down the pipeline ○ deleted topic ○ ACL issue ● Performance issues ○ Unable to catch up with traffic ○ Increased lag
  • 12. :( Your Kafka Mirror Maker ran into a problem and needs to restart. We’re just collecting some error info, and then we’ll restart for you. (0% completed)
  • 14. Brooklin - Stream Ingestion Service [Diagram: Brooklin sits between sources (data stores, messaging systems, Microsoft EventHubs) and destinations (data stores, messaging systems, Microsoft EventHubs)]
  • 15. BMM is built on Brooklin [Diagram: same sources and destinations as the previous slide]
  • 16. Brooklin Mirror Maker ● Built on top of our stream ingestion service, Brooklin ○ Better operability ○ Fault isolation ○ Performance optimizations ● BMM has fully replaced KMM at LinkedIn today
  • 18. KMM vs BMM [Diagram: Datacenters A and B with tracking and aggregate tracking clusters, shown once with several KMM clusters per data center and once with a single BMM cluster per data center] ● BMM is one cluster per data center
  • 19. BMM Topology [Diagram: Datacenters A, B, C, ..., each with tracking, aggregate tracking, metrics, and aggregate metrics clusters served by one BMM cluster] ● 100+ KMM clusters ● ~10 BMM clusters
  • 20. Dynamic Management API [Diagram components: Brooklin Engine, Kafka src connector, Kafka dest connector, Management REST API, Diagnostics REST API, ZooKeeper, management/monitoring portal, SRE/op dashboards]
  • 21. RESTful API - Creating a Pipeline [Diagram: create request goes through the Management REST API to the Brooklin Engine and ZooKeeper] POST /datastream
      name: mm_DC1-tracking_DC2-aggregate-tracking
      connectorName: KafkaMirrorMaker
      source:
        connectionString: kafkassl://DC1-tracking-vip:12345/topicA|topicB
      destination:
        connectionString: kafkassl://DC2-aggregate-tracking-vip:12345
      metadata:
        taskNums: 5
  • 22. RESTful API - Updating a Pipeline [Diagram: update request goes through the Management REST API to the Brooklin Engine and ZooKeeper] PUT /datastream/mm_DC1-tracking_DC2-aggregate-tracking
      name: mm_DC1-tracking_DC2-aggregate-tracking
      connectorName: KafkaMirrorMaker
      source:
        connectionString: kafkassl://DC1-tracking-vip:12345/topicA|topicB|topicC|topicD
      destination:
        connectionString: kafkassl://DC2-aggregate-tracking-vip:12345
      metadata:
        taskNums: 10
      ^topic*.
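
The create and update calls on the two slides above differ only in the topic list and the task count, so both bodies can be produced by one helper. This is a minimal sketch: the field names and values come from the slides, but the JSON encoding and the `build_datastream` helper are assumptions, and nothing is sent over the network.

```python
import json

def build_datastream(name, source_vip, topics, destination_vip, task_nums):
    """Hypothetical helper: assemble a datastream body using the slide's field names."""
    topic_list = "|".join(topics)  # the slides separate topics with '|'
    return {
        "name": name,
        "connectorName": "KafkaMirrorMaker",
        "source": {"connectionString": f"kafkassl://{source_vip}/{topic_list}"},
        "destination": {"connectionString": f"kafkassl://{destination_vip}"},
        "metadata": {"taskNums": task_nums},
    }

NAME = "mm_DC1-tracking_DC2-aggregate-tracking"
SRC = "DC1-tracking-vip:12345"
DST = "DC2-aggregate-tracking-vip:12345"

created = build_datastream(NAME, SRC, ["topicA", "topicB"], DST, 5)
updated = build_datastream(NAME, SRC, ["topicA", "topicB", "topicC", "topicD"], DST, 10)

# (method, path, body) tuples; JSON as the wire format is an assumption.
create_request = ("POST", "/datastream", json.dumps(created))
update_request = ("PUT", f"/datastream/{NAME}", json.dumps(updated))
print(update_request[1])
```

Adding a topic is then one PUT against the running cluster, in contrast to the KMM workflow of editing a static configuration file and redeploying.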
  • 23. Pause a Pipeline ● Manually pause and resume mirroring for each pipeline ● BMM can automatically pause mirroring for bad partitions for fault isolation ○ Flow of messages from healthy partitions continues ○ Auto-resumes the partitions after a configurable duration
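
The auto-pause behaviour described on this slide can be sketched as a small bookkeeping class. This is an illustration under assumptions, not Brooklin's implementation: the class and method names are hypothetical, and time is passed in explicitly so the example stays deterministic.

```python
class PartitionPauser:
    """Hypothetical tracker for auto-paused partitions with timed auto-resume."""

    def __init__(self, resume_after_seconds):
        self.resume_after_seconds = resume_after_seconds
        self._paused = {}  # partition -> (resume_time, reason)

    def auto_pause(self, partition, reason, now):
        # Pause only the failing partition; the others keep flowing.
        self._paused[partition] = (now + self.resume_after_seconds, reason)

    def active_partitions(self, assigned, now):
        # Auto-resume partitions whose pause duration has elapsed.
        expired = [p for p, (resume_at, _) in self._paused.items() if resume_at <= now]
        for partition in expired:
            del self._paused[partition]
        return [p for p in assigned if p not in self._paused]

assigned = ["topicA-0", "topicA-3", "topicB-0", "topicB-2"]
pauser = PartitionPauser(resume_after_seconds=600)
pauser.auto_pause("topicA-3", "SEND_ERROR", now=0)

during_pause = pauser.active_partitions(assigned, now=10)
after_resume = pauser.active_partitions(assigned, now=600)
print(during_pause, after_resume)
```

While `topicA-3` is paused, the three healthy partitions continue to be mirrored, which is the fault isolation that a single bad partition denied KMM.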
  • 24. Diagnostic API [Diagram components: Brooklin Engine, Kafka src connector, Kafka dest connector, Management REST API, Diagnostics REST API, ZooKeeper, management/monitoring portal, SRE/op dashboards]
  • 25. RESTful API - On-demand Diagnostics [Diagram: getAllStatus request goes through the Diagnostics REST API to the Brooklin Engine and ZooKeeper] GET /diag?datastream=mm_DC1-tracking_DC2-aggregate-tracking
      host1.prod.linkedin.com:
        datastream: mm_DC1-tracking_DC2-aggregate-tracking
        assignedTopicPartitions: [topicA-0, topicA-3, topicB-0, topicB-2]
        autoPausedPartitions: [{topicA-3: {reason: SEND_ERROR, description: failed to produce messages from this partition}}]
        manuallyPausedPartitions: []
      host2.prod.linkedin.com:
        datastream: mm_DC1-tracking_DC2-aggregate-tracking
        assignedTopicPartitions: [topicA-1, topicA-2, topicB-1, topicB-3]
        autoPausedPartitions: []
        manuallyPausedPartitions: []
  • 27. Brooklin Mirroring Pseudocode
      while (!shutdown) {
        records = consumer.poll();
        producer.send(records);
        if (timeToCommit) {
          producer.flush();
          consumer.commit();
        }
      }
    ● Producer flush can be expensive
  • 28. Flushless Produce ● Only commit “safe” acknowledged checkpoints ○ Without flush: consumer.poll() → producer.send(records) → consumer.commit(offsets) ○ With flush: consumer.poll() → producer.send(records) → producer.flush() → consumer.commit()
  • 29. Flushless Produce ● Checkpoint manager maintains producer-acknowledged offsets for each source partition [Diagram: the consumer reads offsets o1, o2 from source partition sp0; the producer sends them to destination partitions dp0 and dp1; ack(sp0, o2) reaches the checkpoint manager] Source partition sp0: in-flight: [o1], acked: [o2], safe checkpoint: --
  • 30. Flushless Produce ● Update safe checkpoint to largest acknowledged offset that is less than oldest in-flight (if any) [Diagram: the consumer reads offsets o3, o4 from sp0; the producer sends them to dp0 and dp1; ack(sp0, o1) reaches the checkpoint manager] Source partition sp0: in-flight: [o3, o4], acked: [o1, o2], safe checkpoint: o2
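
The safe-checkpoint rule from the two slides above can be expressed compactly. This is a minimal sketch, assuming integer offsets that are sent in increasing order per source partition; the `CheckpointManager` class and its method names are hypothetical and are not Brooklin's actual API.

```python
class CheckpointManager:
    """Hypothetical per-partition tracker of in-flight and acknowledged offsets."""

    def __init__(self):
        self._in_flight = {}  # source partition -> offsets sent but not yet acked
        self._acked = {}      # source partition -> acked offsets above the checkpoint
        self._safe = {}       # source partition -> current safe checkpoint

    def on_send(self, partition, offset):
        self._in_flight.setdefault(partition, set()).add(offset)

    def on_ack(self, partition, offset):
        in_flight = self._in_flight.setdefault(partition, set())
        in_flight.discard(offset)
        acked = self._acked.setdefault(partition, set())
        acked.add(offset)
        # Slide rule: largest acked offset that is less than the oldest in-flight one.
        oldest = min(in_flight) if in_flight else None
        candidates = [o for o in acked if oldest is None or o < oldest]
        if candidates:
            safe = max(candidates)
            self._safe[partition] = safe
            # Offsets at or below the checkpoint no longer need tracking.
            self._acked[partition] = {o for o in acked if o > safe}

    def safe_checkpoint(self, partition):
        return self._safe.get(partition)  # None means nothing is safe to commit yet

manager = CheckpointManager()
manager.on_send("sp0", 1)
manager.on_send("sp0", 2)
manager.on_ack("sp0", 2)  # o2 acked while o1 is still in flight
first_checkpoint = manager.safe_checkpoint("sp0")

manager.on_send("sp0", 3)
manager.on_send("sp0", 4)
manager.on_ack("sp0", 1)  # o1 acked; o3 and o4 still in flight
second_checkpoint = manager.safe_checkpoint("sp0")
print(first_checkpoint, second_checkpoint)
```

The two recorded checkpoints reproduce the slide states: none while o1 is outstanding, then o2 once o1 is acknowledged. The consumer can commit this value at any time without calling `producer.flush()`, because every offset at or below it has already been acknowledged.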
  • 31. Manage Performance through Tasks ● Datastream task ○ Consists of a dedicated Kafka consumer and uses a shared producer pool to produce the data ○ Performance is controlled by the # of tasks ○ Tasks are assigned to each host within the BMM cluster ● BMM uses sticky assignment to speed up task allocation
  • 32. Sticky Task Assignment [Diagram: two snapshots of a ZooKeeper-coordinated BMM cluster with a leader host; Tasks 1-6 are spread over four BMM hosts in the first snapshot and over five BMM hosts in the second]
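
One way to read "sticky" assignment is that a task stays on its current host whenever that host is still alive, and only the minimum number of tasks move when membership changes. The sketch below illustrates that idea under assumptions; the balancing policy and the `sticky_assign` function are hypothetical and are not taken from Brooklin.

```python
def sticky_assign(previous, live_hosts, tasks):
    """Hypothetical sticky assignment; assumes live_hosts is non-empty."""
    assignment = {host: [] for host in live_hosts}
    owner = {task: host for host, owned in previous.items() for task in owned}

    # 1. Keep every task whose previous host is still alive.
    orphans = []
    for task in tasks:
        host = owner.get(task)
        if host in assignment:
            assignment[host].append(task)
        else:
            orphans.append(task)

    # 2. Place new or orphaned tasks on the least-loaded host.
    for task in orphans:
        target = min(live_hosts, key=lambda h: len(assignment[h]))
        assignment[target].append(task)

    # 3. Move tasks only while the load spread exceeds one task.
    while True:
        busiest = max(live_hosts, key=lambda h: len(assignment[h]))
        idlest = min(live_hosts, key=lambda h: len(assignment[h]))
        if len(assignment[busiest]) - len(assignment[idlest]) <= 1:
            break
        assignment[idlest].append(assignment[busiest].pop())
    return assignment

tasks = ["t1", "t2", "t3", "t4", "t5", "t6"]
before = {"h1": ["t1", "t2"], "h2": ["t3", "t4"], "h3": ["t5"], "h4": ["t6"]}
after = sticky_assign(before, ["h1", "h2", "h3", "h4", "h5"], tasks)

old_owner = {t: h for h, owned in before.items() for t in owned}
new_owner = {t: h for h, owned in after.items() for t in owned}
moved = [t for t in tasks if old_owner[t] != new_owner[t]]
print(after, moved)
```

Tasks that do not move keep their consumers and connections, which is presumably why sticky assignment speeds up allocation compared with reshuffling everything on each membership change.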
  • 33. BMM Performance Numbers ● Testing environment ○ Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz, 12 cores, 64GB RAM ● Performance Metrics with 20 datastream tasks: ○ Throughput: compressed bytes up to 28 MB/s ○ Memory utilization: 70% ○ CPU utilization: ~100%
  • 34. Passthrough Compression ● BMM is CPU bound, 70%+ CPU time is spent in decompression & re-compression ○ GZIPInputStream.read(): ~10% ○ GZIPOutputStream.write(): ~61% ● “Passthrough” mirroring - skip the decompression & re-compression ○ Throughput ~ 100 MB/s ○ CPU utilization drops to 50%
  • 36. Performance & Stability ● Better workload distribution - workload based assignment ● Auto-scaling - adjust number of tasks based on throughput
  • 37. Open Source Expected at EOM, April 2019