Cloud Messaging Service
Technical Overview
Presented by Matteo Merli, September 21, 2015
Sections
1. Introduction
2. Architecture
3. Bookkeeper
4. Future
5. Q & A
CMS - Technical Overview
What is CMS
• Hosted Pub / Sub
• Multi-tenant (Auth / Quotas / Load Balancer)
• Horizontally scalable
• Highly available, durable and consistent storage
• Geo-replication
• In production since 2013
[Diagram: CMS cluster with Producer, Broker, Consumer, Bookie, ZK and Global ZK components, plus Replication]
CMS key features
• Multi-tenancy / hosted
  • Operating a system at scale is hard and requires a deep understanding of its internals
  • Authentication / self-service provisioning / quotas
  • SLAs (write latency: 2 ms average, 5 ms at the 99th percentile)
  • Maintain the same latencies and throughput while draining a backlog
• Simple high-level API with clear ordering, durability and consistency semantics
• Geo-replication
  • Single API call to configure the regions to replicate to
• Load balancer: dynamically optimizes the assignment of topics to brokers
• Support for a large number of topics
• Store the subscription position
  • Apps don't need to store it
  • Data can be deleted as soon as it has been consumed
• Support round-robin distribution across multiple consumers
Workload examples
Challenge             # Topics  # Producers / topic  # Subscriptions / topic  Produced msg rate / s / topic
Fan-out               1         1                    1 K                      1 K
Throughput & latency  1         1                    1                        100 K
# Topics & latency    1 M       1                    10                       10
Fan-in                1         1 K                  1                        > 100 K
• Designed to support a wide range of use cases
• Needs to be cost-effective in every case
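A quick way to read the table is to multiply it out: since each subscription receives every message published on its topic, the brokers must dispatch topics × subscriptions per topic × produced msg/s per topic messages every second. The helper below is only that arithmetic, with illustrative names (`WorkloadMath` is not part of CMS).

```java
// Worked arithmetic for the workload table (illustrative names only).
final class WorkloadMath {
    private WorkloadMath() {}

    // Messages produced per second across all topics.
    static long producedPerSecond(long topics, long producedPerTopicPerSecond) {
        return topics * producedPerTopicPerSecond;
    }

    // Messages the brokers must dispatch per second: every subscription
    // receives every message published on its topic.
    static long dispatchedPerSecond(long topics, long subscriptionsPerTopic,
                                    long producedPerTopicPerSecond) {
        return producedPerSecond(topics, producedPerTopicPerSecond) * subscriptionsPerTopic;
    }
}
```

Plugging in the table: the fan-out row gives 1 × 1 K × 1 K = 1 M dispatched msg/s from just 1 K produced msg/s, while the "# Topics & latency" row gives 1 M × 10 = 10 M produced msg/s and 100 M dispatched msg/s, spread over a million topics.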
2. Architecture
Messaging model
• Producers can attach to a topic and send messages to it
• A subscription is a durable resource that receives every message sent to the topic after the subscription was created
• Each subscription has a type, decided by the first consumer that attaches:
  • “Exclusive”: only one consumer is allowed to attach to the subscription.
  • “Shared”: multiple consumers are allowed. Messages are distributed round-robin, with no ordering guarantees.
  • “Failover”: multiple consumers are allowed, but only one receives messages at any given time while the others stay in standby.
[Diagram: Producer-X and Producer-Y publish to a Topic with three subscriptions: Subscription-A (Exclusive) with Consumer-1, Subscription-B (Shared) with Consumer-2 and Consumer-3, Subscription-C (Failover) with Consumer-4 and Consumer-5]
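To make the three subscription types concrete, the toy model below decides which consumer gets the next message: the first consumer to attach fixes the type, Exclusive rejects a second consumer, Shared rotates round-robin, and Failover sends everything to one active consumer while the others wait on standby. It is an in-memory sketch under simplifying assumptions (no acks, no network; the failover choice is simply the earliest-attached consumer still connected), and `ToySubscription` and its `SubscriptionType` enum are hypothetical stand-ins, not the CMS client classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the subscription types (not the CMS enum).
enum SubscriptionType { Exclusive, Shared, Failover }

// Toy model of subscription dispatch semantics.
final class ToySubscription {
    private final List<String> consumers = new ArrayList<>(); // in attach order
    private SubscriptionType type;   // decided by the first consumer
    private int nextShared = 0;      // round-robin cursor for Shared

    void attach(String consumer, SubscriptionType requested) {
        if (consumers.isEmpty()) {
            type = requested; // the first consumer decides the type
        } else if (requested != type) {
            throw new IllegalStateException("subscription type is already " + type);
        } else if (type == SubscriptionType.Exclusive) {
            throw new IllegalStateException("exclusive subscription already has a consumer");
        }
        consumers.add(consumer);
    }

    void detach(String consumer) {
        consumers.remove(consumer);
        nextShared = consumers.isEmpty() ? 0 : nextShared % consumers.size();
    }

    // Returns the consumer that receives the next message.
    String dispatch() {
        if (consumers.isEmpty()) {
            throw new IllegalStateException("no consumer attached");
        }
        if (type == SubscriptionType.Shared) {
            String c = consumers.get(nextShared);
            nextShared = (nextShared + 1) % consumers.size();
            return c;
        }
        // Exclusive has exactly one consumer; Failover sends everything to the
        // earliest-attached consumer still connected, the others stay on standby.
        return consumers.get(0);
    }
}
```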
Client API
▪ Expose messaging model concepts (producer/consumer)
▪ C++ and Java
▪ Connection pooling
▪ Handle recoverable failures transparently (reconnect / resend messages) without compromising ordering guarantees
▪ Sync / async version of every operation
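One way a client library can resend after a reconnect without reordering is to keep every unacknowledged message in a FIFO queue tagged with a per-producer sequence id, resend the queue in order, and let the receiving side drop sequence ids it has already persisted (this covers a message that was stored but whose ack was lost). The sketch below assumes exactly that scheme over an in-memory fake broker; `FakeBroker`, `ResendingProducer` and the sequence-id dedup are illustrative assumptions, not the CMS wire protocol, and for simplicity it sends one message at a time instead of pipelining.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Stand-in for the broker side of a connection (hypothetical, in-memory).
final class FakeBroker {
    final List<String> persisted = new ArrayList<>();
    boolean connected = true;
    boolean loseAcks = false; // simulate: message persisted, but the ack never arrives
    private long lastSeqId = -1;

    // Returns true if the client receives an ack for this message.
    boolean receive(long seqId, String payload) {
        if (!connected) {
            return false;               // link down: nothing persisted
        }
        if (seqId > lastSeqId) {        // skip resends that were already persisted
            persisted.add(payload);
            lastSeqId = seqId;
        }
        return !loseAcks;
    }
}

// Producer that keeps unacknowledged messages and resends them in order.
final class ResendingProducer {
    private static final class Pending {
        final long seqId;
        final String payload;
        final CompletableFuture<Void> acked = new CompletableFuture<>();

        Pending(long seqId, String payload) {
            this.seqId = seqId;
            this.payload = payload;
        }
    }

    private final FakeBroker broker;
    private final ArrayDeque<Pending> pending = new ArrayDeque<>();
    private long nextSeqId = 0;

    ResendingProducer(FakeBroker broker) {
        this.broker = broker;
    }

    CompletableFuture<Void> sendAsync(String payload) {
        Pending p = new Pending(nextSeqId++, payload);
        pending.addLast(p);
        flushPending();
        return p.acked;
    }

    // Called by the (hypothetical) connection layer after reconnecting.
    void onReconnect() {
        flushPending();
    }

    // Send pending messages strictly in order; stop at the first one without
    // an ack so that nothing later can overtake it.
    private void flushPending() {
        while (!pending.isEmpty()) {
            Pending head = pending.peekFirst();
            if (!broker.receive(head.seqId, head.payload)) {
                return;
            }
            pending.pollFirst();
            head.acked.complete(null);
        }
    }
}
```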
Java producer example
CmsClient client = CmsClient.create("http://<broker vip>:4080");
Producer producer = client.createProducer("my-topic");
// Blocks until the message is persisted; handles retries in case of failure
producer.send("my-message".getBytes());
// Async version:
producer.sendAsync("my-message".getBytes()).thenRun(() -> {
    // Message was persisted
});
Java consumer example
CmsClient client = CmsClient.create("http://<broker vip>:4080");
Consumer consumer = client.subscribe(
    "my-topic",
    "my-subscription-name",
    SubscriptionType.Exclusive);
// Blocks until a message is available
Message msg = consumer.receive();
// Do something...
// Acknowledge so the stored subscription position moves past this message
consumer.acknowledge(msg);
System overview
Broker
• Stateless
• Maintains an in-memory cache of messages
• Reads from Bookkeeper on a cache miss
Bookkeeper
• Distributed write-ahead log
• Create many ledgers
• Append entries
• Read entries
• Delete ledgers
• Consistent reads
• Single writer (the broker)
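[Diagram: CMS cluster. Producer and consumer apps connect through the CMS client to the broker, whose components are the native dispatcher, managed ledger, BK client, global replicators, cache and load balancer; the cluster also contains bookies, ZK and Global ZK, with replication to other clusters]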
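The division of labour above can be sketched with two toy classes: an append-only ledger offering the listed Bookkeeper operations (append, read, delete; creation is just the constructor), and a broker-side cache that serves reads from memory and falls back to the ledger on a miss. Everything is in memory and hypothetical: a real ledger is replicated across bookies, and the LRU eviction policy and capacity are assumptions, not details from the slides.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy append-only ledger. Assumes a single writer (the broker); not enforced here.
final class ToyLedger {
    private final List<byte[]> entries = new ArrayList<>();
    private boolean deleted = false;
    int storageReads = 0; // how many reads actually reached the ledger

    long append(byte[] data) {
        checkNotDeleted();
        entries.add(data);
        return entries.size() - 1; // entry ids are dense and increasing
    }

    byte[] read(long entryId) {
        checkNotDeleted();
        storageReads++;
        return entries.get((int) entryId);
    }

    void delete() {
        deleted = true;
        entries.clear();
    }

    private void checkNotDeleted() {
        if (deleted) throw new IllegalStateException("ledger was deleted");
    }
}

// Broker-side LRU cache in front of a ledger: hits stay in memory,
// misses fall back to the ledger.
final class BrokerCache {
    private final ToyLedger ledger;
    private final Map<Long, byte[]> cache;

    BrokerCache(ToyLedger ledger, int capacity) {
        this.ledger = ledger;
        // accessOrder=true keeps entries ordered from least to most recently used
        this.cache = new LinkedHashMap<Long, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
                return size() > capacity;
            }
        };
    }

    // Write path: persist in the ledger first, then keep the entry hot for tailing readers.
    long publish(byte[] data) {
        long entryId = ledger.append(data);
        cache.put(entryId, data);
        return entryId;
    }

    byte[] read(long entryId) {
        byte[] cached = cache.get(entryId);
        if (cached != null) return cached;
        byte[] data = ledger.read(entryId); // cache miss: go to the ledger
        cache.put(entryId, data);
        return data;
    }
}
```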
System overview
Native dispatcher
• Async Netty server
Global replicators
• If a topic is global, republish its messages in the other regions
Global ZooKeeper
• ZK instance with participants in multiple US regions
• Consistent data store for customer configuration
• Accepts writes with one region down
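“Accepts writes with one region down” follows from ZooKeeper's majority quorum: writes succeed while more than half of the voting participants are reachable. The check below tests a placement against the loss of any single region; the region names and participant counts used with it are assumptions, since the slides do not say how many regions or participants the Global ZK uses.

```java
import java.util.Map;

// Checks whether a ZooKeeper ensemble keeps a write quorum (a strict majority
// of voting participants) after losing any single region.
final class QuorumCheck {
    private QuorumCheck() {}

    static boolean survivesAnySingleRegionLoss(Map<String, Integer> participantsPerRegion) {
        int total = 0;
        for (int n : participantsPerRegion.values()) {
            total += n;
        }
        int quorum = total / 2 + 1; // strict majority
        for (int lost : participantsPerRegion.values()) {
            if (total - lost < quorum) {
                return false; // this region going down would block writes
            }
        }
        return true;
    }
}
```

With five participants placed 2 + 2 + 1 across three regions, losing any region leaves at least 3 of 5, still a majority; a 3 + 2 split across two regions fails as soon as the larger region goes down.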
Partitioned topics
▪ The client library has a wrapper producer/consumer implementation
▪ No API changes
▪ Producers can decide how to assign messages to partitions:
  ▪ Single partition
  ▪ Round robin
  ▪ Provide a key on the message
    ▪ The hash of the key determines the partition
  ▪ Custom routing
[Diagram: an app's producer for partitioned topic T1 writes to partitions P0 to P4; in the CMS cluster, partitions T1-P0 to T1-P4 are spread across Broker 1, Broker 2 and Broker 3]
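The routing choices can be sketched as one small interface with an implementation per mode; a custom policy is just another implementation of the same interface. `PartitionRouter`, `Routers` and the use of `String.hashCode()` as the key hash are assumptions for illustration; the slides do not specify which hash function CMS uses.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical router: picks a partition index in [0, numPartitions).
interface PartitionRouter {
    int choosePartition(String key, int numPartitions);
}

final class Routers {
    private Routers() {}

    // Single partition: every message goes to one fixed partition.
    static PartitionRouter single(int partition) {
        return (key, n) -> partition;
    }

    // Round robin: spread messages evenly, ignoring the key.
    static PartitionRouter roundRobin() {
        AtomicLong counter = new AtomicLong();
        return (key, n) -> (int) (counter.getAndIncrement() % n);
    }

    // Key hash: the same key always maps to the same partition, which keeps
    // per-key ordering. floorMod avoids negative results for negative hashes.
    static PartitionRouter keyHash() {
        return (key, n) -> Math.floorMod(key.hashCode(), n);
    }
}
```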
Partitioned topics
▪ Consumers can use all subscription types with the same semantics
▪ With the “Failover” subscription type, the election is done per partition
  ▪ Partition assignment is spread evenly across all available consumers
  ▪ No need for ZK coordination
[Diagram: two apps, Consumer-1 and Consumer-2, each hold per-partition consumers C0 to C4 for topic T1, attached to partitions T1-P0 to T1-P4 spread across Broker 1, Broker 2 and Broker 3]
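One way to run a per-partition failover election with no ZooKeeper coordination is for each partition to apply the same deterministic rule: sort the connected consumers by a stable key (the consumer name here) and pick the one at index `partitionIndex % consumerCount`. Because every partition uses the same rule, the active consumers rotate across partitions and the load spreads evenly. This is a sketch of that idea with hypothetical names, not necessarily the exact CMS algorithm.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Deterministic active-consumer choice for one partition of a Failover subscription.
final class FailoverElection {
    private FailoverElection() {}

    static String activeConsumer(int partitionIndex, List<String> connectedConsumers) {
        if (connectedConsumers.isEmpty()) {
            throw new IllegalArgumentException("no consumers connected");
        }
        List<String> sorted = new ArrayList<>(connectedConsumers);
        Collections.sort(sorted); // same order everywhere, regardless of connect order
        return sorted.get(partitionIndex % sorted.size());
    }
}
```

With Consumer-1 and Consumer-2 attached to the five partitions of T1, P0, P2 and P4 go to Consumer-1 and P1 and P3 to Consumer-2; if one consumer disconnects, each partition re-runs the rule independently and the survivor takes over.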
3. Bookkeeper
CMS Bookkeeper usage
▪ CMS uses Bookkeeper through a higher-level interface called ManagedLedger:
› A single managed ledger represents the storage of a single topic
› Maintains the list of currently active BK ledgers
› Maintains the subscription positions, using an additional ledger to checkpoint the last acknowledged message in the stream
› Caches data
› Deletes ledgers once all cursors are done with them
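The cursor and ledger-deletion bookkeeping can be sketched as follows: a position is a (ledgerId, entryId) pair, each subscription's cursor holds its last acknowledged position, and every ledger lying entirely before the slowest cursor can be deleted. Class names and the in-memory cursor map are assumptions for illustration; as described above, the real managed ledger checkpoints cursor positions to an additional ledger rather than keeping them only in memory.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// A position in a topic's stream: ledger id plus entry id within that ledger.
final class Position implements Comparable<Position> {
    final long ledgerId;
    final long entryId;

    Position(long ledgerId, long entryId) {
        this.ledgerId = ledgerId;
        this.entryId = entryId;
    }

    @Override
    public int compareTo(Position other) {
        int byLedger = Long.compare(ledgerId, other.ledgerId);
        return byLedger != 0 ? byLedger : Long.compare(entryId, other.entryId);
    }
}

// Toy managed ledger for one topic: ordered BK ledger ids plus one cursor per subscription.
final class ToyManagedLedger {
    private final List<Long> ledgers = new ArrayList<>();          // oldest first; last is being written
    private final Map<String, Position> cursors = new HashMap<>(); // last acked position per subscription

    void addLedger(long ledgerId) {
        ledgers.add(ledgerId);
    }

    List<Long> getActiveLedgers() {
        return new ArrayList<>(ledgers);
    }

    void acknowledge(String subscription, Position lastAcked) {
        cursors.put(subscription, lastAcked);
    }

    // Deletes (here: forgets) every ledger that all cursors have moved past.
    // Conservative: a ledger is dropped only once the slowest cursor is in a later ledger.
    List<Long> trimConsumedLedgers() {
        List<Long> deleted = new ArrayList<>();
        if (cursors.isEmpty()) {
            return deleted;
        }
        Position slowest = Collections.min(cursors.values());
        while (ledgers.size() > 1 && ledgers.get(0) < slowest.ledgerId) {
            deleted.add(ledgers.remove(0));
        }
        return deleted;
    }
}
```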
Bookie internal structure
• Entries are written both to the journal and to the ledger storage (on different devices)
• Ledger storage writes are fsynced periodically
• Reads are served only from the ledger storage
• Entries from different ledgers are interleaved in entry log files
• Ledger index files are used to find each entry's offset
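The interleaving and indexing can be modelled in a few lines: entries from all ledgers are appended to one shared entry log, and a per-ledger index records where each entry landed so a read can go straight to it. In this toy the log is a byte buffer and the index is a pair of hash maps; the names are hypothetical, and the journal and periodic fsync are left out.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Toy ledger storage: one shared, append-only entry log plus a per-ledger index.
final class ToyEntryLogStorage {
    private final ByteArrayOutputStream entryLog = new ByteArrayOutputStream();
    // ledgerId -> (entryId -> {offset, length}) plays the role of the ledger index files
    private final Map<Long, Map<Long, long[]>> index = new HashMap<>();

    void addEntry(long ledgerId, long entryId, byte[] data) {
        long offset = entryLog.size();
        entryLog.write(data, 0, data.length); // entries from all ledgers interleave in arrival order
        index.computeIfAbsent(ledgerId, id -> new HashMap<>())
             .put(entryId, new long[] {offset, data.length});
    }

    long offsetOf(long ledgerId, long entryId) {
        return location(ledgerId, entryId)[0];
    }

    byte[] readEntry(long ledgerId, long entryId) {
        long[] loc = location(ledgerId, entryId);
        // A real bookie would seek in the file; here we copy the bytes from the buffer.
        return Arrays.copyOfRange(entryLog.toByteArray(), (int) loc[0], (int) (loc[0] + loc[1]));
    }

    private long[] location(long ledgerId, long entryId) {
        Map<Long, long[]> ledgerIndex = index.get(ledgerId);
        long[] loc = ledgerIndex == null ? null : ledgerIndex.get(entryId);
        if (loc == null) {
            throw new IllegalArgumentException("unknown entry " + ledgerId + ":" + entryId);
        }
        return loc;
    }
}
```

Writing entries A0 (ledger 1), B0 (ledger 2) and A1 (ledger 1) places ledger 1's two entries at offsets 0 and 4 with ledger 2's entry between them, so one ledger's data is not contiguous on disk.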
Bookkeeper issues
▪ Performance degrades when writing to many ledgers at the same time
▪ Under heavy reads, the ledger storage device slows down, which impacts writes
▪ Each ledger storage flush needs to fsync many ledger index files
Bookie storage improvements
• Entries are written both to the journal and to an in-memory write cache
• Entries are periodically flushed
• Entries are sorted by ledger so that they are sequential on disk (per flush period)
• Since entries are sequential, we added a read-ahead cache
• The location index is mostly kept in memory and only updated during flushes
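The improved write path can be sketched in the same toy style: entries go into an in-memory write cache, and each flush sorts them by (ledgerId, entryId) before appending, so the entries a ledger received during that flush period end up contiguous; the location index is only touched at flush time. Names are hypothetical, and the journal, the read-ahead cache and real file I/O are omitted.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy write cache that sorts entries by ledger before each flush.
final class SortingWriteCache {
    static final class Entry {
        final long ledgerId;
        final long entryId;
        final String data;

        Entry(long ledgerId, long entryId, String data) {
            this.ledgerId = ledgerId;
            this.entryId = entryId;
            this.data = data;
        }
    }

    private final List<Entry> writeCache = new ArrayList<>();          // filled on the write path
    final List<Entry> disk = new ArrayList<>();                        // stands in for the entry log
    private final Map<String, Integer> locationIndex = new HashMap<>(); // "ledger:entry" -> disk position

    void add(long ledgerId, long entryId, String data) {
        writeCache.add(new Entry(ledgerId, entryId, data)); // the journal write is omitted
    }

    // Periodic flush: sort by (ledgerId, entryId) so each ledger's entries are contiguous.
    void flush() {
        writeCache.sort(Comparator.<Entry>comparingLong(e -> e.ledgerId)
                .thenComparingLong(e -> e.entryId));
        for (Entry e : writeCache) {
            locationIndex.put(e.ledgerId + ":" + e.entryId, disk.size()); // index updated only here
            disk.add(e);
        }
        writeCache.clear();
    }

    // Reads check the write cache first, then use the location index.
    String read(long ledgerId, long entryId) {
        for (Entry e : writeCache) {
            if (e.ledgerId == ledgerId && e.entryId == entryId) return e.data;
        }
        Integer pos = locationIndex.get(ledgerId + ":" + entryId);
        return pos == null ? null : disk.get(pos).data;
    }
}
```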
Bookkeeper write latency
▪ After hardware, the next limit to achieving low latency is JVM GC
▪ GC pauses are unavoidable: try to keep them around 50 ms and as infrequent as possible
› Switched the BK client and servers to Netty pooled, ref-counted buffers in direct memory, hiding payloads from the GC and eliminating payload copies
› Extensively profiled allocations and substantially reduced per-entry object allocations
• Use the Recycler pattern to pool objects (very efficient when the same thread allocates and releases)
• Primitive collections
• Array queues instead of linked queues in executors
• Open hash maps instead of linked hash maps
• BTree instead of ConcurrentSkipList
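As an example of the "primitive collections" and "open hash maps" items, here is a minimal open-addressing map from `long` to `long`: keys and values live in flat primitive arrays, so a `put` allocates no per-entry objects, whereas `HashMap<Long, Long>` boxes keys and values (outside the small `Long` cache) and creates a node object per mapping. It is a simplified sketch (linear probing, no removal, single-threaded, arbitrary growth policy), not the collection BK or CMS actually uses.

```java
// Minimal open-addressing (linear probing) map from long to long.
// Simplified: no removal, not thread-safe.
final class LongLongOpenHashMap {
    private long[] keys;
    private long[] values;
    private boolean[] used;
    private int size;

    LongLongOpenHashMap(int expectedSize) {
        int capacity = 16;
        while (capacity < expectedSize * 2) capacity <<= 1; // power of two, load factor <= 0.5
        allocate(capacity);
    }

    void put(long key, long value) {
        if ((size + 1) * 2 > keys.length) grow();
        int i = slotFor(key);
        if (!used[i]) {
            used[i] = true;
            keys[i] = key;
            size++;
        }
        values[i] = value; // insert or overwrite in place: no per-entry objects
    }

    long get(long key, long missingValue) {
        int i = slotFor(key);
        return used[i] ? values[i] : missingValue;
    }

    int size() {
        return size;
    }

    // Linear probing: walk forward from the hashed slot until we find the key or
    // an empty slot (one always exists because the table is at most half full).
    private int slotFor(long key) {
        int mask = keys.length - 1;
        int i = mix(key) & mask;
        while (used[i] && keys[i] != key) {
            i = (i + 1) & mask;
        }
        return i;
    }

    // Multiplicative hashing spreads sequential ids (e.g. entry ids) across the table.
    private static int mix(long key) {
        long h = key * 0x9E3779B97F4A7C15L;
        return (int) (h ^ (h >>> 32));
    }

    private void allocate(int capacity) {
        keys = new long[capacity];
        values = new long[capacity];
        used = new boolean[capacity];
    }

    private void grow() {
        long[] oldKeys = keys;
        long[] oldValues = values;
        boolean[] oldUsed = used;
        allocate(keys.length * 2);
        for (int j = 0; j < oldKeys.length; j++) {
            if (oldUsed[j]) {
                int i = slotFor(oldKeys[j]);
                used[i] = true;
                keys[i] = oldKeys[j];
                values[i] = oldValues[j];
            }
        }
    }
}
```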
Bookie ledgers scalability
[Chart: BK write latency (ms, axis 0 to 4), average and 99th percentile, vs. ledgers per bookie (1, 1000, 5000, 10000, 20000, 50000); single bookie at 15K writes/s]
4. Future
Auto batching
▪ Send messages in batches throughout the system
▪ Transparent to the application
▪ Configurable grouping time and size, e.g. 1 ms / 128 KB
▪ For the same throughput in bytes/s, lowers the number of transactions/s through the system
› Less CPU usage in brokers/bookies
› Lower GC pressure
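A batcher along those lines can be sketched as follows: messages accumulate until the batch reaches a byte limit or its oldest message has waited for the time limit, and then the whole batch is handed downstream as one request. The class name, the `Consumer<List<byte[]>>` sink and the explicit `nowNanos` arguments (used instead of a real timer so the logic is easy to test) are assumptions; the 1 ms / 128 KB figures simply mirror the example above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Toy message batcher: flushes on a size limit or a time limit, whichever comes first.
final class Batcher {
    private final long maxDelayNanos;
    private final int maxBytes;
    private final Consumer<List<byte[]>> sink; // receives each completed batch
    private List<byte[]> current = new ArrayList<>();
    private int currentBytes = 0;
    private long batchStartNanos = 0;

    Batcher(long maxDelayNanos, int maxBytes, Consumer<List<byte[]>> sink) {
        this.maxDelayNanos = maxDelayNanos;
        this.maxBytes = maxBytes;
        this.sink = sink;
    }

    // nowNanos is passed in (instead of calling System.nanoTime()) to keep the logic testable.
    void add(byte[] message, long nowNanos) {
        if (current.isEmpty()) {
            batchStartNanos = nowNanos;
        }
        current.add(message);
        currentBytes += message.length;
        if (currentBytes >= maxBytes || nowNanos - batchStartNanos >= maxDelayNanos) {
            flush();
        }
    }

    // Meant to be called periodically by a timer so a quiet producer's last batch still goes out.
    void tick(long nowNanos) {
        if (!current.isEmpty() && nowNanos - batchStartNanos >= maxDelayNanos) {
            flush();
        }
    }

    void flush() {
        if (current.isEmpty()) {
            return;
        }
        sink.accept(current); // one downstream request for the whole batch
        current = new ArrayList<>();
        currentBytes = 0;
    }
}
```

For instance, assuming 100 K msg/s of 1 KB each, about 100 KB accumulates per millisecond, so the 1 ms limit fires before the 128 KB one and the system handles roughly 1 K batches/s instead of 100 K individual messages/s.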
Low durability
▪ The current throughput bottleneck for bookie writes is journal syncs
▪ Could add more bookies, but at a higher cost
▪ Some use cases can tolerate losing data on rare occasions
▪ Solution
› Store data in bookies
• No memory limitation, can build a big backlog
› Don't write to the bookie journal
• Data is stored in the write cache of 2 bookies + the broker cache
› Can lose < 1 min of data if 1 broker and 2 bookies crash
▪ Higher throughput with fewer bookies
▪ Lower publish latency
5. Q & A

More Related Content

What's hot

Devoxx Morocco 2016 - Microservices with Kafka
Devoxx Morocco 2016 - Microservices with KafkaDevoxx Morocco 2016 - Microservices with Kafka
Devoxx Morocco 2016 - Microservices with Kafka
László-Róbert Albert
 
Pulsar - flexible pub-sub for internet scale
Pulsar - flexible pub-sub for internet scalePulsar - flexible pub-sub for internet scale
Pulsar - flexible pub-sub for internet scale
Matteo Merli
 
Apache pulsar - storage architecture
Apache pulsar - storage architectureApache pulsar - storage architecture
Apache pulsar - storage architecture
Matteo Merli
 
Unify Storage Backend for Batch and Streaming Computation with Apache Pulsar_...
Unify Storage Backend for Batch and Streaming Computation with Apache Pulsar_...Unify Storage Backend for Batch and Streaming Computation with Apache Pulsar_...
Unify Storage Backend for Batch and Streaming Computation with Apache Pulsar_...
StreamNative
 
Apache Kafka - Martin Podval
Apache Kafka - Martin PodvalApache Kafka - Martin Podval
Apache Kafka - Martin Podval
Martin Podval
 
Effectively-once semantics in Apache Pulsar
Effectively-once semantics in Apache PulsarEffectively-once semantics in Apache Pulsar
Effectively-once semantics in Apache Pulsar
Matteo Merli
 
Building High-Throughput, Low-Latency Pipelines in Kafka
Building High-Throughput, Low-Latency Pipelines in KafkaBuilding High-Throughput, Low-Latency Pipelines in Kafka
Building High-Throughput, Low-Latency Pipelines in Kafka
confluent
 
Reducing Microservice Complexity with Kafka and Reactive Streams
Reducing Microservice Complexity with Kafka and Reactive StreamsReducing Microservice Complexity with Kafka and Reactive Streams
Reducing Microservice Complexity with Kafka and Reactive Streams
jimriecken
 
Kafka 101
Kafka 101Kafka 101
Kafka 101
Clement Demonchy
 
Modern Distributed Messaging and RPC
Modern Distributed Messaging and RPCModern Distributed Messaging and RPC
Modern Distributed Messaging and RPC
Max Alexejev
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
Jeff Holoman
 
Apache Kafka
Apache KafkaApache Kafka
Apache Kafka
emreakis
 
Apache Kafka Best Practices
Apache Kafka Best PracticesApache Kafka Best Practices
Apache Kafka Best Practices
DataWorks Summit/Hadoop Summit
 
lessons from managing a pulsar cluster
 lessons from managing a pulsar cluster lessons from managing a pulsar cluster
lessons from managing a pulsar cluster
Shivji Kumar Jha
 
Kafka blr-meetup-presentation - Kafka internals
Kafka blr-meetup-presentation - Kafka internalsKafka blr-meetup-presentation - Kafka internals
Kafka blr-meetup-presentation - Kafka internals
Ayyappadas Ravindran (Appu)
 
Kafka Tutorial - basics of the Kafka streaming platform
Kafka Tutorial - basics of the Kafka streaming platformKafka Tutorial - basics of the Kafka streaming platform
Kafka Tutorial - basics of the Kafka streaming platform
Jean-Paul Azar
 
Kafka at scale facebook israel
Kafka at scale   facebook israelKafka at scale   facebook israel
Kafka at scale facebook israel
Gwen (Chen) Shapira
 
How pulsar stores data at Pulsar-na-summit-2021.pptx (1)
How pulsar stores data at Pulsar-na-summit-2021.pptx (1)How pulsar stores data at Pulsar-na-summit-2021.pptx (1)
How pulsar stores data at Pulsar-na-summit-2021.pptx (1)
Shivji Kumar Jha
 
The Good, The Bad, and The Avro (Graham Stirling, Saxo Bank and David Navalho...
The Good, The Bad, and The Avro (Graham Stirling, Saxo Bank and David Navalho...The Good, The Bad, and The Avro (Graham Stirling, Saxo Bank and David Navalho...
The Good, The Bad, and The Avro (Graham Stirling, Saxo Bank and David Navalho...
confluent
 
Kafka aws
Kafka awsKafka aws
Kafka aws
Ariel Moskovich
 

What's hot (20)

Devoxx Morocco 2016 - Microservices with Kafka
Devoxx Morocco 2016 - Microservices with KafkaDevoxx Morocco 2016 - Microservices with Kafka
Devoxx Morocco 2016 - Microservices with Kafka
 
Pulsar - flexible pub-sub for internet scale
Pulsar - flexible pub-sub for internet scalePulsar - flexible pub-sub for internet scale
Pulsar - flexible pub-sub for internet scale
 
Apache pulsar - storage architecture
Apache pulsar - storage architectureApache pulsar - storage architecture
Apache pulsar - storage architecture
 
Unify Storage Backend for Batch and Streaming Computation with Apache Pulsar_...
Unify Storage Backend for Batch and Streaming Computation with Apache Pulsar_...Unify Storage Backend for Batch and Streaming Computation with Apache Pulsar_...
Unify Storage Backend for Batch and Streaming Computation with Apache Pulsar_...
 
Apache Kafka - Martin Podval
Apache Kafka - Martin PodvalApache Kafka - Martin Podval
Apache Kafka - Martin Podval
 
Effectively-once semantics in Apache Pulsar
Effectively-once semantics in Apache PulsarEffectively-once semantics in Apache Pulsar
Effectively-once semantics in Apache Pulsar
 
Building High-Throughput, Low-Latency Pipelines in Kafka
Building High-Throughput, Low-Latency Pipelines in KafkaBuilding High-Throughput, Low-Latency Pipelines in Kafka
Building High-Throughput, Low-Latency Pipelines in Kafka
 
Reducing Microservice Complexity with Kafka and Reactive Streams
Reducing Microservice Complexity with Kafka and Reactive StreamsReducing Microservice Complexity with Kafka and Reactive Streams
Reducing Microservice Complexity with Kafka and Reactive Streams
 
Kafka 101
Kafka 101Kafka 101
Kafka 101
 
Modern Distributed Messaging and RPC
Modern Distributed Messaging and RPCModern Distributed Messaging and RPC
Modern Distributed Messaging and RPC
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
 
Apache Kafka
Apache KafkaApache Kafka
Apache Kafka
 
Apache Kafka Best Practices
Apache Kafka Best PracticesApache Kafka Best Practices
Apache Kafka Best Practices
 
lessons from managing a pulsar cluster
 lessons from managing a pulsar cluster lessons from managing a pulsar cluster
lessons from managing a pulsar cluster
 
Kafka blr-meetup-presentation - Kafka internals
Kafka blr-meetup-presentation - Kafka internalsKafka blr-meetup-presentation - Kafka internals
Kafka blr-meetup-presentation - Kafka internals
 
Kafka Tutorial - basics of the Kafka streaming platform
Kafka Tutorial - basics of the Kafka streaming platformKafka Tutorial - basics of the Kafka streaming platform
Kafka Tutorial - basics of the Kafka streaming platform
 
Kafka at scale facebook israel
Kafka at scale   facebook israelKafka at scale   facebook israel
Kafka at scale facebook israel
 
How pulsar stores data at Pulsar-na-summit-2021.pptx (1)
How pulsar stores data at Pulsar-na-summit-2021.pptx (1)How pulsar stores data at Pulsar-na-summit-2021.pptx (1)
How pulsar stores data at Pulsar-na-summit-2021.pptx (1)
 
The Good, The Bad, and The Avro (Graham Stirling, Saxo Bank and David Navalho...
The Good, The Bad, and The Avro (Graham Stirling, Saxo Bank and David Navalho...The Good, The Bad, and The Avro (Graham Stirling, Saxo Bank and David Navalho...
The Good, The Bad, and The Avro (Graham Stirling, Saxo Bank and David Navalho...
 
Kafka aws
Kafka awsKafka aws
Kafka aws
 

Similar to Cloud Messaging Service: Technical Overview

IBM IMPACT 2014 - AMC-1882 Building a Scalable & Continuously Available IBM M...
IBM IMPACT 2014 - AMC-1882 Building a Scalable & Continuously Available IBM M...IBM IMPACT 2014 - AMC-1882 Building a Scalable & Continuously Available IBM M...
IBM IMPACT 2014 - AMC-1882 Building a Scalable & Continuously Available IBM M...
Peter Broadhurst
 
WSO2 Message Broker - Product Overview
WSO2 Message Broker - Product OverviewWSO2 Message Broker - Product Overview
WSO2 Message Broker - Product Overview
WSO2
 
OnPrem Monitoring.pdf
OnPrem Monitoring.pdfOnPrem Monitoring.pdf
OnPrem Monitoring.pdf
TarekHamdi8
 
Linked In Stream Processing Meetup - Apache Pulsar
Linked In Stream Processing Meetup - Apache PulsarLinked In Stream Processing Meetup - Apache Pulsar
Linked In Stream Processing Meetup - Apache Pulsar
Karthik Ramasamy
 
Apache Kafka Introduction
Apache Kafka IntroductionApache Kafka Introduction
Apache Kafka Introduction
Amita Mirajkar
 
Citi Tech Talk: Monitoring and Performance
Citi Tech Talk: Monitoring and PerformanceCiti Tech Talk: Monitoring and Performance
Citi Tech Talk: Monitoring and Performance
confluent
 
IBM MQ - better application performance
IBM MQ - better application performanceIBM MQ - better application performance
IBM MQ - better application performance
MarkTaylorIBM
 
Tokyo AK Meetup Speedtest - Share.pdf
Tokyo AK Meetup Speedtest - Share.pdfTokyo AK Meetup Speedtest - Share.pdf
Tokyo AK Meetup Speedtest - Share.pdf
ssuser2ae721
 
Distributed messaging with Apache Kafka
Distributed messaging with Apache KafkaDistributed messaging with Apache Kafka
Distributed messaging with Apache Kafka
Saumitra Srivastav
 
OSMC 2016 - Monasca - Monitoring-as-a-Service (at-Scale) by Roland Hochmuth
OSMC 2016 - Monasca - Monitoring-as-a-Service (at-Scale) by Roland HochmuthOSMC 2016 - Monasca - Monitoring-as-a-Service (at-Scale) by Roland Hochmuth
OSMC 2016 - Monasca - Monitoring-as-a-Service (at-Scale) by Roland Hochmuth
NETWAYS
 
OSMC 2016 | Monasca: Monitoring-as-a-Service (at-Scale) by Roland Hochmuth
OSMC 2016 | Monasca: Monitoring-as-a-Service (at-Scale) by Roland HochmuthOSMC 2016 | Monasca: Monitoring-as-a-Service (at-Scale) by Roland Hochmuth
OSMC 2016 | Monasca: Monitoring-as-a-Service (at-Scale) by Roland Hochmuth
NETWAYS
 
Application Layer Protocols for the IoT
Application Layer Protocols for the IoTApplication Layer Protocols for the IoT
Application Layer Protocols for the IoT
Damien Magoni
 
Getting Started with Kafka on k8s
Getting Started with Kafka on k8sGetting Started with Kafka on k8s
Getting Started with Kafka on k8s
VMware Tanzu
 
Non-Kafkaesque Apache Kafka - Yottabyte 2018
Non-Kafkaesque Apache Kafka - Yottabyte 2018Non-Kafkaesque Apache Kafka - Yottabyte 2018
Non-Kafkaesque Apache Kafka - Yottabyte 2018
Otávio Carvalho
 
Where next for MQTT?
Where next for MQTT?Where next for MQTT?
Where next for MQTT?
Ian Craggs
 
Slow things down to make them go faster [FOSDEM 2022]
Slow things down to make them go faster [FOSDEM 2022]Slow things down to make them go faster [FOSDEM 2022]
Slow things down to make them go faster [FOSDEM 2022]
Jimmy Angelakos
 
Captial One: Why Stream Data as Part of Data Transformation?
Captial One: Why Stream Data as Part of Data Transformation?Captial One: Why Stream Data as Part of Data Transformation?
Captial One: Why Stream Data as Part of Data Transformation?
ScyllaDB
 
Learnings from the Field. Lessons from Working with Dozens of Small & Large D...
Learnings from the Field. Lessons from Working with Dozens of Small & Large D...Learnings from the Field. Lessons from Working with Dozens of Small & Large D...
Learnings from the Field. Lessons from Working with Dozens of Small & Large D...
HostedbyConfluent
 
Back to Basics: An Introduction to MQTT
Back to Basics: An Introduction to MQTTBack to Basics: An Introduction to MQTT
Back to Basics: An Introduction to MQTT
HiveMQ
 

Similar to Cloud Messaging Service: Technical Overview (20)

IBM IMPACT 2014 - AMC-1882 Building a Scalable & Continuously Available IBM M...
IBM IMPACT 2014 - AMC-1882 Building a Scalable & Continuously Available IBM M...IBM IMPACT 2014 - AMC-1882 Building a Scalable & Continuously Available IBM M...
IBM IMPACT 2014 - AMC-1882 Building a Scalable & Continuously Available IBM M...
 
WSO2 Message Broker - Product Overview
WSO2 Message Broker - Product OverviewWSO2 Message Broker - Product Overview
WSO2 Message Broker - Product Overview
 
OnPrem Monitoring.pdf
OnPrem Monitoring.pdfOnPrem Monitoring.pdf
OnPrem Monitoring.pdf
 
Linked In Stream Processing Meetup - Apache Pulsar
Linked In Stream Processing Meetup - Apache PulsarLinked In Stream Processing Meetup - Apache Pulsar
Linked In Stream Processing Meetup - Apache Pulsar
 
Apache Kafka Introduction
Apache Kafka IntroductionApache Kafka Introduction
Apache Kafka Introduction
 
Citi Tech Talk: Monitoring and Performance
Citi Tech Talk: Monitoring and PerformanceCiti Tech Talk: Monitoring and Performance
Citi Tech Talk: Monitoring and Performance
 
IBM MQ - better application performance
IBM MQ - better application performanceIBM MQ - better application performance
IBM MQ - better application performance
 
Tokyo AK Meetup Speedtest - Share.pdf
Tokyo AK Meetup Speedtest - Share.pdfTokyo AK Meetup Speedtest - Share.pdf
Tokyo AK Meetup Speedtest - Share.pdf
 
Distributed messaging with Apache Kafka
Distributed messaging with Apache KafkaDistributed messaging with Apache Kafka
Distributed messaging with Apache Kafka
 
OSMC 2016 - Monasca - Monitoring-as-a-Service (at-Scale) by Roland Hochmuth
OSMC 2016 - Monasca - Monitoring-as-a-Service (at-Scale) by Roland HochmuthOSMC 2016 - Monasca - Monitoring-as-a-Service (at-Scale) by Roland Hochmuth
OSMC 2016 - Monasca - Monitoring-as-a-Service (at-Scale) by Roland Hochmuth
 
OSMC 2016 | Monasca: Monitoring-as-a-Service (at-Scale) by Roland Hochmuth
OSMC 2016 | Monasca: Monitoring-as-a-Service (at-Scale) by Roland HochmuthOSMC 2016 | Monasca: Monitoring-as-a-Service (at-Scale) by Roland Hochmuth
OSMC 2016 | Monasca: Monitoring-as-a-Service (at-Scale) by Roland Hochmuth
 
Application Layer Protocols for the IoT
Application Layer Protocols for the IoTApplication Layer Protocols for the IoT
Application Layer Protocols for the IoT
 
Getting Started with Kafka on k8s
Getting Started with Kafka on k8sGetting Started with Kafka on k8s
Getting Started with Kafka on k8s
 
Non-Kafkaesque Apache Kafka - Yottabyte 2018
Non-Kafkaesque Apache Kafka - Yottabyte 2018Non-Kafkaesque Apache Kafka - Yottabyte 2018
Non-Kafkaesque Apache Kafka - Yottabyte 2018
 
Where next for MQTT?
Where next for MQTT?Where next for MQTT?
Where next for MQTT?
 
Slow things down to make them go faster [FOSDEM 2022]
Slow things down to make them go faster [FOSDEM 2022]Slow things down to make them go faster [FOSDEM 2022]
Slow things down to make them go faster [FOSDEM 2022]
 
kafka
kafkakafka
kafka
 
Captial One: Why Stream Data as Part of Data Transformation?
Captial One: Why Stream Data as Part of Data Transformation?Captial One: Why Stream Data as Part of Data Transformation?
Captial One: Why Stream Data as Part of Data Transformation?
 
Learnings from the Field. Lessons from Working with Dozens of Small & Large D...
Learnings from the Field. Lessons from Working with Dozens of Small & Large D...Learnings from the Field. Lessons from Working with Dozens of Small & Large D...
Learnings from the Field. Lessons from Working with Dozens of Small & Large D...
 
Back to Basics: An Introduction to MQTT
Back to Basics: An Introduction to MQTTBack to Basics: An Introduction to MQTT
Back to Basics: An Introduction to MQTT
 

Recently uploaded

JAVIER LASA-EXPERIENCIA digital 1986-2024.pdf
JAVIER LASA-EXPERIENCIA digital 1986-2024.pdfJAVIER LASA-EXPERIENCIA digital 1986-2024.pdf
JAVIER LASA-EXPERIENCIA digital 1986-2024.pdf
Javier Lasa
 
Multi-cluster Kubernetes Networking- Patterns, Projects and Guidelines
Multi-cluster Kubernetes Networking- Patterns, Projects and GuidelinesMulti-cluster Kubernetes Networking- Patterns, Projects and Guidelines
Multi-cluster Kubernetes Networking- Patterns, Projects and Guidelines
Sanjeev Rampal
 
APNIC Foundation, presented by Ellisha Heppner at the PNG DNS Forum 2024
APNIC Foundation, presented by Ellisha Heppner at the PNG DNS Forum 2024APNIC Foundation, presented by Ellisha Heppner at the PNG DNS Forum 2024
APNIC Foundation, presented by Ellisha Heppner at the PNG DNS Forum 2024
APNIC
 
Comptia N+ Standard Networking lesson guide
Comptia N+ Standard Networking lesson guideComptia N+ Standard Networking lesson guide
Comptia N+ Standard Networking lesson guide
GTProductions1
 
1.Wireless Communication System_Wireless communication is a broad term that i...
1.Wireless Communication System_Wireless communication is a broad term that i...1.Wireless Communication System_Wireless communication is a broad term that i...
1.Wireless Communication System_Wireless communication is a broad term that i...
JeyaPerumal1
 
原版仿制(uob毕业证书)英国伯明翰大学毕业证本科学历证书原版一模一样
原版仿制(uob毕业证书)英国伯明翰大学毕业证本科学历证书原版一模一样原版仿制(uob毕业证书)英国伯明翰大学毕业证本科学历证书原版一模一样
原版仿制(uob毕业证书)英国伯明翰大学毕业证本科学历证书原版一模一样
3ipehhoa
 
一比一原版(SLU毕业证)圣路易斯大学毕业证成绩单专业办理
一比一原版(SLU毕业证)圣路易斯大学毕业证成绩单专业办理一比一原版(SLU毕业证)圣路易斯大学毕业证成绩单专业办理
一比一原版(SLU毕业证)圣路易斯大学毕业证成绩单专业办理
keoku
 
guildmasters guide to ravnica Dungeons & Dragons 5...
guildmasters guide to ravnica Dungeons & Dragons 5...guildmasters guide to ravnica Dungeons & Dragons 5...
guildmasters guide to ravnica Dungeons & Dragons 5...
Rogerio Filho
 
History+of+E-commerce+Development+in+China-www.cfye-commerce.shop
History+of+E-commerce+Development+in+China-www.cfye-commerce.shopHistory+of+E-commerce+Development+in+China-www.cfye-commerce.shop
History+of+E-commerce+Development+in+China-www.cfye-commerce.shop
laozhuseo02
 
1比1复刻(bath毕业证书)英国巴斯大学毕业证学位证原版一模一样
1比1复刻(bath毕业证书)英国巴斯大学毕业证学位证原版一模一样1比1复刻(bath毕业证书)英国巴斯大学毕业证学位证原版一模一样
1比1复刻(bath毕业证书)英国巴斯大学毕业证学位证原版一模一样
3ipehhoa
 
This 7-second Brain Wave Ritual Attracts Money To You.!
This 7-second Brain Wave Ritual Attracts Money To You.!This 7-second Brain Wave Ritual Attracts Money To You.!
This 7-second Brain Wave Ritual Attracts Money To You.!
nirahealhty
 
BASIC C++ lecture NOTE C++ lecture 3.pptx
BASIC C++ lecture NOTE C++ lecture 3.pptxBASIC C++ lecture NOTE C++ lecture 3.pptx
BASIC C++ lecture NOTE C++ lecture 3.pptx
natyesu
 
test test test test testtest test testtest test testtest test testtest test ...
test test  test test testtest test testtest test testtest test testtest test ...test test  test test testtest test testtest test testtest test testtest test ...
test test test test testtest test testtest test testtest test testtest test ...
Arif0071
 
Latest trends in computer networking.pptx
Latest trends in computer networking.pptxLatest trends in computer networking.pptx
Latest trends in computer networking.pptx
JungkooksNonexistent
 
一比一原版(CSU毕业证)加利福尼亚州立大学毕业证成绩单专业办理
一比一原版(CSU毕业证)加利福尼亚州立大学毕业证成绩单专业办理一比一原版(CSU毕业证)加利福尼亚州立大学毕业证成绩单专业办理
一比一原版(CSU毕业证)加利福尼亚州立大学毕业证成绩单专业办理
ufdana
 
一比一原版(LBS毕业证)伦敦商学院毕业证成绩单专业办理
一比一原版(LBS毕业证)伦敦商学院毕业证成绩单专业办理一比一原版(LBS毕业证)伦敦商学院毕业证成绩单专业办理
一比一原版(LBS毕业证)伦敦商学院毕业证成绩单专业办理
eutxy
 
Bridging the Digital Gap Brad Spiegel Macon, GA Initiative.pptx
Bridging the Digital Gap Brad Spiegel Macon, GA Initiative.pptxBridging the Digital Gap Brad Spiegel Macon, GA Initiative.pptx
Bridging the Digital Gap Brad Spiegel Macon, GA Initiative.pptx
Brad Spiegel Macon GA
 
急速办(bedfordhire毕业证书)英国贝德福特大学毕业证成绩单原版一模一样
急速办(bedfordhire毕业证书)英国贝德福特大学毕业证成绩单原版一模一样急速办(bedfordhire毕业证书)英国贝德福特大学毕业证成绩单原版一模一样
急速办(bedfordhire毕业证书)英国贝德福特大学毕业证成绩单原版一模一样
3ipehhoa
 
The+Prospects+of+E-Commerce+in+China.pptx
The+Prospects+of+E-Commerce+in+China.pptxThe+Prospects+of+E-Commerce+in+China.pptx
The+Prospects+of+E-Commerce+in+China.pptx
laozhuseo02
 
Internet-Security-Safeguarding-Your-Digital-World (1).pptx
Internet-Security-Safeguarding-Your-Digital-World (1).pptxInternet-Security-Safeguarding-Your-Digital-World (1).pptx
Internet-Security-Safeguarding-Your-Digital-World (1).pptx
VivekSinghShekhawat2
 

Recently uploaded (20)

JAVIER LASA-EXPERIENCIA digital 1986-2024.pdf
JAVIER LASA-EXPERIENCIA digital 1986-2024.pdfJAVIER LASA-EXPERIENCIA digital 1986-2024.pdf
JAVIER LASA-EXPERIENCIA digital 1986-2024.pdf
 
Multi-cluster Kubernetes Networking- Patterns, Projects and Guidelines
Multi-cluster Kubernetes Networking- Patterns, Projects and GuidelinesMulti-cluster Kubernetes Networking- Patterns, Projects and Guidelines
Multi-cluster Kubernetes Networking- Patterns, Projects and Guidelines
 
APNIC Foundation, presented by Ellisha Heppner at the PNG DNS Forum 2024
APNIC Foundation, presented by Ellisha Heppner at the PNG DNS Forum 2024APNIC Foundation, presented by Ellisha Heppner at the PNG DNS Forum 2024
APNIC Foundation, presented by Ellisha Heppner at the PNG DNS Forum 2024
 
Comptia N+ Standard Networking lesson guide
Comptia N+ Standard Networking lesson guideComptia N+ Standard Networking lesson guide
Comptia N+ Standard Networking lesson guide
 
1.Wireless Communication System_Wireless communication is a broad term that i...
1.Wireless Communication System_Wireless communication is a broad term that i...1.Wireless Communication System_Wireless communication is a broad term that i...
1.Wireless Communication System_Wireless communication is a broad term that i...
 
原版仿制(uob毕业证书)英国伯明翰大学毕业证本科学历证书原版一模一样
原版仿制(uob毕业证书)英国伯明翰大学毕业证本科学历证书原版一模一样原版仿制(uob毕业证书)英国伯明翰大学毕业证本科学历证书原版一模一样
原版仿制(uob毕业证书)英国伯明翰大学毕业证本科学历证书原版一模一样
 
一比一原版(SLU毕业证)圣路易斯大学毕业证成绩单专业办理
一比一原版(SLU毕业证)圣路易斯大学毕业证成绩单专业办理一比一原版(SLU毕业证)圣路易斯大学毕业证成绩单专业办理
一比一原版(SLU毕业证)圣路易斯大学毕业证成绩单专业办理
 
guildmasters guide to ravnica Dungeons & Dragons 5...
guildmasters guide to ravnica Dungeons & Dragons 5...guildmasters guide to ravnica Dungeons & Dragons 5...
guildmasters guide to ravnica Dungeons & Dragons 5...
 
History+of+E-commerce+Development+in+China-www.cfye-commerce.shop
History+of+E-commerce+Development+in+China-www.cfye-commerce.shopHistory+of+E-commerce+Development+in+China-www.cfye-commerce.shop
History+of+E-commerce+Development+in+China-www.cfye-commerce.shop
 
1比1复刻(bath毕业证书)英国巴斯大学毕业证学位证原版一模一样
1比1复刻(bath毕业证书)英国巴斯大学毕业证学位证原版一模一样1比1复刻(bath毕业证书)英国巴斯大学毕业证学位证原版一模一样
1比1复刻(bath毕业证书)英国巴斯大学毕业证学位证原版一模一样
 
This 7-second Brain Wave Ritual Attracts Money To You.!
This 7-second Brain Wave Ritual Attracts Money To You.!This 7-second Brain Wave Ritual Attracts Money To You.!
This 7-second Brain Wave Ritual Attracts Money To You.!
 
BASIC C++ lecture NOTE C++ lecture 3.pptx
BASIC C++ lecture NOTE C++ lecture 3.pptxBASIC C++ lecture NOTE C++ lecture 3.pptx
BASIC C++ lecture NOTE C++ lecture 3.pptx
 
test test test test testtest test testtest test testtest test testtest test ...
test test  test test testtest test testtest test testtest test testtest test ...test test  test test testtest test testtest test testtest test testtest test ...
test test test test testtest test testtest test testtest test testtest test ...
 
Latest trends in computer networking.pptx
Latest trends in computer networking.pptxLatest trends in computer networking.pptx
Latest trends in computer networking.pptx
 
一比一原版(CSU毕业证)加利福尼亚州立大学毕业证成绩单专业办理
一比一原版(CSU毕业证)加利福尼亚州立大学毕业证成绩单专业办理一比一原版(CSU毕业证)加利福尼亚州立大学毕业证成绩单专业办理
一比一原版(CSU毕业证)加利福尼亚州立大学毕业证成绩单专业办理
 
一比一原版(LBS毕业证)伦敦商学院毕业证成绩单专业办理
一比一原版(LBS毕业证)伦敦商学院毕业证成绩单专业办理一比一原版(LBS毕业证)伦敦商学院毕业证成绩单专业办理
一比一原版(LBS毕业证)伦敦商学院毕业证成绩单专业办理
 
Bridging the Digital Gap Brad Spiegel Macon, GA Initiative.pptx
Bridging the Digital Gap Brad Spiegel Macon, GA Initiative.pptxBridging the Digital Gap Brad Spiegel Macon, GA Initiative.pptx
Bridging the Digital Gap Brad Spiegel Macon, GA Initiative.pptx
 
急速办(bedfordhire毕业证书)英国贝德福特大学毕业证成绩单原版一模一样
急速办(bedfordhire毕业证书)英国贝德福特大学毕业证成绩单原版一模一样急速办(bedfordhire毕业证书)英国贝德福特大学毕业证成绩单原版一模一样
急速办(bedfordhire毕业证书)英国贝德福特大学毕业证成绩单原版一模一样
 
The+Prospects+of+E-Commerce+in+China.pptx
The+Prospects+of+E-Commerce+in+China.pptxThe+Prospects+of+E-Commerce+in+China.pptx
The+Prospects+of+E-Commerce+in+China.pptx
 
Internet-Security-Safeguarding-Your-Digital-World (1).pptx
Internet-Security-Safeguarding-Your-Digital-World (1).pptxInternet-Security-Safeguarding-Your-Digital-World (1).pptx
Internet-Security-Safeguarding-Your-Digital-World (1).pptx
 

Cloud Messaging Service: Technical Overview

  • 1. Cloud Messaging Service Technical Overview P R E S E N T E D B Y M a t t e o M e r l i S e p t e m b e r 2 1 , 2 0 1 5
  • 2. Sections 2 1. Introduction 2. Architecture 3. Bookkeeper 4. Future 5. Q & A CMS - Technical Overview
  • 3. What is CMS 3 • Hosted Pub / Sub • Multi tenant (Auth / Quotas / Load Balancer) • Horizontally scalable • Highly available, durable and consistent storage • Geo Replication • In production since 2013 CMS - Technical Overview CMS Cluster Producer Broker Consumer Bookie ZK Global ZK Replication
  • 4. CMS key features
      • Multi-tenancy / hosted service
          • Operating a system at scale is hard and requires a deep understanding of its internals
          • Authentication / self-service provisioning / quotas
      • SLAs (write latency: 2 ms average, 5 ms 99th percentile)
          • Maintain the same latency and throughput while backlogs are being drained
      • Simple, high-level API with clear ordering, durability and consistency semantics
      • Geo-replication
          • A single API call configures the regions to replicate to
      • Load balancer: dynamically optimizes the assignment of topics to brokers
      • Support for a large number of topics
      • Store the subscription position
          • Applications don't need to store it
          • Data can be deleted as soon as it has been consumed
      • Round-robin distribution across multiple consumers
  • 5. Workload examples
      | Challenge            | # Topics | # Producers / topic | # Subscriptions / topic | Produced msg rate / s / topic |
      |----------------------|----------|---------------------|-------------------------|-------------------------------|
      | Fan-out              | 1        | 1                   | 1 K                     | 1 K                           |
      | Throughput & latency | 1        | 1                   | 1                       | 100 K                         |
      | # Topics & latency   | 1 M      | 1                   | 10                      | 10                            |
      | Fan-in               | 1        | 1 K                 | 1                       | > 100 K                       |
      • Designed to support a wide range of use cases
      • Needs to be cost-effective in every case
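As a rough sanity check on these workloads, the sketch below multiplies the per-topic figures into aggregate rates. It assumes, per the messaging model, that every subscription receives every message published to its topic. `WorkloadRates` and its methods are illustrative names, not part of CMS, and the fan-in row uses the table's lower bound.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative arithmetic over the workload figures; not part of CMS.
final class WorkloadRates {
    // Total messages/s published across all topics.
    static long producedPerSecond(long topics, long msgRatePerTopic) {
        return topics * msgRatePerTopic;
    }

    // Total messages/s dispatched: assumes each subscription receives
    // every message published to its topic.
    static long deliveredPerSecond(long topics, long msgRatePerTopic, long subsPerTopic) {
        return producedPerSecond(topics, msgRatePerTopic) * subsPerTopic;
    }

    public static void main(String[] args) {
        // {topics, subscriptions per topic, produced msg/s per topic}
        Map<String, long[]> rows = new LinkedHashMap<>();
        rows.put("Fan-out", new long[] {1, 1_000, 1_000});
        rows.put("Throughput & latency", new long[] {1, 1, 100_000});
        rows.put("# Topics & latency", new long[] {1_000_000, 10, 10});
        rows.put("Fan-in (lower bound)", new long[] {1, 1, 100_000});
        for (Map.Entry<String, long[]> e : rows.entrySet()) {
            long[] r = e.getValue();
            System.out.printf("%-22s produced=%,d msg/s delivered=%,d msg/s%n",
                    e.getKey(), producedPerSecond(r[0], r[2]),
                    deliveredPerSecond(r[0], r[2], r[1]));
        }
    }
}
```

For instance, the fan-out row works out to 1 K msg/s × 1 K subscriptions = 1 M deliveries/s from a single topic, while the "# Topics & latency" row adds up to 10 M msg/s produced and 100 M deliveries/s across the cluster.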
  • 7. Messaging model
      • Producers can attach to a topic and send messages to it
      • A subscription is a durable resource that receives every message sent to the topic after the subscription was created
      • Each subscription has a type:
          • "Exclusive": only one consumer is allowed to attach to the subscription. The first consumer decides the type.
          • "Shared": multiple consumers are allowed. Messages are distributed round-robin, with no ordering guarantees.
          • "Failover": multiple consumers are allowed, but only one receives messages at any given time while the others are in standby mode.
      • [Diagram: Producer-X and Producer-Y publish to a topic; Subscription-A (Exclusive) with Consumer-1, Subscription-B (Shared) with Consumer-2 and Consumer-3, Subscription-C (Failover) with Consumer-4 and Consumer-5]
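To make the three subscription types concrete, here is a simplified, single-threaded model of how a subscription might pick the consumer for each message. It is a sketch under stated assumptions, not the CMS dispatcher: `SubscriptionSketch` and its methods are hypothetical, and failover simply treats the earliest-attached consumer as the active one.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of subscription types; hypothetical names, not CMS code.
final class SubscriptionSketch {
    enum Type { EXCLUSIVE, SHARED, FAILOVER }

    private Type type;                               // fixed by the first consumer
    private final List<String> consumers = new ArrayList<>();
    private int nextShared = 0;                      // round-robin cursor for SHARED

    // Returns false if the consumer is rejected.
    boolean attach(String consumer, Type requested) {
        if (type == null) type = requested;          // the first consumer decides the type
        if (requested != type) return false;         // later consumers must match it
        if (type == Type.EXCLUSIVE && !consumers.isEmpty()) return false;
        consumers.add(consumer);
        return true;
    }

    void detach(String consumer) {
        consumers.remove(consumer);
    }

    // Chooses the consumer for the next message, or null if none is attached.
    String dispatch() {
        if (consumers.isEmpty()) return null;
        if (type == Type.SHARED) {
            String c = consumers.get(nextShared % consumers.size());
            nextShared = (nextShared + 1) % consumers.size();
            return c;
        }
        // EXCLUSIVE has a single consumer; FAILOVER delivers only to the
        // active (earliest attached) consumer while the others stand by.
        return consumers.get(0);
    }
}
```

When the active failover consumer detaches, the next one in attach order takes over, which mirrors the standby behavior described on the slide.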
  • 8. Client API
      ▪ Exposes the messaging model concepts (producer / consumer)
      ▪ C++ and Java
      ▪ Connection pooling
      ▪ Handles recoverable failures transparently (reconnect / resend messages) without compromising ordering guarantees
      ▪ Sync / async version of every operation
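The following sketch shows one way a client can resend after a reconnect without breaking ordering: unacknowledged messages stay in a FIFO queue tagged with a sequence id and are replayed oldest-first, and the sync `send` just waits on `sendAsync`. The transport is simulated by a list, all names are hypothetical, and the broker is assumed to discard duplicates it has already persisted (for example by sequence id).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Sketch of ordered resend on reconnect; hypothetical names, simulated transport.
final class ResendingProducer {
    private static final class Pending {
        final long seqId;
        final byte[] payload;
        final CompletableFuture<Void> future = new CompletableFuture<>();

        Pending(long seqId, byte[] payload) {
            this.seqId = seqId;
            this.payload = payload;
        }
    }

    private final Deque<Pending> pending = new ArrayDeque<>(); // unacked, in send order
    final List<Long> wire = new ArrayList<>();                 // simulated connection
    private long nextSeqId = 0;
    private boolean connected = true;

    // Async version: the future completes when the broker acknowledges the message.
    synchronized CompletableFuture<Void> sendAsync(byte[] payload) {
        Pending p = new Pending(nextSeqId++, payload);
        pending.addLast(p);
        if (connected) wire.add(p.seqId);
        return p.future;
    }

    // Sync version: simply waits on the async one.
    void send(byte[] payload) {
        sendAsync(payload).join();
    }

    // Acks arrive in order; complete every pending message up to seqId.
    synchronized void onAck(long seqId) {
        while (!pending.isEmpty() && pending.peekFirst().seqId <= seqId) {
            pending.pollFirst().future.complete(null);
        }
    }

    synchronized void onDisconnect() {
        connected = false;
    }

    // After reconnecting, resend everything still unacked, oldest first.
    synchronized void onReconnect() {
        connected = true;
        for (Pending p : pending) wire.add(p.seqId);
    }
}
```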
  • 9. Java producer example

```java
CmsClient client = CmsClient.create("http://<broker vip>:4080");
Producer producer = client.createProducer("my-topic");

// Handles retries in case of failure
producer.send("my-message".getBytes());

// Async version:
producer.sendAsync("my-message".getBytes()).thenRun(() -> {
    // Message was persisted
});
```
  • 10. Java consumer example

```java
CmsClient client = CmsClient.create("http://<broker vip>:4080");
Consumer consumer = client.subscribe("my-topic", "my-subscription-name",
        SubscriptionType.Exclusive);

// Blocks until a message is available
Message msg = consumer.receive();

// Do something...

consumer.acknowledge(msg);
```
  • 11. System overview
      • Broker
          • Stateless
          • Maintains an in-memory cache of messages
          • Reads from Bookkeeper on a cache miss
      • Bookkeeper
          • Distributed write-ahead log
          • Create many ledgers
          • Append entries
          • Read entries
          • Delete ledger
          • Consistent reads
          • Single writer (the broker)
      • [Diagram: CMS cluster with a broker (native dispatcher, managed ledger, BK client, global replicators, cache, load balancer), bookies, ZK and Global ZK with replication; producer and consumer apps connect through the CMS client]
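A minimal read-through cache illustrates the broker's "cache first, Bookkeeper on a miss" read path. This is a sketch under assumptions: the class name, the LRU eviction policy, and the `(ledgerId, entryId)` reader function standing in for a Bookkeeper read are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiFunction;

// Read-through entry cache; hypothetical names. storageReader stands in
// for reading an entry from Bookkeeper.
final class EntryCache {
    private final int maxEntries;
    private final BiFunction<Long, Long, byte[]> storageReader; // (ledgerId, entryId) -> data
    private final Map<String, byte[]> cache;
    int misses = 0;

    EntryCache(int maxEntries, BiFunction<Long, Long, byte[]> storageReader) {
        this.maxEntries = maxEntries;
        this.storageReader = storageReader;
        // accessOrder = true makes LinkedHashMap evict the least recently used entry.
        this.cache = new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
                return size() > EntryCache.this.maxEntries;
            }
        };
    }

    // Called on the write path so that consumers reading the tail hit the cache.
    void put(long ledgerId, long entryId, byte[] data) {
        cache.put(ledgerId + ":" + entryId, data);
    }

    byte[] read(long ledgerId, long entryId) {
        String key = ledgerId + ":" + entryId;
        byte[] data = cache.get(key);
        if (data == null) {
            misses++;                                   // fall back to storage
            data = storageReader.apply(ledgerId, entryId);
            cache.put(key, data);
        }
        return data;
    }
}
```

Because the broker keeps no other state, a cache like this can be dropped at any time; at worst, reads go to Bookkeeper until it warms up again.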
  • 12. System overview
      • Native dispatcher
          • Async Netty server
      • Global replicators
          • If a topic is global, republish its messages in the other regions
      • Global ZooKeeper
          • ZK instance with participants in multiple US regions
          • Consistent data store for customer configuration
          • Accepts writes with one region down
      • [Diagram: CMS cluster components, as on slide 11]
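ZooKeeper commits a write only when a strict majority of its voting participants agrees, so "accepts writes with one region down" constrains how participants are spread across regions. The helper below checks that property. `RegionQuorum` is a hypothetical name, and the region names and participant counts in the example are hypothetical, not the actual Global ZK layout.

```java
import java.util.Map;

// Checks whether a ZooKeeper ensemble keeps a majority quorum after losing
// any single region. Hypothetical helper; counts voting participants only.
final class RegionQuorum {
    static boolean survivesAnySingleRegionLoss(Map<String, Integer> participantsPerRegion) {
        int total = participantsPerRegion.values().stream().mapToInt(Integer::intValue).sum();
        for (int lost : participantsPerRegion.values()) {
            int remaining = total - lost;
            // A strict majority of all voting participants is required.
            if (remaining * 2 <= total) return false;
        }
        return true;
    }
}
```

With 2 + 2 + 1 participants in three regions, losing any one region still leaves at least 3 of 5; with 3 + 3 in two regions, losing either region leaves exactly half, which is not a quorum. This is why such an ensemble needs at least three regions.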
  • 13. Partitioned topics
      ▪ The client library has a wrapper producer / consumer implementation
      ▪ No API changes
      ▪ Producers can decide how to assign messages to partitions:
          ▪ Single partition
          ▪ Round robin
          ▪ Provide a key on the message
              ▪ The hash of the key determines the partition
          ▪ Custom routing
      ▪ [Diagram: an app's producer for topic T1 writes to partitions P0 to P4; partitions T1-P0 to T1-P4 are spread across Broker 1, Broker 2 and Broker 3]
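The routing modes above can be sketched as a small strategy interface. `MessageRouter` and its implementations are hypothetical names, and `String.hashCode` stands in for whatever hash function the CMS client actually uses; a custom routing policy would simply be another implementation.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of partition routing strategies; hypothetical names, not the CMS API.
interface MessageRouter {
    int choosePartition(String key, int numPartitions);
}

// Every message goes to one fixed partition.
final class SinglePartitionRouter implements MessageRouter {
    private final int partition;

    SinglePartitionRouter(int partition) {
        this.partition = partition;
    }

    public int choosePartition(String key, int numPartitions) {
        return partition % numPartitions;
    }
}

// Messages cycle through partitions in turn.
final class RoundRobinRouter implements MessageRouter {
    private final AtomicInteger counter = new AtomicInteger();

    public int choosePartition(String key, int numPartitions) {
        return Math.floorMod(counter.getAndIncrement(), numPartitions);
    }
}

// Same key -> same partition, so per-key ordering is preserved.
final class KeyHashRouter implements MessageRouter {
    public int choosePartition(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }
}
```

Key-based routing keeps all messages with the same key in one partition, and therefore in order, while round robin spreads load evenly but gives no ordering across messages that land on different partitions.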
  • 14. Partitioned topics
      ▪ Consumers can use all subscription types, with the same semantics
      ▪ With the "Failover" subscription type, the election is done per partition
          ▪ Partition assignments are spread evenly across all available consumers
          ▪ No need for ZK coordination
      ▪ [Diagram: two apps (Consumer-1 and Consumer-2), each with per-partition consumers C0 to C4 for topic T1, attached to partitions T1-P0 to T1-P4 spread across Broker 1, Broker 2 and Broker 3]
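One way to get an even, coordination-free spread of active consumers is for each partition to pick its active consumer deterministically from the set of connected consumers. The scheme below (sort the consumer names, index by partition number) is an illustrative assumption, not necessarily what CMS does; it only requires that every partition sees the same set of consumers, and `FailoverAssignment` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Deterministic per-partition failover election; illustrative scheme only.
final class FailoverAssignment {
    // Returns the active consumer for a partition, or null if none is connected.
    static String activeConsumer(int partitionIndex, List<String> consumers) {
        if (consumers.isEmpty()) return null;
        List<String> sorted = new ArrayList<>(consumers);
        Collections.sort(sorted);   // same order on every broker, no coordination needed
        return sorted.get(partitionIndex % sorted.size());
    }
}
```

With Consumer-1 and Consumer-2 attached to five partitions, this makes Consumer-1 active on P0, P2 and P4 and Consumer-2 on P1 and P3; if one consumer disconnects, every partition fails over to the remaining one.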
  • 16. CMS Bookkeeper usage
      ▪ CMS uses Bookkeeper through a higher-level interface, ManagedLedger:
          › A single managed ledger represents the storage of a single topic
          › Maintains the list of currently active BK ledgers
          › Maintains the subscription positions, using an additional ledger to checkpoint the last acknowledged message in the stream
          › Caches data
          › Deletes ledgers when all cursors are done with them
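Deleting ledgers once all cursors are done with them comes down to finding the slowest cursor. The sketch below is a simplified model under stated assumptions: ledger ids increase in creation order, each cursor is reduced to the id of the ledger holding its last acknowledged message, and `LedgerTrimmer` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.SortedSet;

// Simplified ledger trimming; hypothetical name. Assumes ledger ids increase
// in creation order and each cursor is reduced to the id of the ledger that
// holds its last acknowledged message.
final class LedgerTrimmer {
    static List<Long> deletableLedgers(SortedSet<Long> ledgers, Collection<Long> cursorLedgers) {
        List<Long> deletable = new ArrayList<>();
        // With no cursors this sketch keeps everything (a policy choice here).
        if (cursorLedgers.isEmpty()) return deletable;
        // The slowest cursor bounds what may be removed.
        long slowest = cursorLedgers.stream().mapToLong(Long::longValue).min().getAsLong();
        // Ledgers strictly before the slowest cursor's ledger are fully consumed
        // by every subscription; the cursor's own ledger is conservatively kept.
        for (long ledger : ledgers.headSet(slowest)) {
            deletable.add(ledger);
        }
        return deletable;
    }
}
```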
  • 17. Bookie internal structure
      • Writes go both to the journal and to the ledger storage (on separate devices)
      • Ledger storage writes are fsynced periodically
      • Reads come only from the ledger storage
      • Entries from different ledgers are interleaved in entry log files
      • Ledger indexes are used to find each entry's offset
  • 18. Bookkeeper issues
      ▪ Performance degrades when writing to many ledgers at the same time
      ▪ Under heavy reads, the ledger storage device slows down and impacts writes
      ▪ Ledger storage flushes need to fsync many ledger index files each time
  • 19. Bookie storage improvements
      • Writes go both to the journal and to an in-memory write cache
      • Entries are flushed periodically
      • Entries are sorted by ledger so that they are sequential on disk (per flush period)
      • Since entries are sequential, we added a read-ahead cache
      • The location index is mostly kept in memory and only updated during flush
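The sorted-flush idea can be sketched in memory: entries accumulate in a write cache, each flush sorts them by (ledger, entry) before appending them to the entry log, and the in-memory location index records where each entry landed. The journal write and the read-ahead cache are omitted, the entry log is a byte buffer, and `SortedFlushStorage` is a hypothetical name.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory sketch of the sorted-flush write path; hypothetical names.
// The journal write and the read-ahead cache are intentionally omitted.
final class SortedFlushStorage {
    private static final class Entry {
        final long ledgerId;
        final long entryId;
        final byte[] data;

        Entry(long ledgerId, long entryId, byte[] data) {
            this.ledgerId = ledgerId;
            this.entryId = entryId;
            this.data = data;
        }
    }

    private final List<Entry> writeCache = new ArrayList<>();
    private final ByteArrayOutputStream entryLog = new ByteArrayOutputStream();
    // "ledgerId:entryId" -> {offset, length}; kept in memory, updated only on flush.
    final Map<String, long[]> locationIndex = new HashMap<>();

    void add(long ledgerId, long entryId, byte[] data) {
        writeCache.add(new Entry(ledgerId, entryId, data));
    }

    void flush() {
        // Sort by ledger, then entry, so each ledger's entries become contiguous.
        writeCache.sort(Comparator.<Entry>comparingLong(e -> e.ledgerId)
                .thenComparingLong(e -> e.entryId));
        for (Entry e : writeCache) {
            long offset = entryLog.size();
            entryLog.write(e.data, 0, e.data.length);
            locationIndex.put(e.ledgerId + ":" + e.entryId, new long[] {offset, e.data.length});
        }
        writeCache.clear();
    }

    byte[] read(long ledgerId, long entryId) {
        // Entries not yet flushed are served from the write cache.
        for (Entry e : writeCache) {
            if (e.ledgerId == ledgerId && e.entryId == entryId) return e.data;
        }
        long[] loc = locationIndex.get(ledgerId + ":" + entryId);
        if (loc == null) return null;
        byte[] log = entryLog.toByteArray();
        return Arrays.copyOfRange(log, (int) loc[0], (int) (loc[0] + loc[1]));
    }
}
```

Even if writes for two ledgers arrive interleaved, after a flush each ledger's entries occupy one contiguous region of the log, which is what makes sequential read-ahead worthwhile.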
  • 20. Bookkeeper write latency
      ▪ After hardware, the next limit to achieving low latency is JVM GC
      ▪ GC pauses are unavoidable. Try to keep them around 50 ms and as infrequent as possible
          › Switched the BK client and servers to Netty pooled, ref-counted buffers in direct memory, to hide payloads from the GC and eliminate payload copies
          › Extensively profiled allocations and substantially reduced per-entry object allocations
              • Recycler pattern to pool objects (very efficient for same-thread allocate/release)
              • Primitive collections
              • Array queues instead of linked queues in executors
              • Open hash maps instead of linked hash maps
              • BTree instead of ConcurrentSkipList
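The Recycler pattern boils down to a per-thread free list: an object released on a thread is handed back by the next allocation on that same thread, so hot-path objects stop becoming garbage. This is a minimal illustration, not Netty's `io.netty.util.Recycler`; `ThreadLocalPool` is a hypothetical name.

```java
import java.util.ArrayDeque;
import java.util.function.Supplier;

// Minimal per-thread object pool in the spirit of the Recycler pattern.
// Hypothetical illustration; not Netty's io.netty.util.Recycler.
final class ThreadLocalPool<T> {
    private final Supplier<T> factory;
    private final int maxPerThread;
    private final ThreadLocal<ArrayDeque<T>> free = ThreadLocal.withInitial(ArrayDeque::new);

    ThreadLocalPool(Supplier<T> factory, int maxPerThread) {
        this.factory = factory;
        this.maxPerThread = maxPerThread;
    }

    // Reuses an object released on this thread, or allocates a new one.
    T get() {
        T obj = free.get().pollFirst();
        return obj != null ? obj : factory.get();
    }

    // The caller must reset the object's state and stop using it afterwards.
    // Objects beyond the per-thread cap are left for the GC.
    void recycle(T obj) {
        ArrayDeque<T> stack = free.get();
        if (stack.size() < maxPerThread) stack.addFirst(obj);
    }
}
```

Because each thread only touches its own free list, allocate and release need no synchronization, which is why the pattern is cheapest when the same thread does both.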
  • 21. Bookie ledgers scalability
      • [Figure: BK write latency (ms), average and 99th percentile, vs. number of ledgers per bookie (1 to 50,000); single bookie at 15K write/s]
  • 23. Auto batching
      ▪ Send messages in batches throughout the system
      ▪ Transparent to the application
      ▪ Configurable batching time and size, e.g. 1 ms / 128 KB
      ▪ For the same byte/s throughput, fewer txn/s through the system
          › Less CPU usage in brokers and bookies
          › Lower GC pressure
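Auto batching can be sketched as a buffer that is sent when either the size limit or the time limit (for example 128 KB or 1 ms) is reached, whichever comes first. Time is passed in explicitly to keep the sketch deterministic, a real client would also flush from a timer, and `Batcher` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Size- or time-triggered batching sketch; hypothetical names.
final class Batcher {
    private final int maxBytes;
    private final long maxDelayNanos;
    private final List<byte[]> current = new ArrayList<>();
    private int currentBytes = 0;
    private long firstMessageNanos = 0;
    final List<List<byte[]>> sentBatches = new ArrayList<>(); // one element per send

    Batcher(int maxBytes, long maxDelayNanos) {
        this.maxBytes = maxBytes;
        this.maxDelayNanos = maxDelayNanos;
    }

    void add(byte[] msg, long nowNanos) {
        if (current.isEmpty()) firstMessageNanos = nowNanos;
        current.add(msg);
        currentBytes += msg.length;
        maybeFlush(nowNanos);
    }

    // Also meant to be called periodically by a timer.
    void maybeFlush(long nowNanos) {
        if (current.isEmpty()) return;
        if (currentBytes >= maxBytes || nowNanos - firstMessageNanos >= maxDelayNanos) {
            sentBatches.add(new ArrayList<>(current));   // one transaction for the whole batch
            current.clear();
            currentBytes = 0;
        }
    }
}
```

As a worked example of the txn/s reduction: at 100 K msg/s of 1 KB messages, a 1 ms window collects about 100 messages (about 100 KB, under a 128 KB cap), so roughly 1,000 batch sends per second replace 100,000 individual sends.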
  • 24. Low durability
      ▪ The current throughput bottleneck for bookie writes is journal syncs
      ▪ We could add more bookies, but at a higher cost
      ▪ Some use cases can tolerate losing data on rare occasions
      ▪ Solution
          › Store data in bookies
              • No memory limitation, so a big backlog can build up
          › Don't write to the bookie journal
              • Data is stored in the write cache of 2 bookies + the broker cache
          › Can lose < 1 min of data if 1 broker and 2 bookies crash
      ▪ Higher throughput with fewer bookies
      ▪ Lower publish latency
  • 25. 5. Q & A