Architecting a High-Performance
Distributed Message Queuing System
Vitaly Dzhitenov
Senior Engineer at Bloomberg
Agenda
■ BlazingMQ overview
■ Distributed architecture
■ Key performance-related ideas
■ Broker architecture
■ Fixing bottlenecks
What is BlazingMQ?
■ Multi-producer, multi-consumer message queue
■ Physical decoupling, as well as temporal isolation, between the actors
■ Guaranteed acknowledgment
■ Message persistence and replication
■ High availability
■ Transport abstraction
■ Scalability (just add more workers / applications); high fan-out ratio (1:6,000+)
Distributed architecture
[Diagram: four Apps connect through two Proxies to a cluster of four Nodes spread across Data Center 1 and Data Center 2. Node labels: Leader, Primary, Replica, Replica.]
Queue trajectories
[Diagram: a tree rooted at the Primary, with Replicas and Proxies below it. Producers send PUTs through Proxies; PUSHes reach Consumers through Replicas and Proxies.]
BlazingMQ at Bloomberg
■ Battle-tested in production for eight (8) years
■ 55,000+ queues
■ Processing billions of messages and terabytes of data daily
■ Low latency
● At 600,000 msg/sec to a non-persistent queue with a fan-out ratio of 5, the median latency is 1.7 ms
● At 150,000 msg/sec across 10 persistent queues, the median latency is 1.4 ms
● https://bloomberg.github.io/blazingmq/docs/performance/benchmarks/
Performance
■ Actor thread model
■ Batching
■ Memory and object pools, polymorphic allocators
■ No data copying
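To illustrate the pooling and polymorphic-allocator idea, here is a minimal sketch that uses the standard library's std::pmr as a stand-in. It is not BlazingMQ's own allocator or pool code; the names buildBatch and runOnStackArena, the payload text, and the 4096-byte buffer are assumptions made for this example. A batch of strings is built entirely inside a pre-allocated arena, so the hot path never touches the global heap.

```cpp
#include <array>
#include <cstddef>
#include <memory_resource>
#include <string>
#include <vector>

// Hypothetical helper: builds `count` copies of a payload inside `arena` and
// returns the total number of payload bytes. Every allocation made by the
// vector and by its strings goes through `arena`.
std::size_t buildBatch(std::size_t count, std::pmr::memory_resource* arena) {
    std::pmr::vector<std::pmr::string> batch(arena);
    batch.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        // The vector passes its allocator on to each string it constructs.
        batch.emplace_back("message-payload-0123456789");
    }
    std::size_t bytes = 0;
    for (const auto& payload : batch) {
        bytes += payload.size();
    }
    return bytes;
}

// Hypothetical helper: runs one batch out of a fixed stack buffer. The
// upstream resource is null_memory_resource, so any attempt to spill to the
// heap would throw std::bad_alloc instead of silently allocating.
std::size_t runOnStackArena(std::size_t count) {
    std::array<std::byte, 4096> buffer;
    std::pmr::monotonic_buffer_resource arena(
        buffer.data(), buffer.size(), std::pmr::null_memory_resource());
    return buildBatch(count, &arena);
}
```

Because the memory resource is passed in as a pointer, the same buildBatch code can run on a stack arena, a pool resource, or the default heap without being recompiled, which is the main appeal of polymorphic allocators.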
Actors
■ Client
● Reading/writing to client
● Statistics and validation
● Queue lookup
■ Queue
● Storage and replication
● Data routing
■ Cluster
● Reading/writing to cluster nodes
● Cluster health
● Primary node
● Queue lookup
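The actor idea behind this list can be sketched as an object that owns its state and processes events from a mailbox on a single thread, so that state needs no locking of its own. This is a minimal illustration under assumptions, not BlazingMQ's dispatcher: the class Actor, the function countPuts, and the mutex-protected std::queue mailbox are all hypothetical.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical actor: any thread may post events, but only the actor's own
// worker thread ever executes them.
class Actor {
  public:
    Actor() : worker_([this] { run(); }) {}

    ~Actor() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        ready_.notify_one();
        worker_.join();  // the worker drains the mailbox before exiting
    }

    void post(std::function<void()> event) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            mailbox_.push(std::move(event));
        }
        ready_.notify_one();
    }

  private:
    void run() {
        for (;;) {
            std::function<void()> event;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                ready_.wait(lock, [this] { return stopping_ || !mailbox_.empty(); });
                if (mailbox_.empty()) {
                    return;  // stopping was requested and nothing is left
                }
                event = std::move(mailbox_.front());
                mailbox_.pop();
            }
            event();  // executed outside the lock, on the actor's thread only
        }
    }

    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<std::function<void()>> mailbox_;
    bool stopping_ = false;
    std::thread worker_;  // declared last so it starts after the other members
};

// Hypothetical usage: several producer threads post "PUT" events to one
// queue actor. `handled` is modified only on the actor's thread.
long countPuts(int producers, int putsEach) {
    long handled = 0;
    {
        Actor queueActor;
        std::vector<std::thread> threads;
        for (int p = 0; p < producers; ++p) {
            threads.emplace_back([&queueActor, &handled, putsEach] {
                for (int i = 0; i < putsEach; ++i) {
                    queueActor.post([&handled] { ++handled; });
                }
            });
        }
        for (auto& t : threads) {
            t.join();
        }
    }  // the Actor destructor drains the mailbox and joins the worker
    return handled;
}
```

The counter is incremented without atomics because only one thread ever touches it; the only synchronization is at the mailbox boundary.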
[Diagram labels: Primary, Replica, Replica, Proxy, Proxy.]
Actor Model
[Diagram: PUT and PUSH traffic between a Producer and a Consumer, passing through chains of Client, Queue, and Cluster actors.]
Event Dispatcher
[Diagram labels: DispatcherClient type (CLIENT, QUEUE, CLUSTER); ThreadPool; Queue Processors; Monitored SingleConsumerQueues; Clients; EventPool.]
Batching
■ Batch builders for every data type
■ Flushing (to the network) on:
● Size limit, fixed or auto-tuning
● Dispatcher queue idleness
● Intelligent batching decisions:
■ Adjustable batch size
■ Interdependent flushing
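The two basic flush triggers listed above (a size limit and dispatcher-queue idleness) can be sketched as follows. This is a simplified illustration, not BlazingMQ's batch builders: the class BatchBuilder, its onIdle hook, the byte-based limit, and the helper batchSizesFor are assumptions for the example, and auto-tuning and interdependent flushing are not modeled.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical batch builder: accumulates messages and hands a whole batch
// to `sink` (standing in for a network write) when a flush is triggered.
class BatchBuilder {
  public:
    using Sink = std::function<void(const std::vector<std::string>&)>;

    BatchBuilder(std::size_t maxBytes, Sink sink)
        : maxBytes_(maxBytes), sink_(std::move(sink)) {}

    void add(std::string message) {
        // Flush first if this message would push the batch over the limit.
        if (!pending_.empty() && pendingBytes_ + message.size() > maxBytes_) {
            flush();
        }
        pendingBytes_ += message.size();
        pending_.push_back(std::move(message));
        // Flush right away if the batch has reached the limit.
        if (pendingBytes_ >= maxBytes_) {
            flush();
        }
    }

    // Called by the owner when its event queue has run dry, so a partial
    // batch does not wait for more traffic.
    void onIdle() { flush(); }

    void flush() {
        if (pending_.empty()) {
            return;
        }
        sink_(pending_);
        pending_.clear();
        pendingBytes_ = 0;
    }

  private:
    std::size_t maxBytes_;
    Sink sink_;
    std::vector<std::string> pending_;
    std::size_t pendingBytes_ = 0;
};

// Hypothetical usage: feeds `messages` four-byte messages through a builder
// and records how many messages each flushed batch contained.
std::vector<std::size_t> batchSizesFor(std::size_t maxBytes, int messages) {
    std::vector<std::size_t> sizes;
    BatchBuilder builder(maxBytes, [&sizes](const std::vector<std::string>& batch) {
        sizes.push_back(batch.size());
    });
    for (int i = 0; i < messages; ++i) {
        builder.add("aaaa");
    }
    builder.onIdle();  // the queue went idle: send whatever is left
    return sizes;
}
```

With a 10-byte limit and five 4-byte messages, the size limit produces two batches of two messages and the idle flush sends the final single message.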
[Diagram labels: Proxy, Primary, Replica, Channel.]
Advanced batching
[Diagram: PUT, PUSH, and Replication traffic moving between Client, Queue, and Cluster actors and the network through Channels.]
Actor bottleneck
[Diagram: three Client and Queue pairs, all passing through a single Cluster actor.]
The solution
■ Separate Control and Data planes
■ Keep Cluster on the Control Plane and bypass it on the Data Plane
■ Queue takes over Context, Statistics, and Validation work on the Data Plane
■ Queue validates data using lockless synchronization with Cluster
● AtomicGate
■ Based on one atomic int
■ Many concurrent callers of the lockless, non-blocking AtomicGate::tryEnter
■ A single caller of AtomicGate::open and AtomicGate::closeAndDrain
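One way a gate with these three operations could be built on a single atomic int is sketched below. Only the names AtomicGate, tryEnter, open, and closeAndDrain come from the slide; the leave method, the bit layout (lowest bit as the closed flag, remaining bits as the entrant count), the initially closed state, and the yield-based drain loop are assumptions for illustration and may differ from BlazingMQ's actual implementation.

```cpp
#include <atomic>
#include <thread>

// Hypothetical gate built on one atomic int:
//   bit 0       - set while the gate is closed
//   other bits  - number of threads currently inside, in units of 2
class AtomicGate {
  public:
    // Lock-free and non-blocking: returns false at once if the gate is closed.
    bool tryEnter() {
        int state = state_.load();
        while ((state & k_closed) == 0) {
            // On failure, compare_exchange_weak reloads `state`, and the
            // loop condition re-checks the closed bit.
            if (state_.compare_exchange_weak(state, state + k_entrant)) {
                return true;
            }
        }
        return false;
    }

    // Assumed counterpart of tryEnter; call once per successful tryEnter.
    void leave() { state_.fetch_sub(k_entrant); }

    // Intended for a single controlling thread.
    void open() { state_.fetch_and(~k_closed); }

    // Intended for a single controlling thread: stops new entrants, then
    // waits until every thread already inside has called leave().
    void closeAndDrain() {
        state_.fetch_or(k_closed);
        while (state_.load() != k_closed) {
            std::this_thread::yield();
        }
    }

  private:
    static constexpr int k_closed = 1;
    static constexpr int k_entrant = 2;
    std::atomic<int> state_{k_closed};  // assumption: the gate starts closed
};
```

In this reading, a Queue on the data plane would wrap each validation in tryEnter and leave, while the Cluster on the control plane calls closeAndDrain before changing the state being validated and open afterwards, so neither side ever takes a lock.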
Published as Open Source!
■ https://github.com/bloomberg/blazingmq
■ https://bloomberg.github.io/blazingmq
■ https://bloomberg.github.io/blazingmq/docs/performance/benchmarks/
Vitaly Dzhitenov
vdzhitenov@bloomberg.net
@TechAtBloomberg
Thank you! Let’s connect.

Architecting a High-Performance (Open Source) Distributed Message Queuing System in C++