2. AMQP
Advanced Message Queuing Protocol (AMQP) is a protocol that defines features
such as message orientation, queuing, routing, security and reliability.
Point-to-Point and Pub-Sub patterns
The protocol sends data across the network as a stream of octets, so it does
not depend on any particular API (unlike JMS) and is independent of vendor and client
4. RabbitMQ Exchanges and Queues
The super simplified overview:
Publishers send messages to exchanges
Exchanges route messages to queues and other exchanges
RabbitMQ sends acknowledgements to publishers on message receipt
Consumers maintain persistent TCP connections with RabbitMQ and declare which
queue(s) they consume
RabbitMQ pushes messages to consumers
Consumers send acknowledgements of success/failure
Messages are removed from queues once consumed successfully
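The flow above can be sketched as a minimal in-memory simulation (illustrative only, not real RabbitMQ): an exchange routes a published message to its bound queues, and a message is removed only after the consumer acknowledges it.

```python
# Minimal in-memory sketch of the publish -> route -> consume -> ack flow.
# All names are illustrative; a real broker does this over AMQP.

class Broker:
    def __init__(self):
        self.bindings = {}   # exchange name -> list of bound queue names
        self.queues = {}     # queue name -> list of pending messages

    def bind(self, exchange, queue):
        self.bindings.setdefault(exchange, []).append(queue)
        self.queues.setdefault(queue, [])

    def publish(self, exchange, message):
        # Exchanges route messages to queues.
        for queue in self.bindings.get(exchange, []):
            self.queues[queue].append(message)
        return "ack"  # broker acknowledges receipt to the publisher

    def deliver(self, queue):
        # Pushed to the consumer, but not yet removed from the queue.
        return self.queues[queue][0]

    def consumer_ack(self, queue):
        # Removed only once consumed successfully.
        self.queues[queue].pop(0)

broker = Broker()
broker.bind("orders", "order-queue")
broker.publish("orders", "order #1")
msg = broker.deliver("order-queue")
broker.consumer_ack("order-queue")
```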
5. Routing
Fanout. Routes to all queues and exchanges that have a binding to the exchange.
The standard pub sub model.
Direct. Routes messages based on a Routing Key that the message carries with it,
set by the publisher. A routing key is a short string. Direct exchanges route
messages to queues/exchanges that have a Binding Key that exactly matches the
routing key.
Topic. Routes messages based on a routing key, but allows wildcard matching.
Headers. RabbitMQ allows custom headers to be added to messages. Headers
exchanges route messages according to those header values. Each binding
includes exact-match header values. Multiple values can be added to a binding,
with ANY or ALL values required to match.
Consistent Hashing. This is an exchange that hashes either the routing key or a
message header and routes to one queue only. This is useful when you need
processing order guarantees with scaled out consumers.
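The topic-exchange wildcard matching described above can be sketched in code. This is an illustrative re-implementation of the matching rules (in RabbitMQ, `*` matches exactly one dot-separated word and `#` matches zero or more), not RabbitMQ's own code:

```python
# Sketch of topic-exchange matching semantics: binding keys are
# dot-separated words where "*" matches exactly one word and "#"
# matches zero or more words.

def topic_matches(binding_key: str, routing_key: str) -> bool:
    def match(pattern, words):
        if not pattern:
            return not words
        head, rest = pattern[0], pattern[1:]
        if head == "#":
            # "#" can absorb zero or more words
            return any(match(rest, words[i:]) for i in range(len(words) + 1))
        if not words:
            return False
        if head == "*" or head == words[0]:
            return match(rest, words[1:])
        return False
    return match(binding_key.split("."), routing_key.split("."))

print(topic_matches("logs.*.error", "logs.app.error"))   # True
print(topic_matches("logs.#", "logs.app.db.error"))      # True
print(topic_matches("logs.*", "logs.app.db"))            # False
```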
8. Apache Kafka
Distributed because Kafka is deployed as a cluster of nodes, for both fault
tolerance and scale
Replicated because messages are usually replicated across multiple nodes
(servers).
Commit Log because messages are stored in partitioned, append-only logs which
are called Topics. This concept of a log is the principal killer feature of Kafka.
10. Apache Kafka
Consumer Group
One producer, three partitions and one
consumer group with three consumers
Some consumers read from more than one
partition
13. Plugins
Shovel Plugin
The plugin transfers messages from a queue on
one broker to an exchange on another
broker
Shovel is just a well-written client
application
Reads messages from the source
Forwards messages to the destination
Deals with connection failures
Has plenty of options
Shovel has two modules
Shovel engine
Shovel management console
14. Plugins
Federation
Forwards messages between
brokers without clustering
Federation is designed to
scale out publish/subscribe
messaging across a WAN
15. Plugins
Sharding
Sharding is performed by exchanges
Queues will be automatically created on
every cluster node and messages will be
sharded across them
Consumers see a single logical queue backed by sharded queues
Logical queue images = sharded queues images.1, images.2, etc.
16. Plugins
Sharding
The plugin provides a new exchange type, "x-modulus-hash", that will use a hashing
function to partition messages routed to a logical queue across a number of regular queues
(shards).
"x-modulus-hash" hashes the routing key and picks the queue at index hash mod N (N is the number
of queues bound to the exchange)
If you only need partitioning, and automatic scaling of the number of shards is not needed, use the
Consistent Hash Exchange plugin
Automatic scaling
You have 4 sharded queues on one node; if a new node joins the cluster, 4 more queues are
automatically created on the new node
Total ordering of messages between shards is not defined.
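The modulus routing described above can be sketched as follows. The hash function and queue names here are illustrative (RabbitMQ uses its own internal hash); the point is that the same routing key always maps to the same shard for a fixed N:

```python
import hashlib

# Illustrative sketch of "x-modulus-hash"-style routing: hash the routing
# key and take it modulo N, where N is the number of queues bound to the
# exchange. md5 is used here only for the demo.

def pick_shard(routing_key: str, n_queues: int) -> str:
    digest = hashlib.md5(routing_key.encode()).hexdigest()
    index = int(digest, 16) % n_queues
    return f"images.{index + 1}"   # shards named images.1 .. images.N

# The same key always lands on the same shard:
print(pick_shard("customer-42", 4))
print(pick_shard("customer-42", 4))
```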
17. Plugins
Sharding
Consuming from Sharded Queue
The plugin will choose the queue from the shard
with the fewest consumers, provided
the queue contents are local to the broker you
are connected to.
Note: the plugin does not automatically
scale out consumers
Once you have enabled the plugin, you create an
exchange. If you want hash based routing, you
declare an x-modulus-hash exchange. Then you
apply a policy to it with the field “shards-per-
node”. Let’s say we have three nodes. If we set
“shards-per-node” to 5 then we’ll end up with 15
queues created across those nodes.
Load Distribution and Consumer Balancing
The "queue master locator" policy is used here
"Minimum masters" is a queue master
locator that is most in line with the goals of
this plugin.
For load balancers, the "least connections"
strategy is more likely to produce an even
distribution than round robin and
other strategies.
18. Plugins
Consistent Hash Exchange
Consistent Hashing exchange allows us to
partition a single queue into multiple
queues and distribute messages between
them via a hashing of the routing key,
message header or message property.
With a Topic or Direct exchange using the
Customer Id as the routing key, a
large number of customers would
soon mean thousands of queues.
Buckets
Another way would be to use consistent
hashing and a modulus function to create
a consistent routing key
var numberOfQueues = 10;
var routingKey = BitConverter.ToInt64(CreateMD5Hash(customerId)) % numberOfQueues;
19. Plugins
Consistent Hash Exchange
We use the resulting number (0 to N − 1 for
N buckets) as the routing key. Because it is based
on a consistent hash, customer 1000 will always
be placed in the same bucket
PROBLEM: if you change from 5 buckets
to 10, however, the bucket will most likely
not be the same anymore.
Bucketing of Customer Id to route to a fixed number of
queues
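The rebucketing problem above can be demonstrated directly. The hashing scheme is illustrative (md5 plus a modulus, mirroring the earlier C# snippet): growing from 5 to 10 buckets moves roughly half of the customers to a different bucket.

```python
import hashlib

# Demonstrates the rebucketing problem: with plain hash-mod-N bucketing,
# changing the bucket count moves many keys to a different bucket.

def bucket(customer_id: int, n_buckets: int) -> int:
    digest = hashlib.md5(str(customer_id).encode()).hexdigest()
    return int(digest, 16) % n_buckets

# Count how many of 1000 customers change bucket when going from 5 to 10.
moved = sum(1 for cid in range(1000) if bucket(cid, 5) != bucket(cid, 10))
print(f"{moved} of 1000 customers changed bucket")  # roughly half
```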
20. Plugins
Consistent Hash Exchange
A better way of doing it would be to
use the Consistent Hash Exchange.
Instead of calculating a hash ourselves, we
let the exchange do it. We simply use the
Customer Id as the routing key.
Messages are distributed by partitioned hash space
21. Plugins
CHE & Sharding
PROBLEMS
RabbitMQ doesn't help you coordinate
your consumers
If the thing you hash (routing key, message
header or property) doesn't have enough
variance, you won't get an even distribution. If
you only have four different values then
you might get unlucky and have them all go to a
single queue.
If you have relatively few queues, the
distribution may be uneven again.
PROBLEMS
Sharding is the act of taking a data set and
splitting it across multiple machines. Which
machine takes what subset of the data?
Consistent Hash Exchange
The distribution of keys (a key is the name of a
piece of data) across the machines is always
relatively even.
The addition or removal of a machine shifts
around roughly 1/N of the keys, where N is the
final number of nodes
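The two properties above can be sketched with a simple hash ring. This is an illustrative implementation (the virtual-node count and md5 hash are arbitrary choices): each node owns many points on a ring, a key belongs to the first node point clockwise from the key's hash, and adding a node moves only roughly 1/N of the keys.

```python
import bisect
import hashlib

# Sketch of a consistent hash ring. Each node is placed at many "virtual"
# points so key ownership is spread evenly across nodes.

def h(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

def build_ring(nodes, vnodes=100):
    # Sorted list of (point, node): each node owns `vnodes` ring points.
    return sorted((h(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))

def lookup(ring, key):
    # A key belongs to the first node point at or after its hash (wrapping).
    points = [p for p, _ in ring]
    idx = bisect.bisect(points, h(key)) % len(ring)
    return ring[idx][1]

keys = [f"key-{i}" for i in range(2000)]
ring3 = build_ring(["A", "B", "C"])
ring4 = build_ring(["A", "B", "C", "D"])  # one node added

moved = sum(1 for k in keys if lookup(ring3, k) != lookup(ring4, k))
print(f"{moved / len(keys):.0%} of keys moved")  # close to 1/4
```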
22. RabbitMQ Clustering
Clustering Client Connection
Clients connect through a software or hardware load balancer
Advantages: clients do not need to be aware of cluster membership
23. RabbitMQ Clustering
Clustering does not equal full HA
Standard default queues are not
replicated across nodes in the cluster
A single node in the cluster contains the
queue messages
Other nodes simply hold pointers to the
node with actual messages
24. RabbitMQ Clustering
Clustering + Mirrored Queues =
FULL HA
Mirrored queues replicate the messages to
other nodes in the cluster.
In the event of the node hosting the
master copy going down, another node
can take over
25. RabbitMQ Clustering
Client Side Failover
Clients must be aware of node failure
Clients must handle reconnections
It is important to understand the concepts
behind RabbitMQ clustering, since:
A failure is not fully transparent for a client
(especially a consumer)
Messages in a queue can be lost after a
failure
27. RabbitMQ HA
Failure of a Cluster Node
What happens if the “owner” node dies?
o Consumers lose their subscriptions
o New messages matching the bindings are silently discarded
If the queue was durable, it cannot be re-created
o The owner node must be restored
If the queue was not durable, it can be re-created
o Even if the former owner is not running
28. RabbitMQ HA
Mirrored Queues
A mirrored queue replicates its content on multiple cluster nodes
Every queue in RabbitMQ has a home node. That node is called queue master. All queue
operations go through the master first and then are replicated to mirrors. This is necessary to
guarantee FIFO ordering of messages.
The content of a mirrored queue lives on one master node
The content is replicated on other nodes of the cluster
Mirrored queues provide high availability
Mirrored queues also affect performance
30. RabbitMQ HA
Mirrored Queue
one master that receives all reads and writes
one or more mirrors that receive all messages and meta data from the master. These mirrors
do not exist for scaling out the reading of queues but solely for redundancy.
32. RabbitMQ HA
Failover on Client Side
Node failures are not transparent to clients
Clients must deal with node failures
Client bindings provide callbacks in case of errors
These callbacks make the recovery code easier to write
There are many failure scenarios: always test yours!
33. Apache Kafka Cluster
Distributed Systems: Communication
and Consensus
Worker node membership and naming
Configuration management
Leader election
Health status
Apache Zookeeper
Configuration information
Health status
Group membership
Worker node roles: Controllers, Leaders and
Followers
Reliability through replication
Consensus-based communication
38. RabbitMQ Reliability
What Can Go Wrong
The broker crashes ; sent but not-yet-consumed messages are lost!
The broker can provide persistence
Business processing after reception times out, or is temporarily down
The broker can redeliver the message
Messages must be sent as a group, but the application fails in the middle
Thanks to transactions, the broker can provide atomic operations
39. RabbitMQ Reliability
Durability and Persistence
Durability only works if queues are durable and messages are persistent
Durability is the ‘D’ in ACID
Surviving a server restart
Surviving a broker crash
Durability happens at three different levels:
Exchange
Queue
Message (referred to as “persistence”)
40. RabbitMQ Reliability
Transactions Across Senders and
Consumers
Transactions do not span the sender and consumer
A common transaction would couple the sender to the consumer
Messaging is all about decoupling!
Transaction semantics are only between the sender-broker and broker-consumer
42. Performance
Delivery Configuration on the client side
The broker can limit what it delivers to consumers
The consumer configures delivery at the channel level
The broker continues deliveries, depending on consumer’s ACK
Channel.basicQos() to specify the “quality of service”
Parameters:
prefetchCount: the maximum number of unacknowledged messages the server will deliver on a
channel (0 means there is no limit)
43. Performance
Quality of Service Configuration
basicQos() allows consumer-driven flow control
Consumer does not get overwhelmed
Method call is blocking when value is reached
Useless if auto-ack is enabled: messages are acknowledged as soon as they are delivered
45. Performance
Message Persistence
Be aware of performance implication
when specifying persistent messages
Every message will be written to disk
If publisher confirms and/or consumer
acknowledgements are used, this can have a
significant impact on performance
Performance impact due to additional disk
I/O
Clustering and Mirrored Queues
Certain types of cluster have mirrored queues that
replicate metadata and data to other nodes in
the cluster
Overhead to replicate all the data across the nodes
Commit and acknowledgment are also replicated
across the nodes
If persistent messages, and/or publisher confirms,
and/or consumer acknowledgements are used,
performance impact will be higher with mirrored
queues, as overhead is also mirrored
46. Performance
Memory-based Flow Control
RabbitMQ comes with a nice feature
Memory-based flow control
Used to prevent the broker from accepting too
much traffic from producers
So that memory is not filled in case of verbose
producers
Why it is needed
By default, RabbitMQ accepts messages
from clients even if they are not yet written
to disk
Accepting messages is much faster than writing
messages to disk
In a heavy-load scenario, without any flow
control, available memory could be completely
consumed
Memory-based Flow Control
How does it work
Producers are throttled to reduce throughput
Consumers are not affected
An alarm is raised in the log
The default memory threshold is 40% of the server's RAM
This does not prevent RabbitMQ from using more than
40% of RAM
It is just the point where flow control is turned on
Usage
Memory threshold value can be modified in
rabbitmq.config file
Value of 0 disables the flow control
[{rabbit, [{vm_memory_high_watermark, 0.4}]}]
47. Performance
Lazy Queues
Sometimes, queues fill up
Consumers cannot keep up, or are
shut down for maintenance
Consumers can be unstable
If your queues fill up regularly, use lazy
queues
Lazy queues are good when you need very
long queues
Millions of messages
Lazy queue semantics
Lazy queues
Write messages to disk if there is no consumer
Load messages into memory when requested by consumers
Consequences
Lazy queues consume much less RAM than default queues
Lazy queues increase I/O, but no more than using persistent
messages
Two ways to declare
Use an argument when declaring the queue
o x-queue-mode = lazy
Use a policy
o queue-mode = lazy
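The argument-based declaration above can be sketched with a client library. The `x-queue-mode` argument name is the one RabbitMQ defines; the pika library, connection details and queue name are assumptions for illustration, so the broker-dependent calls are shown as comments:

```python
# Sketch of declaring a lazy queue via a queue argument.
# "x-queue-mode" is the RabbitMQ-defined argument name.

lazy_args = {"x-queue-mode": "lazy"}

# With the pika client (assumed), the declaration would look like:
#
#   import pika
#   conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
#   channel = conn.channel()
#   channel.queue_declare(queue="events", durable=True, arguments=lazy_args)
#
# The policy alternative instead sets queue-mode = lazy on matching queues
# server-side, without changing the declaring application.
print(lazy_args)
```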
48. Performance
Best Practices on Client Side
Prefer confirms and acknowledgements over
transactions
These are lighter and generally satisfy most
requirements
Use durability and persistence only when
required
This also has a big impact on performance
Efficient design avoids having to use persistence
o Idempotency
o Retry logic
When dealing with multiple resources (queue +
database)
Use best effort pattern
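The idempotency point above can be sketched as follows. This is an illustrative consumer (all names are made up): since a broker may redeliver a message after a failure, the handler tracks processed message ids and skips duplicates, so retries are safe without broker-side transactions.

```python
# Illustrative sketch of an idempotent consumer: redeliveries of the same
# message id are detected and skipped, making retry logic safe.

processed_ids = set()
results = []

def handle(message_id: str, body: str) -> None:
    if message_id in processed_ids:
        return                      # duplicate redelivery: do nothing
    results.append(body)            # the actual business processing
    processed_ids.add(message_id)   # record only after success

handle("m1", "charge customer 42")
handle("m1", "charge customer 42")  # redelivered after a failure
handle("m2", "charge customer 7")
print(results)                      # the duplicate had no effect
```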
Best Practices on Server Side
Do not deactivate the flow control
mechanism
So the server won't crash when it gets
overwhelmed by publishers
Monitor the health of your broker
By checking the log
By having an external health-check
mechanism
49. Production RabbitMQ Cluster
Client Side Problems
Users are using RabbitMQ in a bad way
Client libraries are using RabbitMQ in a bad
way
Things are not done in an optimal way
Server Side Problems
Unstable RabbitMQ version
Unoptimized configuration for a specific
use case
HA
High Performance
50. Production RabbitMQ Cluster
CloudAMQP
Clusters are created on cloud instances (e.g. AWS EC2) and applications are installed by custom
bash scripts run from cloud-init
23,000 running instances, 7 clouds, 75 regions
51. Production RabbitMQ Cluster
Do not use too many connections
or channels
Don’t open and close connections or
channels repeatedly
AMQP connection: 7 TCP packets
AMQP channel: 2 TCP packets
AMQP publish: 1 TCP packet
AMQP close channel: 2 TCP packets
AMQP close connection: 2 TCP packets
Total: 14-19 packets (+ acks)
Keep connection/channel count low
Each connection uses about 100 KB of
RAM
Thousands of connections can be heavy
burden on a RabbitMQ server
Channel and connection leaks are
among the most common errors that we
see
52. Production RabbitMQ Cluster
Separate Connection for Publisher
and Consumer
Flow Control: Might not be able to
consume if the connection is in flow
control
Back pressure: RabbitMQ can apply back
pressure on the TCP connection when the
publisher is sending too many messages
Do not have too large queues
Keep fewer than 10,000 messages in one queue
Heavy load on RAM usage
o In order to free up RAM, RabbitMQ starts
paging messages out to disk
o This blocks the queue from processing
messages
Time-consuming to restart a cluster
Limit queue size with TTL or max-length
53. Production RabbitMQ Cluster
Enable lazy queues to get
predictable performance
Lazy queues were added in RabbitMQ 3.6
Writes messages to disk immediately, thus
spreading the work out over time instead of
taking the risk of a performance hit
somewhere down the road
More predictable and smooth performance
curve
o Messages are only loaded into memory when
they are needed
Enable lazy queues to get
predictable performance
Enable lazy queues if…
o The publisher is sending many messages at
once
o The consumers are not keeping up with the
speed of the publishers all the time
Ignore lazy queues if…
o You require high performance
o Queues are always short
54. Production RabbitMQ Cluster
Do not set RabbitMQ Management statistics rate mode to detailed
The RabbitMQ management collects and calculates metrics for every queue,
connection and channel in the cluster
Slows down the server if you have thousands of active queues and consumers
55. Production RabbitMQ Cluster
Split queues over different cores
and route messages to multiple
queues
A queue is single threaded
o 50K messages/s
Queue performance is limited to one CPU
core
All messages routed to a specific queue
will end up on the node where that queue
resides
Plugins
The consistent hash exchange plugin
RabbitMQ sharding
56. Production RabbitMQ Cluster
The consistent hash exchange
plugin
Load-balance messages between queues
Messages are consistently and equally
distributed across many queues
Consume from all queues
RabbitMQ Sharding
Automatic partitioning of queues
Queues are created on every cluster node
and messages are sharded across them
Shows one queue to the consumer, but it
could be many queues running behind it
in the background
57. Production RabbitMQ Cluster
Have limited use on priority
queues
Each priority level uses an internal queue
on the Erlang VM, which takes up
resources
In most use cases it is sufficient to have
no more than 5 priority levels.
Send persistent messages and
durable queues
Messages , exchanges and queues that are
not durable and persistent are lost during
a broker restart
High performance – use transient messages
and temporary or non-durable queues
58. Production RabbitMQ Cluster
Adjust prefetch value
Limits how many messages the client can receive
before acknowledging a message
RabbitMQ default prefetch value –
unlimited buffer
RabbitMQ 3.7
o Option to adjust default prefetch
o CloudAMQP servers have a default prefetch of
1000
Prefetch – Too Small Prefetch
With too small a prefetch value, RabbitMQ spends
most of its time waiting for permission to send
more messages
60. Production RabbitMQ Cluster
Prefetch
One single or few consumers with short
processing time
o Prefetch many messages at once
About the same processing time and
stable network
o Estimate the prefetch value as the total
round-trip time divided by the processing time
on the client for each message
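The estimation rule above can be written out directly (the numbers below are illustrative):

```python
# Illustrative calculation of the prefetch estimate: with a stable network
# and similar per-message processing times,
#   prefetch ~ total round-trip time / client processing time per message.

def estimated_prefetch(round_trip_ms: float, processing_ms: float) -> int:
    return max(1, round(round_trip_ms / processing_ms))

# e.g. 100 ms round trip, 10 ms to process each message:
print(estimated_prefetch(100, 10))   # 10
```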
Prefetch
Many consumers, and short processing time
o A lower prefetch value than for one single or
a few consumers
Many consumers and/or long processing
time
o Set prefetch count to 1 so that messages are
evenly distributed among all your workers
The prefetch value has no effect if your
client auto-acks messages
61. Production RabbitMQ Cluster
HiPE
HiPE increases server throughput at the cost of increased start-up time
Increases throughput by 20-80%
Increases start-up time by about 1 to 3 minutes
HiPE is recommended if you require high performance
We do not consider HiPE experimental any longer
62. Production RabbitMQ Cluster
Acknowledgements and Confirms
Pay attention to where in your consumer
logic you are acknowledging messages
For the fastest possible throughput, manual
acks should be disabled
Publisher confirms are required if the publisher
needs messages to be processed at least
once
Use a stable RabbitMQ version
3.7
Default prefetch
Individual virtual host message store
3.6
Lots of memory problems up to 3.6.14
Lazy queues
3.5
Still many customers on 3.5.7
63. Production RabbitMQ Cluster
Disable plugins you are not using
Some plugins are consuming lots of
resources
Make sure to disable plugins that you are
not using
Delete unused queues
Unused queues take up some resources,
queue index, management statistics etc.
Temporary queues should be auto deleted
64. Production RabbitMQ Cluster
Summary Overall
Short queues
Long lived connections
Limited use of priority queues
Use multiple queues and consumers
Split your queues over different cores
Stable Erlang and RabbitMQ version
Disable plugins you are not using
Channel on all connections
Summary Overall
Separate connections for publisher and
consumers
Management statistics rate mode
Delete unused queues
Temporary queues should be auto deleted
65. Production RabbitMQ Cluster
Summary High Performance
Short queues
o Max-length if possible
Do not use lazy queues
Send transient messages
Disable manual acks and publish confirms
Avoid multiple nodes (HA)
Enable HiPE
Summary High Availability
Enable lazy queues
RabbitMQ HA -2 nodes
o HA policy on all virtual hosts
Persistent messages , durable queues
Do not enable HiPE
66. What is the Next RabbitMQ 3.8
Quorum queues (mirrored queue 2.0) based on RAFT
Oauth 2.0 support
Mixed version cluster
Mnevis: a new schema data store
Protocol-agnostic core
67. What is the Next RabbitMQ 3.8
RAFT
A group of algorithms for reaching consensus in a
distributed system
Leader election
Normal operation (basic log replication)
Safety and consistency after leader changes
Neutralizing old leaders
Client interactions
Reconfiguration
More Detail
http://thesecretlivesofdata.com/raft/
https://slideplayer.com/slide/11869544
68. RAFT
Raft Consensus Algorithm (October 2013)
Replicated log → replicated state machine
All servers execute same commands in same order
Consensus module ensures proper log replication
System makes progress as long as any majority of servers are up
Failure model: fail-stop (not Byzantine), delayed/lost messages
[Figure: clients send commands (x←3, y←2, x←1, z←6) to a group of servers; each server has a Consensus Module, a replicated Log, and a State Machine, and every server's log holds the same commands in the same order.]
69. OAUTH 2.0
Implemented as a plugin, rabbitmq_auth_backend_oauth2
OAuth2.0/JWT token scopes that follow naming conventions are translated to
RabbitMQ permissions
Clients can use any OAuth 2.0 code flow
Management UI will use the authorization code flow
Officially supported clients will simplify token renewal
Details : https://github.com/rabbitmq/rabbitmq-auth-backend-oauth2
70. Siemens PUBSUB USE CASE
Publishers and consumers use the MQTT protocol for now. We cannot scale our consumers in this
design, so the subscriber protocol may change to AMQP next.