2. AMQP
Advanced Message Queuing Protocol (AMQP) is a protocol that defines features
such as message orientation, queuing, routing, security and reliability.
Point-to-Point and Pub-Sub patterns
The protocol sends data across the network as a stream of octets, so it does
not depend on any particular API (unlike JMS) and is independent of vendor and client
4. RabbitMQ Exchanges and Queues
The super simplified overview:
Publishers send messages to exchanges
Exchanges route messages to queues and other exchanges
RabbitMQ sends acknowledgements to publishers on message receipt
Consumers maintain persistent TCP connections with RabbitMQ and declare which
queue(s) they consume
RabbitMQ pushes messages to consumers
Consumers send acknowledgements of success/failure
Messages are removed from queues once consumed successfully
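The flow above can be sketched as a minimal in-memory simulation (illustrative only, not real RabbitMQ): an exchange routes a published message to its bound queues, and a message is removed only after the consumer acknowledges it.

```python
# Minimal in-memory sketch of the publish -> route -> consume -> ack flow.
# All names are illustrative; a real broker does this over AMQP.

class Broker:
    def __init__(self):
        self.bindings = {}   # exchange name -> list of bound queue names
        self.queues = {}     # queue name -> list of pending messages

    def bind(self, exchange, queue):
        self.bindings.setdefault(exchange, []).append(queue)
        self.queues.setdefault(queue, [])

    def publish(self, exchange, message):
        # Exchanges route messages to queues.
        for queue in self.bindings.get(exchange, []):
            self.queues[queue].append(message)
        return "ack"  # broker acknowledges receipt to the publisher

    def deliver(self, queue):
        # Pushed to the consumer, but not yet removed from the queue.
        return self.queues[queue][0]

    def consumer_ack(self, queue):
        # Removed only once consumed successfully.
        self.queues[queue].pop(0)

broker = Broker()
broker.bind("orders", "order-queue")
broker.publish("orders", "order #1")
msg = broker.deliver("order-queue")
broker.consumer_ack("order-queue")
```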
5. Routing
Fanout. Routes to all queues and exchanges that have a binding to the exchange.
The standard pub sub model.
Direct. Routes messages based on a Routing Key that the message carries with it,
set by the publisher. A routing key is a short string. Direct exchanges route
messages to queues/exchanges that have a Binding Key that exactly matches the
routing key.
Topic. Routes messages based on a routing key, but allows wildcard matching.
Headers. RabbitMQ allows custom headers to be added to messages. Headers
exchanges route messages according to those header values. Each binding
includes exact-match header values. Multiple values can be added to a binding,
with ANY or ALL values required to match.
Consistent Hashing. This is an exchange that hashes either the routing key or a
message header and routes to one queue only. This is useful when you need
processing order guarantees with scaled out consumers.
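The topic-exchange wildcard matching described above can be sketched in code. This is an illustrative re-implementation of the matching rules (in RabbitMQ, `*` matches exactly one dot-separated word and `#` matches zero or more), not RabbitMQ's own code:

```python
# Sketch of topic-exchange matching semantics: binding keys are
# dot-separated words where "*" matches exactly one word and "#"
# matches zero or more words.

def topic_matches(binding_key: str, routing_key: str) -> bool:
    def match(pattern, words):
        if not pattern:
            return not words
        head, rest = pattern[0], pattern[1:]
        if head == "#":
            # "#" can absorb zero or more words
            return any(match(rest, words[i:]) for i in range(len(words) + 1))
        if not words:
            return False
        if head == "*" or head == words[0]:
            return match(rest, words[1:])
        return False
    return match(binding_key.split("."), routing_key.split("."))

print(topic_matches("logs.*.error", "logs.app.error"))   # True
print(topic_matches("logs.#", "logs.app.db.error"))      # True
print(topic_matches("logs.*", "logs.app.db"))            # False
```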
8. Apache Kafka
Distributed because Kafka is deployed as a cluster of nodes, for both fault
tolerance and scale
Replicated because messages are usually replicated across multiple nodes
(servers).
Commit Log because messages are stored in partitioned, append-only logs which
are called Topics. This concept of a log is the principal killer feature of Kafka.
10. Apache Kafka
Consumer Group
One producer, three partitions and one
consumer group with three consumers
Some consumers read from more than one
partition
13. Plugins
Shovel Plugin
The plugin transfers messages from a queue on
one broker to an exchange on another
broker
Shovel is just a well-written client
application
Reads messages from the source
Forwards messages to the destination
Deals with connection failures
Has plenty of options
Shovel has two modules
Shovel engine
Shovel management console
14. Plugins
Federation
Forwards messages between
brokers without clustering
Federation is designed to
scale out publish/subscribe
messaging across a WAN
15. Plugins
Sharding
Sharding is performed by exchanges
Queues will be automatically created on
every cluster node and messages will be
sharded across them
Consumers see a single logical queue backed by sharded queues
Logical queue images = sharded queues images.1, images.2, etc.
16. Plugins
Sharding
The plugin provides a new exchange type, "x-modulus-hash", that will use a hashing
function to partition messages routed to a logical queue across a number of regular queues
(shards).
"x-modulus-hash" hashes the routing key and picks the queue at index hash mod N (N is the number
of queues bound to the exchange)
If you only need partitioning, and automatic scaling of the number of shards is not needed, use the
Consistent Hash Exchange plugin
Automatic scaling
You have 4 sharded queues on one node; if a new node joins the cluster, 4 more queues are
automatically created on the new node
Total ordering of messages between shards is not defined.
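The modulus routing described above can be sketched as follows. The hash function and queue names here are illustrative (RabbitMQ uses its own internal hash); the point is that the same routing key always maps to the same shard for a fixed N:

```python
import hashlib

# Illustrative sketch of "x-modulus-hash"-style routing: hash the routing
# key and take it modulo N, where N is the number of queues bound to the
# exchange. md5 is used here only for the demo.

def pick_shard(routing_key: str, n_queues: int) -> str:
    digest = hashlib.md5(routing_key.encode()).hexdigest()
    index = int(digest, 16) % n_queues
    return f"images.{index + 1}"   # shards named images.1 .. images.N

# The same key always lands on the same shard:
print(pick_shard("customer-42", 4))
print(pick_shard("customer-42", 4))
```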
17. Plugins
Sharding
Consuming from Sharded Queue
The plugin will choose the queue from the shard
with the fewest consumers, provided
the queue contents are local to the broker you
are connected to.
Note: the plugin does not automatically
scale out consumers
Once you have enabled the plugin, you create an
exchange. If you want hash based routing, you
declare an x-modulus-hash exchange. Then you
apply a policy to it with the field “shards-per-
node”. Let’s say we have three nodes. If we set
“shards-per-node” to 5 then we’ll end up with 15
queues created across those nodes.
Load Distribution and Consumer Balancing
The "queue master locator" policy is used here
"Minimum masters" is a queue master
locator that is most in line with the goals of
this plugin.
For load balancers, the "least connections"
strategy is more likely to produce an even
distribution than round robin and
other strategies.
18. Plugins
Consistent Hash Exchange
Consistent Hashing exchange allows us to
partition a single queue into multiple
queues and distribute messages between
them via a hashing of the routing key,
message header or message property.
With a Topic or Direct exchange using the
Customer Id as the routing key, a
large number of customers would
soon mean thousands of queues.
Buckets
Another way would be to use consistent
hashing and a modulus function to create
a consistent routing key
var numberOfQueues = 10;
var routingKey = BitConverter.ToInt64(CreateMD5Hash(customerId)) % numberOfQueues;
19. Plugins
Consistent Hash Exchange
We use the resulting number (0 to N − 1 for
N buckets) as the routing key. Because it is based
on a consistent hash, customer 1000 will always
be placed in the same bucket
PROBLEM: if you change from 5 buckets
to 10, however, the bucket will most likely
not be the same anymore.
Bucketing of Customer Id to route to a fixed number of
queues
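The rebucketing problem above can be demonstrated directly. The hashing scheme is illustrative (md5 plus a modulus, mirroring the earlier C# snippet): growing from 5 to 10 buckets moves roughly half of the customers to a different bucket.

```python
import hashlib

# Demonstrates the rebucketing problem: with plain hash-mod-N bucketing,
# changing the bucket count moves many keys to a different bucket.

def bucket(customer_id: int, n_buckets: int) -> int:
    digest = hashlib.md5(str(customer_id).encode()).hexdigest()
    return int(digest, 16) % n_buckets

# Count how many of 1000 customers change bucket when going from 5 to 10.
moved = sum(1 for cid in range(1000) if bucket(cid, 5) != bucket(cid, 10))
print(f"{moved} of 1000 customers changed bucket")  # roughly half
```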
20. Plugins
Consistent Hash Exchange
A better way of doing it would be to
use the Consistent Hash Exchange.
Instead of calculating a hash ourselves, we
let the exchange do it. We simply use the
Customer Id as the routing key.
Messages are distributed by partitioned hash space
21. Plugins
CHE & Sharding
PROBLEMS
RabbitMQ doesn't help you coordinate
your consumers
If the thing you hash (routing key, message
header or property) doesn't have enough
variance, you won't get an even distribution. If
you only have four different values then
you might get unlucky and have them all go to a
single queue.
If you have relatively few queues, the
distribution may be uneven again.
PROBLEMS
Sharding is the act of taking a data set and
splitting it across multiple machines. Which
machine takes what subset of the data?
Consistent Hash Exchange
The distribution of keys (a key is the name of a
piece of data) across the machines is always
relatively even.
The addition or removal of a machine shifts
around roughly 1/N of the keys, where N is the
final number of nodes
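The two properties above can be sketched with a simple hash ring. This is an illustrative implementation (the virtual-node count and md5 hash are arbitrary choices): each node owns many points on a ring, a key belongs to the first node point clockwise from the key's hash, and adding a node moves only roughly 1/N of the keys.

```python
import bisect
import hashlib

# Sketch of a consistent hash ring. Each node is placed at many "virtual"
# points so key ownership is spread evenly across nodes.

def h(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

def build_ring(nodes, vnodes=100):
    # Sorted list of (point, node): each node owns `vnodes` ring points.
    return sorted((h(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))

def lookup(ring, key):
    # A key belongs to the first node point at or after its hash (wrapping).
    points = [p for p, _ in ring]
    idx = bisect.bisect(points, h(key)) % len(ring)
    return ring[idx][1]

keys = [f"key-{i}" for i in range(2000)]
ring3 = build_ring(["A", "B", "C"])
ring4 = build_ring(["A", "B", "C", "D"])  # one node added

moved = sum(1 for k in keys if lookup(ring3, k) != lookup(ring4, k))
print(f"{moved / len(keys):.0%} of keys moved")  # close to 1/4
```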
22. RabbitMQ Clustering
Clustering Client Connection
Clients connect through a software or hardware load balancer
Advantages: clients do not need to be aware of cluster membership
23. RabbitMQ Clustering
Clustering does not equal full HA
Standard default queues are not
replicated across nodes in the cluster
A single node in the cluster contains the
queue messages
Other nodes simply hold pointers to the
node with actual messages
24. RabbitMQ Clustering
Clustering + Mirrored Queues =
FULL HA
Mirrored queues replicate the messages to
other nodes in the cluster.
In the event of the node hosting the
master copy going down, another node
can take over
25. RabbitMQ Clustering
Client Side Failover
Clients must be aware of node failure
Clients must handle reconnections
It is important to understand the concepts
behind RabbitMQ clustering, since:
A failure is not fully transparent for a client
(especially a consumer)
Messages in a queue can be lost after a
failure
27. RabbitMQ HA
Failure of a Cluster Node
What happens if the “owner” node dies?
o Consumers lose their subscriptions
o New messages matching the bindings are silently discarded
If the queue was durable, it cannot be re-created
o The owner node must be restored
If the queue was not durable, it can be re-created
o Even if the former owner is not running
28. RabbitMQ HA
Mirrored Queues
A mirrored queue replicates its content on multiple cluster nodes
Every queue in RabbitMQ has a home node. That node is called queue master. All queue
operations go through the master first and then are replicated to mirrors. This is necessary to
guarantee FIFO ordering of messages.
The content of a mirrored queue lives on one master node
The content is replicated on other nodes of the cluster
Mirrored queues provide high availability
Mirrored queues also affect performance
30. RabbitMQ HA
Mirrored Queue
one master that receives all reads and writes
one or more mirrors that receive all messages and meta data from the master. These mirrors
do not exist for scaling out the reading of queues but solely for redundancy.
32. RabbitMQ HA
Failover on Client Side
Node failures are not transparent to clients
Clients must deal with node failures
Client bindings provide callbacks in case of errors
These callbacks make the recovery code easier to write
There are many failure scenarios: always test yours!
33. Apache Kafka Cluster
Distributed Systems: Communication
and Consensus
Worker node membership and naming
Configuration management
Leader election
Health status
Apache Zookeeper
Configuration information
Health status
Group membership
Worker node roles: Controllers, Leaders and
Followers
Reliability through replication
Consensus-based communication
38. RabbitMQ Reliability
What Can Go Wrong
The broker crashes ; sent but not-yet-consumed messages are lost!
The broker can provide persistence
Business processing after reception times out, or is temporarily down
The broker can redeliver the message
Messages must be sent as a group, but the application fails in the middle
Thanks to transactions, the broker can provide atomic operations
39. RabbitMQ Reliability
Durability and Persistence
Durability only works if queues are durable and messages are persistent
Durability is the ‘D’ in ACID
Surviving a server restart
Surviving a broker crash
Durability happens at three different levels:
Exchange
Queue
Message (referred to as “persistence”)
40. RabbitMQ Reliability
Transactions Across Senders and
Consumers
Transactions do not span the sender and consumer
A common transaction would couple the sender to the consumer
Messaging is all about decoupling!
Transaction semantics are only between the sender-broker and broker-consumer
42. Performance
Delivery Configuration on the client side
The broker can limit what it delivers to consumers
The consumer configures delivery at the channel level
The broker continues deliveries, depending on consumer’s ACK
Channel.basicQos() to specify the “quality of service”
Parameters:
prefetchCount: the maximum number of unacknowledged messages the server will deliver on a
channel (0 means there is no limit)
43. Performance
Quality of Service Configuration
basicQos() allows consumer-driven flow control
Consumer does not get overwhelmed
Method call is blocking when value is reached
Useless if auto-ack is enabled: messages are acknowledged as soon as they are delivered
45. Performance
Message Persistence
Be aware of performance implication
when specifying persistent messages
Every message will be written to disk
If publisher confirms and/or consumer
acknowledgements are used, this can have a
significant impact on performance
Performance impact due to additional disk
I/O
Clustering and Mirrored Queues
Certain types of cluster have mirrored queues that
replicate metadata and data to other nodes in
the cluster
Overhead to replicate all the data across the nodes
Commit and acknowledgment are also replicated
across the nodes
If persistent messages, and/or publisher confirms,
and/or consumer acknowledgements are used,
performance impact will be higher with mirrored
queues, as overhead is also mirrored
46. Performance
Memory-based Flow Control
RabbitMQ comes with a nice feature
Memory-based flow control
Used to prevent the broker from accepting too
much traffic from producers
So that memory is not filled in case of verbose
producers
Why it is needed
By default, RabbitMQ accepts messages
from clients even if they are not yet written
to disk
Accepting messages is much faster than writing
messages to disk
In a heavy-load scenario, without any flow
control, available memory could be completely
consumed
Memory-based Flow Control
How does it work
Producers are throttled to reduce throughput
Consumers are not affected
An alarm is raised in the log
The default memory threshold is 40% of the server's RAM
This does not prevent RabbitMQ from using more than
40% of RAM
It is just the point where flow control is turned on
Usage
Memory threshold value can be modified in
rabbitmq.config file
Value of 0 disables the flow control
[{rabbit, [{vm_memory_high_watermark, 0.4}]}]
47. Performance
Lazy Queues
Sometimes, queues fill up
Consumers cannot keep up, or are
shut down for maintenance
Consumers can be unstable
If your queues fill up regularly, use lazy
queues
Lazy queues are good when you need very
long queues
Millions of messages
Lazy queue semantics
Lazy queues
Write messages to disk if there is no consumer
Load messages into memory when requested by consumers
Consequences
Lazy queues consume much less RAM than default queues
Lazy queues increase I/O, but no more than using persistent
messages
Two ways to declare
Use an argument when declaring the queue
o x-queue-mode = lazy
Use a policy
o queue-mode = lazy
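The argument-based declaration above can be sketched with a client library. The `x-queue-mode` argument name is the one RabbitMQ defines; the pika library, connection details and queue name are assumptions for illustration, so the broker-dependent calls are shown as comments:

```python
# Sketch of declaring a lazy queue via a queue argument.
# "x-queue-mode" is the RabbitMQ-defined argument name.

lazy_args = {"x-queue-mode": "lazy"}

# With the pika client (assumed), the declaration would look like:
#
#   import pika
#   conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
#   channel = conn.channel()
#   channel.queue_declare(queue="events", durable=True, arguments=lazy_args)
#
# The policy alternative instead sets queue-mode = lazy on matching queues
# server-side, without changing the declaring application.
print(lazy_args)
```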
48. Performance
Best Practices on Client Side
Prefer confirms and acknowledgements over
transactions
These are lighter and generally satisfy most
requirements
Use durability and persistence only when
required
This also has a big impact on performance
Efficient design avoids having to use persistence
o Idempotency
o Retry logic
When dealing with multiple resources (queue +
database)
Use best effort pattern
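The idempotency point above can be sketched as follows. This is an illustrative consumer (all names are made up): since a broker may redeliver a message after a failure, the handler tracks processed message ids and skips duplicates, so retries are safe without broker-side transactions.

```python
# Illustrative sketch of an idempotent consumer: redeliveries of the same
# message id are detected and skipped, making retry logic safe.

processed_ids = set()
results = []

def handle(message_id: str, body: str) -> None:
    if message_id in processed_ids:
        return                      # duplicate redelivery: do nothing
    results.append(body)            # the actual business processing
    processed_ids.add(message_id)   # record only after success

handle("m1", "charge customer 42")
handle("m1", "charge customer 42")  # redelivered after a failure
handle("m2", "charge customer 7")
print(results)                      # the duplicate had no effect
```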
Best Practices on Server Side
Do not deactivate the flow control
mechanism
So the server won't crash when it gets
overwhelmed by publishers
Monitor the health of your broker
By checking the log
By having an external health-check
mechanism
49. Production RabbitMQ Cluster
Client Side Problems
Users are using RabbitMQ in a bad way
Client libraries are using RabbitMQ in a bad
way
Things are not done in an optimal way
Server Side Problems
Unstable RabbitMQ version
Unoptimized configuration for a specific
use case
HA
High Performance
50. Production RabbitMQ Cluster
CloudAMQP
Clusters are created on cloud instances (e.g. AWS EC2) and applications are installed by custom
bash scripts run from cloud-init
23,000 running instances, 7 clouds, 75 regions
51. Production RabbitMQ Cluster
Do not use too many connections
or channels
Don’t open and close connections or
channels repeatedly
AMQP connection: 7 TCP packets
AMQP channel: 2 TCP packets
AMQP publish: 1 TCP packet
AMQP close channel: 2 TCP packets
AMQP close connection: 2 TCP packets
Total: 14-19 packets (+ acks)
Keep connection/channel count low
Each connection uses about 100 KB of
RAM
Thousands of connections can be heavy
burden on a RabbitMQ server
Channel and connection leaks are
among the most common errors that we
see
52. Production RabbitMQ Cluster
Separate Connection for Publisher
and Consumer
Flow Control: Might not be able to
consume if the connection is in flow
control
Back pressure: RabbitMQ can apply back
pressure on the TCP connection when the
publisher is sending too many messages
Do not have too large queues
Keep fewer than 10,000 messages in one queue
Heavy load on RAM usage
o In order to free up RAM, RabbitMQ starts
paging messages out to disk
o This blocks the queue from processing
messages
Time-consuming to restart a cluster
Limit queue size with TTL or max-length
53. Production RabbitMQ Cluster
Enable lazy queues to get
predictable performance
Lazy queues were added in RabbitMQ 3.6
Writes messages to disk immediately, thus
spreading the work out over time instead of
taking the risk of a performance hit
somewhere down the road
More predictable and smooth performance
curve
o Messages are only loaded into memory when
they are needed
Enable lazy queues to get
predictable performance
Enable lazy queues if…
o The publisher is sending many messages at
once
o The consumers are not keeping up with the
speed of the publishers all the time
Ignore lazy queues if…
o You require high performance
o Queues are always short
54. Production RabbitMQ Cluster
Do not set RabbitMQ Management statistics rate mode to detailed
The RabbitMQ management collects and calculates metrics for every queue,
connection and channel in the cluster
Slows down the server if you have thousands of active queues and consumers
55. Production RabbitMQ Cluster
Split queues over different cores
and route messages to multiple
queues
A queue is single threaded
o 50K messages/s
Queue performance is limited to one CPU
core
All messages routed to a specific queue
will end up on the node where that queue
resides
Plugins
The consistent hash exchange plugin
RabbitMQ sharding
56. Production RabbitMQ Cluster
The consistent hash exchange
plugin
Load-balance messages between queues
Messages are consistently and equally
distributed across many queues
Consume from all queues
RabbitMQ Sharding
Automatic partitioning of queues
Queues are created on every cluster node
and messages are sharded across them
Shows one queue to the consumer, but it
could be many queues running behind it
in the background
57. Production RabbitMQ Cluster
Have limited use on priority
queues
Each priority level uses an internal queue
on the Erlang VM, which takes up
resources
In most use cases it is sufficient to have
no more than 5 priority levels.
Send persistent messages and
durable queues
Messages , exchanges and queues that are
not durable and persistent are lost during
a broker restart
High performance – use transient messages
and temporary or non-durable queues
58. Production RabbitMQ Cluster
Adjust prefetch value
Limits how many messages the client can receive
before acknowledging a message
RabbitMQ default prefetch value –
unlimited buffer
RabbitMQ 3.7
o Option to adjust default prefetch
o CloudAMQP servers have a default prefetch of
1000
Prefetch – Too Small Prefetch
With too small a prefetch value, RabbitMQ spends
most of its time waiting for permission to send
more messages
60. Production RabbitMQ Cluster
Prefetch
One single or few consumers with short
processing time
o Prefetch many messages at once
About the same processing time and
stable network
o Estimate the prefetch value as the total
round-trip time divided by the processing time
on the client for each message
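The estimation rule above can be written out directly (the numbers below are illustrative):

```python
# Illustrative calculation of the prefetch estimate: with a stable network
# and similar per-message processing times,
#   prefetch ~ total round-trip time / client processing time per message.

def estimated_prefetch(round_trip_ms: float, processing_ms: float) -> int:
    return max(1, round(round_trip_ms / processing_ms))

# e.g. 100 ms round trip, 10 ms to process each message:
print(estimated_prefetch(100, 10))   # 10
```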
Prefetch
Many consumers, and short processing time
o A lower prefetch value than for one single or
a few consumers
Many consumers and/or long processing
time
o Set prefetch count to 1 so that messages are
evenly distributed among all your workers
The prefetch value has no effect if your
client auto-acks messages
61. Production RabbitMQ Cluster
HiPE
HiPE increases server throughput at the cost of increased start-up time
Increases throughput by 20-80%
Increases start-up time by about 1 to 3 minutes
HiPE is recommended if you require high performance
We do not consider HiPE experimental any longer
62. Production RabbitMQ Cluster
Acknowledgements and Confirms
Pay attention to where in your consumer
logic you are acknowledging messages
For the fastest possible throughput, manual
acks should be disabled
Publisher confirms are required if the publisher
needs messages to be processed at least
once
Use a stable RabbitMQ version
3.7
Default prefetch
Individual virtual host message store
3.6
Lots of memory problems up to 3.6.14
Lazy queues
3.5
Still many customers on 3.5.7
63. Production RabbitMQ Cluster
Disable plugins you are not using
Some plugins are consuming lots of
resources
Make sure to disable plugins that you are
not using
Delete unused queues
Unused queues take up some resources,
queue index, management statistics etc.
Temporary queues should be auto deleted
64. Production RabbitMQ Cluster
Summary Overall
Short queues
Long lived connections
Limited use of priority queues
Use multiple queues and consumers
Split your queues over different cores
Stable Erlang and RabbitMQ version
Disable plugins you are not using
Channel on all connections
Summary Overall
Separate connections for publisher and
consumers
Management statistics rate mode
Delete unused queues
Temporary queues should be auto deleted
65. Production RabbitMQ Cluster
Summary High Performance
Short queues
o Max-length if possible
Do not use lazy queues
Send transient messages
Disable manual acks and publish confirms
Avoid multiple nodes (HA)
Enable HiPE
Summary High Availability
Enable lazy queues
RabbitMQ HA -2 nodes
o HA policy on all virtual hosts
Persistent messages , durable queues
Do not enable HiPE
66. What is the Next RabbitMQ 3.8
Quorum queues (mirrored queue 2.0) based on RAFT
Oauth 2.0 support
Mixed version cluster
Mnevis: a new schema data store
Protocol-agnostic core
67. What is the Next RabbitMQ 3.8
RAFT
A group of algorithms for reaching consensus in a
distributed system
Leader election
Normal operation (basic log replication)
Safety and consistency after leader changes
Neutralizing old leaders
Client interactions
Reconfiguration
More Detail
http://thesecretlivesofdata.com/raft/
https://slideplayer.com/slide/11869544
68. RAFT
Raft Consensus Algorithm (October 2013)
Replicated log → replicated state machine
All servers execute same commands in same order
Consensus module ensures proper log replication
System makes progress as long as any majority of servers are up
Failure model: fail-stop (not Byzantine), delayed/lost messages
[Figure: clients send commands (x←3, y←2, x←1, z←6) to a group of servers; each server has a Consensus Module, a replicated Log, and a State Machine, and every server's log holds the same commands in the same order.]
69. OAUTH 2.0
Implemented as a plugin, rabbitmq_auth_backend_oauth2
OAuth2.0/JWT token scopes that follow naming conventions are translated to
RabbitMQ permissions
Clients can use any OAuth 2.0 code flow
Management UI will use the authorization code flow
Officially supported clients will simplify token renewal
Details : https://github.com/rabbitmq/rabbitmq-auth-backend-oauth2
70. Siemens PUBSUB USE CASE
Publishers and consumers use the MQTT protocol for now. We cannot scale our consumers in this
design, so the subscriber protocol may change to AMQP next.