RabbitMQ in Sprayer
Alvaro Saurin

Sprayer is a low-latency messaging system supporting the delivery of messages to millions of users. In this talk I explain Sprayer's architecture and how we use RabbitMQ as our backbone transport technology.

  1. RabbitMQ in Sprayer (Alvaro Saurin)
  2. 01 Introduction
  3. What is Sprayer? Sprayer is a notification service: we can send messages to individual receivers. (Diagram: a Publisher asks Sprayer to send “hello” to “alvaro@iphone#555123456”; Sprayer delivers “hello” to 0..1 Receiver.)
  4. What is Sprayer? Sprayer has PubSub semantics: we can send to receivers subscribed to a topic. (Diagram: a Publisher asks Sprayer to send “hello” to users subscribed to “news”; Sprayer delivers “hello” to 0..n Receivers.)
  5. What is Sprayer? Sprayer supports many receiver classes:
     ● HTTP: remote URLs where we POST
     ● WebSockets: users connect to us
     ● Push notifications to Android and iOS
     ● SMS, Email...
     (Diagram: a Publisher asks Sprayer to send “hello” to users subscribed to “news”; Sprayer delivers “hello” to 0..n receivers.)
  6. Sprayer Architecture. The Accepter provides a REST API to publishers; RabbitMQ is the internal communication backbone (i.e. the transport for messages to dispatchers); the Dispatchers (HTTP, WebSocket, iOS, Android) perform the individual / group deliveries; the Status Feeder collects delivery and connection reports. (Diagram: Accepter → RabbitMQ → Dispatchers, plus mongo and the Status Feeder.)
  7. Sprayer Architecture: … multiple instances of everything: we scale up! (Diagram: Accepters running under Apache / Django, a mongo cluster, a RabbitMQ cluster, and the Dispatchers (HTTP, WebSocket, iOS, Android) and Status Feeders running as Python daemons.)
  8. 02 RabbitMQ
  9. RabbitMQ in Sprayer: so we use RabbitMQ as the main transport for messages. We ended up having a few queues:
     ● queues for sending messages to dispatchers
     ● queues for sending delivery reports to the status feeder
     ● ...
  10. (Queue overview: queues for jobs (messages) to dispatchers; queues for status reports; ...) A Kombu sketch of declaring such queues follows.
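     As an illustration only (not Sprayer’s actual code), a minimal Kombu sketch of declaring and publishing to queues like these; the exchange name “sprayer”, the routing keys and the status queue name are assumptions of mine, only the JOBS:DISP:* names appear later in the deck:

         from kombu import Connection, Exchange, Queue, Producer

         jobs_exchange = Exchange('sprayer', type='direct', durable=True)   # hypothetical exchange name
         howl_jobs = Queue('JOBS:DISP:HOWL', jobs_exchange, routing_key='disp.howl')
         status_reports = Queue('REPORTS:STATUS', jobs_exchange, routing_key='status')  # hypothetical name

         with Connection('amqp://guest:guest@localhost//') as conn:
             # declaring is idempotent; every component can do it at startup
             for q in (howl_jobs, status_reports):
                 q(conn.default_channel).declare()
             # publish one job for the Howl dispatcher
             Producer(conn).publish({'to': 'receiver-1', 'body': 'hello'},
                                    exchange=jobs_exchange, routing_key='disp.howl',
                                    declare=[howl_jobs])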
  11. RabbitMQ limitations. So Dispatchers get messages from RabbitMQ and send them to iOS devices, HTTP URLs… But Howl (the WebSocket dispatcher) is a special case...
     ● messages are published and they must be stored until the receiver connects to Howl
     ● once the receiver connects, we give it the message
     ● so messages could wait on the queue for a loooong time until the receiver connects...
     (Diagram: Publisher → Accepter → RabbitMQ → Howl → HTTP(S) / WebSocket → receiver-1.)
  12. RabbitMQ limitations. With RabbitMQ we could have millions of almost empty queues, waiting for their receivers to connect… But RabbitMQ does not like millions of queues…
     ● queues can live on a single node (or more...)
     ● all the routing info (i.e. queue, exchange and binding records) is held in memory on each node
     The solution: do not use one RabbitMQ queue per receiver: we use Redis as a last-mile storage. (Diagram: exchanges routing to per-receiver queues receiver-1 … receiver-n.)
  13. Sprayer Architecture. (Diagram: the full picture again: Accepter, RabbitMQ, the Dispatchers (HTTP, WebSocket, iOS, Android), mongo and the Status Feeder.)
  14. Sprayer Architecture. (Diagram: the WebSocket path only: Accepter, RabbitMQ, Dispatcher WebSocket, and redis.)
  15. Sprayer Architecture. (Diagram: inside the Dispatcher WebSocket (Howl): a router, redis, and WebSocket connection handlers serving receiver-1 … receiver-n.)
  16. Sprayer Architecture: how a message travels through Howl (the Dispatcher WebSocket):
     ● a new message arrives from RabbitMQ for “receiver-1”, so the router routes it to a regular Redis list
     ● the router sends “receiver-1 has a message” to a Redis pubsub channel
     ● if “receiver-1” is connected, its WebSocket connection handler is notified about the new message...
     ● ... and then it goes to Redis for the new message for “receiver-1”
     ... and everything scales smoothly: Howl runs as a pool of n processes x m greenlets, serving receivers over HTTP(S) / WebSocket. A sketch of this Redis pattern follows.
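     A rough redis-py sketch of this last-mile pattern, under naming of my own (the msgs:<receiver> list and notify:<receiver> channel are assumptions, and deliver_over_websocket stands in for the real connection handler), not Sprayer’s actual code:

         import json
         import redis

         r = redis.Redis()

         def deliver_over_websocket(receiver, message):
             # stand-in for pushing the message down the receiver's WebSocket
             print('-> %s: %r' % (receiver, message))

         def route_message(receiver, message):
             """Router side: park the message in a plain Redis list, then notify."""
             r.rpush('msgs:%s' % receiver, json.dumps(message))
             r.publish('notify:%s' % receiver, '1')

         def handle_connection(receiver):
             """Handler side: drain anything pending, then wait on pubsub."""
             pubsub = r.pubsub()
             pubsub.subscribe('notify:%s' % receiver)   # subscribe first, so no notification is missed
             while True:
                 raw = r.lpop('msgs:%s' % receiver)
                 if raw is None:
                     break
                 deliver_over_websocket(receiver, json.loads(raw))
             for event in pubsub.listen():              # blocks; in Howl this runs in a greenlet
                 if event['type'] == 'message':
                     raw = r.lpop('msgs:%s' % receiver)
                     if raw is not None:
                         deliver_over_websocket(receiver, json.loads(raw))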
  17. Sprayer Architecture: receivers connect to uWSGI through WebSockets; it can autoscale processes and greenlets... (Diagram: RabbitMQ → router → redis → WebSocket connection handlers inside a uWSGI pool of n processes x m greenlets, serving receiver-1 … receiver-n over HTTP(S) / WebSocket.)
  18. Sprayer Architecture: we could also have multiple uWSGI instances... and then we could add a load balancer in front (like nginx). (Diagram: RabbitMQ → router → redis → several uWSGI instances serving receivers over HTTP(S) / WebSocket.)
  19. RabbitMQ and Python:
     ● Python library: Kombu
        ○ Nice features: can change the backend from RabbitMQ (for example, to Redis) very easily
     ● We use gevent
        ○ high concurrency level
        ○ greenlets everywhere!!
     ● Connections vs channels (diagram: several channels multiplexed over one connection):
        ○ Connections: cannot be shared between greenlets; we do not want to have too many; they are greenlet-friendly: they wake up the current greenlet when we can read().
        ○ Channels: designed for having one per thread, so we could have one per greenlet; we can have “many of them” (i.e. 65535); not so greenlet-friendly.
  20. RabbitMQ and Python:
     ● So we decided to go for a simple pool of connections…
     ● We restrict who can read from queues
         >>> from kombu import Connection
         >>> connection = Connection('amqp://')
         >>> pool = connection.Pool(2)
         >>> c1 = pool.acquire()
         >>> c2 = pool.acquire()
         >>> c3 = pool.acquire()     # fails: the pool only holds 2 connections
         >>> c1.release()
         >>> c3 = pool.acquire()     # now there is a free slot again
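     A small sketch (not the real Sprayer code) of sharing such a pool between gevent greenlets; the broker URL, pool size and routing key are illustrative:

         from gevent import monkey; monkey.patch_all()
         import gevent
         from kombu import Connection, Producer

         connection = Connection('amqp://guest:guest@localhost//')
         pool = connection.Pool(2)             # at most two real connections to RabbitMQ

         def publish_hello(i):
             conn = pool.acquire(block=True)   # the greenlet waits here if the pool is empty
             try:
                 Producer(conn).publish({'hello': i}, routing_key='JOBS:DISP:HOWL')
             finally:
                 conn.release()                # hand the connection back to the pool

         gevent.joinall([gevent.spawn(publish_hello, i) for i in range(10)])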
  21. How we read from queues: we could also use a pool of readers... (Diagram: inside one process, a reader greenlet reads messages from the RabbitMQ queue JOBS:DISP:HOWL and sends them to a pool of waiter greenlets; this way we reduce the number of connections...) A sketch of this pattern follows.
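     A gevent sketch of this reader/waiters pattern, under stated assumptions: the queue name comes from the slide, handle_job is a placeholder for the real delivery code, and the number of waiters is arbitrary:

         from gevent import monkey; monkey.patch_all()
         import gevent
         from gevent.queue import Queue as GeventQueue
         from kombu import Connection

         jobs = GeventQueue(maxsize=100)        # in-process hand-off between greenlets

         def handle_job(worker, payload):
             print('waiter %d got %r' % (worker, payload))   # placeholder delivery

         def reader(conn):
             """The single greenlet that talks to RabbitMQ."""
             rabbit_queue = conn.SimpleQueue('JOBS:DISP:HOWL')
             while True:
                 msg = rabbit_queue.get(block=True)
                 msg.ack()                      # a real dispatcher may prefer to ack after delivery
                 jobs.put(msg.payload)

         def waiter(n):
             while True:
                 handle_job(n, jobs.get())

         with Connection('amqp://guest:guest@localhost//') as conn:
             greenlets = [gevent.spawn(reader, conn)] + [gevent.spawn(waiter, n) for n in range(10)]
             gevent.joinall(greenlets)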
  22. 03 Clustering and HA
  23. High Availability: overview. Target: always (someone) available, zero downtime!! First, you need a cluster!!! Then you need to share a cookie between all the members of the cluster... and that’s all!!!
     /etc/rabbitmq/rabbitmq.config:
         [
           ...
           {rabbit, [
             ...
             {cluster_nodes, {['rabbit@rabbit1',
                               'rabbit@rabbit2',
                               'rabbit@rabbit3'], disc}},
             ...
           ]},
           ...
         ].
  24. High Availability: the cluster. Cluster node types:
     ● disk node: they persist data to disk
     ● memory node: they only work with in-memory data
     For maximum security, at least 2 disk nodes are recommended. For maximum performance, do not use all disk nodes. (Same rabbitmq.config as on the previous slide; RabbitMQ 2.8.1, PowerEdge R610 with dual Xeon E5530s and 40GB RAM.)
  25. High Availability: the cluster. Some operations considerations:
     ● nodes must be started up in the reverse order to the one in which they were shut down
     ● remember: all the nodes in the cluster must share the same Erlang cookie in order to communicate
  26. High Availability: overview. … then we can use Mirrored Queues for achieving HA… Mirrored Queues: active/active high availability for queues. This works by allowing queues to be mirrored on other nodes within a RabbitMQ cluster. The result is that should one node of a cluster fail, the queue can automatically switch to one of the mirrors and continue to operate, with no unavailability of service.
  27. High Availability: setting things up. Very easy setup (we can also specify the number of replicas, or the node names...):
     ● From the command line:
         rabbitmqctl set_policy ha-all "^JOBS." '{"ha-mode":"all"}'
     ● … or from the web console
     ● … or when creating the queue from Kombu (see the sketch after this slide)
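     For the Kombu case, a sketch: because the “^JOBS.” ha-all policy above matches on queue names, a queue declared from Kombu with such a name becomes mirrored automatically; the exchange name and routing key below are assumptions of mine:

         from kombu import Connection, Exchange, Queue

         # Because of the "^JOBS." ha-all policy, this queue is mirrored on all nodes
         # as soon as it is declared; the client code needs nothing special.
         howl_jobs = Queue('JOBS:DISP:HOWL',
                           Exchange('sprayer', type='direct'),   # hypothetical exchange name
                           routing_key='disp.howl')              # hypothetical routing key

         with Connection('amqp://guest:guest@localhost//') as conn:
             howl_jobs(conn.default_channel).declare()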
  28. How it works. (Diagram: a Producer publishes to RabbitMQ; the JOBS:DISP:HOWL, JOBS:DISP:HTTP and JOBS:DISP:... queues appear on each node of the cluster.)
  29. How it works: we replicate Howl’s queue on all nodes... (Diagram: JOBS:DISP:HOWL mirrored as one Master and two Slaves, with a Producer publishing to it.)
  30. What can go wrong... If a slave fails, nothing important happens (clients consuming from a mirrored queue are in fact consuming from the master). (Diagram: the Producer and Consumer of JOBS:DISP:HOWL keep working while one slave is down.)
  31. What can go wrong... If it is the master that fails, the oldest slave is promoted to master. (Diagram: the Producer and Consumer of JOBS:DISP:HOWL after the master goes down.)
  32. In the future, we should add a load balancer like HAProxy in front of the RabbitMQ nodes...
     /etc/haproxy.cfg:
         ...
         listen rabbitmq_local_cluster 127.0.0.1:5672
             mode tcp
             balance roundrobin
             server rabbit_0 rabbit0:5672 check inter 5000 rise 2 fall 3
             server rabbit_1 rabbit1:5672 check inter 5000 rise 2 fall 3
             server rabbit_2 rabbit2:5672 check inter 5000 rise 2 fall 3
         ...
     (Diagram: HAProxy in front of three RabbitMQ nodes.)
  33. … and what about Kombu? What should we do in our Python publisher/consumer when someone dies? Nothing!!! Kombu can ensure() operations: it will reconnect if needed, and recreate exchanges, queues, etc.
         >>> import logging; logger = logging.getLogger(__name__)
         >>> from kombu import Connection, Producer
         >>> conn = Connection('amqp://')
         >>> producer = Producer(conn)
         >>> def errback(exc, interval):
         ...     logger.error('Error: %r', exc, exc_info=1)
         ...     logger.info('Retry in %s seconds.', interval)
         >>> publish = conn.ensure(producer, producer.publish,
         ...                       errback=errback, max_retries=3)
         >>> publish({'hello': 'world'}, routing_key='dest')
     After reconnection to a different node…
     ● it takes some time to realize a node is down…
     ● so ensure() can take some time to complete...
     (A consumer-side sketch follows.)
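     On the consuming side something similar is available; a sketch using Connection.ensure_connection(), with the broker URL and retry intervals chosen only for illustration:

         from kombu import Connection

         def errback(exc, interval):
             print('Broker unreachable: %r, retrying in %s seconds' % (exc, interval))

         conn = Connection('amqp://guest:guest@localhost//')
         # Blocks until some node of the cluster accepts the connection again.
         conn.ensure_connection(errback=errback, max_retries=None,
                                interval_start=1, interval_step=2, interval_max=30)
         queue = conn.SimpleQueue('JOBS:DISP:HOWL')   # safe to (re)declare after reconnecting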
  34. What else can go wrong... Partitioning. RabbitMQ clusters do not tolerate network partitions well:
     ● Forget WAN clusters (use federation or shovel).
     ● HA queues end up with a master in each partition.
     ● When connectivity is restored, you must choose the partition you trust the most.
     (Diagram: the cluster split into 2 partitions.)
  35. Partitions. Auto-healing: the winning partition is the one which has the most clients connected*. It will restart all nodes that are not in the winning partition... (*) ...or, if this produces a draw, the one with the most nodes... and if that still produces a draw, then one of the partitions is chosen in an unspecified way.
  36. 04 Future directions
  37. Maybe we will grow... We must investigate flow control:
     ● prevent messages being published faster than they can be routed to queues
     ● prevent messages from being published when running out of memory/disk
  38. Maybe we will grow... We must investigate prefetching in our clients:
     ● could vastly increase performance
     … and it is very easy to set in Kombu:
         class kombu.messaging.Consumer:
             def qos(self, prefetch_size=0, prefetch_count=0): …
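     A sketch of setting it from a Kombu consumer; the queue name and the prefetch value of 50 are just examples:

         import socket
         from kombu import Connection, Queue, Consumer

         def on_message(body, message):
             print('got %r' % (body,))
             message.ack()

         with Connection('amqp://guest:guest@localhost//') as conn:
             channel = conn.channel()
             consumer = Consumer(channel, queues=[Queue('JOBS:DISP:HOWL')],
                                 callbacks=[on_message])
             consumer.qos(prefetch_count=50)    # RabbitMQ may now push up to 50 unacked messages
             with consumer:
                 try:
                     while True:
                         conn.drain_events(timeout=5)
                 except socket.timeout:
                     pass                       # no more messages for now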
  39. Maybe we will grow... Shovels and Federation can help by partitioning the application domain:
     ● Shovels: messages published to a queue on broker A go to an exchange on broker B.
     ● Federation: messages published to an exchange or queue on broker A go to an exchange or queue on broker B. Some exchanges in a broker may be federated while some may be local, so you can partition the domain across many brokers...
     (Diagram: a Publisher, an exchange and a Consumer at site 1 and at site 2, connected through federated exchanges.)
  40. Maybe we will grow... and last but not least... SSD disks seem to make a huge difference in RabbitMQ performance. All disk writes in Rabbit are append-only, but if you have many queues…
  41. RabbitMQ in Sprayer (Alvaro Saurin). Questions…?
