Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
Building a 
Distributed 
Data Ingestion 
System 
with RabbitMQ 
Alvaro Videla - RabbitMQ
Alvaro Videla 
• Works at RabbitMQ 
• Co-Author of RabbitMQ in Action 
• Creator of the RabbitMQ Simulator 
• Blogs about ...
About Me 
Co-authored 
RabbitMQ in Action 
h!p://bit.ly/rabbitmq
About this Talk 
• Exploratory Talk 
• A ‘what could be done’ talk instead of ‘this is how you do it’
Agenda 
• Intro to RabbitMQ 
• The Problem 
• Solution Proposal 
• Improvements
h!ps://twi!er.com/spacemanaki/status/514590885523505153
What is RabbitMQ
Multi Protocol 
http://bit.ly/rmq-protocols
Community Plugins 
http://www.rabbitmq.com/community-plugins.html
Polyglot
Polyglot 
• PHP
Polyglot 
• PHP 
• node.js
Polyglot 
• PHP 
• node.js 
• Erlang
Polyglot 
• PHP 
• node.js 
• Erlang 
• Java
Polyglot 
• PHP 
• node.js 
• Erlang 
• Java 
• Ruby
Polyglot 
• PHP 
• node.js 
• Erlang 
• Java 
• Ruby 
• .Net
Polyglot 
• PHP 
• node.js 
• Erlang 
• Java 
• Ruby 
• .Net 
• Haskell
Polyglot 
Even COBOL!!!11
Some users of RabbitMQ
Some users of RabbitMQ 
• Instagram
Some users of RabbitMQ 
• Instagram 
• Indeed.com
Some users of RabbitMQ 
• Instagram 
• Indeed.com 
• Telefonica
Some users of RabbitMQ 
• Instagram 
• Indeed.com 
• Telefonica 
• Mercado Libre
Some users of RabbitMQ 
• Instagram 
• Indeed.com 
• Telefonica 
• Mercado Libre 
• Mozilla
The New York Times on RabbitMQ 
This architecture - Fabrik - has dozens of RabbitMQ instances 
spread across 6 AWS zones i...
h!p://www.rabbitmq.com/download.html 
Unix - Mac - Windows
Messaging with RabbitMQ 
h!ps://github.com/RabbitMQSimulator/ 
RabbitMQSimulator
h!p://tryrabbitmq.com
RabbitMQ Simulator
The Problem
Distributed Application 
App 
App 
App 
App
{ok, Connection} = 
Data Producer 
Obtain a Channel 
amqp_connection:start(#amqp_params_network{host = "localhost"}), 
{ok...
Data Producer 
Declare an Exchange 
amqp_channel:call(Channel, #’exchange.declare'{exchange = <<"events">>, 
type = <<"dir...
Data Producer 
amqp_channel:cast(Channel, 
Publish a message 
#'basic.publish'{ 
exchange = <<"events">>}, 
#amqp_msg{prop...
Data Consumer 
Obtain a Channel 
{ok, Connection} = 
amqp_connection:start(#amqp_params_network{host = "localhost"}), 
{ok...
Data Consumer 
Declare Queue and bind it 
amqp_channel:call(Channel, #'exchange.declare'{exchange = <<"events">>, 
type = ...
Data Consumer 
amqp_channel:subscribe(Channel, #'basic.consume'{queue = Queue, 
no_ack = true}, self()), 
receive 
#'basic...
loop(Channel) -> 
receive 
Data Consumer 
{#'basic.deliver'{}, #amqp_msg{payload = Body}} -> 
io:format(" [x] ~p~n", [Body...
Distributed Application 
App 
App 
App 
App
Ad-hoc solution
A process that replicates data 
to the remote server
Possible issues
Possible issues 
• Remote server is offline
Possible issues 
• Remote server is offline 
• Prevent unbounded local buffers
Possible issues 
• Remote server is offline 
• Prevent unbounded local buffers 
• Prevent message loss
Possible issues 
• Remote server is offline 
• Prevent unbounded local buffers 
• Prevent message loss 
• Prevent unnecess...
Possible issues 
• Remote server is offline 
• Prevent unbounded local buffers 
• Prevent message loss 
• Prevent unnecess...
Possible issues 
• Remote server is offline 
• Prevent unbounded local buffers 
• Prevent message loss 
• Prevent unnecess...
Can we do be!er?
RabbitMQ Federation
RabbitMQ Federation
RabbitMQ Federation 
• Supports replication across different administrative domains
RabbitMQ Federation 
• Supports replication across different administrative domains 
• Supports mix of Erlang and RabbitMQ...
RabbitMQ Federation 
• Supports replication across different administrative domains 
• Supports mix of Erlang and RabbitMQ...
RabbitMQ Federation 
• Supports replication across different administrative domains 
• Supports mix of Erlang and RabbitMQ...
RabbitMQ Federation
RabbitMQ Federation
RabbitMQ Federation
RabbitMQ Federation 
• It’s a RabbitMQ Plugin
RabbitMQ Federation 
• It’s a RabbitMQ Plugin 
• Internally uses Queues and Exchanges Decorators
RabbitMQ Federation 
• It’s a RabbitMQ Plugin 
• Internally uses Queues and Exchanges Decorators 
• Managed using Paramete...
Enabling the Plugin 
rabbitmq-plugins enable rabbitmq_federation
Enabling the Plugin 
rabbitmq-plugins enable rabbitmq_federation 
rabbitmq-plugins enable rabbitmq_federation_management
Federating an Exchange 
rabbitmqctl set_parameter federation-upstream my-upstream  
‘{“uri":"amqp://server-name","expires"...
Federating an Exchange 
rabbitmqctl set_parameter federation-upstream my-upstream  
‘{“uri":"amqp://server-name","expires"...
Federating an Exchange
Configuring Federation
Config Options 
rabbitmqctl set_parameter federation-upstream  
name ‘json-object’
Config Options 
rabbitmqctl set_parameter federation-upstream  
name ‘json-object’ 
json-object: { 
‘uri’: ‘amqp://server-...
Prevent unbound buffers 
expires: N // ms. 
message-ttl: N // ms.
Prevent message forwarding 
max-hops: N
Speed vs No Message Loss 
ack-mode: on-confirm 
ack-mode: on-publish 
ack-mode: no-ack
AMQP URI: 
amqp://user:pass@host:10000/vhost 
http://www.rabbitmq.com/uri-spec.html
Config can be applied via 
• CLI using rabbitmqctl 
• HTTP API 
• RabbitMQ Management Interface
RabbitMQ Federation
Some Queueing Theory 
http://www.rabbitmq.com/blog/2012/05/11/some-queuing-theory-throughput-latency-and-bandwidth/
RabbitMQ BasicQos Simulator
Prevent Unbound Buffers 
λ = mean arrival time 
μ = mean service rate 
if λ > μ what happens? 
https://www.rabbitmq.com/bl...
Prevent Unbound Buffers 
λ = mean arrival time 
μ = mean service rate 
if λ > μ what happens? 
Queue length goes to infini...
Recommended Reading 
Performance Modeling and 
Design of Computer Systems: 
Queueing Theory in Action
Scaling the Setup
The Problem
The Problem 
• Queues contents live in the node where the Queue was declared
The Problem 
• Queues contents live in the node where the Queue was declared 
• A cluster can access the queue from every ...
The Problem 
• Queues contents live in the node where the Queue was declared 
• A cluster can access the queue from every ...
The Problem 
• Queues contents live in the node where the Queue was declared 
• A cluster can access the queue from every ...
Enter Sharded Queues
Pieces of the Puzzle 
• modulo hash exchange (consistent hash works as well) 
• good ol’ queues
Sharded Queues
Sharded Queues
Sharded Queues
Sharded Queues
Sharded Queues 
• Declare Queues with name: nodename.queuename.index
Sharded Queues 
• Declare Queues with name: nodename.queuename.index 
• Bind the queues to a partitioner exchange
Sharded Queues 
• Declare Queues with name: nodename.queuename.index 
• Bind the queues to a partitioner exchange 
• Trans...
We need more scale!
Federated Queues
Federated Queues 
• Load-balance messages across federated queues
Federated Queues 
• Load-balance messages across federated queues 
• Only moves messages when needed
Federating a Queue 
rabbitmqctl set_parameter federation-upstream my-upstream  
‘{“uri":"amqp://server-name","expires":360...
Federating a Queue 
rabbitmqctl set_parameter federation-upstream my-upstream  
‘{“uri":"amqp://server-name","expires":360...
With RabbitMQ we can
With RabbitMQ we can 
• Ingest data using various protocols: AMQP, MQTT and STOMP
With RabbitMQ we can 
• Ingest data using various protocols: AMQP, MQTT and STOMP 
• Distribute that data globally using F...
With RabbitMQ we can 
• Ingest data using various protocols: AMQP, MQTT and STOMP 
• Distribute that data globally using F...
With RabbitMQ we can 
• Ingest data using various protocols: AMQP, MQTT and STOMP 
• Distribute that data globally using F...
Credits 
world map: wikipedia.org 
federation diagrams: rabbitmq.com
Questions?
Thanks! 
Alvaro Videla - @old_sound
Upcoming SlideShare
Loading in …5
×

Построение распределенной системы сбора данных с помощью RabbitMQ, Alvaro Videla (Pivotal Inc.)

2,570 views

Published on

Доклад Альваро Видела на HighLoad++ 2014.

Published in: Internet
  • Be the first to comment

  • Be the first to like this

Построение распределенной системы сбора данных с помощью RabbitMQ, Alvaro Videla (Pivotal Inc.)

  1. 1. Building a Distributed Data Ingestion System with RabbitMQ Alvaro Videla - RabbitMQ
  2. 2. Alvaro Videla • Works at RabbitMQ • Co-Author of RabbitMQ in Action • Creator of the RabbitMQ Simulator • Blogs about RabbitMQ Internals: h!p://videlalvaro.github.io/ internals.html • @old_sound — alvaro@rabbitmq.com — github.com/videlalvaro
  3. 3. About Me Co-authored RabbitMQ in Action h!p://bit.ly/rabbitmq
  4. 4. About this Talk • Exploratory Talk • A ‘what could be done’ talk instead of ‘this is how you do it’
  5. 5. Agenda • Intro to RabbitMQ • The Problem • Solution Proposal • Improvements
  6. 6. h!ps://twi!er.com/spacemanaki/status/514590885523505153
  7. 7. What is RabbitMQ
  8. 8. Multi Protocol http://bit.ly/rmq-protocols
  9. 9. Community Plugins http://www.rabbitmq.com/community-plugins.html
  10. 10. Polyglot
  11. 11. Polyglot • PHP
  12. 12. Polyglot • PHP • node.js
  13. 13. Polyglot • PHP • node.js • Erlang
  14. 14. Polyglot • PHP • node.js • Erlang • Java
  15. 15. Polyglot • PHP • node.js • Erlang • Java • Ruby
  16. 16. Polyglot • PHP • node.js • Erlang • Java • Ruby • .Net
  17. 17. Polyglot • PHP • node.js • Erlang • Java • Ruby • .Net • Haskell
  18. 18. Polyglot Even COBOL!!!11
  19. 19. Some users of RabbitMQ
  20. 20. Some users of RabbitMQ • Instagram
  21. 21. Some users of RabbitMQ • Instagram • Indeed.com
  22. 22. Some users of RabbitMQ • Instagram • Indeed.com • Telefonica
  23. 23. Some users of RabbitMQ • Instagram • Indeed.com • Telefonica • Mercado Libre
  24. 24. Some users of RabbitMQ • Instagram • Indeed.com • Telefonica • Mercado Libre • Mozilla
  25. 25. The New York Times on RabbitMQ This architecture - Fabrik - has dozens of RabbitMQ instances spread across 6 AWS zones in Oregon and Dublin. Upon launch today, the system autoscaled to ~500,000 users. Connection times remained flat at ~200ms. h!p://lists.rabbitmq.com/pipermail/rabbitmq-discuss/2014-January/032943.html
  26. 26. h!p://www.rabbitmq.com/download.html Unix - Mac - Windows
  27. 27. Messaging with RabbitMQ h!ps://github.com/RabbitMQSimulator/ RabbitMQSimulator
  28. 28. h!p://tryrabbitmq.com
  29. 29. RabbitMQ Simulator
  30. 30. The Problem
  31. 31. Distributed Application App App App App
  32. 32. {ok, Connection} = Data Producer Obtain a Channel amqp_connection:start(#amqp_params_network{host = "localhost"}), {ok, Channel} = amqp_connection:open_channel(Connection),
  33. 33. Data Producer Declare an Exchange amqp_channel:call(Channel, #’exchange.declare'{exchange = <<"events">>, type = <<"direct">>}),
  34. 34. Data Producer amqp_channel:cast(Channel, Publish a message #'basic.publish'{ exchange = <<"events">>}, #amqp_msg{props = #'P_basic'{delivery_mode = 2}, payload = <<“Hello Federation">>}),
  35. 35. Data Consumer Obtain a Channel {ok, Connection} = amqp_connection:start(#amqp_params_network{host = "localhost"}), {ok, Channel} = amqp_connection:open_channel(Connection),
  36. 36. Data Consumer Declare Queue and bind it amqp_channel:call(Channel, #'exchange.declare'{exchange = <<"events">>, type = <<"direct">>}), #'queue.declare_ok'{queue = Queue} = amqp_channel:call(Channel, #'queue.declare'{exclusive = true}), amqp_channel:call(Channel, #'queue.bind'{exchange = <<"events">>, queue = Queue}),
  37. 37. Data Consumer amqp_channel:subscribe(Channel, #'basic.consume'{queue = Queue, no_ack = true}, self()), receive #'basic.consume_ok'{} -> ok end, loop(Channel). Start a consumer
  38. 38. loop(Channel) -> receive Data Consumer {#'basic.deliver'{}, #amqp_msg{payload = Body}} -> io:format(" [x] ~p~n", [Body]), loop(Channel) end. Process messages
  39. 39. Distributed Application App App App App
  40. 40. Ad-hoc solution
  41. 41. A process that replicates data to the remote server
  42. 42. Possible issues
  43. 43. Possible issues • Remote server is offline
  44. 44. Possible issues • Remote server is offline • Prevent unbounded local buffers
  45. 45. Possible issues • Remote server is offline • Prevent unbounded local buffers • Prevent message loss
  46. 46. Possible issues • Remote server is offline • Prevent unbounded local buffers • Prevent message loss • Prevent unnecessary message replication
  47. 47. Possible issues • Remote server is offline • Prevent unbounded local buffers • Prevent message loss • Prevent unnecessary message replication • No need for those messages on remote server
  48. 48. Possible issues • Remote server is offline • Prevent unbounded local buffers • Prevent message loss • Prevent unnecessary message replication • No need for those messages on remote server • Messages that became stale
  49. 49. Can we do be!er?
  50. 50. RabbitMQ Federation
  51. 51. RabbitMQ Federation
  52. 52. RabbitMQ Federation • Supports replication across different administrative domains
  53. 53. RabbitMQ Federation • Supports replication across different administrative domains • Supports mix of Erlang and RabbitMQ versions
  54. 54. RabbitMQ Federation • Supports replication across different administrative domains • Supports mix of Erlang and RabbitMQ versions • Supports Network Partitions
  55. 55. RabbitMQ Federation • Supports replication across different administrative domains • Supports mix of Erlang and RabbitMQ versions • Supports Network Partitions • Specificity - not everything has to be federated
  56. 56. RabbitMQ Federation
  57. 57. RabbitMQ Federation
  58. 58. RabbitMQ Federation
  59. 59. RabbitMQ Federation • It’s a RabbitMQ Plugin
  60. 60. RabbitMQ Federation • It’s a RabbitMQ Plugin • Internally uses Queues and Exchanges Decorators
  61. 61. RabbitMQ Federation • It’s a RabbitMQ Plugin • Internally uses Queues and Exchanges Decorators • Managed using Parameters and Policies
  62. 62. Enabling the Plugin rabbitmq-plugins enable rabbitmq_federation
  63. 63. Enabling the Plugin rabbitmq-plugins enable rabbitmq_federation rabbitmq-plugins enable rabbitmq_federation_management
  64. 64. Federating an Exchange rabbitmqctl set_parameter federation-upstream my-upstream ‘{“uri":"amqp://server-name","expires":3600000}'
  65. 65. Federating an Exchange rabbitmqctl set_parameter federation-upstream my-upstream ‘{“uri":"amqp://server-name","expires":3600000}' rabbitmqctl set_policy --apply-to exchanges federate-me "^amq." '{"federation-upstream-set":"all"}'
  66. 66. Federating an Exchange
  67. 67. Configuring Federation
  68. 68. Config Options rabbitmqctl set_parameter federation-upstream name ‘json-object’
  69. 69. Config Options rabbitmqctl set_parameter federation-upstream name ‘json-object’ json-object: { ‘uri’: ‘amqp://server-name/’, ‘prefetch-count’: 1000, ‘reconnect-delay’: 1, ‘ack-mode’: on-confirm } http://www.rabbitmq.com/federation-reference.html
  70. 70. Prevent unbound buffers expires: N // ms. message-ttl: N // ms.
  71. 71. Prevent message forwarding max-hops: N
  72. 72. Speed vs No Message Loss ack-mode: on-confirm ack-mode: on-publish ack-mode: no-ack
  73. 73. AMQP URI: amqp://user:pass@host:10000/vhost http://www.rabbitmq.com/uri-spec.html
  74. 74. Config can be applied via • CLI using rabbitmqctl • HTTP API • RabbitMQ Management Interface
  75. 75. RabbitMQ Federation
  76. 76. Some Queueing Theory http://www.rabbitmq.com/blog/2012/05/11/some-queuing-theory-throughput-latency-and-bandwidth/
  77. 77. RabbitMQ BasicQos Simulator
  78. 78. Prevent Unbound Buffers λ = mean arrival time μ = mean service rate if λ > μ what happens? https://www.rabbitmq.com/blog/2014/01/23/preventing-unbounded-buffers-with-rabbitmq/
  79. 79. Prevent Unbound Buffers λ = mean arrival time μ = mean service rate if λ > μ what happens? Queue length goes to infinity over time. https://www.rabbitmq.com/blog/2014/01/23/preventing-unbounded-buffers-with-rabbitmq/
  80. 80. Recommended Reading Performance Modeling and Design of Computer Systems: Queueing Theory in Action
  81. 81. Scaling the Setup
  82. 82. The Problem
  83. 83. The Problem • Queues contents live in the node where the Queue was declared
  84. 84. The Problem • Queues contents live in the node where the Queue was declared • A cluster can access the queue from every connected node
  85. 85. The Problem • Queues contents live in the node where the Queue was declared • A cluster can access the queue from every connected node • Queues are an Erlang process (tied to one core)
  86. 86. The Problem • Queues contents live in the node where the Queue was declared • A cluster can access the queue from every connected node • Queues are an Erlang process (tied to one core) • Adding more nodes doesn’t really help
  87. 87. Enter Sharded Queues
  88. 88. Pieces of the Puzzle • modulo hash exchange (consistent hash works as well) • good ol’ queues
  89. 89. Sharded Queues
  90. 90. Sharded Queues
  91. 91. Sharded Queues
  92. 92. Sharded Queues
  93. 93. Sharded Queues • Declare Queues with name: nodename.queuename.index
  94. 94. Sharded Queues • Declare Queues with name: nodename.queuename.index • Bind the queues to a partitioner exchange
  95. 95. Sharded Queues • Declare Queues with name: nodename.queuename.index • Bind the queues to a partitioner exchange • Transparent to the consumer (virtual queue name)
  96. 96. We need more scale!
  97. 97. Federated Queues
  98. 98. Federated Queues • Load-balance messages across federated queues
  99. 99. Federated Queues • Load-balance messages across federated queues • Only moves messages when needed
  100. 100. Federating a Queue rabbitmqctl set_parameter federation-upstream my-upstream ‘{“uri":"amqp://server-name","expires":3600000}'
  101. 101. Federating a Queue rabbitmqctl set_parameter federation-upstream my-upstream ‘{“uri":"amqp://server-name","expires":3600000}' rabbitmqctl set_policy --apply-to queues federate-me "^images." '{"federation-upstream-set":"all"}'
  102. 102. With RabbitMQ we can
  103. 103. With RabbitMQ we can • Ingest data using various protocols: AMQP, MQTT and STOMP
  104. 104. With RabbitMQ we can • Ingest data using various protocols: AMQP, MQTT and STOMP • Distribute that data globally using Federation
  105. 105. With RabbitMQ we can • Ingest data using various protocols: AMQP, MQTT and STOMP • Distribute that data globally using Federation • Scale up using Sharding
  106. 106. With RabbitMQ we can • Ingest data using various protocols: AMQP, MQTT and STOMP • Distribute that data globally using Federation • Scale up using Sharding • Load balance consumers with Federated Queues
  107. 107. Credits world map: wikipedia.org federation diagrams: rabbitmq.com
  108. 108. Questions?
  109. 109. Thanks! Alvaro Videla - @old_sound

×