High scale flavor (recipes for message queueing)
Tomas (t0m) Doran
London Perl Workshop 2010
What is message queueing?
• In its simplest form, it’s just a list.
• 1 (or more) ‘producers’ (writers)
• 1 (or more) ‘consumers’ (readers)
• Messages queue up when the rate of production exceeds the rate of consumption.
Why do I want it?
• Decouples producers and consumers (probably across the network).
• Lets you manage load, and spikes in load (you only ever run n consumers, however much work arrives).
• In a web environment, lets you serve more pages, more quickly.
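To make “queue when production outpaces consumption” concrete, here is a small Python sketch (not from the talk) that tracks the backlog tick by tick for a fixed pool of consumers. It assumes each consumer handles exactly one message per tick and that work arriving in a tick can be handled in that same tick; `backlog_over_time` is a hypothetical helper.

```python
def backlog_over_time(arrivals, consumers, per_consumer=1):
    """Return the queue length after each tick for a fixed pool of consumers.

    arrivals: number of messages produced in each tick.
    Assumption: each consumer processes `per_consumer` messages per tick.
    """
    capacity = consumers * per_consumer
    backlog = 0
    history = []
    for arriving in arrivals:
        # Whatever the pool cannot handle this tick waits in the queue.
        backlog = max(0, backlog + arriving - capacity)
        history.append(backlog)
    return history
```

With 4 consumers, `backlog_over_time([2, 2, 10, 10, 2, 2, 2, 2], consumers=4)` gives `[0, 0, 6, 12, 10, 8, 6, 4]`: the two-tick spike builds a backlog of 12 that drains at 2 per tick once traffic returns to normal, rather than needing 10 consumers provisioned for the peak.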
What is the problem with web apps?
• App servers take a lot of RAM.
• Context switching is expensive.
• A lot of apps think like a CGI script.
• Making the user wait IS HORRIBLE.
• Anything but very fast pages is a fail.
One page request per CPU core
Even if extra context switching had zero overhead, you serve people sooner if you queue requests.
[Diagram: requests A and B time-sliced on one core (upper) vs. queued and run one after the other (lower)]
A finishes significantly before B in the lower diagram; B finishes at the same time in both.
N.B. This explicitly assumes you never wait on external IO (e.g. a database).
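The arithmetic behind the diagram can be checked with a short Python sketch. It assumes two jobs of equal CPU cost (5 time units each, an arbitrary choice), a single core, one-unit time slices and, as the slide says, free context switches and no IO waits. The function names are hypothetical.

```python
def finish_times_interleaved(durations, time_slice=1):
    """Round-robin one core between jobs; return each job's completion time."""
    remaining = list(durations)
    finished = [0] * len(durations)
    clock = 0
    while any(r > 0 for r in remaining):
        for i, left in enumerate(remaining):
            if left > 0:
                step = min(time_slice, left)
                clock += step            # context switches assumed to cost nothing
                remaining[i] -= step
                if remaining[i] == 0:
                    finished[i] = clock
    return finished


def finish_times_queued(durations):
    """Run jobs one after another (FIFO); return each job's completion time."""
    finished, clock = [], 0
    for d in durations:
        clock += d
        finished.append(clock)
    return finished
```

For `[5, 5]` the interleaved schedule finishes A at 9 and B at 10, while the queued schedule finishes A at 5 and B still at 10. Queueing cuts the average completion time from 9.5 to 7.5 without delaying anyone.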
Message queueing
• Many, many flavors.
• Going to cover options available right now in Perl.
• First, a little theory.
Messaging Topologies
• I.e. how producers and consumers interact through the message broker.
• 3 common patterns; considerably more complex applications are possible.
• Even within these there is additional complexity to consider, e.g. message durability.
1: Publish-Subscribe
• ‘Topic’ in ActiveMQ
• Anonymous queues in AMQP
• One (or more) publishers
• Zero (or more) consumers
• Every consumer gets every message
• Messages discarded if there are no consumers
• E.g. log message listener(s)
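A minimal in-process Python sketch of publish-subscribe semantics, assuming subscribers are plain callbacks (a real broker delivers over the network). The `Topic` class is hypothetical, not ActiveMQ’s or AMQP’s API; it just shows the two rules above: every subscriber gets every message, and with no subscribers the message is dropped.

```python
class Topic:
    """Toy publish/subscribe topic: callbacks stand in for network subscribers."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def unsubscribe(self, callback):
        self._subscribers.remove(callback)

    def publish(self, message):
        """Hand the message to every current subscriber; return how many got it."""
        delivered = 0
        for callback in list(self._subscribers):  # copy: callbacks may unsubscribe
            callback(message)
            delivered += 1
        return delivered  # 0 means nobody was listening and the message is gone
```

A log topic with no listeners returns 0 from `publish`; once two listeners subscribe, each receives its own copy of every subsequent message.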
2: Queue(s)
• One or more producers
• One or more consumers
• Each message delivered to exactly one consumer
• Messages ‘queued’ (possibly to disk)
• E.g. a job queue with a worker pool, allowing you to work efficiently through high load spikes
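A sketch of the job-queue pattern using Python’s standard `queue` and `threading` modules as a stand-in for a broker: all jobs go onto one queue and a fixed pool of workers takes them off, so each job is handled by exactly one worker. `run_pool`, the squaring “work” and the `None` shutdown sentinel are illustrative assumptions.

```python
import queue
import threading


def run_pool(jobs, n_workers):
    """Queue every job, let n workers drain the queue, return (worker, job, result)."""
    work = queue.Queue()
    results = []
    lock = threading.Lock()

    def worker(worker_id):
        while True:
            job = work.get()
            if job is None:  # sentinel: no more jobs for this worker
                return
            result = job * job  # stand-in for real work
            with lock:
                results.append((worker_id, job, result))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_workers)]
    for t in threads:
        t.start()
    for job in jobs:  # a load spike: everything arrives at once
        work.put(job)
    for _ in threads:  # one sentinel per worker, queued after all the jobs
        work.put(None)
    for t in threads:
        t.join()
    return results
```

`run_pool(range(100), 4)` processes all 100 jobs exactly once, however they happen to be spread across the four workers; the spike is absorbed by the queue rather than by starting more workers.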
3: Request / Response
• Create an anonymous queue for replies.
• Publish to well-known queue(s), including the return address.
• Wait for the reply.
• Arrange for messages to be discarded if you stop listening for the reply.
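The request/response steps can be sketched in-process in Python, again using `queue.Queue` in place of broker queues. The client creates a private reply queue, publishes a request carrying that queue as its return address plus a correlation id, and waits with a timeout. The message fields (`reply_to`, `correlation_id`, `body`) are loosely modelled on AMQP’s reply-to and correlation-id properties but here are just dictionary keys, and the doubling “service” is an assumption for illustration.

```python
import queue
import threading
import uuid


def serve(requests):
    """Worker: answer each request on the queue named in its reply_to field."""
    while True:
        msg = requests.get()
        if msg is None:  # shutdown sentinel
            return
        reply = {"correlation_id": msg["correlation_id"], "result": msg["body"] * 2}
        msg["reply_to"].put(reply)


def call(requests, body, timeout=1.0):
    """Client: publish a request with a return address, then wait for the reply."""
    reply_queue = queue.Queue()  # private, 'anonymous' reply queue for this call
    correlation_id = str(uuid.uuid4())
    requests.put({"reply_to": reply_queue, "correlation_id": correlation_id, "body": body})
    reply = reply_queue.get(timeout=timeout)  # raises queue.Empty if nobody answers
    if reply["correlation_id"] != correlation_id:
        raise RuntimeError("reply belongs to a different request")
    return reply["result"]
```

Start `serve` in a thread, then `call(requests, 21)` returns 42. Once `call` returns or times out, nothing references its reply queue any more, so a late reply is simply garbage-collected: the in-process analogue of having replies discarded when you stop listening.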
STOMP
• Streaming Text Oriented Messaging Protocol
• Simple. Interoperable.
• You probably want to use ActiveMQ.
• Net::Stomp
• Simple semantics: queues and topics
• You can build the 3 simple patterns from these
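STOMP’s simplicity is visible on the wire: a frame is a command line, `name:value` header lines, a blank line, the body and a NUL byte. A minimal Python sketch of building frames follows. It assumes header values need no escaping (STOMP 1.1+ defines escape sequences for characters such as newline and colon in headers) and leaves out the CONNECT handshake and the socket itself. The `/queue/` and `/topic/` destination prefixes are ActiveMQ’s convention; `stomp_frame` is a hypothetical helper, not Net::Stomp’s API.

```python
def stomp_frame(command, headers, body=""):
    """Serialize one STOMP frame: command, headers, blank line, body, NUL.

    Assumption: header names/values contain no newlines or colons,
    so no header escaping is performed.
    """
    encoded_body = body.encode("utf-8")
    lines = [command]
    for name, value in headers.items():
        lines.append(f"{name}:{value}")
    if encoded_body:
        # Lets the receiver read the body without scanning for the NUL terminator.
        lines.append(f"content-length:{len(encoded_body)}")
    head = "\n".join(lines) + "\n\n"
    return head.encode("utf-8") + encoded_body + b"\x00"
```

`stomp_frame("SEND", {"destination": "/queue/jobs"}, '{"job": "resize"}')` produces a frame for a point-to-point queue, while a `SUBSCRIBE` with `destination:/topic/logs` joins a publish/subscribe topic; those two primitives are enough to build the three patterns.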
ActiveMQ - Topics
• Publish / subscribe semantics.
• Message goes to all the subscribers.
• Zero or more subscribers.
• Messages thrown away if there are no subscribers.
• Safe (cannot fill up the server with undelivered messages).
AMQP
• More complex than STOMP.
• Wiring of message routing is part of the protocol.
• All your clients know (at least half of) the wiring.
• Different topologies depending on routing configuration.
• Nice when your server dies: there is no ‘current config’ to lose, because clients declare the wiring they need when they connect.
Concepts - Exchanges
• Named
• Messages are published (sent) to one exchange
• Can be durable (forces all queues attached to be durable)
Queues
• Queues can be named.
• Queues are bound to one (or more) exchanges.
• Queues can have 0 or more clients.
• Queues may persist, or be deleted when they have no clients.
• FIFO (if you have 1 consumer)
• A message is never delivered to more than 1 client.
Bindings
• A binding is what joins a queue and an exchange.
• There can be more than one binding for each queue, allowing a single consumer to listen to multiple message sources.
• You can bind to topic exchanges selectively via the message ‘routing key’.
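Topic exchange binding keys are dot-separated words in which `*` matches exactly one word and `#` matches zero or more words. A small recursive Python matcher (a sketch of the matching rule, not any broker’s implementation; `topic_matches` is a hypothetical name) shows how a binding selects routing keys:

```python
def topic_matches(pattern, routing_key):
    """True if an AMQP-style topic binding pattern matches a routing key."""
    return _match(pattern.split("."), routing_key.split("."))


def _match(pattern_words, key_words):
    if not pattern_words:
        return not key_words  # both exhausted together: match
    head, rest = pattern_words[0], pattern_words[1:]
    if head == "#":
        # '#' swallows zero words, or one word and then tries again.
        return _match(rest, key_words) or (
            bool(key_words) and _match(pattern_words, key_words[1:])
        )
    if not key_words:
        return False
    if head == "*" or head == key_words[0]:
        return _match(rest, key_words[1:])
    return False
```

So a queue bound with `*.orange.*` receives `quick.orange.rabbit` but not `quick.orange.male.rabbit`, while `lazy.#` catches everything starting with `lazy`. The recursion can be slow with many `#` wildcards; real brokers index bindings more cleverly.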
Job queue
• Named exchange
• Bound to 1 named and persistent queue
• 0 or more listeners get messages round-robin
• Messages queue when nobody listens / if consumers are slow
Publish/Subscribe
• Named exchange
• Each client creates an anonymous ephemeral queue
• Client binds the queue to that exchange
• All clients get all messages
• Messages go to /dev/null if no clients
AMQP can be complex
• Different exchange types: direct, topic, fanout (and custom exchange types are possible)
• Messages have a routing key allowing selective binding.
• You can do a lot using these and a mix of named and anonymous queues.
• Much more complex topologies are possible.
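A toy Python model of the AMQP wiring from the last few slides, assuming in-process callbacks as consumers. It models only the direct and fanout exchange types; topic routing would swap in a wildcard match on the routing key. `BrokerQueue` and `Exchange` are hypothetical names, not a client library API. The same two classes express both the job queue (one named queue, several consumers served round-robin, backlog kept while nobody listens) and publish/subscribe (a fanout exchange with one anonymous queue per client, messages dropped when no queue is bound).

```python
import itertools
from collections import deque


class BrokerQueue:
    """Toy broker-side queue: FIFO, each message goes to exactly one consumer."""

    def __init__(self, name=None):
        self.name = name  # None stands in for an anonymous, server-named queue
        self.messages = deque()  # backlog held while nobody is consuming
        self.consumers = []
        self._round_robin = iter(())

    def add_consumer(self, consumer):
        self.consumers.append(consumer)
        self._round_robin = itertools.cycle(self.consumers)  # restart the rotation
        while self.messages:  # hand any backlog to the consumers
            self.deliver(self.messages.popleft())

    def deliver(self, message):
        if self.consumers:
            next(self._round_robin)(message)
        else:
            self.messages.append(message)


class Exchange:
    """Toy exchange supporting the 'direct' and 'fanout' routing rules."""

    def __init__(self, kind):
        if kind not in ("direct", "fanout"):
            raise ValueError("this sketch only models direct and fanout exchanges")
        self.kind = kind
        self.bindings = []  # list of (queue, binding_key)

    def bind(self, queue, binding_key=""):
        self.bindings.append((queue, binding_key))

    def publish(self, message, routing_key=""):
        targets = []
        for queue, binding_key in self.bindings:
            matched = self.kind == "fanout" or binding_key == routing_key
            if matched and queue not in targets:  # one copy per queue
                targets.append(queue)
        for queue in targets:
            queue.deliver(message)
        return len(targets)  # 0 means the message was dropped
```

A job published to a direct exchange before any worker connects waits in the named queue and is handed over when the first consumer appears; a fanout exchange with no bound queues returns 0 and the message is gone.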
Implementations
• So, I want a queue.
• What do I use?
  • Naive approaches
  • Job-queue-only approaches
  • More sophisticated / custom approaches
(Shared) database table
• Have a ‘jobs’ table, with some data and a ‘status’ column.
• Waiting => Running => Done
• Job workers poll the table and change statuses.
• NO NO NO NO NO NO NO NO
• No, really, mst will come and break your legs if you do this (after he stops laughing).
• MySQL will get the query plan wrong if you try joining against this table (hint: the cardinality of your status column is 3).
• You will lose, super hard. HAND.
(Queue) database table
• Have separate queued / running / done tables.
• Less bad for performance: at least the ‘find something to do’ query is very cheap.
• Still pretty terrible.
• You still re-invented a big old wheel here, probably badly.
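For illustration only (the verdict above still stands), here is roughly what the separate-tables approach looks like, sketched with Python’s `sqlite3` rather than MySQL. Claiming a job reads the oldest row of a small `queued` table with no status filter, then moves it to `running` inside one write transaction. It assumes an autocommit connection (`isolation_level=None`) so that the explicit `BEGIN IMMEDIATE` controls locking; table and function names are hypothetical.

```python
import sqlite3
from contextlib import contextmanager

# AUTOINCREMENT stops SQLite reusing ids of deleted rows, which would
# collide with ids already moved into running/done.
SCHEMA = """
CREATE TABLE queued  (id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT NOT NULL);
CREATE TABLE running (id INTEGER PRIMARY KEY, payload TEXT NOT NULL);
CREATE TABLE done    (id INTEGER PRIMARY KEY, payload TEXT NOT NULL);
"""


@contextmanager
def write_transaction(conn):
    """Take SQLite's write lock up front so two workers cannot claim the same row."""
    conn.execute("BEGIN IMMEDIATE")
    try:
        yield
    except BaseException:
        conn.execute("ROLLBACK")
        raise
    else:
        conn.execute("COMMIT")


def enqueue(conn, payload):
    return conn.execute("INSERT INTO queued (payload) VALUES (?)", (payload,)).lastrowid


def claim_next(conn):
    """The cheap 'find something to do' query: oldest row, no status column."""
    with write_transaction(conn):
        row = conn.execute("SELECT id, payload FROM queued ORDER BY id LIMIT 1").fetchone()
        if row is not None:
            conn.execute("DELETE FROM queued WHERE id = ?", (row[0],))
            conn.execute("INSERT INTO running (id, payload) VALUES (?, ?)", row)
    return row


def finish(conn, job_id):
    with write_transaction(conn):
        conn.execute(
            "INSERT INTO done (id, payload) SELECT id, payload FROM running WHERE id = ?",
            (job_id,),
        )
        conn.execute("DELETE FROM running WHERE id = ?", (job_id,))
```

Even in this small form the wheel is being reinvented: locking, ordering, polling and recovery of rows stuck in `running` after a crash are all left to you.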
Gearman
• Around since 2006. Multi-platform.
• Not really a message queue; designed as a job queuing system.
• Client, Job Server, Worker
• Failover (multiple job servers)
• NOT persistent
• Simple, works well (if that’s all you need)
TheSchwartz
• Is persistent
• From the same place as Gearman
• Not as widely adopted
• Relies on a MySQL database: SPOF
• Still simple; maybe the easiest way to get started (if you need reliability)?
Client libs: Net::Stomp
• Apache ActiveMQ.
• Dead simple producers and consumers.
• Just a client: you need to manage / run your own jobs.
• Blocking.
• Works perfectly well for sending messages from a web app.
Client libs: Net::RabbitFoot
• AMQP / RabbitMQ
• AnyEvent based: non-blocking.
• Documentation pretty poor (sorry).
• Works well if you have an async app.
• Can be used inside a web app for simple sending.
Catalyst::Engine::Stomp
• By chrisa @ Venda & yours truly; now maintained by Paul Mooney.
• Simple framework for writing jobs / managing workers.
• Allows you to fire and forget, or fire and wait for termination (and pass a message back).
• Achieves many of the same things as Gearman.
• Used in anger by several companies. (http://miltonkeynes.pm.org/talks/2010/06/paul_mooney_stomp_moosex_workers.pdf)
Net::ActiveMQ
• Builds on Catalyst::Engine::Stomp
• Chisel talked about this at YAPC::EU (Friday PM, ‘Going Postal’)
• Net-A-Porter people: poke him to release it (or at least put it on GitHub)!
Web::Hippie
• Persistent (potentially bidirectional) web pipe to applications.
• Cheap connections, no polling needed (on reasonably modern browsers). Great as the listener part for ‘replies’.
• Downsides: needs to be async, so no DBI (kinda)!
• Plays very nicely with RabbitMQ (hint: MooseX::Storage & Joose.Storage <3)
CatalystX::JobServer
• My current baby.
• Uses AMQP.
• Provides Web::Hippie pipes for jobs: shiny, shiny AJAX updates.
• Barely production ready (but usable).
• I’m talking about it later...
Conclusions
• Use JSON for your message payloads.
• You probably want to use Gearman if you can get away with it.
• STOMP works well and is a simple, tried and tested jobs solution.
• AMQP is nicer and more flexible, but there are fewer proven solutions (in Perl).
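A sketch of a JSON job envelope in Python, to show why JSON suits payloads: any consumer, in Perl or otherwise, can decode it without sharing your serialization library. The `version` / `type` / `args` fields are an assumed layout, not something prescribed by the talk, and the function names are hypothetical.

```python
import json

MESSAGE_VERSION = 1  # hypothetical envelope version, bumped if the layout changes


def encode_job(job_type, args):
    """Serialize a job as a JSON envelope that any language's client can read."""
    envelope = {"version": MESSAGE_VERSION, "type": job_type, "args": args}
    return json.dumps(envelope).encode("utf-8")


def decode_job(raw):
    """Parse an envelope, refusing versions this consumer does not understand."""
    envelope = json.loads(raw.decode("utf-8"))
    if envelope.get("version") != MESSAGE_VERSION:
        raise ValueError(f"unsupported message version: {envelope.get('version')!r}")
    return envelope["type"], envelope["args"]
```

The explicit version field lets producers and consumers be upgraded independently, which matters once they are decoupled by a queue.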