Cassandra queuing


Published in: Technology
  • Polling for the events in Cassandra could quite easily be avoided by using a distributed coordination service such as ZooKeeper in conjunction with Cassandra.

Cassandra queuing

  1. 1. Queuing with Cassandra (David Strauss, [email_address], @davidstrauss)
  2. 2. Why we use queues
      • Loose coupling
          • Different languages/interprocess
          • Integrate legacy and new systems
          • Allow publishers to be unaware of listeners
      • Asynchronous requests
          • Long-running tasks
      • Failure tolerance
          • Of nodes within the queue system
          • Of systems using the queue
  6. 6. Possible queue guarantees: deliver at least once + deliver no more than once = deliver exactly once
  7. 7. “Enterprise” queues
      • ActiveMQ
          • Delivers at most once
          • Punts “at least once” to lower-level redundancy
      • RabbitMQ
          • Clustered
              • No guarantee of “at most once”
              • Will deliver at least once
          • Unclustered
              • Delivers at most once, but could fail
  10. 10. Job queues
      • Beanstalkd
          • Delivers at most once
          • Can optionally persist to disk
      • Gearman
          • Delivers at most once
          • No persistence between restarts
  13. 13. What's annoying about these
      • Inflexible service levels
          • Entire installation or cluster guarantees exactly the same delivery semantics for all messages
          • Not all messages are equal
      • No scalable “at least once” queue
          • RabbitMQ replicates all messages to all nodes
              • Limits scalability to what a single node can do
          • Sending messages redundantly to multiple job queue nodes makes multiple delivery the common case
          • Application-integrated sharding doesn't count
  16. 16. Why Cassandra?
      • Processes queuing messages can use ConsistencyLevel to indicate a service level
          • CL.ZERO is “would be nice to deliver”
              • Same guarantee as a non-persistent queue
          • CL.ONE is low-latency with some durability
              • Same guarantee as a single-node persistent queue
          • CL.QUORUM (or more) is “delivery at least once”
              • Same guarantee as clustered persistence (e.g. Rabbit)
      • Queue is sharded/partitioned across nodes
          • Unlike RabbitMQ
      • Can co-locate queue with data
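The per-message service levels on this slide can be sketched as a small lookup. This is a minimal illustration, not driver code: the `SERVICE_LEVELS` mapping, the importance labels, and the default replication factor of 3 are assumptions made for the example. The only fact relied on is that a quorum is a majority of replicas (`replication_factor // 2 + 1`).

```python
from enum import Enum


class ConsistencyLevel(Enum):
    """Write consistency levels named on the slide (ALL covers "or more")."""
    ZERO = "ZERO"
    ONE = "ONE"
    QUORUM = "QUORUM"
    ALL = "ALL"


def required_acks(level: ConsistencyLevel, replication_factor: int) -> int:
    """Number of replica acknowledgements a write waits for before returning."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    if level is ConsistencyLevel.ZERO:
        return 0  # fire and forget: "would be nice to deliver"
    if level is ConsistencyLevel.ONE:
        return 1  # one replica has the message
    if level is ConsistencyLevel.QUORUM:
        return replication_factor // 2 + 1  # a majority of replicas
    return replication_factor  # ALL: every replica


# Hypothetical mapping from message importance to a service level.
SERVICE_LEVELS = {
    "best_effort": ConsistencyLevel.ZERO,
    "durable": ConsistencyLevel.ONE,
    "at_least_once": ConsistencyLevel.QUORUM,
}


def acks_for_message(importance: str, replication_factor: int = 3) -> int:
    """Look up how many acknowledgements a message of this importance needs."""
    return required_acks(SERVICE_LEVELS[importance], replication_factor)
```

The point of the sketch is that the publisher picks the level per message, so one cluster can carry both throwaway and important messages.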
  17. 17. Queue data models for Cassandra
      • Use rows as queues
          • Best performance for ordered messages
          • Scale limited to row size (but still huge by queue standards and possible to partition)
      • Use column families as queues with RandomPartitioner (RP)
          • Distributes queue items over a cluster
          • No message ordering
      • Use column families as queues with OrderPreservingPartitioner (OPP)
          • Distributes less well over a cluster
          • Provides message ordering
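The "rows as queues" model can be illustrated with an in-memory stand-in. This is a sketch under stated assumptions, not Cassandra client code: `RowQueueStore`, its method names, the integer sequence used in place of a time-based column name, and the `queue:shard` row key scheme are all hypothetical. It shows columns kept in sorted order inside a row, and a queue partitioned over several rows when a single row would grow too large.

```python
import bisect
import itertools
from collections import defaultdict


class RowQueueStore:
    """In-memory stand-in for a column family where each row is one queue shard.

    Columns inside a row are kept sorted by name, mimicking how columns
    within a single row are stored in comparator order.
    """

    def __init__(self, shards_per_queue: int = 1):
        self.shards_per_queue = shards_per_queue
        self._rows = defaultdict(list)      # row key -> sorted [(column name, payload)]
        self._sequence = itertools.count()  # assumption: stands in for a time-based name

    def _row_key(self, queue: str, shard: int) -> str:
        return f"{queue}:{shard}"

    def enqueue(self, queue: str, payload: str) -> int:
        """Append a message as a new column; returns the column name used."""
        column_name = next(self._sequence)
        shard = column_name % self.shards_per_queue
        bisect.insort(self._rows[self._row_key(queue, shard)], (column_name, payload))
        return column_name

    def peek(self, queue: str, shard: int = 0, limit: int = 10):
        """Read the oldest messages in one row (a slice from the row's start)."""
        return self._rows[self._row_key(queue, shard)][:limit]

    def delete(self, queue: str, shard: int, column_name: int) -> None:
        """Remove a processed message from its row."""
        row = self._rows[self._row_key(queue, shard)]
        row[:] = [col for col in row if col[0] != column_name]
```

With one shard, ordering is total; with several shards, ordering holds only within each row, which is the trade-off the slide describes.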
  21. 21. When you want or need “at most once” semantics
      • When things are idempotent, you don't
      • When trying to avoid resource contention or redundant computation
          • Possible to make single consumer the common case with smart consumers
          • memcached for imperfect but scalable/HA locking
      • When something absolutely cannot happen more than once
          • The “bank transfer” case
          • Give messages unique identity and use locking managed by consumers
          • Use a locking framework like ZooKeeper
          • Audit periodically for the effects of duplication and correct
          • Maybe don't use Cassandra...
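Consumer-managed locking keyed on a message's unique identity can be sketched as follows. This is an illustration under assumptions: `FakeCache` is an in-memory stand-in that models only memcached's `add` semantics (store the value only if the key is absent), and `try_claim`, `consume`, and the `lock:` key prefix are hypothetical names. A real cache can also evict or expire a lock, which is not modeled here and is part of why the slide calls this approach imperfect.

```python
import threading


class FakeCache:
    """In-memory stand-in for memcached; only `add` semantics are modeled."""

    def __init__(self):
        self._data = {}
        self._mutex = threading.Lock()

    def add(self, key: str, value: str) -> bool:
        """Store value only if key is absent; True means this call stored it."""
        with self._mutex:
            if key in self._data:
                return False
            self._data[key] = value
            return True


def try_claim(cache: FakeCache, message_id: str, consumer_id: str) -> bool:
    """Claim a message by its unique identity; at most one caller wins."""
    return cache.add(f"lock:{message_id}", consumer_id)


def consume(cache: FakeCache, message_id: str, consumer_id: str, handler) -> bool:
    """Run handler only if this consumer wins the claim for the message."""
    if not try_claim(cache, message_id, consumer_id):
        return False  # another consumer already claimed this message
    handler(message_id)
    return True
```

For example, if consumers `a` and `b` both read `msg-1` from the queue, only the first call to `consume` runs the handler; the second returns `False` and skips the work.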
  28. 28. What's annoying about Cassandra queues
      • Polling is necessary
          • Makes this bad for low-latency queues
      • Adding locking requires interfacing code with multiple systems
          • Even then, locking is usually optimistic rather than a coordinated reservation of work items
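The latency cost of polling can be made concrete with a small loop. This is a sketch, not a recommended consumer: `poll_queue`, its parameters, and the doubling backoff policy are assumptions chosen for illustration, and `fetch` stands in for whatever read the consumer issues against the queue row.

```python
import time
from typing import Callable, List


def poll_queue(
    fetch: Callable[[], List[str]],
    handle: Callable[[str], None],
    max_polls: int,
    min_delay: float = 0.1,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[float]:
    """Poll `fetch` up to max_polls times, backing off while the queue is empty.

    Returns the delays that were slept, which makes the added latency visible.
    """
    delay = min_delay
    delays = []
    for _ in range(max_polls):
        messages = fetch()
        if messages:
            for message in messages:
                handle(message)
            delay = min_delay  # reset the backoff after finding work
            continue           # poll again immediately
        sleep(delay)
        delays.append(delay)
        delay = min(delay * 2, max_delay)  # back off while idle
    return delays
```

A message that arrives just after an empty poll waits for the whole current delay before it is seen, so a larger `max_delay` reduces load on the cluster at the price of higher worst-case latency.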
  29. 29. So, consider Cassandra for queuing if you have...
      • Different messages with different delivery importance
          • But most messages need to reach consumers “at least once”
      • Limited need for “at most once” guarantees
      • Too much message volume to handle throughput on a single node
      • Willingness to poll and have high latency