Cassandra queuing

Published in: Technology
1 Comment
  • Polling for the events in Cassandra could quite easily be avoided by using a distributed coordination service such as ZooKeeper in conjunction with Cassandra.

Transcript

  • Queuing with Cassandra
    David Strauss [email_address] @davidstrauss
  • Why we use queues
      - Loose coupling
          - Different languages/interprocess
          - Integrate legacy and new systems
          - Allow publishers to be unaware of listeners
      - Asynchronous requests
          - Long-running tasks
      - Failure tolerance
          - Of nodes within the queue system
          - Of systems using the queue
  • Possible queue guarantees
      - Deliver at least once + Deliver no more than once = Deliver exactly once
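The equation on this slide can be made concrete with a small in-memory simulation. This is a sketch under stated assumptions, not a Cassandra client: `AtLeastOnceQueue` and `DedupConsumer` are hypothetical names. The queue redelivers anything not yet acknowledged (at least once), and the consumer skips message IDs it has already processed (no more than once). Together they give exactly-once processing, provided the set of seen IDs is itself durable, which this sketch does not attempt.

```python
class AtLeastOnceQueue:
    """Hypothetical queue: a message stays pending until acknowledged,
    so a consumer that fails before acking sees it again."""

    def __init__(self):
        self.pending = {}  # message_id -> payload, in publish order

    def publish(self, message_id, payload):
        self.pending[message_id] = payload

    def deliver(self):
        # Every unacknowledged message is handed out again (at least once).
        return list(self.pending.items())

    def ack(self, message_id):
        self.pending.pop(message_id, None)


class DedupConsumer:
    """Hypothetical consumer: remembers processed IDs so a redelivered
    message causes no second side effect (no more than once)."""

    def __init__(self):
        self.seen = set()  # would need durable storage in practice
        self.processed = []

    def handle(self, message_id, payload):
        if message_id in self.seen:
            return False  # duplicate delivery: skip it
        self.seen.add(message_id)
        self.processed.append(payload)
        return True


queue = AtLeastOnceQueue()
consumer = DedupConsumer()
for i in range(3):
    queue.publish(f"m{i}", f"payload-{i}")

# Round 1: everything is processed, but the ack for m1 is "lost".
for message_id, payload in queue.deliver():
    consumer.handle(message_id, payload)
    if message_id != "m1":
        queue.ack(message_id)

# Round 2: m1 is redelivered; the dedup check suppresses a second run.
for message_id, payload in queue.deliver():
    consumer.handle(message_id, payload)
    queue.ack(message_id)

print(consumer.processed)  # ['payload-0', 'payload-1', 'payload-2']
```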
  • “Enterprise” queues
      - ActiveMQ
          - Delivers at most once
          - Punts “at least once” to lower-level redundancy
      - RabbitMQ
          - Clustered
              - No guarantee of “at most once”
              - Will deliver at least once
          - Unclustered
              - Delivers at most once, but could fail
  • Job queues
      - Beanstalkd
          - Delivers at most once
          - Can optionally persist to disk
      - Gearman
          - Delivers at most once
          - No persistence between restarts
  • What's annoying about these
      - Inflexible service levels
          - Entire installation or cluster guarantees exactly the same delivery semantics for all messages
          - Not all messages are equal
      - No scalable “at least once” queue
          - RabbitMQ replicates all messages to all nodes
              - Limits scalability to what a single node can do
          - Sending messages redundantly to multiple job queue nodes makes multiple delivery the common case
          - Application-integrated sharding doesn't count
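The scalability complaint can be put in numbers. The sketch below is a back-of-the-envelope model with hypothetical names. It assumes messages spread evenly over the nodes of a partitioned store, and it takes the slide's description of clustered RabbitMQ (every node holds every message) at face value.

```python
def per_node_messages(total_messages, nodes, replication_factor=None):
    """Messages each node must store under two placement models.

    replication_factor=None: full replication, every node holds every message.
    integer: partitioned store keeping that many copies, spread evenly.
    """
    if replication_factor is None:
        return total_messages
    if not 1 <= replication_factor <= nodes:
        raise ValueError("replication_factor must be between 1 and nodes")
    return total_messages * replication_factor / nodes


# Full replication: adding nodes does not reduce per-node load.
print(per_node_messages(1_000_000, 6))      # 1000000
print(per_node_messages(1_000_000, 12))     # 1000000
# Partitioned with three copies: per-node load falls as nodes are added.
print(per_node_messages(1_000_000, 6, 3))   # 500000.0
print(per_node_messages(1_000_000, 12, 3))  # 250000.0
```

Under full replication the cluster's capacity is capped by one node, whereas a partitioned store with a fixed replication factor keeps gaining capacity as nodes are added.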
  • Why Cassandra?
      - Processes queuing messages can use ConsistencyLevel to indicate a service level
          - CL.ZERO is “would be nice to deliver”
              - Same guarantee as a non-persistent queue
          - CL.ONE is low-latency with some durability
              - Same guarantee as a single-node persistent queue
          - CL.QUORUM (or more) is “delivery at least once”
              - Same guarantee as clustered persistence (e.g. Rabbit)
      - Queue is sharded/partitioned across nodes
          - Unlike RabbitMQ
      - Can co-locate queue with data
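The three service levels on this slide correspond to how many replicas must acknowledge a write before the client treats it as accepted. The helper below is hypothetical and uses plain strings rather than any driver's `ConsistencyLevel` type. `SERVICE_LEVELS` is an illustrative mapping of message importance to level. `ZERO` belongs to the early Cassandra API the slide refers to and is not among the consistency levels of current Cassandra releases.

```python
def replicas_required(level, replication_factor):
    """Replica acknowledgements a write at `level` waits for (hypothetical helper)."""
    if level == "ZERO":
        return 0  # fire and forget: the client waits for no acknowledgement
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return replication_factor // 2 + 1  # strict majority of replicas
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unknown consistency level: {level}")


# Illustrative mapping from message importance to write level.
SERVICE_LEVELS = {
    "nice_to_have": "ZERO",    # like a non-persistent queue
    "low_latency": "ONE",      # like a single-node persistent queue
    "must_deliver": "QUORUM",  # like clustered persistence
}

rf = 3
for importance, level in SERVICE_LEVELS.items():
    needed = replicas_required(level, rf)
    print(importance, level, needed)
# nice_to_have ZERO 0
# low_latency ONE 1
# must_deliver QUORUM 2
```

Because a quorum is a strict majority, any two quorums of the same replica set share at least one replica. That overlap is what lets a QUORUM read see a message written at QUORUM, even after one of three replicas fails.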
  • Queue data models for Cassandra
      - Use rows as queues
          - Best performance for ordered messages
          - Scale limited to row size (but still huge by queue standards and possible to partition)
      - Use column families as queues with RP (RandomPartitioner)
          - Distributes queue items over a cluster
          - No message ordering
      - Use column families as queues with OPP (OrderPreservingPartitioner)
          - Distributes less well over a cluster
          - Provides message ordering
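The difference between the two partitioners can be simulated without a cluster. This is a simplified sketch. `rp_token` follows RandomPartitioner in deriving a token from the MD5 digest of the row key, but `rp_node` reduces that token modulo the node count instead of walking a real token ring. `opp_node` models OrderPreservingPartitioner as nodes owning contiguous key ranges. The range boundary and the queue item keys are hypothetical.

```python
import bisect
import hashlib


def rp_token(key):
    """RandomPartitioner-style token: the key's MD5 digest as an integer."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


def rp_node(key, num_nodes):
    """Simplified placement: token modulo node count (not a real ring)."""
    return rp_token(key) % num_nodes


def opp_node(key, boundaries):
    """Order-preserving placement: node i owns keys k with
    boundaries[i-1] <= k < boundaries[i]; the last node owns the rest."""
    return bisect.bisect_right(boundaries, key)


# Hypothetical queue item keys that sort in enqueue order.
keys = [f"item-{i:05d}" for i in range(100)]

# OPP: consecutive items land on the same node, so a range scan returns
# them in order, but every new item goes to the node owning the top range.
opp_placement = [opp_node(k, ["item-00050"]) for k in keys]
print(opp_placement[:3], opp_placement[-3:])  # [0, 0, 0] [1, 1, 1]

# RP: items scatter across nodes, and token order no longer matches
# enqueue order, so the cluster cannot hand them back in sequence.
rp_placement = [rp_node(k, 4) for k in keys]
print(sorted(set(rp_placement)))
print(sorted(keys, key=rp_token) == keys)  # False
```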
  • When you want or need “at most once” semantics
      - When things are idempotent, you don't
      - When trying to avoid resource contention or redundant computation
          - Possible to make single consumer the common case with smart consumers
          - memcached for imperfect but scalable/HA locking
      - When something absolutely cannot happen more than once
          - The “bank transfer” case
          - Give messages unique identity and use locking managed by consumers
          - Use a locking framework like ZooKeeper
          - Audit periodically for the effects of duplication and correct them
          - Maybe don't use Cassandra...
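The consumer-managed locking idea can be sketched with “add if absent” semantics, which memcached provides through its `add` command (it stores a value only when the key is not already set). The store below is an in-memory stand-in, not a memcached client. `AddIfAbsentStore`, `try_claim`, and the `lock:` key prefix are hypothetical. Each message carries a unique ID, and only the first consumer to claim that ID processes it. Because the lock expires, a consumer that stalls past the TTL can lose its claim. This is one reason such locking is imperfect, and it fits the slide's pointer to ZooKeeper or periodic audits for the bank-transfer case.

```python
import time


class AddIfAbsentStore:
    """In-memory stand-in for a store with memcached-style `add`:
    a write succeeds only if the key is absent or its entry has expired."""

    def __init__(self, clock=time.monotonic):
        self._entries = {}  # key -> (value, expires_at)
        self._clock = clock

    def add(self, key, value, ttl):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return False  # someone else holds an unexpired entry
        self._entries[key] = (value, now + ttl)
        return True


def try_claim(store, consumer_id, message_id, ttl=30.0):
    """Claim a message by its unique ID; only the first claimant wins."""
    return store.add(f"lock:{message_id}", consumer_id, ttl)


# A controllable clock makes the expiry behaviour easy to show.
fake_now = [0.0]
store = AddIfAbsentStore(clock=lambda: fake_now[0])

print(try_claim(store, "worker-a", "transfer-42"))  # True
print(try_claim(store, "worker-b", "transfer-42"))  # False
fake_now[0] = 31.0  # worker-a stalled past the 30 s TTL
print(try_claim(store, "worker-b", "transfer-42"))  # True: a second claim
```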
  • What's annoying about Cassandra queues
      - Polling is necessary
          - Makes this bad for low-latency queues
      - Adding locking requires interfacing code with multiple systems
          - Even then, locking is usually optimistic rather than a coordinated reservation of work items
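The polling cost shows up clearly in a minimal consumer loop. This is a sketch, not code from the talk. `fetch_batch` is a hypothetical stand-in for whatever read pulls pending messages from Cassandra (for example, a slice of the first columns in a queue row), and the backoff parameters are illustrative. Empty polls double the wait up to `max_delay`, so idle periods cost fewer reads. The trade-off is that a message arriving during a long wait can sit for up to `max_delay` before it is seen.

```python
import time


def poll(fetch_batch, handle, max_polls, min_delay=0.05, max_delay=2.0,
         sleep=time.sleep):
    """Poll with exponential backoff; returns the waits taken after empty polls."""
    delay = min_delay
    waits = []
    for _ in range(max_polls):
        batch = fetch_batch()
        if batch:
            for message in batch:
                handle(message)
            delay = min_delay  # traffic resumed: go back to fast polling
            continue
        sleep(delay)
        waits.append(delay)
        delay = min(delay * 2, max_delay)  # idle: back off, up to the cap
    return waits


# Scripted reads: one message, three empty polls, another message, one empty poll.
batches = iter([["a"], [], [], [], ["b"], []])
handled = []
waits = poll(lambda: next(batches, []), handled.append, max_polls=6,
             min_delay=1, max_delay=4, sleep=lambda seconds: None)
print(handled, waits)  # ['a', 'b'] [1, 2, 4, 1]
```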
  • So, consider Cassandra for queuing if you have...
      - Different messages with different delivery importance
          - But most messages need to reach consumers “at least once”
      - Limited need for “at most once” guarantees
      - Too much message volume for a single node to handle
      - Willingness to poll and accept high latency
