High performance queues with Cassandra

Everyone knows that Cassandra is a NoSQL solution for data storage. To process that data, however, teams often add message queues from an existing messaging provider. This sometimes leads to inconsistent data and adds an extra infrastructure layer to maintain. Since one of our services stores all of its data in Cassandra, we developed a message queue solution on top of it that automatically gained many useful features: scalability, high availability and flexibility. I will present this solution in the talk.

Published in: Technology, Business

Transcript

  • 1. High performance queues with Cassandra Mikalai Alimenkou http://xpinjection.com @xpinjection
  • 2. Kiev, Ukraine #евромайдан
  • 3. How to process data, master? Asynchronously!
  • 4. Queues usage scenario
  • 5. More realistic scenario
  • 6. More realistic scenario
  • 7. Are you crazy? Cassandra for queues?
  • 8. So many cool MQ providers
  • 9. Initial expected load
  • 10. Some specific requirements (diagram labels: 1; Message Queue; External Service Provider)
  • 11. Some specific requirements (diagram labels: Message Queue; 1; External Service Provider)
  • 12. Some specific requirements (diagram labels: 2; Message Queue; External Service Provider)
  • 13. Some specific requirements (diagram labels: Message Queue; 2; 1; Redeliver after 1 hour; External Service Provider)
  • 14. Some specific requirements (diagram labels: 2; Message Queue; External Service Provider; Redeliver after 1 hour)
      Redelivery business logic and external service hourly based usage limits
  • 15. Idea came from railways
  • 16. “Body flow” in regular life
  • 17. Message batches “station” QUEUE NOW ALMOST READY +1 HOUR WAITING +6 HOURS WAITING +12 HOURS WAITING
  • 18. System components: Message
      Message = real request data with a unique ID
      Row layout: MESSAGE ID → 1st field {field data}, 2nd field {field data}, 3rd field {field data}
      ID format (allows 4096 messages per millisecond from one node): timestamp, 44 bits; counter, 12 bits; cluster node ID, 8 bits
  • 19. System components: Batch
      • Open for at least 1 second
      • Closes if open for > 10 seconds
      • Closes if it has > 100 messages
      Row layout (ascending column ordering): BATCH ID → 1st message ID {opt message data}, 2nd message ID {opt message data}, 3rd message ID {opt message data}
      ID format (requires a batch to stay open for > 1 second): timestamp rounded to seconds; cluster node ID + batch type as the last 3 digits
  • 20. System components: Queue
      • Similar to batch
      • Unlimited
      • May have batches with past time
      Row layout (ascending column ordering): QUEUE NAME → 1st batch ID {processed at}, 2nd batch ID {processed at}, 3rd batch ID {processed at}
  • 21. System components: Broker
      Diagram labels: batches polling; BROKER; check batch time; process batch; PROCESSOR; QUEUE; lock batch for processing; ZOOKEEPER
      • Natural pre-fetch thanks to batches
      • Easy to control message processing
      • Simple concurrency model
      • Easily scalable between nodes
      • No high load on Cassandra
  • 22. System components: Processor PROCESSOR
  • 23. System components: Processor PROCESSOR OK
  • 24. System components: Processor PROCESSOR
  • 25. System components: Processor
      Diagram labels: PROCESSOR; redeliver on failure; ANOTHER BATCH
      • Tries to process messages as quickly as possible
      • On error, just redelivers the message
      • Messages are processed concurrently
      • Any redelivery business logic is easy to implement
  • 26. Warnings and benefits
      • Message and batch must be checked before processing
      • Hard to explain “queue” size
      • Separate columns for status tracking of a message
      • Perform correct compaction from time to time
      • Expected load is handled with a single node
      • Everything works on commodity hardware
      • Single storage for all data
      • System is easily scalable and reliable (no message was lost)
  • 27. Show me the code!
  • 28. @xpinjection http://xpinjection.com mikalai.alimenkou@xpinjection.com
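
Slide 18 describes a message ID made of a 44-bit timestamp, a 12-bit counter and an 8-bit cluster node ID, which gives 2^12 = 4096 IDs per millisecond per node. The transcript does not include the talk's code, so the following is only a minimal Python sketch of how such an ID could be packed. The bit order (timestamp in the high bits, then counter, then node ID), the millisecond clock and the names `MessageIdGenerator` and `decode` are assumptions made for illustration.

```python
import threading
import time

TIMESTAMP_BITS = 44
COUNTER_BITS = 12
NODE_BITS = 8
MAX_COUNTER = (1 << COUNTER_BITS) - 1  # 4095, i.e. 4096 IDs per millisecond


class MessageIdGenerator:
    """Hypothetical 64-bit ID generator; the field order is an assumption."""

    def __init__(self, node_id, clock=lambda: int(time.time() * 1000)):
        if not 0 <= node_id < (1 << NODE_BITS):
            raise ValueError("node_id must fit in 8 bits")
        self._node_id = node_id
        self._clock = clock          # returns the current time in milliseconds
        self._lock = threading.Lock()
        self._last_ms = -1
        self._counter = 0

    def next_id(self):
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                now = self._last_ms  # clock moved backwards: stay on the last millisecond
            if now == self._last_ms:
                if self._counter == MAX_COUNTER:
                    # All 4096 IDs of this millisecond are used: wait for the next one.
                    while now <= self._last_ms:
                        now = self._clock()
                    self._counter = 0
                else:
                    self._counter += 1
            else:
                self._counter = 0
            self._last_ms = now
            timestamp = now & ((1 << TIMESTAMP_BITS) - 1)
            return ((timestamp << (COUNTER_BITS + NODE_BITS))
                    | (self._counter << NODE_BITS)
                    | self._node_id)


def decode(message_id):
    """Split an ID back into (timestamp_ms, counter, node_id)."""
    node_id = message_id & ((1 << NODE_BITS) - 1)
    counter = (message_id >> NODE_BITS) & MAX_COUNTER
    timestamp_ms = message_id >> (NODE_BITS + COUNTER_BITS)
    return timestamp_ms, counter, node_id
```

With the timestamp in the high bits, IDs from one node sort in creation order, which fits the ascending column ordering mentioned on slide 19.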
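
Slide 19 lists three batch rules: a batch is open for at least 1 second, closes if open for more than 10 seconds, and closes if it has more than 100 messages. How the rules combine is not spelled out in the transcript. The Python sketch below assumes the 1-second minimum takes priority, since the slide says the ID format requires it. The `Batch` class and its method names are hypothetical.

```python
import bisect
from dataclasses import dataclass, field

MIN_OPEN_SECONDS = 1    # batch ID uses a timestamp rounded to seconds
MAX_OPEN_SECONDS = 10
MAX_MESSAGES = 100


@dataclass
class Batch:
    """Hypothetical in-memory model of a batch row."""
    opened_at: float                       # seconds
    message_ids: list = field(default_factory=list)

    def add(self, message_id):
        # Keep IDs in ascending order, mirroring the ascending column ordering.
        bisect.insort(self.message_ids, message_id)

    def should_close(self, now):
        age = now - self.opened_at
        if age <= MIN_OPEN_SECONDS:
            # Assumption: the minimum open time wins over the size limit.
            return False
        return age > MAX_OPEN_SECONDS or len(self.message_ids) > MAX_MESSAGES
```

Under this reading a full batch can briefly exceed 100 messages during its first second, which trades a strict size cap for unique batch IDs.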
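
Slide 21 shows the broker polling batches from the queue, checking the batch time, locking the batch through ZooKeeper and handing it to the processor. The Python sketch below illustrates one polling pass under several assumptions: the queue is modelled as a dict of batch ID to processed-at time, `InMemoryLocks` is a stand-in for ZooKeeper rather than a real client, and `batch_time` assumes the batch ID is the timestamp in seconds followed by three digits, as slide 19 suggests. All names are hypothetical.

```python
import threading


class InMemoryLocks:
    """Stand-in for a ZooKeeper lock service; not a real ZooKeeper client."""

    def __init__(self):
        self._held = set()
        self._guard = threading.Lock()

    def try_lock(self, key):
        with self._guard:
            if key in self._held:
                return False
            self._held.add(key)
            return True

    def unlock(self, key):
        with self._guard:
            self._held.discard(key)


def batch_time(batch_id):
    # Assumption: ID = timestamp in seconds, then 3 digits for node ID + batch type.
    return batch_id // 1000


def poll_once(queue, locks, process_batch, now):
    """Process every due, unprocessed batch once; return the IDs handled.

    queue maps batch ID -> processed-at time, or None if not processed yet.
    """
    handled = []
    for batch_id in sorted(queue):           # ascending batch IDs
        if queue[batch_id] is not None:
            continue                          # already processed
        if batch_time(batch_id) > now:
            break                             # later batches are not due either
        if not locks.try_lock(batch_id):
            continue                          # another broker holds this batch
        try:
            # Re-check after locking: batches must be checked before processing.
            if queue[batch_id] is None:
                process_batch(batch_id)
                queue[batch_id] = now
                handled.append(batch_id)
        finally:
            locks.unlock(batch_id)
    return handled
```

Because a whole batch is locked and processed at once, the lock service is contacted once per batch rather than once per message, which is one way to read the slide's "simple concurrency model" point.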
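
Slide 25 says the processor handles messages concurrently and, on error, redelivers the message into another batch; slide 17 shows waiting batches at +1, +6 and +12 hours. The Python sketch below combines the two. Mapping the number of failed attempts to those three delays is an assumption, as are the function names and the `redeliver` callback, which stands in for whatever writes the message into a later batch.

```python
from concurrent.futures import ThreadPoolExecutor

HOUR = 3600
# Assumption: delays taken from the +1 / +6 / +12 hour batches on slide 17.
REDELIVERY_DELAYS = [1 * HOUR, 6 * HOUR, 12 * HOUR]


def redelivery_delay(failed_attempts):
    """Delay in seconds after the given number of failures (1 or more)."""
    index = min(failed_attempts, len(REDELIVERY_DELAYS)) - 1
    return REDELIVERY_DELAYS[index]


def process_batch(messages, handler, redeliver, now, max_workers=4):
    """Process (message_id, failed_attempts) pairs concurrently.

    Failed messages are passed to redeliver(message_id, failed_attempts, due_time).
    Returns the list of (message_id, failed_attempts) that failed.
    """
    def run(item):
        message_id, failed_attempts = item
        try:
            handler(message_id)
            return None
        except Exception:
            # On error just redeliver: record one more failed attempt.
            return (message_id, failed_attempts + 1)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(run, messages))  # map keeps input order

    failed = [r for r in results if r is not None]
    for message_id, failed_attempts in failed:
        redeliver(message_id, failed_attempts, now + redelivery_delay(failed_attempts))
    return failed
```

Keeping the delay schedule in one small function is what makes custom redelivery rules, such as the hourly usage limits of an external service, easy to swap in.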
