Cassandra Community Webinar: CMB - An Open Message Bus for the Cloud


At Comcast Silicon Valley we have developed a general-purpose message bus for the cloud. The service is API-compatible with Amazon’s SQS/SNS and is built on Cassandra and Redis, with the goal of linear horizontal scalability. In this webinar we will explore the architecture of the system and how we use Cassandra as a central component to meet key requirements. We will also look at the latest performance numbers.

  1. CMB: A Message Bus for the Cloud
  2. CMB: A Message Bus for the Cloud
     • CQS: Queuing Service
     • CNS: Topic-based Pub/Sub Service
  3. Why did we build our own?
     • General-purpose message bus to replace project-driven one-off solutions
     • Smooth data center failover, maybe even “active-active” queues
     • Must scale to millions of queues and 1000s of messages/sec (for example, 1 queue per set-top box (STB))
     • Tight latency requirements (“10 ms response time, 95th pct”)
     • Evaluated other options before settling on AWS SQS/SNS
  4. AWS SQS Primer: “Simple Queue Service”
     • Focus on guaranteed delivery
     • Best-effort ordering; duplicate deliveries possible
     • Few simple core APIs:
       – CreateQueue() / DeleteQueue()
       – SendMessage()
       – ReceiveMessage()
       – DeleteMessage()
     • Do not trust message recipients
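The last bullet is what the visibility timeout enforces: ReceiveMessage() hides a message instead of removing it, and only DeleteMessage() removes it for good, so a consumer that crashes mid-processing cannot lose the message. The minimal in-memory sketch below models those semantics, assuming a single process and an injectable clock; the class and method names are hypothetical and this is neither AWS's nor CMB's implementation.

```python
import itertools
import time
from typing import Callable, Dict, Optional, Tuple


class InMemoryQueue:
    """Toy, single-process model of SQS-style at-least-once delivery (hypothetical)."""

    def __init__(self, visibility_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.visibility_timeout = visibility_timeout
        self.clock = clock
        # message_id -> (body, time at which the message is visible again);
        # dicts keep insertion order, which gives best-effort FIFO here.
        self._messages: Dict[str, Tuple[str, float]] = {}
        # receipt handle -> message_id; a fresh handle is issued per receive
        self._handles: Dict[str, str] = {}
        self._ids = itertools.count(1)
        self._receipts = itertools.count(1)

    def send_message(self, body: str) -> str:
        message_id = f"msg-{next(self._ids)}"
        self._messages[message_id] = (body, float("-inf"))  # visible at once
        return message_id

    def receive_message(self) -> Optional[Tuple[str, str]]:
        now = self.clock()
        for message_id, (body, visible_at) in self._messages.items():
            if visible_at <= now:
                # Hide rather than remove: if the consumer never calls
                # delete_message(), the message reappears after the timeout.
                self._messages[message_id] = (body, now + self.visibility_timeout)
                handle = f"rh-{next(self._receipts)}"
                self._handles[handle] = message_id
                return handle, body
        return None

    def delete_message(self, receipt_handle: str) -> None:
        message_id = self._handles.pop(receipt_handle, None)
        if message_id is not None:
            self._messages.pop(message_id, None)
```

Because delivery is at-least-once, consumers should tolerate seeing the same message twice, which is consistent with the duplicates bullet on this slide.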
  5. Why did we build our own? AWS SQS:
     • Guaranteed delivery: +
     • Simple, standard API: +
     • Horizontally scalable: +
     • Active-active: ?
     • DC failover: ?
     • Latency: ?
     • Limitations (message size, number of artifacts, …): ?
  6. “Build a horizontally scalable queuing service on top of Cassandra (and Redis) that is API-compatible with the AWS SQS API”
  7. CQS over Cassandra & Redis
     • Cassandra
       – Cross-DC persistence and replication
       – Proven horizontal scalability
     • Redis
       – Meet latency requirements
       – Help with best-effort ordering
       – Handle Visibility Timeout (VTO)
  8. Cassandra Data Modeling
     • How to represent queued messages in Cassandra?
       – Single-column queue
       – Single-row queue
       – Multi-row queue
  9. Cassandra Data Modeling: Single-Column Queue
  10. Cassandra Data Modeling: Single-Row Queue
  11. Cassandra Data Modeling: Multi-Row Queue
  12. CQS Data Flow Example
      1. SendMessage(MSG1)
      2. SendMessage(MSG2)
      3. SendMessage(MSG3)
      4. MSG1 = ReceiveMessage()
      5. DeleteMessage(MSG1)
  13. CQS Data Flow Example
  14. CQS Data Flow Example
  15. CQS Data Flow Example
  16. CQS Data Flow Example
  17. CQS Data Flow Example
  18. CQS Data Flow Example
  19. CQS Architecture Recap
      • Cassandra persistence layer
        – Messages sharded across 100 rows per queue, to:
            • Avoid wide rows (> 500K)
            • Minimize churn (tombstones)
            • Distribute each queue among Cassandra nodes
      • Redis caching layer
        – To meet latency requirements:
            • Payload cache (kicks in after first miss, preloads next 10K)
        – Improve FIFOness by storing message IDs in a Redis list
        – Handle message visibility entirely in Redis (hash table)
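The sharding bullet can be illustrated with a small in-memory model. It assumes a row-key format of `<queue>_<shard>`, a random shard per message, and a shard number embedded in the message ID; all three are illustrative choices, since the slide only states that each queue is spread over 100 rows to keep rows narrow and distribute load.

```python
import random
import uuid
from typing import Dict, Optional

NUM_SHARDS = 100  # rows per queue, as stated on the slide


def row_key(queue_name: str, shard: int) -> str:
    # Hypothetical row-key format: one Cassandra row per (queue, shard) pair.
    return f"{queue_name}_{shard}"


class ShardedQueueStore:
    """In-memory stand-in for a column family: row key -> {message_id: body}."""

    def __init__(self, num_shards: int = NUM_SHARDS, seed: Optional[int] = None) -> None:
        self.num_shards = num_shards
        self.rows: Dict[str, Dict[str, str]] = {}
        self._rng = random.Random(seed)

    def send(self, queue_name: str, body: str) -> str:
        # Spread writes over all rows of the queue so no single row grows wide
        # and the queue's rows can land on different nodes of the ring.
        shard = self._rng.randrange(self.num_shards)
        # Assumption: the shard number is embedded in the message ID so that
        # delete() can find the row directly instead of scanning all rows.
        message_id = f"{shard}:{uuid.uuid4()}"
        self.rows.setdefault(row_key(queue_name, shard), {})[message_id] = body
        return message_id

    def delete(self, queue_name: str, message_id: str) -> bool:
        shard = int(message_id.split(":", 1)[0])
        row = self.rows.get(row_key(queue_name, shard))
        if row is None or message_id not in row:
            return False
        del row[message_id]
        return True
```

A reader then has to visit several rows to find messages, which is one reason the slide pairs this layout with a Redis list of message IDs for ordering.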
  20. Cassandra Data Modeling: Key Cassandra Features
      • Persistence and failover
        – Cross-DC replication combined with LOCAL_QUORUM reads/writes (tunable consistency)
      • Millions of queues, spiky traffic patterns
        – Massive horizontal scalability
      • Message order (FIFOness) / future-dated messages
        – Wide rows, composite column keys / TimeUUID and column sort order
      • Message retention period (expiration)
        – TTL
      • Fast lookup of static metadata (queues, users, etc.)
        – Row cache, secondary indexes
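The FIFO, future-dated message and retention bullets all lean on Cassandra keeping a row's columns sorted by key. A sketch of that idea in plain Python, assuming a composite column key of (visible-at time, TimeUUID) and a per-column expiry standing in for TTL; it emulates the sort order in memory and is not CQL or CMB's actual schema. The 4-day default mirrors SQS's default retention period and is an assumption here.

```python
import bisect
import uuid
from typing import Dict, List, Tuple

# Composite column key: (visible-at ms, TimeUUID timestamp, TimeUUID).
# Sorting on it mimics Cassandra keeping a wide row's columns in key order.
ColumnKey = Tuple[int, int, uuid.UUID]

DEFAULT_TTL_MS = 4 * 24 * 3600 * 1000  # assumed 4-day retention (SQS's default)


class QueueRow:
    """One queue row whose columns stay sorted by composite key."""

    def __init__(self) -> None:
        self._keys: List[ColumnKey] = []
        # key -> (body, expires_at_ms)
        self._values: Dict[ColumnKey, Tuple[str, int]] = {}

    def send(self, body: str, now_ms: int, delay_ms: int = 0,
             ttl_ms: int = DEFAULT_TTL_MS) -> ColumnKey:
        tu = uuid.uuid1()  # time-based UUID, analogous to Cassandra's TimeUUID
        key = (now_ms + delay_ms, tu.time, tu)  # delay_ms > 0 => future-dated
        bisect.insort(self._keys, key)
        self._values[key] = (body, now_ms + ttl_ms)
        return key

    def visible(self, now_ms: int, limit: int = 10) -> List[str]:
        out: List[str] = []
        for key in self._keys:
            if key[0] > now_ms:
                break  # keys are sorted, so every later column is still in the future
            body, expires_at = self._values[key]
            if expires_at > now_ms:  # emulate TTL: expired columns are skipped
                out.append(body)
                if len(out) == limit:
                    break
        return out
```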
  21. Cassandra Data Modeling: Lessons Learned
      • Coming from an RDBMS background…
        – Forget the table analogy; rather:
            • CF (column family) = HashMap<RowKey, TreeMap<ColKey, ColValue>>
        – No need to specify column names in advance
            • Wide rows, value-less columns, composite keys
        – No unique constraints, no foreign keys, no joins:
            • Design the schema around your queries
            • Use denormalization where needed
        – No inserts (everything is an update!)
            • Design for idempotent operations
            • Use globally unique identifiers
        – But there are indexes
            • Use secondary indexes
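The HashMap<RowKey, TreeMap<ColKey, ColValue>> analogy and the “everything is an update” rule can be made concrete with a toy model. Python has no TreeMap, so a sorted key list maintained with `bisect` stands in for one; the `ColumnFamily` class and `column_range()` method are hypothetical names for this sketch only.

```python
import bisect
from typing import Dict, List, Tuple


class ColumnFamily:
    """Toy model: row key -> columns kept sorted by column key."""

    def __init__(self) -> None:
        # row key -> (sorted column keys, column key -> value)
        self._rows: Dict[str, Tuple[List[str], Dict[str, str]]] = {}

    def insert(self, row_key: str, col_key: str, value: str = "") -> None:
        # Upsert: writing an existing (row, column) overwrites it, so replaying
        # the same write is harmless (idempotent). Value-less columns are fine.
        keys, values = self._rows.setdefault(row_key, ([], {}))
        if col_key not in values:
            bisect.insort(keys, col_key)
        values[col_key] = value

    def column_range(self, row_key: str, start: str, end: str) -> List[Tuple[str, str]]:
        # Range scan over sorted column keys: start inclusive, end exclusive.
        keys, values = self._rows.get(row_key, ([], {}))
        lo = bisect.bisect_left(keys, start)
        hi = bisect.bisect_left(keys, end)
        return [(k, values[k]) for k in keys[lo:hi]]
```

Replaying the same insert leaves a single column, which is why globally unique identifiers make retried writes safe.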
  22. CQS Scalability and Availability
      • Scalability
        – Send(), Receive(), Delete()
            • Scale with the Cassandra ring, API servers (stateless) and Redis shards
            • Are constant-time operations
        – Queues not sharded across Redis servers!
      • Availability
        – Depends on availability of Cassandra
        – Service functions without Redis!
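The two emphasized bullets on this slide, that a queue is never split across Redis servers and that the service still works without Redis, can be combined in one read path. A sketch under stated assumptions: each queue is pinned to one cache shard by a stable hash of its name, and any cache failure falls through to the persistence layer. The shard callables, `CacheUnavailable` and `read_message()` are hypothetical stand-ins, not the Redis client API or CMB classes.

```python
import zlib
from typing import Callable, List, Optional

# A cache shard is modeled as a callable: message_id -> payload or None.
CacheShard = Callable[[str], Optional[str]]


class CacheUnavailable(Exception):
    """Raised by a (hypothetical) cache shard that cannot be reached."""


def shard_for_queue(queue_name: str, num_shards: int) -> int:
    # crc32 is stable across processes (unlike Python's salted hash()),
    # so every stateless API server maps a given queue to the same shard.
    return zlib.crc32(queue_name.encode("utf-8")) % num_shards


def read_message(queue_name: str, message_id: str,
                 shards: List[CacheShard],
                 load_from_store: Callable[[str, str], Optional[str]]) -> Optional[str]:
    shard = shards[shard_for_queue(queue_name, len(shards))]
    try:
        cached = shard(message_id)
        if cached is not None:
            return cached
    except CacheUnavailable:
        pass  # degrade gracefully: the persistent store remains the source of truth
    return load_from_store(queue_name, message_id)
```

Plain modulo hashing remaps most queues when the shard count changes; consistent hashing is the usual alternative if shards are added often.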
  23. CQS DC Failover
  24. AWS SNS API: “Simple Notification Service”
      • Topic-based publish/subscribe service
      • Supported protocols: HTTP/CQS/SQS
      • Few simple core APIs:
        – CreateTopic() / DeleteTopic()
        – Subscribe() / Unsubscribe()
        – ConfirmSubscription()
        – Publish()
      • Do not trust message recipients (redelivery policy)
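Subscribe(), ConfirmSubscription() and Publish() interact through a confirmation step: a new subscription stays pending until the endpoint proves it wants the traffic, and only confirmed subscriptions receive published messages. A minimal in-memory sketch of that lifecycle, assuming a random token per subscription; the `Topic` and `Subscription` classes are hypothetical.

```python
import secrets
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Subscription:
    protocol: str  # e.g. "http" or "cqs"
    endpoint: str
    token: str
    confirmed: bool = False


@dataclass
class Topic:
    name: str
    subscriptions: List[Subscription] = field(default_factory=list)

    def subscribe(self, protocol: str, endpoint: str) -> Subscription:
        # New subscriptions start unconfirmed. In SNS the token goes to the
        # endpoint in a confirmation message; returning it keeps the sketch small.
        sub = Subscription(protocol, endpoint, token=secrets.token_hex(16))
        self.subscriptions.append(sub)
        return sub

    def confirm_subscription(self, token: str) -> bool:
        for sub in self.subscriptions:
            if sub.token == token:
                sub.confirmed = True
                return True
        return False

    def unsubscribe(self, token: str) -> None:
        self.subscriptions = [s for s in self.subscriptions if s.token != token]

    def publish(self, message: str) -> List[Tuple[str, str]]:
        # Only confirmed subscriptions receive the message; returns the
        # (protocol, endpoint) pairs a delivery worker would target.
        return [(s.protocol, s.endpoint) for s in self.subscriptions if s.confirmed]
```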
  25. CNS Data Flow Example
      1. Publish(MSG1)
      Publish message MSG1 to a topic T with four subscribers:
      • S1 (HTTP)
      • S2 (HTTP)
      • S5 (CQS)
      • S6 (CQS)
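The slide lists only the topic and its subscribers. As an assumption, the fan-out step can be pictured as Publish() turning MSG1 into one delivery job per subscriber on an internal queue, with a publish worker draining that queue and dispatching by protocol. The `DeliveryJob` type and the `publish()`/`run_worker()` functions are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Dict, List, Tuple


@dataclass(frozen=True)
class DeliveryJob:
    message: str
    subscriber: str
    protocol: str


def publish(message: str, subscribers: List[Tuple[str, str]],
            jobs: Deque[DeliveryJob]) -> int:
    """Fan a published message out into one delivery job per subscriber."""
    for name, protocol in subscribers:
        jobs.append(DeliveryJob(message, name, protocol))
    return len(subscribers)


def run_worker(jobs: Deque[DeliveryJob],
               senders: Dict[str, Callable[[str, str], None]]) -> None:
    # A publish worker drains the job queue and dispatches each job by protocol.
    while jobs:
        job = jobs.popleft()
        senders[job.protocol](job.subscriber, job.message)
```

Queuing jobs instead of delivering inline is what lets the CNS Architecture slide claim that messages survive while workers are down; the in-memory `deque` here only stands in for that CQS queue and is not durable.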
  26. CNS Data Flow Example
  27. CNS Data Flow Example
  28. CNS Data Flow Example
  29. CNS Data Flow Example
  30. CNS Architecture
      • CQS queue preserves messages when Publish Workers are down or overloaded
      • CQS visibility timeout takes care of guaranteed delivery
      • Retry policy and guaranteed delivery
      • Publish Workers hardened against rogue endpoints
        – Failing endpoints, slow endpoints, …
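The retry and hardening bullets can be sketched as a bounded retry loop with exponential backoff, where any exception from an endpoint counts as a failed attempt instead of crashing the worker. The parameter names (`max_retries`, `min_delay`, `max_delay`) and the doubling schedule are illustrative assumptions; the slide does not specify CNS's actual retry policy. Slow endpoints would be handled by a timeout inside `send`, which is omitted here.

```python
import time
from typing import Callable, List


def backoff_delays(max_retries: int, min_delay: float, max_delay: float) -> List[float]:
    """Exponential schedule: min_delay, 2*min_delay, 4*min_delay, ... capped at max_delay."""
    return [min(max_delay, min_delay * (2 ** i)) for i in range(max_retries)]


def _attempt(send: Callable[[], bool]) -> bool:
    # Treat exceptions (connection refused, timeouts) like an explicit failure,
    # so one rogue endpoint cannot take down the publish worker.
    try:
        return bool(send())
    except Exception:
        return False


def deliver_with_retries(send: Callable[[], bool], max_retries: int = 3,
                         min_delay: float = 1.0, max_delay: float = 20.0,
                         sleep: Callable[[float], None] = time.sleep) -> bool:
    """Try once, then retry with backoff; False means the endpoint never accepted."""
    if _attempt(send):
        return True
    for delay in backoff_delays(max_retries, min_delay, max_delay):
        sleep(delay)
        if _attempt(send):
            return True
    return False  # caller can mark the endpoint as failing and stop retrying it
```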
  31. Differences between SQS/SNS and CQS/CNS
      • Goal: full API compatibility
      • Current state:
        – All APIs implemented, most parameters supported
        – Can use the AWS Java SDK and others
      • Limitations:
        – AWS4 signatures not supported (V1 and V2 OK)
        – SMS endpoints not supported, limited email support
      • Enhancements:
        – Additional APIs for monitoring and management: PeekMessage(), HealthCheck(), GetWorkerState(), ManageWorker(), ManageService(), GetAPIStats()
        – Unlimited number of queues, topics and subscriptions
        – Adjustable message size and other parameters (SNS <= 64 KB, SQS <= 64 KB, long-poll wait (LP) <= 20 sec, delay seconds (DS) <= 900 sec, retention period (RP), …)
  32. Use Case: X1 Sports App
  33. Use Case: X1 Sports App
  34. Use Case: X1 Sports App
  35. Use Case: CNS with CQS Endpoints
  36. Moving Forward
      • Open sourced (Apache 2.0)
      • Hardening
        – CNS Chaos Monkey, …
      • Follow SNS / SQS
        – SNS throttle policy, AWS4 signatures, …
      • Load and stress testing
      • Simplify deployment & scale-up
        – Embedded Jetty, RPM package, Puppet scripts, …
      • Production deployments (isolated by application)
      • CQS as a Service
      • OpenStack integration
  37. Thank You!!
      forum/