Anton Kropp
CassieQ: The Distributed Queue Built On Cassandra
© DataStax, All Rights Reserved.
Why use queues?
• Distribution of work
• Decoupling producers/consumers
• Reliability
2
© DataStax, All Rights Reserved.
Existing Queues
• ActiveMQ
• RabbitMQ
• MSMQ
• Kafka
• SQS
• Azure Queue
• others
3
© DataStax, All Rights Reserved.
Advantage of a queue on c*
• Highly available
• Highly distributed
• Massive intake
• Masterless
• Re-use existing data store/operational knowledge
4
© DataStax, All Rights Reserved. 5
But aren’t queues antipatterns?
© DataStax, All Rights Reserved.
Issues with queues in C*
• Modeling off deletes
• Tombstones
• Evenly distributing messages?
• What is the partition key?
• How to synchronize consumers?
6
© DataStax, All Rights Reserved.
Existing C* queues
• Netflix Astyanax recipe
• Cycled time based partitioning
• Row based reader lock
• Messages put into time shard ordered by insert time
• Relies on deletes
• Requires low gc_grace_seconds for fast compaction
7
© DataStax, All Rights Reserved.
Existing C* queues
• Comcast CMB
• Uses Redis as actual queue (cheating)
• Queues are hashed to affine to same redis server
• Cassandra is cold storage backing store
• Random partitioning between 0 and 100
8
© DataStax, All Rights Reserved.
Missing features
• Authentication
• Authorization
• Statistics
• Simple deployment
• Requirement on external infrastructure
9
© DataStax, All Rights Reserved.
CassieQ
• HTTP(s) based API
• No locking
• Fixed size bucket partitioning
• Leverages pointers (kafkaesque)
• Message invisibility
• Azure Queue/SQS inspired
• Docker deployment
• Authentication/authorization
• Ideally once delivery
• Best attempt at FIFO (not guaranteed)
10
© DataStax, All Rights Reserved. 11
docker run –it 
-p 8080:8080 
–p 8081:8081 
paradoxical/cassieq dev
© DataStax, All Rights Reserved.
CassieQ Queue API
12
© DataStax, All Rights Reserved.
CassieQ Admin API
13
© DataStax, All Rights Reserved.
CassieQ workflow
• Client is authorized on an account
• Granular client authorization up to queue level
• Client consumes message from queue with message lease (invisibility)
• Gets pop receipt
• Client acks message with pop receipt
• If pop receipt not valid, lease expired
• Client can update messages
• Update message contents
• Renew lease
14
© DataStax, All Rights Reserved. 16
Lets dig inside
CassieQ internals
© DataStax, All Rights Reserved.
TLDR
• Messages partitioned into fixed sized buckets
• Pointers to buckets/messages used to track current state
• Use of lightweight transactions for atomic actions to avoid locking
• Bucketing + pointers eliminates modeling off deletes
17
© DataStax, All Rights Reserved.
CassieQ Buckets
• Messages stored in fixed sized buckets
• Deterministic when full
• Easy to reason about
• Why not time buckets?
• Time bugs suck
• Non deterministic
• Can miss data due to time overlaps
• Messages given monotonic ID
• CAS “id” table
• Bucket # = monotonicId / bucketSize
18
© DataStax, All Rights Reserved.
Pointers to Buckets/Messages
• Reader pointer
• Tracks which bucket a consumer is on
• Repair pointer
• Tracks first non-finalized bucket
• Invisibility pointer
• Tracks first unacked message
19
All 3 pointers point to monotonic id value, potentially in different
buckets
1 2 3 4 5
InvisPointer ReaderPointer
RepairPointer
Pointers to Buckets
© DataStax, All Rights Reserved.
Schema
21
CREATE TABLE queue (
account_name text,
queuename text,
bucket_size int,
version int,
...
PRIMARY KEY (account_name, queuename)
);
CREATE TABLE message (
queueid text,
bucket_num bigint,
monoton bigint,
message text,
version int,
acked boolean,
next_visible_on timestamp,
delivery_count int,
tag text,
created_date timestamp,
updated_date timestamp,
PRIMARY KEY ((queueid, bucket_num), monoton)
);
*queueid=accountName:queueName:version
© DataStax, All Rights Reserved. 22
Reading messages
© DataStax, All Rights Reserved.
Pointers to Buckets/Messages
• Reader pointer
• Tracks which bucket a consumer is on
• Repair pointer
• Tracks first non-finalized bucket
• Invisibility pointer
• Tracks first unacked message
23
© DataStax, All Rights Reserved.
Reading from a bucket
• Read any unacked message in bucket (either FIFO or random)
• Consume message (update its internal version + set its invisibility timeout)
• Return to consumer
24
1 2 3 4
Bucket 1
Undelivered messages
Reader pointer start
1 2 ? 4 5
Buckets… complications
• Once a monoton is generated, it is taken
• Even if a message fails to insert the monoton is taken
• Buckets are now partially filled!
• How to resolve?
1 2 ? 4 5 6
Bucket 2Bucket 1
Reader
Message 3
missing
When to move off a bucket?
1. All known messages in the bucket have been delivered at least once
2. All new messages being written in future buckets
1 2 ? 4 5 6 7 …
Bucket 2Bucket 1
Reader @
bucket 2
Message 3
missing
Tombstone
When to move off a bucket?
• Tombstoning (not cassandra tombstoning, naming is hard!)
• Bucket is sealed, no more writes
• Reader tombstones bucket after its reached
Tombstoning enables us to detect delayed writes
1 2 ? 4 5 6 7 …
Bucket 2Bucket 1
Reader @
bucket 2
Message 3
missing
Tombstone
© DataStax, All Rights Reserved. 29
Repairing delayed
messages
© DataStax, All Rights Reserved.
Pointers to Buckets/Messages
• Reader pointer
• Tracks which bucket a consumer is on
• Repair pointer
• Tracks first non-finalized bucket
• Invisibility pointer
• Tracks first unacked message
30
© DataStax, All Rights Reserved.
Repairing delayed writes
• Scenarios:
• Message taking its time writing (still alive, but slow)
• Message claimed monoton but is dead
• Resolution:
• Watch for tombstone in bucket
• Wait for repair timeout (30 seconds)
• If message shows up, republish
• If not, finalize bucket and move to next bucket (message is dead)
31
Repairing delayed writes
1 2 ? 4 5 6 7 …
Bucket 2Bucket 1
Reader
@ bucket
2
Message 3
missing
Tombstone
Repair Pointer
@ bucket 1
wait 30 seconds…
Repairing delayed writes
1 2 3 4 5 6 7 …
Bucket 2Bucket 1
Reader
@ bucket
2 +
Message 3
Showed up!
Tombstone
Repair Pointer
@ bucket 1
Republished to end
Repairing delayed writes
1 2 3 4 5 6 7 … 3
Bucket 2..Bucket 1
Tombstone
Repair Pointer
@ bucket 2
Reader
@ bucket
2 +
© DataStax, All Rights Reserved. 36
Invisibility
and the unhappy path ☹
© DataStax, All Rights Reserved. 37
What is invisibility?
© DataStax, All Rights Reserved. 38
A mechanism for
message re-delivery
(in a stateless system)
© DataStax, All Rights Reserved.
Pointers to Buckets/Messages
• Reader pointer
• Tracks which bucket a consumer is on
• Repair pointer
• Tracks first non-finalized bucket
• Invisibility pointer
• Tracks first unacked message
39
The happy path
• Client consumes message
• Message is marked as “invisible” with a “re-visibility” timestamp
• Client gets pop receipt encapsulating metadata (including version)
• Client acks within timeframe
• Message marked as consumed if version is the same
The unhappy path :(
• Client doesn’t ack within timeframe
• Message needs to be redelivered
• Subsequent reads checks the invis pointer for visibility
• If max delivers exceeded, push to optional DLQ
• Else redeliver!
© DataStax, All Rights Reserved.
The unhappy path :(
42
1 2 3 4 5 6 7 …
Bucket 1
Invisibility
pointer
Reader
Bucket
pointer
© DataStax, All Rights Reserved.
The unhappy path :(
43
1 2 3 4 5 6 7 …
Bucket 1
Invisibility
pointer
Reader
Bucket
pointer
© DataStax, All Rights Reserved.
The unhappy path :(
44
1 2 3 4 5* 6 7 …
Bucket 1
Invisibility
pointer
Reader
Bucket
pointer
ackack ack out expired
Long term invisibility is bad
• InvisPointer WILL NOT move past a unacked message
• Invisible messages can block other invisible messages
• Possible to starve future messages
© DataStax, All Rights Reserved.
The unhappy path :(
46
1 2 3 4 5 6 7 …
Bucket 1
Invisibility
pointer Reader
Bucket
pointer
ackack ack DLQ ou t
© DataStax, All Rights Reserved.
Conclusion
• Building a queue on c* is hard
• Limited by performance of lightweight transactions and underlying c* choices
• compaction strategies, cluster usage, etc
• Need to make trade off design choices
• CassieQ is used in production but in not stressed under highly contentious scenarios
47
Questions?
or feedback/thoughts/visceral reactions
Contribute to the antipattern @ paradoxical.io
https://github.com/paradoxical-io/cassieq

CassieQ: The Distributed Message Queue Built on Cassandra (Anton Kropp, Curalate) | C* Summit 2016

  • 1.
    Anton Kropp CassieQ: TheDistributed Queue Built On Cassandra
  • 2.
    © DataStax, AllRights Reserved. Why use queues? • Distribution of work • Decoupling producers/consumers • Reliability 2
  • 3.
    © DataStax, AllRights Reserved. Existing Queues • ActiveMQ • RabbitMQ • MSMQ • Kafka • SQS • Azure Queue • others 3
  • 4.
    © DataStax, AllRights Reserved. Advantage of a queue on c* • Highly available • Highly distributed • Massive intake • Masterless • Re-use existing data store/operational knowledge 4
  • 5.
    © DataStax, AllRights Reserved. 5 But aren’t queues antipatterns?
  • 6.
    © DataStax, AllRights Reserved. Issues with queues in C* • Modeling off deletes • Tombstones • Evenly distributing messages? • What is the partition key? • How to synchronize consumers? 6
  • 7.
    © DataStax, AllRights Reserved. Existing C* queues • Netflix Astyanax recipe • Cycled time based partitioning • Row based reader lock • Messages put into time shard ordered by insert time • Relies on deletes • Requires low gc_grace_seconds for fast compaction 7
  • 8.
    © DataStax, AllRights Reserved. Existing C* queues • Comcast CMB • Uses Redis as actual queue (cheating) • Queues are hashed to affine to same redis server • Cassandra is cold storage backing store • Random partitioning between 0 and 100 8
  • 9.
    © DataStax, AllRights Reserved. Missing features • Authentication • Authorization • Statistics • Simple deployment • Requirement on external infrastructure 9
  • 10.
    © DataStax, AllRights Reserved. CassieQ • HTTP(s) based API • No locking • Fixed size bucket partitioning • Leverages pointers (kafkaesque) • Message invisibility • Azure Queue/SQS inspired • Docker deployment • Authentication/authorization • Ideally once delivery • Best attempt at FIFO (not guaranteed) 10
  • 11.
    © DataStax, AllRights Reserved. 11 docker run –it -p 8080:8080 –p 8081:8081 paradoxical/cassieq dev
  • 12.
    © DataStax, AllRights Reserved. CassieQ Queue API 12
  • 13.
    © DataStax, AllRights Reserved. CassieQ Admin API 13
  • 14.
    © DataStax, AllRights Reserved. CassieQ workflow • Client is authorized on an account • Granular client authorization up to queue level • Client consumes message from queue with message lease (invisibility) • Gets pop receipt • Client acks message with pop receipt • If pop receipt not valid, lease expired • Client can update messages • Update message contents • Renew lease 14
  • 16.
    © DataStax, AllRights Reserved. 16 Lets dig inside CassieQ internals
  • 17.
    © DataStax, AllRights Reserved. TLDR • Messages partitioned into fixed sized buckets • Pointers to buckets/messages used to track current state • Use of lightweight transactions for atomic actions to avoid locking • Bucketing + pointers eliminates modeling off deletes 17
  • 18.
    © DataStax, AllRights Reserved. CassieQ Buckets • Messages stored in fixed sized buckets • Deterministic when full • Easy to reason about • Why not time buckets? • Time bugs suck • Non deterministic • Can miss data due to time overlaps • Messages given monotonic ID • CAS “id” table • Bucket # = monotonicId / bucketSize 18
  • 19.
    © DataStax, AllRights Reserved. Pointers to Buckets/Messages • Reader pointer • Tracks which bucket a consumer is on • Repair pointer • Tracks first non-finalized bucket • Invisibility pointer • Tracks first unacked message 19
  • 20.
    All 3 pointerspoint to monotonic id value, potentially in different buckets 1 2 3 4 5 InvisPointer ReaderPointer RepairPointer Pointers to Buckets
  • 21.
    © DataStax, AllRights Reserved. Schema 21 CREATE TABLE queue ( account_name text, queuename text, bucket_size int, version int, ... PRIMARY KEY (account_name, queuename) ); CREATE TABLE message ( queueid text, bucket_num bigint, monoton bigint, message text, version int, acked boolean, next_visible_on timestamp, delivery_count int, tag text, created_date timestamp, updated_date timestamp, PRIMARY KEY ((queueid, bucket_num), monoton) ); *queueid=accountName:queueName:version
  • 22.
    © DataStax, AllRights Reserved. 22 Reading messages
  • 23.
    © DataStax, AllRights Reserved. Pointers to Buckets/Messages • Reader pointer • Tracks which bucket a consumer is on • Repair pointer • Tracks first non-finalized bucket • Invisibility pointer • Tracks first unacked message 23
  • 24.
    © DataStax, AllRights Reserved. Reading from a bucket • Read any unacked message in bucket (either FIFO or random) • Consume message (update its internal version + set its invisibility timeout) • Return to consumer 24 1 2 3 4 Bucket 1 Undelivered messages Reader pointer start
  • 25.
    1 2 ?4 5 Buckets… complications • Once a monoton is generated, it is taken • Even if a message fails to insert the monoton is taken • Buckets are now partially filled! • How to resolve?
  • 26.
    1 2 ?4 5 6 Bucket 2Bucket 1 Reader Message 3 missing When to move off a bucket? 1. All known messages in the bucket have been delivered at least once 2. All new messages being written in future buckets
  • 27.
    1 2 ?4 5 6 7 … Bucket 2Bucket 1 Reader @ bucket 2 Message 3 missing Tombstone When to move off a bucket? • Tombstoning (not cassandra tombstoning, naming is hard!) • Bucket is sealed, no more writes • Reader tombstones bucket after its reached
  • 28.
    Tombstoning enables usto detect delayed writes 1 2 ? 4 5 6 7 … Bucket 2Bucket 1 Reader @ bucket 2 Message 3 missing Tombstone
  • 29.
    © DataStax, AllRights Reserved. 29 Repairing delayed messages
  • 30.
    © DataStax, AllRights Reserved. Pointers to Buckets/Messages • Reader pointer • Tracks which bucket a consumer is on • Repair pointer • Tracks first non-finalized bucket • Invisibility pointer • Tracks first unacked message 30
  • 31.
    © DataStax, AllRights Reserved. Repairing delayed writes • Scenarios: • Message taking its time writing (still alive, but slow) • Message claimed monoton but is dead • Resolution: • Watch for tombstone in bucket • Wait for repair timeout (30 seconds) • If message shows up, republish • If not, finalize bucket and move to next bucket (message is dead) 31
  • 32.
    Repairing delayed writes 12 ? 4 5 6 7 … Bucket 2Bucket 1 Reader @ bucket 2 Message 3 missing Tombstone Repair Pointer @ bucket 1
  • 33.
  • 34.
    Repairing delayed writes 12 3 4 5 6 7 … Bucket 2Bucket 1 Reader @ bucket 2 + Message 3 Showed up! Tombstone Repair Pointer @ bucket 1 Republished to end
  • 35.
    Repairing delayed writes 12 3 4 5 6 7 … 3 Bucket 2..Bucket 1 Tombstone Repair Pointer @ bucket 2 Reader @ bucket 2 +
  • 36.
    © DataStax, AllRights Reserved. 36 Invisibility and the unhappy path ☹
  • 37.
    © DataStax, AllRights Reserved. 37 What is invisibility?
  • 38.
    © DataStax, AllRights Reserved. 38 A mechanism for message re-delivery (in a stateless system)
  • 39.
    © DataStax, AllRights Reserved. Pointers to Buckets/Messages • Reader pointer • Tracks which bucket a consumer is on • Repair pointer • Tracks first non-finalized bucket • Invisibility pointer • Tracks first unacked message 39
  • 40.
    The happy path •Client consumes message • Message is marked as “invisible” with a “re-visibility” timestamp • Client gets pop receipt encapsulating metadata (including version) • Client acks within timeframe • Message marked as consumed if version is the same
  • 41.
    The unhappy path:( • Client doesn’t ack within timeframe • Message needs to be redelivered • Subsequent reads checks the invis pointer for visibility • If max delivers exceeded, push to optional DLQ • Else redeliver!
  • 42.
    © DataStax, AllRights Reserved. The unhappy path :( 42 1 2 3 4 5 6 7 … Bucket 1 Invisibility pointer Reader Bucket pointer
  • 43.
    © DataStax, AllRights Reserved. The unhappy path :( 43 1 2 3 4 5 6 7 … Bucket 1 Invisibility pointer Reader Bucket pointer
  • 44.
    © DataStax, AllRights Reserved. The unhappy path :( 44 1 2 3 4 5* 6 7 … Bucket 1 Invisibility pointer Reader Bucket pointer ackack ack out expired
  • 45.
    Long term invisibilityis bad • InvisPointer WILL NOT move past a unacked message • Invisible messages can block other invisible messages • Possible to starve future messages
  • 46.
    © DataStax, AllRights Reserved. The unhappy path :( 46 1 2 3 4 5 6 7 … Bucket 1 Invisibility pointer Reader Bucket pointer ackack ack DLQ ou t
  • 47.
    © DataStax, AllRights Reserved. Conclusion • Building a queue on c* is hard • Limited by performance of lightweight transactions and underlying c* choices • compaction strategies, cluster usage, etc • Need to make trade off design choices • CassieQ is used in production but in not stressed under highly contentious scenarios 47
  • 48.
    Questions? or feedback/thoughts/visceral reactions Contributeto the antipattern @ paradoxical.io https://github.com/paradoxical-io/cassieq