Cassandra for all the Things

4,544 views

At Burt we use Cassandra for a little bit of everything. We have a graph database, a tracing system, a stream processing engine and a document store that uses it for storage, and of course, we use it for time series too – but with a twist.

Cassandra works great for all of these use cases, but not out of the box. We've learned the hard way what not to do, and what to do instead.

Published in: Technology

Cassandra for all the Things

  1. CASSANDRA FOR ALL THE THINGS @iconara
  2. CASSANDRA FOR ALL THE THINGS
  3. THEO: Chief Architect at Burt
  4. THEO: Cassandra MVP and original author of the DataStax Ruby driver
  5. CASSANDRA FOR ALL THE THINGS*
  6. CASSANDRA FOR ALL THE THINGS* (* where it makes sense)
  7. CASSANDRA FTW
  8. BIG TABLE: partitions as sorted maps
  9. DYNAMO: availability and simple ops
  10. LOGS AS STORAGE: optimized for writes
  11. CASSANDRA FTW
  12. CONVERSION TRACKING $ $ $ $ $
  13. ATTRIBUTION $
  14. ATTRIBUTION 90s 5s 30s $
  15. visitor ID timestamp timestamp timestamp event event event
  16. visitor ID visitor ID timestamp timestamp timestamp event event event timestamp timestamp timestamp event event event visitor ID visitor ID timestamp timestamp timestamp event event event timestamp timestamp timestamp event event event visitor ID timestamp timestamp timestamp event event event ➔
  17. 1/10,000
  18. TTL: automatic windowing
  19. ANALYTICS: precomputed cubes
  20. SELECT category, author, device_type, COUNT(*) AS pageviews, AVG(duration) AS duration FROM pageviews GROUP BY 1, 2, 3
  21. SELECT category, author, device_type, COUNT(*) AS pageviews, AVG(duration) AS duration FROM pageviews GROUP BY 1, 2, 3
  22. SELECT category, author, device_type, COUNT(*) AS pageviews, AVG(duration) AS duration FROM pageviews GROUP BY 1, 2, 3
  23. date + cube slice + time slice + time slice + time metrics metrics metrics
  24. “sports”, “Sue”, “tablet” date + cube slice + time slice + time slice + time metrics metrics metrics category, author, device type
  25. date + cube slice + time slice + time slice + time metrics metrics metrics {pageviews: 234, duration: 2451.45}
  26. COUNTERS: at 2M increments per second they’re not cost-effective ×
  27. WIDE ROWS FAIL: cardinality kills
  28. “sports”, “Sue”, “tablet” date + cube slice + time slice + time slice + time metrics metrics metrics
  29. remember when we thought this was a problem?
  30. date + cube + shard slice + time slice + time slice + time metrics metrics metrics H(slice) mod 12
  31. TIME SERIES
  32. cube + slice + metric time time time value value value cube slice slice slice
  33. NARROW ROWS
  34. cube + slice + epoch + metric time + version time + version time + version value value value cube + epoch slice slice slice time/1000
  35. NARROW ROWS: enables parallel reads
  36. SELECT category, device_type, COUNT(*) AS pageviews, AVG(duration) AS duration FROM pageviews WHERE category = 'sports' GROUP BY 1, 2
  37. category + device type culture + computer culture + tablet news + computer news + tablet news + mobile sports + computer sports + tablet sports + mobile
  38. supports wildcard on device type category + device type culture + computer culture + tablet news + computer news + tablet news + mobile sports + computer sports + tablet sports + mobile device type + category computer + culture computer + news computer + sports mobile + news mobile + sports tablet + culture tablet + news tablet + sports supports wildcard on category
  39. TRANSIENT STATE ➔ ➔
  40. session ID timestamp timestamp timestamp data data data shard session ID session ID session ID H(session ID) mod 360
  41. WIDE ROWS FAIL (again)
  42. shard 9 NFNJ0U NFNJ13 NFNJ0Y NFNJ1L NFNJ16 NFNJ0U NFNJ1C NFNJ20 NFNJ0U NFNJ2L NFNJ2S NFNJ2D NFNJ2U NFNJ1X NFNJ4C NFNJ1O NFNJ1A NFNJ29 × × × × × × × × × × × × × × × ×
  43. session ID timestamp timestamp timestamp data data data shard session ID session ID session ID
  44. session ID timestamp timestamp timestamp data data data shard + slot session ID session ID session ID
  45. session ID timestamp timestamp timestamp data data data shard + slot session ID session ID session ID day-of-year mod 30
  46. shard 9 slot 1 × × × shard 9 slot 3 NFNJ6Z NFNJ73 NFNJ7B NFNJ72 NFNJ7N NFNJ74 NFNJ6Z NFNJ8C NFNJ7N NFNJ88 NFNJ83 NFNJ94 NFNJ8Z NFNJ7P NFNJA3 × × × × × × shard 9 slot 2 NFNJ58 NFNJ5H NFNJ5C NFNJ5E NFNJ68 NFNJ6H NFNJ68 NFNJ5M NFNJ64 NFNJ6H NFNJ6C NFNJ5U NFNJ5K NFNJ84 NFNJ7Y NFNJ62 NFNJ70 NFNJ74 NFNJ6Q × × × × × × × × × × × × × × × ×
  47. AND MORE
  48. GRAPHS
  49. TRACES
  50. KEY/VALUE
  51. CASSANDRA FOR ALL THE THINGS
  52. @iconara burtcorp.com architecturalatrocities.com
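
The slides name three key-derivation rules without showing code: a shard of H(slice) mod 12 added to the date + cube key, an epoch of time/1000 added to the narrow-row key, and a shard + slot pair for the session index (H(session ID) mod 360 and day-of-year mod 30). The Python below is a minimal sketch of how such keys might be computed, not the code used at Burt. The function names, the tuple layout of each key, the use of MD5 as a stand-in for H, and the reading of "time/1000" as integer division are all assumptions made for illustration.

```python
import hashlib
from datetime import date


def stable_hash(value: str) -> int:
    """Deterministic stand-in for the slides' H(); MD5 is an assumption."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def cube_partition_key(day: date, cube: str, slice_: str, shards: int = 12):
    """date + cube + shard, with shard = H(slice) mod 12.

    Spreads a high-cardinality cube over a fixed number of partitions
    instead of one very wide row per date and cube.
    """
    shard = stable_hash(slice_) % shards
    return (day.isoformat(), cube, shard)


def narrow_row_key(cube: str, slice_: str, metric: str, timestamp: int,
                   epoch_size: int = 1000):
    """cube + slice + epoch + metric, with epoch = time / 1000.

    The unit of `timestamp` is not stated on the slide; integer
    division by 1000 is an assumed reading of "time/1000".
    """
    epoch = timestamp // epoch_size
    return (cube, slice_, epoch, metric)


def session_index_key(session_id: str, day: date,
                      shards: int = 360, slots: int = 30):
    """shard + slot, with shard = H(session ID) mod 360 and
    slot = day-of-year mod 30.

    Writes move to a new slot every day, so an old slot can be
    left behind once its sessions have expired.
    """
    shard = stable_hash(session_id) % shards
    slot = day.timetuple().tm_yday % slots
    return (shard, slot)
```

Under these assumptions, all slices of a cube for one date land in at most 12 partitions, each (cube, slice, metric) series is cut into one narrow row per epoch that can be read in parallel, and the session index for a given shard rotates through 30 slots over the year.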
