Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Cassandra NYC 2011 Data Modeling


Published on

Published in: Technology, Economy & Finance
  • Hi Matthew,

    I'm looking for the best way to model my new problem:
    I have many systems which send to the DB a set of metrics (ex: system1 sends every 10 secondes, temp1 - temp2 - acc1 - vel1 - ...)

    When I see your 9th slide, I wonder if it's the best way to store the values, as the systems don't have the same metrics, and maybe the metrics can changes over the time (ex: tomorrow, the system1 won't have the temp2, but have acc2 and acc3).

    What would be the best way to do?

    Philippe (
    Are you sure you want to  Yes  No
    Your message goes here

Cassandra NYC 2011 Data Modeling

  1. 1. Data Modeling ExamplesMatthew F. Dennis // @mdennis
  2. 2. Overview● general guiding goals for Cassandra data models● Interesting and/or common examples/questions to get us started● Should be plenty of time at the end for questions, so bring them up if you have them !
  3. 3. Data Modeling Goals● Keep data queried together on disk together● In a more general sense think about the efficiency of querying your data and work backward from there to a model in Cassandra● Usually, you shouldnt try to normalize your data (contrary to many use cases in relational databases)● Usually better to keep a record that something happened as opposed to changing a value (not always the best approach though)
  4. 4. Time Series Data● Easily the most common use of Cassandra ● Financial tick data ● Click streams ● Sensor data ● Performance metrics ● GPS data ● Event logs ● etc, etc, etc ...● All of the above are essentially the same as far as C* is concerned
  5. 5. Time Series Thought Model● Things happen in some timestamp ordered stream and consist of values associated with the given timestamp (i.e. “data points”) – Every 30 seconds record location, speed, heading and engine temp – Every 5 minutes record CPU, IO and Memory usage● We are interested in recreating, aggregating and/or analyzing arbitrary time slices of the stream – Where was agent:007 and what was he doing between 11:21am and 2:38pm yesterday? – What are the last N actions foo did on my site?
  6. 6. Data Points Defined● Each data point has 1-N values● Each data point corresponds to a specific point in time or an interval/bucket (e.g. 5 th minute of 17th hour on some date)
  7. 7. Data Points Mapped to Cassandra● Row Key is id of the data point stream bucketed by time – e.g. plane01:jan_2011 or plane01:jan_01_2011 for month or day buckets respectively● Column Name is TimeUUID(timestamp of date point)● Column Value is serialized data point – JSON, XML, pickle, msgpack, thrift, protobuf, avro, BSON, WTFe● Bucketing – Avoids always requiring multiple seeks when only small slices of the stream are requested (e.g. stream is 5 years old but Im on only interested in Jan 5 th 3 years ago and/or yesterday between 2pm and 3pm). – Make it easy to lazily aggregate old stream activity – Reduces compaction overhead since old rows will never have to be merged again (until you “back fill” and/or delete something)
  8. 8. A Slightly More Concrete Example● Sensor data from airplanes● Every 30 seconds each plane sends latitude+longitude, altitude and wine remaining in mdennis glass.
  9. 9. The Visual plane5:jan_2011 TimeUUID0 TimeUUID1 TimeUUID2 p5:j11 28.90, 124.30 45K feet 28.85, 124.25 44K feet 28.81, 124.22 44K feet 70% 50% 95% Middle of the ocean and half a glass of wine at 44K feet● Row Key is the id of stream being recorded (e.g. plane5:jan_2011)● Column Name is timestamp (or TimeUUID) associated with the data point● Column Value is the value of the event (e.g. protobuf serialized lat/long+alt+wine_level)
  10. 10. Querying● When querying, construct TimeUUIDs for the min/max of the time range in question and use them as the start/end in your get_slice call● Or use a empty start and/or end along with a count
  11. 11. Bucket Sizes?● Depends greatly on ● Average size of time slice queried ● Average data point size ● Write rate of data points to a stream ● IO capacity of the nodes
  12. 12. So... Bucket Sizes?● No Bigger than a few GB per row ● bucket_size * write_rate * sizeof(avg_data_point)● Bucket size >= average size of time slice queried● No more than maybe 10M entries per row● No more than a month if you have lots of different streams● NB: there are exceptions to all of the above, which are really nothing more than guidelines
  13. 13. Ordering● In cases where the most recent data is the most interesting (e.g. last N events for entity foo or last hour of events for entity bar), you can reverse the comparator (i.e. sort descending instead of ascending) ● ●
  14. 14. Spanning Buckets● If your time slice spans buckets, youll need to construct all the row keys in question (i.e. number of unique row keys = spans+1)● If you want all the results between the dates, pass all the row keys to multiget_slice with the start and end of the desired time slice● If you only want the first N results within your time slice, lowest latency comes from multiget_slice as above but best efficiency comes from serially paging one row key at a time until your desired count is reached
  15. 15. Expiring Streams (e.g. “I only care about the past year”)● Just set the TTL to the age you want to keep● yeah, thats pretty much it ...
  16. 16. Counters● Sometimes youre only interested in counting things that happened within some time slice● Minor adaptation to the previous content to use counters (be aware they are not idempotent) ● Column names become buckets ● Values become counters
  17. 17. Example: Counting User Logins user3:system5:logins:by_day 20110107 ... 20110523 U3:S5:L:D 2 ... 7 2 logins on Jan 7th 2011 7 logins on May 23rd 2011 for user 3 on system 5 for user 3 on system 5 user3:system5:logins:by_hour 2011010710 ... 2011052316 U3:S5:L:H 1 ... 7one login for user 3 on system 5 2 logins for user 3 on system 5on Jan 7th 2011 for the 10th hour on May 23rd 2011 for the 16th hour
  18. 18. Eventually Atomic● In a legacy RDBMS atomicity is “easy”● Attempting full ACID compliance in distributed systems is a bad idea (and actually impossible in the strictest sense)● However, consistency is important and can certainly be achieved in C*● Many approaches / alternatives● I like a transaction log approach, especially in the context of C*
  19. 19. Transaction Logs (in this context)● Records what is going to be performed before it is actually performed● Performs the actions that need to be atomic (in the indivisible sense, not the all at once sense which is usually what people mean when they say isolation)● Marks that the actions were performed
  20. 20. In Cassandra● Serialize all actions that need to be performed in a single column – JSON, XML, YAML (yuck!), pickle, JSO, msgpack, protobuf, et cetera ● Row Key = randomly chosen C* node token ● Column Name = TimeUUID(nowish)● Perform actions● Delete Column
  21. 21. Configuration Details● Short gc_grace_seconds on the XACT_LOG Column Family (e.g. 5 minutes)● Write to XACT_LOG at CL.QUORUM or CL.LOCAL_QUORUM for durability ● if it fails with an unavailable exception, pick a different node token and/or node and try again (gives same semantics as a relational DB in terms of knowing the state of your transaction)
  22. 22. Failures● Before insert into the XACT_LOG● After insert, before actions● After insert, in middle of actions● After insert, after actions, before delete● After insert, after actions, after delete
  23. 23. Recovery● Each C* has a crond job offset from every other by some time period● Each job runs the same code: multiget_slice for all node tokens for all columns older than some time period (the “recovery period”)● Any columns need to be replayed in their entirety and are deleted after replay (normally there are no columns because normally things are working)
  24. 24. XACT_LOG Comments● Idempotent writes are awesome (thats why this works so well)● Doesnt work so well for counters (theyre not idempotent)● Clients must be able to deal with temporarily inconsistent data (they have to do this anyway)
  25. 25. Q?Cassandra Data Modeling Examples Matthew F. Dennis // @mdennis