C* Summit 2013: Real World, Real Time Data Modeling by Tim Moreton

Data modeling for Cassandra presents a new set of challenges, especially for developers with a background in relational data modeling. Modeling for analytic applications adds further complexity, because the model must support statistical functions over the data. A good data model that exploits Cassandra's strengths can make all the difference to a successful project. This tutorial will examine a number of real-world customer data modeling examples and draw out some hints and tips that will benefit not just the Cassandra newbie, but also the more experienced data modeler.

Published in: Technology


  1. Real World, Real-Time Data Modeling (for analytics apps). Tim Moreton, Founder and CTO, Acunu. #Cassandra13
  2. Virtual nodes. CQL Support. WE C*
  3. What folk use C* for
     Real-time analytics
       • e.g. Click stream, telemetry, logs
       • 100x more writes than reads
       • Almost all reads are to results
       • Almost no writes are ‘updates’
       • Really not going to fit in RAM
     Session storage
       • e.g. User profiles
       • Create, Read, Update, Delete
       • Probably mostly reads
       • Probably wants atomicity
       • Probably fits in RAM
     (Background graphic: repeated log lines of the form "02:44:02 241.24.41 0.0.1 GET /index.html")
  4. What folk use C* for: Real-time analytics and Session storage, each labeled "S WP HA ACID". (Background graphic: repeated log lines of the form "02:44:02 241.24.41 0.0.1 GET /index.html")
  5. What folk use C* for: Real-time analytics; Session storage
  6. Example use case: Call detail records; tens of thousands/sec; Real-time dashboards
     {time: 13:50:11, latitude: 12.5, longitude: -43.4, duration: 24, device_type: ..}
  7. C* Data Modeling 101
       • Denormalise: Writes (and disk) are cheap, reads are expensive: insert data in every arrangement that you need to read it
       • Items you’ll access together, and want sorted: put in the same row
       • Sets of items you’re likely to access separately: keep in separate rows
       • Atomic counters are the building block of Cassandra real-time analytics apps
     (Diagram labels: row1, row2, row3; One event update; One query read)
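The bullets on slide 7 can be illustrated with a minimal sketch. It is not code from the talk: a plain Python dictionary stands in for Cassandra counter rows, and the row names ("calls", "duration") and the helper functions are hypothetical. One event increments counters in every row that will later be read, and each query then reads a single row.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for Cassandra counter rows:
# row key -> column name -> counter value.
counters = defaultdict(lambda: defaultdict(int))

def record_call(event):
    """One event update: increment a counter in every row we will read."""
    hour, minute, _second = event["time"].split(":")
    counters[("calls", hour)][minute] += 1                      # calls per minute
    counters[("duration", hour)][minute] += event["duration"]   # summed duration

def calls_per_minute(hour):
    """One query read: a single row, with columns sorted by minute."""
    return sorted(counters[("calls", hour)].items())

def mean_duration(hour, minute):
    """Derive a statistic from two counters instead of scanning raw events."""
    calls = counters[("calls", hour)].get(minute, 0)
    total = counters[("duration", hour)].get(minute, 0)
    return total / calls if calls else None

for call in [
    {"time": "13:50:11", "duration": 24},
    {"time": "13:50:42", "duration": 36},
    {"time": "13:51:03", "duration": 10},
]:
    record_call(call)
```

Keeping a count and a sum side by side is one way a counter-only model can still serve averages, which is the kind of statistical function the talk description mentions.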
  8. #1: Hierarchies. Counting occurrences by day, hour, min, sec
       • One row for each value at each level in the hierarchy
       • Columns encode sub-components for each level
     Rows shown in the diagram:
       <day>  ...  :12→2930  :13→3520  :14→3034
       13:00  ...  :01→45  :02→62  :03→87
       14:00  ...
       13:01  ...  :10→3  :11→4  :12→2
       13:02  ...
  9. #1: Hierarchies. Counting occurrences by day, hour, min, sec
     Event: {time: 13:02:11, ....}
     Query range: 11:59 -> 13:02
     Rows shown in the diagram:
       <day>  ...  :12→2930  :13→3520  :14→3034
       13:00  ...  :01→45  :02→62  :03→87
       14:00  ...
       13:01  ...  :10→3  :11→4  :12→2
       13:02  ...
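A rough sketch of the hierarchy pattern, under stated assumptions: the rows are simulated with Python dictionaries, the day key "d1" and the function names are made up for illustration, and the range query is just one plausible way to combine levels. Each event increments one counter per level, and a range such as 11:59 to 13:02 reads whole hours from the day row and partial hours from the hour rows.

```python
from collections import defaultdict

# Hypothetical stand-in for counter rows: row key -> column -> count.
rows = defaultdict(lambda: defaultdict(int))

def record(day, time):
    """Increment one counter at each level of the time hierarchy."""
    h, m, s = (int(part) for part in time.split(":"))
    rows[(day,)][h] += 1        # day row: one column per hour
    rows[(day, h)][m] += 1      # hour row: one column per minute
    rows[(day, h, m)][s] += 1   # minute row: one column per second

def count_minutes(day, start, end):
    """Count events from minute `start` to minute `end`, inclusive.

    `start` and `end` are (hour, minute) pairs within the same day.
    """
    (h0, m0), (h1, m1) = start, end
    total = 0
    for h in range(h0, h1 + 1):
        lo = m0 if h == h0 else 0
        hi = m1 if h == h1 else 59
        if lo == 0 and hi == 59:
            # A whole hour: a single column of the day row covers it.
            total += rows[(day,)].get(h, 0)
        else:
            # A partial hour: sum a slice of minute columns in the hour row.
            hour_row = rows[(day, h)]
            total += sum(hour_row.get(m, 0) for m in range(lo, hi + 1))
    return total

for t in ["11:58:00", "11:59:30", "12:15:00", "12:40:10", "13:02:11", "13:03:00"]:
    record("d1", t)
```

Reading coarse levels where the range allows keeps the number of columns touched small, which is the point of storing one row per level.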
  10. #2: Filtering. Adding ‘WHERE’s
      Event: {time: 13:50:11, device_type: xx,}
        • To filter on a field, make sure it is in the partition key
      (Diagram: each time row, <day>, 13:00, 13:01, 13:02, 14:00, ..., appears once per device_type value, xx and yy; e.g. row 13:00 ... :01→45 :02→62 :03→87)
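As an illustrative sketch of the filtering pattern (a dictionary simulation, not the talk's code; the function names are hypothetical), the filtered field becomes part of the row key, so a filtered query is still a read of one row:

```python
from collections import defaultdict

# Hypothetical stand-in for counter rows: row key -> column -> count.
rows = defaultdict(lambda: defaultdict(int))

def record(event):
    """Count the event in an unfiltered row and in a per-device row."""
    hour, minute, _second = event["time"].split(":")
    rows[(hour,)][minute] += 1                        # no filter
    rows[(hour, event["device_type"])][minute] += 1   # filtered field in the key

def count(hour, minute, device_type=None):
    """Answer 'count WHERE device_type = ...' by choosing the row key."""
    key = (hour,) if device_type is None else (hour, device_type)
    return rows[key].get(minute, 0)

for event in [
    {"time": "13:50:11", "device_type": "xx"},
    {"time": "13:50:20", "device_type": "yy"},
    {"time": "13:50:59", "device_type": "xx"},
]:
    record(event)
```

The trade-off is write amplification: every filterable field adds one more counter increment per event.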
  11. #3: Grouping. Adding ‘GROUP BY’
      Event: {time: 13:50:11, device_type: xx,}
      Rows shown in the diagram:
        <day>  ...  :12, xx→1012  :12, yy→542  :13, xx→228
        13:00  ...  :01, xx→45  :01, yy→3  :02, xx→7
        14:00  ...
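The grouping diagram puts the grouped field into the column name next to the time component. A minimal sketch of that idea, assuming a dictionary simulation and hypothetical function names:

```python
from collections import defaultdict

# Hypothetical stand-in: hour row -> (minute, device_type) column -> count.
rows = defaultdict(lambda: defaultdict(int))

def record(event):
    """Increment the composite (minute, device_type) column of the hour row."""
    hour, minute, _second = event["time"].split(":")
    rows[hour][(minute, event["device_type"])] += 1

def count_by_device(hour, first_minute, last_minute):
    """Read one slice of one row and total the counts per device_type."""
    totals = defaultdict(int)
    for (minute, device), n in sorted(rows[hour].items()):
        # Zero-padded minute strings compare correctly as text.
        if first_minute <= minute <= last_minute:
            totals[device] += n
    return dict(totals)

for event in [
    {"time": "13:01:05", "device_type": "xx"},
    {"time": "13:01:40", "device_type": "yy"},
    {"time": "13:02:10", "device_type": "xx"},
    {"time": "13:07:00", "device_type": "yy"},
]:
    record(event)
```

Because all groups for a time slice sit in the same row, one read returns every group, in contrast to the filtering pattern where each value has its own row.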
  12. #4: Drilldown. Going from counts to the constituent events
      Event: {_id: e3, time: 13:01:11, device_type: xx,}
        • Use an identifier in the column key and store the event in a different ColumnFamily
      Rows shown in the diagram:
        <day>  ...  :12, e1→-  :12, e2→-  :13, e3→-
        13:00  ...  :01, e3→-  :01, e4→-  :02, e5→-
        14:00  ...
        e3     time → 13:01:11  device_type → xx  ...
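A small sketch of the drilldown pattern, again simulated with dictionaries; `index`, `events`, and the helper names are hypothetical stand-ins for the two column families. The index row holds (minute, event id) columns with empty values, and the events themselves live in a separate store keyed by id.

```python
from collections import defaultdict

index = defaultdict(dict)   # hour row -> {(minute, event_id): None}
events = {}                 # separate store: event_id -> full event

def record(event):
    """Store the event once and index its id under its hour row."""
    hour, minute, _second = event["time"].split(":")
    events[event["_id"]] = event
    index[hour][(minute, event["_id"])] = None   # the value is unused

def drill_down(hour, minute):
    """Go from a time bucket to the constituent events."""
    ids = [eid for (m, eid) in sorted(index[hour]) if m == minute]
    return [events[eid] for eid in ids]

for event in [
    {"_id": "e3", "time": "13:01:11", "device_type": "xx"},
    {"_id": "e4", "time": "13:01:30", "device_type": "yy"},
    {"_id": "e5", "time": "13:02:02", "device_type": "xx"},
]:
    record(event)
```

Storing only identifiers in the index keeps the wide rows small, at the cost of a second lookup per event when drilling down.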
  13. Put it together... (Image source: http://paintcutpaste.com/pollock-splatter-painting/)
  14. Schema agility (Image source: http://thoughtstream-distantechoes.blogspot.com/2011/06/13062011_13.html)
  15. Architecture diagram (repeated across several builds of the slide). Components: API, event stream, event store, roll-up cubes, Ingest, Processing, dashboard, queries, programmatic interface
        • Cassandra stores raw events, aggregates, data model definition
        • Acunu Analytics maps events and SQL-like queries into C* ops
        • PROCESSING AT INGEST: JSON, CSV, log ingest via RESTful HTTP API, Flume, Storm, AMQP
        • Storm, MQ, HTTP
        • Acunu Dashboards provides rich, real-time, embeddable visualizations
        • AQL: SELECT AVG(r) FROM metrics GROUP BY host;
        • Alerting! Cubes. MILLISECOND QUERIES
        • API for rich queries, threshold alerting
        • Backfill historic results for new cubes to enable agile schema changes
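The backfill idea on slide 15 can be sketched as follows. This is not Acunu Analytics' implementation or API: `AvgCube`, `ingest`, and `add_cube` are hypothetical names, and an in-memory list stands in for the event store. The sketch keeps raw events next to the roll-ups so that a cube defined later can be populated by replaying history, and it answers a query shaped like `SELECT AVG(r) FROM metrics GROUP BY host` from a sum and a count per group.

```python
from collections import defaultdict

event_store = []   # raw events, kept alongside the aggregates
cubes = []         # roll-ups updated at ingest

class AvgCube:
    """Hypothetical roll-up answering AVG(value_field) GROUP BY group_field."""

    def __init__(self, group_field, value_field):
        self.group_field = group_field
        self.value_field = value_field
        self.sums = defaultdict(float)
        self.counts = defaultdict(int)

    def update(self, event):
        key = event[self.group_field]
        self.sums[key] += event[self.value_field]
        self.counts[key] += 1

    def query(self):
        return {k: self.sums[k] / self.counts[k] for k in self.counts}

def ingest(event):
    """Processing at ingest: store the raw event, then update every cube."""
    event_store.append(event)
    for cube in cubes:
        cube.update(event)

def add_cube(cube):
    """Backfill: replay stored events so a new cube covers past data."""
    for event in event_store:
        cube.update(event)
    cubes.append(cube)

ingest({"host": "a", "r": 1.0})
ingest({"host": "b", "r": 5.0})
avg_r_by_host = AvgCube("host", "r")
add_cube(avg_r_by_host)          # defined after two events already arrived
ingest({"host": "a", "r": 3.0})
```

With this arrangement, adding a cube does not require the schema to have been known when the first events arrived, which is one reading of the "agile schema changes" caption.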
  16. Thanks! @timmoreton @acunu
      Apache, Apache Cassandra, Cassandra, Flume, and the eye logos are trademarks of the Apache Software Foundation.
