Acunu Analytics: Simpler Real-Time Cassandra Apps

Talk for the Cassandra Seattle Meetup April 2013: http://www.meetup.com/cassandra-seattle/events/114988872/

Cassandra has properties that make it an ideal fit for building real-time analytics applications, but getting from atomic increments to live dashboards and streaming queries is quite a stretch. In this talk, Tim Moreton, CTO at Acunu, explains how and why they built Acunu Analytics, which adds rich SQL-like queries and a RESTful API on top of Cassandra, and looks at how it keeps Cassandra's spirit of denormalization under the hood.



  1. Acunu Analytics: Simpler Real-Time Cassandra Apps. Tim Moreton, CTO (@timmoreton). Monday, 29 April 2013
  2. WE C*
     • Scalable. No single point of {failure, bottleneck}
     • Fast. Especially for writes
     • Available. Effortless multi-DC support
     • Maturing fast. Lots of production deployments
  3. WE C*: virtual nodes, CQL support
  4. WE C*. Challenges remain, of course.
     • Spartan queries
     • Thrift (and CQL, a bit)
     • Denormalization hurts agility
     • Weak update semantics
  5. C*: Two uses
  6. C*: Two uses. Session storage
     (Background graphic: repeated log lines such as 02:44:02 241.24.41 0.0.1 GET /index.html)
     • Many more reads than writes
     • Updates to existing records (ideally, transactionally)
     • Probably fits in RAM: distribute for availability
  7. C*: Two uses. Real-time analytics (shown alongside session storage)
     • Many more writes than reads
     • Almost all reads are to results
     • Almost no writes are 'updates'
     • Distribute for availability, performance, capacity
  9. C*on. Supplement Cassandra with:
     • Rich, SQL-like queries
     • RESTful HTTP APIs, JSON-based
     • Automated denormalization
     • Update semantics (less critical for analytics)
  10. Analytics: Two patterns
  11. Analytics: Two patterns
      • Exploratory Analytics (?): Unstructured, Warehouses, Data Mining, Machine Learning
  12. Analytics: Two patterns
      • Operational Intelligence (!): Dashboards, Real-time Decisions, Alerting
  13. Analytics: Two patterns
      • Exploratory Analytics: complex analysis, data variety. Query richness
      • Operational Intelligence: data freshness, response time. Query speed
  15. Architecture diagram. Labels: event stream, Ingest, Processing, event store, roll-up cubes, API, dashboard, queries, programmatic interface
  16. Who uses Acunu?
      • Location Data
      • Web and Visitor
      • Market/Tick Data
      • Infrastructure
      • Sensor Data
      • Social Media
      • Social Gaming
      • Smart Grid
      • Production Line
  18. Architecture diagram (same labels as slide 15). Cassandra stores raw events and intermediate aggregates
  19. Architecture diagram. Acunu Analytics is a Cassandra client mapping new events, queries and schema changes to aggregate reads and writes!
  20. Architecture diagram. Acunu Dashboards provides embeddable, custom data visualization using the HTTP API
  21. (Loosely) Define a schema
        CREATE TABLE APICalls (
          time TIME('PST', HOUR, MIN, SEC),
          path PATH(/),
          useragent STRING,
          latitude DOUBLE(0.1, 0.01),
          longitude DOUBLE(0.1, 0.01)
        );
        CREATE CUBE SELECT COUNT, AVG(respTime) FROM APICalls
          WHERE time, path
          GROUP BY time, path;
        CREATE CUBE SELECT COUNT FROM APICalls
          WHERE latitude, longitude
          GROUP BY latitude, longitude;
      • Tables have an HTTP endpoint and map to a set of ColumnFamilies
      • Dimensions map keys in events and allow hierarchical aggregation
      • Cubes define the dimensions and aggregates to maintain
  22. Aggregation
        CREATE CUBE SELECT SUM(a) FROM t
          WHERE x, y GROUP BY g, h, i;
  23. Aggregation. Diagram: a cube with axes x, y and (g, h, i). New event {A: v', X: x, Y: y, Z: z}: apply SUM(v, v') on the cell holding v
  24. Aggregation
      • Hierarchical dimensions cause multiple writes per event (that's OK: Cassandra's good at writes)
      • Most aggregates result in atomic counter increments
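
The aggregation step on slides 22 to 24 can be sketched in Python. This is a minimal illustration, not Acunu's implementation: a nested dictionary stands in for Cassandra counter columns, and the names `record`, `time_levels`, and the `bytes` measure are hypothetical. It assumes the WHERE dimensions form the row key and the GROUP BY dimensions form the column key, and it shows how a hierarchical dimension turns one event into several counter increments.

```python
from collections import defaultdict
from itertools import product

# In-memory stand-in for counter columns: row key -> column key -> value.
cube = defaultdict(lambda: defaultdict(int))


def time_levels(hhmm):
    """Hypothetical two-level hierarchy for a time value: the hour, then the minute."""
    hour, _minute = hhmm.split(":")
    return [hour + ":00", hhmm]


def record(event, where_dims, group_dims, measure, hierarchies):
    """Apply SUM(measure) to every cell the event falls into; return the write count."""
    def expand(dim):
        value = event[dim]
        # A hierarchical dimension yields one key per level, a flat one yields one key.
        return hierarchies[dim](value) if dim in hierarchies else [value]

    row_keys = list(product(*(expand(d) for d in where_dims)))
    col_keys = list(product(*(expand(d) for d in group_dims)))
    for row in row_keys:
        for col in col_keys:
            cube[row][col] += event[measure]  # one counter increment per cell
    return len(row_keys) * len(col_keys)


event = {"time": "22:02", "path": "/index.html", "bytes": 512}
writes = record(event, ["time"], ["path"], "bytes", {"time": time_levels})
```

Here the single event produces two writes, one for the hour-level row and one for the minute-level row, which is the "multiple writes per event" cost the slide accepts.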
  25. Queries
        SELECT SUM(a) FROM t
          WHERE x = .. and y = .. GROUP BY g, h, i;
      • WHEREs map to a Cassandra row and GROUP BY to a compound column key in that row (very roughly)
  26. Queries. New query:
      • Locate slice that matches WHERE
      • Return all mappings from GROUP BY tuples to cell values
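
The read path on slides 25 and 26 reduces to a row lookup followed by a scan of that row's columns. The sketch below assumes the same layout (WHERE values as the row key, GROUP BY tuples as column keys). The sample cells and the `query` function are made up for illustration and are not part of Acunu's API.

```python
# Precomputed cells: row key (WHERE values) -> column key (GROUP BY tuple) -> aggregate.
cube = {
    ("x1", "y1"): {("g1", "h1", "i1"): 7, ("g1", "h2", "i1"): 3},
    ("x1", "y2"): {("g1", "h1", "i1"): 5},
}


def query(cube, where_values):
    """Locate the slice matching WHERE and return its GROUP BY tuple -> value mappings."""
    row = cube.get(tuple(where_values), {})
    return dict(row)  # copy, so callers cannot mutate the stored cells


result = query(cube, ["x1", "y1"])
missing = query(cube, ["x9", "y9"])
```

No aggregation happens at query time in this sketch: the sums were maintained on the write path, so a query only reads cells that already exist.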
  27. A concrete example
        21:00      all→1345  :00→45  :01→62  :02→87  ...
        22:00      all→3221  :00→22  :01→19  :02→104  ...
        ...        ...
        UK         all→228  user01→1  user14→12  user99→7  ...
        US         all→354  user01→4  user04→8  user56→17  ...
        ...
        UK, 22:00  all→1904  ...
        ∅          all→87314  UK→238  US→354  ...
  28. A concrete example. Each event updates multiple aggregates:
        {cust_id: user01, session_id: 102, geography: UK, browser: IE, time: 22:02}
        21:00      all→1345  :00→45  :01→62  :02→87  ...
        22:00      all→3222  :00→22  :01→19  :02→105  ...
        ...        ...
        UK         all→229  user01→2  user14→12  user99→7  ...
        US         all→354  user01→4  user04→8  user56→17  ...
        ...
        UK, 22:00  all→1905  ...
        ∅          all→87315  UK→239  US→355  ...
  29. A concrete example. Query annotation: WHERE time IN (22:00,23:00) GROUP BY minute
  30. A concrete example. Query annotation: WHERE geography=US GROUP BY user
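
The counter changes between slides 27 and 28 can be reproduced with a short Python sketch. It starts from a subset of the slide 27 values and applies the sample event. The `apply_event` function and the choice of which rows an event touches are assumptions inferred from the slide, not Acunu's actual logic.

```python
# A subset of the counters on slide 27: row key -> column -> count.
rows = {
    "22:00": {"all": 3221, ":00": 22, ":01": 19, ":02": 104},
    "UK": {"all": 228, "user01": 1, "user14": 12, "user99": 7},
    "US": {"all": 354, "user01": 4, "user04": 8, "user56": 17},
    "UK, 22:00": {"all": 1904},
    "∅": {"all": 87314, "UK": 238},
}

event = {"cust_id": "user01", "session_id": 102, "geography": "UK",
         "browser": "IE", "time": "22:02"}


def apply_event(rows, event):
    """Increment every (row, column) counter the event contributes to."""
    hour = event["time"][:2] + ":00"   # "22:00"
    minute = event["time"][2:]         # ":02"
    geo, user = event["geography"], event["cust_id"]
    touched = [
        (hour, "all"), (hour, minute),   # per-hour row
        (geo, "all"), (geo, user),       # per-geography row
        (f"{geo}, {hour}", "all"),       # combined geography and hour row
        ("∅", "all"), ("∅", geo),        # global row
    ]
    for row, column in touched:
        columns = rows.setdefault(row, {})
        columns[column] = columns.get(column, 0) + 1
    return touched


touched = apply_event(rows, event)

# WHERE geography=US GROUP BY user: read one row, skip its "all" total.
us_by_user = {k: v for k, v in rows["US"].items() if k != "all"}
```

One incoming event becomes seven counter increments here, while the query on slide 30 is answered from a single row.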
  31. Rich queries
      • Arithmetic expressions:
          SELECT SUM(x)/(MAX(y) - MIN(y) + 0.5) AS spread FROM ...
      • Fast inner joins:
          SELECT a - b AS lbound, a + b AS ubound
          FROM (SELECT AVG(score) AS a FROM scores WHERE year = 2012)
          JOIN (SELECT STDDEV(score) AS b FROM scores)
          USING (school)
      • Time zone support:
          SELECT COUNT UNIQUE (visitors) GROUP BY time(DAY('US/Pacific'))
      • Hierarchical aggregation:
          SELECT SUM(size) FROM .. WHERE path MATCHES /usr/*
      • Drill down to raw events:
          SELECT DRILL FROM errors WHERE category IN ("warn", "error")
      • Limits:
          SELECT COUNT (items) FROM .. GROUP BY category LIMIT 3, country
      • Query-time filtering:
          ... HAVING AVG(rating) < 2.0 AND COUNT >= 10
  33. Thank You. Tim Moreton, CTO (@timmoreton). Apache, Apache Cassandra, Cassandra, Hadoop, and the eye and elephant logos are trademarks of the Apache Software Foundation.
