Acunu Analytics: Simpler Real-Time Cassandra Apps

Talk for the Cassandra Seattle Meetup April 2013: http://www.meetup.com/cassandra-seattle/events/114988872/

Cassandra has properties that make it an ideal fit for building real-time analytics applications, but getting from atomic increments to live dashboards and streaming queries is quite a stretch. In this talk, Tim Moreton, CTO at Acunu, explains how and why Acunu built Acunu Analytics, which adds rich SQL-like queries and a RESTful API on top of Cassandra, and looks at how it keeps Cassandra's spirit of denormalization under the hood.

Presentation Transcript

  • Acunu Analytics: Simpler Real-Time Cassandra Apps. Tim Moreton, CTO, @timmoreton. Monday, 29 April 13
  • 2. WE C*
      • Scalable. No single point of {failure, bottleneck}
      • Fast. Especially for writes
      • Available. Effortless Multi-DC support
      • Maturing fast. Lots of production deployments
  • 3. WE C*
      • Virtual nodes
      • CQL Support
  • 4. WE C*: Challenges remain, of course.
      • Spartan queries
      • Thrift (and CQL, a bit)
      • Denormalization hurts agility
      • Weak update semantics
  • 5. C*: Two uses
      Session storage
        • Many more reads than writes
        • Updates to existing records (ideally, transactionally)
        • Probably fits in RAM: distribute for availability
      Real-time analytics
        • Many more writes than reads
        • Almost all reads are to results
        • Almost no writes are ‘updates’
        • Distribute for availability, performance, capacity
      [Background in both panels: repeated log lines of the form "02:44:02 241.24.41 0.0.1 GET /index.html"]
  • 6. C*on. Supplement Cassandra with:
      • Rich, SQL-like queries
      • RESTful HTTP APIs, JSON-based
      • Automated denormalization
      • Update semantics: less critical for analytics
  • 7. Analytics: Two patterns
      Exploratory Analytics (?)
        • Unstructured Warehouses, Data Mining, Machine Learning
        • Complex analysis, data variety
        • Query richness
      Operational Intelligence (!)
        • Dashboards, Real-time Decisions, Alerting
        • Data freshness, response time
        • Query speed
  • 8. [Architecture diagram. Labels: event stream, Ingest, Processing, event store, roll-up cubes, API, dashboard queries, programmatic interface]
  • 9. Who uses Acunu?
      • Location Data
      • Web and Visitor
      • Market/Tick Data
      • Infrastructure
      • Sensor Data
      • Social Media
      • Social Gaming
      • Smart Grid
      • Production Line
  • 10. [Architecture diagram. Labels: event stream, Ingest, Processing, event store, roll-up cubes, API, dashboard queries, programmatic interface]
      • Cassandra stores raw events and intermediate aggregates
      • Acunu Analytics is a Cassandra client mapping new events, queries and schema changes to aggregate reads and writes
      • Acunu Dashboards provides embeddable, custom data visualization using the HTTP API
  • 11. (Loosely) Define a schema
      CREATE TABLE APICalls (
        time TIME('PST', HOUR, MIN, SEC),
        path PATH(/),
        useragent STRING,
        latitude DOUBLE(0.1, 0.01),
        longitude DOUBLE(0.1, 0.01)
      );
      CREATE CUBE SELECT COUNT, AVG(respTime) FROM APICalls
        WHERE time, path
        GROUP BY time, path;
      CREATE CUBE SELECT COUNT FROM APICalls
        WHERE latitude, longitude
        GROUP BY latitude, longitude;
      • Tables have an HTTP endpoint and map to a set of ColumnFamilies
      • Dimensions map keys in events and allow hierarchical aggregation
      • Cubes define the dimensions and aggregates to maintain
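One way to picture "hierarchical aggregation" on the `time` and `path` dimensions is as an expansion of each event value into one key per level of the hierarchy. The sketch below is illustrative only: the function names and key formats are assumptions made for this example, not Acunu's actual encoding.

```python
from datetime import datetime


def time_levels(ts):
    """Truncate a timestamp to hour, minute and second buckets (coarsest first)."""
    return [
        ts.strftime("%Y-%m-%d %H"),
        ts.strftime("%Y-%m-%d %H:%M"),
        ts.strftime("%Y-%m-%d %H:%M:%S"),
    ]


def path_levels(path, sep="/"):
    """Return the root plus every ancestor prefix of a path, ending with the path itself."""
    parts = [p for p in path.split(sep) if p]
    return [sep] + [sep + sep.join(parts[: i + 1]) for i in range(len(parts))]


# Hypothetical event values, chosen only to show the expansion.
example_time = time_levels(datetime(2013, 4, 29, 22, 2, 17))
example_path = path_levels("/usr/local/bin")
```

Under this reading, an aggregate kept at every level lets a query for `/usr` or for a whole hour be answered from a single pre-computed key rather than by scanning finer-grained ones.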
  • 12. Aggregation
      CREATE CUBE SELECT SUM(a) FROM t
        WHERE x, y GROUP BY g, h, i;
      [Diagram. Labels: event {A: v’, X: x, Y: y, Z: z}; axes x, y and (g, h, i); cell value v. New event: apply SUM(v, v’) on this cell]
      • Hierarchical dimensions cause multiple writes per event (that’s OK: Cassandra’s good at writes)
      • Most aggregates result in atomic counter increments
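The write path on this slide can be sketched with an in-memory stand-in for the cube. Everything here is an assumption for illustration: the `SumCube` class, its `levels` argument for hierarchical dimensions, and the use of a nested dictionary where a real deployment would presumably issue Cassandra counter increments.

```python
from collections import defaultdict
from itertools import product


class SumCube:
    """In-memory stand-in for: CREATE CUBE SELECT SUM(a) FROM t WHERE x, y GROUP BY g, h, i."""

    def __init__(self, where_dims, group_dims, measure):
        self.where_dims = where_dims
        self.group_dims = group_dims
        self.measure = measure
        # cells[row_key][column_key] holds the running sum for that cell.
        self.cells = defaultdict(lambda: defaultdict(int))

    def ingest(self, event, levels=None):
        """Add the event's measure to each matching cell; return the number of cell writes."""
        levels = levels or {}
        # A hierarchical dimension contributes one key per level; a flat one a single key.
        row_options = [levels.get(d, [event[d]]) for d in self.where_dims]
        column_key = tuple(event[d] for d in self.group_dims)
        writes = 0
        for row_key in product(*row_options):
            self.cells[row_key][column_key] += event[self.measure]
            writes += 1
        return writes


cube = SumCube(where_dims=("x", "y"), group_dims=("g", "h", "i"), measure="a")
event = {"a": 5, "x": "/usr/bin", "y": "y1", "g": "g1", "h": "h1", "i": "i1"}
# Hypothetical three-level hierarchy for x, so one event fans out to three writes.
writes = cube.ingest(event, levels={"x": ["/", "/usr", "/usr/bin"]})
```

The fan-out is the cost the slide calls acceptable: each extra level multiplies writes, but every write is a blind increment with no read before it.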
  • 13. Queries
      SELECT SUM(a) FROM t
        WHERE x = .. and y = .. GROUP BY g, h, i;
      New query:
        • Locate slice that matches WHERE
        • Return all mappings from GROUP BY tuples to cell values
      [Diagram. Labels: axes x, y and (g, h, i); cell value v]
      • WHEREs map to a Cassandra row and GROUP BY to a compound column key in that row (very roughly)
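Read this way, a query becomes a single row lookup followed by a scan of that row's columns. The sketch below assumes a plain dictionary layout (row key = WHERE values, column key = GROUP BY tuple) with made-up contents; it only mirrors the "very roughly" mapping described above.

```python
# Hypothetical cube contents: row key = WHERE values (x, y); column key = GROUP BY tuple (g, h, i).
cube = {
    ("x1", "y1"): {("g1", "h1", "i1"): 7, ("g1", "h2", "i1"): 3},
    ("x1", "y2"): {("g2", "h1", "i1"): 4},
}


def query(cube, where_dims, where):
    """Answer SELECT SUM(a) ... WHERE x = .. AND y = .. GROUP BY g, h, i from stored cells."""
    row_key = tuple(where[d] for d in where_dims)
    # One row lookup; every column in that row is one GROUP BY result.
    return dict(cube.get(row_key, {}))


result = query(cube, ("x", "y"), {"x": "x1", "y": "y1"})
```

No aggregation happens at query time in this sketch: the sums were maintained on the write path, so the read is just a slice.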
  • 14. A concrete example
      21:00        all→1345  :00→45  :01→62  :02→87  ...
      22:00        all→3221  :00→22  :01→19  :02→104  ...
      ...          ...
      UK           all→228  user01→1  user14→12  user99→7  ...
      US           all→354  user01→4  user04→8  user56→17  ...
      ...
      UK, 22:00    all→1904  ...
      ∅            all→87314  UK→238  US→354  ...
  • 15. A concrete example. Each event updates multiple aggregates:
      {cust_id: user01, session_id: 102, geography: UK, browser: IE, time: 22:02}
      21:00        all→1345  :00→45  :01→62  :02→87  ...
      22:00        all→3222  :00→22  :01→19  :02→105  ...
      ...          ...
      UK           all→229  user01→2  user14→12  user99→7  ...
      US           all→354  user01→4  user04→8  user56→17  ...
      ...
      UK, 22:00    all→1905  ...
      ∅            all→87315  UK→239  US→355  ...
      Example queries:
        • WHERE time IN (22:00,23:00) GROUP BY minute
        • WHERE geography=US GROUP BY user
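The increments in this example can be replayed in a few lines. This is a sketch under assumptions: the row and column key formats simply copy the slide's notation, the starting counts are the slide's "before" values for the cells this event touches, and `cells_for` is a hypothetical helper rather than anything from Acunu.

```python
from collections import defaultdict

# Counts before the event, copied from the slide for the cells this event touches.
rows = defaultdict(lambda: defaultdict(int))
rows["22:00"].update({"all": 3221, ":02": 104})
rows["UK"].update({"all": 228, "user01": 1})
rows["UK, 22:00"].update({"all": 1904})
rows["∅"].update({"all": 87314, "UK": 238})


def cells_for(event):
    """List the (row, column) cells one event increments (assumed cube layout)."""
    hour, minute = event["time"].split(":")
    hour_key = f"{hour}:00"
    geo = event["geography"]
    return [
        (hour_key, "all"),
        (hour_key, f":{minute}"),
        (geo, "all"),
        (geo, event["cust_id"]),
        (f"{geo}, {hour_key}", "all"),
        ("∅", "all"),
        ("∅", geo),
    ]


def ingest(event):
    """Apply a COUNT increment to every cell the event maps to."""
    for row, column in cells_for(event):
        rows[row][column] += 1


event = {"cust_id": "user01", "session_id": 102, "geography": "UK",
         "browser": "IE", "time": "22:02"}
ingest(event)
```

One event produces seven increments here, and a query such as `WHERE time IN (22:00,23:00) GROUP BY minute` then only needs to read the `22:00` and `23:00` rows.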
  • 16. Rich queries
      Arithmetic expressions:
        SELECT SUM(x)/(MAX(y) - MIN(y) + 0.5) AS spread FROM ...
      Fast inner joins:
        SELECT a - b AS lbound, a + b AS ubound
        FROM (SELECT AVG(score) AS a FROM scores WHERE year = 2012)
        JOIN (SELECT STDDEV(score) AS b FROM scores)
        USING (school)
      Time zone support:
        SELECT COUNT UNIQUE (visitors) GROUP BY time(DAY('US/Pacific'))
      Hierarchical aggregation:
        SELECT SUM(size) FROM .. WHERE path MATCHES /usr/*
      Drill down to raw events:
        SELECT DRILL FROM errors WHERE category IN ("warn", "error")
      Limits:
        SELECT COUNT (items) FROM .. GROUP BY category LIMIT 3, country
      Query-time filtering:
        ... HAVING AVG(rating) < 2.0 AND COUNT >= 10
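Query-time filtering and limits operate on groups that have already been aggregated. The sketch below shows that idea on made-up per-category aggregates; the data, the `having_and_limit` helper, and the choice to rank by count before applying the limit are all assumptions for illustration, since the slide does not say how LIMIT orders groups.

```python
# Hypothetical per-category aggregates, as a cube might hold them: (count, sum of ratings).
groups = {
    "books": (40, 60.0),  # AVG(rating) = 1.5
    "games": (12, 30.0),  # AVG(rating) = 2.5
    "music": (5, 5.0),    # AVG(rating) = 1.0, but fewer than 10 items
    "tools": (25, 45.0),  # AVG(rating) = 1.8
}


def having_and_limit(groups, max_avg, min_count, limit):
    """Keep groups with AVG(rating) < max_avg AND COUNT >= min_count,
    then return at most `limit` of them, largest count first."""
    kept = [
        (name, count)
        for name, (count, rating_sum) in groups.items()
        if count >= min_count and rating_sum / count < max_avg
    ]
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:limit]


# Mirrors: ... HAVING AVG(rating) < 2.0 AND COUNT >= 10, with LIMIT 3 on the groups.
filtered = having_and_limit(groups, max_avg=2.0, min_count=10, limit=3)
```

Because AVG is derived from a stored count and sum, the filter needs no access to raw events.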
  • Thank You. Tim Moreton, CTO, @timmoreton. Apache, Apache Cassandra, Cassandra, Hadoop, and the eye and elephant logos are trademarks of the Apache Software Foundation.