Realtime Analytics with Cassandra
My talk at NoSQL Now 2012

Realtime Analytics with Cassandra Presentation Transcript

  • 1. Realtime Analytics with Cassandra Acunu Analytics Tom Wilkie, Acunu 21st August 2012
  • 2. • Motivation / alternatives • What is it? • How does it work? • Approximate Analytics • What’s it good for?
  • 3. • Motivation / alternatives • What is it? • How does it work? • Approximate Analytics • What’s it good for?
  • 4. Why bother? “Companies that can harness big data will trample data incompetents” The Economist, May 26th 2011
  • 5. [Figure: many overlapping tables of click-stream records, each with columns time, page, session id, duration. Sample rows: 14:58:03.234 /index.html 248.180.3.40 175; 14:58:03.409 /csi/csi/council/freedom.html 248.180.3.40 1234; 14:58:03.877 /docs/access/chapter8.txt 99.1.10.178 52]
  • 6. Combining “big” and “real-time” is hard: live & historical aggregates... drill downs and roll ups... trends...
  • 7. Solution / Con (solution names are not in the transcript): Scalability $$$; Not realtime; Spartan query semantics => complex, DIY solutions
  • 8. • Motivation / alternatives • What is it? • How does it work? • Approximate Analytics • What’s it good for?
  • 9. Click stream events, sensor data, etc → Acunu Analytics → counter updates • Aggregate incrementally, on the fly • Store live + historical aggregates
  • 10. { time : TIME(HOUR; MIN; SEC), page : PATH(/), category : STRING, loadTime : LONG } { select : ["COUNT", "AVG(loadTime)"], where : “time, ?path”, group : “time, ?category” }
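The definition and query above hint at why incremental aggregation is cheap: to answer COUNT and AVG(loadTime) grouped by category, it is enough to keep a running count and a running sum per group, updated as each event arrives. A minimal Python sketch under that assumption follows. The class name, the sample events and the category values are hypothetical, and this is not Acunu's API.

```python
from collections import defaultdict


class RunningAggregates:
    """Keeps COUNT and AVG(loadTime) per category without storing events."""

    def __init__(self):
        # category -> [event count, sum of loadTime]
        self._totals = defaultdict(lambda: [0, 0])

    def update(self, event):
        """Fold one event into the running totals for its category."""
        totals = self._totals[event["category"]]
        totals[0] += 1
        totals[1] += event["loadTime"]

    def query(self):
        """Derive the aggregates from the running totals at read time."""
        return {
            category: {"COUNT": count, "AVG(loadTime)": total / count}
            for category, (count, total) in self._totals.items()
        }


aggregates = RunningAggregates()
# Hypothetical events shaped like the data definition on the slide.
for event in [
    {"time": "14:58:03", "page": "/index.html", "category": "home", "loadTime": 175},
    {"time": "14:58:03", "page": "/docs/access/chapter8.txt", "category": "docs", "loadTime": 52},
    {"time": "14:58:04", "page": "/docs/access/chapter9.txt", "category": "docs", "loadTime": 48},
]:
    aggregates.update(event)

print(aggregates.query())
```

The write path does constant work per event and the read path never touches raw events, which is the trade the slides describe: queries have to be known in advance.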
  • 11. Dashboard UI
  • 12. • Motivation / alternatives • What is it? • How does it work? • Approximate Analytics • What’s it good for?
  • 13. count grouped by ... day count distinct(session) count ... geography avg(duration) ... browser
  • 14. Data Definition: time : TIME(HOUR; MIN; SEC), cust_id : LONG, session_id : LONG, geography : STRING, browser : STRING, load_time : LONG
    Query Patterns: { select: “COUNT” patterns: [ { where : “?time”, group : “?time” }, { where : “”, group : “geography” }, { where : “”, group : “browser” } ] }, { select: [“COUNT_DISTINCT(session_id)”, “AVG(load_time)”], where: “time”, group: “” }
  • 15. Incoming event: { cust_id: user01, session_id: 102, geography: UK, browser: IE, time: 22:02, ... }
    Counters:
    21:00      all→1345   :00→45    :01→62     :02→87 ...
    22:00      all→3221   :00→22    :01→19     :02→104 ...
    ...
    UK         all→228    user01→1  user14→12  user99→7 ...
    US         all→354    user01→4  user04→8   user56→17 ...
    ...
    UK, 22:00  all→1904 ...
    ∅          all→87314  UK→238    US→354 ...
  • 16. Incoming event: { cust_id: user01, session_id: 102, geography: UK, browser: IE, time: 22:02, ... }
    Counters:
    21:00      all→1345   :00→45    :01→62     :02→87 ...
    22:00      all→3222   :00→22    :01→19     :02→105 ...
    ...
    UK         all→229    user01→2  user14→12  user99→7 ...
    US         all→354    user01→4  user04→8   user56→17 ...
    ...
    UK, 22:00  all→1905 ...
    ∅          all→87315  UK→239    US→354 ...
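The change between slides 15 and 16 can be reproduced with a small sketch: one incoming event increments a fixed set of pre-aggregated counters. The row and column names are read off the slides. The nested-dictionary layout and the names `counters` and `apply_event` are assumptions made for illustration, not the product's storage format.

```python
from collections import defaultdict

# Row key -> column -> count, seeded with the values shown on slide 15.
counters = defaultdict(lambda: defaultdict(int))
counters["22:00"].update({"all": 3221, ":02": 104})
counters["UK"].update({"all": 228, "user01": 1})
counters["UK, 22:00"].update({"all": 1904})
counters["∅"].update({"all": 87314, "UK": 238})


def apply_event(event):
    """Increment every counter that the event contributes to."""
    hour_part, minute_part = event["time"].split(":")
    hour = f"{hour_part}:00"      # row for the enclosing hour
    minute = f":{minute_part}"    # column for the minute within that hour
    geo = event["geography"]
    touched = [
        (hour, "all"), (hour, minute),            # counts by time
        (geo, "all"), (geo, event["cust_id"]),    # counts by geography and user
        (f"{geo}, {hour}", "all"),                # counts by geography and hour
        ("∅", "all"), ("∅", geo),                 # global counts
    ]
    for row, column in touched:
        counters[row][column] += 1


apply_event({"cust_id": "user01", "session_id": 102, "geography": "UK",
             "browser": "IE", "time": "22:02"})

print(counters["22:00"]["all"], counters["UK"]["user01"], counters["∅"]["all"])
```

After the call, the counters hold the values shown on slide 16.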
  • 17. Counters:
    21:00      all→1345   :00→45    :01→62     :02→87 ...
    22:00      all→3221   :00→22    :01→19     :02→104 ...
    ...
    UK         all→228    user01→1  user14→12  user99→7 ...
    US         all→354    user01→4  user04→8   user56→17 ...
    ...
    UK, 22:00  all→1904 ...
    ∅          all→87314  UK→238    US→354 ...
  • 18. Query: where time 21:00-22:00 count(*)
    Counters:
    21:00      all→1345   :00→45    :01→62     :02→87 ...
    22:00      all→3222   :00→22    :01→19     :02→105 ...
    ...
    UK         all→229    user01→2  user14→12  user99→7 ...
    US         all→354    user01→4  user04→8   user56→17 ...
    ...
    UK, 22:00  all→1905 ...
    ∅          all→87315  UK→239    US→354 ...
  • 19. Queries: where time 21:00-22:00 count(*); where time 22:00-23:00, group by minute. Counters as on slide 18.
  • 20. Queries: where time 21:00-22:00 count(*); where time 22:00-23:00, group by minute; where geography=UK group all by user, ... Counters as on slide 18.
  • 21. Queries: where time 21:00-22:00 count(*); where time 22:00-23:00, group by minute; where geography=UK group all by user, count all. Counters as on slide 18.
  • 22. Queries: where time 21:00-22:00 count(*); where time 22:00-23:00, group by minute; where geography=UK group all by user, count all; group all by geo. Counters as on slide 18.
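Slides 18 to 22 suggest that each query is answered by reading a single counter row rather than scanning events. The sketch below shows one reading of that idea. The mapping from query to row is an interpretation of the slides, the helper names `count` and `group` are hypothetical, and only the columns visible on the slides are included (the slides elide the rest with "...").

```python
# Counter rows as shown on slide 18.
counters = {
    "21:00": {"all": 1345, ":00": 45, ":01": 62, ":02": 87},
    "22:00": {"all": 3222, ":00": 22, ":01": 19, ":02": 105},
    "UK": {"all": 229, "user01": 2, "user14": 12, "user99": 7},
    "US": {"all": 354, "user01": 4, "user04": 8, "user56": 17},
    "UK, 22:00": {"all": 1905},
    "∅": {"all": 87315, "UK": 239, "US": 354},
}


def count(row):
    """Total for a row: a single read of its 'all' column."""
    return counters[row]["all"]


def group(row):
    """Breakdown for a row: every column except the 'all' total."""
    return {column: n for column, n in counters[row].items() if column != "all"}


print(count("21:00"))   # where time 21:00-22:00 count(*)
print(group("22:00"))   # where time 22:00-23:00, group by minute
print(group("UK"))      # where geography=UK, grouped by user
print(count("∅"))       # overall count
print(group("∅"))       # group all by geo
```

Every query here costs one row lookup, which is why the set of answerable queries is fixed by the patterns declared up front.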
  • 23. • Motivation / alternatives • What is it? • How does it work? • Approximate Analytics • What’s it good for?
  • 24. Approximate Analytics: Exact / Real-time / Large Scale
  • 25. Count Distinct. Plan A: keep a list of all the things you’ve seen; count them at query time. Quick to update ... but at scale ... takes lots of space, takes a long time to query.
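Plan A can be sketched in a few lines of Python. A set stands in for the "list of all the things you've seen", and the function names are hypothetical. The sample session ids are the two that appear on slide 5.

```python
seen_sessions = set()


def record_session(session_id):
    """Update path: constant-time insert, but memory grows with every new id."""
    seen_sessions.add(session_id)


def count_distinct_sessions():
    """Query path: the answer is the size of everything stored so far."""
    return len(seen_sessions)


for session_id in ["248.180.3.40", "248.180.3.40", "99.1.10.178"]:
    record_session(session_id)

print(count_distinct_sessions())
```

The result is exact, but the state is proportional to the number of distinct items, which is the space cost the slide warns about.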
  • 26. Approximate Distinct: track the max # leading zeroes seen so far
    item  hash            leading zeroes  max so far
    x     00101001110...  2               2
    y     11010100111...  0               2
    z     00011101011...  3               3
    ...   ...
    to see a max of M takes about 2^M items
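The table above can be traced with a short sketch that uses the hash prefixes exactly as printed on the slide. The variable names are hypothetical, and the final estimate of 2^M from a single maximum is very coarse, which is what the next slide addresses.

```python
def leading_zeroes(bits):
    """Number of '0' characters before the first '1' in a bit string."""
    return len(bits) - len(bits.lstrip("0"))


# Hash prefixes as shown on the slide (the real hashes continue past these bits).
hashes = {"x": "00101001110", "y": "11010100111", "z": "00011101011"}

max_so_far = 0
trace = []
for item, bits in hashes.items():
    zeroes = leading_zeroes(bits)
    max_so_far = max(max_so_far, zeroes)
    trace.append((item, zeroes, max_so_far))

# Seeing a maximum of M leading zeroes suggests roughly 2^M distinct items.
estimate = 2 ** max_so_far

print(trace)
print(estimate)
```

The only state kept is one small integer, regardless of how many items pass through.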
  • 27. Approximate Distinct: to reduce variance, average over m = 2^k sub-streams
    item  hash            index, zeroes  max so far
    x     00101001110...  0, 0           0,0,0,0
    y     11010100111...  3, 1           0,0,1,0
    z     00011101011...  0, 1           1,0,1,0
    ...
    take the harmonic mean
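The sub-stream idea is what the HyperLogLog family of estimators does, so a minimal HyperLogLog-style sketch is shown here as an illustration. It is not taken from the talk: the class name, the SHA-1 based 64-bit hash, the default k = 11 and the bias constant (the usual one for m >= 128) are all assumptions, registers store leading zeroes plus one as in the standard formulation, and the small- and large-range corrections are omitted.

```python
import hashlib


def hash64(item):
    """Deterministic 64-bit hash of a string (first 8 bytes of its SHA-1)."""
    return int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")


class ApproxDistinct:
    def __init__(self, k=11):
        self.k = k
        self.m = 1 << k                  # m = 2^k sub-streams
        self.registers = [0] * self.m    # max rank seen per sub-stream

    def add(self, item):
        h = hash64(item)
        width = 64 - self.k
        index = h >> width               # first k bits pick the sub-stream
        rest = h & ((1 << width) - 1)    # remaining bits feed the zero count
        # rank = leading zeroes in the remaining bits, plus one
        rank = width - rest.bit_length() + 1
        self.registers[index] = max(self.registers[index], rank)

    def estimate(self):
        # Bias constant for m >= 128; range corrections are left out.
        alpha = 0.7213 / (1 + 1.079 / self.m)
        # Normalised harmonic mean of 2^register across the sub-streams.
        harmonic_sum = sum(2.0 ** -r for r in self.registers)
        return alpha * self.m * self.m / harmonic_sum


sketch = ApproxDistinct()
for i in range(100_000):
    sketch.add(f"session-{i}")

print(round(sketch.estimate()))
```

With m = 2048 registers the expected relative error is around 1.04 / sqrt(m), a little over 2%, while the state stays at 2048 small integers however many items are added. Adding an item twice never changes the registers, so duplicates are ignored for free.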
  • 28. • Motivation / alternatives • What is it? • How does it work? • Approximate Analytics • What’s it good for?
  • 29. Was it worth it?
  • 30. What’s Coming? • Ad Hoc: same queries, but without the need to pre-define them • Geolocation: support for location-based events and queries • Drill down: see the events that make up any given aggregate
  • 31. • Motivation / alternatives • What is it? • How does it work? • Approximate Analytics • What’s it good for?
  • 32. Manufacturing / Social Media Analytics / Ad Systems / Financial Services / Oil + Gas / Monitoring
  • 33. “Up and running in about 4 hours” “We found out a competitor was scraping our data” “We keep discovering use cases we hadn’t thought of”
  • 34. Analytics
  • 35. www.acunu.com @acunu. Apache, Apache Cassandra, Cassandra, Hadoop, and the eye and elephant logos are trademarks of the Apache Software Foundation.