Flexible Event Tracking (Paul Gebheim)

  1. 1. Flexible Event Logging: Analyzing Funnels, Retention, and Viral Spread with MongoDB. Paul Gebheim, Justin.tv
  2. 2. How can we effectively use our data to make Justin.tv better?
  3. 3. Questions: Who does what, and how? (Funnels) How valuable are groups of users? (Virality) Are our changes working? (Retention, Funnel Conversion)
  4. 4. The Dream: a general framework for creating, deploying, and analyzing A/B tests in terms of Funnels, Virality, and Retention.
  5. 5. Backend Dreams: Flexibility, Queryability, Scalability ... and it should be easy to work with.
  6. 6. Backend Dreams... come true: Schema-less! Rich data access/manipulation toolset. At home in a web-centric toolchain. Sharding, Map/Reduce, Replication.
  7. 7. Let's build it...
  8. 8. Aggregating Data: Web Site Events

          [{
              "name": "front_page/broadcast_click",
              "date": "2010-04-20 12:00:00-0700",
              "unique_id": "fRx8zq",
              "bucket": "big_red_button"
          }, {
              "name": "front_page/broadcast_click",
              "date": "2010-04-20 12:01:00-0700",
              "unique_id": "9aB8c2",
              "bucket": "small_blue_button"
          }]

  9. 9. Aggregating Data: Video System Events

          [{
              "name": "broadcast/started",
              "date": "2010-04-20 12:10:00-0700",
              "unique_id": "fRx8zq",
              "bucket": "big_red_button",
              "channel": "my_1337_ch4nn31l"
          }]
  10. 10. Processing Data: Python; Map/Reduce; Configuration Documents; Generate/Apply MongoDB operations
  11. 11. Example: Count how many times each event occurs per 'bucket'
  12. 12. Example: Historical Data with SQL

          select
              event_name, bucket, count(*)
          from
              events
          group by event_name, bucket;
  13. 13. Mongo can do that! For small datasets, use collection.group()

          var count_events_per_bucket = function() {
              return db.events.group({
                  key: {name: 1, bucket: 1},
                  cond: {/* include all events */},
                  reduce: function(event, aggregate) {
                      aggregate.count += 1;
                  },
                  initial: {
                      count: 0
                  }
              });
          };
  14. 14. Mongo can do that! For large datasets, use collection.mapReduce()

          var count_events_per_bucket_big = function() {
              var res = db.events.mapReduce(
                  // map: emit a 1 for every (name, bucket) pair
                  function() {
                      emit({
                          name: this.name,
                          bucket: this.bucket
                      }, 1);
                  },
                  // reduce: sum the values emitted for one key
                  function(key, values_list) {
                      var count = 0;
                      for (var i = 0; i < values_list.length; i++) {
                          count += values_list[i];
                      }
                      return count;
                  }
              );

              // res.result names the collection holding the output
              return db[res.result].find();
          };
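To make the roles of the two callbacks concrete, the same count can be mimicked in plain Python with no database involved. This is only an illustrative sketch: `map_event`, `reduce_counts`, and `map_reduce` are hypothetical names, and the grouping step merely stands in for what MongoDB does between the map and reduce phases. The sample events are the three shown on the "Aggregating Data" slides.

```python
from collections import defaultdict

def map_event(event):
    """Mirror of the map callback: emit one (key, 1) pair per event."""
    yield (event["name"], event["bucket"]), 1

def reduce_counts(key, values):
    """Mirror of the reduce callback: sum the values emitted for one key."""
    return sum(values)

def map_reduce(events, mapper, reducer):
    """Group emitted values by key, then reduce each group (illustrative only)."""
    grouped = defaultdict(list)
    for event in events:
        for key, value in mapper(event):
            grouped[key].append(value)
    return {key: reducer(key, values) for key, values in grouped.items()}

sample_events = [
    {"name": "front_page/broadcast_click", "bucket": "big_red_button"},
    {"name": "front_page/broadcast_click", "bucket": "small_blue_button"},
    {"name": "broadcast/started", "bucket": "big_red_button"},
]

counts = map_reduce(sample_events, map_event, reduce_counts)
for (name, bucket), count in sorted(counts.items()):
    print(name, bucket, count)
```

Each (name, bucket) pair becomes one key, which matches the `group by event_name, bucket` clause of the SQL version.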
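  15. 15. Mongo can also... be used to do the counting in real time!

          # `events`, `event_db`, and `extractDay` are defined elsewhere (not shown).
          # Map each tracked event name to a function that extracts its bucket.
          matchers = {
              "front_page/broadcast_click": lambda event: event["bucket"],
              "broadcast/started": lambda event: event["bucket"],
          }

          for event in events:
              key = event["name"]
              if key in matchers:
                  # Dotted path into the counter document: counts.<day>.<bucket>
                  count_key = "counts.%s.%s" % (
                      extractDay(event["date"]),
                      matchers[key](event))
                  # Increment the counter, creating the document if needed.
                  event_db.event_counts.update(
                      {"_id": key},
                      {"$inc": {count_key: 1}},
                      multi=True, upsert=True)
              # Always store the raw event as well.
              event_db.events.insert(event)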
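The slide does not show `extractDay`, and the effect of `$inc` on a dotted key may not be obvious at first glance. The sketch below imitates both in plain Python, assuming the day is simply the first ten characters of the date string. `extract_day`, `increment_path`, and `record` are hypothetical helpers, not part of PyMongo, and a dict stands in for the `event_counts` collection.

```python
def extract_day(date_string):
    """Assumed behaviour of the slide's extractDay: keep the YYYY-MM-DD prefix."""
    return date_string[:10]

def increment_path(document, dotted_key, amount=1):
    """Imitate {"$inc": {dotted_key: amount}}: walk the dotted path, creating
    nested dicts as needed, and add `amount` to the final field."""
    *parents, leaf = dotted_key.split(".")
    node = document
    for part in parents:
        node = node.setdefault(part, {})
    node[leaf] = node.get(leaf, 0) + amount

event_counts = {}  # stand-in for the event_counts collection, keyed by _id

def record(event):
    # Upsert: reuse the counter document for this event name or create it.
    doc = event_counts.setdefault(event["name"], {"_id": event["name"]})
    count_key = "counts.%s.%s" % (extract_day(event["date"]), event["bucket"])
    increment_path(doc, count_key)

for _ in range(2):
    record({"name": "front_page/broadcast_click",
            "date": "2010-04-20 12:00:00",
            "bucket": "big_red_button"})
record({"name": "broadcast/started",
        "date": "2010-04-20 12:10:00",
        "bucket": "big_red_button"})
```

After these three calls, `event_counts` holds one document per event name, with counts nested by day and then by bucket.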
  16. 16. Example: How the results appear in Mongo

          > db.event_counts.find()
          {
              "_id": "front_page/broadcast_click",
              "counts": {
                  "2010-04-20": {
                      "big_red_button": 1231,
                      "small_blue_button": 86
                  }
              }
          }
          {
              "_id": "broadcast/started",
              "counts": {
                  "2010-04-20": {
                      "big_red_button": 72,
                      "small_blue_button": 6
                  }
              }
          }
  17. 17. What’s that we have there? First, click the “Broadcast Button”. Then, start broadcasting.
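Read together, the two counter documents can be treated as a two-step funnel, and the conversion rate per bucket is the ratio of the second count to the first. A minimal sketch using the numbers from the results slide; `conversion_by_bucket` is a hypothetical helper name, and the dict is a simplified copy of the `counts` fields.

```python
event_counts = {
    "front_page/broadcast_click": {
        "2010-04-20": {"big_red_button": 1231, "small_blue_button": 86},
    },
    "broadcast/started": {
        "2010-04-20": {"big_red_button": 72, "small_blue_button": 6},
    },
}

def conversion_by_bucket(counts, first_step, second_step, day):
    """Ratio of the second-step count to the first-step count, per bucket."""
    entered = counts[first_step][day]
    completed = counts[second_step].get(day, {})
    return {
        bucket: completed.get(bucket, 0) / total
        for bucket, total in entered.items()
        if total > 0  # skip buckets with no first-step events
    }

rates = conversion_by_bucket(
    event_counts, "front_page/broadcast_click", "broadcast/started", "2010-04-20")
for bucket, rate in sorted(rates.items()):
    print("%s: %.1f%%" % (bucket, rate * 100))
```

Because these are raw event counts rather than unique users, the ratios are only approximate.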
  18. 18. We can add more events... First, click the “Broadcast Button”; authenticate; click the Flash “Allow” or “Disallow” box; share with friends; ... Then, start broadcasting.
  19. 19. Periodic Map/Reduce: Computing a bunch of stuff every half hour is fine if it's fast enough. A program can generate arbitrarily complex Map/Reduce code...
  20. 20. Accurate Funnel Calculation
      • Per user rollup
        – For each user, which steps in the funnel have they been at, with constraints applied
        – A map to get unique users, a reduce to count which unique events they triggered
      • Per bucket rollup
        – For each bucket, how many users at each ‘step’ in the funnel
        – Sum counts at each step per bucket
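The two rollups can be sketched in plain Python as follows. This is not the talk's actual Map/Reduce code, which the slides do not show. The function names are hypothetical, and the only constraint applied is an assumed one: a user counts toward a step only if they also triggered every earlier step. The user `k3Lm0p` is invented for illustration.

```python
from collections import defaultdict

FUNNEL_STEPS = ["front_page/broadcast_click", "broadcast/started"]

def per_user_rollup(events):
    """For each (bucket, user), collect the set of distinct events triggered."""
    seen = defaultdict(set)
    for event in events:
        seen[(event["bucket"], event["unique_id"])].add(event["name"])
    return seen

def per_bucket_rollup(user_steps, steps=FUNNEL_STEPS):
    """Count, per bucket, how many users reached each step.

    Assumed constraint: a user counts for a step only if they also
    triggered every earlier step in the funnel.
    """
    totals = defaultdict(lambda: [0] * len(steps))
    for (bucket, _user), names in user_steps.items():
        for index, step in enumerate(steps):
            if step not in names:
                break
            totals[bucket][index] += 1
    return {bucket: dict(zip(steps, counts)) for bucket, counts in totals.items()}

events = [
    {"name": "front_page/broadcast_click", "unique_id": "fRx8zq", "bucket": "big_red_button"},
    {"name": "front_page/broadcast_click", "unique_id": "fRx8zq", "bucket": "big_red_button"},
    {"name": "broadcast/started", "unique_id": "fRx8zq", "bucket": "big_red_button"},
    {"name": "front_page/broadcast_click", "unique_id": "9aB8c2", "bucket": "small_blue_button"},
    # Invented user who started broadcasting without a recorded click.
    {"name": "broadcast/started", "unique_id": "k3Lm0p", "bucket": "small_blue_button"},
]

funnel = per_bucket_rollup(per_user_rollup(events))
```

Repeated clicks by the same user count once, and the invented user is excluded from the second step because the first step is missing.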
  21. 21. Same strategy... All calculations ended up being done in batch jobs...
  22. 22. Thoughts... Interactive performance is poor during M/R jobs. Eliot says this is fixed in 1.5.0 :-)
  23. 23. Thoughts... Even so... it's fast enough!
  24. 24. Future work: Migrating old Postgres-backed system to MongoDB. Real-time calculation for time series. Batch jobs for Funnel, Retention, and Virality.
