Flexible Event Tracking (Paul Gebheim)
 


Presentation Transcript

  • Flexible Event Logging: Analyzing Funnels, Retention, and Viral Spread with MongoDB. Paul Gebheim, Justin.tv
  • How can we effectively use our data to make Justin.tv better?
  • Questions: Who does what, and how? (Funnels) How valuable are groups of users? (Virality) Are our changes working? (Retention, Funnel Conversion)
  • The Dream A general framework for creating, deploying, and analyzing A/B tests in terms of Funnels, Virality, and Retention.
  • Backend Dreams: Flexibility, Queryability, Scalability... and it should be easy to work with
  • Backend Dreams... come true: Schema-less! Rich data access/manipulation toolset. At home in a web-centric toolchain. Sharding, Map/Reduce, Replication.
  • Let's build it...
  • Aggregating Data: Web Site Events

    [{
        "name": "front_page/broadcast_click",
        "date": "2010-04-20 12:00:00-0700",
        "unique_id": "fRx8zq",
        "bucket": "big_red_button"
    }, {
        "name": "front_page/broadcast_click",
        "date": "2010-04-20 12:01:00-0700",
        "unique_id": "9aB8c2",
        "bucket": "small_blue_button"
    }]
  • Aggregating Data: Video System Events

    [{
        "name": "broadcast/started",
        "date": "2010-04-20 12:10:00-0700",
        "unique_id": "fRx8zq",
        "bucket": "big_red_button",
        "channel": "my_1337_ch4nn31l"
    }]
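
    A minimal sketch of how events like these could be written from application
    code, assuming pymongo; the log_event helper, client setup, and database
    name are illustrative, not from the deck:

    from datetime import datetime
    from pymongo import MongoClient  # 2010-era pymongo used Connection

    event_db = MongoClient()["event_db"]

    def log_event(name, unique_id, bucket, **extra):
        # Schema-less: each producer attaches whatever extra fields it needs
        # (the video system adds "channel"; the web site does not).
        doc = {"name": name,
               "date": datetime.utcnow(),
               "unique_id": unique_id,
               "bucket": bucket}
        doc.update(extra)
        event_db.events.insert(doc)

    log_event("front_page/broadcast_click", "fRx8zq", "big_red_button")
    log_event("broadcast/started", "fRx8zq", "big_red_button",
              channel="my_1337_ch4nn31l")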
  • Processing Data: Python Map/Reduce; configuration documents; generate/apply MongoDB operations
  • Example: Count how many times each event occurs per 'bucket'
  • Example: Historical Data with SQL

    select
        event_name, bucket, count(*)
    from
        events
    group by event_name, bucket;
  • Mongo can do that! For small datasets, use collection.group():

    var count_events_per_bucket = function() {
        return db.events.group({
            key: {name: 1, bucket: 1},
            cond: {/* include all events */},
            reduce: function(event, aggregate) {
                aggregate.count += 1;
            },
            initial: {
                count: 0
            }
        });
    };
  • Mongo can do that! For large datasets, use collection.mapReduce():

    var count_events_per_bucket_big = function() {
        var res = db.events.mapReduce(
            // map: emit 1 per event, keyed on (name, bucket)
            function() {
                emit({
                    name: this.name,
                    bucket: this.bucket
                }, 1);
            },
            // reduce: sum the emitted counts for each key
            function(key, values_list) {
                var count = 0;
                values_list.forEach(function(v) {
                    count += v;
                });
                return count;
            }
        );

        return db[res.result].find();
    };
  • Mongo can also... be used to do the counting in real time!

    matchers = {
        "front_page/broadcast_click": lambda event: event["bucket"],
        "broadcast/started": lambda event: event["bucket"]
    }

    for event in events:
        key = event["name"]
        if key in matchers:
            # e.g. "counts.2010-04-20.big_red_button"
            count_key = "counts.%s.%s" % (
                            extractDay(event["date"]),
                            matchers[key](event))
            event_db.event_counts.update(
                    {"_id": key},
                    {"$inc": {count_key: 1}},
                    multi=True, upsert=True)
        event_db.events.insert(event)
  • Example: How the results appear in Mongo

    > db.event_counts.find()
    {
        "_id": "front_page/broadcast_click",
        "counts": {
            "2010-04-20": {
                "big_red_button": 1231,
                "small_blue_button": 86
            }
        }
    }
    {
        "_id": "broadcast/started",
        "counts": {
            "2010-04-20": {
                "big_red_button": 72,
                "small_blue_button": 6
            }
        }
    }
    >
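
    Reading funnel conversion off those documents is then simple arithmetic; a
    minimal sketch (the conversion_rate helper is hypothetical; event_db is the
    handle from the real-time counting slide):

    def conversion_rate(day, bucket):
        # starts / clicks for one bucket on one day, read from the
        # pre-aggregated counts documents shown above.
        clicks = event_db.event_counts.find_one(
            {"_id": "front_page/broadcast_click"})["counts"][day][bucket]
        starts = event_db.event_counts.find_one(
            {"_id": "broadcast/started"})["counts"][day][bucket]
        return float(starts) / clicks

    conversion_rate("2010-04-20", "big_red_button")    # 72 / 1231, ~5.8%
    conversion_rate("2010-04-20", "small_blue_button") # 6 / 86, ~7.0%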
  • What’s that we have there? First, click the “Broadcast Button”; then, start broadcasting.
  • We can add more events... First, click the “Broadcast Button”. Authenticate. Click the flash “Allow” or “Disallow” box. Share with friends. ... Then, start broadcasting.
  • Periodic Map/Reduce: Computing a bunch of stuff every half hour is fine if it's fast enough. A program can generate arbitrarily complex Map/Reduce code...
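
    One way such generation could look: a sketch in which a configuration
    document is expanded into map/reduce JavaScript. The config format and
    output collection name are invented, and Collection.map_reduce is the
    pre-4.0 pymongo API:

    from bson.code import Code

    # An invented configuration document: which fields to group on,
    # and which collection the rollup should be written to.
    config = {"group_by": ["name", "bucket"], "out": "event_counts_batch"}

    def build_map(fields):
        # Emit a compound key built from the configured fields.
        key = ", ".join("%s: this.%s" % (f, f) for f in fields)
        return Code("function() { emit({%s}, 1); }" % key)

    count_reduce = Code("""
        function(key, values) {
            var count = 0;
            values.forEach(function(v) { count += v; });
            return count;
        }""")

    event_db.events.map_reduce(build_map(config["group_by"]),
                               count_reduce, out=config["out"])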
  • Accurate Funnel Calculation
    • Per-user rollup – for each user, which steps in the funnel they have reached, with constraints applied: a map to get unique users, a reduce to count which unique events they triggered
    • Per-bucket rollup – for each bucket, how many users reached each ‘step’ in the funnel: sum counts at each step per bucket
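
    A sketch of the per-user rollup step, with the map/reduce written as
    JavaScript strings sent from Python; the output collection name is
    invented, and event_db is the handle used in the earlier slides:

    from bson.code import Code

    # Map: key by (user, bucket) and flag which funnel step this event is.
    per_user_map = Code("""
        function() {
            var steps = {};
            steps[this.name] = true;
            emit({user: this.unique_id, bucket: this.bucket}, steps);
        }""")

    # Reduce: merge the step flags across all of a user's events, so the
    # output value records every funnel step that user reached.
    per_user_reduce = Code("""
        function(key, values) {
            var merged = {};
            values.forEach(function(steps) {
                for (var s in steps) { merged[s] = true; }
            });
            return merged;
        }""")

    event_db.events.map_reduce(per_user_map, per_user_reduce,
                               out="per_user_funnel")

    The per-bucket rollup is then a second pass over per_user_funnel, summing
    users at each step within each bucket.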
  • Same strategy... All calculations ended up being done in batch jobs...
  • Thoughts... Interactive performance is poor during M/R jobs. Eliot says this is fixed in 1.5.0 :-)
  • Thoughts... Even so... it's fast enough!
  • Future work: Migrating the old Postgres-backed system to MongoDB. Real-time calculation for timeseries. Batch jobs for Funnel, Retention, and Virality.