Appboy analytics - NYC MUG 11/19/13



  1. Appboy Analytics
     Jon Hyman
     NY MongoDB User Group, November 19, 2013, eBay NYC
     @appboy @jon_hyman
  2. A LITTLE BIT ABOUT US & APPBOY (who we are and what we do)
     Appboy is a mobile relationship management platform for apps
     Jon Hyman, CIO :: @jon_hyman
     Harvard Bridgewater
  3. Appboy improves engagement by helping you understand your app users
     • IDENTIFY - Understand demographics, social and behavioral data
     • SEGMENT - Organize customers into groups based on behaviors, events, user attributes, and location
     • ENGAGE - Message users through push notifications, emails, and multiple forms of in-app messages
  4. Use Case: Customer engagement begins with onboarding
     Urban Outfitters, textPlus, Shape Magazine
  5. Agenda
     • How to quickly store time series data in MongoDB using flexible schemas
     • Learn how flexible schemas can easily provide breakdowns across dimensions
     • Counting quickly: statistical analysis on top of MongoDB queries
  6. What kinds of analytics does Appboy track?
     • Lots of time series data
     • App opens over time
     • Events over time
     • Revenue over time
     • Marketing campaign stats and efficacy over time
  7. What kinds of analytics does Appboy track?
     • Breakdowns*
     • Device types
     • Device OS versions
     • Screen resolutions
     • Revenue by product
     * We also care about this over time!
  8. What kinds of analytics does Appboy track?
     • User segment membership
     • How many users are in each segment?
     • How many can be emailed or reached via push notifications?
     • What is the average revenue per user in the segment?
     • Per paying user?
  9. Pre-aggregated Analytics: APP OPENS OVER TIME
  10. Typical time series collection
      Log a new row for each open received

        {
          timestamp: 2013-11-14 00:00:00 UTC,
          app_id: App identifier
        }

        db.app_opens.find({app_id: A, timestamp: {$gte: date}})

      Pro: Really, really simple. Easy to add attribution to users.
      Con: You need to aggregate the data before drawing the chart; lots of documents read into memory, lots of dirty pages
  11. Fewer documents with pre-aggregation, iteration 1
      Create a document that groups by the time period

        {
          app_id: App identifier,
          date: Date of the document,
          hour: 0-23 based hour this document represents,
          opens: Number of opens this hour
        }

        db.app_opens.update({date: D, app_id: A, hour: 0}, {$inc: {opens: 1}})

      Pro: Really easy to draw histograms
      Con: We never care about an hour by itself. We lose attribution.
  12. Fewer documents with pre-aggregation, iteration 2
      Create a document by day and have each hour be a field

        {
          app_id: App identifier,
          date: Date of the document,
          total_opens: Total number of opens this day,
          0: Number of opens at midnight,
          1: Number of opens at 1am,
          ...
          23: Number of opens at 11pm
        }

        db.app_opens.update(
          {date: D, app_id: A},
          {$inc: {"0": 1, total_opens: 1}}
        )

      Pro: Document count is low, easy to use aggregation framework for longer spans, fast: document should be in working set
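
To illustrate why the per-day layout on slide 12 makes charting cheap, here is a minimal sketch in plain JavaScript (no database needed) that turns one day document into a 24-point series. The function name `hourlySeries` and the sample document are assumptions made up for illustration; they are not from the talk.

```javascript
// Turn one per-day document (slide 12 layout) into a 24-point series.
// Hours that never received an open have no field yet, so default them to 0.
function hourlySeries(dayDoc) {
  const series = [];
  for (let hour = 0; hour < 24; hour++) {
    series.push(dayDoc[String(hour)] || 0);
  }
  return series;
}

// Hypothetical document, for illustration only.
const sampleDay = { app_id: "A", date: "2013-11-14", total_opens: 7, 0: 3, 13: 4 };
const series = hourlySeries(sampleDay);
console.log(series);
```

One document read yields the whole day's chart, which is the point of the "document count is low" advantage above.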
  13. Fewer documents with pre-aggregation, iteration 2
      • What about looking at different dimensions?
      • App opens by device type (e.g., how do iPads compare to iPhones?)
      • Demographics (gender, age group)
  14. Solution! FLEXIBLE SCHEMAS!
  15. Fewer documents with pre-aggregation, iteration 3
      Dynamically add dimensions in the document

        {
          app_id: App identifier,
          date: Date of the document,
          totals: {
            app_opens: Total number of opens this day,
            devices: {
              "iPad Air": Total number of opens on the iPad Air,
              "iPhone 4": Total number of opens on the iPhone 4
            },
            genders: {
              male: Total number of opens from male users,
              female: Total number of opens from female users
            },
            ...
          },
          0: {
            app_opens: Number of opens at midnight,
            devices: {
              "iPad Air": Number of opens on the iPad Air at midnight,
              "iPhone 4": Number of opens on the iPhone 4 at midnight
            },
            ...
          },
          ...
        }

        db.app_opens.update(
          {date: D, app_id: A},
          {$inc: {
            "totals.app_opens": 1,
            "totals.devices.iPhone 4": 1,
            "0.app_opens": 1,
            "0.devices.iPhone 4": 1
          }}
        )
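
As a rough sketch of how the nested counters on slide 15 could be incremented, the plain JavaScript below builds a `$inc` document using dot-notation paths for both the daily `totals` and the hour bucket. The helper name `buildOpenIncrement` and the shape of its `dimensions` argument are assumptions for illustration, not Appboy's actual code.

```javascript
// Build the $inc document for one app open at a given hour (0-23).
// `dimensions` maps a dimension name (e.g. "devices") to the value seen
// for this open (e.g. "iPhone 4"). Hypothetical helper, not from the talk.
function buildOpenIncrement(hour, dimensions) {
  const inc = {};
  for (const scope of ["totals", String(hour)]) {
    inc[`${scope}.app_opens`] = 1;
    for (const [dimension, value] of Object.entries(dimensions)) {
      inc[`${scope}.${dimension}.${value}`] = 1;
    }
  }
  return { $inc: inc };
}

const update = buildOpenIncrement(0, { devices: "iPhone 4", genders: "female" });
console.log(JSON.stringify(update, null, 2));
// In the mongo shell this could then be applied with something like:
//   db.app_opens.update({date: D, app_id: A}, update, {upsert: true})
```

Adding a new dimension only means passing another key; no schema change is required. Note that dimension values containing a `.` or starting with `$` would need escaping before being used in a field path.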
  16. Pre-aggregated analytics
      Pros
      • Easily extensible to add other dimensions
      • Still only using one document, therefore you can create charts very quickly
      • You get breakdowns over a time period for free
      Cons
      • Pre-aggregated data has no attribution
      • Have to know questions ahead of time
      Follow up: What if we wanted to look at a graph by age group?
  17. Pre-aggregated analytics summary
      • Get started tracking time series data quickly
      • You get breakdowns for free
      • Adding dimensions is super simple
      • No attribution, need to know questions ahead of time
      • Don’t just rely on pre-aggregated analytics
  18. Counting quickly: USER SEGMENTATION & STATISTICAL ANALYSIS
  19. User Segmentation
      • A group of users who match some set of filters
  20. Counting quickly
      Appboy shows you segment membership in real-time as you add/edit/remove filters.
      How do we do it quickly?
      We estimate the population sizes of segments when using our web UI.
  21. Counting quickly
      Goal: Quickly get the count() of an arbitrary query
      Problem: MongoDB counts are slow, especially unindexed ones
  22. Counting quickly
      10 million documents that represent people:

        {
          favorite_color: "blue",
          age: 27,
          gender: "M",
          favorite_food: "pizza",
          city: "NYC",
          shoe_size: 11,
          attractiveness: 10,
          ...
        }

  23. Counting quickly
      10 million documents that represent people:

        {
          favorite_color: "blue",
          age: 27,
          gender: "M",
          favorite_food: "pizza",
          city: "NYC",
          shoe_size: 11,
          attractiveness: 10,
          ...
        }

      • How many people like blue?
      • How many live in NYC and love pizza?
      • How many men have a shoe size less than 10?
  24. Big Question: How do you estimate counts?
      Answer: The same way news networks do it. With confidence.
  25. Counting quickly
      Add a random number in a known range to each document. Say, between 0 and 9999.

        {
          random: 4583,
          favorite_color: "blue",
          age: 27,
          gender: "M",
          favorite_food: "pizza",
          city: "NYC",
          shoe_size: 11,
          attractiveness: 10,
          ...
        }

      Add an index on the random number:

        db.users.ensureIndex({random: 1})

  26. Counting quickly
      Step 1: Get a random sample
      I have 10 million documents. Of my 10,000 random “buckets”, I should expect each “bucket” to hold about 1,000 users.
      E.g.,

        db.users.find({random: 123}).count() == ~1000
        db.users.find({random: 9043}).count() == ~1000
        db.users.find({random: 4982}).count() == ~1000
  27. Counting quickly
      Step 1: Get a random sample
      Let’s take a random 100,000 users. Grab a random range that “holds” those users (100 buckets of about 1,000 users each). These all work:

        db.users.find({random: {$gt: 0, $lt: 101}})
        db.users.find({random: {$gt: 503, $lt: 604}})
        db.users.find({random: {$gt: 8938, $lt: 9039}})
        db.users.find({$or: [
          {random: {$gt: 9955}},
          {random: {$lt: 56}}
        ]})

      Tip: Limit $maxScan to 100,000 just to be safe
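
The range selection on slide 27, including the wrap-around case, might be generated along these lines. This is a sketch in plain JavaScript under stated assumptions: the helper name `sampleFilter` is made up, and it uses inclusive `$gte`/`$lte` bounds rather than the exclusive `$gt`/`$lt` bounds shown on the slide (the set of buckets covered is the same).

```javascript
const BUCKETS = 10000; // random values are integers 0..9999, as on slide 25

// Return a query filter covering `width` consecutive buckets starting at
// `start`, wrapping around past 9999 with $or. Hypothetical helper.
function sampleFilter(start, width) {
  const end = start + width - 1; // last bucket included, before wrapping
  if (end < BUCKETS) {
    return { random: { $gte: start, $lte: end } };
  }
  return {
    $or: [
      { random: { $gte: start } },
      { random: { $lte: end - BUCKETS } },
    ],
  };
}

// 10,000,000 users / 10,000 buckets = about 1,000 users per bucket,
// so a 100,000-user sample needs 100 buckets.
const width = 100000 / (10000000 / BUCKETS);
const start = Math.floor(Math.random() * BUCKETS);
console.log(JSON.stringify(sampleFilter(start, width)));
```

For example, `sampleFilter(9956, 100)` produces the same `$or` split as the last query on the slide: buckets 9956 to 9999 plus buckets 0 to 55.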
  28. Counting quickly
      Step 2: Learn about that random sample

        db.users.find(
          {
            random: {$gt: 0, $lt: 101},
            gender: "M",
            favorite_color: "blue",
            shoe_size: {$gt: 10}
          }
        )
        ._addSpecial("$maxScan", 100000)
        .explain()

      Explain Result:

        {
          nscannedObjects: 100000,
          n: 11302,
          ...
        }
  29. Counting quickly
      Step 3: Do the math
      Population: 10,000,000
      Sample size: 100,000
      Num matches: 11,302
      Percentage of users who matched: 11.3%
      Estimated total count: 1,130,000 +/- 0.2% with 95% confidence
  30. Counting quickly
      Step 4: Optimize
      • Limit $maxScan to (100,000/numShards) to be even faster
      • Cache the random range for a few hours
      • Add more RAM (or shards)
      • Cache results to not hit the database for the same query
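
The arithmetic in Step 3 can be reproduced with a few lines of plain JavaScript. This sketch assumes the standard normal approximation for a sample proportion (z = 1.96 for 95% confidence); the talk does not say which formula was used, so treat that choice and the name `estimateCount` as assumptions.

```javascript
// Scale a sample match count up to the population and attach a 95% margin
// of error, using the normal approximation for a proportion (an assumption).
function estimateCount(population, sampleSize, matches) {
  const p = matches / sampleSize; // sample proportion
  const standardError = Math.sqrt((p * (1 - p)) / sampleSize);
  const margin = 1.96 * standardError; // z = 1.96 for 95% confidence
  return {
    proportion: p,
    estimate: Math.round(p * population),
    marginOfError: margin, // in proportion units, e.g. 0.002 = 0.2 points
    low: Math.round((p - margin) * population),
    high: Math.round((p + margin) * population),
  };
}

const result = estimateCount(10000000, 100000, 11302);
console.log(result);
```

With the slide's numbers the margin comes out a little under 0.002, which matches the "+/- 0.2%" figure when read as percentage points of the population.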
  31. Counting quickly
      Step 5: Improve
      • Get more than one count: use the aggregation framework on top of the population’s sample size
      • Work around all sorts of Mongo bugs :-(
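
One possible reading of "use the aggregation framework on top of the sample" is to group the sampled documents by a field and scale each group's count up to the population. The sketch below is an assumption about what that could look like, not the speaker's pipeline; `scaleToPopulation` and the sample output are hypothetical.

```javascript
// Hypothetical pipeline: restrict to the random sample, then count by gender.
const pipeline = [
  { $match: { random: { $gt: 0, $lt: 101 } } },
  { $group: { _id: "$gender", count: { $sum: 1 } } },
];
// In the mongo shell this would be run as: db.users.aggregate(pipeline)

// Scale per-group sample counts up to population estimates.
function scaleToPopulation(groups, sampleSize, population) {
  const factor = population / sampleSize;
  return groups.map((g) => ({
    _id: g._id,
    estimate: Math.round(g.count * factor),
  }));
}

// Made-up aggregation output, for illustration only.
const sampleGroups = [
  { _id: "M", count: 48000 },
  { _id: "F", count: 52000 },
];
console.log(scaleToPopulation(sampleGroups, 100000, 10000000));
```

A single pass over the sample then yields several estimated counts at once instead of one query per filter.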
  32. Summarize
      • Pre-aggregated analytics
      • Create a document that represents event occurrences in some time period
      • Takes full advantage of MongoDB’s flexible schemas
      • Not a catch-all for analytics, you should still store event data
  33. Summarize
      • Counting quickly
      • Estimate results of arbitrary queries using population sample sizes
      • Depending on your app, this could be a great way to keep response time predictable as you scale
  34. Thanks! Questions?
      jon@appboy.com
      @appboy @jon_hyman
