Data Science at Flurry


This talk covers the data science problems that the Data Science team at Flurry works on. In particular, it dives into one of them: a machine-learning-driven bidding strategy for mobile Real-Time Bidding (RTB) ad exchanges.

Published in: Technology, Education

  • We see activity in 400,000 apps across 1.2 billion devices a month. This big-data advantage gives us the deepest insight into consumer behavior on mobile. Big data is powering the evolution from perception to precision targeting.
  • Collaborative Filtering uses only the matrix of time spent by users inside apps to build a model of how likely a user is to install and engage with a new app.
  • But how well does collaborative filtering perform? Note that we don’t use the category of apps at all as an input to the Collaborative Filtering algorithm. Here we show how many users already had the SocialApp installed, and how many new users are predicted to install and engage with this SocialApp. Note that the category distribution of other apps installed by users who have the SocialApp is very similar to the category distribution of apps for users predicted to install SocialApp. In other words, collaborative filtering, while not using categories, has still picked with remarkable accuracy the users whose tastes are similar to those of users who already had SocialApp installed.

    1. Data Science at Flurry
       Soups Ranjan, PhD
    2. We all know that when we’re talking about mobile, we’re talking about apps
       [Chart: Time spent on mobile devices]
       Source: Nielsen, State of the App Nation 2012 Report and June 2013 Cross Platform Report
    3. Flurry has the deepest insight into consumer behavior on mobile
       [Bar chart: Monthly Device Reach (Millions)]
         Flurry             1,200
         Facebook             875
         Google               700
         Millennial Media     400
         Twitter              320
         JumpTap              100
       Source: Data gathered from public statements/filings by companies; Facebook denotes property and Network; Google Reach denotes sites and Network
    4. Flurry Product Overview
       • Flurry Analytics – Track users, sessions, events and crashes
       • Flurry AppCircle – Advertise with Flurry to acquire new users for your app
       • Flurry AppSpot – Monetize your app traffic via ads
    5. AppCircle – Advertise to Acquire Users
       • AppCircle: Advertiser configuration to set an ad:
         – Ad type: CPI, CPC, CP Video
         – Corresponding Bid
         – Ad format: Banner or Interstitial
         – Targeting (Age, Gender, Device, Location, Persona)
       • AppCircle Bidder:
         – Optimally acquire ad-space inventory where ads can be shown
    6. AppCircle Bidder Strategy
       [Diagram: bidder components and the data attached to each]
       • Bid Request (user, pub, exchange); {Eligible Ads}
       • Cost Model (Bid Price Estimation); History of Bid, win-price; (Ad1, Bid1, P(win)1) … (Adn, Bidn, P(win)n)
       • Revenue model; History of Ad Impressions, conversions; (Ad1, AdvBid1, P(conv)1) … (Adn, AdvBidn, P(conv)n)
       • Budget Pacing; Ad, AdvBid, Daily Budget, Spend; {Ads}
       • Advertiser Goals (α,β)
       • Ad Selector (Pick ad and its bid price); Bid ad on Exchange
    7. Bidder Ad Selection Model - I
       • Ad Selection Model:
         Select Ad(adv,pub,exchange,user) = argmax (Pwin α (Revenue(adv,pub,exchange,user) – β Cost(adv,pub,exchange,user)))
       • Maximize margin model (α = β = 1):
         Select Ad(adv,pub,exchange,user) = argmax (Pwin (Revenue(adv,pub,exchange,user) – Cost(adv,pub,exchange,user)))
         – May lead to lower advertiser fill rate, as we will then only bid to show an advertiser's ad when we are guaranteed to win at a price lower than the advertiser's bid

         Ad    Rev (eCPM)  Cost  P(win)  Rank
         Adv1  1.50        1.30  0.30    0.3 * (1.5 - 1.3) = 0.06
         Adv2  0.60        0.50  0.70    0.7 * (0.6 - 0.5) = 0.07
    8. Bidder Ad Selection Model - II
       • Maximize fill rate for advertiser (α = 1, β = 0):
         Select Ad(adv,pub,exchange,user) = argmax (Pwin Revenue(adv,pub,exchange,user))
         – We select the ad that maximizes our revenue goals; however, we only bid if revenue > cost

         Ad    Rev (eCPM)  Cost  P(win)  Rank
         Adv1  1.50        1.30  0.30    0.3 * (1.5) = 0.45
         Adv2  0.60        0.50  0.70    0.7 * (0.6) = 0.42
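
The two selection rules on slides 7 and 8 can be checked against their own tables with a short sketch. This is only an illustration, not Flurry's implementation: the `Candidate` class and function names are hypothetical, α is assumed to multiply the whole term (the slide's notation is ambiguous, and both worked cases use α = 1), and the revenue > cost filter from slide 8 is assumed to apply in both modes.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    revenue: float  # expected revenue (eCPM) if the ad is shown
    cost: float     # predicted cost of winning the impression
    p_win: float    # predicted probability of winning the auction


def score(c, alpha=1.0, beta=1.0):
    # Assumed reading of the slide formula: P(win) * alpha * (Revenue - beta * Cost).
    return c.p_win * alpha * (c.revenue - beta * c.cost)


def select_ad(candidates, alpha=1.0, beta=1.0):
    # Assumption: never bid on an ad whose revenue does not exceed its cost.
    eligible = [c for c in candidates if c.revenue > c.cost]
    if not eligible:
        return None
    return max(eligible, key=lambda c: score(c, alpha, beta))


# Values taken from the tables on slides 7 and 8.
ADS = [
    Candidate("Adv1", revenue=1.50, cost=1.30, p_win=0.30),
    Candidate("Adv2", revenue=0.60, cost=0.50, p_win=0.70),
]

margin_pick = select_ad(ADS, alpha=1.0, beta=1.0)  # maximize margin
fill_pick = select_ad(ADS, alpha=1.0, beta=0.0)    # maximize fill rate
```

With the slide values, the margin rule picks Adv2 and the fill-rate rule picks Adv1, matching the Rank columns above.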
    9. AppCircle: Ad Revenue Optimization
       • Ad Revenue Optimization problem:
         – Max: P(conv) * bid
         – Conversion Prediction Model: Max P(conv)
       • Historical Estimation:
         – Past conversion rate as a predictor for future conversion rates
       • ML Conversion Prediction Model:
         – Features: Publisher, Ad, User, Time, Location
       [Chart: Conv-prob by User id (u1, …) for users who saw Ad1 in Pub1’s app, with the Avg conv-prob marked]
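
The historical-estimation baseline on slide 9 can be sketched as a conversion-rate lookup per (publisher, ad) pair, followed by ranking on P(conv) * bid. The impression log, advertiser bids, and function names below are hypothetical and exist only to make the example runnable.

```python
from collections import defaultdict


def historical_conv_rates(impressions):
    """impressions: iterable of (publisher, ad, converted) tuples."""
    shown = defaultdict(int)
    converted = defaultdict(int)
    for publisher, ad, did_convert in impressions:
        shown[(publisher, ad)] += 1
        converted[(publisher, ad)] += int(did_convert)
    # Past conversion rate, used as the estimate of future P(conv).
    return {key: converted[key] / shown[key] for key in shown}


def expected_revenue(p_conv, adv_bid):
    # Objective from the slide: P(conv) * bid.
    return p_conv * adv_bid


# Hypothetical impression log and advertiser bids.
LOG = [
    ("Pub1", "Ad1", True),
    ("Pub1", "Ad1", False),
    ("Pub1", "Ad1", False),
    ("Pub1", "Ad1", False),
    ("Pub1", "Ad2", True),
    ("Pub1", "Ad2", False),
]
BIDS = {"Ad1": 2.00, "Ad2": 0.80}

rates = historical_conv_rates(LOG)
best_ad = max(BIDS, key=lambda ad: expected_revenue(rates[("Pub1", ad)], BIDS[ad]))
```

An ML conversion model would replace `historical_conv_rates` with a prediction that also conditions on user, time, and location features.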
    10. Bidder Cost Model
        • Cost model:
          – We don’t know about other players in the auction
          – Best we can do is to predict based on our wins and losses
        1) If historically we win auctions for users in Kansas City =>
        2) Most likely, other bidders are not interested in Kansas City users =>
        3) Next time, we’ll lower our bid for Kansas City users =>
        4) If we still win those Kansas City users, continue (1-3) =>
        5) If not, we will revise our bid back up
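
The win/lose feedback loop in steps 1 to 5 can be sketched as a simple bid adjustment rule. The multiplicative step size, the bid floor, and the fixed `clearing_price` that stands in for the unknown competing bids are all assumptions for illustration; the slide does not specify how the bid is revised.

```python
def adjust_bid(bid, won, step=0.05, floor=0.01):
    """Lower the bid after a win, raise it after a loss."""
    if won:
        return max(floor, bid * (1.0 - step))
    return bid * (1.0 + step)


def simulate(start_bid, clearing_price, rounds):
    # clearing_price is hidden from a real bidder; it is fixed here
    # only so that the loop has something to react to.
    bid = start_bid
    history = []
    for _ in range(rounds):
        won = bid >= clearing_price
        history.append((round(bid, 4), won))
        bid = adjust_bid(bid, won)
    return history


history = simulate(start_bid=0.50, clearing_price=0.30, rounds=40)
```

In this simulation the bid falls while the bidder keeps winning, then hovers around the clearing price once losses start to appear.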
    11. Machine Learning based Bidder Cost Model
        • Machine-learnt model gives us both: Cost and P(win)
        • Multi-class classification model (Logistic Regression) to predict win-price based on ad impression
        [Figure labels: P(win) ~ 1.0, P(win) ~ 0.0; Win-price=28c, Win-price=27c, Win-price=52c, No Win]
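
One way to read "gives us both Cost and P(win)" is sketched below: a multi-class model outputs a probability for each win-price class plus a "No Win" class, and for a given bid we sum the probability of every price class at or below that bid. This reading, the per-class scores, and the function names are assumptions; only the class labels (27c, 28c, 52c, No Win) come from the slide.

```python
import math

CLASSES = [27, 28, 52, None]  # win-price in cents; None stands for "No Win"


def softmax(scores):
    # Multinomial logistic regression turns per-class linear scores
    # into probabilities with a softmax.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def p_win_and_cost(class_probs, bid_cents):
    """Return (P(win), expected price paid given a win) for a bid."""
    winnable = [(price, p) for price, p in zip(CLASSES, class_probs)
                if price is not None and price <= bid_cents]
    p_win = sum(p for _, p in winnable)
    if p_win == 0.0:
        return 0.0, None
    expected_cost = sum(price * p for price, p in winnable) / p_win
    return p_win, expected_cost


# Hypothetical per-class scores for one ad impression.
scores = [1.0, 1.0, 0.0, 0.0]
probs = softmax(scores)
p_win_30, cost_30 = p_win_and_cost(probs, bid_cents=30)
p_win_60, cost_60 = p_win_and_cost(probs, bid_cents=60)
```

Raising the bid from 30c to 60c adds the 52c class to the winnable set, so P(win) rises along with the expected cost.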
    12. AppCircle Conversion Rates: Local Hour of Day
        [Chart: Regression weights for localHourOfDay; y-axis: Coefficient; x-axis: Local Hour of Day (0-23 hours)]
        [Chart: Conversion Probability by localHourOfDay; x-axis: local hour of day (0-23 hours); annotated at 12 noon, 4 pm, 6 pm, 7 pm]
    13. Machine Learning Workflow
        • How much data is enough?
        • Parallelize Feature Generation vs. Model Generation
        • Interpretable vs. Black-box models
        • Batch vs. Online learning
        • Time to Score a Model
        • Unbalanced Data
        • Over-fitting & Regularization
    14. Recommender System
        • Recommender System as an Ad-ranking method
        • Given users and apps they have installed in the past, what other apps are they likely to install?
        • Given users and their app usage (time-spent), what new apps are they likely to highly engage with?
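    15. Recommender System
        • Item-Item based Collaborative Filtering:
          – Missing value prediction
        [Figure: matrix of time spent (hours) per user for App1, App2, App3, App4, with missing entries to predict; cell values as extracted: 1hr, 1.6hr, 1hr, 0.6hr, 1.5hr, 1.2hr, 2hr, 2.1hr, 2hr, 3hr, 0.3hr, 0.1hr, 0.3hr, 2hr, 0.8hr]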
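
Missing-value prediction with item-item collaborative filtering can be sketched as follows. The slide does not say which similarity measure or weighting is used, so cosine similarity over co-used apps and a similarity-weighted average are assumptions, and the small time-spent matrix is hypothetical rather than taken from the slide.

```python
import math

# Hypothetical time-spent matrix (hours); None marks a missing value.
TIME_SPENT = {
    "u1": {"App1": 1.0, "App2": 2.0, "App3": None},
    "u2": {"App1": 1.0, "App2": 2.0, "App3": 2.0},
    "u3": {"App1": 2.0, "App2": 0.5, "App3": 0.5},
}


def item_similarity(matrix, a, b):
    # Cosine similarity between two apps over users who used both.
    pairs = [(row[a], row[b]) for row in matrix.values()
             if row.get(a) is not None and row.get(b) is not None]
    if not pairs:
        return 0.0
    dot = sum(x * y for x, y in pairs)
    norm_a = math.sqrt(sum(x * x for x, _ in pairs))
    norm_b = math.sqrt(sum(y * y for _, y in pairs))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict(matrix, user, target):
    # Similarity-weighted average of the user's time in other apps.
    num = den = 0.0
    for item, hours in matrix[user].items():
        if item == target or hours is None:
            continue
        sim = item_similarity(matrix, item, target)
        num += sim * hours
        den += abs(sim)
    return num / den if den else None


predicted_hours = predict(TIME_SPENT, "u1", "App3")
```

Because App3 usage tracks App2 usage much more closely than App1 usage in this toy matrix, the prediction for u1 lands nearer to u1's App2 time than to their App1 time.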
    16. Engagement Model – Android All
        • Category of SocialApp: Social
        • Number of users of SocialApp: 2,227
        • Number of predicted users of SocialApp: 1,131
        [Chart labels: SocialApp, SocialApp]
    17. Engagement Model – Android All
        • Category of SocialApp: Social
        • Number of users of SocialApp: 2,227
        • Number of predicted users of SocialApp: 1,131
        [Chart labels: SocialApp, SocialApp]
    18. Other Flurry Data Science Problems
        • Age and Gender Estimation
        • Click Fraud Detection
        • Optimize AppSpot Waterfall