MLconf NYC Claudia Perlich
  1. Claudia Perlich, Chief Scientist, Dstillery; Adjunct Professor, Stern (NYU); @claudia_perlich. Tales from the data trenches of display advertising.
  2. Ad Exchange: shopping at one of our campaign sites; cookies; 10 million URLs; 200 million browsers; 0.0001% to 1% base rate; 10 billion auctions per day; conversion. Where should we advertise, and at what price? Does the ad have a causal effect? What data should we pay for? Attribution? Who should we target for a marketer? What requests are fraudulent?
  3. The Non-Branded Web / The Branded Web: a consumer's online/mobile activity gets recorded like this. Browsing history, hashed URLs: date1 abkcc, date2 kkllo, date3 88iok, date4 7uiol, ... Brand event, encoded: date1 3012L20, date2 4199L30, ... date n 3075L50. Our browser data is agnostic: I do not want to 'understand' who you are.
  4. Analytical decomposition: Targeting Model, Bidding Model, Fraud, Causal Analysis.
  5. The heart and soul: Targeting Model, P(Buy | URLs, inventory, ad). Predictive modeling on hashed browsing history; 10 million dimensions for URLs (binary indicators); extremely sparse data; positives are extremely rare.
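The sparse representation this slide describes can be sketched as follows. The hashing scheme, `hash_url`, and the sample URLs are illustrative assumptions, not Dstillery's actual pipeline; only the idea (one binary indicator per hashed URL, out of ~10 million dimensions) comes from the slide.

```python
import hashlib

N_DIMS = 10_000_000  # illustrative: one binary indicator slot per hashed URL

def hash_url(url: str) -> int:
    # Stable hash of a URL into one of N_DIMS slots (assumed scheme, not
    # the production hash).
    digest = hashlib.md5(url.encode("utf-8")).hexdigest()
    return int(digest, 16) % N_DIMS

def to_sparse(browsing_history):
    # A browser's whole history becomes a tiny sorted set of active indices
    # in a 10M-dimensional binary vector: extremely sparse, as on the slide.
    return sorted({hash_url(u) for u in browsing_history})

history = ["example.com/shoes", "example.org/news", "example.com/shoes"]
indices = to_sparse(history)  # repeated visits collapse to one indicator
```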
  6. How can we learn from 10M features with no/few positives? We cheat. In ML, cheating is called "transfer learning".
  7. The heart and soul: Targeting Model, P(Buy | URLs, inventory, ad). Has to deal with the 10 million URLs. Need to find more positives!
  8. Experiment. Data: randomized targeting across 58 different large display ad campaigns; served ads to users with active, stable cookies; targeted ~5,000 random users per day for each marketer; campaigns ran for 1 to 5 months, with between 100K and 4MM impressions per campaign; observed outcomes: clicks on ads, post-impression (PI) purchases (conversions). Targeting: optimize targeting using click and PI purchase; technographic info and web history as input variables; evaluate each separately trained model on its ability to rank-order users for PI purchase, using AUC (the Mann-Whitney-Wilcoxon statistic); each model is trained/evaluated using logistic regression.
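The evaluation metric above, AUC, equals the Mann-Whitney-Wilcoxon statistic: the probability that a randomly drawn positive (here, a purchaser) is scored above a randomly drawn negative. A minimal sketch with invented scores:

```python
def auc(scores_pos, scores_neg):
    # Mann-Whitney-Wilcoxon form of AUC: fraction of (positive, negative)
    # pairs where the positive outranks the negative; ties count half.
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Invented model scores, not campaign data:
pos = [0.9, 0.7, 0.6]        # users who later purchased
neg = [0.8, 0.4, 0.3, 0.2]   # users who did not
score = auc(pos, neg)        # 10 of the 12 pairs are ranked correctly
```

A perfect ranker scores 1.0 and a random one 0.5, which is why the slide's AUC charts span the 0.2 to 1 range.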
  9. Predictive performance (AUC) for purchase learning [Dalessandro et al. 2012]. Chart: AUC (0.2 to 0.8) for train on click vs. train on purchase. *Restricted feature set used for these modeling results; qualitative conclusions generalize.
  10. Predictive performance (AUC) for click learning [Dalessandro et al. 2012], evaluated on predicting purchases (AUC in the target domain). Chart: AUC for train on click vs. train on purchase. *Restricted feature set used for these modeling results; qualitative conclusions generalize.
  11. Predictive performance (AUC) for site-visit learning [Dalessandro et al. 2012], evaluated on predicting purchases (AUC in the target domain). Chart: AUC distribution for train on clicks, train on site visits, and train on purchase. Significantly better targeting when training on the source task. *Restricted feature set used for these modeling results; qualitative conclusions generalize.
  12. Why is learning the wrong thing better???
  13. Transfer: navigating bias-variance.
  14. Predictive performance (AUC) across 58 different display ad campaigns [Dalessandro et al. 2012]. Chart: AUC distribution for train on clicks, train on site visits, and train on purchase; significantly better targeting when training on the source task. Purchases: high cost, high correlation, high variance. Clicks: low cost, low correlation, high bias. Site visits: low cost, high correlation, low bias and variance. *Restricted feature set used for these modeling results; qualitative conclusions generalize.
  15. The heart and soul: Targeting Model. Organic: P(SiteVisit | URLs); P(Buy | URLs, inventory, ad). Has to deal with the 10 million URLs. Transfer learning: use all kinds of site visits instead of new purchases; a sample biased in every possible way, to reduce variance; negatives are "everything else"; pre-campaign, without impression; stacking for transfer learning. MLJ 2014.
  16. Logistic regression in 10 million dimensions: Targeting Model, p(sv | urls). Stochastic gradient descent; L1 and L2 constraints; automatic estimation of optimal learning rates; Bayesian empirical industry priors; streaming updates of the models; fully automated, ~10,000 models per week. KDD 2014.
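A stripped-down sketch of the core ingredients named on this slide: stochastic gradient descent for logistic regression over sparse binary features, with an L2 penalty. The learning rate, penalty strength, and toy data are assumptions for illustration; the production system's automatic learning rates, priors, and streaming updates are not reproduced here.

```python
import math

def sgd_logistic(examples, n_epochs=20, lr=0.5, l2=0.001):
    # examples: list of (active_indices, label) pairs -- sparse binary
    # features. Weights live in a dict, so 10M nominal dimensions cost
    # nothing for features never seen.
    w = {}
    for _ in range(n_epochs):
        for idx, y in examples:
            z = sum(w.get(i, 0.0) for i in idx)
            p = 1.0 / (1.0 + math.exp(-z))
            g = y - p  # log-likelihood gradient for each active weight
            for i in idx:
                # L2 decay applied lazily, only to weights touched by
                # this example (a common sparse-SGD shortcut).
                w[i] = w.get(i, 0.0) * (1.0 - lr * l2) + lr * g
    return w

# Toy data: feature 7 marks converters, feature 3 marks non-converters.
data = [([7, 2], 1), ([7], 1), ([3, 2], 0), ([3], 0)]
w = sgd_logistic(data)  # learns w[7] > 0 and w[3] < 0
```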
  17. Real-time scoring of a user. Diagram: observed events (site visits with positive correlation, site visits with negative correlation, purchases) feed a prospect rank; ads are served while the rank is above a threshold (engagement). Some prospects fall out of favor once their in-market indicators decline.
  18. Lift over baseline. Chart: NN lift over RON vs. total impressions (0 to 6.0M), lift over random for 66 campaigns for online display ad prospecting; median lift = 5x. Note: the top prospects are consistently rated as excellent compared to alternatives by advertising clients' internal measures, and when measured by their analysis partners (e.g., Nielsen): high ROI, low cost-per-acquisition, etc. <snip>
  19. The poker face: Bidding Model, P(SiteVisit | prospect rank, inventory, ad). KDD 2012 Best Paper. Marginal inventory score, converted into a bid price.
  20. Inventory for hotel campaign (lift chart).
  21. Measuring causal effect? E[Y_{A=ad}] - E[Y_{A=no ad}]. A/B testing: practical concerns. Estimate causal effects from observational data: using targeted maximum likelihood estimation (TMLE) to estimate causal impact; can be done ex post for different questions; need to control for confounding; data has to be "rich" and cover all combinations of confounding and treatment. ADKDD 2011.
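Full TMLE is beyond a slide, but the need to control for confounding can be illustrated with plain stratified adjustment (G-computation), a simpler stand-in for TMLE, on an invented toy table where targeting intent is the confounder:

```python
# Toy population: x = in-market intent (the confounder), a = saw ad.
# Targeting makes high-intent users far more likely to see the ad.
p_x = {0: 0.5, 1: 0.5}              # P(x)
p_a_given_x = {0: 0.1, 1: 0.9}      # P(a=1 | x)
p_y = {(0, 0): 0.01, (0, 1): 0.02,  # P(y=1 | x, a): true ad lift = 0.01
       (1, 0): 0.10, (1, 1): 0.11}

# Naive comparison E[y | a=1] - E[y | a=0]: confounded by targeting.
num1 = sum(p_x[x] * p_a_given_x[x] * p_y[(x, 1)] for x in p_x)
den1 = sum(p_x[x] * p_a_given_x[x] for x in p_x)
num0 = sum(p_x[x] * (1 - p_a_given_x[x]) * p_y[(x, 0)] for x in p_x)
den0 = sum(p_x[x] * (1 - p_a_given_x[x]) for x in p_x)
naive = num1 / den1 - num0 / den0   # wildly overstates the ad effect

# Adjusted (G-computation): average the per-stratum contrast over P(x),
# recovering the true lift of 0.01.
adjusted = sum(p_x[x] * (p_y[(x, 1)] - p_y[(x, 0)]) for x in p_x)
```

The naive contrast here is roughly eight times the true effect, which is exactly the "saw ad vs. did not see ad" trap on the next slides.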
  22. An important decision... "I think she is hot!" "Hmm, so what should I write to her to get her number?"
  23. Source: OK Trends.
  24. Hardships of causality: beauty is confounding. "You are beautiful." Beauty determines both the probability of getting the number and the probability that James will say it; need to control for actual beauty, or it can appear that making compliments is a bad idea.
  25. Hardships of causality: targeting is confounding. We only show ads to people we know are more likely to convert (ad or not). Chart: conversion rates, saw ad vs. did not see ad. Need to control for confounding; data has to be "rich" and cover all combinations of confounding and treatment.
  26. Observational causal methods: TMLE. Negative test: wrong ad. Positive test: A/B comparison.
  27. Some creatives do not work...
  28. The police: fraud. Tracking artificial co-visitation patterns; blacklisting inventory in the exchanges; ignoring the browser. KDD 2013.
  29. Unreasonable performance increase, Spring '12: the performance index doubled (2x) within 2 weeks.
  30. Oddly predictive websites?
  31. 36% of traffic is non-intentional: 6% in 2011, 36% in 2012.
  32. Traffic patterns are "non-human": 50% co-visitation between website 1 and website 2. Data from bid requests in ad exchanges.
  33. Co-visitation network. Node: hostname. Edge: 50% co-visitation. WWW 2010.
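A small sketch of the co-visitation graph defined on this slide. The 50% threshold matches the slide; the function names and toy visit logs are invented for illustration:

```python
from collections import defaultdict
from itertools import combinations

def covisitation_edges(visits, threshold=0.5):
    # visits: {browser_id: set of hostnames seen in bid requests}.
    # Directed edge u -> v when at least `threshold` of u's visitors
    # also visited v (the slide's 50% co-visitation rule).
    visitors = defaultdict(set)
    for browser, hosts in visits.items():
        for h in hosts:
            visitors[h].add(browser)
    edges = set()
    for u, v in combinations(visitors, 2):
        shared = len(visitors[u] & visitors[v])
        if shared / len(visitors[u]) >= threshold:
            edges.add((u, v))
        if shared / len(visitors[v]) >= threshold:
            edges.add((v, u))
    return edges

# Bot-like traffic: browsers b1..b3 all hit both suspicious hosts.
visits = {"b1": {"siteA", "siteB"}, "b2": {"siteA", "siteB"},
          "b3": {"siteA", "siteB"}, "b4": {"news.example"}}
edges = covisitation_edges(visits)  # siteA and siteB link both ways
```

Dense mutual edges like these are exactly the artificial co-visitation patterns the fraud detector blacklists; a site visited organically shares few browsers with any one other site.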
  34. Boston Herald.
  35. Boston Herald.
  36. womenshealthbase?
  37. WWW 2012.
  38. Unreasonable performance increase, Spring '12: 2x performance index within 2 weeks.
  39. Now it is coming to brands, too: "cookie stuffing" increases the value of the ad for retargeting; it messes up web analytics; and it messes up my models, because a botnet is easier to predict than a human.
  40. Fraud pollutes my models. Don't show ads on those sites; don't show ads to a hijacked browser; need to remove the visits to the fraud sites; need to remove the fraudulent brand visits. When we see a browser caught up in fraudulent activity, we send it to the penalty box, where we ignore all its actions.
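The penalty-box rule can be sketched as a filter in front of the models. The class name, the one-week bench duration, and the event format are illustrative assumptions, not the production design:

```python
import time

class PenaltyBox:
    # Browsers seen in fraudulent activity are benched for a while;
    # every event from a benched browser is dropped before it can
    # pollute targeting or brand-visit data.
    def __init__(self, bench_seconds=7 * 24 * 3600):
        self.bench_seconds = bench_seconds
        self.benched = {}  # browser_id -> timestamp when benched

    def bench(self, browser_id, now=None):
        self.benched[browser_id] = now if now is not None else time.time()

    def is_benched(self, browser_id, now=None):
        t = self.benched.get(browser_id)
        if t is None:
            return False
        now = now if now is not None else time.time()
        return now - t < self.bench_seconds

    def filter_events(self, events, now=None):
        # Keep only events from browsers not in the penalty box.
        return [e for e in events if not self.is_benched(e["browser"], now)]

box = PenaltyBox()
box.bench("bot-123", now=0)
events = [{"browser": "bot-123", "url": "siteA"},
          {"browser": "human-9", "url": "news.example"}]
clean = box.filter_events(events, now=3600)  # bot's events are dropped
```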
  41. Using the penalty box: all back to normal. Performance index, 3 more weeks in Spring 2012.
  42. On a personal note (in eigener Sache): claudia.perlich@gmail.com
  43. Some references:
     1. B. Dalessandro, F. Provost, R. Hook. Audience Selection for On-Line Brand Advertising: Privacy-Friendly Social Network Targeting. KDD 2009.
     2. O. Stitelman, B. Dalessandro, C. Perlich, F. Provost. Estimating the Effect of Online Display Advertising on Browser Conversion. ADKDD 2011.
     3. C. Perlich, O. Stitelman, B. Dalessandro, T. Raeder, F. Provost. Bid Optimizing and Inventory Scoring in Targeted Online Advertising. KDD 2012 (Best Paper Award).
     4. T. Raeder, O. Stitelman, B. Dalessandro, C. Perlich, F. Provost. Design Principles of Massive, Robust Prediction Systems. KDD 2012.
     5. B. Dalessandro, O. Stitelman, C. Perlich, F. Provost. Causally Motivated Attribution for Online Advertising. ADKDD 2012.
     6. B. Dalessandro, R. Hook, C. Perlich, F. Provost. Transfer Learning for Display Advertising. MLJ 2014.
     7. T. Raeder, C. Perlich, B. Dalessandro, O. Stitelman, F. Provost. Scalable Supervised Dimensionality Reduction Using Clustering. KDD 2013.
     8. O. Stitelman, C. Perlich, B. Dalessandro, R. Hook, T. Raeder, F. Provost. Using Co-Visitation Networks for Classifying Non-Intentional Traffic. KDD 2013.