Copyright © 2016 Criteo
ML for Display Advertising @ Scale
Damien Lefortier
MLconf NYC
2016-04-15

Damien Lefortier, Senior Machine Learning Engineer and Tech Lead in the Prediction Machine Learning team, Criteo at MLconf NYC - 4/15/16


Machine Learning for Display Advertising @ Scale: In this talk, we will briefly introduce the display advertising marketplace, its stakeholders and the key performance metrics. We will then present the models we have developed at Criteo for bidding in real-time auctions, product recommendation, and look & feel optimization at scale (1B+ monthly users, 3B+ products in our catalog, and 30K ads displayed / sec at peak traffic). For these tasks, we’ve moved over time from predicting rare, binary events (clicks) to predicting very rare events (sales) and continuous events (sales amounts), all of them being quite noisy, and we’ll discuss the different methods that we have tried to build these models (such as generalized linear models, trees or factorization machines). We’ll continue by discussing how we evaluate these models both offline and online. We will describe the infrastructure for large-scale distributed data processing that these algorithms rely upon and discuss different optimization techniques we have experimented with (such as SGD, L-BFGS, SVRG). Finally, we will conclude with future areas of research and discuss open challenges we are currently facing.

Published in: Technology


  1. ML for Display Advertising @ Scale. Damien Lefortier, MLconf NYC, 2016-04-15. Copyright © 2016 Criteo.
  2. Outline • Introduction to AdTech / Criteo • Deep dive into our ML algorithms • Offline and online evaluation • Future areas of research
  3. Outline • Introduction to AdTech / Criteo • Deep dive into our ML algorithms • Offline and online evaluation • Future areas of research
  4. AdTech / Criteo [diagram: Advertiser, Publisher]
  5. Our engine is trying to answer 3 questions. Common objective: maximize the client’s value. 1. How much should we bid for a given ad space? 2. Which products should we recommend / show? 3. What is the best look & feel of the banner?
  6. Physical infrastructure • 7 in-house data centers on 3 continents • ~15,000 servers; largest Hadoop cluster in Europe • More than 35 PB of data storage. Traffic • 800k HTTP requests / sec (peak activity) • 29,000 impressions / sec (peak activity) • < 10 ms to process a bidding request • < 100 ms to render the ad (if we win)
  7. Outline • Introduction to AdTech / Criteo • Deep dive into our ML algorithms • Offline and online evaluation • Future areas of research
  8. Bidding • Should we bid? • At which price? Recommendation • Which products should we display? Look & Feel • Big image vs small image • Background color, ... Prediction • Generic prediction engine • Specific models trained on TBs of data
  9. Bidding • Should we bid? • At which price? Recommendation • Which products should we display? Look & Feel • Big image vs small image • Background color, ... Prediction • Generic prediction engine • Specific models trained on TBs of data
  10. Bidding strategy (1) • As we sell performance, Criteo’s and our clients’ interests are aligned. • The cost of a display is lower than the bid and independent of it (2nd price or floor), so we should bid the maximum value the client is willing to pay. • We use adjustments for 1st-price auctions.
  11. Bidding strategy (2) • This value depends on the predicted performance and the client’s objective. • Some examples: • Click-optimized campaign: bid = maxCPC × pClick • CR-optimized campaign: bid = maxCPO × pCR • …
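The two bidding slides can be illustrated with a minimal sketch. It assumes a simplified sealed-bid second-price auction with a floor, in which ties are lost; the function names, the numbers, and the tie rule are hypothetical and not taken from the talk.

```python
from typing import Optional, Sequence


def bid_value(max_cost_per_event: float, p_event: float) -> float:
    """bid = maxCPC * pClick (or maxCPO * pCR): expected value of one display."""
    return max_cost_per_event * p_event


def second_price_cost(our_bid: float,
                      other_bids: Sequence[float],
                      floor: float = 0.0) -> Optional[float]:
    """Price paid in a simplified second-price auction, or None if we lose.

    Assumption for this sketch: we win only when we strictly beat both the
    floor and every competing bid, and we then pay the highest of those.
    """
    price_to_beat = max([floor, *other_bids])
    if our_bid <= price_to_beat:
        return None
    return price_to_beat


# Hypothetical click-optimized campaign: maxCPC = 0.50, predicted pClick = 2%.
bid = bid_value(0.50, 0.02)
cost = second_price_cost(bid, other_bids=[0.004, 0.007], floor=0.001)
```

The price paid depends only on the competing bids and the floor, not on our own bid, which is why bidding the full predicted value is reasonable in this setting.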
  12. We train our prediction models on our historical displays. Variables: • Level of engagement of the user • Quality of inventory • User fatigue • For travel: time to check-in and number of nights. Our ability to predict relies greatly on the relevance of the variables we consider. [diagram: historical displays feeding machine learning algorithms; markers show clicked displays and converted displays (size = order value)]
  13. Bidding • Should we bid? • At which price? Recommendation • Which products should we display? Look & Feel • Big image vs small image • Background color, ... Prediction • Generic prediction engine • Specific models trained on TBs of data
  14. Recommend products for a user • What we want: reco(user) = products • 1B users x 3B products! • But we need to scale and keep it fresh • What we can do: pre-select products offline, then refine scoring online to get final candidates
  15. Bob saw orange shoes. Some candidate products: • Historical • Similar • Complementary • Most viewed
  16. Products delivering the best performance are displayed. Variables: • Products seen by the user • Time since product event • Level of similarity • Product features. Products are selected based on their CTR, CR or OV. [diagram: historical displays feeding machine learning algorithms; markers show clicked products and converted products (size = order value)]
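The two-step recommendation flow on slides 14 to 16 can be sketched as follows. This is a minimal illustration under stated assumptions: the lookup tables, product names, and score values are invented, and the online score stands in for whatever predicted CTR, CR or OV the real system uses.

```python
from typing import Callable, Dict, List


def preselect_candidates(seen: List[str],
                         similar: Dict[str, List[str]],
                         complementary: Dict[str, List[str]],
                         most_viewed: List[str],
                         limit: int = 10) -> List[str]:
    """Offline step: build a small candidate pool from several sources."""
    pool: List[str] = []
    for product in seen:
        pool.append(product)                         # historical
        pool.extend(similar.get(product, []))        # similar
        pool.extend(complementary.get(product, []))  # complementary
    pool.extend(most_viewed)                         # most viewed
    # Remove duplicates while keeping first-seen order.
    return list(dict.fromkeys(pool))[:limit]


def rank_online(candidates: List[str],
                score: Callable[[str], float],
                k: int = 2) -> List[str]:
    """Online step: keep the k candidates with the highest predicted score."""
    return sorted(candidates, key=score, reverse=True)[:k]


# Hypothetical data: Bob saw orange shoes.
candidates = preselect_candidates(
    seen=["orange_shoes"],
    similar={"orange_shoes": ["red_shoes"]},
    complementary={"orange_shoes": ["socks"]},
    most_viewed=["hat", "red_shoes"],
)
predicted = {"orange_shoes": 0.02, "red_shoes": 0.05, "socks": 0.01, "hat": 0.03}
shown = rank_online(candidates, score=lambda p: predicted[p])
```

Pre-selecting offline keeps the online step cheap: only a handful of candidates per user need to be scored at display time instead of the full catalog.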
  17. Bidding • Should we bid? • At which price? Recommendation • Which products should we display? Look & Feel • Big image vs small image • Background color, ... Prediction • Generic prediction engine • Specific models trained on TBs of data
  18. We train our prediction models on our historical displays. Variables, some of which we control: • How the user interacts with the banner • Organization of information • Colorset. Some of which we don’t: • Zone format • Publisher. Look and feel will be selected based on its CTR, CR or OV. [diagram: historical displays (color = look & feel) feeding machine learning algorithms; markers show clicked displays and converted displays (size = order value)]
  19. Bidding • Should we bid? • At which price? Recommendation • Which products should we display? Look & Feel • Big image vs small image • Background color, ... Prediction • Generic prediction engine • Specific models trained on TBs of data
  20. Many models to learn • We have different ML models for bidding / recommendation / …, and they also differ depending on the campaign objective. We use logistic regression in many places. • Each model is trained independently & refreshed as often as possible. • Three main sources of features: user, ad, page (mostly categorical).
  21. Learn on huge volumes of data: 10,000 displays
  22. Learn on huge volumes of data: 10,000 displays lead to 50 clicks
  23. Learn on huge volumes of data: 10,000 displays lead to 50 clicks, which lead to 1 sale
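The funnel on slides 21 to 23 implies the following rates. The arithmetic uses only the three numbers on the slides; the target of 1,000 positive examples is an arbitrary illustration of why huge volumes of data are needed.

```python
from fractions import Fraction

displays, clicks, sales = 10_000, 50, 1

ctr = Fraction(clicks, displays)               # 1/200, i.e. 0.5% of displays are clicked
cr = Fraction(sales, clicks)                   # 1/50, i.e. 2% of clicks convert
sales_per_display = Fraction(sales, displays)  # 1/10000, i.e. 0.01% of displays lead to a sale


def displays_needed(positive_examples: int, rate: Fraction) -> Fraction:
    """Number of displays required to observe a given number of positive events."""
    return positive_examples / rate


# Hypothetical target: 1,000 observed sales to train a sales model.
needed = displays_needed(1_000, sales_per_display)
```

Under these rates, observing 1,000 sales takes 10 million displays, while 1,000 clicks take only 200,000.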
  24. Quadratic features • Outer product between 2 features (similar to a polynomial kernel of degree 2). • Example between site and advertiser: [diagram with labels: Publisher network, Publisher, Site, Url, Advertiser network, Ad, Campaign, Advertiser]
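A quadratic (crossed) feature for categorical data can be sketched as the pairing of every field on one side with every field on the other. The field names and values below are hypothetical, and the string-concatenation encoding is one common convention rather than the one used in the talk.

```python
from typing import Dict


def cross_features(left: Dict[str, str], right: Dict[str, str]) -> Dict[str, str]:
    """Build one crossed categorical feature per (left field, right field) pair."""
    return {
        f"{lname}_x_{rname}": f"{lvalue}|{rvalue}"
        for lname, lvalue in left.items()
        for rname, rvalue in right.items()
    }


# Hypothetical site-side and advertiser-side features.
site_side = {"publisher": "news-co", "site": "news.example"}
advertiser_side = {"advertiser": "shoe-shop", "campaign": "summer"}
crossed = cross_features(site_side, advertiser_side)
```

Each crossed value gets its own weight in a linear model, so the model can learn that a given advertiser performs well on one site and poorly on another, at the cost of a much larger feature space.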
  25. Hashing trick • Standard representation of categorical features: “one-hot” encoding • Dimensionality equal to the number of different values… • Hashing to reduce dimensionality (made popular by John Langford in VW) • Dimensionality now independent of number of values • Using: [formula not captured in this transcript]
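A generic version of the hashing trick is sketched below. It is not the formula from the slide, which is not preserved in this transcript: the choice of MD5, the 24-bit vector size, and the feature names are all assumptions made for illustration.

```python
import hashlib
from typing import Dict


def hashed_index(name: str, value: str, num_bits: int = 24) -> int:
    """Map a (feature name, value) pair to an index in a vector of size 2**num_bits."""
    digest = hashlib.md5(f"{name}={value}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % (1 << num_bits)


def hash_features(features: Dict[str, str], num_bits: int = 24) -> Dict[int, float]:
    """Sparse hashed encoding: index -> count of features that landed on it."""
    vector: Dict[int, float] = {}
    for name, value in features.items():
        index = hashed_index(name, value, num_bits)
        vector[index] = vector.get(index, 0.0) + 1.0  # collisions simply add up
    return vector


# Hypothetical display with three categorical features.
example = {"site": "news.example", "advertiser": "shoe-shop", "zone": "top-banner"}
encoded = hash_features(example)
```

The vector size is fixed by `num_bits` regardless of how many distinct values appear, so new sites or advertisers never require resizing the model; the price is occasional collisions between unrelated features.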
  26. In-house machine learning library: IRMA • We have our own large-scale distributed machine learning library on top of Hadoop, used for all our models. • From an ML perspective we rely, in most cases, on an L-BFGS solver initialized with SGD (see, e.g., A. Agarwal et al., A Reliable Effective Terascale Linear Learning System).
  27. Distribution of L-BFGS & SGD • L-BFGS, being a batch algorithm, is easy to distribute. • SGD is a bit trickier: we do parameter averaging for that, and we also use Hogwild! to multi-thread on each machine. • We use Hadoop AllReduce: [diagram not captured in this transcript]
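Parameter averaging for SGD can be sketched in a single process: each data shard trains its own logistic regression, and the resulting weight vectors are averaged, which is the reduction an AllReduce step would perform across machines. The shards, learning rate, and epoch count are hypothetical, and neither Hogwild! multi-threading nor the subsequent L-BFGS refinement is shown.

```python
import math
from typing import List, Sequence, Tuple

Example = Tuple[List[float], int]  # (feature vector, label in {0, 1})


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def sgd_logistic(shard: Sequence[Example], dim: int,
                 epochs: int = 5, lr: float = 0.1) -> List[float]:
    """Plain SGD for logistic regression on one shard (one simulated machine)."""
    w = [0.0] * dim
    for _ in range(epochs):
        for x, y in shard:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            for j in range(dim):
                w[j] -= lr * (p - y) * x[j]  # gradient of the log loss
    return w


def average_parameters(weights: Sequence[List[float]]) -> List[float]:
    """Element-wise mean of the per-shard weight vectors."""
    n = len(weights)
    return [sum(column) / n for column in zip(*weights)]


# Two hypothetical shards; each x is [bias, feature], label is 1 when feature > 0.
shards = [
    [([1.0, 2.0], 1), ([1.0, -2.0], 0)],
    [([1.0, 1.0], 1), ([1.0, -1.0], 0)],
]
averaged = average_parameters([sgd_logistic(s, dim=2) for s in shards])
```

In the setup described on slide 26, weights obtained this way would serve as the starting point for a batch L-BFGS solver rather than as the final model.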
  28. A word on more advanced techniques • IRMA is not only about vanilla logistic regression with L2 regularization… • It contains more advanced techniques such as transfer learning, factorization machines, learning to rank, cost-sensitive learning, … • For example, we use cost-sensitive learning for bidding.
  29. Outline • Introduction to AdTech / Criteo • Deep dive into our ML algorithms • Offline and online evaluation • Future areas of research
  30. Offline & online evaluation. Usual two-step process: • Offline testing is fast, cheap, and efficient for wide exploration. • Online testing is expensive but has the final word.
  31. Offline metrics (bidding case) • We use classical metrics: LLH, RMSE, … (which focus on the prediction and ignore the bidding system in which these models are used). • Utility, from Offline Evaluation of Response Prediction in Online Advertising Auctions by O. Chapelle (WWW’15).
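The two classical metrics named on this slide can be computed as below. The sketch assumes binary labels and predicted probabilities strictly between 0 and 1, and reports the log-likelihood (LLH) as a per-example mean; the utility metric from the cited paper is not reproduced here.

```python
import math
from typing import Sequence


def mean_log_likelihood(predictions: Sequence[float], labels: Sequence[int]) -> float:
    """Average log-likelihood of binary labels under the predicted probabilities."""
    total = 0.0
    for p, y in zip(predictions, labels):
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total / len(labels)


def rmse(predictions: Sequence[float], labels: Sequence[int]) -> float:
    """Root mean squared error between predictions and labels."""
    squared = sum((p - y) ** 2 for p, y in zip(predictions, labels))
    return math.sqrt(squared / len(labels))


# Hypothetical predictions for four displays, one of which was clicked.
preds = [0.9, 0.2, 0.1, 0.05]
clicked = [1, 0, 0, 0]
llh = mean_log_likelihood(preds, clicked)
error = rmse(preds, clicked)
```

Both numbers judge the predicted probabilities in isolation; neither says how the predictions would change auction outcomes, which is the limitation the slide points out.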
  32. Online metrics (bidding case) • RevExTac = Revenue Excluding Traffic Acquisition Costs • Cost, Revenue, …
  33. Some statistics on evaluation • 100K+ offline tests per year • 1K+ A/B tests per year • Many people • We developed a platform and processes that enable very fast testing and improvement
  34. Outline • Introduction to AdTech / Criteo • Deep dive into our ML algorithms • Offline and online evaluation • Future areas of research
  35. Some examples of future areas of research • Counterfactual evaluation (offline A/B tests) • Embeddings for recommendation • Policy learning
  36. Counterfactual evaluation • Estimate the business metric directly (clicks, sales, …). • Using the production model + randomization. • Good results on clicks already.
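One standard way to estimate a business metric offline from randomized production logs is inverse propensity scoring (IPS). The slide does not name the estimator actually used, so IPS is an assumption here, as are the log format and the toy policy.

```python
from typing import Callable, Sequence, Tuple

# (context, action taken, probability the logging policy chose it, observed reward)
LogEntry = Tuple[str, str, float, float]


def ips_estimate(logs: Sequence[LogEntry],
                 new_policy_prob: Callable[[str, str], float]) -> float:
    """Estimate the average reward the new policy would have obtained."""
    total = 0.0
    for context, action, propensity, reward in logs:
        # Reweight each logged reward by how much more (or less) often
        # the new policy would have taken the logged action.
        total += reward * new_policy_prob(context, action) / propensity
    return total / len(logs)


# Hypothetical logs: banner color picked uniformly at random, reward = click.
logs = [
    ("user_a", "red", 0.5, 1.0),
    ("user_b", "blue", 0.5, 0.0),
]


def always_red(context: str, action: str) -> float:
    return 1.0 if action == "red" else 0.0


estimated_click_rate = ips_estimate(logs, always_red)
```

The randomization mentioned on the slide is what makes the propensities known and non-zero, which the reweighting requires.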
  37. Embeddings for recommendation • Can embeddings (for example, à la word2vec) help us compute similarities between, e.g., different products or users?
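Once products have embeddings, similarity is commonly measured with the cosine of the angle between their vectors. The sketch below assumes that measure and uses hand-written three-dimensional vectors; real embeddings would be learned, for instance from browsing sequences.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(target: str, embeddings: Dict[str, List[float]]) -> str:
    """Name of the product whose embedding is closest to the target's."""
    others = (name for name in embeddings if name != target)
    return max(others, key=lambda n: cosine_similarity(embeddings[target], embeddings[n]))


# Hypothetical embeddings, written by hand for illustration only.
embeddings = {
    "orange_shoes": [1.0, 0.0, 0.0],
    "red_shoes": [0.9, 0.1, 0.0],
    "toaster": [0.0, 0.0, 1.0],
}
closest = most_similar("orange_shoes", embeddings)
```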
  38. Policy learning: example on Look & Feel optimization • Classical supervised machine learning approach: learn a pClick model and sort by predicted values for each possible value (e.g., each color). • This is a hard problem and may be overkill! • Really, we only want to know which color is the best according to some business metric (e.g., sales).
  39. Academic research @ Criteo • Our 1st public dataset is online: http://bit.ly/1vgw2XC • New 1TB dataset released last year. • Some recent publications: • O. Chapelle. Offline Evaluation of Response Prediction in Online Advertising Auctions. WWW’15. • D. Lefortier, A. Truchet, and M. de Rijke. Sources of Variability in Large-scale Machine Learning Systems. NIPS 2015 workshop on ML systems. • F. Vasile and D. Lefortier. Cost-sensitive Learning for Bidding in Online Advertising Auctions. NIPS workshop on ML for e-Commerce, 2015.
  40. Questions: d.lefortier@criteo.com
