
Criteo TektosData Meetup

This talk presents the machine learning stack at Criteo, how we deliver software at scale and a few lessons we've learned along the way.

Published in: Engineering

  1. Copyright © 2015 Criteo. The Criteo Experience. Olivier Koch, Engineering Program Manager, Criteo. TektosData Meetup “Data Meets Business”, May 31, 2016
  2. Outline
     • What does Criteo do?
     • Deep dive into our technical stack
     • Delivery at scale
     • A few lessons learned
  3. Banners… what else? (Advertiser, Publisher)
  4. Online advertising at scale: 3B displays / day, 40 PB of data, 15,000 servers worldwide
  5. Deep dive into Criteo
  6. Bidding: should we bid? At which price? • Recommendation: which products should we display? • Look & Feel: big image vs small image, background color, ... • Prediction: generic prediction engine; specific models trained on TBs of data
  7. Bidding: should we bid? At which price? • Recommendation: which products should we display? • Look & Feel: big image vs small image, background color, ... • Prediction: generic prediction engine; specific models trained on TBs of data
  8. We bid the expected value of the display for the client
     • As we sell performance, Criteo's and the client's interests are aligned, so the engine aims to maximize the value we generate for our clients.
     • As the cost of a display is lower than, and independent of, the bid (second-price auction or floor), we should always bid the maximum value that the client is willing to pay for a display.
     • Example: Value = 1€; CPM = 0,6€, 0,7€, 0,75€, 1,1€, 1,2€, 1,3€.
     • This bidding strategy is optimal: we are sure to buy all profitable displays, and only them.
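
The argument on slide 8 can be sketched in a few lines. This is a minimal illustration, not Criteo's implementation: it assumes a simplified second-price auction in which each CPM figure on the slide is the price we would pay if we win that display (the highest competing bid or the floor). The function names and that reading of the CPM figures are assumptions.

```python
def displays_bought(bid, clearing_prices):
    """Return the prices paid for the displays won in a simplified second-price auction.

    Assumption: we win a display when our bid is at least its clearing price,
    and we pay the clearing price rather than our own bid.
    """
    return [price for price in clearing_prices if bid >= price]


def profit(value, bid, clearing_prices):
    """Total profit: value of each display won minus the price paid for it."""
    return sum(value - price for price in displays_bought(bid, clearing_prices))


# Figures from the slide: each display is worth 1.00 to the client.
value = 1.00
prices = [0.60, 0.70, 0.75, 1.10, 1.20, 1.30]

truthful = profit(value, bid=value, clearing_prices=prices)  # wins 0.60, 0.70, 0.75 only
underbid = profit(value, bid=0.65, clearing_prices=prices)   # misses profitable displays
overbid = profit(value, bid=1.25, clearing_prices=prices)    # also wins loss-making displays
```

Under these assumptions, bidding the true value wins exactly the displays priced below it, while bidding lower forgoes profit and bidding higher buys displays at a loss.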
  9. This value depends on the predicted performance and the client's objective
     • Bid = CPC  pClick  pSale  AOV
     • Revenue that the display will generate for the client
     • Maximum share that the client is willing to pay
     • 2012: ensures constant value allocation between Criteo and its clients
     • 2013: CRO, the “Conversion Rate Optimizer”
     • 2014: COS Optimizer
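
Slide 9 names the terms of the bid but does not spell out how they combine. One possible reading, assumed here purely for illustration, is that a click-based bid multiplies the client's cost per click by the predicted click probability, and that a sales-based bid multiplies the maximum share the client is willing to pay by the expected revenue pClick x pSale x AOV. Both function names and the `max_share` parameter are hypothetical.

```python
def click_based_bid(cpc, p_click):
    """Assumed reading: value of a display when the client pays per click."""
    return cpc * p_click


def sales_based_bid(max_share, p_click, p_sale, aov):
    """Assumed reading: share of the expected revenue the display generates.

    max_share: maximum fraction of revenue the client is willing to pay
    p_click:   predicted probability of a click
    p_sale:    predicted probability of a sale given a click
    aov:       average order value
    """
    expected_revenue = p_click * p_sale * aov
    return max_share * expected_revenue


# Illustrative numbers only, not taken from the talk.
bid_per_click_model = click_based_bid(cpc=0.50, p_click=0.005)
bid_per_sale_model = sales_based_bid(max_share=0.10, p_click=0.005, p_sale=0.02, aov=100.0)
```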
  10. We train our prediction models on our historical displays
      • Historical displays: clicked displays; converted displays (size = order value)
      • Variables: level of engagement of the user; quality of inventory; user fatigue; for travel, time to check-in and number of nights
      • Machine learning algorithms
      • Our ability to predict relies greatly on the relevance of the variables we consider
  11. Bidding: should we bid? At which price? • Recommendation: which products should we display? • Look & Feel: big image vs small image, background color, ... • Prediction: generic prediction engine; specific models trained on TBs of data
  12. Recommend products for a user
      • What we want: reco(user) = products
      • 1B users x 3B products!
      • But we need to scale and keep it fresh
  13. We use collaborative filtering to select candidate products
      • Historical: user X saw orange shoes
      • Similar: products also seen by users who saw these same shoes
      • Best-of: the most viewed products on the client's site
      • Candidate products for user X are the historical, similar, and best-of products
  14. Products delivering the best performance are displayed
      • Variables: products seen by the user; time since product event; level of similarity; product features
      • Historical displays: clicked products; converted products (size = order value)
      • Machine learning algorithms
      • Products are selected based on their pClick x pSale x AOV
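
The selection step on slide 14 can be sketched as a ranking of candidate products by pClick x pSale x AOV. The `Candidate` class, the `select_products` function, the product identifiers, and all numbers below are hypothetical; in the real system the three quantities come from trained models.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    product_id: str
    p_click: float  # predicted probability of a click
    p_sale: float   # predicted probability of a sale given a click
    aov: float      # average order value

    @property
    def expected_value(self):
        return self.p_click * self.p_sale * self.aov


def select_products(candidates, k):
    """Keep the k candidates with the highest pClick x pSale x AOV."""
    return sorted(candidates, key=lambda c: c.expected_value, reverse=True)[:k]


# One made-up candidate per source named on slide 13: historical, similar, best-of.
candidates = [
    Candidate("white-socks", p_click=0.030, p_sale=0.06, aov=10.0),   # best-of
    Candidate("orange-shoes", p_click=0.020, p_sale=0.05, aov=80.0),  # historical
    Candidate("blue-shoes", p_click=0.015, p_sale=0.04, aov=90.0),    # similar
]
top_two = select_products(candidates, k=2)
```

Note that the most clickable product is not necessarily selected: the socks have the highest pClick here but the lowest expected value.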
  15. Bidding: should we bid? At which price? • Recommendation: which products should we display? • Look & Feel: big image vs small image, background color, ... • Prediction: generic prediction engine; specific models trained on TBs of data
  16. We train our prediction models on our historical displays
      • Historical displays (color = look & feel): clicked displays; converted displays (size = order value)
      • Variables we control: how the user interacts with the banner; organization of information; color set
      • Variables we don't control: zone format; publisher
      • Example banners: “My company”, “BUY!”
      • Machine learning algorithms
      • Look and feel is selected based on its pClick x pSale x AOV
  17. Bidding: should we bid? At which price? • Recommendation: which products should we display? • Look & Feel: big image vs small image, background color, ... • Prediction: generic prediction engine; specific models trained on TBs of data
  18. Optimizing for sales amount
      • Predict: 𝔼[SalesAmount] = ℙ(Click) × ℙ(Sale | Click) × 𝔼[SalesAmount | Sale], modeled as logistic × logistic × log-normal (all regularized!)
      • Each model is trained independently and refreshed as often as possible
      • Three sources of features: user, ad, page (mostly categorical)
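
The decomposition on slide 18 can be sketched as a product of three model outputs. The sketch assumes the log-normal model predicts the parameters `mu` and `sigma` of log(SalesAmount), in which case its mean is exp(mu + sigma²/2), a standard property of the log-normal distribution. The function names and parameterization are assumptions, not the actual interface of Criteo's models.

```python
import math


def lognormal_mean(mu, sigma):
    """Mean of a log-normal variable whose logarithm is Normal(mu, sigma^2)."""
    return math.exp(mu + 0.5 * sigma ** 2)


def expected_sales_amount(p_click, p_sale_given_click, mu, sigma):
    """E[SalesAmount] = P(Click) * P(Sale | Click) * E[SalesAmount | Sale]."""
    return p_click * p_sale_given_click * lognormal_mean(mu, sigma)


# Illustrative numbers only: a 0.5% click rate, a 2% conversion rate, and a
# basket whose log-amount is centered on log(100) with no spread.
example = expected_sales_amount(0.005, 0.02, mu=math.log(100.0), sigma=0.0)
```

Because the three factors are separate models, each one can be retrained and refreshed on its own schedule, as the slide notes.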
  19. Learn on huge volumes of data: 10 000 displays
  20. Learn on huge volumes of data: 10 000 displays lead to 50 clicks
  21. Learn on huge volumes of data: 10 000 displays lead to 50 clicks, which lead to 1 sale
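
The funnel on slides 19 to 21 implies the following rates, and shows why so much data is needed: positive examples for the sales model are rare. The `displays_needed` helper and the target of 1,000 sales are illustrative assumptions.

```python
import math

# Figures from the slides.
displays, clicks, sales = 10_000, 50, 1

click_rate = clicks / displays          # fraction of displays that are clicked
sale_rate_given_click = sales / clicks  # fraction of clicks that convert
sale_rate = sales / displays            # fraction of displays that lead to a sale


def displays_needed(target_sales, displays, sales):
    """Displays required to observe target_sales sales at the observed rate."""
    return math.ceil(target_sales * displays / sales)


# Hypothetical target: 1,000 observed sales to train on.
needed = displays_needed(1_000, displays, sales)
```

At these rates, observing a thousand sales takes on the order of ten million displays.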
  22. In-house machine learning library: IRMA
      • We have our own large-scale distributed machine learning library on top of Hadoop, used for all models.
      • From an ML perspective we rely, in most cases, on an L-BFGS solver initialized with SGD (see, e.g., A. Agarwal et al., “A Reliable Effective Terascale Linear Learning System”).
  23. Learning duration: trading time and volume
      • Longer ⇒ Volume ↑ vs. Shorter ⇒ Reactivity ↑
      • Chart: sales amount (€) by date, 11/01/2014 to 20/02/2014, with Valentine's Day eve marked
      • Chart: precision by learning duration, 12/02/2014 to 18/02/2014 and All
  24. Wait a minute: how do you handle TBs of training data?
      • Each model is trained on several TB of data and contains millions of features
      • We learn several hundred models, refreshed many times per day
      • How about large-scale distributed machine learning?
  25. Distribution of L-BFGS & SGD
      • Hadoop AllReduce
      • L-BFGS, being a batch algorithm, is easy to distribute (by distributing the computation of the gradient). SGD is more difficult to distribute: we use parameter averaging, which needs some tweaking (learning rate, number of epochs, …). Within SGD, we use Hogwild! for multi-threading.
      • ZooKeeper to ensure fault tolerance
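
The two distribution schemes on slide 25 can be sketched on a single machine. This is a toy illustration under stated assumptions, not IRMA: shards stand in for workers, `all_reduce_sum` stands in for Hadoop AllReduce, the model is unregularized logistic regression, and Hogwild! and ZooKeeper are not modeled. All names and the tiny dataset are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, features):
    return sigmoid(sum(w * x for w, x in zip(weights, features)))


def gradient(weights, shard):
    """Sum of logistic-loss gradients over one shard of (features, label) rows."""
    grad = [0.0] * len(weights)
    for features, label in shard:
        error = predict(weights, features) - label
        for j, x in enumerate(features):
            grad[j] += error * x
    return grad


def all_reduce_sum(vectors):
    """Stand-in for AllReduce: the element-wise sum every worker would receive."""
    return [sum(column) for column in zip(*vectors)]


def sgd_epoch(weights, shard, learning_rate):
    """One pass of plain SGD over a shard, starting from the given weights."""
    weights = list(weights)
    for features, label in shard:
        error = predict(weights, features) - label
        weights = [w - learning_rate * error * x for w, x in zip(weights, features)]
    return weights


def average_parameters(models):
    """Parameter averaging: element-wise mean of the per-worker weight vectors."""
    return [sum(column) / len(models) for column in zip(*models)]


# Toy data: (features with a leading bias term, label), split across two "workers".
data = [([1.0, 0.0], 1), ([1.0, 1.0], 0), ([1.0, 2.0], 0), ([1.0, 3.0], 0)]
shards = [data[:2], data[2:]]
w0 = [0.0, 0.0]

# Batch gradient (as used by L-BFGS): per-shard gradients sum to the full gradient.
full_gradient = gradient(w0, data)
distributed_gradient = all_reduce_sum([gradient(w0, shard) for shard in shards])

# SGD: each worker trains on its own shard, then the weights are averaged.
averaged_weights = average_parameters([sgd_epoch(w0, shard, 0.1) for shard in shards])
```

The batch gradient is a sum over examples, so splitting it across shards is exact. Parameter averaging is only an approximation of sequential SGD, which is one reason it needs tuning of the learning rate and the number of epochs.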
  26. A word on advanced techniques
      • IRMA is not only about vanilla logistic regression with L2 regularization; it also contains more advanced techniques: transfer learning, factorization machines, learning to rank, …
      • For example, we use cost-sensitive learning for bidding.
  27. Offline & online evaluation. Two steps:
      • Offline testing is fast, cheap, and efficient for wide exploration
      • Online testing is expensive but has the final word
      • The more data you have, the faster you can make decisions
  28. Physical infrastructure
      • 7 in-house data centers on 3 continents
      • ~15,000 servers, largest Hadoop cluster in Europe
      • More than 35 PB of storage
      Big Data Traffic
      • 800k HTTP requests / sec (peak activity)
      • 29,000 impressions / sec (peak activity)
      • <10 ms to process a bidding request
      • <100 ms to process a reco request
  29. Academic research @ Criteo
      • Our 1st public dataset is online: http://bit.ly/1vgw2XC
      • New 1TB dataset released last year
      • Recent publications:
        • O. Chapelle, “Offline evaluation of response prediction in online advertising auctions”, WWW'15.
        • D. Lefortier, A. Truchet, and M. de Rijke, “Sources of variability in large-scale machine learning systems”, NIPS workshop on ML systems, 2015.
        • F. Vasile and D. Lefortier, “Cost-sensitive learning for bidding in online advertising auctions”, NIPS workshop on ML for e-commerce, 2015.
  30. New areas of research
      • Counterfactual evaluation (offline A/B tests)
      • Product embeddings for recommendation
      • Policy learning
  31. Delivery at scale
  32. The early days of Criteo
      • Single C# repository
      • Build in 90 minutes
      • Weekly merges
  33. What could go wrong?
  34. [image only]
  35. Delivery at scale at Criteo
      • Trunk-based development (TBD)
      • Fast commits
      • Code reviews with Gerrit
      • The MOAB
      • Deploy with scp / BitTorrent
      • Automatic metrics checks
      ⇒ 200+ happy engineers!
  36. The Criteo MOAB
  37. Delivery at scale at Criteo
  38. A few lessons learned
  39. Start small
      • If you can't build it with a few machines, it's likely you won't be able to do it with many
      • Image: first Google computer
  40. Start small
      • Keep fancy algorithms for later
      • Image: the PageRank algorithm
  41. Iterate fast
      • Easy access to data (20PB vs 4GB of clean, carefully selected data)
      • Convenient technologies (e.g. Python & notebooks, scikit-learn)
      • Make IT a non-problem
      • Keep projects small (typical project size 3-9 months)
  42. Iterate fast
      • Easy access to data (20PB vs 4GB of clean, carefully selected data)
      • Convenient technologies (e.g. Python & notebooks, scikit-learn)
      • Make IT a non-problem
      • Keep projects small (typical project size 3-9 months)
      Talent magnet
  43. Keep teams small
      • 3 members: 3 channels
      • 4 members: 6 channels
      • 5 members: 10 channels
      • 10 members: 45 channels
      • …
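
The channel counts on slide 43 follow the number of distinct pairs in a team of n members, n(n-1)/2. A minimal sketch, with a hypothetical function name:

```python
def communication_channels(members):
    """Number of distinct pairs (one-to-one channels) in a team of this size."""
    return members * (members - 1) // 2


# Team sizes from the slide.
channels_by_team_size = {n: communication_channels(n) for n in (3, 4, 5, 10)}
```

The count grows quadratically, which is the case for keeping teams small: doubling a team from 5 to 10 members more than quadruples the channels.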
  44. Build the right team
      • Variety of skills
      • Software/ML engineers, ops/devops
      • Analysts/BI
      • Product
      • Designers
      • Managers
  45. Make the team agile
      • Use a flat, distributed hierarchy model and make people sit next to each other (EPM, ENG LEAD, PM, MGR)
  46. Make the team agile
      • Use the right tools: Slack, Jira, Confluence, Git, Gerrit, OKR
  47. Build the culture
      • Let ideas emerge bottom-up
      • Hackathons (for real)
      • 10% projects
      • Transparency: make info available to all
      • Use mature technologies
      • You will fail. That's OK!
  48. Take-aways
      • Start small
      • Iterate fast
      • Build the team
      • Make the team agile
      • Build the culture
  49. Thanks! Questions?
