
Recommendation for new users at Criteo

This presentation describes our latest work on recommendation for new users at Criteo.

Published in: Engineering

  1. Olivier Koch, Criteo. RecSys London Meetup, Nov 8th, 2018. Large-scale recommendation for new users
  2. Joint work with Ivan Lobov, Mohamed Amine Benhalloum, Dmitry Parfenchik, Alexandre Gillotte, Alois Bissuel, Vincent Grosbois, Sergei Lebedev, Flavian Vasile
  3. Outline: (1) Context; (2) Large-scale matrix factorization with randomized SVD; (3) Offline evaluation methods; (4) What's next?
  4. What / Who is Criteo again? Buy ad space on publishers’ websites. Build banners showing products that users will like / want to buy. Get paid if users click / buy the product.
  5. What / Who is Criteo again? 3 billion ads/day, 5 billion products, 100 ms
  6. Retargeting: ~ a few hours
  7. Acquisition? ~ a few days/weeks
  8. At scale: 2B users, 20K partners, ~1M products/partner, hundreds of possible campaigns per user. In 50 ms!
  9. The Acquisition pipeline: Campaign selection, Product selection (Recommendation), Bidding
  10. The Acquisition pipeline: Campaign selection, Product selection (Recommendation), Bidding
  11. The Acquisition pipeline: Campaign selection, Product selection (Recommendation), Bidding. The Recommendation problem
  12. Instead of letting a different model do the bidding/campaign selection, how about we do recommendation for all user-partner pairs? 200B recommendations, anyone?
  13. Large-scale MF with R-SVD
  14. Singular value decomposition: A = U S V^T, where A is m x n, U is m x m, S is m x n, and V^T is n x n
  15. The catch: m = n = hundreds of millions of items
  16. Randomized SVD. Trick: approximate A with a tall-and-tiny matrix Q
  17. Randomized SVD
  18. Randomized SVD: How do we find Q?
  19. Randomized SVD
  20. Randomized SVD
  21. Randomized SVD [plot: singular values]
  22. Randomized SVD. Reference: Nathan Halko, Per-Gunnar Martinsson, Joel A. Tropp, "Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions", SIAM Review, May 2011
  23. spark-rsvd: https://github.com/criteo/Spark-RSVD
  24. spark-rsvd (blog post): https://medium.com/@alois.bissuel/6695b649f519
  25. Point-wise mutual information
  26. Approximate nearest neighbors with Annoy: https://erikbern.com/2015/10/01/nearest-neighbors-and-vector-models-part-2-how-to-search-in-high-dimensional-spaces.html Credits: Erik Bernhardsson
  27. Putting it all together. Training: User timelines, CoEvent matrix, PMI matrix, R-SVD, Product vectors, KNN Indexing, KNN Indices. Inference: User timelines, User embedding, KNN Search, Recommendations
  28. Putting it all together: memcache, Recommendations, HDFS, All users x partners, RecoService, Campaign selection, users x ~50 partners
  29. Putting it all together: memcache, Recommendations, HDFS, All users x partners, RecoService, Campaign selection, users x ~50 partners. Simpler (« no model »). Evolutive (reco-based)
  30. Putting it all together: Offline pipeline runs at scale in 5-10 hours with 100 Spark executors on ~300M timelines. Spark, Scala, Python. Scheduled every day. The best is the enemy of the good (good enough for an AB test).
  31. Good vs Best trade-off: (a) not scalable, not prod-grade, a few weeks; (b) scalable, prod-grade, many months; (c) scalable, not-quite-prod-grade, several months
  32. Offline evaluation
  33. Baselines: Global best-of (per partner); Mixture of « sources » (best-of-by-X) merged into a pClick model
  34. Offline metrics: Precision @ k over pairs of partners (train / validation)
  35. Qualitative evaluation
  36. Qualitative evaluation
  37. Qualitative evaluation
  38. Qualitative evaluation
  39. What’s next?
  40. Fusing CF and metadata (content2vec). Deeper representations of users and products (graph convolutions, recurrent neural nets). Train at scale with TF.
  41. tf-yarn: train TensorFlow models on YARN in just a few lines of code! https://github.com/criteo/tf-yarn
  42. Summary: Acquisition provides new challenges for Recommendation algorithms. MF (via R-SVD) is an attractive approach to try. We built a pipeline leveraging R-SVD and KNN at scale (~300M users, hundreds of partners) with promising offline results. Qualitative evaluation matters (on top of the quantitative one). There are many things coming up next!
  43. Thank you! o.koch@criteo.com ailab.criteo.com
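
The slides on randomized SVD, point-wise mutual information, and nearest neighbors name the training steps without showing them. The following is a minimal NumPy sketch of those steps on a small dense matrix, following the two-stage scheme of Halko, Martinsson and Tropp. It is not the Spark-RSVD implementation. The function names, the oversampling and power-iteration defaults, the choice to give unseen pairs a PMI of 0, and the brute-force cosine search standing in for Annoy are all assumptions made for illustration.

```python
import numpy as np


def pmi_matrix(counts):
    """Point-wise mutual information of a co-event count matrix."""
    total = counts.sum()
    p_ij = counts / total
    p_i = counts.sum(axis=1, keepdims=True) / total
    p_j = counts.sum(axis=0, keepdims=True) / total
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(p_ij / (p_i * p_j))
    # Assumption: pairs never observed together get 0 instead of -inf.
    pmi[~np.isfinite(pmi)] = 0.0
    return pmi


def randomized_svd(a, rank, oversample=10, n_iter=2, seed=0):
    """Approximate the top-`rank` SVD of `a` via a random range finder."""
    rng = np.random.default_rng(seed)
    k = min(rank + oversample, min(a.shape))
    # Stage A: find an orthonormal Q whose columns span most of range(A).
    q, _ = np.linalg.qr(a @ rng.standard_normal((a.shape[1], k)))
    for _ in range(n_iter):
        # Power iterations help when the singular values decay slowly.
        q, _ = np.linalg.qr(a.T @ q)
        q, _ = np.linalg.qr(a @ q)
    # Stage B: exact SVD of the small k x n matrix B = Q^T A.
    u_small, s, vt = np.linalg.svd(q.T @ a, full_matrices=False)
    return (q @ u_small)[:, :rank], s[:rank], vt[:rank]


def nearest_products(vectors, query, k):
    """Indices of the k rows of `vectors` closest to `query` in cosine similarity."""
    norms = np.maximum(np.linalg.norm(vectors, axis=1, keepdims=True), 1e-12)
    scores = (vectors / norms) @ (query / max(np.linalg.norm(query), 1e-12))
    return np.argsort(-scores)[:k]
```

One possible way to chain them, again as an assumption: compute `u, s, vt = randomized_svd(pmi_matrix(counts), rank)`, take `u * np.sqrt(s)` as product vectors, average the vectors of the products in a user's timeline to get a user embedding, and pass that embedding to `nearest_products`. At the scale quoted in the slides the matrix is sparse and distributed, and the exact search would be replaced by an approximate index.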

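
The offline metric named in the evaluation slides is Precision @ k. Below is a small sketch of the standard definition, averaged over users. The slides do not say how the pairs of partners or the train / validation split are built, so the dictionaries of recommendations and held-out items used here are hypothetical inputs.

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items that appear in `relevant`."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = list(recommended)[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / k


def mean_precision_at_k(recs_by_user, relevant_by_user, k):
    """Average precision@k over all users that received recommendations."""
    scores = [
        precision_at_k(recs, relevant_by_user.get(user, set()), k)
        for user, recs in recs_by_user.items()
    ]
    return sum(scores) / len(scores) if scores else 0.0
```

For example, with recommendations `["a", "b", "c", "d"]`, held-out items `{"a", "c", "x"}`, and k = 3, two of the top three recommendations are relevant, so the precision is 2/3.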