1) The document discusses Criteo's use of large-scale matrix factorization with randomized SVD and approximate nearest neighbors to provide recommendations for new users at an enormous scale of 200 billion recommendations across hundreds of millions of users and partners.
2) Criteo built a pipeline that turns user timelines into a co-event matrix, applies pointwise mutual information (PMI) and randomized SVD to learn product and user embeddings, and uses KNN indexing to serve recommendations from pre-computed indices.
3) Offline evaluation of the recommendations compared to baseline approaches showed promising results, and qualitative evaluations also provided positive feedback, though there remain opportunities for deeper modeling and training techniques at larger scales.
2. 2 •
Joint work with Ivan Lobov, Mohamed Amine Benhalloum, Dmitry Parfenchik, Alexandre Gillotte, Alois Bissuel, Vincent Grosbois, Sergei Lebedev, Flavian Vasile
4. 4 •
What / Who is Criteo again?
Buy ad space on publishers’ websites.
Build banners showing products that users will like / want to buy.
Get paid if users click / buy the product.
5. 5 •
What / Who is Criteo again?
3 billion ads/day
5 billion products
100 ms
11. 11 •
The Acquisition pipeline
Campaign selection
Product selection
(Recommendation)
Bidding
The Recommendation problem
12. 12 •
Instead of letting a different model do the bidding/campaign selection, how about we do recommendation for all user-partner pairs?
200B recommendations anyone?
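As a rough sanity check on that figure, the sketch below assumes the ~300M users quoted on the summary slide and one set of recommendations per user-partner pair. Under those assumptions the implied number of partners is roughly 670, which agrees with "hundreds of partners".

```python
# Back-of-the-envelope check; both inputs are approximate figures from the slides.
total_pairs = 200_000_000_000   # "200B recommendations"
users = 300_000_000             # "~300M users" (summary slide)

# Assumption: every user is paired with every partner exactly once.
implied_partners = total_pairs / users
print(f"implied partners per user: {implied_partners:.0f}")
```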
26. 26 •
Approximate nearest neighbors with Annoy
https://erikbern.com/2015/10/01/nearest-neighbors-and-vector-models-part-2-how-to-search-in-high-dimensional-spaces.html
Credits: Erik Bernhardsson
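The idea behind this kind of index can be sketched in plain Python: recursively split the points with hyperplanes placed halfway between two randomly chosen points, build several such trees, then rank only the points found in the query's leaves. This is a simplified illustration, not Annoy's implementation. All function names are hypothetical, and the query follows a single path per tree, whereas Annoy also explores branches that narrowly miss.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_tree(ids, vectors, leaf_size, rng):
    """Recursively split ids with a hyperplane equidistant from two random points."""
    if len(ids) <= leaf_size:
        return ("leaf", ids)
    a, b = (vectors[i] for i in rng.sample(ids, 2))
    normal = [x - y for x, y in zip(a, b)]
    midpoint = [(x + y) / 2 for x, y in zip(a, b)]
    offset = dot(normal, midpoint)
    left = [i for i in ids if dot(normal, vectors[i]) < offset]
    right = [i for i in ids if dot(normal, vectors[i]) >= offset]
    if not left or not right:  # degenerate split, e.g. duplicate points
        return ("leaf", ids)
    return ("node", normal, offset,
            build_tree(left, vectors, leaf_size, rng),
            build_tree(right, vectors, leaf_size, rng))

def candidates(tree, query):
    """Follow one root-to-leaf path and return the ids stored in that leaf."""
    while tree[0] == "node":
        _, normal, offset, left, right = tree
        tree = left if dot(normal, query) < offset else right
    return tree[1]

def query_forest(forest, vectors, query, k):
    """Pool the leaf candidates of every tree, then rank them by exact distance."""
    pool = set()
    for tree in forest:
        pool.update(candidates(tree, query))
    return sorted(pool, key=lambda i: sq_dist(vectors[i], query))[:k]

# Toy data: 500 random 8-dimensional vectors, 10 trees.
rng = random.Random(0)
vectors = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(500)]
all_ids = list(range(len(vectors)))
forest = [build_tree(all_ids, vectors, leaf_size=20, rng=rng) for _ in range(10)]

approx = query_forest(forest, vectors, vectors[42], k=5)
exact = sorted(all_ids, key=lambda i: sq_dist(vectors[i], vectors[42]))[:5]
print("approximate:", approx)
print("exact:      ", exact)
```

More trees raise recall at the cost of memory and query time, which is the main tuning knob in this family of indices.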
27. 27 •
Putting it all together
Training: User timelines -> CoEvent matrix -> PMI matrix -> R-SVD -> Product vectors -> KNN Indexing -> KNN Indices
Inference: User timelines -> User embedding -> KNN Search -> Recommendations
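The training and inference steps named on this slide can be sketched end to end with NumPy on toy data. This is a minimal sketch under several assumptions that the slides do not confirm: a co-event is counted when two products appear in the same timeline, the PMI matrix is clipped at zero (positive PMI), product vectors are the left singular vectors scaled by the square root of the singular values, a user embedding is the mean of the vectors of the products in the user's timeline, and a brute-force dot-product search stands in for the KNN index. All function names are hypothetical.

```python
from itertools import combinations

import numpy as np

def coevent_matrix(timelines, n_products):
    """Count how often two products appear in the same timeline (assumed definition)."""
    counts = np.zeros((n_products, n_products))
    for timeline in timelines:
        for i, j in combinations(sorted(set(timeline)), 2):
            counts[i, j] += 1
            counts[j, i] += 1
    return counts

def ppmi(counts):
    """Positive PMI: max(0, log(p(i, j) / (p(i) * p(j))))."""
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)
    col = counts.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(counts * total / (row * col))
    pmi[~np.isfinite(pmi)] = 0.0  # zero counts give log(0); treat as no association
    return np.maximum(pmi, 0.0)

def randomized_svd(matrix, rank, oversample=5, n_iter=2, seed=0):
    """Randomized SVD: random range finder followed by a small exact SVD."""
    rng = np.random.default_rng(seed)
    sketch = matrix @ rng.standard_normal((matrix.shape[1], rank + oversample))
    for _ in range(n_iter):  # power iterations sharpen the captured range
        sketch = matrix @ (matrix.T @ sketch)
    q, _ = np.linalg.qr(sketch)
    u_small, s, vt = np.linalg.svd(q.T @ matrix, full_matrices=False)
    return (q @ u_small)[:, :rank], s[:rank], vt[:rank]

# Training on toy timelines: products 0-2 co-occur, and so do products 3-5.
timelines = [[0, 1, 2], [0, 1], [1, 2], [3, 4, 5], [3, 4], [4, 5]]
counts = coevent_matrix(timelines, n_products=6)
u, s, _ = randomized_svd(ppmi(counts), rank=2)
product_vectors = u * np.sqrt(s)

def recommend(timeline, k=1):
    """Inference: embed the user, then rank unseen products by dot product."""
    user = product_vectors[timeline].mean(axis=0)
    scores = product_vectors @ user
    scores[timeline] = -np.inf  # do not recommend products already seen
    return np.argsort(-scores)[:k].tolist()

print(recommend([0, 1]))
print(recommend([3, 4]))
```

In a large-scale setting the brute-force scoring in `recommend` would be replaced by a lookup in the approximate nearest-neighbor index built over the product vectors.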
28. 28 •
Putting it all together
Recommendations
memcache
HDFS
All users x partners
RecoService
Campaign selection
users x ~50 partners
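One possible reading of this diagram is that recommendations are precomputed offline, loaded into a key-value cache, and looked up by the serving layer for the partners under consideration. The sketch below illustrates only that lookup pattern. A plain dict stands in for memcache, and the key layout, function names, and row format are all hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical stand-in for memcache: (user_id, partner_id) -> precomputed product ids.
Cache = Dict[Tuple[str, str], List[int]]

def load_recommendations(rows: Iterable[Tuple[str, str, List[int]]], cache: Cache) -> None:
    """Load offline output rows (user_id, partner_id, product_ids) into the cache."""
    for user_id, partner_id, product_ids in rows:
        cache[(user_id, partner_id)] = list(product_ids)

def get_recommendations(cache: Cache, user_id: str, partner_ids: Iterable[str]) -> Dict[str, List[int]]:
    """Return precomputed recommendations for the partners that have any for this user."""
    found = {}
    for partner_id in partner_ids:
        recs = cache.get((user_id, partner_id))
        if recs:
            found[partner_id] = recs
    return found

# Toy offline output; in the deck this would come from the daily pipeline.
offline_rows = [("u1", "p1", [10, 11]), ("u1", "p2", [20]), ("u2", "p1", [12])]
cache: Cache = {}
load_recommendations(offline_rows, cache)

result = get_recommendations(cache, "u1", ["p1", "p3"])
print(result)
```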
29. 29 •
Putting it all together
Recommendations
memcache
HDFS
All users x partners
RecoService
Campaign selection
users x ~50 partners
Simpler ("no model")
Evolvable (reco-based)
30. 30 •
Putting it all together
Offline pipeline runs at scale in 5-10 hours with 100 Spark executors on ~300M timelines
Spark, Scala, Python
Scheduled every day
The best is the enemy of the good (good enough for an A/B test)
31. 31 •
Good vs Best trade-off
Not scalable, not prod-grade: a few weeks
Scalable, prod-grade: many months
Scalable, not-quite-prod-grade: several months
40. 40 •
Fusing CF and metadata (content2vec)
Deeper representations of users and products (graph convolutions, recurrent neural nets)
Train at scale with TF
41. 41 •
tf-yarn: train TensorFlow models on YARN in just a few lines of code!
https://github.com/criteo/tf-yarn
42. 42 •
Summary
Acquisition provides new challenges for Recommendation algorithms
MF (via R-SVD) is an attractive approach to try
We built a pipeline leveraging R-SVD and KNN at scale (~300M users, hundreds of partners) with promising offline results
Qualitative evaluation matters (on top of the quantitative one)
There are many things coming up next!