Olivier Koch, Criteo
RecSys London Meetup - Nov 8th, 2018
Large-scale
recommendation
for new users
2 •
Joint work with Ivan Lobov, Mohamed Amine
Benhalloum, Dmitry Parfenchik, Alexandre Gillotte, Alois
Bissuel, Vincent Grosbois, Sergei Lebedev, Flavian Vasile
3 •
1. Context
2. Large-scale matrix factorization with randomized SVD
3. Offline evaluation methods
4. What's next?
Outline
4 •
Buy ad space on publishers’ websites.
Build banners showing products that users will like / want to buy.
Get paid if users click / buy the product.
What / Who is Criteo again?
5 •
What / Who is Criteo again?
3 billion ads/day
5 billion products
100 ms
6 •
Retargeting
~ a few hours
7 •
Acquisition
?
~ a few days/weeks
8 •
2B users
20K partners
~1M products/partner
Hundreds of possible campaigns per user
In 50 ms!
At scale
9 •
The Acquisition pipeline
Campaign selection
Product selection
(Recommendation)
Bidding
11 •
The Acquisition pipeline
Campaign selection
Product selection
(Recommendation)
Bidding
The Recommendation problem
12 •
Instead of letting a different model do the
bidding/campaign selection, how about we do
recommendation for all user-partner pairs?
200B recommendations, anyone?
Large-scale MF
with R-SVD
14 •
Singular value decomposition
A = U S V^T
(m x n) = (m x m) (m x n) (n x n)
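The factorization on this slide can be checked numerically on a toy matrix. The sketch below is a minimal illustration with NumPy, assuming a small dense 4 x 3 matrix as a stand-in for A; the sizes and the random seed are arbitrary choices, not values from the talk.

```python
import numpy as np

# Small dense stand-in for the m x n matrix A (here m = 4, n = 3).
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))

# full_matrices=True returns the square factors shown on the slide:
# U is m x m, Vt is n x n, and s holds the min(m, n) singular values.
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# Place the singular values on the diagonal of an m x n matrix S.
S = np.zeros(A.shape)
S[: len(s), : len(s)] = np.diag(s)

# Multiplying the three factors back together recovers A.
A_rebuilt = U @ S @ Vt
print(U.shape, S.shape, Vt.shape)
print(np.allclose(A, A_rebuilt))
```

With m and n in the hundreds of millions, the square factors alone would not fit in memory, which is what motivates an approximate method.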
15 •
The catch
m = n = hundreds of millions of items
16 •
Randomized SVD
Trick: Approximate A with a tall-and-tiny matrix Q
17 •
Randomized SVD
18 •
Randomized SVD
How do we find Q?
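One common answer, following the randomized range finder described by Halko, Martinsson and Tropp, is to multiply A by a random Gaussian matrix and orthonormalize the result. The sketch below is a single-machine NumPy illustration under that assumption; it is not the Spark-RSVD API, and the function name and the parameters `oversample` and `power_iters` are hypothetical choices made for this example.

```python
import numpy as np

def randomized_svd(A, rank, oversample=10, power_iters=2, seed=0):
    """Approximate rank-`rank` SVD of A via a random sketch of its range."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    k = min(rank + oversample, n)

    # 1. Sketch the range of A with a Gaussian test matrix (n x k).
    omega = rng.standard_normal((n, k))
    Q, _ = np.linalg.qr(A @ omega)  # Q: tall and thin, orthonormal columns

    # 2. Optional power iterations improve Q when singular values decay slowly.
    for _ in range(power_iters):
        Z, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Z)

    # 3. Project A onto the small subspace and run an exact SVD there.
    B = Q.T @ A  # at most k x n, cheap to decompose
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ Ub
    return U[:, :rank], s[:rank], Vt[:rank]

# Toy check on an exactly rank-5 matrix.
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 100))
U, s, Vt = randomized_svd(A, rank=5)
rel_err = np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A)
print(rel_err)
```

The expensive operations are products of A with thin matrices, which is the part that distributes well over a cluster.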
19 •
Randomized SVD
20 •
Randomized SVD
21 •
Randomized SVD
[Figure: plot of the singular values]
22 •
Finding structure with randomness: Probabilistic algorithms for constructing
approximate matrix decompositions, Nathan Halko, Per-Gunnar Martinsson, Joel A.
Tropp, SIAM Review, May 2011
Randomized SVD
23 •
spark-rsvd
https://github.com/criteo/Spark-RSVD
24 •
spark-rsvd (blog post)
https://medium.com/@alois.bissuel/6695b649f519
25 •
Point-wise mutual information
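For reference, PMI(i, j) = log( p(i, j) / (p(i) p(j)) ), where the probabilities are estimated from co-occurrence counts. The sketch below is a minimal NumPy illustration on a toy count matrix. Two choices in it are assumptions of this example rather than details from the slides: zero counts are mapped to a PMI of 0, and negative values are clipped (the "positive PMI" variant) by default.

```python
import numpy as np

def pmi_matrix(counts, positive=True):
    """Pointwise mutual information from a matrix of co-occurrence counts."""
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)
    col = counts.sum(axis=0, keepdims=True)

    # Counts expected if rows and columns were independent.
    expected = row @ col / total

    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(counts / expected)

    # Convention of this sketch: pairs never seen together get PMI 0.
    pmi[~np.isfinite(pmi)] = 0.0
    if positive:
        pmi = np.maximum(pmi, 0.0)
    return pmi

# Toy symmetric co-occurrence counts for three products.
counts = np.array([[0, 4, 1],
                   [4, 0, 1],
                   [1, 1, 0]])
pmi = pmi_matrix(counts)
print(pmi)
```

Compared with raw counts, PMI down-weights pairs that co-occur often only because both items are popular.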
26 •
Approximate nearest neighbors with Annoy
https://erikbern.com/2015/10/01/nearest-neighbors-and-vector-models-part-2-how-to-search-in-high-dimensional-spaces.html
Credits: Erik Bernhardsson
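The idea described in the linked post is to split the vector space recursively with random hyperplanes, so that a query only has to be compared with the points in its own leaf. The sketch below is a simplified single-tree illustration in NumPy, not the Annoy API; Annoy itself builds a forest of such trees and searches several leaves. The function names and `leaf_size` are hypothetical.

```python
import numpy as np

def build_tree(points, ids, rng, leaf_size=8):
    """Recursively split the points with random hyperplanes."""
    if len(ids) <= leaf_size:
        return {"leaf": ids}
    # Pick two distinct points; the hyperplane is equidistant from both.
    a, b = points[rng.choice(ids, size=2, replace=False)]
    normal = a - b
    offset = normal @ (a + b) / 2.0
    side = points[ids] @ normal > offset
    left, right = ids[side], ids[~side]
    if len(left) == 0 or len(right) == 0:  # degenerate split, e.g. duplicates
        return {"leaf": ids}
    return {
        "normal": normal,
        "offset": offset,
        "left": build_tree(points, left, rng, leaf_size),
        "right": build_tree(points, right, rng, leaf_size),
    }

def query(tree, points, q, k=3):
    """Descend to one leaf, then rank its candidates by exact distance."""
    node = tree
    while "leaf" not in node:
        go_left = q @ node["normal"] > node["offset"]
        node = node["left"] if go_left else node["right"]
    candidates = node["leaf"]
    dist = np.linalg.norm(points[candidates] - q, axis=1)
    return candidates[np.argsort(dist)[:k]]

rng = np.random.default_rng(0)
points = rng.standard_normal((500, 16))
tree = build_tree(points, np.arange(500), rng)
neighbors = query(tree, points, points[42], k=3)
print(neighbors)
```

A single tree can miss true neighbors that fall on the other side of a split; using many trees and merging their candidates trades some speed for better recall.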
27 •
Putting it all together
training: User timelines -> CoEvent matrix -> PMI matrix -> R-SVD -> KNN indexing -> KNN indices
inference: User timelines -> User embedding -> KNN search (over product vectors) -> Recommendations
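The first training step, going from user timelines to a co-event matrix, can be sketched in plain Python. The slides do not define a co-event precisely, so the example assumes that two products co-occur when they appear in the same user timeline; a production definition might use a time window instead. The function name and the toy timelines are hypothetical.

```python
from collections import Counter
from itertools import combinations

def co_event_counts(timelines):
    """Count, for each product pair, the timelines containing both products."""
    counts = Counter()
    for events in timelines:
        # Deduplicate so a pair is counted at most once per timeline.
        products = sorted(set(events))
        for a, b in combinations(products, 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1  # keep the matrix symmetric
    return counts

timelines = [
    ["shoes", "socks", "hat"],
    ["shoes", "socks"],
    ["hat", "scarf"],
]
counts = co_event_counts(timelines)
print(counts[("shoes", "socks")])
```

The resulting sparse counts are the input that gets reweighted with PMI before factorization.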
28 •
Putting it all together
memcache
Recommendations
HDFS
All users x partners
RecoService
Campaign
selection
users x ~50 partners
29 •
Putting it all together
memcache
Recommendations
HDFS
All users x partners
RecoService
Campaign
selection
users x ~50 partners
Simpler
(« no model »)
Evolutive
(reco-based)
30 •
Offline pipeline runs at scale in 5-10 hours with 100 Spark
executors on ~300M timelines
Spark, Scala, Python
Scheduled every day
The best is the enemy of the good (good enough for an AB test)
Putting it all together
31 •
Good vs Best trade-off
Not scalable, not prod-grade: a few weeks
Scalable, prod-grade: many months
Scalable, not-quite-prod-grade: several months
Offline
evaluation
33 •
• Global best-of (per partner)
• Mixture of « sources » (best-of-by-X) merged into a pClick
model
Baselines
34 •
Precision @ k over pairs of partners
Offline metrics
train validation
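Precision @ k is the fraction of the top k recommended items that the user actually interacted with in the validation data. The sketch below shows only the metric itself; how the train and validation sets are built over pairs of partners is not detailed in the slides, so the example lists are purely illustrative.

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items found in the relevant set."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / k

# Ranked recommendations and held-out items for one hypothetical user.
recommended = ["a", "b", "c", "d"]
relevant = {"b", "d", "e"}

p3 = precision_at_k(recommended, relevant, k=3)
p4 = precision_at_k(recommended, relevant, k=4)
print(p3, p4)
```

Averaging this value over all validation users gives a single number for comparing a model against the baselines.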
35 •
Qualitative evaluation
36 •
Qualitative evaluation
37 •
Qualitative evaluation
38 •
Qualitative evaluation
What’s next?
40 •
Fusing CF and metadata (content2vec)
Deeper representations of users and products (graph
convolutions, recurrent neural nets)
Train at scale with TF
41 •
tf-yarn: train TensorFlow models on YARN in just a few lines of code!
https://github.com/criteo/tf-yarn
42 •
Acquisition provides new challenges for Recommendation algorithms
MF (via R-SVD) is an attractive approach to try
We built a pipeline leveraging R-SVD and KNN at scale (~300M users, hundreds of
partners) with promising offline results
Qualitative evaluation matters (on top of the quantitative one)
There are many things coming up next!
Summary
43 •
Thank you!
o.koch@criteo.com
ailab.criteo.com
