SlideShare a Scribd company logo
Copyright © 2015 Criteo
Recommendation at scale
Simon Dollé
Berlin Buzzwords, June 2nd, 2015
Copyright © 2015 Criteo
2
Copyright © 2015 Criteo
We buy
• Inventory ! (ad spaces)
• Billions of times a day
• All over the Internet
• For 95% of the population
Where is the need for tech ?
We sell
• Clicks !
• (that convert)
• (that convert a lot)
We take the risk
You pay only for what you get
Copyright © 2015 Criteo
4
Physical infrastructure
6 Data centers on 3 continents operated and conceived in-house
~ 12000 servers, largest Hadoop cluster in Europe
More than 35 PB of storage Big Data
Traffic
800 k HTTP requests / sec (peak activity)
29000 impressions /sec (peak activity)
Less than 10 ms to process an RTB request
Copyright © 2015 Criteo
Data Sources
Ad display dataUser behavior dataCatalog data
Copyright © 2015 Criteo
Copyright © 2015 Criteo
Copyright © 2015 Criteo
Copyright © 2015 Criteo
Copyright © 2015 Criteo
Copyright © 2015 Criteo
Copyright © 2015 Criteo
Copyright © 2015 Criteo
Offline
• Similarities computed on browsing data
• Based on coevents
• Computed on Hadoop cluster
• Map reduce jobs, pig
• Takes around 12 hours
• Tradeoff performance, speed
Copyright © 2015 Criteo
Copyright © 2015 Criteo
0.04 0.02 0.02 0.05
Copyright © 2015 Criteo
0.04 0.020.020.05
Copyright © 2015 Criteo
0.04 0.020.020.05
Copyright © 2015 Criteo
ML model
• Logistic regression models
• Features
Product-specific User-specific User-product interactions Display-specific
Copyright © 2015 Criteo
Online optimizations
• Algorithmic
• Use simpler ML model
• Quickly discard candidates
• Technical
• Memcache + local cache
• Async I/O
Copyright © 2015 Criteo
Upcoming challenges
• Long(er)-term user profiles
• More and better product information (images, semantic, NLP)
• Instant-update of similarities
• (because batch computation is soooo last year)
Copyright © 2015 Criteo
Fancy a try ?
On your own:
With us !
http://labs.criteo.com/jobs/
• Our 1st public dataset is online: http://bit.ly/1vgw2XC
• 4GB display and click data, Kaggle challenge in 2014
• NEW : 1TB dataset released a few weeks ago: http://bit.ly/1PyH4Vq
• Hosted on Microsoft Azure, just waiting for you
Copyright © 2015 Criteo
Questions?
Copyright © 2015 Criteo
Thank you !
s.dolle@criteo.com

More Related Content

What's hot

Chemistry Data Basics with KNIME Analytics Platform
Chemistry Data Basics with KNIME Analytics PlatformChemistry Data Basics with KNIME Analytics Platform
Chemistry Data Basics with KNIME Analytics Platform
KNIMESlides
 
How to Choose a Data Warehouse
How to Choose a Data WarehouseHow to Choose a Data Warehouse
How to Choose a Data Warehouse
Matillion
 
Cloud Developer Days - BigQuery
Cloud Developer Days - BigQueryCloud Developer Days - BigQuery
Cloud Developer Days - BigQuery
Wlodek Bielski
 
Google Bigtable
Google BigtableGoogle Bigtable
Google Bigtable
GirdhareeSaran
 
Google BigQuery Best Practices
Google BigQuery Best PracticesGoogle BigQuery Best Practices
Google BigQuery Best Practices
Matillion
 
Dive Into Data Lakes
Dive Into Data LakesDive Into Data Lakes
Dive Into Data Lakes
Matillion
 
An overview of BigQuery
An overview of BigQuery An overview of BigQuery
An overview of BigQuery
GirdhareeSaran
 
DevTest Labs en Azure (por Iván Cañizares)
DevTest Labs en Azure (por Iván Cañizares)DevTest Labs en Azure (por Iván Cañizares)
DevTest Labs en Azure (por Iván Cañizares)
Jorge Millán Cabrera
 
D-Conf 2013 - Cloud Computing - XaaS
D-Conf 2013  - Cloud Computing - XaaSD-Conf 2013  - Cloud Computing - XaaS
D-Conf 2013 - Cloud Computing - XaaS
Astella Investimentos
 
Un orquestador en la nube: Azure Data Factory (por Carlos Sacristán)
Un orquestador en la nube: Azure Data Factory (por Carlos Sacristán)Un orquestador en la nube: Azure Data Factory (por Carlos Sacristán)
Un orquestador en la nube: Azure Data Factory (por Carlos Sacristán)
Jorge Millán Cabrera
 
Quick Intro to Google Cloud Technologies
Quick Intro to Google Cloud TechnologiesQuick Intro to Google Cloud Technologies
Quick Intro to Google Cloud Technologies
Chris Schalk
 
Building Dynamic Pipelines in Azure Data Factory (SQLSaturday Oslo)
Building Dynamic Pipelines in Azure Data Factory (SQLSaturday Oslo)Building Dynamic Pipelines in Azure Data Factory (SQLSaturday Oslo)
Building Dynamic Pipelines in Azure Data Factory (SQLSaturday Oslo)
Cathrine Wilhelmsen
 
2016 09 measurecamp - event data modeling
2016 09 measurecamp - event data modeling2016 09 measurecamp - event data modeling
2016 09 measurecamp - event data modeling
yalisassoon
 
How we use Hive at SnowPlow, and how the role of HIve is changing
How we use Hive at SnowPlow, and how the role of HIve is changingHow we use Hive at SnowPlow, and how the role of HIve is changing
How we use Hive at SnowPlow, and how the role of HIve is changing
yalisassoon
 
Analytics at Carbonite: presentation to Snowplow Meetup Boston April 2016
Analytics at Carbonite: presentation to Snowplow Meetup Boston April 2016Analytics at Carbonite: presentation to Snowplow Meetup Boston April 2016
Analytics at Carbonite: presentation to Snowplow Meetup Boston April 2016
yalisassoon
 
TripleLift: Preparing for a New Programmatic Ad-Tech World
TripleLift: Preparing for a New Programmatic Ad-Tech WorldTripleLift: Preparing for a New Programmatic Ad-Tech World
TripleLift: Preparing for a New Programmatic Ad-Tech World
VoltDB
 
Big data == lean data
Big data == lean dataBig data == lean data
Big data == lean data
Lars Albertsson
 
Boston DevOps December Social
Boston DevOps December SocialBoston DevOps December Social
Boston DevOps December Social
David Fredricks
 
Pyramid Analytics vs Sisense
Pyramid Analytics vs SisensePyramid Analytics vs Sisense
Pyramid Analytics vs Sisense
Pyramid Analytics
 
SnapLogic Live: Big Data Integration
SnapLogic Live: Big Data IntegrationSnapLogic Live: Big Data Integration
SnapLogic Live: Big Data Integration
SnapLogic
 

What's hot (20)

Chemistry Data Basics with KNIME Analytics Platform
Chemistry Data Basics with KNIME Analytics PlatformChemistry Data Basics with KNIME Analytics Platform
Chemistry Data Basics with KNIME Analytics Platform
 
How to Choose a Data Warehouse
How to Choose a Data WarehouseHow to Choose a Data Warehouse
How to Choose a Data Warehouse
 
Cloud Developer Days - BigQuery
Cloud Developer Days - BigQueryCloud Developer Days - BigQuery
Cloud Developer Days - BigQuery
 
Google Bigtable
Google BigtableGoogle Bigtable
Google Bigtable
 
Google BigQuery Best Practices
Google BigQuery Best PracticesGoogle BigQuery Best Practices
Google BigQuery Best Practices
 
Dive Into Data Lakes
Dive Into Data LakesDive Into Data Lakes
Dive Into Data Lakes
 
An overview of BigQuery
An overview of BigQuery An overview of BigQuery
An overview of BigQuery
 
DevTest Labs en Azure (por Iván Cañizares)
DevTest Labs en Azure (por Iván Cañizares)DevTest Labs en Azure (por Iván Cañizares)
DevTest Labs en Azure (por Iván Cañizares)
 
D-Conf 2013 - Cloud Computing - XaaS
D-Conf 2013  - Cloud Computing - XaaSD-Conf 2013  - Cloud Computing - XaaS
D-Conf 2013 - Cloud Computing - XaaS
 
Un orquestador en la nube: Azure Data Factory (por Carlos Sacristán)
Un orquestador en la nube: Azure Data Factory (por Carlos Sacristán)Un orquestador en la nube: Azure Data Factory (por Carlos Sacristán)
Un orquestador en la nube: Azure Data Factory (por Carlos Sacristán)
 
Quick Intro to Google Cloud Technologies
Quick Intro to Google Cloud TechnologiesQuick Intro to Google Cloud Technologies
Quick Intro to Google Cloud Technologies
 
Building Dynamic Pipelines in Azure Data Factory (SQLSaturday Oslo)
Building Dynamic Pipelines in Azure Data Factory (SQLSaturday Oslo)Building Dynamic Pipelines in Azure Data Factory (SQLSaturday Oslo)
Building Dynamic Pipelines in Azure Data Factory (SQLSaturday Oslo)
 
2016 09 measurecamp - event data modeling
2016 09 measurecamp - event data modeling2016 09 measurecamp - event data modeling
2016 09 measurecamp - event data modeling
 
How we use Hive at SnowPlow, and how the role of HIve is changing
How we use Hive at SnowPlow, and how the role of HIve is changingHow we use Hive at SnowPlow, and how the role of HIve is changing
How we use Hive at SnowPlow, and how the role of HIve is changing
 
Analytics at Carbonite: presentation to Snowplow Meetup Boston April 2016
Analytics at Carbonite: presentation to Snowplow Meetup Boston April 2016Analytics at Carbonite: presentation to Snowplow Meetup Boston April 2016
Analytics at Carbonite: presentation to Snowplow Meetup Boston April 2016
 
TripleLift: Preparing for a New Programmatic Ad-Tech World
TripleLift: Preparing for a New Programmatic Ad-Tech WorldTripleLift: Preparing for a New Programmatic Ad-Tech World
TripleLift: Preparing for a New Programmatic Ad-Tech World
 
Big data == lean data
Big data == lean dataBig data == lean data
Big data == lean data
 
Boston DevOps December Social
Boston DevOps December SocialBoston DevOps December Social
Boston DevOps December Social
 
Pyramid Analytics vs Sisense
Pyramid Analytics vs SisensePyramid Analytics vs Sisense
Pyramid Analytics vs Sisense
 
SnapLogic Live: Big Data Integration
SnapLogic Live: Big Data IntegrationSnapLogic Live: Big Data Integration
SnapLogic Live: Big Data Integration
 

Viewers also liked

Damien Lefortier, Senior Machine Learning Engineer and Tech Lead in the Predi...
Damien Lefortier, Senior Machine Learning Engineer and Tech Lead in the Predi...Damien Lefortier, Senior Machine Learning Engineer and Tech Lead in the Predi...
Damien Lefortier, Senior Machine Learning Engineer and Tech Lead in the Predi...
MLconf
 
New machine learning challenges at Criteo
New machine learning challenges at CriteoNew machine learning challenges at Criteo
New machine learning challenges at Criteo
Olivier Koch
 
Making advertising personal, 4th NL Recommenders Meetup
Making advertising personal, 4th NL Recommenders MeetupMaking advertising personal, 4th NL Recommenders Meetup
Making advertising personal, 4th NL Recommenders Meetup
Olivier Koch
 
New challenges for scalable machine learning in online advertising
New challenges for scalable machine learning in online advertisingNew challenges for scalable machine learning in online advertising
New challenges for scalable machine learning in online advertising
Olivier Koch
 
Flexible recommender systems based on graphs
Flexible recommender systems based on graphsFlexible recommender systems based on graphs
Flexible recommender systems based on graphs
recsysfr
 
Preference Elicitation in Mangaki: Is Your Taste Kinda Weird?
Preference Elicitation in Mangaki: Is Your Taste Kinda Weird?Preference Elicitation in Mangaki: Is Your Taste Kinda Weird?
Preference Elicitation in Mangaki: Is Your Taste Kinda Weird?
recsysfr
 
New tools from the bandit literature to improve A/B Testing
New tools from the bandit literature to improve A/B TestingNew tools from the bandit literature to improve A/B Testing
New tools from the bandit literature to improve A/B Testing
recsysfr
 
Recommendation @ PriceMinister-Rakuten - Road to personalization
Recommendation @ PriceMinister-Rakuten - Road to personalizationRecommendation @ PriceMinister-Rakuten - Road to personalization
Recommendation @ PriceMinister-Rakuten - Road to personalization
recsysfr
 
Rakuten Institute of Technology Paris
Rakuten Institute of Technology ParisRakuten Institute of Technology Paris
Rakuten Institute of Technology Paris
recsysfr
 
Recommendation @Deezer
Recommendation @DeezerRecommendation @Deezer
Recommendation @Deezer
recsysfr
 
Tailor-made personalization and recommendation - Sailendra
Tailor-made personalization and recommendation - SailendraTailor-made personalization and recommendation - Sailendra
Tailor-made personalization and recommendation - Sailendra
recsysfr
 
Story of the algorithms behind Deezer Flow
Story of the algorithms behind Deezer FlowStory of the algorithms behind Deezer Flow
Story of the algorithms behind Deezer Flow
recsysfr
 
Using Neural Networks to predict user ratings
Using Neural Networks to predict user ratingsUsing Neural Networks to predict user ratings
Using Neural Networks to predict user ratings
recsysfr
 
Dictionary Learning for Massive Matrix Factorization
Dictionary Learning for Massive Matrix FactorizationDictionary Learning for Massive Matrix Factorization
Dictionary Learning for Massive Matrix Factorization
recsysfr
 
What can bring library metadata to the web? Trust, links and love
What can bring library metadata to the web? Trust, links and loveWhat can bring library metadata to the web? Trust, links and love
What can bring library metadata to the web? Trust, links and love
recsysfr
 
Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...
Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...
Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...
MLconf
 
Xavier Amatriain, VP of Engineering, Quora at MLconf SF - 11/13/15
Xavier Amatriain, VP of Engineering, Quora at MLconf SF - 11/13/15Xavier Amatriain, VP of Engineering, Quora at MLconf SF - 11/13/15
Xavier Amatriain, VP of Engineering, Quora at MLconf SF - 11/13/15
MLconf
 
Dr. Ike Nassi, Founder, TidalScale at MLconf NYC - 4/15/16
Dr. Ike Nassi, Founder, TidalScale at MLconf NYC - 4/15/16Dr. Ike Nassi, Founder, TidalScale at MLconf NYC - 4/15/16
Dr. Ike Nassi, Founder, TidalScale at MLconf NYC - 4/15/16
MLconf
 
Ewa Dominowska, Engineering Manager, Facebook at MLconf SEA - 5/20/16
Ewa Dominowska, Engineering Manager, Facebook at MLconf SEA - 5/20/16Ewa Dominowska, Engineering Manager, Facebook at MLconf SEA - 5/20/16
Ewa Dominowska, Engineering Manager, Facebook at MLconf SEA - 5/20/16
MLconf
 
Yael Elmatad, Senior Data Scientist, Tapad at MLconf NYC - 4/15/16
Yael Elmatad, Senior Data Scientist, Tapad at MLconf NYC - 4/15/16Yael Elmatad, Senior Data Scientist, Tapad at MLconf NYC - 4/15/16
Yael Elmatad, Senior Data Scientist, Tapad at MLconf NYC - 4/15/16
MLconf
 

Viewers also liked (20)

Damien Lefortier, Senior Machine Learning Engineer and Tech Lead in the Predi...
Damien Lefortier, Senior Machine Learning Engineer and Tech Lead in the Predi...Damien Lefortier, Senior Machine Learning Engineer and Tech Lead in the Predi...
Damien Lefortier, Senior Machine Learning Engineer and Tech Lead in the Predi...
 
New machine learning challenges at Criteo
New machine learning challenges at CriteoNew machine learning challenges at Criteo
New machine learning challenges at Criteo
 
Making advertising personal, 4th NL Recommenders Meetup
Making advertising personal, 4th NL Recommenders MeetupMaking advertising personal, 4th NL Recommenders Meetup
Making advertising personal, 4th NL Recommenders Meetup
 
New challenges for scalable machine learning in online advertising
New challenges for scalable machine learning in online advertisingNew challenges for scalable machine learning in online advertising
New challenges for scalable machine learning in online advertising
 
Flexible recommender systems based on graphs
Flexible recommender systems based on graphsFlexible recommender systems based on graphs
Flexible recommender systems based on graphs
 
Preference Elicitation in Mangaki: Is Your Taste Kinda Weird?
Preference Elicitation in Mangaki: Is Your Taste Kinda Weird?Preference Elicitation in Mangaki: Is Your Taste Kinda Weird?
Preference Elicitation in Mangaki: Is Your Taste Kinda Weird?
 
New tools from the bandit literature to improve A/B Testing
New tools from the bandit literature to improve A/B TestingNew tools from the bandit literature to improve A/B Testing
New tools from the bandit literature to improve A/B Testing
 
Recommendation @ PriceMinister-Rakuten - Road to personalization
Recommendation @ PriceMinister-Rakuten - Road to personalizationRecommendation @ PriceMinister-Rakuten - Road to personalization
Recommendation @ PriceMinister-Rakuten - Road to personalization
 
Rakuten Institute of Technology Paris
Rakuten Institute of Technology ParisRakuten Institute of Technology Paris
Rakuten Institute of Technology Paris
 
Recommendation @Deezer
Recommendation @DeezerRecommendation @Deezer
Recommendation @Deezer
 
Tailor-made personalization and recommendation - Sailendra
Tailor-made personalization and recommendation - SailendraTailor-made personalization and recommendation - Sailendra
Tailor-made personalization and recommendation - Sailendra
 
Story of the algorithms behind Deezer Flow
Story of the algorithms behind Deezer FlowStory of the algorithms behind Deezer Flow
Story of the algorithms behind Deezer Flow
 
Using Neural Networks to predict user ratings
Using Neural Networks to predict user ratingsUsing Neural Networks to predict user ratings
Using Neural Networks to predict user ratings
 
Dictionary Learning for Massive Matrix Factorization
Dictionary Learning for Massive Matrix FactorizationDictionary Learning for Massive Matrix Factorization
Dictionary Learning for Massive Matrix Factorization
 
What can bring library metadata to the web? Trust, links and love
What can bring library metadata to the web? Trust, links and loveWhat can bring library metadata to the web? Trust, links and love
What can bring library metadata to the web? Trust, links and love
 
Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...
Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...
Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...
 
Xavier Amatriain, VP of Engineering, Quora at MLconf SF - 11/13/15
Xavier Amatriain, VP of Engineering, Quora at MLconf SF - 11/13/15Xavier Amatriain, VP of Engineering, Quora at MLconf SF - 11/13/15
Xavier Amatriain, VP of Engineering, Quora at MLconf SF - 11/13/15
 
Dr. Ike Nassi, Founder, TidalScale at MLconf NYC - 4/15/16
Dr. Ike Nassi, Founder, TidalScale at MLconf NYC - 4/15/16Dr. Ike Nassi, Founder, TidalScale at MLconf NYC - 4/15/16
Dr. Ike Nassi, Founder, TidalScale at MLconf NYC - 4/15/16
 
Ewa Dominowska, Engineering Manager, Facebook at MLconf SEA - 5/20/16
Ewa Dominowska, Engineering Manager, Facebook at MLconf SEA - 5/20/16Ewa Dominowska, Engineering Manager, Facebook at MLconf SEA - 5/20/16
Ewa Dominowska, Engineering Manager, Facebook at MLconf SEA - 5/20/16
 
Yael Elmatad, Senior Data Scientist, Tapad at MLconf NYC - 4/15/16
Yael Elmatad, Senior Data Scientist, Tapad at MLconf NYC - 4/15/16Yael Elmatad, Senior Data Scientist, Tapad at MLconf NYC - 4/15/16
Yael Elmatad, Senior Data Scientist, Tapad at MLconf NYC - 4/15/16
 

Similar to Recommendation at scale

Criteo TektosData Meetup
Criteo TektosData MeetupCriteo TektosData Meetup
Criteo TektosData Meetup
Olivier Koch
 
Simon Dollé_Large-scale Real-time recommendation at Criteo
Simon Dollé_Large-scale Real-time recommendation at Criteo Simon Dollé_Large-scale Real-time recommendation at Criteo
Simon Dollé_Large-scale Real-time recommendation at Criteo
Dataconomy Media
 
RecSys 2015: Large-scale real-time product recommendation at Criteo
RecSys 2015: Large-scale real-time product recommendation at CriteoRecSys 2015: Large-scale real-time product recommendation at Criteo
RecSys 2015: Large-scale real-time product recommendation at Criteo
Romain Lerallut
 
In-Memory Data Management Goes Mainstream - OpenSlava 2015
In-Memory Data Management Goes Mainstream - OpenSlava 2015In-Memory Data Management Goes Mainstream - OpenSlava 2015
In-Memory Data Management Goes Mainstream - OpenSlava 2015
Software AG
 
Criteo Couchbase live 2015
Criteo Couchbase live 2015Criteo Couchbase live 2015
Criteo Couchbase live 2015
Nicolasgmail.com Helleringer
 
Top Trends in Building Data Lakes for Machine Learning and AI
Top Trends in Building Data Lakes for Machine Learning and AI Top Trends in Building Data Lakes for Machine Learning and AI
Top Trends in Building Data Lakes for Machine Learning and AI
Holden Ackerman
 
IMCSummit 2015 - 1 IT Business - The Evolution of Pivotal Gemfire
IMCSummit 2015 - 1 IT Business  - The Evolution of Pivotal GemfireIMCSummit 2015 - 1 IT Business  - The Evolution of Pivotal Gemfire
IMCSummit 2015 - 1 IT Business - The Evolution of Pivotal Gemfire
In-Memory Computing Summit
 
There are 250 Database products, are you running the right one?
There are 250 Database products, are you running the right one?There are 250 Database products, are you running the right one?
There are 250 Database products, are you running the right one?
Aerospike, Inc.
 
Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...
Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...
Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...
Big Data Spain
 
Streaming Analytics Comparison of Open Source Frameworks, Products, Cloud Ser...
Streaming Analytics Comparison of Open Source Frameworks, Products, Cloud Ser...Streaming Analytics Comparison of Open Source Frameworks, Products, Cloud Ser...
Streaming Analytics Comparison of Open Source Frameworks, Products, Cloud Ser...
Kai Wähner
 
Tech Job Conference: Software Engineer @Criteo
Tech Job Conference: Software Engineer @CriteoTech Job Conference: Software Engineer @Criteo
Tech Job Conference: Software Engineer @Criteo
Gilles Legoux
 
TIBCO Advanced Analytics Meetup (TAAM) - June 2015
TIBCO Advanced Analytics Meetup (TAAM) - June 2015TIBCO Advanced Analytics Meetup (TAAM) - June 2015
TIBCO Advanced Analytics Meetup (TAAM) - June 2015
Bipin Singh
 
Breaking Up the Monolith While Migrating to AWS (GPSTEC320) - AWS re:Invent 2018
Breaking Up the Monolith While Migrating to AWS (GPSTEC320) - AWS re:Invent 2018Breaking Up the Monolith While Migrating to AWS (GPSTEC320) - AWS re:Invent 2018
Breaking Up the Monolith While Migrating to AWS (GPSTEC320) - AWS re:Invent 2018
Amazon Web Services
 
Connecta Event: Big Query och dataanalys med Google Cloud Platform
Connecta Event: Big Query och dataanalys med Google Cloud PlatformConnecta Event: Big Query och dataanalys med Google Cloud Platform
Connecta Event: Big Query och dataanalys med Google Cloud Platform
ConnectaDigital
 
Enterprise Cloud Adoption
Enterprise Cloud Adoption Enterprise Cloud Adoption
Enterprise Cloud Adoption
Tom Laszewski
 
M&E Leadership Session: The State of the Industry, What's New from AWS for M&...
M&E Leadership Session: The State of the Industry, What's New from AWS for M&...M&E Leadership Session: The State of the Industry, What's New from AWS for M&...
M&E Leadership Session: The State of the Industry, What's New from AWS for M&...
Amazon Web Services
 
TiDB DevCon 2020 Opening Keynote
TiDB DevCon 2020 Opening Keynote TiDB DevCon 2020 Opening Keynote
TiDB DevCon 2020 Opening Keynote
PingCAP
 
Bos365 April 2015
Bos365 April 2015Bos365 April 2015
Bos365 April 2015
Michael Dixon
 
Real-Time Web Analytics with Amazon Kinesis Data Analytics (ADT401) - AWS re:...
Real-Time Web Analytics with Amazon Kinesis Data Analytics (ADT401) - AWS re:...Real-Time Web Analytics with Amazon Kinesis Data Analytics (ADT401) - AWS re:...
Real-Time Web Analytics with Amazon Kinesis Data Analytics (ADT401) - AWS re:...
Amazon Web Services
 
How to run Real Time processing on Big Data / Ron Zavner (GigaSpaces)
How to run Real Time processing on Big Data / Ron Zavner (GigaSpaces)How to run Real Time processing on Big Data / Ron Zavner (GigaSpaces)
How to run Real Time processing on Big Data / Ron Zavner (GigaSpaces)
Ontico
 

Similar to Recommendation at scale (20)

Criteo TektosData Meetup
Criteo TektosData MeetupCriteo TektosData Meetup
Criteo TektosData Meetup
 
Simon Dollé_Large-scale Real-time recommendation at Criteo
Simon Dollé_Large-scale Real-time recommendation at Criteo Simon Dollé_Large-scale Real-time recommendation at Criteo
Simon Dollé_Large-scale Real-time recommendation at Criteo
 
RecSys 2015: Large-scale real-time product recommendation at Criteo
RecSys 2015: Large-scale real-time product recommendation at CriteoRecSys 2015: Large-scale real-time product recommendation at Criteo
RecSys 2015: Large-scale real-time product recommendation at Criteo
 
In-Memory Data Management Goes Mainstream - OpenSlava 2015
In-Memory Data Management Goes Mainstream - OpenSlava 2015In-Memory Data Management Goes Mainstream - OpenSlava 2015
In-Memory Data Management Goes Mainstream - OpenSlava 2015
 
Criteo Couchbase live 2015
Criteo Couchbase live 2015Criteo Couchbase live 2015
Criteo Couchbase live 2015
 
Top Trends in Building Data Lakes for Machine Learning and AI
Top Trends in Building Data Lakes for Machine Learning and AI Top Trends in Building Data Lakes for Machine Learning and AI
Top Trends in Building Data Lakes for Machine Learning and AI
 
IMCSummit 2015 - 1 IT Business - The Evolution of Pivotal Gemfire
IMCSummit 2015 - 1 IT Business  - The Evolution of Pivotal GemfireIMCSummit 2015 - 1 IT Business  - The Evolution of Pivotal Gemfire
IMCSummit 2015 - 1 IT Business - The Evolution of Pivotal Gemfire
 
There are 250 Database products, are you running the right one?
There are 250 Database products, are you running the right one?There are 250 Database products, are you running the right one?
There are 250 Database products, are you running the right one?
 
Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...
Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...
Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...
 
Streaming Analytics Comparison of Open Source Frameworks, Products, Cloud Ser...
Streaming Analytics Comparison of Open Source Frameworks, Products, Cloud Ser...Streaming Analytics Comparison of Open Source Frameworks, Products, Cloud Ser...
Streaming Analytics Comparison of Open Source Frameworks, Products, Cloud Ser...
 
Tech Job Conference: Software Engineer @Criteo
Tech Job Conference: Software Engineer @CriteoTech Job Conference: Software Engineer @Criteo
Tech Job Conference: Software Engineer @Criteo
 
TIBCO Advanced Analytics Meetup (TAAM) - June 2015
TIBCO Advanced Analytics Meetup (TAAM) - June 2015TIBCO Advanced Analytics Meetup (TAAM) - June 2015
TIBCO Advanced Analytics Meetup (TAAM) - June 2015
 
Breaking Up the Monolith While Migrating to AWS (GPSTEC320) - AWS re:Invent 2018
Breaking Up the Monolith While Migrating to AWS (GPSTEC320) - AWS re:Invent 2018Breaking Up the Monolith While Migrating to AWS (GPSTEC320) - AWS re:Invent 2018
Breaking Up the Monolith While Migrating to AWS (GPSTEC320) - AWS re:Invent 2018
 
Connecta Event: Big Query och dataanalys med Google Cloud Platform
Connecta Event: Big Query och dataanalys med Google Cloud PlatformConnecta Event: Big Query och dataanalys med Google Cloud Platform
Connecta Event: Big Query och dataanalys med Google Cloud Platform
 
Enterprise Cloud Adoption
Enterprise Cloud Adoption Enterprise Cloud Adoption
Enterprise Cloud Adoption
 
M&E Leadership Session: The State of the Industry, What's New from AWS for M&...
M&E Leadership Session: The State of the Industry, What's New from AWS for M&...M&E Leadership Session: The State of the Industry, What's New from AWS for M&...
M&E Leadership Session: The State of the Industry, What's New from AWS for M&...
 
TiDB DevCon 2020 Opening Keynote
TiDB DevCon 2020 Opening Keynote TiDB DevCon 2020 Opening Keynote
TiDB DevCon 2020 Opening Keynote
 
Bos365 April 2015
Bos365 April 2015Bos365 April 2015
Bos365 April 2015
 
Real-Time Web Analytics with Amazon Kinesis Data Analytics (ADT401) - AWS re:...
Real-Time Web Analytics with Amazon Kinesis Data Analytics (ADT401) - AWS re:...Real-Time Web Analytics with Amazon Kinesis Data Analytics (ADT401) - AWS re:...
Real-Time Web Analytics with Amazon Kinesis Data Analytics (ADT401) - AWS re:...
 
How to run Real Time processing on Big Data / Ron Zavner (GigaSpaces)
How to run Real Time processing on Big Data / Ron Zavner (GigaSpaces)How to run Real Time processing on Big Data / Ron Zavner (GigaSpaces)
How to run Real Time processing on Big Data / Ron Zavner (GigaSpaces)
 

Recently uploaded

SAP BW4HANA Implementagtion Content Document
SAP BW4HANA Implementagtion Content DocumentSAP BW4HANA Implementagtion Content Document
SAP BW4HANA Implementagtion Content Document
newdirectionconsulta
 
Call Girls Lucknow 0000000000 Independent Call Girl Service Lucknow
Call Girls Lucknow 0000000000 Independent Call Girl Service LucknowCall Girls Lucknow 0000000000 Independent Call Girl Service Lucknow
Call Girls Lucknow 0000000000 Independent Call Girl Service Lucknow
hiju9823
 
一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理
一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理
一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理
actyx
 
Build applications with generative AI on Google Cloud
Build applications with generative AI on Google CloudBuild applications with generative AI on Google Cloud
Build applications with generative AI on Google Cloud
Márton Kodok
 
[VCOSA] Monthly Report - Cotton & Yarn Statistics May 2024
[VCOSA] Monthly Report - Cotton & Yarn Statistics May 2024[VCOSA] Monthly Report - Cotton & Yarn Statistics May 2024
[VCOSA] Monthly Report - Cotton & Yarn Statistics May 2024
Vietnam Cotton & Spinning Association
 
Sample Devops SRE Product Companies .pdf
Sample Devops SRE  Product Companies .pdfSample Devops SRE  Product Companies .pdf
Sample Devops SRE Product Companies .pdf
Vineet
 
A gentle exploration of Retrieval Augmented Generation
A gentle exploration of Retrieval Augmented GenerationA gentle exploration of Retrieval Augmented Generation
A gentle exploration of Retrieval Augmented Generation
dataschool1
 
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
z6osjkqvd
 
一比一原版美国帕森斯设计学院毕业证(parsons毕业证书)如何办理
一比一原版美国帕森斯设计学院毕业证(parsons毕业证书)如何办理一比一原版美国帕森斯设计学院毕业证(parsons毕业证书)如何办理
一比一原版美国帕森斯设计学院毕业证(parsons毕业证书)如何办理
asyed10
 
Senior Engineering Sample EM DOE - Sheet1.pdf
Senior Engineering Sample EM DOE  - Sheet1.pdfSenior Engineering Sample EM DOE  - Sheet1.pdf
Senior Engineering Sample EM DOE - Sheet1.pdf
Vineet
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
bmucuha
 
Q4FY24 Investor-Presentation.pdf bank slide
Q4FY24 Investor-Presentation.pdf bank slideQ4FY24 Investor-Presentation.pdf bank slide
Q4FY24 Investor-Presentation.pdf bank slide
mukulupadhayay1
 
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
nyvan3
 
06-18-2024-Princeton Meetup-Introduction to Milvus
06-18-2024-Princeton Meetup-Introduction to Milvus06-18-2024-Princeton Meetup-Introduction to Milvus
06-18-2024-Princeton Meetup-Introduction to Milvus
Timothy Spann
 
一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理
一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理
一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理
osoyvvf
 
Telemetry Solution for Gaming (AWS Summit'24)
Telemetry Solution for Gaming (AWS Summit'24)Telemetry Solution for Gaming (AWS Summit'24)
Telemetry Solution for Gaming (AWS Summit'24)
GeorgiiSteshenko
 
Salesforce AI + Data Community Tour Slides - Canarias
Salesforce AI + Data Community Tour Slides - CanariasSalesforce AI + Data Community Tour Slides - Canarias
Salesforce AI + Data Community Tour Slides - Canarias
davidpietrzykowski1
 
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
hyfjgavov
 
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
ywqeos
 
社内勉強会資料_Hallucination of LLMs               .
社内勉強会資料_Hallucination of LLMs               .社内勉強会資料_Hallucination of LLMs               .
社内勉強会資料_Hallucination of LLMs               .
NABLAS株式会社
 

Recently uploaded (20)

SAP BW4HANA Implementagtion Content Document
SAP BW4HANA Implementagtion Content DocumentSAP BW4HANA Implementagtion Content Document
SAP BW4HANA Implementagtion Content Document
 
Call Girls Lucknow 0000000000 Independent Call Girl Service Lucknow
Call Girls Lucknow 0000000000 Independent Call Girl Service LucknowCall Girls Lucknow 0000000000 Independent Call Girl Service Lucknow
Call Girls Lucknow 0000000000 Independent Call Girl Service Lucknow
 
一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理
一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理
一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理
 
Build applications with generative AI on Google Cloud
Build applications with generative AI on Google CloudBuild applications with generative AI on Google Cloud
Build applications with generative AI on Google Cloud
 
[VCOSA] Monthly Report - Cotton & Yarn Statistics May 2024
[VCOSA] Monthly Report - Cotton & Yarn Statistics May 2024[VCOSA] Monthly Report - Cotton & Yarn Statistics May 2024
[VCOSA] Monthly Report - Cotton & Yarn Statistics May 2024
 
Sample Devops SRE Product Companies .pdf
Sample Devops SRE  Product Companies .pdfSample Devops SRE  Product Companies .pdf
Sample Devops SRE Product Companies .pdf
 
A gentle exploration of Retrieval Augmented Generation
A gentle exploration of Retrieval Augmented GenerationA gentle exploration of Retrieval Augmented Generation
A gentle exploration of Retrieval Augmented Generation
 
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
 
一比一原版美国帕森斯设计学院毕业证(parsons毕业证书)如何办理
一比一原版美国帕森斯设计学院毕业证(parsons毕业证书)如何办理一比一原版美国帕森斯设计学院毕业证(parsons毕业证书)如何办理
一比一原版美国帕森斯设计学院毕业证(parsons毕业证书)如何办理
 
Senior Engineering Sample EM DOE - Sheet1.pdf
Senior Engineering Sample EM DOE  - Sheet1.pdfSenior Engineering Sample EM DOE  - Sheet1.pdf
Senior Engineering Sample EM DOE - Sheet1.pdf
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
 
Q4FY24 Investor-Presentation.pdf bank slide
Q4FY24 Investor-Presentation.pdf bank slideQ4FY24 Investor-Presentation.pdf bank slide
Q4FY24 Investor-Presentation.pdf bank slide
 
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
 
06-18-2024-Princeton Meetup-Introduction to Milvus
06-18-2024-Princeton Meetup-Introduction to Milvus06-18-2024-Princeton Meetup-Introduction to Milvus
06-18-2024-Princeton Meetup-Introduction to Milvus
 
一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理
一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理
一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理
 
Telemetry Solution for Gaming (AWS Summit'24)
Telemetry Solution for Gaming (AWS Summit'24)Telemetry Solution for Gaming (AWS Summit'24)
Telemetry Solution for Gaming (AWS Summit'24)
 
Salesforce AI + Data Community Tour Slides - Canarias
Salesforce AI + Data Community Tour Slides - CanariasSalesforce AI + Data Community Tour Slides - Canarias
Salesforce AI + Data Community Tour Slides - Canarias
 
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
 
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
 
社内勉強会資料_Hallucination of LLMs               .
社内勉強会資料_Hallucination of LLMs               .社内勉強会資料_Hallucination of LLMs               .
社内勉強会資料_Hallucination of LLMs               .
 

Recommendation at scale

  • 1. Copyright © 2015 Criteo Recommendation at scale Simon Dollé Berlin Buzzwords, June 2nd, 2015
  • 2. Copyright © 2015 Criteo 2
  • 3. Copyright © 2015 Criteo We buy • Inventory ! (ad spaces) • Billions of times a day • All over the Internet • For 95% of the population Where is the need for tech ? We sell • Clicks ! • (that convert) • (that convert a lot) We take the risk You pay only for what you get
  • 4. Copyright © 2015 Criteo 4 Physical infrastructure 6 Data centers on 3 continents operated and conceived in-house ~ 12000 servers, largest Hadoop cluster in Europe More than 35 PB of storage Big Data Traffic 800 k HTTP requests / sec (peak activity) 29000 impressions /sec (peak activity) Less than 10 ms to process an RTB request
  • 5. Copyright © 2015 Criteo Data Sources Ad display dataUser behavior dataCatalog data
  • 13. Copyright © 2015 Criteo Offline • Similarities computed on browsing data • Based on coevents • Computed on Hadoop cluster • Map reduce jobs, pig • Takes around 12 hours • Tradeoff performance, speed
  • 15. Copyright © 2015 Criteo 0.04 0.02 0.02 0.05
  • 16. Copyright © 2015 Criteo 0.04 0.020.020.05
  • 17. Copyright © 2015 Criteo 0.04 0.020.020.05
  • 18. Copyright © 2015 Criteo ML model • Logistic regression models • Features Product-specific User-specific User-product interactions Display-specific
  • 19. Copyright © 2015 Criteo Online optimizations • Algorithmic • Use simpler ML model • Quickly discard candidates • Technical • Memcache + local cache • Async I/O
  • 20. Copyright © 2015 Criteo Upcoming challenges • Long(er)-term user profiles • More and better product information (images, semantic, NLP) • Instant-update of similarities • (because batch computation is soooo last year)
  • 21. Copyright © 2015 Criteo Fancy a try ? On your own: With us ! http://labs.criteo.com/jobs/ • Our 1st public dataset is online: http://bit.ly/1vgw2XC • 4GB display and click data, Kaggle challenge in 2014 • NEW : 1TB dataset released a few weeks ago: http://bit.ly/1PyH4Vq • Hosted on Microsoft Azure, just waiting for you
  • 22. Copyright © 2015 Criteo Questions?
  • 23. Copyright © 2015 Criteo Thank you ! s.dolle@criteo.com