SlideShare a Scribd company logo
1 of 23
Download to read offline
Copyright © 2015 Criteo
Recommendation at scale
Simon Dollé
Berlin Buzzwords, June 2nd, 2015
Copyright © 2015 Criteo
2
Copyright © 2015 Criteo
We buy
• Inventory ! (ad spaces)
• Billions of times a day
• All over the Internet
• For 95% of the population
Where is the need for tech ?
We sell
• Clicks !
• (that convert)
• (that convert a lot)
We take the risk
You pay only for what you get
Copyright © 2015 Criteo
4
Physical infrastructure
6 Data centers on 3 continents operated and conceived in-house
~ 12000 servers, largest Hadoop cluster in Europe
More than 35 PB of storage Big Data
Traffic
800 k HTTP requests / sec (peak activity)
29000 impressions /sec (peak activity)
Less than 10 ms to process an RTB request
Copyright © 2015 Criteo
Data Sources
Ad display dataUser behavior dataCatalog data
Copyright © 2015 Criteo
Copyright © 2015 Criteo
Copyright © 2015 Criteo
Copyright © 2015 Criteo
Copyright © 2015 Criteo
Copyright © 2015 Criteo
Copyright © 2015 Criteo
Copyright © 2015 Criteo
Offline
• Similarities computed on browsing data
• Based on coevents
• Computed on Hadoop cluster
• Map reduce jobs, pig
• Takes around 12 hours
• Tradeoff performance, speed
Copyright © 2015 Criteo
Copyright © 2015 Criteo
0.04 0.02 0.02 0.05
Copyright © 2015 Criteo
0.04 0.020.020.05
Copyright © 2015 Criteo
0.04 0.020.020.05
Copyright © 2015 Criteo
ML model
• Logistic regression models
• Features
Product-specific User-specific User-product interactions Display-specific
Copyright © 2015 Criteo
Online optimizations
• Algorithmic
• Use simpler ML model
• Quickly discard candidates
• Technical
• Memcache + local cache
• Async I/O
Copyright © 2015 Criteo
Upcoming challenges
• Long(er)-term user profiles
• More and better product information (images, semantic, NLP)
• Instant-update of similarities
• (because batch computation is soooo last year)
Copyright © 2015 Criteo
Fancy a try ?
On your own:
With us !
http://labs.criteo.com/jobs/
• Our 1st public dataset is online: http://bit.ly/1vgw2XC
• 4GB display and click data, Kaggle challenge in 2014
• NEW : 1TB dataset released a few weeks ago: http://bit.ly/1PyH4Vq
• Hosted on Microsoft Azure, just waiting for you
Copyright © 2015 Criteo
Questions?
Copyright © 2015 Criteo
Thank you !
s.dolle@criteo.com

More Related Content

What's hot

Chemistry Data Basics with KNIME Analytics Platform
Chemistry Data Basics with KNIME Analytics PlatformChemistry Data Basics with KNIME Analytics Platform
Chemistry Data Basics with KNIME Analytics PlatformKNIMESlides
 
How to Choose a Data Warehouse
How to Choose a Data WarehouseHow to Choose a Data Warehouse
How to Choose a Data WarehouseMatillion
 
Cloud Developer Days - BigQuery
Cloud Developer Days - BigQueryCloud Developer Days - BigQuery
Cloud Developer Days - BigQueryWlodek Bielski
 
Google BigQuery Best Practices
Google BigQuery Best PracticesGoogle BigQuery Best Practices
Google BigQuery Best PracticesMatillion
 
Dive Into Data Lakes
Dive Into Data LakesDive Into Data Lakes
Dive Into Data LakesMatillion
 
An overview of BigQuery
An overview of BigQuery An overview of BigQuery
An overview of BigQuery GirdhareeSaran
 
DevTest Labs en Azure (por Iván Cañizares)
DevTest Labs en Azure (por Iván Cañizares)DevTest Labs en Azure (por Iván Cañizares)
DevTest Labs en Azure (por Iván Cañizares)Jorge Millán Cabrera
 
D-Conf 2013 - Cloud Computing - XaaS
D-Conf 2013  - Cloud Computing - XaaSD-Conf 2013  - Cloud Computing - XaaS
D-Conf 2013 - Cloud Computing - XaaSAstella Investimentos
 
Un orquestador en la nube: Azure Data Factory (por Carlos Sacristán)
Un orquestador en la nube: Azure Data Factory (por Carlos Sacristán)Un orquestador en la nube: Azure Data Factory (por Carlos Sacristán)
Un orquestador en la nube: Azure Data Factory (por Carlos Sacristán)Jorge Millán Cabrera
 
Quick Intro to Google Cloud Technologies
Quick Intro to Google Cloud TechnologiesQuick Intro to Google Cloud Technologies
Quick Intro to Google Cloud TechnologiesChris Schalk
 
Building Dynamic Pipelines in Azure Data Factory (SQLSaturday Oslo)
Building Dynamic Pipelines in Azure Data Factory (SQLSaturday Oslo)Building Dynamic Pipelines in Azure Data Factory (SQLSaturday Oslo)
Building Dynamic Pipelines in Azure Data Factory (SQLSaturday Oslo)Cathrine Wilhelmsen
 
2016 09 measurecamp - event data modeling
2016 09 measurecamp - event data modeling2016 09 measurecamp - event data modeling
2016 09 measurecamp - event data modelingyalisassoon
 
How we use Hive at SnowPlow, and how the role of HIve is changing
How we use Hive at SnowPlow, and how the role of HIve is changingHow we use Hive at SnowPlow, and how the role of HIve is changing
How we use Hive at SnowPlow, and how the role of HIve is changingyalisassoon
 
Analytics at Carbonite: presentation to Snowplow Meetup Boston April 2016
Analytics at Carbonite: presentation to Snowplow Meetup Boston April 2016Analytics at Carbonite: presentation to Snowplow Meetup Boston April 2016
Analytics at Carbonite: presentation to Snowplow Meetup Boston April 2016yalisassoon
 
TripleLift: Preparing for a New Programmatic Ad-Tech World
TripleLift: Preparing for a New Programmatic Ad-Tech WorldTripleLift: Preparing for a New Programmatic Ad-Tech World
TripleLift: Preparing for a New Programmatic Ad-Tech WorldVoltDB
 
Boston DevOps December Social
Boston DevOps December SocialBoston DevOps December Social
Boston DevOps December SocialDavid Fredricks
 
Pyramid Analytics vs Sisense
Pyramid Analytics vs SisensePyramid Analytics vs Sisense
Pyramid Analytics vs SisensePyramid Analytics
 
SnapLogic Live: Big Data Integration
SnapLogic Live: Big Data IntegrationSnapLogic Live: Big Data Integration
SnapLogic Live: Big Data IntegrationSnapLogic
 

What's hot (20)

Chemistry Data Basics with KNIME Analytics Platform
Chemistry Data Basics with KNIME Analytics PlatformChemistry Data Basics with KNIME Analytics Platform
Chemistry Data Basics with KNIME Analytics Platform
 
How to Choose a Data Warehouse
How to Choose a Data WarehouseHow to Choose a Data Warehouse
How to Choose a Data Warehouse
 
Cloud Developer Days - BigQuery
Cloud Developer Days - BigQueryCloud Developer Days - BigQuery
Cloud Developer Days - BigQuery
 
Google Bigtable
Google BigtableGoogle Bigtable
Google Bigtable
 
Google BigQuery Best Practices
Google BigQuery Best PracticesGoogle BigQuery Best Practices
Google BigQuery Best Practices
 
Dive Into Data Lakes
Dive Into Data LakesDive Into Data Lakes
Dive Into Data Lakes
 
An overview of BigQuery
An overview of BigQuery An overview of BigQuery
An overview of BigQuery
 
DevTest Labs en Azure (por Iván Cañizares)
DevTest Labs en Azure (por Iván Cañizares)DevTest Labs en Azure (por Iván Cañizares)
DevTest Labs en Azure (por Iván Cañizares)
 
D-Conf 2013 - Cloud Computing - XaaS
D-Conf 2013  - Cloud Computing - XaaSD-Conf 2013  - Cloud Computing - XaaS
D-Conf 2013 - Cloud Computing - XaaS
 
Un orquestador en la nube: Azure Data Factory (por Carlos Sacristán)
Un orquestador en la nube: Azure Data Factory (por Carlos Sacristán)Un orquestador en la nube: Azure Data Factory (por Carlos Sacristán)
Un orquestador en la nube: Azure Data Factory (por Carlos Sacristán)
 
Quick Intro to Google Cloud Technologies
Quick Intro to Google Cloud TechnologiesQuick Intro to Google Cloud Technologies
Quick Intro to Google Cloud Technologies
 
Building Dynamic Pipelines in Azure Data Factory (SQLSaturday Oslo)
Building Dynamic Pipelines in Azure Data Factory (SQLSaturday Oslo)Building Dynamic Pipelines in Azure Data Factory (SQLSaturday Oslo)
Building Dynamic Pipelines in Azure Data Factory (SQLSaturday Oslo)
 
2016 09 measurecamp - event data modeling
2016 09 measurecamp - event data modeling2016 09 measurecamp - event data modeling
2016 09 measurecamp - event data modeling
 
How we use Hive at SnowPlow, and how the role of HIve is changing
How we use Hive at SnowPlow, and how the role of HIve is changingHow we use Hive at SnowPlow, and how the role of HIve is changing
How we use Hive at SnowPlow, and how the role of HIve is changing
 
Analytics at Carbonite: presentation to Snowplow Meetup Boston April 2016
Analytics at Carbonite: presentation to Snowplow Meetup Boston April 2016Analytics at Carbonite: presentation to Snowplow Meetup Boston April 2016
Analytics at Carbonite: presentation to Snowplow Meetup Boston April 2016
 
TripleLift: Preparing for a New Programmatic Ad-Tech World
TripleLift: Preparing for a New Programmatic Ad-Tech WorldTripleLift: Preparing for a New Programmatic Ad-Tech World
TripleLift: Preparing for a New Programmatic Ad-Tech World
 
Big data == lean data
Big data == lean dataBig data == lean data
Big data == lean data
 
Boston DevOps December Social
Boston DevOps December SocialBoston DevOps December Social
Boston DevOps December Social
 
Pyramid Analytics vs Sisense
Pyramid Analytics vs SisensePyramid Analytics vs Sisense
Pyramid Analytics vs Sisense
 
SnapLogic Live: Big Data Integration
SnapLogic Live: Big Data IntegrationSnapLogic Live: Big Data Integration
SnapLogic Live: Big Data Integration
 

Viewers also liked

Damien Lefortier, Senior Machine Learning Engineer and Tech Lead in the Predi...
Damien Lefortier, Senior Machine Learning Engineer and Tech Lead in the Predi...Damien Lefortier, Senior Machine Learning Engineer and Tech Lead in the Predi...
Damien Lefortier, Senior Machine Learning Engineer and Tech Lead in the Predi...MLconf
 
New machine learning challenges at Criteo
New machine learning challenges at CriteoNew machine learning challenges at Criteo
New machine learning challenges at CriteoOlivier Koch
 
Making advertising personal, 4th NL Recommenders Meetup
Making advertising personal, 4th NL Recommenders MeetupMaking advertising personal, 4th NL Recommenders Meetup
Making advertising personal, 4th NL Recommenders MeetupOlivier Koch
 
New challenges for scalable machine learning in online advertising
New challenges for scalable machine learning in online advertisingNew challenges for scalable machine learning in online advertising
New challenges for scalable machine learning in online advertisingOlivier Koch
 
Flexible recommender systems based on graphs
Flexible recommender systems based on graphsFlexible recommender systems based on graphs
Flexible recommender systems based on graphsrecsysfr
 
Preference Elicitation in Mangaki: Is Your Taste Kinda Weird?
Preference Elicitation in Mangaki: Is Your Taste Kinda Weird?Preference Elicitation in Mangaki: Is Your Taste Kinda Weird?
Preference Elicitation in Mangaki: Is Your Taste Kinda Weird?recsysfr
 
New tools from the bandit literature to improve A/B Testing
New tools from the bandit literature to improve A/B TestingNew tools from the bandit literature to improve A/B Testing
New tools from the bandit literature to improve A/B Testingrecsysfr
 
Recommendation @ PriceMinister-Rakuten - Road to personalization
Recommendation @ PriceMinister-Rakuten - Road to personalizationRecommendation @ PriceMinister-Rakuten - Road to personalization
Recommendation @ PriceMinister-Rakuten - Road to personalizationrecsysfr
 
Rakuten Institute of Technology Paris
Rakuten Institute of Technology ParisRakuten Institute of Technology Paris
Rakuten Institute of Technology Parisrecsysfr
 
Recommendation @Deezer
Recommendation @DeezerRecommendation @Deezer
Recommendation @Deezerrecsysfr
 
Tailor-made personalization and recommendation - Sailendra
Tailor-made personalization and recommendation - SailendraTailor-made personalization and recommendation - Sailendra
Tailor-made personalization and recommendation - Sailendrarecsysfr
 
Story of the algorithms behind Deezer Flow
Story of the algorithms behind Deezer FlowStory of the algorithms behind Deezer Flow
Story of the algorithms behind Deezer Flowrecsysfr
 
Using Neural Networks to predict user ratings
Using Neural Networks to predict user ratingsUsing Neural Networks to predict user ratings
Using Neural Networks to predict user ratingsrecsysfr
 
Dictionary Learning for Massive Matrix Factorization
Dictionary Learning for Massive Matrix FactorizationDictionary Learning for Massive Matrix Factorization
Dictionary Learning for Massive Matrix Factorizationrecsysfr
 
What can bring library metadata to the web? Trust, links and love
What can bring library metadata to the web? Trust, links and loveWhat can bring library metadata to the web? Trust, links and love
What can bring library metadata to the web? Trust, links and loverecsysfr
 
Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...
Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...
Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...MLconf
 
Xavier Amatriain, VP of Engineering, Quora at MLconf SF - 11/13/15
Xavier Amatriain, VP of Engineering, Quora at MLconf SF - 11/13/15Xavier Amatriain, VP of Engineering, Quora at MLconf SF - 11/13/15
Xavier Amatriain, VP of Engineering, Quora at MLconf SF - 11/13/15MLconf
 
Dr. Ike Nassi, Founder, TidalScale at MLconf NYC - 4/15/16
Dr. Ike Nassi, Founder, TidalScale at MLconf NYC - 4/15/16Dr. Ike Nassi, Founder, TidalScale at MLconf NYC - 4/15/16
Dr. Ike Nassi, Founder, TidalScale at MLconf NYC - 4/15/16MLconf
 
Ewa Dominowska, Engineering Manager, Facebook at MLconf SEA - 5/20/16
Ewa Dominowska, Engineering Manager, Facebook at MLconf SEA - 5/20/16Ewa Dominowska, Engineering Manager, Facebook at MLconf SEA - 5/20/16
Ewa Dominowska, Engineering Manager, Facebook at MLconf SEA - 5/20/16MLconf
 
Yael Elmatad, Senior Data Scientist, Tapad at MLconf NYC - 4/15/16
Yael Elmatad, Senior Data Scientist, Tapad at MLconf NYC - 4/15/16Yael Elmatad, Senior Data Scientist, Tapad at MLconf NYC - 4/15/16
Yael Elmatad, Senior Data Scientist, Tapad at MLconf NYC - 4/15/16MLconf
 

Viewers also liked (20)

Damien Lefortier, Senior Machine Learning Engineer and Tech Lead in the Predi...
Damien Lefortier, Senior Machine Learning Engineer and Tech Lead in the Predi...Damien Lefortier, Senior Machine Learning Engineer and Tech Lead in the Predi...
Damien Lefortier, Senior Machine Learning Engineer and Tech Lead in the Predi...
 
New machine learning challenges at Criteo
New machine learning challenges at CriteoNew machine learning challenges at Criteo
New machine learning challenges at Criteo
 
Making advertising personal, 4th NL Recommenders Meetup
Making advertising personal, 4th NL Recommenders MeetupMaking advertising personal, 4th NL Recommenders Meetup
Making advertising personal, 4th NL Recommenders Meetup
 
New challenges for scalable machine learning in online advertising
New challenges for scalable machine learning in online advertisingNew challenges for scalable machine learning in online advertising
New challenges for scalable machine learning in online advertising
 
Flexible recommender systems based on graphs
Flexible recommender systems based on graphsFlexible recommender systems based on graphs
Flexible recommender systems based on graphs
 
Preference Elicitation in Mangaki: Is Your Taste Kinda Weird?
Preference Elicitation in Mangaki: Is Your Taste Kinda Weird?Preference Elicitation in Mangaki: Is Your Taste Kinda Weird?
Preference Elicitation in Mangaki: Is Your Taste Kinda Weird?
 
New tools from the bandit literature to improve A/B Testing
New tools from the bandit literature to improve A/B TestingNew tools from the bandit literature to improve A/B Testing
New tools from the bandit literature to improve A/B Testing
 
Recommendation @ PriceMinister-Rakuten - Road to personalization
Recommendation @ PriceMinister-Rakuten - Road to personalizationRecommendation @ PriceMinister-Rakuten - Road to personalization
Recommendation @ PriceMinister-Rakuten - Road to personalization
 
Rakuten Institute of Technology Paris
Rakuten Institute of Technology ParisRakuten Institute of Technology Paris
Rakuten Institute of Technology Paris
 
Recommendation @Deezer
Recommendation @DeezerRecommendation @Deezer
Recommendation @Deezer
 
Tailor-made personalization and recommendation - Sailendra
Tailor-made personalization and recommendation - SailendraTailor-made personalization and recommendation - Sailendra
Tailor-made personalization and recommendation - Sailendra
 
Story of the algorithms behind Deezer Flow
Story of the algorithms behind Deezer FlowStory of the algorithms behind Deezer Flow
Story of the algorithms behind Deezer Flow
 
Using Neural Networks to predict user ratings
Using Neural Networks to predict user ratingsUsing Neural Networks to predict user ratings
Using Neural Networks to predict user ratings
 
Dictionary Learning for Massive Matrix Factorization
Dictionary Learning for Massive Matrix FactorizationDictionary Learning for Massive Matrix Factorization
Dictionary Learning for Massive Matrix Factorization
 
What can bring library metadata to the web? Trust, links and love
What can bring library metadata to the web? Trust, links and loveWhat can bring library metadata to the web? Trust, links and love
What can bring library metadata to the web? Trust, links and love
 
Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...
Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...
Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...
 
Xavier Amatriain, VP of Engineering, Quora at MLconf SF - 11/13/15
Xavier Amatriain, VP of Engineering, Quora at MLconf SF - 11/13/15Xavier Amatriain, VP of Engineering, Quora at MLconf SF - 11/13/15
Xavier Amatriain, VP of Engineering, Quora at MLconf SF - 11/13/15
 
Dr. Ike Nassi, Founder, TidalScale at MLconf NYC - 4/15/16
Dr. Ike Nassi, Founder, TidalScale at MLconf NYC - 4/15/16Dr. Ike Nassi, Founder, TidalScale at MLconf NYC - 4/15/16
Dr. Ike Nassi, Founder, TidalScale at MLconf NYC - 4/15/16
 
Ewa Dominowska, Engineering Manager, Facebook at MLconf SEA - 5/20/16
Ewa Dominowska, Engineering Manager, Facebook at MLconf SEA - 5/20/16Ewa Dominowska, Engineering Manager, Facebook at MLconf SEA - 5/20/16
Ewa Dominowska, Engineering Manager, Facebook at MLconf SEA - 5/20/16
 
Yael Elmatad, Senior Data Scientist, Tapad at MLconf NYC - 4/15/16
Yael Elmatad, Senior Data Scientist, Tapad at MLconf NYC - 4/15/16Yael Elmatad, Senior Data Scientist, Tapad at MLconf NYC - 4/15/16
Yael Elmatad, Senior Data Scientist, Tapad at MLconf NYC - 4/15/16
 

Similar to Recommendation at scale

Criteo TektosData Meetup
Criteo TektosData MeetupCriteo TektosData Meetup
Criteo TektosData MeetupOlivier Koch
 
Simon Dollé_Large-scale Real-time recommendation at Criteo
Simon Dollé_Large-scale Real-time recommendation at Criteo Simon Dollé_Large-scale Real-time recommendation at Criteo
Simon Dollé_Large-scale Real-time recommendation at Criteo Dataconomy Media
 
RecSys 2015: Large-scale real-time product recommendation at Criteo
RecSys 2015: Large-scale real-time product recommendation at CriteoRecSys 2015: Large-scale real-time product recommendation at Criteo
RecSys 2015: Large-scale real-time product recommendation at CriteoRomain Lerallut
 
In-Memory Data Management Goes Mainstream - OpenSlava 2015
In-Memory Data Management Goes Mainstream - OpenSlava 2015In-Memory Data Management Goes Mainstream - OpenSlava 2015
In-Memory Data Management Goes Mainstream - OpenSlava 2015Software AG
 
Top Trends in Building Data Lakes for Machine Learning and AI
Top Trends in Building Data Lakes for Machine Learning and AI Top Trends in Building Data Lakes for Machine Learning and AI
Top Trends in Building Data Lakes for Machine Learning and AI Holden Ackerman
 
IMCSummit 2015 - 1 IT Business - The Evolution of Pivotal Gemfire
IMCSummit 2015 - 1 IT Business  - The Evolution of Pivotal GemfireIMCSummit 2015 - 1 IT Business  - The Evolution of Pivotal Gemfire
IMCSummit 2015 - 1 IT Business - The Evolution of Pivotal GemfireIn-Memory Computing Summit
 
There are 250 Database products, are you running the right one?
There are 250 Database products, are you running the right one?There are 250 Database products, are you running the right one?
There are 250 Database products, are you running the right one?Aerospike, Inc.
 
Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...
Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...
Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...Big Data Spain
 
Streaming Analytics Comparison of Open Source Frameworks, Products, Cloud Ser...
Streaming Analytics Comparison of Open Source Frameworks, Products, Cloud Ser...Streaming Analytics Comparison of Open Source Frameworks, Products, Cloud Ser...
Streaming Analytics Comparison of Open Source Frameworks, Products, Cloud Ser...Kai Wähner
 
Tech Job Conference: Software Engineer @Criteo
Tech Job Conference: Software Engineer @CriteoTech Job Conference: Software Engineer @Criteo
Tech Job Conference: Software Engineer @CriteoGilles Legoux
 
TIBCO Advanced Analytics Meetup (TAAM) - June 2015
TIBCO Advanced Analytics Meetup (TAAM) - June 2015TIBCO Advanced Analytics Meetup (TAAM) - June 2015
TIBCO Advanced Analytics Meetup (TAAM) - June 2015Bipin Singh
 
Breaking Up the Monolith While Migrating to AWS (GPSTEC320) - AWS re:Invent 2018
Breaking Up the Monolith While Migrating to AWS (GPSTEC320) - AWS re:Invent 2018Breaking Up the Monolith While Migrating to AWS (GPSTEC320) - AWS re:Invent 2018
Breaking Up the Monolith While Migrating to AWS (GPSTEC320) - AWS re:Invent 2018Amazon Web Services
 
Connecta Event: Big Query och dataanalys med Google Cloud Platform
Connecta Event: Big Query och dataanalys med Google Cloud PlatformConnecta Event: Big Query och dataanalys med Google Cloud Platform
Connecta Event: Big Query och dataanalys med Google Cloud PlatformConnectaDigital
 
Enterprise Cloud Adoption
Enterprise Cloud Adoption Enterprise Cloud Adoption
Enterprise Cloud Adoption Tom Laszewski
 
M&E Leadership Session: The State of the Industry, What's New from AWS for M&...
M&E Leadership Session: The State of the Industry, What's New from AWS for M&...M&E Leadership Session: The State of the Industry, What's New from AWS for M&...
M&E Leadership Session: The State of the Industry, What's New from AWS for M&...Amazon Web Services
 
TiDB DevCon 2020 Opening Keynote
TiDB DevCon 2020 Opening Keynote TiDB DevCon 2020 Opening Keynote
TiDB DevCon 2020 Opening Keynote PingCAP
 
Real-Time Web Analytics with Amazon Kinesis Data Analytics (ADT401) - AWS re:...
Real-Time Web Analytics with Amazon Kinesis Data Analytics (ADT401) - AWS re:...Real-Time Web Analytics with Amazon Kinesis Data Analytics (ADT401) - AWS re:...
Real-Time Web Analytics with Amazon Kinesis Data Analytics (ADT401) - AWS re:...Amazon Web Services
 
How to run Real Time processing on Big Data / Ron Zavner (GigaSpaces)
How to run Real Time processing on Big Data / Ron Zavner (GigaSpaces)How to run Real Time processing on Big Data / Ron Zavner (GigaSpaces)
How to run Real Time processing on Big Data / Ron Zavner (GigaSpaces)Ontico
 

Similar to Recommendation at scale (20)

Criteo TektosData Meetup
Criteo TektosData MeetupCriteo TektosData Meetup
Criteo TektosData Meetup
 
Simon Dollé_Large-scale Real-time recommendation at Criteo
Simon Dollé_Large-scale Real-time recommendation at Criteo Simon Dollé_Large-scale Real-time recommendation at Criteo
Simon Dollé_Large-scale Real-time recommendation at Criteo
 
RecSys 2015: Large-scale real-time product recommendation at Criteo
RecSys 2015: Large-scale real-time product recommendation at CriteoRecSys 2015: Large-scale real-time product recommendation at Criteo
RecSys 2015: Large-scale real-time product recommendation at Criteo
 
In-Memory Data Management Goes Mainstream - OpenSlava 2015
In-Memory Data Management Goes Mainstream - OpenSlava 2015In-Memory Data Management Goes Mainstream - OpenSlava 2015
In-Memory Data Management Goes Mainstream - OpenSlava 2015
 
Criteo Couchbase live 2015
Criteo Couchbase live 2015Criteo Couchbase live 2015
Criteo Couchbase live 2015
 
Top Trends in Building Data Lakes for Machine Learning and AI
Top Trends in Building Data Lakes for Machine Learning and AI Top Trends in Building Data Lakes for Machine Learning and AI
Top Trends in Building Data Lakes for Machine Learning and AI
 
IMCSummit 2015 - 1 IT Business - The Evolution of Pivotal Gemfire
IMCSummit 2015 - 1 IT Business  - The Evolution of Pivotal GemfireIMCSummit 2015 - 1 IT Business  - The Evolution of Pivotal Gemfire
IMCSummit 2015 - 1 IT Business - The Evolution of Pivotal Gemfire
 
There are 250 Database products, are you running the right one?
There are 250 Database products, are you running the right one?There are 250 Database products, are you running the right one?
There are 250 Database products, are you running the right one?
 
Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...
Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...
Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...
 
Streaming Analytics Comparison of Open Source Frameworks, Products, Cloud Ser...
Streaming Analytics Comparison of Open Source Frameworks, Products, Cloud Ser...Streaming Analytics Comparison of Open Source Frameworks, Products, Cloud Ser...
Streaming Analytics Comparison of Open Source Frameworks, Products, Cloud Ser...
 
Tech Job Conference: Software Engineer @Criteo
Tech Job Conference: Software Engineer @CriteoTech Job Conference: Software Engineer @Criteo
Tech Job Conference: Software Engineer @Criteo
 
TIBCO Advanced Analytics Meetup (TAAM) - June 2015
TIBCO Advanced Analytics Meetup (TAAM) - June 2015TIBCO Advanced Analytics Meetup (TAAM) - June 2015
TIBCO Advanced Analytics Meetup (TAAM) - June 2015
 
Breaking Up the Monolith While Migrating to AWS (GPSTEC320) - AWS re:Invent 2018
Breaking Up the Monolith While Migrating to AWS (GPSTEC320) - AWS re:Invent 2018Breaking Up the Monolith While Migrating to AWS (GPSTEC320) - AWS re:Invent 2018
Breaking Up the Monolith While Migrating to AWS (GPSTEC320) - AWS re:Invent 2018
 
Connecta Event: Big Query och dataanalys med Google Cloud Platform
Connecta Event: Big Query och dataanalys med Google Cloud PlatformConnecta Event: Big Query och dataanalys med Google Cloud Platform
Connecta Event: Big Query och dataanalys med Google Cloud Platform
 
Enterprise Cloud Adoption
Enterprise Cloud Adoption Enterprise Cloud Adoption
Enterprise Cloud Adoption
 
M&E Leadership Session: The State of the Industry, What's New from AWS for M&...
M&E Leadership Session: The State of the Industry, What's New from AWS for M&...M&E Leadership Session: The State of the Industry, What's New from AWS for M&...
M&E Leadership Session: The State of the Industry, What's New from AWS for M&...
 
TiDB DevCon 2020 Opening Keynote
TiDB DevCon 2020 Opening Keynote TiDB DevCon 2020 Opening Keynote
TiDB DevCon 2020 Opening Keynote
 
Bos365 April 2015
Bos365 April 2015Bos365 April 2015
Bos365 April 2015
 
Real-Time Web Analytics with Amazon Kinesis Data Analytics (ADT401) - AWS re:...
Real-Time Web Analytics with Amazon Kinesis Data Analytics (ADT401) - AWS re:...Real-Time Web Analytics with Amazon Kinesis Data Analytics (ADT401) - AWS re:...
Real-Time Web Analytics with Amazon Kinesis Data Analytics (ADT401) - AWS re:...
 
How to run Real Time processing on Big Data / Ron Zavner (GigaSpaces)
How to run Real Time processing on Big Data / Ron Zavner (GigaSpaces)How to run Real Time processing on Big Data / Ron Zavner (GigaSpaces)
How to run Real Time processing on Big Data / Ron Zavner (GigaSpaces)
 

Recently uploaded

04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxNLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxBoston Institute of Analytics
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理e4aez8ss
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfJohn Sterrett
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptxAmazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptxAbdelrhman abooda
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Colleen Farrelly
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档208367051
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一fhwihughh
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queensdataanalyticsqueen03
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...limedy534
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceSapana Sha
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...soniya singh
 

Recently uploaded (20)

04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxNLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptxAmazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
Call Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort ServiceCall Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort Service
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts Service
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 

Recommendation at scale

  • 1. Copyright © 2015 Criteo Recommendation at scale Simon Dollé Berlin Buzzwords, June 2nd, 2015
  • 2. Copyright © 2015 Criteo 2
  • 3. Copyright © 2015 Criteo We buy • Inventory ! (ad spaces) • Billions of times a day • All over the Internet • For 95% of the population Where is the need for tech ? We sell • Clicks ! • (that convert) • (that convert a lot) We take the risk You pay only for what you get
  • 4. Copyright © 2015 Criteo 4 Physical infrastructure 6 Data centers on 3 continents operated and conceived in-house ~ 12000 servers, largest Hadoop cluster in Europe More than 35 PB of storage Big Data Traffic 800 k HTTP requests / sec (peak activity) 29000 impressions /sec (peak activity) Less than 10 ms to process an RTB request
  • 5. Copyright © 2015 Criteo Data Sources Ad display dataUser behavior dataCatalog data
  • 13. Copyright © 2015 Criteo Offline • Similarities computed on browsing data • Based on coevents • Computed on Hadoop cluster • Map reduce jobs, pig • Takes around 12 hours • Tradeoff performance, speed
  • 15. Copyright © 2015 Criteo 0.04 0.02 0.02 0.05
  • 16. Copyright © 2015 Criteo 0.04 0.020.020.05
  • 17. Copyright © 2015 Criteo 0.04 0.020.020.05
  • 18. Copyright © 2015 Criteo ML model • Logistic regression models • Features Product-specific User-specific User-product interactions Display-specific
  • 19. Copyright © 2015 Criteo Online optimizations • Algorithmic • Use simpler ML model • Quickly discard candidates • Technical • Memcache + local cache • Async I/O
  • 20. Copyright © 2015 Criteo Upcoming challenges • Long(er)-term user profiles • More and better product information (images, semantic, NLP) • Instant-update of similarities • (because batch computation is soooo last year)
  • 21. Copyright © 2015 Criteo Fancy a try ? On your own: With us ! http://labs.criteo.com/jobs/ • Our 1st public dataset is online: http://bit.ly/1vgw2XC • 4GB display and click data, Kaggle challenge in 2014 • NEW : 1TB dataset released a few weeks ago: http://bit.ly/1PyH4Vq • Hosted on Microsoft Azure, just waiting for you
  • 22. Copyright © 2015 Criteo Questions?
  • 23. Copyright © 2015 Criteo Thank you ! s.dolle@criteo.com