SlideShare a Scribd company logo
Copyright © 2014 Criteo
Making Advertising Personal
Large Scale Real-Time Product Recommendation at Criteo
Olivier Koch & Romain Lerallut
May 19th, 2015
Copyright © 2014 Criteo
Performance Advertising
We buy
• Inventory ! (ad spaces)
• Billions of times a day
• All over the Internet
• For 95% of the population
Where is the need for tech ?
We sell
• Clicks !
• (that convert)
• (that convert a lot)
We take the risk
You pay only for what you get
Copyright © 2013. Confidential
International infrastructure
Key figures
Traffic
550 k HTTP requests / sec (peak activity)
23000 impressions /sec (peak activity)
180 k requests / sec on RTB (average)
Less than 10 ms to process an RTB request
3
Figures of May2013
Physical infrastructure
6 Data centers on 3 continents operated and conceived in-
house
~ 12000 servers, largest Hadoop cluster in Europe
Availability / Uptime >99.95%
More than 20 PB of storage Big Data
Copyright © 2012. Confidential 4
Arbitrage
•Should we bid?
•At which price?
Recommendation
•Which products should
we display?
Graphical
optimization
•Big image vs small image
•Background color, ...
Prediction
•Generic prediction engine
•Specific models trained on TBs of data
Global Engine Architecture
Copyright © 2014 Criteo
Building ads with
personalized content in realtime
Recommendation for Advertising
2.5 billion products
~ 8ms response time
Copyright © 2014 Criteo
Data Sources
• Catalog data
• Feed provided by the merchants
• User behavior data
• Large scale intent data
• All visits to merchant websites
• Page views, basket, sales events
• Ad display data
• Displayed and clicked ads
6
Copyright © 2012. Confidential
7
Recommendation execution flow
Candidates
Generation
• Get candidates from all
sources using user historical
products and bestofs
Candidates
Aggregation
• Remove duplicates
• Aggregate features
Preselection
• Call degraded prediction to
decrease number of
candidates
Scoring
• Call full prediction model to
score each candidate
Winner Selector
• Select N products on score,
randomization
Glup Logging
• Log recommended products,
prediction variables
Copyright © 2014 Criteo
Issue #1: Retrieving user-specific products
• The need: storing [user → interesting products] vectors
• Difficult to store and retrieve at scale (900+ mln users)
• Hard to keep up to date
• Using seen products as a proxy
• Store the [user → viewed products] vectors
• Easier to maintain
• Store [viewed product → interesting products] vectors
• Based on aggregated user behavior data
• Computable offline
• Final ranking by a ML model
8
Copyright © 2014 Criteo
Issue #2: Scoring products
• Need to fuse data from several sources
• Product-specific
• User-specific
• User-product interactions
• Display-specific
• 1st solution: regression model
• Predict P(product click then sale)
• Easy to evaluate
• 2nd solution: ranking model
• More appropriate for our needs
• Still maximizing post-click sales
• Can be evaluated only on multi-product banners
9
Copyright © 2014 Criteo
Issue #3: Picking the right products
• Several questions:
• User-product fatigue
• Independent product choice assumption
• Explore / exploit
• Solution 1: Randomization in the banners
• Keep independent products assumption
• Separate optimization process to shuffle displayed items
• Solution 2: A better scoring model
• Score a full banner, not independent products
• Store all product display counts
10
Copyright © 2014 Criteo
Issue #4: 8ms response time ! Tweaking for performance
• CTR / Sales prediction takes ~40 µs per candidate using in-house library
• 2-step prediction:
• A fast pass to remove most of the candidates
• A slow pass to score accurately the remaining candidates
And the technical fine print:
• All real-time code in C#
• Async I/O for better efficiency
• HAProxy to scale the front-end
• Memcached to store all required data in memory
11
Copyright © 2014 Criteo
Upcoming challenges
• Long(er)-term user profiles
• More and better product information (images, semantic, NLP)
• Vertical-optimized engine
• Classifieds (catalog-free recommendation ?)
• Travel
• Instant-update of similarities
• (because batch computation is soooo last year)
12
Copyright © 2014 Criteo
Fancy a try ?
13
On your own:
With us !
http://www.criteo.com/careers/
• Our 1st public dataset is online: http://bit.ly/1vgw2XC
• 4GB display and click data, Kaggle challenge in 2014
• NEW : 1TB dataset released a few weeks ago
• Hosted on Microsoft Azure, just waiting for you
Copyright © 2014 Criteo
Questions?
Copyright © 2014 Criteo
Thank you !
o.koch@criteo.com
r.lerallut@criteo.com

More Related Content

What's hot

criteo-performance-advertising-playbook-2015
criteo-performance-advertising-playbook-2015criteo-performance-advertising-playbook-2015
criteo-performance-advertising-playbook-2015
Carolyn Bednarz
 
Introduction Criteo - 2.0
Introduction Criteo - 2.0Introduction Criteo - 2.0
Introduction Criteo - 2.0
Scott Turecek
 
Simon Dollé_Large-scale Real-time recommendation at Criteo
Simon Dollé_Large-scale Real-time recommendation at Criteo Simon Dollé_Large-scale Real-time recommendation at Criteo
Simon Dollé_Large-scale Real-time recommendation at Criteo
Dataconomy Media
 
Criteo Infrastructure (Platform) Meetup
Criteo Infrastructure (Platform) MeetupCriteo Infrastructure (Platform) Meetup
Criteo Infrastructure (Platform) Meetup
Ibrahim Abubakari
 
Criteo Couchbase live 2015
Criteo Couchbase live 2015Criteo Couchbase live 2015
Criteo Couchbase live 2015
Nicolasgmail.com Helleringer
 
Your Future With Content Manager OnDemand
Your Future With Content Manager OnDemandYour Future With Content Manager OnDemand
Your Future With Content Manager OnDemand
Zia Consulting
 
Aws community day keynote
Aws community day keynoteAws community day keynote
Aws community day keynote
James Samuel
 
Ad Server Solutions - ad server ad exchange
Ad Server Solutions - ad server ad exchangeAd Server Solutions - ad server ad exchange
Ad Server Solutions - ad server ad exchange
Ad Server Solutions
 
3 Minute Introduction
3 Minute Introduction3 Minute Introduction
3 Minute Introduction
Julian Tol
 
Optimize your Cloud Purchase Strategy
Optimize your Cloud Purchase Strategy Optimize your Cloud Purchase Strategy
Optimize your Cloud Purchase Strategy
Jan Thielscher
 
Google’s strategy to gain an edge on the Cloud
Google’s strategy to gain an edge on the CloudGoogle’s strategy to gain an edge on the Cloud
Google’s strategy to gain an edge on the Cloud
RohitSingh1837
 
Successful IoT projects - a few lessons
Successful IoT projects - a few lessonsSuccessful IoT projects - a few lessons
Successful IoT projects - a few lessons
Jan Thielscher
 
Online Ad Serving
Online Ad ServingOnline Ad Serving
Online Ad Serving
Neha Gupta
 
Criteo's Ad Week 2012 presentation - Big Data and the Value of Clickers
Criteo's Ad Week 2012 presentation - Big Data and the Value of ClickersCriteo's Ad Week 2012 presentation - Big Data and the Value of Clickers
Criteo's Ad Week 2012 presentation - Big Data and the Value of Clickers
Criteo
 
Axonite Campaign Automation Infrastructure for HasOffers
Axonite Campaign Automation Infrastructure for HasOffersAxonite Campaign Automation Infrastructure for HasOffers
Axonite Campaign Automation Infrastructure for HasOffers
Yuval Shefler
 
CrowdCast Pitch
CrowdCast PitchCrowdCast Pitch
CrowdCast Pitch
Shahrukh Ghazali
 

What's hot (16)

criteo-performance-advertising-playbook-2015
criteo-performance-advertising-playbook-2015criteo-performance-advertising-playbook-2015
criteo-performance-advertising-playbook-2015
 
Introduction Criteo - 2.0
Introduction Criteo - 2.0Introduction Criteo - 2.0
Introduction Criteo - 2.0
 
Simon Dollé_Large-scale Real-time recommendation at Criteo
Simon Dollé_Large-scale Real-time recommendation at Criteo Simon Dollé_Large-scale Real-time recommendation at Criteo
Simon Dollé_Large-scale Real-time recommendation at Criteo
 
Criteo Infrastructure (Platform) Meetup
Criteo Infrastructure (Platform) MeetupCriteo Infrastructure (Platform) Meetup
Criteo Infrastructure (Platform) Meetup
 
Criteo Couchbase live 2015
Criteo Couchbase live 2015Criteo Couchbase live 2015
Criteo Couchbase live 2015
 
Your Future With Content Manager OnDemand
Your Future With Content Manager OnDemandYour Future With Content Manager OnDemand
Your Future With Content Manager OnDemand
 
Aws community day keynote
Aws community day keynoteAws community day keynote
Aws community day keynote
 
Ad Server Solutions - ad server ad exchange
Ad Server Solutions - ad server ad exchangeAd Server Solutions - ad server ad exchange
Ad Server Solutions - ad server ad exchange
 
3 Minute Introduction
3 Minute Introduction3 Minute Introduction
3 Minute Introduction
 
Optimize your Cloud Purchase Strategy
Optimize your Cloud Purchase Strategy Optimize your Cloud Purchase Strategy
Optimize your Cloud Purchase Strategy
 
Google’s strategy to gain an edge on the Cloud
Google’s strategy to gain an edge on the CloudGoogle’s strategy to gain an edge on the Cloud
Google’s strategy to gain an edge on the Cloud
 
Successful IoT projects - a few lessons
Successful IoT projects - a few lessonsSuccessful IoT projects - a few lessons
Successful IoT projects - a few lessons
 
Online Ad Serving
Online Ad ServingOnline Ad Serving
Online Ad Serving
 
Criteo's Ad Week 2012 presentation - Big Data and the Value of Clickers
Criteo's Ad Week 2012 presentation - Big Data and the Value of ClickersCriteo's Ad Week 2012 presentation - Big Data and the Value of Clickers
Criteo's Ad Week 2012 presentation - Big Data and the Value of Clickers
 
Axonite Campaign Automation Infrastructure for HasOffers
Axonite Campaign Automation Infrastructure for HasOffersAxonite Campaign Automation Infrastructure for HasOffers
Axonite Campaign Automation Infrastructure for HasOffers
 
CrowdCast Pitch
CrowdCast PitchCrowdCast Pitch
CrowdCast Pitch
 

Viewers also liked

C# development workflow @ criteo
C# development workflow @ criteoC# development workflow @ criteo
C# development workflow @ criteo
Ibrahim Abubakari
 
RecsysFR: Criteo presentation
RecsysFR: Criteo presentationRecsysFR: Criteo presentation
RecsysFR: Criteo presentation
recsysfr
 
Criteo. Reach people, not devices!
Criteo. Reach people, not devices!Criteo. Reach people, not devices!
Criteo. Reach people, not devices!
HybridRussia
 
Why reinvent the wheel at Criteo?
Why reinvent the wheel at Criteo? Why reinvent the wheel at Criteo?
Why reinvent the wheel at Criteo?
Criteolabs
 
Couchbase live 2016
Couchbase live 2016Couchbase live 2016
Couchbase live 2016
Pierre Mavro
 
Response prediction for display advertising - WSDM 2014
Response prediction for display advertising - WSDM 2014Response prediction for display advertising - WSDM 2014
Response prediction for display advertising - WSDM 2014
Olivier Chapelle
 
Infographic: How Professional Services Automation can transform your Company!
Infographic: How Professional Services Automation can transform your Company!Infographic: How Professional Services Automation can transform your Company!
Infographic: How Professional Services Automation can transform your Company!
Changepoint
 
Saintjo Two AV4
Saintjo Two AV4Saintjo Two AV4
Saintjo Two AV4
Bernard Alexandre
 
Bizarre... vous avez dit bizarre - Paris Monitoring meetup #2
Bizarre... vous avez dit bizarre - Paris Monitoring meetup #2Bizarre... vous avez dit bizarre - Paris Monitoring meetup #2
Bizarre... vous avez dit bizarre - Paris Monitoring meetup #2
Paris Monitoring
 
Etude Criteo - la valeur réelle des internautes qui cliquent - 26 Juin 2012
Etude Criteo - la valeur réelle des internautes qui cliquent  - 26 Juin 2012Etude Criteo - la valeur réelle des internautes qui cliquent  - 26 Juin 2012
Etude Criteo - la valeur réelle des internautes qui cliquent - 26 Juin 2012Romain Fonnier
 
Morning with MongoDB Paris 2012 - Cas d'usages courant en entreprise. Présent...
Morning with MongoDB Paris 2012 - Cas d'usages courant en entreprise. Présent...Morning with MongoDB Paris 2012 - Cas d'usages courant en entreprise. Présent...
Morning with MongoDB Paris 2012 - Cas d'usages courant en entreprise. Présent...
MongoDB
 
AWS Summit Paris - Track 1 - Session 2 - Designez vos architectures pour plus...
AWS Summit Paris - Track 1 - Session 2 - Designez vos architectures pour plus...AWS Summit Paris - Track 1 - Session 2 - Designez vos architectures pour plus...
AWS Summit Paris - Track 1 - Session 2 - Designez vos architectures pour plus...
Amazon Web Services
 
Hadoop summit-ams-2014-04-03
Hadoop summit-ams-2014-04-03Hadoop summit-ams-2014-04-03
Hadoop summit-ams-2014-04-03
SDanzanvilliersCriteo
 
TechFeedというテクノロジーキュレーションサービスを作った話
TechFeedというテクノロジーキュレーションサービスを作った話TechFeedというテクノロジーキュレーションサービスを作った話
TechFeedというテクノロジーキュレーションサービスを作った話
yoshikawa_t
 
Using Neural Networks to predict user ratings
Using Neural Networks to predict user ratingsUsing Neural Networks to predict user ratings
Using Neural Networks to predict user ratings
recsysfr
 
Dictionary Learning for Massive Matrix Factorization
Dictionary Learning for Massive Matrix FactorizationDictionary Learning for Massive Matrix Factorization
Dictionary Learning for Massive Matrix Factorization
recsysfr
 
Meta-Prod2Vec: Simple Product Embeddings with Side-Information
Meta-Prod2Vec: Simple Product Embeddings with Side-InformationMeta-Prod2Vec: Simple Product Embeddings with Side-Information
Meta-Prod2Vec: Simple Product Embeddings with Side-Information
recsysfr
 
CONTENT2VEC: a Joint Architecture to use Product Image and Text for the task ...
CONTENT2VEC: a Joint Architecture to use Product Image and Text for the task ...CONTENT2VEC: a Joint Architecture to use Product Image and Text for the task ...
CONTENT2VEC: a Joint Architecture to use Product Image and Text for the task ...
recsysfr
 

Viewers also liked (18)

C# development workflow @ criteo
C# development workflow @ criteoC# development workflow @ criteo
C# development workflow @ criteo
 
RecsysFR: Criteo presentation
RecsysFR: Criteo presentationRecsysFR: Criteo presentation
RecsysFR: Criteo presentation
 
Criteo. Reach people, not devices!
Criteo. Reach people, not devices!Criteo. Reach people, not devices!
Criteo. Reach people, not devices!
 
Why reinvent the wheel at Criteo?
Why reinvent the wheel at Criteo? Why reinvent the wheel at Criteo?
Why reinvent the wheel at Criteo?
 
Couchbase live 2016
Couchbase live 2016Couchbase live 2016
Couchbase live 2016
 
Response prediction for display advertising - WSDM 2014
Response prediction for display advertising - WSDM 2014Response prediction for display advertising - WSDM 2014
Response prediction for display advertising - WSDM 2014
 
Infographic: How Professional Services Automation can transform your Company!
Infographic: How Professional Services Automation can transform your Company!Infographic: How Professional Services Automation can transform your Company!
Infographic: How Professional Services Automation can transform your Company!
 
Saintjo Two AV4
Saintjo Two AV4Saintjo Two AV4
Saintjo Two AV4
 
Bizarre... vous avez dit bizarre - Paris Monitoring meetup #2
Bizarre... vous avez dit bizarre - Paris Monitoring meetup #2Bizarre... vous avez dit bizarre - Paris Monitoring meetup #2
Bizarre... vous avez dit bizarre - Paris Monitoring meetup #2
 
Etude Criteo - la valeur réelle des internautes qui cliquent - 26 Juin 2012
Etude Criteo - la valeur réelle des internautes qui cliquent  - 26 Juin 2012Etude Criteo - la valeur réelle des internautes qui cliquent  - 26 Juin 2012
Etude Criteo - la valeur réelle des internautes qui cliquent - 26 Juin 2012
 
Morning with MongoDB Paris 2012 - Cas d'usages courant en entreprise. Présent...
Morning with MongoDB Paris 2012 - Cas d'usages courant en entreprise. Présent...Morning with MongoDB Paris 2012 - Cas d'usages courant en entreprise. Présent...
Morning with MongoDB Paris 2012 - Cas d'usages courant en entreprise. Présent...
 
AWS Summit Paris - Track 1 - Session 2 - Designez vos architectures pour plus...
AWS Summit Paris - Track 1 - Session 2 - Designez vos architectures pour plus...AWS Summit Paris - Track 1 - Session 2 - Designez vos architectures pour plus...
AWS Summit Paris - Track 1 - Session 2 - Designez vos architectures pour plus...
 
Hadoop summit-ams-2014-04-03
Hadoop summit-ams-2014-04-03Hadoop summit-ams-2014-04-03
Hadoop summit-ams-2014-04-03
 
TechFeedというテクノロジーキュレーションサービスを作った話
TechFeedというテクノロジーキュレーションサービスを作った話TechFeedというテクノロジーキュレーションサービスを作った話
TechFeedというテクノロジーキュレーションサービスを作った話
 
Using Neural Networks to predict user ratings
Using Neural Networks to predict user ratingsUsing Neural Networks to predict user ratings
Using Neural Networks to predict user ratings
 
Dictionary Learning for Massive Matrix Factorization
Dictionary Learning for Massive Matrix FactorizationDictionary Learning for Massive Matrix Factorization
Dictionary Learning for Massive Matrix Factorization
 
Meta-Prod2Vec: Simple Product Embeddings with Side-Information
Meta-Prod2Vec: Simple Product Embeddings with Side-InformationMeta-Prod2Vec: Simple Product Embeddings with Side-Information
Meta-Prod2Vec: Simple Product Embeddings with Side-Information
 
CONTENT2VEC: a Joint Architecture to use Product Image and Text for the task ...
CONTENT2VEC: a Joint Architecture to use Product Image and Text for the task ...CONTENT2VEC: a Joint Architecture to use Product Image and Text for the task ...
CONTENT2VEC: a Joint Architecture to use Product Image and Text for the task ...
 

Similar to Making advertising personal, 4th NL Recommenders Meetup

Recommendation at scale
Recommendation at scaleRecommendation at scale
Recommendation at scale
simondolle
 
An Introduction to Apache Geode (incubating)
An Introduction to Apache Geode (incubating)An Introduction to Apache Geode (incubating)
An Introduction to Apache Geode (incubating)
Anthony Baker
 
Open Sourcing GemFire - Apache Geode
Open Sourcing GemFire - Apache GeodeOpen Sourcing GemFire - Apache Geode
Open Sourcing GemFire - Apache Geode
Apache Geode
 
GraphTalk Berlin - Einführung in Graphdatenbanken
GraphTalk Berlin - Einführung in GraphdatenbankenGraphTalk Berlin - Einführung in Graphdatenbanken
GraphTalk Berlin - Einführung in Graphdatenbanken
Neo4j
 
Webinar: 2 Billion Data Points Each Day
Webinar: 2 Billion Data Points Each DayWebinar: 2 Billion Data Points Each Day
Webinar: 2 Billion Data Points Each Day
DataStax
 
The New Model
The New ModelThe New Model
The New Model
David Kaiser
 
10 Lessons Learned from Meeting with 150 Banks Across the Globe
10 Lessons Learned from Meeting with 150 Banks Across the Globe10 Lessons Learned from Meeting with 150 Banks Across the Globe
10 Lessons Learned from Meeting with 150 Banks Across the Globe
DataWorks Summit
 
OC Big Data Monthly Meetup #6 - Session 1 - IBM
OC Big Data Monthly Meetup #6 - Session 1 - IBMOC Big Data Monthly Meetup #6 - Session 1 - IBM
OC Big Data Monthly Meetup #6 - Session 1 - IBM
Big Data Joe™ Rossi
 
SD Big Data Monthly Meetup #4 - Session 1 - IBM
SD Big Data Monthly Meetup #4 - Session 1 - IBMSD Big Data Monthly Meetup #4 - Session 1 - IBM
SD Big Data Monthly Meetup #4 - Session 1 - IBM
Big Data Joe™ Rossi
 
RTBkit Introduction & Best Practices
RTBkit Introduction & Best PracticesRTBkit Introduction & Best Practices
RTBkit Introduction & Best Practices
Datacratic
 
Hybris @ Neev
Hybris @ NeevHybris @ Neev
Hybris @ Neev
Neev Technologies
 
Connecta Event: Big Query och dataanalys med Google Cloud Platform
Connecta Event: Big Query och dataanalys med Google Cloud PlatformConnecta Event: Big Query och dataanalys med Google Cloud Platform
Connecta Event: Big Query och dataanalys med Google Cloud Platform
ConnectaDigital
 
The Business Justification for APM
The Business Justification for APMThe Business Justification for APM
The Business Justification for APM
Jonah Kowall
 
Conclusion Connect state of IoT 2019 Review io t solutions world congress 2019
Conclusion Connect state of IoT 2019 Review io t solutions world congress 2019Conclusion Connect state of IoT 2019 Review io t solutions world congress 2019
Conclusion Connect state of IoT 2019 Review io t solutions world congress 2019
Conclusion Connect enabling industry 4.0 with IoT
 
Dashlane Mission Teams
Dashlane Mission TeamsDashlane Mission Teams
Dashlane Mission Teams
Dashlane
 
Lean Startup: Reduce 40% go-to-market time & cost on your next product launch
Lean Startup: Reduce 40% go-to-market time & cost on your next product launchLean Startup: Reduce 40% go-to-market time & cost on your next product launch
Lean Startup: Reduce 40% go-to-market time & cost on your next product launch
People10 Technosoft Private Limited
 
AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...
AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...
AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...
Databricks
 
Lean startup 2019
Lean startup 2019Lean startup 2019
Lean startup 2019
John J. H. Oh
 
Self-Service Analytics with Guard Rails
Self-Service Analytics with Guard RailsSelf-Service Analytics with Guard Rails
Self-Service Analytics with Guard Rails
Denodo
 
Containers and VMs and Clouds: Oh My. by Mike Coleman
Containers and VMs and Clouds: Oh My. by Mike ColemanContainers and VMs and Clouds: Oh My. by Mike Coleman
Containers and VMs and Clouds: Oh My. by Mike Coleman
Docker, Inc.
 

Similar to Making advertising personal, 4th NL Recommenders Meetup (20)

Recommendation at scale
Recommendation at scaleRecommendation at scale
Recommendation at scale
 
An Introduction to Apache Geode (incubating)
An Introduction to Apache Geode (incubating)An Introduction to Apache Geode (incubating)
An Introduction to Apache Geode (incubating)
 
Open Sourcing GemFire - Apache Geode
Open Sourcing GemFire - Apache GeodeOpen Sourcing GemFire - Apache Geode
Open Sourcing GemFire - Apache Geode
 
GraphTalk Berlin - Einführung in Graphdatenbanken
GraphTalk Berlin - Einführung in GraphdatenbankenGraphTalk Berlin - Einführung in Graphdatenbanken
GraphTalk Berlin - Einführung in Graphdatenbanken
 
Webinar: 2 Billion Data Points Each Day
Webinar: 2 Billion Data Points Each DayWebinar: 2 Billion Data Points Each Day
Webinar: 2 Billion Data Points Each Day
 
The New Model
The New ModelThe New Model
The New Model
 
10 Lessons Learned from Meeting with 150 Banks Across the Globe
10 Lessons Learned from Meeting with 150 Banks Across the Globe10 Lessons Learned from Meeting with 150 Banks Across the Globe
10 Lessons Learned from Meeting with 150 Banks Across the Globe
 
OC Big Data Monthly Meetup #6 - Session 1 - IBM
OC Big Data Monthly Meetup #6 - Session 1 - IBMOC Big Data Monthly Meetup #6 - Session 1 - IBM
OC Big Data Monthly Meetup #6 - Session 1 - IBM
 
SD Big Data Monthly Meetup #4 - Session 1 - IBM
SD Big Data Monthly Meetup #4 - Session 1 - IBMSD Big Data Monthly Meetup #4 - Session 1 - IBM
SD Big Data Monthly Meetup #4 - Session 1 - IBM
 
RTBkit Introduction & Best Practices
RTBkit Introduction & Best PracticesRTBkit Introduction & Best Practices
RTBkit Introduction & Best Practices
 
Hybris @ Neev
Hybris @ NeevHybris @ Neev
Hybris @ Neev
 
Connecta Event: Big Query och dataanalys med Google Cloud Platform
Connecta Event: Big Query och dataanalys med Google Cloud PlatformConnecta Event: Big Query och dataanalys med Google Cloud Platform
Connecta Event: Big Query och dataanalys med Google Cloud Platform
 
The Business Justification for APM
The Business Justification for APMThe Business Justification for APM
The Business Justification for APM
 
Conclusion Connect state of IoT 2019 Review io t solutions world congress 2019
Conclusion Connect state of IoT 2019 Review io t solutions world congress 2019Conclusion Connect state of IoT 2019 Review io t solutions world congress 2019
Conclusion Connect state of IoT 2019 Review io t solutions world congress 2019
 
Dashlane Mission Teams
Dashlane Mission TeamsDashlane Mission Teams
Dashlane Mission Teams
 
Lean Startup: Reduce 40% go-to-market time & cost on your next product launch
Lean Startup: Reduce 40% go-to-market time & cost on your next product launchLean Startup: Reduce 40% go-to-market time & cost on your next product launch
Lean Startup: Reduce 40% go-to-market time & cost on your next product launch
 
AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...
AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...
AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...
 
Lean startup 2019
Lean startup 2019Lean startup 2019
Lean startup 2019
 
Self-Service Analytics with Guard Rails
Self-Service Analytics with Guard RailsSelf-Service Analytics with Guard Rails
Self-Service Analytics with Guard Rails
 
Containers and VMs and Clouds: Oh My. by Mike Coleman
Containers and VMs and Clouds: Oh My. by Mike ColemanContainers and VMs and Clouds: Oh My. by Mike Coleman
Containers and VMs and Clouds: Oh My. by Mike Coleman
 

Recently uploaded

5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
DanBrown980551
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
Zilliz
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
innovationoecd
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Safe Software
 
Finale of the Year: Apply for Next One!
Finale of the Year: Apply for Next One!Finale of the Year: Apply for Next One!
Finale of the Year: Apply for Next One!
GDSC PJATK
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
kumardaparthi1024
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
Chart Kalyan
 
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying AheadDigital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Wask
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
ssuserfac0301
 
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
alexjohnson7307
 
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Tatiana Kojar
 
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
saastr
 
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
Jeffrey Haguewood
 
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdfNunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
flufftailshop
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
Tomaz Bratanic
 
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxOcean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
SitimaJohn
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
shyamraj55
 
AWS Cloud Cost Optimization Presentation.pptx
AWS Cloud Cost Optimization Presentation.pptxAWS Cloud Cost Optimization Presentation.pptx
AWS Cloud Cost Optimization Presentation.pptx
HarisZaheer8
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
Zilliz
 
Trusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process MiningTrusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process Mining
LucaBarbaro3
 

Recently uploaded (20)

5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
 
Finale of the Year: Apply for Next One!
Finale of the Year: Apply for Next One!Finale of the Year: Apply for Next One!
Finale of the Year: Apply for Next One!
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
 
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying AheadDigital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying Ahead
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
 
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
 
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
 
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
 
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
 
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdfNunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
 
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxOcean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
 
AWS Cloud Cost Optimization Presentation.pptx
AWS Cloud Cost Optimization Presentation.pptxAWS Cloud Cost Optimization Presentation.pptx
AWS Cloud Cost Optimization Presentation.pptx
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
 
Trusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process MiningTrusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process Mining
 

Making advertising personal, 4th NL Recommenders Meetup

  • 1. Copyright © 2014 Criteo Making Advertising Personal Large Scale Real-Time Product Recommendation at Criteo Olivier Koch & Romain Lerallut May 19th, 2015
  • 2. Copyright © 2014 Criteo Performance Advertising We buy • Inventory ! (ad spaces) • Billions of times a day • All over the Internet • For 95% of the population Where is the need for tech ? We sell • Clicks ! • (that convert) • (that convert a lot) We take the risk You pay only for what you get
  • 3. Copyright © 2013. Confidential International infrastructure Key figures Traffic 550 k HTTP requests / sec (peak activity) 23000 impressions /sec (peak activity) 180 k requests / sec on RTB (average) Less than 10 ms to process an RTB request 3 Figures of May2013 Physical infrastructure 6 Data centers on 3 continents operated and conceived in- house ~ 12000 servers, largest Hadoop cluster in Europe Availability / Uptime >99.95% More than 20 PB of storage Big Data
  • 4. Copyright © 2012. Confidential 4 Arbitrage •Should we bid? •At which price? Recommendation •Which products should we display? Graphical optimization •Big image vs small image •Background color, ... Prediction •Generic prediction engine •Specific models trained on TBs of data Global Engine Architecture
  • 5. Copyright © 2014 Criteo Building ads with personalized content in realtime Recommendation for Advertising 2.5 billion products ~ 8ms response time
  • 6. Copyright © 2014 Criteo Data Sources • Catalog data • Feed provided by the merchants • User behavior data • Large scale intent data • All visits to merchant websites • Page views, basket, sales events • Ad display data • Displayed and clicked ads 6
  • 7. Copyright © 2012. Confidential 7 Recommendation execution flow Candidates Generation • Get candidates from all sources using user historical products and bestofs Candidates Aggregation • Remove duplicates • Aggregate features Preselection • Call degraded prediction to decrease number of candidates Scoring • Call full prediction model to score each candidate Winner Selector • Select N products on score, randomization Glup Logging • Log recommended products, prediction variables
  • 8. Copyright © 2014 Criteo Issue #1: Retrieving user-specific products • The need: storing [user → interesting products] vectors • Difficult to store and retrieve at scale (900+ mln users) • Hard to keep up to date • Using seen products as a proxy • Store the [user → viewed products] vectors • Easier to maintain • Store [viewed product → interesting products] vectors • Based on aggregated user behavior data • Computable offline • Final ranking by a ML model 8
  • 9. Copyright © 2014 Criteo Issue #2: Scoring products • Need to fuse data from several sources • Product-specific • User-specific • User-product interactions • Display-specific • 1st solution: regression model • Predict P(product click then sale) • Easy to evaluate • 2nd solution: ranking model • More appropriate for our needs • Still maximizing post-click sales • Can be evaluated only on multi-product banners 9
  • 10. Copyright © 2014 Criteo Issue #3: Picking the right products • Several questions: • User-product fatigue • Independent product choice assumption • Explore / exploit • Solution 1: Randomization in the banners • Keep independent products assumption • Separate optimization process to shuffle displayed items • Solution 2: A better scoring model • Score a full banner, not independent products • Store all product display counts 10
  • 11. Copyright © 2014 Criteo Issue #4: 8ms response time ! Tweaking for performance • CTR / Sales prediction takes ~40 µs per candidate using in-house library • 2-step prediction: • A fast pass to remove most of the candidates • A slow pass to score accurately the remaining candidates And the technical fine print: • All real-time code in C# • Async I/O for better efficiency • HAProxy to scale the front-end • Memcached to store all required data in memory 11
  • 12. Copyright © 2014 Criteo Upcoming challenges • Long(er)-term user profiles • More and better product information (images, semantic, NLP) • Vertical-optimized engine • Classifieds (catalog-free recommendation ?) • Travel • Instant-update of similarities • (because batch computation is soooo last year) 12
  • 13. Copyright © 2014 Criteo Fancy a try ? 13 On your own: With us ! http://www.criteo.com/careers/ • Our 1st public dataset is online: http://bit.ly/1vgw2XC • 4GB display and click data, Kaggle challenge in 2014 • NEW : 1TB dataset released a few weeks ago • Hosted on Microsoft Azure, just waiting for you
  • 14. Copyright © 2014 Criteo Questions?
  • 15. Copyright © 2014 Criteo Thank you ! o.koch@criteo.com r.lerallut@criteo.com