Copyright © 2015 Criteo
The Criteo Experience
Olivier Koch
Engineering Program Manager, Criteo
TektosData Meetup “Data Meets Business”
May 31, 2016
Outline
• What does Criteo do?
• Deep dive into our technical stack
• Delivery at scale
• A few lessons learned
Banners… what else?
Advertiser Publisher
Online advertising at scale
3B displays / day
40 PB of data
15,000 servers
worldwide
• Deep dive into Criteo
Bidding
• Should we bid?
• At which price?
Recommendation
• Which products should we display?
Look & Feel
• Big image vs small image
• Background color, ...
Prediction
• Generic prediction engine
• Specific models trained on TBs of data
We bid the expected value of the display for the client
• Because we sell performance, Criteo’s interests and its clients’ interests are aligned, so the engine aims to maximize the value we generate for our clients
• Because the cost of a display is lower than the bid and independent of it (second-price auction or floor), we should always bid the maximum value that the client is willing to pay for a display
Value = 1€
CPM = 0,6€
CPM = 0,7€
CPM = 0,75€
CPM = 1,1€
CPM = 1,2€
CPM = 1,3€
This bidding strategy is optimal: we are sure to buy all profitable displays and only them
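The claim above can be checked with a small sketch. It assumes the six CPM figures on the slide are the prices at which six displays clear, and that each display is worth 1€ to the client; amounts are in euro cents, and the function name `profit` is hypothetical.

```python
# Assumption: amounts are in euro cents so the arithmetic stays exact.
VALUE = 100  # value of one display for the client (1€ on the slide)
CLEARING_PRICES = [60, 70, 75, 110, 120, 130]  # the six CPM figures, read as costs

def profit(bid, value=VALUE, prices=CLEARING_PRICES):
    """Total client profit when we win every display priced at or below `bid`.

    In a second-price auction (or at the floor) we pay the clearing price,
    not our own bid, so each display won contributes value - price.
    """
    return sum(value - price for price in prices if price <= bid)

# Underbidding, bidding the value, and overbidding.
for bid in (72, VALUE, 125):
    print(bid, profit(bid))
```

Bidding below the value misses the 0,75€ display, and bidding above it buys displays that cost more than they are worth; under these assumptions no bid does better than bidding the value itself.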
Bid =   CPC  pClick  pSale  AOV
This value depends on the predicted performance and the client’s objective
Revenue that the display will generate for the client
Maximum share that the client is willing to pay
2012 - Ensures constant value allocation between Criteo and its clients
2013 - CRO: “Conversion Rate Optimizer”
2014 - COS Optimizer
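The operators in the slide's formula did not survive extraction, so the sketch below follows only the two captions: the bid is taken to be the maximum share the client is willing to pay, times the revenue the display is expected to generate. The names `max_share`, `expected_revenue`, and `compute_bid` are hypothetical, and the numbers are illustrative rather than Criteo figures.

```python
def expected_revenue(p_click, p_sale, aov):
    """Revenue the display is expected to generate for the client.

    p_click: predicted probability of a click
    p_sale:  predicted probability of a sale given a click
    aov:     predicted average order value, in euros
    """
    return p_click * p_sale * aov

def compute_bid(max_share, p_click, p_sale, aov):
    """Bid = share of revenue the client accepts to pay x expected revenue."""
    return max_share * expected_revenue(p_click, p_sale, aov)

# Illustrative inputs: 0.5% click rate, 2% sale rate, 100€ order value,
# and a client willing to pay 10% of the revenue generated.
example_bid = compute_bid(max_share=0.1, p_click=0.005, p_sale=0.02, aov=100.0)
print(example_bid)
```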
We train our prediction models on our historical displays
Variables
• Level of engagement of the user
• Quality of inventory
• User fatigue
• For travel: time to check-in and number of nights
[Figure: historical displays feeding machine learning algorithms; markers show clicked displays and converted displays (size = order value)]
Our ability to predict relies greatly on the relevance of the variables we consider
Recommend products for a user
• What we want: reco(user) = products
• 1B users x 3B products!
• But we need to scale and keep it fresh
We use collaborative filtering to select candidate products
• Historical: products user X saw (e.g. orange shoes)
• Similar: products seen by users who saw these same shoes
• Best-of: most viewed products on the client’s site
Candidate products for user X: historical, similar, and best-of products
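A minimal sketch of the three candidate sources, assuming browsing histories fit in a dictionary and that "similar" means "most often co-viewed". The function names, parameters, and toy data are hypothetical; a production system at the scale quoted earlier would not build co-view counts in memory like this.

```python
from collections import Counter, defaultdict

def build_coviews(histories):
    """Count, for each product, how many users also saw each other product."""
    coviews = defaultdict(Counter)
    for products in histories.values():
        seen = set(products)
        for p in seen:
            for q in seen:
                if p != q:
                    coviews[p][q] += 1
    return coviews

def candidates(user, histories, coviews, n_similar=1, n_best=1):
    # Historical: what the user saw, deduplicated, in viewing order.
    historical = list(dict.fromkeys(histories[user]))
    # Similar: products most often co-viewed with the user's products.
    similar_counts = Counter()
    for p in historical:
        similar_counts.update(coviews.get(p, {}))
    for p in historical:
        similar_counts.pop(p, None)
    similar = [p for p, _ in similar_counts.most_common(n_similar)]
    # Best-of: most viewed products overall, excluding those already picked.
    views = Counter(p for products in histories.values() for p in products)
    picked = set(historical) | set(similar)
    best_of = [p for p, _ in views.most_common() if p not in picked][:n_best]
    return {"historical": historical, "similar": similar, "best_of": best_of}

histories = {
    "x": ["orange_shoes"],
    "a": ["orange_shoes", "red_shoes"],
    "b": ["orange_shoes", "red_shoes", "blue_hat"],
    "c": ["blue_hat", "green_bag"],
    "d": ["green_bag"],
    "e": ["green_bag"],
}
result = candidates("x", histories, build_coviews(histories))
print(result)
```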
Products delivering the best performance are displayed
Variables
• Products seen by the user
• Time since product event
• Level of similarity
• Product features
[Figure: historical displays feeding machine learning algorithms; markers show clicked products and converted products (size = order value)]
Products are selected based on their pClick x pSale x AOV
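The selection rule can be sketched as a ranking by pClick x pSale x AOV. The `Candidate` class, the `select` function, and all predicted values below are hypothetical; in practice the three quantities would come from the trained models.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    product: str
    p_click: float  # predicted probability of a click
    p_sale: float   # predicted probability of a sale given a click
    aov: float      # predicted order value, in euros

    @property
    def score(self):
        """Expected sales amount if this product is displayed."""
        return self.p_click * self.p_sale * self.aov

def select(candidates, k):
    """Return the k candidates with the highest expected sales amount."""
    return sorted(candidates, key=lambda c: c.score, reverse=True)[:k]

pool = [
    Candidate("orange_shoes", p_click=0.010, p_sale=0.02, aov=80.0),
    Candidate("red_shoes", p_click=0.008, p_sale=0.03, aov=60.0),
    Candidate("green_bag", p_click=0.004, p_sale=0.05, aov=120.0),
]
top = select(pool, k=2)
print([c.product for c in top])
```

Note that the product with the lowest click probability can still rank first when its conversion rate and order value are high enough, which is the point of scoring on the full product rather than on clicks alone.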
We train our prediction models on our historical displays
Variables
Some of which we control:
• How user interacts with banner
• Organization of information
• Colorset
Some of which we don’t:
• Zone format
• Publisher
[Figure: historical displays (color = look & feel) feeding machine learning algorithms, next to an example banner reading “My company: BUY! BUY! BUY!”; markers show clicked displays and converted displays (size = order value)]
Look and feel will be selected based on its pClick x pSale x AOV
Optimizing for sales amount
• Predict: 𝔼[SalesAmount] = ℙ(Click) x ℙ(Sale | Click) x 𝔼[SalesAmount | Sale]
• Models, in order: logistic, logistic, log-normal (all regularized!)
• Each model is trained independently & refreshed as often as possible
• Three sources of features: user, ad, page (mostly categorical)
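A minimal sketch of how the three models could be combined at prediction time, assuming one-hot categorical features, linear models, and a log-normal amount model with a fixed `sigma`. The slide only names the model families, so the parameterization, the function names, and the weights are all assumptions; the only fact used beyond the slide is that a log-normal variable with parameters mu and sigma has mean exp(mu + sigma²/2).

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def linear_score(weights, active_features):
    """Dot product with a one-hot vector: sum the weights of active features."""
    return sum(weights.get(f, 0.0) for f in active_features)

def expected_sales_amount(active_features, w_click, w_sale, w_amount, sigma):
    p_click = sigmoid(linear_score(w_click, active_features))  # logistic
    p_sale = sigmoid(linear_score(w_sale, active_features))    # logistic
    mu = linear_score(w_amount, active_features)               # log-normal location
    amount = math.exp(mu + sigma ** 2 / 2)                     # log-normal mean
    return p_click * p_sale * amount

# Hypothetical features drawn from the three sources: user, ad, page.
features = {"user:engaged", "ad:big_image", "page:news"}
w_click = {"user:engaged": 0.8, "ad:big_image": 0.3}
w_sale = {"user:engaged": 0.5}
w_amount = {"page:news": 0.2}
print(expected_sales_amount(features, w_click, w_sale, w_amount, sigma=0.5))
```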
Learn on huge volumes of data
10 000 displays
lead to
50 clicks
lead to
1 sale
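The funnel above translates into rates as follows. The counts come from the slide; the helper `displays_needed` and the target of 1 000 sales are illustrative assumptions, and the calculation treats the slide's ratio as constant.

```python
displays, clicks, sales = 10_000, 50, 1

click_rate = clicks / displays         # clicks per display
sale_rate = sales / clicks             # sales per click
sales_per_display = sales / displays   # sales per display

def displays_needed(target_sales):
    """Displays required to observe `target_sales` sales at the slide's ratio."""
    return -(-target_sales * displays // sales)  # ceiling division

print(click_rate, sale_rate, sales_per_display)
print(displays_needed(1_000))
```

Because sales are so rare per display, a model of sales needs orders of magnitude more displays than a model of clicks to see the same number of positive examples.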
In-house Machine Learning library -- IRMA
• We have our own large-scale distributed machine learning library, built on top of Hadoop and used for all models.
• From an ML perspective we rely, in most cases, on an L-BFGS solver initialized with SGD (see, e.g., A. Agarwal et al., A Reliable Effective Terascale Linear Learning System).
Learning duration: trading time and volume
Longer ⇒ Volume ↑ VS Shorter ⇒ Reactivity ↑
[Figure: sales amount (€) over time, 11/01/2014 to 20/02/2014, with Valentine’s Day eve marked; precision versus learning duration, one curve per day from 12/02/2014 to 18/02/2014 plus “All”]
Wait a minute: how do you handle TBs of training data?
• Each model is trained on several TB of data and contains millions of features
• We learn several hundred models, refreshed many times per day
• How about large-scale distributed machine learning?
Distribution of L-BFGS & SGD
• Hadoop AllReduce
• L-BFGS, being a batch algorithm, is easy to distribute (by distributing the computation of the gradient). SGD is harder to distribute: we use parameter averaging, which needs some tuning (learning rate, number of epochs, …). Within SGD, we use Hogwild! for multi-threading.
• ZooKeeper to ensure fault tolerance.
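The two distribution schemes can be illustrated on a single machine. This is a sketch under stated assumptions, not IRMA's code: shards stand in for workers, `allreduce_sum` stands in for the AllReduce step, the model is a plain unregularized logistic regression, and Hogwild! multi-threading is not shown. All names and the toy data are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))

def gradient(w, shard):
    """Logistic-loss gradient summed over the examples of one shard."""
    grad = [0.0] * len(w)
    for x, y in shard:
        error = predict(w, x) - y
        for j, xj in enumerate(x):
            grad[j] += error * xj
    return grad

def allreduce_sum(vectors):
    """Element-wise sum of per-worker vectors, as every worker would receive it."""
    return [sum(values) for values in zip(*vectors)]

def sgd(w, shard, learning_rate, epochs):
    """Plain SGD over one shard, starting from a copy of w."""
    w = list(w)
    for _ in range(epochs):
        for x, y in shard:
            error = predict(w, x) - y
            w = [wi - learning_rate * error * xi for wi, xi in zip(w, x)]
    return w

def average(vectors):
    """Parameter averaging across workers."""
    return [sum(values) / len(vectors) for values in zip(*vectors)]

# Toy data: (features with a bias term, label), split across two "workers".
data = [([1.0, 2.0], 1), ([1.0, -1.0], 0), ([1.0, 1.5], 1), ([1.0, -2.0], 0)]
shards = [data[:2], data[2:]]
w0 = [0.0, 0.0]

# Batch case: per-shard gradients sum to the full-data gradient.
full_gradient = gradient(w0, data)
distributed_gradient = allreduce_sum([gradient(w0, s) for s in shards])

# SGD case: each worker trains locally, then the weights are averaged.
averaged_weights = average([sgd(w0, s, learning_rate=0.1, epochs=5) for s in shards])
print(full_gradient, distributed_gradient, averaged_weights)
```

The batch gradient is a sum over examples, so splitting it across workers changes nothing in the result. Parameter averaging, by contrast, is an approximation whose quality depends on the learning rate and the number of local epochs, which is why it needs tuning.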
A word on advanced techniques
• IRMA is not only about vanilla logistic regression with L2 regularization; it also contains more advanced techniques: transfer learning, factorization machines, learning to rank, …
• For example, we use cost-sensitive learning for bidding.
Offline & online evaluation
Two steps:
• Offline testing is fast, cheap, and efficient for wide exploration
• Online testing is expensive but has the final word
• The more data you have, the faster you can make decisions
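As an illustration of the offline step, the sketch below scores two candidate click models on the same logged displays. Log loss is an assumed metric chosen for the example; the slides do not say which offline metrics are used, and the labels and predictions are made up.

```python
import math

def log_loss(labels, predictions, eps=1e-15):
    """Average negative log-likelihood of binary labels under predicted probabilities."""
    total = 0.0
    for y, p in zip(labels, predictions):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)

# Hypothetical held-out displays: 1 = clicked, 0 = not clicked.
clicks = [0, 0, 1, 0, 1]
model_a = [0.1, 0.2, 0.7, 0.1, 0.6]  # candidate model
model_b = [0.5] * 5                  # uninformative baseline

loss_a = log_loss(clicks, model_a)
loss_b = log_loss(clicks, model_b)
print(loss_a, loss_b)
```

A candidate that wins offline would then go to an online test, which, as the slide says, has the final word.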
Physical infrastructure
7 in-house data centers on 3 continents
~15,000 servers, largest Hadoop cluster in Europe
Big Data: more than 35 PB of storage
Traffic
800k HTTP requests / sec (peak activity)
29000 impressions / sec (peak activity)
<10 ms to process bidding request
<100 ms to process reco request
Academic research @ Criteo
• Our 1st public dataset is online: http://bit.ly/1vgw2XC
• New 1TB dataset released last year
• Recent publications:
Offline evaluation of response prediction in online advertising auctions, O. Chapelle, WWW’15.
Sources of variability in large-scale machine learning systems, D. Lefortier, A. Truchet, and M. de Rijke, NIPS workshop on ML systems, 2015.
Cost-sensitive learning for bidding in online advertising auctions, F. Vasile and D. Lefortier, NIPS workshop on ML for e-commerce, 2015.
New areas of research
• Counterfactual evaluation (offline A/B tests)
• Product embeddings for recommendation
• Policy learning
• Delivery at scale
The early days of Criteo
Single C# repository
Build in 90 minutes
Weekly merges
What could go wrong?
Delivery at scale at Criteo
Trunk-based development (TBD)
Fast commits
Code reviews with Gerrit
The MOAB
Deploy with scp / bittorrent
Automatic metrics checks
=> 200+ happy engineers!
The Criteo MOAB
• A few lessons learned
Start small
• If you can't build it with a few machines, it's likely you won't be able to do it with many
First Google computer
Start small
• Keep fancy algorithms for later
The PageRank algorithm
Iterate fast
• Easy access to data (20PB vs 4GB of clean, carefully selected data)
• Convenient technologies (e.g. Python & notebooks, scikit-learn)
• Make IT a non-problem
• Keep projects small (typical project size 3-9 months)
Talent magnet
Keep teams small
3 members: 3 channels
4 members: 6 channels
5 members: 10 channels
10 members: 45 channels
…
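The channel counts above match the number of distinct pairs in a team, n(n-1)/2, assuming one communication channel per pair of members. The function name `channels` is hypothetical.

```python
def channels(members):
    """Number of pairwise communication channels in a team of `members` people."""
    return members * (members - 1) // 2

for size in (3, 4, 5, 10):
    print(size, channels(size))
```

The count grows quadratically, so doubling a team from 5 to 10 members multiplies the channels by 4.5.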
Build the right team
• Variety of skills
• Software/ML engineers, ops/devops
• Analysts/BI
• Product
• Designers
• Managers
Make the team agile
• Use a flat, distributed hierarchy model and make people sit next to each other
EPM
ENG LEAD
PM
MGR
Make the team agile
• Use the right tools
• Slack
• Jira
• Confluence
• Git
• Gerrit
• OKR
Build the culture
• Let ideas emerge bottom-up
• Hackathons (for real)
• 10% projects
• Transparency: make info available to all
• Use mature technologies
• You will fail. That’s OK!
Take-aways
• Start small
• Iterate fast
• Build the team
• Make the team agile
• Build the culture
• Thanks! Questions?

Recycled Concrete Aggregate in Construction Part III
 
Modelagem de um CSTR com reação endotermica.pdf
Modelagem de um CSTR com reação endotermica.pdfModelagem de um CSTR com reação endotermica.pdf
Modelagem de um CSTR com reação endotermica.pdf
 
学校原版美国波士顿大学毕业证学历学位证书原版一模一样
学校原版美国波士顿大学毕业证学历学位证书原版一模一样学校原版美国波士顿大学毕业证学历学位证书原版一模一样
学校原版美国波士顿大学毕业证学历学位证书原版一模一样
 
Recycled Concrete Aggregate in Construction Part II
Recycled Concrete Aggregate in Construction Part IIRecycled Concrete Aggregate in Construction Part II
Recycled Concrete Aggregate in Construction Part II
 
Engine Lubrication performance System.pdf
Engine Lubrication performance System.pdfEngine Lubrication performance System.pdf
Engine Lubrication performance System.pdf
 
International Conference on NLP, Artificial Intelligence, Machine Learning an...
International Conference on NLP, Artificial Intelligence, Machine Learning an...International Conference on NLP, Artificial Intelligence, Machine Learning an...
International Conference on NLP, Artificial Intelligence, Machine Learning an...
 
ML Based Model for NIDS MSc Updated Presentation.v2.pptx
ML Based Model for NIDS MSc Updated Presentation.v2.pptxML Based Model for NIDS MSc Updated Presentation.v2.pptx
ML Based Model for NIDS MSc Updated Presentation.v2.pptx
 

Criteo TektosData Meetup

  • 1. Copyright © 2015 Criteo The Criteo Experience Olivier Koch Engineering Program Manager, Criteo TektosData Meetup “Data Meets Business” May 31, 2016
  • 2. Outline • What does Criteo do? • Deep dive into our technical stack • Delivery at scale • A few lessons learned
  • 3. Banners… what else? [Figure labels: Advertiser, Publisher]
  • 4. Online advertising at scale • 3B displays / day • 40 PB of data • 15,000 servers worldwide
  • 5. Deep dive into Criteo
  • 6. Bidding: should we bid? At which price? • Recommendation: which products should we display? • Look & Feel: big image vs small image, background color, ... • Prediction: generic prediction engine; specific models trained on TBs of data
  • 7. Bidding: should we bid? At which price? • Recommendation: which products should we display? • Look & Feel: big image vs small image, background color, ... • Prediction: generic prediction engine; specific models trained on TBs of data
  • 8. We bid the expected value of the display for the client • As we sell performance, Criteo’s and the client’s interests are aligned, so the engine aims to maximize the value we generate for our clients • As the cost of a display is lower than, and independent of, the bid (2nd-price auction or floor), we should always bid the maximum value that the client is willing to pay for a display • Example: Value = 1€; CPM = 0,6€, 0,7€, 0,75€, 1,1€, 1,2€, 1,3€ • This bidding strategy is optimal: we are sure to buy all profitable displays and only those
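
Slide 8's claim can be checked with a small sketch. It assumes a pure second-price auction in which the price paid equals the highest competing bid (no floor), and it treats the slide's CPM figures as those competing prices; the function names are hypothetical.

```python
# Sketch of slide 8: in a second-price auction, bidding the display's value
# buys exactly the profitable displays. Assumption: the price paid is the
# highest competing bid (no floor); all names here are illustrative.

VALUE = 1.0  # value of the display for the client, from the slide
COMPETING_PRICES = [0.6, 0.7, 0.75, 1.1, 1.2, 1.3]  # CPMs from the slide


def displays_bought(bid, prices):
    """Prices of the displays won with this bid."""
    return [price for price in prices if bid > price]


def total_profit(bid, value, prices):
    """Sum of (value - price) over the auctions won with this bid."""
    return sum(value - price for price in displays_bought(bid, prices))


if __name__ == "__main__":
    for bid in (0.65, VALUE, 1.25):
        won = displays_bought(bid, COMPETING_PRICES)
        profit = total_profit(bid, VALUE, COMPETING_PRICES)
        print(f"bid={bid:.2f} won={won} profit={profit:.2f}")
```

Under these assumptions, bidding below the value misses profitable displays, and bidding above it buys displays that cost more than they are worth.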
  • 9. Bid =   CPC  pClick  pSale  AOV • This value depends on the predicted performance and the client’s objective • Maximum share that the client is willing to pay • Revenue that the display will generate for the client • 2012 - Ensures constant value allocation between Criteo and its clients • 2013 - CRO: “Conversion Rate Optimizer” • 2014 - COS Optimizer
  • 10. We train our prediction models on our historical displays • Variables: level of engagement of the user; quality of inventory; user fatigue; for travel, time to check-in and number of nights • [Figure: historical displays fed to machine learning algorithms; legend: clicked displays, converted displays (size = order value)] • Our ability to predict relies greatly on the relevance of the variables we consider
  • 11. Bidding: should we bid? At which price? • Recommendation: which products should we display? • Look & Feel: big image vs small image, background color, ... • Prediction: generic prediction engine; specific models trained on TBs of data
  • 12. Recommend products for a user • What we want: reco(user) = products • 1B users x 3B products! • But we need to scale and keep it fresh
  • 13. We use collaborative filtering to select candidate products • Candidate products for user X are: Historical (user X saw orange shoes) • Similar (products also seen by users who saw these same shoes) • Best-of (the most viewed products on the client’s site)
  • 14. Products delivering the best performance are displayed • Variables: products seen by the user; time since product event; level of similarity; product features • [Figure: historical displays fed to machine learning algorithms; legend: clicked products, converted products (size = order value)] • Products are selected based on their pClick x pSale x AOV
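
Slides 13 and 14 describe a two-stage recommendation: gather candidates from three sources, then rank them by pClick x pSale x AOV. A minimal sketch of that flow follows. The product names, the scores, and the `candidates` and `recommend` helpers are invented for illustration; in production the three quantities would come from trained models rather than a lookup table.

```python
# Hypothetical sketch of slides 13-14: merge candidate products from the
# three sources, then keep those with the highest pClick x pSale x AOV.
# All data and names are made up for illustration.

def candidates(historical, similar, best_of):
    """Union of the three candidate sources, keeping first-seen order."""
    seen, merged = set(), []
    for product in [*historical, *similar, *best_of]:
        if product not in seen:
            seen.add(product)
            merged.append(product)
    return merged


def expected_value(p_click, p_sale, aov):
    """Score used for ranking: pClick x pSale x AOV."""
    return p_click * p_sale * aov


def recommend(products, predictions, k):
    """Return the k candidates with the highest expected value."""
    return sorted(
        products,
        key=lambda p: expected_value(*predictions[p]),
        reverse=True,
    )[:k]


if __name__ == "__main__":
    pool = candidates(
        historical=["orange shoes"],
        similar=["red shoes", "orange sandals"],
        best_of=["white sneakers", "red shoes"],
    )
    # (pClick, pSale given click, average order value in euros): invented.
    predictions = {
        "orange shoes": (0.010, 0.03, 60.0),
        "red shoes": (0.006, 0.02, 80.0),
        "orange sandals": (0.004, 0.05, 40.0),
        "white sneakers": (0.008, 0.02, 90.0),
    }
    print(recommend(pool, predictions, k=2))
```

Splitting candidate selection from ranking keeps the expensive scoring step limited to a small pool, which is one way to cope with the 1B users x 3B products scale mentioned on slide 12.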
  • 15. Bidding: should we bid? At which price? • Recommendation: which products should we display? • Look & Feel: big image vs small image, background color, ... • Prediction: generic prediction engine; specific models trained on TBs of data
  • 16. We train our prediction models on our historical displays • Variables, some of which we control: how user interacts with banner; organization of information; colorset • Some of which we don’t: zone format; publisher • [Figure: historical displays (color = look & feel) fed to machine learning algorithms; legend: clicked displays, converted displays (size = order value); sample “My company” banner with “BUY!” buttons] • Look and feel will be selected based on its pClick x pSale x AOV
  • 17. Bidding: should we bid? At which price? • Recommendation: which products should we display? • Look & Feel: big image vs small image, background color, ... • Prediction: generic prediction engine; specific models trained on TBs of data
  • 18. Optimizing for sales amount • Predict: $\mathbb{E}[\mathrm{SalesAmount}] = \mathbb{P}(\mathrm{Click}) \cdot \mathbb{P}(\mathrm{Sale} \mid \mathrm{Click}) \cdot \mathbb{E}[\mathrm{SalesAmount} \mid \mathrm{Sale}]$, where the three factors are modeled as logistic, logistic, and log normal respectively (all regularized!) • Each model is trained independently & refreshed as often as possible • Three sources of features: user, ad, page (mostly categorical)
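
The decomposition on slide 18 can be sketched as follows. The logistic form for the two probabilities and the log-normal form for the order value come from the slide; the weights, the feature vector, and the log-normal parameters are invented. The sketch relies on the standard fact that a log-normal variable with parameters mu and sigma has mean exp(mu + sigma^2 / 2).

```python
import math

# Sketch of slide 18's decomposition with made-up parameters:
# E[SalesAmount] = P(Click) * P(Sale | Click) * E[SalesAmount | Sale].


def sigmoid(z):
    """Logistic function mapping a linear score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def logistic_predict(weights, bias, features):
    """Probability from a (hypothetical) trained logistic model."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(score)


def lognormal_mean(mu, sigma):
    """Mean of a log-normal variable: exp(mu + sigma^2 / 2)."""
    return math.exp(mu + sigma ** 2 / 2.0)


def expected_sales_amount(p_click, p_sale_given_click, amount_given_sale):
    """Product of the three independently trained predictions."""
    return p_click * p_sale_given_click * amount_given_sale


if __name__ == "__main__":
    features = [1.0, 0.0, 1.0]  # invented one-hot user/ad/page features
    p_click = logistic_predict([0.4, -0.2, 0.1], -5.0, features)
    p_sale = logistic_predict([0.3, 0.1, -0.1], -4.0, features)
    amount = lognormal_mean(mu=4.0, sigma=0.5)
    print(expected_sales_amount(p_click, p_sale, amount))
```

Because the three factors are separate models, each one can be retrained on its own schedule, which matches the slide's note that each model is trained independently.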
  • 19. Learn on huge volumes of data • 10 000 displays
  • 20. Learn on huge volumes of data • 10 000 displays lead to 50 clicks
  • 21. Learn on huge volumes of data • 10 000 displays lead to 50 clicks, which lead to 1 sale
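
The funnel on slides 19 to 21 shows why so much data is needed: positive examples are rare. The arithmetic below uses only the slide's counts; the `displays_needed` helper is an illustrative addition.

```python
# Funnel from slides 19-21: 10 000 displays -> 50 clicks -> 1 sale.
DISPLAYS = 10_000
CLICKS = 50
SALES = 1

click_rate = CLICKS / DISPLAYS          # empirical P(Click)
sale_rate_given_click = SALES / CLICKS  # empirical P(Sale | Click)
sale_rate = SALES / DISPLAYS            # empirical P(Sale) per display


def displays_needed(target_sales):
    """Displays needed, at these rates, to observe a given number of sales."""
    return target_sales * DISPLAYS // SALES


if __name__ == "__main__":
    print(f"P(Click) = {click_rate:.2%}")
    print(f"P(Sale | Click) = {sale_rate_given_click:.2%}")
    print(f"P(Sale) = {sale_rate:.2%}")
    print(f"Displays for 1000 sales: {displays_needed(1000):,}")
```

At these rates, observing a thousand sales takes on the order of ten million displays.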
  • 22. In-house Machine Learning library -- IRMA • We have our own large-scale distributed machine learning library on top of Hadoop, used for all models • From an ML perspective we rely, in most cases, on an L-BFGS solver initialized with SGD (see, e.g., A. Agarwal et al., A Reliable Effective Terascale Linear Learning System)
  • 23. Learning duration: trading time and volume • Longer ⇒ Volume ↑ VS Shorter ⇒ Reactivity ↑ • [Figures: sales amount (€) by date, 11/01/2014 to 20/02/2014, with Valentine’s day eve marked; precision by learning duration, with series for 12/02/2014 through 18/02/2014 and All]
  • 24. Wait a minute: how do you handle TBs of training data? • Each model is trained on several TB of data and contains millions of features • We learn several hundreds of models, refreshed many times per day • How about large-scale distributed machine learning?
  • 25. Distribution of L-BFGS & SGD • Hadoop AllReduce • L-BFGS, being a batch algorithm, is easy to distribute (by distributing the computation of the gradient), while it’s more difficult with SGD; we do parameter averaging for that, which needs some tweaking (learning rate, number of epochs, …). In SGD, we use Hogwild! to multi-thread • Zookeeper to ensure fault-tolerance
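
Slide 25's two distribution schemes can be illustrated on a toy logistic regression. This is a single-process sketch, not IRMA's implementation: the shards stand in for Hadoop workers, a plain sum stands in for AllReduce, and the data, learning rate, and epoch count are invented. Hogwild! multi-threading and Zookeeper fault tolerance are not shown.

```python
import math
import random

# Toy, single-process illustration of slide 25 (not Criteo's code):
# (1) a batch gradient is the sum of per-shard gradients, which is what
#     makes L-BFGS easy to distribute; (2) SGD is distributed by running
#     it on each shard and averaging the resulting parameters.


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gradient(w, rows):
    """Gradient of the logistic loss summed over rows of (features, label)."""
    grad = [0.0] * len(w)
    for x, y in rows:
        err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
        for j, xj in enumerate(x):
            grad[j] += err * xj
    return grad


def allreduce_sum(vectors):
    """Stand-in for AllReduce: element-wise sum of the workers' vectors."""
    return [sum(vals) for vals in zip(*vectors)]


def sgd(w, rows, lr, epochs):
    """Plain SGD on one shard, one example at a time."""
    w = list(w)
    for _ in range(epochs):
        for x, y in rows:
            g = gradient(w, [(x, y)])
            w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w


def parameter_averaging(w, shards, lr, epochs):
    """Run SGD independently on each shard, then average the weights."""
    local = [sgd(w, shard, lr, epochs) for shard in shards]
    return [sum(vals) / len(local) for vals in zip(*local)]


def make_data(n, seed=0):
    """Invented data: label is 1 when the second feature is positive."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1 = rng.uniform(-1.0, 1.0)
        rows.append(([1.0, x1], 1.0 if x1 > 0 else 0.0))
    return rows


if __name__ == "__main__":
    data = make_data(400)
    shards = [data[i::4] for i in range(4)]  # four hypothetical workers
    w0 = [0.0, 0.0]
    full = gradient(w0, data)
    reduced = allreduce_sum([gradient(w0, s) for s in shards])
    print("full gradient   :", full)
    print("reduced gradient:", reduced)
    print("averaged weights:", parameter_averaging(w0, shards, lr=0.1, epochs=5))
```

The summed shard gradients match the full-batch gradient up to floating-point rounding, so a batch solver sees the same gradient it would on one machine. Parameter averaging has no such guarantee, which is consistent with the slide's remark that it needs tuning.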
  • 26. A word on advanced techniques • IRMA is not only about vanilla logistic regression with L2 regularization; it contains more advanced techniques: transfer learning, factorization machines, learning to rank, … • For example, we use cost-sensitive learning for bidding
  • 27. Offline & online evaluation • Two steps: offline testing is fast, cheap, and efficient for wide exploration; online testing is expensive but has the final word • The more data you have, the faster you can make decisions
  • 28. Physical infrastructure: 7 in-house data centers on 3 continents; ~15,000 servers, largest Hadoop cluster in Europe; more than 35 PB of storage • Big Data Traffic: 800k HTTP requests / sec (peak activity); 29,000 impressions / sec (peak activity); <10 ms to process bidding request; <100 ms to process reco request
  • 29. Academic research @ Criteo • Our 1st public dataset is online: http://bit.ly/1vgw2XC • New 1TB dataset released last year • Recent publications: Offline evaluation of response prediction in online advertising auctions, O. Chapelle, WWW’15; Sources of variability in large-scale machine learning systems, D. Lefortier, A. Truchet, and M. de Rijke, NIPS workshop on ML systems, 2015; Cost-sensitive learning for bidding in online advertising auctions, F. Vasile and D. Lefortier, NIPS workshop on ML for e-commerce, 2015
  • 30. New areas of research • Counterfactual evaluation (offline A/B tests) • Product embeddings for recommendation • Policy learning
  • 31. Delivery at scale
  • 32. The early days of Criteo • Single C# repository • Build in 90 minutes • Weekly merges
  • 33. What could go wrong?
  • 34. [Image-only slide]
  • 35. Delivery at scale at Criteo • Trunk-based development (TBD) • Fast commits • Code reviews with Gerrit • The MOAB • Deploy with scp / bittorrent • Automatic metrics checks => 200+ happy engineers!
  • 36. The Criteo MOAB
  • 37. Delivery at scale at Criteo
  • 38. A few lessons learned
  • 39. Start small • If you can't build it with a few machines, it's likely you won't be able to do it with many • [Image: first Google computer]
  • 40. Start small • Keep fancy algorithms for later • [Image: the PageRank algorithm]
  • 41. Iterate fast • Easy access to data (20PB vs 4GB of clean, carefully selected data) • Convenient technologies (e.g. Python & notebooks, scikit-learn) • Make IT a non-problem • Keep projects small (typical project size 3-9 months)
  • 42. Iterate fast • Easy access to data (20PB vs 4GB of clean, carefully selected data) • Convenient technologies (e.g. Python & notebooks, scikit-learn) • Make IT a non-problem • Keep projects small (typical project size 3-9 months) • Talent magnet
  • 43. Keep teams small • 3 members: 3 channels • 4 members: 6 channels • 5 members: 10 channels • 10 members: 45 channels • …
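
The counts on slide 43 follow the pairwise-channel formula n(n-1)/2, which the sketch below reproduces; the function name is illustrative.

```python
def channels(members):
    """Number of pairwise communication channels in a team: n(n-1)/2."""
    return members * (members - 1) // 2


if __name__ == "__main__":
    for n in (3, 4, 5, 10):
        print(f"{n} members -> {channels(n)} channels")
```

The quadratic growth is the argument for small teams: doubling a team from 5 to 10 members more than quadruples the channels to maintain.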
  • 44. Build the right team • Variety of skills • Software/ML engineers, ops/devops • Analysts/BI • Product • Designers • Managers
  • 45. Make the team agile • Use a flat, distributed hierarchy model and make people sit next to each other • [Diagram labels: EPM ENG LEAD PM MGR]
  • 46. Make the team agile • Use the right tools • slack • jira • confluence • git • gerrit • OKR
  • 47. Build the culture • Let ideas emerge bottom-up • Hackathons (for real) • 10% projects • Transparency: make info available to all • Use mature technologies • You will fail. That’s OK!
  • 48. Take-aways • Start small • Iterate fast • Build the team • Make the team agile • Build the culture
  • 49. Thanks! Questions?