SlideShare a Scribd company logo
1 of 15
Download to read offline
How Criteo Scaled and Supported 
Massive Growth with MongoDB

MongoDB Conference
New York City, June 2013
Julien SIMON
Vice President, Engineering
j.simon@criteo.com @julsimon
CRITEO "

2
•  R&D EFFORT
•  RETARGETING
•  CPC
PHASE 1 : 2005-2008
CRITEO CREATION 
•  MORE THAN 3000 CLIENTS
•  35 COUNTRIES, 15 OFFICES
•  R&D: MORE THAN 300 PEOPLE
PHASE 2 : 2008-2012
GLOBAL LEADER : + 700 EMPLOYEES!
2007
15
EMPLOYEES

2009
84
EMPLOYEES

6
EMPLOYEES

2005
2010
203
EMPLOYEES

2012
+700
EMPLOYEES SO FAR

2006
2011
395
EMPLOYEES

2008
33
EMPLOYEES
GLOBAL PRESENCE
3
SYDNEY
PARIS
LONDON
BARCELONA
MILAN
MUNICH
BOSTON
NEW YORK
SAO PAULO
PALO ALTO
TOKYO
SEOUL
STOCKHOLM
AMSTERDAM

15 OFFICES, 30+ COUNTRIES
CHICAGO
GO
 GO
GO
Powered by 
PERFORMANCE DISPLAY
Copyright © 2013 Criteo. Confidential
A user sees products 
on your …
… and sees

After on the banner, 
the user goes back to the product page.
...then browses 
the
4
!
!
REAL-TIME PERSONALIZATION
5
Copyright © 2013 Criteo. Confidential.
Boutons!
all original #represent
SHOP NOW
Couleurs
Fond
 Disposition!
WARM MEETS LIGHT!
SWEET NOTHING!
ADDIDAS IS ALL IN!
ALL	
  ORIGINALS	
  
#REPRESENT	
  
Slogans!
JOIN NOW!
SEE MORE!
CLICK HERE!
“Call to action”!
Lien !
opt-out!
SEE MORE!
JOIN NOW!
SEE MORE!
CLICK HERE!
SHOP NOW!
SHOP	
  NOW	
  	
  
JOIN	
  NOW	
  	
  
JOIN NOW
PREDICTION & RECOMMENDATION
2 CORE TECHNOLOGIES
choose the right product
to display
choose the right users / advertiser /
publisher to display
RECOMMENDATION
ENGINE
CTR + CR
increase
PREDICTION 
ENGINE
INFRASTRUCTURE
7
Copyright © 2013 Criteo. Confidential.
 DAILY TRAFFIC
- HTTP REQUESTS: 30+ BILLION
- BANNERS SERVED: 1+ BILLION
 PEAK TRAFFIC (PER SECOND)
- HTTP REQUESTS: 500,000+
- BANNERS: 25,000+
 7 DATA CENTERS
 SET UP AND MANAGED
IN-HOUSE
 AVAILABILITY > 99.95%
8
Copyright © 2013 Criteo. Confidential.
HIGH PERFORMANCE COMPUTING
FETCH, STORE, CRUNCH, QUERY 20 additional TB EVERY DAY ?

…SUBTITLED « HOW I LEARNED TO STOP WORRYING AND LOVE HPC »
PRODUCT CATALOGUES
•  Catalogue = product feed provided by advertisers (product id, description,
category, price, URL, etc)
•  3000+ catalogues, ranging from a few MB to several tens of GB
•  About 50% of products change every day
•  Imported at least once a day by an in-house application
•  Data replicated within a geographical zone
•  Accessed through a cache layer by web servers
•  Microsoft SQL Server used from day 1
•  Running fine in Europe, but…
–  Number of databases (1 per advertiser)… and servers
–  Size of databases
–  SQL Server issues hard to debug and understand
•  Running kind of fine in the US, until dead end in Q1 2011
–  transactional replication over high latency links
Copyright © 2010 Criteo. Confidential.
REQUIREMENTS FOR A NEW DB 
•  Scale-out architecture running on commodity hardware
(aka « Intel CPUs in metal boxes »)
•  No transactions needed, eventual consistency OK
•  High availability
•  Distributed clusters, with replication over high latency links
•  Requestable (key-value not enough)
•  Open source
… with active user community
… backed by a stable organization with long-term commitment (not one guy in a garage)
… no licence fees for production use
… commercial support available at reasonable cost
•  Easy to learn, (re)deploy, monitor and upgrade
•  « Low maintenance » (don’t need a 10-people team just to run it)
•  Multi-language support
•  Ability to export everything to Hadoop multiple times per day
Copyright © 2010 Criteo. Confidential.
FROM SQL SERVER TO MONGODB
•  Ah, database migrations… everyone loves them J
•  1st step: solve replication issue
–  Import and replicate catalogues in MongoDB
–  Push content to SQL Server, still queried by web servers
•  2nd step: prove that MongoDB can survive our web traffic
–  Modify web applications to query MongoDB
–  C-a-r-e-f-u-l-l-y switch web queries to MongoDB for a small set of catalogues
–  Observe, measure, A/B test… and generally make sure that the system still works
•  3rd step: scale !
–  Migrate thousands of catalogues away from SQL Server
–  Monitor and tweak the MongoDB clusters
–  Add more MongoDB servers… and more shards
–  Update ops processes (monitoring, backups, etc)
Copyright © 2010 Criteo. Confidential.
OUR MONGODB DEPLOYMENT 
•  Europe
–  18 3-server shards (1+1+1)
–  800M products, 1TB
–  1B requests/day (peak at 40K/s)
–  350M updates/day (peak at 11K/s)
•  US
–  14 4-server shards (2+2)
–  400M products, 650GB
•  APAC
–  12 3-server shards (2+1)
–  300M products, 500GB
•  146 servers total :
2.0 (+ Criteo patches) à 2.2 à 2.4.3
Copyright © 2010 Criteo. Confidential.
MONGODB, 2+ YEARS LATER
•  Stable (2.4.3 much better)
•  Easy to (re)install and administer
•  Great for small datasets (i.e. smaller than server RAM)
•  Good performance if read/write ratio is high
•  Failover and inter-DC replication work (but shard early!)
•  Performance suffers when :
–  dataset much larger than RAM
–  read/write ratio is low
–  Multiple applications coexist on the same cluster
•  Some scalability issues remain (master-slave, connections)
•  Criteo is very interested in the 10gen roadmap J
Copyright © 2010 Criteo. Confidential.
THANKS A LOT FOR YOUR ATTENTION!
14
Copyright © 2013 Criteo. Confidential.
www.criteo.com
engineering.criteo.com
How Criteo Scaled and Supported  Massive Growth with MongoDB (2013)

More Related Content

Similar to How Criteo Scaled and Supported Massive Growth with MongoDB (2013)

Scaling wix to over 70 m users
Scaling wix to over 70 m usersScaling wix to over 70 m users
Scaling wix to over 70 m usersYoav Avrahami
 
Dubbo and Weidian's practice on micro-service architecture
Dubbo and Weidian's practice on micro-service architectureDubbo and Weidian's practice on micro-service architecture
Dubbo and Weidian's practice on micro-service architectureHuxing Zhang
 
Effective Microservices In a Data-centric World
Effective Microservices In a Data-centric WorldEffective Microservices In a Data-centric World
Effective Microservices In a Data-centric WorldRandy Shoup
 
Beyond CDNs: How to Harness the Next Phase of Innovation in Web Performance
Beyond CDNs: How to Harness the Next Phase of Innovation in Web PerformanceBeyond CDNs: How to Harness the Next Phase of Innovation in Web Performance
Beyond CDNs: How to Harness the Next Phase of Innovation in Web PerformanceYottaa
 
Hadoop Summit 2016 - Evolution of Big Data Pipelines At Intuit
Hadoop Summit 2016 - Evolution of Big Data Pipelines At IntuitHadoop Summit 2016 - Evolution of Big Data Pipelines At Intuit
Hadoop Summit 2016 - Evolution of Big Data Pipelines At IntuitRekha Joshi
 
Cincom Smalltalk: Present, Future & Smalltalk Advocacy
Cincom Smalltalk: Present, Future & Smalltalk AdvocacyCincom Smalltalk: Present, Future & Smalltalk Advocacy
Cincom Smalltalk: Present, Future & Smalltalk AdvocacyESUG
 
Rapid Application Development with MEAN Stack
Rapid Application Development with MEAN StackRapid Application Development with MEAN Stack
Rapid Application Development with MEAN StackAvinash Kaza
 
Webinar: NoSQL as the New Normal
Webinar: NoSQL as the New NormalWebinar: NoSQL as the New Normal
Webinar: NoSQL as the New NormalMongoDB
 
Meteor Revolution: From DDP to Blaze Reactive Rendering
Meteor Revolution: From DDP to Blaze Reactive Rendering Meteor Revolution: From DDP to Blaze Reactive Rendering
Meteor Revolution: From DDP to Blaze Reactive Rendering Massimo Sgrelli
 
Migration of a high-traffic E-commerce website from Legacy Monolith to Micros...
Migration of a high-traffic E-commerce website from Legacy Monolith to Micros...Migration of a high-traffic E-commerce website from Legacy Monolith to Micros...
Migration of a high-traffic E-commerce website from Legacy Monolith to Micros...Pavel Pratyush
 
ARC202:real world real time analytics
ARC202:real world real time analyticsARC202:real world real time analytics
ARC202:real world real time analyticsSebastian Montini
 
Enterprise architectsview 2015-apr
Enterprise architectsview 2015-aprEnterprise architectsview 2015-apr
Enterprise architectsview 2015-aprMongoDB
 
MongoDB .local London 2019: Nationwide Building Society: Building Mobile Appl...
MongoDB .local London 2019: Nationwide Building Society: Building Mobile Appl...MongoDB .local London 2019: Nationwide Building Society: Building Mobile Appl...
MongoDB .local London 2019: Nationwide Building Society: Building Mobile Appl...MongoDB
 
Breaking the Monolith: Organizing Your Team to Embrace Microservices
Breaking the Monolith: Organizing Your Team to Embrace MicroservicesBreaking the Monolith: Organizing Your Team to Embrace Microservices
Breaking the Monolith: Organizing Your Team to Embrace MicroservicesPaul Osman
 
QCon 2015 - Microservices Track Notes
QCon 2015 - Microservices Track Notes QCon 2015 - Microservices Track Notes
QCon 2015 - Microservices Track Notes Abdul Basit Munda
 
SOA 11g Upgrade Experience - SNI
SOA 11g Upgrade Experience - SNISOA 11g Upgrade Experience - SNI
SOA 11g Upgrade Experience - SNIjtreague
 
Technical Debt - SOTR14 - Clarkie
Technical Debt -  SOTR14 - ClarkieTechnical Debt -  SOTR14 - Clarkie
Technical Debt - SOTR14 - ClarkieAndrew Clarke
 
ACUCOBOL - Product Strategy and Roadmap
ACUCOBOL - Product Strategy and RoadmapACUCOBOL - Product Strategy and Roadmap
ACUCOBOL - Product Strategy and RoadmapMicro Focus
 
Mongo DB: Operational Big Data Database
Mongo DB: Operational Big Data DatabaseMongo DB: Operational Big Data Database
Mongo DB: Operational Big Data DatabaseXpand IT
 
2016 Federal User Group Conference - TeamForge Capabilities and Directions
2016 Federal User Group Conference - TeamForge Capabilities and Directions2016 Federal User Group Conference - TeamForge Capabilities and Directions
2016 Federal User Group Conference - TeamForge Capabilities and DirectionsCollabNet
 

Similar to How Criteo Scaled and Supported Massive Growth with MongoDB (2013) (20)

Scaling wix to over 70 m users
Scaling wix to over 70 m usersScaling wix to over 70 m users
Scaling wix to over 70 m users
 
Dubbo and Weidian's practice on micro-service architecture
Dubbo and Weidian's practice on micro-service architectureDubbo and Weidian's practice on micro-service architecture
Dubbo and Weidian's practice on micro-service architecture
 
Effective Microservices In a Data-centric World
Effective Microservices In a Data-centric WorldEffective Microservices In a Data-centric World
Effective Microservices In a Data-centric World
 
Beyond CDNs: How to Harness the Next Phase of Innovation in Web Performance
Beyond CDNs: How to Harness the Next Phase of Innovation in Web PerformanceBeyond CDNs: How to Harness the Next Phase of Innovation in Web Performance
Beyond CDNs: How to Harness the Next Phase of Innovation in Web Performance
 
Hadoop Summit 2016 - Evolution of Big Data Pipelines At Intuit
Hadoop Summit 2016 - Evolution of Big Data Pipelines At IntuitHadoop Summit 2016 - Evolution of Big Data Pipelines At Intuit
Hadoop Summit 2016 - Evolution of Big Data Pipelines At Intuit
 
Cincom Smalltalk: Present, Future & Smalltalk Advocacy
Cincom Smalltalk: Present, Future & Smalltalk AdvocacyCincom Smalltalk: Present, Future & Smalltalk Advocacy
Cincom Smalltalk: Present, Future & Smalltalk Advocacy
 
Rapid Application Development with MEAN Stack
Rapid Application Development with MEAN StackRapid Application Development with MEAN Stack
Rapid Application Development with MEAN Stack
 
Webinar: NoSQL as the New Normal
Webinar: NoSQL as the New NormalWebinar: NoSQL as the New Normal
Webinar: NoSQL as the New Normal
 
Meteor Revolution: From DDP to Blaze Reactive Rendering
Meteor Revolution: From DDP to Blaze Reactive Rendering Meteor Revolution: From DDP to Blaze Reactive Rendering
Meteor Revolution: From DDP to Blaze Reactive Rendering
 
Migration of a high-traffic E-commerce website from Legacy Monolith to Micros...
Migration of a high-traffic E-commerce website from Legacy Monolith to Micros...Migration of a high-traffic E-commerce website from Legacy Monolith to Micros...
Migration of a high-traffic E-commerce website from Legacy Monolith to Micros...
 
ARC202:real world real time analytics
ARC202:real world real time analyticsARC202:real world real time analytics
ARC202:real world real time analytics
 
Enterprise architectsview 2015-apr
Enterprise architectsview 2015-aprEnterprise architectsview 2015-apr
Enterprise architectsview 2015-apr
 
MongoDB .local London 2019: Nationwide Building Society: Building Mobile Appl...
MongoDB .local London 2019: Nationwide Building Society: Building Mobile Appl...MongoDB .local London 2019: Nationwide Building Society: Building Mobile Appl...
MongoDB .local London 2019: Nationwide Building Society: Building Mobile Appl...
 
Breaking the Monolith: Organizing Your Team to Embrace Microservices
Breaking the Monolith: Organizing Your Team to Embrace MicroservicesBreaking the Monolith: Organizing Your Team to Embrace Microservices
Breaking the Monolith: Organizing Your Team to Embrace Microservices
 
QCon 2015 - Microservices Track Notes
QCon 2015 - Microservices Track Notes QCon 2015 - Microservices Track Notes
QCon 2015 - Microservices Track Notes
 
SOA 11g Upgrade Experience - SNI
SOA 11g Upgrade Experience - SNISOA 11g Upgrade Experience - SNI
SOA 11g Upgrade Experience - SNI
 
Technical Debt - SOTR14 - Clarkie
Technical Debt -  SOTR14 - ClarkieTechnical Debt -  SOTR14 - Clarkie
Technical Debt - SOTR14 - Clarkie
 
ACUCOBOL - Product Strategy and Roadmap
ACUCOBOL - Product Strategy and RoadmapACUCOBOL - Product Strategy and Roadmap
ACUCOBOL - Product Strategy and Roadmap
 
Mongo DB: Operational Big Data Database
Mongo DB: Operational Big Data DatabaseMongo DB: Operational Big Data Database
Mongo DB: Operational Big Data Database
 
2016 Federal User Group Conference - TeamForge Capabilities and Directions
2016 Federal User Group Conference - TeamForge Capabilities and Directions2016 Federal User Group Conference - TeamForge Capabilities and Directions
2016 Federal User Group Conference - TeamForge Capabilities and Directions
 

More from Julien SIMON

An introduction to computer vision with Hugging Face
An introduction to computer vision with Hugging FaceAn introduction to computer vision with Hugging Face
An introduction to computer vision with Hugging FaceJulien SIMON
 
Reinventing Deep Learning
 with Hugging Face Transformers
Reinventing Deep Learning
 with Hugging Face TransformersReinventing Deep Learning
 with Hugging Face Transformers
Reinventing Deep Learning
 with Hugging Face TransformersJulien SIMON
 
Building NLP applications with Transformers
Building NLP applications with TransformersBuilding NLP applications with Transformers
Building NLP applications with TransformersJulien SIMON
 
Building Machine Learning Models Automatically (June 2020)
Building Machine Learning Models Automatically (June 2020)Building Machine Learning Models Automatically (June 2020)
Building Machine Learning Models Automatically (June 2020)Julien SIMON
 
Starting your AI/ML project right (May 2020)
Starting your AI/ML project right (May 2020)Starting your AI/ML project right (May 2020)
Starting your AI/ML project right (May 2020)Julien SIMON
 
Scale Machine Learning from zero to millions of users (April 2020)
Scale Machine Learning from zero to millions of users (April 2020)Scale Machine Learning from zero to millions of users (April 2020)
Scale Machine Learning from zero to millions of users (April 2020)Julien SIMON
 
An Introduction to Generative Adversarial Networks (April 2020)
An Introduction to Generative Adversarial Networks (April 2020)An Introduction to Generative Adversarial Networks (April 2020)
An Introduction to Generative Adversarial Networks (April 2020)Julien SIMON
 
AIM410R1 Deep learning applications with TensorFlow, featuring Fannie Mae (De...
AIM410R1 Deep learning applications with TensorFlow, featuring Fannie Mae (De...AIM410R1 Deep learning applications with TensorFlow, featuring Fannie Mae (De...
AIM410R1 Deep learning applications with TensorFlow, featuring Fannie Mae (De...Julien SIMON
 
AIM361 Optimizing machine learning models with Amazon SageMaker (December 2019)
AIM361 Optimizing machine learning models with Amazon SageMaker (December 2019)AIM361 Optimizing machine learning models with Amazon SageMaker (December 2019)
AIM361 Optimizing machine learning models with Amazon SageMaker (December 2019)Julien SIMON
 
AIM410R Deep Learning Applications with TensorFlow, featuring Mobileye (Decem...
AIM410R Deep Learning Applications with TensorFlow, featuring Mobileye (Decem...AIM410R Deep Learning Applications with TensorFlow, featuring Mobileye (Decem...
AIM410R Deep Learning Applications with TensorFlow, featuring Mobileye (Decem...Julien SIMON
 
A pragmatic introduction to natural language processing models (October 2019)
A pragmatic introduction to natural language processing models (October 2019)A pragmatic introduction to natural language processing models (October 2019)
A pragmatic introduction to natural language processing models (October 2019)Julien SIMON
 
Building smart applications with AWS AI services (October 2019)
Building smart applications with AWS AI services (October 2019)Building smart applications with AWS AI services (October 2019)
Building smart applications with AWS AI services (October 2019)Julien SIMON
 
Build, train and deploy ML models with SageMaker (October 2019)
Build, train and deploy ML models with SageMaker (October 2019)Build, train and deploy ML models with SageMaker (October 2019)
Build, train and deploy ML models with SageMaker (October 2019)Julien SIMON
 
The Future of AI (September 2019)
The Future of AI (September 2019)The Future of AI (September 2019)
The Future of AI (September 2019)Julien SIMON
 
Building Machine Learning Inference Pipelines at Scale (July 2019)
Building Machine Learning Inference Pipelines at Scale (July 2019)Building Machine Learning Inference Pipelines at Scale (July 2019)
Building Machine Learning Inference Pipelines at Scale (July 2019)Julien SIMON
 
Train and Deploy Machine Learning Workloads with AWS Container Services (July...
Train and Deploy Machine Learning Workloads with AWS Container Services (July...Train and Deploy Machine Learning Workloads with AWS Container Services (July...
Train and Deploy Machine Learning Workloads with AWS Container Services (July...Julien SIMON
 
Optimize your Machine Learning Workloads on AWS (July 2019)
Optimize your Machine Learning Workloads on AWS (July 2019)Optimize your Machine Learning Workloads on AWS (July 2019)
Optimize your Machine Learning Workloads on AWS (July 2019)Julien SIMON
 
Deep Learning on Amazon Sagemaker (July 2019)
Deep Learning on Amazon Sagemaker (July 2019)Deep Learning on Amazon Sagemaker (July 2019)
Deep Learning on Amazon Sagemaker (July 2019)Julien SIMON
 
Automate your Amazon SageMaker Workflows (July 2019)
Automate your Amazon SageMaker Workflows (July 2019)Automate your Amazon SageMaker Workflows (July 2019)
Automate your Amazon SageMaker Workflows (July 2019)Julien SIMON
 
Build, train and deploy ML models with Amazon SageMaker (May 2019)
Build, train and deploy ML models with Amazon SageMaker (May 2019)Build, train and deploy ML models with Amazon SageMaker (May 2019)
Build, train and deploy ML models with Amazon SageMaker (May 2019)Julien SIMON
 

More from Julien SIMON (20)

An introduction to computer vision with Hugging Face
An introduction to computer vision with Hugging FaceAn introduction to computer vision with Hugging Face
An introduction to computer vision with Hugging Face
 
Reinventing Deep Learning
 with Hugging Face Transformers
Reinventing Deep Learning
 with Hugging Face TransformersReinventing Deep Learning
 with Hugging Face Transformers
Reinventing Deep Learning
 with Hugging Face Transformers
 
Building NLP applications with Transformers
Building NLP applications with TransformersBuilding NLP applications with Transformers
Building NLP applications with Transformers
 
Building Machine Learning Models Automatically (June 2020)
Building Machine Learning Models Automatically (June 2020)Building Machine Learning Models Automatically (June 2020)
Building Machine Learning Models Automatically (June 2020)
 
Starting your AI/ML project right (May 2020)
Starting your AI/ML project right (May 2020)Starting your AI/ML project right (May 2020)
Starting your AI/ML project right (May 2020)
 
Scale Machine Learning from zero to millions of users (April 2020)
Scale Machine Learning from zero to millions of users (April 2020)Scale Machine Learning from zero to millions of users (April 2020)
Scale Machine Learning from zero to millions of users (April 2020)
 
An Introduction to Generative Adversarial Networks (April 2020)
An Introduction to Generative Adversarial Networks (April 2020)An Introduction to Generative Adversarial Networks (April 2020)
An Introduction to Generative Adversarial Networks (April 2020)
 
AIM410R1 Deep learning applications with TensorFlow, featuring Fannie Mae (De...
AIM410R1 Deep learning applications with TensorFlow, featuring Fannie Mae (De...AIM410R1 Deep learning applications with TensorFlow, featuring Fannie Mae (De...
AIM410R1 Deep learning applications with TensorFlow, featuring Fannie Mae (De...
 
AIM361 Optimizing machine learning models with Amazon SageMaker (December 2019)
AIM361 Optimizing machine learning models with Amazon SageMaker (December 2019)AIM361 Optimizing machine learning models with Amazon SageMaker (December 2019)
AIM361 Optimizing machine learning models with Amazon SageMaker (December 2019)
 
AIM410R Deep Learning Applications with TensorFlow, featuring Mobileye (Decem...
AIM410R Deep Learning Applications with TensorFlow, featuring Mobileye (Decem...AIM410R Deep Learning Applications with TensorFlow, featuring Mobileye (Decem...
AIM410R Deep Learning Applications with TensorFlow, featuring Mobileye (Decem...
 
A pragmatic introduction to natural language processing models (October 2019)
A pragmatic introduction to natural language processing models (October 2019)A pragmatic introduction to natural language processing models (October 2019)
A pragmatic introduction to natural language processing models (October 2019)
 
Building smart applications with AWS AI services (October 2019)
Building smart applications with AWS AI services (October 2019)Building smart applications with AWS AI services (October 2019)
Building smart applications with AWS AI services (October 2019)
 
Build, train and deploy ML models with SageMaker (October 2019)
Build, train and deploy ML models with SageMaker (October 2019)Build, train and deploy ML models with SageMaker (October 2019)
Build, train and deploy ML models with SageMaker (October 2019)
 
The Future of AI (September 2019)
The Future of AI (September 2019)The Future of AI (September 2019)
The Future of AI (September 2019)
 
Building Machine Learning Inference Pipelines at Scale (July 2019)
Building Machine Learning Inference Pipelines at Scale (July 2019)Building Machine Learning Inference Pipelines at Scale (July 2019)
Building Machine Learning Inference Pipelines at Scale (July 2019)
 
Train and Deploy Machine Learning Workloads with AWS Container Services (July...
Train and Deploy Machine Learning Workloads with AWS Container Services (July...Train and Deploy Machine Learning Workloads with AWS Container Services (July...
Train and Deploy Machine Learning Workloads with AWS Container Services (July...
 
Optimize your Machine Learning Workloads on AWS (July 2019)
Optimize your Machine Learning Workloads on AWS (July 2019)Optimize your Machine Learning Workloads on AWS (July 2019)
Optimize your Machine Learning Workloads on AWS (July 2019)
 
Deep Learning on Amazon Sagemaker (July 2019)
Deep Learning on Amazon Sagemaker (July 2019)Deep Learning on Amazon Sagemaker (July 2019)
Deep Learning on Amazon Sagemaker (July 2019)
 
Automate your Amazon SageMaker Workflows (July 2019)
Automate your Amazon SageMaker Workflows (July 2019)Automate your Amazon SageMaker Workflows (July 2019)
Automate your Amazon SageMaker Workflows (July 2019)
 
Build, train and deploy ML models with Amazon SageMaker (May 2019)
Build, train and deploy ML models with Amazon SageMaker (May 2019)Build, train and deploy ML models with Amazon SageMaker (May 2019)
Build, train and deploy ML models with Amazon SageMaker (May 2019)
 

Recently uploaded

Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentationphoebematthew05
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 

Recently uploaded (20)

Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 

How Criteo Scaled and Supported Massive Growth with MongoDB (2013)

  • 1. How Criteo Scaled and Supported Massive Growth with MongoDB MongoDB Conference New York City, June 2013 Julien SIMON Vice President, Engineering j.simon@criteo.com @julsimon
  • 2. CRITEO " 2 •  R&D EFFORT •  RETARGETING •  CPC PHASE 1 : 2005-2008 CRITEO CREATION •  MORE THAN 3000 CLIENTS •  35 COUNTRIES, 15 OFFICES •  R&D: MORE THAN 300 PEOPLE PHASE 2 : 2008-2012 GLOBAL LEADER : + 700 EMPLOYEES! 2007 15 EMPLOYEES 2009 84 EMPLOYEES 6 EMPLOYEES 2005 2010 203 EMPLOYEES 2012 +700 EMPLOYEES SO FAR 2006 2011 395 EMPLOYEES 2008 33 EMPLOYEES
  • 3. GLOBAL PRESENCE 3 SYDNEY PARIS LONDON BARCELONA MILAN MUNICH BOSTON NEW YORK SAO PAULO PALO ALTO TOKYO SEOUL STOCKHOLM AMSTERDAM 15 OFFICES, 30+ COUNTRIES CHICAGO
  • 4. GO GO GO Powered by PERFORMANCE DISPLAY Copyright © 2013 Criteo. Confidential A user sees products on your … … and sees After on the banner, the user goes back to the product page. ...then browses the 4
  • 5. ! ! REAL-TIME PERSONALIZATION 5 Copyright © 2013 Criteo. Confidential. Boutons! all original #represent SHOP NOW Couleurs Fond Disposition! WARM MEETS LIGHT! SWEET NOTHING! ADDIDAS IS ALL IN! ALL  ORIGINALS   #REPRESENT   Slogans! JOIN NOW! SEE MORE! CLICK HERE! “Call to action”! Lien ! opt-out! SEE MORE! JOIN NOW! SEE MORE! CLICK HERE! SHOP NOW! SHOP  NOW     JOIN  NOW     JOIN NOW
  • 6. PREDICTION & RECOMMENDATION 2 CORE TECHNOLOGIES choose the right product to display choose the right users / advertiser / publisher to display RECOMMENDATION ENGINE CTR + CR increase PREDICTION ENGINE
  • 7. INFRASTRUCTURE 7 Copyright © 2013 Criteo. Confidential.  DAILY TRAFFIC - HTTP REQUESTS: 30+ BILLION - BANNERS SERVED: 1+ BILLION  PEAK TRAFFIC (PER SECOND) - HTTP REQUESTS: 500,000+ - BANNERS: 25,000+  7 DATA CENTERS  SET UP AND MANAGED IN-HOUSE  AVAILABILITY > 99.95%
  • 8. 8 Copyright © 2013 Criteo. Confidential. HIGH PERFORMANCE COMPUTING FETCH, STORE, CRUNCH, QUERY 20 additional TB EVERY DAY ? …SUBTITLED « HOW I LEARNED TO STOP WORRYING AND LOVE HPC »
  • 9. PRODUCT CATALOGUES •  Catalogue = product feed provided by advertisers (product id, description, category, price, URL, etc) •  3000+ catalogues, ranging from a few MB to several tens of GB •  About 50% of products change every day •  Imported at least once a day by an in-house application •  Data replicated within a geographical zone •  Accessed through a cache layer by web servers •  Microsoft SQL Server used from day 1 •  Running fine in Europe, but… –  Number of databases (1 per advertiser)… and servers –  Size of databases –  SQL Server issues hard to debug and understand •  Running kind of fine in the US, until dead end in Q1 2011 –  transactional replication over high latency links Copyright © 2010 Criteo. Confidential.
  • 10. REQUIREMENTS FOR A NEW DB •  Scale-out architecture running on commodity hardware (aka « Intel CPUs in metal boxes ») •  No transactions needed, eventual consistency OK •  High availability •  Distributed clusters, with replication over high latency links •  Requestable (key-value not enough) •  Open source … with active user community … backed by a stable organization with long-term commitment (not one guy in a garage) … no licence fees for production use … commercial support available at reasonable cost •  Easy to learn, (re)deploy, monitor and upgrade •  « Low maintenance » (don’t need a 10-people team just to run it) •  Multi-language support •  Ability to export everything to Hadoop multiple times per day Copyright © 2010 Criteo. Confidential.
  • 11. FROM SQL SERVER TO MONGODB •  Ah, database migrations… everyone loves them J •  1st step: solve replication issue –  Import and replicate catalogues in MongoDB –  Push content to SQL Server, still queried by web servers •  2nd step: prove that MongoDB can survive our web traffic –  Modify web applications to query MongoDB –  C-a-r-e-f-u-l-l-y switch web queries to MongoDB for a small set of catalogues –  Observe, measure, A/B test… and generally make sure that the system still works •  3rd step: scale ! –  Migrate thousands of catalogues away from SQL Server –  Monitor and tweak the MongoDB clusters –  Add more MongoDB servers… and more shards –  Update ops processes (monitoring, backups, etc) Copyright © 2010 Criteo. Confidential.
  • 12. OUR MONGODB DEPLOYMENT •  Europe –  18 3-server shards (1+1+1) –  800M products, 1TB –  1B requests/day (peak at 40K/s) –  350M updates/day (peak at 11K/s) •  US –  14 4-server shards (2+2) –  400M products, 650GB •  APAC –  12 3-server shards (2+1) –  300M products, 500GB •  146 servers total : 2.0 (+ Criteo patches) à 2.2 à 2.4.3 Copyright © 2010 Criteo. Confidential.
  • 13. MONGODB, 2+ YEARS LATER •  Stable (2.4.3 much better) •  Easy to (re)install and administer •  Great for small datasets (i.e. smaller than server RAM) •  Good performance if read/write ratio is high •  Failover and inter-DC replication work (but shard early!) •  Performance suffers when : –  dataset much larger than RAM –  read/write ratio is low –  Multiple applications coexist on the same cluster •  Some scalability issues remain (master-slave, connections) •  Criteo is very interested in the 10gen roadmap J Copyright © 2010 Criteo. Confidential.
  • 14. THANKS A LOT FOR YOUR ATTENTION! 14 Copyright © 2013 Criteo. Confidential. www.criteo.com engineering.criteo.com