Business Track: How Criteo Scaled and Supported Massive Growth with MongoDB

3,769 views

Published on

Published in: Technology
0 Comments
2 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total views
3,769
On SlideShare
0
From Embeds
0
Number of Embeds
1,475
Actions
Shares
0
Downloads
61
Comments
0
Likes
2
Embeds 0
No embeds

No notes for slide
  • http://www.istockphoto.com/stock-photo-14605073-world-map.php?st=71d31fc
  • http://www.shutterstock.com/pic.mhtml?id=50969467http://www.shutterstock.com/pic.mhtml?id=67739992
  • http://www.istockphoto.com/stock-photo-15438323-taxes-due.phphttp://www.shutterstock.com/pic.mhtml?id=106229987http://www.istockphoto.com/stock-photo-11925038-nerdy-office-worker-drinking-coffee.php
  • Business Track: How Criteo Scaled and Supported Massive Growth with MongoDB

    1. 1. How Criteo Scaled and SupportedMassive Growth with MongoDBJulien SIMONVice President, Engineeringj.simon@criteo.com @julsimon
    2. 2. CRITEO2• R&D EFFORT• RETARGETING• CPCPHASE 1 : 2005-2008CRITEO CREATION• MORE THAN 3000 CLIENTS• 35 COUNTRIES, 15 OFFICES• R&D: MORE THAN 300 PEOPLEPHASE 2 : 2008-2012GLOBAL LEADER : + 700 EMPLOYEES!200715EMPLOYEES 200984EMPLOYEES6EMPLOYEES20052010203EMPLOYEES2012+700EMPLOYEES SO FAR20062011395EMPLOYEES200833EMPLOYEES
    3. 3. GLOBAL PRESENCE3SYDNEYPARISLONDONBARCELONAMILANMUNICHBOSTONNEW YORKSAO PAULOPALO ALTOTOKYOSEOULSTOCKHOLMAMSTERDAM15 OFFICES, 30+COUNTRIESCHICAGO
    4. 4. GO GOGOPowered byPERFORMANCE DISPLAYCopyright © 2013 Criteo. ConfidentialA user sees productson your …… and seesAfter on the banner,the user goes back to the product page....then browsesthe4
    5. 5. REAL-TIME PERSONALIZATION5Copyright © 2013 Criteo. Confidential.Boutonsall original #representSHOP NOWCouleursFond DispositionWARM MEETS LIGHTSWEET NOTHINGADDIDAS IS ALL INALL ORIGINALS#REPRESENTSlogansJOIN NOWSEE MORECLICK HERE“Call to action”Lienopt-outSEE MOREJOIN NOWSEE MORECLICK HERESHOP NOWSHOP NOWJOIN NOWJOIN NOW
    6. 6. PREDICTION & RECOMMENDATION2 CORETECHNOLOGIESchoose the right productto displaychoose the right users / advertiser /publisher to displayRECOMMENDATIONENGINECTR + CRincreasePREDICTIONENGINE
    7. 7. INFRASTRUCTURE7Copyright © 2013 Criteo. Confidential.DAILY TRAFFIC- HTTP REQUESTS: 30+ BILLION- BANNERS SERVED: 1+ BILLIONPEAK TRAFFIC (PER SECOND)- HTTP REQUESTS: 500,000+- BANNERS: 25,000+7 DATA CENTERSSET UP AND MANAGEDIN-HOUSEAVAILABILITY > 99.95%
    8. 8. 8Copyright © 2013 Criteo. Confidential.HIGH PERFORMANCE COMPUTINGFETCH, STORE, CRUNCH, QUERY 20 additional TB EVERY DAY ?…SUBTITLED « HOW I LEARNED TO STOP WORRYING AND LOVE HPC »
    9. 9. PRODUCT CATALOGUES• Catalogue = product feed provided by advertisers (productid, description, category, price, URL, etc)• 3000+ catalogues, ranging from a few MB to several tens of GB• About 50% of products change every day• Imported at least once a day by an in-house application• Data replicated within a geographical zone• Accessed through a cache layer by web servers• Microsoft SQL Server used from day 1• Running fine in Europe, but…– Number of databases (1 per advertiser)… and servers– Size of databases– SQL Server issues hard to debug and understand• Running kind of fine in the US, until dead end in Q1 2011– transactional replication over high latency linksCopyright © 2010 Criteo. Confidential.
    10. 10. REQUIREMENTS FOR A NEW DB• Scale-out architecture running on commodity hardware(aka « Intel CPUs in metal boxes »)• No transactions needed, eventual consistency OK• High availability• Distributed clusters, with replication over high latency links• Requestable (key-value not enough)• Open source… with active user community… backed by a stable organization with long-term commitment (not one guy in a garage)… no licence fees for production use… commercial support available at reasonable cost• Easy to learn, (re)deploy, monitor and upgrade• « Low maintenance » (don’t need a 10-people team just to run it)• Multi-language support• Ability to export everything to Hadoop multiple times per dayCopyright © 2010 Criteo. Confidential.
    11. 11. FROM SQL SERVER TO MONGODB• Ah, database migrations… everyone loves them • 1st step: solve replication issue– Import and replicate catalogues in MongoDB– Push content to SQL Server, still queried by web servers• 2nd step: prove that MongoDB can survive our web traffic– Modify web applications to query MongoDB– C-a-r-e-f-u-l-l-y switch web queries to MongoDB for a small set of catalogues– Observe, measure, A/B test… and generally make sure that the system still works• 3rd step: scale !– Migrate thousands of catalogues away from SQL Server– Monitor and tweak the MongoDB clusters– Add more MongoDB servers… and more shards– Update ops processes (monitoring, backups, etc)Copyright © 2010 Criteo. Confidential.
    12. 12. OUR MONGODB DEPLOYMENT• Europe– 18 3-server shards (1+1+1)– 800M products, 1TB– 1B requests/day (peak at 40K/s)– 350M updates/day (peak at 11K/s)• US– 14 4-server shards (2+2)– 400M products, 650GB• APAC– 12 3-server shards (2+1)– 300M products, 500GB• 146 servers total :2.0 (+ Criteo patches)  2.2  2.4.3Copyright © 2010 Criteo. Confidential.
    13. 13. MONGODB, 2+ YEARS LATER• Stable (2.4.3 much better)• Easy to (re)install and administer• Great for small datasets (i.e. smaller than server RAM)• Good performance if read/write ratio is high• Failover and inter-DC replication work (but shard early!)• Performance suffers when :– dataset much larger than RAM– read/write ratio is low– Multiple applications coexist on the same cluster• Some scalability issues remain (master-slave, connections)• Criteo is very interested in the 10gen roadmap Copyright © 2010 Criteo. Confidential.
    14. 14. THANKS A LOT FOR YOUR ATTENTION!14Copyright © 2013 Criteo. Confidential.www.criteo.comengineering.criteo.com

    ×