
Usage Rights

© All Rights Reserved

    MongoDB@sfr.fr Presentation Transcript

    • MongoDB @ SFR: sfr.fr
    • Welcome: Antoine Raith, technical team leader @ SFR
    • Web development at the Internet Division: Apache, Tomcat, JEE; 1 shared platform; 30 physical application servers; 150 Tomcat instances deployed
    • What do we face? 22M page views per day, 4.5M on the homepage alone; 8M customer authentications per day. We NEED to scale!
    • NoSQL? Increase our scalability; avoid schema/table/column dependencies; closer to the developer team than to the sysadmin or DBA teams
    • Why MongoDB? Scalable; complex queries; schema-less; easy deployment and monitoring; open source
    • Our projects based on MongoDB: [Live project] customer data; [Live project] sfr.fr targeted ads; [Development project] products catalog
    • MongoDB @ SFR: Customer Data
    • Hello! Jérôme Leleu, web architect @ SFR, in charge of SSO and the user profile service
    • Context: the user profile service (UPS) exposes web services (SOAP or JSON) that return the profile of SFR clients; the data are aggregated from many back ends of the information system
    • Data in UPS: Java 1.6, mongo driver 2.6.5, replica set + sharding
      Technical data: « local storage » collection
        ■ only 1 collection in a database
        ■ « last connection date » of the web account
        ■ 14 million documents
        ■ reads/writes by identifier of the web account (shard key)
      Some functional data are coming: « internautes » collection (6 million)…
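The slide gives the web account identifier as the shard key. As a rough illustration of why that matters, the toy model below routes a key to a shard using range-based chunks, the partitioning scheme MongoDB sharding used at the time. The chunk boundaries and shard names are made up, and this is not how mongos is implemented. A query that includes the shard key can be sent to a single shard, whereas one that omits it has to be broadcast to every shard.

```python
import bisect

# Hypothetical chunk layout: each chunk covers [lower bound, next lower bound)
# of the shard key (the web account identifier); the last chunk is open-ended.
CHUNK_LOWER_BOUNDS = ["", "g", "n", "t"]
CHUNK_OWNERS = ["shard0", "shard1", "shard0", "shard2"]


def route(account_id: str) -> str:
    """Return the shard that owns the chunk containing account_id."""
    # bisect_right finds the first lower bound strictly greater than the key;
    # the chunk we want is the one just before it.
    index = bisect.bisect_right(CHUNK_LOWER_BOUNDS, account_id) - 1
    return CHUNK_OWNERS[index]


if __name__ == "__main__":
    for account in ["alice", "henri", "zoe"]:
        print(account, "->", route(account))
```

Note that a lower bound is inclusive: an identifier equal to "g" belongs to the second chunk.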
    • Implementation in MongoDB: my choice is to read on the slave and write (without acknowledgement) on the master. The « local storage » collection must be readable immediately after a write -> not really compatible with asynchronous replication and reads on the slave -> use memcached (as for most data in UPS) as a read cache while replication catches up
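The read-after-write problem on this slide can be sketched as follows. This is a toy model, not the UPS code: a plain dict stands in for memcached, writes go to the primary, and the secondary only sees them once `replicate()` runs. It assumes the cache is filled at write time, which the slide does not state explicitly; if the cache entry were evicted before replication completed, a stale read from the secondary would still be possible.

```python
class LaggingReplicaSet:
    """Toy replica set: writes hit the primary; the secondary sees them only after replicate()."""

    def __init__(self):
        self.primary = {}
        self.secondary = {}
        self._pending = []  # operations not yet applied on the secondary

    def write(self, key, value):
        self.primary[key] = value
        self._pending.append((key, value))

    def replicate(self):
        # Apply pending operations in order, like an asynchronous oplog tail.
        for key, value in self._pending:
            self.secondary[key] = value
        self._pending.clear()

    def read_secondary(self, key):
        return self.secondary.get(key)


class CachedStore:
    """Write to the primary and to the cache; serve reads from the cache first."""

    def __init__(self, replica_set, cache):
        self.rs = replica_set
        self.cache = cache  # stands in for memcached (assumption: filled on write)

    def put(self, key, value):
        self.rs.write(key, value)
        self.cache[key] = value  # readable immediately, before replication

    def get(self, key):
        if key in self.cache:
            return self.cache[key]
        value = self.rs.read_secondary(key)
        if value is not None:
            self.cache[key] = value
        return value
```

Until `replicate()` runs, reading the secondary directly returns nothing, while `CachedStore.get` already returns the new value.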
    • Some figures: 2 GB of data and 2 GB of index for 14 million documents (from « db.stats(); »). Inserts/updates: 600k per day; communication exceptions: 6k per day. Average insert/update time: 56 ms
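To put these figures in perspective, the averages below are plain arithmetic on the slide's numbers. Two assumptions are labelled in the code: "2 GB" is read as 2 GiB, and every communication exception is counted against a write. Daily averages also hide traffic peaks.

```python
SECONDS_PER_DAY = 24 * 60 * 60

writes_per_day = 600_000       # inserts + updates
exceptions_per_day = 6_000     # communication exceptions
documents = 14_000_000
data_bytes = 2 * 1024 ** 3     # assumption: "2 GB" read as 2 GiB

avg_writes_per_second = writes_per_day / SECONDS_PER_DAY
exception_ratio = exceptions_per_day / writes_per_day  # assumption: one exception = one failed write
avg_document_bytes = data_bytes / documents

print(f"{avg_writes_per_second:.1f} writes per second on average")
print(f"{exception_ratio:.1%} of writes hit a communication exception")
print(f"about {avg_document_bytes:.0f} bytes of data per document")
```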
    • Problems & pending questions:
      Default values of the Java mongo driver are inappropriate: unlimited connect timeout, unlimited read timeout, and a 120-second wait to get a connection from the pool!
      Can't make an « AND » query on the same field before mongo 2.0
      Is it a good choice to read on the slave and write on the master? What is the replication time? Is it a real use case?
      Replace it with: acknowledged writes and reads on the slave? OR unacknowledged writes and reads on the master?
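One plausible reason for the « AND » limitation, sketched with Python dicts standing in for a map-based query object such as the Java driver's BasicDBObject: putting a second condition on the same key silently replaces the first. The helper names `naive_query` and `and_query` are hypothetical. From MongoDB 2.0, wrapping each condition in its own sub-document under `$and` keeps both.

```python
def naive_query(*conditions):
    """Merge (field, condition) pairs into one map, as repeated put() calls would."""
    query = {}
    for field, condition in conditions:
        query[field] = condition  # a second condition on the same field replaces the first
    return query


def and_query(*conditions):
    """Give each condition its own sub-document under $and (MongoDB 2.0+)."""
    return {"$and": [{field: condition} for field, condition in conditions]}


if __name__ == "__main__":
    print(naive_query(("tags", "phone"), ("tags", "4g")))
    print(and_query(("tags", "phone"), ("tags", "4g")))
```

When both conditions are operators on one field, such as a price range, they can also share a single sub-document, e.g. `{"price": {"$gt": a, "$lt": b}}`, which needs no `$and`.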
    • Mongo @ SFR: Targeted ads application
    • Hi! Matthieu Blanc, web architect @ Degetel, contractor for SFR
    • Context: present targeted ads to www.sfr.fr web visitors, based on:
      ● their profile
      ● their web browsing history
      ● date/time of day
      ● etc.
    • Example: a web visitor views a smartphone on www.sfr.fr
    • A smartphone ad is shown when they go back to the homepage
    • Example: a web visitor arrives at www.sfr.fr from a search engine
    • An ad related to their search is shown
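The two examples above amount to a rule that looks back through the visitor's browsing history. Below is a minimal sketch. It assumes a hypothetical history of visits, each recorded as a path plus optional search terms from the referring search engine, and a hypothetical mapping from site sections to campaigns. The slides do not describe the actual targeting rules.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical mapping from a www.sfr.fr section to an ad campaign.
SECTION_ADS = {"/smartphones/": "smartphone-promo", "/box/": "box-promo"}
DEFAULT_AD = "generic-homepage"


@dataclass
class Visit:
    path: str
    search_terms: Optional[str] = None  # set when the visitor came from a search engine


def pick_ad(history: List[Visit]) -> str:
    """Choose a homepage ad from the most recent visit that carries a signal."""
    for visit in reversed(history):
        if visit.search_terms:
            return "search:" + visit.search_terms
        for prefix, ad in SECTION_ADS.items():
            if visit.path.startswith(prefix):
                return ad
    return DEFAULT_AD
```

A visit to the homepage itself ("/") carries no signal, so the loop keeps looking further back.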
    • Problem: we need to keep each web visitor's browsing history and to track every:
      ● ad view
      ● click
      ● conversion
      MongoDB to the rescue!
    • The D.U.N.C.E. principle: everything by default (image from http://www.flickr.com/photos/cayusa/)
    • The D.U.N.C.E. principle: everything by default: Java 1.6; Spring Data for MongoDB 1.0.0 (uses mongo driver 2.7.1); read/write on master; no sharding; WriteConcern.NORMAL
    • Case study: Event Logging with MongoDB
    • Event Logging: capped collections
      db.createCollection("mycoll", {capped: true, size: 100000})
      Old log data automatically ages out, oldest first (insertion order)
      No risk of filling up a disk
      No need to write log archival/deletion scripts
      Good performance for a high number of writes compared to reads
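The capped-collection behaviour listed above can be modelled in a few lines: a fixed byte budget, insertion order preserved, and the oldest entries dropped to make room. This is only a toy model of the semantics. A real capped collection stores BSON documents in space sized by the `size` option (in bytes), not raw byte strings in a deque.

```python
from collections import deque


class CappedLog:
    """Fixed-size, insertion-ordered log: the oldest entries are evicted to make room."""

    def __init__(self, max_bytes: int):
        self.max_bytes = max_bytes
        self.used = 0
        self.entries = deque()

    def insert(self, event: bytes) -> None:
        if len(event) > self.max_bytes:
            raise ValueError("event larger than the whole collection")
        # Drop entries from the front (oldest first) until the new one fits.
        while self.used + len(event) > self.max_bytes:
            oldest = self.entries.popleft()
            self.used -= len(oldest)
        self.entries.append(event)
        self.used += len(event)
```

Because eviction follows insertion order, no archival or deletion script is needed and disk usage never exceeds the budget.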
    • Log analysis: Map Reduce <- we are bad at this
      Cron job 1 -> server-side log aggregation by minute and by ad; aggregated logs persisted in a dedicated collection
      Cron job 2 -> consolidate aggregated logs by hour, every day
      Cron job 3 -> consolidate aggregated logs by day, every week
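The three cron jobs form a rollup chain: raw events are counted per minute and per ad, and those counts are then re-bucketed by hour and by day. The in-memory sketch below uses `Counter` with hypothetical `(timestamp, ad_id, event_type)` tuples. In MongoDB the buckets would more likely be upserted into the dedicated collection with `$inc`, but the slides do not show that code.

```python
from collections import Counter
from datetime import datetime


def truncate(ts: datetime, unit: str) -> datetime:
    """Round a timestamp down to the start of its minute, hour or day."""
    if unit == "minute":
        return ts.replace(second=0, microsecond=0)
    if unit == "hour":
        return ts.replace(minute=0, second=0, microsecond=0)
    if unit == "day":
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    raise ValueError(f"unknown unit {unit}")


def aggregate_events(events, unit="minute"):
    """Cron job 1 (sketch): count raw events per time bucket, ad and event type."""
    counts = Counter()
    for ts, ad_id, event_type in events:
        counts[(truncate(ts, unit), ad_id, event_type)] += 1
    return counts


def roll_up(counts, unit):
    """Cron jobs 2 and 3 (sketch): re-bucket existing counts into a coarser unit."""
    coarser = Counter()
    for (bucket, ad_id, event_type), n in counts.items():
        coarser[(truncate(bucket, unit), ad_id, event_type)] += n
    return coarser
```

Rolling up already-aggregated counts, rather than re-reading raw events, is what keeps the hourly and daily jobs cheap.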
    • Event Logging
    • The Result
    • Conclusion, some data: main collection (visitors' web browsing history): 36 million documents and growing; avg. document size 430 bytes. 80 million events processed in less than 3 months. Per second: 60 reads, 50 writes (60 finds, 30 updates, 20 inserts)
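A quick back-of-the-envelope check of these numbers follows. "Less than 3 months" is rounded up to 90 days, so the computed event rate is a lower bound. The raw size excludes indexes and storage overhead.

```python
SECONDS_PER_DAY = 24 * 60 * 60

documents = 36_000_000
avg_document_bytes = 430
events = 80_000_000
period_days = 90  # assumption: "less than 3 months" rounded up to 90 days

raw_data_bytes = documents * avg_document_bytes  # excludes indexes and storage overhead
events_per_second = events / (period_days * SECONDS_PER_DAY)
ops_per_second = 60 + 50  # reads + writes from the slide

print(f"raw documents: {raw_data_bytes / 1e9:.1f} GB")
print(f"events: at least {events_per_second:.1f} per second on average")
print(f"operations: {ops_per_second} per second")
```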
    • Conclusion: It works! :) Default properties are good enough even for a high-traffic website (for now...)
    • Mongo @ SFR: Products catalog
    • Good morning! David Rault, web architect @ SFR, in charge of the MarketPlace project. @squat80 http://fr.linkedin.com/pub/david-rault/37/722/963
    • Context:
      ● Products classified by categories
      ● Categories determine product features
      ● Multiple sellers
        ○ can create new products (based on EAN/MPN)
          ■ can modify the products they created
          ■ can only refer to products created by other sellers
        ○ publish offers (product id + price)
      ● Order management is out of scope
        ○ delegated to the existing order-management system
      ● Still in development
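The seller rules above can be expressed as two small checks. A seller may modify only the products they created, but may publish an offer on any existing product. The sketch below is illustrative only: the `created_by` field and the function names are hypothetical, not taken from the MarketPlace code.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Product:
    product_id: str
    ean: str
    created_by: str  # hypothetical: id of the seller who created the product


@dataclass
class Offer:
    product_id: str
    seller_id: str
    price: float


def can_modify(product: Product, seller_id: str) -> bool:
    """Only the creating seller may modify a product."""
    return product.created_by == seller_id


def publish_offer(catalog: Dict[str, Product], seller_id: str,
                  product_id: str, price: float) -> Offer:
    """Any seller may publish an offer on any existing product, including other sellers' products."""
    if product_id not in catalog:
        raise KeyError(f"unknown product {product_id}")
    return Offer(product_id, seller_id, price)
```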
    • Why Mongo?
      ● Schema-less: products are structured documents
        ○ Different properties depending on the product category (TVs, phone protections, wires, ...)
        ○ No JOIN required: documents load in a single call
        ○ New categories will come: no migration required
      ● Searching capabilities
        ○ Empowers navigation through the store
        ○ Complex queries on product features
      ● Performance
        ○ Our Ops forbid intensive writes into the Oracle DB (!)
    • Technical choices: Java 7, Tomcat 7; direct use of the Java driver (2.7.2); replica set (2 replicas + 1 arbiter); sharding enabled; writes are replicas-safe
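Here is a quick check of what this replica-set layout implies. It assumes "replicas-safe" means the Java driver's `WriteConcern.REPLICAS_SAFE`, i.e. w=2. An election needs a strict majority of voting members. With 2 data-bearing members plus 1 arbiter, a primary therefore survives the loss of one data node. A w=2 write, however, then has only one data-bearing member left to acknowledge it, and cannot be confirmed until the second member returns.

```python
def majority(voting_members: int) -> int:
    """Smallest strict majority of the voting members."""
    return voting_members // 2 + 1


def can_elect_primary(voters_up: int, voting_members: int) -> bool:
    return voters_up >= majority(voting_members)


def can_acknowledge(w: int, data_members_up: int) -> bool:
    # Arbiters hold no data, so only data-bearing members count towards w.
    return data_members_up >= w


# Layout from the slide: 2 data-bearing members + 1 arbiter = 3 voters.
VOTERS, DATA_MEMBERS, W = 3, 2, 2  # W=2 is the assumed meaning of "replicas-safe"

for data_down in (0, 1):
    voters_up = VOTERS - data_down
    data_up = DATA_MEMBERS - data_down
    print(f"{data_down} data member(s) down: "
          f"primary={can_elect_primary(voters_up, VOTERS)}, "
          f"w={W} write acknowledged={can_acknowledge(W, data_up)}")
```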
    • "Back-office" design
      ● WS for creation/update of products and offers
      ● Triggers (scheduled) to consolidate data
        ○ for each product: valid offers on a 2-day window are aggregated into the product
        ○ for each category: product counts and pseudo-enumerated field values (e.g. list of brands) are aggregated into the category
      ● "Live streaming" into Google Search Appliance
        ○ feed for both internal keyword searches & portal-wide searches (within *.sfr.fr sites)
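The per-product trigger can be sketched as a function that embeds the currently relevant offers into a copy of the product document. This rests on an assumption about what "valid offers on a 2-day window" means: an offer counts if its validity period `[valid_from, valid_to)` overlaps `[now, now + 2 days)`. The field names (`valid_from`, `valid_to`, `min_price`, `offers`) are hypothetical.

```python
from datetime import datetime, timedelta


def consolidate(product, offers, now, window=timedelta(days=2)):
    """Return a copy of product with the offers valid during [now, now + window) embedded."""
    end = now + window
    valid = [
        o for o in offers
        if o["product_id"] == product["_id"]
        and o["valid_from"] < end   # starts before the window closes
        and o["valid_to"] > now     # ends after the window opens
    ]
    consolidated = dict(product)  # leave the input document untouched
    consolidated["offers"] = [{"seller": o["seller"], "price": o["price"]} for o in valid]
    consolidated["min_price"] = min((o["price"] for o in valid), default=None)
    return consolidated


if __name__ == "__main__":
    tv = {"_id": "p1", "name": "TV"}
    offer = {"product_id": "p1", "seller": "s1", "price": 500,
             "valid_from": datetime(2012, 2, 20), "valid_to": datetime(2012, 3, 5)}
    print(consolidate(tv, [offer], now=datetime(2012, 3, 1)))
```

Pre-computing `min_price` this way lets front-office price filters run against the product document alone.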
    • "Front-office" design
      ● Straightforward queries
        ○ mostly READs
        ○ by product id, by category
        ○ filtering (min/max price, by brand, by color, ...)
          ■ filters are category-specific
      ● Customer-activity tracking
        ○ build a knowledge base for future features:
          ■ recommendation engine
        ○ products viewed, previous orders, wish-list, etc.
        ○ both for identified and anonymous visitors
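The category-specific filters can be turned into a single query document. The sketch below makes several assumptions: products carry a `category` field, a pre-computed `min_price`, and category-dependent attributes under a hypothetical `features` sub-document. Both price bounds share one sub-document, so the query needs no `$and`.

```python
def build_filter_query(category, min_price=None, max_price=None, **features):
    """Build a query document from optional, category-specific filters (hypothetical field names)."""
    query = {"category": category}
    price = {}
    if min_price is not None:
        price["$gte"] = min_price
    if max_price is not None:
        price["$lte"] = max_price
    if price:
        # Both bounds live in one sub-document on the same field.
        query["min_price"] = price
    for name, values in features.items():
        # e.g. brand=["Samsung", "LG"] -> {"features.brand": {"$in": [...]}}
        query["features." + name] = {"$in": list(values)}
    return query
```

Because the feature names are free keyword arguments, a new category with new attributes needs no code change, in line with the schema-less argument on the "Why Mongo?" slide.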
    • How is it going?
      ● Need to unlearn 10+ years of experience in relational design/development
        ○ Think "document", not relation
        ○ No magical (a.k.a. ORM) framework
          ■ bye bye Hibernate ;)
        ○ Some surprises/confusion with the query syntax
          ■ No "$and" in versions < 2.0: couldn't write some queries with the Java driver (though they worked in the mongo shell)
            ● "min_price > a and min_price > b"
          ■ Query operators appear at varying positions
            ● { "$or": [ { "some_field": some_value }, { "other_field": other_value } ] }
            ● { "some_field": { "$in": some_values } }
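The positional rule that causes the confusion is this: logical operators such as `$and` and `$or` sit at the top level of a query document and take a list of sub-queries, while comparison operators such as `$lt`, `$gt` and `$in` sit inside a field's sub-document. The tiny evaluator below makes that split explicit. It is a teaching sketch, not MongoDB's matcher. It supports only these five operators and ignores array fields, type ordering and everything else.

```python
def field_matches(value, condition):
    """Evaluate a field-level condition: a plain value or a sub-document of operators."""
    if isinstance(condition, dict) and condition and all(k.startswith("$") for k in condition):
        for op, arg in condition.items():
            if op == "$lt":
                ok = value is not None and value < arg
            elif op == "$gt":
                ok = value is not None and value > arg
            elif op == "$in":
                ok = value in arg
            else:
                raise ValueError(f"unsupported operator {op}")
            if not ok:
                return False
        return True
    return value == condition


def matches(doc, query):
    """Evaluate a query document: $and/$or at the top level, operators under field names."""
    for key, condition in query.items():
        if key == "$and":
            if not all(matches(doc, sub) for sub in condition):
                return False
        elif key == "$or":
            if not any(matches(doc, sub) for sub in condition):
                return False
        elif key.startswith("$"):
            raise ValueError(f"unsupported top-level operator {key}")
        elif not field_matches(doc.get(key), condition):
            return False
    return True
```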
    • How is it still going?
      ● Good performance
        ○ although with a relatively low number of documents (~5,000 to 10,000)
      ● Fast development cycle
        ○ only a few hours to get the first prototype running
        ○ with Google's help and a couple of hours, built a micro full-text indexing search feature
      ● The mongo shell is my friend
        ○ as well as Google & MongoDB.org
        ○ at last, a developer-friendly (command-line) tool
          ■ bye bye sqlplus ;)
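The slides do not say how the micro full-text search was built. One common approach with MongoDB at the time was to store a normalized keyword array on each document, index it (a multikey index), and query it with `$all`. The sketch below shows only that idea. The `keywords` field and the helper names are hypothetical, and `search` is an in-memory stand-in for running the query against the collection.

```python
import re
import unicodedata


def keywords(text):
    """Lower-case, strip accents and split into a sorted list of unique tokens."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    without_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return sorted(set(re.findall(r"[a-z0-9]+", without_accents)))


def keyword_query(text):
    """Query document matching products whose keywords array contains every search token."""
    return {"keywords": {"$all": keywords(text)}}


def search(products, text):
    """In-memory equivalent of keyword_query, for illustration."""
    wanted = set(keywords(text))
    return [p["_id"] for p in products if wanted <= set(p["keywords"])]
```

Stripping accents at both indexing and query time lets "câble" and "cable" match the same products.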
    • Thank You! (image "borrowed" from Geek and Poke, http://geekandpoke.typepad.com/)