Why we chose mongodb for guardian.co.uk


  1. Why we chose mongodb for guardian.co.uk Graham Tackley Web Platform Team Lead, guardian.co.uk
  2. “It is not the strongest of the species that survives, nor the most intelligent. It is the one that is most adaptable to change.”
  3. Early Period circa ’95 The “Lash It Together” era
  4. Early Period (95, the “Lash It Together” era) Perl, CGI, apache Experimental Manual processes Bespoke software RDBMS, scripts & static files
  5. Mid Period circa ’00 The “Vendor CMS” era
  6. Mid Period: 2000s (The “Vendor CMS era”) Vignette / AOLserver TCL, Apache, Oracle Platform for online publishing Initially scales well with acceleration in delivery of features
  7. Mid Period: 2000s (The “Vendor CMS era”) Surprise! Vendor’s CMS doesn’t do what we want! Mish-mash in templates: HTML, JavaScript, TCL, SQL, PL-SQL No model in app tier, only in RDBMS schema created in Oracle Designer
  8. Mid Period: 2000s (The “Vendor CMS era”)
  9. Mid Period: 2000s (The “Vendor CMS era”)
  10. Mid Period: 2000s (The “Vendor CMS era”) After a few years, very difficult to extend Database schema becomes fixed due to dependencies in templates
  11. Mid Period: 2000s (The “Vendor CMS era”) If you can’t change the system:
  12. Modern Period circa ’05-09 The “J2EE Monolithic” era
  13. [Architecture diagram: Web servers, App servers, Oracle, CMS, Data feeds. Caption: I bring you NEWS!!!]
  14. Modern java app: Spring / Hibernate; DDD / TDD; Strong model in java; Database abstracted away with ORM. [Same architecture diagram: Web servers, App servers, Oracle, CMS, Data feeds]
  15. Problems
  16. Each release involves schema upgrade Schema upgrade = downtime for journalists
  17. Complexity still increasing: 300+ tables, 10,000 lines of hibernate XML config 1,000 domain objects mapped to database 70,000 lines of domain object code Very tight binding to database
  18. ORM not really masking complexity: Database has strong influence on domain model: many domain objects made more complex by mapping joins in RDBMS Complex hibernate features used, interceptors, proxies Complex caching strategy Lots of optimisations And: We still hand code complex queries in SQL!
  19. Load becoming an issue RDBMS difficult to scale
  20. Partial NoSQL circa ’09-10 The “Sticking Plaster” era
  21. Introduce yet more caching to patch up load problems Introduction of memcached
  22. Decouple applications from database by building APIs Power APIs using alternative, more scalable technologies APIs used to scale out database reads Writes still go to RDBMS
  23. Content API Mutualised news! http://content.guardianapis.com Read API delivered using Apache Solr Hosted in EC2 Document oriented search engine Scales well for read operations
  24. [Architecture diagram; labels: Core Api, Web servers, App server, Memcached (20Gb), rdbms, CMS, Solr, Solr/API (x5), Cloud, EC2]
  25. We’ve solved our load problem (for now) but increased our complexity
  26. We now have 3 models! RDBMS tables, Java Objects, JSON API
  27. Mutualised news!
  28. Mutualised news!
  29. Mutualised news!
  30. JSON API is very simple Multiple domain concepts expressed in single document Can be designed in forwardly extensible way What if the JSON API was our primary model?
  31. Full NoSQL in development The “It’s the future!” era
  32. Database selection CouchDB: Simple keystore. Too simple? Cassandra: Huge scalability. Do we need it? Schema design difficult. MongoDB: Simple to use, can execute similar queries to RDBMS
  33. MongoDB Document oriented database Stores parsed JSON documents Can express complex queries Can be flexible about consistency Malleable schema: can easily change at runtime Can work at both large & small scales
  34. Flexible Schema
  35. Flexible Schema
  36. Flexible Schema Can easily represent different classes of tag as documents Both documents can be inserted into same collection Far simpler than equivalent hibernate mapped subclass configuration
  37. Flexible Schema Simple to query:
  38. Flexible Schema Simple to query: Query operators: $ne, $nin, $all, $exists, $gt, $lt, $gte ...
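
To make the flexible-schema slides more concrete, here is a minimal sketch in plain Python, with no MongoDB server or driver involved. The two tag documents, their field names and values, and the `matches` / `find` helpers are all hypothetical and invented for illustration. The helpers mimic only a simplified subset of the query operators listed on the slide (no nested fields, no array-element matching for comparisons) and should not be read as how MongoDB itself evaluates queries.

```python
import operator

# Two hypothetical tag documents with different shapes in one "collection".
tags = [
    {"id": "tone/reviews", "type": "tone", "name": "Reviews", "articles": 120},
    {"id": "music/madonna", "type": "keyword", "name": "Madonna",
     "section": "music", "aliases": ["madonna", "pop"], "articles": 45},
]

_MISSING = object()  # sentinel for "field not present in this document"
_COMPARE = {"$gt": operator.gt, "$lt": operator.lt, "$gte": operator.ge}


def matches(doc, query):
    """Return True if doc satisfies every condition in query (simplified)."""
    for field, cond in query.items():
        value = doc.get(field, _MISSING)
        if not isinstance(cond, dict):
            cond = {"$eq": cond}  # {"type": "tone"} is shorthand for equality
        for op, arg in cond.items():
            if op == "$eq":
                ok = value is not _MISSING and value == arg
            elif op == "$exists":
                ok = (value is not _MISSING) == bool(arg)
            elif op == "$ne":
                ok = value is _MISSING or value != arg
            elif op == "$nin":
                ok = value is _MISSING or value not in arg
            elif op == "$all":
                ok = isinstance(value, list) and all(x in value for x in arg)
            elif op in _COMPARE:
                ok = value is not _MISSING and _COMPARE[op](value, arg)
            else:
                raise ValueError(f"unsupported operator: {op}")
            if not ok:
                return False
    return True


def find(collection, query):
    """Return the documents in collection that match query."""
    return [doc for doc in collection if matches(doc, query)]
```

For example, `find(tags, {"section": {"$exists": True}})` selects only the keyword tag, because the tone tag has no `section` field at all. Neither document needed a schema declaration for that to work.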
  39. Modifying the schema
  40. Modifying the schema
  41. Modifying the schema
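
The speaker notes for these slides describe an update whose where clause is the tag id and whose `$push` adds an external reference to the tag. The slide images are not in the transcript, so the following is only a sketch of what such an update might look like, written in plain Python. The field name `externalReferences`, the reference values, and the `apply_push` helper are assumptions for illustration; `apply_push` is an in-memory stand-in, not a MongoDB API.

```python
import copy

# Hypothetical tag document before the change.
tag = {"id": "music/madonna", "type": "keyword", "name": "Madonna"}

# Selector ("where clause") and update document, in MongoDB's JSON-like shape.
selector = {"id": "music/madonna"}
update = {
    "$push": {
        "externalReferences": {"type": "musicbrainz", "token": "example-id"}
    }
}


def apply_push(doc, selector, update):
    """Return a copy of doc with $push applied, if doc matches selector."""
    if any(doc.get(key) != value for key, value in selector.items()):
        return doc  # selector does not match: leave the document alone
    result = copy.deepcopy(doc)
    for field, item in update.get("$push", {}).items():
        # The array field is created on first use; no schema change is needed.
        result.setdefault(field, []).append(item)
    return result


updated = apply_push(tag, selector, update)
```

With a real driver such as PyMongo, the same two dictionaries would typically be passed to `collection.update_one(selector, update)`, and the server would apply the push to the matching document.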
  42. The first project: Identity Current login/registration system still in TCL/PL-SQL 3M+ users in relational database Very complex schema + PL-SQL New system required Can we migrate from Oracle to NoSQL?
  43. Build API that can support both backends Registration app guardian.co.uk API This bit is hard! Oracle
  44. Build API that can support both backends Registration app guardian.co.uk API MongoDB Oracle
  45. Migrate using API & decommission Registration app guardian.co.uk API MongoDB
  46. Add new stuff! Registration app guardian.co.uk API MongoDB Solr? Redis?
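
The dual-backend API in slides 43 to 45 can be sketched roughly as follows. The speaker notes mention "lazy migration" but do not say how it was done, so the copy-on-read strategy here is an assumption, as are the class and method names. `DictStore` is a hypothetical in-memory stand-in for both the Oracle and MongoDB backends so that the example runs on its own.

```python
class DictStore:
    """Hypothetical in-memory stand-in for a user datastore."""

    def __init__(self, users=None):
        self.users = dict(users or {})

    def get(self, user_id):
        return self.users.get(user_id)

    def put(self, user_id, user):
        self.users[user_id] = user


class MigratingUserApi:
    """Serve users from the new store, lazily copying from the legacy one."""

    def __init__(self, legacy, target):
        self.legacy = legacy  # e.g. the existing relational backend
        self.target = target  # e.g. the new document store

    def get_user(self, user_id):
        user = self.target.get(user_id)
        if user is None:
            user = self.legacy.get(user_id)
            if user is not None:
                # Copy on first access, so active users migrate themselves.
                self.target.put(user_id, user)
        return user

    def save_user(self, user_id, user):
        # New and changed records go only to the new store.
        self.target.put(user_id, user)


oracle = DictStore({"u1": {"email": "reader@example.com"}})
mongo = DictStore()
api = MigratingUserApi(legacy=oracle, target=mongo)
first_read = api.get_user("u1")
```

Because callers only see `MigratingUserApi`, users who never log in can be moved by a later batch job, after which the legacy store can be dropped without changing the callers.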
  47. MongoDB Simple, flexible schema with similar query & indexing to RDBMS Great at small or large scale Easy for developers to get going Commercial support available (10gen) One day may power all of guardian.co.uk No transactions / joins: developers must cater for this Produces a net reduction in lines of code / complexity
  48. Shameless plugs http://content.guardianapis.com We’re hiring: http://www.gnmcareers.co.uk ref JS323 graham.tackley@guardian.co.uk - @tackers

Editor's Notes

  • Theme: evolution of platform. Adapting to change is critical; will start with some history as to how we adapted to change.
  • Ancient system. Scripts & database. Bespoke software, changes difficult.
  • Site oriented to broadcast publishing model. CMS helps. No longer lashing things together.
  • Template & RDBMS oriented design, and TCL = no real domain model. Heavyweight schema change process.
  • This is from a TEMPLATE! Scroll down to reveal HTML (about 10,000 of these).
  • Bottom of template. About 10,000 of these!
  • Can’t change schema easily, too many dependencies in templates.
  • Dodo. E.g. at start just articles; now video, interactives, audio, galleries, live blogs...
  • “Web 2.0”, community, RSS, discoverability, tagging.
  • Very standard 3 tier application. Scale application servers on load. Caching local to application server at first. Memcached added later. Read heavy, broadcast model. Almost no writes compared to reads.
  • Very standard 3 tier application. Scale application servers on load. Caching local to application server at first. Memcached added later (in next era!). Read heavy, broadcast model. Almost no writes compared to reads.
  • Talk: beginning to use NoSQL in real organisation. Change in journalism affecting platform.
  • We don’t have a scale problem with current application & model. (Interesting fact: small dip at end is actually period of very high load. Caching works.)
  • Most of our new features - and partners - drive from the content API.
  • Introduction of memcached & Solr. Solr hosted in the cloud (EC2).
  • “Out” service.
  • Most recent content.
  • Most recent content with tags, fields (this is pretty well how we went live with the content API).
  • Single article with media. Extensible schema, e.g. adding geotagging to images: hard in DB, easy in JSON. This document represents at least 30 database tables!
  • Couch: used at BBC. Too simple. Cassandra: Impressive. Do we need it? Schema design tricky. MongoDB: Not a huge mindset change. Devs working in a few days.
  • Not a million miles from an RDBMS. Simpler.
  • Experiments with MongoDB & content API. Guardian site categorises content with tags. Tone tag represents “editorial tone” of content. (SKIP IF LESS THAN 10 MINS TO GO!)
  • Different tag types can have different schemas. Keywords (subjects) are in a section, e.g. music / madonna.
  • Suppose we want to add external MusicBrainz ID to tag? An update can modify the schema at runtime. No downtime.
  • Where clause: id. $push atomically adds external reference onto tag.
  • Resulting document now looks like this.
  • Migration project, not green fields.
  • REST API. Mapped initially just to Oracle, then (next slide) to both datastores. Integration tested.
  • API supports both data stores - lazy migration. Currently writing this - so far 60-70% less code for mongo version.
  • Then batch migration and bye bye Oracle.
  • In the future?
