Moving from Relational to Document Store

guardian.co.uk is a leading UK-based news website. We've spent ten years fighting relational database representations of our domain model, until the implementation of our API made us realise that if only we could store documents, everything would get simpler. I'll talk about the history that led us to choose MongoDB as a key part of our infrastructure going forward, and how we're progressively migrating from our relational database.

With huge credit to Mat Wall @matwall for creating the original version of this talk.

Published in: Technology
Notes
  • guardian.co.uk: leading liberal news website; 40m unique visitors per month, half from the US.
  • Publishing a newspaper for nearly 200 years; website for 15 years. Adapting to change is critical. Start with the history of fighting relational databases and our attempts to tame them; then how we’re moving to MongoDB.
  • Ancient system. Scripts & database. Bespoke software; changes difficult.
  • Site oriented to the broadcast publishing model. The CMS helps: no longer lashing things together.
  • Template- and RDBMS-oriented design, plus TCL, means no real domain model. Heavyweight schema change process.
  • This is from a TEMPLATE! Scroll down to reveal HTML (about 10,000 of these).
  • Bottom of template (about 10,000 of these!).
  • Can’t change the schema easily: too many dependencies in templates.
  • Dodo. E.g. at the start just articles; now video, interactives, audio, galleries, live blogs...
  • “Web 2.0”, community, RSS, discoverability, tagging.
  • Very standard 3-tier application. Scale application servers on load. Caching local to the application server at first; memcached added later (in the next era!). Read-heavy broadcast model: almost no writes compared to reads.
  • Talk: beginning to use NoSQL in a real organisation. Change in journalism affecting the platform.
  • My team spent 2 years making the relational database inconsistent: much better to deliver 30-second-old data in 200 ms than up-to-date data in 5 seconds. We don’t have a scale problem with the current application & model.
  • Content API: both for external people (engaging with the internet) and for us. Most new stuff comes from the Content API, e.g. m.guardian, iPhone, iPad etc.
  • Introduction of memcached & Solr. Solr hosted in the cloud (EC2).
  • “Out” service.
  • Most recent content.
  • Most recent content with tags, fields (this is pretty well how we went live with the Content API).
  • Single article with media. Extensible schema, e.g. adding geotagging to images: hard in the database, easy in JSON. This document represents at least 30 database tables!
  • Not a million miles from an RDBMS. Simpler.
  • Experiments with MongoDB & the Content API. The Guardian site categorises content with tags. A tone tag represents the “editorial tone” of content.
  • Different tag types can have different schemas. Keywords (subjects) are in a section, e.g. music / madonna.
  • Suppose we want to add an external MusicBrainz ID to a tag? An update can modify the schema at runtime. No downtime.
  • Where clause: id. $push atomically adds the external reference onto the tag.
  • The resulting document now looks like this. We haven’t had to change the code yet: old code still works, no downtime.
  • Migration project, not green fields. (SKIP IF LESS THAN 10 MINS TO GO!)
  • REST API. Mapped initially just to Oracle, then (next slide) to both datastores. Integration tested.
  • API supports both data stores: lazy migration. Currently writing this in Scala + Casbah; so far 60-70% less code for the Mongo version.
  • Then batch migration and bye bye Oracle.
  • In the future?
  • 1. We will never use a relational database for a new project. 2. Kill: ORM, memcached etc. 3. Spend the time thinking about how your problem space should be defined. 4. Things will change: plan for it!
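
The tag notes above describe documents of different shapes sharing one collection, and a `$push` update that adds an external reference to a tag at runtime. A minimal Python sketch of that idea follows. It runs purely in memory, so no MongoDB server is needed. The field names (`externalReferences`, `section`), the tag ids and the MusicBrainz value are hypothetical and not taken from the Guardian's schema, and `apply_push` and `matches` only imitate a small part of what the server does.

```python
import copy

# Two hypothetical tag documents with different shapes, as they might sit
# side by side in one collection. All field names and values are illustrative.
tone_tag = {"_id": "tone/reviews", "type": "tone", "name": "Reviews"}
keyword_tag = {
    "_id": "music/madonna",
    "type": "keyword",
    "name": "Madonna",
    "section": "music",
}

# A MongoDB-style update: a filter (the "where clause") plus an update document.
tag_filter = {"_id": "music/madonna"}
update = {
    "$push": {"externalReferences": {"type": "musicbrainz", "id": "example-id"}}
}


def matches(document, query):
    """Equality-only filter match; real MongoDB supports many more operators."""
    return all(document.get(key) == expected for key, expected in query.items())


def apply_push(document, update_doc):
    """Return a copy of `document` with the `$push` part of `update_doc` applied.

    Imitates the server behaviour of appending to an array field and creating
    the field when it does not exist yet.
    """
    result = copy.deepcopy(document)
    for field, value in update_doc.get("$push", {}).items():
        result.setdefault(field, []).append(value)
    return result


collection = [tone_tag, keyword_tag]
updated = [
    apply_push(doc, update) if matches(doc, tag_filter) else doc
    for doc in collection
]
```

Only the keyword tag gains the new array field; the tone tag is untouched, and code that never reads `externalReferences` keeps working. Against a live server, the same `tag_filter` and `update` dictionaries could in principle be passed to pymongo's `update_one`.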
Transcript

    • 1. Evolving from Relational to Document Store. Graham Tackley, Web Platform Team Lead, guardian.co.uk
    • 2. “It is not the strongest of the species that survives, nor the most intelligent. It is the one that is most adaptable to change.”
    • 3. Early Period circa ’95: The “Lash It Together” era
    • 4. Early Period (’95, the “Lash It Together” era): Perl, CGI, Apache; Experimental; Manual processes; Bespoke software; RDBMS, scripts & static files
    • 5. Mid Period circa ’00: The “Vendor CMS” era
    • 6. Mid Period: 2000s (The “Vendor CMS era”): Vignette / AOLserver; TCL, Apache, Oracle; Platform for online publishing; Initially scales well with acceleration in delivery of features
    • 7. Mid Period: 2000s (The “Vendor CMS era”): Surprise! Vendor’s CMS doesn’t do what we want! Mish-mash in templates: HTML, JavaScript, TCL, SQL, PL-SQL; No model in app tier, only in RDBMS schema created in Oracle Designer
    • 8. Mid Period: 2000s (The “Vendor CMS era”)
    • 9. Mid Period: 2000s (The “Vendor CMS era”)
    • 10. Mid Period: 2000s (The “Vendor CMS era”): After a few years, very difficult to extend; Database schema becomes fixed due to dependencies in templates
    • 11. Mid Period: 2000s (The “Vendor CMS era”): If you can’t change the system:
    • 12. Modern Period circa ’05-09: The “J2EE Monolithic” era
    • 13. Diagram: three web servers; three app servers; Oracle; CMS; data feeds (“I bring you NEWS!!!”)
    • 14. Same diagram, annotated: Modern Java app; Spring / Hibernate; DDD / TDD; Strong model in Java; Database abstracted away with ORM
    • 15. Problems
    • 16. Each release involves schema upgrade; Schema upgrade = downtime for journalists
    • 17. Complexity still increasing: 300+ tables; 10,000 lines of Hibernate XML config; 1,000 domain objects mapped to database; 70,000 lines of domain object code; Very tight binding to database
    • 18. ORM not really masking complexity: Database has strong influence on domain model: many domain objects made more complex mapping joins in RDBMS; Complex Hibernate features used: interceptors, proxies; Complex caching strategy
    • 19. ORM not really masking complexity: Database has strong influence on domain model: many domain objects made more complex mapping joins in RDBMS; Complex Hibernate features used: interceptors, proxies; Complex caching strategy. And: We still hand code most queries in SQL!
    • 20. Load becoming an issue; RDBMS difficult to scale
    • 21. Partial NoSQL circa ’09-10: The “Sticking Plaster” era
    • 22. Introduce yet more caching to patch up load problems; Introduction of memcached
    • 23. Decouple applications from database by building APIs; Power APIs using alternative, more scalable technologies; APIs used to scale out database reads; Writes still go to RDBMS
    • 24. Mutualised news! Content API: http://content.guardianapis.com; Read-only access to 10 years’ worth of content
    • 25. Diagram: Core (web servers, app servers, memcached (20 GB), Oracle, Solr, CMS) and API (multiple Solr/API nodes in the cloud, EC2)
    • 26. Mutualised news! We’ve solved our load problem (for now) but increased our complexity
    • 27. Mutualised news! We now have 3 models: RDBMS tables; Java objects; JSON API
    • 28. Mutualised news!
    • 29. Mutualised news!
    • 30. Mutualised news!
    • 31. Mutualised news! JSON API is simple and understandable; Multiple domain concepts expressed in single document; Can be designed in forwardly extensible way
    • 32. Mutualised news! JSON API is simple and understandable; Multiple domain concepts expressed in single document; Can be designed in forwardly extensible way. What if the JSON API was our primary model?
    • 33. Full NoSQL in development: The “It’s the future!” era
    • 34. MongoDB: Document oriented database; Stores parsed JSON documents; Can express complex queries; Can be flexible about consistency; Malleable schema: can easily change at runtime; Can work at both large & small scales
    • 35. Flexible Schema
    • 36. Flexible Schema
    • 37. Flexible Schema: Can easily represent different classes of tag as documents; Both documents can be inserted into same collection; Far simpler than equivalent Hibernate mapped subclass configuration
    • 38. Flexible Schema: Simple to query
    • 39. Flexible Schema: Simple to query. Query operators: $ne, $nin, $all, $exists, $gt, $lt, $gte ...
    • 40. Modifying the schema
    • 41. Modifying the schema
    • 42. Modifying the schema
    • 43. The first project: Identity. Current login/registration system still in TCL/PL-SQL; 3M+ users in relational database; Very complex schema + PL-SQL; New system required; Can we migrate from Oracle to NoSQL?
    • 44. Build API that can support both backends. Diagram: Registration app, guardian.co.uk; API (“This bit is hard!”); Oracle
    • 45. Build API that can support both backends. Diagram: Registration app, guardian.co.uk; API; MongoDB, Oracle
    • 46. Migrate using API & decommission. Diagram: Registration app, guardian.co.uk; API; MongoDB
    • 47. Add new stuff! Diagram: Registration app, guardian.co.uk; API; MongoDB, Solr? Redis?
    • 48. Summary: Evolving to Document Store. Relational databases are dead; Ruthlessly reject relational complexity; Think hard about your JSON model; Plan your evolution. http://content.guardianapis.com; graham.tackley@guardian.co.uk - @tackers
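
Slides 44 to 46 describe an API that sits in front of both backends, migrates lazily, and is followed by a batch migration before Oracle is decommissioned. The talk does not say how the lazy step is implemented, so the sketch below is one possible reading, written in Python with in-memory stand-ins: records are copied to the new store the first time they are read, writes go only to the new store, and a final batch step moves whatever was never touched. The names `DictStore`, `LazyMigratingUserApi`, `get_user`, `save_user` and `migrate_remaining` are all hypothetical.

```python
class DictStore:
    """In-memory stand-in for a datastore (Oracle or MongoDB in the talk)."""

    def __init__(self, records=None):
        self.records = dict(records or {})

    def get(self, user_id):
        return self.records.get(user_id)

    def put(self, user_id, user):
        self.records[user_id] = user


class LazyMigratingUserApi:
    """Hypothetical API facade over a legacy store and a target store."""

    def __init__(self, legacy, target):
        self.legacy = legacy
        self.target = target

    def get_user(self, user_id):
        # Prefer the new store; fall back to the legacy one and copy across.
        user = self.target.get(user_id)
        if user is None:
            user = self.legacy.get(user_id)
            if user is not None:
                self.target.put(user_id, user)  # migrate on first read
        return user

    def save_user(self, user_id, user):
        # Assumption: once the API is live, all writes go to the new store.
        self.target.put(user_id, user)

    def migrate_remaining(self):
        """Batch step: copy records that were never read; return the count."""
        moved = 0
        for user_id, user in list(self.legacy.records.items()):
            if self.target.get(user_id) is None:
                self.target.put(user_id, user)
                moved += 1
        return moved


legacy = DictStore({
    "u1": {"email": "a@example.com"},
    "u2": {"email": "b@example.com"},
})
target = DictStore()
api = LazyMigratingUserApi(legacy, target)

first = api.get_user("u1")       # copied to the target on this read
moved = api.migrate_remaining()  # only "u2" is left to move
```

Because callers only ever see the API, the legacy store can be removed after the batch step without changing the registration app or the site.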
