Why I chose   mongodbfor guardian.co.uk              Mat WallLead Software Architect, guardian.co.uk
“It is not the strongest of the species thatsurvives, nor the most intelligent. It is the one       that is most adaptable...
Early Period         circa ’95The “Lash It Together” era
Early Period (95, the “Lash It Together” era) Perl, CGI, apache  ExperimentalManual processesBespoke software RDBMS, scrip...
Mid Period      circa ’00The “Vendor CMS” era
Mid Period: 2000s (The “Vendor CMS era”) Vignette / AOLserver TCL, Apache, Oracle  Platform for online       publishingIni...
Mid Period: 2000s (The “Vendor CMS era”) Surprise! Vendor’s CMSdoesn’t do what we want! Mish-mash in templates: HTML, Java...
Mid Period: 2000s (The “Vendor CMS era”)
Mid Period: 2000s (The “Vendor CMS era”)
Mid Period: 2000s (The “Vendor CMS era”)After a few years, very   difficult to extend  Database schemabecomes fixed due to  ...
Mid Period: 2000s (The “Vendor CMS era”)If you can’t change the        system:
Modern Period       circa ’05-09The “J2EE Monolithic” era
Web server       Web server         Web server  I bring you NEWS!!!App server      App server          App server         ...
Web server         Web server         Web server              Modern java app  I bring you NEWS!!!App server      App serv...
Problems
Each release involves schema upgradeSchema upgrade = downtime for journalists
Complexity still increasing:               300+ tables,  10,000 lines of hibernate XML config1,000 domain objects mapped to...
ORM not really masking complexity: Database has strong influence on domain model: manydomain objects made more complex mapp...
Load becoming an issueRDBMS difficult to scale
Partial NoSQL       circa ’09-10The “Sticking Plaster” era
Introduce yet more caching to patch up load problems Decouple applications from database by building APIsPower APIs using ...
Core                             Api   Web servers                            Solr/API    App server                      ...
Content APIMutualised news! Apache Solr Read API delivered using            Hosted in EC2   Document oriented search engin...
Related content from Solr           Introduction of memcached
Mutualised news!We’ve solved our load problem (for now)                 but       Increased our complexity
Mutualised news!      We now have 3 models!           RDBMS tables            Java Objects             JSON API
Mutualised news!
Mutualised news!
Mutualised news!
MutualisedAPI is very simple           JSON news!Multiple domain concepts expressed in single document     Can be designed...
Full NoSQL    in developmentThe “It’s the future!” era
The first project: IdentityCurrent login/registration system still in TCL/PL-SQL          3M+ users in relational database ...
Database selection      Simple keystore. Too simple?     Huge scalability. Do we need it?        Schema design difficult.  ...
MongoDB Mutualised news! database     Document oriented       Stores parsed JSON documents        Can express complex quer...
MongoDB conceptsMutualised news!    RDBMS           MongoDB     Table          Collection      Row        JSON Document   ...
Flexible SchemaMutualised news!
Flexible SchemaMutualised news!
Flexible SchemaMutualised news!Can easily represent different classes of tag as                 documents    Both document...
Flexible Schema        Simple to query:Mutualised news!
Flexible Schema              Simple to query:Mutualised news!            Query operators:  $ne, $nin, $all, $exists, $gt, ...
Modifying the schemaMutualised news!
Modifying the schemaMutualised news!
Modifying the schemaMutualised news!
Schema upgrades     Mutualised news!   Schema can be upgraded simply by upgrading the                 application versionA...
Schema upgrades      Mutualised news! by:           This can be mitigated        Adding a “version” key to each documentUp...
Mongodb architecture                     mongod                  Single nodeDurability only possible in upcoming 1.8 relea...
Mongodb architecture         master                      replicas          mongod                      mongod             ...
Mongodb architecture        master                          replicas        Durability achieved (<1.8) via replication    ...
Mongodb architectureAggregator                         mongosconsistent     shard      shard             shard     shard (...
Mongodb architecture                Writes scaled by shardingAggregator                            mongos                S...
Mongodb durabilityRelies (pre 1.8) on replication for durability1.8 features optional journaling & redo logs Database user...
Old Idenity system Hundreds of tables & stored proceduresMutualised news!       New Identity model     User               ...
Very simple domain objects   Simple, flexible objects    No hibernate session
Very simple domain objectsFlexible schema embraced in domain object design
Very simple domain objectsUsing casbah scala drivers = significant reduction in LOC                 vs SQL implementation
Build API that can support both backends    Registration app         guardian.co.uk                       API         Mong...
Build API that can support both backends    Registration app         guardian.co.uk                       API             ...
Migrate using API & decommision Registration app         guardian.co.uk                    API      MongoDB
Add new stuff!    Registration app           guardian.co.uk                         APIMongoDB                Solr?       ...
MongoDBSimple, flexible schema with similar query & indexing to                        RDBMS              Great at small or...
Shameless plug        We’re hiring:http://www.careersatgnl.co.uk
Q con london2011-matthewwall-whyichosemongodbforguardiancouk
Q con london2011-matthewwall-whyichosemongodbforguardiancouk
Upcoming SlideShare
Loading in …5
×

Q con london2011-matthewwall-whyichosemongodbforguardiancouk

1,602 views

Published on

http://www.infoq.com/presentations/Why-I-Chose-MongoDB-for-Guardian

Published in: Technology
0 Comments
2 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total views
1,602
On SlideShare
0
From Embeds
0
Number of Embeds
17
Actions
Shares
0
Downloads
0
Comments
0
Likes
2
Embeds 0
No embeds

No notes for slide

Q con london2011-matthewwall-whyichosemongodbforguardiancouk

  1. 1. Why I chose mongodbfor guardian.co.uk Mat WallLead Software Architect, guardian.co.uk
  2. 2. “It is not the strongest of the species thatsurvives, nor the most intelligent. It is the one that is most adaptable to change.”
  3. 3. Early Period circa ’95The “Lash It Together” era
  4. 4. Early Period (95, the “Lash It Together” era) Perl, CGI, apache ExperimentalManual processesBespoke software RDBMS, scripts & static files
  5. 5. Mid Period circa ’00The “Vendor CMS” era
  6. 6. Mid Period: 2000s (The “Vendor CMS era”) Vignette / AOLserver TCL, Apache, Oracle Platform for online publishingInitially scales well withacceleration in delivery of features
  7. 7. Mid Period: 2000s (The “Vendor CMS era”) Surprise! Vendor’s CMSdoesn’t do what we want! Mish-mash in templates: HTML, JavaScript, TCL, SQL, PL-SQLNo model in app tier, onlyin RDBMS schema created in Oracle Designer
  8. 8. Mid Period: 2000s (The “Vendor CMS era”)
  9. 9. Mid Period: 2000s (The “Vendor CMS era”)
  10. 10. Mid Period: 2000s (The “Vendor CMS era”)After a few years, very difficult to extend Database schemabecomes fixed due to dependencies in templates
  11. 11. Mid Period: 2000s (The “Vendor CMS era”)If you can’t change the system:
  12. 12. Modern Period circa ’05-09The “J2EE Monolithic” era
  13. 13. Web server Web server Web server I bring you NEWS!!!App server App server App server Oracle CMS Data feeds
  14. 14. Web server Web server Web server Modern java app I bring you NEWS!!!App server App server App server Spring / Hibernate DDD / TDD Strong Oracle in java model Database abstracted away with ORM CMS Data feeds
  15. 15. Problems
  16. 16. Each release involves schema upgradeSchema upgrade = downtime for journalists
  17. 17. Complexity still increasing: 300+ tables, 10,000 lines of hibernate XML config1,000 domain objects mapped to database 70,000 lines of domain object code Very tight binding to database
  18. 18. ORM not really masking complexity: Database has strong influence on domain model: manydomain objects made more complex mapping joins in RDBMSComplex hibernate features used, interceptors, proxies Complex caching strategy Lots of optimisations And:We still hand code complex queries in SQL!
  19. 19. Load becoming an issueRDBMS difficult to scale
  20. 20. Partial NoSQL circa ’09-10The “Sticking Plaster” era
  21. 21. Introduce yet more caching to patch up load problems Decouple applications from database by building APIsPower APIs using alternative, more scalable technologies APIs used to scale out database reads Writes still go to RDBMs
  22. 22. Core Api Web servers Solr/API App server Solr/APIMemcached (20Gb) Solr/API rdbms Solr Solr/API M/Q Solr/API CMS Cloud, EC2
  23. 23. Content APIMutualised news! Apache Solr Read API delivered using Hosted in EC2 Document oriented search engine Loose schema: records, fields, facets Scales well for read operations
  24. 24. Related content from Solr Introduction of memcached
  25. 25. Mutualised news!We’ve solved our load problem (for now) but Increased our complexity
  26. 26. Mutualised news! We now have 3 models! RDBMS tables Java Objects JSON API
  27. 27. Mutualised news!
  28. 28. Mutualised news!
  29. 29. Mutualised news!
  30. 30. MutualisedAPI is very simple JSON news!Multiple domain concepts expressed in single document Can be designed in forwardly extensible wayWhat if the JSON API was our primary model?
  31. 31. Full NoSQL in developmentThe “It’s the future!” era
  32. 32. The first project: IdentityCurrent login/registration system still in TCL/PL-SQL 3M+ users in relational database Very complex schema + PL-SQL New system required Can we migrate from Oracle to NoSql?
  33. 33. Database selection Simple keystore. Too simple? Huge scalability. Do we need it? Schema design difficult. Simple to use, can execute similar queries to RDBMs
  34. 34. MongoDB Mutualised news! database Document oriented Stores parsed JSON documents Can express complex queries Can be flexible about consistencyMalleable schema: can easily change at runtime Can work at both large & small scales
  35. 35. MongoDB conceptsMutualised news! RDBMS MongoDB Table Collection Row JSON Document Index Index Join Embedding & Linking Partition Shard
  36. 36. Flexible SchemaMutualised news!
  37. 37. Flexible SchemaMutualised news!
  38. 38. Flexible SchemaMutualised news!Can easily represent different classes of tag as documents Both documents can be inserted into same collection Far simpler than equivalent hibernate mapped subclass configuration
  39. 39. Flexible Schema Simple to query:Mutualised news!
  40. 40. Flexible Schema Simple to query:Mutualised news! Query operators: $ne, $nin, $all, $exists, $gt, $lt, $gte ...
  41. 41. Modifying the schemaMutualised news!
  42. 42. Modifying the schemaMutualised news!
  43. 43. Modifying the schemaMutualised news!
  44. 44. Schema upgrades Mutualised news! Schema can be upgraded simply by upgrading the application versionApplication must deal with differing document versions Can become complex over time
  45. 45. Schema upgrades Mutualised news! by: This can be mitigated Adding a “version” key to each documentUpdating the version each time the application modifies a documentUsing MapReduce capability to forcibly migrate documents from older versions if required
  46. 46. Mongodb architecture mongod Single nodeDurability only possible in upcoming 1.8 release (databse fsync from buffer every min)
  47. 47. Mongodb architecture master replicas mongod mongod mongod Replica set mongod mongod Can choose to read & Can choose to run readswrite from master for full on slaves to scale reads consistency
  48. 48. Mongodb architecture master replicas Durability achieved (<1.8) via replication mongod mongod Reads can be scaled out onto replicas mongod (eventual consistency) Replica set mongod All writes to master mongod If master fails, new master nominated by election Can choose to read & Can choose to accept dirtywrite from drivers handle most cluster complexity scale DB master for full reads from slaves to consistency reads
  49. 49. Mongodb architectureAggregator mongosconsistent shard shard shard shard (master) replica replica replica replicainconsistent (replica) replica replica replica replica replica replica replica replica
  50. 50. Mongodb architecture Writes scaled by shardingAggregator mongos Shards populated by rangesconsistent shard shard shard shard (master) mongos queries appropriate shard(s) Shards automatically balanced replica replica replica replicainconsistent (replica) replica replica replica replica Developers (essentially) unaware of shards replica replica replica replica
  51. 51. Mongodb durabilityRelies (pre 1.8) on replication for durability1.8 features optional journaling & redo logs Database users need to be cluster aware, each query can specify: No error checking / write confirmation Write confirmed on master Write replicated to N slave servers
  52. 52. Old Idenity system Hundreds of tables & stored proceduresMutualised news! New Identity model User List Fields Text Dates Date/Time Statuses Boolean
  53. 53. Very simple domain objects Simple, flexible objects No hibernate session
  54. 54. Very simple domain objectsFlexible schema embraced in domain object design
  55. 55. Very simple domain objectsUsing casbah scala drivers = significant reduction in LOC vs SQL implementation
  56. 56. Build API that can support both backends Registration app guardian.co.uk API MongoDB Oracle
  57. 57. Build API that can support both backends Registration app guardian.co.uk API This bit is hard! MongoDB Oracle
  58. 58. Migrate using API & decommision Registration app guardian.co.uk API MongoDB
  59. 59. Add new stuff! Registration app guardian.co.uk APIMongoDB Solr? Redis?
  60. 60. MongoDBSimple, flexible schema with similar query & indexing to RDBMS Great at small or large scale Easy for developers to get going Commercial support available (10Gen) One day may power all of guardian.co.ukNo transactions / joins: developers must cater for thisProduces a net reduction in lines of code / complexity
  61. 61. Shameless plug We’re hiring:http://www.careersatgnl.co.uk

×