Content Management with MongoDB by Mark Helmstetter



MongoDB is well suited to content management and delivery across many kinds of applications, such as e-commerce websites, online publications, web content management systems (CMS), document management, and archives. MongoDB's flexible schema and data model make it easy to catalog multiple content types with diverse metadata.

- Schema design for content management
- Using GridFS for storing binary files
- Using MongoDB's auto-sharding to partition your content across multiple servers

Published in: Technology


  1. MongoDB for Content Management • Mark Helmstetter, Senior Solutions Architect, MongoDB • twitter.com/mongodb
  2. Agenda • MongoDB Features and Overview • Sample Content Management System (CMS) Application • Schema Design Considerations • Building Feeds and Querying Data • Replication, Failover, and Scaling • Case Studies • Further Resources
  3. MongoDB Features • JSON Document Model with Dynamic Schemas • Auto-Sharding for Horizontal Scalability • Text Search • Aggregation Framework and MapReduce • Full, Flexible Index Support and Rich Queries • Built-In Replication for High Availability • Advanced Security • Large Media Storage with GridFS
  4. Sample CMS Application
  5. CMS Application Overview • Business news service • Hundreds of stories per day • Millions of website visitors per month • Comments • Related stories • Tags • Company profiles
  6. Viewing Stories (Web Site) • Headline • Date, Byline • Copy • Comments • Tags • Related Stories
  7. Viewing Categories/Tags (Web Site) • One entry per story: Headline • Date, Byline • Lead Text
  8. Sample Article • Headline • Byline, Date, Comments • Copy • Related Stories • Image
  9. Schema Design Considerations
  10. Sample Relational DB Structure • story (id, headline, copy, authorid, slug, …) • author (id, first_name, last_name, title, …) • tag (id, name, …) • comment (id, storyid, name, email, comment_text, …) • related_story (id, storyid, related_storyid, …) • link_story_tag (id, storyid, tagid, …)
  11. Sample Relational DB Structure • Number of queries per page load? • Caching layers add complexity • Tables may grow to millions of rows • Joins become slower as the database grows • Schema changes • Scaling the database to handle more reads
  12. MongoDB Schema Design • The schema is dynamic, but schema design is still important • JSON documents • Design for the use case and work backwards • Avoid a relational model in MongoDB • No joins or multi-document transactions, so most related information should be contained in the same document • Updates to a single document are atomic, which covers many cases that would otherwise need a transaction
  13. Sample MongoDB Schema
          {
            _id: 375,
            headline: "Apple Reports Second Quarter Earnings",
            date: ISODate("2013-07-14T01:00:00+01:00"),
            url: "apple-reports-second-quarter-earnings",
            byline: { author: "Jason Zucchetto", title: "Lead Business Editor" },
            copy: "Apple reported second quarter revenue today…",
            tags: [ "AAPL", "Earnings" ],
            comments: [ { name: "Frank", comment: "Great story!" } ]
          }
  14. Adding Fields Based on Story
          {
            _id: 375,
            headline: "Apple Reports Second Quarter Earnings",
            date: ISODate("2013-07-14T01:00:00+01:00"),
            url: "apple-reports-second-quarter-earnings",
            byline: { author: "Jason Zucchetto", title: "Lead Business Editor" },
            copy: "Apple reported second quarter revenue today…",
            tags: [ "AAPL", "Earnings" ],
            image: "/images/aapl/tim-cook.jpg",
            ticker: "AAPL"
          }
  15. High Comment Volume
          {
            _id: 375,
            headline: "Apple Reports Second Quarter Earnings",
            date: ISODate("2013-07-14T01:00:00+01:00"),
            url: "apple-reports-second-quarter-earnings",
            …
            copy: "Apple reported second quarter revenue today…",
            tags: [ "AAPL", "Earnings" ],
            last25comments: [
              { name: "Frank", comment: "Great story!" },
              { name: "John", comment: "This is interesting" }
              …
            ]
          }
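The `last25comments` array on the High Comment Volume slide has to stay at 25 entries as new comments arrive. One possible way to do that, offered as a sketch rather than the presenter's method, is `$push` with the `$each` and `$slice` modifiers, while the full comment also goes into a separate `comment` collection. The helper name `addComment`, the `db` parameter (a mongo-shell-style database handle), and the `date` field on comments are assumptions for illustration.

```javascript
// Hypothetical helper: store a comment and keep only the newest 25 on the story.
// `db` is assumed to be a mongo-shell-style database handle.
function addComment(db, storyId, comment) {
  // The full comment history lives in the separate comment collection.
  db.comment.insert({
    story_id: storyId,
    name: comment.name,
    comment: comment.comment,
    date: comment.date // assumed field, used for sorting comment pages
  });

  // $each appends the new entry; $slice: -25 trims the array to its last 25
  // elements. Both happen in one atomic single-document update.
  return db.story.update(
    { _id: storyId },
    {
      $push: {
        last25comments: {
          $each: [{ name: comment.name, comment: comment.comment }],
          $slice: -25
        }
      }
    }
  );
}
```

In the shell this might be called as `addComment(db, 375, { name: "Frank", comment: "Great story!", date: new Date() })`.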
  16. Managing Related Stories
          {
            _id: 375,
            headline: "Apple Reports Second Quarter Earnings",
            date: ISODate("2013-07-14T01:00:00+01:00"),
            url: "apple-reports-second-quarter-earnings",
            …
            relatedstories: [
              { headline: "Google Reports on Revenue", date: ISODate("2013-07-15T01:00:00+01:00"), slug: "goog-revenue-third-quarter" },
              { headline: "Yahoo Reports on Revenue", date: ISODate("2013-07-15T01:00:00+01:00"), slug: "yhoo-revenue-third-quarter" }
            ]
          }
  17. Full Sample Story Schema
          { // Story Collection (sample document)
            _id: 375,
            headline: "Apple Reports Second Quarter Earnings",
            date: ISODate("2013-07-14T01:00:00+01:00"),
            url: "apple-reports-second-quarter-earnings",
            byline: { author: "Jason Zucchetto", title: "Lead Business Editor" },
            copy: "Apple reported second quarter revenue today…",
            tags: [ "AAPL", "Earnings" ],
            last25comments: [
              { name: "Frank", comment: "Great story!" },
              { name: "John", comment: "This is interesting" }
            ],
  18. Full Sample Story Schema (continued)
            image: "/images/aapl/tim-cook.jpg",
            ticker: "AAPL",
            relatedstories: [
              { headline: "Google Reports on Revenue", date: ISODate("2013-07-15T01:00:00+01:00"), slug: "goog-revenue-third-quarter" },
              { headline: "Yahoo Reports on Revenue", date: ISODate("2013-07-15T01:00:00+01:00"), slug: "yhoo-revenue-third-quarter" }
            ]
          }
  19. CMS Collections • story { headline, date, url, …, relatedstories: [], last25comments: [], …, companyid } • comment { story_id, name, comment } • company { name, url, location, ticker, last25stories: [] }
  20. Querying and Indexing
  21. Querying MongoDB
          // Display a story, related stories, and first page of comments
          db.story.find( { url: "apple-reports-second-quarter-earnings" });
          // Display a story, related stories, and second page of comments
          db.story.find( { url: "apple-reports-second-quarter-earnings" });
          db.comment.find( { story_id : 1234 }).limit(25).skip(25).sort({ date : -1 });
          // All Stories for a given tag
          db.story.find( { tags: "Earnings" });
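The second-page comment query on this slide generalizes to any page number. The following is a minimal sketch under stated assumptions: `commentsPage` is a hypothetical helper, `db` is a mongo-shell-style handle, pages are numbered from 1, and comments are assumed to carry the `date` field that the slide's sort relies on.

```javascript
// Hypothetical helper: fetch one page of comments for a story, newest first.
// `db` is assumed to be a mongo-shell-style database handle; pages are 1-based.
function commentsPage(db, storyId, page, pageSize) {
  const size = pageSize || 25;                 // default matches the slide's page size
  const skip = (Math.max(page, 1) - 1) * size; // page 2 with size 25 skips 25
  return db.comment
    .find({ story_id: storyId })
    .sort({ date: -1 })
    .skip(skip)
    .limit(size);
}
```

With the embedded `last25comments` array, page 1 can usually be served from the story document itself, so a helper like this would mainly be needed from page 2 onward. Large `skip` values make the server walk past every skipped document, so very deep paging may call for a different approach.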
  22. Querying MongoDB
          // Display data for a company, latest stories
          db.company.find( { url: "apple-inc" });
          // Display data for a company, all stories
          db.company.find( { url: "apple-inc" });
          db.story.find( { company_id : 1234 });
  23. Inserting and Updating Stories
          // Inserting new stories is easy, just submit a JSON document
          db.story.insert( { headline: "Apple Reports Revenue"... });
          // Adding story tags
          db.story.update( { _id : 375 }, { $addToSet : { tags : "AAPL" } } )
          // Adding a comment (if embedding comments in story)
          db.story.update( { _id : 375 }, { $push: { comments: { name: "Jason", comment: "Great Story" } } } )
  24. MongoDB Indexes for CMS
          // Index on story slug
          db.story.ensureIndex( { url: 1 });
          // Index on story tags
          db.story.ensureIndex( { tags: 1 });
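The other queries in this deck filter on fields that the two indexes above do not cover. As a hedged sketch, not something the slides prescribe, the helper below adds indexes that would plausibly support the comment paging query, the company lookup by `url`, and a tag query sorted by date. `createExtraCmsIndexes` is a hypothetical name, `db` is a mongo-shell-style handle, and `createIndex` is the current shell name for what the slides call `ensureIndex`.

```javascript
// Hypothetical helper: indexes for the deck's remaining query patterns.
// `db` is assumed to be a mongo-shell-style database handle.
function createExtraCmsIndexes(db) {
  return [
    // Comment pages: filter by story, sort newest first.
    db.comment.createIndex({ story_id: 1, date: -1 }),
    // Company profile lookup by URL slug.
    db.company.createIndex({ url: 1 }),
    // Assumption: a compound index helps tag queries that sort by date.
    db.story.createIndex({ tags: 1, date: -1 })
  ];
}
```

Whether each index pays for its write overhead depends on the workload; running `explain()` on the real queries is the usual way to check.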
  25. Building Custom RSS Feeds
  26. Query Tags and Sort by Date
          // Very simple to gather specific information for a feed
          db.story.find( { tags: { $in : ["Earnings", "AAPL"] } }).sort( { date : -1 });
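Once the tag query returns story documents, turning them into feed entries is plain string work. The sketch below is one way to do it, assuming the story fields from the sample schema (`headline`, `url`, `date`); the helper names `escapeXml` and `storiesToRssItems` and the `baseUrl` parameter are hypothetical.

```javascript
// Escape the five XML special characters so headlines are safe inside RSS elements.
function escapeXml(text) {
  const entities = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&apos;' };
  return String(text).replace(/[&<>"']/g, (ch) => entities[ch]);
}

// Hypothetical helper: turn story documents into RSS <item> elements.
// Each story is assumed to have headline, url (slug), and date (a Date) fields.
function storiesToRssItems(stories, baseUrl) {
  return stories.map((story) =>
    '<item>' +
    '<title>' + escapeXml(story.headline) + '</title>' +
    '<link>' + escapeXml(baseUrl + '/' + story.url) + '</link>' +
    '<pubDate>' + story.date.toUTCString() + '</pubDate>' +
    '</item>'
  ).join('\n');
}
```

In the shell, the input could come from the query above with `.toArray()` appended, for example `storiesToRssItems(db.story.find({ tags: { $in: ["Earnings", "AAPL"] } }).sort({ date: -1 }).toArray(), "http://news.example.com")`, where the host name is a placeholder.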
  27. Replication, Failover, and Scaling
  28. Replication • Extremely easy to set up • Replica node can trail primary node and maintain a copy of the primary database • Useful for disaster recovery, failover, backups, and specific workloads such as analytics • When Primary goes down, a Secondary will automatically become the new Primary
  29. Replication
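To make "easy to set up" concrete, a replica set is described by a small configuration document passed to `rs.initiate()`. The helper below is a sketch that only builds that document; the name `replicaSetConfig`, the set name, and the host names are placeholders, not values from the talk.

```javascript
// Hypothetical helper: build a replica set configuration document for rs.initiate().
function replicaSetConfig(setName, hosts) {
  return {
    _id: setName, // replica set name, must match the --replSet option of each mongod
    members: hosts.map((host, index) => ({ _id: index, host: host }))
  };
}

// In the mongo shell, connected to one of the members (placeholder hosts):
//   rs.initiate(replicaSetConfig("cms", [
//     "db1.example.net:27017", "db2.example.net:27017", "db3.example.net:27017"
//   ]))
//   rs.status()   // reports which member is PRIMARY and which are SECONDARY
```

An odd number of voting members, three here, lets the set elect a new primary when one node goes down.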
  30. Reading from Secondaries (Delayed Consistency)
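Reads are routed to secondaries through a read preference. As a minimal sketch, assuming a mongo-shell-style `db` handle connected to a replica set, the hypothetical helper below runs the tag listing query with `secondaryPreferred` via the shell's `cursor.readPref()`.

```javascript
// Hypothetical helper: list stories for a tag, preferring a secondary for the read.
// `db` is assumed to be a mongo-shell-style handle connected to a replica set.
function storiesForTagFromSecondary(db, tag) {
  return db.story
    .find({ tags: tag })
    .sort({ date: -1 })
    // secondaryPreferred falls back to the primary if no secondary is available.
    .readPref('secondaryPreferred');
}
```

Because secondaries apply writes after the primary does, a story or comment written a moment ago may not appear in such a read yet, which is the delayed consistency this slide refers to.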
  31. Scaling Horizontally • Important to keep working data set in RAM • When working data set exceeds RAM, easy to add additional machines and segment data across machines (sharding)
  32. Sharding with MongoDB
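Sharding a collection takes two shell calls: enable sharding on the database, then shard the collection on a key. The sketch below wraps them in a hypothetical helper; the `sh` parameter stands for the shell's sharding helper on a `mongos` connection, and the hashed `_id` shard key is an assumption chosen only to spread writes evenly, not a recommendation from the talk.

```javascript
// Hypothetical helper: shard the story collection across the cluster.
// `sh` is assumed to be the mongo shell's sharding helper, connected through a mongos.
function shardStories(sh, dbName) {
  const namespace = dbName + '.story';
  sh.enableSharding(dbName);
  // Assumption: a hashed _id key, picked here only to distribute inserts evenly.
  sh.shardCollection(namespace, { _id: 'hashed' });
  return namespace;
}

// In the shell (placeholder database name): shardStories(sh, "cms")
```

The shard key is a real design decision: with a hashed `_id`, queries by `url` or `tags` do not include the key and are sent to every shard, so a CMS that mostly looks stories up by slug might weigh a different key.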
  33. Case Studies
  34. Case Study: Runs unified data store serving hundreds of diverse web properties on MongoDB • Problem: Hundreds of diverse web properties built on Java-based CMS; rich documents forced into ill-suited model; adding new data types and tables to RDBMS killed read performance • Why MongoDB: Flexible schema; rich querying and secondary index support; easy-to-manage replication and scaling • Results: Developers can focus on end-user features instead of back-end storage; simplified day-to-day operations; simple to add new brands, content types, etc. to platform
  35. Case Study: Serves targeted content to users using MongoDB-powered identity system • Problem: 20M+ unique visitors per month; rigid relational schema unable to evolve with changing data types and new features; slow development cycles • Why MongoDB: Easy-to-manage dynamic data model enables limitless growth, interactive content; support for ad hoc queries; highly extensible • Results: Rapid rollout of new features; customized, social conversations throughout site; tracks user data to increase engagement, revenue
  36. Case Study: Powers content-serving web platform on MongoDB to deliver dynamic data to users • Problem: Static web content; siloed data stores, disparate technologies; unable to aggregate and integrate data for dynamic content • Why MongoDB: Support for agile development; easy to use and maintain; low subscription and HW costs • Results: Ability to serve dynamic content; decreased TCO; replaced multiple technologies with single MongoDB database
  37. For More Information • MongoDB Downloads: mongodb.com/download • Free Online Training: education.mongodb.com • Webinars and Events: mongodb.com/events • White Papers: mongodb.com/white-papers • Case Studies: mongodb.com/customers • Presentations: mongodb.com/presentations • Documentation: docs.mongodb.org • Additional Info: info@mongodb.com
  38. Questions?
  39. MongoDB World • New York City, June 23-25 • #MongoDBWorld • See what's next in MongoDB, including: MongoDB 2.6, Sharding, Replication, Aggregation • http://world.mongodb.com • Save $200 with discount code THANKYOU
  40. Thank You • Mark Helmstetter, Senior Solutions Architect, MongoDB • #MongoDB
