MongoDB for Content Management
Mark Helmstetter
Senior Solutions Architect, MongoDB
twitter.com/mongodb
Agenda
• MongoDB Features and Overview
• Sample Content Management System (CMS) Application
• Schema Design Considerations
• Building Feeds and Querying Data
• Replication, Failover, and Scaling
• Case Studies
• Further Resources
MongoDB Features
• JSON Document Model with Dynamic Schemas
• Auto-Sharding for Horizontal Scalability
• Text Search
• Aggregation Framework and MapReduce
• Full, Flexible Index Support and Rich Queries
• Built-In Replication for High Availability
• Advanced Security
• Large Media Storage with GridFS
Sample CMS Application
CMS Application Overview
• Business news service
• Hundreds of stories per day
• Millions of website visitors per month
• Comments
• Related stories
• Tags
• Company profiles
Viewing Stories (Web Site)
[Screenshot: story page annotated with Headline; Date, Byline; Copy; Comments; Tags; Related Stories]
Viewing Categories/Tags (Web Site)
[Screenshot: category/tag page listing stories, each annotated with Headline; Date, Byline; Lead Text]
Sample Article
[Screenshot: sample article annotated with Headline; Byline, Date, Comments; Copy; Related Stories; Image]
Schema Design Considerations
Sample Relational DB Structure
• story: id, headline, copy, authorid, slug, …
• author: id, first_name, last_name, title, …
• tag: id, name, …
• comment: id, storyid, name, email, comment_text, …
• related_story: id, storyid, related_storyid, …
• link_story_tag: id, storyid, tagid, …
Sample Relational DB Structure
• How many queries does each page load need (story, author, tags, comments, related stories)?
• Caching layers add complexity
• Tables may grow to millions of rows
• Joins become slower as the database grows
• Schema changes become harder to roll out
• Scaling the database to handle more reads
MongoDB Schema Design
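• "Dynamic schema," but schema design is still important
• JSON documents
• Design for the use case and work backwards
• Avoid a relational model in MongoDB
• No joins or multi-document transactions in the MongoDB 2.x releases this talk covers ($lookup arrived in 3.2 and multi-document transactions in 4.0), so most related information should be contained in the same document
• Updates to a single document are atomic, which covers many cases where a relational design would need a transaction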
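To make "design for the use case and work backwards" concrete, here is a minimal sketch in plain JavaScript (Node.js, no MongoDB driver) of how rows shaped like the relational tables shown earlier might be folded into one story document, so that a story page needs a single read. The function name `toStoryDocument`, the field mappings (slug becomes url, first_name plus last_name becomes byline.author), and the sample ids are illustrative assumptions, not part of the original design.

```javascript
// Sketch only: folds rows shaped like the relational tables (story, author,
// link_story_tag/tag, comment) into one embedded story document.
// The field mappings below are assumptions made for illustration.
function toStoryDocument(storyRow, authorRow, tagNames, commentRows) {
  return {
    _id: storyRow.id,
    headline: storyRow.headline,
    url: storyRow.slug, // the relational "slug" becomes the document's url
    byline: {
      author: `${authorRow.first_name} ${authorRow.last_name}`,
      title: authorRow.title,
    },
    copy: storyRow.copy,
    // The link_story_tag join collapses into a de-duplicated array of tag names.
    tags: [...new Set(tagNames)],
    // Comment rows for this story become embedded subdocuments.
    comments: commentRows
      .filter((c) => c.storyid === storyRow.id)
      .map((c) => ({ name: c.name, comment: c.comment_text })),
  };
}

// Illustrative rows; the author id 7 and comment id 1 are made up.
const storyRow = {
  id: 375,
  headline: "Apple Reports Second Quarter Earnings",
  copy: "Apple reported second quarter revenue today…",
  authorid: 7,
  slug: "apple-reports-second-quarter-earnings",
};
const authorRow = { id: 7, first_name: "Jason", last_name: "Zucchetto", title: "Lead Business Editor" };
const commentRows = [{ id: 1, storyid: 375, name: "Frank", comment_text: "Great story!" }];

const storyDoc = toStoryDocument(storyRow, authorRow, ["AAPL", "Earnings", "AAPL"], commentRows);
console.log(JSON.stringify(storyDoc, null, 2));
```

The resulting object has the same shape as the sample MongoDB document that follows: six relational tables collapse into one document that can be fetched in one query.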
Sample MongoDB Schema
{
  _id: 375,
  headline: "Apple Reports Second Quarter Earnings",
  date: ISODate("2013-07-14T01:00:00+01:00"),
  url: "apple-reports-second-quarter-earnings",
  byline: {
    author: "Jason Zucchetto",
    title: "Lead Business Editor"
  },
  copy: "Apple reported second quarter revenue today…",
  tags: [
    "AAPL",
    "Earnings"
  ],
  comments: [
    { name: "Frank", comment: "Great story!" }
  ]
}
Adding Fields Based on Story
{
  _id: 375,
  headline: "Apple Reports Second Quarter Earnings",
  date: ISODate("2013-07-14T01:00:00+01:00"),
  url: "apple-reports-second-quarter-earnings",
  byline: {
    author: "Jason Zucchetto",
    title: "Lead Business Editor"
  },
  copy: "Apple reported second quarter revenue today…",
  tags: [
    "AAPL",
    "Earnings"
  ],
  image: "/images/aapl/tim-cook.jpg",
  ticker: "AAPL"
}
High Comment Volume
{
  _id: 375,
  headline: "Apple Reports Second Quarter Earnings",
  date: ISODate("2013-07-14T01:00:00+01:00"),
  url: "apple-reports-second-quarter-earnings",
  …
  copy: "Apple reported second quarter revenue today…",
  tags: [
    "AAPL",
    "Earnings"
  ],
  last25comments: [
    { name: "Frank", comment: "Great story!" },
    { name: "John", comment: "This is interesting" }
    …
  ]
}
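One way to keep an embedded array like `last25comments` capped is MongoDB's `$push` operator with the `$each` and `$slice` modifiers: a negative `$slice` keeps only the last N elements after the push, and `$slice` must be combined with `$each`. The sketch below builds that update document and, since it runs without a database, emulates its effect in memory; the helper names are hypothetical and the emulation covers only this one modifier combination.

```javascript
// Maximum number of comments kept inside the story document (from "last25comments").
const MAX_EMBEDDED_COMMENTS = 25;

// Builds the update you could pass to db.story.update({ _id: ... }, <update>).
// $slice must be combined with $each; a negative $slice keeps the last N elements.
function pushCommentUpdate(comment) {
  return {
    $push: {
      last25comments: { $each: [comment], $slice: -MAX_EMBEDDED_COMMENTS },
    },
  };
}

// In-memory emulation of the server-side effect, for illustration only:
// append the $each elements, then trim according to $slice.
function applyPushWithSlice(current, update) {
  const { $each, $slice } = update.$push.last25comments;
  const appended = current.concat($each);
  return $slice < 0 ? appended.slice($slice) : appended.slice(0, $slice);
}

let embedded = Array.from({ length: 25 }, (_, i) => ({ name: `reader${i}`, comment: "…" }));
embedded = applyPushWithSlice(embedded, pushCommentUpdate({ name: "John", comment: "This is interesting" }));
console.log(embedded.length);   // 25
console.log(embedded[0].name);  // reader1
console.log(embedded[24].name); // John
```

The oldest embedded comment drops off as each new one arrives, so the story document stays a bounded size no matter how many comments a story attracts.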
Managing Related Stories
{
  _id: 375,
  headline: "Apple Reports Second Quarter Earnings",
  date: ISODate("2013-07-14T01:00:00+01:00"),
  url: "apple-reports-second-quarter-earnings",
  …
  relatedstories: [
    {
      headline: "Google Reports on Revenue",
      date: ISODate("2013-07-15T01:00:00+01:00"),
      slug: "goog-revenue-third-quarter"
    }, {
      headline: "Yahoo Reports on Revenue",
      date: ISODate("2013-07-15T01:00:00+01:00"),
      slug: "yhoo-revenue-third-quarter"
    }
  ]
}
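The `relatedstories` entries are small summaries copied from other story documents, which is what lets a story page render its related links without extra queries. The following sketch shows one hypothetical way to build those summaries; the helper names, the default limit of 5, and the `_id` values 376 and 377 are assumptions for illustration only.

```javascript
// Copies just the fields a story page needs to link to a related story.
// The related story's url is stored as "slug", matching the sample document.
function relatedSummary(story) {
  return { headline: story.headline, date: story.date, slug: story.url };
}

// Hypothetical helper: embeds up to `limit` related stories, newest first.
function withRelatedStories(story, candidates, limit = 5) {
  const relatedstories = candidates
    .filter((s) => s._id !== story._id) // never link a story to itself
    .sort((a, b) => b.date - a.date)
    .slice(0, limit)
    .map(relatedSummary);
  return { ...story, relatedstories };
}

const apple = {
  _id: 375,
  headline: "Apple Reports Second Quarter Earnings",
  date: new Date("2013-07-14T01:00:00+01:00"),
  url: "apple-reports-second-quarter-earnings",
};
const candidates = [
  apple,
  { _id: 376, headline: "Google Reports on Revenue",
    date: new Date("2013-07-15T01:00:00+01:00"), url: "goog-revenue-third-quarter" },
  { _id: 377, headline: "Yahoo Reports on Revenue",
    date: new Date("2013-07-15T01:00:00+01:00"), url: "yhoo-revenue-third-quarter" },
];

const appleWithRelated = withRelatedStories(apple, candidates);
console.log(appleWithRelated.relatedstories.map((r) => r.slug));
```

The trade-off of copying summaries is that a headline edit must be propagated to every story that embeds that headline.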
Full Sample Story Schema
{ // story collection (sample document)
  _id: 375,
  headline: "Apple Reports Second Quarter Earnings",
  date: ISODate("2013-07-14T01:00:00+01:00"),
  url: "apple-reports-second-quarter-earnings",
  byline: {
    author: "Jason Zucchetto",
    title: "Lead Business Editor"
  },
  copy: "Apple reported second quarter revenue today…",
  tags: [
    "AAPL",
    "Earnings"
  ],
  last25comments: [
    { name: "Frank", comment: "Great story!" },
    { name: "John", comment: "This is interesting" }
  ],
  image: "/images/aapl/tim-cook.jpg",
  ticker: "AAPL",
  relatedstories: [
    {
      headline: "Google Reports on Revenue",
      date: ISODate("2013-07-15T01:00:00+01:00"),
      slug: "goog-revenue-third-quarter"
    }, {
      headline: "Yahoo Reports on Revenue",
      date: ISODate("2013-07-15T01:00:00+01:00"),
      slug: "yhoo-revenue-third-quarter"
    }
  ]
}
CMS Collections
story
{
  headline
  date
  url
  …
  relatedstories: []
  last25comments: []
  …
  company_id
}
comment
{
  story_id
  name
  comment
  date
}
company
{
  name
  url
  location
  ticker
  last25stories: []
}
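With these collections, publishing a story touches two documents: the full story goes into `story`, and a short summary is added to the company's embedded `last25stories`. A sketch of those two writes follows, assuming the company's `_id` is the value stored in `company_id`. It prepends with the `$position: 0` modifier (MongoDB 2.6+) and keeps the newest 25 with `$slice: 25`. The helper names and the sample URL are hypothetical, and the in-memory emulation covers only this modifier combination.

```javascript
// Hypothetical helper: the two writes that publishing a story might trigger.
// Assumes company_id holds the company document's _id.
function publishStoryWrites(story, companyId) {
  const summary = { headline: story.headline, date: story.date, url: story.url };
  return {
    // 1) The full story goes into the story collection.
    storyInsert: { ...story, company_id: companyId },
    // 2) A summary is prepended to the company's embedded list:
    //    $position: 0 inserts at the front, $slice: 25 keeps the first 25.
    companyFilter: { _id: companyId },
    companyUpdate: {
      $push: { last25stories: { $each: [summary], $position: 0, $slice: 25 } },
    },
  };
}

// In-memory emulation of that $push, for illustration only.
function applyPrependWithSlice(current, update) {
  const { $each, $position, $slice } = update.$push.last25stories;
  const merged = [...current.slice(0, $position), ...$each, ...current.slice($position)];
  return merged.slice(0, $slice);
}

const writes = publishStoryWrites(
  { headline: "Apple Reports Revenue", date: new Date("2013-07-14T00:00:00Z"), url: "apple-reports-revenue" },
  1234
);
let companyStories = Array.from({ length: 25 }, (_, i) => ({ url: `older-${i}` }));
companyStories = applyPrependWithSlice(companyStories, writes.companyUpdate);
console.log(companyStories.length, companyStories[0].url, companyStories[24].url);
// 25 apple-reports-revenue older-23
```

Keeping the newest summary first means a company page can render `last25stories` in order without sorting.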
Querying and Indexing
Querying MongoDB
// Display a story, related stories, and the first page of comments (embedded in last25comments)
db.story.find({ url: "apple-reports-second-quarter-earnings" });
// Display a story, related stories, and the second page of comments
db.story.find({ url: "apple-reports-second-quarter-earnings" });
db.comment.find({ story_id: 375 }).sort({ date: -1 }).skip(25).limit(25);
// All stories for a given tag
db.story.find({ tags: "Earnings" });
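The two story queries differ only in where the comments come from. Page 1 is already embedded in the story document, and later pages come from the `comment` collection. The sketch below captures that rule as a hypothetical helper, `commentPagePlan`, that returns which read to issue for a 1-based page number, assuming 25 comments per page to match `last25comments`.

```javascript
const COMMENTS_PER_PAGE = 25;

// Hypothetical helper: describes which read serves a 1-based comment page.
// Page 1 comes from the story document itself (last25comments); later pages
// come from the comment collection, newest first.
function commentPagePlan(story, page) {
  if (!Number.isInteger(page) || page < 1) {
    throw new RangeError("page must be a positive integer");
  }
  if (page === 1) {
    return { collection: "story", embeddedField: "last25comments", comments: story.last25comments };
  }
  return {
    collection: "comment",
    filter: { story_id: story._id },
    sort: { date: -1 },
    skip: (page - 1) * COMMENTS_PER_PAGE,
    limit: COMMENTS_PER_PAGE,
  };
}

const sampleStory = { _id: 375, last25comments: [{ name: "Frank", comment: "Great story!" }] };
console.log(commentPagePlan(sampleStory, 1).collection); // story
console.log(commentPagePlan(sampleStory, 3).skip);       // 50
```

A plan for page 2 or later maps onto `db.comment.find(filter).sort(sort).skip(skip).limit(limit)`. Note that `skip()` still walks past every skipped entry, so very deep pages get slower. Paging on a date range (for example `{ date: { $lt: lastSeenDate } }`) is a common alternative.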
Querying MongoDB
// Display data for a company and its latest stories (embedded in last25stories)
db.company.find({ url: "apple-inc" });
// Display data for a company and all of its stories
db.company.find({ url: "apple-inc" });
db.story.find({ company_id: 1234 });
Inserting and Updating Stories
// Inserting a new story is easy: just submit a JSON document
db.story.insert({ headline: "Apple Reports Revenue"... });
// Adding a story tag ($addToSet skips values already in the array)
db.story.update({ _id: 375 }, { $addToSet: { tags: "AAPL" } });
// Adding a comment (if embedding comments in the story)
db.story.update({ _id: 375 }, { $push: { comments: { name: "Jason", comment: "Great Story" } } });
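The difference between the two array operators above is easiest to see side by side. The sketch below is an in-memory emulation, not MongoDB code, and it handles scalar values (strings, numbers) only. `$addToSet` appends a value only when an equal value is not already present, while `$push` always appends.

```javascript
// In-memory emulation for scalar values only (strings, numbers):
// $addToSet appends a value only if an equal value is not already present.
function emulateAddToSet(array, value) {
  return array.includes(value) ? array.slice() : [...array, value];
}

// $push, by contrast, always appends.
function emulatePush(array, value) {
  return [...array, value];
}

const startTags = ["AAPL", "Earnings"];
const afterAddToSet = emulateAddToSet(startTags, "AAPL");
const afterPush = emulatePush(startTags, "AAPL");
console.log(afterAddToSet); // [ 'AAPL', 'Earnings' ]
console.log(afterPush);     // [ 'AAPL', 'Earnings', 'AAPL' ]
```

For embedded documents, MongoDB treats a value as a duplicate only if an existing element has exactly the same fields and values in the same order. This is why `$addToSet` suits tags, and `$push` suits comments, where repeats are legitimate.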
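MongoDB Indexes for CMS
// Index on the story URL slug
db.story.createIndex({ url: 1 });
// Index on story tags (a multikey index, since tags is an array)
db.story.createIndex({ tags: 1 });
// (older shells used ensureIndex(), deprecated in MongoDB 3.0 in favor of createIndex())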
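A compound index can serve a query that matches some fields by equality and sorts on others, as long as the key order lines up. The sketch below is a simplified check that is sufficient but not necessary. It asks whether an index lets the query seek straight to the equality-matched keys and read them already in sort order. The query planner handles more cases (range predicates, partial use of an index), so treat this as a rule of thumb, not a model of MongoDB's planner. The helper name is hypothetical.

```javascript
// Simplified, sufficient (not necessary) check: can a compound index seek on
// the equality-matched fields and return results already in sort order?
// indexKeys and sortKeys are ordered lists of [field, direction] pairs.
function indexSupportsEqualityThenSort(indexKeys, equalityFields, sortKeys) {
  const eq = new Set(equalityFields);
  // 1) The leading index keys must be exactly the equality-matched fields.
  const prefix = indexKeys.slice(0, eq.size);
  if (prefix.length !== eq.size || !prefix.every(([field]) => eq.has(field))) {
    return false;
  }
  // 2) The next index keys must match the sort keys in order, with every
  //    direction either the same or all reversed (the index can be walked backwards).
  const rest = indexKeys.slice(eq.size, eq.size + sortKeys.length);
  if (rest.length !== sortKeys.length) return false;
  const same = sortKeys.every(([f, d], i) => rest[i][0] === f && rest[i][1] === d);
  const reversed = sortKeys.every(([f, d], i) => rest[i][0] === f && rest[i][1] === -d);
  return same || reversed;
}

// Comment paging: filter on story_id, sort on date descending.
console.log(indexSupportsEqualityThenSort([["story_id", 1], ["date", -1]], ["story_id"], [["date", -1]])); // true
console.log(indexSupportsEqualityThenSort([["date", -1], ["story_id", 1]], ["story_id"], [["date", -1]])); // false
// Story lookup by URL slug: equality only, no sort.
console.log(indexSupportsEqualityThenSort([["url", 1]], ["url"], [])); // true
```

By this check, the comment paging query would be well served by something like `db.comment.createIndex({ story_id: 1, date: -1 })`, and the company story list by an index on `{ company_id: 1 }` in `story`. Both are suggestions beyond the two indexes listed above.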
Building Custom RSS Feeds
Query Tags and Sort by Date
// Very simple to gather specific information for a feed
db.story.find({ tags: { $in: ["Earnings", "AAPL"] } }).sort({ date: -1 });
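Turning that query's result into a feed is mostly string formatting. The sketch below emulates the `$in` filter and date sort in memory, then renders an RSS 2.0 document with the required channel elements (`title`, `link`, `description`). The base URL `https://news.example.com`, the feed title, the 20-item limit, and the Google/Yahoo tags are assumptions for illustration.

```javascript
// Sketch only: emulates db.story.find({ tags: { $in: tags } }).sort({ date: -1 })
// in memory and renders the result as an RSS 2.0 feed.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// $in on an array field matches when any element equals any listed value.
function selectFeedStories(stories, tags, limit = 20) {
  const wanted = new Set(tags);
  return stories
    .filter((s) => s.tags.some((t) => wanted.has(t)))
    .sort((a, b) => b.date - a.date)
    .slice(0, limit);
}

function renderRss(feedTitle, baseUrl, stories) {
  const items = stories.map((s) => {
    const link = escapeXml(`${baseUrl}/${s.url}`);
    return (
      `<item><title>${escapeXml(s.headline)}</title>` +
      `<link>${link}</link><guid>${link}</guid>` +
      `<pubDate>${s.date.toUTCString()}</pubDate></item>`
    );
  });
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<rss version="2.0"><channel>',
    `<title>${escapeXml(feedTitle)}</title>`,
    `<link>${escapeXml(baseUrl)}</link>`,
    `<description>${escapeXml(feedTitle)}</description>`,
    ...items,
    "</channel></rss>",
  ].join("\n");
}

const allStories = [
  { headline: "Apple Reports Second Quarter Earnings", url: "apple-reports-second-quarter-earnings",
    date: new Date("2013-07-14T01:00:00+01:00"), tags: ["AAPL", "Earnings"] },
  { headline: "Google Reports on Revenue", url: "goog-revenue-third-quarter",
    date: new Date("2013-07-15T01:00:00+01:00"), tags: ["GOOG", "Earnings"] },
  { headline: "Yahoo Reports on Revenue", url: "yhoo-revenue-third-quarter",
    date: new Date("2013-07-15T01:00:00+01:00"), tags: ["YHOO"] },
];

const feedStories = selectFeedStories(allStories, ["Earnings", "AAPL"]);
const feedXml = renderRss("Earnings & AAPL", "https://news.example.com", feedStories);
console.log(feedXml);
```

Because each feed is just a different tag list passed to the same query, new custom feeds need no schema changes.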
Replication, Failover, and Scaling
Replication
• Easy to set up
• Secondary nodes replicate the primary's operations and maintain a copy of its data, possibly trailing it slightly
• Useful for disaster recovery, failover, backups, and dedicated workloads such as analytics
• When the primary goes down, the remaining members automatically elect a secondary as the new primary
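Automatic failover works through an election, and a candidate becomes primary only with votes from a strict majority of the replica set's voting members. The small sketch below (helper names are hypothetical) computes that majority and how many member failures a set can survive while still electing a primary.

```javascript
// Votes needed to elect a primary: a strict majority of the voting members.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// Failover can only succeed if enough voters can still reach each other.
function canElectPrimary(votingMembers, reachableVoters) {
  return reachableVoters >= majority(votingMembers);
}

for (const n of [2, 3, 4, 5]) {
  const need = majority(n);
  console.log(`${n} voting members: majority ${need}, survives ${n - need} failure(s)`);
}
```

Four voting members survive no more failures than three, which is why odd-sized sets, with three members as the usual minimum, are the common choice.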
Reading from Secondaries (Delayed Consistency)
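Reads are sent to secondaries through a read preference, one of `primary`, `primaryPreferred`, `secondary`, `secondaryPreferred`, or `nearest`, which can be set in the connection string. Because secondaries may lag the primary, a reader might briefly see an older version of a story or miss a just-posted comment. The sketch below pairs a hypothetical per-page policy with a connection string builder. The host names, the replica set name `rs0`, and the use-case labels are assumptions.

```javascript
// Read preference modes defined by MongoDB drivers.
const READ_PREFERENCE_MODES = ["primary", "primaryPreferred", "secondary", "secondaryPreferred", "nearest"];

// Hypothetical policy for this CMS: public pages that tolerate slightly stale
// data may read from secondaries; everything else (e.g. editorial tools) uses the primary.
const STALE_READS_OK = new Set(["story-page", "tag-page", "rss-feed"]);

function readPreferenceFor(useCase) {
  return STALE_READS_OK.has(useCase) ? "secondaryPreferred" : "primary";
}

// Builds a connection string carrying the replicaSet and readPreference options.
function connectionUri(hosts, replicaSet, readPreference) {
  if (!READ_PREFERENCE_MODES.includes(readPreference)) {
    throw new Error(`unknown read preference: ${readPreference}`);
  }
  const params = new URLSearchParams({ replicaSet, readPreference });
  return `mongodb://${hosts.join(",")}/?${params.toString()}`;
}

const uri = connectionUri(["db1.example.com:27017", "db2.example.com:27017"], "rs0", readPreferenceFor("rss-feed"));
console.log(uri);
// mongodb://db1.example.com:27017,db2.example.com:27017/?replicaSet=rs0&readPreference=secondaryPreferred
```

`secondaryPreferred` still falls back to the primary if no secondary is available. Keeping editorial reads on the primary means authors always see their own latest edits.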
Scaling Horizontally
• Keep the working data set in RAM for best performance
• When the working set outgrows RAM, add machines and partition the data across them (sharding)
Sharding with MongoDB
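In range-based sharding, the data is split into chunks, each owning a contiguous range of shard-key values, and the router sends each operation only to the shards whose chunks can hold matching documents. The toy model below illustrates that routing, assuming the numeric story `_id` is the shard key. The chunk boundaries and shard names are made up, and MongoDB's MinKey/MaxKey are modeled as -Infinity/Infinity.

```javascript
// Toy model of range-based sharding, for illustration only: each chunk owns
// the half-open range [min, max) of a numeric shard key. MinKey and MaxKey
// are modeled as -Infinity and Infinity; boundaries and shard names are made up.
const chunks = [
  { min: -Infinity, max: 1000, shard: "shardA" },
  { min: 1000, max: 5000, shard: "shardB" },
  { min: 5000, max: Infinity, shard: "shardC" },
];

// An equality match on the shard key is routed to exactly one shard.
function shardFor(key) {
  const chunk = chunks.find((c) => key >= c.min && key < c.max);
  if (!chunk) throw new Error(`no chunk covers ${key}`);
  return chunk.shard;
}

// A range query [lo, hi) on the shard key only visits shards whose chunks overlap it.
function shardsForRange(lo, hi) {
  const hits = chunks.filter((c) => c.min < hi && lo < c.max).map((c) => c.shard);
  return [...new Set(hits)];
}

console.log(shardFor(375));             // shardA
console.log(shardsForRange(900, 1100)); // [ 'shardA', 'shardB' ]
```

Queries that do not include the shard key, such as the tag query, must be sent to every shard. A steadily increasing key like this `_id` would also send every new story to the last chunk, which is the problem hashed shard keys (MongoDB 2.4+) address.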
Case Studies
Case Study
Runs unified data store serving hundreds of diverse web properties on MongoDB
Problem
• Hundreds of diverse web properties built on a Java-based CMS
• Rich documents forced into an ill-suited model
• Adding new data types and tables to the RDBMS killed read performance
Why MongoDB
• Flexible schema
• Rich querying and secondary index support
• Easy-to-manage replication and scaling
Results
• Developers can focus on end-user features instead of back-end storage
• Simplified day-to-day operations
• Simple to add new brands, content types, etc. to the platform
Case Study
Serves targeted content to users using a MongoDB-powered identity system
Problem
• 20M+ unique visitors per month
• Rigid relational schema unable to evolve with changing data types and new features
• Slow development cycles
Why MongoDB
• Easy-to-manage dynamic data model enables limitless growth and interactive content
• Support for ad hoc queries
• Highly extensible
Results
• Rapid rollout of new features
• Customized, social conversations throughout the site
• Tracks user data to increase engagement and revenue
Case Study
Powers content-serving web platform on MongoDB to deliver dynamic data to users
Problem
• Static web content
• Siloed data stores, disparate technologies
• Unable to aggregate and integrate data for dynamic content
Why MongoDB
• Support for agile development
• Easy to use and maintain
• Low subscription and hardware costs
Results
• Ability to serve dynamic content
• Decreased total cost of ownership (TCO)
• Replaced multiple technologies with a single MongoDB database
For More Information
• MongoDB Downloads: mongodb.com/download
• Free Online Training: education.mongodb.com
• Webinars and Events: mongodb.com/events
• White Papers: mongodb.com/white-papers
• Case Studies: mongodb.com/customers
• Presentations: mongodb.com/presentations
• Documentation: docs.mongodb.org
• Additional Info: info@mongodb.com
Questions?
MongoDB World
New York City, June 23-25
#MongoDBWorld
See what’s next in MongoDB including
• MongoDB 2.6
• Sharding
• Replication
• Aggregation
http://world.mongodb.com
Save $200 with discount code THANKYOU
Thank You
Mark Helmstetter
Senior Solutions Architect, MongoDB
#MongoDB
