Media owners are turning to MongoDB to drive social interaction with their published content. The way customers consume information has changed, and passive communication is no longer enough: they want to comment, share and engage with publishers and their community through a range of media types and via multiple channels, whenever and wherever they are. Taking this semi-structured and unstructured data and making it work in a traditional relational database poses serious challenges. This webinar looks at how MongoDB’s schemaless design and document orientation give organisations like the Guardian the flexibility to aggregate social content and scale out.
5. Increase in Write Ratio
Users don’t just want to read content!
They want to share and contribute to the content!
High write volumes!
6. Need to Scale Datasource
[Diagram: JSON clients → Web Server → Application Server → Service #1 → SQL, with the SQL database marked "Bottleneck!"]
8. Application Cache?
[Diagram: JSON clients → Web Server → Application Server → Service #1 → App Cache → SQL]
9. Issues
+ Read-only data is served from the cache
- Writes slow down, because both the cache and the database must be updated
- Cache data must be kept in sync across application servers
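The two drawbacks can be reproduced with a minimal in-memory sketch. It assumes a hypothetical `database` Map standing in for the SQL store and gives each application server its own Map as a local cache (all names are illustrative). Each write pays for two updates, and caches on different servers drift apart.

```javascript
// Hypothetical stand-ins: a Map plays the SQL database, and each
// application server keeps its own Map as a local cache.
const database = new Map();

function makeAppServer() {
  const cache = new Map();
  return {
    cache,
    // Reads are served from the local cache when possible, else loaded from the database.
    read(key) {
      if (cache.has(key)) return cache.get(key);
      const value = database.get(key);
      cache.set(key, value);
      return value;
    },
    // Writes must update both the database and this server's cache.
    write(key, value) {
      database.set(key, value);
      cache.set(key, value);
    },
  };
}

// Two application servers, each with its own cache.
const serverA = makeAppServer();
const serverB = makeAppServer();

database.set("article:1", "v1");
serverA.read("article:1"); // both caches now hold "v1"
serverB.read("article:1");
serverA.write("article:1", "v2"); // only server A's cache is refreshed
```

After the write, server B keeps serving the stale value until something invalidates its cache. This is the synchronisation problem the slide lists.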
10. IT needs are evolving...
Agile Development
• Iterative
• Continuous
Data Volume, Type & Use
• Trillions of records
• 100’s of millions of queries per second
• Real-Time Analytics
• Unstructured / semi-structured
New Hardware Architectures
• Commodity servers
• Cloud Computing
• Horizontal Scaling
11. Tradeoff: Scale vs. Functionality
[Chart: scalability & performance vs. depth of functionality, placing memcached, key/value stores and RDBMS along the tradeoff]
12. Terminology
RDBMS → MongoDB
• Table → Collection
• Row(s) → JSON Document
• Index → Index
• Join → Embedding & Linking
14. A simple start
article = {author: "Chris",
date: new Date(),
title: "Managing Social Content"}
> db.articles.save(article)
Map the documents to your application.
15. Find the document
> db.articles.find()
{ _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
author: "Chris",
date: ISODate("2012-01-23T14:01:00.117Z"),
title: "Managing Social Content"
}
Note:
• _id must be unique, but can be any value you like (except an array)
• A BSON ObjectId is generated by default if none is supplied
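A small in-memory sketch can illustrate both notes. `makeObjectIdHex` is a hypothetical helper that follows the current 12-byte ObjectId layout: a 4-byte big-endian timestamp in seconds, 5 random bytes, and a 3-byte counter. It is not the driver's implementation. `insertOne` and the `articles` Map are also hypothetical. `insertOne` assigns an id only when none is supplied and rejects duplicates, which mimics the unique index on `_id`.

```javascript
const crypto = require("crypto");

// Per-process random value and counter, as in the ObjectId layout.
const processRandom = crypto.randomBytes(5);
let counter = crypto.randomBytes(3).readUIntBE(0, 3);

// Hypothetical helper: builds a 24-hex-character ObjectId-like string.
function makeObjectIdHex(date = new Date()) {
  const buf = Buffer.alloc(12);
  buf.writeUInt32BE(Math.floor(date.getTime() / 1000), 0); // bytes 0-3: timestamp
  processRandom.copy(buf, 4);                               // bytes 4-8: random
  counter = (counter + 1) % 0x1000000;
  buf.writeUIntBE(counter, 9, 3);                           // bytes 9-11: counter
  return buf.toString("hex");
}

// Hypothetical in-memory collection keyed by _id; the Map key enforces uniqueness.
const articles = new Map();

function insertOne(doc) {
  // Use the supplied _id, or generate one if it is missing.
  const _id = doc._id !== undefined ? doc._id : makeObjectIdHex();
  if (articles.has(_id)) throw new Error("duplicate key error on _id: " + _id);
  const stored = { _id, ...doc };
  articles.set(_id, stored);
  return stored;
}
```

The first eight hex digits of a generated id encode its creation time in seconds. As a result, ObjectIds sort roughly by insertion time.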
16. Add an index, find via index
> db.articles.ensureIndex({author: 1})
> db.articles.find({author: 'Chris'})
{ _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
author: "Chris",
date: ISODate("2012-01-23T14:01:00.117Z"),
...
}
Secondary index on "author"
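A secondary index keeps the values of one field in sorted order, each paired with a reference to its document. A query such as `find({author: 'Chris'})` can then jump straight to the matching entries instead of scanning the collection. MongoDB uses B-tree indexes. The sketch below swaps in a sorted array with binary search to show the same ordered lookup. The helper names and the second and third sample documents are hypothetical.

```javascript
// Hypothetical sample collection (only the first document comes from the slides).
const docs = [
  { _id: 1, author: "Chris", title: "Managing Social Content" },
  { _id: 2, author: "Marc", title: "Scaling Out" },
  { _id: 3, author: "Chris", title: "Geospatial Queries" },
];
// Lookup by _id, standing in for fetching the document itself.
const byId = new Map(docs.map((d) => [d._id, d]));

// Ordered list of [key, _id] entries, the role ensureIndex({author: 1}) plays.
function buildIndex(collection, field) {
  return collection
    .map((doc) => [doc[field], doc._id])
    .sort((a, b) => (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : a[1] - b[1]));
}

// Binary search: position of the first entry whose key is >= value.
function lowerBound(index, value) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (index[mid][0] < value) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Equality query through the index, like find({author: value}).
function findByIndex(byIdMap, index, value) {
  const results = [];
  for (let i = lowerBound(index, value); i < index.length && index[i][0] === value; i++) {
    results.push(byIdMap.get(index[i][1]));
  }
  return results;
}

const authorIndex = buildIndex(docs, "author");
```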
19. Adding Tags
> db.articles.update(
{title: "Managing Social Content" },
{$push: {tags: {$each: ["MongoDB", "NoSQL"]}}}
)
Push social "tags" into the existing article
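The `$each` modifier matters here. `$push` appends its argument as one element, so pushing a bare array would nest it inside `tags`. The sketch below is a simplified, hypothetical model of these two behaviours on a plain object, not MongoDB's update engine. On a real server, `$push` with `$each` needs MongoDB 2.4 or later. `$addToSet` with `$each` is the variant that skips duplicate tags.

```javascript
// Simplified model of $push on one field of a plain object (hypothetical helper).
function applyPush(doc, field, value) {
  // $push creates the array if the field is missing.
  const current = Array.isArray(doc[field]) ? doc[field] : [];
  const toAppend =
    value !== null && typeof value === "object" && Array.isArray(value.$each)
      ? value.$each // {$each: [...]} appends each element separately
      : [value];    // anything else, including a bare array, is appended as one element
  return { ...doc, [field]: current.concat(toAppend) };
}

const base = { title: "Managing Social Content" };
const nested = applyPush(base, "tags", ["MongoDB", "NoSQL"]);        // tags: [["MongoDB", "NoSQL"]]
const flat = applyPush(base, "tags", { $each: ["MongoDB", "NoSQL"] }); // tags: ["MongoDB", "NoSQL"]
```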
20. Find the document
> db.articles.find()
{ _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
author: "Chris",
date: ISODate("2012-01-23T14:01:00.117Z"),
title: "Managing Social Content",
tags: [ "MongoDB", "NoSQL" ]
}
24. Extending the Schema
{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
author : "Chris",
date: ISODate("2012-01-23T14:01:00.117Z"),
title : "Managing Social Content",
tags : [ "MongoDB", "NoSQL" ],
comments : [{
author : "Marc",
date : ISODate("2012-01-23T14:31:53.848Z"),
text : "great article",
stars : 5
}],
comments_count: 1
}
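Storing `comments_count` next to the embedded `comments` array works because one update can change both fields of a single document atomically. In the shell, that update would look something like `db.articles.update({_id: id}, {$push: {comments: comment}, $inc: {comments_count: 1}})`. The plain-JavaScript sketch below models only the effect of that update on the document above. `addComment` and the second comment are hypothetical.

```javascript
// Models {$push: {comments: comment}, $inc: {comments_count: 1}} on one article:
// both fields change in the same step, and the original object is left untouched.
function addComment(article, comment) {
  const comments = (article.comments || []).concat([comment]);
  const count = (article.comments_count || 0) + 1;
  return { ...article, comments, comments_count: count };
}

const article = {
  _id: "4c4ba5c0672c685e5e8aabf3",
  author: "Chris",
  title: "Managing Social Content",
  tags: ["MongoDB", "NoSQL"],
  comments: [{ author: "Marc", text: "great article", stars: 5 }],
  comments_count: 1,
};

// Hypothetical second comment.
const updated = addComment(article, { author: "Fred", text: "agreed", stars: 4 });
```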
26. Trees
//Embedded Tree
{ ...
comments : [{
author : "Marc", text : "...",
replies : [{
author : "Fred", text : "...",
replies : []
}]
}]
}
+ Pros: single document, performance, intuitive
- Cons: hard to search, hard to return partial results, 16MB document size limit
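The "hard to search" drawback is easiest to see in code. To find every reply by one author, the application must walk the nested `replies` arrays recursively. Dot notation such as `"comments.replies.author"` only matches one fixed depth. The sketch below is a minimal, hypothetical depth-first search over a plain object shaped like the embedded tree above.

```javascript
// Depth-first walk over nested comments/replies, collecting matches
// together with their position (array indexes) in the tree.
function findReplies(nodes, predicate, path = []) {
  const matches = [];
  nodes.forEach((node, i) => {
    const here = path.concat(i);
    if (predicate(node)) matches.push({ path: here, node });
    matches.push(...findReplies(node.replies || [], predicate, here));
  });
  return matches;
}

// Same shape as the embedded tree on the slide.
const articleWithTree = {
  comments: [
    {
      author: "Marc", text: "...",
      replies: [{ author: "Fred", text: "...", replies: [] }],
    },
  ],
};

const fredsReplies = findReplies(articleWithTree.comments, (n) => n.author === "Fred");
```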
27. One to Many - Normalized
// Articles collection
{ _id : 1000,
author : "Chris",
date: ISODate("2012-01-23T14:01:00.117Z"),
title : "Managing Social Content"
}
// Comments collection
{ _id : 1,
article : 1000,
author : "Marc",
date : ISODate("2012-01-23T14:31:53.848Z"),
...
}
> article = db.articles.findOne({title: "Managing Social Content"});
> db.comments.find({article: article._id});
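With the normalized layout, reading an article together with its comments takes two queries, and the application performs the join. The minimal in-memory sketch below uses hypothetical arrays in place of the two collections. The extra comments are sample data.

```javascript
// Hypothetical stand-ins for the articles and comments collections.
const articlesCollection = [
  { _id: 1000, author: "Chris", title: "Managing Social Content" },
];
const commentsCollection = [
  { _id: 1, article: 1000, author: "Marc", text: "great article" },
  { _id: 2, article: 1000, author: "Fred", text: "..." },
  { _id: 3, article: 2000, author: "Ann", text: "..." },
];

// Query 1: like db.articles.findOne({title: ...}).
function findArticle(title) {
  return articlesCollection.find((a) => a.title === title) || null;
}

// Query 2: like db.comments.find({article: articleId}).
function findComments(articleId) {
  return commentsCollection.filter((c) => c.article === articleId);
}

// Application-side join of the two results.
function loadArticleWithComments(title) {
  const article = findArticle(title);
  if (article === null) return null;
  return { ...article, comments: findComments(article._id) };
}
```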
28. Array of Ancestors
[Diagram: message tree; a has replies b and e, b has replies c and d, e has reply f]
// Store all ancestors of a node
{ _id: "a" }
{ _id: "b", thread: [ "a" ], replyTo: "a" }
{ _id: "c", thread: [ "a", "b" ], replyTo: "b" }
{ _id: "d", thread: [ "a", "b" ], replyTo: "b" }
{ _id: "e", thread: [ "a" ], replyTo: "a" }
{ _id: "f", thread: [ "a", "e" ], replyTo: "e" }
// find all messages in threads that include "b" (all descendants of "b")
> db.msg_tree.find({"thread": "b"})
// find all direct replies to "b"
> db.msg_tree.find({"replyTo": "b"})
// find all ancestors of "f"
> threads = db.msg_tree.findOne({_id: "f"}).thread
> db.msg_tree.find({_id: {$in: threads}})
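The write side of this pattern is simple: a new reply copies its parent's `thread` and appends the parent's `_id`. The sketch below builds the tree from the slide that way and then runs the three queries in memory. `addReply`, the query helpers, and the `msgTree` array standing in for the `msg_tree` collection are all hypothetical.

```javascript
// Hypothetical stand-in for the msg_tree collection.
const msgTree = [];

// Insert a message; a reply inherits its parent's ancestors plus the parent itself.
function addReply(_id, replyTo) {
  if (replyTo === undefined) {
    msgTree.push({ _id });
    return;
  }
  const parent = msgTree.find((m) => m._id === replyTo);
  const thread = (parent.thread || []).concat(parent._id);
  msgTree.push({ _id, thread, replyTo });
}

// Build the tree from the slide: a -> b -> {c, d}; a -> e -> f.
addReply("a");
addReply("b", "a");
addReply("c", "b");
addReply("d", "b");
addReply("e", "a");
addReply("f", "e");

// Like find({thread: id}): matching a value against an array field tests membership.
const descendantsOf = (id) => msgTree.filter((m) => (m.thread || []).includes(id));

// Like find({replyTo: id}): direct replies only.
const repliesTo = (id) => msgTree.filter((m) => m.replyTo === id);

// Like find({_id: {$in: thread}}): all ancestors of a node.
function ancestorsOf(id) {
  const thread = msgTree.find((m) => m._id === id).thread || [];
  return msgTree.filter((m) => thread.includes(m._id));
}
```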
33. Geospatial
• Geo hash stored in B-Tree
• First two values indexed
db.articles.save({
loc: { long: 40.739037, lat: 40.739037 }
});
db.articles.save({
loc: [40.739037, 40.739037]
});
db.articles.ensureIndex({"loc": "2d"})
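"Geo hash stored in B-Tree" works because interleaving longitude and latitude bits turns a 2-D point into one sortable key, and nearby points tend to share key prefixes. The sketch below implements the common base-32 geohash encoding as an illustration of the technique. MongoDB's 2d index uses its own binary geohash internally, so these strings are not what the server stores.

```javascript
// Standard geohash base-32 alphabet (no a, i, l, o).
const BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

// Encode (lat, lon) as a base-32 geohash of the given length.
// Bits alternate, starting with longitude; each bit halves the current range.
function geohash(lat, lon, length = 9) {
  const latRange = [-90, 90];
  const lonRange = [-180, 180];
  let hash = "";
  let bits = 0;       // bits accumulated for the current character
  let bitCount = 0;   // how many bits are in `bits`
  let evenBit = true; // true: the next bit encodes longitude

  while (hash.length < length) {
    const range = evenBit ? lonRange : latRange;
    const value = evenBit ? lon : lat;
    const mid = (range[0] + range[1]) / 2;
    if (value >= mid) {
      bits = (bits << 1) | 1;
      range[0] = mid; // keep the upper half
    } else {
      bits = bits << 1;
      range[1] = mid; // keep the lower half
    }
    evenBit = !evenBit;
    if (++bitCount === 5) {
      hash += BASE32[bits]; // every 5 bits become one character
      bits = 0;
      bitCount = 0;
    }
  }
  return hash;
}
```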
34. Geospatial Query
• Multi-location indexes for a single document
• $near may return the same document once for each matching location
• $within will return a document once and once only
Find the nearest locations (at most 100 by default):
> db.locations.find({loc: {$near: [37.75, -122.42]}});
Find all locations within a box
> box = [[40, 40], [60, 60]]
> db.locations.find({loc: {$within: {$box: box}}});
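Conceptually, the two queries behave like the in-memory sketch below. `$near` orders documents by distance from the query point and caps the result, with 100 as the legacy default. `$within` with `$box` is a pure containment test, so each document appears at most once. Distances here are flat Euclidean, matching the planar model of the legacy 2d index. `nearest`, `withinBox`, the inclusive box edges, and the sample points are assumptions for illustration.

```javascript
// Planar distance between two [x, y] points, as with a legacy 2d index.
function distance(a, b) {
  return Math.hypot(a[0] - b[0], a[1] - b[1]);
}

// Like {$near: point}: closest documents first, capped at `limit`.
function nearest(docs, point, limit = 100) {
  return docs
    .slice()
    .sort((d1, d2) => distance(d1.loc, point) - distance(d2.loc, point))
    .slice(0, limit);
}

// Like {$within: {$box: [[x1, y1], [x2, y2]]}}: bottom-left and top-right
// corners, treated as inclusive in this sketch.
function withinBox(docs, box) {
  const [[x1, y1], [x2, y2]] = box;
  return docs.filter(({ loc: [x, y] }) => x >= x1 && x <= x2 && y >= y1 && y <= y2);
}

// Hypothetical sample documents.
const locations = [
  { _id: 1, loc: [37.76, -122.41] },
  { _id: 2, loc: [45, 50] },
  { _id: 3, loc: [59.9, 60] },
  { _id: 4, loc: [70, 70] },
];
```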
36. Aggregation framework
• New aggregation framework
• Declarative framework (no JavaScript)
• Describe a chain of operations to apply
• Expression evaluation
• Return computed values
• Framework: new operations added easily
• C++ implementation
37. Aggregation - Pipelines
• Aggregation requests specify a pipeline
• A pipeline is a series of operations
• Members of a collection are passed through
a pipeline to produce a result
• Analogous to a Unix pipe: ps -ef | grep -i mongod
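The Unix-pipe analogy can be sketched directly: each stage takes the documents produced by the previous stage and returns a new list. `runPipeline`, `match`, `project` and `limit` are hypothetical names that only loosely mirror `$match`, `$project` and `$limit`. The server's C++ implementation works differently.

```javascript
// Each stage is a function from an array of documents to a new array.
function runPipeline(docs, stages) {
  return stages.reduce((current, stage) => stage(current), docs);
}

// Hypothetical stage factories loosely mirroring $match, $project and $limit.
const match = (predicate) => (docs) => docs.filter(predicate);
const project = (shape) => (docs) => docs.map(shape);
const limit = (n) => (docs) => docs.slice(0, n);

// Illustrative input: one document per line of process-listing output.
const processes = [
  { pid: 101, cmd: "/usr/bin/mongod --dbpath /data/db" },
  { pid: 202, cmd: "bash" },
  { pid: 303, cmd: "tail -f MONGOD.log" },
];

// Rough equivalent of `ps -ef | grep -i mongod`, followed by a projection.
const mongodPids = runPipeline(processes, [
  match((p) => /mongod/i.test(p.cmd)),
  project((p) => ({ pid: p.pid })),
]);
```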
38. Example - twitter
{
"_id" : ObjectId("4f47b268fb1c80e141e9888c"),
"user" : {
"friends_count" : 73,
"location" : "Brazil",
"screen_name" : "Bia_cunha1",
"name" : "Beatriz Helena Cunha",
"followers_count" : 102,
}
}
• Find the number of followers and friends per location
41. Example - twitter
db.tweets.aggregate([
  // Predicate
  {$match:
    {"user.friends_count": { $gt: 0 },
     "user.followers_count": { $gt: 0 }
    }
  },
  // Parts of the document you want to project
  {$project:
    { location: "$user.location",
      friends: "$user.friends_count",
      followers: "$user.followers_count"
    }
  },
  // Function to apply to the result set
  {$group:
    {_id: "$location",
     friends: {$sum: "$friends"},
     followers: {$sum: "$followers"}
    }
  }
]);
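The sketch below reproduces in plain JavaScript what the three stages compute. `$match` keeps users whose friend and follower counts are both above zero, `$project` reshapes each document, and `$group` sums the counts per location. The first tweet is the document from the earlier slide; the others are hypothetical. MongoDB does not guarantee the order of `$group` output. Here the order simply follows first appearance.

```javascript
// Sample input: the first user comes from the slide, the rest are hypothetical.
const tweets = [
  { user: { friends_count: 73, location: "Brazil", screen_name: "Bia_cunha1", followers_count: 102 } },
  { user: { friends_count: 10, location: "Brazil", followers_count: 5 } },
  { user: { friends_count: 0, location: "UK", followers_count: 40 } },
  { user: { friends_count: 3, location: "UK", followers_count: 7 } },
];

// $match: both counts must be greater than zero.
const matchStage = (docs) =>
  docs.filter((t) => t.user.friends_count > 0 && t.user.followers_count > 0);

// $project: keep only the fields the $group stage needs.
const projectStage = (docs) =>
  docs.map((t) => ({
    location: t.user.location,
    friends: t.user.friends_count,
    followers: t.user.followers_count,
  }));

// $group: one output document per location, with summed counts.
function groupStage(docs) {
  const groups = new Map();
  for (const d of docs) {
    const g = groups.get(d.location) || { _id: d.location, friends: 0, followers: 0 };
    g.friends += d.friends;
    g.followers += d.followers;
    groups.set(d.location, g);
  }
  return [...groups.values()];
}

const result = groupStage(projectStage(matchStage(tweets)));
```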
47. Questions
• 10gen Services
– Development Support
– Consultancy
– TAM
– Production Support
• Free online MongoDB training
– Develop
– Deploy
– Classes start Oct. 2012
Editor's Notes
Evolutions in computing are significantly impacting the traditional RDBMS:
* Data volumes are orders of magnitude higher than before
* Tens of millions of queries a second
* Structured and unstructured data
* Cloud computing and storage
* Scaling horizontally, not vertically: buying a bigger box eventually reaches its capacity limit
* Commodity servers, not expensive SANs
* Developers no longer do waterfall development; they want to be more agile and more flexible in their data models
Where is MongoDB when you compare functionality vs. performance?
MongoDB aims to offer most of the features of a relational database, but not complex joins, which don't scale.
* No joins, for scalability: joins across shards in SQL are highly inefficient and difficult to perform.
* MongoDB is geared for easy scaling: going from a single node to a distributed cluster is easy.
* Little or no application code change is needed to scale from a single node to a sharded cluster.
* You can always add and remove indexes at runtime (but building an index takes some time)
* Upserts and update operators such as $push and $inc
* Atomicity
* Rich query language
* Powerful: can do range queries with $lt and $gt
* Update: can update parts of documents
* Later, when extending the schema: what's wrong with that schema?
Comments: with a lot of comments the document keeps growing, a single document can be at most 16 MB in size, and padding factors come into play
* Also the one-to-many pattern
Change some customers to European ones.
Craigslist had the problem that they couldn't introduce new features as fast as they wanted, because each feature required a schema change, which has a massive impact on the database.
possible
And counting...
* National Archives: digitising their whole dataset and storing it in MongoDB
* The Guardian: main database for every new project
* Navteq: discovered MongoDB because of its location-based features and now loves it for the flexibility of the schema
* CERN: uses it for their data aggregation system; all the systems feeding that database result in 1M updates a day
* A customer in France:
  - 250 million products stored (product data only, not images, which are stored in our own CDN)
  - 300 million reads per day (peak at 1600 reads per second)
  - 150 million writes per day