7.
This talk
‣Intro
-Terms / Definitions
‣Getting a flavor
-Creating a Schema
-Indexes
-Evolving the Schema
‣Data modeling
-DBRef
-Single Table Inheritance
-Many - Many
-Trees
-Lists / Queues / Stacks
8.
Document Oriented
Basic unit of data: JSON Documents
Not Relational, not Key Value
Not OODB
- Associations implied by Document Structure
- but your database schema != your program schema
9.
Terms
Table -> Collection
Row(s) -> JSON Document
Index -> Index
Join -> Embedding and Linking across documents
Partition -> Shard
Partition Key -> Shard Key
10.
Considerations
What are the requirements ?
- Functionality to be supported
- Access Patterns ?
- Data Life Cycle (insert, update, deletes)
- Expected Performance / Workload ?
Capabilities of the database ?
11.
DB Considerations
How can we manipulate this data ?
Dynamic Queries
Secondary Indexes
Atomic Updates
Map Reduce
Access Patterns ?
Read / Write Ratio
Types of updates
Types of queries
Considerations
No Joins
Single Document Transactions only
12.
Design Session
Use Rich Design Documents
post = {author: "kyle",
        date: new Date(),
        text: "my blog post...",
        tags: ["mongodb", "intro"]}
>db.posts.save(post)
13.
>db.posts.find()
{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
author : "kyle",
date : "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)",
text : "My first blog",
tags : [ "mongodb", "intro" ] }
Notes:
- ID is unique, but can be anything you’d like
14.
Secondary index for “author”
// 1 means ascending, -1 means descending
>db.posts.ensureIndex({author: 1})
>db.posts.find({author: 'kyle'})
{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
author : "kyle",
... }
16.
Verifying indexes exist
>db.system.indexes.find()
// Index on ID
{ name : "_id_",
ns : "test.posts",
key : { "_id" : 1 } }
// Index on author
{ _id : ObjectId("4c4ba6c5672c685e5e8aabf4"),
ns : "test.posts",
key : { "author" : 1 },
name : "author_1" }
24.
// create index on nested documents:
>db.posts.ensureIndex({"comments.author": 1})
>db.posts.find({"comments.author": "kyle"})
// find last 5 posts:
>db.posts.find().sort({date:-1}).limit(5)
// most commented post:
>db.posts.find().sort({comments_count:-1}).limit(1)
When sorting, check if you need an index
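The three queries above can be mimicked in plain JavaScript to see what dot notation, sort, and limit amount to. This is only a rough in-memory sketch, not mongo shell code: the sample posts and the helper names `matchesCommentAuthor`, `lastPosts`, and `mostCommented` are made up for illustration.

```javascript
// Hypothetical sample data shaped like the posts in the slides.
const posts = [
  { author: "kyle", date: new Date("2010-07-24"), comments_count: 2,
    comments: [{ author: "Fred", text: "..." }, { author: "kyle", text: "..." }] },
  { author: "rpb", date: new Date("2010-07-25"), comments_count: 0, comments: [] },
  { author: "kyle", date: new Date("2010-07-26"), comments_count: 1,
    comments: [{ author: "Fred", text: "..." }] },
];

// Rough equivalent of find({"comments.author": name}):
// a post matches if ANY embedded comment has that author.
function matchesCommentAuthor(post, name) {
  return post.comments.some((c) => c.author === name);
}

// Rough equivalent of find().sort({date: -1}).limit(n).
function lastPosts(all, n) {
  return [...all].sort((a, b) => b.date - a.date).slice(0, n);
}

// Rough equivalent of find().sort({comments_count: -1}).limit(1).
function mostCommented(all) {
  return [...all].sort((a, b) => b.comments_count - a.comments_count)[0];
}

const byKyle = posts.filter((p) => matchesCommentAuthor(p, "kyle"));
const newest = lastPosts(posts, 2);
```

Without an index, the sort in this sketch touches every post, which is the cost the slide's closing reminder is warning about.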
25.
Map Reduce
Aggregation and batch manipulation
Collection in, Collection out
Parallel in sharded environments
26.
Map reduce
mapFunc = function () {
  this.tags.forEach(function (z) { emit(z, {count: 1}); });
}
reduceFunc = function (k, v) {
  var total = 0;
  for (var i = 0; i < v.length; i++) { total += v[i].count; }
  return {count: total};
}
res = db.posts.mapReduce(mapFunc, reduceFunc)
>db[res.result].find()
{ _id : "intro", value : { count : 1 } }
{ _id : "mongodb", value : { count : 1 } }
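To make the data flow of the tag count concrete, here is a small in-memory imitation in plain JavaScript. It is a sketch under assumptions, not how the server implements map reduce: the `mapReduce` driver and the sample posts are hypothetical, and `emit` is passed in as an argument instead of being a global.

```javascript
// Hypothetical posts; only the tags field matters here.
const posts = [
  { tags: ["mongodb", "intro"] },
  { tags: ["mongodb", "schema"] },
];

// Same idea as mapFunc in the slide, but the document and emit are
// explicit arguments instead of `this` and a global.
function mapFunc(doc, emit) {
  doc.tags.forEach((tag) => emit(tag, { count: 1 }));
}

// Same logic as reduceFunc in the slide.
function reduceFunc(key, values) {
  let total = 0;
  for (const v of values) total += v.count;
  return { count: total };
}

// Tiny in-memory driver: group emitted values by key, then reduce each group.
function mapReduce(docs, map, reduce) {
  const groups = new Map();
  for (const doc of docs) {
    map(doc, (key, value) => {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    });
  }
  return [...groups].map(([key, values]) => ({ _id: key, value: reduce(key, values) }));
}

const tagCounts = mapReduce(posts, mapFunc, reduceFunc);
```

The result has the same `{ _id, value: { count } }` shape as the output collection shown in the slide.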
29.
Review
So Far:
- Started out with a simple schema
- Queried Data
- Evolved the schema
- Queried / Updated the data some more
Observations:
- Using Rich Documents works well
- Simplify relations by embedding them
- Iterative development is easy with MongoDB
33.
One to Many
- Embedded Array / Array Keys
  - $slice operator to return a subset of the array
  - hard to find latest comments across all documents
- Embedded tree
  - Single document
  - Natural
  - Hard to query
- Normalized (2 collections)
  - most flexible
  - more queries
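The trade-off of the embedded array can be illustrated with a plain JavaScript sketch. The sample data and the helper names `lastComments` and `latestAcrossPosts` are assumptions for illustration; `lastComments` only imitates what a negative $slice projection returns.

```javascript
// Hypothetical posts with embedded comments (oldest first, date as a simple counter).
const posts = [
  { title: "A", comments: [
    { author: "Fred", date: 1, text: "..." },
    { author: "rpb", date: 4, text: "..." },
    { author: "kyle", date: 5, text: "..." } ] },
  { title: "B", comments: [
    { author: "rpb", date: 2, text: "..." },
    { author: "Fred", date: 6, text: "..." } ] },
];

// Like a $slice projection with a negative count: keep only the last n comments.
function lastComments(post, n) {
  return { ...post, comments: post.comments.slice(-n) };
}

// Latest comments across ALL posts: every embedded array has to be unpacked
// and merged, which is why this is awkward with the embedded design.
function latestAcrossPosts(all, n) {
  return all
    .flatMap((p) => p.comments.map((c) => ({ post: p.title, ...c })))
    .sort((a, b) => b.date - a.date)
    .slice(0, n);
}
```

With the normalized design (comments in their own collection) the second function would collapse to a single sorted query, at the price of an extra query when showing a post with its comments.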
34.
Many - Many
Example:
- Product can be in many categories
- Category can have many products
[Diagram: relational join table (id | product_id | category_id) between Products and Category]
38.
products:
{ _id: ObjectId("4c4ca23933fb5941681b912e"),
  name: "Sumatra Dark Roast",
  category_ids: [ ObjectId("4c4ca25433fb5941681b912f"),
                  ObjectId("4c4ca25433fb5941681b92af") ] }
categories:
{ _id: ObjectId("4c4ca25433fb5941681b912f"),
  name: "Indonesia",
  product_ids: [ ObjectId("4c4ca23933fb5941681b912e"),
                 ObjectId("4c4ca30433fb5941681b9130"),
                 ObjectId("4c4ca30433fb5941681b913a") ] }
// All categories for a given product
>db.categories.find({product_ids: ObjectId("4c4ca23933fb5941681b912e")})
// All products for a given category
>db.products.find({category_ids: ObjectId("4c4ca25433fb5941681b912f")})
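Storing ids on both sides makes each lookup a single query, but every link now lives in two documents. The following plain JavaScript sketch shows the bookkeeping this implies. The short string ids and the `link` helper are hypothetical, and the in-memory maps stand in for the two collections.

```javascript
// In-memory stand-ins for the two collections (ids shortened for readability).
const products = new Map([
  ["p1", { _id: "p1", name: "Sumatra Dark Roast", category_ids: [] }],
]);
const categories = new Map([
  ["c1", { _id: "c1", name: "Indonesia", product_ids: [] }],
]);

// Linking a product and a category means two writes that must agree.
// Each write touches a different document, so with single-document
// transactions only, the pair is not atomic as a unit.
function link(productId, categoryId) {
  const product = products.get(productId);
  const category = categories.get(categoryId);
  if (!product || !category) throw new Error("unknown product or category");
  if (!product.category_ids.includes(categoryId)) product.category_ids.push(categoryId);
  if (!category.product_ids.includes(productId)) category.product_ids.push(productId);
}

link("p1", "c1");
link("p1", "c1"); // linking twice must not create duplicates
```

If the second write fails, the two arrays disagree, which is one reason to consider keeping the ids on one side only.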
41.
Alternative
products:
{ _id: ObjectId("4c4ca23933fb5941681b912e"),
  name: "Sumatra Dark Roast",
  category_ids: [ ObjectId("4c4ca25433fb5941681b912f"),
                  ObjectId("4c4ca25433fb5941681b92af") ] }
categories:
{ _id: ObjectId("4c4ca25433fb5941681b912f"),
  name: "Indonesia" }
// All products for a given category
>db.products.find({category_ids: ObjectId("4c4ca25433fb5941681b912f")})
// All categories for a given product (findOne returns the document, find returns a cursor)
>product = db.products.findOne({_id: some_id})
>db.categories.find({_id: {$in: product.category_ids}})
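The two lookups of the one-sided design can be sketched in plain JavaScript as follows. The arrays stand in for the collections, the short ids are placeholders, and "House Blend" and "Dark Roast" are invented sample values.

```javascript
// In-memory stand-ins for the collections; only products hold references.
const products = [
  { _id: "p1", name: "Sumatra Dark Roast", category_ids: ["c1", "c2"] },
  { _id: "p2", name: "House Blend", category_ids: ["c2"] },
];
const categories = [
  { _id: "c1", name: "Indonesia" },
  { _id: "c2", name: "Dark Roast" },
];

// One step: products whose category_ids array contains the id.
function productsInCategory(categoryId) {
  return products.filter((p) => p.category_ids.includes(categoryId));
}

// Two steps: fetch the product, then the categories whose _id is in
// its category_ids array (the role $in plays in the shell query).
function categoriesForProduct(productId) {
  const product = products.find((p) => p._id === productId);
  if (!product) return [];
  return categories.filter((c) => product.category_ids.includes(c._id));
}
```

The design trades one extra round trip in the second direction for having a single place to update when a link changes.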
42.
Trees
Full Tree in Document
{ comments: [
    { author: "rpb", text: "...",
      replies: [
        { author: "Fred", text: "...",
          replies: [] }
      ] }
] }
Pros: Single Document, Performance, Intuitive
Cons: Hard to search, Partial Results, 4MB document size limit (raised to 16MB in later MongoDB releases)
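"Hard to search" can be made concrete: finding every comment by one author at any depth needs a recursive walk over the loaded document. A minimal sketch in plain JavaScript, where the helper name `findByAuthor` is an assumption and the data mirrors the slide:

```javascript
// The full comment tree stored inside one post document, as in the slide.
const post = {
  comments: [
    { author: "rpb", text: "...", replies: [
      { author: "Fred", text: "...", replies: [] },
    ] },
  ],
};

// Depth-first walk collecting every comment by the given author, at any depth.
function findByAuthor(comments, author, found = []) {
  for (const c of comments) {
    if (c.author === author) found.push(c);
    findByAuthor(c.replies, author, found);
  }
  return found;
}

const fredComments = findByAuthor(post.comments, "Fred");
```

The walk runs in application code after the whole document has been fetched, since the nesting depth is not known in advance.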
43.
Trees
Parent Links
- Each node is stored as a document
- Contains the id of the parent
Child Links
- Each node contains the ids of its children
- Can support graphs (multiple parents / children)
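A rough plain JavaScript sketch of the parent-link variant shows where its cost lies. The node ids and the helpers `ancestorsOf` and `childrenOf` are hypothetical, and the Map stands in for a collection keyed by _id.

```javascript
// Each node stores only the id of its parent.
const nodes = new Map([
  ["a", { _id: "a", parent: null }],
  ["b", { _id: "b", parent: "a" }],
  ["d", { _id: "d", parent: "b" }],
  ["g", { _id: "g", parent: "d" }],
]);

// Walk up one parent at a time; against a database each step
// of this loop would be a separate query.
function ancestorsOf(id) {
  const result = [];
  let current = nodes.get(id);
  while (current && current.parent !== null) {
    result.push(current.parent);
    current = nodes.get(current.parent);
  }
  return result; // nearest ancestor first
}

// Direct children need only one lookup on the parent field.
function childrenOf(id) {
  return [...nodes.values()].filter((n) => n.parent === id).map((n) => n._id);
}
```

Direct children are cheap, while the full ancestor chain costs one lookup per level of depth.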
46.
Array of Ancestors
- Store Ancestors of a node
{ _id: "a" }
{ _id: "b", ancestors: [ "a" ], parent: "a" }
{ _id: "c", ancestors: [ "a", "b" ], parent: "b" }
{ _id: "d", ancestors: [ "a", "b" ], parent: "b" }
{ _id: "e", ancestors: [ "a" ], parent: "a" }
{ _id: "f", ancestors: [ "a", "e" ], parent: "e" }
{ _id: "g", ancestors: [ "a", "b", "d" ], parent: "d" }
// find all descendants of b:
>db.tree2.find({ancestors: 'b'})
// find all ancestors of f:
>ancestors = db.tree2.findOne({_id: 'f'}).ancestors
>db.tree2.find({_id: {$in: ancestors}})
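The same two lookups, plus the bookkeeping needed when a node is added, can be sketched in plain JavaScript. The array stands in for the tree2 collection, and `descendantsOf`, `ancestorsOf`, and `addChild` are hypothetical helper names.

```javascript
// Same tree as in the slide; the root has no ancestors field.
const tree = [
  { _id: "a" },
  { _id: "b", ancestors: ["a"], parent: "a" },
  { _id: "c", ancestors: ["a", "b"], parent: "b" },
  { _id: "d", ancestors: ["a", "b"], parent: "b" },
  { _id: "e", ancestors: ["a"], parent: "a" },
  { _id: "f", ancestors: ["a", "e"], parent: "e" },
  { _id: "g", ancestors: ["a", "b", "d"], parent: "d" },
];

// Like find({ancestors: id}): match nodes whose ancestors array contains id.
function descendantsOf(id) {
  return tree.filter((n) => (n.ancestors || []).includes(id));
}

// Like findOne on the node followed by find({_id: {$in: ancestors}}).
function ancestorsOf(id) {
  const node = tree.find((n) => n._id === id);
  const ids = (node && node.ancestors) || [];
  return tree.filter((n) => ids.includes(n._id));
}

// The price of cheap reads: a new node must copy its parent's ancestors.
function addChild(parentId, id) {
  const parent = tree.find((n) => n._id === parentId);
  if (!parent) throw new Error("unknown parent");
  const child = { _id: id, ancestors: [...(parent.ancestors || []), parentId], parent: parentId };
  tree.push(child);
  return child;
}
```

Both reads are single-pass lookups regardless of depth; the extra work moves to writes, and moving a subtree would mean rewriting the ancestors array of every node in it.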
47.
findAndModify
Queue example
// Example: grab the highest priority job and mark it as in progress
job = db.jobs.findAndModify({
    query: {inprogress: false},
    sort: {priority: -1},
    update: {$set: {inprogress: true,
                    started: new Date()}},
    new: true})
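The effect of the call above can be imitated in plain JavaScript to show what a worker gets back on successive calls. This is a single-threaded sketch with invented sample jobs and a hypothetical `claimNextJob` helper; it mimics the result only, not the atomicity the server provides.

```javascript
// Hypothetical queue contents.
const jobs = [
  { _id: 1, priority: 1, inprogress: false },
  { _id: 2, priority: 5, inprogress: false },
  { _id: 3, priority: 9, inprogress: true },
];

// Pick the highest priority job that is not in progress, mark it,
// and return the updated document (the role of new: true).
// In MongoDB the find and the update happen as one atomic step,
// so two workers cannot claim the same job.
function claimNextJob(now = new Date()) {
  const candidates = jobs
    .filter((j) => !j.inprogress)
    .sort((a, b) => b.priority - a.priority);
  if (candidates.length === 0) return null;
  const job = candidates[0];
  job.inprogress = true;
  job.started = now;
  return job;
}

const first = claimNextJob();
const second = claimNextJob();
const third = claimNextJob();
```

Here `first` is job 2, `second` is job 1, and `third` is null because job 3 was already in progress and nothing is left to claim.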