Schema Design (Mongo Austin)


Bernie Hackett's presentation at Mongo Austin

Published in: Technology
  • Slide 26: In 1.6.x, map/reduce results are stored in a temporary collection until the connection is closed; you can also define an output collection. In 1.8.x, map/reduce results are stored in a permanent collection unless you specify otherwise.
  • Slide 28: key: the fields to group by. initial: the initial value of the aggregation counter. reduce: the reduce function that aggregates the objects we iterate over.
  • Slide 33: find() only returns documents where the field exists (and, in this case, where its value is greater than 0).
  • Slide 34: Indexes can still be created for fields that don't appear in all documents. New in 1.8.x: sparse indexes. A sparse index only includes the documents where the field exists; in a normal index, non-existent fields are treated as null values.
  • Slide 37: The greater the height of the tree, the harder it becomes to query.
  • Slide 38: Normalized: two collections instead of one. More flexible, but requires more queries to retrieve the same data.
  • Slide 39: Strong life-cycle association: use an embedded array. Otherwise you have options: embedded array/tree, or normalize the data.
  • Slide 42: Two collections. One option: arrays of keys (pointers) in each document that point to documents in the other collection.
  • Slide 43: Only one query to find the categories for a product given the product id. Only one query to find the products in a category given the category id.
  • Slide 44: Alternative: only store an array of keys in the documents of one collection. Advantage: less storage space required in the categories collection.
  • Slide 45: Finding all the products in a given category is still one query.
  • Slide 46: Disadvantage: finding all the categories for a given product takes two queries.
  • Slide 47: Document size limit: 4MB in 1.6.x, 16MB in 1.8.x.
  • Slide 54: findAndModify returns one result object, and the update is atomic. query: the query filter. sort: if multiple documents match, return the first one in the sorted results. update: a modifier object that specifies the modifications to make. new: return the modified object; otherwise return the old object.

    1. Schema Design
       Bernie Hackett, bernie@10gen.com
    2. Topics
       Introduction
         • Basic Data Modeling
         • Manipulating Data
         • Evolving a schema
       Common patterns
         • Single table inheritance
         • One-to-Many & Many-to-Many
         • Trees
         • Queues
    3. So why model data?
       http://www.flickr.com/photos/42304632@N00/493639870/
    4. Benefits of relational
       • Before relational
           • Data and Logic combined
       • After relational
           • Separation of concerns
           • Data modeled independent of logic
           • Logic freed from concerns of data design
       • MongoDB continues this separation
    5. Normalization
       Goals
         • Avoid anomalies when inserting, updating or deleting
         • Minimize redesign when extending the schema
         • Make the model informative to users
         • Avoid bias toward a particular query
       In MongoDB
         • Similar goals apply
         • The rules are different
    6. Relational made normalized data look like this
    7. Document databases make normalized data look like this
    8. Terminology
       RDBMS            MongoDB
       Table            Collection
       Row(s)           JSON Document
       Index            Index
       Join             Embedding & Linking
       Partition        Shard
       Partition Key    Shard Key
    9. DB Considerations
       How can we manipulate this data?
         • Dynamic Queries
         • Secondary Indexes
         • Atomic Updates
         • Map Reduce
       Access Patterns?
         • Read / Write Ratio
         • Types of updates
         • Types of queries
         • Data life-cycle
       Further Considerations
         • No Joins
         • Document writes are atomic
    10. So today’s example will use...
    11. Design Session
        Design documents that simply map to your application
        > post = { author: "Hergé",
                   date: new Date(),
                   text: "Destination Moon",
                   tags: [ "comic", "adventure" ] }
        > db.posts.save(post)
    12. Find the document
        > db.posts.find()
        { _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
          author: "Hergé",
          date: "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)",
          text: "Destination Moon",
          tags: [ "comic", "adventure" ] }

        Notes:
        • ID must be unique, but can be anything you’d like
        • MongoDB will generate a default ID if one is not supplied
    13. Add an index, find via index
        Secondary index for "author"
        // 1 means ascending, -1 means descending
        > db.posts.ensureIndex( { author: 1 } )
        > db.posts.find( { author: "Hergé" } )
        { _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
          date: "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)",
          author: "Hergé",
          ... }
    14. Verifying indexes exist
        > db.posts.getIndexes()
        // Index on ID
        { name: "_id_",
          ns: "test.posts",
          key: { "_id" : 1 } }
        // Index on author
        { _id: ObjectId("4c4ba6c5672c685e5e8aabf4"),
          ns: "test.posts",
          key: { "author" : 1 },
          name: "author_1" }
    15. Examine the query plan
        > db.posts.find( { author: "Hergé" } ).explain()
        {
          "cursor" : "BtreeCursor author_1",
          "nscanned" : 1,
          "nscannedObjects" : 1,
          "n" : 1,
          "millis" : 5,
          "indexBounds" : {
            "author" : [
              [ "Hergé", "Hergé" ]
            ]
          }
        }
    16. Query operators
        Conditional operators: $ne, $in, $nin, $mod, $all, $size, $exists, $type, $lt, $lte, $gt, $gte
        // find posts with any tags
        > db.posts.find( { tags: { $exists: true } } )
    17. Query operators (continued)
        Regular expressions:
        // posts where author starts with h
        > db.posts.find( { author: /^h/i } )
    18. Query operators (continued)
        Counting:
        // number of posts written by Hergé
        > db.posts.find( { author: "Hergé" } ).count()
    19. Extending the Schema
        > new_comment = { author: "Bernie",
                          date: new Date(),
                          text: "great book" }
        > db.posts.update(
              { text: "Destination Moon" },
              { $push: { comments: new_comment },
                $inc:  { comments_count: 1 } } )
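The effect of the two modifiers in this update can be sketched in plain JavaScript. This is only an in-memory illustration, not MongoDB's implementation: applyUpdate is a hypothetical helper that understands just $push and $inc, and it relies on the documented behavior that $push creates a missing array and $inc treats a missing field as 0.

```javascript
// Hypothetical in-memory helper: applies only $push and $inc to one document.
function applyUpdate(doc, update) {
  for (const [field, value] of Object.entries(update.$push || {})) {
    if (!Array.isArray(doc[field])) doc[field] = []; // $push creates a missing array
    doc[field].push(value);
  }
  for (const [field, amount] of Object.entries(update.$inc || {})) {
    doc[field] = (doc[field] || 0) + amount;         // $inc starts a missing field at 0
  }
  return doc;
}

// The post as it looked before any comments existed.
const post = { author: "Hergé", text: "Destination Moon", tags: ["comic", "adventure"] };
const newComment = { author: "Bernie", date: new Date(), text: "great book" };

applyUpdate(post, { $push: { comments: newComment }, $inc: { comments_count: 1 } });
console.log(post.comments.length, post.comments_count);
```

Neither comments nor comments_count had to be declared beforehand, which is what lets the schema evolve one document at a time.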
    20. Extending the Schema
        { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
          author : "Hergé",
          date : "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)",
          text : "Destination Moon",
          tags : [ "comic", "adventure" ],
          comments : [
            { author : "Bernie",
              date : "Sat Jul 24 2010 20:51:03 GMT-0700 (PDT)",
              text : "great book" }
          ],
          comments_count: 1 }
    21. Extending the Schema
        // create index on nested documents:
        > db.posts.ensureIndex( { "comments.author": 1 } )
        > db.posts.find( { "comments.author": "Bernie" } )
    22. Extending the Schema (continued)
        // find last 5 posts:
        > db.posts.find().sort( { date: -1 } ).limit(5)
    23. Extending the Schema (continued)
        // most commented post:
        > db.posts.find().sort( { comments_count: -1 } ).limit(1)
        When sorting, check if you need an index
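To make the sort-then-limit semantics of the last two queries concrete, here is a minimal in-memory sketch. The sample posts are invented for illustration. The sort happens over every document in memory, which is roughly the work a server has to do when no index supports the sort.

```javascript
// Invented sample data: seven posts on consecutive days with varying comment counts.
const posts = Array.from({ length: 7 }, (_, i) => ({
  text: "post " + (i + 1),
  date: new Date(2010, 6, 18 + i),
  comments_count: i === 2 ? 4 : i % 2,
}));

// sort( { date: -1 } ).limit(5): newest first, keep five.
const lastFive = [...posts].sort((a, b) => b.date - a.date).slice(0, 5);

// sort( { comments_count: -1 } ).limit(1): highest count first, keep one.
const mostCommented = [...posts].sort((a, b) => b.comments_count - a.comments_count)[0];

console.log(lastFive.map((p) => p.text), mostCommented.text);
```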
    24. Watch for full table scans
        > db.posts.find( { text: "Destination Moon" } ).explain()
        {
          "cursor" : "BasicCursor",
          "nscanned" : 1,
          "nscannedObjects" : 1,
          "n" : 1,
          "millis" : 0,
          "indexBounds" : { }
        }
    25. Map Reduce
    26. Map reduce : count tags
        mapFunc = function () {
            this.tags.forEach( function( z ) { emit( z, { count: 1 } ); } );
        }
        reduceFunc = function( k, v ) {
            var total = 0;
            for ( var i = 0; i < v.length; i++ ) {
                total += v[i].count;
            }
            return { count: total };
        }
        res = db.posts.mapReduce( mapFunc, reduceFunc )
        > db[res.result].find()
        { _id : "comic", value : { count : 1 } }
        { _id : "adventure", value : { count : 1 } }
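The flow of the tag count can be imitated without a server. The sketch below is a plain-JavaScript stand-in, not the mongo shell API: runMapReduce is a hypothetical driver, emit is passed to the map function as an argument (in the shell it is a global), and the second post is invented so that one tag occurs twice.

```javascript
// Hypothetical driver: collects emitted values per key, then reduces each key.
function runMapReduce(docs, mapFunc, reduceFunc) {
  const emitted = new Map(); // key -> array of emitted values
  const emit = (key, value) => {
    if (!emitted.has(key)) emitted.set(key, []);
    emitted.get(key).push(value);
  };
  for (const doc of docs) mapFunc.call(doc, emit); // "this" is the current document
  return [...emitted].map(([key, values]) => ({
    _id: key,
    // a key with a single value has nothing to combine, so reduce is skipped
    value: values.length === 1 ? values[0] : reduceFunc(key, values),
  }));
}

const mapFunc = function (emit) {
  (this.tags || []).forEach((tag) => emit(tag, { count: 1 }));
};
const reduceFunc = function (key, values) {
  let total = 0;
  for (const v of values) total += v.count;
  return { count: total };
};

const posts = [
  { author: "Hergé", tags: ["comic", "adventure"] },
  { author: "Kyle", tags: ["comic"] }, // invented second post
];
const tagCounts = runMapReduce(posts, mapFunc, reduceFunc);
console.log(JSON.stringify(tagCounts));
```

The reduce function returns the same shape that map emits ({ count: n }), which is what allows reduce output to be fed back into reduce.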
    27. Group
        • Equivalent to a Group By in SQL
        • Specify the attributes to group the data
        • Process the results in a Reduce function
    28. Group - Count post by Author
        cmd = { key: { "author": true },
                initial: { count: 0 },
                reduce: function(obj, prev) {
                    prev.count++;
                }
              };
        result = db.posts.group(cmd);
        [
          { "author" : "Hergé", "count" : 1 },
          { "author" : "Kyle", "count" : 3 }
        ]
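How key, initial, and reduce cooperate can be sketched in memory. The group function below is a hypothetical imitation written for this example, not the shell's db.collection.group(), and the four sample posts are invented to reproduce the counts on the slide.

```javascript
// Hypothetical in-memory imitation of a group command.
function group(docs, cmd) {
  const fields = Object.keys(cmd.key);
  const buckets = new Map();
  for (const doc of docs) {
    const id = JSON.stringify(fields.map((f) => doc[f])); // one bucket per key value
    if (!buckets.has(id)) {
      const start = {};
      for (const f of fields) start[f] = doc[f];
      buckets.set(id, Object.assign(start, cmd.initial)); // copy of the initial counter
    }
    cmd.reduce(doc, buckets.get(id)); // reduce mutates the running aggregate
  }
  return [...buckets.values()];
}

const posts = [
  { author: "Hergé" },
  { author: "Kyle" },
  { author: "Kyle" },
  { author: "Kyle" },
];
const result = group(posts, {
  key: { author: true },
  initial: { count: 0 },
  reduce: function (obj, prev) { prev.count++; },
});
console.log(JSON.stringify(result));
```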
    29. Review
        So Far:
        - Started out with a simple schema
        - Queried Data
        - Evolved the schema
        - Queried / Updated the data some more
    30. Inheritance
    31. Single Table Inheritance - RDBMS
        shapes table
        id   type     area   radius   d   length   width
        1    circle   3.14   1
        2    square   4               2
        3    rect     10                  5        2
    32. Single Table Inheritance - MongoDB
        > db.shapes.find()
        { _id: "1", type: "circle", area: 3.14, radius: 1 }
        { _id: "2", type: "square", area: 4, d: 2 }
        { _id: "3", type: "rect", area: 10, length: 5, width: 2 }
    33. Single Table Inheritance - MongoDB (continued)
        // find shapes where radius > 0
        > db.shapes.find( { radius: { $gt: 0 } } )
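The point of this query is that documents without a radius field simply do not match. A minimal in-memory sketch, where findGt is a hypothetical matcher that handles only the { field: { $gt: bound } } case for numeric values:

```javascript
const shapes = [
  { _id: "1", type: "circle", area: 3.14, radius: 1 },
  { _id: "2", type: "square", area: 4, d: 2 },
  { _id: "3", type: "rect", area: 10, length: 5, width: 2 },
];

// Hypothetical matcher: a document matches only if the field exists,
// holds a number, and that number is greater than the bound.
function findGt(docs, field, bound) {
  return docs.filter((doc) => typeof doc[field] === "number" && doc[field] > bound);
}

const withRadius = findGt(shapes, "radius", 0);
console.log(withRadius.map((s) => s.type));
```

The square and the rectangle need no placeholder column for radius; they are filtered out because the field is absent.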
    34. Single Table Inheritance - MongoDB (continued)
        // create index
        > db.shapes.ensureIndex( { radius: 1 } )
    35. One to Many
        One to Many relationships can specify
        • degree of association between objects
        • containment
        • life-cycle
    36. One to Many
        - Embedded Array / Array Keys
            - slice operator to return subset of array
            - some queries harder, e.g. find latest comments across all documents
        blogs:
        { author : "Hergé",
          date : "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)",
          comments : [
            { author : "Bernie",
              date : "Sat Jul 24 2010 20:51:03 GMT-0700 (PDT)",
              text : "great book" }
          ] }
    37. One to Many
        - Embedded tree
            - Single document
            - Natural
            - Hard to query
        blogs:
        { author : "Hergé",
          date : "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)",
          comments : [
            { author : "Bernie",
              date : "Sat Jul 24 2010 20:51:03 GMT-0700 (PDT)",
              text : "great book",
              replies: [ { author : "James", ... } ] }
          ] }
    38. One to Many
        - Normalized (2 collections)
            - most flexible
            - more queries
        blogs:
        { author : "Hergé",
          date : "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)",
          comments : [
            { comment : ObjectId("1") }
          ] }
        comments:
        { _id : "1",
          author : "James",
          date : "Sat Jul 24 2010 20:51:03 ..." }
    39. One to Many - patterns
        - Embedded Array / Array Keys
        - Embedded tree
        - Normalized
    40. Many - Many
        Example:
        - Product can be in many categories
        - Category can have many products
    41. Many - Many
        products:
        { _id: ObjectId("10"),
          name: "Destination Moon",
          category_ids: [ ObjectId("20"), ObjectId("30") ] }
    42. Many - Many (continued)
        categories:
        { _id: ObjectId("20"),
          name: "adventure",
          product_ids: [ ObjectId("10"), ObjectId("11"), ObjectId("12") ] }
    43. Many - Many (continued)
        // All categories for a given product
        > db.categories.find( { product_ids: ObjectId("10") } )
    44. Alternative
        products:
        { _id: ObjectId("10"),
          name: "Destination Moon",
          category_ids: [ ObjectId("20"), ObjectId("30") ] }
        categories:
        { _id: ObjectId("20"),
          name: "adventure" }
    45. Alternative (continued)
        // All products for a given category
        > db.products.find( { category_ids: ObjectId("20") } )
    46. Alternative (continued)
        // All categories for a given product
        > product = db.products.findOne( { _id : some_id } )
        > db.categories.find( { _id : { $in : product.category_ids } } )
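The trade-off of storing keys on the product side only can be sketched with in-memory arrays standing in for the two collections. Plain numbers replace ObjectIds, and the second product and the name of category 30 are invented for illustration.

```javascript
// Stand-ins for the two collections; only products carry references.
const products = [
  { _id: 10, name: "Destination Moon", category_ids: [20, 30] },
  { _id: 11, name: "sample product", category_ids: [20] }, // invented
];
const categories = [
  { _id: 20, name: "adventure" },
  { _id: 30, name: "sample category" }, // invented name
];

// One lookup: match the category id against each product's category_ids array.
function productsIn(categoryId) {
  return products.filter((p) => p.category_ids.includes(categoryId));
}

// Two lookups: fetch the product first, then fetch categories whose _id is $in its list.
function categoriesFor(productId) {
  const product = products.find((p) => p._id === productId); // lookup 1 (findOne)
  if (!product) return [];
  return categories.filter((c) => product.category_ids.includes(c._id)); // lookup 2 ($in)
}

console.log(productsIn(20).map((p) => p._id), categoriesFor(10).map((c) => c.name));
```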

    47. Trees
        Full Tree in Document
        { comments: [
            { author: "Bernie", text: "...",
              replies: [
                { author: "James", text: "...",
                  replies: [ ] }
              ] }
        ] }
        Pros: Single Document, Performance, Intuitive
        Cons: Hard to search, Partial Results, 16MB limit
    48. Trees
        Parent Links
        - Each node is stored as a document
        - Contains the id of the parent
        Child Links
        - Each node contains the ids of the children
        - Can support graphs (multiple parents / child)
    49. Array of Ancestors
        - Store all Ancestors of a node
        { _id: "a" }
        { _id: "b", ancestors: [ "a" ], parent: "a" }
        { _id: "c", ancestors: [ "a", "b" ], parent: "b" }
        { _id: "d", ancestors: [ "a", "b" ], parent: "b" }
        { _id: "e", ancestors: [ "a" ], parent: "a" }
        { _id: "f", ancestors: [ "a", "e" ], parent: "e" }
    50. Array of Ancestors (continued)
        // find all descendants of b:
        > db.tree2.find( { ancestors: "b" } )
        // find all direct descendants of b:
        > db.tree2.find( { parent: "b" } )
    51. Array of Ancestors (continued)
        // find all ancestors of f:
        > ancestors = db.tree2.findOne( { _id: "f" } ).ancestors
        > db.tree2.find( { _id: { $in : ancestors } } )
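The three tree queries can be traced against the six sample nodes with a small in-memory sketch. The helper names are made up for this example; each mirrors one of the shell queries on the array-of-ancestors slides.

```javascript
const tree = [
  { _id: "a" },
  { _id: "b", ancestors: ["a"], parent: "a" },
  { _id: "c", ancestors: ["a", "b"], parent: "b" },
  { _id: "d", ancestors: ["a", "b"], parent: "b" },
  { _id: "e", ancestors: ["a"], parent: "a" },
  { _id: "f", ancestors: ["a", "e"], parent: "e" },
];

// find( { ancestors: id } ): the array contains the id.
const descendantsOf = (id) =>
  tree.filter((n) => (n.ancestors || []).includes(id)).map((n) => n._id);

// find( { parent: id } ): direct children only.
const childrenOf = (id) => tree.filter((n) => n.parent === id).map((n) => n._id);

// findOne the node, then find( { _id: { $in: ancestors } } ).
function ancestorsOf(id) {
  const node = tree.find((n) => n._id === id);
  const ids = (node && node.ancestors) || [];
  return tree.filter((n) => ids.includes(n._id)).map((n) => n._id);
}

console.log(descendantsOf("b"), childrenOf("a"), ancestorsOf("f"));
```

Descendants come back from a single array match at any depth; the cost is that every node must store, and keep up to date, its full ancestor list.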
    52. Trees as Paths
        Store hierarchy as a path expression
        - Separate each node by a delimiter, e.g. "/"
        - Use text search to find parts of a tree
        { comments: [
            { author: "Bernie", text: "initial post",
              path: "/" },
            { author: "Jim", text: "jim’s comment",
              path: "/jim" },
            { author: "Bernie", text: "Bernie’s reply to Jim",
              path: "/jim/bernie" }
        ] }
        // Find the conversations Jim was a part of
        > db.posts.find( { "comments.path": /jim/i } )
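What the regular expression actually matches is worth checking against a few paths. In this sketch the fourth comment and the anchored pattern are additions for illustration, not part of the talk: the anchored variant assumes you only want the subtree rooted at "/jim", and anchoring at the start is also what lets a regular expression make use of an index on the field.

```javascript
const comments = [
  { author: "Bernie", text: "initial post", path: "/" },
  { author: "Jim", text: "jim's comment", path: "/jim" },
  { author: "Bernie", text: "Bernie's reply to Jim", path: "/jim/bernie" },
  { author: "Jimmy", text: "unrelated thread", path: "/jimmy" }, // invented
];

// Unanchored, as on the slide: matches "jim" anywhere in the path.
const anywhere = comments.filter((c) => /jim/i.test(c.path));

// Anchored variant (an assumption): "/jim" itself or anything below it.
const subtree = comments.filter((c) => /^\/jim(\/|$)/.test(c.path));

console.log(anywhere.map((c) => c.path), subtree.map((c) => c.path));
```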
    53. Queue
        • Need to maintain order and state
        • Ensure that updates to the queue are atomic
        { inprogress: false,
          priority: 1,
          ... }
    54. Queue (continued)
        // find highest priority job and mark as in-progress
        job = db.jobs.findAndModify( {
                  query:  { inprogress: false },
                  sort:   { priority: -1 },
                  update: { $set: { inprogress: true,
                                    started: new Date() } },
                  new: true } )
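The four options map onto four steps: filter, order, modify, return. The sketch below walks through them on an in-memory array. claimNextJob is a hypothetical, single-threaded imitation; it does not reproduce the atomicity that findAndModify gives you on the server, which is the reason to use that command for a queue in the first place.

```javascript
// Hypothetical single-threaded imitation of the findAndModify call on the slide.
function claimNextJob(jobs) {
  const candidates = jobs.filter((j) => j.inprogress === false); // query
  if (candidates.length === 0) return null;                      // nothing left to claim
  candidates.sort((x, y) => y.priority - x.priority);            // sort: { priority: -1 }
  const job = candidates[0];
  job.inprogress = true;                                         // update: $set
  job.started = new Date();
  return job;                                                    // new: true, the modified document
}

// Invented sample queue.
const jobs = [
  { _id: 1, inprogress: false, priority: 1 },
  { _id: 2, inprogress: false, priority: 5 },
  { _id: 3, inprogress: false, priority: 3 },
];
const first = claimNextJob(jobs);
console.log(first._id, first.inprogress);
```

Because the claimed job is flagged before it is returned, the next call skips it and hands out the next highest priority.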


    55. Summary
        Schema design is different in MongoDB
        Basic data design principles stay the same
        Focus on how the app manipulates data
        Rapidly evolve schema to meet your requirements
        Enjoy your new freedom, use it wisely :-)
    56. download at mongodb.org
        We’re Hiring! bernie@10gen.com
        Conferences, appearances, and meetups: http://www.10gen.com/events
        Facebook: http://bit.ly/mongo> | Twitter: @mongodb | LinkedIn: http://linkd.in/joinmongo
