Mongo Atlanta
Schema Design
     Robert Stam
  robert@10gen.com
Topics
Introduction
•  Basic data modeling
•  Manipulating data
•  Evolving a schema

Common patterns
• Single table inheritance
• One-to-many
• Many-to-many
• Trees
• Queues
Benefit of relational
Before relational model
• Data and logic combined

After relational model
• Separation of concerns
• Data model independent of logic
• Logic freed from concerns of data design

MongoDB continues this separation
Normalization
Goals
• Avoid anomalies when inserting, updating or
  deleting
• Minimize redesign when extending the
  schema
• Make the model informative to users
• Avoid bias toward a particular query

In MongoDB
•  Similar goals apply
•  But rules are different
Relational model makes
normalized data look like
Document databases make
normalized data look like
Terminology
Relational       MongoDB
Table            Collection
Row(s)           Documents
Index            Index
Join             Embedding and linking
Partition        Shard
Partition key    Shard key
Collections
• Cheap to create (max 24000)
• Collections don’t have a schema
• Individual documents have a schema
• Common for documents in a collection to
  share a schema
• Document schema can evolve
• Consider using multiple related collections
  tied together by a naming convention:
 • e.g. LogData-2011-02-08
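A naming convention like this is usually produced by a small helper in the application. The following is a minimal sketch in plain JavaScript; the function name logCollectionName, the prefix argument, and the choice of UTC dates are assumptions for illustration, not something the talk prescribes.

```javascript
// Hypothetical helper: builds a per-day collection name such as "LogData-2011-02-08".
// Assumption: days are cut at UTC midnight.
function logCollectionName(prefix, date) {
  const yyyy = String(date.getUTCFullYear());
  const mm = String(date.getUTCMonth() + 1).padStart(2, "0"); // months are 0-based
  const dd = String(date.getUTCDate()).padStart(2, "0");
  return `${prefix}-${yyyy}-${mm}-${dd}`;
}

// Example: the collection for 8 February 2011.
const exampleName = logCollectionName("LogData", new Date(Date.UTC(2011, 1, 8)));
console.log(exampleName);
```

In the shell, a name containing "-" would then be opened with db.getCollection(name) rather than dot notation.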
Document basics
• Zero or more elements
• Elements are name/value pairs
• Rich data types for values
• JSON
• BSON
Data types
• Numeric (Int32, Int64, Double)
• String
• Boolean
• DateTime
• ObjectId
• Others (JavaScript, Regex, Binary, Null, ...)
• Array
• Nested document
Experimenting with MongoDB
• Mongo shell
• JavaScript
$ mongo
MongoDB shell version: 1.7.5
connecting to: test
> db.books.find()
{
    _id : ObjectId("12345678901234567890abcd"),
    author : "Ernest Hemingway",
    title : "The Old Man and the Sea"
}
>
Sample rich document
> db.orders.findOne()
{
    _id : 1,
    customer : {
        customer_id : 1234,
        name : "John Doe",
        address : {
            line1 : "123 Main St",
            city : "Duncannon",
            state : "PA",
            zip : "12345-6789"
        }
    },
    items : [
        { item_id : 111, ... }, // data for first item
        { item_id : 222, ... }, // data for next item
        ...
    ]
}
Rich document advantages
 • Holistic representation
 • Still easy to manipulate
 • Pre-joined for fast retrieval
Document size
• Max 4MB in earlier MongoDB versions
• Max 16MB in current versions
• Performance considerations long before
  reaching the maximum size
Database considerations
 • How can we manipulate this data?
 • Dynamic queries
 • Secondary indexes
 • Atomic updates
What are the access patterns?
 • Read/write ratio
 • Types of updates
 • Types of queries
 • Data life-cycle
Considerations
 • No joins
 • Document writes are atomic
Document design
• Design documents that map simply to your
   application data
> book = {
    author : "Ernest Hemingway",
    title : "The Old Man and the Sea",
    tags : ["American Literature", "Sea", "Large Fish"]
}
> db.books.insert(book)
>
Find the document
> db.books.find({ author : "Ernest Hemingway" })
{
    _id : ObjectId("12345678901234567890abcd"),
    author : "Ernest Hemingway",
    title : "The Old Man and the Sea",
    tags : ["American Literature", "Sea", "Large Fish"]
}
>

Notes:
• Every document must have a unique _id
• MongoDB will generate one automatically if
  your document does not have an _id
Find via index
> db.books.ensureIndex({ author : 1 })

> db.books.find({ author : "Ernest Hemingway" })
{
    _id : ObjectId("12345678901234567890abcd"),
    author : "Ernest Hemingway",
    title : "The Old Man and the Sea",
    tags : ["American Literature", "Sea", "Large Fish"]
}
>
Verify index exists
> db.books.getIndexes()
[
    ...,
    {
        _id : ObjectId("12345678901234567890abcd"),
        ns : "test.books",
        key : { author : 1 },
        name : "author_1"
    },
    ...
]
>
Verify index is used
Examine the query plan
> db.books.find({ author : "Ernest Hemingway" }).explain()
{
    cursor : "BtreeCursor author_1",
    nscanned : 1,
    nscannedObjects : 1,
    n : 1,
    millis : 1,
    indexBounds : {
        author : [
            [ "Ernest Hemingway", "Ernest Hemingway" ]
        ]
    }
}
>
Query operators
Conditional operators
• equals ({ author : "..." })
• matches ({ author : /^e/i })
• $ne, $in, $nin, $mod, $all, $size, $exists,
  $type, $lt, $lte, $gt, $gte
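To make the semantics of a few of these operators concrete, here is a small in-memory sketch in plain JavaScript that needs no server. The helpers matches, matchesCondition, and isOperatorDoc are hypothetical, cover only a handful of operators, and do not reflect how MongoDB evaluates queries internally; the second book document is invented for contrast.

```javascript
// True for operator documents such as { $gt : 0 }, false for plain values and regexes.
function isOperatorDoc(condition) {
  return condition !== null && typeof condition === "object" && !(condition instanceof RegExp);
}

// Tests one value against one condition (plain value, regex, or operator document).
function matchesCondition(value, condition) {
  if (condition instanceof RegExp) {
    return typeof value === "string" && condition.test(value);
  }
  if (isOperatorDoc(condition)) {
    return Object.entries(condition).every(([op, arg]) => {
      switch (op) {
        case "$exists": return (value !== undefined) === arg;
        case "$gt": return value > arg;
        case "$gte": return value >= arg;
        case "$lt": return value < arg;
        case "$lte": return value <= arg;
        case "$ne": return value !== arg;
        case "$in": return arg.includes(value);
        default: throw new Error("operator not covered by this sketch: " + op);
      }
    });
  }
  return value === condition;
}

// Tests a whole document against a query; a plain value or regex matches an
// array element if any element of the array matches.
function matches(doc, query) {
  return Object.entries(query).every(([field, condition]) => {
    const value = doc[field];
    if (Array.isArray(value) && !isOperatorDoc(condition)) {
      return value.some((element) => matchesCondition(element, condition));
    }
    return matchesCondition(value, condition);
  });
}

const sampleBooks = [
  { author: "Ernest Hemingway", title: "The Old Man and the Sea",
    tags: ["American Literature", "Sea", "Large Fish"] },
  { author: "Jane Austen", title: "Emma" }, // invented document without tags
];

console.log(sampleBooks.filter((b) => matches(b, { author: /^e/i })).map((b) => b.title));
console.log(sampleBooks.filter((b) => matches(b, { tags: { $exists: true } })).map((b) => b.title));
```

The point of the sketch is the array behavior: a query on tags with a plain value matches when any element of the array equals that value.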
Sample queries
// find books by "Ernest Hemingway"
> db.books.find({ author : "Ernest Hemingway" })

// find books by authors whose name starts with "e"
> db.books.find({ author : /^e/i })

// find books tagged "American Literature"
> db.books.find({ tags : "American Literature" })

// find books that have a tags element
> db.books.find({ tags : { $exists : true } })

// count books by authors whose name starts with "e"
> db.books.find({ author : /^e/i }).count()
Extending the schema
> comment = {
    author : "Robert",
    text : "Great book",
    date : Date()
}
> db.books.update(
    { title : "The Old Man and the Sea" },
    {
        $inc : { comments_count : 1 },
        $push : { comments : comment }
    }
)
>
Extended schema
> db.books.find({ title : "The Old Man and the Sea" })
{
    _id : ObjectId("12345678901234567890abcd"),
    author : "Ernest Hemingway",
    title : "The Old Man and the Sea",
    tags : ["American Literature", "Sea", "Large Fish"],
    comments_count : 1,
    comments : [
        {
            author : "Robert",
            text : "Great book",
            date : "Wed Feb 02 2011 10:36:18 ..."
        }
    ]
}
>
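The way $inc and $push extend a document that does not yet have the target elements can be sketched in plain JavaScript. applyUpdate is a hypothetical in-memory helper that mimics only these two modifiers; on a real server the whole update is applied atomically to the matched document.

```javascript
// Hypothetical in-memory imitation of the $inc and $push update modifiers.
function applyUpdate(doc, update) {
  for (const [field, amount] of Object.entries(update.$inc || {})) {
    // A missing element is treated as 0 before incrementing.
    doc[field] = (doc[field] || 0) + amount;
  }
  for (const [field, value] of Object.entries(update.$push || {})) {
    // A missing element becomes a new array holding the pushed value.
    if (doc[field] === undefined) doc[field] = [];
    doc[field].push(value);
  }
  return doc;
}

const bookDoc = { author: "Ernest Hemingway", title: "The Old Man and the Sea" };
applyUpdate(bookDoc, {
  $inc: { comments_count: 1 },
  $push: { comments: { author: "Robert", text: "Great book" } },
});
console.log(JSON.stringify(bookDoc, null, 2));
```

No schema change step is needed: the first update that touches comments_count and comments is what introduces them.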
Using the extended schema
// create index on nested element
> db.books.ensureIndex({ "comments.author" : 1 })

// find books Robert has commented on
> db.books.find({ "comments.author" : "Robert" })

// find book with most comments
> db.books.find().sort({ "comments_count" : -1 }).limit(1)

// when sorting, check if you need an index
Watch for full table scans
Examine the query plan
> db.books.find().sort({ "comments_count" : -1 }).limit(1).explain()
{
    cursor : "BasicCursor",
    nscanned : 12345,
    nscannedObjects : 12345,
    n : 1,
    millis : 123,
    indexBounds : { }
}
>
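Reading a plan like this can be automated with a few lines of JavaScript. The sketch below assumes the explain() output shape shown on these slides (cursor, nscanned, n); newer servers report plans in a different shape, and describePlan is a hypothetical helper name.

```javascript
// Summarizes an explain() document of the shape shown on the slides.
function describePlan(plan) {
  // "BasicCursor" means no index was used; "BtreeCursor <name>" means an index was.
  const fullScan = plan.cursor === "BasicCursor";
  // How many entries were examined for each document returned.
  const scannedPerResult = plan.n > 0 ? plan.nscanned / plan.n : plan.nscanned;
  return { fullScan, scannedPerResult };
}

const indexedPlan = { cursor: "BtreeCursor author_1", nscanned: 1, nscannedObjects: 1, n: 1 };
const sortedPlan = { cursor: "BasicCursor", nscanned: 12345, nscannedObjects: 12345, n: 1 };

console.log(describePlan(indexedPlan));
console.log(describePlan(sortedPlan));
```

A high scanned-per-result ratio together with a BasicCursor is the signal that an index (here, on comments_count) is worth considering.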
Inheritance
Single table inheritance
Shapes table:
id    type      area    radius    side    length    width
1     circle    3.14    1
2     square    4                 2
3     rect      10                        5         2
Single table inheritance: MongoDB
> db.shapes.find()
{ _id : 1, type : "circle", area : 3.14, radius : 1 }
{ _id : 2, type : "square", area : 4, side : 2 }
{ _id : 3, type : "rect", area : 10, length : 5, width : 2 }

// find shapes where radius > 0
> db.shapes.find({ radius : { $gt : 0 } })
{ _id : 1, type : "circle", area : 3.14, radius : 1 }

// find shapes where area >= 4
> db.shapes.find({ area : { $gte : 4 } })
{ _id : 2, type : "square", area : 4, side : 2 }
{ _id : 3, type : "rect", area : 10, length : 5, width : 2 }

// create an index on radius
> db.shapes.ensureIndex({ radius : 1 })
One-to-many
Options
• Embedded array
• Embedded document
• Normalized
One-to-many: embedded array
> db.books.find()
{
    author : "Ernest Hemingway",
    title : "The Old Man and the Sea",
    comments : [
        { author : "Robert", text : "Great book" },
        { author : "Jim", text : "I didn't like it" }
    ]
}
>
One-to-many: embedded trees
> db.books.find()
{
    author : "Ernest Hemingway",
    title : "The Old Man and the Sea",
    comments : [
        {
            author : "Robert",
            text : "Great book",
            replies : [
                {
                    author : "Jim",
                    text : "I didn't like it"
                }
            ]
        }
    ]
}
>
One-to-many: normalized
> db.books.find()
{
    _id : 1,
    author : "Ernest Hemingway",
    title : "The Old Man and the Sea",
    comment_ids : [1, 2]
}
> db.comments.find()
{ _id : 1, book_id : 1, author : "Robert", text : "Great book" }
{ _id : 2, book_id : 1, author : "Jim", text : "I didn't like it" }
>
Many-to-many
Example:
• Product can be in many categories
• Category has many products
Many-to-many: products and categories
> db.products.find()
{
    _id : 1,
    name : "Baseball bat",
    category_ids : [1, 2]
}

> db.categories.find()
{
    _id : 1,
    name : "Sports Equipment",
    product_ids : [1]
}
{
    _id : 2,
    name : "Baseball",
    product_ids : [1, ...]
}
Many-to-many: queries
// all products for a given category
> db.products.find({ category_ids : 1 })

// all categories for a given product
> db.categories.find({ product_ids : 1 })
Many-to-many: products and categories
(normalized)
> db.products.find()
{
    _id : 1,
    name : "Baseball bat",
    category_ids : [1, 2]
}

> db.categories.find()
{
    _id : 1,
    name : "Sports Equipment"
}
{
    _id : 2,
    name : "Baseball"
}
Many-to-many: queries (normalized)
// all products for a given category
> db.products.find({ category_ids : 1 })

// all categories for a given product
> product = db.products.findOne({ _id : 1 })
> db.categories.find(
    { _id : { $in : product.category_ids } })
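The two-step lookup in the normalized variant can be tried without a server. The sketch below uses plain JavaScript arrays as stand-ins for the two collections; productsInCategory and categoriesOfProduct are hypothetical helper names, and the data mirrors the slide.

```javascript
// In-memory stand-ins for the products and categories collections.
const products = [{ _id: 1, name: "Baseball bat", category_ids: [1, 2] }];
const categories = [
  { _id: 1, name: "Sports Equipment" },
  { _id: 2, name: "Baseball" },
];

// One step: the link lives on the product, so a single filter is enough.
function productsInCategory(categoryId) {
  return products.filter((p) => p.category_ids.includes(categoryId));
}

// Two steps: load the product first, then fetch the categories it lists
// (the equivalent of the $in query on the slide).
function categoriesOfProduct(productId) {
  const product = products.find((p) => p._id === productId);
  if (!product) return [];
  return categories.filter((c) => product.category_ids.includes(c._id));
}

console.log(productsInCategory(2).map((p) => p.name));
console.log(categoriesOfProduct(1).map((c) => c.name));
```

The trade-off is visible here: links are stored once, on the product only, at the cost of a second round trip in one direction.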
Trees
Options:
• Full tree in document
• Parent links
• Child links
• Parent and child links
• Array of ancestors
• Ancestor paths
Trees: full tree in document
{
    comments : [
        { author : "Robert", text : "...",
            replies : [
                { author : "Jim", text : "...",
                    replies : []
                }
            ]
        }
    ]
}

Pros: single document, performance, intuitive
Cons: hard to search, hard to get partial results,
      document size limit could be reached
Trees: Parent and child links
Parent links
• Each node is stored as a document
• Contains the id of the parent

Child links
• Each node is stored as a document
• Contains the ids of the children

In some cases you might do both
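As a rough illustration of parent links, the sketch below keeps one object per node in a plain JavaScript array standing in for a collection. The tree shape matches the one used on the following slides; the field name parent and the helpers childrenOf and ancestorsOf are assumptions for illustration.

```javascript
// Each node stores only the id of its parent (null for the root).
const linkedNodes = [
  { _id: 1, parent: null },
  { _id: 2, parent: 1 },
  { _id: 3, parent: 2 },
  { _id: 4, parent: 2 },
  { _id: 5, parent: 1 },
  { _id: 6, parent: 5 },
];

// Children are a single lookup on the parent field.
function childrenOf(id) {
  return linkedNodes.filter((n) => n.parent === id).map((n) => n._id);
}

// Ancestors require walking up one level at a time (nearest ancestor first).
function ancestorsOf(id) {
  const byId = new Map(linkedNodes.map((n) => [n._id, n]));
  const result = [];
  let current = byId.get(id);
  while (current && current.parent !== null) {
    result.push(current.parent);
    current = byId.get(current.parent);
  }
  return result;
}

console.log(childrenOf(2));
console.log(ancestorsOf(6));
```

Against a real collection, each step of the walk in ancestorsOf would be a separate query, which is the main cost of storing parent links alone.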
Trees: array of ancestors
> db.nodes.find()
{ _id : 1 }
{ _id : 2, ancestors : [1], parent : 1 }
{ _id : 3, ancestors : [1, 2], parent : 2 }
{ _id : 4, ancestors : [1, 2], parent : 2 }
{ _id : 5, ancestors : [1], parent : 1 }
{ _id : 6, ancestors : [1, 5], parent : 5 }
Trees: array of ancestors (queries)
// find all children of 2
> db.nodes.find({ parent : 2 })

// find all descendants of 2
> db.nodes.find({ ancestors : 2 })

// find all ancestors of 6
> node = db.nodes.findOne({ _id : 6 })
> db.nodes.find({ _id : { $in : node.ancestors } })

// find all siblings of 3
> node = db.nodes.findOne({ _id : 3 })
> db.nodes.find({ parent : node.parent, _id : { $ne : 3 } })
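These queries stay simple only if the ancestors array is filled in correctly when a node is created. A minimal sketch of that step in plain JavaScript follows; makeNode is a hypothetical helper, and it assumes the parent document has already been loaded.

```javascript
// Builds a new node document from its id and its (already loaded) parent document.
// The root has no ancestors or parent element, as on the slide.
function makeNode(id, parentNode) {
  if (!parentNode) return { _id: id };
  return {
    _id: id,
    // The parent's ancestors plus the parent itself, in root-to-parent order.
    ancestors: [...(parentNode.ancestors || []), parentNode._id],
    parent: parentNode._id,
  };
}

const root = makeNode(1, null);
const node2 = makeNode(2, root);
const node3 = makeNode(3, node2);
console.log(JSON.stringify(node3));
```

Moving a subtree is the expensive operation in this design, since every descendant's ancestors array has to be rewritten.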
Trees: paths
Store hierarchy as a path expression
Separate each node by a delimiter (avoid "/" and ".")
Use regular expressions to find parts of a tree

> db.nodes.find()
{ _id : 1, path : ",1," }
{ _id : 2, path : ",1,2," }
{ _id : 3, path : ",1,2,3," }
{ _id : 4, path : ",1,2,4," }
{ _id : 5, path : ",1,5," }
{ _id : 6, path : ",1,5,6," }

Variations:
Don't store leading or trailing delimiter
Don't store final id (it's the same as _id)
Trees: paths (queries)
// find all descendants of 2
// (when _id is on the path, this also matches node 2 itself)
> db.nodes.find({ path : /,2,/ })

// find all children of 2
> db.nodes.find({ path : /,2,[^,]+,$/ })
or
> db.nodes.find({ path : /,2,$/ }) // if _id is not on path

// find all ancestors of 6
// not so easy

// find all siblings of 3
// not so easy
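One way to soften the "not so easy" cases is to do part of the work in the application: the ancestor ids are already encoded in the node's own path string, and a siblings pattern can be derived from it. The sketch below is plain JavaScript under the assumptions that the path includes the node's own id with leading and trailing commas (as in the listing for this pattern) and that ids are numeric; all helper names are hypothetical.

```javascript
// Escapes regex metacharacters so an id can be embedded in a pattern safely.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// ",1,5,6," -> ["1", "5", "6"]
function pathIds(path) {
  return path.split(",").filter((part) => part.length > 0);
}

// Ancestor ids, root first: everything on the path except the node's own id.
function ancestorIdsFromPath(path) {
  return pathIds(path).slice(0, -1).map(Number);
}

// Pattern matching every node with the same parent path
// (it matches the node itself too, so exclude its _id in the query).
function siblingsPattern(path) {
  const parentIds = pathIds(path).slice(0, -1);
  const parentPath = "," + parentIds.map((id) => escapeRegExp(id) + ",").join("");
  return new RegExp("^" + parentPath + "[^,]+,$");
}

console.log(ancestorIdsFromPath(",1,5,6,"));
console.log(siblingsPattern(",1,2,3,"));
```

The resulting ids could be fed to an $in query on _id, and the pattern to a query on path combined with $ne on _id, at the cost of first loading the node to read its path.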
Queues
Need to maintain order and state
Ensure that updates to the queue are atomic

> db.queue.find()
{ _id : 1, inprogress : false, priority : 1, job : ... }

// take the highest priority pending job
> db.queue.findAndModify({
    query : { inprogress : false },
    sort : { priority : -1 },
    update : {
        $set : {
            inprogress : true,
            started : Date()
        }
    },
    new : true
})
>
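The claim step can be sketched in plain JavaScript to show what the single findAndModify call does: select the highest-priority pending job and mark it in progress in one step. claimNextJob and the sample jobs are hypothetical; the in-memory version is safe only because the selection and the update run without interruption, which is the guarantee findAndModify provides on the server for the matched document.

```javascript
// In-memory stand-in for the queue collection (job payloads are invented).
const jobQueue = [
  { _id: 1, inprogress: false, priority: 1, job: "a" },
  { _id: 2, inprogress: false, priority: 5, job: "b" },
  { _id: 3, inprogress: false, priority: 3, job: "c" },
];

// Picks the pending job with the highest priority, marks it as in progress,
// and returns the modified job (like new : true). Returns null if none is pending.
function claimNextJob(jobs, now) {
  const pending = jobs.filter((j) => !j.inprogress);
  if (pending.length === 0) return null;
  const job = pending.reduce((best, j) => (j.priority > best.priority ? j : best));
  job.inprogress = true;
  job.started = now;
  return job;
}

const firstJob = claimNextJob(jobQueue, new Date());
console.log(firstJob._id, firstJob.inprogress);
```

If the select and the update were two separate operations, two workers could both pick the same job before either marked it, which is why the slide insists on an atomic update.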
Summary
• Schema design is different in MongoDB
• Basic principles stay the same
• Use rich documents
• There's more than one right way
• Focus on how your application uses the
  data
• Rapidly evolve the schema to meet your
  requirements
Thank you
Learn more
• www.mongodb.org
• www.10gen.com/events
• www.10gen.com/webinars
