Schema Design

Mongo Atlanta
Schema Design
Robert Stam
robert@10gen.com

Topics
Introduction
• Basic data modeling
• Manipulating data
• Evolving a schema

Common patterns
• Single table inheritance
• One-to-many
• Many-to-many
• Trees
• Queues

Benefit of relational
Before relational model
• Data and logic combined

After relational model
• Separation of concerns
• Data model independent of logic
• Logic freed from concerns of data design

MongoDB continues this separation

Normalization
Goals
• Avoid anomalies when inserting, updating or deleting
• Minimize redesign when extending the schema
• Make the model informative to users
• Avoid bias toward a particular query

In MongoDB
• Similar goals apply
• But rules are different

Relational model makes normalized data look like
(figure not available)

Document databases make normalized data look like
(figure not available)

Terminology
Relational       MongoDB
Table            Collection
Row(s)           Document(s)
Index            Index
Join             Embedding and linking
Partition        Shard
Partition key    Shard key

Collections
• Cheap to create (limit of roughly 24,000 namespaces per database with the default 16MB namespace file; each collection and each index counts as one namespace)
• Collections don’t have a schema
• Individual documents have a schema
• Common for documents in a collection to share a schema
• Document schema can evolve
• Consider using multiple related collections tied together by a naming convention:
• e.g. LogData-2011-02-08
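A naming convention like LogData-2011-02-08 is easiest to keep consistent if one function builds the names. The sketch below assumes one collection per UTC day and a "LogData" prefix; logCollectionName and logCollectionNames are hypothetical helpers, not part of MongoDB.

```javascript
// Hypothetical helper: build a per-day collection name such as "LogData-2011-02-08".
// UTC getters are used so the name does not depend on the local time zone.
function logCollectionName(date, prefix = "LogData") {
  const yyyy = String(date.getUTCFullYear());
  const mm = String(date.getUTCMonth() + 1).padStart(2, "0");
  const dd = String(date.getUTCDate()).padStart(2, "0");
  return `${prefix}-${yyyy}-${mm}-${dd}`;
}

// Names for a run of consecutive days, e.g. to query a date range
// or to find old per-day collections that can be dropped.
function logCollectionNames(start, days, prefix = "LogData") {
  const names = [];
  for (let i = 0; i < days; i++) {
    const d = new Date(start.getTime() + i * 24 * 60 * 60 * 1000);
    names.push(logCollectionName(d, prefix));
  }
  return names;
}
```

In the shell, such a name could be passed to db.getCollection(...). Expiring a day of log data then becomes dropping one collection rather than deleting many documents.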

Document basics
• Zero or more elements
• Elements are name/value pairs
• Rich data types for values
• JSON
• BSON

Data types
• Numeric (Int32, Int64, Double)
• String
• Boolean
• DateTime
• ObjectId
• Others (JavaScript, Regex, Binary, Null, ...)
• Array
• Nested document

Experimenting with MongoDB
• Mongo shell
• JavaScript
$ mongo
MongoDB shell version: 1.7.5
connecting to: test
> db.books.find()
{
    _id : ObjectId("12345678901234567890abcd"),
    author : "Ernest Hemingway",
    title : "The Old Man and the Sea"
}
>

Sample rich document
> db.orders.findOne()
{
    _id : 1,
    customer : {
        customer_id : 1234,
        name : "John Doe",
        address : {
            line1 : "123 Main St",
            city : "Duncannon",
            state : "PA",
            zip : "12345-6789"
        }
    },
    items : [
        { item_id : 111, ... }, // data for first item
        { item_id : 222, ... }, // data for next item
        ...
    ]
}

Rich document advantages
• Holistic representation
• Still easy to manipulate
• Pre-joined for fast retrieval

Document size
• Max 4MB in earlier MongoDB versions
• Max 16MB in current versions
• Performance considerations apply long before a document reaches the maximum size

Database considerations
• How can we manipulate this data?
• Dynamic queries
• Secondary indexes
• Atomic updates
What are the access patterns?
• Read/write ratio
• Types of updates
• Types of queries
• Data life-cycle
Considerations
• No joins
• Document writes are atomic

Document design
• Design documents that map simply to your application data
> book = {
    author : "Ernest Hemingway",
    title : "The Old Man and the Sea",
    tags : ["American Literature", "Sea", "Large Fish"]
}
> db.books.insert(book)
>

Find the document
> db.books.find({ author : "Ernest Hemingway" })
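{
    _id : ObjectId("12345678901234567890abcd"),
    author : "Ernest Hemingway",
    title : "The Old Man and the Sea",
    tags : ["American Literature", "Sea", "Large Fish"]
}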
>

Notes:
• Every document must have a unique _id
• MongoDB will generate one automatically if your document does not have an _id
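A sketch of the "generate an _id if missing" behavior, with an id loosely modeled on the current 12-byte ObjectId layout (4-byte timestamp in seconds, 5-byte random value, 3-byte counter). withId and newObjectIdHex are hypothetical helpers; real drivers produce genuine ObjectId values, and this only shows why such ids are unique and roughly ordered by creation time.

```javascript
// Per-process random bytes and a counter, as in the ObjectId layout.
const processRandom = Array.from({ length: 5 }, () => Math.floor(Math.random() * 256));
let counter = Math.floor(Math.random() * 0x1000000);

function toHex(bytes) {
  return bytes.map((b) => b.toString(16).padStart(2, "0")).join("");
}

// 24 hex characters: timestamp (4 bytes) + random (5 bytes) + counter (3 bytes).
function newObjectIdHex(nowMs = Date.now()) {
  const seconds = Math.floor(nowMs / 1000);
  counter = (counter + 1) % 0x1000000; // wrap around at 3 bytes
  const time = [(seconds >>> 24) & 0xff, (seconds >>> 16) & 0xff, (seconds >>> 8) & 0xff, seconds & 0xff];
  const count = [(counter >>> 16) & 0xff, (counter >>> 8) & 0xff, counter & 0xff];
  return toHex([...time, ...processRandom, ...count]);
}

// Keep a caller-supplied _id; otherwise add a generated one.
function withId(doc) {
  return "_id" in doc ? doc : { _id: newObjectIdHex(), ...doc };
}
```

Because the timestamp comes first, the creation time (to the second) can be read back from the first 8 hex characters.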

Find via index
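> db.books.ensureIndex({ author : 1 })
> db.books.find({ author : "Ernest Hemingway" })
{
    _id : ObjectId("12345678901234567890abcd"),
    author : "Ernest Hemingway",
    title : "The Old Man and the Sea",
    tags : ["American Literature", "Sea", "Large Fish"]
}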
>

Verify index exists
> db.books.getIndexes()
[
    ...,
    {
        _id : ObjectId("12345678901234567890abcd"),
        ns : "test.books",
        key : { author : 1 },
        name : "author_1"
    },
    ...
]
>

Verify index is used
Examine the query plan
> db.books.find({ author : "Ernest Hemingway" }).explain()
{
    cursor : "BtreeCursor author_1",
    nscanned : 1,
    nscannedObjects : 1,
    n : 1,
    millis : 1,
    indexBounds : {
        author : [
            [ "Ernest Hemingway", "Ernest Hemingway" ]
        ]
    }
}
>
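Why does the indexed query report nscanned : 1? A simplified model of a single-field index such as { author : 1 }: MongoDB uses a B-tree, and here a sorted array of [key, position] pairs with binary search stands in for it. buildIndex and findEq are hypothetical names for this sketch.

```javascript
// Sorted [key, position] pairs, standing in for the B-tree of { author : 1 }.
// Documents without the field are skipped here (a real non-sparse index
// would still store an entry for them, keyed as null).
function buildIndex(docs, field) {
  return docs
    .map((doc, pos) => [doc[field], pos])
    .filter(([key]) => key !== undefined)
    .sort((a, b) => (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0));
}

// Equality lookup: binary search to the first key >= value, then walk
// forward while keys are equal. The rest of the index is never visited.
function findEq(docs, index, value) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (index[mid][0] < value) lo = mid + 1;
    else hi = mid;
  }
  const results = [];
  for (let i = lo; i < index.length && index[i][0] === value; i++) {
    results.push(docs[index[i][1]]);
  }
  return results;
}
```

Without an index, the only option is to test every document, which is what the BasicCursor plan later in this deck shows.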

Query operators
Conditional operators
• equals ({ author : "..." })
• regular expression match ({ author : /^e/i })
• $ne, $in, $nin, $mod, $all, $size, $exists, $type, $lt, $lte, $gt, $gte

Sample queries
// find books by "Ernest Hemingway"
> db.books.find({ author : "Ernest Hemingway" })
// find books by authors whose name starts with "e"
> db.books.find({ author : /^e/i })

// find books tagged "American Literature"
> db.books.find({ tags : "American Literature" })

// find books that have a tags element
> db.books.find({ tags : { $exists : true } })

// count books by authors whose name starts with "e"
> db.books.find({ author : /^e/i }).count()
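The queries above rely on one rule worth seeing explicitly: a condition on an array field such as tags matches if any element matches. A hypothetical in-memory matcher covering only equality, regular expressions and $exists; it is a sketch of those semantics, not MongoDB's query engine.

```javascript
// Match one field value against a condition (equality, RegExp or { $exists }).
// If the value is an array, the condition matches when any element matches.
function matchValue(value, cond) {
  if (cond !== null && typeof cond === "object" && !(cond instanceof RegExp)) {
    if ("$exists" in cond) return (value !== undefined) === Boolean(cond.$exists);
    throw new Error("operator not supported in this sketch");
  }
  const test = (v) => (cond instanceof RegExp ? typeof v === "string" && cond.test(v) : v === cond);
  return Array.isArray(value) ? value.some(test) : test(value);
}

// A document matches when every field condition matches (implicit AND).
function matches(doc, query) {
  return Object.entries(query).every(([field, cond]) => matchValue(doc[field], cond));
}

function find(docs, query) {
  return docs.filter((doc) => matches(doc, query));
}
```

For example, find(books, { tags : "American Literature" }) returns every book whose tags array contains that string, and find(books, { author : /^e/i }).length plays the role of count().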

Extending the schema
> comment = {
    author : "Robert",
    text : "Great book",
    date : Date()
}
> db.books.update(
    { title : "The Old Man and the Sea" },
    { 
        $inc : { comments_count : 1 },
        $push : { comments : comment }
    }
)
>
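What the update does to the document: $inc treats a missing field as 0 and $push creates the array if it is missing. A minimal in-memory sketch of only these two operators; applyUpdate is a hypothetical helper. On the server the whole update to one document is atomic, and this sketch models only the resulting document, not concurrency.

```javascript
// Return a new document with $inc and $push applied (input is not modified).
// Unlike MongoDB, this sketch does not reject $inc on non-numbers
// or $push on fields that are not arrays.
function applyUpdate(doc, update) {
  const out = { ...doc };
  for (const [field, amount] of Object.entries(update.$inc || {})) {
    out[field] = (out[field] || 0) + amount; // missing field starts at 0
  }
  for (const [field, value] of Object.entries(update.$push || {})) {
    out[field] = [...(out[field] || []), value]; // missing field becomes a new array
  }
  return out;
}
```

Applying the same update twice yields comments_count : 2 and two entries in comments, which is why keeping the counter and the array in one update keeps them in step.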

Extended schema
> db.books.find({ title : "The Old Man and the Sea" })
{
    _id : ObjectId("12345678901234567890abcd"),
    tags : ["American Literature", "Sea", "Large Fish"],
    comments_count : 1,
    comments : [
        {
            author : "Robert",
            text : "Great book",
            date : "Wed Feb 02 2011 10:36:18 ..."
        }
    ]
}
>

Using the extended schema
// create index on nested element
> db.books.ensureIndex({ "comments.author" : 1 })

// find books Robert has commented on
> db.books.find({ "comments.author" : "Robert" })

// find book with most comments
> db.books.find().sort({ "comments_count" : -1 }).limit(1)

// when sorting, check if you need an index
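The dotted path "comments.author" reaches into the comments array: every array element along the path contributes a value, and for a multikey index MongoDB creates an index key for each element. An in-memory sketch of that path resolution; valuesAtPath and booksCommentedBy are hypothetical helpers.

```javascript
// Collect every value reachable by a path such as ["comments", "author"],
// descending into arrays element by element.
function valuesAtPath(value, path) {
  if (Array.isArray(value)) return value.flatMap((v) => valuesAtPath(v, path));
  if (path.length === 0) return value === undefined ? [] : [value];
  if (value === null || typeof value !== "object") return [];
  const [head, ...rest] = path;
  return valuesAtPath(value[head], rest);
}

// In-memory counterpart of db.books.find({ "comments.author" : author }).
function booksCommentedBy(books, author) {
  return books.filter((b) => valuesAtPath(b, "comments.author".split(".")).includes(author));
}
```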

Watch for full table scans
Examine the query plan
> db.books.find()
   .sort({ "comments_count" : -1 }).limit(1).explain()
{
    cursor : "BasicCursor",
    nscanned : 12345,
    nscannedObjects : 12345,
    n : 1,
    millis : 123,
    indexBounds : { }
}
>

Single table inheritance
Shapes table:
id  type    area  radius  side  length  width
1   circle  3.14  1
2   square  4             2
3   rect    10                  5       2

Single table inheritance: MongoDB
> db.shapes.find()
{ _id : 1, type : "circle", area : 3.14, radius : 1 },
{ _id : 2, type : "square", area : 4, side : 2 },
{ _id : 3, type : "rect", area : 10, length : 5, width : 2 }

// find shapes where radius > 0
> db.shapes.find({ radius : { $gt : 0 } })
{ _id : 1, type : "circle", area : 3.14, radius : 1 },

// find shapes where area >= 4
> db.shapes.find({ area : { $gte : 4 } })
{ _id : 2, type : "square", area : 4, side : 2 },
{ _id : 3, type : "rect", area : 10, length : 5, width : 2 }

// index the field used in the query above
> db.shapes.ensureIndex({ radius : 1 })
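In the single-collection layout, a condition on a subtype-specific field such as radius picks out that subtype, because documents without the field do not match. An in-memory JavaScript sketch using the three shapes above; gt and gte are hypothetical helpers that, for simplicity, compare numbers only.

```javascript
// All shapes live in one collection; fields differ by type.
const shapes = [
  { _id: 1, type: "circle", area: 3.14, radius: 1 },
  { _id: 2, type: "square", area: 4, side: 2 },
  { _id: 3, type: "rect", area: 10, length: 5, width: 2 },
];

// Range predicates: a document lacking the field never matches.
const gt = (field, x) => (doc) => typeof doc[field] === "number" && doc[field] > x;
const gte = (field, x) => (doc) => typeof doc[field] === "number" && doc[field] >= x;

// Like { radius : { $gt : 0 } } and { area : { $gte : 4 } }.
const withRadius = shapes.filter(gt("radius", 0)).map((s) => s._id);
const bigShapes = shapes.filter(gte("area", 4)).map((s) => s._id);
```

withRadius holds only the circle's _id, while the area query spans the square and the rectangle, since area is shared by every type.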

One-to-many
Options
• Embedded Array
• Embedded Document
• Normalized

One-to-many: embedded array
> db.books.find()
{
    comments : [
        { author : "Robert", text : "Great book" },
        { author : "Jim", text : "I didn't like it" }
    ]
}
>

One-to-many: embedded trees
> db.books.find()
{
    comments : [
        {
            author : "Robert",
            text : "Great book",
            replies : [
                {
                    author : "Jim",
                    text : "I didn't like it"
                }
            ]
        }
    ]
}
>

One-to-many: normalized
> db.books.find()
{
    _id : 1,
    comment_ids : [1, 2]
}
> db.comments.find()
{ _id : 1, book_id : 1, author : "Robert", text : "Great book" }
{ _id : 2, book_id : 1, author : "Jim", text : "I didn't like it" }
>
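With no joins, the normalized layout needs a second lookup done by the application. An in-memory sketch following either link direction shown above; commentsByBookId and commentsByIds are hypothetical helpers mirroring db.comments.find({ book_id : ... }) and an $in lookup on comment_ids.

```javascript
const books = [{ _id: 1, comment_ids: [1, 2] }];
const comments = [
  { _id: 1, book_id: 1, author: "Robert", text: "Great book" },
  { _id: 2, book_id: 1, author: "Jim", text: "I didn't like it" },
];

// Child -> parent link: like db.comments.find({ book_id : bookId }).
function commentsByBookId(bookId) {
  return comments.filter((c) => c.book_id === bookId);
}

// Parent -> children links: like db.comments.find({ _id : { $in : book.comment_ids } }).
function commentsByIds(book) {
  const ids = new Set(book.comment_ids);
  return comments.filter((c) => ids.has(c._id));
}
```

Either direction costs one extra query per book; storing both links makes reads flexible but means two places to update when a comment is added.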

Many-to-many
Example:
• Product can be in many categories
• Category has many products

Many-to-many: products and categories
> db.products.find()
{
    _id : 1,
    name : "Baseball bat",
    category_ids : [1, 2]
}

> db.categories.find()
{
    _id : 1,
    name : "Sports Equipment",
    product_ids : [1]
}
{
    _id : 2,
    name : "Baseball",
    product_ids : [1, ...]
}

Many-to-many: queries
// all products for a given category
> db.products.find({ category_ids : 1 })

// all categories for a given product
> db.categories.find({ product_ids : 1 })
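Storing links on both sides makes both queries single lookups, but adding a product to a category then touches two documents. A minimal in-memory sketch of keeping the two arrays in sync; addToCategory is a hypothetical helper, and in the shell each step would be a separate update using $addToSet. Each of those writes is atomic only for its own document, so a reader could briefly see one side updated and not the other.

```javascript
// Add productId to a category and categoryId to the product, without duplicates.
function addToCategory(products, categories, productId, categoryId) {
  const product = products.find((p) => p._id === productId);
  const category = categories.find((c) => c._id === categoryId);
  if (!product || !category) throw new Error("unknown product or category");
  // Like $addToSet: only append the id if it is not already present.
  if (!product.category_ids.includes(categoryId)) product.category_ids.push(categoryId);
  if (!category.product_ids.includes(productId)) category.product_ids.push(productId);
}
```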

Many-to-many: products and categories
(normalized)
> db.products.find()
{
    _id : 1,
    name : "Baseball bat",
    category_ids : [1, 2]
}

> db.categories.find()
{
    _id : 1,
    name : "Sports Equipment"
}
{
    _id : 2,
    name : "Baseball"
}

Many-to-many: queries (normalized)
// all products for a given category
> db.products.find({ category_ids : 1 })

// all categories for a given product
> product = db.products.findOne({ _id : 1 })
> db.categories.find(
    { _id : { $in : product.category_ids } })

Trees
Options:
• Full tree in document
• Parent links
• Child links
• Parent and child links
• Array of ancestors
• Ancestor paths

Trees: full tree in document
{
    comments : [
        { author : "Robert", text : "...",
            replies : [
                { author : "Jim", text : "...",
                    replies : []
                }
            ]
        }
    ]
}

Pros: single document, performance, intuitive
Cons: hard to search, hard to get partial results, document size limit could be reached
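"Hard to search" in practice: a dotted path like "comments.author" only reaches the first level, and each deeper level needs its own path ("comments.replies.author", and so on), so finding every comment by one author at any depth falls to the application. A recursive in-memory sketch; findByAuthor is a hypothetical helper.

```javascript
// Walk nested replies and collect matching comments with their depth (0 = top level).
function findByAuthor(comments, author, depth = 0, out = []) {
  for (const c of comments) {
    if (c.author === author) out.push({ text: c.text, depth });
    findByAuthor(c.replies || [], author, depth + 1, out);
  }
  return out;
}
```

The whole document has to be fetched before this walk can run, which is also why partial results are hard to get.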

Trees: Parent and child links
Parent links
• Each node is stored as a document
• Contains the id of the parent

Child links
• Each node is stored as a document
• Contains the ids of the children

In some cases you might do both
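A sketch of the parent-link layout in memory, using a hypothetical six-node tree with the same shape as the path example later in this deck. Children are a single filter, but ancestors need one lookup per level (one query per level against a real collection). Child links can be derived from parent links, which is one reason to store both.

```javascript
const nodes = [
  { _id: 1, parent: null },
  { _id: 2, parent: 1 },
  { _id: 3, parent: 2 },
  { _id: 4, parent: 2 },
  { _id: 5, parent: 1 },
  { _id: 6, parent: 5 },
];
const byId = new Map(nodes.map((n) => [n._id, n]));

// Like db.nodes.find({ parent : id }).
function children(id) {
  return nodes.filter((n) => n.parent === id).map((n) => n._id);
}

// Walk up one parent at a time, returning ancestors root first.
function ancestors(id) {
  const result = [];
  let node = byId.get(id);
  while (node && node.parent !== null) {
    result.unshift(node.parent);
    node = byId.get(node.parent);
  }
  return result;
}

// Derive child links from parent links.
function withChildLinks() {
  return nodes.map((n) => ({ ...n, children: children(n._id) }));
}
```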

Trees: array of ancestors
> db.nodes.find()
{ _id : 1 }
{ _id : 2, ancestors : [1], parent : 1 }
{ _id : 3, ancestors : [1, 2], parent : 2 }
{ _id : 4, ancestors : [1, 2], parent : 2 }
{ _id : 5, ancestors : [1], parent : 1 }
{ _id : 6, ancestors : [1, 5], parent : 5 }

Trees: array of ancestors (queries)
// find all children of 2
> db.nodes.find({ parent : 2 })

// find all descendants of 2
> db.nodes.find({ ancestors : 2 })

// find all ancestors of 6
> node = db.nodes.findOne({ _id : 6 })
> db.nodes.find({ _id : { $in : node.ancestors } })

// find all siblings of 3
> node = db.nodes.findOne({ _id : 3 })
> db.nodes.find({ parent : node.parent, _id : { $ne : 3 } })
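The four array-of-ancestors queries, mirrored in memory to show that each is a single filter once the ancestors array is stored. Nodes 4 and 6 follow the tree shape of the path example; the helper names are hypothetical.

```javascript
const nodes = [
  { _id: 1 },
  { _id: 2, ancestors: [1], parent: 1 },
  { _id: 3, ancestors: [1, 2], parent: 2 },
  { _id: 4, ancestors: [1, 2], parent: 2 },
  { _id: 5, ancestors: [1], parent: 1 },
  { _id: 6, ancestors: [1, 5], parent: 5 },
];
const ids = (list) => list.map((n) => n._id);
const findOne = (id) => nodes.find((n) => n._id === id);

// { parent : id }
const childrenOf = (id) => ids(nodes.filter((n) => n.parent === id));

// { ancestors : id } matches any node whose ancestors array contains id.
const descendantsOf = (id) => ids(nodes.filter((n) => (n.ancestors || []).includes(id)));

// { _id : { $in : node.ancestors } }
const ancestorsOf = (id) => {
  const wanted = new Set(findOne(id).ancestors || []);
  return ids(nodes.filter((n) => wanted.has(n._id)));
};

// { parent : node.parent, _id : { $ne : id } }
const siblingsOf = (id) => {
  const node = findOne(id);
  return ids(nodes.filter((n) => n.parent === node.parent && n._id !== id));
};
```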

Trees: paths
store hierarchy as a path expression
separate each node by a delimiter (avoid "/" and ".")
use regular expressions to find parts of a tree

> db.nodes.find()
{ _id : 1, path : ",1," }
{ _id : 2, path : ",1,2," }
{ _id : 3, path : ",1,2,3," }
{ _id : 4, path : ",1,2,4," }
{ _id : 5, path : ",1,5," }
{ _id : 6, path : ",1,5,6," }

variations:
don't store leading or trailing delimiter
don't store final id (it's the same as _id)

Trees: paths (queries)
// find node 2 and all its descendants (node 2's own path also matches)
> db.nodes.find({ path : /,2,/ })

// find all children of 2
> db.nodes.find({ path : /,2,[^,]+,$/ })
or
> db.nodes.find({ path : /,2,$/ }) // if _id is not on path

// find all ancestors of 6
// not so easy

// find all siblings of 3
// not so easy
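The two path queries that are easy, run in memory against the six paths above. Surrounding each id with the delimiter is what keeps id 1 from also matching 11 or 21. subtreeOf and childrenOf are hypothetical helpers, and the ids are assumed to be plain numbers, so no regex escaping is needed.

```javascript
const nodes = [
  { _id: 1, path: ",1," },
  { _id: 2, path: ",1,2," },
  { _id: 3, path: ",1,2,3," },
  { _id: 4, path: ",1,2,4," },
  { _id: 5, path: ",1,5," },
  { _id: 6, path: ",1,5,6," },
];
const matchingIds = (re) => nodes.filter((n) => re.test(n.path)).map((n) => n._id);

// Like { path : /,2,/ }: the node itself plus everything below it.
const subtreeOf = (id) => matchingIds(new RegExp(`,${id},`));

// Like { path : /,2,[^,]+,$/ }: exactly one more path segment after the node.
const childrenOf = (id) => matchingIds(new RegExp(`,${id},[^,]+,$`));
```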

Queues
Need to maintain order and state
Ensure that updates to the queue are atomic

> db.queue.find()
{ _id : 1, inprogress : false, priority : 1, job : ... }

// take the highest priority pending job
> db.queue.findAndModify({
    query : { inprogress : false },
    sort : { priority : -1 },
    update : {
        $set : {
            inprogress : true,
            started : new Date()
        }
    },
    new : true
})
>
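The findAndModify call above selects, updates and returns one job in a single step. An in-memory sketch of the same logic; takeJob is a hypothetical helper. On the server, findAndModify performs the find and the update atomically on one document, so two workers cannot claim the same job; this sketch is synchronous, so nothing can interleave between its steps either.

```javascript
// Pick the highest-priority job that is not in progress, mark it, and return it.
function takeJob(queue, now = new Date()) {
  let best = null;
  for (const job of queue) {
    if (!job.inprogress && (best === null || job.priority > best.priority)) best = job;
  }
  if (best === null) return null; // nothing pending
  best.inprogress = true;
  best.started = now;
  return best; // the job after the update, as with new : true
}
```

Calling takeJob repeatedly hands out pending jobs in descending priority order until it returns null.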

Summary
• Schema design is different in MongoDB
• Basic principles stay the same
• Use rich documents
• There's more than one right way
• Focus on how your application uses the data
• Rapidly evolve the schema to meet your requirements

Thank you
Learn more
• www.mongodb.org
• www.10gen.com/events
• www.10gen.com/webinars
