Schema Design (Mongo Austin)

Schema Design
Bernie Hackett
bernie@10gen.com

Topics

Introduction
• Basic Data Modeling
• Manipulating Data
• Evolving a schema
Common patterns
• Single table inheritance
• One-to-Many & Many-to-Many
• Trees
• Queues

So why model data?

http://www.flickr.com/photos/42304632@N00/493639870/

Benefits of relational

• Before relational
• Data and Logic combined
• After relational
• Separation of concerns
• Data modeled independent of logic
• Logic freed from concerns of data design

• MongoDB continues this separation

Normalization

Goals
• Avoid anomalies when inserting, updating or deleting
• Minimize redesign when extending the schema
• Make the model informative to users
• Avoid bias toward a particular query
In MongoDB
• Similar goals apply
• The rules are different

Relational made normalized
data look like this

Document databases make
normalized data look like this

Terminology
RDBMS           MongoDB
Table           Collection
Row(s)          JSON Document
Index           Index
Join            Embedding & Linking
Partition       Shard
Partition Key   Shard Key

DB Considerations
How can we manipulate this data?
• Dynamic Queries
• Secondary Indexes
• Atomic Updates
• Map Reduce
Access Patterns?
• Read / Write Ratio
• Types of updates
• Types of queries
• Data life-cycle
Further Considerations
• No Joins
• Document writes are atomic

So today’s example will use...

Design Session
Design documents that simply map to
your application
> post = { author: "Hergé",
           date: new Date(),
           text: "Destination Moon",
           tags: [ "comic", "adventure" ] }

> db.posts.save(post)

Find the document
> db.posts.find()

  { _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
    author: "Hergé",
    date: "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)",
    text: "Destination Moon",
    tags: [ "comic", "adventure" ]
  }

Notes:
• ID must be unique, but can be anything you’d like
• MongoDB will generate a default ID if one is not
supplied

Add an index, find via index
Secondary index for "author"

 // 1 means ascending, -1 means descending

 > db.posts.ensureIndex( { author: 1 } )

 > db.posts.find( { author: 'Hergé' } )

   { _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
     date: "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)",
     author: "Hergé",
     ... }

Verifying indexes exist
> db.posts.getIndexes()

// Index on ID
  { name: "_id_", 
    ns: "test.posts", 
    key: { "_id" : 1 } }

// Index on author
  { _id: ObjectId("4c4ba6c5672c685e5e8aabf4"), 
    ns: "test.posts", 
    key: { "author" : 1 }, 
    name: "author_1" }

Examine the query plan
> db.posts.find( { author: 'Hergé' } ).explain()
{
  "cursor" : "BtreeCursor author_1",
  "nscanned" : 1,
  "nscannedObjects" : 1,
  "n" : 1,
  "millis" : 5,
  "indexBounds" : {
    "author" : [
      [
        "Hergé",
        "Hergé"
      ]
    ]
  }
}

Query operators
Conditional operators:
$ne, $in, $nin, $mod, $all, $size, $exists, $type,
$lt, $lte, $gt, $gte

// find posts with any tags
> db.posts.find( { tags: { $exists: true } } )

Query operators


Regular expressions:
// posts where author starts with h
> db.posts.find( { author: /^h/i } ) 

Counting:
// number of posts written by Hergé
> db.posts.find( { author: "Hergé" } ).count()

Extending the Schema
   
 > new_comment = { author: "Bernie", 
              date: new Date(),
              text: "great book" }

 > db.posts.update(
           { text: "Destination Moon" }, 
           { '$push': { comments: new_comment },
             '$inc':  { comments_count: 1 } } )

 
  { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
    author : "Hergé",
    date : "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)",
    text : "Destination Moon",
    tags : [ "comic", "adventure" ],

    comments : [
      {
        author : "Bernie",
        date : "Sat Jul 24 2010 20:51:03 GMT-0700 (PDT)",
        text : "great book"
      }
    ],
    comments_count: 1
  }
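
The update above relies on `$push` creating the `comments` array and `$inc` creating `comments_count` when those fields do not exist yet, which is what lets the schema evolve without a migration step. A minimal in-memory sketch of that behaviour in plain JavaScript follows; the helper names `pushField` and `incField` are made up for illustration and are not MongoDB APIs.

```javascript
// Illustrative helpers only; they mimic the effect of $push and $inc on one document.
function pushField(doc, field, value) {
  if (doc[field] === undefined) doc[field] = [];   // field is created on first use
  if (!Array.isArray(doc[field])) throw new Error(field + " is not an array");
  doc[field].push(value);
}

function incField(doc, field, amount) {
  if (doc[field] === undefined) doc[field] = 0;    // a missing counter starts at zero
  doc[field] += amount;
}

// A post that was saved before comments were part of the schema.
const oldPost = { author: "Hergé", text: "Destination Moon", tags: ["comic", "adventure"] };

pushField(oldPost, "comments", { author: "Bernie", text: "great book" });
incField(oldPost, "comments_count", 1);

console.log(oldPost.comments.length, oldPost.comments_count);
```

Documents that never receive a comment simply keep their original shape, so readers of the collection should treat both fields as optional.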

// create index on nested documents:
> db.posts.ensureIndex( { "comments.author": 1 } )

> db.posts.find( { "comments.author": "Bernie" } )



// find last 5 posts:
> db.posts.find().sort( { date: -1 } ).limit(5)

// most commented post:
> db.posts.find().sort( { comments_count: -1 } ).limit(1)

When sorting, check if you need an index

Watch for full table scans

> db.posts.find( { text: 'Destination Moon' } ).explain()
{
  "cursor" : "BasicCursor",
  "nscanned" : 1,
  "nscannedObjects" : 1,
  "n" : 1,
  "millis" : 0,
  "indexBounds" : {
   
  }
}

Map reduce : count tags
mapFunc = function () {
    this.tags.forEach( function( z ) { emit( z, { count: 1 } ); } );
}

reduceFunc = function( k, v ) {
    var total = 0;
    for ( var i = 0; i < v.length; i++ ) {  
         total += v[i].count;
    }
    return { count: total }; 
}

res = db.posts.mapReduce( mapFunc, reduceFunc )

> db[res.result].find()
     { _id : "comic", value : { count : 1 } }
     { _id : "adventure", value : { count : 1 } }
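
To make the data flow concrete, here is a minimal in-memory sketch of the same map and reduce steps in plain JavaScript, with no MongoDB server involved. The `runMapReduce` driver and the sample `posts` array are assumptions for illustration only. The real `mapReduce` command may call the reduce function several times per key, which is why reduce returns the same shape that map emits.

```javascript
// Hypothetical sample data standing in for the posts collection.
const posts = [
  { text: "Destination Moon", tags: ["comic", "adventure"] },
  { text: "Explorers on the Moon", tags: ["comic"] }
];

// Same idea as mapFunc above, but emit is passed in instead of being a global.
function mapTags(doc, emit) {
  (doc.tags || []).forEach(function (tag) { emit(tag, { count: 1 }); });
}

// Same logic as reduceFunc above.
function reduceCounts(key, values) {
  let total = 0;
  for (const v of values) total += v.count;
  return { count: total };
}

// Illustrative driver (not a MongoDB API): group emitted values by key, then reduce.
function runMapReduce(docs, map, reduce) {
  const groups = new Map();
  for (const doc of docs) {
    map(doc, function (key, value) {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    });
  }
  const out = [];
  for (const [key, values] of groups) {
    out.push({ _id: key, value: reduce(key, values) });
  }
  return out;
}

const tagCounts = runMapReduce(posts, mapTags, reduceCounts);
console.log(tagCounts);
```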

Group

• Equivalent to a Group By in SQL

• Specify the attributes to group the data

• Process the results in a Reduce function

Group - Count posts by author
cmd = { key: { "author": true },
        initial: { count: 0 },
        reduce: function(obj, prev) {
                  prev.count++;
                }
      };
result = db.posts.group(cmd);

[
  {
    "author" : "Hergé",
    "count" : 1
  },
  {
    "author" : "Kyle",
    "count" : 3
  }
]

Review

So Far:
- Started out with a simple schema
- Queried Data
- Evolved the schema
- Queried / Updated the data some more

Single Table Inheritance - RDBMS
shapes table

id  type    area  radius  d  length  width
1   circle  3.14  1
2   square  4             2
3   rect    10               5       2

Single Table Inheritance - MongoDB
> db.shapes.find()
 { _id: "1", type: "circle", area: 3.14, radius: 1 }
 { _id: "2", type: "square", area: 4, d: 2 }
 { _id: "3", type: "rect", area: 10, length: 5, width: 2 }

MongoDB

// find shapes where radius > 0 
> db.shapes.find( { radius: { $gt: 0 } } )

// create index
> db.shapes.ensureIndex( { radius: 1 } )

One to Many
One to Many relationships can specify
• degree of association between objects
• containment
• life-cycle

One to Many
- Embedded Array / Array Keys
- $slice operator to return subset of array
- some queries harder
  e.g. find latest comments across all documents
blogs: { ...,
    comments: [
    { ..., text : "great book" }
    ] }

One to Many
- Embedded tree
- Single document
- Natural
- Hard to query
blogs: { ...,
    comments: [
    { ..., text : "great book",
      replies: [ { author : "James", ... } ]
    }
    ] }

One to Many
- Normalized (2 collections)
- most flexible
- more queries
blogs: { ...,
    comments: [
     { comment : ObjectId("1") }
    ] }

comments : { _id : "1",
             author : "James",
             date : "Sat Jul 24 2010 20:51:03 ..." }

One to Many - patterns


- Embedded Array / Array Keys
- Embedded tree
- Normalized

Many - Many
Example:

- Product can be in many categories
- Category can have many products

Many - Many

products:
   { _id: ObjectId("10"),
     name: "Destination Moon",
     category_ids: [ ObjectId("20"), ObjectId("30") ] }

Many - Many

categories:
   { _id: ObjectId("20"),
     name: "adventure",
     product_ids: [ ObjectId("10"), ObjectId("11"), ObjectId("12") ] }

//All categories for a given product
> db.categories.find( { product_ids: ObjectId("10") } )

Alternative
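products:
   { _id: ObjectId("10"),
     name: "Destination Moon",
     category_ids: [ ObjectId("20"), ObjectId("30") ] }

categories:
   { _id: ObjectId("20"),
     name: "adventure" }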

// All products for a given category
> db.products.find( { category_ids: ObjectId("20") } )

// All categories for a given product
> product = db.products.findOne( { _id: some_id } )
> db.categories.find( { _id: { $in: product.category_ids } } )
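
The trade-off in this alternative is that one direction needs a single lookup and the other needs two. A rough in-memory sketch in plain JavaScript, assuming string ids and made-up sample data rather than real collections:

```javascript
// Hypothetical in-memory stand-ins for the two collections; ids are plain strings here.
const products = [
  { _id: "10", name: "Destination Moon", category_ids: ["20", "30"] },
  { _id: "11", name: "Explorers on the Moon", category_ids: ["20"] }
];
const categories = [
  { _id: "20", name: "adventure" },
  { _id: "30", name: "comic" }
];

// One lookup: match products whose category_ids array contains the id.
function productsInCategory(categoryId) {
  return products.filter(p => p.category_ids.includes(categoryId));
}

// Two lookups: fetch the product, then fetch the categories it references
// (the second step plays the role of the $in query).
function categoriesForProduct(productId) {
  const product = products.find(p => p._id === productId);
  if (!product) return [];
  return categories.filter(c => product.category_ids.includes(c._id));
}

console.log(productsInCategory("20").map(p => p.name));
console.log(categoriesForProduct("10").map(c => c.name));
```

Because only products hold the references, adding a product to a category touches one document instead of two.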

Trees
Full Tree in Document

{ comments: [
    { author: "Bernie", text: "...",
      replies: [
        { author: "James", text: "...",
          replies: [ ] }
      ] }
  ]
}

Pros: Single Document, Performance, Intuitive

Cons: Hard to search, Partial Results, 16MB limit
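
"Hard to search" here means that a match can sit at any depth, so the application usually ends up walking the tree itself once the document is loaded. A small sketch of such a walk in plain JavaScript; the sample `post` and the `findByAuthor` helper are illustrative assumptions:

```javascript
// Hypothetical post with a nested comment tree, shaped like the document above.
const post = {
  comments: [
    { author: "Bernie", text: "first", replies: [
      { author: "James", text: "reply", replies: [] }
    ] },
    { author: "James", text: "second", replies: [] }
  ]
};

// Depth-first walk collecting every comment written by the given author.
function findByAuthor(comments, author, found = []) {
  for (const c of comments) {
    if (c.author === author) found.push(c);
    findByAuthor(c.replies || [], author, found);
  }
  return found;
}

const jamesComments = findByAuthor(post.comments, "James");
console.log(jamesComments.map(c => c.text));
```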

Trees
Parent Links
- Each node is stored as a document
- Contains the id of the parent

Child Links
- Each node contains the ids of its children
- Can support graphs (multiple parents / children)

Array of Ancestors
- Store all Ancestors of a node
  { _id: "a" }
  { _id: "b", ancestors: [ "a" ], parent: "a" }
  { _id: "c", ancestors: [ "a", "b" ], parent: "b" }
  { _id: "d", ancestors: [ "a", "b" ], parent: "b" }
  { _id: "e", ancestors: [ "a" ], parent: "a" }
  { _id: "f", ancestors: [ "a", "e" ], parent: "e" }

Array of Ancestors
//find all descendants of b:
> db.tree2.find( { ancestors: 'b' } )

//find all direct descendants of b:
> db.tree2.find( { parent: 'b' } )

//find all ancestors of f:
> ancestors = db.tree2.findOne( { _id: 'f' } ).ancestors
> db.tree2.find( { _id: { $in : ancestors } } )
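
The price of these cheap reads is that the `ancestors` array has to be filled in whenever a node is added. One way to do that is to copy the parent's array and append the parent's id. A minimal in-memory sketch in plain JavaScript, where the `tree` map and the `addNode` and `descendants` helpers are hypothetical stand-ins for the collection and its queries:

```javascript
// In-memory stand-in for the tree2 collection, keyed by _id.
const tree = new Map();

// Insert a node; its ancestors are the parent's ancestors plus the parent itself.
function addNode(id, parentId) {
  if (parentId === undefined) {
    tree.set(id, { _id: id });           // root node has no ancestors or parent
    return;
  }
  const parent = tree.get(parentId);
  if (!parent) throw new Error("unknown parent: " + parentId);
  const ancestors = (parent.ancestors || []).concat(parentId);
  tree.set(id, { _id: id, ancestors: ancestors, parent: parentId });
}

addNode("a");
addNode("b", "a");
addNode("c", "b");
addNode("d", "b");
addNode("e", "a");
addNode("f", "e");

// Same idea as find( { ancestors: id } ): every node below id.
function descendants(id) {
  return [...tree.values()]
    .filter(n => (n.ancestors || []).includes(id))
    .map(n => n._id);
}

console.log(tree.get("c").ancestors);
console.log(descendants("b"));
```

Moving a subtree is the expensive case with this pattern, since every descendant's array has to be rewritten.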

Trees as Paths
Store hierarchy as a path expression
- Separate each node by a delimiter, e.g. "/"
- Use regular expressions to find parts of a tree

{ comments: [
     { author: "Bernie", text: "initial post",
       path: "/" },
     { author: "Jim", text: "jim’s comment",
       path: "/jim" },
     { author: "Bernie", text: "Bernie’s reply to Jim",
       path: "/jim/bernie" } ] }

// Find the conversations Jim was a part of
> db.posts.find( { "comments.path": /jim/i } )
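
An unanchored pattern such as `/jim/i` matches the substring anywhere, so it would also pick up a path like "/jimmy". If only one subtree is wanted, an anchored pattern is a safer choice. The sketch below is plain JavaScript over made-up sample data, and `escapeRegExp` and `subtreePattern` are illustrative helper names:

```javascript
// Hypothetical comments using the path convention shown above.
const comments = [
  { author: "Bernie", path: "/" },
  { author: "Jim", path: "/jim" },
  { author: "Bernie", path: "/jim/bernie" },
  { author: "Jimmy", path: "/jimmy" }
];

// Escape regex metacharacters so a path is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Match the node at `prefix` and everything beneath it, but not siblings
// that merely share a leading substring (e.g. "/jimmy").
function subtreePattern(prefix) {
  return new RegExp("^" + escapeRegExp(prefix) + "(/|$)");
}

const underJim = comments.filter(c => subtreePattern("/jim").test(c.path));
console.log(underJim.map(c => c.path));
```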

Queue
• Need to maintain order and state
• Ensure that updates to the queue are atomic
   { inprogress: false,
     priority: 1, 
   ...
   }

// find highest priority job and mark as in-progress
job = db.jobs.findAndModify( {
               query:  { inprogress: false },
               sort:   { priority: -1 },
               update: { $set: { inprogress: true,
                                 started: new Date() } },
               new: true } )
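
The point of `findAndModify` is that selecting the job and marking it happen as one step, so two workers cannot claim the same job. The sketch below imitates that contract in plain JavaScript over an in-memory array. The `claimNextJob` helper and the sample jobs are assumptions for illustration; single-threaded execution stands in for the server-side atomicity.

```javascript
// In-memory stand-in for the jobs collection (hypothetical sample data).
const jobs = [
  { _id: 1, inprogress: false, priority: 1 },
  { _id: 2, inprogress: false, priority: 5 },
  { _id: 3, inprogress: true, priority: 9 }
];

// Pick the highest-priority idle job and mark it in the same step.
function claimNextJob(queue) {
  let best = null;
  for (const job of queue) {
    if (!job.inprogress && (best === null || job.priority > best.priority)) {
      best = job;
    }
  }
  if (best === null) return null;        // nothing left to claim
  best.inprogress = true;
  best.started = new Date();
  return best;
}

const first = claimNextJob(jobs);
const second = claimNextJob(jobs);
const third = claimNextJob(jobs);
console.log(first._id, second._id, third);
```

A worker that gets `null` back would typically sleep and poll again.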

Summary

Schema design is different in MongoDB

Basic data design principles stay the same

Focus on how the app manipulates data

Rapidly evolve schema to meet your requirements

Enjoy your new freedom, use it wisely :-)

download at mongodb.org

We’re Hiring !
bernie@10gen.com

conferences, appearances, and meetups
http://www.10gen.com/events

    Facebook      |     Twitter     |     LinkedIn
http://bit.ly/mongo>  @mongodb http://linkd.in/joinmongo
