Schema Design at Scale
       Eliot Horowitz
       @eliothorowitz
          MongoSF
        May 24, 2011
Schema

• Single biggest performance factor
• More choices than in an RDBMS
• Embedding, index design, shard keys
Embedding
• Great for read performance
• One seek to load entire object
• One roundtrip to database
• Writes can be slow if adding to objects all
  the time
• Should you embed comments?
Blog Post - Embedded
  { _id : "/post/eliot/2011-05-24/1",
    author : "eliot",
    text : "About MongoDB...",
    tags : [ "tech", "databases" ],
    comments : [
      { author : "Fred",
        date : "Sat Apr 25 2010 20:51:03 GMT-0700",
        text : "Best Post Ever!" }
    ]
  }
Blog Post - Not Embedded
  blog.posts
  { _id : "/post/eliot/2011-05-24/1",
    author : "eliot",
    text : "About MongoDB...",
    tags : [ "tech", "databases" ]
  }

  blog.comments
  { post : "/post/eliot/2011-05-24/1",
    author : "Fred",
    date : "May 24 2011",
    text : "Best Post Ever!"
  }
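A minimal sketch of the read-side trade-off between the two layouts. Plain arrays stand in for the collections, and `query`, `loadEmbedded`, `loadSeparate` and the round-trip counter are hypothetical names for illustration, not MongoDB APIs.

```javascript
// In-memory stand-ins for the collections. Every call to query() models
// one round trip to the database.
let roundTrips = 0;
function query(collection, predicate) {
  roundTrips += 1;
  return collection.filter(predicate);
}

const postId = "/post/eliot/2011-05-24/1";

// Embedded layout: comments live inside the post document.
const embeddedPosts = [
  { _id: postId, author: "eliot", comments: [{ author: "Fred", text: "Best Post Ever!" }] },
];

// Non-embedded layout: each comment references its post by id.
const posts = [{ _id: postId, author: "eliot" }];
const comments = [{ post: postId, author: "Fred", text: "Best Post Ever!" }];

function loadEmbedded(id) {
  const [post] = query(embeddedPosts, (d) => d._id === id);
  return post;
}

function loadSeparate(id) {
  const [post] = query(posts, (d) => d._id === id);
  const postComments = query(comments, (c) => c.post === id);
  return { ...post, comments: postComments };
}

roundTrips = 0;
const embeddedResult = loadEmbedded(postId);
const embeddedTrips = roundTrips;

roundTrips = 0;
const separateResult = loadSeparate(postId);
const separateTrips = roundTrips;

console.log({ embeddedTrips, separateTrips });
```

Both calls return the same post with its comments; the embedded layout gets there in one query, the separate layout in two.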
Blog Post - Hybrid
  blog.comments
  { _id : "/post/eliot/2011-05-24/1---1",
    comments : [
      { author : "Fred",
        date : "May 24 2011",
        text : "Best Post Ever!" },
      { author : "Bob",
        date : "May 24 2011",
        text : "Awesome" }
    ]
  }
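One way to read the hybrid layout is as fixed-size buckets of comments per post. The sketch below assumes that the `---1` suffix is a bucket number; the bucket size and the helper names `bucketId` and `toBuckets` are hypothetical, since the slides do not specify them.

```javascript
// Hypothetical bucket size; the slides do not state one.
const COMMENTS_PER_BUCKET = 100;

// Build the _id of the bucket holding the n-th comment (0-based) of a post.
// Buckets are numbered from 1 to match the "---1" suffix in the example.
function bucketId(postId, commentIndex, bucketSize = COMMENTS_PER_BUCKET) {
  const bucketNumber = Math.floor(commentIndex / bucketSize) + 1;
  return `${postId}---${bucketNumber}`;
}

// Group a flat list of comments into bucket documents.
function toBuckets(postId, comments, bucketSize = COMMENTS_PER_BUCKET) {
  const buckets = new Map();
  comments.forEach((comment, i) => {
    const id = bucketId(postId, i, bucketSize);
    if (!buckets.has(id)) buckets.set(id, { _id: id, comments: [] });
    buckets.get(id).comments.push(comment);
  });
  return [...buckets.values()];
}

// Three comments with a bucket size of 2 produce two bucket documents.
const demoBuckets = toBuckets(
  "/post/eliot/2011-05-24/1",
  [{ author: "Fred" }, { author: "Bob" }, { author: "Ann" }],
  2
);
console.log(demoBuckets.map((b) => [b._id, b.comments.length]));
```

Reading a page of comments then loads one bucket document, while the post document itself stays small.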
Indexes

• Index common queries
• Avoid redundant indexes: (A) isn’t needed
  when (A,B) exists
• Right-balanced indexes keep working set
  small
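The redundancy rule can be checked mechanically. This is a rough sketch that represents each index as an ordered list of [field, direction] pairs; `isPrefix` and `redundantIndexes` are hypothetical helpers, and the check ignores index options such as uniqueness.

```javascript
// Index `a` is a strict prefix of index `b` when b starts with the same
// fields, in the same order and direction, and has at least one more key.
function isPrefix(a, b) {
  return (
    a.length < b.length &&
    a.every(([field, dir], i) => b[i][0] === field && b[i][1] === dir)
  );
}

// Return every index that is a strict prefix of another index in the list.
function redundantIndexes(indexes) {
  return indexes.filter((a) => indexes.some((b) => b !== a && isPrefix(a, b)));
}

const indexList = [
  [["a", 1]],
  [["a", 1], ["b", 1]],
  [["b", 1]],
];
const redundant = redundantIndexes(indexList);
console.log(JSON.stringify(redundant));
```

Here only the index on (A) is flagged. The index on (B) stays, because (B) is not a leading prefix of (A,B).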
Random Index Access

• Have to keep entire index in RAM
• Examples: email address, hash
Right-Balanced Index Access

• Only have to keep small portion in RAM
• Examples: time-based keys, ObjectId, auto increment
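The difference between the two access patterns can be illustrated with a toy model. It assumes the index is a sorted key space cut into fixed-size pages and ignores page splits; the constants and the `pagesTouched` helper are illustrative only.

```javascript
const PAGE_SIZE = 1000;       // keys per index page (illustrative)
const EXISTING_KEYS = 100000; // keys 0..99999 are already indexed
const INSERTS = 1000;

// Toy model: a key lands on page floor(key / PAGE_SIZE).
// Counts how many distinct pages a batch of inserts touches.
function pagesTouched(keys) {
  return new Set(keys.map((k) => Math.floor(k / PAGE_SIZE))).size;
}

// Monotonic keys (timestamps, ObjectIds, counters) always land past the
// current maximum, i.e. on the right-hand edge of the index.
const monotonicKeys = Array.from({ length: INSERTS }, (_, i) => EXISTING_KEYS + i);

// Scattered keys stand in for hashes or email addresses. A fixed
// multiplier keeps the run deterministic.
const scatteredKeys = Array.from({ length: INSERTS }, (_, i) => (i * 7919) % EXISTING_KEYS);

const monotonicPages = pagesTouched(monotonicKeys);
const scatteredPages = pagesTouched(scatteredKeys);
console.log({ monotonicPages, scatteredPages });
```

In this model the monotonic inserts stay on a single page, while the scattered inserts spread across much of the index, which is why the whole index has to fit in RAM for that pattern.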
Covered Indexes

• Keep data sequential in index
• find( { email : "eliot@10gen.com" } ,
   { _id : 0 , first : 1 , last : 1 , state : 1 } )
• index: { email : 1 , first : 1 , last : 1 , state : 1 }
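A query is covered when the index alone can answer it. The sketch below is a simplified check under that definition: `isCovered` is a hypothetical helper, it handles only top-level fields, and it accounts for `_id` being returned by default unless the projection excludes it.

```javascript
// Returns true when every filter field and every returned field is a key
// of the index, so no document has to be fetched.
function isCovered(filter, projection, indexKeys) {
  const indexed = new Set(Object.keys(indexKeys));
  const filterOk = Object.keys(filter).every((f) => indexed.has(f));

  // Fields explicitly included in the projection.
  const returned = Object.keys(projection).filter((f) => projection[f] === 1);
  // _id comes back by default unless the projection says _id: 0.
  if (projection._id !== 0 && !returned.includes("_id")) returned.push("_id");

  return filterOk && returned.every((f) => indexed.has(f));
}

const emailIndex = { email: 1, first: 1, last: 1, state: 1 };
const emailFilter = { email: "eliot@10gen.com" };

const withoutId = isCovered(emailFilter, { _id: 0, first: 1, last: 1, state: 1 }, emailIndex);
const withId = isCovered(emailFilter, { first: 1, last: 1, state: 1 }, emailIndex);
console.log({ withoutId, withId });
```

With `_id` excluded the check passes; with the default projection it fails, because `_id` is not part of this index.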
Choosing a Shard Key

• Shard key determines how data is
  partitioned
• Hard to change
• Most important performance decision
Range Based



• Collection is broken into chunks by range
• Chunks default to 200 MB or 100,000
  objects
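Range-based partitioning amounts to looking up which key range a document falls into. A minimal sketch, assuming string shard keys and three hypothetical chunks; the boundary values, shard names and the `findChunk` helper are made up for illustration.

```javascript
// Each chunk owns the half-open key range [min, max). The list is sorted
// and contiguous. "" and "\uffff" are stand-ins for the lowest and highest
// possible keys in this toy example.
const chunks = [
  { min: "", max: "g", shard: "shard0" },
  { min: "g", max: "p", shard: "shard1" },
  { min: "p", max: "\uffff", shard: "shard2" },
];

// Binary search for the chunk whose range contains the key.
function findChunk(key, chunkList) {
  let lo = 0;
  let hi = chunkList.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const chunk = chunkList[mid];
    if (key < chunk.min) hi = mid - 1;
    else if (key >= chunk.max) lo = mid + 1;
    else return chunk;
  }
  return null; // key falls outside every range
}

console.log(findChunk("eliot@10gen.com", chunks).shard);
console.log(findChunk("zed@example.com", chunks).shard);
```

When a chunk grows past its size limit it is split into two adjacent ranges, which in this model means replacing one entry in the list with two.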
Use Case: User Profiles
  { email : "eliot@10gen.com" ,
    addresses : [ { state : "NY" } ]
  }
• Shard by email
• Lookup by email hits 1 node
• Index on { "addresses.state" : 1 }
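The routing consequence of sharding by email can be sketched as follows. A query that contains the shard key goes to one shard; a query on another field, such as `addresses.state`, has to be sent to every shard, where the secondary index is then used locally. The shard names and the placement rule in `shardForEmail` are hypothetical.

```javascript
const SHARDS = ["shard0", "shard1", "shard2"];

// Hypothetical placement rule standing in for a range-based chunk lookup.
function shardForEmail(email) {
  const first = email[0].toLowerCase();
  if (first < "g") return "shard0";
  if (first < "p") return "shard1";
  return "shard2";
}

// A query that includes the shard key is targeted at a single shard.
// Any other query is broadcast to all shards.
function targetShards(query) {
  if (typeof query.email === "string") return [shardForEmail(query.email)];
  return [...SHARDS];
}

const byEmail = targetShards({ email: "eliot@10gen.com" });
const byState = targetShards({ "addresses.state": "NY" });
console.log(byEmail, byState);
```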
Use Case: Activity Stream
  { user_id : XXX, event_id : YYY , data : ZZZ }
• Shard by user_id
• Looking up an activity stream hits 1 node
• Writes are evenly distributed
• Index on { "event_id" : 1 } for deletes
Use Case: Photos
  { photo_id : ???? , data : <binary> }
  What’s the right key?
• auto increment
• MD5( data )
• now() + MD5(data)
• month() + MD5(data)
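The slide leaves the choice open. As an illustration of how the last candidate could be built, here is a sketch of a `month() + MD5(data)` key using Node's built-in `crypto` module. The `YYYY-MM-` prefix format and the helper names are assumptions, not something the slides specify.

```javascript
const crypto = require("crypto");

// Hex-encoded MD5 digest of the photo bytes.
function md5Hex(data) {
  return crypto.createHash("md5").update(data).digest("hex");
}

// month() + MD5(data): a coarse time prefix followed by the content hash.
function monthPlusMd5(data, when) {
  const year = when.getUTCFullYear();
  const month = String(when.getUTCMonth() + 1).padStart(2, "0");
  return `${year}-${month}-${md5Hex(data)}`;
}

const photoKey = monthPlusMd5(Buffer.from("hello"), new Date(Date.UTC(2011, 4, 24)));
console.log(photoKey);
```

One reading of this option: the month prefix keeps recent photos together in one region of the key space, while the hash spreads writes within that month instead of piling them onto a single ever-increasing value.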
Use Case: Logging
  { machine : "app.foo.com" , app : "apache" ,
    when : "2010-12-02:11:33:14" , data : XXX }
  Possible shard keys
•   { machine : 1 }
•   { when : 1 }
•   { machine : 1 , app : 1 }
•   { app : 1 }
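One rough way to compare the candidates is to count how many distinct key values a sample of events produces, since a key with few distinct values limits how finely the data can be split into chunks. The sample events and helper names below are hypothetical, and the count is only a proxy: `when` yields many values, but they always increase, so new writes all land at the end of the key range.

```javascript
// Hypothetical sample of log events.
const events = [
  { machine: "app1.foo.com", app: "apache", when: "2010-12-02:11:33:14" },
  { machine: "app1.foo.com", app: "mysql", when: "2010-12-02:11:33:15" },
  { machine: "app2.foo.com", app: "apache", when: "2010-12-02:11:33:16" },
  { machine: "app2.foo.com", app: "apache", when: "2010-12-02:11:33:17" },
];

// The four candidate shard keys from the slide, as ordered field lists.
const candidates = {
  machine: ["machine"],
  when: ["when"],
  machine_app: ["machine", "app"],
  app: ["app"],
};

// Project an event onto a candidate key as a comparable string.
function keyValue(event, fields) {
  return JSON.stringify(fields.map((f) => event[f]));
}

// Number of distinct shard key values the sample produces for a candidate.
function distinctKeyValues(fields) {
  return new Set(events.map((e) => keyValue(e, fields))).size;
}

const report = Object.fromEntries(
  Object.entries(candidates).map(([name, fields]) => [name, distinctKeyValues(fields)])
);
console.log(report);
```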
Download MongoDB
      http://www.mongodb.org

and let us know what you think
      @eliothorowitz
      @mongodb

10gen is hiring!
      http://www.10gen.com/jobs
