2011 mongo sf-schemadesign
 

Interested in learning more about MongoDB? Sign up for MongoSV, the largest annual user conference dedicated to MongoDB. Learn more at MongoSV.com

Usage Rights

© All Rights Reserved

  • (slide 2) Not schema-less: dynamic schema. Schema is just as important as in a relational database, or more so. Understand write vs. read tradeoffs.
  • (slide 3) Compare to MySQL here.
  • (slide 7) Most common performance problem. Why the _id index can be ignored.

2011 mongo sf-schemadesign Presentation Transcript

  • 1. Schema Design at Scale Eliot Horowitz @eliothorowitz MongoSF May 24, 2011
  • 2. Schema
      • Single biggest performance factor
      • More choices than in an RDBMS
      • Embedding, index design, shard keys
  • 3. Embedding
      • Great for read performance
      • One seek to load entire object
      • One roundtrip to database
      • Writes can be slow if adding to objects all the time
      • Should you embed comments?
  • 4. Blog Post - Embedded
      {
        _id : "/post/eliot/2011-05-24/1",
        author : "eliot",
        text : "About MongoDB...",
        tags : [ "tech", "databases" ],
        comments : [
          { author : "Fred",
            date : "Sat Apr 25 2010 20:51:03 GMT-0700",
            text : "Best Post Ever!" }
        ]
      }
  • 5. Blog Post - Not Embedded
      blog.posts
      {
        _id : "/post/eliot/2011-05-24/1",
        author : "eliot",
        text : "About MongoDB...",
        tags : [ "tech", "databases" ]
      }
      blog.comments
      {
        post : "/post/eliot/2011-05-24/1",
        author : "Fred",
        date : "May 24 2011",
        text : "Best Post Ever!"
      }
  • 6. Blog Post - Hybrid
      blog.comments
      {
        _id : "/post/eliot/2011-05-24/1---1",
        comments : [
          { author : "Fred", date : "May 24 2011", text : "Best Post Ever!" },
          { author : "Bob", date : "May 24 2011", text : "Awesome" }
        ]
      }
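The hybrid layout appears to group a post's comments into bucket documents whose _id is the post id plus a "---N" suffix. A minimal sketch of that idea in plain JavaScript, with an in-memory Map standing in for the blog.comments collection. The bucket size, the helper names `bucketId` and `addComment`, and the third comment are hypothetical; the suffix rule is inferred from the _id on the slide, not confirmed by it.

```javascript
// In-memory stand-in for the blog.comments collection (assumption: not a real driver).
const comments = new Map();
const BUCKET_SIZE = 2; // hypothetical cap on comments per bucket document

// Bucket _id follows the slide's pattern: "<post id>---<bucket number>".
function bucketId(postId, n) {
  return `${postId}---${n}`;
}

// Append a comment to the newest bucket, starting a new bucket when it is full.
function addComment(postId, comment) {
  let n = 1;
  while (comments.has(bucketId(postId, n + 1))) n++; // find the newest bucket
  let bucket = comments.get(bucketId(postId, n));
  if (bucket && bucket.comments.length >= BUCKET_SIZE) {
    n++;
    bucket = undefined;
  }
  if (!bucket) {
    bucket = { _id: bucketId(postId, n), comments: [] };
    comments.set(bucket._id, bucket);
  }
  bucket.comments.push(comment);
  return bucket._id;
}

const post = "/post/eliot/2011-05-24/1";
addComment(post, { author: "Fred", text: "Best Post Ever!" });
addComment(post, { author: "Bob", text: "Awesome" });
const thirdBucket = addComment(post, { author: "Ann", text: "Agreed" });
console.log(thirdBucket); // the third comment starts a second bucket
```

Reading all comments for a post then costs one lookup per bucket rather than one per comment, while each write only grows a small document.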
  • 7. Indexes
      • Index common queries
      • Make sure there aren't duplicates: (A) and (A,B) aren't both needed
      • Right-balanced indexes keep working set small
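The duplicate rule can be checked mechanically: an index on (A) is redundant when another index starts with the same fields, such as (A,B). A rough sketch, assuming each index is written as an ordered array of field names. It ignores sort direction and options such as uniqueness, and the function names are made up for illustration.

```javascript
// True when index `a` is a strict prefix of index `b`,
// e.g. ["A"] is a prefix of ["A", "B"].
function isPrefix(a, b) {
  return a.length < b.length && a.every((field, i) => field === b[i]);
}

// Indexes that another, longer index already covers as a prefix.
function redundantIndexes(indexes) {
  return indexes.filter((a) => indexes.some((b) => isPrefix(a, b)));
}

const indexes = [["A"], ["A", "B"], ["B"]];
console.log(redundantIndexes(indexes)); // only ["A"] is reported
```

Note that ["B"] is kept: it is not a prefix of ["A", "B"], so queries on B alone still need it.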
  • 8. Random Index Access
      Have to keep entire index in RAM
      • Email address
      • Hash
  • 9. Right-Balanced Index Access
      Only have to keep small portion in RAM
      • Time based
      • ObjectId
      • Auto increment
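The difference between the two access patterns can be seen in a toy model. The sketch below treats the index as a sorted array cut into fixed-size pages and counts how many distinct pages the most recent inserts touch. The page size, the warm-up and insert counts, and the array-as-index model are all assumptions chosen for illustration; a real B-tree behaves differently in detail.

```javascript
const PAGE_SIZE = 100; // hypothetical number of keys per index page
const WARMUP = 10000;  // keys already in the index before we start counting
const RECENT = 1000;   // inserts whose page touches we count

// Small deterministic pseudo-random generator (LCG) so runs are repeatable.
function makeRandom(seed) {
  let state = seed;
  return () => {
    state = (state * 1664525 + 1013904223) % 4294967296;
    return state / 4294967296;
  };
}

// Position where `key` would be inserted into the sorted array (binary search).
function insertPosition(sorted, key) {
  let lo = 0;
  let hi = sorted.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (sorted[mid] < key) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Insert keys from nextKey() and count distinct pages hit by the last RECENT inserts.
function pagesTouched(nextKey) {
  const index = [];
  const touched = new Set();
  for (let i = 0; i < WARMUP + RECENT; i++) {
    const key = nextKey();
    const pos = insertPosition(index, key);
    index.splice(pos, 0, key);
    if (i >= WARMUP) touched.add(Math.floor(pos / PAGE_SIZE));
  }
  return touched.size;
}

let counter = 0;
const sequentialPages = pagesTouched(() => counter++); // auto-increment style keys
const randomPages = pagesTouched(makeRandom(42));      // hash style keys
console.log({ sequentialPages, randomPages });
```

Increasing keys only ever touch the right-hand end of the array, so the recent working set stays small; random keys land anywhere, so nearly every page is touched.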
  • 10. Covered Indexes
      • Keep data sequential in index
      • find( { email : "eliot@10gen.com" } , { first : 1 , last : 1 , state : 1 } )
      • index: { email : 1 , first : 1 , last : 1 , state : 1 }
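A query is covered when every field it filters on and every field it returns is in the index. A small sketch of that check, assuming index, query, and projection are plain objects shaped like the ones on the slide; `isCovered` is a hypothetical helper, not a MongoDB API. One caveat it encodes: MongoDB returns _id by default, so a projection generally has to exclude _id (or the index has to contain it) for the query to be covered.

```javascript
// True when the index alone can answer the query and supply the projected fields.
function isCovered(index, query, projection) {
  const indexed = new Set(Object.keys(index));
  const idExcluded = projection._id === 0 || projection._id === false;
  // Fields the server must return: projected fields, plus _id unless excluded.
  const returned = Object.keys(projection).filter((f) => projection[f]);
  if (!idExcluded && !returned.includes("_id")) returned.push("_id");
  return [...Object.keys(query), ...returned].every((f) => indexed.has(f));
}

const index = { email: 1, first: 1, last: 1, state: 1 };
const query = { email: "eliot@10gen.com" };

const asOnSlide = isCovered(index, query, { first: 1, last: 1, state: 1 });
const withoutId = isCovered(index, query, { _id: 0, first: 1, last: 1, state: 1 });
console.log(asOnSlide, withoutId); // false true
```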
  • 11. Choosing a Shard Key
      • Shard key determines how data is partitioned
      • Hard to change
      • Most important performance decision
  • 12. Range Based
      • Collection is broken into chunks by range
      • Chunks default to 200MB or 100,000 objects
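Range-based partitioning means each chunk owns a contiguous range of shard key values, and a key is routed to whichever chunk's range contains it. A minimal sketch with made-up boundaries and shard names (none of them come from the talk):

```javascript
// Each chunk owns keys in [min, max). Boundaries and shard names are hypothetical.
const chunks = [
  { min: "", max: "h", shard: "shard0" },
  { min: "h", max: "p", shard: "shard1" },
  { min: "p", max: "\uffff", shard: "shard2" },
];

// Find the chunk whose range contains the key.
function chunkFor(key) {
  return chunks.find((c) => key >= c.min && key < c.max);
}

console.log(chunkFor("eliot@10gen.com").shard); // shard0
```

When a chunk grows past the size limit it would be split at some key inside its range, which is why a key with many distinct values gives more places to split.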
  • 13. Use Case: User Profiles
      { email : "eliot@10gen.com" , addresses : [ { state : "NY" } ] }
      • Shard by email
      • Lookup by email hits 1 node
      • Index on { "addresses.state" : 1 }
  • 14. Use Case: Activity Stream
      { user_id : XXX , event_id : YYY , data : ZZZ }
      • Shard by user_id
      • Looking up an activity stream hits 1 node
      • Writing events is distributed
      • Index on { "event_id" : 1 } for deletes
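Both use cases rest on the same routing rule: a query that pins the shard key can be sent to one node, while a query on any other field has to be sent to all of them. A toy sketch of that rule for the activity stream, assuming three shards and a made-up placement of 1000 user ids per shard; `shardFor` and `shardsFor` are hypothetical names.

```javascript
const SHARD_KEY = "user_id";
const shards = ["shard0", "shard1", "shard2"];

// Toy placement rule (assumption): 1000 consecutive user ids per shard.
function shardFor(userId) {
  return shards[Math.min(Math.floor(userId / 1000), shards.length - 1)];
}

// Shards a query must visit: one if it pins the shard key, otherwise all.
function shardsFor(query) {
  return SHARD_KEY in query ? [shardFor(query[SHARD_KEY])] : shards;
}

const byUser = shardsFor({ user_id: 1500 }); // reading one user's stream
const byEvent = shardsFor({ event_id: 42 }); // deleting by event_id alone
console.log(byUser, byEvent);
```

A delete by event_id alone reaches every shard, so the { "event_id" : 1 } index is what keeps the lookup cheap on each of them.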
  • 15. Use Case: Photos
      { photo_id : ???? , data : <binary> }
      What's the right key?
      • Auto increment
      • MD5( data )
      • now() + MD5( data )
      • month() + MD5( data )
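To make the four candidates concrete, the sketch below builds each one for a single photo using Node's built-in crypto module. The string formats (millisecond timestamp, "YYYY-MM" month, "-" separator) and the counter are assumptions; the slide only names the ingredients.

```javascript
const nodeCrypto = require("crypto");

// Hex MD5 digest of the photo bytes.
function md5(data) {
  return nodeCrypto.createHash("md5").update(data).digest("hex");
}

let counter = 0; // stand-in for an auto-increment sequence

// Build the slide's four candidate photo_id values for one photo.
function candidateKeys(data, now = new Date()) {
  const hash = md5(data);
  const month = now.toISOString().slice(0, 7); // e.g. "2011-05"
  return {
    autoIncrement: ++counter,
    md5: hash,
    nowPlusMd5: `${now.getTime()}-${hash}`,
    monthPlusMd5: `${month}-${hash}`,
  };
}

const keys = candidateKeys(Buffer.from("photo bytes"), new Date("2011-05-24T00:00:00Z"));
console.log(keys);
```

The candidates differ in how they order inserts: auto increment and now() + MD5 always grow, so new keys land at the right-hand edge of the index and in the last chunk; plain MD5 scatters keys across the whole range; month() + MD5 scatters them only within the current month's range.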
  • 16. Use Case: Logging
      { machine : "app.foo.com" , app : "apache" , when : "2010-12-02:11:33:14" , data : XXX }
      Possible shard keys
      • { machine : 1 }
      • { when : 1 }
      • { machine : 1 , app : 1 }
      • { app : 1 }
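One way to compare the candidate keys is to count how many distinct values each produces over a sample of log entries, since range-based chunks can only split between distinct key values. A rough sketch; apart from the first entry, the sample documents are invented for illustration, and `distinctValues` is a hypothetical helper.

```javascript
// Sample log entries. Only the first mirrors the slide; the rest are made up.
const logs = [
  { machine: "app.foo.com", app: "apache", when: "2010-12-02:11:33:14" },
  { machine: "app.foo.com", app: "cron", when: "2010-12-02:11:33:15" },
  { machine: "db.foo.com", app: "mongod", when: "2010-12-02:11:33:16" },
  { machine: "db.foo.com", app: "cron", when: "2010-12-02:11:33:17" },
];

// Count distinct shard-key values a candidate key produces over the sample.
function distinctValues(docs, keyFields) {
  return new Set(docs.map((d) => keyFields.map((f) => d[f]).join("|"))).size;
}

const counts = {
  machine: distinctValues(logs, ["machine"]),
  when: distinctValues(logs, ["when"]),
  machineApp: distinctValues(logs, ["machine", "app"]),
  app: distinctValues(logs, ["app"]),
};
console.log(counts);
```

Distinct-value count is only part of the picture: { when : 1 } has the most values, but they always increase, so every new entry would be written to the chunk holding the latest range.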
  • 17. Download MongoDB http://www.mongodb.org and let us know what you think
      @eliothorowitz @mongodb
      10gen is hiring! http://www.10gen.com/jobs