Scaling with MongoDB: Eliot Horowitz (@eliothorowitz), MongoSV, December 3, 2010
Eliot Horowitz's presentation at MongoSV on December 3, 2010

Speaker notes:
  • What is scaling? Well, hopefully for everyone here.
  • EC2 goes up to 64 GB; maybe mention a 256 GB box here ($30-40k). Maybe you can buy a 256 GB box, but I can spin up 10 EC2 64 GB boxes in 10 minutes.
  • Not schema-less: dynamic schema. Schema is just as important as in relational databases, or more important. Understand the write vs. read tradeoffs.
  • Compare to MySQL here.
  • Most common performance problem. Why the _id index can be ignored.
  • Data looked at per second/minute/hour/day. Are your indexes accessed randomly?
  • 256 GB RAM: $30-40k.
  • Don't pre-emptively shard; it's easy to add later.
Transcript of "Scaling with MongoDB"

1. Scaling with MongoDB: Eliot Horowitz (@eliothorowitz), MongoSV, December 3, 2010
2. Scaling
   • Storage needs only go up
   • Operations/sec only go up
   • Complexity only goes up
3. Scaling by Optimization
   • Schema design
   • Index design
   • Hardware configuration
4. Horizontal Scaling
   • Vertical scaling is limited
   • Hard to scale vertically in the cloud
   • Can scale wider than higher
5. Schema
   • Modeling the same data in different ways can change performance by orders of magnitude
   • Very often, performance problems can be solved by changing the schema
6. Embedding
   • Great for read performance
   • One seek to load the entire object
   • One round trip to the database
   • Writes can be slow if you are adding to objects all the time
7. Should you embed comments?
   {
     title : "MongoDB is fun",
     author : "eliot",
     date : "2010-12-03",
     comments : [
       { author : "bob", text : "..." },
       { author : "joe", text : "..." }
     ]
   }
   db.posts.update( { title : "MongoDB is fun" },
                    { $push : { comments : { author : "sam", text : "..." } } } )
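The following rough simulation makes the write-cost point concrete by applying repeated `$push` updates to the embedded `comments` array. It assumes the behavior of MongoDB's original storage engine (MMAPv1): a document that grew past its allocated record space had to be moved on disk. Several parts are hypothetical simplifications, not the server's actual algorithm: the doubling allocation policy, the use of JSON text length as a stand-in for BSON size, and the function name `simulateComments`.

```javascript
// Hypothetical model: each record gets an allocation. When the document grows
// past it, the record is "moved" and re-allocated at double the size.
// JSON text length is used as a rough stand-in for BSON size.
function simulateComments(post, newComments) {
  let allocated = JSON.stringify(post).length; // start with no padding
  let moves = 0;
  for (const comment of newComments) {
    // Local equivalent of { $push : { comments : comment } }
    post.comments.push(comment);
    const size = JSON.stringify(post).length;
    if (size > allocated) {
      moves += 1; // document no longer fits: relocate it
      while (allocated < size) allocated *= 2; // hypothetical doubling policy
    }
  }
  return { size: JSON.stringify(post).length, allocated, moves };
}

const post = {
  title: "MongoDB is fun",
  author: "eliot",
  date: "2010-12-03",
  comments: [],
};
const incoming = Array.from({ length: 50 }, (_, i) => ({
  author: "user" + i,
  text: "comment number " + i,
}));
console.log(simulateComments(post, incoming));
```

Each move rewrites the whole document somewhere else and updates its index entries. That is why a constantly growing comments array can make writes slower, even though reads still need only one seek.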
8. Indexes
   • Index common queries
   • Avoid duplicates: (A) and (A,B) aren't both needed, since (A,B) also serves queries on A
   • Right-balanced indexes keep the working set small
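The rule about (A) and (A,B) follows from how compound indexes work: an index on (A,B) can also serve queries on its leading field A. Below is a minimal sketch of spotting such redundant prefixes. It assumes a simplified, hypothetical index description in which each key is an ordered array of `[field, direction]` pairs (the shell's `getIndexes()` reports keys as objects instead). The helper names are illustrative. The sketch conservatively keeps unique indexes, since those enforce a constraint rather than only speeding up queries.

```javascript
// True if `shorter` is a strict leading prefix of `longer`
// (same fields, same directions, same order).
function isPrefix(shorter, longer) {
  return (
    shorter.length < longer.length &&
    shorter.every(([field, dir], i) => longer[i][0] === field && longer[i][1] === dir)
  );
}

// Returns names of non-unique indexes whose keys are a strict prefix of another index.
function findRedundantIndexes(indexes) {
  const redundant = [];
  for (const idx of indexes) {
    if (idx.unique) continue; // a unique index enforces a constraint; keep it
    if (indexes.some((other) => other !== idx && isPrefix(idx.key, other.key))) {
      redundant.push(idx.name);
    }
  }
  return redundant;
}

const indexes = [
  { name: "_id_", key: [["_id", 1]], unique: true },
  { name: "a_1", key: [["a", 1]] },
  { name: "a_1_b_1", key: [["a", 1], ["b", 1]] },
  { name: "b_1", key: [["b", 1]] },
];
console.log(findRedundantIndexes(indexes)); // [ 'a_1' ]
```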
9. Random Index Access (diagram): have to keep the entire index in RAM
10. Right-Balanced Index Access (diagram): only have to keep a small portion in RAM
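Slides 9 and 10 contrast where new keys land in a B-tree index. The sketch below is a simplified model, not MongoDB's index code. It sorts all keys, cuts the sorted order into fixed-size "pages", and counts how many distinct pages the most recent inserts fall on, measured against the final sorted order. Increasing keys such as timestamps stay on the right edge, while random keys such as hashes touch nearly every page. The names, the page size, and the seeded `mulberry32` generator are assumptions chosen to keep the example deterministic.

```javascript
// Simplified leaf level of an index: all keys in sorted order, cut into pages
// of `pageSize`. Counts the distinct pages holding the last `recentCount`
// inserted keys (a snapshot against the final sorted order).
function pagesTouchedByRecentInserts(keys, recentCount, pageSize) {
  const sorted = [...keys].sort((a, b) => a - b);
  const rank = new Map(sorted.map((k, i) => [k, i]));
  const pages = new Set();
  for (const k of keys.slice(-recentCount)) {
    pages.add(Math.floor(rank.get(k) / pageSize));
  }
  return pages.size;
}

// Small seeded PRNG (mulberry32) so the example is repeatable.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

const N = 10000, RECENT = 1000, PAGE = 100;
const increasing = Array.from({ length: N }, (_, i) => i); // e.g. timestamps
const rand = mulberry32(42);
const random = Array.from({ length: N }, () => rand()); // e.g. hashes
console.log("right-balanced:", pagesTouchedByRecentInserts(increasing, RECENT, PAGE));
console.log("random:", pagesTouchedByRecentInserts(random, RECENT, PAGE));
```

With these parameters, the increasing keys touch 10 pages (the last tenth of the index), while the random keys touch nearly all 100. This is the difference between keeping a small slice of the index in RAM and keeping all of it.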
11. Covered Indexes
   db.users.find( { name : "joe" }, { name : 1, email : 1, _id : 0 } )
   • Add the email address to your index:
   db.users.ensureIndex( { name : 1, email : 1 } )
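A covered query is answered from the index alone, without loading documents. The hypothetical `isCoveredQuery` helper below encodes the simplified conditions behind slide 11. Every queried and projected field must be in the index, and `_id` must be excluded (or indexed), because it is returned by default. The helper ignores cases such as fields that hold arrays, which cannot be covered. The server's `explain()` output is the authoritative check.

```javascript
// Hypothetical helper: can this query be answered from the index alone?
// Simplified rules: all query fields and all projected fields must be
// indexed, and _id must be excluded unless it is part of the index.
function isCoveredQuery(indexKey, query, projection) {
  const indexed = new Set(Object.keys(indexKey));
  const queryOk = Object.keys(query).every((f) => indexed.has(f));
  const projected = Object.keys(projection).filter((f) => projection[f]);
  const projectionOk = projected.every((f) => indexed.has(f));
  // _id is returned by default unless explicitly excluded with _id: 0
  const idOk = projection._id === 0 || indexed.has("_id");
  return queryOk && projectionOk && idOk;
}

const q = { name: "joe" };
const p = { name: 1, email: 1, _id: 0 };
console.log(isCoveredQuery({ name: 1 }, q, p)); // false: email is not indexed
console.log(isCoveredQuery({ name: 1, email: 1 }, q, p)); // true
console.log(isCoveredQuery({ name: 1, email: 1 }, q, { name: 1, email: 1 })); // false: _id returned
```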
12. RAM Requirements
   • Understand your working set
   • What percentage of your data has to fit in RAM?
   • How do you figure this out?
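One way to approach slide 12's question is a back-of-the-envelope estimate. This follows the speaker note about how much data is looked at per second, minute, hour, or day, and whether indexes are accessed randomly. The function and every input value below are hypothetical. Measure document counts and sizes for your own application (for example with `db.collection.stats()`), and treat the 10% hot share of a right-balanced index as a placeholder.

```javascript
// Rough, hypothetical working-set estimate: data touched in a time window
// plus the part of the index that stays hot.
function estimateWorkingSetBytes({
  docsTouchedPerWindow, // distinct documents read or written in the window
  avgDocBytes,
  indexBytes, // total size of the indexes used in that window
  indexAccessIsRandom, // random keys => the whole index is hot
  hotIndexFraction = 0.1, // assumed hot share of a right-balanced index
}) {
  const data = docsTouchedPerWindow * avgDocBytes;
  const index = indexAccessIsRandom ? indexBytes : indexBytes * hotIndexFraction;
  return data + index;
}

const GB = 1024 ** 3;
const estimate = estimateWorkingSetBytes({
  docsTouchedPerWindow: 5_000_000,
  avgDocBytes: 2048,
  indexBytes: 8 * GB,
  indexAccessIsRandom: true,
});
console.log((estimate / GB).toFixed(1) + " GB"); // 17.5 GB
```

With these made-up numbers, the estimate comes to about 17.5 GB. Setting `indexAccessIsRandom` to `false` drops the index share from 8 GB to 0.8 GB.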
13. Hardware
   • Disk performance
   • How many drives?
   • What about EC2?
   • Network performance
14. Read Scaling
   • One master at any time
   • The programmer determines whether a read hits the master or a slave
   • Pro: easy to set up, can scale reads very well
   • Con: reads from a slave can be inconsistent (stale)
   • Writes don't scale
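Slide 14 leaves the master-or-slave choice to the programmer. In the hypothetical router sketched below, writes always go to the master. Reads go to a slave (round-robin) only when the caller explicitly accepts possibly stale data, roughly what the `slaveOk` option did in drivers of that era. The host names and the `makeRouter` API are illustrative, not a driver interface.

```javascript
// Hypothetical router: writes always go to the master; reads go to a slave
// (round-robin) only when the caller opts in to possibly stale reads.
function makeRouter(master, slaves) {
  let next = 0;
  return function route(op, { allowSlaveReads = false } = {}) {
    if (op === "write" || !allowSlaveReads || slaves.length === 0) {
      return master;
    }
    const node = slaves[next % slaves.length];
    next += 1;
    return node;
  };
}

const route = makeRouter("master:27017", ["slave1:27017", "slave2:27017"]);
console.log(route("write")); // master:27017
console.log(route("read")); // master:27017
console.log(route("read", { allowSlaveReads: true })); // slave1:27017
console.log(route("read", { allowSlaveReads: true })); // slave2:27017
```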
15. One Master, Many Slaves
   • Custom master/slave setup
   • Have as many slaves as you want
   • Can put them local to application servers
   • Good for read-heavy applications with 90+% reads (Wikipedia)
16. Replica Sets
   • High-availability cluster
   • One master at any time, up to 6 slaves
   • A slave is automatically promoted to master on failure
   • Drivers can automatically route reads to slaves if the programmer allows it
   • Good for applications that need high write availability but mostly do reads (commenting system)
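The automatic promotion on slide 16 happens through an election, and a member can become master only if it can reach a strict majority of the set's voting members. Below is a minimal sketch of that rule (the function name is hypothetical). It shows why a 7-member set (one master, six slaves) needs 4 reachable members, and why even-sized sets can get stuck after an even split.

```javascript
// A member may become master only if it can reach a strict majority of the
// voting members, counting itself. This prevents both sides of a network
// partition from electing a master at the same time.
function hasMajority(totalVotingMembers, reachableIncludingSelf) {
  return reachableIncludingSelf > Math.floor(totalVotingMembers / 2);
}

// One master plus six slaves = 7 members, so 4 are needed.
console.log(hasMajority(7, 4)); // true
console.log(hasMajority(7, 3)); // false
// With 4 members, a 2/2 split leaves neither side able to elect a master.
console.log(hasMajority(4, 2)); // false
```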
17. Sharding
   • Many masters, even more slaves
   • Can scale reads and writes in two dimensions
   • Add slaves for inconsistent read scaling and redundancy
   • Add shards for write and data size scaling
18. Architecture (diagram): Shards (each a group of mongod processes; more shards indicated by "..."), Config Servers (three mongod processes), mongos routers (mongos, mongos, ...), client
19. Common Setup
   • A typical setup is 3 shards with 3 servers per shard: 3 masters, 6 slaves
   • One massive sharded collection, plus a dozen non-sharded ones
   • Sharding can be added later to an existing replica set with no downtime
   • Sharded and non-sharded collections can coexist
20. Choosing a Shard Key
   • The shard key determines how data is partitioned
   • Hard to change
   • The most important performance decision
21. Range Based
   MIN   MAX   LOCATION
   A     F     shard1
   F     M     shard1
   M     R     shard2
   R     Z     shard3
   • The collection is broken into chunks by range
   • Chunks default to 200 MB or 100,000 objects
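Slide 21's table can be read as a sorted list of chunks, each owning keys from its MIN (inclusive) up to its MAX (exclusive). The sketch below looks up the owning shard with binary search and adds a split check using the 200 MB / 100,000-object defaults stated on the slide; later releases used different defaults. The data structures and function names are hypothetical, and real chunk ranges run from MinKey to MaxKey rather than stopping at "Z".

```javascript
// Chunk table from slide 21: each chunk owns keys in [min, max).
const chunks = [
  { min: "A", max: "F", shard: "shard1" },
  { min: "F", max: "M", shard: "shard1" },
  { min: "M", max: "R", shard: "shard2" },
  { min: "R", max: "Z", shard: "shard3" },
];

// Binary search for the chunk whose range contains `key`.
function findChunk(chunks, key) {
  let lo = 0, hi = chunks.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (key < chunks[mid].min) hi = mid - 1;
    else if (key >= chunks[mid].max) lo = mid + 1;
    else return chunks[mid];
  }
  return null; // outside this table
}

// Split thresholds as stated on the slide for this release.
const MAX_CHUNK_BYTES = 200 * 1024 * 1024;
const MAX_CHUNK_OBJECTS = 100000;
function needsSplit(stats) {
  return stats.bytes > MAX_CHUNK_BYTES || stats.objects > MAX_CHUNK_OBJECTS;
}

console.log(findChunk(chunks, "Bob").shard); // shard1
console.log(findChunk(chunks, "Mary").shard); // shard2
console.log(findChunk(chunks, "Sam").shard); // shard3
console.log(needsSplit({ bytes: 50 * 1024 * 1024, objects: 150000 })); // true
```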
22. Use Case: User Profiles
   { email : "eliot@10gen.com", addresses : [ { state : "NY" } ] }
   • Shard by email
   • A lookup by email hits one node
   • Index on { "addresses.state" : 1 }
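Slide 22's "lookup by email hits one node" comes from how mongos routes queries. A query that includes the shard key goes only to the shard owning that key's chunk. A query on another field, such as `addresses.state`, is sent to every shard and the results are merged. The index on `addresses.state` helps each shard answer quickly, but does not avoid the broadcast. Below is a hypothetical sketch with made-up split points; it treats only a string equality on the shard key as targetable.

```javascript
// Hypothetical chunk layout for an email shard key: split points divide the
// key space into three chunks, and owners[i] holds chunk i.
const splitPoints = ["h", "p"];
const owners = ["shard1", "shard2", "shard3"];

// Find the shard whose chunk contains the given shard key value.
function shardForKey(key) {
  let i = 0;
  while (i < splitPoints.length && key >= splitPoints[i]) i++;
  return owners[i];
}

// Targeted if the query pins the shard key to one string value,
// otherwise scatter-gather across every shard.
function routeQuery(query, shardKeyField) {
  const value = query[shardKeyField];
  if (typeof value === "string") {
    return { type: "targeted", shards: [shardForKey(value)] };
  }
  return { type: "scatter-gather", shards: [...new Set(owners)] };
}

const r1 = routeQuery({ email: "eliot@10gen.com" }, "email");
console.log(r1.type, r1.shards.join(",")); // targeted shard1
const r2 = routeQuery({ "addresses.state": "NY" }, "email");
console.log(r2.type, r2.shards.join(",")); // scatter-gather shard1,shard2,shard3
```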
23. Use Case: Activity Stream
   { user_id : XXX, event_id : YYY, data : ZZZ }
   • Shard by user_id
   • Looking up an activity stream hits one node
   • Writing events is distributed
   • Index on { "event_id" : 1 } for deletes
24. Use Case: Photos
   { photo_id : ????, data : <binary> }
   What's the right key?
   • auto increment
   • MD5( data )
   • now() + MD5( data )
   • month() + MD5( data )
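The candidate keys on slide 24 differ mainly in where new inserts land. The simplified sketch below assumes range-based chunks with made-up split points. With an auto-increment id, every new photo falls in the chunk holding the highest values, so one shard takes all writes. With `MD5(data)`, new keys spread across chunks. The sketch uses Node's built-in `crypto` module; `distribute` and the split points are hypothetical.

```javascript
const crypto = require("crypto");

// Count how many keys land in each chunk. Chunk i holds keys in
// [splits[i - 1], splits[i]); the last chunk is open-ended.
function distribute(keys, splits) {
  const counts = new Array(splits.length + 1).fill(0);
  for (const key of keys) {
    let i = 0;
    while (i < splits.length && key >= splits[i]) i++;
    counts[i] += 1;
  }
  return counts;
}

const md5 = (s) => crypto.createHash("md5").update(s).digest("hex");

// Auto increment: existing chunks split at 1000/2000/3000, new ids from 5000.
const autoIds = Array.from({ length: 100 }, (_, i) => 5000 + i);
console.log("auto increment:", distribute(autoIds, [1000, 2000, 3000])); // all in the last chunk

// MD5(data): lowercase hex digests spread over chunks split at "4", "8", "c".
const hashed = Array.from({ length: 100 }, (_, i) => md5("photo bytes " + i));
console.log("MD5(data):", distribute(hashed, ["4", "8", "c"]));
```

Adding a time prefix, as in `now() + MD5(data)` or `month() + MD5(data)`, trades some of that spread for locality. With `now()` as the prefix, keys increase again. A coarser `month()` prefix keeps the current month's inserts spread by the hash within that month's key range (once that range has been split into several chunks), while index pages for older months can fall out of RAM.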
25. Use Case: Logging
   { machine : "app.foo.com", app : "apache", when : "2010-12-02:11:33:14", data : XXX }
   Possible shard keys:
   • { machine : 1 }
   • { when : 1 }
   • { machine : 1, app : 1 }
   • { app : 1 }
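One way to compare the logging keys on slide 25 is cardinality. Documents with the same shard key value cannot be split into different chunks, so the number of distinct key values caps how finely the data can be partitioned. The sketch below runs over a small, made-up sample of log entries; the helper name and the data are hypothetical.

```javascript
// Number of distinct shard key values: an upper bound on how many chunks
// this key allows, since equal values can never be split apart.
function distinctKeyValues(docs, fields) {
  return new Set(docs.map((d) => JSON.stringify(fields.map((f) => d[f])))).size;
}

// Hypothetical sample: 3 machines, 2 apps, one entry per second.
const machines = ["app1.foo.com", "app2.foo.com", "app3.foo.com"];
const apps = ["apache", "mysql"];
const docs = [];
for (let i = 0; i < 60; i++) {
  docs.push({
    machine: machines[i % 3],
    app: apps[i % 2],
    when: new Date(Date.UTC(2010, 11, 2, 11, 0, i)).toISOString(),
  });
}

for (const key of [["app"], ["machine"], ["machine", "app"], ["when"]]) {
  console.log(JSON.stringify(key), distinctKeyValues(docs, key));
}
// ["app"] 2
// ["machine"] 3
// ["machine","app"] 6
// ["when"] 60
```

`{ app : 1 }` allows only as many chunks as there are applications. `{ when : 1 }` has plenty of distinct values but increases over time, so new entries all go to the chunk with the latest timestamps. `{ machine : 1, app : 1 }` has more values than either field alone and keeps one machine's logs together. Cardinality alone does not settle the choice, since query patterns matter too.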
26. Right-Balanced Index Access (diagram): only have to keep a small portion in RAM
27. Download MongoDB (http://www.mongodb.org) and let us know what you think. @eliothorowitz



@mongodb. 10gen is hiring! http://www.10gen.com/jobs