Scaling with MongoDB

Eliot Horowitz's presentation at MongoSV on December 3, 2010

Speaker notes

  • What is scaling? Well, hopefully for everyone here.
  • EC2 goes up to 64 GB. Maybe mention a 256 GB box here ($30-40k)? You can maybe buy a 256 GB box, but I can spin up 10 EC2 64 GB boxes in 10 minutes.
  • Not schema-less: dynamic schema. Schema is just as important as in relational, or more important. Understand write vs. read tradeoffs.
  • Compare to MySQL here.
  • Most common performance problem. Why the _id index can be ignored.
  • Data looked at per second/minute/hour/day. Are your indexes accessed randomly?
  • 256 GB RAM: $30-40k.
  • Don't pre-emptively shard; it's easy to add later.

Transcript

  • 1. Scaling with MongoDB. Eliot Horowitz (@eliothorowitz), MongoSV, December 3, 2010
  • 2. Scaling
      • Storage needs only go up
      • Operations/sec only go up
      • Complexity only goes up
  • 3. Scaling by Optimization
      • Schema Design
      • Index Design
      • Hardware Configuration
  • 4. Horizontal Scaling
      • Vertical scaling is limited
      • Hard to scale vertically in the cloud
      • Can scale wider than higher
  • 5. Schema
      • Modeling the same data in different ways can change performance by orders of magnitude
      • Very often performance problems can be solved by changing the schema
  • 6. Embedding
      • Great for read performance
      • One seek to load entire object
      • One roundtrip to database
      • Writes can be slow if adding to objects all the time
  • 7. Should you embed comments?
        { title : "MongoDB is fun" ,
          author : "eliot" ,
          date : "2010-12-03" ,
          comments : [ { author : "bob" , text : "..." } ,
                       { author : "joe" , text : "..." } ] }

        db.posts.update( { title : "MongoDB is fun" } ,
                         { $push : { comments : { author : "sam" , text : "..." } } } )
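
The embedding trade-off can be sketched in plain JavaScript, with no database involved. The helper `pushToArray` below is a hypothetical stand-in for what a `$push` update does to one document (append a value to a named array field); it is not MongoDB's implementation. It shows why a read returns the post and all its comments in one object, and why every new comment makes that same object grow.

```javascript
// Hypothetical in-memory stand-in for a $push update on one document.
// Not MongoDB's implementation; it only mirrors the visible effect.
function pushToArray(doc, field, value) {
  // Treat a missing field as an empty array, then append a copy-on-write value.
  const existing = Array.isArray(doc[field]) ? doc[field] : [];
  return { ...doc, [field]: [...existing, value] };
}

const post = {
  title: "MongoDB is fun",
  author: "eliot",
  date: "2010-12-03",
  comments: [
    { author: "bob", text: "..." },
    { author: "joe", text: "..." },
  ],
};

// Adding a comment produces a larger version of the same document.
const updated = pushToArray(post, "comments", { author: "sam", text: "..." });
console.log(updated.comments.length);
```

A post that collects comments constantly keeps growing this way, which is the "writes can be slow" case from the Embedding slide.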
  • 8. Indexes
      • Index common queries
      • Make sure there aren't duplicates: (A) and (A,B) aren't both needed
      • Right-balanced indexes keep working set small
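
The duplicate-index rule, (A) being unnecessary once (A,B) exists, amounts to a prefix check on the index key patterns. A minimal sketch, assuming index specs are plain objects like the ones passed to `ensureIndex`; the function name `isRedundantPrefix` is made up for illustration, and the check ignores options such as uniqueness.

```javascript
// Returns true when index `a` is a strict leading prefix of index `b`,
// for example { A: 1 } versus { A: 1, B: 1 }.
// Field order and sort direction must both match.
function isRedundantPrefix(a, b) {
  const aKeys = Object.entries(a);
  const bKeys = Object.entries(b);
  if (aKeys.length >= bKeys.length) return false;
  return aKeys.every(
    ([field, dir], i) => bKeys[i][0] === field && bKeys[i][1] === dir
  );
}

console.log(isRedundantPrefix({ A: 1 }, { A: 1, B: 1 }));
console.log(isRedundantPrefix({ B: 1 }, { A: 1, B: 1 }));
```

Note that (B) is not a prefix of (A,B), so an index on B alone can still be worth keeping.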
  • 9. Random Index Access
      • Have to keep entire index in RAM
  • 10. Right-Balanced Index Access
      • Only have to keep small portion in RAM
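
The difference between the two access patterns can be illustrated with a toy model that counts how many index "pages" a batch of inserts touches. Everything here is an assumption made for the sketch: the page size, the key space, and the idea of mapping a key to a page by integer division. A real B-tree behaves differently in detail, but the contrast is the same.

```javascript
// Count how many distinct index "pages" a batch of keys lands on.
// keysPerPage is a made-up constant for illustration.
function pagesTouched(keys, keysPerPage) {
  const pages = new Set();
  for (const k of keys) pages.add(Math.floor(k / keysPerPage));
  return pages.size;
}

// Small deterministic pseudo-random generator (LCG) so runs are repeatable.
function makeLcg(seed) {
  let state = seed >>> 0;
  return () => {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 4294967296;
  };
}

const keySpace = 1000000; // pretend keys 0..999999 already exist
const keysPerPage = 100;
const inserts = 1000;

// Right-balanced: new keys keep increasing (timestamps, counters),
// so they all land at the right-hand edge of the index.
const increasing = Array.from({ length: inserts }, (_, i) => keySpace + i);

// Random: new keys land anywhere in the existing key space (hashes, for example).
const rand = makeLcg(42);
const random = Array.from({ length: inserts }, () => Math.floor(rand() * keySpace));

const rightPages = pagesTouched(increasing, keysPerPage);
const randomPages = pagesTouched(random, keysPerPage);
console.log(rightPages, randomPages);
```

The increasing keys stay on a handful of pages, while the random keys spread over a large share of the index, which is why the random case needs the whole index in RAM.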
  • 11. Covered Indexes
        db.users.find( { name : "joe" } , { name : 1 , email : 1 , _id : 0 } )
      • Add email address in your index
        db.users.ensureIndex( { name : 1 , email : 1 } )
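
Whether a query like the one on this slide can be answered from the index alone comes down to a field check. The sketch below is a simplification under stated assumptions: `isCovered` is a hypothetical helper, it only handles top-level fields, and it ignores cases such as array fields. It does capture why the slide's projection sets `_id : 0`: `_id` is returned by default and is not part of the index.

```javascript
// Simplified check: can `index` answer `query` with `projection`
// without reading the documents themselves?
function isCovered(index, query, projection) {
  const indexed = new Set(Object.keys(index));
  // Every field the query filters on must be in the index.
  const queryOk = Object.keys(query).every((f) => indexed.has(f));
  // Every field the projection includes must be in the index too.
  const returned = Object.keys(projection).filter((f) => projection[f]);
  const projOk = returned.length > 0 && returned.every((f) => indexed.has(f));
  // _id comes back by default, so it must be excluded unless it is indexed.
  const idOk = projection._id === 0 || indexed.has("_id");
  return queryOk && projOk && idOk;
}

const index = { name: 1, email: 1 };
console.log(isCovered(index, { name: "joe" }, { name: 1, email: 1, _id: 0 }));
console.log(isCovered(index, { name: "joe" }, { name: 1, email: 1 }));
```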
  • 12. RAM Requirements
      • Understand working set
      • What percentage of your data has to fit in RAM?
      • How do you figure this out?
  • 13. Hardware
      • Disk performance
      • How many drives
      • What about EC2?
      • Network performance
  • 14. Read Scaling
      • One master at any time
      • Programmer determines if read hits master or a slave
      • Pro: easy to set up, can scale reads very well
      • Con: reads are inconsistent on a slave
      • Writes don't scale
  • 15. One Master, Many Slaves
      • Custom Master/Slave setup
      • Have as many slaves as you want
      • Can put them local to application servers
      • Good for 90+% read heavy applications (Wikipedia)
  • 16. Replica Sets
      • High Availability Cluster
      • One master at any time, up to 6 slaves
      • A slave is automatically promoted to master on failure
      • Drivers support auto routing of reads to slaves if programmer allows
      • Good for applications that need high write availability but mostly reads (Commenting System)
  • 17. Sharding
      • Many masters, even more slaves
      • Can scale reads and writes in two dimensions
      • Add slaves for inconsistent read scaling and redundancy
      • Add shards for write and data size scaling
  • 18. Architecture (diagram)
      • Shards: groups of mongod processes (mongod, mongod, mongod, ...)
      • Config Servers: mongod, mongod, mongod
      • Routers: mongos, mongos, ...
      • client
  • 19. Common Setup
      • Typical setup is 3 shards with 3 servers per shard: 3 masters, 6 slaves
      • One massive collection, a dozen non-sharded ones
      • Can add sharding later to an existing replica set with no downtime
      • Can have sharded and non-sharded collections
  • 20. Choosing a Shard Key
      • Shard key determines how data is partitioned
      • Hard to change
      • Most important performance decision
  • 21. Range Based
        MIN   MAX   LOCATION
        A     F     shard1
        F     M     shard1
        M     R     shard2
        R     Z     shard3
      • Collection is broken into chunks by range
      • Chunks default to 200 MB or 100,000 objects
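
Range-based partitioning means a key is routed by finding the chunk whose range contains it. A minimal sketch using the chunk table from this slide; the function name `shardFor` is hypothetical, and treating MIN as inclusive and MAX as exclusive is an assumption the slide does not spell out.

```javascript
// Chunk table copied from the slide: each chunk covers [min, max).
const chunks = [
  { min: "A", max: "F", shard: "shard1" },
  { min: "F", max: "M", shard: "shard1" },
  { min: "M", max: "R", shard: "shard2" },
  { min: "R", max: "Z", shard: "shard3" },
];

// Find the shard holding `key`, or null if no chunk covers it.
// Assumes min is inclusive and max is exclusive, compared as plain strings.
function shardFor(key, table) {
  const chunk = table.find((c) => key >= c.min && key < c.max);
  return chunk ? chunk.shard : null;
}

console.log(shardFor("ELIOT", chunks));
console.log(shardFor("NANCY", chunks));
console.log(shardFor("SAM", chunks));
```

Two chunks can live on the same shard, as the first two rows show; chunks are the unit of partitioning, not shards.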
  • 22. Use Case: User Profiles
        { email : "eliot@10gen.com" ,
          addresses : [ { state : "NY" } ] }
      • Shard by email
      • Lookup by email hits 1 node
      • Index on { "addresses.state" : 1 }
  • 23. Use Case: Activity Stream
        { user_id : XXX , event_id : YYY , data : ZZZ }
      • Shard by user_id
      • Looking up an activity stream hits 1 node
      • Writing events is distributed
      • Index on { "event_id" : 1 } for deletes
  • 24. Use Case: Photos
        { photo_id : ???? , data : <binary> }
    What's the right key?
      • auto increment
      • MD5( data )
      • now() + MD5(data)
      • month() + MD5(data)
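
The slide leaves the question open, so the sketch below only shows what each candidate key would look like for one photo. It uses Node's built-in `crypto` module; the helper names, the `-` separator, and the ISO date formatting are assumptions made for illustration. One reading, following the index slides earlier in the deck: the auto-increment and `now()` keys always grow (right-balanced index, but new writes pile onto the last range), a bare MD5 is effectively random (writes spread out, index access is random), and a month prefix sits in between.

```javascript
const crypto = require("crypto");

// MD5 of the photo bytes as a hex string.
function md5Hex(data) {
  return crypto.createHash("md5").update(data).digest("hex");
}

// Build the four candidate keys from the slide for one photo.
// `when` and `counter` are passed in so the output is repeatable.
function candidateKeys(data, when, counter) {
  const hash = md5Hex(data);
  const iso = when.toISOString(); // e.g. 2010-12-03T00:00:00.000Z
  return {
    autoIncrement: counter,
    md5: hash,
    nowPlusMd5: iso + "-" + hash,
    monthPlusMd5: iso.slice(0, 7) + "-" + hash, // year and month only
  };
}

const keys = candidateKeys(
  Buffer.from("example photo bytes"),
  new Date(Date.UTC(2010, 11, 3)),
  1
);
console.log(keys);
```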
  • 25. Use Case: Logging
        { machine : "app.foo.com" ,
          app : "apache" ,
          when : "2010-12-02:11:33:14" ,
          data : XXX }
    Possible shard keys
      • { machine : 1 }
      • { when : 1 }
      • { machine : 1 , app : 1 }
      • { app : 1 }
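
One way to compare the candidate keys is to count where a burst of new log entries would land under range partitioning. The toy model below is a sketch, not a measurement: the machine names, the split points, and the use of epoch seconds for `when` (the slide uses a string) are all assumptions. It contrasts `{ when : 1 }` with `{ machine : 1 }`.

```javascript
// Hypothetical log events from four machines with strictly increasing times.
const machines = ["app1.foo.com", "app2.foo.com", "app3.foo.com", "app4.foo.com"];
const events = [];
for (let i = 0; i < 400; i++) {
  events.push({
    machine: machines[i % machines.length],
    app: "apache",
    when: 1291289594 + i, // epoch seconds; an assumption, the slide uses a string
  });
}

// With sorted split points, chunk i holds keys in [splits[i-1], splits[i]).
function chunkIndex(key, splits) {
  let i = 0;
  while (i < splits.length && key >= splits[i]) i++;
  return i;
}

// Count how many events fall into each chunk for a given shard key.
function countPerChunk(evts, keyOf, splits) {
  const counts = new Array(splits.length + 1).fill(0);
  for (const e of evts) counts[chunkIndex(keyOf(e), splits)]++;
  return counts;
}

// Split points are made up, as if chosen from data that existed earlier.
const byWhen = countPerChunk(events, (e) => e.when, [1291000000, 1291100000, 1291200000]);
const byMachine = countPerChunk(events, (e) => e.machine, ["app2", "app3", "app4"]);
console.log(byWhen, byMachine);
```

In this model every new event keyed by `when` falls into the last chunk, while keying by `machine` spreads the same events evenly. The machine key has its own limit: all entries from one machine share a single key value, so a very busy machine cannot be spread further.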
  • 26. Right-Balanced Index Access
      • Only have to keep small portion in RAM
  • 27. Download MongoDB at http://www.mongodb.org and let us know what you think
      • @eliothorowitz, @mongodb
      • 10gen is hiring! http://www.10gen.com/jobs