2011 mongo sf-scaling

Published in: Technology
  • EC2 goes up to 64 GB; maybe mention a 256 GB box here? ($30-40k). Maybe you can buy a 256 GB box, but I can spin up ten 64 GB EC2 boxes in 10 minutes.
  • Don’t pre-emptively shard: it is easy to add later.
  • Hashed shard keys coming soon.

    1. Practical Scaling and Sharding
       Eliot Horowitz (@eliothorowitz), MongoSF, May 24, 2011
    2. Scaling by Optimization
       • Schema Design
       • Index Design
       • Hardware Configuration
    3. Horizontal Scaling
       • Vertical scaling is limited
       • Hard to scale vertically in the cloud
       • Can scale wider than higher
    4. Replica Sets
       • One master at any time
       • Programmer determines whether a read hits the master or a slave
       • Easy to set up to scale reads
    5. db.people.find( { state : "NY" } ).addOption( DBQuery.Option.slaveOk )
       • Routed to a secondary automatically
       • Will use the master if no secondary is available
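The routing rule on slide 5 can be sketched in plain JavaScript. This is only a model of the decision, not driver code: the `pickReadTarget` helper, the member objects, and the host names are all hypothetical, and real drivers make this choice internally.

```javascript
// Hypothetical model of a replica set member list; not a driver API.
function pickReadTarget(members, slaveOk) {
  const healthy = members.filter((m) => m.healthy);
  const primary = healthy.find((m) => m.role === "primary") || null;
  if (!slaveOk) return primary; // default: reads go to the master
  const secondaries = healthy.filter((m) => m.role === "secondary");
  // Prefer a secondary when the query allows it; otherwise fall back to the master.
  return secondaries.length > 0 ? secondaries[0] : primary;
}

const members = [
  { host: "db1", role: "primary", healthy: true },
  { host: "db2", role: "secondary", healthy: true },
  { host: "db3", role: "secondary", healthy: false },
];

const target = pickReadTarget(members, true);
```

With the sample list, a slaveOk read is sent to `db2`, and the same read falls back to `db1` if every secondary is marked unhealthy.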
    6. Not Enough
       • Writes don’t scale
       • Reads are out of date on slaves
       • RAM/Data Size doesn’t scale
    7. Why Shard?
       • Distribute write load
       • Keep working set in RAM
       • Consistent reads
       • Preserve functionality
    8. Sharding Design Goals
       • Scale linearly
       • Increase capacity with no downtime
       • Transparent to the application
       • Low administration to add capacity
    9. Sharding and Documents
       • Rich documents reduce need for joins
       • No joins makes sharding solvable
    10. Basics
        • Choose how you partition data
        • Convert from single replica set to sharding with no downtime
        • Full feature set
        • Fully consistent by default
    11. Architecture (diagram)
        • Shards: groups of mongod processes
        • Config Servers: three mongod processes
        • mongos routers between the clients and the shards
    12. Typical Basic Setup (diagram: Data Center Primary and Data Center Secondary)
        • S1: p=1, p=1, p=0
        • S2: p=1, p=1, p=0
        • S3: p=1, p=1, p=0
        • Config 1, Config 2, Config 2
        • Four mongos processes
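One way to read the p=1 / p=0 labels on slide 12 is as replica set member priorities. The sketch below builds a configuration document for one shard under that reading. The host names are made up, and placing the priority 0 member in the secondary data center is an assumption, since the flattened diagram does not say which member sits where.

```javascript
// Hypothetical host names; priorities follow the p=1 / p=0 labels on the slide.
function replicaSetConfig(name, primaryDcHosts, secondaryDcHost) {
  const members = primaryDcHosts.map((host, i) => ({ _id: i, host, priority: 1 }));
  // A priority 0 member keeps a copy of the data but is never elected master.
  members.push({ _id: members.length, host: secondaryDcHost, priority: 0 });
  return { _id: name, members };
}

const s1 = replicaSetConfig("S1", ["dc1-a:27017", "dc1-b:27017"], "dc2-a:27017");
// In the mongo shell, a document like this is passed to rs.initiate(s1).
```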
    13. Range Based
        • Collection is broken into chunks by range
        • Chunks default to 64 MB or 100,000 objects
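Range-based partitioning as described on slide 13 can be pictured as a lookup over chunk boundaries. This is a minimal sketch, assuming string shard keys and three invented chunks; the boundaries, shard names, and the `findChunk` helper are illustrative only.

```javascript
// Each chunk owns the half-open key range [min, max). Boundaries are made up.
const chunks = [
  { min: -Infinity, max: "g", shard: "shard0" },
  { min: "g", max: "p", shard: "shard1" },
  { min: "p", max: Infinity, shard: "shard2" },
];

function findChunk(chunkList, key) {
  // The infinities are handled explicitly because comparing a string with a
  // number in JavaScript always yields false.
  return chunkList.find(
    (c) => (c.min === -Infinity || key >= c.min) && (c.max === Infinity || key < c.max)
  );
}

const owner = findChunk(chunks, "mongo");
```

A key equal to a boundary, such as "g", belongs to the chunk whose range starts there.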
    14. Choosing a Shard Key
        • Shard key determines how data is partitioned
        • Hard to change
        • Most important performance decision
    15. Use Case: Photos
        { photo_id : ???? , data : <binary> }
        What’s the right key?
        • auto increment
        • MD5( data )
        • month() + MD5( data )
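The three candidate keys on slide 15 can be compared with a small sketch. It uses Node's built-in `crypto` module for MD5; the helper names, the sample payloads, and the four-bucket split are assumptions made for illustration. With range-based chunks, an auto-increment key is always larger than every existing key, so new inserts all fall in the last range, whereas MD5 keys spread across the key space.

```javascript
const crypto = require("crypto");

function md5Hex(data) {
  return crypto.createHash("md5").update(data).digest("hex");
}

// The three candidate photo_id schemes from the slide.
function autoIncrementKey(counter) {
  return counter;
}
function md5Key(data) {
  return md5Hex(data);
}
function monthMd5Key(date, data) {
  const month = date.toISOString().slice(0, 7); // year and month, "YYYY-MM"
  return month + "-" + md5Hex(data);
}

// Count how many hex keys fall in each of 4 equal ranges of the first hex digit.
function bucketByFirstHexDigit(keys) {
  const counts = [0, 0, 0, 0];
  for (const k of keys) counts[Math.floor(parseInt(k[0], 16) / 4)] += 1;
  return counts;
}

const photos = Array.from({ length: 1000 }, (_, i) => "photo-bytes-" + i);
const md5Counts = bucketByFirstHexDigit(photos.map(md5Key));
```

The month prefix in the third scheme keeps one month's photos adjacent in key order while the MD5 suffix still spreads inserts within that month.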
    16. Initial Loading
        • System starts with 1 chunk
        • Writes will hit 1 shard and then move
        • Pre-splitting for initial bulk loading can dramatically improve bulk load time
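Pre-splitting, as mentioned on slide 16, amounts to choosing split points before the load starts. The sketch below builds `split` admin command documents for a collection sharded on a hex-string key. The namespace `photos.images`, the field name, and the evenly spaced two-digit boundaries are placeholders chosen for illustration; nothing here talks to a server.

```javascript
// Build `split` command documents with evenly spaced boundaries over "00".."ff".
function preSplitCommands(ns, field, numChunks) {
  const commands = [];
  for (let i = 1; i < numChunks; i++) {
    const point = Math.floor((i * 256) / numChunks).toString(16).padStart(2, "0");
    commands.push({ split: ns, middle: { [field]: point } });
  }
  return commands;
}

const cmds = preSplitCommands("photos.images", "photo_id", 4);
// In the mongo shell, each document would be run with db.adminCommand(cmd)
// before the bulk load begins.
```

Splitting alone leaves the new chunks on the original shard; they still have to be spread out, either by the balancer or by moving chunks manually, before the load benefits.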
    17. Administering a Cluster
        • Do not wait too long to add capacity
        • Need capacity for normal workload + cost of moving data
        • Stay < 70% operational capacity
    18. Hardware Considerations
        • Understand the working set and make sure it can fit in RAM
        • Choose appropriately sized boxes for shards
          • Too small, and admin/overhead goes up
          • Too large, and you can’t add capacity smoothly
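The sizing advice on slides 17 and 18 can be turned into back-of-the-envelope arithmetic. This is a rough sketch: the `planShards` helper and all of the input figures are invented, and it assumes load and working set divide evenly across shards, which a real cluster only approximates.

```javascript
// Smallest shard count that (a) fits the working set in RAM and
// (b) keeps each shard below the target utilization (70% on the slide).
function planShards({ workingSetGb, ramPerBoxGb, peakOps, opsPerBox, maxUtilization = 0.7 }) {
  const forRam = Math.ceil(workingSetGb / ramPerBoxGb);
  const forLoad = Math.ceil(peakOps / (opsPerBox * maxUtilization));
  return Math.max(forRam, forLoad);
}

// Made-up figures: 300 GB working set on 64 GB boxes, 20,000 ops/s peak,
// 5,000 ops/s per box.
const shards = planShards({ workingSetGb: 300, ramPerBoxGb: 64, peakOps: 20000, opsPerBox: 5000 });
```

Here RAM alone would call for 5 boxes, but the 70% headroom rule pushes the count to 6, leaving room for the cost of moving data when capacity is added.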
    19. Download MongoDB: http://www.mongodb.org
        and let us know what you think
        @eliothorowitz @mongodb
        10gen is hiring! http://www.10gen.com/jobs
    20. Use Case: User Profiles
        { email : "eliot@10gen.com" , addresses : [ { state : "NY" } ] }
        • Shard by email
        • Lookup by email hits 1 node
        • Index on { "addresses.state" : 1 }
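The difference between a lookup by email and a lookup by state on slide 20 can be modelled with a toy router. The three ranges, the shard names, and the `targetShards` helper are hypothetical; the sketch only illustrates that a query containing the shard key can be sent to one shard, while a query on another field has to be sent to all of them.

```javascript
// Hypothetical router model: documents are placed by range on the email value.
const shardKey = "email";
const ranges = [
  { max: "i", shard: "shard0" },
  { max: "r", shard: "shard1" },
  { max: null, shard: "shard2" }, // null means no upper bound
];

function targetShards(query) {
  const value = query[shardKey];
  if (typeof value === "string") {
    // Shard key present: exactly one range, and so one shard, can hold the document.
    const hit = ranges.find((r) => r.max === null || value < r.max);
    return [hit.shard];
  }
  // No shard key in the query: every shard must be asked.
  return ranges.map((r) => r.shard);
}

const byEmail = targetShards({ email: "eliot@10gen.com" });
const byState = targetShards({ "addresses.state": "NY" });
```

In this model the state query reaches every shard, where each one can answer from its own { "addresses.state" : 1 } index.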
    21. Use Case: Activity Stream
        { user_id : XXX, event_id : YYY , data : ZZZ }
        • Shard by user_id
        • Looking up an activity stream hits 1 node
        • Writing events is distributed
        • Index on { "event_id" : 1 } for deletes
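The setup on slide 21 could be expressed as the admin command documents below. This is a sketch under assumptions: the database and collection names (`app`, `activity`) are placeholders, and the helper only builds the documents without contacting a server.

```javascript
// Build the command documents needed to shard a collection on a given key.
function shardingCommands(dbName, collName, key) {
  return [
    { enableSharding: dbName },
    { shardCollection: dbName + "." + collName, key },
  ];
}

const commands = shardingCommands("app", "activity", { user_id: 1 });
// Secondary index so deletes by event_id do not scan each shard's data.
const deleteIndex = { event_id: 1 };
// In the mongo shell: db.adminCommand(...) for each command document, then
// db.activity.ensureIndex(deleteIndex) (createIndex in current shells).
```

Because event_id is not the shard key, a delete by event_id is sent to every shard, which is why each shard needs the index.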
