Scaling MongoDB (Mongo Austin)

Eliot Horowitz's presentation at Mongo Austin

Speaker notes
  • What is scaling? Well, hopefully a familiar problem for everyone here.
  • EC2 goes up to 64 GB of RAM. You can buy a 256 GB box (roughly $30-40k), but 10 EC2 64 GB boxes can be spun up in 10 minutes.
  • Don't pre-emptively shard - it's easy to add later.
Transcript

• 1. Scaling with MongoDB
    Eliot Horowitz (@eliothorowitz)
    MongoAustin, February 15, 2011
• 2. Scaling
    • Storage needs only go up
    • Operations/sec only go up
    • Complexity only goes up
• 3. Horizontal Scaling
    • Vertical scaling is limited
    • Hard to scale vertically in the cloud
    • Can scale wider than higher
• 4. Read Scaling
    • One master at any time
    • Programmer determines if a read hits the master or a slave
    • Pro: easy to set up, can scale reads very well
    • Con: reads from a slave are inconsistent
    • Writes don't scale
• 5. One Master, Many Slaves
    • Custom master/slave setup
    • Have as many slaves as you want
    • Can put them local to application servers
    • Good for 90+% read-heavy applications (Wikipedia)
• 6. Replica Sets
    • High-availability cluster
    • One master at any time, up to 6 slaves
    • A slave is automatically promoted to master on failure
    • Drivers support auto-routing of reads to slaves if the programmer allows
    • Good for applications that need high write availability but are mostly reads (commenting system)
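
A minimal sketch of this in the mongo shell, with invented hostnames and collection: the set is initiated once from one member, and each client connection must explicitly opt in to slave reads.

    // hypothetical 3-member set; hosts are placeholders
    > rs.initiate({
        _id : "myset",
        members : [
            { _id : 0, host : "db1.example.com:27017" },
            { _id : 1, host : "db2.example.com:27017" },
            { _id : 2, host : "db3.example.com:27017" }
        ]
    })

    // on a connection to a slave, explicitly allow (possibly stale) reads
    > rs.slaveOk()
    > db.comments.find({ post_id : 123 })
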
• 7. Sharding
    • Many masters, even more slaves
    • Can scale in two dimensions
    • Add shards for write and data-size scaling
    • Add slaves for inconsistent read scaling and redundancy
• 8. Sharding Basics
    • Data is split up into chunks
    • Shard: a replica set that holds a portion of the data
    • Config servers: store metadata about the system
    • Mongos: routers that direct and merge requests
• 9. Architecture (diagram): clients connect to one of several mongos routers; the routers read metadata from three config servers (themselves mongod processes) and direct requests to the shards, each of which is a group of mongod processes.
• 10. Common Setup
    • A common setup is 3 shards with 3 servers per shard: 3 masters, 6 slaves
    • Can add sharding later to an existing replica set with no downtime
    • Can have sharded and non-sharded collections
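
A hedged sketch of assembling such a cluster from the shell, using the command spellings of this era (shard and host names are placeholders): each replica set is added as a shard through a mongos, then sharding is enabled per database and per collection.

    // run through a mongos, in the admin database
    > use admin
    > db.runCommand({ addshard : "shard1/db1.example.com:27018" })
    > db.runCommand({ addshard : "shard2/db4.example.com:27018" })
    > db.runCommand({ addshard : "shard3/db7.example.com:27018" })

    // sharding is opt-in per database and per collection
    > db.runCommand({ enablesharding : "app" })
    > db.runCommand({ shardcollection : "app.users", key : { email : 1 } })
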
• 11. Range Based
        MIN   MAX   LOCATION
        A     F     shard1
        F     M     shard1
        M     R     shard2
        R     Z     shard3
    • collection is broken into chunks by range
    • chunks default to 64 MB or 100,000 objects
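
The chunk table on this slide is what the config servers store; it can be inspected through any mongos (namespace taken from the setup sketch above):

    > use config
    > db.chunks.find({ ns : "app.users" }, { min : 1, max : 1, shard : 1 })

    // or a formatted summary of the whole cluster
    > db.printShardingStatus()
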
• 12. Config Servers
    • 3 of them
    • changes are made with a 2-phase commit
    • if any one is down, metadata goes read-only
    • the system stays online as long as 1 of the 3 is up
• 13. mongos
    • Sharding router
    • Acts just like a mongod to clients
    • Can have 1 or as many as you want
    • Can run on the appserver so there is no extra network traffic
    • Caches metadata from the config servers
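
A rough sketch of the processes involved, with invented hosts, ports, and paths: config servers are ordinary mongod processes started with --configsvr, and every mongos is pointed at all three.

    # one of the three config servers (repeat on cfg2 and cfg3)
    mongod --configsvr --port 27019 --dbpath /data/configdb

    # a mongos, often colocated with the app server
    mongos --configdb cfg1.example.com:27019,cfg2.example.com:27019,cfg3.example.com:27019
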
• 14. Writes
    • Inserts: require the shard key, routed
    • Removes: routed and/or scattered
    • Updates: routed or scattered
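
A small illustration against the users collection assumed above (sharded on { email : 1 }; the field set by the update is made up):

    // the insert carries the shard key, so mongos routes it to one shard
    > db.users.insert({ email : "eliot@10gen.com", addresses : [ { state : "NY" } ] })

    // the query contains the shard key: routed to a single shard
    > db.users.update({ email : "eliot@10gen.com" }, { $set : { plan : "free" } })

    // no shard key in the query: scattered to all shards
    > db.users.remove({ "addresses.state" : "NY" })
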
• 15. Queries
    • By shard key: routed
    • Sorted by shard key: routed in order
    • By non-shard key: scatter-gather
    • Sorted by non-shard key: distributed merge sort
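
One example of each case, continuing the same assumed collection (last_login is an invented non-shard-key field):

    > db.users.find({ email : "eliot@10gen.com" })      // routed
    > db.users.find().sort({ email : 1 })               // routed in order
    > db.users.find({ "addresses.state" : "NY" })       // scatter-gather
    > db.users.find().sort({ last_login : 1 })          // distributed merge sort
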
• 16. Splitting
    • Take a chunk and split it in 2
    • Splits on the median value
    • Splits only change metadata, not the data itself
• 17. Splitting (chunk table over time)
      T1:   MIN   MAX   LOCATION
            A     Z     shard1
      T2:   MIN   MAX   LOCATION
            A     G     shard1
            G     Z     shard1
      T3:   MIN   MAX   LOCATION
            A     D     shard1
            D     G     shard1
            G     S     shard1
            S     Z     shard1
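
Splits normally happen automatically as chunks grow, but the same operation can be issued by hand, which makes its metadata-only nature visible (namespace and split point from the example above):

    > use admin
    > db.runCommand({ split : "app.users", middle : { email : "G" } })
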
• 18. Balancing
    • Moves chunks from one shard to another
    • Done online while the system is running
    • Balancing runs in the background
• 19. Migrating (chunk table over time)
      T3:   MIN   MAX   LOCATION
            A     D     shard1
            D     G     shard1
            G     S     shard1
            S     Z     shard1
      T4:   MIN   MAX   LOCATION
            A     D     shard1
            D     G     shard1
            G     S     shard1
            S     Z     shard2
      T5:   MIN   MAX   LOCATION
            A     D     shard1
            D     G     shard1
            G     S     shard2
            S     Z     shard2
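
The balancer issues migrations like the T3 -> T4 step on its own; the equivalent manual command, sketched with the same example ranges:

    > use admin
    // move the chunk containing "S" (here S..Z) to shard2
    > db.runCommand({ moveChunk : "app.users", find : { email : "S" }, to : "shard2" })
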
• 20. Choosing a Shard Key
    • The shard key determines how data is partitioned
    • Hard to change
    • Most important performance decision
• 21. Use Case: User Profiles
      { email : "eliot@10gen.com" ,
        addresses : [ { state : "NY" } ] }
    • Shard by email
    • Lookup by email hits 1 node
    • Index on { "addresses.state" : 1 }
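
With the collection sharded on email as in the setup sketch above, the remaining piece is the secondary index; a state query then scatter-gathers, but uses this index on every shard:

    > db.users.ensureIndex({ "addresses.state" : 1 })
    > db.users.find({ "addresses.state" : "NY" })   // scatter-gather, indexed on each shard
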
• 22. Use Case: Activity Stream
      { user_id : XXX, event_id : YYY , data : ZZZ }
    • Shard by user_id
    • Looking up an activity stream hits 1 node
    • Writing events is distributed
    • Index on { "event_id" : 1 } for deletes
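
A sketch under assumed names (database "app", collection "activity"):

    > db.runCommand({ shardcollection : "app.activity", key : { user_id : 1 } })
    > db.activity.ensureIndex({ event_id : 1 })   // supports deletes by event

    // one user's stream lives on a single shard
    > db.activity.find({ user_id : 42 })
    // different users' writes land on different shards
    > db.activity.insert({ user_id : 42, event_id : 1017, data : "..." })
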
• 23. Use Case: Photos
      { photo_id : ???? , data : <binary> }
      What's the right key?
    • auto increment
    • MD5( data )
    • now() + MD5(data)
    • month() + MD5(data)
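
The slide leaves the choice open. As one illustration, the month() + MD5(data) option is easy to build in the shell (hex_md5 is a shell built-in; the collection name is assumed): an auto-increment key sends every new write to the same chunk, MD5 alone scatters writes completely, and a month prefix plus a hash spreads writes while keeping the hot range bounded.

    > var data = "<binary payload>"                            // placeholder
    > var d = new Date()
    > var month = d.getFullYear() + "-" + (d.getMonth() + 1)   // e.g. "2011-2"
    > db.photos.insert({ photo_id : month + ":" + hex_md5(data), data : data })
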
• 24. Use Case: Logging
      { machine : "app.foo.com" ,
        app : "apache" ,
        when : "2010-12-02:11:33:14" ,
        data : XXX }
      Possible shard keys:
    • { machine : 1 }
    • { when : 1 }
    • { machine : 1 , app : 1 }
    • { app : 1 }
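
Again the slide leaves the answer open; as a sketch, the compound { machine : 1 , app : 1 } option keeps one machine's logs for one app together, so queries naming both fields route to a single shard (namespace assumed):

    > db.runCommand({ shardcollection : "logs.entries", key : { machine : 1, app : 1 } })
    > db.entries.find({ machine : "app.foo.com", app : "apache" })   // routed
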
• 25. Roadmap
• 26. Past Releases
    • First release - February 2009
    • v1.0 - August 2009
    • v1.2 - December 2009 - Map/Reduce, lots of small things
    • v1.4 - March 2010 - Concurrency/Geo
    • v1.6 - August 2010 - Sharding/Replica Sets
• 27. 1.8
    • Single Server Durability
    • Covered Indexes
    • Enhancements to Sharding/Replica Sets
• 28. Short List
    • Better Aggregation
    • Full Text Search
    • TTL timeout collections
    • Concurrency
    • Compaction
• 29. Download MongoDB
      http://www.mongodb.org
      ... and let us know what you think:
      @eliothorowitz / @mongodb
      10gen is hiring! http://www.10gen.com/jobs