MongoDB Sharding Internals
 

Eliot Horowitz's presentation at MongoSV on December 3, 2010

  • for inconsistent read scaling
  • don’t shard by date

MongoDB Sharding Internals Presentation Transcript

  • Sharding Internals. Eliot Horowitz (@eliothorowitz). MongoSV, December 3, 2010
  • MongoDB Sharding
      • Scale horizontally for data size, index size, write and consistent read scaling
      • Distribute databases, collections, or objects in a collection
      • Auto-balancing, migrations, and management happen with no downtime
  • MongoDB Sharding (continued)
      • Choose how you partition data
      • Can convert from single master to sharded system with no downtime
      • Same features as non-sharded single master
      • Fully consistent
  • Range Based
      MIN   MAX   LOCATION
      A     F     shard1
      F     M     shard1
      M     R     shard2
      R     Z     shard3
      • Collection is broken into chunks by range
      • Chunks default to 200 MB or 100,000 objects
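To make the range table concrete, here is a minimal sketch of how a key could be matched to a chunk and its shard. The helper name `findShard` is hypothetical, and the sketch assumes each chunk covers a half-open range [MIN, MAX) so that a boundary value has exactly one owner.

```javascript
// Chunk table copied from the slide: each chunk covers [min, max) and lives on one shard.
const chunks = [
  { min: "A", max: "F", shard: "shard1" },
  { min: "F", max: "M", shard: "shard1" },
  { min: "M", max: "R", shard: "shard2" },
  { min: "R", max: "Z", shard: "shard3" },
];

// Hypothetical helper: return the shard whose chunk range contains the key,
// or null when no chunk covers it.
function findShard(chunkTable, key) {
  const chunk = chunkTable.find((c) => c.min <= key && key < c.max);
  return chunk ? chunk.shard : null;
}

console.log(findShard(chunks, "C")); // shard1
console.log(findShard(chunks, "F")); // shard1 (the F..M chunk)
console.log(findShard(chunks, "P")); // shard2
```

A linear scan is enough for four chunks; a larger table would more likely use a binary search over the sorted MIN values.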
  • Architecture
      • Shards: mongod, mongod, mongod, ...
      • Config Servers: mongod, mongod, mongod
      • mongos, mongos, ...
      • client
  • Shards
      • Can be master, master/slave, or replica sets
      • Replica sets give sharding plus full auto-failover
      • Regular mongod processes
  • Config Servers
      • 3 of them
      • Changes are made with two-phase commit
      • If any are down, metadata goes read-only
      • System is online as long as 1 of the 3 is up
  • mongos
      • Sharding router
      • Acts just like a mongod to clients
      • Can have 1 or as many as you want
      • Can run on the app server, so no extra network traffic
      • Caches metadata from config servers
  • Writes
      • Inserts: require shard key, routed
      • Removes: routed and/or scattered
      • Updates: routed or scattered
  • Queries
      • By shard key: routed
      • Sorted by shard key: routed in order
      • By non shard key: scatter gather
      • Sorted by non shard key: distributed merge sort
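The distributed merge sort case can be sketched as follows: each shard returns its own results already sorted, and the router merges them by repeatedly taking the smallest head element. The sample documents, the `age` field, and the helper name `mergeSorted` are illustrative assumptions, not taken from the talk.

```javascript
// Hypothetical per-shard results, each already sorted by the non shard key field "age".
const shardResults = [
  [{ name: "ann", age: 21 }, { name: "bob", age: 40 }],
  [{ name: "cat", age: 25 }, { name: "dan", age: 33 }],
  [{ name: "eve", age: 30 }],
];

// Merge k sorted arrays into one sorted array by always taking the smallest head.
function mergeSorted(lists, field) {
  const positions = lists.map(() => 0); // next unread index in each list
  const merged = [];
  for (;;) {
    let best = -1;
    for (let i = 0; i < lists.length; i++) {
      if (positions[i] >= lists[i].length) continue; // this list is exhausted
      if (
        best === -1 ||
        lists[i][positions[i]][field] < lists[best][positions[best]][field]
      ) {
        best = i;
      }
    }
    if (best === -1) return merged; // every list is exhausted
    merged.push(lists[best][positions[best]++]);
  }
}

const mergedNames = mergeSorted(shardResults, "age").map((d) => d.name);
console.log(mergedNames); // [ 'ann', 'cat', 'eve', 'dan', 'bob' ]
```

The router never has to re-sort the full result set; it only compares the current head of each shard's stream.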
  • Splitting
      • Take a chunk and split it in 2
      • Splits on the median value
      • Splits only change metadata, no data change
  • Splitting
      T1   MIN   MAX   LOCATION
           A     Z     shard1
      T2   MIN   MAX   LOCATION
           A     G     shard1
           G     Z     shard1
      T3   MIN   MAX   LOCATION
           A     D     shard1
           D     G     shard1
           G     S     shard1
           S     Z     shard1
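The T1 to T2 step can be sketched as a pure metadata operation. The helper `splitChunk` and the sample key list are hypothetical; the sketch assumes the median is taken over the shard key values currently stored in the chunk.

```javascript
// Hypothetical helper: split one chunk's metadata entry at the median shard key value.
// `sortedKeys` stands in for the shard key values currently stored in the chunk.
// Only the range table changes; both halves stay on the same shard.
function splitChunk(chunk, sortedKeys) {
  const median = sortedKeys[Math.floor(sortedKeys.length / 2)];
  return [
    { min: chunk.min, max: median, shard: chunk.shard },
    { min: median, max: chunk.max, shard: chunk.shard },
  ];
}

const t1 = { min: "A", max: "Z", shard: "shard1" };
const t2 = splitChunk(t1, ["B", "D", "F", "G", "K", "P", "T"]);
console.log(t2);
// [ { min: 'A', max: 'G', shard: 'shard1' }, { min: 'G', max: 'Z', shard: 'shard1' } ]
```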
  • Balancing
      • Moves chunks from one shard to another
      • Done online while system is running
      • Balancing runs in the background
  • Migrating
      T3   MIN   MAX   LOCATION
           A     D     shard1
           D     G     shard1
           G     S     shard1
           S     Z     shard1
      T4   MIN   MAX   LOCATION
           A     D     shard1
           D     G     shard1
           G     S     shard1
           S     Z     shard2
      T5   MIN   MAX   LOCATION
           A     D     shard1
           D     G     shard1
           G     S     shard2
           S     Z     shard2
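The T3 to T5 sequence can be reproduced with a deliberately simplified balancing loop. The policy here (move the last chunk of the fullest shard to the emptiest shard until chunk counts differ by less than 2) is an assumption chosen to match the tables, not a description of MongoDB's actual balancer; `countByShard` and `balanceOnce` are hypothetical names.

```javascript
// Count how many chunks each shard currently owns.
function countByShard(chunks, shards) {
  const counts = Object.fromEntries(shards.map((s) => [s, 0]));
  for (const c of chunks) counts[c.shard] += 1;
  return counts;
}

// One simplified balancing round (assumed policy, see lead-in).
// Returns false once the chunk counts differ by less than 2.
function balanceOnce(chunks, shards) {
  const counts = countByShard(chunks, shards);
  const sorted = [...shards].sort((a, b) => counts[a] - counts[b]);
  const to = sorted[0];
  const from = sorted[sorted.length - 1];
  if (counts[from] - counts[to] < 2) return false;
  const victim = [...chunks].reverse().find((c) => c.shard === from);
  victim.shard = to; // metadata update; a real migration also copies the documents
  return true;
}

// State T3 from the slide: four chunks, all on shard1.
const table = [
  { min: "A", max: "D", shard: "shard1" },
  { min: "D", max: "G", shard: "shard1" },
  { min: "G", max: "S", shard: "shard1" },
  { min: "S", max: "Z", shard: "shard1" },
];
const shardNames = ["shard1", "shard2"];

while (balanceOnce(table, shardNames)) {
  console.log(table.map((c) => c.min + "-" + c.max + ":" + c.shard).join("  "));
}
```

The first round yields T4 (S to Z moves to shard2) and the second yields T5 (G to S follows), after which the loop stops.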
  • Setting it Up
      • Start servers
      • Add shards: db.runCommand( { addshard : "10.1.1.5" } )
      • Turn on partitioning: db.runCommand( { enablesharding : "test" } )
      • Shard a collection: db.runCommand( { shardcollection : "test.data" , key : { num : 1 } } )
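The three commands above can be collected into one sketch. The wrapper `setUpSharding` is hypothetical, and `adminDb` is assumed to be a handle with a `runCommand()` method, as the mongo shell's `db` object is when connected to a mongos with the admin database selected. A recording stub stands in for a real connection so the sketch runs without a cluster.

```javascript
// Hypothetical wrapper around the three commands on the slide.
// `adminDb` is an assumption: any object exposing runCommand(), such as the
// mongo shell's `db` with the admin database selected on a mongos.
function setUpSharding(adminDb, shardHost, dbName, collection, key) {
  return [
    adminDb.runCommand({ addshard: shardHost }),
    adminDb.runCommand({ enablesharding: dbName }),
    adminDb.runCommand({ shardcollection: dbName + "." + collection, key: key }),
  ];
}

// Stand-in for a real connection: it only records the commands it receives.
const sent = [];
const fakeAdminDb = {
  runCommand: (cmd) => {
    sent.push(cmd);
    return { ok: 1 };
  },
};

setUpSharding(fakeAdminDb, "10.1.1.5", "test", "data", { num: 1 });
console.log(JSON.stringify(sent, null, 2));
```

Order matters in this sketch: the shard is added first, then the database is enabled for sharding, then the collection is sharded on its key.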
  • User profiles
      • Partition by user_id
      • Secondary indexes on location, dates, etc.
      • Reads/writes know which shard to hit
  • User Activity Stream
      • Shard by user_id
      • Loading a user’s stream hits a single shard
      • Writes are distributed across all shards
      • Can index on activity for deleting
  • Photos
      • Can shard by photo_id for best read/write distribution
      • Secondary index on tags, date
  • Logging: Possible Shard Keys
      • date
      • machine, date
      • logger name
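A small simulation shows how the first two candidate keys behave under range-based chunks. With a date key, every new log line carries a value larger than anything stored before, so all inserts land in the last chunk; with machine first, inserts spread across chunks. The chunk bounds, machine names, and helper names below are made-up illustrations.

```javascript
// Route a key to the first chunk whose upper bound is greater than the key.
function route(chunks, key) {
  return chunks.find((c) => key < c.max).shard;
}

// Hypothetical chunk tables. Bounds compare as strings; "\uffff" stands in for "max key".
const byDate = [
  { max: "2010-11-01", shard: "shard1" },
  { max: "2010-12-01", shard: "shard2" },
  { max: "\uffff", shard: "shard3" },
];
const byMachine = [
  { max: "web10", shard: "shard1" },
  { max: "web20", shard: "shard2" },
  { max: "\uffff", shard: "shard3" },
];

// New log lines all carry the current date but come from many machines.
const newLogs = ["web03", "web12", "web27", "web15", "web08", "web22"].map(
  (machine) => ({ machine, date: "2010-12-03" })
);

// Count how many of the new inserts each shard receives for a given shard key.
function tally(chunks, keyOf) {
  const counts = { shard1: 0, shard2: 0, shard3: 0 };
  for (const log of newLogs) counts[route(chunks, keyOf(log))] += 1;
  return counts;
}

const dateLoad = tally(byDate, (l) => l.date);
const machineLoad = tally(byMachine, (l) => l.machine + "|" + l.date);
console.log(dateLoad);    // { shard1: 0, shard2: 0, shard3: 6 }
console.log(machineLoad); // { shard1: 2, shard2: 2, shard3: 2 }
```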
  • Download MongoDB at http://www.mongodb.org and let us know what you think. @eliothorowitz, @mongodb. 10gen is hiring! http://www.10gen.com/jobs