Sharding Internals
Eliot Horowitz (@eliothorowitz), MongoBerlin, October 4, 2010
MongoDB Sharding
  • Scale horizontally for data size, index size, write and consistent read scaling
  • Distribute databases, collections, or objects in a collection
  • Auto-balancing, migrations, and management happen with no downtime
  • Choose how you partition data
  • Can convert from single master to a sharded system with no downtime
  • Same features as a non-sharded single master
  • Fully consistent
Range Based
  • Collection is broken into chunks by range
  • Chunks default to 200 MB or 100,000 objects
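To make range-based chunking concrete, here is a minimal sketch in plain JavaScript of how a key could be mapped to a chunk. The chunk table, the shard names, and the `findChunk` helper are hypothetical illustrations, not MongoDB internals; the only thing taken from the slide is that a collection is divided into chunks by key range.

```javascript
// Hypothetical chunk table for a collection sharded on a numeric key.
// Each chunk covers the half-open range [min, max) and lives on one shard.
const chunks = [
  { min: -Infinity, max: 100, shard: "shardA" },
  { min: 100, max: 200, shard: "shardB" },
  { min: 200, max: Infinity, shard: "shardA" },
];

// Binary search for the chunk whose range contains the key.
// Assumes the chunks are sorted by min and cover the whole key space.
function findChunk(chunkTable, key) {
  let lo = 0;
  let hi = chunkTable.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const c = chunkTable[mid];
    if (key < c.min) hi = mid - 1;
    else if (key >= c.max) lo = mid + 1;
    else return c;
  }
  return null; // only reached if the table has a gap
}
```

Because the ranges are contiguous and sorted, a lookup costs O(log n) in the number of chunks, and a chunk can change shards without the key ranges changing.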
User profiles
  • Partition by user_id
  • Secondary indexes on location, dates, etc.
  • Reads/writes know which shard to hit
User Activity Stream
  • Shard by user_id
  • Loading a user’s stream hits a single shard
  • Writes are distributed across all shards
  • Can index on activity for deleting
Photos
  • Can shard by photo_id for best read/write distribution
  • Secondary index on tags, date
Logging: Possible Shard Keys
  • date
  • machine, date
  • logger name
Architecture
  • [Diagram] client connects to mongos ... mongos
  • Shards: groups of mongod processes
  • Config Servers: mongod, mongod, mongod
Config Servers
  • 3 of them
  • Changes are made with 2-phase commit
  • If any are down, metadata goes read-only
  • System is online as long as 1 of 3 is up
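The availability rules above can be written down as a small decision function. This is only a sketch of the stated behaviour; `clusterState` and its return shape are hypothetical names chosen for illustration.

```javascript
// Number of config servers stated on the slide.
const CONFIG_SERVERS = 3;

// Given how many config servers are reachable, report whether the
// cluster is online and whether sharding metadata can still be changed.
function clusterState(up) {
  if (!Number.isInteger(up) || up < 0 || up > CONFIG_SERVERS) {
    throw new RangeError("up must be an integer from 0 to " + CONFIG_SERVERS);
  }
  if (up === CONFIG_SERVERS) return { online: true, metadataWritable: true };
  if (up >= 1) return { online: true, metadataWritable: false };
  return { online: false, metadataWritable: false };
}
```

With one or two config servers down, reads and writes to existing chunks keep working, but anything that has to change the metadata would presumably wait until all three are back.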
Shards
  • Can be master, master/slave, or replica sets
  • Replica sets give sharding + full auto-failover
  • Regular mongod processes
mongos
  • Sharding router
  • Acts just like a mongod to clients
  • Can have 1 or as many as you want
  • Can run on the app server, so no extra network traffic
Writes
  • Inserts: require shard key, routed
  • Removes: routed and/or scattered
  • Updates: routed or scattered
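A rough sketch of the write-routing decision, assuming a collection sharded on `user_id`. The `planWrite` function and the `"routed"` / `"scattered"` labels are hypothetical. The sketch simplifies by treating any selector that names the shard key as routable, so it does not model the "and/or" case for removes.

```javascript
// Assumed shard key field for this illustration.
const SHARD_KEY = "user_id";

// Decide how a write would be dispatched, given the operation kind and
// either the document (for inserts) or the selector (for updates/removes).
function planWrite(kind, docOrSelector) {
  const hasKey = Object.prototype.hasOwnProperty.call(docOrSelector, SHARD_KEY);
  if (kind === "insert") {
    // Inserts must carry the shard key so they can go to exactly one shard.
    if (!hasKey) throw new Error("insert requires the shard key");
    return "routed";
  }
  if (kind === "update" || kind === "remove") {
    // With the shard key the write targets one shard; without it,
    // it has to be sent to every shard.
    return hasKey ? "routed" : "scattered";
  }
  throw new Error("unknown write kind: " + kind);
}
```

For example, `planWrite("remove", { location: "Berlin" })` yields `"scattered"`, while `planWrite("update", { user_id: 7 })` yields `"routed"`.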
Queries
  • By shard key: routed
  • Sorted by shard key: routed in order
  • By non shard key: scatter gather
  • Sorted by non shard key: distributed merge sort
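The "distributed merge sort" case can be pictured as a k-way merge: each shard sorts its own matches, and the router merges the sorted streams. The sketch below is an illustration in plain JavaScript, not the mongos implementation; `mergeSorted` and the sample documents are hypothetical.

```javascript
// Merge per-shard result lists, each already sorted ascending by `field`,
// into a single sorted list. Ties go to the earlier shard in the array.
function mergeSorted(shardResults, field) {
  const positions = shardResults.map(() => 0);
  const merged = [];
  for (;;) {
    let best = -1;
    for (let i = 0; i < shardResults.length; i++) {
      if (positions[i] >= shardResults[i].length) continue; // shard exhausted
      if (
        best === -1 ||
        shardResults[i][positions[i]][field] <
          shardResults[best][positions[best]][field]
      ) {
        best = i;
      }
    }
    if (best === -1) return merged; // every shard exhausted
    merged.push(shardResults[best][positions[best]]);
    positions[best] += 1;
  }
}
```

The router only ever compares the head of each shard's stream, so it never needs to re-sort the full result set.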
Operations
  • split: breaking a chunk into 2
  • migrate: move a chunk from 1 shard to another
  • balancing: moving chunks automatically to keep system in balance
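The three operations can be sketched on an in-memory chunk table. Everything here is an assumption made for illustration: the `splitChunk` and `balanceOnce` helpers, the choice to balance on chunk counts, and the `threshold` parameter. The slide only says that chunks are split, migrated, and balanced automatically.

```javascript
// split: break one chunk [min, max) into two at splitPoint.
// Both halves stay on the original shard.
function splitChunk(chunk, splitPoint) {
  if (!(splitPoint > chunk.min && splitPoint < chunk.max)) {
    throw new RangeError("split point must lie strictly inside the chunk");
  }
  return [
    { min: chunk.min, max: splitPoint, shard: chunk.shard },
    { min: splitPoint, max: chunk.max, shard: chunk.shard },
  ];
}

// balancing: one round moves a single chunk from the shard with the most
// chunks to the shard with the fewest, if the gap is at least `threshold`.
// Returns true if a chunk was migrated. Assumes every chunk's shard is
// listed in `shards`.
function balanceOnce(chunkTable, shards, threshold) {
  const counts = new Map(shards.map((s) => [s, 0]));
  for (const c of chunkTable) counts.set(c.shard, counts.get(c.shard) + 1);
  let most = shards[0];
  let least = shards[0];
  for (const s of shards) {
    if (counts.get(s) > counts.get(most)) most = s;
    if (counts.get(s) < counts.get(least)) least = s;
  }
  if (counts.get(most) - counts.get(least) < threshold) return false;
  const victim = chunkTable.find((c) => c.shard === most);
  victim.shard = least; // migrate: reassign the chunk to another shard
  return true;
}
```

Calling `balanceOnce` in a loop until it returns `false` leaves the shards within `threshold` chunks of each other. A real migration also has to copy the chunk's documents before the metadata changes, which this sketch leaves out.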
Setting it Up
  • Start servers
  • Add shards: db.runCommand( { addshard : "10.1.1.5" } )
  • Turn on partitioning: db.runCommand( { enablesharding : "test" } )
  • Shard a collection: db.runCommand( { shardcollection : "test.data" , key : { num : 1 } } )
Download MongoDB: http://www.mongodb.org
and let us know what you think: @eliothorowitz @mongodb
10gen is hiring! http://www.10gen.com/jobs

Editor's Notes

  • #3 for inconsistent read scaling
  • #7 don’t shard by date