MongoDB Roma: Replication and Sharding


Example of replication and sharding on MongoDB




  1. MongoDB – Roma, 12 July 2012. Replication and Sharding: Hands On. Guglielmo Incisa
  2. Replication: what is it
     • Data is replicated (cloned) onto at least two nodes.
     • Updates are sent to one node (the Primary) and automatically propagated to the others (Secondaries).
     • Connections can go through a router or directly to the Primary (Secondaries are read only).
       • If we connect our app server directly to the Primary, we must deal with its failure and reconnect to the new Primary.
     (Diagram: app server, DB router, Primary.)
  3. Replication: why we need it
     • If one node fails, the application server can keep working without any impact.
     • The router automatically manages the connection to the remaining nodes (the router itself may be subject to failure though).
     (Diagram: app server, DB router, Primary.)
  4. Replication: why we need it
     • More and more IT departments are moving from
       • big, proprietary, reliable and expensive servers
     • to
       • commodity hardware (smaller, less reliable, inexpensive servers: PCs).
     • Commodity hardware is less reliable, but our users demand that our applications always be available: replication can help.
     • Example: how many servers do I need to reach 99.999% availability? (See the sketch after this slide.)
       • If, for example, a single PC has 98% availability (about 8 days of downtime per year, i.e. a 2% probability of being down at any given moment)
       • -> two replicated PCs give 99.96% availability
       • -> three replicated PCs give more than 99.999% (Telecom Grade / Core Network).
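     A quick way to check the availability figures above (a minimal sketch; the 98% per-node availability is the slide's assumption, and the JavaScript can be pasted into any mongo shell):

       // Availability of n independent replicas, each with availability p:
       // the service is down only if all replicas are down, so overall availability = 1 - (1 - p)^n
       var availability = function(p, n) { return 1 - Math.pow(1 - p, n); };
       availability(0.98, 1);   // 0.98     -> one PC:        98%
       availability(0.98, 2);   // 0.9996   -> two replicas:  99.96%
       availability(0.98, 3);   // 0.999992 -> three replicas: more than 99.999%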
  5. Sharding: what is it
     • Data is partitioned and distributed to different nodes.
       • Some records are in node 1, others in node 2, etc.
     • MongoDB sharding: the partition is based on a field.
       • Database: test2
       • Collection (table): testSchema1
       • Fields:
         » owner: owner of the file, key and shard key (string)
         » date (string)
         » tags (list of strings)
         » keywords: words in the document, created by Java code (list of strings)
         » fileName (string)
         » content: the file (binary)
         » ascii: the file (string)
     (An example document is sketched after this slide.)
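     For illustration, a document in test2.testSchema1 might look like this (a hypothetical record with invented values matching the fields listed above; the binary content field is omitted):

       // Example document for test2.testSchema1 (invented values)
       db.testSchema1.insert({
           owner:    "Alice",
           date:     "2012-07-12",
           tags:     ["demo", "slides"],
           keywords: ["hello", "mongodb", "sharding"],
           fileName: "example.txt",
           ascii:    "hello mongodb sharding"
       })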
  6. Sharding: why we need it
     • Servers with smaller storage.
     • To increase responsiveness by increasing parallelism. (A routing example follows this slide.)
     (Diagram: a router in front of three shards holding owner ranges A–H, I–O and P–Z.)
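     With a range-based shard key like the one in the diagram, a query that includes the key can be routed to a single shard (a minimal sketch through mongos; "Alice" is an invented owner value and explain() output varies by version):

       // Targeted query: mongos only needs the shard owning the A-H range
       db.testSchema1.find({owner: "Alice"})
       // A query without the shard key must be sent to all shards
       db.testSchema1.find({tags: "demo"})
       // explain() should show which shard(s) were consulted
       db.testSchema1.find({owner: "Alice"}).explain()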
  7. Replication and Sharding: can we have both?
     • MongoDB: yes!
     • Our example:
       • Shard A: 2 nodes + arbiter
       • Shard B: 2 nodes + arbiter
       • Shard C: 2 nodes + arbiter
       • Config process
       • Router (mongos)
  8. Replication and Sharding
     • Replication:
       • Two nodes and an arbiter.
         • The arbiter is needed when an even number of nodes is used: it votes in the election that decides which server is Primary and which is Secondary, and makes a majority possible when one data node is down.
     • Sharding:
       • Three replica sets: A, B, C.
       • Config process:
         • <<The config servers store the cluster's metadata, which includes basic information on each shard server and the chunks contained therein.>>
       • Routing process:
         • <<The mongos process can be thought of as a routing and coordination process that makes the various components of the cluster look like a single system. When receiving client requests, the mongos process routes the request to the appropriate server(s) and merges any results to be sent back to the client.>>
  9. Setup 1
     • Start servers and arbiters
       – Create /data/db, db2, db3, db4, db5, db6, db7, db8, db9, configdb (see the note after this slide).
       – --nojournal speeds up the startup (journaling is on by default in 64-bit builds).
     • Replica set A (shard A: 2 nodes + arbiter)
       ./mongodb-linux-x86_64-2.0.4/bin/mongod --shardsvr --replSet DSSA --nojournal
       ./mongodb-linux-x86_64-2.0.4/bin/mongod --shardsvr --replSet DSSA --dbpath /data/db2 --port 27021 --nojournal
       Arbiter:
       ./mongodb-linux-x86_64-2.0.4/bin/mongod --shardsvr --replSet DSSA --dbpath /data/db7 --port 27031 --nojournal
     • Replica set B (shard B: 2 nodes + arbiter)
       ./mongodb-linux-x86_64-2.0.4/bin/mongod --shardsvr --replSet DSSB --dbpath /data/db3 --port 27023 --nojournal
       ./mongodb-linux-x86_64-2.0.4/bin/mongod --shardsvr --replSet DSSB --dbpath /data/db4 --port 27025 --nojournal
       Arbiter:
       ./mongodb-linux-x86_64-2.0.4/bin/mongod --shardsvr --replSet DSSB --dbpath /data/db8 --port 27035 --nojournal
     • Replica set C (shard C: 2 nodes + arbiter)
       ./mongodb-linux-x86_64-2.0.4/bin/mongod --shardsvr --replSet DSSC --dbpath /data/db5 --port 27027 --nojournal
       ./mongodb-linux-x86_64-2.0.4/bin/mongod --shardsvr --replSet DSSC --dbpath /data/db6 --port 27029 --nojournal
       Arbiter:
       ./mongodb-linux-x86_64-2.0.4/bin/mongod --shardsvr --replSet DSSC --dbpath /data/db9 --port 27039 --nojournal
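     Note: the data directories must exist before the mongod processes above will start (a minimal sketch, assuming the same paths as the slide and write permissions on /data):

       mkdir -p /data/db /data/db2 /data/db3 /data/db4 /data/db5 /data/db6 /data/db7 /data/db8 /data/db9 /data/configdb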
  10. Setup 2
     • Set up the replica sets: connect to each primary-to-be and apply the configuration (a verification sketch follows this slide).
     • Set replica A
       ./mongodb-linux-x86_64-2.0.4/bin/mongo --port 27018
       cfg = { _id : "DSSA", members : [
         {_id : 0, host : "hostname:27018"},
         {_id : 1, host : "hostname:27021"},
         {_id : 2, host : "hostname:27031", arbiterOnly : true} ] }
       rs.initiate(cfg)
       db.getMongo().setSlaveOk()
     • Set replica B
       ./mongodb-linux-x86_64-2.0.4/bin/mongo --port 27023
       cfg = { _id : "DSSB", members : [
         {_id : 0, host : "hostname:27023"},
         {_id : 1, host : "hostname:27025"},
         {_id : 2, host : "hostname:27035", arbiterOnly : true} ] }
       rs.initiate(cfg)
       db.getMongo().setSlaveOk()
     • Set replica C
       ./mongodb-linux-x86_64-2.0.4/bin/mongo --port 27027
       cfg = { _id : "DSSC", members : [
         {_id : 0, host : "hostname:27027"},
         {_id : 1, host : "hostname:27029"},
         {_id : 2, host : "hostname:27039", arbiterOnly : true} ] }
       rs.initiate(cfg)
       db.getMongo().setSlaveOk()
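     Each replica set takes a few seconds to elect its Primary; the election can be followed from the same shell (a minimal sketch using standard shell helpers):

       rs.status()              // lists all members with their state: PRIMARY, SECONDARY, ARBITER
       db.isMaster().ismaster   // true once the node you are connected to has become Primary
       db.isMaster().primary    // host:port of the current Primary of this replica set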
  11. Setup 3
     • Start config server
       ./mongodb-linux-x86_64-2.0.4/bin/mongod --configsvr --nojournal
     • Start router
       ./mongodb-linux-x86_64-2.0.4/bin/mongos --configdb grog:27019 --chunkSize 1
     • Configure shards
       ./mongodb-linux-x86_64-2.0.4/bin/mongo admin
       db.runCommand( { addshard : "DSSA/hostname:27018,hostname:27021" } )
       db.runCommand( { addshard : "DSSB/hostname:27023,hostname:27025" } )
       db.runCommand( { addshard : "DSSC/hostname:27027,hostname:27029" } )
       db.runCommand( { enablesharding : "test2" } )
       db.runCommand( { shardcollection : "test2.testSchema1", key : { owner : 1 } } )
     • Load data…
       – We load 11 documents; sharding is done on the "owner" field (see the sketch after this slide).
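     The load step itself is not shown on the slides; a minimal sketch of what it could look like from the mongos shell (the owner values and file names are invented, and the real demo loaded binary file content through Java code):

       use test2
       // mongos routes each insert by its shard key (owner)
       db.testSchema1.insert({owner: "Alice", date: "2012-07-12", tags: ["demo"], keywords: ["hello"], fileName: "a.txt", ascii: "hello"})
       db.testSchema1.insert({owner: "Marco", date: "2012-07-12", tags: ["demo"], keywords: ["ciao"], fileName: "m.txt", ascii: "ciao"})
       db.testSchema1.insert({owner: "Zoe", date: "2012-07-12", tags: ["demo"], keywords: ["hi"], fileName: "z.txt", ascii: "hi"})
       // Summary of shards, databases and chunk distribution for test2.testSchema1
       db.printShardingStatus()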
  12. MapReduce
     • "Map" step: the master node takes the input, divides it into smaller sub-problems, and distributes them to worker nodes. A worker node may do this again in turn, leading to a multi-level tree structure. The worker node processes the smaller problem and passes the answer back to its master node.
     • "Reduce" step: the master node then collects the answers to all the sub-problems and combines them in some way to form the output – the answer to the problem it was originally trying to solve.
     • Source: Wikipedia
  13. MapReduce
     • map = function() {
         if (!this.keywords) { return; }
         // Emit each keyword of the document with a count of 1
         for (index in this.keywords) {
           emit(this.keywords[index], 1);
         }
       }
     • reduce = function(key, values) {
         // Sum the counts emitted for the same keyword
         var count = 0;
         for (index in values) {
           count += values[index];
         }
         return count;
       }
     • result = db.runCommand({ "mapreduce" : "testSchema1", "map" : map, "reduce" : reduce, "out" : "keywords" })
       db.keywords.find()
       mongos> db.keywords.find({_id : "hello"})
  14. Check Sharding
     • Connect to the router and count the records:
       ./mongodb-linux-x86_64-2.0.4/bin/mongo admin
       mongos> use test2
       mongos> db.testSchema1.count()
       11
     • Connect to each primary (and see the number of records in each shard):
       ./mongodb-linux-x86_64-2.0.4/bin/mongo --port 27018
       mongo> use test2
       mongo> db.testSchema1.count()
       4
       ./mongodb-linux-x86_64-2.0.4/bin/mongo --port 27023
       mongo> use test2
       mongo> db.testSchema1.count()
       4
       ./mongodb-linux-x86_64-2.0.4/bin/mongo --port 27027
       mongo> use test2
       mongo> db.testSchema1.count()
       3
  15. Check Replication
     • Kill Server 1 (the Primary of shard A).
     • Connect to the router and count the records:
       mongos> use test2
       mongos> db.testSchema1.count()
       11
     • Check that (Server 2) Secondary A is now Primary (see the note after this slide).
     • Load a new chunk of data.
     • The count will be 22.
     • Restart the killed server (Server 1) and wait.
     • Kill the other one (Server 2), now the Primary of shard A.
     • Check that Server 1 is Primary again.
     • The count will still be 22.
     • Restart Server 2.
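     To see which member of shard A is acting as Primary at each step, connect a shell to whichever data node is still up and ask the replica set (a minimal sketch; port 27021 is Server 2 of shard A in this setup):

       ./mongodb-linux-x86_64-2.0.4/bin/mongo --port 27021
       rs.status()              // member states; the killed node shows up as unreachable
       db.isMaster().primary    // host:port of the member currently acting as Primary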