1. MongoDB – Roma
12 July 2012
Replication and Sharding:
Hands on
Guglielmo Incisa
2. Replication
• What is it
– Data is replicated (cloned) into at least two nodes
– Updates are sent to one node (Primary) and automatically propagated
to the others (Secondary)
– Connections can go through a router or directly to the Primary (Secondaries
are read-only)
• If we connect our app server to the Primary we must deal with its failure and
reconnect to the new Primary
[Diagram: App server connects through the Router to the Primary DB]
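Once a replica set is up (the setup steps follow on slides 9 and 10), the mongo shell can show which node is Primary. A minimal check; the port matches the setup used later in this deck:
./mongodb-linux-x86_64-2.0.4/bin/mongo --port 27018
db.isMaster()
(the "ismaster" field is true on the Primary, and the "primary" field names the current Primary; rs.status() shows the state of every member)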
3. Replication
• Why we need it
– If one node fails the application server can still work without any
impact
– The router automatically redirects connections to the remaining nodes
(though the router itself can also fail)
[Diagram: App server, Router, and replicated DB nodes; the Router routes around the failed Primary]
4. Replication
• Why we need it
– More and more IT departments are moving from
• Big, proprietary, reliable and expensive servers
– To
• Commodity Hardware (smaller, less reliable, inexpensive servers: PC)
– Commodity hardware is less reliable but our users demand that our
applications be always available: the replication can help.
– Example: how many servers do I need to reach 99.999% availability?
• If a single PC has 98% availability (roughly 7 days of downtime per year, i.e. a 2%
probability of being down at any moment)
• -> Two replicated PCs give 99.96% availability
• -> Three replicated PCs give more than 99.999% (Telecom Grade / Core Network).
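The arithmetic behind these numbers, as a small JavaScript sketch that runs in the mongo shell (the 2% per-node downtime is the assumption from the slide):
// availability of n independent replicas, each down with probability p
avail = function(p, n) { return 1 - Math.pow(p, n) }
avail(0.02, 1) // 0.98
avail(0.02, 2) // 0.9996 -> 99.96%
avail(0.02, 3) // 0.999992 -> better than 99.999%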
5. Sharding
• What is it
– Data is partitioned and distributed to different nodes
• Some records are in node 1, others in node 2 etc…
– MongoDB sharding: the partition is based on a field, the shard key.
• Database: test2
– Collection: testSchema1
– Fields:
» owner: owner of the file, key and shard key (string)
» date (string)
» tags (list of string)
» keywords: words in the document, created by Java code (list of string)
» fileName (string)
» content: the file (binary)
» ascii: the file (string)
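A document in test2.testSchema1 might look like this (a hypothetical example: the values are invented and the binary content field is omitted for brevity):
db.testSchema1.insert({
owner : "alice", // shard key
date : "2012-07-12",
tags : ["demo", "slides"],
keywords : ["hello", "world"],
fileName : "hello.txt",
ascii : "hello world"
})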
6. Sharding
• Why we need it
– Each node only stores part of the data, so servers with smaller storage suffice
– To increase responsiveness by increasing parallelism
[Diagram: Router distributing documents to three shards by owner: A-H, I-O, P-Z]
7. Replication and Sharding
• Can we have both?
– MongoDB: yes!
• Our example:
– Shard A: 2 nodes + arbiter
– Shard B: 2 nodes + arbiter
– Shard C: 2 nodes + arbiter
– Config process
– Router (mongos)
8. Replication and Sharding
• Replication:
– Two nodes and an arbiter
• The arbiter is needed when an even number of data nodes is used: it votes in the election of the
Primary and enables automatic failover when a node goes down
• Sharding
– Three sets: A, B, C
– Config Process:
• "The config servers store the cluster's metadata, which includes basic information on each shard server
and the chunks contained therein."
– Routing Process:
• "The mongos process can be thought of as a routing and coordination process that makes the various
components of the cluster look like a single system. When receiving client requests, the mongos process
routes the request to the appropriate server(s) and merges any results to be sent back to the client."
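The two processes described above can be started along these lines (a sketch: ports 27019 and 27017 are assumptions, while the --configsvr and --configdb flags are standard):
./mongodb-linux-x86_64-2.0.4/bin/mongod --configsvr --dbpath /data/configdb --port 27019
./mongodb-linux-x86_64-2.0.4/bin/mongos --configdb hostname:27019 --port 27017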
9. Setup 1
• Start Servers and arbiters
– Create /data/db, db2, db3, db4, db5, db6, db7, db8, db9, configdb
– --nojournal speeds up the startup (journaling is on by default in 64-bit builds)
• Replica set A
– ./mongodb-linux-x86_64-2.0.4/bin/mongod --shardsvr --replSet DSSA --nojournal
– ./mongodb-linux-x86_64-2.0.4/bin/mongod --shardsvr --replSet DSSA --dbpath /data/db2 --port 27021 --nojournal
– Arbiter:
– ./mongodb-linux-x86_64-2.0.4/bin/mongod --shardsvr --replSet DSSA --dbpath /data/db7 --port 27031 --nojournal
• Replica set B
– ./mongodb-linux-x86_64-2.0.4/bin/mongod --shardsvr --replSet DSSB --dbpath /data/db3 --port 27023 --nojournal
– ./mongodb-linux-x86_64-2.0.4/bin/mongod --shardsvr --replSet DSSB --dbpath /data/db4 --port 27025 --nojournal
– Arbiter:
– ./mongodb-linux-x86_64-2.0.4/bin/mongod --shardsvr --replSet DSSB --dbpath /data/db8 --port 27035 --nojournal
• Replica set C
– ./mongodb-linux-x86_64-2.0.4/bin/mongod --shardsvr --replSet DSSC --dbpath /data/db5 --port 27027 --nojournal
– ./mongodb-linux-x86_64-2.0.4/bin/mongod --shardsvr --replSet DSSC --dbpath /data/db6 --port 27029 --nojournal
– Arbiter:
– ./mongodb-linux-x86_64-2.0.4/bin/mongod --shardsvr --replSet DSSC --dbpath /data/db9 --port 27039 --nojournal
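A quick sanity check that each mongod answers (the ping command and the --eval flag are standard; repeat for each of the other ports):
./mongodb-linux-x86_64-2.0.4/bin/mongo --port 27018 --eval "db.runCommand({ping:1}).ok"
(prints 1 if the server on 27018 is up)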
10. Setup 2
• Set up the replica sets: connect to the first node of each set and load the configuration
• Set replica A
./mongodb-linux-x86_64-2.0.4/bin/mongo --port 27018
cfg = {
_id : "DSSA",
members : [
{_id : 0, host : "hostname:27018"},
{_id : 1, host : "hostname:27021"},
{_id : 2, host : "hostname:27031", arbiterOnly:true}
]
}
rs.initiate(cfg)
db.getMongo().setSlaveOk()
• Set replica B
./mongodb-linux-x86_64-2.0.4/bin/mongo --port 27023
cfg = {
_id : "DSSB",
members : [
{_id : 0, host : "hostname:27023"},
{_id : 1, host : "hostname:27025"},
{_id : 2, host : "hostname:27035", arbiterOnly:true}
]
}
rs.initiate(cfg)
db.getMongo().setSlaveOk()
• Set replica C
./mongodb-linux-x86_64-2.0.4/bin/mongo --port 27027
cfg = {
_id : "DSSC",
members : [
{_id : 0, host : "hostname:27027"},
{_id : 1, host : "hostname:27029"},
{_id : 2, host : "hostname:27039", arbiterOnly:true}
]
}
rs.initiate(cfg)
db.getMongo().setSlaveOk()
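With the config server and mongos running (see the sketch after slide 8), the shards and the shard key still have to be registered from the mongos shell. These are the standard MongoDB 2.0 commands; hostname is a placeholder:
./mongodb-linux-x86_64-2.0.4/bin/mongo --port 27017 admin
db.runCommand({ addshard : "DSSA/hostname:27018,hostname:27021" })
db.runCommand({ addshard : "DSSB/hostname:27023,hostname:27025" })
db.runCommand({ addshard : "DSSC/hostname:27027,hostname:27029" })
db.runCommand({ enablesharding : "test2" })
db.runCommand({ shardcollection : "test2.testSchema1", key : { owner : 1 } })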
12. MapReduce
• "Map" step: The master node takes the input, divides it into smaller sub-
problems, and distributes them to worker nodes. A worker node may do
this again in turn, leading to a multi-level tree structure. The worker node
processes the smaller problem, and passes the answer back to its master
node.
• "Reduce" step: The master node then collects the answers to all the sub-
problems and combines them in some way to form the output – the
answer to the problem it was originally trying to solve.
• Source: Wikipedia
13. MapReduce
• map = function(){
// skip documents that have no keywords
if(!this.keywords){
return;
}
// emit each keyword with a partial count of 1
for (var index in this.keywords){
emit(this.keywords[index],1);
}
}
• reduce = function(key, values){
// sum the partial counts emitted for this keyword
var count = 0;
for (var index in values) {
count += values[index];
}
return count;
}
• result = db.runCommand({
"mapreduce" : "testSchema1",
"map":map,
"reduce":reduce,
"out":"keywords"})
db.keywords.find()
mongos> db.keywords.find({_id:"hello"})
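The out collection holds one document per keyword; a result might look like this (the count shown is hypothetical):
{ "_id" : "hello", "value" : 3 }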
14. Check Sharding
• Connect to the router and count the records:
./mongodb-linux-x86_64-2.0.4/bin/mongo admin
mongos>use test2
mongos>db.testSchema1.count()
11
• Connect to each primary (and see the number of records in each shard):
./mongodb-linux-x86_64-2.0.4/bin/mongo --port 27018
mongo>use test2
mongo>db.testSchema1.count()
4
./mongodb-linux-x86_64-2.0.4/bin/mongo --port 27023
mongo>use test2
mongo>db.testSchema1.count()
4
./mongodb-linux-x86_64-2.0.4/bin/mongo --port 27027
mongo>use test2
mongo>db.testSchema1.count()
3
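The chunk ranges behind these counts can also be inspected from the router with a standard shell helper:
mongos> db.printShardingStatus()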
15. Check Replication
• Kill Server 1 (= Primary A)
• Connect to the router and count the records:
mongos>use test2
mongos>db.testSchema1.count()
11
• Check that (Server 2) Secondary A is now Primary
• Load a new chunk of data (another 11 records)
• The count will be 22
• Restart the killed server (Server 1), wait
• Kill the other one (Server 2, currently Primary A)
• Check that Server 1 is Primary again
• The count will still be 22
• Restart Server 2
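To watch the failover itself, poll the set status from the surviving node; rs.status() is the standard helper (port 27021 is Server 2 of shard A):
./mongodb-linux-x86_64-2.0.4/bin/mongo --port 27021
rs.status()
(the stateStr field shows PRIMARY, SECONDARY or ARBITER for each member)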