MongoDB Sharding
Uzzal Basak
Sharding is a method for distributing data across multiple machines. MongoDB uses
sharding to support deployments with very large data sets and high throughput operations.
Database systems with large data sets or high throughput applications can challenge the
capacity of a single server. For example, high query rates can exhaust the CPU capacity
of the server. Working set sizes larger than the system’s RAM stress the I/O capacity of
disk drives.
There are two methods for addressing system growth: vertical and horizontal scaling.
Vertical Scaling involves increasing the capacity of a single server, such as using a more
powerful CPU, adding more RAM, or increasing the amount of storage space. Limitations
in available technology may restrict a single machine from being sufficiently powerful
for a given workload. Additionally, Cloud-based providers have hard ceilings based on
available hardware configurations. As a result, there is a practical maximum for vertical scaling.
Horizontal Scaling involves dividing the system dataset and load over multiple servers,
adding additional servers to increase capacity as required. While the overall speed or
capacity of a single machine may not be high, each machine handles a subset of the overall
workload, potentially providing better efficiency than a single high-speed high-capacity
server. Expanding the capacity of the deployment only requires adding additional servers
as needed, which can be a lower overall cost than high-end hardware for a single machine.
The trade-off is increased complexity in infrastructure and maintenance for the deployment.
Sharded Cluster
A MongoDB sharded cluster consists of the following components:
• shard: Each shard contains a subset of the sharded data. Each shard can be
deployed as a replica set.
• mongos: The mongos acts as a query router, providing an interface between
client applications and the sharded cluster.
• config servers: Config servers store metadata and configuration settings for the cluster.
A replica set in MongoDB is a group of mongod processes that maintain the same data set. Replica
sets provide redundancy and high availability, and are the basis for all production deployments.
Advantages of a Replica Set:
1. It ensures data availability during a disaster. If the primary fails because of a hardware issue,
the data is still available on the secondaries.
2. We can run read (select) queries against a secondary, which reduces the load on the primary.
3. We can configure delayed replication on a secondary, which protects against accidental data corruption
by a developer. Suppose we set a one-hour replication delay: that secondary always lags one hour behind
the primary. If a developer does something by accident, we can recover from the delayed
secondary, so we can treat this secondary as a hot backup.
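As a rough illustration of point 3, a delayed secondary is described by one entry in the replica set configuration. The sketch below only builds such an entry as a plain Python dictionary. The host name is a placeholder, and the field names assume the MongoDB 3.6 series used later in this document, where the delay is set with slaveDelay (newer releases rename that field).

```python
# Minimal sketch: the member entry for a delayed secondary.
# Assumptions: MongoDB 3.6-style field names; the host value is a placeholder.

def delayed_member(member_id, host, delay_seconds=3600):
    """Return a replica set member document for a delayed secondary."""
    return {
        "_id": member_id,
        "host": host,
        "priority": 0,                # a delayed member must never become primary
        "hidden": True,               # keep application reads away from stale data
        "slaveDelay": delay_seconds,  # how far this member lags, in seconds
    }


member = delayed_member(2, "example-host:5003")
print(member)
```

A document like this would go into the members array of the configuration passed to rs.reconfig() in the mongo shell.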
When Should We Go for Sharding?
Sharding is the most complex architecture we can deploy using MongoDB, and there are two main
approaches as to when to shard or not. The first is to configure the cluster as soon as possible – when
we predict high throughput and fast data growth.
The second says we should use a cluster as the best alternative when the application demands more
resources than the replica set can offer (such as low memory, an overloaded disk or high processor
load). This approach is more corrective than preventative, but we will discuss that in the future.
1) Disaster recovery plan
Disaster recovery (DR) is a very delicate topic: how long would you tolerate an outage? If necessary, how
long would it take to restore the entire database? Depending on the database size and on disk speed, a
backup/restore process might take hours or even days!
There is no hard number in gigabytes to justify a cluster. But in general, you should consider one when the
database is larger than 200GB, because the backup and restore processes might take a while to finish.
Let’s consider the case where we have a replica set with a 300GB database. The full restore process
might last around four hours, whereas if the database has two shards, it will take about two hours –
and depending on the number of shards we can improve that time. Simple math: if there are two shards,
the restore process takes half of the time to restore when compared to a single replica set.
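The "simple math" above can be sketched as a small calculation. This is only a back-of-the-envelope model: it assumes the data is spread evenly across shards, that all shards are restored in parallel, and that the restore rate per shard stays constant. The 75 GB per hour rate is an assumption chosen so that 300GB takes four hours, matching the example.

```python
# Back-of-the-envelope restore-time model (assumes even data distribution
# and fully parallel restores; the restore rate is an illustrative value).

def restore_hours(db_size_gb, restore_rate_gb_per_hour, shards=1):
    """Estimate wall-clock restore time when each shard restores in parallel."""
    if shards < 1:
        raise ValueError("shards must be at least 1")
    per_shard_gb = db_size_gb / shards
    return per_shard_gb / restore_rate_gb_per_hour


single_replica_set = restore_hours(300, 75)        # one replica set
two_shards = restore_hours(300, 75, shards=2)      # same data on two shards
print(single_replica_set, two_shards)
```

In practice uneven chunk distribution or shared network and disk bandwidth would make the gain smaller than this ideal figure.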
2) Hardware limitations
Disk and memory are inexpensive nowadays. However, this is not true when companies need to scale
out to high numbers (such as TB of RAM). Suppose a cloud provider can only offer up to 5,000 IOPS
in the disk subsystem, but the application needs more than that to work correctly. To work around this
performance limitation, it is better to start a cluster and divide the writes among instances. That said,
if there are two shards the application will have 10,000 IOPS available to use for writes and reads in
the disk subsystem.
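The same reasoning can be turned around to estimate how many shards a workload needs. The sketch below assumes every shard gets the provider's full IOPS limit and that the load is spread evenly; both function names are made up for this illustration.

```python
import math

# Rough capacity estimate (assumes each shard gets the full per-instance
# IOPS limit and that the workload is spread evenly across shards).

def total_iops(shards, iops_per_shard):
    """Aggregate IOPS available to the cluster."""
    return shards * iops_per_shard


def shards_needed(required_iops, iops_per_shard):
    """Smallest number of shards whose combined IOPS covers the requirement."""
    return math.ceil(required_iops / iops_per_shard)


print(total_iops(2, 5000))
print(shards_needed(12000, 5000))
```

A poorly chosen shard key can concentrate writes on one shard, in which case the usable figure stays close to a single instance's limit.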
3) Storage engine limitations
There are a few storage engine limitations that can be a bottleneck. MMAPv1 has a lock per
collection, while WiredTiger has tickets that limit the number of writes and reads happening
concurrently. Although we can tweak the number of tickets available in WiredTiger, there is a virtual
limit – which means that changing the available tickets might generate processor overload instead of
increasing performance. If one of these situations becomes a bottleneck in the system, we can start a
cluster. Once we shard the collection, the load and locks are distributed among the different instances.
4) Hot data vs. cold data
Several databases only work with a small percentage of the data being stored. This is called hot data
or working set. Cold data or historical data is rarely read, and demands considerable system resources
when it is. So why spend money on expensive machines that only store cold data or low-value
data? With a cluster deployment we can choose where the cold data is stored, and use cheap devices
and disks to do so. The same is true for hot data – we can use better machines to have better
performance. This methodology also speeds up writes and reads on the hot data, as the indexes are
smaller and add less overhead to the system.
5) Geo-distributed data
It doesn’t matter whether this need comes from application design or legal compliance. If the data
must stay within continent or country borders, a cluster helps make that happen. It is possible to limit
data localization so that it is stored solely in a specific “part of the world.” The number of shards and
their geographic positions is not essential for the application, as it only views the database. This is
commonly used in worldwide companies for better performance, or simply to comply with the local law.
6) Infrastructure limitations
Infrastructure and hardware limitations are very similar. When thinking about infrastructure, however,
we focus on specific cases when the instances should be small. An example is running MongoDB on
Mesos. Some providers only offer a few cores and a limited amount of RAM. Even if you are willing
to pay more for that, it is not possible to purchase more than they offer as their products. A cluster
provides the option to split a small amount of data among a lot of shards, reaching the same
performance a big and expensive machine provides.
7) Failure isolation
Consider that a replica set or a single instance holds all the data. If for any reason this instance/replica
set goes down, the whole application goes down. In a cluster, if we lose one of the five shards, 80%
of the data is still available. Running a few shards helps to isolate failures. Obviously, running a bunch
of instances makes the cluster prone to have a failed instance, but as each shard must have at least
three instances the probability of the entire shard being down is minimal. For providers that offer
different zones, it is good practice to have different members of the shard in different availability zones
(or even different regions).
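The two numbers behind this argument can be sketched as follows. The model assumes data is evenly distributed across shards and that members fail independently, which co-located instances would violate. It also counts a shard as down only when every member is down, as the text does, although losing a majority of members already blocks writes. The 1% failure probability is purely illustrative.

```python
# Illustrative failure-isolation arithmetic (assumes evenly distributed data
# and independent member failures; the probabilities are made-up examples).

def available_fraction(total_shards, failed_shards):
    """Fraction of the data still reachable when some shards are down."""
    return (total_shards - failed_shards) / total_shards


def shard_outage_probability(member_failure_probability, members=3):
    """Probability that every member of one shard is down at the same time."""
    return member_failure_probability ** members


print(available_fraction(5, 1))
print(shard_outage_probability(0.01))
```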
8) Speed up queries
Queries can take too long, depending on the number of reads they perform. In a clustered deployment,
queries can run in parallel and speed up the query response time. If a query runs in ten seconds in a
replica set, it is very likely that the same query will run in five to six seconds if the cluster has two
shards, and so on.
The sharding configuration is described below.
I have only two VirtualBox machines, so the 3-node config server replica set, the 3-node Shard 1 replica set, and the mongos
process will all run on these machines.
Config servers: ports 7001, 7002, 7003
Shard process running machine:
mongos: port 4000
Shard 1: ports 5001, 5002, 5003
Target 1: Create a replica set for Shard 1 and test that the replica set is working.
Target 2: Create a config server as a 3-node replica set with its sharding role.
Target 3: Create a mongos process and add the config server to the mongos process.
Target 4: Add the sharding role to Shard 1 and restart it with the new config file.
Target 5: Add Shard 1 to the mongos process.
Create the directories for the keyfile, datafiles, and logfiles on both nodes.

Shard 1 replica set:
Datafile                                         Logfile
/home/oracle/mongo_shard/mongoShard1_1/datafile  /home/oracle/mongo_shard/mongoShard1_1/logfile
/home/oracle/mongo_shard/mongoShard1_2/datafile  /home/oracle/mongo_shard/mongoShard1_2/logfile
/home/oracle/mongo_shard/mongoShard1_3/datafile  /home/oracle/mongo_shard/mongoShard1_3/logfile

Config server replica set:
Datafile                                         Logfile
/home/oracle/mongo_shard/mongoConfig_1/datafile  /home/oracle/mongo_shard/mongoConfig_1/logfile
/home/oracle/mongo_shard/mongoConfig_2/datafile  /home/oracle/mongo_shard/mongoConfig_2/logfile
/home/oracle/mongo_shard/mongoConfig_3/datafile  /home/oracle/mongo_shard/mongoConfig_3/logfile

mongos:
Datafile                                         Logfile
N/a                                              /home/oracle/mongo_shard/Shard/logfile
Generate the keyfile and transfer it to the other node:
[23:11:51 oracle@test2 keyfile]$ openssl rand -base64 741 > /home/oracle/mongodb/keyfile/keyfile
[23:12:15 oracle@test2 keyfile]$ chmod 600 /home/oracle/mongodb/keyfile/keyfile
[23:12:24 oracle@test2 keyfile]$ scp /home/oracle/mongodb/keyfile/keyfile oracle@ansible:/home/oracle/mongodb/keyfile
oracle@ansible's password:
100% 1004 648.0KB/s 00:00
[23:13:29 oracle@test2 keyfile]$
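For environments where openssl is not at hand, the keyfile step can be approximated in Python. This is only a sketch of the same idea (base64-encoded random bytes in a file readable by its owner alone), not the method used above. The function name is made up, the demo writes into a temporary directory rather than the real keyfile path, and the permission handling assumes a POSIX system.

```python
import base64
import os
import stat
import tempfile

# Sketch of "openssl rand -base64 741" plus "chmod 600" (POSIX assumed).

def write_keyfile(path, random_bytes=741):
    """Write base64-encoded random bytes to path with owner-only permissions."""
    encoded = base64.b64encode(os.urandom(random_bytes))
    # Create the file with mode 600 from the start, then enforce it explicitly.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "wb") as handle:
        handle.write(encoded + b"\n")
    os.chmod(path, 0o600)
    return len(encoded)


# Demo in a throwaway directory so no real keyfile is overwritten.
with tempfile.TemporaryDirectory() as workdir:
    demo_path = os.path.join(workdir, "keyfile")
    encoded_length = write_keyfile(demo_path)
    demo_mode = stat.S_IMODE(os.stat(demo_path).st_mode)

print(encoded_length, oct(demo_mode))
```

741 random bytes encode to 988 base64 characters, which stays under the 1024-character limit MongoDB places on keyfile contents.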
Target 1: Create a replica set for Shard 1 and test that the replica set is working
Config files for all three nodes
Node 1:
replication:
  replSetName: shard1
storage:
  dbPath: /home/oracle/mongo_shard/mongoShard1_1/datafile
net:
  port: 5001
security:
  authorization: enabled
  keyFile: /home/oracle/mongodb/keyfile/keyfile
systemLog:
  destination: file
  path: /home/oracle/mongo_shard/mongoShard1_1/logfile/mongod.log
  logAppend: true
processManagement:
  fork: true

Node 2:
replication:
  replSetName: shard1
storage:
  dbPath: /home/oracle/mongo_shard/mongoShard1_2/datafile
net:
  port: 6002
security:
  authorization: enabled
  keyFile: /home/oracle/mongodb/keyfile/keyfile
systemLog:
  destination: file
  path: /home/oracle/mongo_shard/mongoShard1_2/logfile/mongod.log
  logAppend: true
processManagement:
  fork: true

Node 3:
replication:
  replSetName: shard1
storage:
  dbPath: /home/oracle/mongo_shard/mongoShard1_3/datafile
net:
  port: 5003
security:
  authorization: enabled
  keyFile: /home/oracle/mongodb/keyfile/keyfile
systemLog:
  destination: file
  path: /home/oracle/mongo_shard/mongoShard1_3/logfile/mongod.log
  logAppend: true
processManagement:
  fork: true
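Because the three files differ only in the node number and the port, they could also be generated from one template. The sketch below is one hypothetical way to do that. The section names (replication, storage, net, security, systemLog, processManagement) follow the standard MongoDB configuration file layout, and the helper name and its defaults are assumptions for illustration.

```python
# Hypothetical generator for the three Shard 1 config files.
# The template mirrors the settings listed above; nothing is written to disk.

CONFIG_TEMPLATE = """\
replication:
  replSetName: {repl_set}
storage:
  dbPath: {base}/mongoShard1_{node}/datafile
net:
  port: {port}
security:
  authorization: enabled
  keyFile: /home/oracle/mongodb/keyfile/keyfile
systemLog:
  destination: file
  path: {base}/mongoShard1_{node}/logfile/mongod.log
  logAppend: true
processManagement:
  fork: true
"""


def render_config(node, port, repl_set="shard1", base="/home/oracle/mongo_shard"):
    """Fill the template for one replica set member."""
    return CONFIG_TEMPLATE.format(node=node, port=port, repl_set=repl_set, base=base)


# Node numbers and ports as used in this walkthrough.
configs = {node: render_config(node, port) for node, port in [(1, 5001), (2, 6002), (3, 5003)]}
print(configs[2])
```

Each rendered string could then be saved as the node's config file and passed to mongod -f.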
Start the mongod daemon process on both hosts using the config files
Node 1
[02:02:39 oracle@test2 bin]$ mongod -f config_file/shard1/node1.conf
about to fork child process, waiting until server is ready for connections.
forked process: 6033
child process started successfully, parent exiting
Node 2
[23:34:11 oracle@ansible bin]$ mongod -f config_file/shard1/node2.conf
about to fork child process, waiting until server is ready for connections.
forked process: 9687
child process started successfully, parent exiting
[23:34:27 oracle@ansible bin]$
Node 3
[02:05:31 oracle@test2 bin]$ mongod -f config_file/shard1/node3.conf
about to fork child process, waiting until server is ready for connections.
forked process: 6102
child process started successfully, parent exiting
ReplicaSet Configuration
Connecting to node1:
[02:06:19 oracle@test2 bin]$ mongo --port 5001
MongoDB shell version v3.6.8
connecting to: mongodb://
MongoDB server version: 3.6.8
MongoDB Enterprise >
Replication Initialization
MongoDB Enterprise > rs.initiate()
MongoDB user Creation
MongoDB Enterprise > use admin
switched to db admin
MongoDB Enterprise > db.createUser({
... user: "admin",
... pwd: "pass",
... roles: [
... {role: "root", db: "admin"}
... ]
... })
Successfully added user: {
    "user" : "admin",
    "roles" : [
        {
            "role" : "root",
            "db" : "admin"
        }
    ]
}
MongoDB Enterprise shard1:PRIMARY>
Connect with admin user
mongo --host "shard1/" -u "admin" -p "pass" --authenticationDatabase
Add the other nodes
Now test that replication is working
The primary changes
Target 2: Create a config server as a 3-node replica set with its sharding role.
This Part So Far
Config servers store the metadata for a sharded cluster. The metadata reflects state and organization
for all data and components within the sharded cluster. The metadata includes the list of chunks on
every shard and the ranges that define the chunks.
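To make the idea of chunk ranges concrete, the sketch below models the metadata as a sorted list of lower bounds and looks up which shard owns a given shard key value. It is a simplified picture, not how the config servers store the data: the chunk boundaries, the integer shard key, and the shard names are all invented for the example.

```python
import bisect

# Simplified chunk metadata: each chunk covers [lower_bound, next_lower_bound).
# Boundaries and shard names are hypothetical; the list is sorted by bound.
CHUNKS = [
    (float("-inf"), "shard1"),
    (100, "shard2"),
    (200, "shard1"),
    (300, "shard2"),
]


def shard_for_key(chunks, key):
    """Return the shard that owns the chunk containing the shard key value."""
    bounds = [lower for lower, _ in chunks]
    # The owning chunk is the last one whose lower bound is <= key.
    index = bisect.bisect_right(bounds, key) - 1
    return chunks[index][1]


for key in (50, 100, 250, 1000):
    print(key, shard_for_key(CHUNKS, key))
```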
Config files for all three nodes of the config server replica set
Node 1:
sharding:
  clusterRole: configsvr
replication:
  replSetName: Config
security:
  keyFile: /home/oracle/mongodb/keyfile/keyfile
net:
  bindIp: localhost,
  port: 7001
systemLog:
  destination: file
  path: /home/oracle/mongo_shard/mongoConfig_1/logfile/csrsvr.log
  logAppend: true
processManagement:
  fork: true
storage:
  dbPath: /home/oracle/mongo_shard/mongoConfig_1/datafile

Node 2:
sharding:
  clusterRole: configsvr
replication:
  replSetName: Config
security:
  keyFile: /home/oracle/mongodb/keyfile/keyfile
net:
  bindIp: localhost,
  port: 7002
systemLog:
  destination: file
  path: /home/oracle/mongo_shard/mongoConfig_2/logfile/csrsvr.log
  logAppend: true
processManagement:
  fork: true
storage:
  dbPath: /home/oracle/mongo_shard/mongoConfig_2/datafile

Node 3:
sharding:
  clusterRole: configsvr
replication:
  replSetName: Config
security:
  keyFile: /home/oracle/mongodb/keyfile/keyfile
net:
  bindIp: localhost,
  port: 7003
systemLog:
  destination: file
  path: /home/oracle/mongo_shard/mongoConfig_3/logfile/csrsvr.log
  logAppend: true
processManagement:
  fork: true
storage:
  dbPath: /home/oracle/mongo_shard/mongoConfig_3/datafile
Start the mongod daemon process for all config servers using the config files
Node 1
[00:31:31 oracle@ansible bin]$ mongod -f
about to fork child process, waiting until server is ready for connections.
forked process: 13992
child process started successfully, parent exiting
Node 2
[03:03:18 oracle@test2 bin]$ mongod -f
about to fork child process, waiting until server is ready for connections.
forked process: 7168
child process started successfully, parent exiting
[03:03:34 oracle@test2 bin]$
Node 3
[00:31:44 oracle@ansible bin]$ mongod -f
about to fork child process, waiting until server is ready for connections.
forked process: 14031
child process started successfully, parent exiting
Connect to one of the config servers:
mongo --port 7001
Initiating the CSRS:
Creating super user on CSRS:
use admin
db.createUser({
  user: "admin",
  pwd: "pass",
  roles: [
    {role: "root", db: "admin"}
  ]
})
Authenticating as the super user:
db.auth("admin", "pass")
Add the second and third node to the CSRS:
Target 3: Create a mongos process and add the config server to the mongos process
MongoDB mongos instances route queries and write operations to shards in a
sharded cluster. mongos provides the only interface to a sharded cluster from the
perspective of applications. Applications never connect or communicate directly with
the shards.
The mongos tracks what data is on which shard by caching the metadata from
the config servers. The mongos uses the metadata to route operations from
applications and clients to the mongod instances. A mongos has no persistent state
and consumes minimal system resources.
The most common practice is to run mongos instances on the same systems as your
application servers, but you can maintain mongos instances on the shards or on
other dedicated resources.
Routing and Results Process
A mongos instance routes a query to a cluster by:
1. Determining the list of shards that must receive the query.
2. Establishing a cursor on all targeted shards.
The mongos then merges the data from each of the targeted shards and returns the
result document. Certain query modifiers, such as sorting, are performed on a shard
such as the primary shard before mongos retrieves the results.
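The routing steps above can be sketched in a few lines. In this toy model each shard filters and sorts its own documents, and the router merges the already-sorted streams, which mirrors the scatter-and-merge pattern described here rather than the real mongos implementation. The in-memory shard contents, field names, and function names are invented for the example.

```python
import heapq

# Toy data set: two shards, each holding some documents (all values made up).
SHARD_DATA = {
    "shard1": [{"_id": 1, "score": 10}, {"_id": 4, "score": 40}],
    "shard2": [{"_id": 2, "score": 25}, {"_id": 3, "score": 30}],
}


def run_on_shard(shard, predicate, sort_key):
    """Stand-in for a per-shard cursor: filter and sort locally on the shard."""
    matches = [doc for doc in SHARD_DATA[shard] if predicate(doc)]
    return sorted(matches, key=lambda doc: doc[sort_key])


def route_query(target_shards, predicate, sort_key):
    """Open a 'cursor' on every targeted shard and merge the sorted results."""
    cursors = [run_on_shard(shard, predicate, sort_key) for shard in target_shards]
    return list(heapq.merge(*cursors, key=lambda doc: doc[sort_key]))


result = route_query(["shard1", "shard2"], lambda doc: doc["score"] >= 20, "score")
print([doc["_id"] for doc in result])
```

Because every shard returns its results already sorted, the router only needs a streaming merge instead of a full re-sort.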
This Part So Far
sharding:
  configDB: Config/,,
security:
  keyFile: /home/oracle/mongodb/keyfile/keyfile
net:
  bindIp: localhost,
  port: 4000
systemLog:
  destination: file
  path: /home/oracle/mongo_shard/Shard/logfile/shard.log
  logAppend: true
processManagement:
  fork: true
The log shows it is also working fine
Target 4: Add the sharding role to Shard 1 and restart it with the new config file
Check Current Shard Status
mongo --port 4000 --username admin --password pass --authenticationDatabase admin
Update Config file for Shard 1
Previous config file:
replication:
  replSetName: shard1
storage:
  dbPath: /home/oracle/mongo_shard/mongoShard1_1/datafile
net:
  port: 5001
security:
  authorization: enabled
  keyFile: /home/oracle/mongodb/keyfile/keyfile
systemLog:
  destination: file
  path: /home/oracle/mongo_shard/mongoShard1_1/logfile/mongod.log
  logAppend: true
processManagement:
  fork: true

New config file:
sharding:
  clusterRole: shardsvr
storage:
  dbPath: /home/oracle/mongo_shard/mongoShard1_1/datafile
  wiredTiger:
    engineConfig:
      cacheSizeGB: .1
net:
  port: 5001
security:
  keyFile: /home/oracle/mongodb/keyfile/keyfile
systemLog:
  destination: file
  path: /home/oracle/mongo_shard/mongoShard1_1/logfile/mongod.log
  logAppend: true
processManagement:
  fork: true
replication:
  replSetName: shard1
mongo --port 5001 -u "admin" -p "pass" --authenticationDatabase "admin"
Now start the node with the new config file above.
Check the replica set status and make it primary.
Current primary:
mongo --port 5003 -u "admin" -p "pass" --authenticationDatabase "admin"
Shut down Node 2 and start it with the new config file
mongo --port 6002 -u "admin" -p "pass" --authenticationDatabase "admin"
Previous config file:
replication:
  replSetName: shard1
storage:
  dbPath: /home/oracle/mongo_shard/mongoShard1_2/datafile
net:
  port: 6002
security:
  authorization: enabled
  keyFile: /home/oracle/mongodb/keyfile/keyfile
systemLog:
  destination: file
  path: /home/oracle/mongo_shard/mongoShard1_2/logfile/mongod.log
  logAppend: true
processManagement:
  fork: true

New config file:
sharding:
  clusterRole: shardsvr
storage:
  dbPath: /home/oracle/mongo_shard/mongoShard1_2/datafile
  wiredTiger:
    engineConfig:
      cacheSizeGB: .1
net:
  port: 6002
security:
  keyFile: /home/oracle/mongodb/keyfile/keyfile
systemLog:
  destination: file
  path: /home/oracle/mongo_shard/mongoShard1_2/logfile/mongod.log
  logAppend: true
processManagement:
  fork: true
replication:
  replSetName: shard1
Start the mongod process
Shut down Node 3 and start it with the new config file
mongo --port 5003 -u "admin" -p "pass" --authenticationDatabase "admin"
Previous config file:
replication:
  replSetName: shard1
storage:
  dbPath: /home/oracle/mongo_shard/mongoShard1_3/datafile
net:
  port: 5003
security:
  authorization: enabled
  keyFile: /home/oracle/mongodb/keyfile/keyfile
systemLog:
  destination: file
  path: /home/oracle/mongo_shard/mongoShard1_3/logfile/mongod.log
  logAppend: true
processManagement:
  fork: true

New config file:
sharding:
  clusterRole: shardsvr
storage:
  dbPath: /home/oracle/mongo_shard/mongoShard1_3/datafile
  wiredTiger:
    engineConfig:
      cacheSizeGB: .1
net:
  port: 5003
security:
  keyFile: /home/oracle/mongodb/keyfile/keyfile
systemLog:
  destination: file
  path: /home/oracle/mongo_shard/mongoShard1_3/logfile/mongod.log
  logAppend: true
processManagement:
  fork: true
replication:
  replSetName: shard1
Start Node 3 with new config file
mongod -f config_file/shard1/node3.conf
and connect it
mongo --port 5003 -u "admin" -p "pass" --authenticationDatabase "admin"
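The restarts above touch one member at a time so the replica set stays available. A common way to order such a rolling restart is to handle the secondaries first and the primary last, after it has stepped down. The sketch below only computes that order from a list of member states; the member names are placeholders and the helper is hypothetical, not a MongoDB command.

```python
# Hypothetical helper: decide the order of a rolling restart.
# Secondaries go first; the primary is restarted last, after stepping down.

def rolling_restart_order(members):
    """members: list of (name, state) pairs, state is 'PRIMARY' or 'SECONDARY'."""
    secondaries = [name for name, state in members if state == "SECONDARY"]
    primaries = [name for name, state in members if state == "PRIMARY"]
    return secondaries + primaries


# Placeholder member names using the ports from this walkthrough.
order = rolling_restart_order([
    ("node3:5003", "PRIMARY"),
    ("node1:5001", "SECONDARY"),
    ("node2:6002", "SECONDARY"),
])
print(order)
```

Restarting the primary last means only one election is needed during the whole procedure.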
Target 5: Add Shard 1 to the mongos process
Connect to the sharded cluster
mongo --port 4000 --username admin --password pass --authenticationDatabase
Add the new shard to the cluster from mongos:
The output shows that the shard was added
The log shows
Check the shard status
Then we can add a new shard named Shard 2 in the same way, as in the diagram below

Oracle Audit vaultuzzal basak
Schema replication using oracle golden gate 12c
Schema replication using oracle golden gate 12cSchema replication using oracle golden gate 12c
Schema replication using oracle golden gate 12cuzzal basak
Oracle data guard configuration in 12c
Oracle data guard configuration in 12cOracle data guard configuration in 12c
Oracle data guard configuration in 12cuzzal basak

More from uzzal basak (11)

Elk presentation 2#3
Elk presentation 2#3Elk presentation 2#3
Elk presentation 2#3
Elk presentation1#3
Elk presentation1#3Elk presentation1#3
Elk presentation1#3
Oracle goldengate 11g schema replication from standby database
Oracle goldengate 11g schema replication from standby databaseOracle goldengate 11g schema replication from standby database
Oracle goldengate 11g schema replication from standby database
12c db upgrade from
12c db upgrade from db upgrade from
12c db upgrade from
Encrypt and decrypt in solaris system
Encrypt and decrypt in solaris systemEncrypt and decrypt in solaris system
Encrypt and decrypt in solaris system
Oracle table partition step
Oracle table partition stepOracle table partition step
Oracle table partition step
Oracle business intelligence enterprise edition 11g
Oracle business intelligence enterprise edition 11gOracle business intelligence enterprise edition 11g
Oracle business intelligence enterprise edition 11g
EMC Networker installation Document
EMC Networker installation DocumentEMC Networker installation Document
EMC Networker installation Document
Oracle Audit vault
Oracle Audit vaultOracle Audit vault
Oracle Audit vault
Schema replication using oracle golden gate 12c
Schema replication using oracle golden gate 12cSchema replication using oracle golden gate 12c
Schema replication using oracle golden gate 12c
Oracle data guard configuration in 12c
Oracle data guard configuration in 12cOracle data guard configuration in 12c
Oracle data guard configuration in 12c

Recently uploaded

A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge

Recently uploaded (20)

A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf

MongoDB Sharding Explained

  • 2. MongoDB Sharding
    Sharding is a method for distributing data across multiple machines. MongoDB uses sharding to support deployments with very large data sets and high throughput operations.
    Database systems with large data sets or high throughput applications can challenge the capacity of a single server. For example, high query rates can exhaust the CPU capacity of the server. Working set sizes larger than the system’s RAM stress the I/O capacity of disk drives.
    There are two methods for addressing system growth: vertical and horizontal scaling.
    Vertical Scaling involves increasing the capacity of a single server, such as using a more powerful CPU, adding more RAM, or increasing the amount of storage space. Limitations in available technology may restrict a single machine from being sufficiently powerful for a given workload. Additionally, cloud-based providers have hard ceilings based on available hardware configurations. As a result, there is a practical maximum for vertical scaling.
  • 3. Horizontal Scaling involves dividing the system dataset and load over multiple servers, adding additional servers to increase capacity as required. While the overall speed or capacity of a single machine may not be high, each machine handles a subset of the overall workload, potentially providing better efficiency than a single high-speed high-capacity server. Expanding the capacity of the deployment only requires adding additional servers as needed, which can be a lower overall cost than high-end hardware for a single machine. The trade-off is increased complexity in infrastructure and maintenance for the deployment.
    Sharded Cluster
    A MongoDB sharded cluster consists of the following components:
      - shard: Each shard contains a subset of the sharded data. Each shard can be deployed as a replica set.
      - mongos: The mongos acts as a query router, providing an interface between client applications and the sharded cluster.
      - config servers: Config servers store metadata and configuration settings for the cluster.
  • 4. A replica set in MongoDB is a group of mongod processes that maintain the same data set. Replica sets provide redundancy and high availability, and are the basis for all production deployments.
    Advantages of a replica set:
      1. It ensures data availability during a disaster. If the primary fails because of a hardware issue, the data is not lost.
      2. We can run read queries against a secondary, which reduces the load on the primary.
      3. We can configure delayed replication on a secondary, which protects against data corruption caused by a developer's mistake. With a 1-hour delay, the secondary always lags the primary by 1 hour. If a developer damages data accidentally, we can recover it from the delayed secondary, so we can consider that secondary a hot backup.
    When do we go for sharding?
    Sharding is the most complex architecture we can deploy using MongoDB, and there are two main approaches to deciding when to shard. The first is to configure the cluster as soon as possible, when we predict high throughput and fast data growth. The second says we should use a cluster as the best alternative when the application demands more resources than the replica set can offer (such as low memory, an overloaded disk or high processor load). This approach is more corrective than preventative, but we will discuss that in the future.
    1) Disaster recovery plan
    Disaster recovery (DR) is a very delicate topic: how long could you tolerate an outage? If necessary, how long would it take to restore the entire database? Depending on the database size and on disk speed, a backup/restore process might take hours or even days! There is no hard number in gigabytes that justifies a cluster. In general, though, you should consider one when the database is larger than 200GB, because the backup and restore processes might take a while to finish. Let’s consider the case where we have a replica set with a 300GB database.
    The full restore process might last around four hours, whereas if the database has two shards, it will take about two hours, and adding more shards can improve that time further. Simple math: with two shards, the restore takes half the time it takes for a single replica set.
    2) Hardware limitations
    Disk and memory are inexpensive nowadays. However, this is not true when companies need to scale to high numbers (such as terabytes of RAM). Suppose a cloud provider can only offer up to 5,000 IOPS in the disk subsystem, but the application needs more than that to work correctly. To work around this performance limitation, it is better to start a cluster and divide the writes among instances. With two shards, the application has 10,000 IOPS available for writes and reads in the disk subsystem.
  • 5. 3) Storage engine limitations
    There are a few storage engine limitations that can become a bottleneck. MMAPv1 has a lock per collection, while WiredTiger uses tickets that limit the number of writes and reads happening concurrently. Although we can tweak the number of tickets available in WiredTiger, there is a practical limit: raising the number of tickets might overload the processor instead of increasing performance. If one of these situations becomes a bottleneck in the system, we can start a cluster. Once we shard the collection, the load and the locks are distributed among the different instances.
    4) Hot data vs. cold data
    Many databases actively work with only a small percentage of the data being stored. This is called hot data or the working set. Cold data or historical data is rarely read, and demands considerable system resources when it is. So why spend money on expensive machines that only store cold or low-value data? With a cluster deployment we can choose where the cold data is stored, and use cheap devices and disks to do so. The same is true for hot data: we can use better machines to get better performance. This approach also speeds up writes and reads on the hot data, as the indexes are smaller and add less overhead to the system.
    5) Geo-distributed data
    It doesn’t matter whether this need comes from application design or legal compliance. If the data must stay within continent or country borders, a cluster helps make that happen. It is possible to restrict data so that it is stored solely in a specific “part of the world.” The application does not need to know the number of shards or their geographic positions, as it only sees the database. This is commonly used in worldwide companies for better performance, or simply to comply with local law.
    6) Infrastructure limitations
    Infrastructure and hardware limitations are very similar.
    When thinking about infrastructure, however, we focus on specific cases when the instances should be small. An example is running MongoDB on Mesos. Some providers only offer a few cores and a limited amount of RAM. Even if you are willing to pay more, it is not possible to purchase more than they offer. A cluster provides the option to spread the data in small amounts across many shards, reaching the same performance a big and expensive machine provides.
    7) Failure isolation
    Consider that a replica set or a single instance holds all the data. If for any reason this instance or replica set goes down, the whole application goes down. In a cluster, if we lose one of five shards, 80% of the data is still available. Running a few shards helps to isolate failures. Obviously, running many instances makes it more likely that one of them fails, but as each shard must have at least three instances, the probability of an entire shard being down is minimal. For providers that offer different zones, it is good practice to place the members of a shard in different availability zones (or even different regions).
  • 6. 8) Speed up queries
    Queries can take too long, depending on the number of reads they perform. In a clustered deployment, queries can run in parallel, which shortens the query response time. If a query runs in ten seconds in a replica set, it is very likely that the same query will run in five to six seconds if the cluster has two shards, and so on.
    Sharding configuration is below:
    I have only two VirtualBox machines, so the 3-node config server replica set, the 3-node Shard1 replica set and the mongos process all run on these machines.
    CSRS | Host IP: | Port: 7001, 7002, 7003
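The rough figures quoted in the reasons for sharding above (restore time, IOPS and failure isolation) follow from simple arithmetic. The sketch below reproduces them under the assumption that data and load are spread perfectly evenly across shards, which a real cluster only approximates. The function names are made up for illustration.

```javascript
// Back-of-the-envelope estimates; every function assumes an even spread
// of data and load across shards (an idealisation).

// Restore time when every shard restores its own slice in parallel.
function restoreHours(singleReplicaSetHours, shardCount) {
  return singleReplicaSetHours / shardCount;
}

// Aggregate disk IOPS when each shard has its own disk subsystem.
function totalIops(iopsPerShard, shardCount) {
  return iopsPerShard * shardCount;
}

// Percentage of the data still reachable after some shards go down.
function availablePercent(shardCount, failedShards) {
  return ((shardCount - failedShards) * 100) / shardCount;
}

console.log(restoreHours(4, 2));     // 300GB example: 4 hours becomes 2
console.log(totalIops(5000, 2));     // 10000
console.log(availablePercent(5, 1)); // 80
```

Query times do not scale quite as cleanly: the text estimates five to six seconds rather than exactly five for two shards, since routing and merging add some overhead.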
  • 7. Mongos process machine | Host IP: | Port: 4000
    Shard1 | Host IP: | Port: 5001, 5002, 5003
    Target 1: Create a replica set for Shard 1 and test that the replica set is working.
    Target 2: Create a config server as a 3-node replica set with the sharding cluster role.
    Target 3: Create a mongos process and add the config server to the mongos process.
    Target 4: Add the sharding cluster role to Shard 1 and restart with the new config file.
    Target 5: Add Shard 1 to the mongos process.
    Create directories for the keyfile, data files and log files on both nodes.
    Shard1:
        Host and Port No | Datafile | Logfile
        | /home/oracle/mongo_shard/mongoShard1_1/datafile | /home/oracle/mongo_shard/mongoShard1_1/logfile
        | /home/oracle/mongo_shard/mongoShard1_2/datafile | /home/oracle/mongo_shard/mongoShard1_2/logfile
        | /home/oracle/mongo_shard/mongoShard1_3/datafile | /home/oracle/mongo_shard/mongoShard1_3/logfile
    ConfigServer:
        Host and Port No | Datafile | Logfile
        | /home/oracle/mongo_shard/mongoConfig_1/datafile | /home/oracle/mongo_shard/mongoConfig_1/logfile
        | /home/oracle/mongo_shard/mongoConfig_2/datafile | /home/oracle/mongo_shard/mongoConfig_2/logfile
        | /home/oracle/mongo_shard/mongoConfig_3/datafile | /home/oracle/mongo_shard/mongoConfig_3/logfile
    Mongos
  • 8. Host and Port No | Datafile | Logfile
        | N/a | /home/oracle/mongo_shard/Shard/logfile
    Generate the keyfile and transfer it to the other node:
        [23:11:51 oracle@test2 keyfile]$ openssl rand -base64 741 > /home/oracle/mongodb/keyfile/keyfile
        [23:12:15 oracle@test2 keyfile]$ chmod 600 /home/oracle/mongodb/keyfile/keyfile
        [23:12:24 oracle@test2 keyfile]$ scp /home/oracle/mongodb/keyfile/keyfile oracle@ansible:/home/oracle/mongodb/keyfile
        oracle@ansible's password:
        keyfile 100% 1004 648.0KB/s 00:00
        [23:13:29 oracle@test2 keyfile]$
    Target 1: Create a replica set for Shard 1 and test that the replica set is working
    Config files for all three nodes
    Node 1:
        replication:
          replSetName: shard1
        storage:
          dbPath: /home/oracle/mongo_shard/mongoShard1_1/datafile
        net:
          bindIp:,localhost
          port: 5001
        security:
          authorization: enabled
          keyFile: /home/oracle/mongodb/keyfile/keyfile
        systemLog:
          destination: file
          path: /home/oracle/mongo_shard/mongoShard1_1/logfile/mongod.log
          logAppend: true
        processManagement:
          fork: true
  • 9. Node 2:
        replication:
          replSetName: shard1
        storage:
          dbPath: /home/oracle/mongo_shard/mongoShard1_2/datafile
        net:
          bindIp:,localhost
          port: 6002
        security:
          authorization: enabled
          keyFile: /home/oracle/mongodb/keyfile/keyfile
        systemLog:
          destination: file
          path: /home/oracle/mongo_shard/mongoShard1_2/logfile/mongod.log
          logAppend: true
        processManagement:
          fork: true
    Node 3:
        replication:
          replSetName: shard1
        storage:
          dbPath: /home/oracle/mongo_shard/mongoShard1_3/datafile
        net:
          bindIp:,localhost
          port: 5003
        security:
          authorization: enabled
          keyFile: /home/oracle/mongodb/keyfile/keyfile
        systemLog:
          destination: file
          path: /home/oracle/mongo_shard/mongoShard1_3/logfile/mongod.log
          logAppend: true
        processManagement:
          fork: true
  • 10. Start the MongoDB daemon process on both nodes using the config files
    Node 1:
        [02:02:39 oracle@test2 bin]$ mongod -f config_file/shard1/node1.conf
        about to fork child process, waiting until server is ready for connections.
        forked process: 6033
        child process started successfully, parent exiting
    Node 2:
        [23:34:11 oracle@ansible bin]$ mongod -f config_file/shard1/node2.conf
        about to fork child process, waiting until server is ready for connections.
        forked process: 9687
        child process started successfully, parent exiting
        [23:34:27 oracle@ansible bin]$
    Node 3:
        [02:05:31 oracle@test2 bin]$ mongod -f config_file/shard1/node3.conf
        about to fork child process, waiting until server is ready for connections.
        forked process: 6102
        child process started successfully, parent exiting
    Replica set configuration
    Connecting to node 1:
        [02:06:19 oracle@test2 bin]$ mongo --port 5001
        MongoDB shell version v3.6.8
        connecting to: mongodb://
        MongoDB server version: 3.6.8
  • 11.     MongoDB Enterprise >
    Replication initialization:
        MongoDB Enterprise > rs.initiate()
    MongoDB user creation:
        db.createUser({
          user: "admin",
          pwd: "pass",
          roles: [
            {role: "root", db: "admin"}
          ]
        })
        MongoDB Enterprise > use admin
        switched to db admin
        MongoDB Enterprise > db.createUser({
        ... user: "admin",
        ... pwd: "pass",
        ... roles: [
        ... {role: "root", db: "admin"}
        ... ]
        ... })
        Successfully added user: {
          "user" : "admin",
          "roles" : [
            {
              "role" : "root",
              "db" : "admin"
            }
          ]
        }
        MongoDB Enterprise shard1:PRIMARY>
  • 12. Connect as the admin user:
        mongo --host "shard1/" -u "admin" -p "pass" --authenticationDatabase "admin"
    Add the other nodes:
        rs.add("")
        rs.add("")
    Now test that replication is working:
        rs.stepDown()
    The primary changes.
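As an alternative to calling `rs.initiate()` with no arguments and then `rs.add()` for each node, the mongo shell also accepts a full configuration document in `rs.initiate(config)`. The sketch below only builds such a document in plain JavaScript. The helper name `buildReplSetConfig` and the host names are placeholders for illustration, not the hosts used in this setup.

```javascript
// Builds a replica set configuration document of the shape that
// rs.initiate(config) accepts: an _id naming the set plus a members array.
// Host names here are hypothetical placeholders.
function buildReplSetConfig(name, hosts) {
  return {
    _id: name,
    members: hosts.map((host, index) => ({ _id: index, host: host })),
  };
}

const shard1Config = buildReplSetConfig("shard1", [
  "host-a.example:5001",
  "host-b.example:6002",
  "host-a.example:5003",
]);

console.log(JSON.stringify(shard1Config, null, 2));
// In the mongo shell, connected to one of the members:
//   rs.initiate(shard1Config)
```

Listing all members up front gives the same result as adding them one by one. It is mainly a convenience when the member list is known in advance.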
  • 13. Target 2: Create a config server as a 3-node replica set with the sharding cluster role.
    This Part So Far
    Config servers store the metadata for a sharded cluster. The metadata reflects state and organization for all data and components within the sharded cluster. The metadata includes the list of chunks on every shard and the ranges that define the chunks.
    Config files for all three config server nodes
    Node 1:
        sharding:
          clusterRole: configsvr
        replication:
          replSetName: Config
        security:
          keyFile: /home/oracle/mongodb/keyfile/keyfile
        net:
          bindIp: localhost,
          port: 7001
        systemLog:
          destination: file
          path: /home/oracle/mongo_shard/mongoConfig_1/logfile/csrsvr.log
          logAppend: true
        processManagement:
          fork: true
        storage:
          dbPath: /home/oracle/mongo_shard/mongoConfig_1/datafile
  • 14. Node 2:
        sharding:
          clusterRole: configsvr
        replication:
          replSetName: Config
        security:
          keyFile: /home/oracle/mongodb/keyfile/keyfile
        net:
          bindIp: localhost,
          port: 7002
        systemLog:
          destination: file
          path: /home/oracle/mongo_shard/mongoConfig_2/logfile/csrsvr.log
          logAppend: true
        processManagement:
          fork: true
        storage:
          dbPath: /home/oracle/mongo_shard/mongoConfig_2/datafile
    Node 3:
        sharding:
          clusterRole: configsvr
        replication:
          replSetName: Config
        security:
          keyFile: /home/oracle/mongodb/keyfile/keyfile
        net:
          bindIp: localhost,
          port: 7003
        systemLog:
          destination: file
          path: /home/oracle/mongo_shard/mongoConfig_3/logfile/csrsvr.log
          logAppend: true
        processManagement:
          fork: true
        storage:
          dbPath: /home/oracle/mongo_shard/mongoConfig_3/datafile
  • 15. Start the MongoDB daemon process for all config servers using the config files
    Node 1:
        [00:31:31 oracle@ansible bin]$ mongod -f /home/oracle/mongodb_software/bin/config_file/config_svr/cnsvr1.conf
        about to fork child process, waiting until server is ready for connections.
        forked process: 13992
        child process started successfully, parent exiting
    Node 2:
        [03:03:18 oracle@test2 bin]$ mongod -f /home/TimesTen_SFT/mongodb/bin/config_file/config_svr/cnsvr2.conf
        about to fork child process, waiting until server is ready for connections.
        forked process: 7168
        child process started successfully, parent exiting
        [03:03:34 oracle@test2 bin]$
    Node 3:
        [00:31:44 oracle@ansible bin]$ mongod -f /home/oracle/mongodb_software/bin/config_file/config_svr/cnsvr3.conf
        about to fork child process, waiting until server is ready for connections.
        forked process: 14031
        child process started successfully, parent exiting
    Connect to one of the config servers:
  • 16.     mongo --port 7001
    Initiating the CSRS:
        rs.initiate()
    Creating the super user on the CSRS:
        use admin
        db.createUser({
          user: "admin",
          pwd: "pass",
          roles: [
            {role: "root", db: "admin"}
          ]
        })
    Authenticating as the super user:
        db.auth("admin", "pass")
  • 17. Add the second and third nodes to the CSRS:
        rs.add("")
        rs.add("")
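Other components refer to a replica set such as this one by a seed list of the form `<replSetName>/<host:port>,<host:port>,...`. That is the format of the `--host` argument used for shard1 earlier and of the `sharding.configDB` setting in the mongos configuration. A minimal sketch of assembling such a string follows. The helper name and the host names are placeholders, since the actual addresses are not shown in this document.

```javascript
// Formats "<replSetName>/<host:port>,<host:port>,..." from a set name
// and a list of members. Host names are hypothetical placeholders.
function replicaSetSeedList(name, hosts) {
  if (hosts.length === 0) {
    throw new Error("at least one host is required");
  }
  return name + "/" + hosts.join(",");
}

const configDb = replicaSetSeedList("Config", [
  "cfg-a.example:7001",
  "cfg-b.example:7002",
  "cfg-a.example:7003",
]);

console.log(configDb);
// Config/cfg-a.example:7001,cfg-b.example:7002,cfg-a.example:7003
```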
  • 18.
  • 19. Target 3: Create a mongos process and add the config server to the mongos process
    MongoDB mongos instances route queries and write operations to shards in a sharded cluster. mongos provides the only interface to a sharded cluster from the perspective of applications. Applications never connect or communicate directly with the shards.
    The mongos tracks what data is on which shard by caching the metadata from the config servers. The mongos uses the metadata to route operations from applications and clients to the mongod instances. A mongos has no persistent state and consumes minimal system resources.
    The most common practice is to run mongos instances on the same systems as your application servers, but you can maintain mongos instances on the shards or on other dedicated resources.
    Routing and Results Process
    A mongos instance routes a query to a cluster by:
      1. Determining the list of shards that must receive the query.
      2. Establishing a cursor on all targeted shards.
    The mongos then merges the data from each of the targeted shards and returns the result document. Certain query modifiers, such as sorting, are performed on a shard such as the primary shard before mongos retrieves the results.
    This Part So Far
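The two routing steps and the merge can be illustrated with a toy model. This is a simplified sketch, not how mongos is implemented: the chunk table, the shard names (including a second shard, `shard2`, which this setup does not have) and both function names are invented for illustration, and chunk ranges are treated as half-open intervals on a numeric shard key.

```javascript
// Toy chunk table as mongos might cache it from the config servers.
// Each chunk covers min <= key < max. All values are made up.
const chunks = [
  { min: -Infinity, max: 100, shard: "shard1" },
  { min: 100, max: 200, shard: "shard2" },
  { min: 200, max: Infinity, shard: "shard1" },
];

// Step 1: list the shards whose chunks overlap the queried key
// range [lo, hi] (both ends inclusive).
function targetShards(chunkTable, lo, hi) {
  const shards = new Set();
  for (const chunk of chunkTable) {
    if (chunk.min <= hi && lo < chunk.max) {
      shards.add(chunk.shard);
    }
  }
  return [...shards].sort();
}

// Merge step: combine the per-shard results into one sorted result.
// A real merge would interleave the already sorted streams; flattening
// and sorting yields the same output for this illustration.
function mergeSorted(resultsPerShard) {
  return resultsPerShard.flat().sort((a, b) => a - b);
}

console.log(targetShards(chunks, 50, 60));  // only shard1 is targeted
console.log(targetShards(chunks, 90, 150)); // both shards are targeted
console.log(mergeSorted([[1, 5], [2, 3]]));
```

A query whose key range falls inside a single chunk touches one shard only, which is why a well-chosen shard key keeps most queries from fanning out to every shard.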
  • 20.
        sharding:
          configDB: Config/,,
        security:
          keyFile: /home/oracle/mongodb/keyfile/keyfile
        net:
          bindIp: localhost,
          port: 4000
        systemLog:
          destination: file
          path: /home/oracle/mongo_shard/Shard/logfile/shard.log
          logAppend: true
        processManagement:
          fork: true
    The log shows it is also working fine.
    Target 4: Add the sharding cluster role to Shard 1 and restart with the new config file
  • 21. Check the current shard status:

        mongo --port 4000 --username admin --password pass --authenticationDatabase admin

    Update the config file for Shard 1.

    Previous config file:

        replication:
          replSetName: shard1
        storage:
          dbPath: /home/oracle/mongo_shard/mongoShard1_1/datafile
        net:
          bindIp:,localhost
          port: 5001
        security:
          authorization: enabled
          keyFile: /home/oracle/mongodb/keyfile/keyfile
        systemLog:
          destination: file
          path: /home/oracle/mongo_shard/mongoShard1_1/logfile/mongod.log
          logAppend: true
        processManagement:
          fork: true

    New config file:

        sharding:
          clusterRole: shardsvr
        storage:
          dbPath: /home/oracle/mongo_shard/mongoShard1_1/datafile
          wiredTiger:
            engineConfig:
              cacheSizeGB: .1
        net:
          bindIp:,localhost
          port: 5001
        security:
          keyFile: /home/oracle/mongodb/keyfile/keyfile
        systemLog:
          destination: file
          path: /home/oracle/mongo_shard/mongoShard1_1/logfile/mongod.log
          logAppend: true
        processManagement:
          fork: true
        replication:
          replSetName: shard1

  • 22. Connect and shut the node down:

        mongo --port 5001 -u "admin" -p "pass" --authenticationDatabase "admin"
        use admin
        db.shutdownServer()

    Now start it with the new config file above.

    Check the replica set status. The current primary is on port 5003. Connect to it, confirm its role, and step it down so that another member can take over as primary:

        mongo --port 5003 -u "admin" -p "pass" --authenticationDatabase "admin"
        rs.isMaster()
        rs.stepDown()

    Shut down Node 2 and start it with the new config file:

        mongo --port 6002 -u "admin" -p "pass" --authenticationDatabase "admin"
        use admin
        db.shutdownServer()

    Previous config file:

        replication:
          replSetName: shard1
        storage:
          dbPath: /home/oracle/mongo_shard/mongoShard1_2/datafile
        net:
          bindIp:,localhost
          port: 6002
        security:
          authorization: enabled
          keyFile: /home/oracle/mongodb/keyfile/keyfile
        systemLog:
          destination: file
          path: /home/oracle/mongo_shard/mongoShard1_2/logfile/mongod.log
          logAppend: true
        processManagement:
          fork: true

    New config file:

        sharding:
          clusterRole: shardsvr
        storage:
          dbPath: /home/oracle/mongo_shard/mongoShard1_2/datafile
          wiredTiger:
            engineConfig:
              cacheSizeGB: .1
        net:
          bindIp:,localhost
          port: 6002
        security:
          keyFile: /home/oracle/mongodb/keyfile/keyfile
        systemLog:
          destination: file
          path: /home/oracle/mongo_shard/mongoShard1_2/logfile/mongod.log
          logAppend: true
        processManagement:
          fork: true
        replication:
          replSetName: shard1

  • 23. Start the mongod process.

    Shut down Node 3 and start it with the new config file:

        mongo --port 5003 -u "admin" -p "pass" --authenticationDatabase "admin"
        use admin
        db.shutdownServer()

    Previous config file:

        replication:
          replSetName: shard1
        storage:
          dbPath: /home/oracle/mongo_shard/mongoShard1_3/datafile
        net:
          bindIp:,localhost
          port: 5003
        security:
          authorization: enabled
          keyFile: /home/oracle/mongodb/keyfile/keyfile
        systemLog:
          destination: file
          path: /home/oracle/mongo_shard/mongoShard1_3/logfile/mongod.log
          logAppend: true
        processManagement:
          fork: true

    New config file:

        sharding:
          clusterRole: shardsvr
        storage:
          dbPath: /home/oracle/mongo_shard/mongoShard1_3/datafile
          wiredTiger:
            engineConfig:
              cacheSizeGB: .1
        net:
          bindIp:,localhost
          port: 5003
        security:
          keyFile: /home/oracle/mongodb/keyfile/keyfile
        systemLog:
          destination: file
          path: /home/oracle/mongo_shard/mongoShard1_3/logfile/mongod.log
          logAppend: true
        processManagement:
          fork: true
        replication:
          replSetName: shard1

  • 24. Start Node 3 with the new config file:

        mongod -f config_file/shard1/node3.conf

    Then connect to it:

        mongo --port 5003 -u "admin" -p "pass" --authenticationDatabase "admin"

    Target 5: Add Shard 1 to the mongos process
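Before adding the shard, it may be worth confirming that each restarted member actually picked up the `shardsvr` role. One way, sketched here as an assumption rather than the author's procedure, is to read the parsed startup options with the `getCmdLineOpts` admin command. `isShardServer` is a hypothetical helper. In the mongo shell it could be called as `isShardServer(db.getSiblingDB("admin"))` while connected to ports 5001, 6002 and 5003 in turn.

```javascript
// Returns true when the connected mongod was started with
// sharding.clusterRole: shardsvr.
// `adminDb` is a handle on the admin database, for example
// db.getSiblingDB("admin") in the mongo shell. Hypothetical helper.
function isShardServer(adminDb) {
  const opts = adminDb.runCommand({ getCmdLineOpts: 1 });
  if (!opts || !opts.ok) {
    throw new Error("getCmdLineOpts failed");
  }
  // `parsed` mirrors the structure of the YAML config file.
  const sharding = (opts.parsed && opts.parsed.sharding) || {};
  return sharding.clusterRole === "shardsvr";
}
```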
  • 25. Connect to the sharded cluster:

        mongo --port 4000 --username admin --password pass --authenticationDatabase admin

    Add the new shard to the cluster from mongos:

        sh.addShard("shard1/")

    The output shows that the shard was added.
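`sh.addShard()` is a shell wrapper around the `addShard` admin command, which takes a seed string of the form `<replSetName>/<host:port>,...`. The sketch below builds that string and issues the command. `addReplicaSetShard` is a hypothetical helper, and the host list has to come from your own deployment because it is not shown in the command above.

```javascript
// Builds a "<replSetName>/<host:port>,..." seed string and runs addShard.
// `adminDb` is a handle on the admin database of a mongos, for example
// db.getSiblingDB("admin") in the mongo shell. Hypothetical helper.
function addReplicaSetShard(adminDb, replSetName, hosts) {
  if (!replSetName || !Array.isArray(hosts) || hosts.length === 0) {
    throw new Error("a replica set name and at least one host are required");
  }
  const seed = replSetName + "/" + hosts.join(",");
  const res = adminDb.runCommand({ addShard: seed });
  if (!res || !res.ok) {
    const reason = (res && res.errmsg) || "unknown error";
    throw new Error("addShard failed: " + reason);
  }
  // On success the server reports the name under which the shard was added.
  return res.shardAdded;
}
```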
  • 26. The log also shows the shard being added. Check the shard status. A new shard, Shard 2, can then be added in the same way, as shown in the diagram below.
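The shard status check can also be scripted. A minimal sketch, assuming a handle on the admin database of the mongos: the `listShards` admin command returns one entry per registered shard, and the hypothetical helper `listShardIds` below extracts their names. For interactive use, `sh.status()` prints a fuller report.

```javascript
// Returns the names of the shards currently registered in the cluster.
// `adminDb` is a handle on the admin database of a mongos, for example
// db.getSiblingDB("admin") in the mongo shell. Hypothetical helper.
function listShardIds(adminDb) {
  const res = adminDb.runCommand({ listShards: 1 });
  if (!res || !res.ok) {
    throw new Error("listShards failed");
  }
  // Each entry carries the shard name in _id and its seed string in host.
  return (res.shards || []).map((shard) => shard._id);
}
```

After Shard 1 has been added, the returned list would be expected to contain `shard1`. Once a second shard is added, its name should appear as well.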