This document discusses MongoDB sharding, which involves horizontally scaling MongoDB across multiple machines, or shards. It describes the components of a sharded MongoDB cluster, including shards, config servers, and mongos query routers. It provides examples of when and why sharding would be used, such as for large datasets, high throughput, hardware limitations, storage engine limitations, isolating failures, and separating hot and cold data. The document then outlines the steps to set up a basic sharded cluster with one shard, three config servers, and a mongos query router spread across two machines.
MongoDB Sharding
Sharding is a method for distributing data across multiple machines. MongoDB uses
sharding to support deployments with very large data sets and high throughput operations.
Database systems with large data sets or high throughput applications can challenge the
capacity of a single server. For example, high query rates can exhaust the CPU capacity
of the server. Working set sizes larger than the system’s RAM stress the I/O capacity of
disk drives.
There are two methods for addressing system growth: vertical and horizontal scaling.
Vertical Scaling involves increasing the capacity of a single server, such as using a more
powerful CPU, adding more RAM, or increasing the amount of storage space. Limitations
in available technology may restrict a single machine from being sufficiently powerful
for a given workload. Additionally, cloud-based providers have hard ceilings based on
available hardware configurations. As a result, there is a practical maximum for vertical
scaling.
Horizontal Scaling involves dividing the system dataset and load over multiple servers,
adding additional servers to increase capacity as required. While the overall speed or
capacity of a single machine may not be high, each machine handles a subset of the overall
workload, potentially providing better efficiency than a single high-speed high-capacity
server. Expanding the capacity of the deployment only requires adding additional servers
as needed, which can be a lower overall cost than high-end hardware for a single machine.
The trade-off is increased complexity in infrastructure and maintenance for the
deployment.
Sharded Cluster
A MongoDB sharded cluster consists of the following components:
shard: Each shard contains a subset of the sharded data. Each shard can be
deployed as a replica set.
mongos: The mongos acts as a query router, providing an interface between
client applications and the sharded cluster.
config servers: Config servers store metadata and configuration settings for the
cluster.
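As a quick orientation, a client connects only to the mongos, never to the shards directly. Assuming the mongos host and port configured later in this document (192.168.56.101:4000), a session might look like:
mongo --host 192.168.56.101 --port 4000
sh.status()   // summarizes the shards, config metadata, and sharded databases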
A replica set in MongoDB is a group of mongod processes that maintain the same data set. Replica
sets provide redundancy and high availability, and are the basis for all production deployments.
Advantages of a Replica Set:
1. It ensures data availability during a disaster. If the primary fails due to a hardware issue, no data is lost at all.
2. We can run read queries against a secondary, which reduces the load on the primary.
3. We can configure delayed replication on a secondary, which helps protect against data corruption caused by a developer. Suppose we set a one-hour replication delay: the secondary then always lags one hour behind the primary. If a developer accidentally damages data, we can recover it from the delayed secondary, so this secondary can be considered a hot backup (see the sketch below).
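A minimal sketch of adding such a delayed member from the mongo shell, assuming a hypothetical host and port; in the 3.6 shell used in this document the option is slaveDelay (renamed secondaryDelaySecs in MongoDB 5.0):
rs.add({
  host: "192.168.56.105:5004",   // hypothetical host:port for the delayed member
  priority: 0,                   // never eligible to become primary
  hidden: true,                  // invisible to application reads
  slaveDelay: 3600               // always stays one hour behind the primary
})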
When should we go for sharding?
Sharding is the most complex architecture we can deploy using MongoDB, and there are two main
schools of thought on when to shard. The first says to configure the cluster as soon as possible – when
we predict high throughput and fast data growth.
The second says we should use a cluster as the best alternative when the application demands more
resources than the replica set can offer (such as low memory, an overloaded disk or high processor
load). This approach is more corrective than preventative, but we will discuss that in the future.
1) Disaster recovery plan
Disaster recovery (DR) is a very delicate topic: how long would you tolerate an outage? If necessary, how
long would it take to restore the entire database? Depending on the database size and on disk speed, a
backup/restore process might take hours or even days!
There is no hard number in gigabytes that justifies a cluster, but in general, once the database
exceeds roughly 200GB, the backup and restore processes can take a long time to finish.
Let’s consider the case where we have a replica set with a 300GB database. The full restore process
might last around four hours, whereas if the database has two shards, it will take about two hours –
and depending on the number of shards we can improve that time. Simple math: with two shards
restored in parallel, the process takes roughly half the time of restoring a single replica set.
2) Hardware limitations
Disk and memory are inexpensive nowadays. However, this is not true when companies need to scale
to very large numbers (such as terabytes of RAM). Suppose a cloud provider can only offer up to 5,000 IOPS
in the disk subsystem, but the application needs more than that to work correctly. To work around this
performance limitation, it is better to start a cluster and divide the writes among instances. That way,
if there are two shards, the application will have 10,000 IOPS available for reads and writes in
the disk subsystem.
3) Storage engine limitations
There are a few storage engine limitations that can become a bottleneck. MMAPv1 has a lock per
collection, while WiredTiger has tickets that limit the number of writes and reads happening
concurrently. Although we can tweak the number of tickets available in WiredTiger, there is a practical
limit – changing the available tickets might generate processor overload instead of
increasing performance. If one of these situations becomes a bottleneck in the system, we can start a
cluster. Once the collection is sharded, the load and locks are distributed among the different instances (see the sketch below).
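For reference, the WiredTiger tickets mentioned above can be inspected and adjusted at runtime with getParameter/setParameter; the value 256 below is purely illustrative:
// check the current number of concurrent write tickets (default 128)
db.adminCommand({ getParameter: 1, wiredTigerConcurrentWriteTransactions: 1 })
// raise it -- beyond a point this overloads the CPU instead of helping
db.adminCommand({ setParameter: 1, wiredTigerConcurrentWriteTransactions: 256 })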
4) Hot data vs. cold data
Many databases actively use only a small percentage of the data they store. This is called hot data,
or the working set. Cold data, or historical data, is rarely read, and demands considerable system resources
when it is. So why spend money on expensive machines that only store cold data or low-value
data? With a cluster deployment we can choose where the cold data is stored, and use cheap devices
and disks to do so. The same is true for hot data – we can use better machines to get better
performance (see the sketch below). This methodology also speeds up writes and reads on the hot data, as the indexes are
smaller and add less overhead to the system.
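This placement is implemented with zone sharding. A minimal sketch, assuming a hypothetical collection appdb.logs sharded on a created date field and hypothetical shard names:
// tag a cheap shard for cold data and a fast shard for hot data
sh.addShardToZone("shardCheap", "cold")
sh.addShardToZone("shardFast", "hot")
// documents older than the cutoff live on the cheap shard, the rest on the fast one
sh.updateZoneKeyRange("appdb.logs", { created: MinKey }, { created: ISODate("2018-01-01") }, "cold")
sh.updateZoneKeyRange("appdb.logs", { created: ISODate("2018-01-01") }, { created: MaxKey }, "hot")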
5) Geo-distributed data
It doesn't matter whether this need comes from application design or legal compliance: if the data
must stay within continent or country borders, a cluster helps make that happen. It is possible to pin
data so that it is stored solely in a specific "part of the world." The number of shards and
their geographic positions are transparent to the application, which simply sees one database. This is
commonly used by worldwide companies for better performance, or simply to comply with local
law (see the sketch below).
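The same zone mechanism sketched above enforces data locality. Assuming a hypothetical appdb.users collection sharded on { country: 1, userId: 1 } and a shard running in an EU data center:
sh.addShardToZone("shardEU", "EU")   // shard name hypothetical
// pin all documents for German users to EU hardware
sh.updateZoneKeyRange("appdb.users",
  { country: "DE", userId: MinKey },
  { country: "DE", userId: MaxKey },
  "EU")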
6) Infrastructure limitations
Infrastructure and hardware limitations are very similar. When thinking about infrastructure, however,
we focus on specific cases where the instances must be small. An example is running MongoDB on
Mesos: some providers only offer a few cores and a limited amount of RAM, and even if you are
willing to pay more, it is not possible to purchase more than what they offer. A cluster
provides the option to split a small amount of data among a lot of shards, reaching the same
performance a big and expensive machine provides.
7) Failure isolation
Consider a replica set or a single instance that holds all the data. If for any reason this instance or replica
set goes down, the whole application goes down. In a cluster, if we lose one of five shards, 80%
of the data is still available. Running several shards helps to isolate failures. Admittedly, running many
instances makes it more likely that some instance will fail at any given time, but since each shard should have at least
three instances, the probability of an entire shard being down is minimal. For providers that offer
different zones, it is good practice to place the members of a shard in different availability zones
(or even different regions).
8) Speed up queries
Queries can take too long, depending on the number of reads they perform. In a clustered deployment,
queries can run in parallel across shards, speeding up the response time. If a query runs in ten seconds on a
replica set, it is very likely that the same query will run in five to six seconds on a cluster with two
shards, and so on (see the sketch below).
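How much a query benefits depends on whether it includes the shard key. A brief illustration, assuming a hypothetical orders collection sharded on customerId:
// targeted: the filter contains the shard key, so mongos routes it to a single shard
db.orders.find({ customerId: 42 })
// scatter-gather: no shard key, so every shard scans in parallel and mongos merges the results
db.orders.find({ status: "open" })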
The sharding configuration is below.
Basically, I have only two VirtualBox machines, so the three config server nodes, the three Shard1 nodes, and the mongos process will all run on these two machines.
CSRS
Host IP          Port
192.168.56.103   7001
192.168.56.101   7002
192.168.56.103   7003
Mongos Process Running Machine
Host IP          Port
192.168.56.101   4000
Shard1
Host IP          Port
192.168.56.101   5001
192.168.56.103   5002
192.168.56.101   5003
Target 1: Create a replica set for Shard1 and test that the replica set is working.
Target 2: Create a config server replica set with three nodes and set the sharding cluster role.
Target 3: Create a mongos process and add the config servers to the mongos process.
Target 4: Add the shard cluster role to Shard1 and restart it with the new config file.
Target 5: Add Shard1 to the mongos process.
Create directories for the keyfile, datafiles, and logfiles on both nodes
Shard1:
Host and Port         Datafile                                          Logfile
192.168.56.101:5001   /home/oracle/mongo_shard/mongoShard1_1/datafile   /home/oracle/mongo_shard/mongoShard1_1/logfile
192.168.56.103:6002   /home/oracle/mongo_shard/mongoShard1_2/datafile   /home/oracle/mongo_shard/mongoShard1_2/logfile
192.168.56.101:5003   /home/oracle/mongo_shard/mongoShard1_3/datafile   /home/oracle/mongo_shard/mongoShard1_3/logfile
Config Server:
Host and Port         Datafile                                          Logfile
192.168.56.103:7001   /home/oracle/mongo_shard/mongoConfig_1/datafile   /home/oracle/mongo_shard/mongoConfig_1/logfile
192.168.56.101:7002   /home/oracle/mongo_shard/mongoConfig_2/datafile   /home/oracle/mongo_shard/mongoConfig_2/logfile
192.168.56.103:7003   /home/oracle/mongo_shard/mongoConfig_3/datafile   /home/oracle/mongo_shard/mongoConfig_3/logfile
Mongos:
Host and Port         Datafile   Logfile
192.168.56.101:4000   N/A        /home/oracle/mongo_shard/Shard/logfile
Generate the keyfile and transfer it to the other node:
[23:11:51 oracle@test2 keyfile]$ openssl rand -base64 741 > /home/oracle/mongodb/keyfile/keyfile
[23:12:15 oracle@test2 keyfile]$ chmod 600 /home/oracle/mongodb/keyfile/keyfile
[23:12:24 oracle@test2 keyfile]$ scp /home/oracle/mongodb/keyfile/keyfile oracle@ansible:/home/oracle/mongodb/keyfile
oracle@ansible's password:
keyfile
100% 1004 648.0KB/s 00:00
[23:13:29 oracle@test2 keyfile]$
Target 1: Create a replica set for Shard1 and test that the replica set is working
Config files for all three nodes
Node 1:
replication:
  replSetName: shard1
storage:
  dbPath: /home/oracle/mongo_shard/mongoShard1_1/datafile
net:
  bindIp: 192.168.56.101,localhost
  port: 5001
security:
  authorization: enabled
  keyFile: /home/oracle/mongodb/keyfile/keyfile
systemLog:
  destination: file
  # fork: true requires a log path; the table above gives only the logfile directory, so the file name is assumed
  path: /home/oracle/mongo_shard/mongoShard1_1/logfile/mongod.log
  logAppend: true
processManagement:
  fork: true
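Nodes 2 and 3 follow the same layout, changing only the bind address, port, and paths per the tables above; for example, a sketch of Node 2 (192.168.56.103:6002):
replication:
  replSetName: shard1
storage:
  dbPath: /home/oracle/mongo_shard/mongoShard1_2/datafile
net:
  bindIp: 192.168.56.103,localhost
  port: 6002
security:
  authorization: enabled
  keyFile: /home/oracle/mongodb/keyfile/keyfile
systemLog:
  destination: file
  path: /home/oracle/mongo_shard/mongoShard1_2/logfile/mongod.log   # file name assumed
  logAppend: true
processManagement:
  fork: true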
Start the mongod daemon process on each node using its configuration file
Node 1
[02:02:39 oracle@test2 bin]$ mongod -f config_file/shard1/node1.conf
about to fork child process, waiting until server is ready for connections.
forked process: 6033
child process started successfully, parent exiting
Node 2
[23:34:11 oracle@ansible bin]$ mongod -f config_file/shard1/node2.conf
about to fork child process, waiting until server is ready for connections.
forked process: 9687
child process started successfully, parent exiting
[23:34:27 oracle@ansible bin]$
Node 3
[02:05:31 oracle@test2 bin]$ mongod -f config_file/shard1/node3.conf
about to fork child process, waiting until server is ready for connections.
forked process: 6102
child process started successfully, parent exiting
Replica Set Configuration
Connecting to Node 1:
[02:06:19 oracle@test2 bin]$ mongo --port 5001
MongoDB shell version v3.6.8
connecting to: mongodb://127.0.0.1:5001/
MongoDB server version: 3.6.8
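The steps for initiating this replica set and creating its admin user are not shown here; they mirror the CSRS steps later in this document. A minimal sketch (the localhost exception allows creating the first user even with the keyfile in place):
rs.initiate()
use admin
db.createUser({
  user: "admin",
  pwd: "pass",
  roles: [ { role: "root", db: "admin" } ]
})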
Connect with the admin user
mongo --host "shard1/192.168.56.101:5001" -u "admin" -p "pass" --authenticationDatabase "admin"
Add the remaining nodes
rs.add("192.168.56.103:6002")
rs.add("192.168.56.101:5003")
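To confirm that both new members were added and are syncing, the standard shell helpers can be checked:
rs.status()   // both new members should reach SECONDARY state
rs.conf()     // shows the three-member configuration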
Now test that failover is working:
rs.stepDown()
The primary steps down and another member is elected primary.
Target 2: Create a config server replica set with three nodes and set the sharding cluster role.
Config servers store the metadata for a sharded cluster. The metadata reflects the state and organization
of all data and components within the sharded cluster. The metadata includes the list of chunks on
every shard and the ranges that define the chunks.
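Once the cluster is assembled, this metadata can be inspected directly in the config database (through the mongos), for example:
use config
db.chunks.find().limit(1).pretty()   // one chunk document: namespace, owning shard, min/max range
db.shards.find()                     // the shards registered in the cluster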
Config files for all three config server nodes
Node 1:
sharding:
  clusterRole: configsvr
replication:
  replSetName: Config
security:
  keyFile: /home/oracle/mongodb/keyfile/keyfile
net:
  bindIp: localhost,192.168.56.103
  port: 7001
systemLog:
  destination: file
  path: /home/oracle/mongo_shard/mongoConfig_1/logfile/csrsvr.log
  logAppend: true
processManagement:
  fork: true
storage:
  dbPath: /home/oracle/mongo_shard/mongoConfig_1/datafile
Start the mongod daemon process for all config servers using their configuration files
Node 1
[00:31:31 oracle@ansible bin]$ mongod -f
/home/oracle/mongodb_software/bin/config_file/config_svr/cnsvr1.conf
about to fork child process, waiting until server is ready for connections.
forked process: 13992
child process started successfully, parent exiting
Node 2
[03:03:18 oracle@test2 bin]$ mongod -f
/home/TimesTen_SFT/mongodb/bin/config_file/config_svr/cnsvr2.conf
about to fork child process, waiting until server is ready for connections.
forked process: 7168
child process started successfully, parent exiting
[03:03:34 oracle@test2 bin]$
Node 3
[00:31:44 oracle@ansible bin]$ mongod -f
/home/oracle/mongodb_software/bin/config_file/config_svr/cnsvr3.conf
about to fork child process, waiting until server is ready for connections.
forked process: 14031
child process started successfully, parent exiting
Connect to one of the config servers:
mongo --port 7001
Initiating the CSRS:
rs.initiate()
Creating super user on CSRS:
use admin
db.createUser({
  user: "admin",
  pwd: "pass",
  roles: [
    { role: "root", db: "admin" }
  ]
})
Authenticating as the super user:
db.auth("admin", "pass")
Add the second and third nodes to the CSRS:
rs.add("192.168.56.101:7002")
rs.add("192.168.56.103:7003")
Target 3: Create a mongos process and add the config servers to the mongos process
MongoDB mongos instances route queries and write operations to the shards in a
sharded cluster. The mongos provides the only interface to a sharded cluster from the
perspective of applications; applications never connect or communicate directly with
the shards.
The mongos tracks what data is on which shard by caching the metadata from
the config servers. The mongos uses the metadata to route operations from
applications and clients to the mongod instances. A mongos has no persistent state
and consumes minimal system resources.
The most common practice is to run mongos instances on the same systems as your
application servers, but you can maintain mongos instances on the shards or on
other dedicated resources.
Routing and Results Process
A mongos instance routes a query to a cluster by:
1. Determining the list of shards that must receive the query.
2. Establishing a cursor on all targeted shards.
The mongos then merges the data from each of the targeted shards and returns the
result document. Certain query modifiers, such as sorting, are performed on a shard
such as the primary shard before mongos retrieves the results.
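The document stops before showing the mongos configuration itself. Grounded in the hosts, ports, replica set name, and logfile directory from the tables above (only the log file name is assumed), a minimal sketch of the mongos config file and startup for Targets 3 and 5 would be:
sharding:
  configDB: Config/192.168.56.103:7001,192.168.56.101:7002,192.168.56.103:7003
security:
  keyFile: /home/oracle/mongodb/keyfile/keyfile
net:
  bindIp: localhost,192.168.56.101
  port: 4000
systemLog:
  destination: file
  path: /home/oracle/mongo_shard/Shard/logfile/mongos.log   # file name assumed
  logAppend: true
processManagement:
  fork: true
Start it and, once Shard1 carries the shardsvr cluster role, register the shard:
mongos -f config_file/mongos.conf
mongo --port 4000 -u "admin" -p "pass" --authenticationDatabase "admin"
sh.addShard("shard1/192.168.56.101:5001")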