MongoDB – Roma
 12 July 2012
 Replication and Sharding:
         Hands on

     Guglielmo Incisa
Replication
• What is it
   – Data is replicated (cloned) into at least two nodes
   – Updates are sent to one node (the Primary) and automatically propagated
     to the others (the Secondaries)
   – Connections can go through a router or directly to the Primary (the
     Secondaries are read-only)
       • If we connect our app server directly to the Primary, we must handle its failure and
         reconnect to the new Primary



                                        Primary


             App server                                       DB


                          Router
Replication
• Why we need it
   – If one node fails, the application server keeps working without any
     impact
   – The router automatically redirects connections to the surviving nodes
     (though the router itself can be a single point of failure)


                                         Primary


         App server                                DB


                      Router
Replication
• Why we need it
   – More and more IT departments are moving from
       • Big, proprietary, reliable and expensive servers
   – To
       • Commodity Hardware (smaller, less reliable, inexpensive servers: PC)
   – Commodity hardware is less reliable, but our users expect our
     applications to be always available: replication can help.
   – Example: how many servers do I need for 99.999% availability?
       • Suppose a single PC has 98% availability (about 7 days of downtime in a year,
         i.e. a 2% probability of being down at any moment)
       • -> Two replicated PCs give 99.96% availability (1 − 0.02²)
       • -> Three replicated PCs give more than 99.999% (Telecom Grade / Core Network).
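The availability figures above can be checked with a one-line formula: assuming the nodes fail independently, the system is down only when every node is down at the same time. A minimal sketch:

```javascript
// Availability of n replicated nodes, each with independent availability p:
// the system is unavailable only if all n nodes are down at once.
function systemAvailability(p, n) {
  return 1 - Math.pow(1 - p, n);
}

console.log(systemAvailability(0.98, 1)); // ≈ 0.98
console.log(systemAvailability(0.98, 2)); // ≈ 0.9996
console.log(systemAvailability(0.98, 3)); // ≈ 0.999992  (> 99.999%)
```

Note the independence assumption: correlated failures (shared power, network, or datacenter) reduce the real-world gain.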
Sharding
• What is it
   – Data is partitioned and distributed to different nodes
       • Some records are in node 1, others in node 2 etc…
   – MongoDB Sharding: the partition is based on a field.
       • Database: test2
            – Table: testSchema1
            – Fields:
                  » owner: owner of the file, key and shard key (string)
                  » date (string)
                  » tags (list of string)
                  » keywords: words in the document, created by java code below (list of string)
                  » fileName (string)
                  » content: the file (binary)
                  » ascii: the file (string)
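The deck derives the keywords field from the document text with Java code that is not shown here. A hypothetical JavaScript sketch of what building a testSchema1-style document might look like (the tokenization rule is an assumption, not the deck's actual Java logic):

```javascript
// Hypothetical sketch: build a testSchema1-style document, deriving the
// "keywords" field from the ascii text. The real deck does this in Java
// code that is not included in these slides.
function makeDoc(owner, date, tags, fileName, ascii) {
  var keywords = ascii
    .toLowerCase()
    .split(/[^a-z0-9]+/)          // assumed tokenization: non-alphanumeric separators
    .filter(function (w) { return w.length > 0; });
  return { owner: owner, date: date, tags: tags,
           keywords: keywords, fileName: fileName, ascii: ascii };
}

var doc = makeDoc("alice", "2012-07-12", ["demo"], "hello.txt", "Hello MongoDB world");
// doc.keywords -> ["hello", "mongodb", "world"]
```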
Sharding
• Why we need it
   – Servers with smaller storage
   – To increase responsiveness by increasing parallelism

                                     Router




        Owner: A-H    Owner: I-O              Owner: P-Z
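The diagram partitions documents by the first letter of the shard key. MongoDB actually assigns shard-key ranges to chunks dynamically and rebalances them, but the routing idea can be sketched as (the fixed A–H / I–O / P–Z split below just mirrors the diagram, it is not how mongos really decides):

```javascript
// Illustrative only: a hard-coded range split mirroring the diagram above.
// Real MongoDB sharding assigns key ranges to chunks and moves them between
// shards automatically via the balancer.
function shardFor(owner) {
  var c = owner.charAt(0).toUpperCase();
  if (c >= "A" && c <= "H") return "Shard A";
  if (c >= "I" && c <= "O") return "Shard B";
  return "Shard C";
}

console.log(shardFor("Alice"));  // Shard A
console.log(shardFor("Marco"));  // Shard B
console.log(shardFor("Paolo"));  // Shard C
```

Queries that include the shard key can be routed to a single shard; queries without it must be scattered to all shards and merged.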
Replication and Sharding
• Can we have both?
   – MongoDB: yes!
• Our example:


                                           Shard A: 2 + arbiter


                          Config process

                                                                     Shard B: 2 + arbiter


                     Router
                     mongos
                                                      Shard C: 2 + arbiter
Replication and Sharding
• Replication:
     – Two nodes and an arbiter
          •   The arbiter is needed when there is an even number of data-bearing nodes: it votes to decide which
              server is Primary and which is Secondary, and manages failover when one node is down

•   Sharding
     – Three sets: A, B, C
     – Config Process:
          •   <<The config servers store the cluster's metadata, which includes basic information on each shard server and
              the chunks contained therein.>>
     – Routing Process:
          •   <<The mongos process can be thought of as a routing and coordination process that makes the various
              components of the cluster look like a single system. When receiving client requests, the mongos process routes
              the request to the appropriate server(s) and merges any results to be sent back to the client.>>
Setup 1
•   Start servers and arbiters
     –   Create /data/db, db2, db3, db4, db5, db6, db7, db8, db9, configdb
     –   --nojournal speeds up startup (journaling is the default on 64-bit builds)
•   Replica set A (Shard A: 2 + arbiter)
     –   ./mongodb-linux-x86_64-2.0.4/bin/mongod --shardsvr --replSet DSSA --nojournal
     –   ./mongodb-linux-x86_64-2.0.4/bin/mongod --shardsvr --replSet DSSA --dbpath /data/db2 --port 27021 --nojournal
     –   Arbiter:
     –   ./mongodb-linux-x86_64-2.0.4/bin/mongod --shardsvr --replSet DSSA --dbpath /data/db7 --port 27031 --nojournal


•   Replica set B (Shard B: 2 + arbiter)
     –   ./mongodb-linux-x86_64-2.0.4/bin/mongod --shardsvr --replSet DSSB --dbpath /data/db3 --port 27023 --nojournal
     –   ./mongodb-linux-x86_64-2.0.4/bin/mongod --shardsvr --replSet DSSB --dbpath /data/db4 --port 27025 --nojournal
     –   Arbiter:
     –   ./mongodb-linux-x86_64-2.0.4/bin/mongod --shardsvr --replSet DSSB --dbpath /data/db8 --port 27035 --nojournal


•   Replica set C (Shard C: 2 + arbiter)
     –   ./mongodb-linux-x86_64-2.0.4/bin/mongod --shardsvr --replSet DSSC --dbpath /data/db5 --port 27027 --nojournal
     –   ./mongodb-linux-x86_64-2.0.4/bin/mongod --shardsvr --replSet DSSC --dbpath /data/db6 --port 27029 --nojournal
     –   Arbiter:
     –   ./mongodb-linux-x86_64-2.0.4/bin/mongod --shardsvr --replSet DSSC --dbpath /data/db9 --port 27039 --nojournal
Setup 2
•   Set up the replica sets: connect to the first member of each set and apply the configuration
•   Set replica A
     ./mongodb-linux-x86_64-2.0.4/bin/mongo --port 27018
     cfg = {
                    _id : "DSSA",
                    members : [
                      {_id : 0, host : "hostname:27018"},
                      {_id : 1, host : "hostname:27021"},
                    {_id : 2, host : "hostname:27031", arbiterOnly:true}
                    ]
                }
     rs.initiate(cfg)
     db.getMongo().setSlaveOk()

•   Set replica B
     ./mongodb-linux-x86_64-2.0.4/bin/mongo --port 27023
           cfg = {
              _id : "DSSB",
              members : [
                {_id : 0, host : "hostname:27023"},
                {_id : 1, host : "hostname:27025"},
              {_id : 2, host : "hostname:27035", arbiterOnly:true}
              ]
           }
           rs.initiate(cfg)
           db.getMongo().setSlaveOk()

•   Set replica C
     ./mongodb-linux-x86_64-2.0.4/bin/mongo --port 27027
           cfg = {
              _id : "DSSC",
              members : [
                {_id : 0, host : "hostname:27027"},
                {_id : 1, host : "hostname:27029"},
              {_id : 2, host : "hostname:27039", arbiterOnly:true},
              ]
           }
           rs.initiate(cfg)
           db.getMongo().setSlaveOk()
Setup 3
•   Start the config server
     ./mongodb-linux-x86_64-2.0.4/bin/mongod --configsvr --nojournal



•   Start router
     ./mongodb-linux-x86_64-2.0.4/bin/mongos --configdb grog:27019 --chunkSize 1


•   Configure Shards
     ./mongodb-linux-x86_64-2.0.4/bin/mongo admin
     db.runCommand( { addshard : "DSSA/hostname:27018,hostname:27021"})
     db.runCommand( { addshard : "DSSB/hostname:27023,hostname:27025"})
     db.runCommand( { addshard : "DSSC/hostname:27027,hostname:27029"})
     db.runCommand( { enablesharding : "test2"})
     db.runCommand( { shardcollection : "test2.testSchema1",key : { owner : 1}})

•   Load data…

     – We load 11 documents; sharding is done on the "owner" field
MapReduce
•   "Map" step: The master node takes the input, divides it into smaller sub-
    problems, and distributes them to worker nodes. A worker node may do
    this again in turn, leading to a multi-level tree structure. The worker node
    processes the smaller problem, and passes the answer back to its master
    node.
•   "Reduce" step: The master node then collects the answers to all the sub-
    problems and combines them in some way to form the output – the
    answer to the problem it was originally trying to solve.
•   Source: Wikipedia
MapReduce
•   map = function(){
             if (!this.keywords) {
                 return;
             }
             for (var index in this.keywords) {
                 emit(this.keywords[index], 1);
             }
    }
•   reduce = function(key, values){
             var count = 0;
             for (var index in values) {
                 count += values[index];
             }
             return count;
    }
•   result = db.runCommand({
             "mapreduce" : "testSchema1",
             "map" : map,
             "reduce" : reduce,
             "out" : "keywords"})
    db.keywords.find()
    mongos> db.keywords.find({_id:"hello"})
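To see what the job computes without a running cluster, here is a plain-JavaScript simulation of the same emit/group/reduce flow. The sample documents are made up for illustration; on a real mongod the server iterates the collection and calls map with each document bound to `this`:

```javascript
// Plain-JS simulation of the map/reduce job above (sample docs are invented).
var docs = [
  { keywords: ["hello", "world"] },
  { keywords: ["hello", "mongodb"] },
  { ascii: "no keywords field here" }          // skipped by map
];

var emitted = {};                              // key -> array of emitted values
function emit(key, value) {
  (emitted[key] = emitted[key] || []).push(value);
}

var map = function () {
  if (!this.keywords) { return; }
  for (var i in this.keywords) { emit(this.keywords[i], 1); }
};

var reduce = function (key, values) {
  var count = 0;
  for (var i in values) { count += values[i]; }
  return count;
};

docs.forEach(function (d) { map.call(d); });   // mongod binds each doc to `this`

var out = {};
for (var key in emitted) { out[key] = reduce(key, emitted[key]); }
// out -> { hello: 2, world: 1, mongodb: 1 }
```

This mirrors the contract the real engine enforces: reduce receives all values emitted for one key and must return a single combined value.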
Check Sharding
•   Connect to router and count the records:
     ./mongodb-linux-x86_64-2.0.4/bin/mongo admin
     mongos> use test2
     mongos> db.testSchema1.count()
     11
•   Connect to each primary (and see the number of records in each shard):
     ./mongodb-linux-x86_64-2.0.4/bin/mongo --port 27018
     mongo> use test2
     mongo> db.testSchema1.count()
     4
     ./mongodb-linux-x86_64-2.0.4/bin/mongo --port 27023
     mongo> use test2
     mongo> db.testSchema1.count()
     4
     ./mongodb-linux-x86_64-2.0.4/bin/mongo --port 27027
     mongo> use test2
     mongo> db.testSchema1.count()
     3
Check Replication
•   Kill Server 1 (=Primary A)
•   Connect to router and count the records:
     mongos> use test2
     mongos> db.testSchema1.count()
     11
•   Check that (Server 2) Secondary A is now Primary
•   Load a new chunk of data
•   The count will be 22
•   Restart the killed server (Server 1), wait
•   Kill the other one (Server 2), currently Primary A
•   Check that Server 1 is Primary again
•   The count will still be 22
•   Restart Server 2

UiPath Test Automation using UiPath Test Suite series, part 3
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
Search and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesSearch and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical Futures
 

MongoDB – Roma: Replication and Sharding (transcript)

MongoDB – Roma
 12 Luglio 2012
 Replication and Sharding:
         Hands on

     Guglielmo Incisa
Replication
• What is it
   – Data is replicated (cloned) onto at least two nodes
   – Updates are sent to one node (Primary) and automatically propagated
     to the others (Secondary)
   – Connections can go through a router or directly to the Primary (a Secondary
     is read only)
       • If we connect our app server to the Primary we must deal with its failure and
         reconnect to the new Primary
(Diagram: App server → Router → Primary DB)
Replication
• Why we need it
   – If one node fails the application server can still work without any
     impact
   – The router will automatically manage the connection to the rest of the
     nodes (the router itself may be subject to failure, though)
(Diagram: App server → Router → Primary DB)
Replication
• Why we need it
   – More and more IT departments are moving from
       • Big, proprietary, reliable and expensive servers
   – To
       • Commodity hardware (smaller, less reliable, inexpensive servers: PCs)
   – Commodity hardware is less reliable, but our users demand that our
     applications be always available: replication can help.
   – Example: how many servers do I need to reach 99.999% availability?
       • If, for example, a PC has 98% availability (about 7 days of downtime in a
         year, i.e. a 2% probability of being down at any moment)
       • -> Two replicated PCs give 99.96% availability
       • -> Three replicated PCs give more than 99.999% (Telecom Grade / Core Network).
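The arithmetic on this slide can be checked directly. A minimal sketch, assuming the replicas fail independently, so the system is down only when all replicas are down at once:

```javascript
// Availability of n independent replicas, each with availability p:
// the system fails only when every replica is down simultaneously,
// so A(n) = 1 - (1 - p)^n.
function replicatedAvailability(p, n) {
  return 1 - Math.pow(1 - p, n);
}

console.log(replicatedAvailability(0.98, 2)); // ~0.9996   -> 99.96%
console.log(replicatedAvailability(0.98, 3)); // ~0.999992 -> over "five nines"
```

With 98% per node, two replicas already cut expected downtime from days to hours per year, and three push it below five minutes.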
Sharding
• What is it
   – Data is partitioned and distributed to different nodes
       • Some records are in node 1, others in node 2, etc.
   – MongoDB sharding: the partition is based on a field.
       • Database: test2
            – Table: testSchema1
            – Fields:
                 » owner: owner of the file, key and shard key (string)
                 » date (string)
                 » tags (list of strings)
                 » keywords: words in the document, created by Java code (list of strings)
                 » fileName (string)
                 » content: the file (binary)
                 » ascii: the file (string)
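As a sketch, a document in test2.testSchema1 might look like this. Only the field names come from the slide; every value below is invented for illustration:

```javascript
// Hypothetical example document for test2.testSchema1.
// Field names follow the slide; the values are made up.
const doc = {
  owner: "Alice",                  // shard key
  date: "2012-07-12",
  tags: ["report", "mongodb"],
  keywords: ["hello", "world"],    // extracted from the file by the Java loader
  fileName: "report.txt",
  // content: <binary data>        // omitted in this sketch
  ascii: "hello world"             // the same file as plain text
};
console.log(doc.owner);
```

Since `owner` is the shard key, every document must carry it, and mongos uses it to decide which shard the document lives on.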
Sharding
• Why we need it
   – Servers with smaller storage
   – To increase responsiveness by increasing parallelism
(Diagram: Router in front of three shards, partitioned by owner: A–H, I–O, P–Z)
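The range-based routing in the diagram can be sketched in a few lines. The letter ranges and shard names below are illustrative; real mongos routes by chunk ranges stored on the config servers, not a hard-coded table:

```javascript
// Toy range-based router: pick a shard from the first letter of the
// shard key, mimicking the owner: A-H / I-O / P-Z split in the diagram.
function shardFor(owner) {
  const c = owner[0].toUpperCase();
  if (c >= "A" && c <= "H") return "shardA";
  if (c >= "I" && c <= "O") return "shardB";
  return "shardC";
}

console.log(shardFor("Alice"));  // shardA
console.log(shardFor("Marco"));  // shardB
console.log(shardFor("Walter")); // shardC
```

Because each query on `owner` touches only one range, the three shards can serve different users in parallel.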
Replication and Sharding
• Can we have both?
   – MongoDB: yes!
• Our example:
   – Shard A: 2 nodes + arbiter
   – Shard B: 2 nodes + arbiter
   – Shard C: 2 nodes + arbiter
   – Config process
   – Router (mongos)
Replication and Sharding
• Replication:
   – Two nodes and an arbiter
       • The arbiter is needed when an even number of nodes is used; it decides which
         server is Primary and which is Secondary, and manages the promotion when one
         is down
• Sharding
   – Three sets: A, B, C
   – Config process:
       • <<The config servers store the cluster's metadata, which includes basic
         information on each shard server and the chunks contained therein.>>
   – Routing process:
       • <<The mongos process can be thought of as a routing and coordination process
         that makes the various components of the cluster look like a single system.
         When receiving client requests, the mongos process routes the request to the
         appropriate server(s) and merges any results to be sent back to the client.>>
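Why the arbiter matters can be seen with a simple majority count. This is a simplified model; real replica-set elections also weigh member priority and how up to date each node is:

```javascript
// A replica set can elect a Primary only while a strict majority of
// voting members is reachable. With 2 data nodes alone (2 voters),
// losing one leaves 1 of 2: no majority, no Primary. Adding an
// arbiter makes 3 voters, so 2 of 3 still form a majority.
function canElectPrimary(reachableVoters, totalVoters) {
  return reachableVoters > totalVoters / 2;
}

console.log(canElectPrimary(1, 2)); // false: two data nodes, one down
console.log(canElectPrimary(2, 3)); // true: with arbiter, one data node down
```

The arbiter holds no data; it exists only to break the 1-vs-1 tie, which is why each shard here runs 2 data nodes + 1 arbiter.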
Setup 1
• Start servers and arbiters
   – Create /data/db, db2, db3, db4, db5, db6, db7, db8, db9, configdb
   – --nojournal speeds up the startup (journalling is the default on 64 bit)
• Replica set A
   ./mongodb-linux-x86_64-2.0.4/bin/mongod --shardsvr --replSet DSSA --nojournal
   ./mongodb-linux-x86_64-2.0.4/bin/mongod --shardsvr --replSet DSSA --dbpath /data/db2 --port 27021 --nojournal
   – Arbiter:
   ./mongodb-linux-x86_64-2.0.4/bin/mongod --shardsvr --replSet DSSA --dbpath /data/db7 --port 27031 --nojournal
• Replica set B
   ./mongodb-linux-x86_64-2.0.4/bin/mongod --shardsvr --replSet DSSB --dbpath /data/db3 --port 27023 --nojournal
   ./mongodb-linux-x86_64-2.0.4/bin/mongod --shardsvr --replSet DSSB --dbpath /data/db4 --port 27025 --nojournal
   – Arbiter:
   ./mongodb-linux-x86_64-2.0.4/bin/mongod --shardsvr --replSet DSSB --dbpath /data/db8 --port 27035 --nojournal
• Replica set C
   ./mongodb-linux-x86_64-2.0.4/bin/mongod --shardsvr --replSet DSSC --dbpath /data/db5 --port 27027 --nojournal
   ./mongodb-linux-x86_64-2.0.4/bin/mongod --shardsvr --replSet DSSC --dbpath /data/db6 --port 27029 --nojournal
   – Arbiter:
   ./mongodb-linux-x86_64-2.0.4/bin/mongod --shardsvr --replSet DSSC --dbpath /data/db9 --port 27039 --nojournal
Setup 2
• Set the replicas: connect to each primary and set the configuration
• Set replica A
   ./mongodb-linux-x86_64-2.0.4/bin/mongo --port 27018
   cfg = { _id : "DSSA", members : [ {_id : 0, host : "hostname:27018"}, {_id : 1, host : "hostname:27021"}, {_id : 2, host : "hostname:27031", arbiterOnly : true} ] }
   rs.initiate(cfg)
   db.getMongo().setSlaveOk()
• Set replica B
   ./mongodb-linux-x86_64-2.0.4/bin/mongo --port 27023
   cfg = { _id : "DSSB", members : [ {_id : 0, host : "hostname:27023"}, {_id : 1, host : "hostname:27025"}, {_id : 2, host : "hostname:27035", arbiterOnly : true} ] }
   rs.initiate(cfg)
   db.getMongo().setSlaveOk()
• Set replica C
   ./mongodb-linux-x86_64-2.0.4/bin/mongo --port 27027
   cfg = { _id : "DSSC", members : [ {_id : 0, host : "hostname:27027"}, {_id : 1, host : "hostname:27029"}, {_id : 2, host : "hostname:27039", arbiterOnly : true} ] }
   rs.initiate(cfg)
   db.getMongo().setSlaveOk()
Setup 3
• Start config server
   ./mongodb-linux-x86_64-2.0.4/bin/mongod --configsvr --nojournal
• Start router
   ./mongodb-linux-x86_64-2.0.4/bin/mongos --configdb grog:27019 --chunkSize 1
• Configure shards
   ./mongodb-linux-x86_64-2.0.4/bin/mongo admin
   db.runCommand( { addshard : "DSSA/hostname:27018,hostname:27021" } )
   db.runCommand( { addshard : "DSSB/hostname:27023,hostname:27025" } )
   db.runCommand( { addshard : "DSSC/hostname:27027,hostname:27029" } )
   db.runCommand( { enablesharding : "test2" } )
   db.runCommand( { shardcollection : "test2.testSchema1", key : { owner : 1 } } )
• Load data…
   – We load 11 documents; sharding is done over the "owner" field
MapReduce
• "Map" step: The master node takes the input, divides it into smaller
  sub-problems, and distributes them to worker nodes. A worker node may do this
  again in turn, leading to a multi-level tree structure. The worker node
  processes the smaller problem, and passes the answer back to its master node.
• "Reduce" step: The master node then collects the answers to all the
  sub-problems and combines them in some way to form the output – the answer to
  the problem it was originally trying to solve.
• Source: Wikipedia
MapReduce
• map = function() {
      if (!this.keywords) { return; }
      for (index in this.keywords) {
          emit(this.keywords[index], 1);
      }
  }
• reduce = function(key, values) {
      var count = 0;
      for (index in values) {
          count += values[index];
      }
      return count;
  }
• result = db.runCommand({ "mapreduce" : "testSchema1", "map" : map, "reduce" : reduce, "out" : "keywords" })
  db.keywords.find()
  mongos> db.keywords.find({_id : "hello"})
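The same word count can be simulated outside MongoDB to check the logic. A plain-JavaScript sketch: the emit/group/reduce plumbing below is what mongod does internally, and the sample documents are invented:

```javascript
// Simulate MongoDB's mapReduce pipeline for the keyword count above.
const docs = [
  { keywords: ["hello", "world"] },
  { keywords: ["hello"] },
  { }                                // no keywords: map emits nothing
];

// "Map" step: collect every emit(key, 1) the map function would make.
const emitted = [];
for (const doc of docs) {
  if (!doc.keywords) continue;
  for (const kw of doc.keywords) emitted.push([kw, 1]);
}

// Group emitted values by key, then apply the reduce function per key.
const grouped = {};
for (const [key, value] of emitted) {
  (grouped[key] = grouped[key] || []).push(value);
}
const reduce = (key, values) => values.reduce((count, v) => count + v, 0);
const result = {};
for (const key in grouped) result[key] = reduce(key, grouped[key]);

console.log(result); // { hello: 2, world: 1 }
```

In the sharded cluster, each shard runs map and reduce on its own chunks and mongos merges the partial results, which is why reduce must be able to run on its own output.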
Check Sharding
• Connect to the router and count the records:
   ./mongodb-linux-x86_64-2.0.4/bin/mongo admin
   mongos> use test2
   mongos> db.testSchema1.count()
   11
• Connect to each primary (and see the number of records in each shard):
   ./mongodb-linux-x86_64-2.0.4/bin/mongo --port 27018
   mongo> use test2
   mongo> db.testSchema1.count()
   4
   ./mongodb-linux-x86_64-2.0.4/bin/mongo --port 27023
   mongo> use test2
   mongo> db.testSchema1.count()
   4
   ./mongodb-linux-x86_64-2.0.4/bin/mongo --port 27027
   mongo> use test2
   mongo> db.testSchema1.count()
   3
Check Replication
• Kill Server 1 (= Primary A)
• Connect to the router and count the records:
   mongos> use test2
   mongos> db.testSchema1.count()
   11
• Check that Secondary A (Server 2) is now Primary
• Load a new chunk
• The count will be 22
• Restart the killed server (Server 1) and wait
• Kill the other one (Server 2), now Primary A
• Check that Server 1 is Primary again
• The count will still be 22
• Restart Server 2