MongoDB – Roma
 12 July 2012
 Replication and Sharding:
         Hands on

     Guglielmo Incisa
Replication
• What is it
   – Data is replicated (cloned) into at least two nodes
   – Updates are sent to one node (Primary) and automatically propagated
     to the others (Secondary)
   – Connections can go through a router or directly to the Primary (the
     Secondary is read only)
       • If we connect our app server to the Primary we must deal with its failure and
         reconnect to the new Primary



       (Diagram: the app server connects, through the router or directly, to the Primary; updates propagate to the Secondary.)
Replication
• Why we need it
   – If one node fails the application server can still work without any
     impact
   – The router will automatically manage the connection to the rest of the
     nodes (router may be subject to failure though)


       (Diagram: if the Primary fails, the router redirects the app server to the remaining nodes.)
Replication
• Why we need it
   – More and more IT departments are moving from
       • Big, proprietary, reliable and expensive servers
   – To
       • Commodity Hardware (smaller, less reliable, inexpensive servers: PC)
   – Commodity hardware is less reliable, but our users demand that our
     applications be always available: replication can help.
   – Example: how many servers do I need to reach 99.999% availability?
       • If, for example, a PC has 98% availability (about 7 days of downtime in a year,
         i.e. a 2% probability of being down at any given moment)
       • -> Two replicated PCs have 99.96% availability
       • -> Three replicated PCs have more than 99.999% (Telecom Grade / Core Network).
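The availability figures above can be checked with a few lines of JavaScript, assuming the replicas fail independently (the service is down only when every replica is down at once):

```javascript
// Availability of n independently failing replicas:
// the whole service is unavailable only if all n nodes are down together.
function availability(perNode, n) {
  return 1 - Math.pow(1 - perNode, n);
}

console.log(availability(0.98, 2).toFixed(4)); // 0.9996  -> 99.96%
console.log(availability(0.98, 3).toFixed(6)); // 0.999992 -> more than "five nines"
```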
Sharding
• What is it
   – Data is partitioned and distributed to different nodes
       • Some records are in node 1, others in node 2 etc…
   – MongoDB Sharding: the partition is based on a field.
       • Database: test2
            – Table: testSchema1
            – Fields:
                  » owner: owner of the file, key and shard key (string)
                  » date (string)
                  » tags (list of string)
                  » keywords: words in the document, created by java code below (list of string)
                  » fileName (string)
                  » content: the file (binary)
                  » ascii: the file (string)
Sharding
• Why we need it
   – Servers with smaller storage
   – To increase responsiveness by increasing parallelism

       (Diagram: the router distributes documents to three shards by owner: A–H, I–O, P–Z.)
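A plain-JavaScript sketch of what range-based routing on the shard key does (the range table and shard names here are illustrative, not MongoDB's actual API):

```javascript
// Illustrative range table for the "owner" shard key, mirroring the
// A-H / I-O / P-Z split in the diagram (shard names are hypothetical).
const ranges = [
  { shard: "shardA", min: "A", max: "I" },  // owners A-H
  { shard: "shardB", min: "I", max: "P" },  // owners I-O
  { shard: "shardC", min: "P", max: "[" }   // owners P-Z ("[" sorts right after "Z")
];

// The router looks up which range the shard key value falls into
// and forwards the query to that shard only.
function routeByOwner(owner) {
  const key = owner.toUpperCase();
  const range = ranges.find(r => r.min <= key && key < r.max);
  return range ? range.shard : null;
}

console.log(routeByOwner("Alice")); // shardA
console.log(routeByOwner("Mario")); // shardB
console.log(routeByOwner("Zoe"));   // shardC
```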
Replication and Sharding
• Can we have both?
   – MongoDB: yes!
• Our example:


       (Diagram: the mongos router, with a config process, in front of three shards A, B and C, each replicated on 2 nodes + an arbiter.)
Replication and Sharding
• Replication:
      – Two nodes and an arbiter
           •   The arbiter is needed when an even number of nodes is used: it votes to decide which server is Primary and which
               one is Secondary, and enables failover when one is down

•   Sharding
     – Three sets: A, B, C
     – Config Process:
          •   <<The config servers store the cluster's metadata, which includes basic information on each shard server and
              the chunks contained therein.>>
     – Routing Process:
          •   <<The mongos process can be thought of as a routing and coordination process that makes the various
              components of the cluster look like a single system. When receiving client requests, the mongos process routes
              the request to the appropriate server(s) and merges any results to be sent back to the client.>>
Setup 1
•   Start Servers and arbiters
     –   Create /data/db, db2, db3, db4, db5, db6, db7, db8 ,db9, configdb
     –   --nojournal speeds up the startup (journalling is default in 64 bit)
•   Replica set A
     –   ./mongodb-linux-x86_64-2.0.4/bin/mongod --shardsvr --replSet DSSA --nojournal
     –   ./mongodb-linux-x86_64-2.0.4/bin/mongod --shardsvr --replSet DSSA --dbpath /data/db2
         --port 27021 --nojournal
     –   Arbiter:
     –   ./mongodb-linux-x86_64-2.0.4/bin/mongod --shardsvr --replSet DSSA --dbpath /data/db7
         --port 27031 --nojournal


•   Replica set B
     –   ./mongodb-linux-x86_64-2.0.4/bin/mongod --shardsvr --replSet DSSB --dbpath /data/db3
         --port 27023 --nojournal
     –   ./mongodb-linux-x86_64-2.0.4/bin/mongod --shardsvr --replSet DSSB --dbpath /data/db4
         --port 27025 --nojournal
     –   Arbiter:
     –   ./mongodb-linux-x86_64-2.0.4/bin/mongod --shardsvr --replSet DSSB --dbpath /data/db8
         --port 27035 --nojournal


•   Replica set C
     –   ./mongodb-linux-x86_64-2.0.4/bin/mongod --shardsvr --replSet DSSC --dbpath /data/db5
         --port 27027 --nojournal
     –   ./mongodb-linux-x86_64-2.0.4/bin/mongod --shardsvr --replSet DSSC --dbpath /data/db6
         --port 27029 --nojournal
     –   Arbiter:
     –   ./mongodb-linux-x86_64-2.0.4/bin/mongod --shardsvr --replSet DSSC --dbpath /data/db9
         --port 27039 --nojournal
Setup 2
•   Configure the replica sets: connect to the first node of each set and apply the configuration
•   Set replica A
     ./mongodb-linux-x86_64-2.0.4/bin/mongo --port 27018
     cfg = {
        _id : "DSSA",
        members : [
          {_id : 0, host : "hostname:27018"},
          {_id : 1, host : "hostname:27021"},
          {_id : 2, host : "hostname:27031", arbiterOnly:true}
        ]
     }
     rs.initiate(cfg)
     db.getMongo().setSlaveOk()

•   Set replica B
     ./mongodb-linux-x86_64-2.0.4/bin/mongo --port 27023
     cfg = {
        _id : "DSSB",
        members : [
          {_id : 0, host : "hostname:27023"},
          {_id : 1, host : "hostname:27025"},
          {_id : 2, host : "hostname:27035", arbiterOnly:true}
        ]
     }
     rs.initiate(cfg)
     db.getMongo().setSlaveOk()

•   Set replica C
     ./mongodb-linux-x86_64-2.0.4/bin/mongo --port 27027
     cfg = {
        _id : "DSSC",
        members : [
          {_id : 0, host : "hostname:27027"},
          {_id : 1, host : "hostname:27029"},
          {_id : 2, host : "hostname:27039", arbiterOnly:true}
        ]
     }
     rs.initiate(cfg)
     db.getMongo().setSlaveOk()
Setup 3
•   Start config server
     ./mongodb-linux-x86_64-2.0.4/bin/mongod --configsvr --nojournal



•   Start router
     ./mongodb-linux-x86_64-2.0.4/bin/mongos --configdb grog:27019 --chunkSize 1


•   Configure Shards
     ./mongodb-linux-x86_64-2.0.4/bin/mongo admin
     db.runCommand( { addshard : "DSSA/hostname:27018, hostname:27021"})
     db.runCommand( { addshard : "DSSB/hostname:27023, hostname:27025"})
     db.runCommand( { addshard : "DSSC/hostname:27027, hostname:27029"})
     db.runCommand( { enablesharding : "test2"})
     db.runCommand( { shardcollection : "test2.testSchema1",key : { owner : 1}})

•   Load data…

     – We load 11 documents; sharding is done on the "owner" field
MapReduce
•   "Map" step: The master node takes the input, divides it into smaller sub-
    problems, and distributes them to worker nodes. A worker node may do
    this again in turn, leading to a multi-level tree structure. The worker node
    processes the smaller problem, and passes the answer back to its master
    node.
•   "Reduce" step: The master node then collects the answers to all the sub-
    problems and combines them in some way to form the output – the
    answer to the problem it was originally trying to solve.
•   Source: Wikipedia
MapReduce
•   map = function(){
             if(!this.keywords){
                 return;
             }
             for (index in this.keywords){
                 emit(this.keywords[index], 1);
             }
    }
•   reduce = function(key, values){  // reduce receives a key and the array of values emitted for it
             var count = 0;
             for (index in values) {
                 count += values[index];
             }
             return count;
    }
•   result = db.runCommand({
             "mapreduce" : "testSchema1",
             "map" : map,
             "reduce" : reduce,
             "out" : "keywords"})
    db.keywords.find()
    mongos> db.keywords.find({_id : "hello"})
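To see what the map/reduce pair computes, the same keyword count can be simulated in plain Node.js outside MongoDB (the sample documents below are made up for illustration):

```javascript
// Stand-ins for documents in testSchema1 (hypothetical sample data).
const docs = [
  { keywords: ["replication", "sharding"] },
  { keywords: ["replication"] },
  {}  // a document with no keywords is skipped, as in the map function
];

// Collect the (key, value) pairs produced by emit().
const emitted = {};
function emit(key, value) {
  (emitted[key] = emitted[key] || []).push(value);
}

// map step: emit (keyword, 1) for each keyword of each document.
for (const doc of docs) {
  if (!doc.keywords) continue;
  for (const index in doc.keywords) emit(doc.keywords[index], 1);
}

// reduce step: sum the values emitted under each key.
const result = {};
for (const key in emitted) {
  let count = 0;
  for (const i in emitted[key]) count += emitted[key][i];
  result[key] = count;
}

console.log(result); // { replication: 2, sharding: 1 }
```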
Check Sharding
•   Connect to router and count the records:
     ./mongodb-linux-x86_64-2.0.4/bin/mongo admin
     mongos> use test2
     mongos> db.testSchema1.count()
     11
•   Connect to each primary (and see the number of records in each shard):
     ./mongodb-linux-x86_64-2.0.4/bin/mongo --port 27018
     mongo> use test2
     mongo> db.testSchema1.count()
     4
     ./mongodb-linux-x86_64-2.0.4/bin/mongo --port 27023
     mongo> use test2
     mongo> db.testSchema1.count()
     4
     ./mongodb-linux-x86_64-2.0.4/bin/mongo --port 27027
     mongo> use test2
     mongo> db.testSchema1.count()
     3
Check Replication
•   Kill Server 1 (= Primary A)
•   Connect to router and count the records:
     mongos> use test2
     mongos> db.testSchema1.count()
     11
•   Check that (Server 2) Secondary A is now primary
•   Load a new chunk
•   The count will be 22
•   Restart the killed server (Server 1), wait
•   Kill the other one (Server 2, now Primary A)
•   Check that Server 1 is Primary again
•   The count will still be 22
•   Restart Server 2

More Related Content

What's hot

Ceph Performance and Sizing Guide
Ceph Performance and Sizing GuideCeph Performance and Sizing Guide
Ceph Performance and Sizing GuideJose De La Rosa
 
PostgreSQL Performance Tuning
PostgreSQL Performance TuningPostgreSQL Performance Tuning
PostgreSQL Performance Tuningelliando dias
 
DUG'20: 12 - DAOS in Lenovo’s HPC Innovation Center
DUG'20: 12 - DAOS in Lenovo’s HPC Innovation CenterDUG'20: 12 - DAOS in Lenovo’s HPC Innovation Center
DUG'20: 12 - DAOS in Lenovo’s HPC Innovation CenterAndrey Kudryavtsev
 
Mastering PostgreSQL Administration
Mastering PostgreSQL AdministrationMastering PostgreSQL Administration
Mastering PostgreSQL AdministrationEDB
 
Benchmarking MongoDB and CouchBase
Benchmarking MongoDB and CouchBaseBenchmarking MongoDB and CouchBase
Benchmarking MongoDB and CouchBaseChristopher Choi
 
Red Hat Enterprise Linux OpenStack Platform on Inktank Ceph Enterprise
Red Hat Enterprise Linux OpenStack Platform on Inktank Ceph EnterpriseRed Hat Enterprise Linux OpenStack Platform on Inktank Ceph Enterprise
Red Hat Enterprise Linux OpenStack Platform on Inktank Ceph EnterpriseRed_Hat_Storage
 
PGConf.ASIA 2019 - High Availability, 10 Seconds Failover - Lucky Haryadi
PGConf.ASIA 2019 - High Availability, 10 Seconds Failover - Lucky HaryadiPGConf.ASIA 2019 - High Availability, 10 Seconds Failover - Lucky Haryadi
PGConf.ASIA 2019 - High Availability, 10 Seconds Failover - Lucky HaryadiEqunix Business Solutions
 
Logical volume manager xfs
Logical volume manager xfsLogical volume manager xfs
Logical volume manager xfsSarwar Javaid
 
Replication and Replica Sets
Replication and Replica SetsReplication and Replica Sets
Replication and Replica SetsMongoDB
 
Storage Systems for big data - HDFS, HBase, and intro to KV Store - Redis
Storage Systems for big data - HDFS, HBase, and intro to KV Store - RedisStorage Systems for big data - HDFS, HBase, and intro to KV Store - Redis
Storage Systems for big data - HDFS, HBase, and intro to KV Store - RedisSameer Tiwari
 
Migrating to XtraDB Cluster
Migrating to XtraDB ClusterMigrating to XtraDB Cluster
Migrating to XtraDB Clusterpercona2013
 
HADOOP 실제 구성 사례, Multi-Node 구성
HADOOP 실제 구성 사례, Multi-Node 구성HADOOP 실제 구성 사례, Multi-Node 구성
HADOOP 실제 구성 사례, Multi-Node 구성Young Pyo
 
A DBA’s guide to using TSA
A DBA’s guide to using TSAA DBA’s guide to using TSA
A DBA’s guide to using TSAFrederik Engelen
 
Elastic Search Training#1 (brief tutorial)-ESCC#1
Elastic Search Training#1 (brief tutorial)-ESCC#1Elastic Search Training#1 (brief tutorial)-ESCC#1
Elastic Search Training#1 (brief tutorial)-ESCC#1medcl
 
Inside PostgreSQL Shared Memory
Inside PostgreSQL Shared MemoryInside PostgreSQL Shared Memory
Inside PostgreSQL Shared MemoryEDB
 
MariaDB 10.5 binary install (바이너리 설치)
MariaDB 10.5 binary install (바이너리 설치)MariaDB 10.5 binary install (바이너리 설치)
MariaDB 10.5 binary install (바이너리 설치)NeoClova
 

What's hot (20)

Ceph Performance and Sizing Guide
Ceph Performance and Sizing GuideCeph Performance and Sizing Guide
Ceph Performance and Sizing Guide
 
PostgreSQL Performance Tuning
PostgreSQL Performance TuningPostgreSQL Performance Tuning
PostgreSQL Performance Tuning
 
DUG'20: 12 - DAOS in Lenovo’s HPC Innovation Center
DUG'20: 12 - DAOS in Lenovo’s HPC Innovation CenterDUG'20: 12 - DAOS in Lenovo’s HPC Innovation Center
DUG'20: 12 - DAOS in Lenovo’s HPC Innovation Center
 
Mastering PostgreSQL Administration
Mastering PostgreSQL AdministrationMastering PostgreSQL Administration
Mastering PostgreSQL Administration
 
Mongodb replication
Mongodb replicationMongodb replication
Mongodb replication
 
Benchmarking MongoDB and CouchBase
Benchmarking MongoDB and CouchBaseBenchmarking MongoDB and CouchBase
Benchmarking MongoDB and CouchBase
 
Red Hat Enterprise Linux OpenStack Platform on Inktank Ceph Enterprise
Red Hat Enterprise Linux OpenStack Platform on Inktank Ceph EnterpriseRed Hat Enterprise Linux OpenStack Platform on Inktank Ceph Enterprise
Red Hat Enterprise Linux OpenStack Platform on Inktank Ceph Enterprise
 
PGConf.ASIA 2019 - High Availability, 10 Seconds Failover - Lucky Haryadi
PGConf.ASIA 2019 - High Availability, 10 Seconds Failover - Lucky HaryadiPGConf.ASIA 2019 - High Availability, 10 Seconds Failover - Lucky Haryadi
PGConf.ASIA 2019 - High Availability, 10 Seconds Failover - Lucky Haryadi
 
Upgrade & ndmp
Upgrade & ndmpUpgrade & ndmp
Upgrade & ndmp
 
Hadoop HDFS Concepts
Hadoop HDFS ConceptsHadoop HDFS Concepts
Hadoop HDFS Concepts
 
Logical volume manager xfs
Logical volume manager xfsLogical volume manager xfs
Logical volume manager xfs
 
Replication and Replica Sets
Replication and Replica SetsReplication and Replica Sets
Replication and Replica Sets
 
Hadoop Introduction
Hadoop IntroductionHadoop Introduction
Hadoop Introduction
 
Storage Systems for big data - HDFS, HBase, and intro to KV Store - Redis
Storage Systems for big data - HDFS, HBase, and intro to KV Store - RedisStorage Systems for big data - HDFS, HBase, and intro to KV Store - Redis
Storage Systems for big data - HDFS, HBase, and intro to KV Store - Redis
 
Migrating to XtraDB Cluster
Migrating to XtraDB ClusterMigrating to XtraDB Cluster
Migrating to XtraDB Cluster
 
HADOOP 실제 구성 사례, Multi-Node 구성
HADOOP 실제 구성 사례, Multi-Node 구성HADOOP 실제 구성 사례, Multi-Node 구성
HADOOP 실제 구성 사례, Multi-Node 구성
 
A DBA’s guide to using TSA
A DBA’s guide to using TSAA DBA’s guide to using TSA
A DBA’s guide to using TSA
 
Elastic Search Training#1 (brief tutorial)-ESCC#1
Elastic Search Training#1 (brief tutorial)-ESCC#1Elastic Search Training#1 (brief tutorial)-ESCC#1
Elastic Search Training#1 (brief tutorial)-ESCC#1
 
Inside PostgreSQL Shared Memory
Inside PostgreSQL Shared MemoryInside PostgreSQL Shared Memory
Inside PostgreSQL Shared Memory
 
MariaDB 10.5 binary install (바이너리 설치)
MariaDB 10.5 binary install (바이너리 설치)MariaDB 10.5 binary install (바이너리 설치)
MariaDB 10.5 binary install (바이너리 설치)
 

Viewers also liked

MongoDB Replication and Sharding
MongoDB Replication and ShardingMongoDB Replication and Sharding
MongoDB Replication and ShardingTharun Srinivasa
 
MongoDB: Replication,Sharding,MapReduce
MongoDB: Replication,Sharding,MapReduceMongoDB: Replication,Sharding,MapReduce
MongoDB: Replication,Sharding,MapReduceTakahiro Inoue
 
MongoDB: Advance concepts - Replication and Sharding
MongoDB: Advance concepts - Replication and ShardingMongoDB: Advance concepts - Replication and Sharding
MongoDB: Advance concepts - Replication and ShardingKnoldus Inc.
 
MongoDB Introduction - Document Oriented Nosql Database
MongoDB Introduction - Document Oriented Nosql DatabaseMongoDB Introduction - Document Oriented Nosql Database
MongoDB Introduction - Document Oriented Nosql DatabaseSudhir Patil
 
"Sharding - patterns & antipatterns". Доклад Алексея Рыбака (Badoo) и Констан...
"Sharding - patterns & antipatterns". Доклад Алексея Рыбака (Badoo) и Констан..."Sharding - patterns & antipatterns". Доклад Алексея Рыбака (Badoo) и Констан...
"Sharding - patterns & antipatterns". Доклад Алексея Рыбака (Badoo) и Констан...Badoo Development
 

Viewers also liked (6)

MongoDB by Tonny
MongoDB by TonnyMongoDB by Tonny
MongoDB by Tonny
 
MongoDB Replication and Sharding
MongoDB Replication and ShardingMongoDB Replication and Sharding
MongoDB Replication and Sharding
 
MongoDB: Replication,Sharding,MapReduce
MongoDB: Replication,Sharding,MapReduceMongoDB: Replication,Sharding,MapReduce
MongoDB: Replication,Sharding,MapReduce
 
MongoDB: Advance concepts - Replication and Sharding
MongoDB: Advance concepts - Replication and ShardingMongoDB: Advance concepts - Replication and Sharding
MongoDB: Advance concepts - Replication and Sharding
 
MongoDB Introduction - Document Oriented Nosql Database
MongoDB Introduction - Document Oriented Nosql DatabaseMongoDB Introduction - Document Oriented Nosql Database
MongoDB Introduction - Document Oriented Nosql Database
 
"Sharding - patterns & antipatterns". Доклад Алексея Рыбака (Badoo) и Констан...
"Sharding - patterns & antipatterns". Доклад Алексея Рыбака (Badoo) и Констан..."Sharding - patterns & antipatterns". Доклад Алексея Рыбака (Badoo) и Констан...
"Sharding - patterns & antipatterns". Доклад Алексея Рыбака (Badoo) и Констан...
 

Similar to Mongo db roma replication and sharding

MongoDB for Time Series Data Part 3: Sharding
MongoDB for Time Series Data Part 3: ShardingMongoDB for Time Series Data Part 3: Sharding
MongoDB for Time Series Data Part 3: ShardingMongoDB
 
NoSQL Infrastructure - Late 2013
NoSQL Infrastructure - Late 2013NoSQL Infrastructure - Late 2013
NoSQL Infrastructure - Late 2013Server Density
 
MongoDB: Optimising for Performance, Scale & Analytics
MongoDB: Optimising for Performance, Scale & AnalyticsMongoDB: Optimising for Performance, Scale & Analytics
MongoDB: Optimising for Performance, Scale & AnalyticsServer Density
 
2014 05-07-fr - add dev series - session 6 - deploying your application-2
2014 05-07-fr - add dev series - session 6 - deploying your application-22014 05-07-fr - add dev series - session 6 - deploying your application-2
2014 05-07-fr - add dev series - session 6 - deploying your application-2MongoDB
 
Ops Jumpstart: MongoDB Administration 101
Ops Jumpstart: MongoDB Administration 101Ops Jumpstart: MongoDB Administration 101
Ops Jumpstart: MongoDB Administration 101MongoDB
 
MongoDB for Time Series Data: Sharding
MongoDB for Time Series Data: ShardingMongoDB for Time Series Data: Sharding
MongoDB for Time Series Data: ShardingMongoDB
 
Sharding - Seoul 2012
Sharding - Seoul 2012Sharding - Seoul 2012
Sharding - Seoul 2012MongoDB
 
Sharding
ShardingSharding
ShardingMongoDB
 
Scylla Summit 2022: ScyllaDB Rust Driver: One Driver to Rule Them All
Scylla Summit 2022: ScyllaDB Rust Driver: One Driver to Rule Them AllScylla Summit 2022: ScyllaDB Rust Driver: One Driver to Rule Them All
Scylla Summit 2022: ScyllaDB Rust Driver: One Driver to Rule Them AllScyllaDB
 
Deployment Strategy
Deployment StrategyDeployment Strategy
Deployment StrategyMongoDB
 
Sharding
ShardingSharding
ShardingMongoDB
 
Get expertise with mongo db
Get expertise with mongo dbGet expertise with mongo db
Get expertise with mongo dbAmit Thakkar
 
Using Spring with NoSQL databases (SpringOne China 2012)
Using Spring with NoSQL databases (SpringOne China 2012)Using Spring with NoSQL databases (SpringOne China 2012)
Using Spring with NoSQL databases (SpringOne China 2012)Chris Richardson
 
Spil Games @ FOSDEM: Galera Replicator IRL
Spil Games @ FOSDEM: Galera Replicator IRLSpil Games @ FOSDEM: Galera Replicator IRL
Spil Games @ FOSDEM: Galera Replicator IRLspil-engineering
 
Deployment Strategies
Deployment StrategiesDeployment Strategies
Deployment StrategiesMongoDB
 
2016 feb-23 pyugre-py_mongo
2016 feb-23 pyugre-py_mongo2016 feb-23 pyugre-py_mongo
2016 feb-23 pyugre-py_mongoMichael Bright
 

Similar to Mongo db roma replication and sharding (20)

MongoDB for Time Series Data Part 3: Sharding
MongoDB for Time Series Data Part 3: ShardingMongoDB for Time Series Data Part 3: Sharding
MongoDB for Time Series Data Part 3: Sharding
 
Mongodb workshop
Mongodb workshopMongodb workshop
Mongodb workshop
 
NoSQL Infrastructure - Late 2013
NoSQL Infrastructure - Late 2013NoSQL Infrastructure - Late 2013
NoSQL Infrastructure - Late 2013
 
MongoDB: Optimising for Performance, Scale & Analytics
MongoDB: Optimising for Performance, Scale & AnalyticsMongoDB: Optimising for Performance, Scale & Analytics
MongoDB: Optimising for Performance, Scale & Analytics
 
NoSQL Infrastructure
NoSQL InfrastructureNoSQL Infrastructure
NoSQL Infrastructure
 
2014 05-07-fr - add dev series - session 6 - deploying your application-2
2014 05-07-fr - add dev series - session 6 - deploying your application-22014 05-07-fr - add dev series - session 6 - deploying your application-2
2014 05-07-fr - add dev series - session 6 - deploying your application-2
 
Introduction to Mongodb
Introduction to MongodbIntroduction to Mongodb
Introduction to Mongodb
 
Ops Jumpstart: MongoDB Administration 101
Ops Jumpstart: MongoDB Administration 101Ops Jumpstart: MongoDB Administration 101
Ops Jumpstart: MongoDB Administration 101
 
MongoDB for Time Series Data: Sharding
MongoDB for Time Series Data: ShardingMongoDB for Time Series Data: Sharding
MongoDB for Time Series Data: Sharding
 
Sharding - Seoul 2012
Sharding - Seoul 2012Sharding - Seoul 2012
Sharding - Seoul 2012
 
Sharding
ShardingSharding
Sharding
 
Scylla Summit 2022: ScyllaDB Rust Driver: One Driver to Rule Them All
Scylla Summit 2022: ScyllaDB Rust Driver: One Driver to Rule Them AllScylla Summit 2022: ScyllaDB Rust Driver: One Driver to Rule Them All
Scylla Summit 2022: ScyllaDB Rust Driver: One Driver to Rule Them All
 
Deployment Strategy
Deployment StrategyDeployment Strategy
Deployment Strategy
 
Sharding
ShardingSharding
Sharding
 
Get expertise with mongo db
Get expertise with mongo dbGet expertise with mongo db
Get expertise with mongo db
 
mongodb tutorial
mongodb tutorialmongodb tutorial
mongodb tutorial
 
Using Spring with NoSQL databases (SpringOne China 2012)
Using Spring with NoSQL databases (SpringOne China 2012)Using Spring with NoSQL databases (SpringOne China 2012)
Using Spring with NoSQL databases (SpringOne China 2012)
 
Spil Games @ FOSDEM: Galera Replicator IRL
Spil Games @ FOSDEM: Galera Replicator IRLSpil Games @ FOSDEM: Galera Replicator IRL
Spil Games @ FOSDEM: Galera Replicator IRL
 
Deployment Strategies
Deployment StrategiesDeployment Strategies
Deployment Strategies
 
2016 feb-23 pyugre-py_mongo
2016 feb-23 pyugre-py_mongo2016 feb-23 pyugre-py_mongo
2016 feb-23 pyugre-py_mongo
 

Recently uploaded

New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 

Recently uploaded (20)

New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
MongoDB Roma: Replication and Sharding

  • 1. MongoDB – Roma, 12 July 2012. Replication and Sharding: Hands on. Guglielmo Incisa
  • 2. Replication • What is it – Data is replicated (cloned) onto at least two nodes – Updates are sent to one node (Primary) and automatically propagated to the others (Secondaries) – Connections can go through a router or directly to the Primary (Secondaries are read only) • If we connect our app server to the Primary we must deal with its failure and reconnect to the new Primary (Diagram: App server → Router → Primary DB)
  • 3. Replication • Why we need it – If one node fails the application server can still work without any impact – The router will automatically manage the connection to the rest of the nodes (the router itself may be subject to failure, though) (Diagram: App server → Router → Primary DB)
  • 4. Replication • Why we need it – More and more IT departments are moving from • Big, proprietary, reliable and expensive servers – To • Commodity hardware (smaller, less reliable, inexpensive servers: PCs) – Commodity hardware is less reliable, but our users demand that our applications be always available: replication can help. – Example: how many servers do I need for 99.999% availability? • If, for example, a PC has 98% availability (about 8 days of downtime in a year, or a 2% probability of being down) • -> Two replicated PCs give 99.96% availability • -> Three replicated PCs give more than 99.999% (Telecom Grade / Core Network).
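The availability figures above can be reproduced with a short sketch (assuming node failures are independent, which is an idealization):

```python
# Availability of a cluster of n replicated servers, assuming node
# failures are independent: the cluster is down only when every
# replica is down at the same time.
def cluster_availability(node_availability: float, n: int) -> float:
    return 1.0 - (1.0 - node_availability) ** n

single = 0.98  # one commodity PC: ~2% downtime, roughly 7-8 days per year
print(round(cluster_availability(single, 2), 4))  # 0.9996 -> 99.96%
print(cluster_availability(single, 3) > 0.99999)  # True   -> "five nines"
```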
  • 5. Sharding • What is it – Data is partitioned and distributed to different nodes • Some records are in node 1, others in node 2 etc… – MongoDB Sharding: the partition is based on a field. • Database: test2 – Table: testSchema1 – Fields: » owner: owner of the file, key and shard key (string) » date (string) » tags (list of string) » keywords: words in the document, created by java code below (list of string) » fileName (string) » content: the file (binary) » ascii: the file (string)
  • 6. Sharding • Why we need it – Servers with smaller storage can be used – To increase responsiveness by increasing parallelism (Diagram: Router directing queries to three shards by Owner ranges: A-H, I-O, P-Z)
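As a toy illustration of the range split in the diagram (the shard names here are hypothetical, not the DSSA/DSSB/DSSC sets configured later):

```python
# Minimal range-based router: send each record to a shard according to
# the first letter of its "owner" field, mirroring the A-H / I-O / P-Z
# split in the slide. A real mongos keeps these ranges as chunk
# metadata on the config server.
RANGES = [("A", "H", "shard1"), ("I", "O", "shard2"), ("P", "Z", "shard3")]

def route(owner: str) -> str:
    first = owner[:1].upper()
    for low, high, shard in RANGES:
        if low <= first <= high:
            return shard
    raise ValueError("no shard covers owner %r" % owner)

print(route("Guglielmo"))  # shard1
print(route("Mario"))      # shard2
print(route("Paolo"))      # shard3
```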
  • 7. Replication and Sharding • Can we have both? – MongoDB: yes! • Our example: a Router (mongos) and a Config process in front of three shards, each a replica set of 2 nodes + an arbiter (Shard A, Shard B, Shard C)
  • 8. Replication and Sharding • Replication: – Two nodes and an arbiter • The arbiter is needed when an even number of nodes is used: it decides which server is Primary and which is Secondary, and manages the failover when one node is down • Sharding – Three sets: A, B, C – Config Process: • <<The config servers store the cluster's metadata, which includes basic information on each shard server and the chunks contained therein.>> – Routing Process: • <<The mongos process can be thought of as a routing and coordination process that makes the various components of the cluster look like a single system. When receiving client requests, the mongos process routes the request to the appropriate server(s) and merges any results to be sent back to the client.>>
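Why the arbiter matters can be sketched with the majority rule a replica set election uses (a simplification of the real election protocol):

```python
# A primary can only be elected by a strict majority of voting members.
# With 2 data nodes and no arbiter, a network split leaves each node
# with 1 of 2 votes: no majority, so no primary. Adding an arbiter
# (a vote-only member holding no data) makes the total 3, so one side
# of any split can still reach 2 votes and elect a primary.
def can_elect_primary(reachable_votes: int, total_votes: int) -> bool:
    return reachable_votes > total_votes // 2

print(can_elect_primary(1, 2))  # False: 2 nodes, split, no arbiter
print(can_elect_primary(2, 3))  # True: 2 nodes + arbiter, one node down
```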
  • 9. Setup 1 • Start servers and arbiters – Create /data/db, db2, db3, db4, db5, db6, db7, db8, db9, configdb – --nojournal speeds up the startup (journaling is the default in 64 bit) • Replica set A – ./mongodb-linux-x86_64-2.0.4/bin/mongod --shardsvr --replSet DSSA --nojournal – ./mongodb-linux-x86_64-2.0.4/bin/mongod --shardsvr --replSet DSSA --dbpath /data/db2 --port 27021 --nojournal – Arbiter: ./mongodb-linux-x86_64-2.0.4/bin/mongod --shardsvr --replSet DSSA --dbpath /data/db7 --port 27031 --nojournal • Replica set B – ./mongodb-linux-x86_64-2.0.4/bin/mongod --shardsvr --replSet DSSB --dbpath /data/db3 --port 27023 --nojournal – ./mongodb-linux-x86_64-2.0.4/bin/mongod --shardsvr --replSet DSSB --dbpath /data/db4 --port 27025 --nojournal – Arbiter: ./mongodb-linux-x86_64-2.0.4/bin/mongod --shardsvr --replSet DSSB --dbpath /data/db8 --port 27035 --nojournal • Replica set C – ./mongodb-linux-x86_64-2.0.4/bin/mongod --shardsvr --replSet DSSC --dbpath /data/db5 --port 27027 --nojournal – ./mongodb-linux-x86_64-2.0.4/bin/mongod --shardsvr --replSet DSSC --dbpath /data/db6 --port 27029 --nojournal – Arbiter: ./mongodb-linux-x86_64-2.0.4/bin/mongod --shardsvr --replSet DSSC --dbpath /data/db9 --port 27039 --nojournal
  • 10. Setup 2 • Set the replicas: connect to each primary and set the configuration • Set replica A ./mongodb-linux-x86_64-2.0.4/bin/mongo --port 27018 cfg = { _id : "DSSA", members : [ {_id : 0, host : "hostname:27018"}, {_id : 1, host : "hostname:27021"}, {_id : 2, host : "hostname:27031", arbiterOnly:true} ] } rs.initiate(cfg) db.getMongo().setSlaveOk() • Set replica B ./mongodb-linux-x86_64-2.0.4/bin/mongo --port 27023 cfg = { _id : "DSSB", members : [ {_id : 0, host : "hostname:27023"}, {_id : 1, host : "hostname:27025"}, {_id : 2, host : "hostname:27035", arbiterOnly:true} ] } rs.initiate(cfg) db.getMongo().setSlaveOk() • Set replica C ./mongodb-linux-x86_64-2.0.4/bin/mongo --port 27027 cfg = { _id : "DSSC", members : [ {_id : 0, host : "hostname:27027"}, {_id : 1, host : "hostname:27029"}, {_id : 2, host : "hostname:27039", arbiterOnly:true} ] } rs.initiate(cfg) db.getMongo().setSlaveOk()
  • 11. Setup 3 • Start the config server ./mongodb-linux-x86_64-2.0.4/bin/mongod --configsvr --nojournal • Start the router ./mongodb-linux-x86_64-2.0.4/bin/mongos --configdb grog:27019 --chunkSize 1 • Configure the shards ./mongodb-linux-x86_64-2.0.4/bin/mongo admin db.runCommand( { addshard : "DSSA/hostname:27018,hostname:27021"}) db.runCommand( { addshard : "DSSB/hostname:27023,hostname:27025"}) db.runCommand( { addshard : "DSSC/hostname:27027,hostname:27029"}) db.runCommand( { enablesharding : "test2"}) db.runCommand( { shardcollection : "test2.testSchema1", key : { owner : 1}}) • Load data… – We load 11 documents; sharding is done on the "owner" field
  • 12. MapReduce • "Map" step: The master node takes the input, divides it into smaller sub-problems, and distributes them to worker nodes. A worker node may do this again in turn, leading to a multi-level tree structure. The worker node processes the smaller problem, and passes the answer back to its master node. • "Reduce" step: The master node then collects the answers to all the sub-problems and combines them in some way to form the output – the answer to the problem it was originally trying to solve. • Source: Wikipedia
  • 13. MapReduce • map = function(){ if(!this.keywords){ return; } for (index in this.keywords){ emit(this.keywords[index],1); } } • reduce = function(previous,current){ var count = 0; for (index in current) { count += current[index]; } return count; } • result = db.runCommand({ "mapreduce" : "testSchema1", "map":map, "reduce":reduce, "out":"keywords"}) db.keywords.find() mongos> db.keywords.find({_id:"hello"})
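The same keyword count can be simulated in plain Python (the sample documents are made up) to show what the map and reduce steps above compute:

```python
from collections import defaultdict

# Mimic the mongo map/reduce above: map emits (keyword, 1) per keyword
# per document; reduce sums the emitted values for each keyword.
docs = [
    {"fileName": "a.txt", "keywords": ["hello", "mongodb"]},
    {"fileName": "b.txt", "keywords": ["hello", "sharding"]},
    {"fileName": "c.txt"},  # no keywords: the map step skips it
]

def map_step(doc):
    for keyword in doc.get("keywords", []):
        yield keyword, 1

def reduce_step(values):
    return sum(values)

emitted = defaultdict(list)
for doc in docs:
    for key, value in map_step(doc):
        emitted[key].append(value)

keywords = {key: reduce_step(values) for key, values in emitted.items()}
print(keywords["hello"])  # 2
```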
  • 14. Check Sharding • Connect to the router and count the records: ./mongodb-linux-x86_64-2.0.4/bin/mongo admin mongos> use test2 mongos> db.testSchema1.count() 11 • Connect to each primary (and see the number of records in each shard): ./mongodb-linux-x86_64-2.0.4/bin/mongo --port 27018 mongo> use test2 mongo> db.testSchema1.count() 4 ./mongodb-linux-x86_64-2.0.4/bin/mongo --port 27023 mongo> use test2 mongo> db.testSchema1.count() 4 ./mongodb-linux-x86_64-2.0.4/bin/mongo --port 27027 mongo> use test2 mongo> db.testSchema1.count() 3
  • 15. Check Replication • Kill Server 1 (= Primary A) • Connect to the router and count the records: mongos> use test2 mongos> db.testSchema1.count() 11 • Check that (Server 2) Secondary A is now primary • Load a new chunk of data • The count will be 22 • Restart the killed server (Server 1) and wait • Kill the other one (Server 2, now Primary A) • Check that Server 1 is Primary again • The count will still be 22 • Restart Server 2