Replication
& Durability
@spf13

AKA Steve Francia
15+ years building the internet
Father, husband, skateboarder
Chief Solutions Architect @ 10gen
responsible for drivers, integrations, web & docs
Agenda
• Intro to replication
• How MongoDB does Replication
• Configuring a ReplicaSet
• Advanced Replication
• Durability
• High Availability Scenarios
Replication
Use cases
• High Availability (auto-failover)
• Read Scaling (extra copies to read from)
• Backups
 • Online, Point in Time (PiT) backups
 • Delayed Copy (fat finger)
• Use (hidden) replica for secondary workload
 • Analytics
 • Data-processing
 • Integration with external systems
Types of outage
Planned
 • Hardware upgrade
 • O/S or file-system tuning
 • Relocation of data to new file-system /
   storage
 • Software upgrade
Unplanned
 • Hardware failure
 • Data center failure
 • Region outage
 • Human error
 • Application corruption
Replica Set features
• A cluster of N servers
• Any (one) node can be primary
• Consensus election of primary
• Automatic failover
• Automatic recovery
• All writes to primary
• Reads can be to primary (default) or a secondary
How MongoDB Replication Works
How MongoDB Replication works
[diagram: Member 1, Member 2, Member 3; no primary yet]
• Set is made up of 2 or more nodes

[diagram: Member 2 elected Primary]
• Election establishes the PRIMARY
• Data replication from PRIMARY to SECONDARY

[diagram: Member 2 DOWN; Members 1 and 3 negotiate a new master]
• PRIMARY may fail
• Automatic election of new PRIMARY if a majority of members are up

[diagram: Member 3 is the new Primary; Member 2 still DOWN]
• New PRIMARY elected
• Replica Set re-established

[diagram: Member 2 rejoins, Recovering]
• Automatic recovery

[diagram: Member 2 back as a secondary]
• Replica Set re-established
How Is Data Replicated?
• Change operations are written to the oplog
 • The oplog is a capped collection (fixed size)
  • Must have enough space to allow new secondaries to catch up (from scratch or from a backup)
  • Must have enough space to cope with any applicable slaveDelay
• Secondaries query the primary’s oplog and apply what they find
 • All replicas contain an oplog
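The mechanics above can be sketched as a fixed-size ring buffer that a secondary tails by timestamp. A minimal simulation, not MongoDB internals; the class and method names are invented:

```javascript
// Sketch of an oplog as a capped (fixed-size) log of operations.
class Oplog {
  constructor(maxEntries) {
    this.maxEntries = maxEntries;
    this.entries = [];   // oldest first
    this.ts = 0;         // monotonically increasing "optime"
  }
  append(op) {
    this.entries.push({ ts: ++this.ts, op });
    // Capped: once full, the oldest entry falls off.
    if (this.entries.length > this.maxEntries) this.entries.shift();
    return this.ts;
  }
  // Entries the secondary has not applied yet, if still in the window.
  after(ts) {
    const gap = this.entries.length && this.entries[0].ts > ts + 1;
    if (gap) throw new Error('too stale: needed entries rotated out of the oplog');
    return this.entries.filter(e => e.ts > ts);
  }
}

class Secondary {
  constructor() { this.appliedTs = 0; this.data = []; }
  sync(oplog) {
    for (const e of oplog.after(this.appliedTs)) {
      this.data.push(e.op);       // apply what we find
      this.appliedTs = e.ts;
    }
  }
}
```

This is why the slides stress sizing: a secondary that falls further behind than the oplog window (or a brand-new one restoring from an old backup) hits the "too stale" case and must resync from scratch.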
Configuring
a ReplicaSet
Creating a Replica Set
$ ./mongod --replSet <name>

> cfg = {
  _id : "<name>",
  members : [
    { _id : 0, host : "sf1.acme.com" },
    { _id : 1, host : "sf2.acme.com" },
    { _id : 2, host : "sf3.acme.com" }
  ]
}
> use admin
> rs.initiate(cfg)
Managing a Replica Set
rs.conf()
   Shell helper: get current configuration
rs.initiate(<cfg>);
   Shell helper: initiate replica set
rs.reconfig(<cfg>)
   Shell helper: reconfigure a replica set
rs.add("hostname:<port>")
   Shell helper: add a new member
Managing a Replica Set
rs.conf()
   Shell helper: get current configuration
rs.initiate(<cfg>)
   Shell helper: initiate replica set
rs.reconfig(<cfg>)
   Shell helper: reconfigure a replica set
rs.add("hostname:<port>")
   Shell helper: add a new member
rs.remove("hostname:<port>")
   Shell helper: remove a member
Managing a Replica Set
 rs.status()
    Reports status of the replica set from one
    node's point of view
 rs.stepDown(<secs>)
    Request the primary to step down
 rs.freeze(<secs>)
    Prevents any changes to the current replica
    set configuration (primary/secondary status)
    Use during backups
Advanced
Replication
Lots of Features

• Delayed
• Hidden
• Priorities
• Tags
Slave Delay
• Lags behind master by configurable
  time delay

• Automatically hidden from clients
• Protects against operator errors
 • Fat fingering
 • Application corrupts data
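In replica-set config a delayed member is just a member entry: slaveDelay together with hidden and priority 0 (so it can never be elected). A hypothetical entry; the hostname and delay are invented:

```javascript
// Hypothetical delayed member: one hour behind the primary,
// hidden from clients, never electable as primary.
{
  _id        : 3,
  host       : "sf4.acme.com",  // made-up hostname
  priority   : 0,
  hidden     : true,
  slaveDelay : 3600             // seconds behind the primary
}
```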
Other member types
• Arbiters
 • Don’t store a copy of the data
 • Vote in elections
 • Used as a tie breaker
• Hidden
 • Not reported in isMaster
 • Hidden from slaveOk reads
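An arbiter is likewise just member config. A hypothetical entry; the hostname is invented:

```javascript
// Hypothetical arbiter: votes in elections, holds no data.
{ _id : 4, host : "arb1.acme.com", arbiterOnly : true }
```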
Priorities
• Priority: a number between 0 and 100
• Used during an election:
 • Most up to date
 • Highest priority
 • Less than 10s behind failed Primary
• Allows weighting of members during
  failover
Priorities - example
  A      B      C      D      E
 p:10   p:10   p:1    p:1    p:0

• Assuming all members are up to date
• Members A or B will be chosen first
 • Highest priority
• Members C or D will be chosen when:
 • A and B are unavailable
 • A and B are not up to date
• Member E is never chosen
 • priority:0 means it cannot be elected
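The rules on this slide can be sketched as a small selection function. An illustration of the slide's rules only, not the real election protocol; the function name is invented:

```javascript
// Sketch: a member is electable when priority > 0 and it is no more
// than 10 "seconds" behind the freshest member; highest priority wins.
function pickPrimary(members) {
  const freshest = Math.max(...members.map(m => m.optime));
  const electable = members.filter(
    m => m.priority > 0 && freshest - m.optime <= 10
  );
  if (electable.length === 0) return null; // no candidate: set stays read only
  electable.sort((a, b) => b.priority - a.priority || b.optime - a.optime);
  return electable[0].name;
}
```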
Durability
Durability Options
• Fire and forget
• Write Concern
Write Concern
If a write requires a return trip
&
What the return trip should depend on
Write Concern
w:
the number of servers to replicate to (or
majority)
wtimeout:
timeout in ms waiting for replication
j:
wait for journal sync
tags:
ensure replication to n nodes of given tag
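These options combine in a single getLastError command document passed to db.runCommand. One plausible combination; the values are illustrative:

```javascript
// Hypothetical combination: wait for the write to reach 2 members
// and for a journal sync, giving up after 5 seconds.
{
  getLastError : 1,
  w            : 2,     // members that must have the write
  wtimeout     : 5000,  // ms to wait before reporting an error
  j            : true   // also wait for the journal sync
}
```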
Fire and Forget
[sequence: Driver sends write to Primary; Primary applies it in memory]
• Operations are applied in memory
• No waiting for persistence to disk
• MongoDB clients do not block waiting to confirm the operation completed
Wait for error
[sequence: Driver sends write + getLastError to Primary; Primary applies it in memory]
• Operations are applied in memory
• No waiting for persistence to disk
• MongoDB clients do block waiting to confirm the operation completed
Wait for journal sync
[sequence: Driver sends write + getLastError with j:true; Primary applies in memory, then writes to journal]
• Operations are applied in memory
• Wait for persistence to journal
• MongoDB clients do block waiting to confirm the operation completed
Wait for fsync
[sequence: Driver sends write + getLastError with fsync:true; Primary applies in memory, writes to journal (if enabled), then fsyncs]
• Operations are applied in memory
• Wait for persistence to journal
• Wait for persistence to disk
• MongoDB clients do block waiting to confirm the operation completed
Wait for replication
[sequence: Driver sends write + getLastError with w:2; Primary applies in memory and replicates to Secondary]
• Operations are applied in memory
• No waiting for persistence to disk
• Waiting for replication to n nodes
• MongoDB clients do block waiting to confirm the operation completed
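How w and wtimeout interact can be sketched from the driver's point of view. Illustrative only: real drivers wait on getLastError rather than on precomputed ack times, and the function name is invented:

```javascript
// Sketch: given when each member acknowledged the write (ms after the
// write), did a getLastError with {w, wtimeout} succeed or time out?
function waitForW(ackTimesMs, w, wtimeoutMs) {
  const acked = ackTimesMs
    .filter(t => t <= wtimeoutMs)   // acks that land before the timeout
    .sort((a, b) => a - b);
  if (acked.length < w) {
    return { ok: false, err: 'timeout', acked: acked.length };
  }
  return { ok: true, waitedMs: acked[w - 1] }; // done at the w-th ack
}
```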
Tagging
• Control over where data is written to.
• Each member can have one or more tags:

  tags: {dc: "stockholm"}

  tags: {dc: "stockholm",
         ip: "192.168",
         rack: "row3-rk7"}


• Replica set defines rules for where data resides
• Rules defined in RS config... can change
  without changing application code
Tagging - example
    {
        _id : "someSet",
        members : [
            {_id : 0, host : "A", tags : {"dc": "ny"}},
            {_id : 1, host : "B", tags : {"dc": "ny"}},
            {_id : 2, host : "C", tags : {"dc": "sf"}},
            {_id : 3, host : "D", tags : {"dc": "sf"}},
            {_id : 4, host : "E", tags : {"dc": "cloud"}}
        ],
        settings : {
            getLastErrorModes : {
                veryImportant : {"dc" : 3},
                sortOfImportant : {"dc" : 2}
            }
        }
    }
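A mode like {"dc" : 3} is satisfied when the acknowledging members span three distinct values of the dc tag. A sketch of that check, not the server's implementation; the function name is invented:

```javascript
// Sketch: does a set of acknowledging members satisfy a tag mode?
// mode is e.g. {dc: 3}: need 3 distinct values of the "dc" tag.
function modeSatisfied(mode, ackedMembers) {
  return Object.entries(mode).every(([tag, needed]) => {
    const distinct = new Set(
      ackedMembers.map(m => m.tags[tag]).filter(v => v !== undefined)
    );
    return distinct.size >= needed;
  });
}
```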
High
Availability
Scenarios
Single Node
    • Downtime inevitable
    • If node crashes human
      intervention might be
      needed

    • Should absolutely run
      with journaling to
      prevent data loss / corruption
Replica Set 1
[diagram: two data nodes + arbiter in one datacenter]
• Single datacenter
• Single switch & power
• One node failure
• Automatic recovery of single node crash
• Points of failure:
 • Power
 • Network
 • Datacenter
Replica Set 2
[diagram: two data nodes + arbiter across power/network zones]
• Single datacenter
• Multiple power/network zones
• Automatic recovery of single node crash
• w=2 not viable as losing 1 node means no writes
• Points of failure:
 • Datacenter
 • Two node failure
Replica Set 3
     • Single datacenter
     • Multiple power/network
       zones

     • Automatic recovery of
       single node crash

     • w=2 viable as 2/3 online
     • Points of failure:
      • Datacenter
      • Two node failure
When disaster strikes
Replica Set 4
     • Multi datacenter
     • DR node for safety

     • Can't do multi data
       center durable write
       safely since only 1 node
       in distant DC
Replica Set 5
     • Three data centers
     • Can survive full data
       center loss


     • Can do w= { dc : 2 } to
       guarantee write in 2
       data centers
Use?   Set size   Data Protection   High Availability    Notes
 X     One        No                No                   Must use --journal to protect against crashes
       Two        Yes               No                   On loss of one member, surviving member is read only
       Three      Yes               Yes - 1 failure      Typical. On loss of one member, surviving two members can elect a new primary
 X     Four       Yes               Yes - 1 failure*     * On loss of two members, surviving two members are read only
       Five       Yes               Yes - 2 failures     On loss of two members, surviving three members can elect a new primary
http://spf13.com
http://github.com/s
@spf13
Questions?
     download at mongodb.org
 We’re hiring!! Contact us at jobs@10gen.com
Replication, Durability, and Disaster Recovery

To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 

Replication, Durability, and Disaster Recovery

  • 2. @spf13 AKA Steve Francia 15+ years building the internet Father, husband, skateboarder Chief Solutions Architect @ responsible for drivers, integrations, web & docs
  • 3. Agenda • Intro to replication • How MongoDB does Replication • Configuring a ReplicaSet • Advanced Replication • Durability • High Availability Scenarios
  • 6. Use cases • High Availability (auto-failover)
  • 7. Use cases • High Availability (auto-failover) • Read Scaling (extra copies to read from)
  • 8. Use cases • High Availability (auto-failover) • Read Scaling (extra copies to read from) • Backups • Online, Point in Time (PiT) backups • Delayed Copy (fat finger)
  • 9. Use cases • High Availability (auto-failover) • Read Scaling (extra copies to read from) • Backups • Online, Point in Time (PiT) backups • Delayed Copy (fat finger) • Use (hidden) replica for secondary workload • Analytics • Data-processing • Integration with external systems
  • 11. Types of outage Planned • Hardware upgrade • O/S or file-system tuning • Relocation of data to new file-system / storage • Software upgrade
  • 12. Types of outage Planned • Hardware upgrade • O/S or file-system tuning • Relocation of data to new file-system / storage • Software upgrade Unplanned • Hardware failure • Data center failure • Region outage • Human error • Application corruption
  • 14. Replica Set features • A cluster of N servers
  • 15. Replica Set features • A cluster of N servers • Any (one) node can be primary
  • 16. Replica Set features • A cluster of N servers • Any (one) node can be primary • Consensus election of primary
  • 17. Replica Set features • A cluster of N servers • Any (one) node can be primary • Consensus election of primary • Automatic failover
  • 18. Replica Set features • A cluster of N servers • Any (one) node can be primary • Consensus election of primary • Automatic failover • Automatic recovery
  • 19. Replica Set features • A cluster of N servers • Any (one) node can be primary • Consensus election of primary • Automatic failover • Automatic recovery • All writes to primary
  • 20. Replica Set features • A cluster of N servers • Any (one) node can be primary • Consensus election of primary • Automatic failover • Automatic recovery • All writes to primary • Reads can be to primary (default) or a secondary
  • 22. How MongoDB Replication works Member 1 Member 3 Member 2 • Set is made up of 2 or more nodes
  • 23. How MongoDB Replication works Member 1 Member 3 Member 2 Primary • Election establishes the PRIMARY • Data replication from PRIMARY to SECONDARY members
  • 24. How MongoDB Replication works (negotiate new master) Member 1 Member 3 Member 2 DOWN • PRIMARY may fail • Automatic election of new PRIMARY if a majority of members are up
  • 25. How MongoDB Replication works Member 3 Member 1 Primary Member 2 DOWN • New PRIMARY elected • Replica Set re-established
  • 26. How MongoDB Replication works Member 3 Member 1 Primary Member 2 Recovering • Automatic recovery
  • 27. How MongoDB Replication works Member 3 Member 1 Primary Member 2 • Replica Set re-established
  • 29. How Is Data Replicated? • Change operations are written to the oplog • The oplog is a capped collection (fixed size) • Must have enough space to allow new secondaries to catch up (from scratch or from a backup) • Must have enough space to cope with any applicable slaveDelay
  • 30. How Is Data Replicated? • Change operations are written to the oplog • The oplog is a capped collection (fixed size) • Must have enough space to allow new secondaries to catch up (from scratch or from a backup) • Must have enough space to cope with any applicable slaveDelay • Secondaries query the primary’s oplog and apply what they find • All replicas contain an oplog
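The oplog mechanics above can be sketched as a toy simulation — a fixed-size ring buffer of timestamped operations that a secondary tails from its last applied timestamp. This is illustrative only, not MongoDB's actual implementation; all names here are invented for the sketch.

```javascript
// Toy sketch of oplog-based replication (illustrative only).
const OPLOG_SIZE = 4; // deliberately tiny; real oplogs are sized in GB

const oplog = [];
let ts = 0;

function logOp(op) {
  oplog.push({ ts: ++ts, op });
  if (oplog.length > OPLOG_SIZE) oplog.shift(); // capped: oldest entries fall off
}

// A secondary remembers the last timestamp it applied and queries the
// primary's oplog for anything newer.
function syncSecondary(secondary) {
  for (const entry of oplog) {
    if (entry.ts > secondary.lastApplied) {
      secondary.data.push(entry.op);
      secondary.lastApplied = entry.ts;
    }
  }
}

const secondary = { lastApplied: 0, data: [] };
["a", "b", "c"].forEach(logOp);
syncSecondary(secondary);
// secondary.data now holds ["a", "b", "c"]
```

If the oldest surviving oplog entry is newer than a secondary's `lastApplied + 1`, that secondary has fallen too far behind the cap and must resync from scratch — which is why the slide stresses sizing the oplog generously.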
  • 32. Creating a Replica Set $ ./mongod --replSet <name> > cfg = { _id : "<name>", members : [ { _id : 0, host : "sf1.acme.com" }, { _id : 1, host : "sf2.acme.com" }, { _id : 2, host : "sf3.acme.com" } ] } > use admin > rs.initiate(cfg)
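Laid out more readably, the slide's configuration is just a plain document built before handing it to `rs.initiate` — a minimal sketch using the slide's placeholder host names:

```javascript
// Replica set configuration document from the slide.
// Host names are the slide's placeholders, not real servers.
const cfg = {
  _id: "myReplSet", // must match the --replSet name passed to each mongod
  members: [
    { _id: 0, host: "sf1.acme.com" },
    { _id: 1, host: "sf2.acme.com" },
    { _id: 2, host: "sf3.acme.com" },
  ],
};

// In the mongo shell, connected to a mongod started with
// --replSet myReplSet, you would then run:  rs.initiate(cfg)
```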
  • 34. Managing a Replica Set rs.conf() Shell helper: get current configuration
  • 35. Managing a Replica Set rs.conf() Shell helper: get current configuration rs.initiate(<cfg>); Shell helper: initiate replica set
  • 36. Managing a Replica Set rs.conf() Shell helper: get current configuration rs.initiate(<cfg>); Shell helper: initiate replica set rs.reconfig(<cfg>) Shell helper: reconfigure a replica set
  • 37. Managing a Replica Set rs.conf() Shell helper: get current configuration rs.initiate(<cfg>); Shell helper: initiate replica set rs.reconfig(<cfg>) Shell helper: reconfigure a replica set rs.add("hostname:<port>") Shell helper: add a new member
  • 38. Managing a Replica Set rs.conf() Shell helper: get current configuration rs.initiate(<cfg>); Shell helper: initiate replica set rs.reconfig(<cfg>) Shell helper: reconfigure a replica set rs.add("hostname:<port>") Shell helper: add a new member rs.remove("hostname:<port>") Shell helper: remove a member
  • 40. Managing a Replica Set rs.status() Reports status of the replica set from one node's point of view
  • 41. Managing a Replica Set rs.status() Reports status of the replica set from one node's point of view rs.stepDown(<secs>) Request the primary to step down
  • 42. Managing a Replica Set rs.status() Reports status of the replica set from one node's point of view rs.stepDown(<secs>) Request the primary to step down rs.freeze(<secs>) Prevents any changes to the current replica set configuration (primary/secondary status) Use during backups
  • 44. Lots of Features • Delayed • Hidden • Priorities • Tags
  • 46. Slave Delay • Lags behind master by configurable time delay
  • 47. Slave Delay • Lags behind master by configurable time delay • Automatically hidden from clients
  • 48. Slave Delay • Lags behind master by configurable time delay • Automatically hidden from clients • Protects against operator errors • Fat fingering • Application corrupts data
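A delayed member combines several of the options above in its member configuration — hidden, never electable, and lagging by a fixed window. A sketch of such a member document, with field names as used in replica set configs of this MongoDB era and a hypothetical host name:

```javascript
// Sketch of a delayed, hidden replica set member (hypothetical host).
const delayedMember = {
  _id: 3,
  host: "backup.acme.com",
  priority: 0,     // must be 0: a delayed member may never become primary
  hidden: true,    // not reported in isMaster; excluded from slaveOk reads
  slaveDelay: 3600 // apply ops one hour behind the primary (seconds)
};
```

The one-hour window is what protects against fat-fingered operations: a bad write reaches the delayed copy only after the delay, leaving time to recover the pre-mistake data from it.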
  • 49. Other member types
  • 50. Other member types • Arbiters • Don’t store a copy of the data • Vote in elections • Used as a tie breaker
  • 51. Other member types • Arbiters • Don’t store a copy of the data • Vote in elections • Used as a tie breaker • Hidden • Not reported in isMaster • Hidden from slaveOk reads
  • 53. Priorities • Priority: a number between 0 and 100 • Used during an election: • Most up to date • Highest priority • Less than 10s behind failed Primary • Allows weighting of members during failover
  • 54. Priorities - example A B C D E p:10 p:10 p:1 p:1 p:0
  • 55. Priorities - example A B C D E p:10 p:10 p:1 p:1 p:0 • Assuming all members are up to date
  • 56. Priorities - example A B C D E p:10 p:10 p:1 p:1 p:0 • Assuming all members are up to date • Members A or B will be chosen first • Highest priority
  • 57. Priorities - example A B C D E p:10 p:10 p:1 p:1 p:0 • Assuming all members are up to date • Members A or B will be chosen first • Highest priority • Members C or D will be chosen when: • A and B are unavailable • A and B are not up to date
  • 58. Priorities - example A B C D E p:10 p:10 p:1 p:1 p:0 • Assuming all members are up to date • Members A or B will be chosen first • Highest priority • Members C or D will be chosen when: • A and B are unavailable • A and B are not up to date • Member E is never chosen • priority:0 means it cannot be elected
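The election rules in this example can be sketched as a small function — illustrative only, not MongoDB's real election code: among reachable members no more than 10s behind, the freshest, highest-priority member wins, and priority 0 is never electable.

```javascript
// Toy sketch of replica set election eligibility (illustrative only).
const MAX_LAG_SECS = 10;

function electPrimary(members, latestOptimeSecs) {
  const eligible = members.filter(
    (m) =>
      m.up &&
      m.priority > 0 && // priority:0 means it cannot be elected
      latestOptimeSecs - m.optimeSecs <= MAX_LAG_SECS
  );
  // Prefer most up to date, then highest priority.
  eligible.sort(
    (a, b) => b.optimeSecs - a.optimeSecs || b.priority - a.priority
  );
  return eligible.length ? eligible[0].name : null;
}

const members = [
  { name: "A", priority: 10, optimeSecs: 100, up: false }, // failed primary
  { name: "B", priority: 10, optimeSecs: 100, up: true },
  { name: "C", priority: 1, optimeSecs: 100, up: true },
  { name: "E", priority: 0, optimeSecs: 100, up: true }, // never electable
];
// electPrimary(members, 100) picks "B": up to date and highest priority
```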
  • 62. Durability Options • Fire and forget • Write Concern
  • 63. Write Concern • Whether a write requires a return trip • What the return trip should depend on
  • 65. Write Concern w: the number of servers to replicate to (or majority)
  • 66. Write Concern w: the number of servers to replicate to (or majority) wtimeout: timeout in ms waiting for replication
  • 67. Write Concern w: the number of servers to replicate to (or majority) wtimeout: timeout in ms waiting for replication j: wait for journal sync
  • 68. Write Concern w: the number of servers to replicate to (or majority) wtimeout: timeout in ms waiting for replication j: wait for journal sync tags: ensure replication to n nodes of given tag
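In drivers of this era, those options travel to the server as fields of the getLastError command; a sketch of how they assemble into the command document (the helper function is invented for illustration — modern drivers expose a write concern API instead):

```javascript
// Assemble a getLastError command document from write concern options.
// writeConcernDoc is a hypothetical helper, not a driver API.
function writeConcernDoc({ w, wtimeout, j }) {
  const cmd = { getlasterror: 1 };
  if (w !== undefined) cmd.w = w;               // servers to replicate to, or "majority"
  if (wtimeout !== undefined) cmd.wtimeout = wtimeout; // ms to wait for replication
  if (j !== undefined) cmd.j = j;               // wait for journal sync
  return cmd;
}

// E.g. "durable on a majority, journaled, give up after 5s":
const cmd = writeConcernDoc({ w: "majority", wtimeout: 5000, j: true });
```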
  • 69. Fire and Forget Driver Primary write apply in memory • Operations are applied in memory • No waiting for persistence to disk • MongoDB clients do not block waiting to confirm the operation completed
  • 70. Wait for error Driver Primary write getLastError apply in memory • Operations are applied in memory • No waiting for persistence to disk • MongoDB clients do block waiting to confirm the operation completed
  • 71. Wait for journal sync Driver Primary write getLastError apply in memory j:true write to journal • Operations are applied in memory • Wait for persistence to journal • MongoDB clients do block waiting to confirm the operation completed
  • 72. Wait for fsync Driver Primary write getLastError apply in memory fsync:true write to journal (if enabled) fsync • Operations are applied in memory • Wait for persistence to journal • Wait for persistence to disk • MongoDB clients do block waiting to confirm the operation completed
  • 73. Wait for replication Driver Primary Secondary write getLastError apply in memory w:2 replicate • Operations are applied in memory • No waiting for persistence to disk • Wait for replication to n nodes • MongoDB clients do block waiting to confirm the operation completed
  • 74. Tagging • Control over where data is written to. • Each member can have one or more tags: tags: {dc: "stockholm"} tags: {dc: "stockholm", ip: "192.168", rack: "row3-rk7"} • Replica set defines rules for where data resides • Rules defined in RS config... can change without changing application code
  • 75. Tagging - example { _id : "someSet", members : [ {_id : 0, host : "A", tags : {"dc": "ny"}}, {_id : 1, host : "B", tags : {"dc": "ny"}}, {_id : 2, host : "C", tags : {"dc": "sf"}}, {_id : 3, host : "D", tags : {"dc": "sf"}}, {_id : 4, host : "E", tags : {"dc": "cloud"}} ], settings : { getLastErrorModes : { veryImportant : {"dc" : 3}, sortOfImportant : {"dc" : 2} } } }
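The semantics of a mode like `veryImportant : {"dc" : 3}` — the write must be acknowledged by members covering at least 3 distinct values of the `dc` tag — can be sketched as a toy check (illustrative only; `modeSatisfied` is an invented name, not a server or driver function):

```javascript
// Toy check: has a write satisfied a getLastErrorModes rule?
// mode is e.g. { dc: 3 } — need >= 3 distinct "dc" tag values
// among the members that have acknowledged the write.
function modeSatisfied(ackedMembers, mode) {
  return Object.entries(mode).every(([tag, needed]) => {
    const distinct = new Set(
      ackedMembers.map((m) => m.tags[tag]).filter((v) => v !== undefined)
    );
    return distinct.size >= needed;
  });
}

const acked = [
  { host: "A", tags: { dc: "ny" } },
  { host: "C", tags: { dc: "sf" } },
  { host: "E", tags: { dc: "cloud" } },
];
// modeSatisfied(acked, { dc: 3 }) → acknowledged in 3 data centers
// modeSatisfied(acked.slice(0, 2), { dc: 3 }) → only 2, not satisfied
```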
  • 77. Single Node • Downtime inevitable • If node crashes human intervention might be needed • Should absolutely run with journaling to prevent data loss / corruption
  • 78. Replica Set 1 • Single datacenter Arbiter • Single switch & power • One node failure • Automatic recovery of single node crash • Points of failure: • Power • Network • Datacenter
  • 79. Replica Set 2 • Single datacenter Arbiter • Multiple power/network zones • Automatic recovery of single node crash • w=2 not viable as losing 1 node means no writes • Points of failure: • Datacenter • Two node failure
  • 80. Replica Set 3 • Single datacenter • Multiple power/network zones • Automatic recovery of single node crash • w=2 viable as 2/3 online • Points of failure: • Datacenter • Two node failure
  • 82. Replica Set 4 • Multi datacenter • DR node for safety • Can't do multi data center durable write safely since only 1 node in distant DC
  • 83. Replica Set 5 • Three data centers • Can survive full data center loss • Can do w= { dc : 2 } to guarantee write in 2 data centers
  • 84. Replica set size trade-offs (✗ = avoid):

    Set size | Use?    | Data Protection | High Availability | Notes
    One      | ✗       | No              | No                | Must use --journal to protect against crashes
    Two      |         | Yes             | No                | On loss of one member, surviving member is read only
    Three    | Typical | Yes             | Yes - 1 failure   | On loss of one member, surviving two members can elect a new primary
    Four     | ✗       | Yes             | Yes - 1 failure*  | * On loss of two members, surviving two members are read only
    Five     |         | Yes             | Yes - 2 failures  | On loss of two members, surviving three members can elect a new primary
  • 85. http://spf13.com http://github.com/s @spf13 Questions? download at mongodb.org We’re hiring!! Contact us at jobs@10gen.com
