SlideShare a Scribd company logo
1 of 24
Distributed Consensus in MongoDB
Spencer T Brody
Senior Software Engineer at MongoDB
@stbrody
Agenda
• Introduction to consensus
• Leader-based replicated state machine
• Elections and data replication in MongoDB
• Improvements coming in MongoDB 3.2
Why use Replication?
• Data redundancy
• High availability
What is Consensus?
• Getting multiple processes/servers to agree on something
• Must handle a wide range of failure modes
• Disk failure
• Network partitions
• Machine freezes
• Clock skews
Basic consensus
State
Machine
X 3
Y 2
Z 7
State
Machine
X 3
Y 2
Z 7
State
Machine
X 3
Y 2
Z 7
Leader Based Consensus
State
Machine
X 3
Y 2
Z 7
Replicated
Log
X ⬅️ 1
Y ⬅️ 2
X ⬅️ 3
State
Machine
X 3
Y 2
Z 7
Replicated
Log
X ⬅️ 1
Y ⬅️ 2
X ⬅️ 3
State
Machine
X 3
Y 2
Z 7
Replicated
Log
X ⬅️ 1
Y ⬅️ 2
X ⬅️ 3
Agenda
• Introduction to consensus
• Leader-based replicated state machine
• Elections and data replication in MongoDB
• Improvements coming in MongoDB 3.2
Elections
X 3
Y 2
Z 7
X ⬅️ 1
Y ⬅️ 2
X ⬅️ 3
X 3
Y 2
Z 7
X ⬅️ 1
Y ⬅️ 2
X ⬅️ 3
X 3
Y 2
Z 7
X ⬅️ 1
Y ⬅️ 2
X ⬅️ 3
X 3
Y 2
Z 7
X ⬅️ 1
Y ⬅️ 2
X ⬅️ 3
X 3
Y 2
Z 7
X ⬅️ 1
Y ⬅️ 2
X ⬅️ 3
Elections
X 3
Y 2
Z 7
X ⬅️ 1
Y ⬅️ 2
X ⬅️ 3
X 3
Y 2
Z 7
X ⬅️ 1
Y ⬅️ 2
X ⬅️ 3
X 3
Y 2
Z 7
X ⬅️ 1
Y ⬅️ 2
X ⬅️ 3
X 3
Y 2
Z 7
X ⬅️ 1
Y ⬅️ 2
X ⬅️ 3
X 3
Y 2
Z 7
X ⬅️ 1
Y ⬅️ 2
X ⬅️ 3
Elections
X 3
Y 2
Z 7
X ⬅️ 1
Y ⬅️ 2
X ⬅️ 3
X 3
Y 2
Z 7
X ⬅️ 1
Y ⬅️ 2
X ⬅️ 3
X 3
Y 2
Z 7
X ⬅️ 1
Y ⬅️ 2
X ⬅️ 3
X 3
Y 2
Z 7
X ⬅️ 1
Y ⬅️ 2
X ⬅️ 3
X 3
Y 2
Z 7
X ⬅️ 1
Y ⬅️ 2
X ⬅️ 3
Elections
X 3
Y 2
Z 7
X ⬅️ 1
Y ⬅️ 2
X ⬅️ 3
X 3
Y 2
Z 7
X ⬅️ 1
Y ⬅️ 2
X ⬅️ 3
X 3
Y 2
Z 7
X ⬅️ 1
Y ⬅️ 2
X ⬅️ 3
X 3
Y 2
Z 7
X ⬅️ 1
Y ⬅️ 2
X ⬅️ 3
X 3
Y 2
Z 7
X ⬅️ 1
Y ⬅️ 2
X ⬅️ 3
Elections
X 3
Y 2
Z 7
X ⬅️ 1
Y ⬅️ 2
X ⬅️ 3
X 3
Y 2
Z 7
X ⬅️ 1
Y ⬅️ 2
X ⬅️ 3
X 3
Y 2
Z 7
X ⬅️ 1
Y ⬅️ 2
X ⬅️ 3
X 3
Y 2
Z 7
X ⬅️ 1
Y ⬅️ 2
X ⬅️ 3
X 3
Y 2
Z 7
X ⬅️ 1
Y ⬅️ 2
X ⬅️ 3
Data Replication
X 3
Y 2
Z 7
X ⬅️ 1
Y ⬅️ 2
X ⬅️ 3
X 3
Y 2
Z 7
X ⬅️ 1
Y ⬅️ 2
X ⬅️ 3
X 3
Y 2
Z 7
X ⬅️ 1
Y ⬅️ 2
X ⬅️ 3
X 3
Y 2
Z 7
X ⬅️ 1
Y ⬅️ 2
X ⬅️ 3
X 3
Y 2
Z 7
X ⬅️ 1
Y ⬅️ 2
X ⬅️ 3
Agenda
• Introduction to consensus
• Leader-based replicated state machine
• Elections and data replication in MongoDB
• Improvements coming in MongoDB 3.2
Agenda
• Introduction to consensus
• Leader-based replicated state machine
• Elections and data replication in MongoDB
• Improvements coming in MongoDB 3.2
• Goals and inspiration from Raft Consensus Algorithm
• Preventing double voting
• Monitoring node status
• Calling for elections
Goals for MongoDB 3.2
• Decrease failover time
• Speed up detection and resolution of false primary situations
Finding Inspiration in Raft
• “In Search of an Understandable Consensus Algorithm” by Diego
Ongaro: https://ramcloud.stanford.edu/raft.pdf
• Designed to address the shortcomings of Paxos
• Easier to understand
• Easier to implement in real applications
• Provably correct
• Remarkably similar to what we’re doing already
Raft Concepts
• Term (election) IDs
• Monitoring node status using existing data replication channel
• Asymmetric election timeouts
Preventing Double Voting
• Can’t vote for 2 nodes in the same election
• Pre-3.2: 30 second vote timeout
• Post-3.2: Term IDs
• Term:
• Monotonically increasing ID
• Incremented on every election *attempt*
• Lets voters distinguish elections so they can vote twice quickly in
different elections.
Monitoring Node Status
• Pre-3.2: Heartbeats
• Sent every two seconds from every node to every other node
• Volume increases quadratically as nodes are added to the replica set
• Post-3.2: Extra metadata sent via existing data replication channel
• Utilizes chained replication
• Faster heartbeats = faster elections and detection of false primaries
Data Replication
X 3
Y 2
Z 7
X ⬅️ 1
Y ⬅️ 2
X ⬅️ 3
X 3
Y 2
Z 7
X ⬅️ 1
Y ⬅️ 2
X ⬅️ 3
X 3
Y 2
Z 7
X ⬅️ 1
Y ⬅️ 2
X ⬅️ 3
X 3
Y 2
Z 7
X ⬅️ 1
Y ⬅️ 2
X ⬅️ 3
X 3
Y 2
Z 7
X ⬅️ 1
Y ⬅️ 2
X ⬅️ 3
Determining When To Call For An Election
• Tradeoff between failover time and spurious failovers
• Node calls for an election when it hasn’t heard from the primary within
the election timeout
• Starting in 3.2:
• Election timeout is configurable
• Election timeout is varied randomly for each node
• Varying timeouts help reduce tied votes
• Fewer tied votes = faster failover
Conclusion
• MongoDB 3.2 will have
• Faster failovers
• Faster error detection
• More control to prevent spurious failovers
• This means your systems are
• More stable
• More resilient to failure
• Easier to maintain
Questions?

More Related Content

Viewers also liked

Webinar: Back to Basics: Thinking in Documents
Webinar: Back to Basics: Thinking in DocumentsWebinar: Back to Basics: Thinking in Documents
Webinar: Back to Basics: Thinking in DocumentsMongoDB
 
MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...
MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...
MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...MongoDB
 
Back to Basics Webinar 3: Introduction to Replica Sets
Back to Basics Webinar 3: Introduction to Replica SetsBack to Basics Webinar 3: Introduction to Replica Sets
Back to Basics Webinar 3: Introduction to Replica SetsMongoDB
 
Introduction to Raft algorithm
Introduction to Raft algorithmIntroduction to Raft algorithm
Introduction to Raft algorithmmuayyad alsadi
 
Non Developer Scrum Teams: How Scrum Can Improve Your Operations
Non Developer Scrum Teams: How Scrum Can Improve Your OperationsNon Developer Scrum Teams: How Scrum Can Improve Your Operations
Non Developer Scrum Teams: How Scrum Can Improve Your OperationsMatthew Salerno
 
What is NoSQL and CAP Theorem
What is NoSQL and CAP TheoremWhat is NoSQL and CAP Theorem
What is NoSQL and CAP TheoremRahul Jain
 

Viewers also liked (6)

Webinar: Back to Basics: Thinking in Documents
Webinar: Back to Basics: Thinking in DocumentsWebinar: Back to Basics: Thinking in Documents
Webinar: Back to Basics: Thinking in Documents
 
MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...
MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...
MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...
 
Back to Basics Webinar 3: Introduction to Replica Sets
Back to Basics Webinar 3: Introduction to Replica SetsBack to Basics Webinar 3: Introduction to Replica Sets
Back to Basics Webinar 3: Introduction to Replica Sets
 
Introduction to Raft algorithm
Introduction to Raft algorithmIntroduction to Raft algorithm
Introduction to Raft algorithm
 
Non Developer Scrum Teams: How Scrum Can Improve Your Operations
Non Developer Scrum Teams: How Scrum Can Improve Your OperationsNon Developer Scrum Teams: How Scrum Can Improve Your Operations
Non Developer Scrum Teams: How Scrum Can Improve Your Operations
 
What is NoSQL and CAP Theorem
What is NoSQL and CAP TheoremWhat is NoSQL and CAP Theorem
What is NoSQL and CAP Theorem
 

Similar to Replication Election and Consensus Algorithm Refinements for MongoDB 3.2

Mitch Trachtenberg: Open Source for Election Integrity
Mitch Trachtenberg: Open Source for Election IntegrityMitch Trachtenberg: Open Source for Election Integrity
Mitch Trachtenberg: Open Source for Election IntegrityFriprogsenteret
 
Unveiling etcd: Architecture and Source Code Deep Dive
Unveiling etcd: Architecture and Source Code Deep DiveUnveiling etcd: Architecture and Source Code Deep Dive
Unveiling etcd: Architecture and Source Code Deep DiveChieh (Jack) Yu
 
Gemtalk Systems Product Roadmap
Gemtalk Systems Product RoadmapGemtalk Systems Product Roadmap
Gemtalk Systems Product RoadmapESUG
 
Reaching reliable agreement in an unreliable world
Reaching reliable agreement in an unreliable worldReaching reliable agreement in an unreliable world
Reaching reliable agreement in an unreliable worldHeidi Howard
 
A Kafkaesque Raft Protocol | Jason Gustafson, Confluent
A Kafkaesque Raft Protocol | Jason Gustafson, ConfluentA Kafkaesque Raft Protocol | Jason Gustafson, Confluent
A Kafkaesque Raft Protocol | Jason Gustafson, ConfluentHostedbyConfluent
 
Fighting fraud: finding duplicates at scale (Highload+ 2019)
Fighting fraud: finding duplicates at scale (Highload+ 2019)Fighting fraud: finding duplicates at scale (Highload+ 2019)
Fighting fraud: finding duplicates at scale (Highload+ 2019)Alexey Grigorev
 
Ddd by-clark chou
Ddd by-clark chouDdd by-clark chou
Ddd by-clark chouKim Kao
 
Cassandra introduction apache con 2014 budapest
Cassandra introduction apache con 2014 budapestCassandra introduction apache con 2014 budapest
Cassandra introduction apache con 2014 budapestDuyhai Doan
 
The Strathclyde Poker Research Environment
The Strathclyde Poker Research EnvironmentThe Strathclyde Poker Research Environment
The Strathclyde Poker Research EnvironmentLuke Dicken
 
Optimal Strategies for Large-Scale Batch ETL Jobs
Optimal Strategies for Large-Scale Batch ETL JobsOptimal Strategies for Large-Scale Batch ETL Jobs
Optimal Strategies for Large-Scale Batch ETL JobsEmma Tang
 
Optimal Strategies for Large Scale Batch ETL Jobs with Emma Tang
Optimal Strategies for Large Scale Batch ETL Jobs with Emma TangOptimal Strategies for Large Scale Batch ETL Jobs with Emma Tang
Optimal Strategies for Large Scale Batch ETL Jobs with Emma TangDatabricks
 

Similar to Replication Election and Consensus Algorithm Refinements for MongoDB 3.2 (12)

Mitch Trachtenberg: Open Source for Election Integrity
Mitch Trachtenberg: Open Source for Election IntegrityMitch Trachtenberg: Open Source for Election Integrity
Mitch Trachtenberg: Open Source for Election Integrity
 
Unveiling etcd: Architecture and Source Code Deep Dive
Unveiling etcd: Architecture and Source Code Deep DiveUnveiling etcd: Architecture and Source Code Deep Dive
Unveiling etcd: Architecture and Source Code Deep Dive
 
Gemtalk Systems Product Roadmap
Gemtalk Systems Product RoadmapGemtalk Systems Product Roadmap
Gemtalk Systems Product Roadmap
 
Evoting2003
Evoting2003Evoting2003
Evoting2003
 
Reaching reliable agreement in an unreliable world
Reaching reliable agreement in an unreliable worldReaching reliable agreement in an unreliable world
Reaching reliable agreement in an unreliable world
 
A Kafkaesque Raft Protocol | Jason Gustafson, Confluent
A Kafkaesque Raft Protocol | Jason Gustafson, ConfluentA Kafkaesque Raft Protocol | Jason Gustafson, Confluent
A Kafkaesque Raft Protocol | Jason Gustafson, Confluent
 
Fighting fraud: finding duplicates at scale (Highload+ 2019)
Fighting fraud: finding duplicates at scale (Highload+ 2019)Fighting fraud: finding duplicates at scale (Highload+ 2019)
Fighting fraud: finding duplicates at scale (Highload+ 2019)
 
Ddd by-clark chou
Ddd by-clark chouDdd by-clark chou
Ddd by-clark chou
 
Cassandra introduction apache con 2014 budapest
Cassandra introduction apache con 2014 budapestCassandra introduction apache con 2014 budapest
Cassandra introduction apache con 2014 budapest
 
The Strathclyde Poker Research Environment
The Strathclyde Poker Research EnvironmentThe Strathclyde Poker Research Environment
The Strathclyde Poker Research Environment
 
Optimal Strategies for Large-Scale Batch ETL Jobs
Optimal Strategies for Large-Scale Batch ETL JobsOptimal Strategies for Large-Scale Batch ETL Jobs
Optimal Strategies for Large-Scale Batch ETL Jobs
 
Optimal Strategies for Large Scale Batch ETL Jobs with Emma Tang
Optimal Strategies for Large Scale Batch ETL Jobs with Emma TangOptimal Strategies for Large Scale Batch ETL Jobs with Emma Tang
Optimal Strategies for Large Scale Batch ETL Jobs with Emma Tang
 

More from MongoDB

MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB
 
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!MongoDB
 
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...MongoDB
 
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDBMongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDBMongoDB
 
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB
 
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series DataMongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series DataMongoDB
 
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 MongoDB SoCal 2020: MongoDB Atlas Jump Start MongoDB SoCal 2020: MongoDB Atlas Jump Start
MongoDB SoCal 2020: MongoDB Atlas Jump StartMongoDB
 
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]MongoDB
 
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB
 
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...MongoDB
 
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!MongoDB
 
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your MindsetMongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your MindsetMongoDB
 
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas JumpstartMongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas JumpstartMongoDB
 
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB
 
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB
 
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB
 
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB
 
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB
 
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB
 
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB
 

More from MongoDB (20)

MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
 
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
 
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
 
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDBMongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
 
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
 
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series DataMongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
 
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 MongoDB SoCal 2020: MongoDB Atlas Jump Start MongoDB SoCal 2020: MongoDB Atlas Jump Start
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
 
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
 
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
 
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
 
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your MindsetMongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
 
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas JumpstartMongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
 
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
 
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
 
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
 
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
 
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
 
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
 
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
 

Recently uploaded

Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfjimielynbastida
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentationphoebematthew05
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 

Recently uploaded (20)

Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdf
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 

Replication Election and Consensus Algorithm Refinements for MongoDB 3.2

  • 1. Distributed Consensus in MongoDB Spencer T Brody Senior Software Engineer at MongoDB @stbrody
  • 2. Agenda • Introduction to consensus • Leader-based replicated state machine • Elections and data replication in MongoDB • Improvements coming in MongoDB 3.2
  • 3. Why use Replication? • Data redundancy • High availability
  • 4. What is Consensus? • Getting multiple processes/servers to agree on something • Must handle a wide range of failure modes • Disk failure • Network partitions • Machine freezes • Clock skews
  • 5. Basic consensus State Machine X 3 Y 2 Z 7 State Machine X 3 Y 2 Z 7 State Machine X 3 Y 2 Z 7
  • 6. Leader Based Consensus State Machine X 3 Y 2 Z 7 Replicated Log X ⬅️ 1 Y ⬅️ 2 X ⬅️ 3 State Machine X 3 Y 2 Z 7 Replicated Log X ⬅️ 1 Y ⬅️ 2 X ⬅️ 3 State Machine X 3 Y 2 Z 7 Replicated Log X ⬅️ 1 Y ⬅️ 2 X ⬅️ 3
  • 7. Agenda • Introduction to consensus • Leader-based replicated state machine • Elections and data replication in MongoDB • Improvements coming in MongoDB 3.2
  • 8. Elections X 3 Y 2 Z 7 X ⬅️ 1 Y ⬅️ 2 X ⬅️ 3 X 3 Y 2 Z 7 X ⬅️ 1 Y ⬅️ 2 X ⬅️ 3 X 3 Y 2 Z 7 X ⬅️ 1 Y ⬅️ 2 X ⬅️ 3 X 3 Y 2 Z 7 X ⬅️ 1 Y ⬅️ 2 X ⬅️ 3 X 3 Y 2 Z 7 X ⬅️ 1 Y ⬅️ 2 X ⬅️ 3
  • 9. Elections X 3 Y 2 Z 7 X ⬅️ 1 Y ⬅️ 2 X ⬅️ 3 X 3 Y 2 Z 7 X ⬅️ 1 Y ⬅️ 2 X ⬅️ 3 X 3 Y 2 Z 7 X ⬅️ 1 Y ⬅️ 2 X ⬅️ 3 X 3 Y 2 Z 7 X ⬅️ 1 Y ⬅️ 2 X ⬅️ 3 X 3 Y 2 Z 7 X ⬅️ 1 Y ⬅️ 2 X ⬅️ 3
  • 10. Elections X 3 Y 2 Z 7 X ⬅️ 1 Y ⬅️ 2 X ⬅️ 3 X 3 Y 2 Z 7 X ⬅️ 1 Y ⬅️ 2 X ⬅️ 3 X 3 Y 2 Z 7 X ⬅️ 1 Y ⬅️ 2 X ⬅️ 3 X 3 Y 2 Z 7 X ⬅️ 1 Y ⬅️ 2 X ⬅️ 3 X 3 Y 2 Z 7 X ⬅️ 1 Y ⬅️ 2 X ⬅️ 3
  • 11. Elections X 3 Y 2 Z 7 X ⬅️ 1 Y ⬅️ 2 X ⬅️ 3 X 3 Y 2 Z 7 X ⬅️ 1 Y ⬅️ 2 X ⬅️ 3 X 3 Y 2 Z 7 X ⬅️ 1 Y ⬅️ 2 X ⬅️ 3 X 3 Y 2 Z 7 X ⬅️ 1 Y ⬅️ 2 X ⬅️ 3 X 3 Y 2 Z 7 X ⬅️ 1 Y ⬅️ 2 X ⬅️ 3
  • 12. Elections X 3 Y 2 Z 7 X ⬅️ 1 Y ⬅️ 2 X ⬅️ 3 X 3 Y 2 Z 7 X ⬅️ 1 Y ⬅️ 2 X ⬅️ 3 X 3 Y 2 Z 7 X ⬅️ 1 Y ⬅️ 2 X ⬅️ 3 X 3 Y 2 Z 7 X ⬅️ 1 Y ⬅️ 2 X ⬅️ 3 X 3 Y 2 Z 7 X ⬅️ 1 Y ⬅️ 2 X ⬅️ 3
  • 13. Data Replication X 3 Y 2 Z 7 X ⬅️ 1 Y ⬅️ 2 X ⬅️ 3 X 3 Y 2 Z 7 X ⬅️ 1 Y ⬅️ 2 X ⬅️ 3 X 3 Y 2 Z 7 X ⬅️ 1 Y ⬅️ 2 X ⬅️ 3 X 3 Y 2 Z 7 X ⬅️ 1 Y ⬅️ 2 X ⬅️ 3 X 3 Y 2 Z 7 X ⬅️ 1 Y ⬅️ 2 X ⬅️ 3
  • 14. Agenda • Introduction to consensus • Leader-based replicated state machine • Elections and data replication in MongoDB • Improvements coming in MongoDB 3.2
  • 15. Agenda • Introduction to consensus • Leader-based replicated state machine • Elections and data replication in MongoDB • Improvements coming in MongoDB 3.2 • Goals and inspiration from Raft Consensus Algorithm • Preventing double voting • Monitoring node status • Calling for elections
  • 16. Goals for MongoDB 3.2 • Decrease failover time • Speed up detection and resolution of false primary situations
  • 17. Finding Inspiration in Raft • “In Search of an Understandable Consensus Algorithm” by Diego Ongaro: https://ramcloud.stanford.edu/raft.pdf • Designed to address the shortcomings of Paxos • Easier to understand • Easier to implement in real applications • Provably correct • Remarkably similar to what we’re doing already
  • 18. Raft Concepts • Term (election) IDs • Monitoring node status using existing data replication channel • Asymmetric election timeouts
  • 19. Preventing Double Voting • Can’t vote for 2 nodes in the same election • Pre-3.2: 30 second vote timeout • Post-3.2: Term IDs • Term: • Monotonically increasing ID • Incremented on every election *attempt* • Lets voters distinguish elections so they can vote twice quickly in different elections.
  • 20. Monitoring Node Status • Pre-3.2: Heartbeats • Sent every two seconds from every node to every other node • Volume increases quadratically as nodes are added to the replica set • Post-3.2: Extra metadata sent via existing data replication channel • Utilizes chained replication • Faster heartbeats = faster elections and detection of false primaries
  • 21. Data Replication X 3 Y 2 Z 7 X ⬅️ 1 Y ⬅️ 2 X ⬅️ 3 X 3 Y 2 Z 7 X ⬅️ 1 Y ⬅️ 2 X ⬅️ 3 X 3 Y 2 Z 7 X ⬅️ 1 Y ⬅️ 2 X ⬅️ 3 X 3 Y 2 Z 7 X ⬅️ 1 Y ⬅️ 2 X ⬅️ 3 X 3 Y 2 Z 7 X ⬅️ 1 Y ⬅️ 2 X ⬅️ 3
  • 22. Determining When To Call For An Election • Tradeoff between failover time and spurious failovers • Node calls for an election when it hasn’t heard from the primary within the election timeout • Starting in 3.2: • Election timeout is configurable • Election timeout is varied randomly for each node • Varying timeouts help reduce tied votes • Fewer tied votes = faster failover
  • 23. Conclusion • MongoDB 3.2 will have • Faster failovers • Faster error detection • More control to prevent spurious failovers • This means your systems are • More stable • More resilient to failure • Easier to maintain

Editor's Notes

  1. Every write requires participation from all nodes – like direct democracy A lot of network traffic, poor performance This is like pure Paxos
  2. Leader = Primary This is what most real-world deployments do In mongodb… Oplog = logical transformation of data
  3. One node nominates itself and contacts others Pre-election check – will you vote for me? Election check – do you vote for me? Election notifications – I am now primary
  4. Nodes might vote no: primary exists, repl lag
  5. Chaining Less bandwidth across long hops, less read load on primary
  6. Question break
  7. Define: Failover time. Failover time > election time False primaries Rollbacks
  8. Raft paper presented at Usenix 2014 by Stamford University PHD students Diego Ongaro and John Ousterhout Provably correct Paxos - 1989 Paxos hard to use – “Paxos made simple”, “Paxos made practical”
  9. Continue sending heartbeats Initial sync source, spanning tree maintenance Even less frequent now
  10. Define “election timeout” - *not* time limit of election duration One possibility: Base timeout <1second: ~500ms, plus a few milliseconds