MongoDB 2.8 Replication Internals: Fitting it all together


MongoDB replication internal architecture for 2.8

Abstract: Replication in MongoDB requires deep integration with almost every part of the codebase and has important hooks in systems such as storage, indexing, command processing, and querying. Most of the replication components have recently been overhauled to make further improvements possible. This talk covers what those pieces are, how they interact, and the interesting choices made in their design. It gets into the interaction of the replication protocols (really commands), writes and write concern enforcement, consensus behaviors (elections, leader/follower, majority), and down into the depths of oplog generation and application on replicas. While a large part of the talk is a technical overview of the big pieces, we will dive into many important areas to ensure better understanding. The audience will be able to greatly affect which areas we focus on during the session, so come with ideas and a focus.

Replication
Internals
Fitting Everything Together

2.8, Refactored
● Architecture as of 2.8
● Unit testable; more, and faster, cpp tests
● Many changes (heartbeats, locking, future)
● Interop with 2.6
● Larger replica sets

Large Blocks
● Topology Manager (state machine)
● Replication Coordinator (repl facade)
● Applier (replicate/apply oplog)
● Executor (network, heartbeats, serialization)
● Commands (re-config, init, status, etc)
● External (writes, storage, query, commands)

Blocks
[Diagram: CFG, Topology Manager, Applier, Replication Coordinator, Oplog, CMDs, Writes, Query, Executor]

Topology
● Maintains Authoritative State
o Heartbeat, ping, member state
o Roles and transitions
● Contains Decision Logic
● Unit Testable
● Serial Access
[Diagram: CFG, Topology Manager]

Examples
● updateConfig
● prepare*Response for commands
● getSyncSource, *
● setFollowerMode (state)
● processHeartbeat
● prepareHeartbeatResponse

PrepareHeartbeatResponse
Status TopologyCoordinatorImpl::prepareHeartbeatResponse(...) {
    // Check error conditions, then set response fields
    …
    response->setElectable(!_getMyUnelectableReason(...));
    response->setHbMsg(_getHbmsg(...));
    response->setTime(...);
    response->setOpTime(lastOpApplied);
    // Report the sync source only when one is set
    if (!_syncSource.empty()) {
        response->setSyncingTo(_syncSource);
    }
    …
}
topology_coordinator_impl.cpp:628

Failover Scenario
[Diagram: Heartbeats among P, S, S; labels: Active Primary, Health Check (rsHB)]

Failover Scenario
[Diagram: Heartbeats among P, S, S; labels: Active Primary, Failed S]

Failover Scenario
[Diagram: Heartbeats among P, S; labels: Failed P, Health Check (rsHB)]
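The failover slides can be read as two checks each member makes from heartbeat results: is a given peer still answering, and can this member still see a majority of the set? The following is a minimal sketch of those two checks, not the actual election code. The names `MemberView`, `isUp`, and `seesMajority` are hypothetical, and the timeout is passed in rather than assumed.

```cpp
#include <chrono>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified view of one other replica set member,
// as recorded by this node from heartbeat replies.
struct MemberView {
    std::string host;
    std::chrono::milliseconds sinceLastHeartbeat;  // time since the last good reply
};

// A member counts as up if it answered a heartbeat within the timeout.
inline bool isUp(const MemberView& m, std::chrono::milliseconds timeout) {
    return m.sinceLastHeartbeat <= timeout;
}

// True if this node (counted as one) plus the members it can reach
// form a strict majority of the whole set.
inline bool seesMajority(const std::vector<MemberView>& others,
                         std::chrono::milliseconds timeout) {
    std::size_t visible = 1;  // this node itself
    for (const auto& m : others) {
        if (isUp(m, timeout)) {
            ++visible;
        }
    }
    const std::size_t total = others.size() + 1;
    return visible * 2 > total;
}
```

In a three-member set where the primary stops answering, a secondary that still reaches the other secondary sees two of three members and could stand for election; a secondary that reaches nobody sees one of three and could not.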

Replication Coordinator
● Interface to other subsystems
● Uses executor to schedule
o Commands
o Elections, Initiate, Reconfig
o Role/State Changes
● Unit Testable
o With help; requires mocking out the bridge to other subsystems
[Diagram: Replication Coordinator]

Blocks
[Diagram: Applier, Replication Coordinator, CFG, Oplog, CMDs, Writes, Query, Executor, Topology Manager]

Examples
● process*Response for commands
● awaitReplication* (for writes or migration)
● isReplEnabled
● canAcceptWrites*
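To illustrate what an `awaitReplication*` call has to decide, here is a minimal sketch of a write concern check: count the members whose last applied position has reached the write, and compare against the requested `w`. The type `OpTimeSketch` and all function names are hypothetical; a real optime carries more structure than a single integer, and the real call also blocks and handles timeouts.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an oplog position.
using OpTimeSketch = std::uint64_t;

// Counts members (the primary included) whose last applied position
// has reached the target write.
inline std::size_t countReplicatedTo(const std::vector<OpTimeSketch>& lastApplied,
                                     OpTimeSketch target) {
    std::size_t n = 0;
    for (OpTimeSketch t : lastApplied) {
        if (t >= target) {
            ++n;
        }
    }
    return n;
}

// w is the number of members that must have the write.
inline bool writeConcernSatisfied(const std::vector<OpTimeSketch>& lastApplied,
                                  OpTimeSketch target, std::size_t w) {
    return countReplicatedTo(lastApplied, target) >= w;
}

// Strict majority of the whole set, e.g. 2 of 3 or 3 of 5.
inline std::size_t majorityOf(std::size_t members) {
    return members / 2 + 1;
}
```

With three members at positions 10, 10, and 7, a write at position 10 satisfies `w = 2` (and therefore a majority of three) but not `w = 3`.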

Accepting writes
static bool checkIsMasterForDatabase(const std::string& db, ...) {
    if (!getReplicationCoordinator()->canAcceptWritesForDatabase(db)) {
        errorDetail->setErrCode(ErrorCodes::NotMaster);
        errorDetail->setErrMessage("Not primary while writing to " + ns);
        return false;
    }
    return true;
}
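The check above delegates the decision to the replication coordinator. A minimal sketch of what such a decision might look like follows. `ReplCoordinatorSketch` and `MemberStateSketch` are hypothetical names, and the rule shown (a primary accepts all writes, while the unreplicated `local` database accepts writes on any member) is a simplifying assumption rather than the coordinator's full logic.

```cpp
#include <string>

// Hypothetical subset of member states.
enum class MemberStateSketch { Primary, Secondary, Recovering };

// Hypothetical stand-in for the replication coordinator facade.
class ReplCoordinatorSketch {
public:
    explicit ReplCoordinatorSketch(MemberStateSketch state) : _state(state) {}

    // Assumption: the primary accepts all writes, and the "local"
    // database, which is not replicated, accepts writes on any member.
    bool canAcceptWritesForDatabase(const std::string& db) const {
        return _state == MemberStateSketch::Primary || db == "local";
    }

    // Role changes (for example after an election) flip the answer.
    void setState(MemberStateSketch state) { _state = state; }

private:
    MemberStateSketch _state;
};
```

Keeping the decision behind one method means callers such as the write path never inspect member state directly, which is also what makes the caller easy to test against a mock.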

Applier
● Reads from *upstream* oplog
● Applies operations (transformations)
● Mostly unchanged since 2.4
● Includes UpdatePosition commands
[Diagram: Applier]

Read + Apply Decoupled
● Background oplog reader thread (net)
● Pool of oplog applier threads (by collection)
[Diagram: Repl Source, Network, Buffer, Applier Pool, DB1, DB2, DB3, DB4, Local Oplog]
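One way to picture "pool of applier threads (by collection)" is as a partitioning step: a fetched batch is split into per-worker queues so that all operations for one collection land on the same worker. The sketch below shows only that partitioning, without threads. `OplogOpSketch` and `partitionByNamespace` are hypothetical names, and hashing the namespace is an assumed strategy, not necessarily what the server does.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical, trimmed-down oplog operation.
struct OplogOpSketch {
    std::string ns;  // namespace, e.g. "test.tags"
    long long seq;   // position within the fetched batch
};

// Splits one batch into per-worker queues. Hashing the namespace sends
// every operation for a collection to the same worker, so per-collection
// order is preserved while different collections can be applied in parallel.
inline std::vector<std::vector<OplogOpSketch>> partitionByNamespace(
        const std::vector<OplogOpSketch>& batch, std::size_t workers) {
    std::vector<std::vector<OplogOpSketch>> queues(workers);
    if (workers == 0) {
        return queues;  // nothing to assign to
    }
    for (const auto& op : batch) {
        const std::size_t slot = std::hash<std::string>{}(op.ns) % workers;
        queues[slot].push_back(op);
    }
    return queues;
}
```

Because the input batch is walked in order, each queue keeps the relative order of the operations it receives, which is the property the appliers rely on.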

Replication Operations
oplog entry (fields): o = update, o2 = query
{
  "ns" : "test.tags",
  "op" : "u",
  "v" : 2,
  "ts": ...,
  "o2" : { "_id" : 1 },
  "o" : { "$set" : { "tags.4" : "e" } }
}
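The entry above sets an absolute position (`tags.4`) rather than describing a relative change, which is what lets a replica apply it more than once without harm. A minimal sketch of that property, using a hypothetical flattened document type (`FlatDoc`, a map from dotted field path to string value) and a hypothetical `applySet` helper:

```cpp
#include <map>
#include <string>

// Hypothetical flattened document: dotted field path -> string value.
using FlatDoc = std::map<std::string, std::string>;

// Applies the "$set" part of an update entry. Each field is assigned an
// absolute value, so applying the same entry again leaves the document
// unchanged.
inline void applySet(FlatDoc& doc, const FlatDoc& setFields) {
    for (const auto& kv : setFields) {
        doc[kv.first] = kv.second;
    }
}
```

Applying `{ "tags.4" : "e" }` once or twice to the same document gives the same result, whereas an operation expressed as "append e to tags" would not.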

Executor
● Serializes access to Topology state
● Serializes global state changes wrt db writes
● Processes network requests in IO pool
● Supports event/signal notification
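The first bullet, serializing access to topology state, can be pictured as a single queue of callbacks that run strictly one at a time. The following is a single-threaded sketch of that idea only. `SerialExecutorSketch` is a hypothetical name, and the real executor also involves threads, network I/O, and event notification, none of which is modeled here.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical serial task queue: callbacks run one at a time, in the
// order they were scheduled, so state they touch needs no extra locking.
class SerialExecutorSketch {
public:
    void schedule(std::function<void()> task) {
        _queue.push_back(std::move(task));
    }

    // Runs queued tasks in order, including tasks that other tasks
    // schedule while draining. Returns the number of tasks run.
    std::size_t drain() {
        std::size_t ran = 0;
        while (!_queue.empty()) {
            std::function<void()> task = std::move(_queue.front());
            _queue.pop_front();
            task();
            ++ran;
        }
        return ran;
    }

private:
    std::deque<std::function<void()>> _queue;
};
```

A heartbeat handler that needs a follow-up action would schedule it from inside its own callback; the follow-up then runs after everything already queued, never concurrently with it.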

Write Request
● Sent by user
● Interpreted by command subsystem
● Checked by replication coordinator
● Executed
● Idempotent entry recorded in oplog
● ~ Replicated
● ~ Possibly verified during user write request

Write Request
[Diagram: Applier, Replication Coordinator, CFG, Oplog, CMDs, Writes, Query, Executor, Topology Manager]

● Topology Manager (state machine)
● Replication Coordinator (repl facade)
● Applier (replicate/apply oplog)
● Executor (network, heartbeats, serialization)
● Commands (re-config, init, status, etc)
● External (writes, storage, query, commands)

Thanks
Questions?

More from Scott Hernandez
