RGLock: Recoverable Mutual
Exclusion for Non-Volatile Main
Memory Systems
MASc Thesis Seminar
by
Aditya Ramaraju
Academic Supervisor: Prof. Wojciech Golab
Outline
 Preliminaries: Spinlocks
 Motivation: Crash-recovery, NVMM
 Shortcomings in Related Work
 Execution Model
 Recoverable Mutual Exclusion
 RGLock Algorithm
 Proof Sketch
 Conclusion:
 Learnings
 Limitations
 Further research
 Summary of contributions
2
Preliminaries
 The primary challenge of concurrency is managing access to shared, mutable state.
 If there is no controlled access to shared data, some processes will obtain an inconsistent
view of this data.
 A race condition arises when two or more concurrent processes modify a shared variable at the
same time and the outcome depends on the order in which their operations interleave.
 A Critical Section (CS) is a block of code that manipulates shared data; access to it must be
controlled to avoid race conditions in multiprocessor programming.
3
Preliminaries
 Mutual Exclusion is the problem of implementing a CS such that no two concurrent
processes execute the CS simultaneously.
 Generally, processes gain permission to access CS by acquiring the lock in an entry
protocol and then release the lock in an exit protocol, after completing the CS.
 Actions that do not involve the protected shared resource are categorized under non-
critical section (NCS).
4
Preliminaries
 A concurrent program is thus defined as a non-terminating loop alternating between
critical and non-critical sections.
 A passage is a single iteration of such a loop, consisting of four sections of code in a
concurrent program with the following structure: Entry Protocol, Critical Section, Exit Protocol,
Non-Critical Section.
 Doorway: a wait-free block of code in the entry protocol.
 If the mutex is already held by another process, the process busy-waits using a
technique called spinning.
5
Preliminaries
 Spin-locks:
• Attempt to acquire lock by repeatedly polling a shared variable.
• Release the lock by resetting the spin variable.
• Examples: Test-and-Set lock, Ticket lock, etc.
• Prone to high contention on a single cache line.
 Queue-based locks:
• Contending processes “line up” in a queue; only the head enters the CS.
• FCFS guarantee and high scalability.
 In-depth surveys by Raynal (1986), Anderson et al. (2003), and Buhr et al. (2014).
6
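The two spin-lock examples named above can be sketched as follows. This is a simulation in Python rather than hardware code: `AtomicCell`, `TASLock`, `TicketLock` and `run_workers` are hypothetical names introduced for illustration, and the internal `threading.Lock` only stands in for the atomicity of a single hardware instruction.

```python
import threading
import time


class AtomicCell:
    """Hypothetical stand-in for one shared word with atomic instructions."""

    def __init__(self, value):
        self._value = value
        self._guard = threading.Lock()  # models hardware atomicity only

    def load(self):
        with self._guard:
            return self._value

    def store(self, value):
        with self._guard:
            self._value = value

    def test_and_set(self):
        """Atomically set the word to True and return its previous value."""
        with self._guard:
            old, self._value = self._value, True
            return old

    def fetch_and_increment(self):
        """Atomically add one and return the previous value."""
        with self._guard:
            old = self._value
            self._value = old + 1
            return old


class TASLock:
    def __init__(self):
        self.flag = AtomicCell(False)

    def acquire(self):
        # Every waiter polls the same shared word, hence the contention.
        while self.flag.test_and_set():
            time.sleep(0)  # yield to other threads while spinning

    def release(self):
        self.flag.store(False)  # reset the spin variable


class TicketLock:
    def __init__(self):
        self.next_ticket = AtomicCell(0)
        self.now_serving = AtomicCell(0)

    def acquire(self):
        my_ticket = self.next_ticket.fetch_and_increment()  # doorway
        while self.now_serving.load() != my_ticket:
            time.sleep(0)

    def release(self):
        # Only the lock holder writes now_serving, so load-then-store is safe.
        self.now_serving.store(self.now_serving.load() + 1)


def run_workers(lock, workers=3, rounds=10):
    """Increment a shared counter under `lock`; return the final count."""
    counter = {"value": 0}

    def work():
        for _ in range(rounds):
            lock.acquire()
            current = counter["value"]
            time.sleep(0)  # widen the race window inside the CS
            counter["value"] = current + 1
            lock.release()

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter["value"]
```

With either lock, `run_workers` should lose no increments. The ticket lock additionally serves waiters in ticket order, which a test-and-set lock does not promise.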
Preliminaries
 MCS Lock (1991):
• The most widely used and popular queue-based lock.
• Relies on the availability of fetch_and_store for the doorway.
• Uses compare_and_swap (CAS) in lock release.
• Generates 𝒪(1) remote memory references per passage.
• Requires only a constant amount of space per lock per process.
• Guarantees Mutual Exclusion, FCFS order, and Starvation freedom.
7
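A minimal sketch of the MCS lock, following the standard published structure (fetch_and_store in the doorway, CAS in release). It is a Python simulation under stated assumptions: the `_guard` lock only models the atomicity of the two hardware instructions, and `mcs_demo` is a hypothetical helper added for illustration.

```python
import threading
import time


class QNode:
    def __init__(self):
        self.locked = False  # spin variable, local to this process
        self.next = None     # successor in the queue


class MCSLock:
    def __init__(self):
        self.tail = None
        self._guard = threading.Lock()  # models hardware atomicity only

    def _fetch_and_store(self, new):
        with self._guard:
            old, self.tail = self.tail, new
            return old

    def _compare_and_swap(self, expected, new):
        with self._guard:
            if self.tail is expected:
                self.tail = new
                return True
            return False

    def acquire(self, node):
        node.next = None
        node.locked = True
        pred = self._fetch_and_store(node)  # doorway
        if pred is not None:
            pred.next = node                # link behind the predecessor
            while node.locked:              # spin on our own qnode only
                time.sleep(0)

    def release(self, node):
        if node.next is None:
            if self._compare_and_swap(node, None):
                return                      # no successor: lock is free
            while node.next is None:        # a successor is mid-doorway
                time.sleep(0)
        node.next.locked = False            # hand the lock to the successor


def mcs_demo(workers=3, rounds=10):
    """Each worker increments a shared counter under the MCS lock."""
    lock, counter = MCSLock(), {"value": 0}

    def work():
        node = QNode()  # one qnode per process
        for _ in range(rounds):
            lock.acquire(node)
            current = counter["value"]
            time.sleep(0)
            counter["value"] = current + 1
            lock.release(node)

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter["value"], lock.tail
```

Note that `pred` is held only in a private variable, which is the detail the Motivation slides return to when discussing crashes.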
Motivation
 Crash-recovery:
• Examples of crash failures: system crash, power loss, accidental or intentional termination, heuristic
deadlock recovery mechanisms, etc.
• In a crash-recovery model, a failed process may be resurrected after a crash failure to resume execution of
its algorithm.
• Several crash-recovery techniques exist for the message-passing model, which use check-pointing and
message logging.
• For DSM and CC models with SRAM-based caches and DRAM-based memories, such techniques are
poorly suited owing to frequent disk accesses.
8
Motivation
 Crash-recoverable Mutex:
• Lamport was first to consider failures in his Bakery algorithm: processes ‘restart’ in NCS when they fail.
• However, none of the prominent mutual exclusion algorithms (Peterson’s, Lamport’s Bakery, MCS, etc.)
provides fault tolerance “out of the box” if the state of the spin variable is lost in a crash failure.
 Goals for a Crash-recoverable Mutex:
– No process’s queue entry is lost in the crash, i.e., no process in the system should starve due to a crash.
– Each process contains at most one instance of its record in the lock queue.
– At most one process owns the lock. Also, at most one process at a time believes it is the lock-holder.
– If a lock-holder crashes, then it should not lose the ownership when it recovers from the crash.
– No process should wait indefinitely to relinquish its lock ownership.
9
Motivation
 How NVMM is a big step in the quest for a crash-recoverable mutex:
• Potentially the most advanced alternative to the 40-year-old CPU, DRAM and disk design.
• Combines the high speed of SRAM, the density of DRAM and the non-volatility of flash memory.
• All execution state can be dissociated from process crashes and power failures by storing it on a persistent
non-volatile medium (PCM, FeRAM, MRAM, memristors, etc.).
10
Image: K. Bailey and L. Ceze, “Operating system implications of fast, cheap, non-volatile memory,” Proceedings
of the 13th USENIX conference on Hot topics in operating systems. USENIX Association, pp. 2–2, 2011.
Motivation
 Why “out-of-the box” MCS is a poor fit in the event of a crash
(even in NVMM systems):
• Besides the state of the PC, the evidence of a process ever completing
the doorway is lost in the crash.
• A lock holder:
– may attempt to acquire the lock again
– may never set the lock free
– may never relinquish the lock
• A busy-waiting process:
– may attempt to enter the queue twice!
– may never link itself behind its last known predecessor
– may block itself even though it was just promoted
• In all the cases above, the progress of most active processes in the queue
is impeded.
11
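One of the failure cases above, a process entering the queue twice, can be reproduced with a small single-threaded simulation. This is an illustrative sketch only: `Lock`, `doorway` and `crash_after_doorway` are hypothetical names, and the shared `tail` pointer and qnode are assumed to survive the crash (as they would in NVMM) while the private predecessor variable does not.

```python
class QNode:
    def __init__(self, owner):
        self.owner = owner
        self.locked = False
        self.next = None


class Lock:
    def __init__(self):
        self.tail = None  # shared; assumed to survive the crash (NVMM)


def fetch_and_store(lock, new):
    # Single-threaded simulation, so no extra atomicity is needed here.
    old, lock.tail = lock.tail, new
    return old


def doorway(lock, node):
    """MCS doorway: the predecessor is returned into a private variable."""
    node.locked = True
    return fetch_and_store(lock, node)


def crash_after_doorway():
    lock, q1 = Lock(), QNode("P1")
    pred = doorway(lock, q1)   # P1 enters an empty queue: pred is None
    first_pred = pred          # kept by the demo only, not by P1
    del pred                   # crash: P1's private state (pred, PC) is lost
    # On restart P1 cannot tell that it already completed the doorway,
    # so it runs the doorway again with the same qnode.
    pred = doorway(lock, q1)
    return first_pred, pred, q1
```

After the second doorway the predecessor of `q1` is `q1` itself, so P1 would link behind its own qnode and spin on a flag that nobody will ever clear.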
Shortcomings in Related Work
 Bohannon et al. (1995 & 96) proposed recovery mechanisms for test_and_set lock and
MCS Lock. Michael and Kim (2009) proposed a CAS-based implementation of a
recoverable queue lock.
 However, in the event of a crash, their solutions
 require the OS/scheduler to play ‘Big Brother’
 are highly inefficient in large non-homogeneous systems
 involve a ‘cleanup’ routine that itself is assumed to never crash
 do not account for system crash, i.e., all processes fail simultaneously
 do not guarantee FCFS due to “usurping” of lock from a dead process
 do not guarantee starvation freedom and are also prone to priority inversion during “cleanup”
12
Execution Model
 Hardware considerations:
 An asynchronous multi-processor architecture of Cache Coherent (CC) model – write-through approach
 The main memory modules are based on the persistent and reliable Non-Volatile Random Access Memory
(NVRAM) medium. We assume that:
• information once stored in NVRAM is never lost or corrupted;
• caching and memory ordering can be controlled to the point where shared memory operations
are atomic and durable.
 Local memory references (e.g., in-cache reads) vs Remote Memory References (RMRs).
 The time complexity of our algorithm is measured by counting the RMRs performed during a passage.
 Support for swap_and_store (SAS) and compare_and_swap (CAS) instructions.
13
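The distinction between local references and RMRs can be illustrated with a toy cost model. This is a sketch under simplifying assumptions that are not taken from the slides: a write-through cache with invalidation, in which every write and every cache miss counts as one RMR and every cache hit is free. `CCModel` is a hypothetical name.

```python
from collections import defaultdict


class CCModel:
    """Hypothetical write-through, invalidation-based cache-coherent memory."""

    def __init__(self):
        self.memory = {}                 # contents of main memory
        self.cached = defaultdict(set)   # pid -> addresses with a valid copy
        self.rmrs = defaultdict(int)     # pid -> RMRs performed so far

    def read(self, pid, addr):
        if addr not in self.cached[pid]:   # cache miss: go to main memory
            self.rmrs[pid] += 1
            self.cached[pid].add(addr)
        return self.memory.get(addr)       # otherwise an in-cache read

    def write(self, pid, addr, value):
        self.rmrs[pid] += 1                # write-through: always remote
        self.memory[addr] = value
        for other, addrs in self.cached.items():
            if other != pid:
                addrs.discard(addr)        # invalidate other cached copies
        self.cached[pid].add(addr)


def spin_cost():
    """P1 spins on a flag that P2 sets and later clears."""
    m = CCModel()
    m.write("P2", "flag", True)
    for _ in range(5):
        m.read("P1", "flag")               # one miss, then four local hits
    after_spin = m.rmrs["P1"]
    m.write("P2", "flag", False)           # invalidates P1's copy
    value = m.read("P1", "flag")           # one more miss
    return after_spin, m.rmrs["P1"], value
```

In this model a process that spins on a variable pays one RMR per invalidation rather than one per poll, which is why counting RMRs is a reasonable measure of a lock's time complexity.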
Execution Model
 Formalism:
 We use a less formal approach to the I/O automata model, by defining the behavior of processes using a
pseudo-code representation.
 A process is a sequential program consisting of operations on variables. Each variable is either private or
shared. Each process also has a special private variable, program counter (PC).
 A step in a history corresponds to a statement execution or a crash.
 The processes in the system interact with a finite set of variables in corresponding sequence of steps
recorded in an execution history 𝐻 ∈ ℋ.
 In a fair history, each individual process in the system is given an opportunity to perform its locally controlled
steps infinitely often.
14
Execution Model
 Formalism (contd..):
 A crash is a failure in an execution of one process where the private variables of the crashed process are
reset to their initial values and the process simply stops executing any computation until it is active again.
 A crash-recovery procedure reconstructs a crashed process’s state and resumes its active execution from
the point of failure in the algorithm.
 A process is said to be in recovery until the execution of its crash-recovery procedure is complete.
 Classification of steps:
 Normal step
 Crash-recovery step
 CS step
15
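The crash semantics defined above can be sketched as a small model. The names `Process`, `INITIAL_PRIVATE` and the `nvram` dictionary are hypothetical; the only behavior taken from the slides is that a crash resets private variables to their initial values while shared variables in non-volatile memory persist.

```python
class Process:
    """Hypothetical model of one process under the crash semantics above."""

    INITIAL_PRIVATE = {"pc": 0, "tmp": None}

    def __init__(self, pid, nvram):
        self.pid = pid
        self.nvram = nvram                        # shared, non-volatile
        self.private = dict(self.INITIAL_PRIVATE)
        self.active = True

    def step(self):
        """Normal step: update a shared variable, then advance the PC."""
        self.nvram[self.pid] = self.nvram.get(self.pid, 0) + 1
        self.private["tmp"] = self.nvram[self.pid]
        self.private["pc"] += 1

    def crash(self):
        """Crash step: private variables revert to their initial values."""
        self.private = dict(self.INITIAL_PRIVATE)
        self.active = False

    def recover(self):
        """Crash-recovery step: rebuild private state from shared NVRAM."""
        self.private["tmp"] = self.nvram.get(self.pid)
        self.active = True


def crash_demo():
    nvram = {}
    p = Process("P1", nvram)
    p.step()
    p.step()
    p.crash()
    after_crash = dict(p.private)
    p.recover()
    return after_crash, nvram["P1"], p.private["tmp"]
```

The program counter is lost along with the other private variables, so a recovering process must infer where it was from shared state alone.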
Execution Model
 Formalism (contd..):
 A crash-recoverable execution history is a fair history wherein every process either executes infinitely many
passages or crashes a finite number of times.
 In other words, if a process is ever inactive, it is not because it is crashing indefinitely.
16
Execution Model
 Summary of assumptions:
 A process in recovery reconstructs its state from the shared variables stored in non-volatile memory.
 Process crashes are independent, i.e., failure of one process does not crash other active processes in the
system.
 Other active processes in the system may read, modify and write to the globally accessible shared variables
of a process in recovery.
 The code for critical section is idempotent and harmlessly repeatable by a process in recovery if it has the
necessary exclusive access to do so.
17
Recoverable Mutual Exclusion
 To the best of our knowledge, we are the first to provide a formal specification to the
correctness properties of Recoverable Mutual Exclusion.
 A crash-recoverable mutex satisfies all the following :
 Mutual Exclusion (ME)
 First-come-first-served (FCFS)
 Livelock-freedom (LF)
 Starvation-freedom (SF)
 Terminating Exit (TE)
 Finite Recovery (FR)
18
RGLock
19
RMEQ
 𝑅𝑀𝐸𝑄 is a linked-list of qnodes.
 Each qnode contains:
 a checkpoint number 𝑐ℎ𝑘.
 an 𝑎ℎ𝑒𝑎𝑑 pointer that determines the links in 𝑅𝑀𝐸𝑄 and also acts as the spin variable.
 a 𝑛𝑒𝑥𝑡 pointer to hold the address of the successor qnode.
 The lock is represented by pointer 𝐿, set either to 𝑛𝑢𝑙𝑙 when the lock is free or to the tail
qnode of 𝑅𝑀𝐸𝑄.
 Processes append their qnodes to 𝑅𝑀𝐸𝑄 using the SAS instruction (doorway).
 The process with head qnode in 𝑅𝑀𝐸𝑄 is the lock-holder.
 To release a lock a process either sets 𝐿 to 𝑛𝑢𝑙𝑙 if it has no immediate successor in 𝑅𝑀𝐸𝑄, or
flips the successor’s spin variable to 𝑛𝑢𝑙𝑙.
20
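The qnode layout and the failure-free acquire and release paths described above can be sketched as follows. This is a Python simulation, not the thesis implementation: `RGLockSketch` and `rglock_demo` are hypothetical names, the `_guard` lock only models atomic SAS and CAS, nothing here is actually durable, and the points at which `chk` is set to 1, 3 and back to 0 are assumptions (the slides only show `chk := 2` before the CS).

```python
import threading
import time


class QNode:
    def __init__(self, pid):
        self.pid = pid
        self.chk = 0        # checkpoint number
        self.ahead = None   # predecessor link, doubles as the spin variable
        self.next = None    # successor qnode


class RGLockSketch:
    def __init__(self):
        self.L = None                    # None when free, else the tail qnode
        self._guard = threading.Lock()   # models hardware atomicity only

    def _sas(self, q):
        """SAS(L, q, q.ahead): swap q into L, record the old tail in q.ahead."""
        with self._guard:
            q.ahead = self.L
            self.L = q

    def _cas(self, expected, new):
        with self._guard:
            if self.L is expected:
                self.L = new
                return True
            return False

    def acquire(self, q):
        q.next = None
        q.chk = 1                        # assumed placement
        self._sas(q)                     # doorway
        pred = q.ahead
        if pred is not None:
            pred.next = q                # tell the predecessor who follows it
            while q.ahead is not None:   # spin until promoted to head
                time.sleep(0)
        q.chk = 2                        # about to enter the CS

    def release(self, q):
        q.chk = 3                        # assumed placement
        if q.next is None:
            if self._cas(q, None):       # no successor: free the lock
                q.chk = 0
                return
            while q.next is None:        # a successor is mid-doorway
                time.sleep(0)
        q.next.ahead = None              # flip the successor's spin variable
        q.chk = 0                        # assumed: reset on return to NCS


def rglock_demo(workers=3, rounds=10):
    lock, counter = RGLockSketch(), {"value": 0}

    def work(pid):
        q = QNode(pid)
        for _ in range(rounds):
            lock.acquire(q)
            current = counter["value"]
            time.sleep(0)
            counter["value"] = current + 1
            lock.release(q)

    threads = [threading.Thread(target=work, args=(i,)) for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter["value"], lock.L
```

Unlike MCS, the predecessor is written into the qnode itself (`q.ahead`) in the same atomic step as the swap, so it is never held only in a private variable.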
RMEQ
21
Figure: example 𝑅𝑀𝐸𝑄. Labels recoverable from the figure: Lock L; qnodes P1 (Chk: 2), P2 (Chk: 1), P3 (Chk: 1), P4 (Chk: 1), P5 (Chk: 1) and P8 (Chk: 0/1/3); pointer values NULL or some qnode.
Index: head qnode, 𝑞𝑖 ∈ 𝑅𝑀𝐸𝑄, 𝑞𝑖 ∉ 𝑅𝑀𝐸𝑄, crashed qnode.
RGLock Algorithm
 Overview:
 All processes start from an initial state in the NCS.
 In a failure-free passage, execute 𝑎𝑐𝑞𝑢𝑖𝑟𝑒_𝑙𝑜𝑐𝑘, CS and 𝑟𝑒𝑙𝑒𝑎𝑠𝑒_𝑙𝑜𝑐𝑘 before returning to NCS.
 A process may take several steps in NCS until subsequent request for lock acquisition.
 If a process crashes at any point during a passage, then upon recovery it
 reads the state of its qnode from NVRAM;
 invokes corresponding recovery procedure based on the 𝑐ℎ𝑘 value;
 identifies the position of its qnode in RMEQ; and then
 completes the crash-recoverable passage accordingly and returns to NCS.
22
RGLock Algorithm
 atomic 𝒔𝒘𝒂𝒑_𝒂𝒏𝒅_𝒔𝒕𝒐𝒓𝒆 (SAS):
 In one indivisible atomic step, a fetch_and_store is immediately followed by another store that writes the result
of the fetch_and_store operation to a location in the invoking process’s non-volatile memory.
 Ensures strict FCFS order in lock acquisitions.
 Aids a process in recovery in identifying the position of its qnode in 𝑅𝑀𝐸𝑄.
 Pseudo-code representation:
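function SAS (old_element: address, new_element: value, location: address)
  atomic {
    temp: val_type := *old_element   // fetch the current value
    *old_element := new_element      // store the new value (the fetch_and_store part)
    *location := temp                // record the fetched value in the caller's non-volatile memory
  }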
23
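The pseudo-code above translates almost line for line into the following Python sketch. `Cell` and `doorway_order` are hypothetical helpers, and the module-level lock only stands in for the indivisibility of the instruction; real durability is not modeled.

```python
import threading

_atomic = threading.Lock()  # stands in for the indivisibility of SAS


class Cell:
    """Hypothetical one-word memory location."""

    def __init__(self, value=None):
        self.value = value


def sas(old_element, new_element, location):
    """Swap new_element into old_element; store the old value in location."""
    with _atomic:
        temp = old_element.value
        old_element.value = new_element
        location.value = temp


def doorway_order():
    """Three processes pass through the doorway one after another."""
    L = Cell(None)
    ahead = {name: Cell() for name in ("q1", "q2", "q3")}
    for name in ("q1", "q2", "q3"):
        sas(L, name, ahead[name])
    return L.value, {name: cell.value for name, cell in ahead.items()}
```

After the three calls, `L` holds `"q3"` and each process's `ahead` cell names the process that arrived just before it. Because that record is written in the same atomic step as the swap, a process that crashes right after the doorway can still find its predecessor.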
RGLock Algorithm
24
Figure: MCS Lock vs. RGLock (side-by-side comparison).
Figure: RGLock flowchart. Recoverable labels:
• States: INITIAL, NCS (𝑞𝑖 ∉ 𝑅𝑀𝐸𝑄), CS.
• ENTRY protocol: SAS(L, 𝑞𝑖, 𝑞𝑖.ahead); if 𝑞𝑖.ahead ≠ null, set 𝑞𝑖.ahead.next to 𝑞𝑖 and wait until 𝑞𝑖.ahead = null; then 𝑞𝑖.𝑐ℎ𝑘 ≔ 2.
• EXIT protocol: if 𝑞𝑖.next = null, CAS (true or false); if 𝑞𝑖.next ≠ null, 𝑞𝑖.𝑛𝑒𝑥𝑡.𝑎ℎ𝑒𝑎𝑑 ≔ 𝑛𝑢𝑙𝑙.
• Crash-recovery step: recovery procedure (failureFree, recoverBlocked, recoverHead, recoverRelease) selected based on the 𝑞𝑖.𝑐ℎ𝑘 value; test 𝑞𝑖 ∈ 𝑅𝑀𝐸𝑄 (true or false).
• Index: normal step, crash step.
25
RGLock Algorithm
Crash-recovery procedures
26
RGLock: crash-recovery procedures
Crash-recovery procedures
27
recoverBlocked
• Invoked if 𝑞𝑖.𝑐ℎ𝑘 = 1 immediately after crash.
• Check if 𝑞𝑖 ∈ 𝑅𝑀𝐸𝑄.
• If yes:
– busy-wait in waitForCS until 𝑞𝑖 is head
– proceed to CS in recoverHead
– release the lock
• If no:
– return false
– execute failureFree
recoverHead
• Invoked if 𝑞𝑖.𝑐ℎ𝑘 = 2 immediately after crash or within recoverBlocked.
• Execute CS.
• Release the lock.
recoverRelease
• Invoked if 𝑞𝑖.𝑐ℎ𝑘 = 3 immediately after crash.
• Check if 𝑞𝑖 ∈ 𝑅𝑀𝐸𝑄.
• If yes:
– release the lock
• If no:
– reset 𝑞𝑖.𝑐ℎ𝑘 and return to NCS
failureFree
• Invoked if 𝑞𝑖.𝑐ℎ𝑘 = 0 immediately after crash or if recoverBlocked returns false.
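The selection of a recovery procedure from the 𝑐ℎ𝑘 value can be summarized as a dispatch function. This sketch only restates the cases listed on the slide: `recovery_plan` is a hypothetical name, the action strings are descriptive labels rather than real procedures, and the boolean `in_rmeq` stands in for the outcome of the 𝑞𝑖 ∈ 𝑅𝑀𝐸𝑄 check, whose implementation is not shown here.

```python
def recovery_plan(chk, in_rmeq):
    """Return the ordered actions a recovering process takes for this chk."""
    if chk == 0:
        return ["failureFree"]
    if chk == 1:  # recoverBlocked
        if in_rmeq:
            return ["waitForCS", "recoverHead: execute CS", "release lock"]
        return ["failureFree"]  # recoverBlocked returned false
    if chk == 2:  # recoverHead
        return ["recoverHead: execute CS", "release lock"]
    if chk == 3:  # recoverRelease
        if in_rmeq:
            return ["release lock"]
        return ["reset chk", "return to NCS"]
    raise ValueError(f"unexpected checkpoint value: {chk}")
```

For example, `recovery_plan(1, False)` yields `["failureFree"]`: a process that checkpointed but never made it into the queue simply starts a normal passage.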
Proof Sketch
28
 The correctness of our algorithm is derived by an induction on the length of the execution history or by contradiction,
where applicable.
 We use a history variable 𝑄 which represents the sequence of process IDs whose qnodes are in 𝑅𝑀𝐸𝑄.
 An invariant is established with respect to the state of 𝐿, 𝑄, 𝑅𝑀𝐸𝑄 and the 𝑎ℎ𝑒𝑎𝑑 and 𝑐ℎ𝑘 fields on a qnode.
 We show that the elements of 𝑄 are the same as the qnodes in 𝑅𝑀𝐸𝑄 at the end of a finite history, in that order.
 The head qnode of 𝑅𝑀𝐸𝑄 is the lock holder and since 𝑄 always has at most one head element, ME is guaranteed.
 FCFS, SF, LF, and TE are proved by contradiction, using the invariant.
 And since every procedure in the RGLock algorithm terminates in a finite number of steps, FR is guaranteed.
 Finally, we show that the RGLock algorithm incurs 𝒪(1) RMRs per process per failure-free passage.
Conclusion
29
 Learnings (for me, that is):
 Less is more.
 For designing synchronization datastructures.
 Evolution of qnodes in RMEQ.
 Asynchrony is a harsh mistress.
 𝑓𝑖𝑛𝑑𝑀𝑒 accuracy.
 𝑤𝑎𝑖𝑡𝐹𝑜𝑟𝐶𝑆 correctness.
Conclusion
30
 Known Limitations
 Requires support for an unconventional hardware instruction (SAS).
 𝑓𝑖𝑛𝑑𝑀𝑒 presets the no. of processes in the system.
 Further Research
 Programmatic implementation of the algorithm.
 Simplify the code for more rigorous analysis.
 Bakery algorithm in the context of crash-recovery for NVMM.
 Make provision for processes to be added to the system even after the algorithm is initialized.
 Potential Impact
 In-memory databases for ‘always-on’ applications and high-performance computing.
Conclusion
31
 Summary of Contributions:
 Formal specification of the correctness properties of Recoverable Mutual Exclusion.
 RGLock: a first-of-its-kind crash-recoverable mutual exclusion lock for NVMM systems.
 Proposed doorway instruction could help guide design of future NVMM architectures.
 Distinguishing RGLock from earlier attempts for crash-recoverable mutex:
 RGLock satisfies all safety and liveness properties simultaneously in presence of crash failures.
 RGLock tolerates failures on any individual component, including a lock-holder, and system-wide crashes as well.
 Compared to MCS Lock, RGLock does not inflate time complexity in failure-free execution.
 A comprehensive proof of correctness for the RGLock algorithm.
Let’s Talk!
32

Seminar

  • 1. RGLock: Recoverable Mutual Exclusion for Non-Volatile Main Memory Systems MASc Thesis Seminar by Aditya Ramaraju Academic Supervisor: Prof. Wojciech Golab
  • 2. Outline  Preliminaries: Spinlocks  Motivation: Crash-recovery, NVMM  Shortcomings in Related Work  Execution Model  Recoverable Mutual Exclusion  RGLock Algorithm  Proof Sketch  Conclusion:  Learnings  Limitations  Further research  Summary of contributions 2
  • 3. Preliminaries  The primary challenge of concurrency is managing access to shared, mutable state.  If there is no controlled access to shared data, some processes will obtain an inconsistent view of this data.  A race condition arises when any two concurrent processes simultaneously modifying the value of a shared variable can produce different outcomes, depending on their sequence of operations.  Critical Section (CS), a block of code to manipulate shared data is needed to avoid race conditions in multiprocessor programming. 3
  • 4. Preliminaries  Mutual Exclusion is the problem of implementing a CS such that no two concurrent processes execute the CS simultaneously.  Generally, processes gain permission to access CS by acquiring the lock in an entry protocol and then release the lock in an exit protocol, after completing the CS.  Actions that do not involve the protected shared resource are categorized under non- critical section (NCS). 4
  • 5. Preliminaries  A concurrent program is thus defined as a non-terminating loop alternating between critical and non-critical sections.  A passage is a single iteration of such loop consisting of four sections of code in a concurrent program with the following structure:  Doorway: a wait-free block of code in the entry protocol.  If the mutex is already being held by another process, busy-waiting is performed by a technique called spinning. 5 Entry Protocol Critical Section Exit Protocol Non- Critical Section
  • 6. Preliminaries  Spin-locks: • Attempt to acquire lock by repeatedly polling a shared variable. • Release the lock by resetting the spin variable. • Eg: Test-and-Set lock, Ticket lock, etc. • Prone to high contention on single cache line.  Queue-based locks: • contending processes “line up” in a queue, only head enters the CS. • FCFS guarantee, high scalability.  In-depth surveys by Raynal (1986), Anderson et al. (2003), and Buhr et al. (2014). 6
  • 7. Preliminaries  MCS Lock (1991): • Gained most widespread usage and popularity. • Relies on fetch_and_store availability for doorway. • Makes use of compare_and_swap (CAS) in lock release. • Generates 𝒪(1) remote memory references. • Requires only a constant amount of space per lock per process. • Guarantees Mutual Exclusion, FCFS order, and Starvation freedom. 7
  • 8. Motivation  Crash-recovery: • Examples of crash failures: system crash, power loss, accidental or intentional termination, heuristic deadlock recovery mechanisms, etc. • In a crash-recovery model, a failed process may be resurrected after a crash failure to resume execution of its algorithm. • Several crash-recovery techniques exist for the message-passing model, which use check-pointing and message logging. • For DSM and CC models with SRAM-based caches and DRAM-based memories, such techniques are poorly suited owing to frequent disk accesses. 8
  • 9. Motivation  Crash-recoverable Mutex: • Lamport was first to consider failures in his Bakery algorithm: processes ‘restart’ in NCS when they fail. • However, none of the prominent mutual exclusion algorithms (Peterson’s, Lamport’s Bakery, MCS, etc.) can provide fault-tolerance “out-of-the box” if the state of the spin variable is lost in a crash failure.  Goals for a Crash-recoverable Mutex: – No process’s queue entry is lost in the crash, i.e., no process in the system should starve due to a crash. – Each process contains at most one instance of its record in the lock queue. – At most one process owns the lock. Also, at most one process at a time believes it is the lock-holder. – If a lock-holder crashes, then it should not lose the ownership when it recovers from the crash. – No process should wait indefinitely to relinquish its lock ownership. 9
  • 10. Motivation  How NVMM is a big step in the quest for a crash-recoverable mutex: • Potentially the most advanced alternative to the 40-year old CPU, DRAM and disk design. • Combines the high speed of SRAM, the density of DRAM and the non-volatility of flash memory. • All execution state can be dissociated from process crashes and power failures by storing it on a persistent non-volatile medium (PCM, FeRAM, MRAM, memristors, etc.). 10 Image: K. Bailey and L. Ceze, “Operating system implications of fast, cheap, non-volatile memory,” Proceedings of the 13th USENIX conference on Hot topics in operating systems. USENIX Association, pp. 2–2, 2011.
  • 11. Motivation  Why “out-of-the box” MCS is a poor fit in the event of a crash (even in NVMM systems): • Besides the state of the PC, the evidence of a process ever completing the doorway is lost in the crash. • A lock holder • may attempt to acquire lock again • may never set the lock free • may never relinquish the lock • A busy-waiting process • may attempt to enter the queue twice! • may never link itself behind last known predecessor • may block itself even though it was just promoted • In all cases above, the progress of most active processes in the queue is impeded. 11
  • 12. Shortcomings in Related Work  Bohannon et al. (1995 & 96) proposed recovery mechanisms for test_and_set lock and MCS Lock. Michael and Kim (2009) proposed a CAS-based implementation of a recoverable queue lock.  However, in the event of a crash, their solutions  require the OS/scheduler to play ‘Big Brother’  are highly inefficient in large non-homogeneous systems  involve a ‘cleanup’ routine that itself is assumed to never crash  do not account for system crash, i.e., all processes fail simultaneously  do not guarantee FCFS due to “usurping” of lock from a dead process  do not guarantee starvation freedom and are also prone to priority inversion during “cleanup” 12
  • 13. Execution Model  Hardware considerations:  An asynchronous multiprocessor architecture following the Cache-Coherent (CC) model with a write-through approach.  The main memory modules are based on the persistent and reliable Non-Volatile Random Access Memory (NVRAM) medium. We assume that • information once stored in NVRAM is never lost or corrupted. • caching and memory ordering can be controlled to the point where shared memory operations are atomic and durable.  Local memory references (e.g., in-cache reads) vs Remote Memory References (RMRs).  The time complexity of our algorithm is measured by counting the RMRs performed during a passage.  Support for swap_and_store (SAS) and compare_and_swap (CAS) instructions.
  • 14. Execution Model  Formalism:  We use a less formal approach to the I/O automata model, by defining the behavior of processes using a pseudo-code representation.  A process is a sequential program consisting of operations on variables. Each variable is either private or shared. Each process also has a special private variable, the program counter (PC).  A step in a history corresponds to a statement execution or a crash.  The processes in the system interact with a finite set of variables in a corresponding sequence of steps recorded in an execution history 𝐻 ∈ ℋ.  In a fair history, each individual process in the system is given an opportunity to perform its locally controlled steps infinitely often.
  • 15. Execution Model  Formalism (contd.):  A crash is a failure in the execution of one process, where the private variables of the crashed process are reset to their initial values and the process stops executing any computation until it is active again.  A crash-recovery procedure reconstructs a crashed process’s state and resumes its active execution from the point of failure in the algorithm.  A process is said to be in recovery until the execution of its crash-recovery procedure is complete.  Classification of steps:  Normal step  Crash-recovery step  CS step
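The crash model on this slide can be illustrated with a small toy simulation. This is only a sketch under stated assumptions: the `Process` class, its `checkpoint` helper, and the dictionary standing in for NVRAM are hypothetical names chosen for illustration and are not taken from the thesis.

```python
class Process:
    """Toy model: private state is volatile, `nvram` is shared and persistent."""

    INITIAL_PRIVATE = {"pc": 0, "scratch": None}

    def __init__(self, pid, nvram):
        self.pid = pid
        self.nvram = nvram                      # shared dict that survives crashes
        self.private = dict(self.INITIAL_PRIVATE)
        self.active = True

    def checkpoint(self, label):
        # Hypothetical helper: persist a progress marker before moving on.
        self.nvram[(self.pid, "chk")] = label
        self.private["pc"] = label

    def crash(self):
        # Private variables return to their initial values; the process stops.
        self.private = dict(self.INITIAL_PRIVATE)
        self.active = False

    def recover(self):
        # Rebuild private state from shared, non-volatile variables only.
        self.active = True
        self.private["pc"] = self.nvram.get((self.pid, "chk"), 0)
        return self.private["pc"]
```

After `crash()`, everything in `private` is back to its initial value, while the marker written by `checkpoint` is still available to `recover()`. This mirrors the assumption that a process in recovery reconstructs its state from shared variables.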
  • 16. Execution Model  Formalism (contd.):  A crash-recoverable execution history is a fair history wherein every process either executes infinitely many passages or crashes a finite number of times.  In other words, if a process is ever inactive, it is not because it is crashing indefinitely.
  • 17. Execution Model  Summary of assumptions:  A process in recovery reconstructs its state from the shared variables stored in non-volatile memory.  Process crashes are independent, i.e., the failure of one process does not crash other active processes in the system.  Other active processes in the system may read, modify and write to the globally accessible shared variables of a process in recovery.  The code for the critical section is idempotent and harmlessly repeatable by a process in recovery if it has the necessary exclusive access to do so.
  • 18. Recoverable Mutual Exclusion  To the best of our knowledge, we are the first to provide a formal specification of the correctness properties of Recoverable Mutual Exclusion.  A crash-recoverable mutex satisfies all of the following:  Mutual Exclusion (ME)  First-come-first-served (FCFS)  Livelock-freedom (LF)  Starvation-freedom (SF)  Terminating Exit (TE)  Finite Recovery (FR)
  • 20. RMEQ  𝑅𝑀𝐸𝑄 is a linked list of qnodes.  Each qnode contains:  a checkpoint number 𝑐ℎ𝑘.  an 𝑎ℎ𝑒𝑎𝑑 pointer that determines the links in 𝑅𝑀𝐸𝑄 and also acts as the spin variable.  a 𝑛𝑒𝑥𝑡 pointer to hold the address of the successor qnode.  The lock is represented by pointer 𝐿, set either to 𝑛𝑢𝑙𝑙 when the lock is free or to the tail qnode of 𝑅𝑀𝐸𝑄.  Processes append their qnodes to 𝑅𝑀𝐸𝑄 using the SAS instruction (doorway).  The process with the head qnode in 𝑅𝑀𝐸𝑄 is the lock-holder.  To release the lock, a process either sets 𝐿 to 𝑛𝑢𝑙𝑙 if it has no immediate successor in 𝑅𝑀𝐸𝑄, or flips the successor’s spin variable to 𝑛𝑢𝑙𝑙.
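The queue structure described on this slide can be sketched as a single-threaded model in which each method stands in for one atomic step. The class and method names (`QNode`, `RMEQ`, `append`, `holder`, `release`) are assumptions made for illustration, and modeling the release of a tail qnode as a compare-and-swap on the lock pointer is borrowed from the MCS lock rather than confirmed by the slide. Crash recovery and concurrent interleavings are deliberately left out.

```python
class QNode:
    def __init__(self, pid):
        self.pid = pid
        self.chk = 0        # checkpoint number
        self.ahead = None   # predecessor link; also the spin variable
        self.next = None    # successor link


class RMEQ:
    def __init__(self):
        self.L = None       # lock pointer: None when free, else the tail qnode

    def append(self, q):
        # Doorway: models SAS(L, q, q.ahead) as one indivisible step.
        q.ahead, self.L = self.L, q
        if q.ahead is not None:
            q.ahead.next = q          # link behind the predecessor

    def holder(self):
        # Walk from the tail towards the head; the head has ahead == None.
        node = self.L
        while node is not None and node.ahead is not None:
            node = node.ahead
        return node

    def release(self, q):
        if q.next is None:
            if self.L is q:           # models CAS(L, q, null)
                self.L = None
        else:
            q.next.ahead = None       # promote the successor
        q.next = None
```

Appending three qnodes and releasing them in order hands the lock from head to tail, and the lock pointer returns to `None` once the last qnode is released.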
  • 21. RMEQ [Figure: example 𝑅𝑀𝐸𝑄 showing Lock L and qnodes P1 (Chk: 2), P2, P3, P4 and P5 (Chk: 1 each), and P8 (Chk: 0/1/3). Index: head qnode; 𝑞𝑖 ∈ 𝑅𝑀𝐸𝑄; 𝑞𝑖 ∉ 𝑅𝑀𝐸𝑄; crashed qnode.]
  • 22. RGLock Algorithm  Overview:  All processes start from an initial state in the NCS.  In a failure-free passage, a process executes 𝑎𝑐𝑞𝑢𝑖𝑟𝑒_𝑙𝑜𝑐𝑘, the CS and 𝑟𝑒𝑙𝑒𝑎𝑠𝑒_𝑙𝑜𝑐𝑘 before returning to the NCS.  A process may take several steps in the NCS until its subsequent request for lock acquisition.  If a process crashes at any point of execution during a passage, it  reads the state of its qnode from NVRAM;  invokes the corresponding recovery procedure based on the 𝑐ℎ𝑘 value;  identifies the position of its qnode in 𝑅𝑀𝐸𝑄; and then  completes the crash-recoverable passage accordingly and returns to the NCS.
  • 23. RGLock Algorithm  atomic 𝒔𝒘𝒂𝒑_𝒂𝒏𝒅_𝒔𝒕𝒐𝒓𝒆 (SAS):  In one indivisible atomic step, a fetch_and_store is immediately followed by another store that writes the result of the fetch_and_store operation to a location in the invoking process’s non-volatile memory.  Ensures strict FCFS order in lock acquisitions.  Aids a process in recovery in identifying the position of its qnode in 𝑅𝑀𝐸𝑄.  Pseudo-code representation:
      function SAS (old_element: address, new_element: value, location: address)
        atomic {
          temp: val_type := *old_element
          *old_element := new_element
          *location := temp
        }
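The SAS pseudo-code above can be simulated in software to see why it leaves a recoverable record of queue order. The sketch below is an assumption-laden model, not a hardware implementation: a `threading.Lock` stands in for the atomicity of the instruction, a dictionary stands in for NVRAM, and the key names (`"L"`, `"q0.ahead"`, ...) and the `demo` helper are hypothetical.

```python
import threading

_atomic = threading.Lock()   # stands in for hardware atomicity (assumption)


def sas(memory, old_element, new_element, location):
    """Swap `new_element` into memory[old_element] and store the previous
    value at memory[location], all as one indivisible step."""
    with _atomic:
        temp = memory[old_element]
        memory[old_element] = new_element
        memory[location] = temp


def demo(n):
    # "INITIAL" marks an ahead field that has not yet been written by SAS.
    memory = {"L": None}
    memory.update({f"q{i}.ahead": "INITIAL" for i in range(n)})
    threads = [
        threading.Thread(target=sas, args=(memory, "L", i, f"q{i}.ahead"))
        for i in range(n)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return memory
```

Whatever order the threads run in, exactly one `ahead` field ends up `None` (the first arrival), every other `ahead` field names a distinct predecessor, and `L` names the last arrival. Because the predecessor is written in the same step as the swap, there is no window in which a process has entered the queue but holds no persistent evidence of its position.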
  • 25. RGLock Algorithm [Figure: flowchart of the RGLock algorithm spanning the NCS (𝑞𝑖 ∉ 𝑅𝑀𝐸𝑄), the entry protocol, the CS and the exit protocol (𝑞𝑖 ∈ 𝑅𝑀𝐸𝑄). Node labels include SAS(L, 𝑞𝑖, 𝑞𝑖.ahead), 𝑞𝑖.𝑎ℎ𝑒𝑎𝑑.𝑛𝑒𝑥𝑡 = 𝑞𝑖, 𝑞𝑖.𝑐ℎ𝑘 ≔ 2, the tests 𝑞𝑖.ahead = null and 𝑞𝑖.next ≠ null, a CAS, and 𝑞𝑖.𝑛𝑒𝑥𝑡.𝑎ℎ𝑒𝑎𝑑 ≔ 𝑛𝑢𝑙𝑙. The recovery procedures failureFree, recoverBlocked, recoverHead and recoverRelease are selected based on the 𝑞𝑖.𝑐ℎ𝑘 value. Index: normal step; crash step; crash-recovery step.]
  • 27. Crash-recovery procedures
   failureFree • Invoked if 𝑞𝑖.𝑐ℎ𝑘 = 0 immediately after a crash, or if recoverBlocked returns false.
   recoverBlocked • Invoked if 𝑞𝑖.𝑐ℎ𝑘 = 1 immediately after a crash. • Check if 𝑞𝑖 ∈ 𝑅𝑀𝐸𝑄. • If yes: busy-wait in waitForCS until 𝑞𝑖 is the head, proceed to the CS in recoverHead, and release the lock. • If no: return false and execute failureFree.
   recoverHead • Invoked if 𝑞𝑖.𝑐ℎ𝑘 = 2 immediately after a crash, or within recoverBlocked. • Execute the CS. • Release the lock.
   recoverRelease • Invoked if 𝑞𝑖.𝑐ℎ𝑘 = 3 immediately after a crash. • Check if 𝑞𝑖 ∈ 𝑅𝑀𝐸𝑄. • If yes: release the lock. • If no: reset 𝑞𝑖.𝑐ℎ𝑘 and return to the NCS.
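The selection of a recovery procedure from the checkpoint value can be written down as a small dispatch function. This is a rough sketch only: `recovery_plan` is a hypothetical name, the result of the queue membership check is passed in as a boolean instead of being computed, and the returned list is a trace of the actions named on the slide, not an implementation of them.

```python
from typing import List


def recovery_plan(chk: int, in_rmeq: bool) -> List[str]:
    """Return the sequence of actions a recovering process would take,
    given its persisted chk value and whether its qnode is in RMEQ."""
    if chk == 0:
        return ["failureFree"]
    if chk == 1:
        if in_rmeq:
            # Wait to become head, then continue as the lock-holder.
            return ["recoverBlocked", "waitForCS", "recoverHead",
                    "CS", "release lock"]
        # recoverBlocked returns false, so the passage starts afresh.
        return ["recoverBlocked", "failureFree"]
    if chk == 2:
        return ["recoverHead", "CS", "release lock"]
    if chk == 3:
        if in_rmeq:
            return ["recoverRelease", "release lock"]
        return ["recoverRelease", "reset chk", "NCS"]
    raise ValueError(f"unexpected chk value: {chk}")
```

For example, `recovery_plan(1, True)` describes a process that crashed while busy-waiting and is still queued, whereas `recovery_plan(3, False)` describes one that had already left the queue and only needs to reset its checkpoint.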
  • 28. Proof Sketch  The correctness of our algorithm is established by induction on the length of the execution history or by contradiction, where applicable.  We use a history variable 𝑄 which represents the sequence of process IDs whose qnodes are in 𝑅𝑀𝐸𝑄.  An invariant is established with respect to the state of 𝐿, 𝑄, 𝑅𝑀𝐸𝑄 and the 𝑎ℎ𝑒𝑎𝑑 and 𝑐ℎ𝑘 fields of a qnode.  We show that the elements of 𝑄 are the same as the qnodes in 𝑅𝑀𝐸𝑄 at the end of a finite history, in that order.  The head qnode of 𝑅𝑀𝐸𝑄 is the lock-holder, and since 𝑄 always has at most one head element, ME is guaranteed.  FCFS, SF, LF, and TE are proved by contradiction, using the invariant.  Since every procedure in the RGLock algorithm terminates in a finite number of steps, FR is guaranteed.  Finally, we show that the RGLock algorithm incurs 𝒪(1) RMRs per process per failure-free passage.
  • 29. Conclusion  Learnings (for me, that is):  Less is more.  For designing synchronization data structures.  Evolution of qnodes in 𝑅𝑀𝐸𝑄.  Asynchrony is a harsh mistress.  𝑓𝑖𝑛𝑑𝑀𝑒 accuracy.  𝑤𝑎𝑖𝑡𝐹𝑜𝑟𝐶𝑆 correctness.
  • 30. Conclusion  Known Limitations  Requires support for an unconventional hardware instruction (SAS).  𝑓𝑖𝑛𝑑𝑀𝑒 presets the number of processes in the system.  Further Research  Programmatic implementation of the algorithm.  Simplify the code for more rigorous analysis.  Bakery algorithm in the context of crash-recovery for NVMM.  Make provision for processes to be added to the system even after the algorithm is initialized.  Potential Impact  In-memory databases for ‘always-on’ applications and high-performance computing.
  • 31. Conclusion  Summary of Contributions:  Formal specification of the correctness properties of Recoverable Mutual Exclusion.  RGLock: a first-of-its-kind crash-recoverable mutual exclusion lock for NVMM systems.  The proposed doorway instruction could help guide the design of future NVMM architectures.  Distinguishing RGLock from earlier attempts at a crash-recoverable mutex:  RGLock satisfies all safety and liveness properties simultaneously in the presence of crash failures.  RGLock tolerates failures of any individual component, including a lock-holder, as well as system-wide crashes.  Compared to the MCS lock, RGLock does not inflate time complexity in failure-free execution.  A comprehensive proof of correctness for the RGLock algorithm.
  • 32. Let’s Talk! [Figure: the RGLock algorithm flowchart from slide 25, repeated.]