SlideShare a Scribd company logo
1 of 49
Download to read offline
Read-Copy-Update for
HelenOS
http://d3s.mff.cuni.cz

Martin Děcký
decky@d3s.mff.cuni.cz

CHARLES UNIVERSITY IN PRAGUE
faculty of mathematics and physics
faculty of mathematics and physics
Introduction
HelenOS
Microkernel multiserver operating system
Relying on asynchronous IPC mechanism
Major motivation for scalable concurrent algorithms and data
structures

Martin Děcký
Researcher in computer science (operating systems)
Not an expert on concurrent algorithms
But very lucky to be able to cooperate with hugely talented
people in this area

Martin Děcký, FOSDEM 2014, February 2nd 2014

Read-Copy-Update for HelenOS

2
In a Nutshell

Asynchronous IPC
=
Communicating parties may access the
communication facilities concurrently

Martin Děcký, FOSDEM 2014, February 2nd 2014

Read-Copy-Update for HelenOS

3
In a Nutshell

Asynchronous IPC
=
Communicating parties may access the
communication facilities concurrently
→ The state of the shared communication facilities needs to be protected by
explicit synchronization means

Martin Děcký, FOSDEM 2014, February 2nd 2014

Read-Copy-Update for HelenOS

4
In a Nutshell

Asynchronous IPC
=
Communicating parties have to access the
communication facilities concurrently

Martin Děcký, FOSDEM 2014, February 2nd 2014

Read-Copy-Update for HelenOS

5
In a Nutshell

Asynchronous IPC
=
Communicating parties have to access the
communication facilities concurrently
← In order to counterweight the overhead of the communication by doing
other useful work while waiting for a reply

Martin Děcký, FOSDEM 2014, February 2nd 2014

Read-Copy-Update for HelenOS

6
In a Nutshell

Asynchronous IPC
→
Communication facilities have to use
concurrency-friendly (non-blocking)
synchronization means

Martin Děcký, FOSDEM 2014, February 2nd 2014

Read-Copy-Update for HelenOS

7
In a Nutshell

Asynchronous IPC
→
Communication facilities have to use
concurrency-friendly (non-blocking)
synchronization means
← In order to avoid limiting the achievable degree of concurrency

Martin Děcký, FOSDEM 2014, February 2nd 2014

Read-Copy-Update for HelenOS

8
Basic Synchronization Taxonomy
For accessing shared data structures
Mutual exclusion synchronization
Temporal separation of scheduling entities
Typical means
Disabling preemption, Dekker's algorithm, direct use of atomic
test-and-set operations, etc.

Typical mechanisms
Locks, semaphores, condition variables, etc.
[+] Relatively intuitive semantics, well-known characteristics
[-] Overhead, restriction of concurrency, deadlocks

Martin Děcký, FOSDEM 2014, February 2nd 2014

Read-Copy-Update for HelenOS

9
Mutual Exclusion Synchronization

Martin Děcký, FOSDEM 2014, February 2nd 2014

Read-Copy-Update for HelenOS

10
Basic Synchronization Taxonomy
Non-blocking synchronization
Replace temporal separation by sophisticated means that
guarantee logical consistency
Typical means
Atomic writes, direct use of atomic read-modify-write operations, etc.

Typical mechanisms
Transactional memory, hazard pointers, Read-Copy-Update, etc.

[+] Reasonable (almost no) overhead and restriction of
concurrency in favorable cases, guarantee of progress
[-] Less intuitive semantics, sometimes non-trivial
characteristics, non-favorable cases, livelocks

Martin Děcký, FOSDEM 2014, February 2nd 2014

Read-Copy-Update for HelenOS

11
Non-blocking Synchronization

Martin Děcký, FOSDEM 2014, February 2nd 2014

Read-Copy-Update for HelenOS

12
Non-blocking Synchronization
Wait-freedom
Guaranteed system-wide progress and starvation-freedom
(all operations are finitely bounded)
Wait-freedom algorithms always exist [1], but the performance of general
methods is usually inferior to blocking algorithms
Wait-free queue by Kogan & Petrank [2]

Lock-freedom
Guaranteed system-wide progress, but individual threads can starve
Four phases: Data operation, assisting obstruction, aborting obstruction, waiting

Obstruction-freedom
Guaranteed single thread progress if isolated for a bounded time
(obstructing threads need to be suspended)

Martin Děcký, FOSDEM 2014, February 2nd 2014

Read-Copy-Update for HelenOS

13
From Means to Mechanism

Synchronization
means

Synchronization
mechanism

Individual instance of usage

Martin Děcký, FOSDEM 2014, February 2nd 2014

Generic reusable pattern

Read-Copy-Update for HelenOS

14
From Means to Mechanism

Synchronization
means

Synchronization
mechanism

Individual instance of usage
E.g. non-blocking list
implementation using
atomic pointer writes

Martin Děcký, FOSDEM 2014, February 2nd 2014

Generic reusable pattern
E.g. non-blocking list
implementation using
Read-Copy-Update

Read-Copy-Update for HelenOS

15
What Is Read-Copy-Update
Non-blocking synchronization mechanism
Targeting synchronization of read-mostly pointer-based data
structures with immutable values
Favorable case: R/W ratio of ~ 10:1 (but even 1:1 is achievable)
Unlimited number of readers without blocking
(not waiting for other readers or writers)
Little overhead on the reader side
(smaller than taking an uncontended lock)
Readers have to tolerate “stale” data and late updates
Readers have to observe “safe” access patterns
Synchronization among writers out of scope of the mechanism
Optional provisions for asynchronous reclamation

Martin Děcký, FOSDEM 2014, February 2nd 2014

Read-Copy-Update for HelenOS

16
What Is Read-Copy-Update (2)
Read-side critical section
Delimited by read_lock() and read_unlock() operations
(non-blocking)
Protected data can be referenced only inside the critical section

Safe access() methods for reading pointers
Avoiding unsafe compiler optimizations (reloading the pointer)
Not necessary for reading values

Quiescent state (a thread outside a critical section)
Grace period (all threads pass through a quiescent state)

Martin Děcký, FOSDEM 2014, February 2nd 2014

Read-Copy-Update for HelenOS

17
What Is Read-Copy-Update (3)
Synchronous write-side update
Atomically unlinking an old element
Calling a synchronize() operation
Blocks until a grace period elapses (all readers pass a
quiescent state, no longer referencing the unlinked data)
Possibility to reclaim or free the unlinked data

Inserting a new element using safe assign() operation
Avoiding unsafe compiler optimizations and store
reordering on weakly ordered architectures

Martin Děcký, FOSDEM 2014, February 2nd 2014

Read-Copy-Update for HelenOS

18
Synchronous Update Example
I.
head

next

v0

next

v1

Atomic pointer update to remove
the element with v0 from the list

Martin Děcký, FOSDEM 2014, February 2nd 2014

Read-Copy-Update for HelenOS

19
Synchronous Update Example
I.

II.

head

head

next

v0

next

v0

next

v1

next

v1

Blocking on synchronize()
During the grace period preexisting readers
can still access the “stale” element with v0

Martin Děcký, FOSDEM 2014, February 2nd 2014

Read-Copy-Update for HelenOS

20
Synchronous Update Example
I.

II.

head

III.

head

head

next

v0

next

v0

next

v2

next

v1

next

v1

next

v1

No reader can reference the element with v0
anymore – it can be reclaimed
New element with v2 can be atomically inserted

Martin Děcký, FOSDEM 2014, February 2nd 2014

Read-Copy-Update for HelenOS

21
What Is Read-Copy-Update (4)
Asynchronous write-side update
Using a call() operation
Non-blocking operation registering a callback
Callback is executed after a grace period elapses

Using a barrier() operation
Waiting for all queued asynchronous callbacks

Martin Děcký, FOSDEM 2014, February 2nd 2014

Read-Copy-Update for HelenOS

22
Grace Period Detection
Cornerstone of any RCU algorithm
Implicit trade-off between precision and overhead
Any extension of a grace period is also a grace period
Long (imprecise) grace periods
Blocking synchronous writers for a longer time
Increasing memory usage due to unreclaimed elements

Short (precise) grace periods

Increasing overhead on the reader side (need for memory barriers, atomic
operations, other heavy-weight operations, etc.)

Usual compromise

Identifying naturally occurring quiescent states for the given RCU
algorithm
Context switches, exceptions (timer ticks), etc.

Martin Děcký, FOSDEM 2014, February 2nd 2014

Read-Copy-Update for HelenOS

23
The Big Picture ...

Martin Děcký, FOSDEM 2014, February 2nd 2014

Read-Copy-Update for HelenOS

24
Motivation for RCU in HelenOS
Foundation for a scalable concurrent data
structure
Developing a microkernel-specific RCU
algorithm
Specific requirements, constraints and use cases
Last well-known RCU implementation for
a microkernel in 2003 (K42)

Martin Děcký, FOSDEM 2014, February 2nd 2014

Read-Copy-Update for HelenOS

25
Credits
AP-RCU
Non-intrusive, portable RCU algorithm
Developed and implemented by Andrej Podzimek for UTS
(OpenSolaris) [3] [4]

AH-RCU
Inspired by AP-RCU and several other RCU algorithms
Developed and implemented by Adam Hraška for SPARTAN
(HelenOS) [7]
Foundation for the Concurrent Hash Table in HelenOS [8]
Additional variants (preemptible AP-RCU, user space RCU)

Martin Děcký, FOSDEM 2014, February 2nd 2014

Read-Copy-Update for HelenOS

26
HelenOS requirements
The RCU algorithm must not impose design concepts of
legacy systems on HelenOS
E.g. a specific way how the timer interrupt handler is
implemented

The kernel space RCU algorithm must support
Read-side critical sections in interrupt and exception handlers
Asynchronous reclaimation (call()) in interrupt and exception
handlers
Read-side critical sections with preemption enabled
(not affecting scheduling latency)

Martin Děcký, FOSDEM 2014, February 2nd 2014

Read-Copy-Update for HelenOS

27
HelenOS requirements (2)
Concurrent Hash Table implementation
Growing and shrinking
Interrupt and non-maskable interrupt tolerant
Suitable for a global page hash table

Concurrent reads with low overhead
Concurrent inserts and deletes

Martin Děcký, FOSDEM 2014, February 2nd 2014

Read-Copy-Update for HelenOS

28
AH-RCU
Basic characteristics
Kernel space algorithm
Read-side critical sections are preemptible (without
loss of performance)
Multiple read-side critical sections within a time slice
Expensive operations when a thread was preempted do not
make much harm

Support for asynchronous reclaimation in interrupt
and exception handlers
No reliance on periodic timer

Martin Děcký, FOSDEM 2014, February 2nd 2014

Read-Copy-Update for HelenOS

29
AH-RCU (2)
Grace period detection
Test if all CPUs passed a quiescent state
Sending an interprocessor interrupt (IPI) to each CPU
If the interrupt handler detects a nesting count of 0, it issues
a memory barrier (representing a natural quiescent state)

Avoid sending IPI if context switch is detected

Detect any preempted readers holding up the
current grace period
Sleep and wait for the last preempted reader holding
up the grace period to wake the detector thread

Martin Děcký, FOSDEM 2014, February 2nd 2014

Read-Copy-Update for HelenOS

30
AH-RCU (3)
Advantages
Low overhead and preemptible read-side critical
section, suitable for exception handlers
No regular sampling

Disadvantages
Polling CPUs using interprocessor interrupts might
be disruptive in large systems

Martin Děcký, FOSDEM 2014, February 2nd 2014

Read-Copy-Update for HelenOS

31
HelenOS Concurrent Hash Table
Basic characteristics
Inspired by Triplett's relativistic hash table [5] and Michael's lockfree lists [6]
Hash collisions resolved using separate RCU-protected bucket lists
Buckets organized as lock-free lists without hazard pointers
RCU still protects against accessing invalid pointers and the ABA problem

Concurrent lookups and concurrent modifications

Tolerance for nested concurrent modifications from interrupt and exception handlers

Growing and shrinking using background resizing by a factor of 2
Concurrent with lookups and updates
Requires four grace periods

Deferred element freeing using RCU call()

Martin Děcký, FOSDEM 2014, February 2nd 2014

Read-Copy-Update for HelenOS

32
Enough talk!

Martin Děcký, FOSDEM 2014, February 2nd 2014

Read-Copy-Update for HelenOS

33
Enough talk!
Show me the (pseudo)code!

Martin Děcký, FOSDEM 2014, February 2nd 2014

Read-Copy-Update for HelenOS

34
AP-RCU Reader Side
read_lock():
disable_preemption()
check_qs()
cpu.nesting_cnt++
read_unlock():
cpu.nesting_cnt-check_qs()
enable_preemption()

Martin Děcký, FOSDEM 2014, February 2nd 2014

check_qs():
if (cpu.nesting_cnt == 0) {
if (cpu.last_seen_gp != cur_gp) {
gp = cur_gp
memory_barrier()
cpu.last_seen_gp = gp
}
}

Read-Copy-Update for HelenOS

35
AP-RCU Reader Side
read_lock():
disable_preemption()
check_qs()
cpu.nesting_cnt++

check_qs():
if (cpu.nesting_cnt == 0) {
if (cpu.last_seen_gp != cur_gp) {
gp = cur_gp
memory_barrier()
cpu.last_seen_gp = gp
}
}

read_unlock():
cpu.nesting_cnt-check_qs()
The first reader to notice the start of new
enable_preemption() each CPU announces its quiescentastate. grace period
on
Once all CPUs announce a quiescent state or perform
a context switch (a naturally occurring quiescent state
due to disabled preemption), the grace period ends.

Note: Writer forces a context switch on CPUs where no read-side critical section was
not observed for a while.
Note: Except memory_barrier() only inexpensive operations.
Martin Děcký, FOSDEM 2014, February 2nd 2014

Read-Copy-Update for HelenOS

36
Preemptible AP-RCU Reader Side
read_lock():
disable_preemption()
if (thread.nesting_cnt == 0)
record_qs()
thread.nesting_cnt++
enable_preemption()

record_qs():
if (cpu.last_seen_gp != cur_gp) {
gp = cur_gp
memory_barrier()
cpu_last_seen_gp = gp
}

read_unlock():
disable_preemption()
if (thread.nesting_cnt-- == 0) {
record_qs()
if ((thread.was_preempted) ||
(cpu.is_delaying_gp))
signal_unlock()
}
enable_preemption()

signal_unlock():
if (atomic_exchange(cpu.is_delaying_gp, false)
== true)
remaining_readers_semaphore.up()
if (atomic_exchange(thread.was_preempted, false)
== true) {
preempt_mutex.lock()
preempted_list.remove(thread)
if ((is_empty(cpu.cur_preempted)) &&
(preempted_blocking_gp))
remaining_readers_semaphore.up()
preempt_mutex.unlock()
}

Martin Děcký, FOSDEM 2014, February 2nd 2014

Read-Copy-Update for HelenOS

37
AH-RCU Reader Side
read_lock():
thread.nesting_cnt++
compiler_barrier()
read_unlock():
compiler_barrier()
thread.nesting_cnt-if (thread.nesting_cnt == was_preempted)
preempted_unlock()
preempted_unlock():
// avoid race between thread and interrupt handler
if (atomic_exchange(thread.nesting_cnt, 0) == was_preempted) {
preempt_lock.lock()
preempted_list.remove(thread)
if ((is_empty(cpu.cur_preempted)) && (detection_waiting))
detection_semaphore.up()
// notify the detector thread about the grace period
preempt_lock.unlock()
}

Note: Except preempted_unlock() only inexpensive operations.
Martin Děcký, FOSDEM 2014, February 2nd 2014

Read-Copy-Update for HelenOS

38
AP-RCU Writer Side
synchronize():
memory_barrier()
mutex.lock()
cur_gp++
// start a new grace period
reader_cpus = []
// gather CPUs potentially in read-side CS
foreach cpu in cpus {
if ((!cpu.idle) && (cpu.last_seen_gp != cur_gp)) {
cpu.last_ctx_switch_cnt = cpu.ctx_switch_cnt
reader_cpus += cpu
}
}
wait(10ms)

// longest acceptable grace period duration (tunable)

foreach cpu in reader_cpus {
// enforce a quiescent state
if ((!cpu.idle) && (cpu.last_seen_gp != cur_gp) &&
(cpu.last_ctx_switch_cnt == cpu.ctx_switch_cnt))
cpu.ctx_switch_force_wait()
}
mutex.unlock()
Martin Děcký, FOSDEM 2014, February 2nd 2014

Read-Copy-Update for HelenOS

39
AH-RCU Writer Side
detector_thread:
forever {
wait_for_callbacks()
// run callbacks added before the current grace period
execute_callbacks()
// push callbacks registered since last processing to the queue
advance_callbacks()

}

wait_for_gp_end()

Martin Děcký, FOSDEM 2014, February 2nd 2014

Read-Copy-Update for HelenOS

40
AH-RCU Writer Side (2)
wait_for_gp_end():
gp_mutex.lock()
if (completed_gp != cur_gp) {
// a grace period is already in progress
wait_for_gp_end_signal()
goto out
} else {
// start a new grace period
preempt_lock.lock()
cur_gp++
preempt_lock.unlock()
}
gp_mutex.unlock()
wait_for_readers()
gp_mutex.lock()
completed_gp = cur_gp
out:

gp_mutex.unlock()

Martin Děcký, FOSDEM 2014, February 2nd 2014

Read-Copy-Update for HelenOS

41
Evaluation
Read-side critical section scalability: Traversal of a five-element list
The list is protected as a whole, it is only read, never modified.
1.2e+09

ideal
ah-rcu
pap-rcu
spinlock

List traversals / second

1e+09

8e+08

6e+08

4e+08

2e+08

0
0

1

2

3

4

5

Threads
Martin Děcký, FOSDEM 2014, February 2nd 2014

Read-Copy-Update for HelenOS

42
Evaluation (2)
Write-side overhead: Different ratios of updates vs. lookups
Five-element list, four threads running in parallel. Updates
are always synchronized by a spinlock.
5e+08

ah-rcu + spinlock
pap-rcu + spinlock
spinlock

Operations / second

4.5e+08
4e+08
3.5e+08
3e+08
2.5e+08
2e+08
1.5e+08
1e+08
5e+07
0
0

10

20

40

60

80

100

% of updates
Martin Děcký, FOSDEM 2014, February 2nd 2014

Read-Copy-Update for HelenOS

43
Evaluation (3)
Read-side scalability vs. write-side overhead: Crossover point
Data points from previous figure with low fraction of updates are
discarded.
6e+07

ah-rcu + spinlock
pap-rcu + spinlock
spinlock

Operations / second

5e+07

4e+07

3e+07

2e+07

1e+07

0
5 10

20

40

60

80

100

% of updates
Martin Děcký, FOSDEM 2014, February 2nd 2014

Read-Copy-Update for HelenOS

44
Evaluation (4)
Concurrent hash table lookup scalability
128 buckets, average load factor of 4 elements per bucket, 50 % of lookups
for hitting keys, 50 % of lookups for missing keys (each thread used a
separate list). The resize condition was checked, but never executed.
7e+07

cht + ah-rcu
ht + bucket spinlocks
ht + global spinlock

Lookups / second

6e+07
5e+07
4e+07
3e+07
2e+07
1e+07
0
0

1

2

3

4

5

Threads
Martin Děcký, FOSDEM 2014, February 2nd 2014

Read-Copy-Update for HelenOS

45
Evaluation (5)
Concurrent hash table update overhead: Different ratios of concurrent
updates vs. lookups
Four threads running in parallel.
7e+07

cht + ah-rcu
ht + bucket spinlocks
ht + global spinlock

Operations / second

6e+07
5e+07
4e+07
3e+07
2e+07
1e+07
0
0

10

20

30

40

50

60

70

80

% of updates
Martin Děcký, FOSDEM 2014, February 2nd 2014

Read-Copy-Update for HelenOS

46
Conclusion
Novel scalable algorithms
Preemptible AP-RCU for HelenOS
Preemptible AH-RCU for HelenOS
Resizeable Concurrent Hash Table for HelenOS
Suitable as a basic data structure for asynchronous
HelenOS IPC
Suitable for other kernel uses (e.g. global page table)

Thorough evaluation
Promising behavior

Martin Děcký, FOSDEM 2014, February 2nd 2014

Read-Copy-Update for HelenOS

47
Q&A

www.helenos.org

Martin Děcký, FOSDEM 2014, February 2nd 2014

Read-Copy-Update for HelenOS

48
References
[1] Herlihy M. P.: Impossibility and universality results for wait-free synchronization, in Proceedings of 7th
Annual ACM Symposium on Principles of Distributed Computing, ACM, 1988
[2] Kogan A., Petrank E.: Wait-free queues with multiple enqueuers and dequeuers, in Proceedings of 16th
ACM Symposium on Principles and Practice of Parallel Programming, ACM, 2011
[3] Podzimek A., Děcký M., Bulej L., Tůma P.: A Non-Intrusive Read-Copy-Update for UTS, in Proceedings of
18th IEEE International Conference on Parallel and Distributed Systems, IEEE, 2012,
http://d3s.mff.cuni.cz/publications/download/PodzimekDeckyBulejTuma-ICPADS-2012.pdf
[4] http://d3s.mff.cuni.cz/software/rcu/rcu.patch
[5] Triplett J., McKenney P. E., Walpole J.: Resizable, scalable, concurrent hash tables via relativistic
programming, in Proceedings of the 2011 USENIX Annual Technical Conference, ACM, 2011
[6] Michael M. M.: High performance dynamic lock-free hash tables and list-based sets, in Proceedings of
14th Annual ACM Symposium on Parallel Algorithms and Architectures, ACM, 2002
[7] Hraška A.: Read-Copy-Update for HelenOS, master thesis, Charles University in Prague, 2013,
http://www.helenos.org/doc/theses/ah-thesis.pdf
[8] https://code.launchpad.net/~adam-hraska+lp/helenos/cht-bench

Martin Děcký, FOSDEM 2014, February 2nd 2014

Read-Copy-Update for HelenOS

49

More Related Content

Similar to FOSDEM 2014: Read-Copy-Update for HelenOS

Tutorial on Parallel Computing and Message Passing Model - C1
Tutorial on Parallel Computing and Message Passing Model - C1Tutorial on Parallel Computing and Message Passing Model - C1
Tutorial on Parallel Computing and Message Passing Model - C1Marcirio Chaves
 
Setting up a private cloud for academic environment with OSS by Zoran Pantic ...
Setting up a private cloud for academic environment with OSS by Zoran Pantic ...Setting up a private cloud for academic environment with OSS by Zoran Pantic ...
Setting up a private cloud for academic environment with OSS by Zoran Pantic ...José Ferreiro
 
Hardware/Software Co-Design for Efficient Microkernel Execution
Hardware/Software Co-Design for Efficient Microkernel ExecutionHardware/Software Co-Design for Efficient Microkernel Execution
Hardware/Software Co-Design for Efficient Microkernel ExecutionMartin Děcký
 
FOSDEM 2013: Operating Systems Hot Topics
FOSDEM 2013: Operating Systems Hot TopicsFOSDEM 2013: Operating Systems Hot Topics
FOSDEM 2013: Operating Systems Hot TopicsMartin Děcký
 
Cluster Setup Manual Using Ubuntu and MPICH
Cluster Setup Manual Using Ubuntu and MPICHCluster Setup Manual Using Ubuntu and MPICH
Cluster Setup Manual Using Ubuntu and MPICHMisu Md Rakib Hossain
 
Lessons Learned from Porting HelenOS to RISC-V
Lessons Learned from Porting HelenOS to RISC-VLessons Learned from Porting HelenOS to RISC-V
Lessons Learned from Porting HelenOS to RISC-VMartin Děcký
 
Distributed operating system
Distributed operating systemDistributed operating system
Distributed operating systemMoeez Ahmad
 
Integrating Fedora with DuraCloud 1-11-12
Integrating Fedora with DuraCloud 1-11-12Integrating Fedora with DuraCloud 1-11-12
Integrating Fedora with DuraCloud 1-11-12DuraSpace
 
Cost-effective software reliability through autonomic tuning of system resources
Cost-effective software reliability through autonomic tuning of system resourcesCost-effective software reliability through autonomic tuning of system resources
Cost-effective software reliability through autonomic tuning of system resourcesVincenzo De Florio
 
documents.pub_replication-consistency.ppt
documents.pub_replication-consistency.pptdocuments.pub_replication-consistency.ppt
documents.pub_replication-consistency.pptwondefirawwolile
 
Ppt project process migration
Ppt project process migrationPpt project process migration
Ppt project process migrationjaya380
 
Pacemaker+DRBD
Pacemaker+DRBDPacemaker+DRBD
Pacemaker+DRBDDan Frincu
 
CRIU: are we there yet?
CRIU: are we there yet?CRIU: are we there yet?
CRIU: are we there yet?OpenVZ
 
Coordination-aware Elasticity
Coordination-aware ElasticityCoordination-aware Elasticity
Coordination-aware ElasticityHong-Linh Truong
 
Case study operating systems
Case study operating systemsCase study operating systems
Case study operating systemsAkhil Bevara
 

Similar to FOSDEM 2014: Read-Copy-Update for HelenOS (20)

Tutorial on Parallel Computing and Message Passing Model - C1
Tutorial on Parallel Computing and Message Passing Model - C1Tutorial on Parallel Computing and Message Passing Model - C1
Tutorial on Parallel Computing and Message Passing Model - C1
 
Setting up a private cloud for academic environment with OSS by Zoran Pantic ...
Setting up a private cloud for academic environment with OSS by Zoran Pantic ...Setting up a private cloud for academic environment with OSS by Zoran Pantic ...
Setting up a private cloud for academic environment with OSS by Zoran Pantic ...
 
Hardware/Software Co-Design for Efficient Microkernel Execution
Hardware/Software Co-Design for Efficient Microkernel ExecutionHardware/Software Co-Design for Efficient Microkernel Execution
Hardware/Software Co-Design for Efficient Microkernel Execution
 
FOSDEM 2013: Operating Systems Hot Topics
FOSDEM 2013: Operating Systems Hot TopicsFOSDEM 2013: Operating Systems Hot Topics
FOSDEM 2013: Operating Systems Hot Topics
 
Cluster Setup Manual Using Ubuntu and MPICH
Cluster Setup Manual Using Ubuntu and MPICHCluster Setup Manual Using Ubuntu and MPICH
Cluster Setup Manual Using Ubuntu and MPICH
 
Chapter 22 - Windows XP
Chapter 22 - Windows XPChapter 22 - Windows XP
Chapter 22 - Windows XP
 
Lessons Learned from Porting HelenOS to RISC-V
Lessons Learned from Porting HelenOS to RISC-VLessons Learned from Porting HelenOS to RISC-V
Lessons Learned from Porting HelenOS to RISC-V
 
Distributed operating system
Distributed operating systemDistributed operating system
Distributed operating system
 
Integrating Fedora with DuraCloud 1-11-12
Integrating Fedora with DuraCloud 1-11-12Integrating Fedora with DuraCloud 1-11-12
Integrating Fedora with DuraCloud 1-11-12
 
Cost-effective software reliability through autonomic tuning of system resources
Cost-effective software reliability through autonomic tuning of system resourcesCost-effective software reliability through autonomic tuning of system resources
Cost-effective software reliability through autonomic tuning of system resources
 
documents.pub_replication-consistency.ppt
documents.pub_replication-consistency.pptdocuments.pub_replication-consistency.ppt
documents.pub_replication-consistency.ppt
 
Ppt project process migration
Ppt project process migrationPpt project process migration
Ppt project process migration
 
Lec13 multidevice
Lec13 multideviceLec13 multidevice
Lec13 multidevice
 
Pacemaker+DRBD
Pacemaker+DRBDPacemaker+DRBD
Pacemaker+DRBD
 
CRIU: are we there yet?
CRIU: are we there yet?CRIU: are we there yet?
CRIU: are we there yet?
 
Coordination-aware Elasticity
Coordination-aware ElasticityCoordination-aware Elasticity
Coordination-aware Elasticity
 
Case study operating systems
Case study operating systemsCase study operating systems
Case study operating systems
 
Case study windows
Case study windowsCase study windows
Case study windows
 
Examples in OS synchronization for UG
Examples in OS synchronization for UG Examples in OS synchronization for UG
Examples in OS synchronization for UG
 
Windows 2000
Windows 2000Windows 2000
Windows 2000
 

More from Martin Děcký

Microkernels in the Era of Data-Centric Computing
Microkernels in the Era of Data-Centric ComputingMicrokernels in the Era of Data-Centric Computing
Microkernels in the Era of Data-Centric ComputingMartin Děcký
 
Formal Verification of Functional Code
Formal Verification of Functional CodeFormal Verification of Functional Code
Formal Verification of Functional CodeMartin Děcký
 
IPC in Microkernel Systems, Capabilities
IPC in Microkernel Systems, CapabilitiesIPC in Microkernel Systems, Capabilities
IPC in Microkernel Systems, CapabilitiesMartin Děcký
 
Unikernels, Multikernels, Virtual Machine-based Kernels
Unikernels, Multikernels, Virtual Machine-based KernelsUnikernels, Multikernels, Virtual Machine-based Kernels
Unikernels, Multikernels, Virtual Machine-based KernelsMartin Děcký
 
Code Instrumentation, Dynamic Tracing
Code Instrumentation, Dynamic TracingCode Instrumentation, Dynamic Tracing
Code Instrumentation, Dynamic TracingMartin Děcký
 
Nízkoúrovňové programování
Nízkoúrovňové programováníNízkoúrovňové programování
Nízkoúrovňové programováníMartin Děcký
 
Porting HelenOS to RISC-V
Porting HelenOS to RISC-VPorting HelenOS to RISC-V
Porting HelenOS to RISC-VMartin Děcký
 
What Could Microkernels Learn from Monolithic Kernels (and Vice Versa)
What Could Microkernels Learn from Monolithic Kernels (and Vice Versa)What Could Microkernels Learn from Monolithic Kernels (and Vice Versa)
What Could Microkernels Learn from Monolithic Kernels (and Vice Versa)Martin Děcký
 
HelenOS: State of the Union 2012
HelenOS: State of the Union 2012HelenOS: State of the Union 2012
HelenOS: State of the Union 2012Martin Děcký
 

More from Martin Děcký (9)

Microkernels in the Era of Data-Centric Computing
Microkernels in the Era of Data-Centric ComputingMicrokernels in the Era of Data-Centric Computing
Microkernels in the Era of Data-Centric Computing
 
Formal Verification of Functional Code
Formal Verification of Functional CodeFormal Verification of Functional Code
Formal Verification of Functional Code
 
IPC in Microkernel Systems, Capabilities
IPC in Microkernel Systems, CapabilitiesIPC in Microkernel Systems, Capabilities
IPC in Microkernel Systems, Capabilities
 
Unikernels, Multikernels, Virtual Machine-based Kernels
Unikernels, Multikernels, Virtual Machine-based KernelsUnikernels, Multikernels, Virtual Machine-based Kernels
Unikernels, Multikernels, Virtual Machine-based Kernels
 
Code Instrumentation, Dynamic Tracing
Code Instrumentation, Dynamic TracingCode Instrumentation, Dynamic Tracing
Code Instrumentation, Dynamic Tracing
 
Nízkoúrovňové programování
Nízkoúrovňové programováníNízkoúrovňové programování
Nízkoúrovňové programování
 
Porting HelenOS to RISC-V
Porting HelenOS to RISC-VPorting HelenOS to RISC-V
Porting HelenOS to RISC-V
 
What Could Microkernels Learn from Monolithic Kernels (and Vice Versa)
What Could Microkernels Learn from Monolithic Kernels (and Vice Versa)What Could Microkernels Learn from Monolithic Kernels (and Vice Versa)
What Could Microkernels Learn from Monolithic Kernels (and Vice Versa)
 
HelenOS: State of the Union 2012
HelenOS: State of the Union 2012HelenOS: State of the Union 2012
HelenOS: State of the Union 2012
 

Recently uploaded

Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...soniya singh
 
Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...aditisharan08
 
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfkalichargn70th171
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...OnePlan Solutions
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVshikhaohhpro
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...gurkirankumar98700
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...ICS
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfkalichargn70th171
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...Christina Lin
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataBradBedford3
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantAxelRicardoTrocheRiq
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software DevelopersVinodh Ram
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...stazi3110
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providermohitmore19
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxComplianceQuest1
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)OPEN KNOWLEDGE GmbH
 
Introduction to Decentralized Applications (dApps)
Introduction to Decentralized Applications (dApps)Introduction to Decentralized Applications (dApps)
Introduction to Decentralized Applications (dApps)Intelisync
 

Recently uploaded (20)

Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
 
Exploring iOS App Development: Simplifying the Process
Exploring iOS App Development: Simplifying the ProcessExploring iOS App Development: Simplifying the Process
Exploring iOS App Development: Simplifying the Process
 
Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...
 
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service Consultant
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software Developers
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)
 
Introduction to Decentralized Applications (dApps)
Introduction to Decentralized Applications (dApps)Introduction to Decentralized Applications (dApps)
Introduction to Decentralized Applications (dApps)
 

FOSDEM 2014: Read-Copy-Update for HelenOS

  • 1. Read-Copy-Update for HelenOS http://d3s.mff.cuni.cz Martin Děcký decky@d3s.mff.cuni.cz CHARLES UNIVERSITY IN PRAGUE faculty of mathematics and physics faculty of mathematics and physics
  • 2. Introduction HelenOS Microkernel multiserver operating system Relying on asynchronous IPC mechanism Major motivation for scalable concurrent algorithms and data structures Martin Děcký Researcher in computer science (operating systems) Not an expert on concurrent algorithms But very lucky to be able to cooperate with hugely talented people in this area Martin Děcký, FOSDEM 2014, February 2nd 2014 Read-Copy-Update for HelenOS 2
  • 3. In a Nutshell Asynchronous IPC = Communicating parties may access the communication facilities concurrently Martin Děcký, FOSDEM 2014, February 2nd 2014 Read-Copy-Update for HelenOS 3
  • 4. In a Nutshell Asynchronous IPC = Communicating parties may access the communication facilities concurrently → The state of the shared communication facilities needs to be protected by explicit synchronization means Martin Děcký, FOSDEM 2014, February 2nd 2014 Read-Copy-Update for HelenOS 4
  • 5. In a Nutshell Asynchronous IPC = Communicating parties have to access the communication facilities concurrently Martin Děcký, FOSDEM 2014, February 2nd 2014 Read-Copy-Update for HelenOS 5
  • 6. In a Nutshell Asynchronous IPC = Communicating parties have to access the communication facilities concurrently ← In order to counterweight the overhead of the communication by doing other useful work while waiting for a reply Martin Děcký, FOSDEM 2014, February 2nd 2014 Read-Copy-Update for HelenOS 6
  • 7. In a Nutshell Asynchronous IPC → Communication facilities have to use concurrency-friendly (non-blocking) synchronization means Martin Děcký, FOSDEM 2014, February 2nd 2014 Read-Copy-Update for HelenOS 7
  • 8. In a Nutshell Asynchronous IPC → Communication facilities have to use concurrency-friendly (non-blocking) synchronization means ← In order to avoid limiting the achievable degree of concurrency Martin Děcký, FOSDEM 2014, February 2nd 2014 Read-Copy-Update for HelenOS 8
  • 9. Basic Synchronization Taxonomy For accessing shared data structures Mutual exclusion synchronization Temporal separation of scheduling entities Typical means Disabling preemption, Dekker's algorithm, direct use of atomic test-and-set operations, etc. Typical mechanisms Locks, semaphores, condition variables, etc. [+] Relatively intuitive semantics, well-known characteristics [-] Overhead, restriction of concurrency, deadlocks Martin Děcký, FOSDEM 2014, February 2nd 2014 Read-Copy-Update for HelenOS 9
  • 10. Mutual Exclusion Synchronization Martin Děcký, FOSDEM 2014, February 2nd 2014 Read-Copy-Update for HelenOS 10
  • 11. Basic Synchronization Taxonomy Non-blocking synchronization Replace temporal separation by sophisticated means that guarantee logical consistency Typical means Atomic writes, direct use of atomic read-modify-write operations, etc. Typical mechanisms Transactional memory, hazard pointers, Read-Copy-Update, etc. [+] Reasonable (almost no) overhead and restriction of concurrency in favorable cases, guarantee of progress [-] Less intuitive semantics, sometimes non-trivial characteristics, non-favorable cases, livelocks Martin Děcký, FOSDEM 2014, February 2nd 2014 Read-Copy-Update for HelenOS 11
  • 12. Non-blocking Synchronization Martin Děcký, FOSDEM 2014, February 2nd 2014 Read-Copy-Update for HelenOS 12
  • 13. Non-blocking Synchronization Wait-freedom Guaranteed system-wide progress and starvation-freedom (all operations are finitely bounded) Wait-freedom algorithms always exist [1], but the performance of general methods is usually inferior to blocking algorithms Wait-free queue by Kogan & Petrank [2] Lock-freedom Guaranteed system-wide progress, but individual threads can starve Four phases: Data operation, assisting obstruction, aborting obstruction, waiting Obstruction-freedom Guaranteed single thread progress if isolated for a bounded time (obstructing threads need to be suspended) Martin Děcký, FOSDEM 2014, February 2nd 2014 Read-Copy-Update for HelenOS 13
  • 14. From Means to Mechanism Synchronization means Synchronization mechanism Individual instance of usage Martin Děcký, FOSDEM 2014, February 2nd 2014 Generic reusable pattern Read-Copy-Update for HelenOS 14
  • 15. From Means to Mechanism Synchronization means Synchronization mechanism Individual instance of usage E.g. non-blocking list implementation using atomic pointer writes Martin Děcký, FOSDEM 2014, February 2nd 2014 Generic reusable pattern E.g. non-blocking list implementation using Read-Copy-Update Read-Copy-Update for HelenOS 15
  • 16. What Is Read-Copy-Update Non-blocking synchronization mechanism Targeting synchronization of read-mostly pointer-based data structures with immutable values Favorable case: R/W ratio of ~ 10:1 (but even 1:1 is achievable) Unlimited number of readers without blocking (not waiting for other readers or writers) Little overhead on the reader side (smaller than taking an uncontended lock) Readers have to tolerate “stale” data and late updates Readers have to observe “safe” access patterns Synchronization among writers out of scope of the mechanism Optional provisions for asynchronous reclamation Martin Děcký, FOSDEM 2014, February 2nd 2014 Read-Copy-Update for HelenOS 16
  • 17. What Is Read-Copy-Update (2) Read-side critical section Delimited by read_lock() and read_unlock() operations (non-blocking) Protected data can be referenced only inside the critical section Safe access() methods for reading pointers Avoiding unsafe compiler optimizations (reloading the pointer) Not necessary for reading values Quiescent state (a thread outside a critical section) Grace period (all threads pass through a quiescent state) Martin Děcký, FOSDEM 2014, February 2nd 2014 Read-Copy-Update for HelenOS 17
  • 18. What Is Read-Copy-Update (3) Synchronous write-side update Atomically unlinking an old element Calling a synchronize() operation Blocks until a grace period elapses (all readers pass a quiescent state, no longer referencing the unlinked data) Possibility to reclaim or free the unlinked data Inserting a new element using safe assign() operation Avoiding unsafe compiler optimizations and store reordering on weakly ordered architectures Martin Děcký, FOSDEM 2014, February 2nd 2014 Read-Copy-Update for HelenOS 18
  • 19. Synchronous Update Example I. head next v0 next v1 Atomic pointer update to remove the element with v0 from the list Martin Děcký, FOSDEM 2014, February 2nd 2014 Read-Copy-Update for HelenOS 19
  • 20. Synchronous Update Example I. II. head head next v0 next v0 next v1 next v1 Blocking on synchronize() During the grace period preexisting readers can still access the “stale” element with v0 Martin Děcký, FOSDEM 2014, February 2nd 2014 Read-Copy-Update for HelenOS 20
  • 21. Synchronous Update Example I. II. head III. head head next v0 next v0 next v2 next v1 next v1 next v1 No reader can reference the element with v0 anymore – it can be reclaimed New element with v2 can be atomically inserted Martin Děcký, FOSDEM 2014, February 2nd 2014 Read-Copy-Update for HelenOS 21
  • 22. What Is Read-Copy-Update (4) Asynchronous write-side update Using a call() operation Non-blocking operation registering a callback Callback is executed after a grace period elapses Using a barrier() operation Waiting for all queued asynchronous callbacks Martin Děcký, FOSDEM 2014, February 2nd 2014 Read-Copy-Update for HelenOS 22
  • 23. Grace Period Detection Cornerstone of any RCU algorithm Implicit trade-off between precision and overhead Any extension of a grace period is also a grace period Long (imprecise) grace periods Blocking synchronous writers for a longer time Increasing memory usage due to unreclaimed elements Short (precise) grace periods Increasing overhead on the reader side (need for memory barriers, atomic operations, other heavy-weight operations, etc.) Usual compromise Identifying naturally occurring quiescent states for the given RCU algorithm Context switches, exceptions (timer ticks), etc. Martin Děcký, FOSDEM 2014, February 2nd 2014 Read-Copy-Update for HelenOS 23
  • 24. The Big Picture ... Martin Děcký, FOSDEM 2014, February 2nd 2014 Read-Copy-Update for HelenOS 24
  • 25. Motivation for RCU in HelenOS Foundation for a scalable concurrent data structure Developing a microkernel-specific RCU algorithm Specific requirements, constraints and use cases Last well-known RCU implementation for a microkernel in 2003 (K42) Martin Děcký, FOSDEM 2014, February 2nd 2014 Read-Copy-Update for HelenOS 25
  • 26. Credits AP-RCU Non-intrusive, portable RCU algorithm Developed and implemented by Andrej Podzimek for UTS (OpenSolaris) [3] [4] AH-RCU Inspired by AP-RCU and several other RCU algorithms Developed and implemented by Adam Hraška for SPARTAN (HelenOS) [7] Foundation for the Concurrent Hash Table in HelenOS [8] Additional variants (preemptible AP-RCU, user space RCU) Martin Děcký, FOSDEM 2014, February 2nd 2014 Read-Copy-Update for HelenOS 26
  • 27. HelenOS requirements The RCU algorithm must not impose design concepts of legacy systems on HelenOS E.g. a specific way how the timer interrupt handler is implemented The kernel space RCU algorithm must support Read-side critical sections in interrupt and exception handlers Asynchronous reclaimation (call()) in interrupt and exception handlers Read-side critical sections with preemption enabled (not affecting scheduling latency) Martin Děcký, FOSDEM 2014, February 2nd 2014 Read-Copy-Update for HelenOS 27
  • 28. HelenOS requirements (2) Concurrent Hash Table implementation Growing and shrinking Interrupt and non-maskable interrupt tolerant Suitable for a global page hash table Concurrent reads with low overhead Concurrent inserts and deletes Martin Děcký, FOSDEM 2014, February 2nd 2014 Read-Copy-Update for HelenOS 28
  • 29. AH-RCU Basic characteristics Kernel space algorithm Read-side critical sections are preemptible (without loss of performance) Multiple read-side critical sections within a time slice Expensive operations when a thread was preempted do not make much harm Support for asynchronous reclaimation in interrupt and exception handlers No reliance on periodic timer Martin Děcký, FOSDEM 2014, February 2nd 2014 Read-Copy-Update for HelenOS 29
  • 30. AH-RCU (2) Grace period detection Test if all CPUs passed a quiescent state Sending an interprocessor interrupt (IPI) to each CPU If the interrupt handler detects a nesting count of 0, it issues a memory barrier (representing a natural quiescent state) Avoid sending IPI if context switch is detected Detect any preempted readers holding up the current grace period Sleep and wait for the last preempted reader holding up the grace period to wake the detector thread Martin Děcký, FOSDEM 2014, February 2nd 2014 Read-Copy-Update for HelenOS 30
  • 31. AH-RCU (3) Advantages Low overhead and preemptible read-side critical section, suitable for exception handlers No regular sampling Disadvantages Polling CPUs using interprocessor interrupts might be disruptive in large systems Martin Děcký, FOSDEM 2014, February 2nd 2014 Read-Copy-Update for HelenOS 31
  • 32. HelenOS Concurrent Hash Table Basic characteristics Inspired by Triplett's relativistic hash table [5] and Michael's lockfree lists [6] Hash collisions resolved using separate RCU-protected bucket lists Buckets organized as lock-free lists without hazard pointers RCU still protects against accessing invalid pointers and the ABA problem Concurrent lookups and concurrent modifications Tolerance for nested concurrent modifications from interrupt and exception handlers Growing and shrinking using background resizing by a factor of 2 Concurrent with lookups and updates Requires four grace periods Deferred element freeing using RCU call() Martin Děcký, FOSDEM 2014, February 2nd 2014 Read-Copy-Update for HelenOS 32
  • 33. Enough talk! Martin Děcký, FOSDEM 2014, February 2nd 2014 Read-Copy-Update for HelenOS 33
  • 34. Enough talk! Show me the (pseudo)code! Martin Děcký, FOSDEM 2014, February 2nd 2014 Read-Copy-Update for HelenOS 34
  • 35. AP-RCU Reader Side read_lock(): disable_preemption() check_qs() cpu.nesting_cnt++ read_unlock(): cpu.nesting_cnt-check_qs() enable_preemption() Martin Děcký, FOSDEM 2014, February 2nd 2014 check_qs(): if (cpu.nesting_cnt == 0) { if (cpu.last_seen_gp != cur_gp) { gp = cur_gp memory_barrier() cpu.last_seen_gp = gp } } Read-Copy-Update for HelenOS 35
  • 36. AP-RCU Reader Side read_lock(): disable_preemption() check_qs() cpu.nesting_cnt++ check_qs(): if (cpu.nesting_cnt == 0) { if (cpu.last_seen_gp != cur_gp) { gp = cur_gp memory_barrier() cpu.last_seen_gp = gp } } read_unlock(): cpu.nesting_cnt-check_qs() The first reader to notice the start of new enable_preemption() each CPU announces its quiescentastate. grace period on Once all CPUs announce a quiescent state or perform a context switch (a naturally occurring quiescent state due to disabled preemption), the grace period ends. Note: Writer forces a context switch on CPUs where no read-side critical section was not observed for a while. Note: Except memory_barrier() only inexpensive operations. Martin Děcký, FOSDEM 2014, February 2nd 2014 Read-Copy-Update for HelenOS 36
  • 37. Preemptible AP-RCU Reader Side read_lock(): disable_preemption() if (thread.nesting_cnt == 0) record_qs() thread.nesting_cnt++ enable_preemption() record_qs(): if (cpu.last_seen_gp != cur_gp) { gp = cur_gp memory_barrier() cpu_last_seen_gp = gp } read_unlock(): disable_preemption() if (thread.nesting_cnt-- == 0) { record_qs() if ((thread.was_preempted) || (cpu.is_delaying_gp)) signal_unlock() } enable_preemption() signal_unlock(): if (atomic_exchange(cpu.is_delaying_gp, false) == true) remaining_readers_semaphore.up() if (atomic_exchange(thread.was_preempted, false) == true) { preempt_mutex.lock() preempted_list.remove(thread) if ((is_empty(cpu.cur_preempted)) && (preempted_blocking_gp)) remaining_readers_semaphore.up() preempt_mutex.unlock() } Martin Děcký, FOSDEM 2014, February 2nd 2014 Read-Copy-Update for HelenOS 37
  • 38. AH-RCU Reader Side read_lock(): thread.nesting_cnt++ compiler_barrier() read_unlock(): compiler_barrier() thread.nesting_cnt-if (thread.nesting_cnt == was_preempted) preempted_unlock() preempted_unlock(): // avoid race between thread and interrupt handler if (atomic_exchange(thread.nesting_cnt, 0) == was_preempted) { preempt_lock.lock() preempted_list.remove(thread) if ((is_empty(cpu.cur_preempted)) && (detection_waiting)) detection_semaphore.up() // notify the detector thread about the grace period preempt_lock.unlock() } Note: Except preempted_unlock() only inexpensive operations. Martin Děcký, FOSDEM 2014, February 2nd 2014 Read-Copy-Update for HelenOS 38
  • 39. AP-RCU Writer Side synchronize(): memory_barrier() mutex.lock() cur_gp++ // start a new grace period reader_cpus = [] // gather CPUs potentially in read-side CS foreach cpu in cpus { if ((!cpu.idle) && (cpu.last_seen_gp != cur_gp)) { cpu.last_ctx_switch_cnt = cpu.ctx_switch_cnt reader_cpus += cpu } } wait(10ms) // longest acceptable grace period duration (tunable) foreach cpu in reader_cpus { // enforce a quiescent state if ((!cpu.idle) && (cpu.last_seen_gp != cur_gp) && (cpu.last_ctx_switch_cnt == cpu.ctx_switch_cnt)) cpu.ctx_switch_force_wait() } mutex.unlock() Martin Děcký, FOSDEM 2014, February 2nd 2014 Read-Copy-Update for HelenOS 39
  • 40. AH-RCU Writer Side detector_thread: forever { wait_for_callbacks() // run callbacks added before the current grace period execute_callbacks() // push callbacks registered since last processing to the queue advance_callbacks() } wait_for_gp_end() Martin Děcký, FOSDEM 2014, February 2nd 2014 Read-Copy-Update for HelenOS 40
  • 41. AH-RCU Writer Side (2) wait_for_gp_end(): gp_mutex.lock() if (completed_gp != cur_gp) { // a grace period is already in progress wait_for_gp_end_signal() goto out } else { // start a new grace period preempt_lock.lock() cur_gp++ preempt_lock.unlock() } gp_mutex.unlock() wait_for_readers() gp_mutex.lock() completed_gp = cur_gp out: gp_mutex.unlock() Martin Děcký, FOSDEM 2014, February 2nd 2014 Read-Copy-Update for HelenOS 41
  • 42. Evaluation Read-side critical section scalability: Traversal of a five-element list The list is protected as a whole, it is only read, never modified. 1.2e+09 ideal ah-rcu pap-rcu spinlock List traversals / second 1e+09 8e+08 6e+08 4e+08 2e+08 0 0 1 2 3 4 5 Threads Martin Děcký, FOSDEM 2014, February 2nd 2014 Read-Copy-Update for HelenOS 42
  • 43. Evaluation (2) Write-side overhead: Different ratios of updates vs. lookups Five-element list, four threads running in parallel. Updates are always synchronized by a spinlock. 5e+08 ah-rcu + spinlock pap-rcu + spinlock spinlock Operations / second 4.5e+08 4e+08 3.5e+08 3e+08 2.5e+08 2e+08 1.5e+08 1e+08 5e+07 0 0 10 20 40 60 80 100 % of updates Martin Děcký, FOSDEM 2014, February 2nd 2014 Read-Copy-Update for HelenOS 43
  • 44. Evaluation (3) Read-side scalability vs. write-side overhead: Crossover point Data points from previous figure with low fraction of updates are discarded. 6e+07 ah-rcu + spinlock pap-rcu + spinlock spinlock Operations / second 5e+07 4e+07 3e+07 2e+07 1e+07 0 5 10 20 40 60 80 100 % of updates Martin Děcký, FOSDEM 2014, February 2nd 2014 Read-Copy-Update for HelenOS 44
  • 45. Evaluation (4) Concurrent hash table lookup scalability 128 buckets, average load factor of 4 elements per bucket, 50 % of lookups for hitting keys, 50 % of lookups for missing keys (each thread used a separate list). The resize condition was checked, but never executed. 7e+07 cht + ah-rcu ht + bucket spinlocks ht + global spinlock Lookups / second 6e+07 5e+07 4e+07 3e+07 2e+07 1e+07 0 0 1 2 3 4 5 Threads Martin Děcký, FOSDEM 2014, February 2nd 2014 Read-Copy-Update for HelenOS 45
  • 46. Evaluation (5) Concurrent hash table update overhead: Different ratios of concurrent updates vs. lookups Four threads running in parallel. 7e+07 cht + ah-rcu ht + bucket spinlocks ht + global spinlock Operations / second 6e+07 5e+07 4e+07 3e+07 2e+07 1e+07 0 0 10 20 30 40 50 60 70 80 % of updates Martin Děcký, FOSDEM 2014, February 2nd 2014 Read-Copy-Update for HelenOS 46
  • 47. Conclusion Novel scalable algorithms Preemptible AP-RCU for HelenOS Preemptible AH-RCU for HelenOS Resizeable Concurrent Hash Table for HelenOS Suitable as a basic data structure for asynchronous HelenOS IPC Suitable for other kernel uses (e.g. global page table) Thorough evaluation Promising behavior Martin Děcký, FOSDEM 2014, February 2nd 2014 Read-Copy-Update for HelenOS 47
  • 48. Q&A www.helenos.org Martin Děcký, FOSDEM 2014, February 2nd 2014 Read-Copy-Update for HelenOS 48
  • 49. References [1] Herlihy M. P.: Impossibility and universality results for wait-free synchronization, in Proceedings of 7th Annual ACM Symposium on Principles of Distributed Computing, ACM, 1988 [2] Kogan A., Petrank E.: Wait-free queues with multiple enqueuers and dequeuers, in Proceedings of 16th ACM Symposium on Principles and Practice of Parallel Programming, ACM, 2011 [3] Podzimek A., Děcký M., Bulej L., Tůma P.: A Non-Intrusive Read-Copy-Update for UTS, in Proceedings of 18th IEEE International Conference on Parallel and Distributed Systems, IEEE, 2012, http://d3s.mff.cuni.cz/publications/download/PodzimekDeckyBulejTuma-ICPADS-2012.pdf [4] http://d3s.mff.cuni.cz/software/rcu/rcu.patch [5] Triplett J., McKenney P. E., Walpole J.: Resizable, scalable, concurrent hash tables via relativistic programming, in Proceedings of the 2011 USENIX Annual Technical Conference, ACM, 2011 [6] Michael M. M.: High performance dynamic lock-free hash tables and list-based sets, in Proceedings of 14th Annual ACM Symposium on Parallel Algorithms and Architectures, ACM, 2002 [7] Hraška A.: Read-Copy-Update for HelenOS, master thesis, Charles University in Prague, 2013, http://www.helenos.org/doc/theses/ah-thesis.pdf [8] https://code.launchpad.net/~adam-hraska+lp/helenos/cht-bench Martin Děcký, FOSDEM 2014, February 2nd 2014 Read-Copy-Update for HelenOS 49