* Based on kernel 5.11 (x86_64) – QEMU
* 2-socket CPUs (2 cores/socket)
* 16GB memory
* Kernel parameter: nokaslr norandmaps
* KASAN: disabled
* Userspace: ASLR is disabled
* Legacy BIOS
Linux Synchronization Mechanism: Semaphore & Mutex
Adrian Huang | Feb, 2023
Agenda
• Semaphore
✓producer-consumer problem
✓Implementation in Linux kernel
• Mutex (introduced in v2.6.16)
✓Enforce serialization on shared memory systems
✓Implementation in Linux kernel
✓Mutex lock
➢Fast path, midpath, slow path
✓Mutex unlock
➢Fast path and slow path
➢Mutex ownership (with a lab)
◼ Re-visit this concept: Only the lock owner has the permission to unlock the mutex
✓Q & A
Semaphore: producer-consumer problem
task 0
Semaphore
wakeup/signal
#0
#1
#2
#3
Share resource
counter = 4
wait_list
wait/sleep
P, down()
V, up()
task 1
task N
.
.
task task task
• Sleeping lock
• Used in process context *ONLY*
• Cannot hold a spin lock while acquiring a semaphore
• Mainly used in producer-consumer scenarios
• The task that releases a semaphore does not have to be the task that acquired it (no ownership concept)
✓ Something like a notification
down
__down
raw_spin_lock_irqsave
sem->count > 0
semaphore
lock
count
wait_list semaphore_waiter
list
task
struct raw_spinlock or raw_spinlock_t
raw_lock
sem->count--
raw_spin_unlock_irqrestore
up
Semaphore Implementation in Linux Kernel
__down_common
Y
N
[Only for interruptible and wakekill task] Check if the sleeping task gets a signal
Semaphore Implementation in Linux Kernel
1 Protect sem->count data
2
Reschedule: Need to unlock spinlock
down
__down
raw_spin_lock_irqsave
sem->count > 0
sem->count--
raw_spin_unlock_irqrestore
__down_common
Y
N
Semaphore Implementation in Linux Kernel
Protect sem->count data
up
__up
raw_spin_lock_irqsave
wait_list empty?
sem->count++
raw_spin_unlock_irqrestore
Y
N
semaphore
lock
count
wait_list semaphore_waiter
list
task
struct raw_spinlock or raw_spinlock_t
raw_lock
up
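The down()/up() flow above can be sketched as a small single-threaded model. This is only an illustration under stated assumptions: `sem_model`, `model_down` and `model_up` are hypothetical names, the spinlock (raw_spin_lock_irqsave) and the actual sleeping are left out, and a waiting task is represented by an integer id.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_WAITERS 8

/* Toy stand-in for struct semaphore: a counter plus a FIFO of waiter ids. */
struct sem_model {
	unsigned int count;
	int waiters[MAX_WAITERS];
	size_t nr_waiters;
};

/*
 * Models down(): returns 1 if the semaphore is taken immediately
 * (count > 0), or 0 if the caller would be queued on wait_list and sleep.
 */
static int model_down(struct sem_model *sem, int task_id)
{
	if (sem->count > 0) {
		sem->count--;
		return 1;
	}
	assert(sem->nr_waiters < MAX_WAITERS);
	sem->waiters[sem->nr_waiters++] = task_id;
	return 0;
}

/*
 * Models up(): if nobody waits, count is incremented and -1 is returned.
 * Otherwise the first waiter is removed and "woken"; count stays
 * unchanged because the semaphore is handed directly to that waiter.
 */
static int model_up(struct sem_model *sem)
{
	int woken;
	size_t i;

	if (sem->nr_waiters == 0) {
		sem->count++;
		return -1;
	}
	woken = sem->waiters[0];
	for (i = 1; i < sem->nr_waiters; i++)
		sem->waiters[i - 1] = sem->waiters[i];
	sem->nr_waiters--;
	return woken;
}
```

Note that model_up() never checks who calls it, which mirrors the non-ownership property of semaphores.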
Agenda
• Semaphore
✓producer-consumer problem
✓Implementation in Linux kernel
• Mutex (introduced in v2.6.16)
✓Enforce serialization on shared memory systems
✓Implementation in Linux kernel
✓Call path
➢Fast path, midpath, slow path
Mutex: Enforce serialization on shared memory systems
task 0
mutex_unlock()
mutex_lock()
task 1
task N
.
.
task task task
mutex
owner
wait_lock
osq
wait_list
Critical Section
task
rlock
struct spinlock or spinlock_t
atomic_t tail;
optimistic_spin_queue
Lock owner
Protect accessing members
of mutex struct
[midpath] spinning: busy-waiting
[slow path] waiting tasks: sleep
(non-busy waiting)
Mutex Implementation in Linux
• Mutex implementation paths
✓Fastpath: Uncontended case by using cmpxchg(): CAS (Compare and Swap)
✓Midpath (optimistic spinning) - The priority of the lock owner is the highest one
➢Spin for mutex lock acquisition when the lock owner is running.
➢The lock owner is likely to release the lock soon.
➢Leverage cancelable MCS lock (OSQ - Optimistic Spin Queue: MCS-like lock): v3.15
✓Slowpath: The task is added to the waiting queue and sleeps until woken up by the unlock path
• Mutex is a hybrid type (spinning & sleeping): Busy-waiting for a few cycles instead of immediately sleeping
• Ownership: Only the lock owner can release the lock
• kernel/locking/{mutex.c, osq_lock.c}
• Reference: Generic Mutex Subsystem
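The uncontended fast path can be illustrated in userspace with C11 atomics. This is a sketch, not kernel code: `mutex_model` and `model_trylock_fast` are hypothetical names, and the owner word is a plain integer standing in for a task_struct pointer (0 means unlocked).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for struct mutex: only the owner word is modelled. */
struct mutex_model {
	atomic_uintptr_t owner;	/* 0 = unlocked, otherwise the owner's id */
};

/*
 * Fast path: a single compare-and-swap from 0 to the caller's id with
 * acquire ordering. It succeeds only when the lock is completely free;
 * on failure the caller would fall back to the midpath or slow path.
 */
static bool model_trylock_fast(struct mutex_model *lock, uintptr_t curr)
{
	uintptr_t expected = 0;

	assert(curr != 0);
	return atomic_compare_exchange_strong_explicit(&lock->owner, &expected,
						       curr,
						       memory_order_acquire,
						       memory_order_relaxed);
}
```

A second caller that finds a non-zero owner gets `false` back and leaves the owner word untouched.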
mutex_lock(): Call path
mutex_lock → __mutex_trylock_fast → atomic_long_try_cmpxchg_acquire
  Success (fast path): return
  Fail: __mutex_lock_slowpath → __mutex_lock → __mutex_lock_common
    preempt_disable
    __mutex_trylock() || mutex_optimistic_spin()
      Yes (midpath): preempt_enable, return
      No (slow path): add the task to wait_list
mutex_lock(): Fast path
mutex_lock → __mutex_trylock_fast → atomic_long_try_cmpxchg_acquire succeeds → return
__mutex_trylock
__mutex_trylock_or_owner
mutex_can_spin_on_owner
The lock might be unlocked by another core
mutex_optimistic_spin
osq_lock
__mutex_trylock_or_owner
mutex_spin_on_owner
cpu_relax
osq_unlock
for (;;)
break if getting the lock
There’s an owner: break if the lock
is released or owner goes to sleep
Return true if the following conditions are met
• The spinning task does not need to be rescheduled: checked by need_resched()
• The lock owner:
✓ Is not preempted: checked by vcpu_is_preempted()
✓ Is not sleeping: checked by owner->on_cpu
• Spinner is spinning on the current lock owner!
mutex_spin_on_owner() returns true → keep looping for
acquiring the lock
• Lock release: one of spinning tasks can get the lock
mutex_spin_on_owner() returns false → break ‘for’ loop
• The spinning task is preempted
• The lock owner is preempted
• The lock owner sleeps
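The conditions above can be condensed into one decision per loop iteration. The following is a hedged model only: `spin_state`, `spin_verdict` and `model_spin_step` are hypothetical names, and the booleans stand in for the results of need_resched(), vcpu_is_preempted() and owner->on_cpu.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Snapshot of what a spinner observes in one iteration. */
struct spin_state {
	bool owner_changed;		/* lock->owner is no longer the task we spin on */
	bool owner_on_cpu;		/* owner->on_cpu */
	bool owner_vcpu_preempted;	/* vcpu_is_preempted(task_cpu(owner)) */
	bool spinner_need_resched;	/* need_resched() on the spinner */
};

enum spin_verdict {
	SPIN_AGAIN,		/* keep busy-waiting (cpu_relax) */
	SPIN_RETRY_LOCK,	/* owner changed or released: try to take the lock */
	SPIN_GIVE_UP		/* stop spinning and go to the slow path */
};

static enum spin_verdict model_spin_step(const struct spin_state *s)
{
	assert(s != NULL);

	if (s->owner_changed)
		return SPIN_RETRY_LOCK;
	if (!s->owner_on_cpu || s->owner_vcpu_preempted ||
	    s->spinner_need_resched)
		return SPIN_GIVE_UP;
	return SPIN_AGAIN;
}
```

SPIN_RETRY_LOCK corresponds to mutex_spin_on_owner() returning true, and SPIN_GIVE_UP to it returning false.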
mutex_lock(): midpath
The second and later osq_lock() callers spin in this function.
The first osq_lock() caller gets the OSQ lock and spins in this loop.
Notify the next OSQ spinner so that it can take the OSQ lock.
owner
owner
mutex_unlock()
mutex_lock()
Critical Section
core 0
mutex_unlock()
mutex_lock()
Critical Section
core 1 core 2 core 3
spinning
mutex_unlock()
mutex_lock()
Critical Section
spinning
mutex_unlock()
mutex_lock()
Critical Section
spinning
midpath: [Case #1: ideal] without preemption or sleep (both lock
owner and spinner)
owner
owner
One of spinning tasks can get the lock after the owner releases the lock:
Spinning tasks do not need to be moved to wait list
mutex_unlock()
mutex_lock()
Critical Section
core 0
mutex_unlock()
mutex_lock()
Critical Section
core 1 core 2 core 3
spinning
mutex_unlock()
mutex_lock()
Critical Section
spinning
mutex_unlock()
mutex_lock()
Critical Section
spinning
midpath – [Case #1: ideal] lock release without preemption or
sleep
When to exit the spinning?
1. The lock owner releases the lock
2. The lock owner goes to sleep or is preempted: spinning tasks go to slow path
✓ Check task->on_cpu
✓ Functions: prepare_task(), finish_task()…
3. The spinning task is preempted: the spinning task goes to slow path
✓ need_resched()
owner
owner
owner
owner
Three cases for “cannot spin on
mutex owner”
• The lock owner is preempted
• The spinning task is preempted
• The lock owner sleeps
owner
owner
mutex_unlock()
mutex_lock()
Critical Section
core 0
mutex_unlock()
mutex_lock()
Critical Section
core 1 core 2 core 3
spinning (midpath)
mutex_unlock()
mutex_lock()
Critical Section
spinning (midpath)
mutex_unlock()
mutex_lock()
Critical Section
non-busy wait
(slow path: move
this task to wait
list)
midpath: [Case #2] Mutex lock owner is preempted
owner
owner
Critical Section
preempt
reschedule
wakeup
non-busy wait
(slow path: move
this task to wait
list)
1
2
3
6 4
5
7
8
mutex_unlock()
mutex_lock()
Critical Section
core 0
mutex_unlock()
mutex_lock()
core 1 core 2
spinning (midpath)
mutex_unlock()
mutex_lock()
Critical Section
spinning (midpath)
midpath: [Case #3] Spinner (osq lock owner) is preempted
owner
preempt
1
non-busy wait
(slow path)
Reschedule back
4
3 owner
Critical Section
__mutex_unlock_slowpath ->
wake_up_q
5
core 3
mutex_unlock()
mutex_lock()
Critical Section
spinning (midpath)
6 owner
Reschedule:
schedule_preempt_disabled()
2
Who sets TIF_NEED_RESCHED? → set_tsk_need_resched()
1. Call path
✓ timer_interrupt → tick_handle_periodic → tick_periodic → update_process_times → scheduler_tick → curr->sched_class->task_tick → task_tick_fair → entity_tick → check_preempt_tick → resched_curr → set_tsk_need_resched
✓ HW interrupt (not timer HW) → wake up a higher priority task
2. Users:
✓ check_preempt_tick(), check_preempt_wakeup(),
wake_up_process()….and so on.
Who sets TIF_NEED_RESCHED? full call path
Set TIF_NEED_RESCHED: current task will be rescheduled later
PREEMPT_NEED_RESCHED bit = 0 → Need to reschedule (check comments in this header)
Who sets TIF_NEED_RESCHED?
• Set TIF_NEED_RESCHED flag if the delta is greater than
ideal_runtime
✓ The running task will be scheduled out.
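A minimal model of that tick-time check follows, assuming the "delta" above is the time the current task has run since it was last scheduled in. `task_model` and `model_check_preempt_tick` are hypothetical names; the real check_preempt_tick() performs further checks that are not modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy task: only the "needs reschedule" flag is modelled. */
struct task_model {
	bool need_resched;	/* stands in for TIF_NEED_RESCHED */
};

/*
 * Called on every (modelled) scheduler tick. If the task has run longer
 * than its ideal slice, mark it for rescheduling. A mutex spinner that
 * later sees need_resched set stops spinning and takes the slow path.
 */
static void model_check_preempt_tick(struct task_model *curr,
				     uint64_t sum_exec_runtime,
				     uint64_t prev_sum_exec_runtime,
				     uint64_t ideal_runtime)
{
	uint64_t delta_exec;

	assert(sum_exec_runtime >= prev_sum_exec_runtime);
	delta_exec = sum_exec_runtime - prev_sum_exec_runtime;
	if (delta_exec > ideal_runtime)
		curr->need_resched = true;
}
```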
[Case #4] Lock owner sleeps (reschedule): A test kernel module
The action of sleep is identical to preemption and “wait for IO”: reschedule
Create 4 kernel threads
Source code (github): test-modules/mutex/mutex.c
mutex_unlock()
mutex_lock()
core 0 core 1 core 2
owner
1
mutex_optimistic_spin() ->
mutex_can_spin_on_owner() returns fail
Owner
6
core 3
[Case #4] Lock owner sleeps (reschedule): other tasks cannot spin
kthread_0 kthread_1 kthread_2 kthread_3
Critical Section
msleep(): reschedule
task->on_cpu: 1 → 0
mutex_unlock()
mutex_lock()
Critical Section
msleep(): reschedule
task->on_cpu: 1 → 0
non-busy wait
(slow path): lock owner’s
task->on_cpu = 0
mutex_unlock()
mutex_lock()
Critical Section
msleep(): reschedule
task->on_cpu: 1 → 0
non-busy wait
(slow path): lock owner’s
task->on_cpu = 0
mutex_unlock()
mutex_lock()
Critical Section
msleep(): reschedule
task->on_cpu: 1 → 0
non-busy wait
(slow path): lock owner’s
task->on_cpu = 0
2
3
4
mutex_optimistic_spin() ->
mutex_can_spin_on_owner() returns fail
5
mutex_optimistic_spin() ->
mutex_can_spin_on_owner()
returns fail
Owner
7
Owner
8
__mutex_trylock() ||
mutex_optimistic_spin()
preempt_enable
return
add the task to wait_list
Yes: midpath
No: slow path
Call path
[Case #4] Lock owner sleeps (reschedule): gdb
watchpoint: task->on_cpu → who changes this?
[Case #4] Lock owner sleeps (reschedule): Who changes task->on_cpu?
task->on_cpu is set to 0 during context switch
mutex_unlock()
mutex_lock()
core 0 core 1
owner
1
mutex_optimistic_spin() ->
mutex_can_spin_on_owner() returns fail
Owner
6
[Case #4] Lock owner sleeps (reschedule): gdb: other tasks cannot spin
kthread_0 kthread_1
Critical Section
msleep(): reschedule
task->on_cpu: 1 → 0
mutex_unlock()
mutex_lock()
Critical Section
msleep(): reschedule
task->on_cpu: 1 → 0
non-busy wait
(slow path): lock owner’s
task->on_cpu = 0
2
3
__mutex_trylock() ||
mutex_optimistic_spin()
preempt_enable
return
add the task to wait_list
Yes: midpath
No: slow path
Call path
retval = 0 → cannot spin this owner
owner->on_cpu = 0
mutex_unlock()
mutex_unlock
__mutex_unlock_fast
mutex_unlock(): Call path
return
mutex
owner
wait_lock
osq
wait_list
task
• task_struct pointers are aligned to at least L1_CACHE_BYTES, so the low bits of lock->owner are free to hold flags
• The 3 least significant bits of lock->owner are used as flags
✓ W (MUTEX_FLAG_WAITERS)
◼ Non-empty waiter list. Issue a wakeup when unlocking
✓ H (MUTEX_FLAG_HANDOFF)
◼ Unlock needs to hand the lock to the top-waiter
◼ Used by ww_mutex because ww_mutex’s waiter list is not in FIFO order.
✓ P (MUTEX_FLAG_PICKUP)
◼ Handoff has been done and we're waiting for pickup
◼ Used by ww_mutex because ww_mutex’s waiter list is not in FIFO order.
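The packing of the task pointer and the three flag bits into one word can be sketched as follows. The flag values match the kernel's definitions; the helper names `owner_task`, `owner_flags` and `owner_pack` are illustrative (the kernel's counterparts are __owner_task() and __owner_flags()), and a plain integer stands in for the task_struct pointer.

```c
#include <assert.h>
#include <stdint.h>

#define MUTEX_FLAG_WAITERS	((uintptr_t)0x01)	/* bit 0: W */
#define MUTEX_FLAG_HANDOFF	((uintptr_t)0x02)	/* bit 1: H */
#define MUTEX_FLAG_PICKUP	((uintptr_t)0x04)	/* bit 2: P */
#define MUTEX_FLAGS		((uintptr_t)0x07)

/* Task address: the owner word with the three flag bits masked off. */
static uintptr_t owner_task(uintptr_t owner)
{
	return owner & ~MUTEX_FLAGS;
}

/* Flag bits: the three least significant bits of the owner word. */
static uintptr_t owner_flags(uintptr_t owner)
{
	return owner & MUTEX_FLAGS;
}

/* Combine an aligned task address with flags into one owner word. */
static uintptr_t owner_pack(uintptr_t task, uintptr_t flags)
{
	assert((task & MUTEX_FLAGS) == 0);	/* alignment leaves the low bits free */
	assert((flags & ~MUTEX_FLAGS) == 0);
	return task | flags;
}
```

Because one word holds both pieces of state, a single cmpxchg can update the owner and the flags atomically.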
lock->owner = 0
__mutex_unlock_slowpath
Have waiters: One of 3-bit LSB
of lock->owner is not cleared.
spin_lock(&lock->wait_lock)
spin_unlock(&lock->wait_lock)
Get a waiter from lock->wait_list
wake_up_q
wake_up_process
owner
task virtual addr W
P H
0
1
2
63
lock->owner
__mutex_handoff Called if MUTEX_FLAG_HANDOFF is set
atomic_long_cmpxchg_release(
&lock->owner, owner,
__owner_flags(owner))
The woken task will update lock->owner
Set 3-bit LSB of lock->owner → Clear the
original task struct address
* ww_mutex (Wound/Wait Mutex): Deadlock-proof mutex
[Unlock task = lock->owner]
No waiter: 3-bit LSB of lock->owner are cleared
mutex_unlock
__mutex_unlock_fast
mutex_unlock(): fast path
return
lock->owner = 0
[fast path] A spinner will take the lock
[Unlock task = lock->owner]
No waiter: 3-bit LSB of lock->owner are cleared
mutex_unlock
__mutex_unlock_fast
mutex_unlock(): slow path
return
lock->owner = 0
__mutex_unlock_slowpath
Have waiters: One of 3-bit LSB
of lock->owner is not cleared.
spin_lock(&lock->wait_lock)
spin_unlock(&lock->wait_lock)
Get a waiter from lock->wait_list
wake_up_q
wake_up_process
__mutex_handoff Called if MUTEX_FLAG_HANDOFF is set
owner
task virtual addr 1
0 0
0
1
2
63
atomic_long_cmpxchg_release(
&lock->owner, owner,
__owner_flags(owner)
owner
0 1
0 0
0
1
2
63
The woken task will update lock->owner
Resume here: kthread_0 wakes up kthread_1
Woken task
1 Context switch
2
Update lock->owner
gdb watchpoint: lock->owner
Update lock->owner
* Bit 0 is still set (MUTEX_FLAG_WAITERS): The upcoming
mutex_unlock() will wake up the waiter instead of spinner.
* Bit 1 (MUTEX_FLAG_HANDOFF) is cleared from
__mutex_trylock->__mutex_trylock_or_owner.
Update lock->owner
When are the 3 flag bits (LSBs) of lock->owner cleared, and by whom?
Woken task: clears the 3 flag bits of lock->owner if there are no more waiters
mutex_unlock
__mutex_unlock_fast
mutex_unlock(): Mutex ownership
return
lock->owner = 0
__mutex_unlock_slowpath
Have waiters: One of 3-bit LSB
of lock->owner is not cleared.
spin_lock(&lock->wait_lock)
spin_unlock(&lock->wait_lock)
Get a waiter from lock->wait_list
wake_up_q
wake_up_process
__mutex_handoff Called if MUTEX_FLAG_HANDOFF is set
atomic_long_cmpxchg_release(
&lock->owner, owner,
__owner_flags(owner))
The woken task will update lock->owner
Set 3-bit LSB of lock->owner → Clear the
original task struct address
[Unlock task = lock->owner]
No waiter: 3-bit LSB of lock->owner are cleared
• [Fastpath] Check ownership of a mutex
• [Slowpath] Does not check ownership of a mutex
mutex_unlock(): [lab] Behavior observation when lock owner !=
unlocker’s task (Mutex ownership)
mutex_unlock()
mutex_lock()
core 0 core 1
kthread_0 kthread_1
Critical Section
sleep 3 seconds mutex_unlock()
sleep 1 second
lock_thread unlock_thread
Source code: test-modules/mutex-unlock-by-another-task/mutex.c
This scenario is created on purpose for
demonstration. It won’t happen in real case.
Note
mutex_unlock()
mutex_lock()
core 0 core 1
kthread_0 kthread_1
Critical Section
sleep 3 seconds mutex_unlock()
sleep 1 second
lock_thread unlock_thread
Can another task
unlock the mutex?
mutex_unlock(): [lab] Behavior observation when lock owner !=
unlocker’s task (Mutex ownership)
This scenario is created on purpose for
demonstration. It won’t happen in real case.
Note
__mutex_unlock_slowpath
spin_lock(&lock->wait_lock)
spin_unlock(&lock->wait_lock)
Get a waiter from lock->wait_list
wake_up_q
wake_up_process
__mutex_handoff
atomic_long_cmpxchg_release(
&lock->owner, owner,
__owner_flags(owner))
mutex_unlock(): slow path does not check unlocker’s ownership
[DEBUG_MUTEXES] Print a warning message if unlocker’s task != lock owner’s task
breakpoint
breakpoint
1
1
Lock owner
2
Unlocker’s task != lock owner
3
mutex_unlock(): [lab] Behavior when lock owner != unlocker’s task
mutex_unlock()
mutex_lock()
core 0 core 1
kthread_0 kthread_1
Critical Section
sleep 3 seconds mutex_unlock()
sleep 1 second
lock_thread unlock_thread
We’re here
breakpoint
1
breakpoint
1
lock owner = old
2
lock->owner is set 0 by atomic_long_cmpxchg_release()
3
mutex_unlock(): [lab] Behavior if lock owner != unlocker’s task
mutex_unlock()
mutex_lock()
core 0 core 1
kthread_0 kthread_1
Critical Section
sleep 3 seconds mutex_unlock()
sleep 1 second
lock_thread unlock_thread
We’re here
mutex_unlock(): [lab] Behavior observation when lock owner !=
unlocker’s task (Mutex ownership)
mutex_unlock()
mutex_lock()
core 0 core 1
kthread_0 kthread_1
Critical Section
sleep 3 seconds
mutex_unlock()
sleep 1 second
lock_thread unlock_thread
1
2
3
This unlocks the
locked mutex
mutex
owner = 0
wait_lock = 0
osq = 0
wait_list
Breakpoint stops at second mutex_unlock() in kthread_0
mutex_unlock(): [lab] Behavior observation when lock owner !=
unlocker’s task (Mutex ownership)
mutex_unlock()
mutex_lock()
core 0 core 1
kthread_0 kthread_1
Critical Section
sleep 3 seconds mutex_unlock()
sleep 1 second
lock_thread unlock_thread
1
2
3
This unlocks the locked mutex
What’s the behavior?
breakpoint
1
mutex_unlock(): [lab] Behavior when lock owner != unlocker’s task
mutex_unlock()
mutex_lock()
core 0 core 1
kthread_0 kthread_1
Critical Section
sleep 3 seconds mutex_unlock()
sleep 1 second
lock_thread unlock_thread
breakpoint
1
No lock owner
2
We’re here
breakpoint
1
mutex_unlock(): [lab] Behavior if lock owner != unlocker’s task
mutex_unlock()
mutex_lock()
core 0 core 1
kthread_0 kthread_1
Critical Section
sleep 3 seconds mutex_unlock()
sleep 1 second
lock_thread unlock_thread
breakpoint
1
lock owner = old
2
lock->owner is set 0 by atomic_long_cmpxchg_release()
3
4
We’re here
mutex_unlock(): [lab] Behavior observation when lock owner !=
unlocker’s task (Mutex ownership)
mutex_unlock()
mutex_lock()
core 0 core 1
kthread_0 kthread_1
Critical Section
sleep 3 seconds mutex_unlock()
sleep 1 second
lock_thread unlock_thread
Takeaways
1. [Fastpath] Linux kernel checks mutex’s ownership
2. [Slowpath] Linux kernel does not check mutex’s ownership when unlocking a mutex
✓ Developers must take care of mutex_lock/mutex_unlock pair
✓ Slowpath prints a warning message if mutex debug option is enabled.
✓ Different from the concept: Only the lock owner has the permission to unlock the mutex
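The two takeaways can be reproduced with a userspace model of the unlock paths. This is a sketch under assumptions: `model_unlock` and the `UNLOCK_*` results are hypothetical names, the handoff flag and the actual wakeup are not modelled, and integers stand in for task_struct pointers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define FLAG_WAITERS	((uintptr_t)0x01)
#define FLAG_MASK	((uintptr_t)0x07)

enum unlock_result {
	UNLOCK_FAST,		/* owner == caller, no flags: one cmpxchg to 0 */
	UNLOCK_SLOW_NO_WAKEUP,	/* slow path, nobody to wake */
	UNLOCK_SLOW_WAKEUP	/* slow path, a waiter would be woken */
};

static enum unlock_result model_unlock(atomic_uintptr_t *owner, uintptr_t curr)
{
	uintptr_t expected = curr;
	uintptr_t old;

	assert(curr != 0 && (curr & FLAG_MASK) == 0);

	/* Fast path: succeeds only if the caller is the owner and no flag is set. */
	if (atomic_compare_exchange_strong_explicit(owner, &expected, (uintptr_t)0,
						    memory_order_release,
						    memory_order_relaxed))
		return UNLOCK_FAST;

	/*
	 * Slow path: drop the task bits and keep the flags. Note that the
	 * owner word is never compared against 'curr' here.
	 */
	old = atomic_load_explicit(owner, memory_order_relaxed);
	while (!atomic_compare_exchange_weak_explicit(owner, &old,
						      old & FLAG_MASK,
						      memory_order_release,
						      memory_order_relaxed))
		;	/* 'old' was refreshed by the failed cmpxchg; retry */

	return (old & FLAG_WAITERS) ? UNLOCK_SLOW_WAKEUP : UNLOCK_SLOW_NO_WAKEUP;
}
```

When a non-owner calls model_unlock() on a held lock, the fast path fails and the slow path clears the owner word anyway, which is the behavior observed in the lab.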
mutex_unlock(): [lab] Behavior observation when lock owner !=
unlocker’s task (Mutex ownership)
Why doesn’t slowpath check ownership?
1. Ownership checking is only for developers and not enforced by Linux kernel.
✓ Developers need to take care of it.
Quotes
1. From Generic Mutex Subsystem
✓ Mutex Semantics
• Only one task can hold the mutex at a time.
• Only the owner can unlock the mutex.
• …
✓ These semantics are fully enforced when CONFIG_DEBUG_MUTEXES is enabled
Think about…
mutex_unlock()
mutex_lock()
core 0 core 1
Critical Section
sleep 3 seconds
mutex_unlock()
sleep 1 second
mutex_unlock()
mutex_lock()
core 2
Critical Section
1 Will it acquire the mutex lock
successfully?
2
Will this unlock the mutex
lock acquired by core 2?
Q&A #1: Semaphore can be used to synchronize with user-space
Consumer
buffer
Producer
up()
down()
[Semaphore] Producer/Consumer Concept
* Screenshot captured from: Chapter 10, Linux Kernel Development, 3rd Edition, Robert Love
• Consumer waits if buffer is empty
• Producer waits if buffer is full
• Only one process can manipulate the buffer at a time
(mutual exclusion)
Principle
Q&A #1: Semaphore can be used to synchronize with user-space
Consumer
buffer
Producer
up(), V
down(), P
[Semaphore] Producer/Consumer Concept
Code reference: CS 537 Notes, Section #6: Semaphores and Producer/Consumer Problem
Q&A #1: Semaphore can be used to synchronize with user-space
Consumer
buffer
Producer
up(), V
down(), P
[Semaphore] Producer/Consumer Concept
Data structure synchronization
Notify consumer
Wait if buffer is full Wait if buffer is empty
Notify producer
Code reference: CS 537 Notes, Section #6: Semaphores and Producer/Consumer Problem
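The counting behind the producer/consumer scheme can be sketched with a single-threaded model. It is not the code from the referenced notes: `pc_model`, `pc_try_produce` and `pc_try_consume` are hypothetical names, the two counters stand in for the "empty" and "full" semaphores, and a `false` return marks the point where a real task would sleep in down(). The mutual-exclusion semaphore around the buffer is omitted because the model has only one thread.

```c
#include <assert.h>
#include <stdbool.h>

#define BUF_SLOTS 2

struct pc_model {
	int empty;		/* free slots: the producer does down() on this */
	int full;		/* filled slots: the consumer does down() on this */
	int buf[BUF_SLOTS];
	int head;		/* next slot to consume */
	int tail;		/* next slot to fill */
};

static void pc_init(struct pc_model *m)
{
	m->empty = BUF_SLOTS;
	m->full = 0;
	m->head = 0;
	m->tail = 0;
}

/* Returns false where a real producer would wait because the buffer is full. */
static bool pc_try_produce(struct pc_model *m, int item)
{
	if (m->empty == 0)
		return false;
	m->empty--;				/* down(empty) */
	m->buf[m->tail] = item;
	m->tail = (m->tail + 1) % BUF_SLOTS;
	m->full++;				/* up(full): notify consumer */
	assert(m->empty + m->full == BUF_SLOTS);
	return true;
}

/* Returns false where a real consumer would wait because the buffer is empty. */
static bool pc_try_consume(struct pc_model *m, int *item)
{
	if (m->full == 0)
		return false;
	m->full--;				/* down(full) */
	*item = m->buf[m->head];
	m->head = (m->head + 1) % BUF_SLOTS;
	m->empty++;				/* up(empty): notify producer */
	assert(m->empty + m->full == BUF_SLOTS);
	return true;
}
```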
Q&A #1: Semaphore can be used to synchronize with user-space
Consumer
buffer
Producer
up(), V
down(), P
[Semaphore] Producer/Consumer Concept
User Space
Kernel Space
. . .
buffer
Consumer Consumer
Producer
system
call
system
call
Possible Scenario
Note
1. up()/down() invocations are done in kernel.
Q&A #2: Mutex isn’t suitable for synchronizations
between kernel and user-space
* Screenshot captured from: Chapter 10, Linux Kernel Development, 3rd Edition, Robert Love
Explanation
1. [Mutex] ownership!
✓ Whoever locked a mutex must unlock it
Reference
• Generic Mutex Subsystem
• Wound/Wait Deadlock-Proof Mutex Design
• Mutexes and Semaphores Demystified
• MCS locks and qspinlocks
• Linux中的mutex机制[一] - 加锁和osq lock (The mutex mechanism in Linux [1]: locking and the OSQ lock)
Backup
__mutex_trylock
signal_pending_state
spin_unlock(&lock->wait_lock)
spin_lock(&lock->wait_lock)
for (;;)
goto ‘acquired’ label if getting the lock
Interrupted by signal: break
mutex_lock(): slowpath
__mutex_add_waiter(lock, &waiter, &lock->wait_list)
waiter.task = current
set_current_state(state)
schedule_preempt_disabled
__mutex_set_flag(lock, MUTEX_FLAG_HANDOFF)
__mutex_trylock() ||
(first && mutex_optimistic_spin())
spin_lock(&lock->wait_lock)
N Y: break
spin_lock(&lock->wait_lock)
acquired
__set_current_state(TASK_RUNNING)
mutex_remove_waiter
list_empty(&lock->wait_list)
__mutex_clear_flag(lock, MUTEX_FLAGS)
spin_unlock(&lock->wait_lock)
preempt_enable

More Related Content

What's hot

Memory Mapping Implementation (mmap) in Linux Kernel
Memory Mapping Implementation (mmap) in Linux KernelMemory Mapping Implementation (mmap) in Linux Kernel
Memory Mapping Implementation (mmap) in Linux Kernel
Adrian Huang
 
Anatomy of the loadable kernel module (lkm)
Anatomy of the loadable kernel module (lkm)Anatomy of the loadable kernel module (lkm)
Anatomy of the loadable kernel module (lkm)
Adrian Huang
 
Memory Management with Page Folios
Memory Management with Page FoliosMemory Management with Page Folios
Memory Management with Page Folios
Adrian Huang
 
Linux Initialization Process (2)
Linux Initialization Process (2)Linux Initialization Process (2)
Linux Initialization Process (2)
shimosawa
 
LISA2019 Linux Systems Performance
LISA2019 Linux Systems PerformanceLISA2019 Linux Systems Performance
LISA2019 Linux Systems Performance
Brendan Gregg
 
Process Address Space: The way to create virtual address (page table) of user...
Process Address Space: The way to create virtual address (page table) of user...Process Address Space: The way to create virtual address (page table) of user...
Process Address Space: The way to create virtual address (page table) of user...
Adrian Huang
 
Vmlinux: anatomy of bzimage and how x86 64 processor is booted
Vmlinux: anatomy of bzimage and how x86 64 processor is bootedVmlinux: anatomy of bzimage and how x86 64 processor is booted
Vmlinux: anatomy of bzimage and how x86 64 processor is booted
Adrian Huang
 
Linux Kernel - Virtual File System
Linux Kernel - Virtual File SystemLinux Kernel - Virtual File System
Linux Kernel - Virtual File System
Adrian Huang
 
Physical Memory Management.pdf
Physical Memory Management.pdfPhysical Memory Management.pdf
Physical Memory Management.pdf
Adrian Huang
 
Kernel_Crash_Dump_Analysis
Kernel_Crash_Dump_AnalysisKernel_Crash_Dump_Analysis
Kernel_Crash_Dump_AnalysisBuland Singh
 
Linux Kernel Booting Process (2) - For NLKB
Linux Kernel Booting Process (2) - For NLKBLinux Kernel Booting Process (2) - For NLKB
Linux Kernel Booting Process (2) - For NLKB
shimosawa
 
Decompressed vmlinux: linux kernel initialization from page table configurati...
Decompressed vmlinux: linux kernel initialization from page table configurati...Decompressed vmlinux: linux kernel initialization from page table configurati...
Decompressed vmlinux: linux kernel initialization from page table configurati...
Adrian Huang
 
Linux MMAP & Ioremap introduction
Linux MMAP & Ioremap introductionLinux MMAP & Ioremap introduction
Linux MMAP & Ioremap introduction
Gene Chang
 
Kdump and the kernel crash dump analysis
Kdump and the kernel crash dump analysisKdump and the kernel crash dump analysis
Kdump and the kernel crash dump analysis
Buland Singh
 
Linux Initialization Process (1)
Linux Initialization Process (1)Linux Initialization Process (1)
Linux Initialization Process (1)
shimosawa
 
Memory Compaction in Linux Kernel.pdf
Memory Compaction in Linux Kernel.pdfMemory Compaction in Linux Kernel.pdf
Memory Compaction in Linux Kernel.pdf
Adrian Huang
 
Linux scheduler
Linux schedulerLinux scheduler
Linux scheduler
Liran Ben Haim
 
UM2019 Extended BPF: A New Type of Software
UM2019 Extended BPF: A New Type of SoftwareUM2019 Extended BPF: A New Type of Software
UM2019 Extended BPF: A New Type of Software
Brendan Gregg
 
Kernel Recipes 2019 - ftrace: Where modifying a running kernel all started
Kernel Recipes 2019 - ftrace: Where modifying a running kernel all startedKernel Recipes 2019 - ftrace: Where modifying a running kernel all started
Kernel Recipes 2019 - ftrace: Where modifying a running kernel all started
Anne Nicolas
 
Linux Performance Profiling and Monitoring
Linux Performance Profiling and MonitoringLinux Performance Profiling and Monitoring
Linux Performance Profiling and Monitoring
Georg Schönberger
 

What's hot (20)

Memory Mapping Implementation (mmap) in Linux Kernel
Memory Mapping Implementation (mmap) in Linux KernelMemory Mapping Implementation (mmap) in Linux Kernel
Memory Mapping Implementation (mmap) in Linux Kernel
 
Anatomy of the loadable kernel module (lkm)
Anatomy of the loadable kernel module (lkm)Anatomy of the loadable kernel module (lkm)
Anatomy of the loadable kernel module (lkm)
 
Memory Management with Page Folios
Memory Management with Page FoliosMemory Management with Page Folios
Memory Management with Page Folios
 
Linux Initialization Process (2)
Linux Initialization Process (2)Linux Initialization Process (2)
Linux Initialization Process (2)
 
LISA2019 Linux Systems Performance
LISA2019 Linux Systems PerformanceLISA2019 Linux Systems Performance
LISA2019 Linux Systems Performance
 
Process Address Space: The way to create virtual address (page table) of user...
Process Address Space: The way to create virtual address (page table) of user...Process Address Space: The way to create virtual address (page table) of user...
Process Address Space: The way to create virtual address (page table) of user...
 
Vmlinux: anatomy of bzimage and how x86 64 processor is booted
Vmlinux: anatomy of bzimage and how x86 64 processor is bootedVmlinux: anatomy of bzimage and how x86 64 processor is booted
Vmlinux: anatomy of bzimage and how x86 64 processor is booted
 
Linux Kernel - Virtual File System
Linux Kernel - Virtual File SystemLinux Kernel - Virtual File System
Linux Kernel - Virtual File System
 
Physical Memory Management.pdf
Physical Memory Management.pdfPhysical Memory Management.pdf
Physical Memory Management.pdf
 
Kernel_Crash_Dump_Analysis
Kernel_Crash_Dump_AnalysisKernel_Crash_Dump_Analysis
Kernel_Crash_Dump_Analysis
 
Linux Kernel Booting Process (2) - For NLKB
Linux Kernel Booting Process (2) - For NLKBLinux Kernel Booting Process (2) - For NLKB
Linux Kernel Booting Process (2) - For NLKB
 
Decompressed vmlinux: linux kernel initialization from page table configurati...
Decompressed vmlinux: linux kernel initialization from page table configurati...Decompressed vmlinux: linux kernel initialization from page table configurati...
Decompressed vmlinux: linux kernel initialization from page table configurati...
 
Linux MMAP & Ioremap introduction
Linux MMAP & Ioremap introductionLinux MMAP & Ioremap introduction
Linux MMAP & Ioremap introduction
 
Kdump and the kernel crash dump analysis
Kdump and the kernel crash dump analysisKdump and the kernel crash dump analysis
Kdump and the kernel crash dump analysis
 
Linux Initialization Process (1)
Linux Initialization Process (1)Linux Initialization Process (1)
Linux Initialization Process (1)
 
Memory Compaction in Linux Kernel.pdf
Memory Compaction in Linux Kernel.pdfMemory Compaction in Linux Kernel.pdf
Memory Compaction in Linux Kernel.pdf
 
Linux scheduler
Linux schedulerLinux scheduler
Linux scheduler
 
UM2019 Extended BPF: A New Type of Software
UM2019 Extended BPF: A New Type of SoftwareUM2019 Extended BPF: A New Type of Software
UM2019 Extended BPF: A New Type of Software
 
Kernel Recipes 2019 - ftrace: Where modifying a running kernel all started
Kernel Recipes 2019 - ftrace: Where modifying a running kernel all startedKernel Recipes 2019 - ftrace: Where modifying a running kernel all started
Kernel Recipes 2019 - ftrace: Where modifying a running kernel all started
 
Linux Performance Profiling and Monitoring
Linux Performance Profiling and MonitoringLinux Performance Profiling and Monitoring
Linux Performance Profiling and Monitoring
 





semaphore & mutex.pdf

  • 1. Linux Synchronization Mechanism: Semaphore & Mutex. Adrian Huang | Feb, 2023. Test environment: kernel 5.11 (x86_64) on QEMU; 2-socket CPUs (2 cores/socket); 16GB memory; kernel parameters: nokaslr norandmaps; KASAN disabled; userspace ASLR disabled; legacy BIOS.
  • 2. Agenda • Semaphore ✓ Producer-consumer problem ✓ Implementation in Linux kernel • Mutex (introduced in v2.6.16) ✓ Enforce serialization on shared memory systems ✓ Implementation in Linux kernel ✓ Mutex lock ➢ Fast path, midpath, slow path ✓ Mutex unlock ➢ Fast path and slow path ➢ Mutex ownership (with a lab) ◼ Revisit this concept: only the lock owner has permission to unlock the mutex ✓ Q & A
  • 3. Semaphore: producer-consumer problem. [Diagram: task 0 to task N call down() (P, wait/sleep) and up() (V, wakeup/signal) on a semaphore with counter = 4 that guards shared resources #0 to #3; tasks that cannot acquire it sleep on wait_list.] • Sleeping lock • Used in process context *ONLY* • A task cannot hold a spin lock while acquiring a semaphore, because down() may sleep • Mainly used in producer-consumer scenarios • The task that acquires the semaphore is not required to be the one that releases it (no ownership concept) ✓ Works like a notification
  • 4. Semaphore Implementation in Linux Kernel. down() call path: raw_spin_lock_irqsave → sem->count > 0? Y: sem->count--; N: __down → __down_common → raw_spin_unlock_irqrestore. Data structures: struct semaphore { lock (raw_spinlock_t, i.e. struct raw_spinlock, wrapping raw_lock), count, wait_list }; wait_list links struct semaphore_waiter { list, task, up }.
  • 6. Semaphore Implementation in Linux Kernel. [Only for interruptible and wakekill tasks] Check whether the sleeping task has received a signal.
  • 7. Semaphore Implementation in Linux Kernel. down() call path: raw_spin_lock_irqsave → sem->count > 0? Y: sem->count--; N: __down → __down_common → raw_spin_unlock_irqrestore. Annotations: (1) the spinlock protects the sem->count data; (2) reschedule: the spinlock needs to be unlocked first.
  • 8. Semaphore Implementation in Linux Kernel. up() call path: raw_spin_lock_irqsave (protects the sem->count data) → wait_list empty? Y: sem->count++; N: __up → raw_spin_unlock_irqrestore. Data structures: struct semaphore { lock (raw_spinlock_t, i.e. struct raw_spinlock, wrapping raw_lock), count, wait_list }; wait_list links struct semaphore_waiter { list, task, up }.
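The down()/up() flowcharts on the slides above can be sketched as a small single-threaded model. This is only an illustration under stated assumptions: `struct sem_model`, `sem_model_down()` and `sem_model_up()` are hypothetical names, the raw spinlock is omitted, and a plain counter stands in for wait_list. It is not the kernel's code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; the kernel's struct semaphore also has a raw spinlock. */
struct sem_model {
    unsigned int count;   /* free resources, like sem->count */
    unsigned int waiters; /* number of sleeping tasks; stands in for wait_list */
};

/* Models down(): true if the resource was taken at once, false if the caller would sleep. */
static bool sem_model_down(struct sem_model *sem)
{
    assert(sem);
    if (sem->count > 0) {
        sem->count--;       /* "Y" branch of the flowchart */
        return true;
    }
    sem->waiters++;         /* "N" branch: __down_common() queues the task and sleeps */
    return false;
}

/* Models up(): true if a waiter was woken, false if count was incremented instead. */
static bool sem_model_up(struct sem_model *sem)
{
    assert(sem);
    if (sem->waiters == 0) {
        sem->count++;       /* wait_list empty: give the resource back */
        return false;
    }
    sem->waiters--;         /* __up(): hand the resource straight to the first waiter */
    return true;
}
```

Note that when a waiter exists, up() in this model does not touch count: the released resource goes directly to the woken task, which mirrors the "wait_list empty?" branch on the slide.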
  • 9. Agenda • Semaphore ✓producer-consumer problem ✓Implementation in Linux kernel • Mutex (introduced in v2.6.16) ✓Enforce serialization on shared memory systems ✓Implementation in Linux kernel ✓Call path ➢Fast path, midpath, slow path
  • 10. Mutex: Enforce serialization on shared memory systems. [Diagram: task 0 to task N call mutex_lock() and mutex_unlock() around a critical section.] struct mutex members: owner (the lock owner task); wait_lock (spinlock_t, i.e. struct spinlock, wrapping rlock; protects access to the members of struct mutex); osq (optimistic_spin_queue with atomic_t tail; [midpath] spinning: busy-waiting); wait_list ([slow path] waiting tasks: sleep, non-busy waiting).
  • 11. Mutex Implementation in Linux • Mutex implementation paths ✓ Fastpath: uncontended case, handled with cmpxchg(), a CAS (compare-and-swap) ✓ Midpath (optimistic spinning): taken only while the lock owner is running and no higher-priority task is ready to run ➢ Spin for mutex lock acquisition while the lock owner is running. ➢ A running lock owner is likely to release the lock soon. ➢ Leverages a cancelable MCS lock (OSQ, Optimistic Spin Queue: an MCS-like lock), available since v3.15 ✓ Slowpath: the task is added to the wait queue and sleeps until woken up by the unlock path • Mutex is a hybrid type (spinning & sleeping): it busy-waits for a few cycles instead of sleeping immediately • Ownership: only the lock owner can release the lock • Source: kernel/locking/{mutex.c, osq_lock.c} • Reference: Generic Mutex Subsystem
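The fastpath bullet above can be illustrated with C11 atomics in userspace. This is a minimal sketch, assuming an owner word that is 0 when unlocked and otherwise holds a task identifier; `struct mutex_model` and `mutex_model_trylock_fast()` are hypothetical names, and the kernel uses its own atomic_long helpers rather than `<stdatomic.h>`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: owner == 0 means unlocked, any other value identifies the owner. */
struct mutex_model {
    atomic_uintptr_t owner;
};

/*
 * Fastpath: a single compare-and-swap from 0 to the caller's id.
 * It fails whenever the lock is held (or any flag bit is set), in which case
 * a real implementation would fall back to the midpath or slowpath.
 */
static bool mutex_model_trylock_fast(struct mutex_model *lock, uintptr_t curr)
{
    uintptr_t expected = 0;

    assert(lock && curr != 0);
    return atomic_compare_exchange_strong_explicit(&lock->owner, &expected, curr,
                                                   memory_order_acquire,
                                                   memory_order_relaxed);
}
```

Acquire ordering on success keeps the critical section from being reordered before the lock acquisition; the failure ordering can stay relaxed because nothing is protected when the CAS fails.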
  • 14. mutex_lock(): midpath. Call path: __mutex_trylock → __mutex_trylock_or_owner; then mutex_optimistic_spin → mutex_can_spin_on_owner (the lock might be unlocked by another core) → osq_lock → for (;;) { __mutex_trylock_or_owner: break if the lock is acquired; mutex_spin_on_owner: there is an owner, so stop when the lock is released or the owner goes to sleep; cpu_relax } → osq_unlock. mutex_can_spin_on_owner() and mutex_spin_on_owner() return true if the following conditions are met: • The spinning task is not being preempted: checked by need_resched() • The lock owner is: ✓ Not preempted: checked by vcpu_is_preempted() ✓ Not sleeping: checked by owner->on_cpu • The spinner is spinning on the current lock owner. mutex_spin_on_owner() returns true → keep looping to acquire the lock • On lock release, one of the spinning tasks can get the lock. mutex_spin_on_owner() returns false → break out of the 'for' loop • The spinning task is preempted • The lock owner is preempted • The lock owner sleeps
  • 15. mutex_lock(): midpath. Same call path as slide 14, annotated for the OSQ lock: the first caller of osq_lock() acquires the OSQ lock and spins in the for (;;) loop; the second and later callers spin inside osq_lock() itself; osq_unlock() notifies the other OSQ spinners so that one of them can take the OSQ lock.
  • 16. midpath: [Case #1: ideal] without preemption or sleep (both lock owner and spinner). [Diagram: cores 0 to 3 each run mutex_lock() → critical section → mutex_unlock(); while one core is the owner, the other cores spin.] One of the spinning tasks can get the lock after the owner releases it, so spinning tasks do not need to be moved to the wait list.
  • 17. midpath: [Case #1: ideal] lock release without preemption or sleep. [Diagram: same four-core timeline as slide 16.] When does a task stop spinning? 1. The lock owner releases the lock. 2. The lock owner goes to sleep or is preempted: spinning tasks go to the slow path ✓ Check task->on_cpu ✓ Functions: prepare_task(), finish_task()… 3. The spinning task is preempted: the spinning task goes to the slow path ✓ need_resched()
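The conditions listed above for continuing to spin can be written as one predicate. This is a sketch only: `struct owner_state` and `keep_spinning_on_owner()` are hypothetical names, and the three booleans stand in for owner->on_cpu, vcpu_is_preempted() and need_resched() on the slide.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the lock owner, as seen by a spinner. */
struct owner_state {
    bool on_cpu;          /* owner is running on a CPU (owner->on_cpu) */
    bool vcpu_preempted;  /* owner's vCPU was preempted (vcpu_is_preempted()) */
};

/*
 * Returns true while spinning on this owner is still worthwhile.
 * Any of the three "cannot spin" cases sends the spinner to the slow path.
 */
static bool keep_spinning_on_owner(const struct owner_state *owner,
                                   bool spinner_need_resched)
{
    assert(owner);
    if (!owner->on_cpu)
        return false;   /* the lock owner sleeps */
    if (spinner_need_resched)
        return false;   /* the spinning task is about to be preempted */
    if (owner->vcpu_preempted)
        return false;   /* the lock owner is preempted */
    return true;
}
```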
  • 18. Three cases for “cannot spin on mutex owner” • The lock owner is preempted • The spinning task is preempted • The lock owner sleeps
  • 19. midpath: [Case #2] Mutex lock owner is preempted. [Diagram, steps 1 to 8 on cores 0 to 3: each core runs mutex_lock() → critical section → mutex_unlock(). Labels: the owner is preempted inside its critical section; two cores are spinning (midpath); two tasks do a non-busy wait (slow path: the task is moved to the wait list); reschedule; wakeup.]
  • 20. midpath: [Case #3] Spinner (osq lock owner) is preempted. [Diagram, steps 1 to 6 on cores 0 to 3: each core runs mutex_lock() → critical section → mutex_unlock(). Labels: (1) preempt; (2) reschedule: schedule_preempt_disabled(); non-busy wait (slow path); reschedule back; (5) __mutex_unlock_slowpath → wake_up_q; (6) owner; the other cores are spinning (midpath).]
  • 21. Three cases for “cannot spin on mutex owner” • The lock owner is preempted • The spinning task is preempted • The lock owner sleeps
  • 22. midpath: [Case #3] Spinner (osq lock owner) is preempted. [Diagram: same as slide 20.] Who sets TIF_NEED_RESCHED? → set_tsk_need_resched() 1. Call path ✓ timer_interrupt → tick_handle_periodic → tick_periodic → update_process_times → scheduler_tick → curr->sched_class->task_tick → task_tick_fair → entity_tick → check_preempt_tick → resched_curr → set_tsk_need_resched ✓ HW interrupt (not the timer HW) → wakes up a higher-priority task 2. Users: ✓ check_preempt_tick(), check_preempt_wakeup(), wake_up_process(), and so on.
  • 23. Who sets TIF_NEED_RESCHED? full call path
  • 24. Who sets TIF_NEED_RESCHED? → set_tsk_need_resched() 1. Call path ✓ timer_interrupt → tick_handle_periodic → tick_periodic → update_process_times → scheduler_tick → curr->sched_class->task_tick → task_tick_fair → entity_tick → check_preempt_tick → resched_curr → set_tsk_need_resched ✓ HW interrupt (not the timer HW) → wakes up a higher-priority task 2. Users: ✓ check_preempt_tick(), check_preempt_wakeup(), wake_up_process(), and so on.
  • 25. Who sets TIF_NEED_RESCHED? Setting TIF_NEED_RESCHED means the current task will be rescheduled later. PREEMPT_NEED_RESCHED bit = 0 → need to reschedule (check comments in this header).
  • 26. Who sets TIF_NEED_RESCHED? • check_preempt_tick() sets the TIF_NEED_RESCHED flag if delta_exec (how long the current task has run since it was last scheduled in) is greater than ideal_runtime ✓ The running task will be scheduled out.
  • 27. Who sets TIF_NEED_RESCHED? → set_tsk_need_resched() 1. Call path ✓ timer_interrupt → tick_handle_periodic → tick_periodic → update_process_times → scheduler_tick → curr->sched_class->task_tick → task_tick_fair → entity_tick → check_preempt_tick → resched_curr → set_tsk_need_resched ✓ HW interrupt (not the timer HW) → wakes up a higher-priority task 2. Users: ✓ check_preempt_tick(), check_preempt_wakeup(), wake_up_process(), and so on.
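The runtime check on slide 26 can be sketched as follows. This models only the first comparison described on the slide; the real check_preempt_tick() performs further checks that are not shown here. `struct tick_task_model` and `check_preempt_tick_model()` are hypothetical names, and a boolean stands in for the TIF_NEED_RESCHED flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-task accounting, in nanoseconds. */
struct tick_task_model {
    uint64_t sum_exec_runtime;       /* total time the task has run */
    uint64_t prev_sum_exec_runtime;  /* total at the moment it was last scheduled in */
    bool need_resched;               /* stands in for TIF_NEED_RESCHED */
};

/* Called on every scheduler tick for the running task. */
static void check_preempt_tick_model(struct tick_task_model *curr,
                                     uint64_t ideal_runtime)
{
    uint64_t delta_exec;

    assert(curr);
    delta_exec = curr->sum_exec_runtime - curr->prev_sum_exec_runtime;
    if (delta_exec > ideal_runtime)
        curr->need_resched = true;   /* the task used up its slice: reschedule later */
}
```

A spinner in the mutex midpath observes this flag through need_resched() and leaves the spin loop for the slow path.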
  • 28. Three cases for “cannot spin on mutex owner” • The lock owner is preempted • The spinning task is preempted • The lock owner sleeps
  • 29. [Case #4] Lock owner sleeps (reschedule): a test kernel module. Sleeping has the same effect as preemption and "wait for IO": the task is rescheduled. The module creates 4 kernel threads. Source code (github): test-modules/mutex/mutex.c
  • 30. [Case #4] Lock owner sleeps (reschedule): other tasks cannot spin. [Diagram, steps 1 to 8 on cores 0 to 3: kthread_0 to kthread_3 each run mutex_lock() → critical section with msleep() (reschedule: task->on_cpu goes 1 → 0) → mutex_unlock(). While the owner sleeps, the other threads see mutex_optimistic_spin() → mutex_can_spin_on_owner() return 0 because the lock owner's task->on_cpu = 0, so they do a non-busy wait (slow path); each thread becomes the owner in turn.] Call path: __mutex_trylock() || mutex_optimistic_spin()? Yes (midpath): preempt_enable → return. No (slow path): add the task to wait_list.
  • 31. [Case #4] Lock owner sleeps (reschedule): gdb watchpoint on task->on_cpu → who changes it?
  • 32. [Case #4] Lock owner sleeps (reschedule): Who changes task->on_cpu? task->on_cpu is set to 0 during a context switch.
  • 33. [Case #4] Lock owner sleeps (reschedule): gdb: other tasks cannot spin. [Diagram on cores 0 and 1: kthread_0 and kthread_1 each run mutex_lock() → critical section with msleep() (reschedule: task->on_cpu goes 1 → 0) → mutex_unlock(). While kthread_0 sleeps as owner, mutex_optimistic_spin() → mutex_can_spin_on_owner() returns 0 for kthread_1 because the lock owner's task->on_cpu = 0, so kthread_1 does a non-busy wait (slow path).] Call path: __mutex_trylock() || mutex_optimistic_spin()? Yes (midpath): preempt_enable → return. No (slow path): add the task to wait_list.
  • 34. [Case #4] Lock owner sleeps (reschedule): gdb: other tasks cannot spin. [Diagram: same two-core timeline and call path as slide 33.] gdb output: retval = 0 → cannot spin on this owner, because owner->on_cpu = 0.
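The Case #4 observation (on_cpu drops to 0 at a context switch, so the entry check for spinning fails) can be modelled in a few lines. This is a sketch under assumptions: `struct task_model`, `context_switch_model()` and `can_spin_on_owner_model()` are hypothetical names that imitate what the slides attribute to prepare_task()/finish_task() and mutex_can_spin_on_owner().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the on_cpu field of a task. */
struct task_model {
    int on_cpu;
};

/* Models a context switch: the incoming task goes on-CPU, the outgoing one goes off-CPU. */
static void context_switch_model(struct task_model *prev, struct task_model *next)
{
    assert(prev && next);
    next->on_cpu = 1;   /* role of prepare_task(next) */
    prev->on_cpu = 0;   /* role of finish_task(prev) */
}

/* Entry check before optimistic spinning; owner may be NULL when the lock is free. */
static int can_spin_on_owner_model(const struct task_model *owner,
                                   bool need_resched, bool owner_vcpu_preempted)
{
    int retval = 1;

    if (need_resched)
        return 0;                       /* the would-be spinner must yield the CPU */
    if (owner != NULL)
        retval = owner->on_cpu && !owner_vcpu_preempted;
    return retval;                      /* 0 → go to the slow path */
}
```

With an owner that called msleep(), the context switch clears on_cpu and the check returns 0, which matches the retval = 0 shown in the gdb session.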
  • 36. mutex_unlock(): Call path. mutex_unlock → __mutex_unlock_fast: [unlock task = lock->owner] no waiter, the 3 LSBs of lock->owner are cleared → lock->owner = 0 → return. Otherwise → __mutex_unlock_slowpath (there are waiters: at least one of the 3 LSBs of lock->owner is set): atomic_long_cmpxchg_release(&lock->owner, owner, __owner_flags(owner)) keeps only the 3 flag bits of lock->owner and clears the original task struct address → spin_lock(&lock->wait_lock) → get a waiter from lock->wait_list → __mutex_handoff (called if MUTEX_FLAG_HANDOFF is set) → spin_unlock(&lock->wait_lock) → wake_up_q → wake_up_process. The woken task will update lock->owner. struct mutex members: owner, wait_lock, osq, wait_list. lock->owner layout: bits 3 to 63 hold the owner task's virtual address; bit 0 = W, bit 1 = H, bit 2 = P. • task_struct pointers are aligned to at least L1_CACHE_BYTES, so the 3 LSBs of lock->owner are free to be used as flags ✓ W (MUTEX_FLAG_WAITERS) ◼ Non-empty waiter list; issue a wakeup when unlocking ✓ H (MUTEX_FLAG_HANDOFF) ◼ Unlock needs to hand the lock to the top waiter ◼ Used by ww_mutex because ww_mutex's waiter list is not in FIFO order ✓ P (MUTEX_FLAG_PICKUP) ◼ Handoff has been done and we're waiting for pickup ◼ Used by ww_mutex because ww_mutex's waiter list is not in FIFO order * ww_mutex (Wound/Wait Mutex): deadlock-proof mutex
  • 37. mutex_unlock(): fast path. mutex_unlock → __mutex_unlock_fast: [unlock task = lock->owner] no waiter, the 3 LSBs of lock->owner are cleared → lock->owner = 0 → return. [fast path] A spinner will take the lock.
  • 38. mutex_unlock(): slow path. Same call path, struct mutex layout and flag description (W, H, P) as slide 36. Example: before the unlock, lock->owner = owner task virtual address with bit 0 (W) = 1 and bits 1 and 2 = 0; after atomic_long_cmpxchg_release(&lock->owner, owner, __owner_flags(owner)), lock->owner = 0 in the address bits with bit 0 (W) still 1. The woken task will update lock->owner.
  • 39. mutex_unlock(): slow path. Same call path as slide 36. Example: before the unlock, lock->owner = owner task virtual address with bit 0 (W) = 1 and bits 1 and 2 = 0; after atomic_long_cmpxchg_release(&lock->owner, owner, __owner_flags(owner)), lock->owner = 0 in the address bits with bit 0 (W) still 1. The woken task will update lock->owner.
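The lock->owner encoding described above (task address in the upper bits, three flag bits at the bottom) can be sketched with plain bit masks. The flag values follow the bit positions given on the slides (bit 0 = W, bit 1 = H, bit 2 = P); `owner_task()` and `owner_flags()` are hypothetical helper names for this illustration.

```c
#include <assert.h>
#include <stdint.h>

#define MUTEX_FLAG_WAITERS 0x01UL  /* W: wait list is not empty */
#define MUTEX_FLAG_HANDOFF 0x02UL  /* H: unlock must hand the lock to the top waiter */
#define MUTEX_FLAG_PICKUP  0x04UL  /* P: handoff done, waiting for pickup */
#define MUTEX_FLAGS        0x07UL  /* mask covering the 3 LSBs */

/* Strips the flag bits and returns the owner task address (0 if unlocked). */
static uintptr_t owner_task(uintptr_t owner)
{
    return owner & ~(uintptr_t)MUTEX_FLAGS;
}

/* Returns only the flag bits of the owner word. */
static uintptr_t owner_flags(uintptr_t owner)
{
    return owner & (uintptr_t)MUTEX_FLAGS;
}
```

The trick works because the task address is aligned to far more than 8 bytes, so its lowest three bits are always zero and can carry state without a second word.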
  • 40. Resume here: kthread_0 wakes up kthread_1 Woken task 1 Context switch 2
  • 43. * Bit 0 is still set (MUTEX_FLAG_WAITERS): The upcoming mutex_unlock() will wake up the waiter instead of spinner. * Bit 1 (MUTEX_FLAG_HANDOFF) is cleared from __mutex_trylock->__mutex_trylock_or_owner. Update lock->owner When/who clears 3-bit LSB of lock->owner?
  • 44. Woken task: When/who to clear 3-bit LSB of lock-owner? Clear 3-bit LSB of lock->owner if no waiters
  • 45. mutex_unlock __mutex_unlock_fast mutex_unlock(): Mutex ownership return locker->owner = 0 __mutex_unlock_slowpath Have waiters: One of 3-bit LSB of lock->owner is not cleared. spin_lock(&lock->wait_lock) spin_unlock(&lock->wait_lock) Get a waiter from lock->wait_list wake_up_q wake_up_process __mutex_handoff Called if MUTEX_FLAG_HANDOFF is set atomic_long_cmpxchg_release( &lock->owner, owner, __owner_flags(owner)) The woken task will update lock->owner Set 3-bit LSB of lock->owner → Clear the original task struct address [Unlock task = lock->owner] No waiter: 3-bit LSB of lock->owner are cleared • [Fastpath] Check ownership of a mutex • [Slowpath] Does not check ownership of a mutex
  • 46. mutex_unlock(): [lab] Behavior observation when lock owner != unlocker’s task (Mutex ownership). [Timeline] kthread_0 (lock_thread, core 0): mutex_lock() → critical section, sleep 3 seconds → mutex_unlock(). kthread_1 (unlock_thread, core 1): sleep 1 second → mutex_unlock(). Source code: test-modules/mutex-unlock-by-another-task/mutex.c. Note: This scenario is created on purpose for demonstration. It won’t happen in a real case.
  • 47. mutex_unlock(): [lab] Behavior observation when lock owner != unlocker’s task (Mutex ownership). [Timeline as on slide 46] Can another task unlock the mutex? Note: This scenario is created on purpose for demonstration. It won’t happen in a real case.
  • 48. mutex_unlock(): slow path does not check unlocker’s ownership. __mutex_unlock_slowpath: atomic_long_cmpxchg_release(&lock->owner, owner, __owner_flags(owner)) → spin_lock(&lock->wait_lock) → get a waiter from lock->wait_list → __mutex_handoff → spin_unlock(&lock->wait_lock) → wake_up_q → wake_up_process. [DEBUG_MUTEXES] Prints a warning message if unlocker’s task != lock owner’s task.
  • 49. mutex_unlock(): [lab] Behavior when lock owner != unlocker’s task. [Debugger screenshot] (1) Breakpoint. (2) Lock owner. (3) Unlocker’s task != lock owner. [Timeline as on slide 46, with a “We’re here” marker]
  • 50. mutex_unlock(): [lab] Behavior if lock owner != unlocker’s task. [Debugger screenshot] (1) Breakpoint. (2) lock owner = old. (3) lock->owner is set to 0 by atomic_long_cmpxchg_release(). [Timeline as on slide 46, with a “We’re here” marker]
  • 51. mutex_unlock(): [lab] Behavior observation when lock owner != unlocker’s task (Mutex ownership). [Timeline as on slide 46, with steps 1, 2, 3 marked] This unlocks the locked mutex. Resulting mutex state: owner = 0, wait_lock = 0, osq = 0, wait_list. Breakpoint stops at the second mutex_unlock(), in kthread_0.
  • 52. mutex_unlock(): [lab] Behavior observation when lock owner != unlocker’s task (Mutex ownership). [Timeline as on slide 46, with steps 1, 2, 3 marked] This unlocks the locked mutex. What’s the behavior?
  • 53. mutex_unlock(): [lab] Behavior when lock owner != unlocker’s task. [Debugger screenshot] (1) Breakpoint. (2) No lock owner. [Timeline as on slide 46, with a “We’re here” marker]
  • 54. mutex_unlock(): [lab] Behavior if lock owner != unlocker’s task. [Debugger screenshot, steps 1 to 4] (1) Breakpoint. (2) lock owner = old. (3) lock->owner is set to 0 by atomic_long_cmpxchg_release(). [Timeline as on slide 46, with a “We’re here” marker]
  • 55. mutex_unlock(): [lab] Behavior observation when lock owner != unlocker’s task (Mutex ownership). [Timeline as on slide 46] Takeaways 1. [Fastpath] The Linux kernel checks the mutex’s ownership. 2. [Slowpath] The Linux kernel does not check the mutex’s ownership when unlocking a mutex ✓ Developers must keep mutex_lock/mutex_unlock calls correctly paired ✓ The slowpath prints a warning message if the mutex debug option is enabled ✓ This differs from the concept: only the lock owner has permission to unlock the mutex
  • 56. mutex_unlock(): [lab] Behavior observation when lock owner != unlocker’s task (Mutex ownership). Why doesn’t the slowpath check ownership? 1. Ownership checking is only an aid for developers and is not enforced by the Linux kernel ✓ Developers need to take care of it. Quotes 1. From Generic Mutex Subsystem ✓ Mutex semantics • Only one task can hold the mutex at a time. • Only the owner can unlock the mutex. • … ✓ These semantics are fully enforced when CONFIG_DEBUG_MUTEXES is enabled
  • 57. Think about… [Timeline] core 0: mutex_lock() → critical section, sleep 3 seconds → mutex_unlock(). core 1: sleep 1 second → mutex_unlock(). core 2: mutex_lock() → critical section → mutex_unlock(). (1) Will it acquire the mutex lock successfully? (2) Will this unlock the mutex lock acquired by core 2?
  • 58. Q&A #1: Semaphore can be used to synchronize with user-space. [Semaphore] Producer/Consumer Concept. [Figure: Producer, buffer, Consumer, with up() and down()] Principle • Consumer waits if buffer is empty • Producer waits if buffer is full • Only one process can manipulate the buffer at a time (mutual exclusion). * Screenshot captured from: Chapter 10, Linux Kernel Development, 3rd Edition, Robert Love
  • 59. Q&A #1: Semaphore can be used to synchronize with user-space. [Semaphore] Producer/Consumer Concept. [Figure: Producer, buffer, Consumer, with up() (V) and down() (P)] Code reference: CS 537 Notes, Section #6: Semaphores and Producer/Consumer Problem
  • 60. Q&A #1: Semaphore can be used to synchronize with user-space. [Semaphore] Producer/Consumer Concept. [Figure: Producer, buffer, Consumer, with up() (V) and down() (P)] Code annotations: data structure synchronization; wait if buffer is full; notify consumer; wait if buffer is empty; notify producer. Code reference: CS 537 Notes, Section #6: Semaphores and Producer/Consumer Problem
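The counting behind the annotations on this slide can be sketched as a single-threaded C model. It is not the code from the CS 537 notes and not kernel code: `toy_sem`, `toy_queue`, and the helper functions are hypothetical, `BUF_SLOTS` is an assumed buffer size, and the "wait" steps are modeled as a `false` return instead of sleeping. The semaphore that serializes access to the buffer is omitted because nothing runs concurrently here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BUF_SLOTS 4 /* assumed buffer size, chosen for illustration */

/* Hypothetical counting semaphore model: just the counter. */
struct toy_sem {
    int count;
};

/* P / down(), non-blocking variant: take one unit if any is available.
 * A real down() would put the caller to sleep instead of returning false. */
static bool toy_try_down(struct toy_sem *s)
{
    if (s->count > 0) {
        s->count--;
        return true;
    }
    return false;
}

/* V / up(): release one unit. A real up() would also wake a waiter. */
static void toy_up(struct toy_sem *s)
{
    s->count++;
}

struct toy_queue {
    int slots[BUF_SLOTS];
    size_t head, tail;
    struct toy_sem empty; /* free slots: the producer waits on this */
    struct toy_sem full;  /* filled slots: the consumer waits on this */
};

static void toy_queue_init(struct toy_queue *q)
{
    q->head = q->tail = 0;
    q->empty.count = BUF_SLOTS;
    q->full.count = 0;
}

/* Producer step: "wait if buffer is full", store, then "notify consumer". */
static bool toy_produce(struct toy_queue *q, int value)
{
    if (!toy_try_down(&q->empty))
        return false; /* buffer full: a real producer would sleep here */
    q->slots[q->tail] = value;
    q->tail = (q->tail + 1) % BUF_SLOTS;
    toy_up(&q->full);
    assert(q->empty.count + q->full.count == BUF_SLOTS);
    return true;
}

/* Consumer step: "wait if buffer is empty", load, then "notify producer". */
static bool toy_consume(struct toy_queue *q, int *out)
{
    if (!toy_try_down(&q->full))
        return false; /* buffer empty: a real consumer would sleep here */
    *out = q->slots[q->head];
    q->head = (q->head + 1) % BUF_SLOTS;
    toy_up(&q->empty);
    assert(q->empty.count + q->full.count == BUF_SLOTS);
    return true;
}
```

Note that the producer's down() and the consumer's up() act on the same counter from different tasks, which is why a semaphore, having no owner, fits this pattern.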
  • 61. Q&A #1: Semaphore can be used to synchronize with user-space. [Semaphore] Producer/Consumer Concept. Possible scenario: [Figure: Producer and Consumers in user space reach the buffer in kernel space through system calls] Note 1. up()/down() invocations are done in the kernel.
  • 62. Q&A #2: Mutex isn’t suitable for synchronization between kernel and user-space. Explanation 1. [Mutex] Ownership! ✓ Whoever locked a mutex must unlock it. * Screenshot captured from: Chapter 10, Linux Kernel Development, 3rd Edition, Robert Love
  • 63. Reference • Generic Mutex Subsystem • Wound/Wait Deadlock-Proof Mutex Design • Mutexes and Semaphores Demystified • MCS locks and qspinlocks • Linux中的mutex机制[一] - 加锁和osq lock (The mutex mechanism in Linux [1]: locking and the OSQ lock)
  • 65. mutex_lock(): slowpath. spin_lock(&lock->wait_lock) → __mutex_add_waiter(lock, &waiter, &lock->wait_list) → waiter.task = current → set_current_state(state) → for (;;) { __mutex_trylock (goto the ‘acquired’ label if the lock is obtained); signal_pending_state (interrupted by signal: break); spin_unlock(&lock->wait_lock); schedule_preempt_disabled; __mutex_set_flag(lock, MUTEX_FLAG_HANDOFF); __mutex_trylock() || (first && mutex_optimistic_spin()) (Y: break; N: spin_lock(&lock->wait_lock) and loop again) } → spin_lock(&lock->wait_lock) → acquired: __set_current_state(TASK_RUNNING) → mutex_remove_waiter → if list_empty(&lock->wait_list): __mutex_clear_flag(lock, MUTEX_FLAGS) → spin_unlock(&lock->wait_lock) → preempt_enable