Linux synchronization tools
Why Synchronization:
A critical region is a section of code that must be completely executed by the
control path that enters it before any other control path can enter it.
Thus a kernel locking mechanism is required to provide:
• Synchronization
• Data integrity
If the kernel supports interleaving of control paths, protecting the critical region becomes
mandatory, as one control flow might not finish before another flow becomes active.
Linux provides several synchronization tools in kernel space.
Why synchronization tools in kernel space?
Kernel space also has different execution paths. With a preemptive kernel,
interleaving becomes a little more complicated and more prominent.
Asynchronous events like interrupts will always remain in the system.
Kernel preemption:
In a non-preemptive kernel, a process executing in kernel space (a kernel thread, or a
user thread that entered the kernel via a system call) is allowed to complete its
execution unless it voluntarily relinquishes the CPU.
In a preemptive kernel, the scheduler can switch a process out before it finishes, based
on scheduling criteria. This was introduced to decrease process latency and increase
responsiveness.
Kernel space execution paths:
Process / thread
Interrupt
Bottom-half handlers – tasklets, softirqs
Kernel locking for different paths:
Process/thread:
- Between processes/threads, disabling kernel preemption provides synchronicity.
Interrupts:
- Interrupts are asynchronous and require synchronicity between interrupts, bottom
halves and processes to protect the critical regions shared between them.
o Disabling interrupts, or a specific interrupt line, can provide synchronicity.
Thus practically, disabling preemption and interrupts can provide synchronicity.
Why other tools?
> On SMP, the above methods will not work.
On SMP the need for such tools becomes mandatory, because disabling preemption or
interrupts on one CPU does not stop simultaneous execution of the same code path on
another CPU.
Spinlocks:
Although spinlocks were designed for SMP systems, their use is encouraged in
uniprocessor kernel code as well, to keep the code portable to SMP systems.
What is a spinlock?
If a kernel control path finds the lock available, it acquires the lock and proceeds.
If the lock is unavailable, it spins in a tight loop until the lock is made
available. This means the control path keeps executing continuously, blocking the
CPU from any other code path execution.
Linux files related to the spinlock implementation:
include/asm-arm/spinlock.h: architecture-dependent code for the spinlock implementation
include/linux/spinlock.h
kernel/spinlock.c
Spinlock usage:
- Define a spinlock variable:
spinlock_t var = SPIN_LOCK_UNLOCKED; (in the classic implementation, 1 means
unlocked and 0 locked)
spin_lock_init() - set the spinlock to the unlocked state
- To acquire the lock:
• spin_lock_irq()
• spin_lock_irqsave()
• spin_lock_bh()
• spin_lock()
• spin_trylock()
The spin lock acquisition calls:
• The _irq/_irqsave/_bh variants disable interrupts or bottom halves; plain
spin_lock() does not.
• All variants disable kernel preemption.
• The lock routine decrements the spinlock count (the count can go to a large
negative number, indicating the demand for the lock) –> (_raw_spin_lock, in the
architecture-dependent code)
• It checks whether the count is equal to zero (if not, it keeps looping).
• On a uniprocessor, a spinlock call just disables kernel preemption (plus
interrupts, for the _irq variants).
Which spinlock call to use:
If the critical section is not shared with interrupts and is only present in process context
(system calls), then spin_lock() can be used.
spin_lock_irqsave() should be used in all other cases where the critical section is shared
between interrupt and process paths.
spin_trylock() checks whether the spinlock can be acquired and returns immediately.
The caller has to decide on the action based on the return value.
spin_lock_bh() disables bottom-half (softirq) processing rather than hardware interrupts,
which is sufficient when the critical section is shared only with bottom halves.
Spinlocks are not recursive. Therefore a spinlock must not be re-acquired by its holder.
A spinlock should be acquired and released without any wait/sleep in between. Calls that
can lead to sleep must not be used while a spinlock is held, as interrupts/preemption are
disabled.
1. spin_lock()
2. spin_lock_irq()
3. spin_lock_irqsave()
spin_lock() is to be used when no protection is required from interrupts/BHs. There is
only one kind of user of the critical section, and it still provides protection on SMP.
If protection from interrupts is required, and it is certain that interrupts are enabled at
the time the critical section is entered, use the 2nd option. This is because at the
unlock stage it will enable interrupts regardless of their state beforehand.
If the state of interrupts is unknown at the time of entering the critical section, use the
3rd option, which saves the interrupt state and restores it at unlock.
New spinlock implementation in the Linux world (ticket spinlock):
This is a new spinlock implementation introduced in recent Linux versions.
The purpose of the change:
A spinlock on SMP systems can be contended from multiple paths, so the value of the
classic spinlock can become a large negative number.
On release of the spinlock, any one of the waiting paths could acquire it, instead of the
one that requested the spinlock first.
The new spinlock is named the ticket spinlock; it makes sure that the first requester of
the spinlock gets it once it is released back to the system.
Memory Optimization
A system in production can save memory through spinlock optimization, by changing the
spinlock functions from inline functions to out-of-line function calls.
On a development system, this change might make debugging difficult: every time system
profiling is done, it will show the spinlock function instead of the function in which the
spinlock API is used.
Semaphore:
How does a semaphore work?
A semaphore is a lock based on a counter (count), which indicates the number of
processes that can concurrently acquire the resource (the semaphore lock).
down() – to acquire the lock before entering the critical section:
- Test whether the lock is available
- If available, proceed,
- Else suspend the execution path (push the process out of the run queue and put it
on a wait queue)
As a semaphore wait puts the control path to sleep, semaphores can be used only in
process context. They must therefore be avoided in interrupt context: ISRs, BHs,
exceptions.
up() – to release the lock at the end of the critical section:
- Release the lock
- If other threads are waiting for the lock, wake up the processes
- Else increment the count and exit
Files:
include/asm-mips/semaphore.h
arch/mips/kernel/semaphore.c
API:
sema_init() - initialize the semaphore count with the given value
down() – try to enter the critical section by decreasing the semaphore count; if it is
unavailable, mark the thread UNINTERRUPTIBLE and sleep (putting the thread on the
wait queue)
down_interruptible() – same as down(), but marking the thread INTERRUPTIBLE
down_trylock() – return immediately with success or failure
up() – release the semaphore lock
Special (binary) semaphores:
A binary semaphore is called a mutex. The operations (API) on a mutex remain the same
as for a semaphore, with an initial count value of 1.
API:
DECLARE_MUTEX
DECLARE_MUTEX_LOCKED
init_MUTEX
init_MUTEX_LOCKED
New mutexes:
These are different from the "mutex" name previously used for binary semaphores.
They are a new implementation, present in Linux from kernel 2.6.16 onwards.
Features:
only one task can hold the lock at a time
only the owner can unlock
multiple unlocks and recursive locking are not allowed
a process may not exit with a mutex held
memory areas where held locks reside must not be freed
Files:
kernel/mutex.c
include/linux/mutex.h
include/linux/lockdep.h
include/asm-generic/mutex-xxx.h
API:
DEFINE_MUTEX
mutex_init
mutex_lock
mutex_lock_interruptible
mutex_trylock
mutex_unlock
Some other Linux synchronization tools:
Atomic Operations:
To make read-modify-write operations atomic, some mechanism must be provided;
atomic operations are defined to meet this requirement.
Atomic operations depend on the processor instructions provided for atomicity.
On MIPS, atomicity is based on LL/SC (load-linked / store-conditional):
load-linked returns the current value of a memory location. A subsequent store-conditional
to the same memory location will store a new value only if no updates have occurred to
that location since the load-link. If any update has occurred, the store-conditional is
guaranteed to fail, even if the value read by the load-link has since been restored.
File:
include/asm-mips/atomic.h
API:
atomic_t my_atomic_counter = ATOMIC_INIT(0);
atomic_read(v)
Atomically read the value of v.
atomic_set(i, v)
Atomically set the value of v to i.
atomic_add(i, v)
Atomically add i to the value of v.
atomic_sub(i, v)
Atomically subtract i from the value of v.
atomic_sub_and_test(i, v)
Atomically subtract i from v and return true if and only if the result is zero.
atomic_inc(v)
Atomically increase the value of v by 1.
atomic_dec(v)
Atomically decrease the value of v by 1.
atomic_dec_and_test(v)
Atomically decrease the value of v by 1 and return true if and only if the result is
zero.
atomic_inc_and_test(v)
Atomically increase the value of v by 1 and return true if and only if the result is
zero.
atomic_add_negative(i, v)
Atomically add i to v and return true if and only if the result is negative.
atomic_add_return(i, v)
Atomically add i to the value of v and return the result.
atomic_sub_return(i, v)
Atomically subtract i from the value of v and return the result.
atomic_inc_return(v)
Atomically increase the value of v by 1 and return the result.
atomic_dec_return(v)
Atomically decrease the value of v by 1 and return the result.
Memory barriers
A memory barrier primitive ensures that the instructions before the primitive are
completed before the instructions after the primitive are executed.
On MIPS, the sync instruction provides the basic memory barrier.
API:
Memory barriers are implemented in Linux as architecture-dependent macros
(<asm/system.h>). The most common ones are:
barrier();
Prevent compile-time reordering by inserting optimization barrier (empty
code, thus no performance loss).
mb();
Prevent read, write and optimization reordering (SMP and UP).
rmb();
Prevent read and optimization reordering (SMP and UP).
wmb();
Prevent write and optimization reordering (SMP and UP).
smp_mb();
Prevent read, write and optimization reordering (SMP only).
smp_rmb();
Prevent read and optimization reordering (SMP only).
smp_wmb();
Prevent write and optimization reordering (SMP only).
Others:
Completions
RCU
Preemption
Interrupts
Per-cpu variables
RW semaphores
Futex
SeqLocks