This document describes FlexSC, a system that aims to reduce the costs of synchronous system calls by decoupling system call invocation from execution. FlexSC uses a shared memory region containing system call entries to asynchronously schedule system calls on dedicated kernel threads using workqueues. When a user program issues a system call, the entry is marked as submitted. A scanner thread then queues work to a worker thread to execute the system call asynchronously. The worker thread updates the entry status and return value upon completion. However, the current FlexSC implementation has limitations as kernel threads cannot fully access user address spaces. Further work is needed to address these limitations.
2. A few comments before we start
●
Does not strictly follow the specification in the paper (e.g., the syscall thread is replaced by a workqueue worker thread)
●
The paper contains two programmable parts
– FlexSC: the exception-less system call mechanism
– libflexsc: developed for applying FlexSC to real applications (e.g., Apache). It is compliant with standard POSIX threads, which requires modifying the glibc syscall wrappers and the POSIX libraries
●
This work focuses on the FlexSC implementation
3. When a system call occurs
[Figure: user/kernel timeline; the system call raises an exception into the kernel while the user program is blocked]
Traditional exception-based system calls have two drawbacks:
1. Direct cost
2. Indirect cost
6. Costs of synchronous syscalls
●
Mode switch cost (Direct cost)
– flushing the user-mode pipeline
– saving a few registers onto the kernel stack
– allocating an execution stack
– changing the protection domain (ring 3 → ring 0)
– redirecting execution to the registered exception handler
– returning control back to user
●
Processor structure pollution (Indirect cost)
– user-mode state is replaced by kernel-mode state
– Processor structures: L1 data and instruction caches, TLB, branch prediction tables, prefetch buffers, and the unified L2 and L3 caches → cache pollution
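The direct cost can be estimated from user space. The C sketch below is an illustration that is not part of the FlexSC slides: it times a loop of raw `getpid` calls made through `syscall(SYS_getpid)` with `clock_gettime(CLOCK_MONOTONIC)`. Because `getpid` does almost no work in the kernel, the average is dominated by the mode-switch (direct) cost. The indirect cost of cache and TLB pollution only appears later as slower user code, which this loop does not isolate. The function name `avg_syscall_ns` is hypothetical.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Average wall-clock cost, in nanoseconds, of one raw getpid system call.
 * syscall(SYS_getpid) is used instead of getpid() so that every iteration
 * really enters the kernel. Returns 0.0 for a non-positive iteration count. */
double avg_syscall_ns(long iterations)
{
    struct timespec start, end;

    if (iterations <= 0)
        return 0.0;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 0; i < iterations; i++)
        syscall(SYS_getpid);            /* one user -> kernel -> user round trip */
    clock_gettime(CLOCK_MONOTONIC, &end);

    int64_t elapsed_ns = (int64_t)(end.tv_sec - start.tv_sec) * 1000000000LL
                       + (end.tv_nsec - start.tv_nsec);
    return (double)elapsed_ns / (double)iterations;
}
```

For example, `avg_syscall_ns(1000000)` gives the average cost per call; the number depends on the CPU, the kernel version, and mitigations such as kernel page-table isolation.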
7. Benefits of FlexSC
●
Lower direct cost
– Fewer mode switches, by separating cores into user-mode cores and kernel-mode cores
●
Lower indirect cost
– Decoupling execution from invocation through system call scheduling on dedicated cores
●
Linux policy: activation and execution are bound together → Is this policy always good?
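8. Overview: when a user invokes a system call
[Figure: user program and libflexsc, the shared-memory syspage (sysentry[0] ... sysentry[n]), the scanner thread, the workqueue, and worker threads on separate CPUs]
1. The user program calls syscall() through libflexsc
2. libflexsc invokes the system call through a memory store operation (a normal user instruction) into a sysentry in the shared syspage
3. The scanner thread finds the submitted entry and calls queue_work() to hand it to the workqueue
4. A worker thread executes the system call and stores the return value back into the sysentry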
9. system call entry(sysentry)
●
A collection of the information needed to execute a given system call
●
status = {free, submitted, busy, done}
●
Why 64 bytes? An even smaller size is possible
– 64 is a divisor of the popular cache line sizes of today’s processors
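A sysentry can be sketched as a C struct. The field layout below is hypothetical (a 32-bit syscall number, a 16-bit argument count and status, six 64-bit arguments, and a 64-bit return value), chosen only because it adds up to exactly 64 bytes; the slides do not list the fields. `_Alignas(64)` makes every entry start on a 64-byte boundary, so with 64- or 128-byte cache lines no entry straddles two lines. The compile-time checks document that assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Entry states from the slide. */
enum sysentry_status {
    SYSENTRY_FREE = 0,
    SYSENTRY_SUBMITTED,
    SYSENTRY_BUSY,
    SYSENTRY_DONE,
};

/* Hypothetical layout: everything needed to execute one system call,
 * packed into exactly one 64-byte, 64-byte-aligned slot. */
struct sysentry {
    _Alignas(64) int32_t sysnum; /* system call number, e.g. SYS_getpid */
    int16_t nargs;               /* number of valid entries in args[] */
    int16_t status;              /* one of enum sysentry_status */
    int64_t args[6];             /* Linux system calls take at most 6 arguments */
    int64_t ret;                 /* return value written by the kernel side */
};

_Static_assert(sizeof(struct sysentry) == 64, "sysentry must fill one 64-byte slot");
_Static_assert(offsetof(struct sysentry, args) == 8, "args must follow the 8-byte header");
```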
10. State transition of sysentry
Free → Submitted → Busy → Done → Free
(U: user thread, K: kernel thread)
– Free → Submitted: U invokes a system call
– Submitted → Busy: the scanner thread queues a work item once the state becomes Submitted, and K (the worker thread) executes the system call
– Busy → Done: K finishes the system call and records the return value
– Done → Free: U consumes the return value
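The transitions above can be enforced with an atomic compare-and-swap, so that a user thread and a kernel thread never act on the same entry at once. This is a user-space C11 sketch that assumes the status word is an `atomic_int`; the name `sysentry_transition` is hypothetical. The default sequentially consistent ordering of the CAS also makes the arguments written before the Free → Submitted step visible to the thread that later performs Submitted → Busy.

```c
#include <stdatomic.h>
#include <stdbool.h>

enum { SYSENTRY_FREE, SYSENTRY_SUBMITTED, SYSENTRY_BUSY, SYSENTRY_DONE };

/* The four legal edges: Free -> Submitted (user invokes),
 * Submitted -> Busy (kernel starts), Busy -> Done (kernel records result),
 * Done -> Free (user consumes result). */
static bool legal_edge(int from, int to)
{
    return (from == SYSENTRY_FREE      && to == SYSENTRY_SUBMITTED) ||
           (from == SYSENTRY_SUBMITTED && to == SYSENTRY_BUSY)      ||
           (from == SYSENTRY_BUSY      && to == SYSENTRY_DONE)      ||
           (from == SYSENTRY_DONE      && to == SYSENTRY_FREE);
}

/* Atomically move an entry from `from` to `to`. Returns false if the edge is
 * illegal or if the status is no longer `from` (another thread got there first). */
bool sysentry_transition(atomic_int *status, int from, int to)
{
    if (!legal_edge(from, to))
        return false;
    int expected = from;
    return atomic_compare_exchange_strong(status, &expected, to);
}
```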
12. Overview: shared memory
[Figure: the syspage (sysentry[0] ... sysentry[n]) sits at a memory address visible to both user and kernel; the scanner thread reads the entries, and a worker thread executes the syscall, stores the return value, and updates the status]
● The worker thread enables asynchronous system calls and system call scheduling
15. Scanner thread
●
It scans through the syspage and checks whether each entry’s state is “submitted”
●
If an entry’s state is “submitted”, it puts a work item into the workqueue
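One scanner pass can be sketched in user-space C. The `dispatch` callback stands in for `queue_work_on()`, and the names `scan_syspage`, `struct entry`, and `count_dispatch` are hypothetical. The sketch claims each entry (Submitted → Busy) before dispatching it so that the next pass does not queue it twice; the slides do not say which thread performs that transition, so treat this as one possible design.

```c
#include <stdatomic.h>
#include <stddef.h>

#define NUM_SYSENTRY 64

enum { SYSENTRY_FREE, SYSENTRY_SUBMITTED, SYSENTRY_BUSY, SYSENTRY_DONE };

struct entry {                 /* hypothetical, reduced sysentry */
    atomic_int status;
    long sysnum;
};

/* Stand-in for queue_work_on(): receives the index of an entry to execute. */
typedef void (*dispatch_fn)(size_t index, void *ctx);

/* One pass over the syspage. Each Submitted entry is claimed with a CAS
 * (Submitted -> Busy) and then dispatched. Returns the number dispatched. */
size_t scan_syspage(struct entry *page, size_t n, dispatch_fn dispatch, void *ctx)
{
    size_t queued = 0;
    for (size_t i = 0; i < n; i++) {
        int expected = SYSENTRY_SUBMITTED;
        if (atomic_compare_exchange_strong(&page[i].status, &expected, SYSENTRY_BUSY)) {
            dispatch(i, ctx);
            queued++;
        }
    }
    return queued;
}

/* Example dispatcher: counts dispatched entries in *(size_t *)ctx. */
void count_dispatch(size_t index, void *ctx)
{
    (void)index;
    (*(size_t *)ctx)++;
}
```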
17. Shared memory issues
● One way for a kernel thread to access shared user memory is to fork the user process
● In 2.6.xx kernels, kernel_thread(thread_fn, arg, CLONE_VM | CLONE_FS | CLONE_FILES) is a simple solution
● But on recent kernels, forking the user process this way is simply not possible: it results in a segfault
● kernel_thread() causes a segfault
(http://www.spinics.net/lists/newbies/msg57445.html)
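What the CLONE_VM | CLONE_FS | CLONE_FILES flags share can be observed from user space with clone(2). This sketch is not kernel_thread() and does not reproduce the segfault; it only shows what sharing would give the servicing thread: a store made by the child is visible in the parent (CLONE_VM), and a descriptor opened by the child is valid in the parent (CLONE_FILES). CLONE_FS, which shares the working directory and umask, is passed but not checked. The function name `clone_shares_state` is hypothetical.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

static int shared_value;     /* written by the child, read by the parent */
static int child_fd = -1;    /* descriptor opened by the child */

/* Runs in the cloned child. With CLONE_VM its stores land in the parent's
 * memory; with CLONE_FILES its open() adds to the parent's fd table. */
static int child_fn(void *arg)
{
    shared_value = *(int *)arg;
    child_fd = open("/dev/null", O_RDONLY);
    return 0;
}

/* Returns 1 if the child's write and its new descriptor are both visible
 * in the parent, 0 if not, -1 on error. */
int clone_shares_state(void)
{
    const size_t stack_size = 64 * 1024;
    char *stack = malloc(stack_size);
    int value = 42;

    if (stack == NULL)
        return -1;

    /* The stack grows down on x86-64, so pass the top of the buffer. */
    pid_t pid = clone(child_fn, stack + stack_size,
                      CLONE_VM | CLONE_FS | CLONE_FILES | SIGCHLD, &value);
    if (pid == -1 || waitpid(pid, NULL, 0) == -1) {
        free(stack);
        return -1;
    }
    free(stack);

    int ok = shared_value == 42 && child_fd >= 0 &&
             fcntl(child_fd, F_GETFD) != -1;
    if (child_fd >= 0)
        close(child_fd);
    return ok;
}
```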
18. syspage allocation from user space
● Allocate memory in page units so it can be shared with the kernel page by page
● Lock the mapped page so it is not swapped out before it is pinned into kernel space
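These allocation steps can be sketched with mmap(2) and mlock(2). This is an assumption about how the user side might do it, since the slides do not show the code: an anonymous mmap returns zeroed, page-aligned memory, and mlock keeps that page resident until the kernel pins it. `alloc_syspage` and `free_syspage` are hypothetical names.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Allocate one zeroed, page-aligned syspage and lock it in RAM.
 * Stores the length in *len_out. Returns NULL on failure. */
void *alloc_syspage(size_t *len_out)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return NULL;
    size_t len = (size_t)page;

    /* Anonymous mappings are page-aligned and zero-filled. */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    /* Keep the page resident; fails if RLIMIT_MEMLOCK is below one page. */
    if (mlock(p, len) != 0) {
        munmap(p, len);
        return NULL;
    }
    *len_out = len;
    return p;
}

/* Undo alloc_syspage(). */
void free_syspage(void *p, size_t len)
{
    munlock(p, len);
    munmap(p, len);
}
```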
19. Mapping syspage into kernel virtual address space
●
A kernel thread cannot simply access the user address space
●
Generally, accessing physical addresses directly is not recommended
●
So the user page must be mapped to a kernel virtual page
src: https://en.wikipedia.org/wiki/X86-64#Virtual_address_space_details
[Figure: x86-64 virtual address space, split into kernel space and user space]
22. Workqueue: asynchronous execution mechanism
●
Workqueue is an asynchronous execution
mechanism which is widely used across
the kernel
●
It's used for various purposes from simple
context bouncing to hosting a persistent
in-kernel service thread
23. Workqueue design & problems of the legacy workqueue (~2010)
src: http://events.linuxfoundation.org/sites/events/files/slides/Async%20execution%20with%20wqs.pdf
25. workqueue in FlexSC
● It is used to execute asynchronous system calls
● The scanner thread calls queue_work_on(cpu, workqueue, work) to wake up a worker thread
● It enables “Decoupling Execution from Invocation” by specifying a particular CPU
– The CPU that invokes a system call from user space is different from the CPU that executes it
– This reduces the indirect cost
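Decoupling by CPU can be imitated in user space. The sketch below is a hypothetical analogue of queue_work_on(), not the kernel workqueue API: it runs a function on a thread pinned to a chosen CPU with the GNU extension pthread_attr_setaffinity_np() and reports the CPU it actually ran on via sched_getcpu(). `run_on_cpu` and `example_work` are illustrative names.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

struct pinned_job {
    void (*fn)(void *);
    void *arg;
    int ran_on;                   /* CPU observed while running fn */
};

static void *pinned_trampoline(void *p)
{
    struct pinned_job *job = p;
    job->fn(job->arg);
    job->ran_on = sched_getcpu(); /* which core executed the work */
    return NULL;
}

/* Run fn(arg) on a thread pinned to `cpu` and wait for it.
 * Returns the CPU the work ran on, or -1 on error. */
int run_on_cpu(int cpu, void (*fn)(void *), void *arg)
{
    cpu_set_t set;
    pthread_attr_t attr;
    pthread_t tid;
    struct pinned_job job = { fn, arg, -1 };

    if (cpu < 0 || cpu >= CPU_SETSIZE)
        return -1;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    if (pthread_attr_init(&attr) != 0)
        return -1;
    if (pthread_attr_setaffinity_np(&attr, sizeof(set), &set) != 0 ||
        pthread_create(&tid, &attr, pinned_trampoline, &job) != 0) {
        pthread_attr_destroy(&attr);
        return -1;
    }
    pthread_attr_destroy(&attr);
    pthread_join(tid, NULL);
    return job.ran_on;
}

/* Example work item: increments the int pointed to by arg. */
void example_work(void *arg)
{
    ++*(int *)arg;
}
```

Calling `run_on_cpu(2, example_work, &n)` from a thread running on CPU 0 mirrors the slide's idea: invocation happens on one CPU, execution on another. With glibc older than 2.34, link with `-pthread`.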
26. workqueue in FlexSC: Initialization
●
Create a workqueue
●
Set the maximum number of active worker threads to NUM_SYSENTRY (= 64)
29. Interface for user program
● New system calls
– flexsc_register(): initializes the FlexSC system and allows the calling process to use FlexSC
– flexsc_exit(): terminates the FlexSC system
– flexsc_wait() (not implemented yet): for when the user process has nothing to do except wait for system call execution
● libflexsc
– makes FlexSC easy to use for user programs
– the paper extends libflexsc to be compatible with standard POSIX threads
– In our case, it just provides syscall wrapper functions and some initialization (setting CPU affinity, allocating the syspage, locking the page, ...)
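The wrapper idea can be sketched end to end in user space. Everything here is hypothetical: `flexsc_syscall3`, `struct entry`, and the pthread that plays the role of the scanner and worker threads are illustrations, not the project's code, and the kernel side is simulated with syscall(2). The wrapper publishes the call with ordinary memory stores and spins until the entry is Done; that spin is the point where flexsc_wait() or switching to another user thread would help. Note that syscall(2) returns -1 and sets errno on failure, whereas a real kernel-side worker would record the raw negative error code.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <sys/syscall.h>
#include <unistd.h>

enum { SYSENTRY_FREE, SYSENTRY_SUBMITTED, SYSENTRY_BUSY, SYSENTRY_DONE };

struct entry {                 /* hypothetical, reduced sysentry */
    atomic_int status;
    long sysnum;
    long args[3];
    long ret;
};

/* User side: assumes one user thread owns the entry. No trap is issued;
 * the call is published with plain stores plus one atomic status store. */
long flexsc_syscall3(struct entry *e, long sysnum, long a0, long a1, long a2)
{
    if (atomic_load(&e->status) != SYSENTRY_FREE)
        return -1;                                /* entry still in use */
    e->sysnum = sysnum;
    e->args[0] = a0;
    e->args[1] = a1;
    e->args[2] = a2;
    atomic_store(&e->status, SYSENTRY_SUBMITTED); /* Free -> Submitted */

    while (atomic_load(&e->status) != SYSENTRY_DONE)
        ;                                         /* busy-wait for the result */
    long ret = e->ret;
    atomic_store(&e->status, SYSENTRY_FREE);      /* Done -> Free */
    return ret;
}

/* Simulated kernel side: one pthread standing in for scanner + worker. */
struct kernel_side {
    struct entry *e;
    atomic_bool stop;
};

static void *kernel_side_loop(void *p)
{
    struct kernel_side *k = p;
    while (!atomic_load(&k->stop)) {
        int expected = SYSENTRY_SUBMITTED;
        if (atomic_compare_exchange_strong(&k->e->status, &expected, SYSENTRY_BUSY)) {
            struct entry *e = k->e;
            e->ret = syscall(e->sysnum, e->args[0], e->args[1], e->args[2]);
            atomic_store(&e->status, SYSENTRY_DONE); /* Busy -> Done */
        }
    }
    return NULL;
}

/* Usage: run getpid through the entry; returns the pid, or -1 on error. */
long demo_getpid(void)
{
    struct entry e = { .sysnum = 0 };
    struct kernel_side k = { .e = &e };
    pthread_t tid;

    atomic_init(&e.status, SYSENTRY_FREE);
    atomic_init(&k.stop, false);
    if (pthread_create(&tid, NULL, kernel_side_loop, &k) != 0)
        return -1;
    long pid = flexsc_syscall3(&e, SYS_getpid, 0, 0, 0);
    atomic_store(&k.stop, true);
    pthread_join(tid, NULL);
    return pid;
}
```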
30. Limitations of the current implementation
●
The following limitations arise because the servicing kernel thread cannot fork the user process
●
The user’s file descriptor table is not shared
– file I/O related system calls (e.g., read, write) are not supported
●
The entire user address space is not shared
– system calls with pointer arguments are not supported, because a pointer can refer to an address outside the shared page
● Only a limited set of system calls is supported
31. Further work
●
Overcome the current limitations (sharing the file descriptor table, and sharing the entire user address space if possible)
●
Invoke a callback function when an asynchronous system call completes
●
Measure the exact cost of system calls using system performance analysis tools
32. References
●
Soares, L., and Stumm, M. FlexSC: Flexible System Call Scheduling with Exception-Less System Calls. In 9th USENIX Symposium on Operating Systems Design and Implementation (OSDI) (2010), pp. 33–46.