Your SlideShare is downloading. ×
FlexSC: Exception-Less System Calls - presented @ OSDI 2010
Upcoming SlideShare
Loading in...5
×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

×
Saving this for later? Get the SlideShare app to save on your phone or tablet. Read anywhere, anytime – even offline.
Text the download link to your phone
Standard text messaging rates apply

FlexSC: Exception-Less System Calls - presented @ OSDI 2010

1,534
views

Published on


0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total Views
1,534
On Slideshare
0
From Embeds
0
Number of Embeds
2
Actions
Shares
0
Downloads
17
Comments
0
Likes
0
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
No notes for slide

Transcript

  • 1. FlexSC Flexible System Call Scheduling with Exception-Less System Calls Livio Soares and Michael Stumm University of Toronto
  • 2. Motivation The synchronous system call interface is a legacy from the single core era Expensive! Costs are: ➔ direct: mode-switch ➔ indirect: processor structure pollution FlexSC implements efficient and flexible system calls for the multicore era 2
  • 3. FlexSC overview Two contributions: FlexSC and FlexSC-Threads Results in: 1) MySQL throughput increase of up to 40% and latency reduction of 30% 2) Apache throughput increase of up to 115% and latency reduction of 50% 3
  • 4. Performance impact of synchronous syscalls ➔ Xalan from SPEC CPU 2006 ➔ Virtually no time in the OS ➔ Linux on Intel Core i7 (Nehalem) ➔ Injected exceptions with varying frequencies ➔ Direct: emulate null system call Direct ➔ Indirect: emulate “write()” system call Indirect ➔ Measured only user-mode time ➔ Kernel time ignored Ideally, user-mode performance is unaltered 4
  • 5. Degradation due to sync. syscalls Degradation (lower is faster) Xalan (SPEC CPU 2006) 70% 60% Apache Indirect 50% Direct 40% MySQL 30% 20% 10% 0% 1K 2K 5K 10K 20K 50K 100K user-mode instructions between exceptions (log scale) System calls can half processor efficiency; indirect cause is major contributor 5
  • 6. Processor state pollution ➔ Key source of performance impact ➔ On a Linux write() call: rd ➔ up to 2/3 of the L1 data cache and data TLB are evicted ➔ Kernel performance equally affected ➔ Processor efficiency for OS code is also cut in half 6
  • 7. Synchronous system calls are expensive User Kernel Traditional system calls are synchronous and use exceptions to cross domains 7
  • 8. Alternative: side-step the boundary User Kernel Exception-less syscalls remove synchronicity by decoupling invocation from execution 8
  • 9. Benefits of exception-less system calls ➔ Significantly reduce direct costs ➔ Fewer mode switches User ➔ Allow for batching Kernel ➔ Reduce indirect costs ➔ Allow for dynamic multicore specialization ➔ Further reduce direct and indirect costs 9
  • 10. Exception-less interface: syscall page write(fd, buf, 4096); entry = free_syscall_entry(); /* write syscall */ entry->syscall = 1; entry->num_args = 3; entry->args[0] = fd; entry->args[1] = buf; entry->args[2] = 4096; entry->status = SUBMIT; SUBMIT while (entry->status != DONE) DONE do_something_else(); return entry->return_code; 10
  • 11. Exception-less interface: syscall page write(fd, buf, 4096); entry = free_syscall_entry(); /* write syscall */ entry->syscall = 1; entry->num_args = 3; entry->args[0] = fd; entry->args[1] = buf; entry->args[2] = 4096; SUBMIT entry->status = SUBMIT; SUBMIT while (entry->status != DONE) DONE do_something_else(); return entry->return_code; 11
  • 12. Exception-less interface: syscall page write(fd, buf, 4096); entry = free_syscall_entry(); /* write syscall */ entry->syscall = 1; entry->num_args = 3; entry->args[0] = fd; entry->args[1] = buf; entry->args[2] = 4096; DONE entry->status = SUBMIT; SUBMIT while (entry->status != DONE) DONE do_something_else(); return entry->return_code; 12
  • 13. Syscall threads ➔ Kernel-only threads ➔ Part of application process ➔ Execute requests from syscall page ➔ Schedulable on a per-core basis 13
  • 14. System call batching Request as many system calls as possible Switch to kernel-mode Start executing all posted system calls Avoids direct and indirect costs, even on a single core 14
  • 15. Dynamic multicore specialization FlexSC makes specializing cores simple Dynamically adapts to workload needs 15
  • 16. What programs can benefit from FlexSC? Event-driven servers (e.g., memcached, nginx webserver) ➔ Use asynchoronous calls, similar to FlexSC ➔ Can use FlexSC directly ➔ Mix sync and exception-less system calls Multi-threaded servers: FlexSC-Threads ➔ Thread library, compatible with Pthreads ➔ No changes to app. code or recompilation required ➔ Transparently converts legacy syscalls into exception-less ones 16
  • 17. FlexSC-Threads library ➔ Hybrid (M-on-N) threading model ➔ One kernel visible thread per core ➔ Many user threads per kernel-visible thread ➔ Redirects system calls (libc wrappers) ➔ Posts exception-less syscall to syscall page ➔ Switches to other user-level thread ➔ Resumes thread upon syscall completion Benefits of exception-less syscalls while maintaining sequential syscall interface 17
  • 18. FlexSC-Threads in action User 18
  • 19. FlexSC-Threads in action On a syscall: Post request to system call page Block user-level thread 19
  • 20. FlexSC-Threads in action Kernel On a syscall: Post request to system call page Block user-level thread Switch to next ready thread 20
  • 21. FlexSC-Threads in action User Kernel If all user-level threads become blocked: 1) enter kernel 2) wait for completion of at least 1 syscall 21
  • 22. Evaluation ➔ Linux 2.6.33 ➔ Nehalem (Core i7) server, 2.3GHz ➔ 4 cores on a chip ➔ Clients connected on 1 Gbps network ➔ Workloads ➔ Sysbench on MySQL (80% user, 20% kernel) ➔ ApacheBench on Apache (50% user, 50% kernel) ➔ Default Linux NTPL (“sync”) vs. sync FlexSC-Threads (“flexsc”) flexsc 22
  • 23. Sysbench: “OLTP” on MySQL (1 core) 500 400 (requests/sec.) Throughput 300 15% improvement 200 flexsc 100 sync 0 0 50 100 150 200 250 300 Request Concurrency 23
  • 24. Sysbench: “OLTP” on MySQL (4 cores) 1,000 800 (requests/sec.) Throughput 600 40% improvement 400 200 flexsc sync 0 0 50 100 150 200 250 300 Request Concurrency 24
  • 25. MySQL latency per client request 256 connections 1900 1,000 900 95th 800 percentile Latency (ms) 700 average 600 500 400 300 200 100 0 sync flexsc sync flexsc sync flexsc 1 core 2 cores 4 cores Up to 30% reduction of average request latencies 25
  • 26. MySQL processor metrics SysBench (4 cores) 1.4 1.2 Relative Performance User Kernel 1 (flexsc/sync) 0.8 0.6 0.4 0.2 0 L3 d-cache TLB IPC L2 i-cache Branch IPC L2 i-cache Branch L3 d-cache TLB Performance improvements consequence of more efficient processor execution 26
  • 27. ApacheBench throughput (1 core) 45,000 40,000 flexsc 35,000 sync (requests/sec.) Throughput 30,000 25,000 20,000 15,000 80-90% improvement 10,000 5,000 0 0 200 400 600 800 1000 Request Concurrency 27
  • 28. ApacheBench throughput (4 cores) 45,000 40,000 35,000 (requests/sec.) 115% improvement Throughput 30,000 25,000 20,000 15,000 10,000 flexsc 5,000 sync 0 0 200 400 600 800 1000 Request Concurrency 28
  • 29. Apache latency per client request 256 concurrent requests 238 30 99th 25 percentile Latency (ms) 20 average 15 10 5 0 sync flexsc sync flexsc sync flexsc 1 core 2 cores 4 cores Up to 50% reduction of average request latencies 29
  • 30. Apache processor metrics Apache (1 core) 2 Relative Performance 1.5 (flexsc/sync) User Kernel 1 0.5 0 L3 d-cache TLB IPC L2 i-cache Branch IPC L2 i-cache Branch L3 d-cache TLB Processor efficiency doubles for kernel and user-mode execution 30
  • 31. Discussion ➔ New OS architecture not necessary ➔ Exception-less syscalls can coexist with legacy ones ➔ Foundation for non-blocking system calls ➔ select() / poll() in user-space ➔ Interesting case of non-blocking free() ➔ Multicore ultra -specialization ➔ TCP Servers (Rutgers; Iftode et.al), FS Servers ➔ Single-ISA asymmetric cores ➔ OS-friendly cores (HP Labs; Mogul et. al) 31
  • 32. Concluding Remarks ➔ System calls degrade server performance ➔ Processor pollution is inherent to synchronous system calls ➔ Exception-less syscalls ➔ Flexible and efficient system call execution ➔ FlexSC-Threads ➔ Leverages exception-less syscalls ➔ No modifications to multi-threaded applications ➔ Throughput & latency gains ➔ 2x throughput improvement for Apache and BIND ➔ 1.4x throughput improvement for MySQL 32
  • 33. FlexSC Flexible System Call Scheduling with Exception-Less System Calls Livio Soares and Michael Stumm University of Toronto