Exception-Less System Calls for    Event-Driven Servers  Livio Soares and Michael Stumm       University of Toronto
Talk overview ➔     At OSDI10: exception-less system calls     ➔   Technique targeted at highly threaded servers     ➔   D...
Event-driven server architectures ➔   Supports I/O concurrency with a single     execution context     ➔   Alternative to ...
Example: simple network servervoid server() {   ...   ...   fd = accept();   ...   ...   read(fd);   ...   ...   write(fd)...
Example: simple network servervoid server() {                                 S1   ...   ...   S1   fd = accept();        ...
Performance: events vs. threads                                ApacheBench                 14000                 12000 Req...
Issues with UNIX event primitives ➔   Do not cover all system calls     ➔   Mostly work with file-descriptors (files and s...
FlexSC component overviewFlexSC and FlexSC-Threads presented at OSDI 2010This work: libflexsc for event-driven servers 1) ...
Benefits for event-driven applications 1) General purpose     ➔   Any/all system calls can be asynchronous 2) Non-intrusiv...
Summary of exception-less syscallsLivio Soares | Exception-Less System Calls for Event-Driven Servers   10
Exception-less interface: syscall pagewrite(fd, buf, 4096);entry = free_syscall_entry();/* write syscall */entry->syscall ...
Exception-less interface: syscall pagewrite(fd, buf, 4096);entry = free_syscall_entry();/* write syscall */entry->syscall ...
Exception-less interface: syscall pagewrite(fd, buf, 4096);entry = free_syscall_entry();/* write syscall */entry->syscall ...
Syscall threads ➔   Kernel-only threads     ➔   Part of application process ➔   Execute requests from syscall page ➔   Sch...
Dynamic multicore specializationCore 0                                                                Core 1Core 2        ...
libflexsc: async syscall library ➔   Async syscall and notification library ➔   Similar to libevent     ➔   But operates o...
Main API: async system call1  struct flexsc_cb {2      void (*callback)(struct flexsc_cb *); /* event handler */3      voi...
memcached port to libflexsc ➔   memcached: in-memory key/value store     ➔   Simple code-base: 8K LOC     ➔   Uses libeven...
nginx port to libflexsc ➔   Most popular event-driven webserver     ➔   Code base: 82K LOC     ➔   Natively uses both non-...
Evaluation ➔   Linux 2.6.33 ➔   Nehalem (Core i7) server, 2.3GHz     ➔   4 cores ➔   Client connected through 1Gbps networ...
memcached on 4 cores                              140000 Throughput (requests/sec.)                              120000   ...
memcached processor metrics                       1.2                                    User                             ...
httperf on nginx (1 core)                    120                              flexsc                    100               ...
nginx processor metrics                       1.2                                   User                               Ker...
Concluding remarks ➔   Current event-based primitives add overhead     ➔   I/O operations require frequent communication  ...
Exception-Less System Calls for    Event-Driven Servers  Livio Soares and Michael Stumm       University of Toronto
Backup Slides
Difference in improvementsWhy does nginx improve more than memcached?1) Frequency of mode switches:                 Server...
Limitations ➔   Scalability (number of outstanding syscalls)     ➔   Interface: operations perform linear scan     ➔   Imp...
Latency (ApacheBench)                30                                                             epoll                2...
Upcoming SlideShare
Loading in …5
×

Livio slides-libflexsc-usenix-atc11

852 views
783 views

Published on

Published in: Technology, Education
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
852
On SlideShare
0
From Embeds
0
Number of Embeds
5
Actions
Shares
0
Downloads
5
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

Livio slides-libflexsc-usenix-atc11

  1. 1. Exception-Less System Calls for Event-Driven Servers Livio Soares and Michael Stumm University of Toronto
  2. 2. Talk overview ➔ At OSDI10: exception-less system calls ➔ Technique targeted at highly threaded servers ➔ Doubled performance of Apache ➔ Event-driven servers are popular ➔ Faster than threaded servers We show that exception-less system calls make event-driven server faster ➔ memcached speeds up by 25-35% ➔ nginx speeds up by 70-120%Livio Soares | Exception-Less System Calls for Event-Driven Servers 2
  3. 3. Event-driven server architectures ➔ Supports I/O concurrency with a single execution context ➔ Alternative to thread based architectures ➔ At a high-level: ➔ Divide program flow into non-blocking stages ➔ After each stage register interest in event(s) ➔ Notification of event is asynchronous, driving next stage in the program flow ➔ To avoid idle time, applications multiplex execution of multiple independent stagesLivio Soares | Exception-Less System Calls for Event-Driven Servers 3
  4. 4. Example: simple network servervoid server() { ... ... fd = accept(); ... ... read(fd); ... ... write(fd); ... ... close(fd); ... ...}Livio Soares | Exception-Less System Calls for Event-Driven Servers 4
  5. 5. Example: simple network servervoid server() { S1 ... ... S1 fd = accept(); UNIX options: ... S2 ... S2 Non-blocking I/O read(fd); poll() ... ... S3 S3 select() write(fd); epoll() ... ... S4 S4 Async I/O close(fd); ... ... S5} S5Livio Soares | Exception-Less System Calls for Event-Driven Servers 5
  6. 6. Performance: events vs. threads ApacheBench 14000 12000 Requests/sec. 10000 8000 6000 4000 nginx (events) 2000 Apache (threads) 0 0 100 200 300 400 500 Concurrency nginx delivers 1.7x the throughput of Apache; gracefully copes with high loadsLivio Soares | Exception-Less System Calls for Event-Driven Servers 6
  7. 7. Issues with UNIX event primitives ➔ Do not cover all system calls ➔ Mostly work with file-descriptors (files and sockets) ➔ Overhead ➔ Tracking progress of I/O involves both application and kernel code ➔ Application and kernel communicate frequently Previous work shows that fine-grain mode switching can half processor efficiencyLivio Soares | Exception-Less System Calls for Event-Driven Servers 7
  8. 8. FlexSC component overviewFlexSC and FlexSC-Threads presented at OSDI 2010This work: libflexsc for event-driven servers 1) memcached throughput increase of up to 35% 2) nginx throughput increase of up to 120%Livio Soares | Exception-Less System Calls for Event-Driven Servers 8
  9. 9. Benefits for event-driven applications 1) General purpose ➔ Any/all system calls can be asynchronous 2) Non-intrusive kernel implementation ➔ Does not require per syscall code 3) Facilitates multi-processor execution ➔ OS work is automatically distributed 4) Improved processor efficiency ➔ Reduces frequent user/kernel mode switchesLivio Soares | Exception-Less System Calls for Event-Driven Servers 9
  10. 10. Summary of exception-less syscallsLivio Soares | Exception-Less System Calls for Event-Driven Servers 10
  11. 11. Exception-less interface: syscall pagewrite(fd, buf, 4096);entry = free_syscall_entry();/* write syscall */entry->syscall = 1;entry->num_args = 3;entry->args[0] = fd;entry->args[1] = buf;entry->args[2] = 4096;entry->status = SUBMIT; SUBMITwhile (entry->status != DONE) DONE do_something_else();return entry->return_code;Livio Soares | Exception-Less System Calls for Event-Driven Servers 11
  12. 12. Exception-less interface: syscall pagewrite(fd, buf, 4096);entry = free_syscall_entry();/* write syscall */entry->syscall = 1;entry->num_args = 3;entry->args[0] = fd;entry->args[1] = buf;entry->args[2] = 4096; SUBMITentry->status = SUBMIT; SUBMITwhile (entry->status != DONE) DONE do_something_else();return entry->return_code;Livio Soares | Exception-Less System Calls for Event-Driven Servers 12
  13. 13. Exception-less interface: syscall pagewrite(fd, buf, 4096);entry = free_syscall_entry();/* write syscall */entry->syscall = 1;entry->num_args = 3;entry->args[0] = fd;entry->args[1] = buf;entry->args[2] = 4096; DONEentry->status = SUBMIT; SUBMITwhile (entry->status != DONE) DONE do_something_else();return entry->return_code;Livio Soares | Exception-Less System Calls for Event-Driven Servers 13
  14. 14. Syscall threads ➔ Kernel-only threads ➔ Part of application process ➔ Execute requests from syscall page ➔ Schedulable on a per-core basisLivio Soares | Exception-Less System Calls for Event-Driven Servers 14
  15. 15. Dynamic multicore specializationCore 0 Core 1Core 2 Core 3 1) FlexSC makes specializing cores simple 2) Dynamically adapts to workload needsLivio Soares | Exception-Less System Calls for Event-Driven Servers 15
  16. 16. libflexsc: async syscall library ➔ Async syscall and notification library ➔ Similar to libevent ➔ But operates on syscalls instead of file-descriptors ➔ Three main components: 1) Provides main loop (dispatcher) 2) Support asynchronous syscall with associated callback to notify completion 3) Cancellation supportLivio Soares | Exception-Less System Calls for Event-Driven Servers 16
  17. 17. Main API: async system call1  struct flexsc_cb {2  void (*callback)(struct flexsc_cb *); /* event handler */3  void *arg;                            /* auxiliary var */4  int64_t ret;                          /* syscall return */5  }6 7  int flexsc_##SYSCALL(struct flexsc_cb *, ... /*syscall args*/); 8   /* Example: asynchronous accept */9   struct flexsc_cb cb;10   cb.callback = handle_accept;11  flexsc_accept(&cb, master_sock, NULL, 0);12 13   void handle_accept(struct flexsc_cb *cb) {14    int fd = cb­>ret;15    if (fd != ­1) {16    struct flexsc_cb read_cb;17    read_cb.callback = handle_read;18    flexsc_read(&read_cb, fd, read_buf, read_count);19    }20   }Livio Soares | Exception-Less System Calls for Event-Driven Servers 17
  18. 18. memcached port to libflexsc ➔ memcached: in-memory key/value store ➔ Simple code-base: 8K LOC ➔ Uses libevent ➔ Modified 293 LOC ➔ Transformed libevent calls to libflexsc ➔ Mostly in one file: memcached.c ➔ Most memcached syscalls are socket basedLivio Soares | Exception-Less System Calls for Event-Driven Servers 18
  19. 19. nginx port to libflexsc ➔ Most popular event-driven webserver ➔ Code base: 82K LOC ➔ Natively uses both non-blocking (epoll) I/O and asynchronous I/O ➔ Modified 255 LOC ➔ Socket based code already asynchronous ➔ Not all file-system calls were asynchronous ➔ e.g., open, fstat, getdents ➔ Special handling of stack allocated syscall argsLivio Soares | Exception-Less System Calls for Event-Driven Servers 19
  20. 20. Evaluation ➔ Linux 2.6.33 ➔ Nehalem (Core i7) server, 2.3GHz ➔ 4 cores ➔ Client connected through 1Gbps network ➔ Workloads ➔ memslap on memcached (30% user, 70% kernel) ➔ httperf on nginx (25% user, 75% kernel) ➔ Default Linux (“epoll”) vs. libflexsc (“flexsc”)Livio Soares | Exception-Less System Calls for Event-Driven Servers 20
  21. 21. memcached on 4 cores 140000 Throughput (requests/sec.) 120000 100000 80000 30% improvement 60000 40000 flexsc 20000 epoll 0 0 200 400 600 800 1000 Request ConcurrencyLivio Soares | Exception-Less System Calls for Event-Driven Servers 21
  22. 22. memcached processor metrics 1.2 User Kernel 1Relative Performance 0.8 (flexsc/epoll) 0.6 0.4 0.2 0 L2 i-cache L2 i-cache CPI d-cache CPI d-cache Livio Soares | Exception-Less System Calls for Event-Driven Servers 22
  23. 23. httperf on nginx (1 core) 120 flexsc 100 epollThroughput (Mbps) 80 60 40 100% improvement 20 0 0 10000 20000 30000 40000 50000 60000 Requests/sLivio Soares | Exception-Less System Calls for Event-Driven Servers 23
  24. 24. nginx processor metrics 1.2 User Kernel 1Relative Performance 0.8 (flexsc/epoll) 0.6 0.4 0.2 0 L2 i-cache CPI d-cache Branch CPI d-cache Branch L2 i-cache Livio Soares | Exception-Less System Calls for Event-Driven Servers 24
  25. 25. Concluding remarks ➔ Current event-based primitives add overhead ➔ I/O operations require frequent communication between OS and application ➔ libflexsc: exception-less syscall library 1) General purpose 2) Non-intrusive kernel implementation 3) Facilitates multi-processor execution 4) Improved processor efficiency ➔ Ported memcached and nginx to libflexsc ➔ Performance improvements of 30 - 120%Livio Soares | Exception-Less System Calls for Event-Driven Servers 25
  26. 26. Exception-Less System Calls for Event-Driven Servers Livio Soares and Michael Stumm University of Toronto
  27. 27. Backup Slides
  28. 28. Difference in improvementsWhy does nginx improve more than memcached?1) Frequency of mode switches: Server memcached nginx Frequency of syscalls 3,750 1,460 (in instructions)2) nginx uses greater diversity of system calls ➔ More interference in processor structures (caches)3) Instruction count reduction ➔ nginx with epoll() has connection timeoutsLivio Soares | Exception-Less System Calls for Event-Driven Servers 28
  29. 29. Limitations ➔ Scalability (number of outstanding syscalls) ➔ Interface: operations perform linear scan ➔ Implementation: overheads of syscall threads non-negligible ➔ Solutions ➔ Throttle syscalls at application or OS ➔ Switch interface to scalable message passing ➔ Provide exception-less versions of async I/O ➔ Make kernel fully non-blockingLivio Soares | Exception-Less System Calls for Event-Driven Servers 29
  30. 30. Latency (ApacheBench) 30 epoll 25 flexsc Latency (ms) 20 15 10 50% latency reduction 5 0 1 core 2 coresLivio Soares | Exception-Less System Calls for Event-Driven Servers 30

×