Livio slides-libflexsc-usenix-atc11

  • 647 views
Uploaded on

 

More in: Technology , Education
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Be the first to comment
    Be the first to like this
No Downloads

Views

Total Views
647
On Slideshare
0
From Embeds
0
Number of Embeds
0

Actions

Shares
Downloads
3
Comments
0
Likes
0

Embeds 0

No embeds

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
    No notes for slide

Transcript

  • 1. Exception-Less System Calls for Event-Driven Servers Livio Soares and Michael Stumm University of Toronto
  • 2. Talk overview ➔ At OSDI10: exception-less system calls ➔ Technique targeted at highly threaded servers ➔ Doubled performance of Apache ➔ Event-driven servers are popular ➔ Faster than threaded servers We show that exception-less system calls make event-driven server faster ➔ memcached speeds up by 25-35% ➔ nginx speeds up by 70-120%Livio Soares | Exception-Less System Calls for Event-Driven Servers 2
  • 3. Event-driven server architectures ➔ Supports I/O concurrency with a single execution context ➔ Alternative to thread based architectures ➔ At a high-level: ➔ Divide program flow into non-blocking stages ➔ After each stage register interest in event(s) ➔ Notification of event is asynchronous, driving next stage in the program flow ➔ To avoid idle time, applications multiplex execution of multiple independent stagesLivio Soares | Exception-Less System Calls for Event-Driven Servers 3
  • 4. Example: simple network servervoid server() { ... ... fd = accept(); ... ... read(fd); ... ... write(fd); ... ... close(fd); ... ...}Livio Soares | Exception-Less System Calls for Event-Driven Servers 4
  • 5. Example: simple network servervoid server() { S1 ... ... S1 fd = accept(); UNIX options: ... S2 ... S2 Non-blocking I/O read(fd); poll() ... ... S3 S3 select() write(fd); epoll() ... ... S4 S4 Async I/O close(fd); ... ... S5} S5Livio Soares | Exception-Less System Calls for Event-Driven Servers 5
  • 6. Performance: events vs. threads ApacheBench 14000 12000 Requests/sec. 10000 8000 6000 4000 nginx (events) 2000 Apache (threads) 0 0 100 200 300 400 500 Concurrency nginx delivers 1.7x the throughput of Apache; gracefully copes with high loadsLivio Soares | Exception-Less System Calls for Event-Driven Servers 6
  • 7. Issues with UNIX event primitives ➔ Do not cover all system calls ➔ Mostly work with file-descriptors (files and sockets) ➔ Overhead ➔ Tracking progress of I/O involves both application and kernel code ➔ Application and kernel communicate frequently Previous work shows that fine-grain mode switching can half processor efficiencyLivio Soares | Exception-Less System Calls for Event-Driven Servers 7
  • 8. FlexSC component overviewFlexSC and FlexSC-Threads presented at OSDI 2010This work: libflexsc for event-driven servers 1) memcached throughput increase of up to 35% 2) nginx throughput increase of up to 120%Livio Soares | Exception-Less System Calls for Event-Driven Servers 8
  • 9. Benefits for event-driven applications 1) General purpose ➔ Any/all system calls can be asynchronous 2) Non-intrusive kernel implementation ➔ Does not require per syscall code 3) Facilitates multi-processor execution ➔ OS work is automatically distributed 4) Improved processor efficiency ➔ Reduces frequent user/kernel mode switchesLivio Soares | Exception-Less System Calls for Event-Driven Servers 9
  • 10. Summary of exception-less syscallsLivio Soares | Exception-Less System Calls for Event-Driven Servers 10
  • 11. Exception-less interface: syscall pagewrite(fd, buf, 4096);entry = free_syscall_entry();/* write syscall */entry->syscall = 1;entry->num_args = 3;entry->args[0] = fd;entry->args[1] = buf;entry->args[2] = 4096;entry->status = SUBMIT; SUBMITwhile (entry->status != DONE) DONE do_something_else();return entry->return_code;Livio Soares | Exception-Less System Calls for Event-Driven Servers 11
  • 12. Exception-less interface: syscall pagewrite(fd, buf, 4096);entry = free_syscall_entry();/* write syscall */entry->syscall = 1;entry->num_args = 3;entry->args[0] = fd;entry->args[1] = buf;entry->args[2] = 4096; SUBMITentry->status = SUBMIT; SUBMITwhile (entry->status != DONE) DONE do_something_else();return entry->return_code;Livio Soares | Exception-Less System Calls for Event-Driven Servers 12
  • 13. Exception-less interface: syscall pagewrite(fd, buf, 4096);entry = free_syscall_entry();/* write syscall */entry->syscall = 1;entry->num_args = 3;entry->args[0] = fd;entry->args[1] = buf;entry->args[2] = 4096; DONEentry->status = SUBMIT; SUBMITwhile (entry->status != DONE) DONE do_something_else();return entry->return_code;Livio Soares | Exception-Less System Calls for Event-Driven Servers 13
  • 14. Syscall threads ➔ Kernel-only threads ➔ Part of application process ➔ Execute requests from syscall page ➔ Schedulable on a per-core basisLivio Soares | Exception-Less System Calls for Event-Driven Servers 14
  • 15. Dynamic multicore specializationCore 0 Core 1Core 2 Core 3 1) FlexSC makes specializing cores simple 2) Dynamically adapts to workload needsLivio Soares | Exception-Less System Calls for Event-Driven Servers 15
  • 16. libflexsc: async syscall library ➔ Async syscall and notification library ➔ Similar to libevent ➔ But operates on syscalls instead of file-descriptors ➔ Three main components: 1) Provides main loop (dispatcher) 2) Support asynchronous syscall with associated callback to notify completion 3) Cancellation supportLivio Soares | Exception-Less System Calls for Event-Driven Servers 16
  • 17. Main API: async system call1  struct flexsc_cb {2  void (*callback)(struct flexsc_cb *); /* event handler */3  void *arg;                            /* auxiliary var */4  int64_t ret;                          /* syscall return */5  }6 7  int flexsc_##SYSCALL(struct flexsc_cb *, ... /*syscall args*/); 8   /* Example: asynchronous accept */9   struct flexsc_cb cb;10   cb.callback = handle_accept;11  flexsc_accept(&cb, master_sock, NULL, 0);12 13   void handle_accept(struct flexsc_cb *cb) {14    int fd = cb­>ret;15    if (fd != ­1) {16    struct flexsc_cb read_cb;17    read_cb.callback = handle_read;18    flexsc_read(&read_cb, fd, read_buf, read_count);19    }20   }Livio Soares | Exception-Less System Calls for Event-Driven Servers 17
  • 18. memcached port to libflexsc ➔ memcached: in-memory key/value store ➔ Simple code-base: 8K LOC ➔ Uses libevent ➔ Modified 293 LOC ➔ Transformed libevent calls to libflexsc ➔ Mostly in one file: memcached.c ➔ Most memcached syscalls are socket basedLivio Soares | Exception-Less System Calls for Event-Driven Servers 18
  • 19. nginx port to libflexsc ➔ Most popular event-driven webserver ➔ Code base: 82K LOC ➔ Natively uses both non-blocking (epoll) I/O and asynchronous I/O ➔ Modified 255 LOC ➔ Socket based code already asynchronous ➔ Not all file-system calls were asynchronous ➔ e.g., open, fstat, getdents ➔ Special handling of stack allocated syscall argsLivio Soares | Exception-Less System Calls for Event-Driven Servers 19
  • 20. Evaluation ➔ Linux 2.6.33 ➔ Nehalem (Core i7) server, 2.3GHz ➔ 4 cores ➔ Client connected through 1Gbps network ➔ Workloads ➔ memslap on memcached (30% user, 70% kernel) ➔ httperf on nginx (25% user, 75% kernel) ➔ Default Linux (“epoll”) vs. libflexsc (“flexsc”)Livio Soares | Exception-Less System Calls for Event-Driven Servers 20
  • 21. memcached on 4 cores 140000 Throughput (requests/sec.) 120000 100000 80000 30% improvement 60000 40000 flexsc 20000 epoll 0 0 200 400 600 800 1000 Request ConcurrencyLivio Soares | Exception-Less System Calls for Event-Driven Servers 21
  • 22. memcached processor metrics 1.2 User Kernel 1Relative Performance 0.8 (flexsc/epoll) 0.6 0.4 0.2 0 L2 i-cache L2 i-cache CPI d-cache CPI d-cache Livio Soares | Exception-Less System Calls for Event-Driven Servers 22
  • 23. httperf on nginx (1 core) 120 flexsc 100 epollThroughput (Mbps) 80 60 40 100% improvement 20 0 0 10000 20000 30000 40000 50000 60000 Requests/sLivio Soares | Exception-Less System Calls for Event-Driven Servers 23
  • 24. nginx processor metrics 1.2 User Kernel 1Relative Performance 0.8 (flexsc/epoll) 0.6 0.4 0.2 0 L2 i-cache CPI d-cache Branch CPI d-cache Branch L2 i-cache Livio Soares | Exception-Less System Calls for Event-Driven Servers 24
  • 25. Concluding remarks ➔ Current event-based primitives add overhead ➔ I/O operations require frequent communication between OS and application ➔ libflexsc: exception-less syscall library 1) General purpose 2) Non-intrusive kernel implementation 3) Facilitates multi-processor execution 4) Improved processor efficiency ➔ Ported memcached and nginx to libflexsc ➔ Performance improvements of 30 - 120%Livio Soares | Exception-Less System Calls for Event-Driven Servers 25
  • 26. Exception-Less System Calls for Event-Driven Servers Livio Soares and Michael Stumm University of Toronto
  • 27. Backup Slides
  • 28. Difference in improvementsWhy does nginx improve more than memcached?1) Frequency of mode switches: Server memcached nginx Frequency of syscalls 3,750 1,460 (in instructions)2) nginx uses greater diversity of system calls ➔ More interference in processor structures (caches)3) Instruction count reduction ➔ nginx with epoll() has connection timeoutsLivio Soares | Exception-Less System Calls for Event-Driven Servers 28
  • 29. Limitations ➔ Scalability (number of outstanding syscalls) ➔ Interface: operations perform linear scan ➔ Implementation: overheads of syscall threads non-negligible ➔ Solutions ➔ Throttle syscalls at application or OS ➔ Switch interface to scalable message passing ➔ Provide exception-less versions of async I/O ➔ Make kernel fully non-blockingLivio Soares | Exception-Less System Calls for Event-Driven Servers 29
  • 30. Latency (ApacheBench) 30 epoll 25 flexsc Latency (ms) 20 15 10 50% latency reduction 5 0 1 core 2 coresLivio Soares | Exception-Less System Calls for Event-Driven Servers 30