Your SlideShare is downloading. ×
0
Livio slides-libflexsc-usenix-atc11
Livio slides-libflexsc-usenix-atc11
Livio slides-libflexsc-usenix-atc11
Livio slides-libflexsc-usenix-atc11
Livio slides-libflexsc-usenix-atc11
Livio slides-libflexsc-usenix-atc11
Livio slides-libflexsc-usenix-atc11
Livio slides-libflexsc-usenix-atc11
Livio slides-libflexsc-usenix-atc11
Livio slides-libflexsc-usenix-atc11
Livio slides-libflexsc-usenix-atc11
Livio slides-libflexsc-usenix-atc11
Livio slides-libflexsc-usenix-atc11
Livio slides-libflexsc-usenix-atc11
Livio slides-libflexsc-usenix-atc11
Livio slides-libflexsc-usenix-atc11
Livio slides-libflexsc-usenix-atc11
Livio slides-libflexsc-usenix-atc11
Livio slides-libflexsc-usenix-atc11
Livio slides-libflexsc-usenix-atc11
Livio slides-libflexsc-usenix-atc11
Livio slides-libflexsc-usenix-atc11
Livio slides-libflexsc-usenix-atc11
Livio slides-libflexsc-usenix-atc11
Livio slides-libflexsc-usenix-atc11
Livio slides-libflexsc-usenix-atc11
Livio slides-libflexsc-usenix-atc11
Livio slides-libflexsc-usenix-atc11
Livio slides-libflexsc-usenix-atc11
Livio slides-libflexsc-usenix-atc11
Upcoming SlideShare
Loading in...5
×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

×
Saving this for later? Get the SlideShare app to save on your phone or tablet. Read anywhere, anytime – even offline.
Text the download link to your phone
Standard text messaging rates apply

Livio slides-libflexsc-usenix-atc11

681

Published on

Published in: Technology, Education
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total Views
681
On Slideshare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
5
Comments
0
Likes
0
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
No notes for slide

Transcript

  • 1. Exception-Less System Calls for Event-Driven Servers Livio Soares and Michael Stumm University of Toronto
  • 2. Talk overview ➔ At OSDI10: exception-less system calls ➔ Technique targeted at highly threaded servers ➔ Doubled performance of Apache ➔ Event-driven servers are popular ➔ Faster than threaded servers We show that exception-less system calls make event-driven server faster ➔ memcached speeds up by 25-35% ➔ nginx speeds up by 70-120%Livio Soares | Exception-Less System Calls for Event-Driven Servers 2
  • 3. Event-driven server architectures ➔ Supports I/O concurrency with a single execution context ➔ Alternative to thread based architectures ➔ At a high-level: ➔ Divide program flow into non-blocking stages ➔ After each stage register interest in event(s) ➔ Notification of event is asynchronous, driving next stage in the program flow ➔ To avoid idle time, applications multiplex execution of multiple independent stagesLivio Soares | Exception-Less System Calls for Event-Driven Servers 3
  • 4. Example: simple network servervoid server() { ... ... fd = accept(); ... ... read(fd); ... ... write(fd); ... ... close(fd); ... ...}Livio Soares | Exception-Less System Calls for Event-Driven Servers 4
  • 5. Example: simple network servervoid server() { S1 ... ... S1 fd = accept(); UNIX options: ... S2 ... S2 Non-blocking I/O read(fd); poll() ... ... S3 S3 select() write(fd); epoll() ... ... S4 S4 Async I/O close(fd); ... ... S5} S5Livio Soares | Exception-Less System Calls for Event-Driven Servers 5
  • 6. Performance: events vs. threads ApacheBench 14000 12000 Requests/sec. 10000 8000 6000 4000 nginx (events) 2000 Apache (threads) 0 0 100 200 300 400 500 Concurrency nginx delivers 1.7x the throughput of Apache; gracefully copes with high loadsLivio Soares | Exception-Less System Calls for Event-Driven Servers 6
  • 7. Issues with UNIX event primitives ➔ Do not cover all system calls ➔ Mostly work with file-descriptors (files and sockets) ➔ Overhead ➔ Tracking progress of I/O involves both application and kernel code ➔ Application and kernel communicate frequently Previous work shows that fine-grain mode switching can half processor efficiencyLivio Soares | Exception-Less System Calls for Event-Driven Servers 7
  • 8. FlexSC component overviewFlexSC and FlexSC-Threads presented at OSDI 2010This work: libflexsc for event-driven servers 1) memcached throughput increase of up to 35% 2) nginx throughput increase of up to 120%Livio Soares | Exception-Less System Calls for Event-Driven Servers 8
  • 9. Benefits for event-driven applications 1) General purpose ➔ Any/all system calls can be asynchronous 2) Non-intrusive kernel implementation ➔ Does not require per syscall code 3) Facilitates multi-processor execution ➔ OS work is automatically distributed 4) Improved processor efficiency ➔ Reduces frequent user/kernel mode switchesLivio Soares | Exception-Less System Calls for Event-Driven Servers 9
  • 10. Summary of exception-less syscallsLivio Soares | Exception-Less System Calls for Event-Driven Servers 10
  • 11. Exception-less interface: syscall pagewrite(fd, buf, 4096);entry = free_syscall_entry();/* write syscall */entry->syscall = 1;entry->num_args = 3;entry->args[0] = fd;entry->args[1] = buf;entry->args[2] = 4096;entry->status = SUBMIT; SUBMITwhile (entry->status != DONE) DONE do_something_else();return entry->return_code;Livio Soares | Exception-Less System Calls for Event-Driven Servers 11
  • 12. Exception-less interface: syscall pagewrite(fd, buf, 4096);entry = free_syscall_entry();/* write syscall */entry->syscall = 1;entry->num_args = 3;entry->args[0] = fd;entry->args[1] = buf;entry->args[2] = 4096; SUBMITentry->status = SUBMIT; SUBMITwhile (entry->status != DONE) DONE do_something_else();return entry->return_code;Livio Soares | Exception-Less System Calls for Event-Driven Servers 12
  • 13. Exception-less interface: syscall pagewrite(fd, buf, 4096);entry = free_syscall_entry();/* write syscall */entry->syscall = 1;entry->num_args = 3;entry->args[0] = fd;entry->args[1] = buf;entry->args[2] = 4096; DONEentry->status = SUBMIT; SUBMITwhile (entry->status != DONE) DONE do_something_else();return entry->return_code;Livio Soares | Exception-Less System Calls for Event-Driven Servers 13
  • 14. Syscall threads ➔ Kernel-only threads ➔ Part of application process ➔ Execute requests from syscall page ➔ Schedulable on a per-core basisLivio Soares | Exception-Less System Calls for Event-Driven Servers 14
  • 15. Dynamic multicore specializationCore 0 Core 1Core 2 Core 3 1) FlexSC makes specializing cores simple 2) Dynamically adapts to workload needsLivio Soares | Exception-Less System Calls for Event-Driven Servers 15
  • 16. libflexsc: async syscall library ➔ Async syscall and notification library ➔ Similar to libevent ➔ But operates on syscalls instead of file-descriptors ➔ Three main components: 1) Provides main loop (dispatcher) 2) Support asynchronous syscall with associated callback to notify completion 3) Cancellation supportLivio Soares | Exception-Less System Calls for Event-Driven Servers 16
  • 17. Main API: async system call1  struct flexsc_cb {2  void (*callback)(struct flexsc_cb *); /* event handler */3  void *arg;                            /* auxiliary var */4  int64_t ret;                          /* syscall return */5  }6 7  int flexsc_##SYSCALL(struct flexsc_cb *, ... /*syscall args*/); 8   /* Example: asynchronous accept */9   struct flexsc_cb cb;10   cb.callback = handle_accept;11  flexsc_accept(&cb, master_sock, NULL, 0);12 13   void handle_accept(struct flexsc_cb *cb) {14    int fd = cb­>ret;15    if (fd != ­1) {16    struct flexsc_cb read_cb;17    read_cb.callback = handle_read;18    flexsc_read(&read_cb, fd, read_buf, read_count);19    }20   }Livio Soares | Exception-Less System Calls for Event-Driven Servers 17
  • 18. memcached port to libflexsc ➔ memcached: in-memory key/value store ➔ Simple code-base: 8K LOC ➔ Uses libevent ➔ Modified 293 LOC ➔ Transformed libevent calls to libflexsc ➔ Mostly in one file: memcached.c ➔ Most memcached syscalls are socket basedLivio Soares | Exception-Less System Calls for Event-Driven Servers 18
  • 19. nginx port to libflexsc ➔ Most popular event-driven webserver ➔ Code base: 82K LOC ➔ Natively uses both non-blocking (epoll) I/O and asynchronous I/O ➔ Modified 255 LOC ➔ Socket based code already asynchronous ➔ Not all file-system calls were asynchronous ➔ e.g., open, fstat, getdents ➔ Special handling of stack allocated syscall argsLivio Soares | Exception-Less System Calls for Event-Driven Servers 19
  • 20. Evaluation ➔ Linux 2.6.33 ➔ Nehalem (Core i7) server, 2.3GHz ➔ 4 cores ➔ Client connected through 1Gbps network ➔ Workloads ➔ memslap on memcached (30% user, 70% kernel) ➔ httperf on nginx (25% user, 75% kernel) ➔ Default Linux (“epoll”) vs. libflexsc (“flexsc”)Livio Soares | Exception-Less System Calls for Event-Driven Servers 20
  • 21. memcached on 4 cores 140000 Throughput (requests/sec.) 120000 100000 80000 30% improvement 60000 40000 flexsc 20000 epoll 0 0 200 400 600 800 1000 Request ConcurrencyLivio Soares | Exception-Less System Calls for Event-Driven Servers 21
  • 22. memcached processor metrics 1.2 User Kernel 1Relative Performance 0.8 (flexsc/epoll) 0.6 0.4 0.2 0 L2 i-cache L2 i-cache CPI d-cache CPI d-cache Livio Soares | Exception-Less System Calls for Event-Driven Servers 22
  • 23. httperf on nginx (1 core) 120 flexsc 100 epollThroughput (Mbps) 80 60 40 100% improvement 20 0 0 10000 20000 30000 40000 50000 60000 Requests/sLivio Soares | Exception-Less System Calls for Event-Driven Servers 23
  • 24. nginx processor metrics 1.2 User Kernel 1Relative Performance 0.8 (flexsc/epoll) 0.6 0.4 0.2 0 L2 i-cache CPI d-cache Branch CPI d-cache Branch L2 i-cache Livio Soares | Exception-Less System Calls for Event-Driven Servers 24
  • 25. Concluding remarks ➔ Current event-based primitives add overhead ➔ I/O operations require frequent communication between OS and application ➔ libflexsc: exception-less syscall library 1) General purpose 2) Non-intrusive kernel implementation 3) Facilitates multi-processor execution 4) Improved processor efficiency ➔ Ported memcached and nginx to libflexsc ➔ Performance improvements of 30 - 120%Livio Soares | Exception-Less System Calls for Event-Driven Servers 25
  • 26. Exception-Less System Calls for Event-Driven Servers Livio Soares and Michael Stumm University of Toronto
  • 27. Backup Slides
  • 28. Difference in improvementsWhy does nginx improve more than memcached?1) Frequency of mode switches: Server memcached nginx Frequency of syscalls 3,750 1,460 (in instructions)2) nginx uses greater diversity of system calls ➔ More interference in processor structures (caches)3) Instruction count reduction ➔ nginx with epoll() has connection timeoutsLivio Soares | Exception-Less System Calls for Event-Driven Servers 28
  • 29. Limitations ➔ Scalability (number of outstanding syscalls) ➔ Interface: operations perform linear scan ➔ Implementation: overheads of syscall threads non-negligible ➔ Solutions ➔ Throttle syscalls at application or OS ➔ Switch interface to scalable message passing ➔ Provide exception-less versions of async I/O ➔ Make kernel fully non-blockingLivio Soares | Exception-Less System Calls for Event-Driven Servers 29
  • 30. Latency (ApacheBench) 30 epoll 25 flexsc Latency (ms) 20 15 10 50% latency reduction 5 0 1 core 2 coresLivio Soares | Exception-Less System Calls for Event-Driven Servers 30

×