Understanding & Addressing Blocking-Induced Server Latency


  • Speaker note: response time is measured from sending the last byte of the request to receiving the last byte of the response; the last byte is used because a server may drop the connection before the response completes.

     1. Understanding & Addressing Blocking-Induced Server Latency
        Yaoping Ruan, IBM T.J. Watson Research Center
        Vivek Pai, Princeton University
     2. Background – Web servers
        • Previous work focuses on throughput
          - SPECweb99 mixes throughput & latency
        • Network delay has dominated end-user latency
        • The server's contribution to latency is increasing
          - Connection speeds are increasing
          - Multiple data centers reduce round-trip time
     3. Paper Contributions
        • Understand server-induced latency
          - Observed in both Flash & Apache
          - Identify blocking in filesystem-related queues
          - Quantify its effects: service inversion
        • Address the problem
          - In both servers
          - Using portable techniques
          - With scalable results
          - 5-50x latency reduction
     4. Outline
        • Experimental setup & measurement methodology
        • Identifying blocking in Web servers
        • New server design
        • Results of the new servers
     5. Experimental Setup
        • Server-client setup
          - 3 GHz Pentium 4 with 1 GB memory
          - FreeBSD 4.6 operating system
        • Web servers
          - Flash & Apache 1.3
          - Fairly tuned for performance
        • Workloads
          - SPECweb99 static content
          - 3 GB dataset and 1024 simultaneous connections
     6. Latency Analysis Methodology
        • Response time vs. load
          - Infinite-demand load
          - 20, 40, 60, 80, 90, 95% of the infinite-demand request rate
        • Record the mean and the 5th, 50th, and 95th percentiles of the latency CDF
        [Figure: Flash latency profile – 336 Mb/s]
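The bookkeeping this slide describes can be sketched as follows. This is an illustrative reconstruction only: the latency samples are made up, and the nearest-rank percentile rule is one common choice, not necessarily the paper's exact procedure.

```python
# Summarize a set of per-request latencies the way the methodology
# slide describes: mean plus 5th/50th/95th percentiles of the CDF.

def percentile(samples, p):
    """Return the p-th percentile (0-100) by the nearest-rank method."""
    s = sorted(samples)
    k = max(0, min(len(s) - 1, int(round(p / 100.0 * len(s))) - 1))
    return s[k]

# Hypothetical per-request latencies in milliseconds; note the heavy
# tail, which is why the paper tracks percentiles rather than the mean.
latencies_ms = [2, 3, 3, 4, 5, 6, 8, 12, 40, 250]

mean = sum(latencies_ms) / len(latencies_ms)
p5, p50, p95 = (percentile(latencies_ms, p) for p in (5, 50, 95))
print(mean, p5, p50, p95)  # → 33.3 2 5 250
```

The gap between the median (5 ms) and the mean (33.3 ms) shows how a few slow responses dominate the mean, which is the effect the latency-profile figures visualize.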
     7. Evidence for Blocking in Flash
        • Event-driven model
          - select() or kevent()
          - Each call returns the currently ready events
        • About 60-70 events per call
        • But the CPU has idle time: the server should return more often, with fewer ready events per call
        [Figure: CDF of the number of ready events per call in Flash]
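The measurement behind this slide can be sketched with Python's portable select() wrapper (Flash itself is a C server using select()/kevent(); the socketpairs here are a stand-in for client connections):

```python
# Count how many descriptors a single select() call reports as ready.
# If the server blocks between calls, work piles up and each call
# returns many ready events; an unblocked server sees only a few.
import select
import socket

pairs = [socket.socketpair() for _ in range(4)]

# Make three of the four "connections" readable, as if requests
# arrived while the server was busy elsewhere.
for a, _ in pairs[:3]:
    a.send(b"x")

watched = [b for _, b in pairs]
ready, _, _ = select.select(watched, [], [], 0)
print(len(ready))  # → 3

for a, b in pairs:
    a.close()
    b.close()
```

Flash's instrumentation records this count per call; the slide's point is that 60-70 ready events per call on an idle-capable CPU signals that the event loop was stalled somewhere.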
     8. Evidence for Blocking in Apache
        • Multiple processes
          - Some blocking is expected
          - Excessive blocking is hard to identify
        • Two configurations
        • Sample the percentage of ready processes per second
        • Distribution is bimodal (0 or 60+)
          - Worse with 1024 processes
     9. Identifying Blocking Using DeBox
        • Exclusive vnode locks
          - Used to reduce complexity and avoid possible deadlocks
        • Directory-walk locks
          - Locks overlap between the parent and child directory
        • Locks held during disk access
          - The parent's lock is downgraded only when the child needs disk access
        • Result: lock convoys
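The "lock convoy" outcome can be illustrated with a deterministic toy model (this is not DeBox or the kernel locking code, just an abstraction of requests queuing behind one exclusive lock):

```python
# Toy model of a lock convoy: requests are served one at a time
# behind a single exclusive lock, so one lock holder that blocks on
# disk delays every cheap cache-hit request queued behind it.

def simulate_convoy(service_times_us):
    """Return each request's completion time, in microseconds."""
    now = 0
    latencies = []
    for t in service_times_us:
        now += t               # the lock holder runs to completion
        latencies.append(now)  # everyone queued absorbs the delay
    return latencies

# One 10 ms disk access at the head of the queue, then 100 us cache hits.
print(simulate_convoy([10000, 100, 100, 100]))  # → [10000, 10100, 10200, 10300]
```

Requests that should finish in 100 us instead see 10 ms latencies: head-of-line blocking inflates the latency of the majority of (cheap) requests, which matches the median-latency growth shown on the next slides.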
     10. Growth of Median Latencies
         • But medians shouldn't grow this fast
         • Working set < 200 MB
     11. Response Time vs. Dataset Size
         • >99.5% cache hits
     12. Service Inversion
         • CDF breakdowns
           - Split the CDF by decile (1, 2, 3, ... 9, 10)
           - Group responses by size
         • Service inversion = the difference between the actual service order and the ideal order
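The flavor of the metric can be sketched with a pairwise-inversion count. Note this is an illustration only; the paper defines its own quantification of service inversion, and the function name and the smallest-first "ideal order" here are assumptions for the sketch:

```python
# Illustrative service-inversion score: the fraction of response
# pairs that complete in the "wrong" order relative to the ideal
# smallest-first order. 0.0 means ideal; higher means more inversion.

def inversion_fraction(sizes_in_completion_order):
    n = len(sizes_in_completion_order)
    pairs = n * (n - 1) // 2
    inverted = sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if sizes_in_completion_order[i] > sizes_in_completion_order[j]
    )
    return inverted / pairs

print(inversion_fraction([1, 2, 3, 4]))  # → 0.0, ideal: small files first
print(inversion_fraction([4, 1, 2, 3]))  # → 0.5, a big file served first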
     13. Ideal Service Breakdown
         [Figure: CDF breakdown, small files through large files]
     14. Flash Service Breakdown
         • Flash CDF breakdown by decile at load level 0.95
         • Service inversion at this level is 0.58
         [Figure: small files through large files]
     15. Apache Service Breakdown
         • Apache CDF breakdown by decile at load level 0.95
         • Service inversion at this level is 0.58
         [Figure: small files through large files]
     16. Solution
         • Let the blocking happen, elsewhere
           - Move filesystem calls out of the server process
         • Shared backend
           - Caches open file descriptors
           - Handles misses via helper processes
           - Prefetches cold disk blocks
         • IPC is better than blocking
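The key mechanism in this design is passing open file descriptors from the backend to the server over a Unix-domain socket. A minimal sketch, assuming Python 3.9+ for `socket.send_fds`/`recv_fds` (the real Flash/Flashpache backends are C programs using sendmsg() with SCM_RIGHTS; the file and messages here are illustrative):

```python
# Backend process opens the file, possibly blocking on disk, then
# ships the ready descriptor to the server over a Unix socket. The
# server never touches the filesystem, so it never blocks on it.
import os
import socket
import tempfile

server_side, backend_side = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

# "Backend": create and open a file to stand in for cached content.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"hello")
    path = f.name
fd = os.open(path, os.O_RDONLY)           # may block on disk: that's fine here
socket.send_fds(backend_side, [b"ok"], [fd])

# "Server": receive a ready-to-use descriptor via SCM_RIGHTS.
msg, fds, _, _ = socket.recv_fds(server_side, 16, 1)
data = os.read(fds[0], 16)
print(msg, data)  # → b'ok' b'hello'

os.close(fd)
os.close(fds[0])
os.unlink(path)
```

The one IPC round-trip costs far less than a blocked event loop, which is the slide's "IPC is better than blocking" point.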
     17. Flashpache
         [Figure: Apache and Flashpache]
     18. Flash Ready Events
         • Mean events per call: Flash – 61, fdpass – 15, New-Flash – 1.6
     19. Flashpache Ready Processes
     20. New-Flash Latency Profile
         • Latency improvement: 6x in mean, 43x in median
         • Median & 95th percentile are virtually flat
         [Figures: New-Flash latency profile – 450 Mb/s; Flash latency profile – 336 Mb/s]
     21. Flashpache Performance
         • Latency improvement: 15x in mean, 9x in median
         • Median & 95th percentile are virtually flat
         [Figures: Flashpache latency profile – 273 Mb/s; Apache latency profile – 241 Mb/s]
     22. Response Time vs. Dataset
     23. Flash Service Breakdowns
         [Figures: Flash; New-Flash]
     24. Flashpache Service Breakdowns
         [Figures: Apache; Flashpache]
     25. Latency Scalability
         • Response time at 0.95 load level
     26. In The Paper
         • More details, measurements, and breakdowns
         • Quantifying service inversion
     27. Conclusion
         • Much server latency originates from head-of-line blocking
         • The impact on latency is higher than on throughput
         • Blocking degrades service fairness
         • The problem can be solved in the server application
     28. Thank You
         www.cs.princeton.edu/nsg/papers
         Princeton University Network Systems Group
     29. Apache on Linux
         [Figures: Apache; Flashpache]
     30. Effects on Response Time
         [Figure: Flash latency CDFs]
     31. Service Inversion
