
Benchmarking (RICON 2014)

Knowledge of how to set up good benchmarks is invaluable in understanding performance of the system. Writing correct and useful benchmarks is hard, and verification of the results is difficult and prone to errors. When done right, benchmarks guide teams to improve the performance of their systems. When done wrong, hours of effort may result in a worse performing application, upset customers or worse! In this talk, we will discuss what you need to know to write better benchmarks for distributed systems. We will look at examples of bad benchmarks and learn about what biases can invalidate the measurements, in the hope of correctly applying our new-found skills and avoiding such pitfalls in the future.

  1. Benchmarking: You're Doing It Wrong (Aysylu Greenberg, @aysylu22)
  2. To Write Good Benchmarks… Need to Be Full Stack
  3. Benchmark = How Fast? Your process vs. goal; your process vs. best practices
  4. Today • How Not to Write Benchmarks • Benchmark Setup & Results: you're wrong about machines, you're wrong about stats, you're wrong about what matters • Becoming Less Wrong • Having Fun with Riak
  5. HOW NOT TO WRITE BENCHMARKS
  6. Website Serving Images • Access 1 image 1000 times • Latency measured for each access • Start measuring immediately • 3 runs • Find mean • Dev environment (diagram: web request → server → cache → S3)
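As a rough illustration, here is a minimal Python sketch of the benchmark exactly as the slide describes it (deliberately flawed); the URL and image name are hypothetical placeholders, not from the talk.

```python
# Minimal sketch of the benchmark as described on the slide (flawed on purpose):
# fetch the same image 1000 times, start timing immediately, average the latencies.
import statistics
import time
import urllib.request

IMAGE_URL = "http://example.com/images/cat.jpg"  # hypothetical endpoint

def naive_benchmark(runs=3, requests_per_run=1000):
    means = []
    for _ in range(runs):
        latencies = []
        for _ in range(requests_per_run):       # same image every time -> hits every cache layer
            start = time.perf_counter()          # no warmup: first samples include cold-start cost
            urllib.request.urlopen(IMAGE_URL).read()
            latencies.append(time.perf_counter() - start)
        means.append(statistics.mean(latencies)) # mean hides outliers and multimodal behavior
    return means

print(naive_benchmark())
```

The slides that follow take this setup apart one decision at a time.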
  7. WHAT'S WRONG WITH THIS BENCHMARK?
  8. YOU'RE WRONG ABOUT THE MACHINE
  9. Wrong About the Machine • Cache, cache, cache, cache!
  10. It's Caches All The Way Down (diagram: web request → server → cache → S3)
  11. It's Caches All The Way Down
  12.-16. Caches in Benchmarks (charts: Prof. Saman Amarasinghe, MIT 2009)
  17. Website Serving Images • Access 1 image 1000 times • Latency measured for each access • Start measuring immediately • 3 runs • Find mean • Dev environment (diagram: web request → server → cache → S3)
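One way to stop measuring only the cache is to spread requests over many distinct objects, so the working set no longer fits in any single cache layer. A sketch under that assumption, again with hypothetical URLs:

```python
# Sketch: request a different image each time so the measurement isn't dominated
# by cache layers warmed up by a single hot object.
import random
import time
import urllib.request

BASE_URL = "http://example.com/images"               # hypothetical endpoint
IMAGE_IDS = [f"img-{i}.jpg" for i in range(10_000)]  # assumed larger than any cache layer

def measure_one():
    url = f"{BASE_URL}/{random.choice(IMAGE_IDS)}"
    start = time.perf_counter()
    urllib.request.urlopen(url).read()
    return time.perf_counter() - start

latencies = [measure_one() for _ in range(1000)]
```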
  18. Wrong About the Machine • Cache, cache, cache, cache! • Warmup & timing
  19. Website Serving Images • Access 1 image 1000 times • Latency measured for each access • Start measuring immediately • 3 runs • Find mean • Dev environment (diagram: web request → server → cache → S3)
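For "Warmup & timing", a common fix is to run the workload for a while before recording anything, so connection pools, caches, and any JIT-compiled code reach steady state. A sketch, with a hypothetical measure_once() standing in for one request:

```python
# Sketch: discard warmup iterations before collecting measured samples.
import time

def benchmark(measure_once, warmup=200, samples=1000):
    for _ in range(warmup):
        measure_once()                 # results intentionally discarded: warms caches, pools, JIT
    recorded = []
    for _ in range(samples):
        start = time.perf_counter()    # monotonic, high-resolution timer
        measure_once()
        recorded.append(time.perf_counter() - start)
    return recorded
```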
  20. Wrong About the Machine • Cache, cache, cache, cache! • Warmup & timing • Periodic interference
  21. Website Serving Images • Access 1 image 1000 times • Latency measured for each access • Start measuring immediately • 3 runs • Find mean • Dev environment (diagram: web request → server → cache → S3)
  22. Wrong About the Machine • Cache, cache, cache, cache! • Warmup & timing • Periodic interference • Test != Prod
  23. Website Serving Images • Access 1 image 1000 times • Latency measured for each access • Start measuring immediately • 3 runs • Find mean • Dev environment (diagram: web request → server → cache → S3)
  24. Wrong About the Machine • Cache, cache, cache, cache! • Warmup & timing • Periodic interference • Test != Prod • Power mode changes
  25. YOU'RE WRONG ABOUT THE STATS
  26. Wrong About Stats • Too few samples
  27. Convergence of Median on Samples (chart: latency vs. time for stable and decaying sample streams, each with its running median)
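A cheap way to check whether you have "too few samples" is to watch a running statistic converge as samples accumulate, in the spirit of the chart above. A small sketch (recomputing from scratch for clarity):

```python
# Sketch: track how the running median changes as more samples arrive;
# only trust the result once it has stabilized within some tolerance.
import statistics

def running_medians(samples):
    return [statistics.median(samples[:n]) for n in range(1, len(samples) + 1)]

def converged(medians, window=20, tolerance=0.01):
    if len(medians) < window:
        return False
    recent = medians[-window:]
    return (max(recent) - min(recent)) <= tolerance * medians[-1]
```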
  28. Website Serving Images • Access 1 image 1000 times • Latency measured for each access • Start measuring immediately • 3 runs • Find mean • Dev machine (diagram: web request → server → cache → S3)
  29. Wrong About Stats • Too few samples • Gaussian (not)
  30. Website Serving Images • Access 1 image 1000 times • Latency measured for each access • Start measuring immediately • 3 runs • Find mean • Dev machine (diagram: web request → server → cache → S3)
  31. Wrong About Stats • Too few samples • Gaussian (not) • Multimodal distribution
  32. Multimodal Distribution (chart: # occurrences vs. latency, with modes near 5 ms and 10 ms and the 50th and 99th percentiles marked)
  33. Wrong About Stats • Too few samples • Gaussian (not) • Multimodal distribution • Outliers
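Because latency distributions are rarely Gaussian, and are often multimodal with heavy-tailed outliers, reporting percentiles is usually more honest than a single mean. A minimal sketch using a simple nearest-rank approximation:

```python
# Sketch: summarize a latency sample with percentiles rather than a mean,
# so multiple modes and outliers stay visible in the report.
def percentile(sorted_samples, p):
    index = min(int(p / 100 * len(sorted_samples)), len(sorted_samples) - 1)
    return sorted_samples[index]

def summarize(latencies):
    s = sorted(latencies)
    return {
        "p50": percentile(s, 50),
        "p95": percentile(s, 95),
        "p99": percentile(s, 99),
        "max": s[-1],
    }
```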
  34. YOU'RE WRONG ABOUT WHAT MATTERS
  35. Wrong About What Matters • Premature optimization
  36. "Programmers waste enormous amounts of time thinking about … the speed of noncritical parts of their programs ... Forget about small efficiencies … 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%." -- Donald Knuth
  37. Wrong About What Matters • Premature optimization • Unrepresentative workloads
  38. Wrong About What Matters • Premature optimization • Unrepresentative workloads • Memory pressure
  39. Wrong About What Matters • Premature optimization • Unrepresentative workloads • Memory pressure • Load balancing
  40. Wrong About What Matters • Premature optimization • Unrepresentative workloads • Memory pressure • Load balancing • Reproducibility of measurements
  41. BECOMING LESS WRONG
  42. User Actions Matter: "X > Y for workload Z with trade-offs A, B, and C" (http://www.toomuchcode.org/)
  43. Profiling • Code instrumentation • Aggregate over logs • Traces
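"Code instrumentation, aggregate over logs, traces" can start as small as a timing wrapper that emits one log line per call, which is then aggregated offline into per-operation percentiles. A minimal sketch; the operation and function names are hypothetical:

```python
# Sketch: per-call timing instrumentation whose log lines can later be
# aggregated from the log stream (e.g. into percentiles per operation).
import functools
import logging
import time

logging.basicConfig(level=logging.INFO)

def timed(operation_name):
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                elapsed_us = (time.perf_counter() - start) * 1e6
                logging.info("op=%s latency_us=%.1f", operation_name, elapsed_us)
        return wrapper
    return decorator

@timed("fetch_image")
def fetch_image(key):      # hypothetical operation being instrumented
    ...
```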
  44. Microbenchmarking: Blessing & Curse • + Quick & cheap • + Answers narrow questions well • - Often misleading results • - Not representative of the program
  45. Microbenchmarking: Blessing & Curse • Choose your N wisely
  46. Choose Your N Wisely (chart: Prof. Saman Amarasinghe, MIT 2009)
  47. Microbenchmarking: Blessing & Curse • Choose your N wisely • Measure side effects
  48. Microbenchmarking: Blessing & Curse • Choose your N wisely • Measure side effects • Beware of clock resolution
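"Beware of clock resolution" matters because a microbenchmark iteration can be shorter than the smallest tick the clock can report. In Python you can at least inspect the clocks you are using and batch iterations so each timed region is much longer than one tick; a sketch:

```python
# Sketch: inspect timer resolution, then time a batch of iterations rather
# than a single one so quantization error stays small.
import time

print(time.get_clock_info("time"))          # wall clock: coarser, can be adjusted (NTP)
print(time.get_clock_info("perf_counter"))  # monotonic, highest available resolution

def time_batched(fn, iterations=10_000):
    start = time.perf_counter()
    for _ in range(iterations):
        fn()
    return (time.perf_counter() - start) / iterations  # per-iteration estimate
```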
  49. Microbenchmarking: Blessing & Curse • Choose your N wisely • Measure side effects • Beware of clock resolution • Dead code elimination
  50. Microbenchmarking: Blessing & Curse • Choose your N wisely • Measure side effects • Beware of clock resolution • Dead code elimination • Constant work per iteration
  51. Non-Constant Work Per Iteration
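A concrete example of non-constant work per iteration, sketched in Python: if each iteration appends to a shared list and then scans it, later iterations do more work than earlier ones, so per-iteration timings drift upward for reasons unrelated to the code under test.

```python
# Sketch: the two loops look similar, but only the second does the same
# amount of work on every iteration.
import time

def non_constant_work(iterations=1000):
    data, samples = [], []
    for i in range(iterations):
        start = time.perf_counter()
        data.append(i)
        total = sum(data)              # cost grows with i: iteration i scans i + 1 elements
        samples.append(time.perf_counter() - start)
    return samples

def constant_work(iterations=1000):
    data = list(range(1000))
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        total = sum(data)              # same amount of work every iteration
        samples.append(time.perf_counter() - start)
    return samples
```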
  52. Follow-up Material • "How NOT to Measure Latency" by Gil Tene – http://www.infoq.com/presentations/latency-pitfalls • "Taming the Long Latency Tail" on highscalability.com – http://highscalability.com/blog/2012/3/12/google-taming-the-long-latency-tail-when-more-machines-equal.html • "Performance Analysis Methodology" by Brendan Gregg – http://www.brendangregg.com/methodology.html • "Silverman's Mode Detection Method" by Matt Adereth – http://adereth.github.io/blog/2014/10/12/silvermans-mode-detection-method-explained/
  53. HAVING FUN WITH RIAK
  54. Setup • 30 GB SSD • M3 large • Riak version 1.4.2-0-g61ac9d8 • Ubuntu 12.04.5 LTS • 4-byte keys, 10 KB values
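To reproduce something in the spirit of the measurement behind the next chart, one could time GETs with 4-byte keys and 10 KB values against a local Riak node. This is only a sketch: the HTTP endpoint layout, port, and bucket name below are assumptions about Riak's HTTP interface rather than details from the talk, and a dedicated load generator such as basho_bench would be the more usual tool.

```python
# Sketch: populate N keys with 10 KB values, then time GETs over Riak's HTTP
# interface (endpoint layout and bucket name are assumptions, not from the talk).
import os
import time
import urllib.request

RIAK = "http://127.0.0.1:8098"   # assumed local Riak HTTP port
BUCKET = "bench"                  # hypothetical bucket

def put(key, value):
    req = urllib.request.Request(
        f"{RIAK}/buckets/{BUCKET}/keys/{key}", data=value, method="PUT",
        headers={"Content-Type": "application/octet-stream"})
    urllib.request.urlopen(req).read()

def timed_get(key):
    start = time.perf_counter()
    urllib.request.urlopen(f"{RIAK}/buckets/{BUCKET}/keys/{key}").read()
    return time.perf_counter() - start

keys = [f"{i:04d}" for i in range(1000)]   # 4-byte keys, as in the setup slide
for k in keys:
    put(k, os.urandom(10 * 1024))          # 10 KB values

latencies = [timed_get(k) for k in keys]
```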
  55. Get Latency (chart: latency in usec vs. number of keys, with an "L3" marker)
  56. Takeaway #1: Cache
  57. Takeaway #2: Outliers
  58. Takeaway #3: Workload
  59. Benchmarking: You're Doing It Wrong (Aysylu Greenberg, @aysylu22)
