Benchmarking: You’re Doing It Wrong
Aysylu Greenberg
@aysylu22
October 2015
Benchmarking (JAXLondon 2015)

Knowing how to set up good benchmarks is invaluable for understanding the performance of a system. Writing correct and useful benchmarks is hard, and verifying the results is difficult and error-prone. When done right, benchmarks guide teams to improve the performance of their systems. When done wrong, hours of effort may result in a worse-performing application, upset customers, or worse! In this talk, we will discuss what you need to know to write better benchmarks. We will look at examples of bad benchmarks and learn which biases can invalidate the measurements, in the hope of correctly applying our new-found skills and avoiding such pitfalls in the future.

Published in: Software

  1. Benchmarking: You’re Doing It Wrong
      Aysylu Greenberg, @aysylu22, October 2015
  2. Aysylu Greenberg, @aysylu22
  3. To Write Good Benchmarks… Need to be Full Stack
  4. your process vs goal
      your process vs best practices
      Benchmark = How Fast?
  5. Today
      • How Not to Write Benchmarks
      • Benchmark Setup & Results:
          - You’re wrong about machines
          - You’re wrong about stats
          - You’re wrong about what matters
      • Becoming Less Wrong
  6. HOW NOT TO WRITE BENCHMARKS
  7. Website Serving Images
      • Access 1 image 1000 times
      • Latency measured for each access
      • Start measuring immediately
      • 3 runs
      • Find mean
      • Dev environment
      (diagram labels: Web Request, Server, S3 Cache)
  8. WHAT’S WRONG WITH THIS BENCHMARK?
  9. YOU’RE WRONG ABOUT THE MACHINE
  10. Wrong About the Machine
      • Cache, cache, cache, cache!
  11. It’s Caches All The Way Down
      (diagram labels: Web Request, Server, S3 Cache)
  12. It’s Caches All The Way Down
  13. Prefetching: Program
  14. Prefetching: Disabled
  15. Prefetching: Enabled
  16. Caches in Benchmarks (Prof. Saman Amarasinghe, MIT 2009)
  17. Caches in Benchmarks (Prof. Saman Amarasinghe, MIT 2009)
  18. Caches in Benchmarks (Prof. Saman Amarasinghe, MIT 2009)
  19. Caches in Benchmarks (Prof. Saman Amarasinghe, MIT 2009)
  20. Caches in Benchmarks (Prof. Saman Amarasinghe, MIT 2009)
  21. Website Serving Images
      • Access 1 image 1000 times
      • Latency measured for each access
      • Start measuring immediately
      • 3 runs
      • Find mean
      • Dev environment
      (diagram labels: Web Request, Server, S3 Cache)
  22. Wrong About the Machine
      • Cache, cache, cache, cache!
      • Warmup & timing
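The "Warmup & timing" point can be illustrated with a small Python sketch. It is not taken from the talk: `fetch_image` is a hypothetical stand-in for the image request, and the warm-up and sample counts are arbitrary assumptions. The idea is only that calls made before the system reaches a steady state are executed but not recorded.

```python
import time
from typing import Callable, List


def measure_latencies(operation: Callable[[], object],
                      warmup: int = 100,
                      samples: int = 1000) -> List[float]:
    """Return per-call latencies in seconds, discarding warm-up calls."""
    # Warm-up phase: run the operation without recording anything, so that
    # caches and connection setup settle before measurement starts.
    for _ in range(warmup):
        operation()

    latencies = []
    for _ in range(samples):
        start = time.perf_counter()
        operation()
        latencies.append(time.perf_counter() - start)
    return latencies


# Hypothetical stand-in for "access one image": the first call is slow
# (cold cache), later calls are served from an in-memory dict.
_cache = {}


def fetch_image(key: str = "cat.png") -> bytes:
    if key not in _cache:
        time.sleep(0.01)           # simulated cold fetch
        _cache[key] = b"\x89PNG"   # placeholder payload
    return _cache[key]


if __name__ == "__main__":
    lat = measure_latencies(fetch_image, warmup=10, samples=100)
    print(f"max latency after warm-up: {max(lat) * 1e3:.3f} ms")
```

Note that warming up and then requesting the same image repeatedly measures only the cached path, which is exactly the bias the "Cache, cache, cache, cache!" bullet warns about; whether that is the right thing to measure depends on the production workload.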
  23. Website Serving Images
      • Access 1 image 1000 times
      • Latency measured for each access
      • Start measuring immediately
      • 3 runs
      • Find mean
      • Dev environment
      (diagram labels: Web Request, Server, S3 Cache)
  24. Wrong About the Machine
      • Cache, cache, cache, cache!
      • Warmup & timing
      • Periodic interference
  25. Periodic Interference
  26. Periodic Interference
  27. Website Serving Images
      • Access 1 image 1000 times
      • Latency measured for each access
      • Start measuring immediately
      • 3 runs
      • Find mean
      • Dev environment
      (diagram labels: Web Request, Server, S3 Cache)
  28. Wrong About the Machine
      • Cache, cache, cache, cache!
      • Warmup & timing
      • Periodic interference
      • Test != Prod
  29. Website Serving Images
      • Access 1 image 1000 times
      • Latency measured for each access
      • Start measuring immediately
      • 3 runs
      • Find mean
      • Dev environment
      (diagram labels: Web Request, Server, S3 Cache)
  30. Wrong About the Machine
      • Cache, cache, cache, cache!
      • Warmup & timing
      • Periodic interference
      • Test != Prod
      • Power mode changes
  31. Power Modes
      $ cat /sys/devices/system/cpu/*/cpufreq/scaling_governor
      “ondemand” OR “performance”
      Current CPU frequencies:
      $ grep "MHz" /proc/cpuinfo
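The governor check on the Power Modes slide could be automated before a benchmark run. The following is a minimal Python sketch under stated assumptions: it reads the same Linux sysfs files as the `cat` command above, and the function names `read_scaling_governors` and `all_performance` are hypothetical helpers, not part of any library.

```python
from pathlib import Path
from typing import Dict


def read_scaling_governors(cpu_root: str = "/sys/devices/system/cpu") -> Dict[str, str]:
    """Map each CPU directory name (e.g. 'cpu0') to its scaling governor."""
    governors = {}
    pattern = "cpu[0-9]*/cpufreq/scaling_governor"
    for gov_file in sorted(Path(cpu_root).glob(pattern)):
        cpu_name = gov_file.parent.parent.name
        governors[cpu_name] = gov_file.read_text().strip()
    return governors


def all_performance(governors: Dict[str, str]) -> bool:
    """True only if at least one CPU was found and all use 'performance'."""
    return bool(governors) and all(g == "performance" for g in governors.values())


if __name__ == "__main__":
    govs = read_scaling_governors()
    if not govs:
        print("no cpufreq information found (non-Linux host or no cpufreq driver)")
    elif not all_performance(govs):
        print("warning: some CPUs may change frequency during the run:", govs)
    else:
        print("all CPUs use the 'performance' governor")
```

A benchmark harness might refuse to start, or at least record the governor alongside the results, when this check reports anything other than "performance".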
  32. YOU’RE WRONG ABOUT THE STATS
  33. Wrong About Stats
      • Too few samples
  34. Convergence of Median on Samples
      (chart: latency vs. number of runs; series: Stable Samples, Stable Median, Decaying Samples, Decaying Median)
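One way to decide whether enough samples have been collected is to watch the running median settle, as the chart above suggests. The sketch below is an illustration only: the window size and tolerance are arbitrary assumptions, and `has_converged` is a hypothetical helper rather than a method described in the talk.

```python
import statistics
from typing import List, Sequence


def running_median(samples: Sequence[float]) -> List[float]:
    """Median of the first k samples, for k = 1..len(samples)."""
    return [statistics.median(samples[:k]) for k in range(1, len(samples) + 1)]


def has_converged(medians: Sequence[float],
                  window: int = 10,
                  tolerance: float = 0.01) -> bool:
    """True if the last `window` running medians stay within a relative band."""
    if len(medians) < window:
        return False
    tail = medians[-window:]
    low, high = min(tail), max(tail)
    return high - low <= tolerance * abs(high)


if __name__ == "__main__":
    # Made-up data: one stable series and one that keeps drifting downwards.
    stable = [100.0] * 30
    decaying = [120.0 - 2 * i for i in range(30)]
    print("stable converged:  ", has_converged(running_median(stable)))
    print("decaying converged:", has_converged(running_median(decaying)))
```

A series whose median is still drifting, like the decaying one here, suggests the system has not reached a steady state yet, so collecting more of the same samples would not by itself make the result trustworthy.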
  35. Website Serving Images
      • Access 1 image 1000 times
      • Latency measured for each access
      • Start measuring immediately
      • 3 runs
      • Find mean
      • Dev machine
      (diagram labels: Web Request, Server, S3 Cache)
  36. Wrong About Stats
      • Too few samples
      • Gaussian (not)
  37. Website Serving Images
      • Access 1 image 1000 times
      • Latency measured for each access
      • Start measuring immediately
      • 3 runs
      • Find mean
      • Dev machine
      (diagram labels: Web Request, Server, S3 Cache)
  38. Wrong About Stats
      • Too few samples
      • Gaussian (not)
      • Multimodal distribution
  39. Multimodal Distribution
      (histogram: number of occurrences vs. latency, with marks at 5 ms and 10 ms and at the 50% and 99% percentiles)
  40. Multimodal Distribution
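Why "find mean" is a poor summary for a multimodal distribution can be shown with a few lines of Python. The latencies below are made up for illustration (a fast mode near 5 ms and a slow mode near 10 ms, loosely following the histogram labels); they are not measurements from the talk.

```python
import statistics

# Illustrative, made-up latencies in milliseconds: 90 fast requests at 5 ms
# (e.g. cache hits) and 10 slow requests at 10 ms (e.g. cache misses).
latencies_ms = [5.0] * 90 + [10.0] * 10

mean = statistics.mean(latencies_ms)

# quantiles(n=100) returns the 99 cut points between percentiles 1..99.
cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
p50, p99 = cuts[49], cuts[98]

print(f"mean={mean:.2f} ms  p50={p50:.2f} ms  p99={p99:.2f} ms")
```

The mean lands between the two modes at a latency that no request in the sample actually experienced, while the 50th and 99th percentiles each describe one of the modes.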
  41. Wrong About Stats
      • Too few samples
      • Gaussian (not)
      • Multimodal distribution
      • Outliers
  42. Coordinated Omission
      (timeline diagram: request/response pairs plotted against time, axis from 0 to 80)
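Coordinated omission can be sketched with a tiny simulation. This is an illustrative model, not code from the talk: a single-threaded client intends to send one request every `interval` time units but cannot send the next request until the previous response arrives. The service times are made up, and measuring from the intended send time is one commonly described correction, assumed here for illustration.

```python
from typing import List, Tuple


def simulate(service_times: List[float],
             interval: float) -> Tuple[List[float], List[float]]:
    """Return (naive, corrected) latencies for a blocking client.

    naive     = response time - actual send time
    corrected = response time - intended send time
    """
    naive, corrected = [], []
    clock = 0.0
    for i, service in enumerate(service_times):
        intended = i * interval
        # The client is blocked until the previous response has arrived.
        actual_send = max(clock, intended)
        done = actual_send + service
        naive.append(done - actual_send)
        corrected.append(done - intended)
        clock = done
    return naive, corrected


if __name__ == "__main__":
    # One long stall (45 time units) among otherwise fast responses.
    naive, corrected = simulate([1, 1, 45, 1, 1, 1], interval=10)
    print("naive:    ", naive)
    print("corrected:", corrected)
```

In the naive series only one sample shows the stall, because the client stopped sending while it waited. The corrected series also charges the delay to the requests that should have been sent during the stall, which is what a real user arriving at those moments would have experienced.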
  43. Wrong About Stats
      • Too few samples
      • Gaussian (not)
      • Multimodal distribution
      • Outliers
  44. YOU’RE WRONG ABOUT WHAT MATTERS
  45. Wrong About What Matters
      • Premature optimization
  46. “Programmers waste enormous amounts of time thinking about … the speed of noncritical parts of their programs ... Forget about small efficiencies … 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.”
      -- Donald Knuth
  47. Wrong About What Matters
      • Premature optimization
      • Unrepresentative workloads
  48. Wrong About What Matters
      • Premature optimization
      • Unrepresentative workloads
      • Memory pressure
  49. Wrong About What Matters
      • Premature optimization
      • Unrepresentative workloads
      • Memory pressure
      • Hidden components
  50. Wrong About What Matters
      • Premature optimization
      • Unrepresentative workloads
      • Memory pressure
      • Hidden components
      • Reproducibility of measurements
  51. BECOMING LESS WRONG
  52. User Actions Matter
      X > Y for workload Z with trade-offs A, B, and C
      - http://www.toomuchcode.org/
  53. Profiling
  54. Profiling: perf, gprof & Oprofile, YourKit & jProfiler, jVisualVM, cProfile
  55. Profiling: perf, gprof & Oprofile, YourKit & jProfiler, jVisualVM, cProfile
  56. perf
      # Various basic CPU statistics, system wide, for 10 seconds
      perf stat -e cycles,instructions,cache-misses -a sleep 10
      # Count system calls for the entire system, for 5 seconds
      perf stat -e 'syscalls:sys_enter_*' -a sleep 5
      # Sample CPU stack traces, once every 10,000 Level 1 data cache misses, for 5 seconds
      perf record -e L1-dcache-load-misses -c 10000 -ag -- sleep 5
      http://www.brendangregg.com/perf.html
  57. perf
      http://www.brendangregg.com/perf.html
  58. Profiling: perf, gprof & Oprofile, YourKit & jProfiler, jVisualVM, cProfile
  59. gprof: Where Does It Spend Its Time?
      • Compile with profiling
      • Execute the code
      • Run gprof
      http://www.thegeekstuff.com/2012/08/gprof-tutorial/
  60. gprof: Where Does It Spend Its Time?
      http://www.thegeekstuff.com/2012/08/gprof-tutorial/
  61. Profiling: perf, gprof & Oprofile, YourKit & jProfiler, jVisualVM, cProfile
  62. http://www.brendangregg.com/linuxperf.html
  63. Profiling: perf, gprof & Oprofile, YourKit & jProfiler, jVisualVM, cProfile
  64. Profiling: perf, gprof & Oprofile, YourKit & jProfiler, jVisualVM, cProfile
  65. Profiling: perf, gprof & Oprofile, YourKit & jProfiler, jVisualVM, cProfile
  66. Profiling: perf, gprof & Oprofile, YourKit & jProfiler, jVisualVM, cProfile
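Of the profilers listed, cProfile ships with Python's standard library, so a short sketch is easy to try. The workload below (`slow_sum`, `workload`) is a hypothetical example invented for illustration; only the `cProfile` and `pstats` calls are real library API.

```python
import cProfile
import io
import pstats


def slow_sum(n: int) -> int:
    """Hypothetical hot spot: a deliberately naive summation loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def workload() -> int:
    """Hypothetical program under test."""
    return sum(slow_sum(20_000) for _ in range(20))


def profile_report(func, top: int = 5) -> str:
    """Run `func` under cProfile and return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(top)
    return out.getvalue()


if __name__ == "__main__":
    print(profile_report(workload))
```

Profiling first and benchmarking second fits the earlier "premature optimization" slide: the report shows where time is actually spent before any effort goes into measuring or tuning a specific function.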
  67. Profiling
      • Code instrumentation
      • Aggregate over logs
      • Traces
  68. Microbenchmarking: Blessing & Curse
      + Quick & cheap
      + Answers narrow ?s well
      - Often misleading results
      - Not representative of the program
  69. Microbenchmarking: Blessing & Curse
      • Choose your N wisely
  70. Choose Your N Wisely (Prof. Saman Amarasinghe, MIT 2009)
  71. Microbenchmarking: Blessing & Curse
      • Choose your N wisely
      • Measure side effects
  72. Microbenchmarking: Blessing & Curse
      • Choose your N wisely
      • Measure side effects
      • Beware of clock resolution
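To make "beware of clock resolution" concrete, the sketch below compares the resolution a clock advertises with the smallest tick actually observed. It is an illustration under assumptions: `observed_tick` is a hypothetical helper, and the numbers it reports vary by operating system and hardware.

```python
import time


def observed_tick(clock=time.perf_counter_ns, attempts: int = 100_000) -> int:
    """Smallest positive difference seen between consecutive clock readings.

    The unit is whatever `clock` returns (nanoseconds for the default).
    Returns 0 if the clock never advanced during the attempts.
    """
    smallest = None
    previous = clock()
    for _ in range(attempts):
        current = clock()
        delta = current - previous
        if delta > 0 and (smallest is None or delta < smallest):
            smallest = delta
        previous = current
    return smallest if smallest is not None else 0


if __name__ == "__main__":
    info = time.get_clock_info("perf_counter")
    print(f"advertised resolution: {info.resolution} s")
    print(f"smallest observed tick: {observed_tick()} ns")
```

If the operation being measured takes about as long as one tick, a single timed call is mostly quantization noise; timing a batch of calls and dividing by the batch size is a common way around that.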
  73. Microbenchmarking: Blessing & Curse
      • Choose your N wisely
      • Measure side effects
      • Beware of clock resolution
      • Dead Code Elimination
  74. Microbenchmarking: Blessing & Curse
      • Choose your N wisely
      • Measure side effects
      • Beware of clock resolution
      • Dead Code Elimination
      • Constant work per iteration
  75. Non-Constant Work Per Iteration
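The slide's own example is not preserved in the transcript, so the following Python sketch is a substitute illustration of non-constant work per iteration, not the speaker's code. Inserting at the front of a list gets more expensive as the list grows, so the "time per iteration" depends on how many iterations the benchmark happens to run.

```python
import time


def time_front_inserts(n: int) -> float:
    """Total seconds to insert n items at the front of an initially empty list."""
    items = []
    start = time.perf_counter()
    for i in range(n):
        # Cost grows with len(items): later iterations shift more elements.
        items.insert(0, i)
    return time.perf_counter() - start


if __name__ == "__main__":
    for n in (1_000, 10_000, 40_000):
        total = time_front_inserts(n)
        print(f"n={n:>6}: {total / n * 1e9:8.1f} ns per insert (average)")
```

Because the average per-insert time is expected to rise with `n`, reporting a single "cost per operation" from such a loop would be misleading; a benchmark with constant work per iteration would reset or bound the list size between measurements.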
  76. What Should a Benchmark Do?
      • Measure behavior of system
      • Represent realistic workload
      • Run for sufficiently long time
      • Compare in the same context
      • Output predictable and reproducible results
  77. Follow-up Material
      • How NOT to Measure Latency by Gil Tene
          http://www.infoq.com/presentations/latency-pitfalls
      • Taming the Long Latency Tail on highscalability.com
          http://highscalability.com/blog/2012/3/12/google-taming-the-long-latency-tail-when-more-machines-equal.html
      • Performance Analysis Methodology by Brendan Gregg
          http://www.brendangregg.com/methodology.html
      • Silverman’s Mode Detection Method by Matt Adereth
          http://adereth.github.io/blog/2014/10/12/silvermans-mode-detection-method-explained/
      • How Not To Measure System Performance by James Bornholt
          https://homes.cs.washington.edu/~bornholt/post/performance-evaluation.html
      • Trust No One, Not Even Performance Counters by Paul Khuong
          http://www.pvk.ca/Blog/2014/10/19/performance-optimisation-~-writing-an-essay/#trust-no-one
  78. Follow-up Material
      http://www-plan.cs.colorado.edu/diwan/asplos09.pdf
  79. Follow-up Material
      • List of media for learning more about measurement bias in system benchmarks:
          https://gist.github.com/aysylu/58ab5d67314d684a7f4c
  80. Takeaway #1: Cache
  81. Takeaway #2: Outliers
  82. Takeaway #3: Workload
  83. Benchmarking: You’re Doing It Wrong
      Aysylu Greenberg, @aysylu22
