Benchmarking (RICON 2014)

Knowledge of how to set up good benchmarks is invaluable in understanding a system’s performance. Writing correct and useful benchmarks is hard, and verifying the results is difficult and error-prone. When done right, benchmarks guide teams in improving the performance of their systems. When done wrong, hours of effort may result in a worse-performing application, upset customers, or worse! In this talk, we discuss what you need to know to write better benchmarks for distributed systems. We look at examples of bad benchmarks and the biases that can invalidate their measurements, in the hope of correctly applying our new-found skills and avoiding such pitfalls in the future.


  1. Benchmarking: You’re Doing It Wrong (Aysylu Greenberg, @aysylu22)
  2. To Write Good Benchmarks… Need to Be Full Stack
  3. Benchmark = How Fast? Your process vs. your goal; your process vs. best practices
  4. Today • How Not to Write Benchmarks • Benchmark Setup & Results: - You’re wrong about machines - You’re wrong about stats - You’re wrong about what matters • Becoming Less Wrong • Having Fun with Riak
  5. HOW NOT TO WRITE BENCHMARKS
  6. Website Serving Images • Access 1 image 1000 times • Latency measured for each access • Start measuring immediately • 3 runs • Find mean • Dev environment (Diagram: Web Request → Server Cache → S3)
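For concreteness, here is roughly what the flawed benchmark from this slide looks like as code. This is a minimal Python sketch, not the talk’s code; the URL and the use of urllib are hypothetical stand-ins for the image-serving setup. Each comment marks one of the problems the following slides unpack:

```python
import statistics
import time
import urllib.request

URL = "https://example.com/image.jpg"  # hypothetical image endpoint

def naive_benchmark():
    latencies = []
    for _ in range(1000):                # same image 1000 times: after the
        start = time.perf_counter()      # first hit we mostly measure caches
        urllib.request.urlopen(URL).read()
        latencies.append(time.perf_counter() - start)
    return statistics.mean(latencies)    # the mean hides a multimodal shape

# Start measuring immediately (no warmup), 3 runs, on a dev machine:
# none of these resemble production.
print([naive_benchmark() for _ in range(3)])
```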
  7. WHAT’S WRONG WITH THIS BENCHMARK?
  8. YOU’RE WRONG ABOUT THE MACHINE
  9. Wrong About the Machine • Cache, cache, cache, cache!
  10. It’s Caches All The Way Down (Diagram: Web Request → Server Cache → S3)
  11. It’s Caches All The Way Down
  12.–16. Caches in Benchmarks (Prof. Saman Amarasinghe, MIT 2009; repeated across five build slides)
  17. Website Serving Images • Access 1 image 1000 times • Latency measured for each access • Start measuring immediately • 3 runs • Find mean • Dev environment (Diagram: Web Request → Server Cache → S3)
  18. Wrong About the Machine • Cache, cache, cache, cache! • Warmup & timing
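A minimal sketch of the warmup fix, assuming a callable `fn` under test (`fn` is my placeholder, not the talk’s code): discard the first iterations so caches, connection pools, and lazy initialization settle before the timer starts.

```python
import statistics
import time

def bench(fn, warmup=100, runs=1000):
    # Warmup phase: populate caches, open connections, trigger lazy
    # initialization -- none of these iterations are recorded.
    for _ in range(warmup):
        fn()
    # Measured phase: time each call individually so we keep the full
    # latency distribution, not just one aggregate number.
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples), samples
```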
  19. Website Serving Images • Access 1 image 1000 times • Latency measured for each access • Start measuring immediately • 3 runs • Find mean • Dev environment (Diagram: Web Request → Server Cache → S3)
  20. Wrong About the Machine • Cache, cache, cache, cache! • Warmup & timing • Periodic interference
  21. Website Serving Images • Access 1 image 1000 times • Latency measured for each access • Start measuring immediately • 3 runs • Find mean • Dev environment (Diagram: Web Request → Server Cache → S3)
  22. Wrong About the Machine • Cache, cache, cache, cache! • Warmup & timing • Periodic interference • Test != Prod
  23. Website Serving Images • Access 1 image 1000 times • Latency measured for each access • Start measuring immediately • 3 runs • Find mean • Dev environment (Diagram: Web Request → Server Cache → S3)
  24. Wrong About the Machine • Cache, cache, cache, cache! • Warmup & timing • Periodic interference • Test != Prod • Power mode changes
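Several of these problems (periodic interference, power-mode changes) only become visible if you keep timestamps with your samples instead of aggregating immediately. A sketch of the habit, with the same placeholder `fn` as above:

```python
import time

def timestamped_samples(fn, runs=1000):
    # Keep (when, how long) pairs; plotting latency against wall-clock
    # time exposes periodic spikes (cron jobs, GC pauses, thermal or
    # power-mode throttling) that a single aggregate averages away.
    samples = []
    for _ in range(runs):
        t0 = time.perf_counter()
        fn()
        samples.append((time.time(), time.perf_counter() - t0))
    return samples
```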
  25. YOU’RE WRONG ABOUT THE STATS
  26. Wrong About Stats • Too few samples
  27. Wrong About Stats [Chart: Convergence of Median on Samples; latency (0 to 120) vs. number of samples (0 to 60), with four series: stable samples, stable median, decaying samples, decaying median]
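The chart’s point can be reproduced in a few lines: watch the running median as samples accumulate, and only trust it once it stops moving. A sketch with synthetic latencies (the distribution parameters are made up for illustration):

```python
import random
import statistics

random.seed(42)
latencies = [random.gauss(50, 10) for _ in range(60)]  # synthetic samples

# Running median over a growing prefix: with too few samples it jumps
# around; it settles as the sample count grows.
for n in (3, 5, 10, 20, 40, 60):
    print(f"n={n:2d}  median={statistics.median(latencies[:n]):6.2f}")
```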
  28. Website Serving Images • Access 1 image 1000 times • Latency measured for each access • Start measuring immediately • 3 runs • Find mean • Dev machine (Diagram: Web Request → Server Cache → S3)
  29. Wrong About Stats • Too few samples • Gaussian (not)
  30. Website Serving Images • Access 1 image 1000 times • Latency measured for each access • Start measuring immediately • 3 runs • Find mean • Dev machine (Diagram: Web Request → Server Cache → S3)
  31. Wrong About Stats • Too few samples • Gaussian (not) • Multimodal distribution
  32. Multimodal Distribution [Chart: # occurrences vs. latency; a fast mode at 5 ms (around the 50th percentile) and a slow mode at 10 ms (around the 99th percentile)]
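With a bimodal distribution like the one sketched on this slide, the mean lands between the modes and describes neither; percentiles expose both. A sketch with made-up numbers echoing the slide (fast mode near 5 ms, slow mode near 10 ms):

```python
import random
import statistics

random.seed(0)
# ~90% of requests hit the fast path, ~10% the slow path (hypothetical mix).
latencies = [
    random.gauss(5.0, 0.3) if random.random() < 0.9 else random.gauss(10.0, 0.5)
    for _ in range(100_000)
]

latencies.sort()
pct = lambda q: latencies[int(q / 100 * (len(latencies) - 1))]

print(f"mean = {statistics.mean(latencies):.2f} ms")      # ~5.5 ms: neither mode
print(f"p50  = {pct(50):.2f} ms, p99 = {pct(99):.2f} ms")  # shows both modes
```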
  33. Wrong About Stats • Too few samples • Gaussian (not) • Multimodal distribution • Outliers
  34. YOU’RE WRONG ABOUT WHAT MATTERS
  35. Wrong About What Matters • Premature optimization
  36. “Programmers waste enormous amounts of time thinking about … the speed of noncritical parts of their programs … Forget about small efficiencies … 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.” -- Donald Knuth
  37. Wrong About What Matters • Premature optimization • Unrepresentative workloads
  38. Wrong About What Matters • Premature optimization • Unrepresentative workloads • Memory pressure
  39. Wrong About What Matters • Premature optimization • Unrepresentative workloads • Memory pressure • Load balancing
  40. Wrong About What Matters • Premature optimization • Unrepresentative workloads • Memory pressure • Load balancing • Reproducibility of measurements
  41. BECOMING LESS WRONG
  42. User Actions Matter: “X > Y for workload Z with trade-offs A, B, and C” (http://www.toomuchcode.org/)
  43. Profiling • Code instrumentation • Aggregate over logs • Traces
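One lightweight form of code instrumentation is a timing decorator whose samples you aggregate afterwards, much as you would aggregate over logs. A minimal sketch (the decorator and registry names are mine, not from the talk):

```python
import functools
import statistics
import time
from collections import defaultdict

DURATIONS = defaultdict(list)  # function name -> list of latencies (seconds)

def instrument(fn):
    """Record the wall-clock duration of every call to fn."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            DURATIONS[fn.__qualname__].append(time.perf_counter() - start)
    return wrapper

@instrument
def handle_request():
    time.sleep(0.001)  # stand-in for real work

for _ in range(200):
    handle_request()

for name, samples in DURATIONS.items():
    print(f"{name}: n={len(samples)}, "
          f"median={statistics.median(samples) * 1e3:.2f} ms")
```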
  44. Microbenchmarking: Blessing & Curse + Quick & cheap + Answers narrow questions well - Often misleading results - Not representative of the program
  45. Microbenchmarking: Blessing & Curse • Choose your N wisely
  46. Choose Your N Wisely (Prof. Saman Amarasinghe, MIT 2009)
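The point of this slide, as I read it: the iteration count and working-set size N decide what you actually measure. Too small, and timer and loop overhead dominate; large enough, and you cross cache boundaries and start measuring memory rather than the operation. A sketch of the method (in Python, interpreter overhead mutes the cache steps the MIT slide shows, so treat this as illustrating the technique, not the numbers):

```python
import time

def per_element_ns(n):
    data = list(range(n))          # working set of n boxed ints
    start = time.perf_counter()
    total = 0
    for x in data:                 # one pass over the working set
        total += x
    elapsed = time.perf_counter() - start
    return elapsed / n * 1e9, total

for n in (10**3, 10**5, 10**7):    # spans cache-sized to RAM-sized sets
    ns, _ = per_element_ns(n)
    print(f"n={n:>10,}  {ns:6.1f} ns/element")
```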
  47. Microbenchmarking: Blessing & Curse • Choose your N wisely • Measure side effects
  48. Microbenchmarking: Blessing & Curse • Choose your N wisely • Measure side effects • Beware of clock resolution
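Before trusting per-call timings, check what your clock can actually resolve; operations faster than one tick must be timed in batches. A quick probe (sketch):

```python
import time

print(time.get_clock_info("perf_counter"))  # reported resolution

# Empirical tick: smallest nonzero gap between consecutive readings.
def observed_tick(samples=10_000):
    best = float("inf")
    for _ in range(samples):
        t0 = time.perf_counter()
        t1 = time.perf_counter()
        while t1 == t0:
            t1 = time.perf_counter()
        best = min(best, t1 - t0)
    return best

print(f"observed tick: {observed_tick():.1e} s")
# If the operation under test is faster than this, time N calls per
# clock reading and divide by N instead of timing single calls.
```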
  49. Microbenchmarking: Blessing & Curse • Choose your N wisely • Measure side effects • Beware of clock resolution • Dead code elimination
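CPython rarely eliminates dead code, but JIT-compiled runtimes (PyPy, the JVM, and friends) will happily delete a loop whose results go nowhere, leaving you timing nothing. The standard defense is to consume every result and publish it somewhere the runtime must assume is observable. A sketch:

```python
import time

SINK = 0  # module-level sink the optimizer cannot prove unused

def bench_op(op, n=1_000_000):
    global SINK
    acc = 0
    start = time.perf_counter()
    for i in range(n):
        acc ^= op(i)      # consume each result so the call cannot be dropped
    elapsed = time.perf_counter() - start
    SINK ^= acc           # publish the accumulator
    return elapsed / n

print(bench_op(lambda i: i * i))
```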
  50. Microbenchmarking: Blessing & Curse • Choose your N wisely • Measure side effects • Beware of clock resolution • Dead code elimination • Constant work per iteration
  51. Non-Constant Work Per Iteration
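A classic instance of non-constant work: state left over from earlier iterations makes later ones more expensive, so the “per-iteration” number depends on how long the benchmark has been running. A sketch of the bug and the fix:

```python
import time

# Buggy: the list keeps growing, so each front-insert shifts an ever
# larger tail -- batch times climb even though the "operation" is fixed.
items = []
for batch in range(5):
    start = time.perf_counter()
    for _ in range(10_000):
        items.insert(0, None)
    print(f"buggy  batch {batch}: {time.perf_counter() - start:.3f} s")

# Fixed: reset the state so every batch performs identical work.
for batch in range(5):
    items = []
    start = time.perf_counter()
    for _ in range(10_000):
        items.insert(0, None)
    print(f"fixed  batch {batch}: {time.perf_counter() - start:.3f} s")
```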
  52. Follow-up Material • How NOT to Measure Latency by Gil Tene: http://www.infoq.com/presentations/latency-pitfalls • Taming the Long Latency Tail on highscalability.com: http://highscalability.com/blog/2012/3/12/google-taming-the-long-latency-tail-when-more-machines-equal.html • Performance Analysis Methodology by Brendan Gregg: http://www.brendangregg.com/methodology.html • Silverman’s Mode Detection Method by Matt Adereth: http://adereth.github.io/blog/2014/10/12/silvermans-mode-detection-method-explained/
  53. HAVING FUN WITH [Riak]
  54. Setup • SSD 30 GB • AWS m3.large instance • Riak version 1.4.2-0-g61ac9d8 • Ubuntu 12.04.5 LTS • 4-byte keys, 10 KB values
  55. [Chart: Get Latency; latency in microseconds (roughly 1850 to 2350) vs. number of keys, with the L3 cache boundary marked]
  56. Takeaway #1: Cache
  57. Takeaway #2: Outliers
  58. Takeaway #3: Workload
  59. Benchmarking: You’re Doing It Wrong (Aysylu Greenberg, @aysylu22)
