Benchmarking: You’re Doing It Wrong
Aysylu Greenberg
@aysylu22

To Write Good Benchmarks… Need to be Full Stack

Benchmark = How Fast?
your process vs Goal
your process vs Best Practices

Today
• How Not to Write Benchmarks
• Benchmark Setup & Results:
  - You’re wrong about machines
  - You’re wrong about stats
  - You’re wrong about what matters
• Becoming Less Wrong
• Having Fun with Riak
HOW NOT TO WRITE BENCHMARKS

Website Serving Images
• Access 1 image 1000 times
• Latency measured for each access
• Start measuring immediately
• 3 runs
• Find mean
• Dev environment
[Diagram: Web Request, Server, Cache, S3]

WHAT’S WRONG WITH THIS BENCHMARK?
YOU’RE WRONG ABOUT THE MACHINE

Wrong About the Machine
• Cache, cache, cache, cache!

It’s Caches All The Way Down
[Diagram: Web Request, Server, Cache, S3]

Caches in Benchmarks
[Figures: five slides credited to Prof. Saman Amarasinghe, MIT 2009]
Wrong About the Machine
• Cache, cache, cache, cache!
• Warmup & timing
• Periodic interference
• Test != Prod
• Power mode changes
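One way to act on the cache and warmup items above is to run the operation several times before recording anything. The sketch below is only an illustration: `measure` and `fetch_image` are hypothetical names that do not come from the talk, the in-memory dict stands in for the cache in front of S3, and the warmup and sample counts are arbitrary.

```python
import time


def measure(op, warmup=100, samples=1000):
    """Call op() `warmup` times without timing, then time `samples` calls.

    Returns the per-call latencies in seconds.
    """
    for _ in range(warmup):
        op()  # discarded: lets caches fill and code paths warm up
    latencies = []
    for _ in range(samples):
        start = time.perf_counter()
        op()
        latencies.append(time.perf_counter() - start)
    return latencies


# Hypothetical stand-in for the image request: the first call is slow
# (cold cache), later calls are served from an in-memory dict.
_cache = {}


def fetch_image():
    if "img" not in _cache:
        time.sleep(0.01)  # simulated trip to the backing store
        _cache["img"] = b"image-bytes"
    return _cache["img"]


latencies = measure(fetch_image, warmup=10, samples=1000)
print(f"max latency after warmup: {max(latencies) * 1e6:.1f} us")
```

With `warmup=0` the simulated 10 ms cold fetch would land in the first recorded sample. Warmup does nothing for periodic interference, test-versus-prod differences, or power mode changes; those need control over the environment the benchmark runs in.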
YOU’RE WRONG ABOUT THE STATS

Wrong About Stats
• Too few samples

Wrong About Stats
[Figure: Convergence of Median on Samples. Axes: Latency vs. Time. Series: Stable Samples, Stable Median, Decaying Samples, Decaying Median.]
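The convergence plot suggests a simple check: recompute the median as samples accumulate and see whether it settles. The following is a minimal sketch under assumptions of my own; the function names, the window of 10, and the 1% tolerance are all hypothetical choices, not values from the talk.

```python
import statistics


def running_medians(samples):
    """Median of the first k samples, for k = 1..len(samples)."""
    return [statistics.median(samples[:k]) for k in range(1, len(samples) + 1)]


def samples_until_stable(samples, window=10, tolerance=0.01):
    """Smallest sample count k at which the running median varied by at most
    `tolerance` (relative to its largest value) over the last `window` steps.

    Returns None if the median never settles within the given samples.
    """
    medians = running_medians(samples)
    for k in range(window, len(medians) + 1):
        recent = medians[k - window:k]
        lo, hi = min(recent), max(recent)
        if hi - lo <= tolerance * abs(hi):
            return k
    return None


stable = [100] * 50                      # latency that does not drift
decaying = [100 - i for i in range(50)]  # latency that keeps dropping

print(samples_until_stable(stable))
print(samples_until_stable(decaying))
```

For the constant series the median is stable as soon as the window fills (the function returns 10). For the steadily decaying series it returns `None`, which is a hint that the system is still changing and that more samples, or a longer warmup, are needed.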
Multimodal Distribution
[Figure: # occurrences vs. Latency; labels: 50%, 99%, 5 ms, 10 ms]

Wrong About Stats
• Too few samples
• Gaussian (not)
• Multimodal distribution
• Outliers
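To see why the mean alone can mislead when latencies are not Gaussian, it helps to compare it with percentiles on a small made-up sample. Everything in this sketch is an assumption for illustration: the data is synthetic (two modes at 5 ms and 10 ms plus one outlier), and `percentile` is a hand-written nearest-rank helper rather than anything shown in the talk.

```python
import math
import statistics


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Synthetic latencies in milliseconds: a fast mode, a slow mode, one outlier.
latencies_ms = [5.0] * 90 + [10.0] * 9 + [500.0]

summary = {
    "mean": statistics.mean(latencies_ms),
    "p50": percentile(latencies_ms, 50),
    "p99": percentile(latencies_ms, 99),
    "max": max(latencies_ms),
}
print(summary)
```

On this data the mean is 10.4 ms, which is higher than 99 of the 100 samples, while the median is 5 ms and the 99th percentile is 10 ms. Reporting the median together with a few high percentiles and the maximum describes such a distribution better than a single mean.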
YOU’RE WRONG ABOUT WHAT MATTERS

Wrong About What Matters
• Premature optimization

“Programmers waste enormous amounts of time thinking about … the speed of noncritical parts of their programs ... Forget about small efficiencies …97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.”
-- Donald Knuth
Wrong About What Matters
• Premature optimization
• Unrepresentative workloads
• Memory pressure
• Load balancing
• Reproducibility of measurements
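For the reproducibility item, one low-cost habit is to store the raw samples together with a description of where they were taken. The sketch below is an assumption about what such a record could contain; `run_metadata`, `save_run`, and the chosen fields are hypothetical and not part of the talk.

```python
import json
import os
import platform
import time


def run_metadata(extra=None):
    """Collect basic facts about the machine and runtime for one benchmark run."""
    meta = {
        "timestamp_utc": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "platform": platform.platform(),
        "machine": platform.machine(),
        "python": platform.python_version(),
        "cpu_count": os.cpu_count(),
    }
    meta.update(extra or {})  # e.g. workload name, key and value sizes
    return meta


def save_run(path, latencies, extra=None):
    """Write raw samples plus metadata so a run can be compared or repeated."""
    record = {"meta": run_metadata(extra), "latencies_s": list(latencies)}
    with open(path, "w") as f:
        json.dump(record, f, indent=2)


print(json.dumps(run_metadata({"workload": "example"}), indent=2))
```

Keeping the raw samples rather than only a summary makes it possible to recompute different statistics later without rerunning the benchmark.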
BECOMING LESS WRONG

User Actions Matter
X > Y for workload Z with trade-offs A, B, and C
- http://www.toomuchcode.org/

Profiling
Code instrumentation
Aggregate over logs
Traces
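Code instrumentation can be as small as a timer wrapped around the sections of interest, with the results aggregated afterwards. The sketch below assumes a hypothetical `timed` context manager and `report` helper; a real system would more likely write these durations to logs or a tracing backend and aggregate there.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Section name -> list of observed durations in seconds.
timings = defaultdict(list)


@contextmanager
def timed(name):
    """Record how long the enclosed block takes, even if it raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name].append(time.perf_counter() - start)


def report(data):
    """Aggregate the recorded durations per section."""
    return {
        name: {"count": len(d), "total_s": sum(d), "max_s": max(d)}
        for name, d in data.items()
    }


for _ in range(3):
    with timed("parse"):
        sum(range(10_000))  # placeholder for the real work

print(report(timings))
```

The instrumentation itself costs time, so very short sections are better measured in batches than one call at a time.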
Microbenchmarking: Blessing & Curse
+ Quick & cheap
+ Answers narrow ?s well
- Often misleading results
- Not representative of the program

Microbenchmarking: Blessing & Curse
• Choose your N wisely

Choose Your N Wisely
[Figure credited to Prof. Saman Amarasinghe, MIT 2009]
Microbenchmarking: Blessing & Curse
• Choose your N wisely
• Measure side effects
• Beware of clock resolution
• Dead Code Elimination
• Constant work per iteration

Non-Constant Work Per Iteration
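Two of the items above, choosing N and clock resolution, are related: a timed batch should run long enough that the clock's step size is negligible. The sketch below shows one possible approach, not the method from the talk; `clock_resolution`, `choose_n`, and the 10 ms minimum batch time are assumptions. Python's `time.get_clock_info("perf_counter").resolution` reports the advertised resolution, while the helper here estimates the step actually observed.

```python
import time


def clock_resolution(clock=time.perf_counter, probes=1000):
    """Smallest non-zero step observed between consecutive clock readings."""
    smallest = float("inf")
    for _ in range(probes):
        t0 = clock()
        t1 = clock()
        while t1 == t0:  # spin until the clock visibly advances
            t1 = clock()
        smallest = min(smallest, t1 - t0)
    return smallest


def choose_n(op, min_total=0.05, clock=time.perf_counter):
    """Double N until one timed batch of N calls lasts at least min_total seconds.

    Returns (N, mean seconds per call in that batch).
    """
    n = 1
    while True:
        start = clock()
        for _ in range(n):
            op()
        elapsed = clock() - start
        if elapsed >= min_total:
            return n, elapsed / n
        n *= 2


print(f"observed clock step: {clock_resolution():.2e} s")
n, per_call = choose_n(lambda: sum(range(100)), min_total=0.01)
print(f"N = {n}, about {per_call:.2e} s per call")
```

Dividing a batch time by N only makes sense if every iteration does the same amount of work. If `op` grows a data structure or depletes an input as it runs, later iterations are not comparable to earlier ones.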
Follow-up Material
• How NOT to Measure Latency by Gil Tene
  – http://www.infoq.com/presentations/latency-pitfalls
• Taming the Long Latency Tail on highscalability.com
  – http://highscalability.com/blog/2012/3/12/google-taming-the-long-latency-tail-when-more-machines-equal.html
• Performance Analysis Methodology by Brendan Gregg
  – http://www.brendangregg.com/methodology.html
• Silverman’s Mode Detection Method by Matt Adereth
  – http://adereth.github.io/blog/2014/10/12/silvermans-mode-detection-method-explained/
HAVING FUN WITH RIAK
Setup
• SSD 30 GB
• M3 large
• Riak version 1.4.2-0-g61ac9d8
• Ubuntu 12.04.5 LTS
• 4 byte keys, 10 KB values

Get Latency
[Figure: Latency (usec), axis ticks from 1850 to 2350, vs. Number of Keys; label: L3]

Takeaway #1: Cache

Takeaway #2: Outliers

Takeaway #3: Workload

Benchmarking (RICON 2014)

  • 1. Benchmarking: You’re Doing It Wrong Aysylu Greenberg @aysylu22
  • 2.
  • 3. To Write Good Benchmarks… Need to be Full Stack
  • 4. Benchmark = How Fast? your process vs Goal your process vs Best PracCces
  • 5. Today • How Not to Write Benchmarks • Benchmark Setup & Results: - You’re wrong about machines - You’re wrong about stats - You’re wrong about what maLers • Becoming Less Wrong • Having Fun with Riak
  • 6. HOW NOT TO WRITE BENCHMARKS
  • 7. Website Serving Images • Access 1 image 1000 Cmes • Latency measured for each access • Start measuring immediately • 3 runs • Find mean • Dev environment Web Request Server Cache S3
  • 8. WHAT’S WRONG WITH THIS BENCHMARK?
  • 9. YOU’RE WRONG ABOUT THE MACHINE
  • 10. Wrong About the Machine • Cache, cache, cache, cache!
  • 11. It’s Caches All The Way Down Web Request Server Cache S3
  • 12. It’s Caches All The Way Down
  • 13. Caches in Benchmarks Prof. Saman Amarasinghe, MIT 2009
  • 14. Caches in Benchmarks Prof. Saman Amarasinghe, MIT 2009
  • 15. Caches in Benchmarks Prof. Saman Amarasinghe, MIT 2009
  • 16. Caches in Benchmarks Prof. Saman Amarasinghe, MIT 2009
  • 17. Caches in Benchmarks Prof. Saman Amarasinghe, MIT 2009
  • 18. Website Serving Images • Access 1 image 1000 times • Latency measured for each access • Start measuring immediately • 3 runs • Find mean • Dev environment Web Request Server Cache S3
  • 19. Wrong About the Machine • Cache, cache, cache, cache! • Warmup & timing
  • 20. Website Serving Images • Access 1 image 1000 times • Latency measured for each access • Start measuring immediately • 3 runs • Find mean • Dev environment Web Request Server Cache S3
  • 21. Wrong About the Machine • Cache, cache, cache, cache! • Warmup & timing • Periodic interference
  • 22. Website Serving Images • Access 1 image 1000 times • Latency measured for each access • Start measuring immediately • 3 runs • Find mean • Dev environment Web Request Server Cache S3
  • 23. Wrong About the Machine • Cache, cache, cache, cache! • Warmup & timing • Periodic interference • Test != Prod
  • 24. Website Serving Images • Access 1 image 1000 times • Latency measured for each access • Start measuring immediately • 3 runs • Find mean • Dev environment Web Request Server Cache S3
  • 25. Wrong About the Machine • Cache, cache, cache, cache! • Warmup & timing • Periodic interference • Test != Prod • Power mode changes
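The "start measuring immediately" problem from the image-serving example can be sketched as follows. This is a minimal illustration, not code from the talk: `measure_latencies`, `warmup`, and `samples` are hypothetical names, and `operation` stands in for whatever request is being benchmarked. The idea is to run the operation untimed first so caches and connections are warm, and only then record one latency per access with a monotonic, high-resolution clock.

```python
import time


def measure_latencies(operation, warmup=100, samples=1000):
    """Return per-call latencies in nanoseconds, excluding a warmup phase.

    `operation` is any zero-argument callable (hypothetical stand-in for
    the request under test).
    """
    # Warmup: run the operation without timing it, so cold caches and
    # lazy initialization do not leak into the recorded samples.
    for _ in range(warmup):
        operation()

    latencies = []
    for _ in range(samples):
        start = time.perf_counter_ns()  # monotonic, high-resolution clock
        operation()
        latencies.append(time.perf_counter_ns() - start)
    return latencies
```

How many warmup iterations are enough depends on the system under test; the default of 100 here is arbitrary. A sketch like this does nothing about periodic interference, test/prod differences, or power mode changes, which need control over the environment rather than the measurement loop.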
  • 26. YOU’RE WRONG ABOUT THE STATS
  • 27. Wrong About Stats • Too few samples
  • 28. Wrong About Stats [Chart: Convergence of Median on Samples; axes: Latency, Time; series: Stable Samples, Stable Median, Decaying Samples, Decaying Median]
  • 29. Website Serving Images • Access 1 image 1000 times • Latency measured for each access • Start measuring immediately • 3 runs • Find mean • Dev machine Web Request Server Cache S3
  • 30. Wrong About Stats • Too few samples • Gaussian (not)
  • 31. Website Serving Images • Access 1 image 1000 times • Latency measured for each access • Start measuring immediately • 3 runs • Find mean • Dev machine Web Request Server Cache S3
  • 32. Wrong About Stats • Too few samples • Gaussian (not) • Multimodal distribution
  • 33. Multimodal Distribution [Chart: # occurrences vs. Latency; labels: 50%, 99%, 5 ms, 10 ms]
  • 34. Wrong About Stats • Too few samples • Gaussian (not) • Multimodal distribution • Outliers
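The stats points above suggest reporting percentiles instead of a single mean, since latency data is often neither Gaussian nor unimodal. The following is a rough sketch under stated assumptions: `percentile` and `summarize` are hypothetical helpers, the nearest-rank percentile definition is one choice among several, and the sample data is made up (it only borrows the 5 ms and 10 ms values from the slide).

```python
import math
import statistics


def percentile(sorted_values, p):
    """Nearest-rank percentile of an already sorted list, for 0 < p <= 100."""
    rank = math.ceil(p * len(sorted_values) / 100)
    return sorted_values[max(rank, 1) - 1]


def summarize(latencies_ms):
    """Report the mean alongside percentiles so the two can be compared."""
    ordered = sorted(latencies_ms)
    return {
        "mean": statistics.fmean(ordered),
        "p50": percentile(ordered, 50),
        "p99": percentile(ordered, 99),
        "max": ordered[-1],
    }


# Made-up bimodal sample: 90 requests at 5 ms and 10 requests at 10 ms.
sample = [5.0] * 90 + [10.0] * 10
summary = summarize(sample)
```

For this sample the mean is 5.5 ms, a latency that no request actually experienced, while the median is 5 ms and the 99th percentile is 10 ms. With only three runs, as in the image-serving example, none of these numbers would be trustworthy.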
  • 35. YOU’RE WRONG ABOUT WHAT MATTERS
  • 36. Wrong About What Matters • Premature optimization
  • 37. “Programmers waste enormous amounts of time thinking about … the speed of noncritical parts of their programs ... Forget about small efficiencies …97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.” -- Donald Knuth
  • 38. Wrong About What Matters • Premature optimization • Unrepresentative workloads
  • 39. Wrong About What Matters • Premature optimization • Unrepresentative workloads • Memory pressure
  • 40. Wrong About What Matters • Premature optimization • Unrepresentative workloads • Memory pressure • Load balancing
  • 41. Wrong About What Matters • Premature optimization • Unrepresentative workloads • Memory pressure • Load balancing • Reproducibility of measurements
  • 43. User Actions Matter X > Y for workload Z with trade offs A, B, and C - http://www.toomuchcode.org/
  • 44. Profiling Code instrumentation Aggregate over logs Traces
  • 45. Microbenchmarking: Blessing & Curse + Quick & cheap + Answers narrow ?s well - Often misleading results - Not representative of the program
  • 46. Microbenchmarking: Blessing & Curse • Choose your N wisely
  • 47. Choose Your N Wisely Prof. Saman Amarasinghe, MIT 2009
  • 48. Microbenchmarking: Blessing & Curse • Choose your N wisely • Measure side effects
  • 49. Microbenchmarking: Blessing & Curse • Choose your N wisely • Measure side effects • Beware of clock resolution
  • 50. Microbenchmarking: Blessing & Curse • Choose your N wisely • Measure side effects • Beware of clock resolution • Dead Code Elimination
  • 51. Microbenchmarking: Blessing & Curse • Choose your N wisely • Measure side effects • Beware of clock resolution • Dead Code Elimination • Constant work per iteration
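Several of the microbenchmarking cautions above (choosing N, clock resolution, keeping results live) can be illustrated together. This is only a sketch: all function names and the `min_ticks` threshold are hypothetical, and `operation` is assumed to take the iteration index and return an `int`. CPython does not remove unused computations, so the `sink` accumulator is shown mainly as the habit that matters on optimizing compilers and JITs, where an unused result may let the measured work be eliminated.

```python
import time


def estimate_clock_resolution_ns(probes=1000):
    """Smallest nonzero step observed between consecutive clock reads."""
    smallest = None
    for _ in range(probes):
        t0 = time.perf_counter_ns()
        t1 = time.perf_counter_ns()
        while t1 == t0:  # spin until the clock visibly advances
            t1 = time.perf_counter_ns()
        step = t1 - t0
        if smallest is None or step < smallest:
            smallest = step
    return smallest


def time_per_call_ns(operation, n):
    """Time n calls as one batch and fold results into a returned sink."""
    sink = 0
    start = time.perf_counter_ns()
    for i in range(n):
        sink ^= operation(i)  # consume the result so the work stays observable
    elapsed = time.perf_counter_ns() - start
    return elapsed / n, sink


def choose_n(operation, min_ticks=1000):
    """Double n until one batch lasts at least min_ticks clock steps."""
    resolution = estimate_clock_resolution_ns()
    n = 1
    while True:
        per_call, _ = time_per_call_ns(operation, n)
        if per_call * n >= min_ticks * resolution:
            return n
        n *= 2
```

Batching trades per-call detail for a measurement that is well above the clock's granularity, so it suits operations much shorter than one clock step. It also assumes roughly constant work per iteration; if the cost depends on `i`, the per-call average hides that.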
  • 53. Follow-up Material • How NOT to Measure Latency by Gil Tene – http://www.infoq.com/presentations/latency-pitfalls • Taming the Long Latency Tail on highscalability.com – http://highscalability.com/blog/2012/3/12/google-taming-the-long-latency-tail-when-more-machines-equal.html • Performance Analysis Methodology by Brendan Gregg – http://www.brendangregg.com/methodology.html • Silverman’s Mode Detection Method by Matt Adereth – http://adereth.github.io/blog/2014/10/12/silvermans-mode-detection-method-explained/
  • 55. Setup • SSD 30 GB • M3 large • Riak version 1.4.2-0-g61ac9d8 • Ubuntu 12.04.5 LTS • 4 byte keys, 10 KB values
  • 56. [Chart: Get Latency; axes: Latency (usec), Number of Keys; annotation: L3]
  • 60. Benchmarking: You’re Doing It Wrong Aysylu Greenberg @aysylu22