Java Micro-Benchmarking
Constantine Nosovsky
1
Agenda
Benchmark definition, types, common problems
Tools needed to measure performance
Code warm-up, what happens before the steady-state
Using JMH
Side effects that can affect performance
JVM optimizations (Good or Evil?)
A word about concurrency
Full example, “human factor” included
A fleeting glimpse on the JMH details
2 of 50
Benchmark definition, types, common
problems
3 of 50
What is a “Benchmark”?
A benchmark is a program for performance measurement
Requirements:
Dimensions: throughput and latency
Avoid significant overhead
Test what is to be tested
Perform a set of executions and provide stable reproducible
results
Should be easy to run
4 of 50
Benchmark types
By scale
• Micro-benchmark (component level)
• Macro-benchmark (system level)
By nature
• Synthetic benchmark (emulate component load)
• Application benchmark (run real-world application)
5 of 50
We’ll talk about
Synthetic micro-benchmark
• Mimic component workload separately from the application
• Measure performance of a small isolated piece of code
The main concern
• The smaller the component under test, the stronger the impact of
• Benchmark infrastructure overhead
• JVM internal processes
• OS and Hardware internals
• … and the phases of the Moon
• Are we sure we aren’t actually testing one of those instead?
6 of 50
When micro-benchmark is needed
Most of the time it is not needed at all
Does algorithm A work faster than B?
(consider an analytical estimate first)
Does this tiny modification make any difference?
(from Java, JVM, native code or hardware point of view)
7 of 50
Tools needed to measure performance
8 of 50
You had one job…
9 of 50
final int COUNT = 100;
long start = System.currentTimeMillis();
for (int i = 0; i < COUNT; i++) {
// doStuff();
}
long duration = System.currentTimeMillis() - start;
long avg = duration / COUNT;
System.out.println("Average execution time is " + avg + " ms");
Pitfall #0
Using a profiler to measure the performance of small methods
(adds significant overhead, measures execution “as is”)
“You had one job” approach is enough in real life
(not for micro-benchmarks, we got it already)
Annotations and reflective benchmark invocations
(otherwise you are largely measuring java.lang.reflect overhead)
10 of 50
Micro-benchmark frameworks
JMH – Takes into account a lot of internal VM processes
and executes benchmarks with minimal infrastructure (Oracle)
Caliper – Measures repetitive code, works on Android,
can post results online (Google)
Japex – Reduces infrastructure code, generates nice
HTML reports with JFreeChart plots
JUnitPerf – Measures performance of existing JUnit tests
11 of 50
Java time interval measurement
System.currentTimeMillis()
• Value in milliseconds, but granularity depends on the OS
• Represents a “wall-clock” time (since the start of Epoch)
System.nanoTime()
• Value in nanoseconds, since some time offset
• Accuracy is at least as good as System.currentTimeMillis()
ThreadMXBean.getCurrentThreadCpuTime()
• The actual CPU time spent for the thread (nanoseconds)
• Might be unsupported by your VM
• Might be expensive
• Relevant for a single thread only
12 of 50
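As a minimal illustration of the APIs above, a sketch of interval measurement with System.nanoTime() (class and method names here are illustrative, not from the deck):

```java
public class TimingSketch {
    // Elapsed wall time of the given task in nanoseconds; System.nanoTime()
    // is monotonic and meant exactly for elapsed-interval measurement,
    // unlike the wall-clock System.currentTimeMillis()
    static long elapsedNanos(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        long nanos = elapsedNanos(() -> {
            double acc = 0;
            for (int i = 1; i <= 1_000; i++) acc += Math.log(i);
            if (acc < 0) throw new IllegalStateException(); // consume the result
        });
        System.out.println("Elapsed: " + nanos + " ns");
    }
}
```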
Code warm-up, what happens before
the steady-state
13 of 50
Code warm-up, class loading
A single warm-up iteration is NOT enough for class loading
(classes used only in rarely taken branches may not be loaded on the first iteration)
Sometimes classes are unloaded (it would be a shame if
something messed your results up with a huge peak)
Get help between iterations from
• ClassLoadingMXBean.getTotalLoadedClassCount()
• ClassLoadingMXBean.getUnloadedClassCount()
• -verbose:class
14 of 50
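The MXBean checks above can be wired up like this (a hedged sketch; ClassLoadCheck and loadedClasses are my names, not the deck's):

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;

public class ClassLoadCheck {
    // Number of classes loaded so far; comparing this value between
    // warm-up iterations shows whether class loading has settled
    static long loadedClasses() {
        ClassLoadingMXBean bean = ManagementFactory.getClassLoadingMXBean();
        return bean.getTotalLoadedClassCount();
    }

    public static void main(String[] args) {
        long before = loadedClasses();
        // ... run a warm-up iteration here ...
        long after = loadedClasses();
        System.out.println("Classes loaded during iteration: " + (after - before));
    }
}
```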
Code warm-up, compilation
Classes are loaded, verified and then compiled
Oracle HotSpot and Azul Zing start running the application in the interpreter
A hot method is compiled after
~10k (server) or ~1.5k (client) invocations
Long methods with loops are likely to be compiled earlier (via OSR)
Check CompilationMXBean.getTotalCompilationTime
Enable compilation logging with
• -XX:+UnlockDiagnosticVMOptions
• -XX:+PrintCompilation
• -XX:+LogCompilation -XX:LogFile=<filename>
15 of 50
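A quick way to watch compilation activity between iterations (illustrative names; compilation-time monitoring may be unsupported on some VMs, hence the guard):

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

public class CompilationCheck {
    // Total JIT compilation time so far in milliseconds, or -1 if the VM
    // does not support this kind of monitoring; if the value keeps growing
    // between iterations, the benchmark has not reached steady-state yet
    static long compilationTimeMillis() {
        CompilationMXBean bean = ManagementFactory.getCompilationMXBean();
        return bean.isCompilationTimeMonitoringSupported()
                ? bean.getTotalCompilationTime() : -1;
    }

    public static void main(String[] args) {
        System.out.println("JIT time so far: " + compilationTimeMillis() + " ms");
    }
}
```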
Code warm-up, OSR
Normal compilation and OSR result in similar code
…unless the compiler is unable to optimize a given frame
(e.g. an inner loop is compiled before the outer one)
In the real world normal compilation is more likely to happen, so
it’s better to avoid OSR in your benchmark
• Do a set of small warm-up iterations instead of a single big one
• Do not perform warm-up loops in the steady-state testing method
16 of 50
Code warm-up, OSR example
Now forget about array range check elimination
17 of 50
Before:
public static void main(String... args) {
loop1: if(P1) goto done1
i=0;
loop2: if(P2) goto done2
A[i++];
goto loop2; // OSR goes here
done2:
goto loop1;
done1:
}
After:
void OSR_main() {
A= // from interpreter
i= // from interpreter
loop2: if(P2) {
if(P1) goto done1
i=0;
} else { A[i++]; }
goto loop2
done1:
}
Reaching the steady-state, summary
Always do warm-up to reach steady-state
• Use the same data and the same code
• Discard warm-up results
• Avoid OSR
• Don’t run the benchmark in a “mixed mode” (part interpreted, part compiled)
• Check class loading and compilation
18 of 50
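The checklist above can be sketched in plain Java (illustrative names; a real benchmark would delegate this infrastructure to JMH):

```java
public class WarmUpSketch {
    // The method under test
    static double work(double n) { return n * Math.log(n) / 2; }

    public static void main(String[] args) {
        double sink = 0;
        // Warm-up: many short invocations of the same code on the same data,
        // so the method gets hot via normal invocation-counter compilation
        // rather than OSR of one long in-method loop
        for (int i = 0; i < 20_000; i++) sink += work(i + 1);
        // Warm-up results are discarded; only the second pass is measured
        long start = System.nanoTime();
        for (int i = 0; i < 20_000; i++) sink += work(i + 1);
        long duration = System.nanoTime() - start;
        if (sink == 0) throw new IllegalStateException(); // consume the sink
        System.out.println("Measured: " + duration + " ns");
    }
}
```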
Using JMH
Provides Maven archetype for a quick project setup
Annotate your methods with @GenerateMicroBenchmark
mvn install builds a ready-to-use runnable jar with your
benchmarks and the needed infrastructure
java -jar target/mb.jar <benchmark regex> [options]
Performs warm-up followed by a set of measurement iterations
and prints the results
19 of 50
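Put together, a benchmark looks roughly like this (a sketch assuming the JMH dependency from the Maven archetype; the deck uses the older @GenerateMicroBenchmark annotation, which current JMH has renamed to @Benchmark):

```java
// Requires the JMH dependency (org.openjdk.jmh.annotations.*)
public class LogBenchmark {
    private double x = Math.PI;

    @GenerateMicroBenchmark
    public double measureLog() {
        // Returning the value lets JMH consume it,
        // preventing dead-code elimination
        return Math.log(x);
    }
}
```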
Side effects that can affect
performance
20 of 50
Synchronization puzzle
void testSynchInner() {
synchronized (this) {
i++;
}
}
synchronized void testSynchOuter() {
i++;
}
21 of 50
8,244,087 usec
13,383,707 usec
Synchronization puzzle, side effect
Biased Locking: an optimization in the VM that leaves an object
as logically locked by a given thread even after the thread has
released the lock (cheap reacquisition)
Does not work during VM start-up
(the first 4 seconds in HotSpot)
Use -XX:BiasedLockingStartupDelay=0
22 of 50
JVM optimizations (Good or Evil?)
WARNING: some of the following optimizations
will not work (at least for the given examples)
in Java 6 (jdk1.6.0_26), consider using Java 7 (jdk1.7.0_21)
23 of 50
Dead code elimination
VM optimization eliminates dead branches of code
Even code that actually runs is eliminated
if its result is never used and has no side effects
Always consume all the results of your benchmarked code
Or you’ll get the “over 9000” performance level
Do not just accumulate results or store them in class fields
that are never read either
Use them in a non-obvious logical expression instead
24 of 50
Dead code elimination, example
Measurement: average nanoseconds / operation, less is better
25 of 50
private double n = 10;
public void stub() { }
public void dead() {
@SuppressWarnings("unused")
double r = n * Math.log(n) / 2;
}
public void alive() {
double r = n * Math.log(n) / 2;
if(r == n && r == 0)
throw new IllegalStateException();
}
1.017
48.514
1.008
Constant folding
If the compiler sees that a calculation always produces the same
result, the result is stored in a constant and reused
Measurement: average nanoseconds / operation, less is better
26 of 50
private double x = Math.PI;
public void stub() { }
public double wrong() {
return Math.log(Math.PI);
}
public double measureRight() {
return Math.log(x);
}
1.014
1.695
43.435
Loop unrolling
Is there anything bad?
Measurement: average nanoseconds / operation, less is better
27 of 50
private double[] A = new double[2048];
public double plain() {
double sum = 0;
for (int i = 0; i < A.length; i++)
sum += A[i];
return sum;
}
public double manualUnroll() {
double sum = 0;
for (int i = 0; i < A.length; i += 4)
sum += A[i] + A[i + 1] + A[i + 2] + A[i + 3];
return sum;
}
2773.883
816.791
Loop unrolling and hoisting
Something bad happens when
the loops of the benchmark infrastructure code are unrolled
and the calculation we are trying to measure
is hoisted out of the loop
For example, Caliper style benchmark looks like
private int reps(int reps) {
int s = 0;
for (int i = 0; i < reps; i++)
s += (x + y);
return s;
}
28 of 50
Loop unrolling and hoisting, example
29 of 50
@GenerateMicroBenchmark
public int measureRight() {
return (x + y);
}
@GenerateMicroBenchmark
@OperationsPerInvocation(1)
public int measureWrong_1() {
return reps(1);
}
...
@GenerateMicroBenchmark
@OperationsPerInvocation(N)
public int measureWrong_N() {
return reps(N);
}
Loop unrolling and hoisting, example
Method Result
Right 2.104
Wrong_1 2.055
Wrong_10 0.267
Wrong_100 0.033
Wrong_1000 0.057
Wrong_10000 0.045
Wrong_100000 0.043
30 of 50
Measurement: average nanoseconds / operation, less is better
A word about concurrency
Processes and threads fight for resources
(a single-threaded benchmark is a utopia)
31 of 50
Concurrency problems of benchmarks
Benchmark state should be correctly
• Initialized
• Published
• Shared between a certain group of threads
A multi-threaded benchmark iteration should be synchronized
and all threads should start their work at the same time
No need to implement this infrastructure yourself,
just write a correct benchmark using your favorite framework
32 of 50
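A hedged sketch of what that looks like in JMH (requires the JMH dependency; Scope.Benchmark shares one state instance across all benchmark threads, Scope.Thread would give each thread its own copy):

```java
import java.util.concurrent.atomic.AtomicLong;
// JMH annotations: org.openjdk.jmh.annotations.*

@State(Scope.Benchmark)   // one state instance, safely published to all threads
public class SharedCounterBenchmark {
    AtomicLong counter = new AtomicLong();

    @GenerateMicroBenchmark
    @Threads(4)           // JMH starts all four threads in lockstep
    public long increment() {
        return counter.incrementAndGet();
    }
}
```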
Full example, “human factor” included
33 of 50
List iteration
Which list implementation is faster for the foreach loop?
ArrayList and LinkedList sequential iteration is linear, O(n)
• ArrayList Iterator.next(): return array[cursor++];
• LinkedList Iterator.next(): return current = current.next;
Let’s check for the list of 1 million Integer’s
34 of 50
List iteration, foreach vs iterator
35 of 50
public List<Integer> arrayListForeach() {
for(Integer i : arrayList) {
}
return arrayList;
}
public Iterator<Integer> arrayListIterator() {
Iterator<Integer> iterator = arrayList.iterator();
while(iterator.hasNext()) {
iterator.next();
}
return iterator;
}
23.659
Measurement: average milliseconds / operation, less is better
22.445
List iteration, foreach < iterator, why?
Foreach variant assigns element to a local variable
for(Integer i : arrayList)
Iterator variant does not
iterator.next();
We need to change Iterator variant to
Integer i = iterator.next();
So now it’s correct to compare the results, at least at the
bytecode level
36 of 50
List iteration, benchmark
37 of 50
@GenerateMicroBenchmark(BenchmarkType.All)
public List<Integer> arrayListForeach() {
for(Integer i : arrayList) {
}
return arrayList;
}
@GenerateMicroBenchmark(BenchmarkType.All)
public Iterator<Integer> arrayListIterator() {
Iterator<Integer> iterator = arrayList.iterator();
while(iterator.hasNext()) {
Integer i = iterator.next();
}
return iterator;
}
List iteration, benchmark, result
List impl Iteration Java 6 Java 7
ArrayList
foreach 24.792 5.118
iterator 24.769 0.140
LinkedList
foreach 15.236 9.485
iterator 15.255 9.306
38 of 50
Measurement: average milliseconds / operation, less is better
Java 6 ArrayList uses AbstractList.Itr, while
LinkedList has its own iterator, so there are fewer abstractions
(in Java 7 ArrayList has its own optimized iterator)
List iteration, benchmark, result
List impl Iteration Java 6 Java 7
ArrayList
foreach 24.792 5.118
iterator 24.769 0.140
LinkedList
foreach 15.236 9.485
iterator 15.255 9.306
39 of 50
Measurement: average milliseconds / operation, less is better
WTF?!
List iteration, benchmark, loop-hoisting
40 of 50
ListBenchmark.arrayListIterator()
Iterator<Integer> iterator = arrayList.iterator();
while(iterator.hasNext()) {
iterator.next();
}
return iterator;
ArrayList.Itr<E>.next()
if (modCount != expectedModCount) throw new CME();
int i = cursor;
if (i >= size) throw new NoSuchElementException();
Object[] elementData = ArrayList.this.elementData;
if (i >= elementData.length) throw new CME();
cursor = i + 1;
return (E) elementData[lastRet = i];
List iteration, benchmark, BlackHole
41 of 50
@GenerateMicroBenchmark(BenchmarkType.All)
public void arrayListForeach(BlackHole bh) {
for(Integer i : arrayList) {
bh.consume(i);
}
}
@GenerateMicroBenchmark(BenchmarkType.All)
public void arrayListIterator(BlackHole bh) {
Iterator<Integer> iterator = arrayList.iterator();
while(iterator.hasNext()) {
Integer i = iterator.next();
bh.consume(i);
}
}
List iteration, benchmark, correct result
List impl Iteration Java 6 Java 7 Java 7 BlackHole
ArrayList
foreach 24.792 5.118 8.550
iterator 24.769 0.140 8.608
LinkedList
foreach 15.236 9.485 11.739
iterator 15.255 9.306 11.763
42 of 50
Measurement: average milliseconds / operation, less is better
A fleeting glimpse on the JMH details
We already know that JMH
• Uses maven
• Uses annotation-driven approach to detect benchmarks
• Provides BlackHole to consume results (and CPU cycles)
43 of 50
JMH: Building infrastructure
Finds annotated micro-benchmarks using reflection
Generates plain-Java infrastructure source code
around the calls to the micro-benchmarks
Compile, pack, run, profit
No reflection during benchmark execution
44 of 50
JMH: Various metrics
Single execution time
Operations per time unit
Average time per operation
Percentile estimation of time per operation
45 of 50
JMH: Concurrency infrastructure
@State declares how benchmark data is shared: across the whole
benchmark, per thread, or within a group of threads
Allows fixtures (setUp and tearDown) scoped to the
whole run, a single iteration, or a single invocation
@Threads is a simple way to run a concurrent test once you have
defined a correct @State
@Group assigns threads to particular roles in the
benchmark
46 of 50
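A sketch of such fixtures (requires the JMH dependency; @Setup/@TearDown run around the whole trial, each iteration, or each invocation depending on the Level):

```java
import java.util.ArrayList;
import java.util.List;
// JMH annotations: org.openjdk.jmh.annotations.*

@State(Scope.Thread)             // each benchmark thread gets its own list
public class ListState {
    List<Integer> list;

    @Setup(Level.Iteration)      // runs before each iteration
    public void fill() {
        list = new ArrayList<>();
        for (int i = 0; i < 1_000_000; i++) list.add(i);
    }

    @TearDown(Level.Iteration)   // runs after each iteration
    public void clear() {
        list = null;
    }
}
```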
JMH: VM forking
Allows comparing results obtained from several VM instances
• The first test runs on a clean JVM, the following ones do not
• VM processes are non-deterministic and may vary from run to run
(compilation order, multi-threading, randomization)
47 of 50
JMH: @CompilerControl
Instructs the VM whether to compile a method or not
Instructs the VM whether to inline methods
Inserts breakpoints into generated code
Prints method assembly
48 of 50
Conclusions
Do not reinvent the wheel if you are not sure how it should work
(consider using an existing one)
Consider the results wrong if you don’t have a clear
explanation; do not swallow mysterious behavior
49 of 50
Thanks for your attention
Questions?
50 of 50
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 

Java Micro-Benchmarking

  • 2. Agenda Benchmark definition, types, common problems Tools needed to measure performance Code warm-up, what happens before the steady-state Using JMH Side effects that can affect performance JVM optimizations (Good or Evil?) A word about concurrency Full example, “human factor” included A fleeting glimpse on the JMH details 2 of 50
  • 3. Benchmark definition, types, common problems 3 of 50
  • 4. What is a “Benchmark”? Benchmark is a program for performance measurement Requirements: Dimensions: throughput and latency Avoid significant overhead Test what is to be tested Perform a set of executions and provide stable reproducible results Should be easy to run 4 of 50
  • 5. Benchmark types By scale • Micro-benchmark (component level) • Macro-benchmark (system level) By nature • Synthetic benchmark (emulate component load) • Application benchmark (run real-world application) 5 of 50
  • 6. We’ll talk about Synthetic micro-benchmark • Mimic component workload separately from the application • Measure performance of a small isolated piece of code The main concern • The smaller the Component we test – the stronger the impact of • Benchmark infrastructure overhead • JVM internal processes • OS and Hardware internals • … and the phases of the Moon • Don’t we really test one of those? 6 of 50
  • 7. When micro-benchmark is needed Most of the time it is not needed at all Does algorithm A work faster than B? (Consider equal analytical estimation) Does this tiny modification make any difference? (from Java, JVM, native code or hardware point of view) 7 of 50
  • 8. Tools needed to measure performance 8 of 50
  • 9. You had one job… 9 of 50 final int COUNT = 100; long start = System.currentTimeMillis(); for (int i = 0; i < COUNT; i++) { // doStuff(); } long duration = System.currentTimeMillis() - start; long avg = duration / COUNT; System.out.println("Average execution time is " + avg + " ms");
(adds significant overhead, measures execution “as is”)
“You had one job” approach is enough in real life
(not for micro-benchmarks, we got it already)
Annotations and reflective benchmark invocations
(you must be great at java.lang.reflect measurement)
10 of 50
Micro-benchmark frameworks
JMH – takes into account a lot of internal VM processes and executes benchmarks with minimal infrastructure (Oracle)
Caliper – measures repetitive code, works on Android, allows posting results online (Google)
Japex – reduces infrastructure code, generates nice HTML reports with JFreeChart plots
JUnitPerf – measures the performance of existing JUnit tests
11 of 50
Java time interval measurement
System.currentTimeMillis()
• Value in milliseconds, but granularity depends on the OS
• Represents “wall-clock” time (since the start of the Epoch)
System.nanoTime()
• Value in nanoseconds, relative to some arbitrary time offset
• Accuracy is not worse than System.currentTimeMillis()
ThreadMXBean.getCurrentThreadCpuTime()
• The actual CPU time spent in the thread (nanoseconds)
• Might be unsupported by your VM
• Might be expensive to call
• Relevant for a single thread only
12 of 50
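A minimal sketch of the two portable clock sources above; the class and method names here are ours, not from the slides, while the MXBean calls are the standard java.lang.management API:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class Clocks {
    // Wall-clock-style elapsed time; nanoTime() is monotonic with an
    // arbitrary origin, so only differences are meaningful
    public static long elapsedNanos(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return System.nanoTime() - start;
    }

    // Per-thread CPU time; may be unsupported or disabled, report -1 then
    public static long currentThreadCpuNanos() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        return bean.isCurrentThreadCpuTimeSupported()
                ? bean.getCurrentThreadCpuTime() : -1;
    }

    public static void main(String[] args) {
        long wall = elapsedNanos(() -> {
            double s = 0;
            for (int i = 1; i < 100_000; i++) s += Math.log(i);
            if (s == 0) System.out.println(s); // consume the result
        });
        System.out.println("wall time:  " + wall + " ns");
        System.out.println("thread CPU: " + currentThreadCpuNanos() + " ns");
    }
}
```

Wall time and CPU time can diverge a lot on a loaded machine, which is exactly why the slide distinguishes them.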
Code warm-up, what happens before the steady-state
13 of 50
Code warm-up, class loading
A single warm-up iteration is NOT enough for class loading
(not all branches of the code may load their classes on the first iteration)
Sometimes classes are unloaded
(it would be a shame if something messed your results up with a huge peak)
Get help between iterations from
• ClassLoadingMXBean.getTotalLoadedClassCount()
• ClassLoadingMXBean.getUnloadedClassCount()
• -verbose:class
14 of 50
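A small sketch of watching the class-loading counters between iterations; the wrapper name is ours, the MXBean calls are the ones the slide lists:

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;

public class ClassLoadWatch {
    // Number of classes loaded while running one benchmark iteration
    public static long loadedDelta(Runnable iteration) {
        ClassLoadingMXBean bean = ManagementFactory.getClassLoadingMXBean();
        long before = bean.getTotalLoadedClassCount();
        iteration.run();
        return bean.getTotalLoadedClassCount() - before;
    }

    public static void main(String[] args) {
        long delta = loadedDelta(() ->
                new java.util.TreeMap<String, String>().put("a", "b"));
        // A non-zero delta means the iteration still triggers class loading,
        // i.e. the benchmark has not reached the steady state yet
        System.out.println("classes loaded during iteration: " + delta);
        System.out.println("unloaded so far: " +
                ManagementFactory.getClassLoadingMXBean().getUnloadedClassCount());
    }
}
```

Running this check after each warm-up batch until the delta stays at zero is one way to decide the warm-up was long enough.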
Code warm-up, compilation
Classes are loaded, verified, and then compiled
Oracle HotSpot and Azul Zing start running the application in the interpreter
A hot method is compiled after ~10k (server) or ~1.5k (client) invocations
Long methods with loops are likely to be compiled earlier
Check CompilationMXBean.getTotalCompilationTime()
Enable compilation logging with
• -XX:+UnlockDiagnosticVMOptions
• -XX:+PrintCompilation
• -XX:+LogCompilation -XX:LogFile=<filename>
15 of 50
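The CompilationMXBean check from the slide can be sketched like this (the wrapper name is ours; note the bean can be absent, e.g. under -Xint, and timing may be unsupported):

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

public class CompilationWatch {
    // Total JIT compilation time so far, in milliseconds, or -1 if unavailable
    public static long totalCompilationMillis() {
        CompilationMXBean bean = ManagementFactory.getCompilationMXBean();
        if (bean == null || !bean.isCompilationTimeMonitoringSupported()) return -1;
        return bean.getTotalCompilationTime();
    }

    public static void main(String[] args) {
        long before = totalCompilationMillis();
        double s = 0;
        for (int i = 1; i < 1_000_000; i++) s += Math.sqrt(i); // give the JIT work
        if (s == 0) System.out.println(s);                     // consume the result
        long after = totalCompilationMillis();
        // If this delta keeps growing between iterations, you are not in
        // the steady state yet
        System.out.println("compilation time delta: " + (after - before) + " ms");
    }
}
```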
Code warm-up, OSR
Normal compilation and OSR (on-stack replacement) will result in similar code
…unless the compiler is not able to optimize a given frame
(e.g. the inner loop is compiled before the outer one)
In the real world normal compilation is more likely to happen, so it’s better to avoid OSR in your benchmark
• Do a set of small warm-up iterations instead of a single big one
• Do not perform warm-up loops in the steady-state testing method
16 of 50
Code warm-up, OSR example
Now forget about array range check elimination
17 of 50
Before:
public static void main(String… args) {
  loop1: if(P1) goto done1
    i=0;
    loop2: if(P2) goto done2
      A[i++];
      goto loop2; // OSR goes here
    done2:
  goto loop1;
  done1:
}
After:
void OSR_main() {
  A= // from interpreter
  i= // from interpreter
  loop2:
  if(P2) {
    if(P1) goto done1
    i=0;
  } else {
    A[i++];
  }
  goto loop2
  done1:
}
Reaching the steady-state, summary
Always do warm-up to reach the steady-state
• Use the same data and the same code
• Discard warm-up results
• Avoid OSR
• Don’t run the benchmark in “mixed modes” (interpreter/compiler)
• Check class loading and compilation
18 of 50
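The checklist above can be sketched as a tiny hand-rolled harness. This is illustrative only (all names are made up; real benchmarks should use JMH), but it shows the shape: several small discarded warm-up batches, result consumption, then a steady-state measurement:

```java
public class WarmupHarness {
    // The code under test (a stand-in workload)
    static double workload(double n) {
        return n * Math.log(n) / 2;
    }

    static double measureAvgNanos(int iterations) {
        double sink = 0;                          // consume results: defeat DCE
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += workload(i + 1);
        }
        long duration = System.nanoTime() - start;
        if (sink == 42) System.out.println(sink); // unobvious use of the result
        return (double) duration / iterations;
    }

    public static void main(String[] args) {
        // Several small warm-up batches instead of one big loop (avoids OSR bias);
        // warm-up results are discarded
        for (int i = 0; i < 10; i++) {
            measureAvgNanos(10_000);
        }
        double avg = measureAvgNanos(10_000);     // steady-state measurement
        System.out.println("avg ns/op = " + avg);
    }
}
```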
Using JMH
Provides a Maven archetype for a quick project setup
Annotate your methods with @GenerateMicroBenchmark
(newer JMH versions renamed this annotation to @Benchmark)
mvn install will build a ready-to-use runnable jar with your benchmarks and the needed infrastructure
java -jar target/mb.jar <benchmark regex> [options]
Will perform a warm-up followed by a set of iterations
Print the results
19 of 50
Side effects that can affect performance
20 of 50
Synchronization puzzle
void testSynchInner() {
    synchronized (this) {
        i++;
    }
}
synchronized void testSynchOuter() {
    i++;
}
8,244,087 usec vs 13,383,707 usec
21 of 50
Synchronization puzzle, side effect
Biased locking: an optimization in the VM that leaves an object logically locked by a given thread even after the thread has released the lock (cheap reacquisition)
Does not work right after VM start-up (the first 4 sec in HotSpot)
Use -XX:BiasedLockingStartupDelay=0
22 of 50
JVM optimizations (Good or Evil?)
WARNING: some of the following optimizations will not work (at least for the given examples) in Java 6 (jdk1.6.0_26); consider using Java 7 (jdk1.7.0_21)
23 of 50
Dead code elimination
The VM eliminates dead branches of code
Even code that is meant to be executed is eliminated if its result is never used and it has no side effect
Always consume all the results of your benchmarked code
Or you’ll get an “over 9000” performance level
Do not accumulate results or store them in class fields that are never used either
Use them in an unobvious logical expression instead
24 of 50
Dead code elimination, example
Measurement: average nanoseconds / operation, less is better
25 of 50
private double n = 10;

public void stub() {
}
// 1.017

public void dead() {
    @SuppressWarnings("unused")
    double r = n * Math.log(n) / 2;
}
// 1.008 – as fast as the empty stub: the unused computation was eliminated

public void alive() {
    double r = n * Math.log(n) / 2;
    if (r == n && r == 0)
        throw new IllegalStateException();
}
// 48.514 – the result is consumed, so the computation is actually measured
Constant folding
If the compiler sees that the result of a calculation will always be the same, the result is stored as a constant and reused
Measurement: average nanoseconds / operation, less is better
26 of 50
private double x = Math.PI;

public void stub() {
}
// 1.014

public double wrong() {
    return Math.log(Math.PI); // constant argument, folded at compile time
}
// 1.695

public double measureRight() {
    return Math.log(x); // the field load defeats folding
}
// 43.435
Loop unrolling
Is there anything bad?
Measurement: average nanoseconds / operation, less is better
27 of 50
private double[] A = new double[2048];

public double plain() {
    double sum = 0;
    for (int i = 0; i < A.length; i++)
        sum += A[i];
    return sum;
}
// 2773.883

public double manualUnroll() {
    double sum = 0;
    for (int i = 0; i < A.length; i += 4)
        sum += A[i] + A[i + 1] + A[i + 2] + A[i + 3];
    return sum;
}
// 816.791
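The two loop shapes from this slide, made runnable so their equivalence can be checked; the timing numbers in the deck came from JMH, not from this sketch, and the array contents here are our own choice:

```java
public class Unroll {
    static final double[] A = new double[2048];
    static {
        for (int i = 0; i < A.length; i++) A[i] = i * 0.5;
    }

    public static double plain() {
        double sum = 0;
        for (int i = 0; i < A.length; i++)
            sum += A[i];
        return sum;
    }

    public static double manualUnroll() {
        double sum = 0;
        // relies on A.length being a multiple of 4
        for (int i = 0; i < A.length; i += 4)
            sum += A[i] + A[i + 1] + A[i + 2] + A[i + 3];
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(plain() + " vs " + manualUnroll());
    }
}
```

With these values every partial sum is exactly representable in a double, so both variants produce identical results even though they add in a different order.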
Loop unrolling and hoisting
Something bad happens when the loops of benchmark infrastructure code are unrolled
And the calculations that we try to measure are hoisted out of the loop
For example, a Caliper-style benchmark looks like
private int reps(int reps) {
    int s = 0;
    for (int i = 0; i < reps; i++)
        s += (x + y);
    return s;
}
28 of 50
Loop unrolling and hoisting, example
29 of 50
@GenerateMicroBenchmark
public int measureRight() {
    return (x + y);
}

@GenerateMicroBenchmark
@OperationsPerInvocation(1)
public int measureWrong_1() {
    return reps(1);
}
...
@GenerateMicroBenchmark
@OperationsPerInvocation(N)
public int measureWrong_N() {
    return reps(N);
}
Loop unrolling and hoisting, example
Measurement: average nanoseconds / operation, less is better
Method         Result
Right          2.104
Wrong_1        2.055
Wrong_10       0.267
Wrong_100      0.033
Wrong_1000     0.057
Wrong_10000    0.045
Wrong_100000   0.043
30 of 50
A word about concurrency
Processes and threads fight for resources (a single-threaded benchmark is a utopia)
31 of 50
Concurrency problems of benchmarks
Benchmark states should be correctly
• Initialized
• Published
• Shared between a certain group of threads
A multi-threaded benchmark iteration should be synchronized and all threads should start their work at the same time
No need to implement this infrastructure yourself, just write a correct benchmark using your favorite framework
32 of 50
Full example, “human factor” included
33 of 50
List iteration
Which list implementation is faster for the foreach loop?
ArrayList and LinkedList sequential iteration is linear, O(n)
• ArrayList Iterator.next(): return array[cursor++];
• LinkedList Iterator.next(): return current = current.next;
Let’s check for a list of 1 million Integers
34 of 50
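A plain (non-JMH) sanity run of this question can be sketched as follows; it only shows that both traversals do the same work and gives rough wall-clock numbers — the real comparison should come from JMH, as the following slides demonstrate. All names here are ours:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

public class ListWalk {
    public static long sumForeach(List<Integer> list) {
        long sum = 0;
        for (Integer i : list) sum += i;   // each element lands in a local
        return sum;
    }

    public static long sumIterator(List<Integer> list) {
        long sum = 0;
        Iterator<Integer> it = list.iterator();
        while (it.hasNext()) sum += it.next();
        return sum;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        List<Integer> arrayList = new ArrayList<>(n);
        List<Integer> linkedList = new LinkedList<>();
        for (int i = 0; i < n; i++) { arrayList.add(i); linkedList.add(i); }

        long start = System.nanoTime();
        long a = sumForeach(arrayList);
        long mid = System.nanoTime();
        long b = sumIterator(linkedList);
        long end = System.nanoTime();

        System.out.println("ArrayList foreach:   "
                + (mid - start) / 1_000_000 + " ms, sum=" + a);
        System.out.println("LinkedList iterator: "
                + (end - mid) / 1_000_000 + " ms, sum=" + b);
    }
}
```

Note that the sums are consumed by printing them, so the loops cannot be eliminated as dead code.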
List iteration, foreach vs iterator
35 of 50
Measurement: average milliseconds / operation, less is better
public List<Integer> arrayListForeach() {
    for (Integer i : arrayList) {
    }
    return arrayList;
}
// 23.659

public Iterator<Integer> arrayListIterator() {
    Iterator<Integer> iterator = arrayList.iterator();
    while (iterator.hasNext()) {
        iterator.next();
    }
    return iterator;
}
// 22.445
List iteration, foreach < iterator, why?
The foreach variant assigns each element to a local variable
for(Integer i : arrayList)
The iterator variant does not
iterator.next();
We need to change the iterator variant to
Integer i = iterator.next();
Now it’s correct to compare the results, at least according to the bytecode 
36 of 50
List iteration, benchmark
37 of 50
@GenerateMicroBenchmark(BenchmarkType.All)
public List<Integer> arrayListForeach() {
    for (Integer i : arrayList) {
    }
    return arrayList;
}

@GenerateMicroBenchmark(BenchmarkType.All)
public Iterator<Integer> arrayListIterator() {
    Iterator<Integer> iterator = arrayList.iterator();
    while (iterator.hasNext()) {
        Integer i = iterator.next();
    }
    return iterator;
}
List iteration, benchmark, result
Measurement: average milliseconds / operation, less is better
List impl    Iteration   Java 6    Java 7
ArrayList    foreach     24.792    5.118
ArrayList    iterator    24.769    0.140
LinkedList   foreach     15.236    9.485
LinkedList   iterator    15.255    9.306
In Java 6 ArrayList uses AbstractList.Itr while LinkedList has its own iterator, so there are fewer abstractions (in Java 7 ArrayList has its own optimized iterator)
38 of 50
List iteration, benchmark, result
Measurement: average milliseconds / operation, less is better
List impl    Iteration   Java 6    Java 7
ArrayList    foreach     24.792    5.118
ArrayList    iterator    24.769    0.140   ← WTF?!
LinkedList   foreach     15.236    9.485
LinkedList   iterator    15.255    9.306
39 of 50
List iteration, benchmark, loop-hoisting
40 of 50
ListBenchmark.arrayListIterator()
Iterator<Integer> iterator = arrayList.iterator();
while (iterator.hasNext()) {
    iterator.next();
}
return iterator;

ArrayList.Itr<E>.next()
if (modCount != expectedModCount)
    throw new CME();
int i = cursor;
if (i >= size)
    throw new NoSuchElementException();
Object[] elementData = ArrayList.this.elementData;
if (i >= elementData.length)
    throw new CME();
cursor = i + 1;
return (E) elementData[lastRet = i];
List iteration, benchmark, BlackHole
41 of 50
@GenerateMicroBenchmark(BenchmarkType.All)
public void arrayListForeach(BlackHole bh) {
    for (Integer i : arrayList) {
        bh.consume(i);
    }
}

@GenerateMicroBenchmark(BenchmarkType.All)
public void arrayListIterator(BlackHole bh) {
    Iterator<Integer> iterator = arrayList.iterator();
    while (iterator.hasNext()) {
        Integer i = iterator.next();
        bh.consume(i);
    }
}
List iteration, benchmark, correct result
Measurement: average milliseconds / operation, less is better
List impl    Iteration   Java 6    Java 7    Java 7 BlackHole
ArrayList    foreach     24.792    5.118     8.550
ArrayList    iterator    24.769    0.140     8.608
LinkedList   foreach     15.236    9.485     11.739
LinkedList   iterator    15.255    9.306     11.763
42 of 50
A fleeting glimpse on the JMH details
We already know that JMH
• Uses Maven
• Uses an annotation-driven approach to detect benchmarks
• Provides BlackHole to consume results (and CPU cycles)
43 of 50
JMH: Building infrastructure
Finds annotated micro-benchmarks using reflection
Generates plain Java infrastructure source code around the calls to the micro-benchmarks
Compile, pack, run, profit
No reflection during benchmark execution
44 of 50
JMH: Various metrics
• Single execution time
• Operations per time unit
• Average time per operation
• Percentile estimation of time per operation
45 of 50
JMH: Concurrency infrastructure
@State declares how benchmark data is shared: across the whole benchmark, per thread, or per group of threads
Allows performing fixtures (setUp and tearDown) in the scope of the whole run, an iteration, or a single execution
@Threads is a simple way to run a concurrent test if you defined a correct @State
@Group assigns threads a particular role in the benchmark
46 of 50
JMH: VM forking
Allows comparing results obtained from separate instances of the VM
• The first test would otherwise run on a clean JVM while the others would not
• VM processes are not deterministic and may vary from run to run (compilation order, multi-threading, randomization)
47 of 50
JMH: @CompilerControl
Instructs whether to compile a method or not
Instructs whether to inline methods
Inserts breakpoints into generated code
Prints method assembly
48 of 50
Conclusions
Do not reinvent the wheel if you are not sure how it should work (consider using an existing one)
Consider the results wrong if you don’t have a clear explanation for them. Do not swallow that mystical behavior
49 of 50
Thanks for your attention
Questions?
50 of 50