Kickass benchmarking
with JMH
Leonardo F. Gomes
Nenad Bogojevic
0
Done, works, fast!
1
Premature optimization
is the root of all evil
Donald Knuth, 1974
2
Donald E. Knuth
Professor Emeritus at Stanford University
ACM Grace Murray Hopper Award
Turing Award
Author of The Art Of Computer Programming
Creator of TeX
3
4
Donald E. Knuth
We should forget about small
efficiencies, say about 97% of the
time: premature optimization is
the root of all evil.
Yet we should not pass up our
opportunity in that critical 3%.
5
Donald E. Knuth
A good programmer will not be
lulled into complacency by such
reasoning, he will be wise to
look carefully at the critical code;
but only after that code has
been identified.
6
Donald E. Knuth
It is often a mistake to make
a priori judgements about
what parts of a program are
really critical.
7
Donald E. Knuth
The universal experience of
programmers who have been
using measurement tools has been
that the intuitive guesses fail.
8
Good programmers measure
before optimizing
9
Benchmarking is hard
11
Why?
12
Warmup
phase
13
Java source
code
Bytecodes
HotSpot
Java VM
compile execute
• Ahead-of-time
• Using javac
• Instructions for an abstract machine
14
Bytecodes
HotSpot Java VM
Interpreter
Heap
Stack
Garbage
collector
execute
access
access
manage
C1
C2
Machine code
Debug info
Compiled method
Object maps
compile produce
Compilation
system
15
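The interpret-then-compile pipeline above is why a warmup phase matters. A minimal sketch, assuming a made-up workload (`work`) and made-up batch sizes, that times identical batches back to back. On a typical HotSpot run the first batches tend to be slower than the later ones, but the numbers depend on the machine and JVM, so none are shown here.

```java
final class WarmupDemo {
    // Hypothetical workload: sums the squares of 0..n-1.
    static long work(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += (long) i * i;
        }
        return sum;
    }

    // Times `batches` consecutive batches of `callsPerBatch` calls to work()
    // and returns the elapsed nanoseconds of each batch.
    static long[] timeBatches(int batches, int callsPerBatch) {
        long[] nanos = new long[batches];
        long sink = 0; // accumulate results so the calls are not trivially dead
        for (int b = 0; b < batches; b++) {
            long start = System.nanoTime();
            for (int c = 0; c < callsPerBatch; c++) {
                sink += work(1_000);
            }
            nanos[b] = System.nanoTime() - start;
        }
        if (sink == 42) {
            System.out.println(sink); // keeps `sink` observable
        }
        return nanos;
    }

    public static void main(String[] args) {
        long[] nanos = timeBatches(10, 2_000);
        for (int b = 0; b < nanos.length; b++) {
            System.out.printf("batch %d: %d us%n", b, nanos[b] / 1_000);
        }
    }
}
```

This is exactly the kind of hand-rolled timing loop the talk warns against. It is only meant to make the warmup effect visible, not to produce trustworthy numbers.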
Compiler
optimizations
16
© 2016 Amadeus IT Group and its affiliates and subsidiaries
Don’t roll
your own
benchmarking harness
18
JMH is your friend
19
Java
Microbenchmark
Harness
20
JVM
Microbenchmark
Harness
21
JVM
Millibenchmark
Harness
22
JVM
Macrobenchmark
Harness
23
JVM
Nanobenchmark
Harness
24
JMH is for benchmarking
what JUnit is
for unit testing
25
Macro 1 … 1000s
Milli 1 … 1000ms
Micro 1 … 1000us
Nano 1 … 1000ns
27
Granularity
Benchmark modes
Throughput ops/time_unit
AverageTime time/operation
SampleTime percentiles
SingleShotTime cold performance
28
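How the first three modes relate can be worked through by hand. A small sketch, assuming hypothetical helper names and a simple nearest-rank percentile. JMH computes its own statistics internally, so this only illustrates the definitions on the slide.

```java
import java.util.Arrays;

final class ModeMath {
    // Throughput: operations per second.
    static double throughputOpsPerSec(long ops, long elapsedNanos) {
        return ops * 1e9 / elapsedNanos;
    }

    // AverageTime: nanoseconds per operation (the reciprocal view of throughput).
    static double averageNanosPerOp(long ops, long elapsedNanos) {
        return (double) elapsedNanos / ops;
    }

    // SampleTime: nearest-rank percentile over individual call timings, p in (0, 100].
    static long percentile(long[] samplesNanos, double p) {
        long[] sorted = samplesNanos.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        long ops = 2_000_000;
        long elapsed = 1_000_000_000L; // 2 million operations in one second
        System.out.println(throughputOpsPerSec(ops, elapsed));
        System.out.println(averageNanosPerOp(ops, elapsed));

        // Made-up samples with a single slow outlier.
        long[] samples = {400, 450, 500, 520, 480, 510, 470, 490, 5_000, 460};
        System.out.println(percentile(samples, 50));
        System.out.println(percentile(samples, 99));
    }
}
```

With these made-up samples the median stays near the typical value while the 99th percentile lands on the outlier, which is the detail an average alone would blur.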
29
@Warmup(iterations=5, time=1,
timeUnit=SECONDS)
@Measurement(iterations=5, time=1,
timeUnit=SECONDS)
Multithreading
30
Multithreading
made easy
@Threads(20)
@State(Scope.Thread)
31
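Put together, the annotations from the last slides could look like the sketch below. The class names, the key-space size and the choice of maps are assumptions made for illustration. The split between a shared `Scope.Benchmark` state and a per-thread `Scope.Thread` state follows the speaker notes. Compiling it requires the JMH dependency and its annotation processor.

```java
import java.util.Hashtable;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Threads;
import org.openjdk.jmh.annotations.Warmup;

@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Threads(20)
public class MapReadBenchmark {
    static final int KEYS = 1_024; // hypothetical key-space size

    // One instance shared by all 20 threads: the contended "cache".
    @State(Scope.Benchmark)
    public static class SharedMaps {
        Map<Integer, Integer> hashtable;
        Map<Integer, Integer> concurrentHashMap;

        @Setup
        public void fill() {
            hashtable = new Hashtable<>();
            concurrentHashMap = new ConcurrentHashMap<>();
            for (int i = 0; i < KEYS; i++) {
                hashtable.put(i, i);
                concurrentHashMap.put(i, i);
            }
        }
    }

    // One instance per thread, so it needs no synchronization.
    @State(Scope.Thread)
    public static class Cursor {
        int next;

        int nextKey() {
            next = (next + 1) % KEYS;
            return next;
        }
    }

    // Returning the value lets JMH consume it, so the lookup is not optimized away.
    @Benchmark
    public Integer hashtableGet(SharedMaps maps, Cursor cursor) {
        return maps.hashtable.get(cursor.nextKey());
    }

    @Benchmark
    public Integer concurrentHashMapGet(SharedMaps maps, Cursor cursor) {
        return maps.concurrentHashMap.get(cursor.nextKey());
    }
}
```

JMH runs each `@Benchmark` method on 20 threads and reports one aggregated score per method, so the two map implementations can be compared directly.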
Multithreading
made easy
results are aggregated for you
32
Anatomy of
33
Hashtable
34
table
lock
thread 0
get ( key0 )
thread 1
get ( key1 )
put ( key0, value0 ) put ( key1, value1 )
lock (thread 0)
Anatomy of
35
ConcurrentHashMap
36
Segment Segment Segment Segment
lock lock lock lock
thread 0
put ( key0, value0)
segmentFor ( hash0 )
thread 1
put ( key1, value1)
segmentFor ( hash1 )
thread 2
put ( key2, value2)
segmentFor ( hash2 )
lock (thread 0) lock (thread 1)
37
Segment Segment Segment Segment
lock lock lock lock
thread 0
get ( key0 )
segmentFor ( hash0 )
thread 2
get ( key2 )
segmentFor ( hash2 )
read volatile read volatile
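The segment idea in the two diagrams can be sketched in a few lines. This is a simplified illustration with hypothetical names, not the JDK's ConcurrentHashMap code: it locks a segment on reads as well, whereas the diagram shows reads relying on volatile reads instead. JDK 8 and later no longer use segments and lock at a finer granularity.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Lock striping: one lock per segment instead of one lock for the whole table.
final class StripedMap<K, V> {
    private final List<Map<K, V>> segments = new ArrayList<>();

    StripedMap(int segmentCount) {
        for (int i = 0; i < segmentCount; i++) {
            segments.add(new HashMap<>());
        }
    }

    // Picks the segment responsible for a key from its hash code.
    private Map<K, V> segmentFor(Object key) {
        return segments.get(Math.floorMod(key.hashCode(), segments.size()));
    }

    V put(K key, V value) {
        Map<K, V> segment = segmentFor(key);
        synchronized (segment) { // blocks only writers to the same segment
            return segment.put(key, value);
        }
    }

    V get(Object key) {
        Map<K, V> segment = segmentFor(key);
        synchronized (segment) { // simplification: the real design avoids this lock
            return segment.get(key);
        }
    }

    int size() {
        int total = 0;
        for (Map<K, V> segment : segments) {
            synchronized (segment) {
                total += segment.size();
            }
        }
        return total;
    }

    // Demo helper: several threads insert disjoint key ranges concurrently.
    static int fillConcurrently(int threads, int keysPerThread) {
        StripedMap<Integer, Integer> map = new StripedMap<>(16);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int base = t * keysPerThread;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < keysPerThread; i++) {
                    map.put(base + i, i);
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println(fillConcurrently(4, 1_000));
    }
}
```

Two threads only wait for each other when their keys hash to the same segment. With Hashtable they always would, because every call takes the same lock.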
Built-in profilers can show
If compilation is happening while measuring
If class loading is happening while measuring
How much object allocation is happening
Which methods are consuming CPU time
39
External profilers can be used
Linux perf_events
Windows xperf
Java Mission Control (pluggable)
Yourkit, etc.
40
JMH’s adopters
42
Our experience
at Amadeus
Verify that new code matches expectations
Check that no regression is introduced
Validate optimization ideas
Cover performance fixes with related tests
Continuous
Integration
Care about a warmup phase
Reduce noise
Define regression
Make sure backlog is handled
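"Define regression" can be made concrete as a tolerance around a baseline. A minimal sketch, assuming scores are time per operation (lower is better) and a hypothetical 10% tolerance. The right threshold depends on how noisy the CI environment is.

```java
final class RegressionCheck {
    // True when the current score is slower than the baseline by more than
    // `tolerance` (0.10 means 10%). Scores are nanoseconds per operation.
    static boolean isRegression(double baselineNsPerOp, double currentNsPerOp, double tolerance) {
        return currentNsPerOp > baselineNsPerOp * (1.0 + tolerance);
    }

    public static void main(String[] args) {
        double baseline = 100.0; // made-up baseline score
        System.out.println(isRegression(baseline, 105.0, 0.10)); // within expected variance
        System.out.println(isRegression(baseline, 120.0, 0.10)); // beyond the tolerance
    }
}
```

A check like this fails the build only when the slowdown exceeds the agreed variance, which keeps ordinary noise from filling the backlog with false alarms.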
Key takeaways
50
Benchmarking is tricky
Measure before optimizing
JMH helps a lot
• Caliper: https://www.flickr.com/photos/andrewthecook/14026422669/sizes/l
• Geometric forms: https://www.flickr.com/photos/internetarchivebookimages/14753972274/sizes/l
• Metric tape: https://www.flickr.com/photos/ilianov/3345314090/sizes/l/
• Mountain: https://www.flickr.com/photos/pthread/8151096195/sizes/l
• Friends: https://www.flickr.com/photos/livenature/13895494231/sizes/l
• Root: https://www.flickr.com/photos/paperpariah/19937816358/sizes/l/
• Knuth: https://www.flickr.com/photos/ioerror/56360019/sizes/l
• Warmup: https://www.flickr.com/photos/komunews/2085730526/sizes/o/
• Multithreading: https://www.flickr.com/photos/slimjim/4329655445/sizes/l
• Stop: https://www.flickr.com/photos/thematthewknot/3924980314/sizes/l
• Boats: https://www.flickr.com/photos/cuppini/8465318134/sizes/l
• Next steps: https://www.flickr.com/photos/gebagia/22346547334/sizes/l
• Marines: https://www.flickr.com/photos/dvids/14007373489/sizes/l
• Arctic ice: https://commons.wikimedia.org/wiki/File:ICESCAPE.jpg
• Demo time: https://www.flickr.com/photos/abstractbynature/6111219203
• Blue sky: https://www.flickr.com/photos/foctavian/16371691937/
51
Questions?
52
53
Follow us
@lgomes
@nenadbo
github.com/kickass-jmh

Kickass benchmarking with JMH Riviera Dev 2017

Editor's Notes

  • #2 Motivation: Professional programmers should take responsibility for the performance of the code they develop. Performance is frequently overlooked until the last phases of development, whereas it should be integrated into the development process from the start. TODO: Add some motivation around environmental benefits.
  • #4 This catchphrase is usually quoted without much context. Just like biblical citations, it can lead to religious wars. Let’s check the context around that phrase.
  • #5 Let’s put some context around that citation and look at what is written before and after that phrase in the paper where it appeared. Knuth is the one who said it. There is some debate about whether the quote is originally from Knuth or whether he was citing Tony Hoare. This article tries to “prove” that it is actually from Knuth: https://shreevatsa.wordpress.com/2008/05/16/premature-optimization-is-the-root-of-all-evil/
  • #6 A little bit of context: “Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunity in that critical 3%. A good programmer will not be lulled into complacency by such reasoning, he will be wise to look carefully at the critical code; but only after that code has been identified. It is often a mistake to make a priori judgements about what parts of a program are really critical, since the universal experience of programmers who have been using measurement tools has been that the intuitive guesses fail.” Donald Knuth, “Structured Programming with go to Statements”, Computing Surveys, Vol. 6, No. 4, December 1974, page 268. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.103.6084&rep=rep1&type=pdf
  • #11 Summary of what Knuth said 
  • #12 https://commons.wikimedia.org/wiki/File:ICESCAPE.jpg
  • #13 Now you’re all convinced that you should be measuring the performance of your code. But wait, don’t just put timers on your unit tests.
  • #15 Explain how code is initially interpreted; Then, compiled at runtime; Then, it runs in compiled mode.
  • #18 Branch prediction, loop unrolling, dead code elimination, autobox elimination, constant propagation, null check elimination, algebraic simplification, devirtualisation, range check elimination, etc.
  • #21 JMH is an open-source project that does exactly that. It’s part of the OpenJDK project and we will see how it can help you.
  • #33 The thread scope matches the application-server model well, because Java app servers usually keep state per thread. Running with 20 threads is like processing 20 requests in parallel. Benchmark scope would be a cache that all your requests access; it should be guarded by synchronization mechanisms to make sure it remains consistent.
  • #36 A write to a volatile field happens-before every subsequent read of that same field.
  • #38 Instead of using a single lock for the shared data, the shared data is segmented, with each segment having its own lock. Uncontended lock acquisition is very cheap; it is the contended locks that cause scalability issues. With a different lock for each partition, ConcurrentHashMap effectively reduces how often any one lock is requested by a factor equal to the number of partitions. You can think of ConcurrentHashMap as made up of n separate hash tables.
  • #43 https://github.com/chrishantha/jfr-flame-graph
  • #44 http://netty.io/wiki/microbenchmarks.html
    https://github.com/grpc/grpc-java/tree/master/benchmarks/src/jmh
    https://github.com/akka/akka/tree/master/akka-bench-jmh
    https://github.com/SonarSource/sslr/blob/master/sslr-benchmarks/src/main/java/org/sonar/sslr/benchmarks/RecursiveRuleBenchmark.java
    https://github.com/droolsjbpm/kie-benchmarks/blob/master/drools-benchmarks/src/main/java/org/drools/benchmarks/session/InsertFireLoopBenchmark.java
    https://github.com/finagle/finagle-serial#benchmarks
  • #49 LMAX: Our micro-benchmarks currently take over an hour to run, though with more hardware we could run them in parallel to improve this. That's still not bad, but for comparison, our suite of ~11k acceptance tests only takes ~25mins...
  • #51 Reduce noise: isolate your benchmarks as much as possible (CPU isolation, sched_setaffinity). Care about a correct warmup phase: give benchmarks enough time to run. Don't run nanosecond-per-operation benchmarks in continuous integration. Define regression: some variance is expected, so define your baseline well and differentiate inter-version from intra-version regressions. Make sure the issue backlog is tracked and handled.