Kickass benchmarking with JMH Riviera Dev 2017

In this session, we will introduce you to JMH, an OpenJDK harness for building, running and analysing nano/micro/milli/macro benchmarks written in Java and other languages targeting the JVM. It should help you find spots where performance can be optimised and, perhaps even more importantly, show you the parts that you don't really need to optimise. It will not only make your benchmarks more accurate, but also much easier to write.

Kickass benchmarking
with JMH
Leonardo F. Gomes
Nenad Bogojevic
0

Premature optimization
is the root of all evil
Donald Knuth, 1974
2

Donald E. Knuth
Professor Emeritus at Stanford University
ACM Grace Murray Hopper Award
Turing Award
Author of The Art Of Computer Programming
Creator of TeX
3

Donald E. Knuth
We should forget about small
efficiencies, say about 97% of the
time: premature optimization is
the root of all evil.
Yet we should not pass up our
opportunity in that critical 3%.
5

Donald E. Knuth
A good programmer will not be
lulled into complacency by such
reasoning, he will be wise to
look carefully at the critical code;
but only after that code has
been identified.
6

Donald E. Knuth
It is often a mistake to make
a priori judgements about
what parts of a program are
really critical.
7

Donald E. Knuth
The universal experience of
programmers who have been
using measurement tools has been
that the intuitive guesses fail.
8

Good programmers measure
before optimizing
9

[Diagram: Java source code is compiled ahead of time, using javac, into bytecodes (instructions for an abstract machine), which the HotSpot Java VM executes.]
14

[Diagram: inside the HotSpot Java VM. Components: interpreter, heap, stack, garbage collector, and a compilation system with the C1 and C2 compilers. Arrow labels: execute, access, manage, compile, produce. A compiled method consists of machine code, debug info and object maps.]
15

© 2016 Amadeus IT Group and its affiliates and subsidiaries
Don’t roll
your own
benchmarking harness
18

JMH is for benchmarking
what JUnit is
for unit testing
25

Granularity
Macro 1 … 1000s
Milli 1 … 1000ms
Micro 1 … 1000us
Nano 1 … 1000ns
27

Benchmark modes
Throughput ops/time_unit
AverageTime time/operation
SampleTime percentiles
SingleShotTime cold performance
28
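
As a rough illustration of what the four modes report, the sketch below derives the same kinds of figures from a list of per-call timings. JMH computes these for you; the class name `BenchmarkModeMath`, its methods and the sample values are assumptions made up for this example, and the percentile uses a simple nearest-rank rule that may differ from what JMH does.

```java
import java.util.Arrays;

public class BenchmarkModeMath {
    /** Throughput: operations completed per second over the whole run. */
    public static double throughputOpsPerSecond(long[] sampleNanos) {
        long totalNanos = Arrays.stream(sampleNanos).sum();
        return sampleNanos.length * 1_000_000_000.0 / totalNanos;
    }

    /** AverageTime: mean nanoseconds per operation. */
    public static double averageNanosPerOp(long[] sampleNanos) {
        return Arrays.stream(sampleNanos).average().orElse(Double.NaN);
    }

    /** SampleTime: nearest-rank percentile of the individual samples. */
    public static long percentileNanos(long[] sampleNanos, double percentile) {
        long[] sorted = sampleNanos.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(percentile / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // Hypothetical per-call timings in nanoseconds.
        long[] samples = {100, 200, 300, 400, 1000};
        System.out.println("ops/s:  " + throughputOpsPerSecond(samples));
        System.out.println("ns/op:  " + averageNanosPerOp(samples));
        System.out.println("p50 ns: " + percentileNanos(samples, 50));
        System.out.println("p99 ns: " + percentileNanos(samples, 99));
    }
}
```

SingleShotTime has no counterpart here: it would be the time of one call made without any warmup, which is why the slide labels it cold performance.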

@Warmup(iterations=5, time=1, timeUnit=SECONDS)
@Measurement(iterations=5, time=1, timeUnit=SECONDS)
29
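
To make the split between warmup and measurement iterations concrete, here is a hand-written sketch in plain Java. It is only an illustration of the idea, not a substitute for JMH: the class `WarmupSketch`, the fixed call count per iteration and the sample workload are assumptions of this example, whereas the annotations above run each iteration for a fixed time (one second).

```java
import java.util.function.LongSupplier;

public class WarmupSketch {
    private static final int CALLS_PER_ITERATION = 100_000; // arbitrary, for illustration

    /** Times each iteration of `work` and returns the elapsed nanoseconds per iteration. */
    static long[] timeIterations(int iterations, LongSupplier work) {
        long[] elapsed = new long[iterations];
        long sink = 0; // accumulate results so the calls are not obviously unused
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            for (int c = 0; c < CALLS_PER_ITERATION; c++) {
                sink += work.getAsLong();
            }
            elapsed[i] = System.nanoTime() - start;
        }
        if (sink == 42) System.out.print(""); // keeps `sink` observable
        return elapsed;
    }

    public static long[] run(int warmupIterations, int measurementIterations, LongSupplier work) {
        // Warmup: timings are thrown away, the code may still be interpreted or being compiled.
        timeIterations(warmupIterations, work);
        // Measurement: only these timings are reported.
        return timeIterations(measurementIterations, work);
    }

    public static void main(String[] args) {
        long[] measured = run(5, 5, () -> Long.bitCount(System.nanoTime()));
        for (long nanos : measured) {
            System.out.println(nanos + " ns per iteration");
        }
    }
}
```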

Multithreading
made easy
@Threads(20)
@State(Scope.Thread)
31

Multithreading
made easy
results are aggregated for you
32

[Diagram: Hashtable. One lock guards the whole table. Thread 0 calls get ( key0 ) and put ( key0, value0 ); thread 1 calls get ( key1 ) and put ( key1, value1 ). The lock is held by thread 0.]
34

[Diagram: ConcurrentHashMap, writes. Four segments, each with its own lock. Thread 0 calls put ( key0, value0 ) via segmentFor ( hash0 ), thread 1 calls put ( key1, value1 ) via segmentFor ( hash1 ), thread 2 calls put ( key2, value2 ) via segmentFor ( hash2 ). Locks shown as held: lock (thread 0), lock (thread 1).]
36

[Diagram: ConcurrentHashMap, reads. Four segments, each with its own lock. Thread 0 calls get ( key0 ) via segmentFor ( hash0 ) and thread 2 calls get ( key2 ) via segmentFor ( hash2 ). Both accesses are labelled "read volatile".]
37

Built-in profilers can show
If compilation is happening while measuring
If class loading is happening while measuring
How much object allocation is happening
Which methods are consuming CPU time
39

External profilers can be used
Linux perf_events
Windows xperf
Java Mission Control (pluggable)
Yourkit, etc.
40

Verify that new code matches expectations
Check that no regression is introduced
Validate optimization ideas
Cover performance fixes with related test

Care about a warmup phase
Reduce noise
Define regression
Make sure backlog is handled

Key takeaways
Benchmarking is tricky
Measure before optimizing
JMH helps a lot
50

• Caliper: https://www.flickr.com/photos/andrewthecook/14026422669/sizes/l
• Geometric forms: https://www.flickr.com/photos/internetarchivebookimages/14753972274/sizes/l
• Metric tape: https://www.flickr.com/photos/ilianov/3345314090/sizes/l/
• Mountain: https://www.flickr.com/photos/pthread/8151096195/sizes/l
• Friends: https://www.flickr.com/photos/livenature/13895494231/sizes/l
• Root: https://www.flickr.com/photos/paperpariah/19937816358/sizes/l/
• Knuth: https://www.flickr.com/photos/ioerror/56360019/sizes/l
• Warmup: https://www.flickr.com/photos/komunews/2085730526/sizes/o/
• Multithreading: https://www.flickr.com/photos/slimjim/4329655445/sizes/l
• Stop: https://www.flickr.com/photos/thematthewknot/3924980314/sizes/l
• Boats: https://www.flickr.com/photos/cuppini/8465318134/sizes/l
• Next steps: https://www.flickr.com/photos/gebagia/22346547334/sizes/l
• Marines: https://www.flickr.com/photos/dvids/14007373489/sizes/l
• Arctic ice: https://commons.wikimedia.org/wiki/File:ICESCAPE.jpg
• Demo time: https://www.flickr.com/photos/abstractbynature/6111219203
• Blue sky: https://www.flickr.com/photos/foctavian/16371691937/
51

53
Follow us
@lgomes
@nenadbo
github.com/kickass-jmh

1.
Kickass benchmarking with JMH Leonardo F. Gomes Nenad Bogojevic 0
2.
Done, works, fast! 1
3.
Premature optimization is the root of all evil Donald Knuth, 1974 2
4.
Donald E. Knuth Professor Emeritus at Stanford University ACM Grace Murray Hopper Award Turing Award Author of The Art Of Computer Programming Creator of TeX 3
5.
4
6.
Donald E. Knuth We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunity in that critical 3%. 5
7.
Donald E. Knuth A good programmer will not be lulled into complacency by such reasoning, he will be wise to look carefully at the critical code; but only after that code has been identified. 6
8.
Donald E. Knuth It is often a mistake to make a priori judgements about what parts of a program are really critical. 7
9.
Donald E. Knuth The universal experience of programmers who have been using measurement tools has been that the intuitive guesses fail. 8
10.
Good programmers measure before optimizing 9
12.
Benchmarking is hard 11
13.
Why? 12
14.
Warmup phase 13
15.
[Diagram: Java source code is compiled ahead of time, using javac, into bytecodes (instructions for an abstract machine), which the HotSpot Java VM executes.] 14
16.
[Diagram: inside the HotSpot Java VM. Components: interpreter, heap, stack, garbage collector, and a compilation system with the C1 and C2 compilers. Arrow labels: execute, access, manage, compile, produce. A compiled method consists of machine code, debug info and object maps.] 15
17.
Compiler optimizations 16
20.
JMH is your friend 19
21.
Java Microbenchmark Harness 20
22.
JVM Microbenchmark Harness 21
23.
JVM Millibenchmark Harness 22
24.
JVM Macrobenchmark Harness 23
25.
JVM Nanobenchmark Harness 24
26.
JMH is for benchmarking what JUnit is for unit testing 25
28.
Granularity Macro 1 … 1000s Milli 1 … 1000ms Micro 1 … 1000us Nano 1 … 1000ns 27
29.
Benchmark modes Throughput ops/time_unit AverageTime time/operation SampleTime percentiles SingleShotTime cold performance 28
30.
@Warmup(iterations=5, time=1, timeUnit=SECONDS) @Measurement(iterations=5, time=1, timeUnit=SECONDS) 29
31.
Multithreading 30
32.
Multithreading made easy @Threads(20) @State(Scope.Thread) 31
33.
Multithreading made easy results are aggregated for you 32
34.
Anatomy of Hashtable 33
35.
[Diagram: Hashtable. One lock guards the whole table. Thread 0 calls get ( key0 ) and put ( key0, value0 ); thread 1 calls get ( key1 ) and put ( key1, value1 ). The lock is held by thread 0.] 34
36.
Anatomy of ConcurrentHashMap 35
37.
[Diagram: ConcurrentHashMap, writes. Four segments, each with its own lock. Thread 0 calls put ( key0, value0 ) via segmentFor ( hash0 ), thread 1 calls put ( key1, value1 ) via segmentFor ( hash1 ), thread 2 calls put ( key2, value2 ) via segmentFor ( hash2 ). Locks shown as held: lock (thread 0), lock (thread 1).] 36
38.
[Diagram: ConcurrentHashMap, reads. Four segments, each with its own lock. Thread 0 calls get ( key0 ) via segmentFor ( hash0 ) and thread 2 calls get ( key2 ) via segmentFor ( hash2 ). Both accesses are labelled "read volatile".] 37
40.
Built-in profilers can show: if compilation is happening while measuring; if class loading is happening while measuring; how much object allocation is happening; which methods are consuming CPU time 39
41.
External profilers can be used: Linux perf_events; Windows xperf; Java Mission Control (pluggable); Yourkit, etc. 40
42.
JMH’s adopters 42
43.
Our experience at amadeus
44.
Verify that new code matches expectations; Check that no regression is introduced; Validate optimization ideas; Cover performance fixes with related test
45.
Continuous Integration
49.
Care about a warmup phase; Reduce noise; Define regression; Make sure backlog is handled
50.
Key takeaways Benchmarking is tricky; Measure before optimizing; JMH helps a lot 50
52.
Questions? 52
53.
53 Follow us @lgomes @nenadbo github.com/kickass-jmh

Editor's Notes

#2 Motivation: Professional programmers should take responsibility for the performance of the code they develop. Performance is frequently overlooked until the last phases of development, whereas it should be integrated into the development process. TODO: Add some motivation around environmental benefits.
#4 This catchphrase is usually quoted without much context. Just like biblical citations, it can lead to religious wars. Let’s check the context around that phrase.
#5 Let’s put some context around that citation by looking at what is written before and after the phrase in the paper where it appeared. Knuth is the one who said it. There is some debate about whether the quote is originally from Knuth or whether he was citing Tony Hoare. This article tries to "prove" that it is actually from Knuth: https://shreevatsa.wordpress.com/2008/05/16/premature-optimization-is-the-root-of-all-evil/
#6-#10 A little bit of context: "Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunity in that critical 3%. A good programmer will not be lulled into complacency by such reasoning, he will be wise to look carefully at the critical code; but only after that code has been identified. It is often a mistake to make a priori judgements about what parts of a program are really critical, since the universal experience of programmers who have been using measurement tools has been that the intuitive guesses fail." Donald Knuth, "Structured Programming with go to Statements", Computing Surveys, Vol. 6, No. 4, December 1974, page 268. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.103.6084&rep=rep1&type=pdf
#11 Summary of what Knuth said 
#12 https://commons.wikimedia.org/wiki/File:ICESCAPE.jpg
#13-#14 Now you’re all convinced that you should be measuring the performance of your code. But wait, don’t just put timers on your unit tests.
#15-#17 Explain how code is initially interpreted, then compiled at runtime, and then runs in compiled mode.
#18 Branch prediction; loop unrolling; dead code elimination; autobox elimination; constant propagation; null check elimination; algebraic simplification; devirtualisation; range check elimination; etc.
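
Dead code elimination is the item on this list that most often bites hand-written benchmarks, so a small sketch may help. The class `DeadCodeSketch` and its `sink` field are hypothetical names for this example; `sink` is only a stand-in for whatever result consumer a harness provides. Whether the JIT actually removes the unused call depends on the JVM and its settings.

```java
public class DeadCodeSketch {
    // Hypothetical stand-in for a harness-provided result consumer.
    static volatile double sink;

    /** The result is never used, so the JIT may drop the call and the loop may time nothing. */
    public static void discardsResult(int calls) {
        for (int i = 0; i < calls; i++) {
            Math.sqrt(i);
        }
    }

    /** The result is written to a volatile field, so the computation has to take place. */
    public static void consumesResult(int calls) {
        for (int i = 0; i < calls; i++) {
            sink = Math.sqrt(i);
        }
    }

    public static void main(String[] args) {
        discardsResult(1_000_000);
        consumesResult(1_000_000);
        System.out.println("last consumed value: " + sink);
    }
}
```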
#21-#27, #29-#31 JMH is an open-source project that does exactly that. It’s part of the OpenJDK project and we will see how it can help you.
#33 The thread scope matches the concept of an application server well, because Java app servers usually keep state per thread. This would be like processing 20 requests in parallel. Benchmark scope would be like a cache that all your requests access; it should be guarded by synchronization mechanisms to make sure that it remains consistent.
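
The difference between the two scopes can be sketched without JMH, using plain threads. In this illustration, `PerThreadState` plays the role of thread-scoped state and the shared `AtomicLong` plays the role of benchmark-scoped state; the class and method names are assumptions of this example and not part of JMH.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

public class StateScopeSketch {
    /** Thread-confined state: only one thread touches it, so no synchronization is needed. */
    static final class PerThreadState {
        long handled;
    }

    /** Returns {sum of the per-thread counters, shared counter}. */
    public static long[] run(int threads, int requestsPerThread) throws Exception {
        AtomicLong shared = new AtomicLong(); // one instance seen by every thread
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<Long>> results = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            results.add(pool.submit(() -> {
                PerThreadState mine = new PerThreadState(); // one instance per thread
                for (int i = 0; i < requestsPerThread; i++) {
                    mine.handled++;           // unsynchronized, thread-confined
                    shared.incrementAndGet(); // shared, so it must be thread-safe
                }
                return mine.handled;
            }));
        }
        long perThreadTotal = 0;
        for (Future<Long> result : results) {
            perThreadTotal += result.get();
        }
        pool.shutdown();
        return new long[] {perThreadTotal, shared.get()};
    }

    public static void main(String[] args) throws Exception {
        long[] totals = run(20, 1_000);
        System.out.println("per-thread total: " + totals[0] + ", shared: " + totals[1]);
    }
}
```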
#36 Write to a volatile happens-before every subsequent read of that volatile
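
A minimal sketch of that rule in plain Java: the writer stores an ordinary field and then sets a volatile flag; a reader that observes the flag is guaranteed to also see the ordinary field. The class `VolatilePublication` and its members are names invented for this example.

```java
public class VolatilePublication {
    private int payload;            // ordinary field
    private volatile boolean ready; // volatile flag

    void publish(int value) {
        payload = value; // ordinary write
        ready = true;    // volatile write: publishes everything written before it
    }

    int awaitAndRead() {
        while (!ready) {  // volatile read
            Thread.yield();
        }
        return payload;   // sees the value written before `ready = true`
    }

    public static int demo() throws InterruptedException {
        VolatilePublication box = new VolatilePublication();
        int[] seen = new int[1];
        Thread reader = new Thread(() -> seen[0] = box.awaitAndRead());
        reader.start();
        box.publish(42);
        reader.join(); // join makes the reader's write to seen[0] visible here
        return seen[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(demo());
    }
}
```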
#38 Instead of using a single lock for the shared data, the shared data is segmented, with each segment having its own lock. Uncontended lock acquisition is very cheap; it's the contended locks that cause scalability issues. With a different lock for each partition, ConcurrentHashMap effectively reduces how often a lock is requested by the number of partitions. You can think of ConcurrentHashMap as made up of n separate hash tables.
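
The "n separate hash tables" picture can be sketched as a toy striped map. This is not how ConcurrentHashMap is implemented (and its implementation moved away from segments in Java 8); `StripedMap` is a hypothetical class for illustration, and unlike the reads on the slide it simply locks the segment for get as well.

```java
import java.util.HashMap;
import java.util.Map;

public class StripedMap<K, V> {
    private final Map<K, V>[] segments;

    @SuppressWarnings("unchecked")
    public StripedMap(int segmentCount) {
        segments = (Map<K, V>[]) new Map[segmentCount];
        for (int i = 0; i < segmentCount; i++) {
            segments[i] = new HashMap<>();
        }
    }

    /** Picks the segment (and therefore the lock) from the key's hash. */
    private Map<K, V> segmentFor(Object key) {
        int h = key.hashCode();
        return segments[Math.floorMod(h ^ (h >>> 16), segments.length)];
    }

    public V put(K key, V value) {
        Map<K, V> segment = segmentFor(key);
        synchronized (segment) { // only this segment is locked
            return segment.put(key, value);
        }
    }

    public V get(K key) {
        Map<K, V> segment = segmentFor(key);
        synchronized (segment) {
            return segment.get(key);
        }
    }

    public int size() {
        int total = 0;
        for (Map<K, V> segment : segments) {
            synchronized (segment) {
                total += segment.size();
            }
        }
        return total;
    }

    /** Writes distinct keys from several threads and returns the final size. */
    public static int demo(int threads, int keysPerThread) throws InterruptedException {
        StripedMap<Integer, Integer> map = new StripedMap<>(4);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int base = t * keysPerThread;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < keysPerThread; i++) {
                    map.put(base + i, i);
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        return map.size();
    }
}
```

Threads whose keys hash to different segments never wait for each other, which is the effect the note describes.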
#43 https://github.com/chrishantha/jfr-flame-graph
#44 http://netty.io/wiki/microbenchmarks.html https://github.com/grpc/grpc-java/tree/master/benchmarks/src/jmh https://github.com/akka/akka/tree/master/akka-bench-jmh https://github.com/SonarSource/sslr/blob/master/sslr-benchmarks/src/main/java/org/sonar/sslr/benchmarks/RecursiveRuleBenchmark.java https://github.com/droolsjbpm/kie-benchmarks/blob/master/drools-benchmarks/src/main/java/org/drools/benchmarks/session/InsertFireLoopBenchmark.java https://github.com/finagle/finagle-serial#benchmarks
#49 LMAX: Our micro-benchmarks currently take over an hour to run, though with more hardware we could run them in parallel to improve this. That's still not bad, but for comparison, our suite of ~11k acceptance tests only takes ~25mins...
#51 Reduce noise: isolate your benchmarks as much as possible (using CPU isolation, sched_setaffinity). Care about a correct warmup phase: give benchmarks enough time to run, and don't do nanosecond-per-operation benchmarks in continuous integration. Define regression: some variance is expected; define your baseline well; differentiate between inter-version and intra-version regressions. Make sure the issues backlog is tracked and handled.
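
One possible way to "define regression" while allowing for expected variance is a tolerance around the baseline. The sketch below assumes time-per-operation scores (lower is better); the class `RegressionCheck`, the method name and the 5% tolerance are illustrative assumptions, not something prescribed by the talk or by JMH.

```java
public class RegressionCheck {
    /**
     * Returns true when the current score is slower than the baseline
     * by more than the tolerated fraction (for example 0.05 for 5%).
     */
    public static boolean isRegression(double baselineNanosPerOp,
                                       double currentNanosPerOp,
                                       double tolerance) {
        return currentNanosPerOp > baselineNanosPerOp * (1.0 + tolerance);
    }

    public static void main(String[] args) {
        // Hypothetical scores: baseline 100 ns/op, tolerance 5%.
        System.out.println(isRegression(100.0, 104.0, 0.05)); // within tolerance
        System.out.println(isRegression(100.0, 110.0, 0.05)); // beyond tolerance
    }
}
```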
