Performance has always been a major concern in software development, and it should not be taken lightly even now that commodity computers have multicore CPUs and gigabytes of RAM. One of the handiest, simplest tools for performance testing is the microbenchmark. Unfortunately, developing correct Java microbenchmarks is a complex task with many pitfalls along the way. This presentation is about the do's and don'ts of Java microbenchmarking and about what tools are out there to help with this tricky task.
3. Microbenchmark – simple definition
1. Start the clock
2. Run the code
3. Stop the clock
4. Report
4. Better microbenchmark definition
• Small program
• Goal: measure something about a few lines of code
• All other variables should be removed
• Returns some kind of a numeric result
5. Why do I need microbenchmarks?
• Discover something about my code:
  • How fast is it?
  • Calculate throughput – TPS, KB/s
• Measure the result of changing my code:
  • Should I replace a HashMap with a TreeMap?
  • What is the cost of synchronizing a method?
6. Why are you talking about this?
• It’s hard to write a robust microbenchmark
• It’s even harder to do it in Java™
• There are not enough Java microbenchmarking tools
• There are too many flawed microbenchmarks out there
8. A microbenchmark story: the problem
The boss asks you to solve a performance issue in one of the components
Blah, blah …
9. A microbenchmark story: the cause
You find out that the cause is excessive use of Math.sqrt()
10. A microbenchmark story: a solution?
• You decide to develop a state-of-the-art square root approximation
• After developing the square root approximation, you want to benchmark it against the java.lang.Math implementation
11. SQRT approximation microbenchmark
Let’s run this little piece of code in a loop and see what happens …

public static void main(String[] args) {
    long start = System.currentTimeMillis(); // start the clock
    for (double i = 0; i < 10 * 1000 * 1000; i++) {
        mySqrt(i); // little piece of code
    }
    long end = System.currentTimeMillis(); // stop the clock
    long duration = end - start;
    System.out.format("Test duration: %d (ms) %n", duration);
}
14. SQRT microbenchmark: what’s wrong?
All of these happen inside the Java™ HotSpot virtual machine:
• Dynamic optimizations
• Garbage collection
• Dead code elimination
• Classloading
• Dynamic compilation
• On stack replacement
15. The HotSpot: a mixed mode system
1. Code is interpreted
2. Profiling
3. Dynamic compilation
4. Stuff happens
5. Interpreted again or recompiled
16. Dynamic compilation
• Dynamic compilation is unpredictable
• Don’t know when the compiler will run
• Don’t know how long the compiler will run
• Same code may be compiled more than once
• The JVM can switch to compiled code at will
20. What the heck is code hoisting ?
• Hoist = to raise or lift
• Size optimization
• Eliminate duplicated pieces of code in method bodies by hoisting expressions or statements
21. Code hoisting example
Before: a + b is a busy expression. After hoisting the expression a + b, a new local variable t has been introduced.
Optimizing Java for Size: Compiler Techniques for Code Compaction, Samuli Heilala
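A hand-written Java sketch of the same idea (illustrative only, not taken from the cited paper; the variables a, b, flag, and x are assumed locals):

// Before: the busy expression a + b is computed in both branches
int a = 2, b = 3, x;
boolean flag = true;
if (flag) {
    x = (a + b) * 2;
} else {
    x = (a + b) / 2;
}

// After hoisting: a new local variable t holds the expression once
int t = a + b;
if (flag) {
    x = t * 2;
} else {
    x = t / 2;
}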
22. Dynamic optimizations cont.
• Most of the optimizations are performed at runtime
• Profiling data is used by the compiler to improve optimization decisions
• You don’t have access to the dynamically compiled code
23. Example: Very fast square root?
10,000,000 calls to Math.sqrt() ~ 4 ms
public static void main(String[] args) {
    long start = System.nanoTime();
    int result = 0;
    for (int i = 0; i < 10 * 1000 * 1000; i++) {
        result += Math.sqrt(i);
    }
    long duration = (System.nanoTime() - start) / 1000000;
    System.out.format("Test duration: %d (ms) %n", duration);
}
24. Example: not so fast?
Now it takes ~ 2000 ms ?!?
public static void main(String[] args) {
    long start = System.nanoTime();
    int result = 0;
    for (int i = 0; i < 10 * 1000 * 1000; i++) {
        result += Math.sqrt(i);
    }
    System.out.format("Result: %d %n", result); // single line of code added
    long duration = (System.nanoTime() - start) / 1000000;
    System.out.format("Test duration: %d (ms) %n", duration);
}
25. DCE - Dead Code Elimination
• Dead code – code that has no effect on the outcome of the program execution

public static void main(String[] args) {
    long start = System.nanoTime();
    int result = 0;
    for (int i = 0; i < 10 * 1000 * 1000; i++) { // dead code: result is never used
        result += Math.sqrt(i);
    }
    long duration = (System.nanoTime() - start) / 1000000;
    System.out.format("Test duration: %d (ms) %n", duration);
}
26. OSR - On Stack Replacement
• Methods are HOT if they cumulatively execute more than 10,000 loop iterations
• Older JVM versions did not switch to the compiled version until the method exited and was re-entered
• OSR – switch from interpretation to compiled code in the middle of a loop
27. OSR and microbenchmarking
• OSR’d code may be less performant
  • Some optimizations are not performed
• OSR usually happens when you put everything into one long method
• Developers tend to write long main() methods when benchmarking
• Real-life applications are hopefully divided into more fine-grained methods (see the sketch below)
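A minimal sketch of that advice (class and method names are illustrative): extract the measured loop into its own small method, so HotSpot can compile the whole method normally instead of OSR-compiling a loop buried inside main().

public class AvoidOsr {
    // the measured work lives in its own small method
    private static double runIteration(int n) {
        double sum = 0;
        for (int i = 0; i < n; i++) {
            sum += Math.sqrt(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        double sink = 0;
        // calling the method many times lets it be compiled as a whole
        for (int round = 0; round < 1000; round++) {
            sink += runIteration(10000);
        }
        System.out.println("sink = " + sink); // keep the result alive
    }
}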
28. Classloading
• Classes are usually loaded only when they are first used
• Class loading takes time
  • I/O
  • Parsing
  • Verification
• May skew your benchmark results
29. Garbage Collection
• The JVM automatically reclaims resources by
  • Garbage collection
  • Object finalization
• Outside of the developer’s control
• Unpredictable
• Should be measured if invoked as a result of the benchmarked code (see the sketch below)
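One way to measure it is via the standard management API (a hedged sketch; sample the value before and after the timed run and take the difference):

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Sum the cumulative GC time (in milliseconds) across all collectors
static long totalGcTimeMillis() {
    long gcTime = 0;
    for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
        long t = gc.getCollectionTime(); // -1 if unsupported
        if (t > 0) {
            gcTime += t;
        }
    }
    return gcTime;
}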
30. Time measurement
How long is one millisecond?
public static void main(String[] args) throws InterruptedException {
    long start = System.currentTimeMillis();
    Thread.sleep(1);
    final long end = System.currentTimeMillis();
    final long duration = end - start;
    System.out.format("Test duration: %d (ms) %n", duration);
}

Test duration: 16 (ms)
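You can observe this granularity yourself with a small probe (a sketch, not from the slides): spin until the reported millisecond value changes and print the size of the jump.

long t0 = System.currentTimeMillis();
long t1;
// busy-wait until the reported clock value changes
while ((t1 = System.currentTimeMillis()) == t0) { }
System.out.println("Observed timer granularity: " + (t1 - t0) + " ms");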
31. System.currentTimeMillis()
• Resolution varies with platform

Resolution   Platform                   Source
55 ms        Windows 95/98              Java Glossary
10 – 15 ms   Windows NT, 2K, XP, 2003   David Holmes
1 ms         Mac OS X                   Java Glossary
1 ms         Linux – 2.6 kernel         Markus Kobler
32. Wrong target platform
• Choosing the wrong platform for your microbenchmark
  • Benchmarking on Windows when your target platform is Linux
  • Benchmarking a highly threaded application on a single-core machine
  • Benchmarking on a Sun JVM when the target platform is Oracle (BEA) JRockit
33. Caching
• Caching
• Hardware – CPU caching
• Operating System – File system caching
• Database – query caching
34. Caching: CPU L1 and L2 caches
• The farther the accessed data is from the CPU, the higher the access latency
• Size of dataset affects access cost

Array size   Time (us)   Cost (ns)
16K          413451      9.821
8192K        5743812     136.446

Jcachev2 results for Intel® Core™2 Duo T8300, L1 = 32 KB, L2 = 3 MB
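A rough, hand-rolled version of the same kind of experiment (illustrative only, not Jcachev2 itself; the sizes are chosen to fit in L1 and to overflow L2 on the machine above):

int[] sizes = { 16 * 1024 / 4, 8192 * 1024 / 4 }; // 16 KB and 8 MB worth of ints
for (int size : sizes) {
    int[] data = new int[size];
    int sink = 0;
    long start = System.nanoTime();
    for (int i = 0; i < 100 * 1000 * 1000; i++) {
        sink += data[(i * 16) & (size - 1)]; // jump one cache line at a time
    }
    double nsPerAccess = (System.nanoTime() - start) / 100e6;
    System.out.format("size=%d ints: %.3f ns/access (sink=%d)%n", size, nsPerAccess, sink);
}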
38. Warm up your code
• Let the JVM reach a steady-state execution profile before you start benchmarking
• All classes should be loaded before benchmarking
• Usually executing your code for ~10 seconds is enough (see the sketch below)
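A minimal warm-up sketch under those assumptions (Math.sqrt() stands in for the code under test):

double sink = 0;
long warmupEnd = System.nanoTime() + 10L * 1000 * 1000 * 1000; // ~10 seconds
while (System.nanoTime() < warmupEnd) {
    sink += Math.sqrt(12345.0); // the code under test
}
System.out.println("warmup sink = " + sink); // keep the result alive
// ...start the real, timed measurement here...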
39. Warm up your code – cont.
• Detect JIT compilations by using
  • CompilationMXBean.getTotalCompilationTime()
  • -XX:+PrintCompilation
• Measure classloading time
  • Use the ClassLoadingMXBean
40. CompilationMXBean usage
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

long compilationTimeTotal = 0;
CompilationMXBean compBean = ManagementFactory.getCompilationMXBean();
if (compBean.isCompilationTimeMonitoringSupported()) {
    compilationTimeTotal = compBean.getTotalCompilationTime();
}
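The ClassLoadingMXBean mentioned on the previous slide is used the same way (a short sketch):

import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;

ClassLoadingMXBean classBean = ManagementFactory.getClassLoadingMXBean();
long loadedBefore = classBean.getTotalLoadedClassCount();
// ...run the benchmark...
long loadedDuring = classBean.getTotalLoadedClassCount() - loadedBefore;
// a non-zero delta means classes were loaded during the measurement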
41. Dynamic optimizations
• Avoid on-stack replacement
  • Don’t put all your benchmark code in one big main() method
• Avoid dead code elimination (see the sketch below)
  • Print the final result
  • Report unreasonable speedups
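Besides printing, a common pattern (shown here as a hedged sketch) is to accumulate results into a sink variable that is observably used after the loop, so the computation cannot be proven dead:

double sink = 0;
for (int i = 0; i < 10 * 1000 * 1000; i++) {
    sink += Math.sqrt(i); // the result feeds the sink...
}
System.out.println("sink = " + sink); // ...and the sink is used, so the loop stays alive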
42. Garbage Collection
• Measure garbage collection time
• Force garbage collection and finalization before benchmarking (see the sketch below)
• Perform enough iterations to reach garbage collection steady state
• Gather GC stats:
  -XX:+PrintGCTimeStamps
  -XX:+PrintGCDetails
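Forcing collection before the timed run is typically done like this (a sketch; note that System.gc() is only a hint to the JVM):

// request collection and finalization of leftover garbage before measuring
System.gc();
System.runFinalization();
System.gc();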
43. Time measurement
• Use System.nanoTime()
  • Microsecond accuracy on modern operating systems and hardware
  • Not worse than currentTimeMillis()
• Notice, Windows users: the call itself executes in microseconds – don’t overuse it!
44. JVM configuration
• Use JVM options similar to your target environment (see the example below):
  • -server or -client JVM
  • Enough heap space (-Xmx)
  • Garbage collection options
  • Thread stack size (-Xss)
  • JIT compiling options
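For example, a benchmark run mimicking a server deployment might be launched like this (a sketch; the flag values and the MyBenchmark class are purely illustrative):

java -server -Xmx1024m -Xss256k -XX:+UseParallelGC MyBenchmark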
45. Other issues
• Use fixed-size data sets
  • Too large data sets can cause L1 cache blowout
• Notice system load
  • Don’t play GTA while benchmarking!
47. Java™ benchmarking tools
• Various specialized benchmarks
  • SPECjAppServer®
  • SPECjvm™
  • CaffeineMark 3.0™
  • SciMark 2.0
• Only a few benchmarking frameworks
48. Japex Micro-Benchmark framework
• Similar in spirit to JUnit
• Measures throughput – work over time
• Transactions Per Second (Default)
• KBs per second
• XML based configuration
• XML/HTML reports
49. Japex: Drivers
• Encapsulates knowledge about a specific algorithm implementation
• Must extend JapexDriverBase

public interface JapexDriver extends Runnable {
    public void initializeDriver();
    public void prepare(TestCase testCase);
    public void warmup(TestCase testCase);
    public void run(TestCase testCase);
    public void finish(TestCase testCase);
    public void terminateDriver();
}
50. Japex: Writing your own driver
public class SqrtNewtonApproxDriver extends JapexDriverBase {
    private long tmp;
    …
    @Override
    public void warmup(TestCase testCase) {
        tmp += sqrt(getNextRandomNumber());
    }
    …
}
54. Japex: pros and cons
• Pros
  • Similar to JUnit
  • Nice HTML reports
• Cons
  • Last stable release was in March 2007
  • HotSpot issues are not handled
  • XML configuration
55. Brent Boyer’s Benchmark framework
• Part of the “Robust Java benchmarking” article by Brent Boyer
• Automates as many aspects as possible:
  • Resource reclamation
  • Class loading
  • Dead code elimination
  • Statistics
56. Benchmark framework example
Benchmark.Params params = new Benchmark.Params(true);
params.setExecutionTimeGoal(0.5);
params.setNumberMeasurements(50);
Runnable task = new Runnable() {
    public void run() {
        sqrt(getNextRandomNumber());
    }
};
Benchmark benchmark = new Benchmark(task, params);
System.out.println(benchmark.toString());
57. Benchmark single line summary
Benchmark output:
first = 25.702 us,
mean = 91.070 ns (CI deltas: -115.591 ps, +171.423 ps)
sd = 1.451 us (CI deltas: -461.523 ns, +676.964 ns)
WARNING: execution times have mild outliers, SD VALUES MAY BE INACCURATE
58. Outlier and serial correlation issues
• Records outlier and serial correlation issues
  • Outliers indicate that a major measurement error happened
    • Large outliers – some other activity started on the computer during measurement
    • Small outliers might hint that DCE occurred
  • Serial correlation indicates that the JVM has not reached its steady-state performance profile
59. Benchmark : pros and cons
• Pros
  • Handles HotSpot-related issues
  • Detailed statistics
• Cons
  • Each run takes a lot of time
  • Not a formal project
  • Lacks documentation
61. Summary 1
• Microbenchmarking is hard when it comes to Java™
• Define what you want to measure and how you want to do it; pick your goals
• Know what you are doing
• Always warm up your code
• Handle DCE, OSR, and GC issues
• Use fixed-size data sets and fixed work
62. Summary 2
• Do not rely solely on microbenchmark results
• Sanity check results
• Use a profiler
• Test your code in real-life scenarios under realistic load (macro-benchmark)