Java Benchmarking
Srinivasan Raghavan
Senior Member of Technical Staff
Java Platform Group
Copyright © 2015, Oracle and/or its affiliates. All rights reserved.
Program Agenda
1. Why do we need to benchmark?
2. Do we benchmark correctly?
3. Know the optimizations
4. How OpenJDK JMH works
5. Advanced topics
Why do we need to benchmark?
Trends in hardware tech
• Single-processor performance improvement has been slowing steadily since 2003
• Clock speeds have not increased over the last decade because of power-consumption limits
• Microprocessor manufacturing is also a concern: process technology is approaching the 7 nm barrier
• DRAM chip capacity has recently increased by about 25% to 40% per year, and bandwidth has increased tremendously, but latency is still a concern
• It is increasingly difficult to manufacture even smaller DRAM cells efficiently
• Bandwidth has outpaced latency across these technologies and will likely continue to do so
Why do you need to measure?
• Software engineering is more like the movie Interstellar: it can be realistic and driven by math, but it is still a movie, not reality
• Performance engineering is firmly rooted in reality, where one has to deal with complex hardware interactions, compiler and hardware optimizations, and multithreading
• The aim of performance engineering is to build a performance model of the underlying system
• That model shows where optimization is required and where there is over-engineering
• Having microbenchmark data is far better than writing code blind
Do we benchmark correctly?
A wrong approach to benchmark

    import java.util.concurrent.locks.Lock;
    import java.util.concurrent.locks.ReentrantLock;

    interface Incrementer {
        void increment();
    }

    // Guards the counter with an explicit ReentrantLock.
    class LockIncrementer implements Incrementer {
        private long counter = 0;
        private final Lock lock = new ReentrantLock();

        public void increment() {
            lock.lock();
            try {
                ++counter;
            } finally {
                lock.unlock();
            }
        }
    }

    // Guards the counter with the intrinsic (synchronized) lock.
    class SyncIncrementer implements Incrementer {
        private long counter = 0;

        public synchronized void increment() {
            ++counter;
        }
    }
A wrong approach to benchmark (continued)

    // Times 10 million calls once, with no warm-up and no repetition.
    static long test(Incrementer incr) {
        long start = System.nanoTime();
        for (long i = 0; i < 10000000L; i++)
            incr.increment();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        long synchTime = test(new SyncIncrementer());
        long lockTime = test(new LockIncrementer());
        System.out.printf("synchronized: %1$10d%n", synchTime);
        System.out.printf("Lock:         %1$10d%n", lockTime);
        System.out.printf("Lock/synchronized = %1$.3f%n", (double) lockTime / (double) synchTime);
    }
What is missing here?
• No consideration of compiler and hardware optimizations, the number of cores, and so on
• No consideration of JVM optimizations
• No consideration of the number of threads involved (the test is single-threaded, so the locks are never contended)
• No consideration of varying the input size
• The conclusion rests solely on raw numbers and Stack Overflow conversations
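To make one of the gaps above concrete, here is a minimal sketch of a hand-rolled timing loop that at least discards warm-up rounds and repeats the measurement. It is an illustration only, not a replacement for a proper harness. The names Counter, SyncCounter, LockCounter and bestNanosPerOp are hypothetical, and reporting the best round is just one possible choice of statistic.

```java
import java.util.concurrent.locks.ReentrantLock;

/** Hand-rolled timing loop with warm-up rounds; an illustration only, not a harness. */
final class WarmupSketch {

    /** Minimal counter abstraction (hypothetical; mirrors the Incrementer idea). */
    interface Counter {
        void increment();
        long get();
    }

    static final class SyncCounter implements Counter {
        private long value;
        public synchronized void increment() { ++value; }
        public synchronized long get() { return value; }
    }

    static final class LockCounter implements Counter {
        private final ReentrantLock lock = new ReentrantLock();
        private long value;
        public void increment() {
            lock.lock();
            try { ++value; } finally { lock.unlock(); }
        }
        public long get() {
            lock.lock();
            try { return value; } finally { lock.unlock(); }
        }
    }

    /** Runs warm-up rounds (timings discarded), then returns the best ns/op over the measured rounds. */
    static double bestNanosPerOp(Counter counter, int warmupRounds, int measuredRounds, int opsPerRound) {
        double best = Double.MAX_VALUE;
        for (int round = 0; round < warmupRounds + measuredRounds; round++) {
            long start = System.nanoTime();
            for (int i = 0; i < opsPerRound; i++) {
                counter.increment();
            }
            long elapsed = System.nanoTime() - start;
            if (round >= warmupRounds) {
                best = Math.min(best, (double) elapsed / opsPerRound);
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Counter sync = new SyncCounter();
        Counter lock = new LockCounter();
        double syncNs = bestNanosPerOp(sync, 5, 5, 1_000_000);
        double lockNs = bestNanosPerOp(lock, 5, 5, 1_000_000);
        // Reading the counters afterwards keeps the increments observable.
        System.out.printf("synchronized: %.2f ns/op (count=%d)%n", syncNs, sync.get());
        System.out.printf("Lock:         %.2f ns/op (count=%d)%n", lockNs, lock.get());
    }
}
```

Even this version is still single-threaded and runs both variants in the same JVM, so it addresses only the warm-up and repetition points from the list.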
OpenJDK JMH
• JMH is a Java harness for building, running, and analyzing nano/micro/milli/macro benchmarks written in Java and other languages targeting the JVM
• Part of the OpenJDK code-tools project
• Used extensively within OpenJDK to test the internals
• Keeps pace with changes in the JVM
• Brings a scientific approach to benchmarking
A quick look at how JMH works
• A Maven-based project that bundles the benchmark code into a runnable JAR
• A quick list of common annotations

Annotation       Function
@Benchmark       Marks the method as a benchmark
@BenchmarkMode   Defines the benchmark mode, such as average time or throughput
@Warmup          Defines the warm-up iterations
@Measurement     Defines the measurement iterations
@Fork            Number of forked JVMs to run
Know the optimization
VM and hardware optimization
• Dead code elimination (DCE): removal of code whose result is never used
• Inlining: replacing a method call with the body of the called method, which opens the code up to further optimization
• Loop unrolling: increases a program's speed by reducing (or eliminating) the instructions that control the loop
• Warm-up: the VM starts by interpreting the code; once it has identified the hot methods, it compiles them and starts inlining aggressively
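DCE and Inlining

    import java.util.concurrent.TimeUnit;
    import org.openjdk.jmh.annotations.*;

    // Class-level settings are assumed from the reported mode (avgt) and units (ns/op).
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    public class Benchmark_Inlining {

        // The sum is never used, so the JIT is free to remove the loop as dead code.
        @Benchmark
        public void testMethod() {
            int sum = 0;
            for (int i = 0; i < 50; i++) {
                sum += i;
            }
        }

        // Returning the sum hands it to JMH, so the computation stays alive.
        @Benchmark
        public int testMethod_1() {
            int sum = 0;
            for (int i = 0; i < 50; i++) {
                sum += i;
            }
            return sum;
        }

        // Same body, but the JIT is told not to inline this method.
        @CompilerControl(CompilerControl.Mode.DONT_INLINE)
        @Benchmark
        public int testMethod_2() {
            int sum = 0;
            for (int i = 0; i < 50; i++) {
                sum += i;
            }
            return sum;
        }
    }

Benchmark                        Mode  Cnt  Score   Error  Units
Benchmark_Inlining.testMethod    avgt    5  0.411 ± 0.199  ns/op
Benchmark_Inlining.testMethod_1  avgt    5  3.396 ± 0.138  ns/op
Benchmark_Inlining.testMethod_2  avgt    5  5.123 ± 0.993  ns/op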
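Outside a harness, a common way to keep a result from being discarded is to publish it somewhere the compiler cannot ignore. The sketch below writes each result to a volatile field. The names SinkSketch, sink, sumTo50 and timeWithSink are hypothetical, and this is only a rough stand-in for what returning a value from a benchmark method achieves. The JIT may still fold the constant loop into a single value; the sink only prevents the result from being dropped as unused.

```java
/** Sketch of keeping a computed value alive in a hand-written timing loop. */
final class SinkSketch {

    /** Volatile write target, so the stored result cannot be treated as unused. */
    static volatile int sink;

    /** Same loop as in the benchmarks above: sums 0..49. */
    static int sumTo50() {
        int sum = 0;
        for (int i = 0; i < 50; i++) {
            sum += i;
        }
        return sum;
    }

    /** Calls the loop repeatedly and publishes every result; returns the elapsed nanoseconds. */
    static long timeWithSink(int invocations) {
        long start = System.nanoTime();
        for (int n = 0; n < invocations; n++) {
            sink = sumTo50();
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        long nanos = timeWithSink(1_000_000);
        System.out.println("sink=" + sink + ", total ns=" + nanos);
    }
}
```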
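Loop Unrolling

    import java.util.concurrent.TimeUnit;
    import org.openjdk.jmh.annotations.*;

    // Class-level annotations are assumed; the slide shows only the field and methods.
    @State(Scope.Thread)
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    public class Benchmark_LoopUnroll {

        private double[] A = new double[2048];

        // Straightforward loop: one addition per iteration.
        @Benchmark
        public double test1() {
            double sum = 0.0;
            for (int i = 0; i < A.length; i++) {
                sum += A[i];
            }
            return sum;
        }

        // Manually unrolled four ways; relies on A.length being a multiple of 4.
        @Benchmark
        public double testManualUnroll() {
            double sum = 0.0;
            for (int i = 0; i < A.length; i += 4) {
                sum += A[i] + A[i + 1] + A[i + 2] + A[i + 3];
            }
            return sum;
        }
    }

Benchmark                              Mode  Cnt  Score      Error    Units
Benchmark_LoopUnroll.test1             avgt    5  1946.006 ±  73.579  ns/op
Benchmark_LoopUnroll.testManualUnroll  avgt    5   823.572 ± 183.984  ns/op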
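The manual unroll only works as written because 2048 is a multiple of 4. A minimal sketch of the general form, with a tail loop for the leftover elements, is shown below; UnrollSketch, sumSimple and sumUnrolled are hypothetical names. Note that the unrolled version adds the elements in a different grouping, so for arbitrary doubles the two sums can differ in the last bits; the values used here are small integers, which add exactly.

```java
/** Sketch: 4-way manual unrolling with a tail loop for lengths that are not a multiple of 4. */
final class UnrollSketch {

    /** Reference version: one addition per iteration. */
    static double sumSimple(double[] a) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i];
        }
        return sum;
    }

    /** Unrolled version: four elements per iteration, then the remaining 0 to 3 elements. */
    static double sumUnrolled(double[] a) {
        double sum = 0.0;
        int i = 0;
        int limit = a.length - (a.length % 4);
        for (; i < limit; i += 4) {
            sum += a[i] + a[i + 1] + a[i + 2] + a[i + 3];
        }
        for (; i < a.length; i++) {
            sum += a[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] data = new double[10];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        System.out.println(sumSimple(data) + " " + sumUnrolled(data));
    }
}
```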
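Unrolling and Inlining on Steroids

    import java.util.concurrent.TimeUnit;
    import org.openjdk.jmh.annotations.*;

    // Class-level annotations are assumed; the slide shows only the fields and methods.
    @State(Scope.Thread)
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    public class JMHSample_11_Loops {

        int x = 1;
        int y = 2;

        // Baseline: a single addition per invocation.
        @Benchmark
        public int measure() {
            return (x + y);
        }

        // Repeats the addition in a loop; the JIT can unroll and simplify this loop.
        private int reps(int reps) {
            int s = 0;
            for (int i = 0; i < reps; i++) {
                s += (x + y);
            }
            return s;
        }

        @Benchmark
        @OperationsPerInvocation(1)
        public int measure1() {
            return reps(1);
        }

        // Despite the method name, this variant runs 10000 repetitions; the annotation
        // matches reps(), so the per-operation score is still consistent.
        @Benchmark
        @OperationsPerInvocation(10000)
        public int measure1000() {
            return reps(10000);
        }

        @Benchmark
        @OperationsPerInvocation(100000)
        public int measure100000() {
            return reps(100000);
        }
    }

Benchmark                         Mode  Cnt  Score   Error  Units
JMHSample_11_Loops.measure        avgt    5  5.472 ± 2.783  ns/op
JMHSample_11_Loops.measure1       avgt    5  4.813 ± 0.352  ns/op
JMHSample_11_Loops.measure1000    avgt    5  0.039 ± 0.008  ns/op
JMHSample_11_Loops.measure100000  avgt    5  0.036 ± 0.006  ns/op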
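The per-operation scores are the time for one invocation divided by the declared operation count, so they can be turned back into invocation time or into cycles per operation. The sketch below does that arithmetic for two of the reported scores. The 3.0 GHz clock is a hypothetical value, since the test machine is not stated, and PerOpSketch and its methods are illustrative names. At that assumed clock, 0.036 ns/op is about a tenth of a cycle per addition, which suggests the JIT restructured the loop rather than executing one addition per iteration.

```java
/** Arithmetic sketch: converting per-operation scores into invocation time and cycles. */
final class PerOpSketch {

    /** Time for one benchmark invocation, given the per-operation score and the declared operation count. */
    static double nanosPerInvocation(double nanosPerOp, int opsPerInvocation) {
        return nanosPerOp * opsPerInvocation;
    }

    /** Cycles spent per operation at an assumed clock rate (GHz equals cycles per nanosecond). */
    static double cyclesPerOp(double nanosPerOp, double assumedClockGhz) {
        return nanosPerOp * assumedClockGhz;
    }

    public static void main(String[] args) {
        double assumedClockGhz = 3.0; // hypothetical; the test machine's clock is not stated
        System.out.printf("measure1:      %.3f cycles/op%n", cyclesPerOp(4.813, assumedClockGhz));
        System.out.printf("measure100000: %.3f cycles/op, %.0f ns per invocation%n",
                cyclesPerOp(0.036, assumedClockGhz), nanosPerInvocation(0.036, 100000));
    }
}
```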
Advanced Topics
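What is the cost of an atomic write?

    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.atomic.AtomicInteger;
    import org.openjdk.jmh.annotations.*;

    // Class-level annotations are assumed; the slide shows only the fields and methods.
    @State(Scope.Thread)
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    public class BenchmarkAtomicInteger {
        private int plainV;
        private AtomicInteger atomicInteger = new AtomicInteger(0);

        @Benchmark
        public int baseline() {
            return 42;
        }

        @Benchmark
        public int incrPlain() {
            return plainV++;
        }

        @Benchmark
        public int incrAtomic() {
            return atomicInteger.incrementAndGet();
        }
    }

Benchmark                          Mode  Cnt  Score   Error  Units
BenchmarkAtomicInteger.baseline    avgt    5  4.109 ± 0.345  ns/op
BenchmarkAtomicInteger.incrPlain   avgt    5  3.320 ± 0.415  ns/op
BenchmarkAtomicInteger.incrAtomic  avgt    5  9.205 ± 1.031  ns/op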
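The extra nanoseconds of the atomic increment buy correctness when several threads share the counter. The sketch below, which runs outside any harness, lets several threads increment a plain int and an AtomicInteger side by side. AtomicCorrectnessSketch and its members are hypothetical names. The atomic counter always ends at the exact total, while the plain counter may lose updates and end lower.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Sketch: a plain int counter and an AtomicInteger incremented by several threads at once. */
final class AtomicCorrectnessSketch {

    static int plainCounter; // unsynchronized on purpose
    static final AtomicInteger atomicCounter = new AtomicInteger();

    /** Starts the given number of threads, each incrementing both counters, and waits for them. */
    static void run(int threads, int incrementsPerThread) {
        plainCounter = 0;
        atomicCounter.set(0);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    plainCounter++;                  // racy read-modify-write
                    atomicCounter.incrementAndGet(); // atomic read-modify-write
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
    }

    public static void main(String[] args) {
        run(4, 100_000);
        System.out.println("plain=" + plainCounter + " atomic=" + atomicCounter.get());
    }
}
```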
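Let's amortize the cost

    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.atomic.AtomicInteger;
    import org.openjdk.jmh.annotations.*;
    import org.openjdk.jmh.infra.Blackhole;

    // Class-level annotations are assumed; the slide shows only the fields and methods.
    @State(Scope.Thread)
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    public class BenchmarkAmortizedAtomicInteger {
        @Param({ "40" })
        private int tokens;
        private int plainV;
        private AtomicInteger atomicInteger = new AtomicInteger(0);

        // Every variant first burns the same fixed amount of CPU work.
        @Benchmark
        public int baseline() {
            Blackhole.consumeCPU(tokens);
            return 42;
        }

        @Benchmark
        public int incrPlain() {
            Blackhole.consumeCPU(tokens);
            return plainV++;
        }

        @Benchmark
        public int incrAtomic() {
            Blackhole.consumeCPU(tokens);
            return atomicInteger.incrementAndGet();
        }
    }

Benchmark                                   (tokens)  Mode  Cnt  Score    Error  Units
BenchmarkAmortizedAtomicInteger.baseline          40  avgt    5  84.652 ± 3.229  ns/op
BenchmarkAmortizedAtomicInteger.incrPlain         40  avgt    5  85.101 ± 2.916  ns/op
BenchmarkAmortizedAtomicInteger.incrAtomic        40  avgt    5  87.118 ± 3.891  ns/op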
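The effect of amortization is easiest to see as a relative overhead. The sketch below computes it from the reported scores of the bare and the amortized runs; OverheadSketch and relativeOverhead are hypothetical names. Measured in isolation, the atomic increment is roughly 177% slower than the plain one. Next to 40 tokens of other work the gap shrinks to about 2.4%, which is smaller than the reported error margins.

```java
/** Arithmetic sketch: relative overhead of the atomic increment, bare and amortized. */
final class OverheadSketch {

    /** Relative overhead of a variant over a reference score, as a fraction (0.10 means 10% slower). */
    static double relativeOverhead(double variantScore, double referenceScore) {
        return (variantScore - referenceScore) / referenceScore;
    }

    public static void main(String[] args) {
        // Scores in ns/op, copied from the bare and the amortized result tables.
        double bare = relativeOverhead(9.205, 3.320);
        double amortized = relativeOverhead(87.118, 85.101);
        System.out.printf("bare: %.1f%%, amortized: %.1f%%%n", bare * 100, amortized * 100);
    }
}
```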
Conclusion
• Benchmark to understand the performance model of the system, not to obtain numbers for fights on Stack Overflow
• Benchmarking gives insight into where performance tweaking is needed and where it is not, because the underlying systems can do the optimization for you
• Superficial conclusions drawn without accurate performance measurement can lead to over-engineering
References
• http://hg.openjdk.java.net/code-tools/jmh/
• http://openjdk.java.net/projects/code-tools/jmh/
• http://shipilev.net/
• John L. Hennessy (Stanford University) and David A. Patterson (University of California, Berkeley), Computer Architecture: A Quantitative Approach, 5th edition
• http://openjdk.java.net/projects/jdk8/
Questions?