SlideShare a Scribd company logo
1 of 68
Download to read offline
JAVA
PERFORMANCE
PUZZLERS
DOUGLAS Q. HAWKINS
LEAD JIT DEVELOPER
AZUL SYSTEMS
@dougqh
dougqh@gmail.com
AGENDA
LOOK AT SOME INTERESTING
OFTEN UNINTUITIVE PERFORMANCE CASES
SEE WHAT WE CAN LEARN FROM THEM
Adding 1..1395 Numbers
Adding 1..1396 Numbers
A
B
C Adding 1..1397 Numbers
1
100
10000
0 50 100 150 200 250 300 350 400 450
logns
iterations
Interpreter 1st JIT 2nd JIT Repeat…
Deoptimize
& Repeat
JMH JAVA MEASUREMENT HARNESS:
@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class Benchmark {
@Setup
public void setup() {}
@Benchmark
public void benchmark1() { … }
@Benchmark
public void benchmark2() { … }
…
}
01A
add1395 avgt 10 3.790 ± 0.300 ns/op
add1396 avgt 10 3.784 ± 0.336 ns/op
add1397 avgt 10 2767.567 ± 351.909 ns/op
01A
RESULTS
PERFORMANCE IS
FULL OF SURPRISES.
PERFORMANCE CLIFF:
HUGE METHOD LIMIT
add1395 7993 bytes
add1396 7999 bytes
add1397 8005 bytes
int x = 0;
for ( int i = 1; i <= 1395; ++i ) {
x += i;
}
return x;
A
B
C
int x = 0;
for ( int i = 1; i <= 1396; ++i ) {
x += i;
}
return x;
int x = 0;
for ( int i = 1; i <= 1397; ++i ) {
x += i;
}
return x; 01B
add1395 avgt 10 435.372 ± 19.225 ns/op
add1396 avgt 10 436.955 ± 18.688 ns/op
add1397 avgt 10 434.245 ± 11.635 ns/op
NOW EQUALLY — BAD
01B
System.arraycopy
int[] dest = new int[src.length];
for ( int i = 0; i < src.length; ++i ) {
dest[i] = src[i];
}
array.clone
A
B
C
02A
arrayCopy avgt 10 95598.200 ns/op
cloneArray avgt 10 94848.814 ns/op
manual avgt 10 93309.838 ns/op
02A
RESULTS
100k
int[] original = randomInts(100_000);
int[] copy1 = new int[original.length];
long startTime1 = System.nanoTime();
for ( int i = 0; i < original.length; ++i ) {
copy1[i] = original[i];
}
System.out.printf("copy loop: % 10d ns%n",
System.nanoTime() - startTime1);
int[] copy2 = new int[original.length];
long startTime2 = System.nanoTime();
System.arraycopy(
original, 0,
copy2, 0, original.length);
System.out.printf("arraycopy: % 10d ns%n",
System.nanoTime() - startTime2);
VS.
02B
02B
RESULTS
copy loop: 1646957 ns
arraycopy: 54827 ns
PERFORMANCE IS
FULL OF INTRICACIES.
O(N) != O(N)
Integer.valueOf(x)
new Integer(x)
x
A
B
C
03
public java.lang.Integer valueOf();
Code:
0: aload_0
1: getfield offset:I
4: aload_0
5: getfield nums:[I
8: arraylength
9: if_icmplt 17
12: aload_0
13: iconst_0
14: putfield offset:I
17: aload_0
18: getfield nums:[I
21: aload_0
22: dup
23: getfield offset:I
26: dup_x1
27: iconst_1
28: iadd
29: putfield offset:I
32: iaload
33: invokestatic Integer.valueOf(I)
36: areturn
public java.lang.Integer auto();
Code:
0: aload_0
1: getfield offset:I
4: aload_0
5: getfield nums:[I
8: arraylength
9: if_icmplt 17
12: aload_0
13: iconst_0
14: putfield offset:I
17: aload_0
18: getfield nums:[I
21: aload_0
22: dup
23: getfield offset:I
26: dup_x1
27: iconst_1
28: iadd
29: putfield offset:I
32: iaload
33: invokestatic Integer.valueOf(I)
36: areturn
javap -c
03
public static Integer valueOf(int i) {
if (i >= IntegerCache.low && i <= IntegerCache.high)
return IntegerCache.cache[i + (-IntegerCache.low)];
return new Integer(i);
}
static final int low = -128;
// configurable through
// -Djava.lang.Integer.IntegerCache.high
static final int high = 127;
03
@Param({"100", "1000", "10000"})
private int range;
PARAMERIZING JMH
@Setup
public void setUp() {
ThreadLocalRandom random = ThreadLocalRandom.current();
nums = new int[1_000_000];
for ( int i = 0; i < nums.length; ++i ) {
nums[i] = random.nextInt(-range, range);
}
}
03
RESULTSRESULTS
(range)
auto 100 avgt 10 5.132 ± 0.316 ns/op
auto 1000 avgt 10 8.184 ± 1.551 ns/op
auto 10000 avgt 10 6.996 ± 1.401 ns/op
new_ 100 avgt 10 6.328 ± 0.973 ns/op
new_ 1000 avgt 10 6.083 ± 0.651 ns/op
new_ 10000 avgt 10 6.243 ± 1.031 ns/op
valueOf 100 avgt 10 5.096 ± 0.116 ns/op
valueOf 1000 avgt 10 8.488 ± 1.957 ns/op
valueOf 10000 avgt 10 7.155 ± 1.382 ns/op
03
RESULTS
-100 to 100 -1,000 to 1,000 -10,000 to 10,000
new 6.328 ± 0.973 6.083 ± 0.651 6.243 ± 1.031
autobox 5.132 ± 0.316 8.184 ± 1.551 6.996 ± 1.401
valueOf 5.096 ± 0.116 8.488 ± 1.957 7.155 ± 1.382
EVERYTHING MATTERS:
HARDWARE INCLUDED
0
2000000
4000000
6000000
8000000
100 1000 10000 100000 1000000
ints ArrayList LinkedList
http://cr.openjdk.java.net/~shade/scratch/ArrayVsLinked.java
LINKED LIST
ARRAY LIST
ARRAY
O(N) != O(N)
A
B
C
D
E
list.toArray()
list.toArray(new Object[0])
list.toArray(new Object[list.size()])
list.toArray(new String[0])
list.toArray(new String[list.size()])
https://shipilev.net/blog/2016/arrays-wisdom-ancients/ 04
A
B
C
D
E
list.toArray()
list.toArray(new Object[0])
list.toArray(new Object[list.size()])
list.toArray(new String[0])
list.toArray(new String[list.size()])
SAYS…
https://shipilev.net/blog/2016/arrays-wisdom-ancients/
04
toArray avgt 10 54.084 ± 10.000 ns/op
toArraySized avgt 10 58.555 ± 0.745 ns/op
toArrayUnsized avgt 10 54.025 ± 0.343 ns/op
toStringArraySize avgt 10 154.291 ± 2.060 ns/op
toStringArrayUnsized avgt 10 135.603 ± 2.115 ns/op
RESULTS
https://shipilev.net/blog/2016/arrays-wisdom-ancients/ 04
STRING[] SLOWER THAN OBJECT[]?
Object[] objects = new Integer[20];
objects[0] = “foo”;
https://shipilev.net/blog/2016/arrays-wisdom-ancients/
STRING[] SLOWER THAN OBJECT[]?
Object[] objects = new Integer[20];
objects[0] = “foo”;
Possible Runtime Check
Sometimes JIT Eliminates It
https://shipilev.net/blog/2016/arrays-wisdom-ancients/
TRICKY WHEN ARRAY IS PASSED IN
list.toArray(new String[…]);
OBJECT[] IS COMMONLY USED,
SO SPECIAL CASE.
toArray avgt 10 54.084 ± 10.000 ns/op
toArraySized avgt 10 58.555 ± 0.745 ns/op
toArrayUnsized avgt 10 54.025 ± 0.343 ns/op
toStringArraySize avgt 10 154.291 ± 2.060 ns/op
toStringArrayUnsized avgt 10 135.603 ± 2.115 ns/op
IS WRONG.
https://shipilev.net/blog/2016/arrays-wisdom-ancients/ 04
WHY IS NO ARRAY / UNSIZED FASTER
// allocate
dest = malloc(sizeof(E) * len);
// zero-initialize
for ( int i = 0; i < len; ++i ) {
dest[i] = null;
}
// copy
for ( int i = 0; i < len; ++i ) {
dest[i] = src[i];
}
Dead
Stores
Integer.toString(NUM)
“” + NUM
A
B
C
StringBuilder builder = new StringBuilder();
builder.append(NUM);
builder.toString();
05A
static final int NUM =
ThreadLocalRandom.current().nextInt()
builder avgt 10 47.688 ± 3.866 ns/op
concat avgt 10 33.118 ± 0.840 ns/op
toString avgt 10 41.105 ± 2.005 ns/op
05A
javap -c
public java.lang.String concat();
Code:
0: new StringBuilder
3: dup
4: invokespecial StringBuilder."<init>":()V
7: ldc ""
9: invokevirtual StringBuilder.append(LString;)
12: getstatic NUM:I
15: invokevirtual StringBuilder.append(I)
18: invokevirtual StringBuilder.toString()
21: areturn
05A
public java.lang.String concat() {
return new StringBuilder().
append(“”).
append(NUM).
toString();
}
IN JAVA...
05B
return new StringBuilder().
append(“”).
append(NUM).
toString();
StringBuilder builder =
new StringBuilder();
builder.append(NUM);
VS.
05B
return builder.toString();
05B
RESULTS
builder avgt 10 41.122 ± 1.688 ns/op
concat avgt 10 35.173 ± 3.092 ns/op
concatLikeBuilder avgt 10 32.536 ± 2.302 ns/op
COMPILERS ARE
GLORIFIED REGEX-ES.
static final int NUM = 1 << 20
builder avgt 10 40.734 ± 0.997 ns/op
concat avgt 10 3.343 ± 0.205 ns/op
toString avgt 10 32.877 ± 0.450 ns/op
05C
public java.lang.String concat();
Code:
0: ldc “1048576”
2: areturn
public java.lang.String concat() {
return “1048576”;
}
javap -c
05C
CONSTANT FOLDING & PROPAGATION
static final int NUM = 1 << 20
static final int NUM = 1_048_576
constant
fold
constant
propagate
“” + NUM
“” + 1_048_576
constant
fold
“1048576”
05C
PERFORMANCE
COMPOSES
UNINTUITIVELY.
String str = “”;
for ( int i = 0; i < 100; ++i ) {
str += i;
}
A
B
StringBuilder builder = new StringBuilder();
for ( int i = 0; i < 100; ++i ) {
builder.append(i);
}
06
RESULTS
builder avgt 10 979.112 ± 59.893 ns/op
concat avgt 10 2661.941 ± 135.251 ns/op
05
FAST IN ONE CONTEXT
CAN BE SLOW IN ANOTHER.
RESULTS
speed
objects
allocated*
memory
consumed*
concat 2661.941 ± 135.251 ns/op 400 objects 71800 bytes
builder 979.112 ± 59.893 ns/op 9 objects 1640 bytes
* Memory usage measured separately with Caliper
06
RESULTS
speed
objects
allocated*
memory
consumed*
concat 2661.941 ± 135.251 ns/op 400 objects 71800 bytes
builder 979.112 ± 59.893 ns/op 9 objects 1640 bytes
sizedBuilder 887.717 ± 62.569 ns/op 4 objects 1464 bytes
* Memory usage measured separately with Caliper
06
APPLIES TO COLLECTIONS, TOO.
HashSet 16
HashMap 16
Hashtable 16
LinkedList 1
ArrayList 10
Vector 10
StringBuilder 16
StringBuffer 16
http://www.slideshare.net/cnbailey/memory-efficient-java
O(N)?
obj.invoke with 1 type
obj.invoke with 2 type
A
B
C obj.invoke with 3 types
(monomoprhic)
(bimorphic)
(trimorphic)
(megamorphic)
07A
THERE IS A CLIFF
AT 3 TYPES.
https://shipilev.net/blog/2015/black-magic-method-dispatch/
func.apply(x);
if ( func.getClass() == Square.class ) {
x * x;
} else if ( func.getClass() == Cube.class ) {
x * x * x
} else {
…
}
func.apply(x);
<= 2 types > 2 types
@Setup
public void setup() {
for ( int i = 0; i < 20_000; ++i ) {
if ( morphism >= 1 ) func = new Square();
call();
if ( morphism >=2 ) func = new Cube();
call();
if ( morphism >= 3 ) …
call();
if ( morphism >= 4 ) …
call();
}
// regardless of morphism --
// use Square in the end
func = new Square();
}
@Benchmark
public int call() {
int x = nums[index];
index =
(index + 1) % nums.length;
return func.apply(x);
}
call 1 avgt 10 8.120 ± 0.103 ns/op
call 2 avgt 10 8.225 ± 0.113 ns/op
call 3 avgt 10 8.170 ± 0.329 ns/op
call 4 avgt 10 8.189 ± 0.241 ns/op
RESULTS
07A
(morphism)
NO CLIFF?
@Benchmark
public int callLoop() {
int sum = 0;
for ( int x: nums ) {
sum += call(x);
}
return sum;
}
public int call(int x) {
return func.apply(x);
}
07B
https://shipilev.net/blog/2015/black-magic-method-dispatch/
07B
RESULTS
callLoop 1 avgt 10 4079.187 ± 269.537 ns/op
callLoop 2 avgt 10 6090.224 ± 209.573 ns/op
callLoop 3 avgt 10 20508.673 ± 18484.645 ns/op
callLoop 4 avgt 10 20271.124 ± 17767.914 ns/op
(morphism)
PERFORMANCE
ISN’T ADDITIVE.
FAST + FAST = SLOW
PERFORMANCE IS FULL OF SURPRISES.
UNINTUITIVE
INTRICACIES
SENSITIVITIES
NON-OBVIOUS CLIFFS
NOT ADDITIVE
PERFORMANCE IS FULL OF SURPRISES.
DON’T WORRY *TOO* MUCH.
JUST WRITE CLEAN CODE.
ONLY WORRY ABOUT THE HOTTEST CODE,
IMPROVE AND MEASURE *CAREFULLY*.
REMEMBER BEST WAY TO ADD 1…1396
int x = 0;
x += 1; x += 2; x += 3; x += 4; x += 5; x += 6; x += 7; x += 8; x += 9; x += 10;
x += 11; x += 12; x += 13; x += 14; x += 15; x += 16; x += 17; x += 18; x += 19; x += 20;
x += 21; x += 22; x += 23; x += 24; x += 25; x += 26; x += 27; x += 28; x += 29; x += 30;
x += 31; x += 32; x += 33; x += 34; x += 35; x += 36; x += 37; x += 38; x += 39; x += 40;
x += 41; x += 42; x += 43; x += 44; x += 45; x += 46; x += 47; x += 48; x += 49; x += 50;
x += 51; x += 52; x += 53; x += 54; x += 55; x += 56; x += 57; x += 58; x += 59; x += 60;
x += 61; x += 62; x += 63; x += 64; x += 65; x += 66; x += 67; x += 68; x += 69; x += 70;
x += 71; x += 72; x += 73; x += 74; x += 75; x += 76; x += 77; x += 78; x += 79; x += 80;
x += 81; x += 82; x += 83; x += 84; x += 85; x += 86; x += 87; x += 88; x += 89; x += 90;
x += 91; x += 92; x += 93; x += 94; x += 95; x += 96; x += 97; x += 98; x += 99; x += 100;
x += 101; x += 102; x += 103; x += 104; x += 105; x += 106; x += 107; x += 108; x += 109; x += 110;
x += 111; x += 112; x += 113; x += 114; x += 115; x += 116; x += 117; x += 118; x += 119; x += 120;
x += 121; x += 122; x += 123; x += 124; x += 125; x += 126; x += 127; x += 128; x += 129; x += 130;
x += 131; x += 132; x += 133; x += 134; x += 135; x += 136; x += 137; x += 138; x += 139; x += 140;
x += 141; x += 142; x += 143; x += 144; x += 145; x += 146; x += 147; x += 148; x += 149; x += 150;
x += 151; x += 152; x += 153; x += 154; x += 155; x += 156; x += 157; x += 158; x += 159; x += 160;
x += 161; x += 162; x += 163; x += 164; x += 165; x += 166; x += 167; x += 168; x += 169; x += 170;
x += 171; x += 172; x += 173; x += 174; x += 175; x += 176; x += 177; x += 178; x += 179; x += 180;
x += 181; x += 182; x += 183; x += 184; x += 185; x += 186; x += 187; x += 188; x += 189; x += 190;
x += 191; x += 192; x += 193; x += 194; x += 195; x += 196; x += 197; x += 198; x += 199; x += 200;
x += 201; x += 202; x += 203; x += 204; x += 205; x += 206; x += 207; x += 208; x += 209; x += 210;
int n = 1395;
return n * (n + 1) / 2;A
B
C
int n = 1396;
return n * (n + 1) / 2;
int n = 1397;
return n * (n + 1) / 2;
2.515 ± 0.043 ns/op
2.532 ± 0.089 ns/op
2.580 ± 0.042 ns/op
01C
REFERENCES
ALEKSEY SHIPILËV
http://shipilev.net/
PSYCHOMATIC LOBOTOMY SAW
http://psy-lob-saw.blogspot.com/
MECHANICAL SYMPATHY
http://mechanical-sympathy.blogspot.com/
JAVA SPECIALIST NEWSLETTER
http://www.javaspecialists.eu/

More Related Content

What's hot

.NET Multithreading and File I/O
.NET Multithreading and File I/O.NET Multithreading and File I/O
.NET Multithreading and File I/O
Jussi Pohjolainen
 
Java simple programs
Java simple programsJava simple programs
Java simple programs
VEERA RAGAVAN
 

What's hot (20)

서버 개발자가 바라 본 Functional Reactive Programming with RxJava - SpringCamp2015
서버 개발자가 바라 본 Functional Reactive Programming with RxJava - SpringCamp2015서버 개발자가 바라 본 Functional Reactive Programming with RxJava - SpringCamp2015
서버 개발자가 바라 본 Functional Reactive Programming with RxJava - SpringCamp2015
 
How Data Flow analysis works in a static code analyzer
How Data Flow analysis works in a static code analyzerHow Data Flow analysis works in a static code analyzer
How Data Flow analysis works in a static code analyzer
 
Java practice programs for beginners
Java practice programs for beginnersJava practice programs for beginners
Java practice programs for beginners
 
Final JAVA Practical of BCA SEM-5.
Final JAVA Practical of BCA SEM-5.Final JAVA Practical of BCA SEM-5.
Final JAVA Practical of BCA SEM-5.
 
.NET Multithreading and File I/O
.NET Multithreading and File I/O.NET Multithreading and File I/O
.NET Multithreading and File I/O
 
Beauty and the beast - Haskell on JVM
Beauty and the beast  - Haskell on JVMBeauty and the beast  - Haskell on JVM
Beauty and the beast - Haskell on JVM
 
COSCUP: Introduction to Julia
COSCUP: Introduction to JuliaCOSCUP: Introduction to Julia
COSCUP: Introduction to Julia
 
Java PRACTICAL file
Java PRACTICAL fileJava PRACTICAL file
Java PRACTICAL file
 
Java practical(baca sem v)
Java practical(baca sem v)Java practical(baca sem v)
Java practical(baca sem v)
 
Eta
EtaEta
Eta
 
Dagger & rxjava & retrofit
Dagger & rxjava & retrofitDagger & rxjava & retrofit
Dagger & rxjava & retrofit
 
Virtual machine and javascript engine
Virtual machine and javascript engineVirtual machine and javascript engine
Virtual machine and javascript engine
 
Thread
ThreadThread
Thread
 
Java Generics
Java GenericsJava Generics
Java Generics
 
The Language for future-julia
The Language for future-juliaThe Language for future-julia
The Language for future-julia
 
Java simple programs
Java simple programsJava simple programs
Java simple programs
 
Java Practical File Diploma
Java Practical File DiplomaJava Practical File Diploma
Java Practical File Diploma
 
The Ring programming language version 1.5.3 book - Part 10 of 184
The Ring programming language version 1.5.3 book - Part 10 of 184The Ring programming language version 1.5.3 book - Part 10 of 184
The Ring programming language version 1.5.3 book - Part 10 of 184
 
Aaron Bedra - Effective Software Security Teams
Aaron Bedra - Effective Software Security TeamsAaron Bedra - Effective Software Security Teams
Aaron Bedra - Effective Software Security Teams
 
The Ring programming language version 1.5.4 book - Part 10 of 185
The Ring programming language version 1.5.4 book - Part 10 of 185The Ring programming language version 1.5.4 book - Part 10 of 185
The Ring programming language version 1.5.4 book - Part 10 of 185
 

Similar to Java Performance Puzzlers

Lock? We don't need no stinkin' locks!
Lock? We don't need no stinkin' locks!Lock? We don't need no stinkin' locks!
Lock? We don't need no stinkin' locks!
Michael Barker
 
Refactoring to Macros with Clojure
Refactoring to Macros with ClojureRefactoring to Macros with Clojure
Refactoring to Macros with Clojure
Dmitry Buzdin
 
Ds lab manual by s.k.rath
Ds lab manual by s.k.rathDs lab manual by s.k.rath
Ds lab manual by s.k.rath
SANTOSH RATH
 

Similar to Java Performance Puzzlers (20)

How to add an optimization for C# to RyuJIT
How to add an optimization for C# to RyuJITHow to add an optimization for C# to RyuJIT
How to add an optimization for C# to RyuJIT
 
Anomalies in X-Ray Engine
Anomalies in X-Ray EngineAnomalies in X-Ray Engine
Anomalies in X-Ray Engine
 
Collection frame work
Collection  frame workCollection  frame work
Collection frame work
 
White box-sol
White box-solWhite box-sol
White box-sol
 
Coding in Style
Coding in StyleCoding in Style
Coding in Style
 
What is new in Java 8
What is new in Java 8What is new in Java 8
What is new in Java 8
 
Locks? We Don't Need No Stinkin' Locks - Michael Barker
Locks? We Don't Need No Stinkin' Locks - Michael BarkerLocks? We Don't Need No Stinkin' Locks - Michael Barker
Locks? We Don't Need No Stinkin' Locks - Michael Barker
 
Lock? We don't need no stinkin' locks!
Lock? We don't need no stinkin' locks!Lock? We don't need no stinkin' locks!
Lock? We don't need no stinkin' locks!
 
Java Questions
Java QuestionsJava Questions
Java Questions
 
Counter Wars (JEEConf 2016)
Counter Wars (JEEConf 2016)Counter Wars (JEEConf 2016)
Counter Wars (JEEConf 2016)
 
Java programming lab manual
Java programming lab manualJava programming lab manual
Java programming lab manual
 
LvivPy4 - Threading vs asyncio
LvivPy4 - Threading vs asyncioLvivPy4 - Threading vs asyncio
LvivPy4 - Threading vs asyncio
 
Lambdas puzzler - Peter Lawrey
Lambdas puzzler - Peter LawreyLambdas puzzler - Peter Lawrey
Lambdas puzzler - Peter Lawrey
 
Refactoring to Macros with Clojure
Refactoring to Macros with ClojureRefactoring to Macros with Clojure
Refactoring to Macros with Clojure
 
Ds lab manual by s.k.rath
Ds lab manual by s.k.rathDs lab manual by s.k.rath
Ds lab manual by s.k.rath
 
New SPL Features in PHP 5.3 (TEK-X)
New SPL Features in PHP 5.3 (TEK-X)New SPL Features in PHP 5.3 (TEK-X)
New SPL Features in PHP 5.3 (TEK-X)
 
Ns2: Introduction - Part I
Ns2: Introduction - Part INs2: Introduction - Part I
Ns2: Introduction - Part I
 
Design Patterns - Compiler Case Study - Hands-on Examples
Design Patterns - Compiler Case Study - Hands-on ExamplesDesign Patterns - Compiler Case Study - Hands-on Examples
Design Patterns - Compiler Case Study - Hands-on Examples
 
PVS-Studio team experience: checking various open source projects, or mistake...
PVS-Studio team experience: checking various open source projects, or mistake...PVS-Studio team experience: checking various open source projects, or mistake...
PVS-Studio team experience: checking various open source projects, or mistake...
 
05. Java Loops Methods and Classes
05. Java Loops Methods and Classes05. Java Loops Methods and Classes
05. Java Loops Methods and Classes
 

More from Doug Hawkins

More from Doug Hawkins (8)

ReadyNow: Azul's Unconventional "AOT"
ReadyNow: Azul's Unconventional "AOT"ReadyNow: Azul's Unconventional "AOT"
ReadyNow: Azul's Unconventional "AOT"
 
JVM Mechanics: When Does the JVM JIT & Deoptimize?
JVM Mechanics: When Does the JVM JIT & Deoptimize?JVM Mechanics: When Does the JVM JIT & Deoptimize?
JVM Mechanics: When Does the JVM JIT & Deoptimize?
 
Understanding Garbage Collection
Understanding Garbage CollectionUnderstanding Garbage Collection
Understanding Garbage Collection
 
JVM Internals - NHJUG Jan 2012
JVM Internals - NHJUG Jan 2012JVM Internals - NHJUG Jan 2012
JVM Internals - NHJUG Jan 2012
 
Inside Android's Dalvik VM - NEJUG Nov 2011
Inside Android's Dalvik VM - NEJUG Nov 2011Inside Android's Dalvik VM - NEJUG Nov 2011
Inside Android's Dalvik VM - NEJUG Nov 2011
 
JVM Internals - NEJUG Nov 2010
JVM Internals - NEJUG Nov 2010JVM Internals - NEJUG Nov 2010
JVM Internals - NEJUG Nov 2010
 
Introduction to Class File Format & Byte Code
Introduction to Class File Format & Byte CodeIntroduction to Class File Format & Byte Code
Introduction to Class File Format & Byte Code
 
JVM Internals - Garbage Collection & Runtime Optimizations
JVM Internals - Garbage Collection & Runtime OptimizationsJVM Internals - Garbage Collection & Runtime Optimizations
JVM Internals - Garbage Collection & Runtime Optimizations
 

Recently uploaded

Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 

Recently uploaded (20)

Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 

Java Performance Puzzlers

  • 1. JAVA PERFORMANCE PUZZLERS DOUGLAS Q. HAWKINS LEAD JIT DEVELOPER AZUL SYSTEMS @dougqh dougqh@gmail.com
  • 2. AGENDA LOOK AT SOME INTERESTING OFTEN UNINTUITIVE PERFORMANCE CASES SEE WHAT WE CAN LEARN FROM THEM
  • 3. Adding 1..1395 Numbers Adding 1..1396 Numbers A B C Adding 1..1397 Numbers
  • 4. 1 100 10000 0 50 100 150 200 250 300 350 400 450 logns iterations Interpreter 1st JIT 2nd JIT Repeat… Deoptimize & Repeat
  • 5. JMH JAVA MEASUREMENT HARNESS: @State(Scope.Benchmark) @BenchmarkMode(Mode.AverageTime) @OutputTimeUnit(TimeUnit.NANOSECONDS) public class Benchmark { @Setup public void setup() {} @Benchmark public void benchmark1() { … } @Benchmark public void benchmark2() { … } … } 01A
  • 6. add1395 avgt 10 3.790 ± 0.300 ns/op add1396 avgt 10 3.784 ± 0.336 ns/op add1397 avgt 10 2767.567 ± 351.909 ns/op 01A RESULTS
  • 8. PERFORMANCE CLIFF: HUGE METHOD LIMIT add1395 7993 bytes add1396 7999 bytes add1397 8005 bytes
  • 9. int x = 0; for ( int i = 1; i <= 1395; ++i ) { x += i; } return x; A B C int x = 0; for ( int i = 1; i <= 1396; ++i ) { x += i; } return x; int x = 0; for ( int i = 1; i <= 1397; ++i ) { x += i; } return x; 01B
  • 10. add1395 avgt 10 435.372 ± 19.225 ns/op add1396 avgt 10 436.955 ± 18.688 ns/op add1397 avgt 10 434.245 ± 11.635 ns/op NOW EQUALLY — BAD 01B
  • 11. System.arraycopy int[] dest = new int[src.length]; for ( int i = 0; i < src.length; ++i ) { dest[i] = src[i]; } array.clone A B C 02A
  • 12. arrayCopy avgt 10 95598.200 ns/op cloneArray avgt 10 94848.814 ns/op manual avgt 10 93309.838 ns/op 02A RESULTS 100k
  • 13. int[] original = randomInts(100_000); int[] copy1 = new int[original.length]; long startTime1 = System.nanoTime(); for ( int i = 0; i < original.length; ++i ) { copy1[i] = original[i]; } System.out.printf("copy loop: % 10d ns%n", System.nanoTime() - startTime1); int[] copy2 = new int[original.length]; long startTime2 = System.nanoTime(); System.arraycopy( original, 0, copy2, 0, original.length); System.out.printf("arraycopy: % 10d ns%n", System.nanoTime() - startTime2); VS. 02B
  • 14. 02B RESULTS copy loop: 1646957 ns arraycopy: 54827 ns
  • 15. PERFORMANCE IS FULL OF INTRICACIES.
  • 18. public java.lang.Integer valueOf(); Code: 0: aload_0 1: getfield offset:I 4: aload_0 5: getfield nums:[I 8: arraylength 9: if_icmplt 17 12: aload_0 13: iconst_0 14: putfield offset:I 17: aload_0 18: getfield nums:[I 21: aload_0 22: dup 23: getfield offset:I 26: dup_x1 27: iconst_1 28: iadd 29: putfield offset:I 32: iaload 33: invokestatic Integer.valueOf(I) 36: areturn public java.lang.Integer auto(); Code: 0: aload_0 1: getfield offset:I 4: aload_0 5: getfield nums:[I 8: arraylength 9: if_icmplt 17 12: aload_0 13: iconst_0 14: putfield offset:I 17: aload_0 18: getfield nums:[I 21: aload_0 22: dup 23: getfield offset:I 26: dup_x1 27: iconst_1 28: iadd 29: putfield offset:I 32: iaload 33: invokestatic Integer.valueOf(I) 36: areturn javap -c 03
  • 19. public static Integer valueOf(int i) { if (i >= IntegerCache.low && i <= IntegerCache.high) return IntegerCache.cache[i + (-IntegerCache.low)]; return new Integer(i); } static final int low = -128; // configurable through // -Djava.lang.Integer.IntegerCache.high static final int high = 127;
  • 20. 03 @Param({"100", "1000", "10000"}) private int range; PARAMERIZING JMH @Setup public void setUp() { ThreadLocalRandom random = ThreadLocalRandom.current(); nums = new int[1_000_000]; for ( int i = 0; i < nums.length; ++i ) { nums[i] = random.nextInt(-range, range); } }
  • 21. 03 RESULTSRESULTS (range) auto 100 avgt 10 5.132 ± 0.316 ns/op auto 1000 avgt 10 8.184 ± 1.551 ns/op auto 10000 avgt 10 6.996 ± 1.401 ns/op new_ 100 avgt 10 6.328 ± 0.973 ns/op new_ 1000 avgt 10 6.083 ± 0.651 ns/op new_ 10000 avgt 10 6.243 ± 1.031 ns/op valueOf 100 avgt 10 5.096 ± 0.116 ns/op valueOf 1000 avgt 10 8.488 ± 1.957 ns/op valueOf 10000 avgt 10 7.155 ± 1.382 ns/op
  • 22. 03 RESULTS -100 to 100 -1,000 to 1,000 -10,000 to 10,000 new 6.328 ± 0.973 6.083 ± 0.651 6.243 ± 1.031 autobox 5.132 ± 0.316 8.184 ± 1.551 6.996 ± 1.401 valueOf 5.096 ± 0.116 8.488 ± 1.957 7.155 ± 1.382
  • 24. 0 2000000 4000000 6000000 8000000 100 1000 10000 100000 1000000 ints ArrayList LinkedList http://cr.openjdk.java.net/~shade/scratch/ArrayVsLinked.java
  • 27. ARRAY
  • 29. A B C D E list.toArray() list.toArray(new Object[0]) list.toArray(new Object[list.size()]) list.toArray(new String[0]) list.toArray(new String[list.size()]) https://shipilev.net/blog/2016/arrays-wisdom-ancients/ 04
  • 30. A B C D E list.toArray() list.toArray(new Object[0]) list.toArray(new Object[list.size()]) list.toArray(new String[0]) list.toArray(new String[list.size()]) SAYS… https://shipilev.net/blog/2016/arrays-wisdom-ancients/ 04
  • 31. toArray avgt 10 54.084 ± 10.000 ns/op toArraySized avgt 10 58.555 ± 0.745 ns/op toArrayUnsized avgt 10 54.025 ± 0.343 ns/op toStringArraySize avgt 10 154.291 ± 2.060 ns/op toStringArrayUnsized avgt 10 135.603 ± 2.115 ns/op RESULTS https://shipilev.net/blog/2016/arrays-wisdom-ancients/ 04
  • 32. STRING[] SLOWER THAN OBJECT[]? Object[] objects = new Integer[20]; objects[0] = “foo”; https://shipilev.net/blog/2016/arrays-wisdom-ancients/
  • 33. STRING[] SLOWER THAN OBJECT[]? Object[] objects = new Integer[20]; objects[0] = “foo”; Possible Runtime Check Sometimes JIT Eliminates It https://shipilev.net/blog/2016/arrays-wisdom-ancients/
  • 34. TRICKY WHEN ARRAY IS PASSED IN list.toArray(new String[…]); OBJECT[] IS COMMONLY USED, SO SPECIAL CASE.
  • 35. toArray avgt 10 54.084 ± 10.000 ns/op toArraySized avgt 10 58.555 ± 0.745 ns/op toArrayUnsized avgt 10 54.025 ± 0.343 ns/op toStringArraySize avgt 10 154.291 ± 2.060 ns/op toStringArrayUnsized avgt 10 135.603 ± 2.115 ns/op IS WRONG. https://shipilev.net/blog/2016/arrays-wisdom-ancients/ 04
  • 36. WHY IS NO ARRAY / UNSIZED FASTER // allocate dest = malloc(sizeof(E) * len); // zero-initialize for ( int i = 0; i < len; ++i ) { dest[i] = null; } // copy for ( int i = 0; i < len; ++i ) { dest[i] = src[i]; } Dead Stores
  • 37. Integer.toString(NUM) “” + NUM A B C StringBuilder builder = new StringBuilder(); builder.append(NUM); builder.toString(); 05A
  • 38. static final int NUM = ThreadLocalRandom.current().nextInt() builder avgt 10 47.688 ± 3.866 ns/op concat avgt 10 33.118 ± 0.840 ns/op toString avgt 10 41.105 ± 2.005 ns/op 05A
  • 39. javap -c public java.lang.String concat(); Code: 0: new StringBuilder 3: dup 4: invokespecial StringBuilder."<init>":()V 7: ldc "" 9: invokevirtual StringBuilder.append(LString;) 12: getstatic NUM:I 15: invokevirtual StringBuilder.append(I) 18: invokevirtual StringBuilder.toString() 21: areturn 05A
  • 40. public java.lang.String concat() { return new StringBuilder(). append(“”). append(NUM). toString(); } IN JAVA... 05B
  • 41. return new StringBuilder(). append(“”). append(NUM). toString(); StringBuilder builder = new StringBuilder(); builder.append(NUM); VS. 05B return builder.toString();
  • 42. 05B RESULTS builder avgt 10 41.122 ± 1.688 ns/op concat avgt 10 35.173 ± 3.092 ns/op concatLikeBuilder avgt 10 32.536 ± 2.302 ns/op
  • 44. static final int NUM = 1 << 20 builder avgt 10 40.734 ± 0.997 ns/op concat avgt 10 3.343 ± 0.205 ns/op toString avgt 10 32.877 ± 0.450 ns/op 05C
  • 45. public java.lang.String concat(); Code: 0: ldc “1048576” 2: areturn public java.lang.String concat() { return “1048576”; } javap -c 05C
  • 46. CONSTANT FOLDING & PROPAGATION static final int NUM = 1 << 20 static final int NUM = 1_048_576 constant fold constant propagate “” + NUM “” + 1_048_576 constant fold “1048576” 05C
  • 48. String str = “”; for ( int i = 0; i < 100; ++i ) { str += i; } A B StringBuilder builder = new StringBuilder(); for ( int i = 0; i < 100; ++i ) { builder.append(i); } 06
  • 49. RESULTS builder avgt 10 979.112 ± 59.893 ns/op concat avgt 10 2661.941 ± 135.251 ns/op 05
  • 50. FAST IN ONE CONTEXT CAN BE SLOW IN ANOTHER.
  • 51. RESULTS speed objects allocated* memory consumed* concat 2661.941 ± 135.251 ns/op 400 objects 71800 bytes builder 979.112 ± 59.893 ns/op 9 objects 1640 bytes * Memory usage measured separately with Caliper 06
  • 52. RESULTS speed objects allocated* memory consumed* concat 2661.941 ± 135.251 ns/op 400 objects 71800 bytes builder 979.112 ± 59.893 ns/op 9 objects 1640 bytes sizedBuilder 887.717 ± 62.569 ns/op 4 objects 1464 bytes * Memory usage measured separately with Caliper 06
  • 53. APPLIES TO COLLECTIONS, TOO. HashSet 16 HashMap 16 Hashtable 16 LinkedList 1 ArrayList 10 Vector 10 StringBuilder 16 StringBuffer 16 http://www.slideshare.net/cnbailey/memory-efficient-java
  • 54. O(N)?
  • 55. obj.invoke with 1 type obj.invoke with 2 type A B C obj.invoke with 3 types (monomoprhic) (bimorphic) (trimorphic) (megamorphic) 07A
  • 56. THERE IS A CLIFF AT 3 TYPES.
  • 57. https://shipilev.net/blog/2015/black-magic-method-dispatch/ func.apply(x); if ( func.getClass() == Square.class ) { x * x; } else if ( func.getClass() == Cube.class ) { x * x * x } else { … } func.apply(x); <= 2 types > 2 types
  • 58. @Setup public void setup() { for ( int i = 0; i < 20_000; ++i ) { if ( morphism >= 1 ) func = new Square(); call(); if ( morphism >=2 ) func = new Cube(); call(); if ( morphism >= 3 ) … call(); if ( morphism >= 4 ) … call(); } // regardless of morphism -- // use Square in the end func = new Square(); } @Benchmark public int call() { int x = nums[index]; index = (index + 1) % nums.length; return func.apply(x); }
  • 59. call 1 avgt 10 8.120 ± 0.103 ns/op call 2 avgt 10 8.225 ± 0.113 ns/op call 3 avgt 10 8.170 ± 0.329 ns/op call 4 avgt 10 8.189 ± 0.241 ns/op RESULTS 07A (morphism)
  • 61. @Benchmark public int callLoop() { int sum = 0; for ( int x: nums ) { sum += call(x); } return sum; } public int call(int x) { return func.apply(x); } 07B https://shipilev.net/blog/2015/black-magic-method-dispatch/
  • 62. 07B RESULTS callLoop 1 avgt 10 4079.187 ± 269.537 ns/op callLoop 2 avgt 10 6090.224 ± 209.573 ns/op callLoop 3 avgt 10 20508.673 ± 18484.645 ns/op callLoop 4 avgt 10 20271.124 ± 17767.914 ns/op (morphism)
  • 64. PERFORMANCE IS FULL OF SURPRISES. UNINTUITIVE INTRICACIES SENSITIVITIES NON-OBVIOUS CLIFFS NOT ADDITIVE
  • 65. PERFORMANCE IS FULL OF SURPRISES. DON’T WORRY *TOO* MUCH. JUST WRITE CLEAN CODE. ONLY WORRY ABOUT THE HOTTEST CODE, IMPROVE AND MEASURE *CAREFULLY*.
  • 66. REMEMBER BEST WAY TO ADD 1…1396 int x = 0; x += 1; x += 2; x += 3; x += 4; x += 5; x += 6; x += 7; x += 8; x += 9; x += 10; x += 11; x += 12; x += 13; x += 14; x += 15; x += 16; x += 17; x += 18; x += 19; x += 20; x += 21; x += 22; x += 23; x += 24; x += 25; x += 26; x += 27; x += 28; x += 29; x += 30; x += 31; x += 32; x += 33; x += 34; x += 35; x += 36; x += 37; x += 38; x += 39; x += 40; x += 41; x += 42; x += 43; x += 44; x += 45; x += 46; x += 47; x += 48; x += 49; x += 50; x += 51; x += 52; x += 53; x += 54; x += 55; x += 56; x += 57; x += 58; x += 59; x += 60; x += 61; x += 62; x += 63; x += 64; x += 65; x += 66; x += 67; x += 68; x += 69; x += 70; x += 71; x += 72; x += 73; x += 74; x += 75; x += 76; x += 77; x += 78; x += 79; x += 80; x += 81; x += 82; x += 83; x += 84; x += 85; x += 86; x += 87; x += 88; x += 89; x += 90; x += 91; x += 92; x += 93; x += 94; x += 95; x += 96; x += 97; x += 98; x += 99; x += 100; x += 101; x += 102; x += 103; x += 104; x += 105; x += 106; x += 107; x += 108; x += 109; x += 110; x += 111; x += 112; x += 113; x += 114; x += 115; x += 116; x += 117; x += 118; x += 119; x += 120; x += 121; x += 122; x += 123; x += 124; x += 125; x += 126; x += 127; x += 128; x += 129; x += 130; x += 131; x += 132; x += 133; x += 134; x += 135; x += 136; x += 137; x += 138; x += 139; x += 140; x += 141; x += 142; x += 143; x += 144; x += 145; x += 146; x += 147; x += 148; x += 149; x += 150; x += 151; x += 152; x += 153; x += 154; x += 155; x += 156; x += 157; x += 158; x += 159; x += 160; x += 161; x += 162; x += 163; x += 164; x += 165; x += 166; x += 167; x += 168; x += 169; x += 170; x += 171; x += 172; x += 173; x += 174; x += 175; x += 176; x += 177; x += 178; x += 179; x += 180; x += 181; x += 182; x += 183; x += 184; x += 185; x += 186; x += 187; x += 188; x += 189; x += 190; x += 191; x += 192; x += 193; x += 194; x += 195; x += 196; x += 197; x += 198; x += 199; x += 200; x += 201; x += 202; x += 203; x += 204; x += 205; x += 206; x += 207; x += 208; x += 209; x += 210;
  • 67. int n = 1395; return n * (n + 1) / 2;A B C int n = 1396; return n * (n + 1) / 2; int n = 1397; return n * (n + 1) / 2; 2.515 ± 0.043 ns/op 2.532 ± 0.089 ns/op 2.580 ± 0.042 ns/op 01C
  • 68. REFERENCES ALEKSEY SHIPILËV http://shipilev.net/ PSYCHOMATIC LOBOTOMY SAW http://psy-lob-saw.blogspot.com/ MECHANICAL SYMPATHY http://mechanical-sympathy.blogspot.com/ JAVA SPECIALIST NEWSLETTER http://www.javaspecialists.eu/