JVM Mechanics
A peek under the hood

Gil Tene, CTO & co-Founder, Azul Systems
©2012 Azul Systems, Inc.
About me: Gil Tene
co-founder, CTO @Azul Systems
Working on JVMs since 2001, managed runtimes since 1989
Created Pauseless & C4 core GC algorithms (Tene, Wolf)

A long history building Virtual & Physical Machines, Operating Systems, Enterprise apps, etc...
©2012 Azul Systems, Inc.
* working on real-world trash compaction issues, circa 2004
About Azul
We make scalable Virtual Machines
Have built "whatever it takes to get the job done" since 2002
3 generations of custom SMP multi-core HW (Vega)
Now pure software for commodity x86 (Zing)

"Industry firsts" in Garbage Collection, elastic memory, Java virtualization, memory scale
©2012 Azul Systems, Inc.
High level agenda
Compiler stuff
Adaptive behavior stuff
Ordering stuff
Garbage Collection stuff
Some chest beating
Open discussion
©2012 Azul Systems, Inc.
Compiler Stuff
The JIT compiler transforms code
The code actually executed can be very different from the code you write
Some simple compiler tricks
Code can be reordered...
int doMath(int x, int y, int z) {
int a = x + y;
int b = x - y;
int c = z + x;
return a + b;
}

Can be reordered to:
int doMath(int x, int y, int z) {
int c = z + x;
int b = x - y;
int a = x + y;
return a + b;
}
©2012 Azul Systems, Inc.
Dead code can be removed
int doMath(int x, int y, int z) {
int a = x + y;
int b = x - y;
int c = z + x;
return a + b;
}

Can be reduced to:
int doMath(int x, int y, int z) {
int a = x + y;
int b = x - y;
return a + b;
}
©2012 Azul Systems, Inc.
Values can be propagated
int doMath(int x, int y, int z) {
int a = x + y;
int b = x - y;
int c = z + x;
return a + b;
}

Can be reduced to:
int doMath(int x, int y, int z) {
return x + y + x - y;
}

©2012 Azul Systems, Inc.
Math can be simplified
int doMath(int x, int y, int z) {
int a = x + y;
int b = x - y;
int c = z + x;
return a + b;
}

Can be reduced to:
int doMath(int x, int y, int z) {
return x + x;
}

©2012 Azul Systems, Inc.
So why does this matter?
Keep your code "readable"
largestValueLog = Math.log(largestValueWithSingleUnitResolution);

magnitude = (int) Math.ceil(largestValueLog/Math.log(2.0));
subBucketMagnitude = (magnitude > 1) ? magnitude : 1;
subBucketCount = (int) Math.pow(2, subBucketMagnitude);
subBucketMask = subBucketCount - 1;

Hard enough to follow as it is
No value in "optimizing" human-readable meaning away
Compiled code will end up the same anyway
©2012 Azul Systems, Inc.
Some more compiler tricks
Reads can be cached
int distanceRatio(Object a) {
int distanceTo = a.getX() - start;
int distanceAfter = end - a.getX();
return distanceTo/distanceAfter;
}

Is the same as
int distanceRatio(Object a) {
int x = a.getX();
int distanceTo = x - start;
int distanceAfter = end - x;
return distanceTo/distanceAfter;
}
©2012 Azul Systems, Inc.
Reads can be cached
void loopUntilFlagSet(Object a) {
while (!a.flagIsSet()) {
loopcount++;
}
}

Is the same as:
void loopUntilFlagSet(Object a) {
boolean flagIsSet = a.flagIsSet();
while (!flagIsSet) {
loopcount++;
}
}
©2012 Azul Systems, Inc.

That's what volatile is for...
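For contrast, a minimal sketch (class and field names here are illustrative, not from the deck) of the volatile version: the JIT is not allowed to hoist a volatile read out of the loop, so a flag set by another thread is eventually observed.

// Illustrative sketch: a volatile flag cannot be cached into a local by the JIT.
class FlagHolder {
    private volatile boolean flag;   // volatile: every iteration re-reads it
    private long loopcount;

    void set() { flag = true; }      // called from another thread

    void loopUntilFlagSet() {
        while (!flag) {              // re-read on each pass; loop terminates after set()
            loopcount++;
        }
    }
}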
Writes can be eliminated
Intermediate values might never be visible
void updateDistance(Object a) {
int distance = 100;
a.setX(distance);
a.setX(distance * 2);
a.setX(distance * 3);
}

Is the same as
void updateDistance(Object a) {
a.setX(300);
}
©2012 Azul Systems, Inc.
Writes can be eliminated
Intermediate values might never be visible
void updateDistance(Object a) {
a.setVisibleValue(0);
for (int i = 0; i < 1000000; i++) {
a.setInternalValue(i);
}
a.setVisibleValue(a.getInternalValue());
}

Is the same as
void updateDistance(Object a) {
a.setInternalValue(1000000);
a.setVisibleValue(1000000);
}
©2012 Azul Systems, Inc.
Inlining...
public class Thing {
private int x;
public final int getX() { return x; }
}
...
myX = thing.getX();

Is the same as
class Thing {
int x;
}
...
myX = thing.x;
©2012 Azul Systems, Inc.
Things JIT compilers can do
...and static compilers can have a hard time with
Class Hierarchy Analysis (CHA)
Can perform global analysis on currently loaded code
Deduce stuff about inheritance, method overrides, etc.

Can make optimization decisions based on assumptions
Re-evaluate assumptions when loading new classes
Throw away code that conflicts with assumptions before class loading makes them invalid

©2012 Azul Systems, Inc.
Inlining works without "final"
public class Thing {
private int x;
public int getX() { return x; }
}
...
myX = thing.getX();

Is the same as
class Thing {
int x;
}
...
myX = thing.x;

As long as there is only one implementer of getX()
©2012 Azul Systems, Inc.
Speculative stuff
The power of the "uncommon trap"
Being able to throw away wrong code is very useful
Speculatively assuming callee type
polymorphic can be "monomorphic" or "megamorphic"
Can make virtual calls static even without CHA
Can speculatively inline things without CHA
Speculatively assuming branch behavior
We've only ever seen this thing go one way, so....
©2012 Azul Systems, Inc.
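A conceptual sketch of what a speculatively devirtualized call site behaves like. This is an illustration of the idea, not HotSpot's actual generated code; the type names are made up, and the guard-failure path stands in for the uncommon trap.

// Sketch: the JIT has only ever seen one implementer of getX(), so it inlines
// the field read behind a cheap class check. Failing the check is the
// "uncommon trap": a real JVM deoptimizes and recompiles; here a plain
// virtual call stands in for that fallback.
interface Thing { int getX(); }

final class OnlySeenThing implements Thing {
    int x;
    public int getX() { return x; }
}

class CallSite {
    static int readX(Thing t) {
        if (t.getClass() == OnlySeenThing.class) {
            return ((OnlySeenThing) t).x;   // fast path: guarded, inlined field read
        }
        return t.getX();                    // slow path stands in for the uncommon trap
    }
}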
Adaptive compilation makes cleaner code practical
Reduces the need to trade off clean design against speed

E.g. "final" should be used on methods only when you want to prohibit extension or overriding. It has no effect on speed.
E.g. branching can be written "naturally"

©2012 Azul Systems, Inc.
Interesting side effects
Adaptive compilation is... adaptive
Measuring actual behavior is harder
Micro-benchmarking is an art
JITs are a moving target
"Warmup" techniques can often fail

©2012 Azul Systems, Inc.
Warmup problems
Common Example:
Trading system wants to have the first trade be fast
So run 20,000 "fake" messages through the system to warm up
Let JIT compilers optimize code, and deopt, before actual trades

What really happens
Code is written to do different things "if this is a fake message"
e.g. "Don't send to the exchange if this is a fake message"
JITs optimize for the fake path, including speculatively assuming "fake"
First real message through deopts...
©2012 Azul Systems, Inc.
Warmup tips...
(to get the "first real thing" to be fast)
System should not distinguish between "real" and "fake"
Make that an external concern
Avoid all conditional code based on "fake"
Avoid all class-specific calls based on "fake" ("object oriented ifs")

Use "real" input sources and output targets instead
Have the system output to a sink target that knows it is "fake"
Make output decisions based on data, not code
E.g. an array of sink targets, with the array index computed from the message payload (see the sketch below)

©2012 Azul Systems, Inc.
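A minimal sketch of the "decide by data, not code" tip above. The class and method names are illustrative, not from the deck: real and warmup traffic share one code path, and the sink is picked by an index carried in the message, so the JIT never sees a "fake?" branch to speculate on.

// Sketch: output decisions are made from data in the payload, not from code.
interface Sink { void send(byte[] message); }

final class ExchangeSink implements Sink {
    public void send(byte[] message) { /* write to the real exchange */ }
}

final class WarmupSink implements Sink {
    public void send(byte[] message) { /* swallow it; this sink knows it is "fake" */ }
}

final class Router {
    private final Sink[] sinks = { new ExchangeSink(), new WarmupSink() };

    void route(byte[] message) {
        int sinkIndex = message[0];     // index comes from the payload, not from an if()
        sinks[sinkIndex].send(message); // same code path for real and warmup traffic
    }
}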
Ordering stuff
Ordering of operations
Within a thread, it's trivial: "happens before"
Across threads?
News flash: CPU memory ordering doesn't matter
There is a much bigger culprit at play: Compilers
The code will be reordered before your CPU ever sees it
The only ordering rules that matter are the JMM ones.

Intuitive read:
"Happens before holds within threads"
Things can move into, but not out of, synchronized blocks
Volatile stuff is a bit more tricky...
©2012 Azul Systems, Inc.
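A small illustrative example (not from the deck) of that intuitive reading: synchronizing on the same lock is what creates the cross-thread happens-before edge, so the reader can never observe ready == true together with a stale data value, no matter how the compiler reorders within each thread.

// Sketch: safe publication under a lock; the JMM, not CPU ordering, is what
// guarantees the reader sees both writes together.
class Publisher {
    private final Object lock = new Object();
    private int data;
    private boolean ready;

    void publish(int value) {
        synchronized (lock) {
            data = value;    // ordinary writes...
            ready = true;    // ...become visible together when the lock is released
        }
    }

    Integer tryRead() {
        synchronized (lock) {
            return ready ? data : null;   // acquiring the same lock sees both writes
        }
    }
}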
Ordering of operations

Source: http://g.oswego.edu/dl/jmm/cookbook.html
©2012 Azul Systems, Inc.
Garbage Collection Stuff
Most of what people seem to "know" about Garbage Collection is wrong
In many cases, it's much better than you may think
GC is extremely efficient. Much more so than malloc()
Dead objects cost nothing to collect
GC will find all the dead objects (including cyclic graphs)
...

In many cases, it's much worse than you may think
Yes, it really does stop for ~1 sec per live GB (in most JVMs).
No, GC does not mean you can't have memory leaks
No, those pauses you eliminated from your 20 minute test are not gone
...
©2012 Azul Systems, Inc.
Generational Collection
Weak Generational Hypothesis: "most objects die young"
Focus collection efforts on the young generation:
Use a moving collector: work is linear to the live set
The live set in the young generation is a small % of the space
Promote objects that live long enough to older generations

Only collect older generations as they fill up
"Generational filter" reduces the rate of allocation into older generations

Tends to be (an order of magnitude) more efficient
Great way to keep up with high allocation rates
Practical necessity for keeping up with processor throughput
©2012 Azul Systems, Inc.
GC Efficiency:
Empty memory vs. CPU

[Chart: GC CPU % vs. heap size for a fixed live set; GC CPU % rises toward 100% as the heap size approaches the live set]
©2012 Azul Systems, Inc.
Two Intuitive limits

If we had exactly 1 byte of empty memory at all times, the collector would have to work "very hard", and GC would take 100% of the CPU time
If we had infinite empty memory, we would never have to collect, and GC would take 0% of the CPU time
GC CPU % will follow a rough 1/x curve between these two limit points, dropping as the amount of memory increases.

©2012 Azul Systems, Inc.
Empty memory needs
(empty memory == CPU power)
The amount of empty memory in the heap is the dominant factor controlling the amount of GC work
For both Copy and Mark/Compact collectors, the amount of work per cycle is linear to the live set
The amount of memory recovered per cycle is equal to the amount of unused memory: (heap size) - (live set)
The collector has to perform a GC cycle when the empty memory runs out
A Copy or Mark/Compact collector's efficiency doubles with every doubling of the empty memory
©2012 Azul Systems, Inc.
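To make the arithmetic concrete, a back-of-the-envelope sketch (an assumed model following the slide, not a JVM measurement): if work per cycle is proportional to the live set and each cycle recovers (heap size) - (live set), then GC work per GB allocated scales as live / (heap - live), so each doubling of empty memory halves it.

// Sketch: relative GC work per GB allocated under the model above.
public class GcEfficiency {
    static double workPerGbAllocated(double heapGb, double liveGb) {
        double emptyGb = heapGb - liveGb;   // memory recovered per cycle
        return liveGb / emptyGb;            // relative collector work per GB allocated
    }

    public static void main(String[] args) {
        double live = 2.0;                  // 2 GB live set
        for (double heap : new double[] {4, 6, 10, 18}) {
            System.out.printf("heap %4.0f GB -> relative GC work per GB allocated: %.3f%n",
                    heap, workPerGbAllocated(heap, live));
        }
        // Prints 1.000, 0.500, 0.250, 0.125: each doubling of empty memory halves GC work.
    }
}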
What empty memory controls
Empty memory controls efficiency (amount of collector work needed per amount of application work performed)
Empty memory controls the frequency of pauses (if the collector performs any stop-the-world operations)
Empty memory DOES NOT control pause times (only their frequency)
In Mark/Sweep/Compact collectors that pause for sweeping, more empty memory means less frequent but LARGER pauses
©2012 Azul Systems, Inc.
GC and latency:
That pesky stop-the-world thing

©2012 Azul Systems, Inc.
Delaying the inevitable
Some form of copying/compaction is inevitable in practice
And compacting anything requires scanning/fixing all references to it

Delay tactics focus on getting "easy empty space" first
This is the focus for the vast majority of GC tuning

Most objects die young [Generational]
So collect young objects only, as much as possible. Hope for short STW.
But eventually, some old dead objects must be reclaimed

Most old dead space can be reclaimed without moving it
[e.g. CMS] track dead space in lists, and reuse it in place
But eventually, space gets fragmented, and needs to be moved

Much of the heap is not "popular" [e.g. G1, "Balanced"]
A non-popular region will only be pointed to from a small % of the heap
So compact non-popular regions in short stop-the-world pauses
But eventually, popular objects and regions need to be compacted
©2012 Azul Systems, Inc.
Memory use
How many of you use heap sizes of:
more than ½ GB?
more than 1 GB?
more than 2 GB?
more than 4 GB?
more than 10 GB?
more than 20 GB?
more than 50 GB?

©2012 Azul Systems, Inc.
Reality check: servers in 2012
Retail prices, major web server store (US $, Oct 2012)
24 vCore, 128GB server ≈ $5K
24 vCore, 256GB server ≈ $8K
32 vCore, 384GB server ≈ $14K
48 vCore, 512GB server ≈ $19K
64 vCore, 1TB server ≈ $36K

Cheap (< $1/GB/Month), and roughly linear to ~1TB
10s to 100s of GB/sec of memory bandwidth
The Application Memory Wall
A simple observation:
Application instances appear to be unable to make effective use of modern server memory capacities

The size of application instances as a % of a server's capacity is rapidly dropping

©2012 Azul Systems, Inc.
How much memory do applications need?
"640KB ought to be enough for anybody"
WRONG!
So what's the right number?
6,400K?
64,000K?
640,000K?
6,400,000K?
64,000,000K?
There is no right number
Target moves at 50x-100x per decade
©2012 Azul Systems, Inc.

"I've said some stupid things and some wrong things, but not that. No one involved in computers would ever say that a certain amount of memory is enough for all time ..." - Bill Gates, 1996
ā€œTinyā€ application history
Assuming Mooreā€™s Law means:
2010

ā€œtransistor counts grow at ā‰ˆ2x
every ā‰ˆ18 monthsā€
It also means memory size grows
ā‰ˆ100x every 10 years

1990

1980

??? GB apps on 256 GB
Application
Memory Wall

2000

1GB apps on a 2 ā€“ 4 GB server

10MB apps on a 32 ā€“ 64 MB server

100KB apps on a Ā¼ to Ā½ MB Server
ā€œTinyā€: would be ā€œsillyā€ to distribute

Ā©2012 Azul Systems, Inc.
What is causing the Application Memory Wall?
Garbage Collection is a clear and dominant cause
There seem to be practical heap size limits for applications with responsiveness requirements
[Virtually] All current commercial JVMs will exhibit a multi-second pause on a normally utilized 2-6GB heap.
It's a question of "When" and "How often", not "If".
GC tuning only moves the "when" and the "how often" around

Root cause: The link between scale and responsiveness
©2012 Azul Systems, Inc.
The problems that need solving
(areas where the state of the art needs improvement)

Robust Concurrent Marking
In the presence of high mutation and allocation rates
Cover modern runtime semantics (e.g. weak refs, lock deflation)

Compaction that is not monolithic-stop-the-world
E.g. stay responsive while compacting ¼ TB heaps
Must be robust: not just a tactic to delay STW compaction
[current "incremental STW" attempts fall short on robustness]

Young-gen that is not monolithic-stop-the-world
Stay responsive while promoting multi-GB data spikes
Concurrent or "incremental STW" may both be ok
Surprisingly little work done in this specific area
©2012 Azul Systems, Inc.
jHiccup

©2012 Azul Systems, Inc.
Incontinuities in Java platform execution

[Charts: "Hiccups by Time Interval" and "Hiccups by Percentile Distribution"; hiccup duration (msec) vs. elapsed time (sec) and vs. percentile, Max=1665.024 msec]

©2012 Azul Systems, Inc.
jHiccup
A tool for capturing and displaying platform hiccups
Records any observed non-continuity of the underlying platform
Plots results in a simple, consistent format

Simple, non-intrusive
As simple as adding the word "jHiccup" to your java launch line
% jHiccup java myflags myApp
(Or use as a java agent)
Adds a background thread that samples time @ 1000/sec

Open Source
Released to the public domain, Creative Commons CC0
©2012 Azul Systems, Inc.
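A simplified sketch of the sampling idea described above. This is an assumption about the mechanism, not the actual jHiccup source (which records full histograms rather than just a maximum): a background thread repeatedly tries to sleep ~1 msec; any extra time beyond the requested sleep is a "hiccup" the platform imposed on an otherwise idle thread.

// Sketch: measure platform hiccups by timing short sleeps.
public class HiccupSampler implements Runnable {
    private volatile long maxHiccupNanos;

    @Override
    public void run() {
        final long expectedIntervalNanos = 1_000_000L;   // ~1000 samples/sec
        while (!Thread.currentThread().isInterrupted()) {
            long start = System.nanoTime();
            try {
                Thread.sleep(1);
            } catch (InterruptedException e) {
                return;
            }
            long hiccup = (System.nanoTime() - start) - expectedIntervalNanos;
            if (hiccup > maxHiccupNanos) {
                maxHiccupNanos = hiccup;                 // jHiccup records a full histogram instead
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        HiccupSampler sampler = new HiccupSampler();
        Thread t = new Thread(sampler, "hiccup-sampler");
        t.setDaemon(true);
        t.start();
        Thread.sleep(10_000);                            // observe for 10 seconds
        System.out.printf("max hiccup: %.3f msec%n", sampler.maxHiccupNanos / 1_000_000.0);
    }
}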
Telco App Example

[Charts: "Hiccups by Time Interval" (max time per interval) and "Hiccups by Percentile Distribution" (hiccup duration at percentile levels), with optional SLA plotting; hiccup durations roughly 0-140 msec]

©2012 Azul Systems, Inc.
Examples

©2012 Azul Systems, Inc.
Idle App on Quiet System vs. Idle App on Busy System

[Charts: "Hiccups by Time Interval" and "Hiccups by Percentile Distribution" for each case; quiet system Max=22.336 msec, busy system Max=49.728 msec]

©2012 Azul Systems, Inc.
Idle App on Quiet System vs. Idle App on Dedicated System

[Charts: "Hiccups by Time Interval" and "Hiccups by Percentile Distribution" for each case; quiet system Max=22.336 msec, dedicated system Max=0.411 msec]

©2012 Azul Systems, Inc.
EHCache: 1GB data set under load

[Charts: "Hiccups by Time Interval" and "Hiccups by Percentile Distribution"; Max=3448.832 msec]

©2012 Azul Systems, Inc.
Fun with jHiccup

©2012 Azul Systems, Inc.
Oracle HotSpot CMS, 1GB in an 8GB heap vs. Zing 5, 1GB in an 8GB heap

[Charts: "Hiccups by Time Interval" and "Hiccups by Percentile Distribution" for each JVM; HotSpot CMS Max=13156.352 msec, Zing Max=20.384 msec]

©2012 Azul Systems, Inc.
Oracle HotSpot CMS, 1GB in an 8GB heap vs. Zing 5, 1GB in an 8GB heap (drawn to scale)

[Charts: the same data plotted on identical axes; HotSpot CMS Max=13156.352 msec, Zing Max=20.384 msec]

©2012 Azul Systems, Inc.
What you can expect (from Zing) in the low latency world
Assuming individual transaction work is "short" (on the order of 1 msec), and assuming you don't have 100s of runnable threads competing for 10 cores...
"Easily" get your application to < 10 msec worst case
With some tuning, 2-3 msec worst case
Can go to below 1 msec worst case...
May require heavy tuning/tweaking
Mileage WILL vary
©2012 Azul Systems, Inc.
Oracle HotSpot (pure newgen) vs. Zing: Low latency trading application

[Charts: "Hiccups by Time Interval" and "Hiccups by Percentile Distribution" for each JVM; HotSpot Max=22.656 msec, Zing Max=1.568 msec]

©2012 Azul Systems, Inc.
Oracle HotSpot (pure newgen) vs. Zing: Low latency, drawn to scale

[Charts: the same data plotted on identical axes; HotSpot Max=22.656 msec, Zing Max=1.568 msec]

©2012 Azul Systems, Inc.
Open Discussion
http://www.azulsystems.com
http://www.jhiccup.com

©2012 Azul Systems, Inc.
