Angelika Langer
Trainer/Consultant
http://www.AngelikaLanger.com/
Java Performance
Garbage Collection
Pauses
© Copyright 1995-2015 by Angelika Langer & Klaus Kreft. All Rights Reserved.
http://www.AngelikaLanger.com/
last update: 18/03/2014 08:06
objective
• what causes long GC pauses?
• what does GC do during a STW pause?
• how can I reduce the pause time?
• explore HotSpot JVM's GC algorithms
• point out reasons for long pauses
• discuss tuning options
• glance at alternative GC in other JVMs
speaker's relationship to topic
• independent trainer / consultant / author
– teaching Java for ~20 years
– curriculum of some challenging seminars
– JCP observer and Java champion since 2005
– co-author of "Effective Java" column
– author of Java Generics FAQ and Lambda Tutorial & Reference
garbage collection
• purpose
– make memory occupied by unreachable objects available
for subsequent memory allocation
– in order to allow for 24/7 service with finite memory resources
• involves several activities
– garbage detection
– garbage elimination
cost of garbage collection
• stop-the-world (STW) phases stop all application threads
– problem for applications with time constraints
e.g. user interaction, SLA applications, ...
• concurrent GC phases steal CPU cycles from application
– problem for applications with performance / throughput constraints
e.g. transactions per second, ...
trade-off
memory footprint
throughput
pause time
what causes long pause times?
• what does GC do during a STW pause?
• how can I reduce the pause time?
• agenda for this talk
– explore the HotSpot JVM's GC algorithms
– point out obvious (and not so obvious) reasons for long pauses
– discuss tuning options
– glance at GC in other JVMs
agenda
• classic HotSpot GC algorithms
– parallel GC
– concurrent GC
– G1
• reasons for long pauses
• pause time tuning
• alternative garbage collectors
– Shenandoah
– Azul & JRockit
parallel GC
• a generational collector
– based on the assumption that most objects die young
parallel GC (cont.)
• organizes heap into different areas (generations)
• uses different algorithms per generation
(diagram: heap areas ordered by object lifetime: young generation with eden and survivor spaces, then old (tenured) generation)
minor GC
• copy algorithm
– scans all references into the young generation
– copies all reachable objects (survivors) into survivor space
– also handles promotion to old gen
– updates references to relocated objects
– frees the entire young generation en bloc
• performed frequently
– by multiple GC threads in parallel
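The copy steps above can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not HotSpot's implementation: the class `CopyingGcSketch`, its `Obj` type, and the list standing in for the survivor space are all hypothetical, and promotion to old gen is left out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of a copying collection; names and data structures are illustrative only. */
public class CopyingGcSketch {

    /** Hypothetical heap object: a name plus outgoing references. */
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        Obj(String name) { this.name = name; }
    }

    /**
     * Copies everything reachable from the roots into a fresh survivor list,
     * rewrites roots and references to point at the copies, and returns the survivors.
     * Objects that were never copied are garbage; their space is dropped en bloc.
     */
    static List<Obj> minorGc(List<Obj> roots) {
        Map<Obj, Obj> forwarding = new IdentityHashMap<>(); // old object -> its copy
        List<Obj> survivorSpace = new ArrayList<>();
        Deque<Obj> toScan = new ArrayDeque<>();              // copies whose refs still point at old objects

        for (int i = 0; i < roots.size(); i++) {
            roots.set(i, copy(roots.get(i), forwarding, survivorSpace, toScan));
        }
        while (!toScan.isEmpty()) {
            Obj copied = toScan.poll();
            for (int i = 0; i < copied.refs.size(); i++) {
                // update the reference to the relocated object
                copied.refs.set(i, copy(copied.refs.get(i), forwarding, survivorSpace, toScan));
            }
        }
        return survivorSpace;
    }

    private static Obj copy(Obj old, Map<Obj, Obj> forwarding,
                            List<Obj> survivorSpace, Deque<Obj> toScan) {
        Obj existing = forwarding.get(old);
        if (existing != null) {
            return existing;                     // already relocated
        }
        Obj fresh = new Obj(old.name);
        fresh.refs.addAll(old.refs);             // still old references; fixed when scanned
        forwarding.put(old, fresh);
        survivorSpace.add(fresh);
        toScan.add(fresh);
        return fresh;
    }

    /** Three reachable objects in a cycle plus one unreachable object; returns the survivor count. */
    static int demoSurvivorCount() {
        Obj a = new Obj("a"), b = new Obj("b"), c = new Obj("c"), dead = new Obj("dead");
        a.refs.add(b);
        b.refs.add(c);
        c.refs.add(a);
        dead.refs.add(a);                        // a reference from garbage does not keep anything alive
        List<Obj> roots = new ArrayList<>();
        roots.add(a);
        return minorGc(roots).size();
    }

    public static void main(String[] args) {
        System.out.println("survivors: " + demoSurvivorCount()); // prints "survivors: 3"
    }
}
```

The work done is driven by the reachable objects only; the dead object is never visited, which is why the pause depends on the number of survivors rather than on the amount of garbage.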
minor GC
• upside
– eden is an empty block of memory afterwards
very efficient subsequent memory allocation
– no fragmentation
survivor space is compact
• downside
– stop-the-world pause
proportional to number of survivors
– higher footprint
needs free space as destination for copying
full GC
• mark-and-compact algorithm
– follows all references into the heap
– marks all reachable objects
– sweeps all dead objects (i.e. marks their memory as "free")
– compacts the old generation
– updates references to relocated objects
• performed rarely
– by multiple GC threads in parallel
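A minimal, single-threaded sketch of mark-and-compact may help to see why both the live objects and the heap size matter. It assumes a hypothetical heap model (an array of slots, each slot holding the indices it refers to) and returns a new array instead of sliding objects in place; it is not the parallel algorithm HotSpot uses.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy mark-compact over an array of slots; the heap model is an assumption for illustration. */
public class MarkCompactSketch {

    /**
     * heap[i] == null means a free slot; otherwise heap[i] lists the slot indices
     * the object in slot i refers to. The roots array is updated in place.
     */
    static int[][] fullGc(int[][] heap, int[] roots) {
        int n = heap.length;

        // 1. mark: follow all references, starting from the roots
        boolean[] marked = new boolean[n];
        Deque<Integer> stack = new ArrayDeque<>();
        for (int r : roots) {
            stack.push(r);
        }
        while (!stack.isEmpty()) {
            int i = stack.pop();
            if (marked[i]) {
                continue;
            }
            marked[i] = true;
            for (int ref : heap[i]) {
                stack.push(ref);
            }
        }

        // 2. compute the new location of every live object (walks the whole heap)
        int[] newIndex = new int[n];
        int next = 0;
        for (int i = 0; i < n; i++) {
            if (marked[i]) {
                newIndex[i] = next++;
            }
        }

        // 3. compact: move live objects together and update references to relocated objects
        int[][] compacted = new int[n][];
        for (int i = 0; i < n; i++) {
            if (!marked[i]) {
                continue;                        // dead or free: its memory becomes free space
            }
            int[] refs = heap[i].clone();
            for (int k = 0; k < refs.length; k++) {
                refs[k] = newIndex[refs[k]];
            }
            compacted[newIndex[i]] = refs;
        }
        for (int k = 0; k < roots.length; k++) {
            roots[k] = newIndex[roots[k]];
        }
        return compacted;
    }

    /** Slots 0, 2, 4 are live (0 -> 2 -> 4); slots 1 and 3 are dead; slot 5 is free. */
    static int[][] demo() {
        int[][] heap = { {2}, {0}, {4}, {}, {}, null };
        int[] roots = { 0 };
        return fullGc(heap, roots);
    }

    public static void main(String[] args) {
        int[][] heap = demo();
        for (int i = 0; i < heap.length; i++) {
            System.out.println(i + ": " + (heap[i] == null ? "free" : java.util.Arrays.toString(heap[i])));
        }
    }
}
```

After the collection the three live objects occupy slots 0 to 2 and all free space is contiguous, so there is no fragmentation.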
full GC
• upside
– no fragmentation
• downside
– stop-the-world pause proportional to
number of survivors (in marking phase) and
size of heap (in compaction phase)
inter-generational references
• generational GC has a downside
– works on a subset of the heap (young gen in minor GC)
• all references into young must be scanned
– root references (from outside the heap into the young gen)
stack variables, static fields, references from JIT compiled code, etc.
– inter-generational references (from old gen into young gen)
require write barriers
old-to-young references
(diagram: young and old generation with the card table; roots and old-to-young references, flagged in the card table, are the starting points for young generation marking)
write barriers
• a barrier is additional code
– executed when a reference is modified
– must catch when application creates an old-to-young reference
– sets a dirty bit in a card table
• card table is later processed in GC pause
– to find the actual inter-generational references
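The interplay of write barrier and card table can be sketched as follows. Everything here is an assumption for illustration: the class `CardTableSketch`, the 512-byte card size, and the use of plain offsets instead of real addresses. For simplicity the barrier dirties the card on every reference store and leaves it to the scan in the pause to decide which entries are real old-to-young references.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy card table; sizes and offsets are illustrative assumptions. */
public class CardTableSketch {
    static final int CARD_SHIFT = 9;             // assumed card size: 2^9 = 512 bytes
    private final byte[] cards;

    CardTableSketch(int oldGenBytes) {
        cards = new byte[(oldGenBytes >>> CARD_SHIFT) + 1];
    }

    /** Write barrier: additional code executed when a reference field in old gen is modified. */
    void writeBarrier(int fieldOffsetInOldGen) {
        cards[fieldOffsetInOldGen >>> CARD_SHIFT] = 1;   // set the dirty mark for the covering card
    }

    /** In the GC pause: collect the dirty cards that must be scanned, and clean them. */
    List<Integer> drainDirtyCards() {
        List<Integer> dirty = new ArrayList<>();
        for (int i = 0; i < cards.length; i++) {
            if (cards[i] != 0) {
                dirty.add(i);
                cards[i] = 0;
            }
        }
        return dirty;
    }

    /** Four reference stores that fall onto three different cards. */
    static List<Integer> demo() {
        CardTableSketch table = new CardTableSketch(8192);
        table.writeBarrier(10);     // card 0
        table.writeBarrier(500);    // card 0 again
        table.writeBarrier(513);    // card 1
        table.writeBarrier(5000);   // card 9
        return table.drainDirtyCards();
    }

    public static void main(String[] args) {
        System.out.println("dirty cards: " + demo()); // prints "dirty cards: [0, 1, 9]"
    }
}
```

The barrier itself is cheap, but it runs on every reference store, and every dirty card adds scanning work to the next pause.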
cost of generational GC
• extra effort for inter-generational references
– slows down application (due to write barrier)
– increases pauses (due to card table processing)
• floating garbage
– dead (not yet collected) objects in old gen
might prevent collection of dead objects in young gen
parallel GC
(timeline diagram: the application is interrupted by STW pauses)
• minor GC
– copy
• minor GC
– copy
• full GC
– marking
– summary
– compaction
parallel GC - tuning
• pause time proportional to number of survivors and size of heap
– large heap => long pauses
=> not many tuning options
• some tuning ideas
– increase parallelism
increase number of parallel GC threads
provided that there are idle CPUs available
– let objects die in young gen
reason: GC in old gen is more expensive than in young gen
increase young gen size, survivor size, tenuring threshold, ...
some useful VM flags
-XX:+UseParallelGC -XX:+UseParallelOldGC
– select parallel GC on young and old gen
-XX:ParallelGCThreads=<number>
– specify number of GC threads
-Xmn<value> or -XX:NewRatio=<ratio>
– specify size of young gen
-XX:SurvivorRatio=<ratio>
– specify size of survivor spaces
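To check which collectors a given set of flags actually selected, the standard `java.lang.management` API can be queried from inside the application. The class name `GcInfo` and the flag values in the comment are assumptions for illustration; the collector names printed depend on the JVM version and flags.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

/**
 * Prints the collectors the running JVM uses.
 * Example invocation (flag values are arbitrary examples):
 *   java -XX:+UseParallelGC -XX:ParallelGCThreads=4 -Xmn256m GcInfo
 */
public class GcInfo {

    /** Names of the garbage collectors registered with the platform MBean server. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // count and accumulated elapsed time (ms) of collections so far
            System.out.println(gc.getName() + ": " + gc.getCollectionCount()
                    + " collections, " + gc.getCollectionTime() + " ms");
        }
    }
}
```

Typically one bean reports the young generation collector and another the old generation collector, so the output also gives a rough view of how often minor and full collections occur.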
concurrent GC
• alternative algorithm on old generation
– STW copy algorithm on young generation
– concurrent mark-and-sweep (CMS) algorithm on old generation
• CMS has several phases
– initial marking phase (STW)
– marking phase (concurrent)
– final remarking phase (STW)
– sweep phase (concurrent)
concurrent marking
• marking must identify reachable objects
– while application is running and modifies the reference graph
• uses tricolor algorithm
– requires write barriers
• concurrent marking is not exact
– snapshot-at-the-beginning (SATB) marking
i.e. objects stay alive if they were reachable at the beginning of marking
– very conservative; creates a lot of floating garbage
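The following toy model sketches tricolor marking with an SATB-style write barrier and shows how floating garbage arises. It is single-threaded (application and marker steps are interleaved by hand), and all names (`TricolorSketch`, `Node`, `writeRef`) are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy tricolor marking with an SATB-style barrier; a sketch, not a real collector. */
public class TricolorSketch {
    enum Color { WHITE, GREY, BLACK }   // not visited / visited, fields pending / fully scanned

    static final class Node {
        Color color = Color.WHITE;
        final List<Node> refs = new ArrayList<>();
    }

    private final Deque<Node> greyQueue = new ArrayDeque<>();
    private boolean marking;

    /** Initial marking: shade the roots grey. */
    void startMarking(List<Node> roots) {
        marking = true;
        for (Node root : roots) {
            shade(root);
        }
    }

    /** One step of marking: scan one grey node. Returns false when no grey nodes are left. */
    boolean markStep() {
        Node n = greyQueue.poll();
        if (n == null) {
            marking = false;
            return false;
        }
        for (Node child : n.refs) {
            shade(child);
        }
        n.color = Color.BLACK;
        return true;
    }

    /** Write barrier: while marking is active, the overwritten target is shaded before the store. */
    void writeRef(Node holder, int index, Node newTarget) {
        if (marking) {
            shade(holder.refs.get(index));   // keeps the "snapshot" reachable
        }
        holder.refs.set(index, newTarget);
    }

    private void shade(Node n) {
        if (n != null && n.color == Color.WHITE) {
            n.color = Color.GREY;
            greyQueue.add(n);
        }
    }

    /** root -> a -> b; the application drops root -> a while marking is in progress. */
    static int demoBlackCount() {
        Node root = new Node(), a = new Node(), b = new Node();
        root.refs.add(a);
        a.refs.add(b);
        List<Node> roots = new ArrayList<>();
        roots.add(root);

        TricolorSketch gc = new TricolorSketch();
        gc.startMarking(roots);
        gc.writeRef(root, 0, null);          // a and b are unreachable from now on
        while (gc.markStep()) {
            // keep marking until no grey nodes remain
        }
        int black = 0;
        for (Node n : new Node[] { root, a, b }) {
            if (n.color == Color.BLACK) {
                black++;
            }
        }
        return black;
    }

    public static void main(String[] args) {
        System.out.println("marked as live: " + demoBlackCount()); // prints "marked as live: 3"
    }
}
```

Only `root` is reachable at the end, yet all three nodes are marked live because `a` and `b` were reachable when marking started; they are floating garbage until the next cycle.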
concurrent sweep
• sweeping adds free memory cells to free lists
– allocation in old gen requires lookup in free lists
=> more expensive allocation
– increases cost of minor GC
because promotion is more expensive
• sweeping leads to fragmentation
– higher risk of promotion failure,
especially if large objects are promoted
– promotion failure forces a fallback to full GC
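Why fragmentation can make a promotion fail although enough memory is free in total can be seen in a toy first-fit free-list allocator. The class `FreeListSketch`, the chunk sizes, and the first-fit policy are assumptions chosen for illustration, not a description of CMS's actual free-list management.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy first-fit free-list allocator; sizes are abstract units. */
public class FreeListSketch {
    private final List<int[]> freeList = new ArrayList<>();   // each entry: {start, size}

    /** Sweeping adds a free memory cell to the free list. */
    void addFreeChunk(int start, int size) {
        freeList.add(new int[] { start, size });
    }

    int totalFree() {
        int sum = 0;
        for (int[] chunk : freeList) {
            sum += chunk[1];
        }
        return sum;
    }

    /** Allocation requires a lookup in the free list; returns the start offset or -1. */
    int allocate(int size) {
        for (int i = 0; i < freeList.size(); i++) {
            int[] chunk = freeList.get(i);
            if (chunk[1] >= size) {
                int start = chunk[0];
                chunk[0] += size;
                chunk[1] -= size;
                if (chunk[1] == 0) {
                    freeList.remove(i);
                }
                return start;
            }
        }
        return -1;                                // no single chunk is large enough
    }

    /** 30 units are free in total, but no chunk can hold an object of 20 units. */
    static boolean demoPromotionFails() {
        FreeListSketch oldGen = new FreeListSketch();
        oldGen.addFreeChunk(0, 10);
        oldGen.addFreeChunk(30, 10);
        oldGen.addFreeChunk(60, 10);
        boolean enoughInTotal = oldGen.totalFree() >= 20;
        boolean noChunkFits = oldGen.allocate(20) == -1;
        return enoughInTotal && noChunkFits;
    }

    public static void main(String[] args) {
        System.out.println("promotion fails despite free memory: " + demoPromotionFails()); // prints true
    }
}
```

Compared with bumping a pointer in a compact space, every allocation here involves a search, which is the extra cost that promotion during minor GC has to pay.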
cost of CMS
• increased minor GC pause time
– due to more expensive allocation in old gen via free lists
• substantially more floating garbage
– due to concurrent SATB marking
• extra effort for tricolor algorithm
– slows down application via write barriers
• long turn-around times
– complex algorithm takes longer until actual memory reclaim
• unreliable
– fallback to full GC in case of fragmentation
cost of CMS
• reduced pause time (on average)
• at the expense of
– higher memory consumption
– lower throughput
concurrent GC
(timeline diagram: STW pauses for minor GC, initial marking, and final remarking; concurrent marking and concurrent sweep run alongside the application in between)
CMS - tuning
• pause time depends on
– amount of work left for remarking phase
i.e. number of grey cells created by application's activities
– degree of fragmentation
i.e. fallback to full GC
• some tuning ideas
– get more done concurrently
increase number of threads in concurrent phases
– reduce CMS's workload
let objects die in young gen instead of old gen
increase young gen size, survivor size, tenuring threshold, ...
– start marking cycles earlier
to avoid fallback to full GC
lower the occupancy threshold that initiates the cycle
some useful VM flags
-XX:+UseConcMarkSweepGC
– select CMS on old gen (automatically uses parallel young GC)
-XX:CMSInitiatingOccupancyFraction=<percent>
-XX:+UseCMSInitiatingOccupancyOnly
– lower threshold that starts CMS cycle
-XX:ConcGCThreads=<n>
– specify number of threads in concurrent phases
"garbage first" (G1) GC
• a generational garbage collector
– organizes the heap into regions of identical size
– copy algorithm => no fragmentation
(diagram: heap regions labeled young, survivor, and old; collection in young mode)
mixed mode collections
• builds collection set dynamically
– collection set = regions to be included into next GC
• two modes
– young: all young regions are collected
– mixed: old regions with a lot of garbage are included
(diagram: heap regions labeled young, survivor, and old; in mixed mode the collection set also contains old regions)
remembered sets
• partial GC (on subset of heap) requires
– maintenance of references into the collection set
– all regions have a remembered set
• remembered set (RS)
– list of references from outside the region into the region
remembered set (cont.)
• RS maintenance requires write barriers
– must catch creation of inter-regional references
– RS update tasks are put into a work queue
processed concurrently by background threads, or
when STW GC pause starts
• simplification (in order to reduce RS overhead)
– references originating from young regions are not recorded in RS
– instead: all young regions are included into each GC
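A per-region remembered set and the simplification for young regions can be sketched like this. The class `RememberedSetSketch`, the region size, and the integer addresses are hypothetical, and the sketch updates the RS directly in the barrier instead of going through a work queue.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy remembered sets, one per region; addresses and sizes are assumptions. */
public class RememberedSetSketch {
    static final int REGION_SIZE = 1024;                      // assumed region size in address units
    private final Map<Integer, Set<Integer>> rsByRegion = new HashMap<>();
    private final Set<Integer> youngRegions = new HashSet<>();

    void markYoung(int region) {
        youngRegions.add(region);
    }

    /** Write barrier: called when the field at fieldAddress is set to refer to targetAddress. */
    void writeBarrier(int fieldAddress, int targetAddress) {
        int from = fieldAddress / REGION_SIZE;
        int to = targetAddress / REGION_SIZE;
        if (from == to) {
            return;                                           // not an inter-regional reference
        }
        if (youngRegions.contains(from)) {
            return;                                           // young regions are part of every GC anyway
        }
        rsByRegion.computeIfAbsent(to, k -> new TreeSet<>()).add(fieldAddress);
    }

    /** References from outside the region into the region. */
    Set<Integer> rememberedSet(int region) {
        return rsByRegion.getOrDefault(region, Collections.<Integer>emptySet());
    }

    /** Region 0 is young, region 2 is old; returns the remembered set of region 0. */
    static Set<Integer> demo() {
        RememberedSetSketch heap = new RememberedSetSketch();
        heap.markYoung(0);
        heap.writeBarrier(2100, 50);     // old -> young: recorded in the RS of region 0
        heap.writeBarrier(60, 3000);     // young -> old: not recorded
        heap.writeBarrier(2200, 2300);   // within region 2: not recorded
        return heap.rememberedSet(0);
    }

    public static void main(String[] args) {
        System.out.println("RS of region 0: " + demo()); // prints "RS of region 0: [2100]"
    }
}
```

When region 0 is collected, only the recorded field at address 2100 has to be examined in addition to the roots; the rest of the heap does not need to be scanned.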
concurrent marking
• G1 performs a concurrent SATB marking
– similar to CMS's marking
– initial marking phase piggybacked on young GC
– no sweep phase
– instead a concurrent cleanup phase
reclaims entirely empty old regions en bloc
• marking information used for internal statistics
– GC efficiency calculation, i.e. amount of garbage per old region
– liveness info, i.e. are origins of RS entries still alive
G1 pauses
• only two main GC parameters:
-XX:GCPauseIntervalMillis=500
-XX:MaxGCPauseMillis=200
(diagram: timeline of alternating "GC pause" and "application running" phases, annotated with GCPauseIntervalMillis and MaxGCPauseMillis)
G1 pauses
• evacuation pause in young mode
– proportional to number of young regions and number of survivors
therein
– also depends on cost of pending RS updates
– self-adjusting
G1 tries to create only as many young regions as can be collected within
the pause time goal
does not always work out
• evacuation pause in mixed mode
– depends on pause time goal
– self-adjusting
G1 includes only old regions with lots of garbage and only as many as fit
into the pause time goal
does not always work out
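The idea of composing the mixed-mode collection set within a pause budget can be sketched with a simple greedy selection. The cost model is entirely hypothetical (fixed per-region evacuation estimates, ranking by garbage fraction, a minimum-garbage threshold); G1's real heuristics are more elaborate.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Toy collection-set selection; all numbers and the cost model are illustrative assumptions. */
public class CollectionSetSketch {

    static final class Region {
        final int id;
        final double garbageFraction;    // share of the region that is garbage (from marking)
        final double evacCostMillis;     // hypothetical estimate of the evacuation cost
        Region(int id, double garbageFraction, double evacCostMillis) {
            this.id = id;
            this.garbageFraction = garbageFraction;
            this.evacCostMillis = evacCostMillis;
        }
    }

    /** Picks old regions with lots of garbage, most garbage first, while they fit into the budget. */
    static List<Integer> chooseOldRegions(List<Region> candidates, double budgetMillis, double minGarbage) {
        List<Region> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparingDouble((Region r) -> r.garbageFraction).reversed());

        List<Integer> collectionSet = new ArrayList<>();
        double remaining = budgetMillis;
        for (Region r : sorted) {
            if (r.garbageFraction < minGarbage || r.evacCostMillis > remaining) {
                break;                   // not worth collecting, or the pause time goal would be exceeded
            }
            collectionSet.add(r.id);
            remaining -= r.evacCostMillis;
        }
        return collectionSet;
    }

    static List<Integer> demo() {
        List<Region> old = new ArrayList<>();
        old.add(new Region(1, 0.9, 40));
        old.add(new Region(2, 0.5, 80));
        old.add(new Region(3, 0.8, 70));
        old.add(new Region(4, 0.2, 10));
        return chooseOldRegions(old, 120, 0.4);
    }

    public static void main(String[] args) {
        System.out.println("collection set: " + demo()); // prints "collection set: [1, 3]"
    }
}
```

Because the selection relies on estimates, the real pause can still exceed the goal when the estimates are off, which matches the "does not always work out" caveat.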
G1 pauses (cont.)
• full evacuation pause
– includes all regions into collection set
– if heap is almost full and cannot be expanded any further
– proportional to number of regions and survivors
• remarking
– amount of work left for remarking phase
• cleanup
– number of empty old regions
cost of G1
• extra effort for remembered sets
– slows down application via write barriers
– background GC threads for concurrent RS update
– increased pause time for RS update in evacuation pause
• long turn-around times
– complex algorithm takes longer until actual memory reclaim
• some amount of floating garbage
– due to concurrent SATB marking
• unreliable
– pause time goal not guaranteed
cost of G1
• upside
– reduced pause time (compared to parallel GC)
– no fragmentation (compared to CMS)
– fully self-adapting
• downside
– higher memory consumption
– lower throughput
G1 GC
(timeline diagram: STW pauses for young GC, initial marking, final remarking, and mixed GC with evacuation; concurrent marking and concurrent cleanup run alongside the application in between)
G1 - tuning
• some tuning ideas
– avoid over-tuning
set realistic pause time goals
– start marking cycles earlier
to avoid full GC
lower the occupancy threshold that initiates the cycle
– use more GC threads
increase number of threads in concurrent and STW phases
some useful VM flags
-XX:+UseG1GC
– select G1
-XX:GCPauseIntervalMillis=<ms> -XX:MaxGCPauseMillis=<ms>
– specify pause time and interval goals
-XX:InitiatingHeapOccupancyPercent=<percent>
– lower threshold that starts marking cycle
-XX:ConcGCThreads=<n> -XX:ParallelGCThreads=<n>
– specify number of threads in concurrent and STW phases
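Whether a pause time goal is realistic can be checked from inside the application with a simple "hiccup meter": a thread sleeps for a short, fixed time and records by how much it overslept. This technique is not part of the slides; the class `PauseMeter` is a hypothetical sketch, and the measured delay includes scheduler jitter as well as STW pauses.

```java
/** Toy hiccup meter; measures oversleeping, which includes but is not limited to GC pauses. */
public class PauseMeter {

    /** Returns the worst observed delay (in nanoseconds) beyond the requested sleep time. */
    static long measureMaxHiccupNanos(int iterations, long sleepMillis) {
        long worst = 0;
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();   // preserve the interrupt and stop measuring
                break;
            }
            long overshoot = System.nanoTime() - start - sleepMillis * 1_000_000L;
            worst = Math.max(worst, overshoot);
        }
        return worst;
    }

    public static void main(String[] args) {
        // e.g. run alongside the real workload: java -XX:+UseG1GC -XX:MaxGCPauseMillis=200 ...
        long worstNanos = measureMaxHiccupNanos(1000, 1);
        System.out.println("worst hiccup: " + (worstNanos / 1_000_000.0) + " ms");
    }
}
```

If the worst hiccup regularly exceeds the value passed to -XX:MaxGCPauseMillis, the goal is probably too ambitious for the given heap and workload.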
agenda
• classic HotSpot GC algorithms
• less obvious reasons for long pauses
– humongous objects
– soft/weak references
– trace output
• alternative garbage collectors
– Shenandoah
– Oracle JRockit
– Azul "C4"
Shenandoah
• an alternative GC algorithm for OpenJDK
– (project name: Shenandoah)
– submitted by Red Hat
• goal
– manage 100GB+ heaps with < 10ms pause times
– pause times proportional to size of root set, not size of heap
Shenandoah
• very similar to G1
– organized into regions
– concurrent SATB marking
– dynamically composed collection set based on GC efficiency
– ...
• key difference
– no notion of generations
age does not matter, only GC efficiency does
– concurrent evacuation
no STW pause for copying survivors
no remembered sets
survivors are copied individually on write access
Shenandoah phases
• concurrent marking
– visit all reachable objects (starting with root references)
– needs a STW initial & remark pause (just like CMS and G1 do)
– afterwards there is liveness info for all regions
• concurrent evacuation
– select "garbage first" regions for evacuation ("from" space)
– select free target regions for evacuation ("to" space)
– scan reachable objects in selected "from" regions
– add a forwarding pointer to each object that must be relocated
– but do not copy it (yet)
forwarding pointer
(diagram: "from" region with dead objects and a live survivor; the survivor's forwarding pointer refers to its new location in the "to" region)
Shenandoah phases (cont.)
• concurrent evacuation (cont.)
– first write access to survivor in "from" creates copy in "to"
– subsequent read access is redirected to copy
– references to relocated object are updated in next marking phase
– afterwards all "from" regions are reclaimed
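The copy-on-write behaviour described above can be sketched with a toy object model. The names (`ForwardingPointerSketch`, `Obj`, `forwardee`, `inFromRegion`) are hypothetical, the sketch is single-threaded, and a real implementation would need atomic updates of the forwarding pointer.

```java
/** Toy model of a forwarding pointer with copy on first write; not Shenandoah's actual code. */
public class ForwardingPointerSketch {

    static final class Obj {
        int value;
        Obj forwardee;            // extra word per object; refers to the object itself until it is copied
        boolean inFromRegion;     // set when the object's region is selected for evacuation
        Obj(int value) {
            this.value = value;
            this.forwardee = this;
        }
    }

    /** Read access: always goes through the forwarding pointer. */
    static int read(Obj ref) {
        return ref.forwardee.value;
    }

    /** Write access: the first write to a survivor in "from" creates the copy in "to". */
    static void write(Obj ref, int newValue) {
        Obj target = ref.forwardee;
        if (target.inFromRegion) {
            Obj copy = new Obj(target.value);   // conceptually allocated in a "to" region
            target.forwardee = copy;            // later accesses are redirected to the copy
            target = copy;
        }
        target.value = newValue;
    }

    /** Returns { value seen through the old reference after the write, stale value in "from" }. */
    static int[] demo() {
        Obj survivor = new Obj(1);
        survivor.inFromRegion = true;           // evacuation selected this object's region
        write(survivor, 2);                     // the application thread creates the copy
        return new int[] { read(survivor), survivor.value };
    }

    public static void main(String[] args) {
        int[] result = demo();
        System.out.println("read via old reference: " + result[0]); // prints 2
        System.out.println("stale original: " + result[1]);         // prints 1
    }
}
```

The old reference keeps working because reads follow the forwarding pointer; the stale original only disappears once all references have been updated and the "from" region is reclaimed.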
copy on write (COW)
(diagram: a write to the survivor in the "from" region creates a copy at the new location in the "to" region; the survivor's forwarding pointer refers to the live copy)
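after next marking cycle
(diagram: references now refer to the live copy in the "to" region; the "from" region, which holds only the stale survivor and dead objects, can be reclaimed)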
Shenandoah
(timeline diagram: STW pauses for initial marking and final remarking; concurrent marking is followed by concurrent evacuation with copying; memory reclaim happens after the next marking cycle)
evaluation
• upside
– few short STW pauses
– no remembered set overhead
• downside
– increased object size (due to forwarding pointer)
– more expensive write barriers
trigger object copying
note: application threads (not GC threads) create the copies
– temporarily more expensive read access
due to indirection via forwarding pointer
– floating garbage
– long turn-around time
JRockit GC
• GC functionality different from HotSpot
– more modular
• ‘mix & match’
– heap partitioning
– GC algorithm
– compaction
• can switch GC algorithms and strategies at runtime
– at least to a certain extent
heap partitioning
• un-partitioned
– single heap
• generational
– two areas: nursery (~ young generation) + old generation
– nursery contains keep area
most recently allocated objects
not copied to old gen during young GC
avoids premature promotion of short-lived objects
young gen collector
• collects nursery (if present)
• scavenger GC
– copies all live objects from nursery to old generation
does not touch keep area
– stop-the-world
– uses all available CPU cores
– resembles HotSpot's parallel young GC (w/o survivor space)
old gen collector
• collects old generation (gen) or entire heap (single)
• algorithm split into mark-and-sweep GC + compaction strategy
– select GC algorithm and compaction strategy independently
two mark-and-sweep GC algorithms
• parallel mark-and-sweep GC
– stop-the-world
– uses all available CPU cores
– resembles HotSpot's parallel old GC (w/ compaction)
• concurrent mark-and-sweep GC
– "mostly concurrent"
– short stop-the-world-pauses during marking and sweeping
– resembles HotSpot's CMS
HotSpot sweeps concurrently (w/o STW pauses)
compaction
• partial compaction (on only a part of the heap)
– one or two windows traveling the heap
– window size is adjustable
– external or internal compaction
• compaction runs as a stop-the-world pause
– during sweep phase
(diagram: heap from top to bottom with compaction windows, illustrating external and internal compaction)
-XgcPrio:deterministic - JRockit Real Time
• GC split up into work packets
– e.g. a compaction job for part of the heap
• if it takes too long, throw away the work packet
– re-try later, re-using partial results if possible
Azul "C4"
• commercial JVM with a no-pause collector named "C4"
– "C4" = Continuously Concurrent Compacting Collector
• special purpose JVM (the so-called Zing platform)
– runs virtualized on top of the actual OS and includes its own
operating environment
• C4 algorithm makes massive and rapid changes to
virtual memory mappings
– regular Linux has technical remapping limitations
– Zing has its own virtual memory subsystem that supports
memory remaps, unmaps, etc. as needed for "C4"
Azul "C4" - how does it differ?
• same core mechanism used for both generations
– concurrent mark-compact
– old and young generation collectors run simultaneously
and concurrently with the application threads
(diagram: old gen mark-compact and young gen mark-compact running at the same time)
• algorithm has 3 phases
– mark
– relocate
– remap
"C4" phases
• mark phase
– trace generation’s live set by starting from roots
mark all encountered objects as live
mark all encountered object references as marked through
• relocate phase
– compact memory by relocating live objects into contiguously populated
target pages
free sparse pages based on liveness totals collected during previous mark phase
each “from” page is protected, its objects are relocated to new “to” pages
– forwarding information is stored outside the “from” page
– “from” page’s physical memory is immediately recycled
"C4" phases (cont.)
• remap phase
– remapping occurs when mutator threads encounter stale
references to relocated objects
stale references are corrected to point to current object address
remap phase is combined with next GC cycle’s mark phase
– at the end of remap phase, no stale references will exist
virtual addresses associated with relocated “from” can be safely recycled
– “no hurry” to finish remap phase
there are no physical resources being held
combined mark-remap phase
(diagram: successive GC cycles of mark, relocate, remap; each cycle's remap phase overlaps the next cycle's mark phase)
GC comparison
JVM | collector | young | old | fallback
Oracle HotSpot | ParallelGC | STW copy | STW mark-compact | -
Oracle HotSpot | CMS | STW copy | mostly concurrent mark-sweep | STW mark-compact
Oracle HotSpot | G1 | STW copy | mostly concurrent mark, STW incremental compact | STW mark-compact
OpenJDK | Shenandoah | N/A | mostly concurrent mark, concurrent compact | ???
Oracle JRockit | RealTime | N/A | mostly concurrent or STW parallel mark-sweep, STW incremental compact | STW mark-compact
Azul Zing | C4 | concurrent mark-compact | concurrent mark-compact | -
garbage collection pauses
Q & A
Angelika Langer
http://www.AngelikaLanger.com
twitter: @AngelikaLanger
Angelika Langer & Klaus Kreft
http://www.AngelikaLanger.com/
Java 8
Stream
Performance
© Copyright 2003-2015 by Angelika Langer & Klaus Kreft. All Rights Reserved.
http://www.AngelikaLanger.com/
last update: 10/13/2015,10:06
objective
• how do streams perform?
– explore whether / when parallel streams outperform seq. streams
– compare performance of streams to performance of regular loops
• what determines stream performance?
– take a glance at some stream internal mechanisms
speaker's relationship to topic
• independent trainer / consultant / author
– teaching C++ and Java for ~20 years
– curriculum of half a dozen challenging Java seminars
– JCP observer and Java champion since 2005
– co-author of "Effective Java" column
– author of Java Generics FAQ
– author of Lambda Tutorial & Reference
agenda
• introduction
• loop vs. sequential stream
• sequential vs. parallel stream
what is a stream?
• equivalent of a sequence from functional programming languages
– object-oriented view: internal iterator pattern
see GOF book for more details
• idea
myStream.forEach(s -> System.out.print(s));
– forEach: stream operation
– s -> System.out.print(s): user-defined functionality applied to each element
fluent programming
myStream.filter(s -> s.length() > 3)
        .mapToInt(s -> s.length())
        .forEach(System.out::print);
• filter, mapToInt, forEach: stream operations
• the lambdas and the method reference: user-defined functionality applied to each element
• filter() and mapToInt() are intermediate operations
• forEach() is the terminal operation
obtain a stream
• collection: myCollection.stream(). ...
• array: Arrays.stream(myArray). ...
• resulting stream
– does not store any elements
– just a view of the underlying stream source
• more stream factories, but not in this talk
parallel streams
• collection: myCollection.parallelStream(). ...
• array: Arrays.stream(myArray).parallel(). ...
• performs stream operations in parallel
– i.e. with multiple worker threads from fork-join common pool
myParallelStream.forEach(s -> System.out.print(s));
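To see which threads actually take part, one could record the thread names inside forEach. The following is a minimal sketch; the class name and the element count are assumptions chosen for illustration, and the set of threads reported will differ from machine to machine.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical demo class, not part of the original slides.
final class ParallelThreadsDemo {

    // Runs forEach on a parallel stream and records the name of every thread
    // that executed the action for at least one element.
    static Set<String> threadsUsed(List<Integer> source) {
        Set<String> names = ConcurrentHashMap.newKeySet(); // thread-safe set
        source.parallelStream()
              .forEach(i -> names.add(Thread.currentThread().getName()));
        return names;
    }

    public static void main(String[] args) {
        List<Integer> source = IntStream.range(0, 100_000).boxed().collect(Collectors.toList());
        // Output varies; usually the calling thread plus some common pool worker threads.
        threadsUsed(source).forEach(System.out::println);
    }
}
```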
Stream Performance (9)
stream functionality rivals loops
• Java 8 streams:
myStream.filter(s -> s.length() > 3)
        .mapToInt(s -> s.length())
        .forEach(System.out::print);
or:
myStream.filter(s -> s.length() > 3)
        .forEach(s -> System.out.print(s.length()));
• since Java 5:
for (String s : myCol)
    if (s.length() > 3)
        System.out.print(s.length());
• pre-Java 5:
Iterator iter = myCol.iterator();
while (iter.hasNext()) {
    String s = (String) iter.next(); // raw Iterator: a cast is needed without generics
    if (s.length() > 3)
        System.out.print(s.length());
}
Stream Performance (10)
obvious question …
… how does the performance compare ?
• loop vs. sequential stream vs. parallel stream
Stream Performance (11)
benchmarks …
… done on an older desktop system with:
– Intel E8500
  ▪ 2 x 3.17 GHz
  ▪ 4 GB RAM
– Win 7
– JDK 1.8.0_05
• disclaimer: your mileage may vary
– e.g. parallel performance depends heavily on the number of CPU cores
Stream Performance (12)
agenda
• introduction
• loop vs. sequential stream
• sequential vs. parallel stream
Stream Performance (13)
how do sequential streams work?
• example
String[] txt = { "State", "of", "the", "Lambda",
                 "Libraries", "Edition" };
int sum = Arrays.stream(txt).filter(s -> s.length() > 3)
                            .mapToInt(s -> s.length())
                            .reduce(0, (l1, l2) -> l1 + l2);
• filter() and mapToInt() return streams
– intermediate operations
• reduce() returns int
– terminal operation
– that produces a single result from all elements of the stream
Stream Performance (14)
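pipelined processing
Arrays.stream(txt).filter(s -> s.length() > 3)
                  .mapToInt(s -> s.length())
                  .reduce(0, (l1, l2) -> l1 + l2);
[diagram: what the code looks like vs. what is really executed]
• code looks like: each operation processes all elements before the next operation starts
– source:   "State" "of" "the" "Lambda" "Libraries" "Edition"
– filter:   "State" "Lambda" "Libraries" "Edition"
– mapToInt: 5 6 9 7
– reduce:   0, 5, 11, 20, 27
• really executed: each element passes through filter, mapToInt and reduce
before the next element is processed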
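The element-by-element execution can be made visible by logging inside the lambdas. This is a sketch for a sequential stream only; the class name and the log format are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class, not part of the original slides.
final class PipelineTraceDemo {

    // Records the order in which filter and mapToInt see the elements
    // of a sequential stream, followed by the reduced sum.
    static List<String> trace(String[] txt) {
        List<String> log = new ArrayList<>();
        int sum = Arrays.stream(txt)
                .filter(s -> { log.add("filter " + s); return s.length() > 3; })
                .mapToInt(s -> { log.add("map " + s); return s.length(); })
                .reduce(0, (l1, l2) -> l1 + l2);
        log.add("sum " + sum);
        return log;
    }

    public static void main(String[] args) {
        String[] txt = { "State", "of", "the", "Lambda", "Libraries", "Edition" };
        // "filter State" is followed directly by "map State",
        // not by "filter of": one element travels the whole pipeline at a time.
        trace(txt).forEach(System.out::println);
    }
}
```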
Stream Performance (15)
benchmark with int-array
• int[500_000], find largest element
– for-loop:
int[] a = ints;
int e = ints.length;
int m = Integer.MIN_VALUE;
for (int i = 0; i < e; i++)
    if (a[i] > m) m = a[i];
– sequential stream:
int m = Arrays.stream(ints)
              .reduce(Integer.MIN_VALUE, Math::max);
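The slides do not say how the timings were taken. As a rough illustration only, a naive harness around the two variants might look like the sketch below; the class name, the warm-up and run counts, and the use of System.nanoTime() are all assumptions. A dedicated harness such as JMH is generally more trustworthy for serious measurements.

```java
import java.util.Arrays;
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.IntSupplier;

// Hypothetical timing sketch, not the authors' benchmark code.
final class MaxBenchmarkSketch {

    static int maxLoop(int[] ints) {
        int m = Integer.MIN_VALUE;
        for (int i = 0; i < ints.length; i++)
            if (ints[i] > m) m = ints[i];
        return m;
    }

    static int maxStream(int[] ints) {
        return Arrays.stream(ints).reduce(Integer.MIN_VALUE, Math::max);
    }

    // Naive timing: some warm-up runs so the JIT can compile the code,
    // then the average over the measured runs, in milliseconds.
    static double averageMillis(IntSupplier task, int warmups, int runs) {
        int sink = 0; // consume results so the work is not trivially dead code
        for (int i = 0; i < warmups; i++) sink += task.getAsInt();
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) sink += task.getAsInt();
        long elapsed = System.nanoTime() - start;
        if (sink == 42) System.out.print(""); // keeps 'sink' observable
        return elapsed / 1_000_000.0 / runs;
    }

    public static void main(String[] args) {
        int[] ints = ThreadLocalRandom.current().ints(500_000).toArray();
        System.out.printf("for-loop:    %.2f ms%n", averageMillis(() -> maxLoop(ints), 50, 100));
        System.out.printf("seq. stream: %.2f ms%n", averageMillis(() -> maxStream(ints), 50, 100));
    }
}
```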
Stream Performance (16)
results
for-loop: 0.36 ms
seq. stream: 5.35 ms
• for-loop is ~15x faster
• are seq. streams always much slower than loops?
– no, this is the most extreme example
– let's see the same benchmark with an ArrayList<Integer>
  ▪ underlying data structure is also an array
  ▪ this time filled with Integer values, i.e. the boxed equivalent of int
Stream Performance (17)
benchmark with ArrayList<Integer>
• find largest element in an ArrayList with 500_000 elements
– for-loop:
int m = Integer.MIN_VALUE;
for (int i : myList)
    if (i > m) m = i;
– sequential stream:
int m = myList.stream()
              .reduce(Integer.MIN_VALUE, Math::max);
Stream Performance (18)
results
ArrayList, for-loop: 6.55 ms
ArrayList, seq. stream: 8.33 ms
• for-loop still faster, but only 1.27x
• iteration for ArrayList is more expensive
– boxed elements require an additional memory access (indirection)
– which does not work well with the CPU’s memory cache
• bottom-line:
– iteration cost dominates the benchmark result
– performance advantage of the for-loop is insignificant
Stream Performance (19)
some thoughts
• previous situation:
– costs of iteration are relatively high, but
– costs of functionality applied to each element are relatively low
  ▪ after JIT-compilation: more or less the cost of a single compare instruction
• what if we apply more expensive functionality to each element?
– how will this affect the benchmark results?
Stream Performance (20)
expensive functionality
• slowSin()
from Apache Commons Mathematics Library
– calculates a Taylor approximation of the sine function value
for the parameter passed to this method
– (normally) not in the public interface of the library
  ▪ used to calculate values for an internal table,
  ▪ which is used for interpolation by FastMath.sin()
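To give an idea of the kind of work involved, here is a minimal sketch of a Taylor approximation of sine. It is a simplified stand-in and not the Apache Commons implementation; the class name, the method signature and the number of terms are assumptions, and the series is only practical for small |x|.

```java
// Hypothetical stand-in for an expensive per-element function.
final class TaylorSine {

    // Sums the first 'terms' terms of the Taylor series
    // sin(x) = x - x^3/3! + x^5/5! - ...
    static double sin(double x, int terms) {
        double term = x;   // current term, starts with x^1/1!
        double sum = 0.0;
        for (int n = 0; n < terms; n++) {
            sum += term;
            // next term: multiply by -x^2 / ((2n+2)(2n+3))
            term *= -x * x / ((2 * n + 2) * (2 * n + 3));
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sin(0.5, 10));
        System.out.println(Math.sin(0.5)); // reference value for comparison
    }
}
```

The loop is pure arithmetic on local values, which is what makes this kind of function CPU-bound.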
Stream Performance (21)
benchmark with slowSin()
• int array / ArrayList with 10_000 elements
– for-loop:
int[] a = ints;
int e = a.length;
double m = Double.NEGATIVE_INFINITY; // not Double.MIN_VALUE, which is the smallest positive double
for (int i = 0; i < e; i++) {
    double d = Sine.slowSin(a[i]);
    if (d > m) m = d;
}
– sequential stream:
Arrays.stream(ints)
      .mapToDouble(Sine::slowSin)
      .reduce(Double.NEGATIVE_INFINITY, Math::max);
– code for ArrayList changed accordingly
Stream Performance (22)
results
int[], for-loop: 11.72 ms
int[], seq. stream: 11.85 ms
ArrayList, for-loop: 11.84 ms
ArrayList, seq. stream: 11.85 ms
• for-loop is not really faster
• reason:
– applied functionality costs dominate the benchmark result
– performance advantage of the for-loop has evaporated
Stream Performance (23)
other aspect (without benchmark)
• today, compilers (javac + JIT) can optimize
loops better than stream code
• reasons:
– linear code (loop) vs. injected functionality (stream)
– lambdas + method references are new to Java
– loop optimization is a very mature technology
– …
Stream Performance (24)
for-loop vs. seq. stream / re-cap
• sequential stream can be slower or as fast as for-loop
• depends on
– costs of the iteration
– costs of the functionality applied to each element
• the higher the cost (iteration + functionality)
the closer is stream performance
to for-loop performance
Stream Performance (25)
agenda
• introduction
• loop vs. sequential stream
• sequential vs. parallel stream
– introduction
– stateless functionality
– stateful functionality
Stream Performance (26)
parallel streams
• library side parallelism
– important feature
  ▪ you need not know anything about threads, etc.
  ▪ very little implementation effort, just: parallel
• performance aspect
– outperform loops, which are inherently sequential
Stream Performance (27)
how do parallel streams work?
• example
final int SIZE = 64;
int[] ints = new int[SIZE];
ThreadLocalRandom rand = ThreadLocalRandom.current();
for (int i = 0; i < SIZE; i++) ints[i] = rand.nextInt();
Arrays.stream(ints)
      .parallel()
      .reduce(Math::max)
      .ifPresent(m -> System.out.println("max is: " + m));
• parallel()’s functionality is based on
the fork-join framework
Stream Performance (28)
fork join tasks
• original task is divided into two sub-tasks
by splitting the stream source into two parts
– original task’s result is based on the sub-tasks’ results
– sub-tasks are divided again: fork phase
• at a certain depth partitioning stops
– tasks at this level (leaf tasks) are executed: execution phase
• completed sub-task results
are ‘combined’ into super-task results
– join phase
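The three phases can be sketched with a hand-written fork-join task. This is an illustration of the pattern only and not how the JDK implements parallel streams internally; the class name and the leaf threshold of 16 elements are assumptions.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical task class illustrating fork, execution and join.
final class MaxTask extends RecursiveTask<Integer> {
    private static final int THRESHOLD = 16; // assumed leaf size, chosen for illustration

    private final int[] data;
    private final int from; // inclusive
    private final int to;   // exclusive

    MaxTask(int[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Integer compute() {
        if (to - from <= THRESHOLD) {
            // execution phase: a leaf task iterates over its part of the source
            int m = Integer.MIN_VALUE;
            for (int i = from; i < to; i++) m = Math.max(m, data[i]);
            return m;
        }
        // fork phase: split the range into two halves
        int mid = (from + to) >>> 1;
        MaxTask left = new MaxTask(data, from, mid);
        MaxTask right = new MaxTask(data, mid, to);
        left.fork();                    // run the left half asynchronously
        int rightMax = right.compute(); // compute the right half in this thread
        // join phase: combine the sub-task results
        return Math.max(left.join(), rightMax);
    }

    static int max(int[] data) {
        return ForkJoinPool.commonPool().invoke(new MaxTask(data, 0, data.length));
    }

    public static void main(String[] args) {
        int[] ints = new int[64];
        for (int i = 0; i < ints.length; i++) ints[i] = (i * 37) % 101;
        System.out.println("max is: " + max(ints));
    }
}
```

With 64 elements and a threshold of 16, the task tree has two split levels and four leaf tasks.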
Stream Performance (29)
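find largest element with parallel stream
reduce((i,j) -> Math.max(i,j));
[diagram: task tree for a source with elements 0 to 63]
• fork phase
– T (0_63) is split into T1 (0_31) and T2 (32_63)
– T1 is split into T11 (0_15) and T12 (16_31)
– T2 is split into T21 (32_47) and T22 (48_63)
• execution
– T11, T12, T21, T22 compute m0_15, m16_31, m32_47, m48_63
• join phase
– T1: m0_31 = max(m0_15,m16_31)
– T2: m32_63 = max(m32_47,m48_63)
– T: m0_63 = max(m0_31,m32_63)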
Stream Performance (30)
split level
• deeper split level than shown !!!
– execution/leaf tasks: ~ 4*numberOfCores
  ▪ 8 tasks for a dual core CPU (only 4 in the previous diagram)
– i.e. one additional split level (only 2 in the previous diagram)
• key abstractions
– java.util.Spliterator
– java.util.concurrent.ForkJoinPool.commonPool()
Stream Performance (31)
what is a Spliterator ?
• spliterator = splitter + iterator
• each type of stream source has its own spliterator type
– knows how to split the stream source
  ▪ e.g. ArrayList.ArrayListSpliterator
– knows how to iterate the stream source
  ▪ in execution phase
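Both roles of a spliterator can be tried out directly. The sketch below assumes an ArrayList as the source; the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Spliterator;

// Hypothetical demo class, not part of the original slides.
final class SpliteratorDemo {

    // Splits the spliterator of the list once and returns the estimated sizes
    // of the two resulting parts: { split-off prefix, remainder }.
    static long[] splitOnce(List<Integer> list) {
        Spliterator<Integer> rest = list.spliterator();
        Spliterator<Integer> prefix = rest.trySplit(); // null if the source cannot be split
        long prefixSize = (prefix == null) ? 0 : prefix.estimateSize();
        return new long[] { prefixSize, rest.estimateSize() };
    }

    // Iterates one part, as a leaf task does in the execution phase.
    static int maxOf(Spliterator<Integer> part) {
        int[] m = { Integer.MIN_VALUE };
        part.forEachRemaining(i -> m[0] = Math.max(m[0], i));
        return m[0];
    }

    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>();
        for (int i = 0; i < 64; i++) list.add(i);
        long[] sizes = splitOnce(list);
        System.out.println(sizes[0] + " + " + sizes[1]); // prints "32 + 32"
        System.out.println(maxOf(list.spliterator()));   // prints "63"
    }
}
```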
Stream Performance (32)
what is the CommonPool ?
• common pool is a singleton fork-join pool instance
– introduced with Java 8
– all parallel stream operations use the common pool
  ▪ as does other parallel JDK functionality (e.g. CompletableFuture)
• default: parallel execution of stream tasks uses
– the (current) thread that invoked the terminal operation, and
– (number of cores – 1) threads from the common pool
  ▪ if (number of cores) > 1
• this default configuration was used for all benchmarks
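The default sizing can be checked on the target machine with a few lines. This is a minimal sketch; the class name is made up, and the values printed depend on the hardware and on any JVM options in effect.

```java
import java.util.concurrent.ForkJoinPool;

// Hypothetical helper class, not part of the original slides.
final class CommonPoolInfo {

    // Target number of worker threads of the common pool.
    static int commonPoolParallelism() {
        return ForkJoinPool.getCommonPoolParallelism();
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("available cores:         " + cores);
        System.out.println("common pool parallelism: " + commonPoolParallelism());
        // The default can be overridden at JVM start-up, e.g.
        //   java -Djava.util.concurrent.ForkJoinPool.common.parallelism=4 CommonPoolInfo
    }
}
```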
Stream Performance (33)
parallel streams + intermediate operations
• what if the stream contains upstream intermediate operations?
... .parallelStream().filter(...)
                     .mapToInt(...)
                     .reduce((i,j) -> Math.max(i,j));
– when/where are these applied to the stream?
Stream Performance (34)
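find largest element in parallel
filter(...).mapToInt(...).reduce((i,j) -> Math.max(i,j));
[diagram: task tree T, T1/T2, T11/T12/T21/T22 as before; in the execution phase
each leaf task applies filter, mapToInt and reduce to the elements of its part
of the source; the partial results are then joined in T1, T2 and T]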
Stream Performance (35)
parallel overhead …
… compared to sequential stream algorithm
• algorithm is more complicated / resource intensive
– create fork-join-task objects
  ▪ splitting
  ▪ fork-join-task object creation
– thread pool scheduling
– …
• plus additional GC costs
– fork-join-task objects have to be reclaimed
Stream Performance (36)
agenda
• introduction
• loop vs. sequential stream
• sequential vs. parallel stream
– introduction
– stateless functionality
– stateful functionality
Stream Performance (37)
back to the first example / benchmark parallel
• find largest element, array / collection, 500_000 elements
– sequential stream:
int m = Arrays.stream(ints)
              .reduce(Integer.MIN_VALUE, Math::max);
int m = myCollection.stream()
                    .reduce(Integer.MIN_VALUE, Math::max);
– parallel stream:
int m = Arrays.stream(ints).parallel()
              .reduce(Integer.MIN_VALUE, Math::max);
int m = myCollection.parallelStream()
                    .reduce(Integer.MIN_VALUE, Math::max);
Stream Performance (38)
results
              seq.        par.        seq./par.
int-Array     5.35 ms     3.35 ms     1.60
ArrayList     8.33 ms     6.33 ms     1.32
LinkedList    12.74 ms    19.57 ms    0.65
HashSet       20.76 ms    16.01 ms    1.30
TreeSet       19.79 ms    15.49 ms    1.28
Stream Performance (39)
result discussion
• why is parallel LinkedList performance so bad ?
– hard to split
– needs 250_000 iterator’s next() invocations for the first split
  ▪ with ArrayList: just some index computation
• performance of the other collections is also not so great
– functionality applied to each element is not very CPU-expensive
  ▪ after JIT-compilation: cost of a single compare instruction
– iteration (element access) is relatively expensive (indirection !)
  ▪ but not CPU-expensive
– yet additional CPU power is all that parallel streams give us
Stream Performance (40)
result discussion (cont.)
• why is parallel int-array performance relatively good ?
– iteration (element access) is not so expensive (no indirection !)
Stream Performance (41)
CPU-expensive functionality
• back to slowSin()
– calculates a Taylor approximation of the sine function value
for the parameter passed to this method
– CPU-bound functionality
  ▪ needs only the initial parameter from memory
  ▪ calculation is based on its own (intermediate) results
– ideal to be sped up by parallel streams with multiple cores
Stream Performance (42)
benchmark parallel with slowSin()
• array / collection with 10_000 elements
– array:
Arrays.stream(ints) // .parallel()
      .mapToDouble(Sine::slowSin)
      .reduce(Double.NEGATIVE_INFINITY, (i, j) -> Math.max(i, j)); // not Double.MIN_VALUE (smallest positive double)
– collection:
myCollection.stream() // .parallelStream()
            .mapToDouble(Sine::slowSin)
            .reduce(Double.NEGATIVE_INFINITY, (i, j) -> Math.max(i, j));
Stream Performance (43)
results
              seq.        par.       seq./par.
int-Array     10.81 ms    6.03 ms    1.79
ArrayList     10.97 ms    6.10 ms    1.80
LinkedList    11.15 ms    6.25 ms    1.78
HashSet       11.15 ms    6.15 ms    1.81
TreeSet       11.14 ms    6.30 ms    1.77
Stream Performance (44)
result discussion
• performance improvements for all stream sources
– by a factor of ~ 1.8
  ▪ even for LinkedList
• the ~1.8 is the maximum improvement on our platform
– the remaining 0.2 are
  ▪ overhead of the parallel algorithm
  ▪ sequential bottlenecks (Amdahl’s law)
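As a rough cross-check of the ~1.8 figure, Amdahl's law can be solved for the parallelizable fraction. This is a back-of-the-envelope sketch that assumes n = 2 cores and lumps the algorithmic overhead into the sequential fraction; the symbols S (speedup), p (parallelizable fraction) and n (number of cores) are introduced here for illustration.

```latex
S = \frac{1}{(1 - p) + \frac{p}{n}}, \qquad n = 2,\ S \approx 1.8
\;\Longrightarrow\;
\frac{1}{1.8} = 1 - \frac{p}{2}
\;\Longrightarrow\;
p = 2\left(1 - \frac{1}{1.8}\right) \approx 0.89
```

Under these assumptions, roughly 11% of the measured work behaves as if it were sequential.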
Stream Performance (45)
sufficient size (without benchmark)
• stream source must have a sufficient size,
so that it benefits from parallel processing
• overhead increases with growing number of cores
– number of tasks ~ 4*number of cores
– (in most cases) not with the size of the stream source
• Doug Lea mentioned 10_000 for CPU-inexpensive functionality
– http://gee.cs.oswego.edu/dl/html/StreamParallelGuidance.html
• 500_000 and 10_000, respectively, in our examples
– size can be smaller for CPU-expensive functionality
Stream Performance (46)
dynamic overclocking (without benchmark)
• a modern multi-core CPU typically increases its
frequency when not all of its cores are active
– Intel calls this feature Turbo Boost
• benchmark sequential versus parallel stream
– seq. test might run with a dynamically overclocked CPU
– will this also happen in the real environment or only in the test?
• no issue with our test system
– too old
– no dynamic overclocking supported
Stream Performance (47)
agenda
• introduction
• loop vs. sequential stream
• sequential vs. parallel stream
– introduction
– stateless functionality
– stateful functionality
Stream Performance (48)
stateful functionality …
… with parallel streams / multiple threads boils down to
shared mutable state
• costs performance to handle this
– e.g. lock-free CAS, requires retries in case of collision
• traditionally not supported with sequences
– functional programming languages don’t have mutable types, and
– often no parallel sequences either
• new solutions/approaches in Java 8 streams
Stream Performance (49)
stateful functionality with Java 8 streams
• intermediate stateful operations, e.g. distinct()
– see javadoc: This is a stateful intermediate operation.
– shared mutable state handled by stream implementation (JDK)
• (terminal) operations that allow stateful functional
parameters, e.g.
forEach(Consumer<? super T> action)
– see javadoc: If the action accesses shared state, it is responsible
for providing the required synchronization.
– shared mutable state handled by user/client code
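What "handled by user/client code" means for forEach can be sketched as follows. The example is an assumption chosen for illustration (summing string lengths, class name made up): the action updates shared state, so the client code picks a thread-safe holder. A stateless reduction is shown for comparison.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical demo class, not part of the original slides.
final class SharedStateDemo {

    // Sums string lengths in a parallel forEach. The action mutates shared state,
    // so the client code supplies the synchronization: here an AtomicLong,
    // whose addAndGet is an atomic, lock-free update.
    static long totalLength(List<String> words) {
        AtomicLong total = new AtomicLong();
        words.parallelStream().forEach(s -> total.addAndGet(s.length()));
        return total.get();
    }

    // Same result without shared mutable state: a stateless reduction.
    static long totalLengthStateless(List<String> words) {
        return words.parallelStream().mapToLong(String::length).sum();
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("State", "of", "the", "Lambda", "Libraries", "Edition");
        System.out.println(totalLength(words));          // prints "32"
        System.out.println(totalLengthStateless(words)); // prints "32"
    }
}
```

Replacing the AtomicLong with a plain long[] counter would compile but could lose updates when several threads write at once.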
Stream Performance (50)
stateful functionality with Java 8 streams (cont.)
• stream’s overloaded method: collect()
– shared mutable state handled by stream implementation, and
– collector functionality
  ▪ standard collectors from Collectors (JDK)
  ▪ user-defined collector functionality (JDK + user/client code)
• don’t have time to discuss all situations
– only discuss distinct()
– shared mutable state handled by stream implementation (JDK)
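For comparison with forEach, here is a minimal sketch of collect() with a standard collector on a parallel stream. The grouping example and the class name are assumptions for illustration; the point is that the client code contains no synchronization at all.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical demo class, not part of the original slides.
final class CollectDemo {

    // Groups words by length on a parallel stream. The stream implementation and
    // the standard collector manage all intermediate containers and merge them;
    // the client code touches no shared mutable state.
    static Map<Integer, List<String>> byLength(List<String> words) {
        return words.parallelStream()
                    .collect(Collectors.groupingBy(String::length));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("of", "to", "the", "and", "State");
        System.out.println(byLength(words));
    }
}
```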
Stream Performance (51)
distinct()
• element goes to the result stream,
if it hasn’t already appeared before
– appeared before, in terms of equals()
– shared mutable state: elements already in the result stream
  ▪ have to compare the current element to each element of the output stream
• parallel introduces a barrier (algorithmic overhead)
.parallelStream().statelessOps().distinct().statelessOps().terminal();
– two alternative algorithms for distinct()
Stream Performance (52)
two algorithms for parallel distinct()
• ordering + distinct()
– normally elements go to the next stage, in the same order in which
they appear for the first time in the current stage
• javadoc from distinct()
– Removing the ordering constraint with unordered() may result in
significantly more efficient execution for distinct() in parallel
pipelines, if the semantics of your situation permit.
• two different algorithms for parallel distinct()
– one for ordered streams + one for unordered streams
Stream Performance (53)
benchmark with distinct()
• Integer[100_000], filled with 50_000 distinct values
// sequential
Arrays.stream(integers).distinct().count();
// parallel ordered
Arrays.stream(integers).parallel().distinct().count();
// parallel unordered
Arrays.stream(integers).parallel().unordered().distinct().count();
• results:
seq.       par. ordered    par. unordered
6.39 ms    34.09 ms        9.1 ms
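The semantic difference between the two parallel variants can be sketched as follows; the class and method names are made up for illustration. The ordered variant has to deliver first occurrences in encounter order, while the unordered variant only promises the same set of elements.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical demo class, not part of the original slides.
final class DistinctDemo {

    // Ordered parallel distinct(): first occurrences keep their encounter order.
    static List<Integer> distinctOrdered(Integer[] values) {
        return Arrays.stream(values).parallel().distinct().collect(Collectors.toList());
    }

    // Unordered parallel distinct(): same elements, but no order guarantee,
    // so the result is returned as a set.
    static Set<Integer> distinctUnordered(Integer[] values) {
        return Arrays.stream(values).parallel().unordered().distinct().collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        Integer[] values = { 3, 1, 3, 2, 1, 2, 3 };
        System.out.println(distinctOrdered(values));   // prints "[3, 1, 2]"
        System.out.println(distinctUnordered(values)); // same three elements as a set
    }
}
```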
Stream Performance (54)
benchmark with distinct() + slowSin()
• Integer[10_000], filled with numbers 0 … 9999
– after the mapping 5004 distinct values
Arrays.stream(newIntegers) //.parallel().unordered()
      .map(i -> new Double(2200 * Sine.slowSin(i * 0.001)).intValue())
      .distinct()
      .count();
• results:
seq.        par. ordered    par. unordered
11.59 ms    6.83 ms         6.81 ms
Stream Performance (55)
sequential vs. parallel stream / re-cap
to benefit from parallel stream usage …
• … stream source …
– must have sufficient size
– should be easy to split
• … operations …
– should be CPU-expensive
– should not be stateful
Stream Performance (56)
advice
• benchmark on target platform !
• previous benchmark:
– find largest element, LinkedList, 500_000 elements
seq.        par.        seq./par.
12.74 ms    19.57 ms    0.65
• what if we use a quad-core-CPU (Intel i5-4590) ?
– will the parallel result be worse, better, … better than seq. … ?
seq.        par.        seq./par.
5.24 ms     4.84 ms     1.08
Stream Performance (57)
authors
Angelika Langer
Klaus Kreft
http://www.AngelikaLanger.com
Stream Performance (58)
stream performance
Q & A

Tuning the HotSpot JVM Garbage CollectorsTuning the HotSpot JVM Garbage Collectors
Tuning the HotSpot JVM Garbage Collectors
 
Jouney of process safety (2)
Jouney of  process safety (2)Jouney of  process safety (2)
Jouney of process safety (2)
 
ZGC-SnowOne.pdf
ZGC-SnowOne.pdfZGC-SnowOne.pdf
ZGC-SnowOne.pdf
 
Garbage First Garbage Collector (G1 GC) - Migration to, Expectations and Adva...
Garbage First Garbage Collector (G1 GC) - Migration to, Expectations and Adva...Garbage First Garbage Collector (G1 GC) - Migration to, Expectations and Adva...
Garbage First Garbage Collector (G1 GC) - Migration to, Expectations and Adva...
 
JVM Performance Tuning
JVM Performance TuningJVM Performance Tuning
JVM Performance Tuning
 
JVM memory management & Diagnostics
JVM memory management & DiagnosticsJVM memory management & Diagnostics
JVM memory management & Diagnostics
 
GC Tuning Confessions Of A Performance Engineer - Improved :)
GC Tuning Confessions Of A Performance Engineer - Improved :)GC Tuning Confessions Of A Performance Engineer - Improved :)
GC Tuning Confessions Of A Performance Engineer - Improved :)
 
Tuning Java for Big Data
Tuning Java for Big DataTuning Java for Big Data
Tuning Java for Big Data
 
Pimp my gc - Supersonic Scala
Pimp my gc - Supersonic ScalaPimp my gc - Supersonic Scala
Pimp my gc - Supersonic Scala
 
Gopher in performance_tales_ms_go_cracow
Gopher in performance_tales_ms_go_cracowGopher in performance_tales_ms_go_cracow
Gopher in performance_tales_ms_go_cracow
 
Advances in EM Simulations
Advances in EM SimulationsAdvances in EM Simulations
Advances in EM Simulations
 
Java gc and JVM optimization
Java gc  and JVM optimizationJava gc  and JVM optimization
Java gc and JVM optimization
 
Jvm problem diagnostics
Jvm problem diagnosticsJvm problem diagnostics
Jvm problem diagnostics
 
19th Session.pptx
19th Session.pptx19th Session.pptx
19th Session.pptx
 
Care and Feeding of Large Scale Graphite Installations - DevOpsDays Austin 2013
Care and Feeding of Large Scale Graphite Installations - DevOpsDays Austin 2013Care and Feeding of Large Scale Graphite Installations - DevOpsDays Austin 2013
Care and Feeding of Large Scale Graphite Installations - DevOpsDays Austin 2013
 
Target: Performance Tuning Cassandra at Target
Target: Performance Tuning Cassandra at TargetTarget: Performance Tuning Cassandra at Target
Target: Performance Tuning Cassandra at Target
 
OpenFOAM benchmark for EPYC server cavity flow small
OpenFOAM benchmark for EPYC server cavity flow smallOpenFOAM benchmark for EPYC server cavity flow small
OpenFOAM benchmark for EPYC server cavity flow small
 
Are your v8 garbage collection logs speaking to you?Joyee Cheung -Alibaba Clo...
Are your v8 garbage collection logs speaking to you?Joyee Cheung -Alibaba Clo...Are your v8 garbage collection logs speaking to you?Joyee Cheung -Alibaba Clo...
Are your v8 garbage collection logs speaking to you?Joyee Cheung -Alibaba Clo...
 
Why My Streaming Job is Slow - Profiling and Optimizing Kafka Streams Apps (L...
Why My Streaming Job is Slow - Profiling and Optimizing Kafka Streams Apps (L...Why My Streaming Job is Slow - Profiling and Optimizing Kafka Streams Apps (L...
Why My Streaming Job is Slow - Profiling and Optimizing Kafka Streams Apps (L...
 
How does the Cloud Foundry Diego Project Run at Scale?
How does the Cloud Foundry Diego Project Run at Scale?How does the Cloud Foundry Diego Project Run at Scale?
How does the Cloud Foundry Diego Project Run at Scale?
 

Garbage Collection Pause Times - Angelika Langer

  • 2. objective
      • what causes long GC pauses?
      • what does GC do during a STW pause?
      • how can I reduce the pause time?
      • explore HotSpot JVM's GC algorithms
      • point out reasons for long pauses
      • discuss tuning options
      • glance at alternative GC in other JVMs
  • 3. speaker's relationship to topic
      • independent trainer / consultant / author
          – teaching Java for ~20 years
          – curriculum of some challenging seminars
          – JCP observer and Java champion since 2005
          – co-author of "Effective Java" column
          – author of Java Generics FAQ and Lambda Tutorial & Reference
  • 4. garbage collection
      • purpose
          – make memory occupied by unreachable objects available for subsequent memory allocation
          – in order to allow for 24/7 service with finite memory resources
      • involves several activities
          – garbage detection
          – garbage elimination
  • 5. cost of garbage collection
      • stop-the-world (STW) phases stop all application threads
          – problem for applications with time constraints, e.g. user interaction, SLA applications, ...
      • concurrent GC phases steal CPU cycles from application
          – problem for applications with performance / throughput constraints, e.g. transactions per second, ...
  • 6. trade-off
      [diagram: three competing goals: memory footprint, throughput, pause time]
  • 7. what causes long pause times?
      • what does GC do during a STW pause?
      • how can I reduce the pause time?
      • agenda for this talk
          – explore the HotSpot JVM's GC algorithms
          – point out obvious (and not so obvious) reasons for long pauses
          – discuss tuning options
          – glance at GC in other JVMs
  • 8. agenda
      • classic HotSpot GC algorithms
          – parallel GC
          – concurrent GC
          – G1
      • reasons for long pauses
      • pause time tuning
      • alternative garbage collectors
          – Shenandoah
          – Azul & JRockit
  • 9. parallel GC
      • a generational collector
          – based on the assumption that most objects die young
  • 10. parallel GC (cont.)
      • organizes heap into different areas (generations)
      • uses different algorithms per generation
      [diagram: heap layout along the object-lifetime axis: young generation (eden, survivor spaces), then old (tenured) generation]
  • 11. minor GC
      • copy algorithm
          – scans all references into the young generation
          – copies all reachable objects (survivors) into survivor space
          – also handles promotion to old gen
          – updates references to relocated objects
          – frees the entire young generation en bloc
      • performed frequently
          – by multiple GC threads in parallel
  • 12. minor GC
      • upside
          – eden is an empty block of memory afterwards: very efficient subsequent memory allocation
          – no fragmentation: survivor space is compact
      • downside
          – stop-the-world pause proportional to number of survivors
          – higher footprint: needs free space as destination for copying
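The copy algorithm from slides 11 and 12 can be illustrated with a toy model. The following is a minimal sketch, not HotSpot's implementation: the class `CopyCollectorSketch`, the `Obj` type and the list standing in for the survivor space are hypothetical names introduced only for illustration. It shows why the work, and hence the pause, grows with the number of survivors: dead objects are never visited.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

public class CopyCollectorSketch {
    /** A toy heap object: a name plus outgoing references (hypothetical model). */
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        Obj(String name) { this.name = name; }
    }

    /**
     * Copies every object reachable from the roots into a fresh "survivor" list
     * and rewrites the roots and all copied references to point at the copies.
     * Unreachable objects are never touched.
     */
    static List<Obj> minorGc(List<Obj> roots) {
        Map<Obj, Obj> forwarding = new IdentityHashMap<>(); // old object -> its copy
        List<Obj> survivorSpace = new ArrayList<>();
        Deque<Obj> toScan = new ArrayDeque<>();

        for (int i = 0; i < roots.size(); i++) {
            roots.set(i, copy(roots.get(i), forwarding, survivorSpace, toScan));
        }
        while (!toScan.isEmpty()) {
            Obj copied = toScan.poll();
            // Update references to relocated objects, copying them on first visit.
            for (int i = 0; i < copied.refs.size(); i++) {
                copied.refs.set(i, copy(copied.refs.get(i), forwarding, survivorSpace, toScan));
            }
        }
        return survivorSpace;
    }

    private static Obj copy(Obj old, Map<Obj, Obj> forwarding,
                            List<Obj> survivorSpace, Deque<Obj> toScan) {
        Obj existing = forwarding.get(old);
        if (existing != null) {
            return existing; // already relocated
        }
        Obj fresh = new Obj(old.name);
        fresh.refs.addAll(old.refs); // still old references; fixed when 'fresh' is scanned
        forwarding.put(old, fresh);
        survivorSpace.add(fresh);
        toScan.add(fresh);
        return fresh;
    }

    /** Builds a tiny graph (a <-> b reachable, d unreachable) and collects it. */
    public static int demoSurvivorCount() {
        Obj a = new Obj("a");
        Obj b = new Obj("b");
        Obj d = new Obj("d"); // garbage: nothing reachable refers to it
        a.refs.add(b);
        b.refs.add(a);
        d.refs.add(a);
        List<Obj> roots = new ArrayList<>(List.of(a));
        return minorGc(roots).size();
    }

    public static void main(String[] args) {
        System.out.println("survivors: " + demoSurvivorCount());
    }
}
```

In this model "freeing the young generation en bloc" corresponds to simply dropping the old object graph after the copy; no per-object work is done for dead objects.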
  • 13. full GC
      • mark-and-compact algorithm
          – follows all references into the heap
          – marks all reachable objects
          – sweeps all dead objects (i.e. marks their memory as "free")
          – compacts the old generation
          – updates references to relocated objects
      • performed rarely
          – by multiple GC threads in parallel
  • 14. full GC
      • upside
          – no fragmentation
      • downside
          – stop-the-world pause proportional to number of survivors (in marking phase) and size of heap (in compaction phase)
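The compaction step of slides 13 and 14 can be sketched in the same toy style. This is an illustrative assumption, not the HotSpot algorithm: the heap is modelled as a `String[]` of slots, the result of the marking phase as a `boolean[]`, and `CompactionSketch` is a hypothetical name. The loop visits every slot, live or dead, which is one way to see why this phase scales with the size of the heap rather than with the number of survivors.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

public class CompactionSketch {
    /**
     * Slides all marked (live) slots to the front of the heap array and clears the rest.
     * Returns the forwarding table (old index -> new index), which a real collector
     * would use to update references to relocated objects.
     */
    public static Map<Integer, Integer> compact(String[] heap, boolean[] marked) {
        Map<Integer, Integer> forwarding = new LinkedHashMap<>();
        int free = 0; // next free slot in the compacted prefix
        for (int i = 0; i < heap.length; i++) { // walks the whole heap
            if (heap[i] != null && marked[i]) {
                forwarding.put(i, free);
                heap[free] = heap[i];
                free++;
            }
        }
        Arrays.fill(heap, free, heap.length, null); // everything behind the live prefix is free
        return forwarding;
    }

    public static void main(String[] args) {
        String[] heap = {"a", "b", "c", "d"};
        boolean[] marked = {true, false, false, true};
        Map<Integer, Integer> forwarding = compact(heap, marked);
        System.out.println(Arrays.toString(heap) + " " + forwarding);
    }
}
```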
  • 15. inter-generational references
      • generational GC has a downside
          – works on a subset of the heap (young gen in minor GC)
      • all references into young must be scanned
          – root references (from outside the heap into the young gen): stack variables, static fields, references from JIT compiled code, etc.
          – inter-generational references (from old gen into young gen): require write barriers
  • 16. old-to-young references
      [diagram: roots and dirty card table entries (*) are the starting points for young generation marking; areas shown: young, old, card table]
  • 17. write barriers
      • a barrier is additional code
          – executed when a reference is modified
          – must catch when application creates an old-to-young reference
          – sets a dirty bit in a card table
      • card table is later processed in GC pause
          – to find the actual inter-generational references
  • 18. cost of generational GC
      • extra effort for inter-generational references
          – slows down application (due to write barrier)
          – increases pauses (due to card table processing)
      • floating garbage
          – dead (not yet collected) objects in old gen might prevent collection of dead objects in young gen
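The write barrier and card table of slides 16 to 18 can be modelled roughly as follows. This is a sketch under stated assumptions: addresses are plain integers, the old generation occupies addresses below `OLD_GEN_END`, cards cover 512 bytes, and all names (`CardTableSketch`, `storeReference`, `dirtyCards`) are hypothetical. A real barrier is emitted by the JIT compiler and looks different; the point here is only that every reference store pays a small extra cost, and that the pause later has to scan the dirty cards.

```java
import java.util.ArrayList;
import java.util.List;

public class CardTableSketch {
    static final int CARD_SHIFT = 9;     // 2^9 = 512-byte cards (assumption for this sketch)
    static final int OLD_GEN_END = 4096; // toy layout: old gen = [0, 4096), young gen above

    // One flag per card of the old generation.
    private final boolean[] dirty = new boolean[OLD_GEN_END >> CARD_SHIFT];

    static boolean isYoung(int address) {
        return address >= OLD_GEN_END;
    }

    /**
     * Write barrier: the extra code executed whenever a reference field is modified.
     * The store itself is omitted; only the bookkeeping is shown.
     */
    public void storeReference(int fieldAddress, int targetAddress) {
        if (!isYoung(fieldAddress) && isYoung(targetAddress)) {
            // An old-to-young reference was created: remember the card holding the field.
            dirty[fieldAddress >> CARD_SHIFT] = true;
        }
    }

    /** What a minor GC would process in its pause: the cards that may hold old-to-young references. */
    public List<Integer> dirtyCards() {
        List<Integer> cards = new ArrayList<>();
        for (int card = 0; card < dirty.length; card++) {
            if (dirty[card]) {
                cards.add(card);
            }
        }
        return cards;
    }

    public static void main(String[] args) {
        CardTableSketch heap = new CardTableSketch();
        heap.storeReference(700, 5000); // old -> young: dirties the card containing address 700
        heap.storeReference(100, 200);  // old -> old: ignored
        heap.storeReference(5000, 100); // young -> old: ignored
        System.out.println("dirty cards: " + heap.dirtyCards());
    }
}
```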
  • 19. parallel GC [figure: timeline of parallel GC; each minor GC is a STW copy phase, the full GC is a STW pause consisting of marking, summary, and compaction]
  • 20. parallel GC - tuning • pause time proportional to number of survivors and size of heap – large heap => long pauses => not many tuning options • some tuning ideas – increase parallelism: increase number of parallel GC threads, provided that there are idle CPUs available – let objects die in young gen (reason: GC in old gen is more expensive than in young gen): increase young gen size, survivor size, tenuring threshold, ...
  • 21. some useful VM flags -XX:+UseParallelGC -XX:+UseParallelOldGC – select parallel GC on young and old gen -XX:ParallelGCThreads=<number> – specify number of GC threads -Xmn<value> or -XX:NewRatio=<ratio> – specify size of young gen -XX:SurvivorRatio=<ratio> – specify size of survivor spaces
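To check which collectors a given set of flags actually selected, the standard `java.lang.management` API can be queried from inside the application. The sketch below is only a starting point: the class name `GcReport` is made up, the bean names depend on the JVM and the chosen collector, and `getCollectionTime()` reports accumulated collection time, which is not the same as pause time for concurrent collectors.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Prints the active garbage collectors with their collection counts and times. */
public class GcReport {
    static String report() {
        StringBuilder sb = new StringBuilder();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(gc.getName())
              .append(": collections=").append(gc.getCollectionCount())
              .append(", total time=").append(gc.getCollectionTime()).append(" ms")
              .append(System.lineSeparator());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Output differs per JVM and per collector selected on the command line.
        System.out.print(report());
    }
}
```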
  • 22. agenda • classic HotSpot GC algorithms – parallel GC – concurrent GC – G1 • reasons for long pauses • pause time tuning • alternative garbage collectors – Shenandoah – Azul & JRockit
  • 23. concurrent GC • alternative algorithm on old generation – STW copy algorithm on young generation – concurrent mark-and-sweep (CMS) algorithm on old generation • CMS has several phases – initial marking phase (STW) – marking phase (concurrent) – final remarking phase (STW) – sweep phase (concurrent)
  • 24. concurrent marking • marking must identify reachable objects – while application is running and modifies the reference graph • uses tricolor algorithm – requires write barriers • concurrent marking is not exact – snapshot-at-the-beginning (SATB) marking, i.e. objects stay alive if they were reachable at the beginning – very conservative; creates a lot of floating garbage
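The tricolor idea can be illustrated with a small, single-threaded sketch: white objects are unvisited, grey objects are reached but not yet scanned, black objects are fully scanned. The class and field names are hypothetical, and the sketch deliberately leaves out the hard part, namely the write barriers needed when the application modifies the graph while marking runs.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

/** Single-threaded tricolor marking over a toy object graph (no concurrency, no barriers). */
public class TricolorSketch {
    enum Color { WHITE, GREY, BLACK }

    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        Color color = Color.WHITE;
        Node(String name) { this.name = name; }
    }

    /** Marks everything reachable from the roots; whatever stays WHITE is garbage. */
    static void mark(List<Node> roots) {
        Deque<Node> grey = new ArrayDeque<>();
        for (Node r : roots) {
            r.color = Color.GREY;
            grey.push(r);
        }
        while (!grey.isEmpty()) {
            Node n = grey.pop();
            for (Node child : n.refs) {
                if (child.color == Color.WHITE) {
                    child.color = Color.GREY;   // reached, but its references are not scanned yet
                    grey.push(child);
                }
            }
            n.color = Color.BLACK;              // all outgoing references have been visited
        }
    }

    public static void main(String[] args) {
        Node a = new Node("a"), b = new Node("b"), c = new Node("c");
        a.refs.add(b);                          // c is not referenced by anyone
        mark(Arrays.asList(a));
        System.out.println(a.color + " " + b.color + " " + c.color); // BLACK BLACK WHITE
    }
}
```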
  • 25. concurrent sweep • sweeping adds free memory cells to free lists – allocation in old gen requires lookup in free lists => more expensive allocation – increases cost of minor GC because promotion is more expensive • sweeping leads to fragmentation – higher risk of promotion failure – if large objects are promoted • fallback to full GC
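Why free lists make allocation more expensive and why fragmentation can cause a promotion failure can be seen in a minimal first-fit allocator. This is a sketch under simplifying assumptions (one unsorted list, no coalescing, hypothetical names), not the CMS implementation.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Minimal first-fit free-list allocator, illustrating lookup cost and fragmentation. */
public class FreeListSketch {
    static final class Chunk {
        int offset;
        int size;
        Chunk(int offset, int size) { this.offset = offset; this.size = size; }
    }

    private final List<Chunk> free = new ArrayList<>();

    /** What sweeping does: hand a reclaimed memory cell back to the free list. */
    void addFree(int offset, int size) { free.add(new Chunk(offset, size)); }

    int totalFree() {
        int sum = 0;
        for (Chunk c : free) sum += c.size;
        return sum;
    }

    /** First fit: linear search; returns the start offset, or -1 if no single chunk is big enough. */
    int allocate(int size) {
        for (Iterator<Chunk> it = free.iterator(); it.hasNext(); ) {
            Chunk c = it.next();
            if (c.size >= size) {
                int start = c.offset;
                c.offset += size;
                c.size -= size;
                if (c.size == 0) it.remove();
                return start;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        FreeListSketch oldGen = new FreeListSketch();
        oldGen.addFree(0, 100);
        oldGen.addFree(300, 100);
        oldGen.addFree(600, 100);
        System.out.println(oldGen.allocate(80));   // 0
        System.out.println(oldGen.totalFree());    // 220
        System.out.println(oldGen.allocate(150));  // -1: enough free memory in total, but fragmented
    }
}
```

The last call is the analogue of a promotion failure: 220 units are free, yet no chunk can hold a 150-unit object, so a compacting full GC would be needed.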
  • 26. cost of CMS • increased minor GC pause time – due to more expensive allocation in old gen via free lists • substantially more floating garbage – due to concurrent SATB marking • extra effort for tricolor algorithm – slows down application via write barriers • long turn-around times – complex algorithm takes longer until actual memory reclaim • unreliable – fallback to full GC in case of fragmentation
  • 27. cost of CMS • reduced pause time (on average) • at the expense of – higher memory consumption – lower throughput
  • 28. concurrent GC [figure: timeline of CMS; STW pauses for minor GCs, initial marking, and final remarking; concurrent marking and concurrent sweep run alongside the application in between]
  • 29. CMS - tuning • pause time depends on – amount of work left for remarking phase, i.e. number of grey cells created by application's activities – degree of fragmentation, i.e. fallback to full GC • some tuning ideas – get more done concurrently: increase number of threads in concurrent phases – reduce CMS's workload: let objects die in young gen instead of old gen; increase young gen size, survivor size, tenuring threshold, ... – start marking cycles earlier to avoid fallback to full GC: lower the occupancy threshold that initiates the cycle
  • 30. some useful VM flags -XX:+UseConcMarkSweepGC – select CMS on old gen (automatically uses parallel young GC) -XX:CMSInitiatingOccupancyFraction=<percent> -XX:+UseCMSInitiatingOccupancyOnly – lower threshold that starts CMS cycle -XX:ConcGCThreads=<n> – specify number of threads in concurrent phases
  • 31. agenda • classic HotSpot GC algorithms – parallel GC – concurrent GC – G1 • reasons for long pauses • pause time tuning • alternative garbage collectors – Shenandoah – Azul & JRockit
  • 32. "garbage first" (G1) GC • a generational garbage collector – organizes the heap into regions of identical size – copy algorithm => no fragmentation [figure: heap regions labeled young, survivor, and old; young mode]
  • 33. mixed mode collections • builds collection set dynamically – collection set = regions to be included into next GC • two modes – young: all young regions are collected – mixed: old regions with a lot of garbage are included [figure: heap regions labeled young, survivor, and old; mixed mode with the collection set highlighted]
  • 34. remembered sets • partial GC (on subset of heap) requires – maintenance of references into the collection set – all regions have a remembered set • remembered set (RS) – list of references from outside the region into the region
  • 35. remembered set (cont.) • RS maintenance requires write barriers – must catch creation of inter-regional references – RS update tasks are put into a work queue, which is processed concurrently by background threads, or when STW GC pause starts • simplification (in order to reduce RS overhead) – references originating from young regions are not recorded in RS – instead: all young regions are included into each GC
  • 36. concurrent marking • G1 performs a concurrent SATB marking – similar to CMS's marking – initial marking phase piggybacked on young GC – no sweep phase – instead a concurrent cleanup phase reclaims entirely empty old regions en bloc • marking information used for internal statistics – GC efficiency calculation, i.e. amount of garbage per old region – liveness info, i.e. are origins of RS entries still alive
  • 37. G1 pauses • only two main GC parameters: -XX:GCPauseIntervalMillis=500 -XX:MaxGCPauseMillis=200 [figure: timeline alternating "application running" and "GC pause" segments, annotated with the spans GCPauseIntervalMillis and MaxGCPauseMillis]
  • 38. G1 pauses • evacuation pause in young mode – proportional to number of young regions and number of survivors therein – also depends on cost of pending RS updates – self-adjusting: G1 tries to create only as many young regions as can be collected within pause time goal; does not always work out • evacuation pause in mixed mode – depends on pause time goal – self-adjusting: G1 includes only old regions with lots of garbage and only as many as fit into the pause time goal; does not always work out
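How a collection set might be composed under a pause time goal can be sketched as a greedy selection. Everything here is a simplifying assumption rather than G1's actual heuristics: the fixed cost of 10 microseconds per live KB, the efficiency formula (garbage reclaimed per unit of predicted cost), and all names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Greedy "garbage first" selection of regions under a pause budget (toy cost model). */
public class CollectionSetSketch {
    static final long MICROS_PER_LIVE_KB = 10;   // assumed cost of copying 1 KB of survivors

    static final class Region {
        final String name;
        final boolean young;
        final int liveKb;
        final int garbageKb;
        Region(String name, boolean young, int liveKb, int garbageKb) {
            this.name = name; this.young = young; this.liveKb = liveKb; this.garbageKb = garbageKb;
        }
        long predictedMicros() { return liveKb * MICROS_PER_LIVE_KB; }
        double efficiency() { return garbageKb / (double) Math.max(1, predictedMicros()); }
    }

    static List<String> choose(List<Region> regions, long pauseGoalMicros) {
        List<String> cset = new ArrayList<>();
        List<Region> old = new ArrayList<>();
        long budget = pauseGoalMicros;
        for (Region r : regions) {
            if (r.young) {                       // all young regions are always included
                cset.add(r.name);
                budget -= r.predictedMicros();
            } else {
                old.add(r);
            }
        }
        // Mixed mode: old regions with the most garbage per unit of cost come first.
        old.sort(Comparator.comparingDouble((Region r) -> r.efficiency()).reversed());
        for (Region r : old) {
            if (r.predictedMicros() <= budget) { // only as many as fit into the pause time goal
                cset.add(r.name);
                budget -= r.predictedMicros();
            }
        }
        return cset;
    }

    public static void main(String[] args) {
        List<Region> heap = new ArrayList<>();
        heap.add(new Region("Y1", true, 200, 800));
        heap.add(new Region("O1", false, 100, 900));
        heap.add(new Region("O2", false, 800, 200));
        heap.add(new Region("O3", false, 300, 700));
        System.out.println(choose(heap, 7_000)); // [Y1, O1, O3]
    }
}
```

Because the young regions are included unconditionally, a young generation that is too large can overshoot the goal on its own, which is one reason why the self-adjustment "does not always work out".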
  • 39. G1 pauses (cont.) • full evacuation pause – includes all regions into collection set – if heap is almost full and cannot be expanded any further – proportional to number of regions and survivors • remarking – amount of work left for remarking phase • cleanup – number of empty old regions
  • 40. cost of G1 • extra effort for remembered sets – slows down application via write barriers – background GC threads for concurrent RS update – increased pause time for RS update in evacuation pause • long turn-around times – complex algorithm takes longer until actual memory reclaim • some amount of floating garbage – due to concurrent SATB marking • unreliable – pause time goal not guaranteed
  • 41. cost of G1 • upside – reduced pause time (compared to parallel GC) – no fragmentation (compared to CMS) – fully self-adapting • downside – higher memory consumption – lower throughput
  • 42. G1 GC [figure: timeline of G1; STW pauses for young GC, initial marking, final remarking, and mixed GC (evacuation); concurrent marking and concurrent cleanup run alongside the application in between]
  • 43. G1 - tuning • some tuning ideas – avoid over-tuning: set realistic pause time goals – start marking cycles earlier to avoid full GC: lower the occupancy threshold that initiates the cycle – use more GC threads: increase number of threads in concurrent and STW phases
  • 44. some useful VM flags -XX:+UseG1GC – select G1 -XX:GCPauseIntervalMillis=<ms> -XX:MaxGCPauseMillis=<ms> – specify pause time and interval goals -XX:InitiatingHeapOccupancyPercent=<percent> – lower threshold that starts marking cycle -XX:ConcGCThreads=<n> -XX:ParallelGCThreads=<n> – specify number of threads in concurrent and STW phases
  • 45. agenda • classic HotSpot GC algorithms • less obvious reasons for long pauses – humongous objects – soft/weak references – trace output • alternative garbage collectors – Shenandoah – Oracle JRockit – Azul "C4"
  • 46. Shenandoah • an alternative GC algorithm for OpenJDK – (project name: Shenandoah) – submitted by Red Hat • goal – manage 100GB+ heaps with < 10ms pause times – pause times proportional to size of root set, not size of heap
  • 47. Shenandoah • very similar to G1 – organized into regions – concurrent SATB marking – dynamically composed collection set based on GC efficiency – ... • key difference – no notion of generations: age does not matter, only GC efficiency does – concurrent evacuation: no STW pause for copying survivors; no remembered sets; survivors are copied individually on write access
  • 48. Shenandoah phases • concurrent marking – visit all reachable objects (starting with root references) – needs a STW initial & remark pause (just like CMS and G1 do) – afterwards there is liveness info for all regions • concurrent evacuation – select "garbage first" regions for evacuation ("from" space) – select free target regions for evacuation ("to" space) – scan reachable objects in selected "from" regions – add a forwarding pointer to each object that must be relocated – but do not copy it (yet)
  • 49. forwarding pointer [figure: a "from" region containing live, dead, and survivor objects next to a "to" region; the survivor's forwarding pointer designates its new location]
  • 50. Shenandoah phases (cont.) • concurrent evacuation (cont.) – first write access to survivor in "from" creates copy in "to" – subsequent read access is redirected to copy – references to relocated object are updated in next marking phase – afterwards all "from" regions are reclaimed
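The copy-on-write evacuation described on these slides can be modeled in a few lines. The sketch is single-threaded and purely illustrative: the names are invented, a real collector installs the forwarding pointer atomically, and objects carry more than one field.

```java
/** Toy model of a forwarding pointer with copy-on-write evacuation (single-threaded). */
public class ForwardingSketch {
    static final class Obj {
        int value;
        boolean inFromRegion;   // true if the object sits in a region selected for evacuation
        Obj forwardee = this;   // forwarding pointer; points to the object itself until copied
        Obj(int value) { this.value = value; }
    }

    /** Every access goes through the forwarding pointer. */
    static Obj resolve(Obj o) { return o.forwardee; }

    static int read(Obj o) { return resolve(o).value; }

    /** The first write to a survivor in "from" creates the copy in "to"; the writer does the copying. */
    static void write(Obj o, int newValue) {
        Obj target = resolve(o);
        if (target.inFromRegion) {
            Obj copy = new Obj(target.value);   // the new location in the "to" region
            target.forwardee = copy;            // later accesses are redirected to the copy
            target = copy;
        }
        target.value = newValue;
    }

    public static void main(String[] args) {
        Obj survivor = new Obj(1);
        survivor.inFromRegion = true;
        System.out.println(read(survivor));                    // 1, read from the original
        write(survivor, 2);                                    // triggers the copy
        System.out.println(read(survivor));                    // 2, read from the copy
        System.out.println(survivor.forwardee != survivor);    // true
    }
}
```

The old reference `survivor` keeps working after the copy because each access is redirected; the reference itself is only updated in the next marking phase.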
  • 51. copy on write (COW) [figure: "from" region with live, dead, and survivor objects; on write access the survivor is copied to its new location in the "to" region]
  • 52. after next marking cycle [figure: references now point to the copy in the "to" region; the "from" region with the dead objects and the old survivor can be reclaimed]
  • 53. Shenandoah [figure: timeline of Shenandoah; short STW pauses for initial marking and final remarking; concurrent marking, concurrent evacuation (copying), and memory reclaim run alongside the application]
  • 54. evaluation • upside – few short STW pauses – no remembered set overhead • downside – increased object size (due to forwarding pointer) – more expensive write barriers: they trigger object copying (note: application threads, not GC threads, create the copies) – temporarily more expensive read access due to indirection via forwarding pointer – floating garbage – long turn-around time
  • 55. agenda • classic HotSpot GC algorithms • less obvious reasons for long pauses – humongous objects – soft/weak references – trace output • alternative garbage collectors – Shenandoah – Oracle JRockit – Azul "C4"
  • 56. JRockit GC • GC functionality different from HotSpot – more modular • ‘mix & match’ – heap partitioning – GC algorithm – compaction • can switch GC algorithms and strategies at runtime – at least to a certain extent
  • 57. heap partitioning • un-partitioned – single heap • generational – two areas: nursery (~ young generation) + old generation – nursery contains keep area: most recently allocated objects; not copied to old gen during young GC; avoids premature promotion of short-lived objects
  • 58. young gen collector • collects nursery (if present) • scavenger GC – copies all live objects from nursery to old generation (does not touch keep area) – stop-the-world – uses all available CPU cores – resembles HotSpot's parallel young GC (w/o survivor space)
  • 59. old gen collector • collect old generation (gen) or entire heap (single) • algorithm split into mark-and-sweep GC + compaction strategy – select GC algorithm and compaction strategy independently
  • 60. two mark-and-sweep GC algorithms • parallel mark-and-sweep GC – stop-the-world – uses all available CPU cores – resembles HotSpot's parallel old GC (w/ compaction) • concurrent mark-and-sweep GC – "mostly concurrent" – short stop-the-world-pauses during marking and sweeping – resembles HotSpot's CMS (but HotSpot sweeps concurrently, w/o STW pauses)
  • 61. compaction • partial compaction (on only a part of the heap) – one or two windows traveling the heap – window size is adjustable – external or internal compaction • compaction runs as a stop-the-world pause – during sweep phase [figure: heap from bottom to top with the external and internal compaction windows]
  • 62. -XgcPrio:deterministic - JRockit Real Time • GC split up into work packets – e.g. a compaction job for part of the heap • if it takes too long, throw away the work packet – re-try later, re-using partial results if possible
  • 63. agenda • classic HotSpot GC algorithms • less obvious reasons for long pauses – humongous objects – soft/weak references – trace output • alternative garbage collectors – Shenandoah – Oracle JRockit – Azul "C4"
  • 64. Azul "C4" • commercial JVM with a no-pause collector named "C4" – "C4" = Continuously Concurrent Compacting Collector • special purpose JVM (the so-called Zing platform) – runs virtualized on top of the actual OS and includes its own operating environment • C4 algorithm makes massive and rapid changes to virtual memory mappings – regular Linux has technical remapping limitations – Zing has its own virtual memory subsystem that supports memory remaps, unmaps, etc. as needed for "C4"
  • 65. Azul "C4" - how does it differ ? • same core mechanism used for both generations – concurrent mark-compact – old and young generation collectors run simultaneously and concurrently with the application threads [figure: old gen mark-compact and young gen mark-compact running side by side] • algorithm has 3 phases – mark – relocate – remap
  • 66. "C4" phases • mark phase – trace generation’s live set by starting from roots: mark all encountered objects as live; mark all encountered object references as marked through • relocate phase – compact memory by relocating live objects into contiguously populated target pages: free sparse pages based on liveness totals collected during previous mark phase; each “from” page is protected, its objects are relocated to new “to” pages – forwarding information is stored outside the “from” page – “from” page’s physical memory is immediately recycled
  • 67. "C4" phases (cont.) • remap phase – remapping occurs when mutator threads encounter stale references to relocated objects: stale references are corrected to point to current object address; remap phase is combined with next GC cycle’s mark phase – at the end of remap phase, no stale references will exist: virtual addresses associated with relocated “from” pages can be safely recycled – “no hurry” to finish remap phase: there are no physical resources being held
  • 68. combined mark-remap phase [figure: consecutive GC cycles, each with the phases mark, relocate, remap; the remap phase of one cycle is combined with the mark phase of the next]
  • 69. GC comparison (per collector: young / old / fallback) • Oracle HotSpot ParallelGC – young: STW copy – old: STW mark-compact • Oracle HotSpot CMS – young: STW copy – old: mostly concurrent mark-sweep – fallback: STW mark-compact • Oracle HotSpot G1 – young: STW copy – old: mostly concurrent mark, STW incremental compact – fallback: STW mark-compact • OpenJDK Shenandoah – young: N/A – old: mostly concurrent mark, concurrent compact – fallback: ??? • Oracle JRockit RealTime – young: N/A – old: mostly concurrent or STW parallel mark-sweep, STW incremental compact – fallback: STW mark-compact • Azul Zing C4 – young: concurrent mark-compact – old: concurrent mark-compact
  • 70. garbage collection pauses Q & A Angelika Langer http://www.AngelikaLanger.com twitter: @AngelikaLanger
  • 71. Angelika Langer & Klaus Kreft http://www.AngelikaLanger.com/ Java 8 Stream Performance
  • 72. © Copyright 2003-2015 by Angelika Langer & Klaus Kreft. All Rights Reserved. http://www.AngelikaLanger.com/ last update: 10/13/2015,10:06 Stream Performance (2) objective • how do streams perform? – explore whether / when parallel streams outperform seq. streams – compare performance of streams to performance of regular loops • what determines stream performance? – take a glance at some stream internal mechanisms
  • 73. speaker's relationship to topic • independent trainer / consultant / author – teaching C++ and Java for ~20 years – curriculum of half a dozen challenging Java seminars – JCP observer and Java champion since 2005 – co-author of "Effective Java" column – author of Java Generics FAQ – author of Lambda Tutorial & Reference
  • 74. agenda • introduction • loop vs. sequential stream • sequential vs. parallel stream
  • 75. what is a stream? • equivalent of sequence from functional programming languages – object-oriented view: internal iterator pattern (see GOF book for more details) • idea: myStream.forEach(s -> System.out.print(s)); – forEach: stream operation – s -> System.out.print(s): user-defined functionality applied to each element
  • 76. fluent programming myStream.filter(s -> s.length() > 3) .mapToInt(s -> s.length()) .forEach(System.out::print); – filter, mapToInt, forEach: stream operations; the lambdas and method references are user-defined functionality applied to each element – filter and mapToInt: intermediate operations – forEach: terminal operation
  • 77. obtain a stream • collection: myCollection.stream(). ... • array: Arrays.stream(myArray). ... • resulting stream – does not store any elements – just a view of the underlying stream source • more stream factories, but not in this talk
  • 78. parallel streams • collection: myCollection.parallelStream(). ... • array: Arrays.stream(myArray).parallel(). ... • performs stream operations in parallel – i.e. with multiple worker threads from fork-join common pool: myParallelStream.forEach(s -> System.out.print(s));
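Which threads actually execute a parallel stream can be observed with a small experiment. The class and method names below are made up for this sketch; the output varies from run to run and from machine to machine, so no expected output is given. Typically the calling thread shows up next to the workers of the fork-join common pool.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ForkJoinPool;

/** Records the names of the threads that process the elements of a parallel stream. */
public class ParallelWorkers {
    static Set<String> workerNames(int[] data) {
        Set<String> names = ConcurrentHashMap.newKeySet();   // thread-safe set
        Arrays.stream(data)
              .parallel()
              .forEach(i -> names.add(Thread.currentThread().getName()));
        return names;
    }

    public static void main(String[] args) {
        int[] data = new int[100_000];
        System.out.println("common pool parallelism: " + ForkJoinPool.commonPool().getParallelism());
        System.out.println("threads used: " + workerNames(data));
    }
}
```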
  • 79. stream functionality rivals loops • Java 8 streams: myStream.filter(s -> s.length() > 3) .mapToInt(s -> s.length()) .forEach(System.out::print); or: myStream.filter(s -> s.length() > 3) .forEach(s -> System.out.print(s.length())); • since Java 5: for (String s : myCol) if (s.length() > 3) System.out.print(s.length()); • pre-Java 5: Iterator iter = myCol.iterator(); while (iter.hasNext()) { String s = (String) iter.next(); if (s.length() > 3) System.out.print(s.length()); }
  • 80. obvious question … … how does the performance compare ? • loop vs. sequential stream vs. parallel stream
  • 81. benchmarks … … done on an older desktop system with: – Intel E8500, 2 x 3.17 GHz, 4 GB RAM – Win 7 – JDK 1.8.0_05 • disclaimer: your mileage may vary – i.e. parallel performance heavily depends on number of CPU cores
  • 82. agenda • introduction • loop vs. sequential stream • sequential vs. parallel stream
  • 83. how do sequential streams work? • example: String[] txt = { "State", "of", "the", "Lambda", "Libraries", "Edition"}; int sum = Arrays.stream(txt).filter(s -> s.length() > 3) .mapToInt(s -> s.length()) .reduce(0, (l1, l2) -> l1 + l2); • filter() and mapToInt() return streams – intermediate operations • reduce() returns int – terminal operation – that produces a single result from all elements of the stream
  • 84. pipelined processing • code looks like: Arrays.stream(txt).filter(s -> s.length() > 3) .mapToInt(s -> s.length()) .reduce(0, (l1, l2) -> l1 + l2); • really executed: each element travels through the whole pipeline before the next element is processed [figure: "State" "of" "the" "Lambda" "Libraries" "Edition" → filter → "State" "Lambda" "Libraries" "Edition" → mapToInt → 5 6 9 7 → reduce → running sum 0, 5, 11, 20, 27]
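The pipelined execution order can be made visible by tracing the elements with `peek()`. The sketch below reuses the talk's example array; the class name, the `trace` method, and the log format are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Logs the order in which a sequential stream pipeline touches its elements. */
public class PipelineOrder {
    static List<String> trace(String[] txt) {
        List<String> log = new ArrayList<>();
        int sum = Arrays.stream(txt)
                .peek(s -> log.add("filter " + s))   // element arrives at filter
                .filter(s -> s.length() > 3)
                .peek(s -> log.add("map " + s))      // element passed the filter
                .mapToInt(String::length)
                .reduce(0, Integer::sum);
        log.add("sum " + sum);
        return log;
    }

    public static void main(String[] args) {
        String[] txt = { "State", "of", "the", "Lambda", "Libraries", "Edition" };
        trace(txt).forEach(System.out::println);
        // filter State, map State, filter of, filter the, filter Lambda, map Lambda, ..., sum 27
    }
}
```

"State" is filtered and mapped before "of" is even looked at, so the stream does not create an intermediate collection of filtered strings.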
  • 85. benchmark with int-array • int[500_000], find largest element – for-loop: int[] a = ints; int e = ints.length; int m = Integer.MIN_VALUE; for (int i = 0; i < e; i++) if (a[i] > m) m = a[i]; – sequential stream: int m = Arrays.stream(ints) .reduce(Integer.MIN_VALUE, Math::max);
  • 86. results: for-loop: 0.36 ms; seq. stream: 5.35 ms • for-loop is ~15x faster • are seq. streams always much slower than loops? – no, this is the most extreme example – let's see the same benchmark with an ArrayList<Integer>: underlying data structure is also an array, this time filled with Integer values, i.e. the boxed equivalent of int
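A rough way to repeat this comparison is sketched below. It is not the harness used for the numbers above (the talk does not show it); the warm-up count, the number of runs, and the use of `System.nanoTime()` are assumptions, and a dedicated benchmark harness such as JMH gives more trustworthy results. Timings depend on hardware and JVM version, so none are quoted here.

```java
import java.util.Arrays;
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.ToIntFunction;

/** Naive timing of for-loop vs. sequential stream for finding the largest int. */
public class MaxBenchmarkSketch {
    static int maxLoop(int[] ints) {
        int m = Integer.MIN_VALUE;
        for (int i = 0; i < ints.length; i++)
            if (ints[i] > m) m = ints[i];
        return m;
    }

    static int maxStream(int[] ints) {
        return Arrays.stream(ints).reduce(Integer.MIN_VALUE, Math::max);
    }

    /** Average time per call in milliseconds, measured after a warm-up for the JIT compiler. */
    static double avgMillis(ToIntFunction<int[]> f, int[] data, int runs) {
        long sink = 0;                                   // consume results so the work is not optimized away
        for (int i = 0; i < 50; i++) sink += f.applyAsInt(data);
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) sink += f.applyAsInt(data);
        long elapsed = System.nanoTime() - start;
        if (sink == 42) System.out.print("");            // keeps 'sink' observable
        return elapsed / 1e6 / runs;
    }

    public static void main(String[] args) {
        int[] ints = ThreadLocalRandom.current().ints(500_000).toArray();
        System.out.printf("for-loop:    %.3f ms%n", avgMillis(MaxBenchmarkSketch::maxLoop, ints, 100));
        System.out.printf("seq. stream: %.3f ms%n", avgMillis(MaxBenchmarkSketch::maxStream, ints, 100));
    }
}
```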
  • 87. benchmark with ArrayList<Integer> • find largest element in an ArrayList with 500_000 elements – for-loop: int m = Integer.MIN_VALUE; for (int i : myList) if (i > m) m = i; – sequential stream: int m = myList.stream() .reduce(Integer.MIN_VALUE, Math::max);
  • 88. results: ArrayList, for-loop: 6.55 ms; ArrayList, seq. stream: 8.33 ms • for-loop still faster, but only 1.27x • iteration for ArrayList is more expensive – boxed elements require an additional memory access (indirection) – which does not work well with the CPU’s memory cache • bottom-line: – iteration cost dominates the benchmark result – performance advantage of the for-loop is insignificant
  • 89. some thoughts • previous situation: – costs of iteration are relatively high, but – costs of functionality applied to each element are relatively low (after JIT-compilation: more or less the cost of a compare-assembler-instruction) • what if we apply a more expensive functionality to each element ? – how will this affect the benchmark results ?
  • 90. expensive functionality • slowSin() from Apache Commons Mathematics Library – calculates a Taylor approximation of the sine function value for the parameter passed to this method – (normally) not in the public interface of the library: used to calculate values for an internal table, which is used for interpolation by FastMath.sin()
  • 91. benchmark with slowSin() • int array / ArrayList with 10_000 elements – for-loop: int[] a = ints; int e = a.length; double m = Double.NEGATIVE_INFINITY; for (int i = 0; i < e; i++) { double d = Sine.slowSin(a[i]); if (d > m) m = d; } – sequential stream: Arrays.stream(ints) .mapToDouble(Sine::slowSin) .reduce(Double.NEGATIVE_INFINITY, Math::max); – code for ArrayList changed accordingly
  • 92. results: int[], for-loop: 11.72 ms; int[], seq. stream: 11.85 ms; ArrayList, for-loop: 11.84 ms; ArrayList, seq. stream: 11.85 ms • for-loop is not really faster • reason: – applied functionality costs dominate the benchmark result – performance advantage of the for-loop has evaporated
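Since `slowSin()` is not part of the library's public interface, the effect can be reproduced with a stand-in. The `taylorSin` method below is a hypothetical replacement written for this sketch (a plain Taylor series with a fixed number of terms), not the Apache Commons code. The sketch uses `Double.NEGATIVE_INFINITY` as the starting value for the maximum because `Double.MIN_VALUE` is the smallest positive double, not the most negative one.

```java
import java.util.Arrays;

/** Loop vs. sequential stream with an expensive per-element function (hypothetical stand-in). */
public class ExpensiveFunctionSketch {
    /** Taylor series of sin(x) around 0 with the given number of terms; accurate only for small |x|. */
    static double taylorSin(double x, int terms) {
        double term = x;    // current term x^(2k+1) / (2k+1)! with alternating sign
        double sum = x;
        for (int k = 1; k < terms; k++) {
            term *= -x * x / ((2 * k) * (2 * k + 1));
            sum += term;
        }
        return sum;
    }

    static double maxLoop(double[] xs) {
        double m = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < xs.length; i++) {
            double d = taylorSin(xs[i], 20);
            if (d > m) m = d;
        }
        return m;
    }

    static double maxStream(double[] xs) {
        return Arrays.stream(xs)
                     .map(x -> taylorSin(x, 20))
                     .reduce(Double.NEGATIVE_INFINITY, Math::max);
    }

    public static void main(String[] args) {
        double[] xs = new double[10_000];
        for (int i = 0; i < xs.length; i++) xs[i] = -1.0 + 2.0 * i / xs.length;   // values in [-1, 1)
        System.out.println(maxLoop(xs) == maxStream(xs));   // true: both compute the same maximum
    }
}
```

With 20 multiplications and additions per element, the cost of the applied function outweighs the iteration overhead, which is the situation the results above describe.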
  • 93. Stream Performance (23) other aspect (without benchmark)
    • today, compilers (javac + JIT) can optimize loops better than stream code
    • reasons:
      – linear code (loop) vs. injected functionality (stream)
      – lambdas + method references are new to Java
      – loop optimization is a very mature technology
      – …
  • 94. Stream Performance (24) for-loop vs. seq. stream / re-cap
    • sequential stream can be slower than or as fast as for-loop
    • depends on
      – costs of the iteration
      – costs of the functionality applied to each element
    • the higher the cost (iteration + functionality), the closer stream performance is to for-loop performance
  • 95. Stream Performance (25) agenda
    • introduction
    • loop vs. sequential stream
    • sequential vs. parallel stream
      – introduction
      – stateless functionality
      – stateful functionality
  • 96. Stream Performance (26) parallel streams
    • library-side parallelism
      – important feature
        - you need not know anything about threads, etc.
        - very little implementation effort, just: parallel
    • performance aspect
      – outperform loops, which are inherently sequential
  • 97. Stream Performance (27) how do parallel streams work?
    • example
        final int SIZE = 64;
        int[] ints = new int[SIZE];
        ThreadLocalRandom rand = ThreadLocalRandom.current();
        for (int i = 0; i < SIZE; i++) ints[i] = rand.nextInt();
        Arrays.stream(ints)
              .parallel()
              .reduce(Math::max)
              .ifPresent(m -> System.out.println("max is: " + m));
    • parallel()'s functionality is based on the fork-join framework
  • 98. Stream Performance (28) fork join tasks
    • original task is divided into two sub-tasks by splitting the stream source into two parts
      – original task's result is based on sub-tasks' results
      – sub-tasks are divided again … fork phase
    • at a certain depth partitioning stops
      – tasks at this level (leaf tasks) are executed
      – execution phase
    • completed sub-task results are 'combined' to super-task results
      – join phase
  • 99. Stream Performance (29) find largest element with parallel stream
    reduce((i,j) -> Math.max(i,j));
    [diagram: fork-join task tree for 64 elements]
    • fork phase: T (0_63) splits into T1 (0_31) and T2 (32_63); T1 splits into T11 (0_15) and T12 (16_31), T2 into T21 (32_47) and T22 (48_63)
    • execution: leaf tasks T11, T12, T21, T22 compute m0_15, m16_31, m32_47, m48_63
    • join phase: T1 computes m0_31 = max(m0_15,m16_31), T2 computes m32_63 = max(m32_47,m48_63), T computes m0_63 = max(m0_31,m32_63)
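The three phases can be sketched with a hand-written fork-join task. This is only an illustration of the idea, not the JDK's stream implementation: the class name MaxTask and the leaf size of 16 are assumptions chosen so that 64 elements yield the four leaf tasks of the diagram.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
import java.util.concurrent.ThreadLocalRandom;

// Illustrative fork-join task; MaxTask and LEAF_SIZE are hypothetical names,
// not part of the stream library.
class MaxTask extends RecursiveTask<Integer> {
    static final int LEAF_SIZE = 16;   // assumed: 64 elements -> 4 leaf tasks
    private final int[] data;
    private final int lo, hi;          // half-open range [lo, hi), must be non-empty

    MaxTask(int[] data, int lo, int hi) {
        this.data = data;
        this.lo = lo;
        this.hi = hi;
    }

    @Override
    protected Integer compute() {
        if (hi - lo <= LEAF_SIZE) {
            // execution phase: a leaf task iterates its part sequentially
            int m = data[lo];
            for (int i = lo + 1; i < hi; i++) m = Math.max(m, data[i]);
            return m;
        }
        // fork phase: split the range into two halves
        int mid = (lo + hi) >>> 1;
        MaxTask left = new MaxTask(data, lo, mid);
        MaxTask right = new MaxTask(data, mid, hi);
        left.fork();                       // left half runs asynchronously
        int r = right.compute();           // right half runs in the current thread
        // join phase: combine the sub-task results
        return Math.max(left.join(), r);
    }

    // Runs the task in the common pool; the array must not be empty.
    static int max(int[] data) {
        return ForkJoinPool.commonPool().invoke(new MaxTask(data, 0, data.length));
    }

    public static void main(String[] args) {
        int[] ints = ThreadLocalRandom.current().ints(64).toArray();
        System.out.println("max is: " + max(ints));
    }
}
```

The stream library derives the split depth from the number of cores rather than from a fixed leaf size, so the real task tree will usually look different.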
  • 100. Stream Performance (30) split level
    • deeper split level than shown !!!
      – execution/leaf tasks: ~ 4*numberOfCores
        - 8 tasks for a dual core CPU (only 4 in the previous diagram)
      – i.e. one additional split (only 2 in the previous diagram)
    • key abstractions
      – java.util.Spliterator
      – java.util.concurrent.ForkJoinPool.commonPool()
  • 101. Stream Performance (31) what is a Spliterator ?
    • spliterator = splitter + iterator
    • each type of stream source has its own spliterator type
      – knows how to split the stream source
        - e.g. ArrayList.ArrayListSpliterator
      – knows how to iterate the stream source
        - in execution phase
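A small sketch of the "splitter" half, using only the public java.util.Spliterator API. The class and method names (SpliteratorDemo, splitSizes) are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Spliterator;

// Hypothetical demo class: splits a list's spliterator once and reports the part sizes.
class SpliteratorDemo {
    static long[] splitSizes(List<Integer> list) {
        Spliterator<Integer> rest = list.spliterator();
        // trySplit() hands off a prefix of the elements; it may return null
        // if the source cannot (or should not) be split any further
        Spliterator<Integer> prefix = rest.trySplit();
        long prefixSize = (prefix == null) ? 0 : prefix.estimateSize();
        return new long[] { prefixSize, rest.estimateSize() };
    }

    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>();
        for (int i = 0; i < 64; i++) list.add(i);
        long[] sizes = splitSizes(list);
        // an ArrayList splits by index computation into two halves
        System.out.println(sizes[0] + " + " + sizes[1]);   // prints "32 + 32"
    }
}
```

After the split, each part can be iterated independently, for example with forEachRemaining(), which is what happens in the execution phase.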
  • 102. Stream Performance (32) what is the CommonPool ?
    • common pool is a singleton fork-join pool instance
      – introduced with Java 8
      – all parallel stream operations use the common pool
        - so does other parallel JDK functionality (e.g. CompletableFuture)
    • default: parallel execution of stream tasks uses
      – (current) thread that invoked the terminal operation, and
      – (number of cores – 1) threads from the common pool
        - if (number of cores) > 1
    • this default configuration was used for all benchmarks
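To check the default on a given machine, the common pool can simply be asked for its parallelism. A minimal sketch; the class name CommonPoolInfo is hypothetical, and the reported value differs from (cores – 1) if the system property java.util.concurrent.ForkJoinPool.common.parallelism has been set.

```java
import java.util.concurrent.ForkJoinPool;

// Hypothetical helper class that reports the common pool configuration.
class CommonPoolInfo {
    // Number of worker threads the common pool is configured for.
    static int commonPoolWorkers() {
        return ForkJoinPool.commonPool().getParallelism();
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("cores: " + cores
                + ", common pool parallelism: " + commonPoolWorkers());
    }
}
```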
  • 103. Stream Performance (33) parallel streams + intermediate operations
    • what if the stream contains upstream intermediate operations?
        ... .parallelStream().filter(...)
                             .mapToInt(...)
                             .reduce((i,j) -> Math.max(i,j));
      when/where are these applied to the stream?
  • 104. Stream Performance (34) find largest element in parallel
    filter(...).mapToInt(...).reduce((i,j) -> Math.max(i,j));
    [diagram: task tree T; T1, T2; T11, T12, T21, T22; in the execution phase each leaf task applies filter, mapToInt and reduce to its part of the stream source]
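One way to observe where the intermediate operations run is to record the executing thread inside the filter. This is a rough sketch with made-up names (FusedPipelineDemo, maxOfDoubledEvens); the side effect in the lambda exists only for the demonstration and is not something to do in production code.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.IntStream;

// Hypothetical demo: shows that filter/map run inside the tasks of the parallel
// reduction, i.e. on whichever thread executes the respective task.
class FusedPipelineDemo {
    static int maxOfDoubledEvens(int size, Set<String> threadNames) {
        return IntStream.range(0, size)
                .parallel()
                .filter(i -> {
                    // demo-only side effect: remember which thread applied the filter
                    threadNames.add(Thread.currentThread().getName());
                    return i % 2 == 0;
                })
                .map(i -> i * 2)
                .reduce(Math::max)
                .getAsInt();               // assumes size > 0
    }

    public static void main(String[] args) {
        Set<String> names = ConcurrentHashMap.newKeySet();
        System.out.println("max: " + maxOfDoubledEvens(1_000_000, names));
        // thread names vary from run to run and from machine to machine
        System.out.println("filter ran on: " + names);
    }
}
```

On a multi-core machine the set typically contains the calling thread plus common pool worker threads.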
  • 105. Stream Performance (35) parallel overhead …
    … compared to sequential stream algorithm
    • algorithm is more complicated / resource intensive
      – create fork-join-task objects
        - splitting
        - fork-join-task object creation
      – thread pool scheduling
      – …
    • plus additional GC costs
      – fork-join-task objects have to be reclaimed
  • 106. Stream Performance (36) agenda
    • introduction
    • loop vs. sequential stream
    • sequential vs. parallel stream
      – introduction
      – stateless functionality
      – stateful functionality
  • 107. Stream Performance (37) back to the first example / benchmark parallel
    • find largest element, array / collection, 500_000 elements
      – sequential stream:
          int m = Arrays.stream(ints)
                        .reduce(Integer.MIN_VALUE, Math::max);
          int m = myCollection.stream()
                              .reduce(Integer.MIN_VALUE, Math::max);
      – parallel stream:
          int m = Arrays.stream(ints).parallel()
                        .reduce(Integer.MIN_VALUE, Math::max);
          int m = myCollection.parallelStream()
                              .reduce(Integer.MIN_VALUE, Math::max);
  • 108. Stream Performance (38) results
                    seq.        par.        seq./par.
      int-Array      5.35 ms     3.35 ms    1.60
      ArrayList      8.33 ms     6.33 ms    1.32
      LinkedList    12.74 ms    19.57 ms    0.65
      HashSet       20.76 ms    16.01 ms    1.30
      TreeSet       19.79 ms    15.49 ms    1.28
  • 109. Stream Performance (39) result discussion
    • why is parallel LinkedList performance so bad ?
      – hard to split
      – needs 250_000 iterator's next() invocations for the first split
        - with ArrayList: just some index computation
    • performance of the other collections is also not so great
      – functionality applied to each element is not very CPU-expensive
        - after JIT compilation: cost of a compare instruction
      – iteration (element access) is relatively expensive (indirection !)
        - but not CPU-expensive
      – but more CPU power is what we have with parallel streams
  • 110. Stream Performance (40) result discussion (cont.)
    • why is parallel int-array performance relatively good ?
      – iteration (element access) is not so expensive (no indirection !)
  • 111. Stream Performance (41) CPU-expensive functionality
    • back to slowSin()
      – calculates a Taylor approximation of the sine of the value passed to the method
      – CPU-bound functionality
        - needs only the initial parameter from memory
        - calculation based on its own (intermediate) results
      – ideal to be sped up by parallel streams with multiple cores
  • 112. Stream Performance (42) benchmark parallel with slowSin()
    • array / collection with 10_000 elements
      – array:
          Arrays.stream(ints) // .parallel()
                .mapToDouble(Sine::slowSin)
                .reduce(Double.MIN_VALUE, (i, j) -> Math.max(i, j));
      – collection:
          myCollection.stream() // .parallelStream()
                      .mapToDouble(Sine::slowSin)
                      .reduce(Double.MIN_VALUE, (i, j) -> Math.max(i, j));
  • 113. Stream Performance (43) results
                    seq.        par.       seq./par.
      int-Array     10.81 ms    6.03 ms    1.79
      ArrayList     10.97 ms    6.10 ms    1.80
      LinkedList    11.15 ms    6.25 ms    1.78
      HashSet       11.15 ms    6.15 ms    1.81
      TreeSet       11.14 ms    6.30 ms    1.77
  • 114. Stream Performance (44) result discussion
    • performance improvements for all stream sources
      – by a factor of ~ 1.8
        - even for LinkedList
    • the ~1.8 is the maximum improvement on our platform
      – the remaining 0.2 are
        - overhead of the parallel algorithm
        - sequential bottlenecks (Amdahl's law)
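As a rough worked example of Amdahl's law for this result: assume the test platform has N = 2 cores (an assumption; the slides only mention a dual-core CPU as an example) and, as a simplification, attribute the whole gap to a non-parallelizable fraction (1 - p) of the work, lumping the algorithmic overhead into it.

```latex
S(N) = \frac{1}{(1 - p) + \frac{p}{N}}
\qquad\Rightarrow\qquad
1.8 = \frac{1}{(1 - p) + \frac{p}{2}}
\;\Rightarrow\;
1 - \frac{p}{2} = \frac{1}{1.8} \approx 0.556
\;\Rightarrow\;
p \approx 0.89
```

Under these assumptions roughly 89% of the work runs in parallel and about 11% behaves as if it were sequential.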
  • 115. Stream Performance (45) sufficient size (without benchmark)
    • stream source must have a sufficient size to benefit from parallel processing
    • overhead increases with growing number of cores
      – number of tasks ~ 4*number of cores
      – (in most cases) not with the size of the stream source
    • Doug Lea mentioned 10_000 for CPU-inexpensive functionality
      – http://gee.cs.oswego.edu/dl/html/StreamParallelGuidance.html
    • 500_000 and 10_000, respectively, in our examples
      – size can be smaller for CPU-expensive functionality
  • 116. Stream Performance (46) dynamic overclocking (without benchmark)
    • a modern multi-core CPU typically increases its frequency when not all of its cores are active
      – Intel calls this feature Turbo Boost
    • benchmark sequential versus parallel stream
      – seq. test might run with a dynamically overclocked CPU
      – will this also happen in the real environment or only in the test?
    • no issue with our test system
      – too old
      – no dynamic overclocking supported
  • 117. Stream Performance (47) agenda
    • introduction
    • loop vs. sequential stream
    • sequential vs. parallel stream
      – introduction
      – stateless functionality
      – stateful functionality
  • 118. Stream Performance (48) stateful functionality …
    … with parallel streams / multiple threads boils down to shared mutable state
    • handling this costs performance
      – e.g. lock-free CAS requires retries in case of collision
    • traditionally not supported with sequences
      – functional programming languages don't have mutable types, and
      – often no parallel sequences either
    • new solutions/approaches in Java 8 streams
  • 119. Stream Performance (49) stateful functionality with Java 8 streams
    • intermediate stateful operations, e.g. distinct()
      – see javadoc: This is a stateful intermediate operation.
      – shared mutable state handled by stream implementation (JDK)
    • (terminal) operations that allow stateful functional parameters, e.g. forEach(Consumer<? super T> action)
      – see javadoc: If the action accesses shared state, it is responsible for providing the required synchronization.
      – shared mutable state handled by user/client code
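For the forEach() case, a minimal sketch of what "handled by user/client code" can look like: the action adds to a java.util.concurrent.atomic.LongAdder, which provides the required synchronization. The class and method names are hypothetical; a plain long field or a one-element long[] in place of the LongAdder would be a data race in a parallel stream.

```java
import java.util.concurrent.atomic.LongAdder;
import java.util.stream.IntStream;

// Hypothetical demo: forEach with shared mutable state that the client code synchronizes.
class SharedStateForEach {
    // Sums 1..n; the shared state (sum) is thread-safe, so parallel forEach is correct.
    static long sumWithLongAdder(int n) {
        LongAdder sum = new LongAdder();
        IntStream.rangeClosed(1, n)
                 .parallel()
                 .forEach(sum::add);       // many threads may call add() concurrently
        return sum.sum();
    }

    public static void main(String[] args) {
        System.out.println(sumWithLongAdder(100_000));   // prints 5000050000
    }
}
```

A stateless reduction such as sum() or reduce() would avoid the shared state altogether; the sketch only illustrates the responsibility the javadoc assigns to the caller.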
  • 120. Stream Performance (50) stateful functionality with Java 8 streams (cont.)
    • stream's overloaded method: collect()
      – shared mutable state handled by stream implementation, and
      – collector functionality
        - standard collectors from Collectors (JDK)
        - user-defined collector functionality (JDK + user/client code)
    • don't have time to discuss all situations
      – only discuss distinct()
      – shared mutable state handled by stream implementation (JDK)
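The two flavours of collect() can be sketched as follows. The class and method names are made up for illustration; the second variant uses the three-argument overload with a supplier, an accumulator and a combiner, so that each task fills its own ArrayList and the stream implementation merges the partial results.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical demo of collect() with a standard collector and with user-supplied functions.
class CollectDemo {
    // standard collector from Collectors (JDK)
    static List<Integer> evensStandard(int n) {
        return IntStream.range(0, n).parallel()
                .filter(i -> i % 2 == 0)
                .boxed()
                .collect(Collectors.toList());
    }

    // user-defined collector functionality: supplier, accumulator, combiner
    static ArrayList<Integer> evensThreeArg(int n) {
        return IntStream.range(0, n).parallel()
                .filter(i -> i % 2 == 0)
                .boxed()
                .collect(ArrayList::new, ArrayList::add, ArrayList::addAll);
    }

    public static void main(String[] args) {
        System.out.println(evensStandard(10));   // prints [0, 2, 4, 6, 8]
        System.out.println(evensThreeArg(10));   // prints [0, 2, 4, 6, 8]
    }
}
```

In both variants no container is shared between threads while elements are accumulated, which is why no explicit synchronization appears in the client code.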
  • 121. Stream Performance (51) distinct()
    • element goes to the result stream if it hasn't appeared before
      – appeared before, in terms of equals()
      – shared mutable state: elements already in the result stream
        - have to compare the current element to each element of the output stream
    • parallel introduces a barrier (algorithmic overhead)
        .parallelStream().statelessOps().distinct().statelessOps().terminal();
      – two alternative algorithms
  • 122. Stream Performance (52) two algorithms for parallel distinct()
    • ordering + distinct()
      – normally elements go to the next stage in the same order in which they appear for the first time in the current stage
    • javadoc from distinct()
      – Removing the ordering constraint with unordered() may result in significantly more efficient execution for distinct() in parallel pipelines, if the semantics of your situation permit.
    • two different algorithms for parallel distinct()
      – one for ordered streams + one for unordered streams
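The difference in guarantees can be illustrated with a tiny input. The class and method names are hypothetical; the sketch only shows the observable semantics, not the two internal algorithms.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical demo: ordered vs. unordered distinct() on a parallel stream.
class DistinctOrderDemo {
    // ordered: elements keep the order of their first appearance
    static List<Integer> orderedDistinct(List<Integer> in) {
        return in.parallelStream().distinct().collect(Collectors.toList());
    }

    // unordered: only the set of distinct elements is guaranteed, not their order
    static Set<Integer> unorderedDistinct(List<Integer> in) {
        return in.parallelStream().unordered().distinct().collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        List<Integer> in = Arrays.asList(3, 1, 3, 2, 1);
        System.out.println(orderedDistinct(in));     // prints [3, 1, 2]
        System.out.println(unorderedDistinct(in));   // contains 1, 2 and 3
    }
}
```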
  • 123. Stream Performance (53) benchmark with distinct()
    • Integer[100_000], filled with 50_000 distinct values
        // sequential
        Arrays.stream(integers).distinct().count();
        // parallel ordered
        Arrays.stream(integers).parallel().distinct().count();
        // parallel unordered
        Arrays.stream(integers).parallel().unordered().distinct().count();
    • results:
        seq.       par. ordered    par. unordered
        6.39 ms    34.09 ms        9.1 ms
  • 124. Stream Performance (54) benchmark with distinct() + slowSin()
    • Integer[10_000], filled with numbers 0 … 9999
      – after the mapping 5004 distinct values
        Arrays.stream(newIntegers) //.parallel().unordered()
              .map(i -> new Double(2200 * Sine.slowSin(i * 0.001)).intValue())
              .distinct()
              .count();
    • results:
        seq.        par. ordered    par. unordered
        11.59 ms    6.83 ms         6.81 ms
  • 125. Stream Performance (55) sequential vs. parallel stream / re-cap
    to benefit from parallel stream usage …
    • … stream source …
      – must have sufficient size
      – should be easy to split
    • … operations …
      – should be CPU-expensive
      – should not be stateful
  • 126. Stream Performance (56) advice
    • benchmark on target platform !
    • previous benchmark:
      – find largest element, LinkedList, 500_000 elements
          seq.        par.        seq./par.
          12.74 ms    19.57 ms    0.65
    • what if we use a quad-core CPU (Intel i5-4590) ?
      – will the parallel result be worse, better, … better than seq. … ?
          seq.        par.        seq./par.
          5.24 ms     4.84 ms     1.08
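A quick first impression on a target machine can be obtained with a crude timing loop like the one below. This is only a sketch under stated assumptions: all names (CrudeBenchmark, averageMillis, the warm-up and run counts) are invented here, it is not the harness used for the numbers in these slides, and for reliable figures a dedicated benchmark harness is the better choice.

```java
import java.util.LinkedList;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.IntSupplier;

// Crude timing sketch (hypothetical names); results are indicative only.
class CrudeBenchmark {
    static int seqMax(List<Integer> c) {
        return c.stream().reduce(Integer.MIN_VALUE, Math::max);
    }

    static int parMax(List<Integer> c) {
        return c.parallelStream().reduce(Integer.MIN_VALUE, Math::max);
    }

    // Runs the task a few times for JIT warm-up, then returns the average time per run in ms.
    static double averageMillis(IntSupplier task, int warmups, int runs) {
        long sink = 0;
        for (int i = 0; i < warmups; i++) sink += task.getAsInt();
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) sink += task.getAsInt();
        long elapsed = System.nanoTime() - start;
        // use the accumulated results so the JIT cannot discard the work entirely
        if (sink == Long.MIN_VALUE) System.out.println(sink);
        return elapsed / 1e6 / runs;
    }

    public static void main(String[] args) {
        List<Integer> list = new LinkedList<>();
        ThreadLocalRandom rand = ThreadLocalRandom.current();
        for (int i = 0; i < 500_000; i++) list.add(rand.nextInt());

        double seq = averageMillis(() -> seqMax(list), 20, 50);
        double par = averageMillis(() -> parMax(list), 20, 50);
        System.out.printf("seq. %.2f ms, par. %.2f ms, seq./par. %.2f%n", seq, par, seq / par);
    }
}
```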
  • 127. Stream Performance (57) authors
    Angelika Langer, Klaus Kreft
    http://www.AngelikaLanger.com
  • 128. Stream Performance (58) stream performance
    Q & A