Stop-the-world GCs on multicores

A Study of the Scalability of Stop-the-World
Garbage Collectors on Multicores
Aliya Ibragimova
University of Fribourg

Agenda
• Overview
• Problem Statement
• Parallel Scavenge description
• Identifying bottlenecks
• Methods and solutions
• Results
• Conclusion

Overview
• A Stop-the-World Collector performs garbage
collection while the application is completely
stopped
• A Parallel Collection uses multiple threads to
perform Garbage Collection

Example: Parallel Scavenge, available in
OpenJDK 7

Problem Statement
The stop-the-world (STW) algorithm degrades badly beyond
8 cores on a 48-core NUMA machine with OpenJDK 7:

– Does the stop-the-world design have intrinsic
limitations?
– If not, what are the limitations of the STW approach?
– How can we improve the current design?

Contended locks: GC monitor’s lock
Beginning of parallel phase

[Figure: GC threads, GC task queue, GC monitor’s lock]

Solution: use Michael-Scott lock-free queue
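
To make the proposed fix concrete, here is a minimal sketch of a Michael-Scott queue in Java. It is an illustration only, not the HotSpot code (which is C++); the class name `MsQueue` and its methods are hypothetical. Threads add and remove tasks with compare-and-set operations instead of taking a shared lock.

```java
import java.util.concurrent.atomic.AtomicReference;

// Illustrative Michael-Scott lock-free FIFO queue (hypothetical names).
final class MsQueue<T> {
    private static final class Node<T> {
        final T value;
        final AtomicReference<Node<T>> next = new AtomicReference<>(null);
        Node(T value) { this.value = value; }
    }

    // head always points at a dummy node; the first real element is head.next.
    private final AtomicReference<Node<T>> head;
    private final AtomicReference<Node<T>> tail;

    MsQueue() {
        Node<T> dummy = new Node<>(null);
        head = new AtomicReference<>(dummy);
        tail = new AtomicReference<>(dummy);
    }

    void enqueue(T value) {
        Node<T> node = new Node<>(value);
        while (true) {
            Node<T> last = tail.get();
            Node<T> next = last.next.get();
            if (last != tail.get()) continue;       // tail moved, retry
            if (next == null) {
                // Try to link the new node after the current last node.
                if (last.next.compareAndSet(null, node)) {
                    tail.compareAndSet(last, node); // swing tail; may fail harmlessly
                    return;
                }
            } else {
                tail.compareAndSet(last, next);     // help a lagging enqueuer
            }
        }
    }

    // Returns null when the queue is empty.
    T dequeue() {
        while (true) {
            Node<T> first = head.get();
            Node<T> last = tail.get();
            Node<T> next = first.next.get();
            if (first != head.get()) continue;      // head moved, retry
            if (next == null) return null;          // only the dummy is left
            if (first == last) {
                tail.compareAndSet(last, next);     // tail is lagging, help it
                continue;
            }
            T value = next.value;
            if (head.compareAndSet(first, next)) return value;
        }
    }
}
```

In Java the garbage collector rules out the ABA problem for node references; a C++ version inside the JVM would need its own scheme for safe node reuse.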

End of parallel phase

[Figure: GC monitor’s lock, global counter]

Solution: remove redundant synchronization and
use timestamps to avoid race conditions
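
The slide does not say how the timestamps are used, so the following Java sketch is only one plausible reading, under stated assumptions: the lock-protected global counter becomes a single atomic word that packs a phase timestamp together with the number of workers still running, and a completion report carrying an old timestamp is ignored. The class `PhaseBarrier` and its methods are hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative end-of-phase barrier without a monitor lock (hypothetical design).
final class PhaseBarrier {
    // High 32 bits: phase timestamp. Low 32 bits: workers still running.
    private final AtomicLong state = new AtomicLong(0);

    // Called by the coordinating thread before workers start; returns the new timestamp.
    long startPhase(int workers) {
        long phase = (state.get() >>> 32) + 1;
        state.set((phase << 32) | (workers & 0xFFFFFFFFL));
        return phase;
    }

    // Called by a worker when it finishes. Returns true only for the call
    // that completes the given phase; stale or surplus reports return false.
    boolean workerDone(long phase) {
        while (true) {
            long s = state.get();
            if ((s >>> 32) != phase) return false;   // report from an old phase
            long remaining = s & 0xFFFFFFFFL;
            if (remaining == 0) return false;        // phase already complete
            long next = (phase << 32) | (remaining - 1);
            if (state.compareAndSet(s, next)) return remaining == 1;
        }
    }

    // True once every worker of the given phase has reported.
    boolean isDone(long phase) {
        long s = state.get();
        return (s >>> 32) == phase && (s & 0xFFFFFFFFL) == 0;
    }
}
```

Because the timestamp and the counter change in the same compare-and-set, a late worker from a previous phase cannot decrement the counter of the next one.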

Idea: remove GC monitor’s lock

1. Task queue
Use lock-free task queue

2. Barrier at the end of parallel phase
Remove redundant synchronization

3. Condition variable of the GC monitor
Replace the condition variable with Linux futex
calls (FUTEX_WAIT).

Lack of NUMA-awareness

[Figure: CPUs and memory nodes]

NUMA: Non-Uniform Memory Access

• Memory access imbalance
• Memory locality

Lack of NUMA-awareness
• Interleaved spaces
– map pages from different nodes with round robin
policy
• Fragmented spaces
– thread allocates memory from the fragment
associated with the node where it is executing
• Segregated spaces
– Fragmented space that is restricted to being
accessed by GC threads running on the same node
Best performance: fragmented spaces for the young space, interleaved
spaces for the others
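
The three placement policies can be sketched as small Java functions. This is an illustration under assumptions, not the NAPS implementation: the class `NumaSpaces` and its methods are hypothetical, pages are identified by index, and the fragmented policy is assumed to split a space into equal contiguous fragments, one per node.

```java
// Illustrative NUMA placement policies (hypothetical names and layout).
final class NumaSpaces {
    // Interleaved: pages are mapped to nodes round robin.
    static int interleavedNodeOf(long pageIndex, int nodes) {
        return (int) (pageIndex % nodes);
    }

    // Fragmented: a thread running on `node` allocates from that node's fragment.
    // Returns {firstPage, endPageExclusive}; the last fragment absorbs the
    // remainder when the space does not divide evenly.
    static long[] fragmentRange(long spacePages, int nodes, int node) {
        long perNode = spacePages / nodes;
        long start = node * perNode;
        long end = (node == nodes - 1) ? spacePages : start + perNode;
        return new long[] {start, end};
    }

    // Segregated: a fragment may only be accessed by GC threads on its own node.
    static boolean mayAccessSegregated(int gcThreadNode, int fragmentNode) {
        return gcThreadNode == fragmentNode;
    }
}
```

Interleaving balances memory accesses across nodes, while fragmenting favors locality for the allocating thread; these are the two concerns listed on the NUMA slide.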

Results
Resulting GC: NAPS (NUMA-Aware Parallel Scavenge)

Effect of the optimizations on 3
benchmarks:
• SPECjbb2005
• SPECjvm2008
• DaCapo
Hardware: 8 memory nodes, 48 cores, 96 GB RAM, Linux 3.0 64-bit

Results
• NAPS improves performance and scalability over
Parallel Scavenge in almost all cases
• NAPS performance continues to increase up to 48
cores
• NAPS reduces pause time by up to 2.8 times in the best
case
• NAPS improves responsiveness of applications

Conclusion

• This slide is about next steps…

Questions
If you have any questions you are welcome to ask.
