Machine Learning for
MSc Advanced Computer Science
“Machine Learning for Garbage
Gavin Brown & Mikel Lujan
Traditional manual memory management
by the programmer is prone to error
Automatic memory management by the
runtime environment takes out the
Stop the World GC, in Object Orientated
GC time in this context is dead time,
algorithm extremely important for
Generational hypothesis, “The majority of
objects die young” - D. Ungar, 1984
Separate heap objects into different
generations, vary collection frequency.
Reduce scanning and pause time.
Copying time increases GC time; the copying
reserve reduces usable heap size.
Allocate directly into mature generation?
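The trade-off behind that question can be sketched with a toy cost model. This is only an illustration under simplifying assumptions, not how Jikes RVM accounts for cost: every object that survives a nursery collection is copied once into the mature generation, and an object allocated directly into the mature generation is never copied but, if it dies young, sits there as garbage until a full collection. The function name and the example data are hypothetical.

```python
def nursery_cost(objects, pretenured):
    """Toy cost model for one nursery collection.

    objects:    list of (size_in_bytes, survives_nursery) pairs
    pretenured: list of booleans, True if the object was allocated
                directly into the mature generation
    Returns (bytes_copied, bytes_of_garbage_left_in_mature_space).
    """
    copied = 0
    mature_garbage = 0
    for (size, survives), direct in zip(objects, pretenured):
        if direct:
            # Never copied, but a short-lived object pollutes mature space.
            if not survives:
                mature_garbage += size
        elif survives:
            # Survivor of the nursery: pay the copying cost.
            copied += size
    return copied, mature_garbage


# Hypothetical allocation trace: two short-lived, two long-lived objects.
trace = [(16, False), (64, True), (1024, True), (32, False)]

no_advice = nursery_cost(trace, [False] * len(trace))
perfect_advice = nursery_cost(trace, [survives for _, survives in trace])
tenure_everything = nursery_cost(trace, [True] * len(trace))
```

With perfect advice no bytes are copied and no garbage reaches the mature generation; tenuring everything avoids copying but leaves the short-lived objects behind as mature garbage, which is why the quality of the prediction matters.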
Automatically extract patterns and
processes underlying data generation
Classification, Regression, Unsupervised
learning (Clustering etc.)
Algorithm extremely important
“Use ML to predict whether an object will
live long enough to be tenured into the
Existing literature review used to
generate hypotheses about which
heuristics could indicate object lifetime
Modified Jikes RVM to trace object
allocation and tenuring data
DaCapo benchmarks, varying heap sizes.
Mutual Information used to test
correlation of example heuristics as
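As an illustration of the measure, mutual information between a categorical heuristic X and the tenuring outcome Y is I(X;Y) = sum over (x, y) of p(x,y) log2( p(x,y) / (p(x) p(y)) ). The sketch below estimates it from observed counts. The function name and the sample values are assumptions for illustration; they are not taken from the project's traces.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Estimate I(X;Y) in bits from two equal-length sequences of
    categorical observations, using empirical frequencies."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x,y) / (p(x) p(y)) simplifies to c * n / (count_x * count_y)
        mi += (c / n) * log2(c * n / (count_x[x] * count_y[y]))
    return mi


# Hypothetical data: allocation site per object, and whether it was tenured.
sites = ["A", "A", "B", "B"]
tenured = [True, True, False, False]
informative = mutual_information(sites, tenured)

# A heuristic that is independent of the outcome carries no information.
shuffled_sites = ["A", "B", "A", "B"]
uninformative = mutual_information(shuffled_sites, tenured)
```

Here the first heuristic determines the outcome completely (1 bit for a balanced binary outcome), while the second gives 0 bits. Unlike a linear correlation coefficient, the measure works directly on categorical attributes such as allocation site or type.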
Current Project Results
Allocation site, allocation method and
method trace are good object lifetime
indicators for both scalars and arrays
Object type exhibits high MI for scalar,
but less so for array, objects
Object size exhibits high MI for array
objects, but minimal for scalars.
Generalised object type name
characteristics exist, e.g. Iterator or
Enum classes are short lived.
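One possible way to derive a generalised type name is to keep only the last capitalised word of the class name. The slides do not say how the generalisation was done, so the rule, the function name and the example class names below are all assumptions.

```python
import re

# A capitalised word ("Iterator") or a run of capitals ("URL", "XML").
_WORD = re.compile(r"[A-Z][a-z0-9]+|[A-Z]+(?![a-z])")


def generalised_type(class_name):
    """Map a fully qualified class name to its last capitalised word.

    Assumed rule for illustration: drop the package, drop any outer
    class before '$', then take the final word of the simple name.
    """
    simple = class_name.rsplit(".", 1)[-1].rsplit("$", 1)[-1]
    words = _WORD.findall(simple)
    # Fall back to the simple name when no capitalised word is found,
    # e.g. for primitive array types.
    return words[-1] if words else simple


print(generalised_type("java.util.HashMap$KeyIterator"))  # Iterator
print(generalised_type("org.example.ColourEnum"))         # Enum
```

Collapsing many concrete types onto a few generalised names also shrinks the attribute alphabet, so advice learned for one Iterator class can apply to others.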
Machine Learning plan
Treat as a classification problem, not
regression: predict the tenuring decision, not
the exact object lifetime.
Mixture of categorical and ordinal data
makes learning more complex.
Lack of linear separability between
Large alphabets for many attributes, e.g.
type, reduce generality of advice.
Machine learning technique using tree
structure to represent classifier.
Branch nodes consist of attribute tests
Branches are attribute values
Leaf nodes represent classification
Can handle our complex data input
characteristics, e.g. categorical data.
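The structure described above can be sketched as a small ID3-style learner over categorical attributes. This is an illustrative sketch, not the classifier used in the project: ID3 with information gain is one common choice, and the attribute names, labels and training rows are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def build_tree(rows, labels, attributes):
    """rows: list of dicts mapping attribute name to categorical value.
    Returns a label (leaf) or a tuple (attribute, branches, default)."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attributes:
        return majority  # leaf node: the classification

    def gain(attr):
        # Information gain of splitting on attr.
        remainder = 0.0
        for value in set(r[attr] for r in rows):
            subset = [l for r, l in zip(rows, labels) if r[attr] == value]
            remainder += len(subset) / len(labels) * entropy(subset)
        return entropy(labels) - remainder

    best = max(attributes, key=gain)  # branch node: an attribute test
    rest = [a for a in attributes if a != best]
    branches = {}
    for value in set(r[best] for r in rows):  # one branch per value
        idx = [i for i, r in enumerate(rows) if r[best] == value]
        branches[value] = build_tree(
            [rows[i] for i in idx], [labels[i] for i in idx], rest)
    return (best, branches, majority)


def classify(tree, row):
    """Walk from the root to a leaf; unseen values get the majority label."""
    while isinstance(tree, tuple):
        attr, branches, default = tree
        if row[attr] not in branches:
            return default
        tree = branches[row[attr]]
    return tree


# Hypothetical training data: allocation site and generalised type.
rows = [
    {"site": "s1", "type": "Iterator"},
    {"site": "s1", "type": "Buffer"},
    {"site": "s2", "type": "Iterator"},
    {"site": "s2", "type": "Buffer"},
]
labels = ["nursery", "tenure", "nursery", "tenure"]
tree = build_tree(rows, labels, ["site", "type"])
advice = classify(tree, {"site": "s3", "type": "Buffer"})
```

In this toy data the type attribute separates the classes completely, so the learner tests it at the root and ignores the allocation site. Because each branch is simply an attribute value, no numeric encoding of the categorical data is needed.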
Future Project Plan
Begin with offline learning. Train decision
tree offline and plug results back into Jikes
RVM for execution. Evaluate effect on
garbage collection and benchmark
Consider online learning within the JVM:
the training stage is then done during application
execution, not before. The performance impact
of this stage is crucial.
Extensions? Increased granularity of object
Mutual Information can be used to
correlate heuristics with object
lifetimes, an approach not previously used in
the literature, and to confirm hypotheses.
Previously unknown heuristic found.
Objects with the same generalised object
type have similar lifetime characteristics.
Decision tree may be appropriate for
learning with available data.