
My thesis progress presentation


Presentation I had to give, halfway through my thesis, to discuss my progress.

Published in: Technology, Education

  1. Machine Learning for Garbage Collection
     • James Thomas
     • MSc Advanced Computer Science
     • “Machine Learning for Garbage Collection”
     • Gavin Brown & Mikel Lujan
  2. Garbage Collection
     • Traditional manual memory management by the programmer is prone to error.
     • Automatic memory management by the runtime environment takes out the “garbage”.
     • Stop-the-world GC in an object-oriented environment (Java).
     • GC time in this context is dead time, so the choice of algorithm is extremely important for efficiency.
  3. Generational Garbage Collection
     • Generational hypothesis: “The majority of objects die young” - D. Ungar, 1984
     • Separate heap objects into different generations, vary collection frequency. Reduce scanning and pause time.
     • Copying time increases GC time, copying reserve reduces heap size.
     • Allocate directly into mature generation?
  4. Generational Garbage Collection
  5. Machine Learning Introduction
     • Automatically extract patterns and processes underlying data generation.
     • Classification, Regression, Unsupervised learning (Clustering etc.)
     • Algorithm extremely important.
     • “Use ML to predict whether an object will live long enough to be tenured into the mature space”
  6. Project Progress
     • Existing literature review used to generate hypotheses about which heuristics could indicate object lifetime behaviour.
     • Modified Jikes RVM to trace object allocation and tenuring data.
     • DaCapo benchmarks, varying heap sizes.
     • Mutual Information used to test correlation of example heuristics as lifetime predictors.
  7. Mutual Information sample results
  8. Mutual Information sample results
  9. Current Project Results
     • Allocation site, allocation method and method trace are good object lifetime indicators for both scalars and arrays.
     • Object type exhibits high MI for scalar objects, but less so for array objects.
     • Object size exhibits high MI for array objects, but minimal MI for scalars.
     • Generalised object type name characteristics exist, e.g. Iterator or Enum classes are short lived.
  10. Machine Learning plan
     • Treat as a classification problem, not regression: predict the tenuring decision, not the exact object lifetime.
     • Mixture of categorical and ordinal data makes learning more complex.
     • Lack of linear separability between classes.
     • Large alphabets for many attributes, e.g. type, reduce generality of advice.
  11. Decision Tree
     • Machine learning technique using a tree structure to represent the classifier.
     • Branch nodes consist of attribute tests.
     • Branches are attribute values.
     • Leaf nodes represent the classification.
     • Can handle our complex data input characteristics, e.g. categorical data.
  12. Future Project Plan
     • Begin with offline learning. Train decision tree offline and plug results back into Jikes RVM for execution. Evaluate effect on garbage collection and benchmark execution.
     • Consider online learning within the JVM, with the training stage now done during application execution, not before. Performance impact of this stage is crucial.
     • Extensions? Increased granularity of object lifetime prediction.
  13. Current Project Conclusions
     • Mutual Information can be used to correlate heuristics with object lifetimes, an approach not previously used in the literature, and to confirm hypotheses.
     • Previously unknown heuristic found: objects with the same generalised object type have similar lifetime characteristics.
     • Decision tree may be appropriate for learning with available data.
  14. Questions?
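
The slides use Mutual Information (MI) to test how well a candidate heuristic, such as allocation site, predicts whether an object is tenured. The following is a minimal sketch of that kind of test, assuming both variables are discrete and that MI is estimated from observed frequencies. The trace data, the site names and the `mutual_information` helper are hypothetical and are not taken from the project's Jikes RVM traces.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Estimate I(X;Y) in bits from paired samples of two discrete variables."""
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of each (x, y) pair
    px = Counter(xs)              # marginal counts for X
    py = Counter(ys)              # marginal counts for Y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2(p(x,y) / (p(x) * p(y))), written in terms of counts
        mi += (c / n) * log2(c * n / (px[x] * py[y]))
    return mi


# Hypothetical trace: one entry per allocated object.
sites = ["A", "A", "B", "B", "A", "B", "A", "B"]   # allocation site
sizes = [16, 32, 16, 32, 16, 32, 32, 16]           # object size in bytes
tenured = [True, True, False, False, True, False, True, False]

print(mutual_information(sites, tenured))  # site fully determines tenuring here
print(mutual_information(sizes, tenured))  # size is independent of tenuring here
```

In this toy trace the allocation site determines the outcome exactly, so its MI equals the entropy of the tenuring label, while size carries no information. Real traces would fall somewhere in between, and the estimate becomes less reliable for attributes with many rare values.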

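
The decision tree described in slide 11 (attribute tests at branch nodes, attribute values on branches, a classification at each leaf) can be sketched for categorical data as below. This is an illustrative ID3-style learner, not the project's implementation: the attribute names, the sample rows and the class labels `"mature"` and `"nursery"` are assumptions made for the example, and the split criterion (lowest weighted entropy) is one common choice among several.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def build_tree(rows, labels, attributes):
    """Return a leaf label, or a node (attribute, {value: subtree}, default)."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attributes:
        return majority

    def remainder(attr):
        # Weighted entropy of the labels after splitting on attr.
        groups = {}
        for row, label in zip(rows, labels):
            groups.setdefault(row[attr], []).append(label)
        return sum(len(g) / len(labels) * entropy(g) for g in groups.values())

    best = min(attributes, key=remainder)
    rest = [a for a in attributes if a != best]
    branches = {}
    for value in {row[best] for row in rows}:
        pairs = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        branches[value] = build_tree(
            [r for r, _ in pairs], [l for _, l in pairs], rest
        )
    return (best, branches, majority)


def classify(tree, row):
    """Follow attribute tests to a leaf; unseen values use the node default."""
    while isinstance(tree, tuple):
        attr, branches, default = tree
        tree = branches.get(row[attr], default)
    return tree


# Hypothetical training data: one row per allocated object.
rows = [
    {"site": "Parser.read", "type": "Iterator", "kind": "scalar"},
    {"site": "Parser.read", "type": "Token", "kind": "scalar"},
    {"site": "Cache.put", "type": "Entry", "kind": "scalar"},
    {"site": "Cache.put", "type": "byte[]", "kind": "array"},
    {"site": "Index.build", "type": "Iterator", "kind": "scalar"},
    {"site": "Index.build", "type": "int[]", "kind": "array"},
]
labels = ["nursery", "nursery", "mature", "mature", "nursery", "mature"]

tree = build_tree(rows, labels, ["site", "type", "kind"])
print(classify(tree, {"site": "Other.run", "type": "Iterator", "kind": "scalar"}))
```

On this data the learner splits on `type`, because every type value maps to a single class. That also illustrates the concern raised in slide 10: an attribute with a large alphabet can separate the training data perfectly while giving little guidance for values it has never seen, which fall back to the node's default label.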