Memory Management in a Hardware-accelerated JVM

Slides of my presentation at an internal meeting of the KIS Research Group.

  1. Memory Management in a Hardware-accelerated JVM
     Peter Bertels (peter.bertels@ugent.be)
     Ghent University – Faculty of Engineering – Department of Electronics and Information Systems – KIS
     December 10th, 2008
  2. Hardware accelerators may lead to a significant performance boost
     [Chart: DNA aligner, software only vs. with acceleration]
  3. General-purpose processor and hardware accelerator form a hybrid system
     [Diagram labels: java, vhdl; hardware: host processor, PCI, accelerator]
  4. Java virtual machine acts as an abstraction layer for this hybrid system
     [Diagram layers: application; java virtual machine; hardware (host processor, PCI, accelerator)]
  5. Virtual machine delegates control to the hardware accelerator
     [Diagram: hardware (host processor, PCI, accelerator); hardware call, return]
  6. Virtual machine translates Java bytecode to native machine code … or hardware?
     [Diagram labels: baseline, optim. 1, optim. 2, hardware]
  7. Hybrid architecture benefits from communication-aware memory allocation
     [Diagram: hardware (host processor, PCI, accelerator); main memory, local memory]
     • Distributed Java heap: both components can access all objects
     • NUMA: non-uniform memory access
     • Optimal data allocation is very important
  8. Local allocation: memory location is based on the creating component
     [Diagram: hardware (host processor, PCI, accelerator); call( ), return( )]
     • Each component creates objects in its own memory
     • Best solution for most short-lived objects
  9. Self-learning strategy tries to find the optimal memory allocation
     • Group objects by creation site
     • Per creation site: count loads & stores per component
     • New objects are allocated based on these counters
     • Default: allocate in main memory
     [Example: creation site "Circle c = new Circle();" with the counters 12 and 397, labelled processor and accelerator]
  10. Determining which strategy performs best for each benchmark
     • DaCapo and SPECjvm2008 benchmark suites
     • Comparison of remote access ratio
       – baseline strategy: 47%
       – local allocation: 33%
       – self-learning: 26%
       – optimum: 11%
  11. Self-learning strategy is a good choice for most benchmarks
     [Chart labels: benchmarks pmd, 227mtrt, 202jess, 209db, 213javac; categories self-learning, self-learning ≈ local, local allocation, baseline]
  12. Most creation sites reach a stable allocation policy after only a few objects
     [Diagram: creation site "Circle c = new Circle();", placement (processor or accelerator) over time]
     • Except for the first 7 objects, all objects are allocated in the ‘optimal’ memory
  13. Some objects are placed before stabilisation, but the self-learning algorithm learns very fast
     [Chart: vertical axis 0% to 100%; horizontal axis ticks 0, 16, 512, 16384, never]
  14. Sampling data usage patterns increases the remote access ratio
     • Evaluation for benchmark antlr
     • Limited performance degradation
     [Chart: # remote accesses (0% to 100%) against sample rate: no sampling, 1/10, 1/100, 1/1,000, 1/10,000]
  15. Self-learning strategy also works with sampled data usage patterns
     [Chart: # remote accesses (0% to 100%) for benchmarks antlr, luindex, hsqldb, 227mtrt, serial, 209db]
  16. More programs benefit from hardware acceleration due to the self-learning strategy
     [Chart: relative execution time against relative cost of remote accesses (1, 10, 100); series: baseline, local, self-learning; other labels: 16, 4, 1]
  17. Hardware-accelerated JVM benefits from self-learning memory allocation
     • Remote accesses are a problem
       – 47% of all memory accesses
       – Slow communication channel
     • Self-learning memory allocation solves this problem
       – Significant reduction of remote accesses
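
The self-learning strategy on slide 9 can be sketched roughly as below. This is a minimal illustration, not the implementation presented in the talk. Only three things come from the slides: objects are grouped by creation site, loads and stores are counted per component, and main memory is the default. The class and method names (SelfLearningAllocator, recordAccess, placeNext), the string used as creation-site key, and the simple majority rule with ties going to main memory are all assumptions.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of per-creation-site placement; all names are illustrative. */
final class SelfLearningAllocator {
    /** Main memory sits next to the host processor, local memory next to the accelerator. */
    enum Memory { MAIN_MEMORY, ACCELERATOR_LOCAL }

    enum Component { PROCESSOR, ACCELERATOR }

    /** Load/store counters for one creation site, split by accessing component. */
    private static final class SiteStats {
        long processorAccesses;
        long acceleratorAccesses;
    }

    private final Map<String, SiteStats> statsBySite = new HashMap<>();

    /** Records one load or store on an object that was created at the given site. */
    void recordAccess(String creationSite, Component accessedBy) {
        SiteStats stats = statsBySite.computeIfAbsent(creationSite, key -> new SiteStats());
        if (accessedBy == Component.PROCESSOR) {
            stats.processorAccesses++;
        } else {
            stats.acceleratorAccesses++;
        }
    }

    /** Chooses the memory for the next object created at this site. */
    Memory placeNext(String creationSite) {
        SiteStats stats = statsBySite.get(creationSite);
        if (stats == null) {
            return Memory.MAIN_MEMORY; // default from the slides: allocate in main memory
        }
        // Assumption: plain majority rule; ties stay in main memory.
        return stats.acceleratorAccesses > stats.processorAccesses
                ? Memory.ACCELERATOR_LOCAL
                : Memory.MAIN_MEMORY;
    }
}
```

If the two counters on slide 9 are read as 12 accesses by the processor and 397 by the accelerator (an interpretation of the figure, not something the transcript states), placeNext for that site would return ACCELERATOR_LOCAL, while a site with no recorded accesses falls back to MAIN_MEMORY.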

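The remote access ratio compared on slide 10 and the sampling discussed on slide 14 can be illustrated in the same spirit. The sketch assumes that the ratio is the number of remote accesses divided by all memory accesses (slide 17 speaks of "47% of all memory accesses") and that sampling means recording one access out of every N. The deterministic every-Nth scheme and the names RemoteAccessCounter and AccessSampler are hypothetical; the slides do not say how samples were taken.

```java
/** Hypothetical sketch: counts memory accesses and reports the remote access ratio. */
final class RemoteAccessCounter {
    private long localAccesses;
    private long remoteAccesses;

    /** An access is remote when the object lives in the other component's memory. */
    void record(boolean isRemote) {
        if (isRemote) {
            remoteAccesses++;
        } else {
            localAccesses++;
        }
    }

    /** Remote accesses divided by all accesses; 0.0 when nothing was recorded. */
    double ratio() {
        long total = localAccesses + remoteAccesses;
        return total == 0 ? 0.0 : (double) remoteAccesses / total;
    }
}

/** Hypothetical sketch: lets one access out of every `interval` through to the profile. */
final class AccessSampler {
    private final int interval; // 1 = no sampling, 10 = sample rate 1/10, and so on
    private long seen;

    AccessSampler(int interval) {
        if (interval < 1) {
            throw new IllegalArgumentException("interval must be at least 1");
        }
        this.interval = interval;
    }

    /** Returns true for the 1st, (interval+1)th, (2*interval+1)th, ... access. */
    boolean shouldRecord() {
        return seen++ % interval == 0;
    }
}
```

In a profiler built along these lines, only accesses for which shouldRecord() returns true would update the per-creation-site counters, so the placement decisions rest on less data. That is consistent with slide 14, where sampling raises the remote access ratio only to a limited degree.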