This presentation was prepared for the paper titled "Garbage Collection Auto-Tuning for Java MapReduce on Multi-Cores". It was done as part of the "Advanced Virtualization Techniques" module.
Garbage collection auto tuning for java map reduce on multi-cores
1. Powerpoint Templates 1
Presentation By:
Pradeeban Kathiravelu
INESC-ID Lisboa
Instituto Superior Técnico,
Universidade de Lisboa
Garbage Collection Auto-Tuning for
Java MapReduce on Multi-Cores
Jeremy Singer George Kovoor Gavin Brown Mikel Luján
University of Glasgow
jeremy.singer@glasgow.ac.uk
kovoor.george@gmail.com
University of Manchester
firstname.lastname@manchester.ac.uk
Introduction
MRJ, a MapReduce Java framework for multi-core architectures.
Uses memory management auto-tuning techniques based on machine learning.
MRJ performance is within 10% of optimal on 75% of the benchmark tests.
Why GC Auto Tuning?
An MRJ end user cannot be expected to perform the expert analysis needed to determine whether GC activity is reducing MRJ performance, or how to improve the JVM configuration.
Motivation
Efficient adaptation to benchmark-specific or heap-size-specific anomalies.
Could be installed by the system administrator and automatically enabled for users who do not have sufficient permissions to change JVM parameters.
Enables rapid deployment of MRJ on new multi-core architecture layouts.
Contributions
A scalable Java fork/join framework for MapReduce (MRJ) on a commodity multi-core platform.
A comprehensive study of the impact of Java runtime garbage collection (GC) on MRJ.
An auto-tuning approach to optimize GC for MRJ.
MRJ
Same application interface as Hadoop.
Only map() and reduce() need to be defined.
Abstracts away the details of parallelization, runtime scheduling, and so on.
Lets the user focus on the application logic.
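To make the map()/reduce() division of labor concrete, here is a minimal word-count sketch on top of Java's standard fork/join pool. It is not MRJ's or Hadoop's actual API: the class name WordCountSketch, the MapTask helper, and the method signatures are hypothetical and chosen only for illustration. The user-written part is limited to map() and reduce(); the splitting, scheduling, and grouping are handled by the surrounding "framework" code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical sketch: names and signatures are illustrative, not MRJ's real API.
final class WordCountSketch {

    // User logic, part 1: emit one (word, 1) pair for every word in a line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // User logic, part 2: combine all values collected for one key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Framework side: split the input recursively, map the leaves, group by key.
    static final class MapTask extends RecursiveTask<Map<String, List<Integer>>> {
        private final List<String> lines;

        MapTask(List<String> lines) {
            this.lines = lines;
        }

        @Override
        protected Map<String, List<Integer>> compute() {
            if (lines.size() <= 1) {
                Map<String, List<Integer>> grouped = new TreeMap<>();
                for (String line : lines) {
                    for (Map.Entry<String, Integer> p : WordCountSketch.map(line)) {
                        grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
                    }
                }
                return grouped;
            }
            int mid = lines.size() / 2;
            MapTask left = new MapTask(lines.subList(0, mid));
            MapTask right = new MapTask(lines.subList(mid, lines.size()));
            left.fork();                                     // run the left half asynchronously
            Map<String, List<Integer>> merged = right.compute();
            left.join().forEach((k, v) ->
                    merged.computeIfAbsent(k, x -> new ArrayList<>()).addAll(v));
            return merged;
        }
    }

    // Framework side: run the map phase on a pool, then apply reduce() per key.
    static Map<String, Integer> run(List<String> lines, int threads) {
        ForkJoinPool pool = new ForkJoinPool(threads);
        try {
            Map<String, List<Integer>> grouped = pool.invoke(new MapTask(lines));
            Map<String, Integer> result = new TreeMap<>();
            grouped.forEach((k, v) -> result.put(k, reduce(k, v)));
            return result;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("the quick fox", "the lazy dog", "The end"), 4));
    }
}
```

The thread count passed to run() corresponds to the size of the fork/join pool, which is the knob the scalability results refer to.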
GC Overhead
GC overhead increases with the number of processors, more significantly for small heap sizes.
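One way to observe GC overhead from inside a running JVM is to compare the collection time reported by the standard java.lang.management beans against elapsed wall-clock time. The sketch below is an assumption about how such a measurement could be taken, not the methodology used in the paper; the class name GcOverheadSketch and the allocation loop used as a workload are hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical sketch: a simple in-process estimate of GC overhead.
final class GcOverheadSketch {

    // Approximate accumulated collection time (ms) summed over all collectors.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    // GC overhead as a fraction of elapsed time, clamped to the range [0, 1].
    static double overhead(long gcMillis, long elapsedMillis) {
        if (elapsedMillis <= 0) {
            return 0.0;
        }
        return Math.min(1.0, Math.max(0.0, (double) gcMillis / elapsedMillis));
    }

    public static void main(String[] args) {
        long gcBefore = totalGcMillis();
        long start = System.nanoTime();

        // Stand-in workload that allocates many short-lived objects.
        long checksum = 0;
        for (int i = 0; i < 2_000_000; i++) {
            checksum += new int[16].length;
        }

        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        long gcMs = totalGcMillis() - gcBefore;
        System.out.printf("checksum=%d gc=%dms elapsed=%dms overhead=%.1f%%%n",
                checksum, gcMs, elapsedMs, 100 * overhead(gcMs, elapsedMs));
    }
}
```

Running the same program with different heap sizes (for example via -Xmx) and thread counts gives a rough view of how the overhead fraction changes.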
Relative GC Performance
Input Dependent
Application performance differs across inputs.
Small → Serial.
Medium, Large → Parallel and Concurrent.
Different Heap Sizes.
Application Dependent
Parallel >> Serial & Concurrent ??
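Comparing the serial, parallel, and concurrent policies in practice means launching the same program with different JVM options. The sketch below builds such command lines. The mapping of "concurrent" to HotSpot's CMS flag is an assumption (the slides do not name the exact collector), and that flag only exists on JVMs that still ship CMS, which was removed in JDK 14. The class name GcPolicySketch, the heap size, and the main class "WordCount" are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: build JVM launch commands for each GC policy family.
final class GcPolicySketch {

    enum Policy {
        SERIAL("-XX:+UseSerialGC"),
        PARALLEL("-XX:+UseParallelGC"),
        // Assumption: "concurrent" means CMS; unavailable from JDK 14 onward.
        CONCURRENT("-XX:+UseConcMarkSweepGC");

        final String flag;

        Policy(String flag) {
            this.flag = flag;
        }
    }

    // Fixes initial and maximum heap to the same size so runs are comparable.
    static List<String> command(Policy policy, int heapMb, String mainClass) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add(policy.flag);
        cmd.add("-Xms" + heapMb + "m");
        cmd.add("-Xmx" + heapMb + "m");
        cmd.add(mainClass);
        return cmd;
    }

    public static void main(String[] args) {
        for (Policy p : Policy.values()) {
            System.out.println(String.join(" ", command(p, 512, "WordCount")));
        }
    }
}
```

Each generated list can be passed to java.lang.ProcessBuilder to time one run per policy and heap size.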
sm: concurrent > parallel ?
sm: Search for a word in an input file.
Death rate = (total garbage collected) / (total execution time)
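The death-rate definition above is a simple ratio, sketched here as a small helper. The units (bytes and seconds) and the numbers in main() are assumptions made up for illustration; they are not measurements from the paper.

```java
// Hypothetical sketch: death rate as garbage collected per unit of execution time.
final class DeathRateSketch {

    // Returns bytes of garbage collected per second of execution.
    static double deathRate(long garbageBytes, double executionSeconds) {
        if (executionSeconds <= 0) {
            throw new IllegalArgumentException("execution time must be positive");
        }
        return garbageBytes / executionSeconds;
    }

    public static void main(String[] args) {
        // Made-up example: 3 GiB of garbage collected over a 12-second run.
        long garbage = 3L * 1024 * 1024 * 1024;
        double rate = deathRate(garbage, 12.0);
        System.out.printf("death rate = %.1f MiB/s%n", rate / (1024 * 1024));
    }
}
```

A higher death rate means more objects become garbage per unit of time, which is one reason a benchmark may favor one collector over another.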
Related Work
The original work on MapReduce [13, 14] applies to compute clusters.
Ranger et al. describe the first application of MapReduce to multi-core processors [31].
Conventional memory management techniques do not scale to large multi-core environments [40].
Application of machine learning to Java runtime performance auto-tuning is a growing trend [26, 39].
Conclusions
MRJ: a Java-based framework for MapReduce parallelism.
Targets conventional multi-core architectures.
Speedups of up to 6x the default GC policy.
10% geometric mean speedup over all benchmarks with the largest input data sets.
Scalable performance with an increasing number of threads in the underlying Java fork/join pool.
A machine-learning GC auto-tuning policy improves runtime performance.
Selected References
[13] J. Dean and S. Ghemawat. MapReduce: simplified data processing on large clusters. In Proceedings of the 6th Symposium on Operating Systems Design and Implementation, pages 137–150, 2004.
[14] J. Dean and S. Ghemawat. MapReduce: simplified data processing on large clusters. Communications of the ACM, 51(1):107–113, 2008.
[26] F. Mao and X. Shen. Cross-input learning and discriminative prediction in evolvable virtual machines. In Proceedings of the 7th Annual IEEE/ACM International Symposium on Code Generation and Optimization, pages 92–101, 2009.
[31] C. Ranger, R. Raghuraman, A. Penmetsa, G. Bradski, and C. Kozyrakis. Evaluating MapReduce for multi-core and multiprocessor systems. In Proceedings of the 13th International Symposium on High Performance Computer Architecture, pages 13–24, 2007.
[39] C. Zhang and M. Hirzel. Online phase-adaptive data layout selection. In ECOOP 2008 Object-Oriented Programming, pages 309–334, 2008.
[40] Y. Zhao, J. Shi, K. Zheng, H. Wang, H. Lin, and L. Shao. Allocation wall: a limiting factor of Java applications on emerging multi-core platforms. ACM SIGPLAN Notices, 44(10):361–376, 2009.