Hyojeong Lee
Distributed Computing System Laboratory
Department of Computer Science and Engineering
Seoul National University, Korea
Progress Report
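Table of Contents
DOING / TODO
Swap-aware JVM GC
Summarize the first implementation
- Per-module execution time
- Supplement the evaluation (SPECjvm)
Page-level reimplementation
- Refer to related work
Complete page-level work
- Validation (DL4J, Spark)
- Compare with other policies
Plan for optimized GC
Journaling in Lustre FS
Survey and paper revision
Optimization and additional experiments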
Swap-aware JVM GC: First-round summary
● Motivation:
● Necessity of OS swap
● JVM GC policy is not aware of OS swap
● Background: GC process
Swap-aware JVM GC: First-round summary
● Scheme: Skip source-destination mapping
● Validation: with test program
● SPECjvm2008 for evaluation (heap size)
● scimark.fft.large (4G), crypto.aes (1G), serial (2G), xml.validation (0.5G)
● derby (2G)
Swap-aware JVM GC: First-round summary
(Figure labels: Swapout, Swapin)
● Problem
● The current GC does not allow free space between live objects.
● Specifically, object verification before/after GC fails if an object slot is free (a hole).
● Solution
● Goal
● Avoid source-destination mapping for the regions to skip, without modifying the existing metadata and policy.
● Ours
● Set metadata (i.e. dest_addr) properly, taking swapness into account.
● Problem (debugging)
● full_cp and the dense prefix are changed. (cf) first_dead_space_region()
● Related work (APSys’18, VEE’19)
● Insert dummy objects.
Swap-aware JVM GC: Reimplementation
● Solution (in detail)
● Related work
● summarize_skip_ver(): insert dummy objects into holes.
Swap-aware JVM GC: Reimplementation
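The dummy-object idea above can be illustrated with a toy model. This is a minimal sketch in Python, not HotSpot code: the heap is modelled as a list of (start, size, kind) tuples, and the names fill_holes and is_walkable are hypothetical helpers introduced only for illustration. It shows why a linear heap walk (the kind of check that object verification relies on) fails when holes are left between live objects and succeeds once each hole is covered by a dummy object. A real collector would also have to respect a minimum object size, which this sketch ignores.

```python
from typing import List, Tuple

Obj = Tuple[int, int, str]  # (start address, size in bytes, "live" or "dummy")


def fill_holes(live: List[Tuple[int, int]], heap_start: int) -> List[Obj]:
    """Cover every gap between live objects with a dummy object."""
    out: List[Obj] = []
    cursor = heap_start
    for start, size in sorted(live):
        if start > cursor:
            # Gap left by skipped (swapped) data: plug it with a dummy object.
            out.append((cursor, start - cursor, "dummy"))
        out.append((start, size, "live"))
        cursor = start + size
    return out


def is_walkable(objs: List[Obj], heap_start: int) -> bool:
    """A linear walk succeeds only if each object starts where the previous one ends."""
    cursor = heap_start
    for start, size, _kind in objs:
        if start != cursor:
            return False
        cursor += size
    return True


# Hypothetical layout: live objects with two holes, at [96, 256) and [384, 512).
live_objects = [(0, 64), (64, 32), (256, 128), (512, 16)]
raw_heap = [(start, size, "live") for start, size in live_objects]
filled_heap = fill_holes(live_objects, heap_start=0)

print(is_walkable(raw_heap, 0))     # holes break the walk
print(is_walkable(filled_heap, 0))  # dummy objects make the heap contiguous again
```

The trade-off is that the dummy objects occupy space until a later collection reclaims them, which matches the fragmentation cost listed for SAGP in the policy comparison.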
● Understand and revise the content
Serialization due to waiting for transaction commit
→ Parallel journal commit
● Optimization and additional experiments (Todo)
e.g. mdtest
Journaling in Lustre FS
TODO
DONE/DOING / TODO
Swap-aware JVM GC
Summarize the first implementation (~3/18)
- Per-module execution time
- Supplement the evaluation (SPECjvm)
Page-level reimplementation (~3/18)
- Refer to related work
Complete page-level work (~3/29)
- Validation (DL4J, Spark)
- Compare with other policies
Plan for optimized GC (Next)
Journaling in Lustre FS
Survey and paper revision (~3/15, Done)
Optimization and additional experiments (Next)
Backup: Target workloads (~3/15)
● Attempts
1. Cannot skip the entire memcpy.
2. ‘Validation after GC’ error caused by wrong metadata.
3. DOING
● Evaluation
● Simple Java program (microbenchmark)
● Make swapped objects targets for compaction
● Benchmarks
● DaCapo, SPECjvm2008, JOlden, Hyracks
● Neo4j (real workload)
● Embedded, disk-based, fully transactional Java persistence engine that manages graph data
● Spark (real workload)
● Sparkbench’s graph-computation and machine-learning algorithms
● Deeplearning4j (real workload)
● Deep learning platform for Java
Target list
● SPECjvm2008 for evaluation
● scimark.fft.large (4G)
● derby (2G)
● crypto.aes (1G)
● serial (2G)
● xml.validation (0.5G)
SPECjvm2008
● Kernel(page) level: Swap flag - Attempt (2)
● Evaluation w/ DL4J
● Object Detection: House Number Detection
● Dataset: http://ufldl.stanford.edu/housenumbers/
● 73,257 digit images for training
● 26,032 digit images for testing
● 531,131 additional, somewhat less difficult samples, to use as extra training data
● Result
● Baseline (DONE)
● Training iterations: 100
● Full GC (FGC): 2 times
● Memory usage: ~9 GB
● Optimized (TODO)
DL4J
● Kernel(page) level: Swap flag - Attempt (2)
● Evaluation w/ Spark
● Proper target
● SVDPlusPlus
● LogisticRegression
● How to execute?
● Environment setup: https://docs.google.com/document/d/1seI-ZzjKvcJeMOJFLLWX0hys7bJVla5TsTobtgrguAE/edit?usp=sharing
● Experiment procedure: https://docs.google.com/document/d/1eL5sGQrzUMM3SffMEyp0gQ1qeh7oJ3dbOD2TW6LPy9M/edit?usp=sharing
● Sparkbench analysis and baseline experiment results: https://docs.google.com/presentation/d/1VYHK2550iJjVg4BeLzavviCvdDQrehRZ_1oXN27CDKg/edit?usp=sharing
Spark (sparkbench)
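(cf) Other JVM GC policies
Columns: # / Policy / Copy swapped obj / Traverse for allocation / Other issues / etc.
1. Concurrent Mark and Sweep (CMS)
● Copy swapped obj: X
● Traverse for allocation: O
● Other issues: Floating garbage / More logic to deal with fragmentation / More space for the list
● etc.: Suitable for apps with a high ratio of long-lived objects and pause-time constraints
2. Parallel Compact
● Copy swapped obj: O
● Traverse for allocation: X
● Other issues: X
● etc.: Suitable for apps with pause-time constraints; default in Java 8
3. Garbage First (G1)
● Copy swapped obj: O
● Traverse for allocation: X
● Other issues: X
● etc.: Default in Java 9
4. SAGP (Swap-aware Parallel Compact)
● Copy swapped obj: X
● Traverse for allocation: X
● Other issues: Floating garbage / Check pagemap (fopen, close) / More fragmentation
● etc.: The existing allocation policy can be used as-is; the overhead can be eliminated if the kernel's swap information can be read back from the kernel
https://docs.google.com/presentation/d/1rLyJyny7NMmSLpd9f_z_arjzzngL4bxjAbgo0d5Lnik/edit?usp=sharing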
Backup slides
● Solutions
● Swap-aware GC policy
● (Fine-grained) Checking pagemap → Inserting a data structure (DS) that maintains swap info
● (Coarse-grained) Adding a reference counter + LRU list
● Optimized GC policy (TODO)
Solutions
(Figure labels: dense_prefix; Virtual (heap); Physical (kernel))
Actually, there is no need to compact
→ Just remap the virtual space!
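The fine-grained pagemap check can be sketched as follows. This is a minimal illustration in Python rather than the JVM-side implementation: it relies only on the documented Linux pagemap layout (one 64-bit entry per virtual page, bit 63 = page present, bit 62 = page swapped). The helper names read_entries and swpness, the fixed 4 KiB page size, and the fake pagemap file used for the demo are assumptions made so the example runs anywhere; on a real system the path would be /proc/self/pagemap (or /proc/<pid>/pagemap).

```python
import os
import struct
import tempfile

PAGE_SIZE = 4096        # assumed; query os.sysconf("SC_PAGE_SIZE") on a real system
ENTRY_BYTES = 8         # one 64-bit pagemap entry per virtual page
PRESENT_BIT = 1 << 63   # page is resident in RAM
SWAPPED_BIT = 1 << 62   # page is in swap


def decode_entry(entry: int) -> dict:
    """Extract the two flags the GC policy cares about."""
    return {"present": bool(entry & PRESENT_BIT), "swapped": bool(entry & SWAPPED_BIT)}


def read_entries(pagemap_path: str, vaddr: int, length: int) -> list:
    """Read the entries covering [vaddr, vaddr + length); length must be > 0."""
    first_page = vaddr // PAGE_SIZE
    last_page = (vaddr + length - 1) // PAGE_SIZE
    count = last_page - first_page + 1
    with open(pagemap_path, "rb") as f:
        f.seek(first_page * ENTRY_BYTES)
        data = f.read(count * ENTRY_BYTES)
    return list(struct.unpack(f"={len(data) // ENTRY_BYTES}Q", data))


def swpness(entries: list) -> int:
    """Number of swapped-out pages in the range (hypothetical reading of '_swpness')."""
    return sum(1 for e in entries if e & SWAPPED_BIT)


# Demo on a fake pagemap with four pages: present, swapped, unmapped, swapped.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(struct.pack("=4Q", PRESENT_BIT | 123, SWAPPED_BIT | 5, 0, SWAPPED_BIT))
    fake_path = tmp.name

entries = read_entries(fake_path, vaddr=PAGE_SIZE, length=3 * PAGE_SIZE)
region_swpness = swpness(entries)
os.remove(fake_path)
print(region_swpness)
```

Each check opens, seeks, and reads a file, which is consistent with the "Check pagemap (fopen, close)" overhead noted for SAGP and motivates keeping a separate data structure that caches swap info.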
● Implementation scheme
Swap-aware JVM GC Policy
(Figure labels: dense_prefix; Swap space; Swapped live data)
Process of Full GC
1. Mark: set the bit for each live object by tracing from the root set
2. Summarize: fill the region metadata and set the dense prefix
3. Compact: push_region to stack → do sliding compaction (memcpy)
Assume that a swapped region = LRU → no need to swap it in.
Metadata of Region (512k): source_reg, dest_addr, live_size, …
Bitmap: 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1
Add the variable ‘_swpness’ and check it using pagemap.
Region stack list: region stacks for each GC thread, with draining and stealing over the stacks.
When the source region’s _swpness > 0, skip pushing/copying.
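The summarize and compact steps with the _swpness check can be modelled roughly as below. This is a toy, region-granular sketch in Python under stated assumptions, not the HotSpot ParallelCompact code: each region's live data is treated as one block, a region with swpness > 0 is pinned at its current address, and the Region class plus the summarize and compact functions are hypothetical names for illustration. It shows how dest_addr can be set so that swapped regions are never pushed or copied while later live data still slides left.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

K = 1024
REGION_SIZE = 512 * K  # matches the "512k" label on the slide


@dataclass
class Region:
    index: int
    live_size: int                   # bytes of live data found by the mark phase
    swpness: int = 0                 # > 0: at least one page of the region is swapped out
    dest_addr: Optional[int] = None  # filled in by summarize()


def summarize(regions: List[Region]) -> None:
    """Assign dest_addr by sliding live data left, pinning swapped regions in place."""
    free = 0
    for r in regions:
        start = r.index * REGION_SIZE
        if r.swpness > 0:
            r.dest_addr = start          # stays where it is, so no swap-in is needed
            free = start + REGION_SIZE   # later data lands after the pinned region
        else:
            r.dest_addr = free
            free += r.live_size


def compact(regions: List[Region]) -> List[Tuple[int, int, int, int]]:
    """Return the (index, src, dest, size) copies; swapped regions are never pushed."""
    copies = []
    for r in regions:
        src = r.index * REGION_SIZE
        if r.swpness > 0 or r.live_size == 0 or r.dest_addr == src:
            continue                     # nothing to copy for this region
        copies.append((r.index, src, r.dest_addr, r.live_size))
    return copies


# Hypothetical heap: region 1 is swapped out, region 3 holds no live data.
regions = [
    Region(0, 100 * K),
    Region(1, 200 * K, swpness=3),
    Region(2, 50 * K),
    Region(3, 0),
    Region(4, 300 * K),
]
summarize(regions)
copies = compact(regions)
print(copies)
```

In this model the space between the end of region 0's live data and the start of region 1 stays unused, which is the hole that the verification problem and the dummy-object fix elsewhere in the report are about.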
● Kernel (page) level: Swap flag - Attempt (1)
● Evaluation w/ simple Java program
● Simple Java program
● Make swapped objects targets for compaction
● Result
Swap-aware JVM GC Policy
(Figure annotations, first timeline)
1. Allocate 30 objects
2. Allocate 10 objects (objects 0 to 9 are swapped out)
3. Access objects 0 to 9 (objects 10 to 19 are swapped out) and set objects 0 to 9 to null (making them GC targets)
4. GC triggered: mark and summarize (checking swpness takes about 24 sec)
5. Compact
(Figure annotations, second timeline)
1. Allocate 30 objects
2. Allocate 10 objects (objects 0 to 9 are swapped out)
3. Access objects 0 to 9 (objects 10 to 19 are swapped out) and set objects 0 to 9 to null (making them GC targets)
4. GC triggered: mark, summarize, and compact
● Kernel(page) level: Swap flag - Attempt (2)
● Evaluation w/ Simple java program
Swap-aware JVM GC Policy

Progress_190315
