Progress_190213
1. Hyojeong Lee
Distributed Computing System Laboratory
Department of Computer Science and Engineering
Seoul National University, Korea
Progress Report
3. ● Motivation
● Necessity of swapping
● When running memory-intensive workloads such as deep learning algorithms,
the size of the generated data is large and unpredictable.
● Current GC policy is not aware of swapping.
● ParallelCompact, the default GC policy of Java 8, does not consider whether
data is swapped out.
● Processing swapped-out data during GC causes swap I/O, which results in long GC times.
● For example,
● SVD++ in Sparkbench (swap I/O & execution time)
Motivation
5. ● Features
● High Memory usage
● High Locality
(So, there is compaction for swapped data.)
● Target workloads
● Simple java program for validation
● Java benchmarks
● DaCapo, SPECjvm2008, JOlden, Hyracks
● Real workloads
● Neo4j, Spark, Deeplearning4j (DL4J)
Targets
6. ● Solutions
● Swap-aware GC policy
● (Fine-grained) Check the pagemap → insert a data structure that
maintains swap info
● (Coarse-grained) Adding reference counter + LRU list
● Optimized GC policy (TODO)
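The fine-grained pagemap check can be sketched as below. This is a minimal illustration, not the code inside the JVM: it assumes Linux's /proc/<pid>/pagemap layout (one 64-bit entry per virtual page, bit 63 = present, bit 62 = swapped) and a 4 KiB page size. The function names are hypothetical.

```python
import struct

PAGE_SIZE = 4096        # assumed page size; real code should query the system
ENTRY_BYTES = 8         # one 64-bit pagemap entry per virtual page
PRESENT_BIT = 1 << 63   # page is resident in RAM
SWAPPED_BIT = 1 << 62   # page is in swap


def pagemap_offset(vaddr, page_size=PAGE_SIZE):
    """Byte offset of the pagemap entry that describes virtual address vaddr."""
    return (vaddr // page_size) * ENTRY_BYTES


def decode_entry(raw):
    """Decode one raw 8-byte pagemap entry (native byte order)."""
    (entry,) = struct.unpack("=Q", raw)
    return {
        "present": bool(entry & PRESENT_BIT),
        "swapped": bool(entry & SWAPPED_BIT),
    }


def is_swapped(pid, vaddr):
    """Return True if the page holding vaddr in process pid is swapped out.

    Hypothetical helper: needs Linux and permission to read the pagemap.
    """
    with open(f"/proc/{pid}/pagemap", "rb") as f:
        f.seek(pagemap_offset(vaddr))
        raw = f.read(ENTRY_BYTES)
    if len(raw) != ENTRY_BYTES:
        return False  # address not covered by the pagemap
    return decode_entry(raw)["swapped"]
```

Opening and closing the file for every query is simple but slow; a real check would keep the file open and read entries for a whole region at once.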
Solutions
Figure: dense_prefix shown in the virtual (heap) and physical (kernel) address spaces.
Actually, no need to compact → just remap the virtual space!
7. ● Implementation scheme
Swap-aware JVM GC Policy
Figure labels: dense_prefix, Swap space, Swapped live data
● Process of Full GC
1. Mark: set the bit for each live object by tracing from the root set
2. Summarize: fill in region metadata & set the dense prefix
3. Compact: push_region to stack → do sliding compaction (memcpy)
● Bitmap
1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1
● Metadata of Region (512k): source_reg, dest_addr, live_size, …
● Add variable ‘_swpness’ & check it using pagemap
● Region stack list: region stacks for each GC thread, with draining and stealing over stacks
● When src’s _swpness > 0, skip pushing/copying
● Assume that swapped region = LRU → no need to swap in.
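The skip rule in the compact phase can be illustrated with a toy model. This sketch is not HotSpot code: the Region class, the reading of swpness as a count of swapped-out pages, and the round-robin distribution over per-thread stacks are assumptions made for illustration. It also ignores the metadata consistency problems that skipping regions can cause.

```python
from dataclasses import dataclass


@dataclass
class Region:
    index: int
    live_size: int    # live bytes recorded by the summarize phase
    swpness: int = 0  # assumed meaning: number of swapped-out pages in the region


def fill_region_stacks(regions, dense_prefix_end, n_threads):
    """Distribute regions to compact over per-thread stacks, skipping swapped ones.

    Regions below dense_prefix_end belong to the dense prefix and are not moved.
    Regions with swpness > 0 are not pushed, so they are never copied (no swap-in).
    """
    stacks = [[] for _ in range(n_threads)]
    skipped = []
    next_stack = 0
    for region in regions:
        if region.index < dense_prefix_end:
            continue  # inside the dense prefix: left in place
        if region.live_size == 0:
            continue  # nothing live to copy
        if region.swpness > 0:
            skipped.append(region.index)  # swap-aware rule: skip pushing/copying
            continue
        stacks[next_stack].append(region.index)
        next_stack = (next_stack + 1) % n_threads
    return stacks, skipped


regions = [
    Region(0, 512), Region(1, 100), Region(2, 200, swpness=3),
    Region(3, 0), Region(4, 50), Region(5, 10),
]
stacks, skipped = fill_region_stacks(regions, dense_prefix_end=1, n_threads=2)
```

In this example region 2 is the only one skipped; regions 1, 4, and 5 are spread over the two stacks.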
8. ● Attempts
1. Can’t skip entire memcpy.
2. ‘Validation after GC error’ because of wrong metadata.
3. DOING
● Evaluation
● Simple Java program (microbench)
● Make swapped-out objects the target of compaction
● Benchmarks (benchmark)
● DaCapo, SPECjvm2008, JOlden, Hyracks
● Neo4j (real workload)
● embedded, disk-based, fully transactional Java persistence engine that
manages graph data
● Spark (real workload)
● Sparkbench’s graph-computation & machine-learning algorithms
● Deeplearning4j (DL4J) (real workload)
● Deep learning platform for Java
Swap-aware JVM GC Policy
9. ● Kernel(page) level: Swap flag - Attempt (1)
● Evaluation w/ simple Java program
● Simple Java program
● Make swapped-out objects the target of compaction
● Result
Swap-aware JVM GC Policy
Timeline with swap-aware policy:
1. Allocate 30 objects
2. Allocate 10 objects (objects 0 to 9 are swapped out)
3. Access objects 0 to 9 (objects 10 to 19 are swapped out) & set objects 0 to 9 to null (making them GC targets)
4. GC triggered, do mark and summarize (checking swpness consumes about 24 sec)
5. Do compact
Timeline with baseline:
1. Allocate 30 objects
2. Allocate 10 objects (objects 0 to 9 are swapped out)
3. Access objects 0 to 9 (objects 10 to 19 are swapped out) & set objects 0 to 9 to null (making them GC targets)
4. GC triggered, do mark and summarize & do compact
10. ● Kernel(page) level: Swap flag - Attempt (2)
● Evaluation w/ Simple java program
Swap-aware JVM GC Policy
11. ● Kernel(page) level: Swap flag - Attempt (2)
● Evaluation w/ Simple java program
Swap-aware JVM GC Policy
12. ● Kernel(page) level: Swap flag - Attempt (2)
● Evaluation w/ DL4J
● Object Detection: House Number Detection
● Dataset: http://ufldl.stanford.edu/housenumbers/
● 73,257 digit images for training
● 26,032 digit images for testing
● 531,131 additional, somewhat less difficult samples, to use as
extra training data
● Result
● Baseline (DONE)
● Training iterations: 100
● Full GC (FGC): 2 times
● Memory usage: ~ 9 GB
● Optimized (TODO)
Swap-aware JVM GC Policy
13. ● Swap-aware JVM GC policy
● ~ 02.28
● Complete page level solution
● Validation with real workloads (DL4J, Spark)
● Compare with existing GC policies
● Plan for optimized GC policy
● Parallel logging
● ~ 02.28
● Parallel logging on Lustre file system
● Improve paper: An Efficient Journaling Mechanism in Lustre
File System for Fast Storage Devices
TODO (outline)
15. (cf) JVM GC policies
# | Policy | Copy swapped obj | Traverse for allocation | Other issues | etc
1 | Concurrent Mark and Sweep (CMS) | X | O | Floating garbage / More logic to deal with fragmentation / More space for list | Suitable for apps with a high ratio of long-lived objects and pause-time constraints
2 | Parallel Compact | O | X | X | Suitable for apps with pause-time constraints; default in Java 8
3 | Garbage first (G1) | O | X | X | Default in Java 9
4 | SAGP (Swap-aware Parallel Compact) | X | X | Floating garbage / Check pagemap (fopen, close) / More fragmentation | Existing allocation policy can be used as is; the overhead can be removed if the kernel's swap information can be read back in the reverse direction
https://docs.google.com/presentation/d/1rLyJyny7NMmSLpd9f_z_arjzzngL4bxjAbgo0d5Lnik/edit?usp=sharing
16. Swap-aware JVM GC Policy
After optimization (left):
1. Allocate 30 objects
2. Allocate 10 objects (objects 0 to 9 are swapped out)
3. Access objects 0 to 9 (objects 10 to 19 are swapped out) & set objects 0 to 9 to null (making them GC targets)
4. GC triggered, do mark and summarize (checking swpness consumes about 24 sec)
5. Do compact
Before optimization (right):
1. Allocate 30 objects
2. Allocate 10 objects (objects 0 to 9 are swapped out)
3. Access objects 0 to 9 (objects 10 to 19 are swapped out) & set objects 0 to 9 to null (making them GC targets)
4. GC triggered, do mark and summarize & do compact
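Comments:
- Before optimization (right) / after optimization (left); each GC step is marked with a red arrow
- The current, incomplete version cannot skip all swapped regions, but a preliminary performance comparison against the baseline shows:
- Swap I/O: swapin reduced by about 300 MB / swapout reduced by about 600 MB
- Execution time: increased by about 15 sec
- Once the optimization is complete, swapin and swapout are each expected to drop by more than 10 GB
- Therefore, given that check_swpness takes about 24 sec, the increase in execution time leaves room for improvement once the optimization is complete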
          | swapin (MB) | swapout (MB) | total time (sec)
optimized | 31109 | 41253 | 180
baseline  | 31411 | 41866 | 165
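The figures quoted in the comments follow from the table by simple arithmetic, worked through below. The last value is only a rough bound: it assumes the roughly 24 sec spent in check_swpness could be removed entirely with nothing else changing, which the slides do not claim.

```python
# Measurements copied from the results table above.
results = {
    "optimized": {"swapin_mb": 31109, "swapout_mb": 41253, "total_sec": 180},
    "baseline": {"swapin_mb": 31411, "swapout_mb": 41866, "total_sec": 165},
}
CHECK_SWPNESS_SEC = 24  # approximate cost of checking swpness, from the slides


def delta(metric):
    """Optimized minus baseline for one metric (negative means a reduction)."""
    return results["optimized"][metric] - results["baseline"][metric]


swapin_delta = delta("swapin_mb")    # -302 MB, the "about 300 MB" reduction
swapout_delta = delta("swapout_mb")  # -613 MB, the "about 600 MB" reduction
time_delta = delta("total_sec")      # +15 sec

# Hypothetical best case: total time if the pagemap check cost nothing.
time_without_check = results["optimized"]["total_sec"] - CHECK_SWPNESS_SEC  # 156 sec
```

Under that assumption the optimized run would take 156 sec, below the 165 sec baseline, which is the room for improvement the comments refer to.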