Translation cache policies for dynamic binary translation
Saber Ferjani
École Nationale des Sciences de l'Informatique
Responsible: Prof. Frédéric Pétrot
Supervisor: Luc Michel
TIMA Laboratory - SLS Group, Grenoble, France
 DBT is a CPU simulation technique: it reads a short sequence of code from one CPU (the target), translates it, and executes the result on a different CPU (the host).
[Figure: the simulated target's asm code is translated into Translated Blocks (TB) stored on the host machine]
 Translation cache: a buffer in the host machine that stores the Translated Blocks (TB)
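The lookup-translate-execute cycle described above can be sketched as a small loop over a translation cache. This is an illustrative model, not Qemu's actual code; `translate_block` and `execute_block` are hypothetical stand-ins for the real translator and executor.

```python
# Minimal sketch of a DBT execution loop with a translation cache.
# translate_block / execute_block are hypothetical stand-ins for the
# real translator and executor of an engine such as Qemu.

def translate_block(target_pc):
    """Pretend translation: return a host 'code' object for this block."""
    return ("host_code_for", target_pc)

def execute_block(tb):
    """Pretend execution: return the next target PC (here: just advance)."""
    return tb[1] + 4

translation_cache = {}   # target PC -> translated block (TB)

def run(start_pc, steps):
    pc = start_pc
    for _ in range(steps):
        tb = translation_cache.get(pc)
        if tb is None:                    # miss: translate and cache the block
            tb = translate_block(pc)
            translation_cache[pc] = tb
        pc = execute_block(tb)            # hit or freshly translated: execute
    return pc

run(0x1000, 3)
```

On a cache hit the translation step is skipped entirely, which is where DBT gains its speed over pure interpretation.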
Outline
1. Virtualization and simulation techniques
2. Qemu Internals
3. Typical cache algorithms
4. Cache algorithm proposal
5. Simulation results
6. Conclusion & Perspectives
1. Virtualization and simulation techniques
1.1. Just In Time Compiler
1.2. Hosted & Native Hypervisors
1.3. Virtualization tools
Virtual Box
Virtual PC
VMware
Xen
Bochs
Valgrind
Qemu
KVM
1.4. Simulation techniques
 Interpretive technique ► extremely slow!
 Native simulation ► needs the source code!
 Binary translation:
 Static ► cannot handle indirect branches
 Dynamic ► quite fast & flexible
2. Qemu internals
2.1. Overview
 Generic & Open source machine emulator
 Created by Fabrice Bellard in 2003
 Supported targets: IA32, ARM, SPARC, MIPS, PPC…
2.2. Execution flow example
2.3. Main execution loop
2.4. Translation cache size
2.5. TB allocation
3. Typical cache algorithms
Optimal cache algorithm (offline)
Basic cache algorithms: Flush, Random, FIFO, LRU, LFU
Advanced cache algorithms: LRFU, 2Q, LIRS, ARC
Qemu constraints:
TBs are not movable
TB size is variable
TB size is unpredictable
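As a reference point for the basic policies above, a byte-capacity LRU can be sketched as follows. This is a minimal illustrative model, not Qemu code; the sizes are arbitrary, and the sketch also hints at the constraint just listed, since evicting variably sized, non-movable TBs fragments the cache in ways a page-style LRU never faces.

```python
from collections import OrderedDict

# Byte-capacity LRU: evicts least-recently-used TBs until a new TB fits.
# Addresses and sizes are illustrative assumptions.

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity          # total bytes available
        self.used = 0
        self.tbs = OrderedDict()          # address -> size, in LRU order

    def access(self, addr, size):
        if addr in self.tbs:              # hit: move to the MRU position
            self.tbs.move_to_end(addr)
            return True
        while self.used + size > self.capacity:   # evict LRU victims
            _, victim_size = self.tbs.popitem(last=False)
            self.used -= victim_size
        self.tbs[addr] = size             # miss: insert the new TB
        self.used += size
        return False
```

Because a victim's bytes are not contiguous with the newcomer's slot in a real, non-movable layout, this simple accounting is exactly what Qemu's constraints break.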
4. Cache algorithm proposal
4.1. Algorithm design
4.2. Data structure
Constant insertion overhead
Frequently referenced TBs are elected for re-translation into a separate cache area
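The two-area idea can be sketched as follows: new TBs live in the cold area (CSA), and TBs executed often enough are elected into the hot-spot area (HSA) so they survive CSA flushes. The area names and the threshold F_th follow the slides; the bookkeeping itself is our illustrative assumption, not the actual implementation.

```python
# Sketch of the CSA/HSA election scheme described above.
# F_TH and the counting scheme are illustrative assumptions.

F_TH = 3                 # execution-count threshold for promotion

csa = {}                 # target addr -> execution count (cold area)
hsa = set()              # target addrs promoted to the hot-spot area

def execute(addr):
    if addr in hsa:                      # hot TB: runs from the HSA
        return "hsa"
    csa[addr] = csa.get(addr, 0) + 1     # cold TB: count executions in CSA
    if csa[addr] > F_TH:                 # elected: re-translate into HSA
        hsa.add(addr)
        del csa[addr]
        return "promoted"
    return "csa"

def flush_csa():
    csa.clear()                          # hot TBs in the HSA survive the flush
```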
4.3. HST update
Before a CSA flush, add the addresses of all TBs that were executed more than F_th times
The HST is used as a circular buffer
The HST size is fixed to half of the HSA size
[Figure: HST circular buffer holding entries @HS1 … @HS5]
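The Hot-Spot Table (HST) update rule above can be sketched as a fixed-size circular buffer. The buffer size here is an arbitrary illustrative number (the slides fix it to half the HSA size); the function names are our own.

```python
# Sketch of the HST as a fixed-size circular buffer: the newest entry
# overwrites the oldest one once the buffer is full.
# HST_SIZE and the function names are illustrative assumptions.

HST_SIZE = 4
hst = [None] * HST_SIZE
head = 0

def hst_record(addr):
    global head
    hst[head] = addr              # overwrite the oldest entry when full
    head = (head + 1) % HST_SIZE

def hst_contains(addr):
    return addr in hst            # linear scan: cost grows with HST size
```

The linear scan in `hst_contains` is the cost noted later in the conclusion: the address-find operation depends on the HST size.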
5. Simulation results
5.1. Qemu log
Qemu monitor: back-end configuration console interface
Log options:
out_asm: show generated host code
in_asm: show target assembly code
exec: show a trace before each executed TB
…etc.
Generated log of (log exec):
Trace (Host Address) [(Target Address)]
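Extracting the (host, target) address pairs from such a trace can be sketched with a small parser. The exact field layout of the exec log varies across Qemu versions, so the regex below is an assumption matching only the "Trace <host address> [<target address>]" shape described on the slide.

```python
import re

# Sketch of parsing 'log exec' trace lines of the form:
#   Trace 0x<host address> [<target address>]
# The regex is an assumption based on the format shown above.

TRACE_RE = re.compile(r"Trace\s+(0x[0-9a-f]+)\s+\[(0x)?([0-9a-f]+)\]")

def parse_trace_line(line):
    m = TRACE_RE.search(line)
    if not m:
        return None                       # not a trace line
    host = int(m.group(1), 16)            # host address of the TB
    target = int(m.group(3), 16)          # simulated target address
    return host, target
```

A stream of such pairs is exactly the input a trace-driven cache simulator needs.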
5.2. TB-trace: Translation cache simulator
5.3. Simulated cache algorithms
• A-LRU
• A-LFU
• A-2Q
[Figure: CSA managed by LRU/LFU, with the HST feeding promoted TBs into the HSA]
5.4. Guest machines used with Qemu
LZMA benchmark
Linux Kernel
Windows XP start-up
5.5. Guest 1: LZMA benchmark over Debian

CSA flushes (Quota = 0.25 / 0.375 / 0.5):
LRU: 62, 89, 72
LFU: 50, 55, 52
2Q: 56, 68, 88

Hotspot hit (Quota = 0.25 / 0.375 / 0.5):
LRU: 18.5%, 39.6%, 26.1%
LFU: 86.9%, 91.3%, 90.1%
2Q: 81.8%, 81.9%, 81.8%
5.6. Guest 2: Linux kernel 2.6.20

CSA flushes (Quota = 0.25 / 0.375 / 0.5):
LRU: 15, 18, 22
LFU: 15, 17, 21
2Q: 16, 19, 23
(two of the configurations incur one additional HSA flush)

Hotspot hit (Quota = 0.25 / 0.375 / 0.5):
LRU: 24.1%, 32.1%, 43.6%
LFU: 24.4%, 61.9%, 57.4%
2Q: 30.0%, 64.1%, 65.2%
5.7. Guest 3: Windows XP start-up

CSA flushes (Quota = 0.25 / 0.375 / 0.5):
LRU: 15, 18, 21
LFU: 15, 17, 21
2Q: 16, 19, 24
(three of the configurations incur one additional HSA flush)

Hotspot hit (Quota = 0.25 / 0.375 / 0.5):
LRU: 16.0%, 45.2%, 52.1%
LFU: 23.4%, 56.5%, 51.4%
2Q: 29.0%, 45.3%, 64.7%
6. Conclusion & Perspectives
6.1. Conclusion
Qemu's translation cache is inefficient
Cache algorithms based on page replacement cannot be used directly
Advantages of our algorithm proposal:
Reduces unneeded re-translations
TB insertion overhead is constant
Drawbacks:
Invalidated TBs remain allocated
The address-find operation depends on the HST size
6.2. Perspectives
Use a hash function for the HST to accelerate TB lookup before each new translation
Use an op-code buffer to accelerate the re-translation of hot spots
Estimate the size of the next translation and try to overwrite invalidated TBs
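The first perspective can be sketched by pairing the circular HST buffer with a hash-based index, so the membership check before each new translation becomes O(1) instead of a linear scan. The set-based side index is our illustrative assumption; sizes and names are arbitrary.

```python
# Sketch: circular HST buffer plus a hash-based index (a set) kept in
# sync on every overwrite, so lookups no longer scan the whole buffer.
# HST_SIZE, hst_index, and the function names are assumptions.

HST_SIZE = 4
hst = [None] * HST_SIZE
hst_index = set()              # hash-based view of the buffer contents
head = 0

def hst_record(addr):
    global head
    old = hst[head]
    if old is not None:
        hst_index.discard(old)  # keep the index in sync on overwrite
    hst[head] = addr
    hst_index.add(addr)
    head = (head + 1) % HST_SIZE

def hst_contains(addr):
    return addr in hst_index    # O(1) average-case membership test
```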
Questions?
