QEMU Intro.
Chiawei, Wang
2015/07/17
1
Story Time
• Emulation vs. Virtualization
• Why we love emulators
• QEMU
• ISA translation of QEMU
• Guest Insn. → Intermediate Representation → Host Insn.
• Code Block Translation
• Translation Block Cache
• Translation Block Chaining
• Helper Feature of QEMU
2
Emulation vs. Virtualization
• Both can be used to host VMs (i.e., to act as a hypervisor)
• Virtualization
• Shares the underlying hardware, partitioned into disjoint sets, one per VM instance.
• Host ISA is the same as Guest ISA.
• Guest operations can be dispatched directly to the hardware.
• Fast
• Emulation
• Everything in the Guest ISA is implemented in software.
• Registers, Memory, I/O
• Host ISA can differ from Guest ISA.
• Guest operations are translated into operations on the emulated devices.
• Slow
3
Why We LOVE Emulators
• Everything is implemented in software!
• Everything can be customized on demand!
• Welcome to the code-tracing hell….
• Popular emulators
• QEMU
• Bochs
• We prefer QEMU because of its better performance.
• We will give more details later.
4
QEMU
• QEMU, a Fast and Portable Dynamic Translator
• http://static.usenix.org/event/usenix05/tech/freenix/full_papers/bellard/bellard.pdf
• Supports emulation of numerous ISAs
• i386
• x86_64
• arm
• mips
• ppc
• Etc.
• QEMU is also a client of Linux KVM (virtualization).
• Here we focus only on its emulation functionality.
5
QEMU Snapshot & Console
6
Before we dig into QEMU …
Let us walk through the emulation
of a code snippet as an example.
7
Example of Emulation
• Emulate ARM Guest on x86 Host
• Emulate R0, R1 w/ EAX, ECX
ARM (Guest) code:
E3 A0 00 01 MOV R0, #1
E3 A0 10 02 MOV R1, #2
E0 80 00 01 ADD R0, R0, R1
x86 (Host) code, translated one instruction at a time:
B8 01 00 00 00 MOV EAX, 0x1
B9 02 00 00 00 MOV ECX, 0x2
01 C8 ADD EAX, ECX
Register values after each step:
Step                          R0  R1  EAX  ECX
Initial state                 0   0   0    0
Guest: MOV R0, #1             1   0   0    0
Host:  MOV EAX, 0x1           1   0   1    0
Guest: MOV R1, #2             1   2   1    0
Host:  MOV ECX, 0x2           1   2   1    2
Guest: ADD R0, R0, R1         3   2   1    2
Host:  ADD EAX, ECX           3   2   3    2
15
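The walkthrough above can be reproduced with a few lines of code. The following is a minimal sketch, not how QEMU does it: it decodes only the two ARM instruction forms used in the example and emits the matching x86 bytes. The names `REG_MAP` and `translate_arm_word` are hypothetical, and the register mapping (R0 to EAX, R1 to ECX) is the one chosen on the slides.

```python
# Hypothetical register mapping taken from the slides: R0 -> EAX, R1 -> ECX.
# Values are x86 register numbers (EAX = 0, ECX = 1).
REG_MAP = {0: 0, 1: 1}


def translate_arm_word(word: int) -> bytes:
    """Translate one 32-bit ARM data-processing instruction to x86 bytes.

    Only the two forms from the example are handled:
    MOV Rd, #imm8 and ADD Rd, Rd, Rm. Condition flags are ignored.
    """
    if word >> 28 != 0xE:
        raise NotImplementedError("only the 'always' condition is handled")
    if (word >> 26) & 0x3 != 0:
        raise NotImplementedError("not a data-processing instruction")
    is_imm = (word >> 25) & 1
    opcode = (word >> 21) & 0xF
    rn = (word >> 16) & 0xF
    rd = (word >> 12) & 0xF

    if is_imm and opcode == 0xD:  # MOV Rd, #imm
        if (word >> 8) & 0xF:
            raise NotImplementedError("rotated immediates are not handled")
        imm = word & 0xFF
        # x86: B8+r imm32 (MOV r32, imm32); the immediate is little-endian.
        return bytes([0xB8 + REG_MAP[rd]]) + imm.to_bytes(4, "little")

    if not is_imm and opcode == 0x4 and rd == rn and (word & 0xFF0) == 0:
        rm = word & 0xF  # ADD Rd, Rd, Rm with no shift
        # x86: 01 /r (ADD r/m32, r32) with a register-direct ModRM byte.
        return bytes([0x01, 0xC0 | (REG_MAP[rm] << 3) | REG_MAP[rd]])

    raise NotImplementedError(f"unsupported instruction {word:#010x}")


guest_code = [0xE3A00001, 0xE3A01002, 0xE0800001]
host_code = b"".join(translate_arm_word(w) for w in guest_code)
print(host_code.hex(" "))  # b8 01 00 00 00 b9 02 00 00 00 01 c8
```

Even this toy version is tied to one Guest/Host pair, which is the problem the next slides address.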
Translation Between Different ISA
Guest Host
16
Translation Between Different ISA
Guest Host
code translation
17
Translation Between Different ISA
Guest Host
code translation
Are you fucking
kidding me ?
21
Translation Between Different ISAs
in QEMU
• QEMU adopts an abstraction layer between Guest code and
Host code.
• Tiny Code Generator (TCG), which defines an intermediate
representation (IR).
Guest → TCG → Host
22
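A rough way to see why the intermediate layer helps: with an IR, each Guest ISA needs one front end and each Host ISA needs one back end, instead of one translator per Guest/Host pair. The sketch below uses a made-up toy IR (not real TCG ops), and every name in it (`arm_front_end`, `x86_back_end`, `translate`, `translators_needed`) is hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Toy IR op: (operation, destination register, immediate). Not real TCG.
IROp = Tuple[str, str, int]


def arm_front_end(insn: str) -> List[IROp]:
    """Guest side: 'MOV R0, #1' -> [('movi', 'r0', 1)]."""
    mnemonic, operands = insn.split(None, 1)
    dst, imm = [part.strip() for part in operands.split(",")]
    if mnemonic.upper() != "MOV" or not imm.startswith("#"):
        raise NotImplementedError(insn)
    return [("movi", dst.lower(), int(imm[1:], 0))]


def x86_back_end(ops: List[IROp]) -> List[str]:
    """Host side: [('movi', 'r0', 1)] -> ['mov eax, 0x1']."""
    reg_map = {"r0": "eax", "r1": "ecx"}
    out = []
    for op, dst, value in ops:
        if op != "movi":
            raise NotImplementedError(op)
        out.append(f"mov {reg_map[dst]}, {value:#x}")
    return out


FRONT_ENDS: Dict[str, Callable[[str], List[IROp]]] = {"arm": arm_front_end}
BACK_ENDS: Dict[str, Callable[[List[IROp]], List[str]]] = {"x86": x86_back_end}


def translate(guest: str, host: str, insn: str) -> List[str]:
    # Any front end can be combined with any back end through the IR.
    return BACK_ENDS[host](FRONT_ENDS[guest](insn))


def translators_needed(guests: int, hosts: int) -> Dict[str, int]:
    # Direct translation needs one translator per (guest, host) pair.
    return {"direct": guests * hosts, "with_ir": guests + hosts}


print(translate("arm", "x86", "MOV R0, #1"))
print(translators_needed(5, 4))
```

Adding a new Guest ISA then means writing one new front end; all existing back ends are reused.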
Example of
QEMU Code Translation
• x86_64 (Guest) → TCG
Guest Code:
mov eax, ds
TCG:
mov_i64 tmp0, rax
movi_i64 tmp3, 0xfd194
st_i64 tmp3, env, 0x80
mov_i32 tmp5, tmp0
movi_i32 tmp11, 0x3
call load_seg, 0x0, 0, env, tmp11, tmp5
movi_i64 tmp3, 0xfd196
st_i64 tmp3, env, 0x80
exit_tb 0x0
set_label L0
exit_tb 0x7f77499ff3cb
26
Example of
QEMU Code Translation
• TCG → x86_64 (Host)
TCG:
mov_i64 tmp0, rax
movi_i64 tmp3, 0xfd194
st_i64 tmp3, env, 0x80
mov_i32 tmp5, tmp0
movi_i32 tmp11, 0x3
call load_seg, 0x0, 0, env, tmp11, tmp5
movi_i64 tmp3, 0xfd196
st_i64 tmp3, env, 0x80
exit_tb 0x0
set_label L0
exit_tb 0x7f77499ff3cb
Host Code:
mov rax, 0x3
mov [0x7f779478f008], rax
mov [r14 + 0x80], 0xfd194
mov rdi, r14
mov esi, 0x3
mov edx, 0x10
call 0x7f776ce8e500
mov [r14 + 0x80], 0xfd196
xor eax, eax
jmp 0x7f776a9fec16
lea rax, [rip - 0x110005ed]
jmp 0x7f776a9fec16
27
28
I just wanna execute
mov eax, ds
Code Block-based Translation
• First thought of code translation
• Interpret each Guest instruction as it is encountered and
execute the translated code on the Host (Bochs' way).
• Recall the emulation example on page 7.
• QEMU uses code block-based translation instead of
one-by-one interpretation.
• Performance improvement
29
What is a Code Block
• Code Block/Basic Block (also called Translation Block in QEMU)
• A sequence of instructions that are executed SEQUENTIALLY.
• Each block ends with a control-flow transfer instruction.
30
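To make the definition concrete, here is a minimal sketch that cuts an instruction listing into blocks, closing a block after every control-flow transfer instruction. The mnemonic set `CONTROL_FLOW` and the function name `split_into_blocks` are hypothetical, and the set is deliberately incomplete. The sketch only cuts at block ends, as in the definition above; it does not start new blocks at branch targets.

```python
# Hypothetical, deliberately incomplete set of x86 control-flow mnemonics.
CONTROL_FLOW = {"jmp", "jnz", "jz", "call", "ret"}


def split_into_blocks(instructions):
    """Group instructions into blocks that each end with a control-flow insn."""
    blocks, current = [], []
    for insn in instructions:
        current.append(insn)
        mnemonic = insn.split()[0].lower()
        if mnemonic in CONTROL_FLOW:
            blocks.append(current)  # the transfer instruction closes the block
            current = []
    if current:
        blocks.append(current)      # trailing instructions without a branch
    return blocks


listing = [
    "mov eax, dword ptr [00406120]",
    "call eax",
    "sub esp, 1C",
    "cmp dword ptr [ebp-C], -1",
    "jnz short 00401557",
    "mov eax, -1",
    "jmp short 0040156C",
]
for number, block in enumerate(split_into_blocks(listing), start=1):
    print(f"block {number}: {block}")
```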
Performance Improvement
• Translation block optimization
mov eax, 1
add eax, 2
→ mov eax, 3
• Translation block cache (coming up)
31
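Translating a whole block at once makes optimizations like the one above possible, because the translator sees neighbouring instructions together. The following is a minimal sketch of that single folding step on a made-up tuple format; `fold_constants` is a hypothetical name, and the sketch assumes the flags set by `add` are not needed afterwards.

```python
MASK32 = 0xFFFFFFFF  # 32-bit registers wrap around


def fold_constants(block):
    """Fold 'mov reg, a' followed by 'add reg, b' into 'mov reg, a + b'.

    Each instruction is a tuple (op, register, immediate).
    Assumption: nothing later depends on the flags produced by 'add'.
    """
    out = []
    for op, reg, imm in block:
        previous = out[-1] if out else None
        if op == "add" and previous and previous[0] == "mov" and previous[1] == reg:
            out[-1] = ("mov", reg, (previous[2] + imm) & MASK32)
        else:
            out.append((op, reg, imm))
    return out


print(fold_constants([("mov", "eax", 1), ("add", "eax", 2)]))
```

An interpreter that handles one instruction at a time never gets the chance to do this, since it has already executed `mov eax, 1` before it sees the `add`.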
Translation Block Cache
• Since executing code doesn't change often, why
translate again code that has already been
translated?
• YES! QEMU caches each translation block and indexes
it by the Guest physical address where the code
resides.
34
Translation Block Cache
• Workflow
Guest code:
main:
mov dword ptr [esp+18], 0
mov dword ptr [esp+14], 80
mov dword ptr [esp+10], 1
mov dword ptr [esp+C], 0
mov dword ptr [esp+8], 0
mov dword ptr [esp+4], C0000000
mov dword ptr [esp], 00404020
mov eax, dword ptr[00406120]
call eax // f = CreateFileA( … )
sub esp, 1C
mov dword ptr[ebp-C], eax
cmp dword ptr[ebp-C], -1
jnz short 00401557 // if(f == -1)
mov eax, -1
jmp short 0040156C // return -1
mov eax, dword ptr [ebp-C]
mov dword ptr [esp], eax
mov eax, dword ptr[0040611C]
call eax // CloseHandle( f )
sub esp, 4
mov eax, 0
mov ecx, dword ptr [ebp-4]
leave // return 0
Lookup (GVA: Guest Virtual Address, GPA: Guest Physical Address):
• Guest: EIP = 0x11223344, so GVA = 0x11223344
• Guest: Look up the Guest page table for the GVA: GPA = 0x5566
• Host: Is TB_cache[GPA] valid?
• True: Execute the TB
• False: Code Translation of the block below, then TB_cache[GPA] = TB (translated code inside), then execute the TB
mov dword ptr [esp+18], 0
mov dword ptr [esp+14], 80
mov dword ptr [esp+10], 1
mov dword ptr [esp+C], 0
mov dword ptr [esp+8], 0
mov dword ptr [esp+4], C0000000
mov dword ptr [esp], 00404020
mov eax, dword ptr[00406120]
call eax
35
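The workflow above amounts to a dictionary lookup keyed by the Guest physical address. Below is a minimal sketch under several assumptions: a flat one-level page table with 4 KiB pages, a `translate` callback standing in for the real code translation, and hypothetical names throughout (`TBCache`, `gva_to_gpa`, `lookup`). The addresses differ from the illustrative ones on the slide because a page-table lookup preserves the page offset.

```python
PAGE_SIZE = 0x1000  # assumed 4 KiB pages


class TBCache:
    def __init__(self, page_table, translate):
        self.page_table = page_table  # guest virtual page -> guest physical page
        self.translate = translate    # callback: GPA -> translation block
        self.cache = {}               # GPA -> translation block
        self.translations = 0         # how many times we had to translate

    def gva_to_gpa(self, gva):
        # Look up the Guest page table; the offset within the page is kept.
        page, offset = divmod(gva, PAGE_SIZE)
        return self.page_table[page] * PAGE_SIZE + offset

    def lookup(self, gva):
        gpa = self.gva_to_gpa(gva)
        tb = self.cache.get(gpa)
        if tb is None:                # miss: translate and remember the TB
            tb = self.translate(gpa)
            self.cache[gpa] = tb
            self.translations += 1
        return tb                     # hit or freshly translated: ready to run


tb_cache = TBCache({0x11223: 0x5}, translate=lambda gpa: f"TB@{gpa:#x}")
print(tb_cache.lookup(0x11223344))  # first call translates
print(tb_cache.lookup(0x11223344))  # second call is served from the cache
print(tb_cache.translations)
```

Keying by physical rather than virtual address means two virtual mappings of the same code page can share one translation.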
Translation Block Cache
• Cache space is limited.
• A replacement policy is required for when the cache is
full.
36
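The slide does not say which policy is used. One very simple option is to throw away every cached TB once the cache is full and start over; to my understanding QEMU does essentially this when its code buffer fills up, but treat that as an assumption rather than a statement about the implementation. A minimal sketch with hypothetical names (`FlushingTBCache`, `insert`):

```python
class FlushingTBCache:
    """Bounded TB cache that flushes everything when it runs out of space."""

    def __init__(self, capacity):
        self.capacity = capacity  # maximum number of cached TBs
        self.cache = {}           # GPA -> translation block
        self.flushes = 0          # number of full flushes so far

    def insert(self, gpa, tb):
        if gpa not in self.cache and len(self.cache) >= self.capacity:
            self.cache.clear()    # no per-entry bookkeeping: drop all TBs
            self.flushes += 1
        self.cache[gpa] = tb


small = FlushingTBCache(capacity=2)
for gpa in (0x1000, 0x2000, 0x3000):
    small.insert(gpa, f"TB@{gpa:#x}")
print(small.cache, small.flushes)
```

Flushing everything avoids tracking usage per TB, at the price of retranslating hot code after each flush.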
Translation Block Chaining
• Assume TB1, TB2, and TB3 are all cached and will
be executed sequentially.
• Six control-flow transfers.
[Figure: timeline of transfers between QEMU and TB 1, TB 2, TB 3: Find TB1 in cache & Exec, Return; Find TB2 in cache & Exec, Return; Find TB3 in cache & Exec, Return]
38
Translation Block Chaining
• When TB1, TB2, and TB3 are executed sequentially
in most cases, chain them so that each TB jumps directly to the next.
• Four control-flow transfers. Faster.
[Figure: the same timeline of QEMU, TB 1, TB 2, and TB 3 over time]
39
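The two counts (six transfers without chaining, four with it) can be checked with a small simulation. This is only a sketch of the counting argument, with a hypothetical function name `count_transfers`; it assumes every TB in the list is already cached and that, when chaining is enabled, each TB jumps straight to its successor.

```python
def count_transfers(tbs, chained):
    """Count control-flow transfers needed to run the cached TBs in order."""
    transfers = 0
    i = 0
    while i < len(tbs):
        transfers += 1      # QEMU finds tbs[i] in the cache and jumps into it
        while chained and i + 1 < len(tbs):
            transfers += 1  # chained jump: TB -> next TB, bypassing QEMU
            i += 1
        transfers += 1      # the current TB returns to QEMU
        i += 1
    return transfers


blocks = ["TB1", "TB2", "TB3"]
print(count_transfers(blocks, chained=False))  # 6
print(count_transfers(blocks, chained=True))   # 4
```

In general, n sequential TBs cost 2n transfers without chaining and n + 1 with it.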
Translation Block Chaining
• What if a TB ends with a conditional branch?
(e.g. the Jcc group of x86)
• Each TB has two slots for chaining.
[Figure: TB 1 with a True chain and a False chain leading to TB 2 and TB 3]
40
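A sketch of the two-slot idea: each TB keeps two successor links and, after running, reports which one to follow. The class `TB`, the field `next_tb`, and the convention that slot 1 is the True chain and slot 0 the False chain are all hypothetical choices for illustration, not QEMU's data structures.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class TB:
    name: str
    # Runs the block and returns the slot to follow: 0 (False) or 1 (True).
    execute: Callable[[], int]
    # Two chaining slots, one per outcome of the final conditional branch.
    next_tb: List[Optional["TB"]] = field(default_factory=lambda: [None, None])


def run_chain(tb: Optional[TB]) -> List[str]:
    """Follow chained TBs until a TB has no successor in the chosen slot."""
    trace = []
    while tb is not None:
        trace.append(tb.name)
        tb = tb.next_tb[tb.execute()]
    return trace


tb2 = TB("TB2", execute=lambda: 0)
tb3 = TB("TB3", execute=lambda: 0)
# TB1 ends with a conditional branch: True chain -> TB2, False chain -> TB3.
tb1_taken = TB("TB1", execute=lambda: 1, next_tb=[tb3, tb2])
tb1_not_taken = TB("TB1", execute=lambda: 0, next_tb=[tb3, tb2])
print(run_chain(tb1_taken))
print(run_chain(tb1_not_taken))
```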
41
So far so good ?
Helper Feature of QEMU
• A helper transfers TB execution directly to a C function
compiled as Host code.
• Advantages
• Eases the burden of writing complex code
translation
• Allows interception during TB execution
• Disadvantage
• Overhead of switching the QEMU state from
“executing translated Guest code” to “executing Host
code”
42
Example of Helper Use
• x86_64 → TCG → x86_64
qemu/helper.c:
void helper_div( arg1, arg2 )
{
// Do the division job &
// Update the emulated Guest CPU/Memory
}
Compiled helper (Host):
helper_div:
0x7f776ce80440: push rbp
0x7f776ce80441: mov rbp, rsp
…
Guest Code:
div ecx
TCG IR (translation by calling gen_helper_div):
mov_i64 tmp0,rcx
movi_i64 tmp3,$0xf0544
st_i64 tmp3,env,$0x80
call divl_EAX,$0x0,$0,env,tmp0
Host Code (generated for div ecx emulation):
movq $0xf0544,0x80(%r14)
mov %r14,%rdi
mov $0xa,%esi
callq 0x7f776ce80440
47
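The helper mechanism can be mimicked in a few lines: the translated code contains a `call` op, and executing it hands control to an ordinary host-language function that updates the emulated CPU state. This is a sketch only. The dictionary-based `env`, the toy IR tuples, and the names `helper_div32`, `HELPERS`, and `run_ir` are hypothetical; the division itself follows the x86 rule for 32-bit `div` (EDX:EAX divided by the operand, quotient to EAX, remainder to EDX).

```python
class DivideError(Exception):
    """Stands in for the x86 divide error exception (#DE)."""


def helper_div32(env, divisor):
    """Emulate 'div r32': EDX:EAX / divisor -> EAX (quotient), EDX (remainder)."""
    divisor &= 0xFFFFFFFF
    if divisor == 0:
        raise DivideError("division by zero")
    dividend = (env["edx"] << 32) | env["eax"]
    quotient, remainder = divmod(dividend, divisor)
    if quotient > 0xFFFFFFFF:
        raise DivideError("quotient does not fit in 32 bits")
    env["eax"] = quotient   # update the emulated Guest CPU state
    env["edx"] = remainder


# Table of helpers that translated code may call (hypothetical).
HELPERS = {"div32": helper_div32}


def run_ir(env, ops):
    """Execute a toy IR; 'call' leaves the IR and runs a host function."""
    for op in ops:
        if op[0] == "movi":      # ("movi", register, immediate)
            env[op[1]] = op[2]
        elif op[0] == "call":    # ("call", helper name, argument register)
            HELPERS[op[1]](env, env[op[2]])
        else:
            raise NotImplementedError(op[0])


env = {"eax": 7, "edx": 0, "ecx": 2}
run_ir(env, [("call", "div32", "ecx")])  # plays the role of 'div ecx'
print(env)
```

Writing the division, including its two error cases, as IR ops would be much longer than the single `call`, which is the trade-off described on the Helper Feature slide.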
When Does QEMU Use a Helper?
• When translating an instruction would otherwise generate
numerous, complex TCG IR ops.
• e.g. div of x86
• When the execution of a translated instruction
must be intercepted (like a hook).
• e.g. jcc of x86
• … (there might be more cases; I haven't fully comprehended them all)
50
More QEMU-related Systems from
DSNSLab
• SecMap
• Two-layer Disk Forensics
• MrKIP
• VMaware Detector
• Cloudebug
• ProbeBuilder
• Android Taint
51
Q & A
52
