Tetcon2016 160104

BE-PUM: Binary Emulation for Pushdown Model Generation
Obfuscation code localization based on CFG generation of malware
Nguyen Minh Hai
Industrial University of Ho Chi Minh City (IUH)
with Quan Thanh Tho, Ho Chi Minh City University of Technology (HCMUT),
in collaboration with Mizuhito Ogawa (JAIST)
January 2016

BE-PUM
• Binary Emulation for Pushdown Model Generation
• Key features:
Generate a model (CFG) from the binary code of malware
Show better results than many other tools, e.g. IDA Pro, Jakstab, Hopper...
Tackle many obfuscation techniques and successfully unpack many packers (27 different packers)
→ Generic Unpacker for Model Generation of Malware
Detect packers by semantic signature (recognizing packer techniques)
→ Semantic Signature Matching for Packer Detection

Agenda
1. Motivation
2. BE-PUM
3. Experiments
4. Conclusions
5. Demo

Malware
• Malware (malicious software) – a real threat
Virus
Trojan horse
Keylogger
• How to deal with it
Signature detection (industry approach)
Emulation (sandbox approach)
Model checking (formal approach)

Issues
• Signature-based: defeated by obfuscation techniques
• Sandbox-based
Heavy cost
A virus may behave differently at different points in time
A virus may even detect the sandbox environment
• Model checking
Model generation
Model checking

Model Checking Outline
[Figure: pipeline from Model Generation to Model Checking]

Typical approach
• A Control Flow Graph (CFG) is generated as the model
Each program location is mapped to a node
All destinations must be determined at each branch
• Things are more difficult with sophisticated binaries:
Self-modifying code (encryption/decryption)
Indirect jumps
Many other obfuscation techniques

Control Flow Graph
• Choices of many tools (CodeSurfer/x86, McVeto,
JakStab, BIRD, Renovo, Syman, BINCOA/OSMOSE,
IDA Pro)
Hexa Instructions
0x00401000 cmp eax, 0
0x00401003 jle 0x0040100d
0x00401005 mov eax, 0x00401001
0x0040100a jmp 0x00401015
0x0040100c halt
0x0040100d mov eax, 0x00401018
0x00401012 sub eax, 5
0x00401015 sub eax, 1
0x00401018 0x0040100c
[Figure: CFG over nodes 00, 03, 05, 0A, 0D, 12, 15, 18]
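Why a purely static CFG falls short on this listing can be sketched in Python. This is an illustrative sketch only, not how any of the tools above work: the tuple encoding and the `static_successors` helper are assumptions made for the example, and the last instruction is taken to be `jmp eax` as in the Running Example. Direct jumps yield edges immediately, while the indirect jump has no statically known successor.

```python
# Hypothetical encoding of the listing: (address, mnemonic, operand).
PROGRAM = [
    (0x00401000, "cmp", "eax, 0"),
    (0x00401003, "jle", 0x0040100D),
    (0x00401005, "mov", "eax, 0x00401001"),
    (0x0040100A, "jmp", 0x00401015),
    (0x0040100C, "halt", None),
    (0x0040100D, "mov", "eax, 0x00401018"),
    (0x00401012, "sub", "eax, 5"),
    (0x00401015, "sub", "eax, 1"),
    (0x00401018, "jmp", "eax"),
]


def static_successors(program):
    """Map each address to its successors; None marks an unresolved indirect jump."""
    edges = {}
    for i, (addr, op, arg) in enumerate(program):
        nxt = program[i + 1][0] if i + 1 < len(program) else None
        if op == "halt":
            edges[addr] = []
        elif op == "jmp":
            # A register operand means the target depends on run-time state.
            edges[addr] = [arg] if isinstance(arg, int) else None
        elif op == "jle":
            edges[addr] = [arg, nxt]  # taken branch, then fall-through
        else:
            edges[addr] = [nxt]
    return edges


edges = static_successors(PROGRAM)
unresolved = [hex(a) for a, succ in edges.items() if succ is None]
print(unresolved)
```

The single unresolved entry is the indirect jump at 0x00401018, which is the case the rest of the talk addresses.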

BE-PUM
• BE-PUM - Binary Emulation for Pushdown Model
• Apply pushdown model generation to binary code
Apply concolic testing (dynamic symbolic execution) to handle indirect jumps
Apply on-the-fly model generation to handle self-modifying code
Focus on obfuscation techniques used in malware and packer tools

Running Example
Hexa Instructions
0x00401000 cmp eax, 0
0x00401003 jle 0x0040100d
0x00401005 mov eax, 0x00401001
0x0040100a jmp 0x00401015
0x0040100c halt
0x0040100d mov eax, 0x00401018
0x00401012 sub eax, 5
0x00401015 sub eax, 1
0x00401018 jmp eax

Running Example (continued)
• Start from a symbolic value: eax = α
• The conditional jump at 0x00401003 splits the path condition:
eax <= 0 (jle taken, to 0x0040100d)
eax > 0 (fall through to 0x00401005)
• The target of jmp eax at 0x00401018 is not known statically: jmp to α?
Convert symbolic value of α into a concrete value
Use white-box testing to under-approximate α
[Figure: CFG over nodes 00, 03, 05, 0a, 0d, 12, 15, 18, starting from eax = α]
Test-case Generation
[Figure: two copies of the CFG over nodes 00, 03, 05, 0a, 0d, 12, 15, 18, each starting from eax = α, with edges labeled by the path condition]
• Path condition α > 0: Test-case-1 = 5
• Path condition α <= 0: Test-case-2 = -7
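Test-case generation from path conditions can be sketched as follows. This is a minimal sketch under stated assumptions: the slides name Z3 as the solver, but here a brute-force search over a small range stands in for it so the example needs no external library, and the `solve` helper and the path names are hypothetical. The two conditions follow x86 semantics for `cmp eax, 0` followed by `jle`: the jump is taken when eax <= 0 and falls through when eax > 0.

```python
def solve(path_condition, candidates=range(-8, 9)):
    """Stand-in for an SMT solver: first candidate satisfying every constraint."""
    for value in candidates:
        if all(constraint(value) for constraint in path_condition):
            return value
    return None  # no candidate in the searched range satisfies the path


# One list of constraints on the initial value of eax per explored path.
paths = {
    "fall-through at 03": [lambda a: a > 0],
    "jle taken at 03": [lambda a: a <= 0],
}

test_cases = {name: solve(condition) for name, condition in paths.items()}
infeasible = solve([lambda a: a > 0, lambda a: a <= 0])
print(test_cases, infeasible)
```

Any satisfying value serves as a test case, so the concrete numbers differ from the 5 and -7 on the slide while driving execution down the same two paths.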

Enlarging the Model by Testing Result
Simulation Snapshot
eax=0x0040100C;
start=0x00401000;
return=0;
address=0x0040100C;
[Figure: CFG of the running example extended with the new node 0c; edges labeled Test-case-1 and Test-case-2]

Strategy for covered instruction selection
• Instruction statistics collected from virus samples
• Full list of 300 supported instructions
Call Jump Return
add shl Call je jz jne jump mov cmovg ret cmp out setna lods daa
and sal jnz jb jnae xchg cmovl int pop setnae movs das
sub dec jc jnb jae movz cmovl aaa popa setnb neg enter
or inc jnc jng jnae movsb cmovna aad popf setnbe nop in
xor adc jle ja jl movsw cmovnae aam push setnc shld int1
imul shr jnge jnl jnbe movsx cmovnbe aas pusha setne shrd int3
ror ror jge jo jg movzb cmovne bsf pushf setng stc lahf
div rep jnle jns loop movzw cmovng bswap rdtsc setnge stos lea
sbb mul js jno jp cmova cmovnge bt sahf setnl test leave
clc sar jno jpe jecxz cmovb cmovnl btc scas setnle xlat
not ror jmp loope loopne cmovbe cmovnle btr seta setno cbw
idiv rcr loop loopz loopnz cmovc cmovno bts setae setnp cwde
xadd rol cmove cmovnp cbw setb setns cmps
adc rcl cmovp cmovns cdq setbe seto cmpxchg
dec mul cmovpe cmovnz clc setc setp cmpxchg8b
shr sbb cmovpo cmovo cld sete setpe cpuid
sar cmovs cmovz cli setg setpo cwd
cltd setge sets cwde
cmc setl setz cwt
Arithmetic Conditional Jump Move Control

Supported 400 Windows APIs
• Kernel32.dll: _lwrite, accept, bind, CloseHandle, closesocket, connect,
CopyFile, CreateFile, CreateFileMapping, CreateProcess, CreateThread,
DeleteFile, ExitProcess, FindClose, FindFirstFile, FindNextFile,
FreeEnvironmentStrings, GetCommandLine, GetCurrentDirectory,
GetCurrentProcess, GetEnvironmentStrings, GetFileAttributes, GetFileSize,
GetFileType, gethostbyname, gethostname, GetLastError, GetLocalTime,
GetModuleFileName, GetModuleHandle, GetProcAddress, GetStartupInfo,
GetStdHandle, GetSystemDirectory, GetSystemTime, GetTickCount,
GetVersion, GetVersionEx, GetWindowsDirectory, HeapAlloc, HeapCreate,
HeapDestroy, HeapFree, HeapReAlloc, IsDebuggerPresent, listen, LoadLibrary,
lstrcat, lstrcmp, lstrcpy, lstrlen, MapViewOfFile, MoveFile, PeekMessageA,
ReadFile, recv, RegCloseKey, RegOpenKeyEx, RegSetValueEx, send,
SetCurrentDirectory, SetEndOfFile, SetFileAttributes, SetFilePointer,
SetHandleCount, shutdown, socket, UnmapViewOfFile, VirtualAlloc, VirtualFree,
WaitForSingleObject, WinExec, WriteFile, WSACleanup, WSAStartup...
• User32.dll: MessageBox, SendMessage, FindWindow, PostMessage.

Best Practice
• Apply a breadth-first search strategy to ask Z3 to generate as many test cases as possible
• Use JNA (Java Native Access) to simulate API calls
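A breadth-first exploration of path conditions might look like the sketch below. It is not BE-PUM's code: the branch tree `BRANCHES` is invented for illustration, and the brute-force `solve` helper is a stand-in for the Z3 queries mentioned above. Paths are taken from a FIFO queue, so shorter path conditions are solved before longer ones, and infeasible paths are pruned as soon as no witness is found.

```python
from collections import deque

# Hypothetical branch tree: node -> [(constraint label, predicate, child node)].
BRANCHES = {
    "entry": [("a > 0", lambda a: a > 0, "pos"),
              ("a <= 0", lambda a: a <= 0, "nonpos")],
    "pos": [("a > 5", lambda a: a > 5, "big"),
            ("a <= 5", lambda a: a <= 5, "small")],
}


def solve(predicates, candidates=range(-8, 9)):
    """Stand-in for a solver query: first candidate satisfying all predicates."""
    for value in candidates:
        if all(p(value) for p in predicates):
            return value
    return None


def bfs_test_cases(root):
    """Explore paths breadth-first and return (node, labels, test case) triples."""
    queue = deque([(root, [], [])])
    found = []
    while queue:
        node, labels, predicates = queue.popleft()
        if predicates:
            witness = solve(predicates)
            if witness is None:
                continue  # infeasible path: do not explore below it
            found.append((node, labels, witness))
        for label, predicate, child in BRANCHES.get(node, []):
            queue.append((child, labels + [label], predicates + [predicate]))
    return found


found = bfs_test_cases("entry")
for node, labels, witness in found:
    print(node, " and ".join(labels), witness)
```

A depth-first worklist would work as well; the FIFO order simply yields one test case per shallow branch before committing effort to deep paths.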

Indirect Jump
• Virus.Win32.Aztec
00401057 . B8 00100000 MOV EAX,1000
0040105C . 05 00004000 ADD EAX, 00400000
00401061 . FFE0 JMP EAX
[Figure: CFGs generated by BE-PUM and IDA Pro for this code]
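The target of this indirect jump becomes concrete once the two preceding instructions are emulated. The sketch below is a toy register emulator written for this example (the `resolve_jump_target` helper and the tuple encoding are assumptions, not BE-PUM's interface), and it assumes the operands in the listing are hexadecimal, as is usual for this debugger-style notation.

```python
MASK32 = 0xFFFFFFFF


def resolve_jump_target(instructions):
    """Emulate mov/add on 32-bit registers and return the target of the first jmp."""
    regs = {}
    for op, dst, src in instructions:
        value = regs[src] if isinstance(src, str) else src
        if op == "mov":
            regs[dst] = value & MASK32
        elif op == "add":
            regs[dst] = (regs[dst] + value) & MASK32
        elif op == "jmp":
            return regs[dst]
    return None


aztec = [
    ("mov", "eax", 0x1000),      # 00401057 MOV EAX,1000
    ("add", "eax", 0x00400000),  # 0040105C ADD EAX,00400000
    ("jmp", "eax", None),        # 00401061 JMP EAX
]
target = resolve_jump_target(aztec)
print(hex(target))  # 0x401000
```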

Overlapping Instruction
• HLLW.Rolog.f
• Junk code modifies the return address.
Linear disassembly:
00437002 E8 03000000 CALL 0043700A
00437007 E9 EB045D45 JMP 45A074F7
Executed instructions:
00437002 CALL 0043700A
0043700A POP EBP
0043700B INC EBP
0043700C PUSH EBP
0043700D RETN
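The effect of the junk code on the return address can be traced with a few lines of Python. This is a hand-written trace for illustration, assuming the ten bytes from the slide are laid out contiguously from 0x00437002; the variable names are hypothetical. It also checks which byte sits at the modified return address: EB is the x86 opcode of a short JMP with a signed 8-bit displacement.

```python
# Bytes from the slide: E8 03000000 (CALL) followed by E9 EB045D45 (JMP).
BASE = 0x00437002
CODE = bytes.fromhex("E803000000E9EB045D45")


def byte_at(address):
    return CODE[address - BASE]


stack = [0x00437007]            # CALL 0043700A pushes the next instruction's address
ebp = stack.pop()               # 0043700A POP EBP
ebp = (ebp + 1) & 0xFFFFFFFF    # 0043700B INC EBP
stack.append(ebp)               # 0043700C PUSH EBP
return_target = stack.pop()     # 0043700D RETN

# The bytes at 0043700A and 0043700B are operand bytes of the JMP at 00437007.
overlap = CODE[0x0043700A - BASE:0x0043700C - BASE]

# Decode a short JMP (EB rel8) at the modified return address, if present.
opcode, disp = byte_at(return_target), byte_at(return_target + 1)
disp = disp - 256 if disp > 127 else disp
hidden_jump = return_target + 2 + disp if opcode == 0xEB else None
print(hex(return_target), overlap.hex(), hex(hidden_jump))
```

Execution therefore resumes one byte into the encoding of the JMP at 00437007, and 5D 45 (POP EBP, INC EBP) are the last two bytes of that same JMP, which is why a linear disassembly of these bytes is misleading.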

Self-Modifying Code
• Virus.Win32.Seppuku.1606: self-modifying code
Before modification:
00401646 E8 B5F9FFFF CALL 00401000
Modifying code (EDI = 401067):
004010E5 MOV EAX,DWORD PTR SS:[EBP+401489]
004010EB STOS DWORD PTR ES:[EDI]
After modification:
00401646 E8 00000000 CALL 0040164B
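How overwriting four operand bytes redirects the call can be checked by decoding both encodings. The sketch below is an illustrative decoder for the x86 near CALL (opcode E8 followed by a little-endian signed 32-bit displacement relative to the next instruction); the `call_target` helper is a name chosen for this example.

```python
import struct


def call_target(address, encoding):
    """Decode the destination of a 5-byte near CALL located at `address`."""
    if len(encoding) != 5 or encoding[0] != 0xE8:
        raise ValueError("not a near CALL (E8 rel32)")
    (rel32,) = struct.unpack("<i", encoding[1:5])
    return (address + 5 + rel32) & 0xFFFFFFFF


before = call_target(0x00401646, bytes.fromhex("E8B5F9FFFF"))
after = call_target(0x00401646, bytes.fromhex("E800000000"))
print(hex(before), hex(after))  # 0x401000 0x40164b
```

A model built only from the original bytes would contain the edge to 00401000 and miss the edge to 0040164B, which is why the model has to be generated on the fly.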

Decryption
• Email-Worm.Win32.Kickin.d: self-decryption
Before decryption:
00609223 pop ebp
00609224 push 3d
00609226 mov byte ptr ds:[esi+9cccd0e5],dh
0060922C retn 8d9e
0060922F pxor mm5,mm3
00609232 dec ecx
00609233 fiadd word ptr ds:[ecx+80a6b31]
Decryption loop (ecx was set to 0CAh):
0060933A mov ecx,0ca
00609345 lods byte ptr ds:[esi]
00609346 xor al,ah
00609348 inc ah
0060934A rol ah,2
0060934D add ah,90
00609350 stos byte ptr es:[edi]
00609351 loopd 00609345
After decryption:
00609223 call 00609228
00609228 mov ebx, [ebp+402705]
0060922E add ebx,28
00609231 pop eax
00609232 sub eax,ebx
00609234 mov [ebp+40270d],eax
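The decryption loop can be mirrored in Python to see what it computes. This is a sketch under assumptions: the initial value of AH and the encrypted bytes are not shown on the slide, so both are placeholders here, and the `decrypt` and `rol8` helpers are names chosen for the example. Each byte is XORed with AH, and AH is then updated by inc, rol by 2, and add 0x90, independently of the data.

```python
def rol8(value, count):
    """Rotate an 8-bit value left by `count` bits."""
    count %= 8
    return ((value << count) | (value >> (8 - count))) & 0xFF


def decrypt(data, ah):
    """Mirror of the loop body: lods, xor al,ah, inc ah, rol ah,2, add ah,90, stos."""
    out = bytearray()
    for byte in data:
        out.append(byte ^ ah)      # xor al,ah ; stos
        ah = (ah + 1) & 0xFF       # inc ah
        ah = rol8(ah, 2)           # rol ah,2
        ah = (ah + 0x90) & 0xFF    # add ah,90
    return bytes(out)


# Placeholder input: 0xCA bytes (the loop count from the slide) and an assumed key.
sample = bytes(range(0xCA))
assumed_ah = 0x00
encrypted = decrypt(sample, assumed_ah)
recovered = decrypt(encrypted, assumed_ah)
print(recovered == sample)  # True
```

Because the key stream does not depend on the data, applying the routine twice with the same starting AH restores the input, which gives a quick sanity check on the emulation of the loop.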

Comparison with others
• Current BE-PUM tool: precise models (CFGs) generated from real malware
Indirect jumps (now)
Self-modification (now)
Decryption (now)
SEH (now)
Packer techniques (now)
• Experiments
Compare the CFGs with those generated by Jakstab and IDA Pro

Supported Techniques in Packer

Remarks
• BE-PUM acts as both a model generator and a model emulator for binaries
Model generation: performed on the fly, using the concolic technique
– Missing piece: loop invariants (handled by iterating the loop many times if needed)
Emulator
– A "symbolic sandbox"
