BE-PUM: Binary Emulation for Pushdown Model Generation
Obfuscation code localization
based on CFG generation of malware
Nguyen Minh Hai
Industrial University of Ho Chi Minh City (IUH)
with Quan Thanh Tho, Ho Chi Minh City University of
Technology (HCMUT), in
collaboration with Mizuhito Ogawa (JAIST)
January 2016
BE-PUM
• Binary Emulation for Pushdown Model Generation
• Key features:
Generates models (CFGs) from malware binaries
Shows better results than many other tools, e.g. IDA Pro, Jakstab, Hopper...
Tackles many obfuscation techniques and successfully unpacks 27 different packers
→ Generic Unpacker for Model Generation of Malware
Detects packers by semantic signatures (recognizing packer techniques)
→ Semantic Signature Matching for Packer Detection
Agenda
1. Motivation
2. BE-PUM
3. Experiments
4. Conclusions
5. Demo
Malware
• Malware (malicious software) – a real threat
Virus
Trojan horse
Keylogger
• How to deal with it
Signature detection (industry approach)
Emulation (sandbox approach)
Model checking (formal approach)
Issues
• Signature-based: defeated by obfuscation techniques
• Sandbox-based
High cost
A virus may behave differently at different times
A virus may even detect the sandbox environment
• Model Checking
Model Generation
Model Checking
Model Checking Outline
[Figure: Model Generation → Model Checking]
Typical approach
• A Control Flow Graph (CFG) is generated as the model
Each program location is mapped to a node
All destinations must be decided at each branch
• Things are more difficult with sophisticated binaries:
Self-modifying code (encryption/decryption)
Indirect jumps
Many other obfuscation techniques
Control Flow Graph
• The model of choice for many tools (CodeSurfer/x86, McVeto,
JakStab, BIRD, Renovo, Syman, BINCOA/OSMOSE, IDA Pro)
Address     Instruction
0x00401000 cmp eax, 0
0x00401003 jle 0x0040100d
0x00401005 mov eax, 0x00401001
0x0040100a jmp 0x00401015
0x0040100c halt
0x0040100d mov eax, 0x00401018
0x00401012 sub eax, 5
0x00401015 sub eax, 1
0x00401018 jmp eax
[Figure: CFG with nodes 00, 03, 05, 0A, 0D, 12, 15, 18, labelled by the low byte of each address]
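To illustrate why a static CFG stalls on this listing, the Python sketch below encodes it as a hypothetical table and builds a CFG by recursive traversal, with one node per address. `PROGRAM`, `static_cfg`, and the tuple format are illustrative only and do not come from any of the tools named above. Direct branches resolve easily, but the `jmp eax` at 0x00401018 has no statically known successor, and the halt at 0x0040100c is never reached.

```python
# Hypothetical encoding of the listing: address -> (mnemonic, operands)
PROGRAM = {
    0x00401000: ("cmp", ("eax", 0)),
    0x00401003: ("jle", (0x0040100D,)),
    0x00401005: ("mov", ("eax", 0x00401001)),
    0x0040100A: ("jmp", (0x00401015,)),
    0x0040100C: ("halt", ()),
    0x0040100D: ("mov", ("eax", 0x00401018)),
    0x00401012: ("sub", ("eax", 5)),
    0x00401015: ("sub", ("eax", 1)),
    0x00401018: ("jmp", ("eax",)),
}
ADDRS = sorted(PROGRAM)
NEXT = dict(zip(ADDRS, ADDRS[1:]))  # fall-through successor of each address


def static_cfg(entry):
    """Recursive-traversal CFG: one node per address, edges from decoded operands.

    Returns (edges, unresolved): unresolved holds indirect jumps whose target
    depends on a register value that static decoding cannot know.
    """
    edges, unresolved = set(), set()
    worklist, seen = [entry], set()
    while worklist:
        addr = worklist.pop()
        if addr in seen:
            continue
        seen.add(addr)
        op, args = PROGRAM[addr]
        if op == "halt":
            succs = []
        elif op == "jmp" and args[0] == "eax":
            unresolved.add(addr)  # target is whatever eax holds at run time
            succs = []
        elif op == "jmp":
            succs = [args[0]]
        elif op == "jle":
            succs = [args[0], NEXT[addr]]  # taken, fall-through
        else:
            succs = [NEXT[addr]]
        for succ in succs:
            edges.add((addr, succ))
            worklist.append(succ)
    return edges, unresolved


edges, unresolved = static_cfg(0x00401000)
for src, dst in sorted(edges):
    print(f"{src:#010x} -> {dst:#010x}")
print("unresolved:", [f"{a:#010x}" for a in sorted(unresolved)])
```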
Example
Demo
BE-PUM
• BE-PUM: Binary Emulation for Pushdown Model Generation
• Applies pushdown model generation to binary code:
Concolic testing (dynamic symbolic execution) to handle indirect jumps
On-the-fly model generation to handle self-modifying code
Focus on obfuscation techniques used in malware and packer tools
Running Examples
Running Example
Address     Instruction
0x00401000 cmp eax, 0
0x00401003 jle 0x0040100d
0x00401005 mov eax, 0x00401001
0x0040100a jmp 0x00401015
0x0040100c halt
0x0040100d mov eax, 0x00401018
0x00401012 sub eax, 5
0x00401015 sub eax, 1
0x00401018 jmp eax
• Initially eax = α (a symbolic value)
• The jle at 0x00401003 splits the paths: eax > 0 falls through to 0x00401005; eax <= 0 jumps to 0x0040100d
[Figure: partial CFG with nodes 00, 03, 05, 0a, 0d, 12, 15, 18]
• At 0x00401018 (jmp eax), the jump target is not known statically
Convert the symbolic value α into a concrete value
Use white-box testing to under-approximate α
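In this example, each branch overwrites eax with a constant, so the target of `jmp eax` is fixed once the branch taken on α is known. The Python sketch below explores the listing breadth-first, forking at the symbolic `jle` and recording each path's condition on α and its concrete jump target. `explore`, the `"alpha"` marker, and the encoding are hypothetical; it also simplifies by evaluating `jle` directly against the preceding `cmp eax, 0`.

```python
from collections import deque

# Hypothetical encoding of the running example: address -> (mnemonic, operands)
PROGRAM = {
    0x00401000: ("cmp", ("eax", 0)),
    0x00401003: ("jle", (0x0040100D,)),
    0x00401005: ("mov", ("eax", 0x00401001)),
    0x0040100A: ("jmp", (0x00401015,)),
    0x0040100C: ("halt", ()),
    0x0040100D: ("mov", ("eax", 0x00401018)),
    0x00401012: ("sub", ("eax", 5)),
    0x00401015: ("sub", ("eax", 1)),
    0x00401018: ("jmp", ("eax",)),
}
ADDRS = sorted(PROGRAM)
NEXT = dict(zip(ADDRS, ADDRS[1:]))
ALPHA = "alpha"  # marker: eax still holds the unknown input value


def to_signed(x):
    """Interpret a 32-bit value as a signed integer (jle is a signed test)."""
    return x - (1 << 32) if x & 0x80000000 else x


def explore(entry):
    """Breadth-first symbolic exploration up to each path's first jmp eax or halt.

    Returns (path, path_condition, jump_target) tuples.
    """
    results = []
    queue = deque([(entry, ALPHA, (), ())])  # (pc, eax, path, conditions)
    while queue:
        pc, eax, path, conds = queue.popleft()
        while True:
            op, args = PROGRAM[pc]
            path += (pc,)
            if op == "halt" or (op == "jmp" and args[0] == "eax"):
                results.append((path, conds, None if op == "halt" else eax))
                break
            if op == "jle" and eax == ALPHA:
                # Symbolic branch: queue the taken side, continue on fall-through.
                queue.append((args[0], eax, path, conds + ("alpha <= 0",)))
                pc, conds = NEXT[pc], conds + ("alpha > 0",)
            elif op == "jle":
                pc = args[0] if to_signed(eax) <= 0 else NEXT[pc]
            elif op == "jmp":
                pc = args[0]
            elif op == "mov":
                eax, pc = args[1], NEXT[pc]
            elif op == "sub":
                if eax == ALPHA:
                    raise NotImplementedError("symbolic arithmetic not modelled")
                eax, pc = (eax - args[1]) & 0xFFFFFFFF, NEXT[pc]
            else:  # cmp: its flags are consumed by the following jle
                pc = NEXT[pc]
    return results


for path, conds, target in explore(0x00401000):
    nodes = " -> ".join(f"{a & 0xFF:02x}" for a in path)
    shown = "halt" if target is None else f"{target:#010x}"
    print(f"{nodes}  [{' and '.join(conds)}]  jmp eax -> {shown}")
```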
Test-case Generation
• eax = α on entry to both paths
• Path 00 → 03 → 05 → 0a → 15 → 18 (jle not taken): path condition α > 0
• Path 00 → 03 → 0d → 12 → 15 → 18 (jle taken): path condition α <= 0
Test-case-1 = 5
Test-case-2 = -7
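Any value that satisfies a path condition is a valid test case for that path. The sketch below swaps a solver such as Z3 for a brute-force search over a small, arbitrarily chosen range. `find_test_case`, `PATH_CONDITIONS`, and the range are hypothetical. It picks 1 and -16 instead of the slide's 5 and -7, which satisfy the same conditions.

```python
def find_test_case(condition, candidates=range(-16, 17)):
    """Return the first candidate alpha that satisfies condition, or None.

    Brute-force stand-in for an SMT solver such as Z3; the candidate range is
    an arbitrary assumption, so it can miss conditions a solver would satisfy.
    """
    for value in candidates:
        if condition(value):
            return value
    return None


# Path conditions of the two paths, as predicates over the input alpha.
PATH_CONDITIONS = {
    "00 -> 03 -> 05 -> 0a -> 15 -> 18": lambda alpha: alpha > 0,
    "00 -> 03 -> 0d -> 12 -> 15 -> 18": lambda alpha: alpha <= 0,
}

for path, condition in PATH_CONDITIONS.items():
    print(f"{path}: test case alpha = {find_test_case(condition)}")
```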
Enlarging the Model by Testing Result
Simulation Snapshot
eax=0x0040100C;
start=0x00401000;
return=0;
address=0x0040100C;
[Figure: CFG enlarged with the results of Test-case-1 and Test-case-2; node 0c is added]
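Running the test cases concretely resolves `jmp eax` and adds the new edges to the model on the fly. The sketch below reuses a hypothetical encoding of the listing. `run_concrete` and its step bound are illustrative assumptions, not BE-PUM's code. In this sketch, α = 5 loops from 0x00401018 back to 0x00401000 forever, so a step bound is needed. α = -7 jumps to 0x00401012, goes around once more, and stops at the halt at 0x0040100c with eax = 0x0040100C, the same values shown in the snapshot.

```python
# Hypothetical encoding of the running example: address -> (mnemonic, operands)
PROGRAM = {
    0x00401000: ("cmp", ("eax", 0)),
    0x00401003: ("jle", (0x0040100D,)),
    0x00401005: ("mov", ("eax", 0x00401001)),
    0x0040100A: ("jmp", (0x00401015,)),
    0x0040100C: ("halt", ()),
    0x0040100D: ("mov", ("eax", 0x00401018)),
    0x00401012: ("sub", ("eax", 5)),
    0x00401015: ("sub", ("eax", 1)),
    0x00401018: ("jmp", ("eax",)),
}
ADDRS = sorted(PROGRAM)
NEXT = dict(zip(ADDRS, ADDRS[1:]))


def to_signed(x):
    """Interpret a 32-bit value as a signed integer."""
    return x - (1 << 32) if x & 0x80000000 else x


def run_concrete(alpha, cfg_edges, max_steps=50):
    """Execute from eax = alpha, adding every edge actually taken to cfg_edges.

    Stops at halt, at an address outside the listing, or after max_steps.
    Returns the address where execution stopped.
    """
    eax, pc, le = alpha & 0xFFFFFFFF, ADDRS[0], False
    for _ in range(max_steps):
        if pc not in PROGRAM or PROGRAM[pc][0] == "halt":
            return pc
        op, args = PROGRAM[pc]
        if op == "cmp":
            le, nxt = to_signed(eax) <= args[1], NEXT[pc]
        elif op == "jle":
            nxt = args[0] if le else NEXT[pc]
        elif op == "mov":
            eax, nxt = args[1], NEXT[pc]
        elif op == "sub":
            eax, nxt = (eax - args[1]) & 0xFFFFFFFF, NEXT[pc]
        else:  # jmp: an indirect target is read from the concrete eax
            nxt = eax if args[0] == "eax" else args[0]
        cfg_edges.add((pc, nxt))
        pc = nxt
    return pc


edges = set()
for alpha in (5, -7):  # Test-case-1 and Test-case-2 from the slides
    stop = run_concrete(alpha, edges)
    print(f"alpha = {alpha}: stopped at {stop:#010x}")
targets = sorted(dst for src, dst in edges if src == 0x00401018)
print("jmp eax successors:", [f"{t:#010x}" for t in targets])
```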
Framework
Strategy for selecting which instructions to support
• Instruction statistics collected from virus samples
• Full list of 300 supported instructions
Call Jump Return
add shl Call je jz jne jump mov cmovg ret cmp out setna lods daa
and sal jnz jb jnae xchg cmovl int pop setnae movs das
sub dec jc jnb jae movz cmovl aaa popa setnb neg enter
or inc jnc jng jnae movsb cmovna aad popf setnbe nop in
xor adc jle ja jl movsw cmovnae aam push setnc shld int1
imul shr jnge jnl jnbe movsx cmovnbe aas pusha setne shrd int3
ror ror jge jo jg movzb cmovne bsf pushf setng stc lahf
div rep jnle jns loop movzw cmovng bswap rdtsc setnge stos lea
sbb mul js jno jp cmova cmovnge bt sahf setnl test leave
clc sar jno jpe jecxz cmovb cmovnl btc scas setnle xlat
not ror jmp loope loopne cmovbe cmovnle btr seta setno cbw
idiv rcr loop loopz loopnz cmovc cmovno bts setae setnp cwde
xadd rol cmove cmovnp cbw setb setns cmps
adc rcl cmovp cmovns cdq setbe seto cmpxchg
dec mul cmovpe cmovnz clc setc setp cmpxchg8b
shr sbb cmovpo cmovo cld sete setpe cpuid
sar cmovs cmovz cli setg setpo cwd
cltd setge sets cwde
cmc setl setz cwt
Arithmetic | Conditional Jump | Move | Control
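The slides do not say how the instruction statistics were gathered. As an illustration only, the sketch below counts mnemonic frequencies in disassembly text and measures how much of that text a given set of supported instructions covers. `rank_mnemonics`, `coverage`, and the "address mnemonic operands" line format are assumptions, and the running-example listing stands in for real virus samples.

```python
from collections import Counter


def rank_mnemonics(disassembly):
    """Count mnemonics in 'address mnemonic operands' lines, most frequent first."""
    counts = Counter()
    for line in disassembly:
        fields = line.split()
        if len(fields) >= 2:
            counts[fields[1].lower()] += 1
    return counts.most_common()


def coverage(ranked, supported):
    """Fraction of instruction occurrences whose mnemonic is in supported."""
    total = sum(n for _, n in ranked)
    covered = sum(n for m, n in ranked if m in supported)
    return covered / total if total else 0.0


# Stand-in sample: the running-example listing from these slides.
sample = [
    "0x00401000 cmp eax, 0",
    "0x00401003 jle 0x0040100d",
    "0x00401005 mov eax, 0x00401001",
    "0x0040100a jmp 0x00401015",
    "0x0040100c halt",
    "0x0040100d mov eax, 0x00401018",
    "0x00401012 sub eax, 5",
    "0x00401015 sub eax, 1",
    "0x00401018 jmp eax",
]
ranked = rank_mnemonics(sample)
print(ranked)
print(f"coverage of mov/jmp/sub: {coverage(ranked, {'mov', 'jmp', 'sub'}):.2f}")
```

Ranking by frequency is one way to prioritise: implementing the most common mnemonics first covers the largest share of observed code soonest.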
400 supported Windows APIs
• Kernel32.dll: _lwrite, CloseHandle, CopyFile, CreateFile, CreateFileMapping, CreateProcess, CreateThread,
DeleteFile, ExitProcess, FindClose, FindFirstFile, FindNextFile, FreeEnvironmentStrings, GetCommandLine,
GetCurrentDirectory, GetCurrentProcess, GetEnvironmentStrings, GetFileAttributes, GetFileSize,
GetFileType, GetLastError, GetLocalTime, GetModuleFileName, GetModuleHandle, GetProcAddress,
GetStartupInfo, GetStdHandle, GetSystemDirectory, GetSystemTime, GetTickCount, GetVersion,
GetVersionEx, GetWindowsDirectory, HeapAlloc, HeapCreate, HeapDestroy, HeapFree, HeapReAlloc,
IsDebuggerPresent, LoadLibrary, lstrcat, lstrcmp, lstrcpy, lstrlen, MapViewOfFile, MoveFile, ReadFile,
SetCurrentDirectory, SetEndOfFile, SetFileAttributes, SetFilePointer, SetHandleCount, UnmapViewOfFile,
VirtualAlloc, VirtualFree, WaitForSingleObject, WinExec, WriteFile...
• Ws2_32.dll: accept, bind, closesocket, connect, gethostbyname, gethostname, listen, recv, send,
shutdown, socket, WSACleanup, WSAStartup
• Advapi32.dll: RegCloseKey, RegOpenKeyEx, RegSetValueEx
• User32.dll: MessageBox, SendMessage, FindWindow, PostMessage, PeekMessageA
Best Practice
• Apply a breadth-first search strategy when asking Z3 to generate as many test cases as possible
• Use JNA (Java Native Access) to simulate API calls
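The slide's approach reaches the real APIs through JNA from Java. As a language-neutral illustration of API handling during emulation, the Python sketch below swaps in a different, simpler technique: hand-written stubs keyed by API name. `EmulatedProcess`, `API_STUBS`, `call_api`, and every stub behaviour are hypothetical. Results are written to eax because Win32 APIs return their result in EAX.

```python
import time


class EmulatedProcess:
    """Minimal hypothetical state: registers plus an emulation start time."""

    def __init__(self):
        self.regs = {"eax": 0}
        self.start = time.monotonic()


def get_tick_count(proc, args):
    # Milliseconds since the emulation started, wrapped to 32 bits like a DWORD.
    return int((time.monotonic() - proc.start) * 1000) & 0xFFFFFFFF


def is_debugger_present(proc, args):
    return 0  # report "no debugger attached"


def lstrlen(proc, args):
    # Simplification: the string is passed directly instead of a pointer.
    return len(args[0])


API_STUBS = {
    "GetTickCount": get_tick_count,
    "IsDebuggerPresent": is_debugger_present,
    "lstrlen": lstrlen,
}


def call_api(proc, name, args=()):
    """Dispatch an imported API call to its stub; the result goes in eax."""
    stub = API_STUBS.get(name)
    if stub is None:
        raise NotImplementedError(f"no stub for {name}")
    proc.regs["eax"] = stub(proc, args)
    return proc.regs["eax"]


proc = EmulatedProcess()
print(call_api(proc, "IsDebuggerPresent"), call_api(proc, "lstrlen", ["kernel32"]))
```

Stubs give full control over what the sample observes, for example hiding the debugger; calling the real API, as the slide does through JNA, gives more faithful results at the cost of running native code.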
Indirect Jump
• Virus.Win32.Aztec
00401057 . B8 00100000 MOV EAX,1000
0040105C . 05 00004000 ADD EAX, 00400000
00401061 . FFE0 JMP EAX
[Figure: CFGs generated by BE-PUM and IDA Pro]
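Here the jump target is 0x1000 + 0x00400000 = 0x00401000, which is 0x1000 past the default Win32 image base 0x00400000. A constant-propagation pass over the three instructions is enough to show the arithmetic. The sketch below uses that simpler technique as a stand-in for BE-PUM's concolic approach; `resolve_jump` and the tuple format are hypothetical.

```python
def resolve_jump(instructions):
    """Propagate constant register values through mov/add; resolve 'jmp reg'.

    Returns the concrete target, or None if the register value is unknown.
    """
    regs = {}
    for mnemonic, dst, src in instructions:
        if mnemonic == "mov":
            regs[dst] = src
        elif mnemonic == "add":
            if dst in regs:  # unknown + constant stays unknown
                regs[dst] = (regs[dst] + src) & 0xFFFFFFFF
        elif mnemonic == "jmp":
            return regs.get(dst)
    return None


aztec = [
    ("mov", "eax", 0x1000),      # 00401057 MOV EAX,1000
    ("add", "eax", 0x00400000),  # 0040105C ADD EAX,00400000
    ("jmp", "eax", None),        # 00401061 JMP EAX
]
print(hex(resolve_jump(aztec)))
```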
Overlapping Instruction
HLLW.Rolog.f
• Junk code modifies the return address.
00437002 E8 03000000 CALL 0043700A
00437007 E9 EB045D45 JMP 45A074F7
00437002 CALL 0043700A
0043700D RETN
0043700A POP EBP
0043700B INC EBP
0043700C PUSH EBP
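The byte view and the executed view disagree because the junk code increments the return address by one. The hypothetical sketch below replays the sequence with an explicit stack. RETN resumes at 0x00437008, one byte into the 5-byte `E9` instruction at 0x00437007. Decoded from there, the bytes `EB 04` form a short jump to 0x0043700E, so the `JMP 45A074F7` reading at 0x00437007 is never executed on this path. Only the ten bytes shown on the slide are used. `byte_at`, `call_target`, `run_junk`, and `short_jmp_target` are illustrative names.

```python
# Bytes at 0x00437002 as listed on the slide: E8 03000000 (CALL), E9 EB045D45 (JMP).
BASE = 0x00437002
CODE = bytes.fromhex("E803000000E9EB045D45")


def byte_at(addr):
    return CODE[addr - BASE]


def call_target(addr):
    """Target of CALL rel32 (opcode E8): next instruction address + rel32."""
    off = addr - BASE
    rel = int.from_bytes(CODE[off + 1:off + 5], "little", signed=True)
    return addr + 5 + rel


def run_junk(call_addr):
    """Replay CALL; POP EBP; INC EBP; PUSH EBP; RETN with an explicit stack."""
    stack = [call_addr + 5]  # CALL pushes the return address 0x00437007
    ebp = stack.pop()        # 0043700A POP EBP
    ebp += 1                 # 0043700B INC EBP
    stack.append(ebp)        # 0043700C PUSH EBP
    return stack.pop()       # 0043700D RETN jumps to the modified address


def short_jmp_target(addr):
    """Target of JMP rel8 (opcode EB): next instruction address + rel8."""
    if byte_at(addr) != 0xEB:
        raise ValueError("not a JMP rel8")
    rel = int.from_bytes(CODE[addr - BASE + 1:addr - BASE + 2], "little", signed=True)
    return addr + 2 + rel


ret = run_junk(BASE)
print(f"CALL goes to {call_target(BASE):#010x}, RETN resumes at {ret:#010x}")
print(f"bytes there decode as JMP SHORT {short_jmp_target(ret):#010x}")
```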
Demo
[Figure: CFGs generated by BE-PUM and IDA Pro]
Self-Modifying Code
• Virus.Win32.Seppuku.1606 : Self-Modifying Code
00401646 E8 B5F9FFFF CALL 00401000
EDI = 401067
004010E5 MOV EAX,DWORD PTR SS:[EBP+401489]
004010EB STOS DWORD PTR ES:[EDI]
00401646 E8 00000000 CALL 0040164B
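The same five bytes decode to two different CALLs, which is why the model must be regenerated on the fly after a write. Before the write, E8 B5F9FFFF targets 0x00401646 + 5 - 0x64B = 0x00401000. After it, E8 00000000 targets 0x0040164B. The sketch below re-decodes the CALL after the write. It assumes the STOS stores EAX = 0 over the rel32 field at 0x00401647, because the slide shows the resulting bytes but not the register values at that store. `decode_call`, `stos_dword`, and `memory` are hypothetical names.

```python
import struct

# The 5-byte CALL at 0x00401646 as listed on the slide (E8 B5F9FFFF).
BASE = 0x00401646
memory = bytearray.fromhex("E8B5F9FFFF")


def decode_call(mem, addr):
    """Target of CALL rel32: address of the next instruction plus rel32."""
    off = addr - BASE
    if mem[off] != 0xE8:
        raise ValueError("not a CALL rel32")
    (rel,) = struct.unpack_from("<i", mem, off + 1)
    return addr + 5 + rel


def stos_dword(mem, edi, eax):
    """STOS DWORD PTR ES:[EDI]: store eax at edi, return edi + 4 (DF clear)."""
    struct.pack_into("<I", mem, edi - BASE, eax)
    return edi + 4


before = decode_call(memory, BASE)
# Assumption: the store hits the rel32 field (0x00401647) with eax = 0.
stos_dword(memory, BASE + 1, 0)
after = decode_call(memory, BASE)
print(hex(before), hex(after))
```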
Decryption
• Email-Worm.Win32.Kickin.d: self-decryption
Before decryption, the bytes at 00609223 disassemble as:
00609223 pop ebp
00609224 push 3d
00609226 mov byte ptr ds:[esi+9cccd0e5],dh
0060922C retn 8d9e
0060922F pxor mm5,mm3
00609232 dec ecx
00609233 fiadd word ptr ds:[ecx+80a6b31]
Decryption loop (ecx was set to 0CAh):
0060933A mov ecx,0ca
00609345 lods byte ptr ds:[esi]
00609346 xor al,ah
00609348 inc ah
0060934A rol ah,2
0060934D add ah,90
00609350 stos byte ptr es:[edi]
00609351 loopd 00609345
After decryption, the same address holds:
00609223 call 00609228
00609228 mov ebx, [ebp+402705]
0060922E add ebx,28
00609231 pop eax
00609232 sub eax,ebx
00609234 mov [ebp+40270d],eax
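The decryption loop XORs each byte with a rolling key in AH, and each key byte depends on every earlier iteration. A minimal replay of the loop from 00609345 to 00609351 makes this concrete. The sketch is hypothetical: AH's initial value is not shown on the slide, so it is a parameter, and the input bytes are placeholders rather than the real sample. Because XOR with the same key stream is its own inverse, running the function twice returns the input.

```python
def rol8(value, count):
    """Rotate an 8-bit value left by count bits (ROL on an 8-bit register)."""
    value &= 0xFF
    count %= 8
    return ((value << count) | (value >> (8 - count))) & 0xFF


def kickin_decrypt(data, ah):
    """Replay the loop: each byte is XORed with ah, then ah is updated."""
    out = bytearray()
    for al in data:                  # lods byte ptr ds:[esi]
        out.append(al ^ ah)          # xor al,ah ... stos byte ptr es:[edi]
        ah = (ah + 1) & 0xFF         # inc ah
        ah = rol8(ah, 2)             # rol ah,2
        ah = (ah + 0x90) & 0xFF      # add ah,90 (hex, as in the listing)
    return bytes(out)


ECX = 0x0CA                      # loop count set at 0060933A; loopd runs ECX times
encrypted = bytes(range(ECX))    # placeholder bytes, not the real sample
plain = kickin_decrypt(encrypted, ah=0x00)
print(len(plain), plain[:4].hex())
```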
Demo
[Figure: CFGs generated by BE-PUM and IDA Pro]
Comparison with others
• BE-PUM currently generates precise models (CFGs) from real malware, handling:
Indirect jumps
Self-modification
Decryption
SEH (structured exception handling)
Packer techniques
• Experiments
Compare the CFGs with those generated by Jakstab and IDA Pro
Experiment statistics
Supported Techniques in Packer
Related Works
Remarks
• BE-PUM serves as both a model generator and a model emulator for binaries
Model generation: on the fly, using the concolic technique
– Missing piece: loop invariants (currently handled by iterating a loop as many times as needed)
Emulator
– A “symbolic sandbox”
Demo
Thank you for your attention

Tetcon2016 160104


Editor's Notes

  • #39 Simulation
A simulation is a system that behaves similarly to something else but is implemented in an entirely different way. It provides the basic behaviour of a system, but may not adhere to all of the rules of the system being simulated. It is there to give you an idea of how something works.
Example: think of a flight simulator. It looks and feels like you are flying an airplane, but you are completely disconnected from the reality of flying the plane, and you can bend or break its rules as you see fit; for example, you can fly an Airbus A380 upside down between London and Sydney without breaking it.
Emulation
An emulation is a system that behaves exactly like something else and adheres to all of the rules of the system being emulated. It is effectively a complete replication of another system, right down to being binary compatible with the emulated system's inputs and outputs, while operating in a different environment from the original. The rules are fixed and cannot be changed, or the system fails.
Example: the M.A.M.E. system is built around this very premise. Long-forgotten arcade systems that were implemented almost entirely in hardware, or in the firmware of that hardware, can be emulated right down to the original bugs and crashes that occurred when you reached the highest possible score.