CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
MONITORING NATIVE EXECUTION IN WOW64 APPS
1. MONITORING NATIVE EXECUTION IN WOW64 APPLICATIONS
Assaf Carlsbad
@assaf_carlsbad
Yarden Shafir
@yarden_shafir
DEEP HOOKS
2. • I started dancing at the age of 7 and later
competed with a rhythmic gymnastics
team.
• After my military service I practiced dancing
and aerial acrobatics.
• Today I teach aerial acrobatics and perform
on silks and lyra.
• In my spare time, I'm a security researcher
at SentinelOne.
Yarden
3. • Click to add text
Assaf
• Click to add picture
4. BACKGROUND
• AVs (EDR/EPP/NGAV) do tons of user-mode hooking
• Used to intercept and potentially block process’ actions
• User-mode hooks can be (and are) bypassed by malicious techniques
• Some techniques are unique to WoW64 processes
5. WOW64
• Windows on Windows 64
• 32-bit apps running on 64-bit systems
• Filesystem & registry redirection (out of scope)
• Syscall handling
• 2 versions of NTDLL in the process – 32-bit and 64-bit
• API calls from the app reach the 32-bit NTDLL
6. Transition to 64-bit
x86 mode x64 mode
Pointer expansion
Calling convention adjustments
Etc.
OS kernel
WoW64 system call overview
64-bit ntdll
32-bit ntdll
7. HEAVEN’S GATE
• A technique for calling the
64-bit API function without
going through the 32-bit API
• Abuses the JMP 0x33 control
transfer
• Used ITW by various
malware
Application code
64-bit NtAllocateVirtualMemory
SYSCALL
32-bit NtAllocateVirtualMemory
WoW64 transition (jmp 0x33)
Jmp 0x33
Your hook
is here
http://rce.co/knockin-on-heavens-gate-dynamic-processor-mode-switching/
8. THE SOLUTION
• Hook 64-bit APIs in WoW64 processes!
• But…
• Need to inject 64-bit code into the process
• That code should run in a difficult environment
• No hooking library we are aware of can do this out-of-the-box
9. INJECTION
• Lots of injection methods exist out there
• Most can only inject a DLL which has the same bitness as the target
process
• We need to do something unique - inject a 64-bit DLL into a WoW64
process
11. INJECTION #1 – WOW64LOG.DLL
• Wow64log.dll
• A DLL loaded automatically during WoW64 initialization
• Not shipped by default, so can be created in system32
• Easy – just name your DLL wow64log.dll
http://waleedassar.blogspot.com/2013/01/wow64logdll.html
12. INJECTION #2 – HEAVEN’S GATE
• To detect the malware, you must become the malware
13. INJECTION #2 – HEAVEN’S GATE
• 2 image loaders: 32-bit and 64-bit
• Requires injection of 32-bit code first
• Use Heaven’s Gate to transition into x64 mode
• Call 64-bit LdrLoadDll()
Image
loader
Image
loader
AV
32-bit
DLL
32-bit 64-bit
64-bit
DLL
WoW64 Process
14. INJECTION #3 - APC
• Asynchronous Procedure Call
• Kernel mechanism that provides a way to execute a custom routine in
the context of a particular thread
• User-mode APCs
• Runs with user-mode permissions
• Target thread must enter alertable wait state
• Handled by ntdll!KiUserApcDispatcher
• Usually queued from a kernel driver
https://wikileaks.org/ciav7p1/cms/page_7995519.html
15. INJECTION #3 - APC
INJECTION #3 - APC
• Popular among AVs
• Queue an APC to LdrLoadDll()/LoadLibrary()
• Used to inject a DLL with the same bitness as the target process
• In WoW64 processes – APCs can run in 32-bit or 64-bit mode
• Can call 64-bit loader functions!
19. CFG – CONTROL FLOW GUARD
• Exploit mitigation feature introduced in Windows 8.1
• Only allows indirect calls to valid call targets
• Indirect calls to invalid targets will crash the process
WITHOUT CFG WITH CFG
20. VALID CALL TARGETS
• Valid call targets:
• For images – Start addresses of functions
• For executable private memory allocations – All of the buffer
• CFG uses a bitmap to mark valid executable addresses
• Each bit represents 8 bytes in the process’ address space
• Valid call targets are marked in the bitmap whenever new executable
memory is introduced into the process
24. BACK TO APC INJECTION
• APC target contains 64-bit code
• Handled by 64-bit KiUserApcDispatcher
• Adapter thunk is never called
• Adapter thunk is not considered a valid call
target by CFG validation routine
Kernel driver
64-bit NTDLL
KiUserApcDispatcher
“adapter thunk”
64-bit NTDLL
LdrLoadDll
CFG validation
25. SO WHERE’S THE PROBLEM?
• Our “adapter thunk” is not marked in the native CFG bitmap
• Only 64-bit modules are marked in the native CFG bitmap
• Private memory allocations are always marked in the WoW64
bitmap
64-bit module Private 64-bit code
27. OPTION #1 – “NATIVIZE” THE PROCESS
• To check if the process is native, the kernel uses the WoW64Process
member of the EPROCESS
• If we set EPROCESS->WoW64Process to NULL, MiSelectCfgBitmap will:
• Assume that the process is a native one
• Mark the adapter thunk in the native bitmap
28. OPTION #2 – “THUNKLESS” APC INJECTION
• Private memory is marked in the WoW64 CFG bitmap
• No private memory = no problem
• Can we call the 64-bit LdrLoadDll() directly?
64-bit APC 64-bit LdrLoadDll()
64-bit APC Private 64-bit code
30. NOT A FAILURE YET
• Because of x64 calling convention, every function implicitly
receives 4 arguments
• First 4 arguments are passed in registers: rcx, rdx, r8, r9
• We can control the first 3 parameters passed by the APC
• Whatever value is in r9 will be interpreted as the fourth
push p5
mov r9, p4
mov r8, p3
mov rdx, p2
mov rcx, p1
call func
31. REQUIREMENTS
• Fourth parameter is an output parameter
• Needs to be a pointer to writable memory
• Needs to be memory we can overwrite without messing things up
32. WHAT’S IN R9?
• R9 holds a CONTEXT structure
• Will be used to resume the thread after
APC dispatch (via NtContinue)
• First few members don’t hold CPU-
related data and can be overwritten
http://www.nynaeve.net/?p=202
34. INLINE HOOKS 101
• We want our DLL to hook the 64-bit NTDLL
• Most hooking engines use “inline hooks”
35. CONSTRAINTS
• No hooking engine can hook 64-bit APIs in WoW64 apps
• Major limitation - no core Win32 DLLs
• Kernelbase.dll
• Kernel32.dll
• user32.dll
• msvcrt.dll
• Strip all dependencies other than 64-bit NTDLL
• Re-implement WIN32 APIs
• Disable some security and runtime checks
• Replace some functions (memset, memcpy) implemented in CRT
39. MAYBE NOT?
• Until Windows 8 all the modules in WoW64 processes resided below
4GB
• In Windows 8.1 the 64-bit NTDLL was moved above 4GB
Windows 7Windows 10
40. • The JMP used by inline hooks (0xE9) only allows jumping 2GB or less
into the trampoline
• No other code can be allocated above 4GB
• Distance between the trampoline and the hooked function is much
greater than 2GB => STATUS_EPIC_FAILURE
BACK TO THE DRAWING BOARD #1
41. JMP $+0x11223344
MOV rax,0x1122334455667788
JMP rax
PUSH 0x55667788
MOV dword ptr [rsp+4],0x11223344
RET
JMP qword ptr [rip+0]
0x1122334455667788
MOV rax, qword ptr [0x5000]
JMP rax
BACK TO THE DRAWING BOARD #1
1 2
3
4 5
http://www.ragestorm.net/blogs/?p=107
42. WORKS ON WINDOWS 10 (BUT ONLY THERE…)
• Instruction is too long for environments older than Windows 10
Windows 7 Windows 10
43. PUSH 0x55667788
MOV dword ptr [rsp+4],0x11223344
RET
00000000
55667788
deadbeef
12345678
44332211
aabbccdd
Return
address
BACK TO THE DRAWING BOARD #2
• Memory can only be allocated in lower 4GB
• Trampoline will be in lower 4GB
• 64-bit addresses under 4GB –
0x00000000xxxxxxxx
• Push imm32 zero-extends the value to 64-bits
45. DEEP HOOKS - RECAP
• Injection of 64-bit DLL to WoW64 process:
• 3 (relatively) known injection methods
• 2 (new) variations to APC injection
• Modified hooking engine
• Re-implemented Win32 APIs
• Project configuration changes
• Replaced the JMP instruction from the hooked function to the detour
Yarden
A general schema showing the path a system call has to go through to reach from the application to the operating system.
Important part: 32-bit API function, 64-bit API function, control transfer from 32-bit to 64-bit (required to make communication with the OS possible).
Assaf
In this talk we will present 3 main injection methods capable of injecting 64-bit DLLs into WoW64 processes.
We will start with the easy stuff, and go on the more hardcore things.
The last one of these will require some modifications in order to operate on recent Windows versions, for which we will present 2 different variations (not classical method – new generation stuff).
Assaf
A very rough pseudo-code of APC injection using a “shellcode”.
The shellcode is shown in the previous slide. This code attaches to the target process, initializes the context which will be used by the APC and initializes and queues the APC.
Allegro!
Assaf
Explain what is shown here:
We queued our APC and NTDLL (KiUserApcDispatcher -> KiUserCallForwarder) tried to call our APC routine.
This attempt failed with InvalidCallTarget exception and caused the process to crash.
This happened because of CFG, which we’ll talk about in the next slide.
Yarden
This graph shown how the validation looks when an invalid call target is called.
Yarden
Our problem lies in this function – MiSelectCfgBitmap().
This is the function that decides which bitmap will be marked for each executable section.
Since the “Segment” parameter will be NULL for private memory allocations (never mind why, it just is), the WoW64 bitmap will be marked instead of the native one.
BUT, there are other conditions to be met for the WoW64 bitmap to be marked. Maybe we can “cheat” one of them and use it to our advantage.
Assaf
Explain what the EPROCESS is – a unique structure for every process, only reachable from the kernel. Contains lots of data about the process, its threads, etc.
If this is a native process, there should only exist a native bitmap, so this is the only option.
But if we make the kernel think that the process is native, it will always use the native bitmap even though the wow64 bitmap still exists.
Explain the proposed solution (shown in the snippet)
Save the current value of WoW64Process (it’s a pointer to some structure)
Set Wow64Process to 0, making the process look like a native one
Allocate our “shellcode”, it will be marked as a valid call target in the native bitmap
Set Wow64process back to its previous value, because we don’t need to seem like a “native” process anymore.
Assaf
Explain what the EPROCESS is – a unique structure for every process, only reachable from the kernel. Contains lots of data about the process, its threads, etc.
If this is a native process, there should only exist a native bitmap, so this is the only option.
But if we make the kernel think that the process is native, it will always use the native bitmap even though the wow64 bitmap still exists.
Explain the proposed solution (shown in the snippet)
Save the current value of WoW64Process (it’s a pointer to some structure)
Set Wow64Process to 0, making the process look like a native one
Allocate our “shellcode”, it will be marked as a valid call target in the native bitmap
Set Wow64process back to its previous value, because we don’t need to seem like a “native” process anymore.
Allegro!
Assaf
(Add some cute animation of this)
We decided that the best course of action would be to replace the jump instruction with another, one that would allow us to jump the required distance from the hooked function to the trampoline.
We considered a few options:
Jmp rip+offset (default option) – only works up to 2 GB – not enough
Mov rax, [addr]; jmp rax – dirties a register
Mov rax, addr; jmp rax – dirties a register
Push half-addr; mov [rsp+4], half-addr; ret – long, complex, using half of addresses could be messy
Jmp [rip+0]; addr - works
Assaf
When an address is pushed on the stack in 64-bit systems, the stack allocates 8 bytes for the address.
If a 4-byte address is pushed, then this address will be used as the lower half of the address, the top half being sign-extended.
In our case, that means that as long as our trampoline in located below 2GB (standard case), then the top half of the address will be zeroed out, resulting in the legitimate address of our trampoline.