Mingbo Zhang, Rutgers University
Saman Zonouz, Rutgers University
Time-of-check-to-time-of-use (TOCTOU), also known as a “race condition” or “double fetch”, is a long-standing problem. Because memory reads and writes are such common operations, they trigger almost no security mechanisms. We leverage a CPU feature called SMAP (Supervisor Mode Access Prevention) to efficiently monitor kernel accesses to user-mode memory. When the kernel accesses user pages, our mitigation kicks in and protects them against further modification by other user-mode threads. We also leverage the same CPU feature to find double-fetch errors in kernel modules. A simple hypervisor is used to confine a system-wide CPU feature such as SMAP to a particular process.
5. Why It Happens
● The kernel reads data directly from user-mode memory
● User memory is a shared resource
● There is no mechanism to inform other users when it changes
● The kernel should “capture” the value
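To make the "capture" advice concrete, here is a minimal user-space sketch in C. The struct layout, the buffer size, and both function names are hypothetical and only model the pattern; they are not taken from any real kernel. The first function fetches the length twice (once to check it, once to use it), which is the double fetch; the second copies it into private storage once.

```c
#include <stddef.h>
#include <string.h>

#define KBUF_SIZE 16

/* Hypothetical request layout that lives in user-mode memory. */
struct user_req {
    volatile size_t len;   /* another user thread may change this at any time */
    char data[64];
};

/* Vulnerable pattern: len is fetched twice (check, then use). */
int copy_double_fetch(char *kbuf, struct user_req *req)
{
    if (req->len > KBUF_SIZE)           /* fetch 1: time of check */
        return -1;
    memcpy(kbuf, req->data, req->len);  /* fetch 2: time of use */
    return 0;
}

/* Captured pattern: len is fetched once into private storage. */
int copy_captured(char *kbuf, struct user_req *req)
{
    size_t len = req->len;              /* single fetch */
    if (len > KBUF_SIZE)
        return -1;
    memcpy(kbuf, req->data, len);
    return 0;
}
```

If a second thread raises len between the two fetches in copy_double_fetch(), the check passes with the small value and the copy runs with the large one.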
6. Anything the Kernel Touches Stays the Same
[Figure: Thread 0 issues a syscall and passes in parameters, including pointer0 and pointer1, which the kernel then accesses in user memory. Thread 1 writes to the same user memory.]
7. Supervisor Mode Access Prevention (SMAP)
● With SMAP enabled, the kernel cannot access user-mode pages.
● Such an access triggers a page fault.
● Set CR4.SMAP to enable it.
● SMAP is temporarily disabled while EFLAGS.AC=1.
● Two instructions, STAC (Set AC Flag) and CLAC (Clear AC Flag), can be used to easily set or clear the flag.
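The rules on this slide can be written down as a small decision function. This is a simplified user-space model in C, not kernel code: the function names are made up for illustration, and it covers only explicit supervisor-mode data accesses. The bit positions (CR4 bit 21 for SMAP, EFLAGS bit 18 for AC) are the architectural ones.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR4_SMAP  (1u << 21)  /* CR4 bit 21 enables SMAP */
#define EFLAGS_AC (1u << 18)  /* EFLAGS bit 18 is the AC flag */

/* Returns true when a supervisor-mode data access to the page would
 * raise a SMAP page fault under this simplified model. */
bool smap_faults(uint32_t cr4, uint32_t eflags, bool page_is_user)
{
    if (!(cr4 & CR4_SMAP))
        return false;             /* SMAP not enabled */
    if (!page_is_user)
        return false;             /* supervisor pages are unaffected */
    return !(eflags & EFLAGS_AC); /* AC=1 suspends the check */
}

/* Models of STAC / CLAC acting on a saved EFLAGS value. */
uint32_t model_stac(uint32_t eflags) { return eflags | EFLAGS_AC; }
uint32_t model_clac(uint32_t eflags) { return eflags & ~EFLAGS_AC; }
```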
8. SMAP in Linux
Two gateway functions in which SMAP is temporarily disabled:
● copy_to_user()
● copy_from_user()
When the OS sees a SMAP exception, it panics.
copy_from_user(...)
{
    stac();        /* set EFLAGS.AC: SMAP checks are suspended */
    .... copy;
    clac();        /* clear EFLAGS.AC: SMAP checks are back on */
}
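9. Leverage SMAP Differently
● Any user page referenced by the kernel is set as a kernel page until the current system call ends.
● The page is protected from other user threads.
● The switch is made while the SMAP #PF is being handled.
[Figure: user memory and kernel memory. A kernel access to a user page (labelled 0xFF) raises #PF, and the page becomes a kernel page.]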
10. What If Other Threads Need to Read the Same Page
● Simply change it back to a user-space page and set it to read-only.
[Figure: user memory. A user thread's access to the kernel page (labelled 0xFF) raises #PF, and the page is turned back into a user page.]
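The page-state changes described on the last two slides come down to flipping two bits in the page table entry. The sketch below is a user-space model in C with hypothetical helper names; the bit positions are the x86 ones (bit 1 is R/W, bit 2 is U/S). It assumes the original PTE value is saved somewhere when protection starts so it can be restored at syscall exit, which the slides do not spell out.

```c
#include <stdint.h>

#define PTE_RW (1u << 1)  /* 1 = writable */
#define PTE_US (1u << 2)  /* 1 = user-accessible, 0 = supervisor-only */

/* The kernel touched a user page: make it supervisor-only. */
uint32_t protect_page(uint32_t pte)
{
    return pte & ~PTE_US;
}

/* Another user thread reads the protected page: hand it back to
 * user mode, but without write permission. */
uint32_t share_read_only(uint32_t pte)
{
    return (pte | PTE_US) & ~PTE_RW;
}

/* The syscall ended: restore the saved U/S and R/W bits
 * (assumption: the original PTE was recorded at protect time). */
uint32_t release_page(uint32_t pte, uint32_t saved_pte)
{
    uint32_t mask = PTE_US | PTE_RW;
    return (pte & ~mask) | (saved_pte & mask);
}
```

On real hardware each change would also need a TLB flush for the affected address before it takes effect on every core.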
11. Implementation
● Hook the page fault handler (vector 0x0E).
● Hook the Windows internal function KiSystemCallExit to know when the syscall ends.
12. Page Fault Handler
We only handle exceptions that are related to SMAP. Others are passed to the OS kernel.
● Page fault error code (stack)
● EIP, CS, ESP, SS, EFLAGS (stack)
● Current page directory base (CR3)
● Virtual address that caused the exception (CR2)
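One way to separate SMAP-related faults from ordinary ones is to look at the error code and CR2 together. The following C sketch is a heuristic under stated assumptions, not the authors' filter: the function name is made up, and the user/kernel boundary of 0x80000000 assumes a 32-bit Windows system with the default 2 GB split. The error-code bit meanings are the architectural ones.

```c
#include <stdbool.h>
#include <stdint.h>

#define PF_PRESENT (1u << 0)  /* 1 = protection violation, 0 = page not present */
#define PF_WRITE   (1u << 1)  /* 1 = write access */
#define PF_USER    (1u << 2)  /* 1 = fault raised while in user mode */
#define PF_INSTR   (1u << 4)  /* 1 = instruction fetch */

/* Assumed boundary: 32-bit Windows, default 2 GB user address space. */
#define USER_SPACE_END 0x80000000u

/* Heuristic: supervisor-mode data access to a present user-space page. */
bool looks_like_smap_fault(uint32_t error_code, uint32_t cr2)
{
    if (!(error_code & PF_PRESENT))
        return false;   /* not-present fault: ordinary paging */
    if (error_code & PF_USER)
        return false;   /* raised by user-mode code */
    if (error_code & PF_INSTR)
        return false;   /* instruction fetch, not a data access */
    return cr2 < USER_SPACE_END;
}
```

A supervisor write to a read-only user page produces the same bits, so a real handler would likely also inspect the PTE before treating the fault as SMAP-related.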
13. Exception Flooding
● SMAP is a system-wide feature.
● The Windows kernel is not SMAP-ready.
● Syscalls take user parameters.
● Win32k.sys
14. Reduce Exceptions
● Debugging a page fault handler is not very convenient.
● To reduce exceptions, we want SMAP to be in effect only for a particular process.
● Use virtualization to confine SMAP to a process.
15. Hypervisor
● Goal: Set the CR4.SMAP bit when the CPU runs the target process.
● Similar to what virtualization rootkits usually do, our thin hypervisor lifts the current system into VM guest mode.
[Figure: before, the operating system runs directly on the hardware and loads the hypervisor driver. After, the hypervisor sits between the hardware and the operating system, which now runs in guest mode.]
16.
● Monitor process context-switch events (mov cr3, exx).
[Figure: a guest write to CR3 (mov cr3, eax) causes a VM exit into the hypervisor. When the switch is to the target process, the hypervisor sets VMCS.GuestState.CR4.SMAP = 1 before VM entry, so the target process runs with guest CR4.SMAP = 1 (SMAP enabled). When the switch is to any other process, it sets VMCS.GuestState.CR4.SMAP = 0, so other processes run with guest CR4.SMAP = 0.]
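The per-process switching can be reduced to one decision taken on every intercepted CR3 write. The C sketch below models only that decision; the function name is hypothetical, the VMCS read and write are left out, and masking off the low 12 bits of CR3 before comparing is an assumption about how the target process is identified.

```c
#include <stdint.h>

#define CR4_SMAP      (1ull << 21)   /* CR4 bit 21 enables SMAP */
#define CR3_BASE_MASK (~0xFFFull)    /* drop the low flag bits of CR3 */

/* Called from a (hypothetical) VM-exit handler for guest CR3 writes.
 * Returns the guest CR4 value that should be written to the VMCS
 * before the next VM entry. */
uint64_t guest_cr4_after_cr3_write(uint64_t guest_cr4,
                                   uint64_t new_cr3,
                                   uint64_t target_cr3)
{
    if ((new_cr3 & CR3_BASE_MASK) == (target_cr3 & CR3_BASE_MASK))
        return guest_cr4 | CR4_SMAP;   /* entering the target process */
    return guest_cr4 & ~CR4_SMAP;      /* any other process */
}
```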
17. Write Conflict
● Idea 1:
○ Thread-level copy-on-write: split the page tables when a write conflict occurs.
○ Cons: it is unclear how to merge the pages back.
● Idea 2:
○ Wait for the current protection to end (when the current syscall ends).
○ Wait inside the page fault exception handler.
18. Page Fault Exception Handler
● #PF is an exception (a fault).
● PASSIVE_LEVEL
● KeDelayExecutionThread()
[Figure: the Interrupt Descriptor Table dispatches interrupts (external, NMI) and exceptions (faults, traps, aborts).]
An interrupt is an asynchronous event that is typically triggered by an I/O device.
An exception is a synchronous event that is generated when the processor detects one or more predefined conditions while executing an instruction.
20. Flush TLB on SMP System
● We simply change the User/Supervisor bit in the PTE to switch the page between user and kernel space.
● Since each CPU core has its own cache (TLB), changing the PTE in memory alone may not be sufficient on a multiprocessor system.
● invlpg on the local core
● Send an IPI (Inter-Processor Interrupt) to the other cores
● KeFlushSingleTb
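Why a local invlpg is not enough can be seen in a toy model. The C sketch below simulates one PTE and a one-entry TLB per core in user space; all names are hypothetical, and the loop in shootdown() merely stands in for the IPI that asks the other cores to invalidate their own entries.

```c
#include <stdbool.h>
#include <stdint.h>

#define NCORES 4

struct core_tlb { bool valid; uint32_t cached_pte; };

struct machine {
    uint32_t pte_in_memory;          /* the page table entry itself */
    struct core_tlb tlb[NCORES];     /* one cached copy per core */
};

/* A core translates through its TLB, filling it from memory on a miss. */
uint32_t translate(struct machine *m, int core)
{
    if (!m->tlb[core].valid) {
        m->tlb[core].cached_pte = m->pte_in_memory;
        m->tlb[core].valid = true;
    }
    return m->tlb[core].cached_pte;
}

/* Models invlpg: drops the entry on the calling core only. */
void invlpg_local(struct machine *m, int core)
{
    m->tlb[core].valid = false;
}

/* Models a shootdown: local invlpg plus an IPI to every other core. */
void shootdown(struct machine *m, int self)
{
    invlpg_local(m, self);
    for (int c = 0; c < NCORES; c++)
        if (c != self)
            invlpg_local(m, c);      /* stands in for the IPI handler */
}
```

After the PTE changes in memory, a core that was not flushed keeps returning the old permissions until its entry is invalidated.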
21. Set Interrupt Flag in Page Fault Handler
● On entry to the page fault handler, the processor clears IF (the interrupt enable flag) in the EFLAGS register; it is restored when the handler returns with IRET.
● After getting the faulting virtual address from CR2, set the IF flag.
[Figure: Interrupt Descriptor Table with interrupt gates and trap gates. Entering through an interrupt gate clears the interrupt enable (IF) flag in the EFLAGS register.]
22. Evaluation
● Intel Core i5-6400 (6th-generation Skylake CPU), ASUS H110M-C motherboard (Intel H110 chipset, Realtek RTL8111H network controller), 8 GB RAM, and a 500 GB hard disk.
● CVE-2008-2252, fixed in MS08-061.
● Simple PoC code that crashes the system.
25. Try to Find More Bugs
● Inspired by Bochspwn (Mateusz Jurczyk, Gynvael Coldwind), “Identifying and Exploiting Windows Kernel Race Conditions via Memory Access Patterns”
● Observe kernel-to-user memory access patterns more efficiently
● Same hardware feature: SMAP
26. Use SMAP Only for Monitoring
1. The kernel triggers a SMAP exception.
2. Record information.
3. Let the kernel go.
● Once a SMAP exception has been triggered, it is too late to disable SMAP (EFLAGS.AC).
27. First Try: SMAP + Single-step Trap
● Set the Trap flag.
● Debug software must set the Resume flag in the EFLAGS image on the stack just prior to returning to the interrupted program with IRETD.
● It seems that the Resume flag doesn’t work in page fault context.
[Figure: Trap flag and Resume flag.]
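For reference, the two flags mentioned on this slide sit at fixed positions in EFLAGS (TF is bit 8, RF is bit 16). The C sketch below shows the bit manipulation on a saved EFLAGS image; the function name is hypothetical, and the snippet only illustrates what the first attempt sets, which the slide reports did not work as hoped in page fault context.

```c
#include <stdint.h>

#define EFLAGS_TF (1u << 8)   /* Trap flag: single-step after the next instruction */
#define EFLAGS_RF (1u << 16)  /* Resume flag */

/* Returns the EFLAGS image to place on the stack before IRETD so the
 * interrupted instruction runs once and then traps. */
uint32_t arm_single_step(uint32_t saved_eflags)
{
    return saved_eflags | EFLAGS_TF | EFLAGS_RF;
}
```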
28. Set Breakpoints Manually
● In the page fault handler
○ Parse the length of the current instruction to locate the beginning of the next instruction.
○ Write a breakpoint (byte 0xCC).
● Intercept the breakpoint trap event in the hypervisor.
○ The current process is SYSTEM. Switch to the target process that triggered the breakpoint.
○ Write back the original byte.
○ Release the protected page.
○ Re-execute the faulting instruction.
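The plant-and-restore part of this scheme can be sketched on an ordinary byte buffer. The C code below is a user-space illustration with hypothetical names; the instruction length is passed in by the caller because the length disassembler the slide refers to is not shown.

```c
#include <stddef.h>
#include <stdint.h>

#define INT3 0xCC   /* one-byte breakpoint opcode */

struct soft_bp {
    uint8_t *addr;      /* where the breakpoint was written */
    uint8_t  original;  /* byte that was overwritten */
};

/* Plants a breakpoint on the instruction that follows the faulting one.
 * insn_len must come from a length disassembler (not shown here). */
struct soft_bp plant_bp(uint8_t *faulting_insn, size_t insn_len)
{
    struct soft_bp bp;
    bp.addr = faulting_insn + insn_len;
    bp.original = *bp.addr;
    *bp.addr = INT3;
    return bp;
}

/* Writes back the original byte once the breakpoint has fired. */
void remove_bp(const struct soft_bp *bp)
{
    *bp->addr = bp->original;
}
```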
29. Memory Access Patterns
● The same user-mode virtual address is accessed more than once.
● Within one syscall.
● By the same thread.
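The three conditions above translate directly into a check over the recorded accesses. This C sketch assumes a hypothetical log format in which each record carries a thread ID, a per-thread syscall sequence number, and the user-mode virtual address; none of these field names come from the original tool.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record written each time the kernel touches user memory. */
struct access {
    uint32_t thread_id;
    uint32_t syscall_seq;  /* incremented at every syscall entry */
    uint32_t user_va;
};

static bool same_key(const struct access *a, const struct access *b)
{
    return a->thread_id == b->thread_id &&
           a->syscall_seq == b->syscall_seq &&
           a->user_va == b->user_va;
}

/* Counts records that repeat an earlier access to the same address
 * by the same thread within the same syscall. */
size_t count_repeated_fetches(const struct access *log, size_t n)
{
    size_t hits = 0;
    for (size_t i = 1; i < n; i++) {
        for (size_t j = 0; j < i; j++) {
            if (same_key(&log[i], &log[j])) {
                hits++;
                break;
            }
        }
    }
    return hits;
}
```

The quadratic scan keeps the sketch short; a real trace would more likely be grouped per syscall first.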