1. JIT Spraying Never Dies - Bypass CFG
By Leveraging WARP Shader JIT Spraying
Bing Sun, Chong Xu
2. Abstract
• Many scripting languages, such as JavaScript and ActionScript, use Just-In-Time (JIT) compilation
to improve script execution performance. Under some circumstances, however, an exploit can
leverage the legitimate JIT mechanism to bypass memory protections and mitigations such as
ASLR and DEP. This exploitation technique was first introduced as "JIT Spraying" in 2010. The
idea is to use constant numeric values in a high-level scripting language to generate the desired
JITed code at predictable locations. As JIT spraying gained popularity as a reliable exploitation
technique, vendors started to revisit their JIT engine implementations. Since then,
countermeasures such as randomizing JIT code page allocation and mutating
JITed code generation have been employed to prevent JIT spraying. In particular, the MS WARP
Shader JIT engine, which we will exploit in this talk, has security mechanisms such as Shader
complexity, a JIT cache size limit, and separation of constant data from code. As a result,
JIT spraying has become less effective in most exploitation scenarios. Nevertheless, JIT
spraying has never died, even in the era of Windows 10, the most secure Windows. In this talk, we will
present a completely different JIT spraying exploitation technique (based on MS WARP JIT) to
bypass Control Flow Guard (CFG) in the context of the browser in a generic way. This presentation
provides details on how to circumvent the MS WARP JIT restrictions and achieve a reliable CFG
bypass. At the end, a live demo will show CFG being bypassed on IE11 and Edge on
Windows 10.
3. About Speakers
• Bing Sun
– Bing Sun is a senior information security researcher who now leads the IPS security research
team of Intel Security Group (formerly McAfee). He has extensive experience in operating system
kernel and information security R&D, with a particular focus on advanced
vulnerability exploitation and detection, rootkit detection, firmware security and virtualization
technology. Bing is also a regular speaker at international security conferences, such as
XCon, Black Hat and CanSecWest.
• Chong Xu
– Chong received his Ph.D. degree in networking and security from Duke University. His current focus
includes research and innovation on intrusion and prevention techniques as well as threat
intelligence. He is a senior director of the Intel Security IPS team, which leads Intel Security's vulnerability
research, malware and APT detection, and botnet detection, and feeds security content and
innovative protection solutions into Intel Security’s network IPS, host IPS, and sandbox products, as
well as McAfee Global Threat Intelligence (GTI).
5. Background Knowledge
• Direct3D & DXGI
• Software Rasterizer & WARP
• Rendering Pipeline & Shader
• GLSL/HLSL
• WebGL & Its Usage
• Shader’s Lifecycle on WARP
• The Basic Principle of CFG
• Known CFG Bypass Methods
6. Direct3D & DXGI
• Direct3D (part of DirectX)
– A Microsoft DirectX API subsystem component. It acts
as a thin abstraction layer between a graphics
application and the graphics hardware drivers
(comparable to GDI).
– Provides a low-level API for drawing primitives with the
rendering pipeline or performing parallel operations
with the compute shader.
– Competes with Khronos' OpenGL and its follow-on
Vulkan.
• DXGI (Microsoft DirectX Graphics
Infrastructure)
– Encapsulates some of the low-level tasks that are
needed by Direct3D 10/11/12.
– These tasks include enumerating graphics adapters, enumerating display
modes, selecting buffer formats, sharing resources
between processes, and presenting rendered frames to
a window or monitor for display.
7. Software Rasterizer & WARP
• Software Rasterizer
– A software component that can render an image
independently of graphics hardware (GPU). The rendering
takes place entirely on the CPU.
• WARP (Windows Advanced Rasterization Platform)
– WARP is a full-featured Direct3D 10 software rasterizer
that does not require graphics hardware (GPU) to
execute.
– WARP can be used for rendering when no compatible
hardware is available, in kernel mode applications, in a
headless environment, or for remote rendering by the
Remote Desktop Connection client.
– WARP contains two high-speed, real-time compilers:
• The high-level intermediate language compiler that
converts HLSL bytecode and the current render
state into an optimized stream of vector
commands for the Shaders.
• The high-performance JIT code generator.
8. Rendering Pipeline & Shader
• Rendering Pipeline
– Refers to the sequence of steps used to create a 2D raster
representation of a 3D scene, i.e. the process of turning a 3D
model into what the computer displays.
– Modern GPUs use a programmable rendering pipeline that makes it
possible to write your own functions to control how shapes and
images are rendered using vertex and fragment Shaders.
• Shader
– Shader: A user-defined program that is used to do Shading (the
production of appropriate levels of color within an image, or to
produce special effects or do video post-processing). Shader is
designed to execute one of the programmable stages of the rendering
pipeline.
– Vertex Shader: A pipeline stage that handles the processing of
individual vertices, and it performs transformations to post-projection
space (vertex's 3D position in virtual space to the 2D coordinate), and
per-vertex lighting etc.
– Fragment Shader (a.k.a. Pixel Shader): A pipeline stage after a primitive
is rasterized, and it processes a fragment (pixel) generated by the
Rasterization into a set of colors and a single depth value.
9. GLSL & HLSL
• Shading Language
– A graphics programming language adapted to programming Shader effects (characterizing
surfaces, volumes, and objects).
• GLSL (OpenGL Shading Language)
– A high-level shading language based on the syntax of the C programming language.
– It was created by the OpenGL ARB (OpenGL Architecture Review Board) to give developers
more direct control of the graphics pipeline without having to use ARB assembly language
or hardware-specific languages.
• HLSL (High-Level Shading Language)
– A proprietary shading language developed by Microsoft to augment the Shader assembly
language.
– HLSL is analogous to the GLSL shading language used with the OpenGL standard.
10. An Example of Shaders
Defined with GLSL
A Fragment Shader
A Vertex Shader
11. WebGL and Its Usage
• WebGL (Web Graphics Library)
– A JavaScript API for rendering interactive 3D computer graphics and 2D graphics within any
compatible web browser without the use of plug-ins.
– WebGL programs consist of control code written in JavaScript and shader code (GLSL).
– Officially supported by MS IE11 & Edge.
• Create a WebGL Shader program
1. Define Shaders with GLSL in the page.
2. Add a Canvas element to the page, and create a new WebGL rendering context
(getContext("experimental-webgl")).
3. Get Shader source code and compile shader (createShader, shaderSource, compileShader).
4. Attach Shaders to program and link program (createProgram, attachShader, linkProgram).
5. Feed data from JavaScript into Shader program through attribute or uniform
(getAttribLocation, enableVertexAttribArray, bindBuffer, vertexAttribPointer,
getUniformLocation, uniformxxx).
6. Draw to the screen (drawArrays, drawElements).
13. The Basic Principle of CFG
• About CFG (Control Flow Guard)
– A compiler-aided exploit mitigation mechanism that prevents exploits from
hijacking the control flow.
– The compiler inserts a CFG check before each indirect control transfer instruction
(call/jmp), and at runtime the CFG check validates the call target address
against a pre-configured CFG bitmap to determine whether the call target is valid
or not. The process is terminated when an unexpected call target is identified.
– The RVAs of all valid call targets determined at compile time are kept in
a Guard CF Function table in the PE file. During the PE loading process, the loader
reads the CF info from the Guard CF Function table and updates the CFG bitmap.
– The read-only CFG bitmap is maintained by the OS, and part of the bitmap is
shared by all processes. Each even bit in the CFG bitmap corresponds to one 16-byte
aligned address, while each odd bit corresponds to the 15 non-16-byte aligned addresses.
– When the PE file is loaded, __guard_check_icall_fptr will be resolved to point to
ntdll!LdrpValidateUserCallTarget.
14. The Basic Principle of CFG
(cont’d)
• The compiler inserts a call target check before each indirect function call/jmp.
• The check starts from the CFG bitmap base.
• The high 24 bits of the call target address are used as an index into the bitmap to
get a 32-bit bitmap entry.
• Bits 3 ~ 7 of the target address are used as an offset.
– 16-byte aligned: the offset is used as is.
– Non 16-byte aligned: bit 0 of the offset is set.
• Test the bit “offset” of that 32-bit bitmap entry. The target address is valid if the bit
is set; otherwise INT 29h is triggered.
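The bit lookup described on this slide can be traced with a minimal sketch for 32-bit targets. This is only a model of the arithmetic, not the actual ntdll code: the function names and the dict-based `bitmap` are hypothetical, and a missing entry is assumed to be all zeroes. Where the real check would trigger INT 29h, the sketch simply returns False.

```python
def cfg_bit_position(target):
    """Return (bitmap entry index, bit offset) for a 32-bit call target."""
    index = target >> 8              # high 24 bits select a 32-bit bitmap entry
    offset = (target >> 3) & 0x1F    # bits 3..7 select a bit inside that entry
    if target & 0xF:                 # not 16-byte aligned: use the odd bit
        offset |= 1
    return index, offset


def is_valid_call_target(bitmap, target):
    """bitmap is a hypothetical dict {entry index: 32-bit entry}; missing entries are 0."""
    index, offset = cfg_bit_position(target)
    return bool((bitmap.get(index, 0) >> offset) & 1)


# Hypothetical bitmap: only the 16-byte aligned address 0x7EC30010 is a valid target.
bitmap = {0x7EC300: 1 << 2}
print(is_valid_call_target(bitmap, 0x7EC30010))  # True
print(is_valid_call_target(bitmap, 0x7EC30014))  # False
```

Each 32-bit entry covers 256 bytes of address space, two bits per 16-byte slot, which matches the even/odd bit description on the previous slide.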
15. Known CFG Bypass Methods
• Call VirtualProtect Wrapper to replace ___guard_check_icall_fptr
– The Wrapper itself must be able to pass CFG check.
– The Wrapper should ideally take as few arguments as possible, which makes passing arguments from a
high-level language easier.
– Fixed by adding extra logic in the wrapper to make sure it cannot be used for other purposes.
• Transit via unguarded trampoline (either in executable or in JIT code)
– The trampoline itself must be able to pass CFG check.
– The target address of the unguarded indirect control transfer instruction must be controllable.
– Fixed by introducing a CFG check before the indirect control transfer instruction.
• Leverage a stack desynchronization condition to overwrite a function return address
– Requires a function that contains a controllable function callout, which is used to cause stack imbalance.
– A controllable value must be pushed onto the stack, where it happens to overwrite the function’s saved
return address.
– Fixed by enforcing a stack pointer sanity check.
16. Bypass CFG via WARP Shader
JIT Spraying
• The Security Assessment on WARP Shader JIT Mechanism
• The Weakness of WARP Shader JIT Engine
• The Challenge of Exploiting WARP Shader JIT & Solution
• The Detailed Bypass Implementation
• The Possibility of Exploiting 64-bit Browser
17. The Security Assessment on
WARP Shader JIT Mechanism
• Some security-related measures
(intentional or otherwise) in the WARP
Shader JIT implementation raise the
bar for performing a successful JIT
spraying attack.
– JIT cache limits
– Separation of data and code
18. JIT Cache Limits
At most 0x180 Shaders are kept in the JIT cache. Exceeding that threshold leads to the
deletion of cached Shaders, which breaks the continuity of the sprayed memory layout.
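A toy model can make the effect of the limit concrete. The slide gives only the 0x180 threshold and does not say which cached Shader is deleted, so the oldest-first eviction below is purely an assumption for illustration, and `spray` is a hypothetical helper, not a WARP function.

```python
from collections import OrderedDict

MAX_CACHED_SHADERS = 0x180  # limit quoted on the slide


def spray(shader_count, limit=MAX_CACHED_SHADERS):
    """Return the ids of shaders still cached after compiling shader_count shaders."""
    cache = OrderedDict()
    for shader_id in range(shader_count):
        cache[shader_id] = True
        if len(cache) > limit:
            # ASSUMPTION: the oldest entry is deleted; the real policy is not stated.
            cache.popitem(last=False)
    return list(cache)


survivors = spray(0x200)
print(len(survivors))   # 384 entries remain (0x180)
print(survivors[0])     # 128: the first 0x80 shaders were deleted
```

Under this assumption, every Shader beyond the limit frees the JIT memory of an earlier one, which is how holes would appear in an otherwise continuous spray.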
20. The Weakness of WARP Shader
JIT Engine
• Although security measures have been
considered in WARP Shader JIT
implementation, weaknesses still exist,
making it possible to leverage WARP
Shader JIT to bypass memory mitigation.
– No randomization of JIT page allocation
and JITed code generation
– No CFG for JITed code
21. The Weakness of WARP Shader
JIT Engine (cont’d)
• No Randomization of JIT Page Allocation and
Code Generation
– WARP JIT code pages are allocated by
kernel32!VirtualAlloc with the MEM_TOP_DOWN flag. As a
result, repeated JIT calls eventually generate
contiguous RX pages at the high end of the address space. After
spraying a big enough space (about 19M), certain
addresses become stable and predictable.
– The same Shader will always generate the same JITed
code on the same OS (i.e. the same version of WARP).
• No CFG for JITed Code
– All CFG bitmap bits covering a JIT code page are set by default, meaning any
address in the JIT code page will be treated as a valid call
target.
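The link between the spray size and a predictable address can be checked with simple arithmetic. The sketch below is an idealized model: it assumes the top-down allocations start exactly at 0x7FFF0000 (a typical upper bound of a 32-bit user address space) and stay contiguous, which ignores other allocations already living near the top. The spray size and the 0x7EC3XXXX example address are the values quoted elsewhere in this talk; the function names are hypothetical.

```python
USER_SPACE_TOP = 0x7FFF0000  # ASSUMPTION: idealized top of a 32-bit user address space
SPRAY_SIZE = 0x13F0000       # spray size quoted in the talk (about 19M)


def sprayed_range(top=USER_SPACE_TOP, size=SPRAY_SIZE):
    """Return (low, high) of an idealized contiguous top-down spray."""
    return top - size, top


def is_covered(address, top=USER_SPACE_TOP, size=SPRAY_SIZE):
    """True if address falls inside the idealized sprayed range."""
    low, high = sprayed_range(top, size)
    return low <= address < high


low, high = sprayed_range()
print(hex(low))                 # 0x7ec00000
print(is_covered(0x7EC30000))   # True
print(is_covered(0x7EB00000))   # False
```

Under these assumptions the spray reaches down to 0x7EC00000, which is consistent with addresses of the form 0x7EC3XXXX being covered.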
24. The Challenge of Exploiting
WARP Shader JIT
I. WARP module not loaded
‒ WARP will NOT be used if the hardware display
device supports D3D10 and above.
‒ Some tested platforms where WARP is used by
default: earlier versions of VMware (or VM tools not
updated), MS Hyper-V (without RemoteFX),
Remote Desktop etc.
II. The speed of WARP JIT Spraying
‒ It takes approx. 5 minutes to spray 0x13F0000 (19M) in order to cover some
predictable address (such as 0x7EC3XXXX).
III. JITed instruction generation
– Separation of data and code makes it very difficult to control a JITed instruction
three bytes or longer.
25. WARP Module Not Loaded
Data
Code
Check this option to add
support for D3D10
WARP will not come into play when the underlying display hardware supports D3D10
Check this VM setting option to enable D3D10 support
Web page contains WebGL Shader
26. The Speed of WARP JIT Spraying
It takes approx. 4.5 minutes to
spray a space of 19M
Out of 535M private bytes, WARP JIT
data/code only accounts for 19M
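The figures on this slide translate into a throughput and a memory share as follows. This is plain arithmetic on the quoted numbers, assuming "19M" refers to the 0x13F0000-byte spray mentioned earlier; the helper name is hypothetical.

```python
SPRAY_BYTES = 0x13F0000        # spray size quoted in the talk (about 19M)
SPRAY_SECONDS = 4.5 * 60       # approx. 4.5 minutes, from this slide
PRIVATE_MB = 535               # private bytes of the process, from this slide
JIT_MB = 19                    # WARP JIT data/code, from this slide


def spray_rate_kib_per_s(spray_bytes, seconds):
    """Average amount of JIT memory produced per second, in KiB."""
    return spray_bytes / seconds / 1024


rate = spray_rate_kib_per_s(SPRAY_BYTES, SPRAY_SECONDS)
share = JIT_MB / PRIVATE_MB

print(f"{rate:.1f} KiB/s of JIT memory")
print(f"{share:.1%} of private bytes are WARP JIT data/code")
```

On these numbers the spray advances at well under 100 KiB per second, and only a few percent of the memory the process consumes ends up as JIT pages, which is why the default spraying speed is listed as a challenge.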
27. The Solution to Challenge I & II
• Without Arbitrary Address Read/Write
(AAR/AAW) ability,
there seems to be little we can
do to solve these problems!
• Magic will happen with the help of AAR/AAW.
• Manipulate the internal data structures of D3D to force the WARP module
to take effect immediately on any platform (simply calling
LoadLibrary will NOT work though)!
• Tweak the internal parameters of WARP JIT page allocation to reduce
the whole JIT spraying time to only a few seconds!
28. Force Loading WARP Module
and Fast JIT Spraying
WARP module is loaded even on
D3D10 hardware
Use larger JIT section allocation
to speed up the JIT spraying
29. The Solution to Challenge III
• A specially crafted Shader can generate
some useful two-byte instructions,
such as \x94\xc3 (xchg eax, esp; ret); however, this sequence is
no longer usable because the CFG check
alters the value in the eax register.
• Sometimes simple things can make our life easier.
• The natural JIT function epilog.
• Indirect jmp via esi.
30. The Solution to Challenge III (cont’d)
• The natural JIT function epilog
‒ pop ebx // pop function call return address
pop ebp // pop the 1st argument
mov esp, ebp // switch the stack to something we control
pop ebp // skip the 1st dword
ret // transfer the control to wherever we want
‒ The 1st argument of function call must be controllable.
‒ No need for specially crafted Shader.
‒ No need for additional stack pivot ROP.
‒ Many usage scenarios, especially when AAR/AAW ability is
acquired (see examples in the following slides).
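The effect of the five epilog instructions on the stack can be traced with a toy model. The sketch below is not an emulator of real JITed code: memory is a hypothetical dict of dword values keyed by address, all addresses and values are made-up placeholders, and only the pop/mov/ret semantics listed on the slide are modeled.

```python
def run_epilog(memory, esp):
    """Trace the epilog: pop ebx; pop ebp; mov esp, ebp; pop ebp; ret."""
    regs = {"esp": esp}

    def pop():
        value = memory[regs["esp"]]
        regs["esp"] += 4
        return value

    regs["ebx"] = pop()            # pop ebx: the function call return address
    regs["ebp"] = pop()            # pop ebp: the 1st argument
    regs["esp"] = regs["ebp"]      # mov esp, ebp: stack now points at the 1st argument
    regs["ebp"] = pop()            # pop ebp: skip the 1st dword of the new stack
    regs["eip"] = pop()            # ret: next dword becomes the new instruction pointer
    return regs


# Placeholder layout: the call pushed a return address, above it sits the 1st argument.
memory = {
    0x0012F000: 0x60001234,  # return address of the call into the epilog
    0x0012F004: 0x20000000,  # 1st argument, pointing at a buffer
    0x20000000: 0x41414141,  # 1st dword of the buffer (skipped)
    0x20000004: 0x60005678,  # 2nd dword of the buffer (becomes eip)
}
regs = run_epilog(memory, esp=0x0012F000)
print(hex(regs["eip"]), hex(regs["esp"]))  # 0x60005678 0x20000008
```

The trace shows why the slide requires the 1st argument to be controllable: after the epilog runs, both the stack pointer and the next instruction pointer are derived from the memory that argument points to.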
31. The Solution to Challenge III (cont’d)
• Indirect jmp via esi
‒ jmp dword ptr [esi+0fh]
‒ The value in esi must be controllable.
‒ No need for specially crafted Shader.
‒ Need stack pivot.
‒ Among the virtual function call forms in 32-bit binaries, call
dword ptr [esi+xx] is more difficult to find than call esi/call
edi, and is therefore more difficult to exploit.
36. The Detailed Bypass
Implementation
• Craft some WebGL Shader and make sure it’s able to generate the desired JIT
gadget at certain fixed offset.
• Trigger WARP Shader JIT in a repeated manner by creating/linking new WebGL
program object and drawing on the Canvas in a loop.
• After spraying a big enough space, some JIT gadget will be expected to appear at
certain predictable address. (Sometimes some searching work may be needed.)
• Use that particular JIT gadget as a trampoline to bypass CFG, and later
transfer to the main shellcode or other ROP gadget if necessary.
• If for some reason WARP is not enabled on the system by default, do some magic
to make it come into play at runtime.
37. The Possibility of Exploiting
64-bit Browser
* In fact, there is no need to spray such a huge space if AAR is acquired; the
JIT page address can be deduced by leaking the WARP module base.
* Spraying 9G is enough to cover some addresses, but some searching is still
needed to find the JIT gadget within each 1/2G section.
41. Extra Demo
• Other 0day CFG/DEP Bypass Methods
– 0day I (Replace fptr in .idata section)
– 0day II (Create RWX memory)
– 0day III (Make arbitrary memory RW)
42. Demo - CFG Bypass 0day I
Replace fptr in .idata Section
44. Demo - CFG Bypass 0day III
Make Arbitrary Memory RW
45. Q&A
• You are welcome to send questions to
– Bing Sun @ bing.sun@intel.com
– Chong Xu @ chong.c.xu@intel.com
• Thanks to MSRC for helping to get the issue fixed in the MS June
Patch.
• Special thanks to Haifei Li, Stanley Zhu and the ISecG IPS
Vulnerability Research team.