This is a paper that I wrote about a CFI system which works alongside randomization techniques like ASLR. It uses the offset of a branch instruction's destination instead of the absolute address. This works alongside ASLR because randomization-based protection mechanisms randomize on a page-size basis, not inside the page itself...
HCFI: Holistic Control Flow Integrity
A kernel-level approach to enforce CFI
M.Golyani, S.Niksefat
2017
Abstract—While Control Flow Integrity is one of the most
powerful methods used to prevent attackers from obtaining
control of a process, existing CFI systems still have
shortcomings in several respects. In this paper, we propose
a new CFI system which is able to work alongside other
protection schemes without requiring the program’s source
code, specific hardware, or binary rewriting. Our proposed
work uses kernel facilities as well as performance counters in the
processor to monitor the execution of the protected applications
and detect any violation of the correct execution flow. In this
CFI system, the CFI policy is generated once on a single machine
and can be used on other machines as well. We have implemented
this system on a Linux box, and evaluation results indicate that
this CFI system is practical, incurs low overhead, and is able
to detect various kinds of attacks.
I. INTRODUCTION
Many mechanisms have been developed to date to
provide security for operating systems and running processes.
Among them, Control Flow Integrity (CFI) [2] is one of the
most reliable techniques. CFI-based solutions generally work
in two steps. In the first step, also known as the offline phase,
a valid control flow graph is built for each binary. In the
on-line phase, while the operating system is executing
the binary, a CFI enforcement mechanism
compares the current execution flow with the saved one. If
the CFI system finds any violation of the valid control flow
graph, it raises an alarm and takes appropriate action.
When implemented successfully, CFI is one of the most suitable
protection schemes because it is not restricted to a
specific type of attack and detects any deviation from
the valid execution flow.
The practicality of existing CFI mechanisms has
been discussed in many other papers. Although some of
these papers state that the studied CFI mechanisms
cannot provide suitable protection for the system, it should
be noted that most of these CFI mechanisms are similar
to each other in terms of the detection process. In these
CFI mechanisms, a set of valid targets for indirect branches
is generated and, at run-time, each indirect branch
instruction is checked against the list separately. While this
approach may be bypassed [14], [3], [8], [17], a holistic CFI
mechanism, which we introduce later on, can still provide
suitable protection for the system.
The CFI mechanisms presented to date are categorized
into two groups, fine-grained and coarse-grained CFI systems,
although a third approach between the two is possible.
In this paper, we present a new CFI
system that uses a holistic approach to check control flow
integrity over a period of execution and not only at a single
point in time. In our proposed system, in addition to analyzing
the valid targets for branches inside a program, the whole
execution flow of the executed process is also monitored
(a holistic CFI system); therefore, any violation of the
control flow graph is detected immediately.
Furthermore, our proposed work addresses the ability to deploy
the CFI system alongside other protection mechanisms like
ASLR and DEP. This characteristic has
not been adequately considered in existing CFI systems.
The proposed CFI system protects both statically and
dynamically linked executable files without requiring source
code, recompilation, instrumentation, or any specific
hardware equipment.
Contributions:
In summary, the contributions of this paper are as follows:
• We propose a CFI system which detects and enforces the
CFG by considering the sequence of branches made till
a certain point. Using a sequence of branches at a time
instead of a single branch at a time provides a holistic
view of the program’s execution flow.
• Our presented system is designed in a way that can be
used alongside other protection schemes like ASLR.
• It is the only CFI system which works in a centralized
way. Computation of CFG for each binary is performed
in a central system and the computed CFG can be used
on other systems.
• We have constructed a working prototype of the presented
system as a kernel module on the Ubuntu 14.04 LTS operating
system. It protects binaries on a system
which has ASLR and exec-shield enabled and on which
compile-time protections are used as well.
The rest of this paper is organized as follows. The "Background"
section provides some information about the history of attacks
and the existing CFI systems, and studies
the advances made in both the attack techniques and the
defence mechanisms. Section III presents the security model
we considered in designing and implementing the proposed
work. Section IV gives an overview of the proposed work,
and Sections V and VI provide an in-depth view
of its two main operation phases. Section VII
presents the evaluation results of an implementation
of the proposed work in terms of security and performance.
Related work, although discussed throughout the paper, is
presented in Section VIII, and Section IX concludes the
paper.
II. BACKGROUND
Since Elias Levy’s article titled "Smashing the stack for fun
and profit" in 1996 [23], many attacks have been introduced to
take control of processes, and alongside these attacks,
security solutions have been presented as well. One of the
most well-known attacks is the stack buffer overflow attack, which
exploits the lack of bounds checking in some programming
languages to overwrite sensitive data in memory. This attack
was first introduced to exploit stack buffers but was later
extended to the heap area as well.
To defend against overflow-based attacks, security mechanisms
like canary-based protections were introduced, in which
a particular value, known as a canary, is placed next to the
buffer, right beside the critical pointer and in the path of the
overflow. With this technique, an overflow has to overwrite
the canary before it can reach the critical pointer,
so the protection system
detects the change of the canary and raises an alarm.
While this technique works fine against overflow-based
attacks, it has no effect on other types of attacks like format
string-based attacks. Format string attacks were proposed after
buffer overflow attacks and using them, an attacker is able to
overwrite a sensitive pointer in memory, e.g., return address,
and transfer the control of the process execution into his/her
own injected shellcode.
When an attacker is able to put some code into memory,
he/she can use techniques like format-string-based
attacks to execute the injected code. In general, these types
of attacks are known as code injection attacks, in which a
piece of code is first injected into memory and executed
afterward. To counter code injection attacks, data execution
prevention [41] systems were introduced, in which a distinction
is made between code and data regions in memory and
only code regions have permission to be executed.
Security systems like W^X, Exec-shield and DEP are
based on this protection mechanism. Alongside software
solutions, processor manufacturers like Intel and AMD
implemented facilities to enforce this type of protection
at the hardware level as well. Intel introduced the Execute
Disable (XD) flag and AMD introduced the No-Execute (NX) flag in
their processors.
Although the mentioned protection techniques stop code
injection attacks effectively, they are ineffective against
other types of attacks like return-into-libc (a.k.a. ret2libc).
In the ret2libc technique, the attacker overwrites a pointer
(e.g., the return address) with the location of a function
within the libc library.
As the address space of libc is marked as executable and normal
execution flow is often transferred there to execute a function,
the attacker is able to execute a whole function in the libc
library, providing its arguments, and hence bypass the above
protections.
To prevent the ret2libc attack, which is in fact the first
and most basic type of code reuse attack, randomization
techniques come into play. Protection mechanisms like ASLR
(Address Space Layout Randomization) [42] in Linux and
concepts like Position Independent Executable [43] files
can be used against this type
of attack, but techniques have also been proposed
to bypass them. Heap
spray [44], ASLR brute force [39], and return into
non-randomized regions [45] are some of these techniques.
Two of the most advanced methods which attackers
use are Return Oriented Programming (ROP) and Jump Oriented
Programming (JOP). In these methods, the attacker overwrites a
pointer with a pointer to a small part of the program’s own code.
These small parts of the code are called gadgets. The
final attack is formed by arranging gadgets in the proper order.
It has been shown that by executing gadgets in the proper order,
an attacker can obtain Turing-complete computation and execute
whatever he/she likes [1].
Nowadays, presenting an efficient and reliable technique to
counter code reuse attacks is a main concern in
the academic community. kBouncer [28] was one of the first
attempts at a practical solution, although it has been shown to
be inefficient in practice [19]. Another prominent
work in this field is ROPecker [6], but methods to bypass it
have been found as well [4]. Randomization-based
protections like Isomeron [13], protections based on omitting
gadgets from the binary through recompilation like DROP [5],
hardware-based approaches like SIGDROP [34], Gadge me
if you can [16], protections based on binary instrumentation
like ROPDefender [15], and many other works have tried
to mitigate code reuse attacks in different ways. Unfortunately,
almost all of them rely on a specific characteristic
of these attacks as a means of detection, such as
the length of the gadget chain or the length of
the gadget itself. Hence, to bypass these protection
mechanisms, attackers come up with new attack
techniques simply by changing these characteristics in their
attacks.
CFI protections, on the other hand, target neither
a specific attack nor a particular characteristic of an attack,
and are able to detect and prevent a wide variety of attacks.
From code injection attacks to ROP and JOP attacks, if the attack
affects the running program’s execution in a way that differs
from the predefined control flow, the attack is detected and
prevented. CFI enforcement mechanisms usually work in two phases:
first, in an off-line phase, the control flow graph (CFG) of
a binary is obtained, and then, in an on-line phase, any deviation
from this CFG is identified and considered an attack.
CCFI [26], CFIMon [36], CONVERSE [21], OCFI [27], and HCFI
[7] are some of the most prominent CFI systems presented
so far. Intel has recently presented Control-flow Enforcement
Technology (CET) [22], which provides shadow stack
and indirect branch tracking capabilities to counter ROP-like
attacks. Although the CFI concept seems flawless at
first glance, the implementations presented so far all
have shortcomings, as noted in recent research
that challenges the effectiveness of these schemes and
presents techniques to bypass these implementations
[14], [3], [8], [17]. It should be noted that each of these attack
techniques targets a specific implementation of the CFI concept
and not the CFI concept itself. Considering this background, a
practical protection system against different attack techniques
is still needed.
In this paper, we present a new protection scheme which
enforces CFI in a different way from existing mechanisms.
Our proposed work compares the current execution flow of the
protected process with the correct CFG, derived from the
off-line phase, considering a sequence of branches and not just
one at a time. This system can be deployed alongside other
protection schemes and does not need access to source code
in order to operate correctly. In this system, binaries are
neither modified nor instrumented.
III. SECURITY MODEL
Nowadays, various protection mechanisms are used in
operating systems. Canary-based protections, data execution
prevention, and randomization-based protections
are the most popular ones. When fewer
protection schemes are activated in a system, an attacker can take
control of the system more easily and the burden on the
activated protection schemes is heavier. In other words, there
is an inverse relationship between the number of protection
schemes and the number of security tasks each protection
should perform. Accordingly, if a security evaluation
of a system is made with only one protection scheme enabled, this
evaluation is more rigorous than evaluating the same
system with more protection schemes enabled. Of course,
the security scheme under evaluation
should be able to operate correctly when other protection
schemes are enabled as well, and the activity of other protection
systems should not interfere with the operation of the system
under evaluation.
Accordingly, in our security model, the SSP protection
module, which is used by the GCC compiler at compile time, has
been disabled. This protection module is solely used to detect
the overwriting of sensitive pointers like the saved EIP and has
no role in detecting or preventing code execution once the return
address is overwritten. Therefore, enabling or disabling this
module has no effect on the functionality of the proposed
system; disabling the SSP just makes it easier to
attack the system.
Data Execution Prevention and Address Space Layout
Randomization, on the other hand, are related to what happens
after an attacker takes control of the execution.
These protection schemes are not about stopping the attacker
from overwriting the return address, but about preventing the
attacker from executing his/her own code. Therefore, disabling
these schemes only decreases the complexity of attacks.
Nevertheless, to ensure correct functionality, we evaluate our
work in both situations. In other words, in our security model,
the functionality of the proposed system is checked in both the
presence and the absence of ASLR and DEP.
To summarize, in our security model the
SSP protection is disabled, and we evaluate our work with
ASLR and Exec-shield both enabled and disabled. Accordingly,
in our threat model, an attacker is able to overwrite a
sensitive pointer arbitrarily and is also able to execute his/her
own pieces of code in the program’s address space, either by
injecting the code directly (ASLR, exec-shield disabled) or
by using more advanced attack techniques like ROP (ASLR,
exec-shield enabled).
IV. OVERVIEW OF THE PROPOSED WORK
Like other CFI systems, the general structure of our pro-
posed system is that in an off-line phase, the control flow graph
of the protected binaries is created and in an on-line phase,
this CFG is enforced. In our proposed work the generation and
enforcement of the CFG are done using some kernel features
for each binary under system’s protection.
The CFG is enforced using the Kprobes facility in the
Linux kernel in conjunction with the LBR (Last Branch Record)
Model Specific Registers. Since the execution path of a process
is determined by the branch instructions it executes, we
built a kernel module to record the performed branches. In
other words, by monitoring all branch instructions executed in
a process, one can determine the valid execution flow graph.
Each branch instruction, regardless of its type, has a
specific source and destination address at execution time. In our
proposed work, the distance between the source and the
destination of a branch is used as an identifier for that branch.
In this system, we use the LBR model specific registers to
obtain the distance between the source and the destination
of a branch instruction. These registers, which come in 16
pairs, store, when configured properly, the source and destination
address of each user-space branch instruction executed in the
system in a ring buffer. Although it is possible to detect
the correct execution path of a process using these registers,
monitoring all branch instructions executed in a process incurs
high overhead. Therefore, in our proposed work, the contents
of the LBR registers are accessed only when a system call is made.
Almost all existing CFI systems work by computing
and analyzing the valid destinations for branch instructions.
In these systems, each analysis step considers a single branch
instruction: the valid destination addresses for that
instruction are identified and compared with the current
execution flow. In our proposed work, in contrast, each analysis
step considers the 16 most recently executed branch instructions,
and hence it is possible to check the integrity of the execution
flow over a period.
In this system, whenever a sensitive system call is executed,
the 16 branch instructions before this system call are identified,
the distance between the source and destination of each of these
branches is calculated, and then this table of 16 branch
distances is compared with the table received from the off-line
phase. Any mismatch between these two tables is treated as an
unauthorized attempt to redirect the control flow, and so the
execution of that process is stopped immediately. Since this
scheme uses the distance between the source and destination of
each branch and not the absolute addresses, it is possible to
deploy our proposed work in conjunction with other protection
schemes like ASLR, which changes the address at which the binary
is loaded in memory on each execution. Moreover,
because of a special design in our system, which we will discuss
later, the table of valid branch distances (TVBD) that is
computed on a specific OS version for a specific binary is
usable on other systems running the same binary on the same
OS.
Fig. 1. The overview of the proposed system: 1. The system is
triggered. 2. The current LBR contents are loaded. 3. The current
LBR contents are compared with the offline table. 4. The decision
to stop or continue the execution of the protected program is
made.
A general overview of our proposed work is depicted in
Fig. 1. As shown, the system is triggered by a sensitive
system call made from a protected application (1). At this
point, the detection module loads the current LBR contents (2)
and computes the distance between the destination and source
of each branch trace record in a specific way, which is
discussed later in the Offline Analysis section. After that, it
compares the resulting table of the 16 branches executed
just before the sensitive system call with the table derived from
the offline analysis (3). If the comparison of these two sets of
16 branch records reveals a violation, the system stops the
process immediately; otherwise, the execution of the
program continues (4).
V. ONLINE PHASE
In our implementation of the proposed work, the list of
applications which the CFI system should protect is passed
to the kernel module through a device file. In
kernel space, the installed kernel module processes this
list, and the CFI checks are enforced only for the applications
on it. The CFI enforcement
module is activated by each invocation of a sensitive system
call. It then checks the executed branch instructions and their
destination addresses and compares them with the table of
valid branch distances (TVBD) received from the off-line phase;
any difference between these two tables is considered an
attack. Therefore, the proposed system works in three main
phases. First, the system is configured so that any
invocation of a sensitive system call triggers the detection
module. Second, the branch instructions executed so far are
analyzed. Third, the decision is made about whether to
stop the execution or not.
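The three phases above can be illustrated with a minimal user-space sketch in Python. It is not the kernel module itself: the function name `on_sensitive_syscall`, the `Policy` layout, and the "continue"/"kill" return values are hypothetical names chosen for illustration, and the sketch assumes 4 KiB pages and branch records that are already ordered from oldest to newest.

```python
from typing import Dict, List, Sequence, Set, Tuple

PAGE_OFFSET_MASK = 0xFFF  # low 12 bits, assuming 4 KiB pages

# Hypothetical policy layout: (application, system call) -> accepted offset tables.
Policy = Dict[Tuple[str, str], List[Tuple[int, ...]]]


def on_sensitive_syscall(app: str, syscall: str,
                         branches: Sequence[Tuple[int, int]],
                         protected: Set[str], policy: Policy) -> str:
    """Return "continue" or "kill" for one intercepted system call.

    `branches` holds (source, destination) address pairs, oldest first.
    """
    # Phase 1: the check is only triggered for protected applications.
    if app not in protected:
        return "continue"
    # Phase 2: reduce the recorded branches to in-page offsets.
    observed = tuple((dst - src) & PAGE_OFFSET_MASK for src, dst in branches)
    # Phase 3: any mismatch with the offline tables is treated as an attack.
    if observed in policy.get((app, syscall), []):
        return "continue"
    return "kill"
```

In the described system the window always holds 16 branches; the sketch accepts any length so that it can be exercised with short inputs.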
A. Hooking sensitive system calls
In this stage, which runs just after the installation of the
proposed system, we use kernel probes to intercept the sensitive
system calls. Using these probes, it is possible to dynamically
insert breakpoints inside any desired kernel routine and
collect performance or debug information as needed. Before
the introduction of kernel probes, in kernel version 2.6 and
before, one would need to alter the sys_call_table array inside
the kernel to do this job. This array is now marked as read-only,
and with kernel probes it is possible
to intercept functions and routines without breaking the
integrity of the kernel.
Currently, there are three types of kernel probes available:
kprobe, jprobe, and kretprobe. In our implementation of the
proposed work, we use jprobes. A jprobe
can be set on the entry point of a kernel function, and
the arguments of the called function are accessible inside
the probe. Using jprobes, it is possible not only to intercept
sensitive system calls and perform CFI checks but also to
analyze the passed arguments as a means of detecting valid
execution flow.
In our proposed work, we use jprobes to intercept sensitive
system calls like exec, fork, and so on. It is also possible
to set a jprobe at other points in the system; for
example, in systems using the sysenter mechanism, one can
set a jprobe at the start of sysenter_do_call and identify
the called function by examining the arguments passed to
it. In either case, the branch
instructions executed before the invocation of a sensitive system
call in the protected application are identified and compared
with the TVBD received from the off-line phase.
B. Tracking the branches
In our proposed work, the LBR model specific registers are
used to analyze the branch instructions executed in the running
process. The LBR registers are 16 pairs of MSRs which
are available in Intel processors from the Nehalem
microarchitecture onwards. When a branch instruction is executed
on a processor with LBR enabled, the source and the destination
address of the branch instruction are stored in one of the 16 LBR
register pairs. Since this job is performed by the hardware,
it adds no overhead to the system.
When more than 16 branch instructions have been executed,
the old LBR contents are overwritten in ring-buffer order,
oldest record first. Hence, there must always
be an index pointing to the most recently filled LBR register.
This pointer is called TOS (top of stack). In this way, it is
possible at any time to identify the last executed branch
instruction by examining the LBR TOS register. It is also
possible to restrict the LBR facility to recording only
user-space branches, in order to use these 16
registers more economically.
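The ring-buffer behaviour can be made concrete with a small Python model. This is a sketch only: the function name `chronological` is hypothetical, the records are plain (source, destination) tuples rather than MSR reads, and it assumes the buffer has wrapped at least once so that all 16 slots hold valid records.

```python
from typing import List, Sequence, Tuple

LBR_DEPTH = 16  # number of LBR register pairs described in the text


def chronological(lbr: Sequence[Tuple[int, int]], tos: int) -> List[Tuple[int, int]]:
    """Return the LBR records ordered from oldest to newest.

    `lbr[i]` is the (source, destination) pair stored in slot i and
    `tos` is the slot holding the most recent branch.
    """
    if len(lbr) != LBR_DEPTH or not 0 <= tos < LBR_DEPTH:
        raise ValueError("expected 16 slots and a TOS index in 0..15")
    # The slot after TOS is the next one to be overwritten, hence the oldest.
    return [lbr[(tos + 1 + i) % LBR_DEPTH] for i in range(LBR_DEPTH)]
```

Reading the slots in this order yields the window of 16 branches in the sequence in which they were executed, with the last element being the record that TOS points to.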
Accordingly, in our implementation of the proposed work,
after enabling LBR in the processor and using its filtering
facility, the source and the destination addresses of executed
user-space branches are accessible through these registers.
Therefore, after the interception of sensitive system calls,
whenever a jprobe is activated, the contents of the LBR registers
are read and sorted by time of execution. In this way,
the 16 branch instructions which executed
just before the sensitive system call are analyzed and compared
with the TVBD received from the off-line phase.
C. Enforcing the CFI
The 16 records of saved branch instructions
received from the off-line phase are compared with the 16 records
of branch instructions executed in the current process. If any
mismatch is found between these two tables, the execution of the
current process is interrupted; otherwise, if the two tables
are the same, the execution proceeds.
In our implementation of the proposed work, we use signals
to stop the running process. If any violation of the CFG is
detected, a kill signal is sent to the running process
immediately, causing the protected application to stop
forcefully. Although more appropriate actions could be taken when
an attack is identified, we chose signals for the sake of
simplicity. In case of normal behavior, i.e., when
the TVBD and the branch instructions executed in the
running process agree, the normal execution flow continues
by calling jprobe_return.
It should be noted that this CFI system uses a table
of the 16 latest branch instructions executed in the running
process just before the sensitive system call to identify an
attack. In existing CFI protection mechanisms, by contrast,
each branch instruction is handled separately by checking the
destination of that particular branch instruction against a list
of valid destinations. Using a table of 16 branch instructions
instead of just one branch instruction at a time improves
the security of our CFI system and stops many
known attack techniques, as we discuss in the
evaluation section of this paper.
VI. OFFLINE ANALYSIS
In this phase, generating the Control Flow Graph is the main
operation. Many techniques have been introduced in
existing CFI systems to compute the CFG, and any of
them can be used here. Some
existing CFI systems use static analysis of binary files to
generate the CFG, and others use dynamic analysis and
emulated runs to do so.
The most challenging task in generating the CFG, in most
CFI systems, is handling indirect branches.
In these systems, each indirect branch instruction is handled
separately and the possible destinations for that specific branch
are identified. Any failure to identify these
valid destinations correctly, or an overly wide range of possible
destinations, makes these systems vulnerable to advanced attacks
like ROP. That is because increasing the number of valid
destinations for a specific branch instruction, or identifying an
invalid address as a valid destination, makes it
possible for a malicious branch instruction belonging
to a ROP gadget to be considered a valid branch, which
increases the attacker’s chance of success.
To avoid that, in our proposed work a branch instruction
is not handled separately. This means that even if an indirect
branch instruction in the current execution of the protected
binary is executed exactly according to the CFG, it is not
yet considered a valid branch. A valid branch in our
proposed work is one which conforms to the
CFG, belongs to a set of 16 branches
none of which violates the CFG,
and appears in a sequence of executed branch instructions
that is exactly the same
as the TVBD. Moreover, in our proposed work,
instead of using fixed destination addresses to identify
branch instructions, we use the distance between the source
and the destination of a branch as a characteristic of that
branch instruction. Therefore, for a branch instruction to be
considered valid, three conditions must be met:
1) The distance between the source and the destination of
the branch should conform to the CFG.
2) In the set of 16 branches which the current branch is
part of, all 16 branches should conform to the CFG.
3) The sequence of these 16 executed branch instructions
till now, should be exactly the same as the sequence of
the entries in the TVBD.
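The three conditions can be written down as a small Python sketch. The function name `first_violation` and the `known_offsets` set are hypothetical; the sketch also assumes that "the current branch" is the newest entry of the window and that offsets are already reduced to in-page values.

```python
from typing import Optional, Sequence, Set

WINDOW = 16  # branches examined per check, as in the text


def first_violation(observed: Sequence[int], table: Sequence[int],
                    known_offsets: Set[int]) -> Optional[int]:
    """Return the number of the first failed condition, or None if valid.

    `observed` and `table` are sequences of branch offsets, oldest first;
    `known_offsets` is every offset that appears in the offline CFG data.
    """
    if len(observed) != WINDOW or len(table) != WINDOW:
        raise ValueError("both windows must hold exactly 16 offsets")
    # Condition 1: the current (newest) branch conforms to the CFG.
    if observed[-1] not in known_offsets:
        return 1
    # Condition 2: every branch in the window conforms to the CFG.
    if any(offset not in known_offsets for offset in observed):
        return 2
    # Condition 3: the order matches the TVBD entry exactly.
    if list(observed) != list(table):
        return 3
    return None
```

Note that condition 3 alone implies the other two when the table only contains valid offsets; they are kept separate here to mirror the list above and to show that a window of individually valid branches in the wrong order is still rejected.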
In our implementation of the proposed work, all three
conditions are checked in the on-line phase by comparing the
table of 16 valid branch distances received from the off-line
phase with the 16 branch distances observed in the running
process. Although any desired method can be used to generate
the CFG, in our implementation we use emulation. In this
method, we run the binary in an isolated system and fill the
TVBD for this binary under different execution scenarios for a
while. As a result, a table of 16 valid distances is available for
each sensitive system call executed in the program code. This
table is available for each protected binary separately. It
should be noted that the tables of valid distances produced
for a specific application on a particular operating system are
usable on other systems running the same application on the
same version of the operating system. This is because of the
way we construct the table of valid branch distances,
which takes the PAGE_SHIFT concept into account.
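How the recorded runs could be collected into a TVBD is sketched below in Python. The data layout is an assumption made for illustration: the text does not specify how tables from different execution scenarios are stored, so the sketch keeps a set of 16-offset windows per system call name, and `build_tvbd` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, Iterable, Sequence, Set, Tuple

WINDOW = 16  # offsets recorded before each sensitive system call


def build_tvbd(traces: Iterable[Tuple[str, Sequence[int]]]
               ) -> Dict[str, Set[Tuple[int, ...]]]:
    """Collect the offset windows seen before each sensitive system call.

    Each trace item is (syscall_name, offsets), where `offsets` are the
    16 in-page branch offsets recorded just before that call.
    """
    tvbd = defaultdict(set)
    for syscall, offsets in traces:
        if len(offsets) != WINDOW:
            raise ValueError("each window must hold exactly 16 offsets")
        # Identical windows from repeated runs collapse into one entry.
        tvbd[syscall].add(tuple(offsets))
    return dict(tvbd)
```

Because the entries are in-page offsets rather than absolute addresses, a table built this way on one machine does not depend on where ASLR placed the pages during the recording runs.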
A. PAGE_SHIFT
In the Linux kernel paging operation, each address
is made up of two parts. The most significant part is
a pointer to the whole page, and the least significant part is
an offset inside the page. In x86 systems, for example, where
the page size is set to 4 kilobytes (4096 bytes), 12 bits are
needed to address all entries of a page. Therefore, in these
systems, the 12 least significant bits of each address
are the offset inside the page. Accordingly,
by shifting an address right by 12 bits, the part of
the address which is responsible for indexing inside the page
is discarded and the remaining part, the most significant part, is
the address of the page itself.
TABLE I
THE PAGE SIZE FOR THE DIFFERENT ARCHITECTURES IN THE LINUX
KERNEL
Architecture  PAGE_SIZE                  PAGE_SHIFT
x86           4096                       12
Alpha         8192                       13
ARM           4096                       12
AVR32         4096                       12
IA64          4096, 8192, 16384, 65536   12, 13, 14, 16
M68k          4096, 8192                 12, 13
Sparc         4096                       12
This number of shifted bits, which is 12 in x86 systems,
is known as PAGE_SHIFT in the Linux kernel.
At the time of writing, the size of each page,
which is named PAGE_SIZE in the kernel, is calculated from
the PAGE_SHIFT value. The PAGE_ALIGN macro inside the
kernel uses this calculated size. The page sizes in the
Linux kernel for different architectures are listed in Table I.
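The relationship between these three kernel names can be mirrored in a few lines of Python. This is a user-space illustration of the arithmetic for the x86 values, not kernel code; the helper names `page_number`, `page_offset`, and `page_align` are chosen for the sketch.

```python
PAGE_SHIFT = 12                  # x86 value from Table I
PAGE_SIZE = 1 << PAGE_SHIFT      # 4096, derived from PAGE_SHIFT
PAGE_MASK = ~(PAGE_SIZE - 1)     # clears the in-page offset bits


def page_number(addr: int) -> int:
    """Most significant part: which page the address belongs to."""
    return addr >> PAGE_SHIFT


def page_offset(addr: int) -> int:
    """Least significant 12 bits: position inside the page."""
    return addr & (PAGE_SIZE - 1)


def page_align(addr: int) -> int:
    """Round up to the next page boundary, as the PAGE_ALIGN macro does."""
    return (addr + PAGE_SIZE - 1) & PAGE_MASK
```

For example, the address 0x08048abc splits into page number 0x08048 and offset 0xabc, and aligning it yields 0x08049000, whose in-page offset is zero.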
In the ASLR system, on the other hand, after the randomization
process, the generated address is aligned to
the page size, so randomization is applied
to the address of each page, not inside the page. In Linux
systems, this alignment is done after the randomization process
(i.e. after get_random_int) through the PAGE_ALIGN macro
inside the kernel, and therefore the generated random value is
aligned to a page boundary.
B. Calculation of the TVBD
In our implementation of the proposed work on a Linux system,
considering the above, we do not use the address of the page
itself to produce the valid offsets. Instead, we use the offset
inside the page to calculate the distance between the source
and the destination of a branch. In other words, the valid
distance recorded for each branch instruction fits in 12 bits
(0x000 to 0xFFF), and its exact value is recorded during the
off-line phase. To do so, after extracting the addresses from
the LBR registers, we subtract the source address from the
destination address and keep the three least significant
hexadecimal digits of the result as the valid offset. After the
off-line phase, we therefore have sets of 16 offsets for each
sensitive system call executed in the protected application.
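
A minimal Python sketch of this calculation follows. It assumes 32-bit addresses with wraparound subtraction, which is how the distances in Table III appear to be computed; the function names are illustrative and not taken from the kernel module. The address pairs are rows 1 and 5 of Table III.

```python
OFFSET_MASK = 0xFFF          # three least significant hex digits (12 bits)
WORD_MASK = 0xFFFFFFFF       # assumption: 32-bit addresses, as in Table III


def branch_distance(src: int, dst: int) -> int:
    """Destination minus source, wrapped to 32 bits."""
    return (dst - src) & WORD_MASK


def branch_offset(src: int, dst: int) -> int:
    """In-page part of the distance: its 12 least significant bits."""
    return branch_distance(src, dst) & OFFSET_MASK


def offsets_for_syscall(lbr_records):
    """Turn (source, destination) pairs read from the LBR into a tuple of offsets."""
    return tuple(branch_offset(src, dst) for src, dst in lbr_records)


# Rows 1 and 5 of Table III, system A.
records_a = [(0xB7631445, 0xB7740414), (0xB775550B, 0xB76313A0)]
# The same rows on system B.
records_b = [(0xB76CE445, 0xB77DD414), (0xB77F250B, 0xB76CE3A0)]

print([hex(o) for o in offsets_for_syscall(records_a)])
print([hex(o) for o in offsets_for_syscall(records_b)])
```

A backward branch produces a large wrapped distance (0xffedbe95 for row 5), but its low 12 bits are still stable across page-aligned relocations.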
Using this method, the proposed work can be deployed alongside
other protection schemes such as ASLR. It is also possible to
calculate the TVBD once on an operating system and use that
TVBD for the same application running on different machines
with the same operating system.
Fig. 2. Replacement of the pages does not affect the offset inside the page,
in two different runs.
That is because randomization in the current system is done
per memory page and not inside the pages; hence the offsets
inside the pages remain the same, and so does the calculated
TVBD. It is possible to construct a central system that
collects the TVBDs calculated on other systems for different
applications and operating system versions and stores them in
a database. This central system can then give the proper table
to other systems on demand, according to the OS-application
combination of each system. Therefore, each system could have
an up-to-date database of TVBDs for the applications it is
protecting, without needing to calculate these offset tables
itself. In other words, the CFG is calculated once on one
system and the resulting tables are used on other systems as
well.
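
One way such a central system could be organized is sketched below in Python. This is only an illustration under stated assumptions: the class TvbdStore, its methods, and the identifiers "example-os-1.0", "bet" and "nginx" are hypothetical, and a real deployment would need persistent storage and authenticated distribution. The fork entry reuses the 16 offsets of Table III in table order.

```python
from typing import Dict, FrozenSet, Optional, Tuple

OffsetSet = Tuple[int, ...]                 # in-page offsets of up to 16 branches
Tvbd = Dict[str, FrozenSet[OffsetSet]]      # system call name -> valid offset sets


class TvbdStore:
    """Hypothetical in-memory stand-in for the central TVBD database."""

    def __init__(self) -> None:
        self._tables: Dict[Tuple[str, str], Tvbd] = {}

    def publish(self, os_id: str, app_id: str, tvbd: Tvbd) -> None:
        """Store the table computed once on a base system."""
        self._tables[(os_id, app_id)] = tvbd

    def fetch(self, os_id: str, app_id: str) -> Optional[Tvbd]:
        """Return the table for an OS-application combination, if any."""
        return self._tables.get((os_id, app_id))


# Offsets of Table III, in table order.
FORK_OFFSETS: OffsetSet = (
    0xFCF, 0x795, 0x962, 0x335, 0xE95, 0xA17, 0x0C5, 0xF3D,
    0x26C, 0xDD8, 0x269, 0xCE7, 0xDFA, 0x208, 0x003, 0xD54,
)

store = TvbdStore()
store.publish("example-os-1.0", "bet", {"fork": frozenset({FORK_OFFSETS})})

table = store.fetch("example-os-1.0", "bet")
print(sorted(table))
print(store.fetch("example-os-1.0", "nginx"))
```

Keying the table by the OS-application pair reflects the condition stated above: a TVBD is only reusable for the same application on the same operating system.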
To demonstrate this, we executed an altered version of the BET
program, introduced in [3], under two different conditions.
First, we executed BET 110 times on a single system and
observed the distance between the source and the destination
addresses of the executed branch instructions in every
execution. Because the location of the loaded memory pages
differs on almost every execution (mostly because of ASLR), we
got 86 different tables of distances. Afterwards, we extracted
the 3 least significant hexadecimal digits of each recorded
distance, which are the offset inside the page, and observed
that these 3 digits are always the same. The resulting
distances of 10 of the 110 executions are listed in Table II.
As the table shows, the distance between the source and the
destination of the branch instructions varies between
executions, but the last 3 digits are always the same;
therefore they can be used as a measure to form the CFG for
each program.
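
The observation can be re-checked on the data of Table II with a short Python sketch. Only the two distances that change from run to run are listed here (the other fourteen are identical in all ten runs); the variable names are illustrative.

```python
OFFSET_MASK = 0xFFF  # three least significant hex digits

# The two run-dependent distances of Table II, runs 1 to 10.
run_distances = [
    (0x09FC7319, 0x09F835E9),
    (0x09FC8319, 0x09F845E9),
    (0x09FCA319, 0x09F865E9),
    (0x09FCF319, 0x09F8B5E9),
    (0x09FD2319, 0x09F8E5E9),
    (0x09FD6319, 0x09F925E9),
    (0x09FD7319, 0x09F935E9),
    (0x09FD8319, 0x09F945E9),
    (0x09FD9319, 0x09F955E9),
    (0x09FE1319, 0x09F9D5E9),
]


def offsets(distances):
    """Keep only the in-page part of every distance."""
    return tuple(d & OFFSET_MASK for d in distances)


distinct_distances = set(run_distances)
distinct_offsets = {offsets(row) for row in run_distances}

print(len(distinct_distances))
print([tuple(hex(o) for o in row) for row in distinct_offsets])
```

All ten runs yield different distances, yet they collapse to a single pair of offsets once the page part is masked away.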
Secondly, we executed the BET program 10 times on two separate
ASLR-enabled machines and recorded the distance between the
source and the destination of each of the 16 branch
instructions before a specific sensitive system call (fork in
this example). The resulting addresses and offsets are listed
in Table III.
TABLE II
THE RESULTING DISTANCE TABLE OF 10 DIFFERENT RUNS ON A SINGLE
MACHINE
Index  Resulting distances
1      0x0010efcf 0x00043d54 0x00000003 0x000b4df8
       0x000b4dfa 0x09fc7319 0x00043d97 0x00000228
       0x0000026c 0x000710c3 0x000710c5 0x09f835e9
       0x0012416b 0x00006335 0x00004962 0x00000795
2      0x0010efcf 0x00043d54 0x00000003 0x000b4df8
       0x000b4dfa 0x09fc8319 0x00043d97 0x00000228
       0x0000026c 0x000710c3 0x000710c5 0x09f845e9
       0x0012416b 0x00006335 0x00004962 0x00000795
3      0x0010efcf 0x00043d54 0x00000003 0x000b4df8
       0x000b4dfa 0x09fca319 0x00043d97 0x00000228
       0x0000026c 0x000710c3 0x000710c5 0x09f865e9
       0x0012416b 0x00006335 0x00004962 0x00000795
4      0x0010efcf 0x00043d54 0x00000003 0x000b4df8
       0x000b4dfa 0x09fcf319 0x00043d97 0x00000228
       0x0000026c 0x000710c3 0x000710c5 0x09f8b5e9
       0x0012416b 0x00006335 0x00004962 0x00000795
5      0x0010efcf 0x00043d54 0x00000003 0x000b4df8
       0x000b4dfa 0x09fd2319 0x00043d97 0x00000228
       0x0000026c 0x000710c3 0x000710c5 0x09f8e5e9
       0x0012416b 0x00006335 0x00004962 0x00000795
6      0x0010efcf 0x00043d54 0x00000003 0x000b4df8
       0x000b4dfa 0x09fd6319 0x00043d97 0x00000228
       0x0000026c 0x000710c3 0x000710c5 0x09f925e9
       0x0012416b 0x00006335 0x00004962 0x00000795
7      0x0010efcf 0x00043d54 0x00000003 0x000b4df8
       0x000b4dfa 0x09fd7319 0x00043d97 0x00000228
       0x0000026c 0x000710c3 0x000710c5 0x09f935e9
       0x0012416b 0x00006335 0x00004962 0x00000795
8      0x0010efcf 0x00043d54 0x00000003 0x000b4df8
       0x000b4dfa 0x09fd8319 0x00043d97 0x00000228
       0x0000026c 0x000710c3 0x000710c5 0x09f945e9
       0x0012416b 0x00006335 0x00004962 0x00000795
9      0x0010efcf 0x00043d54 0x00000003 0x000b4df8
       0x000b4dfa 0x09fd9319 0x00043d97 0x00000228
       0x0000026c 0x000710c3 0x000710c5 0x09f955e9
       0x0012416b 0x00006335 0x00004962 0x00000795
10     0x0010efcf 0x00043d54 0x00000003 0x000b4df8
       0x000b4dfa 0x09fe1319 0x00043d97 0x00000228
       0x0000026c 0x000710c3 0x000710c5 0x09f9d5e9
       0x0012416b 0x00006335 0x00004962 0x00000795
Offsets inside pages (always the same, in the order of the
distances above):
       0xfcf 0xd54 0x003 0xdf8
       0xdfa 0x319 0xd97 0x228
       0x26c 0x0c3 0x0c5 0x5e9
       0x16b 0x335 0x962 0x795
TABLE III
THE RESULTING OFFSETS OF THE LAST 16 BRANCH INSTRUCTIONS ON
SYSTEM A (UPPER RECORD) VS SYSTEM B (LOWER RECORD)
Index  System  Source      Destination  Dst-Src     Offset
               address     address      distance    in page
1      A       0xb7631445  0xb7740414   0x0010efcf  0xfcf
       B       0xb76ce445  0xb77dd414   0x0010efcf
2      A       0xb774a015  0xb774a7aa   0x00000795  0x795
       B       0xb77e7015  0xb77e77aa   0x00000795
3      A       0xb774a81f  0xb774f181   0x00004962  0x962
       B       0xb77e781f  0xb77ec181   0x00004962
4      A       0xb774f1cb  0xb7755500   0x00006335  0x335
       B       0xb77ec1cb  0xb77f2500   0x00006335
5      A       0xb775550b  0xb76313a0   0xffedbe95  0xe95
       B       0xb77f250b  0xb76ce3a0   0xffedbe95
6      A       0xc1652989  0xb76313a0   0xf5fdea17  0xa17
       B       0xc1652989  0xb76ce3a0   0xf607ba17
7      A       0xb76313a6  0xb76a246b   0x000710c5  0x0c5
       B       0xb76ce3a6  0xb773f46b   0x000710c5
8      A       0xb76a246e  0xb76313ab   0xfff8ef3d  0xf3d
       B       0xb773f46e  0xb76ce3ab   0xfff8ef3d
9      A       0xb76313bc  0xb7631628   0x0000026c  0x26c
       B       0xb76ce3bc  0xb76ce628   0x0000026c
10     A       0xb763162f  0xb7631407   0xfffffdd8  0xdd8
       B       0xb76ce62f  0xb76ce407   0xfffffdd8
11     A       0xb7631407  0xb75ed670   0xfffbc269  0x269
       B       0xb76ce407  0xb768a670   0xfffbc269
12     A       0xc1652989  0xb75ed670   0xf5f9ace7  0xce7
       B       0xc1652989  0xb768a670   0xf6037ce7
13     A       0xb75ed671  0xb76a246b   0x000b4dfa  0xdfa
       B       0xb768a671  0xb773f46b   0x000b4dfa
14     A       0xb76a246e  0xb75ed676   0xfff4b208  0x208
       B       0xb773f46e  0xb768a676   0xfff4b208
15     A       0xb75ed69a  0xb75ed69d   0x00000003  0x003
       B       0xb768a69a  0xb768a69d   0x00000003
16     A       0xb75ed6b8  0xb763140c   0x00043d54  0xd54
       B       0xb768a6b8  0xb76ce40c   0x00043d54
As listed in Table III, the offsets inside the pages in two
different executions of the BET program on two separate
machines are identical, and hence result in the same TVBD on
both machines. Therefore, we can calculate the TVBD for each
binary on a specific operating system once on a base system and
use it on other machines running the same combination of
application and operating system. The two systems used in this
evaluation are A: Lenovo ThinkPad T420, and B: HP Pavilion g6.
C. Static analysis and various execution states
Although we used emulated runs to construct the table of valid
branch distances, it is also possible to draw the CFG by static
analysis of the binaries. Since we check sets of 16 branches at
once, one may ask how conditional and indirect branches are
handled in the static analysis.
To answer this question, note that if static analysis handles
each instruction separately, the only way to identify a threat
is to build a table of valid destinations for each individual
branch. In other words, constructing the table of valid
distances is the only solution. When an indirect branch is
analyzed among the other branches in a set, however, the
position of the executed branch within the set can be used as
an extra characteristic of that branch.
This means that if, in the set of 16 valid branch instructions
just before a sensitive system call obtained in the off-line
phase, an indirect branch is located at the eleventh entry,
then in the actual execution of the application this particular
indirect branch must be located at exactly the same position
among the 16 branches before the same sensitive system call.
This kind of analysis can prevent attacks such as ROP. In these
attacks, the attacker constructs a gadget chain and, to bypass
detection mechanisms, uses indirect branches that may take
multiple destinations; with these branch instructions the
attacker can direct the execution flow anywhere. When branch
instructions are checked separately, the only way to detect the
threat is to check the destination. When an indirect branch is
checked together with the other 15 branch instructions, not
only its destination but also its position in the set can be
used to detect the violation.
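
The positional check described above can be sketched in Python as follows. This is an illustration under assumptions, not the kernel module's code: the policy is modelled as one set of allowed offsets per position, the eleventh entry is treated as an indirect branch, and its second allowed offset 0x3a0 is hypothetical. The recorded offsets are those of Table III in table order.

```python
from typing import Sequence, Set


def matches_policy(observed: Sequence[int],
                   allowed_by_position: Sequence[Set[int]]) -> bool:
    """True only if every observed offset is allowed at its own position."""
    if len(observed) != len(allowed_by_position):
        return False
    return all(off in allowed
               for off, allowed in zip(observed, allowed_by_position))


# Offsets of Table III, in table order.
RECORDED = [0xFCF, 0x795, 0x962, 0x335, 0xE95, 0xA17, 0x0C5, 0xF3D,
            0x26C, 0xDD8, 0x269, 0xCE7, 0xDFA, 0x208, 0x003, 0xD54]

# One allowed set per position; the eleventh entry (index 10) is treated
# as an indirect branch with a hypothetical second valid offset.
policy = [{off} for off in RECORDED]
policy[10] = {0x269, 0x3A0}

# A run in which the indirect branch takes its other valid destination.
legit = list(RECORDED)
legit[10] = 0x3A0

# A run in which a valid offset shows up at the wrong position.
misplaced = list(RECORDED)
misplaced[3], misplaced[10] = misplaced[10], misplaced[3]

print(matches_policy(RECORDED, policy))
print(matches_policy(legit, policy))
print(matches_policy(misplaced, policy))
```

The last sequence contains only offsets that are individually valid somewhere in the set, but it is rejected because they do not appear at their recorded positions.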
Another question that may arise is that, from a high-level
view, an application may have various execution flow graphs, so
how is it possible to construct the set of 16 branches before a
sensitive call? To answer this question, note that “various
execution flow graphs” is a high-level term. At the low level,
the only way an application can have various execution flows is
through conditional branch instructions. Each conditional
branch instruction has two valid destinations, one of which is
chosen at execution time according to the condition.
Furthermore, our proposed work does not handle all of the
branch instructions executed in an application; we only
consider the 16 branch instructions before a sensitive system
call. Hence, in the worst case, there would be 16 conditional
branch instructions to analyze (i.e., 2^16 different states).
However, a situation in which all 16 branch instructions before
a sensitive system call are conditional is highly unlikely. An
analysis of the Apache web server indicates that, on average,
there are only 6 conditional branches in the set of 16 branches
before sensitive system calls. That means 2^6 = 64 different
valid states in the TVBD for each system call, which is
completely feasible to check at run-time. Therefore, static
binary analysis can also be used to construct the TVBD and draw
the CFG in the off-line phase.
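
The state count above can be reproduced with a small Python sketch. It treats every conditional slot as an independent two-way choice, which is the same simplification the 2^k estimate makes; all offsets are hypothetical placeholders, and the helper names are illustrative.

```python
from itertools import product


def make_slots(n_conditional: int, n_total: int = 16):
    """Build n_total slots; the first n_conditional have two valid offsets."""
    slots = []
    for i in range(n_total):
        if i < n_conditional:
            slots.append((0x100 + i, 0x200 + i))   # hypothetical: two outcomes
        else:
            slots.append((0x300 + i,))             # hypothetical: single offset
    return slots


def expand_states(slots):
    """Enumerate every valid set of offsets implied by the slots."""
    return [tuple(choice) for choice in product(*slots)]


average_case = expand_states(make_slots(6))
worst_case_count = sum(1 for _ in product(*make_slots(16)))

print(len(average_case))
print(worst_case_count)
```

With 6 conditional slots the table holds 64 candidate sets of 16 offsets each, while the worst case of 16 conditional slots yields 65536.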
VII. EVALUATION
To evaluate our proposed work, we implemented this system as a
kernel module on an Ubuntu 14.04 LTS box with an Intel Core i7
CPU, and analyzed both its security against different types of
attacks and the performance overhead it incurs on the system.
TABLE IV
THE EVALUATION OF THE SYSTEM’S SECURITY IN DETECTING VARIOUS
TYPES OF ATTACKS.
Attack type          DEP status  ASLR status  Result
shellcode injection  OFF         OFF          Prevented
return into libc     ON          OFF          Prevented
ROP                  ON          ON           Prevented
ROP                  OFF         OFF          Prevented
TABLE V
THE PROPOSED SYSTEM’S EFFECTIVENESS AGAINST REAL WORLD
EXPLOITS.
Application  EDB/CVE id     Result
unrar        EDB-ID 17611   Prevented
nginx        CVE-2013-2028  Prevented
A. Security
To evaluate the security of this system, we first used a
modified version of the BET program and exploited it using
shellcode injection, return-into-libc, and ROP techniques. To
check the functionality of our proposed work, we repeated these
attacks with different combinations of the ASLR and DEP
protections; the results are listed in Table IV. In the case of
the ROP attack, we also used knowledge of the process address
space to bypass ASLR, and the proposed system still prevented
the attack.
Then, to check the reliability of our proposed CFI system
against real-world attacks, we used two publicly available
exploits against the nginx and unrar applications. Both were
detected and prevented by the system, as listed in Table V.
B. Performance
To evaluate the performance overhead incurred by the proposed
system, we measured the execution time of the BET program with
and without the protection system, using valgrind. The results
show that in a worst-case scenario, in which the matching entry
is the last one in the TVBD and the table contains 128
different sets of 16 valid branches, the incurred overhead is
negligible (less than 1%).
We also examined the performance of our system in a real-world
scenario. To do so, we measured the performance of the nginx
web server in terms of the mean response time per request for
different numbers of connection requests per second, using the
apachebench tool. First, we ran apachebench against the web
server on a bare system, and then we repeated the test in the
presence of our proposed work. The result, depicted in Fig. 3,
shows that the performance in both cases is almost identical.
Fig. 3. Performance overhead of the system in nginx web server.
(Plot: mean response time per request (ms), axis ticks 10 to 20,
against number of requests per second, axis ticks 200 to 600;
series: W/O CFI system, W CFI system.)
VIII. RELATED WORKS
Much research has been done to date to propose an efficient CFI
system, and the results of some of these efforts are known to
be practical to some extent. However, each solution carries
some limitations. Some of these solutions require special
hardware or binary instrumentation, others need access to the
program’s source code in order to protect it, and many of them
are not able to operate beside other protection mechanisms
like ASLR.
RAGuard [38], SOFIA [10], HAFIX [12], CONVERSE [21], and the
system proposed in [30] are protection mechanisms that need
special hardware support, such as a customized CPU instruction
set, to operate. Intel has recently presented a hardware
facility named CET to provide CFI at the hardware level, but it
has not been implemented in its processors yet.
Some other protection systems, such as CCFIR [37], S-D CFI
[25], Lockdown [29], O-CFI [27], and the work proposed in [40],
use binary rewriting, instrumentation, and internal hooks to
enforce the CFI policy. These mechanisms perform extra checks
before indirect branches and validate the execution path of the
protected application. To do that, they usually inject
instructions into the binary of the application. The
instrumentation technique has been used in the kernel itself as
well [9].
Another method used to enforce CFI is to change the compiler or
the protected application’s source code and recompile it. In
this method, the compiler or the program’s source code is
modified so that more security checks are performed at run-time.
Obviously, systems based on this approach need access to the
program’s source code as well as recompilation. CCFI [26] and
the work proposed in [32] are of this type.
TABLE VI
COMPARISON OF DIFFERENT CFI SYSTEMS, BASED ON THEIR
REQUIREMENTS
CFI system     Release  Specific  Recompilation  Binary
               date     hardware                 alteration
CFIMon         2012     -         -              -
CCFIR          2013     -         -              *
CONVERSE       2014     *         -              -
Tice et al.    2014     -         *              -
S-D CFI        2014     -         -              *
Lockdown       2015     -         -              *
O-CFI          2015     -         -              *
CCFI           2015     -         *              -
HAFIX          2015     *         -              -
HCFI           2016     *         -              -
RAGuard        2017     *         -              -
PT-CFI         2017     -         -              -
Proposed work  2017     -         -              -
The above-mentioned approaches and other protection systems
like [35], [33], [31], [11], [18], and [24] are some of the
endeavours to propose an effective, practical CFI system.
Another classification of CFI systems is to categorize them
into fine-grained and coarse-grained systems. While most of the
systems presented so far fall into these classes, our proposed
work takes a third approach, which we call the semi-holistic
approach: indirect branches are not analyzed individually, but
are validated in sets of 16 branch instructions at a time. Doing
so increases the accuracy of the system, while the use of
jprobes in the Linux kernel decreases the performance overhead
and allows monitoring the general behavior of programs. Based
on the above, a comparison of various CFI systems and our
proposed work is presented in Table VI.
As shown in the table, our proposed work, as well as CFIMon and
PT-CFI, does not have any special requirements to operate.
However, there are some differences between our proposed work
and those two systems. In CFIMon, the BTS registers are used to
track the branch instructions and detect violations of the
valid execution path. The incurred overhead is reported to be
6%. This CFI system works in two phases. In the offline phase, a
set of valid destinations for each branch instruction is
collected using static analysis of binaries, and in the online
phase the executed branches are monitored using the BTS
registers and checked against specific rules. In the offline
phase of this system, the call set contains the addresses of
all instructions at the beginning of functions, and the ret set
contains the addresses of the instructions just after call
instructions in the program. The valid destinations for
indirect branch instructions are stored in the train set, using
a learning mechanism.
At runtime, this CFI system checks every branch instruction
against these sets. Because this system uses absolute addresses
for branch destinations, it cannot be used beside randomization
mechanisms like ASLR. Processing all branch instructions in a
program, on the other hand, incurs heavy overhead, and because
each branch instruction is validated separately in this CFI
mechanism, an attacker may exploit the ret set by using
call-preceded gadgets in a ROP attack and bypass the system.
In the PT-CFI system, a recently introduced Intel processor
facility named PT (Processor Trace) is used to check control
flow integrity. Using PT, as mentioned in [20], could generate
up to hundreds of megabytes of information per second for each
processing core. Moreover, because processing all PT packets
performs poorly, the PT-CFI system only uses TIP (Target IP)
packets to detect violations of the valid execution flow; when
a violation is detected, further analysis of the PT
information, which the authors call deep inspection, takes
place. Although PT-CFI is similar to our proposed work in
general, the techniques used there are different. In addition,
the ability to use PT-CFI beside other protection mechanisms
has not been studied yet.
IX. CONCLUSION
Although various CFI systems have been presented to date, these
systems validate executed branch instructions separately. This
approach lacks a holistic view of the program’s execution flow.
Besides, there is still a need for a practical CFI system that
acts accurately without requiring the program’s source code,
special hardware, binary alteration, or compiler modification,
and that is able to operate beside other protection mechanisms.
In this paper, we proposed a new CFI system which addresses the
above-mentioned requirements. This CFI system is able to operate
alongside other protection schemes like DEP and ASLR. In our
proposed work, the computation of the CFG is done once on one
system, and the resulting policy is usable on other systems
running the same OS/APP as well, using the PAGE_SHIFT concept in
the Linux kernel. We also used the LBR model-specific registers
to compute the distance between the source and the destination
of the executed branch instructions. To get closer to a holistic
view of the program’s execution flow, at each act of analysis
we study the characteristics of the executed branch
instructions, as the only means to direct the execution flow, in
sets of 16 branches just before each sensitive system call; in
this way, we are able to detect violations of the CFG in the
program’s execution flow. With the kprobe facility of the Linux
kernel, we are able to enforce the CFG with a negligible
performance overhead.
To evaluate our proposed work, we implemented this system as a
single LKM (Linux Kernel Module), interacting with user space
to get the CFG, on an Ubuntu 14.04 LTS box. The results of our
evaluation show that the proposed work is able to detect
various types of attacks with a low performance overhead,
alongside other protection systems like ASLR and DEP.
REFERENCES
[1] Microgadgets: Size does matter in turing-complete return-oriented pro-
gramming. In Presented as part of the 6th USENIX Workshop on
Offensive Technologies, Bellevue, WA, 2012. USENIX.
[2] Martín Abadi, Mihai Budiu, Úlfar Erlingsson, and Jay Ligatti.
Control-flow integrity principles, implementations, and applications.
ACM Trans. Inf. Syst. Secur., 13(1):4:1–4:40, November 2009.
[3] Nicholas Carlini, Antonio Barresi, Mathias Payer, David Wagner, and
Thomas R. Gross. Control-flow bending: On the effectiveness of control-
flow integrity. In 24th USENIX Security Symposium (USENIX Security
15), pages 161–176, Washington, D.C., 2015. USENIX Association.
[4] Nicholas Carlini and David Wagner. Rop is still dangerous: Breaking
modern defenses. In 23rd USENIX Security Symposium (USENIX Security
14), pages 385–399, San Diego, CA, 2014. USENIX Association.
[5] Ping Chen, Hai Xiao, Xiaobin Shen, Xinchun Yin, Bing Mao, and Li Xie.
DROP: Detecting Return-Oriented Programming Malicious Code, pages
163–177. Springer Berlin Heidelberg, Berlin, Heidelberg, 2009.
[6] Yueqiang Cheng, Zongwei Zhou, Miao Yu, Xuhua Ding, and Robert H.
Deng. Ropecker: A generic and practical approach for defending against
rop attacks. In NDSS. The Internet Society, 2014.
[7] Nick Christoulakis, George Christou, Elias Athanasopoulos, and Sotiris
Ioannidis. Hcfi: Hardware-enforced control-flow integrity. In Proceedings
of the Sixth ACM Conference on Data and Application Security and
Privacy, CODASPY ’16, pages 38–49, New York, NY, USA, 2016. ACM.
[8] Mauro Conti, Stephen Crane, Lucas Davi, Michael Franz, Per Larsen,
Marco Negro, Christopher Liebchen, Mohaned Qunaibit, and Ahmad-
Reza Sadeghi. Losing control: On the effectiveness of control-flow
integrity under stack attacks. In Proceedings of the 22nd ACM SIGSAC
Conference on Computer and Communications Security, CCS ’15, pages
952–963, New York, NY, USA, 2015. ACM.
[9] John Criswell, Nathan Dautenhahn, and Vikram Adve. Kcofi: Complete
control-flow integrity for commodity operating system kernels. In
Proceedings of the 2014 IEEE Symposium on Security and Privacy,
SP ’14, pages 292–307, Washington, DC, USA, 2014. IEEE Computer
Society.
[10] R. d. Clercq, R. D. Keulenaer, B. Coppens, B. Yang, P. Maene,
K. d. Bosschere, B. Preneel, B. d. Sutter, and I. Verbauwhede. Sofia:
Software and control flow integrity architecture. In 2016 Design,
Automation & Test in Europe Conference & Exhibition (DATE), pages
1172–1177, March 2016.
[11] Sanjeev Das, Wei Zhang, and Yang Liu. A fine-grained control flow
integrity approach against runtime memory attacks for embedded systems.
IEEE Trans. Very Large Scale Integr. Syst., 24(11):3193–3207, November
2016.
[12] L. Davi, M. Hanreich, D. Paul, A. R. Sadeghi, P. Koeberl, D. Sullivan,
O. Arias, and Y. Jin. Hafix: Hardware-assisted flow integrity extension.
In 2015 52nd ACM/EDAC/IEEE Design Automation Conference (DAC),
pages 1–6, June 2015.
[13] Lucas Davi, Christopher Liebchen, Ahmad-Reza Sadeghi, Kevin Z.
Snow, and Fabian Monrose. Isomeron: Code randomization resilient to
(just-in-time) return-oriented programming. In 22nd Annual Network
and Distributed System Security Symposium, NDSS 2015, San Diego,
California, USA, February 8-11, 2015, 2015.
[14] Lucas Davi, Ahmad-Reza Sadeghi, Daniel Lehmann, and Fabian Mon-
rose. Stitching the gadgets: On the ineffectiveness of coarse-grained
control-flow integrity protection. In 23rd USENIX Security Symposium
(USENIX Security 14), pages 401–416, San Diego, CA, 2014. USENIX
Association.
[15] Lucas Davi, Ahmad-Reza Sadeghi, and Marcel Winandy. Ropdefender:
A detection tool to defend against return-oriented programming attacks.
In Proceedings of the 6th ACM Symposium on Information, Computer
and Communications Security, ASIACCS ’11, pages 40–51, New York,
NY, USA, 2011. ACM.
[16] Lucas Vincenzo Davi, Alexandra Dmitrienko, Stefan Nürnberger, and
Ahmad-Reza Sadeghi. Gadge me if you can: Secure and efficient ad-
hoc instruction-level randomization for x86 and arm. In Proceedings
of the 8th ACM SIGSAC Symposium on Information, Computer and
Communications Security, ASIA CCS ’13, pages 299–310, New York,
NY, USA, 2013. ACM.
[17] Isaac Evans, Fan Long, Ulziibayar Otgonbaatar, Howard Shrobe, Martin
Rinard, Hamed Okhravi, and Stelios Sidiroglou-Douskos. Control jujutsu:
On the weaknesses of fine-grained control flow integrity. In Proceedings
of the 22nd ACM SIGSAC Conference on Computer and Communications
Security, CCS ’15, pages 901–913, New York, NY, USA, 2015. ACM.
[18] X. Ge, N. Talele, M. Payer, and T. Jaeger. Fine-grained control-flow
integrity for kernel software. In 2016 IEEE European Symposium on
Security and Privacy (EuroS&P), pages 179–194, March 2016.
[19] Enes Göktas, Elias Athanasopoulos, Herbert Bos, and Georgios
Portokalidis. Out of control: Overcoming control-flow integrity. In Proceedings
of the 2014 IEEE Symposium on Security and Privacy, SP ’14, pages
575–589, Washington, DC, USA, 2014. IEEE Computer Society.
[20] Yufei Gu, Qingchuan Zhao, Yinqian Zhang, and Zhiqiang Lin. Pt-cfi:
Transparent backward-edge control flow violation detection using intel
processor trace. In Proceedings of the Seventh ACM on Conference on
Data and Application Security and Privacy, CODASPY ’17, pages 173–
184, New York, NY, USA, 2017. ACM.
[21] Z. Guo, R. Bhakta, and I. G. Harris. Control-flow checking for
intrusion detection via a real-time debug interface. In 2014 International
Conference on Smart Computing Workshops, pages 87–92, Nov 2014.
[22] Intel. Control-flow Enforcement Technology Preview. Intel Corporation,
2016.
[23] Elias Levy. Smashing the stack for fun and profit. Phrack Magazine,
49, 1996.
[24] Yan Lin, Xiaoxiao Tang, Debin Gao, and Jianming Fu. Control flow
integrity enforcement with dynamic code optimization. In Matt Bishop
and Anderson C A Nascimento, editors, Information Security: 19th
International Conference, ISC 2016, Honolulu, HI, USA, September 3-6,
2016. Proceedings, pages 366–385, Cham, 2016. Springer International
Publishing.
[25] X. Liu, Q. Wei, and Z. Ye. Static-dynamic control flow integrity. In
2014 Ninth International Conference on P2P, Parallel, Grid, Cloud and
Internet Computing, pages 189–196, Nov 2014.
[26] Ali Jose Mashtizadeh, Andrea Bittau, Dan Boneh, and David Mazières.
Ccfi: Cryptographically enforced control flow integrity. In Proceedings of
the 22nd ACM SIGSAC Conference on Computer and Communications
Security, CCS ’15, pages 941–951, New York, NY, USA, 2015. ACM.
[27] Vishwath Mohan, Per Larsen, Stefan Brunthaler, Kevin W. Hamlen, and
Michael Franz. Opaque control-flow integrity. In NDSS, 2015.
[28] Vasilis Pappas, Michalis Polychronakis, and Angelos D. Keromytis.
Transparent rop exploit mitigation using indirect branch tracing. In
Presented as part of the 22nd USENIX Security Symposium (USENIX
Security 13), pages 447–462, Washington, D.C., 2013. USENIX.
[29] Mathias Payer, Antonio Barresi, and Thomas R. Gross. Fine-grained
control-flow integrity through binary hardening. In Magnus Almgren,
Vincenzo Gulisano, and Federico Maggi, editors, Detection of Intrusions
and Malware, and Vulnerability Assessment: 12th International Confer-
ence, DIMVA 2015, Milan, Italy, July 9-10, 2015, Proceedings, Cham,
2015. Springer International Publishing.
[30] Dean Sullivan, Orlando Arias, Lucas Davi, Per Larsen, Ahmad-Reza
Sadeghi, and Yier Jin. Strategy without tactics: Policy-agnostic hardware-
enhanced control-flow integrity. In Proceedings of the 53rd Annual
Design Automation Conference, DAC ’16, pages 163:1–163:6, New York,
NY, USA, 2016. ACM.
[31] Jiaqi Tan, Hui Jun Tay, Utsav Drolia, Rajeev Gandhi, and Priya
Narasimhan. Pcfire: Towards provable preventative control-flow integrity
enforcement for realistic embedded software. In Proceedings of the 13th
International Conference on Embedded Software, EMSOFT ’16, pages
19:1–19:10, New York, NY, USA, 2016. ACM.
[32] Caroline Tice, Tom Roeder, Peter Collingbourne, Stephen Checkoway,
Úlfar Erlingsson, Luis Lozano, and Geoff Pike. Enforcing forward-edge
control-flow integrity in gcc & llvm. In Proceedings of the 23rd
USENIX Conference on Security Symposium, SEC’14, pages 941–955,
Berkeley, CA, USA, 2014. USENIX Association.
[33] X. Wang and R. Karri. Reusing hardware performance counters to detect
and identify kernel control-flow modifying rootkits. IEEE Transactions on
Computer-Aided Design of Integrated Circuits and Systems, 35(3):485–
498, March 2016.
[34] Xueyang Wang and Jerry Backer. SIGDROP: signature-based ROP
detection using hardware performance counters. CoRR, abs/1609.02667,
2016.
[35] Xueyang Wang and Ramesh Karri. Numchecker: Detecting kernel
control-flow modifying rootkits by using hardware performance counters.
In Proceedings of the 50th Annual Design Automation Conference, DAC
’13, pages 79:1–79:7, New York, NY, USA, 2013. ACM.
[36] Yubin Xia, Yutao Liu, Haibo Chen, and Binyu Zang. Cfimon: Detecting
violation of control flow integrity using performance counters. In
Proceedings of the 2012 42nd Annual IEEE/IFIP International Conference
on Dependable Systems and Networks (DSN), DSN ’12, pages 1–12,
Washington, DC, USA, 2012. IEEE Computer Society.
[37] Chao Zhang, Tao Wei, Zhaofeng Chen, Lei Duan, Laszlo Szekeres,
Stephen McCamant, Dawn Song, and Wei Zou. Practical control flow
integrity and randomization for binary executables. In Proceedings of the
2013 IEEE Symposium on Security and Privacy, SP ’13, pages 559–573,
Washington, DC, USA, 2013. IEEE Computer Society.
[38] J. Zhang, R. Hou, J. Fan, K. Liu, K. Zhang, and S. A. McKee. Raguard:
A hardware based mechanism for backward-edge control-flow integrity. In
ACM International Conference on Computing Frontiers 2017, Siena, Italy,
2017. ACM.
[39] Hovav Shacham, Matthew Page, Ben Pfaff, Eu-Jin Goh, Nagendra
Modadugu, and Dan Boneh. On the effectiveness of address-space
randomization. In Proceedings of the 11th ACM Conference on
Computer and Communications Security, pages 298–307, New York, NY,
USA, 2004. ACM.
[40] Mingwei Zhang and R. Sekar. Control flow integrity for cots binaries.
In Presented as part of the 22nd USENIX Security Symposium (USENIX
Security 13), pages 337–352, Washington, D.C., 2013. USENIX.
[41] S. Andersen and V. Abella. Changes to functionality in microsoft
windows xp service pack 2, part 3: Memory protection technologies,
Data Execution Prevention, In Microsoft TechNet Library, September
2004. http://technet.microsoft.com/en-us/library/bb457155.aspx.
[42] PaX Team, “Address Space Layout Randomization (ASLR)”, 2003.
https://pax.grsecurity.net/docs/aslr.txt.
[43] RedHat, “Position Independent Executables (PIE)”, In Redhat Customer
Portal, November 2012. https://access.redhat.com/blogs/766093/posts/1975793.
[44] Alexander Sotirov, “Heap Feng Shui in JavaScript”, In BlackHat Europe,
2007. https://www.blackhat.com/presentations/bh-europe-07/Sotirov/Presentation/bh-eu-07-sotirov-apr19.pdf.
[45] Tilo Müller, Computer Science, “ASLR Smack and Laugh Reference”,
In Seminar on Advanced Exploitation Techniques, RWTH Aachen, Germany,
February 2008.
https://pdfs.semanticscholar.org/440e/61ecb744e55d0425cdb648fe24e4ff999686.pdf.