University College London
Learning Rewrite Rules of
Metamorphic Obfuscation Engines
by
Iason Papapanagiotakis-Bousy
supervised by
Dr. Earl T. Barr
August 31, 2016
This report is submitted as part requirement for the MSc in Information Security at
University College London. It is substantially the result of my own work except
where explicitly indicated in the text. The report may be freely copied and
distributed provided the source is explicitly acknowledged.
Abstract
Metamorphic program obfuscations are semantics-preserving program transformations. In the space of malware, they are used to evade detection by static
analysis. The perceived usage of metamorphic obfuscations is that a part of
the malware will rewrite itself before infecting a new host. This dissertation
defines a different model, external metamorphic obfuscations, in which the obfuscation is done a priori. This process of mutating malware is analysed and
compared to existing research. We then define the problem of learning the
rewriting rules of such obfuscations given a finite set of its outputs. We prove
that it is impossible to solve in general, and we relax it to an optimization problem for
which we give a solution under assumptions. Our work is a first step towards
a new direction in metamorphic program transformation research that has
spawned additional interesting questions for future work.
Keywords: Metamorphism, Malware, Term Rewriting Systems, Rule Learn-
ing, Obfuscation
Contents
1 Introduction
1.1 A Motivating Example
2 Background
2.1 Term Rewriting Systems
2.2 Finding Differences
2.3 Computational Learning Theory
2.4 Association Rule Mining
3 Offline Metamorphic Obfuscation
3.1 Application Strategies
3.2 Obfuscation Genealogy
4 Learning Obfuscation Rules from Finite Malware Samples
4.1 Preliminaries
4.2 Problem Definition
4.3 An Approximate Solution
5 Implementation
5.1 Testing
6 Related Work
7 Conclusion and Future Work
8 Acknowledgments
1 Introduction
The expansion of connected computing systems makes them a lucrative target for
criminals, who have evolved into organized professional groups or state-sponsored
cyberwarfare units that produce malicious software — malware — to achieve their
goals. This is apparent from everyday news that report security breaches, electronic
scams and frauds but also from the evolution of the information security field where
professionals and academics work on detecting, classifying and mitigating such at-
tacks.
Malware authors and security researchers are in a perpetual arms race, with the
former discovering new attack vectors and inventing increasingly complex techniques
to avoid detection and the latter working on protecting information systems from
such attacks.
One of the earliest techniques employed to detect known malware has been signatures:
anti-malware software keeps a list of the programs classified as malware by researchers
and, before running a program, checks that it does not appear in that list. To counter
that measure, malware authors have developed obfuscation
techniques that change the “appearance” (syntactic representation) of a program.
This creates a broad classification of malware with respect to the obfuscation used.
Malware researchers have designated three different obfuscation classes. These are,
in increasing order of complexity: Oligomorphic, Polymorphic and Metamorphic.
For the evolution of obfuscations and their countermeasures we refer to the work
of O'Kane et al. [33]. Our work focuses on the class of metamorphic obfuscations,
described in Section 3; the other two have been widely studied but remain relevant,
and polymorphic malware in particular still poses many open questions for researchers.
The term metamorphic obfuscations refers to the set of semantics preserving
code transformations that can be used to alter the syntax of a program. This code
transformation is done by what is called the obfuscation engine. While the common
case studied by researchers is for the obfuscation engine to be part of the malware, in
this work we describe how this could be done differently. Specifically, we distinguish
internal and external obfuscation engines and formally define the latter in Section 3.
We describe the advantages (and disadvantages) of using an external obfuscation
engine as well as some assumptions on how they might apply the obfuscations and
their impact.
Central to our definition of external metamorphic obfuscation engines and the rest
of our work are term rewriting systems. Modeling an obfuscation engine as a term
rewriting system has been proposed as an elegant formalism that maps obfuscations
to rewriting rules. In Section 2.1, we give an introduction to term rewriting systems
and the notation used throughout the rest of this work.
Having defined metamorphic obfuscations using term rewriting systems, in Section 4
we then tackle the following research questions: “Are metamorphic obfuscations
learnable from a sample of obfuscated programs?” and if so “under what assumptions
can we learn the rules of a term rewriting system that approximate the obfuscation
engine?” and “what classes of rewriting rules are learnable?”. To do so, we combine
term rewriting systems theory, computational learning theory and association rule
mining introduced in Section 2.1, Section 2.3 and Section 2.4 respectively.
The main contributions of this work are:
1. We define external metamorphic obfuscation engines using term rewriting sys-
tems. We describe their strengths and weaknesses and hypothesise on how they
might work internally.
2. We define the problem of learning the rewriting rules of an obfuscation engine
given a finite sample of program variants generated from a single archetype
program.
3. We prove the impossibility of the rewrite rule learning problem described above
and relax the problem to an optimization problem with a trivial solution.
4. We then give an algorithm for solving the relaxed problem under some assump-
tions and argue that it is the first step towards a more general result.
1.1 A Motivating Example
To give some context for the learning problem we define in Section 4, we now
give a small example. Suppose a malware author wrote a program α in a programming
language with only the following instructions: Σ = {ADD, MUL, NOP, JMP} (operands
are ignored for simplicity). Let α = NOP.ADD.MUL.JMP.ADD.JMP.MUL.
Now the author of α wants to obfuscate the program to make it look different. To do
so, the author uses the following two rules: R := {MUL → ADD.ADD.ADD, ε →
NOP}. By randomly applying these rules to parts of program α, the author ends
up with some variants of the original program. Let S denote the collection of those
variants and suppose
S = {NOP.ADD.NOP.MUL.JMP.ADD.JMP.MUL,
NOP.ADD.MUL.JMP.ADD.JMP.ADD.ADD.ADD,
NOP.ADD.MUL.JMP.ADD.NOP.JMP.MUL}
For a malware researcher, the problem is to classify all programs in S as different
forms of the same original program. In order to do so, it is very useful for the
researcher to know which obfuscating program transformations were used.
This brings us to our main research problem: given S, learn R.
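To make the example concrete, the obfuscation side can be sketched in a few lines of Python. The function and rule encoding below are ours, purely for illustration; a real engine would of course be more involved:

```python
import random

# Hypothetical sketch of the example's obfuscation step: the engine
# applies the rules in R to the archetype program alpha.
RULES = [
    ("MUL", "ADD.ADD.ADD"),  # MUL -> ADD.ADD.ADD
    ("", "NOP"),             # epsilon -> NOP: insert a NOP anywhere
]

def apply_rule_once(program, lhs, rhs, rng=random):
    """Contract one randomly chosen redex of lhs -> rhs in program."""
    ops = program.split(".")
    if lhs == "":  # injector rule: a redex exists at every position
        pos = rng.randrange(len(ops) + 1)
        return ".".join(ops[:pos] + [rhs] + ops[pos:])
    positions = [i for i, op in enumerate(ops) if op == lhs]
    if not positions:
        return program  # no redex: the rule does not apply
    pos = rng.choice(positions)
    return ".".join(ops[:pos] + rhs.split(".") + ops[pos + 1:])

alpha = "NOP.ADD.MUL.JMP.ADD.JMP.MUL"
variants = {apply_rule_once(alpha, *random.choice(RULES)) for _ in range(5)}
```

Running this repeatedly yields collections of variants like the set S above.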
2 Background
2.1 Term Rewriting Systems
Term rewriting systems (TRS) provide an elegant, abstract and simple, yet powerful,
computation mechanism. TRS are a fully general programming paradigm as they
can simulate Turing machines [24, 16]. The basic idea is very simple: replacement of
equals by equals, by applying symbolic equations over symbolically structured objects,
terms. Applying equations in one direction only immediately leads to the concept of
(directed) term rewriting. The most well-known TRS is probably the λ-calculus, which
has played a crucial role in mathematical logic with respect to formalising the notion
of computability [35].
In the rest of this section we give the basic notation and properties that are necessary
for our work in the following chapters. We have made an effort to keep the notation
consistent with the TRS literature, but where the literature conflicts we have
chosen what we believe makes most sense. The information presented here was taken
from the works of Franz Baader and Tobias Nipkow [7], Nachum Dershowitz and
Jean-Pierre Jouannaud [17], Yoshihito Toyama [38], Dana Ron [35] and Wikipedia
[46].
Let Var(s) denote the set of variables occurring in a term s.
Definition 2.1. A rewrite rule is an identity l ≈ r such that l is not a variable
and Var(l) ⊇ Var(r). In this case we may write l → r instead of l ≈ r. The left-hand
side and the right-hand side of a rule ρ = (l, r) are given by lhs(ρ) and rhs(ρ)
respectively.
Definition 2.2. A term rewriting system (TRS for short) is a pair (T, R) consisting
of a set of terms T and a set R ⊆ T × T of (rewrite or reduction) rules.
A redex (reducible expression) is an instance of the lhs of a rewrite rule. Con-
tracting the redex means replacing it with the corresponding instance of the rhs of
the rule.
Definition 2.3. Let (T, R) be a TRS.
(1) → is the one-step rewrite relation. We write a → b iff there is a redex of a rule
r ∈ R in a and contracting it produces b, i.e. ∃r ∈ R such that a = x1.lhs(r).x2 ∧ b = x1.rhs(r).x2.
(2) →* is the transitive closure of → ∪ =, where = is the identity relation, i.e. →* is
the smallest preorder (reflexive and transitive relation) containing →. It is also
called the reflexive transitive closure of →.
(3) ↔ is → ∪ →⁻¹, that is, the union of the relation → with its inverse relation, also
known as the symmetric closure of →.
(4) ↔* is the transitive closure of ↔ ∪ =, that is, ↔* is the smallest equivalence
relation containing →. It is also known as the reflexive transitive symmetric
closure of →.
Definition 2.4. A term x ∈ T is called reducible iff ∃y ∈ T such that x → y; otherwise
x is called irreducible or a normal form.
Definition 2.5. Two terms x, y ∈ T are joinable iff ∃z ∈ T such that x →* z and
y →* z; z may or may not be a normal form. If x and y are joinable we write x ↓ y.
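For string rewriting systems, which are the setting of our motivating example, the one-step rewrite relation → and bounded reachability under →* can be sketched directly (the helper names are ours):

```python
def redexes(s, rules):
    """All (rule_index, offset) pairs where a lhs occurs in string s."""
    found = []
    for i, (lhs, _) in enumerate(rules):
        start = s.find(lhs)
        while start != -1:
            found.append((i, start))
            start = s.find(lhs, start + 1)
    return found

def one_step(s, rules):
    """All strings t with s -> t, i.e. s with a single redex contracted."""
    out = set()
    for i, off in redexes(s, rules):
        lhs, rhs = rules[i]
        out.add(s[:off] + rhs + s[off + len(lhs):])
    return out

def reachable(s, rules, max_steps):
    """Approximate s ->* t by bounded breadth-first search."""
    seen, frontier = {s}, {s}
    for _ in range(max_steps):
        frontier = {t for u in frontier for t in one_step(u, rules)} - seen
        seen |= frontier
    return seen
```

For the rules {ab → c}, for instance, `one_step("abab", ...)` contains both `"cab"` and `"abc"`, and both are joinable at `"cc"`, illustrating Definition 2.5.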
Since multiple rewrite rules can have multiple redexes on a term, applying them
in a different order obviously leads to non-deterministic computations. This leads
to two fundamental questions on TRS:
• Do all computations eventually stop?
• If two sequences of rewritings, starting from the same term, diverge at one
point, can they eventually be rejoined?
The first question is the termination problem and the second one the confluence
problem.
Definition 2.6. A rewriting system (T, R) is terminating, also called noetherian,
iff ∀t ∈ T there is no infinite sequence of rewritings starting from t. In such a system
every object has at least one normal form.
Theorem 2.1 (Term Rewriting and All That). The following problem is in general
undecidable:
Given a finite term rewriting system R = (T, R), is R terminating, i.e. is there no
term t ∈ T starting an infinite reduction sequence?
Although undecidable in the general case, the termination problem has decidable
subclasses that are of interest. A term rewriting system R = (T, R) such that
∀r ∈ R, Var(rhs(r)) = ∅ is called right-ground. If for a right-ground TRS it also
holds that ∀r ∈ R, Var(lhs(r)) = ∅, the TRS is a string rewriting system, usually
called a semi-Thue system. The termination problem is decidable for right-ground
TRS and semi-Thue systems [7].
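These syntactic subclasses are easy to check mechanically. The sketch below assumes a dot-separated term encoding in which, by our own convention chosen purely for illustration, tokens beginning with a lowercase letter are variables (the obfuscation rules considered later in this work are ground, so they pass both checks):

```python
def var_set(term):
    """Var(term): the variables occurring in a dot-separated term.
    Convention (ours): lowercase-initial tokens are variables."""
    return {tok for tok in term.split(".") if tok and tok[0].islower()}

def is_right_ground(rules):
    """Var(rhs(r)) = empty set for every rule r."""
    return all(not var_set(rhs) for _, rhs in rules)

def is_semi_thue(rules):
    """Both sides of every rule are ground: a string rewriting system."""
    return is_right_ground(rules) and all(not var_set(lhs) for lhs, _ in rules)
```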
Definition 2.7 (The Church-Rosser property). A rewriting system is confluent iff
∀x, y ∈ T, x ↔* y =⇒ x ↓ y.
Definition 2.8. A TRS R = (T, R) is locally confluent iff
∀x, y, z ∈ T, x → y ∧ x → z =⇒ y ↓ z.
Local confluence is important to show confluence of a term rewriting system.
Huet [23] shows that a terminating TRS is confluent iff it is locally confluent. The
deciding factor for local confluence are the rules that form critical pairs. This concept
was discovered by Knuth and Bendix [28] in their paper that delivered fundamental
results in confluence of term rewriting systems. Two rules r1, r2 form a critical pair
if the prefix of one matches the suffix of the other or if one is a subterm of the other.
Theorem 2.2. A terminating TRS is confluent iff all its critical pairs are joinable.
This result led to the Knuth-Bendix completion algorithm which tries to resolve
critical pair divergences by adding fitting rewrite rules.
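For string rewriting systems, the two overlap cases above can be enumerated directly; the following sketch (names are ours) computes the critical pairs of a set of ground rules:

```python
def critical_pairs(rules):
    """Critical pairs of a string rewriting system: for each overlap of
    two left-hand sides, the two one-step results of rewriting the
    overlapping word with either rule."""
    pairs = []
    for l1, r1 in rules:
        for l2, r2 in rules:
            # Case 1: a proper suffix of l1 equals a proper prefix of l2.
            for k in range(1, min(len(l1), len(l2))):
                if l1[-k:] == l2[:k]:
                    # The overlapping word w = l1 + l2[k:] can be rewritten
                    # with rule 1 (at offset 0) or rule 2 (at len(l1) - k).
                    pairs.append((r1 + l2[k:], l1[:-k] + r2))
            # Case 2: one lhs occurs strictly inside the other.
            if l1 != l2:
                off = l1.find(l2)
                while off != -1:
                    pairs.append((r1, l1[:off] + r2 + l1[off + len(l2):]))
                    off = l1.find(l2, off + 1)
    return pairs
```

For the rules {ab → X, ba → Y}, the word aba produces the divergent pair (Xa, aY), which must be joinable for the system to be locally confluent.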
Definition 2.9. A term rewriting systems that is terminating and confluent is called
convergent.
2.1.1 The Word Problem
In this section we give a description of the word problem. It is one of the most im-
portant and studied problems in the term rewriting systems research. Here, we want
to introduce the problem for readers that are not familiar with it and later on, in
Section 4, we argue its relevance with our research problem.
Definition 2.10 (The Word Problem). Given a term rewriting system (T, R) and
x, y ∈ T, are x and y equivalent under ↔*_R?
The problem is mostly encountered in abstract algebra where it is formulated as
“algorithmically determine if two representations of elements of a group are different
encodings of the same element”. The problem is in general undecidable and it has
been shown by William Boone that “There is exhibited a group given by a finite num-
ber of generators and a finite number of defining relations and having an unsolvable
word problem” [10].
Nevertheless, there has been plenty of research to prove decidability of the word
problem in specific types of groups. One of the most important results has been the
Knuth-Bendix completion algorithm, mentioned earlier, which can transform a terminating
term rewriting system into a convergent one [28]. This makes the word problem
decidable, since the resulting system is terminating and each term has a single normal
form. The limitations of the algorithm come from the fact that it solves a problem
that is, in general, undecidable: either it succeeds and outputs the convergent TRS,
or it does not terminate. This “weakness” could be exploited in an adversarial model
by using techniques such as those proposed by Simonaire to render the algorithm
useless [36]. In practice, however, it is the best known term rewriting system
completion algorithm.
2.2 Finding Differences
As shown in the simple example given in the introduction, applying different obfuscation
rules to different redexes will result (in most cases) in different strings. The
fact that the strings we work on are different gives us little to no information; we
are rather interested in how they differ. Finding the best (most compact) way of
expressing the difference of two strings amounts to computing the Levenshtein distance
together with the necessary edit transcript of the two strings. This is also equivalent
to computing the longest common subsequence, which focuses on finding the common
parts, while the Levenshtein distance is about computing the differences.
Definition 2.11 (Levenshtein Distance). Given two strings s1 and s2 over an alphabet
Σ, define the edit operations:
• Insert a symbol: ε → x, such that uv gives uxv.
• Delete a symbol: x → ε, such that uxv gives uv.
• Substitute a symbol: x → y, where x ≠ y, such that uxv gives uyv.
The Levenshtein distance of s1 and s2 is the minimum number of operations required
to transform s1 into s2.
The Levenshtein distance belongs to the family of edit distances; other edit distance
metrics use a different set of edit operations or assign different costs to each
operation (in the Levenshtein distance every operation has a cost of 1).
Definition 2.12. The global sequence alignment problem is to compute the
Levenshtein distance along with the transcript of necessary edit operations.
The problem was first solved by Needleman and Wunsch with dynamic programming
[32]. The complexity of the algorithm for two strings of length M and N is
O(MN) in time and O(MN) in space. Hirschberg invented a modification that
reduces the space complexity to O(N) [21]. Hunt and McIlroy [25] proposed a heuristic
improvement with text files in mind. Their algorithm was implemented for the first
version of the Unix diff program. The diff program was later updated to use Myers'
algorithm [31], which has linear space complexity and achieves an expected-case
time complexity of O((M + N) + D²), where D is the length of the shortest edit
script.
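The dynamic program of Needleman and Wunsch, together with the traceback that recovers an edit transcript, can be sketched as follows (the labels M, S, I, D for match, substitute, insert and delete are our own convention):

```python
def edit_transcript(s1, s2):
    """Levenshtein distance and one optimal edit transcript, computed by
    the classic O(M*N) dynamic program."""
    m, n = len(s1), len(s2)
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dist[i][0] = i   # delete all of s1[:i]
    for j in range(n + 1):
        dist[0][j] = j   # insert all of s2[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = dist[i - 1][j - 1] + (s1[i - 1] != s2[j - 1])
            dist[i][j] = min(sub, dist[i - 1][j] + 1, dist[i][j - 1] + 1)
    # Trace back through the table to recover the operations.
    ops, i, j = [], m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + (s1[i - 1] != s2[j - 1]):
            ops.append("M" if s1[i - 1] == s2[j - 1] else "S")
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            ops.append("D")  # delete s1[i-1]
            i -= 1
        else:
            ops.append("I")  # insert s2[j-1]
            j -= 1
    return dist[m][n], "".join(reversed(ops))
```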
2.3 Computational Learning Theory
In this section we give an introduction to the field of Computational Learning Theory.
This is not aimed to fully cover the research field but rather to introduce its basic
concepts to the readers and motivate some of the problems we encounter on the
following sections.
From the Association for Computational Learning: Computational Learning The-
ory is a research field, part of Artificial Intelligence, that studies the design and anal-
ysis of Machine Learning algorithms. In particular, such algorithms aim at making
accurate predictions or representations based on observations.
The emphasis in Computational Learning Theory is on rigorous mathematical
analysis using techniques from various connected fields such as probability, statistics,
optimization, information theory and geometry. While theoretically rooted, learning
theory puts a strong emphasis on efficient computation as well [1]. Learning problems
are modeled as an algorithm (the learner) that tries to learn a concept (e.g. a
language) given an information presentation method.
The field originated with the work of Mark Gold, who studied language learnability
[20]. In his paper, Gold defines language learning as an infinite process where at each
time unit, the learner receives a unit of information and is to make a guess as to the
identity of the unknown language on the basis of the information received so far. He
considers a class of languages learnable or identifiable in the limit with respect to
the information presentation method used if there is an algorithm such that: Given
any language of the class, there is some finite time after which the guesses will all be
correct. The information presentation methods studied were:
• Text, where the learner is presented with strings from the target language L in
random order.
• Informant, which can label strings as belonging to the target language L or not
and can choose a specific order in which to present information to the learner.
Gold found that under this model of learning, the class of context-sensitive languages
is learnable from an informant, but not even the class of regular languages
is learnable from text.
In 1984 Leslie Valiant proposed the Probably Approximately Correct (PAC)
model for learning [41]. PAC is considered amongst the most significant results
in computational learning theory as it provided an attractive general model to study
the computational, statistical and other aspects of learning [39]. Like in Gold’s work,
the PAC framework has a learner that receives samples and must form a generalisation
function (called the hypothesis) that allows it to classify unseen instances.
The learner is either presented with samples drawn from an arbitrary distribution or
has access to an informant to which it can make queries. Unlike identification in
the limit, PAC allows the hypothesis to have a bounded generalisation error (the
“approximately correct” part) with high probability (the “probably” part).
A more detailed overview of the computational learning theory field can be found
in the book of Kearns and Vazirani [27] and the papers of Angluin [6] and Turan [39].
As our main research question is the learning problem defined in Section 4, we
explored Computational Learning Theory in order to have a formal framework in
which we can characterise our problem. The first step was to learn the formalism
used in this space and its main results. Once we had defined our learning problem,
we tried to map it to other learning problems that had already been studied, in
order to leverage existing results to reach a conclusion about our research question.
2.4 Association Rule Mining
Our research problem defined in Section 4 is one of learning rules. More precisely,
we are interested in finding the smallest set of rules that can “describe” the largest
set of pairs of strings. This problem has been studied extensively in the field of data
mining giving birth to association rule mining.
Association rule mining, credited to Agrawal et al. [4], defines a set of measures
of interestingness of patterns found in large sets of transactions. It was originally
conceived to discover common patterns in shopping habits from data generated by
point-of-sale systems.
In our rule selection, we used the two best-known measures, support and confi-
dence. We should point out that in our use-case, all transactions contain exactly two
items. Thus, we give the definitions of support and confidence used for transactions
of two items.
Let T be a collection of transactions of exactly two items. We consider every
transaction to be a rule.
Support measures the coverage of a transaction t ∈ T, i.e. how frequent it is:

supp(t) = (number of occurrences of t) / |T|

Confidence measures the accuracy of a rule, i.e. how frequently lhs(t) implies rhs(t):

conf(t) = (number of occurrences of t) / (number of occurrences of lhs(t))

Rule mining is usually done by manually fixing a minimum support and then
looking for rules with high confidence.
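In our two-item setting, computing support and confidence and filtering the candidate rules reduces to a few lines (function names are ours):

```python
from collections import Counter

def mine_rules(transactions, min_support, min_confidence):
    """Rank candidate rules by support and confidence. Every two-item
    transaction (lhs, rhs) is itself a candidate rule, as in our setting."""
    pair_counts = Counter(transactions)
    lhs_counts = Counter(lhs for lhs, _ in transactions)
    total = len(transactions)
    selected = []
    for (lhs, rhs), c in pair_counts.items():
        supp = c / total                # coverage of the rule
        conf = c / lhs_counts[lhs]     # how often lhs implies rhs
        if supp >= min_support and conf >= min_confidence:
            selected.append(((lhs, rhs), supp, conf))
    return sorted(selected, key=lambda x: (-x[1], -x[2]))
```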
3 Offline Metamorphic Obfuscation
obfuscate • from the Latin ob- “in the way” and fuscus “dark brown”,
means to make obscure/confusing [2].
In this dissertation we are considering program obfuscations, that is, obfuscations
that modify a program in order to make it “look” different while maintaining its
functionality. The use cases of program obfuscation vary; our work uses malware
obfuscation for examples and as motivation, but the following ideas may be applicable
to a wider range of program obfuscation use cases.
Metamorphism is a class of obfuscations that aims to transform a program into
a different but equivalent one in order to avoid detection. As described by Szor
[37], “they are able to create new generations that look different”. A metamorphic
obfuscation engine can be modeled as a term rewriting system [44] that modifies the
syntax of a program p and outputs a syntactically different program p′ that maintains
the semantics of p. Let P be the set of all syntactically correct programs in some
instruction set (e.g. x86 assembly). Let e : P → N be a function that returns the
computational resources (both space and time) required by a program.
Definition 3.1 (Metamorphic Obfuscation Engine). A metamorphic obfuscation engine
is a tuple O = (P, R, A) where (P, R) is a term rewriting system over programs
and A is an algorithm for applying the rules in R.

p′ = O(p) such that
p′ ≡ p (semantically) ∧ (1)
p′ ≠ p (syntactically) ∧ (2)
∃n ∈ N : e(p′) ∈ O(e(p)ⁿ) (3)
The first condition is needed so that the metamorphic engine does not break the
functionality of the program. The second condition represents the syntactic
transformation of the input program p; the more the output differs from the
input, the better the obfuscation. Finally, the constraints on the time and space
complexity of the output program are non-functional requirements: if p′ needed
exponentially more resources than p, it could not execute correctly in similar execution
environments.
The effort to formally define the space of malware, and in particular viruses,
started with Adleman [3] who gave a formal description of the different types of
computer viruses. In his work he identifies that their key functions are injure, infect
and imitate. At that time, Adleman did not consider program obfuscation. Zuo et
al. [48] updated those definitions to capture the effect of obfuscation that modern
malware uses. In their paper, Zuo et al. define metamorphic viruses as follows:
Let (d, p) denote the environment (data and programs). T(d, p) and I(d, p) are
called trigger and infection condition, respectively. When T(d, p) holds, the virus
executes the injury function D(d, p) and when I(d, p) holds, the virus uses S(p) to
select a program to infect.
Definition 3.2 (Metamorphic Virus [48]). The pair (v, v′) of two different total
recursive functions v and v′ is called a metamorphic virus if for all x, (v, v′) satisfies:

φv(x) = D(d, p),                if T(d, p)
        φx(d, p[v′(S(p))]),     if I(d, p)        (4)
        φx(d, p),               otherwise

and

φv′(x) = D′(d, p),              if T′(d, p)
         φx(d, p[v(S′(p))]),    if I′(d, p)       (5)
         φx(d, p),              otherwise
This can be extended to a tuple of n recursive functions (v1, v2, . . . vn) to capture
more complex metamorphic viruses.
In this definition, the obfuscation engine is assumed to be inside the malware.
At the time of infection the obfuscation engine generates the new variant; we
call it an internal obfuscation engine as it is part of the malware. However, the
obfuscation engine could be separated from the malware and generate variants of the
original program at an earlier time; we call such an obfuscation engine external.
This difference has been mentioned by Walenstein et al. and O'Kane et al. as open-
and closed-world obfuscations [45, 33]. Both terminologies capture the same notion,
but while internal/external focuses on the obfuscation process, open/closed
focuses on the set of known facts.
Informally, we say that internal malware obfuscation is when the obfuscation is
done by the malware at the time of infection of a new host. External obfuscation
is the case when the obfuscation of the malware is independent of the infection.
Internal obfuscation is the common case studied by researchers, possibly falling under
the “streetlight effect”, as it is easier to study obfuscation engines that can be
recovered by disassembly than to hypothesise their existence in the hands of malware
authors.
In this work, we argue that there is value in defining the external model for
metamorphic obfuscations. Consider, for example, malware obfuscated by humans:
it has to be external obfuscation. This could be the case in spear phishing, where the
number of targets is very small and the “success” of the malware very important. In
such an instance, a human could rewrite the malware to obfuscate its function before
sending it to the victim.
Definition 3.3 (External Metamorphic Obfuscation). Let O denote an obfuscation
engine as described in Definition 3.1 and p a program. The program p′ is externally
obfuscated metamorphically iff:

p′ = O(p) ∧ O ∉ p′
The difference with Definition 3.2 is twofold. Firstly, while Zuo et al. use total recursive
functions to capture the obfuscations, we use term rewriting systems, like Walenstein
et al. [44], as we found them more appropriate. Secondly, unlike Zuo et al. and
Walenstein et al. [48, 44, 45], we separate the obfuscation from the infection, which
is the distinction between internal and external obfuscation.
Walenstein et al. analysed the design space of metamorphic malware, including
their obfuscation engines [45]. The paper explains why internal metamorphic
obfuscation engines are difficult to design: their complexity is due to the
design space being a recursive one.
From the point of view of the author of a metamorphic obfuscation engine, there are
many benefits to choosing the external model over the internal one.
(1) Simplification. The design space is no longer necessarily recursive. The meta-
morphic engine does not have to locate the malware payload and disassemble
it since it can work directly on the source code. Additionally, the approximate
control flow graph can easily be constructed and non-normalisable obfuscations
can be applied.
(2) Stealth. Complex internal obfuscation engines like Win32/Simile tend to be
large segments of code, making up as much as 90% of the size of the malware.
Removing the engine significantly reduces the footprint of the malware, making
it less likely to be detected.
(3) Durability. Metamorphic engines are very complicated software and valuable to
malware authors, but despite all countermeasures, internal metamorphic engines
have been reverse engineered. Once the mutations are known to the research
community, the effectiveness of the engine is greatly reduced. With the external
model, malware authors protect the metamorphic engine from direct
disassembly.
The downside of having an external metamorphic engine is that the malware is no
longer self-contained when it comes to infecting a new target with a modified copy
of itself. One possibility would be for it to contact a remote server to get a new
instance when it is about to infect a new target. Alternatively, malware authors may
forgo the self-propagating property and leverage the increasing number of attack
vectors to infect new hosts (e.g. a web server under the control of the malware author
obfuscates a malicious Javascript file before serving it to its clients).
3.1 Application Strategies
Rewriting systems offer a very convenient way to model obfuscation engines
but do not provide an algorithm for changing one term to another. They are in
general, as mentioned in Section 2.1, a non-deterministic computational model. However,
obfuscation engines are deterministic processes, as they are real programs.
To go from the (possibly) non-deterministic computational model to a program,
an obfuscation engine needs an algorithm A to “enforce determinism”¹. The general
description of A is as follows:

Data: set of rewrite rules R; input string p; number of iterations maxiter
Result: a string p′ ≠ p such that p′ ↔ⁿ_R p, with n ∈ N ∧ n ≤ maxiter
  i = 0;
  p′ = p;
  redexes = list of all the redexes of rules in R found in p′;
  while i ≤ maxiter ∧ redexes is not empty do
      r = select redex(redexes);
      p′ = contract(r);
      update redexes;
      i = i + 1
  end
  return p′

Algorithm 1: The template for a TRS rule application strategy
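A minimal executable sketch of this template for a string rewriting system, with the redex selection left pluggable, is given below (all names are ours, not from any real engine; `select_redex` corresponds to select redex in the template):

```python
import random

def obfuscate(p, rules, maxiter, select_redex=random.choice):
    """Sketch of Algorithm 1 for a string rewriting system.
    A redex is a triple (lhs, rhs, offset); select_redex is the
    pluggable strategy (uniformly random by default)."""
    p_prime, i = p, 0
    while i <= maxiter:
        redexes = []
        for lhs, rhs in rules:
            if lhs == "":  # injector rule: a redex at every position
                redexes += [("", rhs, off) for off in range(len(p_prime) + 1)]
            else:
                off = p_prime.find(lhs)
                while off != -1:
                    redexes.append((lhs, rhs, off))
                    off = p_prime.find(lhs, off + 1)
        if not redexes:
            break
        lhs, rhs, off = select_redex(redexes)
        p_prime = p_prime[:off] + rhs + p_prime[off + len(lhs):]
        i += 1
    return p_prime
```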
The obfuscation engine O can then run the algorithm A to generate a variant of
a program p. Two factors remain to be determined: the select redex function and
the number of iterations of A. Selecting those two parameters defines what we call
the application strategy of the obfuscation engine. The number of iterations only
determines how many redexes will be contracted each time we execute O; a low
maxiter value will execute faster and could produce a higher number of variants²,
while a higher value will make each variant produced more different from its ancestor.

¹ Here we use determinism in its computer science sense; the resulting algorithm is allowed to
use randomness.
² This is the case if the total number of equivalent programs is finite.
The select redex function, on the other hand, is more interesting. The author of the
obfuscation engine could write select redex to:
1. Always pick the first/last/i-th redex.
2. Pick a redex at random.
3. Have some other complex algorithm to decide what redex to choose.
In the next two paragraphs, we first explore the fixed strategy (option 1) and
why it is not fitting, and then consider options 2 and 3 in a single rule application
strategy called random rule — random offset.
3.1.1 Fixed Strategy
The first choice is what would happen by using simple regular expressions libraries
(i.e. the regex Python module always returns the first match). Although it is straight-
forward to implement, in practice it is a very bad choice for obfuscation. That is
because the obfuscation engine will constantly change the beginning of the program,
leaving most of the program unchanged. For example, consider the algorithm A that
always picks the first redex as described above. Let r ∈ R be an injector rule such that r = ⟨ǫ, σ⟩. The redex of r in a program p will be just before the first symbol of p; contracting it would give us ǫp → σp. Repeating the procedure n times would result in σσ…σp (n copies of σ prepended to p), making it vulnerable to signature detection. We consider this strategy too simplistic and vulnerable to be used by malware authors.
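A tiny sketch illustrates the weakness. Assuming an injector rule ǫ → σ and a strategy that always contracts the first redex, every iteration prepends one more copy of the injected symbol; the symbol names and the dot-separated program encoding below are hypothetical.

```python
# Hypothetical sketch: applying an injector rule (eps -> injected) by always
# contracting the *first* redex. The redex of an injector sits before the
# first symbol, so n iterations stack n copies of the injected symbol at
# the front -- a fixed, signature-friendly prefix.
def fixed_strategy_inject(program: str, injected: str, n: int) -> str:
    for _ in range(n):
        program = injected + program  # the first redex is always at offset 0
    return program

variant = fixed_strategy_inject("MOV.ADD.RET", "NOP.", 3)
# variant == "NOP.NOP.NOP.MOV.ADD.RET"
```

The unchanged suffix (here the whole original program) is exactly what a static signature would match on.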
3.1.2 Random Rule — Random Offset Strategy
The second and third possibilities can be considered as a single case study for malware researchers, as a sufficiently complex redex selection algorithm (option 3) will look like a random process (option 2) to an external observer. This has been previously considered by Chouchane et al. [12], who model metamorphic obfuscation engines as probabilistic language generators. In this case, we will call the rule application algorithm A the random rule — random offset strategy. We believe this
is the right model to study as long as we do not have concrete evidence of what is
actually used.
Assumption 3.1. A implements the random rule — random offset strategy.
Under that assumption, we are interested in quantifying the impact that a rewrite
rule can have on a program.
Let O = ⟨P, R, A⟩ denote a metamorphic obfuscation engine with a finite set of n rewriting rules ri ∈ R. Each ri is assigned a rule application probability Pi. Given a program p, let Li be the set of all distinct redexes of the rule ri in p.
Definition 3.4. The relative impact of a rule ri ∈ R on a string p is:

Iri = (|Li| ∗ Pi) / (Σ_{j=1}^{n} |Lj| ∗ Pj)
With relative impact, we can compare the effect of two rules of a given rewrite
system on a string p.
Definition 3.5. The absolute impact of a rule ri ∈ R on a string p is:

Iai = |Li| / length(p)
Although absolute impact does not consider the rule application probabilities, it can be useful when we need to compare rules of different rewrite systems on the same string p.
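Both impact metrics can be computed directly from redex counts and rule probabilities. The helpers below are illustrative only; the dictionary-based encoding of |L_i| and P_i is our own assumption.

```python
# Illustrative helpers for Definitions 3.4 and 3.5. `redex_counts` maps each
# rule index i to |L_i| (distinct redexes of rule i in the program), and
# `probs` maps i to its application probability P_i.
def relative_impact(i, redex_counts, probs):
    # I_ri = (|L_i| * P_i) / sum_j (|L_j| * P_j)
    total = sum(redex_counts[j] * probs[j] for j in redex_counts)
    return (redex_counts[i] * probs[i]) / total

def absolute_impact(i, redex_counts, program):
    # I_ai = |L_i| / length(p)
    return redex_counts[i] / len(program)

counts = {1: 4, 2: 1}           # rule 1 has 4 redexes, rule 2 has 1
probs = {1: 0.5, 2: 0.5}
relative_impact(1, counts, probs)   # 4*0.5 / (4*0.5 + 1*0.5) = 0.8
```

As the example shows, a rule with many redexes dominates the relative impact even when all application probabilities are equal.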
3.2 Obfuscation Genealogy
While in Section 3.1 we analyse the possible rule application strategies that can
rewrite a program p into an different but equivalent program p′
, in this section we
demonstrate the differences in the genealogy of the variants between internal and
external obfuscations.
Let mi be a malware that includes a metamorphic obfuscation engine. Assuming it can keep state or use randomness, the obfuscation engine can generate multiple different variants mi+1,1, mi+1,2, . . . , mi+1,n and can guarantee that all of them are different from mi. We thus have the set Mi+1 = O(mi) such that ∀m ∈ Mi+1 : m ≠ mi.
Figure 1: All mutations of first degree are different.
Each of the programs in Mi+1 will then run the obfuscation to generate new variants. Because the obfuscation is done internally, when the engine runs on mi+1,1 it cannot know whether some of the variants it will generate were already generated by some other variant, e.g. mi+1,2. In general, letting Mall denote all malware variants generated, there is no guarantee that the intersection ⋂_{m ∈ Mall} O(m) is empty.
What was intended to be a tree of program variants is in fact a graph. This is due to the limited knowledge available to the obfuscation engine because it is internal.
An external obfuscation engine O, on the other hand, has the advantage that it
“knows” all program variants generated and thus, can avoid duplicates. The process
of generating a set of obfuscated malware variants in the external setting could be
described by the following algorithm:
Figure 2: Different variants can generate the same new mutation.
Data: an external obfuscation engine O;
the original program α;
the set of all malware variants generated Mall
Result: a bigger Mall set
Mall = {α};
while more variants are needed do
    select p from Mall;
    p′ = O(p);
    if p′ ∉ Mall then
        add p′ to Mall;
    end
end
return Mall;
Algorithm 2: Offline generation of obfuscated program variants
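Algorithm 2 can be sketched in Python as follows. The toy_engine stand-in for O and the max_tries guard are our own additions, made only so the example is self-contained and terminates.

```python
import random

# A minimal sketch of Algorithm 2. `obfuscate` stands in for the external
# engine O; because variant generation is external, the generator can keep
# the full set M_all and discard duplicates.
def generate_variants(obfuscate, archetype, target_size, max_tries=10000):
    m_all = {archetype}
    tries = 0
    while len(m_all) < target_size and tries < max_tries:
        p = random.choice(sorted(m_all))   # select p from M_all
        p_prime = obfuscate(p)
        if p_prime not in m_all:           # dedup: the external advantage
            m_all.add(p_prime)
        tries += 1
    return m_all

# Toy engine: inject a NOP at a random position (an injector rule eps -> NOP).
def toy_engine(p):
    toks = p.split(".")
    toks.insert(random.randrange(len(toks) + 1), "NOP")
    return ".".join(toks)

variants = generate_variants(toy_engine, "MUL.JMP.ADD", 5)
```

An internal engine would run the same loop but without access to m_all, which is exactly why its genealogy degenerates into a graph.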
Although the resilience against repeated instances is a theoretical advantage of the external obfuscation model, an interesting problem for future research would be to bound the probability that an internal metamorphic obfuscation encounters such a collision. Depending on the rewriting rules of the obfuscation TRS, this problem can be more or less present. If the critical pairs of the TRS form cycles, then more cycles will appear in the graph. Only a trivial, uninteresting set of rules could completely avoid ever generating the same program variant.
A related problem is counting how many viable variants can be generated in
each model. Viability is an important concept when it comes to program mutations.
We have identified two concerns on that matter: First, is the mutation efficient enough to be considered viable (e.g. a program that takes 2 hours to open a network connection is not)? Second, is the mutation doing what it is supposed to do? In other
words, is our TRS truly semantics preserving, such that any sequence of rewritings results in a semantically equivalent program?
Let A and B represent the viable program variants in the external and internal
model respectively. It is obvious that |A| ≥ |B| since we can simulate an internal ob-
fuscation using an external one (for the purpose of counting distinct viable variants),
but future research could focus on describing their relationship and the reasons that
might make B much smaller than A.
4 Learning Obfuscation Rules from Finite Malware Samples
As outlined in the introduction, one of the problems we set out to solve was to
(automatically) learn the rewrite rules of an obfuscation engine. Let Rall denote the set of all possible semantics-preserving code transformations; the problem is to learn the subset of rewrite rules used by an obfuscation engine. We define the problem for
a finite sample of obfuscated programs, variants of the same original program, for
which we have no access to the obfuscation engine used to transform them.
This section starts with the preliminaries where we introduce some additional
concepts and definitions. Subsequently, the learning problem is formally defined and
studied. Finally, we give an algorithm for an approximate solution under assump-
tions.
4.1 Preliminaries
As in Definition 3.1, we will consider obfuscation engines as term rewriting systems. An important prerequisite is a notion of “equality” between term rewriting systems. We call two TRS R1 and R2 equivalent if they have the same set of terms T and, for each rewriting of a term under R1, there is a chain of one or more rewritings under R2 that gives the same result.
Definition 4.1. A class of equivalent term rewriting systems E is defined as a set of TRS such that ∀R1 = ⟨T, R1⟩, R2 = ⟨T, R2⟩ ∈ E and ∀x, y ∈ T : x ↔*_{R1} y iff x ↔*_{R2} y
Given this definition of equivalent term rewriting systems, it is important for the purposes of our work to give a relaxed definition of observable equivalence. This is motivated by the fact that we are interested in term rewriting systems that might be equivalent only for a (usually finite) subset of T.
Definition 4.2. A class of observably equivalent term rewriting systems Eo is defined as a set of TRS such that ∀R1 = ⟨T, R1⟩, R2 = ⟨T, R2⟩ ∈ Eo and ∀x, y ∈ T′ ⊆ T : x ↔*_{R1} y iff x ↔*_{R2} y
Having defined what equivalent term rewriting systems are, an interesting problem is to measure the quality of a TRS R within a class E of equivalent term rewriting systems. To the best of our knowledge, this problem has not been given a lot of attention. In the only work that we found to touch on this characterisation of term rewriting systems [40], the authors assert that the length of a rewriting rule is negatively correlated with how “good” the rule is. Building on that intuition, we define lR to be the total size of a TRS R and then define the subset of “optimal” TRS in an equivalence class E. We use Occam's razor to represent the quality of a TRS.
Definition 4.3. The size of a term rewriting system R = ⟨T, R⟩ is:

lR := Σ_{r∈R} len(r), where len(r) := |rhs(r)| + |lhs(r)|
Definition 4.4. Given a class of equivalent term rewriting systems E, a TRS R = ⟨T, R⟩ ∈ E is Occam Razor iff:

∀R′ ∈ E : lR ≤ lR′
For the same practical reasons that we defined observable equivalent classes of
term rewriting systems in definition 4.2, we give a relaxed definition for the best TRS
in a class of observably equivalent term rewriting systems.
Definition 4.5. Given a class of observably equivalent term rewriting systems Eo, a TRS R = ⟨T, R⟩ ∈ Eo is observably Occam Razor iff:

∀R′ ∈ Eo : lR ≤ lR′
In practice, instead of talking about an Occam Razor term rewriting system in an equivalence class E or an observably Occam Razor term rewriting system in an observably equivalent class Eo, we might simply use the terms Occam's razor TRS and observable Occam's razor TRS when the equivalence class is deducible from the context.
4.2 Problem Definition
We will now formulate our learning problem for metamorphic obfuscations. As mentioned at the beginning of the section, the fact that we want to capture both internal and external obfuscations limits us to an offline learning problem, following the definition of Karp [26]. That is because, if the obfuscation is done externally and we have no access to the obfuscation engine, the only thing we have to learn from is a finite sample set. Ideally, we would like, given a finite number of malware variants generated from a single original program (the “archetype”), to infer the Occam Razor TRS that can rewrite any possible output of the metamorphic engine generated from the same archetype into another.
Let P denote the set of all syntactically correct programs in some instruction set, O = ⟨P, R, A⟩ denote a metamorphic obfuscation engine, and α be the archetype program. Let Pα ⊆ P be the set of all possible outputs of O on input α, and let Sα ⊆ Pα be the finite sample set of observable outputs.
Figure 3: The sets of programs.
By construction, we know that ∀pi, pj ∈ Pα there is a finite sequence of rewritings such that O^n(pi) = pj =⇒ pi ↔^n_R pj, where O^n(p) denotes n applications of the algorithm A on input p given the rules in R.
We now state the problem formally.
Problem 4.1 (Learning a Metamorphic Obfuscation Engine). Given Sα, learn a term rewriting system R such that:

(i) R = ⟨Pα, R⟩ is Occam Razor.

(ii) pi ↔*_R pj ∀pi, pj ∈ Pα
Theorem 4.1. Given a set S, learning an Occam razor set R of rewriting rules is
impossible.
Proof. Term rewriting systems are equivalent to Turing machines [24, 16]. Consider an algorithm A such that on input S it outputs the Occam Razor (smallest) term rewriting system. We could then use A to compute the Kolmogorov complexity of a program p as K = |A(p)|. Since Kolmogorov complexity is uncomputable, no such algorithm A can exist.
Nevertheless, it is still interesting to find a suboptimal (in terms of size) term rewriting system from the set of equivalent term rewriting systems E. Although size is still important, as a TRS exponentially bigger than the original one would be impractical, we relax condition (i) to a TRS R whose size is bounded by a polynomial in the number of samples.
Problem 4.2. Given Sα, learn a term rewriting system R such that:

(i) lR = O(|Sα|^n), n ∈ N.

(ii) pi ↔*_R pj ∀pi, pj ∈ Pα
Theorem 4.2. Solving problem 4.2 is impossible.
Proof. Consider the term rewriting system R as a language generator. Learning to recognise Pα (all possible strings of the language) from Sα (a finite sample of positive examples), with no additional information, has been shown impossible by Gold [20].
The impossibility of problem 4.2 can be stated informally as: “It is impossible
to generalize (knowledge) only from randomly presented positive examples without
error”.
Since it was condition (ii) that “caused” the impossibility, we relax that condition. Instead of trying to learn a TRS that maintains the ↔*_R property on all objects of Pα, we instead aim to cover a subset P′ of Pα.
Problem 4.3. Given Sα, learn a TRS R such that:

(i) lR = O(|Sα|^n), n ∈ N.

(ii) pi ↔*_R pj ∀pi, pj ∈ P′ ⊆ Pα

Note that in this case we do not put a restriction on the relation between P′ and Sα.
Naïve Solution By relaxing the requirement on the size of the rewriting system that we need to learn, this problem becomes much easier. The trivial solution in this case is storing all pairs of samples in Sα as rewriting rules:

R := {r : rhs(r) = pi ∧ lhs(r) = pj, ∀pi, pj ∈ Sα}
Although this solves problem 4.3, the solution is far from ideal. The resulting term rewriting system will have two drawbacks:

• Big size: the size of the resulting TRS will be lR = |Sα| ∗ (|Sα| − 1) ∗ Σ_{i=1}^{|Sα|} |pi|.

• Not generalising: the resulting rules will allow the rewriting of all p ∈ Sα but not of others.
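The naive construction and the size measure of Definition 4.3 can be sketched on a toy sample set; the tuple encoding of rules below is our own assumption.

```python
# Sketch of the naive solution: store every ordered pair of samples as a
# rewrite rule. It trivially satisfies the covering condition but neither
# compresses nor generalises.
def naive_trs(samples):
    return {(p_i, p_j) for p_i in samples for p_j in samples if p_i != p_j}

def trs_size(rules):
    # Definition 4.3: l_R = sum over rules of |lhs| + |rhs|
    return sum(len(lhs) + len(rhs) for lhs, rhs in rules)

s_a = ["abc", "abd", "xbc"]
rules = naive_trs(s_a)
len(rules)        # |S_a| * (|S_a| - 1) = 6 ordered pairs
trs_size(rules)   # every rule stores two whole samples: 6 * (3 + 3) = 36
```

The size grows quadratically in the number of samples and linearly in their length, which is what the next section improves on.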
The problem now becomes an optimisation problem: find “the smallest term rewriting system” that covers “the largest set P′”. This is the topic of Section 4.3.
4.3 An Approximate Solution
In the following section we address problem 4.3 in more detail. Although the naïve solution already given is poor, it illustrates the general direction we will be following. In order to improve upon the naive solution, we could try to minimise the TRS generated, make it more generalising, or both. Instead of storing each pair of strings, the following solution uses the Levenshtein distance to minimise the amount of information stored to represent the difference of two strings.
Solution: Compute the edit transcripts for all pairs of strings in Sα. The edit transcript, as described in Section 2.2, can be seen as a function that transforms one string into another. Let E(Sα) be the set of the edit transcripts computed from all the pairs of strings in Sα, and let p denote any entire string (not a substring) to be rewritten by the TRS. We can generate a TRS R = ⟨T, R⟩ as follows:

R := {r : lhs(r) = pi ∧ rhs(r) = e(pi, pj), ∀pi, pj ∈ Sα, e ∈ E(Sα)}
The generated TRS will still have |Sα| ∗ (|Sα| − 1)/2 rules, the same number of rules as the naive solution, but the rules will be smaller as they represent only the differences between the two strings. Note that, given the resulting TRS and two strings p1, p2 from Sα, we cannot “know” which rule of R to apply to get p2 from p1; we just have the guarantee that there is a rule r such that p2 = r(p1). This fact does not invalidate the solution, as the goal is to learn the TRS, not the algorithm that was used to apply it.
The next improvement to reduce the size of the TRS generated is to consider each edit operation found in any of the edit transcripts separately. For example, let i(x, 4) denote the insert operation of x at position 4³ and s(x, y, 2) the substitution of y at position 2 with x. Let two edit transcripts be ed1 = (i(x, 4), s(x, y, 2)) and ed2 = (i(x, 1), i(a, 9), i(b, 10), s(x, y, 12)). The first preprocessing step is to group similar consecutive edit operations; in this case, ed2 becomes (i(x, 1), i(ab, 9), s(x, y, 12)). Then, extracting the individual edit operations from ed1 and ed2 will yield the set:

(i(x, 4), s(x, y, 2), i(x, 1), i(ab, 9), s(x, y, 12))
As mentioned earlier, we are not interested in knowing which rule to apply where, as long as there is a sequence of rewritings maintaining the ↔* relation. We thus can discard the indexes of where to apply the edit operations; this will yield the much smaller set: (i(x, ∗), i(ab, ∗), s(x, y, ∗)). This leads us to the solution illustrated by the following algorithm:
³ The delete operation is the same as insert.
Data: the malware sample set Sα
Result: a set of rewrite rules R
R = ∅, the rules to be extracted;
for all pairs pi, pj ∈ Sα do
    t = the edit transcript of (pi, pj);
    group consecutive edits in t;
    remove the indexes of edits in t;
    for all edit operations edop ∈ t do
        if edop ∉ R then
            add edop to R
        end
    end
end
return R
Algorithm 3: Converting edit transcripts to rewrite rules.
Note that while edit operations are equivalent to rewrite rules, they are written slightly differently. The three edit operations insert, delete and substitute are represented as rewrite rules as follows: i(a, ∗) ⇔ r : ǫ → a, d(a, ∗) ⇔ r : a → ǫ, and s(a, b, ∗) ⇔ r : a → b.
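Algorithm 3 can be illustrated with difflib's opcodes playing the role of the edit transcript. This is for illustration only: the dissertation's implementation deliberately avoids difflib in favour of the Linux diff utility (see Section 5), and the token-list encoding of programs is our own assumption.

```python
import difflib

# Illustrative rendering of Algorithm 3. Programs are lists of symbols
# (mirroring the one-symbol-per-line input given to diff); the opcodes of
# SequenceMatcher serve as the pairwise edit transcript.
def extract_rules(samples):
    rules = set()
    for i, p_i in enumerate(samples):
        for p_j in samples[i + 1:]:
            sm = difflib.SequenceMatcher(a=p_i, b=p_j, autojunk=False)
            for op, a0, a1, b0, b1 in sm.get_opcodes():
                if op == "equal":
                    continue
                # consecutive edits arrive already grouped into one opcode;
                # dropping the offsets (a0, b0) matches the text above
                lhs = ".".join(p_i[a0:a1])   # empty lhs -> injector rule
                rhs = ".".join(p_j[b0:b1])
                rules.add((lhs, rhs))
    return rules

p1 = ["MUL", "JMP", "ADD"]
p2 = ["ADD", "ADD", "ADD", "JMP", "ADD"]   # MUL rewritten by MUL -> ADD.ADD.ADD
extract_rules([p1, p2])   # {("MUL", "ADD.ADD.ADD")}
```

The deduplicating set plays the role of the membership test edop ∉ R in the pseudocode.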
This algorithm will generate a smaller TRS than the previous solutions explored, but the rules in R will still be far from the rewriting system used to obfuscate the programs in Sα. The reasons for this are:

• The algorithm will separate the rule r1 : a → b from the rule r2 : b → a; however, only one of them should be in the “optimal” TRS.
• While grouping consecutive edit operations makes the TRS smaller, it could be the case that, instead of the rule r : ǫ → ab, the original TRS has two rules r1 : ǫ → a and r2 : ǫ → b⁴.
• A generated rule might be the result of multiple applications of different rules.
⁴ This remark also applies to rules that are products of substitution edit operations.
For example, let Ro = {r1 : aab → ddc, r2 : dd → ef} be the rules of the TRS used by the metamorphic obfuscation engine. The proposed algorithm might return the rules in Ro but (most likely) will also return others, like r : aab → efc. The rule r is clearly a composition of r1 and r2 and thus should be removed from the generated TRS as it is redundant.
• There might be noise in the rewrite rules generated. The noise can be due to the previous two remarks, or due to “imperfect” results of the edit transcript⁵.
In order to address the first point, we have to “redirect” all the rewriting rules so that |lhs(r)| < |rhs(r)| and, if both sides are of equal length, so that lhs(r) < rhs(r) in the lexicographic order. We call the problem presented in the second remark contiguous rule applications. It is addressed by an iterative process, described in the following algorithm, of looking for appearances of rules as sub-rules of larger ones. The third remark is about what we call nested rule applications. This problem has not been addressed yet, but we discuss a strategy to solve it in Section 7. Finally, in order to overcome noise, we use association rule mining (the support and confidence metrics) to determine which rewriting rules are most likely part of the original term rewriting system.
Let redirect() be a function that takes a rewrite rule and redirects it according to the previously described strategy. In the following algorithm we also use ⊆ between two strings to indicate the substring relation. Finally, alongside each rewrite rule stored, we also keep the number of appearances of that rule.
⁵ This is analysed in Section 5, where we use diff to extract the pairwise differences.
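Before the full algorithm, the redirect() convention can be sketched as follows; the tuple encoding of rules is our own assumption.

```python
# Sketch of redirect(): orient each rule so the shorter side becomes the
# lhs, breaking ties of equal length by lexicographic order. This merges
# a -> b and b -> a into a single canonical rule.
def redirect(rule):
    lhs, rhs = rule
    if (len(lhs), lhs) <= (len(rhs), rhs):
        return (lhs, rhs)
    return (rhs, lhs)

redirect(("BUS", "ADD"))            # ("ADD", "BUS"): equal length, lex order
redirect(("ADD.ADD.ADD", "MUL"))    # ("MUL", "ADD.ADD.ADD"): shorter lhs
```

After redirection, two mirrored rules produce the same canonical tuple, so the occurrence counters used below accumulate correctly.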
Data: the malware sample set Sα;
the support threshold st;
the confidence threshold ct
Result: a set of rewrite rules R;
Rmin, the rules extracted by association mining
for all pairs pi, pj ∈ Sα do
    t = the edit transcript of (pi, pj);
    group consecutive edits in t;
    remove the indexes of edits in t;
    for all edit operations edop ∈ t do
        add redirect(edop) to R
    end
end
Rmin := {r ∈ R : support(r) > st ∧ confidence(r) > ct};
for r1 ∈ Rmin do
    for r2 ∈ R ∧ r2 ∉ Rmin do
        if (lhs(r1) ⊆ lhs(r2) ∧ rhs(r1) ⊆ rhs(r2)) ∨ (lhs(r1) ⊆ rhs(r2) ∧ rhs(r1) ⊆ lhs(r2)) then
            R := R \ {r2};
            remove the occurrence of r1 from r2, obtaining r3 = r2 − r1;
            R := R ∪ {r3};
            increase the counter of occurrences of r1 by one
        end
    end
end
compute the support and confidence of all rules in R;
update Rmin if any rule not included has passed the support and confidence thresholds;
return Rmin, R
Algorithm 4: The rule learning algorithm.
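The filtering step Rmin := {r : support(r) > st ∧ confidence(r) > ct} can be sketched as follows. Here support is taken as a rule's share of all observed edit operations and confidence as the share of its lhs occurrences rewritten to this particular rhs; these are plausible readings in the spirit of the association rule mining background (Section 2.4), not necessarily the dissertation's exact definitions, and the counts-based encoding is our own.

```python
from collections import Counter

# Hypothetical support/confidence filter over redirected rules.
# `rule_counts` maps (lhs, rhs) tuples to their number of occurrences.
def filter_rules(rule_counts, st, ct):
    total = sum(rule_counts.values())
    lhs_totals = Counter()
    for (lhs, _), c in rule_counts.items():
        lhs_totals[lhs] += c
    r_min = set()
    for rule, c in rule_counts.items():
        support = c / total                 # share of all edit operations
        confidence = c / lhs_totals[rule[0]]  # share of this lhs's rewrites
        if support > st and confidence > ct:
            r_min.add(rule)
    return r_min

counts = {("", "NOP"): 90, ("MUL", "ADD.ADD.ADD"): 8, ("MUL", "JNE"): 2}
filter_rules(counts, st=0.005, ct=0.75)
# keeps ("", "NOP") and ("MUL", "ADD.ADD.ADD"); ("MUL", "JNE") fails confidence
```

The thresholds mirror the experimentally chosen values used later in the testing section (support 0.005, confidence 0.75).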
Complexity: To analyse the time complexity of the algorithm, we break it down into two main parts. The first is computing the edit transcripts for all pairs of samples; the second is processing the rewrite rules. For easier analysis, assume all samples have the same length l = |s| ∀s ∈ Sα. If we use the Needleman-Wunsch algorithm to get the pairwise edit transcripts, the first part has time complexity (|Sα| ∗ (|Sα| − 1)/2) ∗ l². The complexity of the second part depends on the size of the set of rules generated in the first step and on the number of rules that will be effectively recognised using the support and confidence thresholds on the first pass. We can express it as follows: |Rmin| ∗ (|R| − |Rmin|). Putting both parts together, we get that the proposed rule learning algorithm has complexity (|Sα| ∗ (|Sα| − 1)/2) ∗ l² + |Rmin| ∗ (|R| − |Rmin|). Because |Rmin| will be (a lot) smaller than |R|, we can simplify the expression to O((|Sα| ∗ l)² ∗ |R|).
Assumptions: We now discuss some assumptions under which the algorithm presented above gives good results.
By grouping consecutive edits into a single rewrite rule, we are implicitly making the following assumption. Let Ro = ⟨T, R⟩ be the term rewriting system used by the metamorphic obfuscation engine.
Assumption 4.1. The TRS Ro = ⟨T, R⟩ is such that T := Σ∗.
Corollary 4.2.1. Ro is effectively a semi-Thue system, as the terms in T contain only ground terms (strings).
This is a strong assumption: in the case of obfuscation engines, only very simple rewriting rules, such as NOP injectors, would belong to the category of string rewriting. It is, nevertheless, a useful assumption to make, as it allows us to study a simpler, yet non-trivial, problem: learning a semi-Thue system.
• Solving that problem is a step towards the more general result of learning
term rewriting systems. We give some ideas on how to generalise our result in
Section 7.
• It can be used to identify the simplest obfuscation rewriting rules. The more occurrences of the lhs of a rule r ∈ R inside a program pi, the higher the relative impact of the rule r is. Injector rules of the form ǫ → rhs can, by definition, be applied anywhere and thus have the highest relative impact. Removing simple rules from the toolset of malware authors is easier than learning more complex ones and, at the same time, it greatly reduces the size of Pα.

• With a large enough sample Sα, our algorithm should be able to learn simple rules that include free variables. This will be the case when variables have a small domain (e.g. variables only take as values the names of registers r1–r16).
Assumption 4.2. There is a significant part of the code that remains the same across most variants in Sα.
While this is a strong assumption, it is reasonable to make. If variants in Sα do not share any code fragments, the best solution is the naive one. In practice, however, code mutations will not change a program so radically. It is left for future work to determine the percentage of unchanged code that is necessary for high-quality results.
Let Ao be the algorithm, also referred to as the rule application strategy, used by the metamorphic obfuscation engine to apply the rules of Ro to a program p.
Assumption 4.3. Ao implements the random rule — random offset strategy.
The strategy was explained in Section 3.1.2 and can be summarised as “pick uniformly at random a redex from the set of redexes of all rewrite rules and contract it”. Under that assumption, it is unlikely to encounter nested rule applications as long as the set of all redexes is large enough. The assumption is also an important factor when calculating the relative impact of each rule: if the selection is not random, the impact factor loses its importance.
5 Implementation
We provide an implementation of both an obfuscation engine on strings and the final algorithm proposed in Section 4.3. We have chosen Python 2.7 as our implementation language because it provides libraries that implement parts of our algorithms, and its high-level syntax allowed us to quickly implement changes to the algorithm in our code. All of our code has been developed and tested under the 64-bit version of Ubuntu Linux.
Building an obfuscation engine was not the goal of this work, but we have created one in order to be able to test our rule learning algorithm. It is thus a very simple rewriting engine that is easy to modify or extend. The Python script will:
1. Generate a string α at random from a predefined alphabet and add α to Sα.
2. Pick at random a string p from Sα, a rewrite rule r and an offset ofs.
3. Contract r at the first redex of lhs(r) in p after ofs. If there is no such redex, do an injection (an always-applicable rule of the form ǫ → σ) at the offset.
4. If the resulting string is not already in Sα, add it to Sα.
5. While the size of Sα is smaller than a threshold, go to 2.
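The five steps above can be sketched as follows. This is not the dissertation's Python 2.7 script but an illustrative reconstruction; the rule set and the token-list encoding are our own assumptions.

```python
import random

# Sketch of the testing obfuscation engine's loop (steps 1-5), assuming the
# random rule -- random offset strategy. Programs are token lists; a rule
# maps a lhs token sequence to a rhs token sequence.
RULES = [
    ([], ["NOP"]),                       # r1: injector, applicable anywhere
    (["MUL"], ["ADD", "ADD", "ADD"]),    # r2
    (["ADD"], ["BUS"]),                  # r3
]

def rewrite_once(p):
    lhs, rhs = random.choice(RULES)            # random rule
    ofs = random.randrange(len(p) + 1)         # random offset
    for i in range(ofs, len(p) - len(lhs) + 1):  # first redex after ofs
        if p[i:i + len(lhs)] == lhs:
            return p[:i] + rhs + p[i + len(lhs):]
    return p[:ofs] + ["NOP"] + p[ofs:]         # step 3 fallback: injection

def build_samples(alphabet, length, target):
    archetype = [random.choice(alphabet) for _ in range(length)]
    s_a = {".".join(archetype)}                # step 1
    while len(s_a) < target:                   # step 5
        p = random.choice(sorted(s_a)).split(".")  # step 2
        s_a.add(".".join(rewrite_once(p)))     # steps 3-4 (set deduplicates)
    return s_a

samples = build_samples(["MUL", "JMP", "ADD", "XXX"], 8, 10)
```

Because the injector fallback always changes the string, the loop is guaranteed to keep producing fresh variants.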
The second Python script is an implementation of the rule learning algorithm described in Section 4.3. The code is very close to the description given in the previous chapter. To get the unified edit operations from all pairs of input files, we tried both sequence alignment libraries from the computational biology field and the diff utility. Both use the same underlying algorithm, but diff has additional heuristics that make it a lot faster, and its output format is easier to use; for those reasons we have chosen it over sequence alignment. We should note that the implementation does not use the difflib Python package, as we ran into cases where it produced sub-optimal results. Instead, we have made use of the standard Linux diff utility, writing a Python parser to import its results into our program.
Because the diff utility was designed to work on text files, it operates on entire lines, unlike sequence alignment, which works character by character. In order to make diff effective, the creation of the initial string and the rewritings performed place a single element from our alphabet Σ on each line. This, however, should not be considered a limitation: if, for example, we wanted to apply our algorithm to x86 assembly, which has instructions of variable length, we could put one byte on each line.
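A sketch of that parsing step: with one symbol per line, the normal-format hunks produced by diff (of the shape NcM, NaM, NdM) can be mapped back to edit operations. The parser below is our own illustration; invoking diff itself (e.g. via subprocess on two temporary files) is omitted so the sketch stays self-contained.

```python
import re

# Parse GNU diff's normal output format back into edit operations.
# A hunk header like "1c1,3" means lines changed ("c"); "a" is add, "d" is
# delete. Lines starting with "< " belong to the first file, "> " to the
# second, and "---" separates the two halves of a change hunk.
HUNK = re.compile(r"^(\d+)(?:,\d+)?([acd])(\d+)(?:,\d+)?$")

def parse_diff(output):
    ops, kind, removed, added = [], None, [], []
    for line in output.splitlines():
        m = HUNK.match(line)
        if m:
            if kind:
                ops.append((kind, removed, added))
            kind, removed, added = m.group(2), [], []
        elif line.startswith("< "):
            removed.append(line[2:])
        elif line.startswith("> "):
            added.append(line[2:])
    if kind:
        ops.append((kind, removed, added))
    return ops

sample = "1c1,3\n< MUL\n---\n> ADD\n> ADD\n> ADD\n"
parse_diff(sample)   # [("c", ["MUL"], ["ADD", "ADD", "ADD"])]
```

A "c" hunk corresponds to a substitution rule, "a" to an injector and "d" to a deletion, matching the edit-operation-to-rule mapping given earlier.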
5.1 Testing
In this section we describe the parameters used for testing the implementation and the results we obtained.
Although our algorithm works on strings containing any characters, in order to simulate a real-world use case and have rules that people familiar with code obfuscations will recognise, the examples we present here use an extension of the pseudo-assembly introduced in our motivating example in Section 1.1. Let our instruction set be:

Σ := {ADD, MUL, JMP, JNE, JEQ, SUB, BUS, NOP, XXX}

The “special” element XXX is added when generating the first string, the archetype α, to represent parts of the code that do not match any rewriting rule, thus limiting the places where rules can be applied (see Assumption 4.2).
The rules of our obfuscation engine are:

R := { r1 : ǫ → NOP,
       r2 : MUL → ADD.ADD.ADD,
       r3 : ADD → BUS,
       r4 : JMP → ADD.SUB.JEQ,
       r5 : JNE → SUB.JEQ }
The testing was conducted as follows: First, generate a random string α from elements of Σ. Then, use the rules in R with the random rule — random offset strategy to generate the sample set Sα. Finally, run the implementation of the rule learning algorithm with given support and confidence thresholds.
We have chosen the values of the support threshold (0.005) and the confidence threshold (0.75) experimentally. We noticed that a high confidence threshold is effective in distinguishing “good” rules, while only a very small support is needed (mostly to avoid noise). In Table 1 we present the support and confidence scores of each rule in R, plus the rule with the highest support and the rule with the highest confidence not in R.
Table 1: Support and confidence of the rules for different sizes of the sample set.

|Sα|    rule in R   supp.    conf.     best from the rest   supp.      conf.
40      r1          0.6458   0.9602    ǫ → JNE              0.0062     0.0092
        r2          0.0866   1         ǫ → SUB              0.0062     0.0092
        r3          0.0735   1
        r4          0.0155   1
        r5          0.1519   1
200     r1          0.7797   0.9832    ǫ → JMP              0.003      0.0037
        r2          0.0213   0.9482    JEQ → JNE            6.48E-06   1
        r3          0.1153   0.992
        r4          0.0453   1
        r5          0.0228   1
The results we collected from running this experiment multiple times show that, by looking at the support and confidence of the extracted rewrite rules, it is easy to distinguish the rules that are in R. Because our rule learning algorithm does not have a strategy to process nested rule applications, occasionally we get some rewriting rules like the following: MUL → BUS.ADD.ADD, JMP → BUS.SUB.JEQ. In both of those cases, replacing BUS with ADD will transform the rules into one of the rewrite rules of R.
Adding noise: While the problem formulated in Section 4 requires all the samples given to the learning algorithm to be rewritings of a single original program, in a real-world scenario it is unlikely that we could have such a sample set for an external metamorphic obfuscation engine (it would be possible if the engine is internal, or if we somehow have access to it and can make it generate variants). Thus, it is interesting to see how our proposed algorithm performs when the sample set includes programs/strings that were not generated by rewriting the archetype α.
To do so, we generated a sample set of 100 variants as before and then added random strings (using the same alphabet Σ). In Table 2 we give the results obtained; we present the confidence and support of the rules in R as well as the rules with the highest support and confidence not in R.
Table 2: The support and confidence of the rules when adding random strings to the sample set.

|Sα|    rule in R   supp.    conf.     best from the rest      supp.      conf.
120     r1          0.1245   0.1721    JNE.JMP → MUL.ADD.MUL   5.31E-06   1
        r2          0.0139   0.3558    ǫ → MUL                 0.0389     0.0537
        r3          0.0144   0.2687    ǫ → JEQ                 0.0388     0.0536
        r4          0.0215   0.4439
        r5          0.0118   0.3307
140     r1          0.0785   0.1065    JMP.MUL → JNE.SUB.SUB   1.91E-05   1
        r2          0.0075   0.2148    ǫ → JEQ                 0.0425     0.0577
        r3          0.0076   0.1682    ǫ → MUL                 0.0418     0.0567
        r4          0.0114   0.2683
        r5          0.0084   0.2388
160     r1          0.0504   0.0687    JEQ.JEQ → JNE.SUB.MUL   0.0002     1
        r2          0.0039   0.124     JNE.ADD.JEQ → JMP.XXX   0.0002     1
        r3          0.0044   0.1009    ǫ → JEQ                 0.0439     0.0599
        r4          0.0073   0.1851
        r5          0.0035   0.1086
It is hard to establish a threshold for support and confidence that would work for any size of sample set and different percentages of noise in the sample set. Instead of a fixed threshold, it is easier to look for the rewrite rules whose support and confidence are significantly higher than the rest. It could be up to future work to find a formula that determines those thresholds. By increasing the amount of random strings we add to the input sample set, there is a point after which no combination of support and confidence thresholds can distinguish the rules of R. This is the case for the third row of Table 2, where |Sα| = 160.
Performance: The code was developed as a proof-of-concept for the algorithm and is not optimised for either memory or speed. It was instead designed to allow easy modifications as the project evolved. The largest test we have done is with 500 files, each with 200 instructions of our pseudo-assembly. With this input it took several hours to complete, running on a laptop with a CPU speed of 1300MHz. Should the theory evolve to include a wider range of obfuscations, it would be interesting to develop an optimised version of the code.
6 Related Work
There have been many proposed methods for malware classification but no definition
of external metamorphic obfuscation has been given so far and by extension no clas-
sification technique to capture malware obfuscated in that way. Our work defines a
new problem and makes the first steps towards a possible solution of it.
Previous malware classification efforts have focused on oligomorphic, polymorphic and internal metamorphic obfuscations. Researchers have applied multiple techniques to solve this classification problem. The observation that “A compromised application cannot cause much harm unless it interacts with the underlying operating system” [43] led many researchers to propose solutions based on classifying programs by the sequence of system calls they invoke. Both static [43] and dynamic [19, 22] analysis approaches have been explored, using either a manual [14] or automated [47, 13] learning process.
Another direction followed by researchers looking to improve malware classifica-
tion has been the comparison of control flow graphs (CFG). The process, proposed
by Bonfante, Kaczmarek and Marion, involves extracting the CFG of a program and
proving it is isomorphic to the CFG of a known malware [8, 9]. Vinod et al. introduced
the idea of comparing control flow graphs using the longest common subsequence of
basic blocks of code in the CFG of two programs [42]. Finally, Mehra, Jain and Uppal
combined CFG analysis with system calls to automatically select the best features
to do classification [30].
The detection and classification technique most relevant to our work is program
normalisation. The goal of program normalisation is to reduce the signature space
by undoing obfuscations in order to obtain a single normal form (or a small set of them).
Christodorescu et al. proposed a malware normaliser for three common obfusca-
tion rules and used it effectively as a pre-processor for commercial malware detectors
[15]. Bruschi et al. extended the same idea to cover a wider range of obfuscations
[11]. It is noteworthy that while this approach might improve the performance of
malware classifiers, it also has theoretical limits, as shown by Owens [34], who
proposes a way to construct non-normalisable functions for metamorphic malware.
Figure 4: Detecting malware variants using normalisation. Taken from [44].
The work most closely related to ours was carried out by Walenstein et al. [44].
They model metamorphic obfuscation engines as term rewriting systems and present
an algorithm that, given the obfuscation rules as a TRS, can produce a normalising
TRS, one that is convergent and equivalence preserving, for that particular set of
obfuscations. In their work, obfuscation rules were extracted manually from the mutation
engine that was part of the malware as they were working with internal metamorphic
malware. Such a rule extraction technique is expensive, error prone and, in the case
of externally obfuscated malware, impossible. Thus, we consider our work
complementary to theirs, as together they could become a classification technique for external
metamorphic malware. Given an oracle that could solve the problem defined in Sec-
tion 4 and generate the term rewriting system used to obfuscate, we could then use
the technique proposed by Walenstein et al. to turn it into a normaliser for all pro-
grams transformed by that obfuscation engine. While this would present a complete
solution to classification of malware generated by an external obfuscation engine, it
would still have the problem that the Knuth-Bendix completion procedure, used by
Walenstein et al., does not always terminate.
When it comes to the term rewriting system literature, to the best of our knowl-
edge, there have not been any attempts to (approximately) learn the rewriting rules
of an unknown TRS. Since term rewriting systems are equivalent to Turing machines,
the closest work in this space appears in the computational learning theory
field, such as Gold’s identification in the limit [20] and Valiant’s theory of the
learnable [41], which study the learnability of different classes of languages/problems.
7 Conclusion and Future Work
The novel nature of this research led us to many interesting problems which the limited
time for this thesis did not allow us to pursue. We have mentioned some of them
throughout Section 3 and Section 4; in this section we expand a little more on them
and suggest possible directions that could be followed.
Starting with external metamorphic obfuscations introduced in Section 3, it would
be interesting to study carefully the design space of such obfuscations as done for
internal metamorphic obfuscations by Walenstein et al. [45]. As already mentioned,
malware relying only on external obfuscation would lose its ability to self-mutate in
order to infect new hosts. A viability analysis of such a model could be done from the
point of view of a malware author. Related to the previous question, in Section 3.2
we have highlighted the notion of viability of a mutated variant. A more formal
characterisation of viability should be defined, and based on it and the properties of
the obfuscating term rewriting system (such as the existence of critical pairs/cycles)
we could search for lower and upper bounds on the number of distinct possible variants.
Besides possible future work on external metamorphic obfuscation, we have sug-
gestions for improving our work on learning obfuscations as rewrite rules.
In Section 4.3 we have distinguished what we called nested rule applications.
These are the regions where multiple rewriting rules have been applied over many
phases of the obfuscation. Because of this, the extraction of the pairwise difference
might yield a pair (r, l) such that r and l contain sub-terms of different rewrite rules
(either in their lhs or rhs). The problem that could be studied is: given the pair
(r, l) and a set R of known rewrite rules (rules inferred with high confidence), find
the shortest path, if one exists, of rewrite rule applications from the set R between
r and l.
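This shortest-path problem over string rewriting rules (the form our solution infers) can be sketched as a breadth-first search over rule applications. The rules and strings below are illustrative, not drawn from a real obfuscation engine, and the depth bound is an arbitrary safeguard since the search space is unbounded in general.

```python
from collections import deque

def shortest_rewrite_path(src, dst, rules, max_depth=6):
    """Breadth-first search for the shortest sequence of rule
    applications turning string `src` into string `dst`.
    `rules` is a list of (lhs, rhs) string rewrite rules.
    Returns the list of rules applied, or None if no path is
    found within max_depth steps."""
    frontier = deque([(src, [])])
    seen = {src}
    while frontier:
        term, path = frontier.popleft()
        if term == dst:
            return path
        if len(path) >= max_depth:
            continue
        for lhs, rhs in rules:
            # apply the rule at every position where lhs occurs
            start = term.find(lhs)
            while start != -1:
                nxt = term[:start] + rhs + term[start + len(lhs):]
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, path + [(lhs, rhs)]))
                start = term.find(lhs, start + 1)
    return None

rules = [("ab", "ba"), ("ba", "c")]
print(shortest_rewrite_path("ab", "c", rules))  # [('ab', 'ba'), ('ba', 'c')]
```

Because BFS explores by increasing path length, the first path found is guaranteed to be among the shortest, though termination is only ensured by the depth bound.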
An important limitation already discussed in Section 4.3 is that our proposed
solution will infer string rewriting rules. While this can capture simple rules, in
order to extract with high confidence more complex rules that are context sensitive
and contain variables, a more powerful learning process has to take place. That
is, to go from learning rules that match substrings to rules that match subterms.
Our preliminary research on this topic gave us two distinct directions that could be
explored for learning term rewriting rules: regular expressions and anti-unification.
Learning regular expressions from a set of lhs of rewrite rules could help discover
the invariant parts of the rule and abstract the rest. Different algorithms have been
proposed for learning deterministic finite automata [5] or more closely related, regular
expressions from positive examples [18]. The drawback of regular expressions is that,
unlike anti-unification, the formalism does not permit substitution variables.
Anti-unification is the process of constructing the least general generalisation com-
mon to two given symbolic expressions. Given two terms t1 and t2, anti-unification is
concerned with finding a term t such that both t1 and t2 are instances of t under some
substitutions [29]. With anti-unification, we get terms with distinct variables that
we could not obtain with regular expression learning. On the other hand, in order
for anti-unification to work, the initial terms must have some structure as opposed
to regular expression learning that can be applied to any string. For our particular
use case, this should not be a problem as the assembly obtained by any disassembler
has that structure (each instruction is a function symbol and each register is a vari-
able). We thus believe that anti-unification is more suitable than regular expression
learning and should be the way to learn term rewriting rules.
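A minimal first-order anti-unification sketch follows; the cited work [29] handles the more general unranked case. Terms are encoded as nested tuples whose head is the function symbol, and the instruction terms in the example are hypothetical.

```python
def anti_unify(t1, t2, subst=None, counter=None):
    """Least general generalisation of two terms.
    Terms are nested tuples (f, arg1, ..., argn) or atoms (strings).
    Mismatched subterm pairs are replaced by variables ?0, ?1, ...,
    with the same pair always mapped to the same variable."""
    if subst is None:
        subst, counter = {}, [0]
    if t1 == t2:
        return t1
    # same function symbol and arity: generalise argument-wise
    if (isinstance(t1, tuple) and isinstance(t2, tuple)
            and t1[0] == t2[0] and len(t1) == len(t2)):
        return (t1[0],) + tuple(anti_unify(a, b, subst, counter)
                                for a, b in zip(t1[1:], t2[1:]))
    # mismatch: reuse the variable for this pair, or mint a fresh one
    if (t1, t2) not in subst:
        subst[(t1, t2)] = "?%d" % counter[0]
        counter[0] += 1
    return subst[(t1, t2)]

# mov(eax, 0) vs mov(ebx, 0) generalises to mov(?0, 0)
print(anti_unify(("mov", "eax", "0"), ("mov", "ebx", "0")))
```

Note that repeated mismatches yield the same variable, so generalising add(eax, eax) against add(ebx, ebx) gives add(?0, ?0), capturing exactly the repeated-register pattern that regular expression learning cannot express.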
The word problem described in Section 2.1.1 is in general undecidable but has
been shown decidable for certain groups. The malware classification problem using
normalisation is trying to solve the word problem as it tries to determine if a malware
m1 can be reduced to the normal form n of a known malware m2. In other words,
given n and m2 such that n ↔* m2, test whether n ↔* m1, which is equivalent to
testing whether m1 ↔* m2. It would thus be interesting from a theoretical perspective
to see if malware classification using program normalisation could be reduced to a
group where the word problem is decidable. If not, a solution to this problem will
always include heuristics.
Another interesting work that could be pursued is using the proposed learning
method on variants of known internal metamorphic malware. This could be done with
the current algorithm, and if the improvements suggested above are implemented, a
comparison of the results would help validate the method used. We suggest known
obfuscations because for them we have access to the ground truth, the rules that
have been found after disassembling obfuscation engines.
Following up on the previous suggestion, future work should focus on defining the
obfuscation rewrite rule learning problem for internal obfuscations. With access to
the obfuscation engine, even as a black box, it is possible that the learning process
can be shown equivalent to the teacher–student concept used in PAC learning [27].
Finally, future research could try to use our results, both technical and theoretical,
in other fields. Possible topics could range from commercial obfuscation for
intellectual property protection to plagiarism detection.
8 Acknowledgments
First and foremost, I would like to express my gratitude towards my supervisor, Dr.
Earl Barr. More than a supervisor, he has been a true mentor to me by giving me the
right directions to help my thinking progress. Better than giving me answers, he gave
me advice and motivation to work on the interesting problems that we encountered.
For this I will always be grateful to him.
I would also like to thank Dr. David Clark for suggesting that I use association rule
mining. This simple rule learning mechanism turned out to be a perfect match for
what we needed in this work.
I am thankful to Dr. Hector Menendez Benito for giving me his opinion on parts of
my work as well as taking the time to review with me some of the background work.
For helping me with typesetting in LaTeX, I would like to thank Zheng Gao.
For her moral support and proofreading work, I am thankful to Vasiliki Meletaki.
Finally, for their support and encouragement to continue my studies I would like to
thank my parents.
References
[1] Association for Computational Learning. http://www.learningtheory.org/.
Accessed: 15-August-2016.
[2] obfuscate, Merriam–Webster Dictionary. http://www.merriam-webster.com/
dictionary/obfuscate. Accessed: 15-August-2016.
[3] L. M. Adleman. An abstract theory of computer viruses (invited talk). In
Proceedings on Advances in Cryptology, CRYPTO ’88, pages 354–374, New York,
NY, USA, 1990. Springer-Verlag New York, Inc.
[4] Rakesh Agrawal, Tomasz Imieliński, and Arun Swami. Mining association rules
between sets of items in large databases. In ACM SIGMOD Record, volume 22,
pages 207–216. ACM, 1993.
[5] Dana Angluin. Queries and concept learning. Machine learning, 2(4):319–342,
1988.
[6] Dana Angluin. Computational learning theory: survey and selected bibliogra-
phy. In Proceedings of the twenty-fourth annual ACM symposium on Theory of
computing, pages 351–369. ACM, 1992.
[7] Franz Baader and Tobias Nipkow. Term rewriting and all that. Cambridge
university press, 1999.
[8] Guillaume Bonfante, Matthieu Kaczmarek, and Jean-Yves Marion. Control
flow graphs as malware signatures. In International workshop on the Theory of
Computer Viruses, 2007.
[9] Guillaume Bonfante, Matthieu Kaczmarek, and Jean-Yves Marion. Morpholog-
ical detection of malware. In Malicious and Unwanted Software, 2008. MAL-
WARE 2008. 3rd International Conference on, pages 1–8. IEEE, 2008.
[10] William W Boone. The word problem. Annals of mathematics, pages 207–265,
1959.
[11] Danilo Bruschi, Lorenzo Martignoni, and Mattia Monga. Code normalization
for self-mutating malware. IEEE Security and Privacy, 5(2):46–54, 2007.
[12] Mohamed R Chouchane, Andrew Walenstein, and Arun Lakhotia. Statistical
signatures for fast filtering of instruction-substituting metamorphic malware.
In Proceedings of the 2007 ACM workshop on Recurring malcode, pages 31–37.
ACM, 2007.
[13] Mihai Christodorescu, Somesh Jha, and Christopher Kruegel. Mining specifica-
tions of malicious behavior. In Proceedings of the 1st India software engineering
conference, pages 5–14. ACM, 2008.
[14] Mihai Christodorescu, Somesh Jha, Sanjit A Seshia, Dawn Song, and Randal E
Bryant. Semantics-aware malware detection. In 2005 IEEE Symposium on
Security and Privacy (S&P’05), pages 32–46. IEEE, 2005.
[15] Mihai Christodorescu, Johannes Kinder, Somesh Jha, Stefan Katzenbeisser, and
Helmut Veith. Malware normalization. Technical report, University of Wiscon-
sin, 2005.
[16] Max Dauchet. Simulation of turing machines by a left-linear rewrite rule. In
International Conference on Rewriting Techniques and Applications, pages 109–
120. Springer, 1989.
[17] Nachum Dershowitz and Jean-Pierre Jouannaud. Rewrite systems. Citeseer,
1989.
[18] Henning Fernau. Algorithms for learning regular expressions from positive data.
Information and Computation, 207(4):521–541, 2009.
[19] Stephanie Forrest, Steven A Hofmeyr, Anil Somayaji, and Thomas A Longstaff.
A sense of self for unix processes. In Security and Privacy, 1996. Proceedings.,
1996 IEEE Symposium on, pages 120–128. IEEE, 1996.
[20] E Mark Gold. Language identification in the limit. Information and control,
10(5):447–474, 1967.
[21] Daniel S. Hirschberg. A linear space algorithm for computing maximal common
subsequences. Communications of the ACM, 18(6):341–343, 1975.
[22] Steven A Hofmeyr, Stephanie Forrest, and Anil Somayaji. Intrusion detection
using sequences of system calls. Journal of computer security, 6(3):151–180,
1998.
[23] Gérard Huet. Confluent reductions: Abstract properties and applications to
term rewriting systems. Journal of the ACM (JACM), 27(4):797–821, 1980.
[24] Gérard Huet and Dallas Lankford. On the uniform halting problem for term
rewriting systems. IRIA. Laboratoire de Recherche en Informatique et Automa-
tique, 1978.
[25] J. W. Hunt and M. D. McIlroy. An Algorithm for Differential File Comparison.
1976.
[26] Richard M Karp. On-line algorithms versus off-line algorithms: How much is
it worth to know the future? In Proceedings of the IFIP 12th World Computer
Congress on Algorithms, Software, Architecture-Information Processing’92, Vol-
ume 1-Volume I, pages 416–429. North-Holland Publishing Co., 1992.
[27] Michael J Kearns and Umesh Virkumar Vazirani. An introduction to computa-
tional learning theory. MIT press, 1994.
[28] Donald E Knuth and Peter B Bendix. Simple word problems in universal alge-
bras. In Automation of Reasoning, pages 342–376. Springer, 1983.
[29] Temur Kutsia, Jordi Levy, and Mateu Villaret. Anti-unification for unranked
terms and hedges. Journal of Automated Reasoning, 52(2):155–190, 2014.
[30] Vishakha Mehra, Vinesh Jain, and Dolly Uppal. Dacomm: Detection and clas-
sification of metamorphic malware. In Communication Systems and Network
Technologies (CSNT), 2015 Fifth International Conference on, pages 668–673.
IEEE, 2015.
[31] Eugene W Myers. An O(ND) difference algorithm and its variations. Algorith-
mica, 1(1-4):251–266, 1986.
[32] Saul B Needleman and Christian D Wunsch. A general method applicable to
the search for similarities in the amino acid sequence of two proteins. Journal
of molecular biology, 48(3):443–453, 1970.
[33] Philip O’Kane, Sakir Sezer, and Kieran McLaughlin. Obfuscation: the hidden
malware. IEEE Security & Privacy, 9(5):41–47, 2011.
[34] Rodney Owens and Weichao Wang. Non-normalizable functions: A new method
to generate metamorphic malware. In 2011-MILCOM 2011 Military Communi-
cations Conference, pages 1279–1284. IEEE, 2011.
[35] Dana Ron. Automata Learning and its Applications. PhD thesis, Hebrew Uni-
versity, 1995.
[36] Eric D Simonaire. Sub-circuit selection and replacement algorithms modeled as
term rewriting systems. Technical report, DTIC Document, 2008.
[37] Peter Szor. The art of computer virus research and defense. Pearson Education,
2005.
[38] Yoshihito Toyama. Commutativity of term rewriting systems. Programming of
future generation computers II, pages 393–407, 1988.
[39] György Turán. Remarks on computational learning theory. Annals of
Mathematics and Artificial Intelligence, 28(1):43–45, 2000.
[40] Muhammad Afzal Upal. Learning plan rewriting rules. In Proceedings of the
Fourteenth International Florida Artificial Intelligence Research Society Confer-
ence, pages 412–416. AAAI Press, 2001.
[41] Leslie G Valiant. A theory of the learnable. Communications of the ACM,
27(11):1134–1142, 1984.
[42] P Vinod, Vijay Laxmi, Manoj Singh Gaur, GVSS Kumar, and Yadvendra S
Chundawat. Static CFG analyzer for metamorphic malware code. In Proceedings
of the 2nd international conference on Security of information and networks,
pages 225–228. ACM, 2009.
[43] David Wagner and R Dean. Intrusion detection via static analysis. In Security
and Privacy, 2001. S&P 2001. Proceedings. 2001 IEEE Symposium on, pages
156–168. IEEE, 2001.
[44] Andrew Walenstein, Rachit Mathur, Mohamed R Chouchane, and Arun Lakho-
tia. Normalizing metamorphic malware using term rewriting. In Sixth IEEE
International Workshop on Source Code Analysis and Manipulation, pages 75–
84. IEEE, 2006.
[45] Andrew Walenstein, Rachit Mathur, Mohamed R Chouchane, and Arun Lakho-
tia. The design space of metamorphic malware. In 2nd International Conference
on i-Warfare and Security, pages 241–248, 2007.
[46] Wikipedia. Rewriting — wikipedia, the free encyclopedia. https://en.
wikipedia.org/w/index.php?title=Rewriting&oldid=698782291. Accessed:
24-August-2016.
[47] Qinghua Zhang and Douglas S Reeves. Metaaware: Identifying metamorphic
malware. In Computer Security Applications Conference, 2007. ACSAC 2007.
Twenty-Third Annual, pages 411–420. IEEE, 2007.
[48] Zhi-hong Zuo, Qing-xin Zhu, and Ming-tian Zhou. On the time complexity of
computer viruses. IEEE Transactions on information theory, 51(8):2962–2966,
2005.

Method-Level Code Clone Modification using Refactoring Techniques for Clone M...
 
Automatic reverse engineering of malware emulators
Automatic reverse engineering of malware emulatorsAutomatic reverse engineering of malware emulators
Automatic reverse engineering of malware emulators
 
Software Refactoring Under Uncertainty: A Robust Multi-Objective Approach
Software Refactoring Under Uncertainty:  A Robust Multi-Objective ApproachSoftware Refactoring Under Uncertainty:  A Robust Multi-Objective Approach
Software Refactoring Under Uncertainty: A Robust Multi-Objective Approach
 
76201929
7620192976201929
76201929
 
H04544759
H04544759H04544759
H04544759
 
Harnessing deep learning algorithms to predict software refactoring
Harnessing deep learning algorithms to predict software refactoringHarnessing deep learning algorithms to predict software refactoring
Harnessing deep learning algorithms to predict software refactoring
 
Unveiling Metamorphism by Abstract Interpretation of Code Properties
Unveiling Metamorphism by Abstract Interpretation of Code PropertiesUnveiling Metamorphism by Abstract Interpretation of Code Properties
Unveiling Metamorphism by Abstract Interpretation of Code Properties
 
BIDIRECTIONAL LONG SHORT-TERM MEMORY (BILSTM)WITH CONDITIONAL RANDOM FIELDS (...
BIDIRECTIONAL LONG SHORT-TERM MEMORY (BILSTM)WITH CONDITIONAL RANDOM FIELDS (...BIDIRECTIONAL LONG SHORT-TERM MEMORY (BILSTM)WITH CONDITIONAL RANDOM FIELDS (...
BIDIRECTIONAL LONG SHORT-TERM MEMORY (BILSTM)WITH CONDITIONAL RANDOM FIELDS (...
 
BIDIRECTIONAL LONG SHORT-TERM MEMORY (BILSTM)WITH CONDITIONAL RANDOM FIELDS (...
BIDIRECTIONAL LONG SHORT-TERM MEMORY (BILSTM)WITH CONDITIONAL RANDOM FIELDS (...BIDIRECTIONAL LONG SHORT-TERM MEMORY (BILSTM)WITH CONDITIONAL RANDOM FIELDS (...
BIDIRECTIONAL LONG SHORT-TERM MEMORY (BILSTM)WITH CONDITIONAL RANDOM FIELDS (...
 
Discrete event systems comprise of discrete state spaces and event
Discrete event systems comprise of discrete state spaces and eventDiscrete event systems comprise of discrete state spaces and event
Discrete event systems comprise of discrete state spaces and event
 
A NOVEL APPROACH TO ERROR DETECTION AND CORRECTION OF C PROGRAMS USING MACHIN...
A NOVEL APPROACH TO ERROR DETECTION AND CORRECTION OF C PROGRAMS USING MACHIN...A NOVEL APPROACH TO ERROR DETECTION AND CORRECTION OF C PROGRAMS USING MACHIN...
A NOVEL APPROACH TO ERROR DETECTION AND CORRECTION OF C PROGRAMS USING MACHIN...
 
The Last Line Effect
The Last Line EffectThe Last Line Effect
The Last Line Effect
 
CORRELATING FEATURES AND CODE BY DYNAMIC AND SEMANTIC ANALYSIS
CORRELATING FEATURES AND CODE BY DYNAMIC AND SEMANTIC ANALYSISCORRELATING FEATURES AND CODE BY DYNAMIC AND SEMANTIC ANALYSIS
CORRELATING FEATURES AND CODE BY DYNAMIC AND SEMANTIC ANALYSIS
 
A hybrid model to detect malicious executables
A hybrid model to detect malicious executablesA hybrid model to detect malicious executables
A hybrid model to detect malicious executables
 
Software Defect Prediction Using Radial Basis and Probabilistic Neural Networks
Software Defect Prediction Using Radial Basis and Probabilistic Neural NetworksSoftware Defect Prediction Using Radial Basis and Probabilistic Neural Networks
Software Defect Prediction Using Radial Basis and Probabilistic Neural Networks
 
carl-svensson-exjobb-merged
carl-svensson-exjobb-mergedcarl-svensson-exjobb-merged
carl-svensson-exjobb-merged
 

414351_Iason_Papapanagiotakis-bousy_Iason_Papapanagiotakis_Thesis_2360661_357965939

3.2 Obfuscation Genealogy . . . . . . . . . . . . . . . . . . . . . . . . . . 20

4 Learning Obfuscation Rules from Finite Malware Samples 24
4.1 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
4.2 Problem Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.3 An Approximate Solution . . . . . . . . . . . . . . . . . . . . . . . . 28

5 Implementation 35
5.1 Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

6 Related Work 40

7 Conclusion and Future Work 43

8 Acknowledgments 46
1 Introduction

The expansion of connected computing systems makes them a lucrative target for criminals, who have evolved into organised professional groups or state-sponsored cyberwarfare units that produce malicious software (malware) to achieve their goals. This is apparent from everyday news reporting security breaches, electronic scams and fraud, but also from the evolution of the information security field, where professionals and academics work on detecting, classifying and mitigating such attacks. Malware authors and security researchers are in a perpetual arms race: the former discover new attack vectors and invent increasingly complex techniques to avoid detection, while the latter work on protecting information systems from such attacks.

One of the earliest techniques employed to detect known malware has been signatures. Anti-malware software keeps a list of the programs classified as malware by researchers and, before running a program, checks that it does not appear in that list. To counter that measure, malware authors have developed obfuscation techniques that change the "appearance" (syntactic representation) of a program. This creates a broad classification of malware with respect to the obfuscation used. Malware researchers have designated three obfuscation classes, given here in increasing order of complexity: Oligomorphic, Polymorphic and Metamorphic. For the evolution of obfuscations and their countermeasures we refer to the work of O'Kane et al. [33].

Our work focuses on the class of metamorphic obfuscations, described in Section 3. The other two classes have been widely studied but are still relevant; polymorphic malware in particular has many open questions for researchers. The term metamorphic obfuscations refers to the set of semantics-preserving code transformations that can be used to alter the syntax of a program.
This code transformation is done by what is called the obfuscation engine. While the common case studied by researchers is for the obfuscation engine to be part of the malware, in this work we describe how this could be done differently. Specifically, we distinguish internal from external obfuscation engines and formally define the latter in Section 3.
We describe the advantages (and disadvantages) of using an external obfuscation engine, as well as some assumptions on how it might apply the obfuscations and their impact.

Central to our definition of external metamorphic obfuscation engines, and to the rest of our work, are term rewriting systems. Modelling an obfuscation engine as a term rewriting system has been proposed as an elegant formalism that maps obfuscations to rewriting rules. In Section 2.1 we give an introduction to term rewriting systems and the notation used throughout the rest of this work.

Having defined metamorphic obfuscations using term rewriting systems, in Section 4 we tackle the following research questions: "Are metamorphic obfuscations learnable from a sample of obfuscated programs?" and, if so, "Under what assumptions can we learn the rules of a term rewriting system that approximates the obfuscation engine?" and "What classes of rewriting rules are learnable?". To do so, we combine term rewriting systems theory, computational learning theory and association rule mining, introduced in Section 2.1, Section 2.3 and Section 2.4 respectively.

The main contributions of this work are:

1. We define external metamorphic obfuscation engines using term rewriting systems. We describe their strengths and weaknesses and hypothesise on how they might work internally.

2. We define the problem of learning the rewriting rules of an obfuscation engine given a finite sample of program variants generated from a single archetype program.

3. We prove the impossibility of the rewrite rule learning problem described above and relax it to an optimization problem with a trivial solution.

4. We give an algorithm for solving the relaxed problem under some assumptions and argue that it is the first step towards a more general result.
1.1 A Motivating Example

To give some context for the learning problem we define in Section 4, we now give a small example. Suppose a malware author wrote a program α in a programming language with only the following instructions Σ = {ADD, MUL, NOP, JMP} (operands are ignored for simplicity). Let α = NOP.ADD.MUL.JMP.ADD.JMP.MUL. Now the author of α wants to obfuscate the program to make it look different. To do so, the author uses the following two rules: R := {MUL → ADD.ADD.ADD, ε → NOP}. By randomly applying some of these rules to parts of program α, he ends up with some variants of the original program. Let S denote the collection of those variants and suppose

S = {NOP.ADD.NOP.MUL.JMP.ADD.JMP.MUL,
     NOP.ADD.MUL.JMP.ADD.JMP.ADD.ADD.ADD,
     NOP.ADD.MUL.JMP.ADD.NOP.JMP.MUL}

For a malware researcher, the problem is to classify all programs in S as different forms of the same original program. To do so, it is very useful for the researcher to know which obfuscating program transformations were used. This brings us to our main research problem: given S, learn R.
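The mutation process in this example can be sketched in code. The following is a minimal illustration of applying R randomly to α, not the author's actual engine; the function names and the fixed number of rewriting steps are our own assumptions:

```python
import random

# Archetype program alpha and the rule set R from the example above.
ALPHA = ["NOP", "ADD", "MUL", "JMP", "ADD", "JMP", "MUL"]

def apply_mul_rule(prog):
    """MUL -> ADD.ADD.ADD: contract one randomly chosen MUL redex."""
    redexes = [i for i, ins in enumerate(prog) if ins == "MUL"]
    if not redexes:
        return prog
    i = random.choice(redexes)
    return prog[:i] + ["ADD", "ADD", "ADD"] + prog[i + 1:]

def apply_nop_rule(prog):
    """eps -> NOP: insert a NOP at a random position."""
    i = random.randrange(len(prog) + 1)
    return prog[:i] + ["NOP"] + prog[i:]

def obfuscate(prog, steps=2):
    """Randomly perform `steps` rule applications to produce one variant."""
    for _ in range(steps):
        prog = random.choice([apply_mul_rule, apply_nop_rule])(prog)
    return prog

# Generate a small sample S of variants of alpha.
S = {".".join(obfuscate(ALPHA)) for _ in range(3)}
for variant in sorted(S):
    print(variant)
```

Note that both rules only grow the program, so every variant is at least as long as α; the learning problem studied later is to recover R from a sample like S.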
2 Background

2.1 Term Rewriting Systems

Term rewriting systems (TRS) provide an elegant, abstract and simple, yet powerful, computation mechanism. TRS are a fully general programming paradigm, as they can simulate Turing machines [24, 16]. The basic idea is very simple: replacement of equals by equals, by applying symbolic equations over symbolically structured objects, terms. Applying equations in one direction only immediately leads to the concept of (directed) term rewriting. The most well-known TRS is probably the λ-calculus, which has played a crucial role in mathematical logic with respect to formalising the notion of computability [35].

In the rest of this section we give the basic notation and properties that are necessary to our work in the following chapters. We have made an effort to keep the notation consistent with the TRS literature, but where conventions conflict we have chosen what we believe makes most sense. The information presented here was taken from the works of Franz Baader and Tobias Nipkow [7], Nachum Dershowitz and Jean-Pierre Jouannaud [17], Yoshihito Toyama [38], Dana Ron [35] and Wikipedia [46].

Let Var(s) denote the set of variables occurring in a term s.

Definition 2.1. A rewrite rule is an identity l ≈ r such that l is not a variable and Var(l) ⊇ Var(r). In this case we may write l → r instead of l ≈ r. The left-hand side and the right-hand side of a rule r = (l, r) are given by lhs(r) and rhs(r) respectively.

Definition 2.2. A term rewriting system (TRS for short) is a pair ⟨T, R⟩ consisting of a set of terms T and a set R ⊆ T × T of (rewrite or reduction) rules.

A redex (reducible expression) is an instance of the lhs of a rewrite rule. Contracting the redex means replacing it with the corresponding instance of the rhs of the rule.

Definition 2.3. Let ⟨T, R⟩ be a TRS.
(1) → is the one-step rewrite relation. We write a → b iff there is a redex of a rule r ∈ R in a and contracting it produces b: ∃r ∈ R s.t. a = x1.lhs(r).x2 ∧ b = x1.rhs(r).x2.

(2) →* is the transitive closure of → ∪ =, where = is the identity relation; i.e. →* is the smallest preorder (reflexive and transitive relation) containing →. It is also called the reflexive transitive closure of →.

(3) ↔ is → ∪ ←, that is, the union of the relation → with its inverse relation, also known as the symmetric closure of →.

(4) ↔* is the transitive closure of ↔ ∪ =, that is, ↔* is the smallest equivalence relation containing →. It is also known as the reflexive transitive symmetric closure of →.

Definition 2.4. A term x ∈ T is called reducible iff ∃y ∈ T such that x → y; otherwise x is called irreducible or a normal form.

Definition 2.5. Two terms x, y ∈ T are joinable iff ∃z ∈ T such that x →* z and y →* z; z may or may not be a normal form. If x and y are joinable we write x ↓ y.

Since multiple rewrite rules can have multiple redexes on a term, applying them in a different order obviously leads to non-deterministic computations. This leads to two fundamental questions on TRS:

• Do all computations eventually stop?
• If two sequences of rewritings, starting from the same term, diverge at one point, can they eventually be rejoined?

The first question is the termination problem and the second the confluence problem.

Definition 2.6. A rewriting system ⟨T, R⟩ is terminating, also called noetherian, iff for every t ∈ T there is no infinite sequence of rewritings starting from t. In such a system every object has at least one normal form.
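For string rewriting (the case used in our motivating example), Definitions 2.3, 2.4 and 2.6 can be made concrete. The sketch below is our own illustration, representing rules as (lhs, rhs) string pairs; note that normalize terminates only when the rule set is terminating, and its result is unique only when the system is also confluent:

```python
def one_step(term, rules):
    """All terms reachable from `term` by contracting a single redex:
    the one-step rewrite relation of Definition 2.3, for strings."""
    successors = set()
    for lhs, rhs in rules:
        i = term.find(lhs)
        while i != -1:
            successors.add(term[:i] + rhs + term[i + len(lhs):])
            i = term.find(lhs, i + 1)
    return successors

def is_normal_form(term, rules):
    """An irreducible term (Definition 2.4) admits no one-step rewrite."""
    return not one_step(term, rules)

def normalize(term, rules):
    """Follow -> until a normal form is reached. Loops forever on a
    non-terminating system; the answer is canonical only if the
    system is confluent (any successor choice then joins up)."""
    while True:
        successors = one_step(term, rules)
        if not successors:
            return term
        term = min(successors)

rules = [("MUL", "ADDADDADD")]
print(one_step("MULJMPMUL", rules))
print(normalize("MULJMPMUL", rules))
```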
Theorem 2.1 (Term Rewriting and All That). The following problem is in general undecidable: given a finite term rewriting system R = ⟨T, R⟩, is R terminating, i.e. is there no term t ∈ T starting an infinite reduction sequence?

Although undecidable in the general case, the termination problem has decidable subclasses that are of interest. A term rewriting system R = ⟨T, R⟩ such that ∀r ∈ R, Var(rhs(r)) = ∅ is called right-ground. If for a right-ground TRS it also holds that ∀r ∈ R, Var(lhs(r)) = ∅, the TRS is a string rewriting system, usually called a semi-Thue system. The termination problem is decidable for right-ground TRS and semi-Thue systems [7].

Definition 2.7 (The Church-Rosser property). A rewriting system is confluent iff ∀x, y ∈ T, x ↔* y =⇒ x ↓ y.

Definition 2.8. A TRS R = ⟨T, R⟩ is locally confluent iff ∀x, y, z ∈ T, x → y ∧ x → z =⇒ y ↓ z.

Local confluence is important for showing confluence of a term rewriting system. Huet [23] shows that a terminating TRS is confluent iff it is locally confluent. The deciding factor for local confluence is the rules that form critical pairs. This concept was discovered by Knuth and Bendix [28] in their paper that delivered fundamental results on the confluence of term rewriting systems. Two rules r1, r2 form a critical pair if the prefix of one matches the suffix of the other, or if one is a subterm of the other.

Theorem 2.2. A terminating TRS is confluent iff all its critical pairs are joinable.

This result led to the Knuth-Bendix completion algorithm, which tries to resolve critical pair divergences by adding fitting rewrite rules.

Definition 2.9. A term rewriting system that is terminating and confluent is called convergent.

2.1.1 The Word Problem

In this section we give a description of the word problem, one of the most important and well-studied problems in term rewriting systems research. Here, we want
to introduce the problem for readers who are not familiar with it; later on, in Section 4, we argue its relevance to our research problem.

Definition 2.10 (The Word Problem). Given a term rewriting system ⟨T, R⟩ and x, y ∈ T, are x and y equivalent under the relation ↔* induced by R?

The problem is mostly encountered in abstract algebra, where it is formulated as "algorithmically determine whether two representations of elements of a group are different encodings of the same element". The problem is in general undecidable, and it has been shown by William Boone that "there is exhibited a group given by a finite number of generators and a finite number of defining relations and having an unsolvable word problem" [10].

Nevertheless, there has been plenty of research proving decidability of the word problem for specific types of groups. One of the most important results has been the Knuth-Bendix completion algorithm, mentioned earlier, which can transform a terminating term rewriting system into a convergent one [28]. This makes the word problem decidable, since the system is terminating and each term has a single normal form. The limitation of the algorithm comes from the fact that it is solving an, in general, undecidable problem, which means that either it succeeds and outputs the convergent TRS or it does not terminate. This "weakness" could be exploited in an adversarial model by using techniques such as those proposed by Simonaire to render the algorithm useless [36]. In practice, however, it is the best known term rewriting system completion algorithm.

2.2 Finding Differences

As shown in the simple example given in the introduction, applying different obfuscation rules to different redexes will result (in most cases) in different strings. The fact that the strings we work on are different gives us little to no information; we are rather interested in how they differ. Finding the best (most compact) way of expressing the difference of two strings amounts to computing the Levenshtein distance
together with the necessary edit transcript of the two strings. This is also equivalent to computing the longest common subsequence, which focuses on finding the common parts, while the Levenshtein distance is about computing the differences.

Definition 2.11 (Levenshtein Distance). Given two strings s1 and s2 over an alphabet Σ and the edit operations:

• Insert a symbol: ε → x, such that uv gives uxv.
• Delete a symbol: x → ε, such that uxv gives uv.
• Substitute a symbol: x → y, where x ≠ y, such that uxv gives uyv.

the Levenshtein distance of s1 and s2 is the minimum number of operations required to transform s1 into s2.

The Levenshtein distance belongs to the family of edit distances; other edit distance metrics use a different set of edit operations or a different cost for each operation (in the Levenshtein distance each operation has a cost of 1).

Definition 2.12. The global sequence alignment problem is to compute the Levenshtein distance along with the transcript of necessary edit operations.

The problem was first solved by Needleman and Wunsch with dynamic programming [32]. The complexity of their algorithm for two strings of lengths M and N is O(MN) in time and O(MN) in space. Hirschberg invented a modification that reduces the space complexity to O(N) [21]. Hunt and McIlroy [25] proposed a heuristic improvement with text files in mind; their algorithm was implemented for the first version of the Unix diff program. The diff program was later updated to use Myers' algorithm [31], which has linear space complexity and achieves an expected-case time complexity of O((M + N) + D²), where D is the length of the shortest edit script.

2.3 Computational Learning Theory

In this section we give an introduction to the field of Computational Learning Theory. This is not aimed at fully covering the research field, but rather at introducing its basic
concepts to the reader and motivating some of the problems we encounter in the following sections.

From the Association for Computational Learning: Computational Learning Theory is a research field, part of Artificial Intelligence, that studies the design and analysis of Machine Learning algorithms. In particular, such algorithms aim at making accurate predictions or representations based on observations. The emphasis in Computational Learning Theory is on rigorous mathematical analysis, using techniques from various connected fields such as probability, statistics, optimization, information theory and geometry. While theoretically rooted, learning theory puts a strong emphasis on efficient computation as well [1]. Learning problems are modeled as an algorithm (the learner) that tries to learn a concept (i.e. a language) given an information presentation method.

The field originated with the work of Mark Gold, who studied language learnability [20]. In his paper, Gold defines language learning as an infinite process where, at each time unit, the learner receives a unit of information and is to make a guess as to the identity of the unknown language on the basis of the information received so far. He considers a class of languages learnable, or identifiable in the limit, with respect to the information presentation method used, if there is an algorithm such that, given any language of the class, there is some finite time after which the guesses will all be correct. The information presentation methods studied were:

• Text, where the learner is presented with strings from the target language L in random order.
• Informant, which can label strings as belonging to the target language L or not, and can choose a specific order in which to present information to the learner.

Gold found that under this model of learning the class of context-sensitive languages is learnable from an informant, but not even the class of regular languages is learnable from text.
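Identification in the limit can be illustrated on a toy class of finite languages, for which learning from text is possible: the learner below always guesses the smallest language in its class consistent with the strings seen so far, so once the presentation has shown enough of the target language the guess never changes. The class and all names here are our own toy example, not from Gold's paper:

```python
# A toy hypothesis class of three finite languages.
CLASS = {
    "L1": {"a", "ab"},
    "L2": {"a", "b"},
    "L3": {"a", "ab", "abb"},
}

def learner(presentation):
    """After each string of the text, guess the smallest language in
    CLASS containing everything seen so far. For a class of finite
    languages this strategy identifies the target in the limit."""
    seen = set()
    guesses = []
    for s in presentation:
        seen.add(s)
        candidates = [name for name, lang in CLASS.items() if seen <= lang]
        if candidates:
            guesses.append(min(candidates, key=lambda n: (len(CLASS[n]), n)))
        else:
            guesses.append(None)  # no consistent hypothesis in the class
    return guesses

# A text for L3: the guess converges to L3 once "abb" has appeared.
print(learner(["a", "ab", "abb", "abb"]))
```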
In 1984 Leslie Valiant proposed the Probably Approximately Correct (PAC) model of learning [41]. PAC is considered amongst the most significant results in computational learning theory, as it provided an attractive general model to study
the computational, statistical and other aspects of learning [39]. As in Gold's work, the PAC framework has a learner that receives samples and must form a generalisation function (called the hypothesis) that will allow it to classify unseen instances. The learner is either presented with samples following an arbitrary distribution or has access to an informant to which it can make queries. Unlike identification in the limit, PAC allows the hypothesis to have a bounded generalisation error (the "approximately correct" part) with high probability (the "probably" part). A more detailed overview of the computational learning theory field can be found in the book of Kearns and Vazirani [27] and the papers of Angluin [6] and Turán [39].

As our main research question is the learning problem defined in Section 4, we explored Computational Learning Theory in order to have a formal framework in which to characterise our problem. The first step was to learn the formalism used in this space and its main results. Once we had defined our learning problem, we tried to map it to other learning problems that had already been studied, in order to leverage existing results and reach a conclusion about our research question.

2.4 Association Rule Mining

Our research problem, defined in Section 4, is one of learning rules. More precisely, we are interested in finding the smallest set of rules that can "describe" the largest set of pairs of strings. This problem has been studied extensively in the field of data mining, giving birth to association rule mining.

Association rule mining, credited to Agrawal et al. [4], defines a set of measures of interestingness of patterns found in large sets of transactions. It was originally conceived to discover common patterns in shopping habits from data generated by point-of-sale systems. In our rule selection we used the two best-known measures, support and confidence.
We should point out that in our use case all transactions contain exactly two items. Thus, we give the definitions of support and confidence for transactions of two items. Let T be a collection of transactions of exactly two items. We consider every
transaction to be a rule. Support measures the coverage of a transaction t ∈ T, or how frequent it is:

    supp(t) = (# of occurrences of t) / |T|

Confidence measures the accuracy of a rule, or how frequently lhs(t) implies rhs(t):

    conf(t) = (# of occurrences of t) / (# of occurrences of lhs(t))

Rule mining is usually done by manually defining a minimum support and then looking for rules with high confidence.
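For two-item transactions these measures can be computed directly by counting. The following sketch is our own illustration (the threshold values and transaction data are illustrative, not taken from our evaluation):

```python
from collections import Counter

def mine_rules(transactions, min_supp, min_conf):
    """Support/confidence mining for two-item transactions, where each
    transaction (lhs, rhs) is itself treated as a candidate rule."""
    n = len(transactions)
    rule_counts = Counter(transactions)                 # occurrences of t
    lhs_counts = Counter(lhs for lhs, _ in transactions)  # occurrences of lhs(t)
    mined = []
    for (lhs, rhs), count in rule_counts.items():
        supp = count / n
        conf = count / lhs_counts[lhs]
        if supp >= min_supp and conf >= min_conf:
            mined.append((lhs, rhs, supp, conf))
    return mined

# Toy transactions: candidate rewrites extracted from pairs of variants.
pairs = [("MUL", "ADD.ADD.ADD")] * 3 + [("MUL", "NOP.MUL"), ("JMP", "JMP.NOP")]
print(mine_rules(pairs, min_supp=0.2, min_conf=0.5))
```

Here ("MUL", "NOP.MUL") meets the support threshold but is filtered out by confidence, since MUL usually rewrites to ADD.ADD.ADD.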
3 Offline Metamorphic Obfuscation

obfuscate • from the Latin ob- "in the way" and fuscus "dark brown", means to make obscure/confusing [2].

In this dissertation we consider program obfuscations, that is, obfuscations that modify a program in order to make it "look" different while maintaining its functionality. The use cases of program obfuscation vary; our work uses malware obfuscation for examples and as motivation, but the following ideas may be applicable to a wider range of program obfuscation use cases.

Metamorphism is a class of obfuscations that aims to transform a program into a different but equivalent one in order to avoid detection. As described by Szor [37], "they are able to create new generations that look different". A metamorphic obfuscation engine can be modeled as a term rewriting system [44] that modifies the syntax of a program p and outputs a syntactically different program p′ that maintains the semantics of p.

Let P be the set of all syntactically correct programs in some instruction set (e.g. x86 assembly). Let e : P → N be a function that returns the computational resources (both space and time) required by a program.

Definition 3.1 (Metamorphic Obfuscation Engine). A metamorphic obfuscation engine is a tuple O = ⟨P, R, A⟩, where ⟨P, R⟩ is a term rewriting system over programs and A is an algorithm for applying the rules in R. p′ = O(p) is such that:

    ⟦p′⟧ = ⟦p⟧ (the semantics of p are preserved)    (1)
    p′ ≠ p    (2)
    ∃n ∈ N : e(p′) ∈ O(e(p)^n)    (3)

The first condition is needed in order for the metamorphic engine not to break the functionality of the program. The second condition represents the syntactic transformation of the input program p; the more different the output is from the input, the better the obfuscation. Finally, the constraints on the time and space
complexity of the output program are non-functional requirements. If p′ needed exponentially more resources than p, it could not execute correctly in similar execution environments.

The effort to formally define the space of malware, and in particular viruses, started with Adleman [3], who gave a formal description of the different types of computer viruses. In his work he identifies that their key functions are injure, infect and imitate. At that time, Adleman did not consider program obfuscation. Zuo et al. [48] updated those definitions to capture the effect of obfuscation that modern malware uses. In their paper, Zuo et al. define metamorphic viruses as follows: let (d, p) denote the environment (data and programs). T(d, p) and I(d, p) are called the trigger and infection condition, respectively. When T(d, p) holds, the virus executes the injury function D(d, p), and when I(d, p) holds, the virus uses S(p) to select a program to infect.

Definition 3.2 (Metamorphic Virus [48]). The pair (v, v′) of two different total recursive functions v and v′ is called a metamorphic virus if for all x, (v, v′) satisfies:

    φv(x) = D(d, p)               if T(d, p)
            φx(d, p[v′(S(p))])    if I(d, p)      (4)
            φx(d, p)              otherwise

and

    φv′(x) = D′(d, p)             if T′(d, p)
             φx(d, p[v(S′(p))])   if I′(d, p)     (5)
             φx(d, p)             otherwise

This can be extended to a tuple of n recursive functions (v1, v2, . . . , vn) to capture more complex metamorphic viruses.

In this definition, the obfuscation engine is assumed to be inside the malware: at the time of infection the obfuscation engine generates the new variant. We call this an internal obfuscation engine, as it is part of the malware. However, the obfuscation engine could be separated from the malware and generate variants of the original program at an earlier time; we will call such an obfuscation engine external.
This difference has been mentioned by Walenstein et al. and O'Kane et al. as open- and closed-world obfuscations [45, 33]. Both terminologies capture the same notion, but while internal/external focuses on the obfuscation process, open/closed focuses on the set of known facts.

Informally, we say that malware obfuscation is internal when the obfuscation is done by the malware at the time of infection of a new host. External obfuscation is the case when the obfuscation of the malware is independent of the infection. Internal obfuscation is the common case studied by researchers, possibly owing to the "streetlight effect": it is easier to study obfuscation engines that can be recovered by disassembly than to hypothesise their existence in the hands of malware authors. In this work we argue that there is value in defining the external model for metamorphic obfuscations. Consider, for example, malware obfuscated by humans; it has to be external obfuscation. This could be the case in spear phishing, where the number of targets is very small and the "success" of the malware very important. In such an instance, a human could rewrite the malware to obfuscate its function before sending it to the victim.

Definition 3.3 (External Metamorphic Obfuscation). Let O denote an obfuscation engine as described in Definition 3.1 and p a program. The program p′ is externally obfuscated metamorphically iff:

    p′ = O(p) ∧ O ∉ p′

The difference with Definition 3.2 is, firstly, that while Zuo et al. use total recursive functions to capture the obfuscations, we use term rewriting systems, like Walenstein et al. [44], as we found them more appropriate. Secondly, unlike Zuo et al. and Walenstein et al. [48, 44, 45], we separate the obfuscation from the infection.

Walenstein et al. made an analysis of the design space of metamorphic malware that includes their obfuscation engine [45].
The paper explains why internal metamorphic obfuscation engines are difficult to design. Their complexity is due to the design space being a recursive one.
From the point of view of the author of a metamorphic obfuscation engine, there are many benefits to choosing the external model over the internal one.

(1) Simplification. The design space is no longer necessarily recursive. The metamorphic engine does not have to locate the malware payload and disassemble it, since it can work directly on the source code. Additionally, the approximate control flow graph can easily be constructed and non-normalisable obfuscations can be applied.

(2) Stealth. Complex internal obfuscation engines like Win32/Simile tend to be large segments of code, making up to 90% of the size of the malware. Removing the engine significantly reduces the footprint of the malware, making it less likely to be detected.

(3) Durability. Metamorphic engines are very complicated software and valuable for malware authors, but despite all countermeasures, internal metamorphic engines have been reverse engineered. Once the mutations are known to the research community, the effectiveness of the engine is greatly reduced. With the external model, malware authors protect the metamorphic engine from direct disassembly.

The downside of having an external metamorphic engine is that the malware is no longer self-contained when it comes to infecting a new target with a modified copy of itself. One possibility would be for it to contact a remote server to get a new instance when it is about to infect a new target. Alternatively, malware authors may not be interested in the self-propagating property and instead leverage the increasing number of attack vectors to infect new hosts (e.g. a web server under the control of the malware author obfuscates a malicious JavaScript file before serving it to its clients).

3.1 Application Strategies

Rewriting systems offer a very convenient way to model rewrite obfuscation engines but do not provide an algorithm for changing one term to another. They are in general, as mentioned in Section 2.1, a non-deterministic computational model. However, obfuscation engines are deterministic processes, as they are real programs. To go from the (possibly) non-deterministic computational model to a program, an obfuscation engine needs an algorithm A to "enforce determinism"¹. The general description of A is as follows:

    Data: set of rewrite rules R; input string p; number of iterations maxiter;
    Result: a string p′ ≠ p such that p′ ↔ⁿ_R p : n ∈ ℕ ∧ n ≤ maxiter
    i = 0;
    p′ = p;
    redexes = list of all the redexes of rules in R found in p′;
    while i ≤ maxiter ∧ redexes is not empty do
        r = select_redex(redexes);
        p′ = contract(r);
        update redexes;
        i = i + 1
    end
    return p′

    Algorithm 1: The template for a TRS rule application strategy

The obfuscation engine O can then run the algorithm A to generate a variant of a program p. Two factors remain to be determined: the select_redex function and the number of iterations of A. Selecting those two parameters defines what we call the application strategy of the obfuscation engine. The number of iterations only determines how many redexes will be contracted each time we execute O; a low maxiter value will execute faster and could produce a higher number of variants², while a higher value will make each variant produced more different from its ancestor.

¹ Here we use determinism in its computer science sense; the resulting algorithm is allowed to use randomness.
² This is the case if the total number of equivalent programs is finite.
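Algorithm 1 above can be sketched in a few lines of Python. The string-based rule encoding (lhs, rhs) and the helper names are our own illustration, not the dissertation's implementation:

```python
import random

def find_redexes(rules, s):
    """Return (rule, offset) pairs for every occurrence of a rule's
    left-hand side in the string s."""
    redexes = []
    for lhs, rhs in rules:
        start = 0
        while True:
            i = s.find(lhs, start)
            if i == -1:
                break
            redexes.append(((lhs, rhs), i))
            start = i + 1
    return redexes

def apply_strategy(rules, p, maxiter, select_redex=random.choice):
    """Template of Algorithm 1: repeatedly select a redex and contract it."""
    p_prime = p
    for _ in range(maxiter):
        redexes = find_redexes(rules, p_prime)
        if not redexes:           # no rule applies any more
            break
        (lhs, rhs), i = select_redex(redexes)
        p_prime = p_prime[:i] + rhs + p_prime[i + len(lhs):]
    return p_prime
```

Passing a different `select_redex` function (first redex, random redex, or something more elaborate) instantiates the different application strategies discussed next.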
The select_redex function, on the other hand, is more interesting. The author of the obfuscation engine could write select_redex to:

1. Always pick the first/last/i-th redex.
2. Pick a redex at random.
3. Have some other complex algorithm to decide which redex to choose.

In the next two paragraphs, we first explore the fixed strategy (option 1) and why it is not fitting, and then consider options 2 and 3 in a single rule application strategy called random rule — random offset.

3.1.1 Fixed Strategy

The first choice is what would happen by using simple regular expression libraries (e.g. Python's re module returns the first match). Although it is straightforward to implement, in practice it is a very bad choice for obfuscation. That is because the obfuscation engine will constantly change the beginning of the program, leaving most of the program unchanged. For example, consider the algorithm A that always picks the first redex as described above. Let r ∈ R be an injector rule such that r = ⟨ǫ, σ⟩. The redex of r in a program p will be just before the first symbol of p; contracting it gives us ǫp → σp. Repeating the procedure n times results in σσ…σ (n copies of σ) followed by p, making it vulnerable to signature detection. We consider this strategy too simplistic and vulnerable to be used by malware authors.

3.1.2 Random Rule — Random Offset Strategy

The second and third possibilities can be considered as a single case study for malware researchers, as a complex enough redex selection algorithm (option 3) will look like a random process (option 2) to an external observer. This has been previously considered in the paper of Chouchane et al. [12], which models metamorphic obfuscation engines as probabilistic language generators. In this case, we will call the rule application algorithm A the random rule — random offset strategy. We believe this
is the right model to study as long as we do not have concrete evidence of what is actually used.

Assumption 3.1. A implements the random rule — random offset strategy.

Under that assumption, we are interested in quantifying the impact that a rewrite rule can have on a program. Let O = ⟨P, R, A⟩ denote a metamorphic obfuscation engine with a finite set of n rewriting rules rᵢ ∈ R. Each rᵢ is assigned a rule application probability Pᵢ. Given a program p, let Lᵢ be the set of all distinct redexes of the rule rᵢ in p.

Definition 3.4. The relative impact of a rule rᵢ ∈ R on a string p is:

    I_rᵢ = (|Lᵢ| · Pᵢ) / (Σⱼ₌₁ⁿ |Lⱼ| · Pⱼ)

With relative impact, we can compare the effect of two rules of a given rewrite system on a string p.

Definition 3.5. The absolute impact of a rule rᵢ ∈ R on a string p is:

    I_aᵢ = |Lᵢ| / length(p)

Although absolute impact does not consider the rule application probabilities, it can be useful when we need to compare rules of different rewrite systems on the same string p.

3.2 Obfuscation Genealogy

While in Section 3.1 we analysed the possible rule application strategies that can rewrite a program p into a different but equivalent program p′, in this section we demonstrate the differences in the genealogy of the variants between internal and external obfuscations. Let mᵢ be a malware that includes a metamorphic obfuscation engine. Assuming it can keep state or is using randomness, the obfuscation engine can generate multiple different variants mᵢ₊₁,₁, mᵢ₊₁,₂, …, mᵢ₊₁,ₙ and can guarantee that all of them
are different from mᵢ. We thus have the set: Mᵢ₊₁ = O(mᵢ) such that ∀m ∈ Mᵢ₊₁ : m ≠ mᵢ.

[Figure 1: All mutations of first degree are different.]

Each of the programs in Mᵢ₊₁ will then run the obfuscation to generate new variants. Because the obfuscation is done internally, when the engine runs on mᵢ₊₁,₁ it cannot know if some of the variants it will generate were already generated by some other variant, e.g. mᵢ₊₁,₂. In general, let M_all denote all malware variants generated; there is no guarantee that the intersection ⋂_{m ∈ M_all} O(m) is empty. What was intended to be a tree of program variants is in fact a graph. This is due to the limited knowledge that is available to the obfuscation engine because it is internal. An external obfuscation engine O, on the other hand, has the advantage that it "knows" all program variants generated and thus can avoid duplicates. The process of generating a set of obfuscated malware variants in the external setting can be described by the following algorithm:
[Figure 2: Different variants can generate the same new mutation.]

    Data: an external obfuscation engine O; the original program α; the set of all malware variants generated M_all;
    Result: a bigger M_all set
    M_all = {α};
    while more variants are needed do
        select p from M_all;
        p′ = O(p);
        if p′ ∉ M_all then
            add p′ to M_all;
        end
    end
    return M_all;

    Algorithm 2: Offline generation of obfuscated program variants

Although the resilience against repeated instances is a theoretical advantage of the external obfuscation model, an interesting problem for future research
would be to bound the probability that an internal metamorphic obfuscation encounters such a collision. Depending on the rewriting rules of the obfuscation TRS, this problem can be more or less present. If the critical pairs of the TRS form cycles, then more cycles will appear in the graph. Only a trivial, uninteresting set of rules could completely avoid ever generating the same program variant. A related problem is counting how many viable variants can be generated in each model. Viability is an important concept when it comes to program mutations. We have identified two concerns on that matter: First, is the mutation efficient enough to be considered viable (e.g. a program that takes 2 hours to open a network connection is not)? Second, is the mutation doing what it is supposed to do? In other words, is our TRS truly semantics preserving, such that any sequence of rewritings results in a semantically equivalent program? Let A and B represent the viable program variants in the external and internal model respectively. It is obvious that |A| ≥ |B|, since we can simulate an internal obfuscation using an external one (for the purpose of counting distinct viable variants), but future research could focus on describing their relationship and the reasons that might make B much smaller than A.
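Definitions 3.4 and 3.5 from Section 3.1.2 translate directly into code. A minimal sketch, where counting substring occurrences stands in for redex enumeration (our simplification; rules are (lhs, rhs) pairs):

```python
def count_redexes(lhs, p):
    """|L_i|: number of distinct redexes (occurrences) of lhs in p."""
    count, start = 0, 0
    while True:
        i = p.find(lhs, start)
        if i == -1:
            return count
        count += 1
        start = i + 1

def relative_impact(rules, probs, p):
    """Definition 3.4: |L_i| * P_i normalised over all rules of the TRS."""
    weighted = [count_redexes(lhs, p) * pr
                for (lhs, _), pr in zip(rules, probs)]
    total = sum(weighted)
    return [w / total for w in weighted]

def absolute_impact(lhs, p):
    """Definition 3.5: |L_i| / length(p), independent of probabilities."""
    return count_redexes(lhs, p) / len(p)
```

As the definitions state, `relative_impact` compares rules within one rewrite system, while `absolute_impact` can compare rules across systems on the same string.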
4 Learning Obfuscation Rules from Finite Malware Samples

As outlined in the introduction, one of the problems we set out to solve was to (automatically) learn the rewrite rules of an obfuscation engine. Let R_all denote all possible semantics preserving code transformations; the problem is to learn the subset of rewrite rules used by an obfuscation engine. We define the problem for a finite sample of obfuscated programs, variants of the same original program, for which we have no access to the obfuscation engine used to transform them. This section starts with the preliminaries, where we introduce some additional concepts and definitions. Subsequently, the learning problem is formally defined and studied. Finally, we give an algorithm for an approximate solution under assumptions.

4.1 Preliminaries

Like in Definition 3.1, we will consider obfuscation engines as term rewriting systems. Something that is important is a notion of "equality" between term rewriting systems. We call two TRS R₁ and R₂ equivalent if they have the same set of terms T and for each rewriting of a term under R₁ there is a chain of one or more rewritings under R₂ that gives the same result.

Definition 4.1. A class of equivalent term rewriting systems E is defined as a set of TRS such that ∀R₁ = ⟨T, R₁⟩, R₂ = ⟨T, R₂⟩ ∈ E and ∀x, y ∈ T: x ↔*_{R₁} y iff x ↔*_{R₂} y

Given this definition of equivalent term rewriting systems, it is important for the purposes of our work to give a relaxed definition of observable equivalence. This is motivated by the fact that we are interested in term rewriting systems that might be equivalent only for a (usually finite) subset of T.

Definition 4.2. A class of observably equivalent term rewriting systems E_o is defined as a set of TRS such that ∀R₁ = ⟨T, R₁⟩, R₂ = ⟨T, R₂⟩ ∈ E_o and ∀x, y ∈ T′ ⊆
T: x ↔*_{R₁} y iff x ↔*_{R₂} y

Having defined what equivalent term rewriting systems are, an interesting problem is to measure the quality of a TRS R within a class E of equivalent term rewriting systems. To the best of our knowledge, this problem has not been given a lot of attention. In the only work that we found to touch on this characterisation of term rewriting systems [40], the authors assert that the length of a rewriting rule is negatively correlated with how "good" the rule is. Building on that intuition, we define l_R to be the total size of a TRS R and then define the subset of "optimal" TRS in an equivalence class E. We use the Occam's razor principle to represent the quality of a TRS.

Definition 4.3. The size of a term rewriting system R = ⟨T, R⟩ is:

    l_R := Σ_{r ∈ R} len(r), where len(r) := |rhs(r)| + |lhs(r)|

Definition 4.4. Given a class of equivalent term rewriting systems E, a TRS R = ⟨T, R⟩ ∈ E is Occam razor iff: ∀R′ ∈ E : l_R ≤ l_{R′}

For the same practical reasons that we defined observably equivalent classes of term rewriting systems in Definition 4.2, we give a relaxed definition for the best TRS in a class of observably equivalent term rewriting systems.

Definition 4.5. Given a class of observably equivalent term rewriting systems E_o, a TRS R = ⟨T, R⟩ ∈ E_o is observably Occam razor iff: ∀R′ ∈ E_o : l_R ≤ l_{R′}

In practice, instead of talking about an Occam razor term rewriting system in an equivalence class E or an observably Occam razor term rewriting system in an observably equivalent class E_o, we might simply use the terms Occam's razor TRS and observable Occam's razor TRS when the equivalence class is deducible from the context.
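Definitions 4.3 and 4.4 are simple to compute once a TRS is encoded concretely. A sketch, with rules as (lhs, rhs) string pairs and candidate systems as lists of rules (our encoding, not the dissertation's):

```python
def rule_len(rule):
    """len(r) := |rhs(r)| + |lhs(r)|  (Definition 4.3)."""
    lhs, rhs = rule
    return len(lhs) + len(rhs)

def trs_size(rules):
    """l_R: the total size of a term rewriting system."""
    return sum(rule_len(r) for r in rules)

def occam_razor(candidates):
    """Among candidate TRS assumed to belong to one equivalence class,
    return those of minimal size (Definition 4.4)."""
    smallest = min(trs_size(r) for r in candidates)
    return [r for r in candidates if trs_size(r) == smallest]
```

Note that `occam_razor` only compares sizes; establishing that the candidates really are equivalent is the hard part that Theorem 4.1 below shows to be impossible in general.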
4.2 Problem Definition

We will now formulate our learning problem for metamorphic obfuscations. As mentioned in the beginning of the section, the fact that we want to capture both internal and external obfuscations limits us to an offline learning problem following the definition of Karp [26]. That is because if the obfuscation is done externally and we have no access to the obfuscation engine, the only thing we have to learn from is a finite sample set. Ideally, we would like, given a finite number of malware variants generated from a single original program, the "archetype", to infer the Occam razor TRS that can rewrite any possible output of the metamorphic engine generated from the same archetype into another. Let P denote the set of all syntactically correct programs in some instruction set, O = ⟨P, R, A⟩ denote a metamorphic obfuscation engine and α be the archetype program. Let P_α ⊆ P be the set of all possible outputs of O on input α. Let S_α ⊆ P_α be the finite sample set of observable outputs.

[Figure 3: The sets of programs.]

By construction, we know that for all pᵢ, pⱼ ∈ P_α there is a finite sequence of rewritings such that Oⁿ(pᵢ) = pⱼ ⟹ pᵢ ↔ⁿ_R pⱼ, where Oⁿ(p) denotes n applications of the algorithm A on input p given the rules in R. We now state the problem formally.

Problem 4.1 (Learning a Metamorphic Obfuscation Engine). Given S_α, learn a term rewriting system R such that:

(i) R = ⟨P_α, R⟩ is Occam razor.
(ii) pᵢ ↔*_R pⱼ ∀pᵢ, pⱼ ∈ P_α

Theorem 4.1. Given a set S, learning an Occam razor set R of rewriting rules is impossible.

Proof. Term rewriting systems are equivalent to Turing machines [24, 16]. Consider an algorithm A such that on input S it outputs the Occam razor — smallest — term rewriting system. We could then use A to compute the Kolmogorov complexity of a program p as K = |A(p)|. Since Kolmogorov complexity is uncomputable, no such algorithm A can exist.

Nevertheless, it is still interesting to find a suboptimal (in terms of size) term rewriting system from the set of equivalent term rewriting systems E. Although size is still important, as an exponentially bigger TRS than the original one would be impractical, we relax condition (i) to a TRS R whose size is bounded by a polynomial of the number of samples.

Problem 4.2. Given S_α, learn a term rewriting system R such that:

(i) l_R = O(|S_α|ⁿ), n ∈ ℕ.
(ii) pᵢ ↔*_R pⱼ ∀pᵢ, pⱼ ∈ P_α

Theorem 4.2. Solving Problem 4.2 is impossible.

Proof. Consider the term rewriting system R as a language generator. Learning to recognize P_α (all possible strings of the language) from S_α (a finite sample of positive examples), with no additional information, has been shown impossible by Gold [20].

The impossibility of Problem 4.2 can be stated informally as: "It is impossible to generalize (knowledge) only from randomly presented positive examples without error". Since it was condition (ii) that "caused" the impossibility, we relax that condition. Instead of trying to learn a TRS that maintains the ↔*_R property on all objects of P_α, we instead aim to cover a subset P′ of P_α.

Problem 4.3. Given S_α, learn a TRS R such that:
(i) l_R = O(|S_α|ⁿ), n ∈ ℕ.
(ii) pᵢ ↔*_R pⱼ ∀pᵢ, pⱼ ∈ P′ ⊆ P_α

Note that in this case we do not put a restriction on the relation between P′ and S_α.

Naïve Solution. By relaxing the requirement on the size of the rewriting system that we need to learn, this problem becomes much easier. The trivial solution in this case is storing all pairs of samples in S_α as rewriting rules:

    R := {r : rhs(r) = pᵢ ∧ lhs(r) = pⱼ, ∀pᵢ, pⱼ ∈ S_α}

Although this solves Problem 4.3, the solution is far from ideal. The resulting term rewriting system has two drawbacks:

• Big size; the size of the resulting TRS will be: l_R = |S_α| · (|S_α| − 1) · Σ_{i=1}^{|S_α|} |pᵢ|

• Not generalizing, as the resulting rules will allow the rewriting of all p ∈ S_α but no others.

The problem now becomes one of optimisation: to find "the smallest term rewriting system" that covers "the largest set P′". This is the topic of Section 4.3.

4.3 An Approximate Solution

In the following section we address Problem 4.3 in more detail. Although the naïve solution already given is poor, it illustrates the general direction we will be following. In order to improve upon the naïve solution, we could try to minimise the TRS generated, make it more generalising, or both. Instead of storing each pair of strings, the following solution uses the Levenshtein distance to minimise the amount of information stored to represent the difference of two strings.
Solution: Compute the edit transcripts for all pairs of strings in S_α. The edit transcript, as described in Section 2.2, can be seen as a function that transforms a string into another. Let E(S_α) be the set of the edit transcripts computed from all the pairs of strings in S_α and let p denote any entire string (not a substring) to be rewritten by the TRS. We can generate a TRS R = ⟨T, R⟩ as follows:

    R := {r : lhs(r) = pᵢ ∧ rhs(r) = e(pᵢ, pⱼ), ∀pᵢ, pⱼ ∈ S_α, e ∈ E(S_α)}

The generated TRS will still have |S_α| · (|S_α| − 1) / 2 rules, the same number of rules as the naïve solution, but the rules will be smaller, as they represent only the differences of the two strings. Note that given the resulting TRS and two strings p₁, p₂ from S_α, we can't "know" which rule of R to apply to get p₂ from p₁; we just have the guarantee that there is a rule r such that p₂ = r(p₁). This fact does not invalidate the solution, as the goal is to learn the TRS, not the algorithm that was used to apply it. The next improvement to reduce the size of the TRS generated is to consider each edit operation found in any of the edit transcripts separately. For example, let i(x, 4) denote the insert operation of x at position 4³ and s(x, y, 2) the substitution of y at position 2 with x. Let two edit transcripts be ed₁ = i(x, 4), s(x, y, 2) and ed₂ = i(x, 1), i(a, 9), i(b, 10), s(x, y, 12). The first preprocessing step is to group similar consecutive edit operations; in this case, ed₂ becomes i(x, 1), i(ab, 9), s(x, y, 12). Then, extracting the individual edit operations from ed₁ and ed₂ will yield the set: (i(x, 4), s(x, y, 2), i(x, 1), i(ab, 9), s(x, y, 12)). As mentioned earlier, we are not interested in knowing which rule to apply where, as long as there is a sequence of rewritings maintaining the ↔* relation. We thus can discard the indexes of where to apply the edit operations; this yields the much smaller set: (i(x, ∗), i(ab, ∗), s(x, y, ∗)).
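The two preprocessing steps just described — grouping consecutive edits and discarding positions — can be sketched as follows, with edit operations encoded as tuples ('i', text, pos) for inserts and ('s', new, old, pos) for substitutions (our encoding, for illustration only):

```python
def group_consecutive(transcript):
    """Merge runs of inserts at consecutive positions,
    e.g. i(a, 9), i(b, 10) -> i(ab, 9)."""
    grouped = []
    for op in transcript:
        if (grouped and op[0] == grouped[-1][0] == 'i'
                and op[2] == grouped[-1][2] + len(grouped[-1][1])):
            kind, text, pos = grouped[-1]
            grouped[-1] = (kind, text + op[1], pos)  # extend the run
        else:
            grouped.append(op)
    return grouped

def drop_indexes(transcript):
    """Discard positions: only the edit kind and the text matter
    for the rewrite relation (the i(x, *) form in the text)."""
    return {op[:-1] for op in transcript}
```

Running these on the ed₂ example from the text reproduces the grouping i(x, 1), i(ab, 9), s(x, y, 12) and the index-free set {i(x, ∗), i(ab, ∗), s(x, y, ∗)}.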
This leads us to the solution illustrated by the following algorithm.

³ The delete operation is handled the same way as insert.
    Data: the malware sample set S_α;
    Result: a set of rewrite rules R;
    R ← the rules to be extracted;
    for all pairs pᵢ, pⱼ ∈ S_α do
        t = the edit transcript of (pᵢ, pⱼ);
        group consecutive edits in t;
        remove the indexes of edits in t;
        for all edit operations edop ∈ t do
            if edop ∉ R then
                add edop to R
            end
        end
    end
    return R

    Algorithm 3: Converting edit transcripts to rewrite rules.

Note that while edit operations are equivalent to rewrite rules, they are written a bit differently. The three edit operations insert, delete and substitute are represented as rewrite rules as follows: i(a, ∗) ⇔ r : ǫ → a, d(a, ∗) ⇔ r : a → ǫ and s(a, b, ∗) ⇔ r : a → b. This algorithm will generate a smaller TRS than the previous solutions explored, but the rules in R will still be far from the rewriting system used to obfuscate the programs in S_α. The reasons for this are:

• The algorithm will separate the rule r₁ : a → b from the rule r₂ : b → a; however, only one of them should be in the "optimal" TRS.

• While grouping consecutive edit operations makes the TRS smaller, it could be the case that instead of the rule r : ǫ → ab, the original TRS has two rules r₁ : ǫ → a and r₂ : ǫ → b⁴.

• A generated rule might be the result of multiple applications of different rules.

⁴ This remark also applies to rules that are the product of substitution edit operations.
For example, let R_o = (r₁ : aab → ddc, r₂ : dd → ef) be the rules of the TRS used by the metamorphic obfuscation engine. The proposed algorithm might return the rules in R_o but (most likely) will also return others like r : aab → efc. The rule r is clearly a composition of r₁ and r₂ and thus should be removed from the generated TRS as it is redundant.

• There might be noise in the rewrite rules generated. The noise can be due to the previous two remarks, or due to "imperfect" results of the edit transcript⁵.

In order to address the first point, we have to "redirect" all the rewriting rules according to the following rule: orient each rule so that |lhs(r)| < |rhs(r)|, and if both sides are of equal length, use the lexicographic order lhs(r) < rhs(r). We call the problem presented in the second remark contiguous rule applications. It is addressed by an iterative process of looking for appearances of rules as sub-rules of larger ones, described in the following algorithm. The third remark is about what we call nested rule applications. This problem has not been addressed yet, but we discuss a strategy to solve it in Section 7. Finally, in order to overcome noise we use association rule mining (the support and confidence metrics) to determine which rewriting rules are most likely part of the original term rewriting system. Let redirect() be a function that takes a rewrite rule and redirects it according to the previously described strategy. In the following algorithm we also use ⊆ between two strings to indicate the substring relation. Finally, along with each rewrite rule stored we also keep the number of appearances of that rule.

⁵ This is analyzed in Section 5 where we use diff to extract the pairwise differences.
    Data: the malware sample set S_α; support threshold st; confidence threshold ct
    Result: a set of rewrite rules R; R_min, the rules extracted by association mining
    for all pairs pᵢ, pⱼ ∈ S_α do
        t = the edit transcript of (pᵢ, pⱼ)
        group consecutive edits in t
        remove the indexes of edits in t
        for all edit operations edop ∈ t do
            add redirect(edop) to R
        end
    end
    R_min := {r ∈ R : support(r) > st ∧ confidence(r) > ct}
    for r₁ ∈ R_min do
        for r₂ ∈ R ∧ r₂ ∉ R_min do
            if (lhs(r₁) ⊆ lhs(r₂) ∧ rhs(r₁) ⊆ rhs(r₂)) ∨
               (lhs(r₁) ⊆ rhs(r₂) ∧ rhs(r₁) ⊆ lhs(r₂)) then
                R := R \ {r₂}
                remove the occurrence of r₁ from r₂, r₃ = r₂ − r₁
                R := R ∪ {r₃}
                increase the counter of occurrences of r₁ by one
            end
        end
    end
    compute support and confidence of all rules in R
    update R_min if any rule not included has passed the support and confidence thresholds
    return R_min, R

    Algorithm 4: The rule learning algorithm.
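Two building blocks of Algorithm 4 can be sketched concretely: the redirect() orientation and the sub-rule subtraction. Rules are (lhs, rhs) string pairs, and stripping the first substring occurrence from both sides is our reading of r₃ = r₂ − r₁ (a sketch, not the dissertation's implementation):

```python
def redirect(rule):
    """Orient a rule so that |lhs| < |rhs|, breaking length ties
    lexicographically, so a -> b and b -> a collapse to one form."""
    lhs, rhs = rule
    if len(lhs) > len(rhs) or (len(lhs) == len(rhs) and lhs > rhs):
        return (rhs, lhs)
    return rule

def subtract_subrule(r1, r2):
    """If r1 occurs inside r2 (lhs within lhs and rhs within rhs),
    remove one occurrence of r1 from both sides of r2 and return the
    residual rule r3; otherwise return None.  (Injector rules with an
    empty lhs would need special handling, omitted here.)"""
    if r1[0] and r1[0] in r2[0] and r1[1] in r2[1]:
        return (r2[0].replace(r1[0], "", 1), r2[1].replace(r1[1], "", 1))
    return None
```

In Algorithm 4, `subtract_subrule` corresponds to the inner if-branch: when it returns a residual rule, r₂ is replaced by r₃ and the occurrence counter of r₁ is incremented.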
Complexity: To analyse the time complexity of the algorithm, we break it down into two main functions. The first one is computing the edit transcripts for all pairs of samples and the second is processing the rewrite rules. For easier analysis, assume all samples have the same length l = |s| for all s ∈ S_α. If we use the Needleman–Wunsch algorithm to get the pairwise edit transcripts, the first part has a time complexity of (|S_α| · (|S_α| − 1) / 2) · l². The complexity of the second part depends on the size of the set of rules generated in the first step and on the number of rules that will be effectively recognized using the support and confidence thresholds on the first pass. We can express it as |R_min| · (|R| − |R_min|). Putting both parts together, the proposed rule learning algorithm has complexity (|S_α| · (|S_α| − 1) / 2) · l² + |R_min| · (|R| − |R_min|). Because |R_min| will be (a lot) smaller than |R|, we can simplify the expression to O((|S_α| · l)² · |R|).

Assumptions: We now discuss some assumptions under which the algorithm presented above gives good results. By grouping consecutive edits into a single rewrite rule, we are implicitly making the following assumption. Let R_o = ⟨T, R⟩ be the term rewriting system used by the metamorphic obfuscation engine.

Assumption 4.1. The TRS R_o = ⟨T, R⟩ is such that T := Σ*.

Corollary 4.2.1. R_o is effectively a semi-Thue system, as the terms in T contain only ground terms (strings).

This is a strong assumption, and in the case of obfuscation engines only very simple rewriting rules, such as NOP injectors, would belong to the category of string rewriting. It is, nevertheless, a useful assumption to make, as it allows us to study a simpler, yet non-trivial problem: learning a semi-Thue system.

• Solving that problem is a step towards the more general result of learning term rewriting systems. We give some ideas on how to generalise our result in Section 7.
• It can be used to identify the simplest obfuscation rewriting rules. The more occurrences of the lhs of a rule r ∈ R inside a program pᵢ, the higher the relative impact of the rule r. Injector rules of the form ǫ → rhs can, by definition, be applied anywhere and thus have the highest relative impact. Removing simple rules from the toolset of malware authors is easier than learning more complex ones, and at the same time it greatly reduces the size of P_α.

• With a large enough sample S_α our algorithm should be able to learn simple rules that include free variables. This will be the case when variables have a small domain (e.g. variables only take as values the names of registers r1–r16).

Assumption 4.2. There is a significant part of the code that remains the same across most variants in S_α.

While this is a strong assumption, it is reasonable to make. If variants in S_α do not share any code fragments, the best solution is the naïve one. In practice, however, code mutations will not change a program so radically. It is left for future work to determine the percentage of unchanged code that is necessary for high quality results. Let A_o be the algorithm, also referred to as the rule application strategy, used by the metamorphic obfuscation engine to apply the rules of R_o to a program p.

Assumption 4.3. A_o implements the random rule — random offset strategy.

The strategy was explained in Section 3.1.2 and can be summarised as "pick uniformly at random a redex from the set of redexes of all rewrite rules and contract it". Under that assumption, it will be unlikely to encounter nested rule applications as long as the set of all redexes is large enough. It is also an important factor when calculating the relative impact of each rule: if the selection is not at random, the impact factor loses its importance.
5 Implementation

We provide an implementation of both an obfuscation engine on strings and the final algorithm proposed in Section 4.3. We have chosen Python 2.7 as our implementation programming language because it provides libraries that implement parts of our algorithms, and its high-level syntax allowed us to quickly implement changes to the algorithm in our code. All of our code has been developed and tested under a 64-bit version of Ubuntu Linux. Building an obfuscation engine was not the goal of this work, but we have created one in order to be able to test our rule learning algorithm. It is thus a very simple rewriting engine that is easy to modify or extend. The Python script will:

1. Generate a string α at random from a predefined alphabet and add α to S_α.
2. Pick at random a string p from S_α, a rewrite rule r and an offset ofs.
3. Contract r at the first redex of lhs(r) in p after ofs. If there are no such redexes, do an injection (an always applicable rule of the form ǫ → σ) at the offset.
4. If the resulting string is not already in S_α, add it to S_α.
5. While the size of S_α is smaller than a threshold, go to 2.

The second Python script is an implementation of the rule learning algorithm described in Section 4.3. The code is very close to the description given in the previous chapter. To get the unified edit operations from all pairs of input files we tried both sequence alignment libraries from the computational biology field and the diff utility. Both use the same underlying algorithm, but the diff implementation has additional heuristics that make it a lot faster, and it also has an output format that is easier to use; for those reasons we have chosen it over sequence alignment. We should note that the implementation does not use the difflib Python package, as we ran into cases where it produced sub-optimal results. Instead, we have made use of the standard Linux diff utility, writing a Python parser to import the results into our program.
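The generator's five steps listed above can be condensed into a short sketch. This is a re-creation for illustration, not the dissertation's actual script; the (lhs, rhs) rule encoding, the '.'-separated alphabet elements and the seeding are our simplifications:

```python
import random

def generate_samples(alphabet, rules, length, target, seed=0):
    """Grow S_a by contracting a random rule at the first redex after
    a random offset, falling back to an injection when no redex exists."""
    rng = random.Random(seed)
    # Step 1: a random archetype over the alphabet.
    alpha = ".".join(rng.choice(alphabet) for _ in range(length))
    samples = {alpha}
    while len(samples) < target:              # step 5: loop to threshold
        p = rng.choice(sorted(samples))       # step 2: random string
        lhs, rhs = rng.choice(rules)          # step 2: random rule
        ofs = rng.randrange(len(p))           # step 2: random offset
        i = p.find(lhs, ofs)
        if i == -1:                           # step 3: inject instead
            i, lhs, rhs = ofs, "", rng.choice(alphabet) + "."
        samples.add(p[:i] + rhs + p[i + len(lhs):])   # step 4: dedupe via set
    return samples
```

Using a Python set makes step 4 (duplicate avoidance) automatic, which is precisely the external engine's advantage discussed in Section 3.2.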
Because the diff utility was designed to work on text files, it works with entire lines, unlike sequence alignment, which works character by character. In order to make
diff effective, the creation of the initial string and the rewritings done put a single element from our alphabet Σ on each line. This, however, should not be considered a limitation, because if, for example, we wanted to apply our algorithm to x86 assembly, which has instructions of variable length, we could put one byte on each line.

5.1 Testing

In this section we describe the parameters used for testing the implementation and the results we obtained. Although our algorithm works on strings containing any character, in order to simulate a real-world use case and have rules that people familiar with code obfuscations recognise, the examples we present here use an extension of the pseudo-assembly introduced in our motivating example in Section 1.1. Let our instruction set be:

    Σ := {ADD, MUL, JMP, JNE, JEQ, SUB, BUS, NOP, XXX}

The "special" element XXX is added when generating the first string, the archetype α, to represent parts of the code that do not match any rewriting rule, thus limiting the places where rules can be applied (see Assumption 4.2). The rules of our obfuscation engine are:

    R := { r₁: ǫ → NOP,
           r₂: MUL → ADD.ADD.ADD,
           r₃: ADD → BUS,
           r₄: JMP → ADD.SUB.JEQ,
           r₅: JNE → SUB.JEQ }

The testing was conducted as follows: First, generate a random string α from elements of Σ. Then, use the rules in R with the random-rule random-offset strategy to generate the sample set S_α. Finally, run the implementation of the rule learning algorithm with a given support and confidence threshold. We have chosen the values of support threshold = 0.005 and confidence threshold = 0.75 experimentally. We noticed that a high confidence threshold is effective in distinguishing "good" rules, while only a very small support is needed (mostly
to avoid noise). In Table 1 we present the support and confidence scores of each rule in R, plus the rule with the highest support and the rule with the highest confidence not in R.

Table 1: Support and confidence of the rules for different sizes of the sample set.

    |S_α|   rule in R   supp.    conf.     best from the rest   supp.       conf.
    40      r1          0.6458   0.9602    ǫ → JNE              0.0062      0.0092
            r2          0.0866   1         ǫ → SUB              0.0062      0.0092
            r3          0.0735   1
            r4          0.0155   1
            r5          0.1519   1
    200     r1          0.7797   0.9832    ǫ → JMP              0.003       0.0037
            r2          0.0213   0.9482    JEQ → JNE            6.48E-006   1
            r3          0.1153   0.992
            r4          0.0453   1
            r5          0.0228   1

The results we collected from running this experiment multiple times show that by looking at the support and confidence of the rewrite rules extracted, it is easy to distinguish the rules that are in R. Because our rule learning algorithm does not have a strategy to process nested rule applications, occasionally we get some rewriting rules like the following: MUL → BUS.ADD.ADD, JMP → BUS.SUB.JEQ. In both of those cases, replacing BUS with ADD will transform the rules into one of the rewrite rules of R.

Adding noise: While the problem formulated in Section 4 requires all the samples given to the learning algorithm to be rewritings of a single original program, in a real-world scenario it is unlikely that we could have such a sample set for an external metamorphic obfuscation engine (it would be possible if the engine is internal, or if we somehow have access to it and can make it generate variants). Thus, it is interesting to see how our proposed algorithm performs when the sample set includes
programs/strings that were not generated by rewriting the archetype α. To do so, we generated a sample set of 100 variants as before and then added random strings (using the same alphabet Σ). In Table 2 we give the results obtained: we present the support and confidence of the rules in R as well as the rules with the highest support and confidence not in R.

Table 2: The support and confidence of the rules when adding random strings to the sample set.

|Sa|   R                            Best from the rest
       rule  supp.   conf.         rule                   supp.      conf.
120    r1    0.1245  0.1721        JNE.JMP → MUL.ADD.MUL  5.31E-006  1
       r2    0.0139  0.3558        ε → MUL                0.0389     0.0537
       r3    0.0144  0.2687        ε → JEQ                0.0388     0.0536
       r4    0.0215  0.4439
       r5    0.0118  0.3307
140    r1    0.0785  0.1065        JMP.MUL → JNE.SUB.SUB  1.91E-005  1
       r2    0.0075  0.2148        ε → JEQ                0.0425     0.0577
       r3    0.0076  0.1682        ε → MUL                0.0418     0.0567
       r4    0.0114  0.2683
       r5    0.0084  0.2388
160    r1    0.0504  0.0687        JEQ.JEQ → JNE.SUB.MUL  0.0002     1
       r2    0.0039  0.124         JNE.ADD.JEQ → JMP.XXX  0.0002     1
       r3    0.0044  0.1009        ε → JEQ                0.0439     0.0599
       r4    0.0073  0.1851
       r5    0.0035  0.1086

It is hard to establish a threshold for support and confidence that would work for any size of sample set and any percentage of noise in it. Instead of a fixed threshold, it is easier to look for the rewrite rules whose support and confidence are significantly higher than the rest. Finding a formula that determines those thresholds is left to future work. By increasing the amount of random
strings we add to the input sample set, there is a point after which no combination of support and confidence thresholds can distinguish the rules of R. This is the case for the third row of Table 2, where |Sa| = 160.

Performance: The code was developed as a proof of concept for the algorithm and is not optimised for either memory or speed; it was instead designed to allow easy modification as the project evolved. The largest test we have done uses 500 files, each with 200 instructions of our pseudo-assembly. With this input it took several hours to complete, running on a laptop with a 1300 MHz CPU. Should the theory evolve to include a wider range of obfuscations, it would be interesting to develop an optimised version of the code.
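The experiment described above (generating the sample set Sa with the random-rule random-offset strategy, then scoring candidate rules by support and confidence) can be sketched in Python. This is a minimal illustration rather than the thesis implementation: the number of rewriting phases per variant is an assumed parameter, the pairwise-diff extraction of candidate pairs is omitted, and the scoring definitions used here (support as the relative frequency of a candidate l → r among all extracted candidates, confidence as its frequency relative to candidates sharing the same lhs) follow standard association rule mining and may differ in detail from the implementation.

```python
import random
from collections import Counter

SIGMA = ["ADD", "MUL", "JMP", "JNE", "JEQ", "SUB", "BUS", "NOP", "XXX"]

# The rules R of Section 5.1; an empty lhs encodes r1 (epsilon -> NOP).
RULES = [
    ([], ["NOP"]),                     # r1
    (["MUL"], ["ADD", "ADD", "ADD"]),  # r2
    (["ADD"], ["BUS"]),                # r3
    (["JMP"], ["ADD", "SUB", "JEQ"]),  # r4
    (["JNE"], ["SUB", "JEQ"]),         # r5
]

def apply_random_rule(program, rules=RULES):
    """Random-rule random-offset: pick a rule uniformly, then a uniformly
    random offset where its lhs matches, and rewrite there (no-op if the
    lhs occurs nowhere)."""
    lhs, rhs = random.choice(rules)
    if not lhs:  # epsilon lhs: the rhs may be inserted at any offset
        i = random.randrange(len(program) + 1)
        return program[:i] + rhs + program[i:]
    offsets = [i for i in range(len(program) - len(lhs) + 1)
               if program[i:i + len(lhs)] == lhs]
    if not offsets:
        return program
    i = random.choice(offsets)
    return program[:i] + rhs + program[i + len(lhs):]

def generate_samples(alpha, n_variants, n_phases=5):
    """Build the sample set Sa by repeatedly mutating the archetype alpha."""
    samples = []
    for _ in range(n_variants):
        p = list(alpha)
        for _ in range(n_phases):
            p = apply_random_rule(p)
        samples.append(p)
    return samples

def score_rules(pairs):
    """Association-rule scoring: `pairs` is the multiset of (lhs, rhs)
    candidates extracted from pairwise diffs (that extraction is omitted
    here). support = freq(l -> r) / |pairs|; confidence = freq(l -> r)
    / freq of candidates whose lhs is l."""
    rule_count = Counter(pairs)
    lhs_count = Counter(l for l, _ in pairs)
    n = len(pairs)
    return {(l, r): (c / n, c / lhs_count[l])
            for (l, r), c in rule_count.items()}

def select_rules(scores, min_support=0.005, min_confidence=0.75):
    """Keep candidates clearing both thresholds (the values of Section 5.1)."""
    return {rule for rule, (s, c) in scores.items()
            if s >= min_support and c >= min_confidence}
```

Since every rule in R has a right-hand side at least as long as its left-hand side, each generated variant is at least as long as α, which gives a quick sanity check on the generator.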
6 Related Work

There have been many proposed methods for malware classification, but no definition of external metamorphic obfuscation has been given so far and, by extension, no classification technique exists to capture malware obfuscated in that way. Our work defines a new problem and makes the first steps towards a possible solution.

Previous malware classification efforts have focused on oligomorphic, polymorphic and internal metamorphic obfuscations. Researchers have applied multiple techniques to solve this classification problem. The observation that "A compromised application cannot cause much harm unless it interacts with the underlying operating system" [43] led many researchers to propose solutions based on classifying programs by the sequence of system calls they invoke. Both static [43] and dynamic [19, 22] analysis approaches have been explored, using either a manual [14] or an automated [47, 13] learning process.

Another direction followed by researchers looking to improve malware classification has been the comparison of control flow graphs (CFGs). The process, proposed by Bonfante, Kaczmarek and Marion, involves extracting the CFG of a program and proving it is isomorphic to the CFG of a known malware [8, 9]. Vinod et al. introduced the idea of comparing control flow graphs using the longest common subsequence of basic blocks in the CFGs of two programs [42]. Finally, Mehra, Jain and Uppal combined CFG analysis with system calls to automatically select the best features for classification [30].

The detection and classification technique most relevant to our work is program normalisation. The goal of program normalisation is to reduce the signature space by undoing obfuscations in order to obtain a single normal form (or a small set of them). Christodorescu et al. proposed a malware normaliser for three common obfuscation rules and used it effectively as a pre-processor for commercial malware detectors [15]. Bruschi et al.
extended the same idea to cover a wider range of obfuscations [11]. It is noteworthy that while this approach might improve the performance of malware classifiers, it also has theoretical limits, as shown by Owens [34], who proposes a way to construct non-normalisable functions for metamorphic malware.
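To make program normalisation concrete, here is a naive sketch: given known obfuscation rules over token lists, orient each rule right-to-left and rewrite, leftmost match first, until no rule applies. Real normalisers must establish confluence and termination, which this sketch simply assumes (guarding only with a step budget); the rule and term representations are our own illustrative choices, not those of the cited works.

```python
def normalise(program, rules, max_steps=10_000):
    """Rewrite `program` (a list of tokens) with each obfuscation rule
    reversed (rhs -> lhs), applying the leftmost match of the first
    applicable rule, until a fixed point is reached. Without a
    completion procedure this is neither guaranteed confluent nor
    terminating in general; max_steps guards against looping."""
    reversed_rules = [(rhs, lhs) for lhs, rhs in rules]
    for _ in range(max_steps):
        changed = False
        for pat, rep in reversed_rules:
            for i in range(len(program) - len(pat) + 1):
                if program[i:i + len(pat)] == pat:
                    program = program[:i] + rep + program[i + len(pat):]
                    changed = True
                    break
            if changed:
                break
        if not changed:
            return program
    raise RuntimeError("step budget exhausted; reversed system may loop")
```

With a subset of the rules of Section 5.1, normalising BUS.NOP.BUS.BUS undoes a NOP insertion and three ADD → BUS substitutions, then folds ADD.ADD.ADD back into MUL.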
Figure 4: Detecting malware variants using normalisation. Taken from [44].

The closest work to ours was carried out by Walenstein et al. [44]. They model metamorphic obfuscation engines as term rewriting systems and present an algorithm that, given the obfuscation rules as a TRS, can produce a normalising TRS (one that is convergent and equivalence preserving) for that particular set of obfuscations. In their work, obfuscation rules were extracted manually from the mutation engine that was part of the malware, as they were working with internal metamorphic malware. Such a rule extraction technique is expensive, error prone and, in the case of externally obfuscated malware, impossible. Thus, we consider our work complementary to theirs, as together they could form a classification technique for external metamorphic malware. Given an oracle that could solve the problem defined in Section 4 and generate the term rewriting system used to obfuscate, we could then use the technique proposed by Walenstein et al. to turn it into a normaliser for all programs transformed by that obfuscation engine. While this would present a complete solution to the classification of malware generated by an external obfuscation engine, it would still suffer from the fact that the Knuth-Bendix completion procedure used by Walenstein et al. does not always terminate.

When it comes to the term rewriting system literature, to the best of our knowledge there have not been any attempts to (approximately) learn the rewriting rules
of an unknown TRS. Since term rewriting systems are equivalent to Turing machines, the closest work in this space appears in the field of computational learning theory, such as Gold's identification in the limit [20] and Valiant's theory of the learnable [41], which study the learnability of different classes of languages/problems.
7 Conclusion and Future Work

The novel nature of this research led us to many interesting problems which the limited time for this thesis did not allow us to pursue. We have mentioned some of them throughout Section 3 and Section 4; in this section we expand on them a little more and suggest possible directions that could be followed.

Starting with the external metamorphic obfuscations introduced in Section 3, it would be interesting to study carefully the design space of such obfuscations, as was done for internal metamorphic obfuscations by Walenstein et al. [45]. As already mentioned, malware relying only on external obfuscation would lose the ability to self-mutate in order to infect new hosts. A viability analysis of such a model could be done from the point of view of a malware author.

Related to the previous question, in Section 3.2 we highlighted the notion of viability of a mutated variant. A more formal characterisation of viability should be defined; based on it and on the properties of the obfuscating term rewriting system (such as the existence of critical pairs/cycles), we could search for lower and upper bounds on the number of distinct possible variants.

Besides possible future work on external metamorphic obfuscation, we have suggestions for improving our work on learning obfuscations as rewrite rules. In Section 4.3 we distinguished what we called nested rule applications: regions where multiple rewriting rules have been applied over many phases of the obfuscation. Because of this, the extraction of the pairwise difference might yield a pair (r, l) such that r and l contain sub-terms of different rewrite rules (either in their lhs or rhs). The problem that could be studied is: given the pair (r, l) and a set R of known rewrite rules (rules inferred with high confidence), find the shortest path, if one exists, of rewrite rule applications from the set R between r and l.
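For string rewriting rules over token lists, the shortest-path question just posed admits a direct breadth-first search over single rule applications. This is only a sketch: it assumes short strings and a small rule set, since the frontier grows quickly, and the representation is our own.

```python
from collections import deque

def shortest_rewrite_path(src, dst, rules, max_nodes=100_000):
    """Breadth-first search from `src` over all single applications of
    `rules` (each rule tried at every offset), returning the list of
    rules applied along a shortest path to `dst`, or None if `dst` is
    unreachable within the node budget."""
    src, dst = tuple(src), tuple(dst)
    rules = [(tuple(l), tuple(r)) for l, r in rules]
    queue = deque([(src, [])])
    seen = {src}
    while queue:
        term, path = queue.popleft()
        if term == dst:
            return path
        for lhs, rhs in rules:
            for i in range(len(term) - len(lhs) + 1):
                if term[i:i + len(lhs)] == lhs:
                    nxt = term[:i] + rhs + term[i + len(lhs):]
                    if nxt not in seen and len(seen) < max_nodes:
                        seen.add(nxt)
                        queue.append((nxt, path + [(lhs, rhs)]))
    return None
```

Applied to a pair extracted from a nested rule application, a non-empty result certifies that the observed difference is explained by a chain of already-inferred rules.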
An important limitation, already discussed in Section 4.3, is that our proposed solution infers string rewriting rules. While this can capture simple rules, extracting with high confidence more complex rules that are context sensitive and contain variables requires a more powerful learning process: going from learning rules that match substrings to rules that match subterms.
Our preliminary research on this topic gave us two distinct directions that could be explored in search of a way to learn term rewriting rules: regular expressions and anti-unification. Learning regular expressions from a set of lhs of rewrite rules could help discover the invariant parts of a rule and abstract the rest. Different algorithms have been proposed for learning deterministic finite automata [5] or, more closely related, regular expressions from positive examples [18]. The drawback of regular expressions is that, unlike anti-unification, the formalism does not permit substitution variables.

Anti-unification is the process of constructing the least general generalisation common to two given symbolic expressions. Given two terms t1 and t2, anti-unification is concerned with finding a term t such that both t1 and t2 are instances of t under some substitutions [29]. With anti-unification we get terms with distinct variables, which we could not obtain with regular expression learning. On the other hand, in order for anti-unification to work, the initial terms must have some structure, as opposed to regular expression learning, which can be applied to any string. For our particular use case this should not be a problem, as the assembly produced by any disassembler has that structure (each instruction is a function symbol and each register is a variable). We thus believe that anti-unification is more suitable than regular expression learning and should be the way to learn term rewriting rules.

The word problem described in Section 2.1.1 is in general undecidable, but has been shown to be decidable for certain groups. The malware classification problem using normalisation is trying to solve the word problem, as it tries to determine whether a malware m1 can be reduced to the normal form n of a known malware m2. In other words, given n and m2 such that n ↔* m2, test whether n ↔* m1, which is equivalent to testing whether m1 ↔* m2.
It would thus be interesting, from a theoretical perspective, to see if malware classification using program normalisation could be reduced to a group where the word problem is decidable. If not, a solution to this problem will always include heuristics.

Another interesting line of work that could be pursued is applying the proposed learning method to variants of known internal metamorphic malware. This could be done with
the current algorithm and, if the improvements suggested above are implemented, a comparison of the results would help validate the method used. We suggest known obfuscations because for them we have access to the ground truth: the rules that have been found by disassembling obfuscation engines.

Following up on the previous suggestion, future work should focus on defining the obfuscation rewrite rule learning problem for internal obfuscations. With access to the obfuscation engine, even as a black box, it is possible that the learning process can be shown to be equivalent to the teacher-student concept used in PAC learning [27].

Finally, future research could try to use our results, both technical and theoretical, in other fields. Possible topics range from commercial obfuscation for intellectual property protection to plagiarism detection.
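As a concrete starting point for the anti-unification direction suggested earlier in this section, first-order least general generalisation can be sketched in a few lines. The term encoding (a tuple holding a function symbol followed by its arguments, with registers and constants as 0-ary symbols) and the variable naming scheme are our own illustrative choices:

```python
def anti_unify(t1, t2):
    """Least general generalisation of two first-order terms. Terms are
    tuples (symbol, arg1, ..., argN); 0-ary symbols are 1-tuples.
    Positions where the terms disagree become variables, and the same
    pair of disagreeing subterms is always mapped to the same variable."""
    var_for = {}  # (subterm of t1, subterm of t2) -> shared variable name

    def lgg(a, b):
        if (isinstance(a, tuple) and isinstance(b, tuple)
                and a and b and a[0] == b[0] and len(a) == len(b)):
            # same head symbol and arity: generalise argument-wise
            return (a[0],) + tuple(lgg(x, y) for x, y in zip(a[1:], b[1:]))
        key = (a, b)
        if key not in var_for:
            var_for[key] = f"X{len(var_for)}"
        return var_for[key]

    return lgg(t1, t2)
```

For instance, generalising MOV(eax, 1) and MOV(ebx, 1) yields MOV(X0, 1), while generalising ADD(eax, eax) against ADD(ebx, ebx) yields ADD(X0, X0): the repeated register is captured by a single variable, which is exactly what regular expression learning cannot express.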
8 Acknowledgments

First and foremost, I would like to express my gratitude towards my supervisor, Dr. Earl Barr. More than a supervisor, he has been a true mentor to me, giving me the right directions to help my thinking progress. Rather than giving me answers, he gave me advice and the motivation to work on the interesting problems that we encountered. For this I will always be grateful to him.

I would also like to thank Dr. David Clark for suggesting that I use association rule mining; this simple rule learning mechanism turned out to be a perfect match for what we needed in this work. I am thankful to Dr. Hector Menendez Benito for giving me his opinion on parts of my work, as well as for taking the time to review some of the background material with me. For helping me with typesetting in LaTeX, I would like to thank Zheng Gao. For her moral support and proofreading work, I am thankful to Vasiliki Meletaki. Finally, for their support and encouragement to continue my studies, I would like to thank my parents.
References

[1] Association for Computational Learning. http://www.learningtheory.org/. Accessed: 15-August-2016.

[2] obfuscate, Merriam–Webster Dictionary. http://www.merriam-webster.com/dictionary/obfuscate. Accessed: 15-August-2016.

[3] L. M. Adleman. An abstract theory of computer viruses (invited talk). In Proceedings on Advances in Cryptology, CRYPTO '88, pages 354–374, New York, NY, USA, 1990. Springer-Verlag New York, Inc.

[4] Rakesh Agrawal, Tomasz Imieliński, and Arun Swami. Mining association rules between sets of items in large databases. In ACM SIGMOD Record, volume 22, pages 207–216. ACM, 1993.

[5] Dana Angluin. Queries and concept learning. Machine Learning, 2(4):319–342, 1988.

[6] Dana Angluin. Computational learning theory: survey and selected bibliography. In Proceedings of the Twenty-Fourth Annual ACM Symposium on Theory of Computing, pages 351–369. ACM, 1992.

[7] Franz Baader and Tobias Nipkow. Term Rewriting and All That. Cambridge University Press, 1999.

[8] Guillaume Bonfante, Matthieu Kaczmarek, and Jean-Yves Marion. Control flow graphs as malware signatures. In International Workshop on the Theory of Computer Viruses, 2007.

[9] Guillaume Bonfante, Matthieu Kaczmarek, and Jean-Yves Marion. Morphological detection of malware. In Malicious and Unwanted Software, 2008 (MALWARE 2008), 3rd International Conference on, pages 1–8. IEEE, 2008.

[10] William W. Boone. The word problem. Annals of Mathematics, pages 207–265, 1959.
[11] Danilo Bruschi, Lorenzo Martignoni, and Mattia Monga. Code normalization for self-mutating malware. IEEE Security and Privacy, 5(2):46–54, 2007.

[12] Mohamed R. Chouchane, Andrew Walenstein, and Arun Lakhotia. Statistical signatures for fast filtering of instruction-substituting metamorphic malware. In Proceedings of the 2007 ACM Workshop on Recurring Malcode, pages 31–37. ACM, 2007.

[13] Mihai Christodorescu, Somesh Jha, and Christopher Kruegel. Mining specifications of malicious behavior. In Proceedings of the 1st India Software Engineering Conference, pages 5–14. ACM, 2008.

[14] Mihai Christodorescu, Somesh Jha, Sanjit A. Seshia, Dawn Song, and Randal E. Bryant. Semantics-aware malware detection. In 2005 IEEE Symposium on Security and Privacy (S&P'05), pages 32–46. IEEE, 2005.

[15] Mihai Christodorescu, Johannes Kinder, Somesh Jha, Stefan Katzenbeisser, and Helmut Veith. Malware normalization. Technical report, University of Wisconsin, 2005.

[16] Max Dauchet. Simulation of Turing machines by a left-linear rewrite rule. In International Conference on Rewriting Techniques and Applications, pages 109–120. Springer, 1989.

[17] Nachum Dershowitz and Jean-Pierre Jouannaud. Rewrite systems. Citeseer, 1989.

[18] Henning Fernau. Algorithms for learning regular expressions from positive data. Information and Computation, 207(4):521–541, 2009.

[19] Stephanie Forrest, Steven A. Hofmeyr, Anil Somayaji, and Thomas A. Longstaff. A sense of self for Unix processes. In Security and Privacy, 1996 IEEE Symposium on, pages 120–128. IEEE, 1996.

[20] E. Mark Gold. Language identification in the limit. Information and Control, 10(5):447–474, 1967.
[21] Daniel S. Hirschberg. A linear space algorithm for computing maximal common subsequences. Communications of the ACM, 18(6):341–343, 1975.

[22] Steven A. Hofmeyr, Stephanie Forrest, and Anil Somayaji. Intrusion detection using sequences of system calls. Journal of Computer Security, 6(3):151–180, 1998.

[23] Gérard Huet. Confluent reductions: abstract properties and applications to term rewriting systems. Journal of the ACM (JACM), 27(4):797–821, 1980.

[24] Gérard Huet and Dallas Lankford. On the uniform halting problem for term rewriting systems. IRIA, Laboratoire de Recherche en Informatique et Automatique, 1978.

[25] J. W. Hunt and M. D. McIlroy. An algorithm for differential file comparison. 1976.

[26] Richard M. Karp. On-line algorithms versus off-line algorithms: how much is it worth to know the future? In Proceedings of the IFIP 12th World Computer Congress on Algorithms, Software, Architecture-Information Processing '92, Volume 1, pages 416–429. North-Holland Publishing Co., 1992.

[27] Michael J. Kearns and Umesh Virkumar Vazirani. An Introduction to Computational Learning Theory. MIT Press, 1994.

[28] Donald E. Knuth and Peter B. Bendix. Simple word problems in universal algebras. In Automation of Reasoning, pages 342–376. Springer, 1983.

[29] Temur Kutsia, Jordi Levy, and Mateu Villaret. Anti-unification for unranked terms and hedges. Journal of Automated Reasoning, 52(2):155–190, 2014.

[30] Vishakha Mehra, Vinesh Jain, and Dolly Uppal. Dacomm: detection and classification of metamorphic malware. In Communication Systems and Network Technologies (CSNT), 2015 Fifth International Conference on, pages 668–673. IEEE, 2015.
[31] Eugene W. Myers. An O(ND) difference algorithm and its variations. Algorithmica, 1(1-4):251–266, 1986.

[32] Saul B. Needleman and Christian D. Wunsch. A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology, 48(3):443–453, 1970.

[33] Philip O'Kane, Sakir Sezer, and Kieran McLaughlin. Obfuscation: the hidden malware. IEEE Security & Privacy, 9(5):41–47, 2011.

[34] Rodney Owens and Weichao Wang. Non-normalizable functions: a new method to generate metamorphic malware. In 2011 Military Communications Conference (MILCOM 2011), pages 1279–1284. IEEE, 2011.

[35] Dana Ron. Automata Learning and its Applications. PhD thesis, Hebrew University, 1995.

[36] Eric D. Simonaire. Sub-circuit selection and replacement algorithms modeled as term rewriting systems. Technical report, DTIC Document, 2008.

[37] Peter Szor. The Art of Computer Virus Research and Defense. Pearson Education, 2005.

[38] Yoshihito Toyama. Commutativity of term rewriting systems. Programming of Future Generation Computers II, pages 393–407, 1988.

[39] György Turán. Remarks on computational learning theory. Annals of Mathematics and Artificial Intelligence, 28(1):43–45, 2000.

[40] Muhammad Afzal Upal. Learning plan rewriting rules. In Proceedings of the Fourteenth International Florida Artificial Intelligence Research Society Conference, pages 412–416. AAAI Press, 2001.

[41] Leslie G. Valiant. A theory of the learnable. Communications of the ACM, 27(11):1134–1142, 1984.
[42] P. Vinod, Vijay Laxmi, Manoj Singh Gaur, GVSS Kumar, and Yadvendra S. Chundawat. Static CFG analyzer for metamorphic malware code. In Proceedings of the 2nd International Conference on Security of Information and Networks, pages 225–228. ACM, 2009.

[43] David Wagner and R. Dean. Intrusion detection via static analysis. In Security and Privacy, 2001 IEEE Symposium on, pages 156–168. IEEE, 2001.

[44] Andrew Walenstein, Rachit Mathur, Mohamed R. Chouchane, and Arun Lakhotia. Normalizing metamorphic malware using term rewriting. In Sixth IEEE International Workshop on Source Code Analysis and Manipulation, pages 75–84. IEEE, 2006.

[45] Andrew Walenstein, Rachit Mathur, Mohamed R. Chouchane, and Arun Lakhotia. The design space of metamorphic malware. In 2nd International Conference on i-Warfare and Security, pages 241–248, 2007.

[46] Wikipedia. Rewriting — Wikipedia, the free encyclopedia. https://en.wikipedia.org/w/index.php?title=Rewriting&oldid=698782291. Accessed: 24-August-2016.

[47] Qinghua Zhang and Douglas S. Reeves. MetaAware: identifying metamorphic malware. In Computer Security Applications Conference, 2007 (ACSAC 2007), Twenty-Third Annual, pages 411–420. IEEE, 2007.

[48] Zhi-hong Zuo, Qing-xin Zhu, and Ming-tian Zhou. On the time complexity of computer viruses. IEEE Transactions on Information Theory, 51(8):2962–2966, 2005.