A Classification of Viruses through Recursion
Theorems
Guillaume Bonfante, Matthieu Kaczmarek and Jean-Yves Marion
Guillaume.Bonfante@loria.fr, Matthieu.Kaczmarek@loria.fr and
Jean-Yves.Marion@loria.fr
Nancy-Université - Loria - INPL - Ecole Nationale Supérieure des Mines de Nancy
B.P. 239, 54506 Vandœuvre-lès-Nancy Cédex, France
Abstract. We study computer virology from an abstract point of view.
Viruses and worms are self-replicating programs, whose constructions are
essentially based on Kleene’s second recursion theorem. We show that
we can classify viruses as solutions of fixed point equations which are
obtained from different versions of Kleene’s second recursion theorem.
This leads us to consider four classes of viruses with various polymorphic
features. We propose to use virus distributions in order to deal with
mutations.
Topics covered. Computability theoretic aspects of programs, com-
puter virology.
Keywords. Computer viruses, polymorphism, propagation, recursion
theorem, iteration theorem.
1 Theoretical Computer Virology
Computer virus infections are a major threat to information security. Following
Filiol’s book [9], we believe that theoretical studies should help to design
new defenses against computer viruses. The objective of this paper is to pursue
the theoretical study of computer viruses initiated in [4]. Since viruses are
essentially self-replicating programs, we see virus programming methods as an
attempt to answer von Neumann’s question [22].
Can an automaton be constructed, i.e., assembled and built from appro-
priately “raw material”, by an other automaton? [. . . ] Can the construc-
tion of automata by automata progress from simpler types to increasingly
complicated types?
Abstract computer virology was initiated in the 80’s by the seminal works of
Cohen and Adleman [7]. The latter coined the term virus. Cohen defined viruses
with respect to Turing Machines [8]. Later [1], Adleman took a more abstract
point of view in order to have a definition independent from any particular
computational model. Then, only a few theoretical studies followed those seminal
works. Chess and White refined the mutation model of Cohen in [6]. Zuo and
Zhou formalized polymorphism from Adleman’s work [23] and they analyzed the
time complexity of viruses [24].
Recently, we proposed [3, 4] a formalization of the notion of virus within
computability theory. This formalization captures the previous definitions
mentioned above. We also characterized two kinds of viruses, blueprint and smith
viruses, and we proved their existence constructively. This work goes further by
introducing a notion of distribution to take polymorphism and metamorphism into
account.
We define four kinds of viruses:
1. A blueprint virus is a virus which reproduces by just duplicating its code.
2. An evolving blueprint virus is a virus which can mutate when it duplicates, by
modifying its code. Evolving blueprint viruses are generated by a distribution
engine.
3. A smith virus is a blueprint virus which can use its propagation function
directly to reproduce.
4. Lastly, we present Smith distributions. A virus generated by a Smith
distribution can mutate its code like evolving blueprint viruses, but can also
mutate its propagation function.
We show that each category is closely linked to a corresponding form of the
recursion theorem, which gives a rational taxonomy of viruses. Recursion theorems
thus play a key role in the construction of viruses, which is worth mentioning. Indeed,
and despite the works [11, 12], recursion theorems are used essentially to prove
“negative” results such as the constructions of undecidable or inseparable sets,
see [19] for a general reference, or such as Blum’s speed-up theorem [2].
Lastly, we switch to a simple programming language named WHILE+ to illustrate
the fact that our constructions live in the programming world. Actually,
we follow the ideas of the experiments with the iteration theorem and the
recursion theorem, which were developed in [11, 12] by Jones et al. and very recently
by Moss in [16].
2 A Virus Definition
2.1 The WHILE+ language
The domain of computation D is the set of binary trees generated from an atom
nil and a pairing mechanism $\langle \cdot, \cdot \rangle$. The syntax of WHILE+ is given by the following
grammar from a set of variables V:
Expressions: E → V | cons(E1, E2) | hd(E) | tl(E) |
exec_n(E0, E1, ..., En) | spec_n(E0, E1, ..., En) with n ≥ 1
Commands: C → V := E | C1; C2 | while(E){C} | if(E){C1}else{C2}
A WHILE+ program p is defined as follows: p(V1, ..., Vn){C; return E; }. A program
p computes a function $\llbracket p \rrbracket$ from $D^n$ to $D$.
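The domain D and the tree primitives can be mimicked in a few lines of Python. This is only an illustrative sketch: encoding nil as `None`, pairs as 2-tuples, the convention that `hd` and `tl` return nil on nil, and the helpers `from_list` and `length` are all assumptions of the sketch, not part of WHILE+.

```python
# Sketch of the domain D: binary trees built from the atom nil and pairing.
# Assumption: nil is encoded as None and a pair <a, b> as the tuple (a, b).
NIL = None

def cons(a, b):
    """Build the pair <a, b>."""
    return (a, b)

def hd(t):
    """Left component of a pair; nil on nil (a convention assumed here)."""
    return NIL if t is NIL else t[0]

def tl(t):
    """Right component of a pair; nil on nil (a convention assumed here)."""
    return NIL if t is NIL else t[1]

def from_list(items):
    """Hypothetical helper: encode a finite sequence as a right-nested tree."""
    tree = NIL
    for item in reversed(items):
        tree = cons(item, tree)
    return tree

def length(tree):
    """Count elements with a WHILE-style loop: while(y){ ...; y := tl(y) }."""
    n = 0
    while tree is not NIL:   # nil plays the role of "false"
        tree = tl(tree)
        n += 1
    return n
```

The loop in `length` has the same shape as the `while (y) { ...; y := tl(y); }` loops used in the programs of this paper.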
We suppose that we are given a concrete syntax of WHILE+, that is, an encoding
of programs by binary trees of D. From now on, when the context is clear,
we do not make any distinction between a program and its concrete syntax, and
we make no distinction between programs and data.
For convenience, we have a built-in self-interpreter exec_n of WHILE+ programs
which satisfies:
$\llbracket \mathrm{exec}_n \rrbracket(p, x_1, \ldots, x_n) = \llbracket p \rrbracket(x_1, \ldots, x_n)$
In the above equation, the argument p on the left-hand side is the concrete
syntax of the program p.
We also use a built-in specializer spec_m which satisfies:
$\llbracket \llbracket \mathrm{spec}_m \rrbracket(p, x_1, \ldots, x_m) \rrbracket(x_{m+1}, \ldots, x_n) = \llbracket p \rrbracket(x_1, \ldots, x_n)$
We may omit the subscript which indicates the number of arguments of an
interpreter or a specializer.
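The two equations for the interpreter and the specializer can be checked on a toy model. The sketch below is not WHILE+: it assumes that a base program is a plain Python function, that specialized code is the tuple `("spec", prog, fixed_args)`, and it uses the name `run` for the interpreter (to avoid Python's built-in `exec`). The program `add3` is a made-up example.

```python
# Sketch: programs as data, with a specializer and an interpreter.
# Assumptions: base programs are Python functions; specialized code is the
# tuple ("spec", prog, fixed_args); `run` stands in for exec.

def spec(prog, *fixed):
    """Specializer: return the code of prog with its first arguments fixed."""
    return ("spec", prog, tuple(fixed))

def run(prog, *args):
    """Interpreter: unfold specializations, then apply the base program."""
    while isinstance(prog, tuple) and prog[0] == "spec":
        _, prog, fixed = prog
        args = fixed + args          # fixed arguments come first
    return prog(*args)

def add3(a, b, c):
    """Made-up base program used to exercise the two equations."""
    return a + 10 * b + 100 * c

residual = spec(add3, 1, 2)          # a piece of data, nothing is executed yet
value = run(residual, 3)             # same result as add3(1, 2, 3)
```

Because specialized code is ordinary data, two specializations of the same program with the same arguments compare equal, which is what lets a program recognise a copy of its own code in the later constructions.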
The use of an interpreter and of a specializer is justified by Jones who showed
in [13] that programs with these constructions can be simulated up to a linear
constant time by programs without them.
If f and g designate the same function, we write $f \approx g$. A function f is
semi-computable if there is a program p such that $\llbracket p \rrbracket \approx f$; moreover, if f is
total, we say that f is computable.
2.2 A Computer Virus representation
We propose the following scenario in order to represent viruses. When a program
p is executed within an environment x, the evaluation of $\llbracket p \rrbracket(x)$, if it halts, is a
new environment. This process may then be repeated by replacing x by the newly
computed environment. The entry x is thought of as a finite sequence $\langle x_1, \ldots, x_n \rangle$
which represents files and accessible parameters.
Typically, a program copy which duplicates a file satisfies
$\llbracket \mathrm{copy} \rrbracket(p, x) = \langle p, p, x \rangle$. The original environment is $\langle p, x \rangle$. After the evaluation of copy, we
have the environment $\langle p, p, x \rangle$ in which p is copied.
Next, consider the example of a parasitic virus. Parasitic viruses insert
themselves into existing files. When an infected host is executed, the virus first
infects a new host, then it gives control back to the original host. For more
details we refer to the virus writing manual of Ludwig [15]. A parasitic virus
is a program v which works on an environment $\langle p, q, x \rangle$. The infected form
of p is B(v, p), where B is a propagation function which specifies how a virus
contaminates a file. Here, the propagation function B can be, for example, a
program code concatenation function. So, we have a first “generic” equation:
$\llbracket v \rrbracket(p, \langle q, x \rangle) = \llbracket B(v, p) \rrbracket(\langle q, x \rangle)$. Following the description of a parasitic virus,
v computes the infected form B(v, q) and then executes p. This means that the
following equation also holds: $\llbracket v \rrbracket(p, \langle q, x \rangle) = \llbracket p \rrbracket(B(v, q), x)$. A parasitic virus is
defined by the two above equations.
More generally, the construction of viruses lies in the resolution of fixed point
equations such as the ones above in which v and B are unknowns. The existence
of solutions of such systems is provided by Kleene’s recursion theorem. From
this observation and following [4], we propose the following virus representation:
Definition 1 (Computer Virus). Let B be a computable function. A virus
w.r.t. B is a program v such that $\forall p, x : \llbracket v \rrbracket(p, x) = \llbracket B(v, p) \rrbracket(x)$. Then, B is
named a propagation function for the virus v.
This definition includes the ones of Adleman and Cohen, and it handles
more propagation and duplication features than the other models [4]. Moreover,
it is worth noticing that the existence of a virus v w.r.t. a given propagation
function B is constructive. This is a key difference since it allows us to build viruses
by applying the fixed point constructions given by proofs of recursion theorems.
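The equation of Definition 1 can be tested, though of course not proved, on finitely many inputs. The sketch below reuses a toy model in which base programs are Python functions and specialized code is a tuple; the names `run`, `is_virus_on`, `B_other`, the harmless program `v` and the sample hosts are all assumptions made for illustration. It shows that, with the specializer as propagation function, the equation holds for any two-argument program, while an arbitrary B generally fails.

```python
# Sketch: testing the equation of Definition 1 on a few sample inputs.
# Assumptions: programs are Python functions or ("spec", prog, args) tuples.

def spec(prog, *fixed):
    return ("spec", prog, tuple(fixed))

def run(prog, *args):
    while isinstance(prog, tuple) and prog[0] == "spec":
        _, prog, fixed = prog
        args = fixed + args
    return prog(*args)

def is_virus_on(v, B, samples):
    """Check [[v]](p, x) == [[B(v, p)]](x) on the given (p, x) samples only."""
    return all(run(v, p, x) == run(B(v, p), x) for p, x in samples)

def v(p, x):
    """Harmless stand-in program: it only tags its environment."""
    return ("ran", p, x)

def B_other(v, p):
    """A propagation function that forgets the host p."""
    return spec(v, "some_other_host")

samples = [("host_a", 0), ("host_b", 1)]
holds_for_spec = is_virus_on(v, spec, samples)       # B = the specializer
holds_for_other = is_virus_on(v, B_other, samples)   # B forgets the host
```

A finite check like this can only refute the equation; establishing it for all p and x is what the recursion theorems below are used for.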
A motivation behind the choice of the WHILE+ programming language is the fact
that it has no self-referential operator, like $0 in bash, which returns a copy
of the program's concrete syntax. Indeed, we present below virus constructions
without this feature. This shows that even if there is no self-referential operator,
there are still viruses. Viruses should, however, be more efficient if such operators are
present. Of course, a seminal paper on this subject is [21].
3 Blueprint Duplication
3.1 Blueprint distribution engine
From [4], a blueprint virus for a function g is a program v which computes g
using its own code v and its environment p, x. The function g can be seen as
the virus specification function. A blueprint virus for a function g is a program
v which satisfies
v is a virus w.r.t. some propagation function
$\forall p, x : \llbracket v \rrbracket(p, x) = g(v, p, x)$    (1)
Note that a blueprint virus does not use any code of its propagation function,
unlike smith viruses that we shall see shortly. The solutions of this system are
provided by Kleene’s recursion theorem.
Theorem 2 (Kleene’s Recursion Theorem [14]). Let f be a semi-computable
function. There is a program e such that $\llbracket e \rrbracket(x) = f(e, x)$.
Definition 3 (Distribution engine). A distribution engine is a program dv
such that for every virus specification program g, $\llbracket \mathit{dv} \rrbracket(g)$ is a virus w.r.t. a fixed
and given propagation function B.
Theorem 4. There is a distribution engine dv such that for any program g,
$\llbracket \mathit{dv} \rrbracket(g)$ is a blueprint virus for $\llbracket g \rrbracket$.
Proof. We use a construction for the recursion theorem due to Smullyan [20].
It provides a fixpoint which can be directly used as a distribution engine. We
define dv thanks to the concrete syntax of dg as follows:
dg (z,u,y,x){
r := exec(z,spec(u,z,u),y,x);
return r;
}
dv (g){
r := spec(dg,g,dg);
return r;
}
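We observe that $\llbracket \llbracket \mathit{dv} \rrbracket(g) \rrbracket(p, x) = \llbracket g \rrbracket(\llbracket \mathit{dv} \rrbracket(g), p, x)$. Moreover, $\llbracket \mathit{dv} \rrbracket(g)$ is clearly
a virus w.r.t. the propagation function $\llbracket \mathrm{spec} \rrbracket$.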
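The fixed point construction of this proof can be replayed on a toy model. The sketch below transcribes dg and dv into Python under stated assumptions: base programs are Python functions, specialized code is a tuple, `run` stands in for exec, and the specification `g` is a harmless made-up one that only reports the code it receives.

```python
# Sketch of the proof of Theorem 4 on a toy model of programs as data.
# Assumptions: `run` stands in for exec; g is a harmless example specification.

def spec(prog, *fixed):
    return ("spec", prog, tuple(fixed))

def run(prog, *args):
    while isinstance(prog, tuple) and prog[0] == "spec":
        _, prog, fixed = prog
        args = fixed + args
    return prog(*args)

def dg(z, u, y, x):
    # r := exec(z, spec(u, z, u), y, x)
    return run(z, spec(u, z, u), y, x)

def dv(g):
    # r := spec(dg, g, dg)
    return spec(dg, g, dg)

def g(v, p, x):
    """Harmless specification: return the code it was handed and its inputs."""
    return (v, p, x)

v = dv(g)                    # the fixed point, as a piece of data
result = run(v, "p", "x")    # g is called with v's own code as first argument
```

In this model `result[0]` is equal to `v`: the program obtains its own code without any self-referential operator, only through `spec`.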
We consider a typical example of blueprint duplication which resembles the
real-life virus ILoveYou. This program arrives as an e-mail attachment. Opening
the attachment triggers the attack. The infection first scans the memory for
passwords and sends them back to the attacker, then the virus self-duplicates by
sending itself to every address of the local address book.
To represent this scenario we need to deal with mailing processes. A mail
$m = \langle @, y \rangle$ is an association of an address @ and data y. Then, we consider that
the environment contains a mailbox $\mathit{mb} = \langle m_1, \ldots, m_n \rangle$ which is a sequence of
mails. To send a mail m, we add it to the mailbox, that is mb := cons(m, mb).
We suppose that an external process deals with mailing.
In the following, x denotes the local file structure, and $@\mathit{bk} = \langle @_1, \ldots, @_n \rangle$
denotes the local address book, a sequence of addresses. We finally introduce a
WHILE+ program find which searches its input for passwords and which returns
them as its evaluation. The virus behavior for the scenario of ILoveYou is given
by the following program.
g (v,mb,⟨@bk,x⟩) {
pass := exec(find,x);
mb := cons(cons("badguy@dom.com",pass),mb);
y := @bk;
while (y) {
mb := cons(cons(hd(y),v),mb);
y := tl(y);
}
return mb;
}
From the virus specification program g, we generate the blueprint virus
$\llbracket \mathit{dv} \rrbracket(g)$.
3.2 Distributions of evolving blueprint viruses
An evolving blueprint virus is a virus which can mutate while its propagation
function remains the same. Here, we describe a distribution engine for which the
specification of a virus can use the code of its own distribution engine. Thus,
we can generate evolved copies of a virus. Formally, given a virus specification
function g, a distribution of evolving blueprint viruses is a program dv satisfying:
dv is a distribution engine
$\forall i, p, x : \llbracket \llbracket \mathit{dv} \rrbracket(i) \rrbracket(p, x) = g(\mathit{dv}, i, p, x)$    (2)
The existence of blueprint distributions corresponds to a stronger form of the
recursion theorem, which was first proved by Case [5].
Theorem 5 (Explicit recursion [4]). Let f be a semi-computable function.
There exists a computable function e such that $\forall x, y : \llbracket e(x) \rrbracket(y) = f(e, x, y)$,
where the first argument of f is a program which computes e.
Definition 6 (Distribution engine builder). A builder of distribution engines
is a program cv such that for every virus specification program g, $\llbracket \mathit{cv} \rrbracket(g)$ is a
distribution engine.
Theorem 7. There is a builder of distribution engines cv such that for any
program g, $\llbracket \mathit{cv} \rrbracket(g)$ is a distribution of evolving blueprint viruses for some fixed
propagation function B.
Proof. We define
edg (z,t,i,y,x) {
e := spec(spec3,t,z,t);
return exec(z,e,i,y,x);
}
cv (g){
r := spec(spec3,edg,g,edg);
return r;
}
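We observe that for any i, $\llbracket \llbracket \llbracket \mathit{cv} \rrbracket(g) \rrbracket(i) \rrbracket(p, x) = \llbracket g \rrbracket(\llbracket \mathit{cv} \rrbracket(g), i, p, x)$. Moreover,
$\llbracket \llbracket \mathit{cv} \rrbracket(g) \rrbracket(i)$ is a virus w.r.t. $\llbracket \mathrm{spec} \rrbracket$.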
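The construction with edg and cv can also be replayed on a toy model. In the sketch below, which rests on several assumptions, base programs are Python functions, specialized code is a tuple, `run` stands in for exec, the variadic Python function `spec` is itself used as a program in place of spec3, keys are plain integers rather than trees, and `g` is a harmless made-up specification that only returns the code of the next indexed copy.

```python
# Sketch of the proof of Theorem 7 on a toy model of programs as data.
# Assumptions: `spec` used as a program plays the role of spec3; keys are ints.

def spec(prog, *fixed):
    return ("spec", prog, tuple(fixed))

def run(prog, *args):
    while isinstance(prog, tuple) and prog[0] == "spec":
        _, prog, fixed = prog
        args = fixed + args
    return prog(*args)

def edg(z, t, i, y, x):
    e = spec(spec, t, z, t)          # e := spec(spec3, t, z, t)
    return run(z, e, i, y, x)        # return exec(z, e, i, y, x)

def cv(g):
    return spec(spec, edg, g, edg)   # r := spec(spec3, edg, g, edg)

def g(dv, i, p, x):
    """Harmless specification: build the copy indexed by the next key."""
    return (run(dv, i + 1), p, x)

d = cv(g)                            # code of the distribution engine
v0 = run(d, 0)                       # copy indexed by 0
v1, p, x = run(v0, "p", "x")         # v0 hands back the copy indexed by 1
```

Here `v1` equals `run(d, 1)` and differs from `v0` as a piece of data, which is the sense in which the engine produces distinct indexed copies from a single specification.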
To illustrate Theorem 7, we come back to the scenario of the virus ILoveYou,
and we add mutation abilities to it. We introduce a WHILE+ program poly which
is a polymorphic engine. This program takes a program p and a key i, and it
rewrites p according to i, preserving the semantics of p. That is, poly satisfies
$\llbracket \mathrm{poly} \rrbracket(p, i)$ is one-one on i and $\llbracket \llbracket \mathrm{poly} \rrbracket(p, i) \rrbracket \approx \llbracket p \rrbracket$.
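The two requirements on poly (injectivity in the key and preservation of semantics) can be illustrated by a deliberately trivial rewriting. The sketch below assumes a toy model where specialized code is a tuple and `run` stands in for exec; the rewriting itself (wrapping p in i + 1 empty specializations) and the program `double` are made up for illustration and are not the engine intended by the paper.

```python
# Sketch: a trivial key-indexed, semantics-preserving rewriting of code.
# Assumptions: toy model of programs as data; integer keys.

def spec(prog, *fixed):
    return ("spec", prog, tuple(fixed))

def run(prog, *args):
    while isinstance(prog, tuple) and prog[0] == "spec":
        _, prog, fixed = prog
        args = fixed + args
    return prog(*args)

def poly(p, i):
    """Wrap p in i + 1 specializations that fix no argument at all."""
    q = p
    for _ in range(i + 1):
        q = spec(q)              # behaviour is unchanged, the code is not
    return q

def double(n):
    """Made-up base program."""
    return 2 * n

forms = [poly(double, i) for i in range(4)]
outputs = [run(form, 21) for form in forms]
```

The four forms are pairwise distinct pieces of data, since their nesting depths differ, yet each of them computes the same function as `double` on the inputs tried.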
We build a virus which self-duplicates by sending mutated forms of itself. With
the notations of Sect. 3.1, we consider a behavior described by the following
WHILE+ program.
g (dv,i,mb,⟨@bk,x⟩) {
pass := exec(find,x);
mb := cons(cons("badguy@dom.com",pass),mb);
next_key := cons(nil,i);
virus := exec(dv,next_key);
mutation := exec(poly,virus,i);
y := @bk;
while (y) {
mb := cons(cons(hd(y),mutation),mb);
y := tl(y);
}
return mb;
}
We apply Theorem 7 to transform this program into the code of the corresponding
distribution engine. So, $\llbracket \llbracket \mathit{cv} \rrbracket(g) \rrbracket(i)$ is a copy indexed by i of the evolving
blueprint virus specified by g.
4 Smith Reproduction
4.1 Smith Viruses
We define a smith virus as a pair of programs (v, B) which is defined w.r.t. a virus
specification function g according to the following system.
v is a virus w.r.t. $\llbracket B \rrbracket$
$\forall p, x : \llbracket v \rrbracket(p, x) = g(B, v, p, x)$
The class of smith viruses is obtained by the double recursion theorem due to
Smullyan [18] as a solution to the above equations.
Theorem 8 (Double Recursion Theorem [18]). Let f1 and f2 be two
semi-computable functions. There are two programs e1 and e2 such that
$\llbracket e_1 \rrbracket(x) = f_1(e_1, e_2, x)$ and $\llbracket e_2 \rrbracket(x) = f_2(e_1, e_2, x)$.
We extend the previous definition of distribution engine to propagation
engines as follows.
Definition 9 (Virus Distribution). A virus distribution is a pair (dv, dB)
of programs such that for every virus specification g, $\llbracket \mathit{dv} \rrbracket(g)$ is a virus w.r.t.
$\llbracket \llbracket \mathit{dB} \rrbracket(g) \rrbracket$. As previously, dv is named a distribution engine and dB is named
a propagation engine.
Theorem 10. There is a virus distribution (dv, dB) such that for any program
g, $(\llbracket \mathit{dv} \rrbracket(g), \llbracket \mathit{dB} \rrbracket(g))$ is a smith virus for $\llbracket g \rrbracket$.
Proof. We define the following programs with a double fixed point.
dg1 (z1,z2,y1,y2,y,x) {
e1 := spec(y1,z1,z2,y1,y2);
e2 := spec(y2,z1,z2,y1,y2);
return exec(z1,e1,e2,y,x);
}
dg2 (z1,z2,y1,y2,y,x) {
e1 := spec(y1,z1,z2,y1,y2);
e2 := spec(y2,z1,z2,y1,y2);
return exec(z2,e1,e2,y,x);
}
and
pispec (g,B,v,y,p) {
r := spec(g,B,v,p);
return r;
}
Then, let dv and dB be the following programs.
dv (g){
r := spec(pispec,g);
return spec(dg2,r,g,dg1,dg2);
}
dB (g){
r := spec(pispec,g);
return spec(dg1,r,g,dg1,dg2);
}
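We observe that for any program g,
$\llbracket \llbracket \mathit{dv} \rrbracket(g) \rrbracket(p, x) = \llbracket \llbracket \llbracket \mathit{dB} \rrbracket(g) \rrbracket(\llbracket \mathit{dv} \rrbracket(g), p) \rrbracket(x) = \llbracket g \rrbracket(\llbracket \mathit{dB} \rrbracket(g), \llbracket \mathit{dv} \rrbracket(g), p, x)$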
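The double fixed point of this proof can be checked on a toy model. The sketch below transcribes dg1, dg2, pispec, dv and dB into Python under stated assumptions: base programs are Python functions, specialized code is a tuple, `run` stands in for exec, and `g` is a harmless made-up specification that only reports the codes it receives.

```python
# Sketch of the proof of Theorem 10 on a toy model of programs as data.
# Assumptions: `run` stands in for exec; g is a harmless example specification.

def spec(prog, *fixed):
    return ("spec", prog, tuple(fixed))

def run(prog, *args):
    while isinstance(prog, tuple) and prog[0] == "spec":
        _, prog, fixed = prog
        args = fixed + args
    return prog(*args)

def dg1(z1, z2, y1, y2, y, x):
    e1 = spec(y1, z1, z2, y1, y2)
    e2 = spec(y2, z1, z2, y1, y2)
    return run(z1, e1, e2, y, x)

def dg2(z1, z2, y1, y2, y, x):
    e1 = spec(y1, z1, z2, y1, y2)
    e2 = spec(y2, z1, z2, y1, y2)
    return run(z2, e1, e2, y, x)

def pispec(g, B, v, y, p):
    return spec(g, B, v, p)

def dv(g):
    r = spec(pispec, g)
    return spec(dg2, r, g, dg1, dg2)

def dB(g):
    r = spec(pispec, g)
    return spec(dg1, r, g, dg1, dg2)

def g(B, v, p, x):
    """Harmless specification: return the two codes it was handed."""
    return (B, v, p, x)

v, B = dv(g), dB(g)
left = run(v, "p", "x")                 # [[v]](p, x)
middle = run(run(B, v, "p"), "x")       # [[ [[B]](v, p) ]](x)
```

In this model `left` and `middle` coincide and both equal `(B, v, "p", "x")`, so each of the two programs receives its own code together with the code of the other.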
We now show how to build the parasitic virus of Sect. 2. The virus specification
function g of the virus is the following.
g (B,v,p,⟨q,x⟩) {
infected_form := exec(B,v,q);
return exec(p,infected_form,x);
}
First, it infects a new host q with the virus v using the propagation procedure
B. Then, it executes the original host p. This corresponds to the behavior of a
parasitic virus. We obtain a smith virus using the builder of Theorem 10.
4.2 Smith Distributions
Smith distributions generate viruses which are able to mutate both their code and
their propagation mechanism. A smith distribution (dv, dB) w.r.t. the virus
specification program g satisfies
(dv, dB) is a virus distribution
$\forall i, p, x : \llbracket \llbracket \mathit{dv} \rrbracket(i) \rrbracket(p, x) = \llbracket g \rrbracket(\mathit{dB}, \mathit{dv}, i, p, x)$
The class of Smith distributions is obtained as the solutions of this system,
using the following double recursion theorem.
Theorem 11 (Double explicit Recursion). Let f1 and f2 be two
semi-computable functions. There are two computable functions e1 and e2 such that
for all x and y
$\llbracket e_1(x) \rrbracket(y) = f_1(e_1, e_2, x, y)$ and $\llbracket e_2(x) \rrbracket(y) = f_2(e_1, e_2, x, y)$,
where the first two arguments of f1 and f2 are programs which respectively compute e1 and e2.
Definition 12 (Distribution builder). A distribution builder is a pair of
programs (cv, cB) such that for every virus specification program g, $(\llbracket \mathit{cv} \rrbracket(g), \llbracket \mathit{cB} \rrbracket(g))$
is a virus distribution.
Theorem 13. There is a distribution builder (cv, cB) such that for any program
g, $(\llbracket \mathit{cv} \rrbracket(g), \llbracket \mathit{cB} \rrbracket(g))$ is a smith distribution for $\llbracket g \rrbracket$.
Proof. We define the following programs:
edg1 (z1,z2,t1,t2,i,y,x) {
e1 := spec(spec5,t1,z1,z2,t1,t2);
e2 := spec(spec5,t2,z1,z2,t1,t2);
return exec(z1,e1,e2,i,y,x);
}
edg2 (z1,z2,t1,t2,i,y,x) {
e1 := spec(spec5,t1,z1,z2,t1,t2);
e2 := spec(spec5,t2,z1,z2,t1,t2);
return exec(z2,e1,e2,i,y,x);
}
and
pispec (g,db,dv,i,y,p) {
r := spec(g,db,dv,i,p);
return r;
}
Let cv and cB be the following programs.
cv (g){
r := spec(pispec,g);
return spec(spec5,edg2,r,g,edg1,edg2);
}
cB (g){
r := spec(pispec,g);
return spec(spec5,edg1,r,g,edg1,edg2);
}
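We observe that for any program g,
$\llbracket \llbracket \llbracket \mathit{cv} \rrbracket(g) \rrbracket(i) \rrbracket(p, x) = \llbracket \llbracket \llbracket \llbracket \mathit{cB} \rrbracket(g) \rrbracket(i) \rrbracket(\llbracket \llbracket \mathit{cv} \rrbracket(g) \rrbracket(i), p) \rrbracket(x)$
$= \llbracket g \rrbracket(\llbracket \mathit{cB} \rrbracket(g), \llbracket \mathit{cv} \rrbracket(g), i, p, x)$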
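The indexed double fixed point can be checked in the same way. The sketch below transcribes edg1, edg2, pispec, cv and cB into Python under stated assumptions: base programs are Python functions, specialized code is a tuple, `run` stands in for exec, the variadic Python function `spec` is used as a program in place of spec5, keys are plain integers, and `g` is a harmless made-up specification.

```python
# Sketch of the proof of Theorem 13 on a toy model of programs as data.
# Assumptions: `spec` used as a program plays the role of spec5; keys are ints.

def spec(prog, *fixed):
    return ("spec", prog, tuple(fixed))

def run(prog, *args):
    while isinstance(prog, tuple) and prog[0] == "spec":
        _, prog, fixed = prog
        args = fixed + args
    return prog(*args)

def edg1(z1, z2, t1, t2, i, y, x):
    e1 = spec(spec, t1, z1, z2, t1, t2)
    e2 = spec(spec, t2, z1, z2, t1, t2)
    return run(z1, e1, e2, i, y, x)

def edg2(z1, z2, t1, t2, i, y, x):
    e1 = spec(spec, t1, z1, z2, t1, t2)
    e2 = spec(spec, t2, z1, z2, t1, t2)
    return run(z2, e1, e2, i, y, x)

def pispec(g, db, dv, i, y, p):
    return spec(g, db, dv, i, p)

def cv(g):
    r = spec(pispec, g)
    return spec(spec, edg2, r, g, edg1, edg2)

def cB(g):
    r = spec(pispec, g)
    return spec(spec, edg1, r, g, edg1, edg2)

def g(db, dv, i, p, x):
    """Harmless specification: return both engine codes and the key."""
    return (db, dv, i, p, x)

DV, DB = cv(g), cB(g)
v, B = run(DV, 3), run(DB, 3)           # copies indexed by the key 3
left = run(v, "p", "x")
middle = run(run(B, v, "p"), "x")
```

Both `left` and `middle` evaluate to `(DB, DV, 3, "p", "x")` in this model, so a copy of generation 3 has access to the codes of both engines and can ask them for other generations.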
We enhance the virus of Sect. 4.1 by adding some polymorphic abilities. Any
virus of generation i infects a new host q with a virus of the next generation using
the propagation procedure of generation i. Then it gives control back to the
original host p. This behavior is illustrated by the following program.
g (db,dv,i,p,⟨q,x⟩) {
B := exec(db,i);
v := exec(dv,cons(i,nil));
mutation := exec(poly,v,i);
infected_form := exec(B,mutation,q);
return exec(p,infected_form,x);
}
Then, we obtain the smith distribution by the builder of Theorem 13.
References
1. L. Adleman. An abstract theory of computer viruses. In Advances in Cryptology
– CRYPTO’88, volume 403. Lecture Notes in Computer Science, 1988.
2. M. Blum. A machine-independent theory of the complexity of recursive functions.
Journal of the Association for Computing Machinery, 14(2):322–336, 1967.
3. G. Bonfante, M. Kaczmarek, and J.-Y. Marion. Toward an abstract computer
virology. In ICTAC, pages 579–593, 2005.
4. G. Bonfante, M. Kaczmarek, and J.-Y. Marion. On abstract computer virology
from a recursion-theoretic perspective. Journal in Computer Virology, 1(3-4), 2006.
5. J. Case. Periodicity in generations of automata. Theory of Computing Systems,
8(1):15–32, 1974.
6. D. Chess and S. White. An undetectable computer virus. Proceedings of the 2000
Virus Bulletin Conference (VB2000), 2000.
7. F. Cohen. Computer Viruses. PhD thesis, University of Southern California,
January 1986.
8. F. Cohen. On the implications of computer viruses and methods of defense. Com-
puters and Security, 7:167–184, 1988.
9. E. Filiol. Computer Viruses: from Theory to Applications. Springer-Verlag, 2005.
10. E. Filiol. Malware pattern scanning schemes secure against black-box analysis.
Journal of Computer Virology, 2(1):35–50, 2006.
11. T. Hansen, T. Nikolajsen, J. Träff, and N. Jones. Experiments with
implementations of two theoretical constructions. In Lecture Notes in Computer Science,
volume 363, pages 119–133. Springer Verlag, 1989.
12. N. Jones. Computer implementation and applications of Kleene’s S-m-n and
recursive theorems. In Y. N. Moschovakis, editor, Lecture Notes in Mathematics, Logic
From Computer Science, pages 243–263. Springer Verlag, 1991.
13. N. Jones. Constant Time Factors Do Matter. MIT Press, Cambridge, MA, USA,
1997.
14. S. Kleene. Introduction to Metamathematics. Van Nostrand, 1952.
15. M. Ludwig. The Giant Black Book of Computer Viruses. American Eagle Publi-
cations, 1998.
16. L. Moss. Recursion theorems and self-replication via text register machine pro-
grams. In EATCS bulletin, 2006.
17. P. Odifreddi. Classical Recursion Theory. North-Holland, 1989.
18. H. Rogers. Theory of Recursive Functions and Effective Computability. McGraw
Hill, New York, 1967.
19. R. Smullyan. Recursion Theory for Metamathematics. Oxford University Press,
1993.
20. R. Smullyan. Diagonalization and Self-Reference. Oxford University Press, 1994.
21. K. Thompson. Reflections on trusting trust. Communications of the Association
for Computing Machinery, 27(8):761–763, 1984.
22. J. von Neumann. Theory of Self-Reproducing Automata. University of Illinois
Press, Urbana, Illinois, 1966. edited and completed by A.W.Burks.
23. Z. Zuo and M. Zhou. Some further theoretical results about computer viruses. The
Computer Journal, 47(6):627–633, 2004.
24. Z. Zuo, Q.-x. Zhu, and M.-t. Zhou. On the time complexity of computer viruses.
IEEE Transactions on Information Theory, 51(8):2962–2966, August 2005.

More Related Content

Viewers also liked

A cooperative immunization system for an untrusting internet
A cooperative immunization system for an untrusting internetA cooperative immunization system for an untrusting internet
A cooperative immunization system for an untrusting internet
UltraUploader
 
A method for detecting obfuscated calls in malicious binaries
A method for detecting obfuscated calls in malicious binariesA method for detecting obfuscated calls in malicious binaries
A method for detecting obfuscated calls in malicious binaries
UltraUploader
 
(Ebook computer - ita - pdf) fondamenti di informatica - teoria
(Ebook   computer - ita - pdf) fondamenti di informatica - teoria(Ebook   computer - ita - pdf) fondamenti di informatica - teoria
(Ebook computer - ita - pdf) fondamenti di informatica - teoriaUltraUploader
 
A note on cohen's formal model for computer viruses
A note on cohen's formal model for computer virusesA note on cohen's formal model for computer viruses
A note on cohen's formal model for computer viruses
UltraUploader
 
An introduction to computer viruses
An introduction to computer virusesAn introduction to computer viruses
An introduction to computer viruses
UltraUploader
 
A framework for deception
A framework for deceptionA framework for deception
A framework for deception
UltraUploader
 
A plague of viruses biological, computer and marketing
A plague of viruses biological, computer and marketingA plague of viruses biological, computer and marketing
A plague of viruses biological, computer and marketing
UltraUploader
 
Bot software spreads, causes new worries
Bot software spreads, causes new worriesBot software spreads, causes new worries
Bot software spreads, causes new worries
UltraUploader
 
An overview of unix rootkits
An overview of unix rootkitsAn overview of unix rootkits
An overview of unix rootkits
UltraUploader
 
[Ebook ita - database] access 2000 manuale
[Ebook   ita - database] access 2000 manuale[Ebook   ita - database] access 2000 manuale
[Ebook ita - database] access 2000 manualeUltraUploader
 
A formal definition of computer worms and some related results
A formal definition of computer worms and some related resultsA formal definition of computer worms and some related results
A formal definition of computer worms and some related results
UltraUploader
 

Viewers also liked (11)

A cooperative immunization system for an untrusting internet
A cooperative immunization system for an untrusting internetA cooperative immunization system for an untrusting internet
A cooperative immunization system for an untrusting internet
 
A method for detecting obfuscated calls in malicious binaries
A method for detecting obfuscated calls in malicious binariesA method for detecting obfuscated calls in malicious binaries
A method for detecting obfuscated calls in malicious binaries
 
(Ebook computer - ita - pdf) fondamenti di informatica - teoria
(Ebook   computer - ita - pdf) fondamenti di informatica - teoria(Ebook   computer - ita - pdf) fondamenti di informatica - teoria
(Ebook computer - ita - pdf) fondamenti di informatica - teoria
 
A note on cohen's formal model for computer viruses
A note on cohen's formal model for computer virusesA note on cohen's formal model for computer viruses
A note on cohen's formal model for computer viruses
 
An introduction to computer viruses
An introduction to computer virusesAn introduction to computer viruses
An introduction to computer viruses
 
A framework for deception
A framework for deceptionA framework for deception
A framework for deception
 
A plague of viruses biological, computer and marketing
A plague of viruses biological, computer and marketingA plague of viruses biological, computer and marketing
A plague of viruses biological, computer and marketing
 
Bot software spreads, causes new worries
Bot software spreads, causes new worriesBot software spreads, causes new worries
Bot software spreads, causes new worries
 
An overview of unix rootkits
An overview of unix rootkitsAn overview of unix rootkits
An overview of unix rootkits
 
[Ebook ita - database] access 2000 manuale
[Ebook   ita - database] access 2000 manuale[Ebook   ita - database] access 2000 manuale
[Ebook ita - database] access 2000 manuale
 
A formal definition of computer worms and some related results
A formal definition of computer worms and some related resultsA formal definition of computer worms and some related results
A formal definition of computer worms and some related results
 

Similar to A classification of viruses through recursion theorems

A study of detecting computer viruses in real infected files in the n-gram re...
A study of detecting computer viruses in real infected files in the n-gram re...A study of detecting computer viruses in real infected files in the n-gram re...
A study of detecting computer viruses in real infected files in the n-gram re...
UltraUploader
 
Unified Programming Theory
Unified Programming TheoryUnified Programming Theory
Unified Programming Theory
Crazy Mathematician
 
An undetectable computer virus
An undetectable computer virusAn undetectable computer virus
An undetectable computer virus
UltraUploader
 
Are current antivirus programs able to detect complex metamorphic malware an ...
Are current antivirus programs able to detect complex metamorphic malware an ...Are current antivirus programs able to detect complex metamorphic malware an ...
Are current antivirus programs able to detect complex metamorphic malware an ...
UltraUploader
 
Programming modulo representations
Programming modulo representationsProgramming modulo representations
Programming modulo representations
Marco Benini
 
An abstract theory of computer viruses
An abstract theory of computer virusesAn abstract theory of computer viruses
An abstract theory of computer viruses
UltraUploader
 
Basic programming1..pptx
Basic programming1..pptxBasic programming1..pptx
Basic programming1..pptx
Lokesh238440
 
Asm based modelling of self-replicating programs
Asm based modelling of self-replicating programsAsm based modelling of self-replicating programs
Asm based modelling of self-replicating programs
UltraUploader
 
Iteration, induction, and recursion
Iteration, induction, and recursionIteration, induction, and recursion
Iteration, induction, and recursion
Mohammed Hussein
 
414351_Iason_Papapanagiotakis-bousy_Iason_Papapanagiotakis_Thesis_2360661_357...
414351_Iason_Papapanagiotakis-bousy_Iason_Papapanagiotakis_Thesis_2360661_357...414351_Iason_Papapanagiotakis-bousy_Iason_Papapanagiotakis_Thesis_2360661_357...
414351_Iason_Papapanagiotakis-bousy_Iason_Papapanagiotakis_Thesis_2360661_357...
Jason Papapanagiotakis
 
Lect 1
Lect 1Lect 1
International Journal of Computational Engineering Research (IJCER)
International Journal of Computational Engineering Research (IJCER)International Journal of Computational Engineering Research (IJCER)
International Journal of Computational Engineering Research (IJCER)
ijceronline
 
Modeling and Threshold Sensitivity Analysis of Computer Virus Epidemic
Modeling and Threshold Sensitivity Analysis of Computer Virus EpidemicModeling and Threshold Sensitivity Analysis of Computer Virus Epidemic
Modeling and Threshold Sensitivity Analysis of Computer Virus Epidemic
IOSR Journals
 
I017134347
I017134347I017134347
I017134347
IOSR Journals
 
A general definition of malware
A general definition of malwareA general definition of malware
A general definition of malware
UltraUploader
 
PROGRAMMING LANGUAGES
PROGRAMMING LANGUAGESPROGRAMMING LANGUAGES
PROGRAMMING LANGUAGES
ABHINAV SINGH
 
CORCON2014: Does programming really need data structures?
CORCON2014: Does programming really need data structures?CORCON2014: Does programming really need data structures?
CORCON2014: Does programming really need data structures?
Marco Benini
 
MDH01-CSL03
MDH01-CSL03MDH01-CSL03
MDH01-CSL03
Dan HERNEST
 
Algebraic specification of computer viruses and their environments
Algebraic specification of computer viruses and their environmentsAlgebraic specification of computer viruses and their environments
Algebraic specification of computer viruses and their environments
UltraUploader
 
A Systematic Approach To Probabilistic Pointer Analysis
A Systematic Approach To Probabilistic Pointer AnalysisA Systematic Approach To Probabilistic Pointer Analysis
A Systematic Approach To Probabilistic Pointer Analysis
Monica Franklin
 

Similar to A classification of viruses through recursion theorems (20)

A study of detecting computer viruses in real infected files in the n-gram re...
A study of detecting computer viruses in real infected files in the n-gram re...A study of detecting computer viruses in real infected files in the n-gram re...
A study of detecting computer viruses in real infected files in the n-gram re...
 
Unified Programming Theory
Unified Programming TheoryUnified Programming Theory
Unified Programming Theory
 
An undetectable computer virus
An undetectable computer virusAn undetectable computer virus
A Classification of Viruses through Recursion Theorems

Guillaume Bonfante, Matthieu Kaczmarek and Jean-Yves Marion
Guillaume.Bonfante@loria.fr, Matthieu.Kaczmarek@loria.fr and Jean-Yves.Marion@loria.fr
Nancy-Université - Loria - INPL - Ecole Nationale Supérieure des Mines de Nancy
B.P. 239, 54506 Vandœuvre-lès-Nancy Cédex, France

Abstract. We study computer virology from an abstract point of view. Viruses and worms are self-replicating programs, whose constructions are essentially based on Kleene's second recursion theorem. We show that we can classify viruses as solutions of fixed point equations which are obtained from different versions of Kleene's second recursion theorem. This leads us to consider four classes of viruses with various polymorphic features. We propose to use virus distribution in order to deal with mutations.

Topics covered. Computability theoretic aspects of programs, computer virology.

Keywords. Computer viruses, polymorphism, propagation, recursion theorem, iteration theorem.

1 Theoretical Computer Virology

Computer virus infections are an important class of information security breaches. Following Filiol's book [9], we believe that theoretical studies should help to design new defenses against computer viruses. The objective of this paper is to pursue a theoretical study of computer viruses initiated in [4]. Since viruses are essentially self-replicating programs, we see that virus programming methods are an attempt to answer von Neumann's question [22].

    Can an automaton be constructed, i.e., assembled and built from appropriately "raw material", by an other automaton? [. . . ] Can the construction of automata by automata progress from simpler types to increasingly complicated types?

Abstract computer virology was initiated in the 80's by the seminal works of Cohen and Adleman [7]. The latter coined the term virus. Cohen defined viruses with respect to Turing Machines [8]. Later [1], Adleman took a more abstract point of view in order to have a definition independent of any particular computational model. Only a few theoretical studies followed those seminal works. Chess and White refined the mutation model of Cohen in [6]. Zuo and Zhou formalized polymorphism from Adleman's work [23] and they analyzed the time complexity of viruses [24].
Recently, we tried [3, 4] to formalize the notion of viruses within computability theory. This formalization captures the previous definitions mentioned above. We also characterized two kinds of viruses, blueprint and smith viruses, and we proved constructively their existence. This work proposes to go further, introducing a notion of distribution to take into account polymorphism or metamorphism. We define four kinds of viruses:

1. A blueprint virus is a virus which reproduces by just duplicating its code.
2. An evolving blueprint virus is a virus which can mutate when it duplicates by modifying its code. Evolving blueprint viruses are generated by a distribution engine.
3. A smith virus is a blueprint virus which can use its propagation function directly to reproduce.
4. Lastly, we present Smith distributions. A virus generated by a Smith distribution can mutate its code like evolving blueprint viruses, but it can also mutate its propagation function.

We show that each category is closely linked to a corresponding form of the recursion theorem, giving a rational taxonomy of viruses. So recursion theorems play a key role in constructions of viruses, which is worth mentioning. Indeed, and despite the works [11, 12], recursion theorems are used essentially to prove "negative" results such as the constructions of undecidable or inseparable sets, see [19] for a general reference, or such as Blum's speed-up theorem [2].

Lastly, we switch to a simple programming language named WHILE+ to illustrate the fact that our constructions live in the programming world. Actually, we follow the ideas of the experimentation of the iteration theorem and of the recursion theorem, which are developed in [11, 12] by Jones et al. and very recently by Moss in [16].

2 A Virus Definition

2.1 The WHILE+ language

The domain of computation $D$ is the set of binary trees generated from an atom nil and a pairing mechanism $\langle \cdot, \cdot \rangle$. The syntax of WHILE+ is given by the following grammar from a set of variables $V$:

    Expressions: E → V | cons(E1, E2) | hd(E) | tl(E)
                   | exec_n(E0, E1, ..., En) | spec_n(E0, E1, ..., En)   with n ≥ 1
    Commands:    C → V := E | C1; C2 | while(E){C} | if(E){C1}else{C2}

A WHILE+ program p is defined as follows: p(V1, ..., Vn){C; return E;}. A program p computes a function $\llbracket p \rrbracket$ from $D^n$ to $D$.

We suppose that we are given a concrete syntax of WHILE+, that is, an encoding of programs by binary trees of $D$. From now on, when the context is clear,
we do not make any distinction between a program and its concrete syntax, and we make no distinction between programs and data.

For convenience, we have a built-in self-interpreter exec_n of WHILE+ programs which satisfies:

\[ \llbracket \mathtt{exec}_n \rrbracket(p, x_1, \ldots, x_n) = \llbracket p \rrbracket(x_1, \ldots, x_n) \]

In the above equation, the argument $p$ on the left-hand side denotes the concrete syntax of the program p. We also use a built-in specializer spec_m which satisfies:

\[ \llbracket \llbracket \mathtt{spec}_m \rrbracket(p, x_1, \ldots, x_m) \rrbracket(x_{m+1}, \ldots, x_n) = \llbracket p \rrbracket(x_1, \ldots, x_n) \]

We may omit the subscript which indicates the number of arguments of an interpreter or a specializer. The use of an interpreter and of a specializer is justified by Jones, who showed in [13] that programs with these constructions can be simulated, up to a constant time factor, by programs without them.

If $f$ and $g$ designate the same function, we write $f \approx g$. A function $f$ is semi-computable if there is a program p such that $\llbracket p \rrbracket \approx f$; moreover, if $f$ is total, we say that $f$ is computable.

2.2 A Computer Virus representation

We propose the following scenario in order to represent viruses. When a program p is executed within an environment $x$, the evaluation of $\llbracket p \rrbracket(x)$, if it halts, is a new environment. This process may then be repeated by replacing $x$ by the newly computed environment. The entry $x$ is thought of as a finite sequence $\langle x_1, \ldots, x_n \rangle$ which represents files and accessible parameters.

Typically, a program copy which duplicates a file satisfies $\llbracket \mathtt{copy} \rrbracket(p, x) = \langle p, p, x \rangle$. The original environment is $\langle p, x \rangle$. After the evaluation of copy, we have the environment $\langle p, p, x \rangle$ in which p is copied.

Next, consider an example of a parasitic virus. Parasitic viruses insert themselves into existing files. When an infected host is executed, first the virus infects a new host, then it gives the control back to the original host. For more details we refer to the virus writing manual of Ludwig [15].

A parasitic virus is a program v which works on an environment $\langle p, q, x \rangle$. The infected form of p is $B(v, p)$ where $B$ is a propagation function which specifies how a virus contaminates a file. Here, the propagation function $B$ can be, for example, a program code concatenation function. So, we have a first "generic" equation:

\[ \llbracket v \rrbracket(p, \langle q, x \rangle) = \llbracket B(v, p) \rrbracket(\langle q, x \rangle). \]

Following the description of a parasitic virus, v computes the infected form $B(v, q)$ and then executes p. This means that the following equation also holds:

\[ \llbracket v \rrbracket(p, \langle q, x \rangle) = \llbracket p \rrbracket(B(v, q), x). \]

A parasitic virus is defined by the two above equations. More generally, the construction of viruses lies in the resolution of fixed point equations such as the ones above, in which v and $B$ are unknowns. The existence of solutions of such systems is provided by Kleene's recursion theorem. From this observation and following [4], we propose the following virus representation:
Definition 1 (Computer Virus). Let $B$ be a computable function. A virus w.r.t. $B$ is a program v such that

\[ \forall p, x : \llbracket v \rrbracket(p, x) = \llbracket B(v, p) \rrbracket(x). \]

Then, $B$ is named a propagation function for the virus v.

This definition includes the ones of Adleman and Cohen, and it handles more propagation and duplication features than the other models [4]. It is worth noticing, however, that the existence of a virus v w.r.t. a given propagation function $B$ is constructive. This is a key difference since it allows us to build viruses by applying fixed point constructions given by proofs of recursion theorems.

A motivation behind the choice of the WHILE+ programming language is the fact that it has no self-referential operator, like $0 in bash, which returns a copy of the program's concrete syntax. Indeed, we present below virus constructions without this feature. This shows that even if there is no self-referential operator, there are still viruses. Now, viruses should be more efficient if such operators are present. Of course, a seminal paper on this subject is [21].

3 Blueprint Duplication

3.1 Blueprint distribution engine

From [4], a blueprint virus for a function $g$ is a program v which computes $g$ using its own code v and its environment $p, x$. The function $g$ can be seen as the virus specification function. A blueprint virus for a function $g$ is a program v which satisfies

\[ \begin{cases} v \text{ is a virus w.r.t. some propagation function} \\ \forall p, x : \llbracket v \rrbracket(p, x) = g(v, p, x) \end{cases} \tag{1} \]

Note that a blueprint virus does not use any code of its propagation function, unlike smith viruses that we shall see shortly. The solutions of this system are provided by Kleene's recursion theorem.

Theorem 2 (Kleene's Recursion Theorem [14]). Let $f$ be a semi-computable function. There is a program e such that $\llbracket e \rrbracket(x) = f(e, x)$.

Definition 3 (Distribution engine). A distribution engine is a program dv such that for every virus specification program g, $\llbracket dv \rrbracket(g)$ is a virus w.r.t. a fixed, given propagation function $B$.

Theorem 4. There is a distribution engine dv such that for any program g, $\llbracket dv \rrbracket(g)$ is a blueprint virus for $\llbracket g \rrbracket$.

Proof. We use a construction for the recursion theorem due to Smullyan [20]. It provides a fixpoint which can be directly used as a distribution engine. We define dv thanks to the concrete syntax of dg as follows:

    dg (z,u,y,x){
      r := exec(z,spec(u,z,u),y,x);
      return r;
    }

    dv (g){
      r := spec(dg,g,dg);
      return r;
    }
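As an informal illustration, the two programs dg and dv of this proof can be mimicked in Python. This is only a sketch under simplifying assumptions that are not part of the paper: programs are modelled as Python callables rather than as binary trees, spec is modelled by functools.partial, exec by a direct call (named exec_ here to avoid the Python built-in), and the specification g is a harmless placeholder that merely returns its arguments.

```python
from functools import partial

def spec(prog, *fixed):
    """Model of the specializer: fix the leading arguments of prog."""
    return partial(prog, *fixed)

def exec_(prog, *args):
    """Model of the self-interpreter: run prog on args."""
    return prog(*args)

def dg(z, u, y, x):
    # spec(u, z, u) rebuilds the very program that is currently running
    return exec_(z, spec(u, z, u), y, x)

def dv(g):
    # the fixed point handed out by the distribution engine
    return spec(dg, g, dg)

def g(v, p, x):
    # hypothetical, harmless specification: report what it received
    return (v, p, x)

v = dv(g)
self_ref, p, x = v("host", "env")
```

Here self_ref is again dg specialized on (g, dg), that is, a program behaving like v itself, which is the fixed point property the proof relies on.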
We observe that $\llbracket \llbracket dv \rrbracket(g) \rrbracket(p, x) = \llbracket g \rrbracket(\llbracket dv \rrbracket(g), p, x)$. Moreover, $\llbracket dv \rrbracket(g)$ is clearly a virus w.r.t. the propagation function $\llbracket \mathtt{spec} \rrbracket$.

We consider a typical example of blueprint duplication which looks like the real life virus ILoveYou. This program arrives as an e-mail attachment. Opening the attachment triggers the attack. The infection first scans the memory for passwords and sends them back to the attacker, then the virus self-duplicates, sending itself to every address of the local address book.

To represent this scenario we need to deal with mailing processes. A mail $m = \langle @, y \rangle$ is an association of an address $@$ and data $y$. Then, we consider that the environment contains a mailbox $mb = \langle m_1, \ldots, m_n \rangle$ which is a sequence of mails. To send a mail $m$, we add it to the mailbox, that is mb := cons(m, mb). We suppose that an external process deals with mailing. In the following, $x$ denotes the local file structure, and $@bk = \langle @_1, \ldots, @_n \rangle$ denotes the local address book, a sequence of addresses. We finally introduce a WHILE+ program find which searches its input for passwords and which returns them as its evaluation.

The virus behavior for the scenario of ILoveYou is given by the following program.

    g (v,mb,⟨@bk,x⟩) {
      pass := exec(find,x);
      mb := cons(cons("badguy@dom.com",pass),mb);
      y := @bk;
      while (y) {
        mb := cons(cons(hd(y),v),mb);
        y := tl(y);
      }
      return mb;
    }

From the virus specification program g, we generate the blueprint virus $\llbracket dv \rrbracket(g)$.

3.2 Distributions of evolving blueprint viruses

An evolving blueprint virus is a virus which can mutate while its propagation function remains the same. Here, we describe a distribution engine for which the specification of a virus can use the code of its own distribution engine. Thus, we can generate evolved copies of a virus.

Formally, given a virus specification function $g$, a distribution of evolving blueprint viruses is a program dv satisfying:

\[ \begin{cases} dv \text{ is a distribution engine} \\ \forall i, p, x : \llbracket \llbracket dv \rrbracket(i) \rrbracket(p, x) = g(dv, i, p, x) \end{cases} \tag{2} \]

The existence of blueprint distributions corresponds to a stronger form of the recursion theorem, which was first proved by Case [5].
Theorem 5 (Explicit recursion [4]). Let $f$ be a semi-computable function. There exists a computable function $e$ such that

\[ \forall x, y : \llbracket e(x) \rrbracket(y) = f(\mathbf{e}, x, y) \]

where $\mathbf{e}$ is a program which computes $e$.

Definition 6 (Distribution engine builder). A builder of distribution engine is a program cv such that for every virus specification program g, $\llbracket cv \rrbracket(g)$ is a distribution engine.

Theorem 7. There is a builder of distribution engine cv such that for any program g, $\llbracket cv \rrbracket(g)$ is a distribution of evolving blueprint viruses for some fixed propagation function $B$.

Proof. We define

    edg (z,t,i,y,x) {
      e := spec(spec3,t,z,t);
      return exec(z,e,i,y,x);
    }

    cv (g){
      r := spec(spec3,edg,g,edg);
      return r;
    }

We observe that for any $i$, $\llbracket \llbracket \llbracket cv \rrbracket(g) \rrbracket(i) \rrbracket(p, x) = \llbracket g \rrbracket(\llbracket cv \rrbracket(g), i, p, x)$. Moreover, $\llbracket \llbracket cv \rrbracket(g) \rrbracket(i)$ is a virus w.r.t. $\llbracket \mathtt{spec} \rrbracket$.

To illustrate Theorem 7, we come back to the scenario of the virus ILoveYou, and we add mutation abilities to it. We introduce a WHILE+ program poly which is a polymorphic engine. This program takes a program p and a key $i$, and it rewrites p according to $i$, preserving the semantics of p. That is, poly satisfies: $\llbracket \mathtt{poly} \rrbracket(p, i)$ is one-one on $i$, and $\llbracket \llbracket \mathtt{poly} \rrbracket(p, i) \rrbracket \approx \llbracket p \rrbracket$.

We build a virus which self-duplicates, sending mutated forms of itself. With the notation of Sect. 3.1, we consider a behavior described by the following WHILE+ program.

    g (dv,i,mb,⟨@bk,x⟩) {
      pass := exec(find,x);
      mb := cons(cons("badguy@dom.com",pass),mb);
      next_key := cons(nil,i);
      virus := exec(dv,next_key);
      mutation := exec(poly,virus,i);
      y := @bk;
      while (y) {
        mb := cons(cons(hd(y),mutation),mb);
        y := tl(y);
      }
      return mb;
    }

We apply Theorem 7 to transform this program into the code of the corresponding distribution engine. So, $\llbracket \llbracket cv \rrbracket(g) \rrbracket(i)$ is a copy indexed by $i$ of the evolving blueprint virus specified by g.
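The explicit fixed point used in the proof of Theorem 7 can be sketched in Python in the same spirit. The modelling choices are assumptions made for illustration only: programs are Python callables, spec is functools.partial, spec3 is the specializer viewed as an ordinary program taking a program and three values, keys are integers instead of binary trees, and the specification g is a harmless placeholder that records its generation and asks the engine for the next one.

```python
from functools import partial

def spec(prog, *fixed):
    """Model of the specializer: fix the leading arguments of prog."""
    return partial(prog, *fixed)

def exec_(prog, *args):
    """Model of the self-interpreter: run prog on args."""
    return prog(*args)

def spec3(prog, a, b, c):
    """The specializer itself, used as a program on three values."""
    return spec(prog, a, b, c)

def edg(z, t, i, y, x):
    # e is the distribution engine itself, rebuilt from its own parts
    e = spec(spec3, t, z, t)
    return exec_(z, e, i, y, x)

def cv(g):
    return spec(spec3, edg, g, edg)

def g(dv, i, p, x):
    # hypothetical, harmless specification: record the generation and
    # obtain the next generation from the engine it was given
    return {"generation": i, "next": exec_(dv, i + 1), "host": p}

engine = cv(g)
first = engine(0)("host", "env")
second = first["next"]("host", "env")
```

Each generation receives the engine itself as its first argument, so it can request a copy with a fresh key; this is the role played by exec(dv,next_key) in the WHILE+ program above.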
4 Smith Reproduction

4.1 Smith Viruses

We define a smith virus as a pair of programs v, B which is defined w.r.t. a virus specification function $g$ according to the following system.

\[ \begin{cases} v \text{ is a virus w.r.t. } \llbracket B \rrbracket \\ \forall p, x : \llbracket v \rrbracket(p, x) = g(B, v, p, x) \end{cases} \]

The class of smith viruses is obtained by the double recursion theorem due to Smullyan [18] as a solution to the above equations.

Theorem 8 (Double Recursion Theorem [18]). Let $f_1$ and $f_2$ be two semi-computable functions. There are two programs e1 and e2 such that

\[ \llbracket e_1 \rrbracket(x) = f_1(e_1, e_2, x) \qquad \llbracket e_2 \rrbracket(x) = f_2(e_1, e_2, x) \]

We extend the previous definition of distribution engine to propagation engines as follows.

Definition 9 (Virus Distribution). A virus distribution is a pair (dv, dB) of programs such that for every virus specification g, $\llbracket dv \rrbracket(g)$ is a virus w.r.t. $\llbracket \llbracket dB \rrbracket(g) \rrbracket$. As previously, dv is named a distribution engine and dB is named a propagation engine.

Theorem 10. There is a virus distribution (dv, dB) such that for any program g, $(\llbracket dv \rrbracket(g), \llbracket dB \rrbracket(g))$ is a smith virus for $\llbracket g \rrbracket$.

Proof. We define the following programs with a double fixed point.

    dg1 (z1,z2,y1,y2,y,x) {
      e1 := spec(y1,z1,z2,y1,y2);
      e2 := spec(y2,z1,z2,y1,y2);
      return exec(z1,e1,e2,y,x);
    }

    dg2 (z1,z2,y1,y2,y,x) {
      e1 := spec(y1,z1,z2,y1,y2);
      e2 := spec(y2,z1,z2,y1,y2);
      return exec(z2,e1,e2,y,x);
    }

and

    pispec (g,B,v,y,p) {
      r := spec(g,B,v,p);
      return r;
    }

Then, let dv and dB be the following programs.

    dv (g){
      r := spec(pispec,g);
      return spec(dg2,r,g,dg1,dg2);
    }

    dB (g){
      r := spec(pispec,g);
      return spec(dg1,r,g,dg1,dg2);
    }
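The double fixed point of this proof can also be checked informally in Python. As before, this is a sketch under assumptions that are not in the paper: programs are Python callables, spec is functools.partial, exec is a direct call (named exec_), and g is a harmless placeholder specification that only reports the host and the environment it was run on.

```python
from functools import partial

def spec(prog, *fixed):
    """Model of the specializer: fix the leading arguments of prog."""
    return partial(prog, *fixed)

def exec_(prog, *args):
    """Model of the self-interpreter: run prog on args."""
    return prog(*args)

def dg1(z1, z2, y1, y2, y, x):
    e1 = spec(y1, z1, z2, y1, y2)   # rebuilds the propagation program
    e2 = spec(y2, z1, z2, y1, y2)   # rebuilds the virus program
    return exec_(z1, e1, e2, y, x)

def dg2(z1, z2, y1, y2, y, x):
    e1 = spec(y1, z1, z2, y1, y2)
    e2 = spec(y2, z1, z2, y1, y2)
    return exec_(z2, e1, e2, y, x)

def pispec(g, B, v, y, p):
    # the infected form of p: g specialized on B, v and p
    return spec(g, B, v, p)

def dv(g):
    r = spec(pispec, g)
    return spec(dg2, r, g, dg1, dg2)

def dB(g):
    r = spec(pispec, g)
    return spec(dg1, r, g, dg1, dg2)

def g(B, v, p, x):
    # hypothetical, harmless specification
    return ("ran", p, x, callable(B), callable(v))

v, B = dv(g), dB(g)
direct = v("host", "env")             # running the virus on host and env
via_B = B(v, "host")("env")           # running the infected form of host
```

Both direct and via_B evaluate g on the same host and environment, which is the pair of equalities stated right after the proof.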
We observe that for any program g

\[ \llbracket \llbracket dv \rrbracket(g) \rrbracket(p, x) = \llbracket \llbracket \llbracket dB \rrbracket(g) \rrbracket(\llbracket dv \rrbracket(g), p) \rrbracket(x) = \llbracket g \rrbracket(\llbracket dB \rrbracket(g), \llbracket dv \rrbracket(g), p, x) \]

We present how to build the parasitic virus of Sect. 2.2. The virus specification function g of the virus is the following.

    g (B,v,p,⟨q,x⟩) {
      infected_form := exec(B,v,q);
      return exec(p,infected_form,x);
    }

First, it infects a new host q with the virus v using the propagation procedure B. Then, it executes the original host p. This corresponds to the behavior of a parasitic virus. We obtain a smith virus using the builder of Theorem 10.

4.2 Smith Distributions

Smith distributions generate viruses which are able to mutate their code and their propagation mechanism. A smith distribution (dv, dB) w.r.t. the virus specification program g satisfies

\[ \begin{cases} (dv, dB) \text{ is a virus distribution} \\ \forall i, p, x : \llbracket \llbracket dv \rrbracket(i) \rrbracket(p, x) = g(dB, dv, i, p, x) \end{cases} \]

The class of Smith distributions is obtained as the solutions given by the following double recursion theorem.

Theorem 11 (Double explicit Recursion). Let $f_1$ and $f_2$ be two semi-computable functions. There are two computable functions $e_1$ and $e_2$ such that for all $x$ and $y$

\[ \llbracket e_1(x) \rrbracket(y) = f_1(\mathbf{e}_1, \mathbf{e}_2, x, y) \qquad \llbracket e_2(x) \rrbracket(y) = f_2(\mathbf{e}_1, \mathbf{e}_2, x, y) \]

where $\mathbf{e}_1$ and $\mathbf{e}_2$ are programs which respectively compute $e_1$ and $e_2$.

Definition 12 (Distribution builder). A distribution builder is a pair of programs cv, cB such that for every virus specification program g, $(\llbracket cv \rrbracket(g), \llbracket cB \rrbracket(g))$ is a virus distribution.

Theorem 13. There is a distribution builder (cv, cB) such that for any program g, $(\llbracket cv \rrbracket(g), \llbracket cB \rrbracket(g))$ is a smith distribution for $\llbracket g \rrbracket$.

Proof. We define the following programs:
    edg1 (z1,z2,t1,t2,i,y,x) {
      e1 := spec(spec5,t1,z1,z2,t1,t2);
      e2 := spec(spec5,t2,z1,z2,t1,t2);
      return exec(z1,e1,e2,i,y,x);
    }

    edg2 (z1,z2,t1,t2,i,y,x) {
      e1 := spec(spec5,t1,z1,z2,t1,t2);
      e2 := spec(spec5,t2,z1,z2,t1,t2);
      return exec(z2,e1,e2,i,y,x);
    }

and

    pispec (g,db,dv,i,y,p) {
      r := spec(g,db,dv,i,p);
      return r;
    }

Let cv and cB be the following programs.

    cv (g){
      r := spec(pispec,g);
      return spec(spec5,edg2,r,g,edg1,edg2);
    }

    cB (g){
      r := spec(pispec,g);
      return spec(spec5,edg1,r,g,edg1,edg2);
    }

We observe that for any program g

\[ \llbracket \llbracket \llbracket cv \rrbracket(g) \rrbracket(i) \rrbracket(p, x) = \llbracket \llbracket \llbracket \llbracket cB \rrbracket(g) \rrbracket(i) \rrbracket(\llbracket \llbracket cv \rrbracket(g) \rrbracket(i), p) \rrbracket(x) = \llbracket g \rrbracket(\llbracket cB \rrbracket(g), \llbracket cv \rrbracket(g), i, p, x) \]

We enhance the virus of Sect. 4.1 by adding some polymorphic abilities. Any virus of generation $i$ infects a new host q with a virus of the next generation using the propagation procedure of generation $i$. Then it gives the control back to the original host p. This behavior is illustrated by the following program.

    g (db,dv,i,p,⟨q,x⟩) {
      B := exec(db,i);
      v := exec(dv,cons(i,nil));
      mutation := exec(poly,v,i);
      infected_form := exec(B,mutation,q);
      return exec(p,infected_form,x);
    }

Then, we obtain the smith distribution by the builder of Theorem 13.

References

1. L. Adleman. An abstract theory of computer viruses. In Advances in Cryptology – CRYPTO'88, volume 403. Lecture Notes in Computer Science, 1988.
2. M. Blum. A machine-independent theory of the complexity of recursive functions. Journal of the Association for Computing Machinery, 14(2):322–336, 1967.
3. G. Bonfante, M. Kaczmarek, and J.-Y. Marion. Toward an abstract computer virology. In ICTAC, pages 579–593, 2005.
4. G. Bonfante, M. Kaczmarek, and J.-Y. Marion. On abstract computer virology from a recursion-theoretic perspective. Journal in Computer Virology, 1(3-4), 2006.
5. J. Case. Periodicity in generations of automata. Theory of Computing Systems, 8(1):15–32, 1974.
6. D. Chess and S. White. An undetectable computer virus. Proceedings of the 2000 Virus Bulletin Conference (VB2000), 2000.
7. F. Cohen. Computer Viruses. PhD thesis, University of Southern California, January 1986.
8. F. Cohen. On the implications of computer viruses and methods of defense. Computers and Security, 7:167–184, 1988.
9. E. Filiol. Computer Viruses: from Theory to Applications. Springer-Verlag, 2005.
10. E. Filiol. Malware pattern scanning schemes secure against black-box analysis. Journal of Computer Virology, 2(1):35–50, 2006.
11. T. Hansen, T. Nikolajsen, J. Träff, and N. Jones. Experiments with implementations of two theoretical constructions. In Lecture Notes in Computer Science, volume 363, pages 119–133. Springer Verlag, 1989.
12. N. Jones. Computer implementation and applications of Kleene's S-m-n and recursive theorems. In Y. N. Moschovakis, editor, Lecture Notes in Mathematics, Logic From Computer Science, pages 243–263. Springer Verlag, 1991.
13. N. Jones. Constant Time Factors Do Matter. MIT Press, Cambridge, MA, USA, 1997.
14. S. Kleene. Introduction to Metamathematics. Van Nostrand, 1952.
15. M. Ludwig. The Giant Black Book of Computer Viruses. American Eagle Publications, 1998.
16. L. Moss. Recursion theorems and self-replication via text register machine programs. In EATCS bulletin, 2006.
17. P. Odifreddi. Classical Recursion Theory. North-Holland, 1989.
18. H. Rogers. Theory of Recursive Functions and Effective Computability. McGraw Hill, New York, 1967.
19. R. Smullyan. Recursion Theory for Metamathematics. Oxford University Press, 1993.
20. R. Smullyan. Diagonalization and Self-Reference. Oxford University Press, 1994.
21. K. Thompson. Reflections on trusting trust. Communications of the Association for Computing Machinery, 27(8):761–763, 1984.
22. J. von Neumann. Theory of Self-Reproducing Automata. University of Illinois Press, Urbana, Illinois, 1966. Edited and completed by A. W. Burks.
23. Z. Zuo and M. Zhou. Some further theoretical results about computer viruses. The Computer Journal, 47(6):627–633, 2004.
24. Z. Zuo, Q.-x. Zhu, and M.-t. Zhou. On the time complexity of computer viruses. IEEE Transactions on Information Theory, 51(8):2962–2966, August 2005.