Building a usage profile for anomaly detection on computer networks
Building a Usage Profile of a Computer Network System for
Anomaly Detection on the Computer Network and various
Peripherals on the Network
Nathanael Ato Asaam
Founder and CEO
Equicksales Consulting Ltd.
2019
Abstract
This paper investigates how to build usage profiles of a system using behavior models. Such
behavior models are central to machine learning and evolutionary computing. Other
methods of building usage profiles include statistical models such as time series
models, univariate models, and mean and standard deviation models. The aim of building these
usage profiles is to detect unusual behavior on the system. This paper uses regression
to determine the usage profile of a system by studying the relationship between the relevant system
variables that are used to formulate the usage profile. The dependent and independent
variables for the usage profile can be determined from an audit trail.
Additionally, the paper applies hidden Markov models to study the various states a
computer system can fall into and the various state transitions, in order to predict
unusual behavior in the system. Unusual behavior in this case may be a particular state, a
transition from one state to another, or the manner in which a particular state transition occurred.
The usage profile is composed of the usage profile equation, a mean and standard
deviation model that captures average usage and its standard deviation, and the Markov chain
model that captures the various states of the system and the transitions between them. With this
usage profile it becomes possible to detect anomalies on the system. Using linear and nonlinear
programming, the usage profile equation can be maximized or minimized to determine states of the
system and points at which the system is optimal. This can help improve the system’s usage.
Also, using the differential coefficient of the usage profile equation and other statistical models
such as the mean and standard deviation model, a threat profile of the system can be developed.
When the threat profile equation is minimized using linear and nonlinear programming, it will
help prevent threats on the system. The benefit of this research is its application to the development
of anomaly threat detection systems and risk analysis systems that can be used for performing
computer security risk assessments and analysis.
The research model of this paper is Y = f(Xi, Ci), where Xi represents system variables
such as the number of application programs running or the number of system processes, and Ci
represents system constants such as the average number of processes. During this research, an experiment was conducted
into how to represent a computer system’s usage with an abstract mathematical model. The
experiment was conducted on desktops using micro usage models of a network system.
Threat analysis and detection is also done using some special parameters of the usage
model. These parameters are constants in the system. Examples of these constants include the rates at
which the system’s usage increases or decreases with respect to certain variables in the system, and
the rates at which threat occurrence increases or decreases in the system with respect to the variables
that make up the usage model. The normal usage and threat models, on the other hand, are the
objective functions that are used for analyzing threats.
The techniques discussed in this paper make it possible to achieve correctness,
promptness and ease of use. The usage model function, with its associated average usage and
standard deviation, makes it possible to ensure the correctness of the intrusion detection system. This is
because the statistical data sampled during development of the intrusion detection system
can be used to formulate an acceptable usage range. Two special
utilities compose the usage model. They are essential for keeping usage convenient and
preventing false alarms, and they make the intrusion detection system correct and prompt at
preventing threats. These concepts are the basis for developing a security audit framework.
A simulation was run four times within a week. In the first instance, it was run for
15 minutes; in the second, for 30 minutes; in the third, for 45 minutes; and in the last,
for 60 minutes. The results indicate that a threat
detection system can be built using the differential equation technique, the novel self-integrating
data structure, and linear and nonlinear programming concepts.
To make the intrusion detection system that this research proposes detect
threats promptly, multithreading is applied to analyze, predict, detect and halt threats.
Multithreading is a programming concept that allows several threads of execution to run
concurrently within a program. This concept makes it possible to predict multiple threats, perform
multiple threat analyses, and halt or raise alarms on multiple threats on a computer system at the same time.
Ease of use of the system for which the intrusion detection system is developed is achieved
using the mean and standard deviation model. Without that model, there is no acceptable range of
usage. The average usage and its standard deviation therefore prevent a rigid usage
model and so make usage convenient.
Table of Contents
Abstract
Introduction
Background
Problem Definition
Research Questions
Objectives
Behavior Models
System Threats
Boolean Calculus
Micro Usage Models
Properties of Intrusion Detection Systems
Research Model and Methodology
Statistical and Machine Learning Models
Cognitive Based, User Intention Based and Computer Immunology Based Models
Literature Review
Intrusion Detection Systems
Behavior Encryption
Risk Analysis
Information Security Awareness and Practices
Protocol for Mitigating Risks on Social Networking Sites
Research Model and Methodology
Research Model
Methodology
Machine Learning Algorithms & Behavior Based Intrusion Systems
Audit Trail Analysis
Normal Usage Model
Threat Modelling
Boolean Calculus
Experimenting Usage and Threat Models
Computer Usage Survey
Threat Detection Systems
Threats Associated with Computer Systems
Attacks associated with a computer system
Malicious Code
IP Scan and Attack
Web Browsing
Virus
Unprotected Shares
Mass emails
Simple Network Management Protocol (SNMP)
Hoaxes
Backdoors
Password Crack
Brute Force
Dictionary
Denial of Service (DoS) and Distributed Denial of Service (DDoS)
Spoofing
Man in the Middle
Spam
Mail Bombing
Mathematical Modelling Techniques and Machine Learning Based Models
Simple Linear Regression
Multiple Linear Regression
Non Linear Regression
Machine Learning based models Used for Developing Anomaly Based
The Normal Usage Model of a System
Single Variable Calculus Review and its Applications
Authentication Usage Model
Session Usage Model
Memory Usage Model
CPU Usage Model
Program Usage Model
Host Usage Model
Battery Usage Model
Device Usage Model
Server Usage Model
Port Usage Model
Network Usage Model
Aggressive Usage Detector
False Alarm Detector
Threat Models in a System
Properties and Methods of the Novel Self Integrating Data Structure
Integration Review
Interpretation of Threat Model Integrals
Threat Analysis and Detection
Threat Prediction
Risk Analysis in a System
Normal Usage Model and Threat Model Simulation
Tools and Computer Packages
Conclusion and Discussion
References
Introduction
Background
Cyber security threats on computer networks can damage resources on
the network. Examples of such damage include corrupting data stored or transmitted
on the network, infecting a host on the network with a virus, impersonating a valid user on the
network, and preventing the proper functioning of application software on various hosts on the
network. The security of computer systems is essential to organizations. Computer
system security is usually provided by software that protects the computer system for
which it was developed. One such software system is an intrusion detection system.
Other systems that provide security are antivirus, firewall and risk analysis systems.
Periodic computer security audits also support threat detection and prevention on computer
networks.
There are two types of intrusion detection systems: knowledge based intrusion
detection systems, also known as signature based intrusion detection systems, and behavior based
intrusion detection systems, also known as anomaly intrusion detection systems. Behavior based
intrusion detection systems detect and prevent intrusions based on deviations from an observed
behavior pattern of the computer system for which the intrusion detection system has been built.
These deviations represent threats on the system. Knowledge based intrusion detection systems
detect intrusions by mapping system occurrences to a database of known threats. The
entries in this database are known as threat signatures. Intrusion detection systems are also
known as threat detection systems.
The first goal of this paper is to investigate techniques for representing a computer system’s
normal usage with an abstract mathematical model. This model is known in
this paper as a normal usage model. It is hoped that the normal usage model will aid in analyzing
activities and occurrences on a computer system that deviate from the system’s normal usage.
This will help in detecting and preventing threats on the system. The second goal of this paper is to
examine anomaly detection by analyzing changes in a system that deviate from the system’s
normal usage.
This research paper also investigates how to build a usage profile
that can be used to determine anomalous activities on a computer system. The paper proposes a
research model made up of a dependent variable and one or more independent variables that can
be used for modelling the usage of a computer system. This research model is a regression based
model. As such, simple linear or multiple linear regression can be used to develop the model. The
research paper also uses a statistical model known as mean and standard deviation model. The
mean and standard deviation model models the average usage of the system and its associated
standard deviation. Finally, the paper also uses a Markov chain model to model the various states in a
computer system, their associated probabilities and the various state transitions. These three
different models are behavior models and together form the usage profile that this paper proposes.
Also, the paper uses a Java interface for implementing the usage model that describes a component
of a computer system whose usage can be modelled using simple or multiple linear regression.
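As a minimal sketch of the regression part of the usage profile, the following Python fits a simple linear model by ordinary least squares. The function names, the sample readings, and the choice of CPU load as the dependent variable are illustrative assumptions, not data or code from this paper.

```python
from statistics import mean


def fit_simple_linear(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical audit-trail samples: running processes vs. CPU load (%).
processes = [10, 20, 30, 40, 50]
cpu_load = [15.0, 25.0, 35.0, 45.0, 55.0]

slope, intercept = fit_simple_linear(processes, cpu_load)


def predicted_usage(x):
    """Usage predicted by the fitted usage profile equation."""
    return intercept + slope * x
```

An observed reading can then be compared with predicted_usage for the same number of processes; how large a gap counts as a deviation is a separate decision, for example one based on the mean and standard deviation model.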
Problem Definition
If the normal functioning of a computer system can be represented by an abstract model, then any
deviation from that abstract model can be used to analyze and detect threats in that system.
The main problems this paper seeks to investigate are listed below.
• To represent the normal usage of a computer network with a mathematical abstract
model.
• To investigate techniques for building a usage profile of a computer network.
• To determine activities and occurrences that are deviations from a system’s
normal usage and flag them as anomalous activities.
• To develop an anomaly intrusion detection system.
• To develop a risk analysis system.
• To develop a security audit framework made of an anomaly intrusion detection
system and a risk analysis system.
• To draft a document that will detail the operation and administration of the security
audit framework.
In this paper, the abstract model of the system’s usage is known as a normal usage model and
deviations from the system’s normal usage are known as threats.
Research Questions
The main questions to be investigated are listed below.
• What are the best and most efficient techniques for modelling a computer network’s
normal usage?
• How can we build a usage profile of a computer network that will be adequate for
detecting anomalous activities on the network?
• What are the best techniques for designing and implementing an anomaly intrusion
detection system?
• What are the best techniques for designing and implementing a risk analysis system?
• What are the best techniques for designing and implementing a security audit
framework?
• What are the procedures and processes that must be followed in the operation and
administration of a security audit framework?
Objectives
The main objectives of this paper are as follows.
• Representing a computer network’s normal functioning with an abstract
model
• Building a usage profile of a computer network.
• Detecting activities and occurrences that deviate from the normal usage of
a computer network and flagging them as anomalous
activities on the network.
• Design and implementation of an Anomaly Intrusion Detection System.
• Design and implementation of a Risk Analysis System.
• Design and implementation of a Security Audit Framework.
• Drafting a document that details the procedures, processes and guidelines that
must be followed in the operation and administration of a security audit
framework.
Behavior Models
It is hoped that the abstract representation of a system’s normal usage will capture the entire
behavior of the system. Such models are known as behavior models. As such, the threat detection
system this paper seeks to explore is expected to be a behavior based threat detection system.
Examples of behavior models that this paper seeks to explore are statistical models, cognitive
based models, machine learning based models, user intention based models, and computer
immunology based models. These models are associated with the development of anomaly-based
intrusion detection systems.
System Threats
The intended threat analysis and detection is expected to produce three types of system logs:
system errors, system threats and usage rates, all categorized based on the magnitude
and characteristics of an instance of the threat model.
These logs must therefore be audited by a security expert, who analyzes changes in the computer
system that fit or deviate from the current usage model in order to project a more appropriate
instance of the usage model that will remain functional and suitable in the future.
Boolean Calculus
It is expected that, using Boolean algebra and the calculus of Boolean functions, the normal usage
model can be given a hardware representation, and that the same tools can be used to research how
to implement this representation. These
concepts are related to concepts from computer organization and architecture such as logic gates,
multipliers, and the design of arithmetic and logic units, and to concepts from embedded systems such as
the architectures of various embedded system implementations. These architectures include
hardware-only implementations and hardware/software implementations.
Micro Usage Models
Micro usage models are sub-models of the normal usage model. They are modelled using the same
research model. Examples of micro usage models that this paper explores are the Device Usage Model,
Host Usage Model, Server Usage Model, Authentication Usage Model, Session Usage Model,
CPU Usage Model, Memory Usage Model, Port Usage Model and Network Usage Model. These
micro usage models are expected to derive their mathematical representation from variables
sampled from an audit trail analysis, and to be components of
a usage profile developed for a computer network.
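One way to picture micro usage models as components of a single usage profile is sketched below in Python. Each micro model is a linear function of its own system variables, following the regression-based research model. The class name, variable names and coefficient values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class MicroUsageModel:
    """A linear micro usage model: intercept + sum(coefficient * variable)."""
    name: str
    intercept: float
    coefficients: dict  # variable name -> regression coefficient

    def predict(self, variables):
        return self.intercept + sum(
            coef * variables[var] for var, coef in self.coefficients.items()
        )


# Hypothetical micro models; real coefficients would come from audit-trail data.
cpu_model = MicroUsageModel("CPU Usage Model", 5.0, {"processes": 0.5})
memory_model = MicroUsageModel(
    "Memory Usage Model", 100.0, {"applications": 50.0, "sessions": 10.0}
)

# The usage profile of the network is the collection of its micro models.
usage_profile = {model.name: model for model in (cpu_model, memory_model)}

cpu_estimate = usage_profile["CPU Usage Model"].predict({"processes": 40})
memory_estimate = usage_profile["Memory Usage Model"].predict(
    {"applications": 3, "sessions": 2}
)
```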
Properties of Intrusion Detection Systems
Certain properties make intrusion detection systems effective and efficient
at detecting and preventing threats. Examples of these properties are correctness, promptness and
ease of use. Correctness is how well the intrusion detection system detects threats. This is
important because correctness determines how often a predicted threat turns out to be true or false.
Promptness is the time it takes to detect and halt a threat. Ease of use is the
extent to which the intrusion detection system allows convenient use of the computer network or
system for which it was developed.
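Correctness can be quantified in several ways. One common choice, assumed here rather than taken from this paper, is to compare predictions against labelled outcomes and report the detection rate and the false alarm rate. The function name and the sample data in this Python sketch are illustrative.

```python
def detection_rates(predicted, actual):
    """Return (detection_rate, false_alarm_rate) for two boolean sequences."""
    pairs = list(zip(predicted, actual))
    true_pos = sum(1 for p, a in pairs if p and a)
    false_pos = sum(1 for p, a in pairs if p and not a)
    false_neg = sum(1 for p, a in pairs if not p and a)
    true_neg = sum(1 for p, a in pairs if not p and not a)
    threats = true_pos + false_neg
    benign = false_pos + true_neg
    detection_rate = true_pos / threats if threats else 0.0
    false_alarm_rate = false_pos / benign if benign else 0.0
    return detection_rate, false_alarm_rate


# Hypothetical outcomes: True means "flagged as threat" / "was a threat".
predicted = [True, True, False, False, True]
actual = [True, False, False, True, True]

detection_rate, false_alarm_rate = detection_rates(predicted, actual)
```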
Research Model and Methodology
The research model of this paper investigates threat detection by applying calculus,
Boolean algebra, machine learning and statistical models. These fields of study are mainly related
to Discrete Mathematics, Computer Science, Operational Research, Linear and Nonlinear
Programming, Regression Analysis and Data Mining.
The research model is inspired by linear and nonlinear regression. The methodology for threat
detection is inspired by linear and nonlinear programming and calculus. Some Computer Science
topics that inspire the threat detection parts of this research are multithreading, the architectures of
embedded system design and implementation, and concepts from computer organization and
architecture such as the implementation of arithmetic and logic units.
Statistical and Machine Learning Models
Statistical models are mathematical models that can be used in the development of intrusion
detection systems. Examples are mean and standard
deviation models, univariate models, and time series models. Machine learning techniques are also used
to build intrusion detection systems, and they provide models or structures of their own that aid
this development. Machine learning models include
neural networks, Bayesian networks, hidden Markov models and genetic algorithms.
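The state-transition idea behind the Markov-type models listed above can be sketched with a plain first-order Markov chain, which is simpler than a hidden Markov model because the states are observed directly. In the Python below, the state names, the observed sequence and the 0.05 threshold are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def transition_probabilities(states):
    """Estimate first-order transition probabilities from a state sequence."""
    counts = defaultdict(Counter)
    for current, following in zip(states, states[1:]):
        counts[current][following] += 1
    return {
        state: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for state, c in counts.items()
    }


def is_unusual_transition(src, dst, probs, threshold=0.05):
    """Flag transitions that were rarely or never seen in the baseline."""
    return probs.get(src, {}).get(dst, 0.0) < threshold


# Hypothetical baseline sequence of observed system states.
observed = ["idle", "busy", "idle", "busy", "busy", "idle"]
probs = transition_probabilities(observed)
```

With this baseline, a transition from "idle" to "idle" was never observed, so is_unusual_transition("idle", "idle", probs) flags it, while "idle" to "busy" is treated as normal.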
Cognitive Based, User Intention Based and Computer Immunology Based Models
Besides the statistical and machine learning based models that can be used for developing anomaly
based intrusion detection systems, there are cognitive based models that are used to develop
anomaly intrusion detection systems.
Literature Review
This section reviews the major topics that constitute this research paper and work done in some of
these areas. The topics considered include intrusion detection
systems, since any discussion or study of threats and the detection of their sources is centered on intrusion
detection systems. Behavior encryption is another computer security field that will be
discussed in detail, since it adds value to the information hiding parts of this research. Risk
analysis will also be reviewed to summarize what it involves. Finally, there will be a
review of Normal Usage Models.
Intrusion Detection Systems
There are two types of intrusion detection systems in the industry, distinguished by the approach
used for threat detection and the technologies used to build the system. These are knowledge based,
also known as signature based, and behavior based intrusion detection systems. Each takes a
different approach to threat detection, each uses different technology, and each has its
advantages and disadvantages.
Knowledge based intrusion detection systems are built on a database of already known
threats. These known vulnerabilities or threats are called threat signatures. Usually, detection is
done by directly mapping system incidents that indicate threats to threat signatures. As
a result, the database of threats must be constantly updated with newly identified threats. Because new
threats may appear before their signatures are added to the database, the correctness of detection is
sometimes compromised: threats that do not have corresponding signatures cannot be
mapped and detected. However, these types of intrusion detection systems have lower false alarm rates, since
each detected threat is registered in the database of threat signatures.
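The direct mapping of incidents to signatures can be sketched as a set lookup, as in the Python below. The event fields and the signature entries are hypothetical; a real signature database would be far richer than pairs of strings.

```python
# Hypothetical database of threat signatures: (event type, detail) pairs.
THREAT_SIGNATURES = {
    ("failed_login_burst", "root"),
    ("port_scan", "tcp/22"),
}


def matches_signature(event):
    """Return True when the incident maps directly onto a known signature."""
    return (event["type"], event["detail"]) in THREAT_SIGNATURES


known = {"type": "port_scan", "detail": "tcp/22"}
novel = {"type": "dns_tunnel", "detail": "udp/53"}

known_detected = matches_signature(known)  # registered threat is detected
novel_detected = matches_signature(novel)  # unregistered threat is missed
```

The second lookup illustrates the limitation described above: an incident with no corresponding signature is not detected until the database is updated.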
Behavior based intrusion detection systems take a different approach to threat detection.
They are built using artificial intelligence technologies. Usually, the behavior of the system for which the
intrusion detection system is built is modelled, and deviations from that behavior are used
to detect threats. Because of this, they tend to be more correct at detecting
threats, since no threat signatures or mappings of incidents to signatures are required. However,
they have higher false alarm rates because detected threats are not mapped to a database of
known threats.
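A minimal Python sketch of deviation-based detection, using the mean and standard deviation model mentioned elsewhere in this paper, is given below. The baseline readings and the multiplier k = 2 are assumptions; the paper does not prescribe a particular width for the acceptable range.

```python
from statistics import mean, stdev


def acceptable_range(samples, k=2.0):
    """Return (low, high) as the mean minus/plus k sample standard deviations."""
    mu = mean(samples)
    sigma = stdev(samples)
    return mu - k * sigma, mu + k * sigma


def is_anomalous(value, samples, k=2.0):
    """Flag a reading that falls outside the acceptable usage range."""
    low, high = acceptable_range(samples, k)
    return value < low or value > high


# Hypothetical baseline of normal usage readings (e.g. CPU load in %).
baseline = [40.0, 42.0, 38.0, 41.0, 39.0]
low, high = acceptable_range(baseline)
```

A larger k widens the acceptable range, which lowers false alarms at the cost of missing smaller deviations.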
Besides these, intrusion detection systems are classified by the purpose for which they
are built and by whether they deal with threats actively or passively. By purpose, there are host based
and network based intrusion detection systems. Active intrusion detection
systems are configured to block or prevent attacks, while passive intrusion detection systems are
configured to monitor, detect and raise alerts on threats.
Anomaly Detection Systems
According to a research paper entitled “Design and Implementation of Anomaly Detection
System”, there are global variables of a network that can be used for detecting anomalous activities
on a network. The paper used a hybrid of signature based and anomaly intrusion detection to detect
anomalies. According to the paper, the techniques used for detecting intrusions include applying
generic network rules to detect network anomalies. The paper also used dynamic network knowledge,
such as network statistics, to detect anomalous activities.
Behavior Encryption
Behavior algorithms are applied to safeguard information on computing devices such as mobile
phones and laptops. These algorithms are the basics for building systems that study and encrypt
user behavior on a computing device in order to ensure the security of information on the
computing devices. A study into mobile platform security reports that behavior encryption
application systems have been designed and built, focusing on mobile platforms. Results from this
study indicated that these encryption application systems are effective in ensuring mobile platform
security.
In addition, since mobile devices can be secured through
behavior encryption systems, the behavior of a host on a network, or of network systems, can also
be encrypted to ensure safe communication, since each host or user on a system or network has a
particular behavior pattern.
Cryptographic study into encrypting the normal usage model can fall under behavior encryption
since the usage model represents a system’s behavior and can be composed of a user’s behavior.
This can aid in securing the information that embodies the usage model. It is also necessary because
if the usage model can easily be predicted then it is possible to manipulate the usage model and
launch an attack.
Risk Analysis
Computer risk analysis is also called risk assessment. It involves the process of analyzing and
interpreting risk. To analyze risk, the scope and methodology have to be determined first. Then
information is collected and analyzed before the risk analysis results are interpreted. Determining the
scope means identifying the system to be analyzed for risk and the parts of the system
that will be considered. The analytical method that will be used, with its level of detail and formality,
must also be planned. The boundary, scope and methodology used during risk assessment determine
the total amount of effort that is needed in risk management, and the type and usefulness
of the assessment’s result.
Risk has many components, including assets, threats, likelihood of threat occurrence,
vulnerability, safeguards and consequences. Risk management includes risk acceptance, which takes
place after several risk analyses. Normally, after risk has been analyzed and safeguards
implemented, the residual risk that remains while the system stays functional must
be accepted by management. This may be due to constraints on the system such as ease of use, or
features of the system for which strict safeguards would cause the organization operational problems.
As such, risk acceptance, like the selection of safeguards, should take into account various factors
besides those addressed in the risk assessment. In addition, risk acceptance should take into
account the limitations of the risk assessment.
Information Security Awareness and Practices
A paper entitled “A study of information security awareness and practices in Saudi Arabia”
discusses information security awareness and practices in that country. The paper emphasizes
that information is under constant threat from cyber vandals. It reports that Saudi Arabia is
rated poorly in terms of information security and attributes this to the country's highly
suppressed, patriarchal and tribal culture.
The paper examined the level of information security awareness among the general public
in the country using an anonymous online survey based on instruments the Malaysian Security
Organization produced. In all, 633 persons responded to the survey, and the analysis confirmed that
information security awareness in the country is low, which the paper again relates to the
country's highly suppressed, patriarchal and tribal culture.
Protocol for Mitigating Risks on Social Networking Sites
According to an academic paper entitled “Protocol for mitigating the risk of hijacking social
networking sites”, hackers can hijack a user's session on a social networking site, impersonate the
victim and take over the session.
The paper addresses this risk by presenting a security authentication protocol.
The protocol takes into account that users connect to social networking sites
using several platforms and connection speeds. To cater for mobile devices and tablets using Wi-Fi
connections, a novel Self-Configuring Repeatable Hash Chains (SCRHC) protocol was developed
to prevent the hijacking of session cookies. This protocol supports three levels of caching, making
it possible to trade storage space for enhanced performance and reduced workload.
Behavior/Anomaly Based Intrusion Detection
Behavior models are used to detect intrusion in computer systems. This section reviews the behavior
models that can be used to build behavior based intrusion detection systems. These models fall
into several categories: statistical models, machine learning based techniques,
cognitive models, computer immunology and user intention models. Statistical models include the
operational or threshold metric model, the Markov process or marker model, the multivariate model,
the statistical moments model, time series models and univariate models. Machine learning based
models include Bayesian networks, genetic algorithms, neural networks, fuzzy logic and outlier
detection. Cognitive models include finite state machines, description scripts and expert systems.
Research Model and Methodology
Research Model
Assume that the normal usage (Y) of a computer network can be represented by a mathematical
function
Y = f(Xi, Ci), where Xi represents system variables such as the number of functions or the number of
authentications, and Ci represents system constants such as the maximum or minimum number of
authentications. When a change in Y exceeds the standard deviation determined from the data
set of our usage, that change indicates a threat. To investigate this threat, machine learning
algorithms, mathematical functions and behavior based intrusion detection systems will be studied
to determine Y in terms of a number of variables that represent Y appropriately. The expected
usage model of the network to be investigated includes the following components: Host Usage
Model, Server Usage Model, Device Usage Model, Port Usage Model, Network Usage Model,
Session Usage Model, Authentication Usage Model, Memory Usage Model, CPU Usage Model,
Battery Usage Model and Program Usage Model. These components are expected to be derived
from the variables listed below.
• Average number of application programs that run on the mobile system while the system
is in use.
• Average number of system processes that run on the mobile system while the system
is in use.
• Average number of authentications in the mobile system.
• Average number of user actions that happen on the mobile system.
• Average time a user spends before his session expires.
• Average time the mobile facility or resource functions each day.
• Number of paired ports communicating on the network
• Average amount of memory space used on devices while the network is being operated.
• Average CPU time spent on a single device on the network
• Average life span of a single device battery on the network.
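The deviation rule stated in the research model (a change in Y beyond the standard deviation of the observed usage indicates a threat) can be sketched in Java as follows. This is only an illustrative sketch: the class name MeanStdProfile, its methods, and the multiplier k are assumptions introduced here and are not part of the research model. With k = 1 the check corresponds to the one-standard-deviation rule described above.

```java
/**
 * Hypothetical helper (not part of model.java): learns the mean and
 * standard deviation of observed usage values and flags large deviations.
 */
final class MeanStdProfile {
    private final double mean;
    private final double std;

    MeanStdProfile(double[] samples) {
        if (samples.length < 2) {
            throw new IllegalArgumentException("need at least two samples");
        }
        double sum = 0.0;
        for (double s : samples) {
            sum += s;
        }
        mean = sum / samples.length;

        double squares = 0.0;
        for (double s : samples) {
            squares += (s - mean) * (s - mean);
        }
        // Sample standard deviation (divides by n - 1).
        std = Math.sqrt(squares / (samples.length - 1));
    }

    double mean() { return mean; }

    double std() { return std; }

    /** True when y lies more than k standard deviations from the mean. */
    boolean isAnomalous(double y, double k) {
        return Math.abs(y - mean) > k * std;
    }
}
```

For example, a profile learned from daily authentication counts could be queried with isAnomalous(todayCount, 1.0); a larger k would make the check more tolerant.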
USAGE MODEL: JAVA INTERFACE THAT IMPLEMENTS THE RESEARCH MODEL
For each component of the computer system under investigation, we will program a usage model
that implements the research model for that component.
Each usage model implements an interface defined in a
Java file called model.java.
There are eight functions in the model.java interface. The first, computeval, computes
the usage value at an instant. The second, findchange, finds
changes in the usage of the computer system. The third, learnsys, learns the
usage of the system. The fourth, findrelationship, finds the regression equation.
The fifth, monitor, monitors the usage of the system. The sixth,
showalarm, displays error messages and detected intrusions. The seventh,
haltprocess, halts a detected intrusion, and the eighth, predictvals,
predicts usage values based on the regression equation determined. A non-abstract class that omits
an implementation of any of these functions will fail to compile. To implement the usage model,
use the Java keyword implements. Below is the model.java interface.
USAGE MODEL FILE
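public interface model {
    public double computeval();             // usage value at an instant
    public double findchange();             // change in the system's usage
    public void learnsys(int t);            // learn the usage of the system
    public Object findrelationship();       // regression equation
    public void monitor(int t);             // monitor the usage of the system
    public void showalarm(String info);     // display errors and detected intrusions
    public void haltprocess();              // halt a detected intrusion
    public void predictvals();              // predict usage values from the regression equation
}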
IMPLEMENTING THE USAGE MODEL FOR AN AUTHENTICATION SYSTEM
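class auth_usage implements model {
    /* Declare the dependent and independent variables here. */

    public double computeval() {
        return 0.0; // placeholder until the usage computation is implemented
    }
    public double findchange() {
        return 0.0; // placeholder until change detection is implemented
    }
    public void learnsys(int t) {
    }
    public Object findrelationship() {
        return null; // placeholder until the regression equation is available
    }
    public void monitor(int t) {
    }
    public void showalarm(String info) {
    }
    public void haltprocess() {
    }
    public void predictvals() {
    }
}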
Methodology
The list below details the activities or processes that will be followed to represent a computer system
with an abstract mathematical model and analyze changes in that system. It is hoped that following
these processes will lead to the design and implementation of a normal usage model, a threat
detection system and a mobile security audit framework.
• Machine Learning Algorithms & Behavior Based Intrusion Systems: Investigate machine
learning algorithms, mathematical functions, and behavior based intrusion detection
systems in order to determine the extent to which the normal usage of a mobile system can
be represented by the research model.
• Audit trails: Analyze audit trails in order to formulate a set of independent and dependent
variables and their associated data set that will help in modelling the usage model of a
mobile system.
• Normal Usage Model: Apply the knowledge gained from the machine learning algorithms
and behavior based intrusion detection systems study and the audit trails analysis to model
and represent the normal usage of a mobile system such as a smart phone, laptop or wireless
network.
• Threat Modelling: Study differential equations of the normal usage model and its
applications in order to model, detect and prevent threats.
• Boolean Calculus: Apply Boolean algebra and the calculus of Boolean functions to design and
implement the hardware and software that make up the Normal Usage and Threat Detection
Systems.
• Experimenting Usage and Threat Models: Use programming as a tool to experiment with
representations of the normal usage and threat models to aid the design and implementation
of a mobile security audit framework.
• Computer Usage Survey: Employ a questionnaire to collect information about the usage of
computers and mobile phones.
• Threat Detection Systems: Develop an anomaly based threat detection system to
demonstrate the effectiveness of the research model. The goal is to measure the
effectiveness of the threat detection system developed, at preventing threats on a computer
system.
Machine Learning Algorithms & Behavior Based Intrusion Systems
Machine learning techniques and algorithms will be investigated to determine the extent to which an
expert system that learns a computer system's usage can be built. Since the expected usage model
is a mathematical model, various mathematical modelling techniques will be applied to
determine the normal usage model.
Analyzing deviations from these mathematical models can lead to the design and
implementation of behavior based intrusion detection systems. As such, a thorough study of the
design and implementation of behavior based intrusion detection systems will be done.
Audit Trail Analysis
It is expected that computer security audit reports will be sampled and analyzed to arrive at a set
of dependent and independent variables and their data set. These variables and their associated
data set can be used to formulate the normal usage model.
Normal Usage Model
An investigation into applying the knowledge gained from the machine learning study, the
mathematical modelling study, the behavior based intrusion detection system study and the audit
trail analysis will be done. It is hoped that this will answer the question of how to represent the
normal functioning of a computer system with an abstract mathematical model.
Threat Modelling
Differential equations of the normal usage model will be investigated to know the extent to which
deviations from the normal usage models can be analyzed. An abstract mathematical model of
these deviations will be formulated. These abstract models are derivatives of the normal usage
model.
Boolean Calculus
A study into representing the normal usage model with a Boolean function will be done. It is hoped
that analyzing these Boolean functions will aid in building the hardware of the expected usage
system. Differential equations of these Boolean functions will be studied to analyze changes in the
system that indicate deviation from the normal usage model.
Experimenting Usage and Threat Models
Programming will be used as a tool to experiment with various usage and threat models. These usage
and threat models are expected to be derived from a computer system. This experiment will lead
to the design and implementation of a normal usage system, a threat detection system and a risk
analysis system. These systems are expected to be components of a mobile security audit
framework.
Computer Usage Survey
A questionnaire for obtaining information about computer and smart phone usage will be
employed. It is expected that this will give an idea of the various statistics that make up a computer
or smart phone's usage. These statistics will be a guideline for sampling experimental data on a
computer system's usage while experimenting with the usage and threat models.
Threat Detection Systems
It is hoped that an anomaly based threat detection system will be developed to demonstrate the
effectiveness of the research model in modelling system usage and threats. The
effectiveness of the threat detection system at preventing threats on a computer system
will also be measured. In this project, the threat detection system that will be developed is for
ecommerce sites.
Threats Associated with Computer Systems
This chapter discusses some of the threats and attacks associated with computer and network
systems.
Attacks associated with a computer system
The attack types that will be discussed include Malicious Code, IP Scan and Attack, Web
Browsing, Virus, Unprotected Shares, Mass Emails, Simple Network Management Protocol (SNMP),
Hoaxes, Backdoors, Password Crack, Brute Force, Dictionary, Denial of Service (DoS) and
Distributed Denial of Service (DDoS) Attack, Spoofing, Man in the Middle, Spam, Mail Bombing,
Sniffers, Social Engineering, Buffer Overflow and Timing Attack.
Malicious Code
Malicious code attacks include the execution of viruses, worms, Trojan horses and active Web
scripts with the intent to destroy or steal information. The state of the art in malicious code is the
polymorphic or multivector worm. These attack programs use up to six attack vectors to exploit a
variety of vulnerabilities in commonly known information system devices. Perhaps the best
illustration of such an attack remains the outbreak of Nimda in September 2001, which used five of
the six vectors with startling speed. TruSecure Corporation, an industry source for information
security statistics and solutions, reports that Nimda spread to span the internet address space of 14
countries in less than 25 minutes.
IP Scan and Attack
The infested system scans a random or local range of IP addresses and targets any of several
vulnerabilities known to hackers or left over from previous exploits such as Code Red, Back
Orifice or PoizonBox.
Web Browsing
If the infested system has write access to any Web pages, it makes all the Web content files (.html,
.asp, .cgi and others) infectious so that users who browse to those pages become infected.
Virus
Each infested machine infects certain common executable or script files on all computers to which
it can write with virus code that can cause infection.
Unprotected Shares
Using vulnerabilities in file systems and the way many organizations configure them, the infested
machine copies the viral components to all locations it can reach.
Mass emails
By sending email infections to addresses found in the address book, the infected machine infects
many users, whose mail reading programs also automatically run the program and infect other
systems.
Simple Network Management Protocol (SNMP)
By using the widely known and common passwords that were employed in the early versions of
the protocol (which is used for remote management of networks and computer devices), the
attacking program can gain control of the device. Most vendors have closed these vulnerabilities
with software upgrades.
Hoaxes
A more devious approach to attacking computer systems is the transmission of a virus hoax with
a real virus attached. When the attack is masked in a seemingly legitimate message, unsuspecting
users readily distribute it. Even though those users are trying to do the right thing to avoid
infection, they end up sending the attack on to their coworkers and friends and infecting many
users along the way.
Backdoors
Using a known or previously unknown and newly discovered access mechanism, an attacker can
gain access into a system or network resource through a back door. Sometimes, these entries are
left behind by system designers or maintenance staff and thus referred to as trap doors. A trap door
is hard to detect, because, very often the programmer who puts it in place also makes the access
exempt from the usual audit logging features of the system.
Password Crack
Attempting to reverse-calculate a password is often called cracking. A cracking attack is a
component of many dictionary attacks. It is used when a copy of the Security Account Manager
(SAM) data file can be obtained. The SAM file contains the hashed representation of the user's
password. A candidate password can be hashed using the same algorithm and compared to the stored
hash. If they are the same, the password has been cracked.
Brute Force
The application of computing and network resources to try every possible combination of
password characters is called a brute force attack. Since this is often an attempt to repeatedly guess
passwords to commonly used accounts, it is sometimes called a password attack. If attackers can
narrow the field of accounts to be attacked, they can devote more time and resources to attacking
fewer accounts. That is one reason a recommended practice is to change account names for
common accounts from the manufacturer's default. While often effective against low-security
systems, password attacks are often not useful against systems that have adopted the usual security
practices recommended by manufacturers.
Dictionary
This is another form of brute force attack. The dictionary attack narrows the field by selecting
specific accounts to attack and uses a list of commonly used passwords (the dictionary) instead of
random combinations. Organizations can use similar dictionaries to disallow passwords during
the reset process and thus guard against easy-to-guess passwords. In addition, rules requiring
additional numbers and/or special characters make the dictionary attack less effective.
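The defensive use of a dictionary described above can be sketched in Java. This is a minimal sketch under stated assumptions: the class name PasswordPolicy, the minimum length of eight characters and the case-insensitive comparison are illustrative choices made here, not requirements taken from the text.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical reset-time check that rejects easy-to-guess passwords. */
final class PasswordPolicy {
    private static final int MIN_LENGTH = 8; // assumed minimum, not from the text

    private final Set<String> dictionary = new HashSet<>();

    PasswordPolicy(Set<String> commonPasswords) {
        // Store entries in lower case so the lookup ignores case.
        for (String p : commonPasswords) {
            dictionary.add(p.toLowerCase(Locale.ROOT));
        }
    }

    boolean isAcceptable(String candidate) {
        if (candidate == null || candidate.length() < MIN_LENGTH) {
            return false;
        }
        if (dictionary.contains(candidate.toLowerCase(Locale.ROOT))) {
            return false; // appears in the list of commonly used passwords
        }
        boolean hasDigit = false;
        boolean hasSpecial = false;
        for (char c : candidate.toCharArray()) {
            if (Character.isDigit(c)) {
                hasDigit = true;
            } else if (!Character.isLetter(c)) {
                hasSpecial = true;
            }
        }
        // Require at least one number and/or special character.
        return hasDigit || hasSpecial;
    }
}
```

A password reset handler could call isAcceptable before storing the new password and ask the user to choose again when it returns false.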
Denial of Service (DoS) and Distributed Denial of Service (DDoS)
In a denial of service attack, the attacker sends a large number of connection or information
requests to a target. So many requests are made that the target system cannot handle them along
with legitimate requests for service. This may result in the system crashing or simply
becoming unable to perform ordinary functions. A distributed denial of service is an attack in
which a coordinated stream of requests is launched against a target from many locations at the same
time. Most DDoS attacks are preceded by a preparation phase in which many systems, perhaps
thousands, are compromised. The compromised machines are turned into zombies, machines that
are directed remotely (usually by a transmitted command) by the attacker to participate in the
attack. DDoS attacks are the most difficult to defend against and there are presently no controls
that any single organization can apply. There are, however, some cooperative efforts to enable
DDoS defenses among groups of service providers; among them is the Consensus Roadmap for
Defeating Distributed Denial of Service Attacks.
Spoofing
Spoofing is a technique used to gain unauthorized access to computers, wherein the intruder sends
messages to a computer with a source IP address indicating that the messages are coming from a
trusted host. To engage in IP spoofing, a hacker must first use a variety of techniques to find the
IP address of a trusted host and then modify the packet headers so that it appears that the packets
are coming from that host. Newer routers and firewall arrangements can offer protection against
IP spoofing.
Man in the Middle
In the well-known man-in-the-middle or TCP hijacking attack, an attacker monitors (or sniffs)
packets from the network, modifies them and inserts them back into the network. This type of
attack uses IP spoofing to enable an attacker to impersonate another entity on the network. It
allows the attacker to eavesdrop as well as to change, delete, reroute, add, forge or divert data. In
a variant of TCP session hijacking, the spoofing involves the interception of an encryption
key exchange, which enables the hacker to act as an invisible man in the middle (that is, an
eavesdropper) with regard to encrypted communications.
Spam
Spam is unsolicited commercial email. While many consider spam a trivial nuisance rather than
an attack, it has been used as a means to make malicious code attacks more effective. In March
2002, reports emerged of malicious code embedded in MP3 files that were included as attachments
to spam. The most significant consequence of spam for the modern organization, however, is the
waste of both computer and human resources caused by the flow of unwanted electronic mail.
Many organizations attempt to cope with the flood of spam by using filtering technologies to stem
the flow. Other organizations tell the users of the mail system to delete unwanted messages.
Mail Bombing
Another form of e-mail attack that is also a DoS attack is the mail bomb, in which an attacker routes
large quantities of e-mail to the target. This can be accomplished through social engineering or
by exploiting various technical flaws in the Simple Mail Transfer Protocol (SMTP). The target of the
attack receives unmanageably large volumes of unsolicited e-mail. By sending large e-mails with
forged header information, attackers can take advantage of poorly configured e-mail systems on
the internet.
Mathematical Modelling Techniques and Machine Learning Based Models
The mathematical relation that represents the normal usage model can be determined using
regression analysis. Regression analysis is a field of statistics. It employs the least squares method
to determine the relationship within a data set composed of two or more variables. The least squares
method determines the relationship by minimizing the sum of squared errors of the derived relation.
Simple Linear Regression
Simple linear regression problems involve a dependent variable and a single independent variable. The goal
is to find a linear relationship between the two variables. The linear relationship is of the form
y = b0 + b1x, where y is the dependent variable and x is the independent variable. The slope of the line
is b1 and the y-intercept is b0. The relationship between the dependent and independent variables
can be derived using the least squares method. First, the sum of the dependent variable (Σy), the sum
of the independent variable (Σx), and the sum product of the dependent and independent variables (Σxy)
must be calculated. Secondly, the sums of the squares of the dependent and independent variables
(Σy^2 and Σx^2) must be calculated. Let n denote the sample size.
The slope of the fitted line is the product of the sample size and the sum product of the
dependent and independent variables, minus the product of the sums of the dependent and
independent variables, all divided by the product of the sample size and the sum of the squares
of the independent variable minus the square of the sum of the independent variable:
b1 = (nΣxy - ΣxΣy) / (nΣx^2 - (Σx)^2)
The y-intercept of the line is the product of the sum of the dependent variable and the sum of
the squares of the independent variable, minus the product of the sum of the independent variable
and the sum product of the dependent and independent variables, all divided by the same
denominator as the slope:
b0 = (ΣyΣx^2 - ΣxΣxy) / (nΣx^2 - (Σx)^2)
Finally, the correlation coefficient of the predictive relation has the same numerator as the
slope. Its denominator is the square root of the product of two terms: the sample size times the
sum of the squares of the independent variable minus the square of the sum of the independent
variable, and the sample size times the sum of the squares of the dependent variable minus the
square of the sum of the dependent variable:
r = (nΣxy - ΣxΣy) / sqrt((nΣx^2 - (Σx)^2)(nΣy^2 - (Σy)^2))
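The slope, intercept and correlation coefficient described above can be computed directly from the five sums. The following Java sketch illustrates this; the class name SimpleLinearRegression and its members are hypothetical names chosen for illustration, and the sketch is not the findrelationship implementation of the usage model.

```java
/** Hypothetical helper: least squares fit of y = b0 + b1*x from raw sums. */
final class SimpleLinearRegression {
    final double slope;      // b1
    final double intercept;  // b0
    final double r;          // correlation coefficient (NaN if all y are equal)

    SimpleLinearRegression(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two arrays of equal length >= 2");
        }
        int n = x.length;
        double sx = 0, sy = 0, sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < n; i++) {
            sx += x[i];
            sy += y[i];
            sxy += x[i] * y[i];
            sxx += x[i] * x[i];
            syy += y[i] * y[i];
        }
        double denomX = n * sxx - sx * sx;
        double denomY = n * syy - sy * sy;
        if (denomX == 0) {
            throw new IllegalArgumentException("x values must not all be equal");
        }
        slope = (n * sxy - sx * sy) / denomX;
        intercept = (sy * sxx - sx * sxy) / denomX;
        r = (n * sxy - sx * sy) / Math.sqrt(denomX * denomY);
    }

    /** Predicted usage for a new value of the independent variable. */
    double predict(double x0) {
        return intercept + slope * x0;
    }
}
```

For data that lie exactly on a line, such as x = {1, 2, 3, 4} and y = {5, 7, 9, 11}, the fit recovers a slope of 2, an intercept of 3 and a correlation coefficient of 1.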
Multiple Linear Regression
Multiple linear regression problems involve a dependent variable and two or more independent
variables. Using the least squares method, the goal is to find the linear relationship between the
variables involved. The relationships are of the form y = b0 + b1x1 + b2x2 + … + bnxn, where n is the
number of independent variables, x1, x2, …, xn are the independent variables and y is the
dependent variable.
To solve multiple linear regression problems, we first reduce the multiple linear
model to its simple linear forms. In this form, it is easier to determine the regression equation.
To do this we determine y = b0 + b1x for each independent variable in turn. That way, the
set of regression coefficients b = {b1, b2, …, bn} associated with the independent variables can be
determined using the least squares method. Coefficients fitted one variable at a time are guaranteed
to agree with a joint least squares fit when the independent variables are uncorrelated with one
another, and can differ otherwise.
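For comparison with the one-variable-at-a-time reduction described above, the sketch below fits all coefficients jointly by solving the least squares normal equations (X^T X) b = X^T y with Gaussian elimination. This is a different, standard technique swapped in for illustration, not the reduction described in this paper; the class and method names are hypothetical.

```java
/** Hypothetical helper: joint least squares fit via the normal equations. */
final class MultipleLinearRegression {

    /** Returns {b0, b1, ..., bn} for rows x[k] = {x1, ..., xn} and targets y[k]. */
    static double[] fit(double[][] x, double[] y) {
        int m = x.length;            // number of observations
        int p = x[0].length + 1;     // number of coefficients, including b0
        double[][] a = new double[p][p + 1]; // augmented matrix [X^T X | X^T y]

        for (int k = 0; k < m; k++) {
            double[] row = new double[p];
            row[0] = 1.0;            // constant term for the intercept
            System.arraycopy(x[k], 0, row, 1, p - 1);
            for (int i = 0; i < p; i++) {
                for (int j = 0; j < p; j++) {
                    a[i][j] += row[i] * row[j];
                }
                a[i][p] += row[i] * y[k];
            }
        }

        // Forward elimination with partial pivoting.
        for (int col = 0; col < p; col++) {
            int pivot = col;
            for (int r = col + 1; r < p; r++) {
                if (Math.abs(a[r][col]) > Math.abs(a[pivot][col])) {
                    pivot = r;
                }
            }
            if (Math.abs(a[pivot][col]) < 1e-12) {
                throw new IllegalArgumentException("independent variables are collinear");
            }
            double[] tmp = a[col];
            a[col] = a[pivot];
            a[pivot] = tmp;
            for (int r = col + 1; r < p; r++) {
                double factor = a[r][col] / a[col][col];
                for (int c = col; c <= p; c++) {
                    a[r][c] -= factor * a[col][c];
                }
            }
        }

        // Back substitution.
        double[] b = new double[p];
        for (int i = p - 1; i >= 0; i--) {
            double s = a[i][p];
            for (int j = i + 1; j < p; j++) {
                s -= a[i][j] * b[j];
            }
            b[i] = s / a[i][i];
        }
        return b;
    }
}
```

The normal equations are adequate for a handful of well-scaled variables; for many or strongly correlated variables a numerically more stable method would be preferable.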
Non Linear Regression
Non linear regression problems involve finding a non linear relationship between a dependent
variable and one or more independent variables. Because non linear graphs are difficult to analyze,
they can be represented mathematically as linear models before they are analyzed. This makes it
possible to use linear regression techniques to analyze such relationships.
One of the ways used to represent non linear relationships with linear models is taking logs
on both sides of the relationship equation. That reduces the non linear relationship to a linear
relationship. An example is of the form y^2 = x^2/(xy). To reduce this relationship to a linear relation
we take logs on both sides of the relation.
The resulting relationship is 2 log y = 2 log x - log x - log y. When this relationship is simplified
the resulting relationship is log y = (log x)/3. In this form, the log y term represents the dependent
variable and the log x term represents the independent variable. Let K = log y and let P = log x. It
follows that K = P/3. This becomes the linear form of our non linear relation.
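The log transformation above can be checked numerically. The Java sketch below generates points that satisfy the example relation (equivalently, y equals the cube root of x), takes logs of both variables and fits a least squares slope to K against P. The class name, method names and sample data are hypothetical and chosen only for illustration.

```java
/** Hypothetical demonstration of linearizing a power relation with logs. */
final class LogLinearization {

    /** Natural logarithm of every element; values must be positive. */
    static double[] logAll(double[] values) {
        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            if (values[i] <= 0) {
                throw new IllegalArgumentException("logs need positive values");
            }
            out[i] = Math.log(values[i]);
        }
        return out;
    }

    /** Least squares slope of k against p. */
    static double slope(double[] p, double[] k) {
        int n = p.length;
        double sp = 0, sk = 0, spk = 0, spp = 0;
        for (int i = 0; i < n; i++) {
            sp += p[i];
            sk += k[i];
            spk += p[i] * k[i];
            spp += p[i] * p[i];
        }
        return (n * spk - sp * sk) / (n * spp - sp * sp);
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 4, 8, 16};
        double[] y = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            y[i] = Math.cbrt(x[i]); // satisfies y^2 = x^2 / (x*y)
        }
        // The fitted slope of K = log y against P = log x should be close to 1/3.
        System.out.println(slope(logAll(x), logAll(y)));
    }
}
```

The base of the logarithm does not affect the slope, so natural logs are used here for convenience.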
Machine Learning Based Models Used for Developing Anomaly Based
Intrusion Detection Systems
This section discusses how hidden Markov models can be used to detect and prevent threats on a
computer system.
Application of hidden Markov models to detect threats and other critical occurrences in a
system.
Hidden Markov models are machine learning models that are used to model the states of a
system, the sequence in which they occur and the probability associated with each state transition.
When a system has a set of states into which it usually falls, and it can be established
that each new state depends on the previous states, hidden Markov models can be used to
learn the state transitions that usually happen in the system. The sequence in
which states occur in a system can be characterized by a parametric random process. Also, the
probability associated with each state transition does not depend on the time at which the transition
occurs in the system.
For computer systems in which occurrences happen according to a parametric random
process, these occurrences can be seen as the set of states of the system. Some of these occurrences
may be the point at which the system is at its optimal usage, and the point at which a particular
threat occurs in the system. When the set of threat types that happen in the system is determined, it
becomes possible to study the sequence in which these threats occur in the system and the various
transitions between the threats using hidden Markov models. Also, the various usage points,
including the optimal, the minimum and the average usage, and the transitions between them
can be studied using hidden Markov models.
Because various occurrences and threats can be studied using hidden Markov models, it
becomes possible to predict the next occurrence or threat that will happen on a host or a computer
network. Threat sources can also be predicted using threat models. When threat models are
integrated, they give a general idea about the source of the threat. With such knowledge,
the next threat or occurrence that has the highest likelihood of happening on a host or network can be
predicted by applying hidden Markov models. As such, occurrences can be prevented if
they are estimated to be disastrous. Also, if for some reason the optimal or minimal
usage must be reached, it becomes possible to study ways of optimizing the transition from the
current state or predicted next state to the required state. This makes it possible to move from a
particular usage point to the desired usage point.
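The idea of learning state transitions and predicting the next state can be sketched in Java. Note the simplifying assumption: the sketch uses a plain, fully observed Markov chain with time-independent transition probabilities, not a full hidden Markov model with hidden states and emission probabilities. The class name, the integer state encoding and the probability threshold are all hypothetical.

```java
/** Hypothetical first-order Markov chain learned from an observed state sequence. */
final class MarkovChainProfile {
    private final double[][] transition;

    /** States are encoded as integers in the range 0 .. numStates - 1. */
    MarkovChainProfile(int[] observedStates, int numStates) {
        double[][] counts = new double[numStates][numStates];
        for (int t = 0; t + 1 < observedStates.length; t++) {
            counts[observedStates[t]][observedStates[t + 1]] += 1.0;
        }
        transition = new double[numStates][numStates];
        for (int i = 0; i < numStates; i++) {
            double rowSum = 0.0;
            for (int j = 0; j < numStates; j++) {
                rowSum += counts[i][j];
            }
            for (int j = 0; j < numStates; j++) {
                // A state that was never left gets all-zero transition probabilities.
                transition[i][j] = rowSum == 0.0 ? 0.0 : counts[i][j] / rowSum;
            }
        }
    }

    /** Estimated probability of moving from one state to another. */
    double probability(int from, int to) {
        return transition[from][to];
    }

    /** Most probable next state; ties resolve to the lowest state index. */
    int mostLikelyNext(int from) {
        int best = 0;
        for (int j = 1; j < transition[from].length; j++) {
            if (transition[from][j] > transition[from][best]) {
                best = j;
            }
        }
        return best;
    }

    /** Flags a transition whose learned probability is below the threshold. */
    boolean isUnusual(int from, int to, double threshold) {
        return transition[from][to] < threshold;
    }
}
```

For example, with states 0 = average usage, 1 = optimal usage and 2 = threat observed, a profile learned from past audit data could be asked for mostLikelyNext(0), and a transition that was rarely or never seen during learning would be reported by isUnusual.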
This approach to threat detection and usage optimization makes it possible to build anomaly
based intrusion detection systems that are correct and prompt and that increase optimal use of the system.
The anomaly based intrusion detection systems built using these techniques are expected to be correct because
the threat models come from usage models that are built using similar approaches, and the threat
prediction and prevention mechanisms are designed using robust techniques developed from these
approaches. Also, there are likely to be fewer false alarms, since the threats predicted on hosts
or networks come from threat models designed with such robust methods.
An example of a cyber security threat that this approach can be used to model is a
network problem where a student is known or predicted to be sending threatening or socially
unacceptable emails to colleagues. Typically, his identity is hidden on the network on which he
sends the emails. As such, it is difficult to determine the likelihood that he will send such
threatening emails on a particular day or hour so that he can be identified and held
accountable. Using hidden Markov models, a usage model of the email system could be developed that
makes it possible to determine the day or hour in which he is likely to send such an email.
This will help in determining his identity and holding him accountable.
The Normal Usage Model of a System
If the normal usage of a mobile system can be represented by a mathematical function
made up of system variables Xi and system constants Ci, then any representation
of our mobile system can be summarized as Y = f(Xi, Ci), where Y is our system's usage and Xi
are the various independent variables of our mobile system that constitute the normal usage model
of the system. A normal usage model is an abstract representation of the usual or normal
functioning or behavior of a system.
In order to model the normal usage of our system and determine its mathematical
representation, it is essential to keep the method simple and the variables simple in abstraction and
minimal in quantity. This makes it easy to analyze, model and detect threats by applying a branch
of calculus called differentiation. Simplicity and minimal number of variables make it possible to
arrive at a mathematical function whose differential coefficient can be easily computed using
differentiation. As such, two cases will be considered.
In the first case, the normal usage model of our system can be analyzed and modelled
based on simple but essential micro usage models. These micro usage models represent smaller
components of our mobile system, such as its authentication system and a
user's session. Ideally, these models are derived from exactly one most appropriate system
variable when feasible, or at most two, in order to reduce the complexity involved in computing the
differential coefficient of the usage model.
For a mathematical function involving more than a single independent variable, our method
for threat detection using the differential equations techniques is within the scope of multivariable
calculus. Since it is easy to compute the differential coefficient of a single variable function, our
threat analysis and detection can be easy if all our micro models are single variable functions.
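For a single variable micro model, the differential coefficient can also be approximated numerically, which is convenient when the usage function is only available as code. The Java sketch below uses a central difference; the class name, the step size h and the sample functions are assumptions made for illustration.

```java
import java.util.function.DoubleUnaryOperator;

/** Hypothetical helper: numerical derivative of a single variable usage model. */
final class UsageDerivative {

    /** Central difference approximation of dY/dX at x with step h. */
    static double derivative(DoubleUnaryOperator f, double x, double h) {
        return (f.applyAsDouble(x + h) - f.applyAsDouble(x - h)) / (2.0 * h);
    }

    public static void main(String[] args) {
        // Illustrative micro models only; X could be a number of authentications.
        DoubleUnaryOperator linear = x -> 2 * x + 3;
        DoubleUnaryOperator quadratic = x -> 3 * x * x + 2 * x + 6;
        System.out.println(derivative(linear, 5.0, 1e-3));
        System.out.println(derivative(quadratic, 2.0, 1e-3));
    }
}
```

A monitoring routine could compare the rate of change computed this way against the rate expected from the learned usage model and raise an alarm when the two differ by more than an agreed tolerance.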
In the second case, however, our usage model derives its mathematical representation from
at least two or three of the most relevant system variables of the mobile system under examination. This
option increases the complexity involved in calculating the differential coefficient of our normal
usage model and analyzing the associated threat. This is because the normal usage model in this
case is a function of two or more independent system variables.
To do this type of differentiation, we use a branch of calculus called partial differentiation,
where one of the independent variables of our usage model is held constant to analyze changes in
the usage. This type of differentiation is also within the scope of multivariable calculus. The
sections that follow the one below throw more light on how to model the normal usage of several
micro usage models. These micro usage models are expected to be components of a computer
network’s usage.
It must be noted that the usage model is made up of the usage model function and a
statistical model that captures the mean and standard deviation of the predicted usage function.
This statistical usage model is called the moments model, or mean and standard deviation model.
Other statistical models could have been used. These include time series models, univariate
models and bivariate models.
Single Variable Calculus Review and its Applications
Assume a mobile system with exactly three major system variables. If sampling each of these
variables helps us to arrive at exactly one micro usage model of our mobile system that best
represents the behavior or functioning of that feature of our system, then we can use differential
equations of the three micro models to analyze and detect threats. Below are some examples of
calculus basics for our threat modelling techniques.
$Y = 2X + 3$ is a linear function that represents our first micro usage model, where X is the
number of authentications. $Y = 3X^2 + 2X + 6$ is a quadratic function that represents our second
micro usage model, where X is the number of hosts on the mobile system's wireless network.
$Y = 40/X + 5$ is a rational (reciprocal) function that represents our third micro usage model,
where X is the number of applications on a host on the mobile system's wireless network. For each
micro usage model, the differential coefficient can be computed using the laws of differentiation
given below.
Theorem 1: d/dx (C) = 0, where C is a constant. Theorem 2: d/dx (f[Xi, Ci]) is computed term by
term after simplifying f (Xi, Ci) into a sum of terms of the form $c X^{n}$. Each term
contributes $n c X^{n-1}$, that is, the exponent multiplied by the constant beside the variable,
multiplied by the variable raised to the original exponent minus one. The differential coefficient
of f is the sum of these contributions over all of its terms.
From the calculus basics review above, the corresponding differential coefficients of the
three micro models are 2, $6X + 2$, and $-40/X^2$ respectively. If the standard deviations of
our micro models are computed, then we can analyze changes in our system by looking at values
of our usage model and its derivatives and how they relate to the average usage, its corresponding
standard deviation, and the acceptable thresholds for threats.
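As a check on the derivatives quoted above, the power rule can be applied term by term to each of the three micro usage models. The third model is rewritten with a negative exponent first so that the same rule applies:

```latex
\begin{align*}
\frac{d}{dX}\,(2X + 3) &= 2 \\
\frac{d}{dX}\,(3X^{2} + 2X + 6) &= 6X + 2 \\
\frac{d}{dX}\left(\frac{40}{X} + 5\right) &= \frac{d}{dX}\,(40X^{-1} + 5) = -40X^{-2} = -\frac{40}{X^{2}}
\end{align*}
```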
Any occurrence at a point where our usage model value is not equal to the average usage
indicates a threat. Any occurrence at a point where the usage model value is less than the average
usage minus its corresponding standard deviation is a denial of service threat. Any occurrence at a
point where the usage model value is greater than the average usage plus its corresponding standard
deviation is an intrusion. Also, any occurrence at a point where the value of the usage's derivative
is outside the acceptable threshold for threats is a threat.
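The band rules in this paragraph can be sketched as a small classifier. This is a minimal illustration, not the paper's implementation: the function name, the label strings and the numbers in the example are assumptions, and only the two band rules (below the mean minus one standard deviation, above the mean plus one standard deviation) are encoded.

```python
def classify_usage(value, mean, std):
    """Label one usage value against the band [mean - std, mean + std].

    The labels are hypothetical names chosen for this sketch.
    """
    if value < mean - std:
        return "denial_of_service"   # usage far below the normal band
    if value > mean + std:
        return "intrusion"           # usage far above the normal band
    return "within_band"             # inside the acceptable range


# Made-up example: average usage 10 with standard deviation 2.
labels = [classify_usage(v, mean=10.0, std=2.0) for v in (7.0, 10.5, 13.0)]
print(labels)
```

A derivative-based rule would follow the same pattern, with the rate of change compared against its own acceptable range.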
Usage Model List
Authentication Usage Model
The authentication usage model represents the usage of an authentication system. The independent
variables that must be sampled to determine the usage of an authentication system are the average
data transmitted during an authentication (x1) and the average network speed for a single
authentication (x2). The average data transmitted is the average of request and response data for a
single authentication and the average network speed is the average upload and download speed for
a single authentication. The dependent variable that must be sampled is the time taken for an
authentication (y).
The goal of modelling the dependent and independent variables is to arrive at a mathematical
relationship between y and the two independent variables x1 and x2. It is expected that the
relationship will be Y=c1(x2/x1) +c2, where c1 and c2 are system constants. In addition, some
system constants that will aid threat analysis must be determined. These are the total number of
valid authentications, and the expected, minimum and maximum number of authentications within a
time frame. The mathematical relationship between y, x1 and x2 is the normal usage model of the
authentication system. After this relationship has been determined, occurrences that deviate from
it can be used to analyze threats. For instance, any occurrence that is not equal to the
average usage is a threat. Additionally, any occurrence that indicates a change outside an
acceptable threshold is a threat. The acceptable threshold is a range within which changes in the
system are deemed normal. Such a range is constructed from the average usage and its standard
deviation.
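One way to estimate c1 and c2 is to note that the expected relationship is linear in the ratio x2/x1, so ordinary least squares on that ratio recovers both constants. The sketch below assumes this approach; the helper name `fit_line` and the sample values are made up for illustration, and the y values are generated synthetically from c1 = 0.5 and c2 = 1.0 so the fit can be checked.

```python
def fit_line(z, y):
    """Ordinary least squares for y = slope * z + intercept."""
    n = len(z)
    mean_z = sum(z) / n
    mean_y = sum(y) / n
    sxx = sum((zi - mean_z) ** 2 for zi in z)
    sxy = sum((zi - mean_z) * (yi - mean_y) for zi, yi in zip(z, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_z


# Hypothetical samples: x1 = average data per authentication,
# x2 = average network speed per authentication.
x1 = [2.0, 4.0, 5.0, 8.0, 10.0]
x2 = [8.0, 8.0, 20.0, 16.0, 50.0]
ratio = [speed / data for data, speed in zip(x1, x2)]

# Synthetic authentication times generated from c1 = 0.5, c2 = 1.0.
y = [0.5 * r + 1.0 for r in ratio]

c1, c2 = fit_line(ratio, y)
print(round(c1, 6), round(c2, 6))
```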
Session Usage Model
A session usage model represents a single user's behavior before the session expires. To determine
the mathematical model for a user's session, two main independent variables must be sampled.
These are the size of session data accumulated (x1) and the number of user actions (x2). The
dependent variable that must be sampled is the time spent before the session expires (y). The
session usage model is expected to be made up of two micro usage models. The mathematical
representations of the micro usage models are expected to be Y=c1x1+c2 and Y=c1x2+c2, where c1 and
c2 are system constants determined separately for each micro usage model.
In addition to the two mathematical functions, some system constants that will aid threat analysis
must be determined. These include the average number of user actions, the average size of data
accumulated and the average time spent. These constants can be determined from the data set used
to determine the usage model.
The two mathematical relationships represent the session usage model. Both are linear
functions. It is expected that as user actions increase, the time spent also increases. It is also
expected that as the data accumulated increases, the time spent also increases.
Memory Usage Model
The memory usage model represents the usage of memory space in a system. The independent
variables that must be sampled are the number of application programs running (x1) and the number
of system processes running (x2). The dependent variable that must be sampled is the amount of
memory space being used (y). The mathematical relationship between x1, x2, and y is expected to
be y=c1x1+c2x2+c3, where c1 is the average memory space for programs, c2 is the average memory
space for processes and c3 is the average memory being used when no process or program is
running.
In addition to these, some system constants that aid threat analysis must be determined. These
include the minimum and maximum memory space for programs and the minimum and maximum
memory space for processes. The mathematical relationship between x1, x2, and y is the memory
usage model. When determined, the memory usage model can be used to analyze changes in the
memory usage that indicate threats in the system.
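The constants c1, c2 and c3 of a relationship of the form y=c1x1+c2x2+c3 can be estimated jointly by least squares. The sketch below does this by solving the normal equations with plain Gaussian elimination; this joint fit is an assumption of the sketch rather than the procedure reported for the simulation, and the sample counts and the memory figures (generated from c1 = 120, c2 = 15, c3 = 800) are invented.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Assumes the system is non-singular (no zero pivot after swapping).
    """
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_two_variable_model(x1, x2, y):
    """Least squares fit of y = c1*x1 + c2*x2 + c3 via the normal equations."""
    rows = [[a, b, 1.0] for a, b in zip(x1, x2)]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    aty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    return solve_linear_system(ata, aty)


# Hypothetical samples: programs running, processes running, memory used.
programs = [1, 2, 3, 4, 5, 6]
processes = [40, 35, 50, 45, 60, 42]
memory = [120 * p + 15 * q + 800 for p, q in zip(programs, processes)]

c1, c2, c3 = fit_two_variable_model(programs, processes, memory)
print(round(c1, 3), round(c2, 3), round(c3, 3))
```

The same function applies unchanged to the CPU usage model, which has the same form.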
CPU Usage Model
The CPU usage model represents CPU usage in a system. The independent variables that must be
sampled are the number of application programs running (x1), and number of system processes
running (x2). The dependent variable that must be sampled is amount of CPU power being used
(y). The mathematical relationship between x1, x2, and y is expected to be y=c1x1+c2x2+c3 where
c1 is the average CPU power being used for programs, c2 is the average CPU power being used for
processes and c3 is average CPU power being used when no process or program is running. In
addition to these, some system constants that aid threat analysis must be determined. These include
the minimum and maximum CPU power for programs and the minimum and maximum CPU
power for processes. The mathematical relationship between x1, x2 and y is the CPU usage model.
When determined, the CPU usage model can be used to analyze changes in the CPU usage that
indicate threats in the system.
Program Usage Model
To determine the program usage model, the dependent and independent variables that must be
sampled are the time spent using the program (y) and the number of functions used (x). In addition,
the minimum and maximum number of functions used must be determined as constants. The relationship
between y and x determined after sampling various x and y values is the program usage model,
denoted by y=f(x).
Host Usage Model
The host usage model is composed of four independent variables: memory usage (x1), session
usage (x2), CPU usage (x3), and program usage (x4), each derived from its respective usage model.
The dependent variable that must be sampled is the time spent on the host (y). Any relationship
determined between the dependent and the independent variables is the host usage model. The
resulting host usage model is denoted y=f (x1, x2, x3, x4).
Battery Usage Model
The battery usage model is made up of the average usage of CPU, average memory usage and the
average usage of how a session behaves in the system. These are the independent variables. The
dependent variable is the battery lifespan. The independent variables are derived from their
respective micro usage models.
Device Usage Model
The device usage model is made up of a battery usage model, a host usage model, and the time
spent on the device. The usage models that make up the device usage model compute the average
micro usage and try to relate that with the time spent on the device. The time spent on the device
is the dependent variable.
Server Usage Model
The server usage model is made up of the CPU time being used, the memory space being used and
the number of processes running. These variables are used to form two different micro usage
models. As such, there are two dependent variables, CPU time and memory space. The
independent variable for both micro usage models is the number of processes running.
Port Usage Model
The port usage model is made up of the time elapsed during communication, number of programs
that use the port and the number of paired ports. The number of paired ports is the dependent
variable and the remaining variables are the independent variables.
Network Usage Model
The network usage model is made up of average port usage, average server usage, average host
usage, the average size of data transmitted on the network, and time spent on the network. The first
three variables are the independent variables. The remaining two are the dependent variables. As
such, two micro usage models make up the network usage model.
Aggressive Usage Detector
This model is a utility that detects aggressive behavior on a system. It is modelled just like the
various micro usage models. Various factors that determine aggressive behavior during system
usage are used to determine the mathematical representation of this utility. Aggressive behavior
includes aggressive use of major system resources, and aggressive use of system components with
limited resources.
The average aggressive behavior and its standard deviation are determined. Any system
occurrence that matches the average aggressive behavior, or that falls within one standard
deviation above or below it, is considered a threat and must be halted, raised as an alert or
logged for audit purposes.
False Alarm Detector
The false alarm detector is a utility that detects normal system usage that may otherwise be deemed
a threat. Occurrences that meet the criteria for false alarms are normal usage that seems to put the
entire usage of the system into a false state of vibration or anarchy. Such usage occurrences are
therefore prioritized as normal optimal usage. The remedy for the vibrations such usage occurrences
cause is to delay other normal usage occurrences in the system.
The state and magnitude of other system occurrences, together with the state and magnitude of the
normal optimal usage, determine the impact of the perceived anarchy. To make the system for which
this utility is developed more convenient to use, the average delay time and its standard deviation
must be determined. This utility is part of the normal usage. The utility is modelled just like the
aggressive usage detector.
Special parameters of the usage model
This section discusses special parameters of our normal usage model. These parameters include
the average usage, the usage standard deviation, the minimum usage, the maximum usage and the
most frequent usage value recorded.
The average usage is the predicted average usage after the normal usage model function has been
determined. The usage standard deviation is the standard deviation of the predicted normal usage
function. The minimum and maximum usage values are the minimum and maximum usage
predicted using the normal usage model. These parameters together with usage rates, threat model
constants and other usage constants are used in analyzing and detecting threats.
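The special parameters listed in this section can be computed directly from a list of predicted usage values. The sketch below is illustrative only: the function name and the sample values are made up, and the use of the population standard deviation (`statistics.pstdev`) is an assumption, since the paper does not say whether the population or sample form is intended.

```python
import statistics


def usage_parameters(predicted):
    """Summarize predicted usage values into the special parameters."""
    return {
        "average": statistics.mean(predicted),
        "std_dev": statistics.pstdev(predicted),   # population form (assumed)
        "minimum": min(predicted),
        "maximum": max(predicted),
        "most_frequent": statistics.mode(predicted),
    }


# Made-up predicted usage values from a fitted usage model function.
predicted_usage = [4, 6, 6, 8, 6, 10, 2]
params = usage_parameters(predicted_usage)
print(params)
```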
BUILDING THE USAGE PROFILE
To build the usage profile, we will first program a usage model for each component of the
computer system under investigation. For this research, we want to build the usage profile for a
computer network. As such, we will program usage models for authentication on the computer
system, a user's session, memory usage, CPU usage, a host on the network, a server on the
network and, finally, the network itself.
The usage model for each component represents the behaviour of that component of the
computer system under investigation. The usage model, when implemented, will help us determine
the regression equation, which represents the research model, and the average usage and its
standard deviation. In addition to the regression equation and the mean and standard deviation
model, we will develop a Markov chain model for the system under investigation. As such we will
determine
states in the entire computer network and the various state transitions and the associated
probabilities of state transitions. The rest of this chapter will explain how to build a usage profile
using an authentication system and explain the details of the critical variables of the other usage
models and explain the mathematical theory needed for building the usage profile.
BUILDING A MODEL PROFILE FOR AN AUTHENTICATION SYSTEM
To build a usage model for an authentication system, we must sample critical system variables of
a system. These variables include the download speed on the network, the upload speed on the
network, the size of data sent to the server during authentication, the size of data sent to the client
during authentication and the time it takes for a successful authentication. The size of data sent and
received from the server are request data and response data respectively.
To build the usage model for the authentication system, we will capture data for all the critical
variables at equal time intervals, say every 10 minutes, while the authentication system is being
used. After collecting a sample of about 10 observations, we will try to determine the relationship
between the dependent variable and the independent variables. As already stated, the relationship
can be determined using simple or multiple linear regression. In addition to the regression
equation, we will also determine other statistics that describe the behavior of the authentication
system, such as the means and standard deviations of the variables that were sampled.
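The sampling step can be sketched as follows: each raw observation is reduced to the averaged variables used by the authentication usage model, and the mean and standard deviation of each variable are then computed. The field names, units and values are hypothetical, and the sample standard deviation (`statistics.stdev`) is an assumption of this sketch.

```python
import statistics


def derive_variables(sample):
    """Reduce one raw observation to (x1, x2, y).

    x1: average of request and response data for one authentication
    x2: average of upload and download speed for one authentication
    y:  time taken for the authentication
    """
    x1 = (sample["request_kb"] + sample["response_kb"]) / 2
    x2 = (sample["upload_kbps"] + sample["download_kbps"]) / 2
    return x1, x2, sample["auth_seconds"]


def summarize(values):
    """Return (mean, sample standard deviation) of a list of values."""
    return statistics.mean(values), statistics.stdev(values)


# Made-up observations captured at equal time intervals.
samples = [
    {"request_kb": 2.0, "response_kb": 6.0, "upload_kbps": 100.0,
     "download_kbps": 300.0, "auth_seconds": 1.2},
    {"request_kb": 2.0, "response_kb": 4.0, "upload_kbps": 120.0,
     "download_kbps": 280.0, "auth_seconds": 1.0},
    {"request_kb": 3.0, "response_kb": 7.0, "upload_kbps": 80.0,
     "download_kbps": 240.0, "auth_seconds": 1.7},
]

x1_values, x2_values, y_values = zip(*(derive_variables(s) for s in samples))
print(summarize(x1_values), summarize(x2_values), summarize(y_values))
```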
BUILDING THE MARKOV CHAIN MODEL FOR THE AUTHENTICATION SYSTEM
Hidden Markov models are machine learning models that are used to model states in a system, the
sequence in which they occur and the associated probabilities for each state transition. When a
system has a set of states in which it usually falls, and it can be predicted or established that each
new state is dependent on the previous states, then hidden Markov models can be used to learn the
state transitions that usually happen in the system.
To build the Markov chain model, we will determine the states of the authentication system and their
associated probabilities. Some of these states include the average usage of the authentication
system. This may be abstracted as the average time it takes for a successful authentication. Other
states include the minimum and maximum recorded time for a successful authentication and the
average time it takes for a failed authentication or the maximum and minimum recorded time for
failed authentications. With this information and the associated probabilities of occurrence during
a normal day, we have more information about the behaviour of the authentication system.
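A minimal sketch of the state transition part is given below. It estimates the transition probabilities of a plain (fully observed) Markov chain from a recorded sequence of states and flags transitions that were rare or never seen; it does not implement a hidden Markov model. The state names, the observed sequence and the 0.05 cut-off are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def estimate_transitions(sequence):
    """Estimate P(next state | current state) from consecutive pairs."""
    counts = defaultdict(Counter)
    for current, nxt in zip(sequence, sequence[1:]):
        counts[current][nxt] += 1
    return {
        state: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for state, followers in counts.items()
    }


def is_unusual(transitions, current, nxt, min_probability=0.05):
    """Flag a transition whose estimated probability is below the cut-off."""
    return transitions.get(current, {}).get(nxt, 0.0) < min_probability


# Hypothetical states observed during a normal day.
observed = [
    "avg_success", "avg_success", "max_success", "avg_success",
    "avg_failure", "avg_success", "avg_success", "min_success", "avg_success",
]

transitions = estimate_transitions(observed)
print(transitions["avg_success"])
print(is_unusual(transitions, "avg_failure", "avg_failure"))
```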
Threat Models in a System
A threat is a change in the normal usage model that is beyond a certain acceptable threshold called
the standard deviation of the usage model. A threat model on the other hand is an abstract
representation of this change in our mobile system that is beyond the acceptable threshold.
Integration can be performed on a threat model to determine the source of the threat. Integration is
a reverse operation for differentiation in calculus. A threat model that can perform integration
operations can be called a novel self integrating data structure. This chapter of the paper will look
at threat models of the micro usage models that make up a computer network and how to analyze
these threats in order to prevent them.
Also, how to determine the sources of these threats using a novel self integrating threat
model will be discussed. To do this, three main functions are introduced. The functions are $y = 3$,
$y = 4X + 2$ and $y = 9X^2 + 3$. These functions are in the context of the novel self integrating
data structure and represent three different threat models. Additionally, the threat models of the
various micro usage models discussed in this paper will be explored.
Properties and Methods of the Novel Self Integrating Data Structure
The properties or characteristics of the data structure that represents our threat model include,
to mention a few, the names of network software or host application software, the version numbers
of network and host software, license information such as the date the software was purchased or
released and the number of years before renewal, and the IP address and MAC address of a host on a
network.
The methods of such a large simulation object may include a method for computing
the integral of a threat model, another for computing the differential coefficient of the predictive
normal usage model, and a method for computing the differential equation of a network or host threat
model. These are mostly the methods needed for performing the major calculus
operations that support the novel calculus simulation used to detect threats and their
sources on a wireless network. Besides these, it may be necessary to implement methods that
retrieve hidden network identities, such as IP and MAC addresses, on a local area network.
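One possible shape for such a data structure is sketched below, assuming the threat model function is a polynomial stored as a list of coefficients. The class name, field names and example values are hypothetical, and only a few of the properties listed above are included.

```python
from dataclasses import dataclass


@dataclass
class ThreatModel:
    """Hypothetical sketch of the self integrating data structure."""
    software_name: str
    version: str
    ip_address: str
    mac_address: str
    coefficients: list  # coefficients[n] multiplies X**n

    def evaluate(self, x):
        """Value of the threat model function at x."""
        return sum(c * x ** n for n, c in enumerate(self.coefficients))

    def derivative(self):
        """Coefficients of the derivative (power rule, term by term)."""
        terms = [n * c for n, c in enumerate(self.coefficients)][1:]
        return terms or [0.0]

    def integral(self, constant=0.0):
        """Coefficients of the integral, with the given system constant."""
        return [constant] + [c / (n + 1) for n, c in enumerate(self.coefficients)]


# Example values are made up; the addresses are documentation placeholders.
model = ThreatModel("example-app", "1.0", "192.0.2.10", "00:00:5E:00:53:01",
                    coefficients=[3.0, 0.0, 9.0])  # y = 9X^2 + 3
print(model.integral())    # coefficients of 3X + 3X^3 with constant 0
print(model.derivative())  # coefficients of 18X
```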
Integration Review
Based on the three functions stated in this chapter, we will do an introductory review of
integration, the branch of calculus that is the reverse operation of differentiation. The integrals
of the functions introduced in this chapter are, respectively, $3X + C$, $2X^2 + 2X + C$ and
$3X^3 + 3X + C$, where C represents a system constant in the mobile system. Computing the integral
can be tricky, so two laws are defined below to aid quick computation of the integrals of a normal
mathematical function.
Theorem 1:
If a function is a constant, such as a rational number, the integral is the product of
the variable x and that constant, plus a system constant c that is
determined from a known pair of x and y values.
Theorem 2:
If a function is not a constant, the integral is computed term by term. For each term $a x^{n}$,
the constant a is divided by n + 1 and multiplied by x raised to the power n + 1, giving
$\frac{a}{n+1} x^{n+1}$. The integral is the sum of these results over every term, plus the
corresponding system constant c.
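As a worked illustration of both laws, the function $y = 9X^2 + 3$ can be integrated term by term and the system constant then fixed from one known point. The point $(1, 10)$ used here is a made-up value chosen only to show the step:

```latex
\begin{align*}
\int (9X^{2} + 3)\, dX &= \frac{9}{2+1} X^{2+1} + 3X + C = 3X^{3} + 3X + C \\
\text{with } F(1) = 10:\quad 10 &= 3(1)^{3} + 3(1) + C \;\Rightarrow\; C = 4
\end{align*}
```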
Interpretation of Threat Model Integrals
Since the novel self integrating data structure is a programmed threat model, it is important to
discuss the meaning of its integrals. The integrals represent the source of the original threat.
Computing the integral of a threat model may identify the function, software, host
or network from which the threat originated. With properties like software name, version
number, and IP and MAC addresses, it becomes easy to pinpoint the source of the threat.
If the integral of a threat model looks like the normal usage model of a function of the
system under examination, then that function from the system under examination can be predicted
as the source of the threat. Similarly, if the integral is similar to the normal usage model of a
software, host, or network that forms part of the system which is being investigated, then that threat
can be predicted to be from that software, host or network.
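The paper does not define how an integral "looks like" a normal usage model. One possible reading, assumed here purely for illustration, is to treat both as polynomials and pick the usage model whose coefficients are closest in Euclidean distance. The model names, the two usage polynomials (taken from the micro usage examples Y=2X+3 and Y=3X^2+2X+6) and the threat integral with system constant 2.5 are all hypothetical inputs.

```python
import math


def coefficient_distance(p, q):
    """Euclidean distance between two coefficient lists (index n is X**n)."""
    length = max(len(p), len(q))
    p = list(p) + [0.0] * (length - len(p))
    q = list(q) + [0.0] * (length - len(q))
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def closest_source(integral, usage_models):
    """Name of the usage model closest to the given threat integral."""
    return min(usage_models,
               key=lambda name: coefficient_distance(integral, usage_models[name]))


usage_models = {
    "authentication": [3.0, 2.0],   # Y = 2X + 3
    "host_count": [6.0, 2.0, 3.0],  # Y = 3X^2 + 2X + 6
}
threat_integral = [2.5, 3.0]        # 3X + C with an assumed C = 2.5

print(closest_source(threat_integral, usage_models))
```

Other similarity measures, such as comparing function values over a range of X, would fit the same description equally well.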
Threat Analysis and Detection
To do threat analysis in a system and abort processes that initiated those threats, linear and non
linear programming techniques can be used. The goal here is to minimize the threat occurrence
frequency and the overall impacts associated with the threat and optimize the normal usage
function. In addition to these two goals, there are some constants that aid threat analysis. These
constants are associated with the normal usage model and the threats in the system.
Examples of these constants may be the rate at which usage is increasing with respect to a
particular usage variable or the rate at which the threat impact and frequency increases with respect
to a particular variable in the usage model and other special parameters associated with the usage
model function.
The average usage, its standard deviation and the threat model function make up the threat
model. The average usage and standard deviation are constants in the threat model. Using the threat
model function, the average usage and standard deviation, threat analysis can be done using linear
and nonlinear programming. The goal is to minimize threats using the threat model function as
the objective function and the average usage and standard deviation as constraints. Other
parameters that may be used as constraints include the rate at which usage is increasing with respect
to a particular usage variable or the rate at which the threat impact and frequency are increasing
with respect to a particular usage variable.
Threat Prediction
This section discusses how to predict threats in a system. The network usage model discussed in
the previous chapter and its associated threat model will be used to demonstrate how to predict or
detect a threat in a system. As discussed in the previous section, threats can be detected using
linear and nonlinear programming. The network usage model function and its associated threat model
function are the objective functions.
The constraints that will be used are the average network usage and its standard deviation,
and other parameters such as the rate at which the network threat increases with respect to other
network usage model components such as average host usage, average server usage, average port
usage, average time the network operates, average data transmitted on the network. The goal of the
linear or non linear programming is to optimize the usage such that usage is within the range of the
average usage minus its standard deviation and the average usage plus its standard deviation. These
are the lower and upper bounds of our objective function. Every combination of system variables
whose usage is within this usage range minimizes threat in the system.
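The bounded region described above can be illustrated without a linear programming solver. As a simple stand-in, the sketch below enumerates a small grid of variable values and keeps every combination whose usage falls between the average minus and plus one standard deviation. The usage function, its coefficients, the grids and the mean and standard deviation are all made-up values.

```python
from itertools import product


def feasible_combinations(usage, grids, mean, std):
    """Return the combinations whose usage lies in [mean - std, mean + std]."""
    low, high = mean - std, mean + std
    return [combo for combo in product(*grids) if low <= usage(*combo) <= high]


def network_usage(port, server, host):
    """Hypothetical linear network usage model function."""
    return 2 * port + 3 * server + host


grids = [
    [0, 1, 2],  # candidate average port usage values
    [0, 1, 2],  # candidate average server usage values
    [0, 1],     # candidate average host usage values
]

within_band = feasible_combinations(network_usage, grids, mean=6, std=1)
print(within_band)
```

For larger problems a proper linear or nonlinear programming solver would replace the exhaustive search, with the same two bounds supplied as constraints.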
Since the average port, host and server usage are derived from their corresponding usage models,
the linear and nonlinear programming analysis will be done independently for each of them. When
a threat is predicted in a system, the chance of it being accurate depends on the usage value at
that instant and whether it is within the range of acceptable usage. This range is constructed using
the average usage and its standard deviation. Any usage value that is less than the average usage
minus its standard deviation is a threat. Also, a usage value that is greater than the average usage
plus its standard deviation is a threat. That means that any predicted threat at a point where the
predicted usage is within the usage range has a high chance of being false. In addition, the
actual and predicted usage values can be used to determine the chance that the predicted threat is
accurate. If the difference between them is high, there is a chance that the predicted usage is
wrong. Since the predicted usage and the threat models are derived from the usage model function,
there is then a chance that the predicted threat is also false. Finally, the closer the correlation
coefficient of the usage model function is to zero, the higher the chance that the predicted usage
and its associated threat values are wrong. Usage model functions with a correlation coefficient of
0.6 and above are taken to indicate that the predicted usage values and predicted threat values are
accurate. These values are obtained from the usage model function and the threat model function
respectively, which are modeled using the relevant system variables that make it possible to model
system usage and system threats.
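The correlation check can be sketched with the Pearson correlation coefficient between actual and predicted usage. The 0.6 cut-off comes from the text; comparing the absolute value of the coefficient against it is an assumption of this sketch, as are the function names and the sample values.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def prediction_is_trusted(r, threshold=0.6):
    """Treat predictions as reliable when |r| reaches the threshold."""
    return abs(r) >= threshold


# Made-up actual and predicted usage values.
actual = [1.0, 2.0, 3.0, 4.0]
good_prediction = [2.0, 4.0, 6.0, 8.0]
poor_prediction = [1.0, -1.0, 1.0, -1.0]

print(prediction_is_trusted(pearson_r(actual, good_prediction)))
print(prediction_is_trusted(pearson_r(actual, poor_prediction)))
```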
Risk Analysis in a System
To do risk analysis in a system, the frequency at which threats in the system occur and the impact
they have on the system must be known. When a frequency table is constructed for all threats and
their associated impacts stored, it becomes easy to analyze risks associated with a system.
When a threat is predicted, the likelihood of the threat occurring in the system can be computed
using the threat frequencies. The impacts various threats have can also be determined based on the
types of threats and other parameters such as the number of such threats, the speed at which they
occurred and the resources they affected or damaged. Risk in a system is computed as the product
of the likelihood of threat occurrence and the impact that threat occurrence has on the system.
These concepts are the basics for developing a risk analysis system using the techniques we have
discussed so far.
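The risk computation described above can be sketched as follows, assuming the likelihood of a threat type is its relative frequency in a threat log and that an impact score is already stored for each type. The threat names, the log and the impact scores are invented for illustration.

```python
from collections import Counter


def threat_likelihoods(threat_log):
    """Relative frequency of each threat type in the log."""
    counts = Counter(threat_log)
    total = sum(counts.values())
    return {threat: n / total for threat, n in counts.items()}


def risk_scores(threat_log, impacts):
    """Risk = likelihood of occurrence multiplied by stored impact."""
    likelihoods = threat_likelihoods(threat_log)
    return {threat: p * impacts[threat] for threat, p in likelihoods.items()}


# Hypothetical frequency data and impact scores.
threat_log = ["intrusion", "denial_of_service", "intrusion", "intrusion"]
impacts = {"intrusion": 8.0, "denial_of_service": 4.0}

risks = risk_scores(threat_log, impacts)
print(risks)
```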
Normal Usage Model and Threat Model Simulation
In this chapter, we discuss the experiment that was conducted to determine the usage of a computer
system. We also discuss how to simulate the threat and usage models with the hope of developing
a threat detection system. Four of the micro usage models that were discussed in this paper were
used for the simulation. These are the ones for authentication, session, CPU and memory.
Because the usage model for authentication was determined to be a rational function, logarithms
were taken on both sides of the relation as part of the simulation in order to reduce it to a
linear form. The original function is Y=c1(x2/x1) +c2. The logarithm of a sum does not split into a
sum of logarithms, so the reduction is exact only for the multiplicative part: log (Y – c2) =
log c1 + log x2 – log x1, which becomes log Y = log c1 + log x2 – log x1 when c2 is zero or
negligible. Since log c1 is a constant, let it be denoted by k. Additionally, let B = log (Y – c2),
let j1 = log x1 and let j2 = log x2. Therefore, the linear form of the usage for authentication is
B = j2 – j1 + k, where B is the dependent variable and j2 and j1 are the independent variables.
When B, j2, and j1 are sampled, k and hence c1 can be estimated, and Y=c1(x2/x1) +c2 can be
determined.
The CPU and memory usage models are multiple linear forms. The original relation is of the
form y=c1x1+c2x2+c3, where x1 and x2 are the independent variables. The original relation must be
reduced to its simple linear forms. To do this, y=b0+bx is determined for each independent variable.
The sum of the various b0 is taken as c3. Each b corresponds to the constant associated with the
independent variable for which y=b0+bx was determined. For example, the b determined for x1 is
taken as c1 and that for x2 as c2. When x1, x2, and y are sampled and the
various y=b0+bx determined, y=c1x1+c2x2+c3 can be determined completely.
The simulation was run four times within a week: for 15 minutes on the first instance, 30 minutes
on the second, 45 minutes on the third and 60 minutes on the last. The functions for the usage
models and their corresponding correlation coefficients were also determined.
Tools and Computer Packages
This chapter discusses the tools and computer packages that were used throughout this research
project. We will also look at the programming languages, database platforms and development
frameworks that can be used to develop an anomaly based intrusion detection system for ecommerce
sites using the concepts we have discussed in this paper. The simulation was implemented in Java as
a console based program. Java was chosen for its object oriented concepts such as
encapsulation, inheritance, interfaces, objects, and polymorphism.
To implement an intrusion detection system using the results of this research, the following
tools will be essential: Bootstrap, CodeIgniter, MySQL Database Management System, SQLite,
SQLyog, and Eclipse. These tools are best suited for intrusion detection systems developed for
ecommerce sites. The programming platforms that will be used are PHP and Android. PHP
is for the desktops and laptops that connect to the ecommerce sites and Android is for mobile
phones that use the ecommerce sites.
Bootstrap and CodeIgniter are web development frameworks. Bootstrap is for frontend
development and CodeIgniter is a backend framework for PHP developers. For Android development,
Eclipse can be used as the IDE. MySQL and SQLyog are for the database
servers that will run on the ecommerce site as part of the intrusion detection system
implementation. SQLite is for the databases that run on the Android implementations that form
part of the intrusion detection system developed for the ecommerce website.
With all these tools, frameworks and packages, developers are ready to develop intrusion
detection systems for ecommerce sites using the concepts in this research paper. It is expected that
the micro usage models discussed will be integral libraries implemented in PHP and
Android as part of an implementation for ecommerce sites or any group of web or mobile
application systems.
Conclusion and Discussion
It is worth mentioning that the normal usage models and threat models experimented with in this
paper represent a computer system and its associated threats. These threats can be analyzed
periodically and audited as part of a computer security audit. This will fuel development of a risk
analysis system. A risk analysis system, threat detection system and normal usage system developed
from experimenting with the usage and threat models will make up a mobile security audit framework
that can be used for maintaining cyber security on computer systems. When practices and processes
for maintaining this framework are drafted and adhered to, it will be easier to maintain cyber
security on various computer systems.
Additionally, it can be established that using the differential equations technique, the novel
self integrating data structure, and the linear and nonlinear programming techniques, threats on a
system can be analyzed and detected. To halt such threats, the intrusion detection system developed
using the techniques stated above must possess certain qualities. These qualities include
correctness, promptness, and ease of use. Correctness means how well the intrusion detection
system can detect threats. This is important because correctness determines how often a predicted
threat turns out to be true or false. Promptness is the time it takes to detect or halt a threat, and
ease of use is the extent to which the intrusion detection system allows convenient use of the
computer system for which it is developed.
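As an illustration of how correctness could be quantified, the following Java sketch computes two conventional measures from labelled detection outcomes: the fraction of raised alarms that were real threats, and the fraction of benign events that were wrongly flagged. These measures, the class name DetectionCorrectness and the counts in main are assumptions made for illustration; the paper does not prescribe a specific correctness metric.

```java
/** Sketch: summarising detector correctness from labelled outcomes (hypothetical names). */
final class DetectionCorrectness {

    /** Fraction of raised alarms that were real threats (often called precision). */
    static double precision(int truePositives, int falsePositives) {
        int alarms = truePositives + falsePositives;
        if (alarms == 0) {
            throw new IllegalArgumentException("no alarms were raised");
        }
        return (double) truePositives / alarms;
    }

    /** Fraction of benign events that were wrongly flagged as threats. */
    static double falsePositiveRate(int falsePositives, int trueNegatives) {
        int benign = falsePositives + trueNegatives;
        if (benign == 0) {
            throw new IllegalArgumentException("no benign events were observed");
        }
        return (double) falsePositives / benign;
    }

    public static void main(String[] args) {
        // Made-up counts, for illustration only.
        int truePositives = 45;
        int falsePositives = 5;
        int trueNegatives = 945;

        System.out.println("precision = " + precision(truePositives, falsePositives));
        System.out.println("false positive rate = "
                + falsePositiveRate(falsePositives, trueNegatives));
    }
}
```

A detector with high precision and a low false positive rate predicts threats that are mostly true, which is the sense of correctness described above.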
The techniques we have discussed make it possible to achieve correctness, promptness and
ease of use. The usage model function, with its associated average usage and standard deviation,
makes it possible to ensure correctness of the intrusion detection system. This is because the
statistical data sampled for developing the intrusion detection system is within the range of
acceptable usage. The average usage and standard deviation are computed using statistical models.
One such model used in this research is the moments model, also called the mean and standard
deviation model. With this statistical model and the usage model equation, it becomes possible to
ensure correctness of the intrusion detection system.
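A minimal Java sketch of a mean and standard deviation check is given here for illustration. It computes the average usage and sample standard deviation from a set of observations and flags a new observation that lies more than k standard deviations from the average. The class name UsageProfileCheck, the threshold k, and the sample values are assumptions, not part of the simulation described in this paper.

```java
/** Sketch of a mean and standard deviation usage check (hypothetical names). */
final class UsageProfileCheck {

    /** Arithmetic mean of the sampled usage values. */
    static double mean(double[] samples) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("at least one sample is required");
        }
        double sum = 0.0;
        for (double s : samples) {
            sum += s;
        }
        return sum / samples.length;
    }

    /** Sample standard deviation (divides by n - 1). */
    static double standardDeviation(double[] samples) {
        if (samples.length < 2) {
            throw new IllegalArgumentException("at least two samples are required");
        }
        double m = mean(samples);
        double sumOfSquares = 0.0;
        for (double s : samples) {
            sumOfSquares += (s - m) * (s - m);
        }
        return Math.sqrt(sumOfSquares / (samples.length - 1));
    }

    /** Flags an observation lying more than k standard deviations from the mean. */
    static boolean isAnomalous(double observed, double[] samples, double k) {
        return Math.abs(observed - mean(samples)) > k * standardDeviation(samples);
    }

    public static void main(String[] args) {
        // Made-up usage samples (for example, requests per minute); illustration only.
        double[] usage = {10, 12, 11, 9, 13, 10, 12, 11};
        double k = 3.0; // assumed threshold, to be tuned per system

        System.out.println("12 anomalous? " + isAnomalous(12, usage, k));
        System.out.println("40 anomalous? " + isAnomalous(40, usage, k));
    }
}
```

The choice of k trades missed threats against false alarms: a smaller k flags more observations as unusual, while a larger k tolerates wider variation in usage.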
To achieve promptness, multithreading is applied to analyze, predict, detect and halt
threats. All threat alarms and detection must use multithreading. Multithreading is a programming
concept that allows several threads of execution to run on the computer at the same time. This
concept makes it possible to predict multiple threats, perform multiple threat analyses and halt or alarm