An Intrusion Detection System, a Risk Analysis System, a Secured Expert Medical
Consultation System Developed Using Artificial Intelligence Technologies and a Cyber
Security Framework
Abstract
This paper is an investigation into building an Intrusion Detection System, a Risk Analysis
System and a Secured Expert Medical Consultation System developed using Artificial
Intelligence Techniques. First of all, we describe how to build a Usage Profile of a Computer
Network. This Usage Profile of the Computer Network will be the basic building block for the
development of the Intrusion Detection System and the Risk Analysis System. The
proposed Usage Profile is made up of a Linear Regression model, a Mean and Standard
Deviation model, and a Hidden Markov model. These models can be built by sampling
experimental data of critical variables of a Computer Network.
Secondly, the Secured Expert Medical Consultation System will be developed using Evolutionary Computing techniques. The proposed system will make it easier for medical doctors to perform consultations, because it will suggest diagnoses, medical tests and drug prescriptions based on information captured during a consultation, such as the patient's symptoms and reactions. The system will do this by mapping a set of symptoms and conditions to a database of diseases.
Additionally, this project will answer the question: how do we secure such an expert medical consultation system? We will look at the security of the system itself and of its other components, such as its database and the server on which the system and its database are hosted. We will secure the system by performing penetration testing while it is being developed. The techniques we will explore include SQL injection and dictionary attacks. We will also look at how to secure the database by configuring appropriate access controls on the Database Management System (DBMS) and by configuring the DBMS to guard against intrusion using authentication and authorization. We will also consider the password types that are suitable for such a system.
Also, this paper will outline the processes, practices and guidelines that must be followed when performing a cybersecurity audit. These processes, practices and guidelines will be outlined in a cybersecurity framework.
Finally, it is suggested that deviations from the usage profile of the computer
network can be flagged as anomalous activities. This can help us develop an intrusion
detection system and a risk analysis system that can be used for detecting and preventing
intrusions and performing risk analysis respectively.
Abstract
1.0 Introduction
2.0 Intrusion Detection System, Risk Analysis System and Cybersecurity Framework
2.1 Problem Definition
2.2 Research Questions
2.3 Objectives
2.4 Literature Review
2.4.1 Intrusion Detection Systems
2.4.2 Anomaly Detection Systems
2.4.3 Behaviour Encryption
2.4.4 Risk Analysis
2.4.5 Information Security Awareness and Practices
2.4.6 Protocol For Mitigating Risks on Social Networking Sites
2.4.7 Behaviour Models and Anomaly Intrusion Detection
2.5 Research Model and Methodology
2.5.1 Research Model
2.5.2 Usage Model: A Java Interface That Implements the Research Model
2.5.3 Usage Model File: model.java
2.5.4 Implementing the Usage Model for an Authentication System
2.5.5 Methodology
2.5.5.1 Machine Learning Algorithms & Behaviour Based Intrusion Systems
2.5.5.2 Audit Trail Analysis
2.5.5.3 Normal Usage Model
2.5.5.4 Threat Modelling
2.5.5.5 Boolean Calculus
2.5.5.6 Experimenting Usage and Threat Models
2.5.5.7 Computer Usage Survey
2.5.5.8 Intrusion Detection Systems
2.6 Threats Associated With Computer Systems
2.6.1 Attacks associated with a computer system
2.6.2 Malicious Code
2.6.3 IP Scan and Attack
2.6.4 Web Browsing
2.6.5 Virus
2.6.6 Unprotected Shares
2.6.7 Mass emails
2.6.8 Simple Network Management Protocol (SNMP)
2.6.9 Hoaxes
2.6.10 Backdoors
2.6.11 Password Crack
2.6.12 Brute Force
2.6.13 Dictionary
2.6.14 Denial of Service (DoS) and Distributed Denial of Service (DDoS)
2.6.15 Spoofing
2.6.16 Man in the Middle
2.6.17 Spam
2.6.18 Mail Bombing
2.7 Mathematical Modelling Techniques and Machine Learning Based Models
2.7.1 Simple Linear Regression
2.7.2 Multiple Linear Regression
2.7.3 Non Linear Regression
2.7.4 Machine Learning Based Models Used for Developing Anomaly Based Intrusion Detection Systems
2.8 The Normal Usage Model of a System
2.8.1 Single Variable Calculus Review and its Applications
2.8.2 Usage Model List
2.8.3 Authentication Usage Model
2.8.4 Session Usage Model
2.8.5 Memory Usage Model
2.8.6 CPU Usage Model
2.8.7 Program Usage Model
2.8.8 Host Usage Model
2.8.9 Battery Usage Model
2.8.10 Device Usage Model
2.8.11 Server Usage Model
2.8.12 Port Usage Model
2.8.13 Network Usage Model
2.8.14 Aggressive Usage Detector
2.8.15 False Alarm Detector
2.8.16 Special Parameters of The Usage Model
2.8.17 Building The Usage Profile
2.8.18 Building a Usage Profile for an Authentication System
2.8.19 Building A Markov Chain Model for An Authentication System
2.8.20 Threat Models in a System
2.8.21 Properties and Methods of the Novel Self Integrating Data Structure
2.8.22 Integration Review
2.8.23 Interpretation of Threat Model Integrals
2.8.24 Threat Analysis and Detection
2.8.25 Threat Prediction
2.8.26 Risk Analysis in a System
2.9 Normal Usage Model and Threat Model Simulation
2.10 Tools and Computer Packages
3.0 Secured Expert Medical Consultation System
3.1 Problem Definition
3.2 Research Questions
3.3 Objectives of Paper
3.4 Literature Review
3.4.1 Evolutionary Computing Terminologies
3.4.2 Evolutionary Algorithms
3.4.2.1 Representation
3.4.2.2 Evaluation or Fitness Function
3.4.2.3 Population
3.4.2.4 Parent Selection Mechanism
3.4.2.5 Variation Operators
3.4.2.6 Mutation
3.4.2.7 Recombination
3.4.2.8 Survivor Selection Mechanism
3.4.2.9 Initialization
3.4.2.10 Termination Condition
3.4.3 Genetic Algorithms
3.4.4 Evolutionary Strategies
3.4.5 Genetic Programming
3.4.6 Evolutionary Programming
3.4.7 Differential Evolution
3.4.8 A Survey on Wearable Sensor-Based Systems for Health Monitoring and Prognosis
3.4.9 Sensors in Medicine
3.4.10 Expert System Methodologies and Application
3.4.10.1 Rule-based Systems
3.4.10.2 Knowledge-based Systems
3.4.10.3 Neural Networks
3.4.10.4 Fuzzy Expert System
3.4.10.5 Object Oriented Methodologies
3.4.10.6 Case-based Reasoning
3.4.10.7 Modelling
3.4.10.8 System Architecture
3.4.10.9 Intelligent Agents
3.4.10.10 Ontology
3.4.10.11 Database Methodology
3.5 Research Model and Methodology
3.5.1 Research Model
3.5.2 Research Methodology
3.5.2.1 Assumption Enumeration
3.5.2.2 Hypothesis Formulation
3.5.2.3 Experimentation
3.5.2.4 Hypothesis Testing
3.5.2.5 Demonstration
3.5.2.6 Agile Development
3.6 Requirement Specification
3.6.1 Functional Requirement for the Android App
3.6.1.1 Consultation
3.6.1.2 Patient Basic details and medical information
3.6.1.3 User Settings and Authentication
3.6.2 Functional Requirements of the Web Application
3.6.2.1 Patient Details and Medical Information
3.6.2.2 Consultation
3.6.2.3 Drug Prescription and Medical Test
3.6.2.4 User Settings and Authentications
3.6.3 Non-Functional Requirements
3.7 Scope of Diseases
3.7.1 Symptoms and Reactions of Diseases that will be modelled
3.7.2 Symptoms of Malaria
3.7.3 Symptoms of Cholera
3.7.4 Symptoms of Diarrhoea
3.7.5 Symptoms of Bi-polar Disorder
3.7.6 Symptoms of Schizophrenia
3.7.7 Symptoms of Diabetes
3.7.8 Skin Diseases
3.7.9 Symptoms of Hypertension
3.7.10 Symptoms of Asthma
3.7.11 Medical Tests Associated with Diseases
3.8 The Evolutionary Computing World
3.8.1 Representation
3.8.2 Population
3.8.3 Initialization
3.8.4 Fitness Function
3.8.5 Parent Selection Mechanism
3.8.6 Survivor Selection Mechanism
3.8.7 Mutation
3.8.8 Recombination
3.8.9 Termination Condition
3.9 Design and Implementation
3.9.1 Designing the Mobile App
Fig. 2
3.9.2 Implementing the Mobile App
3.9.3 Designing the Web App
3.9.4 Implementing the Web App
4.0 Conclusion and Discussion
5.0 References
1.0 Introduction
If a usage profile of a system can be built, it becomes possible to detect unusual behaviour on the system. Building such a profile involves two steps. The first is identifying the factors that are critical to the system; these can be seen as critical system variables that affect the system’s usage. The second is deciding how to obtain an abstract representation of the usage profile. This representation can be achieved by applying behaviour models such as statistical models, machine learning models and cognitive models.
Secondly, cyber security threats on computer networks have the potential to damage resources on the network. Examples of such damage include corrupting data stored or transmitted on the network, infecting a host on the network with a virus, impersonating a valid user on the network and preventing application software on various hosts from functioning properly. The security of computer systems is essential to organizations. It is usually provided by software that protects the computer system for which it was developed. An intrusion detection system is one such software system. Others include antivirus software, firewalls and risk analysis systems. In addition, periodic computer security audits enable threat detection and prevention on computer networks.
Additionally, medical health care can be made more successful and timely, and can better yield the expected results, when an expert medical consultation system is used in administering medical care and performing medical consultations. This type of system can be developed through the application of artificial intelligence technologies such as machine learning, artificial neural networks and evolutionary computing.
Applying concepts of evolutionary computing can make it possible to develop an expert medical consultation system. This is possible when we develop a database of diseases with their symptoms, a database of medicines used to cure the diseases, and a database of reactions and indications with their associated medical tests, all of which will aid in administering medical consultation.
This paper is an investigation into building an Intrusion Detection System, a Risk Analysis System and a Secured Expert Medical Consultation System using Artificial Intelligence Techniques. The intrusion detection system and the risk analysis system will be developed using behaviour models such as statistical models, machine learning models and cognitive models. The expert medical consultation system will be developed using evolutionary computing techniques. Finally, the paper will outline a Cybersecurity Framework that describes the processes, practices and guidelines that must be adhered to when performing a security audit or risk analysis.
2.0 Intrusion Detection System, Risk Analysis System and Cybersecurity Framework
This chapter of the research paper is dedicated to the Intrusion Detection System, Risk
Analysis System and Cybersecurity Framework. We will discuss the objectives of the project,
the problem we seek to solve and the research question for this part of the research paper.
We will also describe how to build the usage profile that is the basic building block for the
intrusion detection system and the risk analysis system.
2.1 Problem Definition
If the normal usage or behaviour of a computer system can be represented by an abstract
model, then this abstract model can be used to detect threats on the system. The threats on
the system can be detected as deviations from the abstract model which is the behaviour of
the system. The main problems this paper seeks to investigate are listed below.
● Representing the normal usage or behaviour of a system with an abstract model.
● Determining activities and occurrences on the system that are deviations from the
system’s normal behaviour or usage.
● Representing these activities or deviations with an abstract model.
● Preventing such activities or occurrences from occurring on the system.
In this paper, the system’s normal behaviour is known as the usage profile, and the deviations from the system’s normal behaviour are known as the threat profile of the system.
2.2 Research Questions
The main questions to be investigated are listed below.
● What are the best and most efficient techniques for modelling a system’s normal
behaviour or usage?
● What are the best and most efficient techniques for design and implementation of an
intrusion detection system?
● How can we build a risk analysis system for performing risk assessment of a
computer network?
2.3 Objectives
The main objectives of this research are as follows.
● Representing a computer network’s normal functioning with an abstract model
● Building a usage profile of a computer network.
● Detecting activities and occurrences that deviate from the normal usage of a
computer network and flagging these activities and occurrences as anomalous
activities on a computer network.
● Design and implementation of an Anomaly Intrusion Detection System.
● Design and implementation of a Risk Analysis System.
● Drafting a document that details the procedures, processes, practices and guidelines that must be followed when performing security audits.
2.4 Literature Review
This section reviews the major topics that constitute this research paper and the work done in some of these areas. The topics include intrusion detection systems, since any study of detecting threats and their sources is centred on intrusion detection systems. Behaviour encryption is another computer security field that will be discussed in detail, since it adds much value to the information hiding parts of this research. Risk analysis will also be reviewed to sum up what constitutes risk analysis. Finally, there will be a review of Normal Usage Models.
2.4.1 Intrusion Detection Systems
There are two basic types of intrusion detection systems in the industry, distinguished by the approach used for threat detection and the technologies used to build the system [25]. These are knowledge based (also known as signature based) and behaviour based intrusion detection systems [25]. Each takes a different approach to threat detection, each is built with different technology, and each has its pros and cons.
Knowledge based intrusion detection systems are built on a database of already known threats [25]. These known vulnerabilities or threats are called threat signatures [25]. Usually, detection is done by directly mapping system incidents that indicate threats to threat signatures [25]. As a result, the database of threats must be constantly updated with newly identified threats [25]. Because new threats keep emerging before they are included in the database, detection accuracy is sometimes compromised: threats that do not have corresponding signatures cannot be mapped and detected [25]. However, these types of intrusion detection systems have lower false alarm rates, since each detected threat is registered in the database of threat signatures [25].
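As a minimal sketch of the signature lookup described above, the Java class below keeps a set of known threat signatures and reports whether an observed incident matches one. The class name SignatureDetector and the signature strings are hypothetical and do not come from [25]; real systems match on far richer patterns than exact strings.

```java
import java.util.HashSet;
import java.util.Set;

/** Illustrative sketch of knowledge (signature) based detection. */
final class SignatureDetector {
    // Database of already known threat signatures (hypothetical string form).
    private final Set<String> signatures;

    SignatureDetector(Set<String> knownSignatures) {
        // Defensive copy so later changes to the caller's set do not leak in.
        this.signatures = new HashSet<>(knownSignatures);
    }

    /** Returns true only if the incident maps directly to a known signature. */
    boolean isKnownThreat(String incident) {
        return signatures.contains(incident);
    }

    public static void main(String[] args) {
        Set<String> known = new HashSet<>();
        known.add("port-scan");            // hypothetical signature
        known.add("sql-injection-probe");  // hypothetical signature
        SignatureDetector detector = new SignatureDetector(known);

        System.out.println(detector.isKnownThreat("port-scan"));
        // A threat without a signature is missed, which is the limitation noted above.
        System.out.println(detector.isKnownThreat("new-unlisted-attack"));
    }
}
```

The second call illustrates why the signature database has to be kept up to date: an incident that is not yet registered is simply not detected.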
Behaviour based intrusion detection systems take a different approach to threat detection. They are built using artificial intelligence technologies [25]. Usually, the behaviour of the system being protected is modelled, and deviations from that behaviour are used to detect threats [25]. Because of this, they detect threats more accurately [25]. No threat signatures or mappings of incidents that indicate threats are required [25]. However, they have higher false alarm rates because detected threats are not mapped to a database of known threats [25].
Besides these, intrusion detection systems are classified by the purpose for which they are built and by how actively or passively they deal with threats [25]. By purpose, there are host based and network based intrusion detection systems [25]. Active intrusion detection systems are configured to block or prevent attacks, while passive intrusion detection systems are configured to monitor, detect and alert on threats [25].
2.4.2 Anomaly Detection Systems
According to a research paper entitled “Design and Implementation of Anomaly Detection System”, there are global variables of a network that can be used for detecting anomalous activities on the network [19]. The paper used a hybrid of signature based and anomaly based intrusion detection to detect anomalies [19]. According to the paper, the techniques used for detecting intrusion include using generic network rules to detect network anomalies. The paper also used dynamic network knowledge, such as network statistics, to detect anomalous activities [19].
2.4.3 Behaviour Encryption
Behaviour algorithms are applied to safeguard information on computing devices such as mobile phones and laptops [27]. These algorithms are the basis for building systems that study and encrypt user behaviour on a computing device in order to ensure the security of information on the device [27]. A study into mobile platform security reports that behaviour encryption application systems have been designed and built, focusing on mobile platforms [27]. Results from this study indicated that these systems are effective in ensuring mobile platform security [27].
In addition, since mobile devices can be secured through behaviour encryption systems, the behaviour of hosts on a network or of network systems can also be encrypted to ensure safe communication, because each host or user on a system or network has a particular behaviour pattern.
Cryptographic study into encrypting the normal usage model can fall under behaviour encryption, since the usage model represents a system’s behaviour and can be composed of a user’s behaviour. This can aid in securing the information that embodies the usage model. It is also necessary because, if the usage model can easily be predicted, it is possible to manipulate the usage model and launch an attack.
2.4.4 Risk Analysis
Computer risk analysis is also called risk assessment [49]. It involves the process of analyzing and interpreting risk [49]. There are two main types of risk assessment: qualitative and quantitative [44]. Quantitative risk assessment uses mathematical models and simulations to assign numerical values to risk [46]. Qualitative risk assessment relies on an expert’s subjective judgement to build a theoretical model of risk for a given situation [46]. To analyze risk, the scope and methodology have to be determined first [49]. Information is then collected and analyzed before the risk analysis results are interpreted [49]. Determining the scope means identifying the system to be analyzed for risk and the parts of the system that will be considered [49]. The analytical method that will be used, with its level of detail and formality, must also be planned [49]. The boundary, scope and methodology used during risk assessment determine the total amount of work effort needed in risk management, and the type and usefulness of the assessment’s results [49].
Risk has many components, including assets, threats, likelihood of threat occurrence, vulnerability, safeguard and consequence [49]. Two formulas for risk are of paramount significance to this research paper. The first is:
Risk = Threat + Consequence + Vulnerability [47].
It must be emphasized that, “Risk in this formulas can be broken down to consider likelihood of threat occurrence, the effectiveness of your current security program and the consequence of an unwanted criminal or terrorist event occurring” [47]. The second formula, which the author came to know while working as an Information Security Consultant, is:
Risk = Likelihood of Threat Occurrence ✕ Impact of Threat Occurrence
This formula is somewhat more suitable for performing risk assessment because it is simpler. It can be used for qualitative or quantitative risk assessments.
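As a minimal sketch of the likelihood-times-impact formula, the Java snippet below computes a risk score for one hypothetical threat. The class name RiskScore, the 0 to 1 likelihood scale and the 1 to 5 impact scale are illustrative assumptions; the paper does not prescribe particular scales.

```java
/** Illustrative sketch: Risk = Likelihood x Impact. The scales are assumptions. */
final class RiskScore {
    /**
     * @param likelihood likelihood of threat occurrence, assumed to lie in [0, 1]
     * @param impact     impact of threat occurrence, assumed to lie in [1, 5]
     * @return the product of likelihood and impact
     */
    static double score(double likelihood, double impact) {
        if (likelihood < 0.0 || likelihood > 1.0) {
            throw new IllegalArgumentException("likelihood must be in [0, 1]");
        }
        if (impact < 1.0 || impact > 5.0) {
            throw new IllegalArgumentException("impact must be in [1, 5]");
        }
        return likelihood * impact;
    }

    public static void main(String[] args) {
        // Hypothetical threat: 50% likelihood, impact rated 4 out of 5.
        System.out.println(score(0.5, 4.0)); // prints 2.0
    }
}
```

For a qualitative assessment, the same product can be taken over ordinal ratings (for example low = 1 to high = 5) assigned by an expert instead of measured probabilities.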
Additionally, risk management includes risk acceptance, which takes place after several risk analyses [48]. Normally, after risk has been analyzed and safeguards implemented, the remaining or residual risk that must be tolerated for the system to remain functional has to be accepted by management [48]. This may be due to constraints on the system such as ease of use, or features of the system for which strict safeguards would cause the organization operational problems. As such, risk acceptance, like the selection of safeguards, should take into account various factors besides those addressed in the risk assessment [49]. In addition, risk acceptance should take into account the limitations of the risk assessment [49].
2.4.5 Information Security Awareness and Practices
A paper entitled “A study of information security awareness and practices in Saudi Arabia” discusses information security awareness and practices in that country. The paper emphasizes that information is under constant threat from cyber vandals [1]. It rates Saudi Arabia poorly in terms of information security, which it attributes to the country’s highly suppressed, patriarchal and tribal culture [1].
The paper examined the level of information security awareness among the general public in the country using an anonymous online survey based on instruments produced by the Malaysian Security Organization [1]. In all, 633 persons responded to the survey, and the analysis confirmed that information security awareness is low in the country, which the paper again relates to that culture [1].
2.4.6 Protocol For Mitigating Risks on Social Networking Sites
According to an academic paper entitled “Protocol for mitigating the risk of hijacking social networking sites”, hackers can hijack a user’s session on social networking sites, impersonate the victim and take over the session [7].
The paper deals with this risk by presenting a security authentication protocol for mitigating it [7]. The protocol takes into account that users of social networking sites connect to the sites using several platforms and connection speeds [7]. To cater for mobile devices and tablets using Wi-Fi connections, a novel Self-Configuring Repeatable Hash Chains (SCRHC) protocol was developed to prevent the hijacking of session cookies [7]. This protocol supports three levels of caching, making it possible to trade storage space for enhanced performance and reduced workload [7].
2.4.7 Behaviour Models and Anomaly Intrusion Detection
Behaviour models are used to detect intrusion in computer systems. This section reviews the behaviour models that can be used to build behaviour based intrusion detection systems. These models fall into the following categories: statistical models, machine learning based techniques, cognitive models, computer immunology and user intention. Statistical models include the operational or threshold metric model, the Markov process or marker model, the multivariate model, the statistical moments model, time series models and univariate models. Machine learning based models include Bayesian networks, genetic algorithms, neural networks, fuzzy logic and outlier detection. Cognitive models include finite state machines, description scripts and expert systems.
2.5 Research Model and Methodology
This section describes the research model and methodology for developing the security
audit framework. We will describe the research model and the steps that make up the
methodology.
2.5.1 Research Model
Assume that the normal usage (Y) of a computer network can be represented by a mathematical function:
Y = f(Xi, Ci)
Here Xi represents system variables, such as the number of functions or the number of authentications, and Ci represents system constants, such as the maximum or minimum number of authentications. When a change in Y is beyond the standard deviation determined from the data set of our usage, that change indicates a threat. To investigate this threat, machine learning algorithms, mathematical functions and behaviour based intrusion detection systems will be studied to determine Y in terms of a number of variables that represent Y appropriately. The expected usage model of the network to be investigated includes the following components: Host Usage Model, Server Usage Model, Device Usage Model, Port Usage Model, Network Usage Model, Session Usage Model, Authentication Usage Model, Memory Usage Model, CPU Usage Model, Battery Usage Model and Program Usage Model. These components are expected to be derived from the variables listed below.
● Average number of application software programs that run on the network system while the system is in use.
● Average number of system processes that run on the network system while the system is in use.
● Average number of authentications in the network system.
● Average number of user actions that happen on the network system.
● Average time a user spends before their session expires.
● Average time the network system functions each day.
● Number of paired ports communicating on the network.
● Average amount of memory space used on devices while the network is being operated.
● Average CPU time spent on a single device on the network.
● Average life span of a single device battery on the network.
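The threshold rule in the research model can be sketched in Java as follows. This is an illustration under stated assumptions, not the paper's implementation: the class name UsageBaseline is hypothetical, the "change in Y" is taken to be the distance of a new observation from the sample mean, the population standard deviation is used, and the sample values stand in for any one of the variables listed above (for example, authentications per hour).

```java
/** Illustrative sketch: flag observations more than one standard deviation from the mean. */
final class UsageBaseline {
    private final double mean;
    private final double stdDev;

    /** Learns the mean and population standard deviation from sampled usage values. */
    UsageBaseline(double[] samples) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("at least one sample is required");
        }
        double sum = 0.0;
        for (double s : samples) {
            sum += s;
        }
        mean = sum / samples.length;

        double squaredDeviations = 0.0;
        for (double s : samples) {
            squaredDeviations += (s - mean) * (s - mean);
        }
        stdDev = Math.sqrt(squaredDeviations / samples.length);
    }

    double mean() { return mean; }

    double stdDev() { return stdDev; }

    /** True when the observation deviates from the mean by more than one standard deviation. */
    boolean isAnomalous(double observed) {
        return Math.abs(observed - mean) > stdDev;
    }

    public static void main(String[] args) {
        // Hypothetical sampled usage values, e.g. authentications per hour.
        double[] samples = {2, 4, 4, 4, 5, 5, 7, 9};
        UsageBaseline baseline = new UsageBaseline(samples);
        System.out.println(baseline.isAnomalous(6.0));
        System.out.println(baseline.isAnomalous(8.0));
    }
}
```

A one standard deviation band follows the wording of the research model; in practice a wider band (two or three standard deviations) would reduce false alarms at the cost of missing smaller deviations.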
2.5.2 Usage Model: A Java Interface That Implements the Research Model
For each component of a computer system under investigation, we will program a usage model, which is an implementation of the research model for that component. Each usage model implements an interface captured in a Java file called model.java.
There are eight functions in the model.java interface. The first, computeval, computes the usage value at an instant. The second, findchange, finds changes in the usage of the computer system. The third, learnsys, learns the usage of the system. The fourth, findrelationship, finds the regression equation. The fifth, monitor, monitors the usage of the system. The sixth, showalarm, displays error messages and detected intrusions. The seventh, haltprocess, halts a detected intrusion. The eighth, predictvals, predicts usage values based on the regression equation determined. A concrete class that omits an implementation of any of these functions will fail to compile. To implement the usage model, use the Java keyword implements. The model.java file is listed below.
2.5.3 Usage Model File: model.java
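public interface model {
    public double computeval();           // compute the usage value at an instant
    public double findchange();           // find the change in usage
    public void learnsys(int t);          // learn the usage of the system
    public Object findrelationship();     // find the regression equation
    public void monitor(int t);           // monitor the usage of the system
    public void showalarm(String info);   // display error messages and detected intrusions
    public void haltprocess();            // halt a detected intrusion
    public void predictvals();            // predict usage values from the regression equation
}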
2.5.4 Implementing the Usage Model for an Authentication System
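class auth_usage implements model {
    /* variable declarations for dependent and independent variables go here */

    public double computeval() {
        return 0.0; // placeholder (assumption): the original body is empty
    }
    public double findchange() {
        return 0.0; // placeholder (assumption): the original body is empty
    }
    public void learnsys(int t) {
    }
    public Object findrelationship() {
        return null; // placeholder (assumption): the original body is empty
    }
    public void monitor(int t) {
    }
    public void showalarm(String info) {
    }
    public void haltprocess() {
    }
    public void predictvals() {
    }
} // end of class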
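Section 2.5.2 describes findrelationship as finding the regression equation and predictvals as predicting usage values from it. The standalone Java sketch below shows one way those two steps could work, using ordinary least squares for a single independent variable. The class name SimpleRegression, its method names and the sample data are assumptions for illustration; the paper does not specify how the regression is computed.

```java
/** Illustrative sketch: fit y = intercept + slope * x by ordinary least squares. */
final class SimpleRegression {
    final double slope;
    final double intercept;

    SimpleRegression(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need at least two (x, y) pairs of equal length");
        }
        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double sxy = 0.0; // sum of (x - meanX) * (y - meanY)
        double sxx = 0.0; // sum of (x - meanX)^2
        for (int i = 0; i < n; i++) {
            sxy += (x[i] - meanX) * (y[i] - meanY);
            sxx += (x[i] - meanX) * (x[i] - meanX);
        }
        if (sxx == 0.0) {
            throw new IllegalArgumentException("x values must not all be equal");
        }
        slope = sxy / sxx;
        intercept = meanY - slope * meanX;
    }

    /** Predicts the usage value for a given x from the fitted line. */
    double predict(double xValue) {
        return intercept + slope * xValue;
    }

    public static void main(String[] args) {
        // Hypothetical data: x = hour of observation, y = number of authentications.
        double[] hours = {1, 2, 3, 4};
        double[] authentications = {3, 5, 7, 9};
        SimpleRegression fit = new SimpleRegression(hours, authentications);
        System.out.println("slope=" + fit.slope + " intercept=" + fit.intercept);
        System.out.println("predicted at hour 5: " + fit.predict(5));
    }
}
```

A class implementing the usage model interface could hold such an object as a field, build it in findrelationship and call predict from predictvals.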
2.5.5 Methodology
The list below details the activities or processes that will be followed to represent a computer system with an abstract mathematical model and to analyze changes in that system. It is hoped that following these processes will lead to the design and implementation of a normal usage model, an intrusion detection system and a risk analysis system.
2.5.5.1 Machine Learning Algorithms & Behaviour Based Intrusion Systems
Machine learning techniques and algorithms will be investigated to determine the extent to which an expert system that learns a computer system’s usage can be built. Since the expected usage model is a mathematical model, various mathematical modelling techniques will be applied to determine the normal usage model.
Analyzing deviations from these mathematical models can lead to the design and implementation of behaviour based intrusion detection systems. As such, a thorough study of the design and implementation of behaviour based intrusion detection systems will be carried out.
2.5.5.2 Audit Trail Analysis
It is expected that computer security audit reports will be sampled and analyzed to arrive at
a set of dependent and independent variables and their data set. These variables and their
associated data set can be used to formulate the normal usage model.
2.5.5.3 Normal Usage Model
The knowledge gained from the machine learning study, the mathematical modelling study, the behaviour based intrusion detection system study and the audit trail analysis will be applied in an investigation. It is hoped that this will answer the question: how do you represent the normal functioning of a computer system with an abstract mathematical model?
2.5.5.4 Threat Modelling
Differential equations of the normal usage model will be investigated to determine the extent to which deviations from the normal usage models can be analyzed. An abstract mathematical model of these deviations will be formulated. These abstract models are derivatives of the normal usage model.
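When the usage model is only available as sampled values rather than as a closed-form function, one way to approximate its derivative is with forward differences. The Java sketch below does this; it swaps in a simple numerical technique for illustration and is not the paper's threat model. The class name UsageDerivative, the sampling step and the sample values are assumptions.

```java
/** Illustrative sketch: approximate the rate of change of sampled usage values. */
final class UsageDerivative {
    /**
     * Forward differences: (usage[i + 1] - usage[i]) / step for each adjacent pair.
     *
     * @param usage usage values sampled at equal intervals
     * @param step  time between consecutive samples (must be positive)
     */
    static double[] forwardDifferences(double[] usage, double step) {
        if (usage.length < 2) {
            throw new IllegalArgumentException("at least two samples are required");
        }
        if (step <= 0.0) {
            throw new IllegalArgumentException("step must be positive");
        }
        double[] rates = new double[usage.length - 1];
        for (int i = 0; i < rates.length; i++) {
            rates[i] = (usage[i + 1] - usage[i]) / step;
        }
        return rates;
    }

    public static void main(String[] args) {
        // Hypothetical usage values sampled once per hour.
        double[] usage = {10, 12, 11, 20};
        for (double rate : forwardDifferences(usage, 1.0)) {
            System.out.println(rate);
        }
    }
}
```

A large rate of change, such as the jump in the last interval of the sample data, is the kind of deviation a threat model built on derivatives would examine.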
2.5.5.5 Boolean Calculus
A study into representing the normal usage model with a boolean function will be done. It is hoped that analyzing these boolean functions will aid in building hardware that implements the expected usage system. Differential equations of these boolean functions will be studied to analyze changes in the system that indicate deviation from the normal usage model.
2.5.5.6 Experimenting Usage and Threat Models
Programming will be used as a tool to experiment with various usage and threat models. These usage and threat models are expected to be derived from a computer system. This experiment will lead to the design and implementation of a normal usage system, an intrusion detection system and a risk analysis system. These systems are expected, respectively, to represent the behaviour of a computer network, to detect intrusion in a computer network, and to be used for performing risk assessments.
2.5.5.7 Computer Usage Survey
A questionnaire for obtaining information about computer and smartphone usage will be employed. It is expected that this will give an idea of the various statistics that make up a computer’s or smartphone’s usage. These statistics will serve as a guideline for sampling experimental data of a computer system’s usage when experimenting with the usage and threat models.
2.5.5.8 Intrusion Detection Systems
It is hoped that an anomaly based intrusion detection system will be developed to demonstrate the effectiveness of the research model in modelling system usage and threats. The effectiveness of the resulting intrusion detection system at preventing intrusion in a computer network will also be measured. In this project, the intrusion detection system will be developed for a computer network and an e-commerce site.
2.6 Threats Associated With Computer Systems
This chapter discusses some of the threats and attacks associated with computer and
network systems.
2.6.1 Attacks associated with a computer system
The attack types that will be discussed include Malicious Code, IP Scan and Attack, Web
Browsing, Virus, Unprotected Shares, Mass emails, Simple Network Management
Protocol (SNMP), Hoaxes, Backdoors, Password Crack, Brute Force, Dictionary, Denial of
Service (DoS) and Distributed Denial of Service (DDoS) Attack, Spoofing, Man in the Middle,
Spam, Mail Bombing, Sniffers, Social Engineering, Buffer Overflow and Timing Attack.
2.6.2 Malicious Code
Malicious code attacks include the execution of viruses, worms, Trojan horses and active Web
scripts with the intent to destroy or steal information. The state of the art malicious code is
the polymorphic or multivector worm. These attack programs use up to six attack vectors to
exploit a variety of vulnerabilities in commonly known information system devices. Perhaps
the best illustration of such an attack remains the outbreak of Nimda in September 2001,
which used five of the six vectors with startling speed. TruSecure Corporation, an industry
source for information security statistics and solutions, reports that Nimda spread to span
the internet address space of 14 countries in less than 25 minutes.
2.6.3 IP Scan and Attack
The infested system scans a random or local range of IP addresses and targets any of several
vulnerabilities known to hackers or left over from previous exploits such as Code Red, Back
Orifice, or Poizon Box.
2.6.4 Web Browsing
If the infested system has write access to any Web page, it makes all the Web content files
(html, asp, cgi and others) infectious so that users who browse to those pages become
infected.
2.6.5 Virus
Each infested machine infects certain common executable or script files on all computers to
which it can write with virus code that can cause infection.
2.6.6 Unprotected Shares
Using vulnerabilities in file systems and the way many organizations configure them, the
infested machine copies the viral components to all locations it can reach.
2.6.7 Mass emails
By sending email infections to addresses found in the address book, the infected machine
infects many users, whose mail reading programs automatically run the program and
infect other systems.
2.6.8 Simple Network Management Protocol (SNMP)
By using the widely known and common passwords that were employed in the early versions
of the protocol (which is used for remote management of networks and computer devices)
the attacking program can gain control of the device. Most vendors have closed these
vulnerabilities with software upgrades.
2.6.9 Hoaxes
A more devious approach to attacking computer systems is the transmission of a virus hoax
with a real virus attached. When the attack is masked in a seemingly legitimate message,
unsuspecting users readily distribute it. Even though those users are trying to do the right
thing to avoid infection, they end up sending the attack on to their coworkers and friends
and infesting many users along the way.
2.6.10 Backdoors
Using a known or previously unknown and newly discovered access mechanism, an attacker
can gain access into a system or network resource through a back door. Sometimes, these
entries are left behind by system designers or maintenance staff and thus referred to as trap
doors. A trap door is hard to detect because very often the programmer who puts it in
place also makes the access exempt from the usual audit logging features of the system.
2.6.11 Password Crack
Attempting to reverse-calculate a password is often called cracking. A cracking attack is a
component of many dictionary attacks. It is used when a copy of the security account
manager (SAM) data file can be obtained. The SAM file contains the hashed representation
of the user’s password. A candidate password can be hashed using the same algorithm and
compared with the stored hash. If they are the same, the password has been cracked.
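The comparison step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: SHA-256 stands in for whatever algorithm the target system really uses, and the names hash_password and crack, as well as the candidate list, are hypothetical.

```python
import hashlib
from typing import Iterable, Optional


def hash_password(password: str) -> str:
    """Hash a candidate password. SHA-256 is an assumption made for this sketch."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def crack(stored_hash: str, candidates: Iterable[str]) -> Optional[str]:
    """Return the first candidate whose hash equals the stored hash, else None."""
    for candidate in candidates:
        if hash_password(candidate) == stored_hash:
            return candidate
    return None


# The stored value stands in for a hash obtained from a copied password file.
stored = hash_password("letmein")
guesses = ["123456", "password", "letmein"]
recovered = crack(stored, guesses)
```

The attacker never reverses the hash; the password is recovered only because one of the guesses produces the same hashed value.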
2.6.12 Brute Force
The application of computing and network resources to try every possible combination of
options for a password is called a brute force attack. Since this is often an attempt to repeatedly
guess passwords to commonly used accounts, it is sometimes called a password attack. If
attackers can narrow the field of accounts to be attacked, they can devote more time and
resources to attacking fewer accounts. That is one reason a recommended practice is to
change account names for common accounts from the manufacturer’s default. While often
effective against low-security systems, password attacks are often not useful against systems
that have adopted the usual security practices recommended by manufacturers.
2.6.13 Dictionary
This is another form of brute force attack. The dictionary attack narrows the field by
selecting specific accounts to attack and uses a list of commonly used passwords (the
dictionary) instead of random combinations. Organizations can use similar dictionaries to
disallow passwords during the reset process and thus guard against easy-to-guess
passwords. In addition, rules requiring additional numbers and/or special characters make
the dictionary attack less effective.
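A defensive use of such a dictionary during password reset might look like the following sketch. The dictionary contents, the function name is_acceptable and the exact rule (at least one digit or one special character) are assumptions chosen for illustration, not a recommended policy.

```python
import string

# Hypothetical dictionary of commonly used passwords.
COMMON_PASSWORDS = {"123456", "password", "qwerty", "letmein"}


def is_acceptable(password, dictionary=COMMON_PASSWORDS):
    """Reject dictionary entries and passwords lacking a digit or special character."""
    if password.lower() in dictionary:
        return False
    has_digit = any(ch.isdigit() for ch in password)
    has_special = any(ch in string.punctuation for ch in password)
    return has_digit or has_special


checks = {candidate: is_acceptable(candidate)
          for candidate in ("Password", "sunshine", "sunshine#7")}
```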
2.6.14 Denial of Service (DoS) and Distributed Denial of Service (DDoS)
In a denial of service attack, the attacker sends a large number of connections or
information requests to a target. So many requests are made that the target system cannot
handle them along with legitimate requests for service successfully. This may result in the
system crashing or simply becoming unable to perform ordinary functions. A distributed
denial of service is an attack in which a coordinated stream of requests is launched against a
target from many locations at the same time. Most DDoS attacks are preceded by a
preparation phase in which many systems, perhaps thousands, are compromised. The
compromised machines are turned into zombies, machines that are directed remotely
(usually by a transmitted command) by the attacker to participate in the attack. DDoS attacks
are the most difficult to defend against and there are presently no controls that any single
organization can apply. There are, however, some cooperative efforts to enable DDoS
defences among groups of service providers; among them is the Consensus Roadmap for
Defeating Distributed Denial of Service Attacks.
2.6.15 Spoofing
Spoofing is a technique used to gain unauthorized access to computers wherein the intruder
sends messages to a computer that has an IP address that indicates that the messages are
coming from a trusted host. To engage in IP spoofing, a hacker must first use a variety of
techniques to find an IP address of a trusted host and then modify the packet headers so
that it appears that the packets are coming from that host. Newer routers and firewall
arrangements can offer protection against IP spoofing.
2.6.16 Man in the Middle
In the well-known man-in-the-middle or TCP hijacking attack, an attacker monitors (or sniffs)
packets from the network, modifies them and inserts them back into the network. This type
of attack uses IP spoofing to enable an attacker to impersonate another entity on the
network. It allows the attacker to eavesdrop as well as to change, delete, reroute, add, forge,
or divert data. In a variant on the TCP hijacking session, the spoofing involves the
interception of an encryption key exchange, which enables the hacker to act as an invisible
man-in-the-middle (that is, an eavesdropper) with regard to encrypted communications.
2.6.17 Spam
Spam is unsolicited commercial email. While many consider spam a trivial nuisance rather
than an attack, it has been used as a means to make malicious code attacks more effective.
In March 2002, reports emerged of malicious code embedded in MP3 files that were
included as attachments to spam. The most significant consequence of spam on the modern
organization, however, is the waste of both computer and human resources it causes by the
flow of unwanted electronic mail. Many organizations attempt to cope with the flood of
spam by using filtering technologies to stem the flow. Other organizations tell the users of
the mail system to delete unwanted messages.
2.6.18 Mail Bombing
Another form of e-mail attack that is also a DoS is called a mail bomb, in which an attacker
routes large quantities of e-mail to the target. This can be accomplished through social
engineering or by exploiting various technical flaws in the Simple Mail Transfer Protocol.
The target of the attack receives unmanageably large volumes of unsolicited e-mail. By
sending large e-mails with forged header information, attackers can take advantage of
poorly configured e-mail systems on the internet.
2.7 Mathematical Modelling Techniques and Machine Learning Based Models
The mathematical relation that represents the normal usage model can be determined using
regression analysis. Regression analysis is a field of statistics. It employs the least squares
method to determine the relationship between two or more variables in a data set. The least
squares method determines the relationship by minimizing the sum of the squared errors of
the derived relation.
2.7.1 Simple Linear Regression
Simple linear regression problems involve a dependent and a single independent variable.
The goal is to find a linear relationship between the two variables. The linear relationships
are of the form y=b0+b1x where y is the dependent variable and x is the independent
variable. The slope of the line is b1 and the y-intercept is b0. The relationship between the
dependent and independent variable can be derived using the least squares method. First of
all, the sums of the dependent and the independent variables, and the sum product of the
dependent and the independent variables must be calculated. Secondly, the sums of the
squares of the dependent and the independent variables must be calculated.
The constant that represents the slope of the line that fits the predicted function is
calculated as the product of the sample size and the sum product of the dependent and
independent variables, minus the product of the sums of the dependent and the independent
variables, all divided by the product of the sample size and the sum of the squares of the
independent variable minus the square of the sum of the independent variable. That is,
b1 = (nΣxy - ΣxΣy) / (nΣx^2 - (Σx)^2), where n is the sample size.
The constant that represents the y-intercept of the line is calculated as the
product of the sum of the dependent variable and the sum of the squares of the
independent variable, minus the product of the sum of the independent variable and the sum
product of the dependent and independent variables, all divided by the product of the
sample size and the sum of the squares of the independent variable minus the square of the
sum of the independent variable. That is, b0 = (ΣyΣx^2 - ΣxΣxy) / (nΣx^2 - (Σx)^2).
Finally, the correlation coefficient of the predictive relation is calculated as the
product of the sample size and the sum product of the dependent and independent variables,
minus the product of the sums of the dependent and independent variables, all divided by the
square root of the product of two terms: the sample size times the sum of the squares of the
independent variable minus the square of the sum of the independent variable, and the
sample size times the sum of the squares of the dependent variable minus the square of the
sum of the dependent variable. That is,
r = (nΣxy - ΣxΣy) / sqrt[(nΣx^2 - (Σx)^2)(nΣy^2 - (Σy)^2)].
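The three calculations above can be sketched in Python as follows. The function name simple_linear_regression and the sample data are hypothetical; the data lie exactly on the line y = 3 + 2x so that the result is easy to check by hand.

```python
import math


def simple_linear_regression(xs, ys):
    """Return (b0, b1, r) for y = b0 + b1*x using the least squares sums."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))   # sum product of x and y
    sum_x2 = sum(x * x for x in xs)               # sum of squares of x
    sum_y2 = sum(y * y for y in ys)               # sum of squares of y

    denom_x = n * sum_x2 - sum_x ** 2
    denom_y = n * sum_y2 - sum_y ** 2
    b1 = (n * sum_xy - sum_x * sum_y) / denom_x            # slope
    b0 = (sum_y * sum_x2 - sum_x * sum_xy) / denom_x       # y-intercept
    r = (n * sum_xy - sum_x * sum_y) / math.sqrt(denom_x * denom_y)
    return b0, b1, r


xs = [1, 2, 3, 4, 5]
ys = [5, 7, 9, 11, 13]          # sampled from y = 3 + 2x
b0, b1, r = simple_linear_regression(xs, ys)
```

For this data the sums are Σx = 15, Σy = 45, Σxy = 155 and Σx^2 = 55, which gives a slope of 2, an intercept of 3 and a correlation coefficient of 1.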
2.7.2 Multiple Linear Regression
Multiple linear regression problems involve a dependent variable and two or more
independent variables. Using the least squares method, the goal is to find the linear
relationship between the variables involved. The relationships are of the form y=b0 +
b1x1+b2x2+…+bnxn, where n is the number of independent variables, x1, x2,… ,xn are the
various independent variables and y is the dependent variable.
To solve multiple linear problems, we first need to reduce the expected function or multiple
linear models to their simple linear forms. In this form, it is easier to determine the
regression equation. To do this we need to determine the y=b0+b1x for every independent
variable. That way, the regression coefficient set denoted b associated with the independent
variables can be determined using the least squares method. As such, the set b made up of
b1, b2, …, bn is a set containing all the regression coefficients associated with the
predicted regression function.
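As a point of comparison, the sketch below does not reproduce the per-variable reduction described above. It swaps in the standard joint least squares solution through the normal equations, which estimates b0, b1, …, bn in one step. The two approaches agree on the slopes when the independent variables are uncorrelated in the sample and can differ otherwise. All function names and the sample data are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]      # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):                      # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def multiple_linear_regression(rows, ys):
    """Return [b0, b1, ..., bn] minimizing the squared error of y = b0 + sum(bi*xi)."""
    design = [[1.0] + list(row) for row in rows]        # leading 1 carries the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)


rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
ys = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]           # exact plane y = 1 + 2*x1 + 3*x2
coeffs = multiple_linear_regression(rows, ys)
```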
2.7.3 Non Linear Regression
Non linear regression problems involve finding a non linear relationship between a
dependent variable and one or more independent variables. Because non linear graphs are
difficult to analyze, they can be represented mathematically as linear models before they are
analyzed. This makes it possible to use linear regression techniques to analyze such
relationships.
One of the ways used to represent non linear relationships with linear models is taking logs
on both sides of the relationship equation. That reduces the non linear relationship to a
linear relationship. An example is a relation of the form y^2 = x^2/(xy). To reduce this
relationship to a linear relation we take logs on both sides of the relation.
The resulting relationship is 2 log y = 2 log x - log x - log y. When this relationship is
simplified, the resulting relationship is log y = (log x)/3. In this form, the log y term
represents the dependent variable and the log x term represents the independent variable.
Let K = log y and let P = log x. It implies that K = P/3. This becomes the linear form of our
non linear relation.
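The log transformation can be checked numerically. The sketch below, with a hypothetical function name fit_log_linear, samples points that satisfy the example relation (which is equivalent to y^3 = x), takes logs of both variables and fits a straight line to the transformed data. The fitted slope should be close to 1/3 and the intercept close to 0.

```python
import math


def fit_log_linear(xs, ys):
    """Fit log y = a + k*log x by least squares and return (a, k)."""
    ps = [math.log(x) for x in xs]      # P = log x
    ks = [math.log(y) for y in ys]      # K = log y
    n = len(ps)
    sum_p, sum_k = sum(ps), sum(ks)
    sum_pk = sum(p * k for p, k in zip(ps, ks))
    sum_p2 = sum(p * p for p in ps)
    slope = (n * sum_pk - sum_p * sum_k) / (n * sum_p2 - sum_p ** 2)
    intercept = (sum_k - slope * sum_p) / n
    return intercept, slope


xs = [1.0, 2.0, 4.0, 8.0, 16.0]
ys = [x ** (1.0 / 3.0) for x in xs]     # points satisfying y^3 = x
intercept, slope = fit_log_linear(xs, ys)
```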
2.7.4 Machine Learning Based Models Used for Developing Anomaly Based Intrusion Detection Systems
This section discusses how hidden Markov models can be used to detect and prevent threats
on a computer system.
Hidden Markov models are machine learning models that are used to model states in a
system, the sequence in which they occur and the associated probabilities for each state
transition. When a system has a set of states in which it usually falls and it can be predicted
or established that each new state is dependent on the previous states, then hidden Markov
models can be used to learn the state transitions that usually happen in the system. It must
be stated that the sequence in which states occur in a system can be characterized by a
parametric random process. Also, the probability associated with each state transition is
independent of the time at which the transition occurred in the system.
For computer systems which have occurrences that happen based on a parametric
random process, these occurrences can be seen as the set of states in the system. Some of
these occurrences may be the point at which the system is at its optimal usage, and the
point at which a particular threat occurs in the system. When the set of threat types that
happen in the system is determined, it becomes possible to study the sequence in which
these threats occur in the system and the various transitions between the threats using
hidden Markov models. Also, the various usage points, including the optimal, the minimum
and the average usage, and how the system transitions between them can be studied using
hidden Markov models.
Because various occurrences and threats can be studied using hidden Markov
models, it becomes possible to predict the next occurrence or threat that will happen on a
host or a computer network. Threat sources can also be predicted using threat models.
When threat models are integrated, they give a general idea about the source of the threat.
With such knowledge, the occurrence or threat that has the highest likelihood of happening
next on a host or network can be predicted by applying hidden Markov models. As such,
occurrences can be prevented if they are estimated to be disastrous. Also, if, for some
reason, the optimal or minimal usage must be reached, it becomes possible to study ways of
optimizing the transition from the current state or predicted next state to the required
state. This makes it possible to move from a particular usage point to the desired usage
point.
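The idea of learning state transitions and predicting the next occurrence can be illustrated with a deliberately simplified sketch. It uses a plain first-order Markov chain over directly observed states, which is a simplification of a hidden Markov model (where the states are not observed directly). The state names, the observed sequence and the function names are all hypothetical.

```python
from collections import Counter, defaultdict


def estimate_transitions(sequence):
    """Estimate P(next state | current state) from an observed state sequence."""
    counts = defaultdict(Counter)
    for current, nxt in zip(sequence, sequence[1:]):
        counts[current][nxt] += 1
    return {
        state: {nxt: c / sum(followers.values()) for nxt, c in followers.items()}
        for state, followers in counts.items()
    }


def most_likely_next(transitions, current):
    """Return the most probable successor of `current`, or None if it was never seen."""
    followers = transitions.get(current)
    if not followers:
        return None
    return max(followers, key=followers.get)


# Hypothetical log of usage points and threats observed on a host.
observed = ["average", "optimal", "average", "scan", "intrusion",
            "average", "optimal", "average", "scan", "intrusion", "minimum"]
transitions = estimate_transitions(observed)
prediction = most_likely_next(transitions, "scan")
```

In this log every "scan" is followed by "intrusion", so a scan would be treated as a warning that an intrusion is the most likely next occurrence.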
This approach to threat detection and usage optimization makes it possible to build
anomaly based intrusion detection systems that are correct, prompt and increase optimal
use of the system. The anomaly based intrusion detection systems built using these
techniques are correct because the threat models come from usage models that are built
using similar approaches and the threat prediction and prevention mechanisms are designed
using robust techniques developed using these approaches. Also, there are likely to be
fewer false alarms since the threats predicted on hosts or on the network come from threat
models designed from such robust methods.
An example of a kind of cyber security threat that this approach can be used to
model is a network problem where a student is determined or predicted to be sending
threatening or socially unacceptable emails to colleagues. Typically, his identity is hidden on
the network on which he sends the emails. As such, it is difficult to determine the likelihood
that he will send such threatening emails on a particular day or hour so that he could be
identified and brought to book. Using hidden Markov models, a usage model of
the email system could be developed that will make it possible to determine the day or hour
in which he is likely to send such an email. This will help in determining his identity
and bringing him to book.
2.8 The Normal Usage Model of a System
If the normal usage of a network system can be represented by a mathematical function
made up of system variables Xi and system constants Ci, then any
representation of our mobile system can be summarized as Y=f (Xi, Ci), where Y is our
system’s usage and Xi are the various independent variables of our mobile system that
constitute the normal usage model of the system. A normal usage model is an abstract
representation of the usual or normal functioning or behaviour of a system.
In order to model the normal usage of our system and determine its mathematical
representation, it is essential to keep the method simple and the variables simple in
abstraction and minimal in quantity. This makes it easy to analyze, model and detect threats
by applying a branch of calculus called differentiation. Simplicity and minimal number of
variables make it possible to arrive at a mathematical function whose differential coefficient
can be easily computed using differentiation. As such, two cases will be considered.
In the first case, the normal usage model of our system can be analyzed and
modelled based on simple but essential micro usage models. These micro usage models
represent smaller components of our mobile system such as an authentication system of our
mobile system, and a user’s session. Ideally, these models are best derived from exactly one
most appropriate system variable when feasible or at most two in order to reduce the
complexity involved in computing the differential coefficient of the usage model.
For a mathematical function involving more than a single independent variable, our
method for threat detection using the differential equations techniques is within the scope
of multivariable calculus. Since it is easy to compute the differential coefficient of a single
variable function, our threat analysis and detection can be easy if all our micro models are
single variable functions.
In the second case however, our usage model derives its mathematical representation from
at least two or three most relevant system variables of the mobile system under
examination. This option increases the complexity involved in calculating the differential
coefficient of our normal usage model and analyzing the threat associated. This is because
the normal usage model for this case is a function that can be derived from two or more
independent system variables.
To do this type of differentiation, we use a branch of calculus called partial
differentiation, where one of the independent variables of our usage model is held constant
to analyze changes in the usage. This type of differentiation is also within the scope of
multivariable calculus. The sections that follow the one below throw more light on how to
model the normal usage of several micro usage models. These micro usage models are
expected to be components of a computer network’s usage.
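As an illustration of partial differentiation, consider a hypothetical two-variable usage function (the form and the constants c_1, c_2, c_3 are assumptions chosen only for this example). Holding one variable constant while differentiating with respect to the other gives:

```latex
\[
Y = f(X_1, X_2) = c_1 X_1 + c_2 X_1 X_2 + c_3
\]
\[
\frac{\partial Y}{\partial X_1} = c_1 + c_2 X_2, \qquad
\frac{\partial Y}{\partial X_2} = c_2 X_1
\]
```

The first partial derivative describes how usage changes with X_1 while X_2 is held constant, and the second describes the change with X_2 while X_1 is held constant.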
It must be noted that the usage model is made up of the usage model function and a
statistical model that captures the mean and standard deviation of the predicted usage
function. This statistical usage model is called the moments model, or the mean and standard
deviation model. There are other statistical models that could have been used. These include time
series models, univariate models and bivariate models.
2.8.1 Single Variable Calculus Review and its Applications
Assume a mobile system with exactly three major system variables. If sampling each of these
variables helps us to arrive at exactly one micro usage model of our mobile system that best
represents the behavior or functioning of that feature of our system, then we can use
differential equations of the three micro models to analyze and detect threats. Below are
some examples of calculus basics for our threat modelling techniques.
Y=2X+3 is a linear function that represents our first micro usage model, where X is the number
of authentications. Y=3X^2+2X+6 is a quadratic function that represents our second micro usage
model, where X is the number of hosts on the mobile system’s wireless network. Y=40/X+5 is
a reciprocal function that represents our third micro usage model, where X is the number of
applications on a host on the mobile system’s wireless network. For each micro usage
model, the differential coefficient can be computed using the laws of differentiation given
below.
Theorem 1: d/dx(C) = 0, where C is a constant. Theorem 2: d/dx(f[Xi, Ci]) is computed term by
term. For the first term that results from simplifying f (Xi, Ci), multiply the exponent of
the term by the constant beside it, then multiply by the system variable Xi raised to the
power of the original exponent minus one. Repeat this step for every remaining term of
f (Xi, Ci) and add the results. The final result is the sum of the values computed from the
law after going through all the terms.
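The two theorems can be written compactly as follows, using a_k and n_k (symbols introduced here for illustration) for the constant and the exponent of the k-th term. The second line applies the rule to the third micro usage model, Y=40/X+5.

```latex
\[
\frac{d}{dX}\left(C\right) = 0, \qquad
\frac{d}{dX}\left(\sum_{k} a_k X^{n_k}\right) = \sum_{k} a_k\, n_k\, X^{n_k - 1}
\]
\[
Y = \frac{40}{X} + 5 = 40X^{-1} + 5
\;\Longrightarrow\;
\frac{dY}{dX} = 40 \cdot (-1) \cdot X^{-2} + 0 = -\frac{40}{X^{2}}
\]
```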
From the calculus basics review above, the corresponding differential coefficients of the
three micro models are 2, 6X+2, and -40/X^2, respectively. If the standard
deviations of our micro models are computed, then we can analyze changes in our system by
looking at values of our usage model and its derivatives and how they relate to the average
usage, its corresponding standard deviation, and the acceptable thresholds for threats.
Any occurrence at a point where our usage model value is not equal to the average usage
indicates a threat. Any occurrence at a point where the usage model value is less than the
average usage minus its corresponding standard deviation is a denial of service threat. Any
occurrence at a point where the usage model value is greater than the average usage plus its
corresponding standard deviation is an intrusion. Also, any occurrence at a point where the
value of the usage’s derivative is outside the acceptable threshold for threats is a threat.
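The two standard deviation rules can be sketched as below. This is only an illustration: the function name classify_occurrence and the sample values are hypothetical, the population standard deviation is an assumed choice, and the sketch applies only the two band rules, not the stricter rule that any value different from the average indicates a threat.

```python
import statistics


def classify_occurrence(value, mean, std_dev):
    """Label a usage value using the mean and standard deviation bands."""
    if value < mean - std_dev:
        return "denial of service threat"
    if value > mean + std_dev:
        return "intrusion"
    return "within normal band"


# Hypothetical usage values produced by the first micro usage model Y = 2X + 3.
normal_usage = [2 * x + 3 for x in range(1, 11)]
mean = statistics.mean(normal_usage)
std_dev = statistics.pstdev(normal_usage)   # population standard deviation (assumption)

labels = [classify_occurrence(v, mean, std_dev) for v in (2, 14, 40)]
```

Here the average usage is 14 and the standard deviation is roughly 5.74, so a value of 2 falls below the lower band and a value of 40 falls above the upper band.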
2.8.2 Usage Model List
It must be stated that for each component of the system under investigation, we will create
a usage model.
2.8.3 Authentication Usage Model
The authentication usage model represents the usage of an authentication system. The
independent variables that must be sampled to determine the usage of an authentication
system are the average data transmitted during an authentication (x1) and the average
network speed for a single authentication (x2). The average data transmitted is the average
of request and response data for a single authentication and the average network speed is
the average upload and download speed for a single authentication. The dependent variable
that must be sampled is the time taken for an authentication (y).
The goal of modelling the dependent and independent variables is to arrive at a
mathematical relationship between y and the two independent variables x1 and x2. It is
expected that the relationship will be Y=c1(x2/x1) +c2, where c1 and c2 are system
constants. In addition to that, some system constants that will aid threat analysis must be
determined. These are the total number of valid authentications, the expected
authentications within a time frame, the minimum authentications within a time frame and
the maximum authentications within a time frame. The mathematical relationship between
y, x1 and x2 is the normal usage model of the authentication system. After this relationship
has been determined, various occurrences that deviate from this relationship can be used to
analyze threats. For instance, any occurrence that is not equal to the average usage is a
threat. Additionally, any occurrence that indicates a change outside an acceptable threshold
is a threat. The acceptable threshold is a range within which changes in the systems are
deemed normal. Such a range is composed of the average usage and standard deviation.
2.8.4 Session Usage Model
A session usage model represents a single user’s behavior before his session expires. To
determine the mathematical model for a user’s session, two main independent variables
must be sampled. These are size of session data accumulated (x1), and number of user
actions (x2). The dependent variable that must be sampled is time spent before session
expires (y). The session usage model is expected to be made up of two micro usage models.
The mathematical representations of the micro usage models are expected to be Y=c1x1+c2
and Y=c1x2+c2, where c1 and c2 are system constants in each case.
In addition to the two mathematical functions, some system constants that will aid threat
analysis must be determined. These include average user actions, average size of data
accumulated, average time spent. These constants can be determined from the data set
used to determine the usage model.
The two mathematical relationships represent the session usage model. Both are linear
functions. It is expected that as user actions increase the time spent also increases. It is also
expected that as the data accumulated increases, the time spent also increases.
2.8.5 Memory Usage Model
The memory usage model represents the usage of memory space in a system. The
independent variables that must be sampled are number of application programs running
(x1), and the number of system processes running (x2). The dependent variable that must be
sampled is the amount of memory space being used (y). The mathematical relationship between
x1, x2, and y is expected to be y=c1x1+c2x2+c3 where c1 is the average memory space for
programs, c2 is the average memory space for processes and c3 is the average memory
being used when no process or program is running.
In addition to these, some system constants that aid threat analysis must be determined.
These include the minimum and maximum memory space for programs and the minimum
and maximum memory space for processes. The mathematical relationship between x1, x2,
and y is the memory usage model. When determined, the memory usage model can be used
to analyze changes in the memory usage that indicate threats in the system.
2.8.6 CPU Usage Model
The CPU usage model represents CPU usage in a system. The independent variables that
must be sampled are the number of application programs running (x1), and number of
system processes running (x2). The dependent variable that must be sampled is amount of
CPU power being used (y). The mathematical relationship between x1, x2, and y is expected
to be y=c1x1+c2x2+c3 where c1 is the average CPU power being used for programs, c2 is the
average CPU power being used for processes and c3 is average CPU power being used when
no process or program is running. In addition to these, some system constants that aid
threat analysis must be determined. These include the minimum and maximum CPU power
for programs and the minimum and maximum CPU power for processes. The mathematical
relationship between x1, x2 and y is the CPU usage model. When determined, the CPU
usage model can be used to analyze changes in the CPU usage that indicate threats in the
system.
2.8.7 Program Usage Model
To determine the program usage model the dependent and independent variables that must
be sampled are the time spent using the program (y) and the number of functions used (x). In
addition to that, the following constants must also be determined: the minimum and the
maximum number of functions used. The relationship between y and x determined after sampling
various x and y values is the program usage model denoted by y=f(x).
2.8.8 Host Usage Model
The host usage model is composed of four independent variables: memory usage (x1),
session usage (x2), CPU usage (x3), and program usage (x4), derived from their respective
usage models. The dependent variable that must be sampled is the time spent on the host
(y). Any relationship determined between the dependent and the independent variables is
the host usage model. The resulting host usage model is denoted y=f (x1,x2, x3, x4).
2.8.9 Battery Usage Model
The battery usage model is made up of the average usage of CPU, average memory usage
and the average usage of how a session behaves in the system. These are the independent
variables. The dependent variable is the battery lifespan. The independent variables are
derived from their respective micro usage models.
2.8.10 Device Usage Model
The device usage model is made up of a battery usage model, a host usage model, and the
time spent on the device. The usage models that make up the device usage model compute
the average micro usage and try to relate that with the time spent on the device. The time
spent on the device is the dependent variable.
2.8.11 Server Usage Model
The server usage model is made up of the CPU time being used, the memory space being
used and the number of processes running. These variables are used to form two different
micro usage models. As such, there are two dependent variables, CPU time and memory
space. The independent variable for both micro usage models is the number of processes
running.
2.8.12 Port Usage Model
The port usage model is made up of the time elapsed during communication, number of
programs that use the port and the number of paired ports. The number of paired ports is
the dependent variable and the remaining variables are the independent variables.
2.8.13 Network Usage Model
The network usage model is made up of average port usage, average server usage, average
host usage, the average size of data transmitted on the network, and time spent on the
network. The first three variables are the independent variables. The remaining two are the
dependent variables. As such two micro usage models make up the network usage model.
2.8.14 Aggressive Usage Detector
This model is a utility that detects aggressive behavior on a system. It is modelled just like
the various micro usage models. Various factors that determine aggressive behavior during
system usage are used to determine the mathematical representation of this utility.
Aggressive behavior includes aggressive use of major system resources, and aggressive use
of system components with limited resources.
The average aggressive behavior and its standard deviation are determined. Any system
occurrence that indicates the average aggressive behavior, or the average aggressive
behavior plus its standard deviation or the average aggressive behavior minus its standard
deviation is considered a threat and must be halted, raised as an alert or stored for audit
purposes.
2.8.15 False Alarm Detector
The false alarm detector is a utility that detects normal system usage that might otherwise
be deemed a threat. Occurrences that meet the criteria for false alarms are normal usage that
appears to put the entire system into a false state of vibration or anarchy. Such usage
occurrences are therefore prioritized as normal optimal usage. The remedy for the vibrations
they cause is to delay other normal usage occurrences in the system.
The state and magnitude of other system occurrences, together with the state and magnitude
of the normal optimal usage, determine the impact of the perceived anarchy. To make the
system for which this utility is developed more convenient to use, the average delay
time and its standard deviation must be determined. This utility is part of the normal usage.
The utility is modelled just like the aggressive usage detector.
2.8.16 Special Parameters of The Usage Model
This section discusses special parameters of our normal usage model. These parameters
include the average usage, the usage standard deviation, the minimum usage, the maximum
usage and the most frequent usage value recorded.
The average usage is the predicted average usage after the normal usage model function has
been determined. The usage standard deviation is the standard deviation of the predicted
normal usage function. The minimum and maximum usage values are the minimum and
maximum usage predicted using the normal usage model. These parameters together with
usage rates, threat model constants and other usage constants are used in analyzing and
detecting threats.
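The special parameters above can be sketched in a few lines of Python. This is only an illustration: the function name `usage_parameters` and the sample values are hypothetical, and the population standard deviation is an assumption, since the text does not say which form is intended.

```python
from statistics import mean, mode, pstdev


def usage_parameters(predicted_usage):
    """Summarise a list of predicted usage values into the special parameters."""
    return {
        "average": mean(predicted_usage),
        # Assumption: population standard deviation; the paper does not specify.
        "std_dev": pstdev(predicted_usage),
        "minimum": min(predicted_usage),
        "maximum": max(predicted_usage),
        "most_frequent": mode(predicted_usage),
    }


# Hypothetical predicted usage values, for illustration only.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
params = usage_parameters(sample)
```

For this sample the average is 5.0 and the standard deviation is 2.0, so the acceptable usage range would run from 3.0 to 7.0.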
2.8.17 Building The Usage Profile
To build the usage profile, we will first program a usage model for each component of the
computer system under investigation. For this research, we want to build the usage profile
for a computer network. As such, we will program usage models for authentication on the
computer system, a user’s session on the computer system, memory usage, CPU usage, a host
on the network, a server on the network and, finally, the network itself.
The usage model for each component represents the behaviour of that component of the
computer system under investigation. When implemented, the usage model will help us
determine the regression equation that represents the research model, together with the
average usage and its standard deviation. In addition to the regression equation and the mean
and standard deviation model, we will develop a Markov chain model for the system under
investigation. As such, we will determine the states in the entire computer network, the
various state transitions and the associated probabilities of those transitions. The rest of this
chapter explains how to build a usage profile using an authentication system, gives
the details of the critical variables of the other usage models and presents the mathematical
theory needed for building the usage profile.
2.8.18 Building a Usage Profile for an Authentication System
To build a usage model for an authentication system, we must sample critical system
variables of a system. These variables include the download speed on the network, the
upload speed on the network, the size of data sent to the server during authentication, the
size of data sent to the client during authentication and the time it takes for a successful
authentication. The data sent to the server and the data sent to the client are the request
data and the response data respectively.
To build the usage model for the authentication data, we will capture data for all the critical
variables at equal time intervals, say every 10 minutes, while the authentication system is
being used. Once a sample of about 10 observations has been collected, we will try to determine
the relationship between the dependent variable and the independent variables. As already
stated, the relationship can be determined using simple or multiple linear regression. In
addition to the regression equation, we will also determine other statistics that describe the
behaviour of the authentication system, such as the means and standard deviations of the
variables that were sampled.
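As a minimal sketch of the simple linear regression step, the following Python fits y = b0 + b1·x by ordinary least squares to a sample of 10 observations. The variable names and the data are hypothetical. Request size is used as the single independent variable and authentication time as the dependent variable purely for illustration.

```python
from statistics import mean


def fit_simple_linear(xs, ys):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical sample: request size (KB) against authentication time (s).
request_kb = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
auth_seconds = [0.5 + 0.2 * x for x in request_kb]

b0, b1 = fit_simple_linear(request_kb, auth_seconds)
```

Because the illustrative data lie exactly on a line, the fit recovers an intercept close to 0.5 and a slope close to 0.2. Real samples would scatter around the fitted line.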
2.8.19 Building A Markov Chain Model for An Authentication System
Hidden Markov models are machine learning models that are used to model the states in a
system, the sequence in which they occur and the associated probability of each state
transition. When a system has a set of states in which it usually falls, and it can be predicted
or established that each new state is dependent on the previous states, then hidden Markov
models can be used to learn the state transitions that usually happen in the system.
To build the Markov chain model, we will determine the states of the authentication system and
their associated probabilities. One of these states is the average usage of the
authentication system, which may be abstracted as the average time it takes for a successful
authentication. Other states include the minimum and maximum recorded times for a
successful authentication, the average time it takes for a failed authentication, and the
maximum and minimum recorded times for failed authentications. With these states and
their associated probabilities of occurrence during a normal day, we have more information
about the behaviour of the authentication system.
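One way to estimate the state transition probabilities is to count the observed transitions in a logged sequence of states. The sketch below does this for a plain first-order Markov chain, not a full hidden Markov model. The state labels and the observed sequence are hypothetical.

```python
from collections import Counter, defaultdict


def transition_probabilities(states):
    """Estimate P(next state | current state) from an observed state sequence."""
    counts = defaultdict(Counter)
    for current, nxt in zip(states, states[1:]):
        counts[current][nxt] += 1
    return {
        state: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for state, row in counts.items()
    }


# Hypothetical sequence of authentication states observed during a normal day.
observed = ["average", "average", "maximum", "average",
            "minimum", "average", "average"]
probs = transition_probabilities(observed)
```

In this sequence the "average" state is followed by "average" in two of four transitions, so its estimated self-transition probability is 0.5.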
2.8.20 Threat Models in a System
A threat is a change in the normal usage model that is beyond a certain acceptable threshold
called the standard deviation of the usage model. A threat model on the other hand is an
abstract representation of this change in our mobile system that is beyond the acceptable
threshold. Integration can be performed on a threat model to determine the source of the
threat. Integration is a reverse operation for differentiation in calculus. A threat model that
can perform integration operations can be called a novel self integrating data structure. This
chapter of the paper will look at threat models of the micro usage models that make up a
computer network and how to analyze these threats in order to prevent them.
We will also discuss how to determine the sources of these threats using a novel self
integrating threat model. To do this, three main functions are introduced. The
functions are $y = 3$, $y = 4X + 2$ and $y = 9X^2 + 3$. These functions are in the context of the novel self
integrating data structure, and they represent three different threat models. Additionally,
the threat models of the various micro usage models discussed in this paper will be
explored.
2.8.21 Properties and Methods of the Novel Self Integrating Data Structure
The properties of the data structure that represents our threat model include, to mention
just a few, the names of network software or host application software, the
version numbers of network and host software, license information such as the date the
software was purchased or released and the number of years before renewal is needed, and
the IP address and MAC address of a host on a network.
The methods of such a large simulation object may include a method for
computing the integral of a threat model, another for computing the differential coefficient
of the predictive normal usage model, and a method for computing the differential equation of a
network or host threat model. These are mostly the methods needed for
performing the major calculus operations that support the novel calculus simulation on a
network to detect threats and their sources on a wireless network. Besides these, it may be
necessary to implement methods that retrieve hidden network identities such as IP and MAC
addresses on a local area network.
2.8.22 Integration Review
Based on the three functions stated in this chapter, we give an introductory review of
integration, the branch of calculus that is the reverse operation of differentiation. The
integrals of the functions introduced in this chapter are, respectively, $3X + C$,
$2X^2 + 2X + C$ and $3X^3 + 3X + C$, where $C$ represents system constants in the mobile system.
Computing the integral can be tricky, so two laws are defined below to aid quick
computation of the integrals of a normal mathematical function.
Theorem 1:
If a function is a constant such as a rational number, the integral is the
product of the variable x and that constant, plus a system
constant c that is determined from a known pair of x and y values.
Theorem 2:
If a function is not a constant, each x occurring term is integrated by dividing its
coefficient by the sum of its exponent and 1, and multiplying the result by the variable x
raised to the power of the sum of its exponent and 1. Any constant term is handled as in
Theorem 1. The integral is the sum of these results over every term, plus the
corresponding system constant c.
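The two theorems amount to the power rule applied term by term, which can be sketched as follows. The representation of a polynomial as a mapping from exponent to coefficient, and the names `integrate` and `constant_from_point`, are assumptions made for this illustration and are not part of the paper's data structure.

```python
from fractions import Fraction


def integrate(poly):
    """Integrate a polynomial given as {exponent: coefficient}.

    Each term a * x**n becomes a / (n + 1) * x**(n + 1). A constant term has
    exponent 0, so Theorem 1 is the n = 0 case. The constant c is left out.
    """
    return {n + 1: Fraction(a, n + 1) for n, a in poly.items()}


def constant_from_point(integral, x, y):
    """Determine the system constant c from one known (x, y) pair."""
    return y - sum(coeff * x ** n for n, coeff in integral.items())


# The three threat model functions introduced in this chapter.
threat_models = {
    "y = 3": {0: 3},
    "y = 4X + 2": {1: 4, 0: 2},
    "y = 9X^2 + 3": {2: 9, 0: 3},
}
integrals = {name: integrate(poly) for name, poly in threat_models.items()}
```

Here `integrals["y = 4X + 2"]` holds the coefficients of 2X^2 + 2X, and `constant_from_point` recovers c once a single observed pair of x and y values is available.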
2.8.23 Interpretation of Threat Model Integrals
Since the novel self integrating data structure is a programmed threat model, it is important
to discuss the meaning of its integrals. The integrals represent the source of the original
threat. For example, the integral of a threat model may identify the function,
software, host or network from which the threat originated. With properties like
software name, version number, IP and MAC addresses, it becomes easy to pinpoint the
source of the threat.
If the integral of a threat model looks like the normal usage model of a function of
the system under examination, then that function from the system under examination can
be predicted as the source of the threat. Similarly, if the integral is similar to the normal
usage model of a software, host, or network that forms part of the system which is being
investigated, then that threat can be predicted to be from that software, host or network.
2.8.24 Threat Analysis and Detection
To do threat analysis in a system and abort processes that initiated those threats, linear and
non linear programming techniques can be used. The goal here is to minimize the threat
occurrence frequency and the overall impacts associated with the threat and optimize the
normal usage function. In addition to these two goals, there are some constants that aid
threat analysis. These constants are associated with the normal usage model and the threats
in the system.
Examples of these constants may be the rate at which usage is increasing with
respect to a particular usage variable, or the rate at which the threat impact and frequency
increase with respect to a particular variable in the usage model, and other special
parameters associated with the usage model function.
The average usage, its standard deviation and the threat model function make up the
threat model. The average usage and standard deviation are constants in the threat model.
Using the threat model function, the average usage and the standard deviation, threat analysis
can be done using linear and non linear programming. The goal is to minimize threats using
the threat model function as the objective function and the average usage and standard
deviation as constraints. Other parameters that may be used as constraints include the rate
at which usage is increasing with respect to a particular usage variable or the rate at which
the threat impact and frequency are increasing with respect to a particular usage variable.
2.8.25 Threat Prediction
This section discusses how to predict threats in a system. The network usage model
discussed in the previous chapter and its associated threat model will be used to
demonstrate how to predict or detect a threat in a system. As discussed in the previous
section, threats can be detected using linear and non linear programming. The network
usage model function and its associated threat model function are the objective functions.
The constraints that will be used are the average network usage and its standard
deviation, and other parameters such as the rate at which the network threat increases with
respect to other network usage model components such as average host usage, average
server usage, average port usage, average time the network operates, average data
transmitted on the network. The goal of the linear or non linear programming is to optimize
the usage such that usage is within the range of the average usage minus its standard
deviation and the average usage plus its standard deviation. These are the lower and upper
bounds of our objective function. Every combination of system variables whose usage is
within this usage range minimizes threat in the system.
Since the average port, host and server usage are derived from their corresponding
usage models, the linear and non linear programming analysis will be done independently
for each of them. When a threat is predicted in a system, the chance of it being accurate
depends on the usage value at that instant and whether it is within the range of
acceptable usage. This range is constructed using the average usage and its standard deviation.
Any usage value that is less than the average usage minus its standard deviation is a threat.
Also, a usage value that is greater than the average usage plus its standard deviation is a
threat. That means that any predicted threat at a point where the predicted usage is within
the usage range has a high chance of being false. In addition, the actual and
predicted usage values can be used to determine the chance that the predicted threat is
accurate. If the difference between them is high, there is a chance that the predicted usage
may be wrong.
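The acceptable usage range described above reduces to a simple bounds check. The sketch below is illustrative: the function name `is_threat` and the numbers are hypothetical, and values exactly on a bound are treated as acceptable because the text only flags values strictly below or above the range.

```python
def is_threat(usage, average, std_dev):
    """Flag usage outside the range [average - std_dev, average + std_dev]."""
    lower = average - std_dev
    upper = average + std_dev
    return usage < lower or usage > upper


# Hypothetical network usage with an average of 10.0 and a standard deviation of 2.0.
readings = [5.0, 9.0, 12.0, 12.5]
flags = [is_threat(value, 10.0, 2.0) for value in readings]
```

With these numbers the acceptable range runs from 8.0 to 12.0, so only the first and last readings are flagged.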
Since the predicted usage and the threat models are derived from the usage model
function, there is a chance the predicted threat is also false. Finally, the closer the
correlation coefficient of the usage model function is to zero, the higher the chance that the
predicted usage and its associated threat values are wrong. Usage model functions with a
correlation coefficient of 0.6 and above indicate that the predicted usage values and
predicted threat values are accurate. These values are obtained from the usage model
function and the threat model function respectively, which are modelled using the relevant
system variables that make it possible to model system usage and system threats.
2.8.26 Risk Analysis in a System
To do risk analysis in a system, the frequency at which threats in the system occur and the
impact they have on the system must be known. When a frequency table is constructed for
all threats and their associated impacts stored, it becomes easy to analyze risks associated
with a system.
When a threat is predicted, the likelihood of the threat occurring in the system can be
computed using the threat frequencies. The impacts various threats have can also be
determined based on the types of threats and other parameters such as the number of such
threats, the speed at which they occurred and the resources they affected or damaged. Risk
in a system is computed as the product of the likelihood of threat occurrence and the impact
that threat occurrence has on the system. These concepts are the basics for developing a
risk analysis system using the techniques we have discussed so far.
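The risk computation can be illustrated with a small frequency table. In this sketch the likelihood of a threat is taken to be its relative frequency among all logged threats, which is one possible reading of the text. The threat names and impact scores are hypothetical.

```python
from collections import Counter


def likelihoods(threat_log):
    """Relative frequency of each threat type in the log."""
    counts = Counter(threat_log)
    total = sum(counts.values())
    return {threat: n / total for threat, n in counts.items()}


def risk_scores(threat_log, impact):
    """Risk = likelihood of occurrence multiplied by impact, per threat type."""
    return {threat: p * impact[threat]
            for threat, p in likelihoods(threat_log).items()}


# Hypothetical threat log and impact scores on an arbitrary 0 to 10 scale.
threat_log = ["sql_injection", "brute_force", "brute_force", "port_scan"]
impact = {"sql_injection": 8.0, "brute_force": 4.0, "port_scan": 2.0}
risks = risk_scores(threat_log, impact)
```

A rare but damaging threat and a frequent but mild one can end up with the same risk score, as the SQL injection and brute force entries do here.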
2.9 Normal Usage Model and Threat Model Simulation
In this chapter, we discuss the experiment that was conducted to determine the usage of a
computer system. We also discuss how to simulate the threat and usage models with the
hope of developing a threat detection system. Four of the micro usage models that were
discussed in this paper were used for the simulation. These are the ones for authentication,
session, CPU and memory.
Because the usage model for authentications was determined to be a rational
function, logs were taken on both sides of the relation as part of the simulation in order to
reduce the relation to its linear form. The original function is $Y = c_1(x_2/x_1) + c_2$. When
reduced to its linear form we have $\log Y = \log c_1 + \log x_2 - \log x_1 + \log c_2$. This
expansion holds exactly only when the constants enter multiplicatively, that is when
$Y = c_1 c_2 (x_2/x_1)$, because the logarithm of a sum is not the sum of the logarithms. Since
$\log c_1$ and $\log c_2$ are constants, let us denote them by $k_1$ and $k_2$ respectively. Additionally,
let $B = \log Y$, let $j_1 = \log x_1$ and let $j_2 = \log x_2$. Therefore, the linear form of the usage
model for authentication is $B = j_2 - j_1 + k_1 + k_2$. Since $k_1 + k_2$ is a constant, let it be
represented by $k$. As such $B = j_2 - j_1 + k$, where $B$ is the dependent variable and $j_2$ and
$j_1$ are the independent variables. When $B$, $j_2$ and $j_1$ are sampled, $Y = c_1(x_2/x_1) + c_2$
can be determined.
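As a hedged alternative to the log transformation, note that the relation Y = c1(x2/x1) + c2 is already linear in the ratio z = x2/x1. The sketch below estimates c1 and c2 by ordinary least squares on z. This is a swapped-in technique and not the procedure used in the simulation, and the sample values are hypothetical.

```python
from statistics import mean


def fit_line(zs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * z + intercept."""
    z_bar, y_bar = mean(zs), mean(ys)
    sxy = sum((z - z_bar) * (y - y_bar) for z, y in zip(zs, ys))
    sxx = sum((z - z_bar) ** 2 for z in zs)
    slope = sxy / sxx
    return slope, y_bar - slope * z_bar


# Hypothetical samples of the two independent variables.
x1 = [1.0, 2.0, 4.0, 5.0, 8.0]
x2 = [2.0, 6.0, 4.0, 20.0, 8.0]
z = [b / a for a, b in zip(x1, x2)]      # the ratio x2 / x1
y = [3.0 * ratio + 1.5 for ratio in z]   # generated with c1 = 3.0 and c2 = 1.5

c1, c2 = fit_line(z, y)
```

Because the illustrative data were generated from the model itself, the fit recovers c1 and c2 up to floating point error.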
The CPU and the memory usage models are multiple linear forms. The original
relation is of the form $y = c_1 x_1 + c_2 x_2 + c_3$, where $x_1$ and $x_2$ are the independent variables. The
original relation must be reduced to its simple linear form. To do this, determine $y = b_0 + bx$
for each independent variable. The sum of the various $b_0$ equals $c_3$. Each $b$
corresponds to the constant associated with the independent variable for which $y = b_0 + bx$ was
determined. For example, the $b$ for the $y = b_0 + bx$ determined for $x_1$ equals $c_1$ and that for
$x_2$ equals $c_2$. When $x_1$, $x_2$ and $y$ are sampled and the various $y = b_0 + bx$ determined,
$y = c_1 x_1 + c_2 x_2 + c_3$ can be determined completely.
The simulation was run four times within a week. On the first instance, it was run
for 15 minutes. On the second instance, it was run for 30 minutes. On the third instance, it
was run for 45 minutes. On the last instance, it was run for 60 minutes. The functions for the
usage models and their corresponding correlation coefficients were also determined.
2.10 Tools and Computer Packages
This chapter discusses the tools and computer packages that were used throughout this
research project. We will also look at the programming languages, database platforms and
development frameworks that can be used to develop an anomaly based intrusion system
for ecommerce sites using the concepts we have discussed in this paper. The simulation was
implemented in Java as a console-based program. Java was chosen for its object
oriented concepts such as encapsulation, inheritance, interfaces, objects, and
polymorphism.
To implement an intrusion detection system using the results of this research, the
following tools will be essential: Bootstrap, CodeIgniter, the MySQL Database Management
System, SQLite, SQLyog, and Eclipse. These tools are best suited for intrusion detection systems
developed for ecommerce sites. The implementation platforms that will be used are PHP
and Android. PHP is for the desktops and laptops that connect to the ecommerce sites and
Android is for mobile phones that use the ecommerce sites.
Bootstrap and CodeIgniter are web development frameworks. Bootstrap is for
frontend development and CodeIgniter is a backend framework for PHP developers. For
Android development, Eclipse can be used as the IDE. MySQL and SQLyog
are for the database servers that will run on the ecommerce site as part of the intrusion
detection system implementation. SQLite is for the databases that run on the Android
implementations that form part of the intrusion detection system developed for the
ecommerce website.
With all these tools, frameworks and packages, developers are ready to develop
intrusion detection systems for ecommerce sites using the concepts in this research paper. It
is expected that the micro usage models discussed will be integral libraries that will be
implemented in PHP and Android as part of an implementation for ecommerce sites or any
group of web or mobile application systems.
3.0 Secured Expert Medical Consultation System
In this chapter, we describe the objectives for the secured expert medical consultation
system, the research question for developing the system, and the problems that we seek to
address. We will also look at the requirements of the system and then describe how the
system will be developed.
3.1 Problem Definition
The three problems that this research project seeks to address are:
● Developing a medical system that will assist in medical consultation
● Determining diseases that a Patient is likely to get based on medical history.
● Measuring medical information such as height, weight, temperature, blood pressure
and blood sugar.
3.2 Research Questions
● What are the best and most efficient ways of modelling diseases for development of
a medical care system for administering medical care?
● How can an expert medical consultation system be developed?
● How can we determine the diseases that a Patient will get based on medical history?
● How can we develop a hardware system and a software that can be used to measure
medical information such as weight, height, temperature, blood pressure and blood
sugar?
3.3 Objectives of Paper
● The first goal is to model disease for the development of an expert system for
diagnosing diseases during medical consultation.
● The second goal is to optimize how to match a set of symptoms and indications to a
particular disease.
● The third goal is to find diseases that a Patient can get based on medical history.
● The fourth goal is to develop medical equipment that can be used to measure
medical information such as height, weight, temperature, blood pressure and blood
sugar.
● The last goal is to develop an expert medical consultation system for mobile phones,
tablets and personal computers.
3.4 Literature Review
In this section we describe the various literature that forms part of this research. We will
look at evolutionary algorithms, the various types of evolutionary algorithms and how
sensors can help in measuring medical data.
3.4.1 Evolutionary Computing Terminologies
Some of the terms used in evolutionary computing are phenotypes, genotypes,
chromosomes, genes and alleles [3]. Phenotypes are the elements of the search space that
correspond to the possible solutions of a problem [3]. Genotypes are the elements of the result
space that correspond to those possible solutions [3]. The transition from the search space
(phenotypes) to the result space (genotypes) is called encoding [3]. The transition from the result
space to the search space is called decoding [3]. In some cases, the search space may be a set of
integers and the result space may be a set of binary numbers, each representing an integer
in the search space [3].
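The integer and binary example above can be sketched as a pair of encode and decode functions. The fixed bit width and the function names are assumptions made for this illustration.

```python
def encode(value, n_bits):
    """Encode a non-negative integer as a fixed-width binary string."""
    if not 0 <= value < 2 ** n_bits:
        raise ValueError(f"{value} does not fit in {n_bits} bits")
    return format(value, f"0{n_bits}b")


def decode(bits):
    """Decode a binary string back to the integer it represents."""
    return int(bits, 2)


genotype = encode(13, 5)        # "01101"
recovered = decode(genotype)    # 13
```

A fixed width keeps every genotype the same length, which is convenient for operators that work position by position.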
3.4.2 Evolutionary Algorithms
One of the techniques used in evolutionary computing is evolutionary algorithms. The
components of an evolutionary algorithm are representation, evaluation or fitness function,
population, parent selection mechanism, variation operators which are recombination and
mutation, survivor selection mechanism (replacement), initialization and termination
condition [3]. Some of the classes of Evolutionary Algorithms are Genetic Algorithms,
Genetic Programming, Differential Evolution, Evolutionary Strategy, and Evolutionary
Programming [2].
3.4.2.1 Representation
Representation includes changing the real world into the evolutionary computing world [3].
The possible solution set which is the set of phenotypes is encoded into objects in the
evolutionary computing world called genotypes [3]. Many synonyms are used to describe
elements of the two spaces [3]. The genotypes are called chromosomes [3]. Genes are
placeholders and alleles are the values placed in them [3].
3.4.2.2 Evaluation or Fitness Function
The evaluation function forms the basis for selection [3]. It is the requirement to adapt to, and it
defines what improvement means [3]. From the problem-solving perspective, it represents
the task to solve in the evolutionary computing context [3]. Technically, it represents a
procedure that assigns a quality measure to the genotypes [3]. Typically, the procedure is
composed from a quality measure in the phenotype space and the inverse representation
[3]. Often the problem to solve in an evolutionary algorithm is an optimization problem [3].
In such cases the name objective function is used in the problem context and the fitness
function is identical to or a simple transformation of the objective function [3].
3.4.2.3 Population
The population is a multiset of genotypes [3]. The role of the population is to hold possible
solutions [3]. The population is the unit of evolution [3]. Genotypes are static individual
objects that do not change or adapt; it is the population that does [3].
3.4.2.4 Parent Selection Mechanism
The role of parent selection mechanism is to distinguish among individuals based on their
quality [3]. This is to allow better individuals to become parents of the next generation. An
individual is seen as a parent if it has been selected to undergo variation in order to create
offspring [3]. The parent selection mechanism together with the survivor selection
mechanism is essential for quality improvements [3]. In evolutionary computing (EC), parent
selection is usually probabilistic. Thus, high quality individuals get a higher chance of becoming parents
than those with low quality [3]. However, low quality individuals are usually given a small chance,
otherwise the entire search becomes too greedy and gets stuck in a local optimum [3].
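One common probabilistic scheme with these properties is fitness-proportional (roulette wheel) selection, sketched below. It is offered as an illustration and is not necessarily the mechanism the system will use. The population, the fitness values and the seed are hypothetical, and fitness values are assumed to be positive.

```python
import random


def select_parent(population, fitness, rng):
    """Pick one parent with probability proportional to its fitness."""
    weights = [fitness(individual) for individual in population]
    return rng.choices(population, weights=weights, k=1)[0]


# Hypothetical two-individual population with fitness 1.0 and 9.0.
population = ["low", "high"]
fitness_table = {"low": 1.0, "high": 9.0}
rng = random.Random(0)  # seeded so that the run is repeatable

picks = [select_parent(population, fitness_table.get, rng) for _ in range(10000)]
```

Over many draws the high quality individual is chosen far more often, yet the low quality one is still selected occasionally, which is what keeps the search from becoming too greedy.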
3.4.2.5 Variation Operators
The variation operators are mutation and recombination [3]. The review for mutation and
recombination is given below.
3.4.2.6 Mutation
Mutation is an operation that is performed on one genotype and produces a slightly
modified mutant [3]. As such, mutation is a unary operator [3]. A mutation operator is
usually stochastic [3]. As such, its output, which is the child, depends on a series of random
choices [3]. It should be noted that an arbitrary unary operator is not necessarily mutation
[3]. Mutation in general is supposed to cause an unbiased random change [3]. It must be
noted that the variation operators form the evolutionary implementation of basic steps
within the search space [3]. Theorems stating that, given sufficient time, evolutionary
algorithms (EAs) will find a global optimum depend on the property that each genotype
representing a possible solution can be reached by the variation operators [3].
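For a binary genotype, a typical unary stochastic operator is bit-flip mutation, sketched below under the assumption that genotypes are strings of "0" and "1". The function name and the mutation rate are illustrative.

```python
import random


def mutate(genotype, rate, rng):
    """Flip each bit of a binary string independently with probability `rate`."""
    return "".join(
        ("1" if bit == "0" else "0") if rng.random() < rate else bit
        for bit in genotype
    )


rng = random.Random(42)  # seeded so that the run is repeatable
child = mutate("0101010101", 0.1, rng)
```

Because every bit can flip with non-zero probability, any genotype of the same length can in principle be reached from any other, which is the reachability property mentioned above.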
3.4.2.7 Recombination
A binary variation operator is called recombination or crossover [3]. Like mutation,
recombination is a stochastic operator [3]. The choice of which parts of each parent are
combined, and how these parts are combined, depends on random selection [3]. Recombination
operators with higher arity, that is, using more than two parents, are
mathematically possible and easy to implement but have no biological equivalent [3]. That
is perhaps why they are not widely used, although several studies show that they have a
positive effect on evolution [3]. The principle behind recombination is very simple:
by mating parents or individuals with different features, we can produce offspring with both
features [3]. Biologically, recombination is the superior form of reproduction [3].
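A minimal sketch of a binary recombination operator is one-point crossover on equal-length strings. The parents, the seed and the function name are hypothetical.

```python
import random


def one_point_crossover(parent_a, parent_b, rng):
    """Swap the tails of two equal-length genotypes at a random cut point."""
    if len(parent_a) != len(parent_b) or len(parent_a) < 2:
        raise ValueError("parents must have the same length of at least 2")
    point = rng.randint(1, len(parent_a) - 1)  # the cut is never at either end
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b


rng = random.Random(3)  # seeded so that the run is repeatable
child_a, child_b = one_point_crossover("0000", "1111", rng)
```

Each child starts with material from one parent and ends with material from the other, so features carried by different parents can meet in one offspring.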
3.4.2.8 Survivor Selection Mechanism
Survivor selection mechanism is often called replacement or replacement strategy [3].
However, the term survivor selection is used here to keep the terminology consistent [3]. The role
of the survivor selection mechanism is to distinguish among individuals based on their quality. It is
similar to parent selection, but it is used at a different stage of the evolution cycle [3].
3.4.2.9 Initialization
Initializations are kept simple in most EA applications [3]. The first population is seeded by
randomly generated individuals [3].
3.4.2.10 Termination Condition
There are two types of termination conditions [3]. The first applies when the evolutionary
computing problem has a known optimal fitness level [3]. This may come from a known
optimum of the given objective function or fitness function [3]. In such cases, the search
can be stopped when that level is reached [3]. However, EAs are
stochastic in nature and the optimum may not be reached, so the fitness condition may
never be satisfied and the algorithm may never stop [3]. The condition must therefore be
extended with one that certainly stops the algorithm [3]. Such extensions include
the following:
● A maximum allowed CPU time, so that the algorithm is stopped when this time elapses [3].
● A limit on the total number of fitness evaluations, so that the algorithm is stopped when this limit is reached [3].
● A limit on the number of generations for which the evolution is run [3].
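The combined termination condition can be sketched as a single check that an evolutionary loop calls once per generation. The function name, the parameter names and the order of the checks are assumptions made for this illustration. CPU time is measured here with `time.process_time()`.

```python
import time


def stop_reason(best_fitness, target_fitness, generation, max_generations,
                evaluations, max_evaluations, started_at, max_cpu_seconds):
    """Return the reason to stop, or None if the search should continue."""
    if best_fitness >= target_fitness:
        return "optimum reached"
    if time.process_time() - started_at >= max_cpu_seconds:
        return "time limit"
    if evaluations >= max_evaluations:
        return "evaluation limit"
    if generation >= max_generations:
        return "generation limit"
    return None


started_at = time.process_time()
reason = stop_reason(best_fitness=0.5, target_fitness=1.0,
                     generation=10, max_generations=10,
                     evaluations=200, max_evaluations=1000,
                     started_at=started_at, max_cpu_seconds=60.0)
```

The first check is the fitness-based condition, and the remaining three are the extensions that guarantee the loop ends even if the optimum is never reached.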
3.4.3 Genetic Algorithms
A Genetic Algorithm is a search heuristic that is inspired by Charles Darwin’s theory of
natural evolution [24]. This algorithm reflects the process of natural selection where the
fittest individuals are selected for reproduction in order to produce offspring for the next
generation [24]. There are five phases in a genetic algorithm [24]. These are
Initial Population, Fitness Function, Selection, Crossover, and Mutation [24].
3.4.4 Evolutionary Strategies
Evolutionary Strategies (ES) is one type of black-box optimization algorithm that belongs to
the family of evolutionary algorithms [30]. The optimization targets of Evolutionary
Strategies are vectors of real numbers [30]. It must be noted that Evolutionary Strategies
are stochastic optimization algorithms and are designed specifically for continuous function
optimization [26].
3.4.5 Genetic Programming
Genetic Programming is a domain-independent method for genetically breeding a
population of computer programs to solve a problem [28]. That is, Genetic Programming
iteratively transforms a population of computer programs into a new generation of
programs by applying analogs of naturally occurring genetic operations [28]. It must be
stated that Genetic Programming is a form of Artificial Intelligence that mimics natural
selection to find optimal results [20].
3.4.6 Evolutionary Programming
Evolutionary Programming, originally conceived by Lawrence J. Fogel in 1960, is a stochastic
optimization technique similar to Genetic Algorithms [43]. One main difference between
Evolutionary Programming and Genetic Algorithms is that Evolutionary Programming places emphasis on the behavioural
linkage between parent and offspring rather than seeking to emulate specific genetic
operators as observed in nature [43]. It is also similar to Evolutionary Strategies, although
the two were developed independently [43].
3.4.7 Differential Evolution
Differential Evolution is a heuristic approach for global optimization of nonlinear and
non-differentiable continuous space functions [38]. Differential Evolution is similar to
popular direct search approaches such as genetic algorithms and evolutionary strategies
[38]. It must be stated that this algorithm has an advantage over the other mentioned
approaches because it can handle nonlinear and non-differentiable multi-dimensional
objective functions, while requiring very few control parameters to steer the minimisation
[38].
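To make the few control parameters concrete, the following is a minimal sketch of one common Differential Evolution variant (often written DE/rand/1/bin). The population size, scale factor `f`, crossover rate `cr`, generation count, test function and seed are all illustrative assumptions, not values taken from [38].

```python
import random


def differential_evolution(objective, bounds, rng, pop_size=20,
                           f=0.5, cr=0.9, generations=200):
    """Minimise `objective` over the box `bounds` with DE/rand/1/bin."""
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(individual) for individual in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Three distinct individuals, all different from the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # at least one mutated component
            trial = []
            for d in range(dim):
                if rng.random() < cr or d == forced:
                    value = pop[a][d] + f * (pop[b][d] - pop[c][d])
                    lo, hi = bounds[d]
                    value = min(max(value, lo), hi)  # clip to the bounds
                else:
                    value = pop[i][d]
                trial.append(value)
            trial_score = objective(trial)
            if trial_score <= scores[i]:  # greedy one-to-one replacement
                pop[i], scores[i] = trial, trial_score
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


def sphere(x):
    """Simple test function with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


best_point, best_score = differential_evolution(
    sphere, [(-5.0, 5.0), (-5.0, 5.0)], random.Random(1))
```

The objective is only ever evaluated, never differentiated, which is why the method also applies to non-differentiable functions.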
3.4.8 A Survey on Wearable Sensor-Based Systems for Health Monitoring and Prognosis
A research paper entitled “A Survey on Wearable Sensor-Based Systems for Health
Monitoring and Prognosis” describes wearable and biomedical health systems for health
monitoring and prognosis [5]. The paper explains that these wearable and biomedical
health systems have gained a lot of attention in the scientific community [5]. The paper also
explains that this is “mainly motivated by increasing healthcare costs and propelled by
recent technological advances in miniature biosensing devices, smart textiles,
microelectronics, and wireless communications, the continuous advance of wearable
sensor-based systems will potentially transform the future of healthcare by enabling
proactive personal health management and ubiquitous monitoring of a patient's health
condition” [5].
The paper reviews current research and development on wearable biosensor systems for
medical monitoring. It compares a variety of system implementations in order to identify
the technological shortcomings of the current state of the art in wearable biosensor
solutions and systems [5]. The paper also
explains that “an emphasis is given to multiparameter physiological sensing system designs,
providing reliable vital signs measurements and incorporating real-time decision support for
early detection of symptoms or context awareness” [5].
3.4.9 Sensors in Medicine
According to another research paper entitled “Sensors in Medicine”, sensors are devices
that detect physical, chemical and biological signals and provide a way for those signals to
be measured and recorded [8]. The paper also explains that “physical properties that can be
sensed include temperature, pressure, vibration, sound level, light intensity, load or weight,
flow rate of gases and liquids, amplitude of magnetic and electronic fields, and
concentrations of many substances in gaseous, liquid, or solid form. Although sensors of
today are where computers were in 1970, medical applications of sensors are taking off
because of advances in microchip technologies and molecular chemistry.” [8]
3.4.10 Expert System Methodologies and Application
This section describes expert system methodologies and applications. A research paper on
the subject classifies expert system methodologies and applications through a literature
review and classification of articles published from 1995 to 2004 [37]. Based on this
survey, the paper describes eleven categories of expert system methodology [37]. The
eleven categories are Rule-based Systems, Knowledge-based Systems, Neural Networks,
Fuzzy Expert Systems, Object Oriented Methodologies, Case-based Reasoning, System
Architecture, Intelligent Agent Systems, Database Methodologies, Modelling and
Ontology [37]. The paper also describes the applications, research and problem domains for
the various categories.
3.4.10.1 Rule-based Systems
A Rule-based Expert System is a system which contains information obtained from a Human
Expert and represents that information in the form of rules, such as IF-THEN rules [37]. The
rules can then be applied to data in order to infer an appropriate conclusion [37].
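
The way IF-THEN rules lead from data to a conclusion can be sketched with a small
forward-chaining loop. The rule contents and names below are hypothetical and serve only to
illustrate the mechanism; they are not taken from [37] and are not clinical guidance.

```python
def forward_chain(rules, facts):
    """Apply IF-THEN rules repeatedly until no new fact can be inferred."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            # IF every condition is a known fact THEN add the conclusion.
            if conclusion not in known and set(conditions) <= known:
                known.add(conclusion)
                changed = True
    return known


# Hypothetical rules, each written as (IF conditions, THEN conclusion).
RULES = [
    (("fever", "chills"), "suspect_malaria"),
    (("suspect_malaria",), "recommend_blood_test"),
]

inferred = forward_chain(RULES, {"fever", "chills"})
```

Here the second rule fires only because the first rule has already added its conclusion to
the set of known facts, which is how a chain of rules reaches a final conclusion.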
According to the research paper, applications of Rule-based expert systems include
State Transition Analysis, Psychiatric Treatment, Production Planning, Advisory System,
Teaching, Electronic Power Planning, Automobile Process Planning, Hypergraph
Representation, System Development, Knowledge Verification/Validation, Alcohol
Production, DNA Histogram Interpretation, Knowledge Based Maintenance, Scheduling
Strategy, Management Fraud Assessment, Knowledge Acquisition, Communication System
An Intrusion Detection System, a Risk Analysis System , a Secured Expert Medical Consultation System and a Cyber Security Framework

  • 1. An Intrusion Detection System, a Risk Analysis System, a Secured Expert Medical Consultation System Developed Using Artificial Intelligence Technologies and a Cyber Security Framework Abstract This paper is an investigation into building an Intrusion Detection System, a Risk Analysis System and a Secured Expert Medical Consultation System developed using Artificial Intelligence Techniques. First of all, we describe how to build a Usage Profile of a Computer Network. This Usage Profile of the Computer Network will be the basic building block for the development of the Intrusion Intrusion Detection System and the Risk Analysis System. The proposed Usage Profile is made up of a Linear Regression model, a Mean and Standard Deviation model, and a Hidden Markov model. These models can be built by sampling experimental data of critical variables of a Computer Network. Secondly, the Secured Expert Medical Consultation System will be developed using Evolutionary Computing techniques. The proposed system will make it possible for Medical Doctors to perform medical consultations with ease. This is because the system will be able to suggest Doctor’s diagnosis, medical test and drug prescription associated with medical information captured during medical consultation such as symptoms and reactions of patients. The system would be able to do this by mapping a set of symptoms and conditions with a database of diseases. Additionally, this project will answer the question: how do we secure such an expert medical consultation system? As such, we will look at security of the system itself and security of other components of the system such as the database of the system and also security of the server on which the system and its database would be hosted. In this project, we will secure the system by performing Penetration Testing on the system as the system is being developed. Some of the techniques we would explore include SQL Injections, Dictionary attacks etc. 
Also, we would look at how to secure the database by configuring the appropriate Access Controls on the Database Management System (DBMS) and configuring the DBMS to guard against intrusion using authentication and authorization. We will also look at Password types that will be good for such a system. Also, this paper will outline processes, practices and guidelines that must be followed for performing cybersecurity audit. These processes, practices and guidelines will be outlined in a cybersecurity framework. Finally, it is suggested that deviations from the usage profile of the computer network can be flagged as anomalous activities. This can help us develop an intrusion detection system and a risk analysis system that can be used for detecting and preventing intrusions and performing risk analysis respectively.
  • 2. 1 Abstract 0 1.0 Introduction 5 2.0 Intrusion Detection System, Risk Analysis System and Cybersecurity Framework.6 2.1 Problem Definition 6 2.2 Research Questions 6 2.3 Objectives 6 2.4 Literature Review 7 2.4.1 Intrusion Detection Systems 7 2.4.2 Anomaly Detection Systems 8 2.4.3 Behaviour Encryption 8 2.4.4 Risk Analysis 8 2.4.5 Information Security Awareness and Practices 9 2.4.6 Protocol For Mitigating Risks on Social Networking Sites 10 2.4.7 Behaviour Models and Anomaly Intrusion Detection 10 2.5 Research Model and Methodology 10 2.5.1 Research Model 10 2.5.2 Usage Model: A Java Interface That Implements the Research Model 11 2.5.3 Usage Model File: model.java 11 2.5.4 Implementing the Usage Model for an Authentication System 12 2.5.5 Methodology 12 2.5.5.1 Machine Learning Algorithms & Behaviour Based Intrusion Systems 13 2.5.5.2 Audit Trail Analysis 13 2.5.5.3 Normal Usage Model 13 2.5.5.4 Threat Modelling 13 2.5.5.5 Boolean Calculus 13 2.5.5.6 Experimenting Usage and Threat Models 13 2.5.5.7Computer Usage Survey 14 2.5.5.8 Intrusion Detection Systems 14 2.6 Threats Associated With Computer Systems 14 2.6.1 Attacks associated with a computer system 14 2.6.2 Malicious Code 14 2.6.3 IP Scan and Attack 14 2.6.4 Web Browsing 15 2.6.5 Virus 15 2.6.6 Unprotected Shares 15 2.6.7 Mass emails 15 2.6.8 Simple Network Management Protocol (SNMP) 15 2.6.9 Hoaxes 15 2.6.10 Backdoors 15 2.6.11 Password Crack 15 2.6.12 Brute Force 16 2.6.13 Dictionary 16
  • 3. 2 2.6.14 Denial of Service (DoS) and Distributed Denial of Service (DDoS) 16 2.6.15 Spoofing 16 2.6.16 Man in the Middle 17 2.6.17 Spam 17 2.6.18 Mail Bombing 17 2.7 Mathematical Modelling Techniques and Machine Learning Based Models 17 2.7.1 Simple Linear Regression 17 2.7.2 Multiple Linear Regression 18 2.7.3 Non Linear Regression 18 2.7.4 Machine Learning Based Models Used for Developing Anomaly Based Intrusion Detection Systems. 19 2.8 The Normal Usage Model of a System 20 2.8.1 Single Variable Calculus Review and its Applications 21 2.8.2 Usage Model List 22 2.8.3 Authentication Usage Model 22 2.8.4 Session Usage Model 23 2.8.5 Memory Usage Model 23 2.8.6 CPU Usage Model 23 2.8.7 Program Usage Model 24 2.8.8 Host Usage Model 24 2.8.9 Battery Usage Model 24 2.8.10 Device Usage Model 24 2.8.11 Server Usage Model 24 2.8.12 Port Usage Model 24 2.8.13 Network Usage Model 25 2.8.14 Aggressive Usage Detector 25 2.8.15 False Alarm Detector 25 2.8.16 Special Parameters of The Usage Model 25 2.8.17 Building The Usage Profile 26 2.8.18 Building a Usage Profile for an Authentication System 26 2.8.19 Building A Markov Chain Model for An Authentication System 26 2.8.20 Threat Models in a System 27 2.8.21 Properties and Methods of the Novel Self Integrating Data Structure 27 2.8.22 Integration Review 28 2.8.23 Interpretation of Threat Model Integrals 28 2.8.24 Threat Analysis and Detection 28 2.8.25 Threat Prediction 29 2.8.26 Risk Analysis in a System 30 2.9 Normal Usage Model and Threat Model Simulation 30 2.10 Tools and Computer Packages 31 3.0 Secured Expert Medical Consultation System 32 3.1 Problem Definition 32 3.2 Research Questions 32 3.3 Objectives of Paper 32
  • 4. 3 3.4 Literature Review 34 3.4.1 Evolutionary Computing Terminologies 34 3.4.2 Evolutionary Algorithms 34 3.4.2.1 Representation 34 3.4.2.2 Evaluation or Fitness Function 34 3.4.2.3 Population 35 3.4.2.4 Parent Selection Mechanism 35 3.4.2.5 Variation Operators 35 3.4.2.6 Mutation 35 3.4.2.7 Recombination 35 3.4.2.8 Survivor Selection Mechanism 36 3.4.2.9 Initialization 36 3.4.2.10 Termination Condition 36 3.4.3 Genetic Algorithms 36 3.4.4 Evolutionary Strategies 36 3.4.5 Genetic Programming 37 3.4.6 Evolutionary Programming 37 3.4.7 Differential Evolution 37 3.4.8 A Survey on Wearable Sensor-Based Systems for Health Monitoring and Prognosis 37 3.4.9 Sensors in Medicine 38 3.4.10 Expert System Methodologies and Application 38 3.4.10.1 Rule-based Systems 38 3.4.10.2 Knowledge-based Systems 39 3.4.10.3 Neural Networks 39 3.4.10.4 Fuzzy Expert System 39 3.4.10.5 Object Oriented Methodologies 39 3.4.10.6 Case-based Reasoning 40 3.4.10.7 Modelling 40 3.4.10.8 System Architecture 40 3.4.10.9 Intelligent Agents 40 3.4.10.10 Ontology 41 3.4.10.11 Database Methodology 41 3.5 Research Model and Methodology 43 3.5.1 Research Model 43 3.5.2 Research Methodology 43 3.5.2.1 Assumption Enumeration 43 3.5.2.2 Hypothesis Formulation 44 3.5.2.3 Experimentation 44 3.5.2.4 Hypothesis Testing 44 3.5.2.5 Demonstration 45 3.5.2.6 Agile Development 45 3.6 Requirement Specification 46 3.6.1 Functional Requirement for the Android App 46
  • 5. 4 3.6.1.1 Consultation 46 3.6.1.2 Patient Basic details and medical information 46 3.6.1.3 User Settings and Authentication 46 3.6.2 Functional Requirements of the Web Application 47 3.6.2.1 Patient Details and Medical Information 47 3.6.2.2 Consultation 47 3.6.2.3 Drug Prescription and Medical Test 47 3.6.2.4 User Settings and Authentications 47 3.6.3 Non-Functional Requirements 48 3.7 Scope of Diseases 48 3.7.1 Symptoms and Reactions of Diseases that will be modelled 48 3.7.2 Symptoms of Malaria 49 3.7.3 Symptoms of Cholera 49 3.7.4 Symptoms of Diarrhoea 49 3.7.5 Symptoms of Bi-polar Disorder 49 3.7.6 Symptoms of Schizophrenia 49 3.7.7 Symptoms of Diabetes 49 3.7.8 Skin Diseases 50 3.7.9 Symptoms of Hypertension 50 3.7.10 Symptoms of Asthma 50 3.7.11 Medical Tests Associated with Diseases 51 3.8 The Evolutionary Computing World 51 3.8.1 Representation 51 3.8.2 Population 51 3.8.3 Initialization 51 3.8.4 Fitness Function 51 3.8.5 Parent Selection Mechanism 52 3.8.6 Survivor Selection Mechanism 52 3.8.7 Mutation 52 3.8.8 Recombination 52 3.8.9 Termination Condition 52 3.9 Design and Implementation 54 3.9.1 Designing the Mobile App 54 Fig. 2 55 3.9.2 Implementing the Mobile App 55 3.9.3 Designing the Web App 57 3.9.4 Implementing the Web App 57 4.0 Conclusion and Discussion 60 5.0 References 62
  • 6. 5 1.0 Introduction If a usage profile of a system can be built, it will become possible to detect unusual behaviour on the system. The method for building such usage profiles involves determining factors of the system that are critical to the system. These factors can be seen as critical system variables that affect the system’s usage. The other thing to consider is determining the way in which you can obtain an abstract representation of the usage profile. The abstract representation of the usage profile can be achieved by the application of behaviour models such as statistical models, machine learning models and cognitive based models. Secondly, Cyber security threats on computer networks have the potential of causing damage to resources on the computer network. Examples of these damages include corrupting data stored or transmitted on the network, infesting a host on the network with virus, impersonating a valid user on the network and preventing proper functioning of applications softwares on various hosts on the network. The security of computer systems is very essential to various organizations. Computer systems security is usually provided by computer software that protects the computer system for which they were developed. Such a computer software system is an intrusion detection system. Other computer systems that provide security are antivirus and firewall and risk analysis systems. Also, periodic computer security audits will enable threat detection and prevention on computer networks. Additionally, It must be stated that medical health care can be made a bit successful, timely and will yield the expected results when an expert medical consultation system is used in administering medical care and performing medical consultation. This type of system can be developed through the application of artificial intelligence technologies such as machine learning, artificial neural networks and evolutionary computing. 
It must be stated that applying the concepts of evolutionary computing can make it possible to develop an expert medical consultation system. This is possible when we develop a database of diseases with their symptoms, a database of medicines that are used to cure the diseases, and a database of reactions and indications with their associated medical tests that will aid in administering medical consultation. This paper is an investigation into building an Intrusion Detection System, a Risk Analysis System and a Secured Expert Medical Consultation System using Artificial Intelligence Techniques. The intrusion detection system and the risk analysis system will be developed using behaviour models such as statistical models, machine learning models and cognitive models. The expert medical consultation system will be developed using evolutionary computing techniques. Finally, the paper will outline a Cybersecurity Framework that describes the processes, practices and guidelines that must be adhered to when performing a security audit or risk analysis.
  • 7. 6 2.0 Intrusion Detection System, Risk Analysis System and Cybersecurity Framework. This chapter of the research paper is dedicated to the Intrusion Detection System, Risk Analysis System and Cybersecurity Framework. We will discuss the objectives of the project, the problem we seek to solve and the research questions for this part of the research paper. We will also describe how to build the usage profile that is the basic building block for the intrusion detection system and the risk analysis system. 2.1 Problem Definition If the normal usage or behaviour of a computer system can be represented by an abstract model, then this abstract model can be used to detect threats on the system. Threats can be detected as deviations from the abstract model, which represents the behaviour of the system. The main problems this paper seeks to investigate are listed below. ● Representing the normal usage or behaviour of a system with an abstract model. ● Determining activities and occurrences on the system that are deviations from the system’s normal behaviour or usage. ● Representing these activities or deviations with an abstract model. ● Preventing such activities or occurrences from occurring on the system. In this paper the system’s normal behaviour is known as the usage profile, and the deviations from the system’s normal behaviour are known as the threat profile of the system. 2.2 Research Questions The main questions to be investigated are listed below. ● What are the best and most efficient techniques for modelling a system’s normal behaviour or usage? ● What are the best and most efficient techniques for the design and implementation of an intrusion detection system? ● How can we build a risk analysis system for performing risk assessment of a computer network? 2.3 Objectives The main objectives of this research are as follows. ● Representing a computer network’s normal functioning with an abstract model. ● Building a usage profile of a computer network.
  • 8. 7 ● Detecting activities and occurrences that deviate from the normal usage of a computer network and flagging them as anomalous activities on the network. ● Design and implementation of an Anomaly Intrusion Detection System. ● Design and implementation of a Risk Analysis System. ● Drafting of a document that details the procedures, processes, practices and guidelines that must be followed when performing security audits. 2.4 Literature Review This section reviews the major topics that constitute this research paper and the work done in some of these areas. The topics considered include intrusion detection systems, since any discussion or study of detecting threats and their sources is centred on intrusion detection systems. Behaviour encryption is another computer security field that will be discussed in detail, since it adds much value to the information hiding parts of this research. Risk analysis will also be reviewed to sum up what constitutes risk analysis. Finally, there will be a review of Normal Usage Models. 2.4.1 Intrusion Detection Systems Basically, there are two types of intrusion detection systems in the industry, based on the approach used for threat detection and the technologies used to build the system [25]. These are knowledge based (also known as signature based) and behaviour based intrusion detection systems [25]. Each takes a different approach to threat detection and each uses a different technology for building the intrusion detection system. Each also has its pros and cons. Knowledge based intrusion detection systems are built on a database of already known threats [25]. These known vulnerabilities or threats are called threat signatures [25]. Usually, detection is done by directly mapping system incidents that indicate threats to threat signatures [25]. 
As a result, the database of threats must be constantly updated with newly identified threats [25]. Because new threats must first be identified before they can be included in the database, the correctness of detection is sometimes compromised: threats that do not have corresponding signatures cannot be mapped and detected [25]. However, these types of intrusion detection systems have lower false alarm rates, since each detected threat is registered in the database of threat signatures [25]. Behaviour based intrusion detection systems take a different approach to threat detection. They are built using artificial intelligence technologies [25]. Usually, the behaviour of the system for which the intrusion detection is built is modelled, and deviations from that behaviour are used to detect threats [25]. Because of this, they have better correctness at detecting threats [25]. No threat signatures or mappings of incidents that indicate threats are required [25]. However, they have higher false alarm rates because detected threats are not mapped to a database of known threats [25].
  • 9. 8 Besides these, intrusion detection systems are classified by the purpose for which they are built and by how actively or passively they deal with threats [25]. By purpose, there are host based and network based intrusion detection systems [25]. Active intrusion detection systems are configured to block or prevent attacks, while passive intrusion detection systems are configured to monitor, detect and raise alerts on threats [25]. 2.4.2 Anomaly Detection Systems According to a research paper entitled “Design and Implementation of Anomaly Detection System”, there are global variables of a network that can be used for detecting anomalous activities on the network [19]. The paper used a hybrid of signature based and anomaly based intrusion detection to detect anomalies [19]. According to the paper, the techniques used for detecting intrusion include using generic network rules to detect network anomalies. The paper also used dynamic network knowledge such as network statistics to detect anomalous activities [19]. 2.4.3 Behaviour Encryption Behaviour algorithms are applied to safeguard information on computing devices such as mobile phones and laptops [27]. These algorithms are the basis for building systems that study and encrypt user behaviour on a computing device in order to ensure the security of information on the device [27]. A study into mobile platform security reports that behaviour encryption application systems have been designed and built, focusing on mobile platforms [27]. Results from this study indicated that encryption application systems are effective in ensuring mobile platform security [27]. In addition, since mobile devices can be secured through behaviour encryption systems, the behaviour of hosts on a network or of network systems can also be encrypted to ensure safe communication, since each host or user on a system or network has a particular behaviour pattern. 
Cryptographic study into encrypting the normal usage model can fall under behaviour encryption, since the usage model represents a system’s behaviour and can be composed of a user’s behaviour. This can aid in securing the information that embodies the usage model. It is also necessary because, if the usage model can easily be predicted, then it is possible to manipulate the usage model and launch an attack. 2.4.4 Risk Analysis Computer risk analysis is also called risk assessment [49]. It involves the process of analyzing and interpreting risk [49]. There are two main types of risk assessment: qualitative and quantitative [44]. Quantitative risk assessments use mathematical models and simulations to assign numerical values to risk [46]. Qualitative risk assessments rely on an expert’s subjective judgement to build a theoretical model of risk for any given situation [46]. To analyze risk, the scope and methodology have to be determined first [49]. Information is then collected and analyzed before the risk analysis results are interpreted [49]. Determining the scope can be described as identifying the system to be analyzed for
  • 10. 9 risk and parts of the system that will be considered [49]. Also, the analytical method that will be used, with its level of detail and formality, must be planned [49]. The boundary, scope and methodology used during risk assessment determine the total amount of work effort needed in risk management, and the type and usefulness of the assessment’s results [49]. Risk has many components, including assets, threats, likelihood of threat occurrence, vulnerability, safeguards and consequence [49]. Two formulas for risk are of paramount significance to this research paper. The first one is given as: Risk = Threat + Consequence + Vulnerability [47]. It must be emphasized that “Risk in this formula can be broken down to consider likelihood of threat occurrence, the effectiveness of your current security program and the consequence of an unwanted criminal or terrorist event occurring” [47]. The second formula, which I came to know while working as an Information Security Consultant, is given as: Risk = Likelihood of Threat Occurrence ✕ Impact of Threat Occurrence. This formula is somewhat more suitable for performing risk assessment because it is simpler. It can be used for qualitative or even quantitative risk assessments. Additionally, risk management includes risk acceptance, which takes place after several risk analyses [48]. Normally, after risk has been analyzed and safeguards implemented, the remaining or residual risk in the system that makes the system functional must be accepted by management [48]. This may be due to constraints on the system such as ease of use, or features of the system for which strict safeguards would cause the organization operational problems. As such, risk acceptance, like the selection of safeguards, should take into account various factors besides those addressed in the risk assessment [49]. In addition, risk acceptance should take into account the limitations of the risk assessment [49]. 
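The second risk formula in section 2.4.4 can be illustrated with a small worked sketch. The Java snippet below is only an illustration and not part of the system described in this paper: the class name RiskScoreSketch, the ordinal 1 to 5 scales and the rating bands are assumptions introduced for the example.

```java
// Hypothetical sketch of: Risk = Likelihood of Threat Occurrence x Impact of Threat Occurrence.
// The 1..5 scales and the rating cut-offs are illustrative assumptions, not values from the paper.
final class RiskScoreSketch {

    // Multiplies likelihood by impact, both on an assumed ordinal scale of 1 (lowest) to 5 (highest).
    static int score(int likelihood, int impact) {
        if (likelihood < 1 || likelihood > 5 || impact < 1 || impact > 5) {
            throw new IllegalArgumentException("likelihood and impact must be between 1 and 5");
        }
        return likelihood * impact;
    }

    // Maps a score (1..25) to a qualitative band; the cut-offs are assumed for illustration.
    static String rating(int score) {
        if (score >= 15) {
            return "HIGH";
        }
        if (score >= 6) {
            return "MEDIUM";
        }
        return "LOW";
    }

    public static void main(String[] args) {
        int s = score(4, 5); // a likely threat with a severe impact
        System.out.println("score=" + s + " rating=" + rating(s));
    }
}
```

With ordinal scales as above, the formula supports a qualitative assessment. Replacing the scales with an estimated probability and a monetary loss would give a quantitative one.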
2.4.5 Information Security Awareness and Practices A paper entitled “A study of information security awareness and practices in Saudi Arabia” discusses information security awareness and practices. The paper emphasizes that information is under constant threat from cyber vandals [1]. According to the paper, Saudi Arabia is rated poorly in terms of information security, which it attributes to the country’s highly suppressed, patriarchal and tribal culture [1]. The paper examined the level of information security awareness among the general public in the country using an anonymous online survey based on instruments produced by the Malaysian Security Organization [1]. In all, 633 persons responded to the survey, and the analysis confirmed that information security awareness is low in the country; the paper relates this mostly to the same cultural factors [1].
  • 11. 10 2.4.6 Protocol For Mitigating Risks on Social Networking Sites According to an academic paper entitled “Protocol for mitigating the risk of hijacking social networking sites”, hackers can hijack a user’s session on social networking sites and impersonate the victim [7]. The paper deals with this risk by presenting a security authentication protocol [7]. The protocol takes into account that users connect to social networking sites using several platforms and connection speeds [7]. To cater for mobile devices and tablets using Wi-Fi connections, a novel Self-Configuring Repeatable Hash Chains (SCRHC) protocol was developed to prevent the hijacking of session cookies [7]. This protocol supports three levels of caching, making it possible to trade storage space for enhanced performance and reduced workload [7]. 2.4.7 Behaviour Models and Anomaly Intrusion Detection Behaviour models are used to detect intrusion in computer systems. This section reviews the behaviour models that can be used to build behaviour based intrusion detection systems. These models fall into the following categories: statistical models, machine learning based techniques, cognitive models, computer immunology and user intention. Statistical models include the operational or threshold metric model, the Markov process or marker model, the multivariate model, the statistical moments model, time series models and univariate models. Machine learning based models include Bayesian networks, genetic algorithms, neural networks, fuzzy logic and outlier detection. Cognitive models include finite state machines, description scripts and expert systems. 2.5 Research Model and Methodology This section describes the research model and methodology for developing the security audit framework. We will describe the research model and the steps that make up the methodology. 
2.5.1 Research Model Assume that the normal usage (Y) of a computer network can be represented by a mathematical function $Y = f(X_i, C_i)$, where $X_i$ represents system variables such as the number of functions or the number of authentications, and $C_i$ represents system constants such as the maximum or minimum number of authentications. When a change in Y is beyond the standard deviation determined from the data set of our usage, that change indicates a threat. To investigate this threat, machine learning algorithms, mathematical functions and behaviour based intrusion detection systems will be studied to determine Y in terms of a number of variables that represent Y appropriately. The expected usage model of the network to be investigated includes the following components: Host Usage Model, Server Usage Model, Device Usage Model, Port Usage Model, Network Usage Model, Session Usage Model, Authentication Usage Model,
  • 12. 11 Memory Usage Model, CPU Usage Model, Battery Usage Model and Program Usage Model. These components are expected to be derived from the variables listed below. ● Average number of application software programs that run on the network system while the system is in use. ● Average number of system processes that run on the network system while the system is in use. ● Average number of authentications in the network system. ● Average number of user actions that happen on the network system. ● Average time a user spends before his session expires. ● Average time the network system functions each day. ● Number of paired ports communicating on the network. ● Average amount of memory space used on devices while the network is being operated. ● Average CPU time spent on a single device on the network. ● Average life span of a single device battery on the network. 2.5.2 Usage Model: A Java Interface That Implements the Research Model For each component of a computer system under investigation, we will program a usage model, which is an implementation of the research model for that component. Each usage model implements an interface defined in a Java file called model.java. There are eight functions in the model.java interface. The first, computeval, computes the usage value at an instant. The second, findchange, finds changes in the usage of the computer system. The third, learnsys, learns the usage of the system. The fourth, findrelationship, finds the regression equation. The fifth, monitor, monitors the usage of the system. The sixth, showalarm, displays error messages and detected intrusions. The seventh, haltprocess, halts a detected intrusion. The eighth, predictvals, predicts usage values based on the regression equation determined. 
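The deviation test stated in the research model of section 2.5.1 (a change in Y beyond the standard deviation of the sampled usage data indicates a threat) can be sketched as follows. This is a minimal sketch under stated assumptions, not the paper's implementation: the class name DeviationSketch, the method names, the sample counts and the choice of the population standard deviation are all hypothetical.

```java
// Minimal sketch of the deviation test; all names and sample data are hypothetical.
final class DeviationSketch {

    // Arithmetic mean of the sampled usage values.
    static double mean(double[] samples) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("at least one sample is required");
        }
        double sum = 0.0;
        for (double s : samples) {
            sum += s;
        }
        return sum / samples.length;
    }

    // Population standard deviation of the sampled usage values (an assumed choice).
    static double standardDeviation(double[] samples) {
        double m = mean(samples);
        double squaredDiffs = 0.0;
        for (double s : samples) {
            squaredDiffs += (s - m) * (s - m);
        }
        return Math.sqrt(squaredDiffs / samples.length);
    }

    // Flags an observation whose distance from the mean exceeds one standard deviation.
    static boolean isDeviation(double[] baseline, double observed) {
        return Math.abs(observed - mean(baseline)) > standardDeviation(baseline);
    }

    public static void main(String[] args) {
        // Hypothetical hourly authentication counts sampled during normal usage.
        double[] baseline = {10, 12, 11, 9, 13};
        System.out.println(isDeviation(baseline, 12)); // within one standard deviation of the mean
        System.out.println(isDeviation(baseline, 20)); // far outside one standard deviation
    }
}
```

A check of this kind is the sort of logic a findchange or monitor function of a usage model might perform. A wider band, such as two or three standard deviations, would reduce false alarms at the cost of missing smaller deviations.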
Omitting an implementation of any function of the usage model in a concrete class causes a compile-time error. To implement the usage model, use the Java keyword implements. Below is the model.java file. 2.5.3 Usage Model File: model.java
public interface model {
    public double computeval();          // compute the usage value at an instant
    public double findchange();          // find changes in the usage of the system
    public void learnsys(int t);         // learn the usage of the system
    public Object findrelationship();    // find the regression equation
    public void monitor(int t);          // monitor the usage of the system
    public void showalarm(String info);  // display error messages and detected intrusions
    public void haltprocess();           // halt a detected intrusion
    public void predictvals();           // predict usage values from the regression equation
}
  • 13. 12 2.5.4 Implementing the Usage Model for an Authentication System
class auth_usage implements model {
    /* variable declarations for dependent and independent variables */
    public double computeval() { return 0.0; /* placeholder until implemented */ }
    public double findchange() { return 0.0; /* placeholder until implemented */ }
    public void learnsys(int t) { }
    public Object findrelationship() { return null; /* placeholder until implemented */ }
    public void monitor(int t) { }
    public void showalarm(String info) { }
    public void haltprocess() { }
    public void predictvals() { }
} /* end of class */
2.5.5 Methodology The list below details the activities or processes that will be followed to represent a computer system with an abstract mathematical model and analyze changes in that system. It is hoped
  • 14. 13 that following these processes will lead to the design and implementation of a normal usage model, an intrusion detection system and a risk analysis system. 2.5.5.1 Machine Learning Algorithms & Behaviour Based Intrusion Systems Machine learning techniques and algorithms will be investigated to determine the extent to which an expert system that learns a computer system’s usage can be built. Since the expected usage model is a mathematical model, various mathematical modelling techniques will be applied to determine the normal usage model. Analyzing deviations from these mathematical models can lead to the design and implementation of behaviour based intrusion detection systems. As such, a thorough study of the design and implementation of behaviour based intrusion detection systems will be done. 2.5.5.2 Audit Trail Analysis It is expected that computer security audit reports will be sampled and analyzed to arrive at a set of dependent and independent variables and their data set. These variables and their associated data set can be used to formulate the normal usage model. 2.5.5.3 Normal Usage Model The knowledge gained from the machine learning study, the mathematical modelling study, the behaviour based intrusion detection system study and the audit trail analysis will be applied. It is hoped that this will answer the question: how do you represent the normal functioning of a computer system with an abstract mathematical model? 2.5.5.4 Threat Modelling Differential equations of the normal usage model will be investigated to determine the extent to which deviations from the normal usage model can be analyzed. An abstract mathematical model of these deviations will be formulated. These abstract models are derivatives of the normal usage model. 2.5.5.5 Boolean Calculus A study into representing the normal usage model with a boolean function will be done. 
It is hoped that analyzing these boolean functions will aid in building hardware that realizes the expected usage system. Differential equations of these boolean functions will be studied to analyze changes in the system that indicate deviation from the normal usage model. 2.5.5.6 Experimenting Usage and Threat Models Programming will be used as a tool to experiment with various usage and threat models. These usage and threat models are expected to be derived from a computer system. This experiment will lead to the design and implementation of a normal usage system, an intrusion detection system and a risk analysis system. These systems are expected to represent the
  • 15. 14 behaviour of a computer network, detect intrusion in a computer network, and perform risk assessments, respectively. 2.5.5.7 Computer Usage Survey A questionnaire for obtaining information about computer and smart phone usage will be employed. It is expected that this will give an idea of the various statistics that make up a computer or smart phone’s usage. These statistics will be a guideline for sampling experimental data of a computer system’s usage when experimenting with the usage and threat models. 2.5.5.8 Intrusion Detection Systems It is hoped that an anomaly based intrusion detection system will be developed to demonstrate the effectiveness of the research model for modelling system usage and threats. The effectiveness of the developed intrusion detection system at preventing intrusion in a computer network will also be measured. In this project, the intrusion detection system to be developed is for a computer network and an ecommerce site. 2.6 Threats Associated With Computer Systems This section discusses some of the threats and attacks associated with computer and network systems. 2.6.1 Attacks associated with a computer system The attack types that will be discussed include Malicious Code, IP Scan and Attack, Web Browsing, Virus, Unprotected Shares, Mass emails, Simple Network Management Protocol (SNMP), Hoaxes, Backdoors, Password Crack, Brute Force, Dictionary, Denial of Service (DoS) and Distributed Denial of Service (DDoS), Spoofing, Man in the Middle, Spam, Mail Bombing, Sniffers, Social Engineering, Buffer Overflow and Timing Attack. 2.6.2 Malicious Code Malicious code attacks include the execution of viruses, worms, Trojan horses and active Web scripts with the intent to destroy or steal information. The state of the art in malicious code is the polymorphic or multivector worm. These attack programs use up to six attack vectors to exploit a variety of vulnerabilities in commonly found information system devices. 
Perhaps the best illustration of such an attack remains the outbreak of Nimda in September 2001, which used five of the six vectors and spread with startling speed. TruSecure Corporation, an industry source for information security statistics and solutions, reports that Nimda spread to span the internet address space of 14 countries in less than 25 minutes. 2.6.3 IP Scan and Attack The infected system scans a random or local range of IP addresses and targets any of several vulnerabilities known to hackers or left over from previous exploits such as Code Red, Back Orifice and PoizonBox.
  • 16. 15 2.6.4 Web Browsing If the infected system has write access to any Web pages, it makes all Web content files (.html, .asp, .cgi and others) infectious, so that users who browse to those pages become infected. 2.6.5 Virus Each infected machine infects certain common executable or script files on all computers to which it can write with virus code that can cause infection. 2.6.6 Unprotected Shares Using vulnerabilities in file systems and the way many organizations configure them, the infected machine copies the viral component to all locations it can reach. 2.6.7 Mass emails By sending email infections to addresses found in the address book, the infected machine infects many users, whose mail reading programs also automatically run the program and infect other systems. 2.6.8 Simple Network Management Protocol (SNMP) By using the widely known and common passwords that were employed in early versions of this protocol (which is used for remote management of network and computer devices), the attacking program can gain control of the device. Most vendors have closed these vulnerabilities with software upgrades. 2.6.9 Hoaxes A more devious approach to attacking computer systems is the transmission of a virus hoax with a real virus attached. When the attack is masked in a seemingly legitimate message, unsuspecting users readily distribute it. Even though these users are trying to do the right thing to avoid infection, they end up sending the attack on to their coworkers and friends and infecting many users along the way. 2.6.10 Backdoors Using a known or previously unknown and newly discovered access mechanism, an attacker can gain access to a system or network resource through a back door. Sometimes these entries are left behind by system designers or maintenance staff, and are thus referred to as trap doors. 
A trap door is hard to detect because, very often, the programmer who puts it in place also makes the access exempt from the usual audit logging features of the system. 2.6.11 Password Crack Attempting to reverse-calculate a password is often called cracking. A cracking attack is a component of many dictionary attacks. It is used when a copy of the Security Account Manager (SAM) data file can be obtained. The SAM file contains the hashed representation
  • 17. 16 of the user’s password. A candidate password can be hashed using the same algorithm and compared to the hashed result. If they are the same, the password has been cracked. 2.6.12 Brute Force The application of computing and network resources to try every possible combination of options of a password is called a brute force attack. Since this is often an attempt to repeatedly guess passwords to commonly used accounts, it is sometimes called a password attack. If attackers can narrow the field of accounts to be attacked, they can devote more time and resources to attacking fewer accounts. That is one reason a recommended practice is to change account names for common accounts from the manufacturer’s default. While often effective against low-security systems, password attacks are often not useful against systems that have adopted the usual security practices recommended by manufacturers. 2.6.13 Dictionary This is another form of brute force attack. The dictionary attack narrows the field by selecting specific accounts to attack and uses a list of commonly used passwords (the dictionary) instead of random combinations. Organizations can use similar dictionaries to disallow passwords during the reset process and thus guard against easy-to-guess passwords. In addition, rules requiring additional numbers and/or special characters make the dictionary attack less effective. 2.6.14 Denial of Service (DoS) and Distributed Denial of Service (DDoS) In a denial of service attack, the attacker sends a large number of connection or information requests to a target. So many requests are made that the target system cannot handle them successfully along with legitimate requests for service. This may result in the system crashing or simply becoming unable to perform ordinary functions. A distributed denial of service is an attack in which a coordinated stream of requests is launched against a target from many locations at the same time. 
Most DDoS attacks are preceded by a preparation phase in which many systems, perhaps thousands, are compromised. The compromised machines are turned into zombies, machines that are directed remotely (usually by a transmitted command) by the attacker to participate in the attack. DDoS attacks are the most difficult to defend against, and there are presently no controls that any single organization can apply. There are, however, some cooperative efforts to enable DDoS defences among groups of service providers; among them is the Consensus Roadmap for Defeating Distributed Denial of Service Attacks. 2.6.15 Spoofing Spoofing is a technique used to gain unauthorized access to computers, wherein the intruder sends messages to a computer with an IP address indicating that the messages are coming from a trusted host. To engage in IP spoofing, a hacker must first use a variety of techniques to find an IP address of a trusted host and then modify the packet headers so that it appears that the packets are coming from that host. Newer routers and firewall arrangements can offer protection against IP spoofing.
  • 18. 17 2.6.16 Man in the Middle In the well-known man-in-the-middle or TCP hijacking attack, an attacker monitors (or sniffs) packets from the network, modifies them and inserts them back into the network. This type of attack uses IP spoofing to enable an attacker to impersonate another entity on the network. It allows the attacker to eavesdrop as well as to change, delete, reroute, add, forge or divert data. In a variant of TCP hijacking, the spoofing involves the interception of an encryption key exchange, which enables the hacker to act as an invisible man-in-the-middle, that is, an eavesdropper, with regard to encrypted communications. 2.6.17 Spam Spam is unsolicited commercial email. While many consider spam a trivial nuisance rather than an attack, it has been used as a means to make malicious code attacks more effective. In March 2002, reports emerged of malicious code embedded in MP3 files that were included as attachments to spam. The most significant consequence of spam for the modern organization, however, is the waste of both computer and human resources caused by the flow of unwanted electronic mail. Many organizations attempt to cope with the flood of spam by using filtering technologies to stem the flow. Other organizations tell the users of the mail system to delete unwanted messages. 2.6.18 Mail Bombing Another form of e-mail attack that is also a DoS is called a mail bomb, in which an attacker routes large quantities of e-mail to the target. This can be accomplished through social engineering or by exploiting various technical flaws in the Simple Mail Transfer Protocol. The target of the attack receives unmanageably large volumes of unsolicited e-mail. By sending large e-mails with forged header information, attackers can take advantage of poorly configured e-mail systems on the internet. 
2.7 Mathematical Modelling Techniques and Machine Learning Based Models The mathematical relation that represents the normal usage model can be determined using regression analysis. Regression analysis is a field of statistics. It employs the least squares method to determine the relationship between the variables of a data set composed of two or more variables. The least squares method determines the relationship by minimizing the sum of the squared errors of the derived relation. 2.7.1 Simple Linear Regression Simple linear regression problems involve a dependent variable and a single independent variable. The goal is to find a linear relationship between the two variables. The linear relationships are of the form $y = b_0 + b_1 x$, where y is the dependent variable and x is the independent variable. The slope of the line is $b_1$ and the y-intercept is $b_0$. The relationship between the dependent and independent variable can be derived using the least squares method. First of all, the sum of the dependent and the independent variables, and the sum product of the
  • 19. 18 dependent and the independent variables must be calculated. Secondly, the sums of the squares of the dependent and the independent variables must be calculated. With sample size $n$, the slope of the fitted line is the sample size times the sum product of the two variables, minus the product of their sums, divided by the sample size times the sum of the squares of the independent variable minus the square of the sum of the independent variable: $b_1 = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}$. The y-intercept is the sum of the dependent variable times the sum of the squares of the independent variable, minus the sum of the independent variable times the sum product, divided by the same denominator: $b_0 = \frac{\sum y \sum x^2 - \sum x \sum xy}{n\sum x^2 - (\sum x)^2}$. Finally, the correlation coefficient of the predictive relation has the same numerator as the slope, divided by the square root of the product of the slope's denominator and the corresponding expression for the dependent variable: $r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left(n\sum x^2 - (\sum x)^2\right)\left(n\sum y^2 - (\sum y)^2\right)}}$. 2.7.2 Multiple Linear Regression Multiple linear regression problems involve a dependent variable and two or more independent variables. Using the least squares method, the goal is to find the linear relationship between the variables involved. 
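The least squares expressions for the slope, y-intercept and correlation coefficient described in section 2.7.1 can be sketched in Java as follows. This is a minimal sketch under stated assumptions: the class name SimpleRegressionSketch, the fit method and the sample data are hypothetical and are not taken from the paper's implementation.

```java
// Minimal sketch of simple linear regression by least squares; all names are hypothetical.
final class SimpleRegressionSketch {

    // Returns {b0, b1, r}: the y-intercept, the slope and the correlation coefficient.
    static double[] fit(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("x and y need the same length, at least 2");
        }
        int n = x.length;
        double sumX = 0, sumY = 0, sumXY = 0, sumXX = 0, sumYY = 0;
        for (int i = 0; i < n; i++) {
            sumX += x[i];
            sumY += y[i];
            sumXY += x[i] * y[i];   // sum product of the two variables
            sumXX += x[i] * x[i];   // sum of squares of the independent variable
            sumYY += y[i] * y[i];   // sum of squares of the dependent variable
        }
        double denomX = n * sumXX - sumX * sumX;
        double denomY = n * sumYY - sumY * sumY;
        if (denomX == 0) {
            throw new IllegalArgumentException("all x values are identical");
        }
        double b1 = (n * sumXY - sumX * sumY) / denomX;
        double b0 = (sumY * sumXX - sumX * sumXY) / denomX;
        // r is NaN when every y value is identical (denomY is zero).
        double r = (n * sumXY - sumX * sumY) / Math.sqrt(denomX * denomY);
        return new double[] {b0, b1, r};
    }

    public static void main(String[] args) {
        // Hypothetical data lying exactly on the line y = 1 + 2x.
        double[] x = {1, 2, 3, 4};
        double[] y = {3, 5, 7, 9};
        double[] result = fit(x, y);
        System.out.println("b0=" + result[0] + " b1=" + result[1] + " r=" + result[2]);
    }
}
```

A function of this kind could back the findrelationship function of a usage model, with predictvals evaluating b0 + b1 * x for new values of the independent variable.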
The relationships are of the form y = b0 + b1x1 + b2x2 + … + bnxn, where n is the number of independent variables, x1, x2, …, xn are the independent variables and y is the dependent variable. To solve multiple linear regression problems, we first reduce the multiple linear model to its simple linear forms, in which the regression equation is easier to determine. To do this, we determine y = b0 + b1x for every independent variable. That way, the set of regression coefficients b associated with the independent variables can be determined using the least squares method. The set b, made up of b1, b2, …, bn, contains all the regression coefficients of the predicted regression function.
2.7.3 Non Linear Regression
Non linear regression problems involve finding a non linear relationship between a dependent variable and one or more independent variables. Because non linear graphs are difficult to analyze, they can be represented mathematically as linear models before they are analyzed. This makes it possible to use linear regression techniques to analyze such relationships. One way to represent a non linear relationship with a linear model is to take logs on both sides of the relationship equation. An example is the relation y^2 = x^2/(xy). Taking logs on both sides gives 2logy = 2logx - logx - logy, which simplifies to 3logy = logx, that is, logy = (logx)/3. In this form, logy represents the dependent variable and logx represents the independent variable. Let K = logy and P = logx. It follows that K = P/3, which is the linear form of our non linear relation.
2.7.4 Machine Learning Based Models Used for Developing Anomaly Based Intrusion Detection Systems
This section discusses how hidden Markov models can be used to detect and prevent threats on a computer system. Hidden Markov models are machine learning models that are used to model the states in a system, the sequence in which they occur and the probability associated with each state transition. When a system has a set of states in which it usually falls, and it can be established that each new state depends on the previous states, hidden Markov models can be used to learn the state transitions that usually happen in the system. The sequence in which states occur in such a system can be characterized by a parametric random process. Also, the probability associated with each state transition is independent of the time at which the transition occurred. For computer systems whose occurrences follow a parametric random process, these occurrences can be seen as the set of states in the system.
Some of these occurrences may be the point at which the system is at its optimal usage and the point at which a particular threat occurs in the system. Once the set of threat types that occur in the system is determined, hidden Markov models make it possible to study the sequence in which these threats occur and the transitions between them. The various usage points, including the optimal, minimum and average usage, and the transitions between them can be studied in the same way. Because occurrences and threats can be studied using hidden Markov models, it becomes possible to predict the next occurrence or threat that will happen on a host or a computer network. Threat sources can also be predicted using threat models. When threat models are integrated, they give a general idea about the source of the threat. With this knowledge, the next threat or occurrence that is most likely to happen on a host or network can be predicted by applying hidden Markov models. As such, occurrences can be prevented if they are estimated to be disastrous. Also, if the optimal or minimal usage must be reached for some reason, it becomes possible to study ways of optimizing the transition from the current state or predicted next state to the required state. This makes it possible to move from a particular usage point to the desired usage point.
This approach to threat detection and usage optimization makes it possible to build anomaly based intrusion detection systems that are correct and prompt, and that increase optimal use of the system. The anomaly based intrusion detection systems built using these techniques are correct because the threat models come from usage models that are built using similar approaches, and because the threat prediction and prevention mechanisms are designed using robust techniques developed from these approaches. There are also likely to be fewer false alarms, since the threats predicted on hosts or on the network come from threat models designed with such robust methods.
An example of a cyber security threat that this approach can model is a network problem where a student is determined or predicted to be sending threatening or socially unacceptable emails to colleagues. Typically, his identity is hidden on the network on which he sends the emails. It is therefore difficult to determine the likelihood that he will send such emails on a particular day or hour, so that he can be identified and held accountable. Using hidden Markov models, a usage model of the email system could be developed that makes it possible to determine the day or hour in which he is likely to send such an email. This will help in determining his identity and holding him accountable.
2.8 The Normal Usage Model of a System
If the normal usage of a network system can be represented by a mathematical function made up of system variables Xi and system constants Ci, then any representation of our mobile system can be summarized as Y = f(Xi, Ci), where Y is our system's usage and Xi are the independent variables of our mobile system that constitute the normal usage model of the system.
A normal usage model is an abstract representation of the usual or normal functioning or behaviour of a system. In order to model the normal usage of our system and determine its mathematical representation, it is essential to keep the method simple and the variables simple in abstraction and minimal in quantity. This makes it easy to analyze, model and detect threats by applying a branch of calculus called differentiation. Simplicity and minimal number of variables make it possible to arrive at a mathematical function whose differential coefficient can be easily computed using differentiation. As such, two cases will be considered. In the first case, the normal usage model of our system can be analyzed and modelled based on simple but essential micro usage models. These micro usage models represent smaller components of our mobile system such as an authentication system of our mobile system, and a user’s session. Ideally, these models are best derived from exactly one most appropriate system variable when feasible or at most two in order to reduce the complexity involved in computing the differential coefficient of the usage model.
For a mathematical function involving more than a single independent variable, our method for threat detection using differential equation techniques falls within the scope of multivariable calculus. Since it is easy to compute the differential coefficient of a single variable function, our threat analysis and detection is easiest if all our micro models are single variable functions.
In the second case, however, our usage model derives its mathematical representation from two or three of the most relevant system variables of the mobile system under examination. This option increases the complexity involved in calculating the differential coefficient of our normal usage model and analyzing the associated threat, because the normal usage model in this case is a function of two or more independent system variables. To do this type of differentiation, we use a branch of calculus called partial differentiation, where the other independent variables of our usage model are held constant while changes in the usage with respect to one variable are analyzed. This type of differentiation is also within the scope of multivariable calculus.
The sections that follow the calculus review below explain how to model the normal usage of several micro usage models. These micro usage models are expected to be components of a computer network's usage. It must be noted that the usage model is made up of the usage model function and a statistical model that captures the mean and standard deviation of the predicted usage function. This statistical usage model is called the moments model, or the mean and standard deviation model. Other statistical models could have been used, including time series models, univariate models and bivariate models.
2.8.1 Single Variable Calculus Review and its Applications
Assume a mobile system with exactly three major system variables.
If sampling each of these variables gives exactly one micro usage model that best represents the behavior of the corresponding feature of our mobile system, then we can use the differential equations of the three micro models to analyze and detect threats. Below are some examples of calculus basics for our threat modelling techniques. Y = 2X + 3 is a linear function that represents our first micro usage model, where X is the number of authentications. Y = 3X^2 + 2X + 6 is a quadratic function that represents our second micro usage model, where X is the number of hosts on the mobile system's wireless network. Y = 40/X + 5 is a reciprocal function that represents our third micro usage model, where X is the number of applications on a host on the mobile system's wireless network. For each micro usage model, the differential coefficient can be computed using the laws of differentiation given below.
Theorem 1: d/dx(C) = 0, where C is a constant.
Theorem 2: d/dx(f[Xi, Ci]) is computed term by term. For the first term of the simplified f(Xi, Ci), multiply the exponent by the constant beside it and by the system variable Xi raised to the original exponent minus one, so that a term aX^k becomes kaX^(k-1). Repeat this step for every term of f(Xi, Ci) and add the results.
From the calculus review above, the differential coefficients of the three micro models are 2, 6X + 2 and -40/X^2 respectively. If the standard deviations of our micro models are computed, then we can analyze changes in our system by looking at the values of our usage model and its derivatives and how they relate to the average usage, its standard deviation, and the acceptable thresholds for threats. Any occurrence at a point where the usage model value is not equal to the average usage indicates a threat. Any occurrence at a point where the usage model value is less than the average usage minus its standard deviation is a denial of service threat. Any occurrence at a point where the usage model value is greater than the average usage plus its standard deviation is an intrusion. Also, any occurrence at a point where the value of the usage's derivative falls outside the acceptable threshold for threats is a threat.
2.8.2 Usage Model List
For each component of the system under investigation, we will create a usage model.
2.8.3 Authentication Usage Model
The authentication usage model represents the usage of an authentication system. The independent variables that must be sampled to determine the usage of an authentication system are the average data transmitted during an authentication (x1) and the average network speed for a single authentication (x2). The average data transmitted is the average of the request and response data for a single authentication, and the average network speed is the average of the upload and download speeds for a single authentication. The dependent variable that must be sampled is the time taken for an authentication (y).
The goal of modelling the dependent and independent variables is to arrive at a mathematical relationship between y and the two independent variables x1 and x2. Since the time taken grows with the amount of data transmitted and falls as the network speed rises, it is expected that the relationship will be Y = c1(x1/x2) + c2, where c1 and c2 are system constants. In addition, some system constants that will aid threat analysis must be determined. These are the total number of valid authentications, the expected authentications within a time frame, the minimum authentications within a time frame and the maximum authentications within a time frame. The mathematical relationship between y, x1 and x2 is the normal usage model of the authentication system. After this relationship has been determined, occurrences that deviate from it can be used to analyze threats. For instance, any occurrence that is not equal to the average usage is a threat. Additionally, any occurrence that indicates a change outside an acceptable threshold is a threat. The acceptable threshold is a range within which changes in the system are deemed normal. This range is constructed from the average usage and its standard deviation.
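The acceptable threshold described above can be sketched as a simple range check. This is a hedged illustration only: the function name `classify_usage`, the label strings and the sample authentication times are assumptions made for the example, and the sample standard deviation is used as one possible choice. The labels follow the rules stated in Section 2.8.1, where usage below the range is treated as a denial of service threat and usage above it as an intrusion.

```python
import statistics


def classify_usage(value, mean, std_dev):
    """Classify one usage value against the range [mean - std_dev, mean + std_dev]."""
    if value < mean - std_dev:
        return "denial_of_service"
    if value > mean + std_dev:
        return "intrusion"
    return "normal"


# Hypothetical authentication times in seconds, sampled during normal use.
auth_times = [1.9, 2.1, 2.0, 2.2, 1.8, 2.0]
mean = statistics.mean(auth_times)
std_dev = statistics.stdev(auth_times)  # sample standard deviation (assumption)

for observed in (2.05, 5.0, 0.5):
    print(observed, classify_usage(observed, mean, std_dev))
```

A wider or narrower range, for example two standard deviations, could be substituted without changing the structure of the check.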
2.8.4 Session Usage Model
A session usage model represents a single user's behavior before his session expires. To determine the mathematical model for a user's session, two main independent variables must be sampled. These are the size of session data accumulated (x1) and the number of user actions (x2). The dependent variable that must be sampled is the time spent before the session expires (y). The session usage model is expected to be made up of two micro usage models. Their mathematical representations are expected to be Y = c1x1 + c2 and Y = c1x2 + c2, where c1 and c2 are system constants determined separately for each micro usage model. In addition to the two mathematical functions, some system constants that will aid threat analysis must be determined. These include the average user actions, the average size of data accumulated and the average time spent. These constants can be determined from the data set used to determine the usage model. The two mathematical relationships represent the session usage model. Both are linear functions. It is expected that as user actions increase, the time spent also increases. It is also expected that as the data accumulated increases, the time spent also increases.
2.8.5 Memory Usage Model
The memory usage model represents the usage of memory space in a system. The independent variables that must be sampled are the number of application programs running (x1) and the number of system processes running (x2). The dependent variable that must be sampled is the amount of memory space being used (y). The mathematical relationship between x1, x2 and y is expected to be y = c1x1 + c2x2 + c3, where c1 is the average memory space for programs, c2 is the average memory space for processes and c3 is the average memory being used when no process or program is running. In addition to these, some system constants that aid threat analysis must be determined. These include the minimum and maximum memory space for programs and the minimum and maximum memory space for processes.
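A relationship of the form y = c1x1 + c2x2 + c3 can be fitted to sampled data by least squares. The sketch below is one possible illustration and not necessarily the procedure used in this research: it solves the standard normal equations directly rather than reducing the model to simple linear forms. The function names and the sample values are assumptions made for the example.

```python
def solve_linear_system(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: variables are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):  # back substitution
        s = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - s) / m[row][row]
    return x


def fit_two_variable_model(x1s, x2s, ys):
    """Least squares fit of y = c1*x1 + c2*x2 + c3; returns (c1, c2, c3)."""
    rows = [(x1, x2, 1.0) for x1, x2 in zip(x1s, x2s)]
    # Normal equations: (X^T X) c = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    return tuple(solve_linear_system(xtx, xty))


# Hypothetical samples: programs running, processes running, memory used (MB).
programs = [1, 2, 3, 4, 2, 5]
processes = [10, 12, 11, 15, 20, 18]
memory = [50 * p + 5 * q + 200 for p, q in zip(programs, processes)]

c1, c2, c3 = fit_two_variable_model(programs, processes, memory)
print(c1, c2, c3)
```

The synthetic memory values were generated from known constants so that the fit can be checked by eye. With real samples the fitted constants are estimates of the average memory per program, the average memory per process and the idle memory use.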
The mathematical relationship between x1, x2 and y is the memory usage model. When determined, the memory usage model can be used to analyze changes in the memory usage that indicate threats in the system.
2.8.6 CPU Usage Model
The CPU usage model represents CPU usage in a system. The independent variables that must be sampled are the number of application programs running (x1) and the number of system processes running (x2). The dependent variable that must be sampled is the amount of CPU power being used (y). The mathematical relationship between x1, x2 and y is expected to be y = c1x1 + c2x2 + c3, where c1 is the average CPU power being used for programs, c2 is the average CPU power being used for processes and c3 is the average CPU power being used when no process or program is running. In addition to these, some system constants that aid threat analysis must be determined. These include the minimum and maximum CPU power for programs and the minimum and maximum CPU power for processes. The mathematical relationship between x1, x2 and y is the CPU usage model. When determined, the CPU usage model can be used to analyze changes in the CPU usage that indicate threats in the system.
2.8.7 Program Usage Model
To determine the program usage model, the dependent and independent variables that must be sampled are the time spent using the program (y) and the number of functions used (x). In addition, two constants must be determined: the minimum functions used and the maximum functions used. The relationship between y and x determined after sampling various x and y values is the program usage model, denoted by y = f(x).
2.8.8 Host Usage Model
The host usage model is composed of four independent variables: memory usage (x1), session usage (x2), CPU usage (x3) and program usage (x4), derived from their respective usage models. The dependent variable that must be sampled is the time spent on the host (y). Any relationship determined between the dependent and the independent variables is the host usage model. The resulting host usage model is denoted y = f(x1, x2, x3, x4).
2.8.9 Battery Usage Model
The battery usage model is made up of the average CPU usage, the average memory usage and the average session usage in the system. These are the independent variables, and they are derived from their respective micro usage models. The dependent variable is the battery lifespan.
2.8.10 Device Usage Model
The device usage model is made up of a battery usage model, a host usage model, and the time spent on the device. The usage models that make up the device usage model compute the average micro usage and try to relate it to the time spent on the device. The time spent on the device is the dependent variable.
2.8.11 Server Usage Model
The server usage model is made up of the CPU time being used, the memory space being used and the number of processes running. These variables are used to form two different micro usage models.
As such, there are two dependent variables, CPU time and memory space. The independent variable for both micro usage models is the number of processes running.
2.8.12 Port Usage Model
The port usage model is made up of the time elapsed during communication, the number of programs that use the port and the number of paired ports. The number of paired ports is the dependent variable and the remaining variables are the independent variables.
2.8.13 Network Usage Model
The network usage model is made up of the average port usage, the average server usage, the average host usage, the average size of data transmitted on the network, and the time spent on the network. The first three variables are the independent variables. The remaining two are the dependent variables. As such, two micro usage models make up the network usage model.
2.8.14 Aggressive Usage Detector
This model is a utility that detects aggressive behavior on a system. It is modelled just like the various micro usage models. The factors that determine aggressive behavior during system usage are used to determine the mathematical representation of this utility. Aggressive behavior includes aggressive use of major system resources and aggressive use of system components with limited resources. The average aggressive behavior and its standard deviation are determined. Any system occurrence that indicates the average aggressive behavior, the average aggressive behavior plus its standard deviation, or the average aggressive behavior minus its standard deviation is considered a threat and must be halted, reported in an alert or stored for audit purposes.
2.8.15 False Alarm Detector
The false alarm detector is a utility that detects normal system usage that may otherwise be deemed a threat. Occurrences that meet the criteria for false alarms are normal usage that seems to put the entire usage of the system into a false state of vibration or anarchy. Such usage occurrences are therefore prioritized as normal optimal usage. The remedy for the vibrations that such usage occurrences cause is to delay other normal usage occurrences in the system. The state and magnitude of other system occurrences, together with the state and magnitude of the normal optimal usage, determine the impact of the perceived anarchy. To increase the convenience of the system for which this utility is developed, the average delay time and its standard deviation must be determined.
This utility is part of the normal usage. The utility is modelled just like the aggressive usage detector.
2.8.16 Special Parameters of The Usage Model
This section discusses special parameters of our normal usage model. These parameters include the average usage, the usage standard deviation, the minimum usage, the maximum usage and the most frequent usage value recorded. The average usage is the predicted average usage after the normal usage model function has been determined. The usage standard deviation is the standard deviation of the predicted normal usage function. The minimum and maximum usage values are the minimum and maximum usage predicted using the normal usage model. These parameters, together with usage rates, threat model constants and other usage constants, are used in analyzing and detecting threats.
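The special parameters listed above can be computed directly from the values predicted by a usage model function. The sketch below is a minimal illustration using Python's standard `statistics` module. The function name `usage_parameters` and the sample inputs are assumptions, the usage function Y = 2X + 3 is borrowed from the first micro usage model in Section 2.8.1, and the sample standard deviation is used as one possible choice.

```python
import statistics


def usage_parameters(predicted_usage):
    """Summarize predicted usage values with the special parameters of Section 2.8.16."""
    return {
        "average": statistics.mean(predicted_usage),
        "std_dev": statistics.stdev(predicted_usage),  # sample standard deviation (assumption)
        "minimum": min(predicted_usage),
        "maximum": max(predicted_usage),
        "most_frequent": statistics.mode(predicted_usage),
    }


def usage_model(x):
    """First micro usage model from Section 2.8.1: Y = 2X + 3."""
    return 2 * x + 3


# Hypothetical sampled numbers of authentications.
authentications = [1, 2, 2, 3, 4]
predicted = [usage_model(x) for x in authentications]  # [5, 7, 7, 9, 11]

params = usage_parameters(predicted)
print(params)
```

The resulting dictionary gives the average usage and standard deviation that later sections use as the bounds of the acceptable usage range.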
2.8.17 Building The Usage Profile
To build the usage profile, we will first program a usage model for each component of the computer system under investigation. For this research, we want to build the usage profile for a computer network. As such, we will program usage models for authentication on the computer system, for a user's session, for memory usage and for CPU usage. Additionally, we will program a usage model for a host on the network, another for a server on the network and, finally, a usage model for the network itself. The usage model for each component represents the behaviour of that component of the computer system under investigation. When implemented, the usage model will help us determine the regression equation that represents the research model, together with the average usage and its standard deviation. In addition to the regression equation and the mean and standard deviation model, we will develop a Markov chain model for the system under investigation. As such, we will determine the states in the entire computer network, the state transitions and the probabilities associated with those transitions. The rest of this chapter explains how to build a usage profile using an authentication system, gives the details of the critical variables of the other usage models and explains the mathematical theory needed for building the usage profile.
2.8.18 Building a Usage Profile for an Authentication System
To build a usage model for an authentication system, we must sample the critical system variables of the system.
These variables include the download speed on the network, the upload speed on the network, the size of data sent to the server during authentication, the size of data sent to the client during authentication and the time it takes for a successful authentication. The data sent to the server and the data received from the server are the request data and response data respectively. To build the usage model for the authentication system, we will capture data for all the critical variables at equal time intervals, say every 10 minutes, while the authentication system is being used. Once a sample of about 10 observations has been collected, we will try to determine the relationship between the dependent variable and the independent variables. As already stated, the relationship can be determined using simple or multiple linear regression. In addition to the regression equation, we will also determine other statistics that describe the behaviour of the authentication system, such as the means and standard deviations of the variables that were sampled.
2.8.19 Building A Markov Chain Model for An Authentication System
Hidden Markov models are machine learning models that are used to model the states in a system, the sequence in which they occur and the probability associated with each state transition. When a system has a set of states in which it usually falls, and it can be established that each new state depends on the previous states, hidden Markov models can be used to learn the state transitions that usually happen in the system.
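The idea of learning state transitions and their probabilities can be sketched by counting transitions in an observed sequence of states. This is a simplified illustration of a plain Markov chain with directly observed states, not a full hidden Markov model. The function names, the state labels and the observed sequence are assumptions made for the example.

```python
from collections import Counter, defaultdict


def estimate_transitions(state_sequence):
    """Estimate transition probabilities from consecutive pairs of observed states."""
    counts = defaultdict(Counter)
    for current, following in zip(state_sequence, state_sequence[1:]):
        counts[current][following] += 1
    return {
        state: {nxt: c / sum(next_counts.values()) for nxt, c in next_counts.items()}
        for state, next_counts in counts.items()
    }


def most_likely_next(transitions, state):
    """Return the state with the highest estimated probability of following `state`."""
    return max(transitions[state], key=transitions[state].get)


# Hypothetical sequence of authentication time states observed at equal intervals.
observed = ["average", "average", "average", "maximum",
            "average", "minimum", "average"]

transitions = estimate_transitions(observed)
print(transitions["average"])
print(most_likely_next(transitions, "maximum"))
```

With a longer observation period the estimated probabilities describe how the authentication system usually moves between its states on a normal day, which is the information the usage profile needs.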
To build the Markov chain model, we will determine the states of the authentication system and their associated probabilities. Some of these states include the average usage of the authentication system, which may be abstracted as the average time it takes for a successful authentication. Other states include the minimum and maximum recorded times for a successful authentication, the average time it takes for a failed authentication, and the maximum and minimum recorded times for failed authentications. With these states and their probabilities of occurrence during a normal day, we have more information about the behaviour of the authentication system.
2.8.20 Threat Models in a System
A threat is a change in the normal usage model that is beyond a certain acceptable threshold, given by the standard deviation of the usage model. A threat model, on the other hand, is an abstract representation of this change in our mobile system that is beyond the acceptable threshold. Integration can be performed on a threat model to determine the source of the threat. Integration is the reverse operation of differentiation in calculus. A threat model that can perform integration operations can be called a novel self integrating data structure. This chapter of the paper looks at the threat models of the micro usage models that make up a computer network and how to analyze these threats in order to prevent them. It also discusses how to determine the sources of these threats using a novel self integrating threat model. To do this, three main functions are introduced: y = 3, y = 4X + 2 and y = 9X^2 + 3. These functions are three different threat models in the context of the novel self integrating data structure. Additionally, the threat models of the various micro usage models discussed in this paper will be explored.
2.8.21 Properties and Methods of the Novel Self Integrating Data Structure
The most useful properties of the data structure that represents our threat model include, to mention a few, the names of network software or host application software, the version numbers of network and host software, license information such as the date the software was purchased or released and the number of years before renewal, and the IP address and MAC address of a host on a network. The methods of such a large simulative object may include a method for computing the integral of a threat model, a method for computing the differential coefficient of the predictive normal usage model, and a method for computing the differential equation of a network or host threat model. These are mostly the methods needed for performing the major calculus operations that support the novel calculus simulation used to detect threats and their sources on a wireless network. Besides these, it may be necessary to implement methods that retrieve hidden network identity, such as IP and MAC addresses, on a local area network.
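A data structure with the properties and methods described above might be sketched as follows. This is a hypothetical illustration rather than the structure implemented in this research: the class name `ThreatModel`, its field names and the sample identity values are assumptions, and threat model functions are restricted to polynomials stored as coefficient lists. The three functions from Section 2.8.20 are used as examples.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ThreatModel:
    """Polynomial threat model; coefficients[i] is the coefficient of x**i."""
    coefficients: List[float]
    software_name: str = ""   # hypothetical identity properties
    version: str = ""
    ip_address: str = ""
    mac_address: str = ""

    def evaluate(self, x):
        """Value of the threat model function at x."""
        return sum(c * x ** i for i, c in enumerate(self.coefficients))

    def derivative(self):
        """Coefficients of the differential coefficient (power rule, term by term)."""
        terms = [i * c for i, c in enumerate(self.coefficients)][1:]
        return terms or [0.0]

    def integral(self, constant=0.0):
        """Coefficients of the integral; `constant` is the system constant C."""
        return [constant] + [c / (i + 1) for i, c in enumerate(self.coefficients)]


# The three threat models of Section 2.8.20, with placeholder identity values.
constant_model = ThreatModel([3], software_name="mail-client", version="1.0")
linear_model = ThreatModel([2, 4], ip_address="192.0.2.10")
quadratic_model = ThreatModel([3, 0, 9])

print(constant_model.integral())     # coefficients of 3x + C with C = 0
print(linear_model.integral())       # coefficients of 2x + 2x^2 + C with C = 0
print(quadratic_model.integral())    # coefficients of 3x + 3x^3 + C with C = 0
print(quadratic_model.derivative())  # coefficients of 18x
```

Storing the identity properties next to the function is what would let an integral that resembles a known usage model be traced back to a particular software package or host.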
2.8.22 Integration Review
Based on the three functions stated in this chapter, we give an introductory review of integration, the branch of calculus that reverses differentiation. The integrals of the functions y = 3, y = 4X + 2 and y = 9X^2 + 3 are 3X + C, 2X^2 + 2X + C and 3X^3 + 3X + C respectively, where C represents system constants in the mobile system. Computing the integral can be tricky, so two laws are defined below to aid quick computation of the integrals of a normal mathematical function.
Theorem 1: If a function is a constant such as a rational number, the integral is the product of the variable x and that constant, plus a system constant C, which can be determined from a known pair of x and y values.
Theorem 2: If a function is not a constant, the integral is computed term by term. For the first term containing x, divide its constant by the exponent plus 1 and multiply by x raised to the exponent plus 1, so that a term ax^k becomes (a/(k+1))x^(k+1). Repeat this for every term containing x, apply Theorem 1 to any constant term, and add the corresponding system constant C.
2.8.23 Interpretation of Threat Model Integrals
Since the novel self integrating data structure is a programmed threat model, it is important to discuss the meaning of its integrals. The integrals represent the source of the original threat. For example, the integral of a threat model may identify the function, software, host or network from which the threat originated. With properties like software name, version number, and IP and MAC addresses, it becomes easy to pinpoint the source of the threat. If the integral of a threat model looks like the normal usage model of a function of the system under examination, then that function can be predicted as the source of the threat.
Similarly, if the integral is similar to the normal usage model of a software, host, or network that forms part of the system being investigated, then the threat can be predicted to come from that software, host or network.
2.8.24 Threat Analysis and Detection
To do threat analysis in a system and abort the processes that initiated the threats, linear and non linear programming techniques can be used. The goal is to minimize the frequency of threat occurrence and the overall impact associated with the threat, and to optimize the normal usage function. In addition to these two goals, there are some constants that aid threat analysis. These constants are associated with the normal usage model and the threats in the system. Examples are the rate at which usage is increasing with respect to a particular usage variable, the rate at which the threat impact and frequency increase with respect to a particular variable in the usage model, and other special parameters associated with the usage model function. The average usage, its standard deviation and the threat model function make up the threat model. The average usage and standard deviation are constants in the threat model. Using the threat model function, the average usage and the standard deviation, threat analysis can be done using linear and non linear programming. The goal is to minimize threats using the threat model function as the objective function and the average usage and standard deviation as constraints. Other parameters that may be used as constraints include the rate at which usage is increasing with respect to a particular usage variable, or the rate at which the threat impact and frequency are increasing with respect to a particular usage variable.
2.8.25 Threat Prediction
This section discusses how to predict threats in a system. The network usage model discussed in the previous chapter and its associated threat model will be used to demonstrate how to predict or detect a threat in a system. As discussed in the previous section, threats can be detected using linear and non linear programming. The network usage model function and its associated threat model function are the objective functions. The constraints are the average network usage and its standard deviation, and other parameters such as the rate at which the network threat increases with respect to other network usage model components, namely the average host usage, average server usage, average port usage, average time the network operates and average data transmitted on the network. The goal of the linear or non linear programming is to optimize the usage so that it stays within the range from the average usage minus its standard deviation to the average usage plus its standard deviation. These are the lower and upper bounds of our objective function.
Every combination of system variables whose usage is within this usage range minimizes threat in the system. Since the average port, host and server usage are derived from their corresponding usage models, the linear and non-linear programming analysis will be done independently for each of them.

When a threat is predicted in a system, the chance of the prediction being accurate depends on the usage value at that instant and on whether it is within the range of acceptable usage. This range is constructed using the average usage and its standard deviation. Any usage value that is less than the average usage minus its standard deviation is a threat. Likewise, a usage value that is greater than the average usage plus its standard deviation is a threat. This means that any predicted threat at a point where the predicted usage is within the usage range has a high chance of being false. In addition, the actual and predicted usage values can be used to determine the chance that the predicted threat is accurate. If the difference between them is large, the predicted usage may be wrong. Since the predicted usage and the threat models are derived from the usage model function, the predicted threat may then also be false. Finally, the closer the correlation coefficient of the usage model function is to zero, the higher the chance the
predicted usage and its associated threat values are wrong. Usage model functions with a correlation coefficient of 0.6 and above indicate that the predicted usage values and predicted threat values are accurate. These values are obtained from the usage model function and the threat model function respectively, which are modelled using the relevant system variables that make it possible to model system usage and system threats.

2.8.26 Risk Analysis in a System

To perform risk analysis in a system, the frequency at which threats occur in the system and the impact they have on the system must be known. When a frequency table is constructed for all threats and their associated impacts are stored, it becomes easy to analyze the risks associated with a system. When a threat is predicted, the likelihood of the threat occurring in the system can be computed using the threat frequencies. The impact of the various threats can also be determined based on the types of threats and other parameters such as the number of such threats, the speed at which they occurred and the resources they affected or damaged. Risk in a system is computed as the product of the likelihood of threat occurrence and the impact that threat occurrence has on the system. These concepts are the basis for developing a risk analysis system using the techniques we have discussed so far.

2.9 Normal Usage Model and Threat Model Simulation

In this chapter, we discuss the experiment that was conducted to determine the usage of a computer system. We also discuss how to simulate the threat and usage models with the aim of developing a threat detection system. Four of the micro usage models that were discussed in this paper were used for the simulation. These are the ones for authentication, session, CPU and memory.
Because the usage model for authentication was determined to be a rational function, logarithms were taken on both sides of the relation as part of the simulation in order to reduce it to a linear form. The original function is Y = c1(x2/x1) + c2. The linear form used is log Y = log c1 + log x2 - log x1 + log c2. Strictly, the logarithm of a sum is not the sum of the logarithms, so this form is exact only for a purely multiplicative relation; the additive constant c2 is treated here as contributing a constant term in the log domain. Since log c1 and log c2 are constants, let us denote them by k1 and k2 respectively. Additionally, let B = log Y, let j1 = log x1 and let j2 = log x2. Therefore, the linear form of the usage model for authentication is B = j2 - j1 + k1 + k2. Since k1 + k2 is a constant, let it be represented by k. As such, B = j2 - j1 + k, where B is the dependent variable and j2 and j1 are the independent variables. When B, j2 and j1 are sampled, Y = c1(x2/x1) + c2 can be determined.

The CPU and memory usage models are multiple linear forms. The original relation is of the form y = c1x1 + c2x2 + c3, where x1 and x2 are the independent variables. The original relation must be reduced to simple linear forms. To do this, determine y = b0 + bx for each independent variable. The sum of the various b0 equals c3. Each b corresponds to the constant associated with the independent variable for which y = b0 + bx was determined. For example, the b for the y = b0 + bx determined for x1 equals c1, and that for
x2 equals c2. When x1, x2 and y are sampled and the various y = b0 + bx determined, y = c1x1 + c2x2 + c3 can be determined completely.

The simulation was run four times within a week. On the first instance, it was run for 15 minutes. On the second instance, it was run for 30 minutes. On the third instance, it was run for 45 minutes. On the last instance, it was run for 60 minutes. The functions for the usage models and their corresponding correlation coefficients were also determined.

2.10 Tools and Computer Packages

This chapter discusses the tools and computer packages that were used throughout this research project. We will also look at the programming languages, database platforms and development frameworks that can be used to develop an anomaly-based intrusion detection system for ecommerce sites using the concepts we have discussed in this paper.

The simulation was implemented in Java as a console-based simulation. Java was chosen for its object-oriented concepts such as encapsulation, inheritance, interfaces, objects and polymorphism.

To implement an intrusion detection system using the results of this research, the following tools will be essential: Bootstrap, CodeIgniter, the MySQL Database Management System, SQLite, SQLyog and Eclipse. These tools are best suited for intrusion detection systems developed for ecommerce sites. The implementation platforms that will be used are PHP and Android. PHP is for the desktops and laptops that connect to the ecommerce sites, and Android is for mobile phones that use the ecommerce sites. Bootstrap and CodeIgniter are web development frameworks. Bootstrap is for frontend development and CodeIgniter is a backend framework for PHP developers. For Android development, Eclipse can be used as the IDE. MySQL and SQLyog are for the database servers that will run on the ecommerce site as part of the intrusion detection system implementation.
SQLite is for the databases that run on the Android implementations that form part of the intrusion detection system developed for the ecommerce website. With all these tools, frameworks and packages, developers are ready to develop intrusion detection systems for ecommerce sites using the concepts in this research paper. It is expected that the micro usage models discussed will be integral libraries that will be implemented in PHP and Android as part of an implementation for ecommerce sites or any group of web or mobile application systems.
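As an illustration of what one routine in such a micro usage model library might look like, the following Python sketch fits a simple linear form y = b0 + bx to sampled data and reports the correlation coefficient. Python is used here only for brevity; the paper proposes PHP and Android for the actual implementation. The function names, the sample data and the use of the absolute value of the correlation coefficient against the 0.6 threshold are assumptions made for this example.

```python
from math import sqrt


def fit_line(xs, ys):
    """Least-squares fit of y = b0 + b*x; returns (b0, b, r)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx                # slope
    b0 = my - b * mx             # intercept
    r = sxy / sqrt(sxx * syy)    # Pearson correlation coefficient
    return b0, b, r


def is_reliable(r, threshold=0.6):
    """Assumption: a model is treated as reliable when |r| >= threshold."""
    return abs(r) >= threshold


# Hypothetical samples of one usage variable (xs) and the observed usage (ys)
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
b0, b, r = fit_line(xs, ys)
```

For these perfectly linear hypothetical samples the fit recovers an intercept of 1 and a slope of 2, and the model passes the reliability check.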
3.0 Secured Expert Medical Consultation System

In this chapter, we describe the objectives for the secured expert medical consultation system, the research questions for developing the system, and the problems that we seek to address. We will also look at the requirements of the system and then describe how the system will be developed.

3.1 Problem Definition

The three problems that this research project seeks to address are:
● Developing a medical system that will assist in medical consultation.
● Determining diseases that a Patient is likely to get based on medical history.
● Measuring medical information such as height, weight, temperature, blood pressure and blood sugar.

3.2 Research Questions

● What are the best and most efficient ways of modelling diseases for the development of a medical care system for administering medical care?
● How can an expert medical consultation system be developed?
● How can we determine diseases that a Patient will get based on medical history?
● How can we develop a hardware system and software that can be used to measure medical information such as weight, height, temperature, blood pressure and blood sugar?

3.3 Objectives of Paper

● The first goal is to model diseases for the development of an expert system for diagnosing diseases during medical consultation.
● The second goal is to optimize how to match a set of symptoms and indications to a particular disease.
● The third goal is to find diseases that a Patient can get based on medical history.
● The fourth goal is to develop medical equipment that can be used to measure medical information such as height, weight, temperature, blood pressure and blood sugar.
● The last goal is to develop an expert medical consultation system for mobile phones, tablets and personal computers.
3.4 Literature Review

In this section we describe the literature that forms part of this research. We will look at evolutionary algorithms, the various types of evolutionary algorithms and how sensors can help in measuring medical data.

3.4.1 Evolutionary Computing Terminologies

Some of the terms used in evolutionary computing are phenotypes, genotypes, chromosomes, genes and alleles [3]. Phenotypes are the elements of the search space, which correspond to the possible solutions of a problem [3]. Genotypes are the elements of the result space, which represent those possible solutions [3]. The transition from the search space (phenotypes) to the result space (genotypes) is called encoding [3]. The transition from the result space back to the search space is called decoding [3]. In some cases, the search space may be a set of integers and the result space may be a set of binary numbers, each representing an integer in the search space [3].

3.4.2 Evolutionary Algorithms

One of the techniques used in evolutionary computing is evolutionary algorithms. The components of an evolutionary algorithm are representation, evaluation or fitness function, population, parent selection mechanism, variation operators (recombination and mutation), survivor selection mechanism (replacement), initialization and termination condition [3]. Some of the classes of Evolutionary Algorithms are Genetic Algorithms, Genetic Programming, Differential Evolution, Evolutionary Strategy, and Evolutionary Programming [2].

3.4.2.1 Representation

Representation involves translating the real world into the evolutionary computing world [3]. The possible solution set, which is the set of phenotypes, is encoded into objects in the evolutionary computing world called genotypes [3]. Many synonyms are used to describe the elements of the two spaces [3]. The genotypes are also called chromosomes [3]. Genes are placeholders, and alleles describe the objects in those places [3].
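The integer-to-binary example mentioned above can be sketched in Python as follows. This is only an illustration of encoding and decoding under the assumption of a fixed-length bit-string genotype; the function names are hypothetical and do not come from [3].

```python
def encode(value, n_bits):
    """Encode a non-negative integer phenotype as a fixed-length bit string."""
    if not 0 <= value < 2 ** n_bits:
        raise ValueError("value does not fit in the requested number of bits")
    return format(value, f"0{n_bits}b")


def decode(genotype):
    """Decode a bit-string genotype back into its integer phenotype."""
    return int(genotype, 2)


# Each position in the string plays the role of a gene,
# and the character stored there ('0' or '1') is its allele.
genotype = encode(5, 8)
phenotype = decode(genotype)
```

Here the integer 5 is encoded as the 8-bit genotype 00000101, and decoding that genotype returns 5.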
3.4.2.2 Evaluation or Fitness Function

The evaluation function forms the basis for selection [3]. It is the requirement that the population must adapt to, and it defines what improvement means [3]. From the problem-solving perspective, it represents the task to be solved in the evolutionary computing context [3]. Technically, it is a procedure that assigns a quality measure to the genotypes [3]. Typically, the procedure is composed of a quality measure in the genotype space and the reverse representation [3]. Often the problem to be solved by an evolutionary algorithm is an optimization problem [3]. In such cases the name objective function is used in the problem context, and the fitness function is identical to, or a simple transformation of, the objective function [3].
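A fitness function that is a simple transformation of an objective function can be sketched as below. The objective, the bit-string genotype and the particular transformation 1 / (1 + objective) are assumptions chosen for illustration; other transformations are equally possible.

```python
def objective(x):
    """Hypothetical objective to minimise: squared distance from 10."""
    return (x - 10) ** 2


def fitness(genotype):
    """Assign a quality measure to a bit-string genotype.

    The genotype is first decoded (the reverse representation), then the
    objective value is transformed so that a smaller objective value
    gives a larger fitness.
    """
    x = int(genotype, 2)
    return 1.0 / (1.0 + objective(x))
```

Under these assumptions the genotype 01010, which decodes to 10, receives the highest possible fitness of 1.0, and genotypes further from 10 receive lower values.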
3.4.2.3 Population

The population is a multiset of genotypes [3]. The role of the population is to hold possible solutions [3]. The population is the unit of evolution [3]. Genotypes are static individual objects that do not change or adapt; it is the population that does [3].

3.4.2.4 Parent Selection Mechanism

The role of the parent selection mechanism is to distinguish among individuals based on their quality [3]. This allows the better individuals to become parents of the next generation. An individual is a parent if it has been selected to undergo variation in order to create offspring [3]. The parent selection mechanism, together with the survivor selection mechanism, is essential for quality improvement [3]. In evolutionary computing (EC), parent selection is usually probabilistic. Thus, high-quality individuals get a higher chance of becoming parents than those with low quality [3]. However, low-quality individuals are usually still given a small chance; otherwise the entire search becomes too greedy and gets stuck in a local optimum [3].

3.4.2.5 Variation Operators

The variation operators are mutation and recombination [3]. The review of mutation and recombination is given below.

3.4.2.6 Mutation

Mutation is an operation that is performed on one genotype and produces a slightly modified mutant [3]. As such, mutation is a unary operator [3]. A mutation operator is usually stochastic [3]. As such, its output, which is the child, depends on a series of random choices [3]. It should be noted that an arbitrary unary operator is not necessarily a mutation [3]. Mutation in general is supposed to cause an unbiased random change [3]. It must be noted that the variation operators form the evolutionary implementation of the basic steps within the search space [3]. Theorems stating that, given sufficient time, an evolutionary algorithm (EA) will find a global optimum depend on the property that every genotype representing a possible solution can be reached by the variation operators [3].
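Probabilistic parent selection and mutation, as described above, can be sketched in Python as follows. Fitness-proportionate selection and bit-flip mutation are used here as common examples; they are not necessarily the operators intended in [3]. The function names, the small `floor` weight given to every individual and the bit-string genotype are assumptions.

```python
import random


def select_parents(population, fitnesses, k, floor=0.01, rng=random):
    """Pick k parents with probability proportional to fitness.

    The small floor keeps a non-zero chance for low-quality individuals,
    so the search does not become too greedy.
    """
    weights = [f + floor for f in fitnesses]
    return rng.choices(population, weights=weights, k=k)


def bit_flip_mutation(genotype, rate, rng=random):
    """Flip each bit independently with probability `rate` (unary operator)."""
    return "".join(
        ("1" if bit == "0" else "0") if rng.random() < rate else bit
        for bit in genotype
    )


# Hypothetical population of bit-string genotypes and their fitness values
population = ["000", "011", "111"]
fitnesses = [0.0, 0.5, 1.0]
parents = select_parents(population, fitnesses, k=4, rng=random.Random(0))
```

A mutation rate of 0 leaves the genotype unchanged and a rate of 1 flips every bit; rates used in practice are normally small, so that the mutant is only slightly different from its parent.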
3.4.2.7 Recombination

A binary variation operator is called recombination or crossover [3]. Like mutation, recombination is a stochastic operator [3]. The choice of which parts of each parent are combined, and how these parts are combined, is made at random [3]. Recombination operators with higher arity, that is, having more than two operands or parents, are mathematically possible and easy to implement but have no biological equivalent [3]. That is perhaps why they are not widely used, although several studies show that they have a positive effect on evolution [3]. The principle behind recombination is very simple: by mating parents or individuals with different features, we can produce offspring that combine both features [3]. Biologically, recombination is the superior form of reproduction [3].
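As an illustration of a binary recombination operator, the sketch below implements one-point crossover on bit-string genotypes. One-point crossover is one common choice, used here as an example rather than as the operator prescribed by [3]; the function name and the optional `point` parameter are assumptions.

```python
import random


def one_point_crossover(parent_a, parent_b, point=None, rng=random):
    """Combine two equal-length parents (length >= 2) at a single cut point."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same length")
    if point is None:
        # Random cut point, chosen so that both parents contribute genes
        point = rng.randint(1, len(parent_a) - 1)
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b


# Fixed cut point so the result is easy to follow
children = one_point_crossover("11111", "00000", point=2)
```

With the cut point fixed at 2, the parents 11111 and 00000 produce the children 11000 and 00111, each carrying features of both parents.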
3.4.2.8 Survivor Selection Mechanism

The survivor selection mechanism is often called replacement or replacement strategy [3]. However, the term survivor selection is used here to keep the terminology consistent [3]. The role of the survivor selection mechanism is to distinguish among individuals based on their quality. It is similar to parent selection, but it is used at a different stage of the evolution cycle [3].

3.4.2.9 Initialization

Initialization is kept simple in most EA applications [3]. The first population is seeded with randomly generated individuals [3].

3.4.2.10 Termination Condition

There are two types of termination conditions [3]. The first applies when the evolutionary computing problem has a known optimal fitness level [3]. This may come from a known optimum of the given objective function or fitness function [3]. In such cases, the search can be stopped when that level is reached [3]. However, EAs are stochastic in nature and the optimum may not be reached; hence the fitness condition may never be satisfied and the algorithm may never stop [3]. This requires that the condition be extended with one that is certain to stop the algorithm [3]. Some of these extensions include the following:
● A maximum allowed CPU time, so that the algorithm is stopped when this time elapses [3].
● A limit on the total number of fitness evaluations, so that the algorithm is stopped when this limit is reached [3].
● A limit on the number of times the evolution is repeated, for example a fixed number of generations [3].

3.4.3 Genetic Algorithms

A Genetic Algorithm is a search heuristic that is inspired by Charles Darwin’s theory of natural evolution [24]. This algorithm reflects the process of natural selection, where the fittest individuals are selected for reproduction in order to produce offspring for the next generation [24]. There are five phases in a genetic algorithm [24].
These are: Initial Population, Fitness Function, Selection, Crossover, and Mutation [24].

3.4.4 Evolutionary Strategies

Evolutionary Strategies (ES) are a type of black-box optimization algorithm belonging to the family of evolutionary algorithms [30]. The optimization targets of Evolutionary Strategies are vectors of real numbers [30]. It must be noted that Evolutionary Strategies are stochastic optimization algorithms and are designed specifically for continuous function optimization [26].
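To make the idea of an Evolutionary Strategy on real-valued vectors concrete, the following sketch implements a (1+1)-ES, one of the simplest variants, with a fixed mutation step size. It also uses the two kinds of termination condition discussed earlier: a target fitness level and a maximum number of fitness evaluations. The sphere objective, the parameter values and the function names are assumptions chosen for illustration and are not taken from [26] or [30].

```python
import random


def sphere(x):
    """Hypothetical continuous objective to minimise."""
    return sum(v * v for v in x)


def one_plus_one_es(objective, start, sigma=0.1, target=1e-6,
                    max_evals=2000, seed=0):
    """(1+1)-ES: one parent, one Gaussian-mutated child per iteration."""
    rng = random.Random(seed)
    parent = list(start)
    parent_value = objective(parent)
    evals = 1
    # Stop at the target level, or when the evaluation budget is used up
    while parent_value > target and evals < max_evals:
        child = [v + rng.gauss(0.0, sigma) for v in parent]
        child_value = objective(child)
        evals += 1
        # Survivor selection: keep the child only if it is not worse
        if child_value <= parent_value:
            parent, parent_value = child, child_value
    return parent, parent_value, evals


start = [3.0, -2.0]
best, best_value, evals = one_plus_one_es(sphere, start)
```

Because the child replaces the parent only when it is not worse, the returned objective value can never exceed that of the starting point. Practical Evolutionary Strategies usually also adapt the step size `sigma` during the run, which this sketch omits.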
3.4.5 Genetic Programming

Genetic Programming is a domain-independent method for genetically breeding a population of computer programs to solve a problem [28]. That is, Genetic Programming iteratively transforms a population of computer programs into a new generation of programs by applying analogs of naturally occurring genetic operations [28]. It must be stated that Genetic Programming is a form of Artificial Intelligence that mimics natural selection to find optimal results [20].

3.4.6 Evolutionary Programming

Evolutionary Programming, originally conceived by Lawrence J. Fogel in 1960, is a stochastic optimization technique similar to Genetic Algorithms [43]. One main difference between Evolutionary Programming and Genetic Algorithms is that Evolutionary Programming places emphasis on the behavioural linkage between parent and offspring rather than seeking to emulate specific genetic operators as observed in nature [43]. It is also similar to Evolutionary Strategies, although the two were developed independently [43].

3.4.7 Differential Evolution

Differential Evolution is a heuristic approach for the global optimization of nonlinear and non-differentiable continuous space functions [38]. Differential Evolution is similar to popular direct search approaches such as genetic algorithms and evolutionary strategies [38]. It must be stated that this algorithm has an advantage over the other approaches mentioned because it can handle nonlinear and non-differentiable multi-dimensional objective functions while requiring very few control parameters to steer the minimisation [38].

3.4.8 A Survey on Wearable Sensor-Based Systems for Health Monitoring and Prognosis

A research paper entitled “A Survey on Wearable Sensor-Based Systems for Health Monitoring and Prognosis” describes wearable and biomedical health systems for health monitoring and prognosis [5]. The paper explains that these wearable and biomedical health systems have gained a lot of attention in the scientific community [5].
The paper also explains that this is “mainly motivated by increasing healthcare costs and propelled by recent technological advances in miniature biosensing devices, smart textiles, microelectronics, and wireless communications, the continuous advance of wearable sensor-based systems will potentially transform the future of healthcare by enabling proactive personal health management and ubiquitous monitoring of a patient's health condition” [5].
The paper attempts to review the current research and developments on wearable biosensor systems for medical monitoring. According to the paper, a variety of system implementations are compared in order to identify the technological shortcomings of the current state of the art in wearable biosensor solutions and systems [5]. The paper also explains that “an emphasis is given to multiparameter physiological sensing system designs, providing reliable vital signs measurements and incorporating real-time decision support for early detection of symptoms or context awareness” [5].

3.4.9 Sensors in Medicine

According to another research paper entitled “Sensors in Medicine”, sensors are devices that detect physical, chemical and biological signals and provide a way for those signals to be measured and recorded [8]. That paper also explains that “physical properties that can be sensed include temperature, pressure, vibration, sound level, light intensity, load or weight, flow rate of gases and liquids, amplitude of magnetic and electronic fields, and concentrations of many substances in gaseous, liquid, or solid form. Although sensors of today are where computers were in 1970, medical applications of sensors are taking off because of advances in microchip technologies and molecular chemistry.” [8]

3.4.10 Expert System Methodologies and Application

This section of the research paper describes expert system methodologies and applications. A research paper on expert system methodologies and applications classifies them using a literature review and classification of articles from 1995 to 2004 [37]. Based on its survey and classification of articles, that paper describes eleven categories of expert system methodology [37].
The eleven categories are Rule-based Systems, Knowledge-based Systems, Neural Networks, Fuzzy Expert Systems, Object Oriented Methodologies, Case-based Reasoning, System Architecture, Intelligent Agent Systems, Database Methodologies, Modelling and Ontology [37]. The paper also describes the applications, research and problem domains for the various categories.

3.4.10.1 Rule-based Systems

A Rule-based Expert System is a system that contains information obtained from a Human Expert and represents that information in the form of rules, such as IF-THEN rules [37]. The rules can be used to perform operations on data and to make inferences in order to reach an appropriate conclusion [37]. According to the research paper, applications of Rule-based expert systems include State Transition Analysis, Psychiatric Treatment, Production Planning, Advisory System, Teaching, Electronic Power Planning, Automobile Process Planning, Hypergraph Representation, System Development, Knowledge Verification/Validation, Alcohol Production, DNA Histogram Interpretation, Knowledge Based Maintenance, Scheduling Strategy, Management Fraud Assessment, Knowledge Acquisition, Communication System