An Artificial Neural Network Based Anomaly Detection for Computer Networks Connected to the Internet and Web and Mobile Applications that are Accessed through the Computer Network

Nathanael Asaam
Founder and CEO @ Equicksales Consulting Ltd | Application Support Officer @ Ashesi University
nataoasaam@gmail.com
1
An Artificial Neural Network Based Anomaly Detection for Computer Networks Connected to
the Internet and Web and Mobile Applications that are Accessed through the Computer
Network
Abstract
This paper is a research paper that seeks to investigate the application of artificial neural network
(ANN) in the development of an anomaly detection system for a computer network connected to
the internet and web and mobile application used on the computer network. The methodology of
this study employs a usage profile of a computer network and Apache web server that hosts a web
application. Also, the paper intends to explore how the Adaline algorithm which is a technique
used for implementing ANN can be used to create a usage profile of the Apache web server and
the computer network.
Even though we believe that using the Perceptron algorithm which is one of the oldest ways
of implementing an ANN, is not suitable for the usage profile that this research paper intends to
develop, we make a concise comparison of the Perceptron and Adaline algorithms in order to help
readers have an understanding of how both algorithms can be used to develop an ANN.

Nathanael Asaam
2
Introduction
Cyber security threats on computer networks have the potential of causing damage to resources on
the computer network. Examples of these damages include corrupting data stored or transmitted
on the network, infesting a host on the network with virus, impersonating a valid user on the
network and preventing proper functioning of applications softwares on various hosts on the
network. The security of computer systems is very essential to various organizations. Computer
systems’ security is usually provided by computer software that protects the computer system for
which they were developed. Such a computer software system is an intrusion detection system.
Other computer systems that provide security are antivirus and firewall and risk analysis systems.
Also, periodic computer security audits will enable attack and threat detection and prevention on
computer networks.
Also, security of a computer network connected to the internet and web and mobile
applications software accessed through the network are very critical. This is particularly important
since there have been many security threats and intrusions associated with web applications,
ecommerce sites and mobile applications. A research paper on Classification of malicious web
sessions states that 60% of total attack attempts on the internet were targeted against Web
applications [7]. Another Research paper states that Web based, and Web application attacks are
ranked number two and three in cyber security environments according to the European Network
and Information Security Agency (ENISA) [9]. Another research paper on Security analysis of
smartphone Point of Sale (POS) systems states that there are several security threats that come
with smartphone POS systems [8]. The paper states that one of the threats associated with
smartphone POS systems is Network Adversaries [8]. A network adversary is someone who
intercepts or modifies a communication to and from a smartphone app that is used on a wireless
communication link; including Wi-Fi, and Cellular Data Network [8]. According to the paper,
protection from this type of threat is done through Transport Layer Security (TLS) which encrypts
sensitive data including credit card data [8]. Other attacks or threats associated with smartphone
POS Systems are Malicious App, Fake POS App, Malicious OS, Malicious Firmware, Malicious
Hardware, and Insider Attacks [2].
It is also important to mention that anomaly detection systems can be used to detect
intrusive activities in web and mobile applications and also on computer networks. Anomaly
detection uses deviations from novel patterns of a system to detect anomalous activities in the
system. Additionally, Anomaly detection systems can be developed for applications hosted on the
LAMP Stack. This is partly possible because the various software that make up the LAMP stack
have logs. There are two types of logs, which are access logs and error logs. Because the logs
provide extensive information about the behaviour of the various software that make up the LAMP
Stack, it is easy to construct a rule-based or machine learning-base models that can help detect
intrusive activities in a web application that is hosted on a web server. Also, it is essential to state

Nathanael Asaam
3
that smartphone applications also connect to web servers. In that, some of the processing burden
is left to functionality implemented on web applications. As such it is possible to mitigate attacks
and threats associated with smartphone applications using anomaly detection systems developed
for web servers. Also, anomaly detection systems can be also developed for functionality that
resides on the smartphone.
One way of developing anomaly detection systems is by developing a usage profile. This
is because if a usage profile of a system can be built, it will become possible to detect unusual
behaviour on the system. The method for building such usage profiles involves determining factors
of the system that are critical to the system. These factors can be seen as critical system variables
that affect the system’s usage. The other thing to consider is determining the way in which you can
obtain an abstract representation of the usage profile. The abstract representation of the usage
profile can be achieved by the application of behaviour models such as statistical models, machine
learning models and cognitive based models.
This paper is a research paper that seeks to explore various ways of mitigating attacks and
threats that are associated with a computer network connected to the internet and web and mobile
applications that are accessed through the computer network. More specifically, the research will
focus on web applications hosted on a LAMP server and mobile applications that are used on the
computer network. The research will also investigate anomaly detection on a Lamp Server using
Artificial Neural Network (ANN). LAMP is an acronym that stands for Linux, Apache, MySQL
and PHP. Some of the methods that will be employed in this research are using rule-based and
machine-learning based models to develop anomaly detection systems for mobile and web
applications. Additionally, we will explore how to mitigate several attack types such as Network
Adversary, Structured Query Language Injection (SQLI) and Cross-site Scripting (XSS).
Background
Below is a concise background on what makes up the LAMP Stack and Apache Log files.
Ubuntu Server
Ubuntu Server is a version of the Ubuntu Operating System (OS) which is a distribution of the
Linux OS that is designed and engineered as a backbone for the Internet [12]. Ubuntu Server brings
economic and technical scalability to your datacentre, public or private [12]. Whether you want to
deploy an OpenStack, a Kubernetes cluster, or a 50,000 - node render farm, it delivers the best
value scale-out performance available [12].
Apache
Apache also known as Apache HTTP Server is the most widely used web server software and runs
on 67% of all websites in the world [11]. It is developed and maintained by Apache Software

Nathanael Asaam
4
Foundation [11]. It is fast, reliable and secure and can be highly customized to meet the needs of
many different environments using extensions and modules [11].
Mysql
Mysql is an open-source Relational Database Management System (RDBMS) that enables users
to store, manage and retrieve structured data efficiently [13]. It is widely used for various
applications from small-scale projects to large-scales websites and enterprise level solutions [13].
It is also the most popular open-source SQL DBMS and is developed, supported and distributed
by Oracle Corporation [14].
PHP
PHP (a recursive acronym for PHP Hypertext Preprocessor) is a widely used general-purpose
scripting language that is well suited for web development and can be embedded in HTML [15].
According to Web Technology Surveys, PHP is used by 78.1% of all websites including high-
traffic websites such as Facebook and Wikipedia [16].
LAMP Server and Apache Log Files
A Lamp Server consists of a distribution of the Linux Operating System, Apache Web Server,
MySQL, and PHP. These software generate logs that are stored in log files. Apache Log files are
text files containing information about all that the Apache Web Server is doing [2]. The are two
main types of Apache Log files [4]. These are Access log files and Error log files [4]. These log
files are very significant because they help System Administrators to parse and analyze the data in
these files later, in order to get insightful information and other metrics about the web server and
also, it becomes possible to graph the insights and metrics obtained from this data [2]. This can
lead to detecting system errors, and other security breaches such as intrusion and other
vulnerabilities, and threats associated with websites and application systems hosted on the Apache
Web Server.
Also, these Apache Logs files contain multivariate time series data recorded in access and
error log files. It must be stated that such multivariate time series data are difficult to analyze and
effectively model [1]. Also, when the dimensionality of such data begins to grow, it becomes
difficult for humans to monitor it manually 1]. However, such data can be analyzed and modelled
using data mining techniques to draw inferences [1]. This can help us detect performance
anomalies in the Apache Web Server. Also, it must be emphasized that there are two common
Access Log types. These are Common Log Format and Combined Log Format [2]. The Apache
Common Log Format is one of the two most common log formats used by HTTP servers and the
Apache Combined Log is another format often used in Access Logs [2]. Refer to Appendix 2 for
more information about the two Access Log Types.

Nathanael Asaam
5
Problem Definition
If the normal behaviour of a computer network and web and mobile application that are used on
the network can be represented by an abstract model, then this abstract model can be used to detect
intrusions on the computer network. These intrusions on the computer network can be detected as
deviations from the abstract model which is the behaviour of the computer network. The main
problems this paper seeks to investigate are listed below.
• Representing the normal behaviour of a computer network and web and mobile apps that
are used on the computer network with an abstract model.
• Determining activities and occurrences that are deviations from normal behaviour.
• Representing these activities or deviations with an abstract model.
• Preventing such activities or occurrences from occurring on the computer network or on
the web and mobile applications that are used on the network.
In this research paper, the normal behaviour is known as a usage profile and the deviation from the
normal behaviour is known as the threat profile.
Research Questions
The main questions to be investigated are listed below.
• What are the best and most efficient techniques for building an abstract model of the normal
behaviour of a computer network and web and mobile applications that are used on the
computer network?
• How can we design and develop an anomaly detection system?
• How can we perform risk assessment of a computer network?
Objectives of Study
The main objectives of the research papers are as follows.
• Developing an anomaly detection system
• Drafting of a policy document that contains details of procedures, processes, practices and
guidelines that must be followed when performing risk assessments of a computer network.

Nathanael Asaam
6
Literature Review
This chapter discusses previous work in the field of intrusion detection system, anomaly detection
systems, neural networks and other literature relevant to this research paper.
Intrusion Detection Systems
Basically, there are two types of intrusion detection systems in the industry based on the approach
used for threat detection and the technologies used to build the system [20]. These are knowledge
based also known as signature based and behaviour based intrusion detection systems [20]. Each
takes a different approach to threat detection and each uses different technology for building the
intrusion detection systems. Also, every single one has its pros and cons. Knowledge based
intrusion detection systems are built on a database of already known threats [20]. These known
vulnerabilities or threats are called threat signatures [20]. Usually, detection is done as direct
mappings of various system incidents that indicate threats with threat signatures [20]. As a result,
the database of threats must be constantly updated for new identified threats [20]. Because new
threats can be detected for inclusion in the database, the correctness of detecting threats is
sometimes compromised since threats which do not have corresponding signatures cannot be
mapped and detected [25]. But these types of intrusion detection systems have lower false alarms
since each detected threat is registered in the database of threat signatures [20].
Behaviour based intrusion detection systems take a different approach to threat detection.
They are built using artificial intelligence technologies [20]. Usually, the system for which the
intrusion detection is built is modelled for its behaviour and deviations from that behaviour is used
as a technique for detecting the threats [20]. Because of this, they have a better correctness at
detecting threats [20]. No threat signatures or mappings of incidents that indicate threat is required
[20]. Additionally, they have higher false alarms because there is no mapping of detected threats
with a database of known threats [20]. Besides these, intrusion detection systems are classified
based on purposes for which they are built and the activeness or passiveness at which they deal
with threats [20]. There are host based and network based intrusion detection systems made for
such purposes [20]. Active intrusion detection systems are configured to block or prevent attacks
while passive intrusion detection systems are configured to monitor, detect and alert threats [20].
Risk Analysis
Computer risk analysis is also called risk assessment [23]. It involves the process of analyzing and
interpreting risk [23]. There are two main types of risk assessments: qualitative and quantitative
[21]. Quantitative Risk Assessments uses mathematical models and simulations to assign
numerical values to risk [22]. Qualitative Risk Assessments relies on an Expert’s subjective
judgement to build a theoretical model of risk for any given situation [22]. It must be stated that,

Nathanael Asaam
7
to analyze risk, the scope and methodology has to be initially determined [23]. Later, information
is collected and analyzed before interpreting the risk analysis results.
[23]. Determining the scope can be described as identifying the system to be analyzed for risk and
parts of the system that will be considered [23]. Also, the analytical method that will be used with
its detail and formality must be planned [23]. The boundary, scope and methodology used during
risk assessment determine the total amount of work efforts that is needed in the risk management,
and the type and usefulness of the assessments result [23].
Risk has many components including assets, threats, likelihood of threat occurrence,
vulnerability, safeguard and consequence [23]. Two formulas for Risk are of paramount
significance to this research paper. The first one is given as; Risk = Threat + Consequence +
Vulnerability [24]. It must be emphasized that, “Risk in this formulas can be broken down to
consider likelihood of threat occurrence, the effectiveness of your current security program and
the consequence of an unwanted criminal or terrorist event occurring” [24]. The second formula
which I have known while I was an Information Security Consultant is given as; Risk = Likelihood
of Threat Occurrence ✕ Impact of Threat Occurrence
This formula is somewhat more suitable in performing risk assessment because it is a bit
simple. It could be used for qualitative risk assessments or even quantitative risk assessments.
Additionally, Risk management includes risk acceptance which takes place after several risk
analyses [25]. Normally, after risk has been analyzed and safeguards implemented, the remaining
or residual risk in the system that makes the system functional must be accepted by management
[25]. This may be due to constraints on the system such as ease of use or features of the systems
for which strict safeguard will cost the organization operational problems. As such, risk
acceptance, like the selection of safeguards, should take into account various factors besides those
addressed in the risk assessment [23]. In addition, risk acceptance should take into account the
limitations of the risk assessment [23].
Anomaly Detection Systems
According to a research paper entitled “Design and Implementation of Anomaly Detection
System”, there are global variables of a network that can be used for detecting anomalous activities
on a network [26]. The paper used a hybrid of signature based and anomaly intrusion detection to
detect anomaly [26]. According to the paper, some of the techniques used for detecting intrusion
include using generic network rules to detect network anomaly. The paper also used dynamic
network knowledge such as network statistics to detect anomalous activities [26].
Behaviour Encryption

Nathanael Asaam
8
Behaviour algorithms are applied to safeguard information on computing devices such as mobile
phones and laptops [27]. These algorithms are the basics for building systems that study and
encrypt user behaviour on a computing device in order to ensure the security of information on the
computing devices [27]. A study into mobile platform security reports that behaviour encryption
application systems have been designed and built, focusing on mobile platforms [27]. Results from
this study indicated that encryption application systems are effective in ensuing mobile platform
security [27].
In addition to this, it must be noted that, since mobile devices can have security through
behaviour encryption systems, then the behaviour of hosts on a network or network systems can
also be encrypted to ensure safe communication since each host or user on a system or network
has a particular behaviour pattern .Cryptographic study into encrypting the normal usage model
can fall under behaviour encryption since the usage model represents a system’s behaviour and
can be composed of a user’s behaviour. This can aid in securing the information that embodies the
usage model. It is also necessary because if the usage model can easily be predicted then it is
possible to manipulate the usage model and launch an attack.
Artificial Neural Network
An ANN is a mathematical representation of the human neural architecture, reflecting its learning
and generalization abilities [29]. ANN is widely used in research because it can model highly non-
linear systems in which the relation among the variables is unknow or very complex [29]. The
basic elements of an ANN are Artificial Neurons, Weight and Biases, activation Functions and
Layers of Neurons [28]. There are three layers of neurons, namely, Input Layer, Hidden Layer and
Output Layer [28]. Also, there are two modalities of ANN. These are Neuron connections, Signal
flow. The two categories of neural connection architectures are monolayer network and multilayer
network [28]. The Signal flow categories are feedforward networks and feedback networks [28].
The types of learning in ANN are Supervised learning and Unsupervised learning [28]. There are
also two stages of learning which are training and testing [28].
Learning Parameters
One important learning parameter is the learning rate. It dictates how strongly the neural weights
would vary in the weights hyperspace. It is denoted by the Greek letter η. The learning process
may be and is expected to be controlled [29].
Another important parameter is the condition for stopping. Usually, the training stops when the
general mean error is reached. However, there are cases where maximum number of iterations or
epochs is the condition for stopping. This is particularly used when the network fails to learn and
there is little or no change in the weights’ values [29].

Nathanael Asaam
9
Activation Function
Let us now look at the activation function. There are four types of activation functions [28]. These
are:
• Sigmoid Logistic – Given by the equation: 𝑓(𝑥) =
1
1+𝑒−𝑥
• Hyperbolic Tangent – Given by the equation: 𝑓(𝑥) =
1−𝑒−𝑥
1+𝑒−𝑥
• Hard Limiting Threshold – Given by the equation: 𝑓(𝑥) = {0 𝑖𝑓 𝑥 < 0, 1 𝑖𝑓 𝑥 ≥ 1} Also
known as Binary Step
• Purely Linear – Given by the equation: 𝑓(𝑥) = 𝑥
It must be emphasized that for binary classification problems, a sigmoid logistic activation
function can be used [33]. Also, to predict values that are larger than 1 Sigmoid Logistic and
Hyperbolic Tangent are not suitable as the activation function.
Below is a comparison of various activation functions and their derivatives [33].

Nathanael Asaam
10
Learning Algorithms
Examples of learning algorithms in ANN are Perceptron and Adaline (Adaptive Linear Neuron).
Below is a diagram that compares Perceptron and Adaline.
Error Measurement & Cost Function
Supposed we have a set of records containing a pair of X and T variables.
Let consider developing a ANN as a mathematical function, ANN () that produces Y when feed
with X. For each x value given to the ANN it will produce a y value that when compared to the t
value gives an error e. e=y-t [28]. Note that this is a mere individual error measurement per data
point [28]. We should take into account a general error measurement covering all N data pairs
because we want the network to learn all data points and the same weights must produce the data
covering the entire data set. That is the role of a cost function.
C(X, T,W)==
𝟏
𝑵
∑ [𝑨𝑵𝑵(𝒙(𝒊)) − 𝒕(𝒊)]²
𝒏=𝑵
𝒊=𝟎 [28]
The function above is the overall measurement of error between the target output and the neural
output where X are the inputs, T are the target outputs and W the weights, x(i) is the input at point
i and t(i) is the target output at point i. Th cost function should be minimized.
Rule Based Model for Analyzing Http Access Logs and Detecting Web Scans, SQL Injection
(SQLI) and Cross-Site Scripting (XSS)
A research paper on using a rule-based model to detect anomaly by analyzing Http Server Access
Logs and Web Scans explain that, according to the European Network and Information Security
Agency (ENISA) Threat Landscape, Web based and Web Application attacks are ranked as number

Nathanael Asaam
11
two and three in Cyber Security Environment [17]. These rankings remain unchanged between
2014 and 2015 [17]. Thus, Web Applications are more prone to Security Risks [17].The research
paper states that Cross-Site Scripting (XSS) and Structure Query Language Injection (SQLI) seem
to be at a decreasing rate in 2014 but increased in 2015 [17]. The paper went further to state that,
to detect all the mentioned attacks and web scans, analyzing log files are preferred due to the fact
that anomalies in users’ request and related server response could be clearly identified [17]. Also,
it must be stated that two primary reasons why analyzing log files is preferred are that there is no
need for expensive hardware for the analysis and also log files provide successful detection
especially for encrypted protocols like Secured Socket Layer (SSL) and Secured Shell Daemon
(SSHD) [17]. However, the paper noted that, the heavier the website traffic the more difficult the
analysis of the log file and this presents the need for a user-friendly web vulnerability scanner
detection tool for analyzing log files [17].
Also, the motivation for this research paper is that, work in this field uses a different
approach, which is machine learning and data mining based predictive detection of malicious
activities [17]. Additionally, in order to increase the accuracy of a machine learning classifier, a
large-scale input training data is needed which in turns leads to increase in memory usage [17].
Another negative point about machine learning based approaches is overfitting; referring to a
model that models the training data too well resulting in the model’s negative predictive
performance and low generalization ability [17]. Finally, the proposed model of this research paper
has three significant assumptions. These are;
1. In access logs POST data cannot be logged. Subsequently, the proposed method cannot
capture this sort of data [17].
2. Browsers or Web Server may support other encodings. Since only two are in the context of
the research paper, the script does not capture data encoded in other styles [17].
3. The proposed model is for detection of two well-known web application attacks and
malicious web vulnerability scans. Thus, the model is not for prevention and working
online mode is not included in the research paper [17].
Classification of Malicious Cyber Activities and Attacks and Vulnerability Scans
A research paper on classification of malicious web sessions states that SANs reported that 60%
of total attack attempts observed on the Internet were against Web Applications [7]. The paper
further states that recently, the long tradition and great success of characterization of network
traffic and server workload is not the focus of research [7]. Also, not much focus is placed on
quantification of malicious attacker behaviour [7]. The one evident reason for this is the lack of
publicly available, good quality data on cyber security threats and malicious attacker activities
[7].The paper explains that, although there is a significant amount of research in intrusion
detection, the focus is on developing data mining techniques aimed at constructing a black-box
that classifies network traffic on malicious and non-malicious activities rather than the discovery

Nathanael Asaam
12
of the nature of malicious activities [7]. Additionally, significant amount of intrusion detection
research works were based on outdated data sets such as the DARPA Intrusion Detection Data Set
and its derivative KDD [7]. Motivated by the lack of available data sets that incorporated attacker
activities, the researchers developed and deployed high interaction honeypots as a means to collect
such data [7]. Their honeypots were configured in a three-tier architecture (consisting of frontend
web server, application server and backend database) and had meaningful functionalities [7].
Furthermore, they ran standard off the shelf operating systems and applications which followed
typical security guidelines and did not include user accounts with nil or weak passwords [7]. The
data collected by the honeypots are grouped into four datasets each with a duration of four to five
months [7].Also, each dataset consisted of malicious web sessions extracted from application level
logs of systems running on the Internet [7].
The research paper used supervised machine learning methods to automatically classify
malicious web sessions on attacks and vulnerability scans and each web session was characterized
with 43 features reflecting different session characteristics such as number of requests in a session,
number of requests of a specific method type (GET, POST, OPTIONS), number of requests to
dynamic application files and length of request substring within a session [7]. In all, the research
paper used three supervised machine learning methods; namely, Support Vector Machines (SVM),
Decision trees based J48, and PART to classify attacker activities aimed at web systems [7].
According to the paper, results show that Supervised Learning methods can be used to efficiently
distinguish attack sessions from vulnerability scan sessions, with very high probability of detection
and low probability of false alarms [7]. Finally, it is worth stating that the research paper explored
the following three research questions;
1. Can Supervised Machine Learning methods be used to distinguish between Web Attacks
and Vulnerability Scans?
2. Do Attacks and Vulnerability Scans differ in small number of features? If so, are these
subset of best features consistent across different datasets?
3. Do some learners perform consistently, better than others across different datasets?
Security Monitoring of Http Traffic Using Extended Flows
A research paper on Security Monitoring of Http Traffic Using Extended Flows states that
Http is currently the most widely used protocol which takes a significant amount of network traffic
[18] The paper further explains that the most suitable way of gaining an overview of Http traffic
in a large-scale network is extended network flow monitoring [18]. There are two approaches to
network traffic monitoring, according to the research paper. These are Deep Packet Inspection
(DPI) and Flow Monitoring. DPI is resource demanding but provides detailed information about a
whole packet including a payload [18]. Network Flow Monitoring is fast but is limited to layers 3
and 4 of the OSI/OSI model but Extended Flow Monitoring is a synergy of the benefits of both

Nathanael Asaam
13
methods [18]. It provides application-level data to traditional flow records while keeping the
ability to monitor large-scales and high-speed networks [18].
The research paper further explains that the correlation of logs from web servers is an
option, but also states that in large networks it is not always possible to gain access to logs or even
be aware of all of them [18].
Thus, this research is more significant to Administrators of Large Networks; in general
Networks of Academics and ISPs [18]. The paper also addresses two problems, which are, lack of
overview of network traffic and insufficient security awareness [18]. The paper also states that
many Administrators oversee Web Servers and their neighbourhood in their administration but are
not aware of security threats in the rest of the network [18]. The other problem is to find a suitable
set of tools to analyze Http traffic and distinguish between legitimate and malicious traffic [18].
The research paper poses these two research questions;
1. What classes of Http traffic relevant to security can be observed at network level, and what
is their impact on attack detection?
2. What is the added value of extended flow compared to traditional flow monitoring from a
security point of view?
The paper also describes three classes of Http traffic which contain brute-force password
attacks, connection to proxies, and Http Scanners and Web crawlers [18]. Using classification, the
paper was able to detect 16 previously undetectable brute-force password attacks and 19 Http
Scans per day in their campus [9]. The activities of proxy servers and web crawlers were also
observed [18]. Another result of this research paper is that four network flows were monitored
[18]. These are source IP address, destination IP address, hostname, and Http Requests [18].

Nathanael Asaam
14
Research Methodology
This chapter of the research paper will describe the methodology of the study.
Data Sources
In this study, the of data that will be used will be obtained from an Apache Access Log file of an
Apache Web Server hosted on a LAMP Stack that hosts some web application and also data from
Risk Assessments of a computer network and also data from a log file of a proxy server on a
network. The data from the Apache Web Server will help us build a usage profile of the Apache
Web Server using behaviour models such as statistical models, machine learning models and
cognitive based models. It is our hope that the data from the Risk assessment of the network and
the data from the log file of the proxy server will help us build a usage profile of the computer
network. Additionally, it is hoped that the data from the Risk assessment of the computer network
will provide us with information about the following variable listed below collated over a period
of time say a month or three months.
• Number of application software that run on the network during a day.
• Number of system processes that run on the network during a day.
• Number of authentications in the network during a day.
• Number of user actions that happens on the network during a day.
• The time it takes a user before his session expires on the network.
• The amount of memory space used on a device such as a personal computer or smartphone
that connects to the network.
• The CPU time spent on a single device such as a personal computer or a smartphone that
connects to the network.
However, if this data cannot be obtained, we hope to deploy a honeypot that will help us capture
this data for a period of time say, a month or three months for each day.
Usage Profile
Assuming that the normal behaviour (Y) of a computer network can be represented by a linear
regression or an artificial intelligence model; Y=f (Xi,) such that Xi represents some variables such
as the number of application software that run on the network during a day or the number of system
processes that run on the network during a day. Then when a change in Y is beyond the standard
deviation determined from the data set of the network usage, then that change indicates an
anomaly. To investigate this anomaly, machine learning algorithms such as Hidden Markov models
and Artificial Neural Networks, and Linear Regression Models and Anomaly Intrusion Detection
systems will be studied to determine Y in terms of a number of variables that represent Y
appropriately. Additionally, a Java Interface that implements the Usage profile will be employed

Nathanael Asaam
15
to investigate how to implement the usage profile of the network. For each component of a
computer network under investigation, we will program a usage profile which is an
implementation for that component. Each usage profile implements an interface captured in a java
file called model.java.
There are eight functions in the model.java interface. The first one is computeval which is
for computing the usage value at an instance. The second one is findchange which is for finding
changes in the usage of the computer system. The third one is learnsys which is for learning the
usage of the system. The fourth one is findrelationship which is for finding the regression
equation. The fifth one is monitor which is for monitoring the usage of the system. The sixth one
is showalarm which is for displaying error messages and detected intrusion. The seventh one is
haltprocess which is for halting detected intrusion and the eighth one is predictvals. It is for
predicting usage values based on the regression equation determined. Omitting an implementation
of one of the functions of the usage model will throw an exception. To implement the usage model,
you will use the java keyword implements. Refer to Appendix 10 for an implementation of the
model.java file and how it can be used to implement a Usage Profile.
Processes
The list below details activities or processes that will be followed to represent a computer network
and web and mobile applications with an abstract model and analyze changes in the computer
network and the web and mobile applications. It is hoped that following these processes will lead
to the design and implementation of an anomaly detection system.
1. Machine Learning Algorithms & Anomaly Intrusion Detection Systems Study
Machine learning techniques and algorithms will be investigated to know the extent to which an
expert system that learns a computer network’s behaviour and the behaviour of web and mobile
application that are used on the network can be built. The behaviour of the computer network and
the web and mobile applications is known in this research paper as a usage profile. Since the
expected usage profile is going to be composed of a linear regression model and an artificial neural
network and a hidden Markov model, linear regression modelling techniques, and artificial neural
networks and Hidden Markov models will be applied to determine the usage profile. When,
deviations from these statistical and artificial intelligence models are analyzed it can lead to design
and implementation of an anomaly detection system. As such, a thorough study into the design and
implementation of anomaly intrusion detection systems will also be done.
2. Analysis of Apache Access Log File and Computer Network Risk Assessment Data
At this stage of the study, we will analyze Apache Access Log files to aid with implementing the
usage profile of the network and web and mobile applications that are used on the network.

Nathanael Asaam
16
Additionally, it is expected that reports of Computer Network Risk assessments will be sampled
and analyzed to arrive at a set of dependent and independent variables and their data set. If this
data cannot be obtained, then we will deploy honeypots that will help us obtain and collate this
data.
3. Modelling of Deviations from the Usage Profile
Differential equations of the regression equation obtained will be investigated to know the extent
to which deviations from the usage profile can be analyzed. Also abstract statistical and artificial
intelligence models of these deviations will be formulated. These abstract models will be
derivatives of the usage profile. This will help us detect anomaly on the computer network and the
Apache Web Server. Additionally, Linear Programming models of the regression equation will be
implemented to help determine points on the Network and the Apache Server where the behaviour
observed is optimum. This will also help us flag anomalous activities.
Proposed System Model
This section describes our proposed Anomaly Detection System model for the LAMP server. The
proposed Anomaly Detection System for the Apache Web Server employs three different but
simple techniques for log file size monitoring, log file entries classification, and Markov Model of
log file sizes.
1. Log File Size Monitoring
First of all, we will check for the size of the Apache Access log file and we will do real time
monitoring of the Apache Access log file in order to determine the rate of change of the file sizes
on a day-to-day basis. Also, we will track log file sizes daily to see if the expected new file size is
within the expected threshold based on statistical measures such as mean log file sizes computed
based on file sizes for a number of days, and standard deviation of that data. If during the
monitoring, we see a deviation, we will record it as an anomaly.
2. Log File Entries Classification
Also, we will analyze the Apache Access log file and classify every log entry as being, a normal
user activity or an abnormal user activity. As such based on that classification model, we can detect
abnormal user activities.
3. Markov Model of the Log Files Sizes
We will also build a Markov Model of the log file sizes using the data for each day. This will help
us infer into the new log file size for the Apache Access logs and be able to predict roughly, the
expected log file size for the next day. As such, for each day, if what the expected file size should
be is not achieved, then we can record it as an anomaly.

Nathanael Asaam
17
Usage Profile List
It must be stated that for each component of the system under investigation, we will develop a
usage profile.
1. Authentication Usage Profile
The authentication usage profile represents the usage of an authentication system. The independent
variables that must be sampled to determine the usage of an authentication system are the size of
data transmitted during an authentication (x1) and the network speed for a single authentication
(x2). The data transmitted is composed of the of request and response data for a single
authentication and the network speed is the composed of upload and download speed for a single
authentication. The dependent variable that must be sampled is the time taken for an authentication
(y).
2. Session Usage Profile
A session usage profile represents a single user’s behavior before his session expires. To determine
the regression or artificial intelligence model for a user’s session, two main independent variables
must be sampled. These are size of session data accumulated (x1), and number of user actions (x2).
The dependent variable that must be sampled is time spent before session expires (y).
3. Memory Usage Profile
The memory usage profile represents the usage of memory space in a device such as a personal
computer or a smartphone that is used on the computer network. The independent variables that
must be sampled are the number of application programs running (x1), and the number of system
processes running (x2). The dependent variable that must be sample is amount of memory space
used(y).
4. CPU Usage Profile
The CPU usage profile represents CPU usage in a device such as a personal computer or a
smartphone used on the computer network. The independent variables that must be sampled are
the number of application programs running (x1), and number of system processes running (x2).
The dependent variable that must be sampled is amount of CPU power being used (y).
5. Host Usage Profile
The host usage profile is composed of three independent variables. Memory usage (x1), session
usage (x2), and CPU usage (x3) derived from their respective usage profiles. The dependent
variable that must be sampled is the time host spent on the network (y).

Nathanael Asaam
18
6. Server Usage Profile
The server usage profile is made up of the CPU time being used and the memory space being used
These are the independent variables. The dependent variable for the Server Usage profile is the
time the server has been running.
7. Network Usage Profile
The network usage profile is made up of, server usage, host usage, and time spent on the network.
The first two variables are the independent variables. The last one is the dependent variables.
8. Aggressive Usage Detector
This usage profile is a utility that detects aggressive behavior on a system. It is modelled just like
the various usage profiles. Various factors that determine aggressive behavior during system usage
are used to determine the regression or artificial intelligence profile of this utility. Aggressive
behavior includes aggressive use of major system resources, and aggressive use of system
components with limited resources. The average aggressive behavior and its standard deviation
are determined. Any system occurrence that indicates the average aggressive behavior, or the
average aggressive behavior plus its standard deviation or the average aggressive behavior minus
its standard deviation is considered an attack or a threat and must be halted, alerted or stored for
audit purposes.
9. False Alarm Detector
The false alarm detector is a utility that detects normal system usage that otherwise may be deemed
as an attack or a threat. Occurrences that meet the criteria for false alarms are normal usage that
seems to put the entire usage of the system into a false state of vibration or anarchy. Such usage
occurrences are as such prioritized as normal optimal usage. The remedy for the vibrations such
usage occurrences cause is the delay in other usage occurrences in the system. The state and
magnitude of other system occurrences plus the state and magnitude of the normal optimal usage
determine the impact of the perceived anarchy. To increase convenience with which the system for
which this utility is developed, the average delay time and its standard deviation must be detected.
This utility is part of the normal usage and is modelled just like the aggressive usage detector.

Nathanael Asaam
19
Conclusion
It must emphasized that this paper describes some of the simplest techniques that can be used to
develop a usage profile of a computer network connected to the internet and web and mobile
applications that are used on the network. These usage profiles represent the behaviour of the
computer network and the web and mobile application. Three techniques for building an artificial
intelligence model of an Apache Web server are described in the research paper and these
techniques are also easy to implement.

Nathanael Asaam
20
References
1. Neural Network-based Anomaly Detection Models and Interpretability Methods for Multivariate
Time Series data https://su.diva-portal.org/smash/get/diva2:1784432/FULLTEXT01.pdf
2. Understanding Apache Logging: How to View, Locate and Analyze Access & Error Logs
https://sematext.com/blog/apache-
logs/#:~:text=The%20Apache%20access%20logs%20are,used%20to%20request%20the%20data.
3. Artificial neural network based techniques for anomaly detection in Apache Spark
https://www.researchgate.net/publication/336769100_Artificial_neural_networks_based_tec
hniques_for_anomaly_detection_in_Apache_Spark
4. How to View and Configure Apache Access and Error Logs
https://betterstack.com/community/guides/logging/how-to-view-and-configure-apache-access-
and-error-logs/
5. HTTP request methods https://developer.mozilla.org/en-US/docs/Web/HTTP/Methods
6. HTTP response status codes https://developer.mozilla.org/en-US/docs/Web/HTTP/Status
7. K. Goseva-Popstojanova, G. Anastasovski and R. Pantev. Classification of Malicious web
sessions. https://community.wvu.edu/~kagoseva/Papers/ICCCN-2012.pdf
8. W. Frisby, B. Moench, B. Rench and T. Ristenpart. Security Analysis of Smartphone point-
of-sale-systems. https://www.usenix.org/system/files/conference/woot12/woot12-
final25.pdf
9. M. B. Seyyar, F. O. Catak and E. Gul. Detection of attack-targeted scans from the Apache
Http Server access logs.
https://www.sciencedirect.com/science/article/pii/S2210832717300169
10. Hackin9 Practical Protection Security Magazine
https://www.slideshare.net/RodrigoGomesPires/hakin9-05-2013?from_search=3
11. What is Apache https://www.wpbeginner.com/glossary/apache/
12. Ubuntu Server Documentation
https://ubuntu.com/server/docs#:~:text=Ubuntu%20Server%20is%20a%20version,your%20data
centr e%2C%20public%20or%20private
13. What is Mysql and How Does it work https://www.hostinger.com/tutorials/what-is-mysql
14. What is Mysql https://dev.mysql.com/doc/refman/8.0/en/what-is-mysql.html
15. What is PHP https://www.php.net/manual/en/intro-whatis.php
16. What is PHP? Learning All about the Scripting Language
https://www.hostinger.com/tutorials/what-is-php/
17. Detection of Attack Targeted Scans From Apache HTTP Server Access
Logs https://www.sciencedirect.com/science/article/pii/S2210832717300169
18. Security Monitoring of Http Traffic Using Extended Flows
https://is.muni.cz/publication/1300438/http_security_monitoring-paper.pdf

Nathanael Asaam
21
19. Analyzing Http Request for Web Intrusion Detection
https://www.semanticscholar.org/paper/Analyzing-HTTP-requests-for-web-intrusion-detection-
Althub iti-Yuan/f3adfc7e7686114ce2cb1a1eb7dc22848fdf13ca
20. Intrusion Detection System (IDS) https://searchsecurity.techtarget.com/definition/intrusion-
detection-system
21. What is Risk Management and why is it important
https://www.techtarget.com/searchsecurity/definition/risk-analysis [30-04-2023]
22. Risk Analysis: Definition, Types, Limitations and Examples
https://www.investopedia.com/terms/r/risk-analysis.asp [30 -04-2023]
23. An Introduction to Computer Security. Chapter 7: Computer Security Risk Management
https://csrc.nist.rip/publications/nistpubs/800-12/800-12-html/chapter7.html [30-04-2023]
24. The Three Components of Security Risk Assessments https://www.securingpeople.com/security-
risk-assessment/threat-vulnerability-risk/#:~:text
=Risk%20%3D%20Threat%20%2B%20Consequence%20%2B%20Vulnerability,criminal%20or
%20terrorist%20event%20occurring. [30-04-2023]
25. Risk Acceptance https://www.enisa.europa.eu/topics/risk-management/current-risk/risk-
management-invent ory/rm-process/risk-acceptance [30-04-2023]
26. Gaia Maselli, Luca Deri, Stefano Suin, Design and Implementation of an Anomaly Detection
System: an Empirical Approach https://luca.ntop.org/ADS.pdf
27. Jian g Chunfeng, Research and application of behavior encryption [27-31 May 2012]
http://ieeexplore.ieee.org/xpl/articleDetails.jsp?tp=&arnumber=6320096&contentType=Confer
e nce+Publications&queryText%3DCyber+Security+Papers+.PLS.+mobile+phones
28. F. M. Soares, A. M. F. Souza Neural Network Programming with Java
https://img1.wsimg.com/blobby/go/389886fb-5e50-4fa8-b2d8-
5849bb0f2b64/downloads/1ca32jha0_704519.pdf
29. F. Amato, A. Lopez, E. M. Pena-Mendez, P. Vanhara, A. Hampl, J Havel Artificial neural networks in
medical diagnosis
https://www.researchgate.net/publication/250310836_Artificial_neural_networks_in_medical_
diagnosis
30. Artificial Intelligence (AI) in Medical Diagnostics Market by Modality(CT, X-RAY, MRI, Ultrasound),
Application (IVD, Radiology, CNS, CVS, Ob/Gyn), User (Hospital, Lab), Unmet Need, Key
Stakeholders, Buying Criteria – Global Forecast https://www.marketsandmarkets.com/Market-
Reports/artificial-intelligence-medical-diagnostics-market-
22519734.html?gad_source=1&gclid=Cj0KCQiAwvKtBhDrARIsAJj-kTgP5p6mDNGEPgB-
rwe9NDVTVZzpFglIk9WqNLS40-RSp0iFYMH_0pMaAorAEALw_wcB
31. M. Banoula What is Perceptron: A Beginner Guide for Perceptron
https://www.simplilearn.com/tutorials/deep-learning-tutorial/perceptron

Nathanael Asaam
22
32. Adaline Concepts and definitions
https://gamco.es/en/glossary/adaline/#:~:text=ADALINE%20(Adaptive%20Linear%20Neuron)%2
0is,uses%20a%20linear%20activation%20function.
33. L. Panneerselvam Activation Functions and their Derivatives – A Quick and Complete Guide
https://www.analyticsvidhya.com/blog/2021/04/activation-functions-and-their-derivatives-a-
quick-complete-guide/

Nathanael Asaam
23
Appendix 1
Location of Apache Log Files
The Apache log files can be located and viewed depending on the Operating System on which the Apache
Web Server is hosted [2]. For Debian and Ubuntu Linux distributions, the access log file can be found and
viewed at /var/log/apache2/access.log [2]. For Red Hat, CentOS and Fedora Linux distributions, the
access log file can be found at /var/log/httpd/access_log [2]. On FreeBSD, the access log file is stored in
/var/log/httppd-access.log [2]. Also, the error log file can be located at var/log/apache2/error.log in
Ubuntu and Debian distributions and at /var/log/httpd/error_log in CentOS, Fedora and Red Hat
Enterprise Linux (RHEL) distributions [1]. Also, on FreeBSD Linux distributions, the error log file is stored
in var/log/httpd-error.log [2].
Additionally, it must be stated that Apache Access Log files stores information about incoming
requests [1]. The details captured include the time of the request, the requested resource, the response
code, the time it took to respond, and the IP address used to request the data [2]. Also, it is paramount
to mention that understanding log files is easier when you use a log analysis tool that parses the data and
gives you an aggregate view of the data [2].
Appendix 2
1. The Apache Common Log Format
The definition of the Common Log Format looks like:
LogFormat “%h %l %u %t ”%r” %>s %b” common
Below is how an access log from that file looks like:
10.1.2.3 – rehg [10/Nov/2021:19:22:12 -0000] “GET /sematext.png HTTP/1.1” 200 3424
As you can see the following elements are present:
• %h, resolved into 10.1.2.3 – the IP address of the remote host that made the request
• %l, remote log name provided by identd, in our case a hyphen is provided, which is a value that
we can expect to be logged in a case when the information provided by the logging directive is
not found or can’t be accessed.
• %u, resolved into rehg, the user identifier determined by HTTP authentication.
• %t, the date and time of the request with time zone, in the above case it is
[10/Nov/2021:19:22:12 -0000]
• ”%r”, the first line of the request inside double quotes, in the above case it is: “GET
/sematest.png HTTP/1.1”
• %>s, the status code reported to the client. This information is crucial because it determines
whether the request was successful or not.

Nathanael Asaam
24
• %b, the size of the object sent to the client, in our case the object was sematext.png file and its
size was 3423 bytes.
2. The Apache Combine Log Format
The Combined Log Format is very similar to the Common Log Format but include two additional headers
– the referrer and the user agent. Its definition looks like:
LogFormat “%h %l %u %t ”%r” %>s %b” ”%{Referer}i” ”%{User-agent}i” combined
An example of log line produced by the above log looks like:
10.1.2.3 – grah [12/Nov/2021:14:25:32 -000] “GET /sematext.png HTTP/1.1” 200 3423
http://www.sematext.com/index.html “Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebkit/537.36
(KHTML, like Gecko) Chrome/79.0.3945.74 Safari/537.36 Edg/79.0.309.43”
Appendix 3
HTTP Request Methods
There are several HTTP request methods. Below is a description of the types of HTTP request methods.
HEAD: The HEAD method asks for a response identical to a GET request, but without the response body.
GET: The GET method requests a representation of the specified resource. Requests using GET should only
retrieve data.
POST: The POST method submits an entity to the specified resource, often causing a change in state or
side effects on the server.
PUT: The PUT method replaces all current representations of the target resource with the request payload.
DELETE: The DELETE method deletes the specified resource.
CONNECT: The CONNECT method establishes a tunnel to the server identified by the target resource.
OPTIONS: The OPTIONS method describes the communication options for the target resource.
TRACE: The TRACE method performs a message loop-back test along the path to the target resource.
PATCH: The PATCH method applies partial modifications to a resource.
Appendix 4
HTTP Respond Status Codes
The are several HTTP response status codes. These codes indicate whether a particular HTTP request was
successful or not. The responses and as such their corresponding codes are group into five class. Below
are the five classes.

Nathanael Asaam
25
a. Informational responses (100 to 199)
b. Successful responses (200 to 299)
c. Redirection messages (300 to 399)
d. Client error responses (400 to 4990
e. Server error responses (500 to 599)
Appendix 5
Informational Responses
100 - Continue: This interim response indicates that the client should continue the request or ignore the
response if the request is already finished.
101 - Switching Protocol: This code is sent in response to an Upgrade request header from the client and
indicates the protocol the server is switching to.
102 - Processing: This code indicates that the server has received and is processing the request, but no
response is available yet.
103 - Early Hints: This status code is primarily intended to be used with the link header, letting the user
agent start preloading resources while the server prepares a response or preconnect to an origin from
which the page will need resources.
Appendix 6
Successful Responses
200 - OK: The request succeeded. The result meaning of "success" depends on the HTTP method:
a. GET: The resource has been fetched and transmitted in the message body.
b. HEAD: The representation headers are included in the response without any message body.
c. PUT or POST: The resource describing the result of the action is transmitted in the message body.
d. TRACE: The message body contains the request message as received by the server.
201 - Created: The request succeeded, and a new resource was created as a result. This is typically the
response sent after POST requests, or some PUT requests.
202 - Accepted: The request has been received but not yet acted upon. It is noncommittal, since there is
no way in HTTP to later send an asynchronous response indicating the outcome of the request. It is
intended for cases where another process or server handles the request, or for batch processing.
203 - Non-Authoritative Information: This response code means the returned metadata is not exactly the
same as is available from the origin server but is collected from a local or a third-party copy. This is mostly
used for mirrors or backups of another resource. Except for that specific case, the 200 OK response is
preferred to this status.

Nathanael Asaam
26
204 - No Content: There is no content to send for this request, but the headers may be useful. The user
agent may update its cached headers for this resource with the new ones.
Appendix 7
Redirection messages
Appendix 8
Client error response
Appendix 9
Server error response
Appendix 10
1. Generic Usage Profile Interface
public interface model{ public double
computeval(); public double
findchange(); public void learnsys(int
t); public Object findrelationship();
public void monitor(int t);
public void showalarm(String info); public void
haltprocess();
public void predictvals();
}
2. An implementation of the Usage Profile
class auth_usage implements model{
/*variable declaration for dependent and independent variables */ public double
computeval(){
}
public double findchange(){
}
public void learnsys(int t){

Nathanael Asaam
27
}
public Object findrelationship(){
}
public void monitor(int t){
}
public void showalarm(String info){
}
public void haltprocess(){
}
public void predictvals(){
}
}/* end of class

An Artificial Neural Network Based Anomaly Detection for Computer Networks Connected to the Internet and Web and Mobile Applications that are Accessed through the Computer Network

Recommended

Recommended

More Related Content

Similar to An Artificial Neural Network Based Anomaly Detection for Computer Networks Connected to the Internet and Web and Mobile Applications that are Accessed through the Computer Network

Similar to An Artificial Neural Network Based Anomaly Detection for Computer Networks Connected to the Internet and Web and Mobile Applications that are Accessed through the Computer Network (20)

Recently uploaded

Recently uploaded (20)

An Artificial Neural Network Based Anomaly Detection for Computer Networks Connected to the Internet and Web and Mobile Applications that are Accessed through the Computer Network