Nathanael Asaam
Founder and CEO @ Equicksales Consulting Ltd | Application Support Officer @ Ashesi
University
nataoasaam@gmail.com
An Anomaly Detection System for Ecommerce Sites Hosted on a LAMP Server
Introduction
Anomaly detection uses deviations from the normal patterns of a system to detect intrusions and threats on a
computer system. In this paper, we describe how to build an anomaly detection system for ecommerce sites
that are hosted on a LAMP server. LAMP is an acronym that stands for Linux, Apache, MySQL and PHP.
The motivation for this research paper is that LAMP servers normally come with application logs for
the various software components that make up the LAMP stack. Thus, it is relatively easy to build either a
machine learning model that describes the behaviour of these components or a rule-based model for
anomaly detection. This is partly because the training data can easily be obtained from the logs for Apache,
MySQL, PHP, and any Linux distribution. Additionally, the logs are updated whenever there are interactions
with these components. There are also error logs that record errors that occurred in them.
In this project, we describe how to build a model for a LAMP server that runs Ubuntu Server and how to use it
to detect anomalous activities on a web application or ecommerce site hosted on that server. It is worth
noting that Ubuntu Server, Apache, MySQL and PHP are all free and open source.
Background
In this section we give a concise background of Ubuntu Server, Apache, MySQL and PHP.
Ubuntu Server
Ubuntu Server is a version of the Ubuntu operating system that is designed and engineered as a backbone for
the Internet [2]. Ubuntu Server brings economic and technical scalability to your datacentre, public or private
[2]. Whether you want to deploy an OpenStack cloud, a Kubernetes cluster, or a 50,000-node render farm, it
delivers the best value scale-out performance available [2].
Apache
Apache, also known as Apache HTTP Server, is the most widely used web server software and, according to [1],
runs on 67% of all websites in the world. It is developed and maintained by the Apache Software Foundation [1].
It is fast, reliable and secure, and can be highly customized to meet the needs of many different environments
using extensions and modules [1].
MySQL
MySQL is an open-source Relational Database Management System (RDBMS) that enables users to
store, manage and retrieve structured data efficiently [4]. It is widely used for various applications, from
small-scale projects to large-scale websites and enterprise-level solutions [4]. It is also the most popular
open-source SQL DBMS and is developed, supported and distributed by Oracle Corporation [5].
PHP
PHP (a recursive acronym for PHP: Hypertext Preprocessor) is a widely used general-purpose scripting language
that is well suited for web development and can be embedded in HTML [6]. According to Web Technology
Surveys, PHP is used by 78.1% of all websites, including high-traffic websites such as Facebook and Wikipedia
[7].
Previous Work
In this section we describe several works related to vulnerability scan detection and to detecting intrusions
or attacks by analyzing Apache HTTP Server logs.
Rule-Based Model for Analyzing HTTP Access Logs and Detecting Web Scans, SQL Injection (SQLI) and
Cross-Site Scripting (XSS)
A research paper on using a rule-based model to detect anomalies by analyzing HTTP server access logs
explains that, according to the European Network and Information Security Agency (ENISA) Threat
Landscape, web-based attacks and web application attacks are ranked second and third among cyber security
threats [3]. These rankings remained unchanged between 2014 and 2015 [3]. Thus, web applications are
particularly prone to security risks [3].
The research paper states that Cross-Site Scripting (XSS) and Structured Query Language Injection
(SQLI) appeared to be decreasing in 2014 but increased in 2015 [3]. The paper goes on to state that, to detect
these attacks and web scans, analyzing log files is preferred because anomalies in users' requests and the
related server responses can be clearly identified [3]. Two primary reasons why log file analysis is preferred
are that it needs no expensive hardware and that log files allow successful detection even for encrypted
protocols such as Secure Sockets Layer (SSL) and Secure Shell Daemon (SSHD) [3]. However, the paper notes
that the heavier the website traffic, the more difficult the analysis of the log file becomes, which creates the
need for a user-friendly tool that detects web vulnerability scans by analyzing log files [3].
The paper is also motivated by the observation that other work in this field uses a different approach,
namely machine learning and data mining based predictive detection of malicious activities [3]. To increase
the accuracy of a machine learning classifier, large-scale training data is needed, which in turn leads to
increased memory usage [3]. Another drawback of machine learning based approaches is overfitting, in which a
model fits the training data too closely, resulting in poor predictive performance and low generalization
ability [3].
Finally, the proposed model of this research paper rests on three significant assumptions:
1. POST data cannot be logged in access logs. Consequently, the proposed method cannot capture this
sort of data [3].
2. Browsers or web servers may support other encodings. Since only two encodings are within the scope of the
research paper, its script does not capture data encoded in other styles.
3. The proposed model is for the detection of two well-known web application attacks and malicious web
vulnerability scans. Thus the model is not intended for prevention, and an online working mode is not covered
in the research paper.
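To make the idea of a rule-based check on access logs concrete, the following is a minimal sketch in Python. It is
not the rule set from [3]: the regular expressions, the function name classify_request and the sample log line are
all hypothetical and chosen only for illustration. In line with the assumptions above, it looks only at the
request line (POST bodies are not in the access log) and it handles only URL percent-encoding.

```python
import re
from urllib.parse import unquote_plus

# Illustrative patterns only (assumptions); real rule sets are far more extensive.
SQLI_PATTERN = re.compile(
    r"(\bunion\b.+\bselect\b|\bor\b\s+\d+\s*=\s*\d+|;\s*drop\b)", re.IGNORECASE
)
XSS_PATTERN = re.compile(
    r"(<\s*script|javascript:|onerror\s*=|onload\s*=)", re.IGNORECASE
)

# The quoted request part of an access log line: "METHOD URI PROTOCOL"
REQUEST_PATTERN = re.compile(r'"(?P<method>[A-Z]+) (?P<uri>\S+) (?P<protocol>[^"]+)"')


def classify_request(log_line):
    """Label one access log line as 'sqli', 'xss', 'normal' or 'unparsed'."""
    match = REQUEST_PATTERN.search(log_line)
    if match is None:
        return "unparsed"
    # Decode percent-encoding (and '+' as space) before applying the rules.
    uri = unquote_plus(match.group("uri"))
    if SQLI_PATTERN.search(uri):
        return "sqli"
    if XSS_PATTERN.search(uri):
        return "xss"
    return "normal"


sample = (
    '203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] '
    '"GET /item.php?id=1%20UNION%20SELECT%20password%20FROM%20users HTTP/1.1" 200 512'
)
print(classify_request(sample))
```

Such rules are easy to read and audit, but they only detect the patterns they were written for, which is one reason
they are usually combined with other techniques.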
Classification of Malicious Cyber Activities and Attacks and Vulnerability Scans
A research paper on the classification of malicious web sessions states that SANS reported that 60% of the
total attack attempts observed on the Internet were against web applications [8]. The paper further states
that, despite the long tradition and great success of characterizing network traffic and server workload, not
much focus has been placed on quantifying malicious attacker behaviour [8]. One evident reason for this is the
lack of publicly available, good-quality data on cyber security threats and malicious attacker activities [8].
The paper explains that, although there is a significant amount of research on intrusion detection, the
focus has been on developing data mining techniques aimed at constructing a black box that classifies network
traffic into malicious and non-malicious activities, rather than on discovering the nature of malicious
activities [8]. Additionally, a significant amount of intrusion detection research was based on outdated data
sets such as the DARPA Intrusion Detection Data Set and its derivative, KDD [8]. Motivated by the lack of
available data sets that incorporate attacker activities, the researchers developed and deployed
high-interaction honeypots as a means to collect such data [8]. Their honeypots were configured in a three-tier
architecture (consisting of a frontend web server, an application server and a backend database) and had
meaningful functionality [8]. Furthermore, they ran standard off-the-shelf operating systems and applications
which followed typical security guidelines and did not include user accounts with empty or weak passwords [8].
The data collected by the honeypots are grouped into four datasets, each with a duration of four to five
months [8]. Each dataset consists of malicious web sessions extracted from application-level logs of systems
running on the Internet [8].
The research paper used supervised machine learning methods to automatically classify malicious
web sessions into attacks and vulnerability scans. Each web session was characterized by 43 features
reflecting different session characteristics, such as the number of requests in a session, the number of
requests of a specific method type (GET, POST, OPTIONS), the number of requests to dynamic application files
and the length of request substrings within a session [8]. In all, the research paper used three supervised
machine learning methods, namely Support Vector Machines (SVM), the decision-tree-based J48, and PART, to
classify attacker activities aimed at web systems [8]. According to the paper, the results show that supervised
learning methods can be used to efficiently distinguish attack sessions from vulnerability scan sessions, with
a very high probability of detection and a low probability of false alarms [8].
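As an illustration of what session-level features of this kind might look like, the sketch below computes a handful
of them in Python. It is only a sketch under our own assumptions: the function name session_features, the choice of
six features (rather than the 43 used in [8]) and the rule that a path ending in ".php" counts as a dynamic
application file are all hypothetical.

```python
from collections import Counter


def session_features(requests):
    """Compute a few per-session features.

    requests: list of (method, path) tuples that belong to one web session.
    """
    methods = Counter(method for method, _ in requests)
    # Assumption: a request is "dynamic" if its path (without query string) ends in .php.
    dynamic = sum(1 for _, path in requests if path.split("?")[0].endswith(".php"))
    lengths = [len(path) for _, path in requests]
    return {
        "num_requests": len(requests),
        "num_get": methods["GET"],
        "num_post": methods["POST"],
        "num_options": methods["OPTIONS"],
        "num_dynamic": dynamic,
        "max_request_length": max(lengths, default=0),
    }


session = [("GET", "/index.php"), ("POST", "/login.php?next=/"), ("GET", "/style.css")]
print(session_features(session))
```

A feature dictionary like this, computed for every session, could then be passed to any supervised learner once
labelled sessions are available.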
Finally, it is worth stating that the research paper explored the following three research questions:
1. Can supervised machine learning methods be used to distinguish between web attacks and
vulnerability scans?
2. Do attacks and vulnerability scans differ in a small number of features? If so, are these subsets
of best features consistent across different datasets?
3. Do some learners perform consistently better than others across different datasets?
Security Monitoring of HTTP Traffic Using Extended Flows
A research paper on security monitoring of HTTP traffic using extended flows states that HTTP is currently the
most widely used protocol and accounts for a significant amount of network traffic [9]. The paper further
explains that the most suitable way of gaining an overview of HTTP traffic in a large-scale network is extended
network flow monitoring [9]. According to the research paper, there are two approaches to network traffic
monitoring: Deep Packet Inspection (DPI) and flow monitoring. DPI is resource demanding but provides detailed
information about a whole packet, including its payload [9]. Network flow monitoring is fast but is limited to
layers 3 and 4 of the ISO/OSI model, whereas extended flow monitoring combines the benefits of both methods
[9]. It adds application-level data to traditional flow records while keeping the ability to monitor
large-scale and high-speed networks [9].
The research paper further explains that correlating logs from web servers is an option, but also
states that in large networks it is not always possible to gain access to logs or even be aware of all of them [9].
Thus this research is most relevant to administrators of large networks, in general academic networks
and ISPs [9]. The paper addresses two problems. The first is the lack of an overview of network traffic and
insufficient security awareness: many administrators oversee web servers and their neighbourhood, but are not
aware of security threats in the rest of the network [9]. The second is finding a suitable set of tools to
analyze HTTP traffic and distinguish between legitimate and malicious traffic [9].
The research paper poses these two research questions:
1. What classes of HTTP traffic relevant to security can be observed at network level, and what is
their impact on attack detection?
2. What is the added value of extended flow compared to traditional flow monitoring from a
security point of view?
The paper also describes three classes of HTTP traffic, which contain brute-force password attacks,
connections to proxies, and HTTP scanners and web crawlers [9]. Using this classification, the paper was able
to detect 16 previously undetectable brute-force password attacks and 19 HTTP scans per day in the authors'
campus network [9]. The activities of proxy servers and web crawlers were also observed [9]. The research paper
also reports that four flow fields were monitored: source IP address, destination IP address, hostname,
and HTTP request [9].
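To show how records with these four fields could be used, the sketch below counts repeated login requests per
source address and hostname. This is our own simplified illustration and not the classification method of [9]:
the Flow record, the function name suspected_brute_force, the login path "/login.php" and the threshold of 20
requests are all hypothetical.

```python
from collections import Counter, namedtuple

# One extended flow record with the four monitored fields.
Flow = namedtuple("Flow", ["src_ip", "dst_ip", "hostname", "request"])


def suspected_brute_force(flows, login_path="/login.php", threshold=20):
    """Return (src_ip, hostname) pairs with at least `threshold` login POSTs."""
    attempts = Counter(
        (flow.src_ip, flow.hostname)
        for flow in flows
        if flow.request.startswith("POST " + login_path)
    )
    return sorted(key for key, count in attempts.items() if count >= threshold)


flows = (
    [Flow("198.51.100.7", "192.0.2.10", "shop.example.com", "POST /login.php")] * 25
    + [Flow("203.0.113.9", "192.0.2.10", "shop.example.com", "POST /login.php")] * 3
    + [Flow("203.0.113.9", "192.0.2.10", "shop.example.com", "GET /index.php")] * 50
)
print(suspected_brute_force(flows))
```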
Proposed System Model
This section describes our proposed Anomaly Detection System model for the LAMP server. The proposed
Anomaly Detection System employs three different but simple techniques: log file size monitoring, log file
entry classification, and a Markov model of log file sizes.
Log File Size Monitoring
First, we will check the sizes of the log files for Ubuntu Server, Apache, MySQL and PHP, and we will monitor
the log files of all these components in real time in order to determine the rate of change of the file sizes
on a day-to-day basis.
We will also track log file sizes daily to see whether each new file size is within an expected
threshold based on statistical measures such as the mean log file size, computed over the file sizes for a
number of days, and the standard deviation of that data. If during monitoring we see a file size that falls
outside the threshold, we will record it as an anomaly.
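A minimal sketch of this threshold check is given below. It assumes that one size per day has already been
recorded for a log file (for example with os.path.getsize), and it flags a new daily growth value that lies more
than k standard deviations from the mean. The function names, the value k = 3 and the sample numbers are
assumptions made for illustration. Log rotation, which makes a file shrink, is not handled here.

```python
import statistics


def daily_growth(sizes):
    """Turn a series of daily file sizes (bytes) into day-to-day growth values."""
    return [later - earlier for earlier, later in zip(sizes, sizes[1:])]


def is_size_anomalous(history, new_value, k=3.0):
    """Return True if new_value deviates from the mean of history by more than k std devs."""
    if len(history) < 2:
        return False  # not enough data to estimate a standard deviation
    mean = statistics.mean(history)
    std = statistics.stdev(history)
    if std == 0:
        return new_value != mean
    return abs(new_value - mean) > k * std


history = [100, 110, 90, 105, 95]  # hypothetical daily growth in kilobytes
print(is_size_anomalous(history, 104))  # within the threshold
print(is_size_anomalous(history, 400))  # far outside the threshold
```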
Log File Entries Classification
We will also analyze the log files and classify every log entry as either a normal user activity or an abnormal
user activity, whether the file is an access log or an error log. Based on that classification model,
we can detect abnormal user activities.
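The classification model itself is not fixed here. As one possible starting point, the sketch below labels
individual Apache access log and error log entries with a few hand-written rules. The rules are assumptions made
for illustration only (which status codes, request methods and error levels count as abnormal would have to be
tuned for a real site), and the function name classify_entry is hypothetical.

```python
import re

# Access log entry: quoted request followed by a three-digit status code.
ACCESS_RE = re.compile(r'"(?P<method>[A-Z]+) \S+ [^"]*" (?P<status>\d{3}) ')
# Error log entry: a "[module:level]" or "[level]" field.
ERROR_RE = re.compile(
    r"\[(?:\w+:)?(?P<level>emerg|alert|crit|error|warn|notice|info|debug)\]"
)

ABNORMAL_LEVELS = {"emerg", "alert", "crit", "error"}  # assumption
NORMAL_METHODS = {"GET", "POST", "HEAD"}  # assumption


def classify_entry(line):
    """Label a log entry as 'normal', 'abnormal' or 'unknown'."""
    access = ACCESS_RE.search(line)
    if access:
        status = int(access.group("status"))
        unusual_method = access.group("method") not in NORMAL_METHODS
        suspicious_status = status in (401, 403) or status >= 500
        return "abnormal" if unusual_method or suspicious_status else "normal"
    error = ERROR_RE.search(line)
    if error:
        return "abnormal" if error.group("level") in ABNORMAL_LEVELS else "normal"
    return "unknown"


print(classify_entry('203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "GET /index.php HTTP/1.1" 200 512'))
print(classify_entry('203.0.113.5 - - [10/Oct/2023:13:55:40 +0000] "DELETE /admin.php HTTP/1.1" 403 199'))
```

Entries labelled in this way could also serve as initial training data if the rules are later replaced by a
learned classifier.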
Markov Model of the Log Files Sizes
We will also build a Markov model of the log file sizes using the data for each day. This will help us reason
about the new log file sizes for the various software logs and roughly predict the expected log file size for
the next day. For each day, if the observed file size does not match the expected file size, we can record
it as an anomaly.
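One simple way to realize such a model, sketched below, is to map each daily size to a discrete state and to
estimate the transition probabilities from counts, that is, P(next = j | current = i) = n_ij / sum_k n_ik, where
n_ij is the number of observed transitions from state i to state j. The three states, their boundaries, the
minimum probability of 0.1 and all function names are assumptions made for illustration.

```python
from collections import defaultdict


def to_state(size, low, high):
    """Map a daily log size to one of three coarse states."""
    if size < low:
        return "low"
    if size < high:
        return "medium"
    return "high"


def fit_transitions(states):
    """Estimate first-order transition probabilities from a state sequence."""
    counts = defaultdict(lambda: defaultdict(int))
    for current, nxt in zip(states, states[1:]):
        counts[current][nxt] += 1
    probs = {}
    for current, nexts in counts.items():
        total = sum(nexts.values())
        probs[current] = {state: n / total for state, n in nexts.items()}
    return probs


def is_transition_anomalous(probs, current, observed, min_prob=0.1):
    """Flag a transition whose estimated probability is below min_prob.

    Transitions from a state never seen in training have probability 0 and are flagged.
    """
    return probs.get(current, {}).get(observed, 0.0) < min_prob


sizes = [120, 130, 125, 300, 128, 122, 135, 127]  # hypothetical daily sizes in kilobytes
states = [to_state(size, low=100, high=200) for size in sizes]
probs = fit_transitions(states)
print(is_transition_anomalous(probs, "medium", "low"))
```

With only a few days of data the estimated probabilities are very rough, so in practice the model would need a
longer history before its flags are meaningful.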
Conclusion
This research paper describes three techniques that can be employed to detect anomalies on a LAMP server
that hosts a web application or ecommerce site. These three techniques are simple to understand
and relatively easy to implement.
References
1. What is Apache https://www.wpbeginner.com/glossary/apache/
2. Ubuntu Server Documentation
https://ubuntu.com/server/docs#:~:text=Ubuntu%20Server%20is%20a%20version,your%20datacentre%2C%20public%20or%20private.
3. Detection of Attack Targeted Scans From Apache HTTP Server Access Logs
https://www.sciencedirect.com/science/article/pii/S2210832717300169
4. What is Mysql and How Does it work https://www.hostinger.com/tutorials/what-is-mysql
5. What is Mysql https://dev.mysql.com/doc/refman/8.0/en/what-is-mysql.html
6. What is PHP https://www.php.net/manual/en/intro-whatis.php
7. What is PHP? Learning All about the Scripting Language
https://www.hostinger.com/tutorials/what-is-php/
8. Classification of Malicious web Sessions
https://community.wvu.edu/~kagoseva/Papers/ICCCN-2012.pdf
9. Security Monitoring of Http Traffic Using Extended Flows
https://is.muni.cz/publication/1300438/http_security_monitoring-paper.pdf
10. Analyzing Http Request for Web Intrusion Detection
https://www.semanticscholar.org/paper/Analyzing-HTTP-requests-for-web-intrusion-detection-Althubiti-Yuan/f3adfc7e7686114ce2cb1a1eb7dc22848fdf13ca
11. Hackin9 Practical Protection Security Magazine
https://www.slideshare.net/RodrigoGomesPires/hakin9-05-2013?from_search=3
5

More Related Content

Similar to Detect Anomalies on LAMP Servers

IRJET-Computational model for the processing of documents and support to the ...
IRJET-Computational model for the processing of documents and support to the ...IRJET-Computational model for the processing of documents and support to the ...
IRJET-Computational model for the processing of documents and support to the ...IRJET Journal
 
An effective search on web log from most popular downloaded content
An effective search on web log from most popular downloaded contentAn effective search on web log from most popular downloaded content
An effective search on web log from most popular downloaded contentijdpsjournal
 
A hybrid technique for sql injection attacks detection and prevention
A hybrid technique for sql injection attacks detection and preventionA hybrid technique for sql injection attacks detection and prevention
A hybrid technique for sql injection attacks detection and preventionijdms
 
Research Inventy : International Journal of Engineering and Science
Research Inventy : International Journal of Engineering and ScienceResearch Inventy : International Journal of Engineering and Science
Research Inventy : International Journal of Engineering and Scienceinventy
 
A survey of cloud based secured web application
A survey of cloud based secured web applicationA survey of cloud based secured web application
A survey of cloud based secured web applicationIAEME Publication
 
Detection of Phishing Websites
Detection of Phishing WebsitesDetection of Phishing Websites
Detection of Phishing WebsitesIRJET Journal
 
Effective Information Flow Control as a Service: EIFCaaS
Effective Information Flow Control as a Service: EIFCaaSEffective Information Flow Control as a Service: EIFCaaS
Effective Information Flow Control as a Service: EIFCaaSIRJET Journal
 
Analysis of Network Traffic and Security through Log Aggregation
Analysis of Network Traffic and Security through Log AggregationAnalysis of Network Traffic and Security through Log Aggregation
Analysis of Network Traffic and Security through Log AggregationIJCSIS Research Publications
 
Web personalization using clustering of web usage data
Web personalization using clustering of web usage dataWeb personalization using clustering of web usage data
Web personalization using clustering of web usage dataijfcstjournal
 
Verification of the protection services in antivirus systems by using nusmv m...
Verification of the protection services in antivirus systems by using nusmv m...Verification of the protection services in antivirus systems by using nusmv m...
Verification of the protection services in antivirus systems by using nusmv m...ijfcstjournal
 
Only Abstract
Only AbstractOnly Abstract
Only Abstractguesta67d4a
 
A Resiliency Framework For An Enterprise Cloud
A Resiliency Framework For An Enterprise CloudA Resiliency Framework For An Enterprise Cloud
A Resiliency Framework For An Enterprise CloudJeff Nelson
 
IRJET- Adopting Encryption for Intranet File Communication System
IRJET- Adopting Encryption for Intranet File Communication SystemIRJET- Adopting Encryption for Intranet File Communication System
IRJET- Adopting Encryption for Intranet File Communication SystemIRJET Journal
 
Online stream mining approach for clustering network traffic
Online stream mining approach for clustering network trafficOnline stream mining approach for clustering network traffic
Online stream mining approach for clustering network trafficeSAT Journals
 
Online stream mining approach for clustering network traffic
Online stream mining approach for clustering network trafficOnline stream mining approach for clustering network traffic
Online stream mining approach for clustering network trafficeSAT Publishing House
 
Double guard: Detecting Interruptions in N- Tier Web Applications
Double guard: Detecting Interruptions in N- Tier Web ApplicationsDouble guard: Detecting Interruptions in N- Tier Web Applications
Double guard: Detecting Interruptions in N- Tier Web ApplicationsIJMER
 
Big Data Security Analytic Solution using Splunk
Big Data Security Analytic Solution using SplunkBig Data Security Analytic Solution using Splunk
Big Data Security Analytic Solution using SplunkIJERA Editor
 
Web log data analysis by enhanced fuzzy c
Web log data analysis by enhanced fuzzy cWeb log data analysis by enhanced fuzzy c
Web log data analysis by enhanced fuzzy cijcsa
 
Security against Web Application Attacks Using Ontology Based Intrusion Detec...
Security against Web Application Attacks Using Ontology Based Intrusion Detec...Security against Web Application Attacks Using Ontology Based Intrusion Detec...
Security against Web Application Attacks Using Ontology Based Intrusion Detec...IRJET Journal
 
A Generic Model for Student Data Analytic Web Service (SDAWS)
A Generic Model for Student Data Analytic Web Service (SDAWS)A Generic Model for Student Data Analytic Web Service (SDAWS)
A Generic Model for Student Data Analytic Web Service (SDAWS)Editor IJCATR
 

Similar to Detect Anomalies on LAMP Servers (20)

IRJET-Computational model for the processing of documents and support to the ...
IRJET-Computational model for the processing of documents and support to the ...IRJET-Computational model for the processing of documents and support to the ...
IRJET-Computational model for the processing of documents and support to the ...
 
An effective search on web log from most popular downloaded content
An effective search on web log from most popular downloaded contentAn effective search on web log from most popular downloaded content
An effective search on web log from most popular downloaded content
 
A hybrid technique for sql injection attacks detection and prevention
A hybrid technique for sql injection attacks detection and preventionA hybrid technique for sql injection attacks detection and prevention
A hybrid technique for sql injection attacks detection and prevention
 
Research Inventy : International Journal of Engineering and Science
Research Inventy : International Journal of Engineering and ScienceResearch Inventy : International Journal of Engineering and Science
Research Inventy : International Journal of Engineering and Science
 
A survey of cloud based secured web application
A survey of cloud based secured web applicationA survey of cloud based secured web application
A survey of cloud based secured web application
 
Detection of Phishing Websites
Detection of Phishing WebsitesDetection of Phishing Websites
Detection of Phishing Websites
 
Effective Information Flow Control as a Service: EIFCaaS
Effective Information Flow Control as a Service: EIFCaaSEffective Information Flow Control as a Service: EIFCaaS
Effective Information Flow Control as a Service: EIFCaaS
 
Analysis of Network Traffic and Security through Log Aggregation
Analysis of Network Traffic and Security through Log AggregationAnalysis of Network Traffic and Security through Log Aggregation
Analysis of Network Traffic and Security through Log Aggregation
 
Web personalization using clustering of web usage data
Web personalization using clustering of web usage dataWeb personalization using clustering of web usage data
Web personalization using clustering of web usage data
 
Verification of the protection services in antivirus systems by using nusmv m...
Verification of the protection services in antivirus systems by using nusmv m...Verification of the protection services in antivirus systems by using nusmv m...
Verification of the protection services in antivirus systems by using nusmv m...
 
Only Abstract
Only AbstractOnly Abstract
Only Abstract
 
A Resiliency Framework For An Enterprise Cloud
A Resiliency Framework For An Enterprise CloudA Resiliency Framework For An Enterprise Cloud
A Resiliency Framework For An Enterprise Cloud
 
IRJET- Adopting Encryption for Intranet File Communication System
IRJET- Adopting Encryption for Intranet File Communication SystemIRJET- Adopting Encryption for Intranet File Communication System
IRJET- Adopting Encryption for Intranet File Communication System
 
Online stream mining approach for clustering network traffic
Online stream mining approach for clustering network trafficOnline stream mining approach for clustering network traffic
Online stream mining approach for clustering network traffic
 
Online stream mining approach for clustering network traffic
Online stream mining approach for clustering network trafficOnline stream mining approach for clustering network traffic
Online stream mining approach for clustering network traffic
 
Double guard: Detecting Interruptions in N- Tier Web Applications
Double guard: Detecting Interruptions in N- Tier Web ApplicationsDouble guard: Detecting Interruptions in N- Tier Web Applications
Double guard: Detecting Interruptions in N- Tier Web Applications
 
Big Data Security Analytic Solution using Splunk
Big Data Security Analytic Solution using SplunkBig Data Security Analytic Solution using Splunk
Big Data Security Analytic Solution using Splunk
 
Web log data analysis by enhanced fuzzy c
Web log data analysis by enhanced fuzzy cWeb log data analysis by enhanced fuzzy c
Web log data analysis by enhanced fuzzy c
 
Security against Web Application Attacks Using Ontology Based Intrusion Detec...
Security against Web Application Attacks Using Ontology Based Intrusion Detec...Security against Web Application Attacks Using Ontology Based Intrusion Detec...
Security against Web Application Attacks Using Ontology Based Intrusion Detec...
 
A Generic Model for Student Data Analytic Web Service (SDAWS)
A Generic Model for Student Data Analytic Web Service (SDAWS)A Generic Model for Student Data Analytic Web Service (SDAWS)
A Generic Model for Student Data Analytic Web Service (SDAWS)
 

Recently uploaded

Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfngoud9212
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 

Recently uploaded (20)

Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Nell’iperspazio con Rocket: il Framework Web di Rust!

Detect Anomalies on LAMP Servers

Nathanael Asaam
Founder and CEO @ Equicksales Consulting Ltd | Application Support Officer @ Ashesi University
nataoasaam@gmail.com
An Anomaly Detection System for Ecommerce Sites Hosted on a LAMP Server
Introduction
Anomaly detection uses deviations from the normal patterns of a system to detect intrusions and threats on a computer system. In this paper, we describe how to build an anomaly detection system for ecommerce sites that are hosted on a LAMP server. LAMP is an acronym that stands for Linux, Apache, MySQL and PHP.
The motivation for this research paper is that LAMP servers normally come with application logs for the various software that make up the LAMP stack. Thus, it is relatively easy to build either a machine learning model that describes the behaviour of that software or a rule-based model for anomaly detection. This is partly because the training data can easily be obtained from the logs for Apache, MySQL, PHP, and any Linux distribution. Additionally, the logs are updated whenever there are interactions with the software. There are also error logs for some of the errors that occur in the software.
In this project, we will describe how to build a model for a LAMP server that uses Ubuntu Server and use it to detect anomalous activities on a web application or ecommerce site that is hosted on that LAMP server. It is essential to note that Ubuntu Server is free and open source, and so are Apache, MySQL and PHP.
Background
In this section we give a concise background of Ubuntu Server, Apache, MySQL and PHP.
Ubuntu Server
Ubuntu Server is a version of the Ubuntu operating system that is designed and engineered as a backbone for the Internet [2]. Ubuntu Server brings economic and technical scalability to your datacentre, public or private [2]. Whether you want to deploy an OpenStack cloud, a Kubernetes cluster, or a 50,000-node render farm, it delivers the best value scale-out performance available [2].
Apache
Apache, also known as Apache HTTP Server, is the most widely used web server software and runs on 67% of all websites in the world [1]. It is developed and maintained by the Apache Software Foundation [1]. It is fast, reliable and secure, and can be highly customized to meet the needs of many different environments using extensions and modules [1].
MySQL
MySQL is an open-source Relational Database Management System (RDBMS) that enables users to store, manage and retrieve structured data efficiently [4]. It is widely used for various applications, from small-scale projects to large-scale websites and enterprise-level solutions [4]. It is also the most popular open-source SQL DBMS and is developed, supported and distributed by Oracle Corporation [5].
PHP
PHP (a recursive acronym for PHP: Hypertext Preprocessor) is a widely used general-purpose scripting language that is well suited for web development and can be embedded in HTML [6]. According to Web Technology Surveys, PHP is used by 78.1% of all websites, including high-traffic websites such as Facebook and Wikipedia [7].
Previous Work
In this section we describe several works related to vulnerability scan detection and to detecting intrusions or attacks by analyzing the logs of Apache HTTP Server.
Rule Based Model for Analyzing HTTP Access Logs and Detecting Web Scans, SQL Injection (SQLI) and Cross-Site Scripting (XSS)
A research paper on using a rule-based model to detect anomalies by analyzing HTTP server access logs and web scans explains that, according to the European Network and Information Security Agency (ENISA) Threat Landscape, web-based attacks and web application attacks are ranked number two and three in the cyber security environment [3]. These rankings remained unchanged between 2014 and 2015 [3]. Thus, web applications are more prone to security risks [3]. The paper states that Cross-Site Scripting (XSS) and Structured Query Language Injection (SQLI) appeared to be decreasing in 2014 but increased in 2015 [3].
The paper goes on to state that, to detect all the mentioned attacks and web scans, analyzing log files is preferred because anomalies in users' requests and the related server responses can be clearly identified [3]. Two primary reasons why log file analysis is preferred are that it requires no expensive hardware and that log files support successful detection, especially for encrypted protocols such as Secure Sockets Layer (SSL) and Secure Shell Daemon (SSHD) [3]. However, the paper notes that the heavier the website traffic, the more difficult the analysis of the log file, which presents the need for a user-friendly web vulnerability scanner detection tool for analyzing log files [3].
Also, the motivation for that paper is that other work in this field uses a different approach, namely machine learning and data mining based predictive detection of malicious activities [3]. To increase the accuracy of a machine learning classifier, a large amount of input training data is needed, which in turn leads to an increase in memory usage [3]. Another negative point about machine learning based approaches is overfitting, which refers to a model that fits the training data too closely, resulting in poor predictive performance and low generalization ability [3].
Finally, the model proposed in that paper rests on three significant assumptions:
1. POST data cannot be logged in access logs. Consequently, the proposed method cannot capture this sort of data [3].
2. Browsers or web servers may support other encodings. Since only two encodings are within the scope of the paper, the script does not capture data encoded in other styles.
3. The proposed model is for the detection of two well-known web application attacks and of malicious web vulnerability scans. Thus, the model is not for prevention, and working in online mode is not included in the paper.
Classification of Malicious Cyber Activities and Attacks and Vulnerability Scans
A research paper on the classification of malicious web sessions states that SANS reported that 60% of the total attack attempts observed on the Internet were against web applications [8]. The paper further states that the characterization of network traffic and server workload, despite its long tradition and great success, is not the recent focus of research [8]. Also, not much focus is placed on the quantification of malicious attacker behaviour [8]. One evident reason for this is the lack of publicly available, good quality data on cyber security threats and malicious attacker activities [8].
The paper explains that, although there is a significant amount of research on intrusion detection, the focus is on developing data mining techniques aimed at constructing a black box that classifies network traffic as malicious or non-malicious, rather than on discovering the nature of malicious activities [8]. Additionally, a significant amount of intrusion detection research was based on outdated data sets such as the DARPA Intrusion Detection Data Set and its derivative KDD [8].
Motivated by the lack of available data sets that incorporate attacker activities, the researchers developed and deployed high-interaction honeypots as a means to collect such data [8]. Their honeypots were configured in a three-tier architecture (consisting of a frontend web server, an application server and a backend database) and had meaningful functionality [8]. Furthermore, they ran standard off-the-shelf operating systems and applications which followed typical security guidelines and did not include user accounts with nil or weak passwords [8]. The data collected by the honeypots are grouped into four datasets, each with a duration of four to five months [8]. Each dataset consists of malicious web sessions extracted from the application-level logs of systems running on the Internet [8].
The paper used supervised machine learning methods to automatically classify malicious web sessions as attacks or vulnerability scans. Each web session was characterized by 43 features reflecting different session characteristics, such as the number of requests in a session, the number of requests of a specific method type (GET, POST, OPTIONS), the number of requests to dynamic application files, and the length of the request substring within a session [8]. In all, the paper used three supervised machine learning methods, namely Support Vector Machines (SVM), the decision-tree-based J48, and PART, to classify attacker activities aimed at web systems [8]. According to the paper, the results show that supervised learning methods can be used to efficiently distinguish attack sessions from vulnerability scan sessions, with a very high probability of detection and a low probability of false alarms [8].
Finally, it is worth stating that the paper explored the following three research questions:
1. Can supervised machine learning methods be used to distinguish between web attacks and vulnerability scans?
2. Do attacks and vulnerability scans differ in a small number of features? If so, are these subsets of best features consistent across different datasets?
3. Do some learners perform consistently better than others across different datasets?
Security Monitoring of HTTP Traffic Using Extended Flows
A research paper on security monitoring of HTTP traffic using extended flows states that HTTP is currently the most widely used protocol and accounts for a significant amount of network traffic [9]. The paper further explains that the most suitable way of gaining an overview of HTTP traffic in a large-scale network is extended network flow monitoring [9]. According to the paper, there are two approaches to network traffic monitoring: Deep Packet Inspection (DPI) and flow monitoring.
DPI is resource demanding but provides detailed information about a whole packet, including its payload [9]. Network flow monitoring is fast but is limited to layers 3 and 4 of the ISO/OSI model [9]. Extended flow monitoring combines the benefits of both methods: it adds application-level data to traditional flow records while keeping the ability to monitor large-scale, high-speed networks [9]. The paper further explains that the correlation of logs from web servers is an option, but also states that in large networks it is not always possible to gain access to the logs or even to be aware of all of them [9].
Thus, this research is more significant to administrators of large networks, in general academic and ISP networks [9]. The paper addresses two problems: the lack of an overview of network traffic and insufficient security awareness [9]. It states that many administrators oversee web servers and their neighbourhood in their administration, but are not aware of security threats in the rest of the network [9]. The other problem is to find a suitable set of tools to analyze HTTP traffic and distinguish between legitimate and malicious traffic [9].
The paper poses these two research questions:
1. What classes of HTTP traffic relevant to security can be observed at network level, and what is their impact on attack detection?
2. What is the added value of extended flow compared to traditional flow monitoring from a security point of view?
The paper also describes three classes of HTTP traffic, which comprise brute-force password attacks, connections to proxies, and HTTP scanners and web crawlers [9]. Using classification, the authors were able to detect 16 previously undetectable brute-force password attacks and 19 HTTP scans per day on their campus [9]. The activities of proxy servers and web crawlers were also observed [9]. Another result of the paper is that four flow attributes were monitored [9]. These are the source IP address, the destination IP address, the hostname, and the HTTP request [9].
Proposed System Model
This section describes our proposed anomaly detection system model for the LAMP server. The proposed system employs three different but simple techniques: log file size monitoring, log file entry classification, and a Markov model of log file sizes.
Log File Size Monitoring
First of all, we will check the size of the log files for Ubuntu Server, Apache, MySQL and PHP.
We will also monitor the log files for all of this software in real time in order to determine the rate of change of the file sizes on a day-to-day basis. In addition, we will track log file sizes daily to see whether each new file size is within the expected threshold, based on statistical measures such as the mean log file size computed over a number of days and the standard deviation of that data. If we see a deviation during monitoring, we will record it as an anomaly.
Log File Entries Classification
We will also analyze the log files and classify every log entry as either a normal user activity or an abnormal user activity, whether the file is an access log file or an error log file. Based on that classification model, we can detect abnormal user activities.
Markov Model of the Log File Sizes
We will also build a Markov model of the log file sizes using the data for each day. This will help us infer the new log file size for the various software logs and predict, roughly, the expected log file size for the next day. For each day, if the observed file size does not match the expected file size, we can record it as an anomaly.
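The size-based techniques above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation of the proposed system: the function names, the threshold of three standard deviations, the three coarse states ("low", "normal", "high"), their cut-off values, and the minimum transition probability are all hypothetical choices that the paper does not specify. The first function applies the mean and standard deviation check to a new daily size, and the remaining functions fit a first-order Markov chain over discretized daily sizes and use it to predict or flag the next day's state.

```python
import statistics
from collections import Counter, defaultdict


def size_is_anomalous(history, new_size, k=3.0):
    """Return True if new_size is more than k standard deviations from the mean of history."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)  # sample standard deviation, needs at least 2 values
    if stdev == 0:
        return new_size != mean
    return abs(new_size - mean) > k * stdev


def to_states(daily_sizes, low, high):
    """Map each daily size to a coarse state label (cut-offs are assumptions)."""
    return ["low" if s < low else "high" if s > high else "normal" for s in daily_sizes]


def fit_transitions(states):
    """Estimate first-order Markov transition probabilities from consecutive days."""
    counts = defaultdict(Counter)
    for current, nxt in zip(states, states[1:]):
        counts[current][nxt] += 1
    return {
        state: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for state, row in counts.items()
    }


def predict_next(transitions, current):
    """Return the most probable next state, or None if current was never observed."""
    row = transitions.get(current)
    return max(row, key=row.get) if row else None


def transition_is_anomalous(transitions, current, observed, min_prob=0.05):
    """Flag a day whose transition probability is below min_prob (unseen counts as 0)."""
    return transitions.get(current, {}).get(observed, 0.0) < min_prob


if __name__ == "__main__":
    # Hypothetical daily sizes (in KB) of a single log file.
    sizes = [100, 110, 105, 95, 102, 108, 99, 101]
    print("size 104 anomalous:", size_is_anomalous(sizes, 104))
    print("size 400 anomalous:", size_is_anomalous(sizes, 400))

    states = to_states(sizes, low=98, high=106)
    model = fit_transitions(states)
    print("expected next state:", predict_next(model, states[-1]))
```

In practice the daily sizes could be collected with `os.path.getsize` on each log file, and one model would be kept per log file, since the access and error logs of different software are likely to grow at very different rates.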
Conclusion
This research paper describes three techniques that will be employed to detect anomalies on a LAMP server that hosts a web application or ecommerce site. These three techniques are simple to understand and relatively easy to implement.
References
1. What is Apache https://www.wpbeginner.com/glossary/apache/
2. Ubuntu Server Documentation https://ubuntu.com/server/docs#:~:text=Ubuntu%20Server%20is%20a%20version,your%20datacentre%2C%20public%20or%20private.
3. Detection of Attack Targeted Scans From Apache HTTP Server Access Logs https://www.sciencedirect.com/science/article/pii/S2210832717300169
4. What is Mysql and How Does it work https://www.hostinger.com/tutorials/what-is-mysql
5. What is Mysql https://dev.mysql.com/doc/refman/8.0/en/what-is-mysql.html
6. What is PHP https://www.php.net/manual/en/intro-whatis.php
7. What is PHP? Learning All about the Scripting Language https://www.hostinger.com/tutorials/what-is-php/
8. Classification of Malicious web Sessions https://community.wvu.edu/~kagoseva/Papers/ICCCN-2012.pdf
9. Security Monitoring of Http Traffic Using Extended Flows https://is.muni.cz/publication/1300438/http_security_monitoring-paper.pdf
10. Analyzing Http Request for Web Intrusion Detection https://www.semanticscholar.org/paper/Analyzing-HTTP-requests-for-web-intrusion-detection-Althubiti-Yuan/f3adfc7e7686114ce2cb1a1eb7dc22848fdf13ca
11. Hakin9 Practical Protection Security Magazine https://www.slideshare.net/RodrigoGomesPires/hakin9-05-2013?from_search=3