Philadelphia University
FACULTY OF IT
Project Title:
Improving Snort Techniques
Using Machine Learning
Supervisor:
Dr. Athari Alnatshah
Group Members:
Yazan Majdi Arman 202211016
Omar Saleem Al-Bishtawi 202210748
Mohammad Yousef Abu Jneineh 202110500
Abdulrahman Asaad Jalamneh 202210980
1st
semester
2025-2026
Approval
We certify that we have read the project and, as members of the project evaluation committee, we
have examined the students on the content of this document and the knowledge related to it. We
certify that it is adequate in standing as a project for partial fulfilment of the requirements
of the Cyber-Security department.
Certificate
It is certified that this project has been prepared and written under my direct supervision and
guidance. I also would like to certify that this document is approved for submission and
evaluation.
Supervisor:
Signature:
Date:
Dedication
We dedicate this work to our parents and families. Thank you for being our constant source of
strength and for providing the encouragement needed to turn our aspirations into reality. This
achievement is as much yours as it is ours.
Acknowledgment
First and foremost, we would like to express our deepest gratitude to our supervisor, Dr.
Athari Alnatshah, for her invaluable guidance, patience, and expert advice throughout the
development of this project.
We extend our special thanks to our friends for their unwavering support and motivation.
Their contributions were essential in shaping the functionality of this system.
Finally, we thank everyone who contributed, directly or indirectly, to the successful completion
of this work.
Table of Contents
Chapter 1 - Introduction
1.1 Introduction:
1.2 Background:
1.3 Problem Statement:
1.4 Limitations:
1.5 Project Objectives:
1.6 Project Solution Overview:
Chapter 2 – Literature Review
2.1 Introduction
2.2 Review of existing systems:
2.3 Comparison of existing solutions:
2.4 Evaluation of current states:
Chapter 3 – Methodology & Plan
3.1 Introduction
3.3 Requirement Gathering Techniques (Waterfall)
3.4 Project Plan (Gantt Chart)
3.5 Development Tools and Technologies
Chapter 4 – System Specification
4.1 Introduction
4.2 Functional Requirements
4.3 Logical system
Chapter 5 – Implementation & Testing
5.1 Introduction
5.2 Testing phase
Chapter 6 – Future work & Conclusion
6.1 Introduction
6.2 Future Work
6.3 Conclusion
References:
Academic Journals & Papers
Company Technology & Software
Methodology
List of Tables
Table 1: Tasks Chart
Table 2: Examines the evolution of Network Intrusion Detection Systems
Table 3: Comparative Analysis of Techniques
Table 4: Comparison of existing solutions
Abstract
Sophisticated network cyberattacks continue to pose a significant and
costly risk, highlighting the inherent limitations of traditional signature-based
security systems that are purely reactive and often result in high false positive
rates. This project addresses the critical need for an accessible and
resource-efficient AI-enhanced network intrusion identification system capable of
effectively distinguishing abnormal network behaviour from benign traffic. The
methodology involved utilizing a robust ensemble machine learning approach,
concentrating on the analysis of behavioural network flow features to establish a
reliable detection capability. Crucially, the model was hardened using specialized
techniques to correct for the challenges posed by uneven data distributions,
ensuring effective identification across all threat categories. The resulting
lightweight solution successfully demonstrates high detection reliability while
significantly minimizing false positive alarms, offering a practical and
economically sound approach to proactive security enhancement for
organizations that lack high-end computational infrastructure.
Chapter 1 - Introduction
1.1 Introduction:
In today's connected digital landscape, cyberattacks pose a serious
threat to organizations, corporations, governments, and individuals alike.
Among the various forms of cyber threats, network-based attacks are some of
the most prevalent and dangerous, as they target the fundamental
infrastructure that enables communication and data exchange.
To address this growing threat, this project proposes an AI-enhanced
network cyberattack detection system focused specifically on network-level intrusions.
Researchers and many security companies around the world, such as Fortinet, are
pursuing the same goal of detecting cyberattacks early enough to prevent them. Recent studies and
real-world applications have combined AI models with traditional methods
to improve both accuracy and speed when identifying network intrusions. As
cyber threats continue to evolve, the integration of intelligent systems into
network monitoring and defence strategies becomes not only beneficial but
necessary.
1.2 Background:
The motivation behind this project stems from the urgent need to
strengthen cyber security defences in an increasingly connected and
vulnerable digital environment. As organizations continue to expand their digital
operations, the attack surface of their networks grows correspondingly, making
them more susceptible to cyber threats, particularly those that originate or
propagate through network traffic.
This work concentrates on integrating predictive capabilities into
cybersecurity systems to confront these challenges. By utilizing AI, ML, and
deep learning, the project intends to shift away from a passive, post-incident
response approach toward active anticipation of threats. The aim is
to enable early detection of potential attacks, enhanced situational
awareness, and an overall improved security posture for digital infrastructures. In
essence, the wider goal is to turn cybersecurity into a dynamic, predictive
discipline that evolves in tandem with the threat landscape.
1.3 Problem Statement:
Several tools have attempted to shift from traditional detection methods
to proactive security using artificial intelligence. Some of them leverage
self-learning AI to anticipate abnormal network behaviours and flag potential
breaches before they occur, while others depend on a predictive defence
methodology. These systems often suffer from high false positive rates, where
normal activities are mistakenly flagged as threats. This not only wastes
resources but also reduces confidence in the system's outputs.
Other solutions use machine learning models trained on historical data
to predict and block malware or network intrusions before execution. These
systems show promise, but they come at a significant cost, both financially
and computationally. Their reliance on powerful infrastructure and premium
licensing models makes them inaccessible to many users and small
organizations that would benefit most from predictive security capabilities.
Main problems:
▪ False-positive alerts
▪ High maintenance cost
1.4 Limitations:
The scope and final outcomes of this project are subject to the following constraints:
• Data Dependency: The model's training is based on a well-known benchmark
dataset, meaning its performance may require further fine-tuning when applied
to a specific, unique live network environment.
• Tool Constraints: The project implementation relies on readily available
standard open-source programming and machine learning frameworks, which
limits the use of specialized or proprietary technologies.
• Deployment Scope: The project concludes with the creation of the final, ready-
to-use model file. Integration into a full, live network security infrastructure for
continuous operation is excluded from this phase due to time constraints.
1.5 Project Objectives:
The primary objectives this project intends to accomplish are:
To develop a machine learning model capable of identifying network-level
cyberattacks by analysing network flow data.
To significantly reduce false positives and ensure a high detection success rate
across different threat types by utilizing methods designed to handle uneven
data distribution.
To establish an effective threat identification capability by deploying a
lightweight and resource-efficient architecture.
To create a security solution that is accessible and can operate effectively
without necessitating the purchase of high-end computational infrastructure.
1.6 Project Solution Overview:
The proposed solution involves an AI-enhanced intrusion identification system
based on an effective ensemble learning method. This algorithm was selected due to
its demonstrated robustness when processing complex network traffic data. The
solution focuses on extracting and analysing behavioural network flow features such
as connection metrics and data transfer volumes to distinguish between benign and
malicious activity. The model is specially trained with methods to address common
challenges posed by unbalanced datasets, which is essential to ensure that less
frequent but critical attack types are reliably detected.
1.7 Project Scope:
Project Scope Covered:
Thorough feature preparation and processing of the network dataset to
maximize the model's performance.
Selection and rigorous optimization of an optimal classification algorithm.
Implementation of techniques to handle the challenges of skewed data
distribution across different threat categories.
Validation and comparative testing of the final model's performance using
standard, appropriate security metrics.
Preparation of the final, fully optimized model and data handling pipeline for
future deployment.
Project Scope Not Covered:
Creation of a comprehensive Graphical User Interface (GUI) for operational use
by an end-user.
Development of a complex live API for real-time packet capture and integration
with commercial network security tools.
Detailed exploration or comparison of complex deep learning architectures, which
would contradict the project's goal of a lightweight design.
1.8 Project Feasibility:
The project has been assessed and confirmed to be practical and achievable across
all key areas:
Technical Feasibility: The core technology (Machine Learning, data
processing, and programming frameworks) is well-established and accessible,
and the initial algorithmic testing confirms the technical viability of the
approach.
Operational Feasibility: The goal is to produce a resource-light model that
analyses standard network flow data. This design makes the solution
compatible and practical for integration into existing network monitoring
processes.
Economic Feasibility: By relying exclusively on open-source software and
selecting a resource-efficient model architecture, the project minimizes
development and potential deployment costs, ensuring it remains an
economically sound alternative to expensive commercial systems.
Table 1: Tasks Chart
(Time schedule chart. Rows: Yazan, Omar, Abdulrahman, Mohammad. Time axis: 0 to 7. Task categories: Dataset, Tools, AI-Algo, ML.)
Chapter 2 – Literature Review
2.1 Introduction
In the context of this project, several key terms are defined to establish clarity.
Cyberattack detection refers to the use of data-driven methods, particularly artificial
intelligence, to anticipate and forecast potential malicious activities on a network
before they occur. Machine Learning (ML) is a branch of artificial intelligence that
enables systems to learn patterns from historical data and make decisions without
being explicitly programmed. A network flow represents a sequence of packets
sharing common properties such as source and destination IP addresses, ports,
and protocol types, which are crucial features used for model training. Feature
extraction involves selecting relevant attributes from raw network data that are
most useful for identifying potential threats. Classification algorithms are
supervised ML models such as Decision Trees, Random Forests, or Support
Vector Machines that categorize data into labels like “attack” or “normal.” The term
dataset refers to a structured collection of historical network records used to train
and evaluate the model. Real-time implies the system’s ability to analyse and
respond to incoming traffic almost instantly, enhancing proactive defence. Lastly,
false positives and false negatives represent incorrect alerts where the model
either wrongly flags normal activity as malicious or fails to detect an actual attack,
respectively. Understanding these terms is essential for interpreting the
methodology and results of this study.
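To make the last two terms concrete, the following minimal sketch counts false positives and false negatives for a handful of made-up predictions. The labels and the helper name `confusion_counts` are illustrative assumptions and are not taken from the project code.

```python
def confusion_counts(y_true, y_pred, positive="attack"):
    """Count true/false positives and negatives for attack-vs-normal labels."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1  # normal traffic wrongly flagged as an attack
        else:
            if truth == positive:
                fn += 1  # real attack that was missed
            else:
                tn += 1
    return tp, fp, tn, fn


# Made-up ground truth and predictions, for illustration only.
y_true = ["normal", "normal", "attack", "attack", "normal", "attack"]
y_pred = ["normal", "attack", "attack", "normal", "normal", "attack"]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
false_positive_rate = fp / (fp + tn)  # share of normal flows that raised an alert
false_negative_rate = fn / (fn + tp)  # share of attacks that went undetected
print(tp, fp, tn, fn, false_positive_rate, false_negative_rate)
```

A low false positive rate corresponds to fewer wasted analyst hours, while a low false negative rate corresponds to fewer missed intrusions.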
2.2 Review of existing systems:
This literature review examines the evolution of Network Intrusion Detection Systems (NIDS)
from 2011 to 2019, highlighting a shift from foundational data preprocessing and taxonomic
strategies toward high-performance frameworks like Hadoop and Cloud-based environments.
The research emphasizes the integration of machine learning, specifically Random Forest
methods, and the critical role of data collection, evidenced by the categorization of 34 public
datasets to improve detection accuracy and scalability, as shown in Figure 2.1:
Table 2 Examines the evolution of Network Intrusion Detection Systems
Author & Year | Title | Focus | Topics
Shoham et al. (2023) | Hybrid DDoS detection | Hybrid DDoS Detection: A hybrid machine learning approach specifically designed. | Detect and mitigate Distributed Denial-of-Service (DDoS) attacks.
Davis and Clark. 2011 | Data preprocessing for anomaly-based network intrusion detection: A review | Data preprocessing | Relevant features construction using targeted content parsing and deeper network packet inspection
Jeong et al. 2012 | Anomaly tele-traffic intrusion detection systems on Hadoop-based Platforms: A survey of problems and solutions | Framework | Hadoop and big data platforms for speed, storage volume, and cost-efficiency
Poston. 2012 | A brief taxonomy of intrusion detection strategies | Strategies | Taxonomy of traditional network intrusion detection
Modi et al. 2013 | A survey of intrusion detection techniques in Cloud | Framework | Incorporating IDS on host system and virtual machines
Keegan et al. 2016 | A survey of cloud-based network intrusion detection analysis | Framework | Integrating machine learning algorithms and MapReduce to cloud computing environments
(Figure 2.1 review examines the evolution of Network Intrusion Detection Systems)
A comparative analysis of the various techniques adopts a multi-fold approach, where
the techniques are categorized based on their distinct characteristics and then
compared to identify the advantages and disadvantages of each technique. The
categories chosen are based on (1) the learning mechanism employed for
classification and detection, (2) the features used for training and detection, (3) the
AI techniques employed, and (4) the deployment, as shown in Figure 2.2:
Table 3 Comparative Analysis of Techniques
(Figure 2.2 Comparative analysis of the various techniques adopts a multi-fold approach)
2.3 Comparison of existing solutions:
Before presenting the comparison, it’s essential to note that the analysed tools
employ different approaches to cyberattack prediction. BforeAI PreCrime relies
on predictive modelling and domain behaviour scoring to forecast malicious
domains before they’re used. SafeBreach simulates attack scenarios using
breach and attack emulation techniques to assess defensive gaps. ThetaRay
applies deep learning and statistical anomaly detection over large-scale data
streams to uncover hidden threats. Darktrace uses self-learning AI and
clustering to identify abnormal behaviours in real time. Comparing these
technologies highlights how current solutions integrate AI in various forms,
including predictive analytics, simulation, deep anomaly detection, and self-learning, to
achieve proactive cyber defence, as shown in Figure 2.3:
Table 4: Comparison of existing solutions
Tool Name | Timeframe | Datasets Used | Technique Used | Accuracy | Disadvantages
BforeAI PreCrime | Up to 89 days ahead | Domain metadata, behavioural patterns | Predictive modelling, behavioural scoring | 50% for domain-level prediction | Can’t detect zero-day attacks
SafeBreach | Hours to days ahead | Simulated attack methods | Attack simulation, scenario modelling | Not publicly disclosed | Does not predict external threats & relies on simulated playbooks
ThetaRay | Real-time (Hours ahead) | Big data logs | Deep anomaly detection (statistical + DL) | Not publicly disclosed | Not ideal for IT-centric threats; specialized to financial data flows
Darktrace Antigena | Minutes to hours ahead | Proprietary traffic & self-learned organizational behaviour | Self-learning AI, clustering, anomaly detection | Not publicly disclosed | Can generate false positives; lacks transparency into detection rationale
(Figure 2.3 Shows a comparison of existing solutions)
2.4 Evaluation of current states:
The current state of network-based cyberattack systems has evolved
considerably with the integration of Artificial Intelligence (AI) and Machine
Learning (ML). Traditional rule-based Intrusion Detection Systems (IDS) are still
widely deployed; however, they struggle to predict sophisticated and previously
unknown attack patterns due to their reliance on static signatures and
predefined rules. This limitation has catalysed a shift toward more adaptive and
intelligent systems.
A significant issue in the current state of research is the over-reliance on
synthetic or outdated datasets, which may not reflect real-world traffic. This can
result in high accuracy during testing but poor generalization to live networks.
Additionally, class imbalance (far more benign, that is non-malicious, traffic than
attack traffic) and a lack of diversity in attack types can
bias the models. Furthermore, many modern AI models, especially deep learning-based
ones, act as "black boxes." While their predictions may be accurate,
the reason why a certain traffic flow was classified as malicious is often
unclear. This poses a problem in critical environments where interpretability and
trust in automated decisions are necessary. As a result, AI
explanation libraries could play a significant role in the outcomes of this project.
Another issue that drew our attention is the vulnerability of AI-based
cybersecurity systems to adversarial machine learning.
Attackers can exploit weaknesses in the model by crafting malicious traffic
designed to evade detection, known as evasion attacks, similar in spirit to the
fake antivirus programs that trended recently. These may involve subtle
manipulations of network flow characteristics, such as altering packet sizes,
inter-arrival times, or flow counts, to bypass the model without triggering alerts.
Such attacks are particularly dangerous because they often go unnoticed while
still accomplishing their malicious goals.
An additional major threat comes in the form of poisoning attacks, where
attackers inject misleading or crafted data into the training pipeline. This
compromises the model’s learning process, causing it to make incorrect
forecasts during deployment. In high-stakes environments like banking or critical
infrastructure, even a small decrease in model performance due to data
poisoning can lead to significant security breaches. Poisoned models may even
"learn" to ignore specific attack patterns entirely, leaving systems defenceless
against known threats.
Moreover, model inversion attacks present a privacy risk by allowing
adversaries to infer sensitive training data or internal logic of the model based
on its outputs. In network contexts, this could reveal user behaviour, device
types, or even traffic patterns related to secure internal services. These attacks
highlight that AI not only protects but also exposes new attack surfaces that
must be secured.
To defend against these vulnerabilities, AI-based systems must
incorporate robust training methods, such as adversarial training, encryption,
anomaly injection resistance, and continuous validation using fresh traffic.
Additionally, deploying these models in secure environments with strong access
controls and logging can reduce the risk of exploitation.
Chapter 3 – Methodology & Plan
3.1 Introduction
This chapter outlines the systematic approach used to design and implement
the Intelligent Model for Predicting Network Cyber-Attacks. The methodology serves
as the backbone of the project, ensuring that data is handled correctly, models are
trained rigorously, and the final system provides reliable, real-time security
insights. By combining data science principles with network security protocols, this
methodology aims to achieve high detection accuracy with minimal false alarms.
3.2 System Development Methodology
The project follows the Waterfall Methodology, a linear and sequential software
development life cycle. Given the critical security requirements of an Intrusion
Detection System (IDS), this model was chosen to ensure that each stage from data
acquisition to system deployment reaches a state of total completion and validation
before the next phase commences.
The Waterfall approach provides a structured environment where requirements are
frozen after the initial phase. This stability was essential for the complex integration of
network monitoring tools (Snort/Zeek) with the Python-based AI backend, ensuring
that the data flow remained consistent and reliable. Security systems require clear
documentation and stable requirements.
By using Waterfall, we ensured that the dataset selection was finalized and
validated before we began training the Random Forest model. This minimized errors
in the integration phase.
3.3 Requirement Gathering Techniques (Waterfall)
To establish a robust system specification, multiple information-gathering
strategies were utilized:
Document Analysis: An exhaustive review of the UNSW-NB15 dataset technical
whitepapers was conducted to understand feature engineering requirements.
Additionally, the official documentation for Snort and Zeek was analysed to map
log formats to AI input variables.
Brainstorming & Expert Consultation: Technical sessions were held to define the
"Intelligence Suite" logic, specifically focusing on how to bridge the gap between
signature-based alerts and probabilistic AI behaviour.
Prototyping: Small-scale tests were performed to determine the hardware and
software limitations of running a Flask web server alongside real-time network
sniffers.
(Figure 3.1 Waterfall methodology)
3.4 Project Plan (Gantt Chart)
The project followed a strict timeline to ensure that all components, from data
pre-processing all the way to the web interface, were delivered on schedule, as shown in Figure 3.2:
(Figure 3.2 Gantt Chart tasks timeline)
3.5 Development Tools and Technologies
The project utilizes a modern Python-based stack designed for high-performance
machine learning and data processing.
Data Pre-processing: -
Programming Language (Python): Chosen for its extensive ecosystem of security
and AI libraries.
Scikit-Learn (Library): Used to build the "Fail-Safe" Pipeline, including the Column
Transformer and Voting Classifier.
Joblib (Persistence): Used to produce the .pkl file (pickle), allowing the model to be
saved and loaded into a real-time environment without retraining.
Pandas & NumPy: Essential for handling the large-scale data structures of network
traffic logs.
Random Forest & PCA: Random Forest serves as the core intelligence layer,
utilizing an ensemble of decision trees to classify network traffic through a majority
voting mechanism. PCA was implemented to reduce the high
dimensionality of the UNSW-NB15 dataset by projecting it into a lower-dimensional
space while retaining 95% of the variance.
Matplotlib & Seaborn: Used to generate the "Intelligence Performance Summary"
and "Model Percentages" charts.
Google Colab: The primary development environment for writing and testing the
dataset pre-processing.
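The persistence step described for Joblib in the list above can be sketched as follows. This is a minimal illustration on synthetic data, not the project's actual training script; the file name `ids_model.pkl` and the toy features are assumptions made for the example.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for pre-processed flow features (assumption, not UNSW-NB15).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # 1 = attack, 0 = normal

model = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "ids_model.pkl")  # hypothetical file name
    joblib.dump(model, model_path)      # save the trained model to disk
    restored = joblib.load(model_path)  # reload it without retraining

# The reloaded model should behave exactly like the original one.
same_predictions = bool(np.array_equal(model.predict(X), restored.predict(X)))
print(same_predictions)
```

Saving the whole fitted object in this way lets a separate process, such as a monitoring service, load the model at start-up instead of repeating the training run.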
Implementation: -
Operating System (Linux Ubuntu): The main environment used for integrating
the model, providing a stable platform for running concurrent security services.
Network Analyzer (Zeek): Used for understanding traffic behaviour and generating
high-level metadata logs. Zeek transforms raw packets into structured data that the
AI can interpret.
Intrusion Detection System (Snort): Implemented for signature-based detection.
Snort applies predefined rules to the traffic stream to catch known malicious
patterns instantly.
Integrated Development Environment (PyCharm): The central IDE used to write
the integration code, managing the complex logic required to merge Zeek/Snort
outputs into a single Python application.
Web Framework (Flask): Used to develop the final dashboard. Flask acts as the
"Presentation Layer," pulling data from the AI model to show real-time "Normal vs.
Attack" percentages on the user-facing SIEM dashboard.
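As an illustration of the figures such a dashboard would display, the sketch below turns a batch of model predictions into "Normal vs. Attack" percentages. It deliberately leaves out the Flask route itself; the function name `traffic_percentages` and the sample predictions are assumptions made for the example.

```python
from collections import Counter


def traffic_percentages(predictions):
    """Return the share of 'Normal' and 'Attack' predictions as percentages."""
    counts = Counter(predictions)
    total = sum(counts.values())
    if total == 0:
        return {"Normal": 0.0, "Attack": 0.0}
    return {
        label: round(100.0 * counts.get(label, 0) / total, 2)
        for label in ("Normal", "Attack")
    }


# Hypothetical predictions for the ten most recent flows.
recent = ["Normal"] * 7 + ["Attack"] * 3
summary = traffic_percentages(recent)
print(summary)
```

A web view could serialize a dictionary like `summary` to JSON and refresh it periodically, although how the project wires this into Flask is not specified here.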
Chapter 4 – System Specification
4.1 Introduction
This chapter describes the UNSW-NB15 network intrusion dataset used in the
project. The following sections explain the methodology used to extract features from
the dataset and the step-by-step workflow for data preparation. It details the process
of training the system using Random Forest and Support Vector Machine (SVM)
algorithms based on behavioural analysis. After the dataset is processed and verified,
the findings from the AI's feature importance analysis are used to generate manual
Snort rules. This hybrid approach allows the system to identify both known signatures
and abnormal behaviour. Figure 4.1 illustrates the general stages of the
methodology, from data pre-processing to final performance evaluation.
(Figure 4.1 System chart)
4.2 Functional Requirements
Dataset Selection: The dataset used in this project is the UNSW-NB15, a
modern network intrusion dataset. Unlike older datasets (like KDD99), UNSW-NB15
contains a comprehensive variety of contemporary synthesized attack activities and
normal traffic. The dataset consists of over 250,000 samples categorized into nine
attack types (including Fuzzers, Exploits, DoS, and Generic) and Normal traffic. It was obtained
in CSV format, containing 45 features including duration, protocol type, and packet
counts.
Data Preprocessing: In this stage, the raw network logs are cleaned and transformed.
Redundant data is removed, and categorical strings (like "TCP" or "UDP") are
converted into numerical values that the AI can understand.
I. Cleaning: To ensure the model does not crash, an imputation strategy was
used. Any missing values (NaNs) or undefined records were handled using
SimpleImputer. Numerical gaps were filled with the median value, and
categorical gaps were filled with an "unknown" placeholder.
II. Feature Encoding & Normalization: Categorical features were processed using
One-Hot Encoding, and numerical features were scaled using StandardScaler
to ensure that high-value features (like sbytes) did not overshadow smaller
values (like dur).
III. Balancing: The dataset was stratified during the split to ensure that both
"Normal" and "Attack" classes were proportionally represented in both training
and testing phases.
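The three preprocessing steps above (imputation, encoding and scaling, stratified splitting) can be sketched with scikit-learn as follows. The tiny DataFrame is a made-up stand-in for the dataset, and the column subset (`dur`, `sbytes`, `proto`, `label`) is an assumption chosen for brevity rather than the project's full feature list.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Made-up records with a few missing values (not real UNSW-NB15 rows).
df = pd.DataFrame({
    "dur": [0.1, 0.5, np.nan, 2.0, 0.3, 1.2, 0.05, 0.9],
    "sbytes": [500, 12000, 300, np.nan, 800, 45000, 120, 6000],
    "proto": ["tcp", "udp", "tcp", np.nan, "udp", "tcp", "tcp", "udp"],
    "label": [0, 1, 0, 1, 0, 1, 0, 1],  # 0 = Normal, 1 = Attack
})
df["proto"] = df["proto"].astype(object)  # keep missing entries as NaN

numeric_cols = ["dur", "sbytes"]
categorical_cols = ["proto"]

# I. Cleaning + II. scaling for numeric features: median fill, then standardize.
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
# I. Cleaning + II. encoding for categorical features: "unknown" fill, then one-hot.
categorical_steps = Pipeline([
    ("impute", SimpleImputer(strategy="constant", fill_value="unknown")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])
preprocess = ColumnTransformer([
    ("num", numeric_steps, numeric_cols),
    ("cat", categorical_steps, categorical_cols),
])

# III. Stratified split keeps the Normal/Attack ratio equal in both halves.
X = df.drop(columns="label")
y = df["label"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=0
)

# Fit the transformers on the training half only, then reuse them on the test half.
X_train_ready = preprocess.fit_transform(X_train)
X_test_ready = preprocess.transform(X_test)
print(X_train_ready.shape, X_test_ready.shape)
```

Fitting the imputer and scaler on the training split alone avoids leaking statistics from the test data into the model.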
Feature Selection (RF + PCA): To achieve maximum efficiency in a live SIEM (Security
Information and Event Management) environment, a two-step reduction process was
used:
1. Random Forest (RF): First, RF was used to determine the "Feature
Importance." This identifies which network attributes (like sttl or sbytes) have
the most significant impact on detecting an attack.
2. Principal Component Analysis (PCA): Following RF, PCA was applied to reduce
dimensionality. We retained 95% of the cumulative variance, which compressed
the wide dataset into a core set of 22 Principal Components. This reduces
processing time without sacrificing the accuracy of the detection engine.
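The two-step reduction can be sketched as below. The data is synthetic (two informative columns plus six noise columns), and the feature names `f0` to `f7` are placeholders, so the number of components it yields will differ from the 22 reported for the real dataset.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic data: only the first two columns carry signal (assumption).
rng = np.random.default_rng(42)
n_samples = 500
informative = rng.normal(size=(n_samples, 2))
noise = rng.normal(size=(n_samples, 6))
X = np.hstack([informative, noise])
y = (informative[:, 0] + informative[:, 1] > 0).astype(int)
feature_names = [f"f{i}" for i in range(X.shape[1])]  # placeholder names

# Step 1: rank features by Random Forest importance.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
ranking = sorted(
    zip(feature_names, forest.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
top_two = {name for name, _ in ranking[:2]}

# Step 2: PCA keeping enough components to explain 95% of the variance.
X_scaled = StandardScaler().fit_transform(X)
pca = PCA(n_components=0.95)  # a float in (0, 1) is a variance target
X_reduced = pca.fit_transform(X_scaled)
retained_variance = pca.explained_variance_ratio_.sum()
print(top_two, X_reduced.shape[1], round(retained_variance, 3))
```

Passing a fraction to `n_components` lets scikit-learn choose the component count automatically, which is how a 95% variance target translates into a specific number of components.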
Figure 4.2 shows the most important indicators for detecting
attacks:
(Figure 4.2: Top 10 Network features attack indicators)
The Random Forest model works by asking a series of "Yes/No" questions (Decision
Trees).
• A feature is "Important" if it is the best at splitting the data into two clean
groups (Normal vs. Attack).
• If a feature like sttl (Source TTL) can instantly separate 80% of attacks from
normal traffic, the AI gives it a high "Importance Score."
Mathematically, this is often measured by the Gini Impurity index:
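In its standard form, the Gini impurity of a node is G = 1 - sum over classes k of (p_k)^2, where p_k is the fraction of samples in the node that belong to class k. A split is good when the weighted impurity of its two child nodes is much lower than the impurity of the parent. The sketch below computes both quantities on a made-up example; the function names and the sample labels are illustrative assumptions.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity G = 1 - sum(p_k ** 2) over the classes present in labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(parent, left, right):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)
    return gini_impurity(parent) - weighted


# Made-up node with 5 normal and 5 attack records.
parent = ["Normal"] * 5 + ["Attack"] * 5
# A hypothetical threshold on one feature sends 4 attacks to the left child.
left = ["Attack"] * 4
right = ["Normal"] * 5 + ["Attack"]

print(gini_impurity(parent))                   # 0.5 for a perfectly mixed node
print(impurity_decrease(parent, left, right))  # larger means a better split
```

A feature's importance score in a Random Forest is derived from impurity decreases of this kind, accumulated over every split that uses the feature and averaged across the trees.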
4.3 Logical system
Random Forest (RF): Random Forest is an ensemble learning method that
builds multiple decision trees and merges their outputs to get a more accurate and
stable prediction. In our model, RF acts as the primary "Heavy Hitter" classifier. It is
relatively resistant to overfitting and excels at identifying complex patterns in
high-dimensional network data.
Support Vector Machine (SVM): SVM is a supervised learning model that finds the
optimal hyperplane to separate classes in a multi-dimensional space. It is particularly
effective for binary classification of network traffic where the boundary between
"Normal" and "Attack" is narrow.
Figure (4.3) presents the Receiver Operating Characteristic (ROC) curve of the
proposed AI-based intrusion detection system. The ROC curve illustrates the
relationship between the True Positive Rate (TPR) and the False Positive Rate
(FPR) at various classification thresholds.
The solid curve represents the performance of the Random Forest classifier, while
the diagonal dashed line indicates the performance of a random classifier. The
model achieves an Area Under the Curve (AUC) of 0.990, which demonstrates
excellent discriminative capability between normal and attack traffic.
(Figure 4.3 ROC curve)
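For readers who want to reproduce this kind of curve, the sketch below shows how the TPR/FPR pairs and the AUC are obtained with scikit-learn. It uses synthetic data and assumed hyperparameters, so its AUC will not match the 0.990 reported for the project's model.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Synthetic binary data standing in for Normal (0) vs. Attack (1) traffic (assumption).
X, y = make_classification(n_samples=600, n_features=15, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1
)

clf = RandomForestClassifier(n_estimators=50, random_state=1).fit(X_train, y_train)

# The ROC curve needs a continuous score, so use the predicted attack probability.
attack_scores = clf.predict_proba(X_test)[:, 1]

# One (FPR, TPR) point per candidate threshold.
fpr, tpr, thresholds = roc_curve(y_test, attack_scores)
auc = roc_auc_score(y_test, attack_scores)
print(f"AUC = {auc:.3f}")
```

Plotting tpr against fpr gives the solid curve; the diagonal from (0, 0) to (1, 1) corresponds to an AUC of 0.5, the random-classifier baseline.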
Random Forest Results: After applying PCA, the Random Forest model
demonstrated outstanding performance. It achieved an overall Accuracy of 94.82%.
The model showed high precision, meaning it generated very few false alarms,
making it suitable for a real-time SOC environment.
SVM Results: The SVM model achieved an Accuracy of 93.32%. While slightly lower
than RF, the SVM provided a robust secondary validation, particularly in identifying
"Stealth" attacks that involve low packet counts.
As shown in figure (4.3):
(Figure 4.3 Percentages)
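A comparison of this kind can be sketched as follows. The data is synthetic and the settings (50 trees, an RBF kernel for the SVM, a 70/30 split) are assumptions, so the printed scores are illustrative and are not the project's 94.82% and 93.32%.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the reduced feature set (assumption).
X, y = make_classification(n_samples=600, n_features=15, random_state=7)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=7
)

models = {
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=7),
    # SVMs are sensitive to feature scale, so scaling is part of the pipeline.
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, predictions),
        "precision": precision_score(y_test, predictions),
        "recall": recall_score(y_test, predictions),
        "f1": f1_score(y_test, predictions),
    }

for name, scores in results.items():
    print(name, {metric: round(value, 4) for metric, value in scores.items()})
```

Precision is the metric tied to false alarms: it is the share of traffic flagged as an attack that really was an attack.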
Figure 4.4 illustrates the integrated workflow of the hybrid
detection system, where live network traffic is captured and simultaneously
processed through two primary channels. On the left, Snort applies rules to identify
known malicious patterns, while on the right, the Machine Learning pipeline
(Random Forest/SVM) performs behavioural analysis on the UNSW-NB15 dataset
features. The central integration module consolidates these findings to provide a
comprehensive security verdict, which is then visualized in real-time via the
administrative dashboard.
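One possible decision rule for such an integration module is sketched below. The function name, the verdict labels, and the 0.5 probability threshold are all hypothetical; the report does not specify how the two channels are weighted against each other.

```python
from dataclasses import dataclass


@dataclass
class Verdict:
    label: str   # "Attack", "Suspicious" or "Normal" (illustrative labels)
    source: str  # which channel(s) raised the event


def combine_verdicts(snort_matched, ml_attack_probability, threshold=0.5):
    """Merge the signature channel and the ML channel into a single verdict.

    snort_matched: True if a Snort rule fired for this flow.
    ml_attack_probability: the classifier's attack probability for this flow.
    threshold: hypothetical cut-off above which the ML channel flags the flow.
    """
    ml_flagged = ml_attack_probability >= threshold
    if snort_matched and ml_flagged:
        return Verdict("Attack", "Snort + ML")  # both channels agree
    if snort_matched:
        return Verdict("Attack", "Snort")       # known signature only
    if ml_flagged:
        return Verdict("Suspicious", "ML")      # behavioural anomaly only
    return Verdict("Normal", "none")


print(combine_verdicts(True, 0.91))
print(combine_verdicts(False, 0.78))
print(combine_verdicts(False, 0.05))
```

Treating ML-only detections as "Suspicious" rather than "Attack" is a design choice made for this sketch: it keeps signature hits authoritative while still surfacing behaviour that no rule covers.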
Chapter 5 – Implementation & Testing
5.1 Introduction
This chapter details the transition of the project from its theoretical and architectural
design into a functional, multi-layered security prototype for Improving Snort
Techniques Using Machine Learning. The implementation phase focuses on the
integration of established network monitoring tools, specifically Snort and Zeek, with
a custom Python-based AI backend. By utilizing the UNSW-NB15 dataset and the
machine learning models described in the previous chapter (Random Forest and
SVM), this phase demonstrates how raw network traffic is transformed
into actionable security intelligence.
Figure 5.1 shows the libraries used to start the pre-processing phase:
(Figure 5.1 Data Pre-processing)
Figure 5.2 shows the evaluation results obtained after pre-processing
(Accuracy, Recall, Precision, F1-score)
(Figure 5.2 Results)
Figure 5.3: The start of the integration between the AI model (intrution_model.pkl),
Snort alerts (snort/alert) and the Zeek network monitoring file (conn.log)
(Figure 5.3 Model integration)
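As an illustration of the Zeek side of this integration, the sketch below parses a log in Zeek's default tab-separated format, where metadata lines start with "#" and the "#fields" line names the columns. The sample record is made up, and the function name is hypothetical. Loading the model file and mapping Zeek fields onto the model's input features are left out because the report does not show those steps.

```python
def parse_zeek_log(text):
    """Parse a Zeek TSV log into a list of dicts keyed by the #fields names."""
    fields = []
    records = []
    for line in text.splitlines():
        if line.startswith("#fields"):
            # Column names follow the "#fields" tag, separated by tabs.
            fields = line.split("\t")[1:]
        elif line and not line.startswith("#"):
            # Every other non-comment line is one connection record.
            records.append(dict(zip(fields, line.split("\t"))))
    return records


# Made-up excerpt in conn.log layout (illustrative values only).
sample_log = "\n".join([
    "#separator \\x09",
    "#path\tconn",
    "#fields\tts\tuid\tid.orig_h\tid.orig_p\tid.resp_h\tid.resp_p\tproto\tduration",
    "1700000000.000000\tCabc123\t192.168.1.10\t51515\t192.168.1.20\t22\ttcp\t0.25",
    "#close\t2025-01-01-00-00-00",
])

for record in parse_zeek_log(sample_log):
    print(record["id.orig_h"], "->", record["id.resp_h"], record["proto"])
```

All values come back as strings, so numeric fields such as duration need to be converted before they are passed to a classifier.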
Figure 5.4: Using the Flask framework, we created an app.py file that serves the
events dashboard on a specific port
(Figure 5.4 App Flask)
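A minimal Flask application of this kind might look like the sketch below. The route name, the in-memory EVENTS list, and the event fields are hypothetical placeholders; the project's actual app.py appears only as a screenshot, so this is not its code.

```python
from flask import Flask, jsonify

app = Flask(__name__)

# Placeholder event store (assumption); a real dashboard would read the
# detections produced by the Snort and ML channels instead.
EVENTS = [
    {"time": "2025-01-01 12:00:00", "source": "Snort", "verdict": "Attack"},
    {"time": "2025-01-01 12:00:05", "source": "ML", "verdict": "Suspicious"},
]


@app.route("/events")
def list_events():
    """Return the recorded events as JSON for the dashboard front end."""
    return jsonify(EVENTS)


# To serve on a chosen port, start it from the shell, for example:
#   flask --app app run --port 5000
```

Keeping the server start outside the module lets the same file be imported for testing without opening a port.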
Figure 5.5: The Snort rules we installed, chosen according to the attacks best
captured by our model
(Figure 5.5 Snort Rules)
(Figure 5.6 Sign-in User interface)
Figure 5.7: The final GUI events dashboard, which also shows some test
events
(Figure 5.7 Home Dashboard)
Figure 5.8: After an incident, the person in charge can download a Report.pdf
file that presents the last 10 attacks with their metadata
(Figure 5.8 Report file last events)
5.2 Testing phase
Figure 5.2.1 shows the launch of a PS script against the secure shell (SSH) port to
test whether the AI model detects a high volume of connections as a potential
DoS/Probing attack
(Figure 5.2.1)
Figure 5.2.2 shows how the event log is presented to the user after the incident
(Figure 5.2.2)
Chapter 6 – Future work & Conclusion
6.1 Introduction
This chapter reviews the results obtained from the development of this
project. It discusses the data collected during the implementation phase, evaluates
the effectiveness of the system through testing, and places the findings in context.
The overall goal is to gain insight into the strengths and weaknesses of the
developed system and to understand the extent to which it meets its intended
objectives.
6.2 Future Work
While this study successfully demonstrated the integration of the AI models with
Snort and Zeek for threat detection, several avenues for future research remain:
• Real-time Deployment and Scalability: Future research could focus on
transitioning the current model from a controlled environment (Google Colab)
to a live, high-speed network environment. Investigating the computational
overhead of deep learning models in real-time "inline" deployments would be
critical for industry adoption.
• Expansion of Datasets: While the UNSW-NB15 dataset provides a robust
foundation, future iterations of this work should incorporate more recent
datasets (such as CSE-CIC-IDS2018) to account for evolving exploits and
modern encrypted traffic patterns.
• Adversarial Machine Learning: A significant area for further study is the
resilience of AI models against adversarial attacks, where attackers
specifically craft traffic to bypass neural network detection.
6.3 Conclusion
This project successfully enhanced traditional intrusion detection by
integrating Machine Learning with Snort to address the limitations of signature-based
systems. Through the application of Random Forest and SVM models on the UNSW-
NB15 dataset, the system demonstrated a superior ability to identify the specified
threats. The implementation indicated that behavioural analysis can reduce
false positives while maintaining high detection accuracy across diverse network
traffic. Findings confirm that this hybrid approach provides a more robust and
scalable defence mechanism than conventional methods alone. While the current
prototype shows high efficiency, future work could explore deep learning
architectures to further automate threat mitigation. Ultimately, this research bridges
the gap between static rule sets and dynamic AI-driven security, offering a more
resilient framework for modern cybersecurity environments.
References:
Academic Journals & Papers
• Moamin, S. A., Abdulhameed, M. K., Al-Amri, R. M., Radhi, A. D.,
Naser, R. K., & Pheng, L. G. (2025). Artificial Intelligence in Malware
and Network Intrusion Detection: A Comprehensive Survey of
Techniques, Datasets, Challenges, and Future Directions. Babylonian
Journal of Artificial Intelligence, 2025, 77-98.
• Davis, J. J., & Clark, A. J. (2011). Data preprocessing for anomaly
based network intrusion detection: A review. Computers & Security,
30(6), 353–375.
• Jeong, H. J., Hyun, W., Lim, J., & You, I. (2012). Anomaly teletraffic
intrusion detection systems on Hadoop-based platforms: A survey of
some problems and solutions. In 15th International Conference on
Network-Based Information Systems (pp. 766–770).
• Poston, H. E. (2012). A brief taxonomy of intrusion detection strategies.
In IEEE National Aerospace and Electronics Conference (pp. 255–263).
• Chou, D., & Jiang, M. (2021). A survey on data-driven network
intrusion detection. ACM Computing Surveys (CSUR), 54(9), 1-36.
• Moustafa, N., & Slay, J. (2015, November). UNSW-NB15: a
comprehensive data set for network intrusion detection systems
(UNSW-NB15 network data set). In 2015 military communications
and information systems conference (MilCIS) (pp. 1-6). IEEE.
• Shohan, N. J., Tanbhir, G., Elahi, F., Ullah, A., & Sakib, M. N. (2023,
December). Enhancing network security: A hybrid approach for
detection and mitigation of distributed denial-of-service attacks
using machine learning. In International Conference on Advanced
Network Technologies and Intelligent Computing (pp. 81-95). Cham:
Springer Nature Switzerland.
Company Technology & Software
• Bfore.Ai. (n.d.). Predictive domain scoring and pre-emptive threat
intelligence: Technology overview. https://bfore.ai/
• Darktrace. (n.d.). Autonomous response and real-time behavioural
clustering: Cyber AI research. https://www.darktrace.com/
• Google. (n.d.). Google Colab. https://colab.google/
• PythonAnywhere. (n.d.). PythonAnywhere host services.
https://www.pythonanywhere.com/
• SafeBreach. (n.d.). Attack emulation and defensive gap assessment:
Platform features. https://www.safebreach.com/
• ThetaRay. (n.d.). Deep learning and statistical analysis for large-scale
data. https://thetaray.com/
Methodology
• Lucid Software. (n.d.). Waterfall methodology overview.
https://www.lucidchart.com/