The paper proposes a two-phase classification method for detecting anomalies in network traffic, aiming to tackle the challenges of imbalanced data and feature selection. The study uses Information Gain to select relevant features and evaluates the method on the CICIDS-2018 dataset with various classifiers. The ensemble classifier achieved the highest accuracy, precision, and recall, which highlights the effectiveness of ensemble classifiers in improving anomaly detection accuracy. The number of relevant features chosen by Information Gain also has a considerable impact on the F1-score and detection accuracy. Specifically, the ensemble classifier achieved the highest accuracy of 98.36% and an F1-score of 97.98% using the selected features.
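As a rough illustration of Information Gain feature ranking, the sketch below scores discrete features by how much they reduce label entropy and keeps the best ones. The function names, the toy flows, and the feature names are assumptions made for this example and do not come from the paper; continuous traffic features would first need to be discretized.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Label entropy minus the expected entropy after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def top_k_features(columns, labels, k):
    """Rank named feature columns by information gain and keep the k best."""
    ranked = sorted(
        columns,
        key=lambda name: information_gain(columns[name], labels),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical toy flows: 'flag' separates the classes perfectly, 'proto' does not.
labels = ["attack", "attack", "benign", "benign"]
columns = {
    "proto": ["tcp", "udp", "tcp", "udp"],
    "flag": ["S0", "S0", "SF", "SF"],
}
print(top_k_features(columns, labels, k=1))
```

Varying `k` in such a ranking is one way to observe how the number of selected features affects accuracy and F1-score.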
A PROPOSED MODEL FOR DIMENSIONALITY REDUCTION TO IMPROVE THE CLASSIFICATION C...IJNSA Journal
Over the past few years, intrusion protection systems have become a mature research area in the field of computer networks. An excessive number of features has a significant impact on intrusion detection performance. Many previous studies have used machine learning algorithms to classify network traffic as harmful or normal. To obtain good accuracy, the dimensionality of the data must therefore be reduced. This paper proposes a new model design based on a combination of feature selection and machine learning algorithms. The model depends on selected genes from every feature to increase the accuracy of intrusion detection systems; only the features that affect attack detection are retained. Performance was evaluated by comparison with several known algorithms, using the NSL-KDD dataset for classification. The proposed model outperformed the other learning approaches with an accuracy of 98.8%.
Intrusion detection system for imbalance ratio class using weighted XGBoost c...TELKOMNIKA JOURNAL
The rapid development of the internet of things (IoT) has given it an important role in daily activities. As it develops, IoT becomes very vulnerable to attacks, which exposes its users to risk. An intrusion detection system (IDS) can work efficiently and look for suspicious activity in the network. Many datasets have already been collected; however, they raise problems of big data and high data imbalance. This article proposes an XGBoost model, evaluated on the BotIoT dataset, that improves detection performance for all types of attacks and handles unbalanced data by using the imbalance ratio of each class as a class weight (CW). The experimental results show that the proposed approach greatly increases the detection rate for infrequent attacks.
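One plausible reading of weighting classes by their imbalance ratio is sketched below: each class gets the weight (largest class count / its own count), so rare attack classes count more during training. The exact weighting formula, the function names, and the label distribution are assumptions for illustration, not the article's published scheme.

```python
from collections import Counter


def imbalance_ratio_weights(labels):
    """Weight each class by (largest class count / its own count)."""
    counts = Counter(labels)
    largest = max(counts.values())
    return {cls: largest / n for cls, n in counts.items()}


def sample_weights(labels):
    """Expand per-class weights into one weight per training record."""
    weights = imbalance_ratio_weights(labels)
    return [weights[label] for label in labels]


# Hypothetical label distribution: 90 DDoS, 9 reconnaissance, 1 theft record.
labels = ["ddos"] * 90 + ["recon"] * 9 + ["theft"] * 1
print(imbalance_ratio_weights(labels))
```

Many gradient boosting libraries accept a per-record weight list of this kind at training time, which is where the output of `sample_weights` would be passed.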
ADDRESSING IMBALANCED CLASSES PROBLEM OF INTRUSION DETECTION SYSTEM USING WEI...IJCNCJournal
The main issues of Intrusion Detection Systems (IDS) are their sensitivity to errors and the inconsistent and inequitable ways in which they have often been evaluated. Most previous efforts were concerned with improving overall accuracy by increasing the detection rate and decreasing false alarms, which is an important issue. However, Machine Learning (ML) algorithms can assign all or most of the records of the minor classes to one of the main classes with negligible impact on overall performance. The motivations for this work were the risk posed by threats in the small classes, the shortcomings of previous efforts in addressing this issue, and the need to improve IDS performance. In this paper, stratified sampling and different cost-function schemes were combined with the Extreme Learning Machine (ELM) method, using kernels and activation functions, to build competitive ID solutions that improved performance and reduced the occurrence of the accuracy paradox. The main experiments were performed on the UNB ISCX2012 dataset. The results showed that ELM models with a polynomial function outperform the other models in overall accuracy, recall, and F-score, and compete with the traditional model in the Normal, DoS, and SSH classes.
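The accuracy paradox and stratified sampling mentioned above can be made concrete with a small sketch. The class names, counts, and helper functions here are hypothetical; the point is only that a model predicting the majority class everywhere scores high accuracy with zero recall on the rare class, and that a stratified split keeps the rare class represented in the test set.

```python
import random
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of records whose predicted class matches the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_recall(y_true, y_pred):
    """Recall computed separately for every true class."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {cls: hits[cls] / n for cls, n in totals.items()}


def stratified_split(labels, test_fraction, seed=0):
    """Split record indices so each class keeps roughly the same share in the test set."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = max(1, round(len(indices) * test_fraction))
        test.extend(indices[:cut])
        train.extend(indices[cut:])
    return train, test


# Hypothetical data: 95 normal records, 5 rare attack records.
y_true = ["normal"] * 95 + ["ssh_bruteforce"] * 5
y_pred = ["normal"] * 100  # a model that always predicts the majority class

print(accuracy(y_true, y_pred))          # high overall accuracy
print(per_class_recall(y_true, y_pred))  # but no rare attack is detected
```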
INTRUSION DETECTION USING FEATURE SELECTION AND MACHINE LEARNING ALGORITHM WI...ijcsit
Intrusion detection over the network is a critical issue for preventing illegitimate use by intruders. An intruder may enter any network, system, or server by injecting malicious packets in order to steal, sniff, manipulate, or corrupt useful and secret information. This process is referred to as intrusion, whereas the transmission of packets by an intruder over the network for any purpose of intrusion is referred to as an attack. With expanding networking technology, millions of servers communicate with each other, and this expansion continues every day. As a result, more and more intruders are drawn in, so a smart intrusion detection model is a primary requirement.
By analyzing feature selection methods, the essential features of the NSL-KDD dataset are identified. A hybrid algorithm is then built from the selected features, a machine learning approach, and an analysis of the basic network features in the dataset. Finally, a model containing the rules for the network features is produced from the algorithm.
A hybrid misuse intrusion detection model is built to find attacks on the system and improve intrusion detection. Based on prior features, intrusions on the system can be detected without any previous learning. The model combines the advantages of feature selection and machine learning techniques with misuse detection.
New Hybrid Intrusion Detection System Based On Data Mining Technique to Enhan...ijceronline
International Journal of Computational Engineering Research (IJCER) is dedicated to protecting personal information and will make every reasonable effort to handle collected information appropriately. All information collected, as well as related requests, will be handled as carefully and efficiently as possible in accordance with IJCER standards for integrity and objectivity.
Visualize network anomaly detection by using k means clustering algorithmIJCNCJournal
With the ever-increasing number of new attacks in today's world, the amount of data will keep increasing, and because of the base-rate fallacy the number of false alarms will also increase. Another problem is that attacks usually are not detected until after they have taken place, which makes defending against them hard and can easily lead to disclosure of sensitive information.
In this paper we use the K-means algorithm with the KDD Cup 1999 network dataset to evaluate the performance of an unsupervised learning method for anomaly detection. The evaluation showed that a high detection rate can be achieved while maintaining a low false alarm rate. The paper presents the results of k-means clustering obtained with the Cluster 3.0 tool and visualizes them with the TreeView visualization tool.
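The paper relies on the Cluster 3.0 tool, but the underlying idea can be sketched in a few lines of plain Python. The example below runs Lloyd's k-means on toy two-dimensional points and flags every point outside the most populated cluster. Treating the largest cluster as "normal" is an assumption of this sketch, as are the data points, the fixed initial centroids, and all function names.

```python
from collections import Counter
from math import dist


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [
        min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        for p in points
    ]


def kmeans(points, initial_centroids, iterations=10):
    """Plain Lloyd's algorithm with caller-supplied starting centroids."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(iterations):
        labels = assign(points, centroids)
        for i in range(len(centroids)):
            members = [p for p, a in zip(points, labels) if a == i]
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, assign(points, centroids)


def flag_anomalies(labels):
    """Assumption: the most populated cluster is normal; all other points are flagged."""
    normal_cluster = Counter(labels).most_common(1)[0][0]
    return [label != normal_cluster for label in labels]


# Hypothetical traffic summaries: four similar records and two far-away ones.
points = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.0), (8.0, 8.0), (8.5, 7.9)]
centroids, labels = kmeans(points, initial_centroids=[points[0], points[-1]])
print(flag_anomalies(labels))
```

Fixing the initial centroids keeps the sketch deterministic; a practical run would normally try several random initializations.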
Data Mining Techniques for Providing Network Security through Intrusion Detec...IJAAS Team
Intrusion Detection Systems play a major role in network security in today's internet world. Researchers have introduced many intrusion detection systems in the past, yet no system has detected all kinds of attacks with high detection accuracy. Most intrusion detection systems use data mining techniques such as clustering, outlier detection, classification, and classification through learning techniques. Most researchers have applied soft computing techniques to make effective decisions over network datasets and enhance detection accuracy. A few researchers have also applied artificial intelligence techniques along with data mining algorithms to make dynamic decisions. This paper discusses a number of intrusion detection systems proposed for providing network security. Finally, it presents a comparative analysis of the existing systems and suggests some new ideas for enhancing their performance.
Machine learning-based intrusion detection system for detecting web attacksIAESIJAI
The increasing use of smart devices results in a huge amount of data, which raises concerns about personal data, including health data and financial data. This data circulates on the network and can encounter network traffic at any time. This traffic can either be normal traffic or an intrusion created by hackers with the aim of injecting abnormal traffic into the network. Firewalls and traditional intrusion detection systems detect attacks based on signature patterns. However, this is not sufficient to detect advanced or unknown attacks. To detect different types of unknown attacks, the use of intelligent techniques is essential. In this paper, we analyse some machine learning techniques proposed in recent years. In this study, several classifications were made to detect anomalous behaviour in network traffic. The models were built and evaluated based on the Canadian Institute for Cybersecurity-intrusion detection systems dataset released in 2017 (CIC-IDS-2017), which includes both current and historical attacks. The experiments were conducted using decision tree, random forest, logistic regression, gaussian naïve bayes, adaptive boosting, and their ensemble approach. The models were evaluated using various evaluation metrics such as accuracy, precision, recall, F1-score, false positive rate, receiver operating characteristic curve, and calibration curve.
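Several of the evaluation metrics listed above follow directly from the confusion counts of a binary attack/benign classifier. The sketch below computes them for a made-up set of predictions; the function name, the label strings, and the data are illustrative assumptions, not results from the study.

```python
def binary_metrics(y_true, y_pred, positive="attack"):
    """Accuracy, precision, recall, F1-score and false positive rate for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "false_positive_rate": fpr,
    }


# Hypothetical predictions: 3 of 4 attacks caught, 1 of 6 benign flows misflagged.
y_true = ["attack"] * 4 + ["benign"] * 6
y_pred = ["attack", "attack", "attack", "benign", "attack"] + ["benign"] * 5
print(binary_metrics(y_true, y_pred))
```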
A Lightweight Method for Detecting Cyber Attacks in High-traffic Large Networ...IJCNCJournal
Protecting information systems is a difficult and long-term task. The size and traffic intensity of computer networks are diverse and no one protection solution is universal for all cases. A certain solution protects well in the campus network, but it is unlikely to protect well in the service provider's network. A key component of a cyber defence system is a network attack detector. This component needs to be designed to have a good way to scale detection capabilities with network size and traffic intensity beyond the size and intensity of a campus network. From this point of view, this paper aims to build a network attack detection method suitable for the scale of large and high-traffic networks based on machine learning models using clustering techniques and our proposed detection technique. The detection technique is different from outlier detection commonly used in clustering-based anomaly detection applications. The method was evaluated in cases using different feature extraction methods and different clustering algorithms. Experimental results on the NSL-KDD data set are positive with a detection accuracy of over 97%.
A new proactive feature selection model based on the enhanced optimization a...IJECEIAES
Cyberattacks have grown steadily over the last few years. The distributed reflection denial of service (DRDoS) attack, a new variant of the distributed denial of service (DDoS) attack, has been rising. DRDoS attacks are more difficult to mitigate due to their dynamics and attack strategy. The number of features used to investigate traffic behavior influences the performance of the intrusion detection system. A feature selection model therefore improves the accuracy of the detection mechanism and also reduces detection time by reducing the number of features. The proposed model, called proactive feature selection (PFS), aims to detect DRDoS attacks based on feature selection. It uses a nature-inspired optimization algorithm for feature subset selection. Three machine learning algorithms, i.e., k-nearest neighbor (KNN), random forest (RF), and support vector machine (SVM), were evaluated as potential classifiers for the selected features. We used the CICDDoS2019 dataset for evaluation. The performance of each classifier is compared to previous models. The results indicate that the suggested model works better than the current approaches, providing a higher detection rate (DR), a low false-positive rate (FPR), and increased detection accuracy (DA). The PFS model detects DRDoS attacks with an accuracy of 89.59%.
SURVEY OF NETWORK ANOMALY DETECTION USING MARKOV CHAINijcseit
Internet threats have increased recently. Our aim is to detect intrusions in the network concisely. Real-time issues such as DoS attacks on banks, companies, industries, and organizations have increased significantly, and IDSs have been used on both the server and host side. The major challenge is to effectively predict the periods of threats and protect the server from unauthorized users. In this study, a novel probabilistic approach is proposed to detect network intrusions effectively. It uses a Markov chain for probabilistic modelling of abnormal events in network systems. The degree of abnormality of the incoming data is evaluated on the basis of the network states.
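A minimal sketch of Markov-chain scoring, under assumptions not stated in the abstract: network activity is reduced to a sequence of discrete states, transition probabilities are estimated from normal sequences with additive smoothing, and the degree of abnormality is the average negative log-probability of the observed transitions. The state names, the smoothing constant, and the scoring rule are all hypothetical.

```python
from math import log


def fit_transitions(sequences, states, smoothing=1.0):
    """Estimate P(next state | current state) with additive smoothing."""
    counts = {s: {t: smoothing for t in states} for s in states}
    for seq in sequences:
        for cur, nxt in zip(seq, seq[1:]):
            counts[cur][nxt] += 1
    return {
        s: {t: c / sum(row.values()) for t, c in row.items()}
        for s, row in counts.items()
    }


def abnormality(sequence, transitions):
    """Average negative log-probability per transition; higher means more unusual."""
    steps = list(zip(sequence, sequence[1:]))
    return -sum(log(transitions[a][b]) for a, b in steps) / len(steps)


# Hypothetical network states and normal training sequences.
states = ["idle", "login", "transfer", "flood"]
normal = [["idle", "login", "transfer", "idle"]] * 3
model = fit_transitions(normal, states)

usual = abnormality(["idle", "login", "transfer", "idle"], model)
unusual = abnormality(["idle", "flood", "flood", "flood"], model)
print(usual < unusual)
```

A threshold on this score, chosen from held-out normal traffic, would turn it into an alert decision.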
The main goal of Intrusion Detection Systems (IDSs) is to detect intrusions. This kind of detection system is a significant tool for ensuring cyber security in traditional computer-based systems. An IDS model can be faster and reach more accurate detection rates by selecting the most relevant features from the input dataset. Feature selection is an important stage of any IDS: it selects the optimal subset of features, which makes training faster and reduces complexity while preserving or enhancing the performance of the system. In this paper, we propose a method based on dividing the input dataset into subsets, one per attack. We then apply a feature selection technique using an information gain filter to each subset, and generate the optimal feature set by combining the feature sets obtained for each attack. Experimental results on the NSL-KDD dataset show that the proposed feature selection method, with fewer features, improves system accuracy while decreasing complexity. Moreover, a comparative study of the efficiency of the feature selection technique is performed using different classification methods. To enhance overall performance, a further stage applies a voting learning algorithm over Random Forest and PART. The results indicate that the best accuracy is achieved when using the product probability rule.
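Two steps of this method lend themselves to a short sketch: merging the feature lists selected per attack, and combining two classifiers with the product probability rule. The per-attack feature lists and the class probabilities below are invented for illustration, and the helper names are assumptions of this sketch, not the authors' code.

```python
def union_of_feature_sets(per_attack_features):
    """Merge the per-attack feature lists, preserving first-seen order."""
    merged = []
    for features in per_attack_features.values():
        for feature in features:
            if feature not in merged:
                merged.append(feature)
    return merged


def product_rule(prob_dicts):
    """Multiply each class probability across classifiers, then renormalise.

    Assumes at least one class has a non-zero product.
    """
    classes = list(prob_dicts[0])
    scores = {c: 1.0 for c in classes}
    for probs in prob_dicts:
        for c in classes:
            scores[c] *= probs[c]
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}


# Hypothetical information-gain selections for two attack subsets.
selected = {
    "dos": ["src_bytes", "count", "serror_rate"],
    "probe": ["dst_bytes", "src_bytes"],
}
print(union_of_feature_sets(selected))

# Hypothetical class probabilities from two classifiers for one record.
random_forest = {"normal": 0.6, "dos": 0.4}
part_rules = {"normal": 0.2, "dos": 0.8}
combined = product_rule([random_forest, part_rules])
print(max(combined, key=combined.get))
```

In this toy record the first classifier alone would answer "normal", yet the product rule favours "dos" because the second classifier is far more confident.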
ATTACK DETECTION AVAILING FEATURE DISCRETION USING RANDOM FOREST CLASSIFIERCSEIJJournal
The widespread use of the Internet has the adverse effect of increased vulnerability to cyber attacks. Defensive mechanisms like firewalls and IDSs have evolved through many research contributions in these areas. Machine learning techniques have been successfully used in these defense mechanisms, especially IDSs. Although they are effective to some extent in identifying new patterns and variants of existing malicious patterns, many attacks still go undetected. The objective is to develop an algorithm for detecting malicious domains based on passive traffic measurements. In this paper, an anomaly-based intrusion detection system is deployed that uses an ensemble machine learning classifier, Random Forest with gradient boosting. The NSL-KDD cup dataset is used for analysis, and 32 of its 41 features were identified as significant using feature discretion. Our observations confirm the conjecture that both feature selection and stochastic genetic operators improve accuracy and effectiveness. Training time is reduced by 98.59% and accuracy improves to 98.75%.
Improving the performance of Intrusion detection systemsyasmen essam
Intrusion detection systems (IDS) are widely studied by researchers nowadays due to the dramatic growth in network-based technologies. Policy violations and unauthorized access are in turn increasing, which makes intrusion detection systems of great importance. Existing approaches to improving intrusion detection systems focus on feature selection or reduction, since some features are irrelevant or redundant and removing them improves both accuracy and learning time.
BENCHMARKS FOR EVALUATING ANOMALY-BASED INTRUSION DETECTION SOLUTIONSIJNSA Journal
Anomaly-based Intrusion Detection Systems (IDS) have gained increased popularity over time. There are many proposed anomaly-based systems using different Machine Learning (ML) algorithms and techniques, however there is no standard benchmark to compare them based on quantifiable measures. In this paper, we propose a benchmark that measures both accuracy and performance to produce objective metrics that can be used in the evaluation of each algorithm implementation. We then use this benchmark to compare accuracy as well as the performance of four different Anomaly-based IDS solutions based on various ML algorithms. The algorithms include Naive Bayes, Support Vector Machines, Neural Networks, and K-means Clustering. The benchmark evaluation is performed on the popular NSL-KDD dataset. The experimental results show the differences in accuracy and performance between these Anomaly-based IDS solutions on the dataset. The results also demonstrate how this benchmark can be used to create useful metrics for such comparisons.
Progress of Machine Learning in the Field of Intrusion Detection Systemsijcisjournal
With the growth in the use of the Internet and local area networks, malicious attacks and intrusions into computer systems are increasing. Implementing intrusion detection systems has become extremely important to help maintain good network security. Support vector machines (SVMs), a classic pattern recognition tool, have been widely used in intrusion detection. They can handle very large data with high efficiency, are easy to use, and exhibit good prediction behavior. This paper presents a new SVM model enriched with a Gaussian kernel function based on the features of the training data for intrusion detection. The new model is tested with the CICIDS2017 dataset. The test shows better results in terms of detection efficiency and false alarm rate, which can give better coverage and make detection more efficient.
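For orientation, the standard Gaussian (RBF) kernel that such SVM models build on can be written in a few lines. This is the textbook form with a hypothetical `gamma` parameter, not the paper's enriched variant, whose details the abstract does not give.

```python
from math import exp


def gaussian_kernel(x, y, gamma=0.5):
    """Standard RBF kernel: exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return exp(-gamma * squared_distance)


# Hypothetical scaled feature vectors for three flows.
flow_a = (0.1, 0.2, 0.3)
flow_b = (0.1, 0.25, 0.3)
flow_c = (0.9, 0.8, 0.1)

print(gaussian_kernel(flow_a, flow_b) > gaussian_kernel(flow_a, flow_c))
```

The kernel value is 1 for identical vectors and decays towards 0 as they move apart, so similar flows receive a similarity close to 1.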
CLASSIFICATION PROCEDURES FOR INTRUSION DETECTION BASED ON KDD CUP 99 DATA SETIJNSA Journal
In a network security framework, intrusion detection is a benchmark component and a fundamental way to protect PCs from many threats. A major issue in intrusion detection is the large number of false alerts; this motivates experts to look for ways of minimizing false alerts through data mining, an analysis procedure used on large datasets such as KDD CUP 99. This paper reviews various data mining classification procedures for handling false alerts in intrusion detection. Tests of many data mining procedures on KDD CUP 99 show that no individual procedure can reveal all attack classes with high accuracy and without false alerts. The best accuracy, 92%, is obtained by the Multilayer Perceptron, whereas the best training time, 4 seconds, is obtained by the rule-based model. It is concluded that various procedures should be used to handle the different kinds of network attacks.
DETECTION OF ATTACKS IN WIRELESS NETWORKS USING DATA MINING TECHNIQUESIAEME Publication
With the progressive increase of network applications and electronic devices (computers, mobile phones, Android devices, etc.), attack and intrusion detection is becoming a very challenging task in the cybercrime detection area. In this context, most existing attack detection approaches rely mainly on a finite set of attacks. However, these solutions are vulnerable: they fail to detect some attacks when the sources of information are ambiguous or imperfect, and few approaches have started investigating this direction. Following this trend, this paper investigates the role of machine learning approaches (ANN, SVM) in classifying TCP connection traffic as normal or suspicious. However, using ANN or SVM individually is expensive. In this paper, a combination of two classifiers is proposed, employing an artificial neural network (ANN) classifier and a support vector machine (SVM). Additionally, the proposed solution allows the obtained classification results to be visualized. The accuracy of the proposed solution has been compared with other classifier results. Experiments have been conducted with different network connections selected from the NSL-KDD DARPA dataset. Empirical results show that combining ANN and SVM techniques for attack detection is a promising direction.
Intrusion detection systems (IDS) are an effective approach
for tackling network security problems. These kinds of
systems play a key role in network security as they can detect
different types of attacks in networks, including DoS, U2R, Probe
and R2L. In addition, IDS are an increasingly key part of the
system's defense. Various approaches to IDS are now being used,
but are unfortunately relatively ineffective. Data mining techniques
and artificial intelligence play an important role in security
services. We present a comparative study of three well-known
intelligent algorithms in this paper. These are Radial Basis
Functions (RBF), Multilayer Perceptrons (MLP) and Support
Vector Machine (SVM). This work's main interest is to benchmark
the performance of these 3 intelligent algorithms. This is done by
using a dataset of about 9,000 connections, randomly chosen from
KDD'99's 10% dataset. In addition, we investigate these
algorithms' performance in terms of their attack classification
accuracy. The simulation results are also analyzed and a
discussion is then presented. It has been observed that SVM with a
linear kernel (Linear-SVM) gives a better performance than MLP
and RBF in terms of its detection accuracy and processing speed.
Vehicular Ad Hoc Networks (VANETs) have become a viable technology to improve traffic flow and safety on the roads. Due to its effectiveness and scalability, the Wingsuit Search-based Optimised Link State Routing Protocol (WS-OLSR) is frequently used for data distribution in VANETs. However, the selection of MultiPoint Relays (MPRs) plays a pivotal role in WS-OLSR's performance. This paper presents an improved MPR selection algorithm tailored to WS-OLSR, designed to enhance the overall routing efficiency and reduce overhead. The analysis found that the current OLSR protocol has problems such as redundancy of HELLO and TC message packets and failure to update routing information in time, so a WS-OLSR routing protocol based on an improved MPR selection algorithm is proposed. Firstly, factors such as node mobility and link changes are comprehensively considered to reflect network topology changes, and the broadcast cycle of node HELLO messages is controlled according to those topology changes. Secondly, a new MPR selection algorithm is proposed, considering link stability issues and nodes. Finally, its effectiveness is evaluated in terms of packet delivery ratio, end-to-end delay, and control message overhead. Simulation results demonstrate the superior performance of the improved MPR selection algorithm when compared to traditional approaches.
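For context on what MPR selection involves, the traditional OLSR approach can be sketched as a greedy cover of the 2-hop neighbourhood. This is a simplified version of the classic heuristic, not the improved algorithm the abstract proposes: it ignores node willingness, mobility, and link stability, and the node names are hypothetical.

```python
def select_mprs(two_hop_via):
    """Greedy MPR selection.

    two_hop_via maps each 1-hop neighbour to the set of strict 2-hop
    neighbours that can be reached through it.
    """
    uncovered = set().union(*two_hop_via.values()) if two_hop_via else set()
    mprs = set()

    # Step 1: a neighbour that is the only route to some 2-hop node is mandatory.
    for node in sorted(uncovered):
        providers = [n for n in sorted(two_hop_via) if node in two_hop_via[n]]
        if len(providers) == 1:
            mprs.add(providers[0])
    for relay in mprs:
        uncovered -= two_hop_via[relay]

    # Step 2: keep adding the neighbour that covers the most uncovered nodes.
    while uncovered:
        candidates = sorted(set(two_hop_via) - mprs)
        best = max(candidates, key=lambda n: len(two_hop_via[n] & uncovered))
        mprs.add(best)
        uncovered -= two_hop_via[best]
    return mprs


# Hypothetical neighbourhood: letters A-E are 1-hop neighbours, w-z are 2-hop nodes.
neighbourhood = {
    "A": {"x", "y"},
    "B": {"y", "z"},
    "C": {"z"},
    "D": {"x", "y", "z"},
    "E": {"w"},
}
mprs = select_mprs(neighbourhood)
```

Here `E` is mandatory because only it reaches `w`, and `D` alone covers the remaining 2-hop nodes, so two relays suffice instead of five.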
A Novel Medium Access Control Strategy for Heterogeneous Traffic in Wireless ...IJCNCJournal
So far, Wireless Body Area Networks (WBANs) have played a pivotal role in driving the development of intelligent healthcare systems with broad applicability across various domains. Each WBAN consists of one or more types of sensors that can be embedded in clothing, attached directly to the body, or even implanted beneath an individual's skin. These sensors typically serve a single application. However, the traffic generated by each sensor may have distinct requirements. This diversity necessitates a dual approach: tailored treatment based on the specific needs of each traffic type and the fulfillment of application requirements, such as reliability and timeliness. Nevertheless, the presence of energy constraints and the unreliable nature of wireless communications make QoS provisioning in such networks a non-trivial task. In this context, the current paper introduces a novel Medium Access Control (MAC) strategy for the regular traffic applications of WBANs, designed to significantly enhance efficiency when compared to the established MAC protocols IEEE 802.15.4 and IEEE 802.15.6, with a particular focus on improving reliability, timeliness, and energy efficiency.
Similar to Intrusion Detection System (IDS) Development Using Tree-Based Machine Learning Algorithms
Data Mining Techniques for Providing Network Security through Intrusion Detec...IJAAS Team
Intrusion detection systems play a major role in network security in today's Internet world. Researchers have introduced a number of intrusion detection systems in the past. Even so, no system has detected all kinds of attacks and achieved better detection accuracy. Most intrusion detection systems use data mining techniques such as clustering, outlier detection, classification, and classification through learning techniques. Most researchers have applied soft computing techniques to make effective decisions over the network dataset and enhance the detection accuracy of intrusion detection systems. A few researchers have also applied artificial intelligence techniques along with data mining algorithms to make dynamic decisions. This paper discusses a number of intrusion detection systems proposed for providing network security. Finally, a comparative analysis is made between the existing systems, and some new ideas are suggested for enhancing their performance.
Machine learning-based intrusion detection system for detecting web attacksIAESIJAI
The increasing use of smart devices results in a huge amount of data, which raises concerns about personal data, including health data and financial data. This data circulates on the network and can encounter network traffic at any time. This traffic can either be normal traffic or an intrusion created by hackers with the aim of injecting abnormal traffic into the network. Firewalls and traditional intrusion detection systems detect attacks based on signature patterns. However, this is not sufficient to detect advanced or unknown attacks. To detect different types of unknown attacks, the use of intelligent techniques is essential. In this paper, we analyse some machine learning techniques proposed in recent years. In this study, several classifications were made to detect anomalous behaviour in network traffic. The models were built and evaluated based on the Canadian Institute for Cybersecurity-intrusion detection systems dataset released in 2017 (CIC-IDS-2017), which includes both current and historical attacks. The experiments were conducted using decision tree, random forest, logistic regression, gaussian naïve bayes, adaptive boosting, and their ensemble approach. The models were evaluated using various evaluation metrics such as accuracy, precision, recall, F1-score, false positive rate, receiver operating characteristic curve, and calibration curve.
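The evaluation metrics named above follow directly from the binary confusion matrix. The sketch below shows the standard definitions; the label encoding (1 = attack, 0 = normal) and the sample predictions are assumptions for illustration, not results from the paper.

```python
def detection_metrics(y_true, y_pred):
    """Compute binary detection metrics; label 1 = attack, label 0 = normal."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # attacks caught
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normal passed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed attacks

    def ratio(numerator, denominator):
        # Guard against empty classes instead of raising ZeroDivisionError.
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
        "false_positive_rate": ratio(fp, fp + tn),
    }


# Hypothetical ground truth and classifier output for eight flows.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
metrics = detection_metrics(y_true, y_pred)
```

On imbalanced traffic, accuracy alone can look high while recall is poor, which is why the false positive rate and F1-score are usually reported alongside it.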
A Lightweight Method for Detecting Cyber Attacks in High-traffic Large Networ...IJCNCJournal
Protecting information systems is a difficult and long-term task. The size and traffic intensity of computer networks are diverse and no one protection solution is universal for all cases. A certain solution protects well in the campus network, but it is unlikely to protect well in the service provider's network. A key component of a cyber defence system is a network attack detector. This component needs to be designed to have a good way to scale detection capabilities with network size and traffic intensity beyond the size and intensity of a campus network. From this point of view, this paper aims to build a network attack detection method suitable for the scale of large and high-traffic networks based on machine learning models using clustering techniques and our proposed detection technique. The detection technique is different from outlier detection commonly used in clustering-based anomaly detection applications. The method was evaluated in cases using different feature extraction methods and different clustering algorithms. Experimental results on the NSL-KDD data set are positive with a detection accuracy of over 97%.
A new proactive feature selection model based on the enhanced optimization a...IJECEIAES
Cyberattacks have grown steadily over the last few years. The distributed reflection denial of service (DRDoS) attack, a new variant of the distributed denial of service (DDoS) attack, has been rising. DRDoS attacks are more difficult to mitigate due to the dynamics and the attack strategy of this type of attack. The number of features examined when investigating traffic behavior influences the performance of the intrusion detection system. Therefore, a feature selection model improves the accuracy of the detection mechanism and also reduces detection time by reducing the number of features. The proposed model, called the proactive feature selection (PFS) model, aims to detect DRDoS attacks based on feature selection. It uses a nature-inspired optimization algorithm for feature subset selection. Three machine learning algorithms, i.e., k-nearest neighbor (KNN), random forest (RF), and support vector machine (SVM), were evaluated as potential classifiers for assessing the selected features. We used the CICDDoS2019 dataset for evaluation purposes. The performance of each classifier is compared to previous models. The results indicate that the suggested model works better than current approaches, providing a higher detection rate (DR), a lower false-positive rate (FPR), and increased detection accuracy (DA). The PFS model detects DRDoS attacks with an accuracy of 89.59%.
SURVEY OF NETWORK ANOMALY DETECTION USING MARKOV CHAINijcseit
Recently, Internet threats have increased. Our motive is to detect intrusions in the network concisely.
Real-time issues such as DoS attacks on banks, companies, industries and organizations have
increased significantly. IDSs have been used on both the server side and the host side. The major challenge is to effectively
predict the periods of threats and protect the server from unauthorized users. In this study, a novel
probabilistic approach is proposed to detect network intrusions effectively. It uses a Markov chain for
probabilistic modelling of abnormal events in network systems. The degree of abnormality of the incoming
data is evaluated on the basis of the network states.
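One common way to turn a Markov chain into an abnormality score is to learn state-transition probabilities from normal traffic and then score a new sequence by how unlikely its transitions are. The sketch below follows that idea under stated assumptions: the state names, the Laplace smoothing, and the average negative log-probability score are illustrative choices, not details taken from the abstract.

```python
import math


def fit_transitions(sequences, states, smoothing=1.0):
    """Estimate first-order transition probabilities with additive smoothing."""
    counts = {s: {t: smoothing for t in states} for s in states}
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    return {
        s: {t: c / sum(row.values()) for t, c in row.items()}
        for s, row in counts.items()
    }


def abnormality(sequence, transitions):
    """Average negative log-probability per transition; higher means more unusual."""
    steps = list(zip(sequence, sequence[1:]))
    if not steps:
        return 0.0
    return -sum(math.log(transitions[a][b]) for a, b in steps) / len(steps)


# Hypothetical network states and normal training sequences.
STATES = ["idle", "connect", "transfer", "close"]
normal_traffic = [
    ["idle", "connect", "transfer", "transfer", "close", "idle"],
    ["idle", "connect", "transfer", "close", "idle"],
    ["idle", "connect", "transfer", "transfer", "transfer", "close", "idle"],
]
model = fit_transitions(normal_traffic, STATES)

typical = abnormality(["idle", "connect", "transfer", "close", "idle"], model)
flood = abnormality(["connect"] * 5, model)  # repeated connects, never seen in training
```

A sequence of repeated connection attempts scores higher than a typical session because the `connect` to `connect` transition was never observed in normal traffic; a threshold on this score would then separate normal from suspicious periods.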
The main goal of Intrusion Detection Systems (IDSs) is
to detect intrusions. This kind of detection system is a
significant tool in traditional computer-based systems for ensuring
cyber security. An IDS model can be faster and reach more accurate
detection rates by selecting the most relevant features from the
input dataset. Feature selection is an important stage of any IDS: it
selects the optimal subset of features, which makes training
the model faster and reduces complexity while
preserving or enhancing the performance of the system. In this
paper, we propose a method based on dividing the input
dataset into different subsets according to each attack. We then
perform feature selection using an information gain
filter on each subset. The optimal feature set is then generated by
combining the feature sets obtained for each attack.
Experimental results on the NSL-KDD dataset show
that the proposed feature selection method, with fewer features,
improves the system accuracy while decreasing the
complexity. Moreover, a comparative study of the
efficiency of the feature selection technique is performed using different
classification methods. To enhance the overall performance,
another stage is conducted using Random Forest and PART in a
voting learning algorithm. The results indicate that the best
accuracy is achieved when using the product probability rule.
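The pipeline described above (information gain per attack subset, a union of the selected features, then product-rule voting) can be sketched in a few functions. This is an illustration under assumptions, not the authors' implementation: the feature names, the tiny subsets, the choice of `k`, and the probability values are all hypothetical, and features are treated as discrete.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on a feature."""
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def top_features(rows, labels, k):
    """Names of the k features with the highest information gain."""
    gains = {n: information_gain([r[n] for r in rows], labels) for n in rows[0]}
    return set(sorted(gains, key=gains.get, reverse=True)[:k])


def product_rule(prob_a, prob_b):
    """Fuse two classifiers' class probabilities by multiplying and renormalising."""
    fused = {c: prob_a[c] * prob_b[c] for c in prob_a}
    norm = sum(fused.values())
    return {c: p / norm for c, p in fused.items()}


# Hypothetical per-attack subsets: each holds one attack class plus normal traffic.
dos_rows = [
    {"flag": "S0", "service": "http", "logged_in": 0},
    {"flag": "S0", "service": "ftp", "logged_in": 0},
    {"flag": "SF", "service": "http", "logged_in": 0},
    {"flag": "SF", "service": "ftp", "logged_in": 0},
]
dos_labels = ["dos", "dos", "normal", "normal"]
r2l_rows = [
    {"flag": "SF", "service": "ftp", "logged_in": 1},
    {"flag": "SF", "service": "http", "logged_in": 1},
    {"flag": "SF", "service": "ftp", "logged_in": 0},
    {"flag": "SF", "service": "http", "logged_in": 0},
]
r2l_labels = ["r2l", "r2l", "normal", "normal"]

# Select features per attack, then combine the per-attack sets.
selected = top_features(dos_rows, dos_labels, k=1) | top_features(r2l_rows, r2l_labels, k=1)

# Hypothetical class probabilities from two voters (e.g. Random Forest and PART).
fused = product_rule({"attack": 0.8, "normal": 0.2}, {"attack": 0.6, "normal": 0.4})
```

Selecting per attack lets a feature that matters for only one attack class survive even when its gain over the whole dataset is small. The product rule rewards classes on which both voters agree, so a class that either voter considers unlikely is strongly penalised.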
ATTACK DETECTION AVAILING FEATURE DISCRETION USING RANDOM FOREST CLASSIFIERCSEIJJournal
The widespread use of the Internet has an adverse effect of being vulnerable to cyber attacks. Defensive
mechanisms like firewalls and IDSs have evolved with a lot of research contributions happening in these
areas. Machine learning techniques have been successfully used in these defense mechanisms especially
IDSs. Although they are effective to some extent in identifying new patterns and variants of existing
malicious patterns, many attacks are still left undetected. The objective is to develop an algorithm for
detecting malicious domains based on passive traffic measurements. In this paper, an anomaly-based
intrusion detection system based on an ensemble machine learning classifier, Random Forest
with gradient boosting, is deployed. The NSL-KDD dataset is used for analysis, and out of 41 features, 32
features were identified as significant using feature discretion. Our observations confirm the conjecture
that both feature selection and stochastic genetic operators improve the accuracy and the
effectiveness. The training time is shown to be reduced by 98.59% and accuracy improved to
98.75%.
Improving the performance of Intrusion detection systemsyasmen essam
Intrusion detection systems (IDS) are widely studied by
researchers nowadays due to the dramatic growth in
network-based technologies. Policy violations and
unauthorized access are in turn increasing, which makes
intrusion detection systems of great importance. Existing
approaches to improving intrusion detection systems focus on
feature selection or reduction, since some features are
irrelevant or redundant and removing them improves the
accuracy as well as the learning time.
BENCHMARKS FOR EVALUATING ANOMALY-BASED INTRUSION DETECTION SOLUTIONSIJNSA Journal
Anomaly-based Intrusion Detection Systems (IDS) have gained increased popularity over time. There are many proposed anomaly-based systems using different Machine Learning (ML) algorithms and techniques; however, there is no standard benchmark to compare them based on quantifiable measures. In this paper, we propose a benchmark that measures both accuracy and performance to produce objective metrics that can be used in the evaluation of each algorithm implementation. We then use this benchmark to compare the accuracy as well as the performance of four different anomaly-based IDS solutions based on various ML algorithms. The algorithms include Naive Bayes, Support Vector Machines, Neural Networks, and K-means Clustering. The benchmark evaluation is performed on the popular NSL-KDD dataset. The experimental results show the differences in accuracy and performance between these anomaly-based IDS solutions on the dataset. The results also demonstrate how this benchmark can be used to create useful metrics for such comparisons.
May_2024 Top 10 Read Articles in Computer Networks & Communications.pdfIJCNCJournal
The International Journal of Computer Networks & Communications (IJCNC) is a bimonthly, open-access, peer-reviewed journal that publishes articles contributing new results in all areas of Computer Networks & Communications. The journal focuses on all technical and practical aspects of Computer Networks & Data Communications. The goal of this journal is to bring together researchers and practitioners from academia and industry to focus on advanced networking concepts and to establish new collaborations in these areas.
A Topology Control Algorithm Taking into Account Energy and Quality of Transm...IJCNCJournal
The efficient use of energy in wireless sensor networks is critical for extending node lifetime. The network topology is one of the factors that have a significant impact on the energy usage at the nodes and the quality of transmission (QoT) in the network. We propose a topology control algorithm for software-defined wireless sensor networks (SDWSNs) in this paper. Our method formulates topology control as a nonlinear programming (NP) problem with the objective of optimizing two metrics: maximum communication range and desired degree. This NP problem is solved at the SDWSN controller by employing a genetic algorithm (GA) to determine the best topology. The simulation results show that the proposed algorithm outperforms the MaxPower algorithm in terms of average node degree and energy expansion ratio.
Multi-Server user Authentication Scheme for Privacy Preservation with Fuzzy C...IJCNCJournal
The integration of artificial intelligence technology with a scalable Internet of Things (IoT) platform facilitates diverse smart communication services, allowing remote users to access services from anywhere at any time. The multi-server environment within IoT introduces a flexible security service model, enabling users to interact with any server through a single registration. To ensure secure and privacy preservation services for resources, an authentication scheme is essential. Zhao et al. recently introduced a user authentication scheme for the multi-server environment, utilizing passwords and smart cards, claiming resilience against well-known attacks. This paper conducts cryptanalysis on Zhao et al.'s scheme, focusing on denial of service and privacy attacks, revealing a lack of user-friendliness. Subsequently, we propose a new multi-server user authentication scheme for privacy preservation with fuzzy commitment over the IoT environment, addressing the shortcomings of Zhao et al.'s scheme. Formal security verification of the proposed scheme is conducted using the ProVerif simulation tool. Through both formal and informal security analyses, we demonstrate that the proposed scheme is resilient against various known attacks and those identified in Zhao et al.'s scheme.
Advanced Privacy Scheme to Improve Road Safety in Smart Transportation SystemsIJCNCJournal
In -Vehicle Ad-Hoc Network (VANET), vehicles continuously transmit and receive spatiotemporal data with neighboring vehicles, thereby establishing a comprehensive 360-degree traffic awareness system. Vehicular Network safety applications facilitate the transmission of messages between vehicles that are near each other, at regular intervals, enhancing drivers' contextual understanding of the driving environment and significantly improving traffic safety. Privacy schemes in VANETs are vital to safeguard vehicles’ identities and their associated owners or drivers. Privacy schemes prevent unauthorized parties from linking the vehicle's communications to a specific real-world identity by employing techniques such as pseudonyms, randomization, or cryptographic protocols. Nevertheless, these communications frequently contain important vehicle information that malevolent groups could use to Monitor the vehicle over a long period. The acquisition of this shared data has the potential to facilitate the reconstruction of vehicle trajectories, thereby posing a potential risk to the privacy of the driver. Addressing the critical challenge of developing effective and scalable privacy-preserving protocols for communication in vehicle networks is of the highest priority. These protocols aim to reduce the transmission of confidential data while ensuring the required level of communication. This paper aims to propose an Advanced Privacy Vehicle Scheme (APV) that periodically changes pseudonyms to protect vehicle identities and improve privacy. The APV scheme utilizes a concept called the silent period, which involves changing the pseudonym of a vehicle periodically based on the tracking of neighboring vehicles. The pseudonym is a temporary identifier that vehicles use to communicate with each other in a VANET. By changing the pseudonym regularly, the APV scheme makes it difficult for unauthorized entities to link a vehicle's communications to its real-world identity. 
The proposed APV is compared to the SLOW, RSP, CAPS, and CPN techniques. The data indicates that the efficiency of APV is a better improvement in privacy metrics. It is evident that the AVP offers enhanced safety for vehicles during transportation in the smart city.
April 2024 - Top 10 Read Articles in Computer Networks & CommunicationsIJCNCJournal
The International Journal of Computer Networks & Communications (IJCNC) is a bi monthly open access peer-reviewed journal that publishes articles which contribute new results in all areas of Computer Networks & Communications. The journal focuses on all technical and practical aspects of Computer Networks & data Communications. The goal of this journal is to bring together researchers and practitioners from academia and industry to focus on advanced networking concepts and establishing new collaborations in these areas.
DEF: Deep Ensemble Neural Network Classifier for Android Malware DetectionIJCNCJournal
Malware is one of the threats to security of computer networks and information systems. Since malware instances are available sufficiently, there is increased interest among researchers on usage of Artificial Intelligence (AI). Of late AI-enabled methods such as machine learning (ML) and deep learning paved way for solving many real-world problems. As it is a learning-based approach, accumulated training samples help in improving thequality of training and thus leveraging malware detection accuracy. Existing deep learning methods are focusing on learning-based malware detection systems. However, there is need for improving the state of the art through ensemble approach. Towards this end, in this paper we proposed a framework known as Deep Ensemble Framework (DEF) for automatic malware detection. The framework obtains features from training samples. From given malware instance a grayscale image is generated. There is another process to extract the opcode sequences. Convolutional Neural Network (CNN) and Long Short Term Memory (LSTM) techniques are used to obtain grayscale image and opcode sequence respectively. Afterwards, a stacking ensemble is employed in order to achieve efficient malware detection and classification. Malware samples collected fromthe Internet sources and Microsoft are used for theempirical study. An algorithm known as Ensemble Learning for Automatic Malware Detection (EL-AML) is proposed to realize our framework. Another algorithm named Pre-Process is proposed to assist the EL-AML algorithm for obtaining intermediate features required by CNN and LSTM.Empirical study reveals that our framework outperforms many existing methods in terms of speed-up and accuracy.
High Performance NMF Based Intrusion Detection System for Big Data IOT TrafficIJCNCJournal
With the emergence of smart devices and the Internet of Things (IoT), millions of users connected to the network produce massive network traffic datasets. These vast datasets of network traffic, Big Data are challenging to store, deal with and analyse using a single computer. In this paper we developed parallel implementation using a High Performance Computer (HPC) for the Non-Negative Matrix Factorization technique as an engine for an Intrusion Detection System (HPC-NMF-IDS). The large IoT traffic datasets of order of millions samples are distributed evenly on all the computing cores for both storage and speedup purpose. The distribution of computing tasks involved in the Matrix Factorization takes into account the reduction of the communication cost between the computing cores. The experiments we conducted on the proposed HPC-IDS-NMF give better results than the traditional ML-based intrusion detection systems. We could train the HPC model with datasets of one million samples in only 31 seconds instead of the 40 minutes using one processor), that is a speed up of 87 times. Moreover, we have got an excellent detection accuracy rate of 98% for KDD dataset.
A Novel Medium Access Control Strategy for Heterogeneous Traffic in Wireless ...IJCNCJournal
So far, Wireless Body Area Networks (WBANs) have played a pivotal role in driving the development of intelligent healthcare systems with broad applicability across various domains. Each WBAN consists of one or more types of sensors that can be embedded in clothing, attached directly to the body, or even implanted beneath an individual's skin. These sensors typically serve asingle application. However, the traffic generated by each sensor may have distinct requirements. This diversity necessitates a dual approach: tailored treatment based on the specific needs of each traffic typeand the fulfillment of application requirements, such asreliability and timeliness. Never the less, the presence of energy constraints and the unreliable nature of wireless communications make QoS provisioning under such networks a non-trivial task. In this context, the current paper introduces a novel Medium AccessControl (MAC) strategy for the regular traffic applications of WBANs, designed to significantly enhance efficiency when compared to the established MAC protocols IEEE 802.15.4 and IEEE 802.15.6, with a particular focus on improving reliability, timeliness, and energy efficiency.
A Topology Control Algorithm Taking into Account Energy and Quality of Transm...IJCNCJournal
The efficient use of energy in wireless sensor networks is critical for extending node lifetime. The network topology is one of the factors that have a significant impact on the energy usage at the nodes and the quality of transmission (QoT) in the network. We propose a topology control algorithm for software-defined wireless sensor networks (SDWSNs) in this paper. Our method is to formulate topology control algorithm as a nonlinear programming (NP) problem with the objective to optimizing two metrics, maximum communication range, and desired degree. This NP problem is solved at the SDWSN controller by employing the genetic algorithm (GA) to determine the best topology. The simulation results show that the proposed algorithm outperforms the MaxPower algorithm in terms of average node degree and energy expansion ratio.
Multi-Server user Authentication Scheme for Privacy Preservation with Fuzzy C...IJCNCJournal
The integration of artificial intelligence technology with a scalable Internet of Things (IoT) platform facilitates diverse smart communication services, allowing remote users to access services from anywhere at any time. The multi-server environment within IoT introduces a flexible security service model, enabling users to interact with any server through a single registration. To ensure secure and privacy preservation services for resources, an authentication scheme is essential. Zhao et al. recently introduced a user authentication scheme for the multi-server environment, utilizing passwords and smart cards, claiming resilience against well-known attacks. This paper conducts cryptanalysis on Zhao et al.'s scheme, focusing on denial of service and privacy attacks, revealing a lack of user-friendliness. Subsequently, we propose a new multi-server user authentication scheme for privacy preservation with fuzzy commitment over the IoT environment, addressing the shortcomings of Zhao et al.'s scheme. Formal security verification of the proposed scheme is conducted using the ProVerif simulation tool. Through both formal and informal security analyses, we demonstrate that the proposed scheme is resilient against various known attacks and those identified in Zhao et al.'s scheme.
Advanced Privacy Scheme to Improve Road Safety in Smart Transportation SystemsIJCNCJournal
In -Vehicle Ad-Hoc Network (VANET), vehicles continuously transmit and receive spatiotemporal data with neighboring vehicles, thereby establishing a comprehensive 360-degree traffic awareness system. Vehicular Network safety applications facilitate the transmission of messages between vehicles that are near each other, at regular intervals, enhancing drivers' contextual understanding of the driving environment and significantly improving traffic safety. Privacy schemes in VANETs are vital to safeguard vehicles’ identities and their associated owners or drivers. Privacy schemes prevent unauthorized parties from linking the vehicle's communications to a specific real-world identity by employing techniques such as pseudonyms, randomization, or cryptographic protocols. Nevertheless, these communications frequently contain important vehicle information that malevolent groups could use to Monitor the vehicle over a long period. The acquisition of this shared data has the potential to facilitate the reconstruction of vehicle trajectories, thereby posing a potential risk to the privacy of the driver. Addressing the critical challenge of developing effective and scalable privacy-preserving protocols for communication in vehicle networks is of the highest priority. These protocols aim to reduce the transmission of confidential data while ensuring the required level of communication. This paper aims to propose an Advanced Privacy Vehicle Scheme (APV) that periodically changes pseudonyms to protect vehicle identities and improve privacy. The APV scheme utilizes a concept called the silent period, which involves changing the pseudonym of a vehicle periodically based on the tracking of neighboring vehicles. The pseudonym is a temporary identifier that vehicles use to communicate with each other in a VANET. By changing the pseudonym regularly, the APV scheme makes it difficult for unauthorized entities to link a vehicle's communications to its real-world identity. 
The proposed APV is compared to the SLOW, RSP, CAPS, and CPN techniques. The data indicates that the efficiency of APV is a better improvement in privacy metrics. It is evident that the AVP offers enhanced safety for vehicles during transportation in the smart city.
DEF: Deep Ensemble Neural Network Classifier for Android Malware DetectionIJCNCJournal
Malware is one of the threats to security of computer networks and information systems. Since malware instances are available sufficiently, there is increased interest among researchers on usage of Artificial Intelligence (AI). Of late AI-enabled methods such as machine learning (ML) and deep learning paved way for solving many real-world problems. As it is a learning-based approach, accumulated training samples help in improving thequality of training and thus leveraging malware detection accuracy. Existing deep learning methods are focusing on learning-based malware detection systems. However, there is need for improving the state of the art through ensemble approach. Towards this end, in this paper we proposed a framework known as Deep Ensemble Framework (DEF) for automatic malware detection. The framework obtains features from training samples. From given malware instance a grayscale image is generated. There is another process to extract the opcode sequences. Convolutional Neural Network (CNN) and Long Short Term Memory (LSTM) techniques are used to obtain grayscale image and opcode sequence respectively. Afterwards, a stacking ensemble is employed in order to achieve efficient malware detection and classification. Malware samples collected fromthe Internet sources and Microsoft are used for theempirical study. An algorithm known as Ensemble Learning for Automatic Malware Detection (EL-AML) is proposed to realize our framework. Another algorithm named Pre-Process is proposed to assist the EL-AML algorithm for obtaining intermediate features required by CNN and LSTM.Empirical study reveals that our framework outperforms many existing methods in terms of speed-up and accuracy.
High Performance NMF based Intrusion Detection System for Big Data IoT TrafficIJCNCJournal
With the emergence of smart devices and the Internet of Things (IoT), millions of users connected to the network produce massive network traffic datasets. These vast datasets of network traffic, Big Data are challenging to store, deal with and analyse using a single computer. In this paper we developed parallel implementation using a High Performance Computer (HPC) for the Non-Negative Matrix Factorization technique as an engine for an Intrusion Detection System (HPC-NMF-IDS). The large IoT traffic datasets of order of millions samples are distributed evenly on all the computing cores for both storage and speedup purpose. The distribution of computing tasks involved in the Matrix Factorization takes into account the reduction of the communication cost between the computing cores. The experiments we conducted on the proposed HPC-IDS-NMF give better results than the traditional ML-based intrusion detection systems. We could train the HPC model with datasets of one million samples in only 31 seconds instead of the 40 minutes using one processor), that is a speed up of 87 times. Moreover, we have got an excellent detection accuracy rate of 98% for KDD dataset.
IoT Guardian: A Novel Feature Discovery and Cooperative Game Theory Empowered...IJCNCJournal
Cyber intrusion attacks increasingly target the Internet of Things (IoT) ecosystem, exploiting vulnerable devices and networks. Malicious activities must be identified early to minimize damage and mitigate threats. Using actual benign and attack traffic from the CICIoT2023 dataset, this WORK aims to evaluate and benchmark machine-learning techniques for IoT intrusion detection. There are four main phases to the system. First, the CICIoT2023 dataset is refined to remove irrelevant features and clean up missing and duplicate data. The second phase employs statistical models and artificial intelligence to discover novel features. The most significant features are then selected in the third phase based on cooperative game theory. Using the original CICIoT2023 dataset and a dataset containing only novel features, we train and evaluate a variety of machine learning classifiers. On the original dataset, Random Forest achieved the highest accuracy of 99%. Still, with novel features, Random Forest's performance dropped only slightly (96%) while other models achieved significantly lower accuracy. As a whole, the work contributes substantial contributions to tailored feature engineering, feature selection, and rigorous benchmarking of IoT intrusion detection techniques. IoT networks and devices face continuously evolving threats, making it necessary to develop robust intrusion detection systems.
Enhancing Traffic Routing Inside a Network through IoT Technology & Network C...IJCNCJournal
IoT networking uses real items as stationary or mobile nodes. Mobile nodes complicate networking. Internet of Things (IoT) networks have a lot of control overhead messages because devices are mobile. These signals are generated by the constant flow of control data as such device identity, geographical positioning, node mobility, device configuration, and others. Network clustering is a popular overhead communication management method. Many cluster-based routing methods have been developed to address system restrictions. Node clustering based on the Internet of Things (IoT) protocol, may be used to cluster all network nodes according to predefined criteria. Each cluster will have a Smart Designated Node. SDN cluster management is efficient. Many intelligent nodes remain in the network. The network design spreads these signals. This paper presents an intelligent and responsive routing approach for clustered nodes in IoT networks. An existing method builds a new sub-area clustered topology. The Nodes Clustering Based on the Internet of Things (NCIoT) method improves message transmission between any two nodes. This will facilitate the secure and reliable interchange of healthcare data between professionals and patients. NCIoT is a system that organizes nodes in the Internet of Things (IoT) by grouping them together based on their proximity. It also picks SDN routes for these nodes. This approach involves selecting one option from a range of choices and preparing for likely outcomes problem addressing limitations on activities is a primary focus during the review process. Predictive inquiry employs the process of analyzing data to forecast and anticipate future events. This document provides an explanation of compact units. The Predictive Inquiry Small Packets (PISP) improved its backup system and partnered with SDN to establish a routing information table for each intelligent node, resulting in higher routing performance. 
Both principal and secondary roads are available for use. The simulation findings indicate that NCIoT algorithms outperform CBR protocols. Enhancements lead to a substantial 78% boost in network performance. In addition, the end-to-end latency dropped by 12.5%. The PISP methodology produces 5.9% more inquiry packets compared to alternative approaches. The algorithms are constructed and evaluated against academic ones.
IoT Guardian: A Novel Feature Discovery and Cooperative Game Theory Empowered...IJCNCJournal
Cyber intrusion attacks increasingly target the Internet of Things (IoT) ecosystem, exploiting vulnerable devices and networks. Malicious activities must be identified early to minimize damage and mitigate threats. Using actual benign and attack traffic from the CICIoT2023 dataset, this WORK aims to evaluate and benchmark machine-learning techniques for IoT intrusion detection. There are four main phases to the system. First, the CICIoT2023 dataset is refined to remove irrelevant features and clean up missing and duplicate data. The second phase employs statistical models and artificial intelligence to discover novel features. The most significant features are then selected in the third phase based on cooperative game theory. Using the original CICIoT2023 dataset and a dataset containing only novel features, we train and evaluate a variety of machine learning classifiers. On the original dataset, Random Forest achieved the highest accuracy of 99%. Still, with novel features, Random Forest's performance dropped only slightly (96%) while other models achieved significantly lower accuracy. As a whole, the work contributes substantial contributions to tailored feature engineering, feature selection, and rigorous benchmarking of IoT intrusion detection techniques. IoT networks and devices face continuously evolving threats, making it necessary to develop robust intrusion detection systems.
** Connect, Collaborate, And Innovate: IJCNC - Where Networking Futures Take ...IJCNCJournal
The International Journal of Computer Networks & Communications (IJCNC) is a bi monthly open access peer-reviewed journal that publishes articles which contribute new results in all areas of Computer Networks & Communications. The journal focuses on all technical and practical aspects of Computer Networks & data Communications. The goal of this journal is to bring together researchers and practitioners from academia and industry to focus on advanced networking concepts and establishing new collaborations in these areas.
Enhancing Traffic Routing Inside a Network through IoT Technology & Network C...IJCNCJournal
IoT networking uses real items as stationary or mobile nodes. Mobile nodes complicate networking. Internet of Things (IoT) networks have a lot of control overhead messages because devices are mobile. These signals are generated by the constant flow of control data as such device identity, geographical positioning, node mobility, device configuration, and others. Network clustering is a popular overhead communication management method. Many cluster-based routing methods have been developed to address system restrictions. Node clustering based on the Internet of Things (IoT) protocol, may be used to cluster all network nodes according to predefined criteria. Each cluster will have a Smart Designated Node. SDN cluster management is efficient. Many intelligent nodes remain in the network. The network design spreads these signals. This paper presents an intelligent and responsive routing approach for clustered nodes in IoT networks. An existing method builds a new sub-area clustered topology. The Nodes Clustering Based on the Internet of Things (NCIoT) method improves message transmission between any two nodes. This will facilitate the secure and reliable interchange of healthcare data between professionals and patients. NCIoT is a system that organizes nodes in the Internet of Things (IoT) by grouping them together based on their proximity. It also picks SDN routes for these nodes. This approach involves selecting one option from a range of choices and preparing for likely outcomes problem addressing limitations on activities is a primary focus during the review process. Predictive inquiry employs the process of analyzing data to forecast and anticipate future events. This document provides an explanation of compact units. The Predictive Inquiry Small Packets (PISP) improved its backup system and partnered with SDN to establish a routing information table for each intelligent node, resulting in higher routing performance. 
Both principal and secondary roads are available for use. The simulation findings indicate that NCIoT algorithms outperform CBR protocols. Enhancements lead to a substantial 78% boost in network performance. In addition, the end-to-end latency dropped by 12.5%. The PISP methodology produces 5.9% more inquiry packets compared to alternative approaches. The algorithms are constructed and evaluated against academic ones.
Intrusion Detection System (IDS) Development Using Tree-Based Machine Learning Algorithms
1. International Journal of Computer Networks & Communications (IJCNC) Vol.15, No.4, July 2023
DOI:10.5121/ijcnc.2023.15406 93
INTRUSION DETECTION SYSTEM (IDS)
DEVELOPMENT USING TREE-BASED MACHINE
LEARNING ALGORITHMS
Witcha Chimphlee and Siriporn Chimphlee
Department of Data Science and Analytics, Suan Dusit University, Bangkok, Thailand
ABSTRACT
The paper proposes a two-phase classification method for detecting anomalies in network traffic, aiming to
tackle the challenges of class imbalance and feature selection. The study uses Information Gain to select
relevant features and evaluates the method on the CICIDS-2018 dataset with various classifiers.
Results indicate that the ensemble classifier achieved the highest accuracy, precision, and recall. The
proposed method addresses challenges in intrusion detection and highlights the effectiveness of ensemble
classifiers in improving anomaly detection accuracy. In addition, the number of relevant features chosen
by Information Gain has a considerable impact on the F1-score and detection accuracy. Specifically,
Ensemble Learning achieved the highest accuracy of 98.36% and an F1-score of 97.98% using the selected
relevant features.
KEYWORDS
Intrusion Detection System, Anomaly Detection, Imbalance Data, Feature Selection, CICIDS-2018 dataset
1. INTRODUCTION
Due to the growth of applications that produce data, data volumes have increased
dramatically in recent years, and this data must now be gathered, stored, and analyzed [1]. At the
same time, the number of attacks has increased: malware, botnets, spam, phishing, and DoS attacks
have become persistent dangers for systems and hosts. Network traffic activity is
made up of numerous features that have been compiled into datasets to identify various attack
types [2]. The daily growth in the enormous volume of data generated online is currently a
significant technological challenge [3]. To identify these threats, effective
intrusion detection systems (IDS) have been created, and they have become
crucial to the safety of networks and computers. An IDS monitors and analyzes network traffic in
order to categorize various sorts of attacks [4][5]. The primary problems with IDS are the
systems' susceptibility to errors and the inconsistent and unfair ways in which they are
frequently evaluated [6][7]. One of the most important challenges in achieving good
performance on large intrusion detection datasets is feature selection, a form of dimensionality
reduction that chooses the feature subset that best
represents the full dataset [8].
Classification problems arise in both pattern recognition and anomaly detection. When the
variable to be predicted is categorical, the task is referred to as a classification
problem. Several classification techniques are used to detect distinct sorts of attacks to improve
IDS performance [9]. Anomaly intrusion detection is an important research area in computer
network security, aimed at identifying abnormal behavior in network traffic that may indicate an
attack or a security breach.
Detection-based IDS come in two flavors: signature-based and anomaly-based. In a signature-based
IDS, attacks are discovered by matching traffic against predefined signature samples, making it a
type of misuse detection [10]. This method works well with large samples of signature data and has
a low false alarm rate. Only known attacks, however, can be identified, leading to a high
proportion of missed alarms. In contrast, an anomaly-based IDS recognizes attacks by spotting
out-of-the-ordinary behaviors that depart from the typical profile. This method can detect
previously unknown attacks, but it tends to have a higher risk of false alarms.
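The contrast between the two flavors can be illustrated with a minimal sketch. It is not taken from the paper: the signature strings, the packets-per-second values, and the three-standard-deviation threshold are all hypothetical choices made only for illustration.

```python
from statistics import mean, pstdev

# Hypothetical signatures of known attacks (illustrative strings only).
KNOWN_SIGNATURES = {"' OR 1=1 --", "<script>alert(1)</script>"}


def signature_alert(payload):
    """Signature-based: flag a payload only if it contains a known signature."""
    return any(sig in payload for sig in KNOWN_SIGNATURES)


def build_profile(normal_values):
    """Anomaly-based: summarize normal behavior by its mean and standard deviation."""
    return mean(normal_values), pstdev(normal_values)


def anomaly_alert(value, profile, k=3.0):
    """Flag a value that lies more than k standard deviations from the normal mean."""
    mu, sigma = profile
    return abs(value - mu) > k * sigma


# Packets per second observed during normal operation (made-up numbers).
profile = build_profile([100, 110, 95, 105, 90])
```

A payload that matches no stored signature passes the first check even if it is malicious, whereas the second check can flag any sufficiently unusual rate, including benign bursts. This mirrors the trade-off between missed alarms and false alarms described above.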
The classification model separates the dataset into training and testing phases [11].
Unfortunately, the training process is difficult and time-consuming due to the abundance of
high-dimensional features. To improve the model's performance during testing, pertinent and valuable
features must be chosen from the whole feature collection [12]. Improvements to intrusion
detection systems are being made using machine learning (ML) techniques, which are
becoming more and more prominent in computer security. There are many machine
learning algorithms available that can be applied to datasets [13]. ML algorithms
assist in managing enormous amounts of data and extracting important features for different
feature selection procedures [14]. ML-based IDS classifiers commonly divide
attacks into several categories. Machine learning techniques including decision trees (DT), extra
trees (ET), random forests (RF), and XGBoost (eXtreme Gradient Boosting) are frequently used
in anomaly intrusion detection. By training on huge amounts of network traffic data, these
algorithms learn the system's typical behavior and recognize deviations from it that might
indicate an attack. Intrusion detection is still a crucial field of research for two reasons. First,
network intrusions are regularly updated and modified, which results in patterns that are constantly
changing. Second, it gets simpler to explore and assess new concepts as more intrusion detection
datasets become accessible over time [15]. Therefore, it is crucial to find a method
that reduces both false positives and false negatives.
The goal of this study is to analyze how the number of feature dimensions affects classification
accuracy when using attack datasets. Additionally, the study considers the impact of data sample
imbalance on classifier evaluation. To assess the quality of preprocessed data for multiple attacks
in the CSE-CICIDS-2018 dataset, various metrics were computed. Furthermore, the study
computed and discussed the performance measures of intrusion detection models that were
trained. The main contributions of this paper are:
• The feature reduction method based on the first classification outcomes and feature
importance metrics produced by Information Gain.
• A comparison of the machine learning methods DT, RF, ET, and XGBoost for IDS.
• To demonstrate the effect of preprocessing, the imbalance issue in an intrusion
detection dataset is handled using SMOTE.
The rest of the paper is organized as follows: Section 2 gives an overview of intrusion detection
and classification models. It also provides more information about imbalanced data and feature
selection. Section 3 provides our methodology. Section 4 presents the results from each of the
algorithms and Section 5 concludes with the findings and discussion of the project results.
2. RELATED WORKS
An intrusion detection system is an essential component that monitors and analyzes networks to
find intrusions and alert administrators to ongoing attacks. Intrusion detection remains a
prominent research subject for two reasons. First, network intrusions constantly update and
evolve, leading to patterns that are constantly shifting. Second, more intrusion detection
datasets are becoming available over time, allowing for the examination and evaluation of novel
strategies [16]. Low false positive and false negative rates are desirable in an intrusion
detection system. The quality of the representative training dataset, however, can have a big
impact on these metrics. Real-world situations can involve a variety of difficulties that call for
handling, including class imbalance, mixed data types (continuous, discrete, ordinal, and
categorical), and non-Gaussian and multimodal distributions in intrusion detection traces.
The field of intrusion detection systems has been examined and investigated by numerous
researchers. Colas and Brazdil [17] conducted a comparative analysis of KNN, Naive Bayes, and
SVM methods in intrusion detection systems. Their feature-based comparison showed that SVM
is more efficient and has shorter processing time, but KNN has a better classification accuracy.
One of the strengths of their study is the systematic comparison of these three popular algorithms,
which provides useful insights into their strengths and weaknesses. However, their evaluation
was limited to a specific dataset and may not generalize to other datasets.
Jiang et al. [18] developed a text categorization model that combines a one-pass clustering
approach and an improved KNN text classification algorithm. Their combination strategy showed
a significant improvement over conventional KNN, Naive Bayes, and SVM algorithms, in terms
of reducing text redundancy and enhancing text categorization. One of the strengths of their
research is the novel combination of clustering and classification methods, which can be applied
to various text mining tasks. However, their study focused only on text data and may not
generalize to other types of data.
Elejla and colleagues [19] examined a number of classification methods, including KNN, SVM,
Decision tree, Naive Bayes, and Neural network, to forecast DDoS attacks using network
monitoring. They found that SVM and Decision tree methods outperformed other algorithms in
terms of detection accuracy. One of the strengths of their study is the evaluation of multiple
classification algorithms in the context of DDoS attack detection. However, their evaluation was
limited to a specific dataset and may not generalize to other datasets.
Bahrololum et al. [20] used supervised and unsupervised Neural Network (NN) and Self
Organizing Map (SOM) methodologies in their analysis of network traffic patterns for intrusion
detection. Their study showed that NN and SOM can effectively capture the complex patterns in
network traffic data and outperform traditional statistical methods. One of the strengths of their
research is the use of advanced machine learning methods to analyze network traffic data.
However, their study focused only on a specific type of data and may not generalize to other
types of data.
Awad and Alabdallah [6] presented a weighted extreme learning machine technique to address
the problem of imbalanced classes in intrusion detection systems. Their study showed that the
weighted ELM approach can effectively handle imbalanced data and improve classification
performance. One of the strengths of their research is the development of a novel approach to
handle imbalanced data, which is a common problem in intrusion detection. However, their
evaluation was limited to a specific dataset and may not generalize to other datasets.
Siriporn and Witcha [21] compared the performance of a number of classification methods,
including LR, KNN, CART, Naive Bayes, RF, MLP, and XGBoost. They also included a feature
selection technique employing random forest (RF) to enhance classification performance. Their
study showed that RF-based feature selection can improve classification accuracy for all tested
algorithms. One of the strengths of their research is the comprehensive evaluation of multiple
algorithms and feature selection techniques. However, their evaluation was limited to a specific
dataset and may not generalize to other datasets.
3. THE PROPOSED FRAMEWORK
In this paper, we propose a machine learning (ML)-based traffic classification method for IDS
and discuss some of the drawbacks of current approaches. To address the imbalance of traffic
samples and identify important characteristics from input flows, we specifically recommend a
data pre-processing strategy that incorporates filter-based feature selection (Information Gain)
and over-sampling (SMOTE). In Figure 1, we present the structure of the proposed methodology, and
in the sections that follow, we elaborate on each stage.
3.1. CSE-CIC-IDS-2018 Dataset
This section provides an overview of the CSE-CIC-IDS-2018 dataset [22]. It was created by
the Communications Security Establishment (CSE) and the Canadian Institute for Cybersecurity
(CIC). The dataset contains both the real-time network activity of various intrusion states and
all of the internal network traces required to calculate data packet payloads. The characteristics of
the dataset are relevant to our inquiry. The dataset contains 14 different types of intrusion,
including SQL injection, Brute Force-XSS, DoS GoldenEye, DoS Hulk,
Botnet, SSH brute force, DDoS-low orbit ion cannon (LOIC)-UDP, DDoS-LOIC-HTTP,
and Brute Force-UDP attacks [23].
Figure 1. Designing the proposed method
3.2. Data Preprocessing
An IDS must include data preprocessing, as it is the initial step in simplifying machine learning
model training. Effective data preparation has a direct impact on the classification model's
performance; with the appropriate procedures, technical issues in the data
can be resolved and performance levels can be increased. This section covers the precise steps
involved in data preparation, such as data integration, cleaning, encoding, and normalization, as
well as how feature selection is used. The success of model training depends on the data
preparation processes, which have received little attention despite the vast number of samples
that have been collected.
3.2.1. Data Integration
The CSE-CIC-IDS-2018 dataset has 16,233,002 instances spread over
10 files with 80 features per row. The dataset comprises 14 different attack types across six
different scenarios, with attack traffic accounting for about 17% of the instances [24]. To get
this large and varied dataset ready for analysis, we pre-processed the raw-data files and
combined them into a single database for simple access and analysis.
3.2.2. Data cleaning
High-quality data is necessary to deliver trustworthy analytics that lead to effective and sensible
decision-making. Data cleaning is a necessary component of data pre-processing, which improves
the utility of a dataset. It ensures that the data is free of noise and errors that could cause
technical issues in the model. In this study, data cleaning was used to remove missing values and
irrelevant attributes from the dataset. Timestamps and "Infinity" and "NaN" values were excluded.
Remaining missing data was filled in with the mean value, and the feature values were scaled to a
standard format using StandardScaler.
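The mean-imputation step can be sketched as follows. This is one possible reading of the cleaning procedure and not the authors' script: the helper name `impute_with_mean` and the sample column are hypothetical, and non-finite entries ("Infinity" and "NaN") are treated here as missing values to be replaced by the mean of the remaining entries.

```python
import math


def impute_with_mean(column):
    """Replace NaN and +/-Infinity entries with the mean of the finite entries."""
    finite = [v for v in column if math.isfinite(v)]
    if not finite:
        raise ValueError("column has no finite values to average")
    col_mean = sum(finite) / len(finite)
    return [v if math.isfinite(v) else col_mean for v in column]


# Hypothetical flow-rate column containing one NaN and one Infinity.
flow_bytes_per_s = [10.0, float("nan"), 30.0, float("inf"), 20.0]
cleaned = impute_with_mean(flow_bytes_per_s)
```

Computing the mean from finite values only matters here, because a single "Infinity" entry would otherwise make the column mean infinite as well.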
3.2.3. Data encoding
Data encoding is required to transform categorical variables into numerical values that machine
learning algorithms can use. Because we are working with a binary classification problem, the
labels in our study are either "0" or "1," where "0" stands for "Benign" and "1" for "Attack."
We encoded the labels because the model would have trouble processing them in their original
textual form and would perform poorly as a result. Data encoding therefore lets the model
treat the labels as numerical values,
which improves how well it processes and learns from the data.
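A minimal sketch of this binary encoding is shown below. The function name `encode_label` is hypothetical; the sample label strings follow the class names listed in Table 1.

```python
def encode_label(label):
    """Map 'Benign' to 0 and every other (attack) label to 1."""
    return 0 if label.strip().lower() == "benign" else 1


raw_labels = ["Benign", "Bot", "DoS attacks-Hulk", "Benign"]
encoded = [encode_label(lbl) for lbl in raw_labels]
```

Collapsing all attack types into a single class "1" is what turns the 15-class label column into the binary problem described above.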
3.2.4. Normalization
For intrusion detection systems that rely on statistical features extracted from the data,
normalization is a crucial step in the preprocessing of the data. Input data must commonly be
normalized for machine learning-based techniques in order to remove bias that could result from
variations in the magnitudes of the variables' values. Normalization is particularly necessary
when there is a substantial difference between the highest and lowest values of the data.
Normalizing the data brings the features onto a common scale, which enhances the performance of
the model. A common normalization technique, StandardScaler, adjusts the data to have a mean of 0
and a standard deviation of 1. Standardization is used to transform continuous and
quasi-continuous features: it subtracts the mean and scales the data to unit variance. This can be
denoted as
$$\mathcal{X}_{scaler} = \frac{\mathcal{X} - \mu}{\sigma},$$
where $\mathcal{X}_{scaler}$ is the new value, $\mu$ is the mean, and $\sigma$ is the standard
deviation,
$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2}.$$
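The standardization formula can be checked with a short sketch in plain Python. The function name `standardize` and the sample values are illustrative; the standard deviation divides by N, as in the formula above.

```python
import math


def standardize(values):
    """Return (x - mean) / std using the population standard deviation (divide by N)."""
    n = len(values)
    mu = sum(values) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in values) / n)
    if sigma == 0:
        return [0.0 for _ in values]  # constant feature: nothing to scale
    return [(x - mu) / sigma for x in values]


scaled = standardize([2.0, 4.0, 6.0, 8.0])
```

After the transformation the values have mean 0 and variance 1, which is the property the text attributes to StandardScaler.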
3.3. Feature Selection: Information Gain
Feature engineering is a critical step in machine learning where raw data is transformed into
useful features to enhance the predictive power of models [25]. Techniques for dimensionality
reduction and methods for feature selection are used in this procedure to help choose the most
pertinent features for training and detection. For models to perform better, proper dataset
preparation and feature selection are essential [26]. Feature selection entails
removing noisy or unimportant features while keeping the useful features that
faithfully reflect the original data pattern. It is essential to the development of anomaly-based
intrusion detection systems because it increases computing efficiency and accuracy. The aim of
feature selection is to find a subset of features that accurately represents the data and is
necessary for prediction [27]. It is essential to carefully select the set of features in order
to increase the accuracy of the IDS model by reducing false positives and false negatives. In
addition, using fewer features of the CICIDS-2018 dataset simplifies the model, which can enhance
its interpretability and lessen overfitting. The CICIDS-2018 dataset has a lot of classes,
so choosing the best features is not an easy task. This section covers the techniques used to
choose the feature set, the tests that were run, and the features that were adjusted or removed
before feature analysis.
Information Gain is one of the most popular feature selection techniques and is a filter-based
method [28]. It ranks features and reduces the noise caused by unimportant
features by identifying the features that convey the most information about a
given class. To determine which feature provides the most information, entropy, a
measure of uncertainty, is calculated [29].
Information gain is a popular approach for feature selection, which involves identifying the most
relevant features in a dataset that can help to make accurate predictions. Here are the steps to
handle feature selection using information gain:
1. Compute the information gain for each feature: Information gain measures the reduction
in entropy (i.e., uncertainty) that results from splitting the data based on a particular
feature. Features with higher information gain are more useful for making predictions.
You can use a formula like the one below to calculate the information gain for each
feature:
$$\text{Information Gain}(S, A) = \text{Entropy}(S) - \sum_{v \in \text{Values}(A)} \frac{|S_v|}{|S|}\,\text{Entropy}(S_v) \qquad (1)$$
where:
S is the entire dataset
A is the feature being evaluated, and Values(A) is the set of values that A takes
Sv is the subset of S for which the feature value is v
|S| is the total number of instances in S
|Sv| is the number of instances in Sv
2. Rank the features based on their information gain: Once you have calculated the
information gain for each feature, you can rank them in descending order based on their
information gain values. The features with the highest information gain are the most
relevant and should be selected for the model.
3. Select the top N features: Depending on the size and complexity of your dataset, you may
want to select only the top N features with the highest information gain. This will help to
reduce the dimensionality of your data and improve the efficiency of your model.
4. Train the model with the selected features: Once you have identified the top features
using information gain, you can train your model using only those features. This will
help to improve the accuracy and interpretability of your model.
Overall, using information gain for feature selection can help to improve the performance and
efficiency of your machine learning models by reducing the dimensionality of your data and
identifying the most relevant features.
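Steps 1 to 3 above can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names and the toy columns are hypothetical, and features are assumed to be categorical (continuous flow features would first need to be discretized).

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy(S) minus the weighted entropy of the subsets S_v, as in Eq. (1)."""
    subsets = defaultdict(list)
    for v, y in zip(feature_values, labels):
        subsets[v].append(y)
    total = len(labels)
    remainder = sum(len(s) / total * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder


def top_n_features(columns, labels, n):
    """Rank feature names by information gain (descending) and keep the top n."""
    gains = {name: information_gain(vals, labels) for name, vals in columns.items()}
    return sorted(gains, key=gains.get, reverse=True)[:n]


# Toy, made-up data: 'flag' separates the classes perfectly, 'proto' does not.
labels = [0, 0, 1, 1]
columns = {"proto": ["tcp", "udp", "tcp", "udp"], "flag": ["a", "a", "b", "b"]}
selected = top_n_features(columns, labels, n=1)
```

In this toy case the feature that splits the data into pure subsets receives the full entropy of the labels as its gain, while the uninformative feature receives a gain of zero, so only the former is kept. Step 4 then trains the classifier on the selected columns only.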
3.4. Class Imbalance
Class imbalance is a crucial factor to take into account in cybersecurity and machine learning.
Because the researchers' main aim is to increase detection performance, accuracy should not be
used as the only metric when the dataset is uneven and dominated by a single category. Class
imbalance, which frequently results in a low anomaly detection rate, can be addressed with random
oversampling or the synthetic minority oversampling technique (SMOTE) [30], [31], both of which
produce additional data for minority classes where samples are scarce. The degree of imbalance
can be quantified by the imbalance ratio [32],
$$IR = \frac{\max_i X_i}{\min_i X_i},$$
where $X_i$ is the data size of class $i$. In other words, the imbalance ratio is the ratio
between the sizes of the largest and the smallest class. System efficiency should therefore
improve as this ratio is lowered. A class imbalance exists when one class is overrepresented in
comparison to another. Numerically, the imbalanced classification problem here is caused by the
high ratio of benign traffic to all traffic. Table 1 shows that Benign accounts for 13,484,708
records, or 83% of the total, while each individual attack type accounts for less than 5% of the
records. Together, the attacks make up about 2.75 million records, or 17% of all records.
Table 1. CICIDS-2018 data distribution [22]
Class type                  Number of records   Volume (%)
Benign                      13,484,708          83.0700
DDOS attack-HOIC            686,012             4.2260
DDoS attacks-LOIC-HTTP      576,191             3.5495
DoS attacks-Hulk            461,912             2.8455
Bot                         286,191             1.7630
FTP-BruteForce              193,360             1.1912
SSH-Bruteforce              187,589             1.1556
Infilteration               161,934             0.9976
DoS attacks-SlowHTTPTest    139,890             0.8618
DoS attacks-GoldenEye       41,508              0.2557
DoS attacks-Slowloris       10,990              0.0677
DDOS attack-LOIC-UDP        1,730               0.0107
Brute Force -Web            611                 0.0038
Brute Force -XSS            230                 0.0014
SQL Injection               87                  0.0005
Total                       16,232,943          100
Figure 2 shows a pie chart, generated from Table 1, of the proportion of benign and attack
traffic in the dataset.
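The imbalance ratio described in this section, together with the benign and attack shares, can be recomputed from the counts in Table 1. The sketch below is illustrative only; the function name `imbalance_ratio` is hypothetical, and the ratio follows the definition given in the text (largest class size divided by smallest class size).

```python
# Record counts copied from Table 1.
class_counts = {
    "Benign": 13_484_708,
    "DDOS attack-HOIC": 686_012,
    "DDoS attacks-LOIC-HTTP": 576_191,
    "DoS attacks-Hulk": 461_912,
    "Bot": 286_191,
    "FTP-BruteForce": 193_360,
    "SSH-Bruteforce": 187_589,
    "Infilteration": 161_934,
    "DoS attacks-SlowHTTPTest": 139_890,
    "DoS attacks-GoldenEye": 41_508,
    "DoS attacks-Slowloris": 10_990,
    "DDOS attack-LOIC-UDP": 1_730,
    "Brute Force -Web": 611,
    "Brute Force -XSS": 230,
    "SQL Injection": 87,
}


def imbalance_ratio(counts):
    """Size of the largest class divided by the size of the smallest class."""
    return max(counts.values()) / min(counts.values())


total = sum(class_counts.values())
benign_share = class_counts["Benign"] / total  # fraction of benign records
attack_total = total - class_counts["Benign"]  # all attack records combined
ratio = imbalance_ratio(class_counts)          # Benign vs. SQL Injection
```

The multiclass ratio is driven entirely by the rarest class (SQL Injection, 87 records), which is one reason the binary Benign versus Attack formulation is considerably less skewed than the 15-class one.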
Figure 2. Proportion of Benign and Attacks
3.5. Classifiers
To determine which classifier performs best, the training model is constructed and fitted with
various classifiers using the model.fit() method. The quality of the learning model and dataset has
a significant impact on an IDS's effectiveness [33]. Classification is the process of predicting
the class of a given data sample. Depending on whether traffic only needs to be labeled as
malicious or benign, or individual attack types must be distinguished, IDS classification is
binary or multiclass. Binary classification makes use of two classes,
whereas multiclass classification expands the notion to "n" classes, allowing for
prediction of numerous categories. Because there are more classes, multiclass
classification is frequently more difficult than binary classification. As a result, algorithms must
exert more effort and take longer to complete, which could result in less effective results [6].
Each sample needs to be analyzed and categorized as normal or anomalous, and the resulting
structures saved for future use. Although both misuse detection and anomaly detection are possible
applications of classification, the latter is more frequently used. This study handled feature
selection and class imbalance using five machine learning approaches, which are explained in more
detail below.
3.5.1. Random Forest classifier (RF)
Random Forest (RF) is a machine learning classifier that improves prediction accuracy by
combining the outputs of several decision trees, each of which is trained on a different subset of
the dataset. It builds on the bootstrapping process used with the CART decision tree model: using
different samples and initial parameters, RF constructs many CART models. The forest consists of
a substantial number of decision trees that each operate independently to predict the class, and
the final prediction is the class that receives the majority of
votes [9]. The advantages of Random Forest include fewer control and model parameters than other
models, a lower error rate, resistance to overfitting, and the ability to use a broad variety of
potential attributes without prior feature selection. Also, as the number of trees in the forest
increases, the variance of the model decreases while the bias remains constant.
3.5.2. XGBoost
XGBoost is a machine learning library that uses gradient-boosted decision trees and is
primarily concerned with performance and speed. It is an effective tool for making the most of the
hardware and memory resources available for tree boosting algorithms, allowing algorithm
refinement, model modification, and deployment in computer systems. The three main gradient
boosting methods (gradient boosting, regularized boosting, and stochastic boosting) are
supported by XGBoost. Many applications prefer the technique because it considerably reduces
computation time and makes efficient use of memory [34].
3.5.3. Decision Tree (DT)
A decision tree is a machine learning approach that creates a tree-like model
describing the link between attributes and a class label. It divides observations recursively based
on the most informative attribute, that is, the one with the highest gain ratio. Although
continuous data can also be handled by transforming it into categorical data, decision trees are best
suited for datasets with categorical data. Because DT models can be expressed as a set of rules,
one benefit is that they are easy to interpret. Each non-leaf node represents a test on a feature
attribute, each branch represents the outcome of that test over a certain value range, and each
leaf node stores a category. To classify an item using a DT, the feature attribute at the root is
tested first and the output branch matching its value is followed. This is repeated at each node
until a leaf node is reached, where the category stored
there is used as the decision outcome [35].
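The classification procedure described above (test a feature, follow the matching branch, stop at a leaf) can be sketched with a small hand-written tree. The feature names, thresholds, and tree shape are hypothetical and were not learned from the dataset.

```python
# A tiny hand-written tree (hypothetical features and thresholds, not learned from data).
# Internal nodes test one feature; leaves store the class label.
TREE = {
    "feature": "flow_duration",
    "threshold": 1000,
    "left": {"label": "Benign"},  # flow_duration <= 1000
    "right": {
        "feature": "pkts_per_s",
        "threshold": 500,
        "left": {"label": "Benign"},   # pkts_per_s <= 500
        "right": {"label": "Attack"},  # pkts_per_s > 500
    },
}


def classify(node, sample):
    """Follow the branch chosen by each test until a leaf is reached."""
    while "label" not in node:
        branch = "left" if sample[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["label"]
```

Each root-to-leaf path corresponds to one rule, for example "if flow_duration > 1000 and pkts_per_s > 500 then Attack", which is why DT models are considered easy to interpret.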
3.5.4. Extremely Randomized Trees (Extra Trees) Classifier
The Extra Trees Classifier (ETC) is a machine learning algorithm that belongs to the ensemble
method family. It is similar to the Random Forest algorithm but chooses split points in the
decision trees in a different way. Using a variety of randomly selected feature and data subsets,
ETC builds numerous decision trees, and each tree casts a vote for the final classification
outcome. Unlike Random Forest, ETC chooses split points at random rather than searching for the
optimal split point [35]. This makes ETC faster to train than other decision tree-based models
and less prone to overfitting. When minority class samples are given more weight during the
training process, ETC can also handle imbalanced datasets better.
Overall, ETC is a strong and effective classification method that may be applied to a variety of
tasks, such as intrusion and anomaly detection.
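The difference in how split points are chosen can be illustrated for a single numeric feature. This is a simplified sketch with hypothetical function names and toy data: the first function searches candidate thresholds for the lowest Gini impurity (CART and Random Forest style), while the second simply draws a threshold uniformly at random between the feature's minimum and maximum (Extra Trees style). In the full Extra Trees algorithm one random cut point is typically drawn for each of several candidate features and the best of those is kept; that step is omitted here.

```python
import random


def gini(labels):
    """Gini impurity of a list of binary (0/1) labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def split_impurity(values, labels, threshold):
    """Weighted Gini impurity after splitting on value <= threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_threshold(values, labels):
    """CART / Random Forest style: search midpoints for the lowest impurity."""
    xs = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    return min(candidates, key=lambda t: split_impurity(values, labels, t))


def random_threshold(values, rng):
    """Extra Trees style: draw the cut point uniformly between min and max."""
    return rng.uniform(min(values), max(values))


values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
labels = [0, 0, 0, 1, 1, 1]
rng = random.Random(0)  # fixed seed so the sketch is repeatable
```

The random draw skips the search over candidate thresholds, which is where the training-time saving comes from; the extra randomness is averaged out across the many trees in the ensemble.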
3.5.5. Ensemble Approach
An ensemble learner is a machine learning technique that combines multiple individual models to
improve the accuracy and robustness of the overall prediction. The individual models can be of
different types, using different algorithms or feature sets, and are trained independently on the
same or different datasets. The ensemble learner then combines the predictions of the individual
models using a voting or weighted average method to make the final prediction. Ensemble
learners are often used in classification problems and can be categorized into two main types:
bagging and boosting. Bagging involves training each model on a random subset of the training
data, while boosting focuses on training each model on the examples that were previously
misclassified by the ensemble. Ensemble learning has been shown to be highly effective in
improving the performance of machine learning models, especially when the individual models
have different biases or error patterns.
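The two combination rules mentioned above, voting and weighted averaging, can be sketched as follows. The paper does not specify here which rule or which weights its ensemble uses, so the function names, the 0.5 threshold, and the example model outputs are assumptions made for illustration.

```python
from collections import Counter


def hard_vote(predictions):
    """Return the class label predicted by the largest number of models."""
    return Counter(predictions).most_common(1)[0][0]


def soft_vote(probabilities, weights=None):
    """Weighted average of per-model attack probabilities, thresholded at 0.5."""
    weights = weights or [1.0] * len(probabilities)
    avg = sum(w * p for w, p in zip(weights, probabilities)) / sum(weights)
    return 1 if avg >= 0.5 else 0


# Hypothetical outputs of four base models (e.g. DT, RF, ET, XGBoost) for one flow.
labels_from_models = [1, 0, 1, 1]
probs_from_models = [0.9, 0.4, 0.6, 0.7]
```

With equal weights both rules label this flow as an attack. Giving the dissenting second model a much larger weight, for example `soft_vote(probs_from_models, [1, 10, 1, 1])`, flips the decision, which shows how weights let more trusted models dominate.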
IDSs have been shown to perform better when using ensemble techniques, especially
when detecting rare and unknown attacks. Ensembles can also increase the effectiveness of the
detection process and lower the likelihood of false alarms. However, compared to single
classifiers, ensemble approaches can be more computationally expensive and resource-intensive.
As a result, the characteristics of the dataset and the resources at hand should be taken
into consideration when choosing an ensemble approach.
3.6. Evaluation Metrics
The classification report is a summary of the four key classification model
measures: Precision, Recall, F1-score, and Support. These values are used to gauge the
correctness of the model fitting, and presenting them as numerical scores makes
interpretation and comparison simpler. True Positive (TP), True Negative (TN), False Positive (FP),
and False Negative (FN) are the four outcomes of the confusion matrix.
3.6.1. Accuracy
Accuracy is an essential performance statistic that shows how well a classification model, or
classifier, predicts previously unseen data. It is the proportion of all predictions that are
correct: Accuracy = (TP + TN) / (TP + TN + FP + FN).
3.6.2. Precision
Precision is a crucial performance criterion. It is the ratio of correctly
predicted positive observations to all predicted positive observations:
Precision = TP / (TP + FP).
3.6.3. Recall
Recall is the ratio of correctly predicted positive observations to all
observations that actually belong to the positive class:
Recall = TP / (TP + FN).
3.6.4. F1-Score
The F1-score, which is often more informative than accuracy on imbalanced data, is a critical
performance metric to take into account. The costs of false positives and false negatives might
not be comparable when working with a large dataset, and accuracy might not be the best choice
when these costs are unequal. In these situations, the F1-score, the harmonic mean of precision
and recall, gives a more balanced assessment:
F1-score = 2 * (Precision * Recall) / (Precision + Recall).
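The four metrics of Sections 3.6.1 to 3.6.4 can be computed directly from the confusion-matrix counts, as in the following sketch. The function name and the example counts are hypothetical and are not results from this study.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1-score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up counts for illustration only (not results from the paper).
m = classification_metrics(tp=80, tn=900, fp=10, fn=20)
```

In this made-up example accuracy is about 0.97 while recall is only 0.80, because the many true negatives dominate the accuracy. This is the situation in which the F1-score gives a more balanced picture than accuracy alone.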
4. EXPERIMENTAL SETUPS AND RESULTS
All experiments were conducted on an iMac Pro with an Intel Xeon W 3.2 GHz (8-core) processor,
32 GB of 2666 MHz DDR4 memory, and a 1 TB HDD. The scripts were written in Python (version 3.9)
using the numpy, pandas, and sklearn libraries.
SMOTE is a data augmentation technique that involves synthesizing new data points for the
minority class by interpolating between existing data points. Here are the steps to apply SMOTE
to the training data:
1. Identify the minority class: In the case of the CIC-IDS-2018 dataset, the minority class is
the malicious traffic class.
2. Calculate the imbalance ratio: Calculate the imbalance ratio between the minority and
majority classes in the training data. This will help to determine the number of synthetic
data points to be generated by SMOTE.
3. Apply SMOTE: Use a library such as imblearn to apply SMOTE to the training data. SMOTE
takes the training data and a target sampling strategy as input and generates
new data points for the minority class. The number of synthetic data points needed to balance
the classes depends on the imbalance ratio. For example, if the imbalance
ratio is 1:10 (i.e., the minority class has 10% of the samples of the majority class),
SMOTE must generate 9 new data points for each existing data point in the minority
class to reach a 1:1 ratio.
4. Combine the original and synthetic data: Combine the original training data with the
synthetic data generated by SMOTE to create a new balanced training data set.
5. Shuffle the data: Shuffle the new balanced training data set to avoid any bias in the order
of the data points.
6. Train the model: Train a classification model on the new balanced training data set.
By applying SMOTE to the training data, you can increase the number of samples in the minority
class and balance the class distribution, which can lead to better performance of the machine
learning model. It's important to note that while SMOTE can help to address class imbalance, it
may not always lead to the best performance and other techniques may need to be considered.
The imblearn library provides a range of tools for handling imbalanced datasets, including
the SMOTE class. Here are the steps to apply SMOTE to the training data using imblearn:
1. Import the necessary libraries: Start by importing the necessary libraries. You will need
the imblearn library for applying SMOTE and the NumPy library for data manipulation.
from imblearn.over_sampling import SMOTE
import numpy as np
2. Create the SMOTE object: Create an instance of the SMOTE class, which will be used to
apply the SMOTE algorithm to the training data. You can specify the sampling strategy
as "minority" to only apply SMOTE to the minority class.
smote = SMOTE(sampling_strategy='minority')
3. Fit and resample the training data: Apply the fit_resample method of the SMOTE object
to the training data to generate synthetic data points for the minority class. This method
takes as input the feature matrix X_train and the target vector y_train and returns the
balanced training data set.
X_train_balanced, y_train_balanced = smote.fit_resample(X_train, y_train)
4. Check the class distribution: Verify that the class distribution of the balanced training
data set is now balanced by calculating the number of samples in each class.
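The check in step 4 can be done with the standard library alone. The sketch below is an illustration under assumed data: the label lists are hypothetical stand-ins for y_train and for the resampled labels, and the helper names class_distribution and imbalance_ratio are not part of imblearn.

```python
from collections import Counter


def class_distribution(labels):
    """Return {label: (count, share)} for a sequence of class labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: (n, n / total) for label, n in sorted(counts.items())}


def imbalance_ratio(labels):
    """Majority count divided by minority count (1.0 means balanced)."""
    counts = Counter(labels).values()
    return max(counts) / min(counts)


# Hypothetical labels: 0 = Benign, 1 = Attacks.
y_train = [0] * 1000 + [1] * 100
# What the labels of a fully balanced resample would look like.
y_train_balanced = [0] * 1000 + [1] * 1000

print(class_distribution(y_train))
print(imbalance_ratio(y_train), imbalance_ratio(y_train_balanced))
```

In practice the second list would be the label vector returned by the resampling step; a ratio of 1.0 confirms that both classes have the same number of samples.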
SMOTE generates synthetic data points for the minority class by interpolating between
existing data points and their nearest minority-class neighbors. With the "minority" sampling
strategy, the number of synthetic points is set by the imbalance ratio: the minority class is
oversampled until it matches the majority class. For example, if the imbalance ratio is 1:10
(i.e., the minority class has 10% of the samples of the majority class), this amounts to about
9 new data points for each existing data point in the minority class.
Applying SMOTE to the training data with the imblearn library generates synthetic data
points for the minority class and balances the class distribution, which can improve the
performance of the machine learning model.
Figure 3 shows the class distribution of the imbalanced dataset before the classification
stage, with the bars colored blue for the “Benign” class (label 0) and red for the “Attacks”
class (label 1).
Figure 3. The distribution of the classes in the imbalanced dataset
The results represent the importance values of 19 different features obtained through feature
selection using the Information Gain method. Information Gain measures the amount of
information provided by a feature to the classification task. The greater the Information Gain, the
more significant the feature becomes for classification. Based on the findings, here are the top 5
features exhibiting the highest Information Gain:
1. Init Fwd Win Byts (Information Gain = 0.746076591844705)
2. Flow IAT Max (Information Gain = 0.6545814589068368)
3. Flow Duration (Information Gain = 0.6405164250112803)
4. Fwd Pkts/s (Information Gain = 0.634605317927722)
5. Bwd Pkt Len Min (Information Gain = 0.6316676523933165)
The remaining features have Information Gain values ranging from 0.528 to 0.623. Based on
these results, it may be beneficial to focus on the top 5 features during further analysis and
modeling. These features appear to have the most significant impact on the classification task,
and using them could potentially result in a more accurate and efficient model. However, the
importance of features can vary depending on the specific dataset and classification task, so
the results of feature selection should be evaluated and validated thoroughly. Table 2 lists all
19 features with their Information Gain values.
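To make the Information Gain measure concrete, the following minimal sketch computes it for a discrete feature as the class entropy minus the class entropy that remains once the feature value is known. It is an illustration only: the source does not state which estimator or discretization was used for the continuous flow features, so the toy feature values and the function names here are assumptions.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus their entropy conditioned on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Hypothetical toy data: 0 = Benign, 1 = Attacks.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
perfect = ["low", "low", "low", "low", "high", "high", "high", "high"]
useless = ["a", "b", "a", "b", "a", "b", "a", "b"]

print(information_gain(perfect, labels))  # feature separates the classes completely
print(information_gain(useless, labels))  # feature carries no class information
```

A feature that splits the classes cleanly receives the full class entropy as its gain, while a feature that is independent of the class receives a gain of zero, which is why higher values indicate more useful features.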
Table 2. Important features and their Information Gain values
No. Feature Importance Value
1 Init Fwd Win Byts 0.7460765918447050
2 Flow IAT Max 0.6545814589068368
3 Flow Duration 0.6405164250112803
4 Fwd Pkts/s 0.6346053179277220
5 Bwd Pkt Len Min 0.6316676523933165
6 Flow IAT Mean 0.6227957388844962
7 Flow Pkts/s 0.6206618556552554
8 Fwd IAT Max 0.6193223696107817
9 Fwd IAT Tot 0.6077872354594660
10 Fwd IAT Mean 0.5973932520832026
11 Fwd Header Len 0.5359267744506864
12 Subflow Fwd Byts 0.5355080035978141
13 Pkt Len Max 0.5342673508275444
14 Fwd Seg Size Avg 0.5330819927752828
15 Fwd Pkt Len Mean 0.5308392160721118
16 Pkt Len Mean 0.5304229611917282
17 Dst Port 0.5293337733399603
18 Pkt Size Avg 0.5283890303812000
19 Fwd Pkt Len Max 0.5280293285443132
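The ranking in Table 2 can be turned into a feature subset by keeping the k highest-scoring features. The short sketch below uses the values from Table 2; the helper name top_k_features and the choice of k are illustrative assumptions, not the authors' code.

```python
# Information Gain values copied from Table 2.
info_gain = {
    "Init Fwd Win Byts": 0.7460765918447050,
    "Flow IAT Max": 0.6545814589068368,
    "Flow Duration": 0.6405164250112803,
    "Fwd Pkts/s": 0.6346053179277220,
    "Bwd Pkt Len Min": 0.6316676523933165,
    "Flow IAT Mean": 0.6227957388844962,
    "Flow Pkts/s": 0.6206618556552554,
    "Fwd IAT Max": 0.6193223696107817,
    "Fwd IAT Tot": 0.6077872354594660,
    "Fwd IAT Mean": 0.5973932520832026,
    "Fwd Header Len": 0.5359267744506864,
    "Subflow Fwd Byts": 0.5355080035978141,
    "Pkt Len Max": 0.5342673508275444,
    "Fwd Seg Size Avg": 0.5330819927752828,
    "Fwd Pkt Len Mean": 0.5308392160721118,
    "Pkt Len Mean": 0.5304229611917282,
    "Dst Port": 0.5293337733399603,
    "Pkt Size Avg": 0.5283890303812000,
    "Fwd Pkt Len Max": 0.5280293285443132,
}


def top_k_features(scores, k):
    """Return the names of the k features with the highest scores."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


top5 = top_k_features(info_gain, 5)
print(top5)
```

The resulting list of names could then be used to select the matching columns of the training data before fitting a classifier.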
The results in Table 3 represent the performance of five different classifiers: Decision Tree,
Random Forest, Extra Tree, XGBoost, and an Ensemble model. The evaluation metrics used to
measure the performance of each classifier are Accuracy, Precision, Recall, and F1-Score.
Table 3. Performance of different classifiers.
Classifiers Accuracy Precision Recall F1-Score
Decision Tree 0.9786 0.9808 0.9786 0.9779
Random Forest 0.9831 0.9822 0.9832 0.9796
Extra Tree 0.9819 0.9815 0.9819 0.9797
XGBoost 0.9334 0.9838 0.9834 0.9787
Ensemble 0.9836 0.9822 0.9836 0.9798
Accuracy measures the proportion of correctly classified instances out of the total instances. In
this case, the Ensemble classifier achieved the highest accuracy of 0.9836, followed closely by
the Random Forest classifier with an accuracy of 0.9831. The Decision Tree, Extra Tree, and
XGBoost classifiers had accuracies of 0.9786, 0.9819, and 0.9334, respectively.
Precision measures the proportion of true positives (correctly predicted positive instances) out of
all positive predictions. The XGBoost classifier had the highest precision score of 0.9838. The
Decision Tree, Random Forest, Extra Tree, and Ensemble classifiers also achieved high precision
scores of 0.9808, 0.9822, 0.9815, and 0.9822, respectively.
Recall measures the proportion of true positives out of all actual positive instances. The
Ensemble and XGBoost classifiers achieved the highest recall scores of 0.9836 and 0.9834,
respectively, followed by the Random Forest classifier with 0.9832. The Decision Tree and
Extra Tree classifiers had recall scores of 0.9786 and 0.9819, respectively.
F1-Score is the harmonic mean of precision and recall and provides a single metric that
combines both measures. The Ensemble and Extra Tree classifiers achieved the highest
F1-Scores of 0.9798 and 0.9797, respectively. The Random Forest, XGBoost, and Decision Tree
classifiers had F1-Scores of 0.9796, 0.9787, and 0.9779, respectively.
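The four metrics defined above can be computed directly from the counts of true and false positives and negatives. The sketch below is a minimal binary-class illustration with made-up predictions; the source does not state how the scores in Table 3 were averaged across classes, so it should not be read as a reproduction of those numbers.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1-score for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels: 1 = Attacks (positive class), 0 = Benign.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

In this toy case one attack is missed and one benign flow is flagged, so precision and recall are both 3/4 while accuracy is 8/10, which shows how the metrics can diverge on the same predictions.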
Overall, the results suggest that the Ensemble classifier performed the best in terms of accuracy
and F1-Score. However, depending on the specific use case, other classifiers with high precision
or recall scores may be more appropriate. Additionally, further analysis may be needed to
determine the significance of any differences in performance between the classifiers.
Table 4. Performance of classifiers with and without feature selection.
Classifiers Accuracy (with FS) Accuracy (without FS) F1-Score (with FS) F1-Score (without FS)
Decision Tree 0.9786 0.9575 0.9779 0.9476
Random Forest 0.9831 0.9737 0.9796 0.9611
Extra Tree 0.9819 0.9687 0.9797 0.9535
XGBoost 0.9334 0.9266 0.9787 0.9118
Ensemble 0.9836 0.9739 0.9798 0.9614
The results (Table 4) show that feature selection improved the performance of all classifiers in
terms of accuracy and F1-Score. In particular, the Random Forest and Ensemble classifiers
achieved a higher accuracy and F1-Score with feature selection. For example, the Random Forest
classifier achieved an accuracy of 0.9831 and an F1-Score of 0.9796 with feature selection,
compared to an accuracy of 0.9737 and an F1-Score of 0.9611 without feature selection.
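The size of the improvement for each classifier can be read off Table 4 by subtracting the scores without feature selection from the scores with it. The sketch below does this arithmetic on the table's values, assuming (as the Random Forest example in the text indicates) that the second accuracy column reports accuracy without feature selection. The dictionary keys are illustrative names.

```python
# Values copied from Table 4 (FS = feature selection).
table4 = {
    "Decision Tree": {"acc_fs": 0.9786, "acc_no_fs": 0.9575, "f1_fs": 0.9779, "f1_no_fs": 0.9476},
    "Random Forest": {"acc_fs": 0.9831, "acc_no_fs": 0.9737, "f1_fs": 0.9796, "f1_no_fs": 0.9611},
    "Extra Tree": {"acc_fs": 0.9819, "acc_no_fs": 0.9687, "f1_fs": 0.9797, "f1_no_fs": 0.9535},
    "XGBoost": {"acc_fs": 0.9334, "acc_no_fs": 0.9266, "f1_fs": 0.9787, "f1_no_fs": 0.9118},
    "Ensemble": {"acc_fs": 0.9836, "acc_no_fs": 0.9739, "f1_fs": 0.9798, "f1_no_fs": 0.9614},
}

# (accuracy gain, F1-score gain) per classifier, rounded to four decimals.
gains = {
    name: (
        round(row["acc_fs"] - row["acc_no_fs"], 4),
        round(row["f1_fs"] - row["f1_no_fs"], 4),
    )
    for name, row in table4.items()
}

for name, (acc_gain, f1_gain) in gains.items():
    print(f"{name}: accuracy +{acc_gain:.4f}, F1-score +{f1_gain:.4f}")
```

Every difference is positive, which is consistent with the statement that feature selection improved all five classifiers on both metrics.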
These results demonstrate the effectiveness of the proposed feature selection technique in
improving the performance of the classifiers. By selecting the most relevant features, the models
were able to achieve higher accuracy and F1-Scores while maintaining high precision and recall.
This emphasizes the significance of feature engineering in machine learning and underscores the
potential advantages of thoughtfully choosing features to enhance model performance.
The study on anomaly detection through feature selection is similar to the other studies in that it
compares the performance of multiple classifiers to identify the most suitable one for a specific
task. However, this study has a specific focus on the impact of feature selection on model
performance, and it emphasizes the importance of feature engineering in machine learning. One
unique aspect of this study is its focus on an imbalanced dataset, which is a common problem in
machine learning. The study shows that classifiers can achieve high accuracy and F1-Scores even
in the presence of class imbalance, indicating their robustness. However, the specific use case
may require a classifier with higher precision or recall, and further experimentation may be
needed to identify the best classifier for a given application.
Overall, the study highlights the importance of feature selection in improving model
performance, and it demonstrates the effectiveness of five classifiers in an anomaly detection
task. The study's findings can be valuable for practical applications of machine learning, where
model performance is crucial for identifying anomalies and making accurate predictions.
6. CONCLUSIONS
This study sought to determine experimentally how feature selection can increase the
accuracy of anomaly detection. The Information Gain technique was used since it quantifies
how much information each feature contributes to the classification task. The performance of
five classifiers (Decision Tree, Random Forest, Extra Tree, XGBoost, and an Ensemble model)
was assessed using Accuracy, Precision, Recall, and F1-Score. The Ensemble classifier had the
highest accuracy of 0.9836, with the Random Forest classifier a close second at 0.9831.
However, it is worth noting that the dataset used in this study was imbalanced, with
the majority class representing over 98% of the instances, which can present challenges for
classification models. Despite this, the classifiers achieved high accuracy and F1-Scores,
indicating their robustness to class imbalance.
Furthermore, the study showed that feature selection played a crucial role in the performance of
the classifiers. By selecting the most relevant features, the models achieved higher accuracy and
F1-Scores while maintaining high precision and recall. This emphasizes the significance of
feature engineering in machine learning and the possible benefits of selecting features properly to
maximize model performance.
In summary, this study demonstrated the performance of five classifiers on an imbalanced dataset
with carefully selected features. The Ensemble classifier performed the best in terms of accuracy
and F1-Score, with Random Forest and Extra Tree close behind, but the specific use case may
require a classifier with higher
precision or recall. Further analysis and experimentation may be needed to determine the best
classifier for a particular application, but the results underscore the importance of feature
selection and its potential to improve model performance.
REFERENCES
[1] D. Ravikumar, “Towards Enhancement of Machine Learning Techniques Using CSE-CIC-IDS2018
Cybersecurity Dataset.” [Online]. Available: https://scholarworks.rit.edu/theses
[2] N. Hariyale, M. S. Rathore, R. Prasad, and P. Saurabh, A Hybrid Approach for Intrusion Detection
System, vol. 1048. Springer US, 2020. doi: 10.1007/978-981-15-0035-0_31.
[3] A. S. B. H. Ismail, A. H. Abdullah, K. B. A. Bak, M. A. Bin Ngadi, D. Dahlan, and W. Chimphlee,
“A novel method for unsupervised anomaly detection using unlabelled data,” Proceedings - The
International Conference on Computational Sciences and its Applications, ICCSA 2008, pp. 252–260,
2008, doi: 10.1109/ICCSA.2008.70.
[4] P. Dokas, L. Ertoz, V. Kumar, A. Lazarevic, J. Srivastava, and P.-N. Tan, “Data mining for network
intrusion detection,” National Science Foundation Workshop on Next Generation Data Mining, vol.
38, no. 7, pp. 21–30, 2002, [Online]. Available:
http://scholar.google.com/scholar?hl=en&btnG=Search&q=intitle:Data+Mining+for+Network+Intrus
ion+Detection%5Cnhttp://www.csee.umbc.edu/~kolari1/Mining/ngdm/dokas.pdf
[5] T. N. Kim, T. N. Tri, L. T. Nguyen, and D. T. Truong, “A combination of the intrusion detection
system and the open-source firewall using python language,” International Journal of Computer
Networks and Communications, vol. 14, no. 1, pp. 59–69, Jan. 2022, doi: 10.5121/ijcnc.2022.14104.
[6] M. Awad and A. Alabdallah, “Addressing imbalanced classes problem of intrusion detection system
using weighted Extreme Learning Machine,” International Journal of Computer Networks and
Communications, vol. 11, no. 5, pp. 39–58, 2019, doi: 10.5121/ijcnc.2019.11503.
[7] T. T. Huynh and H. T. Nguyen, “on the performance of intrusion detection systems with hidden
multilayer neural network using dsd training,” International Journal of Computer Networks and
Communications, vol. 14, no. 1, pp. 117–137, Jan. 2022, doi: 10.5121/ijcnc.2022.14108.
[8] S. H. Kang and K. J. Kim, “A feature selection approach to find optimal feature subsets for the
network intrusion detection system,” Cluster Comput, vol. 19, no. 1, pp. 325–333, Mar. 2016, doi:
10.1007/s10586-015-0527-8.
[9] A. A. Salih and A. M. Abdulazeez, “Evaluation of Classification Algorithms for Intrusion Detection
System: A Review,” Journal of Soft Computing and Data Mining, vol. 02, no. 01, Apr. 2021, doi:
10.30880/jscdm.2021.02.01.004.
[10] N. H. Son and H. T. Dung, “a lightweight method for detecting cyber attacks in high-traffic large
networks based on clustering techniques,” International Journal of Computer Networks and
Communications, vol. 15, no. 1, pp. 35–51, Jan. 2023, doi: 10.5121/ijcnc.2023.15103.
[11] S. S. Dhaliwal, A. Al Nahid, and R. Abbas, “Effective intrusion detection system using XGBoost,”
Information (Switzerland), vol. 9, no. 7, Jun. 2018, doi: 10.3390/info9070149.
16. International Journal of Computer Networks & Communications (IJCNC) Vol.15, No.4, July 2023
108
[12] S. Singh Panwar, Y. P. Raiwani, and L. S. Panwar, “Evaluation of Network Intrusion Detection with
Features Selection and Machine Learning Algorithms on CICIDS-2017 Dataset,” in SSRN Electronic
Journal, 2019. doi: 10.2139/ssrn.3394103.
[13] L. McClendon and N. Meghanathan, “Using Machine Learning Algorithms to Analyze Crime Data,”
Machine Learning and Applications: An International Journal, vol. 2, no. 1, pp. 1–12, Mar. 2015, doi:
10.5121/mlaij.2015.2101.
[14] A. Khan, “The State of the Art in Intrusion Prevention and Detection,” The State of the Art in
Intrusion Prevention and Detection, 2014, doi: 10.1201/b16390.
[15] A. Shafee, M. Baza, D. A. Talbert, M. M. Fouda, M. Nabil, and M. Mahmoud, “Mimic Learning to
Generate a Shareable Network Intrusion Detection Model; Mimic Learning to Generate a Shareable
Network Intrusion Detection Model,” 2020.
[16] J. L. Leevy, J. Hancock, R. Zuech, and T. M. Khoshgoftaar, “Detecting cybersecurity attacks across
different network features and learners,” J Big Data, vol. 8, no. 1, Dec. 2021, doi: 10.1186/s40537-
021-00426-w.
[17] F. Colas and P. Brazdil, “Comparison of SVM and some older classification algorithms in text
classification tasks,” IFIP International Federation for Information Processing, vol. 217, pp. 169–178,
2006, doi: 10.1007/978-0-387-34747-9_18.
[18] H. Li, H. Jiang, D. Wang, and B. Han, “An improved KNN algorithm for text classification,” in
Proceedings - 8th International Conference on Instrumentation and Measurement, Computer,
Communication and Control, IMCCC 2018, Institute of Electrical and Electronics Engineers Inc., Jul.
2018, pp. 1081–1085. doi: 10.1109/IMCCC.2018.00225.
[19] O. E. Elejla, B. Belaton, M. Anbar, B. Alabsi, and A. K. Al-Ani, “Comparison of classification
algorithms on ICMPv6-based DDoS attacks detection,” in Lecture Notes in Electrical Engineering,
Springer Verlag, 2019, pp. 347–357. doi: 10.1007/978-981-13-2622-6_34.
[20] M. Bahrololum, E. Salahi, and M. Khaleghi, “Anomaly intrusion detection design using hybrid of
unsupervised and supervised neural network,” 2009.
[21] S. Chimphlee and W. Chimphlee, “Machine learning to improve the performance of anomaly-based
network intrusion detection in big data,” Indonesian Journal of Electrical Engineering and Computer
Science, vol. 30, no. 2, pp. 1106–1119, May 2023, doi: 10.11591/ijeecs.v30.i2.pp1106-1119.
[22] “CSE-CIC-IDS2018 dataset,” 2018. https://www.unb.ca/cic/datasets/ids-2018.html
[23] J. L. Leevy and T. M. Khoshgoftaar, “A survey and analysis of intrusion detection models based on
CSE-CIC-IDS2018 Big Data,” J Big Data, vol. 7, no. 1, Dec. 2020, doi: 10.1186/s40537-020-00382-
x.
[24] Q. Zhou and D. Pezaros, “Evaluation of Machine Learning Classifiers for Zero-Day Intrusion
Detection -- An Analysis on CIC-AWS-2018 dataset,” 2019, [Online]. Available:
http://arxiv.org/abs/1905.03685
[25] D. V. Jeyanthi and B. Indrani, “An efficient intrusion detection system with custom features using
fpa-gradient boost machine learning algorithm,” International Journal of Computer Networks and
Communications, vol. 14, no. 1, pp. 99–115, Jan. 2022, doi: 10.5121/ijcnc.2022.14107.
[26] H. Motoda and H. Liu, “Feature selection, extraction and construction,” Communication of IICM,
vol. 5, pp. 67–72, 2002.
[27] S. Khalid, T. Khalil, and S. Nasreen, “A survey of feature selection and feature extraction techniques
in machine learning,” 2014. doi: 10.1109/SAI.2014.6918213.
[28] A. Agarwal, P. Sharma, M. Alshehri, A. A. Mohamed, and O. Alfarraj, “Classification model for
accuracy and intrusion detection using machine learning approach,” PeerJ Comput Sci, vol. 7, pp. 1–
22, 2021, doi: 10.7717/PEERJ-CS.437.
[29] J. L. Leevy and T. M. Khoshgoftaar, “A survey and analysis of intrusion detection models based on
CSE-CIC-IDS2018 Big Data,” J Big Data, vol. 7, no. 1, Dec. 2020, doi: 10.1186/s40537-020-00382-
x.
[30] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, “SMOTE: Synthetic minority over-
sampling technique,” Journal of Artificial Intelligence Research, vol. 16, no. 1, pp. 321–357, 2002,
doi: 10.1613/jair.953.
17. International Journal of Computer Networks & Communications (IJCNC) Vol.15, No.4, July 2023
109
[31] R. A. A. Viadinugroho, “Imbalanced Classification in Python: SMOTE-ENN Method,” Toward Data
Science, Apr. 18, 2021. https://towardsdatascience.com/imbalanced-classification-in-python-smote-
enn-method-db5db06b8d50 (accessed May 18, 2022).
[32] N. Japkowicz, “Learning from imbalanced data sets: a comparison of various strategies,” 2000.
[Online]. Available: www.aaai.org
[33] L. Mohan, S. Jain, P. Suyal, and A. Kumar, “Data mining Classification Techniques for Intrusion
Detection System,” Proceedings - 2020 12th International Conference on Computational Intelligence
and Communication Networks, CICN 2020, pp. 351–355, 2020, doi:
10.1109/CICN49253.2020.9242642.
[34] A. Gouveia and M. Correia, “Network Intrusion Detection with XGBoost.” [Online]. Available:
https://www.kdd.org/kdd-cup/view/kdd-cup-1999
[35] A. Ammar, “A Decision Tree Classifier for Intrusion Detection Priority Tagging,” Journal of
Computer and Communications, vol. 03, no. 04, pp. 52–58, 2015, doi: 10.4236/jcc.2015.34006.
AUTHORS
Witcha Chimphlee received his PhD in Computer Science from the University Technology of
Malaysia. Currently, he holds the position of Assistant Professor in the Data Science and
Analytics department at Suan Dusit University in Thailand. His current research interests
include machine learning, computer networks and security, data science, and big data
analytics. He has published several papers in peer-reviewed journals and has actively
participated in various international conferences. He is a dedicated researcher with a passion for advancing
the field of computer science through his work.
Siriporn Chimphlee holds a PhD in Computer Science from the University Technology
of Malaysia and is currently an Assistant Professor in the Data Science and Analytics
department at Suan Dusit University in Thailand. Her research interests include data
mining, intrusion detection, web mining, and information technology. She has published
several papers in reputed journals and has actively participated in international
conferences. She is passionate about exploring new avenues in computer science and
constantly strives to contribute to the field through her research.