Widespread use of the Internet has the adverse effect of exposing systems to cyber attacks. Defensive
mechanisms such as firewalls and intrusion detection systems (IDSs) have evolved through many research
contributions in these areas. Machine learning techniques have been used successfully in these defense
mechanisms, especially in IDSs. Although they are effective to some extent in identifying new patterns and
variants of existing malicious patterns, many attacks still go undetected. The objective is to develop an
algorithm for detecting malicious domains based on passive traffic measurements. In this paper, an
anomaly-based intrusion detection system built on an ensemble machine learning classifier, Random Forest
with gradient boosting, is deployed. The NSL-KDD dataset is used for analysis; out of 41 features, 32
were identified as significant using feature discretion.
The main goal of intrusion detection systems (IDSs) is
to detect intrusions. Such systems are a significant tool for
ensuring cyber security in traditional computer-based systems.
An IDS model can be faster and reach more accurate detection
rates when the most relevant features are selected from the
input dataset. Feature selection is an important stage of any IDS:
it selects the optimal subset of features, which makes model
training faster and less complex while preserving or enhancing
the performance of the system. In this paper, we propose a method
based on dividing the input dataset into subsets, one per attack.
We then apply feature selection with an information gain filter
to each subset, and generate the optimal feature set by combining
the feature sets obtained for each attack. Experimental results on
the NSL-KDD dataset show that the proposed feature selection
method, with fewer features, improves system accuracy while
decreasing complexity. Moreover, a comparative study of the
efficiency of the feature selection technique is performed using
different classification methods. To enhance overall performance,
a further stage applies a voting learning algorithm over Random
Forest and PART. The results indicate that the best accuracy is
achieved when using the product probability rule.
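The per-attack selection described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the toy records, the feature names, the `top_k` cutoff and the choice to compare each attack against `normal` traffic are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels):
    """Entropy reduction obtained by splitting `labels` on a discrete feature."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[value].append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def select_features(records, labels, attack, top_k):
    """Rank features on the 'normal vs. one attack' subset and keep the top_k."""
    subset = [(rec, lab) for rec, lab in zip(records, labels) if lab in ("normal", attack)]
    sub_labels = [lab for _, lab in subset]
    gains = {
        name: information_gain([rec[name] for rec, _ in subset], sub_labels)
        for name in records[0]
    }
    return set(sorted(gains, key=gains.get, reverse=True)[:top_k])


# Hypothetical toy records (assumption); real NSL-KDD rows have 41 features.
records = [
    {"protocol": "tcp", "flag": "SF", "service": "other"},
    {"protocol": "tcp", "flag": "S0", "service": "other"},
    {"protocol": "icmp", "flag": "SF", "service": "other"},
    {"protocol": "tcp", "flag": "SF", "service": "ftp"},
    {"protocol": "tcp", "flag": "S0", "service": "private"},
    {"protocol": "icmp", "flag": "SF", "service": "ecr_i"},
]
labels = ["normal", "dos", "probe", "normal", "dos", "probe"]

# One feature list per attack, then the union of all lists.
attacks = sorted(set(labels) - {"normal"})
selected = set().union(*(select_features(records, labels, a, top_k=1) for a in attacks))
print(selected)
```

In this toy data `flag` separates the DoS rows from normal traffic and `protocol` separates the probe rows, so the combined set contains both even though neither feature is informative for every attack.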
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Evaluation of network intrusion detection using markov chain (IJCI JOURNAL)
Internet threats in day-to-day life have increased significantly, so there is a need to develop models that
maintain system security. One of the most effective techniques is the intrusion detection system (IDS),
whose purpose is to detect intrusions through security devices and deal with them. In this paper, a
mathematical approach is used to predict and detect intrusions in the network. We discuss
‘K-Means + Apriori’, a method that combines two algorithms to classify normal and abnormal activities in
a computer network. The K-Means step partitions the training set into K clusters using Euclidean
distance and introduces an outlier factor; the Apriori algorithm is then built to prune the data by removing
infrequent data from the database. Based on the defined states, the degree of incoming data is evaluated
experimentally on a sample of the DARPA2000 dataset, and the method achieves high detection
performance for the level of attack at each stage.
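The pruning role of Apriori mentioned above can be illustrated with a minimal frequent-itemset sketch. The event attributes and the `min_support` threshold are hypothetical, and the paper's outlier factor and state model are not reproduced here.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support_count} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    size = 1
    while candidates:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        kept = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(kept)
        size += 1
        # Join step: merge frequent itemsets into candidates one item larger.
        joined = {a | b for a in kept for b in kept if len(a | b) == size}
        # Prune step: every (size - 1)-subset of a candidate must be frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in kept for s in combinations(c, size - 1))
        }
    return frequent


# Hypothetical connection events described by discrete attributes (assumption).
events = [
    {"tcp", "S0", "private"},
    {"tcp", "S0", "private"},
    {"tcp", "SF", "http"},
    {"tcp", "S0", "http"},
    {"udp", "SF", "dns"},
]
frequent = apriori(events, min_support=2)
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

Items such as `udp` and `dns` occur only once, so they are dropped in the first pass and never take part in larger candidates, which is the sense in which infrequent data is pruned.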
FORTIFICATION OF HYBRID INTRUSION DETECTION SYSTEM USING VARIANTS OF NEURAL ... (IJNSA Journal)
Intrusion detection systems (IDSs) form a key part of system defence by identifying abnormal
activities in a computer system. In recent years, different soft computing techniques have
been proposed for developing IDSs. Nevertheless, intrusion detection is not yet a perfect
technology, which has given data mining the opportunity to make important contributions
to the field. In this paper we propose a new hybrid technique that uses fuzzy C-means clustering,
a fuzzy neural network (neuro-fuzzy) and a radial basis function (RBF) SVM to fortify the
intrusion detection system. The proposed technique has five major steps. The first step performs
relevance analysis, and the input data is then clustered using fuzzy C-means clustering. After that,
the neuro-fuzzy classifiers are trained, such that each data point is trained with the neuro-fuzzy
classifier associated with its cluster. Subsequently, a vector for SVM classification is formed, and
in the last step, RBF-SVM classification determines whether an intrusion has happened. The dataset
used is KDD Cup 1999, and precision, recall, F-measure and accuracy serve as the evaluation metrics.
Our technique achieved better accuracy for all types of intrusions. The results of the proposed
technique are compared with other existing techniques, and these comparisons demonstrate its
effectiveness.
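The clustering step relies on fuzzy C-means, in which every data point receives a degree of membership in each cluster rather than a single label. The sketch below shows only the standard membership update for one point; the fuzzifier `m = 2` and the example coordinates are assumptions, and the neuro-fuzzy and RBF-SVM stages are not covered.

```python
import math


def fcm_memberships(point, centers, m=2.0):
    """Degree to which `point` belongs to each cluster center (values sum to 1)."""
    dists = [math.dist(point, c) for c in centers]
    # A point sitting exactly on one or more centers is shared among them only.
    hits = dists.count(0.0)
    if hits:
        return [1.0 / hits if d == 0.0 else 0.0 for d in dists]
    power = 2.0 / (m - 1.0)
    # Standard update: u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1))
    return [1.0 / sum((d_i / d_k) ** power for d_k in dists) for d_i in dists]


# Hypothetical 2-D point and two cluster centers (assumption).
memberships = fcm_memberships((1.0, 0.0), [(0.0, 0.0), (4.0, 0.0)])
print([round(u, 3) for u in memberships])
```

The point is three times closer to the first center than to the second, so with `m = 2` it receives memberships of about 0.9 and 0.1.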
MODERNIZED INTRUSION DETECTION USING ENHANCED APRIORI ALGORITHM (ijwmn)
Communication networks are essential, but they also create many crucial security issues today.
Firewalls are commonly considered the first line of defense, but their policies cannot meet all the
requirements needed to achieve security. Much research has been done in this area, yet
security needs are still not met. Many models, such as ADAM, DHP, LERAD and
ENTROPHY, have already been proposed to resolve security problems, but an efficient model is needed
to detect new types of intrusions across the entire network. In this paper, we propose a modernized
intrusion detection system that consists of two methods, anomaly detection and misuse detection. The two
are integrated and also used to detect novel attacks. Our system discovers temporal patterns of
attacker behavior, which are profiled using the Enhanced Apriori Algorithm (EAA). It is
tested with a simple interface that displays the behaviors of attacks effectively.
A PROPOSED MODEL FOR DIMENSIONALITY REDUCTION TO IMPROVE THE CLASSIFICATION C... (IJNSA Journal)
Over the past few years, intrusion protection systems have become a mature research area in the field of computer networks. Excessive features have a significant impact on intrusion detection performance. Many previous studies have used machine learning algorithms to classify network traffic as harmful or normal. To obtain good accuracy, the dimensionality of the data must be reduced. This paper proposes a new model based on a combination of feature selection and machine learning algorithms. The model relies on genes selected from every feature to increase the accuracy of intrusion detection systems; only the features that affect attack detection are retained. Performance has been evaluated by comparison with several known algorithms, using the NSL-KDD dataset for classification. The proposed model outperformed the other learning approaches, with an accuracy of 98.8%.
An intrusion detection system for packet and flow based networks using deep n... (IJECEIAES)
Research on deep neural networks and big data is now converging in several respects to enhance the capabilities of intrusion detection systems (IDSs). Many IDS models have been introduced to provide security over big data. This study focuses on intrusion detection in computer networks using big datasets. The advent of big data has stimulated broad support for cyber security by bringing forward a range of capable algorithms that classify and analyze patterns and make better predictions more efficiently. In this study, a detection model based on deep neural networks is proposed to detect intrusions. We applied the model to the latest dataset available online, which contains packet-based data, flow-based data and some additional metadata. The dataset is labeled and imbalanced, with 79 attributes and some classes having far fewer training samples than others. The proposed model is built using Keras and the Google TensorFlow deep learning environment. Experimental results show that intrusions are detected with an accuracy of over 99% for both binary and multiclass classification with the selected best features. The average scores for the receiver operating characteristic (ROC) and precision-recall curves are also 1. The outcome implies that deep neural networks offer a novel research model with high accuracy for intrusion detection, better than some models presented in the literature.
CLASSIFICATION PROCEDURES FOR INTRUSION DETECTION BASED ON KDD CUP 99 DATA SET (IJNSA Journal)
In a network security framework, intrusion detection is a benchmark component and a fundamental way to protect PCs from many threats. A major issue in intrusion detection is the large number of false alerts, which has motivated several experts to look for ways of minimizing false alerts through data mining, an analysis procedure suited to large datasets such as KDD CUP 99. This paper reviews various data mining classification procedures for handling false alerts in intrusion detection. Testing many data mining procedures on KDD CUP 99 shows that no individual procedure can reveal all attack classes with high accuracy and without false alerts. The best accuracy, 92%, is obtained with the Multilayer Perceptron, whereas the best training time, 4 seconds, is obtained with the rule-based model. It is concluded that various procedures should be used to handle different network attacks.
Visualize network anomaly detection by using k means clustering algorithm (IJCNCJournal)
With the ever-increasing number of new attacks in today’s world, the amount of data will keep increasing,
and because of the base-rate fallacy the number of false alarms will also increase. Another problem is
that attacks usually are not detected until after they have taken place, which makes
defending against them hard and can easily lead to disclosure of sensitive information.
In this paper we choose the K-means algorithm with the KDD Cup 1999 network dataset to evaluate the
performance of an unsupervised learning method for anomaly detection. The evaluation
showed that a high detection rate can be achieved while maintaining a low false alarm rate. This paper
presents the result of k-means clustering with the Cluster 3.0 tool and visualizes it with the
TreeView visualization tool.
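As a rough illustration of k-means used for unsupervised anomaly detection, the sketch below clusters a handful of two-dimensional points and flags every point outside the largest cluster. The data, the seeding from the first `k` points and the "largest cluster is normal" rule are assumptions for the example; the paper itself used the Cluster 3.0 tool rather than code like this.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means; the first k points seed the centroids."""
    centroids = [tuple(p) for p in points[:k]]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest centroid (Euclidean distance).
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its members.
        centroids = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical scaled (duration, bytes) pairs (assumption).
points = [(0.1, 0.2), (0.9, 0.8), (0.2, 0.1), (0.15, 0.15), (0.8, 0.9), (0.1, 0.1)]
centroids, clusters = kmeans(points, k=2)

# Assumption: normal traffic dominates, so the largest cluster is treated as normal.
normal = max(clusters, key=len)
anomalies = [p for cluster in clusters if cluster is not normal for p in cluster]
print(anomalies)
```

Because k-means is sensitive to its starting centroids, a practical run would normally use several random restarts or a smarter seeding scheme instead of the first `k` points.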
A Novel Classification via Clustering Method for Anomaly Based Network Intrus... (IDES Editor)
Intrusion detection on the Internet is an active
area of research. Intruders can be classified into two
types: external intruders, who are unauthorized
users of the computers they attack, and internal
intruders, who have permission to access the system but
with some restrictions. The aim of this paper is to present
a methodology for recognizing attacks during the normal
activities of a system. A novel classification method based on
the sequential information bottleneck (sIB) clustering algorithm
is proposed to build an efficient anomaly-based
network intrusion detection model. We compared
the proposed method with other clustering algorithms,
namely X-Means, Farthest First, Filtered clusters, DBSCAN,
K-Means and EM (Expectation-Maximization)
clustering, to assess the suitability of the proposed
algorithm. A subset of the KDDCup 1999 intrusion detection
benchmark dataset was used for the experiment.
Results show that the proposed method is efficient in
terms of detection accuracy and low false positive rate in
comparison with the other existing methods.
COPYRIGHT This thesis is copyright materials protected under the .docx (voversbyobersby)
COPYRIGHT
This thesis is copyright material protected under the Berne Convention, the Copyright Act 1999 and other international and national enactments in that behalf, on intellectual property. It may not be reproduced by any means in full or in part, except for short extracts in fair dealing for research or private study, critical scholarly review or discourse with acknowledgment, with written permission of the Dean, School of Graduate Studies, on behalf of both the author and XXX XXX University.
ABSTRACT
With the fast-growing Internet world, the risk of intrusion has also increased; as a result, the intrusion detection system (IDS) has become a key research field. IDSs are used to identify any suspicious activity or pattern in the network or machine that attempts to breach the security features or compromise the machine. Most IDSs use all the features of the data, although not all features are equally relevant for detecting attacks, and not every feature contributes significantly to system performance. The main aim of this work is to develop an efficient denial of service network intrusion classification model. The specific objectives were: to analyse existing literature on intrusion detection systems (the techniques used to model IDSs, the types of network attacks, the performance of various machine learning tools and how network intrusion detection systems are assessed); to find the top network traffic attributes that can be used to model denial of service intrusion detection; and to develop a machine learning model for detecting denial of service network intrusion.
Methods: The research design was experimental, and data was collected by simulation using the NSL-KDD dataset. By implementing the Correlation Feature Selection (CFS) mechanism with three search algorithms, the smallest set of features is selected, containing all the features that are selected very frequently.
Findings: The smallest subset of features chosen is the most nominal among all the feature subsets found. Further, the performances of artificial neural network (ANN), decision tree (DT), Support Vector Machine (SVM) and K-Nearest Neighbour (KNN) classifiers are compared for the 7 subsets found by the filter model and for all 41 attributes.
Results: The outcome indicates a remarkable improvement in the performance metrics used for comparison of the two classifiers.
The results show that using the 17/18 selected features improves DoS type classification accuracy compared with using the 41 features in the NSL-KDD dataset. It was further observed that an ensemble of three classifiers with decision fusion performs better than a single classifier for DoS type classification. Among the machine learning tools tested, ANN achieved the best classification accuracy, followed by SVM and DT. KNN registered the lowest classification accuracy.
Application: The proposed work with such an improved detection rate and lesser classification time and lar.
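The abstract does not say which decision fusion rule combines the three classifiers. One common choice is a simple majority vote, sketched below under that assumption; the per-classifier outputs are hypothetical and use NSL-KDD DoS labels only for flavour.

```python
from collections import Counter


def fuse_by_majority(predictions):
    """Combine per-classifier label lists into one label per sample by majority vote."""
    fused = []
    for votes in zip(*predictions):
        # most_common(1) returns the label with the highest vote count; on a
        # full tie it returns the label that was seen first.
        label, _ = Counter(votes).most_common(1)[0]
        fused.append(label)
    return fused


# Hypothetical predictions from three classifiers on three samples (assumption).
ann = ["neptune", "smurf", "normal"]
svm = ["neptune", "normal", "normal"]
dt = ["back", "smurf", "normal"]
fused = fuse_by_majority([ann, svm, dt])
print(fused)
```

With three voters, a tie only occurs when all three disagree, in which case this sketch falls back to the first classifier in the list; weighting votes by each classifier's validation accuracy is a frequent alternative.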
EFFICIENT ATTACK DETECTION IN IOT DEVICES USING FEATURE ENGINEERING-LESS MACH... (ijcsit)
Through the generalization of deep learning, the research community has addressed critical challenges in
the network security domain, such as malware identification and anomaly detection. However, deploying these
models on Internet of Things (IoT) devices for day-to-day operations has yet to be discussed. IoT devices
are often limited in memory and processing power, rendering the compute-intensive deep learning environment
unusable. This research proposes a way to overcome this barrier by bypassing feature engineering in the
deep learning pipeline and using raw packet data as input. We introduce a feature engineering-less
machine learning (ML) process to perform malware detection on IoT devices. Our proposed model,
“Feature engineering-less ML (FEL-ML),” is a lighter-weight detection algorithm that expends no extra
computation on “engineered” features. It effectively accelerates the low-powered IoT edge. It is trained
on unprocessed byte streams of packets. Aside from providing better results, it is quicker than traditional
feature-based methods. FEL-ML facilitates resource-sensitive network traffic security with the added
benefit of eliminating the significant investment by subject matter experts in feature engineering.
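To make "raw packet data as input" concrete, the sketch below turns a packet's bytes into a fixed-length numeric vector without computing any engineered features. The fixed length, the zero padding and the scaling to the range 0 to 1 are assumptions for illustration; the abstract does not state how FEL-ML prepares its byte streams.

```python
def bytes_to_vector(packet, length=64):
    """Truncate or zero-pad raw packet bytes and scale each byte to [0, 1]."""
    clipped = packet[:length].ljust(length, b"\x00")
    return [b / 255.0 for b in clipped]


# Hypothetical three-byte payload (assumption), padded to four positions.
vector = bytes_to_vector(b"\xff\x00\x80", length=4)
print(vector)
```

Every packet maps to a vector of the same size, so the result can be fed directly to a model's input layer; no flow statistics, header parsing or domain knowledge is involved.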
New kinds of intrusions cause deviations in the normal behaviour of traffic flow in
computer networks every day. This study focused on enhancing the learning capabilities of IDSs
to detect anomalies in network traffic flow by comparing the k-means data mining approach
to intrusion detection with the outlier detection approach. The k-means approach
uses clustering mechanisms to group the traffic flow data into normal and abnormal clusters.
Outlier detection calculates an outlier score (neighbourhood outlier factor (NOF)) for each flow
record, whose value decides whether a traffic flow is normal or abnormal. The two methods
were then compared in terms of various performance metrics and the amount of computer
resources they consumed. Overall, k-means was more accurate and precise and had a better
classification rate than outlier detection in intrusion detection using traffic flows. This will help
systems administrators in their choice of IDS.
An Efficient Intrusion Detection System with Custom Features using FPA-Gradie... (IJCNCJournal)
An efficient intrusion detection system must be given high priority when connecting systems to a network, so that the system is protected before an attack happens. It is a big challenge for the network security group to protect the system from a variety of new attacks as technology grows in parallel. In this paper, an efficient intrusion detection model is proposed to predict attacks with high accuracy and a low false-negative rate by deriving custom features (UNSW-CF) from the benchmark intrusion dataset UNSW-NB15. To reduce learning complexity, custom features are derived and significant features are then constructed by applying the meta-heuristic FPA (Flower Pollination Algorithm) and MRMR (Minimum Redundancy Maximum Relevance), which reduces learning time and increases prediction accuracy. ENC (ElasticNet Classifier), KRRC (Kernel Ridge Regression Classifier) and IGBC (Improved Gradient Boosting Classifier) are employed to classify the attacks in the UNSW-CF and UNSW datasets. UNSW-CF with derived custom features, using IGBC integrated with FPA, provided a high accuracy of 97.38% and a low error rate of 2.16%. The sensitivity and specificity of IGBC are also high, at 97.32% and 97.50% respectively.
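Sensitivity and specificity, the two rates reported above, follow directly from the confusion matrix: sensitivity is TP / (TP + FN) and specificity is TN / (TN + FP). The sketch below computes both for a binary attack/normal labelling; the labels are hypothetical and unrelated to the UNSW-NB15 results.

```python
def sensitivity_specificity(y_true, y_pred, positive="attack"):
    """Return (sensitivity, specificity) for a binary attack/normal labelling."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # attacks caught
    fn = sum(t == positive and p != positive for t, p in pairs)  # attacks missed
    tn = sum(t != positive and p != positive for t, p in pairs)  # normal passed
    fp = sum(t != positive and p == positive for t, p in pairs)  # false alarms
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical ground truth and predictions (assumption).
y_true = ["attack"] * 4 + ["normal"] * 5
y_pred = ["attack", "attack", "attack", "normal",
          "normal", "normal", "normal", "normal", "attack"]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

A low false-negative rate, which the abstract emphasises, corresponds to high sensitivity, since the false-negative rate equals one minus the sensitivity.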
IOT SOLUTIONS FOR SMART PARKING- SIGFOX TECHNOLOGY (CSEIJJournal)
Sigfox technology has been a competitive product in the communication service provider market for
approximately a decade. Widely implemented for smart parking solutions across various European
countries, it has now gained traction in Germany as well. The technology's successful track record and
reputation in the market demonstrate its effectiveness and reliability in addressing the communication
needs of IoT applications, particularly in the context of vehicle parking systems. The study is
conducted for a city like Berlin, Germany. The major challenge is how to make
parking more user friendly, cost effective and less energy consuming. The
questions raised at the beginning of the paper are answered at the end through Sigfox
and its comparison with related technologies such as LoRaWAN and Weightless. Future
areas of research are also pointed out for topics that are not yet clearly identified in this
research area.
This paper covers the pros and cons of adaptive, emerging and existing technologies; cloud, big
data and data analytics are all discussed in tandem with Sigfox.
Reliability Improvement with PSP of Web-Based Software Applications (CSEIJJournal)
In diverse industrial and academic environments, the quality of the software has been evaluated using
different analytic studies. The contribution of the present work is focused on the development of a
methodology in order to improve the evaluation and analysis of the reliability of web-based software
applications. The Personal Software Process (PSP) was introduced in our methodology for improving the
quality of the process and the product. The Evaluation + Improvement (Ei) process is performed in our
methodology to evaluate and improve the quality of the software system. We tested our methodology in a
web-based software system and used statistical modeling theory for the analysis and evaluation of the
reliability. The behavior of the system under ideal conditions was evaluated and compared against the
operation of the system executing under real conditions. The results obtained demonstrated the
effectiveness and applicability of our methodology.
Similar to Attack Detection Availing Feature Discretion using Random Forest Classifier
An intrusion detection system for packet and flow based networks using deep n...IJECEIAES
Research on deep neural networks and big data is now converging in several respects to enhance the capabilities of intrusion detection systems (IDS). Many IDS models have been introduced to provide security over big data. This study focuses on intrusion detection in computer networks using big datasets. The advent of big data has spurred comprehensive support for cyber security by providing a range of powerful algorithms to classify and analyse patterns and to make better predictions more efficiently. In this study, a detection model based on deep neural networks is proposed to detect intrusions. We applied the proposed model to the latest dataset available online, which contains packet-based and flow-based data along with some additional metadata. The dataset is labeled and imbalanced, with 79 attributes and some classes having far fewer training samples than others. The proposed model is built using Keras and the Google TensorFlow deep learning environment. Experimental results show that intrusions are detected with an accuracy of over 99% for both binary and multiclass classification with the selected best features. The average scores for the receiver operating characteristic (ROC) and precision-recall curves are also 1. The outcome implies that deep neural networks offer a novel research model with high accuracy for intrusion detection, better than some models presented in the literature.
CLASSIFICATION PROCEDURES FOR INTRUSION DETECTION BASED ON KDD CUP 99 DATA SETIJNSA Journal
In a network security framework, intrusion detection is a benchmark component and a fundamental way to protect PCs from many threats. A major issue in intrusion detection is the large number of false alerts; this motivates experts to look for ways of minimizing false alerts using data mining, an analysis procedure applied to large datasets such as KDD CUP 99. This paper reviews various data mining classification procedures for handling false alerts in intrusion detection. Testing many data mining procedures on KDD CUP 99 shows that no individual procedure can reveal all attack classes with high accuracy and without false alerts. The best accuracy, achieved by the Multilayer Perceptron, is 92%; the best training time, achieved by the rule-based model, is 4 seconds. It is concluded that various procedures should be combined to handle the different kinds of network attacks.
Visualize network anomaly detection by using k means clustering algorithmIJCNCJournal
With the ever-increasing number of new attacks in today’s world, the amount of data will keep increasing,
and because of the base-rate fallacy the number of false alarms will also increase. Another problem with
detecting attacks is that they usually are not detected until after the attack has taken place; this makes
defending against attacks hard and can easily lead to disclosure of sensitive information.
In this paper we choose the K-means algorithm with the KDD Cup 1999 network data set to evaluate the
performance of an unsupervised learning method for anomaly detection. The results of the evaluation
showed that a high detection rate can be achieved while maintaining a low false alarm rate. This paper
presents the result of k-means clustering performed with the Cluster 3.0 tool and visualizes this result
using the TreeView visualization tool.
A Novel Classification via Clustering Method for Anomaly Based Network Intrus...IDES Editor
Intrusion detection in the internet is an active
area of research. Intruders can be classified into two
types, namely; external intruders who are unauthorized
users of the computers they attack, and internal
intruders, who have permission to access the system but
with some restrictions. The aim of this paper is to present
a methodology to recognize attacks during the normal
activities in a system. A novel classification via sequential
information bottleneck (sIB) clustering algorithm has
been proposed to build an efficient anomaly based
network intrusion detection model. We have compared
our proposed method with other clustering algorithms
like X-Means, Farthest First, Filtered clusters, DBSCAN,
K-Means, and EM (Expectation-Maximization)
clustering in order to find the suitability of our proposed
algorithm. A subset of KDDCup 1999 intrusion detection
benchmark dataset has been used for the experiment.
Results show that the proposed method is efficient in
terms of detection accuracy and low false positive rate in
comparison to the other existing methods.
COPYRIGHTThis thesis is copyright materials protected under the .docxvoversbyobersby
COPYRIGHT
This thesis is copyright material protected under the Berne Convention, the Copyright Act 1999 and other international and national enactments in that behalf on intellectual property. It may not be reproduced by any means, in full or in part, except for short extracts in fair dealing for research or private study, critical scholarly review or discourse with acknowledgment, with written permission of the Dean, School of Graduate Studies, on behalf of both the author and XXX XXX University.
ABSTRACT
With the fast-growing internet, the risk of intrusion has also increased; as a result, the Intrusion Detection System (IDS) has become a key research field. An IDS is used to identify any suspicious activity or patterns in the network or machine that attempt to breach its security features or compromise the machine. IDSs mostly use all the features of the data. However, not all features are of equal relevance for detecting attacks, and not every feature contributes significantly to system performance. The main aim of this work is to develop an efficient denial of service network intrusion classification model. The specific objectives were: to analyse existing literature on intrusion detection systems (the techniques used to model IDS, types of network attacks, performance of various machine learning tools, and how network intrusion detection systems are assessed); to find the top network traffic attributes that can be used to model denial of service intrusion detection; and to develop a machine learning model for detecting denial of service network intrusion.
Methods: The research design was experimental, and data was collected by simulation using the NSL-KDD dataset. By implementing the Correlation Feature Selection (CFS) mechanism with three search algorithms, a smallest set of features is selected, containing all the features that are selected very frequently.
Findings: The smallest subset of features chosen is the most nominal among all the feature subsets found. Further, the performance of Artificial Neural Network (ANN), decision tree, Support Vector Machine (SVM) and K-Nearest Neighbour (KNN) classifiers is compared for the 7 subsets found by the filter model and for the 41 attributes.
Results: The outcome indicates a remarkable improvement in the performance metrics used for comparison of the two classifiers. The results show that using 17/18 selected features improves DOS type classification accuracies as compared to using the 41 features in the NSL-KDD dataset. It was further observed that using an ensemble of three classifiers with decision fusion performs better than using a single classifier for DOS type classification. Among the machine learning tools tested, ANN achieved the best classification accuracies, followed by SVM and DT. KNN registered the lowest classification accuracies.
Application: The proposed work with such an improved detection rate and lesser classification time and lar.
EFFICIENT ATTACK DETECTION IN IOT DEVICES USING FEATURE ENGINEERING-LESS MACH...ijcsit
Through the generalization of deep learning, the research community has addressed critical challenges in
the network security domain, like malware identification and anomaly detection. However, they have yet to
discuss deploying them on Internet of Things (IoT) devices for day-to-day operations. IoT devices are often
limited in memory and processing power, rendering the compute-intensive deep learning environment
unusable. This research proposes a way to overcome this barrier by bypassing feature engineering in the
deep learning pipeline and using raw packet data as input. We introduce a feature-engineering-less
machine learning (ML) process to perform malware detection on IoT devices. Our proposed model,
“Feature engineering-less ML (FEL-ML),” is a lighter-weight detection algorithm that expends no extra
computations on “engineered” features. It effectively accelerates the low-powered IoT edge. It is trained
on unprocessed byte-streams of packets. Aside from providing better results, it is quicker than traditional
feature-based methods. FEL-ML facilitates resource-sensitive network traffic security with the added
benefit of eliminating the significant investment by subject matter experts in feature engineering.
New kinds of intrusions cause deviations in the normal behaviour of traffic flow in
computer networks every day. This study focused on enhancing the learning capabilities of IDS
to detect the anomalies present in a network traffic flow by comparing the k-means approach of
data mining for intrusion detection and the outlier detection approach. The k-means approach
uses clustering mechanisms to group the traffic flow data into normal and abnormal clusters.
Outlier detection calculates an outlier score (neighbourhood outlier factor (NOF)) for each flow
record, whose value decides whether a traffic flow is normal or abnormal. These two methods
were then compared in terms of various performance metrics and the amount of computer
resources consumed by them. Overall, k-means was more accurate and precise and had a better
classification rate than outlier detection in intrusion detection using traffic flows. This will help
systems administrators in their choice of IDS.
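The k-means step described in the abstract above (grouping flow records into a normal and an abnormal cluster) can be sketched as follows. This is a minimal illustration and not the study's implementation: the two flow features, the synthetic records, the deterministic farthest-point start, and the rule that the smaller cluster is the abnormal one are all assumptions made for the example. Features are left unscaled for brevity.

```python
import math

def mean_point(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    return tuple(sum(coord) / len(points) for coord in zip(*points))

def kmeans_two_clusters(points, iters=50):
    """Lloyd's k-means with k=2; returns (centroids, labels)."""
    # Deterministic start (an assumption of this sketch): the first record
    # and the record farthest from it.
    first = points[0]
    farthest = max(points, key=lambda p: math.dist(p, first))
    centroids = [first, farthest]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each flow record goes to its nearest centroid.
        labels = [min((0, 1), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: recompute each centroid; keep it if its cluster is empty.
        updated = []
        for j in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == j]
            updated.append(mean_point(members) if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels

def flag_abnormal(points):
    """Mark the smaller cluster as abnormal (hypothetical decision rule)."""
    _, labels = kmeans_two_clusters(points)
    minority = 0 if labels.count(0) < labels.count(1) else 1
    return [lab == minority for lab in labels]

# Synthetic flow records (hypothetical features): (packets per flow, bytes per packet).
normal = [(10.0 + i % 5, 480.0 + 10 * (i % 7)) for i in range(40)]
attack = [(200.0 + i, 60.0 + i) for i in range(5)]
flows = normal + attack
abnormal = flag_abnormal(flows)
```

Here `abnormal` is a list of booleans aligned with `flows`. On real traffic the minority-cluster rule would need validating against labeled data, since an attack such as a flood can make abnormal flows the majority.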
An Efficient Intrusion Detection System with Custom Features using FPA-Gradie...IJCNCJournal
An efficient Intrusion Detection System has to be given high priority when connecting systems to a network, so that the system is protected before an attack happens. It is a big challenge for the network security group to protect the system from a variety of new attacks as technology grows in parallel. In this paper, an efficient intrusion detection model is proposed to predict attacks with high accuracy and a low false-negative rate by deriving custom features, UNSW-CF, from the benchmark intrusion dataset UNSW-NB15. To reduce the learning complexity, custom features are derived and then significant features are constructed by applying the meta-heuristic FPA (Flower Pollination Algorithm) and MRMR (Minimal Redundancy and Maximum Relevance), which reduces learning time and also increases prediction accuracy. ENC (ElasticNet Classifier), KRRC (Kernel Ridge Regression Classifier) and IGBC (Improved Gradient Boosting Classifier) are employed to classify the attacks in the UNSW-CF and UNSW datasets; UNSW-CF with derived custom features using IGBC integrated with FPA provided a high accuracy of 97.38% and a low error rate of 2.16%. Also, the sensitivity and specificity of IGBC reach 97.32% and 97.50% respectively.
DATA MINING FOR STUDENTS’ EMPLOYABILITY PREDICTIONCSEIJJournal
This study has been undertaken to predict student employability. Assessing student employability
provides a method of integrating student abilities and employer business requirements, which is becoming
an increasingly important concern for academic institutions. Improving student evaluation techniques for
employability can help students to have a better understanding of business organizations and find the right
one for them. The data for training the classification models is gathered through a survey in which students
are asked to fill out a questionnaire in which they may indicate their abilities and academic achievement.
This information may be used to determine their competency in a variety of skill categories, including soft
skills, problem-solving skills, technical abilities and so on. The goal of this research is to use data
mining to predict student employability by considering different factors such as the skills that students have
gained at diploma level and the time elapsed with respect to the knowledge they have acquired
when they expect placement at the end of graduation. The skills most relevant to each job category
were also identified during this research. For the prediction of student
employability, different data mining models such as KNN, Naive Bayes, and Decision Tree were
evaluated, and the best model for predicting the employability of this institute's students was identified.
Classification and association techniques were thus used and evaluated in this research.
Call for Articles - Computer Science & Engineering: An International Journal ...CSEIJJournal
Computer Science & Engineering: An International Journal (CSEIJ) is a bi-monthly open access peer-reviewed journal that publishes articles which contribute new results in all areas of the Computer Science & Computer Engineering. The journal is devoted to the publication of high quality papers on theoretical and practical aspects of computer science and computer Engineering.
A Complexity Based Regression Test Selection StrategyCSEIJJournal
Software is unequivocally the foremost and indispensable entity in this technologically driven world.
Therefore quality assurance, and in particular, software testing is a crucial step in the software
development cycle. This paper presents an effective test selection strategy that uses a Spectrum of
Complexity Metrics (SCM). Our aim in this paper is to increase the efficiency of the testing process by
significantly reducing the number of test cases without having a significant drop in test effectiveness. The
strategy makes use of a comprehensive taxonomy of complexity metrics based on the product level (class,
method, statement) and its characteristics. We use a series of experiments based on three applications with
a significant number of mutants to demonstrate the effectiveness of our selection strategy. For further
evaluation, we compare our approach to boundary value analysis. The results show the capability of our
approach to detect mutants as well as the seeded errors.
XML Encryption and Signature for Securing Web ServicesCSEIJJournal
In this research, we have focused on the most challenging issue that Web Services face, i.e. how to secure
their information. Web Services security can be guaranteed by employing security standards, which is the
main focus of this research. Every proposed model related to security design should take into account the
security objectives: integrity, confidentiality, non-repudiation, authentication, and authorization. The
proposed model describes SOAP messages and the way to secure their contents. Because the
SOAP message is the core of information exchange in Web Services, this research has developed a
security model needed to ensure e-business security. The essence of our model depends on XML encryption
and XML signature to encrypt and sign SOAP messages. The proposed model aims to achieve a
high transaction speed and a strong level of security without jeopardizing the performance of
information transmission.
Performance Comparison of PCA,DWT-PCA And LWT-PCA for Face Image RetrievalCSEIJJournal
This paper compares the performance of face image retrieval system based on discrete wavelet transforms
and Lifting wavelet transforms with principal component analysis (PCA). These techniques are
implemented and their performances are investigated using frontal facial images from the ORL database.
Although the Discrete Wavelet Transform is effective in representing image features and is suitable for face image
retrieval, it still encounters problems, especially in implementation, e.g. floating point operations and
decomposition speed. We use the advantages of the lifting scheme, a spatial approach for constructing wavelet
filters, which provides a feasible alternative for the problems facing its classical counterpart. The lifting scheme has
such intriguing properties as convenient construction, simple structure, integer-to-integer transform, low
computational complexity and flexible adaptivity, revealing its potential in face image retrieval.
Compared to PCA and DWT with PCA, the lifting wavelet transform with PCA requires less computation, while
DWT-PCA gives a high retrieval rate.
Data security and privacy are important to prevent the disclosure,
modification and unauthorized usage of sensitive information. The
use of power-constrained devices for the Internet of Things (IoT),
e-commerce, e-payment, and wireless sensor networks (WSNs) has brought
a new security challenge due to the low computation capability of sensors.
Therefore, lightweight authenticated key agreement protocols
are important to protect their security and privacy. Several studies
have been published on authenticated key agreement. However, there
is a need for lightweight schemes that fit devices with critically limited capability.
In addition, a malicious key generation center (KGC) can
become a threat that watches other users, i.e. impersonates a user, causing
the key escrow problem.
Recommendation System for Information Services Adapted, Over Terrestrial Digi...CSEIJJournal
The development of digital television in Colombia has grown in recent years, especially digital terrestrial
television (DTT), which is an essential part of the projects of the national Ministry of ICT, thanks to the wide
distribution and use of the television network and Internet in the country. This article explains how to join
different technologies such as social networks, information adaptation and DTT to obtain an application that
offers information services to users, based on their data, preferences, inclinations, use and interaction with
other users and groups inside the network.
Reconfiguration Strategies for Online Hardware Multitasking in Embedded SystemsCSEIJJournal
An intensive use of reconfigurable hardware is expected in future embedded systems. This means that the
system has to decide which tasks are more suitable for hardware execution. In order to make an efficient
use of the FPGA it is convenient to choose one that allows hardware multitasking, which is implemented by
using partial dynamic reconfiguration. One of the challenges for hardware multitasking in embedded
systems is the online management of the single reconfiguration port of present FPGA devices. This paper
presents different online reconfiguration scheduling strategies which assign the reconfiguration interface
resource using different criteria: workload distribution or task deadline. The online scheduling strategies
presented take efficient and fast decisions based on the information available at each moment. Experiments
have been made in order to analyze the performance and convenience of these reconfiguration strategies.
Performance Comparison and Analysis of Mobile Ad Hoc Routing ProtocolsCSEIJJournal
A mobile ad hoc network (MANET) is a wireless network that uses multi-hop peer-to-peer routing instead
of static network infrastructure to provide network connectivity. MANETs have applications in rapidly
deployed and dynamic military and civilian systems. The network topology in a MANET usually changes
with time. Therefore, there are new challenges for routing protocols in MANETs since traditional routing
protocols may not be suitable for MANETs. Researchers are designing new MANET routing protocols
and comparing and improving existing MANET routing protocols before any routing protocols are
standardized using simulations. However, the simulation results from different research groups are not
consistent with each other. This is because of a lack of consistency in MANET routing protocol models
and application environments, including networking and user traffic profiles. Therefore, the simulation
scenarios are not equitable for all protocols and conclusions cannot be generalized. Furthermore, it is
difficult for one to choose a proper routing protocol for a given MANET application. According to the
aforementioned issues, this paper focuses on MANET routing protocols. Specifically, my contribution
includes the characterization of different routing protocols and the comparison and analysis of their
performance.
Adaptive Stabilization and Synchronization of Hyperchaotic QI SystemCSEIJJournal
The hyperchaotic Qi system (Chen, Yang, Qi and Yuan, 2007) is one of the important models of four-
dimensional hyperchaotic systems. This paper investigates the adaptive stabilization and synchronization
of hyperchaotic Qi system with unknown parameters. First, adaptive control laws are designed to
stabilize the hyperchaotic Qi system to its equilibrium point at the origin based on the adaptive control
theory and Lyapunov stability theory. Then adaptive control laws are derived to achieve global chaos
synchronization of identical hyperchaotic Qi systems with unknown parameters. Numerical simulations
are shown to demonstrate the effectiveness of the proposed adaptive stabilization and synchronization
schemes.
An Energy Efficient Data Secrecy Scheme For Wireless Body Sensor NetworksCSEIJJournal
Data secrecy is one of the key concerns for wireless body sensor networks (WBSNs). Usually, a data
secrecy scheme should accomplish two tasks: key establishment and encryption. WBSNs generally face
more serious limitations than general wireless networks in terms of energy supply. To address this, in this
paper, we propose an energy efficient data secrecy scheme for WBSNs. On one hand, the proposed key
establishment protocol integrates node IDs, seed value and nonce seamlessly for security, then
establishes a session key between two nodes based on one-way hash algorithm SHA-1. On the other hand,
a low-complexity threshold selective encryption technology is proposed. Also, we design a security
selection pattern exchange method with low complexity for the threshold selective encryption. Then, we
evaluate the energy consumption of the proposed scheme. Our scheme shows a great advantage over
the other existing schemes in terms of low energy consumption.
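The key establishment idea in the abstract above (combining node IDs, a seed value and a nonce, then hashing with SHA-1 to obtain a session key) can be illustrated with a short sketch. The abstract does not give the message layout, so the field ordering, the length-prefixed encoding and the function name `derive_session_key` are hypothetical choices made for this example, not the paper's protocol.

```python
import hashlib

def _encode(field: bytes) -> bytes:
    """Length-prefix a field so concatenated inputs cannot be confused."""
    return len(field).to_bytes(2, "big") + field

def derive_session_key(node_a: bytes, node_b: bytes,
                       seed: bytes, nonce: bytes) -> bytes:
    """Hash both node IDs, the shared seed and the nonce into a 20-byte key."""
    # Sort the IDs so both nodes hash the same byte string no matter
    # which side starts the exchange (an assumption of this sketch).
    first, second = sorted((node_a, node_b))
    material = b"".join(_encode(f) for f in (first, second, seed, nonce))
    return hashlib.sha1(material).digest()

# Hypothetical values; a fixed nonce is used here only for repeatability.
seed = b"shared-seed"
nonce = bytes.fromhex("00112233445566778899aabbccddeeff")
key_a = derive_session_key(b"node-01", b"node-02", seed, nonce)
key_b = derive_session_key(b"node-02", b"node-01", seed, nonce)
```

Both nodes arrive at the same key from the same inputs, and a different nonce yields a different key, so one hash computation per session is the only cost on the sensor side.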
To improve the QoS in MANETs through analysis between reactive and proactive ...CSEIJJournal
A Mobile Ad hoc NETwork (MANET), is a self-configuring infra structure less network of mobile devices
connected by wireless links. ad hoc is Latin and means "for this purpose". Each device in a MANET is free
to move independently in any direction, and will therefore change its links to other devices frequently. Each
must forward traffic unrelated to its own use, and therefore be a router. The primary challenge in building
a MANET is equipping each device to continuously maintain the information required to properly route
traffic. QOS is defined as a set of service requirements to be met by the network while transporting a
packet stream from source to destination. Intrinsic to the notion of QOS is an agreement or a guarantee by
the network to provide a set of measurable pre-specified service attributes to the user in terms of delay,
jitter, available bandwidth, packet loss, and so on. The analysis is mainly between proactive or table-driven
protocols, namely OLSR (Optimized Link State Routing), DSDV (Destination Sequenced Distance Vector) and
CGSR (Cluster Head Gateway Switch Routing), and reactive or source-initiated routing protocols, namely
AODV (Ad hoc On-Demand Distance Vector) and DSR (Dynamic Source Routing). The QoS analysis of
these protocols is simulated on NS2 and the results are shown.
This paper introduces Topic Tracking for Punjabi language. Text mining is a field that automatically
extracts previously unknown and useful information from unstructured textual data. It has strong
connections with natural language processing. NLP has produced technologies that teach computers
natural language so that they may analyze, understand and even generate text. Topic tracking is one of the
technologies that has been developed and can be used in the text mining process. The main purpose of topic
tracking is to identify and follow events presented in multiple news sources, including newswires, radio and
TV broadcasts. It collects dispersed information together and makes it easy for user to get a general
understanding. Not much work has been done in Topic tracking for Indian Languages in general and
Punjabi in particular. First we survey various approaches available for Topic Tracking, then represent our
approach for Punjabi. The experimental results are shown.
Attack Detection Availing Feature Discretion using Random Forest Classifier
1. Computer Science & Engineering: An International Journal (CSEIJ), Vol 12, No 6, December 2022
DOI: 10.5121/cseij.2022.12611
ATTACK DETECTION AVAILING FEATURE
DISCRETION USING RANDOM FOREST CLASSIFIER
Anne Dickson¹ and Ciza Thomas²
¹Assistant Professor, Department of Computer Science & Engineering, Mar Baselios
College of Engineering & Technology, Trivandrum, Kerala
²Senior Joint Director, Directorate of Technical Education, Trivandrum, Kerala
ABSTRACT
The widespread use of the Internet has the adverse effect of exposing users to cyber attacks. Defensive
mechanisms such as firewalls and IDSs have evolved through extensive research in these
areas. Machine learning techniques have been used successfully in these defense mechanisms, especially
IDSs. Although they are effective to some extent in identifying new patterns and variants of existing
malicious patterns, many attacks still go undetected. The objective is to develop an algorithm for
detecting malicious domains based on passive traffic measurements. In this paper, an anomaly-based
intrusion detection system built on an ensemble machine learning classifier, Random Forest
with gradient boosting, is deployed. The NSL-KDD dataset is used for analysis and, out of 41 features, 32
features were identified as significant using feature discretion. Our observations confirm the conjecture
that both feature selection and stochastic genetic operators improve accuracy and
effectiveness. The training time is shown to be reduced by 98.59% and the accuracy improved to
98.75%.
KEYWORDS
Statistical Traffic Properties, Traffic Classification, Segmentation, Deep Packet Inspection, Intrusion
Detection System
1. INTRODUCTION
In the digitized world, the Internet has become an integral part of our lives. Most
transactions now take place online, and important documents are transferred through online
services, e-mail, etc. As we depend heavily on these Internet services, the crime related to them
is also increasing exponentially. This is where an intrusion detection system (IDS), which forms
the second layer of defense, becomes important. It is our duty to keep our data and credentials
secure. Social engineering is the ultimate data source of real-time cyber threats. The enormous
growth in Internet applications leads to growing challenges in cyber security. The exponential
growth of network threats has adversely affected confidentiality, integrity and availability,
which are the basic principles of information security. Firewalls and IDSs, which are considered
the wall of defence, have failed to detect modern attack scenarios. Deep network packet inspection
and network behaviour analysis are not done appropriately by current IDSs. Hence, analyzing and
monitoring network systems to detect anomalies and network threats should be performed with
integrated IDS approaches such as machine learning, deep learning and other hybrid methods.
Intrusion detection is the process of identifying any abnormal incidents such as unauthorized
access of a system or attack on a system. These systems can be implemented either in software or
hardware. A firewall is used to stop unwanted traffic from outside; it does not indicate
attacks inside the system [1, 2]. IDSs can be broadly classified as anomaly-based and misuse-based.
Misuse-based detection uses known signatures to identify attacks: it tries to match the
signature of an attack with the signatures in a database. However, this type of detection fails to
identify a new attack. Anomaly-based detection identifies intrusions by observing deviations from
normal behavior. Anything that varies even slightly from normal is considered an anomaly, so the
false alarm rate is higher. However, this type of detection is suitable for detecting
zero-day attacks [2] [3].
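The contrast between the two detection styles can be illustrated with a minimal sketch. It is not the system described in this paper: the signature strings, the request-size baseline and the threshold `k` are hypothetical values chosen only for illustration.

```python
from statistics import mean, stdev

# Hypothetical signature database: substrings of known attacks (illustrative only).
KNOWN_SIGNATURES = {"/etc/passwd", "' OR 1=1"}


def misuse_detect(payload):
    """Misuse-based: flag the payload only if it contains a known signature."""
    return any(sig in payload for sig in KNOWN_SIGNATURES)


def anomaly_detect(value, baseline, k=3.0):
    """Anomaly-based: flag a value more than k standard deviations from the baseline mean."""
    mu, sigma = mean(baseline), stdev(baseline)
    return abs(value - mu) > k * sigma


# Illustrative baseline of request sizes observed during normal operation.
normal_request_sizes = [510.0, 495.0, 502.0, 498.0, 505.0]

known_attack_flagged = misuse_detect("GET /etc/passwd")
novel_attack_flagged = misuse_detect("GET /never-seen-before-exploit")
oversized_flagged = anomaly_detect(5000.0, normal_request_sizes)
typical_flagged = anomaly_detect(500.0, normal_request_sizes)
```

The signature matcher misses the unseen request, while the deviation test flags anything far from the baseline. Lowering `k` catches more unusual traffic at the cost of more false alarms.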
Anomaly detection can be done using various machine learning algorithms. Machine learning is a
part of artificial intelligence that learns and improves with experience. The main advantage of
a machine learning algorithm is its speed of detection. It uses training data to form a model that
can then be used to predict the labels of test data. Though there are several intrusion detection
systems, some attacks are still not properly detected. The majority of these belong to the minority
attack categories, remote to local (R2L) and user to root (U2R) [4][5]. Previous studies mention
that feature selection improves the speed of computation [6][7]. Feature selection determines the
useful features from the whole feature set. There are mainly two types of feature selection,
namely the wrapper method and the filter method. The wrapper method depends on the classifier,
whereas the filter method uses some suitable criterion. When it comes to network troubleshooting
for threat detection, we need network visibility. As intrusions have increased with the expansion
of technology, exploring the structure of flow data traffic has become an indispensable procedure.
It is mandatory to identify the source and destination of packet flows and their configurations.
The packet format should be identified precisely, regardless of whether on-demand or full packet
dissection is used. As the world blooms with technology exploration using computers and
automation, the core challenge of the current decade is modelling secure network applications.
IDSs also face challenges in the areas of network topology, hardware involvement and other
functionality. Performance and availability differ between highly accessible and resource-limited
environments and across different protocols.
In this paper, we propose an intrusion detection system based on random forest. Feature selection
is done using the genetic algorithm, and a weight is calculated for each feature. The rest of the
paper is organized as follows. Related works are described in Section 2. Sections 3 and 4 give a
brief description of the theoretical background, the proposed work and the workflow. The
experimental setup and a brief discussion of the NSL-KDD dataset and its attacks are given in
Sections 5 and 6, and the empirical results in Section 7. Finally, the
paper is concluded with future directions and discussions.
2. RELATED WORKS
This section reviews related works on detecting and analysing network intrusions using
variants of machine learning approaches. IDS is the most researched area among the
research community working in the field of network security. Anomaly-based detection draws
more attention than signature-based detection due to its effectiveness in disclosing novel attacks
[3]. Although a lot of work has been developed, the unavoidable fact is the failure to detect serious
network attacks. This problem arises for minority attacks such as R2L and U2R,
because the number of samples of these attacks in the dataset is very small compared to the
number of samples in the categories of denial of service (DoS), probe and normal. If these
aspects are considered, there is a chance of obtaining an improved detection rate.
In 2003, Mukkamala et al. [11] described a method for calculating the importance of the input
attributes. They studied two classifiers, the artificial neural network and the support vector
machine. One method deletes one attribute at a time and compares the performance of the system
with that obtained using all attributes. They ranked each feature as important or secondary based on the accuracy,
training time, and testing time. This is a general method irrespective of the modeling tool.
Further, they described a ranking method specifically for SVM. They conclude that SVM
performs better than the artificial neural network in terms of training time, running time, and
prediction accuracy. An SVM-based intrusion detection system was discussed by Pervez et
al. [12]. In this work, one feature is deleted at a time and the accuracy is calculated. If the accuracy
obtained is greater than the accuracy obtained using all features, that feature is eliminated from
the dataset.
In 2012, Yinhui Li et al. [13] described four methods of feature reduction: feature removal,
sole feature, hybrid method and gradual feature removal. They used the KDD CUP99
dataset, constructed a compact dataset by clustering, and selected a small training dataset by
ant colony optimization (ACO). An RBF kernel-based SVM is used as the classifier. The
intrusion detection system developed by Nelcileno Araujo et al. [14] in 2010 used a hybrid
approach. First, the information gain of the 41 features of the KDD CUP99 data set is calculated;
the features are then ranked according to this value and the detection rate of the optimal features
is assessed. The K-means classifier was used to extract the feature with the highest information gain
ratio (IGR). Rough set based feature reduction was done by Rung-Ching Chen et al. [15] in 2009.
In 2006, Wei Wang et al. [16] proposed a method for identifying intrusions using principal
component analysis (PCA). They also profile the behaviour of each individual attack. Fangjun
Kuang et al. [17] propose an intrusion detection system (IDS) in which feature reduction is
done using kernel principal component analysis (KPCA), an improved version of PCA which
adopts a non-linear kernel method. Cheng-Lung Hung et al. [18] describe the genetic algorithm
for feature selection and parameter optimization for SVM. Aswani Kumar et al. [19] in 2017
describe an intrusion detection system in which feature selection is performed using the chi-square
method. Adriana et al. [20] use the information gain method for feature selection. In this work, particle
swarm optimization (PSO) and ant colony optimization (ACO) are used for parameter
optimization. This method showed a considerable reduction in computational time
and also an improvement in detection. Mostafa A. Salama et al. [21] describe an IDS using a
support vector machine (SVM), in which feature reduction was done using a deep belief
network (DBN).
3. THEORETICAL BACKGROUND
Feature selection helps when working on a problem with an n-dimensional feature space. Identifying
a subset of the input variables by discarding the irrelevant ones is called feature selection.
There are two types of feature selection, supervised and unsupervised.
Supervised methods are further categorized into wrapper, filter and intrinsic methods. Finding the
relevant features helps to improve both accuracy and computation time. Feature extraction is a more
complicated task than feature selection. It is the procedure of creating new features when the
raw features cannot be used directly. This process includes some arithmetic operations on the
features, because only adequate feature extraction can yield better classification. From the raw
input data, we consider a sample of rows and columns; that is, the data is subjected to row sampling and
feature sampling.
A random forest, whether used as a classifier or a regressor, is a bagging technique. The base
learner is the decision tree. A decision tree has two properties: low bias and high variance. When a
decision tree is grown to full depth, it fits the training data set closely, so the training error
is very small. On new test data, however, such a tree is prone to larger errors; hence
overfitting occurs. Random forest therefore uses multiple decision trees. When we finally combine
the decision trees by majority vote, the high variance is converted into low variance, because
with row sampling and feature sampling each decision tree becomes an expert with respect to the
input samples it receives. Hence random forest works very well for most machine learning use
cases [23]. A classifier uses the majority vote, whereas a regressor takes the mean or median of
the outputs of the decision trees. The number of decision trees to use for a given problem domain
is set through a hyperparameter.
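The sampling and aggregation steps described above can be sketched as follows. This is a simplified illustration, not a full random forest: no trees are trained, the tree outputs are made-up values, and features are sampled once per tree as the text describes (library implementations commonly resample features at every split). The helper names are hypothetical.

```python
import random
from collections import Counter
from statistics import mean


def bootstrap_sample(rows, n_features, k_features, rng):
    """Row sampling with replacement plus feature sampling without replacement."""
    sampled_rows = [rng.choice(rows) for _ in rows]
    feature_idx = sorted(rng.sample(range(n_features), k_features))
    return sampled_rows, feature_idx


def forest_classify(tree_votes):
    """Classification: return the label predicted by most trees."""
    return Counter(tree_votes).most_common(1)[0][0]


def forest_regress(tree_outputs):
    """Regression: return the mean of the tree outputs."""
    return mean(tree_outputs)


rng = random.Random(0)
rows = list(range(10))  # stand-ins for ten training records
sampled_rows, feature_idx = bootstrap_sample(rows, n_features=41, k_features=6, rng=rng)

# Made-up outputs of five trees for a single test record.
label = forest_classify(["attack", "normal", "attack", "attack", "normal"])
estimate = forest_regress([0.8, 0.6, 1.0])
```

Because each tree sees a different bootstrap sample, individual trees err in different places, and aggregating their outputs is what reduces the variance.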
4. FEATURE DISCRETION
As a pre-processing step, feature discretion is done using the random forest classifier to identify
the features that contribute most towards the label, so that the least contributing features can be
eliminated, thereby improving computational efficiency. The filter method and the wrapper
method are the two main types of feature selection [28]. In the filter method, the importance of
features is measured by their relationship with the dependent (outcome) variable, and features are
chosen based on their results in various statistical tests. To rank all of the features in the data set,
the filter approach employs an attribute evaluator and a ranker. Wrapper approaches use greedy
search algorithms to examine possible feature combinations and select the one that delivers
the best result for a particular machine learning algorithm [29]. Sequential search algorithms and
heuristic algorithms such as the genetic algorithm come under this category. If there are p features,
then 2^p combinations of features are possible.
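The growth of the search space can be seen in a small sketch that enumerates every candidate subset for a handful of features. The four feature names are used only as an illustration, and the helper name is hypothetical.

```python
from itertools import combinations


def all_feature_subsets(features):
    """Yield every non-empty subset of the given feature names."""
    for size in range(1, len(features) + 1):
        for subset in combinations(features, size):
            yield subset


features = ["duration", "src_bytes", "dst_bytes", "land"]
subsets = list(all_feature_subsets(features))

# p = 4 gives 2**4 = 16 subsets including the empty one, so 15 non-empty candidates.
candidate_count = len(subsets)

# For the 41 features of NSL-KDD the count is far beyond exhaustive evaluation.
nsl_kdd_subset_count = 2 ** 41
```

With 41 features the count is 2,199,023,255,552, which is why a wrapper method relies on a guided search such as a genetic algorithm instead of trying every subset.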
4.1. Feature Discretion using Passive Traffic Measurements
Initially features are selected using feature subset selection process. The practise of detecting and
deleting as much useless and redundant information as feasible is known as feature subset
selection. This decreases the data dimensionality, allowing learning algorithms to operate more
quickly and effectively.
The genetic algorithm (GA), developed by Holland in 1965, is a sophisticated stochastic search
algorithm based on natural genetics and selection mechanisms. Evolution is an optimizing
process, and the GA is an evolutionary optimization technique based on Darwin's concept of
"survival of the fittest". It is bio-inspired, simulating evolution through genetics and natural
selection, and belongs to the family of evolutionary algorithms. A GA iterates through fitness
assessment, selection, recombination, and population reassembly. Initially, a set of random
solutions called the population is created. Each individual in the population is referred to as a
chromosome, and each chromosome represents a solution to the problem at hand. Generations are the
iterations in which the chromosomes evolve. During each generation, the chromosomes are evaluated
using some measure of fitness and evolve according to the principles of variation, selection and
inheritance. Crossover and mutation are used to generate offspring. In crossover, the genetic
information of two parents is mixed to produce offspring. To keep the population size constant, a
new generation is formed by selecting some of the parents and offspring based on their fitness
values and rejecting the others. Fitter chromosomes have a better chance of being chosen. After
numerous generations, the algorithm eventually converges on the best chromosome, which should
reflect an optimal or near-optimal solution to the problem. The classic crossover methods are
one-point, two-point and uniform crossover. The mutation operator maintains diversity in the
population; one simple method is bit-flip mutation. Selection can be done using roulette selection,
tournament selection, etc. For wide classes of problems, GA works reasonably well.
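The three operators named above can be sketched on bit-string chromosomes, where bit i set to 1 would mean "feature i is selected". The function names, the tournament size `k` and the mutation `rate` are illustrative choices, not values taken from the paper.

```python
import random


def tournament_select(population, fitnesses, k, rng):
    """Pick k random contestants and return the fittest of them."""
    contestants = rng.sample(range(len(population)), k)
    winner = max(contestants, key=lambda i: fitnesses[i])
    return population[winner]


def one_point_crossover(parent_a, parent_b, rng):
    """Swap the tails of two equal-length chromosomes at a random cut point."""
    cut = rng.randint(1, len(parent_a) - 1)
    return parent_a[:cut] + parent_b[cut:], parent_b[:cut] + parent_a[cut:]


def bit_flip_mutation(chromosome, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in chromosome]


rng = random.Random(0)
parent_a = [1, 1, 1, 1, 1, 1]
parent_b = [0, 0, 0, 0, 0, 0]
child_a, child_b = one_point_crossover(parent_a, parent_b, rng)
mutant = bit_flip_mutation(child_a, rate=0.1, rng=rng)
```

A larger tournament size increases selection pressure, while a higher mutation rate preserves more diversity at the cost of disrupting good chromosomes.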
The initial population is created randomly. Selection is done using tournament selection. Crossover
and mutation are the subsequent steps. The termination condition used here is to stop the iteration
after a fixed number of generations. Since the iteration terminates after a fixed number of
generations, we can expect a nearly optimal solution, or at least a better solution than in the
previous generation. The genetic algorithm creates the initial population randomly and provides the
best reduced feature subset in the last generation. The importance of each feature is checked by
deactivating one feature at a time from the reduced feature set and observing the accuracy.
The accuracy change is calculated as (au - aui), where 'au' denotes the accuracy with all features
and 'aui' is the accuracy after deleting feature i. The minimum and maximum values of the accuracy
change are noted. The weight of each feature is calculated using an expression in the following terms:
- Accuracy change of each feature
- Minimum accuracy change
- Maximum accuracy change
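The weighting expression itself is not reproduced in this copy of the text. One plausible reading of the three listed terms is a min-max normalisation, that is, the accuracy change minus the minimum change, divided by the maximum change minus the minimum change. The sketch below follows that assumption only; the function names and the accuracy values are made up for illustration.

```python
def accuracy_changes(acc_all, acc_without):
    """Accuracy change (au - aui) for each feature i."""
    return {name: acc_all - acc for name, acc in acc_without.items()}


def min_max_weights(changes):
    """Scale accuracy changes to [0, 1]. ASSUMED form, not a confirmed formula from the paper."""
    lo, hi = min(changes.values()), max(changes.values())
    if hi == lo:
        # All features change the accuracy equally, so no feature stands out.
        return {name: 0.0 for name in changes}
    return {name: (change - lo) / (hi - lo) for name, change in changes.items()}


# Made-up accuracies, for illustration only.
acc_all = 0.90
acc_without = {"land": 0.89, "dst_bytes": 0.80, "srv_count": 0.86}

changes = accuracy_changes(acc_all, acc_without)
weights = min_max_weights(changes)
```

Under this reading, the feature whose removal hurts accuracy the most receives weight 1 and the one whose removal hurts the least receives weight 0.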
Cross validation is used to analyze the detection rate of individual attacks. If the detection rate
is low, that particular attack is studied further. If any important feature is not included,
it is added to the earlier reduced feature set and the above steps are repeated until a
satisfactory detection rate is achieved.
4.2. Proposed Methodology
The steps of the suggested method are given in Figure 1. Initialize the first random population
with the values obtained from the random forest classifier. Calculate the fitness value of each
individual in the population. Determine the best individuals by sorting the set obtained above.
Perform crossover and mutation to discard the worst individuals and preserve the good ones. Repeat
the process until the optimal value is reached, and identify the population that gave the best
value. The process is iterative: we start with one population and keep creating new populations
until the feature set that identifies network attacks is reached.
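The loop described in these steps can be sketched end to end with a toy fitness function. Several details are assumptions made for the sake of a runnable example: the population is initialised at random where the text seeds it from the random forest output, the better half is kept by sorting, uniform crossover is used, and `toy_fitness` stands in for the classifier accuracy that a real run would measure.

```python
import random


def evolve(fitness, n_genes, pop_size=20, generations=30, mutation_rate=0.05, seed=0):
    """Evolve bit-mask chromosomes for a fixed number of generations and return the fittest."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(generations):
        # Sort by fitness and preserve the better half as parents.
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Uniform crossover: each gene comes from either parent.
            child = [rng.choice(pair) for pair in zip(a, b)]
            # Bit-flip mutation keeps some diversity in the population.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = parents + children
    return max(population, key=fitness)


# Hypothetical fitness: reward selecting features 0-2 and penalise large subsets.
USEFUL = {0, 1, 2}


def toy_fitness(mask):
    hits = sum(1 for i, g in enumerate(mask) if g and i in USEFUL)
    return hits - 0.1 * sum(mask)


best = evolve(toy_fitness, n_genes=10)
```

Because the parents are carried over unchanged, the best fitness in the population never decreases from one generation to the next. Stopping after a fixed number of generations returns the best subset found so far, which need not be the global optimum.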
The work flow of proposed method consists of three parts, feature discretion, classification and
detection[30]. Feature discretion is done using an evolutionary algorithm called genetic algorithm
which works on the principle of natural selection.
Figure 1: Proposed Method
4.3. Classification of Intrusions
Random forest, an ensemble-based technique, is used here for classifying the labelled dataset. In
random forest the base learner is the decision tree. Low bias and high variance are the two main
properties of a decision tree. Here the trees are grown to full depth. By growing a decision tree
in its entirety, we can ensure that it is fitted closely to our training data set. As a result, the
training error will be minimal. In the case of test data, the number of instances is smaller than
the number of training samples [30]. By incorporating row sampling and feature sampling, multiple
decision tree models are combined in parallel so that the design leads to high accuracy and low
variance. For a scalar output, the random forest averages the outputs of all the decision trees:
a binary classifier uses the majority vote, whereas a
regression problem uses the mean or median of the outputs of all decision trees.
Various approaches are being utilized for intrusion detection, but unfortunately none of the
existing systems is completely flawless. Data communication through the Internet, as well as
any other network, is always vulnerable to hacking and misuse. As a result, the IDS has become an
essential component of computer and network security. It works from inside the network to catch
attacks and breaches that make it through the firewall, whereas the firewall filters traffic on the
network's periphery. It detects attackers and network anomalies and sends alerts by text or
email to the management station.
5. EXPERIMENTAL SETUP
Various machine learning techniques are utilized to train and build multiple classification models
that can distinguish attack traffic from normal network traffic. The test
accuracies of the classification models are compared to identify the best model for
network intrusion detection. Initially, the performance evaluators and models
required for analysis are imported, followed by the path of the input data set. The dataset used is
the twenty percent subset of the labelled NSL-KDD dataset. All the required Python libraries, such
as pandas, NumPy, SciPy, sklearn, seaborn and matplotlib, are imported along with the necessary
models. Sklearn is an open source machine learning library widely used in Python, with various tools for
building statistical and machine learning models, including classification, clustering and
dimensionality reduction. The dataset is then partitioned into a train set and a test set for further
processing, and the number of records and features is analyzed. Using the genetic algorithm,
features with low importance are eliminated and the effect on the accuracy of the model is checked.
Based on feature selection, the four levels are introduced. As all the features have some
contribution to the model, we keep all the features. The required model is fitted using the train
data. After building the model with the training data, we test its accuracy on the test data and
determine the appropriate model for this dataset. Finally, the response for the test data is
predicted and the detection accuracy is computed. The NSL-KDD data set consists of 25191 rows of
train data and 11850 rows of test data, with 42 features. The train data set has twenty-two labels,
with thirty-eight numerical features and three categorical features. It has no missing values or duplicates.
The training accuracy of Random Forest is 0.99875, with a test accuracy of 0.8210.
6. ATTACK DESCRIPTION
The main goals of an attacker are to probe the network or system for information that helps in an
attack, and thereby to gain read or write access to the system or to disable the services of the
server. Various tools are available for the defenders of the network to study the
loopholes in a real network environment that enable cyber attacks. The
research community has always come up with sophisticated defensive mechanisms that are
intelligent and adaptive to the ever-evolving cyber world. Benchmark datasets are available for
researchers to evaluate these algorithms. One of the popular datasets for evaluation is the
NSL-KDD dataset. There are twenty-two attack types present in the train set of the NSL-KDD dataset,
and the test set includes thirty-nine attack types, as given in Table 1. Each attack belongs to one
of four classes, namely Probe, DoS, U2R or R2L.
The most serious attack category is Denial of Service (DoS), which includes unintended, distributed
and application-layer DoS. The most well known among these attacks is the DNS flood. In order to
slow or crash the service, an attacker simply floods it with requests from a spoofed IP address. The
purpose is to deny new authorised TCP connections from genuine clients. If the attacker sets the
source and destination information of a TCP segment to the same values, the result is a Land
attack, which is a Layer 4 DoS. An attempt to crash, destabilize, or freeze the targeted computer or
service by sending malformed or oversized packets using a simple ping command is called
Ping of Death (PoD). A smurf attack is a type of DDoS attack that employs the ICMP protocol to
flood the victim's network with packets. TCP fragmentation attacks target the TCP or IP
reassembly mechanisms, preventing them from putting fragmented data packets back together. The
attack starts by sending fragmented packets to a target machine; as a result, the data packets
overlap and quickly overwhelm the victim's servers, causing them to fail. Such attacks are also
termed teardrop attacks.
When one source IP address transmits a predetermined number of ICMP packets to numerous
hosts in a predetermined time period, it is an address sweep. When, within a predetermined time
interval, one source IP address transmits IP packets containing TCP SYN segments to a predefined
number of different ports at the same destination IP address, it is a port sweep. A backdoor is any
method that allows authorised and unauthorised users to bypass typical security measures and
achieve high-level user access or root access to a computer system, network, or software
application. Password guessing is another sort of network attack. By guessing the user ID and
password combination of a genuine user, access rights to a computer and network resources are compromised.
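The sweep definitions above reduce to counting distinct targets from one source inside a time window. The sketch below applies this to the port case. The window length and port threshold stand for the "predetermined" values in the text and are hypothetical, as are the function name and the sample events.

```python
from collections import defaultdict


def detect_port_sweeps(events, window=5.0, port_threshold=10):
    """Flag (source, destination) pairs that reach port_threshold distinct ports within the window.

    events: iterable of (timestamp, src_ip, dst_ip, dst_port), sorted by timestamp.
    """
    recent = defaultdict(list)  # (src, dst) -> [(timestamp, port), ...]
    flagged = set()
    for ts, src, dst, port in events:
        key = (src, dst)
        recent[key].append((ts, port))
        # Drop entries that have fallen out of the time window.
        recent[key] = [(t, p) for t, p in recent[key] if ts - t <= window]
        if len({p for _, p in recent[key]}) >= port_threshold:
            flagged.add(key)
    return flagged


# Illustrative traffic: one host probes 12 ports in about a second, another reuses port 443.
scan = [(0.1 * i, "10.0.0.5", "10.0.0.9", 20 + i) for i in range(12)]
browse = [(1.0 * i, "10.0.0.7", "10.0.0.9", 443) for i in range(12)]
flagged = detect_port_sweeps(sorted(scan + browse))
```

An address sweep can be detected the same way by keying on the source alone and counting distinct destination hosts.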
Password guessing attacks can be categorized into two types, the brute force attack and the
dictionary attack. A brute force attack is a form of password guessing attack that
involves attempting every possible code, combination, or password until the correct one is
discovered. This kind of attack can take a long time to finish. The other sort of password
guessing attack is the dictionary attack, which employs a dictionary of frequent terms to figure out
the user's password. A buffer overflow, also known as a buffer overrun, occurs when a
fixed-length buffer is filled with more data than it can hold. This overflow normally causes a
system crash, but it also gives an attacker the ability to run arbitrary code or exploit coding
mistakes to cause harmful behaviour. Attackers using the password spraying technique exploit the
Internet Message Access Protocol (IMAP) to break into cloud accounts.
A rootkit is a form of malware that is meant to infect a target PC and install a suite of tools that
give the attacker persistent remote access to the device. Powering down the machine and running
a scan from a known clean system is one technique to discover the infection. Another way of
identifying a rootkit is behavioural analysis. Leveraging the Internet Message Access Protocol
(IMAP) for password-spray attacks to compromise cloud-based accounts leads to the imap attack.
8. Computer Science & Engineering: An International Journal (CSEIJ), Vol 12, No 6, December 2022
Table 1. Attacks present in NSL-KDD Dataset
Dataset         Attacks
NSL-KDD Train   Back, Land, Neptune, Pod, Smurf, Teardrop, Satan, IPsweep, Nmap, Portsweep,
                guesspassword, ftp_write, imap, phf, multihop, warezmaster, warezclient, spy,
                buffer overflow, loadmodule, rootkit, perl
NSL-KDD Test    Back, Land, Neptune, Pod, Smurf, Teardrop, Satan, IPsweep, Nmap, Portsweep,
                guesspassword, ftp_write, imap, phf, multihop, warezmaster, warezclient, spy,
                buffer overflow, loadmodule, rootkit, perl, Apache2, Mailbomb, processtable,
                udpstorm, snmpgetattack, snmpguess, named, worm, sendmail, sqlattack,
                httptunnel, xterm, ps, xlock, xsnoop, mscan, saint
7. RESULTS AND INFERENCES
The evaluation started with feature selection. Since the genetic algorithm is a stochastic
process, feature selection was run many times and the most frequently selected features were
retained. The selected features are listed in Table 2.
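The idea of repeating a stochastic selection and keeping the most repeated features can be sketched as follows. This is only an illustration: the paper does not state how many runs were made or how often a feature had to recur, so the function name `most_repeated_features`, the `min_fraction` threshold and the toy run outputs are assumptions.

```python
from collections import Counter

def most_repeated_features(runs, min_fraction=0.5):
    """Keep features selected in at least `min_fraction` of the runs.

    `runs` is a list of feature-name lists, one per run of the
    stochastic selector (hypothetical stand-in for the genetic algorithm).
    """
    counts = Counter()
    for selected in runs:
        counts.update(set(selected))  # count each feature once per run
    threshold = min_fraction * len(runs)
    return sorted(name for name, hits in counts.items() if hits >= threshold)

# Toy outputs of three hypothetical runs (not the paper's data).
runs = [
    ["protocol_type", "dst_bytes", "hot"],
    ["protocol_type", "dst_bytes"],
    ["protocol_type", "land"],
]
stable = most_repeated_features(runs, min_fraction=0.6)
print(stable)  # ['dst_bytes', 'protocol_type']
```

A higher `min_fraction` gives a smaller, more stable feature set at the cost of dropping features that help only in some runs.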
Table 2. Reduced Feature Set
Reduced Feature Set
Protocol type        Rerror rate
Dst bytes            Same service rate
Land                 Diff service rate
Logged in            Srv diff host rate
Num compromised      Destination host count
Root shell           Dst host srv count
Num Root             Dst host same srv rate
Num access files     Dst host same src port rate
Num outbound cmds    Dst host srv diff host rate
Is guest login       Dst host srv serror rate
Srv count            Dst host rerror rate
Srv serror rate
Classification was carried out with these features, and the detection rate of each individual
attack was analysed. The analysis showed that, because of feature selection, some features that
contribute to the detection of the attacks listed in Table 1 were missing from the classification
process. Table 6 shows the classification results using Random Forest for the labelled dataset.
The detection rates of the individual attacks were analysed, and the detection rate of the attack
named ping of death (pod) was found to be very low: only 30% of its instances were correctly
detected. The features unique to the attacks that have a low detection rate are introduced at
successive levels of the enhancement algorithm proposed and implemented in this work. Table 3
gives the recall (detection rate) of individual attacks at the different levels of development:
Level I with 23 features; Level II with seven additional features for nmap, pod and teardrop;
Level III with two additional features for warezclient and warezmaster; and Level IV with all
41 features. Table 4 and Table 5 give the precision and F-score of individual
attacks at each level. For the first level of detection beyond pod, we considered two more
attacks, teardrop and nmap. The analysis showed that, because of feature selection, some
attributes that contribute to the detection of the above-mentioned intrusions were missing from
the classification process.
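For reference, the per-attack recall, precision and F-score reported in the tables follow the standard definitions: recall = TP / (TP + FN), precision = TP / (TP + FP), and F-score = 2 * precision * recall / (precision + recall). The following is a minimal sketch of that computation from true and predicted labels; the function name `per_class_metrics` and the toy labels are illustrative and are not taken from the paper.

```python
from collections import Counter

def per_class_metrics(y_true, y_pred):
    """Return {label: (recall, precision, f_score)} for every label seen."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, guess in zip(y_true, y_pred):
        if truth == guess:
            tp[truth] += 1
        else:
            fn[truth] += 1   # the true class was missed
            fp[guess] += 1   # the predicted class was wrongly claimed
    metrics = {}
    for label in sorted(set(y_true) | set(y_pred)):
        actual = tp[label] + fn[label]
        predicted = tp[label] + fp[label]
        recall = tp[label] / actual if actual else 0.0
        precision = tp[label] / predicted if predicted else 0.0
        total = precision + recall
        f_score = 2 * precision * recall / total if total else 0.0
        metrics[label] = (recall, precision, f_score)
    return metrics

# Toy labels only; these are not results from the NSL-KDD experiments.
y_true = ["pod", "pod", "normal", "normal"]
y_pred = ["pod", "normal", "normal", "normal"]
for label, scores in per_class_metrics(y_true, y_pred).items():
    print(label, [round(s, 2) for s in scores])
```

Reporting the three values per attack, rather than a single overall accuracy, is what exposes weak minority classes such as pod.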
Table 3. Recall of 20 percent of dataset
Attacks Level I Level II Level III Level IV
Back 0.67 0.75 0.97 0.97
bufferoverflow 1 1 1 1
guesspassword 1 1 1 1
ipsweep 0.86 0.86 0.98 0.98
Neptune 1 1 1 1
nmap 0.84 0.92 0.92 0.92
Normal 0.98 0.99 0.99 0.99
pod 0.30 1 1 1
portsweep 0.97 0.97 0.97 0.97
satan 0.78 0.89 0.91 0.89
smurf 0.98 0.99 0.99 1
teardrop 0.74 1 1 1
warezclient 0.71 0.71 0.93 0.93
warezmaster 1 1 1 1
Teardrop is a denial-of-service attack that uses fragmented traffic to damage the victim
machine. Fragmentation occurs naturally when traffic moves across networks with different MTU
(Maximum Transmission Unit) sizes. It can also take place when a host wishes to put a datagram
on the network that exceeds its own network's maximum transmission unit. The fragments are
finally reassembled at the destination host. The teardrop attack takes advantage of this by
sending fragments with overlapping offset fields. While reassembling such fragments at the
destination host, some systems may hang, crash or reboot. Examining the features of the dataset
shows that the feature wrong fragment is related to fragmentation. It was therefore added to the
above set of 23 features to increase the detection rate of the teardrop attack.
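The overlapping-offset condition described above can be illustrated with a small check. This is a sketch under stated assumptions, not the paper's detection method (which relies on the dataset's wrong fragment feature): fragments are represented as hypothetical `(offset_bytes, payload_length)` pairs belonging to one datagram.

```python
def has_overlapping_fragments(fragments):
    """Return True if any fragment starts before the previous data ends.

    `fragments` is a list of (offset_bytes, payload_length) pairs for a
    single datagram. Exact duplicates also count as overlaps here.
    """
    covered_until = 0
    for offset, length in sorted(fragments):
        if offset < covered_until:
            return True  # this fragment overlaps data already received
        covered_until = max(covered_until, offset + length)
    return False

# Well-formed fragmentation: the second fragment starts where the first ends.
print(has_overlapping_fragments([(0, 1480), (1480, 1480)]))  # False
# Teardrop-style pair: the second fragment lies inside the first.
print(has_overlapping_fragments([(0, 36), (24, 4)]))  # True
```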
Table 4. Precision of 20 percent of dataset
Attacks Level I Level II Level III Level IV
Back 0.75 0.73 0.97 0.95
bufferoverflow 0.33 0.33 1 1
guesspassword 1 1 1 1
ipsweep 0.93 0.95 0.95 0.95
Neptune 1 1 1 1
nmap 0.80 0.84 0.82 0.81
Normal 0.98 0.98 0.99 0.99
pod 0.40 0.95 0.95 0.95
portsweep 0.97 0.98 0.99 0.99
satan 0.98 0.92 0.94 1
smurf 1 1 1 1
teardrop 0.77 1 1 1
warezclient 0.62 0.74 0.70 0.70
warezmaster 0.67 0.67 1 1
Ping of death (pod) is also a DoS attack, and it is related to fragmentation. One of the features
of TCP/IP is that a single IP packet can be broken into smaller fragments. When a packet is
fragmented, the fragments can add up to more than the maximum allowed packet size. Attackers
make use of this property to crash or freeze a system. The attack can be detected by identifying
ICMP packets larger than 64KB. For this attack, we identified that the important missing
features are service and wrong fragment.
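The size condition behind ping of death can be shown with a short calculation. The sketch below assumes IPv4, where the 16-bit total-length field limits a packet to 65,535 bytes, and again uses hypothetical `(offset_bytes, payload_length)` pairs; the 20-byte default header size is also an assumption (a header without options).

```python
MAX_IP_PACKET_BYTES = 65535  # largest value of the 16-bit IPv4 total-length field

def reassembled_payload_end(fragments):
    """Highest byte position reached by any fragment of the datagram."""
    return max((offset + length for offset, length in fragments), default=0)

def is_oversized(fragments, header_bytes=20):
    """True if the reassembled packet would exceed the IPv4 size limit."""
    return header_bytes + reassembled_payload_end(fragments) > MAX_IP_PACKET_BYTES

# Ordinary fragmented packet: 20 + 2960 bytes, well within the limit.
print(is_oversized([(0, 1480), (1480, 1480)]))  # False
# Last fragment placed near the maximum offset with too much payload.
print(is_oversized([(0, 1480), (65528, 100)]))  # True
```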
The next attack evaluated was nmap, a network scanner. It is used to discover the hosts and
services on a computer network. It can perform different types of scanning, such as port scans
using SYN, FIN and ACK scanning over TCP, as well as UDP and ICMP scanning. A port scan can
therefore be identified by examining network traffic for TCP or UDP packets, or packets with
only the FIN or only the SYN flag set, that are sent to many ports on the target machine or group
of target machines within some period of time. So the features duration, num shell, count,
serror rate and srv rerror rate were added to the above reduced feature set. After the addition
of seven more features to the reduced feature set, the second level of detection was performed.
The results indicate a good improvement in the detection rate of these attacks.
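The rule of thumb above (many ports on one target within some period of time) can be sketched as a sliding-window count of distinct destination ports. This is an illustration only and not the classifier used in this work; the function name `find_port_scans`, the packet tuple layout, the 2-second window and the threshold of 10 ports are all assumptions.

```python
from collections import Counter, defaultdict

def find_port_scans(packets, window_seconds=2.0, port_threshold=10):
    """Flag (src_ip, dst_ip) pairs that hit many distinct ports in a short window.

    `packets` is an iterable of (timestamp, src_ip, dst_ip, dst_port) tuples,
    for example one tuple per observed SYN packet.
    """
    events = defaultdict(list)
    for timestamp, src_ip, dst_ip, dst_port in packets:
        events[(src_ip, dst_ip)].append((timestamp, dst_port))

    flagged = set()
    for pair, pair_events in events.items():
        pair_events.sort()
        ports_in_window = Counter()
        start = 0
        for timestamp, port in pair_events:
            ports_in_window[port] += 1
            # Drop events that have fallen out of the time window.
            while pair_events[start][0] < timestamp - window_seconds:
                old_port = pair_events[start][1]
                ports_in_window[old_port] -= 1
                if ports_in_window[old_port] == 0:
                    del ports_in_window[old_port]
                start += 1
            if len(ports_in_window) >= port_threshold:
                flagged.add(pair)
                break
    return flagged

# Toy traffic: one fast scan of ten ports and one ordinary web client.
scan = [(0.1 * i, "10.0.0.5", "10.0.0.9", 20 + i) for i in range(10)]
web = [(0.1 * i, "10.0.0.7", "10.0.0.9", 80) for i in range(50)]
print(find_port_scans(scan + web))  # {('10.0.0.5', '10.0.0.9')}
```

A scan spread over a long time stays under the threshold in any single window, which is one reason connection-level features alone can miss slow scans.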
In the case of teardrop and pod, the detection rate increased to 1 from 0.74 and 0.30,
respectively. For nmap, the detection rate increased from 0.84 to 0.98. The attacks warezclient
and warezmaster were considered for the third level of detection. These two attacks come under
the category of remote-to-local (R2L) attacks.
The warezmaster attack exploits a system bug related to the FTP server. It occurs when the FTP
server mistakenly gives write permission to users on the system. The attacker can then log in
and upload any files. During the attack, the attacker logs in to the system using a guest
account, creates a hidden directory and uploads illegal copies of software. This attack can be
identified when a large amount of data is sent from source to destination during an FTP session.
Table 5. F-Score of 20 percent of dataset
Attacks Level I Level II Level III Level IV
Back 0.71 0.74 0.97 0.96
bufferoverflow 0.50 0.50 1 1
guesspassword 1 1 1 1
ipsweep 0.89 0.90 0.97 0.96
Neptune 1 1 1 1
nmap 0.82 0.88 0.87 0.86
Normal 0.98 0.98 0.99 0.99
pod 0.40 0.95 0.97 0.95
portsweep 0.98 0.98 0.98 0.98
satan 0.87 0.91 0.92 0.94
smurf 0.99 1 1 1
teardrop 0.75 1 1 1
warezclient 0.67 0.73 0.80 0.80
warezmaster 0.80 0.80 1 1
The warezclient attack occurs after the warezmaster attack. Here, the warez that was uploaded
during the warezmaster attack is downloaded. Downloading files from an FTP server always appears
to be a legitimate process. The attack can be identified during an FTP session when hot
indicators are triggered notably often within a short duration of time, which may be due to the
downloading of warez. So hot is an important feature. For the third level of detection, two
additional features, src
bytes and hot were incorporated. At this stage, detection is performed using 32 features, and
the detection rate of warezclient increases from 0.71 to 0.93.
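The level-wise growth of the feature set (23, then 30, then 32 features) can be written down explicitly. The sketch below is one reading of the text: the snake_case identifiers follow the usual NSL-KDD column naming and their mapping from the labels in Table 2 is an assumption, as is the grouping of service and wrong fragment with the five nmap-related features to make up the seven Level II additions.

```python
# Level I: the 23 features of Table 2 (identifier spelling is an assumption).
LEVEL_1 = [
    "protocol_type", "dst_bytes", "land", "logged_in", "num_compromised",
    "root_shell", "num_root", "num_access_files", "num_outbound_cmds",
    "is_guest_login", "srv_count", "srv_serror_rate", "rerror_rate",
    "same_srv_rate", "diff_srv_rate", "srv_diff_host_rate", "dst_host_count",
    "dst_host_srv_count", "dst_host_same_srv_rate",
    "dst_host_same_src_port_rate", "dst_host_srv_diff_host_rate",
    "dst_host_srv_serror_rate", "dst_host_rerror_rate",
]
# Level II: features added for pod, teardrop and nmap.
LEVEL_2_EXTRA = ["service", "wrong_fragment", "duration", "num_shells",
                 "count", "serror_rate", "srv_rerror_rate"]
# Level III: features added for warezclient and warezmaster.
LEVEL_3_EXTRA = ["src_bytes", "hot"]

def build_levels(base, *extras):
    """Return cumulative feature lists, one per level, rejecting duplicates."""
    levels = [list(base)]
    for extra in extras:
        repeated = set(levels[-1]) & set(extra)
        if repeated:
            raise ValueError(f"features already present: {sorted(repeated)}")
        levels.append(levels[-1] + list(extra))
    return levels

LEVELS = build_levels(LEVEL_1, LEVEL_2_EXTRA, LEVEL_3_EXTRA)
print([len(level) for level in LEVELS])  # [23, 30, 32]
```

Each list in `LEVELS` can then be used to select the columns passed to the classifier at that level; Level IV simply uses all 41 columns.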
The fourth level of detection used all 41 features, and its accuracy and detection rate were
observed. The overall accuracy improved over the previous levels. The results for recall,
precision and F-score are shown in Table 6.
For the experimentation, only twenty percent of the dataset was used. Because attacks with few
samples are not represented in this subset, the tables above cover only fourteen types of
attacks. Tables 7 to 9 provide the recall, precision and F-measure for the unsampled dataset,
which contains all twenty-two types of attacks present in the entire dataset.
Table 10 and Table 11 show the overall accuracy obtained during each stage of the experiment.
The classification accuracy results reveal that feature discretion, together with weighting of
the reduced feature set, improves accuracy. This work shows the importance of detecting all
types of attacks present in the dataset without compromising accuracy, because the major attacks
that contribute most are easily detected, while the minority attacks are very hard to detect.
This issue is addressed in this work using level-wise detection. Out of 41 features, 32 proved
to be very important. Thus, by using this level-by-level reduced feature set, we were able to
identify the features missing for a few attacks such as pod, nmap, teardrop, warezclient and
warezmaster.
Table 6. Precision, Recall and F-Measure of twenty percent of dataset
Attacks Recall Precision F-Measure
Neptune 1 1 1
warezclient 0.939 0.971 0.955
ipsweep 0.989 0.994 0.998
portsweep 0.986 0.997 0.991
teardrop 1 1 1
Nmap 0.983 0.983 0.983
Satan 0.973 0.994 0.983
Smurf 1 1 1
Pod 1 1 1
Back 1 1 1
guesspassword 1 1 1
bufferoverflow 0.667 0.800 0.727
Imap 0.600 1 0.750
Warezmaster 0.714 0.833 0.769
spy 0 0 0
Table 7. Recall of twenty-two attacks in the dataset
Attacks Level I Level II Level III Level IV
Back 0.84 0.92 0.98 0.98
Bufferoverflow 1 1 1 1
ftp write 0.67 0.67 0.67 0.67
guesspassword 1 1 1 1
Imap 1 1 1 1
ipsweep 0.90 0.90 0.98 0.99
Land 1 1 1 1
Loadmodule 0.74 0.89 1 1
Multihop 1 1 1 1
Neptune 1 1 1 1
Nmap 0.75 0.93 0.91 0.88
Normal 0.98 0.98 0.98 0.99
Phf 1 1 1 1
Pod 0.29 1 1 1
portsweep 0.97 0.98 0.97 0.98
Rootkit 0.23 0.23 0.77 1
Satan 0.82 0.93 0.93 0.93
Smurf 0.99 0.99 1 1
spy 1 1 1 1
teardrop 0.68 1 1 1
warezclient 0.66 0.66 0.91 0.89
warezmaster 0.86 0.86 0.86 0.86
Table 8. Precision of twenty-two attacks in the dataset
Attacks Level I Level II Level III Level IV
Back 0.77 0.75 0.91 0.96
Bufferoverflow 0.85 0.92 0.91 0.92
ftp write 0.67 0.86 0.75 0.88
guesspassword 1 1 0.99 1
Imap 1 0.97 0.85 0.97
ipsweep 0.94 0.94 0.95 0.93
Land 1 1 1 1
Loadmodule 0.89 0.93 1 0.98
Multihop 0.49 0.52 0.58 0.58
Neptune 1 1 1 1
Nmap 0.94 1 0.98 0.97
Normal 0.95 0.97 0.99 0.99
Phf 1 1 1 1
Pod 0.22 1 0.88 0.92
portsweep 0.98 0.98 1 0.98
Rootkit 1 1 0.88 0.92
Satan 0.97 0.91 0.90 0.98
Smurf 1 0.99 0.99 1
spy 1 1 1 1
teardrop 0.93 1 1 1
warezclient 0.66 0.70 0.65 0.76
warezmaster 1 1 1 1
Table 9. F-Score of twenty-two attacks in the dataset
Attacks Level I Level II Level III Level IV
Back 0.80 0.82 0.94 0.97
Bufferoverflow 0.92 0.96 0.95 0.96
ftp write 0.80 0.75 0.71 0.82
guesspassword 1 1 1 1
Imap 1 0.98 0.92 0.98
ipsweep 0.92 0.92 0.96 0.96
Land 1 1 1 1
Loadmodule 0.81 0.91 1 0.99
Multihop 0.66 0.69 0.73 0.73
Neptune 1 1 1 1
Nmap 0.83 1 0.95 0.92
Normal 0.97 0.98 0.99 0.99
Phf 1 1 1 1
Pod 0.25 1 0.93 0.93
portsweep 0.98 0.98 0.98 0.98
Rootkit 0.37 0.37 0.82 0.96
Satan 0.89 0.92 0.91 0.96
Smurf 1 0.99 1 1
spy 1 1 1 1
teardrop 0.79 1 1 1
warezclient 0.66 0.68 0.76 0.82
warezmaster 0.92 0.92 0.92 0.92
Table 10. Overall Accuracy for weighted and non-weighted features
Number of Features Accuracy in Percentage
41 Features 81.04
23 Features 90.87
Weighted 23 Features 97.12
Weighted 30 Features 98.01
Weighted 32 Features 98.74
Weighted 41 Features 98.86
Table 11. Overall Accuracy for Random Forest.
Classifier Number of Features Accuracy Time Taken (Seconds)
Random Forest 23 Features 97 3.48
Random Forest 23+7 (additional features of nmap, teardrop, pod) 99.36 3.14
Random Forest 23+7 (additional features of warezclient and warezmaster) 99.65 3.86
8. CONCLUSION
This manuscript aims at finding the optimum search point for detecting intrusions in a network.
This was accomplished through feature discretion with an ensemble-based machine learning
classifier, gradient based random forest, after importing the dataset and splitting it into
training and test sets. The sequential workflow includes feature selection, classification and a searching
technique. A stochastic evolutionary algorithm is applied to obtain the optimized result. As a
future direction, applying various hybrid methods to this dataset will help the research
community to explore more challenges in the field of threat analysis and network security.
REFERENCES
[1] Stefan Axelsson. Intrusion detection systems: A survey and taxonomy. Technical report,
Technical report, 2000.
[2] Ciza Thomas and N Balakrishnan. Performance enhancement of intrusion detection systems
using advances in sensor fusion. Supercomputer Education and Research Centre Indian
Institute of Science, Doctoral Thesis, 304pp. Available at: http://www. serc. iisc.
ernet.in/graduation-theses/CizaThomas-PhD-Thesis. pdf, 2009.
[3] Aleksandar Lazarevic, Vipin Kumar, and Jaideep Srivastava. Intrusion detection: A survey. In
Managing Cyber Threats, pages 19–78. Springer, 2005.
[4] Chirag Modi, Dhiren Patel, Bhavesh Borisaniya, Hiren Patel, Avi Patel, and Muttukrishnan
Rajarajan. A survey of intrusion detection techniques in cloud. Journal of Network and
Computer Applications, 36(1):42–57, 2013.
[5] Swati Paliwal and Ravindra Gupta. Denial-of-service, probing & remote to user (r2l) attack
detection using genetic algorithm., International Journal ofComputer Applications 60(19):57–62,
2012.
[6] Ron Kohavi and George H John. Wrappers for feature subset selection. Artificial intelligence,97(1-
2):273–324, 1997.
[7] Jason Weston, Sayan Mukherjee, Olivier Chapelle, Massimiliano Pontil, Tomaso Poggio, and
Vladimir Vapnik. Feature selection for svms. In Advances in neural information processing systems,
pages 668–674, 2001.
[8] Alessia Mammone, Marco Turchi, and Nello Cristianini. Support vector machines. Wiley
Interdisciplinary Reviews: ComputationalStatistics, 1(3):283–289, 2009.
[9] Marti A. Hearst, Susan T Dumais, Edgar Osuna, John Platt, and Bernhard Scholkopf. Support vector
machines. IEEE Intelligent Systems and their applications, 13(4):18–28, 1998.
[10] Corinna Cortes and Vladimir Vapnik.Support-vector networks. Machine learning, 20(3):273–297,
1995.
[11] Srinivas Mukkamala and Andrew Sung. Feature selection for intrusion detection with neural
networks and support vector machines. Transportation Research Record: Journal of the
Transportation Research Board, (1822):33–39, 2003.
[12] Muhammad Shakil Pervez and Dewan Md Farid. Feature selection and intrusion classification in nsl-
kdd cup 99 dataset employing svms. In Software, Knowledge, Information Management and
Applications (SKIMA), 2014 8th International Conference on, pages 1–6. IEEE, 2014.
[13] Yinhui Li, Jingbo Xia, Silan Zhang, Jiakai Yan, Xiaochuan Ai, and Kuobin Dai. An efficient
intrusion detection system based on support vector machines and gradually feature removal method.
ExpertSystems with Applications, 39(1):424–430, 2012.
[14] Nelcileno Araújo, Ruy de Oliveira, Ailton Akira Shinoda, Bharat Bhargava, et al. Identifying
important characteristics in the kdd99 intrusion detection dataset by feature selection using a
hybrid approach. In Telecommunications (ICT), 2010 IEEE 17th International Conference on,
pages 552–558. IEEE, 2010.
[15] Rung-Ching Chen, Kai-Fan Cheng, Ying-Hao Chen, and Chia-Fen Hsieh. Using rough set and
support vector machine for network intrusion detection system. In Intelligent Information and
Database Systems, 2009. ACIIDS 2009. First AsianConference on, pages 465–470. IEEE, 2009.
[16] Wei Wang and Roberto Battiti. Identifying intrusions in computer networks with principal
component analysis. In Availability, Reliability and Security, 2006. ARES 2006. The First
International Conference on, pages 8–pp. IEEE, 2006.
[17] Fangjun Kuang, Weihong Xu, and Siyang Zhang. A novel hybrid kpca and svm with ga model for
intrusion detection. Applied Soft Computing, 18:178–184,2014.
[18] Cheng-Lung Huang and Chieh-Jen Wang. A ga-based feature selection and parameters
optimizationfor support vector machines. Expert Systems with applications, 31(2):231–240, 2006.
[19] Ikram Sumaiya Thaseen and Cherukuri Aswani Kumar. Intrusion detection model using
fusionof chi-square feature selection and multi class svm. Journal of King Saud University-
15. Computer Science & Engineering: An International Journal (CSEIJ), Vol 12, No 6, December 2022
123
Computer and Information Sciences, 29(4):462–472, 2017.
[20] Adriana-Cristina Enache and Victor Valeriu Patriciu. Intrusions detection based on support
vector machine optimized with swarm intelligence. In Applied Computational Intelligence and
Informatics (SACI), 2014 IEEE 9th International Symposium on, pages 153–158. IEEE, 2014.
[21] Mostafa A Salama, Heba F Eid, Rabie A Ramadan, Ashraf Darwish, and Aboul Ella
Hassanien. Hybrid intelligent intrusion detection scheme. In Soft computing in industrial
applications, pages 293–303. Springer, 2011.
[22] Stephanie Forrest. Genetic algorithms: principles of natural selection applied to computation.
Science, 261(5123):872–878, 1993.
[23] Randy L Haupt, Sue Ellen Haupt, and Sue Ellen Haupt. Practical genetic algorithms, volume 2.
Wiley New York, 1998.
[24] Sean Luke. Essentials of metaheuristics, volume 113. Lulu Raleigh, 2009.
[25] Bineet Mishra and Rakesh Kumar Patnaik. Genetic Algorithm and its varients: Theory ans
Applications, PhD thesis, 2009.
[26] L Dhanabal and SP Shantharajah. A study on nsl-kdd dataset for intrusion detection system
based on classification algorithms. International Journal of Advanced Research in Computer and
Communication Engineering, 4(6):446–452, 2015.
[27] Charles E Metz. Basic principles of roc analysis.In Seminars in nuclear medicine, volume 8,
pages 283–298. Elsevier, 1978.
[28] Kristopher Robert Kendall. A database of computer attacks for the evaluation of intrusion
detection systems. PhD thesis, Massachusetts Institute of Technology, 1999.
[29] Stephen Northcutt and Judy Novak. Network intrusion detection. Sams Publishing,
2002.
[30] Ciza Thomas, Vishwas Sharma, and N Balakrishnan. Usefulness of darpa dataset for intrusion
detection system evaluation. In Data Mining, Intrusion Detection, Information Assurance, and
DataNetworks Security 2008, volume 6973, page 69730G. International Society for Optics and
Photonics,2008.