This document summarizes and compares different clustering algorithms that can be used for network anomaly detection. It proposes a method that first applies clustering algorithms like k-means, hierarchical, and expectation maximization clustering to partition network traffic data into clusters. It then applies the ID3 decision tree algorithm on each cluster to classify instances as normal or anomalous. The performance of this combined method is compared to using just the clustering or ID3 algorithms individually. Real network data sets are used to evaluate performance based on various metrics. The combined method is found to outperform the individual algorithms. The document also reviews several other related works applying clustering and decision trees for network anomaly detection and privacy-preserving data mining.
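The cluster-then-classify idea summarized above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the document's actual implementation: it assumes k-means on two numeric features, a plain ID3 tree on two categorical attributes, and a made-up eight-record data set. All names (`kmeans`, `id3`, `fit`, `classify`, the `protocol` and `flag` attributes) are hypothetical.

```python
import math
from collections import Counter

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def nearest(p, centroids):
    return min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))

def kmeans(points, k, iters=20):
    centroids = list(points[:k])  # naive initialisation (assumption)
    for _ in range(iters):
        assign = [nearest(p, centroids) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def id3(rows, labels, features):
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority  # leaf: a class label
    def gain(f):  # information gain of splitting on attribute f
        remainder = 0.0
        for v in set(r[f] for r in rows):
            subset = [l for r, l in zip(rows, labels) if r[f] == v]
            remainder += len(subset) / len(labels) * entropy(subset)
        return entropy(labels) - remainder
    best = max(features, key=gain)
    rest = [f for f in features if f != best]
    children = {}
    for v in set(r[best] for r in rows):
        idx = [i for i, r in enumerate(rows) if r[best] == v]
        children[v] = id3([rows[i] for i in idx], [labels[i] for i in idx], rest)
    return {"feature": best, "children": children, "default": majority}

def tree_predict(tree, row):
    while isinstance(tree, dict):
        tree = tree["children"].get(row[tree["feature"]], tree["default"])
    return tree

def fit(points, rows, labels, k):
    centroids = kmeans(points, k)
    assign = [nearest(p, centroids) for p in points]
    trees = {}
    for c in set(assign):  # one ID3 tree per non-empty cluster
        idx = [i for i, a in enumerate(assign) if a == c]
        trees[c] = id3([rows[i] for i in idx], [labels[i] for i in idx], list(rows[0]))
    fallback = Counter(labels).most_common(1)[0][0]
    return centroids, trees, fallback

def classify(model, point, row):
    centroids, trees, fallback = model
    return tree_predict(trees.get(nearest(point, centroids), fallback), row)

# Hypothetical toy data: two traffic regimes with different anomaly rules.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (1.0, 0.7),
          (8.0, 9.0), (8.2, 8.8), (7.9, 9.1), (8.1, 9.2)]
rows = [{"protocol": "tcp", "flag": "SF"}, {"protocol": "tcp", "flag": "S0"},
        {"protocol": "icmp", "flag": "SF"}, {"protocol": "icmp", "flag": "S0"},
        {"protocol": "tcp", "flag": "SF"}, {"protocol": "icmp", "flag": "SF"},
        {"protocol": "tcp", "flag": "S0"}, {"protocol": "icmp", "flag": "S0"}]
labels = ["normal", "anomaly", "normal", "anomaly",
          "normal", "anomaly", "normal", "anomaly"]
model = fit(points, rows, labels, k=2)
```

In this toy data the same categorical record (`icmp`, `SF`) is normal in the low-valued cluster and anomalous in the high-valued one, which a single tree over the categorical attributes alone could not separate; the per-cluster trees can.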
A Novel Classification via Clustering Method for Anomaly Based Network Intrus... - IDES Editor
Intrusion detection on the internet is an active area of research. Intruders can be classified into two types: external intruders, who are unauthorized users of the computers they attack, and internal intruders, who have permission to access the system but with some restrictions. The aim of this paper is to present a methodology for recognizing attacks during the normal activities of a system. A novel classification via sequential information bottleneck (sIB) clustering algorithm is proposed to build an efficient anomaly-based network intrusion detection model. We compare the proposed method with other clustering algorithms, namely X-Means, Farthest First, Filtered clusters, DBSCAN, K-Means, and EM (Expectation-Maximization) clustering, to assess its suitability. A subset of the KDDCup 1999 intrusion detection benchmark dataset is used for the experiment. Results show that the proposed method is efficient in terms of detection accuracy and false positive rate in comparison to the other existing methods.
Intrusion detection with Parameterized Methods for Wireless Sensor Networks - rahulmonikasharma
Current network intrusion detection systems lack adaptability to frequently changing network environments. Furthermore, intrusion detection in the new distributed architectures is now a major requirement. In this paper, we propose two Adaboost-based intrusion detection algorithms. In the first algorithm, a traditional online Adaboost process is used with decision stumps as weak classifiers. In the second algorithm, an improved online Adaboost process is proposed, and online Gaussian mixture models (GMMs) are used as weak classifiers. We further propose a distributed intrusion detection framework, in which a local parameterized detection model is constructed in each node using the online Adaboost algorithm. A global detection model is constructed in each node by combining the local parametric models using a small number of samples in the node. This combination is achieved using an algorithm based on particle swarm optimization (PSO) and support vector machines (SVMs). The global model in each node is used to detect intrusions. Experimental results show that the improved online Adaboost process with GMMs obtains a higher detection rate and a lower false alarm rate than the traditional online Adaboost process that uses decision stumps. Both algorithms outperform existing intrusion detection algorithms. It is also shown that our PSO- and SVM-based algorithm effectively combines the local detection models into the global model in each node; the global model in a node can handle the intrusion types that are found in other nodes, without sharing the samples of these intrusion types.
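To make the "decision stumps as weak classifiers" idea concrete, here is a minimal sketch of AdaBoost with threshold stumps. It is the standard batch AdaBoost, swapped in for illustration; it is not the online variant or the GMM-based variant the abstract proposes. The function names and the one-feature toy data (label +1 for attack, -1 for normal) are assumptions.

```python
import math

def train_stump(X, y, w):
    """Return (weighted_error, feature, threshold, sign) of the best stump."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted(set(row[f] for row in X)):
            for sign in (1, -1):
                pred = [sign if row[f] <= thr else -sign for row in X]
                err = sum(wi for wi, p, yi in zip(w, pred, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, f, thr, sign)
    return best

def stump_predict(stump, row):
    _, f, thr, sign = stump
    return sign if row[f] <= thr else -sign

def adaboost(X, y, rounds=5):
    n = len(X)
    w = [1.0 / n] * n                 # start with uniform sample weights
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        err = max(stump[0], 1e-10)    # avoid division by zero on a perfect stump
        if err >= 0.5:                # no better than chance: stop
            break
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Increase the weight of misclassified samples, decrease the rest.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, row, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, row):
    score = sum(alpha * stump_predict(s, row) for alpha, s in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical single feature (e.g. connections per second).
X = [(1,), (2,), (3,), (7,), (8,), (9,)]
y = [-1, -1, -1, 1, 1, 1]
ensemble = adaboost(X, y)
```

An online version would update the stumps and their weights one sample at a time instead of scanning the whole training set in each round.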
Classification Rule Discovery Using Ant-Miner Algorithm: An Application Of N... - IJMER
Numerous studies on intrusion detection have applied data mining techniques to automatically extract useful knowledge from large databases, while few studies have proposed classification data mining approaches. In an actual risk assessment process, the discovery of intrusion detection prediction knowledge from experts is still regarded as an important task because experts’ predictions depend on their subjectivity. Traditional statistical techniques and artificial intelligence techniques are commonly used to solve this classification decision-making problem. This paper proposes an ant-miner based data mining method for discovering network intrusion detection rules from large datasets. The experimental results show that ant-miner is superior to ID3, J48, ADtree, BFtree, and Simple CART. Although different classification models have been developed for network intrusion detection, each of them has its strengths and weaknesses, including the most commonly applied Support Vector Machine (SVM) method and the Clustering based on Self Organized Ant Colony Network (CSOACN). Our algorithm is implemented and evaluated using the standard benchmark KDD99 dataset. Experiments show that the ant-miner algorithm outperforms the other methods in terms of both classification rate and accuracy.
Minkowski Distance based Feature Selection Algorithm for Effective Intrusion ... - IJMER
An Intrusion Detection System (IDS) plays a major role in providing effective security to various types of networks. Moreover, an IDS for networks needs an appropriate rule set for classifying network benchmark data into normal or attack patterns. Generally, each dataset is characterized by a large set of features. However, not all of these features are relevant or contribute fully to identifying an attack, and different attacks need different feature subsets to achieve better detection accuracy. In this paper, an improved feature selection algorithm is proposed to identify the most appropriate subset of features for detecting certain attacks. The proposed method is based on Minkowski distance feature ranking and an improved exhaustive search that selects a better combination of features. The system has been evaluated using the KDD CUP 1999 dataset and also with the EMSVM [1] classifier. The experimental results show that the proposed system provides high classification accuracy and a low false alarm rate when applied to the reduced feature subsets.
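For reference, the Minkowski distance that the ranking above builds on can be written as a short function. This sketch shows only the distance itself; how the paper turns it into a feature ranking is not described in the abstract, so no ranking step is attempted here. The function name and the sample vectors are illustrative.

```python
def minkowski(x, y, p=2.0):
    """Minkowski distance of order p between two equal-length vectors.

    p = 1 gives the Manhattan distance, p = 2 the Euclidean distance.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    if p < 1:
        raise ValueError("p must be >= 1 for a valid metric")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

manhattan = minkowski((0, 0), (3, 4), p=1)
euclidean = minkowski((0, 0), (3, 4), p=2)
```

Larger values of `p` give more weight to the single largest coordinate difference, which is one reason the choice of `p` can change which features look most discriminative.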
A NOVEL INTRUSION DETECTION MODEL FOR MOBILE AD-HOC NETWORKS USING CP-KNN - IJCNCJournal
Mobile ad-hoc network (MANET) security problems are the subject of in-depth analysis. A group of mobile nodes is connected to a fixed wired backbone. In a MANET, the nodes themselves implement network management in a cooperative fashion. All the nodes are responsible for forming a network topology that changes dynamically, and there are no clear network boundaries. We propose a novel intrusion detection model for mobile ad-hoc networks that uses the CP-KNN (Conformal Prediction K-Nearest Neighbor) algorithm to classify audit data for anomaly detection. The non-conformity score value is used to reduce the classification time for multi-level iteration. The model effectively detects anomalies with a high true positive rate, a low false positive rate, and high confidence compared with the state of the art of various anomaly detection methods. In addition, even when it is disturbed by “noisy” (unclean) data, the proposed technique is robust and effective, retains its good detection performance, and helps avoid abnormal activity.
An effective approach for tackling network security problems is the intrusion detection system (IDS). Such systems play a key role in network security, as they can detect different types of attacks in networks, including DoS, U2R, Probe, and R2L. In addition, IDS are an increasingly key part of a system’s defense. Various approaches to IDS are now in use, but are unfortunately relatively ineffective. Data mining techniques and artificial intelligence play an important role in security services. In this paper we present a comparative study of three well-known intelligent algorithms: Radial Basis Functions (RBF), Multilayer Perceptrons (MLP), and Support Vector Machine (SVM). The main interest of this work is to benchmark the performance of these three intelligent algorithms. This is done using a dataset of about 9,000 connections, randomly chosen from KDD'99’s 10% dataset. In addition, we investigate the algorithms’ performance in terms of their attack classification accuracy. The simulation results are analyzed and discussed. It has been observed that SVM with a linear kernel (Linear-SVM) performs better than MLP and RBF in terms of detection accuracy and processing speed.
Survey of network anomaly detection using markov chain - ijcseit
Internet threats have increased recently. Our aim is to detect intrusions in the network concisely. Real-time issues such as DoS attacks on banks, companies, industries, and organizations have increased significantly, and IDS have been used on both the server and the host side. The major challenge is to effectively predict periods of threat and protect the server from unauthorized users. In this study, a novel probabilistic approach is proposed to effectively detect network intrusions. It uses a Markov chain for probabilistic modelling of abnormal events in network systems. The degree of abnormality of the incoming data is evaluated on the basis of the network states.
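A first-order Markov chain model of this kind can be sketched as follows. This is an illustrative example under assumptions, not the surveyed method: transition probabilities are estimated from sequences of normal behaviour with add-one smoothing, and a new sequence is scored by its average negative log-likelihood. The state names and training sequences are hypothetical.

```python
import math

def fit_transitions(sequences, states, alpha=1.0):
    """Estimate P(next | current) with additive (Laplace) smoothing alpha."""
    counts = {s: {t: alpha for t in states} for s in states}
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    return {s: {t: counts[s][t] / sum(counts[s].values()) for t in states}
            for s in states}

def abnormality(seq, probs):
    """Average negative log-likelihood per transition; higher is more unusual."""
    steps = list(zip(seq, seq[1:]))
    return -sum(math.log(probs[a][b]) for a, b in steps) / len(steps)

# Hypothetical network states and normal-traffic training sequences.
STATES = ["idle", "login", "transfer"]
normal_traffic = [["idle", "login", "transfer", "idle"]] * 5
model = fit_transitions(normal_traffic, STATES)

usual = abnormality(["idle", "login", "transfer", "idle"], model)
odd = abnormality(["login", "login", "login", "login"], model)
```

A sequence would be flagged when its score exceeds a threshold chosen on held-out normal data; the smoothing keeps unseen transitions from producing an infinite score.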
A SURVEY ON DIFFERENT MACHINE LEARNING ALGORITHMS AND WEAK CLASSIFIERS BASED ... - ijaia
Network intrusion detection often has difficulty creating classifiers that can handle unequally distributed attack categories. Generally, attacks such as Remote to Local (R2L) and User to Root (U2R) are very rare; even in the KDD dataset, these attacks make up only 2% of the overall data. As a result, the model is not able to efficiently learn the characteristics of the rare categories, which leads to poor detection rates for rare attack categories such as R2L and U2R. We also compared the accuracy on the KDD and NSL-KDD datasets using different classifiers in WEKA.
An approach for ids by combining svm and ant colony algorithm - eSAT Journals
Abstract: This work studies the intrusion detection problem in network security; the primary task is to classify network behavior as normal or abnormal while reducing misclassification. In this paper, two efficient data mining algorithms are combined to detect network intrusions. Combining SVM and ant colony (CSVAC) is used for well-organized data classification; this technique takes advantage of both algorithms while avoiding their weaknesses. The algorithm is implemented and evaluated using the standard benchmark KDDCUP99 data set. Experimental results show that it produces superior results to the other algorithms in terms of accuracy rate and run-time efficiency, and that it is able to detect new types of attacks. Keywords: Intrusion Detection; Support Vector Machine; Ant colony; Combined Support vector with ant colony
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology
A new clustering approach for anomaly intrusion detection - IJDKP
Recent advances in technology have made our work easier compared to earlier times. Computer networks are growing day by day, but the security of computers and networks has always been a major concern for organizations, from smaller to larger enterprises. Organizations are aware of the possible threats and attacks and prepare to stay on the safe side, but attackers are still able to mount attacks by exploiting loopholes.
Intrusion detection is one of the major fields of research, and researchers are trying to find new algorithms for detecting intrusions. Clustering techniques from data mining are an area of research interest for detecting possible intrusions and attacks. This paper presents a new clustering approach for anomaly intrusion detection using the K-medoids clustering method with certain modifications. The proposed algorithm achieves a high detection rate and overcomes the disadvantages of the K-means algorithm.
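The basic K-medoids idea mentioned above can be sketched as below. This is a plain alternating (assign, then re-pick medoid) variant for illustration only; the paper's "certain modifications" are not described in the abstract and are not reproduced. The function names, the Manhattan distance, the naive initialisation, and the toy points are assumptions, and the points are assumed to be distinct.

```python
def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

def assign(points, medoids, dist):
    """Map each medoid index to the indices of the points closest to it."""
    clusters = {m: [] for m in medoids}
    for i, p in enumerate(points):
        clusters[min(medoids, key=lambda m: dist(p, points[m]))].append(i)
    return clusters

def k_medoids(points, k, dist=manhattan, max_iter=100):
    medoids = list(range(k))  # naive initialisation: first k points (assumption)
    for _ in range(max_iter):
        clusters = assign(points, medoids, dist)
        updated = []
        for m in medoids:
            members = clusters[m] or [m]
            # New medoid: the member with the smallest total distance to the rest.
            updated.append(min(
                members,
                key=lambda c: sum(dist(points[c], points[j]) for j in members)))
        if updated == medoids:
            break
        medoids = updated
    return medoids, assign(points, medoids, dist)

# Two tight groups plus one far outlier at (50, 50).
points = [(1, 1), (1, 2), (2, 1), (8, 8), (7, 9), (9, 7), (50, 50)]
medoids, clusters = k_medoids(points, k=2)
centers = [points[m] for m in medoids]
```

Because a medoid must be an actual data point, the outlier joins the second cluster without dragging its center away, whereas the arithmetic mean of that cluster would move to (18.5, 18.5). This is the usual argument for K-medoids over K-means on noisy data.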
Visualize network anomaly detection by using k means clustering algorithm - IJCNCJournal
With the ever-increasing number of new attacks in today’s world, the amount of data will keep increasing, and because of the base-rate fallacy the number of false alarms will also increase. Another problem is that attacks usually are not detected until after they have taken place, which makes defending against them hard and can easily lead to disclosure of sensitive information.
In this paper we use the K-means algorithm with the KDD Cup 1999 network data set to evaluate the performance of an unsupervised learning method for anomaly detection. The evaluation showed that a high detection rate can be achieved while maintaining a low false alarm rate. This paper presents the results of k-means clustering obtained with the Cluster 3.0 tool and visualizes them with the TreeView visualization tool.
Multi Stage Filter Using Enhanced Adaboost for Network Intrusion Detection - IJNSA Journal
Based on the analysis and distribution of network attacks in the KDDCup99 dataset and in real-time traffic, this paper proposes a design for a multi-stage filter, an efficient and effective approach for dealing with various categories of attacks in networks. The first stage of the filter is designed using enhanced Adaboost with a decision tree algorithm to detect the frequent attacks that occur in the network, and the second stage is designed using enhanced Adaboost with the Naïve Bayes algorithm to detect the moderate attacks that occur in the network. The final stage of the filter, which detects infrequent attacks, is designed using the enhanced Adaboost algorithm with Naïve Bayes as a base learner. The performance of this design is tested with the KDDCup99 dataset and is shown to have a high detection rate with low false alarm rates.
Intrusion Detection System for Classification of Attacks with Cross Validation - inventionjournals
Nowadays, due to the rapid growth in internet use, patterns of network attacks are increasing. Various organizations and institutes use the internet to access or share sensitive information over the network. Protecting information from unauthorized users or intruders is one of the important issues. In this paper, we use the decision tree techniques C4.5 and CART as classifiers for classification of attacks. We propose an ensemble model that combines C4.5 and Classification and Regression Tree (CART) as a robust classifier for classification of attacks. We use the NSL-KDD data set for binary and multiclass problems with 10-fold cross validation. The proposed ensemble model gives a satisfactory accuracy of 99.67% and 99.53% for the binary and multiclass NSL-KDD data sets respectively.
UTILIZING XAI TECHNIQUE TO IMPROVE AUTOENCODER BASED MODEL FOR COMPUTER NETWO... - IJCNCJournal
Machine learning (ML) and deep learning (DL) methods are being adopted rapidly, especially in computer network security, for tasks such as fraud detection, network anomaly detection, and intrusion detection. However, the lack of transparency of ML- and DL-based models is a major obstacle to their implementation, and they are criticized for their black-box nature despite their strong results. Explainable Artificial Intelligence (XAI) is a promising area that can improve the trustworthiness of these models by explaining and interpreting their output. If the internal workings of ML- and DL-based models are understandable, this can further help to improve their performance. The objective of this paper is to show how XAI can be used to interpret the results of a DL model, the autoencoder in this case. Based on the interpretation, we improved its performance for computer network anomaly detection. The kernel SHAP method, which is based on Shapley values, is used as a novel feature selection technique. This method is used to identify only those features that are actually causing the anomalous behaviour of the set of attack/anomaly instances. Later, these feature sets are used to train and validate the autoencoder, but on benign data only. Finally, the resulting SHAP_Model outperformed the other two models proposed based on the feature selection method. The whole experiment is conducted on a subset of the latest CICIDS2017 network dataset. The overall accuracy and AUC of SHAP_Model are 94% and 0.969, respectively.
An intrusion detection system plays a major role in network security. We
propose a model “DB-OLS: An Approach for IDS” which is a Deviation Based-Outlier
approach for Intrusion detection using Self Organizing Maps. In this model “Self
Organizing Map” approach is to be used for behavior learning and “Outlier mining”
approach, for detecting an intruder by calculating deviation from known user profile.
This model aims to improve the capability of detecting intruders.
AN EFFICIENT INTRUSION DETECTION SYSTEM WITH CUSTOM FEATURES USING FPA-GRADIE... - IJCNCJournal
An efficient Intrusion Detection System has to be given high priority when connecting systems to a network, so that the system is protected before an attack happens. It is a big challenge for the network security group to protect the system from the various types of new attacks that emerge as technology grows in parallel. In this paper, an efficient intrusion detection model is proposed that predicts attacks with high accuracy and a low false-negative rate by deriving custom features, UNSW-CF, from the benchmark intrusion dataset UNSW-NB15. To reduce the learning complexity, custom features are derived and then significant features are constructed by applying the meta-heuristic FPA (Flower Pollination Algorithm) and MRMR (Minimal Redundancy and Maximum Relevance), which reduces learning time and also increases prediction accuracy. ENC (ElasticNet Classifier), KRRC (Kernel Ridge Regression Classifier), and IGBC (Improved Gradient Boosting Classifier) are employed to classify the attacks in the datasets UNSW-CF and UNSW; UNSW-CF with derived custom features using IGBC integrated with FPA provided a high accuracy of 97.38% and a low error rate of 2.16%. Also, the sensitivity and specificity for IGBC reach 97.32% and 97.50% respectively.
Artificial Neural Content Techniques for Enhanced Intrusion Detection and Pre... - AM Publications
This paper presents a novel approach for detecting network intrusions based on a competitive training neural network. The performance of this approach is compared to that of the self-organizing map (SOM), a popular unsupervised training algorithm used in intrusion detection. While obtaining a detection rate similar to that of the SOM, the proposed approach uses only one fourth of the SOM's computation time. Furthermore, the clustering result of this method is independent of the number of initial neurons. The approach is also able to detect both known and unknown network attacks. The experimental results obtained by applying this approach to the KDD-99 data set demonstrate that it performs exceptionally well in terms of both accuracy and computation time.
ATTACK DETECTION AVAILING FEATURE DISCRETION USING RANDOM FOREST CLASSIFIER - CSEIJJournal
The widespread use of the Internet has the adverse effect of making systems vulnerable to cyber attacks. Defensive mechanisms like firewalls and IDSs have evolved with many research contributions in these areas. Machine learning techniques have been successfully used in these defense mechanisms, especially IDSs. Although they are effective to some extent in identifying new patterns and variants of existing malicious patterns, many attacks still go undetected. The objective is to develop an algorithm for detecting malicious domains based on passive traffic measurements. In this paper, an anomaly-based intrusion detection system based on an ensemble machine learning classifier called Random Forest with gradient boosting is deployed. The NSL-KDD cup dataset is used for analysis, and out of 41 features, 32 were identified as significant using feature discretion. Our observations confirm the conjecture that both feature selection and stochastic genetic operators improve accuracy and effectiveness. The training time is shown to be reduced by 98.59%, and accuracy improved to 98.75%.
Analysis on different Data mining Techniques and algorithms used in IOT - IJERA Editor
In this paper, we discuss five functionalities of data mining in IOT that affect performance: data anomaly detection, data clustering, data classification, feature selection, and time series prediction. Some important algorithms for each functionality are also reviewed, showing their advantages and limitations, along with some new algorithms that are current research directions. We present a knowledge view of data mining in IOT.
An effective approach for tackling network security
problems is Intrusion detection systems (IDS). These kind of
systems play a key role in network security as they can detect
different types of attacks in networks, including DoS, U2R Probe
and R2L. In addition, IDS are an increasingly key part of the
system’s defense. Various approaches to IDS are now being used,
but are unfortunately relatively ineffective. Data mining techniques
and artificial intelligence play an important role in security
services. We will present a comparative study of three wellknown
intelligent algorithms in this paper. These are Radial Basis
Functions (RBF), Multilayer Perceptrons (MLP) and Support
Vector Machine (SVM).This work’s main interest is to benchmark
the performance of these3 intelligent algorithms. This is done by
using a dataset of about 9,000 connections, randomly chosen from
KDD'99’s 10% dataset. In addition, we investigate these
algorithms’ performance in terms of their attack classification
accuracy. The Simulation results are also analyzed and the
discussion is then presented. It has been observed that SVM with a
linear kernel (Linear-SVM) gives a better performance than MLP
and RBF in terms of its detection accuracy and processing speed.
Survey of network anomaly detection using Markov chain (ijcseit)
Recently, internet threats have increased. Our motive is to detect intrusions in the network concisely.
Real-time issues such as DoS attacks on banks, companies, industries and organizations have
increased significantly. IDS has been used on both the server and host side. The major challenge is to effectively
predict the periods of threats and protect the server from unauthorized users. In this study, a novel
probabilistic approach is proposed to detect network intrusions effectively. It uses a Markov chain for
probabilistic modelling of abnormal events in network systems. The degree of abnormality of the incoming
data is determined on the basis of the network states.
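The abstract above does not say how the degree of abnormality is computed from the network states. One common way to realise the idea, shown here only as a minimal sketch, is to estimate transition probabilities from normal traffic and score a new state sequence by its average negative log-likelihood. The state names, the add-alpha smoothing, and the function names `fit_transitions` and `abnormality` are illustrative assumptions, not details taken from the cited survey.

```python
import math

def fit_transitions(sequences, states, alpha=1.0):
    """Estimate P(next state | current state) from normal sequences.

    Add-alpha smoothing (an assumption of this sketch) keeps unseen
    transitions from getting probability zero.
    """
    counts = {s: {t: 0.0 for t in states} for s in states}
    for seq in sequences:
        for cur, nxt in zip(seq, seq[1:]):
            counts[cur][nxt] += 1.0
    probs = {}
    for s in states:
        total = sum(counts[s].values()) + alpha * len(states)
        probs[s] = {t: (counts[s][t] + alpha) / total for t in states}
    return probs

def abnormality(seq, probs):
    """Average negative log-likelihood per transition; higher is more unusual."""
    steps = list(zip(seq, seq[1:]))
    if not steps:
        return 0.0
    return -sum(math.log(probs[c][n]) for c, n in steps) / len(steps)

# Hypothetical connection states, chosen only for illustration.
STATES = ["idle", "syn", "established", "closed"]
normal_traffic = [["idle", "syn", "established", "closed"]] * 20
model = fit_transitions(normal_traffic, STATES)

usual = abnormality(["idle", "syn", "established", "closed"], model)
flood = abnormality(["idle", "syn", "syn", "syn", "syn"], model)
print(f"usual={usual:.3f} flood={flood:.3f}")
```

A sequence dominated by transitions that were rare in the training data, such as repeated `syn` states, receives a higher score than a sequence that follows the usual path.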
A SURVEY ON DIFFERENT MACHINE LEARNING ALGORITHMS AND WEAK CLASSIFIERS BASED ... (ijaia)
Network intrusion detection often has difficulty creating classifiers that can handle unequally distributed attack categories. Generally, attacks such as Remote to Local (R2L) and User to Root (U2R) are very rare, and even in the KDD dataset these attacks make up only 2% of the overall data. As a result, the model cannot efficiently learn the characteristics of rare categories, which leads to poor detection rates for rare attack categories like R2L and U2R. We also compared the accuracy on the KDD and NSL-KDD datasets using different classifiers in WEKA.
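The effect of a 2% rare class can be made concrete with a small worked example. The sketch below uses made-up labels (980 frequent instances and 20 rare ones, mirroring the 2% share mentioned above) and a hypothetical classifier that always predicts the frequent class. The helper names are assumptions for illustration only.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of instances whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def per_class_detection_rate(y_true, y_pred):
    """Share of each true class that was predicted correctly (per-class recall)."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}

# Illustrative data: 2% of instances belong to the rare attack categories.
y_true = ["frequent"] * 980 + ["rare"] * 20
# A degenerate classifier that never predicts the rare class.
y_pred = ["frequent"] * 1000

overall = accuracy(y_true, y_pred)
rates = per_class_detection_rate(y_true, y_pred)
print(overall, rates)
```

Overall accuracy stays high even though none of the rare instances is detected, which is why per-class detection rates are the more informative measure for R2L and U2R.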
An approach for ids by combining svm and ant colony algorithm (eSAT Journals)
Abstract: This work studies the intrusion detection problem in network security; the primary task is to classify network behavior as normal or abnormal while reducing misclassification. In this paper, two efficient data mining algorithms are combined to detect network intrusions. Combining SVM and Ant Colony (CSVAC) is used for well-organized data classification; this technique takes advantage of both algorithms while avoiding their weaknesses. The algorithm is implemented and evaluated using the standard benchmark KDDCUP99 data set. Experimental results show that it produces superior results to the other algorithms in terms of accuracy rate and run-time efficiency, and that it is able to detect new types of attacks. Keywords: Intrusion Detection; Support Vector Machine; Ant colony; Combined Support vector with ant colony
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology
A new clustering approach for anomaly intrusion detection (IJDKP)
Recent advances in technology have made our work easier compared to earlier times. Computer networks are
growing day by day, but the security of computers and networks has always been a
major concern for organizations ranging from small to large enterprises. Organizations
are aware of the possible threats and attacks, so they always try to stay on the safe side, but due to some
loopholes attackers are still able to mount attacks.
Intrusion detection is one of the major fields of research, and researchers are trying to find new algorithms
for detecting intrusions. Clustering techniques from data mining are an interesting area of research for detecting
possible intrusions and attacks. This paper presents a new clustering approach for anomaly intrusion
detection using the K-medoids method of clustering with certain modifications. The
proposed algorithm is able to achieve a high detection rate and overcomes the disadvantages of the K-means
algorithm.
Visualize network anomaly detection by using k means clustering algorithm (IJCNCJournal)
With the ever increasing number of new attacks in today’s world, the amount of data will keep increasing,
and because of the base-rate fallacy the number of false alarms will also increase. Another problem with
detection of attacks is that they usually aren’t detected until after the attack has taken place; this makes
defending against attacks hard and can easily lead to disclosure of sensitive information.
In this paper we choose the K-means algorithm with the KDD Cup 1999 network data set to evaluate the
performance of an unsupervised learning method for anomaly detection. The results of the evaluation
showed that a high detection rate can be achieved while maintaining a low false alarm rate. This paper
presents the result of using k-means clustering by applying the Cluster 3.0 tool and visualizes this result
using the TreeView visualization tool.
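The paper summarised above relies on the Cluster 3.0 and TreeView tools. As a rough, self-contained illustration of the underlying idea only, the sketch below implements plain Lloyd-style k-means and scores a point by its distance to the nearest centroid. The synthetic two-dimensional data, the choice of k = 2, the fixed seeds, and the function names are assumptions of this sketch, not the setup used in the paper.

```python
import math
import random

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd-style k-means on tuples of floats; returns the centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its members; keep the old
        # centroid if a cluster happens to be empty.
        updated = [
            tuple(sum(axis) / len(members) for axis in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids

def anomaly_score(point, centroids):
    """Distance to the nearest centroid; larger values are more anomalous."""
    return min(math.dist(point, c) for c in centroids)

# Synthetic "normal" traffic features: two groups, for illustration only.
rng = random.Random(1)
normal = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(100)]
normal += [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(100)]

centroids = kmeans(normal, k=2)
threshold = max(anomaly_score(p, centroids) for p in normal)
outlier = (5.0, 60.0)
print(anomaly_score(outlier, centroids) > threshold)
```

Using the largest training distance as the threshold is the simplest possible rule; in practice the threshold would be tuned to trade detection rate against false alarms.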
Multi Stage Filter Using Enhanced Adaboost for Network Intrusion Detection (IJNSA Journal)
Based on the analysis and distribution of network attacks in the KDDCup99 dataset and real-time traffic, this paper proposes a design for a multi-stage filter, which is an efficient and effective approach for dealing with various categories of attacks in networks. The first stage of the filter is designed using Enhanced Adaboost with the Decision tree algorithm to detect the frequent attacks that occur in the network, and the second stage is designed using enhanced Adaboost with the Naïve Bayes algorithm to detect the moderately frequent attacks. The final stage of the filter is used to detect the infrequent attacks and is designed using the enhanced Adaboost algorithm with Naïve Bayes as a base learner. The performance of this design is tested with the KDDCup99 dataset and is shown to have a high detection rate with low false alarm rates.
Intrusion Detection System for Classification of Attacks with Cross Validation (inventionjournals)
Nowadays, due to the rapid growth in internet use, the patterns of network attacks are increasing. Various organizations and institutes use the internet to access or share sensitive information over the network. Protecting information from unauthorized users and intruders is an important issue. In this paper, we have used decision tree techniques such as C4.5 and CART as classifiers for the classification of attacks. We have proposed an ensemble model that combines C4.5 and Classification and Regression Tree (CART) as a robust classifier for the classification of attacks. We have used the NSL-KDD data set for the binary and multiclass problems with 10-fold cross validation. The proposed ensemble model gives satisfactory accuracy of 99.67% and 99.53% for the binary and multiclass NSL-KDD data sets respectively.
UTILIZING XAI TECHNIQUE TO IMPROVE AUTOENCODER BASED MODEL FOR COMPUTER NETWO... (IJCNCJournal)
Machine learning (ML) and Deep Learning (DL) methods are being adopted rapidly, especially in computer network security, for tasks such as fraud detection, network anomaly detection, intrusion detection, and much more. However, the lack of transparency of ML and DL based models is a major obstacle to their implementation, and they are criticized for their black-box nature despite their tremendous results. Explainable Artificial Intelligence (XAI) is a promising area that can improve the trustworthiness of these models by giving explanations and interpreting their output. If the internal working of ML and DL based models is understandable, then it can further help to improve their performance. The objective of this paper is to show how XAI can be used to interpret the results of a DL model, the autoencoder in this case. Based on the interpretation, we improved its performance for computer network anomaly detection. The kernel SHAP method, which is based on Shapley values, is used as a novel feature selection technique. This method is used to identify only those features that are actually causing the anomalous behaviour of the set of attack/anomaly instances. Later, these feature sets are used to train and validate the autoencoder, but on benign data only. Finally, the built SHAP_Model outperformed the other two models proposed based on the feature selection method. The whole experiment is conducted on a subset of the latest CICIDS2017 network dataset. The overall accuracy and AUC of SHAP_Model are 94% and 0.969, respectively.
An intrusion detection system plays a major role in network security. We
propose a model “DB-OLS: An Approach for IDS” which is a Deviation Based-Outlier
approach for Intrusion detection using Self Organizing Maps. In this model “Self
Organizing Map” approach is to be used for behavior learning and “Outlier mining”
approach, for detecting an intruder by calculating deviation from known user profile.
This model aims to improve the capability of detecting intruders.
AN EFFICIENT INTRUSION DETECTION SYSTEM WITH CUSTOM FEATURES USING FPA-GRADIE... (IJCNCJournal)
An efficient Intrusion Detection System has to be given high priority when connecting systems to a network, so that the system is protected before an attack happens. It is a big challenge for the network security group to protect the system from various types of new attacks as technology grows in parallel. In this paper, an efficient model to detect intrusion is proposed to predict attacks with high accuracy and a lower false-negative rate by deriving custom features (UNSW-CF) from the benchmark intrusion dataset UNSW-NB15. To reduce the learning complexity, Custom Features are derived and then Significant Features are constructed by applying the meta-heuristic FPA (Flower Pollination Algorithm) and MRMR (Minimal Redundancy and Maximum Relevance), which reduces learning time and also increases prediction accuracy. ENC (ElasticNet Classifier), KRRC (Kernel Ridge Regression Classifier) and IGBC (Improved Gradient Boosting Classifier) are employed to classify the attacks in the datasets UNSW-CF and UNSW; UNSW-CF with derived custom features using IGBC integrated with FPA provided a high accuracy of 97.38% and a low error rate of 2.16%. Also, the sensitivity and specificity for IGBC attain high rates of 97.32% and 97.50% respectively.
Artificial Neural Content Techniques for Enhanced Intrusion Detection and Pre... (AM Publications)
This paper presents a novel approach for detecting network intrusions based on a competitive training neural
network. In the paper, the performance of this approach is compared to that of the self-organizing map (SOM), which is a
popular unsupervised training algorithm used in intrusion detection. While obtaining a detection rate similar in accuracy to
that of the SOM, the proposed approach uses only one fourth of the computation time of the SOM. Furthermore, the
clustering result of this method is independent of the number of initial neurons. This approach also exhibits the ability
to detect both known and unknown network attacks. The experimental results obtained by applying this approach to the
KDD-99 data set demonstrate that the proposed approach performs exceptionally in terms of both accuracy and
computation time.
ATTACK DETECTION AVAILING FEATURE DISCRETION USING RANDOM FOREST CLASSIFIER (CSEIJJournal)
The widespread use of the Internet has the adverse effect of making systems vulnerable to cyber attacks. Defensive
mechanisms like firewalls and IDSs have evolved with a lot of research contributions happening in these
areas. Machine learning techniques have been successfully used in these defense mechanisms, especially
IDSs. Although they are effective to some extent in identifying new patterns and variants of existing
malicious patterns, many attacks are still left undetected. The objective is to develop an algorithm for
detecting malicious domains based on passive traffic measurements. In this paper, an anomaly-based
intrusion detection system based on an ensemble machine learning classifier called Random Forest
with gradient boosting is deployed. The NSL-KDD cup dataset is used for analysis, and out of 41 features, 32
features were identified as significant using feature discretion. Our observations confirm the conjecture
that both the feature selection and the stochastic genetic operators improve the accuracy and the
effectiveness. The training time is shown to be reduced tremendously, by 98.59%, and accuracy improved to
98.75%.
Wmn06MODERNIZED INTRUSION DETECTION USING ENHANCED APRIORI ALGORITHM (ijwmn)
Communication networks are essential, and they raise many crucial issues today. Nowadays,
firewalls are considered the first line of defense, but their policies cannot meet the particular
requirements of the processes needed to achieve security. Much research has been done in this area, but
security needs are still not being met. Many models such as ADAM, DHP, LERAD and
ENTROPHY have already been proposed to resolve security problems, but we need an efficient model to detect new types
of intrusions within the entire network. In this paper, we propose a modernized
intrusion detection system which consists of two methods, anomaly detection and misuse detection. Both are
integrated and used to detect novel attacks. Our system discovers temporal patterns of
attacker behavior, which are profiled using the EAA (Enhanced Apriori Algorithm). This is
tested with a simple interface that displays the behavior of attacks effectively.
Online stream mining approach for clustering network traffic (eSAT Journals)
Abstract: A large body of research has been proposed on intrusion detection systems, leading to the implementation of agent-based intelligent IDS (IIDS), non-intelligent IDS (NIDS), signature-based IDS, etc. When building such IDS models, learning algorithms over the flow of network traffic play a crucial role in the accuracy of IDS systems. The proposed work focuses on implementing a novel method to cluster network traffic which eliminates the limitations of existing online clustering algorithms and proves its robustness and accuracy over a large stream of network traffic arriving at an extremely high rate. We compare the existing algorithms with the novel method to analyse accuracy and complexity. Keywords: NIDS, Data Stream Mining, Online Clustering, RAH algorithm, Online Efficient Incremental Clustering algorithm
A survey of Network Intrusion Detection using soft computing Technique (ijsrd.com)
With the arrival of the internet era, network security has become the key foundation for many financial and business applications. Intrusion detection is one of the approaches to resolving the problem of network security. An Intrusion Detection System (IDS) is a program that analyses what happens or has happened during an execution and tries to find indications that the computer has been misused. Here we propose a new approach that uses neuro-fuzzy methods and a support vector machine with a fuzzy genetic algorithm for a higher rate of detection.
COPYRIGHTThis thesis is copyright materials protected under the .docxvoversbyobersby
COPYRIGHT
This thesis is copyright material protected under the Berne Convention, the Copyright Act 1999 and other international and national enactments in that behalf, on intellectual property. It may not be reproduced by any means in full or in part except for short extracts in fair dealing for research or private study, critical scholarly review or discourse with acknowledgment, with written permission of the Dean School of Graduate Studies on behalf of both the author and XXX XXX University.
ABSTRACT
With the fast-growing internet world, the risk of intrusion has also increased; as a result, Intrusion Detection Systems (IDS) have become a key research field. IDS are used to identify any suspicious activity or patterns in the network or machine that threaten the security features or compromise the machine. IDS mostly use all the features of the data. It is a keen observation that not all features are of equal relevance for the detection of attacks. Moreover, not every feature contributes significantly to enhancing system performance. The main aim of the work is to develop an efficient denial of service network intrusion classification model. The specific objectives included: to analyse existing literature on intrusion detection systems (the techniques used to model IDS, the types of network attacks, the performance of various machine learning tools, and how network intrusion detection systems are assessed); to find the top network traffic attributes that can be used to model denial of service intrusion detection; and to develop a machine learning model for detection of denial of service network intrusion. Methods: The research design was experimental and data was collected by simulation using the NSL-KDD dataset. By implementing the Correlation Feature Selection (CFS) mechanism using three search algorithms, a smallest set of features is selected containing all the features that are selected very frequently. Findings: The smallest subset of features chosen is the most nominal among all the feature subsets found. Further, the performance of Artificial Neural Network (ANN), decision tree, Support Vector Machine (SVM) and K-Nearest Neighbour (KNN) classifiers is compared for the 7 subsets found by the filter model and the 41 attributes. Results: The outcome indicates a remarkable improvement in the performance metrics used for comparison of the two classifiers.
The results show that using the 17/18 selected features improves DoS type classification accuracy compared to using the 41 features in the NSL-KDD dataset. It was further observed that using an ensemble of three classifiers with decision fusion performs better than using a single classifier for DoS type classification. Among the machine learning tools tested, ANN achieved the best classification accuracies, followed by SVM and DT. KNN registered the lowest classification accuracies. Application: The proposed work with such an improved detection rate and lesser classification time and lar.
DETECTION OF ATTACKS IN WIRELESS NETWORKS USING DATA MINING TECHNIQUES (IAEME Publication)
With the progressive increase of network applications and electronic devices (computers, mobile phones, Android devices, etc.), attack and intrusion detection is becoming a very challenging task in the area of cybercrime detection. In this context, most existing approaches to attack detection rely mainly on a finite set of attacks. However, these solutions are vulnerable: they fail to detect some attacks when the sources of information are ambiguous or imperfect. Few approaches have started investigating this direction. Following this trend, this paper investigates the role of machine learning approaches (ANN, SVM) in classifying TCP connection traffic as normal or suspicious. However, using ANN or SVM individually is expensive. In this paper, a combination of two classifiers is proposed, in which an artificial neural network (ANN) classifier and a support vector machine (SVM) are employed. Additionally, our proposed solution allows the obtained classification results to be visualized. The accuracy of the proposed solution has been compared with the results of other classifiers. Experiments have been conducted with different network connections selected from the NSL-KDD DARPA dataset. Empirical results show that combining ANN and SVM techniques for attack detection is a promising direction.
1. Sonika Tiwari, Prof. Roopali Soni / International Journal of Engineering Research and
Applications (IJERA) ISSN: 2248-9622 www.ijera.com
Vol. 2, Issue 5, September- October 2012, pp.1495-1500
Performance Comparison Of Different Clustering Algorithms
With ID3 Decision Tree Learning Method For Network
Anomaly Detection
Sonika Tiwari*, Prof. Roopali Soni**
*(Department of Computer Science, OIST Bhopal)
** (Department of Computer Science, OIST Bhopal)
ABSTRACT
This paper proposes a combinatorial method based on different clustering algorithms combined with ID3 decision tree classification for network anomaly detection. The idea is to detect network anomalies by first applying a clustering algorithm to partition the data into a number of clusters and then applying the ID3 algorithm to decide whether or not an anomaly has been detected. An ID3 decision tree is constructed on each cluster. A special algorithm is used to combine the results of the two algorithms and obtain final anomaly score values. A threshold rule is then applied to decide whether a test instance is normal or abnormal. Here we compare the performance of the clustering algorithms to find the best one for detecting network anomalies. The algorithms applied are k-means, hierarchical clustering, and expectation maximization clustering. Each algorithm is first applied to data sets of captured network ARP traffic to group the instances into a number of clusters; ID3 decision tree classification is then applied on top of each clustering algorithm to detect network anomalies, and the performance of each clustering algorithm is compared.

I. INTRODUCTION
It is important for companies to keep their computer systems secure because their economic activities rely on them. Despite the existence of attack prevention mechanisms such as firewalls, most company computer networks are still the victims of attacks. According to the statistics of CERT [1], the number of reported incidents against computer networks increased from 252 in 1990 to 21756 in 2000 and to 137529 in 2003. This happened because of misconfiguration of firewalls or because malicious activities are generally cleverly designed to circumvent firewall policies. It is therefore crucial to have another line of defence in order to detect and stop malicious activities. This line of defence is the intrusion detection system (IDS).
During the last decades, different approaches to intrusion detection have been explored. The two most common approaches are misuse detection and anomaly detection. In misuse detection, attacks are detected by matching the current traffic pattern with the signature of known attacks. Anomaly detection keeps a profile of normal system behavior and interprets any significant deviation from this normal profile as malicious activity. One of the strengths of anomaly detection is the ability to detect new attacks. Anomaly detection’s most serious weakness is that it generates too many false alarms. Anomaly detection falls into two categories: supervised anomaly detection and unsupervised anomaly detection. In supervised anomaly detection, the instances of the data set used for training the system are labelled either as normal or as a specific attack type. The problem with this approach is that labelling the data is time consuming. Unsupervised anomaly detection, on the other hand, operates on unlabelled data. The advantage of using unlabelled data is that it is easy and inexpensive to obtain. The main challenge in performing unsupervised anomaly detection is distinguishing the normal data patterns from attack data patterns.
Recently, clustering has been investigated as one approach to solving this problem. As attack data patterns are assumed to differ from normal data patterns, clustering can be used to distinguish attack data patterns from normal data patterns. Clustering network traffic data is difficult because of:
1. the high data volume
2. the high data dimension
3. the skewed distribution of attack and normal classes
4. the mixture of categorical and continuous data
5. the pre-processing of the data required.

Network anomaly detection
As we explained earlier, detectors need models or rules for detecting intrusions. These models can be built off-line on the basis of earlier network traffic data gathered by agents. Once the model has been built, the task of detecting and stopping intrusions can be performed online. One of the weaknesses of this approach is that it is not adaptive. This is because small changes in traffic affect the model globally. Some approaches to anomaly detection perform the model construction and anomaly detection simultaneously on-line. In some of these approaches clustering has been used. One of the advantages of online modelling is that it is less time consuming because it does not require a
separate training phase. Furthermore, the model reflects the current nature of network traffic. The problem with this approach is that it can lead to inaccurate models. This happens because this approach fails to detect attacks performed systematically over a long period of time. These types of attacks can only be detected by analysing network traffic gathered over a long period of time. The clusters obtained by clustering network traffic data off-line can be used for either anomaly detection or misuse detection. For anomaly detection, it is the clusters formed by the normal data that are relevant for model construction. For misuse detection, it is the different attack clusters that are used for model construction.

Clustering is a division of data into groups of similar objects. Each group, called a cluster, consists of objects that are similar to one another and dissimilar to objects of other groups. Representing data by fewer clusters necessarily loses certain fine details, but achieves simplification. It represents many data objects by few clusters, and hence it models data by its clusters.
Cluster analysis is the organization of a collection of patterns (usually represented as a vector of measurements, or a point in a multidimensional space) into clusters based on similarity. Patterns within a valid cluster are more similar to each other than they are to a pattern belonging to a different cluster. It is important to understand the difference between clustering (unsupervised classification) and discriminant analysis (supervised classification). In supervised classification, we are provided with a collection of labelled (preclassified) patterns; the problem is to label a newly encountered, yet unlabelled, pattern. Typically, the given labelled (training) patterns are used to learn the descriptions of classes, which in turn are used to label a new pattern. In the case of clustering, the problem is to group a given collection of unlabelled patterns into meaningful clusters. In a sense, labels are associated with clusters also, but these category labels are data driven; that is, they are obtained solely from the data [2,3,4].

ID3 Algorithm
The ID3 algorithm (Inducing Decision Trees) was originally introduced by Quinlan in [11] and is described below in Algorithm 1. Here we briefly recall the steps involved in the algorithm. For a thorough discussion of the algorithm we refer the interested reader to [10].
Algorithm 1: ID3(R, C, S)
Require: R, a set of attributes.
Require: C, the class attribute.
Require: S, data set of tuples.
1: if R is empty then
2:   Return the leaf having the most frequent class value in data set S.
3: else if all tuples in S have the same class value then
4:   Return a leaf with that specific class value.
5: else
6:   Determine attribute A with the highest information gain in S.
7:   Partition S in m parts S(a1), ..., S(am) such that a1, ..., am are the different values of A.
8:   Return a tree with root A and m branches labeled a1, ..., am, such that branch i contains ID3(R − {A}, C, S(ai)).
9: end if

II. RELATED WORK
Paper: A Novel Unsupervised Classification Approach for Network Anomaly Detection by K-Means Clustering and ID3 Decision Tree Learning Methods [8].
Author: Yasser Yasami, Saadat Pour Mozaffari, Computer Engineering Department, Amirkabir University of Technology (AUT), Tehran, Iran.
Abstract: This paper presents a novel host-based combinatorial method based on k-Means clustering and ID3 decision tree learning algorithms for unsupervised classification of anomalous and normal activities in computer network ARP traffic. The k-Means clustering method is first applied to the normal training instances to partition them into k clusters using Euclidean distance similarity. An ID3 decision tree is constructed on each cluster. Anomaly scores from the k-Means clustering algorithm and decisions of the ID3 decision trees are extracted. A special algorithm is used to combine the results of the two algorithms and obtain final anomaly score values. A threshold rule is applied to decide whether a test instance is normal or abnormal.
Conclusion: The proposed method is compared with the individual k-Means and ID3 methods and with other proposed approaches based on Markovian chains and stochastic learning automata in terms of overall classification performance, defined over five different performance measures. Results on real evaluation test bed network data sets show that the proposed method outperforms the individual k-Means and ID3 methods as well as the other approaches.
Paper: Privacy Preserving ID3 over Horizontally, Vertically and Grid Partitioned Data [7].
Author: Bart Kuijpers, Vanessa Lemmens, Bart Moelans, Theoretical Computer Science, Hasselt University & Transnational University Limburg, Belgium.
Abstract: We consider privacy preserving decision tree induction via ID3 in the case where the training data is horizontally or vertically distributed.
Furthermore, we consider the same problem in the
3. Sonika Tiwari, Prof. Roopali Soni / International Journal of Engineering Research and
Applications (IJERA) ISSN: 2248-9622 www.ijera.com
Vol. 2, Issue 5, September- October 2012, pp.1495-1500
case where the data is both horizontally and vertically distributed, a situation we refer to as grid partitioned data. We give an algorithm for privacy preserving ID3 over horizontally partitioned data involving more than two parties. For grid partitioned data, we discuss two different evaluation methods for privacy preserving ID3, namely, first merging horizontally and developing vertically, or first merging vertically and next developing horizontally. Next to introducing privacy preserving data mining over grid-partitioned data, the main contribution of this paper is that we show, by means of a complexity analysis, that the former evaluation method is the more efficient.
Conclusion: When the datasets are partitioned horizontally and vertically and the clustering algorithm is then applied, performance is better than on the whole datasets.
Paper: A comparison of clustering method for unsupervised anomaly detection in network traffic [5].

Author: Koffi Bruno Yao.
Abstract: Network anomaly detection aims at detecting malicious activities in computer network traffic data. In this approach, the normal profile of the network traffic is modelled and any significant deviation from this normal profile is interpreted as malicious. While supervised anomaly detection models the normal traffic behaviour on the basis of an attack free data set, unsupervised anomaly detection works on a data set which contains both normal and attack data. Clustering has recently been investigated as one way of approaching the issues of unsupervised anomaly detection.
Conclusion: The main goal of the paper has been to investigate the efficiency of different classical clustering algorithms in clustering network traffic data for unsupervised anomaly detection. The clusters obtained by clustering the network traffic data set are intended to be used by a security expert for manual labelling. A second goal has been to study some possible ways of combining these algorithms in order to improve their performance.
Paper: Comparisons between Data Clustering Algorithms [6].

Author: Osama Abu Abbas, Computer Science Department, Yarmouk University, Jordan.
Abstract: Clustering is a division of data into groups of similar objects. Each group, called a cluster, consists of objects that are similar between themselves and dissimilar compared to objects of other groups. This paper is intended to study and compare different data clustering algorithms. The algorithms under investigation are: k-means algorithm, hierarchical clustering algorithm, self-organizing maps algorithm, and expectation maximization clustering algorithm. All these algorithms are compared according to the following factors: size of dataset, number of clusters, type of dataset and type of software used.
Conclusion: The main conclusion is the performance comparison of different clustering algorithms.
Paper: Dynamic Network Evolution: Models, Clustering, Anomaly Detection [9].

Author: Cemal Cagatay Bilgin and Bülent Yener, Rensselaer Polytechnic Institute, Troy NY, 12180.
Abstract: Traditionally, research on graph theory focused on studying graphs that are static. However, almost all real networks are dynamic in nature and large in size. Quite recently, research areas for studying the topology, evolution, applications of complex evolving networks and processes occurring in them and governing them attracted attention from researchers. In this work, we review the significant contributions in the literature on complex evolving networks; metrics used from degree distribution to spectral graph analysis, real world applications from biology to social sciences, problem domains from anomaly detection, dynamic graph clustering to community detection.
Conclusion: Many real world complex systems can be represented as graphs. The entities in these systems represent the nodes or vertices, and links or edges connect a pair or more of the nodes. We encounter such networks in almost any application domain, e.g. computer science, sociology, chemistry, biology, anthropology, psychology, geography, history, engineering.

III. PROPOSED SCHEME

Algorithm 1: K-means Clustering Algorithm
1) Pick a number (K) of cluster centers (at random)
2) Assign every item to its nearest cluster center (e.g. using Euclidean distance)
3) Move each cluster center to the mean of its assigned items
4) Repeat steps 2, 3 until convergence (change in cluster assignments less than a threshold).

Algorithm 2: Hierarchical Clustering
Bottom up
1) Start with single-instance clusters
2) At each step, join the two closest clusters
3) Design decision: distance between clusters
Top down
1) Start with one universal cluster
2) Find two clusters
3) Proceed recursively on each subset
4) Can be very fast
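The four steps of the K-means algorithm listed above can be sketched in a few lines of Python. This is a minimal illustration and not the implementation used in the experiments: the function name `kmeans`, the `seed` and `max_iter` parameters, and the choice to stop when the assignments no longer change at all (a threshold of zero) are assumptions made for the example.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Cluster `points` (tuples of floats) into k groups; returns (centers, labels)."""
    rng = random.Random(seed)
    # Step 1: pick K cluster centers at random from the data.
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Step 2: assign every item to its nearest center (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        # Step 4: stop once the assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: move each center to the mean of its assigned items.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster went empty
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centers, labels
```

Categorical attributes such as the ones used later in this paper would first have to be encoded numerically before a Euclidean distance is meaningful; the sketch assumes that has already been done.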
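The bottom-up variant of hierarchical clustering described above can be sketched as follows. The "design decision" on the distance between clusters is taken here to be single linkage (distance of the closest pair of members); that choice, and the names `single_link` and `agglomerative`, are assumptions for illustration only.

```python
import math


def single_link(a, b):
    """Distance between two clusters: distance of their closest pair of members."""
    return min(math.dist(p, q) for p in a for q in b)


def agglomerative(points, n_clusters, linkage=single_link):
    """Merge clusters bottom-up until only n_clusters remain."""
    # Step 1: start with single-instance clusters.
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        # Step 2: find the two closest clusters under the chosen linkage...
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        # ...and join them (j > i, so popping j leaves index i valid).
        clusters[i].extend(clusters.pop(j))
    return clusters
```

Passing a different `linkage` function (for example one based on the farthest pair) changes the shape of the clusters without touching the merge loop, which is the point of step 3 in the listing.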
Algorithm 3: Density Based Clustering

1) Select a point p.
2) Retrieve all points density-reachable from p wrt Eps and MinPts.
3) If p is a core point, a cluster is formed.
4) If p is a border point, no points are density-reachable from p and DBSCAN visits the next point of the database.
5) Continue the process until all of the points have been processed.

Algorithm 4: Proposed ID3 Algorithm
Input Layer
Define P1, P2, ..., Pn Parties (horizontally partitioned).
Each Party contains R set of attributes A1, A2, ..., AR.
C the class attribute contains c class values C1, C2, ..., Cc.
For party Pi where i = 1 to n do
  If R is Empty Then
    Return a leaf node with class value
  Else If all transactions in T(Pi) have the same class Then
    Return a leaf node with the class value
  Else
    Calculate the Expected Information needed to classify the given sample for each party Pi individually.
    Calculate Entropy for each attribute (A1, A2, ..., AR) of each party Pi.
    Calculate Information Gain for each attribute (A1, A2, ..., AR) of each party Pi.
    Calculate Total Information Gain for each attribute of all parties (TotalInformationGain( )).
    ABestAttribute <- MaxInformationGain( )
    Let V1, V2, ..., Vm be the values of the attribute.
    ABestAttribute partitions the P1, P2, ..., Pn parties into m parties
      P1(V1), P1(V2), ..., P1(Vm)
      P2(V1), P2(V2), ..., P2(Vm)
      . .
      . .
      Pn(V1), Pn(V2), ..., Pn(Vm)
    Return the Tree whose Root is labelled ABestAttribute and has m edges labelled V1, V2, ..., Vm, such that for every i the edge Vi goes to the Tree NPPID3(R - ABestAttribute, C, (P1(Vi), P2(Vi), ..., Pn(Vi)))
End.

IV. DATASET USED
Here we use a dataset with a number of attributes and a class attribute that indicates whether an anomaly has occurred or not.

Node           {A, B, C}
Load           {High, Low}
Transmission   {TCP, UDP}
Mac Address    {192.168.1.1, 192.168.1.2}
Class_anomaly  {yes, no}

A,low,192.168.1.1,udp,no       A,low,192.168.1.1,udp,no
C,low,192.168.1.1,udp,yes      C,low,192.168.1.1,udp,yes
A,high,192.168.1.2,tcp,yes     A,high,192.168.1.2,tcp,yes
B,high,192.168.1.2,tcp,yes     B,high,192.168.1.2,tcp,yes
A,low,192.168.1.1,udp,no       B,low,192.168.1.1,tcp,yes
C,low,192.168.1.1,udp,yes      A,low,192.168.1.1,udp,no
A,high,192.168.1.2,tcp,yes     C,low,192.168.1.1,udp,yes
B,high,192.168.1.2,tcp,yes     A,high,192.168.1.2,tcp,yes
B,low,192.168.1.1,udp,yes      B,high,192.168.1.2,tcp,yes
A,low,192.168.1.1,udp,no       B,low,192.168.1.1,udp,yes
A,high,192.168.1.2,tcp,yes     A,low,192.168.1.1,udp,no
B,low,192.168.1.1,udp,yes      A,high,192.168.1.2,tcp,yes
A,low,192.168.1.1,udp,no       C,high,192.168.1.1,tcp,no
A,high,192.168.1.2,tcp,yes     A,high,192.168.1.2,tcp,yes

A,low,192.168.1.1,udp,no,cluster0       A,low,192.168.1.1,udp,no,cluster0
C,low,192.168.1.1,udp,yes,cluster0      C,low,192.168.1.1,udp,yes,cluster0
A,high,192.168.1.2,tcp,yes,cluster1     A,high,192.168.1.2,tcp,yes,cluster0
B,high,192.168.1.2,tcp,yes,cluster1     B,high,192.168.1.2,tcp,yes,cluster0
A,low,192.168.1.1,udp,no,cluster0       B,low,192.168.1.1,tcp,yes,cluster0
C,low,192.168.1.1,udp,yes,cluster0      A,low,192.168.1.1,udp,no,cluster0
A,high,192.168.1.2,tcp,yes,cluster1     C,low,192.168.1.1,udp,yes,cluster0
B,high,192.168.1.2,tcp,yes,cluster1     A,high,192.168.1.2,tcp,yes,cluster0
B,low,192.168.1.1,udp,yes,cluster0      B,high,192.168.1.2,tcp,yes,cluster0
A,low,192.168.1.1,udp,no,cluster0       B,low,192.168.1.1,udp,yes,cluster0
A,high,192.168.1.2,tcp,yes,cluster1     A,low,192.168.1.1,udp,no,cluster0
B,low,192.168.1.1,udp,yes,cluster0      A,high,192.168.1.2,tcp,yes,cluster0
A,low,192.168.1.1,udp,no,cluster0       C,high,192.168.1.1,tcp,no,cluster1
A,high,192.168.1.2,tcp,yes,cluster1     A,high,192.168.1.2,tcp,yes,cluster0
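The entropy and information-gain steps that ID3 relies on can be illustrated on records of the kind listed above. The sketch below uses the 14 unclustered records from the first column of the listing and the standard definitions of entropy and information gain; the column labels in `ATTRS` and the function names are assumptions chosen for the example, not identifiers from the paper.

```python
import math
from collections import Counter

# The 14 records of the first listing: node, load, address, transmission, class.
ROWS = [line.split(",") for line in """\
A,low,192.168.1.1,udp,no
C,low,192.168.1.1,udp,yes
A,high,192.168.1.2,tcp,yes
B,high,192.168.1.2,tcp,yes
A,low,192.168.1.1,udp,no
C,low,192.168.1.1,udp,yes
A,high,192.168.1.2,tcp,yes
B,high,192.168.1.2,tcp,yes
B,low,192.168.1.1,udp,yes
A,low,192.168.1.1,udp,no
A,high,192.168.1.2,tcp,yes
B,low,192.168.1.1,udp,yes
A,low,192.168.1.1,udp,no
A,high,192.168.1.2,tcp,yes""".splitlines()]
ATTRS = ["node", "load", "address", "transmission"]  # assumed column labels


def entropy(rows):
    """Shannon entropy (in bits) of the class label, taken from the last column."""
    n = len(rows)
    counts = Counter(r[-1] for r in rows)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def information_gain(rows, col):
    """Reduction in class entropy obtained by splitting `rows` on column `col`."""
    groups = {}
    for r in rows:
        groups.setdefault(r[col], []).append(r)
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(rows) - remainder


gains = {name: information_gain(ROWS, i) for i, name in enumerate(ATTRS)}
best_attribute = max(gains, key=gains.get)  # ties resolve to the first attribute
```

ID3 would place `best_attribute` at the root, split the records by its values, and repeat the same computation on each subset with that attribute removed.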
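Algorithm 4 computes a total information gain per attribute across the horizontally partitioned parties. One plausible reading, sketched below, is that each party contributes only its local counts of (attribute value, class) pairs, the counts are pooled, and the gain is computed from the pooled counts. This is an assumption about how `TotalInformationGain( )` might work, not the authors' definition; the function names are hypothetical, and no privacy mechanism is included.

```python
import math
from collections import Counter


def local_counts(rows, col):
    """One party's contribution: counts of (attribute value, class) pairs."""
    return Counter((r[col], r[-1]) for r in rows)


def _entropy(class_counts):
    """Shannon entropy (in bits) of a collection of class counts."""
    n = sum(class_counts)
    return -sum(c / n * math.log2(c / n) for c in class_counts if c)


def gain_from_counts(counts):
    """Information gain of an attribute, computed from pooled (value, class) counts."""
    n = sum(counts.values())
    by_class, by_value = Counter(), {}
    for (value, cls), c in counts.items():
        by_class[cls] += c
        by_value.setdefault(value, Counter())[cls] += c
    remainder = sum(
        sum(vc.values()) / n * _entropy(vc.values()) for vc in by_value.values()
    )
    return _entropy(by_class.values()) - remainder


def total_information_gain(parties, col):
    """Gain for column `col` over all parties' rows, using only their local counts."""
    # Plain summation of the per-party counts; a privacy-preserving protocol
    # would replace this step with a secure aggregation of the same counts.
    pooled = sum((local_counts(rows, col) for rows in parties), Counter())
    return gain_from_counts(pooled)
```

Because the gain depends on the data only through these counts, pooling them yields the same value as running the computation on the merged dataset, while no party has to expose its individual records to the others.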
V. RESULT ANALYSIS

number_of_instances   id3_time(ms)   HP_time(ms)
14                    78             15
25                    93             15
50                    110            16
100                   125            31
200                   150            32
Table 1.
Table 1 shows the time needed to reach a decision on each dataset. It was observed that the existing ID3 takes more time than our proposed work.
Where,
HP is the proposed horizontal partitioned based ID3.
Relative absolute error can be calculated, in its standard form, as:
RAE = (|p1 - a1| + ... + |pn - an|) / (|a1 - ā| + ... + |an - ā|)
Mean squared error can be calculated, in its standard form, as:
MSE = ((p1 - a1)^2 + ... + (pn - an)^2) / n
with
Actual target values: a1 a2 … an
Predicted target values: p1 p2 … pn
and ā denoting the mean of a1 … an.

number_of_instances   ID3 Mean absolute error   HP Mean absolute error
14                    0.2857                    NP
25                    0.24                      0.237
50                    0.24                      0.24
100                   0.23                      0.22
200                   0.235                     0.23
Table 2.
In Table 2, our proposed work shows a lower mean squared error than the existing algorithm.

number_of_instances   ID3_Relative absolute error   HP_Relative absolute error
14                    60%                           NP
25                    63.22%                        60.13%
50                    64.53%                        63.24%
100                   64.28%                        63.63%
200                   65.06%                        64.28%
Table 3.
In Table 3, it was observed that the proposed algorithm has a lower relative absolute error than the existing algorithm.

Clustering with proposed id3     Time (ms)   Mean absolute error   Mean absolute error
K-mean with proposed id3         47          0.0714                14.2105 %
Hierarchical with proposed id3   31          0.0357                36.5854 %
EM with proposed id3             43          0.0238                5.4119 %
Table 4.
Table 4 shows a comparative study of the different clustering algorithms combined with our proposed algorithm.

Clustering with existing id3     Time (ms)   Mean absolute error   Mean absolute error
K-mean with existing id3         65          0.0914                20.2105 %
Hierarchical with existing id3   50          0.0557                45.5854 %
EM with existing id3             60          0.0438                7.4119 %
Table 5

VI. CONCLUSION
Clustering algorithms are used to divide a dataset into a number of clusters. Here, clustering algorithms are combined with the ID3 algorithm to detect network anomalies, and the performance is compared with the other clustering algorithms. The proposed algorithm implemented here provides a way of classifying, and better learning of, network anomalies and normal activities in computer network ARP traffic.
REFERENCES
[1] A. Lazarevic, L. Ertoz, V. Kumar, A. Ozgur, J. Srivastava. A Comparative Study of Anomaly Detection Schemes in Network Intrusion Detection.
[2] Keogh E., Chakrabarti K., Pazzani M., and Mehrotra S., “Dimensionality Reduction for Fast Similarity Search in Large Time Series Databases,” Knowledge and Information Systems, vol. 3, pp. 263-286, 2001.
[3] Lepere R. and Trystram D., “A New Clustering Algorithm for Large Communication Delays,” in Proceedings of 16th IEEE-ACM Annual International Parallel and Distributed Processing Symposium (IPDPS’02), Fort Lauderdale, USA, 2002.
[4] Li C. and Biswas G., “Unsupervised Learning with Mixed Numeric and Nominal Data,” IEEE Transactions on Knowledge and Data Engineering, vol. 14, no. 4, pp. 673-690, 2002.
[5] Koffi Bruno Yao. A comparison of clustering method for unsupervised anomaly detection in network traffic.
[6] Osama Abu Abbas. Comparisons between Data Clustering Algorithms. Computer Science Department, Yarmouk University, Jordan.
[7] Bart Kuijpers, Vanessa Lemmens, Bart Moelans. Privacy Preserving ID3 over Horizontally, Vertically and Grid Partitioned Data. Theoretical Computer Science, Hasselt University & Transnational University Limburg, Belgium.
[8] Yasser Yasami, Saadat Pour Mozaffari. A Novel Unsupervised Classification Approach for Network Anomaly Detection by K-Means Clustering and ID3 Decision Tree Learning Methods. Computer Engineering Department, Amirkabir University of Technology (AUT), Tehran, Iran.
[9] Cemal Cagatay Bilgin and Bülent Yener. Dynamic Network Evolution: Models, Clustering, Anomaly Detection. Rensselaer Polytechnic Institute, Troy NY, 12180.
[10] Wenke Lee and S. J. Stolfo. Data Mining Approaches for Intrusion Detection, 1998.
[11] Stefano Zanero and Sergio M. Savaresi. Unsupervised learning techniques for an intrusion detection system, ACM, March 2004.
[12] Martin Ester, Hans-Peter Kriegel, Jorg Sander, Xiaowei Xu. A density-based clustering algorithm for discovering Clusters in Large Spatial databases with noise. Proceedings of 2nd International Conference on Knowledge Discovery and Data Mining, 1996.
[13] A. Wespi, G. Vigna and L. Deri. Recent Advances in Intrusion Detection. 5th International Symposium, RAID 2002, Zurich, Switzerland, October 2002, Proceedings. Springer.
[14] G. Qu, S. Hariri, and M. Yousif, “A New Dependency and Correlation Analysis for Features,” IEEE Trans. Knowledge and Data Eng., vol. 17, no. 9, pp. 1199-1207, Sept. 2005.
[15] J. Kittler, M. Hatef, R.P.W. Duin, and J. Matas, “On Combining Classifiers,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 20, no. 3, pp. 226-239, Mar. 1998.