An Intrusion Detection System (IDS) is a defense measure that supervises activities of the computer network and reports the malicious activities to the network administrator. Intruders do many attempts to gain access to the network and try to harm the organization’s data. Thus the security is the most important aspect for any type of organization. Due to these reasons, intrusion detection has been an important research issue. An IDS can be broadly classified as Signature based IDS and Anomaly based IDS. In our proposed work, the decision tree
algorithm is developed based on C4.5 decision tree approach. Feature selection and split value are important issues for constructing a decision tree. In this paper, the algorithm is designed to address these two issues. The most relevant features are selected using information gain and the split value is selected in such a way that makes the classifier unbiased towards most frequent values. Experimentation is performed on NSL-KDD (Network Security Laboratory Knowledge Discovery and Data Mining) dataset based on number of features. The time taken by the classifier to construct the model and the accuracy achieved is analyzed. It is concluded that the proposed Decision Tree Split (DTS) algorithm can be used for signature based intrusion detection.
Hybrid Approach for Intrusion Detection Model Using Combination of K-Means Cl...theijes
Any violation of information security policy with malicious intent is regarded as an intrusion. The fast evolving new kind of intrusions poses a very serious threat to system security, although there has been the rapid development of several security tools to counter the growing threats, intrusive activities are still growing. Many Intrusion Detection models have been implemented since the concept of Intrusion Detection emerged, but the majority of the existing Intrusion detection models have many drawbacks which include but not limited to low accuracy in detection, high false alarm rates, adaptability weakness, inability to detect new intrusions etc. The main aim of this study is proposing a model that combined simple K-Means clustering Algorithms and Random Forest classification technique that will have minimum false alarms rate and high accuracy detection rate. The experiment was carried out in WEKA 3.8 using the NSL-KDD dataset to process the dataset and obtained the results. At the end of training and testing of the proposed study, the results indicated that the proposed approach achieved improved accuracy and reduced false alarm rates by 99.98% and 0.14% respectively.
C LUSTERING B ASED A TTRIBUTE S UBSET S ELECTION U SING F AST A LGORITHmIJCI JOURNAL
In machine learning and data mining, attribute sel
ect is the practice of selecting a subset o
f most
consequential attributes for utilize in model const
ruction. Using an attribute select method is that t
he data
encloses many redundant or extraneous attributes. W
here redundant attributes are those which sup
ply
no supplemental information than the presently
selected attributes, and impertinent attribut
es offer
no valuable information in any context
Survey of network anomaly detection using markov chainijcseit
Recently an internet threat has been increased. Our motive is detect the intrusion in the network in concise.
The real time issue such as DoS attack in banking, companies, industries and organization have been
increased significantly IDS has been used in both server and host side. The major challenge is to effectively
predict the periods of threats and protect the server from the unauthorized user. In this study, a novel
probabilistic approach is proposed effectively to detect the network intrusions. It uses a Markov chain for
probabilistic modelling of abnormal events in network systems. The degree of abnormality of the incoming
data is performed on the basis of the network states.
Implementation of digital image watermarking techniques using dwt and dwt svd...eSAT Journals
Abstract
These days, in every field there is gigantic utilization of computerized substance. Data took care of on web and mixed media system framework is in advanced structure. Computerized watermarking is only the innovation in which there is inserting of different data in advanced substance, which we need to shield from illicit replicating. Computerized picture watermarking is concealing data in any structure (content, picture, sound and video) in unique picture without corrupting its perceptual quality. On the off chance that of Discrete Wavelet Transform (DWT), deterioration of the first picture is completed to insert the watermark. Moreover, if there should arise an occurrence of cross breed system (DWT-SVD) firstly picture is decayed by and after that watermark is installed in solitary qualities acquired by application of Singular Value Decomposition (SVD). DWT and SVD are utilized in combination to enhance the nature of watermarking. We have the procedures which are looked at on the premise of Peak Signal to Noise Ratio (PSNR) esteem at various benefits of scaling component; high estimation of PSNR is coveted because it displays great intangibility of the strategy.
Due to continuous growth of the Internet technology, it needs to establish security mechanism. Intrusion Detection System (IDS) is increasingly becoming a crucial component for computer and network security systems. Most of the existing intrusion detection techniques emphasize on building intrusion detection model based on all features provided. Some of these features are irrelevant or redundant. This paper is proposed to identify important input features in building IDS that is computationally efficient and effective. In this paper, we identify important attributes for each attack type by analyzing the detection rate. We input the specific attributes for each attack types to classify using Naive Bayes, and Random Forest. We perform our experiments on NSL-KDD intrusion detection data set, which consists of selected records of the complete KDD Cup 1999 intrusion detection dataset.
High performance intrusion detection using modified k mean & naïve bayeseSAT Journals
Abstract
Internet Technology is growing at exponential rate day by day, making data security of computer systems more complex and critical. There has been multiple methodology implemented for the same in recent time as detailed in [1], [3]. Availability of larger bandwidth has made the multiple large computer server network connected worldwide and thus increasing the load on the necessity to secure data and Intrusion detection system (IDS) is one of the most efficient technique to maintain security of computer system. The proposed system is designed in such a way that are helpful in identifying malicious behavior and improper use of computer system. In this report we proposed a hybrid technique for intrusion detection using data mining algorithms. Our main objective is to do complete analysis of intrusion detection Dataset to test the implemented system.In This report we will propose a new methodology in which Modified k-mean is used for clustering whereas Naïve Bayes for the classification. These two data mining techniques will be used for Intrusion detection in large horizontally distributed database.
Keywords: Intrusion Detection, Modified K-Mean, Naïve Bays
Hybrid Approach for Intrusion Detection Model Using Combination of K-Means Cl...theijes
Any violation of information security policy with malicious intent is regarded as an intrusion. The fast evolving new kind of intrusions poses a very serious threat to system security, although there has been the rapid development of several security tools to counter the growing threats, intrusive activities are still growing. Many Intrusion Detection models have been implemented since the concept of Intrusion Detection emerged, but the majority of the existing Intrusion detection models have many drawbacks which include but not limited to low accuracy in detection, high false alarm rates, adaptability weakness, inability to detect new intrusions etc. The main aim of this study is proposing a model that combined simple K-Means clustering Algorithms and Random Forest classification technique that will have minimum false alarms rate and high accuracy detection rate. The experiment was carried out in WEKA 3.8 using the NSL-KDD dataset to process the dataset and obtained the results. At the end of training and testing of the proposed study, the results indicated that the proposed approach achieved improved accuracy and reduced false alarm rates by 99.98% and 0.14% respectively.
C LUSTERING B ASED A TTRIBUTE S UBSET S ELECTION U SING F AST A LGORITHmIJCI JOURNAL
In machine learning and data mining, attribute sel
ect is the practice of selecting a subset o
f most
consequential attributes for utilize in model const
ruction. Using an attribute select method is that t
he data
encloses many redundant or extraneous attributes. W
here redundant attributes are those which sup
ply
no supplemental information than the presently
selected attributes, and impertinent attribut
es offer
no valuable information in any context
Survey of network anomaly detection using markov chainijcseit
Recently an internet threat has been increased. Our motive is detect the intrusion in the network in concise.
The real time issue such as DoS attack in banking, companies, industries and organization have been
increased significantly IDS has been used in both server and host side. The major challenge is to effectively
predict the periods of threats and protect the server from the unauthorized user. In this study, a novel
probabilistic approach is proposed effectively to detect the network intrusions. It uses a Markov chain for
probabilistic modelling of abnormal events in network systems. The degree of abnormality of the incoming
data is performed on the basis of the network states.
Implementation of digital image watermarking techniques using dwt and dwt svd...eSAT Journals
Abstract
These days, in every field there is gigantic utilization of computerized substance. Data took care of on web and mixed media system framework is in advanced structure. Computerized watermarking is only the innovation in which there is inserting of different data in advanced substance, which we need to shield from illicit replicating. Computerized picture watermarking is concealing data in any structure (content, picture, sound and video) in unique picture without corrupting its perceptual quality. On the off chance that of Discrete Wavelet Transform (DWT), deterioration of the first picture is completed to insert the watermark. Moreover, if there should arise an occurrence of cross breed system (DWT-SVD) firstly picture is decayed by and after that watermark is installed in solitary qualities acquired by application of Singular Value Decomposition (SVD). DWT and SVD are utilized in combination to enhance the nature of watermarking. We have the procedures which are looked at on the premise of Peak Signal to Noise Ratio (PSNR) esteem at various benefits of scaling component; high estimation of PSNR is coveted because it displays great intangibility of the strategy.
Due to continuous growth of the Internet technology, it needs to establish security mechanism. Intrusion Detection System (IDS) is increasingly becoming a crucial component for computer and network security systems. Most of the existing intrusion detection techniques emphasize on building intrusion detection model based on all features provided. Some of these features are irrelevant or redundant. This paper is proposed to identify important input features in building IDS that is computationally efficient and effective. In this paper, we identify important attributes for each attack type by analyzing the detection rate. We input the specific attributes for each attack types to classify using Naive Bayes, and Random Forest. We perform our experiments on NSL-KDD intrusion detection data set, which consists of selected records of the complete KDD Cup 1999 intrusion detection dataset.
High performance intrusion detection using modified k mean & naïve bayeseSAT Journals
Abstract
Internet Technology is growing at exponential rate day by day, making data security of computer systems more complex and critical. There has been multiple methodology implemented for the same in recent time as detailed in [1], [3]. Availability of larger bandwidth has made the multiple large computer server network connected worldwide and thus increasing the load on the necessity to secure data and Intrusion detection system (IDS) is one of the most efficient technique to maintain security of computer system. The proposed system is designed in such a way that are helpful in identifying malicious behavior and improper use of computer system. In this report we proposed a hybrid technique for intrusion detection using data mining algorithms. Our main objective is to do complete analysis of intrusion detection Dataset to test the implemented system.In This report we will propose a new methodology in which Modified k-mean is used for clustering whereas Naïve Bayes for the classification. These two data mining techniques will be used for Intrusion detection in large horizontally distributed database.
Keywords: Intrusion Detection, Modified K-Mean, Naïve Bays
Evaluation of network intrusion detection using markov chainIJCI JOURNAL
Day today life internet threat has been increased significantly. There is a need to develop model in order to
maintain security of system. The most effective techniques are Intrusion Detection System (IDS).The
purpose of intrusion system through the security devices detect and deal with it. In this paper, a
mathematical approach is used effectively to predict and detect intrusion in the network. Here we discuss
about two algorithms ‘K-Means + Apriori’, a method which classify normal and abnormal activities in
computer network. In K-Means process, it partitions the training set into K-clusters using Euclidean
distance and introduce an outlier factor, then it build Apriori Algorithm to prune the data by removing
infrequent data in the database. Based on defined state the degree of incoming data is evaluated through
the experiment using sample DARPA2000 dataset, and achieves high detection performance in level of
attack in stages.
Intrusion Detection System for Classification of Attacks with Cross Validationinventionjournals
Now days, due to rapidly uses of internet, the patterns of network attacks are increasing. There are various organizations and institutes are using internet and access or share the sensitive information in network. To protect information from unauthorized or intruders is one of the important issues. In this paper, we have used decision tree techniques like C4.5 and CART as classifier for classification of attacks. We have proposed an ensemble model that is combination of C4.5 and Classification and Regression Tree (CART) as robust classifier for classification of attacks. We have used NSL-KDD data set with binary and multiclass problem with 10-fold cross validation. The proposed ensemble model gives satisfactory accuracy as 99.67% and 99.53% in case of binary class and multiclass NSL-KDD data set respectively.
A novel ensemble modeling for intrusion detection system IJECEIAES
Vast increase in data through internet services has made computer systems more vulnerable and difficult to protect from malicious attacks. Intrusion detection systems (IDSs) must be more potent in monitoring intrusions. Therefore an effectual Intrusion Detection system architecture is built which employs a facile classification model and generates low false alarm rates and high accuracy. Noticeably, IDS endure enormous amounts of data traffic that contain redundant and irrelevant features, which affect the performance of the IDS negatively. Despite good feature selection approaches leads to a reduction of unrelated and redundant features and attain better classification accuracy in IDS. This paper proposes a novel ensemble model for IDS based on two algorithms Fuzzy Ensemble Feature selection (FEFS) and Fusion of Multiple Classifier (FMC). FEFS is a unification of five feature scores. These scores are obtained by using feature-class distance functions. Aggregation is done using fuzzy union operation. On the other hand, the FMC is the fusion of three classifiers. It works based on Ensemble decisive function. Experiments were made on KDD cup 99 data set have shown that our proposed system works superior to well-known methods such as Support Vector Machines (SVMs), K-Nearest Neighbor (KNN) and Artificial Neural Networks (ANNs). Our examinations ensured clearly the prominence of using ensemble methodology for modeling IDSs, and hence our system is robust and efficient.
TUPLE VALUE BASED MULTIPLICATIVE DATA PERTURBATION APPROACH TO PRESERVE PRIVA...IJDKP
Huge volume of data from domain specific applications such as medical, financial, library, telephone,
shopping records and individual are regularly generated. Sharing of these data is proved to be beneficial
for data mining application. On one hand such data is an important asset to business decision making by
analyzing it. On the other hand data privacy concerns may prevent data owners from sharing information
for data analysis. In order to share data while preserving privacy, data owner must come up with a solution
which achieves the dual goal of privacy preservation as well as an accuracy of data mining task –
clustering and classification. An efficient and effective approach has been proposed that aims to protect
privacy of sensitive information and obtaining data clustering with minimum information loss
PREDICTION OF MALIGNANCY IN SUSPECTED THYROID TUMOUR PATIENTS BY THREE DIFFER...cscpconf
In the present study, the abilities of three classification methods of data mining namely artificial
neural networks with feed-forward back propagation algorithm, J48 decision tree method and
logistic regression analysis are compared in a medical real dataset. The prediction of
malignancy in suspected thyroid tumour patients is the objective of the study. The accuracy of
the correct predictions (the minimum error rate), the amount of time consuming in the
modelling process and the interpretability and simplicity of the results for clinical experts are
the factors considered to choose the best method
N ETWORK F AULT D IAGNOSIS U SING D ATA M INING C LASSIFIERScsandit
Mobile networks are under more pressure than ever b
efore because of the increasing number of
smartphone users and the number of people relying o
n mobile data networks. With larger
numbers of users, the issue of service quality has
become more important for network operators.
Identifying faults in mobile networks that reduce t
he quality of service must be found within
minutes so that problems can be addressed and netwo
rks returned to optimised performance. In
this paper, a method of automated fault diagnosis i
s presented using decision trees, rules and
Bayesian classifiers for visualization of network f
aults. Using data mining techniques the model
classifies optimisation criteria based on the key p
erformance indicators metrics to identify
network faults supporting the most efficient optimi
sation decisions. The goal is to help wireless
providers to localize the key performance indicator
alarms and determine which Quality of
Service factors should be addressed first and at wh
ich locations
Comparative Study of Machine Learning Algorithms for Sentiment Analysis with ...Sagar Deogirkar
Comparing the State-of-the-Art Deep Learning with Machine Learning algorithms performance on TF-IDF vector creation for Sentiment Analysis using Airline Tweeter Data Set.
An approach for ids by combining svm and ant colony algorithmeSAT Journals
Abstract This piece of work researches the intrusion detection problem of the network sanctuary; the primary task is to classify network behavior as normal or abnormal while reducing misclassification. In this paper, two efficient data mining algorithms are combined together to detect the network intrusion. Combining SVM and Ant colony (CSVAC) used for well-organized data classification, this technique takes the advantage of both the algorithm while avoiding their weaknesses. This algorithm is implemented and evaluated using standard benchmark KDDCUP99 data set. Experimental results drastically well produce superior results than the other algorithm in terms of accuracy rate and run time efficiency, and this algorithm able to detect the new types of attacks Keywords: Intrusion Detection; Support Vector Machine; Ant colony; Combined Support vector with ant colony
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology
Intrusion Detection System Based on K-Star Classifier and Feature Set ReductionIOSR Journals
Abstract: Network security and Intrusion Detection Systems (IDS’s) is an important security related research
area. This paper applies K-star algorithm with filtering analysis in order to build a network intrusion detection
system. For our experimental analysis and as a case study, we have used the new NSL-KDD dataset, which is a
modified dataset for KDDCup 1999 intrusion detection benchmark dataset. With a split of 66.0% for the
training set and the remainder for the testing set a 2 class classifications has been implemented. WEKA which is
a java based open source software consists of a collection of machine learning algorithms for Data mining tasks
has been used in the testing process. The experimental results show that the proposed approach is very accurate
with low false positive rate and high true positive rate and it takes less learning time in comparison with other
existing approaches used for efficient network intrusion detection.
Keywords: Information Gain, Intrusion Detection System, Instance-based classifier, K-Star, Weka.
Detection of malicious attacks by Meta classification algorithmsEswar Publications
We address the problem of malicious node detection in a network based on the characteristics in the behavior of the network. This issue brings out a challenging set of research papers in the recent contributing a critical component to secure the network. This type of work evolves with many changes in the solution strategies. In this work, we propose carefully the learning models with cautious selection of attributes, selection of parameter thresholds and number of iterations. In this research, appropriate approach to evaluate the performance of a set of meta classifier algorithms (Ad Boost, Attribute selected classifier, Bagging, Classification via Regression,
Filtered classifier, logit Boost, multiclass classifier). The ratio between training and testing data is made such way that compatibility of data patterns in both the sets are same. Hence we consider a set of supervised machine learning schemes with meta classifiers were applied on the selected dataset to predict the attack risk of the network environment . The trained models were then used for predicting the risk of the attacks in a web server environment or by any network administrator or any Security Experts. The Prediction Accuracy of the Classifiers was evaluated using 10-fold Cross Validation and the results have been compared to obtain the accuracy.
Evaluation of network intrusion detection using markov chainIJCI JOURNAL
Day today life internet threat has been increased significantly. There is a need to develop model in order to
maintain security of system. The most effective techniques are Intrusion Detection System (IDS).The
purpose of intrusion system through the security devices detect and deal with it. In this paper, a
mathematical approach is used effectively to predict and detect intrusion in the network. Here we discuss
about two algorithms ‘K-Means + Apriori’, a method which classify normal and abnormal activities in
computer network. In K-Means process, it partitions the training set into K-clusters using Euclidean
distance and introduce an outlier factor, then it build Apriori Algorithm to prune the data by removing
infrequent data in the database. Based on defined state the degree of incoming data is evaluated through
the experiment using sample DARPA2000 dataset, and achieves high detection performance in level of
attack in stages.
Intrusion Detection System for Classification of Attacks with Cross Validationinventionjournals
Now days, due to rapidly uses of internet, the patterns of network attacks are increasing. There are various organizations and institutes are using internet and access or share the sensitive information in network. To protect information from unauthorized or intruders is one of the important issues. In this paper, we have used decision tree techniques like C4.5 and CART as classifier for classification of attacks. We have proposed an ensemble model that is combination of C4.5 and Classification and Regression Tree (CART) as robust classifier for classification of attacks. We have used NSL-KDD data set with binary and multiclass problem with 10-fold cross validation. The proposed ensemble model gives satisfactory accuracy as 99.67% and 99.53% in case of binary class and multiclass NSL-KDD data set respectively.
A novel ensemble modeling for intrusion detection system IJECEIAES
Vast increase in data through internet services has made computer systems more vulnerable and difficult to protect from malicious attacks. Intrusion detection systems (IDSs) must be more potent in monitoring intrusions. Therefore an effectual Intrusion Detection system architecture is built which employs a facile classification model and generates low false alarm rates and high accuracy. Noticeably, IDS endure enormous amounts of data traffic that contain redundant and irrelevant features, which affect the performance of the IDS negatively. Despite good feature selection approaches leads to a reduction of unrelated and redundant features and attain better classification accuracy in IDS. This paper proposes a novel ensemble model for IDS based on two algorithms Fuzzy Ensemble Feature selection (FEFS) and Fusion of Multiple Classifier (FMC). FEFS is a unification of five feature scores. These scores are obtained by using feature-class distance functions. Aggregation is done using fuzzy union operation. On the other hand, the FMC is the fusion of three classifiers. It works based on Ensemble decisive function. Experiments were made on KDD cup 99 data set have shown that our proposed system works superior to well-known methods such as Support Vector Machines (SVMs), K-Nearest Neighbor (KNN) and Artificial Neural Networks (ANNs). Our examinations ensured clearly the prominence of using ensemble methodology for modeling IDSs, and hence our system is robust and efficient.
TUPLE VALUE BASED MULTIPLICATIVE DATA PERTURBATION APPROACH TO PRESERVE PRIVA...IJDKP
Huge volume of data from domain specific applications such as medical, financial, library, telephone,
shopping records and individual are regularly generated. Sharing of these data is proved to be beneficial
for data mining application. On one hand such data is an important asset to business decision making by
analyzing it. On the other hand data privacy concerns may prevent data owners from sharing information
for data analysis. In order to share data while preserving privacy, data owner must come up with a solution
which achieves the dual goal of privacy preservation as well as an accuracy of data mining task –
clustering and classification. An efficient and effective approach has been proposed that aims to protect
privacy of sensitive information and obtaining data clustering with minimum information loss
PREDICTION OF MALIGNANCY IN SUSPECTED THYROID TUMOUR PATIENTS BY THREE DIFFER...cscpconf
In the present study, the abilities of three classification methods of data mining namely artificial
neural networks with feed-forward back propagation algorithm, J48 decision tree method and
logistic regression analysis are compared in a medical real dataset. The prediction of
malignancy in suspected thyroid tumour patients is the objective of the study. The accuracy of
the correct predictions (the minimum error rate), the amount of time consuming in the
modelling process and the interpretability and simplicity of the results for clinical experts are
the factors considered to choose the best method
N ETWORK F AULT D IAGNOSIS U SING D ATA M INING C LASSIFIERScsandit
Mobile networks are under more pressure than ever b
efore because of the increasing number of
smartphone users and the number of people relying o
n mobile data networks. With larger
numbers of users, the issue of service quality has
become more important for network operators.
Identifying faults in mobile networks that reduce t
he quality of service must be found within
minutes so that problems can be addressed and netwo
rks returned to optimised performance. In
this paper, a method of automated fault diagnosis i
s presented using decision trees, rules and
Bayesian classifiers for visualization of network f
aults. Using data mining techniques the model
classifies optimisation criteria based on the key p
erformance indicators metrics to identify
network faults supporting the most efficient optimi
sation decisions. The goal is to help wireless
providers to localize the key performance indicator
alarms and determine which Quality of
Service factors should be addressed first and at wh
ich locations
Comparative Study of Machine Learning Algorithms for Sentiment Analysis with ...Sagar Deogirkar
Comparing the State-of-the-Art Deep Learning with Machine Learning algorithms performance on TF-IDF vector creation for Sentiment Analysis using Airline Tweeter Data Set.
An approach for ids by combining svm and ant colony algorithmeSAT Journals
Abstract This piece of work researches the intrusion detection problem of the network sanctuary; the primary task is to classify network behavior as normal or abnormal while reducing misclassification. In this paper, two efficient data mining algorithms are combined together to detect the network intrusion. Combining SVM and Ant colony (CSVAC) used for well-organized data classification, this technique takes the advantage of both the algorithm while avoiding their weaknesses. This algorithm is implemented and evaluated using standard benchmark KDDCUP99 data set. Experimental results drastically well produce superior results than the other algorithm in terms of accuracy rate and run time efficiency, and this algorithm able to detect the new types of attacks Keywords: Intrusion Detection; Support Vector Machine; Ant colony; Combined Support vector with ant colony
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology
Intrusion Detection System Based on K-Star Classifier and Feature Set ReductionIOSR Journals
Abstract: Network security and Intrusion Detection Systems (IDS’s) is an important security related research
area. This paper applies K-star algorithm with filtering analysis in order to build a network intrusion detection
system. For our experimental analysis and as a case study, we have used the new NSL-KDD dataset, which is a
modified dataset for KDDCup 1999 intrusion detection benchmark dataset. With a split of 66.0% for the
training set and the remainder for the testing set a 2 class classifications has been implemented. WEKA which is
a java based open source software consists of a collection of machine learning algorithms for Data mining tasks
has been used in the testing process. The experimental results show that the proposed approach is very accurate
with low false positive rate and high true positive rate and it takes less learning time in comparison with other
existing approaches used for efficient network intrusion detection.
Keywords: Information Gain, Intrusion Detection System, Instance-based classifier, K-Star, Weka.
Detection of malicious attacks by Meta classification algorithmsEswar Publications
We address the problem of malicious node detection in a network based on the characteristics in the behavior of the network. This issue brings out a challenging set of research papers in the recent contributing a critical component to secure the network. This type of work evolves with many changes in the solution strategies. In this work, we propose carefully the learning models with cautious selection of attributes, selection of parameter thresholds and number of iterations. In this research, appropriate approach to evaluate the performance of a set of meta classifier algorithms (Ad Boost, Attribute selected classifier, Bagging, Classification via Regression,
Filtered classifier, logit Boost, multiclass classifier). The ratio between training and testing data is made such way that compatibility of data patterns in both the sets are same. Hence we consider a set of supervised machine learning schemes with meta classifiers were applied on the selected dataset to predict the attack risk of the network environment . The trained models were then used for predicting the risk of the attacks in a web server environment or by any network administrator or any Security Experts. The Prediction Accuracy of the Classifiers was evaluated using 10-fold Cross Validation and the results have been compared to obtain the accuracy.
Minkowski Distance based Feature Selection Algorithm for Effective Intrusion ...IJMER
Intrusion Detection System (IDS) plays a major role in the provision of effective security to various types of networks. Moreover, Intrusion Detection System for networks need appropriate rule set for classifying network bench mark data into normal or attack patterns. Generally, each dataset is characterized by a large set of features. However, all these features will not be relevant or fully contribute in identifying an attack. Since different attacks need various subsets to provide better detection accuracy. In this paper an improved feature selection algorithm is proposed to identify the most appropriate subset of features for detecting a certain attacks. This proposed method is based on Minkowski distance feature ranking and an improved exhaustive search that selects a better combination of features. This system has been evaluated using the KDD CUP 1999 dataset and also with EMSVM [1] classifier. The experimental results show that the proposed system provides high classification accuracy and low false alarm rate when applied on the reduced feature subsets
ATTACK DETECTION AVAILING FEATURE DISCRETION USING RANDOM FOREST CLASSIFIERCSEIJJournal
The widespread use of the Internet has an adverse effect of being vulnerable to cyber attacks. Defensive
mechanisms like firewalls and IDSs have evolved with a lot of research contributions happening in these
areas. Machine learning techniques have been successfully used in these defense mechanisms especially
IDSs. Although they are effective to some extent in identifying new patterns and variants of existing
malicious patterns, many attacks are still left as undetected. The objective is to develop an algorithm for
detecting malicious domains based on passive traffic measurements. In this paper, an anomaly-based
intrusion detection system based on an ensemble based machine learning classifier called Random Forest
with gradient boosting is deployed. NSL-KDD cup dataset is used for analysis and out of 41 features, 32
features were identified as significant using feature discretion. Our observations confirm the conjecture
that both the feature selection and stochastic based genetic operators improves the accuracy and the
effectiveness. The training time is shown to be reduced tremendously by 98.59% and accuracy improved to
98.75%.
Attack Detection Availing Feature Discretion using Random Forest ClassifierCSEIJJournal
The widespread use of the Internet has an adverse effect of being vulnerable to cyber attacks. Defensive
mechanisms like firewalls and IDSs have evolved with a lot of research contributions happening in these
areas. Machine learning techniques have been successfully used in these defense mechanisms especially
IDSs. Although they are effective to some extent in identifying new patterns and variants of existing
malicious patterns, many attacks are still left as undetected. The objective is to develop an algorithm for
detecting malicious domains based on passive traffic measurements. In this paper, an anomaly-based
intrusion detection system based on an ensemble based machine learning classifier called Random Forest
with gradient boosting is deployed. NSL-KDD cup dataset is used for analysis and out of 41 features, 32
features were identified as significant using feature discretion.
Analysis on different Data mining Techniques and algorithms used in IOTIJERA Editor
In this paper, we discusses about five functionalities of data mining in IOT that affects the performance and that
are: Data anomaly detection, Data clustering, Data classification, feature selection, time series prediction. Some
important algorithm has also been reviewed here of each functionalities that show advantages and limitations as
well as some new algorithm that are in research direction. Here we had represent knowledge view of data
mining in IOT.
High performance intrusion detection using modified k mean & naïve bayeseSAT Journals
Abstract
Internet Technology is growing at exponential rate day by day, making data security of computer systems more complex and critical. There has been multiple methodology implemented for the same in recent time as detailed in [1], [3]. Availability of larger bandwidth has made the multiple large computer server network connected worldwide and thus increasing the load on the necessity to secure data and Intrusion detection system (IDS) is one of the most efficient technique to maintain security of computer system. The proposed system is designed in such a way that are helpful in identifying malicious behavior and improper use of computer system. In this report we proposed a hybrid technique for intrusion detection using data mining algorithms. Our main objective is to do complete analysis of intrusion detection Dataset to test the implemented system.In This report we will propose a new methodology in which Modified k-mean is used for clustering whereas Naïve Bayes for the classification. These two data mining techniques will be used for Intrusion detection in large horizontally distributed database.
Keywords: Intrusion Detection, Modified K-Mean, Naïve Bays
CLASSIFIER SELECTION MODELS FOR INTRUSION DETECTION SYSTEM (IDS)ieijjournal1
Any abnormal activity can be assumed to be anomalies intrusion. In the literature several techniques and
algorithms have been discussed for anomaly detection. In the most of cases true positive and false positive
parameters have been used to compare their performance. However, depending upon the application a
wrong true positive or wrong false positive may have severe detrimental effects. This necessitates inclusion
of cost sensitive parameters in the performance. Moreover the most common testing dataset KDD-CUP-99
has huge size of data which intern require certain amount of pre-processing. Our work in this paper starts
with enumerating the necessity of cost sensitive analysis with some real life examples. After discussing
KDD-CUP-99 an approach is proposed for feature elimination and then features selection to reduce the
number of more relevant features directly and size of KDD-CUP-99 indirectly. From the reported
literature general methods for anomaly detection are selected which perform best for different types of
attacks. These different classifiers are clubbed to form an ensemble. A cost opportunistic technique is
suggested to allocate the relative weights to classifiers ensemble for generating the final result. The cost
sensitivity of true positive and false positive results is done and a method is proposed to select the elements
of cost sensitivity metrics for further improving the results to achieve the overall better performance. The
impact on performance trade of due to incorporating the cost sensitivity is discussed.
Intrusion detection with Parameterized Methods for Wireless Sensor Networksrahulmonikasharma
Current network intrusion detection systems lack adaptability to the frequently changing network environments. Furthermore, intrusion detection in the new distributed architectures is now a major requirement. In this paper, we propose two Adaboost based intrusion detection algorithms. In the first algorithm, a traditional online Adaboost process is used where decision stumps are used as weak classifiers. In the second algorithm, an improved online Adaboost process is proposed, and online Gaussian mixture models (GMMs) are used as weak classifiers. We further propose a distributed intrusion detection framework, in which a local parameterized detection model is constructed in each node using the online Adaboost algorithm. A global detection model is constructed in each node by combining the local parametric models using a small number of samples in the node. This combination is achieved using an algorithm based on particle swarm optimization (PSO) and support vector machines. The global model in each node is used to detect intrusions. Experimental results show that the improved online Adaboost process with GMMs obtains a higher detection rate and a lower false alarm rate than the traditional online Adaboost process that uses decision stumps. Both the algorithms outperform existing intrusion detection algorithms. It is also shown that our PSO, and SVM-based algorithm effectively combines the local detection models into the global model in each node; the global model in a node can handle the intrusion types that are found in other nodes, without sharing the samples of these intrusion types.
Similar to Decision Tree Based Algorithm for Intrusion Detection (20)
Content-Based Image Retrieval (CBIR) systems have been used for the searching of relevant images in various research areas. In CBIR systems features such as shape, texture and color are used. The extraction of features is the main step on which the retrieval results depend. Color features in CBIR are used as in the color histogram, color moments, conventional color correlogram and color histogram. Color space selection is used to represent the information of color of the pixels of the query image. The shape is the basic characteristic of segmented regions of an image. Different methods are introduced for better retrieval using different shape representation techniques; earlier the global shape representations were used but with time moved towards local shape representations. The local shape is more related to the expressing of result instead of the method. Local shape features may be derived from the texture properties and the color derivatives. Texture features have been used for images of documents, segmentation-based recognition,and satellite images. Texture features are used in different CBIR systems along with color, shape, geometrical structure and sift features.
The cyber attacks have become most prevalent in the past few years. During this time, attackers have discovered new vulnerabilities to carry out malicious activities on the internet. Both the clients and the servers have been victimized by the attackers. Clickjacking is one of the attacks that have been adopted by the attackers to deceive the innocuous internet users to initiate some action. Clickjacking attack exploits one of the vulnerabilities existing in the web applications. This attack uses a technique that allows cross domain attacks with the help of userinitiated clicks and performs unintended actions. This paper traces out the vulnerabilities that make a website vulnerable to clickjacking attack and proposes a solution for the same.
Performance Analysis of Audio and Video Synchronization using Spreaded Code D...Eswar Publications
The audio and video synchronization plays an important role in speech recognition and multimedia communication. The audio-video sync is a quite significant problem in live video conferencing. It is due to use of various hardware components which introduces variable delay and software environments. The objective of the synchronization is used to preserve the temporal alignment between the audio and video signals. This paper proposes the audio-video synchronization using spreading codes delay measurement technique. The performance of the proposed method made on home database and achieves 99% synchronization efficiency. The audio-visual
signature technique provides a significant reduction in audio-video sync problems and the performance analysis of audio and video synchronization in an effective way. This paper also implements an audio- video synchronizer and analyses its performance in an efficient manner by synchronization efficiency, audio-video time drift and audio-video delay parameters. The simulation result is carried out using mat lab simulation tools and simulink. It is automatically estimating and correcting the timing relationship between the audio and video signals and maintaining the Quality of Service.
Due to the availability of complicated devices in industry, models for consumers at lower cost of resources are developed. Home Automation systems have been developed by several researchers. The limitations of home automation includes complexity in architecture, higher costs of the equipment, interface inflexibility. In this paper as we have proposed, the working protocol of PIC 16F72 technology is which is secure, cost efficient, flexible that leads to the development of efficient home automation systems. The system is operational to control various home appliances like fans, Bulbs, Tube light. The following paper describes about components used and working of all components connected. The home automation system makes use of Android app entitled “Home App” which gives
flexibility and easy to use GUI.
Semantically Enchanced Personalised Adaptive E-Learning for General and Dysle...Eswar Publications
E-learning plays an important role in providing required and well formed knowledge to a learner. The medium of e- learning has achieved advancement in various fields such as adaptive e-learning systems. The need for enhancing e-learning semantically can enhance the retrieval and adaptability of the learning curriculum. This paper provides a semantically enhanced module based e-learning for computer science programme on a learnercentric perspective. The learners are categorized based on their proficiency for providing personalized learning environment for users. Learning disorders on the platform of e-learning still require lots of research. Therefore, this paper also provides a personalized assessment theoretical model for alphabet learning with learning objects for
children’s who face dyslexia.
Agriculture plays an important role in the economy of our country. Over 58 percent of the rural households depend on the agriculture sector as their means of livelihood. Agriculture is one of the major contributors to Gross Domestic Product(GDP). Seeds are the soul of agriculture. This application helps in reducing the time for the researchers as well as farmers to know the seedling parameters. The application helps the farmers to know about the percentage of seedlings that will grow and it is very essential in estimating the yield of that particular crop. Manual calculation may lead to some error, to minimize that error, the developed app is used. The scientist and farmers require the app to know about the physiological seed quality parameters and to take decisions regarding their farming activities. In this article a desktop app for seed germination percentage and vigour index calculation are developed in PHP scripting language.
What happens when adaptive video streaming players compete in time-varying ba...Eswar Publications
Competition among adaptive video streaming players severely diminishes user-QoE. When players compete at a bottleneck link many do not obtain adequate resources. This imbalance eventually causes ill effects such as screen flickering and video stalling. There have been many attempts in recent years to overcome some of these problems. However, added to the competition at the bottleneck link there is also the possibility of varying network bandwidth which can make the situation even worse. This work focuses on such a situation. It evaluates current heuristic adaptive video players at a bottleneck link with time-varying bandwidth conditions. Experimental setup includes the TAPAS player and emulated network conditions. The results show PANDA outperforms FESTIVE, ELASTIC and the Conventional players.
WLI-FCM and Artificial Neural Network Based Cloud Intrusion Detection SystemEswar Publications
Security and Performance aspects of cloud computing are the major issues which have to be tended to in Cloud Computing. Intrusion is one such basic and imperative security problem for Cloud Computing. Consequently, it is essential to create an Intrusion Detection System (IDS) to detect both inside and outside assaults with high detection precision in cloud environment. In this paper, cloud intrusion detection system at hypervisor layer is developed and assesses to detect the depraved activities in cloud computing environment. The cloud intrusion detection system uses a hybrid algorithm which is a fusion of WLI- FCM clustering algorithm and Back propagation artificial Neural Network to improve the detection accuracy of the cloud intrusion detection system. The proposed system is implemented and compared with K-means and classic FCM. The DARPA’s KDD cup dataset 1999 is used for simulation. From the detailed performance analysis, it is clear that the proposed system is able to detect the anomalies with high detection accuracy and low false alarm rate.
Spreading Trade Union Activities through Cyberspace: A Case StudyEswar Publications
This report present the outcome of an investigative research conducted to examine the modu-operandi of academic staff union of polytechnics (ASUP) YabaTech. The investigation covered the logistics and cost implication for spreading union activities among members. It was discovered that cost of management and dissemination of information to members was at high side, also logistics problem constitutes to loss of information in transit hence cut away some members from union activities. To curtail the problem identified, we proposed the
design of secure and dynamic website for spreading union activities among members and public. The proposed system was implemented using HTML5 technology, interface frameworks like Bootstrap and j query which enables the responsive feature of the application interface. The backend was designed using PHPMYSQL. It was discovered from the evaluation of the new system that cost of managing information has reduced considerably, and logistic problems identified in the old system has become a forgotten issue.
Identifying an Appropriate Model for Information Systems Integration in the O...Eswar Publications
Nowadays organizations are using information systems for optimizing processes in order to increase coordination and interoperability across the organizations. Since Oil and Gas Industry is one of the large industries in whole of the world, there is a need to compatibility of its Information Systems (IS) which consists three categories of systems: Field IS, Plant IS and Enterprise IS to create interoperability and approach the
optimizing processes as its result. In this paper we introduce the different models of information systems integration, identify the types of information systems that are using in the upstream and downstream sectors of petroleum industry, and finally based on expert’s opinions will identify a suitable model for information systems integration in this industry.
Link-and Node-Disjoint Evaluation of the Ad Hoc on Demand Multi-path Distance...Eswar Publications
This work illustrates the AOMDV routing protocol. Its ancestor, the AODV routing protocol is also described. This tutorial demonstrates how forward and reverse paths are created by the AOMDV routing protocol. Loop free paths formulation is described, together with node and link disjoint paths. Finally, the performance of the AOMDV routing protocol is investigated along link and node disjoint paths. The WSN with the AOMDV routing protocol using link disjoint paths is better than the WSN with the AOMDV routing protocol using node disjoint paths for energy consumption.
Bridging Centrality: Identifying Bridging Nodes in Transportation NetworkEswar Publications
To identify the importance of node of a network, several centralities are used. Majority of these centrality measures are dominated by components' degree due to their nature of looking at networks’ topology. We propose a centrality to identification model, bridging centrality, based on information flow and topological aspects. We apply bridging centrality on real world networks including the transportation network and show that the nodes distinguished by bridging centrality are well located on the connecting positions between highly connected regions. Bridging centrality can discriminate bridging nodes, the nodes with more information flowed through them and locations between highly connected regions, while other centrality measures cannot.
Now a days we are living in an era of Information Technology where each and every person has to become IT incumbent either intentionally or unintentionally. Technology plays a vital role in our day to day life since last few decades and somehow we all are depending on it in order to obtain maximum benefit and comfort. This new era equipped with latest advents of technology, enlightening world in the form of Internet of Things (IoT). Internet of things is such a specified and dignified domain which leads us to the real world scenarios where each object can perform some task while communicating with some other objects. The world with full of devices, sensors and other objects which will communicate and make human life far better and easier than ever. This paper provides an overview of current research work on IoT in terms of architecture, a technology used and applications. It also highlights all the issues related to technologies used for IoT, after the literature review of research work. The main purpose of this survey is to provide all the latest technologies, their corresponding
trends and details in the field of IoT in systematic manner. It will be helpful for further research.
Automatic Monitoring of Soil Moisture and Controlling of Irrigation SystemEswar Publications
In past couple of decades, there is immediate growth in field of agricultural technology. Utilization of proper method of irrigation by drip is very reasonable and proficient. A various drip irrigation methods have been proposed, but they have been found to be very luxurious and dense to use. The farmer has to maintain watch on irrigation schedule in the conventional drip irrigation system, which is different for different types of crops. In remotely monitored embedded system for irrigation purposes have become a new essential for farmer to accumulate his energy, time and money and will take place only when there will be requirement of water. In this approach, the soil test for chemical constituents, water content, and salinity and fertilizer requirement data collected by wireless and processed for better drip irrigation plan. This paper reviews different monitoring systems and proposes an automatic monitoring system model using Wireless Sensor Network (WSN) which helps the farmer to improve the yield.
Multi- Level Data Security Model for Big Data on Public Cloud: A New ModelEswar Publications
With the advent of cloud computing the big data has emerged as a very crucial technology. The certain type of cloud provides the consumers with the free services like storage, computational power etc. This paper is intended to make use of infrastructure as a service where the storage service from the public cloud providers is going to leveraged by an individual or organization. The paper will emphasize the model which can be used by anyone without any cost. They can store the confidential data without any type of security issue, as the data will be altered
in such a way that it cannot be understood by the intruder if any. Not only that but the user can retrieve back the original data within no time. The proposed security model is going to effectively and efficiently provide a robust security while data is on cloud infrastructure as well as when data is getting migrated towards cloud infrastructure or vice versa.
Impact of Technology on E-Banking; Cameroon PerspectivesEswar Publications
The financial services industry is experiencing rapid changes in services delivery and channels usage, and financial companies and users of financial services are looking at new technologies as they emerge and deciding whether or not to embrace them and the new opportunities to save and manage enormous time, cost and stress.
There is no doubt about the favourable and manifold impact of technology on e-banking as pictured in this review paper, almost all banks are with the least and most access e-banking Technological equipments like ATMs and Cards. On the other Hand cheap and readily available technology has opened a favourable competition in ebanking services business with a lot of wide range competitors competing with Commercial Banks in Cameroon in providing digital financial services.
Classification Algorithms with Attribute Selection: an evaluation study using...Eswar Publications
Attribute or feature selection plays an important role in the process of data mining. In general the data set contains more number of attributes. But in the process of effective classification not all attributes are relevant.
Attribute selection is a technique used to extract the ranking of attributes. Therefore, this paper presents a comparative evaluation study of classification algorithms before and after attribute selection using Waikato Environment for Knowledge Analysis (WEKA). The evaluation study concludes that the performance metrics of the classification algorithm, improves after performing attribute selection. This will reduce the work of processing irrelevant attributes.
Mining Frequent Patterns and Associations from the Smart meters using Bayesia...Eswar Publications
In today’s world migration of people from rural areas to urban areas is quite common. Health care services are one of the most challenging aspect that is must require to the people with abnormal health. Advancements in the technologies lead to build the smart homes, which contains various sensor or smart meter devices to automate the process of other electronic device. Additionally these smart meters can be able to capture the daily activities of the patients and also monitor the health conditions of the patients by mining the frequent patterns and
association rules generated from the smart meters. In this work we proposed a model that is able to monitor the activities of the patients in home and can send the daily activities to the corresponding doctor. We can extract the frequent patterns and association rules from the log data and can predict the health conditions of the patients and can give the suggestions according to the prediction. Our work is divided in to three stages. Firstly, we used to record the daily activities of the patient using a specific time period at three regular intervals. Secondly we applied the frequent pattern growth for extracting the association rules from the log file. Finally, we applied k means clustering for the input and applied Bayesian network model to predict the health behavior of the patient and precautions will be given accordingly.
Network as a Service Model in Cloud Authentication by HMAC AlgorithmEswar Publications
Resource pooling on internet-based accessing on use as pay environmental technology and ruled in IT field is the
cloud. Present, in every organization has trusted the web, however, the information must flow but not hold the
data. Therefore, all customers have to use the cloud. While the cloud progressing info by securing-protocols. Third
party observing and certain circumstances directly stale in flow and kept of packets in the virtual private cloud.
Global security statistics in the year 2017, hacking sensitive information in cloud approximately maybe 75.35%,
and the world security analyzer said this calculation maybe reached to 100%. For this cause, this proposed
research work concentrates on Authentication-Message-Digest-Key with authentication in routing the Network as
a Service of packets in OSPF (Open Shortest Path First) implementing Cloud with GNS3 has tested them to
securing from attackers.
Microstrip patch antennas are recently used in wireless detection applications due to their low power consumption, low cost, versatility, field excitation, ease of fabrication etc. The microstrip patch antennas are also called as printed antennas which is suffer with an array elements of antenna and narrow bandwidth. To overcome the above drawbacks, Flame Retardant Material is used as the substrate. Rectangular shape of microstrip patch antenna with FR4 material as the substrate which is more suitable for the explosive detection applications. The proposed printed antenna was designed with the dimension of 60 x 60 mm2. FR-4 material has a dielectric constant value of 4.3 with thickness 1.56 mm, length and width 60 mm and 60 mm respectively. One side of the substrate contains the ground plane of dimensions 60 x60 mm2 made of copper and the other side of the substrate contains the patch which have dimensions 34 x 29 mm2 and thickness 0.03mm which is also made of copper. RMPA without slot, Vertical slot RMPA, Double horizontal slot RMPA and Centre slot RMPA structures were
designed and the performance of the antennas were analysed with various parameters such as gain, directivity, Efield, VSWR and return loss. From the performance analysis, double horizontal slot RMPA antenna provides a better result and it provides maximum gain (8.61dB) and minimum return loss (-33.918dB). Based on the E-field excitation value the SEMTEX explosive material is detected and it was simulated using CST software.
Explore the innovative world of trenchless pipe repair with our comprehensive guide, "The Benefits and Techniques of Trenchless Pipe Repair." This document delves into the modern methods of repairing underground pipes without the need for extensive excavation, highlighting the numerous advantages and the latest techniques used in the industry.
Learn about the cost savings, reduced environmental impact, and minimal disruption associated with trenchless technology. Discover detailed explanations of popular techniques such as pipe bursting, cured-in-place pipe (CIPP) lining, and directional drilling. Understand how these methods can be applied to various types of infrastructure, from residential plumbing to large-scale municipal systems.
Ideal for homeowners, contractors, engineers, and anyone interested in modern plumbing solutions, this guide provides valuable insights into why trenchless pipe repair is becoming the preferred choice for pipe rehabilitation. Stay informed about the latest advancements and best practices in the field.
Saudi Arabia stands as a titan in the global energy landscape, renowned for its abundant oil and gas resources. It's the largest exporter of petroleum and holds some of the world's most significant reserves. Let's delve into the top 10 oil and gas projects shaping Saudi Arabia's energy future in 2024.
Overview of the fundamental roles in Hydropower generation and the components involved in wider Electrical Engineering.
This paper presents the design and construction of hydroelectric dams from the hydrologist’s survey of the valley before construction, all aspects and involved disciplines, fluid dynamics, structural engineering, generation and mains frequency regulation to the very transmission of power through the network in the United Kingdom.
Author: Robbie Edward Sayers
Collaborators and co editors: Charlie Sims and Connor Healey.
(C) 2024 Robbie E. Sayers
Decision Tree Based Algorithm for Intrusion Detection
1. Int. J. Advanced Networking and Applications
Volume: 07 Issue: 04 Pages: 2828-2834 (2016) ISSN: 0975-0290
2828
Decision Tree Based Algorithm for Intrusion
Detection
Kajal Rai
Research Scholar, Department of Computer Science and Applications, Panjab University, Chandigarh, India
Email: kajalrai@pu.ac.in
M. Syamala Devi
Professor, Department of Computer Science and Applications, Panjab University, Chandigarh, India
Email: syamala@pu.ac.in
Ajay Guleria
System Manager, Computer Center, Panjab University, Chandigarh, India
Email: ag@pu.ac.in
-----------------------------------------------------ABSTRACT------------------------------------------------------
An Intrusion Detection System (IDS) is a defense measure that supervises activities of the computer network and
reports the malicious activities to the network administrator. Intruders do many attempts to gain access to the
network and try to harm the organization’s data. Thus the security is the most important aspect for any type of
organization. Due to these reasons, intrusion detection has been an important research issue. An IDS can be
broadly classified as Signature based IDS and Anomaly based IDS. In our proposed work, the decision tree
algorithm is developed based on C4.5 decision tree approach. Feature selection and split value are important
issues for constructing a decision tree. In this paper, the algorithm is designed to address these two issues. The
most relevant features are selected using information gain and the split value is selected in such a way that makes
the classifier unbiased towards most frequent values. Experimentation is performed on NSL-KDD (Network
Security Laboratory Knowledge Discovery and Data Mining) dataset based on number of features. The time
taken by the classifier to construct the model and the accuracy achieved is analyzed. It is concluded that the
proposed Decision Tree Split (DTS) algorithm can be used for signature based intrusion detection.
Keywords - Decision Tree, Information Gain, Gain Ratio, NSL-KDD, Signature-based IDS
--------------------------------------------------------------------------------------------------------------------------------------------------
Date of Submission: Dec 22, 2015 Date of Acceptance: Dec 28, 2015
--------------------------------------------------------------------------------------------------------------------------------------------------
1. INTRODUCTION
An Intrusion Detection System (IDS) is a system that
monitors network to check harmful activities in the
network and reports events that does not meet the
security criteria to the network administrator. IDSs are
categorized as Signature based and Anomaly based.
Signature or Misuse based IDS uses various techniques
to locate the similarity among system behavior and
previously known attacks stored in the signature
database. Anomaly based IDS detects activities in a
network which deviates from normal behaviors stored
in system profiles database. There are various classifiers
that are applicable to misuse based detection. Some are
tree based such as decision tree [1], and random forest
[2], whereas some are rule based such as oneR [3],
while some are function based such as SVM (Support
Vector Machine) [4]. In this paper, the decision tree
classifier is used to classify input data as normal or
anomalous.
A Decision Tree is a tree-like graph consisting of
internal nodes which represent a test on an attribute and
branches which denote the outcome of the test and leaf
nodes which signify a class label. The classification
rules are formed by the path selected from the root node
to the leaf. To divide each input data, first the root node
is chosen as it is the most prominent attribute to
separate the data. The tree is constructed by identifying
attributes and their associated values which will be used
to analyze the input data at each intermediate node of
the tree. After the tree is formed, it can prefigure newly
coming data by traversing, starting from a root node to
the leaf node visiting all the internal nodes in the path
depending upon the test conditions of the attributes at
each node [5]. The main issue in constructing decision
tree is, which value is chosen for splitting the node of
the tree. This issue is taken care in section 3.
Decision trees can analyze data and identify significant
characteristics in the network that indicate malicious
activities. It can add value to many real-time security
systems by analyzing large set of intrusion detection
data. It can recognize trends and patterns that support
further investigation, the development of attack
signatures, and other activities of monitoring. The main
advantage of using decision trees instead of other
classification techniques is that they provide a rich set
of rules that are easy to understand, and can be
effortlessly integrated with real-time technologies [6].
NSL-KDD is the latest dataset for intrusion detection.
This dataset consists of 41 features, however not all the
2. Int. J. Advanced Networking and Applications
Volume: 07 Issue: 04 Pages: 2828-2834 (2016) ISSN: 0975-0290
2829
features are of equal importance. If complete feature set
is used for classification input data, then the classifier
will take more time to detect intrusion and they can also
affect the accuracy of the classifier. That’s why before
performing any classification, we need to reduce this set
by applying some feature selection method. Feature
selection is done to remove irrelevant and redundant
features. In the literature, there are various feature
selection methods such as information gain [7], PCA
(Principle Component Analysis), and GA (Genetic
Algorithm). For classification of network data several
classifiers are available such as KNN (k-nearest
neighbor), SVM, ANN (Artificial Neural Network), and
decision tree. C4.5 builds decision tree by using the
notion of information entropy from a set of training
data. At each node of the tree, the algorithm picks out
an attribute which most efficiently divides the set of
given data into smaller subsets associated with any class
in the given training set. The dividing factor here is the
gain ratio. The attribute with the highest gain ratio is
selected to do the judgment [8].
The rest of the paper is structured as follows: Section 2
gives a brief related work of intrusion detection based
on feature selection and classifiers. In section 3
provides the proposed algorithm DTS for developing
decision tree. In section 4, the experimental results
using NSL-KDD dataset is shown. Section 5, includes
conclusion and scope for future work.
2. RELATED WORK
Literature review has been done including latest papers
that perform training and testing of the system on NSL-
KDD dataset. The review is performed based on feature
selection and classification.
2.1 REVIEW BASED ON FEATURE SELECTION
Gaikwad and Thool [9] have applied Genetic Algorithm
(GA) on NSL-KDD data set to select relevant features.
The GA selects 15 features out of 41 from the available
data set. These 15 features gives 79% accuracy on test
data with decision tree as a classifier and it takes 176
seconds to build the model.
Bajaj and Arora [10] discuss various feature selection
methods such as information gain, gain ratio and
correlation based feature selection. In their paper, they
select 33 features out of 41 for classification and the
results for various classifiers are compared. The Simple
Cart algorithm gives the highest accuracy 66.77%
whereas the classification result of C4.5 decision tree is
65.65% only. Alazab et al. [11] also select features
using information gain and decision tree to detect both
the old and the new attacks.
Thaseen and Kumar [12] used two useful methods for
feature selection, namely, Correlation based Feature
Selection (CFS) and Consistency-based Feature
Selection (CONS). In this paper, 8 features are selected
using CFS and the classification is done using Naïve
Bayes, C4.5 decision tree, and AD (Alternating
Decision) tree and their results have been compared.
CONS selects 10 useful features out of 41 and then the
classification using various techniques such as Random
Forest and Random Tree has been analyzed.
Revathi, and Malathi[13] select 15 features using CFS
and test using various classifiers such as Random
Forest, C4.5 decision tree, SVM, CART and Naïve
Bayes. The results of above mentioned classifiers have
been compared and the outcome shows that Random
Forest gives highest accuracy in detecting attacks.
Jyothsna and Prasad [14] evaluates and compares the
performance of IDSs for different feature selection
techniques such as information gain, gain ratio,
Optimized Least Significant Particle based Quantitative
Particle Swarm Optimization (OLSP-QPSO), and
Principle Component Analysis (PCA). Results show
that the OLSP-QPSO technique has more number of
attribute reduction and low false alarm rate with high
detection rate when compared with the remaining
feature selection techniques.
In [15], the authors proposed hybrid KNN and Neural
Network based multilevel classification model. In this
model, KNN was used as a classifier for anomaly
detection with two classes, namely, 'normal' and
'abnormal'. After that a neural network was used to
detect a specific type of attack in 'abnormal' class. For
experiments, the NSL-KDD dataset was used. First, all
the features of the dataset were used for classification.
Then classification is performed on 25 selected features.
Selection has been done by Rough Set Theory and
Information Gain separately. In this classification
model, Information Gain with 25 features of the NSL-
KDD dataset produced better results as compared to 25
features with Rough Set Theory as well as 41 features
of NSL-KDD dataset.
2.2 REVIEW BASED ON CLASSIFIERS
Elekar, and Waghmare [16] implement different
classifiers such as C4.5 decision tree, Random Forest,
Hoeffding Tree and Random Tree for intrusion
detection and compare the result using WEKA. The
results show that the Hoeffding Tree gives the best
result among the various classifiers for detecting attacks
on the test data.
Aggarwal and Sharma [1] evaluate ten classification
algorithms such as Random Forest, C4.5, Naïve Bayes,
and Decision Table. Then they simulate these
classification algorithms in WEKA with KDD’99
dataset. These ten classifiers are analyzed according to
metrics such as accuracy, precision, and F-score.
Random Tree shows the best results overall while the
algorithms that have high detection rate and low false
alarm rate were C4.5 and Random Forest.
In [17] the authors show how useful is the NSL-KDD
for various intrusion detection models. For
3. Int. J. Advanced Networking and Applications
Volume: 07 Issue: 04 Pages: 2828-2834 (2016) ISSN: 0975-0290
2830
dimensionality reduction, PCA technique was used in
this paper. Six different algorithms, namely, ID3, Bayes
Net, J48, CART, SVM, and Naïve Bayes were used for
the experimentation with and without feature reduction,
and from the results it was clear that SVM gives the
highest accuracy for the above two cases.
In [18], the authors designed a multi-layer hybrid
machine learning IDS. PCA was used for attribute
selection and only 22 features were selected in the first
layer of the IDS. GA was used in the next layer for
generating detectors, which can distinguish between
normal and abnormal behavior. In the third layer,
classification was done using several classifiers. Results
demonstrate that the Naive Bayes has good accuracy for
two types of attacks, namely, User-to-Root (U2R) and
Remote-to-Local (R2L) attacks however the decision
tree gives higher accuracy up to 82% for Denial-of-
Service attacks and 65% of probe attacks.
The system proposed by Raeeyat et al. in [19] consists
of 4 modules namely, Data pre-processing module,
Misuse detection module, anomaly detection module
and Evaluation and comparison module. Data were pre-
processed before passing to the other modules by data
pre-processing module. In the misuse detection module,
pre-processed data is given to PCA to take out
important features. After that the data were examined
using Adaboost algorithm based on C4.5 decision tree
to know whether it is a normal packet or an intrusion.
Then the outcome of decision tree is passed on to the
next module for evaluation and comparison. When the
data is sent to misuse detection module it is
simultaneously sent to anomaly detection module also.
The correlation among features was also found out by
the correlation unit by using Pearson Correlation. Data
correlation graph is used to show deviation of behavior
from the normal behavior. Then, the evaluation and
comparison module determine whether the instance is
an intrusion or not by taking the output from misuse and
anomaly detection module and if both the module
shows that it is an intrusion then only that instance is
considered as an intrusion.
In [20], a hybrid IDS is proposed. Random Forest is
used for classification in misuse detection to build
patterns of intrusion from a training dataset. Weighted
k-means clustering is used in anomaly detection. Due to
less correlation between clusters, there is high false
alarm rate as the number of clusters increases for
detecting larger number of attacks.
3. DECISION TREE SPLIT (DTS)
ALGORITHM
Decision Tree Split (DTS) algorithm is based on C4.5
decision tree algorithm [11] [21]. The main issue in
constructing decision tree is the split value of a node.
The proposed algorithm gives a novel approach in
selecting the split value. The steps of the algorithm are
as follows:
1. If all the given training examples belong to the
same class, then a leaf node is created for the
decision tree by choosing that class.
2. For every feature 'a', calculate the gain ratio by
dividing the information gain of an attribute
with splitting value of that attribute. The
formula for gain ratio is
Split(a)
IG(a)
a)GainRatio(
where, S is the set of all the examples in the
given training set.
3. Information gain of an attribute is computed as
Ent(S_a)*
values(a)a_val |a|
|S_a|
Ent(S)IG(a)
where, S_a is the subset of S, values (a) is the
set of all possible values of attribute ‘a’ and |a|
is the total number of values in attribute ‘a’
4. Entropy can be calculated as
)
|S|
S)freq(Lj,
log2(*
num_class
1j |S|
S)freq(Lj,
Ent(S)
where, L = L1,L2,…, Ln is the set of classes,
andnum_class is the number of distinct classes.
For our consideration num_class has only two
values, namely, ‘normal’, and ‘anomaly’.
5. Split value of an attribute is chosen by taking
the average of all the values in the domain at
that particular attribute. It can be formulated as
m
m
i
ivala
1
_
Split(a)
where m is the number of values of an attribute
‘a’.
6. Find the attribute with the highest gain ratio.
Suppose, the highest gain ratio is for the
attribute 'a_best'.
7. Construct a decision node that divides the
dataset on the attribute 'a_best'.
8. Repeat steps from 1 to 4 on each subsets
produced by dividing the set on attribute
'a_best' and insert those nodes as descendant
of parent node.
C4.5 algorithm uses the following function for
calculating the split value of an attribute
||
|_|
2log*
)(_ ||
|_|
Split(a)
a
aS
avaluesvala a
aS
4. Int. J. Advanced Networking and Applications
Volume: 07 Issue: 04 Pages: 2828-2834 (2016) ISSN: 0975-0290
2831
3.1 SIGNIFICANCE OF THE PROPOSED
ALGORITHM
To select the split value, C4.5 algorithm first sorts all
the values of an attribute. Then from these sorted
values, say, Pi, Pi+1, … Pn , the gain ratio of all the
values is calculated by choosing the lower value of Pi
and Pi+1 as threshold value and then calculate split value
by using above mentioned formula. The value which
gives the highest gain ratio is chosen as the split value
for that particular node. Instead of using all these
calculations which makes technique more complex and
difficult to understand, we use a simple and effective
approach. In our approach, there is no need to sort the
attribute values to calculate the split value. We calculate
the split value by taking the average of the values in the
domain of a particular attribute at each node. It gives
uniform weightage to all the values in the domain,
making the classifier totally unbiased towards the most
frequent values in the domain of an attribute.
Sometimes, gain ratio may choose an attribute as a split
attribute just because its intrinsic information is very
low. This limitation can be overcome by considering
only those attributes that have greater value of
information gain than average information gain.
4. IMPLEMENTATION AND TESTING
The DTS algorithm is implemented on a 64-bit
Windows 8.1 operating system, with 8 GB of RAM and
a Pentium(R) processor with CPU speed of 2.20GHz
using tools WEKA and MATLAB. The proposed
algorithm is compared with the existing ones such as
Classification and Regression Tree (CART), C4.5, and
AD Tree.
The experiments were done for performance
comparison of different tree based classifiers and the
DTS algorithm. The analysis is done based on different
parameters such as how many seconds the classifier
takes to construct the model, false positive rate, true
positive rate, and accuracy. True Positive (TP)
represents the examples that are correctly predicted as
normal. True Negative (TN) shows the instances which
are correctly predicted as an attack. False Positive (FP)
identifies the instances which are predicted as attack
while they are not. False Negative (FN) represents the
cases which are prefigured as normal while they are
attack in reality. Accuracy can be defined as the number
of correct predictions. It can be computed as
FN)FPTN(TP
TN)(TP
Accuracy
The Receiver Operating Characteristic (ROC) curve is
also plotted for various techniques. ROC plots the curve
between true positive rate (TPR) and false positive rate
(FPR) of an algorithm. TPR and FPR are computed as
FNTP
TP
TPR
and
TNFP
FP
FPR
4.1 DATASET
In this section, the efficiency of the proposed technique
is evaluated cautiously by experimentations with the
NSL-KDD data set, which is a revised version of
KDD’99 data set. The reason for using NSL-KDD
dataset for our experiments is that the KDD’99 data set
has a large number of redundant records in the training
and testing data set. For binary classification, the NSL-
KDD classifies the network traffic into two classes,
namely, normal and anomaly. The experiments were
performed on full training data set having 125973
records and test data set having 22544 records.
First, we compute information gain of all the attributes
of the data set. We found that there are 16 attributes
whose information gain is greater than the average
information gain. That’s why in the preprocess step, we
can choose 16 or less than 16 attributes for further
processing based on information gain because the
remaining features will not have much effect on
classification of the dataset. Then, the data set with
these selected attributes is passed to the algorithm for
constructing, training and testing the decision tree.
4.2 RESULTS AND ANALYSIS
The performance of the proposed algorithm is
compared with the performance of various techniques.
The comparison of results is done based on the
accuracy in detecting attacks on the test dataset of NSL-
KDD. The results are taken from the literature which
uses various techniques such as Self Organizing Maps
(SOM), hoeffding tree, and Ripple Down Rule learner
Intrusion Detection (RDRID) for training their detection
model and testing. It is observed that our proposed
algorithm for constructing decision tree is efficient in
attack detection as shown in Fig. 1.
Various other classifiers such as CART, Naïve Bayes
(NB) Tree, and AD Tree along with the proposed
algorithm are tested using NSL-KDD test dataset. ROC
curves of AD Tree, C4.5, CART and DTS algorithm
without feature selection on test data of NSL-KDD are
plotted as shown in Fig. 2. The time taken by several
classifiers is also measured and bar graph is plotted in
Fig. 3. It is observed from Fig. 2 and Fig. 3, that the true
positive rate of DTS is better than C4.5 technique,
however CART shows the best performance in terms of
true positive rate. But if we compare the results in terms
of delay to build the model, we can see that CART
takes very high time as compared to other techniques.
The results of comparison of various classifiers with
different number of features are presented in Fig. 4. It
can be seen from the results that with the proposed
technique, instead of training with all the features we
get good accuracy with even less number of features
selected using information gain.
5. Int. J. Advanced Networking and Applications
Volume: 07 Issue: 04 Pages: 2828-2834 (2016) ISSN: 0975-0290
2832
Figure1. Results of comparison of proposed algorithm with various other techniques
Figure 2. Comparison of ROC Curve of proposed algorithm with various other classifiers
Figure 3. Time of construction of model of the proposed algorithm with various tree based classifiers
6. Int. J. Advanced Networking and Applications
Volume: 07 Issue: 04 Pages: 2828-2834 (2016) ISSN: 0975-0290
2833
Figure 4. Comparison of accuracy of several classifiers with proposed algorithm
5. CONCLUSION AND FUTURE SCOPE OF
WORK
Decision tree assists the network administrator to
decide about the incoming traffic, i.e., whether the
coming data is malicious or not by providing a model
that separates malicious and non-malicious traffic. By
modified the split value calculation by taking the
average of all the values in the domain of an attribute.
The algorithm provides uniform weightage to all the
values in the domain. It allows taking less number of
attributes and provides acceptable accuracy in
reasonable account of time. From the results of the
experiments, it is concluded that the proposed algorithm
for signature based intrusion detection is more efficient
with respect to finding attacks in the network with less
number of features and it takes less time to construct the
model. It is also concluded that the efficiency depends
on the size of the data set and the number of features
used to construct the decision tree. The formula used in
DTS to calculate gain ratio can also be used in attribute
selection for feature reduction. Our future scope of
work is to improve the split value by using concepts
such as geometric mean which also gives uniform
weightage to the domain values.
REFERENCES
[1] P. Aggarwal, and S.K. Sharma, An Empirical
Comparison of Classifiers to Analyze Intrusion
Detection, Proc. of Fifth International Conference an
Advanced Computing and Communication
Technologies, 2015.
[2] Ho, and Tin Kam, Random Decision Forests,Proc.
of the 3rd International Conference on Document
Analysis and Recognition, Montreal, QC, 14–16 August
1995. pp. 278–282.
[3]http://www.saedsayad.com/oner.htm
[4]C. Cortes, and V. Vapnik, (1995). Support-vector
networks, Machine Learning 20 (3): 273.
[5] P.N. Tan, M. Steinbach, V. Kumar, Introduction to
Data Mining, Pearson Addison Wesley, 2005.
[6] J. Markey, Using Decision Tree Analysis for
Intrusion Detection: A How-To Guide, SANS Institute
InfoSec Reading Room, June, 2011.
[7] T. M. Mitchell, (1997). Machine Learning. The Mc-
Graw-Hill Companies, Inc. ISBN 0070428077.
[8] J.R. Quinlan, C4.5: Programs for Machine Learning,
Morgan Kaufmann Publishers, 1993.
[9] D.P. Gaikwad, and R.C. Thool, Intrusion Detection
System Using Bagging with Partial Decision Tree Base
Classifier,Proc. of the 4th
International Conference on
Advances in Computing, Communication and Control,
2015.
[10] K. Bajaj, and A. Arora, Improving the Intrusion
Detection using Discriminative Machine Learning
Approach and Improve the Time Complexity by Data
Mining Feature Selection Methods, International
Journal of Computer Science, vol. 76, Aug, 2013.
[11] A. Alazab, M. Hobbs, J.Abawajy, and M. Alazab,
Using Feature Selection for Intrusion Detection System,
International Symposium on Communications and
Information Technologies, 2012.
[12] S. Thaseen, and Ch. A. Kumar, An Analysis of
Supervised Tree Based Classifiers for Intrusion
Detection System, In Proc. of the 2013 International
Conference on Pattern Recognition, Informatics and
Mobile Engineering, Feb, 2013.
[13] S. Revathi, and A. Malathi, A detailed Analysis on
NSL-KDD Dataset Using Various Machine Learning
Techniques for Intrusion Detection, International
7. Int. J. Advanced Networking and Applications
Volume: 07 Issue: 04 Pages: 2828-2834 (2016) ISSN: 0975-0290
2834
Journal of Engineering research and Technology, vol 2,
Issue 12, Dec, 2013.
[14] V. Jyothsna, and V.V. Prasad, A Comparative
Study on Performance Evaluation of Intrusion
Detection System through Feature Reduction for High
Speed Networks, Global Journal of Computer Science
and Technology: E Network, Web and Security, vol. 14,
Issue 7, Version 1.0, 2014.
[15] P. Ghosh, C. Debnath, D. Metia, and Dr. R. Dutta,
An Efficient Hybrid Multilevel Intrusion Detection
System in Cloud Environment, IOSR Journal of
Computer Engineering (IOSR-JCE) e-ISSN: 2278-0661,
p-ISSN: 2278-8727, Volume 16, Issue 4, Ver. VII (Jul –
Aug. 2014), PP 16-26.
[16] K.S. Elekar, and M.M. Waghmare, Comparison of
Tree base Data Mining Algorithms for Network
Intrusion Detection, International Journal of
Engineering, Education and Technology, vol 3 Issue 2,
2015.
[17] S. Mallissery, S. Kolekar, and R. Ganiga, Accuracy
Analysis of Machine Learning Algorithms for intrusion
Detection System using NSL-KDD Dataset, Proc.
International Conference on Future Trends in
Computing and Communication -- FTCC 2013, July
2013, Bangkok.
[18] A.S.A. Aziz, A.E. Hassanien, S. El-Ola Hanafy,
M.F. Tolba, Multi-layer hybrid machine learning
techniques for anomalies detection and classification
approach, 13th International Conference on Hybrid
Intelligent Systems (HIS), 2013, IEEE.
[19] A. Raeeyat, and H. Sajedi, HIDS: DC-ADT: An
Effective Hybrid Intrusion Detection System based on
Data Correlation and Adaboost based Decision Tree
classifier, Journal of Academic and Applied Studies,
vol. 2(12), Dec. 2012, pp.19-33.
[20] R.M. Elbasiony, E.A. Sallan, T.E. Eltobely, and
M.M. Fahmy, A hybrid network intrusion detection
framework based on random forests and weighted k-
means, Ain Shams Engineering Journal, vol.4, Issue 4,
Dec, 2013, pp. 753-762.
[21] D.P. Gaikwad, and R.C. Thool, Intrusion Detection
System using Ripple Down Rule learner and Genetic
Algorithm, International Journal of Computer Science
and Information Technologies, vol. 5, 2014, pp. 6976-
6980.
[22] L.M. Ibrahim, D.T. Basheer, and M.S. Mahmod, A
comparison study for intrusion database (KDD99, NSL-
KDD) based on Self Organization Map (SOM)
Artificial Neural Network, Journal of Engineering
Science and Technology, vol. 8, No. 1, 2013, pp. 107-
119.