ACEEE Int. J. on Signal & Image Processing, Vol. 01, No. 03, Dec 2010




        Intrusion Detection using C4.5: Performance Enhancement by Classifier Combination

                     Manasi Gyanchandani (1), R. N. Yadav (2), J. L. Rana (3)
        (1) Dept. of Information Technology, MANIT Bhopal, manasi_gyanchandani@yahoo.co.in
        (2) Dept. of Electronics and Communication, MANIT Bhopal, myadav@gmail.com
        (3) Ex-HOD, Dept. of CS/IT, MANIT Bhopal, jl_rana@yahoo.com

Abstract: Data security has become a very critical part of any organizational information system. An Intrusion Detection System (IDS) is used as a security measure to preserve data integrity and system availability in the face of various attacks. This paper evaluates the performance of the C4.5 classifier and of its combinations using bagging, boosting and stacking over the NSL-KDD dataset for IDS. The NSL-KDD dataset consists of selected records of the complete KDD dataset.

                     I. INTRODUCTION

    Information technology has drastically changed our lives, and at the same time it has made us completely dependent on technology that is vulnerable to attacks. Because of these attacks, the confidentiality, integrity and availability of information may be lost. These attacks are estimated to cost tens or even hundreds of millions of dollars each year, and their number roughly doubles every year; they may even pose a serious threat to national security. IDSs were designed to monitor for attacks and to generate alarms whenever abnormal activities are detected.
    IDSs can be categorized by the events they monitor and by the way they collect evidence that an intrusion has occurred. IDSs that analyze the data circulating on the network are called Network-based IDSs (NIDSs), while IDSs that reside on a host and collect logs of operating-system events are called Host-based IDSs (HIDSs) [3][8].
    Two types of intrusion detection techniques exist, based on the method of inspecting the traffic:
     • Signature-based IDS
     • Statistical anomaly-based IDS
    In a signature-based IDS, also known as misuse detection, signatures of known attacks are stored and incoming events are matched against them; an intrusion is signalled whenever a match is found. The main drawback of this method is that it cannot detect new attacks whose signatures are unknown, so an IDS using misuse detection will only detect known attacks, or attacks similar enough to a known attack to match a stored signature [3]. Statistical anomaly-based intrusion detection has attracted many academic researchers because of its potential for addressing novel attacks, and several machine learning algorithms have been found to achieve a very high detection rate while keeping the false alarm rate low. Anomaly detection applied to intrusion detection and computer security has been an active area of research since it was originally proposed in [8].
    Initially the KDDCUP '99 dataset was used for IDS experiments, but it has some inherent problems. The most important is the huge number of redundant records: about 78% and 75% of the records are duplicates in the train and test sets, respectively. The redundant records in the train set bias learning algorithms towards the more frequent records and prevent them from learning infrequent records, which are usually more harmful to networks, such as U2R and R2L attacks. The duplicated records in the test set, in turn, bias the evaluation results towards methods that have better detection rates on the frequent records. To remove these problems a new dataset, NSL-KDD, was proposed [1].
    C4.5, an extension of the ID3 algorithm [11], is an algorithm for generating decision trees. The decision trees generated by C4.5 can be used for classification, and for this reason C4.5 is often referred to as a statistical classifier. Its results can vary significantly when the training data is changed; this variation is known as error due to variance, and it can be reduced by combining classifiers.
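As a small illustration of this variance (not taken from the paper), the sketch below trains the same tree-learning algorithm on two different random samples of a synthetic dataset and measures how often the two resulting trees disagree on unseen points; scikit-learn's CART tree is used here as a stand-in for C4.5.

```python
# Illustration of "error due to variance": the same tree-learning algorithm,
# trained on two different samples of the data, yields trees that disagree on a
# noticeable fraction of new examples. (scikit-learn's CART stands in for C4.5.)
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=4000, n_features=20, random_state=0)
X_train, y_train, X_test = X[:3000], y[:3000], X[3000:]

rng = np.random.default_rng(0)
trees = []
for _ in range(2):
    idx = rng.choice(len(X_train), size=1500, replace=False)  # a different training sample each time
    trees.append(DecisionTreeClassifier().fit(X_train[idx], y_train[idx]))

disagreement = np.mean(trees[0].predict(X_test) != trees[1].predict(X_test))
print(f"test points on which the two trees disagree: {disagreement:.1%}")
```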
    Section II presents the description of the dataset used. Section III describes the various classifier combination techniques. Section IV provides the experimental results and discussion. Section V concludes the paper.

                II. DATA SET DESCRIPTION

    Most experiments on intrusion detection are carried out on the KDDCUP '99 dataset, which is a subset of the 1998 DARPA Intrusion Detection Evaluation data set obtained by extracting 41 features from the raw DARPA 98 data. [4] defined higher-level features that help in distinguishing "good" (normal) connections from "bad" connections (attacks). This data can be used to test both host-based and network-based systems, and both signature and anomaly detection systems. A connection is a sequence of Transmission Control Protocol (TCP) packets starting and ending at well-defined times, between which data flows from a source IP address to a target IP address under some well-defined protocol. Each connection is labeled as normal or as an attack, with exactly one specific attack type, and each connection record consists of about 100 bytes [9]. Some of the basic features of an individual TCP connection are listed in Table I.




                                TABLE I
            BASIC FEATURES OF INDIVIDUAL TCP CONNECTIONS

    Feature name     Description                                              Type
    Duration         length (number of seconds) of the connection             Continuous
    Protocol_type    type of the protocol, e.g. tcp, udp, etc.                Discrete
    Service          network service on the destination, e.g. http, telnet    Discrete
    src_bytes        number of data bytes from source to destination          Continuous
    dst_bytes        number of data bytes from destination to source          Continuous
    Flag             normal or error status of the connection                 Discrete
    Land             1 if connection is from/to the same host/port;           Discrete
                     0 otherwise
    wrong_fragment   number of ``wrong'' fragments                            Continuous
    Urgent           number of urgent packets                                 Continuous
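The paper does not spell out its preprocessing steps, so the following is only a minimal sketch of reading an NSL-KDD file and reducing the attack labels to the two classes evaluated later (normal vs. anomaly). The file name and the assumption that the label column follows the 41 features come from the publicly distributed NSL-KDD files, not from the paper.

```python
# Minimal loading sketch (not the authors' setup): read an NSL-KDD file with
# pandas and collapse the attack names into the two classes used in this paper.
# Assumes "KDDTrain+.txt" is the comma-separated NSL-KDD file in which the 41
# connection features are followed by the attack label (and a difficulty score).
import pandas as pd

train = pd.read_csv("KDDTrain+.txt", header=None)
features = train.iloc[:, :41]   # the 41 connection features (Table I lists the basic ones)
labels = train.iloc[:, 41]      # attack name, e.g. "normal", "neptune", ...

# Two-class view: every record that is not "normal" is treated as an anomaly.
two_class = labels.where(labels == "normal", other="anomaly")
print(two_class.value_counts())
```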
A. NSL-KDD
    The KDD train and test sets contain a huge number of redundant records: about 78% and 75% of the records are duplicated in the train and test sets, respectively. This may bias classification algorithms towards the redundant records and prevent them from classifying the remaining (non-duplicate) records. To solve this problem a new dataset, NSL-KDD, was developed: all repeated records in the entire KDD train and test sets were removed, and only one copy of each record was kept. Tables II and III show the statistical analysis of the reduction of repeated records in the KDD train and test sets, respectively [1].

                            TABLE II
    STATISTICAL ANALYSIS OF REDUNDANT RECORDS IN THE KDD TRAIN SET

               Original Records    Distinct Records    Reduction rate
    Attacks        3,925,650            262,178            93.32%
    Normal           972,781            812,814            16.44%
    Total          4,898,431          1,074,992            78.05%

                            TABLE III
    STATISTICAL ANALYSIS OF REDUNDANT RECORDS IN THE KDD TEST SET

               Original Records    Distinct Records    Reduction rate
    Attacks          250,436             29,378            88.26%
    Normal            60,591             47,911            20.92%
    Total            311,027             77,289            75.15%
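The reductions in Tables II and III come from keeping a single copy of every duplicated record. Below is a minimal sketch of that de-duplication step (an illustration, not the NSL-KDD authors' actual tooling), assuming the raw KDD'99 records are already in a pandas DataFrame.

```python
# De-duplication step described above: keep only one copy of each distinct record.
# `records` is assumed to be a DataFrame holding raw KDD'99 train or test records.
import pandas as pd

def deduplicate(records: pd.DataFrame) -> pd.DataFrame:
    distinct = records.drop_duplicates()
    reduction = 1.0 - len(distinct) / len(records)
    print(f"{len(records)} records -> {len(distinct)} distinct ({reduction:.2%} reduction)")
    return distinct

# Applied to the full KDD'99 train set, this should roughly reproduce the 78.05%
# total reduction reported in Table II.
```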
B. Evaluation Metrics
    The metrics mainly used to evaluate the performance of a classifier are presented in [6][2] and are given here for ready reference.
     • True positives (TP) and true negatives (TN) are correct classifications. A true positive means an alert is raised when there is indeed an intrusion. The true positive rate is calculated as
                TPR = TP / (TP + FN)
     • A false positive (FP) occurs when the outcome is incorrectly predicted as yes (positive) when it is actually no (negative). The false positive rate is calculated as
                FPR = FP / (TN + FP)
     • A false negative (FN) occurs when the outcome is incorrectly predicted as negative when it is actually positive.
     • Recall: the percentage of the total relevant documents in a database that are retrieved by a search. If it is known that there were 1000 relevant documents in a database and a search retrieved 100 of them, the recall would be 10%. It is calculated as
                Recall = TP / (TP + FN)
     • Precision: the percentage of relevant documents in relation to the number of documents retrieved. If a search retrieves 100 documents and 20 of them are relevant, the precision is 20%. It is calculated as
                Precision = TP / (TP + FP)
     • The overall success rate is the number of correct classifications divided by the total number of classifications:
                Success rate = (TP + TN) / (TP + TN + FP + FN)
                Error rate = 1 - Success rate
     • In a multiclass prediction, the result on a test set is often displayed as a two-dimensional confusion matrix with a row and a column for each class. Each matrix element shows the number of test examples for which the actual class is the row and the predicted class is the column. Good results correspond to large numbers down the main diagonal and small, ideally zero, off-diagonal elements. The confusion matrix is formed as shown in Table IV.

                            TABLE IV
                        CONFUSION MATRIX

                                   Predicted Class
                                   Attack     Normal
    Actual Class     Attack          TP         FN
                     Normal          FP         TN
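To make the formulas above concrete, the short sketch below computes them from the four counts of Table IV; the counts passed in the example call are made up purely for illustration.

```python
# Evaluation metrics computed from the confusion-matrix counts of Table IV,
# following the paper's convention that "attack" is the positive class.
def metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    success = (tp + tn) / (tp + tn + fp + fn)
    return {
        "TPR / Recall": tp / (tp + fn),
        "FPR": fp / (tn + fp),
        "Precision": tp / (tp + fp),
        "Success rate": success,
        "Error rate": 1 - success,
    }

# Hypothetical counts, only to show the calculation:
print(metrics(tp=9500, fn=500, fp=300, tn=9700))
```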




              III. CLASSIFIER COMBINATION TECHNIQUES

    Classifier combination techniques can be used to reduce the error due to variance. In order to make decisions in intrusion detection more reliable, the outputs of different models can be combined. Several machine learning techniques do this by learning an ensemble of models and using them in combination; bagging, boosting and stacking are the most efficient among them. These combinations can increase the predictive performance over a single model, can be applied to numeric prediction problems as well as classification tasks, and all three perform well. An ensemble of classifiers is a set of classifiers whose individual decisions are combined to classify new examples; the purpose of combining classifiers is to improve on the accuracy of a single classifier [10].

A. Bagging
    The bootstrap aggregating (bagging) algorithm generates different classifiers from different bootstrap samples and combines the decisions of the different classifiers into a single prediction by voting (the class that gets the most votes from the classifiers wins).
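The paper does not state which toolkit was used (reference [5] points to Weka's documentation), so the sketch below only illustrates the bagging idea with scikit-learn on synthetic stand-in data; a CART decision tree replaces C4.5/J48, which scikit-learn does not provide.

```python
# Bagging sketch (illustrative, not the paper's setup): each tree is trained on a
# bootstrap sample and the predictions are combined by majority vote.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the encoded NSL-KDD records (41 features, 2 classes).
X, y = make_classification(n_samples=2000, n_features=41, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

bagging = BaggingClassifier(DecisionTreeClassifier(),  # CART as a stand-in for C4.5
                            n_estimators=10,           # ten bootstrap samples / trees
                            bootstrap=True, random_state=0)
bagging.fit(X_train, y_train)
print("bagging accuracy:", bagging.score(X_test, y_test))
```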
B. Boosting
    Another method of constructing an ensemble of classifiers is known as boosting, which is used to boost the performance of a weak learner. A weak learner is a simple classifier whose error is less than 50% on the training instances. Models that are more successful are assigned more weight than the other models, and each new model is influenced by the performance of the previously built models.
    Boosting can thus build a powerful combined classifier from very simple learning methods: it converts these so-called weak learners into strong ones. It often produces classifiers that are more accurate on fresh data than those generated by bagging, but it sometimes fails in practical situations and generates a classifier that is less accurate than a single classifier built from the same data [7].
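Again as an illustration only (the paper does not name the exact boosting variant it used), the sketch below applies AdaBoost, a common boosting algorithm, to the same kind of synthetic stand-in data; shallow trees play the role of the weak learners.

```python
# Boosting sketch (illustrative, not the paper's setup): each new weak learner
# focuses on the training instances the previous ones misclassified, and the
# more accurate learners receive a larger weight in the final combined vote.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=41, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

boosting = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),  # weak learner
                              n_estimators=50, random_state=0)
boosting.fit(X_train, y_train)
print("boosting accuracy:", boosting.score(X_test, y_test))
```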
C. Stacking
    Stacking is the abbreviation of Stacked Generalization. Unlike bagging and boosting, it uses different learning algorithms to generate the ensemble of classifiers. The main idea of stacking is to combine classifiers produced by different learners, such as decision trees, instance-based learners, etc. Since each learner uses a different knowledge representation and different learning biases, the hypothesis space is explored differently and different classifiers are obtained; it is therefore expected that they will not be correlated.
    Once the classifiers have been generated, they must be combined. Unlike bagging and boosting, stacking does not use a voting system because, for example, if the majority of the classifiers made bad predictions this would lead to a bad final classification. To solve this problem, stacking uses the concept of a meta learner [10]. The meta learner (or level-1 model) tries to learn, using a learning algorithm, how the decisions of the base classifiers (or level-0 models) should be combined.
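A minimal stacking sketch in the same illustrative setting: two level-0 learners of different types (a decision tree and an instance-based k-NN classifier, echoing the examples in the text) and a logistic-regression meta learner that learns how to combine their outputs. The choice of learners here is an assumption for illustration, not the paper's configuration.

```python
# Stacking sketch (illustrative, not the paper's setup): heterogeneous level-0
# classifiers plus a level-1 meta learner; no simple voting is involved.
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=41, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

level0 = [("tree", DecisionTreeClassifier()),   # decision-tree learner
          ("knn", KNeighborsClassifier())]      # instance-based learner
stacking = StackingClassifier(estimators=level0,
                              final_estimator=LogisticRegression())  # meta learner
stacking.fit(X_train, y_train)
print("stacking accuracy:", stacking.score(X_test, y_test))
```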
               IV. RESULTS AND DISCUSSION

    Classifier combinations are used in order to reduce the error due to variance. Initially the C4.5 classifier is applied to the NSL-KDD dataset, which contains 125,973 records in the train set and 22,544 records in the test set. To improve the performance of the C4.5 classifier over NSL-KDD, the classifier combination techniques bagging, boosting and stacking are then applied, and per-class metrics are computed from the test-set predictions, as in the sketch below.
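The per-class figures reported in Tables V and VI (TP rate, FP rate, recall and precision for the normal and anomaly classes) are the usual per-class statistics produced by classification toolkits. A scikit-learn sketch of how such numbers are obtained from test-set predictions, with tiny made-up label vectors standing in for the real NSL-KDD results:

```python
# Per-class precision/recall of the kind reported in Tables V and VI.
# The label vectors below are made up solely to show the report format; in the
# actual experiment they would be the NSL-KDD test labels and model predictions.
from sklearn.metrics import classification_report, confusion_matrix

y_test = ["normal", "normal", "anomaly", "anomaly", "anomaly", "normal"]
y_pred = ["normal", "anomaly", "anomaly", "anomaly", "normal", "normal"]

print(confusion_matrix(y_test, y_pred, labels=["anomaly", "normal"]))
print(classification_report(y_test, y_pred, labels=["anomaly", "normal"], digits=3))
```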
    It was found that for the normal class, as shown in Table V, bagging gives the better result: the recall was 0.719 for bagging against 0.708 for C4.5, with both having the same precision (0.973). For the anomaly class, shown in Table VI, both recall and precision have higher values for bagging.

                            TABLE V
              PERFORMANCE METRICS FOR NORMAL CLASS

                 Bagging    Boosting    Stacking    C4.5
    TP            0.973       0.957       0.974     0.973
    FP            0.288       0.346       0.326     0.304
    Recall        0.719       0.677       0.693     0.708
    Precision     0.973       0.957       0.974     0.973

                            TABLE VI
              PERFORMANCE METRICS FOR ANOMALY CLASS

                 Bagging    Boosting    Stacking    C4.5
    TP            0.712       0.654       0.674     0.696
    FP            0.027       0.043       0.026     0.027
    Recall        0.972       0.953       0.971     0.971
    Precision     0.712       0.654       0.674     0.696

                        V. CONCLUSIONS

    Error due to variance has been reduced using classifier combinations, thus increasing the performance of classification on the NSL-KDD dataset. Of the three combination schemes, bagging provides the better results. The NSL-KDD dataset can also be used for performance evaluation over 5 classes (normal, dos, probe, u2r and r2l) instead of 2 classes. Further performance can be improved by reducing the features as given in [12]; different sets of features can be used for the different classes. More classification algorithms and their combinations can also be applied to the NSL-KDD dataset.



                          REFERENCES

[1] M. Tavallaee, E. Bagheri, W. Lu, and A. Ghorbani, "A Detailed Analysis of the KDD CUP 99 Data Set," Proceedings of the Second IEEE Symposium on Computational Intelligence for Security and Defense Applications (CISDA), 2009.
[2] M. Shyu, S. Chen, K. Sarinnapakorn, and L. Chang, "A novel anomaly detection scheme based on principal component classifier," Proceedings of the IEEE Foundations and New Directions of Data Mining Workshop, in conjunction with the Third IEEE International Conference on Data Mining (ICDM'03), pp. 172-179, 2003.
[3] D. E. Denning, "An Intrusion Detection Model," IEEE Transactions on Software Engineering, SE-13, pp. 222-232, 1987.
[4] Stolfo J., Fan W., Lee W., Prodromidis A., and Chan P. K., "Cost-based modeling and evaluation for data mining with application to fraud and intrusion detection," DARPA Information Survivability Conference, 2000.
[5] Weka documentation, http://weka.sourceforge.net/wekadoc/index.php/en:
[6] P. Srinivasulu, D. Nagaraju, P. Ramesh Kumar, and K. Nageswara Rao, "Classifying the Network Intrusion Attacks using Data Mining Classification Methods and their Performance Comparison," International Journal of Computer Science and Network Security, Vol. 9, No. 6, pp. 11-18, June 2009.
[7] Ian H. Witten and Eibe Frank, Data Mining: Practical Machine Learning Tools and Techniques, Second Edition, Elsevier, 2005.
[8] Srilatha Chebrolu, Ajith Abraham, and Johnson P. Thomas, "Feature Deduction and Ensemble Design of Intrusion Detection Systems," Computers & Security, Vol. 24, pp. 295-307, Elsevier, 2005.
[9] The UCI KDD Archive, KDD Cup 1999 dataset, 1999, http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html
[10] Ricardo Aler, Daniel Borrajo, and Agapito Ledezma, "Heuristic Search Based Stacking of Classifiers," Universidad Carlos III, Avda. Universidad 30, 28911 Leganés (Madrid), 2002.
[11] http://en.wikipedia.org/wiki/C4.5_algorithm
[12] Anazida Zainal, Mohd Aizaini Maarof, and Siti Mariyam Shamsuddin, "Ensemble Classifiers for Network Intrusion Detection System," Journal of Information Assurance and Security, Vol. 4, pp. 217-225, 2009.




© 2010 ACEEE
DOI: 01.IJSIP.01.03.45
