SlideShare a Scribd company logo
1 of 4
Download to read offline
ISSN: 2277 – 9043
                                                    International Journal of Advanced Research in Computer Science and Electronics Engineering
                                                                                                                  Volume 1, Issue 2, April 2012




                    Attack Detection By Clustering And
                          Classification Approach
                               Ms. Priyanka J. Pathak , Asst. Prof. Snehlata S. Dongre



Abstract—Intrusion detection is a software application that
monitors network and/or system activities for malicious               streams for intrusion detection. Clustering is done by
activities or policy violations and produces reports to a
                                                                      K-means algorithm this will generates clusters of similar
Management Station. Security is becoming big issue for all
networks. Hackers and intruders have made many successful             attack type which fall into same category. The centroids
attempts to bring down high profile company networks and web          generated by this algorithm is used in next step of
services. Intrusion Detection System (IDS) is an important            classification. In classification step K-nearest neighbors and
detection that is used as a countermeasure to preserve data           decision tree algorithms are used which detects the different
integrity and system availability from attacks. The work is           types of attacks.
implemented in two phases, in first phase clustering by
K-means is done and in next step of classification is done with
k-nearest neighbours and decision trees. The objects are                                      II. CLUSTERING
clustered or grouped based on the principle of maximizing the            A cluster [Han & Camber] is a collection of data objects
intra-class similarity and minimizing the interclass similarity.      that are similar to one another within the same cluster and are
This paper proposes an approach which make the clusters of            dissimilar to the objects in other clusters. A cluster of data
similar attacks and in next step of classification with K nearest     objects can be treated collectively as one group and so may
neighbours and Decision trees it detect the attack types. This
                                                                      be
method is advantageous over single classifier as it detect better
class than single classifier system.                                  Considered as a form of data compression. Although
                                                                      classification is an effective means for distinguishing groups
  Index Terms—Data mining, Decision Trees, K-Means,                   or classes of objects, it requires the often costly collection
K-Nearest Neighbours,                                                 and labeling of a large set of training tuples or patterns, which
                                                                      the classifier uses to model each group. It is often more
                      I. INTRODUCTION                                 desirable to proceed in the reverse direction: First partition
                                                                      the set of data into groups based on data similarity (e.g., using
  Now a day everyone gets connected to the system but, there
                                                                      clustering), and then assign labels to the relatively small
are many issues in network environment. So there is need of
                                                                      number of groups. Additional advantages of such a
securing information, because there are lots of security              clustering-based process are that it is adaptable to changes
threats are present in network environment. Intrusion                 and helps single out useful features that distinguish different
detection[9] is a software application that monitors network          groups.
and/or system activities for malicious activities or policy              Clustering[8],[10] is an unsupervised learning technique
violations and produces reports to a Management Station.              which divides the datasets into subparts, which share
Intrusion detection systems are usually classified as host            common properties. For clustering data points, there should
based or network based in terms of target domain. A number            be high intra cluster similarity and low inter cluster similarity.
of techniques are available for intrusion detection [8],[11].         A clustering method which results in such type of clusters is
Data mining is the one of the efficient techniques available          considered as good clustering algorithm.
for intrusion detection. Data mining refers to extracting or
                                                                      A. K-Means Algorithm
mining knowledge from large amounts of data. Data mining
techniques may be supervised or unsupervised. The                        K-means [Han & Camber] is one of the simplest
Clustering[10] and Classification are the two important               unsupervised learning algorithms that solve the well known
techniques used in data mining for extracting valuable                clustering problem. The procedure follows a simple and easy
information. This paper proposes clustering and                       way to classify a given data set through a certain number of
classification. Classification is the processing of finding a set     clusters (assume k clusters) fixed a priori. The main idea is to
of models which describe and distinguish data classes or              define k centroids, one for each cluster. These centroids
concepts, for the purposes of being able to use the model to          should be placed in a cunning way because of different
predict the class of objects whose class label is unknown. The        location causes different result. So, the better choice is to
paper is organized as follows. Section 2 gives the details of         place them as much as possible far away from each other. The
clustering algorithms. Section 3 & Section 4 details about            next step is to take each point belonging to a given data set
strategies for classification for novel class detection in data       and associate it to the nearest centroid. When no point is
                                                                      pending, the first step is completed and an early groupage is
                                                                      done. At this point we need to re-calculate k new centroids as


                                                                                                                                          115
                                             All Rights Reserved © 2012 IJARCSEE
ISSN: 2277 – 9043
                                                           International Journal of Advanced Research in Computer Science and Electronics Engineering
                                                                                                                         Volume 1, Issue 2, April 2012


barycenters of the clusters resulting from the previous step.                figure shows overall system architecture. The Clustering and
After we have these k new centroids, a new binding has to be                 Classification are the two parts of system.
done between the same data set points and the nearest new                       The input to the system is KDD data set. The first step is
centroid. A loop has been generated. As a result of this loop                preprocess the input and this preprocessed input is applied to
we may notice that the k centroids change their location step                the K-Means Clustering algorithm. The Preprocessing steps
by step until no more changes are done.                                      contains :
   The algorithm is composed of the following steps:                               Data cleaning
   Algorithm: k-means                                                              Feature Selection
   The k-means algorithm for partitioning, where each                              Data transformation etc.
cluster’s center is represented by the mean value of the
objects in the cluster.
   Input:
   k: the number of clusters,
   D: a data set containing n objects.
   Output:
   A set of k clusters.
   Method:
   (1) Arbitrarily choose k objects from D as the initial
cluster centers;
   (2) Repeat
   (3) Reassign each object to the cluster to which the object
is the most similar, based on the mean value of the objects in
the cluster;
   (4) Update the cluster means, i.e., calculate the mean value
of the objects for each cluster;
   (5) Until no change;


                                                                                                    Fig.3 System Architecture

                                                                              The K-Means creates the clusters of similar attacks based on
                                                                             similarity measure. Each record of data set contains 42
                                                                             different features. The values of these features are already
     Fig: 1 Clustering of a set of objects based on the k-means method
                                                                             pre-processed in training data. The calculated centroid values
                                                                             are used in next K-Nearest neighbours classification.
  This produces a separation of the objects into groups from
which the metric to be minimized can be calculated.




                                                                                                      Fig4:Preprocessed Data

                    Fig2: output of K-means algorithm
                                                                               A. K-Nearest Neighbour Classification
                     III.    CLASSIFICATION                                    In this method, one simply finds in the N-dimensional
                                                                             feature space the closest object from the training set to an
   Classification [4],[13] is a process of finding a model that              object being classified. Since the neighbour is nearby, it is
describes and distinguishes data classes or concept for the                  likely to be similar to the object being classified and so is
purpose of being able to use the model to predict the class of               likely to be the same class as that object. Nearest neighbour
objects whose class label is unknown. The derived model is                   methods have the advantage that they are easy to implement.
based on the analysis of set of training data. This paper                    They can also give quite good results if the features are
introduces a method about clustering and classification. The                 chosen carefully. A new sample is classified by calculating
clustering is done by K-Means algorithm which is explained                   the distance to the nearest training case; the sign of that point
in earlier section. The classification method is proposed                    then determines the classification of the sample. Closeness"
through which novel class can be detected. The following


                                                                                                                                                 116
                                                  All Rights Reserved © 2012 IJARCSEE
ISSN: 2277 – 9043
                                                       International Journal of Advanced Research in Computer Science and Electronics Engineering
                                                                                                                     Volume 1, Issue 2, April 2012

is defined in terms of Euclidean distance, where the                          2.    Choose attribute for which entropy is minimum (or,
Euclidean distance between two points,                                              equivalently, information gain is maximum)
                                                                              3.    Make node containing that attribute

                                                                         The algorithm is as follows:
                                                                         Input: A data set, S
                                                                         Output: A decision tree
                                                                              If all the instances have the same value for the target
                                                                                  attribute then return a decision tree that is simply
                                                                                  this value.
                                                                              Else
                                                                                       ◦ Compute Gain values for all attributes and
                                                                                            select an attribute with the highest value
                                                                                            and create a node for that attribute.
                                                                                       ◦ Make a branch from this node for every
         Fig5: Flow Chart of K-Nearest Neighbour algorithm
                                                                                            value of the attribute
  The unknown sample is assigned the most common class                                 ◦ Assign all possible values of the attribute
among its k nearest neighbours. Following fig. shows attack                                 to branches.
types and their number of nodes detected by K-Nearest                                  ◦ Follow each branch by partitioning the
neighbour algorithm.                                                                        dataset to be only instances whereby the
                                                                                            value of the branch is present and then go
                                                                                            back to 1.
                                                                           This algorithm usually produces small trees, but it does not
                                                                         always produce the smallest possible tree.
                                                                         The optimization step makes use of information entropy:
                                                                         Entropy:


                                                                         Where :
                                                                              E(S) is the information entropy of the set S;
                                                                              n is the number of different values of the attribute
                                                                                  in S (entropy is computed for one chosen attribute)
                                                                              f S(j) is the frequency (proportion) of the value j in
                                                                                  the set S.
           Fig6: Output of K-Nearest-Neighbour algorithm
                                                                              log2 is the binary logarithm
                                                                         An entropy of 0 identifies a perfectly classified set.
                                                                         Entropy is used to determine which node to split next in the
  B. Decision Trees                                                      algorithm. The higher the entropy, the higher the potential to
   Decision tree learning [5] ,[6] is a method commonly used             improve the classification.
in data mining. The goal is to create a model that predicts the          Gain:
value of a target variable based on several input variables.
Each interior node corresponds to one of the input variables;            Gain is computed to estimate the gain produced by
there are edges to children for each of the possible values of
                                                                         a split over an attribute :
that input variable. Each leaf represents a value of the target
variable given the values of the input variables represented
by the path from the root to the leaf. A tree can be "learned"
by splitting the source set into subsets based on an attribute
value test. This process is repeated on each derived subset in           Where :
a recursive manner called recursive partitioning. The
                                                                                   G(S,A) is the gain of the set S after a split over the
recursion is completed when the subset at a node all has the
                                                                                    A attribute
same value of the target variable, or when splitting no longer
adds value to the predictions.                                                     E(S) is the information entropy of the set S.
   C. ID3 Algorithm
                                                                                   m is the number of different values of the attribute A
  In decision tree learning, ID3 (Iterative Dichotomiser 3)[6]                      in S.
is an algorithm used to generate a decision tree. The ID3
algorithm can be summarized as follows:                                            Fs(Ai) is the frequency (proportion) of the items
     1. Take all unused attributes and count their entropy                          possessing Ai as value for A in S.
          concerning test samples
                                                                                   Ai is ith possible value of A.


                                                                                                                                             117
                                               All Rights Reserved © 2012 IJARCSEE
ISSN: 2277 – 9043
                                                             International Journal of Advanced Research in Computer Science and Electronics Engineering
                                                                                                                           Volume 1, Issue 2, April 2012


          SAi is a subset of S containing all items where the                 [12] Srilatha Chebrolua, Ajith Abrahama, Johnson P. Thomasa” Feature
                                                                                    deduction and ensemble design of intrusion detection systems” 2004
           value of A is Ai.                                                        Elsevier Ltd
Gain quantifies the entropy improvement by splitting over an                   [13] Sandhya Peddabachigaria, Ajith Abrahamb,,Crina Grosanc, Johnson
                                                                                    Thomasa” Modeling intrusion detection system using hybrid intelligent
attribute.                                                                          systems” 2005 Elsevier Ltd.


                                                                                  First Author:
                                                                                  Priyanka J. Pathak
                                                                                  IV sem MTech[CSE],
                                                                                  G.H.Raisoni College of Engineering,Nagpur,
                                                                                  R.T.M.N.U, Nagpur

                                                                                  Second Author :
                                                                                  Asst. Prof. Snehlata Dongre
                                                                                  CSE Department,
                                                                                  G.H.Raisoni College of engineering,Nagpur
                                                                                  R.T.M.N.U, Nagpur




                      Fig7: Output of ID3 algorithm

                          IV. CONCLUSION
    In this paper, the work is done through which the novel
class is detected The system uses K-Means clustering
algorithm which produces different clusters of similar type of
attacks and total nodes per clusters in the input dataset. Also
it shows the updated centroids of each parameter in the input
dataset. The Classification stage gives details about detection
of different types of attacks and number of nodes in dataset.

                                REFERENCES
[1]  Kapil K. Wankhade, Snehlata S. Dongre, Prakash S. Prasad , Mrudula
     M. Gudadhe, Kalpana A. Mankar,” Intrusion Detection System Using
     New Ensemble Boosting Approach” 2011 3rd International
     Conference on Computer Modeling and Simulation (ICCMS 2011)
[2] Kapil K. Wankhade, Snehlata S. Dongre, Kalpana A. Mankar, Prashant
     K. Adakane,” A New Adaptive Ensemble Boosting Classifier for
     Concept Drifting Stream Data” 2011 3rd International Conference on
     Computer Modeling and Simulation (ICCMS 2011)
[3] Hongbo Zhu, Yaqiang Wang, Zhonghua Yu “Clustering of Evolving
     Data Stream with Multiple Adaptive Sliding Window” International
     Conference on Data Storage and Data Engineering, 2010.
[4] Peng Zhang†, Xingquan Zhu‡, Jianlong Tan†, Li Guo† “Classifier and
     Cluster Ensembles for Mining Concept Drifting Data Streams” IEEE
     International Conference on Data Mining, 2010.
[5] T.Jyothirmayi, Suresh Reddy “An Algorithm for Better Decision
     Tree” T.Jyothirmayi et. al. / (IJCSE) International Journal on Computer
     Science and Engineering , 2010
[6] Yongjin Liu, NaLi, Leina Shi, Fangping Li “An Intrusion Detection
     Method Based on Decision Tree”, International Conference on
     E-Health Networking, Digital Ecosystems and Technologies, 2010.
[7] A. Bifet, G. Holmes, B. Pfahringer, R. Kirkby and R. Gavalda, “New
     ensemble methods for evloving data streams”, In KDD’09, ACM,
     Paris, 2009, pp. 139-148.
[8] LID Li-xiong, KANGJing, GUO Yun-fei, HUANGHai “A Three-Step
     Clustering Algorithm over an Evolving Data Stream” The National
     High Technology Research and Development Program("863"
     Program) of China, Fund 2008 AAOII002 sponsors.
[9] Mrutyunjaya Panda, Manas Ranjan Patra “A COMPARATIVE
     STUDY OF DATA MINING ALGORITHMS FOR NETWORK
     INTRUSION DETECTION” First International Conference on
     Emerging Trends in Engineering and Technology,2008
[10] Shuang Wu, Chunyu Yang and Jie Zhou “Clustering-training for Data
     Stream Mining” Sixth IEEE International Conference on Data Mining -
     Workshops (ICDMW'06)
[11] Sang-Hyun Oh, Jin-Suk Kang, Yung-Cheol Byun, Gyung-Leen Park3
     and Sang-Yong Byun3 “Intrusion Detection based on Clustering a Data
     Stream” Third ACIS Int'l Conference on Software Engineering
     Research, Management and Applications (SERA’05).


                                                                                                                                                    118
                                                    All Rights Reserved © 2012 IJARCSEE

More Related Content

What's hot

Design of a Reliable Wireless Sensor Network with Optimized Energy Efficiency...
Design of a Reliable Wireless Sensor Network with Optimized Energy Efficiency...Design of a Reliable Wireless Sensor Network with Optimized Energy Efficiency...
Design of a Reliable Wireless Sensor Network with Optimized Energy Efficiency...paperpublications3
 
Using Neighbor’s State Cross-correlation to Accelerate Adaptation in Docitiv...
Using Neighbor’s State Cross-correlation to Accelerate Adaptation  in Docitiv...Using Neighbor’s State Cross-correlation to Accelerate Adaptation  in Docitiv...
Using Neighbor’s State Cross-correlation to Accelerate Adaptation in Docitiv...paperpublications3
 
A Novel Classification via Clustering Method for Anomaly Based Network Intrus...
A Novel Classification via Clustering Method for Anomaly Based Network Intrus...A Novel Classification via Clustering Method for Anomaly Based Network Intrus...
A Novel Classification via Clustering Method for Anomaly Based Network Intrus...IDES Editor
 
Machine Learning Algorithms for Image Classification of Hand Digits and Face ...
Machine Learning Algorithms for Image Classification of Hand Digits and Face ...Machine Learning Algorithms for Image Classification of Hand Digits and Face ...
Machine Learning Algorithms for Image Classification of Hand Digits and Face ...IRJET Journal
 
A NOVEL ROUTING PROTOCOL FOR TARGET TRACKING IN WIRELESS SENSOR NETWORKS
A NOVEL ROUTING PROTOCOL FOR TARGET TRACKING IN WIRELESS SENSOR NETWORKSA NOVEL ROUTING PROTOCOL FOR TARGET TRACKING IN WIRELESS SENSOR NETWORKS
A NOVEL ROUTING PROTOCOL FOR TARGET TRACKING IN WIRELESS SENSOR NETWORKSIJCNCJournal
 
Improved Performance of Unsupervised Method by Renovated K-Means
Improved Performance of Unsupervised Method by Renovated K-MeansImproved Performance of Unsupervised Method by Renovated K-Means
Improved Performance of Unsupervised Method by Renovated K-MeansIJASCSE
 
An efficient technique for color image classification based on lower feature ...
An efficient technique for color image classification based on lower feature ...An efficient technique for color image classification based on lower feature ...
An efficient technique for color image classification based on lower feature ...Alexander Decker
 
A survey on Object Tracking Techniques in Wireless Sensor Network
A survey on Object Tracking Techniques in Wireless Sensor NetworkA survey on Object Tracking Techniques in Wireless Sensor Network
A survey on Object Tracking Techniques in Wireless Sensor NetworkIRJET Journal
 
X-TREPAN : A Multi Class Regression and Adapted Extraction of Comprehensible ...
X-TREPAN : A Multi Class Regression and Adapted Extraction of Comprehensible ...X-TREPAN : A Multi Class Regression and Adapted Extraction of Comprehensible ...
X-TREPAN : A Multi Class Regression and Adapted Extraction of Comprehensible ...csandit
 
A COST EFFECTIVE COMPRESSIVE DATA AGGREGATION TECHNIQUE FOR WIRELESS SENSOR N...
A COST EFFECTIVE COMPRESSIVE DATA AGGREGATION TECHNIQUE FOR WIRELESS SENSOR N...A COST EFFECTIVE COMPRESSIVE DATA AGGREGATION TECHNIQUE FOR WIRELESS SENSOR N...
A COST EFFECTIVE COMPRESSIVE DATA AGGREGATION TECHNIQUE FOR WIRELESS SENSOR N...ijasuc
 
HCIFR: Hierarchical Clustering and Iterative Filtering Routing Algorithm for ...
HCIFR: Hierarchical Clustering and Iterative Filtering Routing Algorithm for ...HCIFR: Hierarchical Clustering and Iterative Filtering Routing Algorithm for ...
HCIFR: Hierarchical Clustering and Iterative Filtering Routing Algorithm for ...IJAEMSJORNAL
 
Energy Efficient Multipath Data Fusion Technique for Wireless Sensor Networks
Energy Efficient Multipath Data Fusion Technique for Wireless Sensor NetworksEnergy Efficient Multipath Data Fusion Technique for Wireless Sensor Networks
Energy Efficient Multipath Data Fusion Technique for Wireless Sensor NetworksIDES Editor
 
A seminar report on data aggregation in wireless sensor networks
A seminar report on data aggregation in wireless sensor networksA seminar report on data aggregation in wireless sensor networks
A seminar report on data aggregation in wireless sensor networkspraveen369
 
A Novel Dencos Model For High Dimensional Data Using Genetic Algorithms
A Novel Dencos Model For High Dimensional Data Using Genetic Algorithms A Novel Dencos Model For High Dimensional Data Using Genetic Algorithms
A Novel Dencos Model For High Dimensional Data Using Genetic Algorithms ijcseit
 
Introduction to Multi-Objective Clustering Ensemble
Introduction to Multi-Objective Clustering EnsembleIntroduction to Multi-Objective Clustering Ensemble
Introduction to Multi-Objective Clustering EnsembleIJSRD
 

What's hot (18)

Design of a Reliable Wireless Sensor Network with Optimized Energy Efficiency...
Design of a Reliable Wireless Sensor Network with Optimized Energy Efficiency...Design of a Reliable Wireless Sensor Network with Optimized Energy Efficiency...
Design of a Reliable Wireless Sensor Network with Optimized Energy Efficiency...
 
Using Neighbor’s State Cross-correlation to Accelerate Adaptation in Docitiv...
Using Neighbor’s State Cross-correlation to Accelerate Adaptation  in Docitiv...Using Neighbor’s State Cross-correlation to Accelerate Adaptation  in Docitiv...
Using Neighbor’s State Cross-correlation to Accelerate Adaptation in Docitiv...
 
A Novel Classification via Clustering Method for Anomaly Based Network Intrus...
A Novel Classification via Clustering Method for Anomaly Based Network Intrus...A Novel Classification via Clustering Method for Anomaly Based Network Intrus...
A Novel Classification via Clustering Method for Anomaly Based Network Intrus...
 
Machine Learning Algorithms for Image Classification of Hand Digits and Face ...
Machine Learning Algorithms for Image Classification of Hand Digits and Face ...Machine Learning Algorithms for Image Classification of Hand Digits and Face ...
Machine Learning Algorithms for Image Classification of Hand Digits and Face ...
 
A NOVEL ROUTING PROTOCOL FOR TARGET TRACKING IN WIRELESS SENSOR NETWORKS
A NOVEL ROUTING PROTOCOL FOR TARGET TRACKING IN WIRELESS SENSOR NETWORKSA NOVEL ROUTING PROTOCOL FOR TARGET TRACKING IN WIRELESS SENSOR NETWORKS
A NOVEL ROUTING PROTOCOL FOR TARGET TRACKING IN WIRELESS SENSOR NETWORKS
 
Improved Performance of Unsupervised Method by Renovated K-Means
Improved Performance of Unsupervised Method by Renovated K-MeansImproved Performance of Unsupervised Method by Renovated K-Means
Improved Performance of Unsupervised Method by Renovated K-Means
 
JOSA TechTalks - Machine Learning in Practice
JOSA TechTalks - Machine Learning in PracticeJOSA TechTalks - Machine Learning in Practice
JOSA TechTalks - Machine Learning in Practice
 
ICICCE0298
ICICCE0298ICICCE0298
ICICCE0298
 
An efficient technique for color image classification based on lower feature ...
An efficient technique for color image classification based on lower feature ...An efficient technique for color image classification based on lower feature ...
An efficient technique for color image classification based on lower feature ...
 
A survey on Object Tracking Techniques in Wireless Sensor Network
A survey on Object Tracking Techniques in Wireless Sensor NetworkA survey on Object Tracking Techniques in Wireless Sensor Network
A survey on Object Tracking Techniques in Wireless Sensor Network
 
X-TREPAN : A Multi Class Regression and Adapted Extraction of Comprehensible ...
X-TREPAN : A Multi Class Regression and Adapted Extraction of Comprehensible ...X-TREPAN : A Multi Class Regression and Adapted Extraction of Comprehensible ...
X-TREPAN : A Multi Class Regression and Adapted Extraction of Comprehensible ...
 
A COST EFFECTIVE COMPRESSIVE DATA AGGREGATION TECHNIQUE FOR WIRELESS SENSOR N...
A COST EFFECTIVE COMPRESSIVE DATA AGGREGATION TECHNIQUE FOR WIRELESS SENSOR N...A COST EFFECTIVE COMPRESSIVE DATA AGGREGATION TECHNIQUE FOR WIRELESS SENSOR N...
A COST EFFECTIVE COMPRESSIVE DATA AGGREGATION TECHNIQUE FOR WIRELESS SENSOR N...
 
HCIFR: Hierarchical Clustering and Iterative Filtering Routing Algorithm for ...
HCIFR: Hierarchical Clustering and Iterative Filtering Routing Algorithm for ...HCIFR: Hierarchical Clustering and Iterative Filtering Routing Algorithm for ...
HCIFR: Hierarchical Clustering and Iterative Filtering Routing Algorithm for ...
 
Energy Efficient Multipath Data Fusion Technique for Wireless Sensor Networks
Energy Efficient Multipath Data Fusion Technique for Wireless Sensor NetworksEnergy Efficient Multipath Data Fusion Technique for Wireless Sensor Networks
Energy Efficient Multipath Data Fusion Technique for Wireless Sensor Networks
 
A seminar report on data aggregation in wireless sensor networks
A seminar report on data aggregation in wireless sensor networksA seminar report on data aggregation in wireless sensor networks
A seminar report on data aggregation in wireless sensor networks
 
A Novel Dencos Model For High Dimensional Data Using Genetic Algorithms
A Novel Dencos Model For High Dimensional Data Using Genetic Algorithms A Novel Dencos Model For High Dimensional Data Using Genetic Algorithms
A Novel Dencos Model For High Dimensional Data Using Genetic Algorithms
 
Introduction to Multi-Objective Clustering Ensemble
Introduction to Multi-Objective Clustering EnsembleIntroduction to Multi-Objective Clustering Ensemble
Introduction to Multi-Objective Clustering Ensemble
 
Dsto tr-1436
Dsto tr-1436Dsto tr-1436
Dsto tr-1436
 

Similar to 41 125-1-pb

Paper id 26201478
Paper id 26201478Paper id 26201478
Paper id 26201478IJRAT
 
A Comparative Study Of Various Clustering Algorithms In Data Mining
A Comparative Study Of Various Clustering Algorithms In Data MiningA Comparative Study Of Various Clustering Algorithms In Data Mining
A Comparative Study Of Various Clustering Algorithms In Data MiningNatasha Grant
 
Volume 2-issue-6-2143-2147
Volume 2-issue-6-2143-2147Volume 2-issue-6-2143-2147
Volume 2-issue-6-2143-2147Editor IJARCET
 
Cancer data partitioning with data structure and difficulty independent clust...
Cancer data partitioning with data structure and difficulty independent clust...Cancer data partitioning with data structure and difficulty independent clust...
Cancer data partitioning with data structure and difficulty independent clust...IRJET Journal
 
Classification Techniques: A Review
Classification Techniques: A ReviewClassification Techniques: A Review
Classification Techniques: A ReviewIOSRjournaljce
 
A survey on Efficient Enhanced K-Means Clustering Algorithm
 A survey on Efficient Enhanced K-Means Clustering Algorithm A survey on Efficient Enhanced K-Means Clustering Algorithm
A survey on Efficient Enhanced K-Means Clustering Algorithmijsrd.com
 
An Iterative Improved k-means Clustering
An Iterative Improved k-means ClusteringAn Iterative Improved k-means Clustering
An Iterative Improved k-means ClusteringIDES Editor
 
Unsupervised Machine Learning Algorithm K-means-Clustering.pptx
Unsupervised Machine Learning Algorithm K-means-Clustering.pptxUnsupervised Machine Learning Algorithm K-means-Clustering.pptx
Unsupervised Machine Learning Algorithm K-means-Clustering.pptxAnupama Kate
 
A new approach of detecting network anomalies using improved id3 with horizon...
A new approach of detecting network anomalies using improved id3 with horizon...A new approach of detecting network anomalies using improved id3 with horizon...
A new approach of detecting network anomalies using improved id3 with horizon...Alexander Decker
 
A HYBRID MODEL FOR MINING MULTI DIMENSIONAL DATA SETS
A HYBRID MODEL FOR MINING MULTI DIMENSIONAL DATA SETSA HYBRID MODEL FOR MINING MULTI DIMENSIONAL DATA SETS
A HYBRID MODEL FOR MINING MULTI DIMENSIONAL DATA SETSEditor IJCATR
 
Certain Investigation on Dynamic Clustering in Dynamic Datamining
Certain Investigation on Dynamic Clustering in Dynamic DataminingCertain Investigation on Dynamic Clustering in Dynamic Datamining
Certain Investigation on Dynamic Clustering in Dynamic Dataminingijdmtaiir
 
Intrusion Detection System Based on K-Star Classifier and Feature Set Reduction
Intrusion Detection System Based on K-Star Classifier and Feature Set ReductionIntrusion Detection System Based on K-Star Classifier and Feature Set Reduction
Intrusion Detection System Based on K-Star Classifier and Feature Set ReductionIOSR Journals
 
Improved K-mean Clustering Algorithm for Prediction Analysis using Classifica...
Improved K-mean Clustering Algorithm for Prediction Analysis using Classifica...Improved K-mean Clustering Algorithm for Prediction Analysis using Classifica...
Improved K-mean Clustering Algorithm for Prediction Analysis using Classifica...IJCSIS Research Publications
 
K Means Clustering Algorithm for Partitioning Data Sets Evaluated From Horizo...
K Means Clustering Algorithm for Partitioning Data Sets Evaluated From Horizo...K Means Clustering Algorithm for Partitioning Data Sets Evaluated From Horizo...
K Means Clustering Algorithm for Partitioning Data Sets Evaluated From Horizo...IOSR Journals
 
The improved k means with particle swarm optimization
The improved k means with particle swarm optimizationThe improved k means with particle swarm optimization
The improved k means with particle swarm optimizationAlexander Decker
 
Survey on classification algorithms for data mining (comparison and evaluation)
Survey on classification algorithms for data mining (comparison and evaluation)Survey on classification algorithms for data mining (comparison and evaluation)
Survey on classification algorithms for data mining (comparison and evaluation)Alexander Decker
 
84cc04ff77007e457df6aa2b814d2346bf1b
84cc04ff77007e457df6aa2b814d2346bf1b84cc04ff77007e457df6aa2b814d2346bf1b
84cc04ff77007e457df6aa2b814d2346bf1bPRAWEEN KUMAR
 

Similar to 41 125-1-pb (20)

Paper id 26201478
Paper id 26201478Paper id 26201478
Paper id 26201478
 
A Comparative Study Of Various Clustering Algorithms In Data Mining
A Comparative Study Of Various Clustering Algorithms In Data MiningA Comparative Study Of Various Clustering Algorithms In Data Mining
A Comparative Study Of Various Clustering Algorithms In Data Mining
 
Volume 2-issue-6-2143-2147
Volume 2-issue-6-2143-2147Volume 2-issue-6-2143-2147
Volume 2-issue-6-2143-2147
 
Cancer data partitioning with data structure and difficulty independent clust...
Cancer data partitioning with data structure and difficulty independent clust...Cancer data partitioning with data structure and difficulty independent clust...
Cancer data partitioning with data structure and difficulty independent clust...
 
Classification Techniques: A Review
Classification Techniques: A ReviewClassification Techniques: A Review
Classification Techniques: A Review
 
Chapter 5.pdf
Chapter 5.pdfChapter 5.pdf
Chapter 5.pdf
 
A survey on Efficient Enhanced K-Means Clustering Algorithm
 A survey on Efficient Enhanced K-Means Clustering Algorithm A survey on Efficient Enhanced K-Means Clustering Algorithm
A survey on Efficient Enhanced K-Means Clustering Algorithm
 
An Iterative Improved k-means Clustering
An Iterative Improved k-means ClusteringAn Iterative Improved k-means Clustering
An Iterative Improved k-means Clustering
 
Unsupervised Machine Learning Algorithm K-means-Clustering.pptx
Unsupervised Machine Learning Algorithm K-means-Clustering.pptxUnsupervised Machine Learning Algorithm K-means-Clustering.pptx
Unsupervised Machine Learning Algorithm K-means-Clustering.pptx
 
Ij2514951500
Ij2514951500Ij2514951500
Ij2514951500
 
31 34
31 3431 34
31 34
 
A new approach of detecting network anomalies using improved id3 with horizon...
A new approach of detecting network anomalies using improved id3 with horizon...A new approach of detecting network anomalies using improved id3 with horizon...
A new approach of detecting network anomalies using improved id3 with horizon...
 
A HYBRID MODEL FOR MINING MULTI DIMENSIONAL DATA SETS
A HYBRID MODEL FOR MINING MULTI DIMENSIONAL DATA SETSA HYBRID MODEL FOR MINING MULTI DIMENSIONAL DATA SETS
A HYBRID MODEL FOR MINING MULTI DIMENSIONAL DATA SETS
 
Certain Investigation on Dynamic Clustering in Dynamic Datamining
Certain Investigation on Dynamic Clustering in Dynamic DataminingCertain Investigation on Dynamic Clustering in Dynamic Datamining
Certain Investigation on Dynamic Clustering in Dynamic Datamining
 
Intrusion Detection System Based on K-Star Classifier and Feature Set Reduction
Intrusion Detection System Based on K-Star Classifier and Feature Set ReductionIntrusion Detection System Based on K-Star Classifier and Feature Set Reduction
Intrusion Detection System Based on K-Star Classifier and Feature Set Reduction
 
Improved K-mean Clustering Algorithm for Prediction Analysis using Classifica...
Improved K-mean Clustering Algorithm for Prediction Analysis using Classifica...Improved K-mean Clustering Algorithm for Prediction Analysis using Classifica...
Improved K-mean Clustering Algorithm for Prediction Analysis using Classifica...
 
K Means Clustering Algorithm for Partitioning Data Sets Evaluated From Horizo...
K Means Clustering Algorithm for Partitioning Data Sets Evaluated From Horizo...K Means Clustering Algorithm for Partitioning Data Sets Evaluated From Horizo...
K Means Clustering Algorithm for Partitioning Data Sets Evaluated From Horizo...
 
The improved k means with particle swarm optimization
The improved k means with particle swarm optimizationThe improved k means with particle swarm optimization
The improved k means with particle swarm optimization
 
Survey on classification algorithms for data mining (comparison and evaluation)
Survey on classification algorithms for data mining (comparison and evaluation)Survey on classification algorithms for data mining (comparison and evaluation)
Survey on classification algorithms for data mining (comparison and evaluation)
 
84cc04ff77007e457df6aa2b814d2346bf1b
84cc04ff77007e457df6aa2b814d2346bf1b84cc04ff77007e457df6aa2b814d2346bf1b
84cc04ff77007e457df6aa2b814d2346bf1b
 

More from Mahendra Sisodia (15)

48 144-1-pb
48 144-1-pb48 144-1-pb
48 144-1-pb
 
47 141-1-pb
47 141-1-pb47 141-1-pb
47 141-1-pb
 
45 135-1-pb
45 135-1-pb45 135-1-pb
45 135-1-pb
 
43 131-1-pb
43 131-1-pb43 131-1-pb
43 131-1-pb
 
42 128-1-pb
42 128-1-pb42 128-1-pb
42 128-1-pb
 
38 116-1-pb
38 116-1-pb38 116-1-pb
38 116-1-pb
 
37 112-1-pb
37 112-1-pb37 112-1-pb
37 112-1-pb
 
34 107-1-pb
34 107-1-pb34 107-1-pb
34 107-1-pb
 
33 102-1-pb
33 102-1-pb33 102-1-pb
33 102-1-pb
 
32 99-1-pb
32 99-1-pb32 99-1-pb
32 99-1-pb
 
27 122-1-pb
27 122-1-pb27 122-1-pb
27 122-1-pb
 
24 83-1-pb
24 83-1-pb24 83-1-pb
24 83-1-pb
 
23 79-1-pb
23 79-1-pb23 79-1-pb
23 79-1-pb
 
20 74-1-pb
20 74-1-pb20 74-1-pb
20 74-1-pb
 
46 138-1-pb
46 138-1-pb46 138-1-pb
46 138-1-pb
 

Recently uploaded

2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Principled Technologies
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024The Digital Insurer
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 

Recently uploaded (20)

2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 

41 125-1-pb

  • 1. ISSN: 2277 – 9043 International Journal of Advanced Research in Computer Science and Electronics Engineering Volume 1, Issue 2, April 2012 Attack Detection By Clustering And Classification Approach Ms. Priyanka J. Pathak , Asst. Prof. Snehlata S. Dongre  Abstract—Intrusion detection is a software application that monitors network and/or system activities for malicious streams for intrusion detection. Clustering is done by activities or policy violations and produces reports to a K-means algorithm this will generates clusters of similar Management Station. Security is becoming big issue for all networks. Hackers and intruders have made many successful attack type which fall into same category. The centroids attempts to bring down high profile company networks and web generated by this algorithm is used in next step of services. Intrusion Detection System (IDS) is an important classification. In classification step K-nearest neighbors and detection that is used as a countermeasure to preserve data decision tree algorithms are used which detects the different integrity and system availability from attacks. The work is types of attacks. implemented in two phases, in first phase clustering by K-means is done and in next step of classification is done with k-nearest neighbours and decision trees. The objects are II. CLUSTERING clustered or grouped based on the principle of maximizing the A cluster [Han & Camber] is a collection of data objects intra-class similarity and minimizing the interclass similarity. that are similar to one another within the same cluster and are This paper proposes an approach which make the clusters of dissimilar to the objects in other clusters. A cluster of data similar attacks and in next step of classification with K nearest objects can be treated collectively as one group and so may neighbours and Decision trees it detect the attack types. This be method is advantageous over single classifier as it detect better class than single classifier system. Considered as a form of data compression. Although classification is an effective means for distinguishing groups Index Terms—Data mining, Decision Trees, K-Means, or classes of objects, it requires the often costly collection K-Nearest Neighbours, and labeling of a large set of training tuples or patterns, which the classifier uses to model each group. It is often more I. INTRODUCTION desirable to proceed in the reverse direction: First partition the set of data into groups based on data similarity (e.g., using Now a day everyone gets connected to the system but, there clustering), and then assign labels to the relatively small are many issues in network environment. So there is need of number of groups. Additional advantages of such a securing information, because there are lots of security clustering-based process are that it is adaptable to changes threats are present in network environment. Intrusion and helps single out useful features that distinguish different detection[9] is a software application that monitors network groups. and/or system activities for malicious activities or policy Clustering[8],[10] is an unsupervised learning technique violations and produces reports to a Management Station. which divides the datasets into subparts, which share Intrusion detection systems are usually classified as host common properties. For clustering data points, there should based or network based in terms of target domain. A number be high intra cluster similarity and low inter cluster similarity. of techniques are available for intrusion detection [8],[11]. A clustering method which results in such type of clusters is Data mining is the one of the efficient techniques available considered as good clustering algorithm. for intrusion detection. Data mining refers to extracting or A. K-Means Algorithm mining knowledge from large amounts of data. Data mining techniques may be supervised or unsupervised. The K-means [Han & Camber] is one of the simplest Clustering[10] and Classification are the two important unsupervised learning algorithms that solve the well known techniques used in data mining for extracting valuable clustering problem. The procedure follows a simple and easy information. This paper proposes clustering and way to classify a given data set through a certain number of classification. Classification is the processing of finding a set clusters (assume k clusters) fixed a priori. The main idea is to of models which describe and distinguish data classes or define k centroids, one for each cluster. These centroids concepts, for the purposes of being able to use the model to should be placed in a cunning way because of different predict the class of objects whose class label is unknown. The location causes different result. So, the better choice is to paper is organized as follows. Section 2 gives the details of place them as much as possible far away from each other. The clustering algorithms. Section 3 & Section 4 details about next step is to take each point belonging to a given data set strategies for classification for novel class detection in data and associate it to the nearest centroid. When no point is pending, the first step is completed and an early groupage is done. At this point we need to re-calculate k new centroids as 115 All Rights Reserved © 2012 IJARCSEE
  • 2. ISSN: 2277 – 9043 International Journal of Advanced Research in Computer Science and Electronics Engineering Volume 1, Issue 2, April 2012 barycenters of the clusters resulting from the previous step. figure shows overall system architecture. The Clustering and After we have these k new centroids, a new binding has to be Classification are the two parts of system. done between the same data set points and the nearest new The input to the system is KDD data set. The first step is centroid. A loop has been generated. As a result of this loop preprocess the input and this preprocessed input is applied to we may notice that the k centroids change their location step the K-Means Clustering algorithm. The Preprocessing steps by step until no more changes are done. contains : The algorithm is composed of the following steps:  Data cleaning Algorithm: k-means  Feature Selection The k-means algorithm for partitioning, where each  Data transformation etc. cluster’s center is represented by the mean value of the objects in the cluster. Input: k: the number of clusters, D: a data set containing n objects. Output: A set of k clusters. Method: (1) Arbitrarily choose k objects from D as the initial cluster centers; (2) Repeat (3) Reassign each object to the cluster to which the object is the most similar, based on the mean value of the objects in the cluster; (4) Update the cluster means, i.e., calculate the mean value of the objects for each cluster; (5) Until no change; Fig.3 System Architecture The K-Means creates the clusters of similar attacks based on similarity measure. Each record of data set contains 42 different features. The values of these features are already Fig: 1 Clustering of a set of objects based on the k-means method pre-processed in training data. The calculated centroid values are used in next K-Nearest neighbours classification. This produces a separation of the objects into groups from which the metric to be minimized can be calculated. Fig4:Preprocessed Data Fig2: output of K-means algorithm A. K-Nearest Neighbour Classification III. CLASSIFICATION In this method, one simply finds in the N-dimensional feature space the closest object from the training set to an Classification [4],[13] is a process of finding a model that object being classified. Since the neighbour is nearby, it is describes and distinguishes data classes or concept for the likely to be similar to the object being classified and so is purpose of being able to use the model to predict the class of likely to be the same class as that object. Nearest neighbour objects whose class label is unknown. The derived model is methods have the advantage that they are easy to implement. based on the analysis of set of training data. This paper They can also give quite good results if the features are introduces a method about clustering and classification. The chosen carefully. A new sample is classified by calculating clustering is done by K-Means algorithm which is explained the distance to the nearest training case; the sign of that point in earlier section. The classification method is proposed then determines the classification of the sample. Closeness" through which novel class can be detected. The following 116 All Rights Reserved © 2012 IJARCSEE
  • 3. ISSN: 2277 – 9043 International Journal of Advanced Research in Computer Science and Electronics Engineering Volume 1, Issue 2, April 2012 is defined in terms of Euclidean distance, where the 2. Choose attribute for which entropy is minimum (or, Euclidean distance between two points, equivalently, information gain is maximum) 3. Make node containing that attribute The algorithm is as follows: Input: A data set, S Output: A decision tree  If all the instances have the same value for the target attribute then return a decision tree that is simply this value.  Else ◦ Compute Gain values for all attributes and select an attribute with the highest value and create a node for that attribute. ◦ Make a branch from this node for every Fig5: Flow Chart of K-Nearest Neighbour algorithm value of the attribute The unknown sample is assigned the most common class ◦ Assign all possible values of the attribute among its k nearest neighbours. Following fig. shows attack to branches. types and their number of nodes detected by K-Nearest ◦ Follow each branch by partitioning the neighbour algorithm. dataset to be only instances whereby the value of the branch is present and then go back to 1. This algorithm usually produces small trees, but it does not always produce the smallest possible tree. The optimization step makes use of information entropy: Entropy: Where :  E(S) is the information entropy of the set S;  n is the number of different values of the attribute in S (entropy is computed for one chosen attribute)  f S(j) is the frequency (proportion) of the value j in the set S. Fig6: Output of K-Nearest-Neighbour algorithm  log2 is the binary logarithm An entropy of 0 identifies a perfectly classified set. Entropy is used to determine which node to split next in the B. Decision Trees algorithm. The higher the entropy, the higher the potential to Decision tree learning [5] ,[6] is a method commonly used improve the classification. in data mining. The goal is to create a model that predicts the Gain: value of a target variable based on several input variables. Each interior node corresponds to one of the input variables; Gain is computed to estimate the gain produced by there are edges to children for each of the possible values of a split over an attribute : that input variable. Each leaf represents a value of the target variable given the values of the input variables represented by the path from the root to the leaf. A tree can be "learned" by splitting the source set into subsets based on an attribute value test. This process is repeated on each derived subset in Where : a recursive manner called recursive partitioning. The  G(S,A) is the gain of the set S after a split over the recursion is completed when the subset at a node all has the A attribute same value of the target variable, or when splitting no longer adds value to the predictions.  E(S) is the information entropy of the set S. C. ID3 Algorithm  m is the number of different values of the attribute A In decision tree learning, ID3 (Iterative Dichotomiser 3)[6] in S. is an algorithm used to generate a decision tree. The ID3 algorithm can be summarized as follows:  Fs(Ai) is the frequency (proportion) of the items 1. Take all unused attributes and count their entropy possessing Ai as value for A in S. concerning test samples  Ai is ith possible value of A. 117 All Rights Reserved © 2012 IJARCSEE
  • 4. ISSN: 2277 – 9043 International Journal of Advanced Research in Computer Science and Electronics Engineering Volume 1, Issue 2, April 2012  SAi is a subset of S containing all items where the [12] Srilatha Chebrolua, Ajith Abrahama, Johnson P. Thomasa” Feature deduction and ensemble design of intrusion detection systems” 2004 value of A is Ai. Elsevier Ltd Gain quantifies the entropy improvement by splitting over an [13] Sandhya Peddabachigaria, Ajith Abrahamb,,Crina Grosanc, Johnson Thomasa” Modeling intrusion detection system using hybrid intelligent attribute. systems” 2005 Elsevier Ltd. First Author: Priyanka J. Pathak IV sem MTech[CSE], G.H.Raisoni College of Engineering,Nagpur, R.T.M.N.U, Nagpur Second Author : Asst. Prof. Snehlata Dongre CSE Department, G.H.Raisoni College of engineering,Nagpur R.T.M.N.U, Nagpur Fig7: Output of ID3 algorithm IV. CONCLUSION In this paper, the work is done through which the novel class is detected The system uses K-Means clustering algorithm which produces different clusters of similar type of attacks and total nodes per clusters in the input dataset. Also it shows the updated centroids of each parameter in the input dataset. The Classification stage gives details about detection of different types of attacks and number of nodes in dataset. REFERENCES [1] Kapil K. Wankhade, Snehlata S. Dongre, Prakash S. Prasad , Mrudula M. Gudadhe, Kalpana A. Mankar,” Intrusion Detection System Using New Ensemble Boosting Approach” 2011 3rd International Conference on Computer Modeling and Simulation (ICCMS 2011) [2] Kapil K. Wankhade, Snehlata S. Dongre, Kalpana A. Mankar, Prashant K. Adakane,” A New Adaptive Ensemble Boosting Classifier for Concept Drifting Stream Data” 2011 3rd International Conference on Computer Modeling and Simulation (ICCMS 2011) [3] Hongbo Zhu, Yaqiang Wang, Zhonghua Yu “Clustering of Evolving Data Stream with Multiple Adaptive Sliding Window” International Conference on Data Storage and Data Engineering, 2010. [4] Peng Zhang†, Xingquan Zhu‡, Jianlong Tan†, Li Guo† “Classifier and Cluster Ensembles for Mining Concept Drifting Data Streams” IEEE International Conference on Data Mining, 2010. [5] T.Jyothirmayi, Suresh Reddy “An Algorithm for Better Decision Tree” T.Jyothirmayi et. al. / (IJCSE) International Journal on Computer Science and Engineering , 2010 [6] Yongjin Liu, NaLi, Leina Shi, Fangping Li “An Intrusion Detection Method Based on Decision Tree”, International Conference on E-Health Networking, Digital Ecosystems and Technologies, 2010. [7] A. Bifet, G. Holmes, B. Pfahringer, R. Kirkby and R. Gavalda, “New ensemble methods for evloving data streams”, In KDD’09, ACM, Paris, 2009, pp. 139-148. [8] LID Li-xiong, KANGJing, GUO Yun-fei, HUANGHai “A Three-Step Clustering Algorithm over an Evolving Data Stream” The National High Technology Research and Development Program("863" Program) of China, Fund 2008 AAOII002 sponsors. [9] Mrutyunjaya Panda, Manas Ranjan Patra “A COMPARATIVE STUDY OF DATA MINING ALGORITHMS FOR NETWORK INTRUSION DETECTION” First International Conference on Emerging Trends in Engineering and Technology,2008 [10] Shuang Wu, Chunyu Yang and Jie Zhou “Clustering-training for Data Stream Mining” Sixth IEEE International Conference on Data Mining - Workshops (ICDMW'06) [11] Sang-Hyun Oh, Jin-Suk Kang, Yung-Cheol Byun, Gyung-Leen Park3 and Sang-Yong Byun3 “Intrusion Detection based on Clustering a Data Stream” Third ACIS Int'l Conference on Software Engineering Research, Management and Applications (SERA’05). 118 All Rights Reserved © 2012 IJARCSEE