NSL KDD Cup 99 dataset Anomaly Detection using
Machine Learning Technique
An Experiment and evaluation using Decision Tree
Under Guidance of
Dr. Kalpana Thakre
NATIONAL CONFERENCE
ON RECENT TRENDS AND
ADVANCES IN COMPUTING,
COMMUNICATION AND SECURITY
Presented by
Sujeet Raosaheb Suryawanshi
ME IT SEM III ; Roll No. 613012
Agenda
 Anomaly Detection
 Machine Learning
 IDPS
 Survey of algorithms
 Decision Tree
 Experiment with NSL KDD Cup 99
 Result
 Future research roadmap
Anomaly Detection
 Intrusion Detection Systems / Intrusion Prevention Systems (IDS/IPS) are
used to protect trusted networks from untrusted networks
 One such threat is the Denial of Service (DoS) attack
 Approaches to detect DoS attacks:
1. Signature based
2. Anomaly based
 Signature-based detection deals with a limited/fixed set of known threats
 Anomaly-based detection centres on the concept of a baseline for network
behaviour; any deviation from this baseline is considered an anomaly.
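To make the baseline idea concrete, here is a minimal sketch (not the method used in this work): it learns a mean/standard-deviation baseline from normal traffic and flags observations whose z-score exceeds a threshold. The packets-per-second numbers and the threshold of 3 are illustrative assumptions.

```python
import numpy as np

def fit_baseline(normal_traffic):
    """Learn a simple baseline (mean and standard deviation) from normal traffic."""
    return np.mean(normal_traffic), np.std(normal_traffic)

def is_anomaly(value, mean, std, threshold=3.0):
    """Flag a new observation whose z-score exceeds the chosen threshold."""
    z = abs(value - mean) / (std + 1e-9)   # avoid division by zero
    return z > threshold

# Illustrative packets-per-second counts observed during normal operation.
normal_pps = np.array([120, 130, 125, 118, 135, 128, 122, 131])
mean, std = fit_baseline(normal_pps)

print(is_anomaly(127, mean, std))   # False: close to the baseline
print(is_anomaly(900, mean, std))   # True: large deviation, e.g. a flood of packets
```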
Machine Learning
 A scientific discipline concerned with the design and development of
algorithms that allow computers to learn from data. A major focus of
machine learning research is to automatically learn to recognise complex
patterns.
 This is similar to the way the human brain works: humans take decisions
based on the learning and experiences they have.
Motivation & Objective
 To understand techniques available to support the vision
envisaged for “Anomaly Detection using Machine Learning
Technique”
 To experiment and evaluate NSL KDD Cup 99 dataset using
Decision Tree Classifier
 To understand various anomaly detection and machine
learning techniques
 To identify requirements for building a platform for an anomaly
detection system
Classification of IDPS
Intrusion Detection System
  Data collection techniques
    HIDS
    NIDS
  Data analysis techniques
    Signature based
    Specification based
    Anomaly based
      Nearest neighbour based
      Clustering based
        K-Means
      Statistical based
      Classification based
        SVM
        Fuzzy Logic
        Genetic Algorithm
        Decision Tree
        Naive Bayesian
        Neural Network
        Others
Anomaly detection techniques – comparison

Nearest-neighbour-based detection techniques
 Assumption: normal data instances occur in dense neighbourhoods; anomalies occur far from their closest neighbours
 Advantages: can operate in unsupervised/semi-supervised mode; simplest approach
 Disadvantages: high computational cost in the testing phase; difficult where several regions have widely differing densities; difficult to identify anomalies that occur in groups; dependent on the proximity measure used

Clustering-based anomaly detection techniques
 Assumption: normal data instances belong to a cluster in the data, lie close to their closest cluster centroid, or belong to large and dense clusters; anomalies do not belong to any cluster, are far away from their closest cluster centroid, or form small or sparse clusters
 Advantages: unsupervised; fast comparison against the learned clusters
 Disadvantages: high computation cost in the cluster formation phase; a data object not belonging to any cluster may be noise rather than an anomaly; not suited for large datasets; fails to label anomalies correctly in certain cases

Statistical techniques
 Assumption: normal data instances occur in high-probability regions of a stochastic model; anomalies occur in the low-probability regions of that model
 Advantages: unsupervised and simple; a confidence interval is provided with the anomaly score
 Disadvantages: fails to label anomalies correctly in certain cases; difficult to find the best statistic; for multivariate data it fails to capture the interactions between different attributes

Classification techniques
 Assumption: a classifier that can distinguish between normal and anomalous classes can be learnt in the given feature space
 Advantages: fast testing phase; improved efficiency with ensemble methods
 Disadvantages: heavy dependency and reliance on training data; class imbalance problem
Survey of algorithms – comparison

Decision Tree
 Technique: classification | Computation cost: high | High-dimensional data: yes
 Advantages: easy to understand for smaller trees; handles irrelevant and missing data; compact after pruning
 Disadvantages: fails to classify scattered data; uses a greedy algorithm, hence may not find the best tree

SVM
 Technique: classification & regression | Computation cost: high | High-dimensional data: yes
 Advantages: high detection accuracy; learning ability for a small set of samples; high training and decision rate, insensitive to the dimension of the input data
 Disadvantages: positive and negative examples required; high dependency on selecting a good kernel function; training takes a long time

Naive Bayes
 Technique: classification | Computation cost: less | High-dimensional data: yes
 Advantages: easy construction; takes short computation time; works efficiently with large datasets
 Disadvantages: difficult to handle continuous features; highly dependent on prior knowledge

ANN (Neural Network)
 Technique: classification | Computation cost: - | High-dimensional data: yes
 Advantages: ability to generalise from limited, noisy and incomplete data; ease of use; detects unknown intrusions; supports multiclass detection
 Disadvantages: training required; needs to be emulated; longer training process; over-fitting issue

Fuzzy Logic
 Technique: classification | Computation cost: high | High-dimensional data: -
 Advantages: permits a data point to be in more than one cluster; has a more natural representation of the behaviour of genes; effective, especially against port scans and probes
 Disadvantages: need to determine a membership cut-off value; clusters are sensitive to the initial assignment of centroids; cannot assure constant optimisation response times

Genetic Algorithm (GA)
 Technique: classification | Computation cost: - | High-dimensional data: -
 Advantages: derives the best classification rules; selects optimal parameters
 Disadvantages: over-fitting issue

K-Means
 Technique: clustering | Computation cost: - | High-dimensional data: -
 Advantages: simple to use
 Disadvantages: necessity of specifying k; sensitive to noise; clusters are sensitive to the initial assignment of centroids
Decision Tree Classifier
Algorithm: Decision tree
1. Split(node, {examples}):
2.   A ← the best attribute for splitting the {examples}
3.   Decision attribute for this node ← A
4.   For each value of A, create a new child node
5.   Split the training {examples} among the child nodes
6.   For each child node / subset:
       If the subset is pure: STOP
       Else: Split(child node, {subset})
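A minimal Python rendering of this recursive Split procedure is sketched below, assuming examples are dictionaries and that a `choose_best_attribute` function (for instance, the information-gain selection from the next slide) is supplied. It is an illustration of the pseudocode, not the implementation used in the experiment.

```python
from collections import Counter

def is_pure(examples, target):
    """A subset is pure when every example carries the same target label."""
    return len({ex[target] for ex in examples}) <= 1

def majority_label(examples, target):
    """Label used for a leaf: the most common target value in the subset."""
    return Counter(ex[target] for ex in examples).most_common(1)[0][0]

def split(examples, attributes, target, choose_best_attribute):
    """Recursive Split() from the pseudocode: pick the best attribute,
    create one child node per attribute value, and recurse until pure."""
    if is_pure(examples, target) or not attributes:
        return majority_label(examples, target)            # leaf node
    best = choose_best_attribute(examples, attributes, target)
    node = {"attribute": best, "children": {}}
    remaining = [a for a in attributes if a != best]
    for value in {ex[best] for ex in examples}:             # one child per value of A
        subset = [ex for ex in examples if ex[best] == value]
        node["children"][value] = split(subset, remaining, target,
                                        choose_best_attribute)
    return node
```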
Entropy
 For selecting the best attribute:
 At each step, find the attribute that can be used to partition the
dataset so as to minimise the entropy of the data
 A completely homogeneous sample has an entropy of 0.
 An equally divided sample has an entropy of 1.
 Entropy(S) = −p+ log2(p+) − p− log2(p−) for a sample of positive and
negative elements.
 In general, for classes with proportions pi, Entropy(S) = −Σi pi log2(pi).
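The entropy and information-gain calculations fit in a few lines of Python. The sketch below also provides a `choose_best_attribute` helper of the kind assumed by the earlier Split() sketch; the two print statements reproduce the 0 and 1 values quoted above.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy(S) = -sum_i p_i * log2(p_i) over the class proportions in S."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(examples, attribute, target):
    """Expected reduction in entropy when the dataset is partitioned on `attribute`."""
    before = entropy([ex[target] for ex in examples])
    total = len(examples)
    after = 0.0
    for value in {ex[attribute] for ex in examples}:
        subset = [ex[target] for ex in examples if ex[attribute] == value]
        after += len(subset) / total * entropy(subset)
    return before - after

def choose_best_attribute(examples, attributes, target):
    """Pick the attribute with the highest information gain (used by Split())."""
    return max(attributes, key=lambda a: information_gain(examples, a, target))

print(entropy(["Y"] * 8))              # 0.0 -> completely homogeneous sample
print(entropy(["Y"] * 4 + ["N"] * 4))  # 1.0 -> equally divided sample
```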
Decision Tree – Sample Dataset
Years Experience | Employed? | Previous employers | Level of Education | Top-tier school | Interned | Hired
10 | Y | 4 | BS  | N | N | Y
0  | N | 0 | BS  | Y | Y | Y
7  | N | 6 | BS  | N | N | N
2  | Y | 1 | MS  | Y | N | Y
20 | N | 2 | PhD | Y | N | N
0  | N | 0 | PhD | Y | Y | Y
5  | Y | 2 | MS  | N | Y | Y
3  | N | 1 | BS  | N | Y | Y
15 | Y | 5 | BS  | N | N | Y
0  | N | 0 | BS  | N | N | N
1  | N | 1 | PhD | Y | N | N
4  | Y | 1 | BS  | N | Y | Y
0  | N | 0 | PhD | Y | N | Y
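For illustration, the sketch below trains a scikit-learn decision tree on this sample dataset. The shortened column names, the one-hot encoding and the use of `DecisionTreeClassifier` with the entropy criterion are choices made here for the example, not necessarily the tooling used in the original experiment.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# The hiring dataset from the slide (column names abbreviated for readability).
data = pd.DataFrame({
    "years":     [10, 0, 7, 2, 20, 0, 5, 3, 15, 0, 1, 4, 0],
    "employed":  ["Y","N","N","Y","N","N","Y","N","Y","N","N","Y","N"],
    "previous":  [4, 0, 6, 1, 2, 0, 2, 1, 5, 0, 1, 1, 0],
    "education": ["BS","BS","BS","MS","PhD","PhD","MS","BS","BS","BS","PhD","BS","PhD"],
    "top_tier":  ["N","Y","N","Y","Y","Y","N","N","N","N","Y","N","Y"],
    "interned":  ["N","Y","N","N","N","Y","Y","Y","N","N","N","Y","N"],
    "hired":     ["Y","Y","N","Y","N","Y","Y","Y","Y","N","N","Y","Y"],
})

# One-hot encode the categorical columns so the tree can consume them.
X = pd.get_dummies(data.drop(columns="hired"))
y = data["hired"]

tree = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)
print(export_text(tree, feature_names=list(X.columns)))  # inspect the learned splits
```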
Decision Tree – Sample Dataset Explained
(diagram: the decision tree built from the sample dataset, with numbered callouts on its decision nodes)
Steps for creating and evaluating Model
 1) Import data
 2) Edit Metadata
 3) Convert Indicator Values
 4) Select Columns in dataset
 5) Feature selection
 6) Apply the “Decision Tree” learner on separate partitions
 7) Score Model by adding scored labels and scored probabilities
 8) Evaluate the model using Precision, Recall and False positive rate
 9) Compare performance and conclude which model should be used
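The deck describes these steps as modules of a visual ML workflow; a rough scikit-learn/pandas equivalent of the same pipeline is sketched below. The file name, the column names (`class`, `difficulty_level`) and the choice of `SelectKBest` with `f_classif` for picking 15 features are assumptions for illustration only.

```python
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.metrics import confusion_matrix, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# 1) Import data: assumed to be an NSL-KDD training file saved with named columns,
#    including a 'class' label and a 'difficulty_level' column (hypothetical layout).
df = pd.read_csv("KDDTrain.csv")

# 2-3) Edit metadata / convert indicator values: binary label (anomaly vs. normal)
#      plus one-hot encoding of the categorical protocol/service/flag columns.
df["label"] = (df["class"] != "normal").astype(int)

# 4) Select columns in dataset: drop the difficulty level and the raw class column.
X = pd.get_dummies(df.drop(columns=["class", "difficulty_level", "label"]))
y = df["label"]

# 60/40 split for model building vs. testing, as in the experiment.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.4, random_state=0)

# 5) Feature selection: keep the 15 highest-scoring features.
selector = SelectKBest(f_classif, k=15).fit(X_tr, y_tr)
X_tr_sel, X_te_sel = selector.transform(X_tr), selector.transform(X_te)

# 6-7) Train the decision tree on the training partition and score the test partition.
clf = DecisionTreeClassifier(max_depth=15, random_state=0).fit(X_tr_sel, y_tr)
pred = clf.predict(X_te_sel)

# 8-9) Evaluate and compare: precision, recall and false positive rate.
tn, fp, fn, tp = confusion_matrix(y_te, pred).ravel()
print("precision:", precision_score(y_te, pred))
print("recall:   ", recall_score(y_te, pred))
print("FPR:      ", fp / (fp + tn))
```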
Activity Diagram
Two parallel flows are run, one for model creation and tuning and one for model testing. Both flows share the same preparation steps:
 Import Data: read the training set
 Convert to Indicator Values: replace the class column with indicator values
 Select Columns in Dataset: remove the difficulty-level column along with other unnecessary columns
 Feature Selection (model-creation flow): select the 15 most important features
 Two-Class Decision Tree → Tune Model → Score Model
 Evaluate Model: generate and compare scores, and generate a table that summarises the results
Results
 Total records = ~1.25 lakh (125,973)
 Model building = ~75K (60%)
 Model testing = ~50K (40%)
Precision (positive predictive value) = TP/(TP + FP)
Recall (true positive rate) = TP/(TP + FN)
False positive rate (FPR), fall-out, probability of false alarm = FP/(FP + TN)

                All Features                          Selected Features
Depth of tree   precision   recall     FPR            precision   recall     FPR
5               0.986469    0.986458   0.014073       0.969968    0.969788   0.029288
10              0.996714    0.996713   0.003258       0.98458     0.984557   0.01519
15              0.998297    0.998297   0.00173        0.98616     0.986121   0.01346
20              0.998258    0.998258   0.001764       0.986866    0.986814   0.012658
25              0.998258    0.998258   0.001764       0.98705     0.986992   0.012443
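A depth sweep of this kind can be produced with a short loop: one tree per depth, scored for precision, recall and FPR on a held-out 40% split. In the sketch below, synthetic data from `make_classification` stands in for the NSL-KDD features, so the printed numbers will not match the table above.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

def sweep_depths(X_tr, y_tr, X_te, y_te, depths=(5, 10, 15, 20, 25)):
    """Train one tree per depth and report precision, recall and FPR on the test set."""
    rows = []
    for d in depths:
        clf = DecisionTreeClassifier(max_depth=d, random_state=0).fit(X_tr, y_tr)
        pred = clf.predict(X_te)
        tn, fp, fn, tp = confusion_matrix(y_te, pred).ravel()
        rows.append((d, precision_score(y_te, pred),
                     recall_score(y_te, pred), fp / (fp + tn)))
    return rows

# Synthetic stand-in data (the real experiment used the NSL-KDD 60/40 split).
X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.4, random_state=0)
for depth, prec, rec, fpr in sweep_depths(X_tr, y_tr, X_te, y_te):
    print(f"depth={depth:2d}  precision={prec:.4f}  recall={rec:.4f}  FPR={fpr:.4f}")
```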
Future research roadmap
 Work with other algorithms (Random Forest, SVM, K-Means, Logistic
Regression) and observe whether an ensemble methodology can further
enhance the model
 Build a real-time anomaly detection system using the same approach
and methodology
Bibliography
[1] K. H. Rao, “Implementation of Anomaly Detection Technique Using Machine Learning Algorithms,” International Journal of Computer Science and Telecommunication, vol. 2, no. 3, pp. 25-31, 2011.
[2] D. K. &. M. Karami, “A Comprehensive Survey on Anomaly-Based Intrusion Detection,” Computer and Information Science, vol. 5, no. 4, pp. 132-140, 2012.
[3] S. S. Ravneet Kaur, “A survey of data mining and social network analysis based anomaly detection techniques,” Egyptian Informatics Journal, vol. 2016, no. 17, p. 199–216, 2016.
[4] A. M. V. M. Niharika Sharma, “Machine Learning Techniques Used in Detection of DOS Attacks: A Literature Review Attacks: A Literature Review,” International Journal of Advanced Research in
Computer Science and Software Engineering, vol. 6, no. 3, pp. 100-106, 2016.
[5] A. N. H. H. J. Salima Omar, “Machine Learning Techniques for Anomaly Detection: An Overview,” International Journal of Computer Applications (0975 8887), vol. 79, no. 2, 2013.
[6] M. H. Dunham, Data Mining, PEARSON, 2013.
[7] M. K. Rashmi Hebbar, “Network Attack Detection Using Machine Learning Approach,” in International Conference , “Computational Systems for Health & Sustainability”, Bangalore, 2015.
[8] M. J. N. Jayveer Singh, “A Survey on Machine Learning Techniques for Intrusion Detection Systems,” International Journal of Advanced Research in Computer and Communication Engineering, Pune,
2013.
[9] G. S. J. M. Harjinder Kaur, “A review of Machine Learning based Anamoly Detection Techniques,” International Journal of Computer Applications Technology and Research, vol. 2, no. 2, pp. 185-187,
2013.
[10] M. R. A. R. O. M. R. F. M. S. D. F. A. K. H. Nutan farah haq, “Application of Machine Learning Approaches in Intrusion Detection System: A Survey,” (IJARAI) International Journal of Advanced Research
in Artificial Intelligence, vol. 4, no. 3, pp. 9-19, 2015.
[11] S. J. Peyman Asgharzadeh, “A SURVEY ON INTRUSION DETECTION SYSTEM BASED SUPPORT VECTOR MACHINE ALGORITHM,” INTERNATIONAL JOURNAL OF RESEARCH IN COMPUTER
APPLICATIONS AND ROBOTICS, vol. 3, no. 12, pp. 42-50, 2015.
[12] J. A. Shikha Agrawal, “Survey on Anomaly Detection using Data Mining Techniques,” in International Conference on Knowledge Based and Intelligent Information and Engineering Systems, Department of
Computer Science and Engineering, Rajiv Gandhi Proudyogiki Vishwavidyalaya, Bhopal, India, 2015.
[13] M. S. H. M. D. A. Asghar Ali Shah, “Analysis of Machine Learning Techniques for Intrusion Detection System: A Review,” International Journal of Computer Applications, vol. 119, no. 3, pp. 19-40, June
2015.
[14] N. P. f. Intelligent, “Numenta,” 2015. [Online]. Available: https://numenta.com/assets/pdf/whitepapers/Numenta%20White%20Paper%20-%20Science%20of%20Anomaly%20Detection.pdf.
[15] A. B. a. V. K. VARUN CHANDOLA, “Anomaly Detection : A Survey,” ACM Computing Surveys, Minneapolis and St. Paul, Minnesota, 2009.
[16] J. W. B. Sergio Armando Gutierrez, Application of Machine Learning Techniques to Distributed Denial of Service (DDoS) Attack Detection: A Systematic Literature Review, Medell´ın, 2012.
[17] J. Goldberg, “RSA,” 2013. [Online]. Available: http://www.rsaconference.com/writable/presentations/file_upload/ht-t08-_big-data_-for-security-purposes_how-can-i-put-big-data-to-work-for-me_copy1.pdf.
[18] “splunk,” 2015. [Online]. Available: https://www.splunk.com/web_assets/pdfs/secure/Splunk_as_a_SIEM_Tech_Brief.pdf. [Accessed 15 April 2016].
[19] B. J. B. A. A. S. David J. Weller-Fahy, “A Survey of Distance and Similarity Measures Used Within Network Intrusion Anomaly Detection,” IEEE COMMUNICATION SURVEYS & TUTORIALS, vol. 17, no.
Thank You
Gini index for the sample dataset: overall Gini = 0.21, Gini(Employed) = 0.15, Gini(Interned) = 0.15
Results
 Total records = ~1.25 lakh (125,973)
 Model building = ~75K (60%)
 Model testing = ~50K (40%)

Predicted (All Features)
                    Predicted Anomaly    Predicted Normal      Total
Actual Anomaly      26887 (TP)           111 (FN, Type II)     26998
Actual Normal       554 (FP)             22957 (TN)            23511
Total               27441                23068                 50509
Accuracy  = (TP + TN)/Total = 49844/50509 = 0.9868
Precision = TP/(TP + FP) = 26887/(26887 + 554) = 0.9798

Predicted (Selected Features)
                    Predicted Anomaly    Predicted Normal      Total
Actual Anomaly      26366 (TP)           632 (FN, Type II)     26998
Actual Normal       698 (FP)             22813 (TN)            23511
Total               27064                23445                 50509
Accuracy  = 0.9736
Precision = 0.9742
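The reported accuracy and precision follow directly from these counts; the few lines below reproduce them for the all-features matrix (recall and FPR are added for completeness, derived from the same counts).

```python
# Confusion-matrix counts reported above for the all-features model.
tp, fn, fp, tn = 26887, 111, 554, 22957
total = tp + fn + fp + tn                    # 50509 test records

accuracy  = (tp + tn) / total                # (26887 + 22957) / 50509 ≈ 0.9868
precision = tp / (tp + fp)                   # 26887 / 27441 ≈ 0.9798
recall    = tp / (tp + fn)                   # 26887 / 26998 ≈ 0.9959
fpr       = fp / (fp + tn)                   # 554 / 23511 ≈ 0.0236
print(accuracy, precision, recall, fpr)
```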
Results
Description                               Precision   Recall    Area Under ROC
1. Decision Tree, full data               0.9951      0.9764    98.62%
2. Decision Tree, selected-feature data   0.9730      0.9703    97.35%
• Precision (positive predictive value): PPV = TP/(TP + FP)
• Recall (true positive rate): TPR = TP/(TP + FN)
• Area under ROC: the area under the Receiver Operating Characteristic (ROC) curve,
a plot of the true positive rate (TPR, or sensitivity) against the false positive rate
(FPR, or 1 − specificity).
Output of Anomaly Detection
 Scores
 Labels
Decision Trees
 Supervised technique
 Entropy
 A measure of the dataset's disorder: how similar or different its members are
 If we classify the dataset into N different classes:
 0 = all instances belong to the same class
 1 = instances are equally divided among the classes
 At each step, find the attribute that can be used to partition the dataset so as to
minimise the entropy of the data
 A completely homogeneous sample has an entropy of 0.
 An equally divided sample has an entropy of 1.
 Entropy(S) = −p+ log2(p+) − p− log2(p−) for a sample of positive and negative elements.
 In general, Entropy(S) = −Σi pi log2(pi) over the class proportions pi.
 A greedy algorithm is used
 Demo: refer to the Excel sheet
Support Vector Machines
 Supervised technique
 Works well for classifying higher-dimensional data
 Finds higher-dimensional support vectors across which to divide the data
 Kernels can be used to represent data in higher dimensional spaces to find
hyperplanes that might not be apparent in lower dimensions
 Types:
 Linear
 Polynomial (Curves)
 RBF
 Kernel functions take the low-dimensional input space and transform it into a higher-dimensional
space, i.e. they convert a non-separable problem into a separable one.
 Useful for non-linear separation problems. Simply put, the kernel performs some complex
data transformations, and the SVM then finds how to separate the data based on the labels
or outputs you have defined.
 Computationally expensive
 Plot each data item as a point in n-dimensional space (where n is the number of
features), with the value of each feature being the value of a particular
coordinate.
 Perform classification by finding the hyperplane that best separates the two classes
 Use a train/test split to decide on the model
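A minimal scikit-learn sketch of an RBF-kernel SVM follows, using synthetic data in place of the network features; the scaling step, the kernel and the parameter values are illustrative choices, not settings from this study.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic two-class data standing in for the (scaled) network features.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.4, random_state=0)

# The RBF kernel maps the inputs into a higher-dimensional space where a separating
# hyperplane may exist; feature scaling beforehand is standard practice for SVMs.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
model.fit(X_tr, y_tr)
print("test accuracy:", model.score(X_te, y_te))
```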
Support Vector Machines
 ADV:
 Works well when clear
separation exists
 It uses a subset of training
points in the decision function
(called support vectors), so it is
also memory efficient.
 Works well for high dimensional
data
 DISADV
 It doesn't perform well when the data set is large, because the
required training time is higher
 It doesn't perform well when the data set has more noise, i.e. the
target classes overlap
 SVM doesn't directly provide probability estimates; these are
calculated using an expensive five-fold cross-validation.
 Noise may create issues
Naïve Bayes
 Classification technique based on Bayes Theorem
 Bayes Theorem
 P(A|B)=P(A)P(B|A)/P(B)
 Efficient in computation as compared to decision trees
 A naïve Bayesian network can be represented using a DAG (directed acyclic graph):
 each node represents an attribute
 each link represents the influence of one node on another
 Calculate the per-attribute probabilities, combine them, and predict according to a threshold.
 Demo: Spam Classifier
 P(spam|free)=P(spam)P(free|spam) / P(free)
 Probability of message being spam and containing word ‘free’ / overall
probability of having word ‘free’
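The spam illustration can be worked through numerically; the counts below are invented purely to show how the Bayes formula is applied.

```python
# Hypothetical counts for the spam illustration above (numbers are made up).
n_messages     = 1000
n_spam         = 300
n_free_in_spam = 120     # spam messages containing the word 'free'
n_free_total   = 150     # all messages containing the word 'free'

p_spam            = n_spam / n_messages          # P(spam)      = 0.30
p_free_given_spam = n_free_in_spam / n_spam      # P(free|spam) = 0.40
p_free            = n_free_total / n_messages    # P(free)      = 0.15

# Bayes' theorem: P(spam|free) = P(spam) * P(free|spam) / P(free)
p_spam_given_free = p_spam * p_free_given_spam / p_free
print(p_spam_given_free)                         # 0.80
```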
Naïve Bayes
 ADV:
 Construction is easy and also
takes short computation time;
 It can be applied to large
datasets, since it does not
involve complicated
parameter estimation;
 Easy interpretation of the
knowledge representation; and
 Encodes probabilistic
relationships among the
variables of interest. Ability to
incorporate both Prior
knowledge and data.
 DISADV
 Harder to handle continuous
features.
 May not produce a good
classifier if the prior
knowledge is wrong.
K-Means Clustering
 Iterative clustering technique that splits the data into K groups, each
closest to one of K centroids
 Unsupervised learning based on the position of each element
 Can uncover interesting groupings
 Randomly pick K centroids
 Assign each data point to its closest centroid
 Recompute the centroids based on the average position of their points
 Iterate until the assignments stop changing
 To predict the cluster for a new data point, just check which centroid
it is closest to
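A short scikit-learn sketch of these steps: the two 2-D blobs and K = 2 are illustrative assumptions, and predicting the cluster of a new point is simply assignment to the closest learned centroid.

```python
import numpy as np
from sklearn.cluster import KMeans

# Two illustrative blobs of 2-D points (e.g. two kinds of traffic behaviour).
rng = np.random.default_rng(0)
points = np.vstack([rng.normal(loc=0.0, scale=0.5, size=(50, 2)),
                    rng.normal(loc=5.0, scale=0.5, size=(50, 2))])

# K must be chosen up front; here K = 2. fit() iterates assignment and centroid
# recomputation until the assignments stop changing (or max_iter is reached).
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print(kmeans.cluster_centers_)

# Predicting the cluster of a new point = finding its closest centroid.
print(kmeans.predict([[4.8, 5.2], [0.1, -0.3]]))
```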
K-Means Clustering
 ADV:
 Less complex
 DISADV
 Choosing right value of K
 Labelling of clusters must be
done manually
 Sensitive to noise
Nature of Input Data
 Attributes may be binary, categorical or continuous
 Data may be univariate or multivariate
 The nature of the attributes determines the applicability of anomaly
detection techniques
 e.g., the statistical techniques to be used differ for continuous and
categorical data.
Data Labels
 Based on the extent to which the labels are available, anomaly
detection techniques can operate in one of the following three
modes:
 Supervised
 Semi-Supervised
 Unsupervised
Types of Anomalies
 Point
 Contextual
 Collective
Challenges
 Defining a normal region
 Anomalous observations may appear like normal ones
 Notion of an anomaly
 Availability of labeled data
 Noise
Key components
 Research Areas: Machine Learning, Data Mining, Information Theory, Spectral Theory, …
 Problem Characteristics: Nature of Data, Labels, Anomaly Type, Output, …
 Application Domains: Intrusion Detection, Fraud Detection, …
All of these feed into the choice of Anomaly Detection Technique.
Methodology
Monitored Environment → Parameterisation → Training → Model → Detection → Intrusion Reporting
Architecture
Source of Input Data → (real-time stream) → real-time stream processing engine (Apache Spark Streaming with MLlib) hosting the anomaly detection model → (real-time stream) → Detected Anomalies (Outliers)
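As a rough sketch of this architecture (an assumption, not code from the study), the snippet below uses PySpark Structured Streaming rather than the older DStream-based Spark Streaming API named on the slide: a previously trained MLlib pipeline scores a stream of connection records and keeps the predicted anomalies. The model path, input directory and three-column schema are placeholders.

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType
from pyspark.ml import PipelineModel

spark = SparkSession.builder.appName("RealTimeAnomalyDetection").getOrCreate()

# Schema of the incoming connection records; the three columns here are just a
# placeholder for the full NSL-KDD feature set.
schema = (StructType()
          .add("duration", "double")
          .add("src_bytes", "double")
          .add("dst_bytes", "double"))

# Previously trained Spark ML pipeline (feature assembly + classifier),
# saved under a path of our choosing.
model = PipelineModel.load("/models/anomaly_detection_pipeline")

# Stream of CSV files landing in a directory (a Kafka source would be the
# usual production alternative).
records = spark.readStream.schema(schema).csv("/data/incoming_connections")

# Score each micro-batch with the model and keep only the predicted anomalies.
outliers = model.transform(records).filter("prediction = 1.0")

query = (outliers.writeStream
                 .outputMode("append")
                 .format("console")   # in practice: Kafka, a database or an alerting sink
                 .start())
query.awaitTermination()
```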
Use Case Diagram

More Related Content

What's hot

Computer Security and Intrusion Detection(IDS/IPS)
Computer Security and Intrusion Detection(IDS/IPS)Computer Security and Intrusion Detection(IDS/IPS)
Computer Security and Intrusion Detection(IDS/IPS)
LJ PROJECTS
 
Deep learning approach for network intrusion detection system
Deep learning approach for network intrusion detection systemDeep learning approach for network intrusion detection system
Deep learning approach for network intrusion detection system
Avinash Kumar
 
Anomaly detection
Anomaly detectionAnomaly detection
Anomaly detection
QuantUniversity
 
Anomaly detection
Anomaly detectionAnomaly detection
Anomaly detection
Hitesh Mohapatra
 
Poisoning attacks on Federated Learning based IoT Intrusion Detection System
Poisoning attacks on Federated Learning based IoT Intrusion Detection SystemPoisoning attacks on Federated Learning based IoT Intrusion Detection System
Poisoning attacks on Federated Learning based IoT Intrusion Detection System
Sai Kiran Kadam
 
Anomaly Detection in Seasonal Time Series
Anomaly Detection in Seasonal Time SeriesAnomaly Detection in Seasonal Time Series
Anomaly Detection in Seasonal Time Series
Humberto Marchezi
 
Application of Machine Learning in Cybersecurity
Application of Machine Learning in CybersecurityApplication of Machine Learning in Cybersecurity
Application of Machine Learning in Cybersecurity
Pratap Dangeti
 
Anomaly Detection Technique
Anomaly Detection TechniqueAnomaly Detection Technique
Anomaly Detection Technique
Chakrit Phain
 
Knn Algorithm presentation
Knn Algorithm presentationKnn Algorithm presentation
Knn Algorithm presentation
RishavSharma112
 
Seminar Report | Network Intrusion Detection using Supervised Machine Learnin...
Seminar Report | Network Intrusion Detection using Supervised Machine Learnin...Seminar Report | Network Intrusion Detection using Supervised Machine Learnin...
Seminar Report | Network Intrusion Detection using Supervised Machine Learnin...
Jowin John Chemban
 
Malware Dectection Using Machine learning
Malware Dectection Using Machine learningMalware Dectection Using Machine learning
Malware Dectection Using Machine learning
Shubham Dubey
 
Anomaly detection
Anomaly detectionAnomaly detection
Anomaly detection
Dr. Stylianos Kampakis
 
A review of machine learning based anomaly detection
A review of machine learning based anomaly detectionA review of machine learning based anomaly detection
A review of machine learning based anomaly detection
Mohamed Elfadly
 
Intrusion detection system
Intrusion detection systemIntrusion detection system
Intrusion detection system
AAKASH S
 
3.3 hierarchical methods
3.3 hierarchical methods3.3 hierarchical methods
3.3 hierarchical methods
Krish_ver2
 
intrusion detection system (IDS)
intrusion detection system (IDS)intrusion detection system (IDS)
intrusion detection system (IDS)
Aj Maurya
 
Intrusion detection system
Intrusion detection systemIntrusion detection system
Intrusion detection system
Roshan Ranabhat
 
Malware Detection - A Machine Learning Perspective
Malware Detection - A Machine Learning PerspectiveMalware Detection - A Machine Learning Perspective
Malware Detection - A Machine Learning Perspective
Chong-Kuan Chen
 
Intrusion detection system
Intrusion detection system Intrusion detection system
Intrusion detection system
gaurav koriya
 

What's hot (20)

Computer Security and Intrusion Detection(IDS/IPS)
Computer Security and Intrusion Detection(IDS/IPS)Computer Security and Intrusion Detection(IDS/IPS)
Computer Security and Intrusion Detection(IDS/IPS)
 
Deep learning approach for network intrusion detection system
Deep learning approach for network intrusion detection systemDeep learning approach for network intrusion detection system
Deep learning approach for network intrusion detection system
 
Anomaly detection
Anomaly detectionAnomaly detection
Anomaly detection
 
Anomaly detection
Anomaly detectionAnomaly detection
Anomaly detection
 
Poisoning attacks on Federated Learning based IoT Intrusion Detection System
Poisoning attacks on Federated Learning based IoT Intrusion Detection SystemPoisoning attacks on Federated Learning based IoT Intrusion Detection System
Poisoning attacks on Federated Learning based IoT Intrusion Detection System
 
Anomaly Detection in Seasonal Time Series
Anomaly Detection in Seasonal Time SeriesAnomaly Detection in Seasonal Time Series
Anomaly Detection in Seasonal Time Series
 
Application of Machine Learning in Cybersecurity
Application of Machine Learning in CybersecurityApplication of Machine Learning in Cybersecurity
Application of Machine Learning in Cybersecurity
 
Anomaly Detection Technique
Anomaly Detection TechniqueAnomaly Detection Technique
Anomaly Detection Technique
 
Knn Algorithm presentation
Knn Algorithm presentationKnn Algorithm presentation
Knn Algorithm presentation
 
Seminar Report | Network Intrusion Detection using Supervised Machine Learnin...
Seminar Report | Network Intrusion Detection using Supervised Machine Learnin...Seminar Report | Network Intrusion Detection using Supervised Machine Learnin...
Seminar Report | Network Intrusion Detection using Supervised Machine Learnin...
 
Malware Dectection Using Machine learning
Malware Dectection Using Machine learningMalware Dectection Using Machine learning
Malware Dectection Using Machine learning
 
Anomaly detection
Anomaly detectionAnomaly detection
Anomaly detection
 
A review of machine learning based anomaly detection
A review of machine learning based anomaly detectionA review of machine learning based anomaly detection
A review of machine learning based anomaly detection
 
Intrusion detection system
Intrusion detection systemIntrusion detection system
Intrusion detection system
 
3.3 hierarchical methods
3.3 hierarchical methods3.3 hierarchical methods
3.3 hierarchical methods
 
intrusion detection system (IDS)
intrusion detection system (IDS)intrusion detection system (IDS)
intrusion detection system (IDS)
 
Intrusion detection system
Intrusion detection systemIntrusion detection system
Intrusion detection system
 
Malware Detection - A Machine Learning Perspective
Malware Detection - A Machine Learning PerspectiveMalware Detection - A Machine Learning Perspective
Malware Detection - A Machine Learning Perspective
 
K Nearest Neighbors
K Nearest NeighborsK Nearest Neighbors
K Nearest Neighbors
 
Intrusion detection system
Intrusion detection system Intrusion detection system
Intrusion detection system
 

Viewers also liked

Detecting Hacks: Anomaly Detection on Networking Data
Detecting Hacks: Anomaly Detection on Networking DataDetecting Hacks: Anomaly Detection on Networking Data
Detecting Hacks: Anomaly Detection on Networking Data
James Sirota
 
A Practical Guide to Anomaly Detection for DevOps
A Practical Guide to Anomaly Detection for DevOpsA Practical Guide to Anomaly Detection for DevOps
A Practical Guide to Anomaly Detection for DevOps
BigPanda
 
Anomaly Detection
Anomaly DetectionAnomaly Detection
Anomaly Detection
Carol Hargreaves
 
Anomaly Detection with Apache Spark
Anomaly Detection with Apache SparkAnomaly Detection with Apache Spark
Anomaly Detection with Apache SparkCloudera, Inc.
 
Anomaly Detection
Anomaly DetectionAnomaly Detection
Anomaly Detection
DataminingTools Inc
 
Fighting Knowledge Acquisition Bottleneck with Argument Based ...
Fighting Knowledge Acquisition Bottleneck with Argument Based ...Fighting Knowledge Acquisition Bottleneck with Argument Based ...
Fighting Knowledge Acquisition Bottleneck with Argument Based ...butest
 
∂u∂u Multi-Tenanted Framework: Distributed Near Duplicate Detection for Big Data
∂u∂u Multi-Tenanted Framework: Distributed Near Duplicate Detection for Big Data∂u∂u Multi-Tenanted Framework: Distributed Near Duplicate Detection for Big Data
∂u∂u Multi-Tenanted Framework: Distributed Near Duplicate Detection for Big Data
Pradeeban Kathiravelu, Ph.D.
 
powerpoint feb
powerpoint febpowerpoint feb
powerpoint febimu409
 
Adaptive Intrusion Detection Using Learning Classifiers
Adaptive Intrusion Detection Using Learning ClassifiersAdaptive Intrusion Detection Using Learning Classifiers
Adaptive Intrusion Detection Using Learning Classifiers
Patrick Nicolas
 
ViTeNA: An SDN-Based Virtual Network Embedding Algorithm for Multi-Tenant Dat...
ViTeNA: An SDN-Based Virtual Network Embedding Algorithm for Multi-Tenant Dat...ViTeNA: An SDN-Based Virtual Network Embedding Algorithm for Multi-Tenant Dat...
ViTeNA: An SDN-Based Virtual Network Embedding Algorithm for Multi-Tenant Dat...
Pradeeban Kathiravelu, Ph.D.
 
machine learning in the age of big data: new approaches and business applicat...
machine learning in the age of big data: new approaches and business applicat...machine learning in the age of big data: new approaches and business applicat...
machine learning in the age of big data: new approaches and business applicat...
Armando Vieira
 
KM technologies and strategy
KM technologies and strategyKM technologies and strategy
KM technologies and strategy
Andre Saito
 
Intrusion detection using data mining
Intrusion detection using data miningIntrusion detection using data mining
Intrusion detection using data miningbalbeerrawat
 
Automatic Machine Learning, AutoML
Automatic Machine Learning, AutoMLAutomatic Machine Learning, AutoML
Automatic Machine Learning, AutoML
Himadri Mishra
 
Ids presentation
Ids presentationIds presentation
Ids presentation
Solmaz Salehian
 
Analysis and Design for Intrusion Detection System Based on Data Mining
Analysis and Design for Intrusion Detection System Based on Data MiningAnalysis and Design for Intrusion Detection System Based on Data Mining
Analysis and Design for Intrusion Detection System Based on Data Mining
Pritesh Ranjan
 
2015 01-17 Lambda Architecture with Apache Spark, NextML Conference
2015 01-17 Lambda Architecture with Apache Spark, NextML Conference2015 01-17 Lambda Architecture with Apache Spark, NextML Conference
2015 01-17 Lambda Architecture with Apache Spark, NextML Conference
DB Tsai
 
Efficient Duplicate Detection Over Massive Data Sets
Efficient Duplicate Detection Over Massive Data SetsEfficient Duplicate Detection Over Massive Data Sets
Efficient Duplicate Detection Over Massive Data Sets
Pradeeban Kathiravelu, Ph.D.
 
Computer security - A machine learning approach
Computer security - A machine learning approachComputer security - A machine learning approach
Computer security - A machine learning approach
Sandeep Sabnani
 
Machine Learning - Challenges, Learnings & Opportunities
Machine Learning - Challenges, Learnings & OpportunitiesMachine Learning - Challenges, Learnings & Opportunities
Machine Learning - Challenges, Learnings & Opportunities
CodePolitan
 

Viewers also liked (20)

Detecting Hacks: Anomaly Detection on Networking Data
Detecting Hacks: Anomaly Detection on Networking DataDetecting Hacks: Anomaly Detection on Networking Data
Detecting Hacks: Anomaly Detection on Networking Data
 
A Practical Guide to Anomaly Detection for DevOps
A Practical Guide to Anomaly Detection for DevOpsA Practical Guide to Anomaly Detection for DevOps
A Practical Guide to Anomaly Detection for DevOps
 
Anomaly Detection
Anomaly DetectionAnomaly Detection
Anomaly Detection
 
Anomaly Detection with Apache Spark
Anomaly Detection with Apache SparkAnomaly Detection with Apache Spark
Anomaly Detection with Apache Spark
 
Anomaly Detection
Anomaly DetectionAnomaly Detection
Anomaly Detection
 
Fighting Knowledge Acquisition Bottleneck with Argument Based ...
Fighting Knowledge Acquisition Bottleneck with Argument Based ...Fighting Knowledge Acquisition Bottleneck with Argument Based ...
Fighting Knowledge Acquisition Bottleneck with Argument Based ...
 
∂u∂u Multi-Tenanted Framework: Distributed Near Duplicate Detection for Big Data
∂u∂u Multi-Tenanted Framework: Distributed Near Duplicate Detection for Big Data∂u∂u Multi-Tenanted Framework: Distributed Near Duplicate Detection for Big Data
∂u∂u Multi-Tenanted Framework: Distributed Near Duplicate Detection for Big Data
 
powerpoint feb
powerpoint febpowerpoint feb
powerpoint feb
 
Adaptive Intrusion Detection Using Learning Classifiers
Adaptive Intrusion Detection Using Learning ClassifiersAdaptive Intrusion Detection Using Learning Classifiers
Adaptive Intrusion Detection Using Learning Classifiers
 
ViTeNA: An SDN-Based Virtual Network Embedding Algorithm for Multi-Tenant Dat...
ViTeNA: An SDN-Based Virtual Network Embedding Algorithm for Multi-Tenant Dat...ViTeNA: An SDN-Based Virtual Network Embedding Algorithm for Multi-Tenant Dat...
ViTeNA: An SDN-Based Virtual Network Embedding Algorithm for Multi-Tenant Dat...
 
machine learning in the age of big data: new approaches and business applicat...
machine learning in the age of big data: new approaches and business applicat...machine learning in the age of big data: new approaches and business applicat...
machine learning in the age of big data: new approaches and business applicat...
 
KM technologies and strategy
KM technologies and strategyKM technologies and strategy
KM technologies and strategy
 
Intrusion detection using data mining
Intrusion detection using data miningIntrusion detection using data mining
Intrusion detection using data mining
 
Automatic Machine Learning, AutoML
Automatic Machine Learning, AutoMLAutomatic Machine Learning, AutoML
Automatic Machine Learning, AutoML
 
Ids presentation
Ids presentationIds presentation
Ids presentation
 
Analysis and Design for Intrusion Detection System Based on Data Mining
Analysis and Design for Intrusion Detection System Based on Data MiningAnalysis and Design for Intrusion Detection System Based on Data Mining
Analysis and Design for Intrusion Detection System Based on Data Mining
 
2015 01-17 Lambda Architecture with Apache Spark, NextML Conference
2015 01-17 Lambda Architecture with Apache Spark, NextML Conference2015 01-17 Lambda Architecture with Apache Spark, NextML Conference
2015 01-17 Lambda Architecture with Apache Spark, NextML Conference
 
Efficient Duplicate Detection Over Massive Data Sets
Efficient Duplicate Detection Over Massive Data SetsEfficient Duplicate Detection Over Massive Data Sets
Efficient Duplicate Detection Over Massive Data Sets
 
Computer security - A machine learning approach
Computer security - A machine learning approachComputer security - A machine learning approach
Computer security - A machine learning approach
 
Machine Learning - Challenges, Learnings & Opportunities
Machine Learning - Challenges, Learnings & OpportunitiesMachine Learning - Challenges, Learnings & Opportunities
Machine Learning - Challenges, Learnings & Opportunities
 

Similar to NSL KDD Cup 99 dataset Anomaly Detection using Machine Learning Technique

Review of Algorithms for Crime Analysis & Prediction
Review of Algorithms for Crime Analysis & PredictionReview of Algorithms for Crime Analysis & Prediction
Review of Algorithms for Crime Analysis & Prediction
IRJET Journal
 
Cyb 5675 class project final
Cyb 5675   class project finalCyb 5675   class project final
Cyb 5675 class project finalCraig Cannon
 
A chi-square-SVM based pedagogical rule extraction method for microarray data...
A chi-square-SVM based pedagogical rule extraction method for microarray data...A chi-square-SVM based pedagogical rule extraction method for microarray data...
A chi-square-SVM based pedagogical rule extraction method for microarray data...
IJAAS Team
 
Outlier Detection in Data Mining An Essential Component of Semiconductor Manu...
Outlier Detection in Data Mining An Essential Component of Semiconductor Manu...Outlier Detection in Data Mining An Essential Component of Semiconductor Manu...
Outlier Detection in Data Mining An Essential Component of Semiconductor Manu...
yieldWerx Semiconductor
 
Decision treeinductionmethodsandtheirapplicationtobigdatafinal 5
Decision treeinductionmethodsandtheirapplicationtobigdatafinal 5Decision treeinductionmethodsandtheirapplicationtobigdatafinal 5
Decision treeinductionmethodsandtheirapplicationtobigdatafinal 5
ssuser33da69
 
Identifying and classifying unknown Network Disruption
Identifying and classifying unknown Network DisruptionIdentifying and classifying unknown Network Disruption
Identifying and classifying unknown Network Disruption
jagan477830
 
Fault detection of imbalanced data using incremental clustering
Fault detection of imbalanced data using incremental clusteringFault detection of imbalanced data using incremental clustering
Fault detection of imbalanced data using incremental clustering
IRJET Journal
 
The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)
theijes
 
SEMI SUPERVISED BASED SPATIAL EM FRAMEWORK FOR MICROARRAY ANALYSIS
SEMI SUPERVISED BASED SPATIAL EM FRAMEWORK FOR MICROARRAY ANALYSISSEMI SUPERVISED BASED SPATIAL EM FRAMEWORK FOR MICROARRAY ANALYSIS
SEMI SUPERVISED BASED SPATIAL EM FRAMEWORK FOR MICROARRAY ANALYSIS
IRJET Journal
 
New Fuzzy Logic Based Intrusion Detection System
New Fuzzy Logic Based Intrusion Detection SystemNew Fuzzy Logic Based Intrusion Detection System
New Fuzzy Logic Based Intrusion Detection System
ijsrd.com
 
G44093135
G44093135G44093135
G44093135
IJERA Editor
 
Network Intrusion Detection System using Machine Learning
Network Intrusion Detection System using Machine LearningNetwork Intrusion Detection System using Machine Learning
Network Intrusion Detection System using Machine Learning
IRJET Journal
 
Text Analytics for Legal work
Text Analytics for Legal workText Analytics for Legal work
Text Analytics for Legal work
AlgoAnalytics Financial Consultancy Pvt. Ltd.
 
Developing an Artificial Immune Model for Cash Fraud Detection
Developing an Artificial Immune Model for Cash Fraud Detection   Developing an Artificial Immune Model for Cash Fraud Detection
Developing an Artificial Immune Model for Cash Fraud Detection
khawla Osama
 
Study and Analysis of K-Means Clustering Algorithm Using Rapidminer
Study and Analysis of K-Means Clustering Algorithm Using RapidminerStudy and Analysis of K-Means Clustering Algorithm Using Rapidminer
Study and Analysis of K-Means Clustering Algorithm Using Rapidminer
IJERA Editor
 
Analysis on different Data mining Techniques and algorithms used in IOT
Analysis on different Data mining Techniques and algorithms used in IOTAnalysis on different Data mining Techniques and algorithms used in IOT
Analysis on different Data mining Techniques and algorithms used in IOT
IJERA Editor
 
Wearable Computing - Part IV: Ensemble classifiers & Insight into ongoing res...
Wearable Computing - Part IV: Ensemble classifiers & Insight into ongoing res...Wearable Computing - Part IV: Ensemble classifiers & Insight into ongoing res...
Wearable Computing - Part IV: Ensemble classifiers & Insight into ongoing res...
Daniel Roggen
 
Anomaly detection by using CFS subset and neural network with WEKA tools
Anomaly detection by using CFS subset and neural network with WEKA tools Anomaly detection by using CFS subset and neural network with WEKA tools
Anomaly detection by using CFS subset and neural network with WEKA tools
Drjabez
 

Similar to NSL KDD Cup 99 dataset Anomaly Detection using Machine Learning Technique (20)

Review of Algorithms for Crime Analysis & Prediction
Review of Algorithms for Crime Analysis & PredictionReview of Algorithms for Crime Analysis & Prediction
Review of Algorithms for Crime Analysis & Prediction
 
Cyb 5675 class project final
Cyb 5675   class project finalCyb 5675   class project final
Cyb 5675 class project final
 
PNN and inversion-B
PNN and inversion-BPNN and inversion-B
PNN and inversion-B
 
A chi-square-SVM based pedagogical rule extraction method for microarray data...
A chi-square-SVM based pedagogical rule extraction method for microarray data...A chi-square-SVM based pedagogical rule extraction method for microarray data...
A chi-square-SVM based pedagogical rule extraction method for microarray data...
 
Outlier Detection in Data Mining An Essential Component of Semiconductor Manu...
Outlier Detection in Data Mining An Essential Component of Semiconductor Manu...Outlier Detection in Data Mining An Essential Component of Semiconductor Manu...
Outlier Detection in Data Mining An Essential Component of Semiconductor Manu...
 
Decision treeinductionmethodsandtheirapplicationtobigdatafinal 5
Decision treeinductionmethodsandtheirapplicationtobigdatafinal 5Decision treeinductionmethodsandtheirapplicationtobigdatafinal 5
Decision treeinductionmethodsandtheirapplicationtobigdatafinal 5
 
Identifying and classifying unknown Network Disruption
Identifying and classifying unknown Network DisruptionIdentifying and classifying unknown Network Disruption
Identifying and classifying unknown Network Disruption
 
Fault detection of imbalanced data using incremental clustering
Fault detection of imbalanced data using incremental clusteringFault detection of imbalanced data using incremental clustering
Fault detection of imbalanced data using incremental clustering
 
The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)
 
SEMI SUPERVISED BASED SPATIAL EM FRAMEWORK FOR MICROARRAY ANALYSIS
SEMI SUPERVISED BASED SPATIAL EM FRAMEWORK FOR MICROARRAY ANALYSISSEMI SUPERVISED BASED SPATIAL EM FRAMEWORK FOR MICROARRAY ANALYSIS
SEMI SUPERVISED BASED SPATIAL EM FRAMEWORK FOR MICROARRAY ANALYSIS
 
New Fuzzy Logic Based Intrusion Detection System
New Fuzzy Logic Based Intrusion Detection SystemNew Fuzzy Logic Based Intrusion Detection System
New Fuzzy Logic Based Intrusion Detection System
 
Ij2514951500
Ij2514951500Ij2514951500
Ij2514951500
 
G44093135
G44093135G44093135
G44093135
 
Network Intrusion Detection System using Machine Learning
Network Intrusion Detection System using Machine LearningNetwork Intrusion Detection System using Machine Learning
Network Intrusion Detection System using Machine Learning
 
Text Analytics for Legal work
Text Analytics for Legal workText Analytics for Legal work
Text Analytics for Legal work
 
Developing an Artificial Immune Model for Cash Fraud Detection
Developing an Artificial Immune Model for Cash Fraud Detection   Developing an Artificial Immune Model for Cash Fraud Detection
Developing an Artificial Immune Model for Cash Fraud Detection
 
Study and Analysis of K-Means Clustering Algorithm Using Rapidminer
Study and Analysis of K-Means Clustering Algorithm Using RapidminerStudy and Analysis of K-Means Clustering Algorithm Using Rapidminer
Study and Analysis of K-Means Clustering Algorithm Using Rapidminer
 
Analysis on different Data mining Techniques and algorithms used in IOT
Analysis on different Data mining Techniques and algorithms used in IOTAnalysis on different Data mining Techniques and algorithms used in IOT
Analysis on different Data mining Techniques and algorithms used in IOT
 
Wearable Computing - Part IV: Ensemble classifiers & Insight into ongoing res...
Wearable Computing - Part IV: Ensemble classifiers & Insight into ongoing res...Wearable Computing - Part IV: Ensemble classifiers & Insight into ongoing res...
Wearable Computing - Part IV: Ensemble classifiers & Insight into ongoing res...
 
Anomaly detection by using CFS subset and neural network with WEKA tools
Anomaly detection by using CFS subset and neural network with WEKA tools Anomaly detection by using CFS subset and neural network with WEKA tools
Anomaly detection by using CFS subset and neural network with WEKA tools
 

Recently uploaded

standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
ArpitMalhotra16
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
ewymefz
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
nscud
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
TravisMalana
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
MaleehaSheikh2
 
Business update Q1 2024 Lar España Real Estate SOCIMI
Business update Q1 2024 Lar España Real Estate SOCIMIBusiness update Q1 2024 Lar España Real Estate SOCIMI
Business update Q1 2024 Lar España Real Estate SOCIMI
AlejandraGmez176757
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
Tabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflowsTabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflows
alex933524
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
NABLAS株式会社
 
tapal brand analysis PPT slide for comptetive data
tapal brand analysis PPT slide for comptetive datatapal brand analysis PPT slide for comptetive data
tapal brand analysis PPT slide for comptetive data
theahmadsaood
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
ewymefz
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
ewymefz
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
enxupq
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
ocavb
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
ukgaet
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
benishzehra469
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
haila53
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
NABLAS株式会社
 
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
correoyaya
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
Subhajit Sahu
 

Recently uploaded (20)

standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
 
Business update Q1 2024 Lar España Real Estate SOCIMI
Business update Q1 2024 Lar España Real Estate SOCIMIBusiness update Q1 2024 Lar España Real Estate SOCIMI
Business update Q1 2024 Lar España Real Estate SOCIMI
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
Tabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflowsTabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflows
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
 
tapal brand analysis PPT slide for comptetive data
tapal brand analysis PPT slide for comptetive datatapal brand analysis PPT slide for comptetive data
tapal brand analysis PPT slide for comptetive data
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
 
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
 

NSL KDD Cup 99 dataset Anomaly Detection using Machine Learning Technique

  • 1. NSL KDD Cup 99 dataset Anomaly Detection using Machine Learning Technique An Experiment and evaluation using Decision Tree Under Guidance of Dr. Kalpana Thakre NATIONAL CONFERENCE ON RECENT TRENDS AND ADVANCES IN COMPUTING, COMMUNICATION AND SECURITY Presented by Sujeet Raosaheb Suryawanshi ME IT SEM III ; Roll No. 613012
  • 2. Agenda  Anomaly Detection  Machine Learning  IDPS  Survey of algorithm  Decision Tree  Experiment with NSL KDD Cup 99  Result  Future research roadmap
  • 3. Anomaly Detection  Intrusion Detection System / Intrusion Prevention system are used to protect trusted networks from untrusted networks  One of the threat is Denial of Service (DoS) Attack  Approaches to detect DoS attack 1. Signature based 2. Anomaly based  Signature based deals with limited/fixed set of known threats  Anomaly-based detection technique centres on the concept of a baseline for network behaviour, any deviation from this baseline is considered as an anomaly.
  • 4. Machine Learning  A scientific discipline that is concerned with the design and development of algorithms that allow computers to learn based on data. A major focus of machine learning research is to automatically learn to recognize complex patterns.  This is similar to the way human brain works, humans take decision based on the learning or experiences they have.
  • 5. Motivation & Objective  To understand techniques available to support the vision envisaged for “Anomaly Detection using Machine Learning Technique”  To experiment and evaluate NSL KDD Cup 99 dataset using Decision Tree Classifier  To understand various anomaly detection and machine learning techniques  Identify requirements for building platform for anomaly detection system
  • 6. Classification of IDPS IntrusionDetection System Data collection techniques HIDS NIDS Data analysis techniques Specification based Anomaly based Nearest neighbor based Clustering based K-Means Statistical based Classification based SVM Fuzzy Logic Genetic Algo Decision Tree Naive Byesian Neural Network Others Signature based
  • 7. TECHNIQUES Nearest neighbor based detection techniques Clustering-based anomalies detection techniques Statistical techniques Classification techniques Assumption Normal data instances present in dense neighbourhoods belong to a cluster in the data, lie close to their closest cluster centroid, belong to large and dense clusters, occur in high probability regions of a stochastic model A classifier that can distinguish between normal and anomalous classes can be learnt in the given feature space. Anomalies occur far from their closest neighbours does not belong to any cluster, are far away from their closest cluster centroid, are either too small or too sparse clusters. occur in the low probability regions of the stochastic model Advantages  Unsupervised/semi- supervised mode  Simplest approach  Unsupervised  Fast comparison  Unsupervised and simple  Confidence interval is provided with anomaly score  Fast testing phase process  Improved efficiency with ensemble methods Disadvantages  High computational cost in testing phase  Difficult where several regions are with widely differing densities.  Difficult to identify in case if anomalies are present in groups.  Dependent on the proximity measures used  High computation cost in cluster formation phase  A data object not belonging to any cluster may be a noise rather than an anomaly  Not suited for large datasets  Fail to label anomalies in certain cases  Fail to label the anomalies correctly in certain cases  Difficult to find best statistic  For multivariate data it fails to capture the interactions between different  Heavy dependency and reliability on training data  Class imbalance problem
  • 8. Decision Tree SVM Naive Bayes ANN Fuzzy Logic GA K-Means Technique Classification Classification & Regression Classification Classification Classification Classification Clustering Computation cost High High Less - High - - High dimensional data Yes Yes Yes Yes - - - Advantages  Easy to understand for smaller trees  Handles irrelevant and missing data  Compact after pruning  High detection accuracy.  Learning ability for small set of samples.  High training rate and decision rate, insensitiven ess to dimension of input data  Easy constructio n  Takes short computatio n time;  Works efficiently with large dataset  Ability to generalize from limited, noisy and incomplete data.  Ease of use  Detect unknown intrusions.  Supports multiclass detection.  Permits a data point to be in more than one cluster. It has a more natural representat ion of the behavior of genes. It’s effective, especially against port scans and probes.  Derives best classificatio n rules.  Selects optimal parameters .  Simple to use. Disadvantage  Fails to classify a scattered data  Uses greedy algorithm, hence may not find best tree   Positive & negative examples req.  High dependenc y on selecting good kernel function.  Training takes a long time.  Difficult to handle continuous features.  Highly dependent on prior knowledge.  Training required  Needs to be emulated.  Longer training process.  Over-fitting issue  Need to determine membershi p cutoff value  Clusters are sensitive to initial assignment of centroids  Can’t assure constant optimizatio n response times.  Over-fitting issue  Necessity of specifying k.  Sensitive to noise  Clusters are sensitive to initial assignme nt of centroids.
  • 9. Decision Tree Classifier Algorithm : Decision tree 1. Split(node, {example}): 2. A the best attribute for splitting the {examples} 3. Decision attribute for this node  A 4. For each value of A, create new child node 5. Split training {examples} to child nodes 6. For each child node/subset: If subset is pure: STOP Else: Split(node,{subset})
  • 10. Entropy  For selecting best attribute:  At each step, find the attribute that can be used to partition the dataset to minimise the entropy of the data  A completely homogeneous sample has entropy of 0.  An equally divided sample has entropy of 1.  Entropy(s) = - p+log2 (p+) -p-log2 (p-) for a sample of negative and positive elements.  The formula for entropy is:
  • 11. Decision Tree – Sample Dataset Years Experience Employed? Previous employers Level of Education Top-tier school Interned Hired 10 Y 4 BS N N Y 0 N 0 BS Y Y Y 7 N 6 BS N N N 2 Y 1 MS Y N Y 20 N 2 PhD Y N N 0 N 0 PhD Y Y Y 5 Y 2 MS N Y Y 3 N 1 BS N Y Y 15 Y 5 BS N N Y 0 N 0 BS N N N 1 N 1 PhD Y N N 4 Y 1 BS N Y Y 0 N 0 PhD Y N Y
  • 12. Decision Tree – Sample Dataset Explained 1 2 3 45
  • 13. Steps for creating and evaluating Model  1) Import data  2) Edit Metadata  3) Convert Indicator Values  4) Select Columns in dataset  5) Feature selection  6) “Decision Tree” on separate partitions  7) Score Model by adding scored labels and scored possibilities  10) Evaluate model using Precision, Recall and False positive rate  11) Compare performance and conclude which model to be used
  • 14. Activity Diagram Import Data Read training set Convert to indicator Values Replace Class column with indicator values Select Columns in Dataset Remove diff level column along with other unnecessary columns Import Data Read training set Convert to indicator Values Replace Class column with indicator values Select Columns in Dataset Remove diff level column along with other unnecessary columns Feature Selection Select 15 most important features Two-Class Decision Tree Two-Class Decision Tree Tune Model Tune Model Score Model Score Model Evaluate Model Generate and compare scores Generate Table that summarises result Evaluate Model Generate and compare scores For model testing ForModelcreationandtuning
  • 15. Results
 Total records = ~1.25 Lacs (125,973); model building = ~75K (60%); model testing = ~50K (40%)
 Precision (positive predictive value) = TP / (TP + FP)
 Recall (true positive rate) = TP / (TP + FN)
 False positive rate (FPR, fall-out, probability of false alarm) = FP / total negatives
 Depth of tree | All Features: precision, recall, FPR | Selected Features: precision, recall, FPR
 5  | 0.986469, 0.986458, 0.014073 | 0.969968, 0.969788, 0.029288
 10 | 0.996714, 0.996713, 0.003258 | 0.98458, 0.984557, 0.01519
 15 | 0.998297, 0.998297, 0.00173 | 0.98616, 0.986121, 0.01346
 20 | 0.998258, 0.998258, 0.001764 | 0.986866, 0.986814, 0.012658
 25 | 0.998258, 0.998258, 0.001764 | 0.98705, 0.986992, 0.012443
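A depth sweep like the one tabulated above could be reproduced with a simple loop; this sketch continues the earlier scikit-learn snippet and assumes the same (hypothetical) variables:

```python
# Depth sweep matching the table above; X_tr_sel, X_te_sel, y_tr, y_te are the
# (assumed) arrays from the earlier preprocessing sketch.
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix

for depth in (5, 10, 15, 20, 25):
    clf = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr_sel, y_tr)
    pred = clf.predict(X_te_sel)
    tn, fp, fn, tp = confusion_matrix(y_te, pred).ravel()
    print(depth,
          round(tp / (tp + fp), 6),   # precision
          round(tp / (tp + fn), 6),   # recall
          round(fp / (fp + tn), 6))   # false positive rate
```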
  • 16. Future research roadmap  Work with other algorithms - Random Forest, SVM, K-Means, Logistic Regression and observe if ensemble methodology can further enhance the model  Build real time anomaly detection using the same approach and methodology
  • 17. Bibliography
 [1] K. H. Rao, "Implementation of Anomaly Detection Technique Using Machine Learning Algorithms," International Journal of Computer Science and Telecommunication, vol. 2, no. 3, pp. 25-31, 2011.
 [2] D. K. & M. Karami, "A Comprehensive Survey on Anomaly-Based Intrusion Detection," Computer and Information Science, vol. 5, no. 4, pp. 132-140, 2012.
 [3] S. S. Ravneet Kaur, "A survey of data mining and social network analysis based anomaly detection techniques," Egyptian Informatics Journal, vol. 2016, no. 17, pp. 199–216, 2016.
 [4] A. M. V. M. Niharika Sharma, "Machine Learning Techniques Used in Detection of DOS Attacks: A Literature Review," International Journal of Advanced Research in Computer Science and Software Engineering, vol. 6, no. 3, pp. 100-106, 2016.
 [5] A. N. H. H. J. Salima Omar, "Machine Learning Techniques for Anomaly Detection: An Overview," International Journal of Computer Applications (0975-8887), vol. 79, no. 2, 2013.
 [6] M. H. Dunham, Data Mining, Pearson, 2013.
 [7] M. K. Rashmi Hebbar, "Network Attack Detection Using Machine Learning Approach," in International Conference "Computational Systems for Health & Sustainability", Bangalore, 2015.
 [8] M. J. N. Jayveer Singh, "A Survey on Machine Learning Techniques for Intrusion Detection Systems," International Journal of Advanced Research in Computer and Communication Engineering, Pune, 2013.
 [9] G. S. J. M. Harjinder Kaur, "A review of Machine Learning based Anomaly Detection Techniques," International Journal of Computer Applications Technology and Research, vol. 2, no. 2, pp. 185-187, 2013.
 [10] M. R. A. R. O. M. R. F. M. S. D. F. A. K. H. Nutan Farah Haq, "Application of Machine Learning Approaches in Intrusion Detection System: A Survey," (IJARAI) International Journal of Advanced Research in Artificial Intelligence, vol. 4, no. 3, pp. 9-19, 2015.
 [11] S. J. Peyman Asgharzadeh, "A Survey on Intrusion Detection System Based Support Vector Machine Algorithm," International Journal of Research in Computer Applications and Robotics, vol. 3, no. 12, pp. 42-50, 2015.
 [12] J. A. Shikha Agrawal, "Survey on Anomaly Detection using Data Mining Techniques," in International Conference on Knowledge Based and Intelligent Information and Engineering Systems, Department of Computer Science and Engineering, Rajiv Gandhi Proudyogiki Vishwavidyalaya, Bhopal, India, 2015.
 [13] M. S. H. M. D. A. Asghar Ali Shah, "Analysis of Machine Learning Techniques for Intrusion Detection System: A Review," International Journal of Computer Applications, vol. 119, no. 3, pp. 19-40, June 2015.
 [14] N. P. f. Intelligent, "Numenta," 2015. [Online]. Available: https://numenta.com/assets/pdf/whitepapers/Numenta%20White%20Paper%20-%20Science%20of%20Anomaly%20Detection.pdf.
 [15] A. B. a. V. K. Varun Chandola, "Anomaly Detection: A Survey," ACM Computing Surveys, Minneapolis and St. Paul, Minnesota, 2009.
 [16] J. W. B. Sergio Armando Gutierrez, Application of Machine Learning Techniques to Distributed Denial of Service (DDoS) Attack Detection: A Systematic Literature Review, Medellín, 2012.
 [17] J. Goldberg, "RSA," 2013. [Online]. Available: http://www.rsaconference.com/writable/presentations/file_upload/ht-t08-_big-data_-for-security-purposes_how-can-i-put-big-data-to-work-for-me_copy1.pdf.
 [18] "Splunk," 2015. [Online]. Available: https://www.splunk.com/web_assets/pdfs/secure/Splunk_as_a_SIEM_Tech_Brief.pdf. [Accessed 15 April 2016].
 [19] B. J. B. A. A. S. David J. Weller-Fahy, "A Survey of Distance and Similarity Measures Used Within Network Intrusion Anomaly Detection," IEEE Communications Surveys & Tutorials, vol. 17, no.
  • 22. Results
 Total records = ~1.25 Lacs (125,973); model building = ~75K (60%); model testing = ~50K (40%)
 Confusion matrix, all features (rows = actual, columns = predicted):
 Actual anomaly: 26887 (TP) | 111 (FN, Type II error)
 Actual normal:  554 (FP)   | 22957 (TN)
 Predicted totals: 27441 anomaly | 23068 normal
 Accuracy = (TP + TN) / Total = 49844 / 50509 = 0.9868
 Precision = TP / (TP + FP) = 26887 / (26887 + 554) = 0.9798
 Confusion matrix, selected features (rows = actual, columns = predicted):
 Actual anomaly: 26366 (TP) | 632 (FN, Type II error) | row total 26998
 Actual normal:  698 (FP)   | 22813 (TN)              | row total 23511
 Predicted totals: 27064 anomaly | 23445 normal | 50509 overall
 Accuracy = 0.9736; Precision = 0.9742
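These figures follow directly from the confusion-matrix counts; a small illustrative check using the all-features matrix:

```python
# Recompute accuracy and precision from the all-features confusion matrix above.
tp, fn, fp, tn = 26887, 111, 554, 22957
total = tp + fn + fp + tn                         # 50509
print("accuracy:", round((tp + tn) / total, 4))   # 0.9868
print("precision:", round(tp / (tp + fp), 4))     # 0.9798
```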
  • 23. Results
 Description | Precision | Recall | Area Under ROC
 1. Decision Tree, full data | 0.9951 | 0.9764 | 98.62%
 2. Decision Tree, selected features | 0.9730 | 0.9703 | 97.35%
 • Precision (positive predictive value): PPV = TP / (TP + FP)
 • Recall (true positive rate): TPR = TP / (TP + FN)
 • Area under ROC: the Receiver Operating Characteristic (ROC) curve plots the true positive rate (TPR, or sensitivity) against the false positive rate (FPR, or 1 − specificity); the reported value is the area under this curve.
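The area under the ROC curve is typically computed from the scored probabilities rather than the hard labels; a minimal sketch, continuing the earlier scikit-learn assumptions:

```python
# AUC from predicted probabilities; model, X_te_sel, y_te are assumed
# to come from the earlier (illustrative) decision-tree sketch.
from sklearn.metrics import roc_auc_score

scores = model.predict_proba(X_te_sel)[:, 1]   # probability of the anomaly class
print("area under ROC:", roc_auc_score(y_te, scores))
```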
  • 24. Output of Anomaly Detection  Scores  Labels
  • 25. Decision Trees
 Supervised technique
 Entropy: a measure of the dataset's (dis)order, i.e. how similar or mixed the class labels are
 If the dataset is classified into N different classes: 0 = all instances belong to the same class; 1 = instances are evenly split across classes
 At each step, find the attribute that can be used to partition the dataset so as to minimise the entropy of the data
 A completely homogeneous sample has an entropy of 0; an equally divided sample has an entropy of 1
 Entropy(S) = −p+ log2(p+) − p− log2(p−) for a sample of positive and negative elements; in general Entropy(S) = −Σ pi log2(pi)
 A greedy algorithm is used to build the tree
 Demo: refer to the Excel sheet
  • 26. Support Vector Machines
 Supervised technique
 Works well for classifying higher-dimensional data
 Finds the support vectors that define the hyperplane across which to divide the data
 Kernels can be used to represent the data in higher-dimensional spaces, revealing separating hyperplanes that are not apparent in lower dimensions
 Kernel types: Linear, Polynomial (curves), RBF
 A kernel function takes the low-dimensional input space and transforms it into a higher-dimensional space, i.e. it converts a non-separable problem into a separable one
 Useful for non-linear separation problems: in effect it performs complex data transformations, then works out how to separate the data based on the labels or outputs you have defined
 Computationally expensive
 Plot each data item as a point in n-dimensional space (where n is the number of features), with the value of each feature being the value of a particular coordinate
 Perform classification by finding the hyperplane that differentiates the two classes
 Use a train/test split to select the model
  • 27. Support Vector Machines
 Advantages:
 Works well when a clear separation (margin) exists between classes
 Uses a subset of the training points in the decision function (the support vectors), so it is also memory efficient
 Works well for high-dimensional data
 Disadvantages:
 Does not perform well on large datasets, because the required training time is high
 Does not perform well when the dataset is noisy, i.e. when the target classes overlap
 Does not directly provide probability estimates; these are calculated using an expensive five-fold cross-validation
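For comparison, a minimal scikit-learn illustration of an RBF-kernel SVM on the same kind of train/test split; the dataset variables are assumptions carried over from the earlier sketch, and SVMs were not part of the reported experiment:

```python
# Illustrative RBF-kernel SVM; X_tr_sel, X_te_sel, y_tr, y_te are assumed
# to come from the earlier preprocessing sketch.
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.metrics import precision_score, recall_score

# Scaling matters for SVMs because the decision boundary is distance based.
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
svm.fit(X_tr_sel, y_tr)
pred = svm.predict(X_te_sel)
print("precision:", precision_score(y_te, pred))
print("recall:", recall_score(y_te, pred))
```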
  • 28. Naïve Bayes
 Classification technique based on Bayes' theorem
 Bayes' theorem: P(A|B) = P(A) P(B|A) / P(B)
 Computationally efficient compared with decision trees
 A naïve Bayesian network can be represented using a DAG: each node represents an attribute, each link represents the influence of one node on another
 Combine the per-attribute probabilities for each class and predict according to a threshold
 Demo: spam classifier
 P(spam|free) = P(spam) P(free|spam) / P(free)
 i.e. the probability that a message is spam and contains the word 'free', divided by the overall probability of containing the word 'free'
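A small worked illustration of the spam example above; the message counts are purely hypothetical and chosen only to make the arithmetic easy to follow:

```python
# Hypothetical counts to illustrate P(spam|free) = P(spam) * P(free|spam) / P(free).
n_messages = 1000
n_spam = 300                 # messages labelled spam
n_free_in_spam = 120         # spam messages containing the word "free"
n_free_total = 150           # all messages containing the word "free"

p_spam = n_spam / n_messages                 # P(spam)      = 0.30
p_free_given_spam = n_free_in_spam / n_spam  # P(free|spam) = 0.40
p_free = n_free_total / n_messages           # P(free)      = 0.15

p_spam_given_free = p_spam * p_free_given_spam / p_free
print(p_spam_given_free)                     # 0.8
```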
  • 29. Naïve Bayes
 Advantages:
 Construction is easy and the computation time is short
 Can be applied to large datasets, since it does not involve complicated parameter estimation
 The knowledge representation is easy to interpret
 Encodes probabilistic relationships among the variables of interest, with the ability to incorporate both prior knowledge and data
 Disadvantages:
 Harder to handle continuous features
 May not contain any good classifiers if the prior knowledge is wrong
  • 30. K-Means Clustering
 Iterative clustering technique that splits the data into K groups, each closest to one of K centroids
 Unsupervised learning based on the position of each element
 Can uncover interesting groupings
 Algorithm: randomly pick K centroids; assign each data point to its closest centroid; recompute each centroid as the average position of its assigned points; iterate until assignments stop changing
 To predict the cluster for a new data point, assign it to the nearest centroid
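An illustrative use of scikit-learn's KMeans; K-means was not part of the reported experiment, and the feature matrix is the assumption carried over from the earlier sketch:

```python
# Illustrative K-means clustering of the (assumed) selected-feature matrix.
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

X_scaled = StandardScaler().fit_transform(X_tr_sel)   # K-means is distance based
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_scaled)

print(kmeans.cluster_centers_.shape)   # (2, number of features)
print(kmeans.predict(X_scaled[:5]))    # cluster assignments for new points
```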
  • 31. K-Means Clustering
 Advantages: less complex
 Disadvantages: choosing the right value of K; labelling of the clusters has to be done manually; sensitive to noise
  • 32. Nature of Input Data
 Attributes may be binary, categorical or continuous
 Data may be univariate or multivariate
 The nature of the attributes determines which anomaly detection techniques are applicable
 E.g., which statistical techniques can be used depends on whether the data is continuous or categorical
  • 33. Data Labels  Based on the extent to which the labels are available, anomaly detection techniques can operate in one of the following three modes:  Supervised  Semi-Supervised  Unsupervised
  • 34. Types of Anomalies  Point  Contextual  Collective
  • 35. Challenges  Defining a normal region  Anomalous observations appear like normal  Notion of an anomaly  Availability of labeled data  Noise
  • 36. Key Components
 The anomaly detection technique sits at the centre, shaped by three groups of inputs:
 Research areas: Machine Learning, Data Mining, Information Theory, Spectral Theory, …
 Problem characteristics: Nature of Data, Labels, Anomaly Type, Output, …
 Application domains: Intrusion Detection, Fraud Detection, …
  • 38. Architecture
 Source of input data → real-time stream → Apache Spark Streaming (real-time stream processing engine) with an MLlib anomaly detection model → Detected Anomalies (Outliers)
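A bare-bones sketch of what such an architecture could look like with Spark Structured Streaming and a pre-trained MLlib pipeline; the socket source, model path and column handling are assumptions, not part of the presented work:

```python
# Hypothetical real-time scoring pipeline with Spark Structured Streaming + MLlib.
from pyspark.sql import SparkSession
from pyspark.ml import PipelineModel

spark = SparkSession.builder.appName("RealTimeAnomalyDetection").getOrCreate()

# Source of input data: a socket stream, one record per line (assumption).
raw = (spark.readStream.format("socket")
       .option("host", "localhost").option("port", 9999).load())

# A previously trained MLlib pipeline (feature assembly + classifier) is assumed
# to be saved at this hypothetical path; in a real deployment it would need a
# parsing stage to turn the raw text lines into the expected feature columns.
model = PipelineModel.load("/models/nslkdd_decision_tree")
scored = model.transform(raw)

# Keep only the records the model flags as anomalous and emit them downstream.
anomalies = scored.filter(scored["prediction"] == 1.0)
(anomalies.writeStream.format("console").outputMode("append").start()
 .awaitTermination())
```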

Editor's Notes

  1. A baseline can be considered a description of the type of network behaviour that is accepted or normal; any deviation from this baseline is considered an anomaly.
  2. Scores: assign an anomaly score to each instance in the test data, depending on the degree to which that instance is considered an anomaly; the analyst may then either analyse the top few anomalies or use a cut-off threshold to select them. Labels: assign a label (normal or anomalous) to each test instance.
  3. Defining a normal region which encompasses every possible normal behavior is very difficult. Malicious adversaries can make anomalous observations appear like normal ones. The exact notion of an anomaly is different for different application domains. Availability of labeled data for training/validation of models used by anomaly detection techniques is usually a major issue. Often the data contains noise which tends to be similar to the actual anomalies and hence is difficult to distinguish and remove.