Using Machine Learning in
Networks Intrusion Detection
Systems
OMAR SHAYA
Georg-August-Universität Göttingen
Sections
✤ Introduction
✤ Intrusion Detection Methodologies
✤ A Machine Learning Based IDS (Intrusion Detection System)
✤ Challenges of Using Machine Learning in Intrusion Detection
✤ Summary
✤ References
✤ Appendix
INTRODUCTION
IDS: Intrusion Detection System
Increasing attacks on computer networks and the need
for automated detection
• The Internet and computer systems have raised numerous security and privacy issues
• Network use has grown explosively for many reasons, e.g. the Internet, wireless networks, cloud computing
• Thus, malicious attacks on networks have increased year over year
• Need to automate systems that detect these attacks
  • Based on known attacks
  • But what about attacks that were not seen before?
  • Machine learning?
Definition: intrusion & intrusion detection
“Intrusion is an attempt to compromise CIA (Confidentiality, Integrity, Availability), or to bypass the security mechanisms of a computer or network”
“Intrusion detection is the process of monitoring the events occurring in a computer system or network, and analyzing them for signs of intrusion”
INTRUSION DETECTION METHODOLOGIES
There are 3 main Detection Methodologies
• Signature-based Detection (SD)
  • A signature is a string or pattern that corresponds to a known attack or threat
  • SD compares patterns against captured events to recognize possible intrusions
  • Uses the knowledge accumulated about specific attacks and system vulnerabilities
  • Also known as Knowledge-based Detection or Misuse Detection
• Anomaly-based Detection (AD)
  • An anomaly is a deviation from “normal” behavior
  • Profiles of normal behavior are derived from monitoring network traffic
  • AD compares normal profiles with observed events to recognize attacks
• Stateful Protocol Analysis (SPA)
  • SPA depends on vendor-developed generic profiles of specific protocols
  • Protocol models are based on standards from international standards organizations
• Hybrid IDS use multiple methodologies
  • SD and AD are complementary methods: the former addresses certain (known) attacks and the latter focuses on unknown attacks
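Signature-based detection as described above can be sketched in a few lines of Python. This is only an illustration: the signature names, the regular expressions, and the sample events are hypothetical and are not taken from any real IDS rule set.

```python
import re

# Hypothetical signatures (illustrative only, not from a real rule set).
SIGNATURES = {
    "sql-injection": re.compile(r"union\s+select", re.IGNORECASE),
    "path-traversal": re.compile(r"\.\./\.\./"),
}

def match_signatures(event):
    """Return the names of all signatures whose pattern occurs in the event."""
    return [name for name, pattern in SIGNATURES.items() if pattern.search(event)]

# Hypothetical captured events (e.g. HTTP request lines).
events = [
    "GET /index.html",
    "GET /item?id=1 UNION SELECT password FROM users",
    "GET /../../etc/passwd",
]
alerts = {event: match_signatures(event) for event in events}
```

An event that matches no stored pattern raises no alert, which is why this approach cannot see attacks for which no signature exists yet.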
There are 3 main Detection Methodologies
• Hybrid IDS use multiple methodologies
  • E.g. SD and AD are complementary methods
  • SD addresses certain (known) attacks and AD focuses on unknown attacks
Signature-based Detection (SD)*
• SD is a process that compares patterns against captured events to recognize possible intrusions
• A signature is a string or pattern that corresponds to a known attack or threat
• Uses the knowledge accumulated about specific attacks and system vulnerabilities
Anomaly-based Detection (AD)
• AD compares normal profiles with observed events to recognize attacks
• An anomaly is a deviation from “normal” behavior
• Profiles of normal behavior are derived from monitoring network traffic
Stateful Protocol Analysis (SPA)
• SPA depends on vendor-developed generic profiles of specific protocols
• “Stateful” indicates that the IDS can know and trace the protocol states (e.g., pairing requests with replies)
• Protocol models are based on standards from international standards organizations
* Also known as Knowledge-based Detection or Misuse Detection
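The idea of tracing protocol state by pairing requests with replies can be illustrated with a minimal sketch. The message format (a kind plus a transaction id) is an assumption made for this example and does not correspond to a specific protocol or vendor profile.

```python
def find_unpaired(messages):
    """Track outstanding requests and report messages that break the pairing.

    `messages` is a sequence of (kind, transaction_id) tuples, a hypothetical
    simplified format. Returns (unsolicited_replies, unanswered_requests).
    """
    pending = set()      # ids of requests still waiting for a reply
    unsolicited = []     # ids of replies that match no earlier request
    for kind, txn_id in messages:
        if kind == "request":
            pending.add(txn_id)
        elif kind == "reply":
            if txn_id in pending:
                pending.remove(txn_id)
            else:
                unsolicited.append(txn_id)
    return unsolicited, sorted(pending)

# Hypothetical trace: reply 7 has no matching request, request 2 is never answered.
trace = [("request", 1), ("reply", 1), ("reply", 7), ("request", 2)]
unsolicited, unanswered = find_unpaired(trace)
```

Keeping the `pending` set per connection is the state that a stateless signature matcher does not have, and it is also the source of the extra resource cost attributed to SPA.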
Pros and cons of Intrusion Detection Methods
Table 1: Pros and cons of intrusion detection methodologies. Source [2]
Signature-based Detection (SD)
Pros:
• Simplest and effective method to detect attacks
• Detailed contextual analysis
Cons:
• Ineffective against unknown attacks and variants of known attacks
• Little understanding of states and protocols
• Hard to keep signatures/patterns up to date
• Time-consuming to maintain the knowledge
Anomaly-based Detection (AD)
Pros:
• Effective at detecting new and unforeseen vulnerabilities
• Less dependent on OS
• Facilitates detection of privilege abuse
Cons:
• Weak profile accuracy due to observed events
• Unavailable during rebuilding of behavior profiles
• Difficult to trigger alerts at the right time
Stateful Protocol Analysis (SPA)
Pros:
• Knows and traces protocol states
• Distinguishes unexpected sequences of commands
Cons:
• Resource-consuming protocol state tracing and examination
• Unable to inspect attacks that look like benign protocol behavior
• Might be incompatible with dedicated OSs or APs
A MACHINE LEARNING BASED IDS
Machine learning in anomaly detection
• Anomaly-based Detection (AD)
  • Easy when it is possible to characterize what is normal in the data using a simple mathematical model, e.g. a normal distribution
  • Most interesting real-world systems have complex behavior that doesn’t follow such a distribution
  • Machine learning is useful to learn the characteristics of the system from observed data
• Feature selection is the process of selecting a subset of relevant features (variables, predictors) for use in model construction. Feature selection techniques are used for three reasons:
  • Simplification of models to make them easier to interpret
  • Shorter training times
  • Enhanced generalization by reducing overfitting
• Outlier detection: an outlier is an observation point that is distant from other observations
Robust Feature Selection and Robust PCA for Internet Traffic Anomaly Detection
• Couples a feature selection algorithm with an outlier detection method
• Uses robust statistics tools in both procedures
  • Reliable results even in the presence of outliers
• Feature selection based on a robust mutual information estimator
  • MI (Mutual Information): an information-theoretic metric that captures both linear and non-linear dependencies
• Outlier detection based on robust PCA (Principal Component Analysis)
  • PCA: a mathematical procedure used to reduce the dimensionality of a problem
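To make the role of mutual information in feature selection concrete, here is a small sketch. Note that it uses the plain plug-in estimator for discrete variables, not the robust estimator of [1], and the toy features and labels are hypothetical.

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits for two discrete sequences.

    This is the ordinary (non-robust) estimator, used here only to
    illustrate what MI measures.
    """
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )

# Hypothetical toy data: labels plus one informative and one irrelevant feature.
labels      = [0, 0, 0, 0, 1, 1, 1, 1]
informative = [0, 0, 0, 0, 1, 1, 1, 1]   # identical to the labels
irrelevant  = [0, 1, 0, 1, 0, 1, 0, 1]   # statistically independent of the labels
mi_informative = mutual_information(informative, labels)
mi_irrelevant = mutual_information(irrelevant, labels)

# A non-linear dependency (y = x^2): linear correlation is zero, MI is not.
x = [-1, 0, 1, -1, 0, 1]
y = [v * v for v in x]
mi_nonlinear = mutual_information(x, y)
```

A filter-style selector would rank features by such scores and keep the top ones; in this toy case the irrelevant feature scores zero and would be dropped.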
• Feature selection
  • Important preprocessing step (filter)
  • Reduces the dimensionality of high-dimensional data
  • Removes irrelevant data
  • Increases learning accuracy
  • Gives significant performance gains
• Robust statistics
  • Reliable results even in the presence of outliers
Example:
• In a normal distribution, the inner 95% lie within “center ± 1.96 × spread”
• Center: instead of the mean, take the median
• Spread: instead of the SD (standard deviation), take the MAD (median absolute deviation)
Source [1]
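The “center ± 1.96 × spread” rule can be compared in its classical and robust forms with a short sketch. The sample values are hypothetical, and the factor 1.4826 (which rescales the MAD so that it estimates the standard deviation under a normal distribution) is an assumption added here; the slide only says to take the MAD.

```python
from statistics import mean, median, stdev

def classical_bounds(values, k=1.96):
    """Interval mean +/- k * standard deviation."""
    m, s = mean(values), stdev(values)
    return m - k * s, m + k * s

def robust_bounds(values, k=1.96):
    """Interval median +/- k * scaled MAD."""
    med = median(values)
    # Assumption: 1.4826 makes the MAD comparable to the SD for normal data.
    mad = 1.4826 * median([abs(v - med) for v in values])
    return med - k * mad, med + k * mad

def outliers(values, bounds):
    """Values that fall outside the (low, high) interval."""
    low, high = bounds
    return [v for v in values if v < low or v > high]

# Hypothetical measurements: eight regular values and three gross outliers.
sample = [10, 11, 9, 10, 12, 10, 11, 9, 500, 480, 520]
classical_flags = outliers(sample, classical_bounds(sample))
robust_flags = outliers(sample, robust_bounds(sample))
```

With this sample the outliers inflate the mean and standard deviation so much that the classical interval contains every point, while the median/MAD interval stays around the regular values and flags all three extremes.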
Dataset creation for training and testing (1/2)
• Dataset collected by mirroring the traffic passing through the switch of:
  • A private laboratory network of 17 interconnected PCs
    • 10 for users producing licit traffic
    • 1 for the server, 1 for measurements
    • 5 for attacks
• Licit traffic
  • File sharing (BitTorrent)
  • Video streaming (IPTV over TCP)
  • Web browsing (HTTP)
• Attacks
  • Botnets
    • Port-scans: identify other targets vulnerable to infection
    • Snapshots: a type of identity theft for stealing personal information
  • Other botnet attacks are not used, e.g. spyware, malware, denial of service, and email spam
    • Happen uniquely at host level
    • Can be detected by e.g. anti-virus, monitoring at routers/firewalls, email scanning
Dataset creation for training and testing (2/2)
• Customer usage profiles
  • (a) Soft browsing (HTTP only)
  • (b) File sharing machine (BitTorrent only)
  • (c) File sharing user (BitTorrent and HTTP)
  • (d) Heavy user (HTTP, BitTorrent, and Streaming)
• Network scenarios
  • (B) Business user
    • 100% (a)
  • (R) Residential user
    • 30% (b), 40% (c), 30% (d)
• Attack intensities
  • (1) 6% (5% snapshot, 1% port-scan)
  • (2) 20% (15% snapshot, 5% port-scan)
  • (3) 35% (30% snapshot, 5% port-scan)
Table 2. Source [1]
Results (1/3)
• 6 types of anomaly detectors A-B
  • A: feature selection method, B: outlier detection method
  • R (robust)
  • NR (non-robust)
  • ∅ (no method)
• Performance measures
  • Nr Ftrs: number of selected features
  • Recall: probability that an observation is classified as an anomaly when in fact it is an anomaly
  • False positive rate (FPR): probability that an observation is classified as an anomaly when in fact it is a regular observation
  • Precision: probability that an observation is an anomaly given that it is classified as an anomaly
Table 3. Source [1]
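The three performance measures defined above follow directly from the confusion-matrix counts. The sketch below uses hypothetical counts, not the values of Table 3, and assumes every denominator is non-zero.

```python
def detection_metrics(tp, fp, tn, fn):
    """Recall, false positive rate and precision from confusion-matrix counts.

    tp: anomalies flagged as anomalies     fn: anomalies missed
    fp: regular observations flagged       tn: regular observations passed
    Assumes tp + fn, fp + tn and tp + fp are all greater than zero.
    """
    return {
        "recall": tp / (tp + fn),
        "fpr": fp / (fp + tn),
        "precision": tp / (tp + fp),
    }

# Hypothetical counts for 100 observations, 10 of which are anomalies.
metrics = detection_metrics(tp=8, fp=2, tn=88, fn=2)
```

Here recall and precision are both 0.8 while the FPR is 2/90; a detector with recall 1 and FPR 0 would have `fn == 0` and `fp == 0`.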
Results (2/3)
• The R-R detector achieved the best results
  • Recall is always 1
  • In scenarios B1, B2, B3, and R3 performance is maximal
  • FPR and Precision are close to their optimal values
• Improvement over the non-robust version is large
  • Low recall means a large percentage of anomalies are not correctly identified
  • B2, B3, R3 recall improved from 0.167, 0.273, and 0.125 to 1
• Feature selection
  • Feature selection reduces Nr Ftrs and improves performance
  • B3 and R3: no feature selection is sometimes better than non-robust feature selection
Table 3. Source [1]
Results (3/3)
• Compare R-NR (top) and R-R (bottom)
• Any point with a score or distance larger than a threshold (the lines) is considered an anomaly
• In the R-NR case there is confusion around snapshots
  • Thus the poor recall value of 0.125
  • Proximity in behavior between snapshots and some HTTP and BitTorrent traffic fools the non-robust outlier detector
  • All consist of small file uploads
Fig. 2. Source [1]
Discussion
• There are advantages to using a feature selection step and to using robust statistics for both feature selection and outlier detection
  • The system achieves very high performance
  • The system’s anomaly detector adapts to different traffic conditions (licit traffic differs significantly between the two scenarios)
• However, the dataset used was obtained from a private lab with 17 PCs, and is not necessarily representative of a real-world scenario
  • Need to demonstrate the effectiveness of the system on a larger-scale network traffic dataset
CHALLENGES OF USING MACHINE
LEARNING IN INTRUSION DETECTION
Outliers, cost of error, semantics, and evaluation
• Outlier detection
  • Hard to define “normal” in network traffic, as usage varies in every session and with new applications (diversity of network traffic)
• High cost of errors
  • Cost of misclassification is extremely high
  • False positive: expensive analyst time
  • False negative: can cause serious damage to an organization
  • Errors in other applications of ML are not as expensive, e.g. product recommendations, OCR, spam detection
• Semantic gap
  • Currently the assessment covers only the capability to identify deviations from the normal profile (which could be good or bad)
  • Need to interpret results from the operator's point of view: what does it mean?
• Difficulties with evaluation
  • Designing sound evaluation schemes can be more difficult than designing the detector itself
  • Lack of public data sets for assessing anomaly detection
  • Hard to obtain real data sets for many reasons, e.g. risk of leaking personal data
  • Simulated data is not accurate
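One way to see why false positives are so costly in terms of analyst time is to compute how many alerts are real when attacks are rare. The following is a worked sketch using Bayes' rule; the prevalence, recall, and FPR values are hypothetical and are not taken from the sources.

```python
def alert_precision(prevalence, recall, fpr):
    """P(anomaly | alert) via Bayes' rule.

    prevalence: fraction of observations that are attacks
    recall:     P(alert | attack)
    fpr:        P(alert | regular observation)
    """
    true_alerts = prevalence * recall
    false_alerts = (1 - prevalence) * fpr
    return true_alerts / (true_alerts + false_alerts)

# Hypothetical setting: 1 attack per 1000 observations, perfect recall, 1% FPR.
rare = alert_precision(prevalence=0.001, recall=1.0, fpr=0.01)

# Same detector when attacks make up 20% of the observations.
frequent = alert_precision(prevalence=0.20, recall=1.0, fpr=0.01)
```

Under these assumed numbers, roughly 9 in 100 alerts correspond to real attacks in the rare-attack setting, against about 96 in 100 in the frequent-attack setting, so the same FPR translates into very different analyst workloads.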
SUMMARY
Summary
• Introduction
  • The need for automated Intrusion Detection Systems
  • Definition of Intrusion and Intrusion Detection
• Intrusion Detection Methodologies
  • Signature-based Detection (SD)
  • Anomaly-based Detection (AD)
  • Stateful Protocol Analysis (SPA)
• Machine Learning Based IDS
  • Using feature selection and robust statistics
  • Dataset creation
  • Results and evaluation
  • Discussion
• Challenges of Using Machine Learning in ID
  • Outlier detection, high cost of error, semantic gap, and difficulties with evaluation
OMAR SHAYA – omar.shaya@stud.uni-goettingen.de
Thanks!
References
[1] C. Pascoal, M. Oliveira, R. Valadas, P. Filzmoser, P. Salvador and A. Pacheco. Robust Feature Selection and Robust PCA for Internet Traffic Anomaly Detection. In Proceedings IEEE INFOCOM, pages 1755-1763, 2012
[2] H. Liao, C. Lin, Y. Lin and K. Tung. Intrusion Detection System: A Comprehensive Review. In Journal of Network and Computer Applications, pages 16-24, 2013
[3] R. Sommer and V. Paxson. Outside the Closed World: On Using Machine Learning For Network Intrusion Detection. In IEEE Symposium on Security and Privacy, pages 305-316, 2010
[4] Feature Selection. https://en.wikipedia.org/wiki/Feature_selection on 6 August 2015
[5] Outlier. https://en.wikipedia.org/wiki/Outlier on 6 August 2015
[6] Anomaly Detection – Using Machine Learning to Detect Abnormalities in Time Series Data. http://blogs.technet.com/b/machinelearning/archive/2014/11/05/anomaly-detection-using-machine-learning-to-detect-abnormalities-in-time-series-data.aspx on 6 August 2015
Precision and Recall
APPENDIX
Source: Dr. Stephan Sigg’s slides from Machine Learning and Pervasive Computing course SoSe 2015

Abortion pills in Riyadh Saudi Arabia| +966572737505 | Get Cytotec, Unwanted KitAbortion pills in Riyadh Saudi Arabia| +966572737505 | Get Cytotec, Unwanted Kit
Abortion pills in Riyadh Saudi Arabia| +966572737505 | Get Cytotec, Unwanted Kit
ย 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
ย 
Abortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotec
Abortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotecAbortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotec
Abortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotec
ย 
Credit Card Fraud Detection: Safeguarding Transactions in the Digital Age
Credit Card Fraud Detection: Safeguarding Transactions in the Digital AgeCredit Card Fraud Detection: Safeguarding Transactions in the Digital Age
Credit Card Fraud Detection: Safeguarding Transactions in the Digital Age
ย 
โ„‚all Girls In Navi Mumbai Hire Me Neha 9910780858 Top Class โ„‚all Girl Serviโ„‚e...
โ„‚all Girls In Navi Mumbai Hire Me Neha 9910780858 Top Class โ„‚all Girl Serviโ„‚e...โ„‚all Girls In Navi Mumbai Hire Me Neha 9910780858 Top Class โ„‚all Girl Serviโ„‚e...
โ„‚all Girls In Navi Mumbai Hire Me Neha 9910780858 Top Class โ„‚all Girl Serviโ„‚e...
ย 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
ย 
Identify Rules that Predict Patientโ€™s Heart Disease - An Application of Decis...
Identify Rules that Predict Patientโ€™s Heart Disease - An Application of Decis...Identify Rules that Predict Patientโ€™s Heart Disease - An Application of Decis...
Identify Rules that Predict Patientโ€™s Heart Disease - An Application of Decis...
ย 
SCI8-Q4-MOD11.pdfwrwujrrjfaajerjrajrrarj
SCI8-Q4-MOD11.pdfwrwujrrjfaajerjrajrrarjSCI8-Q4-MOD11.pdfwrwujrrjfaajerjrajrrarj
SCI8-Q4-MOD11.pdfwrwujrrjfaajerjrajrrarj
ย 
Huawei Ransomware Protection Storage Solution Technical Overview Presentation...
Huawei Ransomware Protection Storage Solution Technical Overview Presentation...Huawei Ransomware Protection Storage Solution Technical Overview Presentation...
Huawei Ransomware Protection Storage Solution Technical Overview Presentation...
ย 
Abortion pills in Jeddah |+966572737505 | get cytotec
Abortion pills in Jeddah |+966572737505 | get cytotecAbortion pills in Jeddah |+966572737505 | get cytotec
Abortion pills in Jeddah |+966572737505 | get cytotec
ย 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
ย 
Harnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptxHarnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptx
ย 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
ย 

Using Machine Learning in Networks Intrusion Detection Systems

  • 1. Using Machine Learning in Networks Intrusion Detection Systems
      OMAR SHAYA
      Georg-August-Universität Göttingen
  • 2. Sections
      ✤ Introduction
      ✤ Intrusion Detection Methodologies
      ✤ A Machine Learning Based IDS (Intrusion Detection System)
      ✤ Challenges of Using Machine Learning in Intrusion Detection
      ✤ Summary
      ✤ References
      ✤ Appendix
  • 4. Introduction: Increasing attacks on computer networks and the need for automated detection
      • The Internet and computer systems have raised numerous security and privacy issues
      • Network use has grown explosively for many reasons, e.g. the Internet, wireless networks, cloud computing
      • As a result, malicious attacks on networks have increased year over year
      • Systems that detect these attacks need to be automated
          • Based on known attacks
          • But what about attacks that have not been seen before?
          • Machine learning?
  • 5. Definition: intrusion & intrusion detection
      • “Intrusion is an attempt to compromise CIA (Confidentiality, Integrity, Availability), or to bypass the security mechanisms of a computer or network”
      • “Intrusion detection is the process of monitoring the events occurring in a computer system or network, and analyzing them for signs of intrusion”
  • 6. INTRUSION DETECTION METHODOLOGIES
      IDS: Intrusion Detection System
  • 7. There are 3 main Detection Methodologies
      • Signature-based Detection (SD)
          • A signature is a string or pattern that corresponds to a known attack or threat
          • SD compares patterns against captured events to recognize possible intrusions
          • Uses the knowledge accumulated about specific attacks and system vulnerabilities
          • Also known as Knowledge-based Detection or Misuse Detection
      • Anomaly-based Detection (AD)
          • An anomaly is a deviation from “normal” behavior
          • Profiles of normal behavior are derived from monitoring network traffic
          • AD compares normal profiles with observed events to recognize attacks
      • Stateful Protocol Analysis (SPA)
          • SPA depends on vendor-developed generic profiles of specific protocols
          • Protocols are based on standards from international standards organizations
      • Hybrid IDS use multiple methodologies
          • SD and AD are complementary methods: the former addresses known attacks and the latter focuses on unknown attacks
  • 8. There are 3 main Detection Methodologies
      Signature-based Detection (SD)*
          • SD compares patterns against captured events to recognize possible intrusions
          • A signature is a string or pattern that corresponds to a known attack or threat
          • Uses the knowledge accumulated about specific attacks and system vulnerabilities
      Anomaly-based Detection (AD)
          • AD compares normal profiles with observed events to recognize attacks
          • An anomaly is a deviation from “normal” behavior
          • Profiles of normal behavior are derived from monitoring network traffic
      Stateful Protocol Analysis (SPA)
          • SPA depends on vendor-developed generic profiles of specific protocols
          • “Stateful” indicates that the IDS can know and trace the protocol states (e.g., pairing requests with replies)
          • Protocols are based on standards from international standards organizations
      Hybrid IDS use multiple methodologies
          • E.g. SD and AD are complementary methods
          • SD addresses known attacks and AD focuses on unknown attacks
      * Also known as Knowledge-based Detection or Misuse Detection
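To make the complementarity of SD and AD concrete, here is a minimal Python sketch of a hybrid check. Everything in it is an assumption for illustration: the byte patterns in `SIGNATURES`, the per-flow byte counts, and the 3-standard-deviation rule are hypothetical stand-ins, not taken from the slides or from any real IDS.

```python
import statistics

# Hypothetical byte patterns standing in for a real signature database.
SIGNATURES = [b"/etc/passwd", b"' OR '1'='1"]


def signature_alert(payload: bytes) -> bool:
    """SD: alert when the payload contains any known attack pattern."""
    return any(sig in payload for sig in SIGNATURES)


def anomaly_alert(value: float, baseline: list, k: float = 3.0) -> bool:
    """AD: alert when a measurement lies more than k standard deviations
    away from the mean of the 'normal' baseline profile."""
    mean = statistics.fmean(baseline)
    spread = statistics.stdev(baseline)
    return abs(value - mean) > k * spread


def hybrid_alert(payload: bytes, value: float, baseline: list) -> bool:
    """Hybrid IDS: raise an alert if either methodology fires."""
    return signature_alert(payload) or anomaly_alert(value, baseline)


# Hypothetical profile of normal traffic (bytes per flow).
baseline_bytes_per_flow = [100.0, 110.0, 95.0, 105.0, 90.0]

# Known attack pattern, normal volume: caught by SD.
known = hybrid_alert(b"GET /../../etc/passwd", 104.0, baseline_bytes_per_flow)
# No known pattern, unusual volume: caught by AD.
unknown = hybrid_alert(b"GET /index.html", 500.0, baseline_bytes_per_flow)
# No known pattern, normal volume: no alert.
benign = hybrid_alert(b"GET /index.html", 104.0, baseline_bytes_per_flow)
```

The second call is the case the slides care about: nothing in the signature list matches, so only the anomaly check can raise the alert.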
  • 9. Pros and cons of Intrusion Detection Methods
      Table 1: Pros and cons of intrusion detection methodologies. Source [2]
      Signature-based Detection (SD)
          Pros
              • Simplest and effective method to detect attacks
              • Detailed contextual analysis
          Cons
              • Ineffective with unknown attacks and variants of known attacks
              • Little understanding of states and protocols
              • Hard to keep signatures/patterns up to date
              • Time consuming to maintain the knowledge
      Anomaly-based Detection (AD)
          Pros
              • Effective at detecting new and unforeseen vulnerabilities
              • Less dependent on OS
              • Facilitates detection of privilege abuse
          Cons
              • Weak profile accuracy due to observed events
              • Unavailable during rebuilding of behavior profiles
              • Difficult to trigger alerts at the right time
      Stateful Protocol Analysis (SPA)
          Pros
              • Knows and traces protocol states
              • Distinguishes unexpected sequences of commands
          Cons
              • Protocol state tracing and examination consume resources
              • Unable to inspect attacks that look like benign protocol behavior
              • Might be incompatible with dedicated OSs or APs
  • 10. A MACHINE LEARNING BASED IDS
      IDS: Intrusion Detection System
  • 11. Machine learning in anomaly detection
      • Anomaly-based Detection (AD)
          • Easy when what is normal in the data can be characterized by a simple mathematical model, e.g. a normal distribution
          • Most interesting real-world systems have complex behavior that does not follow such a distribution
          • Machine learning is useful for learning the characteristics of the system from observed data
      • Feature selection is the process of selecting a subset of relevant features (variables, predictors) for use in model construction. Feature selection techniques are used for three reasons:
          • Simplification of models to make them easier to interpret
          • Shorter training times
          • Enhanced generalization by reducing overfitting
      • Outlier detection: an outlier is an observation point that is distant from other observations
  • 12. Robust Feature Selection and Robust PCA for Internet Traffic Anomaly Detection
      • Couples a feature selection algorithm with an outlier detection method
      • Uses robust statistics tools in both procedures
          • Reliable results even in the presence of outliers
      • Feature selection based on a robust mutual information estimator
          • MI (Mutual Information): an information-theoretic metric that captures both linear and non-linear dependencies
      • Outlier detection based on robust PCA (Principal Component Analysis)
          • PCA is a mathematical procedure used to reduce the dimensionality of a problem
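The claim that MI captures non-linear dependencies can be checked with a small Python sketch. It uses a plain plug-in (frequency count) estimator for discrete values, which is an assumption made for illustration and not the robust estimator of [1]; the feature names and data are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys) -> float:
    """Plug-in estimate of I(X;Y) in bits for two discrete sequences."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x,y) * log2( p(x,y) / (p(x) p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi


def pearson(xs, ys) -> float:
    """Pearson correlation, which only measures linear dependence."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical data: the label depends on the feature only through its square.
feature = [-1, 0, 1] * 10
label = [v * v for v in feature]
# A second hypothetical feature that is independent of the label.
irrelevant = [0] * 15 + [1] * 15

linear_score = pearson(feature, label)                   # no linear dependence
mi_relevant = mutual_information(feature, label)         # about 0.918 bits
mi_irrelevant = mutual_information(irrelevant, label)    # no shared information

# Rank candidate features by MI with the label, highest first.
candidates = {"feature": feature, "irrelevant": irrelevant}
ranking = sorted(candidates,
                 key=lambda name: mutual_information(candidates[name], label),
                 reverse=True)
```

A correlation-based filter would discard `feature` here, while an MI-based filter ranks it first.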
  • 13. Robust Feature Selection and Robust PCA for Internet Traffic Anomaly Detection
      • Feature selection
          • Important preprocessing step (filter)
          • Reduces dimensionality of high-dimensional data
          • Removes irrelevant data
          • Increases learning accuracy
          • Gives significant performance gains
  • 14. Robust Feature Selection and Robust PCA for Internet Traffic Anomaly Detection
      • Robust statistics
          • Reliable results even in the presence of outliers
      • Example:
          • In a normal distribution, the inner 95% of values lie in “center ± 1.96 × spread”
          • Center: instead of the mean, take the median
          • Spread: instead of the SD (standard deviation), take the MAD (median absolute deviation)
      Source [1]
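The example above can be sketched in a few lines of Python. The upload sizes are hypothetical, and the factor 1.4826 is an added assumption: it is the usual constant that makes the MAD comparable to the SD for normally distributed data, and it does not appear on the slide.

```python
import statistics


def classical_bounds(values, z=1.96):
    """Non-robust interval: mean +/- z * standard deviation."""
    center = statistics.fmean(values)
    spread = statistics.stdev(values)
    return center - z * spread, center + z * spread


def robust_bounds(values, z=1.96):
    """Robust interval: median +/- z * scaled MAD."""
    center = statistics.median(values)
    mad = statistics.median([abs(v - center) for v in values])
    spread = 1.4826 * mad  # assumed consistency constant for normal data
    return center - z * spread, center + z * spread


def outside(values, bounds):
    """Return the observations that fall outside the given interval."""
    low, high = bounds
    return [v for v in values if not low <= v <= high]


# Hypothetical upload sizes: ten regular observations and three anomalies.
uploads = [10, 12, 11, 9, 10, 11, 10, 9, 12, 10, 60, 62, 65]

flagged_classical = outside(uploads, classical_bounds(uploads))
flagged_robust = outside(uploads, robust_bounds(uploads))
```

With this data the three large values inflate the mean and SD so much that the classical interval covers them and flags nothing, while the median and MAD are barely affected and the robust interval flags 60, 62 and 65.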
  • 15. Dataset creation for training and testing (1/2)
      • Dataset collected by mirroring the traffic passing through the switch of a private laboratory network with 17 interconnected PCs:
          • 10 for users producing licit traffic
          • 1 for the server, 1 for measurements
          • 5 for attacks
      • Licit traffic
          • File sharing (BitTorrent)
          • Video streaming (IPTV over TCP)
          • Web browsing (HTTP)
      • Attacks
          • Botnets
              • Port-scans: identify other targets vulnerable to infection
              • Snapshots: a type of identity theft for stealing personal information
          • Other botnet attacks are not used, e.g. spyware, malware, denial of service, and email spam
              • They happen only at host level
              • They can be detected by e.g. anti-virus software, monitoring at routers/firewalls, email scanning
  • 16. Dataset creation for training and testing (2/2)
      • Customer usage profiles
          • (a) Soft browsing (HTTP only)
          • (b) File sharing machine (BitTorrent only)
          • (c) File sharing user (BitTorrent and HTTP)
          • (d) Heavy user (HTTP, BitTorrent, and Streaming)
      • Network scenarios
          • (B) Business user: 100% (a)
          • (R) Residential user: 30% (b), 40% (c), 30% (d)
      • Attack intensities
          • (1) 6% (5% snapshot, 1% port-scan)
          • (2) 20% (15% snapshot, 5% port-scan)
          • (3) 35% (30% snapshot, 5% port-scan)
      Table 2. Source [1]
  • 17. Results (1/3)
      • 6 types of anomaly detectors, named A-B
          • A: feature selection method, B: outlier detection method
          • R (robust)
          • NR (non-robust)
          • ∅ (no method)
      • Performance measures
          • Nr Ftrs: number of selected features
          • Recall: probability that an observation is classified as an anomaly when it is in fact an anomaly
          • False positive rate (FPR): probability that an observation is classified as an anomaly when it is in fact a regular observation
          • Precision: probability that an observation is anomalous given that it is classified as an anomaly
      Table 3. Source [1]
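The three performance measures follow directly from the counts of true/false positives and negatives. The Python sketch below computes them; the label vectors are hypothetical and are not the results of [1], and the function assumes both classes occur and at least one alert is raised (otherwise a division by zero would occur).

```python
def detection_metrics(y_true, y_pred):
    """Recall, FPR and precision for binary labels (1 = anomaly, 0 = regular)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # anomalies flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # anomalies missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # regular, not flagged
    return {
        "recall": tp / (tp + fn),     # P(flagged | anomaly)
        "fpr": fp / (fp + tn),        # P(flagged | regular)
        "precision": tp / (tp + fp),  # P(anomaly | flagged)
    }


# Hypothetical ground truth: 10 anomalies followed by 90 regular observations.
y_true = [1] * 10 + [0] * 90
# Hypothetical detector output: 8 anomalies found, 2 missed, 5 false alarms.
y_pred = [1] * 8 + [0] * 2 + [1] * 5 + [0] * 85

metrics = detection_metrics(y_true, y_pred)
```

Here recall is 8/10, FPR is 5/90 and precision is 8/13, which shows how a low FPR can still coexist with a modest precision when regular observations dominate.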
  • 18. Results (2/3)
      • The R-R detector achieved the best results
          • Recall is always 1
          • Performance is maximal for B1, B2, B3, and R3
          • FPR and precision are close to their optimal values
      • Improvement over the non-robust version is large
          • Low recall means that a large percentage of anomalies are not correctly identified
          • Recall for B2, B3, and R3 improved from 0.167, 0.273, and 0.125 to 1
      • Feature selection
          • Feature selection reduces Nr Ftrs and improves performance
          • B3 and R3: no feature selection is sometimes better than non-robust feature selection
      Table 3. Source [1]
  • 19. Results (3/3)
      • Comparison of R-NR (top) and R-R (bottom)
      • Any point with a score or distance larger than a threshold (the lines) is considered an anomaly
      • In the R-NR case there is confusion around snapshots
          • Hence the poor recall value of 0.125
          • The proximity in behavior between snapshots and some HTTP and BitTorrent traffic fools the non-robust outlier detector
          • All consist of small file uploads
      Fig. 2. Source [1]
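The rule "distance larger than a threshold means anomaly" can be sketched with a two-feature PCA in Python. This is classical (non-robust) PCA fitted on hypothetical attack-free training points, chosen to keep the example short; it is not the robust PCA of [1], and the threshold value is an assumption.

```python
import math


def fit_principal_axis(points):
    """Return the center and the angle of the first principal component
    of 2-D points (closed form for a 2x2 covariance matrix)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return (mx, my), theta


def orthogonal_distance(point, center, theta):
    """Distance from a point to the principal axis, i.e. the part of the
    observation that the first principal component does not explain."""
    dx = point[0] - center[0]
    dy = point[1] - center[1]
    return abs(-math.sin(theta) * dx + math.cos(theta) * dy)


# Hypothetical normal traffic: the second feature is twice the first.
train = [(float(i), 2.0 * i) for i in range(20)]
center, theta = fit_principal_axis(train)

THRESHOLD = 1.0  # hypothetical cut-off, plays the role of the line in Fig. 2

regular_distance = orthogonal_distance((25.0, 50.0), center, theta)
anomaly_distance = orthogonal_distance((10.0, 30.0), center, theta)

regular_is_anomaly = regular_distance > THRESHOLD
anomaly_is_anomaly = anomaly_distance > THRESHOLD
```

The point (25, 50) follows the learned relationship and stays below the threshold, while (10, 30) breaks it and is flagged. If anomalies contaminate the training data, the fitted axis itself shifts, which is the motivation for a robust variant.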
  • 20. Discussion
      • A feature selection step, and robust statistics for both feature selection and outlier detection, are advantageous
      • The system achieves very high performance
      • The system’s anomaly detector adapts to different traffic conditions (licit traffic differs significantly between the two scenarios)
      • However, the dataset was obtained from a private lab with 17 PCs and is not necessarily representative of a real-world scenario
      • The effectiveness of the system still needs to be shown on a larger-scale network traffic dataset
  • 21. CHALLENGES OF USING MACHINE LEARNING IN INTRUSION DETECTION
  • 22. Outliers, cost of error, semantics, and evaluation
      • Outlier detection
          • Hard to define normal in network traffic, as usage varies in every session and with new applications (diversity of network traffic)
      • High cost of errors
          • Cost of misclassification is extremely high
          • False positive: expensive analyst time
          • False negative: can cause serious damage to an organization
          • Errors in other applications of ML are not as expensive, e.g. product recommendations, OCR, spam detection
      • Semantic gap
          • Currently, only the capability to identify deviations from the normal profile is assessed (a deviation could be good or bad)
          • Results need to be interpreted from the operator’s point of view: what do they mean?
      • Difficulties with evaluation
          • Designing sound evaluation schemes can be more difficult than designing the detector itself
          • Lack of public data sets for assessing anomaly detection
          • Hard to obtain real data sets for many reasons, e.g. the risk of leaking personal data
          • Simulated data is not accurate
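A rough Python calculation shows why false positives translate into analyst time. All numbers below are hypothetical and chosen only to illustrate the arithmetic; they are not measurements from the slides or the cited papers.

```python
def daily_alert_load(benign_events, attack_events, fpr, recall):
    """Expected false alarms, true alerts, and resulting precision per day."""
    false_alarms = benign_events * fpr
    true_alerts = attack_events * recall
    precision = true_alerts / (true_alerts + false_alarms)
    return false_alarms, true_alerts, precision


# Hypothetical day: one million benign events, 100 attack events,
# a detector with a 1% false positive rate and perfect recall.
false_alarms, true_alerts, precision = daily_alert_load(
    benign_events=1_000_000, attack_events=100, fpr=0.01, recall=1.0
)
```

Under these assumptions an analyst faces about 10,000 false alarms for 100 real attacks, so fewer than 1 in 100 alerts is genuine even though the detector misses nothing.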
  • 24. Summary
      • Introduction
          • The need for automated Intrusion Detection Systems
          • Definition of Intrusion and Intrusion Detection
      • Intrusion Detection Methodologies
          • Signature-based Detection (SD)
          • Anomaly-based Detection (AD)
          • Stateful Protocol Analysis (SPA)
      • Machine Learning Based IDS
          • Using feature selection and robust statistics
          • Dataset creation
          • Results and evaluation
          • Discussion
      • Challenges of Using Machine Learning in ID
          • Outlier detection, high cost of error, semantic gap, and difficulties with evaluation
  • 25. Thanks!
      OMAR SHAYA, omar.shaya@stud.uni-goettingen.de
  • 26. References
      [1] C. Pascoal, M. Oliveira, R. Valadas, P. Filzmoser, P. Salvador and A. Pacheco. Robust Feature Selection and Robust PCA for Internet Traffic Anomaly Detection. In Proceedings IEEE INFOCOM, pages 1755-1763, 2012
      [2] H. Liao, C. Lin, Y. Lin and K. Tung. Intrusion Detection System: A Comprehensive Review. In Journal of Network and Computer Applications, pages 16-24, 2013
      [3] R. Sommer and V. Paxson. Outside the Closed World: On Using Machine Learning For Network Intrusion Detection. In IEEE Symposium on Security and Privacy, pages 305-316, 2010
      [4] Feature Selection. https://en.wikipedia.org/wiki/Feature_selection on 6 August 2015
      [5] Outlier. https://en.wikipedia.org/wiki/Outlier on 6 August 2015
      [6] Anomaly Detection - Using Machine Learning to Detect Abnormalities in Time Series Data. http://blogs.technet.com/b/machinelearning/archive/2014/11/05/anomaly-detection-using-machine-learning-to-detect-abnormalities-in-time-series-data.aspx on 6 August 2015
  • 27. Appendix: Precision and Recall
      Source: Dr. Stephan Sigg’s slides from the Machine Learning and Pervasive Computing course, SoSe 2015