Data Mining for Intrusion Detection
DM for IDS
Outline
 Data Mining
 Intrusion Detection
 Data Mining for Intrusion Detection
Data Mining
 KDD (Knowledge Discovery in Databases):
• The process of identifying valid, novel, useful, and
understandable patterns in data.
• Steps: understanding the application domain, data
preparation, data mining, interpretation, and
utilizing the discovered knowledge.
• Data Mining (DM): applying specific algorithms to
extract patterns from data.
• DM is the core of KDD.
Data Mining (cont.)
 KDD vs. DM:
Data Mining (cont.)
 Data mining techniques and algorithms:
 Classification: maps a data item to one of several
predefined classes (e.g., decision tree algorithms).
 Clustering: groups similar data items into
clusters (e.g., the k-means algorithm).
 Frequent pattern mining: finds patterns or
regularities that frequently occur together.
 Sequential pattern analysis: time-based; the order of
patterns is important.
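As a rough illustration of the clustering technique named above, here is a minimal k-means sketch in plain Python. The sample points (imagined as connection duration and kilobytes transferred) and the choice to seed the centroids with the first k points are assumptions made for this example only.

```python
def kmeans(points, k, iterations=10):
    """Plain k-means on 2-D points; the first k points seed the centroids."""
    centroids = [points[i] for i in range(k)]
    assignment = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [
            min(
                range(k),
                key=lambda c: (p[0] - centroids[c][0]) ** 2
                + (p[1] - centroids[c][1]) ** 2,
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return centroids, assignment


# Hypothetical records: (duration in seconds, kilobytes transferred).
points = [(1.0, 2.0), (9.0, 8.0), (1.5, 1.8), (8.5, 9.0), (1.2, 2.2), (9.2, 8.4)]
centroids, assignment = kmeans(points, k=2)
print(assignment)
```

With this data the short, small transfers and the long, large transfers end up in separate clusters. Real deployments would normally pick initial centroids more carefully and scale the features first.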
Outline
 Data Mining
 Intrusion Detection
 Data Mining for Intrusion Detection
Intrusion Detection
 Computer security goals: confidentiality,
integrity, and availability.
 Intrusion: a set of actions aimed at
compromising these goals.
 Intrusion prevention (authentication, encryption,
etc.) alone is not sufficient.
 Intrusion detection (ID) is therefore needed.
 ID: the process of identifying intrusions in a
system.
 IDS: a combination of hardware and software that
detects intrusions and raises alarms.
Intrusion Detection (cont.)
 Primary assumption: users and system
activities and resources can be monitored and
analyzed.
 Two types of ID techniques:
A. Misuse detection: uses patterns of well-known
attacks (signatures) to identify intrusions;
pattern-based. Example: email.
B. Anomaly detection: uses deviations from normal
usage patterns to identify intrusions;
profile-based. Example: user behavior.
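The two techniques can be contrasted with a small sketch. The signature strings, the logins-per-day history, and the three-standard-deviation threshold below are all hypothetical values chosen for illustration; they are not taken from any real IDS.

```python
import statistics

# Hypothetical signatures: substrings that mark a known malicious email.
SIGNATURES = ["invoice.exe", "verify your password"]


def misuse_alert(message):
    """Pattern-based: alarm if any known attack signature appears."""
    text = message.lower()
    return any(sig in text for sig in SIGNATURES)


def anomaly_alert(history, observed, threshold=3.0):
    """Profile-based: alarm if the observation is far from the user's norm."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return observed != mean
    return abs(observed - mean) / stdev > threshold


# Misuse detection on emails.
print(misuse_alert("Please open Invoice.exe now"))  # matches a signature
print(misuse_alert("Please open report.scr now"))   # unknown attack, no match

# Anomaly detection on a user's logins per day (assumed history).
logins_per_day = [4, 5, 6, 5, 4, 6, 5]
print(anomaly_alert(logins_per_day, 40))  # far above the usual level
print(anomaly_alert(logins_per_day, 6))   # within the usual range
```

The second email shows the trade-off: a signature matcher stays silent on anything it has no pattern for, while the profile check can flag unusual behavior without knowing the attack in advance.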
Intrusion Detection
 Misuse Detection
 Main Problems:
• Unknown intrusions (those with no matching
pattern in the system) cannot be detected.
• Known intrusion patterns must be coded manually.
Intrusion Detection (cont.)
 Anomaly detection:
 Main problems:
• Selecting the right set of system features to
measure is based on experience.
• Unable to capture sequential interrelations between
events.
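The second problem can be made concrete with a sketch, assuming a profile built only from event counts (one simple kind of statistical profile; real systems differ). The event names are hypothetical. Two sessions with the same events in a different order look identical to such a profile, while comparing adjacent event pairs, in the spirit of sequential pattern analysis, tells them apart.

```python
from collections import Counter


def frequency_profile(events):
    """Summarize a session by how often each event type occurs."""
    return Counter(events)


def bigrams(events):
    """Adjacent event pairs, which preserve ordering information."""
    return list(zip(events, events[1:]))


# Hypothetical sessions: same events, different order.
normal = ["login", "read_file", "read_file", "logout"]
suspicious = ["read_file", "read_file", "login", "logout"]  # file access before login

same_profile = frequency_profile(normal) == frequency_profile(suspicious)
same_bigrams = bigrams(normal) == bigrams(suspicious)
print(same_profile, same_bigrams)
```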
Intrusion Detection
 Example applications:
1. SNORT (www.snort.org) for misuse detection:
• It is an open-source, signature-based IDS.
• It stores a signature for each known intrusion.
2. Computer Watch (AT&T) for anomaly detection:
• It is an expert system that summarizes
security-sensitive events and applies rules to detect
anomalous behavior.
Outline
 Data Mining
 Intrusion Detection
 Data Mining for Intrusion Detection
Data Mining for ID
 Why is DM applicable to intrusion detection?
• Intrusion detection is a data analysis process.
• Normal and intrusive activities leave evidence in
audit data.
• Learn from traffic data:
• Supervised learning: learn precise models from past
intrusions.
• Unsupervised learning: identify suspicious activities.
Data Mining for ID
 Data Mining based IDS – basic steps:
Data Mining for ID
 Misuse detection:
• Predictive models are built from labeled data sets
(instances are labeled as “normal” or “intrusive”).
• These models can be more sophisticated and precise
than manually created signatures.
• Classification techniques from DM are used.
 Anomaly detection:
 Identifies anomalies as deviations from “normal”
behavior.
 Examples: ADAM (Audit Data Analysis and Mining); MINDS
(MINnesota INtrusion Detection System).
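To make the misuse-detection bullets concrete, here is a minimal sketch of learning a predictive model from labeled records. A one-level decision tree (a decision stump) stands in for the classification techniques mentioned above; it is not the method used by ADAM or MINDS. The two features (failed logins, connections per minute) and all sample values are hypothetical.

```python
def train_stump(records):
    """Learn a one-level decision tree: the best (feature, threshold) split.

    records: list of (features_tuple, label), label "normal" or "intrusive".
    Simplification: only splits of the form "value > threshold means
    intrusive" are considered.
    """
    best = None
    n_features = len(records[0][0])
    for f in range(n_features):
        for threshold in sorted({x[f] for x, _ in records}):
            # Count training errors for this candidate split.
            errors = sum(
                ("intrusive" if x[f] > threshold else "normal") != label
                for x, label in records
            )
            if best is None or errors < best[0]:
                best = (errors, f, threshold)
    _, feature, threshold = best
    return feature, threshold


def classify(model, x):
    """Apply the learned split to a new record."""
    feature, threshold = model
    return "intrusive" if x[feature] > threshold else "normal"


# Hypothetical labeled audit records: (failed_logins, connections_per_min).
training = [
    ((0, 12), "normal"),
    ((1, 15), "normal"),
    ((0, 9), "normal"),
    ((7, 14), "intrusive"),
    ((9, 11), "intrusive"),
    ((6, 13), "intrusive"),
]
model = train_stump(training)
print(model)
print(classify(model, (8, 10)), classify(model, (0, 20)))
```

On this toy data the learned rule splits on the failed-login count, which is the kind of rule an analyst might otherwise have to hand-code as a signature.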
MINDS Project
Thanks!
Any questions?
