3. Data Mining
KDD (Knowledge Discovery in Databases):
• The process of identifying valid, novel, useful, and
understandable patterns in data.
• Steps: understanding the application domain, data
preparation, data mining, interpretation, and
utilizing the discovered knowledge.
• Data Mining (DM): applying specific algorithms to
extract patterns from data.
• DM is the core of KDD.
5. Data Mining (cont.)
Data mining techniques & Algorithms:
Classification: map a data item to one of several
predefined classes, e.g., the decision tree algorithm.
Clustering: group similar data items into
clusters, e.g., the k-means algorithm.
Frequent pattern mining: find patterns or
regularities whose items frequently occur together.
Sequential pattern analysis: time-based; the order in
which patterns occur is important.
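A minimal sketch of the clustering idea, assuming 2-D points and hand-picked starting centroids. The names (kmeans, assign) and the sample data are hypothetical, chosen only for illustration:

```python
import math

def assign(points, centroids):
    """Assignment step: attach each point to its nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)),
                      key=lambda i: math.dist(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters

def kmeans(points, centroids, max_iter=100):
    """Plain k-means: alternate assignment and centroid update until stable."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        clusters = assign(points, centroids)
        # Update step: move each centroid to the mean of its members;
        # an empty cluster keeps its previous centroid.
        updated = [
            tuple(sum(v) / len(members) for v in zip(*members)) if members else c
            for c, members in zip(centroids, clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)

# Hypothetical data: two visibly separate groups of points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
```

With this data the first three points end up in one cluster and the last three in the other. Real k-means implementations usually pick the starting centroids randomly and run several restarts.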
7. Intrusion Detection
Computer security goals: confidentiality,
integrity, and availability.
Intrusion: a set of actions aimed at
compromising these goals.
Intrusion prevention (authentication, encryption,
etc.) alone is not sufficient.
Intrusion detection (ID) is needed.
ID: the process of identifying intrusions in a
system.
IDS: a combination of hardware and software that
detects intrusions and raises alarms.
8. Intrusion Detection (cont.)
Primary assumption: users and system
activities and resources can be monitored and
analyzed.
Two types of ID techniques:
A. Misuse detection (pattern-based): uses patterns of
well-known attacks (signatures) to identify
intrusions; example: email.
B. Anomaly detection (profile-based): uses deviations
from normal usage patterns to identify intrusions;
example: user behavior.
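The two techniques can be contrasted in a short sketch. Everything here is an assumption made for illustration: the signature strings, the login counts, and the 3-standard-deviation cutoff are hypothetical and not taken from any real IDS.

```python
import statistics

# Hypothetical signatures of known-bad email content (illustrative only).
SIGNATURES = ["invoice.exe", "click here to verify your password"]

def misuse_detect(message, signatures=SIGNATURES):
    """Pattern-based: return the known signatures found in the message."""
    text = message.lower()
    return [s for s in signatures if s in text]

def anomaly_detect(history, observed, threshold=3.0):
    """Profile-based: flag a value that deviates from the user's historical
    mean by more than `threshold` sample standard deviations."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return observed != mean
    return abs(observed - mean) / stdev > threshold

# Misuse detection on an email body.
hits = misuse_detect("Please open INVOICE.exe now")
clean = misuse_detect("Lunch at noon?")

# Anomaly detection on a hypothetical per-user profile (logins per day).
logins_per_day = [4, 5, 6, 5, 4, 6, 5]
unusual = anomaly_detect(logins_per_day, 40)
typical = anomaly_detect(logins_per_day, 6)
```

Here the first email matches a stored signature and the second does not; 40 logins in a day is flagged against the profile, while 6 is not.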
9. Intrusion Detection
Misuse Detection
Main Problems:
• Unknown intrusions (those with no matching pattern
in the system) cannot be detected.
• Manual coding of known intrusion patterns.
10. Intrusion Detection (cont.)
Anomaly detection:
Main problems:
Selecting the right set of system features to be
measured is based on experience.
Unable to capture sequential interrelation between
events.
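A small illustration of the sequencing problem, assuming a profile built only from event counts. The event names and both profile functions are hypothetical:

```python
from collections import Counter

def count_profile(events):
    """Order-free profile: how often each event occurs."""
    return Counter(events)

def bigram_profile(events):
    """Order-aware profile: how often each consecutive pair of events occurs."""
    return Counter(zip(events, events[1:]))

# Two hypothetical sessions with the same events in a different order.
normal = ["login", "read_file", "write_file", "logout"]
suspicious = ["read_file", "write_file", "login", "logout"]

same_counts = count_profile(normal) == count_profile(suspicious)
same_order = bigram_profile(normal) == bigram_profile(suspicious)
```

The count-based profiles are identical, so a detector that measures only frequencies cannot tell the sessions apart; the pair-based profiles differ because file access precedes login in the second session.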
11. Intrusion Detection
Example applications:
1. SNORT (www.snort.org) for misuse detection:
• It is an open-source, signature-based IDS.
• It stores a signature for each known intrusion.
2. ComputerWatch (AT&T) for anomaly detection:
• It is an expert system that summarizes security-
sensitive events and applies rules to detect
anomalous behavior.
13. Data Mining for ID
Why is DM applicable to intrusion detection?
• Intrusion detection is a data analysis process.
• Normal and intrusive activities leave evidence in
audit data.
• Learn from traffic data:
  • Supervised learning: learns precise models from past
  intrusions.
  • Unsupervised learning: identifies suspicious activities.
15. Data Mining for ID
Misuse detection:
• Predictive models are built from labeled data sets
(instances are labeled as “normal” or “intrusive”).
• These models can be more sophisticated and precise
than manually created signatures.
• Classification techniques from DM are used.
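As a rough sketch of learning a classifier from labeled data, the code below trains a one-level decision tree (a "decision stump"). This is a simplified stand-in, not the method of any particular system; the feature layout (failed logins, kilobytes sent) and the records are hypothetical.

```python
def train_stump(rows, labels):
    """Learn a one-level decision tree: choose the (feature, threshold)
    split that makes the fewest errors on the labeled training data."""
    best = None
    for f in range(len(rows[0])):
        for threshold in sorted({r[f] for r in rows}):
            for above in ("intrusive", "normal"):
                below = "normal" if above == "intrusive" else "intrusive"
                errors = sum(
                    (above if r[f] > threshold else below) != y
                    for r, y in zip(rows, labels)
                )
                if best is None or errors < best[0]:
                    best = (errors, f, threshold, above, below)
    _, f, threshold, above, below = best
    # The returned function applies the learned rule to a new record.
    return lambda row: above if row[f] > threshold else below

# Hypothetical labeled audit records: (failed_logins, kilobytes_sent).
rows = [(0, 120), (1, 300), (0, 80), (9, 150), (12, 90), (7, 200)]
labels = ["normal", "normal", "normal",
          "intrusive", "intrusive", "intrusive"]

classify = train_stump(rows, labels)
prediction_a = classify((10, 100))
prediction_b = classify((0, 250))
```

On this data the learned rule splits on the number of failed logins, so the first new record is labeled intrusive and the second normal. A full decision tree applies the same split search recursively to each branch.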
Anomaly Detection:
Identifies anomalies as deviations from “normal”
behavior.
Examples: ADAM (Audit Data Analysis and Mining);
MINDS (MINnesota INtrusion Detection System).