data mining for security application

  • Data Mining for Security Applications
    • Overview of Data Mining
    • Security Threats
    • Data Mining for Cyber security applications
      • Intrusion Detection
      • Data Mining for Firewall Policy Management
      • Data Mining for Worm Detection
      • Data Mining for Counter-terrorism
      • Surveillance
      • Advantages
      • Conclusion
    • Data Mining - Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) information or patterns from data in large databases [Han and Kamber 2005].
    • Data mining is used to sort through the tremendous amounts of data stored by automated data collection tools.
    • Extracts rules, regularities, patterns, and constraints from databases.
  • Threat Types (diagram): Natural Disasters; Human Errors; Non-Information-Related Threats; Information-Related Threats; Biological, Chemical, Nuclear Threats; Critical Infrastructure Threats
    • Data mining is being applied to problems such as intrusion detection and auditing. For example,
    • Anomaly detection techniques could be used to detect unusual patterns and behaviors.
    • Link analysis may be used to trace self-propagating malicious code to its authors.
    • Classification may be used to group various cyber attacks into profiles, which are then used to detect an attack when it occurs.
    • Prediction may be used to determine potential future attacks, based in part on information learnt about terrorists through email and phone conversations.
    • An intrusion can be defined as “any set of actions that attempt to compromise the integrity, confidentiality, or availability of a resource”.
    • Attacks are:
      • Host-based attacks
      • Network-based attacks
    • Intrusion detection systems are split into two groups:
      • Anomaly detection systems
      • Misuse detection systems
    • Data mining can help automate the process of investigating intrusion detection alarms.
    • Data mining on historical audit data and intrusion detection alarms can reduce future false alarms.
    • Anomaly detection
    • Build models of normal data
    • Detect any deviation from normal data
    • Flag deviation as suspect
    • Identify new types of intrusions as deviation from normal behavior
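The three steps above (model normal data, detect deviation, flag it) can be illustrated with a minimal sketch. It assumes a single numeric feature and a z-score test; the feature name, the sample values, and the threshold of 3 standard deviations are all hypothetical choices, not something the slides specify.

```python
from statistics import mean, stdev

def build_normal_model(samples):
    """Summarize normal behavior as the mean and standard deviation of one feature."""
    return mean(samples), stdev(samples)

def is_suspect(value, model, threshold=3.0):
    """Flag a value lying more than `threshold` standard deviations from the normal mean."""
    mu, sigma = model
    if sigma == 0:
        # Degenerate model: any value different from the constant norm is a deviation.
        return value != mu
    return abs(value - mu) / sigma > threshold

# Hypothetical feature: login attempts per hour seen during normal operation.
normal_logins_per_hour = [4, 5, 6, 5, 4, 6, 5, 5]
model = build_normal_model(normal_logins_per_hour)

# New observations are compared against the model of normal data.
flags = [is_suspect(v, model) for v in [5, 7, 40]]
```

Because only normal data is modeled, a previously unseen kind of intrusion can still be flagged, at the cost of false alarms whenever legitimate behavior shifts.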
    • Misuse detection
    • Label all instances in the data set (“normal” or “intrusion” )
    • Run learning algorithms over the labeled data to generate classification rules
    • Automatically retrain intrusion detection models on different input data
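As a rough illustration of learning classification rules from labeled data, the sketch below uses the OneR rule learner, chosen here only for brevity; the slides do not name a specific algorithm at this point. The feature names and records are hypothetical.

```python
from collections import Counter, defaultdict

def learn_one_rule(records, labels):
    """OneR: choose the feature whose value -> majority-label rules make the fewest training errors."""
    best = None
    for feature in records[0]:
        counts = defaultdict(Counter)
        for rec, label in zip(records, labels):
            counts[rec[feature]][label] += 1
        # One rule per observed value: predict the most frequent label for that value.
        rules = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(rules[rec[feature]] != label for rec, label in zip(records, labels))
        if best is None or errors < best[0]:
            best = (errors, feature, rules)
    return best[1], best[2]

def classify(record, feature, rules, default="normal"):
    """Apply the learned rules; unseen feature values fall back to `default`."""
    return rules.get(record[feature], default)

# Hypothetical labeled audit records ("normal" or "intrusion").
records = [
    {"protocol": "tcp", "failed_logins": "many"},
    {"protocol": "tcp", "failed_logins": "few"},
    {"protocol": "udp", "failed_logins": "many"},
    {"protocol": "udp", "failed_logins": "few"},
]
labels = ["intrusion", "normal", "intrusion", "normal"]
feature, rules = learn_one_rule(records, labels)
```

Retraining on different input data amounts to calling `learn_one_rule` again with the new labeled set.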
    • Misuse detection
      • Classification Model
      • Bayesian classifier
      • Decision tree
      • Association rule
      • Support vector machine
      • Learning from rare class
    • Anomaly detection
      • Anomaly Detection Model
      • Association rule
      • Neural network
      • Unsupervised SVM
      • Outlier detection
  • Analysis of Firewall Policy Rules Using Data Mining Techniques
      • Firewall is the de facto core technology of today’s network security
      • First line of defense against external network attacks and threats
      • Firewall controls or governs network access by allowing or denying the incoming or outgoing network traffic according to firewall policy rules.
      • Manual definition of rules often results in anomalies in the policy
      • Detecting and resolving these anomalies manually is a tedious and error-prone task
      • Anomaly detection:
      • Theoretical Framework for the resolution of anomaly
      • A new algorithm will simultaneously detect and resolve any anomaly that is present in the policy rules
      • Traffic Mining:
      • Mine the traffic and detect anomalies
    • To bridge the gap between what is written in the firewall policy rules and what is observed in the network, analyze the traffic and the packet logs:
        • Network traffic trends may show that some rules are outdated or have not been used recently
    • Firewall log mining workflow (diagram): Firewall Policy Rule; Firewall Log File; Mining Log File Using Frequency; Filtering Rule Generalization; Generic Rules; Identify Decaying & Dominant Rules; Edit Firewall Rules
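One way to picture identifying decaying and dominant rules from the firewall log is a simple frequency count. This is a sketch under assumptions: the log is taken to be already reduced to the identifier of the rule each packet matched, "decaying" is read as "never matched in the log", "dominant" as "matches at least half of the logged traffic", and the rule names are hypothetical.

```python
from collections import Counter

def classify_rules(rule_ids, log_rule_hits, dominant_share=0.5):
    """Split policy rules into dominant (large share of logged hits) and decaying (no hits)."""
    hits = Counter(log_rule_hits)
    total = sum(hits.values())
    dominant = [r for r in rule_ids if total and hits[r] / total >= dominant_share]
    decaying = [r for r in rule_ids if hits[r] == 0]
    return dominant, decaying

# Hypothetical policy and a log listing which rule each packet matched.
policy = ["allow-http", "allow-ssh", "deny-telnet", "allow-ftp"]
log = ["allow-http"] * 6 + ["allow-ssh"] * 3 + ["deny-telnet"]
dominant, decaying = classify_rules(policy, log)
```

Decaying rules are candidates for removal and dominant rules for being moved earlier in the policy, which corresponds to the final "Edit Firewall Rules" step.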
    • What are worms?
      • Self-replicating program; Exploits software vulnerability on a victim; Remotely infects other victims
    • Goals of worm detection
      • Real-time detection
    • Issues
      • Substantial Volume of Identical Traffic, Random Probing
    • Methods for worm detection
      • Count number of sources/destinations; Count number of failed connection attempts
    • Worm Types
      • Email worms, Instant Messaging worms, Internet worms, IRC worms, File-sharing Networks worms
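The counting methods listed under "Methods for worm detection" can be sketched as follows. The connection tuple format `(source, destination, succeeded)`, the addresses, and both thresholds are assumptions made for illustration only.

```python
from collections import defaultdict

def find_scanners(connections, max_destinations=3, max_failures=3):
    """Flag sources that contact many distinct destinations or accumulate many failed connections."""
    destinations = defaultdict(set)
    failures = defaultdict(int)
    for src, dst, succeeded in connections:
        destinations[src].add(dst)
        if not succeeded:
            failures[src] += 1
    return sorted(
        src for src in destinations
        if len(destinations[src]) > max_destinations or failures[src] > max_failures
    )

# Hypothetical traffic: one host probes five addresses and fails each time,
# another makes two successful connections.
connections = [("10.0.0.5", f"10.0.1.{i}", False) for i in range(1, 6)]
connections += [("10.0.0.7", "10.0.1.1", True), ("10.0.0.7", "10.0.1.2", True)]
suspects = find_scanners(connections)
```

Random probing by a worm tends to produce both signals at once: many distinct destinations and many failed attempts from the same source.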
  • Email worm detection (diagram): Training data; Feature extraction; Clean or Infected?; Outgoing Emails; Classifier; Machine Learning; Test data; The Model
    • Task:
      • given some training instances of both “normal” and “viral” emails, induce a hypothesis to detect “viral” emails.
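The task stated above, inducing a hypothesis from labeled "normal" and "viral" emails, can be sketched with a small naive Bayes classifier over word tokens. Naive Bayes is used here only as one possible learner; the slides do not say which classifier was used, and the training emails are invented for illustration.

```python
import math
from collections import Counter

def train(emails):
    """emails: list of (text, label) pairs. Returns per-label token counts and label counts."""
    token_counts = {"normal": Counter(), "viral": Counter()}
    label_counts = Counter()
    for text, label in emails:
        token_counts[label].update(text.lower().split())
        label_counts[label] += 1
    return token_counts, label_counts

def predict(text, token_counts, label_counts):
    """Return the label with the highest log-posterior, using add-one smoothing."""
    vocab = set(token_counts["normal"]) | set(token_counts["viral"])
    scores = {}
    for label in token_counts:
        total = sum(token_counts[label].values())
        # Assumes every label appears at least once in the training data.
        score = math.log(label_counts[label] / sum(label_counts.values()))
        for tok in text.lower().split():
            score += math.log((token_counts[label][tok] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical training instances of both classes.
training = [
    ("meeting agenda attached", "normal"),
    ("lunch tomorrow at noon", "normal"),
    ("open this attachment now exe", "viral"),
    ("run this exe to see photos", "viral"),
]
model = train(training)
```

An outgoing email would then be checked with `predict(text, *model)` before it leaves the host.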
    • Gather data from multiple sources
      • Information on terrorist attacks: who, what, where, when, how
      • Personal and business data: place of birth, ethnic origin, religion, education, work history, finances, criminal record, relatives, friends and associates, travel history, . . .
      • Unstructured data: newspaper articles, video clips, speeches, emails, phone records, . . .
    • Integrate the data, build warehouses and federations
    • Develop profiles of terrorists, activities/threats
    • Mine the data to extract patterns of potential terrorists and predict future activities and targets
    • Find the “needle in the haystack” - suspicious needles?
    • Data integrity is important
  • Counter-terrorism data mining process (diagram): Integrate data sources; Clean/modify data sources; Build Profiles of Terrorists and Activities; Examine results/Prune results; Report final results; Data sources with information about terrorists and terrorist activities; Mine the data
    • Nature of data
      • Data arriving from sensors and other devices
        • Continuous data streams
      • Breaking news, video releases, satellite images
      • Some critical data may also reside in caches
    • Rapidly sift through the data, discarding unwanted data and setting aside the rest for later use and analysis (non-real-time data mining)
    • Data mining techniques need to meet timing constraints
    • Quality of service (QoS) tradeoffs among timeliness, precision and accuracy
    • Presentation of results, visualization, real-time alerts and triggers
  • Real-time data mining process (diagram): Integrate data sources in real-time; Build real-time models; Examine Results in Real-time; Report final results; Data sources with information about terrorists and terrorist activities; Mine the data; Rapidly sift through data and discard irrelevant data
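A minimal sketch of sifting a continuous stream and raising real-time alerts is given below. It assumes events arrive as `(timestamp, payload)` pairs in time order; the relevance test, the time span, and the alert threshold are hypothetical parameters, and a real system would also have to bound processing time per event.

```python
from collections import deque

def stream_alerts(events, is_relevant, span=10, threshold=3):
    """Yield a timestamp whenever `threshold` relevant events fall within the last `span` time units."""
    recent = deque()
    for timestamp, payload in events:
        if not is_relevant(payload):
            continue  # sift: irrelevant records are discarded immediately
        recent.append(timestamp)
        # Drop relevant events that are too old to matter for the current window.
        while recent and timestamp - recent[0] > span:
            recent.popleft()
        if len(recent) >= threshold:
            yield timestamp

# Hypothetical stream of sensor readings.
events = [(1, "ok"), (2, "alarm"), (4, "alarm"), (5, "ok"), (9, "alarm"), (30, "alarm")]
alerts = list(stream_alerts(events, lambda p: p == "alarm"))
```

Keeping only a bounded window in memory is one simple way to trade precision for timeliness, in the spirit of the QoS tradeoff mentioned above.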
    • Huge amounts of surveillance and video data available in the security domain
    • Analysis is being done off-line usually using “Human Eyes”
    • Need for tools to aid the human analyst (pointing out areas in video where unusual activity occurs)
    • Event Representation
      • Estimate distribution of pixel intensity change
    • Event Comparison
      • Contrast the event representation of different video sequences to determine if they contain similar semantic event content.
    • Event Detection
      • Using manually labeled training video sequences to classify unlabeled video sequences
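Event representation and comparison can be sketched on toy data. The example assumes grayscale frames given as nested lists of 0-255 intensities, represents an event as a normalized histogram of per-pixel intensity change between two frames, and compares events by histogram intersection; the bin count and the similarity measure are illustrative choices, not taken from the slides.

```python
def change_histogram(frame_a, frame_b, bins=4, max_value=256):
    """Normalized histogram of absolute per-pixel intensity change between two grayscale frames."""
    counts = [0] * bins
    n = 0
    for row_a, row_b in zip(frame_a, frame_b):
        for a, b in zip(row_a, row_b):
            counts[min(abs(a - b) * bins // max_value, bins - 1)] += 1
            n += 1
    return [c / n for c in counts]

def similarity(hist_a, hist_b):
    """Histogram intersection: 1.0 for identical distributions, 0.0 for disjoint ones."""
    return sum(min(a, b) for a, b in zip(hist_a, hist_b))

# Hypothetical 2x2 frames: a nearly static scene and one with large changes.
still_1 = [[10, 10], [10, 10]]
still_2 = [[12, 11], [10, 9]]
moving = [[200, 220], [10, 250]]

quiet = change_histogram(still_1, still_2)
active = change_histogram(still_1, moving)
score = similarity(quiet, active)
```

For event detection, histograms from manually labeled sequences could serve as the training features for a classifier, with an unlabeled sequence assigned the label of its most similar labeled example.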
    • Law enforcement: Data mining can help law enforcers identify and apprehend criminal suspects by examining trends in location, crime type, habit, and other patterns of behavior.
    • Researchers: Data mining can speed up researchers' data analysis, leaving them more time to work on other projects.
    • This presentation surveyed the data mining techniques that have been proposed to enhance security in different applications.
    • It covered the ways in which data mining aids intrusion detection, firewall policy management, worm detection, and counter-terrorism, and how the various techniques have been applied and evaluated.
    • B. Thuraisingham. Managing threats to web databases and cyber systems: Issues, solutions and challenges. In V. Kumar et al, editor, Cyber Security: Threats and Countermeasures. Kluwer
    • B. Thuraisingham. Data mining, national security, privacy and civil liberties. SIGKDD Explorations, January 2003
    • F. Bolz et al. The Counterterrorism Handbook: Tactics, Procedures, and Techniques. CRC Press, 2001.
    • Thank you