Data Mining for Security Applications


Published in: Technology

  1. 1. Data Mining for Security Applications
  2. 2. <ul><li>Overview of Data Mining </li></ul><ul><li>Security Threats </li></ul><ul><li>Data Mining for Cyber security applications </li></ul><ul><ul><li>Intrusion Detection </li></ul></ul><ul><ul><li>Data Mining for Firewall Policy Management </li></ul></ul><ul><ul><li>Data Mining for Worm Detection </li></ul></ul><ul><ul><li>Data Mining for Counter-terrorism </li></ul></ul><ul><ul><li>Surveillance </li></ul></ul><ul><ul><li>Advantages </li></ul></ul><ul><ul><li>Conclusion </li></ul></ul>
  3. 3. <ul><li>Data mining: the extraction of interesting (non-trivial, implicit, previously unknown, and potentially useful) information or patterns from data in large databases [Han and Kamber 2005]. </li></ul><ul><li>Data mining is used to sort through the tremendous amounts of data stored by automated data collection tools. </li></ul><ul><li>It extracts rules, regularities, patterns, and constraints from databases. </li></ul>
  4. 4. Threat Types (diagram): Natural Disasters; Human Errors; Non-Information-Related Threats; Information-Related Threats; Biological, Chemical, Nuclear Threats; Critical Infrastructure Threats
  5. 5. <ul><li>Data mining is being applied to problems such as intrusion detection and auditing. For example: </li></ul><ul><li>Anomaly detection techniques can be used to detect unusual patterns and behaviors. </li></ul><ul><li>Link analysis may be used to trace self-propagating malicious code to its authors. </li></ul><ul><li>Classification may be used to group various cyber attacks and then use the resulting profiles to detect an attack when it occurs. </li></ul><ul><li>Prediction may be used to determine potential future attacks, based in part on information learned about terrorists through email and phone conversations. </li></ul>
  6. 6. <ul><li>An intrusion can be defined as “any set of actions that attempt to compromise the integrity, confidentiality, or availability of a resource”. </li></ul><ul><li>Attacks are: </li></ul><ul><ul><li>Host-based attacks </li></ul></ul><ul><ul><li>Network-based attacks </li></ul></ul><ul><li>Intrusion detection systems are split into two groups: </li></ul><ul><ul><li>Anomaly detection systems </li></ul></ul><ul><ul><li>Misuse detection systems </li></ul></ul>
  7. 7. <ul><li>Data mining can help automate the process of investigating intrusion detection alarms. </li></ul><ul><li>Data mining on historical audit data and intrusion detection alarms can reduce future false alarms. </li></ul>
  8. 8. <ul><li>Anomaly detection </li></ul><ul><ul><li>Build models of normal data </li></ul></ul><ul><ul><li>Detect any deviation from normal data </li></ul></ul><ul><ul><li>Flag deviations as suspect </li></ul></ul><ul><ul><li>Identify new types of intrusions as deviations from normal behavior </li></ul></ul><ul><li>Misuse detection </li></ul><ul><ul><li>Label all instances in the data set (“normal” or “intrusion”) </li></ul></ul><ul><ul><li>Run learning algorithms over the labeled data to generate classification rules </li></ul></ul><ul><ul><li>Automatically retrain intrusion detection models on different input data </li></ul></ul>
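The anomaly detection steps on this slide (model normal data, detect deviations, flag them as suspect) can be illustrated with a minimal sketch. It assumes a single numeric feature (failed logins per hour) and a z-score threshold of 3; the feature, the data, the threshold, and the function names are hypothetical choices for illustration and are not taken from the slides.

```python
from statistics import mean, stdev

def fit_normal_profile(normal_counts):
    """Model 'normal' behavior as the mean and standard deviation of one metric."""
    return mean(normal_counts), stdev(normal_counts)

def flag_deviations(observations, profile, threshold=3.0):
    """Return observations whose z-score exceeds the (assumed) threshold."""
    mu, sigma = profile
    if sigma == 0:
        # With no variation in the training data, any different value deviates.
        return [x for x in observations if x != mu]
    return [x for x in observations if abs(x - mu) / sigma > threshold]

# Hypothetical training data: failed logins per hour during a known-clean period.
normal = [2, 3, 1, 4, 2, 3, 2, 1, 3, 2]
profile = fit_normal_profile(normal)

# New observations; values far from the normal profile are flagged as suspect.
suspects = flag_deviations([2, 3, 40, 1], profile)
print(suspects)
```

A single-feature profile like this can flag intrusion types never seen before, which is the appeal of anomaly detection, but it will also flag any legitimate burst of activity.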
  9. 9. <ul><li>Misuse detection </li></ul><ul><ul><li>Classification Model </li></ul></ul><ul><ul><li>Bayesian classifier </li></ul></ul><ul><ul><li>Decision tree </li></ul></ul><ul><ul><li>Association rule </li></ul></ul><ul><ul><li>Support vector machine </li></ul></ul><ul><ul><li>Learning from rare class </li></ul></ul>
  10. 10. <ul><li>Anomaly detection </li></ul><ul><ul><li>Anomaly Detection Model </li></ul></ul><ul><ul><li>Association rule </li></ul></ul><ul><ul><li>Neural network </li></ul></ul><ul><ul><li>Unsupervised SVM </li></ul></ul><ul><ul><li>Outlier detection </li></ul></ul>
  11. 11. Analysis of Firewall Policy Rules Using Data Mining Techniques <ul><ul><li>The firewall is the de facto core technology of today’s network security </li></ul></ul><ul><ul><li>It is the first line of defense against external network attacks and threats </li></ul></ul><ul><ul><li>A firewall governs network access by allowing or denying incoming or outgoing network traffic according to firewall policy rules. </li></ul></ul><ul><ul><li>Manual definition of rules often results in anomalies in the policy </li></ul></ul><ul><ul><li>Detecting and resolving these anomalies manually is a tedious and error-prone task </li></ul></ul>
  12. 12. <ul><ul><li>Anomaly detection: </li></ul></ul><ul><ul><li>Theoretical framework for resolving anomalies </li></ul></ul><ul><ul><li>A new algorithm simultaneously detects and resolves any anomaly present in the policy rules </li></ul></ul><ul><ul><li>Traffic mining: </li></ul></ul><ul><ul><li>Mine the traffic and detect anomalies </li></ul></ul>
  13. 13. <ul><li>To bridge the gap between what is written in the firewall policy rules and what is observed in the network, analyze the traffic and the packet logs: </li></ul><ul><ul><ul><li>Network traffic trends may show that some rules are outdated or have not been used recently </li></ul></ul></ul>Diagram: Firewall Policy Rule; Firewall Log File; Mining Log File Using Frequency; Filtering-Rule Generalization; Generic Rules; Identify Decaying & Dominant Rules; Edit Firewall Rules
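As a rough illustration of comparing the written policy with observed traffic, the sketch below counts how often each rule is matched in a log and labels rules by usage. It is not the algorithm from the slides: the rule identifiers, the log format (one matched rule ID per entry), the 50% share that marks a rule as dominant, and the convention that a never-matched rule is "decaying" are all assumptions made for the example.

```python
from collections import Counter

def classify_rules(rule_ids, log_rule_hits, dominant_share=0.5):
    """Label each policy rule by how often it appears in the log.

    rule_ids: rule identifiers defined in the policy.
    log_rule_hits: one rule identifier per logged packet (assumed log format).
    """
    hits = Counter(log_rule_hits)
    total = sum(hits[r] for r in rule_ids)
    report = {}
    for r in rule_ids:
        if hits[r] == 0:
            report[r] = "decaying"   # never matched in the observed traffic
        elif hits[r] / total >= dominant_share:
            report[r] = "dominant"   # accounts for most of the matches
        else:
            report[r] = "active"
    return report

# Hypothetical policy and log.
policy = ["allow-http", "allow-ssh", "deny-telnet"]
log = ["allow-http"] * 8 + ["allow-ssh"] * 2
print(classify_rules(policy, log))
```

A report like this only suggests candidates for editing; a rule with no hits in one log window may still be needed, so the final decision stays with the administrator.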
  14. 14. <ul><li>What are worms? </li></ul><ul><ul><li>Self-replicating program; Exploits software vulnerability on a victim; Remotely infects other victims </li></ul></ul><ul><li>Goals of worm detection </li></ul><ul><ul><li>Real-time detection </li></ul></ul><ul><li>Issues </li></ul><ul><ul><li>Substantial Volume of Identical Traffic, Random Probing </li></ul></ul><ul><li>Methods for worm detection </li></ul><ul><ul><li>Count number of sources/destinations; Count number of failed connection attempts </li></ul></ul><ul><li>Worm Types </li></ul><ul><ul><li>Email worms, Instant Messaging worms, Internet worms, IRC worms, File-sharing Networks worms </li></ul></ul>
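The two counting methods listed on this slide (number of destinations contacted and number of failed connection attempts) can be sketched as follows. The connection record format, the thresholds, and the addresses are hypothetical values chosen for illustration.

```python
from collections import defaultdict

def suspicious_sources(connections, max_destinations=3, max_failures=3):
    """Flag sources that contact many distinct hosts or fail many connections.

    connections: iterable of (source, destination, succeeded) tuples.
    """
    destinations = defaultdict(set)
    failures = defaultdict(int)
    for src, dst, ok in connections:
        destinations[src].add(dst)
        if not ok:
            failures[src] += 1
    return sorted(
        src for src in destinations
        if len(destinations[src]) > max_destinations or failures[src] > max_failures
    )

# Hypothetical traffic: one host probes six addresses and fails every time,
# another host talks to a single server successfully.
traffic = [("10.0.0.5", f"10.0.1.{i}", False) for i in range(1, 7)]
traffic += [("10.0.0.9", "10.0.1.1", True), ("10.0.0.9", "10.0.1.1", True)]
print(suspicious_sources(traffic))
```

Random probing tends to produce both signals at once, because many of the randomly chosen addresses do not answer.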
  15. 15. Diagram: Training data; Feature extraction; Clean or Infected?; Outgoing Emails; Classifier; Machine Learning; Test data; The Model <ul><li>Task: </li></ul><ul><ul><li>Given some training instances of both “normal” and “viral” emails, induce a hypothesis to detect “viral” emails. </li></ul></ul>
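One way to induce such a hypothesis is a naive Bayes classifier over email tokens, shown here as a minimal sketch. The slides do not say which classifier or features are used for this task; the bag-of-words features, the add-one smoothing, and the tiny training set below are assumptions for illustration.

```python
import math
from collections import Counter

def train(examples):
    """examples: list of (tokens, label). Returns the counts naive Bayes needs."""
    label_counts = Counter()
    word_counts = {}
    vocab = set()
    for tokens, label in examples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab

def classify(tokens, model):
    """Return the label with the highest log-posterior score."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for tok in tokens:
            # Add-one (Laplace) smoothing avoids zero probabilities.
            score += math.log((word_counts[label][tok] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training instances of "normal" and "viral" outgoing emails.
training = [
    (["meeting", "agenda", "attached"], "normal"),
    (["lunch", "tomorrow"], "normal"),
    (["open", "attachment", "exe"], "viral"),
    (["urgent", "open", "exe"], "viral"),
]
model = train(training)
print(classify(["open", "exe"], model))
print(classify(["lunch", "agenda"], model))
```

In practice the feature extraction step would use more than message words, for example attachment types or sending rate, but the training and test flow stays the same.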
  16. 17. <ul><li>Gather data from multiple sources </li></ul><ul><ul><li>Information on terrorist attacks: who, what, where, when, how </li></ul></ul><ul><ul><li>Personal and business data: place of birth, ethnic origin, religion, education, work history, finances, criminal record, relatives, friends and associates, travel history, . . . </li></ul></ul><ul><ul><li>Unstructured data: newspaper articles, video clips, speeches, emails, phone records, . . . </li></ul></ul><ul><li>Integrate the data, build warehouses and federations </li></ul><ul><li>Develop profiles of terrorists, activities/threats </li></ul><ul><li>Mine the data to extract patterns of potential terrorists and predict future activities and targets </li></ul><ul><li>Find the “needle in the haystack” - suspicious needles? </li></ul><ul><li>Data integrity is important </li></ul>
  17. 18. Diagram: Integrate data sources; Clean/modify data sources; Build Profiles of Terrorists and Activities; Examine results/Prune results; Report final results; Data sources with information about terrorists and terrorist activities; Mine the data
  18. 19. <ul><li>Nature of data </li></ul><ul><ul><li>Data arriving from sensors and other devices </li></ul></ul><ul><ul><ul><li>Continuous data streams </li></ul></ul></ul><ul><ul><li>Breaking news, video releases, satellite images </li></ul></ul><ul><ul><li>Some critical data may also reside in caches </li></ul></ul><ul><li>Rapidly sift through the data and set aside data that is not needed immediately for later use and analysis (non-real-time data mining) </li></ul><ul><li>Data mining techniques need to meet timing constraints </li></ul><ul><li>Quality of service (QoS) tradeoffs among timeliness, precision, and accuracy </li></ul><ul><li>Presentation of results, visualization, real-time alerts and triggers </li></ul>
  19. 20. Diagram: Integrate data sources in real-time; Build real-time models; Examine results in real-time; Report final results; Data sources with information about terrorists and terrorist activities; Mine the data; Rapidly sift through data and discard irrelevant data
  20. 22. <ul><li>Huge amounts of surveillance and video data are available in the security domain </li></ul><ul><li>Analysis is usually done off-line by human analysts (“human eyes”) </li></ul><ul><li>Tools are needed to aid the human analyst (pointing out areas in video where unusual activity occurs) </li></ul>
  21. 23. <ul><li>Event Representation </li></ul><ul><ul><li>Estimate distribution of pixel intensity change </li></ul></ul><ul><li>Event Comparison </li></ul><ul><ul><li>Contrast the event representation of different video sequences to determine if they contain similar semantic event content. </li></ul></ul><ul><li>Event Detection </li></ul><ul><ul><li>Using manually labeled training video sequences to classify unlabeled video sequences </li></ul></ul>
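The event representation and event comparison steps can be sketched as follows: each video sequence is summarized by a normalized histogram of pixel intensity changes between consecutive frames, and two sequences are compared by the distance between their histograms. The frame format (nested lists of grayscale values), the number of bins, and the L1 distance are assumptions for illustration, not details given in the slides.

```python
def change_histogram(frames, bins=4, max_value=255):
    """Normalized histogram of absolute pixel changes between consecutive frames."""
    counts = [0] * bins
    width = (max_value + 1) / bins
    for prev, curr in zip(frames, frames[1:]):
        for row_prev, row_curr in zip(prev, curr):
            for a, b in zip(row_prev, row_curr):
                counts[int(abs(a - b) // width)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts

def histogram_distance(h1, h2):
    """L1 distance between two histograms; 0 means identical distributions."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

# Hypothetical 2x2 grayscale sequences: one static scene, one with motion.
static = [[[10, 10], [10, 10]]] * 3
moving = [[[0, 0], [0, 0]], [[200, 200], [0, 0]], [[0, 0], [200, 200]]]

h_static = change_histogram(static)
h_moving = change_histogram(moving)
print(h_static, h_moving, histogram_distance(h_static, h_moving))
```

For event detection, histograms from manually labeled sequences could serve as the training features for a classifier, with unlabeled sequences assigned the label of the closest match.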
  22. 24. <ul><li>Law enforcement: Data mining can help law enforcement identify and apprehend criminal suspects by examining trends in location, crime type, habits, and other behavior patterns. </li></ul><ul><li>Researchers: Data mining can speed up researchers' data analysis, allowing them more time to work on other projects. </li></ul>
  23. 25. <ul><li>This presentation surveyed the data mining techniques that have been proposed to enhance the security of different applications. </li></ul><ul><li>It covered the ways in which data mining aids intrusion detection, firewall policy management, worm detection, and counter-terrorism, and how the various techniques have been applied and evaluated. </li></ul>
  24. 26. <ul><li>B. Thuraisingham. Managing threats to web databases and cyber systems: Issues, solutions and challenges. In V. Kumar et al., editors, Cyber Security: Threats and Countermeasures. Kluwer. </li></ul><ul><li>B. Thuraisingham. Data mining, national security, privacy and civil liberties. SIGKDD Explorations, January 2003. </li></ul><ul><li>F. Bolz et al. The Counterterrorism Handbook: Tactics, Procedures, and Techniques. CRC Press, 2001. </li></ul>
  25. 27. <ul><li>Thank you </li></ul>