
O'Reilly Security New York - Predicting Exploitability Final

Security is all about reacting. It’s time to make some predictions. Michael Roytman explains how Kenna Security used the AWS Machine Learning platform to train a binary classifier for vulnerabilities, allowing the company to predict whether or not a vulnerability will become exploitable.

Michael offers an overview of the process. Kenna enriches the vulnerability data with more specific, non-definitional data: 500 million live vulnerabilities and their associated close rates inform the epidemiological data, along with "in the wild" threat data from AlienVault's OTX, SecureWorks's CTU, ReversingLabs, and SANS ISC. The company uses 70% of the National Vulnerability Database as its training dataset and generates over 20,000 predictions on the remaining vulnerabilities. It then measures sensitivity and specificity, positive predictive value, and false positive and false negative rates before arriving at an optimal decision cutoff for the problem.
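To make that evaluation step concrete, here is a minimal sketch of those metrics and the cutoff search in plain NumPy. This is illustrative only: the talk's actual tooling was the AWS ML platform, and the "maximize PPV subject to a sensitivity floor" criterion is an assumption, since the exact definition of "optimal" is not stated.

```python
import numpy as np

def metrics_at(y_true, y_score, cutoff):
    """Confusion-matrix metrics at one decision cutoff.

    y_true: array of 0/1 labels; y_score: array of model scores.
    Assumes no confusion-matrix denominator is zero at this cutoff.
    """
    y_true = np.asarray(y_true)
    pred = np.asarray(y_score) >= cutoff
    tp = np.sum(pred & (y_true == 1))
    fp = np.sum(pred & (y_true == 0))
    tn = np.sum(~pred & (y_true == 0))
    fn = np.sum(~pred & (y_true == 1))
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate (recall)
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # positive predictive value (precision)
        "fpr": fp / (fp + tn),          # false positive rate
        "fnr": fn / (fn + tp),          # false negative rate
    }

def pick_cutoff(y_true, y_score, min_sensitivity=0.5):
    """Sweep cutoffs and keep the one with the best PPV among those that
    still catch at least min_sensitivity of truly exploited CVEs."""
    candidates = [c for c in np.linspace(0.05, 0.95, 19)
                  if metrics_at(y_true, y_score, c)["sensitivity"] >= min_sensitivity]
    return max(candidates, key=lambda c: metrics_at(y_true, y_score, c)["ppv"])
```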

  • Really nice work. We are doing a similar thesis for Chalmers University in Gothenburg. Do you have a more in-depth technical report on this? Like, did you split your training and test data temporally or randomly? Did you include zero-day and minus-day CVEs? Why does your classifier improve when you remove important information from your prediction variables? :) Very interested in hearing more about this! - Alexander Reinthal, Chalmers University

Transcript

  1. PREDICTING EXPLOITABILITY @MROYTMAN
  2. "Prediction is very difficult, especially about the future." - Niels Bohr
  3. 3 Types of "Data-Driven":
     - Retrospective: analysis & reporting
     - Here-and-now: real-time processing & dashboards
     - Predictions: to enable smart applications
  4. Problem: Too many vulnerabilities. How do we derive risk from vulnerabilities in a data-driven manner?
  5. Exploitability: 1. RETROSPECTIVE, 2. REAL-TIME, 3. PREDICTIVE
  6. Exploitability: 1. RETROSPECTIVE, 2. REAL-TIME, 3. PREDICTIVE
  7. Retrospective - Model: CVSS Temporal Score Estimation. Inputs: analyst input, vulnerability management programs, augmenting data, vulnerability researchers.
  8. Exploitability: 1. RETROSPECTIVE, 2. REAL-TIME, 3. PREDICTIVE
  9. Real-Time - The Data
     Vulnerability scans (Qualys, Rapid7, Nessus, etc.):
     - 7,000,000 assets (desktops, servers, URLs, IPs, MAC addresses)
     - 1,400,000,000 vulnerabilities (unique asset/CVE pairs)
     Exploit intelligence - successful exploitations:
     - ReversingLabs backend metadata: hashes for each CVE, and the number of pieces of malware found for each hash
     - AlienVault backend: "attempted exploits" correlated with open vulnerabilities
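As an illustration of how a feed like that can be folded into per-CVE features, here is a hedged sketch of the ReversingLabs-style aggregation. The record shape and field names are invented for this example, not ReversingLabs' actual schema.

```python
from collections import defaultdict

# Hypothetical record shape: each record ties one malware hash to the
# CVEs that piece of malware exploits.
rl_records = [
    {"sha256": "ab12...", "cves": ["CVE-2016-0959"]},
    {"sha256": "cd34...", "cves": ["CVE-2016-0959", "CVE-2015-5122"]},
]

# Count distinct pieces of malware seen per CVE - a per-vulnerability
# proxy for "successfully exploited in the wild".
hashes_per_cve = defaultdict(set)
for record in rl_records:
    for cve in record["cves"]:
        hashes_per_cve[cve].add(record["sha256"])

malware_counts = {cve: len(h) for cve, h in hashes_per_cve.items()}
print(malware_counts)  # {'CVE-2016-0959': 2, 'CVE-2015-5122': 1}
```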
  10. ATTACKERS ARE FAST [Chart: cumulative probability of exploitation]
  11. Positive Predictive Value of remediating a vulnerability with property X: [Chart: breach probability (%) for CVSS 10, EDB, MSP, and EDB+MSP]
  12. Data of Future Past. Q: "Of my current vulnerabilities, which ones should I remediate?" A: Old ones with stable, weaponized exploits.
  13. Future of Data Past. Q: "A new vulnerability was just released. Do we scramble?" A: (answered by the predictive model in the slides that follow)
  14. Exploitability: 1. RETROSPECTIVE, 2. REAL-TIME, 3. PREDICTIVE
  15. Classifier = Will a vulnerability have an exploit written and published? [YES OR NO]
  16. Enter: AWS ML
  17. Predictive - The Data. N = 81,303: all CVEs. Described by: 1. National Vulnerability Database, 2. Common Platform Enumeration, 3. Occurrences in Kenna scan data. Labelled as exploited/not exploited via: 1. Exploit DB, 2. Metasploit, 3. D2 Elliot, 4. Canvas, 5. Blackhat exploit kits.
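A minimal sketch of the labelling step slide 17 describes, assuming each exploit source boils down to a set of CVE IDs; the set contents below are placeholders, not the sources' real coverage.

```python
# Placeholder CVE sets standing in for the five exploit sources on slide 17.
exploit_sources = {
    "exploit_db":   {"CVE-2014-0160", "CVE-2015-5122"},
    "metasploit":   {"CVE-2014-0160"},
    "d2_elliot":    set(),
    "canvas":       set(),
    "exploit_kits": {"CVE-2015-5122"},
}

def label(cve_id):
    """1 if an exploit has been written and published for this CVE, else 0."""
    return int(any(cve_id in cves for cves in exploit_sources.values()))

assert label("CVE-2014-0160") == 1
assert label("CVE-2016-9999") == 0  # absent from every source above
```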
  18. All models: 70% training / 30% evaluation split, N = 81,303, L2 regularizer, 1 GB model size, 100 passes over the data; receiver operating characteristics for comparisons.
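Slide 18's configuration maps directly onto the (now retired) Amazon Machine Learning API. A hedged sketch via boto3 follows: the bucket, schema location, IDs, and the exact L2 amount are assumptions, while `sgd.maxPasses`, `sgd.l2RegularizationAmount`, and `sgd.maxMLModelSizeInBytes` are the service's real training parameters.

```python
import boto3

ml = boto3.client("machinelearning")  # legacy Amazon ML service

def make_datasource(ds_id, begin, end):
    """Create a data source covering rows begin%..end% of the CSV in S3."""
    ml.create_data_source_from_s3(
        DataSourceId=ds_id,
        DataSpec={
            "DataLocationS3": "s3://example-bucket/cves.csv",          # placeholder
            "DataSchemaLocationS3": "s3://example-bucket/cves.schema", # placeholder
            "DataRearrangement":
                '{"splitting": {"percentBegin": %d, "percentEnd": %d}}' % (begin, end),
        },
        ComputeStatistics=True,
    )

make_datasource("cve-train", 0, 70)   # 70% training
make_datasource("cve-eval", 70, 100)  # 30% evaluation

ml.create_ml_model(
    MLModelId="exploit-predictor",
    MLModelType="BINARY",  # "will an exploit be written and published?"
    Parameters={
        "sgd.maxPasses": "100",                     # 100 passes over the data
        "sgd.l2RegularizationAmount": "1e-6",       # L2 regularizer (amount assumed)
        "sgd.maxMLModelSizeInBytes": "1073741824",  # 1 GB
    },
    TrainingDataSourceId="cve-train",
)

ml.create_evaluation(
    EvaluationId="exploit-eval",
    MLModelId="exploit-predictor",
    EvaluationDataSourceId="cve-eval",
)
```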
  19. Predictive - The Expectations
      - The distribution is not uniform: 77% of the dataset is not exploited, so an accuracy of 77% would be bad.
      - Precision matters more than recall: no one would use this model absent actual exploit-availability data, and false negatives matter less than false positives (wasted effort).
      - We are not modeling WHEN something will be exploited, just IF: it could be tomorrow or in 6 months, so re-run the model every day.
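A quick arithmetic check of slide 19's first point: with a 77/23 class split, a degenerate model that always answers "not exploited" already reaches 77% accuracy while finding zero exploitable CVEs.

```python
# Illustrative arithmetic only, using slide 19's class split.
n = 81303
not_exploited = int(n * 0.77)
exploited = n - not_exploited

accuracy = not_exploited / n  # every "no" on a non-exploited CVE is "correct"
recall = 0 / exploited        # it never flags a single exploited CVE

print(f"accuracy: {accuracy:.2%}, recall: {recall:.0%}")
# accuracy: 77.00%, recall: 0% -> accuracy alone is meaningless here
```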
  20. Model 1: Baseline. Features: CVSS Base, CVSS Temporal, Remote Code Execution, Availability, Integrity, Confidentiality, Authentication, Access Complexity, Access Vector, Publication Date.
  21. LMGTFY:
  22. Moar Simple? [Charts: a sample bad chart vs. a sample good chart]
  23. Model 2: Patches. Features: all Model 1 features, plus Patch Exists.
  24. Model 3: Affected Software. Features: all Model 2 features, plus Vendors and Products.
  25. Model 4: Words! Features: all Model 3 features, plus Description, Ngrams 1-5.
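For slide 25's text features, here is an illustrative equivalent of "Description, Ngrams 1-5" using scikit-learn's CountVectorizer. The talk itself used AWS ML, whose recipes have a built-in n-gram transform, and the descriptions below are invented examples rather than real NVD text.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Invented example descriptions; NVD text would be used in practice.
descriptions = [
    "buffer overflow in flash player allows remote code execution",
    "cross-site scripting in login form allows script injection",
]

# ngram_range=(1, 5) emits every word unigram through 5-gram as a feature.
vectorizer = CountVectorizer(ngram_range=(1, 5))
X = vectorizer.fit_transform(descriptions)

print(X.shape)  # (2, number of distinct 1- to 5-grams)
print(vectorizer.get_feature_names_out()[:3])
```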
  26. Model 5: Vulnerability Prevalence. Features: all Model 4 features, plus Vulnerability Prevalence and Number of References.
  27. Model 6: Overfitting Fixes. 2015 onward; exclude: "in the wild", "as seen", "exploited".
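Slide 27's exclusions address label leakage: phrases like "in the wild" appear in a CVE description precisely because exploitation is already known, so they let the model cheat. A small sketch of that scrubbing; the phrase list comes from the slide, but the helper itself is hypothetical, not Kenna's actual preprocessing code.

```python
import re

# Phrases from slide 27 that leak the label.
LEAKY_PHRASES = ["in the wild", "as seen", "exploited"]

def scrub(description):
    """Drop leaky phrases so the model must learn from pre-exploit signals."""
    text = description.lower()
    for phrase in LEAKY_PHRASES:
        text = text.replace(phrase, " ")
    return re.sub(r"\s+", " ", text).strip()

print(scrub("Heap overflow, exploited in the wild since March"))
# -> "heap overflow, since march"
```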
  28. Exploitability
  29. Future Work:
      - Track predictions vs. real exploits
      - Integrate 20+ BlackHat exploit kits (FP reduction?)
      - Find better vulnerability descriptions - mine advisories for content? (FN reduction?)
      - Predict breaches, not exploits
      - Attempt models by vendor
  30. Problem: Too many vulnerabilities. How do we derive risk from vulnerabilities in a data-driven manner?
  31. Solution: 1. Gather data about known successful attack paths. 2. Issue forecasts where data is lacking in order to predict new exploits. 3. Gather MORE data about known successful attack paths.
  32. These will have exploits in 2017… CVE-2016-0959 (released 6/28/2017): a Flash buffer overflow with no exploit in ExploitDB, Metasploit, or Canvas; predicted that an exploit will be released.
  33. Thanks! @MROYTMAN
