Security trend analysis with CVE topic models

Presented at ISSRE 2010.

  1. Security Trend Analysis with CVE Topic Models. Stephan Neuhaus (Università degli Studi di Trento, Italy) and Thomas Zimmermann (Microsoft Research, Redmond, USA). ISSRE 2010, San Jose, CA, USA. © Microsoft Corporation
  2. Background
     • http://www.microsoft.com/security/sir/default.aspx
     • Steve Christey, Robert A. Martin: http://cve.mitre.org/docs/vuln-trends/index.html
     • http://www.sans.org/top-cyber-security-risks/
  3. Can we automate the trend analysis of security reports?
  4. Trend Analysis on Vulnerability Data: Raw Data → Cleaning → Topic Models → Trends
  5. Data Sources
     • Common Vulnerabilities and Exposures (CVE)
       – Hosted by MITRE (large US research company and defense contractor)
       – Clearinghouse for vulnerabilities: assigns IDs to vulnerabilities and collects descriptions
     • National Vulnerability Database (NVD)
       – Annotated version of the CVE data
       – Downloadable from NIST
  6. CVE Overview
     • Earliest CVE has a published date of October 1, 1988 (CVE-1999-0095)
       – Reports the sendmail DEBUG hole that was exploited by the Morris worm
     • Latest CVEs in our dataset are from December 31, 2009
     • Total of 39,743 entries, of which 350 are duplicates
     • 39,393 unique CVEs remain
  7. Number of CVEs
  8. Fields of a CVE entry: ID, Date, Classification, Impact, Summary
  9. Document Processing
     Example summary: Stack-based buffer overflow in vpnconf.exe in TheGreenBow IPSec VPN Client 4.51.001, 4.65.003, and possibly other versions, allows user-assisted remote attackers to execute arbitrary code via a long OpenScriptAfterUp parameter in a policy (.tgb) file, related to "phase 2."
  10. Document Processing (applied to the example summary from slide 9)
      • STOP WORDS: remove common words
  11. Document Processing (applied to the example summary from slide 9)
      • STOP WORDS: remove common words
      • STEMMER: remove word suffixes
  12. Length of CVE summary
  13. Topic Analysis: a CVE is associated with topics such as Application Server, Buffer Overflow, and Cross-site Scripting
  14. Topic Analysis
      • Based on “Latent Dirichlet Allocation”
      • Documents (CVEs) are bags of words
        – Word order does not matter
      • Words are assigned to (different) topics
        – Probabilistic assessment
      • We compute topics for all years 2000-2009
        – Use post-hoc probabilities to find the fraction of CVEs about a given topic in a given year
  15. Topic Example
      Buffer Overflow (11.3%)        Cross-site Scripting (9.2%)
      overflow     6.29              script      14.31
      execut       6.15              html         7.26
      buffer       5.74              cross-sit    7.15
      arbitrari    5.55              xss          6.92
      code         4.77              web          6.73
      remot        4.29              inject       6.62
      command      3.05              vulner       6.48
      long         2.83              arbitrari    6.34
      craft        1.31              remot        5.73
      file         1.30              paramet      4.47
  16. 28 Topics http://www.wordle.net/show/wrdl/2674704/Relative_Importance_of_Security_Topics_identified_from_CVEs 2009
  17. Trends
      Topic                   2000   2009
      Application Servers       1%     5%
      Arbitrary Code (PHP)      0%     2%
      Buffer Overflow          19%    11%
      Cross-Site Scripting      0%    10%
      Link Resolution           6%     1%
      Privilege Escalation     12%     2%
      Resource Management      14%    10%
      SQL Injection             1%    10%
  18. Cause and Impact: “x allows someone to y”
      BEA WebLogic Portal 10.0 and 9.2 through MP1, when an administrator deletes a single instance of a content portlet, removes entitlement policies for other content portlets, which allows attackers to bypass intended access restrictions.
  19. Cause and Impact: “x allows someone to y”
      CAUSE: BEA WebLogic Portal 10.0 and 9.2 through MP1, when an administrator deletes a single instance of a content portlet, removes entitlement policies for other content portlets,
      which allows attackers to
      bypass intended access restrictions.
  20. Cause and Impact: “x allows someone to y”
      CAUSE: BEA WebLogic Portal 10.0 and 9.2 through MP1, when an administrator deletes a single instance of a content portlet, removes entitlement policies for other content portlets,
      which allows attackers to
      IMPACT: bypass intended access restrictions.
  21. 24 Topics for “Cause” http://www.wordle.net/show/wrdl/2674788/Relative_Importance_of_Security_%22Cause%22_Topics_identified_from_CVEs 2009
  22. Trends for “Cause”
      Topic                   2000   2009
      Buffer Overflow          17%    10%
      Cross-Site Scripting     11%    17%
      PHP                       5%     8%
      SQL Injection            10%    21%
  23. 12 Topics for “Impact” http://www.wordle.net/show/wrdl/2674920/Relative_Importance_of_Security_%22Impact%22_Topics_identified_from_CVEs 2009
  24. Trends for “Impact”
      Topic                   2000   2009
      Arbitrary Code           15%    24%
      Arbitrary Script         17%    35%
      Denial of Service        30%    11%
      Information Leak         22%    15%
      Privilege Escalation     11%     7%
  25. Common Weakness Enumeration (CWE)
      • Supposed to be a complete dictionary of software weaknesses
      • Many CWEs are available (659)
      • But only 19 are used in CVE entries; 73% of CVE entries have a CWE field (the Classification field)
      • How well do our LDA topics align with the manual CWE classification?
  26. Alignment with CWEs
      Columns: Precision, Recall, LDA Topic Name (only those topics that mapped to CWEs)
  27. Alignment with CWEs
      LDA topics that mapped to CWEs: SQL Injection, Cross-Site Scripting, Directory Traversal, Link Resolution, Format String, Buffer Overflow, Resource Management, Cross-Site Request Forgery, Information Leak, Cryptography, Credentials Management, Arbitrary Code
  28. Alignment with CWEs
      Precision (%)   Recall (%)   LDA Topic Name
           97.8          94.6      SQL Injection
           98.1          85.4      Cross-Site Scripting
           93.1          85.6      Directory Traversal
           57.6          80.1      Link Resolution
           51.8          75.3      Format String
           60.1          57.6      Buffer Overflow
           29.7          49.3      Resource Management
           24.9          54.5      Cross-Site Request Forgery
           33.1          18.6      Information Leak
           28.0          18.0      Cryptography
           12.1          38.7      Credentials Management
           14.2           8.7      Arbitrary Code
  29. Buffer Overflow
      • CVE-2008-0090:
        – “A certain ActiveX control in npUpload.dll in DivX Player 6.6.0 allows remote attackers to cause a denial of service (Internet Explorer 7 crash) via a long argument to the SetPassword method.”
        – Possible classifiers: input validation, resource management, credentials management
        – Assigned classifier: buffer overflow
  30. Buffer Overflow
      • Vulnerability descriptions are sometimes not very specific
      • CWEs are not mutually exclusive (there is “buffer overflow” and “arbitrary code”)
      • CWE assignment is not quality-checked
      • Only one CWE can be assigned, even when the CVE is about multiple issues
  31. Get the Data and Scripts! http://tomz.me/issre2010cve (or follow the link in the paper)
  32. PHP: declining, with occasional SQL injection.
      Buffer Overflows: flattening out after decline.
      Format Strings: in steep decline.
      SQL Injection and XSS: remaining strong, and rising.
      Cross-Site Request Forgery: a sleeping giant perhaps, stirring.
      Application Servers: rising steeply.
      http://msrconf.org
  33. Thank you!
  34. Mining Software Repositories 2011 http://msrconf.org
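
Slides 9 to 11 describe two cleaning steps for a CVE summary: stop word removal and stemming. The following is a minimal sketch of both, not the tooling used in the talk, which the slides do not name. The stop word list, the suffix list, and all function names are hypothetical, and the suffix stripper is a crude stand-in: the stems on slide 15 (execut, arbitrari) look like the output of a Porter-style stemmer, which this sketch does not reproduce.

```python
import re

# Hypothetical, deliberately tiny stop word list; a real pipeline would use a fuller one.
STOP_WORDS = {"a", "an", "and", "in", "of", "the", "to", "via", "other", "possibly"}

# Crude suffix list standing in for a real stemmer (assumption, not the talk's stemmer).
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens (hyphens are kept)."""
    return re.findall(r"[a-z][a-z\-]*", text.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def crude_stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, remove stop words, then stem what is left."""
    return [crude_stem(t) for t in remove_stop_words(tokenize(text))]
```

Calling `preprocess` on a CVE summary returns the list of stemmed tokens that would feed the topic model.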
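
Slide 14 treats each CVE as a bag of words and uses post-hoc topic probabilities to find the fraction of CVEs about a topic in a given year. One plausible reading is sketched below: average each topic's per-document probability over the CVEs published in that year. The averaging rule, the input layout, and the function names are assumptions. The per-document probabilities would come from an LDA implementation that is not shown here.

```python
from collections import Counter, defaultdict


def bag_of_words(tokens):
    """Count tokens; the order of the tokens does not affect the result."""
    return Counter(tokens)


def topic_share_by_year(doc_years, doc_topic_probs):
    """Average each topic's probability over the documents of each year.

    doc_years: one publication year per document.
    doc_topic_probs: one {topic: probability} dict per document (assumed layout).
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = Counter(doc_years)
    for year, probs in zip(doc_years, doc_topic_probs):
        for topic, p in probs.items():
            sums[year][topic] += p
    return {
        year: {topic: total / counts[year] for topic, total in topics.items()}
        for year, topics in sums.items()
    }
```

An alternative design would assign each CVE to its single most probable topic and count documents; the slides do not say which variant was used.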
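
The 2000 and 2009 shares on slide 17 can be turned into a ranked list of rising and falling topics. The sketch below computes the change in percentage points for each topic. The numbers are copied from slide 17. The names `TRENDS` and `point_change` are hypothetical, and ranking by point change is an illustration rather than an analysis from the talk.

```python
# (2000 share, 2009 share) in percent, copied from slide 17.
TRENDS = {
    "Application Servers": (1, 5),
    "Arbitrary Code (PHP)": (0, 2),
    "Buffer Overflow": (19, 11),
    "Cross-Site Scripting": (0, 10),
    "Link Resolution": (6, 1),
    "Privilege Escalation": (12, 2),
    "Resource Management": (14, 10),
    "SQL Injection": (1, 10),
}


def point_change(trends):
    """Return (topic, change in percentage points), largest rise first."""
    changes = [(topic, end - start) for topic, (start, end) in trends.items()]
    return sorted(changes, key=lambda item: item[1], reverse=True)
```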
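
Slides 18 to 20 split a summary of the form “x allows someone to y” into a cause (x) and an impact (y). A minimal sketch of such a split with a regular expression follows. The pattern and the function name are assumptions. Real CVE summaries vary more than this pattern covers, and the slides do not show how the talk's scripts perform the split.

```python
import re

# Assumed pattern for "x allows someone to y"; the named group captures "someone".
ALLOWS = re.compile(r",?\s*(?:which\s+)?allows?\s+(?P<who>.+?)\s+to\s+")


def split_cause_impact(summary):
    """Return (cause, impact) around the first "allows ... to", or None if absent."""
    match = ALLOWS.search(summary)
    if match is None:
        return None
    cause = summary[: match.start()].strip()
    impact = summary[match.end():].strip().rstrip(".")
    return cause, impact
```

For the BEA WebLogic example, the cause is everything up to "removes entitlement policies for other content portlets" and the impact is "bypass intended access restrictions".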
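
Slides 26 to 28 report precision and recall for LDA topics against the manual CWE classification. The sketch below applies the standard definitions to one label at a time, assuming each CVE has exactly one predicted topic and one CWE-derived label expressed in the same vocabulary. How the talk maps topics to CWEs is not shown in the slides, so the input format and the function name are hypothetical.

```python
def precision_recall(predicted, actual, label):
    """Compute (precision, recall) for one label from two parallel label lists.

    precision = TP / (TP + FP), recall = TP / (TP + FN); both are 0.0 when undefined.
    """
    pairs = list(zip(predicted, actual))
    true_pos = sum(1 for p, a in pairs if p == label and a == label)
    false_pos = sum(1 for p, a in pairs if p == label and a != label)
    false_neg = sum(1 for p, a in pairs if p != label and a == label)
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall
```

Multiplying both values by 100 gives percentages comparable in form to those on slide 28.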
