Security testing and reviewing efforts are a necessity for software projects, but are time-consuming and expensive to apply. Identifying vulnerable code supports decision-making during all phases of software development. An approach for identifying vulnerable code is to identify its attack surface, the sum of all paths for untrusted data into and out of a system. Identifying the code that lies on the attack surface requires expertise and significant manual effort. This paper proposes an automated technique to empirically approximate attack surfaces through the analysis of stack traces. We hypothesize that stack traces from user-initiated crashes have several desirable attributes for measuring attack surfaces. The goal of this research is to aid software engineers in prioritizing security efforts by approximating the attack surface of a system via stack trace analysis. In a trial on Windows 8, the attack surface approximation selected 48.4% of the binaries and contained 94.6% of known vulnerabilities. Compared with vulnerability prediction models (VPMs) run on the entire codebase, VPMs run on the attack surface approximation improved recall from .07 to .1 for binaries and from .02 to .05 for source files. Precision remained at .5 for binaries, while improving from .5 to .69 for source files.
Approximating Attack Surfaces with Stack Traces [ICSE 15]
1. Approximating Attack Surfaces with Stack Traces
Christopher Theisen†, Kim Herzig‡, Patrick Morrison†, Brendan Murphy‡, Laurie Williams†
†North Carolina State University
‡Microsoft Research, Cambridge, UK
4. Before we start…
What is the “Attack Surface” of a system?
…easy to say, hard to define (practically).
The (OWASP) Attack Surface of an application is [1]:
1. …paths into and out of the application
2. the code that protects these paths
3. all valuable data used in the application
4. the code that protects that data
Ex. an early approximation of the attack surface – Manadhata [2]: only covers API entry points
[1] https://www.owasp.org/index.php?title=Attack_Surface_Analysis_Cheat_Sheet&oldid=156006
[2] Manadhata, P., Wing, J., Flynn, M., & McQueen, M. (2006, October). Measuring the attack surfaces of two FTP daemons. In Proceedings of the 2nd ACM Workshop on Quality of Protection (pp. 3-10). ACM.
5. Our goal is to aid software engineers in prioritizing security efforts by approximating the attack surface of a system via stack trace analysis.
6. Proposed Solution
Stack traces represent user activity that puts the system under stress
There’s a defect of some sort; does it have security implications?
Stack traces may localize security flaws
Crashes caused by user activity
Bad input that was handled improperly, et cetera
Crashes are a DoS attack by definition; you brought the service or
system down!
Hardware crashes are excluded
7. Research Questions
RQ1: How effectively can stack traces be used to approximate the attack surface of a system?
RQ2: Can the performance of vulnerability prediction be improved by limiting the prediction space to the approximated attack surface?
8. Overview
Catalog all code that appears on stack traces
11. Data Sources
[4] "Description of the Dr. Watson for Windows," Microsoft Corporation, [Online]. Available: http://support.microsoft.com/kb/308538/en-us.
12. Attack Surface Construction (RQ1)
Each crash record provides: data source, crash ID, binary (4,000+ distinct), filename (100,000+ distinct), and function (10,000,000+ distinct)
Crashes provide the binary and function for each stack frame, for example:
foo!foobarDeviceQueueRequest+0x68
foo!fooDeviceSetup+0x72
foo!fooAllDone+0xA8
bar!barDeviceQueueRequest+0xB6
bar!barDeviceSetup+0x08
bar!barAllDone+0xFF
center!processAction+0x1034
center!dontDoAnything+0x1030
A small parsing sketch follows below.
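As a concrete illustration, here is a minimal Python sketch (not the authors' tooling) that parses frames in the binary!function+0xOFFSET format shown above and catalogs the binaries and functions seen on any trace; the frame strings come from this slide, and everything else is hypothetical.

import re
from collections import defaultdict

# Frames look like "binary!function+0xOFFSET", as in the examples above.
FRAME_RE = re.compile(r"^(?P<binary>[^!]+)!(?P<function>[^+]+)\+0x[0-9A-Fa-f]+$")

def catalog_frames(stack_traces):
    """Return {binary: set of functions} for every frame seen on any trace."""
    surface = defaultdict(set)
    for trace in stack_traces:
        for frame in trace:
            match = FRAME_RE.match(frame.strip())
            if match:  # skip frames that fail to parse (e.g. missing symbols)
                surface[match.group("binary")].add(match.group("function"))
    return surface

# Example with two of the frames shown above:
traces = [["foo!foobarDeviceQueueRequest+0x68", "bar!barDeviceSetup+0x08"]]
print(dict(catalog_frames(traces)))  # {'foo': {...}, 'bar': {...}}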
13. Results (RQ1)
                  Fuzzing    User-induced crashes
%binaries           0.9%        48.4%
%vulnerabilities   14.9%        94.6%
Microsoft targets fuzzing towards high-risk modules
Targeting different crashes gets different results
We are covering the majority of vulnerabilities seen!
A small sketch of the coverage computation follows below.
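The two percentages can be viewed as simple set coverage. The sketch below is illustrative only; the set names are hypothetical, and for simplicity it counts vulnerable binaries rather than individual vulnerabilities as the study does.

def coverage(surface_binaries, all_binaries, vulnerable_binaries):
    # Fraction of shipped binaries selected by the approximation, and
    # fraction of known-vulnerable binaries that the approximation covers.
    pct_binaries = len(surface_binaries & all_binaries) / len(all_binaries)
    pct_vulnerable = len(surface_binaries & vulnerable_binaries) / len(vulnerable_binaries)
    return pct_binaries, pct_vulnerable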
14. Prediction Models (RQ2)
Zimmermann et al. study [3]:
“We believe that the key for [improving prediction] is by: (1) developing new prediction techniques that deal with the ‘needle in the haystack’ problem, and (2) finding new metrics that deal with the unique characteristics of vulnerabilities and attacks.”
Stack traces point to where flawed code lives!
[3] T. Zimmermann, N. Nagappan and L. Williams, "Searching for a Needle in a Haystack: Predicting Security Vulnerabilities for Windows Vista," in Software Testing, Verification and Validation (ICST), 2010 Third International Conference on, 2010.
16. Prediction Model Construction (RQ2)
Replicated the VPM from the Windows Vista study [3]
Run the VPM with all files considered as possibly vulnerable
Repeat, but remove code not found on stack traces
The Vulnerability Prediction Model (VPM) uses 29 metrics in 6 categories, drawn from CODEMINE data [5]:
Churn
Dependency
Legacy
Size
Defects
Pre-release vulnerabilities
A minimal training sketch follows below.
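A minimal training sketch, assuming scikit-learn and a per-file metrics table; this is illustrative only, not the replicated Vista model, and the DataFrame and column names are hypothetical.

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score, recall_score

def run_vpm(files: pd.DataFrame, metric_cols, label_col="is_vulnerable"):
    # Split the files, fit a classifier on the metric columns, and report
    # precision and recall on the held-out files.
    X_train, X_test, y_train, y_test = train_test_split(
        files[metric_cols], files[label_col], test_size=0.33, random_state=0)
    model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
    predictions = model.predict(X_test)
    return precision_score(y_test, predictions), recall_score(y_test, predictions)

# Run once over all files, then only over files seen on stack traces:
# run_vpm(all_files, metric_cols)
# run_vpm(all_files[all_files["on_attack_surface"]], metric_cols)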
[5] J. Czerwonka, N. Nagappan, W. Schulte and B. Murphy, "CODEMINE: Building a Software Development Data Analytics Platform at Microsoft,"
Software, IEEE, vol. 30, no. 4, pp. 64--71, 2013.
17. Results (RQ2)
Comparing the VPM run on all files vs. just attack surface files…
Precision improved from 0.50 to 0.69
Recall improved from 0.02 to 0.05
A statistically significant improvement, but not yet a practically significant one
A small hypothetical example of this precision/recall pattern follows below.
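The arithmetic below uses invented confusion-matrix counts (not the study's data) purely to illustrate how heavy class imbalance yields moderate precision alongside very low recall, the pattern reported above.

# Hypothetical counts chosen only to illustrate class imbalance.
tp, fp, fn = 5, 5, 95
precision = tp / (tp + fp)   # 0.50
recall = tp / (tp + fn)      # 0.05
print(precision, recall)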
18. Problems with Precision [6]
Are low precision predictors unsatisfactory?
No. Low precision is fine in several situations:
When the cost of missing the target is prohibitively expensive.
When only a small fraction [of] the data is returned.
When there is little or no cost in checking false alarms.
Recall and precision like to compete, especially on highly imbalanced datasets.
This seems appropriate for security flaws!
[6] Tim Menzies, Alex Dekhtyar, Justin Distefano, and Jeremy Greenwald. 2007. Problems with Precision: A Response to "Comments on 'Data Mining Static Code Attributes to Learn Defect Predictors'". IEEE Trans. Softw. Eng. 33, 9 (September 2007).
20. Lessons Learned - Visualizations
21. Limitations
Stack traces are a good metric for Windows 8…
Results don’t necessarily generalize:
Different levels of granularity? (File/Function)
Smaller projects? Open source?
Not operating systems?
What else can we do with VPMs?
Other learners?
Oversampling and undersampling?
22. Future Work
What else can we do with stack traces?
Frequency of appearance
Dependencies, not the entities themselves
How many stack traces are required?
Sliding window; how does the approximation change over time?
Additional metrics
Tool development: a visualization plugin for IDEs; does it actually help?