SlideShare a Scribd company logo
1 of 28
Risk-Based Attack Surface Approximation:
How Much Data is Enough?
Chris Theisen, Brendan Murphy, Kim Herzig, Laurie Williams
North Carolina State University
Microsoft Research
Introduction
What is the “Attack Surface”? Quoting the Open Web Application
Security Project…
• All paths for data and commands in a software system
• The data that travels these paths
• The code that implements and protects both
Concept used for security effort prioritization.
3
Introduction | Background | Methodology | Results | Conclusion
4
Crashes represent activity that put the system under
stress.
Stack Traces tell us what happened.
foo!foobarDeviceQueueRequest+0x68
foo!fooDeviceSetup+0x72
foo!fooAllDone+0xA8
bar!barDeviceQueueRequest+0xB6
bar!barDeviceSetup+0x08
bar!barAllDone+0xFF
center!processAction+0x1034
center!dontDoAnything+0x1030
Risk-Based Attack Surface Approximation
(RASA)
Introduction | Background | Methodology | Results | Conclusion
• Previous RASA study used tens of millions of crashes.
• Previous study was per binary.
Previously…
5
[SEIP ‘15] Chris Theisen, Kim Herzig, Pat Morrison, Brendan Murphy, and Laurie Williams, “Approximating Attack Surfaces with Stack Traces”, in
Companion Proceedings of the 37th International Conference on Software Engineering (2015).
[SEIP ‘15] Crashes
%binaries 48.4%
%vulnerabilities 94.6%
Introduction | Background | Methodology | Results | Conclusion
• Previous RASA study used tens of millions of crashes.
• Previous study was per binary.
Previously…
6
[SEIP ‘15] Chris Theisen, Kim Herzig, Pat Morrison, Brendan Murphy, and Laurie Williams, “Approximating Attack Surfaces with Stack Traces”, in
Companion Proceedings of the 37th International Conference on Software Engineering (2015).
[SEIP ‘15] Crashes
%binaries 48.4%
%vulnerabilities 94.6%
Great! All done, right?
Introduction | Background | Methodology | Results | Conclusion
Practitioner Problems
• Previous RASA study used tens of millions of crashes.
• Previous study was per binary.
7
Introduction | Background | Methodology | Results | Conclusion
Practitioner Problems
• Previous RASA study used tens of millions of crashes.
• Previous study was per binary.
• Practitioners had some issues with it…
– “Binary prioritization isn’t actionable.”
8
Introduction | Background | Methodology | Results | Conclusion
Practitioner Problems
• Previous RASA study used tens of millions of crashes.
• Previous study was per binary.
• Practitioners had some issues with it…
– “Binary prioritization isn’t actionable.”
– “We don’t have that much data!”
9
Introduction | Background | Methodology | Results | Conclusion
Practitioner Problems
• Previous RASA study used tens of millions of crashes.
• Previous study was per binary.
• Practitioners had some issues with it…
– “Binary prioritization isn’t actionable.”
– “We don’t have that much data!”
– “We don’t store every crash we received, we don’t
see the value in that.”
10
Introduction | Background | Methodology | Results | Conclusion
Practitioner Problems
• Previous RASA study used tens of millions of crashes.
• Previous study was per binary.
• Practitioners had some issues with it…
– “Binary prioritization isn’t actionable.”
– “We don’t have that much data!”
– “We don’t store every crash we received, we don’t
see the value in that.”
– “We don’t have historical vulnerabilities to use as a
goodness measure.”
11
Introduction | Background | Methodology | Results | Conclusion
Research Questions
• RQ1: Can the RASA approach be implemented at the
source code file level with actionable results?
• RQ2: How does random sampling of crash dump stack
traces effect RASA?
12
Introduction | Background | Methodology | Results | Conclusion
Data Sources
• Mozilla Firefox
– ~1M crashes
– Vulnerability data from Mozilla Security
Blog and bug tracker
• Windows 8.1
– ~9M crashes
– Vulnerability data from internal data
sources
13
Introduction | Background | Methodology | Results | Conclusion
Methodology - RASA
14
Introduction | Background | Methodology | Results | Conclusion
Methodology - RASA
15
Introduction | Background | Methodology | Results | Conclusion
Methodology - RASA
16
Introduction | Background | Methodology | Results | Conclusion
Methodology - Sampling
17
10% of…
Introduction | Background | Methodology | Results | Conclusion
Methodology - Sampling
18
10% of…
20% of…
Introduction | Background | Methodology | Results | Conclusion
Methodology - Sampling
19
10% of…
20% of…
• Sample at each “level”
• Record stdev of files,
vulnerabilities covered
Introduction | Background | Methodology | Results | Conclusion
20
12%
13%
14%
15%
16%
17%
70%
71%
72%
73%
74%
75%
Random Sample Size
Introduction | Background | Methodology | Results | Conclusion
Files
Vulnerabilities
10%
12%
14%
16%
18%
20%
22%
24%
26%
30%
32%
34%
36%
38%
40%
42%
44%
46%
Random Sample Size
21
Introduction | Background | Methodology | Results | Conclusion
Files
Vulnerabilities
Why Does Sampling Work?
• Crashes tend not to happen in isolation.
– If something crashes once, it will likely crash again.
• For Firefox, only 6 files in the data set with a vulnerability
had only one crash occurrence.
– Against ~300 vulnerable files, 50,000 total files
• If foo.cpp crashes many times, random sampling unlikely
to remove all foo.cpp’s from the dataset.
22
Introduction | Background | Methodology | Results | Conclusion
Future Work
• We have a list of vulnerable files; now what?
– Further prioritization to assist developers.
• We’re looking at:
– How the attack surface changes over time.
– How the complexity of the attack surface predicts
vulnerabilities.
– How proximity to the boundary of a software
system predicts vulnerabilities.
23
Introduction | Background | Methodology | Results | Conclusion
Conclusions
• “Binary prioritization isn’t actionable.”
– RASA can prioritize security effort effectively at the
source code file level.
24
Introduction | Background | Methodology | Results | Conclusion
Conclusions
• “Binary prioritization isn’t actionable.”
– RASA can prioritize security effort effectively at the
source code file level.
• “We don’t have that much data!”
– Orders of magnitude less data required compared
to previous studies.
25
Introduction | Background | Methodology | Results | Conclusion
Conclusions
• “We don’t store every crash we received, we don’t see
the value in that.”
– A naïve approach like random sampling still works.
26
Introduction | Background | Methodology | Results | Conclusion
Conclusions
• “We don’t store every crash we received, we don’t see
the value in that.”
– A naïve approach like random sampling still works.
• “We don’t have historical vulnerabilities to use as a
goodness measure.”
– Satisfied previous complaints with less data, naïve
sampling; evidence it will work on new systems.
27
Introduction | Background | Methodology | Results | Conclusion
28
foo!foobarDeviceQueueRequest+0x68
foo!fooDeviceSetup+0x72
foo!fooAllDone+0xA8
bar!barDeviceQueueRequest+0xB6
bar!barDeviceSetup+0x08
bar!barAllDone+0xFF
crtheise@ncsu.edu
@theisencr
theisencr.github.io
Expected Graduation: May 2018
Data Science, Security Analytics,
Security Education

More Related Content

Similar to Risk-Based Attack Surface Approximation: How Much Data is Enough? [ICSE - SEIP 2017]

Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...confluent
 
Using Static Binary Analysis To Find Vulnerabilities And Backdoors in Firmware
Using Static Binary Analysis To Find Vulnerabilities And Backdoors in FirmwareUsing Static Binary Analysis To Find Vulnerabilities And Backdoors in Firmware
Using Static Binary Analysis To Find Vulnerabilities And Backdoors in FirmwareLastline, Inc.
 
Software Analytics - Achievements and Challenges
Software Analytics - Achievements and ChallengesSoftware Analytics - Achievements and Challenges
Software Analytics - Achievements and ChallengesTao Xie
 
Supporting image-based meta-analysis with NIDM: Standardized reporting of neu...
Supporting image-based meta-analysis with NIDM: Standardized reporting of neu...Supporting image-based meta-analysis with NIDM: Standardized reporting of neu...
Supporting image-based meta-analysis with NIDM: Standardized reporting of neu...Camille Maumet
 
Automating Speed: A Proven Approach to Preventing Performance Regressions in ...
Automating Speed: A Proven Approach to Preventing Performance Regressions in ...Automating Speed: A Proven Approach to Preventing Performance Regressions in ...
Automating Speed: A Proven Approach to Preventing Performance Regressions in ...HostedbyConfluent
 
Building an application security program
Building an application security programBuilding an application security program
Building an application security programOutpost24
 
Delivering Security Insights with Data Analytics and Visualization
Delivering Security Insights with Data Analytics and VisualizationDelivering Security Insights with Data Analytics and Visualization
Delivering Security Insights with Data Analytics and VisualizationRaffael Marty
 
BlackHat Presentation - Lies and Damn Lies: Getting past the Hype of Endpoint...
BlackHat Presentation - Lies and Damn Lies: Getting past the Hype of Endpoint...BlackHat Presentation - Lies and Damn Lies: Getting past the Hype of Endpoint...
BlackHat Presentation - Lies and Damn Lies: Getting past the Hype of Endpoint...Mike Spaulding
 
Machine Learning Application Development
Machine Learning Application DevelopmentMachine Learning Application Development
Machine Learning Application DevelopmentLARCA UPC
 
How to break web applications
How to break web applicationsHow to break web applications
How to break web applicationsDinis Cruz
 
Monitoring Application Attack Surface and Integrating Security into DevOps Pi...
Monitoring Application Attack Surface and Integrating Security into DevOps Pi...Monitoring Application Attack Surface and Integrating Security into DevOps Pi...
Monitoring Application Attack Surface and Integrating Security into DevOps Pi...Denim Group
 
501 ch 8 risk management tools
501 ch 8 risk management tools501 ch 8 risk management tools
501 ch 8 risk management toolsgocybersec
 
A Large-Scale Empirical Comparison of Static and DynamicTest Case Prioritizat...
A Large-Scale Empirical Comparison of Static and DynamicTest Case Prioritizat...A Large-Scale Empirical Comparison of Static and DynamicTest Case Prioritizat...
A Large-Scale Empirical Comparison of Static and DynamicTest Case Prioritizat...Kevin Moran
 
Webinar: Five Problems Facing Business-Critical NFS Deployments
Webinar: Five Problems Facing Business-Critical NFS DeploymentsWebinar: Five Problems Facing Business-Critical NFS Deployments
Webinar: Five Problems Facing Business-Critical NFS DeploymentsStorage Switzerland
 
Defcon 22-wesley-mc grew-instrumenting-point-of-sale-malware
Defcon 22-wesley-mc grew-instrumenting-point-of-sale-malwareDefcon 22-wesley-mc grew-instrumenting-point-of-sale-malware
Defcon 22-wesley-mc grew-instrumenting-point-of-sale-malwareDaveEdwards12
 
Policy 2012 presentation
Policy 2012 presentationPolicy 2012 presentation
Policy 2012 presentationbdemchak
 
RISELab: Enabling Intelligent Real-Time Decisions keynote by Ion Stoica
RISELab: Enabling Intelligent Real-Time Decisions keynote by Ion StoicaRISELab: Enabling Intelligent Real-Time Decisions keynote by Ion Stoica
RISELab: Enabling Intelligent Real-Time Decisions keynote by Ion StoicaSpark Summit
 
RISELab:Enabling Intelligent Real-Time Decisions
RISELab:Enabling Intelligent Real-Time DecisionsRISELab:Enabling Intelligent Real-Time Decisions
RISELab:Enabling Intelligent Real-Time DecisionsJen Aman
 

Similar to Risk-Based Attack Surface Approximation: How Much Data is Enough? [ICSE - SEIP 2017] (20)

Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
 
Using Static Binary Analysis To Find Vulnerabilities And Backdoors in Firmware
Using Static Binary Analysis To Find Vulnerabilities And Backdoors in FirmwareUsing Static Binary Analysis To Find Vulnerabilities And Backdoors in Firmware
Using Static Binary Analysis To Find Vulnerabilities And Backdoors in Firmware
 
My flings with data analysis
My flings with data analysisMy flings with data analysis
My flings with data analysis
 
Software Analytics - Achievements and Challenges
Software Analytics - Achievements and ChallengesSoftware Analytics - Achievements and Challenges
Software Analytics - Achievements and Challenges
 
Supporting image-based meta-analysis with NIDM: Standardized reporting of neu...
Supporting image-based meta-analysis with NIDM: Standardized reporting of neu...Supporting image-based meta-analysis with NIDM: Standardized reporting of neu...
Supporting image-based meta-analysis with NIDM: Standardized reporting of neu...
 
Automating Speed: A Proven Approach to Preventing Performance Regressions in ...
Automating Speed: A Proven Approach to Preventing Performance Regressions in ...Automating Speed: A Proven Approach to Preventing Performance Regressions in ...
Automating Speed: A Proven Approach to Preventing Performance Regressions in ...
 
Building an application security program
Building an application security programBuilding an application security program
Building an application security program
 
Delivering Security Insights with Data Analytics and Visualization
Delivering Security Insights with Data Analytics and VisualizationDelivering Security Insights with Data Analytics and Visualization
Delivering Security Insights with Data Analytics and Visualization
 
BlackHat Presentation - Lies and Damn Lies: Getting past the Hype of Endpoint...
BlackHat Presentation - Lies and Damn Lies: Getting past the Hype of Endpoint...BlackHat Presentation - Lies and Damn Lies: Getting past the Hype of Endpoint...
BlackHat Presentation - Lies and Damn Lies: Getting past the Hype of Endpoint...
 
Machine Learning Application Development
Machine Learning Application DevelopmentMachine Learning Application Development
Machine Learning Application Development
 
How to break web applications
How to break web applicationsHow to break web applications
How to break web applications
 
Monitoring Application Attack Surface and Integrating Security into DevOps Pi...
Monitoring Application Attack Surface and Integrating Security into DevOps Pi...Monitoring Application Attack Surface and Integrating Security into DevOps Pi...
Monitoring Application Attack Surface and Integrating Security into DevOps Pi...
 
Experience Sharing on School Pentest Project
Experience Sharing on School Pentest ProjectExperience Sharing on School Pentest Project
Experience Sharing on School Pentest Project
 
501 ch 8 risk management tools
501 ch 8 risk management tools501 ch 8 risk management tools
501 ch 8 risk management tools
 
A Large-Scale Empirical Comparison of Static and DynamicTest Case Prioritizat...
A Large-Scale Empirical Comparison of Static and DynamicTest Case Prioritizat...A Large-Scale Empirical Comparison of Static and DynamicTest Case Prioritizat...
A Large-Scale Empirical Comparison of Static and DynamicTest Case Prioritizat...
 
Webinar: Five Problems Facing Business-Critical NFS Deployments
Webinar: Five Problems Facing Business-Critical NFS DeploymentsWebinar: Five Problems Facing Business-Critical NFS Deployments
Webinar: Five Problems Facing Business-Critical NFS Deployments
 
Defcon 22-wesley-mc grew-instrumenting-point-of-sale-malware
Defcon 22-wesley-mc grew-instrumenting-point-of-sale-malwareDefcon 22-wesley-mc grew-instrumenting-point-of-sale-malware
Defcon 22-wesley-mc grew-instrumenting-point-of-sale-malware
 
Policy 2012 presentation
Policy 2012 presentationPolicy 2012 presentation
Policy 2012 presentation
 
RISELab: Enabling Intelligent Real-Time Decisions keynote by Ion Stoica
RISELab: Enabling Intelligent Real-Time Decisions keynote by Ion StoicaRISELab: Enabling Intelligent Real-Time Decisions keynote by Ion Stoica
RISELab: Enabling Intelligent Real-Time Decisions keynote by Ion Stoica
 
RISELab:Enabling Intelligent Real-Time Decisions
RISELab:Enabling Intelligent Real-Time DecisionsRISELab:Enabling Intelligent Real-Time Decisions
RISELab:Enabling Intelligent Real-Time Decisions
 

More from Chris Theisen

Public Key Cryptosystems and RSA
Public Key Cryptosystems and RSAPublic Key Cryptosystems and RSA
Public Key Cryptosystems and RSAChris Theisen
 
Prioritizing Security Efforts with a Risk-Based Attack Surface Approximation
Prioritizing Security Efforts with a Risk-Based Attack Surface ApproximationPrioritizing Security Efforts with a Risk-Based Attack Surface Approximation
Prioritizing Security Efforts with a Risk-Based Attack Surface ApproximationChris Theisen
 
Software Security Education at Scale
Software Security Education at ScaleSoftware Security Education at Scale
Software Security Education at ScaleChris Theisen
 
Automated Attack Surface Approximation [FSE - SRC 2015]
Automated Attack Surface Approximation [FSE - SRC 2015]Automated Attack Surface Approximation [FSE - SRC 2015]
Automated Attack Surface Approximation [FSE - SRC 2015]Chris Theisen
 
Attack Surface Analytics [ISSRE-DSW 15]
Attack Surface Analytics [ISSRE-DSW 15]Attack Surface Analytics [ISSRE-DSW 15]
Attack Surface Analytics [ISSRE-DSW 15]Chris Theisen
 
Science of Security Industry Day - October 2015
Science of Security Industry Day - October 2015Science of Security Industry Day - October 2015
Science of Security Industry Day - October 2015Chris Theisen
 
Approximating Attack Surfaces with Stack Traces [ICSE 15]
Approximating Attack Surfaces with Stack Traces [ICSE 15]Approximating Attack Surfaces with Stack Traces [ICSE 15]
Approximating Attack Surfaces with Stack Traces [ICSE 15]Chris Theisen
 

More from Chris Theisen (7)

Public Key Cryptosystems and RSA
Public Key Cryptosystems and RSAPublic Key Cryptosystems and RSA
Public Key Cryptosystems and RSA
 
Prioritizing Security Efforts with a Risk-Based Attack Surface Approximation
Prioritizing Security Efforts with a Risk-Based Attack Surface ApproximationPrioritizing Security Efforts with a Risk-Based Attack Surface Approximation
Prioritizing Security Efforts with a Risk-Based Attack Surface Approximation
 
Software Security Education at Scale
Software Security Education at ScaleSoftware Security Education at Scale
Software Security Education at Scale
 
Automated Attack Surface Approximation [FSE - SRC 2015]
Automated Attack Surface Approximation [FSE - SRC 2015]Automated Attack Surface Approximation [FSE - SRC 2015]
Automated Attack Surface Approximation [FSE - SRC 2015]
 
Attack Surface Analytics [ISSRE-DSW 15]
Attack Surface Analytics [ISSRE-DSW 15]Attack Surface Analytics [ISSRE-DSW 15]
Attack Surface Analytics [ISSRE-DSW 15]
 
Science of Security Industry Day - October 2015
Science of Security Industry Day - October 2015Science of Security Industry Day - October 2015
Science of Security Industry Day - October 2015
 
Approximating Attack Surfaces with Stack Traces [ICSE 15]
Approximating Attack Surfaces with Stack Traces [ICSE 15]Approximating Attack Surfaces with Stack Traces [ICSE 15]
Approximating Attack Surfaces with Stack Traces [ICSE 15]
 

Recently uploaded

在线办理UM毕业证迈阿密大学毕业证成绩单留信学历认证
在线办理UM毕业证迈阿密大学毕业证成绩单留信学历认证在线办理UM毕业证迈阿密大学毕业证成绩单留信学历认证
在线办理UM毕业证迈阿密大学毕业证成绩单留信学历认证nhjeo1gg
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degreeyuu sss
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our WorldEduminds Learning
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...Amil Baba Dawood bangali
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Boston Institute of Analytics
 
办美国加州大学伯克利分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
办美国加州大学伯克利分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree办美国加州大学伯克利分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
办美国加州大学伯克利分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degreeyuu sss
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Cantervoginip
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queensdataanalyticsqueen03
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectBoston Institute of Analytics
 
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhhThiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhhYasamin16
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPTBoston Institute of Analytics
 
Business Analytics using Microsoft Excel
Business Analytics using Microsoft ExcelBusiness Analytics using Microsoft Excel
Business Analytics using Microsoft Excelysmaelreyes
 
SWOT Analysis Slides Powerpoint Template.pptx
SWOT Analysis Slides Powerpoint Template.pptxSWOT Analysis Slides Powerpoint Template.pptx
SWOT Analysis Slides Powerpoint Template.pptxviniciusperissetr
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max PrincetonTimothy Spann
 
MK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxMK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxUnduhUnggah1
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 217djon017
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改yuu sss
 
Decoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectDecoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectBoston Institute of Analytics
 

Recently uploaded (20)

在线办理UM毕业证迈阿密大学毕业证成绩单留信学历认证
在线办理UM毕业证迈阿密大学毕业证成绩单留信学历认证在线办理UM毕业证迈阿密大学毕业证成绩单留信学历认证
在线办理UM毕业证迈阿密大学毕业证成绩单留信学历认证
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our World
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
 
办美国加州大学伯克利分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
办美国加州大学伯克利分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree办美国加州大学伯克利分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
办美国加州大学伯克利分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 
Call Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort ServiceCall Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort Service
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis Project
 
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhhThiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhh
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
Business Analytics using Microsoft Excel
Business Analytics using Microsoft ExcelBusiness Analytics using Microsoft Excel
Business Analytics using Microsoft Excel
 
SWOT Analysis Slides Powerpoint Template.pptx
SWOT Analysis Slides Powerpoint Template.pptxSWOT Analysis Slides Powerpoint Template.pptx
SWOT Analysis Slides Powerpoint Template.pptx
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max Princeton
 
MK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxMK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docx
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
Decoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectDecoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis Project
 

Risk-Based Attack Surface Approximation: How Much Data is Enough? [ICSE - SEIP 2017]

  • 1. Risk-Based Attack Surface Approximation: How Much Data is Enough? Chris Theisen, Brendan Murphy, Kim Herzig, Laurie Williams North Carolina State University Microsoft Research
  • 2.
  • 3. Introduction What is the “Attack Surface”? Quoting the Open Web Application Security Project… • All paths for data and commands in a software system • The data that travels these paths • The code that implements and protects both Concept used for security effort prioritization. 3 Introduction | Background | Methodology | Results | Conclusion
  • 4. 4 Crashes represent activity that put the system under stress. Stack Traces tell us what happened. foo!foobarDeviceQueueRequest+0x68 foo!fooDeviceSetup+0x72 foo!fooAllDone+0xA8 bar!barDeviceQueueRequest+0xB6 bar!barDeviceSetup+0x08 bar!barAllDone+0xFF center!processAction+0x1034 center!dontDoAnything+0x1030 Risk-Based Attack Surface Approximation (RASA) Introduction | Background | Methodology | Results | Conclusion
  • 5. • Previous RASA study used tens of millions of crashes. • Previous study was per binary. Previously… 5 [SEIP ‘15] Chris Theisen, Kim Herzig, Pat Morrison, Brendan Murphy, and Laurie Williams, “Approximating Attack Surfaces with Stack Traces”, in Companion Proceedings of the 37th International Conference on Software Engineering (2015). [SEIP ‘15] Crashes %binaries 48.4% %vulnerabilities 94.6% Introduction | Background | Methodology | Results | Conclusion
  • 6. • Previous RASA study used tens of millions of crashes. • Previous study was per binary. Previously… 6 [SEIP ‘15] Chris Theisen, Kim Herzig, Pat Morrison, Brendan Murphy, and Laurie Williams, “Approximating Attack Surfaces with Stack Traces”, in Companion Proceedings of the 37th International Conference on Software Engineering (2015). [SEIP ‘15] Crashes %binaries 48.4% %vulnerabilities 94.6% Great! All done, right? Introduction | Background | Methodology | Results | Conclusion
  • 7. Practitioner Problems • Previous RASA study used tens of millions of crashes. • Previous study was per binary. 7 Introduction | Background | Methodology | Results | Conclusion
  • 8. Practitioner Problems • Previous RASA study used tens of millions of crashes. • Previous study was per binary. • Practitioners had some issues with it… – “Binary prioritization isn’t actionable.” 8 Introduction | Background | Methodology | Results | Conclusion
  • 9. Practitioner Problems • Previous RASA study used tens of millions of crashes. • Previous study was per binary. • Practitioners had some issues with it… – “Binary prioritization isn’t actionable.” – “We don’t have that much data!” 9 Introduction | Background | Methodology | Results | Conclusion
  • 10. Practitioner Problems • Previous RASA study used tens of millions of crashes. • Previous study was per binary. • Practitioners had some issues with it… – “Binary prioritization isn’t actionable.” – “We don’t have that much data!” – “We don’t store every crash we received, we don’t see the value in that.” 10 Introduction | Background | Methodology | Results | Conclusion
  • 11. Practitioner Problems • Previous RASA study used tens of millions of crashes. • Previous study was per binary. • Practitioners had some issues with it… – “Binary prioritization isn’t actionable.” – “We don’t have that much data!” – “We don’t store every crash we received, we don’t see the value in that.” – “We don’t have historical vulnerabilities to use as a goodness measure.” 11 Introduction | Background | Methodology | Results | Conclusion
  • 12. Research Questions • RQ1: Can the RASA approach be implemented at the source code file level with actionable results? • RQ2: How does random sampling of crash dump stack traces effect RASA? 12 Introduction | Background | Methodology | Results | Conclusion
  • 13. Data Sources • Mozilla Firefox – ~1M crashes – Vulnerability data from Mozilla Security Blog and bug tracker • Windows 8.1 – ~9M crashes – Vulnerability data from internal data sources 13 Introduction | Background | Methodology | Results | Conclusion
  • 14. Methodology - RASA 14 Introduction | Background | Methodology | Results | Conclusion
  • 15. Methodology - RASA 15 Introduction | Background | Methodology | Results | Conclusion
  • 16. Methodology - RASA 16 Introduction | Background | Methodology | Results | Conclusion
  • 17. Methodology - Sampling 17 10% of… Introduction | Background | Methodology | Results | Conclusion
  • 18. Methodology - Sampling 18 10% of… 20% of… Introduction | Background | Methodology | Results | Conclusion
  • 19. Methodology - Sampling 19 10% of… 20% of… • Sample at each “level” • Record stdev of files, vulnerabilities covered Introduction | Background | Methodology | Results | Conclusion
  • 20. 20 12% 13% 14% 15% 16% 17% 70% 71% 72% 73% 74% 75% Random Sample Size Introduction | Background | Methodology | Results | Conclusion Files Vulnerabilities
  • 21. 10% 12% 14% 16% 18% 20% 22% 24% 26% 30% 32% 34% 36% 38% 40% 42% 44% 46% Random Sample Size 21 Introduction | Background | Methodology | Results | Conclusion Files Vulnerabilities
  • 22. Why Does Sampling Work? • Crashes tend not to happen in isolation. – If something crashes once, it will likely crash again. • For Firefox, only 6 files in the data set with a vulnerability had only one crash occurrence. – Against ~300 vulnerable files, 50,000 total files • If foo.cpp crashes many times, random sampling unlikely to remove all foo.cpp’s from the dataset. 22 Introduction | Background | Methodology | Results | Conclusion
  • 23. Future Work • We have a list of vulnerable files; now what? – Further prioritization to assist developers. • We’re looking at: – How the attack surface changes over time. – How the complexity of the attack surface predicts vulnerabilities. – How proximity to the boundary of a software system predicts vulnerabilities. 23 Introduction | Background | Methodology | Results | Conclusion
  • 24. Conclusions • “Binary prioritization isn’t actionable.” – RASA can prioritize security effort effectively at the source code file level. 24 Introduction | Background | Methodology | Results | Conclusion
  • 25. Conclusions • “Binary prioritization isn’t actionable.” – RASA can prioritize security effort effectively at the source code file level. • “We don’t have that much data!” – Orders of magnitude less data required compared to previous studies. 25 Introduction | Background | Methodology | Results | Conclusion
  • 26. Conclusions • “We don’t store every crash we received, we don’t see the value in that.” – A naïve approach like random sampling still works. 26 Introduction | Background | Methodology | Results | Conclusion
  • 27. Conclusions • “We don’t store every crash we received, we don’t see the value in that.” – A naïve approach like random sampling still works. • “We don’t have historical vulnerabilities to use as a goodness measure.” – Satisfied previous complaints with less data, naïve sampling; evidence it will work on new systems. 27 Introduction | Background | Methodology | Results | Conclusion

Editor's Notes

  1. In 2010 paper, Tom Zimmermann compared finding vulnerabilities in code to finding “a needle in a haystack.” Based on my experience as a security engineer, it can be more like finding a needle in a field of them! Most difficult part of my job.
  2. One prioritization technique is to identify the attack surface. Commonly used for prioritization of effort, but typically based on the common knowledge of the team; imperfect.
  3. We developed approach called Risk-Based Attack Surface Approximation, or RASA… Hypothesis was that code that crashes shares properties with vulnerable code. Attempting to crash systems is one of the primary tools in forensics/red-team activity; we’re reverse-engineering what attackers do.
  4. So we evangelized this approach to industry days at NCSU, tech talks, etc. Got great feedback on making RASA actionable. I want to highlight a few of the words from the previous slide.
  5. When we follow up with, “well, just run it and see!” the response we got was… So we need to run studies at a lower level of granularity, with a lot less data, while still having historical vulnerabilities to compare against.
  6. Mine out individual code artifacts (files, in this case), from the stack traces in crash dumps.
  7. Map code mined from each crash to code in source control; use binary/function mapping for files, if files aren’t in crash
  8. The resultant pairing between code on crashes to source control code is our approximation of the attack surface. Simple by design; not language limited, works in multiple domains, just need crashes with stack traces. Highly flexible!
  9. To limit data, we do random sampling of crashes placed into the process at the first step. 10% of crashes, 20% of crashes, et cetera.
  10. Do multiple samples at each “level” (10%, 20%, etc), also look at how the attack surface changes from sample to sample at each sampling level. Trying to answer; does our approximation change greatly with different samplings of crash dump stack traces?
  11. Chart of percentage of files on the attack surface vs. vulnerabiltiies covered by attack surface. Vulnerabilities are 5 times as likely to be in code that crashes than not! Great place to start to run other tools, like static analysis, vulnerability prediction models, etc. Limits the work you need to do. Tiny variations, can use a quarter of the data available and our metrics change less than a percentage point with no change between samples.
  12. Comparison against previous study; we see a similar 2:1 ratio from vulnerabilities to files, so vulnerabilities are twice as dense in crashing code across all samples.
  13. Shorten or identify keywords
  14. Make a chart of this data