CrashLocator: Locating Crashing Faults Based on Crash Stacks (ISSTA 2014)

Rongxin Wu¹, Hongyu Zhang², Shing-Chi Cheung¹ and Sunghun Kim¹
¹The Hong Kong University of Science and Technology
²Microsoft Research
July 24, 2014
ISSTA 2014
Background
[Figure: the crash reporting pipeline. A software crash produces crash information with a crash stack; the crash reporting system groups reports into crash buckets, from which developers work on bug reports.]
Feedback from Mozilla Developers
• Locating crashing faults is hard
• Ad hoc approach
“… and look at the crash stack listed. It shows the line number
of the code, and then I go to the code and inspect it. If I am
unsure what it does I go to the second line of the stack and
code and inspect that, and so on and so forth …”
“Some crashes are hard to fix because it is not necessarily
indicative of the place where it crashes in the crash stack …”
“I use the top down method of following the crash backwards.”
“Sometimes it can be very difficult.”
Uncertain Fault Location
• The faulty function may not appear in the crash stack
About 33%–41% of crashing faults in Firefox
cannot be located in crash stacks!
[Figure: a crash stack over the program's call graph. The buggy code lies in a function that does not appear on the crash stack, away from the crash point.]
Spectrum-Based Fault Localization (Related Work)
• Tarantula (J. A. Jones et al., ICSE 2002; ASE 2005)
• Jaccard (R. Abreu et al., TAICPART-MUTATION 2007)
• Ochiai (R. Abreu et al., TAICPART-MUTATION 2007; S. Artzi et al., ISSTA 2010)
• …
• All of these require both passing traces and failing traces
Spectrum-Based Fault Localization
• Are these techniques applicable?
• Failing traces: collecting them requires instrumenting the product software, raising privacy concerns and performance overhead (C. Luk et al., PLDI 2005); in practice, only the crash stack (f1, f2, f3, …, fn) is available
• Passing traces: generating them from test cases has limited effectiveness (S. Artzi et al., ISSTA 2010)
Our Research Goal
How to help developers fix crashing faults?
– Locate crashing faults based on crash stack
Our technique: CrashLocator
• Targets locating faulty functions
• Needs no instrumentation
• Approximates failing traces
  – based on crash stacks
  – using static analysis techniques
• Ranks suspicious functions
  – without passing traces
  – based on characteristics of faulty functions
Approximate Failing Traces
• Basic Stack Expansion Algorithm
[Figure: expanding the crash stack A, B, C, D over the static call graph. Depth-1 adds E, J, M, N; depth-2 adds F, K, L; depth-3 adds G, H.]
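The basic expansion can be sketched as a breadth-first walk over the static call graph, tagging each reached function with its depth from the stack. The call graph below is an assumption reconstructed from the slide's figure, not the paper's implementation:

```python
from collections import deque

def expand_stack(crash_stack, call_graph, max_depth):
    # Functions on the crash stack start at depth 0; their static callees
    # are pulled in breadth-first, each tagged with the smallest depth at
    # which it is reached, up to max_depth.
    depth = {f: 0 for f in crash_stack}
    frontier = deque(crash_stack)
    while frontier:
        caller = frontier.popleft()
        if depth[caller] == max_depth:
            continue
        for callee in call_graph.get(caller, ()):
            if callee not in depth:
                depth[callee] = depth[caller] + 1
                frontier.append(callee)
    return depth

# Call graph reconstructed from the figure: A calls B; B calls C and J;
# C calls D and E; and so on.
CALL_GRAPH = {
    "A": ["B"], "B": ["C", "J"], "C": ["D", "E"],
    "J": ["K", "L"], "D": ["M", "N"], "E": ["F"], "F": ["G", "H"],
}
# Functions within depth 3 of the stack, tagged with their depth.
trace = expand_stack(["A", "B", "C", "D"], CALL_GRAPH, max_depth=3)
```

With this graph the walk reproduces the figure: E, J, M, N at depth 1; F, K, L at depth 2; G, H at depth 3.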
Crash Stack

| position | function | file   | line |
|----------|----------|--------|------|
| 0        | D        | file_0 | l0   |
| 1        | C        | file_1 | l1   |
| 2        | B        | file_2 | l2   |
| 3        | A        | file_3 | l3   |

Approximate Failing Traces
• Basic Stack Expansion Algorithm
  – uses function call information only
• Improved Stack Expansion Algorithm
  – also uses source file position information
Improved Stack Expansion Algorithm
• Control Flow Analysis
[Figure: the CFG of A, in which J() is called before the in-stack call to B(). When expanding A, only callees whose call sites may execute before the call that appears on the crash stack are considered.]
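A minimal sketch of this pruning idea, using source-line order as a stand-in for a full control-flow analysis (a deliberate simplification; the call sites and the `log_exit` helper below are hypothetical):

```python
def reachable_before(call_sites, stack_callee):
    # call_sites: (callee, line) pairs in source order within one caller.
    # A callee is kept only if some call site of it may execute before
    # (or at) the call that is on the crash stack. Straight-line order
    # stands in for a real CFG reachability check here.
    stack_line = min(line for fn, line in call_sites if fn == stack_callee)
    return sorted({fn for fn, line in call_sites if line <= stack_line})

# In the slide's CFG of A, J() appears before the in-stack call to B(),
# so J is expanded; a hypothetical helper called only after B() is not.
sites = [("J", 3), ("B", 7), ("log_exit", 9)]
print(reachable_before(sites, "B"))   # ['B', 'J']
```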
Improved Stack Expansion Algorithm
• Backward Slicing

Obj D() {
  Obj s;
  int a = M();      // not in the slice: a cannot affect the crash
  char b = '';
  Obj[] c = N(b);   // in the slice: defines c
  s = c[1];         // crash here; relevant variables: {s, c}
  if (s != '') {
    …
  }
}

Only functions in the backward slice from the crash point (with respect to the variables {s, c}) are expanded; M is excluded from the approximated trace because it is not in the slice.
After crash stack expansion, a large number of suspicious functions remain.
How to rank the suspicious functions?
Rank suspicious functions
• An empirical study on the characteristics of faulty
functions
• Quantify the suspiciousness of suspicious functions
Observation 1: Frequent Functions
• Faulty functions appear frequently in the crash traces of the corresponding buckets: the more frequent, the more suspicious.
  – Function Frequency (FF)
For 89–92% of crashing faults, the associated faulty function appears in all crash execution traces in the corresponding bucket.
Frequent Functions
• Some frequent functions are unlikely to be buggy
  – entry points (main, _RtlUserThreadStart, …)
  – event handling routines (CloseHandle)
• In information retrieval, some frequent words are likewise useless
  – stop words, e.g. “the”, “an”, “a”
  – handled by Inverse Document Frequency (IDF)
• Inverse Bucket Frequency (IBF)
  – if a function appears in many buckets, it is less likely to be buggy
Observation 2: Functions Close to the Crash Point
• Faulty functions appear close to the crash point
  – in Mozilla Firefox, for 84.3% of crashing faults, the distance between the crash point and the associated faulty function is less than 5
• Inverse Average Distance to Crash Point (IAD)
Observation 3: Less Frequently Changed Functions
• Functions that do not contain crashing faults are often changed less frequently
  – 94.1% of faulty functions were changed at least once during the past 12 months
  – immune functions (Y. Dang et al., ICSE 2012)
• Less frequently changed functions
  – functions with no changes in the past 12 months
  – assigned a suspiciousness score of 0
Observation 4: Large Functions
• Our prior study (H. Zhang, ICSM 2009) showed that large modules are more likely to be defect-prone
• Function's Lines of Code (FLOC)
Suspicious Score

Score(f, B) = FF(f, B) × IBF(f) × IAD(f, B) × FLOC(f)

• FF (Function Frequency): FF(f, B) = N(f, B) / N(B)
• IBF (Inverse Bucket Frequency): IBF(f) = log(#B / #B_f + 1)
• IAD (Inverse Average Distance to Crash Point): IAD(f, B) = N(f, B) / (1 + Σ_{j=1..n} dis_j(f))
• FLOC (Function Lines of Code): FLOC(f) = log(LOC(f) + 1)
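The combined score can be sketched directly from these formulas. The data shapes below (a bucket as a list of traces, each trace a function-to-distance map) are illustrative assumptions, not the paper's exact representation:

```python
import math

def suspiciousness(f, bucket, buckets, loc):
    # Score(f, B) = FF(f, B) * IBF(f) * IAD(f, B) * FLOC(f).
    # bucket: list of expanded crash traces, each mapping a function to
    # its distance from the crash point; buckets: all crash buckets;
    # loc: function -> lines of code.
    traces_with_f = [t for t in bucket if f in t]
    n_f = len(traces_with_f)
    if n_f == 0:
        return 0.0
    ff = n_f / len(bucket)                                  # FF(f, B)
    n_buckets_f = sum(any(f in t for t in b) for b in buckets)
    ibf = math.log(len(buckets) / n_buckets_f + 1)          # IBF(f)
    iad = n_f / (1 + sum(t[f] for t in traces_with_f))      # IAD(f, B)
    floc = math.log(loc[f] + 1)                             # FLOC(f)
    return ff * ibf * iad * floc

# Toy data: D sits at the crash point in both traces of its bucket,
# C is one frame away and also shows up in a second bucket.
b1 = [{"D": 0, "C": 1, "B": 2}, {"D": 0, "C": 1}]
b2 = [{"C": 2, "E": 0}]
loc = {"D": 40, "C": 40, "B": 40, "E": 40}
ranked = sorted(loc, key=lambda f: suspiciousness(f, b1, [b1, b2], loc),
                reverse=True)   # D, frequent and at the crash point, ranks first
```

All four factors are multiplicative, so a function missing from every trace of the bucket scores zero regardless of its size or change history.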
Evaluation Subjects
• Mozilla products
  – 5 releases of Firefox
  – 2 releases of Thunderbird
  – 1 release of SeaMonkey
• 160 crashing faults (buckets)
• Large scale
  – more than 2 million LOC
  – more than 120K functions
Evaluation Metrics
• Recall@N: percentage of faults successfully located by examining the top N recommended functions
• Mean Reciprocal Rank (MRR)
  – measures the quality of ranking results in IR
  – ranges from 0 to 1
  – higher values mean better ranking
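Both metrics are straightforward to compute. A sketch under an assumed data shape (one rank per fault: the 1-based position of the faulty function in the recommendation list, or None when it never appears):

```python
def recall_at_n(ranks, n):
    # Fraction of faults whose faulty function appears in the top-n list.
    return sum(r is not None and r <= n for r in ranks) / len(ranks)

def mrr(ranks):
    # Mean Reciprocal Rank: average of 1/rank, counting misses as 0.
    return sum(1 / r for r in ranks if r is not None) / len(ranks)

ranks = [1, 3, None, 2]        # hypothetical results for four faults
print(recall_at_n(ranks, 1))   # 0.25
print(round(mrr(ranks), 3))    # 0.458
```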
Experimental Design
• RQ1: How many faults can be successfully located by
CrashLocator?
• RQ2: Can CrashLocator outperform the conventional
stack-only methods?
• RQ3: How does each factor contribute to the crash
localization performance?
• RQ4: How effective is the proposed crash stack
expansion algorithm?
RQ1: CrashLocator Performance

| System           | Recall@1 | Recall@5 | Recall@10 | MRR   |
|------------------|----------|----------|-----------|-------|
| Firefox 4.0b4    | 55.6%    | 66.7%    | 77.8%     | 0.627 |
| Firefox 4.0b5    | 47.1%    | 70.6%    | 70.6%     | 0.566 |
| Firefox 4.0b6    | 48.0%    | 64.0%    | 64.0%     | 0.540 |
| Firefox 14.0.1   | 52.0%    | 52.0%    | 56.0%     | 0.528 |
| Firefox 16.0.1   | 53.8%    | 53.8%    | 53.8%     | 0.542 |
| Thunderbird 17.0 | 48.5%    | 66.7%    | 78.8%     | 0.568 |
| Thunderbird 24.0 | 50.0%    | 66.7%    | 66.7%     | 0.544 |
| SeaMonkey 2.21   | 55.0%    | 70.0%    | 70.0%     | 0.600 |
| Summary          | 50.6%    | 63.7%    | 67.5%     | 0.559 |
RQ2: Comparison with Stack-Only Methods
• Conventional stack-only methods:
  – StackOnlySampling
  – StackOnlyAverage
  – StackOnlyChangeDate
RQ2: Comparison with Stack-Only Methods
[Chart: Recall@N for N = 1, 5, 10, 20, 50, 100, comparing StackOnlySampling, StackOnlyAverage, StackOnlyChangeDate, and CrashLocator; CrashLocator achieves the highest recall at every N.]
RQ3: Contribution of Each Factor
• Inverse Bucket Frequency (IBF)
• Function Frequency (FF)
• Function's Lines of Code (FLOC)
• Inverse Average Distance to Crash Point (IAD)
RQ3: Contribution of Each Factor
[Chart: MRR per release (Firefox 4.0b4 through 16.0.1, Thunderbird 17.0 and 24.0, SeaMonkey 2.21) and in summary, comparing IBF, IBF×FF, IBF×FF×FLOC, and IBF×FF×FLOC×IAD.]
RQ4: Stack Expansion Algorithms
• Basic Stack Expansion Algorithm
  – static call graph
• Improved Stack Expansion Algorithm
  – static call graph
  – control flow analysis
  – backward slicing
RQ4: Stack Expansion Algorithms
[Chart: Recall@1, Recall@5, Recall@10, Recall@20, Recall@50, and MRR for basic vs. improved stack trace expansion.]
Conclusions
• We propose a novel technique, CrashLocator, that locates crashing faults based on crash stacks only
• Evaluated on real, large-scale projects
• 50.6%, 63.7%, and 67.5% of crashing faults can be located by examining only the top 1, 5, and 10 functions, respectively
• CrashLocator significantly outperforms stack-only methods, improving MRR by at least 32% and Recall@10 by at least 23%
TomHalpin96 views
Myths and Facts About Hospice Care: Busting Common Misconceptions by Care Coordinations
Myths and Facts About Hospice Care: Busting Common MisconceptionsMyths and Facts About Hospice Care: Busting Common Misconceptions
Myths and Facts About Hospice Care: Busting Common Misconceptions
AI and Ml presentation .pptx by FayazAli87
AI and Ml presentation .pptxAI and Ml presentation .pptx
AI and Ml presentation .pptx
FayazAli8711 views
Dapr Unleashed: Accelerating Microservice Development by Miroslav Janeski
Dapr Unleashed: Accelerating Microservice DevelopmentDapr Unleashed: Accelerating Microservice Development
Dapr Unleashed: Accelerating Microservice Development
Miroslav Janeski10 views
FIMA 2023 Neo4j & FS - Entity Resolution.pptx by Neo4j
FIMA 2023 Neo4j & FS - Entity Resolution.pptxFIMA 2023 Neo4j & FS - Entity Resolution.pptx
FIMA 2023 Neo4j & FS - Entity Resolution.pptx
Neo4j7 views

CrashLocator: Locating Crashing Faults Based on Crash Stacks (ISSTA 2014)

  • 1. CrashLocator: Locating Crashing Faults Based on Crash Stacks Rongxin Wu1, Hongyu Zhang2, Shing-Chi Cheung1 and Sunghun Kim1 The Hong Kong University of Science and Technology1 Microsoft Research2 July 24th, 2014 ISSTA 2014
  • 2. Background 2 Crash Information with Crash Stack Crash Reporting System Software Crash Bug ReportsDevelopers Crash Buckets
  • 3. Feedback From Mozilla Developers • Locating crashing faults is hard • Ad hoc approach “… and look at the crash stack listed. It shows the line number of the code, and then I go to the code and inspect it. If I am unsure what it does I go to the second line of the stack and code and inspect that, and so on and so forth …” “Some crashes are hard to fix because it is not necessarily indicative of the place where it crashes in the crash stack …” “I use the top down method of following the crash backwards.” “Sometimes it can be very difficult.” 3
  • 4. Uncertain Fault Location • The faulty function may not appear in the crash stack About 33%~41% of crashing faults in Firefox cannot be located in crash stacks! [Diagram: a call graph of functions A–H showing buggy code that does not appear on the crash stack leading to the crash point] 4
  • 5. Spectrum-Based Fault Localization • Related Work • Tarantula (J. A. Jones et al., ICSE 2002) (J. A. Jones et al., ASE 2005) • Jaccard (R. Abreu et al., TAICPART-MUTATION 2007) • Ochiai (R. Abreu et al., TAICPART-MUTATION 2007) (S. Artzi et al., ISSTA 2010) • … • Contrast passing traces and failing traces 5
  • 6. Spectrum-Based Fault Localization • Are these techniques applicable? • Instrumented production software ✗ — privacy concern, performance overhead (C. Luk et al., PLDI 2005) • Failing traces ✗ — only the crash stack (f1 f2 f3 … fn) is available, a partial trace • Passing traces ✗ — test cases of uncertain effectiveness (S. Artzi et al., ISSTA’10) 6
  • 7. Our Research Goal How to help developers fix crashing faults? – Locate crashing faults based on crash stack 7
  • 8. Our technique: CrashLocator • Targets faulty functions • No instrumentation needed • Approximate failing traces  Based on crash stacks  Using static analysis techniques • Rank suspicious functions  Without passing traces  Based on characteristics of faulty functions 8
  • 9. Approximate Failing Traces • Basic Stack Expansion Algorithm [Diagram: the crash stack D–C–B–A is expanded through the static call graph; depth-1 adds E, J, M, N; depth-2 adds F, K, L; depth-3 adds G, H] 9
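The basic expansion described on slide 9 can be sketched as a breadth-first walk over the static call graph, level by level up to a chosen depth. This is a minimal sketch; the call-graph edges below are my reconstruction, chosen so that the depths match the slide's example (depth-1 adds E, J, M, N; depth-2 adds F, K, L; depth-3 adds G, H).

```python
def expand_stack(crash_stack, call_graph, max_depth):
    """Approximate a failing trace by expanding the crash stack
    through the static call graph, breadth-first, up to max_depth."""
    trace = set(crash_stack)       # depth-0: the functions on the stack
    frontier = list(crash_stack)
    for _ in range(max_depth):
        next_frontier = []
        for fn in frontier:
            for callee in call_graph.get(fn, []):
                if callee not in trace:      # newly reached function
                    trace.add(callee)
                    next_frontier.append(callee)
        frontier = next_frontier             # expand one more call level
    return trace

# Hypothetical call graph consistent with the slide's depths
cg = {"A": ["B", "J"], "B": ["C"], "C": ["D", "E"],
      "D": ["M", "N"], "J": ["F"], "E": ["K", "L"], "F": ["G", "H"]}
```

With the crash stack D, C, B, A, a depth-1 expansion yields the stack functions plus E, J, M, and N, and a depth-3 expansion reaches all thirteen functions, mirroring the diagram.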
  • 10. Approximate Failing Traces • Basic Stack Expansion Algorithm  Function call information only • Improved Stack Expansion Algorithm  Source file position information — the crash stack records, per frame: position 0: D (file_0, line l0); position 1: C (file_1, l1); position 2: B (file_2, l2); position 3: A (file_3, l3) 10
  • 11. Improved Stack Expansion Algorithm • Control Flow Analysis [Diagram: CFG of function A from Entry to Exit; the call site of B is on the crash stack, while the call to J lies off the paths reaching that call site, so J is filtered out of the expansion] 11
  • 12. Improved Stack Expansion Algorithm • Backward Slicing
1. Obj D(){
2.   Obj s;
3.   int a = M();      // not in slice
4.   char b = ‘’;
5.   Obj[] c = N(b);
6.   s = c[1];         // crash here
7.   if (s != ‘’) {
8.     …
9.   }
}
• Crash-related variables: {s, c} — line 3 is not in the slice, so the call to M is filtered out of the expansion 12
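The slicing step on slide 12 can be illustrated with a toy def-use slicer over straight-line code. This is a simplified sketch of the idea only: real backward slicing (as used by CrashLocator) also handles control dependences, pointers, and interprocedural effects. The statement encoding below is my own.

```python
def backward_slice(stmts, crash_vars):
    """Toy backward slice over straight-line code.
    stmts: list of (defined_var, used_vars) pairs, in program order.
    Returns the 0-based indices of statements in the slice."""
    relevant = set(crash_vars)   # variables that can affect the crash
    kept = []
    for i in range(len(stmts) - 1, -1, -1):   # walk backward
        defined, used = stmts[i]
        if defined in relevant:               # statement defines a relevant var
            kept.append(i)
            relevant.update(used)             # its inputs become relevant too
    return sorted(kept)

# Lines 2-6 of function D on the slide:
# s;  a = M();  b = '';  c = N(b);  s = c[1]
stmts = [("s", []), ("a", []), ("b", []), ("c", ["b"]), ("s", ["c"])]
```

Slicing on the crash variables {s, c} keeps every statement except index 1 (`a = M()`), so the call to M is excluded from the expansion, matching the slide.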
  • 13. After crash stack expansion, there are still a large number of suspicious functions How to rank the suspicious functions? 13
  • 14. Rank suspicious functions • An empirical study on the characteristics of faulty functions • Quantify the suspiciousness of suspicious functions 14
  • 15. Observation 1: Frequent Function • Faulty functions appear frequently in the crash traces of the corresponding buckets.  Function Frequency (FF)  More frequent, more suspicious • For 89-92% of crashing faults, the associated faulty functions appear in all crash execution traces in the corresponding bucket. [Diagram: crash reports grouped into a crash bucket] 15
  • 16. Frequent Function • Some frequent functions are unlikely to be buggy  Entry points (main, _RtlUserThreadStart, …)  Event handling routines (CloseHandle) • In information retrieval, some frequent words are useless  Stop words, e.g. “the”, “an”, “a”  Inverse Document Frequency (IDF) • Inverse Bucket Frequency (IBF)  If a function appears in many buckets, it is less likely to be buggy 16
  • 17. Observation 2: Functions Close to Crash Point • Faulty functions appear closer to the crash point  In Mozilla Firefox, for 84.3% of crashing faults, the distance between the crash point and the associated faulty function is less than 5. • Inverse Average Distance to Crash Point (IAD) 17
  • 18. Observation 3: Less Frequently Changed Functions • Functions that do not contain crashing faults are often less frequently changed  94.1% of faulty functions have been changed at least once during the past 12 months  Immune Functions (Y. Dang et al. ICSE 2012) • Less frequently changed functions  Functions that have no changes in past 12 months  Suspicious score is 0 18
  • 19. Observation 4: Large Functions • Our prior study (H. Zhang. ICSM 2009) showed that large modules are more likely to be defect-prone • Function’s Lines of Code (FLOC) 19
  • 20. Suspicious Score • Score(f, B) = FF(f, B) × IBF(f) × IAD(f, B) × FLOC(f) • FF (Function Frequency): FF(f, B) = N_{f,B} / N_B • IBF (Inverse Bucket Frequency): IBF(f) = log(#B / #B_f + 1) • IAD (Inverse Average Distance to Crash Point): IAD(f, B) = N_{f,B} / (1 + Σ_{j=1..n} dis_j(f)) • FLOC (Function Lines of Code): FLOC(f) = log(LOC(f) + 1) 20
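The scoring formula on slide 20 can be sketched directly. The function signature and parameter names below are illustrative (not from the paper), and a natural logarithm is assumed since the slide does not specify a base; the zero score for functions unchanged in the past 12 months follows Observation 3.

```python
import math

def suspicious_score(n_f_B, n_B, num_buckets, num_buckets_with_f,
                     distances, loc, changed_in_12_months=True):
    """Score(f, B) = FF(f, B) * IBF(f) * IAD(f, B) * FLOC(f).
    n_f_B: crash traces in bucket B containing f; n_B: traces in B;
    distances: distance of f to the crash point in each such trace;
    loc: f's lines of code."""
    if not changed_in_12_months:          # Observation 3: treat as immune
        return 0.0
    ff = n_f_B / n_B                                        # Function Frequency
    ibf = math.log(num_buckets / num_buckets_with_f + 1)    # Inverse Bucket Frequency
    iad = n_f_B / (1 + sum(distances))                      # Inverse Average Distance
    floc = math.log(loc + 1)                                # Function LOC factor
    return ff * ibf * iad * floc
```

For example, a function appearing in both traces of a 2-trace bucket (FF = 1), in 2 of 10 buckets, at distance 1 from each crash point, with 9 lines of code, scores log(6) × (2/3) × log(10).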
  • 21. Evaluation Subjects • Mozilla Products  5 releases of Firefox  2 releases of Thunderbird  1 release of SeaMonkey • 160 crashing faults (buckets) • Large-Scale  More than 2 million LOC  More than 120K functions 21
  • 22. Evaluation Metrics • Recall@N: Percentage of successfully located faults by examining top N recommended functions • Mean Reciprocal Rank (MRR)  Measure the quality of the ranking results in IR  Range value: 0 ~ 1  Higher value means better ranking 22
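The two metrics on slide 22 can be computed as follows (a sketch; `ranks` is assumed to hold the 1-based rank of the actual faulty function for each crash bucket, with `None` when it is absent from the recommendation list).

```python
def recall_at_n(ranks, n):
    """Fraction of faults whose faulty function is ranked within the top n."""
    hits = sum(1 for r in ranks if r is not None and r <= n)
    return hits / len(ranks)

def mean_reciprocal_rank(ranks):
    """MRR: average of 1/rank; an unlocated fault contributes 0."""
    return sum(1 / r for r in ranks if r is not None) / len(ranks)
```

For ranks [1, 2, None, 5], Recall@1 is 0.25, Recall@5 is 0.75, and MRR is (1 + 1/2 + 0 + 1/5) / 4 = 0.425; a higher MRR means the faulty function tends to sit nearer the top of the list.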
  • 23. Experimental Design • RQ1: How many faults can be successfully located by CrashLocator? • RQ2: Can CrashLocator outperform the conventional stack-only methods? • RQ3: How does each factor contribute to the crash localization performance? • RQ4: How effective is the proposed crash stack expansion algorithm? 23
  • 24. RQ1: CrashLocator Performance
System | Recall@1 | Recall@5 | Recall@10 | MRR
Firefox 4.0b4 | 55.6% | 66.7% | 77.8% | 0.627
Firefox 4.0b5 | 47.1% | 70.6% | 70.6% | 0.566
Firefox 4.0b6 | 48.0% | 64.0% | 64.0% | 0.540
Firefox 14.0.1 | 52.0% | 52.0% | 56.0% | 0.528
Firefox 16.0.1 | 53.8% | 53.8% | 53.8% | 0.542
Thunderbird 17.0 | 48.5% | 66.7% | 78.8% | 0.568
Thunderbird 24.0 | 50.0% | 66.7% | 66.7% | 0.544
SeaMonkey 2.21 | 55.0% | 70.0% | 70.0% | 0.600
Summary | 50.6% | 63.7% | 67.5% | 0.559
24
  • 25. RQ2: Comparison with Stack-Only methods • Conventional Stack-Only Methods • StackOnlySampling • StackOnlyAverage • StackOnlyChangeDate 25
  • 26. RQ2: Comparison with Stack-Only Methods [Chart: Recall@N for top N ∈ {1, 5, 10, 20, 50, 100} functions; CrashLocator outperforms StackOnlySampling, StackOnlyAverage, and StackOnlyChangeDate across all N] 26
  • 27. RQ3: Contribution of Each Factor • Inverse Bucket Frequency (IBF) • Function Frequency (FF) • Function’s Lines of Code (FLOC) • Inverse Average Distance to Crash Point (IAD) 27
  • 28. RQ3: Contribution of Each Factor [Chart: MRR per release (ff4.0b4 … sm2.21, plus Summary) for IBF, IBF*FF, IBF*FF*FLOC, and IBF*FF*FLOC*IAD; each added factor improves MRR] 28
  • 29. RQ4: Stack Expansion Algorithms • Basic Stack Expansion Algorithm  Static Call Graph • Improved Stack Expansion Algorithm  Static Call Graph  Control Flow Analysis  Backward Slicing 29
  • 30. RQ4: Stack Expansion Algorithms [Chart: Recall@1, Recall@5, Recall@10, Recall@20, Recall@50, and MRR for the basic vs. the improved stack trace expansion; the improved algorithm performs better on every metric] 30
  • 31. Conclusions • Propose a novel technique, CrashLocator, to locate crashing faults based on crash stacks only • Evaluate on real, large-scale projects • 50.6%, 63.7%, and 67.5% of crashing faults can be located by examining only the top 1, 5, and 10 functions, respectively • CrashLocator outperforms stack-only methods significantly, improving MRR by at least 32% and Recall@10 by at least 23% 31

Editor's Notes

  1. Good afternoon. thanks for joining this presentation. My name is … Today, I am going to present… This work is a joint work between … Let me start to introduce it.
  2. As we know, software crashes are common, and a crash is a severe manifestation of faults. Due to the importance and severity of crashes, in recent years some industrial companies and open source communities have developed crash reporting systems to collect crash reports from end users. Because of the large number of users, many crash reports are received daily, and it is impossible for developers to inspect each of them. Therefore, crash reporting systems organize the crash reports. This organizing process, also called crash bucketing, groups together the crash reports caused by the same bug. Bug reports are then generated from the crash buckets and sent to developers for debugging.
  3. Although crash reporting systems have proved useful in debugging, fixing crashing faults is still not easy. After communicating with Mozilla developers, we found that locating a crashing fault is sometimes hard, especially when they cannot get evidence directly from the crash stack. To fix crashing bugs, they usually use an ad hoc approach: inspecting the crash stack top-down. The crash stack is useful, but using only the crash stack is insufficient.
  4. We conducted an empirical study on 3 release versions of Firefox and found that the buggy code may not always appear in the crash stack. This is because the buggy code may execute and then be popped off the call stack; the side effects of the buggy code then take hold in statements executed later. In Mozilla, 33%-41% of crashing faults cannot be located in crash stacks.
  5. We then consider fault localization techniques to assist debugging. In recent years, many spectrum-based fault localization techniques have been proposed, such as Tarantula, Jaccard, and Ochiai. These techniques contrast passing and failing execution traces, compute suspiciousness scores for program elements, and present a ranked list of program elements to developers.
  6. These techniques are well studied, but are they directly applicable? They require passing and failing traces, which are usually collected by instrumenting the program. In production software, however, instrumentation is usually not allowed, due to end users' privacy concerns and the performance overhead it causes, so we cannot obtain traces from end users. (Some recent research has proposed low-overhead instrumentation techniques for profiling dynamic behavior, but until they are widely adopted, instrumenting production software to collect full traces remains unavailable.) For the failing trace, what we have is the crash stack: a snapshot of the call stack at the time of the crash. It is a partial execution trace and is not equivalent to a complete failing trace. For passing traces, we may be able to obtain them from existing test cases. However, the study by S. Artzi et al. showed that fault localization techniques are effective when the passing traces are similar to the failing traces, and it is not always possible to have test cases that generate such similar passing traces. Due to these limitations, conventional fault localization techniques may not be directly applicable here.
  7. Then, with only crash reports available in crash reporting system, how we can help developers fix crashing faults? We propose our research goal: to locate crashing faults based on crash stacks.
  8. Our technique is named CrashLocator. It aims to locate faulty functions, because functions are commonly used in unit testing and are helpful for crash reproduction. Unlike conventional fault localization, our technique does not need any instrumentation. CrashLocator has two major steps. The first step approximates failing traces, because faulty functions may not reside on the crash stack; in this step, we use static analysis to generate failing traces based on crash stacks. The second step ranks the suspicious functions, because the number of suspicious functions after approximation can be very large and we need to prioritize the list; in this step, we do not use passing traces, and instead base the ranking on characteristics of faulty functions.
  9. Let us look at the details of our technique. A simple way to approximate the failing traces is to expand the crash stack via call graph information. For example, we start with a crash stack and a call graph. Function A has two callees, B and J. B is in the crash stack; J is not, but could possibly have executed before the crash, so we include J in the failing trace. We do the same for functions C and D in the crash stack. In this way, we include J, E, M, and N in the failing trace at call depth 1. We can further expand the trace by analyzing the functions that could be executed by J, E, M, and N, which adds K, L, and F. By expanding the crash stack to different call depths, we approximate the failing traces.
  10. The basic stack expansion algorithm is simple and conservative: it only uses the function call information in the crash stack. However, the crash stack contains more information, such as source file position information. We therefore propose an improved stack expansion algorithm based on it.
  11. To exclude functions that cannot have executed before the crash, we conduct control flow analysis on each function in the crash stack. For example, we first obtain the CFG of function A. We find that this position is in the crash stack, so we can infer the possible control flow path, and J is not on that path. We can therefore filter out the call to J: in the stack expansion steps, we will not expand the call from A to J.
  12. In our study, we find that the variables on crash lines are usually related to the crash. We therefore perform backward slicing to get the statements that can affect the crash-related variables. For example, in function D, line 6 is the crash line, and the crash-related variables are s and c. Via backward slicing, we find that line 3 is not in the slice: the call to M does not affect s or c. Therefore, we filter out the call from D to M in our expansion steps. Based on control flow analysis and backward slicing, we can approximate comparably precise failing traces.
  13. Let us look at the first observation. A crashing bug may trigger a bucket of crash reports, and the crash stacks in these reports may differ, since a single fault can manifest in different ways under different configurations and platforms. Intuitively, the faulty functions should appear frequently in the failing traces of these crash reports. Our empirical study showed that for 89-92% of crashing faults, the associated faulty functions appear in all crash execution traces in the corresponding bucket. We summarize this result as our first observation: faulty functions appear frequently in the crash traces of the corresponding buckets. We then propose our first factor, Function Frequency, to characterize faulty functions.
  14. However, some functions appear frequently but are unlikely to be buggy, e.g. the entry points and some event handling routines. This is similar to the concept of “stop words” in information retrieval: words like “a”, “an”, and “the” appear frequently but carry little meaning, so inverse document frequency is used to decrease their weight. We adopt a similar concept and introduce our second factor, Inverse Bucket Frequency, to decrease the priority of frequent functions that appear across many buckets.
  15. We also find that in Mozilla, for 84.3% of crashing faults, the distance between the faulty function and the crash point is small. We summarize this result as our second observation. Based on it, we propose our third factor, Inverse Average Distance (IAD), which gives higher priority to functions closer to the crash point.
  16. Our empirical study also showed that 94.1% of faulty functions have been changed at least once during the past 12 months. This result is consistent with our previous study at Microsoft, in which we found the existence of immune functions: functions considered unlikely to be buggy. One category of immune functions consists of functions that have been used successfully for quite a long time without changes. We therefore summarize our third observation: functions that do not contain crashing faults are often less frequently changed. Using this observation, we select the functions that have had no changes in the past 12 months and assign them a suspiciousness score of 0.
  17. In our prior study, we found that large modules are more likely to be buggy. Therefore, we design the fourth factor, Function’s Lines of Code.
  18. Based on the four factors, we define the suspiciousness score as the product of all the factors, and we rank the functions in the approximated traces by this score.
  19. For the evaluation, we select three Mozilla products as our subjects. In total, there are 160 crashing buckets. The programming language is C/C++, and all the subjects are large-scale.
  20. We use Recall@N and MRR as evaluation metrics. Recall@N measures the percentage of bugs that can be located by examining the top N recommended functions. MRR is a widely used metric for measuring the quality of ranking results in IR. Its value ranges from 0 to 1; a higher value means a better ranking result.
  21. We design four research questions. RQ1 evaluates the performance of our approach. RQ2 compares our approach with the baseline approaches, the stack-only methods, which originate from the Mozilla developers’ feedback. RQ3 evaluates the contribution of each factor. RQ4 evaluates the effectiveness of our proposed crash stack expansion algorithm by comparing it with the basic stack expansion algorithm.
  22. The table shows the evaluation for RQ1. For each product, we show Recall@1, Recall@5, and Recall@10, as well as MRR. Take Firefox 4.0b4 as an example: Recall@1 is 55.6%, meaning that by examining only the top 1 recommended function, we can locate 55.6% of crashing faults. Similarly, by examining the top 5 functions we can locate 66.7% of faults, and by examining the top 10 functions, 77.8%. The MRR value is 0.627. Overall, by examining the top 1 function, we can locate 50.6% of faults.
  23. For RQ2, we compare with the baseline approaches, the stack-only methods. According to the feedback from Mozilla developers, they usually inspect the functions in the crash stack when debugging. We therefore design three variants of stack-only approaches. StackOnlySampling randomly selects one crash from each bucket and ranks the functions by their position in the crash stack. StackOnlyAverage selects all crashes in each bucket and ranks the functions by their average position in the crash stacks. StackOnlyChangeDate randomly selects one crash from each bucket and ranks the functions by their last modified date.
  24. The figure shows the comparison results. The X axis is the number of functions examined in the recommendation list; the Y axis is the Recall@N metric. As we can see, CrashLocator outperforms all the other approaches. For example, by examining the top 1 function, CrashLocator locates 50.6% of faults, while the second-best approach, StackOnlyAverage, locates only 35.6%. In terms of Recall@1, the improvement of CrashLocator over StackOnlyAverage is 42%. In terms of Recall@10, the improvement ranges from 23.2% to 45.8%.
  25. In RQ3, we evaluate the contribution of the four proposed factors, IBF, FF, FLOC, and IAD.
  26. This figure shows the performance of CrashLocator, in terms of MRR, when the IBF, FF, FLOC, and IAD factors are applied incrementally. With IBF alone, performance is lowest (the overall MRR is about 0.1); incrementally adding the FF and FLOC factors improves performance, and with all factors considered, performance is best. We can therefore see that each factor contributes to performance, and the IAD factor contributes more significantly than the others.
  27. In RQ4, we evaluate the effectiveness of our proposed stack expansion algorithm by comparing it with the basic one, which only uses the static call graph.
  28. This figure shows the comparison between the two stack expansion algorithms in terms of Recall@1, Recall@5, Recall@10, Recall@20, Recall@50, and MRR. In terms of Recall@N, the improvement of the proposed expansion algorithm over the basic one ranges from 13.3% to 72.3%. In terms of MRR, the improvement is 59.3%.