Predicting Recurring Crash Stacks
Hyunmin Seo and Sunghun Kim
The Hong Kong University of Science and Technology
September 7th, 2012
Automated Software Engineering 2012
Essen, Germany
Recurring Crashes
Bug Report 528311
3.6b3 crash point: nsXULTreeAccessible::GetTreeItemAccessible
Patch
3.6b4 crash point: nsXULTreeAccessible::GetTreeItemAccessible
Bad Fixes
• Bad fixes comprise as much as 9% of all bugs
(Gu et al., ICSE 2010)
• 14.8% to 24.4% of fixes for post-release bugs are incorrect
(Yin et al., FSE 2011)
Comment in Bug Report
“I don’t know how this bit (crash trace) got lost from the patch I ended up checking in, but it’s pretty essential...”
A comment in bug report #523528
Incomplete Fixes
• We call these incomplete fixes
• “Incomplete” in terms of fix locations
• How can we help prevent this?
Stack Expansion
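[Figure: a crash stack expanded to levels L-1, L-2 and L-3 using the CFG of A (Entry, an if branching into Path 1/Block1 and Path 2/Block2, Exit); the function changed by the fix is marked ✓]
S: functions in the crash stack; F: functions changed by the fix
Covered (if S ∩ F ≠ ∅)
Missing (otherwise)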
Experimental Design
RQ1 - How good is the classification?
RQ2 - How can this help developers?
Subjects
Subject: 19 releases of Firefox 3.6
Release date: Oct 2009 to Mar 2011
Programming language: C/C++
LOC: 3.2M to 4.4M
# of crash buckets: 33
# of total sub-groups: 1159
# of recurring sub-groups: 354
RQ1 - Prediction Result
Expansion level L4: 292 crash stacks predicted to recur, 167 of them actually recurred
Precision 0.57
Recall 0.49
F-measure 0.53
RQ2 - Case Study
Trace a (covered by the first fix):
IEnumConnectionPoints_RemoteNext_Thunk
nsAccessibleWrap::Next
nsXULTreeAccessible::GetChildAt ✓ (first fix, #528311, in 3.6b4)
nsXULTreeAccessible::GetTreeItemAccessible (crash point)
Trace b (missed by the first fix):
IEnumOleUndoUnits_Next_Stub
nsRootAccessible::HandleEvent
nsRootAccessible::HandleEventWithTarget
nsXULTreeAccessible::GetTreeItemAccessible ✓ (second fix, #528311, in 3.6b5; crash point)

First fix, nsXULTreeAccessible::GetChildAt:
  NS_ENSURE_ARG_POINTER(aChild);
  *aChild = nsnull;

+ if (IsDefunct())
+   return NS_ERROR_FAILURE;

  PRInt32 childCount = 0;

Second fix, nsXULTreeAccessible::GetTreeItemAccessible:
  *aAccessible = nsnull;

- if (aRow < 0)
+ if (aRow < 0 || IsDefunct())
    return;

  PRInt32 rowCount = 0;
RQ2 - Developer Feedback
Firefox developer emails and mailing lists
21 responses: 3 very useful, 7 useful, 10 requested more information, 1 not useful
“It should be an interesting feature and useful like any automation tool. It should make the engineering work easier and keep users less annoyed.”
“The first patch fixed the known steps but missed the fact that other routes led to the same state inconsistency. ... If you have a system that automates that process it would indeed be helpful.”
Threats to validity
• The subject is open source software
• Collected crash data might be biased
• Oracle data set is incomplete
Discussion – Future Work
nsJARInputThunk::EnsureJarStream
nsZipReaderCache::GetZip
nsJAR::Open
nsZipArchive::OpenArchive
nsZipArchive::BuildFileList ✓ (crash point)
  //-- Read the central directory headers
  buf = startp + centralOffset;
+ if (endp - buf < sizeof(PRUint32))
+   return NS_ERROR_FILE_CORRUPTED;
  PRUint32 sig = xtolong(buf); // crash point
  while (sig == CENTRALSIG) {
Related Work
• Crash bucketing
(Dang et al., ICSE 2012)
• Post-mortem crash analysis
(Manevich et al., FSE 2004)
• Bug fix verification
(Gu et al., ICSE 2010)
Conclusions
• 48% of fixed crashes in Firefox recurred.
• We present an approach to predict recurring crashes
• RQ1 - How good is the classification?
• Our approach yields reasonable accuracy: 0.57 precision and 0.49 recall at expansion level L4
• RQ2 - How can this help developers?
• Our case studies and developers’ feedback show the idea is useful
Editor's Notes
We were interested in recurring crashes, that is, software that crashes again even after a bug fix. Here is an example. Firefox crashes at this location; this is the name of the function where Firefox crashed, which is also called the crash point. A developer decided to fix this crash. He filed a bug report and made a patch, and the patch was included in the next release. However, Firefox crashed again at the same location.
This problem is called a bad fix. There is a bug and I make a fix, but the fix itself is buggy or does not completely remove the original bug. Gu et al. investigated the bug databases of the Ant, AspectJ, and Rhino projects and found bad fixes… Yin et al. investigated bug fixes in four large operating systems and found 14…
We found similar bad-fix problems in crash bug fixes. We then wanted to know how often bad fixes occur among crash bug fixes, and whether there is any way to help prevent them. Our work is motivated by these questions.
To see how often bad fixes occur, we investigated a crash reporting system (CRS). A crash reporting system is an automated system designed to help developers fix crashes. When software crashes, a window pops up and asks whether you would like to send a crash report. Microsoft, Apple, and Mozilla each have their own crash reporting system.
Let me briefly explain the Mozilla crash reporting system. When software is released, people download and use it, and some of them experience crashes. The client part of the CRS then generates a crash report with important information about the crash, such as the crash location, software version, OS, hardware information, and stack traces, and sends the report to a server.
The server receives many crash reports, so it groups similar crash reports together. This process is called bucketing. Mozilla groups crash reports that have the same crash point. Developers then investigate the crash buckets.
Usually they focus on the most frequent crashes first. If a developer decides to fix a crash bucket, he files a bug report and makes a patch, and the patch is included in the next release.
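The bucketing rule just described, one bucket per crash point, can be sketched in a few lines of C++. This is a hypothetical illustration rather than Mozilla's server code: `CrashReport` and `bucketByCrashPoint` are made-up names, and we assume the first frame of a report's stack is its crash point.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified crash report holding only what this sketch needs.
// Real reports also carry OS and hardware information, etc.
struct CrashReport {
    std::string version;             // release the crash came from, e.g. "3.6b3"
    std::vector<std::string> stack;  // frames, crash point (innermost) first
};

// Groups reports whose crash point (first stack frame) is identical.
// Reports without any frame cannot be bucketed and are skipped.
std::map<std::string, std::vector<CrashReport>>
bucketByCrashPoint(const std::vector<CrashReport>& reports) {
    std::map<std::string, std::vector<CrashReport>> buckets;
    for (const CrashReport& report : reports) {
        if (report.stack.empty()) continue;
        buckets[report.stack.front()].push_back(report);
    }
    return buckets;
}
```

Keying only on the crash point is what makes a bucket mix reports that reached the same function along different paths, which is the situation the rest of the talk exploits.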
To see how often bad fixes occur, we investigated the Mozilla crash reporting system and bug reporting system for 19 sub-versions of Firefox 3.6. We found 70 bug reports that fixed 79 crash points.
Then we checked whether all of these crashes were gone. For each fix we identified two versions: the one before the patch was released and the one after. We then counted the number of crash reports in both versions; this shows whether the crashes are gone.
Here are a few examples. Firefox crashed here, and we found 677 crash reports in this version. Then a developer fixed this crash, and after the fix we could not find any more crash reports. This is what we expect: a good fix. However, these two are examples of bad fixes: we still found a large number of crash reports after the bug fixes. Overall, more than 48% of the 79 crash points recurred.
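The before/after comparison can be written as a small decision rule. In this sketch `CrashCounts`, `judgeFix`, and `recurrenceRate` are hypothetical names. We assume that any crash report at all after the patch counts as recurrence (the talk gives no threshold), and that a fix can only be judged if its crash point had reports before the patch.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Number of crash reports per crash point observed in one release.
using CrashCounts = std::map<std::string, int>;

enum class FixOutcome { Good, Recurring, NotObserved };

int reportsAt(const CrashCounts& counts, const std::string& crashPoint) {
    auto it = counts.find(crashPoint);
    return it == counts.end() ? 0 : it->second;
}

// Compares the release before the patch with the release that ships it.
FixOutcome judgeFix(const std::string& crashPoint,
                    const CrashCounts& beforePatch,
                    const CrashCounts& afterPatch) {
    if (reportsAt(beforePatch, crashPoint) == 0)
        return FixOutcome::NotObserved;  // nothing to compare against
    // Assumption: a single report after the patch already means "recurring".
    return reportsAt(afterPatch, crashPoint) > 0 ? FixOutcome::Recurring
                                                 : FixOutcome::Good;
}

// Fraction of judged fixes whose crash point still shows up after the patch.
double recurrenceRate(const std::vector<FixOutcome>& outcomes) {
    int judged = 0;
    int recurring = 0;
    for (FixOutcome o : outcomes) {
        if (o == FixOutcome::NotObserved) continue;
        ++judged;
        if (o == FixOutcome::Recurring) ++recurring;
    }
    return judged == 0 ? 0.0 : static_cast<double>(recurring) / judged;
}
```

For the good-fix example above, a crash point with 677 reports before the patch and none after comes out as `Good`.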
Is there any way we can help prevent this? We investigated the crash reports and bug fixes further and found that the crash reports in a crash bucket have the same crash point, but their crash paths can differ. If a developer misses one path in his fix, the same crash can recur along the missed path.
Let’s look at an example. We found 70 crash reports before the bug was fixed. We grouped these crash reports by their crash stacks and counted the number of crash reports for each stack. This is the result: there are 5 unique crash stacks, and each bar shows the number of crash reports. Then we did the same thing for this version. Interestingly, all the crash stacks are gone except the second one; it seems the second path was missed by the first fix. If we look at the history of this bug report, after realizing that the crash was not gone, the developer reopened the closed bug report and made another patch. After the new patch was released, the crash was gone.
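Splitting one bucket by full crash stack, and checking which stacks survive the fix, is plain counting. A sketch under the assumption that two reports belong to the same sub-group only when their stacks match frame for frame; `Stack`, `countByStack`, and `survivingStacks` are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical representation: frames of one crash report, crash point first.
using Stack = std::vector<std::string>;

// Number of crash reports per unique crash stack (one sub-group per stack).
std::map<Stack, int> countByStack(const std::vector<Stack>& reports) {
    std::map<Stack, int> counts;
    for (const Stack& s : reports) ++counts[s];
    return counts;
}

// Stacks seen before the fix that still appear after it: likely paths the
// fix did not cover. Exact frame-by-frame equality is assumed.
std::set<Stack> survivingStacks(const std::vector<Stack>& beforeFix,
                                const std::vector<Stack>& afterFix) {
    const std::map<Stack, int> seenBefore = countByStack(beforeFix);
    std::set<Stack> surviving;
    for (const Stack& s : afterFix)
        if (seenBefore.count(s) > 0) surviving.insert(s);
    return surviving;
}
```

In the example above, `survivingStacks` would return only the second of the five stacks.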
Here is more evidence that developers do miss some paths. This is a comment in another bug report. The situation is similar: the developer realized that the crash was not gone, so he reopened the bug report, made another patch, and left this comment. Clearly he had missed a crash trace.
We call these incomplete fixes: the fixes are incomplete in terms of fix locations. How can we help with this? What if we could find those missed paths automatically? That could help developers.
This is the overview of our approach to finding those paths. When a developer makes a patch, we compare the patch with the crash reports and classify each crash stack as covered or missing. We then present the missed crash paths to the developer so that he can fix them as well. The process consists of preprocessing and classification; here I will only explain classification.
The idea behind the classification is this. Say Firefox started here, followed this path, and crashed here. Now assume a developer changed code here. In the next release, this is what happens to the paths that do not pass through the changed code. So by comparing the fix locations with the execution path, we can find the missed paths.
But what crash reports contain is a crash stack, not an execution trace, so we use the crash stack instead. This is a crash stack. Each circle is a function, or stack frame: A called B, B called C, and it crashed here.
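The covered/missing test can be read as a set intersection: a crash stack S is covered when it shares at least one function with the set F of functions changed by the patch, and missing otherwise. The sketch below applies that reading; `Coverage` and `classify` are hypothetical names, and matching frames by fully qualified function name is our assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>
#include <vector>

enum class Coverage { Covered, Missing };

// Covered if the crash stack S and the changed-function set F intersect.
Coverage classify(const std::vector<std::string>& stack,
                  const std::set<std::string>& changedFunctions) {
    const bool touched = std::any_of(stack.begin(), stack.end(),
        [&changedFunctions](const std::string& frame) {
            return changedFunctions.count(frame) > 0;
        });
    return touched ? Coverage::Covered : Coverage::Missing;
}
```

With F = {nsXULTreeAccessible::GetChildAt}, the first fix in the case study, a stack passing through GetChildAt is covered, while a stack reaching the crash point via nsRootAccessible::HandleEventWithTarget is missing.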
With this approach we designed an experiment with two research questions. The first is “How good is our classification?”, that is, what is its precision? Here we had a problem: how do we know whether a classification result is correct? We have no oracle. Instead, we reasoned that if a crash path is really missed by a patch, it may recur in the next release. So we predicted that the crash paths classified as missing would recur, checked this prediction against the real crash reports of the next release in the Mozilla CRS, and calculated precision, recall, and F-measure. The second question is “How can this help developers?” For this we present case studies and developers’ feedback.
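The evaluation compares two sets of crash stacks: those classified as missing (the prediction) and those that actually showed up again in the next release (the oracle). A minimal sketch with hypothetical names `PredictionCounts` and `compareWithNextRelease`, assuming stacks are compared by exact equality:

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

using Stack = std::vector<std::string>;  // frames, crash point first

struct PredictionCounts {
    std::size_t predicted;      // stacks classified "missing" = predicted to recur
    std::size_t recurred;       // stacks seen again in the next release (oracle)
    std::size_t truePositives;  // predicted to recur and actually recurred
};

PredictionCounts compareWithNextRelease(const std::set<Stack>& predictedToRecur,
                                        const std::set<Stack>& recurredInNextRelease) {
    std::size_t tp = 0;
    for (const Stack& s : predictedToRecur)
        if (recurredInNextRelease.count(s) > 0) ++tp;
    return {predictedToRecur.size(), recurredInNextRelease.size(), tp};
}

double precision(const PredictionCounts& c) {
    return c.predicted == 0 ? 0.0 : static_cast<double>(c.truePositives) / c.predicted;
}

double recall(const PredictionCounts& c) {
    return c.recurred == 0 ? 0.0 : static_cast<double>(c.truePositives) / c.recurred;
}
```

Because the oracle set only contains stacks that users happened to report during the collection window, a missing stack that never shows up counts as a false positive here even if it could still recur.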
This is the subject we used in our experiment, and this is the number of unique crash stacks in it. We classify each of them as covered or missing. This is the number of crash paths among them that actually recurred.
So, for the first question: how good is the classification?
At expansion level 4, our approach predicted that 292 crash paths would recur, and 167 of them actually recurred, so the precision is 0.57. Precision and recall vary with the expansion level.
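The reported numbers are consistent with the usual definitions. Precision follows directly from the two counts above. The recall denominator (all crash stacks that actually recurred at this level) is not given in the talk, so only the reported recall R = 0.49 is used, and the F-measure is the harmonic mean of precision and recall:

```latex
P = \frac{|\mathit{Predicted} \cap \mathit{Recurred}|}{|\mathit{Predicted}|}
  = \frac{167}{292} \approx 0.57,
\qquad
F = \frac{2PR}{P + R}
  = \frac{2 \times 0.57 \times 0.49}{0.57 + 0.49}
  \approx 0.53
```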
This shows the results at different expansion levels. L-0 means no expansion, and L-~ means the stack is expanded as much as possible. As the expansion level goes up, the stack becomes larger and closer to an over-approximation of the original execution trace. You can see that precision goes up while recall goes down. The accuracy is also strongly affected by the crash reports we collected for the next release, since that set is our oracle data set. This set is incomplete for several reasons: we only collected crash reports during a limited period of time, and users may not have submitted crash reports. All of these affect our prediction results, but our approach still shows reasonable accuracy.
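One way to picture stack expansion is as a bounded walk over a call graph: level k adds every function reachable from a stack frame in at most k call edges, so level 0 is the raw stack. This is an assumption-laden sketch, not the paper's algorithm. The `CallGraph` map and `expandStack` function are hypothetical, and unlike the CFG-based figure it ignores where in a function a call happens, so it over-approximates.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical static call graph: caller -> callees.
using CallGraph = std::map<std::string, std::vector<std::string>>;

// Level 0 returns the stack's own functions; each further level adds the
// callees of the functions added at the previous level. A very large k
// approximates "expand as much as possible" (L-~).
std::set<std::string> expandStack(const std::vector<std::string>& stack,
                                  const CallGraph& cg, int k) {
    std::set<std::string> result(stack.begin(), stack.end());
    std::set<std::string> frontier = result;
    for (int level = 0; level < k && !frontier.empty(); ++level) {
        std::set<std::string> next;
        for (const std::string& fn : frontier) {
            auto it = cg.find(fn);
            if (it == cg.end()) continue;  // leaf or unknown function
            for (const std::string& callee : it->second)
                if (result.insert(callee).second) next.insert(callee);
        }
        frontier = std::move(next);
    }
    return result;
}
```

Feeding the expanded set into the covered test makes more stacks count as covered, so fewer are predicted to recur; this is in line with the trend described above, where precision rises and recall falls as the level grows.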
OK, the results show reasonable precision and recall, and we can predict with either high precision or high recall. Now, how can this help developers?
Here is a case study. There are two stack traces: the first is covered by the first fix, and the second is missing. Our approach correctly predicts that the second one will recur. Now look at the two fixes: this is the first fix, and this is the second. The two fixes are very similar; both call IsDefunct and return. So if the developer had known about the missed path when he made the first fix, he could easily have fixed it too. You can find more case studies in the paper.
We also asked developers whether this approach is useful. We briefly explained our approach with a few case studies and sent an email to the Firefox developers who fixed the crashes used in our experiment. We received 21 responses: 3 said it is very useful, 7 said it is useful, and 10 requested more information, for example, whether we could run the experiment on recent Firefox crash reports.
We only used Mozilla Firefox, and only sub-versions of Firefox 3.6 over a limited period.
In this work we only focused on incomplete fixes, but there is another type of bad fix: incorrect fixes. To handle this crash, the developer changed code at this location, so our approach predicts that it will not recur. However, the crash recurred because the fix was wrong. Previously, buf was pointing to an invalid memory area, so the developer added this code to check the validity of the variable. But the check was insufficient: Firefox went through it and crashed again, and later the developer added more code. Handling this case requires a more rigorous verification technique, which is our future work; currently our approach cannot find such incorrect fixes.
There is related work on crashes. Firefox may crash at two different locations that share the same root cause; in this case it is better to put those crash reports into the same bucket, and crash bucketing algorithms address this issue. Our approach could be more accurate with a better bucketing algorithm. Other work tries to find the root cause of a crash by reasoning backward from the crash point. Our work is different: once a developer has made a fix, we try to verify it. Gu et al. also tried to verify bug fixes by generating more inputs that can trigger the same bug. We could not use the same approach for crashes because reproducing crashes is very challenging, so instead we used crash stacks to verify bug fixes.