Predicting Recurring Crash Stacks (ASE 2012)
 


Hyunmin's ASE 2012 presentation. A Winner of the ACM SIGSOFT Distinguished Paper Award!

  • We were interested in recurring crashes, that is, software that crashes again even after a bug fix. Here's an example. Firefox crashes at this location. This is the name of the function where Firefox crashed; it is also called the crash point. A developer decided to fix this crash. He filed a bug report and made a patch. This patch was included in the next release. However, Firefox crashed again at the same location.
  • This problem is called a bad fix. There is a bug. I make a fix. But the fix itself is buggy, or it does not completely remove the original bug. Gu et al. investigated the bug databases of the Ant, AspectJ, and Rhino projects and found bad fixes. Yin et al. investigated bug fixes in 4 large operating systems and found 14..
  • We also found similar bad-fix problems in crash bug fixes. So we wanted to know how often bad fixes occur for crash bugs, and whether there is any way we can help prevent them. Our work is motivated by these questions.
  • To see how often bad fixes occur, we investigated a crash reporting system. A crash reporting system is an automated system designed to help developers fix crashes. When software crashes, a window pops up and asks whether you want to send a crash report. Microsoft, Apple, and Mozilla each have their own crash reporting system.
  • Let me first briefly explain the Mozilla crash reporting system (CRS). When software is released, people download it and use it. Some of them experience crashes. Then the client part of the CRS generates a crash report with important information about the crash, such as the crash location, software version, OS, hardware information, and stack trace. Then it sends the generated crash report to a server.
  • The server receives many crash reports, so it groups similar crash reports together. This process is called bucketing. Mozilla groups crash reports that have the same crash point together. Then developers investigate the crash buckets.
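As a rough illustration of bucketing by crash point, the sketch below groups crash reports by the top frame of their stack. The report layout (a dict with a `stack` list whose first entry is the crash point) and the caller names are assumptions made for this example, not Mozilla's actual CRS schema.

```python
from collections import defaultdict


def bucket_by_crash_point(reports):
    """Group crash reports that share the same crash point.

    Assumption: each report is a dict whose "stack" list starts with
    the function where the crash happened (the crash point).
    """
    buckets = defaultdict(list)
    for report in reports:
        crash_point = report["stack"][0]
        buckets[crash_point].append(report)
    return dict(buckets)


# Hypothetical reports; only the crash point names come from the talk.
reports = [
    {"id": 1, "stack": ["nsTextFrame::Reflow", "caller_a"]},
    {"id": 2, "stack": ["nsTextFrame::Reflow", "caller_b"]},
    {"id": 3, "stack": ["nsHtml5ElementName::initializeStatics", "caller_c"]},
]
buckets = bucket_by_crash_point(reports)
bucket_sizes = {point: len(items) for point, items in buckets.items()}
```

Note that reports 1 and 2 land in the same bucket even though they reached the crash point through different callers.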
  • Usually they focus on the most frequent crashes first. If a developer decides to fix a crash bucket, he files a bug report and makes a patch, and the patch is included in the next release.
  • To see how often bad fixes occur, we investigated the Mozilla crash reporting system and bug reporting system for 19 sub-versions of Firefox 3.6. We found 70 bug reports, which fixed 79 crash points.
  • Then we checked whether all the crashes were gone. We identified two versions: the one before the patch was released and the one after the patch was released. Then we counted the number of crash reports in both versions. This way you can see whether the crashes are gone.
  • Here are a few examples. Firefox crashed here. We found 677 crash reports in this version. Then a developer fixed this crash, and after the fix we couldn't find any more crash reports. This is what we expect. It's a good fix. However, these two are bad-fix examples. We could still find a large number of crash reports after the bug fixes. Overall, among the 79 crash points, we found that more than 48% are recurring.
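The before/after comparison can be sketched as follows, using the three example rows and the 38-of-79 figure from the "Recurring Crash Examples" slide. Treating a crash point as recurring when the first release after the fix still has at least one crash report is an assumption of this sketch; the talk does not state the exact criterion.

```python
# Crash report counts per bug: (before fix, first release after fix, next release).
# Values are taken from the "Recurring Crash Examples" slide.
examples = {
    538722: (677, 0, 0),      # nsHtml5ElementName::initializeStatics
    554544: (773, 186, 497),  # nsTextFrame::Reflow
    528311: (70, 168, 0),     # nsXULTreeAccessible::GetTreeItemAccessible
}


def is_recurring(counts):
    """Assumed criterion: reports still arrive in the first release after the fix."""
    _before, after, _later = counts
    return after > 0


recurring_bugs = [bug for bug, counts in examples.items() if is_recurring(counts)]

# Overall rate reported in the talk: 38 recurring crash points out of 79.
recurring_rate_percent = round(100 * 38 / 79, 1)
```

Here `recurring_bugs` contains the two bad-fix examples, and `recurring_rate_percent` reproduces the 48.1% shown on the slide.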
  • Then isn't there any way we can help prevent this? We investigated the crash reports and bug fixes further and found that the crash reports in a crash bucket have the same crash point, but their crash paths can be different. So if the developer missed one path in his fix, the same crash could recur along the missed path.
  • Let's see an example. We found 70 crash reports before the bug was fixed. We grouped these crash reports according to their crash stacks and counted the number of crash reports for each stack. This is the result. There are 5 unique crash stacks, and the bars show the number of crash reports. Then we did the same thing for this version. Interestingly, all the other crash stacks are gone except the second one. It seems the second path was missed by the first fix. If we look at the history of this bug report, after realizing that the crash was not gone, the developer reopened the closed bug report and made another patch. After the new patch was released, the crash was gone.
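A minimal sketch of this sub-grouping step: reports in one bucket are grouped by their full crash stack, and the stacks seen before the fix are checked against the release after the fix. The stacks and counts below are toy data invented for illustration; the slide only gives the bucket totals.

```python
from collections import Counter


def subgroup_counts(reports):
    """Count crash reports per unique crash stack within one bucket."""
    return Counter(tuple(report["stack"]) for report in reports)


def surviving_stacks(before_reports, after_reports):
    """Stacks seen before the fix that still show up after it."""
    before = subgroup_counts(before_reports)
    after = subgroup_counts(after_reports)
    return [stack for stack in before if after.get(stack, 0) > 0]


# Toy bucket: one crash point ("crash_fn") reached through three hypothetical paths.
before_fix = (
    [{"stack": ["crash_fn", "path_1"]}] * 3
    + [{"stack": ["crash_fn", "path_2"]}] * 2
    + [{"stack": ["crash_fn", "path_3"]}] * 1
)
after_fix = [{"stack": ["crash_fn", "path_2"]}] * 4

missed = surviving_stacks(before_fix, after_fix)
```

In this toy data only the `path_2` stack survives the fix, which mirrors the situation described above where a single sub-group keeps crashing.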
  • Here's more evidence that developers do miss some paths. This is a comment in another bug report. The situation is similar. The developer realized that the crash was not gone, so he reopened the bug report, made another patch, and left this comment. So he definitely missed a crash trace.
  • We call these incomplete fixes. The fixes are incomplete in terms of fix locations. How can we help with this? What if we could find those missed paths automatically? That would help developers, right?
  • So this is the overview of our approach to finding those paths. When a developer makes a patch, we compare the patch with the crash reports and classify them as covered or missing. Then we present the missed crash paths to the developer so that he can fix them too. The process can be divided into preprocessing and classification; here I'll only explain classification.
  • The idea behind the classification is this. Let's say Firefox started from here, followed this path, and crashed here. Now assume a developer changed the code here. In the next release, this is what happens to the missed paths. So by comparing the fix location with the execution path, we can find the missed paths.
  • But what we have in crash reports is a crash stack, not an execution trace. So we use the crash stack. This is a crash stack. Each circle is a function, or stack frame. A called B, B called C, and it crashed here.
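The classification idea can be sketched as a simple overlap test: a crash stack counts as covered when at least one function changed by the patch appears in it, and as missing otherwise. This is a simplified reading of the talk; the function names (`A` to `E`) and the set-overlap rule are assumptions for illustration, and the preprocessing step is left out.

```python
def classify(crash_stack, fix_locations):
    """Label a crash stack as "covered" or "missing" for a given patch.

    Assumption: a stack is covered when at least one patched function
    appears among its frames; otherwise the patch never ran on that path.
    """
    if set(crash_stack) & set(fix_locations):
        return "covered"
    return "missing"


# Two hypothetical stacks that end at the same crash point C.
stack_1 = ["C", "B", "A"]   # A called B, B called C, crash in C
stack_2 = ["C", "D", "E"]   # a different path to the same crash point

fix_locations = {"B"}       # the patch changed function B only

labels = [classify(stack_1, fix_locations), classify(stack_2, fix_locations)]
```

The second stack never passes through `B`, so the patch cannot affect it and it is reported as missing.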
  • With this approach we designed an experiment. We had two research questions. The first one is "How good is our classification?" That is, what is the precision of our classification? We had a problem here: how do we know whether a prediction is correct or not? We don't have an oracle. So instead, what we did is this. If a crash path is really missed by a patch, it may recur in the next release. So we predicted that the crash paths classified as missing would recur, and we checked the prediction against the real crash reports from the next release in the Mozilla CRS. We calculated precision, recall, and F-measure for this prediction. The next research question is "How can this help developers?" For this one we present a few case studies and developers' feedback.
  • These are the subjects we used in our experiment. And this is the number of unique crash stacks in our experiment. We classify each of them as covered or missing. And this is the number of crash paths among them that really recurred.
  • So, the first question: how good is the classification?
  • At expansion level 4, our approach predicted 292 crash paths to recur, and among them 167 crash paths actually recurred. So the precision is 0.57. The precision and recall vary according to the expansion level.
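The reported numbers can be checked with a short calculation. Precision follows directly from the two counts above; the F-measure is recomputed here from the precision and recall values on the results slide (0.57 and 0.49), since the talk does not spell out the recall denominator.

```python
predicted_to_recur = 292   # crash paths classified as missing at expansion level 4
actually_recurred = 167    # of those, paths seen again in the next release

precision = actually_recurred / predicted_to_recur

# Reported values from the results slide, used as given.
reported_precision = 0.57
reported_recall = 0.49

# F-measure is the harmonic mean of precision and recall.
f_measure = (2 * reported_precision * reported_recall
             / (reported_precision + reported_recall))

print(round(precision, 2), round(f_measure, 2))
```

Both rounded values agree with the slide: 0.57 for precision and 0.53 for F-measure.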
  • This shows the results at different expansion levels. L-0 means no expansion and L-∞ means the stack is expanded as much as possible. As the expansion level goes higher, the stack becomes larger and comes closer to an over-approximation of the original execution trace. You can see that the precision goes up while the recall goes down. Also, the accuracy is highly affected by the crash reports we collected in the next release. That set becomes our oracle data set. There are many reasons this set is incomplete. We only collected crash reports during a limited period of time. Users may not have submitted crash reports. All of these affect our prediction results. But our approach shows reasonable accuracy.
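The talk does not spell out how a stack is expanded, so the following is only a guess at the general shape: functions reachable from the stack frames through a static call graph are added, one hop per expansion level, with `None` standing in for L-∞. The call graph, the function names, and the hop-based rule are all assumptions, and the control-flow analysis shown on the "Stack Expansion" slide is ignored.

```python
def expand_stack(stack, call_graph, level):
    """Return the stack's functions plus callees up to `level` hops away.

    Assumptions: `call_graph` maps a function to the functions it calls;
    `level=None` means expand until nothing new is found.
    """
    expanded = set(stack)
    frontier = set(stack)
    hops = 0
    while frontier and (level is None or hops < level):
        next_frontier = set()
        for function in frontier:
            for callee in call_graph.get(function, ()):
                if callee not in expanded:
                    expanded.add(callee)
                    next_frontier.add(callee)
        frontier = next_frontier
        hops += 1
    return expanded


# Hypothetical call graph: A also calls G, which returned before the crash.
call_graph = {"A": ["B", "G"], "B": ["C"], "G": ["H"], "H": ["I"]}
stack = ["C", "B", "A"]
fix_locations = {"G"}

covered_at_l0 = bool(expand_stack(stack, call_graph, 0) & fix_locations)
covered_at_l1 = bool(expand_stack(stack, call_graph, 1) & fix_locations)
```

Under these assumptions a fix in `G` is invisible at L-0 but makes the stack covered at L-1, so higher levels label fewer stacks as missing, which fits the trade-off between precision and recall described above.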
  • OK, the results show reasonable precision and recall. We can predict either with high precision or with high recall. Now, how can this help developers?
  • Here is a case study. There are two stack traces. The first one is covered, and the second one is missing. Our approach correctly predicts that the second one will recur. Now look at the two fixes. This is the first fix and this is the second fix. The two fixes are very similar. Both call IsDefunct and return. So if the developer had known about this missed path when he made the first fix, he could easily have fixed it too, because the two fixes are very similar. You can find more case studies in the paper.
  • Also, we asked developers whether this approach is useful. We briefly explained our approach with a few case studies and sent an email to the Firefox developers who fixed the crashes used in our experiment. We received 21 responses: 3 said it is very useful, 7 said it is useful, 1 said it is not useful, and 10 requested more information, such as whether we could run the experiment on recent Firefox crash reports.
  • We only used Mozilla Firefox, and only sub-versions of Firefox 3.6 over a limited period.
  • In this work we only focused on incomplete fixes. But there is another type of bad fix: incorrect fixes. To handle this crash, the developer made a fix at this location, so our approach predicts that it will not recur. However, this crash recurred because the fix was wrong. Previously this buf was pointing to an invalid memory area, so the developer added this code to check the validity of the variable. But this code was insufficient, and Firefox went through it and crashed again. Later the developer added more code. To handle this case we need a more rigorous verification technique, which is our future work. Currently our approach cannot find such incorrect fixes.
  • There is related work on crashes. It is possible that Firefox crashes at two different locations but with the same root cause. In that case it is better to put those crash reports into the same bucket. Crash bucketing algorithms address this issue. Our approach could be more accurate with a better bucketing algorithm. There is also work that tries to find the root cause of a crash by reasoning backward from the crash point. Our work is different: once a developer has made a fix, we try to verify it. Gu et al. also tried to verify bug fixes, by generating more inputs that can trigger the same bug. We couldn't use the same approach for crashes because reproducing crashes is very challenging. Instead we used crash stacks to verify bug fixes.

Predicting Recurring Crash Stacks (ASE 2012) Presentation Transcript

  • Predicting Recurring Crash Stacks. Hyunmin Seo and Sunghun Kim, The Hong Kong University of Science and Technology. September 7th, 2012. Automated Software Engineering 2012, Essen, Germany
  • Recurring Crashes (diagram: Bug Report 528311; patch between 3.6b3 and 3.6b4; crash point in both versions: nsXULTreeAccessible::GetTreeItemAccessible)
  • Bad Fixes
    • Bad fixes comprise as much as 9% of all bugs (Gu et al., ICSE 2010)
    • 14.8%∼24.4% of fixes for post-release bugs are incorrect (Yin et al., FSE 2011)
  • Motivation
    • How often do bad fixes occur?
    • How can we help to prevent it?
  • Crash Reporting System (CRS)
  • Mozilla CRS (diagram: release, CRS client, crash reports (CR), server)
  • Mozilla CRS (diagram: CRS server)
  • Mozilla CRS (diagram: Bug Report #50001, patch file, next release)
  • How often are bad fixes? Crash Reporting System and Bug Reporting System: 19 sub-versions of Firefox 3.6, 70 bug reports, 79 crash points
  • Have all crashes disappeared after fixes? (diagram: crash reports before fix; Bug Report #50001 and patch; crash reports after fix)
  • Recurring Crash Examples
    BUGID   CRASH POINT                                  ver1         ver2         ver3
    538722  nsHtml5ElementName::initializeStatics        3.6.8: 677   3.6.9: 0     3.6.10: 0
    554544  nsTextFrame::Reflow                          3.6.6: 773   3.6.7: 186   3.6.8: 497
    528311  nsXULTreeAccessible::GetTreeItemAccessible   3.6b3: 70    3.6b4: 168   3.6b5: 0
    Recurring crash points: 48.1% (38/79)
  • Crash Paths
    • The same crash point but different crash paths
    • The fix may miss some paths
  • Bug 528311, crash point nsXULTreeAccessible::GetTreeItemAccessible: 70 crash reports in 3.6b3, 168 in 3.6b4, 0 in 3.6b5 (bar charts: number of crash reports per sub-group #1 to #5, in 3.6b3 and in 3.6b4)
  • Comment in Bug Report: “I don’t know how this bit (crash trace) got lost from the patch I ended up checking in, but it’s pretty essential...” (a comment in bug report #523528)
  • Incomplete Fixes
    • We call this as incomplete fixes
    • “incomplete” in terms of fix locations
    • How can we help to prevent this?
  • Approach Overview (diagram: the patch file from Bug Report #50001 is compared with crash reports, which are classified as Covered or Missing)
  • Idea behind Classification: A fix has nothing to do if it is not executed (diagram: execution paths with the fix location marked ✓)
  • Stack Expansion (diagram: a crash stack expanded at levels L-1, L-2, and L-3, with the CFG of A showing Path 1 and Path 2 from Entry to Exit; classification rule: Covered (if S F), Missing (otherwise))
  • Experimental Design
    • RQ1 - How good is the classification?
    • RQ2 - How can this help developers?
  • Subjects
    Name                        Description
    Subject                     19 releases of Firefox 3.6
    Release Date                Oct 2009 ~ Mar 2011
    Programming Language        C / C++
    LOC                         3.2M ~ 4.4M

    Name                        Value
    # of crash buckets          33
    # of total sub-groups       1159
    # of recurring sub-groups   354
  • Experimental Design
    • RQ1 - How good is the classification?
    • RQ2 - How can this help developers?
  • RQ1 - Prediction Result (L4): Prediction 292, Actual 167, Precision 0.57, Recall 0.49, F-measure 0.53
  • RQ1 - Expansion Levels (line chart: precision, recall, and F-measure, value axis 0 to 0.9, at expansion levels L-0, L-1, L-2, L-3, L-4, L-5, L-7, L-10, L-∞)
  • Experimental Design
    • RQ1 - How good is the classification?
    • RQ2 - How can this help developers?
  • RQ2 - Case Study
    trace a: IEnumConnectionPoints_RemoteNext_Thunk, IEnumOleUndoUnits_Next_Stub, nsAccessibleWrap::Next, nsXULTreeAccessible::GetChildAt ✓, nsXULTreeAccessible::GetTreeItemAccessible (crash point)
    trace b: nsRootAccessible::HandleEvent, nsRootAccessible::HandleEventWithTarget, nsXULTreeAccessible::GetTreeItemAccessible ✓ (crash point)
    First Fix (#528311) in 3.6b4:
          NS_ENSURE_ARG_POINTER(aChild);
          *aChild = nsnull;

        + if (IsDefunct())
        +   return NS_ERROR_FAILURE;

          PRInt32 childCount = 0;
    Second Fix (#528311) in 3.6b5:
          *aAccessible = nsnull;

        - if (aRow < 0)
        + if (aRow < 0 || IsDefunct())
            return;

          PRInt32 rowCount = 0;
  • RQ2 - Developer Feedback. Firefox developer emails and mailing lists. 21 responses: 3 very useful, 7 useful, 10 requested more information, 1 not useful.
    “It should be an interesting feature and useful like any automation tool. It should make the engineering work easier and keep users less annoyed.”
    “The first patch fixed the known steps but missed the fact that other routes led to the same state inconsistency. ... If you have a system that automates that process it would indeed be helpful.”
  • Threats to validity
    • The subject is open source software
    • Collected crash data might be biased
    • Oracle data set is incomplete
  • Discussion – Future Work
    Crash stack: nsJARInputThunk::EnsureJarStream, nsZipReaderCache::GetZip, nsJAR::Open, nsZipArchive::OpenArchive, nsZipArchive::BuildFileList (crash point ✓)
          //-- Read the central directory headers
          buf = startp + centralOffset;
        + if (endp - buf < sizeof(PRUint32))
        +   return NS_ERROR_FILE_CORRUPTED;
          PRUint32 sig = xtolong(buf); // crash point
          while (sig == CENTRALSIG) {
  • Related Work
    • Crash bucketing (Dang et al., ICSE 2012)
    • Post-mortem crash analysis (Manevich et al., FSE 2004)
    • Bug fix verification (Gu et al., ICSE 2010)
  • Conclusions
    • 48% of fixed crashes in Firefox recurred.
    • We present an approach to predict recurring crashes
    • RQ1 - How good is the classification?
      • Our approach yields reasonable accuracy - 0.57 precision and 0.49 recall
    • RQ2 - How can this help developers?
      • Our case studies and developers’ feedback show the idea is useful