Comparing Text Mining Algorithms for Predicting the Severity of a Reported Bug
Comparing Text Mining Algorithms for Predicting the Severity of a Reported Bug Presentation Transcript

  • 1. Proceedings of the 15th European Conference on Software Maintenance and Reengineering. Comparing Text Mining Algorithms for Predicting the Severity of a Reported Bug. Ahmed Lamkanfi, Serge Demeyer, Quinten David Soetens, Tim Verdonck. (1/19, Monday 7 March 2011)
  • 5.–6. Severity of a bug is important ✓ Critical factor in deciding how soon it needs to be fixed, i.e. when prioritizing bugs. Priority is not severity! ✓ e.g. a crash occurring only for a small user base may have a low priority
  • 7. Severity is technical
  • 8. Priority is business
  • 9.–11. ✓ Severity varies: ➡ trivial, minor, normal, major, critical and blocker ➡ clear guidelines exist to classify the severity of bug reports ✓ Bugs are grouped according to products and components ➡ e.g. UI, SWT and Debug are components of the product Eclipse
  • 12. Approach
  • 13.–19. The approach: (1) & (2) extract and preprocess bug reports from the bug database; (3) train a predictor on these reports; (4) predict the severity of a new report as either non-severe (minor, trivial) or severe (major, critical, blocker). [Lamkanfi et al., Proceedings of the 2010 7th IEEE Working Conference on Mining Software Repositories, p. 1-10]
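Steps (1) and (2) of this pipeline can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: scikit-learn is an assumed toolchain, and the bug summaries below are hypothetical stand-ins for real Bugzilla reports.

```python
# Sketch of steps (1) & (2): turn free-text bug reports into a
# term-count matrix plus binary severity labels.
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical bug summaries; real input would come from Bugzilla.
reports = [
    "crash when opening the preferences dialog",
    "typo in the wizard label",
    "application hangs on startup",
    "button mnemonic missing in dialog",
]
# Binary scheme from the talk: severe = {major, critical, blocker},
# non-severe = {minor, trivial}; "normal" appears in neither bucket.
labels = [1, 0, 1, 0]

# Tokenize, lowercase and count terms; stemming (suggested by the
# extracted keywords like "gener" and "manipul" later in the talk)
# would be a further preprocessing step.
vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(reports)
print(X.shape)  # one row per report, one column per term
```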
  • 20.–26. Which text mining algorithm should we use when predicting the severity? ➡ Support Vector Machines, Naive Bayes, Naive Bayes Multinomial, 1-Nearest Neighbor. Secondary questions: How much training is necessary? ➡ investigate the learning curve. What are the characteristics of the prediction algorithm? ➡ extract keywords
  • 27.–30. The classifiers: Support Vector Machines, 1-Nearest Neighbor, Naive Bayes, Naive Bayes Multinomial
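As a sketch, the four algorithms can be instantiated with scikit-learn. This is an assumed substitute for the authors' setup; in particular, `BernoulliNB` stands in here for "standard" Naive Bayes over binary term occurrences, and the data is synthetic.

```python
# The four text mining algorithms compared in the talk, as a
# scikit-learn sketch (not the authors' original tooling).
import numpy as np
from sklearn.naive_bayes import BernoulliNB, MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import LinearSVC

classifiers = {
    "NB": BernoulliNB(),                          # binary term occurrence
    "NB Multinomial": MultinomialNB(),            # term counts
    "1-NN": KNeighborsClassifier(n_neighbors=1),  # nearest neighbor
    "SVM": LinearSVC(),                           # linear SVM
}

# Tiny synthetic term-count matrix: 8 reports, 5 terms.
rng = np.random.default_rng(1)
X = rng.poisson(1, (8, 5))
y = np.array([0, 1, 0, 1, 0, 1, 0, 1])

for name, clf in classifiers.items():
    preds = clf.fit(X, y).predict(X)
    print(name, preds.shape)
```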
  • 31. Evaluation: ✓ Receiver Operating Characteristic (ROC) curve ✓ Area Under Curve (AUC): 0.5 is random prediction; 1.0 is perfect classification ✓ deals with unbalanced category distributions. [Figure: ROC curves of three example classifiers, true positive rate vs. false positive rate]
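A minimal AUC computation with made-up scores for four reports illustrates the 0.5-random / 1.0-perfect scale used throughout the talk:

```python
# ROC/AUC sketch: AUC of 0.5 means random guessing, 1.0 means
# perfect separation of severe from non-severe reports.
from sklearn.metrics import roc_auc_score

y_true = [0, 0, 1, 1]            # 1 = severe
y_score = [0.1, 0.4, 0.35, 0.8]  # classifier's estimated P(severe)

# One of the four (non-severe, severe) pairs is ranked wrongly
# (0.4 > 0.35), so 3 of 4 pairs are ordered correctly: AUC = 0.75.
auc = roc_auc_score(y_true, y_score)
print(auc)  # → 0.75
```

AUC is the probability that a randomly chosen severe report gets a higher score than a randomly chosen non-severe one, which is why it is insensitive to the unbalanced class distributions seen in the bug data.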
  • 32.–33. Cases for our study (Table II: basic numbers about the selected components for respectively Eclipse and GNOME):

    Product / Component        Non-severe bugs   Severe bugs
    Eclipse / SWT                          696          3218
    Eclipse / User Interface              1485          3351
    JDT / User Interface                  1470          1554
    Eclipse / Debug                        327           485
    CDT / Debug                             60           205
    GEF / Draw2D                            36            83
    Evolution / Mailer                    2537          7291
    Evolution / Calendar                   619          2661
    GNOME / Panel                          332          1297
    Metacity / General                     331           293
    GStreamer / Core                        93           352
    Nautilus / CD-Burner                    73           355

    Both Eclipse and GNOME use Bugzilla as their bug database. The experiment is evaluated with K-fold cross-validation: in 10-fold cross-validation, for example, the complete set of available bug reports is first split randomly into 10 subsets. These subsets are split in a stratified manner, meaning that the distribution of the severities in the subsets respects the distribution of the severities in the complete set of bug reports.
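The stratified split described above can be sketched with scikit-learn's `StratifiedKFold` (an assumed implementation, with dummy labels): given 30 non-severe and 70 severe reports, every test fold keeps the 30/70 ratio.

```python
# Stratified 10-fold cross-validation sketch: each fold preserves
# the severe / non-severe distribution of the complete set.
import numpy as np
from sklearn.model_selection import StratifiedKFold

y = np.array([0] * 30 + [1] * 70)  # 30 non-severe, 70 severe
X = np.zeros((100, 1))             # placeholder feature matrix

skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
fold_severe = [int(y[test].sum()) for _, test in skf.split(X, y)]
print(fold_severe)  # every 10-report test fold holds 7 severe bugs
```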
  • 34. Results
  • 35. Which text mining algorithm to use? (1/2) [Figure: ROC curves of NB, NB Multinomial, 1-NN and SVM, true positive rate vs. false positive rate, for Eclipse / SWT and Evolution / Mailer]
  • 36.–37. Which text mining algorithm to use? (2/2) Table IV: Area Under Curve results from the different components (best results per row marked with *):

    Product / Component      NB     NB Mult.  1-NN   SVM
    Eclipse / SWT            0.74   0.83*     0.62   0.76
    JDT / UI                 0.69   0.75*     0.63   0.71
    Eclipse / UI             0.70   0.80*     0.61   0.79
    Eclipse / Debug          0.72   0.76*     0.67   0.73
    GEF / Draw2D             0.59*  0.55      0.51   0.48
    CDT / Debug              0.68   0.70*     0.52   0.69
    Evolution / Mailer       0.84   0.89*     0.73   0.87
    Evolution / Calendar     0.86   0.90*     0.78   0.86
    GNOME / Panel            0.89   0.90*     0.78   0.86
    Metacity / General       0.72   0.76*     0.69   0.71
    GStreamer / Core         0.74   0.76*     0.65   0.73
    Nautilus / CDBurner      0.93*  0.93*     0.81   0.91

    From the ROC curves we notice a winner in both cases: the Naive Bayes Multinomial classifier. At the same time, we observe that the Support Vector Machines classifier is nearly as accurate as the Naive Bayes Multinomial classifier. Furthermore, the accuracy decreases in the case of the standard Naive Bayes classifier, and the 1-Nearest Neighbor based approach tends to be the least accurate classifier. The same conclusions can be drawn from the other selected cases based on the Area Under Curve measures in Table IV.
  • 38.–39. How much need for training? [Figure: learning curves, AUC vs. number of training reports, for NB, NB mult., 1-NN and SVM on Eclipse / SWT (up to 1400 reports) and Evolution / Mailer (up to 3000 reports)]
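A learning curve of the kind plotted above can be sketched by training on growing subsets and scoring a fixed held-out set. The data here is synthetic; the talk uses real Eclipse and GNOME reports, and scikit-learn is an assumed toolchain.

```python
# Learning-curve sketch: train Naive Bayes Multinomial on growing
# training sets and track the AUC on a held-out test set.
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.naive_bayes import MultinomialNB

rng = np.random.default_rng(0)
n = 1000
y = rng.integers(0, 2, n)
X = rng.poisson(1, (n, 20)) + y[:, None]  # counts shifted by label

X_tr, y_tr = X[:800], y[:800]
X_te, y_te = X[800:], y[800:]

aucs = []
for size in (50, 100, 200, 400, 800):
    clf = MultinomialNB().fit(X_tr[:size], y_tr[:size])
    aucs.append(roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
print([round(a, 2) for a in aucs])  # curve flattens as size grows
```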
  • 40.–41. Classifier characteristics: terms which we extracted from the resulting Naive Bayes Multinomial classifier of two cases (Table VI: top most significant terms of each severity):

    Case: Eclipse JDT UI
      Non-severe: quick, fix, dialog, type, gener, code, set, javadoc, wizard, mnemon, messag, prefer, manipul, button, delete, rename, wrong, extract, label, add, quickfix, pref, constant, enabl, icon, paramet, constructor
      Severe: npe, java, file, package, open, junit, eclips, editor, folder, problem, import, method, project, cannot, warn, page, miss, error, view, search, fail, intern, broken, run, explore, cause, perspect, jdt, classpath, hang, resourc, save, crash

    Case: Evolution Mailer
      Non-severe: message, not, button, change, dialog, display, doesnt, header, list, search, select, show, start, signature, text, cancel, onli, appli, prefer, load, pad, ad, content, tree, user, automat, field, startup, subscribe, mode, sourc, import, press, print
      Severe: crash, evolut, mail, email, imap, click, evo, inbox, mailer, open, read, server, hang, bodi, unread, view, window, junk, make, sigsegv, another, encrypt, warn, segment
  • 42. Conclusions ✓ Naive Bayes Multinomial is the most accurate predictor ✓ More training results in more stable predictions ✓ Characteristics of classifiers tend to be component specific (19/19)