CASCON08.ppt

Transcript

  • 1. Is it a Bug or an Enhancement? A Text-based Approach to Classify Change Requests. Giuliano Antoniol, Kamel Ayari, Massimiliano Di Penta, Foutse Khomh and Yann-Gaël Guéhéneuc. CASCON 2008, October 27-30, 2008. © Khomh 2008
  • 2. Context: Bug tracking systems are valuable assets for managing maintenance activities. They collect many different kinds of issues: requests for defect fixing, enhancements, refactoring/restructuring activities, and organizational issues. However, these are all simply labeled as bugs for lack of a better classification. In recent years, the literature has reported contributions on merging data from CVS and bug reports to identify whether CVS changes are related to bug fixes, to detect co-changes, and to study evolution patterns.
  • 3. Related Works: In recent years, there have been many studies based on data from BTSs. Sliwerski et al. introduced a refined approach to identify whether a change induced a bug fix. Runeson et al. investigated using Natural Language Processing techniques for the identification of duplicate defect reports. However, none of these works deeply investigated the kinds of data stored in BTSs, such as Mozilla's Bugzilla BTS.
  • 4. Research questions: In this paper, we study the consistency of the data contained in BTSs. To achieve that, we manually classified 1,800 issues extracted from the BTSs of Mozilla, Eclipse, and JBoss using simple majority voting, and we used machine learning techniques to perform automatic classifications.
  • 5. Background: BTS (Bug Tracking System). The workflow shown on the slide: the developer makes an error; the error results in a fault; the tester verifies the bug and posts it into the BTS; the developer then views the bug description. In this study, we focus on the two most popular bug tracking systems: Bugzilla and Jira.
  • 6. Research questions: We answered the following questions. RQ1: Issue classification. To what extent can the information contained in issues posted on bug tracking systems be used to classify such issues, distinguishing bugs (i.e., corrective maintenance) from other activities (e.g., enhancement, refactoring, etc.)? RQ2: Discriminating terms. What are the terms/fields that machine learning techniques use to discern bugs from other issues? RQ3: Comparison with grep. Do machine learning techniques perform better than grep and regular expression matching in general, techniques often used to analyze Concurrent Versions System (CVS)/Subversion (SVN) logs and classify commits between bugs and other activities?
  • 7. Objects of our study: We perform our study using three well-known, industrial-strength, open-source systems: Eclipse, Mozilla, and JBoss. We use an RSS feed to extract the 3,207 issues classified as "Resolved". We select issues with the "Resolved" or "Closed" status to avoid duplicate bugs, rejected issues, or issues awaiting triage.
  • 8. Automatic classification: We first use a feature selection algorithm to select a subset of the available features with which to perform the automatic classification. Automatic classifiers require a labeled corpus, a set of tagged BTS issues acting as the oracle for the training. Each automatic classifier is then trained on a set of BTS issues and its performance is evaluated using cross validation.
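The evaluation step above can be sketched as a generic k-fold split. This is a minimal illustration in Python, not the authors' Weka setup; the fold count of 10 is an assumption, as the slides do not state it.

```python
def k_fold_splits(n_items, k=10):
    """Yield (train_indices, test_indices) pairs for k-fold cross validation.

    Generic sketch; the paper used Weka's built-in cross validation, and
    k=10 is an assumed fold count, not taken from the slides.
    """
    indices = list(range(n_items))
    fold_size, remainder = divmod(n_items, k)
    start = 0
    for fold in range(k):
        # early folds absorb the remainder so every item lands in exactly one test fold
        size = fold_size + (1 if fold < remainder else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size
```

Each classifier is trained on the `train` indices and scored on the held-out `test` indices; averaging over the folds gives the reported performance.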
  • 9. Automatic classification: The automatic classification of BTS issues is performed using the Weka tool (http://www.cs.waikato.ac.nz/ml/weka/), in particular using: the symmetrical uncertainty attribute selector, the standard probabilistic naive Bayes classifier, the alternating decision tree (ADTree), and the linear logistic regression classifier.
  • 10. Construction of the Oracle: We randomly sample and manually classify 600 issues for each system, for a total of 1,800 distinct issues. We organize the issues in bundles of 150 entries each. For every subset, we ask three software engineers to classify the issues manually, stating whether each issue is a corrective maintenance (bug) or a non-corrective maintenance (enhancement, refactoring, re-documentation, or other, i.e., non-bug). The classifications go through a simple majority vote and a decision on the status of each issue is made.
  • 11. Construction of the Oracle: Decision rule. An entry is considered a corrective maintenance if at least two out of three engineers classified it as such (hereby referred to as a "bug"). Otherwise, the entry is considered a non-corrective maintenance (hereby referred to as a "non-bug"). The classification yields the results shown in the table on the slide.
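The decision rule can be expressed directly in code; a minimal sketch, assuming the labels are the strings "bug" and "non-bug":

```python
def majority_vote(labels):
    """Apply the slide's decision rule to three engineers' labels:
    'bug' if at least two out of three say bug, otherwise 'non-bug'."""
    assert len(labels) == 3, "the oracle uses exactly three engineers per issue"
    return "bug" if labels.count("bug") >= 2 else "non-bug"
```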
  • 12. Terms extraction and indexing. Step 1: Term extraction from bug reporting systems: pruning punctuation and special characters; camel case, "-", and "_" word splitting. Step 2: Stemming using the R implementation of the Porter stemmer (http://www.r-project.org). Note: stop words are not removed since terms such as "not", "might", and "should" contribute to the classification and are actually selected. Step 3: Term indexing. Terms are indexed using the term frequency (tf); we did not use tf-idf since we do not want to penalize terms appearing in many documents.
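Steps 1 and 3 can be sketched as follows (Step 2, Porter stemming, is omitted here; the paper used the R implementation). The exact splitting regular expression is an illustrative assumption, not the authors' code:

```python
import re
from collections import Counter

def extract_terms(text):
    """Step 1: prune punctuation/special characters, then split words on
    camelCase, '-' and '_' boundaries; terms are lower-cased."""
    terms = []
    for word in re.split(r"[^A-Za-z0-9_-]+", text):
        for part in re.split(r"[-_]+", word):
            # upper-case runs (NPE), capitalized words (Null), lower-case runs, digits
            terms.extend(t.lower() for t in
                         re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part))
    return terms

def term_frequencies(text):
    """Step 3: index terms by raw term frequency (tf), not tf-idf."""
    return Counter(extract_terms(text))
```

For example, a summary like "NullPointerException in parse_input-handler" is split into the terms null, pointer, exception, in, parse, input, handler.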
  • 13. Results of the automatic classification. Mozilla: automatic classification confusion matrices (in bold, percentage of correct decisions).
  • 14. Results of the automatic classification. Eclipse: automatic classification confusion matrices (in bold, percentage of correct decisions).
  • 15. Results of the automatic classification. JBoss: automatic classification confusion matrices (in bold, percentage of correct decisions).
  • 16. Discriminating Terms: We also studied the features that are used to perform the classification. Positive coefficients push the classification towards a bug classification, while negative coefficients push it towards a non-bug classification. Terms such as "crash", "critic", "broken", and "when" lead to classifying the issue as a "bug". Terms such as "should", "implement", and "support" cause a classification as "non-bug". Mozilla: example of ADTree.
  • 17. Discriminating Terms: Here, positive coefficients push the classification towards a non-bug classification, while negative coefficients push it towards a bug classification. Terms having a high influence for the "bug" classification are: "except(ion)", "fail", "npe" (null-pointer exception), "error", "correct", "termin(ation)", and "invalid". Terms such as "provid(e)" and "add" possibly indicate a non-bug issue. Eclipse: logistic regression coefficients.
  • 18. Comparison with grep: To assess the usefulness of the machine-learning classifiers, it is useful to compare their performance with that of the simplest classifier developers would have used: string and regular expression matching, e.g., using the Unix utility grep. We classify issues by means of a grep regular expression chosen to maximize retrieval (shown on the slide).
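The grep baseline works like the sketch below. The keyword pattern is a hypothetical illustration only, since the paper's actual regular expression appears on the slide image and is not reproduced here:

```python
import re

# Hypothetical keyword pattern for illustration; the paper's actual regular
# expression is shown on the slide and is not reproduced in this transcript.
BUG_PATTERN = re.compile(
    r"\b(bug|fix(ed|es)?|crash(es|ed)?|error|fail(s|ed|ure)?|defect|npe)\b",
    re.IGNORECASE)

def grep_classify(issue_text):
    """Any hit counts as a detected bug; multiple hits on the same issue
    are not counted more than once (hence search, not findall)."""
    return "bug" if BUG_PATTERN.search(issue_text) else "non-bug"
```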
  • 19. Comparison with grep: Each hit on the filtered textual information of the 1,800 manually-classified bugs was considered a detected bug; multiple hits on the same issue were not counted. Mozilla: grep confusion matrix for manually classified bugs (in bold, percentage of correct decisions).
  • 20. Comparison with grep: Eclipse and JBoss grep confusion matrices for manually classified bugs (in bold, percentage of correct decisions).
  • 21. Threats to the Validity. Internal validity: We attempted to avoid any bias in the building of the oracle and of the classifiers by first classifying each issue manually, without making any choice on the classifiers to be used. External validity: We randomly selected and manually classified our issues. We obtained a confidence level of 95% and a confidence interval of 10% for precision and recall. Although the approach is perfectly applicable to other systems, we do not know whether the same results would be obtained.
  • 22. Conclusion: We showed that the linguistic information contained in BTS entries is sufficient to automatically distinguish corrective maintenance from other activities. This is relevant in that it opens the possibility of building automatic routing systems, i.e., systems that automatically classify submitted tickets and route them to the maintenance team (bugs) or to the team leader (enhancement requests and other issues). Certain terms and fields lead to classifiers that better discriminate between "bug" and "non-bug" issues. A naive approach using grep is no match for the classifiers built using our oracle.
  • 23. Conclusion: We can report that, out of the 1,800 manually-classified issues, less than half are related to corrective maintenance. Therefore, bug tracking systems, in open-source development, have a far more complex use than simple bookkeeping of corrective maintenance. Studies based on BTS issues should carefully consider which issues are used to build their predictive models.
  • 24. Future Work: Our future work includes studying the relation between bugs and design patterns, and between bugs and design defects.
  • 25. Questions. Thank you for your attention!
  • 26. Feature selection: Not all features (terms) contribute to increasing precision and recall. We also need to build a simple model that is easy to interpret and reuse (external validity), following the principle of Occam's razor. Feature selection is therefore necessary. Some algorithms (e.g., decision trees) already perform feature selection; others do not. The symmetric uncertainty selector selects attributes that correlate well with the class and have little intercorrelation with each other.
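Symmetric uncertainty is an entropy-based score, SU(X, Y) = 2 · I(X; Y) / (H(X) + H(Y)), ranging from 0 (independent) to 1 (fully correlated). A minimal sketch for discrete samples, not Weka's implementation:

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy H of a discrete sample, in bits."""
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())

def symmetric_uncertainty(xs, ys):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)),
    where I(X; Y) = H(X) + H(Y) - H(X, Y)."""
    hx, hy = entropy(xs), entropy(ys)
    info_gain = hx + hy - entropy(list(zip(xs, ys)))  # mutual information
    return 2 * info_gain / (hx + hy) if hx + hy else 0.0
```

An attribute whose values track the bug/non-bug class gets SU near 1; one unrelated to the class gets SU near 0.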
  • 27. Naïve Bayes Classifier: A probabilistic classifier that applies Bayes' theorem under a strong (naïve) assumption of independence among features. It selects the most likely classification C for the feature values F1, F2, ..., Fn: p(C | F1, ..., Fn) = p(C) p(F1, ..., Fn | C) / p(F1, ..., Fn).
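The formula above can be instantiated as a small multinomial naive Bayes with Laplace smoothing. This is a generic sketch, not Weka's implementation, and the toy training terms below are invented for illustration:

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels):
    """Learn the prior p(C) and per-class term counts from labeled term lists."""
    priors = Counter(labels)
    term_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        term_counts[label].update(doc)
    vocab = {t for c in term_counts.values() for t in c}
    return priors, term_counts, vocab

def classify(doc, priors, term_counts, vocab):
    """Pick argmax_C [log p(C) + sum_i log p(F_i | C)], Laplace-smoothed."""
    total = sum(priors.values())
    best, best_score = None, -math.inf
    for label, prior in priors.items():
        n = sum(term_counts[label].values())
        score = math.log(prior / total)
        for term in doc:
            score += math.log((term_counts[label][term] + 1) / (n + len(vocab)))
        if score > best_score:
            best, best_score = label, score
    return best
```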
  • 28. Classification tree: A mapping from observations about an item to conclusions about its target value. Leaves represent classifications; branches represent conjunctions (questions) on features that lead to those classifications. Example on the slide: a weather tree splitting first on outlook (sunny/overcast/rainy), then on humidity (high/normal) and windy (true/false), leading to yes/no leaves.
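The weather tree on the slide is the classic "play tennis" example from the machine learning literature; hand-coded, it is just nested conditionals, one branch per question:

```python
def play(outlook, humidity, windy):
    """Hand-coded version of the slide's weather decision tree:
    split on outlook, then on humidity (sunny) or windy (rainy)."""
    if outlook == "overcast":
        return "yes"
    if outlook == "sunny":
        return "yes" if humidity == "normal" else "no"
    if outlook == "rainy":
        return "no" if windy else "yes"
    raise ValueError("unknown outlook: " + outlook)
```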
  • 29. Logistic regression: Models the relationship between a set of variables x_i and a dichotomous (binary) variable Y. Very common in software engineering, e.g., for the classification of defect-prone (faulty) classes. The slide shows the S-shaped logistic curve: the probability of Y as a function of x_i, ranging from 0.0 to 1.0.
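The S-shaped curve on the slide is the logistic function; a minimal sketch, with assumed coefficients beta0 and beta1 (in the paper these are fitted from the term features):

```python
import math

def logistic(x, beta0=0.0, beta1=1.0):
    """p(Y = 1 | x) = 1 / (1 + exp(-(beta0 + beta1 * x))):
    maps any real input to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))
```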
