PROMISE 2011: "Detecting Bug Duplicate Reports through Locality of Reference"


Tomi Prifti, Sean Banerjee and Bojan Cukic.

Published in: Technology, Education

  1. Detecting Bug Duplicate Reports Through Locality of Reference
     Tomi Prifti, Sean Banerjee, Bojan Cukic
     Lane Department of CSEE, West Virginia University, Morgantown, WV, USA
     September 2011
  2. Presentation Outline
     • Introduction
     • Goals
     • Related Work
     • Understanding the Firefox Repository
     • Experimental Setup
     • Results
     • Summary
  3. Introduction
     • Bug tracking systems are essential for software maintenance and testing
     • Developers and ordinary users can report failure occurrences
     • Advantages:
         • Users are involved in error reporting
         • Direct impact on software quality
     • Disadvantages:
         • Large number of reports on a daily basis
         • Significant effort to triage
         • Users may submit many duplicate reports
  4. A typical bug report
  5. Goals
     • Comprehensive empirical analysis of a large bug report dataset
     • Creation of a search tool
         • Encourage users to search the repository
         • Avoid duplicate report submission whenever possible
         • Assist with report triage
             • Build a list of reports possibly describing the same problem
             • Let a triager examine the suggested list
  6. Related Work
     • Providing Triagers with a Suggested List
         • Provide triagers with a suggested list of similar bugs to examine
             • Wang et al. exploit NLP techniques and execution information
             • Duplicate detection rate as high as 67%-93%
     • Semi-automated Filtering
         • Determine the type of the report (Duplicate or Primary). If the new report is classified as a duplicate, filter it out
             • Jalbert et al. use text semantics and a graph clustering technique to predict duplicate status
             • Filtered out only 8% of duplicate reports
  7. Related Work
     • Semi-automated Assignment
         • Apply text categorization techniques to predict the developer who should work on the bug
             • Cubranic et al. apply supervised Bayesian learning and correctly classify 30% of the reports
             • Anvik et al. use a supervised machine learning algorithm, with precision rates of 57% and 64% for Firefox and Eclipse
     • Improving Report Quality
         • Duplicate reports are not considered harmful
             • Bettenburg et al. developed a tool, called CUEZILLA, that measures the quality of bug reports in real time
             • "Steps to reproduce" and "Stack traces" are the most useful information in bug reports
  8. Related Work
     • Bugzilla Search Tool
         • Bugzilla 4.0, released around February 2011, provides duplicate detection
         • The tool performs a Boolean full-text search on the title over the entire repository
         • Generates a dozen or so reports that may match at least one of the search terms
         • In some instances, searching with the exact title of an existing report did not return the report itself
         • Unknown accuracy of reported matches
  9. Firefox Repository
     • Firefox releases: 1.0.5, 1.5, 2.0, 3.0, 3.5 and the current version 3.6 (as of June 2010)
     • 65% of reports reside in groups of one
     • 90% of duplicates are distributed in small groups of 2-16 reports
  10. Time Interval Between Reports
     • Many bugs receive the first duplicate within the first few months of the original report
  11. Experimental Setup
     • Tokenization: "Bag-of-Words"
     • Stemming: reducing words to their root
     • Stop Words Removal
         • Lucene API used for pre-processing
     • Term Frequency/Inverse Document Frequency (TF/IDF) used for weighting words
     • Cosine Similarity used for similarity measures
     • Example of tokenizing, stemming and stop word removal:
         • "Sending email is not functional." becomes "send email function"
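The pre-processing steps on the Experimental Setup slide can be sketched as follows. The slides state that the Lucene API was used; this standalone Python version is only an illustration. The stop-word list, the suffix rules, and the names `STOP_WORDS`, `SUFFIXES`, `toy_stem` and `preprocess` are all assumptions chosen so that the slide's own example works, and they are much cruder than a real stemmer.

```python
import re

# Assumption: a tiny illustrative stop-word list, not the list Lucene ships with.
STOP_WORDS = {"a", "an", "and", "is", "not", "the", "to"}

# Assumption: toy suffix rules, far simpler than a real stemming algorithm.
SUFFIXES = ("ing", "al", "s")


def toy_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Tokenize into lowercase words, drop stop words, then stem each token."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [toy_stem(token) for token in tokens if token not in STOP_WORDS]


print(preprocess("Sending email is not functional."))
# prints ['send', 'email', 'function']
```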
  12. Experimental Procedure
     • Start with the initial 50% of reports as historical information
     • The group containing the most recent primary or duplicate is on top of the initial list
     • Build the suggested list using IR techniques
     • As the experiment progresses, the historical repository grows
     • Continue until all reports are classified as duplicate or primary
     • If a bug is primary, it is forwarded to the repository
     • This may not be realistic, as triagers may misjudge reports
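One possible reading of the experimental procedure is a chronological replay, sketched below. The function `replay`, the `(report_id, group_id, tokens)` tuple layout, the `suggest` callback and the `list_size` parameter are hypothetical names introduced for illustration. The sketch also assumes that every processed report joins the history with its ground-truth group, which the slides do not spell out.

```python
def replay(reports, suggest, list_size=10):
    """Replay reports in time order, using the first half as initial history.

    reports: chronologically ordered list of (report_id, group_id, tokens).
    suggest: callable (tokens, history) -> ranked list of candidate group ids.
    Returns one (true_group_id, suggested_group_ids) pair per duplicate seen.
    """
    split = len(reports) // 2
    history = list(reports[:split])
    known_groups = {group_id for _, group_id, _ in history}
    outcomes = []
    for report in reports[split:]:
        _, group_id, tokens = report
        if group_id in known_groups:
            # Ground truth says this report duplicates an already known group.
            outcomes.append((group_id, suggest(tokens, history)[:list_size]))
        # Assumption: the report is then stored under its true group,
        # so the historical repository grows as the experiment progresses.
        history.append(report)
        known_groups.add(group_id)
    return outcomes


def overlap_suggest(tokens, history):
    """Toy suggester: groups whose reports share at least one token."""
    return [g for _, g, t in history if set(t) & set(tokens)]


demo = [
    (1, "g1", ["send", "email"]),
    (2, "g2", ["crash"]),
    (3, "g1", ["email", "send", "fail"]),
    (4, "g3", ["slow"]),
]
print(replay(demo, overlap_suggest))
# prints [('g1', ['g1'])]
```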
  13. Measuring Performance
     • Performance of the bug search tool is measured by the recall rate: Recall rate = N_recalled / N_total
         • N_recalled is the number of duplicate reports correctly classified
         • N_total is the total number of duplicate reports
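The recall rate on this slide, the ratio of N recalled to N total, could be computed along these lines. The sketch assumes that a duplicate counts as "correctly classified" when its true group appears in the suggested list; the function name `recall_rate` and the input layout are illustrative, not taken from the slides.

```python
def recall_rate(outcomes):
    """Return N_recalled / N_total over a list of duplicate reports.

    outcomes: list of (true_group_id, suggested_group_ids), one per duplicate.
    """
    if not outcomes:
        return 0.0
    # Assumption: a duplicate is recalled if its true group was suggested.
    recalled = sum(1 for true_group, suggested in outcomes if true_group in suggested)
    return recalled / len(outcomes)


outcomes = [
    ("g1", ["g3", "g1"]),  # recalled
    ("g2", ["g4"]),        # missed
    ("g5", ["g5"]),        # recalled
    ("g6", []),            # missed
]
print(recall_rate(outcomes))
# prints 0.5
```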
  14. Approach methodology
     • Reporters query the repository
     • Use "title" (summary) to compare reports
     • Four experiments:
         • TF/IDF
         • "Sliding Window" - TF/IDF
         • "Sliding Window" - Group Centroids - TF/IDF
         • "Sliding Window" - Group Centroids
     • The centroid is composed of all unique terms from all reports in the group and the sum of their frequencies in each report. The total frequency of each term is divided by the number of reports in the group.
  15. Sliding Window Defined
     • "Sliding-Window" approach: keep a window of fixed size n
         • Sort all groups by the time elapsed between the group's last report and the new incoming report
         • Select the top n groups (n = 2000 is optimal; analysis shows 95% accuracy of the duplicate being within these groups)
         • Apply IR techniques only to the top n groups
         • Build a short list of the top m most similar reports to present to the triager/reporter
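The window selection step can be sketched as below. The function `select_window` and the dictionary mapping each group id to the timestamp of its last report are assumed representations for illustration; timestamps are plain numbers here, and the default of 2000 is the window size reported on the slide.

```python
def select_window(last_report_time, now, n=2000):
    """Return the ids of the n groups whose last report is closest in time to now.

    last_report_time: dict mapping group id -> timestamp of the group's last report.
    now: timestamp of the new incoming report.
    """
    by_elapsed = sorted(last_report_time, key=lambda gid: now - last_report_time[gid])
    return by_elapsed[:n]


groups = {"A": 10, "B": 50, "C": 30}
print(select_window(groups, now=60, n=2))
# prints ['B', 'C']
```

Only the reports (or centroids) of the selected groups would then be scored, which keeps the search space bounded as the repository grows.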
  16. Experimental Results
     • Our results demonstrate that Time-Window/Group Centroid and report summaries predict duplicate problem reports with a recall ratio of up to 53%
  17. Performance and Runtime
     • Large variance in recall rate initially. The Time Window approach stabilizes, while TF/IDF degrades.
     • Classification run time is faster for the Time Window approach. Each additional report increases computation time in TF/IDF.
  18. Result Comparisons
      Group           | Approach                                | Results
      Hiew et al.     | Text analysis                           | Recall rate ~50%
      Cubranic et al. | Bayesian learning, text categorization  | Correctly predicted ~30% duplicates
      Jalbert et al.  | Text similarity, clustering             | Recall rate ~51%, list size 20
      Wang et al.     | NLP, execution information              | 67-93% detection rate (43-72% with NLP)
      Wang et al.     | Enhanced version of prior algorithm     | 17-31% improvement over state of the art
      Our approach    | Time Window/Centroids                   | ~53% recall rate
  19. Threats to Validity
     • Assumption that the ground truth on duplicates is correct
         • The life cycle of a bug is ever changing
         • Some reports change state multiple times
  20. Summary and Future Work
     • SUMMARY
         • Comprehensive study analyzing long-term duplicate trends in a large, open source project
         • Improve search features in duplicate detection by providing a search list
         • The time interval between reports can be used to narrow the search space
     • FUTURE WORK
         • Compare with other projects (e.g. Eclipse) to be able to generalize the approach
         • Study the effects on duplicate propagation caused by a user incorrectly selecting a report from the suggested list
  21. TF/IDF
     • Compare the vector representing a new report to every vector currently in the database
     • Vectors in the database are weighted using TF/IDF to emphasize rare words
     • The reports are ranked based on their cosine-similarity scores
     • Report ranking is used to build the suggested list presented to the user
     • Run time is impacted as repository size grows
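The ranking described on the TF/IDF slide can be sketched with the standard library alone. The slides do not say which TF/IDF variant was used, so the smoothed weight `tf * (log(N / df) + 1)` below is an assumption, as are the names `tfidf_vectors`, `cosine` and `suggest` and the list size `m`.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Weight each token list in docs; returns (vectors, idf)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Assumption: smoothed IDF so terms present in every report keep some weight.
    idf = {term: math.log(n / count) + 1.0 for term, count in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def suggest(query_tokens, docs, m=3):
    """Return indices of up to m stored reports most similar to the new report."""
    vectors, idf = tfidf_vectors(docs)
    # Terms never seen in the repository carry no weight in the query vector.
    query = {t: tf * idf[t] for t, tf in Counter(query_tokens).items() if t in idf}
    scored = sorted(((cosine(query, vec), i) for i, vec in enumerate(vectors)), reverse=True)
    return [i for score, i in scored[:m] if score > 0.0]


docs = [
    ["send", "email", "function"],
    ["browser", "crash", "startup"],
    ["unable", "send", "email"],
]
print(suggest(["send", "email", "function"], docs, m=2))
# prints [0, 2]
```

Because every stored vector is scored for each new report, the cost of `suggest` grows with the repository, which matches the run-time concern noted on the slide.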
  22. Sliding Window - TF/IDF
     • Apply the time window to limit the groups considered in the search
     • Only the reports within the 2,000 groups are considered
     • Reports are weighted using TF/IDF
     • Scoring and building of the suggested list are the same as for TF/IDF
  23. Sliding Window – Centroid
     • Same time window
     • Reports from the 2,000 groups are not immediately searched and weighted using TF/IDF
     • A centroid vector representing each group is used instead
     • Example:
         • Summary 1: unable send email
         • Summary 2: send email function
         • Summary 3: send email after enter recipient
         • The resulting centroid of the group is: 1.0 send, 0.33 unable, 1.0 email, 0.33 function, 0.33 after, 0.33 enter, 0.33 recipient
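The centroid example on this slide can be reproduced with a few lines of Python. The function name `group_centroid` and the token-list input are illustrative choices; the computation follows the definition given in the deck (total frequency of each term divided by the number of reports in the group).

```python
from collections import Counter


def group_centroid(reports):
    """Average term frequency over all reports in one duplicate group.

    reports: non-empty list of token lists, one per report summary.
    """
    totals = Counter()
    for tokens in reports:
        totals.update(tokens)
    return {term: count / len(reports) for term, count in totals.items()}


group = [
    ["unable", "send", "email"],
    ["send", "email", "function"],
    ["send", "email", "after", "enter", "recipient"],
]
centroid = group_centroid(group)
print({term: round(weight, 2) for term, weight in centroid.items()})
# prints {'unable': 0.33, 'send': 1.0, 'email': 1.0, 'function': 0.33,
#         'after': 0.33, 'enter': 0.33, 'recipient': 0.33}
```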
  24. Sliding Window – Centroid – TF/IDF
     • Uses the centroid technique described before
     • Weight each term in the centroids using the TF/IDF weighting scheme