The Quality of Bug Reports in Eclipse ETX'07


Talk given at the ETX 2007 Workshop (OOPSLA 2007, Montreal).



    The Quality of Bug Reports in Eclipse ETX'07: Presentation Transcript

    • The Quality of Bug Reports in Eclipse. Nicolas Bettenburg, Sascha Just, Adrian Schröter, Cathrin Weiß, and Rahul Premraj (Saarland University); Tom Zimmermann (University of Calgary).
    • Basis of Research: Bug Reports. [Slide shows the first pages of three papers on bug reports: "Who Should Fix This Bug?" by John Anvik, Lyndon Hiew, and Gail C. Murphy (University of British Columbia; ICSE'06, Shanghai); "Detection of Duplicate Defect Reports Using Natural Language Processing" by Per Runeson, Magnus Alexandersson, and Oskar Nyholm (Lund University; ICSE'07); and "How Long Will It Take to Fix This Bug?" by Cathrin Weiß, Rahul Premraj, Thomas Zimmermann, and Andreas Zeller (Saarland University).]
    • Basis of Research: Bug Reports. [Same slide showing the three paper first pages, now annotated "Good Reports".]
    • Basis of Research: Bug Reports. [Same slide showing the three paper first pages, annotated "Good Reports" and "Improve Research".]
These failures are found in testing or plied our approach to the gcc repository, but the results the quality of the software produced [12]. approach, which is then evaluated in a case study (Section 5 duces the theory on defect reporting and on natural were not as encouraging, hovering a defect6% particularly challenging to predict. takes to fix around is precision. We other development activities and reported in a defect and 6) involving JBoss and four of its subprojects. After language processing. Section 3 presents the tailoring believe this is in partWhy to that so? In contrast to programming, which is a con- due is a prolific bug-fixing developer management system [5][18]. If the development proc- discussing threats to validity (Section 7) and related work made of the NLP techniques to fit the duplicate detec- struction process, debugging is a search process—a search who skews the learning process. ess is highly parallel, or a product line architecture is (Section 8), we close with consequences (Section 9). tion purpose. In Section 4, we specify the case study The paper makes two contributions: all of the program’s code, its runs, its which can involve used, where components are used in different products, conducted for evaluation of the technique, and Section states, or even its history. Debugging is particularly nasty the same defect may easily be reported multiple times, 5 presents the case study results. Finally Section 6 con- Eclipse provides an because the original assumptions of the program’s authors 1 extensible development environment, 6+-%,*quot;5(+'%*+,quot;*$ resulting in duplicate reports in the defect management 7*+0&/$+0%+11quot;*$ cludes the paper and outlines further work. including a Java IDE, cannot be trusted.at www.eclipse.org identified, fixing it is and can be found Once the defect is )#%-+&4.$+0%)8+*)4+ © ACM, (2006). This is the author’s version of the work. It is posted here system. 
These duplicates cost effort in identification (verified 31/08/05). Firefox provides a web browser and can be found butwww.earlier effort to search again a programming activity, at the by permission of ACM for your personal use. Not for redistribution. 2 ICSE’06, May 20–28, 2006, Shanghai, China. mozilla.org/products/firefox/ outweighs the correction effort. typically far (verified 07/09/05). Copyright 2006 ACM 1-59593-085-X/06/0005 ...$5.00. In this paper, we address the problem of estimating the time it takes to fix an issue1 from a novel perspective. Our 9:; 9<; approach is based on leveraging the experience from earlier issues—or, more prosaic, to extract issues reports from bug 29th International Conference on Software Engineering (ICSE'07) databases and to use their features to make predictions for 9=; 0-7695-2828-7/07 $20.00 © 2007 new, similar problems. We have used this approach to pre- dict the fixing effort—that is, the effort (in person-hours) it 234%0)$)5)#+ !quot;#$%#&'&()*%*+,quot;*$# takes to fix a particular issue. These estimates are central to -&$.%*+/quot;*0+0%+11quot;*$ 1 An issue is either a bug, feature request, or task. We refer to the Figure 1. Predicting effort for an issue report database that collects issues as bug database or issue tracking system.
    • Basis of Research: Bug Reports Who Should Fix This Bug? Detection of Duplicate Defect Reports Using Natural Language Processing John Anvik, Lyndon Hiew and Gail C. Murphy Department of Computer Science University of British Columbia Per Runeson, Magnus Alexandersson and Oskar Nyholm lyndonh, murphy}@cs.ubc.ca {janvik, Software Engineering Research Group Lund University, Box 118, SE-221 00 Lund, Sweden per.runeson@telecom.lth.se ABSTRACT However, this potential advantage also comes with a sig- nificant cost. Each bug that is reported must be triaged Open source development projects typically support an open How Long will it Take to Fix This Bug? to determine if it describes a meaningful new problem or bug repository to which both developers and users can re- enhancement, and if it does, it must be assigned to an ap- port bugs. The reports that appear in this repository must propriate developer for further handling [13]. Consider the be triaged to determine if the report is one which requires Abstract and handling, hence support to speed up the duplicate case of the Eclipse open source project1 over a four month attention and if it is, which developer will be assigned the period (January 1, 2005 to April 30, 2005) when 3426 re- Rahul Premraj Cathrin Weiß Thomas Zimmermann Andreas Zeller detection process is appreciated. responsibility of resolving the report. Large open source de- Saarland University Saarland University Saarland University Defect reports are generated from various testing and Saarland University The defect reports are written in natural language, ports were filed, averaging 29 reports per day. Assuming velopments are burdened by the rate at which new bug re- that a triager takes approximately five minutes to read and and the duplicate identification requires suitable infor- weiss@st.cs.uni-sb.de premraj@cs.uni-sb.de tz@acm.org zeller@acm.org development activities in software engineering. Some- ports appear in the bug repository. 
In this paper, we present handle each report, two person-hours per day is being spent mation retrieval methods. In this study, we investigate a semi-automated approach intended to ease one part of this times two reports are submitted that describe the same on this activity. If all of these reports led to improvements process, the assignment of reports to a developer. Our ap- the use of Natural Language Processing (NLP) [17] problem, leading to duplicate reports. These reports in the code, this might be an acceptable cost to the project. Good Reports proach applies a machine learning algorithm to the open bug techniques to help automate this process. NLP is previ- are mostly written in structured natural language, and However, since many of the reports are duplicates of exist- Abstract repository to learn the kinds of reports each developer re- project managers, because they allow to plan the cost and ously used in requirements engineering [12][3][19], as such, it is hard to compare two reports for similarity ing reports or are not valid reports, much of this work does solves. When a new report arrives, the classifier produced time of future releases. program comprehension [2] and in defect report man- not improve the product. For instance, of the 3426 reports with formal methods. In order to identify duplicates, by the machine learning technique suggests a small number for Eclipse, 1190 (36%) were marked either as invalid, a du- software problem has Predicting the time and effort for a Our approach is illustrated in Figure 1. As a new issue agement [15], although with a different angle. we investigate using Natural Language Processing of developers suitable to resolve the report. 
With this ap- plicate, a bug that could not be a difficult task.one that will an approach that au- long been replicated, or We present report r is entered into the bug database (1), we search for Basically, we take the words in the defect report in proach, we have reached precision levels of 57% and 64% on (NLP) techniques to support the identification. A pro- •Improve Research not be fixed. tomatically predicts the fixing effort, i.e., the person-hours the existing issue reports which have a description that is plain English, make some processing of the text and the Eclipse and Firefox development projects respectively. totype tool is developed and evaluated in a case study As a means of reducing the time spent triaging, we present spent on fixing an issue. Our technique leverages existing most similar to r (2). We then combine their reported effort We have also applied our approach to the gcc open source de- then use the statistics on the occurrences of the words analyzing defect reports at Sony Ericsson Mobile Com- an approach for semi-automating one part of the process, the assignment of a developer to a newly received given a new issue report, we use issue tracking systems: report. Our as a prediction for our issue report r (3). velopment with less positive results. We describe the condi- to identify similar defect reports. We implemented a munications. The evaluation shows that about 2/3 of tions under which the approach is applicable and also report approach uses a machine Lucene framework to search for similar, earlier reports the learning algorithm to recommend In contrast to previous work (see Section 8), the present prototype tool and evaluated its effects on the internal the duplicates can possibly be found using the NLP on the lessons we learned about applying machine learning to a triager a set of and use their average be appropriate developers who may time as a prediction. 
Our approach paper makes the following original contributions: defect reporting system of Sony Ericsson Mobile techniques. Different variants of the techniques pro- to repositories used in open source development. for resolving the bug.thus allows for early effort the triage helping in assign- This information can help estimation, Communications which contained thousands of reports. 1. We leverage existing vide databases toresult differences, indicating a robust bug only minor automatically •Help in Bug Fixing Process Categories and Subject Descriptors: D.2 [Software]: process in two ways: ing issues and schedulingto process a it may allow a triager stable releases. We evaluated our Further, we interviewed some users of the prototype estimate effort for new problems. User testing shows that the overall attitude technology. Software Engineering bug more quickly, andapproach using effort with less overallJBoss project. Given it may allow triagers data from the tool to get a qualitative view of the effects. The proto- towards the technique is positive and that it has a knowledge of the system to perform bug assignments more General Terms: Management. a sufficient number of issues reports, our automatic predic- 2. We use text similarity techniques to identify those issue type tool identified about 40% of the marked duplicate growth potential. correctly. Our approach requires a project to have had an Keywords: Problem tracking, issue tracking, bug report open bug repository for some period to time from which the issues that are bugs, tions are close of the actual effort; for reports which are most closely related. defect reports, which can be seen as low figure. How- assignment, bug triage, machine learning patterns of who solves what off by only onecan bebeating na¨ve predictions by a we are kinds of bugs hour, learned. ı ever, since only one type of duplicate reports are possi- 3. Given a sufficient number of issue reports to learn Our approach also requires thefour. 
factor of specification of heuristics to bly found by the technique, we estimate that the tech- 1. Introduction from, our predictions are close to the actual effort, es- 1. INTRODUCTION interpret how a project uses the bug repository. We believe nique finds 2/3 of the possible duplicates. Also, in pecially for issues that are bugs. that neither of these requirements are arduous for the large Most open source software developments incorporate an terms of working hours, reducing the effort to identify When a complex software product like a mobile projects we are targeting with this approach. Using our ap- open bug repository that allows both developers and users to proach we have been 1. Introduction The remainder of the paper is organized as follows: In Sec- duplicate reports with 40% is still a substantial saving able to correctly suggest appropriate phone is developed, it is natural and common that post problems encountered with the software, suggest possi- tion 2, we give background information on the role of issue for a major software development company, which developers to whom to assign a bug with a precision between ble enhancements, and comment upon existing bug reports. software defects slip into the product, leading to func- reports in the software process. Section 3 briefly describes 57% and 64% for the Eclipse and Firefox2a bug repositories, handles thousands of defect reports every year. Predicting when particular software development task One potential advantage of an open bug repository is that it tional failures, i.e. the phone does not have the ex- how we accessed the data. Section 4 describes our statistical which we used to develop the approach. We have also ap- The paper is outlined as follows. Section 2 intro- may allow more bugs to be identified and solved, improving will be completed has always been difficult. The time it pected behavior. 
These failures are found in testing or plied our approach to the gcc repository, but the results the quality of the software produced [12]. approach, which is then evaluated in a case study (Section 5 duces the theory on defect reporting and on natural were not as encouraging, hovering a defect6% particularly challenging to predict. takes to fix around is precision. We other development activities and reported in a defect and 6) involving JBoss and four of its subprojects. After language processing. Section 3 presents the tailoring believe this is in partWhy to that so? In contrast to programming, which is a con- due is a prolific bug-fixing developer management system [5][18]. If the development proc- discussing threats to validity (Section 7) and related work made of the NLP techniques to fit the duplicate detec- struction process, debugging is a search process—a search who skews the learning process. ess is highly parallel, or a product line architecture is (Section 8), we close with consequences (Section 9). tion purpose. In Section 4, we specify the case study The paper makes two contributions: all of the program’s code, its runs, its which can involve used, where components are used in different products, conducted for evaluation of the technique, and Section states, or even its history. Debugging is particularly nasty the same defect may easily be reported multiple times, 5 presents the case study results. Finally Section 6 con- Eclipse provides an because the original assumptions of the program’s authors 1 extensible development environment, 6+-%,*quot;5(+'%*+,quot;*$ resulting in duplicate reports in the defect management 7*+0&/$+0%+11quot;*$ cludes the paper and outlines further work. including a Java IDE, cannot be trusted.at www.eclipse.org identified, fixing it is and can be found Once the defect is )#%-+&4.$+0%)8+*)4+ © ACM, (2006). This is the author’s version of the work. It is posted here system. 
These duplicates cost effort in identification (verified 31/08/05). Firefox provides a web browser and can be found butwww.earlier effort to search again a programming activity, at the by permission of ACM for your personal use. Not for redistribution. 2 ICSE’06, May 20–28, 2006, Shanghai, China. mozilla.org/products/firefox/ outweighs the correction effort. typically far (verified 07/09/05). Copyright 2006 ACM 1-59593-085-X/06/0005 ...$5.00. In this paper, we address the problem of estimating the time it takes to fix an issue1 from a novel perspective. Our 9:; 9<; approach is based on leveraging the experience from earlier issues—or, more prosaic, to extract issues reports from bug 29th International Conference on Software Engineering (ICSE'07) databases and to use their features to make predictions for 9=; 0-7695-2828-7/07 $20.00 © 2007 new, similar problems. We have used this approach to pre- dict the fixing effort—that is, the effort (in person-hours) it 234%0)$)5)#+ !quot;#$%#&'&()*%*+,quot;*$# takes to fix a particular issue. These estimates are central to -&$.%*+/quot;*0+0%+11quot;*$ 1 An issue is either a bug, feature request, or task. We refer to the Figure 1. Predicting effort for an issue report database that collects issues as bug database or issue tracking system.
    • Basis of Research: Bug Reports Who Should Fix This Bug? Detection of Duplicate Defect Reports Using Natural Language Processing John Anvik, Lyndon Hiew and Gail C. Murphy Department of Computer Science University of British Columbia Per Runeson, Magnus Alexandersson and Oskar Nyholm lyndonh, murphy}@cs.ubc.ca {janvik, Software Engineering Research Group Lund University, Box 118, SE-221 00 Lund, Sweden per.runeson@telecom.lth.se ABSTRACT However, this potential advantage also comes with a sig- nificant cost. Each bug that is reported must be triaged Open source development projects typically support an open How Long will it Take to Fix This Bug? to determine if it describes a meaningful new problem or bug repository to which both developers and users can re- enhancement, and if it does, it must be assigned to an ap- port bugs. The reports that appear in this repository must propriate developer for further handling [13]. Consider the be triaged to determine if the report is one which requires Abstract and handling, hence support to speed up the duplicate case of the Eclipse open source project1 over a four month attention and if it is, which developer will be assigned the period (January 1, 2005 to April 30, 2005) when 3426 re- Rahul Premraj Cathrin Weiß Thomas Zimmermann Andreas Zeller detection process is appreciated. responsibility of resolving the report. Large open source de- Saarland University Saarland University Saarland University Defect reports are generated from various testing and Saarland University The defect reports are written in natural language, ports were filed, averaging 29 reports per day. Assuming velopments are burdened by the rate at which new bug re- that a triager takes approximately five minutes to read and and the duplicate identification requires suitable infor- weiss@st.cs.uni-sb.de premraj@cs.uni-sb.de tz@acm.org zeller@acm.org development activities in software engineering. Some- ports appear in the bug repository. 
In this paper, we present handle each report, two person-hours per day is being spent mation retrieval methods. In this study, we investigate a semi-automated approach intended to ease one part of this times two reports are submitted that describe the same on this activity. If all of these reports led to improvements process, the assignment of reports to a developer. Our ap- the use of Natural Language Processing (NLP) [17] problem, leading to duplicate reports. These reports in the code, this might be an acceptable cost to the project. Good Reports proach applies a machine learning algorithm to the open bug techniques to help automate this process. NLP is previ- are mostly written in structured natural language, and However, since many of the reports are duplicates of exist- Abstract repository to learn the kinds of reports each developer re- project managers, because they allow to plan the cost and ously used in requirements engineering [12][3][19], as such, it is hard to compare two reports for similarity ing reports or are not valid reports, much of this work does solves. When a new report arrives, the classifier produced time of future releases. program comprehension [2] and in defect report man- not improve the product. For instance, of the 3426 reports with formal methods. In order to identify duplicates, by the machine learning technique suggests a small number for Eclipse, 1190 (36%) were marked either as invalid, a du- software problem has Predicting the time and effort for a Our approach is illustrated in Figure 1. As a new issue agement [15], although with a different angle. we investigate using Natural Language Processing of developers suitable to resolve the report. 
With this ap- plicate, a bug that could not be a difficult task.one that will an approach that au- long been replicated, or We present report r is entered into the bug database (1), we search for Basically, we take the words in the defect report in proach, we have reached precision levels of 57% and 64% on (NLP) techniques to support the identification. A pro- •Improve Research not be fixed. tomatically predicts the fixing effort, i.e., the person-hours the existing issue reports which have a description that is plain English, make some processing of the text and the Eclipse and Firefox development projects respectively. totype tool is developed and evaluated in a case study As a means of reducing the time spent triaging, we present spent on fixing an issue. Our technique leverages existing most similar to r (2). We then combine their reported effort We have also applied our approach to the gcc open source de- then use the statistics on the occurrences of the words analyzing defect reports at Sony Ericsson Mobile Com- an approach for semi-automating one part of the process, the assignment of a developer to a newly received given a new issue report, we use issue tracking systems: report. Our as a prediction for our issue report r (3). velopment with less positive results. We describe the condi- to identify similar defect reports. We implemented a munications. The evaluation shows that about 2/3 of tions under which the approach is applicable and also report approach uses a machine Lucene framework to search for similar, earlier reports the learning algorithm to recommend In contrast to previous work (see Section 8), the present prototype tool and evaluated its effects on the internal the duplicates can possibly be found using the NLP on the lessons we learned about applying machine learning to a triager a set of and use their average be appropriate developers who may time as a prediction. 
Our approach paper makes the following original contributions: defect reporting system of Sony Ericsson Mobile techniques. Different variants of the techniques pro- to repositories used in open source development. for resolving the bug.thus allows for early effort the triage helping in assign- This information can help estimation, Communications which contained thousands of reports. 1. We leverage existing vide databases toresult differences, indicating a robust bug only minor automatically •Help in Bug Fixing Process Categories and Subject Descriptors: D.2 [Software]: process in two ways: ing issues and schedulingto process a it may allow a triager stable releases. We evaluated our Further, we interviewed some users of the prototype estimate effort for new problems. User testing shows that the overall attitude technology. Software Engineering bug more quickly, andapproach using effort with less overallJBoss project. Given it may allow triagers data from the tool to get a qualitative view of the effects. The proto- towards the technique is positive and that it has a knowledge of the system to perform bug assignments more General Terms: Management. a sufficient number of issues reports, our automatic predic- 2. We use text similarity techniques to identify those issue type tool identified about 40% of the marked duplicate growth potential. correctly. Our approach requires a project to have had an Keywords: Problem tracking, issue tracking, bug report open bug repository for some period to time from which the issues that are bugs, tions are close of the actual effort; for reports which are most closely related. defect reports, which can be seen as low figure. How- assignment, bug triage, machine learning patterns of who solves what off by only onecan bebeating na¨ve predictions by a we are kinds of bugs hour, learned. ı ever, since only one type of duplicate reports are possi- 3. Given a sufficient number of issue reports to learn Our approach also requires thefour. 
factor of specification of heuristics to bly found by the technique, we estimate that the tech- 1. Introduction • Win-Win Situation! from, our predictions are close to the actual effort, es- 1. INTRODUCTION interpret how a project uses the bug repository. We believe nique finds 2/3 of the possible duplicates. Also, in pecially for issues that are bugs. that neither of these requirements are arduous for the large Most open source software developments incorporate an terms of working hours, reducing the effort to identify When a complex software product like a mobile projects we are targeting with this approach. Using our ap- open bug repository that allows both developers and users to proach we have been 1. Introduction The remainder of the paper is organized as follows: In Sec- duplicate reports with 40% is still a substantial saving able to correctly suggest appropriate phone is developed, it is natural and common that post problems encountered with the software, suggest possi- tion 2, we give background information on the role of issue for a major software development company, which developers to whom to assign a bug with a precision between ble enhancements, and comment upon existing bug reports. software defects slip into the product, leading to func- reports in the software process. Section 3 briefly describes 57% and 64% for the Eclipse and Firefox2a bug repositories, handles thousands of defect reports every year. Predicting when particular software development task One potential advantage of an open bug repository is that it tional failures, i.e. the phone does not have the ex- how we accessed the data. Section 4 describes our statistical which we used to develop the approach. We have also ap- The paper is outlined as follows. Section 2 intro- may allow more bugs to be identified and solved, improving will be completed has always been difficult. The time it pected behavior. 
These failures are found in testing or plied our approach to the gcc repository, but the results the quality of the software produced [12]. approach, which is then evaluated in a case study (Section 5 duces the theory on defect reporting and on natural were not as encouraging, hovering a defect6% particularly challenging to predict. takes to fix around is precision. We other development activities and reported in a defect and 6) involving JBoss and four of its subprojects. After language processing. Section 3 presents the tailoring believe this is in partWhy to that so? In contrast to programming, which is a con- due is a prolific bug-fixing developer management system [5][18]. If the development proc- discussing threats to validity (Section 7) and related work made of the NLP techniques to fit the duplicate detec- struction process, debugging is a search process—a search who skews the learning process. ess is highly parallel, or a product line architecture is (Section 8), we close with consequences (Section 9). tion purpose. In Section 4, we specify the case study The paper makes two contributions: all of the program’s code, its runs, its which can involve used, where components are used in different products, conducted for evaluation of the technique, and Section states, or even its history. Debugging is particularly nasty the same defect may easily be reported multiple times, 5 presents the case study results. Finally Section 6 con- Eclipse provides an because the original assumptions of the program’s authors 1 extensible development environment, 6+-%,*quot;5(+'%*+,quot;*$ resulting in duplicate reports in the defect management 7*+0&/$+0%+11quot;*$ cludes the paper and outlines further work. including a Java IDE, cannot be trusted.at www.eclipse.org identified, fixing it is and can be found Once the defect is )#%-+&4.$+0%)8+*)4+ © ACM, (2006). This is the author’s version of the work. It is posted here system. 
These duplicates cost effort in identification (verified 31/08/05). Firefox provides a web browser and can be found butwww.earlier effort to search again a programming activity, at the by permission of ACM for your personal use. Not for redistribution. 2 ICSE’06, May 20–28, 2006, Shanghai, China. mozilla.org/products/firefox/ outweighs the correction effort. typically far (verified 07/09/05). Copyright 2006 ACM 1-59593-085-X/06/0005 ...$5.00. In this paper, we address the problem of estimating the time it takes to fix an issue1 from a novel perspective. Our 9:; 9<; approach is based on leveraging the experience from earlier issues—or, more prosaic, to extract issues reports from bug 29th International Conference on Software Engineering (ICSE'07) databases and to use their features to make predictions for 9=; 0-7695-2828-7/07 $20.00 © 2007 new, similar problems. We have used this approach to pre- dict the fixing effort—that is, the effort (in person-hours) it 234%0)$)5)#+ !quot;#$%#&'&()*%*+,quot;*$# takes to fix a particular issue. These estimates are central to -&$.%*+/quot;*0+0%+11quot;*$ 1 An issue is either a bug, feature request, or task. We refer to the Figure 1. Predicting effort for an issue report database that collects issues as bug database or issue tracking system.
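The duplicate-detection and effort-prediction papers shown on this slide both depend on retrieving earlier reports whose text is similar to a new one. As a minimal illustrative sketch (a plain bag-of-words cosine similarity in Python, not the Lucene and NLP pipelines the papers themselves use), ranking duplicate candidates could look like this:

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase a report description and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine_similarity(a, b):
    """Cosine similarity between the bag-of-words vectors of two texts."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in set(va) & set(vb))
    norm = math.sqrt(sum(c * c for c in va.values()))
    norm *= math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0

def duplicate_candidates(new_report, existing_reports):
    """Rank existing reports by textual similarity to a new report."""
    return sorted(existing_reports,
                  key=lambda r: cosine_similarity(new_report, r),
                  reverse=True)
```

Real pipelines add stop-word removal, stemming, and TF-IDF weighting, but the retrieve-and-rank idea is the same.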
    • What makes a good Bug Report?
    • Ask Vanessa!
    • Online Survey

      Quality of Bug Reports Survey
      Software Engineering Chair (Prof. Zeller), Saarland University, Dept. of Informatics
      Postfach 15 11 50, 66041 Saarbrücken, Germany
      E-mail: zeller@cs.uni-sb.de, Phone: +49 (0)681 302-64011

      "As a developer, you perhaps have witnessed that quality of bug reports influences the effort that goes into fixing bugs. But what characterises a good bug report? Our survey addresses the quality of bug reports from a developer's perspective. We invite you to answer the following four questions, which will take no more than 5 minutes of your time. Please feel free to contact us at survey@st.cs.uni-sb.de. Many thanks in advance! Bug Report Quality Group"

      Part A
      Question 1: Which of the following items have you previously used when fixing bugs? (select as many items as you wish)
        product, component, version, severity, hardware, operating system, summary, build information, observed behavior, expected behavior, steps to reproduce, stack traces, screenshots, code examples, error reports, test cases
      Question 2: Which three items helped you the most? (select max. 3 items)
        [same items as Question 1]

      Part B
      Question 1: Which of the following problems have you encountered when fixing bugs? (select as many items as you wish)
        You were given wrong: product name, component name, version number, hardware, operating system, observed behavior, expected behavior
        There were errors in: code examples, steps to reproduce, test cases, stack traces
        The reporter used: bad grammar, unstructured text, prose text, too long text, non-technical language, no spellcheck
        Others: duplicates, spam, incomplete information, viruses/worms
      Question 2: Which three problems caused you most delay in fixing bugs? (select max. 3 items)
        [same options as Question 1]

      Part C
      Comments: Please feel free to share any interesting thoughts or experiences.
    • Online Survey
      Part A: Helpful Information
      Part B: Potential Problems
      Part C: Comments
      Part D: Rate Bug Reports
    • Online Survey (Part A)
    • Online Survey (Part B)
    • Survey Participants
      365 invitations sent out, 29 bounced back → 336 developers reached
      48 responses received → response rate of 14%
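As a quick sanity check, the participant figures above are internally consistent; a one-line arithmetic sketch in Python:

```python
# Survey reach and response rate, recomputed from the figures on the slide.
invitations = 365
bounced = 29
reached = invitations - bounced       # developers actually reached
responses = 48

response_rate = responses / reached   # fraction of reached developers who answered

print(reached)                        # 336
print(round(response_rate * 100))     # 14 (%)
```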
    • Results of Part A (chart: "Used when fixing a Bug" vs. "Helped Most")
      Helped most: steps to reproduce, stack traces
      Middle: screenshots, observed behavior, test cases/examples, expected behavior
      Least helpful: version, hardware, severity
    • Results of Part B (chart: "Encountered when fixing a Bug" vs. "Hindered Most")
      Hindered most: incomplete information, erroneous steps to reproduce
      Middle: wrong observed behaviour, wrong expected behaviour, wrong product information; bad grammar, errors in test cases, duplicates
      Least hindering: wrong system information, viruses/spam
    • Voluntary Part: Rate Bug Reports
      Sample report (Eclipse bug, shown verbatim): "Error in Java dropdown menu a la VAME debugger. The debugger should show bytecodes, and step through them, etc. if there is no source for a class file. VAME did this, and it was very useful if you didn't have source, because it allowed you to see where branches were, see what methods were called, etc. Even not bing able to read bytecodes, I found it way better than just showing a blank editor. NOTES: CM (9/26/2001 1:39:02 PM) Reopening because somebody on Eclipse Corner asked has now for it... <g>"
      Rating scale: 1 (very poor) to 5 (very good)
    • Bug Report: Example (4 unique votes, average score 1.75)
    • Bug Report: Example 2 (2 unique votes, average score 5.00)
    • Can a Tool do the same?
    • Automatic Quality Measure: Idea
      Report → Tool → Quality Score
      The tool applies a set of metrics: keywords, stack trace, code examples, readability, repro steps, screenshots
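The metric pipeline above can be sketched as a handful of text detectors combined into a score. This is an illustrative sketch only, not quZilla's actual implementation: the detector functions, regular expressions, and the uniform weighting below are all made up for demonstration.

```python
import re

# Hypothetical detectors for a few of the metrics named on the slide; the real
# quZilla metrics and weights are not shown on the slide, so these are stand-ins.
def has_stack_trace(text):
    # Java-style stack-trace frames: "at package.Class.method(File.java:123)"
    return re.search(r"\bat\s+[\w.$]+\(\w+\.java:\d+\)", text) is not None

def has_code_example(text):
    # Crude heuristic: lines ending like Java statements or blocks.
    return re.search(r"[;{}]\s*$", text, re.MULTILINE) is not None

def has_repro_steps(text):
    # Numbered steps such as "1." / "2)" at the start of at least two lines.
    return len(re.findall(r"^\s*\d+[.)]\s", text, re.MULTILINE)) >= 2

def quality_score(report):
    """Map detector hits onto a score (uniform weights, for illustration only)."""
    hits = sum([has_stack_trace(report),
                has_code_example(report),
                has_repro_steps(report)])
    return 1 + hits  # 1 = nothing useful found; more detectors would widen the scale

report = """1. Open the Java editor
2. Press F5
java.lang.NullPointerException
    at org.eclipse.jdt.internal.Foo.bar(Foo.java:42)"""
print(quality_score(report))  # 3: repro steps and a stack trace were found
```

A real tool would calibrate such weights against developer ratings rather than fixing them by hand.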
    • Tool Prototype: quZilla
      Developed in Python: rapid prototyping, string-processing features
      Natural language processing: tokenization, stemmers
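Tokenization and stemming of report text can be sketched in a few lines of plain Python. The slide does not say which stemmer quZilla uses, so a toy suffix-stripper stands in here for a real algorithm such as Porter's:

```python
import re

def tokenize(text):
    """Lowercase word tokenizer: keeps runs of letters/digits, drops the rest."""
    return re.findall(r"[a-z0-9]+", text.lower())

def stem(token):
    """Toy suffix-stripping stemmer (a stand-in for a real Porter stemmer)."""
    for suffix in ("ing", "edly", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

summary = "Crashes when stepping through bytecodes"
tokens = [stem(t) for t in tokenize(summary)]
print(tokens)  # ['crash', 'when', 'stepp', 'through', 'bytecod']
```

Stemmed tokens make keyword metrics robust against inflection ("crash", "crashes", "crashing" all map to one term).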
    • Tool Evaluation: quZilla

      Confusion matrix (rows: as predicted by quZilla; columns: as rated by the developers):

                    very poor   poor   medium   good   very good
        very poor       0         0       0       0        0
        poor            0         1       2       1        0
        medium          0         5      21      12        0
        good            1         5      20      18        3
        very good       0         0       3       6        2

      44% agreement between quZilla & developers (90% within one interval)
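The "within one interval" figure can be recomputed directly from the confusion matrix: count every report whose predicted class is at most one step away from the developers' rating.

```python
# Confusion matrix from the evaluation slide: rows are quZilla's predicted class,
# columns the developers' rating, both on the scale very poor .. very good.
matrix = [
    [0, 0,  0,  0, 0],   # predicted: very poor
    [0, 1,  2,  1, 0],   # predicted: poor
    [0, 5, 21, 12, 0],   # predicted: medium
    [1, 5, 20, 18, 3],   # predicted: good
    [0, 0,  3,  6, 2],   # predicted: very good
]

total = sum(sum(row) for row in matrix)
within_one = sum(cell
                 for i, row in enumerate(matrix)
                 for j, cell in enumerate(row)
                 if abs(i - j) <= 1)   # prediction within one class of the rating

print(total)                           # 100 rated reports
print(round(100 * within_one / total)) # 90 (% within one interval)
```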
    • Improve quZilla: measure quality, provide incentives
    • Conclusions • Online Survey reveals important parts • Tools to provide automatic quality measure • Guide Reporter through the process • Create Bug Reports of higher quality
    • Thank You!
    • Developer's Comments
      "The most important info that a reporter can provide is a way to reliably reproduce the problem."
      "The most annoying problem is too brief bug description."
      "Incomplete information is the biggest problem. [...]"
      "Using Bug report template for bug Description would be helpful, I think."