Predicting Bug Fixing Efforts for 
Open-source Software Systems 
Prashant Raghav, Jenny Wang 
CS 846
OUTLINE 
1. Problem and Our Solution 
2. Initial Setup 
3. CoS 
4. Effort Estimation 
5. References
Traditional Approach 
Total LOC / average LOC per hour by a developer = total 
number of developer hours 
● Doesn’t account for 
○ Project complexity 
○ Developer Proficiency
Our Tool ... 
Tell me who the bug is assigned to, and I will tell 
you how much time it is going to take.
1. Selecting Dataset 
Choices: Bugzilla, JBoss project, Linux, 
Apache Hadoop Common issue tracking 
system.
Issues 30 Day Summary 
Hadoop Common is the common library for 
Apache Hadoop 
Issues: 114 created, 66 resolved
2. Data Extraction 
Download data 
Extract developers' bug-fix activity 
Fields: ID, Title, Description, Status, Detail, Developer
3. Database 
Store each Defect in DB with defect 
information.
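
A minimal sketch of how the defect store could look, assuming SQLite and the six fields listed under Data Extraction. The table name `defects` and the helpers `open_store`, `save_defect` and `load_defects` are hypothetical names chosen for illustration; the slides do not say which database was used.

```python
import sqlite3
from dataclasses import dataclass, astuple


@dataclass
class Defect:
    # Fields follow the Data Extraction slide.
    id: str
    title: str
    description: str
    status: str
    detail: str
    developer: str


def open_store(path=":memory:"):
    """Open (or create) the defect database. SQLite is an assumption."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS defects (
               id TEXT PRIMARY KEY,
               title TEXT, description TEXT, status TEXT,
               detail TEXT, developer TEXT)"""
    )
    return conn


def save_defect(conn, defect):
    """Insert a defect, overwriting any earlier row with the same ID."""
    conn.execute(
        "INSERT OR REPLACE INTO defects VALUES (?, ?, ?, ?, ?, ?)",
        astuple(defect),
    )
    conn.commit()


def load_defects(conn):
    """Return all stored defects, ordered by ID."""
    rows = conn.execute(
        "SELECT id, title, description, status, detail, developer "
        "FROM defects ORDER BY id"
    )
    return [Defect(*row) for row in rows]
```

Keying the table on the tracker ID means re-downloading the same issue updates the stored row instead of creating a second copy.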
3. New Defect 
Compare with Previous Defects. 
● Duplicate Defect 
● New Feature
Bug : Incomplete Closing of Firefox 
Hadoop : 
Bug-12435 
Unable to run Hadoop (2.2.0) commands on Cygwin (2.831) on Windows XP 3 
Bug-239223 
(Ghostproc) – Hadoop version 2.2.0 command while running on Windows XP3 
using Cygwin(2.831)
Bug : Incomplete Closing of Firefox 
Hadoop : 
Bug-12435 
Unable to execute Hadoop (2.2.0) commands on Cygwin (2.831) running on 
Windows XP SP3 
Bug-239223 
(Ghostproc) – Hadoop version 2.2.0 command could not run on Windows XP 
(service pack 3) using Cygwin(2.831)
More Bugs 
Bug-244372: "Document contains no data" 
message on continuation page of NY Times 
article 
Bug-219232: random "The Document 
contains no data." Alerts
4. Coefficient of Similarity 
CoS: depends on various factors. 
a) Are the code files similar? 
b) Are the input files similar? 
c) What fraction of keywords is common? 
d) Which component is affected?
5. CoS 
The greater the similarity, the higher the CoS. 
Exact duplicate defect: CoS = 1 
CoS = w1*TS + w2*FS + w3*CS + w4*IFS 
TS : Bug Report Similarity 
FS : Source Files Similarity 
CS : Component Similarity 
IFS : Input Files Similarity 
where the wi are weights to be determined by experiments.
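
The weighted sum above can be sketched as follows. This is only an illustration under stated assumptions: each similarity term is computed as a Jaccard overlap (or an exact match for the component), which the slides do not specify, and the weight values are placeholders, since the real weights are to be found by experiment. The dictionary keys `text`, `source_files`, `component` and `input_files` are hypothetical.

```python
import re

# Placeholder weights (assumption); they sum to 1 so that an exact
# duplicate scores CoS = 1, as the slide requires.
WEIGHTS = {"TS": 0.4, "FS": 0.3, "CS": 0.2, "IFS": 0.1}


def keywords(text):
    """Lowercased word and version tokens, e.g. 'hadoop', '2.2.0'."""
    return set(re.findall(r"[a-z0-9]+(?:\.[0-9]+)*", text.lower()))


def jaccard(a, b):
    """Fraction of common items; two empty sets count as identical."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def coefficient_of_similarity(a, b, weights=WEIGHTS):
    """CoS = w1*TS + w2*FS + w3*CS + w4*IFS for two defect dicts."""
    ts = jaccard(keywords(a["text"]), keywords(b["text"]))
    fs = jaccard(a["source_files"], b["source_files"])
    cs = 1.0 if a["component"] == b["component"] else 0.0
    ifs = jaccard(a["input_files"], b["input_files"])
    return (weights["TS"] * ts + weights["FS"] * fs
            + weights["CS"] * cs + weights["IFS"] * ifs)
```

Treating two empty file lists as fully similar keeps the "exact duplicate gives CoS = 1" property, at the cost of inflating the score for unrelated defects that both lack input files; a real implementation might prefer to drop the missing term and renormalize the weights.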
6. Programmer Proficiency 
4 Buckets : 
● Beginner 
● Intermediate 
● Seasoned 
● Expert
7. Bracket Determination 
Bracket Adjustment Factor (BAF) 
● Commits to software (features) 
● 6-month time frame 
○ No. of defects solved 
○ No. of defects reopened
8. Priority of Bug 
Priority Adjustment Factor 
● High 
● Medium 
● Low
9. Comparing CoS for All Defects 
Each new defect is compared against all 
defects stored in the database. Those with 
the highest CoS are extracted.
10. Effort Estimate 
For similar defects with CoS > threshold: 
ES = Σ (CoSi * DTi) / n 
where n = number of similar defects 
DTi = developer time spent on similar defect i 
Adjustment factors: 
● Programmer proficiency 
● Priority of defect 
If CoS = 1: duplicate defect, discard
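
A short sketch of the estimate, assuming developer time is recorded in hours and that each past defect has already been scored against the new one. The function names, the default threshold of 0.5 and the tolerance used to detect CoS = 1 are assumptions for illustration. The proficiency and priority adjustment factors are left out because the slides do not give their values.

```python
def is_duplicate(scored, tol=1e-9):
    """True if any past defect matches the new one with CoS = 1."""
    return any(cos >= 1.0 - tol for cos, _ in scored)


def estimate_effort(scored, threshold=0.5):
    """Return ES = sum(CoS_i * DT_i) / n over defects with CoS > threshold.

    `scored` is a list of (CoS, developer_time_hours) pairs, one per
    past defect. Returns None when the new defect is a duplicate
    (it is discarded) or when no past defect is similar enough.
    """
    if is_duplicate(scored):
        return None
    similar = [(cos, dt) for cos, dt in scored if cos > threshold]
    if not similar:
        return None
    return sum(cos * dt for cos, dt in similar) / len(similar)
```

For example, with past defects scored (0.9, 10 h), (0.8, 5 h) and (0.3, 100 h) and a threshold of 0.5, only the first two count, so ES = (0.9*10 + 0.8*5) / 2 = 6.5 hours.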
References 
● https://issues.apache.org 
● http://menzies.us/pdf/11ase.pdf 
● Local vs. Global Models for Effort Estimation and Defect 
Prediction, Tim Menzies, Andrian Marcus 
● Towards Improving Bug Tracking Systems with Game 
Mechanisms, Leonardo Passos, University of Waterloo

Recommendation system for code bugs.
