This document describes techniques for selecting discriminating terms from bug reports to improve the performance of automated bug assignment. It presents two term selection techniques based on log odds ratio scoring: Terms From All, which selects the highest scoring terms regardless of developer, and Term-Class Related, which selects a fixed or variable number of the highest scoring terms for each developer class. An experimental evaluation is conducted to assess the classification performance of these term selection methods.
1. Selecting Discriminating Terms for Bug Assignment: A Formal Analysis
Ibrahim Aljarah, Shadi Banitaan, Sameer Abufardeh, Wei Jin and Saeed Salem
North Dakota State University, Fargo, ND, USA
This research is supported by
2. Presentation Outlines
Bug Assignment Problem Overview
Bug Assignment Steps.
Term Selection
Log Odds Ratio based Term Selection Techniques.
Experimental Results
Conclusion
Future Directions
3. Bug Assignment Problem
Suggest whom to assign this bug to.
Assign the bug to an appropriate developer.
[Diagram: a bug triager routes new bug reports B1-B7 to developers D1-D4.]
7. Bug-term matrix (M) and bug-developer vector (Y) construction
        t1  t2  t3  ...  tR  |  Y
  b1     0   0   1  ...   1  |  d1
  b2     1   1   1  ...   0  |  d1
  b3     0   0   1  ...   1  |  d3
  b4     1   1   0  ...   0  |  d1
  b5     0   0   1  ...   1  |  d5
  ...    .   .   .  ...   .  |  .
  bN     .   .   .  ...   .  |  d9
Each entry of the bug-term matrix M is assigned a value in {0, 1}; the vector Y records the developer who fixed each bug.
T = {t1, t2, ..., tR} is a set of R terms.
D = {d1, ..., dL} is a set of L pre-defined developers.
B = {b1, ..., bN} is a set of N bug reports to be assigned.
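Given these definitions, the construction of M and Y can be sketched in Python (a minimal illustration assuming pre-tokenized bug reports; function and variable names are my own, not from the slides):

```python
from typing import Dict, List, Tuple

def build_matrix(bugs: List[Tuple[List[str], str]], terms: List[str]):
    """Build the binary bug-term matrix M and bug-developer vector Y.

    bugs:  list of (token list, developer id) pairs, one per bug report.
    terms: the vocabulary T = {t1, ..., tR}.
    """
    index: Dict[str, int] = {t: j for j, t in enumerate(terms)}
    M, Y = [], []
    for tokens, dev in bugs:
        row = [0] * len(terms)
        for tok in tokens:
            if tok in index:
                row[index[tok]] = 1  # 1 if the term occurs in the bug report
        M.append(row)
        Y.append(dev)
    return M, Y
```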
8. Term Selection
Term Selection:
It selects a subset of terms to describe the bug report.
Term selection reduces computation time, and discarding noisy terms
can lead to a significant improvement in classification performance.
Common techniques: Information Gain, Latent Semantic Analysis.
9. Discriminating Terms
A discriminating term is one that commonly appears in the bug reports
fixed by a specific developer but rarely appears in other bug reports.
The Log Odds Ratio (LOR) score is used to decide which terms are
discriminating.
Research goal: improve classification quality by discarding
non-discriminating terms before performing the classification task
(bug assignment).
10. Log Odds Ratio (LOR)
The LOR score is calculated with respect to an individual developer
(class) and measures how strongly a term discriminates that class.
A higher score means the term is more discriminating.
The score for a term is computed from the odds of the term appearing
in that class versus the other classes.
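The formula itself did not survive extraction. A standard form of the log odds ratio for a term t and class c is LOR(t, c) = log[(P(t|c) / (1 - P(t|c))) / (P(t|not c) / (1 - P(t|not c)))]; a Python sketch with additive smoothing (the smoothing constant and count-based probability estimates are my assumptions, not from the slides) is:

```python
import math

def log_odds_ratio(n_tc: int, n_c: int, n_t: int, n_total: int,
                   eps: float = 0.5) -> float:
    """LOR score of term t with respect to developer class c.

    n_tc:    bug reports of class c that contain t
    n_c:     bug reports of class c
    n_t:     bug reports containing t overall
    n_total: all bug reports
    eps:     additive smoothing to avoid zero or undefined odds
    """
    p_c = (n_tc + eps) / (n_c + 2 * eps)                   # P(t | c)
    p_nc = (n_t - n_tc + eps) / (n_total - n_c + 2 * eps)  # P(t | not c)
    return math.log((p_c / (1 - p_c)) / (p_nc / (1 - p_nc)))
```

A term concentrated in one developer's reports gets a large positive score; a term spread evenly across developers scores near zero.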
12. Proposed Term Selection Techniques
Log-Odds-Ratio-based techniques
Terms From All selection (TFA)
In this method, the R′ terms with the highest LOR scores are chosen
without considering how the terms are distributed over developers:
All per-class LOR scores are combined into one common list,
the scores are sorted,
and finally the R′ highest-scoring terms are extracted from the list.
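The TFA procedure above (pool all per-class scores, sort, take the top R′ unique terms) can be sketched as follows (names illustrative):

```python
from typing import Dict, List, Tuple

def terms_from_all(scores: Dict[Tuple[str, str], float],
                   r_prime: int) -> List[str]:
    """TFA: pool LOR scores of all (term, developer) pairs into one list,
    sort by score, and keep the R' highest-scoring unique terms."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    selected: List[str] = []
    for (term, _dev), _score in ranked:
        if term not in selected:
            selected.append(term)
        if len(selected) == r_prime:
            break
    return selected
```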
14. Proposed Term Selection Techniques
Log-Odds-Ratio-based techniques
Term-Class Related selection (TCR):
Idea : Select k terms from each class (developer).
It enhances the selection criteria by targeting terms that
have the highest LOR scores in each class.
Two ways are suggested to specify k, which are:
Equally Likely.
Variable.
15. Proposed Term Selection Techniques
Log-Odds-Ratio-based techniques
TCR - ki Equally Likely:
Choose a fixed number of terms, k, for each class.
For example:
if we have 10 classes (developers) and we need to select
100 terms then we select 10 terms from the highest LOR
scored terms for each developer.
We maintain a unique set of terms, i.e., the number of
obtained terms R′ can be less than or equal to k × L.
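The equally likely variant (a fixed k per class, with duplicates merged so that R′ ≤ k × L) can be sketched as:

```python
from typing import Dict, Set

def tcr_equally_likely(scores_by_dev: Dict[str, Dict[str, float]],
                       total_terms: int) -> Set[str]:
    """TCR - ki Equally Likely: take the same number k of top-LOR terms
    from each developer class; the union may hold fewer than k * L terms
    because classes can share terms."""
    k = total_terms // len(scores_by_dev)
    selected: Set[str] = set()
    for term_scores in scores_by_dev.values():
        top = sorted(term_scores, key=term_scores.get, reverse=True)[:k]
        selected.update(top)
    return selected
```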
17. Proposed Term Selection Techniques
Log-Odds-Ratio-based techniques
TCR - ki Variable:
Choose a variable number of terms for each class.
k is specified based on the developer's fixing rate.
Fixing rate: the proportion of all available bug reports that were
assigned to (fixed by) the developer.
Example: the highest-scored terms are selected with R′ = 20 from 100 bug
reports and 5 developers.
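One way to realize this variant allocates ki top-LOR terms to developer di in proportion to their fixing rate; the rounding scheme below is my assumption, as the slides do not specify it:

```python
from typing import Dict, Set

def tcr_variable(scores_by_dev: Dict[str, Dict[str, float]],
                 bug_counts: Dict[str, int], r_prime: int) -> Set[str]:
    """TCR - ki Variable: allocate k_i top-LOR terms to developer d_i in
    proportion to the fixing rate (bugs fixed by d_i / all bugs)."""
    total_bugs = sum(bug_counts.values())
    selected: Set[str] = set()
    for dev, term_scores in scores_by_dev.items():
        k_i = round(r_prime * bug_counts[dev] / total_bugs)
        top = sorted(term_scores, key=term_scores.get, reverse=True)[:k_i]
        selected.update(top)
    return selected
```

A developer who fixed 60% of the bugs contributes roughly 60% of the selected terms.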
21. Experimental results
Eclipse Project Bugs Dataset:
A variety of open bug repositories are used in open-source development;
our experiments were applied to the Bugzilla repository of the Eclipse
project (https://bugs.eclipse.org).
Number of bugs reported in 2009:
Total reported: 38843 bugs
FIXED: 20502 bugs
WONTFIX: 1182 bugs
DUPLICATE: 3120 bugs
WORKSFORME: 1362 bugs
INVALID: 1465 bugs
Not Eclipse: 365 bugs
Other (REASSIGNED, NEW, REOPENED): 10847 bugs (still without resolution)
22. Bug Reports Status and Resolutions
[Pie chart of the 2009 bug resolutions: FIXED 53%, Other 28%, DUPLICATE 8%, with WONTFIX, WORKSFORME, and INVALID at roughly 3-4% each and Not Eclipse at 1%.]
23. Experimental results
Eclipse Bug Report Components:
The Eclipse project in the Bugzilla repository is divided into 907 different components.
We use the three components with the largest numbers of fixed bugs:
Core Component: JDT Core is the Java infrastructure of the Java IDE.
http://www.eclipse.org/jdt/core/index.php
UI Component: Java Development Toolkit UI.
http://www.eclipse.org/jdt/ui/index.html
SWT Component: Eclipse Standard Widget Toolkit.
http://www.eclipse.org/swt/
24. Number of Fixed Bugs per Component
[Bar chart: count of fixed bugs for the UI, Core, and SWT components (y-axis 0-2500).]
25. Experimental results
Evaluations:
Precision is the ratio of bug reports correctly assigned to a developer
to the total number of bug reports assigned to that developer
(correctly plus incorrectly assigned).
Recall is the ratio of bug reports correctly assigned to a developer to
the total number of bug reports that actually belong to that developer
(correctly assigned plus missed).
We used the Bayesian Network classifier.
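Per-developer precision and recall as defined above (TP / (TP + FP) and TP / (TP + FN)) can be computed as follows; this is a small illustrative helper, not the authors' evaluation code:

```python
from typing import List, Tuple

def precision_recall(y_true: List[str], y_pred: List[str],
                     dev: str) -> Tuple[float, float]:
    """Precision and recall of bug assignment for one developer."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == dev and t == dev)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == dev and t != dev)
    fn = sum(1 for t, p in zip(y_true, y_pred) if p != dev and t == dev)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```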
26. Experimental results
Other techniques used for comparison:
Information Gain, which is calculated for each term with respect to all
classes; the terms with the top R′ information gain values are returned.
Latent Semantic Analysis, which transforms terms into concepts by
extracting relations between terms in the selected bug reports.
27. Experimental results
F-measure results of the five term selection methods using different numbers of terms.
These methods were applied to the Core component, and only active developers were
considered.
28. Experimental results
The results for the SWT Component
TCR - ki Variable had the highest precision (0.59) and the highest recall (0.55).
29. Experimental results
The results for the UI Component
TCR - ki Variable achieved the highest precision (0.56) and was among the highest
recall (0.46) values.
30. Conclusion
This research investigates the impact of several term selection methods on
the effectiveness of classification.
Three Log-Odds-Ratio (LOR) based selection methods were proposed (TFA,
TCR equally likely, and TCR variable).
The proposed selection methods were compared against the Information Gain
(IG) and Latent Semantic Analysis (LSA) techniques.
The LOR-based selection method TCR - ki Variable achieved up to a 30%
improvement in precision and up to 5% in recall.
These results demonstrate the impact of incorporating effective term
selection techniques on improving classification performance.
31. Future Directions
Investigation of other alternative weighting schemes to better identify
discriminating terms for improving classification accuracy.
Exploring the potential of incorporating external domain knowledge
and other evidence sources to better address the general bug
assignment task.
Expanding the data sets from multiple domains to further examine
the effectiveness of proposed term selection techniques.