CosTriage: A Cost-Aware Algorithm for Bug Reporting Systems (AAAI 2011)
1. AAAI 2011
CosTriage: A Cost-Aware Algorithm for
Information & Database Systems Lab
Bug Reporting Systems
Jin-woo Park1, Mu-Woong Lee1, Jinhan Kim1, Seung-won Hwang1, Sunghun Kim2
POSTECH, Korea, Republic of1
HKUST, Hong Kong2
2. Bug reporting systems
Bugs!!
More than 300 bug reports per day in Mozilla (a big software
project)
Information & Database Systems Lab
Bug Solving
One of the important issues in a software development process
Bug reports are posted, discussed, and assigned to developers
Open sources projects
Apache
Eclipse
Linux kernel
Mozilla
3. Bug reporting systems
Bug reports
Has Bug ID, title, description, status, and other meta data
Assigned to developers
Fixed by developers
Information & Database Systems Lab
Challenges
Bug triage
Duplicate bug detection
…
bug ID
title (summary)
status
bug fix history (time)
other data
description
4. Bug reporting systems
Bug Triage
Assigning a new bug report to a suitable developer
Bottleneck of bug fixing process
Labor intensive
Information & Database Systems Lab
Miss-assignment can lead to slow bug fix
Bottleneck of
the bug fixing
process
assign
Open
Source Bug
Project Reports
Triager
Developers
Can be
automated!!!
5. Preliminary (recommendation)
Recommender algorithms
Content-based recommendation (CBR)
Predicting user’s interests based on item features
Machine learning methods
Information & Database Systems Lab
Over-specialization problem
Collaborative filtering recommendation (CF)
Predicting user’s interests based on affinity’s interests for items
User neighborhood
Sparsity problem
Hybrid recommendation
Content-boosted collaborative filtering (CBCF)
Combining an existing CBR with a CF
Better performance than either approach alone
6. Preliminary (recommendation)
Recommender algorithms
Content-based recommendation (CBR)
Predict user’s interests based on item features
Title, Status, Description, …
Information & Database Systems Lab
Learn features using machine learning methods
Recommend similar items
Over-specialization problem
Developers
Bug
Report 4 Title Bug 1 Bug 2 Bug 3 Bug 4
Dev 1 10 9 1 ?
Feature word1 word2 word3 Dev 2 6 5 10 ?
Count 1 1 4 Dev 3 8 7 7 ?
Rating score table
7. Preliminary (recommendation)
Recommender algorithms
Content-based recommendation (CBR)
Predict user’s interests based on item features
Title, Status, Description, …
Information & Database Systems Lab
Learn features using machine learning methods
Recommend similar items
Over-specialization problem
Developers
Bug
Report 4 Title Bug 1 Bug 2 Bug 3 Bug 4
Dev 1 10 9 1 9
Feature word1 word2 word3 Dev 2 6 5 10 5
Count 1 1 4 Dev 3 8 7 7 7
Rating score table
8. Preliminary (recommendation)
Recommender algorithms
Collaborative filtering recommendation (CF)
Predicting user’s interests based on affinity’s interests for items
User neighborhood
Information & Database Systems Lab
Sparsity problem
Bug
Reports
Bug 1 Bug 2 Bug 3
Developer 1 10 5 10
Developer 2 5 10 6
Developer 3 7 7 7
Developer 4 9 5 ?
9. Preliminary (recommendation)
Recommender algorithms
Collaborative filtering recommendation (CF)
Predicting user’s interests based on affinity’s interests for items
User neighborhood
Information & Database Systems Lab
Sparsity problem
Bug
Reports
Bug 1 Bug 2 Bug 3
Developer 1 10 5 10
Developer 2 5 10 6
Developer 3 7 7 7
Developer 4 9 5 ?
10. Preliminary (recommendation)
Recommender algorithms
Collaborative filtering recommendation (CF)
Predicting user’s interests based on affinity’s interests for items
User neighborhood
Information & Database Systems Lab
Sparsity problem
Bug
Reports
Bug 1 Bug 2 Bug 3
Developer 1 10 5 10
Developer 2 5 10 6
Developer 3 7 7 7
Developer 4 9 5 9
11. Preliminary (recommendation)
Recommender algorithms
Collaborative filtering recommendation (CF)
Predicting user’s interests based on affinity’s interests for items
User neighborhood
Information & Database Systems Lab
Sparsity problem
Unsuitable for triaging because:
Bug
No one solved new bug!!
Report 4
Bug 1 Bug 2 Bug 3 Bug 4
Developer 1 10 5 10 ?
Developer 2 5 10 6 ?
Developer 3 7 7 7 ?
Developer 4 9 5 9 ?
12. Preliminary (recommendation)
Recommender algorithms
Collaborative filtering recommendation (CF)
Predicting user’s interests based on affinity’s interests for items
User neighborhood
Information & Database Systems Lab
Sparsity problem
Unsuitable for triaging because:
Bug
Rating is extremely sparse!!
Report 4
Bug 1 Bug 2 Bug 3 Bug 4
Developer 1 10 5 ? ?
Developer 2 ? ? ? ?
Developer 3 ? ? 10 ?
Developer 4 ? ? ? 3
13. Preliminary (recommendation)
Recommender algorithms
Hybrid recommendation
Content-boosted collaborative filtering (CBCF)
Combining an existing CBR with a CF
Information & Database Systems Lab
Better performance than either approach alone
Two Phases
Bug
- CBR phase Feature word1 word2 word3
Report 5
- CF Phase Count 3 2 4
Bug 1 Bug 2 Bug 3 Bug 4 Bug 5
Dev 1 10 ? ? ? ?
Dev 2 ? 8 3 ? ?
Dev 3 ? ? ? 7 ?
14. Preliminary (recommendation)
Recommender algorithms
Hybrid recommendation
Content-boosted collaborative filtering (CBCF)
Combining an existing CBR with a CF
Information & Database Systems Lab
Better performance than either approach alone
CBR phase
Bug
Report 5
Bug 1 Bug 2 Bug 3 Bug 4 Bug 5
Dev 1 10 10 10 10 10
Dev 2 3 8 3 8 3
Dev 3 7 7 7 7 7
15. Preliminary (recommendation)
Recommender algorithms
Hybrid recommendation
Content-boosted collaborative filtering (CBCF)
Combining an existing CBR with a CF
Information & Database Systems Lab
Improving CBR/CF by combining the complementary strength of CBR
and CF
CF phase
Bug
Report 5
Bug 1 Bug 2 Bug 3 Bug 4 Bug 5
Dev 1 10 9 7 9 8
Dev 2 5 8 3 8 5
Dev 3 7 8 5 7 6
Existing recommendation approaches are not suitable!
16. Preliminary (Bug triage)
PureCBR [Anvik06]
Construct multi-class classifier using a SVM classifier
Bug reports B are converted into pair <F(B), D> for training
Bug Report History Input Data
Information & Database Systems Lab
Developers Classes
F(B) is the feature vector indicating the counts of the keyword w of description of B
D is the developer who fixed B (class)
Bug Fix
History
training
Assign a new bug to Dev 1
New Bug
Report
Classifier’s scores
17. Preliminary (Bug triage)
PureCBR [Anvik06]
Good performances
Problem
This approach only considers accuracy
Information & Database Systems Lab
Many bugs One super-developer
Over-specialization problem
Are developer happy?
We consider developer’s cost (e.g., interests, bug fix time, and
expertise)
We assume that faster bug fixing time has higher developer cost
Bug Bug
Report 1 Report 2
50 days 2 days
18. Goal
Goal
Find efficient bug-developer matching
Optimizing not only accuracy but also cost
Use modified CBCF approach
Information & Database Systems Lab
Constructing developer profiles for cost
Challenge
Enhancing CBCF approach for sparse data
Extreme sparseness of the past bug fix history data
A bug fixed by a developer
Need to reduce sparseness for enhancing quality of CBCF
Bug fix time from bug fix history
19. Overview
Cost
Developer profiles Cost score
Information & Database Systems Lab
Recommended
New Bug Aggregation
Developer
Report Accuracy
Bug classifier Accuracy score
<SVM>
Merging classifier’s scores and developer’s cost scores.
The accuracy scores are obtained using PureCBR [Anvik06]
The developer cost scores are obtained from “de-sparsified” bug
fix history.
Two scores are then merged for prediction
Assign a new bug to Dev 1
+ =
Accuracy scores Cost scores Hybrid scores
20. CosTriage (Cost Estimation)
CosTriage: A Cost-aware Triage Algorithm for bug reporting
system
Challenge to estimate the developer cost?
How to reduce the sparseness problem?
Information & Database Systems Lab
Using a Topic Modeling
Bug fix time from bug fix history
Categorization bugs to reduce the sparseness
21. CosTriage
Categorizing bugs
Topic Modeling
Latent Dirichlet Allocation (LDA) [BleiNg03]
Each topic is represented as a bug type
Information & Database Systems Lab
The topic distribution of reports determine bug types
We adopt the divergence measure proposed in [Arun, R. PAKDD ‘10]
Finding the natural number of topics (# bug types)
t is the natural number of bug types
22. CosTriage
Developer profiles modeling
Developer Profiles
N-dimensional feature vector
The element of developer profiles, Pu[i], denotes the developer cost
Information & Database Systems Lab
for ith-type bugs
T denotes the number of bug types
Developer Cost
The average time to fix ith type bugs
23. CosTriage
Predicting missing values in profiles
Using CF for developer profiles
Similarity measure:
Information & Database Systems Lab
k=1
24. CosTriage
Obtaining developer’s cost for a new bug report
Bug type = 1
Information & Database Systems Lab
New Bug
Report
Developer cost for a new bug
26. Experiments
Subject Systems
97,910 valid bug reports
255 active developers
From four open source projects
Information & Database Systems Lab
Approaches
PureCBR: State of the art CBR-based approach
CBCF: Original CBCF
CosTraige: Our approach
27. Experiments
Two research questions
Q1. How much can our approach improve cost (bug fix time) without
sacrificing bug assignment accuracy?
Q2. What are the trade-offs between accuracy and cost (bug fix
Information & Database Systems Lab
time)?
Evaluation measures
W is the set of bug reports predicted correctly.
N is the number of bug reports in the test set.
The real fix time is unknown, we only use the fix time for correctly
matched bugs.
28. Experiments
Relative errors of expected bug fix time
Information & Database Systems Lab
Improvement of bug fix time (Q1)
CosTriage improves the costs efficiently up to 30% without
seriously compromising accuracy
29. Experiments
Trade-off between accuracy and bug fix time (Q2)
Information & Database Systems Lab
30. Conclusion
We proposed a new bug triaging technique
Optimize not only accuracy but also cost
Solve data sparseness problem by using topic modeling
Information & Database Systems Lab
Experiments using four real bug report corpora
Improve the cost without heavy losses of accuracy
33. Back up
We adopt the divergence measure proposed in [Arun10]
Finding the natural number of topics (# bug types)
Information & Database Systems Lab
t is the natural number of bug types
Mozilla Apache
34. Back up - Bug features
Bug features
Keywords of title and description
Other meta data
Information & Database Systems Lab
Title : Traditional Memory Rendering refactoring request
New Bug Description : Request additional refactoring so we can
…
Report Traditional Rendering.
Remove stopwords
Traditional Memory Rendering refactoring request Request
refactoring … Traditional Rendering