CosTriage: A Cost-Aware Algorithm for Bug Reporting Systems (AAAI 2011)

Transcript

  • 1. AAAI 2011. CosTriage: A Cost-Aware Algorithm for Bug Reporting Systems. Jin-woo Park, Mu-Woong Lee, Jinhan Kim, Seung-won Hwang (Information & Database Systems Lab, POSTECH, Republic of Korea) and Sunghun Kim (HKUST, Hong Kong).
  • 2. Bug reporting systems. Bugs!! Mozilla, a large software project, receives more than 300 bug reports per day. Bug solving is one of the important activities in the software development process: bug reports are posted, discussed, and assigned to developers. Open source projects with bug reporting systems include Apache, Eclipse, the Linux kernel, and Mozilla.
  • 3. Bug reporting systems. A bug report has a bug ID, a title (summary), a description, a status, a bug fix history (time), and other metadata. Bug reports are assigned to developers and fixed by developers. Challenges include bug triage, duplicate bug detection, and more.
  • 4. Bug reporting systems. Bug triage: assigning a new bug report to a suitable developer. Triage is the bottleneck of the bug fixing process: it is labor intensive, and misassignment can lead to slow bug fixes. [Diagram: bug reports from an open source project are assigned by a triager to developers.] Bug triage can be automated!!!
  • 5. Preliminary (recommendation). Recommender algorithms. Content-based recommendation (CBR) predicts a user's interests from item features using machine learning methods, but suffers from the over-specialization problem. Collaborative filtering (CF) predicts a user's interests from the interests of similar users (the user neighborhood), but suffers from the sparsity problem. Hybrid recommendation, e.g. content-boosted collaborative filtering (CBCF), combines an existing CBR with CF and performs better than either approach alone.
  • 6. Preliminary (recommendation). Content-based recommendation (CBR) predicts a user's interests from item features (title, status, description, …): it learns the features with machine learning methods and recommends similar items, at the risk of over-specialization. Example: a new Bug Report 4 with title feature counts word1 = 1, word2 = 1, word3 = 4, and a rating score table over Bugs 1–4 in which the Bug 4 column is unknown: Dev 1 → [10, 9, 1, ?], Dev 2 → [6, 5, 10, ?], Dev 3 → [8, 7, 7, ?].
  • 7. Preliminary (recommendation). Continuing the CBR example: from Bug Report 4's content features, CBR fills in the missing column of the rating score table, predicting Dev 1 → 9, Dev 2 → 5, Dev 3 → 7 for Bug 4. (A minimal code sketch of this idea follows.)
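Below is a minimal, hypothetical sketch of content-based prediction in the spirit of the CBR example above: each developer's rating for a new bug is copied from the most similar previously rated bug, with TF-IDF cosine similarity as an assumed content-similarity measure. The bug texts, ratings, and the 1-nearest-neighbor rule are illustrative, not the paper's exact method.

```python
# Content-based recommendation (CBR) sketch for the toy rating table above.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

rated_bugs = {
    "Bug 1": "crash on startup when profile is missing",
    "Bug 2": "rendering glitch in toolbar icons",
    "Bug 3": "memory leak in network cache",
}
ratings = {                      # developer -> per-bug rating (toy values from the slide)
    "Dev 1": {"Bug 1": 10, "Bug 2": 9, "Bug 3": 1},
    "Dev 2": {"Bug 1": 6,  "Bug 2": 5, "Bug 3": 10},
    "Dev 3": {"Bug 1": 8,  "Bug 2": 7, "Bug 3": 7},
}
new_bug = "toolbar icons rendered incorrectly"

vec = TfidfVectorizer()
X = vec.fit_transform(list(rated_bugs.values()) + [new_bug])
sims = cosine_similarity(X[len(rated_bugs)], X[:len(rated_bugs)]).ravel()
most_similar = list(rated_bugs)[sims.argmax()]

# Content-based prediction: each developer's rating for the new bug is taken
# from the most similar previously rated bug.
for dev, r in ratings.items():
    print(dev, "->", r[most_similar])
```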
  • 8. Preliminary (recommendation). Collaborative filtering (CF) predicts a user's interests from the interests of similar users (the user neighborhood), but suffers from the sparsity problem. Example rating table over Bugs 1–3: Developer 1 → [10, 5, 10], Developer 2 → [5, 10, 6], Developer 3 → [7, 7, 7], Developer 4 → [9, 5, ?].
  • 9. Preliminary (recommendation). Continuing the CF example: Developer 4's missing rating for Bug 3 is predicted from the user neighborhood, i.e., the developers whose rating patterns are most similar to Developer 4's.
  • 10. Preliminary (recommendation). In the example, the missing rating is then filled in: Developer 4 → [9, 5, 9] over Bugs 1–3. (A small code sketch of this prediction follows.)
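A minimal, hypothetical sketch of user-based CF on the toy table above: the missing rating is a similarity-weighted average of the other developers' ratings, with cosine similarity on the commonly rated bugs as an assumed similarity and weighting scheme.

```python
# User-based collaborative filtering (CF) sketch for the toy rating table.
import numpy as np

# Rows: Developer 1..4, columns: Bug 1..3; np.nan marks the rating to predict.
R = np.array([
    [10.0,  5.0, 10.0],
    [ 5.0, 10.0,  6.0],
    [ 7.0,  7.0,  7.0],
    [ 9.0,  5.0, np.nan],
])

target_user, target_item = 3, 2
known = ~np.isnan(R[target_user])            # bugs Developer 4 has already rated

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Similarity of Developer 4 to each other developer on the commonly rated bugs.
sims = np.array([cosine(R[u, known], R[target_user, known]) for u in range(3)])

# Weighted average of the neighbors' ratings for the target bug.
pred = sims @ R[:3, target_item] / sims.sum()
print(f"Predicted rating of Developer 4 for Bug 3: {pred:.1f}")
```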
  • 11. Preliminary (recommendation). CF alone is unsuitable for triaging because no one has solved a new bug: for a new Bug Report 4, every developer's rating is unknown (Developer 1 → [10, 5, 10, ?], Developer 2 → [5, 10, 6, ?], Developer 3 → [7, 7, 7, ?], Developer 4 → [9, 5, 9, ?]).
  • 12. Preliminary (recommendation). CF alone is also unsuitable because the rating matrix is extremely sparse: Developer 1 → [10, 5, ?, ?], Developer 2 → [?, ?, ?, ?], Developer 3 → [?, ?, 10, ?], Developer 4 → [?, ?, ?, 3].
  • 13. Preliminary (recommendation). Hybrid recommendation: content-boosted collaborative filtering (CBCF) combines an existing CBR with CF and performs better than either approach alone. CBCF runs in two phases, a CBR phase and a CF phase. Example: a new Bug Report 5 with feature counts word1 = 3, word2 = 2, word3 = 4, and a sparse rating table over Bugs 1–5: Dev 1 → [10, ?, ?, ?, ?], Dev 2 → [?, 8, 3, ?, ?], Dev 3 → [?, ?, ?, 7, ?].
  • 14. Preliminary (recommendation). CBCF, CBR phase: the sparse rating table is first densified with content-based predictions, giving Dev 1 → [10, 10, 10, 10, 10], Dev 2 → [3, 8, 3, 8, 3], Dev 3 → [7, 7, 7, 7, 7].
  • 15. Preliminary (recommendation). CBCF improves on CBR and CF by combining their complementary strengths. CF phase: collaborative filtering is then run over the densified table, giving Dev 1 → [10, 9, 7, 9, 8], Dev 2 → [5, 8, 3, 8, 5], Dev 3 → [7, 8, 5, 7, 6]. Even so, existing recommendation approaches are not directly suitable for bug triage! (A two-phase sketch follows.)
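The following is a minimal, hypothetical sketch of the two-phase CBCF idea: fill every missing rating with a content-based prediction, then run a simple collaborative-filtering pass over the densified matrix. The `cbr_predict` callback and the cosine-similarity CF pass are illustrative placeholders, not the exact CBCF formulation.

```python
# Two-phase CBCF sketch: (1) densify with CBR, (2) run CF on the dense matrix.
import numpy as np

def cbcf(R, cbr_predict):
    """R: developers x bugs rating matrix with np.nan for unknown entries.
    cbr_predict(dev, bug) -> content-based rating estimate (assumed given)."""
    # Phase 1 (CBR): fill every missing entry with the content-based prediction.
    dense = R.copy()
    for d, b in zip(*np.where(np.isnan(R))):
        dense[d, b] = cbr_predict(d, b)

    # Phase 2 (CF): re-estimate each entry as a similarity-weighted average
    # over all developers, so CBR predictions get "boosted" by neighbors.
    norms = np.linalg.norm(dense, axis=1, keepdims=True)
    sims = (dense @ dense.T) / (norms @ norms.T)        # cosine similarity
    return (sims @ dense) / sims.sum(axis=1, keepdims=True)

# Example: cbcf(np.array([[10.0, np.nan], [np.nan, 8.0]]), lambda d, b: 5.0)
```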
  • 16. Preliminary (bug triage). PureCBR [Anvik06] constructs a multi-class classifier using an SVM. Each bug report B in the bug fix history is converted into a training pair <F(B), D>, where F(B) is the feature vector of keyword counts from the description of B and D is the developer who fixed B (the class label). The bug fix history is the input data, and the developers are the classes. For a new bug report, the classifier's scores determine the assignment (e.g., assign the new bug to Dev 1).
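A minimal sketch of a PureCBR-style classifier, assuming bag-of-words keyword counts and a linear SVM; the toy descriptions and developer labels are made up, and the exact feature set used in [Anvik06] may differ.

```python
# PureCBR-style sketch: keyword-count features F(B), developers D as classes.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

# Toy bug fix history: description of B and the developer D who fixed it.
descriptions = [
    "crash when opening malformed PNG",
    "toolbar icons render incorrectly on HiDPI",
    "memory leak in HTTP cache eviction",
]
fixers = ["Dev 1", "Dev 2", "Dev 3"]

vectorizer = CountVectorizer()        # F(B): keyword counts of the description
X = vectorizer.fit_transform(descriptions)
clf = LinearSVC().fit(X, fixers)      # multi-class SVM over developers

new_bug = ["PNG decoder crashes on truncated file"]
scores = clf.decision_function(vectorizer.transform(new_bug))
print(dict(zip(clf.classes_, scores.ravel())))   # per-developer classifier scores
```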
  • 17. Preliminary (bug triage). PureCBR [Anvik06] performs well, but it only considers accuracy, so many bugs tend to be routed to one "super-developer" (an over-specialization problem). Are developers happy with that? We therefore consider developer cost (e.g., interests, bug fix time, and expertise), estimating cost from bug fix time (e.g., Bug Report 1 took 50 days to fix, Bug Report 2 took 2 days).
  • 18. Goal. Find an efficient bug-developer matching that optimizes not only accuracy but also cost, using a modified CBCF approach in which developer profiles capture cost. Challenge: enhancing CBCF for sparse data. The past bug fix history (which developer fixed which bug, and the fix time derived from it) is extremely sparse, so the sparseness must be reduced to improve the quality of CBCF.
  • 19. Overview. Cost side: developer profiles produce a cost score. Accuracy side: a bug classifier (SVM) produces an accuracy score. For a new bug report, the two scores are aggregated to recommend a developer. The accuracy scores are obtained with PureCBR [Anvik06]; the developer cost scores are obtained from a "de-sparsified" bug fix history; the two scores are then merged for prediction (accuracy scores + cost scores = hybrid scores, e.g., assign the new bug to Dev 1).
  • 20. CosTriage (cost estimation). CosTriage is a cost-aware triage algorithm for bug reporting systems. The challenges are how to estimate developer cost and how to reduce the sparseness of the fix-time data derived from the bug fix history. Our answer is topic modeling: bugs are categorized into types, which reduces the sparseness.
  • 21. CosTriage. Categorizing bugs with topic modeling: we use Latent Dirichlet Allocation (LDA) [BleiNg03]. Each topic is treated as a bug type, and a report's topic distribution determines its bug type. To find the natural number of topics t (the number of bug types), we adopt the divergence measure proposed in [Arun, R. PAKDD '10]. (A small LDA sketch follows.)
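A minimal sketch of categorizing bug reports into types with LDA, using scikit-learn; the toy reports and the fixed number of topics are illustrative (the paper selects the number of topics with the divergence measure of Arun et al.).

```python
# LDA-based bug categorization sketch: each report's type = its dominant topic.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

reports = [
    "crash on startup with corrupted profile",
    "rendering glitch in toolbar icons",
    "memory leak in network cache",
    "startup crash when profile directory is read-only",
]

X = CountVectorizer(stop_words="english").fit_transform(reports)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

# Each report's bug type = the topic with the highest probability.
bug_types = lda.transform(X).argmax(axis=1)
print(bug_types)
```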
  • 22. CosTriage. Developer profile modeling. A developer profile is a T-dimensional vector, where T is the number of bug types; its i-th element, Pu[i], is developer u's cost for type-i bugs. Developer cost for a bug type is the average time the developer took to fix bugs of that type. (A profile-building sketch follows.)
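A minimal sketch of building developer profiles from the fix history: for each developer, average the fix times per bug type and leave NaN where the developer has never fixed that type. The history tuples and variable names are illustrative.

```python
# Developer profile sketch: P_u[i] = average fix time of developer u on type-i bugs.
import numpy as np
from collections import defaultdict

T = 3                                   # number of bug types (from LDA)
# (developer, bug_type, fix_time_in_days) extracted from the bug fix history
history = [("Dev 1", 0, 3), ("Dev 1", 0, 5), ("Dev 1", 2, 20),
           ("Dev 2", 1, 2), ("Dev 3", 2, 7)]

times = defaultdict(list)
for dev, t, days in history:
    times[(dev, t)].append(days)

profiles = {}
for dev in {d for d, _, _ in history}:
    profiles[dev] = np.array([np.mean(times[(dev, t)]) if (dev, t) in times
                              else np.nan for t in range(T)])
print(profiles)
```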
  • 23. CosTriage. Predicting missing values in the profiles: we apply CF over the developer profiles, using a profile similarity measure with a neighborhood size of k = 1 (the single most similar developer). (A filling sketch follows.)
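A minimal, hypothetical sketch of the k = 1 fill-in step: a missing profile entry is copied from the single most similar developer who does have a value for that bug type. Cosine similarity on the commonly known profile dimensions is an assumed choice, not necessarily the paper's similarity measure.

```python
# k = 1 collaborative filling of missing developer-profile entries.
import numpy as np

profiles = {                       # toy profiles: average fix time per bug type (NaN = unknown)
    "Dev 1": np.array([4.0, np.nan, 20.0]),
    "Dev 2": np.array([np.nan, 2.0, np.nan]),
    "Dev 3": np.array([5.0, 3.0, 7.0]),
}

def fill_missing(profiles, dev, bug_type):
    target = profiles[dev]
    best_sim, best_val = -1.0, np.nan
    for other, p in profiles.items():
        if other == dev or np.isnan(p[bug_type]):
            continue
        common = ~np.isnan(target) & ~np.isnan(p)     # dimensions both developers know
        if not common.any():
            continue
        a, b = target[common], p[common]
        sim = a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)
        if sim > best_sim:                            # keep the single best neighbor (k = 1)
            best_sim, best_val = sim, p[bug_type]
    return best_val

print(fill_missing(profiles, "Dev 2", 0))             # borrow from the most similar developer
```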
  • 24. CosTriage. Obtaining a developer's cost for a new bug report: the new report's bug type is inferred with the topic model (e.g., bug type = 1), and each developer's cost for the new bug is looked up from that type's entry in the (completed) developer profile.
  • 25. Merging. Developer profiles (over bug types) yield a cost score and the bug classifier (SVM) yields an accuracy score; for a new bug report, the two are aggregated to recommend a developer. Hybrid scores = accuracy scores [Anvik06] combined with cost scores (CosTriage). (An aggregation sketch follows.)
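A minimal sketch of the aggregation step, assuming a simple weighted combination of normalized accuracy scores and cost scores in which a shorter expected fix time yields a higher cost score. The weight `alpha` and the min-max normalization are illustrative assumptions; the paper's exact aggregation function is not reproduced here.

```python
# Hybrid-score sketch: merge per-developer accuracy scores with fix-time-based cost scores.
import numpy as np

def recommend(accuracy_scores, fix_times, alpha=0.7):
    """accuracy_scores, fix_times: per-developer arrays for the new bug."""
    acc = (accuracy_scores - accuracy_scores.min()) / np.ptp(accuracy_scores)
    # Shorter expected fix time -> higher cost score.
    cost = 1.0 - (fix_times - fix_times.min()) / np.ptp(fix_times)
    hybrid = alpha * acc + (1 - alpha) * cost
    return int(np.argmax(hybrid)), hybrid

devs = ["Dev 1", "Dev 2", "Dev 3"]
best, scores = recommend(np.array([2.1, 1.4, 0.3]), np.array([5.0, 2.0, 20.0]))
print(devs[best], scores)
```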
  • 26. Experiments. Subject systems: 97,910 valid bug reports and 255 active developers from four open source projects. Approaches compared: PureCBR (the state-of-the-art CBR-based approach), CBCF (the original CBCF), and CosTriage (our approach).
  • 27. Experiments. Two research questions. Q1: How much can our approach improve cost (bug fix time) without sacrificing bug assignment accuracy? Q2: What are the trade-offs between accuracy and cost (bug fix time)? Evaluation measures: accuracy is computed from W, the set of bug reports predicted correctly, and N, the number of bug reports in the test set. Because the real fix time of a misassigned bug is unknown, the cost measure uses only the fix times of correctly matched bugs.
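Written out from the definitions above, the accuracy measure is simply the fraction of correctly assigned reports (the exact form of the cost measure is not spelled out on this slide, so it is not reconstructed here):

```latex
\mathrm{Accuracy} = \frac{|W|}{N}
```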
  • 28. Experiments. [Chart: relative errors of expected bug fix time.] Improvement of bug fix time (Q1): CosTriage reduces cost by up to 30% without seriously compromising accuracy.
  • 29. Experiments. Trade-off between accuracy and bug fix time (Q2). [Chart: accuracy vs. bug fix time trade-off.]
  • 30. Conclusion. We proposed a new bug triaging technique that optimizes not only accuracy but also cost, and solves the data sparseness problem with topic modeling. Experiments on four real bug report corpora show that it improves cost without a heavy loss of accuracy.
  • 31. Q&A. Thank you! Do you have any questions?
  • 32. Contact. Jin-woo Park, jwpark85@postech.ac.kr
  • 33. Backup. We adopt the divergence measure proposed in [Arun10] to find the natural number of topics (the number of bug types t). [Charts: divergence vs. number of topics for Mozilla and Apache.]
  • 34. Backup: bug features. Bug features are keywords from the title and description, plus other metadata. Example: a new bug report with title "Traditional Memory Rendering refactoring request" and description "Request additional refactoring so we can … Traditional Rendering." After stopword removal, the remaining keywords are: Traditional, Memory, Rendering, refactoring, request, Request, refactoring, …, Traditional, Rendering. (A keyword-extraction sketch follows.)
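A minimal sketch of this feature-extraction step: tokenize the title and description, drop stopwords, and count the remaining keywords. The stopword list and tokenizer here are illustrative.

```python
# Keyword-count feature extraction sketch for a bug report.
from collections import Counter
import re

STOPWORDS = {"so", "we", "can", "additional", "a", "the"}   # illustrative stopword list

title = "Traditional Memory Rendering refactoring request"
description = "Request additional refactoring so we can ... Traditional Rendering."

tokens = re.findall(r"[A-Za-z]+", f"{title} {description}")
keywords = [t for t in tokens if t.lower() not in STOPWORDS]
print(Counter(w.lower() for w in keywords))       # feature vector F(B): keyword counts
```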