Your SlideShare is downloading. ×
CosTriage: A Cost-Aware Algorithm for Bug Reporting Systems (AAAI 2011)
Upcoming SlideShare
Loading in...5
×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

×

Introducing the official SlideShare app

Stunning, full-screen experience for iPhone and Android

Text the download link to your phone

Standard text messaging rates apply

CosTriage: A Cost-Aware Algorithm for Bug Reporting Systems (AAAI 2011)

655
views

Published on

Published in: Technology, Education

0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total Views
655
On Slideshare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
0
Comments
0
Likes
0
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
No notes for slide

Transcript

  • 1. AAAI 2011 CosTriage: A Cost-Aware Algorithm forInformation & Database Systems Lab Bug Reporting Systems Jin-woo Park1, Mu-Woong Lee1, Jinhan Kim1, Seung-won Hwang1, Sunghun Kim2 POSTECH, Korea, Republic of1 HKUST, Hong Kong2
  • 2. Bug reporting systems  Bugs!!  More than 300 bug reports per day in Mozilla (a big software project)Information & Database Systems Lab  Bug Solving  One of the important issues in a software development process  Bug reports are posted, discussed, and assigned to developers  Open sources projects  Apache  Eclipse  Linux kernel  Mozilla
  • 3. Bug reporting systems  Bug reports  Has Bug ID, title, description, status, and other meta data  Assigned to developers  Fixed by developersInformation & Database Systems Lab  Challenges  Bug triage  Duplicate bug detection  … bug ID title (summary) status bug fix history (time) other data description
  • 4. Bug reporting systems  Bug Triage  Assigning a new bug report to a suitable developer  Bottleneck of bug fixing process  Labor intensiveInformation & Database Systems Lab  Miss-assignment can lead to slow bug fix Bottleneck of the bug fixing process assign Open Source Bug Project Reports Triager Developers Can be automated!!!
  • 5. Preliminary (recommendation)  Recommender algorithms  Content-based recommendation (CBR)  Predicting user’s interests based on item features  Machine learning methodsInformation & Database Systems Lab  Over-specialization problem  Collaborative filtering recommendation (CF)  Predicting user’s interests based on affinity’s interests for items  User neighborhood  Sparsity problem  Hybrid recommendation  Content-boosted collaborative filtering (CBCF)  Combining an existing CBR with a CF  Better performance than either approach alone
  • 6. Preliminary (recommendation)  Recommender algorithms  Content-based recommendation (CBR)  Predict user’s interests based on item features  Title, Status, Description, …Information & Database Systems Lab  Learn features using machine learning methods  Recommend similar items  Over-specialization problem Developers Bug Report 4 Title Bug 1 Bug 2 Bug 3 Bug 4 Dev 1 10 9 1 ? Feature word1 word2 word3 Dev 2 6 5 10 ? Count 1 1 4 Dev 3 8 7 7 ? Rating score table
  • 7. Preliminary (recommendation)  Recommender algorithms  Content-based recommendation (CBR)  Predict user’s interests based on item features  Title, Status, Description, …Information & Database Systems Lab  Learn features using machine learning methods  Recommend similar items  Over-specialization problem Developers Bug Report 4 Title Bug 1 Bug 2 Bug 3 Bug 4 Dev 1 10 9 1 9 Feature word1 word2 word3 Dev 2 6 5 10 5 Count 1 1 4 Dev 3 8 7 7 7 Rating score table
  • 8. Preliminary (recommendation)  Recommender algorithms  Collaborative filtering recommendation (CF)  Predicting user’s interests based on affinity’s interests for items  User neighborhoodInformation & Database Systems Lab  Sparsity problem Bug Reports Bug 1 Bug 2 Bug 3 Developer 1 10 5 10 Developer 2 5 10 6 Developer 3 7 7 7 Developer 4 9 5 ?
  • 9. Preliminary (recommendation)  Recommender algorithms  Collaborative filtering recommendation (CF)  Predicting user’s interests based on affinity’s interests for items  User neighborhoodInformation & Database Systems Lab  Sparsity problem Bug Reports Bug 1 Bug 2 Bug 3 Developer 1 10 5 10 Developer 2 5 10 6 Developer 3 7 7 7 Developer 4 9 5 ?
  • 10. Preliminary (recommendation)  Recommender algorithms  Collaborative filtering recommendation (CF)  Predicting user’s interests based on affinity’s interests for items  User neighborhoodInformation & Database Systems Lab  Sparsity problem Bug Reports Bug 1 Bug 2 Bug 3 Developer 1 10 5 10 Developer 2 5 10 6 Developer 3 7 7 7 Developer 4 9 5 9
  • 11. Preliminary (recommendation)  Recommender algorithms  Collaborative filtering recommendation (CF)  Predicting user’s interests based on affinity’s interests for items  User neighborhoodInformation & Database Systems Lab  Sparsity problem Unsuitable for triaging because: Bug  No one solved new bug!! Report 4 Bug 1 Bug 2 Bug 3 Bug 4 Developer 1 10 5 10 ? Developer 2 5 10 6 ? Developer 3 7 7 7 ? Developer 4 9 5 9 ?
  • 12. Preliminary (recommendation)  Recommender algorithms  Collaborative filtering recommendation (CF)  Predicting user’s interests based on affinity’s interests for items  User neighborhoodInformation & Database Systems Lab  Sparsity problem Unsuitable for triaging because: Bug  Rating is extremely sparse!! Report 4 Bug 1 Bug 2 Bug 3 Bug 4 Developer 1 10 5 ? ? Developer 2 ? ? ? ? Developer 3 ? ? 10 ? Developer 4 ? ? ? 3
  • 13. Preliminary (recommendation)  Recommender algorithms  Hybrid recommendation  Content-boosted collaborative filtering (CBCF)  Combining an existing CBR with a CFInformation & Database Systems Lab  Better performance than either approach alone Two Phases Bug - CBR phase Feature word1 word2 word3 Report 5 - CF Phase Count 3 2 4 Bug 1 Bug 2 Bug 3 Bug 4 Bug 5 Dev 1 10 ? ? ? ? Dev 2 ? 8 3 ? ? Dev 3 ? ? ? 7 ?
  • 14. Preliminary (recommendation)  Recommender algorithms  Hybrid recommendation  Content-boosted collaborative filtering (CBCF)  Combining an existing CBR with a CFInformation & Database Systems Lab  Better performance than either approach alone CBR phase Bug Report 5 Bug 1 Bug 2 Bug 3 Bug 4 Bug 5 Dev 1 10 10 10 10 10 Dev 2 3 8 3 8 3 Dev 3 7 7 7 7 7
  • 15. Preliminary (recommendation)  Recommender algorithms  Hybrid recommendation  Content-boosted collaborative filtering (CBCF)  Combining an existing CBR with a CFInformation & Database Systems Lab  Improving CBR/CF by combining the complementary strength of CBR and CF CF phase Bug Report 5 Bug 1 Bug 2 Bug 3 Bug 4 Bug 5 Dev 1 10 9 7 9 8 Dev 2 5 8 3 8 5 Dev 3 7 8 5 7 6 Existing recommendation approaches are not suitable!
  • 16. Preliminary (Bug triage)  PureCBR [Anvik06]  Construct multi-class classifier using a SVM classifier  Bug reports B are converted into pair <F(B), D> for training  Bug Report History  Input DataInformation & Database Systems Lab  Developers  Classes F(B) is the feature vector indicating the counts of the keyword w of description of B D is the developer who fixed B (class) Bug Fix History training Assign a new bug to Dev 1 New Bug Report Classifier’s scores
  • 17. Preliminary (Bug triage)  PureCBR [Anvik06]  Good performances  Problem  This approach only considers accuracyInformation & Database Systems Lab  Many bugs  One super-developer  Over-specialization problem  Are developer happy?  We consider developer’s cost (e.g., interests, bug fix time, and expertise)  We assume that faster bug fixing time has higher developer cost Bug Bug Report 1 Report 2 50 days 2 days
  • 18. Goal  Goal  Find efficient bug-developer matching  Optimizing not only accuracy but also cost  Use modified CBCF approachInformation & Database Systems Lab  Constructing developer profiles for cost  Challenge  Enhancing CBCF approach for sparse data  Extreme sparseness of the past bug fix history data  A bug fixed by a developer  Need to reduce sparseness for enhancing quality of CBCF Bug fix time from bug fix history
  • 19. Overview Cost Developer profiles Cost scoreInformation & Database Systems Lab Recommended New Bug Aggregation Developer Report Accuracy Bug classifier Accuracy score <SVM>  Merging classifier’s scores and developer’s cost scores.  The accuracy scores are obtained using PureCBR [Anvik06]  The developer cost scores are obtained from “de-sparsified” bug fix history.  Two scores are then merged for prediction Assign a new bug to Dev 1 + = Accuracy scores Cost scores Hybrid scores
  • 20. CosTriage (Cost Estimation)  CosTriage: A Cost-aware Triage Algorithm for bug reporting system  Challenge to estimate the developer cost?  How to reduce the sparseness problem?Information & Database Systems Lab  Using a Topic Modeling Bug fix time from bug fix history Categorization bugs to reduce the sparseness
  • 21. CosTriage  Categorizing bugs  Topic Modeling  Latent Dirichlet Allocation (LDA) [BleiNg03]  Each topic is represented as a bug typeInformation & Database Systems Lab  The topic distribution of reports determine bug types  We adopt the divergence measure proposed in [Arun, R. PAKDD ‘10]  Finding the natural number of topics (# bug types) t is the natural number of bug types
  • 22. CosTriage  Developer profiles modeling  Developer Profiles  N-dimensional feature vector  The element of developer profiles, Pu[i], denotes the developer costInformation & Database Systems Lab for ith-type bugs T denotes the number of bug types  Developer Cost  The average time to fix ith type bugs
  • 23. CosTriage  Predicting missing values in profiles  Using CF for developer profiles  Similarity measure:Information & Database Systems Lab k=1
  • 24. CosTriage  Obtaining developer’s cost for a new bug report Bug type = 1Information & Database Systems Lab New Bug Report Developer cost for a new bug
  • 25. Merging Cost Developer profiles Cost score <Bug types>Information & Database Systems Lab Recommended Bug Aggregation Reports Developer Accuracy Bug classifier <SVM> Accuracy score  Merging classifier’s scores and developer’s cost scores. + = Accuracy scores Cost scores (CosTriage) Hybrid scores [Anvik06]
  • 26. Experiments  Subject Systems  97,910 valid bug reports  255 active developers  From four open source projectsInformation & Database Systems Lab  Approaches  PureCBR: State of the art CBR-based approach  CBCF: Original CBCF  CosTraige: Our approach
  • 27. Experiments  Two research questions Q1. How much can our approach improve cost (bug fix time) without sacrificing bug assignment accuracy? Q2. What are the trade-offs between accuracy and cost (bug fixInformation & Database Systems Lab time)?  Evaluation measures W is the set of bug reports predicted correctly. N is the number of bug reports in the test set.  The real fix time is unknown, we only use the fix time for correctly matched bugs.
  • 28. Experiments  Relative errors of expected bug fix timeInformation & Database Systems Lab  Improvement of bug fix time (Q1)  CosTriage improves the costs efficiently up to 30% without seriously compromising accuracy
  • 29. Experiments  Trade-off between accuracy and bug fix time (Q2)Information & Database Systems Lab
  • 30. Conclusion  We proposed a new bug triaging technique  Optimize not only accuracy but also cost  Solve data sparseness problem by using topic modelingInformation & Database Systems Lab  Experiments using four real bug report corpora  Improve the cost without heavy losses of accuracy
  • 31. Q&AInformation & Database Systems Lab Thank you! Do you have any questions?
  • 32. ContactInformation & Database Systems Lab Jin-woo Park jwpark85@postech.ac.kr
  • 33. Back up  We adopt the divergence measure proposed in [Arun10]  Finding the natural number of topics (# bug types)Information & Database Systems Lab t is the natural number of bug types Mozilla Apache
  • 34. Back up - Bug features  Bug features  Keywords of title and description  Other meta dataInformation & Database Systems Lab Title : Traditional Memory Rendering refactoring request New Bug Description : Request additional refactoring so we can … Report Traditional Rendering. Remove stopwords Traditional Memory Rendering refactoring request Request refactoring … Traditional Rendering