CosTriage: A Cost-Aware Algorithm for Bug Reporting Systems (AAAI 2011)

Sung Kim
Sung KimAssociate Prof.
AAAI 2011

                                        CosTriage: A Cost-Aware Algorithm for
Information & Database Systems Lab




                                        Bug Reporting Systems
                                     Jin-woo Park1, Mu-Woong Lee1, Jinhan Kim1, Seung-won Hwang1, Sunghun Kim2
                                                           POSTECH, Korea, Republic of1
                                                                HKUST, Hong Kong2
Bug reporting systems
                                      Bugs!!
                                        More than 300 bug reports per day in Mozilla (a big software
                                         project)
Information & Database Systems Lab




                                      Bug Solving
                                        One of the important issues in a software development process
                                        Bug reports are posted, discussed, and assigned to developers


                                      Open sources projects
                                          Apache
                                          Eclipse
                                          Linux kernel
                                          Mozilla
Bug reporting systems
                                      Bug reports
                                         Has Bug ID, title, description, status, and other meta data
                                         Assigned to developers
                                         Fixed by developers
Information & Database Systems Lab




                                      Challenges
                                         Bug triage
                                         Duplicate bug detection
                                         …
                                         bug ID
                                                                                                        title (summary)
                                         status
                                                                                                   bug fix history (time)

                                      other data



                                      description
Bug reporting systems
                                      Bug Triage
                                        Assigning a new bug report to a suitable developer
                                        Bottleneck of bug fixing process
                                           Labor intensive
Information & Database Systems Lab




                                           Miss-assignment can lead to slow bug fix
                                                                                          Bottleneck of
                                                                                          the bug fixing
                                                                                             process




                                                                                           assign
                                           Open
                                          Source             Bug
                                          Project           Reports
                                                                                Triager

                                                                                                           Developers

                                                                                               Can be
                                                                                            automated!!!
Preliminary (recommendation)
                                      Recommender algorithms
                                        Content-based recommendation (CBR)
                                           Predicting user’s interests based on item features
                                           Machine learning methods
Information & Database Systems Lab




                                           Over-specialization problem


                                        Collaborative filtering recommendation (CF)
                                           Predicting user’s interests based on affinity’s interests for items
                                           User neighborhood
                                           Sparsity problem


                                        Hybrid recommendation
                                           Content-boosted collaborative filtering (CBCF)
                                           Combining an existing CBR with a CF
                                           Better performance than either approach alone
Preliminary (recommendation)
                                      Recommender algorithms
                                         Content-based recommendation (CBR)
                                                Predict user’s interests based on item features
                                                   Title, Status, Description, …
Information & Database Systems Lab




                                                Learn features using machine learning methods
                                                Recommend similar items
                                                Over-specialization problem



                                                                                                    Developers
                                                      Bug
                                                    Report 4                        Title   Bug 1      Bug 2     Bug 3   Bug 4

                                                                                    Dev 1    10          9         1      ?

                                     Feature      word1    word2     word3          Dev 2     6          5        10      ?

                                      Count        1           1      4             Dev 3     8          7         7      ?


                                                                                             Rating score table
Preliminary (recommendation)
                                      Recommender algorithms
                                         Content-based recommendation (CBR)
                                                Predict user’s interests based on item features
                                                   Title, Status, Description, …
Information & Database Systems Lab




                                                Learn features using machine learning methods
                                                Recommend similar items
                                                Over-specialization problem



                                                                                                    Developers
                                                      Bug
                                                    Report 4                        Title   Bug 1      Bug 2     Bug 3   Bug 4

                                                                                    Dev 1    10          9         1       9

                                     Feature      word1    word2     word3          Dev 2     6          5        10       5

                                      Count        1           1      4             Dev 3     8          7         7       7


                                                                                             Rating score table
Preliminary (recommendation)
                                      Recommender algorithms
                                        Collaborative filtering recommendation (CF)
                                           Predicting user’s interests based on affinity’s interests for items
                                           User neighborhood
Information & Database Systems Lab




                                           Sparsity problem



                                                                           Bug
                                                                          Reports


                                                                  Bug 1             Bug 2      Bug 3
                                                Developer 1         10               5           10
                                                Developer 2          5               10           6
                                                Developer 3          7               7            7
                                                Developer 4          9               5            ?
Preliminary (recommendation)
                                      Recommender algorithms
                                        Collaborative filtering recommendation (CF)
                                           Predicting user’s interests based on affinity’s interests for items
                                           User neighborhood
Information & Database Systems Lab




                                           Sparsity problem



                                                                           Bug
                                                                          Reports


                                                                  Bug 1             Bug 2      Bug 3
                                                Developer 1         10               5           10
                                                Developer 2          5               10           6
                                                Developer 3          7               7            7
                                                Developer 4          9               5            ?
Preliminary (recommendation)
                                      Recommender algorithms
                                        Collaborative filtering recommendation (CF)
                                           Predicting user’s interests based on affinity’s interests for items
                                           User neighborhood
Information & Database Systems Lab




                                           Sparsity problem



                                                                           Bug
                                                                          Reports


                                                                  Bug 1             Bug 2      Bug 3
                                                Developer 1         10               5           10
                                                Developer 2          5               10           6
                                                Developer 3          7               7            7
                                                Developer 4          9               5            9
Preliminary (recommendation)
                                      Recommender algorithms
                                        Collaborative filtering recommendation (CF)
                                            Predicting user’s interests based on affinity’s interests for items
                                            User neighborhood
Information & Database Systems Lab




                                            Sparsity problem


                                     Unsuitable for triaging because:
                                                                           Bug
                                      No one solved new bug!!
                                                                         Report 4


                                                          Bug 1          Bug 2          Bug 3          Bug 4
                                        Developer 1         10             5              10             ?
                                        Developer 2          5             10             6              ?
                                        Developer 3          7             7              7              ?
                                        Developer 4          9             5              9              ?
Preliminary (recommendation)
                                      Recommender algorithms
                                        Collaborative filtering recommendation (CF)
                                            Predicting user’s interests based on affinity’s interests for items
                                            User neighborhood
Information & Database Systems Lab




                                            Sparsity problem


                                     Unsuitable for triaging because:
                                                                           Bug
                                      Rating is extremely sparse!!
                                                                         Report 4


                                                          Bug 1          Bug 2          Bug 3          Bug 4
                                        Developer 1         10             5              ?              ?
                                        Developer 2          ?             ?              ?              ?
                                        Developer 3          ?             ?              10             ?
                                        Developer 4          ?             ?              ?              3
Preliminary (recommendation)
                                      Recommender algorithms
                                        Hybrid recommendation
                                          Content-boosted collaborative filtering (CBCF)
                                          Combining an existing CBR with a CF
Information & Database Systems Lab




                                          Better performance than either approach alone


                                           Two Phases
                                                                         Bug
                                           - CBR phase                                Feature     word1   word2   word3
                                                                       Report 5
                                           - CF Phase                                 Count        3        2      4



                                                     Bug 1      Bug 2         Bug 3      Bug 4            Bug 5
                                           Dev 1       10          ?              ?           ?            ?
                                           Dev 2         ?         8              3           ?            ?
                                           Dev 3         ?         ?              ?           7            ?
Preliminary (recommendation)
                                      Recommender algorithms
                                        Hybrid recommendation
                                          Content-boosted collaborative filtering (CBCF)
                                          Combining an existing CBR with a CF
Information & Database Systems Lab




                                          Better performance than either approach alone


                                           CBR phase
                                                                         Bug
                                                                       Report 5



                                                     Bug 1      Bug 2         Bug 3    Bug 4   Bug 5
                                           Dev 1       10         10              10    10      10
                                           Dev 2        3          8              3     8       3
                                           Dev 3        7          7              7     7       7
Preliminary (recommendation)
                                      Recommender algorithms
                                        Hybrid recommendation
                                          Content-boosted collaborative filtering (CBCF)
                                          Combining an existing CBR with a CF
Information & Database Systems Lab




                                          Improving CBR/CF by combining the complementary strength of CBR
                                           and CF

                                           CF phase
                                                                       Bug
                                                                     Report 5



                                                      Bug 1    Bug 2        Bug 3   Bug 4      Bug 5
                                           Dev 1       10        9              7     9          8
                                           Dev 2       5         8              3     8          5
                                           Dev 3       7         8              5     7          6

                                        Existing recommendation approaches are not suitable!
Preliminary (Bug triage)
                                      PureCBR [Anvik06]
                                        Construct multi-class classifier using a SVM classifier
                                        Bug reports B are converted into pair <F(B), D> for training
                                            Bug Report History  Input Data
Information & Database Systems Lab




                                            Developers  Classes

                                         F(B) is the feature vector indicating the counts of the keyword w of description of B
                                         D is the developer who fixed B (class)



                                                            Bug Fix
                                                            History

                                                                                           training
                                                                                                        Assign a new bug to Dev 1

                                      New Bug
                                       Report
                                                                                                            Classifier’s scores
Preliminary (Bug triage)
                                      PureCBR [Anvik06]
                                        Good performances
                                        Problem
                                          This approach only considers accuracy
Information & Database Systems Lab




                                             Many bugs  One super-developer
                                             Over-specialization problem
                                          Are developer happy?
                                            We consider developer’s cost (e.g., interests, bug fix time, and
                                           expertise)
                                            We assume that faster bug fixing time has higher developer cost




                                                                   Bug            Bug
                                                                 Report 1       Report 2


                                                                  50 days         2 days
Goal
                                      Goal
                                        Find efficient bug-developer matching
                                           Optimizing not only accuracy but also cost
                                        Use modified CBCF approach
Information & Database Systems Lab




                                           Constructing developer profiles for cost

                                      Challenge
                                        Enhancing CBCF approach for sparse data
                                        Extreme sparseness of the past bug fix history data
                                           A bug fixed by a developer
                                           Need to reduce sparseness for enhancing quality of CBCF




                                                              Bug fix time from bug fix history
Overview
                                                            Cost

                                                             Developer profiles                   Cost score
Information & Database Systems Lab




                                                                                                                         Recommended
                                        New Bug                                                  Aggregation
                                                                                                                           Developer
                                         Report             Accuracy

                                                               Bug classifier                   Accuracy score
                                                                 <SVM>



                                      Merging classifier’s scores and developer’s cost scores.
                                        The accuracy scores are obtained using PureCBR [Anvik06]
                                        The developer cost scores are obtained from “de-sparsified” bug
                                         fix history.
                                        Two scores are then merged for prediction
                                                                                                                 Assign a new bug to Dev 1

                                                                   +                                       =
                                          Accuracy scores                         Cost scores                           Hybrid scores
CosTriage (Cost Estimation)
                                      CosTriage: A Cost-aware Triage Algorithm for bug reporting
                                       system
                                      Challenge to estimate the developer cost?
                                      How to reduce the sparseness problem?
Information & Database Systems Lab




                                         Using a Topic Modeling




                                                           Bug fix time from bug fix history




                                                     Categorization bugs to reduce the sparseness
CosTriage
                                      Categorizing bugs
                                        Topic Modeling
                                           Latent Dirichlet Allocation (LDA) [BleiNg03]
                                           Each topic is represented as a bug type
Information & Database Systems Lab




                                           The topic distribution of reports determine bug types
                                        We adopt the divergence measure proposed in [Arun, R. PAKDD ‘10]
                                           Finding the natural number of topics (# bug types)


                                            t is the natural number of bug types
CosTriage
                                      Developer profiles modeling
                                        Developer Profiles
                                           N-dimensional feature vector
                                           The element of developer profiles, Pu[i], denotes the developer cost
Information & Database Systems Lab




                                            for ith-type bugs


                                                    T denotes the number of bug types



                                        Developer Cost
                                           The average time to fix ith type bugs
CosTriage
                                      Predicting missing values in profiles
                                        Using CF for developer profiles
                                           Similarity measure:
Information & Database Systems Lab




                                               k=1
CosTriage
                                      Obtaining developer’s cost for a new bug report

                                                 Bug type = 1
Information & Database Systems Lab




                                                 New Bug
                                                  Report




                                                                Developer cost for a new bug
Merging
                                                             Cost

                                                              Developer profiles                  Cost score
                                                                <Bug types>
Information & Database Systems Lab




                                                                                                                     Recommended
                                          Bug                                                    Aggregation
                                        Reports                                                                        Developer
                                                             Accuracy

                                                                Bug classifier
                                                                  <SVM>                         Accuracy score



                                      Merging classifier’s scores and developer’s cost scores.


                                                                    +                                          =

                                           Accuracy scores                       Cost scores (CosTriage)           Hybrid scores
                                              [Anvik06]
Experiments
                                      Subject Systems
                                        97,910 valid bug reports
                                        255 active developers
                                        From four open source projects
Information & Database Systems Lab




                                      Approaches
                                        PureCBR: State of the art CBR-based approach
                                        CBCF: Original CBCF
                                        CosTraige: Our approach
Experiments
                                      Two research questions
                                       Q1. How much can our approach improve cost (bug fix time) without
                                           sacrificing bug assignment accuracy?

                                       Q2. What are the trade-offs between accuracy and cost (bug fix
Information & Database Systems Lab




                                           time)?


                                      Evaluation measures




                                       W is the set of bug reports predicted correctly.
                                       N is the number of bug reports in the test set.

                                        The real fix time is unknown, we only use the fix time for correctly
                                         matched bugs.
Experiments
                                      Relative errors of expected bug fix time
Information & Database Systems Lab




                                      Improvement of bug fix time (Q1)




                                        CosTriage improves the costs efficiently up to 30% without
                                         seriously compromising accuracy
Experiments
                                      Trade-off between accuracy and bug fix time (Q2)
Information & Database Systems Lab
Conclusion
                                      We proposed a new bug triaging technique
                                        Optimize not only accuracy but also cost
                                        Solve data sparseness problem by using topic modeling
Information & Database Systems Lab




                                      Experiments using four real bug report corpora
                                        Improve the cost without heavy losses of accuracy
Q&A
Information & Database Systems Lab




                                                Thank you!


                                           Do you have any questions?
Contact
Information & Database Systems Lab




                                                     Jin-woo Park
                                               jwpark85@postech.ac.kr
Back up
                                      We adopt the divergence measure proposed in [Arun10]
                                        Finding the natural number of topics (# bug types)
Information & Database Systems Lab




                                            t is the natural number of bug types




                                                        Mozilla                         Apache
Back up - Bug features
                                      Bug features
                                        Keywords of title and description
                                        Other meta data
Information & Database Systems Lab




                                                                    Title : Traditional Memory Rendering refactoring request

                                          New Bug                   Description : Request additional refactoring so we can
                                                                                                …
                                           Report                                    Traditional Rendering.



                                                                                                    Remove stopwords



                                                                   Traditional Memory Rendering refactoring request Request
                                                                               refactoring … Traditional Rendering
1 of 34

Recommended

KeepIt Course 4: Putting storage, format management and preservation planning... by
KeepIt Course 4: Putting storage, format management and preservation planning...KeepIt Course 4: Putting storage, format management and preservation planning...
KeepIt Course 4: Putting storage, format management and preservation planning...JISC KeepIt project
449 views22 slides
V Code And V Data Illustrating A New Framework For Supporting The Video Annot... by
V Code And V Data Illustrating A New Framework For Supporting The Video Annot...V Code And V Data Illustrating A New Framework For Supporting The Video Annot...
V Code And V Data Illustrating A New Framework For Supporting The Video Annot...GoogleTecTalks
522 views5 slides
Sahara icsm 2011 by
Sahara icsm 2011Sahara icsm 2011
Sahara icsm 2011rnbach
276 views24 slides
Bug tracking system ppt by
Bug tracking system pptBug tracking system ppt
Bug tracking system pptNeha Kaurav
10.4K views15 slides
A Bug Tracking System Is A Software Application by
A Bug Tracking System Is A Software ApplicationA Bug Tracking System Is A Software Application
A Bug Tracking System Is A Software ApplicationAbhishek Pasricha
16.8K views27 slides
Towards effective bug triage with software by
Towards effective bug triage with softwareTowards effective bug triage with software
Towards effective bug triage with softwareNexgen Technology
2.1K views12 slides

More Related Content

Similar to CosTriage: A Cost-Aware Algorithm for Bug Reporting Systems (AAAI 2011)

Predicting Fault-Prone Files using Machine Learning by
Predicting Fault-Prone Files using Machine LearningPredicting Fault-Prone Files using Machine Learning
Predicting Fault-Prone Files using Machine LearningGuido A. Ciollaro
403 views6 slides
Webinar issues we_find_slideshare by
Webinar issues we_find_slideshareWebinar issues we_find_slideshare
Webinar issues we_find_slideshareSOASTA
351 views27 slides
Bottlenecks exposed web app db servers by
Bottlenecks exposed web app db serversBottlenecks exposed web app db servers
Bottlenecks exposed web app db serversUpender Dravidum
3K views28 slides
Cloud Computing for Developers and Architects - QCon 2008 Tutorial by
Cloud Computing for Developers and Architects - QCon 2008 TutorialCloud Computing for Developers and Architects - QCon 2008 Tutorial
Cloud Computing for Developers and Architects - QCon 2008 TutorialStuart Charlton
4.7K views72 slides
The Cloud: A game changer to test, at scale and in production, SOA based web... by
The Cloud: A game changer to test, at scale and in production,  SOA based web...The Cloud: A game changer to test, at scale and in production,  SOA based web...
The Cloud: A game changer to test, at scale and in production, SOA based web...Fred Beringer
912 views79 slides
2012 06-15-jazoon12-sub138-eranea-large-apps-migration by
2012 06-15-jazoon12-sub138-eranea-large-apps-migration2012 06-15-jazoon12-sub138-eranea-large-apps-migration
2012 06-15-jazoon12-sub138-eranea-large-apps-migrationDidier Durand
6K views32 slides

Similar to CosTriage: A Cost-Aware Algorithm for Bug Reporting Systems (AAAI 2011)(20)

Predicting Fault-Prone Files using Machine Learning by Guido A. Ciollaro
Predicting Fault-Prone Files using Machine LearningPredicting Fault-Prone Files using Machine Learning
Predicting Fault-Prone Files using Machine Learning
Guido A. Ciollaro403 views
Webinar issues we_find_slideshare by SOASTA
Webinar issues we_find_slideshareWebinar issues we_find_slideshare
Webinar issues we_find_slideshare
SOASTA351 views
Cloud Computing for Developers and Architects - QCon 2008 Tutorial by Stuart Charlton
Cloud Computing for Developers and Architects - QCon 2008 TutorialCloud Computing for Developers and Architects - QCon 2008 Tutorial
Cloud Computing for Developers and Architects - QCon 2008 Tutorial
Stuart Charlton4.7K views
The Cloud: A game changer to test, at scale and in production, SOA based web... by Fred Beringer
The Cloud: A game changer to test, at scale and in production,  SOA based web...The Cloud: A game changer to test, at scale and in production,  SOA based web...
The Cloud: A game changer to test, at scale and in production, SOA based web...
Fred Beringer912 views
2012 06-15-jazoon12-sub138-eranea-large-apps-migration by Didier Durand
2012 06-15-jazoon12-sub138-eranea-large-apps-migration2012 06-15-jazoon12-sub138-eranea-large-apps-migration
2012 06-15-jazoon12-sub138-eranea-large-apps-migration
Didier Durand6K views
Testability for developers – Fighting a mess by making it testable by Alexander Tarlinder
Testability for developers – Fighting a mess by making it testableTestability for developers – Fighting a mess by making it testable
Testability for developers – Fighting a mess by making it testable
App Dynamics & SOASTA Testing & Monitoring Converge, March 2012 by SOASTA
App Dynamics & SOASTA Testing & Monitoring Converge, March 2012App Dynamics & SOASTA Testing & Monitoring Converge, March 2012
App Dynamics & SOASTA Testing & Monitoring Converge, March 2012
SOASTA787 views
Changes and Bugs: Mining and Predicting Development Activities by Thomas Zimmermann
Changes and Bugs: Mining and Predicting Development ActivitiesChanges and Bugs: Mining and Predicting Development Activities
Changes and Bugs: Mining and Predicting Development Activities
Thomas Zimmermann4.6K views
Analysis of Testability of a Flight Software Product Line by Dharmalingam Ganesan
Analysis of Testability of a Flight Software Product LineAnalysis of Testability of a Flight Software Product Line
Analysis of Testability of a Flight Software Product Line
WebLogic Diagnostic Framework Dr. Frank Munz / munz & more WLS11g by InSync Conference
WebLogic Diagnostic Framework  Dr. Frank Munz / munz & more WLS11gWebLogic Diagnostic Framework  Dr. Frank Munz / munz & more WLS11g
WebLogic Diagnostic Framework Dr. Frank Munz / munz & more WLS11g
InSync Conference2.4K views
Profiling Multicore Systems to Maximize Core Utilization by mentoresd
Profiling Multicore Systems to Maximize Core Utilization Profiling Multicore Systems to Maximize Core Utilization
Profiling Multicore Systems to Maximize Core Utilization
mentoresd1.3K views
Microsoft HPC User Group by sjwoodman
Microsoft HPC User Group Microsoft HPC User Group
Microsoft HPC User Group
sjwoodman404 views
Jimwebber soa by d0nn9n
Jimwebber soaJimwebber soa
Jimwebber soa
d0nn9n350 views

More from Sung Kim

DeepAM: Migrate APIs with Multi-modal Sequence to Sequence Learning by
DeepAM: Migrate APIs with Multi-modal Sequence to Sequence LearningDeepAM: Migrate APIs with Multi-modal Sequence to Sequence Learning
DeepAM: Migrate APIs with Multi-modal Sequence to Sequence LearningSung Kim
1.3K views23 slides
Deep API Learning (FSE 2016) by
Deep API Learning (FSE 2016)Deep API Learning (FSE 2016)
Deep API Learning (FSE 2016)Sung Kim
1.4K views25 slides
Time series classification by
Time series classificationTime series classification
Time series classificationSung Kim
5.7K views29 slides
Tensor board by
Tensor boardTensor board
Tensor boardSung Kim
8.4K views17 slides
REMI: Defect Prediction for Efficient API Testing (

ESEC/FSE 2015, Industria... by
REMI: Defect Prediction for Efficient API Testing (

ESEC/FSE 2015, Industria...REMI: Defect Prediction for Efficient API Testing (

ESEC/FSE 2015, Industria...
REMI: Defect Prediction for Efficient API Testing (

ESEC/FSE 2015, Industria...Sung Kim
2.5K views16 slides
Heterogeneous Defect Prediction (

ESEC/FSE 2015) by
Heterogeneous Defect Prediction (

ESEC/FSE 2015)Heterogeneous Defect Prediction (

ESEC/FSE 2015)
Heterogeneous Defect Prediction (

ESEC/FSE 2015)Sung Kim
2.2K views28 slides

More from Sung Kim(20)

DeepAM: Migrate APIs with Multi-modal Sequence to Sequence Learning by Sung Kim
DeepAM: Migrate APIs with Multi-modal Sequence to Sequence LearningDeepAM: Migrate APIs with Multi-modal Sequence to Sequence Learning
DeepAM: Migrate APIs with Multi-modal Sequence to Sequence Learning
Sung Kim1.3K views
Deep API Learning (FSE 2016) by Sung Kim
Deep API Learning (FSE 2016)Deep API Learning (FSE 2016)
Deep API Learning (FSE 2016)
Sung Kim1.4K views
Time series classification by Sung Kim
Time series classificationTime series classification
Time series classification
Sung Kim5.7K views
Tensor board by Sung Kim
Tensor boardTensor board
Tensor board
Sung Kim8.4K views
REMI: Defect Prediction for Efficient API Testing (

ESEC/FSE 2015, Industria... by Sung Kim
REMI: Defect Prediction for Efficient API Testing (

ESEC/FSE 2015, Industria...REMI: Defect Prediction for Efficient API Testing (

ESEC/FSE 2015, Industria...
REMI: Defect Prediction for Efficient API Testing (

ESEC/FSE 2015, Industria...
Sung Kim2.5K views
Heterogeneous Defect Prediction (

ESEC/FSE 2015) by Sung Kim
Heterogeneous Defect Prediction (

ESEC/FSE 2015)Heterogeneous Defect Prediction (

ESEC/FSE 2015)
Heterogeneous Defect Prediction (

ESEC/FSE 2015)
Sung Kim2.2K views
A Survey on Automatic Software Evolution Techniques by Sung Kim
A Survey on Automatic Software Evolution TechniquesA Survey on Automatic Software Evolution Techniques
A Survey on Automatic Software Evolution Techniques
Sung Kim1.1K views
Crowd debugging (FSE 2015) by Sung Kim
Crowd debugging (FSE 2015)Crowd debugging (FSE 2015)
Crowd debugging (FSE 2015)
Sung Kim1.9K views
Software Defect Prediction on Unlabeled Datasets by Sung Kim
Software Defect Prediction on Unlabeled DatasetsSoftware Defect Prediction on Unlabeled Datasets
Software Defect Prediction on Unlabeled Datasets
Sung Kim16.7K views
Partitioning Composite Code Changes to Facilitate Code Review (MSR2015) by Sung Kim
Partitioning Composite Code Changes to Facilitate Code Review (MSR2015)Partitioning Composite Code Changes to Facilitate Code Review (MSR2015)
Partitioning Composite Code Changes to Facilitate Code Review (MSR2015)
Sung Kim1.6K views
Automatically Generated Patches as Debugging Aids: A Human Study (FSE 2014) by Sung Kim
Automatically Generated Patches as Debugging Aids: A Human Study (FSE 2014)Automatically Generated Patches as Debugging Aids: A Human Study (FSE 2014)
Automatically Generated Patches as Debugging Aids: A Human Study (FSE 2014)
Sung Kim1.9K views
How We Get There: A Context-Guided Search Strategy in Concolic Testing (FSE 2... by Sung Kim
How We Get There: A Context-Guided Search Strategy in Concolic Testing (FSE 2...How We Get There: A Context-Guided Search Strategy in Concolic Testing (FSE 2...
How We Get There: A Context-Guided Search Strategy in Concolic Testing (FSE 2...
Sung Kim2.2K views
CrashLocator: Locating Crashing Faults Based on Crash Stacks (ISSTA 2014) by Sung Kim
CrashLocator: Locating Crashing Faults Based on Crash Stacks (ISSTA 2014)CrashLocator: Locating Crashing Faults Based on Crash Stacks (ISSTA 2014)
CrashLocator: Locating Crashing Faults Based on Crash Stacks (ISSTA 2014)
Sung Kim6.4K views
Source code comprehension on evolving software by Sung Kim
Source code comprehension on evolving softwareSource code comprehension on evolving software
Source code comprehension on evolving software
Sung Kim1.6K views
A Survey on Dynamic Symbolic Execution for Automatic Test Generation by Sung Kim
A Survey on  Dynamic Symbolic Execution  for Automatic Test GenerationA Survey on  Dynamic Symbolic Execution  for Automatic Test Generation
A Survey on Dynamic Symbolic Execution for Automatic Test Generation
Sung Kim3.1K views
Survey on Software Defect Prediction by Sung Kim
Survey on Software Defect PredictionSurvey on Software Defect Prediction
Survey on Software Defect Prediction
Sung Kim14.1K views
MSR2014 opening by Sung Kim
MSR2014 openingMSR2014 opening
MSR2014 opening
Sung Kim17K views
Personalized Defect Prediction by Sung Kim
Personalized Defect PredictionPersonalized Defect Prediction
Personalized Defect Prediction
Sung Kim3.7K views
STAR: Stack Trace based Automatic Crash Reproduction by Sung Kim
STAR: Stack Trace based Automatic Crash ReproductionSTAR: Stack Trace based Automatic Crash Reproduction
STAR: Stack Trace based Automatic Crash Reproduction
Sung Kim7K views
Transfer defect learning by Sung Kim
Transfer defect learningTransfer defect learning
Transfer defect learning
Sung Kim3.2K views

Recently uploaded

Future of Indian ConsumerTech by
Future of Indian ConsumerTechFuture of Indian ConsumerTech
Future of Indian ConsumerTechKapil Khandelwal (KK)
21 views68 slides
Voice Logger - Telephony Integration Solution at Aegis by
Voice Logger - Telephony Integration Solution at AegisVoice Logger - Telephony Integration Solution at Aegis
Voice Logger - Telephony Integration Solution at AegisNirmal Sharma
39 views1 slide
Igniting Next Level Productivity with AI-Infused Data Integration Workflows by
Igniting Next Level Productivity with AI-Infused Data Integration Workflows Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration Workflows Safe Software
263 views86 slides
Powerful Google developer tools for immediate impact! (2023-24) by
Powerful Google developer tools for immediate impact! (2023-24)Powerful Google developer tools for immediate impact! (2023-24)
Powerful Google developer tools for immediate impact! (2023-24)wesley chun
10 views38 slides
Business Analyst Series 2023 - Week 3 Session 5 by
Business Analyst Series 2023 -  Week 3 Session 5Business Analyst Series 2023 -  Week 3 Session 5
Business Analyst Series 2023 - Week 3 Session 5DianaGray10
248 views20 slides
GDG Cloud Southlake 28 Brad Taylor and Shawn Augenstein Old Problems in the N... by
GDG Cloud Southlake 28 Brad Taylor and Shawn Augenstein Old Problems in the N...GDG Cloud Southlake 28 Brad Taylor and Shawn Augenstein Old Problems in the N...
GDG Cloud Southlake 28 Brad Taylor and Shawn Augenstein Old Problems in the N...James Anderson
85 views32 slides

Recently uploaded(20)

Voice Logger - Telephony Integration Solution at Aegis by Nirmal Sharma
Voice Logger - Telephony Integration Solution at AegisVoice Logger - Telephony Integration Solution at Aegis
Voice Logger - Telephony Integration Solution at Aegis
Nirmal Sharma39 views
Igniting Next Level Productivity with AI-Infused Data Integration Workflows by Safe Software
Igniting Next Level Productivity with AI-Infused Data Integration Workflows Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Safe Software263 views
Powerful Google developer tools for immediate impact! (2023-24) by wesley chun
Powerful Google developer tools for immediate impact! (2023-24)Powerful Google developer tools for immediate impact! (2023-24)
Powerful Google developer tools for immediate impact! (2023-24)
wesley chun10 views
Business Analyst Series 2023 - Week 3 Session 5 by DianaGray10
Business Analyst Series 2023 -  Week 3 Session 5Business Analyst Series 2023 -  Week 3 Session 5
Business Analyst Series 2023 - Week 3 Session 5
DianaGray10248 views
GDG Cloud Southlake 28 Brad Taylor and Shawn Augenstein Old Problems in the N... by James Anderson
GDG Cloud Southlake 28 Brad Taylor and Shawn Augenstein Old Problems in the N...GDG Cloud Southlake 28 Brad Taylor and Shawn Augenstein Old Problems in the N...
GDG Cloud Southlake 28 Brad Taylor and Shawn Augenstein Old Problems in the N...
James Anderson85 views
STKI Israeli Market Study 2023 corrected forecast 2023_24 v3.pdf by Dr. Jimmy Schwarzkopf
STKI Israeli Market Study 2023   corrected forecast 2023_24 v3.pdfSTKI Israeli Market Study 2023   corrected forecast 2023_24 v3.pdf
STKI Israeli Market Study 2023 corrected forecast 2023_24 v3.pdf
"Running students' code in isolation. The hard way", Yurii Holiuk by Fwdays
"Running students' code in isolation. The hard way", Yurii Holiuk "Running students' code in isolation. The hard way", Yurii Holiuk
"Running students' code in isolation. The hard way", Yurii Holiuk
Fwdays11 views
Automating a World-Class Technology Conference; Behind the Scenes of CiscoLive by Network Automation Forum
Automating a World-Class Technology Conference; Behind the Scenes of CiscoLiveAutomating a World-Class Technology Conference; Behind the Scenes of CiscoLive
Automating a World-Class Technology Conference; Behind the Scenes of CiscoLive
6g - REPORT.pdf by Liveplex
6g - REPORT.pdf6g - REPORT.pdf
6g - REPORT.pdf
Liveplex10 views
Piloting & Scaling Successfully With Microsoft Viva by Richard Harbridge
Piloting & Scaling Successfully With Microsoft VivaPiloting & Scaling Successfully With Microsoft Viva
Piloting & Scaling Successfully With Microsoft Viva
SAP Automation Using Bar Code and FIORI.pdf by Virendra Rai, PMP
SAP Automation Using Bar Code and FIORI.pdfSAP Automation Using Bar Code and FIORI.pdf
SAP Automation Using Bar Code and FIORI.pdf
Special_edition_innovator_2023.pdf by WillDavies22
Special_edition_innovator_2023.pdfSpecial_edition_innovator_2023.pdf
Special_edition_innovator_2023.pdf
WillDavies2217 views
PharoJS - Zürich Smalltalk Group Meetup November 2023 by Noury Bouraqadi
PharoJS - Zürich Smalltalk Group Meetup November 2023PharoJS - Zürich Smalltalk Group Meetup November 2023
PharoJS - Zürich Smalltalk Group Meetup November 2023
Noury Bouraqadi127 views

CosTriage: A Cost-Aware Algorithm for Bug Reporting Systems (AAAI 2011)

  • 1. AAAI 2011 CosTriage: A Cost-Aware Algorithm for Information & Database Systems Lab Bug Reporting Systems Jin-woo Park1, Mu-Woong Lee1, Jinhan Kim1, Seung-won Hwang1, Sunghun Kim2 POSTECH, Korea, Republic of1 HKUST, Hong Kong2
  • 2. Bug reporting systems  Bugs!!  More than 300 bug reports per day in Mozilla (a big software project) Information & Database Systems Lab  Bug Solving  One of the important issues in a software development process  Bug reports are posted, discussed, and assigned to developers  Open sources projects  Apache  Eclipse  Linux kernel  Mozilla
  • 3. Bug reporting systems  Bug reports  Has Bug ID, title, description, status, and other meta data  Assigned to developers  Fixed by developers Information & Database Systems Lab  Challenges  Bug triage  Duplicate bug detection  … bug ID title (summary) status bug fix history (time) other data description
  • 4. Bug reporting systems  Bug Triage  Assigning a new bug report to a suitable developer  Bottleneck of bug fixing process  Labor intensive Information & Database Systems Lab  Miss-assignment can lead to slow bug fix Bottleneck of the bug fixing process assign Open Source Bug Project Reports Triager Developers Can be automated!!!
  • 5. Preliminary (recommendation)  Recommender algorithms  Content-based recommendation (CBR)  Predicting user’s interests based on item features  Machine learning methods Information & Database Systems Lab  Over-specialization problem  Collaborative filtering recommendation (CF)  Predicting user’s interests based on affinity’s interests for items  User neighborhood  Sparsity problem  Hybrid recommendation  Content-boosted collaborative filtering (CBCF)  Combining an existing CBR with a CF  Better performance than either approach alone
  • 6. Preliminary (recommendation)  Recommender algorithms  Content-based recommendation (CBR)  Predict user’s interests based on item features  Title, Status, Description, … Information & Database Systems Lab  Learn features using machine learning methods  Recommend similar items  Over-specialization problem Developers Bug Report 4 Title Bug 1 Bug 2 Bug 3 Bug 4 Dev 1 10 9 1 ? Feature word1 word2 word3 Dev 2 6 5 10 ? Count 1 1 4 Dev 3 8 7 7 ? Rating score table
  • 7. Preliminary (recommendation)  Recommender algorithms  Content-based recommendation (CBR)  Predict user’s interests based on item features  Title, Status, Description, … Information & Database Systems Lab  Learn features using machine learning methods  Recommend similar items  Over-specialization problem Developers Bug Report 4 Title Bug 1 Bug 2 Bug 3 Bug 4 Dev 1 10 9 1 9 Feature word1 word2 word3 Dev 2 6 5 10 5 Count 1 1 4 Dev 3 8 7 7 7 Rating score table
  • 8. Preliminary (recommendation)  Recommender algorithms  Collaborative filtering recommendation (CF)  Predicting user’s interests based on affinity’s interests for items  User neighborhood Information & Database Systems Lab  Sparsity problem Bug Reports Bug 1 Bug 2 Bug 3 Developer 1 10 5 10 Developer 2 5 10 6 Developer 3 7 7 7 Developer 4 9 5 ?
  • 9. Preliminary (recommendation)  Recommender algorithms  Collaborative filtering recommendation (CF)  Predicting user’s interests based on affinity’s interests for items  User neighborhood Information & Database Systems Lab  Sparsity problem Bug Reports Bug 1 Bug 2 Bug 3 Developer 1 10 5 10 Developer 2 5 10 6 Developer 3 7 7 7 Developer 4 9 5 ?
  • 10. Preliminary (recommendation)  Recommender algorithms  Collaborative filtering recommendation (CF)  Predicting user’s interests based on affinity’s interests for items  User neighborhood Information & Database Systems Lab  Sparsity problem Bug Reports Bug 1 Bug 2 Bug 3 Developer 1 10 5 10 Developer 2 5 10 6 Developer 3 7 7 7 Developer 4 9 5 9
  • 11. Preliminary (recommendation)  Recommender algorithms  Collaborative filtering recommendation (CF)  Predicting user’s interests based on affinity’s interests for items  User neighborhood Information & Database Systems Lab  Sparsity problem Unsuitable for triaging because: Bug  No one solved new bug!! Report 4 Bug 1 Bug 2 Bug 3 Bug 4 Developer 1 10 5 10 ? Developer 2 5 10 6 ? Developer 3 7 7 7 ? Developer 4 9 5 9 ?
  • 12. Preliminary (recommendation)  Recommender algorithms  Collaborative filtering recommendation (CF)  Predicting user’s interests based on affinity’s interests for items  User neighborhood Information & Database Systems Lab  Sparsity problem Unsuitable for triaging because: Bug  Rating is extremely sparse!! Report 4 Bug 1 Bug 2 Bug 3 Bug 4 Developer 1 10 5 ? ? Developer 2 ? ? ? ? Developer 3 ? ? 10 ? Developer 4 ? ? ? 3
  • 13. Preliminary (recommendation)  Recommender algorithms  Hybrid recommendation  Content-boosted collaborative filtering (CBCF)  Combining an existing CBR with a CF Information & Database Systems Lab  Better performance than either approach alone Two Phases Bug - CBR phase Feature word1 word2 word3 Report 5 - CF Phase Count 3 2 4 Bug 1 Bug 2 Bug 3 Bug 4 Bug 5 Dev 1 10 ? ? ? ? Dev 2 ? 8 3 ? ? Dev 3 ? ? ? 7 ?
  • 14. Preliminary (recommendation)  Recommender algorithms  Hybrid recommendation  Content-boosted collaborative filtering (CBCF)  Combining an existing CBR with a CF Information & Database Systems Lab  Better performance than either approach alone CBR phase Bug Report 5 Bug 1 Bug 2 Bug 3 Bug 4 Bug 5 Dev 1 10 10 10 10 10 Dev 2 3 8 3 8 3 Dev 3 7 7 7 7 7
  • 15. Preliminary (recommendation)  Recommender algorithms  Hybrid recommendation  Content-boosted collaborative filtering (CBCF)  Combining an existing CBR with a CF Information & Database Systems Lab  Improving CBR/CF by combining the complementary strength of CBR and CF CF phase Bug Report 5 Bug 1 Bug 2 Bug 3 Bug 4 Bug 5 Dev 1 10 9 7 9 8 Dev 2 5 8 3 8 5 Dev 3 7 8 5 7 6 Existing recommendation approaches are not suitable!
  • 16. Preliminary (Bug triage)  PureCBR [Anvik06]  Construct multi-class classifier using a SVM classifier  Bug reports B are converted into pair <F(B), D> for training  Bug Report History  Input Data Information & Database Systems Lab  Developers  Classes F(B) is the feature vector indicating the counts of the keyword w of description of B D is the developer who fixed B (class) Bug Fix History training Assign a new bug to Dev 1 New Bug Report Classifier’s scores
  • 17. Preliminary (Bug triage)  PureCBR [Anvik06]  Good performances  Problem  This approach only considers accuracy Information & Database Systems Lab  Many bugs  One super-developer  Over-specialization problem  Are developer happy?  We consider developer’s cost (e.g., interests, bug fix time, and expertise)  We assume that faster bug fixing time has higher developer cost Bug Bug Report 1 Report 2 50 days 2 days
  • 18. Goal  Goal  Find efficient bug-developer matching  Optimizing not only accuracy but also cost  Use modified CBCF approach Information & Database Systems Lab  Constructing developer profiles for cost  Challenge  Enhancing CBCF approach for sparse data  Extreme sparseness of the past bug fix history data  A bug fixed by a developer  Need to reduce sparseness for enhancing quality of CBCF Bug fix time from bug fix history
  • 19. Overview Cost Developer profiles Cost score Information & Database Systems Lab Recommended New Bug Aggregation Developer Report Accuracy Bug classifier Accuracy score <SVM>  Merging classifier’s scores and developer’s cost scores.  The accuracy scores are obtained using PureCBR [Anvik06]  The developer cost scores are obtained from “de-sparsified” bug fix history.  Two scores are then merged for prediction Assign a new bug to Dev 1 + = Accuracy scores Cost scores Hybrid scores
  • 20. CosTriage (Cost Estimation)  CosTriage: A Cost-aware Triage Algorithm for bug reporting system  Challenge to estimate the developer cost?  How to reduce the sparseness problem? Information & Database Systems Lab  Using a Topic Modeling Bug fix time from bug fix history Categorization bugs to reduce the sparseness
  • 21. CosTriage  Categorizing bugs  Topic Modeling  Latent Dirichlet Allocation (LDA) [BleiNg03]  Each topic is represented as a bug type Information & Database Systems Lab  The topic distribution of reports determine bug types  We adopt the divergence measure proposed in [Arun, R. PAKDD ‘10]  Finding the natural number of topics (# bug types) t is the natural number of bug types
  • 22. CosTriage  Developer profiles modeling  Developer Profiles  N-dimensional feature vector  The element of developer profiles, Pu[i], denotes the developer cost Information & Database Systems Lab for ith-type bugs T denotes the number of bug types  Developer Cost  The average time to fix ith type bugs
  • 23. CosTriage  Predicting missing values in profiles  Using CF for developer profiles  Similarity measure: Information & Database Systems Lab k=1
  • 24. CosTriage  Obtaining developer’s cost for a new bug report Bug type = 1 Information & Database Systems Lab New Bug Report Developer cost for a new bug
  • 25. Merging Cost Developer profiles Cost score <Bug types> Information & Database Systems Lab Recommended Bug Aggregation Reports Developer Accuracy Bug classifier <SVM> Accuracy score  Merging classifier’s scores and developer’s cost scores. + = Accuracy scores Cost scores (CosTriage) Hybrid scores [Anvik06]
  • 26. Experiments  Subject Systems  97,910 valid bug reports  255 active developers  From four open source projects Information & Database Systems Lab  Approaches  PureCBR: State of the art CBR-based approach  CBCF: Original CBCF  CosTraige: Our approach
  • 27. Experiments  Two research questions Q1. How much can our approach improve cost (bug fix time) without sacrificing bug assignment accuracy? Q2. What are the trade-offs between accuracy and cost (bug fix Information & Database Systems Lab time)?  Evaluation measures W is the set of bug reports predicted correctly. N is the number of bug reports in the test set.  The real fix time is unknown, we only use the fix time for correctly matched bugs.
  • 28. Experiments  Relative errors of expected bug fix time Information & Database Systems Lab  Improvement of bug fix time (Q1)  CosTriage improves the costs efficiently up to 30% without seriously compromising accuracy
  • 29. Experiments  Trade-off between accuracy and bug fix time (Q2) Information & Database Systems Lab
  • 30. Conclusion  We proposed a new bug triaging technique  Optimize not only accuracy but also cost  Solve data sparseness problem by using topic modeling Information & Database Systems Lab  Experiments using four real bug report corpora  Improve the cost without heavy losses of accuracy
  • 31. Q&A Information & Database Systems Lab Thank you! Do you have any questions?
  • 32. Contact Information & Database Systems Lab Jin-woo Park jwpark85@postech.ac.kr
  • 33. Back up  We adopt the divergence measure proposed in [Arun10]  Finding the natural number of topics (# bug types) Information & Database Systems Lab t is the natural number of bug types Mozilla Apache
  • 34. Back up - Bug features  Bug features  Keywords of title and description  Other meta data Information & Database Systems Lab Title : Traditional Memory Rendering refactoring request New Bug Description : Request additional refactoring so we can … Report Traditional Rendering. Remove stopwords Traditional Memory Rendering refactoring request Request refactoring … Traditional Rendering