Selecting Discriminating Terms for Bug Assignment: A Formal Analysis

Ibrahim Aljarah, Shadi Banitaan, Sameer Abufardeh, Wei Jin and Saeed Salem
North Dakota State University, Fargo, ND, USA




                                                This research is supported by
Presentation Outline

       Bug Assignment Problem Overview
       Bug Assignment Steps
       Term Selection
       Log Odds Ratio based Term Selection Techniques
       Experimental Results
       Conclusion
       Future Directions
Bug Assignment Problem

    Suggest whom to assign this bug to.
    Assign the bug to an appropriate developer.

    [Figure: new bug reports B1 to B7 reach the bug triager, who assigns each of them to one of the developers D1 to D4.]




    9/21/2011
Bug Assignment Steps
Bug Reports Preprocessing
Bug-term matrix (M) and bug-developer vector (Y) construction

              t1  t2  t3  ...  tR        Y
    b1         0   0   1  ...   1        d1
    b2         1   1   1  ...   0        d1
    b3         0   0   1  ...   1        d3
    b4         1   1   0  ...   0        d1
    b5         0   0   1  ...   1        d5
    .          1   0   0  ...   0        .
    .          0   0   1  ...   1        .
    .          1   1   1  ...   0        .
    .          .   .   .  ...   .        .
    bN         .   .   .  ...   .        d9
                     (M)                (Y)

    A value in {0,1} must be assigned to each entry of the bug-term matrix.
    T = {t1, t2, ..., tR} is a set of R terms.
    D = {d1, ..., dL} is a set of L pre-defined developers.
    B = {b1, ..., bN} is a set of N bug reports to be assigned.
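
The construction of M and Y can be sketched as follows. This is a minimal illustration, assuming each bug report has already been preprocessed into a set of terms; the sample reports and the function name `build_matrix` are hypothetical and not taken from the presentation.

```python
# Hypothetical, already-preprocessed bug reports: (set of terms, fixing developer).
reports = [
    ({"null", "pointer", "editor"}, "d1"),
    ({"editor", "crash"}, "d1"),
    ({"widget", "crash"}, "d3"),
]

def build_matrix(reports):
    """Return (terms T, binary bug-term matrix M, developer vector Y)."""
    # T: sorted vocabulary over all reports, so the column order is deterministic.
    terms = sorted(set().union(*(words for words, _ in reports)))
    # M[i][j] = 1 if term j occurs in bug report i, else 0.
    M = [[1 if t in words else 0 for t in terms] for words, _ in reports]
    # Y[i] = developer who fixed bug report i.
    Y = [dev for _, dev in reports]
    return terms, M, Y

terms, M, Y = build_matrix(reports)
```

Each row of `M` corresponds to one bug report and each column to one term, matching the layout of the matrix on this slide.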
Term Selection

       Term Selection:
         Selects a subset of terms to describe the bug reports.

           Term selection reduces computation time.

           It can also lead to a significant improvement in classification
            performance.

           Common techniques: Information Gain, Latent Semantic
            Analysis.
Discriminating Terms

       A discriminating term is commonly found in the bug reports
        that have been fixed by a specific developer, but
        rarely found in other bug reports.

       The Log Odds Ratio score is used to decide which terms
        are discriminating.

       Research goal: improve classification quality
        by discarding non-discriminating terms before
        the classification task (bug assignment).
Log Odds Ratio (LOR)

      The LOR score of a term is calculated with respect to an individual
       developer (class) and measures how well the term discriminates that class.

      A higher score means the term is more discriminating.

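      The LOR score is calculated as follows (form inferred from the worked example):
        LogOdds(t | d) = P(t | d) * log( P(t | d) / P(t | not d) )
       where P(t | d) is the fraction of developer d's bug reports that contain
       term t, and P(t | not d) is the fraction of all other bug reports that contain t.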
Log Odds Ratio Calculation Example
                        Term1        Term2   Term3     Class
          Bug Report1   1            1       1         D1
          Bug Report2   1            0       0         D1
          Bug Report3   0            0       1         D1
          Bug Report4   0            1       1         D2
          Bug Report5   0            0       0         D2
          Bug Report6   0            0       1         D3
          Bug Report7   1            0       0         D3



LogOdds(Term1 | D1) = 2/3 * log((2/3)/(1/4)) ≈ 0.65   (Term1 has the highest Log Odds Ratio)
LogOdds(Term2 | D1) = 1/3 * log((1/3)/(1/4)) ≈ 0.10
LogOdds(Term3 | D1) = 2/3 * log((2/3)/(2/4)) ≈ 0.19
(log denotes the natural logarithm here)
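
The calculation above can be reproduced with a short script. This is a sketch under stated assumptions: the score is taken to be P(t|d) * log(P(t|d) / P(t|not d)), as the expressions in the example suggest; the natural logarithm is assumed because the slide does not state a base; and the handling of zero probabilities is a choice made here, not something the presentation specifies. The function name `log_odds` is hypothetical.

```python
import math

# Rows of the example table: (Term1, Term2, Term3) per bug report.
M = [
    (1, 1, 1), (1, 0, 0), (0, 0, 1),   # Bug Reports 1-3, class D1
    (0, 1, 1), (0, 0, 0),              # Bug Reports 4-5, class D2
    (0, 0, 1), (1, 0, 0),              # Bug Reports 6-7, class D3
]
Y = ["D1", "D1", "D1", "D2", "D2", "D3", "D3"]

def log_odds(M, Y, term, dev):
    """P(t|d) * log(P(t|d) / P(t|not d)) for column `term` and developer `dev`."""
    inside = [row[term] for row, y in zip(M, Y) if y == dev]
    outside = [row[term] for row, y in zip(M, Y) if y != dev]
    p_in = sum(inside) / len(inside)      # fraction of dev's reports with the term
    p_out = sum(outside) / len(outside)   # fraction of the other reports with the term
    if p_in == 0:
        return 0.0                        # assumption: limit of x*log(x) as x -> 0
    if p_out == 0:
        return math.inf                   # assumption: term occurs only for dev
    return p_in * math.log(p_in / p_out)

# Scores of Term1, Term2, Term3 with respect to D1.
scores = [log_odds(M, Y, t, "D1") for t in range(3)]
```

The absolute values depend on the logarithm base, but the ranking does not: Term1 scores highest for D1, followed by Term3 and Term2.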
Proposed Term Selection Techniques
     Log-Odds-Ratio-based techniques

        Terms From All selection (TFA)
            In this method, the R' terms with the highest LOR scores
             are chosen without considering how the terms are distributed
             over the developers.

          The LOR scores of every term for every class are combined
           in one common list.
          The scores are then sorted.

          Finally, the R' terms with the highest scores are
           extracted from the list.
Terms From All selection (TFA)

 Bug-term matrix M and developer vector Y:

        t1   t2   t3   t4   t5   t6   t7   t8   t9   t10    Y
 b1     0    0    1    1    1    1    0    1    1    1      d3
 b2     1    1    1    0    1    0    1    1    1    1      d2
 b3     0    0    1    1    0    1    1    0    0    1      d2
 b4     1    1    0    0    1    1    1    1    0    0      d1
 b5     0    0    1    1    0    0    1    1    1    1      d3
 b6     0    0    1    0    0    1    1    0    0    0      d1
 b7     0    0    1    1    1    1    0    1    1    1      d1
 b8     0    0    1    1    1    0    0    1    1    0      d3
 b9     0    0    0    0    1    1    1    0    0    1      d1
 b10    0    1    1    1    0    1    1    1    1    0      d1
 b11    0    1    0    1    1    0    1    1    1    1      d2
 b12    0    1    0    0    0    1    1    1    0    0      d3

 LOR values (virtual):

        d1      d2      d3
 t1     1.04    1.95    1.33
 t2     1.75    1.64    1.02
 t3     1.07    1.43    1.35
 t4     1.54    1.88    1.62
 t5     1.85    1.16    1.53
 t6     1.19    1.23    1.23
 t7     1.63    1.43    1.67
 t8     1.12    1.92    1.43
 t9     1.12    1.12    1.39
 t10    1.13    1.98    1.11

 We have 12 bug reports, 3 developers and 10 different terms.
 Suppose we want to select 6 terms to generate the reduced bug-term matrix M'.
 TFA selects the 6 terms with the highest scores regardless of how they are distributed among the developers.
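
The TFA steps (pool all scores, sort, take the top R') can be sketched as below, using the virtual LOR values of this slide. The function name `select_tfa` is hypothetical, and keeping only distinct terms when a term appears several times near the top of the pooled list is an assumption about how duplicates are treated.

```python
# LOR values copied from the example table: term -> (d1, d2, d3).
lor = {
    "t1": (1.04, 1.95, 1.33), "t2": (1.75, 1.64, 1.02),
    "t3": (1.07, 1.43, 1.35), "t4": (1.54, 1.88, 1.62),
    "t5": (1.85, 1.16, 1.53), "t6": (1.19, 1.23, 1.23),
    "t7": (1.63, 1.43, 1.67), "t8": (1.12, 1.92, 1.43),
    "t9": (1.12, 1.12, 1.39), "t10": (1.13, 1.98, 1.11),
}

def select_tfa(lor, r_prime):
    """Pool every (score, term) pair, sort by score, keep the top R' distinct terms."""
    pooled = sorted(
        ((score, term) for term, scores in lor.items() for score in scores),
        key=lambda pair: pair[0],
        reverse=True,
    )
    selected = []
    for _, term in pooled:
        if term not in selected:        # assumption: a term is counted once
            selected.append(term)
        if len(selected) == r_prime:
            break
    return selected

selected = select_tfa(lor, 6)
# selected == ["t10", "t1", "t8", "t4", "t5", "t2"]
```

With these values, four of the six selected terms owe their rank to d2 scores, two to d1 scores, and none to d3, which illustrates that TFA ignores the distribution over developers.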
Proposed Term Selection Techniques
     Log-Odds-Ratio-based techniques

        Term-Class Related selection (TCR):
           Idea: select k terms from each class (developer).

           It refines the selection criterion by targeting the terms that
            have the highest LOR scores in each class.

           Two ways to specify k are suggested:
              Equally Likely.
              Variable.
Proposed Term Selection Techniques
     Log-Odds-Ratio-based techniques

        TCR - ki Equally Likely:
            A fixed number of terms (k) is chosen for each class.

            For example:
              if we have 10 classes (developers) and need to select
               100 terms, then we select the 10 terms with the highest
               LOR scores for each developer.

              We maintain a unique set of terms, i.e., the number of
               obtained terms R' is less than or equal to k × L.
TCR- k Equally Likely:

 Bug-term matrix M and developer vector Y:

        t1   t2   t3   t4   t5   t6   t7   t8   t9   t10    Y
 b1     0    0    1    1    1    1    0    1    1    1      d3
 b2     1    1    1    0    1    0    1    1    1    1      d2
 b3     0    0    1    1    0    1    1    0    0    1      d2
 b4     1    1    0    0    1    1    1    1    0    0      d1
 b5     0    0    1    1    0    0    1    1    1    1      d3
 b6     0    0    1    0    0    1    1    0    0    0      d1
 b7     0    0    1    1    1    1    0    1    1    1      d1
 b8     0    0    1    1    1    0    0    1    1    0      d3
 b9     0    0    0    0    1    1    1    0    0    1      d1
 b10    0    1    1    1    0    1    1    1    1    0      d1
 b11    0    1    0    1    1    0    1    1    1    1      d2
 b12    0    1    0    0    0    1    1    1    0    0      d3

 LOR values (virtual):

        d1      d2      d3
 t1     1.04    1.95    1.33
 t2     0.75    1.64    1.02
 t3     1.07    1.43    1.35
 t4     1.54    1.88    1.62
 t5     1.85    1.16    1.53
 t6     1.19    1.23    1.23
 t7     1.63    0.43    1.67
 t8     1.12    1.92    1.43
 t9     1.12    1.12    1.39
 t10    1.13    1.98    1.11

 We have 12 bug reports, 3 developers and 10 different terms.
 Suppose we want to select 6 terms to generate the reduced bug-term matrix M'.
 2 terms from d3, 2 terms from d2, 2 terms from d1.
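
A minimal sketch of the equally likely variant, using the virtual LOR values of this slide and k = 2 per developer. The function name `select_tcr` is hypothetical; merging the per-developer picks into one list of unique terms follows the remark that R' can be smaller than k × L.

```python
developers = ("d1", "d2", "d3")

# LOR values copied from this slide's table: term -> (d1, d2, d3).
lor = {
    "t1": (1.04, 1.95, 1.33), "t2": (0.75, 1.64, 1.02),
    "t3": (1.07, 1.43, 1.35), "t4": (1.54, 1.88, 1.62),
    "t5": (1.85, 1.16, 1.53), "t6": (1.19, 1.23, 1.23),
    "t7": (1.63, 0.43, 1.67), "t8": (1.12, 1.92, 1.43),
    "t9": (1.12, 1.12, 1.39), "t10": (1.13, 1.98, 1.11),
}

def select_tcr(lor, developers, k):
    """Take the k highest-scoring terms of each developer and merge them without duplicates."""
    selected = []
    for col, _dev in enumerate(developers):
        # Rank all terms by their score for this developer's column.
        ranked = sorted(lor, key=lambda term: lor[term][col], reverse=True)
        for term in ranked[:k]:
            if term not in selected:
                selected.append(term)
    return selected

selected = select_tcr(lor, developers, k=2)
# selected == ["t5", "t7", "t10", "t1", "t4"]
```

Here t7 is among the top two terms of both d1 and d3, so only five distinct terms are obtained although k × L = 6.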
Proposed Term Selection Techniques
     Log-Odds-Ratio-based techniques

    TCR - ki Variable:
      A variable number of terms is chosen for each class.
      k is specified based on the developer's fixing rate.

      Fixing rate: proportional to the number of bug reports
       assigned to the developer out of all available bug reports.

         Selection of the highest scored terms with R' = 20 from 100 bug
         reports and 5 developers:
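
One way to turn fixing rates into per-developer term counts ki is sketched below for R' = 20, 100 bug reports and 5 developers. The per-developer bug counts are hypothetical, and the largest-remainder rounding used to make the ki sum to exactly R' is an assumption; the presentation does not say how fractional shares are rounded.

```python
def allocate_k(fixed_counts, r_prime):
    """Split R' among developers in proportion to their fixing rates."""
    total = sum(fixed_counts.values())
    # Exact (possibly fractional) share of R' for each developer.
    exact = {dev: r_prime * n / total for dev, n in fixed_counts.items()}
    # Start from the floor of each share.
    k = {dev: int(share) for dev, share in exact.items()}
    leftover = r_prime - sum(k.values())
    # Assumption: remaining slots go to the largest fractional parts.
    by_remainder = sorted(exact, key=lambda dev: exact[dev] - k[dev], reverse=True)
    for dev in by_remainder[:leftover]:
        k[dev] += 1
    return k

# Hypothetical split of 100 fixed bug reports over 5 developers.
fixed_counts = {"d1": 40, "d2": 25, "d3": 15, "d4": 12, "d5": 8}
k = allocate_k(fixed_counts, r_prime=20)
# k == {"d1": 8, "d2": 5, "d3": 3, "d4": 2, "d5": 2}
```

Developers who fixed more bug reports contribute more terms, which is the intent of the variable scheme.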
TCR- ki Variable:

 Bug-term matrix M and developer vector Y:

        t1   t2   t3   t4   t5   t6   t7   t8   t9   t10    Y
 b1     0    0    1    1    1    1    0    1    1    1      d3
 b2     1    1    1    0    1    0    1    1    1    1      d2
 b3     0    0    1    1    0    1    1    0    0    1      d2
 b4     1    1    0    0    1    1    1    1    0    0      d1
 b5     0    0    1    1    0    0    1    1    1    1      d3
 b6     0    0    1    0    0    1    1    0    0    0      d1
 b7     0    0    1    1    1    1    0    1    1    1      d1
 b8     0    0    1    1    1    0    0    1    1    0      d3
 b9     0    0    0    0    1    1    1    0    0    1      d1
 b10    0    1    1    1    0    1    1    1    1    0      d1
 b11    0    1    0    1    1    0    1    1    1    1      d2
 b12    0    1    0    0    0    1    1    1    0    0      d3

 LOR values (virtual):

        d1      d2      d3
 t1     1.04    1.95    1.33
 t2     1.75    1.64    1.02
 t3     1.07    1.43    1.35
 t4     1.54    1.88    1.62
 t5     1.85    1.16    1.53
 t6     1.19    1.23    1.23
 t7     1.13    1.43    1.67
 t8     1.12    1.92    1.43
 t9     1.63    1.12    1.39
 t10    1.13    1.98    1.11

 We have 12 bug reports, 3 developers and 10 different terms.
 Suppose we want to select 6 terms to generate the reduced bug-term matrix M'.
 1 term from d3, 2 terms from d2, 3 terms from d1.
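
The variable scheme of this example (3 terms for d1, 2 for d2, 1 for d3) can be sketched as follows with the virtual LOR values of this slide. The function name `select_tcr_variable` and the `columns` mapping are hypothetical helpers introduced for illustration.

```python
# Column index of each developer in the LOR tuples below.
columns = {"d1": 0, "d2": 1, "d3": 2}

# LOR values copied from this slide's table: term -> (d1, d2, d3).
lor = {
    "t1": (1.04, 1.95, 1.33), "t2": (1.75, 1.64, 1.02),
    "t3": (1.07, 1.43, 1.35), "t4": (1.54, 1.88, 1.62),
    "t5": (1.85, 1.16, 1.53), "t6": (1.19, 1.23, 1.23),
    "t7": (1.13, 1.43, 1.67), "t8": (1.12, 1.92, 1.43),
    "t9": (1.63, 1.12, 1.39), "t10": (1.13, 1.98, 1.11),
}

def select_tcr_variable(lor, columns, k_per_dev):
    """Take the k_i highest-scoring terms of developer i and merge them without duplicates."""
    selected = []
    for dev, k in k_per_dev.items():
        col = columns[dev]
        ranked = sorted(lor, key=lambda term: lor[term][col], reverse=True)
        for term in ranked[:k]:
            if term not in selected:
                selected.append(term)
    return selected

# Term counts taken from the slide: 3 for d1, 2 for d2, 1 for d3.
selected = select_tcr_variable(lor, columns, {"d1": 3, "d2": 2, "d3": 1})
# selected == ["t5", "t2", "t9", "t10", "t1", "t7"]
```

In this case the per-developer picks do not overlap, so exactly six terms are obtained.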
Reduced Bug-term matrix M'

     [Figure: term selection turns the bug-term matrix M (bug reports b1 ... bN, terms t1 ... tR) into the reduced bug-term matrix M', which keeps only the R' selected term columns; the developer labels of the bug reports stay the same.]
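
Building M' from M amounts to keeping the selected columns. A minimal sketch with a small hypothetical matrix; the function name `reduce_matrix` is not from the presentation.

```python
def reduce_matrix(M, terms, selected):
    """Keep only the columns of M whose term is in `selected`; rows are unchanged."""
    keep = [j for j, term in enumerate(terms) if term in selected]
    reduced_terms = [terms[j] for j in keep]
    M_reduced = [[row[j] for j in keep] for row in M]
    return reduced_terms, M_reduced

# Hypothetical 3 x 4 bug-term matrix.
terms = ["t1", "t2", "t3", "t4"]
M = [
    [0, 0, 1, 1],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
]
reduced_terms, M_reduced = reduce_matrix(M, terms, {"t2", "t3", "t4"})
# reduced_terms == ["t2", "t3", "t4"]
# M_reduced == [[0, 1, 1], [1, 1, 0], [0, 1, 1]]
```

The developer vector Y is not touched, since term selection removes columns and never rows.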
Training and Testing Data Preparation

 [Figure: the bug reports B1 ... Bm of the reduced bug-term matrix M' (terms t1 ... tR', developer labels d1 ... dL) are split into a training data set and a testing data set.]

 Applying 5-fold cross validation.
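
A 5-fold split of the bug reports can be sketched with the standard library alone. The shuffling, the fixed seed, and the function name `k_fold_splits` are assumptions made for this illustration; the presentation does not describe how the folds were formed.

```python
import random

def k_fold_splits(n_reports, n_folds=5, seed=0):
    """Yield (train_indices, test_indices); every report is tested exactly once."""
    indices = list(range(n_reports))
    random.Random(seed).shuffle(indices)   # fixed seed gives a repeatable shuffle
    # Deal the shuffled indices into n_folds nearly equal folds.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# Ten hypothetical bug reports, indexed 0..9.
splits = list(k_fold_splits(10))
```

In each round the classifier would be trained on the rows of M' listed in `train` and evaluated on the rows listed in `test`.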
Experimental results

        Eclipse Project Bugs Dataset:
            A variety of open bug repositories are used in open source
             development; our experiments were run on the Bugzilla repository
             of Eclipse (https://bugs.eclipse.org).

            Number of bugs in 2009:
             Total Reported                    38843 bugs
             FIXED                             20502 bugs
             WONTFIX                           1182 bugs
             DUPLICATE                         3120 bugs
             WORKSFORME                        1362 bugs
             INVALID                           1465 bugs
             Not Eclipse                       365 bugs
             Other (REASSIGNED, NEW, REOPEN)   10847 bugs (still without resolution)
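
As a quick consistency check on these counts, the share of each resolution can be computed directly. The dictionary keys are shorthand chosen here; the numbers are the ones listed above.

```python
# Bug counts for 2009 as listed on the slide.
counts = {
    "FIXED": 20502,
    "WONTFIX": 1182,
    "DUPLICATE": 3120,
    "WORKSFORME": 1362,
    "INVALID": 1465,
    "NOT_ECLIPSE": 365,
    "OTHER": 10847,
}

total = sum(counts.values())   # equals the 38843 reported bugs
# Percentage of all reported bugs per status, rounded to one decimal place.
shares = {status: round(100 * n / total, 1) for status, n in counts.items()}
```

The categories add up to the reported total, and fixed bugs make up a little over half of all reports.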
Bugs Reports Status and Resolutions
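
 [Figure: pie chart of bug report statuses and resolutions with the slices FIXED, WONTFIX, DUPLICATE, WORKSFORME, INVALID, NOTECLIPSE and OTHER; slice labels 53%, 28%, 8%, 4%, 3%, 3% and 1%.]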
Experimental results

    Eclipse Bug Report Components:
        In the Bugzilla repository, the Eclipse project is divided into 907 different components.

      We use the components with the largest numbers of fixed bugs:
          Core Component: JDT Core is the Java infrastructure of the Java IDE.
                 http://www.eclipse.org/jdt/core/index.php
          UI Component: Java Development Tools UI.
                 http://www.eclipse.org/jdt/ui/index.html
          SWT Component: Eclipse Standard Widget Toolkit.
                 http://www.eclipse.org/swt/
Number Of Fixed Bugs Per Component

 [Figure: bar chart of the count of fixed bugs for the UI, Core and SWT components; vertical axis from 0 to 2500.]
Experimental results

  Evaluation:

      Precision is the ratio of correctly classified bug reports to
       the sum of correctly classified and misclassified bug
       reports.

      Recall is the ratio of correctly classified bug reports to the
       sum of correctly classified and unclassified bug
       reports.

      We used the Bayesian network classifier.
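
The two measures, together with the F-measure reported in the results, can be sketched per developer as follows. This reading treats "misclassified" as reports wrongly assigned to the developer (false positives) and "unclassified" as the developer's reports assigned to someone else (false negatives); that interpretation, the sample labels and the function name are assumptions.

```python
def per_developer_scores(y_true, y_pred, dev):
    """Return (precision, recall, F-measure) for one developer."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == dev and p == dev)   # correctly assigned
    fp = sum(1 for t, p in pairs if t != dev and p == dev)   # wrongly assigned to dev
    fn = sum(1 for t, p in pairs if t == dev and p != dev)   # dev's reports missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-measure: harmonic mean of precision and recall.
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return precision, recall, f_measure

# Hypothetical true and predicted developers for six bug reports.
y_true = ["d1", "d1", "d1", "d2", "d2", "d3"]
y_pred = ["d1", "d1", "d2", "d2", "d1", "d3"]
precision, recall, f_measure = per_developer_scores(y_true, y_pred, "d1")
# For d1: 2 correct, 1 wrongly assigned, 1 missed, so all three values are 2/3.
```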
Experimental results

        The other techniques used for comparison are:
          Information Gain, which is calculated for each term with
           respect to all classes; the terms with the top R' information
           gain values are returned.

          Latent Semantic Analysis, which transforms terms into
           concepts by extracting relations between terms in the
           selected bug reports.
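
Information Gain for a binary term can be sketched as the drop in class entropy once the presence or absence of the term is known. This uses the standard textbook definition with base-2 logarithms; the presentation does not give the exact variant used in the experiments, and the data below is hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(column, labels):
    """IG(t) = H(Y) - sum over v in {0, 1} of P(t = v) * H(Y | t = v)."""
    n = len(labels)
    gain = entropy(labels)
    for value in (0, 1):
        subset = [y for x, y in zip(column, labels) if x == value]
        gain -= len(subset) / n * entropy(subset)
    return gain

# Hypothetical data: four bug reports, two developers, two terms.
Y = ["d1", "d1", "d2", "d2"]
term_a = [1, 1, 0, 0]   # occurs exactly in d1's reports
term_b = [1, 0, 1, 0]   # occurs equally often for both developers
ig_a = information_gain(term_a, Y)
ig_b = information_gain(term_b, Y)

# Rank the terms by information gain and keep the top R' = 1.
ranked = sorted({"term_a": ig_a, "term_b": ig_b}.items(),
                key=lambda item: item[1], reverse=True)
top_terms = [name for name, _ in ranked[:1]]
```

Unlike the LOR-based methods, this score is a single value per term over all classes rather than one value per developer.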
Experimental results

     F-measure results of the five term selection methods for different numbers of terms.
     The methods were applied to the Core component, and only active developers were
     considered.
Experimental results

     The results for the SWT component

      TCR - ki Variable had the highest precision (0.59) and the highest recall (0.55).
Experimental results

     The results for the UI component

     TCR - ki Variable achieved the highest precision (0.56) and one of the highest
     recall values (0.46).
Conclusion

      This research investigates the impact of several term selection methods on
       the effectiveness of classification.

      Three Log Odds Ratio (LOR) based term selection methods were proposed.

      The proposed selection methods were compared with the
       Information Gain (IG) and Latent Semantic Analysis (LSA) techniques.

      The LOR-based selection method TCR - ki Variable achieved
         up to 30% improvement in precision and up to 5% in recall.

      These results demonstrate the impact of effective term
       selection techniques on classification performance.
Future Directions

      Investigating alternative weighting schemes to better identify
       discriminating terms and improve classification accuracy.

      Exploring the potential of incorporating external domain knowledge
       and other evidence sources to better address the general bug
       assignment task.

      Expanding the data sets to multiple domains to further examine
       the effectiveness of the proposed term selection techniques.
Thank You.

     Any Questions?

WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 

Formal analysis of selecting discriminating terms for bug assignment

  • 1. Selecting Discriminating Terms for Bug Assignment: A Formal Analysis. Ibrahim Aljarah, Shadi Banitaan, Sameer Abufardeh, Wei Jin and Saeed Salem North Dakota State University, Fargo, ND, USA This research is supported by
  • 2. Presentation Outline: Bug Assignment Problem Overview; Bug Assignment Steps; Term Selection; Log Odds Ratio based Term Selection Techniques; Experimental Results; Conclusion; Future Directions.
  • 3. Bug Assignment Problem: Suggest whom to assign this bug to. Assign the bug to an appropriate developer. [Figure: a bug triager distributes the new bugs B1 to B7 among the developers D1 to D4.]
  • 4. 9/21/2011
  • 7. Bug-term matrix (M) and bug-developer vector (Y) construction. Each entry of the bug-term matrix must be assigned a value in {0, 1}.
      T = {t1, t2, ..., tR} is a set of R terms.
      D = {d1, ..., dL} is a set of L pre-defined developers.
      B = {b1, ..., bN} is a set of N bug reports to be assigned.

      M:     t1  t2  t3  ...  tR      Y
      b1      0   0   1  ...   1      d1
      b2      1   1   1  ...   0      d1
      b3      0   0   1  ...   1      d3
      b4      1   1   0  ...   0      d1
      b5      0   0   1  ...   1      d5
      .       1   0   0  ...   0      .
      .       0   0   1  ...   1      .
      .       1   1   1  ...   0      .
      .       .   .   .  ...   .      .
      bN      .   .   .  ...   .      d9
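The construction of M and Y can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the bug texts, the term list and the whitespace tokenization are hypothetical and stand in for the preprocessing step of the deck, which the transcript does not preserve.

```python
def build_bug_term_matrix(bug_texts, terms):
    """Return M where M[i][j] is 1 if terms[j] occurs in bug_texts[i], else 0."""
    matrix = []
    for text in bug_texts:
        tokens = set(text.lower().split())  # naive tokenization (assumption)
        matrix.append([1 if term in tokens else 0 for term in terms])
    return matrix


# Hypothetical bug reports, developer labels and terms, for illustration only.
bugs = [
    "editor crashes on save",
    "null pointer in compiler",
    "save dialog layout broken",
]
Y = ["d1", "d2", "d1"]                # bug-developer vector
T = ["save", "compiler", "editor"]    # term set
M = build_bug_term_matrix(bugs, T)

for row, developer in zip(M, Y):
    print(row, developer)
```

Each row of M is one bug report and each column one term, matching the layout on the slide.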
  • 8. Term Selection: selects a subset of the terms to describe each bug report. Term selection reduces computation time and can also lead to a significant improvement in classification performance. Common techniques: Information Gain, Latent Semantic Analysis.
  • 9. Discriminating Terms: A discriminating term is commonly found in the bug reports fixed by a specific developer but rarely found in other bug reports. The Log Odds Ratio score is used to decide which terms are discriminating. Research goal: improve classification quality by discarding non-discriminating terms before the classification task (bug assignment).
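  • 10. Log Odds Ratio (LOR): The LOR score of a term is calculated with respect to an individual developer (class) and measures how strongly the term discriminates that class. A higher score means the term is more discriminating. In the form used by the worked example on slide 11, the score is LogOdds(t | d) = P(t | d) × log(P(t | d) / P(t | other developers)), where P(t | d) is the fraction of d's bug reports that contain t.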
  • 11. Log Odds Ratio Calculation Example

                     Term1  Term2  Term3  Class
      Bug Report1      1      1      1     D1
      Bug Report2      1      0      0     D1
      Bug Report3      0      0      1     D1
      Bug Report4      0      1      1     D2
      Bug Report5      0      0      0     D2
      Bug Report6      0      0      1     D3
      Bug Report7      1      0      0     D3

      LogOdds(Term1 | D1) = 2/3 × log((2/3)/(1/4)) = 1.78 (Term1 has the highest Log Odds Ratio)
      LogOdds(Term2 | D1) = 1/3 × log((1/3)/(1/4)) = 0.44
      LogOdds(Term3 | D1) = 2/3 × log((2/3)/(2/4)) = 0.88
      The printed values equal the products without the logarithm, for example 2/3 × (2/3)/(1/4) = 1.78; the ranking of the three terms is the same either way.
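The example can be checked with a short Python sketch. The helper names are hypothetical, and the natural logarithm is an assumed base because the transcript does not state one. The score is evaluated both with and without the logarithm so that it can be compared against the printed values.

```python
import math

# Rows: (Term1, Term2, Term3, developer), copied from the example table.
REPORTS = [
    (1, 1, 1, "D1"),
    (1, 0, 0, "D1"),
    (0, 0, 1, "D1"),
    (0, 1, 1, "D2"),
    (0, 0, 0, "D2"),
    (0, 0, 1, "D3"),
    (1, 0, 0, "D3"),
]


def term_rates(reports, term_index, developer):
    """Fraction of reports containing the term, inside and outside the class."""
    inside = [r[term_index] for r in reports if r[-1] == developer]
    outside = [r[term_index] for r in reports if r[-1] != developer]
    return sum(inside) / len(inside), sum(outside) / len(outside)


def lor_score(p, q, use_log=True):
    """p * log(p / q), or p * (p / q) when use_log is False. Assumes p, q > 0."""
    ratio = p / q
    return p * (math.log(ratio) if use_log else ratio)


for j in range(3):
    p, q = term_rates(REPORTS, j, "D1")
    with_log = lor_score(p, q)
    without_log = lor_score(p, q, use_log=False)
    print(f"Term{j + 1} | D1: with log {with_log:.2f}, without log {without_log:.2f}")
```

A term that never occurs outside the class gives q = 0; a real implementation would need smoothing for that case, which this sketch leaves out.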
  • 12. Proposed Term Selection Techniques (Log-Odds-Ratio-based): Terms From All selection (TFA). This method chooses the R′ terms with the highest LOR scores without considering how the terms are distributed over the developers. The LOR scores of all terms for every class are combined into one common list, the list is sorted by score, and the R′ highest-scoring terms are extracted.
  • 13. Terms From All selection (TFA). We have 12 bug reports, 3 developers and 10 different terms. To generate the reduced bug-term matrix M′ with 6 terms, select the 6 terms with the highest scores regardless of their distribution among the developers.

      M and Y:
             t1  t2  t3  t4  t5  t6  t7  t8  t9  t10     Y
      b1      0   0   1   1   1   1   0   1   1   1      d3 *
      b2      1   1   1   0   1   0   1   1   1   1      d2
      b3      0   0   1   1   0   1   1   0   0   1      d2
      b4      1   1   0   0   1   1   1   1   0   0      d1
      b5      0   0   1   1   0   0   1   1   1   1      d3
      b6      0   0   1   0   0   1   1   0   0   0      d1
      b7      0   0   1   1   1   1   0   1   1   1      d1
      b8      0   0   1   1   1   0   0   1   1   0      d3
      b9      0   0   0   0   1   1   1   0   0   1      d1
      b10     0   1   1   1   0   1   1   1   1   0      d1
      b11     0   1   0   1   1   0   1   1   1   1      d2
      b12     0   1   0   0   0   1   1   1   0   0      d3
      * The transcript places both d1 and d3 next to row b1.

      LOR values (virtual):
              d1     d2     d3
      t1     1.04   1.95   1.33
      t2     1.75   1.64   1.02
      t3     1.07   1.43   1.35
      t4     1.54   1.88   1.62
      t5     1.85   1.16   1.53
      t6     1.19   1.23   1.23
      t7     1.63   1.43   1.67
      t8     1.12   1.92   1.43
      t9     1.12   1.12   1.39
      t10    1.13   1.98   1.11
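A minimal Python sketch of TFA on the illustrative LOR values of this slide follows. The function name is hypothetical, and counting each term only once when it appears several times in the pooled list is an assumption; the slide does not say how repeated terms are handled.

```python
# Illustrative LOR values from the TFA slide: term -> scores for (d1, d2, d3).
LOR = {
    "t1": (1.04, 1.95, 1.33), "t2": (1.75, 1.64, 1.02),
    "t3": (1.07, 1.43, 1.35), "t4": (1.54, 1.88, 1.62),
    "t5": (1.85, 1.16, 1.53), "t6": (1.19, 1.23, 1.23),
    "t7": (1.63, 1.43, 1.67), "t8": (1.12, 1.92, 1.43),
    "t9": (1.12, 1.12, 1.39), "t10": (1.13, 1.98, 1.11),
}


def select_tfa(lor, r_prime):
    """Pool every (score, term) pair, sort descending, keep r_prime distinct terms."""
    pooled = sorted(
        ((score, term) for term, scores in lor.items() for score in scores),
        reverse=True,
    )
    selected = []
    for _, term in pooled:
        if term not in selected:
            selected.append(term)
        if len(selected) == r_prime:
            break
    return selected


print(select_tfa(LOR, 6))
```

With these values, four of the six selected terms owe their rank to the d2 column, which illustrates how TFA can favor one developer's terms.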
  • 14. Proposed Term Selection Techniques (Log-Odds-Ratio-based): Term-Class Related selection (TCR). Idea: select k terms from each class (developer). This sharpens the selection criterion by targeting the terms with the highest LOR scores in each class. Two ways are suggested to specify k: Equally Likely and Variable.
  • 15. Proposed Term Selection Techniques (Log-Odds-Ratio-based): TCR-k Equally Likely. A fixed number of terms k is chosen for each class. For example, with 10 classes (developers) and 100 terms to select, we take the 10 terms with the highest LOR scores for each developer. We maintain a unique set of terms, so the number of obtained terms R′ can be less than or equal to k × L.
  • 16. TCR-k Equally Likely. We have 12 bug reports, 3 developers and 10 different terms. To generate the reduced bug-term matrix M′ with 6 terms, select 2 terms for d1, 2 terms for d2 and 2 terms for d3. M and Y are the 12 × 10 bug-term matrix and developer vector shown on slide 13.

      LOR values (virtual):
              d1     d2     d3
      t1     1.04   1.95   1.33
      t2     0.75   1.64   1.02
      t3     1.07   1.43   1.35
      t4     1.54   1.88   1.62
      t5     1.85   1.16   1.53
      t6     1.19   1.23   1.23
      t7     1.63   0.43   1.67
      t8     1.12   1.92   1.43
      t9     1.12   1.12   1.39
      t10    1.13   1.98   1.11
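The equally likely variant can be sketched in Python on the illustrative LOR values of this slide. The function name and the first-seen ordering of the result are hypothetical choices made for the sketch.

```python
# Illustrative LOR values from the TCR-k slide: term -> scores for (d1, d2, d3).
LOR = {
    "t1": (1.04, 1.95, 1.33), "t2": (0.75, 1.64, 1.02),
    "t3": (1.07, 1.43, 1.35), "t4": (1.54, 1.88, 1.62),
    "t5": (1.85, 1.16, 1.53), "t6": (1.19, 1.23, 1.23),
    "t7": (1.63, 0.43, 1.67), "t8": (1.12, 1.92, 1.43),
    "t9": (1.12, 1.12, 1.39), "t10": (1.13, 1.98, 1.11),
}
DEVELOPERS = ["d1", "d2", "d3"]  # column order of the score tuples


def select_tcr_equal(lor, developers, k):
    """Take the k top-scoring terms of every developer and keep the unique union."""
    selected = []
    for col, _developer in enumerate(developers):
        ranked = sorted(lor, key=lambda term: lor[term][col], reverse=True)
        for term in ranked[:k]:
            if term not in selected:
                selected.append(term)
    return selected


print(select_tcr_equal(LOR, DEVELOPERS, 2))
```

Here t7 is among the top two terms of both d1 and d3, so only five unique terms are obtained, an instance of R′ being smaller than k × L.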
  • 17. Proposed Term Selection Techniques (Log-Odds-Ratio-based): TCR-ki Variable. A variable number of terms ki is chosen for each class. ki is specified based on the developer's fixing rate, which is proportional to the number of bug reports assigned to the developer out of all available bug reports. The slide's example selects the highest-scored terms with R′ = 20 from 100 bug reports and 5 developers.
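One way to turn fixing rates into per-developer term counts ki is sketched below in Python. The per-developer bug counts are hypothetical (the slide's own table is not preserved in the transcript), and the largest-remainder rounding is an assumption chosen so that the ki sum to exactly R′.

```python
def allocate_terms(fixed_counts, r_prime):
    """Split r_prime across developers in proportion to their fixed-bug counts."""
    total = sum(fixed_counts.values())
    quotas = {dev: r_prime * n / total for dev, n in fixed_counts.items()}
    # Start from the integer part of every quota.
    alloc = {dev: int(quota) for dev, quota in quotas.items()}
    # Hand the remaining terms to the largest fractional remainders.
    leftover = r_prime - sum(alloc.values())
    by_remainder = sorted(
        quotas, key=lambda dev: quotas[dev] - alloc[dev], reverse=True
    )
    for dev in by_remainder[:leftover]:
        alloc[dev] += 1
    return alloc


# Hypothetical distribution of 100 bug reports over 5 developers.
counts = {"d1": 40, "d2": 25, "d3": 15, "d4": 12, "d5": 8}
print(allocate_terms(counts, 20))
```

A developer with very few fixed bugs may receive ki = 0 under this rule; whether that is desirable depends on the data set.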
  • 18. TCR-ki Variable. We have 12 bug reports, 3 developers and 10 different terms. To generate the reduced bug-term matrix M′ with 6 terms, select 1 term for d3, 2 terms for d2 and 3 terms for d1. M and Y are the 12 × 10 bug-term matrix and developer vector shown on slide 13.

      LOR values (virtual):
              d1     d2     d3
      t1     1.04   1.95   1.33
      t2     1.75   1.64   1.02
      t3     1.07   1.43   1.35
      t4     1.54   1.88   1.62
      t5     1.85   1.16   1.53
      t6     1.19   1.23   1.23
      t7     1.13   1.43   1.67
      t8     1.12   1.92   1.43
      t9     1.63   1.12   1.39
      t10    1.13   1.98   1.11
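The variable variant differs from the equally likely one only in the per-developer quota. A Python sketch on the illustrative LOR values of this slide, using the 3/2/1 split stated there, might look as follows; the function and constant names are hypothetical.

```python
# Illustrative LOR values from the TCR-ki slide: term -> scores for (d1, d2, d3).
LOR = {
    "t1": (1.04, 1.95, 1.33), "t2": (1.75, 1.64, 1.02),
    "t3": (1.07, 1.43, 1.35), "t4": (1.54, 1.88, 1.62),
    "t5": (1.85, 1.16, 1.53), "t6": (1.19, 1.23, 1.23),
    "t7": (1.13, 1.43, 1.67), "t8": (1.12, 1.92, 1.43),
    "t9": (1.63, 1.12, 1.39), "t10": (1.13, 1.98, 1.11),
}
COLUMN = {"d1": 0, "d2": 1, "d3": 2}   # position of each developer's score
QUOTA = {"d1": 3, "d2": 2, "d3": 1}    # ki per developer, as given on the slide


def select_tcr_variable(lor, quota):
    """Take quota[dev] top-scoring terms per developer and keep the unique union."""
    selected = []
    for developer, k in quota.items():
        col = COLUMN[developer]
        ranked = sorted(lor, key=lambda term: lor[term][col], reverse=True)
        for term in ranked[:k]:
            if term not in selected:
                selected.append(term)
    return selected


print(select_tcr_variable(LOR, QUOTA))
```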
  • 19. Reduced Bug-term matrix M′. [Figure: term selection maps the bug-term matrix M, with columns t1 ... tR, to the reduced matrix M′, with columns up to tR′; the rows b1 ... bN and their developer labels are unchanged.]
  • 20. Training and Testing Data Preparation: the labeled rows of M′ (columns t1 ... tR′) are divided into a training data set and a testing data set, applying 5-fold cross validation. [Figure: rows B1 ... Bm of M′ with their developers, split into the two sets.]
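The 5-fold split can be sketched with plain Python index lists. How the deck assigned rows to folds is not stated, so the interleaved assignment below (row i goes to fold i mod k) is an assumption; shuffled or stratified folds would be equally plausible.

```python
def k_fold_indices(n_rows, k=5):
    """Yield (train, test) index lists; fold f tests the rows with i % k == f."""
    for fold in range(k):
        test = [i for i in range(n_rows) if i % k == fold]
        train = [i for i in range(n_rows) if i % k != fold]
        yield train, test


folds = list(k_fold_indices(12, k=5))
for number, (train, test) in enumerate(folds, start=1):
    print(f"fold {number}: train on {len(train)} rows, test on rows {test}")
```

Every row of M′ is used for testing exactly once, and the reported scores would typically be averaged over the five folds.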
  • 21. Experimental results: Eclipse Project Bugs Dataset. A variety of open bug repositories are used in open source development; our experiments were applied to the Bugzilla repository of Eclipse (https://bugs.eclipse.org). Number of bugs in 2009:

      Total reported                                              38843
      FIXED                                                       20502
      WONTFIX                                                      1182
      DUPLICATE                                                    3120
      WORKSFORME                                                   1362
      INVALID                                                      1465
      Not Eclipse                                                   365
      Other (REASSIGNED, NEW, REOPEN; still without resolution)   10847
  • 22. Bug Report Status and Resolutions. [Pie chart of the 2009 bug reports: FIXED 53%, OTHER 28%, DUPLICATE 8%, NOTECLIPSE 1%, and 3% to 4% each for WONTFIX, WORKSFORME and INVALID.]
  • 23. Experimental results: Eclipse Bug Report Components. The Eclipse project in the Bugzilla repository is divided into 907 different components. We use the components with the largest number of fixed bugs:
      • Core Component: JDT Core is the Java infrastructure of the Java IDE (http://www.eclipse.org/jdt/core/index.php).
      • UI Component: Java Development Toolkit UI (http://www.eclipse.org/jdt/ui/index.html).
      • SWT Component: Eclipse Standard Widget Toolkit (http://www.eclipse.org/swt/).
  • 24. Number of Fixed Bugs per Component. [Bar chart: count of fixed bugs, on an axis from 0 to 2500, for the UI, Core and SWT components.]
  • 25. Experimental results: Evaluation. Precision is the ratio of correctly classified bug reports to the total number of misclassified and correctly classified bug reports. Recall is the ratio of correctly classified bug reports to the total number of unclassified and correctly classified bug reports. We used the Bayesian network classifier.
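The two ratios, together with the F-measure reported later in the deck, can be written as a small Python helper. The counts below are hypothetical, and taking the F-measure as the harmonic mean of precision and recall is the usual definition, assumed here because the transcript does not spell it out.

```python
def precision_recall_f(correct, misclassified, unclassified):
    """Precision, recall and F-measure from the three counts named on the slide."""
    precision = correct / (correct + misclassified)
    recall = correct / (correct + unclassified)
    # Harmonic mean of precision and recall (assumed definition of F-measure).
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical counts for one developer, for illustration only.
precision, recall, f_measure = precision_recall_f(
    correct=50, misclassified=30, unclassified=50
)
print(f"precision={precision:.3f} recall={recall:.3f} F={f_measure:.3f}")
```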
  • 26. Experimental results: the techniques used for comparison are Information Gain, which is calculated for each term with respect to all classes, returning the terms with the top R′ information gain values; and Latent Semantic Analysis, which transforms terms into concepts by extracting relations between terms in the selected bug reports.
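For comparison with LOR, the information gain of a binary term can be sketched in Python using the textbook entropy-based definition. The deck does not show its own formula, so this is an assumed formulation; the data is the Term1 column and class labels from the example table on slide 11.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(term_column, labels):
    """H(Y) minus the weighted entropy of Y given term present (1) or absent (0)."""
    gain = entropy(labels)
    for value in (0, 1):
        subset = [y for x, y in zip(term_column, labels) if x == value]
        if subset:
            gain -= len(subset) / len(labels) * entropy(subset)
    return gain


labels = ["D1", "D1", "D1", "D2", "D2", "D3", "D3"]
term1 = [1, 1, 0, 0, 0, 0, 1]
print(f"IG(Term1) = {information_gain(term1, labels):.3f}")
```

Unlike the LOR score, this value is computed once per term over all classes, which is why the top R′ terms are taken from a single ranking.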
  • 27. Experimental results: F-measure results of the five term selection methods using different numbers of terms. These methods were applied to the Core component, and only active developers were considered.
  • 28. Experimental results for the SWT component: TCR-ki Variable had the highest precision (0.59) and the highest recall (0.55).
  • 29. Experimental results for the UI component: TCR-ki Variable achieved the highest precision (0.56), and its recall (0.46) was among the highest.
  • 30. Conclusion: This research investigates the impact of several term selection methods on the effectiveness of classification. Three selection methods based on the Log Odds Ratio (LOR) were proposed and compared with the Information Gain (IG) and Latent Semantic Analysis (LSA) techniques. The LOR-based selection method TCR-ki Variable achieved up to 30% improvement in precision and up to 5% in recall. These results demonstrate the impact of incorporating effective term selection techniques on classification performance.
  • 31. Future Directions: Investigating alternative weighting schemes to better identify discriminating terms and improve classification accuracy. Exploring the potential of incorporating external domain knowledge and other evidence sources to better address the general bug assignment task. Expanding the data sets to multiple domains to further examine the effectiveness of the proposed term selection techniques.
  • 32. Thank You. Any Questions?