Dealing with Noise in bug prediction

Sunghun Kim, Hongyu Zhang, Rongxin Wu and Liang Gong
The Hong Kong University of Science & Technology
Tsinghua University

Where are the bugs?

• Complex files! [Menzies et al.]
• Modified files! [Nagappan et al.]
• Nearby other bugs! [Zimmermann et al.]
• Previously fixed files [Hassan et al.]

Prediction model

  training instances (features + labels) -> Learner
  new instance (?) -> Learner -> Prediction

Training on software evolution is key

• Software features can be used to predict bugs
• Defect labels obtained from software evolution
• Supervised learning algorithms

  [Figure: Version Archive and Bug Database]

Change classification

  BUILD A LEARNER: historical changes, with bug-introducing ("bad") changes marked X
  PREDICT QUALITY: of each new change

  Kim, Whitehead Jr., Zhang: Classifying Software Changes: Clean or Buggy? (TSE 2008)

Training Classifiers

  Historical changes
             0   1   0   1   0   1   0   1   …   0   1
             0   0   0   1   0   1   0   1   …   0   0
             0   1   1   1   0   1   1   1   …   0   0
             0   1   0   3   0   0   0   1   …   0   1
             0   1   0   1   0   1   0   1   …   0   0

• Machine learning techniques
  - Bayesian Network, SVM

Source Repository and Bug Database

  Source Repository: all commits C; bug fixes Cf; linked fixes Cfl
  Bug Database: all bugs B; fixed bugs Bf; linked fixed bugs Bfl
  Cfl and Bfl: linked via log messages
  Remaining bug fixes in Cf: related, but not linked (Noise!)

  Bird et al. "Fair and Balanced? Bias in Bug-Fix Datasets," FSE 2009

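Linking via log messages can be illustrated with a minimal sketch. The regular expression, the commit ids and the bug ids below are hypothetical and are not taken from Bird et al.; real projects use their own conventions. The point is only that a fix whose log message carries no bug id stays unlinked, which is how the "related, but not linked" region arises.

```python
import re

# Hypothetical pattern: matches "bug 1234", "Bug #1234", "issue 1234", "fixes #1234".
BUG_ID_PATTERN = re.compile(r"(?:bug|issue|fix(?:es|ed)?)\s*#?\s*(\d+)", re.IGNORECASE)


def link_commits_to_bugs(commit_logs, fixed_bug_ids):
    """Map commit id -> set of fixed-bug ids mentioned in its log message."""
    links = {}
    for commit_id, message in commit_logs.items():
        mentioned = {int(m) for m in BUG_ID_PATTERN.findall(message)}
        linked = mentioned & fixed_bug_ids
        if linked:
            links[commit_id] = linked
    return links


# Made-up example data.
commit_logs = {
    "c1": "Fix bug 101: null pointer in parser",
    "c2": "Refactor build scripts",
    "c3": "Fixed crash when opening empty file",  # a fix, but no bug id in the log
    "c4": "fixes #205",
}
fixed_bug_ids = {101, 205, 317}

links = link_commits_to_bugs(commit_logs, fixed_bug_ids)
linked_bugs = set().union(*links.values())
unlinked_fixed_bugs = fixed_bug_ids - linked_bugs

print(links)                # commits in Cfl with their bugs in Bfl
print(unlinked_fixed_bugs)  # fixed bugs with no linked commit
```

In this toy data, commit c3 would be labeled as a non-fix even though it is one, so any defect labels derived from the links inherit that error.
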
Effect of training on superbiased data (Severity)

  [Chart: Bug Recall (0% to 100%); series: Trained on all bugs, Trained on biased data1, Trained on biased data2]

  Bias in bug severity affects BugCache

  Bird et al. "Fair and Balanced? Bias in Bug-Fix Datasets," FSE 2009

Are defect prediction models learned from noisy data reliable?

Study questions

• Q1: How resistant is a defect prediction model to noise?
• Q2: How much noise could be detected/removed?
• Q3: Could we remove noise to improve defect prediction performance?

Study approach

  Training set -> Bayes Net -> Testing set

Making noisy training instances

  1. Removing buggy labels: false negative noise
  2. Adding buggy labels: false positive noise

  [Figure: Training and Testing sets]

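The two kinds of label noise can be sketched as random label flips. This is a minimal illustration, not the authors' tooling: the function name, the seed, and the choice to express both rates as fractions of the buggy and clean instances respectively are assumptions.

```python
import random


def inject_label_noise(labels, fn_rate, fp_rate, seed=0):
    """Return a copy of labels (1 = buggy, 0 = clean) with some entries flipped.

    fn_rate: fraction of buggy instances relabeled clean (false negative noise).
    fp_rate: fraction of clean instances relabeled buggy (false positive noise).
    Both definitions are assumptions made for this sketch.
    """
    rng = random.Random(seed)
    noisy = list(labels)
    buggy = [i for i, y in enumerate(labels) if y == 1]
    clean = [i for i, y in enumerate(labels) if y == 0]
    # 1. Removing buggy labels.
    for i in rng.sample(buggy, round(fn_rate * len(buggy))):
        noisy[i] = 0
    # 2. Adding buggy labels.
    for i in rng.sample(clean, round(fp_rate * len(clean))):
        noisy[i] = 1
    return noisy


# Made-up training labels: 30 buggy, 70 clean.
training_labels = [1] * 30 + [0] * 70
noisy_labels = inject_label_noise(training_labels, fn_rate=0.2, fp_rate=0.1)
flipped = sum(a != b for a, b in zip(training_labels, noisy_labels))
print(flipped)
```

Only the training labels are altered here, which matches the slide's picture of a noisy Training set next to a Testing set.
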
Prediction models

• Change classification: the change from Rev n to Rev n+1 is classified as buggy or clean
• File-level defect prediction: each file is classified as buggy or clean

Performance evaluation

• 4 possible outcomes from prediction models
  - Classifying a buggy change as buggy (nb->b)
  - Classifying a buggy change as clean (nb->c)
  - Classifying a clean change as clean (nc->c)
  - Classifying a clean change as buggy (nc->b)
• Precision = nb->b / (nb->b + nc->b)
• Recall = nb->b / (nb->b + nb->c)
• F-measure = (2 × precision × recall) / (precision + recall)

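The three measures follow directly from the four outcome counts. The sketch below is illustrative only; the function name and the example counts are made up, and returning 0.0 when a denominator is zero is a convention chosen here, not something the slides specify.

```python
def precision_recall_f(n_bb, n_bc, n_cc, n_cb):
    """Compute precision, recall and F-measure for the buggy class.

    n_bb: buggy classified as buggy    n_bc: buggy classified as clean
    n_cc: clean classified as clean    n_cb: clean classified as buggy
    n_cc is accepted for completeness; these three measures do not use it.
    """
    predicted_buggy = n_bb + n_cb
    actually_buggy = n_bb + n_bc
    precision = n_bb / predicted_buggy if predicted_buggy else 0.0
    recall = n_bb / actually_buggy if actually_buggy else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Made-up counts for 200 changes.
p, r, f = precision_recall_f(n_bb=40, n_bc=10, n_cc=130, n_cb=20)
print(f"precision={p:.3f} recall={r:.3f} f-measure={f:.3f}")
```
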
Subjects

Change classification
  Subject          # instances   % buggy   # features
  Columba          1,800         29.4%     17,411
  Eclipse (JDT)    659           10.1%     16,192
  Scarab           1,090         50.6%     5,710

File-level defect prediction
  Subject          # instances   % buggy   # features
  SWT              1,485         44%       18
  Debug            1,065         24.7%     18

Experimental results

  [Chart: Buggy F-measure (0 to 1) vs. training set false negative (FN) & false positive (FP) rate (0 to 0.6); series: Columba, Dummy for Columba]

Columba

  [Chart: Buggy F-measure (0 to 1) vs. training set false negative (FN) & false positive (FP) rate (0 to 0.6); series: Columba, Dummy for Columba]

  1. Random guess (50% buggy, 50% clean)
  2. Columba's defect rate is about 30%
  3. Precision = 0.3 and Recall = 0.5
  4. F-measure = 0.375 = (2 × 0.5 × 0.3) / (0.3 + 0.5)

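The random-guess arithmetic can be reproduced in a few lines. This sketch assumes the guesser labels a fixed fraction of instances buggy independently of the truth, so its expected precision equals the defect rate and its expected recall equals the guess rate; the function and parameter names are illustrative.

```python
def random_guess_scores(defect_rate, guess_buggy_rate=0.5):
    """Expected precision, recall and F-measure of a random guesser."""
    precision = defect_rate      # share of "buggy" guesses that are truly buggy
    recall = guess_buggy_rate    # share of truly buggy instances guessed "buggy"
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Columba: defect rate of about 30%, guessing 50% buggy / 50% clean.
precision, recall, f_measure = random_guess_scores(0.3)
print(round(f_measure, 3))  # 0.375
```
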
Eclipse (JDT)

  [Chart: Buggy F-measure (0 to 1) vs. training set false negative (FN) & false positive (FP) rate (0 to 0.6); series: Eclipse, Dummy for Eclipse]

Scarab

  [Chart: Buggy F-measure (0 to 1) vs. training set false negative (FN) & false positive (FP) rate (0 to 0.6); series: Scarab, Dummy for Scarab]

Eclipse (Debug)

  [Chart: Buggy F-measure (0 to 1) vs. training set false negative (FN) & false positive (FP) rate (0 to 0.6); series: Debug, Dummy for Debug]

Eclipse (SWT)

  [Chart: Buggy F-measure (0 to 1) vs. training set false negative (FN) & false positive (FP) rate (0 to 0.6); series: SWT, Dummy for SWT]

Q1: How resistant is a defect prediction model to noise?

  [Chart: Buggy F-measure (0 to 1) vs. training set false negative (FN) & false positive (FP) rate (0 to 0.6); series: SWT, Debug, Columba, Eclipse, Scarab]

  Highlighted noise level: 20~30%

Study questions

• Q1: How resistant is a defect prediction model to noise?
• Q2: How much noise could be detected/removed?
• Q3: Could we remove noise to improve defect prediction performance?

Detecting noise

  1. Removing buggy labels: false negative noise
  2. Adding buggy labels: false positive noise

  [Figure: original training set with false negative noise and false positive noise added]

Detecting noise

  ... However, it is very hard to get a golden set. In our approach, we carefully select high quality datasets and assume them to be the golden sets. We then add FPs and FNs intentionally to create a noise set. To add FPs and FNs, we randomly select instances in a golden set and artificially change their labels from buggy to clean or from clean to buggy, inspired by experiments in [4].

  To make FN data sets (for RQ1), we randomly select n% buggy ...

  Figure 4. Creating biased training set
  [Figure: original training set (clean) with false positive noise and false negative noise]

  Figure 9. The pseudo-code of the CLNI (Closest List Noise Identification) algorithm
  [Pseudo-code not recoverable from the slide; it ends with "return Aj"]

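The slide names the CLNI algorithm but its pseudo-code did not survive, so the following is only a rough sketch of a closest-list style noise detector, not the authors' algorithm. Everything in it is an assumption: an instance is flagged when most of its k nearest neighbours (Euclidean distance, skipping already-suspected instances) carry the other label, and the pass is repeated until the flagged set stops changing. The names and the parameters k, delta, epsilon and max_iter are hypothetical.

```python
import math


def detect_noise(features, labels, k=3, delta=0.6, epsilon=0.99, max_iter=20):
    """Return indices of instances flagged as probably mislabeled (a sketch)."""
    n = len(features)
    # For every instance, the other instances sorted from nearest to farthest.
    ranked = [
        sorted((j for j in range(n) if j != i),
               key=lambda j: math.dist(features[i], features[j]))
        for i in range(n)
    ]
    noise = set()
    for _ in range(max_iter):
        new_noise = set()
        for i in range(n):
            # Closest k neighbours that are not already suspected noise.
            neighbours = [j for j in ranked[i] if j not in noise][:k]
            if not neighbours:
                continue
            disagree = sum(labels[j] != labels[i] for j in neighbours)
            if disagree / len(neighbours) >= delta:
                new_noise.add(i)
        # Share of the newly flagged set that was already flagged before.
        overlap = len(noise & new_noise) / max(len(new_noise), 1)
        previous, noise = noise, new_noise
        if overlap >= epsilon or noise == previous:
            break
    return noise


# Made-up data: two clusters, each containing one instance with the wrong label.
features = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
            (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
labels = [0, 0, 0, 0, 1,
          1, 1, 1, 1, 0]
suspects = detect_noise(features, labels)
print(sorted(suspects))
```

On this toy data the two planted mislabels (indices 4 and 9) are the only instances whose neighbours mostly disagree with them.
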
Noise detection performance (noise level = 20%)

           Precision   Recall   F-measure
  Debug    0.681       0.871    0.764
  SWT      0.624       0.830    0.712

Noise detection performance

  [Chart: Precision, Recall and F-measure of noise detection (0 to 1) vs. FP & FN noise level (noise rate, 0.1 to 0.5)]

Study questions

• Q1: How resistant is a defect prediction model to noise?
• Q2: How much noise could be detected/removed?
• Q3: Could we remove noise to improve defect prediction performance?

Bug prediction using cleaned data

  [Chart: SWT F-measure (0 to 100) vs. noise level (0%, 15%, 30%, 45%); series: Noisy, Cleaned]

  76% F-measure with 45% noise

Study limitations

• All datasets are collected from open source projects
• The golden set used in this paper may not be perfect
• The noisy data simulations may not reflect the actual noise patterns in practice

Summary

• Prediction models (used in our experiments) are resistant to noise (up to 20~30%)
• Noise detection is promising
• Future work
  - Building oracle defect sets
  - Improving noise detection algorithms
  - Applying to more defect prediction models (regression, BugCache)

How We Get There: A Context-Guided Search Strategy in Concolic Testing (FSE 2...How We Get There: A Context-Guided Search Strategy in Concolic Testing (FSE 2...
How We Get There: A Context-Guided Search Strategy in Concolic Testing (FSE 2...
Ā 
CrashLocator: Locating Crashing Faults Based on Crash Stacks (ISSTA 2014)
CrashLocator: Locating Crashing Faults Based on Crash Stacks (ISSTA 2014)CrashLocator: Locating Crashing Faults Based on Crash Stacks (ISSTA 2014)
CrashLocator: Locating Crashing Faults Based on Crash Stacks (ISSTA 2014)
Ā 
Source code comprehension on evolving software
Source code comprehension on evolving softwareSource code comprehension on evolving software
Source code comprehension on evolving software
Ā 
A Survey on Dynamic Symbolic Execution for Automatic Test Generation
A Survey on  Dynamic Symbolic Execution  for Automatic Test GenerationA Survey on  Dynamic Symbolic Execution  for Automatic Test Generation
A Survey on Dynamic Symbolic Execution for Automatic Test Generation
Ā 
Survey on Software Defect Prediction
Survey on Software Defect PredictionSurvey on Software Defect Prediction
Survey on Software Defect Prediction
Ā 
MSR2014 opening
MSR2014 openingMSR2014 opening
MSR2014 opening
Ā 
Personalized Defect Prediction
Personalized Defect PredictionPersonalized Defect Prediction
Personalized Defect Prediction
Ā 
STAR: Stack Trace based Automatic Crash Reproduction
STAR: Stack Trace based Automatic Crash ReproductionSTAR: Stack Trace based Automatic Crash Reproduction
STAR: Stack Trace based Automatic Crash Reproduction
Ā 
Transfer defect learning
Transfer defect learningTransfer defect learning
Transfer defect learning
Ā 
Automatic patch generation learned from human written patches
Automatic patch generation learned from human written patchesAutomatic patch generation learned from human written patches
Automatic patch generation learned from human written patches
Ā 
The Anatomy of Developer Social Networks
The Anatomy of Developer Social NetworksThe Anatomy of Developer Social Networks
The Anatomy of Developer Social Networks
Ā 
A Survey on Automatic Test Generation and Crash Reproduction
A Survey on Automatic Test Generation and Crash ReproductionA Survey on Automatic Test Generation and Crash Reproduction
A Survey on Automatic Test Generation and Crash Reproduction
Ā 
How Do Software Engineers Understand Code Changes? FSE 2012
How Do Software Engineers Understand Code Changes? FSE 2012How Do Software Engineers Understand Code Changes? FSE 2012
How Do Software Engineers Understand Code Changes? FSE 2012
Ā 

Recently uploaded

The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
Ā 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
Ā 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
Ā 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
Ā 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
Ā 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
Elena Simperl
Ā 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
Ā 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
Ā 
Dev Dives: Train smarter, not harder ā€“ active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder ā€“ active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder ā€“ active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder ā€“ active learning and UiPath LLMs for do...
UiPathCommunity
Ā 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
Ā 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
Ā 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
Ā 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
Ā 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
sonjaschweigert1
Ā 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Thierry Lestable
Ā 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
Pierluigi Pugliese
Ā 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
mikeeftimakis1
Ā 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
Ralf Eggert
Ā 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Aggregage
Ā 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
Ā 

Recently uploaded (20)

The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Ā 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Ā 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Ā 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Ā 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Ā 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
Ā 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Ā 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Ā 
Dev Dives: Train smarter, not harder ā€“ active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder ā€“ active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder ā€“ active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder ā€“ active learning and UiPath LLMs for do...
Ā 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ā 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
Ā 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Ā 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Ā 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
Ā 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Ā 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
Ā 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
Ā 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
Ā 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Ā 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
Ā 

Dealing with Noise in Defect Prediction

  • 1. Dealing with Noise in bug prediction Sunghun Kim, Hongyu Zhang, Rongxin Wu and Liang Gong The Hong Kong University of Science & Technology Tsinghua University
  • 2-6. Where are the bugs? Complex files! [Menzies et al.] • Modified files! [Nagappan et al.] • Nearby other bugs! [Zimmermann et al.] • Previously fixed files [Hassan et al.]
  • 7-12. Prediction model: training instances (features + labels) → Learner; unlabeled instance (?) → Learner → Prediction
  • 13. Training on software evolution is key • Software features can be used to predict bugs • Defect labels obtained from software evolution • Supervised learning algorithms [Diagram: Version Archive, Bug Database]
  • 14-18. Change classification: build a learner from past changes, some of them bug-introducing ("bad"), then predict the quality of a new change. Kim, Whitehead Jr., Zhang: Classifying Software Changes: Clean or Buggy? (TSE 2008)
  • 19. Training classifiers: historical changes are encoded as feature vectors (for example 0 1 0 1 0 1 0 1 … 0 1) • Machine learning techniques: Bayesian Network, SVM
  • 20-26. Source Repository: all commits C; bug fixes Cf; linked fixes Cfl. Bug Database: all bugs B; fixed bugs Bf; linked fixed bugs Bfl. Linked fixes and linked fixed bugs are linked via log messages; commits that are related to a bug but not linked are marked "Noise!". Bird et al. "Fair and Balanced? Bias in Bug-Fix Datasets," FSE 2009
  • 27-30. Effect of training on superbiased data (Severity) [Bar chart: bug recall (0% to 100%) for models trained on all bugs, on biased data1, and on biased data2]. Bias in bug severity affects BugCache. Bird et al. "Fair and Balanced? Bias in Bug-Fix Datasets," FSE 2009
  • 31. Are defect prediction models learned from noisy data reliable?
  • 32. Study questions • Q1: How resistant is a defect prediction model to noise? • Q2: How much noise could be detected/removed? • Q3: Could we remove noise to improve defect prediction performance?
  • 37. Study approach: Training → Bayes Net → Testing
  • 38-40. Making noisy training instances (Training, Testing) • 1. Removing buggy labels: false negative noise • 2. Adding buggy labels: false positive noise
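The two kinds of label noise on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' tooling: the function name `inject_label_noise`, its parameters, and the choice to express each rate as a fraction of its own class are assumptions made for the example.

```python
import random


def inject_label_noise(labels, fn_rate, fp_rate, seed=0):
    """Return (noisy_labels, flipped_indices) for a list of booleans.

    labels:  True means buggy, False means clean.
    fn_rate: fraction of buggy instances relabeled as clean
             (removing buggy labels, i.e. false negative noise).
    fp_rate: fraction of clean instances relabeled as buggy
             (adding buggy labels, i.e. false positive noise).
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    buggy = [i for i, y in enumerate(labels) if y]
    clean = [i for i, y in enumerate(labels) if not y]

    # Pick which instances of each class get their label flipped.
    flipped = set(rng.sample(buggy, round(fn_rate * len(buggy))))
    flipped |= set(rng.sample(clean, round(fp_rate * len(clean))))

    noisy = [(not y) if i in flipped else y for i, y in enumerate(labels)]
    return noisy, flipped
```

With 30 buggy and 70 clean instances, `inject_label_noise(labels, 0.2, 0.1)` flips 6 buggy and 7 clean labels. Returning the flipped indices alongside the noisy labels makes it possible to score a noise detector against the injected noise later.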
  • 41-42. Prediction models • Change classification: each change (Rev n → Rev n+1) is classified as buggy or clean • File-level defect prediction: each file is classified as buggy or clean
  • 43. Performance evaluation • 4 possible outcomes from prediction models: classifying a buggy change as buggy ($n_{b \to b}$); classifying a buggy change as clean ($n_{b \to c}$); classifying a clean change as clean ($n_{c \to c}$); classifying a clean change as buggy ($n_{c \to b}$) • $\text{Precision} = \frac{n_{b \to b}}{n_{b \to b} + n_{c \to b}}$, $\text{Recall} = \frac{n_{b \to b}}{n_{b \to b} + n_{b \to c}}$ • $\text{F-measure} = \frac{2 \times \text{precision} \times \text{recall}}{\text{precision} + \text{recall}}$
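The three measures on this slide follow directly from the outcome counts. The sketch below is one way to compute them; the function name `precision_recall_f` and the convention of returning 0.0 when a denominator is zero are assumptions for the example, not something the slides specify.

```python
def precision_recall_f(actual, predicted):
    """Compute precision, recall and F-measure for the buggy class.

    actual, predicted: equal-length lists of booleans (True = buggy).
    """
    pairs = list(zip(actual, predicted))
    n_bb = sum(1 for a, p in pairs if a and p)        # buggy classified as buggy
    n_bc = sum(1 for a, p in pairs if a and not p)    # buggy classified as clean
    n_cb = sum(1 for a, p in pairs if not a and p)    # clean classified as buggy

    # Assumed convention: an undefined ratio is reported as 0.0.
    precision = n_bb / (n_bb + n_cb) if (n_bb + n_cb) else 0.0
    recall = n_bb / (n_bb + n_bc) if (n_bb + n_bc) else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure
```

The count of clean changes classified as clean does not appear in any of the three formulas, so the sketch does not compute it.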
  • 44. Subjects
      Change classification
        subject         # instances   % buggy   # features
        Columba         1,800         29.4%     17,411
        Eclipse (JDT)   659           10.1%     16,192
        Scarab          1,090         50.6%     5,710
      File-level defect prediction
        subject         # instances   % buggy   # features
        SWT             1,485         44%       18
        Debug           1,065         24.7%     18
  • 45. Experimental results [Plot: buggy F-measure (0 to 1) versus training set false negative (FN) & false positive (FP) rate (0 to 0.6); series: Columba, Dummy for Columba]
  • 46-48. Columba [same plot]. Annotation: 1. Random guess (50% buggy, 50% clean) 2. Columba's defect rate is about 30% 3. Precision = 0.3 and Recall = 0.5 4. F-measure = 0.375 = (2*0.5*0.3)/(0.3+0.5)
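The random-guess figures quoted for Columba can be checked by hand. The working below assumes $N$ instances, a defect rate of 0.3, and a guess that is independent of the true label, so the values hold in expectation; $N$ is a symbol introduced only for this check.

```latex
\[
\text{Precision} = \frac{0.5 \times 0.3N}{0.5N} = 0.3,
\qquad
\text{Recall} = \frac{0.5 \times 0.3N}{0.3N} = 0.5,
\]
\[
\text{F-measure} = \frac{2 \times 0.3 \times 0.5}{0.3 + 0.5} = \frac{0.3}{0.8} = 0.375
\]
```

Half of all instances are guessed buggy, and 30% of those are actually buggy, which gives the precision; half of the buggy instances are caught, which gives the recall.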
  • 49. Eclipse (JDT) [Plot: buggy F-measure versus training set false negative (FN) & false positive (FP) rate; series: Eclipse, Dummy for Eclipse]
  • 50. Scarab [Plot: same axes; series: Scarab, Dummy for Scarab]
  • 51. Eclipse (Debug) [Plot: same axes; series: Debug, Dummy for Debug]
  • 52. Eclipse (SWT) [Plot: same axes; series: SWT, Dummy for SWT]
  • 53-55. Q1: How resistant is a defect prediction model to noise? [Plot: same axes; series: SWT, Debug, Columba, Eclipse, Scarab]. Annotation: 20~30%
  • 56. Study questions • Q1: How resistant is a defect prediction model to noise? • Q2: How much noise could be detected/removed? • Q3: Could we remove noise to improve defect prediction performance?
  • 57-58. Detecting noise: the original training set plus 1. removing buggy labels (false negative noise) and 2. adding buggy labels (false positive noise)
  • 59. Detecting noise [Diagram: original training set; false positive noise; false negative noise; clean or noise?]. Paper excerpt, cut off at the slide edges: "... However, it is very hard to get a golden set. In our approach, we carefully select high quality datasets and assume them the golden sets. We then add FPs and FNs intentionally to create a noise set. To add FPs and FNs, we randomly select instances in a golden set and artificially change their labels from buggy to clean or from clean to buggy, inspired by experiments in [4]." Figure 4. Creating biased training set. "... make FN data sets (for RQ1), we randomly select n% buggy ..."
  • 60. Closest List Noise Identification (CLNI) [Figure 9. The pseudo-code of the CLNI algorithm; only its final line, "return Aj", appears in this transcript]
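Since the pseudo-code itself is missing here, the following is only a rough sketch of a nearest-neighbor noise detector in the spirit of the algorithm's name. It is not the authors' listing. The function name `detect_noise`, the parameters `k`, `delta` and `max_iter`, the use of Euclidean distance, and the stopping rule (stop when the flagged set no longer changes) are all assumptions made for illustration.

```python
import math


def detect_noise(features, labels, k=5, delta=0.6, max_iter=20):
    """Flag instances whose nearest neighbors mostly carry the other label.

    features: list of equal-length numeric tuples.
    labels:   list of booleans (True = buggy).
    k:        size of each instance's closest list (hypothetical parameter).
    delta:    minimum fraction of disagreeing neighbors needed to flag
              an instance as noise (hypothetical parameter).
    Returns the set of indices flagged as noisy.
    """
    n = len(features)
    # For every instance, all other indices ordered by Euclidean distance.
    order = [
        sorted((j for j in range(n) if j != i),
               key=lambda j: math.dist(features[i], features[j]))
        for i in range(n)
    ]

    noise = set()
    for _ in range(max_iter):
        new_noise = set()
        for i in range(n):
            # Closest list: nearest k instances not currently flagged.
            closest = [j for j in order[i] if j not in noise][:k]
            if not closest:
                continue
            disagree = sum(labels[j] != labels[i] for j in closest) / len(closest)
            if disagree >= delta:
                new_noise.add(i)
        if new_noise == noise:  # flagged set is stable, so stop iterating
            break
        noise = new_noise
    return noise
```

Skipping already-flagged instances when building the closest list keeps one mislabeled instance from casting suspicion on its correctly labeled neighbors in later rounds. The `max_iter` bound guards against the flagged set oscillating instead of settling.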
  • 61. Noise detection performance (noise level = 20%)
        subject   Precision   Recall   F-measure
        Debug     0.681       0.871    0.764
        SWT       0.624       0.830    0.712
  • 62. Noise detection performance [Plot: precision, recall and F-measure (0 to 1) versus FP & FN noise level (0.1 to 0.5)]
  • 63. Study questions • Q1: How resistant is a defect prediction model to noise? • Q2: How much noise could be detected/removed? • Q3: Could we remove noise to improve defect prediction performance?
  • 64-66. Bug prediction using cleaned data [Plot: SWT F-measure (0 to 100) versus noise level (0% to 45%); series: Noisy, Cleaned]. Annotation: 76% F-measure with 45% noise
  • 67. Study limitations • All datasets are collected from open source projects • The golden set used in this paper may not be perfect • The noisy data simulations may not reflect the actual noise patterns in practice
  • 68. Summary • Prediction models (used in our experiments) are resistant to noise (up to 20~30%) • Noise detection is promising • Future work: building oracle defect sets; improving noise detection algorithms; applying to more defect prediction models (regression, BugCache)