Micro Interaction Metrics for Defect Prediction



Taek Lee, Jaechang Nam, Dongyun Han, Sunghun Kim, Hoh Peter In
                FSE 2011, Hungary, Sep. 5-9
Outline

• Research motivation
• The existing metrics
• The proposed metrics
• Experiment results
• Threats to validity
• Conclusion
Defect Prediction?
Why is it necessary?
Software quality assurance is inherently a resource-constrained activity!
Predicting defect-prone software entities* helps put the limited labor effort on the entities most likely to be defective

* functions or code files
Indicators of defects

• Complexity of source code (Chidamber and Kemerer 1994)
• Frequent code changes (Moser et al. 2008)
• Previous defect information (Kim et al. 2007)
• Code dependencies (Zimmermann 2007)
Indeed, where do defects come from?
Human Error!
Programmers make mistakes; consequently, defects are injected and software fails

Human Errors → Bugs Injected → Software Fails
Programmer Interaction and Software Quality

“Errors are from cognitive breakdown while understanding and implementing requirements”
- Ko et al. 2005

“Work interruptions or task switching may affect programmer productivity”
- DeLine et al. 2006
Don’t we need to also consider developers’ interactions as defect indicators?

…, but the existing indicators can NOT directly capture developers’ interactions
Using Mylyn data, we propose novel
“Micro Interaction Metrics (MIMs)”
    capturing developers’ interactions
The Mylyn* data is stored as an attachment to the corresponding bug reports in the XML format

* Eclipse plug-in storing and recovering task contexts
<InteractionEvent … Kind=“ ” … StartDate=“ ” EndDate=“ ” … StructureHandle=“ ” … Interest=“ ” … >
Two levels of MIMs Design

File-level MIMs: specific interactions for a file in a task
(e.g., AvgTimeIntervalEditEdit)

Task-level MIMs: property values shared over the whole task
(e.g., TimeSpent)

Example Mylyn task log:
10:30  Selection  file A
11:00  Edit       file B
12:30  Edit       file B
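To make the two levels concrete, a small Python sketch computes both example metrics from the task log above. The semantics are assumptions for illustration (AvgTimeIntervalEditEdit as the mean gap between consecutive edits of the same file, TimeSpent as the whole-task duration); the paper’s exact definitions may differ.

log = [  # (time, kind, file) as in the slide
    ("10:30", "Selection", "A"),
    ("11:00", "Edit", "B"),
    ("12:30", "Edit", "B"),
]

def minutes(t):
    h, m = map(int, t.split(":"))
    return h * 60 + m

def avg_time_interval_edit_edit(log, target_file):
    # file-level MIM: mean gap (minutes) between consecutive edits of one file
    edits = [minutes(t) for t, kind, f in log if kind == "Edit" and f == target_file]
    gaps = [b - a for a, b in zip(edits, edits[1:])]
    return sum(gaps) / len(gaps) if gaps else 0.0

def time_spent(log):
    # task-level MIM: duration of the whole task
    times = [minutes(t) for t, _, _ in log]
    return max(times) - min(times)

print(avg_time_interval_edit_edit(log, "B"))  # 90.0 minutes for file B
print(time_spent(log))                        # 120 minutes for the task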
The Proposed Micro Interaction Metrics
For example, NumPatternSXEY is to capture this interaction:

“How many times did a programmer Select a file of group X and then Edit a file of group Y in a task activity?”
Group X or Y: X if a file shows defect locality* properties, Y otherwise

Group H or L: H if a file has a high** DOI value, L otherwise

* hinted by the paper [Kim et al. 2007]
** threshold: median of degree of interest (DOI) values in a task
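As an illustration only, a short Python sketch of this count; it assumes the pattern is counted whenever a Selection of a group-X file is immediately followed by an Edit of a group-Y file within the same task (whether non-adjacent pairs also count is not stated here).

def num_pattern_sxey(events, group_of):
    # events: ordered (kind, file) pairs from one task; group_of: file -> "X"/"Y"
    count = 0
    for (kind_a, file_a), (kind_b, file_b) in zip(events, events[1:]):
        if (kind_a == "Selection" and group_of[file_a] == "X"
                and kind_b == "Edit" and group_of[file_b] == "Y"):
            count += 1
    return count

events = [("Selection", "A.java"), ("Edit", "B.java"), ("Edit", "B.java")]
groups = {"A.java": "X", "B.java": "Y"}  # illustrative grouping
print(num_pattern_sxey(events, groups))  # 1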
Bug Prediction Process
STEP1: Counting & Labeling Instances

[Timeline figure: Mylyn tasks Task 1 … Task i+3 touching f1.java, f2.java, f3.java between Dec 2005 and Sep 2010, split at time P]

All the Mylyn task data collectable from Eclipse subprojects (Dec 05 ~ Sep 10)

Post-defect counting period: after time P

The number of counted post-defects (edited files only within bug-fixing tasks):
f1.java = 1
f2.java = 1
f3.java = 2

Labeling rule for a file instance:
“buggy” if # of post-defects > 0
“clean” if # of post-defects = 0
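A compact Python sketch of this step, under the assumption that one post-defect is counted for a file each time it is edited within a bug-fixing task after time P; the task contents below are illustrative and chosen only to reproduce the counts on the slide.

from collections import Counter

post_fix_tasks = [                       # illustrative bug-fixing tasks after time P
    ["f3.java"],
    ["f1.java", "f2.java", "f3.java"],
]

post_defects = Counter(f for task in post_fix_tasks for f in task)
# -> f1.java = 1, f2.java = 1, f3.java = 2
labels = {f: ("buggy" if n > 0 else "clean") for f, n in post_defects.items()}
# files never edited in a bug-fixing task after time P would be labeled "clean"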
STEP2: Extraction of MIMs

[Timeline figure: tasks Task 1 … Task 4 in the metrics extraction period (Dec 2005 up to time P); Task 1 edits f3.java, Task 2 edits f1.java, Task 3 edits f2.java, Task 4 edits f1.java and f2.java]

Metrics Computation (a file’s MIM value is averaged over the tasks that edited it):
MIM_f3.java ← value_Task1
MIM_f1.java ← (value_Task2 + value_Task4) / 2
MIM_f2.java ← (value_Task3 + value_Task4) / 2
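A sketch of the aggregation used here: a per-task MIM value is computed for each file the task edited, and a file’s final MIM value is the average over all extraction-period tasks that touched it. The per-task values below are placeholders.

from collections import defaultdict

per_task_values = [              # (task, file, MIM value computed within that task)
    ("Task1", "f3.java", 4.0),
    ("Task2", "f1.java", 2.0),
    ("Task3", "f2.java", 6.0),
    ("Task4", "f1.java", 3.0),
    ("Task4", "f2.java", 1.0),
]

sums, counts = defaultdict(float), defaultdict(int)
for _, f, v in per_task_values:
    sums[f] += v
    counts[f] += 1

mim_per_file = {f: sums[f] / counts[f] for f in sums}
# e.g., MIM_f1.java = (2.0 + 3.0) / 2 = 2.5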
The Understand JAVA tool was used for extracting 32 source code metrics (CMs)* from the last CVS revision before time P

[List of selected source code metrics — table not captured]

* Chidamber and Kemerer, and OO metrics
Fifteen History Metrics (HMs)* were collected from the corresponding CVS repository (revisions from Dec 2005 up to time P)

[List of history metrics (HMs) — table not captured]

* Moser et al.
STEP3: Creating a training corpus

Instance Name | Extracted MIMs … | Label             → training a Classifier
Instance Name | Extracted MIMs … | # of post-defects → training a Regression model
STEP4: Building prediction models

Classification and regression modeling with different machine learning algorithms using the WEKA* tool

* an open source data mining tool
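The paper’s models were built in WEKA; purely as an illustrative stand-in, an equivalent Python sketch with scikit-learn (not the tool used in the study), using random placeholder data in place of the real metric matrix, labels, and post-defect counts.

import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.random((200, 30))        # 200 file instances x 30 metric values (placeholder)
y = rng.integers(0, 2, 200)      # buggy (1) / clean (0) labels (placeholder)
c = rng.poisson(1.0, 200)        # # of post-defects per instance (placeholder)

clf = RandomForestClassifier(random_state=0)
f1 = cross_val_score(clf, X, y, cv=10, scoring="f1")                         # classification

reg = RandomForestRegressor(random_state=0)
mae = -cross_val_score(reg, X, c, cv=10, scoring="neg_mean_absolute_error")  # regression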
STEP5: Prediction Evaluation

Classification Measures
Precision(B): how many instances are really buggy among the buggy-predicted outcomes?
Recall(B): how many instances are correctly predicted as ‘buggy’ among the real buggy ones?
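In the usual notation (TP, FP, FN: true positives, false positives, false negatives on the buggy class), these are the standard definitions:

Precision(B) = TP / (TP + FP)
Recall(B)    = TP / (TP + FN)
F-measure(B) = 2 · Precision · Recall / (Precision + Recall)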
STEP5: Prediction Evaluation

Regression Measures (between the # of real buggy instances and the # of instances predicted as buggy):
correlation coefficient (-1~1)
mean absolute error (0~1)
root squared error (0~1)
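For n instances with real post-defect counts y_i and predictions ŷ_i, these are taken here as the standard definitions (the slide’s “root square error” is read as root mean squared error; WEKA also reports relative variants):

r    = Σ (y_i − ȳ)(ŷ_i − m) / ( sqrt(Σ (y_i − ȳ)²) · sqrt(Σ (ŷ_i − m)²) ),  where m = mean of ŷ
MAE  = (1/n) Σ |y_i − ŷ_i|
RMSE = sqrt( (1/n) Σ (y_i − ŷ_i)² )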
T-test with 100 runs of 10-fold cross-validation

Reject H0* and accept H1* if p-value < 0.05 (at the 95% confidence level)

* H0: no difference in average performance, H1: different (better!)
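A minimal sketch of this comparison, assuming each metric set yields 100 per-run performance scores (e.g., F-measures) from the repeated cross-validation; a paired t-test is shown as one reasonable choice, and the score values are illustrative.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
scores_mim = rng.normal(0.60, 0.03, 100)  # illustrative per-run F-measures with MIMs
scores_cm  = rng.normal(0.45, 0.03, 100)  # illustrative per-run F-measures with CMs

t, p = stats.ttest_rel(scores_mim, scores_cm)
better = p < 0.05 and scores_mim.mean() > scores_cm.mean()
print(f"p = {p:.4f}, MIMs significantly better: {better}")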
Result Summary

MIMs improve prediction accuracy for:
1. different Eclipse project subjects
2. different machine learning algorithms
3. different model training periods
Prediction for different project subjects



          File instances and % of defects
Prediction for different project subjects




    MIM: the proposed metrics   CM: source code metrics   HM: history metrics
Prediction for different project subjects

BASELINE: a dummy classifier that predicts in a purely random manner
e.g., for 12.5% buggy instances: Precision(B) = 12.5%, Recall(B) = 50%, F-measure(B) = 20%

MIM: the proposed metrics   CM: source code metrics   HM: history metrics
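These baseline numbers follow directly from the definitions: a classifier that predicts “buggy” at random for half of the instances catches half of the real buggy ones (Recall = 50%), and its buggy-predicted set is a random sample, so the fraction that is really buggy equals the 12.5% base rate (Precision = 12.5%). Then:

F-measure(B) = 2 · 0.125 · 0.5 / (0.125 + 0.5) = 0.125 / 0.625 = 0.20 = 20%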
Prediction for different project subjects



    T-test results (significant figures are in bold, p-value < 0.05)
Prediction with different algorithms




  T-test results (significant figures are in bold, p-value < 0.05)
Prediction in different training periods

Model training period : Model testing period
        50%            :        50%
        70%            :        30%
        80%            :        20%

[Timeline: Dec 2005 — time P — Sep 2010]
Prediction in different training periods




   T-test results (significant figures are in bold, p-value < 0.05)
Top 42 metrics (37%) among the total 113 metrics (MIMs + CMs + HMs) are from MIMs
Possible Insight

TOP 1: NumLowDOIEdit
TOP 2: NumPatternEXSX
TOP 3: TimeSpentOnEdit

Chances are that more defects are introduced when a programmer repeatedly edits and browses a file (TOP 2), especially one related to previous defects, spends more of the time on editing (TOP 3), and especially edits files with low DOI, i.e., files accessed less frequently or less recently (TOP 1).
Performance comparison
with regression modeling
for predicting # of post-defects
Predicting Post-Defect Numbers




T-test results (significant figures are in bold, p-value < 0.05)
Threats to Validity
• Systems examined might not be representative

• Systems are all open source projects

• Defect information might be biased
Conclusion

Our findings exemplify that developers’ interactions can affect software quality

Our proposed micro interaction metrics improve defect prediction accuracy significantly
…
We believe future defect prediction models will use more of developers’ direct and micro-level interaction information

MIMs are a first step towards it
Thank you! Any Questions?
• Problem
  – Can developers’ interaction information affect software quality (defects)?
• Approach
  – We proposed novel micro interaction metrics (MIMs) that go beyond the popular static metrics
• Result
  – MIMs significantly improve prediction accuracy compared to source code metrics (CMs) and history metrics (HMs)
Backup Slides
One possible ARGUMENT: some developers may not have used Mylyn to fix bugs

This creates a chance of error in counting post-defects and, as a result, biased labels (i.e., an incorrect % of buggy instances)
We repeated the experiment using the same instances but with a different defect-counting heuristic: a CVS-log-based approach*

* with keywords: “fix”, “bug”, “bug report ID” in change logs
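As an illustration only (the exact matching rule is not detailed here), a Python sketch of such keyword matching over CVS change-log messages; the regular expression, including the digit pattern standing in for a bug report ID, is an assumption.

import re

FIX_PATTERN = re.compile(r"\b(fix(e[sd])?|bugs?)\b|\b\d{4,6}\b", re.IGNORECASE)
# \d{4,6} is a rough stand-in for an Eclipse bug report ID mentioned in the log

def is_bug_fix(log_message):
    return bool(FIX_PATTERN.search(log_message))

print(is_bug_fix("Fixed NPE in editor, see bug 231245"))  # True
print(is_bug_fix("Update copyright headers"))             # False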
Prediction with CVS-log-based approach

T-test results for the CVS-log-based labels (significant figures are in bold, p-value < 0.05)
The CVS-log-based approach reported additional post-defects (a higher % of buggy-labeled instances)

MIMs failed to capture them due to the lack of the corresponding Mylyn data
Note that the quality of CVS change logs cannot be fully guaranteed (e.g., no explicit bug ID, missing logs)
