Keynote HotSWUp 2012
Will my system run (correctly)
after the upgrade?


Martin Pinzger
Assistant Professor
Delft University of Technology
Martin’s upgrades

[Timeline figure: PhD → Postdoc → Assistant Professor, Pfunds]

                                   2
My Experience with Software Upgrades




                                       3
Bugs on upgrades get reported




                                6
Hmm, wait a minute

Can’t we learn “something” from that data?




                                             7
Software repository mining for
preventing upgrade failures


Martin Pinzger
Assistant Professor
Delft University of Technology
Goal of software repository mining

Making the information stored in software repositories
available to software developers
  Quality analysis and defect prediction
  Recommender systems
  ...




                                                         9
Software repositories




                        10
Examples from my mining research

Predicting failure-prone source files using changes (MSR 2011)

The relationship between developer contributions and failures
(FSE 2008)




There are many more studies
  MSR 2012 http://2012.msrconf.org/
  A survey and taxonomy of approaches for mining software repositories in
  the context of software evolution, Kagdi et al. 2007

                                                                        11
Using Fine-Grained Source
Code Changes for Bug
Prediction

Joint work with Emanuel Giger, Harald Gall
University of Zurich
Bug prediction

Goal
  Train models to predict the bug-prone source files of the next release


How
  Using product measures, process measures, organizational measures with
  machine learning techniques


Many existing studies on building prediction models
  Moser et al., Nagappan et al., Zimmermann et al., Hassan et al., etc.
  Process measures performed particularly well




                                                                           13
Classical change measures

Number of file revisions

Code Churn aka lines added/deleted/changed




Research question of this study: Can we further improve these
models?




                                                                14
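To make the two classical measures concrete, here is a small illustrative sketch (not from the talk) that counts the number of revisions and the code churn per file from a git history; the repository path is a placeholder.

# Illustrative sketch: classical change measures -- number of revisions and
# code churn (lines added + deleted, "LM") per file -- from `git log --numstat`.
import subprocess
from collections import defaultdict

def change_measures(repo_path):
    out = subprocess.run(
        ["git", "-C", repo_path, "log", "--numstat", "--format="],
        capture_output=True, text=True, check=True).stdout
    revisions = defaultdict(int)   # file -> number of revisions touching it
    churn = defaultdict(int)       # file -> lines added + deleted
    for line in out.splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            continue
        added, deleted, path = parts
        if added == "-" or deleted == "-":   # binary files report no line counts
            continue
        revisions[path] += 1
        churn[path] += int(added) + int(deleted)
    return revisions, churn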
Revisions are coarse grained

What did change in a revision?




                                 15
Code Churn can be imprecise

Extra changes not relevant for locating bugs




                                               16
Fine-Grained Source Code Changes (SCC)

Account.java 1.5: IF "balance > 0", THEN-branch with invocation "withDraw(amount);"
Account.java 1.6: IF "balance > 0 && amount <= balance", THEN-branch with invocation
"withDraw(amount);", ELSE-branch with invocation "notify();"

3 SCC: 1x condition change, 1x else-part insert, 1x invocation statement insert
                                                                                      17
Research hypotheses

H1   SCC is correlated with the number of
     bugs in source files

H2   SCC is a predictor for bug-prone source
     files (and outperforms LM)

H3   SCC is a predictor for the number of bugs
     in source files (and outperforms LM)



                                                 18
15 Eclipse plug-ins

Data
 >850’000 fine-grained source code changes (SCC)
 >10’000 files
 >9’700’000 lines modified (LM)
 >9 years of development history
 ..... and a lot of bugs referenced in commit messages




                                                         19
H1: SCC is correlated with #bugs

Non-parametric Spearman rank correlation of LM and SCC with #bugs.
All correlations are significant at the 0.01 level; +/-0.5 is substantial, +/-0.7 strong.
The larger value per project marks the stronger correlate.

    Eclipse Project    LM      SCC
    Compare            0.68    0.76
    jFace              0.74    0.71
    JDT Debug          0.62    0.80
    Resource           0.75    0.86
    Runtime            0.66    0.79
    Team Core          0.15    0.66
    CVS Core           0.60    0.79
    Debug Core         0.63    0.78
    jFace Text         0.75    0.74
    Update Core        0.43    0.62
    Debug UI           0.56    0.81
    JDT Debug UI       0.80    0.81
    Help               0.54    0.48
    JDT Core           0.70    0.74
    OSGI               0.70    0.77
    Median             0.66    0.77
                                                                20
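As an illustration of the H1 analysis, a minimal sketch of the Spearman rank correlation between per-file LM/SCC counts and bug counts; the numbers below are made up.

# Hedged sketch (hypothetical per-file data for one project): Spearman rank
# correlation of LM and SCC with the number of bugs, as used to test H1.
from scipy.stats import spearmanr

lm   = [120, 45, 300, 10, 75]    # lines modified per file
scc  = [35, 12, 90, 2, 20]       # fine-grained source code changes per file
bugs = [4, 1, 9, 0, 2]           # bugs referenced in commit messages

rho_lm,  p_lm  = spearmanr(lm,  bugs)
rho_scc, p_scc = spearmanr(scc, bugs)
print(f"LM  vs #bugs: rho={rho_lm:.2f} (p={p_lm:.3f})")
print(f"SCC vs #bugs: rho={rho_scc:.2f} (p={p_scc:.3f})")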
Predicting bug-prone files

For each Eclipse project, files were binned into bug-prone and not bug-prone
using the median of the number of bugs per file (#bugs):

    bugClass = not bug-prone : #bugs <= median
               bug-prone     : #bugs >  median

With the median as cut point, the labeling of a file is relative to how many bugs
the other files in the project have. There are several other ways of binning files;
they mainly differ in the prior probabilities they produce. For instance, Zimmermann
et al. [40] and Bernstein et al. [4] labeled files as bug-prone if they had at least
one bug, which for heavily skewed distributions can lead to a high prior probability
towards one class.
                                                                 21
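A minimal sketch of the median cut point described above, with a hypothetical bug count per file:

# Median-based binning: files with more bugs than the project median are
# labeled bug-prone, all others not bug-prone.
import statistics

def bin_files(bugs_per_file):
    """bugs_per_file: dict mapping file path -> #bugs referenced in commits."""
    median = statistics.median(bugs_per_file.values())
    return {path: ("bug-prone" if n > median else "not bug-prone")
            for path, n in bugs_per_file.items()}

labels = bin_files({"Account.java": 4, "Util.java": 0, "Parser.java": 7, "UI.java": 1})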
H2: SCC can predict bug-prone files

AUC values of logistic regression models using LM and SCC as predictors for
bug-prone and not bug-prone files. The larger value per project marks the better predictor.

    Eclipse Project   AUC LM   AUC SCC
    Compare            0.84     0.85
    jFace              0.90     0.90
    JDT Debug          0.83     0.95
    Resource           0.87     0.93
    Runtime            0.83     0.91
    Team Core          0.62     0.87
    CVS Core           0.80     0.90
    Debug Core         0.86     0.94
    jFace Text         0.87     0.87
    Update Core        0.78     0.85
    Debug UI           0.85     0.93
    JDT Debug UI       0.90     0.91
    Help               0.75     0.70
    JDT Core           0.86     0.87
    OSGI               0.88     0.88
    Median             0.85     0.90
    Overall            0.85     0.89

SCC outperforms LM
                                                              22
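A hedged sketch of the H2 setup: a logistic regression classifier over SCC counts, evaluated with AUC. The feature values and labels below are made up, and the paper's exact learner settings are not reproduced.

# Sketch: logistic regression predicting bug-prone files from SCC,
# evaluated with AUC via cross-validated probability estimates.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict

X = np.array([[35], [12], [90], [2], [20], [5], [60], [1]])  # SCC per file
y = np.array([1, 0, 1, 0, 1, 0, 1, 0])                       # 1 = bug-prone (median split)

probs = cross_val_predict(LogisticRegression(), X, y, cv=4, method="predict_proba")[:, 1]
print("AUC (SCC):", roc_auc_score(y, probs))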
Predicting the number of bugs

Nonlinear regression with an asymptotic model, fitted per project:

    #Bugs = a1 + b2 * e^(b3 * SCC)

[Scatter plot for Team Core: #SCC (0-4000) vs. #Bugs (0-60) with the fitted curve]
                                                              23
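A sketch of fitting the asymptotic model with nonlinear least squares; the data points and start values are invented for illustration.

# Fitting #Bugs = a1 + b2 * exp(b3 * SCC) with scipy's nonlinear least squares.
import numpy as np
from scipy.optimize import curve_fit

def asymptotic(scc, a1, b2, b3):
    return a1 + b2 * np.exp(b3 * scc)

scc  = np.array([50, 200, 600, 1200, 2500, 4000], dtype=float)
bugs = np.array([2, 8, 20, 32, 45, 52], dtype=float)

# a negative b3 lets the curve level off for large SCC values
params, _ = curve_fit(asymptotic, scc, bugs, p0=[60.0, -60.0, -0.001])
a1, b2, b3 = params
print(f"#Bugs ~ {a1:.1f} + {b2:.1f} * exp({b3:.5f} * SCC)")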
H3: SCC can predict the number of bugs

Results of the nonlinear regression in terms of R^2 and Spearman correlation,
using LM and SCC as predictors.

    Project         R2 LM   R2 SCC   Spearman LM   Spearman SCC
    Compare          0.84    0.88        0.68          0.76
    jFace            0.74    0.79        0.74          0.71
    JDT Debug        0.69    0.68        0.62          0.80
    Resource         0.81    0.85        0.75          0.86
    Runtime          0.69    0.72        0.66          0.79
    Team Core        0.26    0.53        0.15          0.66
    CVS Core         0.76    0.83        0.62          0.79
    Debug Core       0.88    0.92        0.63          0.78
    jFace Text       0.83    0.89        0.75          0.74
    Update Core      0.41    0.48        0.43          0.62
    Debug UI         0.70    0.79        0.56          0.81
    JDT Debug UI     0.82    0.82        0.80          0.81
    Help             0.66    0.67        0.54          0.84
    JDT Core         0.69    0.77        0.70          0.74
    OSGI             0.51    0.80        0.74          0.77
    Median           0.70    0.79        0.66          0.77
    Overall          0.65    0.72        0.62          0.74

SCC outperforms LM

[Figure: normalized residuals of the fitted models]
                                                              24
Summary of results

SCC performs significantly better than LM
  Advanced learners are not always better
  Change types do not yield extra discriminatory power


Predicting the number of bugs is “possible”

More information
  “Comparing Fine-Grained Source Code Changes And Code Churn For Bug
  Prediction”, MSR 2011




                                                                   25
What is next?

Analysis of the effect(s) of changes
  What is the effect on the design?
  What is the effect on the quality?


Ease understanding of changes

Recommender techniques
  Models that can provide feedback on the effects




                                                    26
Can developer-module
networks predict failures?


Joint work with Nachi Nagappan, Brendan Murphy
Microsoft Research
Research question

Are binaries with fragmented contributions from many
developers more likely to have post-release failures?
  Should developers focus on one thing?




                                                        29
Study with MS Vista project

Data
 Released in January 2007
 > 4 years of development
 Several thousand developers
 Several thousand binaries (*.exe, *.dll)
 Several million commits




                                            30
Approach in a nutshell

Change logs → a contribution network linking developers (Eric, Fu, Dan, Bob, Alice, Hin, Go)
to the binaries they committed to (a, b, c), with edge weights counting their commits.

Bug data per binary, combined with the binary's centrality in the network:

    Binary    #bugs    #centrality
    a           12        0.9
    b            7        0.5
    c            3        0.2

Regression analysis; validation with data splitting.
                                                                          31
Contribution network


                                 Windows binary (*.dll)
                                 Developer




Which binary is failure-prone?
                                                          32
Measuring fragmentation




                     Freeman degree




 Closeness           Bonacich’s power
                                        33
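A sketch, under assumed commit data, of building the developer-binary contribution network and computing two of the fragmentation measures named above (Freeman degree and closeness centrality) with networkx; Bonacich's power is omitted here since it is not part of networkx's standard centrality functions.

# Build a contribution network from (developer, binary) commit pairs and
# measure fragmentation of each binary via degree and closeness centrality.
import networkx as nx

commits = [("Eric", "b"), ("Eric", "b"), ("Fu", "b"), ("Bob", "a"),
           ("Alice", "a"), ("Alice", "c"), ("Hin", "c"), ("Dan", "b")]

g = nx.Graph()
for dev, binary in commits:
    if g.has_edge(dev, binary):
        g[dev][binary]["weight"] += 1
    else:
        g.add_edge(dev, binary, weight=1)

binaries = {"a", "b", "c"}
degree = {n: d for n, d in g.degree() if n in binaries}                       # Freeman degree
closeness = {n: c for n, c in nx.closeness_centrality(g).items() if n in binaries}
print(degree, closeness)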
Research hypotheses

H1   Binaries with fragmented contributions are failure-prone

H2   Fragmentation correlates positively with the number of post-release failures

H3   Advanced fragmentation measures improve failure estimation

                                                 34
Correlation analysis

Spearman rank correlation

             nrCommits  nrAuthors  Power   dPower  Closeness  Reach   Betweenness
 Failures      0.700      0.699    0.692   0.740     0.747    0.746      0.503
 nrCommits                0.704    0.996   0.773     0.748    0.732      0.466
 nrAuthors                         0.683   0.981     0.914    0.944      0.830
 Power                                     0.756     0.732    0.714      0.439
 dPower                                              0.943    0.964      0.772
 Closeness                                                    0.990      0.738
 Reach                                                                   0.773

All correlations are significant at the 0.01 level (2-tailed)

                                                                                     35
H1: Predicting failure-prone binaries

Binary logistic regression of 50 random splits
       4 principal components from 7 centrality measures

[Box plots over the 50 splits: Precision, Recall, and AUC, each plotted on a 0.50-1.00 scale]
                                                                          36
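A hedged sketch of this evaluation scheme on synthetic data: the 7 centrality measures are reduced to 4 principal components, then a binary logistic regression is fitted over 50 random splits and AUC is recorded per split.

# Sketch: PCA (7 -> 4 components) + logistic regression over 50 random splits.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 7))                                        # 7 centrality measures per binary
y = (X[:, :3].sum(axis=1) + rng.normal(size=500) > 0).astype(int)    # failure-prone or not (synthetic)

aucs = []
for split in range(50):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.33, random_state=split)
    model = make_pipeline(StandardScaler(), PCA(n_components=4), LogisticRegression())
    model.fit(X_tr, y_tr)
    aucs.append(roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))
print("median AUC over 50 splits:", np.median(aucs))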
H2: Predicting the number of failures

Linear regression of 50 random splits
           #Failures = b0 + b1*nCloseness + b2*nrAuthors + b3*nrCommits

[Box plots over the 50 splits: R-Square, Pearson, and Spearman, each plotted on a 0.50-1.00 scale]

All correlations are significant at the 0.01 level (2-tailed)
                                                                                      37
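And a minimal sketch of the regression formula above, fitted on hypothetical per-binary values:

# Fitting #Failures = b0 + b1*nCloseness + b2*nrAuthors + b3*nrCommits.
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[0.9, 12, 340],    # nCloseness, nrAuthors, nrCommits per binary
              [0.5,  4, 120],
              [0.2,  2,  35],
              [0.7,  9, 260]])
failures = np.array([15, 6, 1, 11])

reg = LinearRegression().fit(X, failures)
print("b0:", reg.intercept_, "b1..b3:", reg.coef_)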
H3: Basic vs. advanced measures

Model with nrAuthors, nrCommits   vs.   model with nCloseness, nrAuthors, nrCommits

[Box plots over the 50 splits: R-Square (top) and Spearman (bottom) for both models,
each plotted on a 0.30-1.00 scale]
                                                           38
Summary of results

Centrality measures can predict more than 83% of failure-prone
Vista binaries

Closeness, nrAuthors, and nrCommits can predict the number
of post-release failures

Closeness or Reach can improve prediction of the number of
post-release failures by 32%

More information
  Can Developer-Module Networks Predict Failures?, FSE 2008




                                                              39
What can we learn from that?

[Contribution network figure repeated from slide 31]

Increase testing effort for central binaries? - yes

Re-factor central binaries? - maybe

Re-organize contributions? - maybe
                                                       40
What is next?

Analysis of the contributions of a developer
  Who is working on which parts of the system?
  What exactly is the contribution of a developer?
  Who is introducing bugs/smells and how can we avoid it?


Global distributed software engineering
  What are the contributions of teams, which smells arise, and how can we avoid them?
  Can we empirically prove Conway’s Law?

Expert recommendation
  Whom to ask for advice on a piece of code?



                                                                     41
Ideas for software upgrade research

1. Mining software repositories to identify the upgrade-critical
components
  What are the characteristics of such components?
    Product and process measures
  What are the characteristics of the target environments?
    Hardware, operating system, configuration
  Train a model with these characteristics and reported bugs




                                                               42
Further ideas for research

Who is upgrading which applications when?
  Study upgrade behavior of users?


What is the environment of the users when they upgrade?
  Where did it work, where did it fail?
  Collect crash reports for software upgrades?


Upgrades in distributed applications?
  Finding the optimal time when to upgrade which component?




                                                              43
Conclusions

[Recap figures: the asymptotic #SCC vs. #Bugs fit for Team Core and the developer-binary
contribution network]

Questions?
Martin Pinzger
m.pinzger@tudelft.nl
                                                                                   44
