Keynote HotSWUp 2012

Keynote for the Fourth Workshop on Hot Topics in Software Upgrades, co-located with ICSE 2012, Zurich, Switzerland


Transcript

  1. Will my system run (correctly) after the upgrade? Martin Pinzger, Assistant Professor, Delft University of Technology
  2. Martin’s upgrades: Assistant Professor, PhD, Postdoc, Pfunds
  3. My Experience with Software Upgrades
  4. (image-only slide)
  5. (image-only slide)
  6. Bugs on upgrades get reported
  7. Hmm, wait a minute. Can’t we learn “something” from that data?
  8. Software repository mining for preventing upgrade failures. Martin Pinzger, Assistant Professor, Delft University of Technology
  9. Goal of software repository mining: making the information stored in software repositories available to software developers. Examples: quality analysis and defect prediction, recommender systems, ...
  10. Software repositories
  11. Examples from my mining research: predicting failure-prone source files using changes (MSR 2011); the relationship between developer contributions and failures (FSE 2008). There are many more studies: see MSR 2012, http://2012.msrconf.org/, and “A survey and taxonomy of approaches for mining software repositories in the context of software evolution”, Kagdi et al., 2007.
  12. Using Fine-Grained Source Code Changes for Bug Prediction. Joint work with Emanuel Giger and Harald Gall, University of Zurich.
  13. Bug prediction. Goal: train models to predict the bug-prone source files of the next release. How: using product measures, process measures, and organizational measures with machine learning techniques. There are many existing studies on building prediction models (Moser et al., Nagappan et al., Zimmermann et al., Hassan et al., etc.); process measures performed particularly well.
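To make this setup concrete, here is a minimal sketch (my illustration, not the study's code) of training a bug-proneness classifier from two process measures; all data and the labeling rule are synthetic:

```python
# Hypothetical sketch: train a classifier that flags bug-prone files
# from process measures (#revisions, churn). Data is synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
revisions = rng.poisson(8, 400)                  # #revisions per file
churn = revisions * 12 + rng.poisson(30, 400)    # lines added/deleted/changed
X = np.column_stack([revisions, churn])
y = (revisions + rng.normal(0, 2, 400) > 8).astype(int)  # 1 = bug-prone

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = LogisticRegression().fit(X_tr, y_tr)
print("AUC:", round(roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]), 2))
```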
  14. Classical change measures: number of file revisions; code churn, a.k.a. lines added/deleted/changed (LM). Research question of this study: can we further improve these models?
  15. Revisions are coarse-grained. What did change in a revision?
  16. Code churn can be imprecise: it counts extra changes that are not relevant for locating bugs.
  17. Fine-Grained Source Code Changes (SCC). Example: comparing the ASTs of Account.java 1.5 and 1.6, the IF condition "balance > 0" changes to "balance > 0 && amount <= balance" and an ELSE part with the invocation notify() is inserted, giving 3 SCC: 1x condition change, 1x else-part insert, 1x invocation statement insert.
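A toy approximation of such fine-grained change extraction, assuming Python sources instead of Java and plain node-type counting instead of real tree differencing (a real SCC extractor also locates and classifies each edit):

```python
# Toy sketch: diff the multisets of AST node types of two revisions.
# Real tree differencing also classifies edits (condition change,
# else-part insert, ...); this only approximates the idea.
import ast
from collections import Counter

def node_types(source: str) -> Counter:
    return Counter(type(n).__name__ for n in ast.walk(ast.parse(source)))

old = "if balance > 0:\n    withdraw(amount)\n"
new = ("if balance > 0 and amount <= balance:\n"
       "    withdraw(amount)\n"
       "else:\n"
       "    notify()\n")

added = node_types(new) - node_types(old)   # node types new in the revision
print(added)  # BoolOp/Compare from the condition, Call/Name from notify()
```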
  18. Research hypotheses. H1: SCC is correlated with the number of bugs in source files. H2: SCC is a predictor for bug-prone source files (and outperforms LM). H3: SCC is a predictor for the number of bugs in source files (and outperforms LM).
  19. Data: 15 Eclipse plug-ins; >850’000 fine-grained source code changes (SCC); >10’000 files; >9’700’000 lines modified (LM); >9 years of development history; ... and a lot of bugs referenced in commit messages.
  20. H1: SCC is correlated with #bugs. Non-parametric Spearman rank correlation of LM and SCC with #bugs per project (in the original table, * marks correlations significant at 0.01 and the larger value is printed bold; guide: +/-0.5 substantial, +/-0.7 strong):

      Eclipse Project   LM     SCC
      Compare           0.68   0.76
      jFace             0.74   0.71
      JDT Debug         0.62   0.80
      Resource          0.75   0.86
      Runtime           0.66   0.79
      Team Core         0.15   0.66
      CVS Core          0.60   0.79
      Debug Core        0.63   0.78
      jFace Text        0.75   0.74
      Update Core       0.43   0.62
      Debug UI          0.56   0.81
      JDT Debug UI      0.80   0.81
      Help              0.54   0.48
      JDT Core          0.70   0.74
      OSGI              0.70   0.77
      Median            0.66   0.77
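The correlation itself is straightforward to compute; a sketch with scipy, using made-up per-file counts:

```python
# Sketch: Spearman rank correlation between per-file SCC and #bugs (H1).
# The arrays hold made-up example values.
from scipy.stats import spearmanr

scc_per_file = [120, 4, 38, 250, 12, 77]
bugs_per_file = [14, 0, 3, 22, 1, 9]

rho, p = spearmanr(scc_per_file, bugs_per_file)
print(f"rho={rho:.2f}, p={p:.3f}")   # rho >= 0.7 would count as strong
```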
  21. Predicting bug-prone files. For each Eclipse project we binned files into bug-prone and not bug-prone using the median of the number of bugs per file (#bugs):

      bugClass = not bug-prone if #bugs <= median, bug-prone if #bugs > median

  When using the median as cut point, the labeling of a file is relative to how many bugs the other files in a project have. There exist several ways of binning files beforehand; they mainly vary in the prior probabilities they produce. For instance, Zimmermann et al. [40] and Bernstein et al. [4] labeled files as bug-prone if they had at least one bug; with heavily skewed distributions this approach may lead to a high prior probability towards one class. Nagappan et al. [28] used a ...
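A compact sketch of this median split (values invented):

```python
# Sketch: label files bug-prone iff #bugs exceeds the project median.
import numpy as np

bugs = np.array([0, 1, 1, 3, 7, 12])      # #bugs per file (example)
median = np.median(bugs)                  # cut point, here 2.0
bug_class = np.where(bugs > median, "bug-prone", "not bug-prone")
print(median, list(bug_class))
```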
  22. H2: SCC can predict bug-prone files. AUC values of logistic regression using LM and SCC as predictors for bug-prone and not bug-prone files (larger values printed bold in the original):

      Eclipse Project   AUC LM   AUC SCC
      Compare           0.84     0.85
      jFace             0.90     0.90
      JDT Debug         0.83     0.95
      Resource          0.87     0.93
      Runtime           0.83     0.91
      Team Core         0.62     0.87
      CVS Core          0.80     0.90
      Debug Core        0.86     0.94
      jFace Text        0.87     0.87
      Update Core       0.78     0.85
      Debug UI          0.85     0.93
      JDT Debug UI      0.90     0.91
      Help              0.75     0.70
      JDT Core          0.86     0.87
      OSGI              0.88     0.88
      Median            0.85     0.90
      Overall           0.85     0.89

  SCC outperforms LM.
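The LM-vs-SCC comparison boils down to fitting one logistic model per predictor and comparing AUCs; a sketch on synthetic data:

```python
# Sketch: compare single-predictor logistic models (LM vs. SCC) by
# cross-validated AUC. All numbers are synthetic, not the study's data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
scc = rng.poisson(30, 200).reshape(-1, 1)            # per-file SCC
lm = scc * 3 + rng.poisson(20, 200).reshape(-1, 1)   # per-file churn
y = (scc[:, 0] + rng.normal(0, 10, 200) > 30).astype(int)

for name, X in [("LM", lm), ("SCC", scc)]:
    auc = cross_val_score(LogisticRegression(), X, y, cv=5, scoring="roc_auc")
    print(name, round(float(auc.mean()), 2))
```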
  23. Predicting the number of bugs. Non-linear regression with an asymptotic model, fitted per project: #Bugs = a1 + b2 * e^(b3 * SCC). (Plot: #Bugs against #SCC for Team Core, with the fitted curve.)
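Fitting such an asymptotic model is a few lines with scipy's curve_fit; the data points and starting values below are invented:

```python
# Sketch: fit #Bugs = a1 + b2 * exp(b3 * SCC) per project.
import numpy as np
from scipy.optimize import curve_fit

def asymptotic(scc, a1, b2, b3):
    return a1 + b2 * np.exp(b3 * scc)

scc = np.array([10, 100, 500, 1000, 2000, 4000], dtype=float)
bugs = np.array([1, 4, 12, 20, 35, 55], dtype=float)  # made-up points

params, _ = curve_fit(asymptotic, scc, bugs, p0=(60, -60, -0.001))
print(params)   # fitted a1, b2, b3
```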
  24. H3: SCC can predict the number of bugs. Results of the nonlinear regression in terms of R² and Spearman correlation using LM and SCC as predictors:

      Project        R² LM   R² SCC   Spearman LM   Spearman SCC
      Compare        0.84    0.88     0.68          0.76
      jFace          0.74    0.79     0.74          0.71
      JDT Debug      0.69    0.68     0.62          0.80
      Resource       0.81    0.85     0.75          0.86
      Runtime        0.69    0.72     0.66          0.79
      Team Core      0.26    0.53     0.15          0.66
      CVS Core       0.76    0.83     0.62          0.79
      Debug Core     0.88    0.92     0.63          0.78
      jFace Text     0.83    0.89     0.75          0.74
      Update Core    0.41    0.48     0.43          0.62
      Debug UI       0.70    0.79     0.56          0.81
      JDT Debug UI   0.82    0.82     0.80          0.81
      Help           0.66    0.67     0.54          0.84
      JDT Core       0.69    0.77     0.70          0.74
      OSGI           0.51    0.80     0.74          0.77
      Median         0.70    0.79     0.66          0.77
      Overall        0.65    0.72     0.62          0.74

  SCC outperforms LM.
  25. Summary of results. SCC performs significantly better than LM; advanced learners are not always better; change types do not yield extra discriminatory power; predicting the number of bugs is “possible”. More information: “Comparing Fine-Grained Source Code Changes And Code Churn For Bug Prediction”, MSR 2011.
  26. What is next? Analysis of the effect(s) of changes: what is the effect on the design? What is the effect on the quality? Ease understanding of changes. Recommender techniques: models that can provide feedback on the effects.
  27. (image-only slide)
  28. Can developer-module networks predict failures? Joint work with Nachi Nagappan and Brendan Murphy, Microsoft Research.
  29. Research question: are binaries with fragmented contributions from many developers more likely to have post-release failures? Should developers focus on one thing?
  30. Study with the MS Vista project. Data: released in January 2007; >4 years of development; several thousand developers; several thousand binaries (*.exe, *.dll); several million commits.
  31. Approach in a nutshell: from the change logs, build the contribution network of developers (Alice, Eric, Bob, Dan, Hin in the example) and binaries (a, b, c); from the bug data, count #bugs per binary; compute a centrality value per binary, e.g.:

      Binary   #bugs   centrality
      a        12      0.9
      b        7       0.5
      c        3       0.2

  Then regression analysis, validated with data splitting.
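“Validation with data splitting” here means repeating random splits and averaging out-of-sample quality; a sketch with 50 splits on synthetic data:

```python
# Sketch: 50 random two-thirds/one-third splits, average test R^2.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
centrality = rng.random((300, 1))                    # one measure per binary
failures = 10 * centrality[:, 0] + rng.normal(0, 1, 300)

scores = []
for seed in range(50):
    X_tr, X_te, y_tr, y_te = train_test_split(
        centrality, failures, test_size=1/3, random_state=seed)
    scores.append(LinearRegression().fit(X_tr, y_tr).score(X_te, y_te))
print(round(float(np.mean(scores)), 2))              # mean out-of-sample R^2
```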
  32. Contribution network: developers linked to the Windows binaries (*.dll) they contribute to. Which binary is failure-prone?
  33. Measuring fragmentation: Freeman degree, closeness, Bonacich’s power.
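Two of these measures are easy to reproduce on a toy contribution network with networkx (edges loosely follow the slide's example; the weights are illustrative). networkx has no built-in Bonacich power measure; its eigenvector and Katz centralities are close relatives:

```python
# Sketch: fragmentation measures on a developer-binary network.
import networkx as nx

g = nx.Graph()
g.add_weighted_edges_from([
    ("Alice", "a", 6), ("Alice", "b", 6), ("Eric", "b", 5),
    ("Bob", "a", 2), ("Bob", "c", 4), ("Dan", "c", 4), ("Hin", "c", 7),
])

for binary in ["a", "b", "c"]:
    print(binary,
          g.degree(binary),                          # Freeman degree
          round(nx.closeness_centrality(g, binary), 2))
```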
  34. Research hypotheses. H1: binaries with fragmented contributions are failure-prone. H2: fragmentation correlates positively with the number of post-release failures. H3: advanced fragmentation measures improve failure estimation.
  35. Correlation analysis. Spearman rank correlations among failures and fragmentation measures; all correlations are significant at the 0.01 level (2-tailed):

                   nrCommits  nrAuthors  Power  dPower  Closeness  Reach  Betweenness
      Failures     0.700      0.699      0.692  0.740   0.747      0.746  0.503
      nrCommits               0.704      0.996  0.773   0.748      0.732  0.466
      nrAuthors                          0.683  0.981   0.914      0.944  0.830
      Power                                     0.756   0.732      0.714  0.439
      dPower                                            0.943      0.964  0.772
      Closeness                                                    0.990  0.738
      Reach                                                               0.773
  36. H1: Predicting failure-prone binaries. Binary logistic regression over 50 random splits, using 4 principal components derived from the 7 centrality measures. (Plots of precision, recall, and AUC across the 50 splits.)
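A sketch of that pipeline, reducing seven correlated measures to four components before the logistic fit (matrix contents are synthetic):

```python
# Sketch: PCA on 7 centrality measures, then logistic regression (H1).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
X = rng.random((500, 7))                             # 7 measures per binary
y = (X @ rng.random(7) + rng.normal(0, 0.3, 500) > 2).astype(int)

clf = make_pipeline(StandardScaler(), PCA(n_components=4),
                    LogisticRegression())
print(round(clf.fit(X, y).score(X, y), 2))           # in-sample accuracy
```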
  37. H2: Predicting the number of failures. Linear regression over 50 random splits: #Failures = b0 + b1*nCloseness + b2*nrAuthors + b3*nrCommits. (Plots of R-square, Pearson, and Spearman correlation across the 50 splits.) All correlations are significant at the 0.01 level (2-tailed).
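The H2 model is ordinary least squares with three predictors; a sketch recovering b0..b3 from synthetic data:

```python
# Sketch: #Failures = b0 + b1*nCloseness + b2*nrAuthors + b3*nrCommits.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(4)
X = rng.random((200, 3))             # nCloseness, nrAuthors, nrCommits
failures = X @ [5.0, 2.0, 1.0] + rng.normal(0, 0.5, 200)

reg = LinearRegression().fit(X, failures)
print(round(reg.intercept_, 2), reg.coef_.round(2))  # b0 and (b1, b2, b3)
```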
  38. H3: Basic vs. advanced measures. Comparing a model with nrAuthors and nrCommits against a model with nCloseness, nrAuthors, and nrCommits. (Plots of R-square and Spearman correlation across the 50 splits for both models.)
  39. Summary of results. Centrality measures can predict more than 83% of the failure-prone Vista binaries. Closeness, nrAuthors, and nrCommits can predict the number of post-release failures. Closeness or Reach can improve prediction of the number of post-release failures by 32%. More information: “Can Developer-Module Networks Predict Failures?”, FSE 2008.
  40. What can we learn from that? Increase testing effort for central binaries? Yes. Re-factor central binaries? Maybe. Re-organize contributions? Maybe.
  41. What is next? Analysis of the contributions of a developer: who is working on which parts of the system? What exactly is the contribution of a developer? Who is introducing bugs/smells, and how can we avoid that? Global distributed software engineering: what are the contributions of teams, which smells occur, and how can we avoid them? Can we empirically prove Conway’s Law? Expert recommendation: whom to ask for advice on a piece of code?
  42. Ideas for software upgrade research. 1. Mine software repositories to identify the upgrade-critical components: what are the characteristics of such components (product and process measures)? What are the characteristics of the target environments (hardware, operating system, configuration)? Train a model with these characteristics and reported bugs.
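A hypothetical sketch of this idea; every feature name and all data are invented, and only the shape of the model matters:

```python
# Hypothetical sketch: classify components as upgrade-critical from a
# process measure plus a target-environment characteristic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(5)
churn = rng.poisson(50, 300)                  # process measure
nr_target_os = rng.integers(1, 6, 300)        # environment characteristic
X = np.column_stack([churn, nr_target_os])
y = (churn * nr_target_os + rng.normal(0, 40, 300) > 150).astype(int)

model = RandomForestClassifier(random_state=0).fit(X, y)
print(dict(zip(["churn", "nr_target_os"],
               model.feature_importances_.round(2))))
```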
  43. Further ideas for research. Who is upgrading which applications, and when? Study the upgrade behavior of users. What is the environment of the users when they upgrade? Where did it work, where did it fail? Collect crash reports for software upgrades. Upgrades in distributed applications: finding the optimal time to upgrade which component.
  44. Conclusions. (Recap of the two studies: the asymptotic #Bugs-vs-#SCC fit for Team Core and the developer-binary contribution network.) Questions? Martin Pinzger, m.pinzger@tudelft.nl
