Defect, defect, defect: PROMISE 2012 Keynote

Defect prediction leveraging software repositories has received a tremendous amount of attention within the software engineering community, including PROMISE. In this talk, I will first present major achievements in defect prediction research, including new prediction features, promising algorithms, and interesting analysis results. However, many challenges remain in defect prediction. I will discuss these challenges and potential solutions to them, leveraging defect prediction 2.0.


Defect, defect, defect: PROMISE 2012 Keynote

  1. 1. Keynote: Defect, Defect, Defect. Sung Kim, The Hong Kong University of Science and Technology
  2. 2. Program Analysis and Mining (PAM) Group
  3. 3. Program Analysis and Mining (PAM) Group
  4. 4. The First Bug September 9, 1947
  5. 5. More Bugs
  6. 6. Finding Bugs: Verification, Testing, Prediction
  7. 7. Defect Prediction: Program → Tool → Future defects (e.g., predicted counts 42, 24, 14)
  8. 8. Why Prediction?
  9. 9. Defect Prediction Model: D = 4.86 + 0.018L. F. Akiyama, "An Example of Software System Debugging," Information Processing, vol. 71, 1971
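To make Akiyama's linear model concrete, here is a minimal sketch (Python is chosen for illustration; the function name is mine, not from the talk):

```python
def akiyama_defects(loc: int) -> float:
    """Akiyama's 1971 linear model: estimated defects D = 4.86 + 0.018 * L,
    where L is the module size in lines of code."""
    return 4.86 + 0.018 * loc

# Example: a 1,000-line module is estimated to contain ~22.9 defects.
print(akiyama_defects(1000))
```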
  10. 10. Defect Prediction: Identifying New Metrics, Developing New Algorithms, Various Granularities
  11. 11. Defect Prediction: Identifying New Metrics, Developing New Algorithms, Various Granularities
  12. 12. Complex Files: a simple file vs. a complex file. Ostrand and Weyuker; Basili et al., TSE 1996; Ohlsson and Alberg, TSE 1996; Menzies et al., TSE 2007
  13. 13. Complex Files: a simple file vs. a complex file. Ostrand and Weyuker; Basili et al., TSE 1996; Ohlsson and Alberg, TSE 1996; Menzies et al., TSE 2007
  14. 14. Changes. Bell et al., PROMISE 2011; Moser et al., ICSE 2008; Nagappan et al., ICSE 2006; Hassan et al., ICSM 2005
  15. 15. Changes. Bell et al., PROMISE 2011; Moser et al., ICSE 2008; Nagappan et al., ICSE 2006; Hassan et al., ICSM 2005
  16. 16. View/Edit Patterns. Lee et al., FSE 2011
  17. 17. Slide by Mik Kersten. “Mylyn – The task-focused interface” (December 2007, http://live.eclipse.org)
  18. 18. With Mylyn: tasks are integrated; see only what you are working on. Slide by Mik Kersten, “Mylyn – The task-focused interface” (December 2007, http://live.eclipse.org)
  19. 19. * Eclipse plug-in storing and recovering task contexts
  20. 20. * Eclipse plug-in storing and recovering task contexts
  21. 21. <InteractionEvent … Kind=“ ” … StartDate=“ ” EndDate=“ ” … StructureHandle=“ ” … Interest=“ ” … > * Eclipse plug-in storing and recovering task contexts
  22. 22. Burst Edits/Views. Lee et al., FSE 2011
  23. 23. Burst Edits/Views. Lee et al., FSE 2011
  24. 24. Change Entropy: the number of changes in a period (e.g., a week) per file; changes concentrated in one file give low entropy, changes spread evenly across many files give high entropy. Hassan, "Predicting Faults Using the Complexity of Code Changes," ICSE 2009
  25. 25. Change Entropy: the number of changes in a period (e.g., a week) per file; changes concentrated in one file give low entropy, changes spread evenly across many files give high entropy. Hassan, "Predicting Faults Using the Complexity of Code Changes," ICSE 2009
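A minimal sketch of the entropy idea behind this metric, assuming changes are simply counted per file in one period (this is an illustration, not Hassan's exact change-complexity definition):

```python
import math

def change_entropy(change_counts):
    """Shannon entropy of the change distribution across files in one period.
    change_counts: list of per-file change counts, e.g. [11, 1, 1, 1, 1].
    Higher entropy means changes are scattered across many files."""
    total = sum(change_counts)
    probs = [c / total for c in change_counts if c > 0]
    return -sum(p * math.log2(p) for p in probs)

print(change_entropy([11, 1, 1, 1, 1]))   # low entropy: one file dominates
print(change_entropy([3, 3, 3, 3, 3]))    # high entropy: changes spread evenly
```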
  26. 26. Previous Fixes Hassan et al., ICSM 2005, Kim et al., ICSE 2007
  27. 27. Previous Fixes Hassan et al., ICSM 2005, Kim et al., ICSE 2007
  28. 28. Previous Fixes Hassan et al., ICSM 2005, Kim et al., ICSE 2007
  29. 29. Network. Zimmermann and Nagappan, "Predicting Defects using Network Analysis on Dependency Graphs," ICSE 2008
  30. 30. Network. Zimmermann and Nagappan, "Predicting Defects using Network Analysis on Dependency Graphs," ICSE 2008
  31. 31. More Metrics (bar chart: # of publications in the last 7 years, by metric): source code metrics such as Complexity (Size), CK, McCabe, OO, Halstead, and Entropy; process metrics such as Developer count metrics, Change metrics, Entropy of changes (Change Complexity), Churn, # of changes to the file, and Previous defects; Network measures and Calling structure attributes
  32. 32. Defect Prediction: Identifying New Metrics, Developing New Algorithms, Various Granularities
  33. 33. Classification: training instances (complexity metrics, historical metrics, ... plus labels) feed a learner, which predicts the class of a new instance
  34. 34. Regression: training instances (metrics plus values) feed a learner, which predicts values for a new instance
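A hedged sketch of this setup using scikit-learn (an assumption on my part; the metric values and labels below are invented). Replacing the labels with defect counts and the classifier with a regressor (e.g., RandomForestRegressor) gives the regression variant:

```python
from sklearn.ensemble import RandomForestClassifier

# Hypothetical training data: one row per file, columns are metrics
# (e.g. lines of code, cyclomatic complexity, # of past changes, # of past fixes).
X_train = [
    [120,  4,  3,  0],
    [980, 35, 40,  6],
    [450, 12,  9,  1],
    [2100, 60, 75, 12],
]
y_train = [0, 1, 0, 1]  # labels mined from history: 1 = buggy, 0 = clean

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# Predict whether a new (unlabeled) file is defect-prone.
new_file = [[760, 28, 22, 3]]
print(clf.predict(new_file))         # classification: buggy or clean
print(clf.predict_proba(new_file))   # risk score usable for ranking
```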
  35. 35. Active Learning: an anomaly detection system with a refinement loop; the user gives feedback on the first few sorted bug reports and the refinement engine re-sorts the rest (Figure 4: Active Refinement Process). Lo et al., "Active Refinement of Clone Anomaly Reports," ICSE 2012; Lu et al., PROMISE 2012
  36. 36. Bug Cache: keep the ~10% most bug-prone files in a cache (load on miss, replacement policies); nearby files, e.g., co-changed files, are also cached. Kim et al., "Predicting Faults from Cached History," ICSE 2007
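A toy sketch of the caching idea, not the paper's full algorithm (the class, capacity, and LRU policy here are illustrative; the real FixCache also pre-fetches new and large files and evaluates several replacement policies):

```python
from collections import OrderedDict

class FixCache:
    """Toy sketch of a bug-cache with LRU replacement."""

    def __init__(self, capacity):
        self.capacity = capacity          # e.g. 10% of all files
        self.cache = OrderedDict()        # file -> None, ordered by recency

    def _touch(self, f):
        self.cache[f] = None
        self.cache.move_to_end(f)
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used file

    def on_fix(self, fixed_file, co_changed_files=()):
        hit = fixed_file in self.cache      # hit = the fault location was "predicted"
        self._touch(fixed_file)
        for f in co_changed_files:          # locality: nearby (co-changed) files get cached
            self._touch(f)
        return hit

cache = FixCache(capacity=3)
print(cache.on_fix("A.java", ["B.java"]))  # miss; A and B are now cached
print(cache.on_fix("B.java"))              # hit
```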
  37. 37. Algorithms (# of publications in the recent 7 years): Classification 21, Regression 18, Both 4, Etc. 4
  38. 38. Defect Prediction: Identifying New Metrics, Developing New Algorithms, Various Granularities
  39. 39. Module/Binary/Package Level
  40. 40. Module/Binary/Package Level
  41. 41. File Level
  42. 42. File Level
  43. 43. Method Level: void foo () { ... }. Hata et al., "Bug Prediction Based on Fine-Grained Module Histories," ICSE 2012
  44. 44. Method Level: void foo () { ... }. Hata et al., "Bug Prediction Based on Fine-Grained Module Histories," ICSE 2012
  45. 45. Change Level: the development history of a file (Rev 1 → Rev 2 → Rev 3 → Rev 4, each introducing a change). Did I just introduce a bug? Kim et al., "Classifying Software Changes: Clean or Buggy?" TSE 2009
  46. 46. Change Level: the development history of a file (Rev 1 → Rev 2 → Rev 3 → Rev 4, each introducing a change). Did I just introduce a bug? Kim et al., "Classifying Software Changes: Clean or Buggy?" TSE 2009
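A rough sketch of how a change might be turned into features for clean/buggy classification: bag-of-words from the diff and log message plus simple metadata counts (the feature names are illustrative, not the paper's exact feature set):

```python
import re
from collections import Counter

def change_features(diff_text, log_message, files_touched, lines_added, lines_deleted):
    """Very small feature extractor for change classification."""
    words = re.findall(r"[A-Za-z_]+", diff_text + " " + log_message)
    features = Counter(w.lower() for w in words)      # e.g. {'if': 1, 'null': 1, ...}
    features["__files_touched"] = files_touched       # simple metadata features
    features["__lines_added"] = lines_added
    features["__lines_deleted"] = lines_deleted
    return features

f = change_features("if (exec != null) invalidate();", "fix NPE in header", 1, 2, 1)
print(f.most_common(5))   # feed vectors like this to any classifier
```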
  47. 47. More Granularities (# of publications in the recent 7 years): Project/Release/SubSystem 3, Component/Module 8, Package 3, File 19, Class 8, Function/Method 2, Change/Hunk level 1
  48. 48. Defect Prediction Summary: Identifying New Metrics, Developing New Algorithms, Various Granularities
  49. 49. Performance across systems: Apache, ArgoUML, Eclipse, an embedded system, a healthcare system, Microsoft, and Mozilla. Hall et al., "A Systematic Review of Fault Prediction Performance in Software Engineering," TSE 2011 (Figure 2)
  50. 50. Performance by granularity of the results: class, file, module, other (for example plug-ins, binaries). Hall et al., "A Systematic Review of Fault Prediction Performance in Software Engineering," TSE 2011 (Figure 6)
  51. 51. Performance by granularity of the results: class, file, module, other (for example plug-ins, binaries). Hall et al., "A Systematic Review of Fault Prediction Performance in Software Engineering," TSE 2011 (Figure 6)
  52. 52. Defect prediction totally works!
  53. 53. Defect prediction totally works!
  54. 54. Done? Then why are we not using it?
  55. 55. Detailed To-Fix List vs. Buggy Modules
  56. 56. Detailed To-Fix List vs. Buggy Modules
  57. 57. This is what developers want!
  58. 58. Defect Prediction 2.0: Finer Granularity, Noise Handling, New Customers
  59. 59. Defect Prediction 2.0: Finer Granularity, Noise Handling, New Customers
  60. 60. FindBugs http://findbugs.sourceforge.net/
  61. 61. Performance of Bug Detection Tools: precision (%) of priority-1 warnings from FindBugs, jLint, and PMD. Kim and Ernst, "Which Warnings Should I Fix First?" FSE 2007
  62. 62. RQ1: How Many False Negatives? Defects may be missed, partially captured, or fully captured. Warnings from a tool should also correctly explain in detail why a flagged line may be faulty. How many one-line defects are captured and explained reasonably well (so-called "strictly captured")? Very high miss rates! Thung et al., "To What Extent Could We Detect Field Defects?" ASE 2012
  63. 63. RQ1: How Many False Negatives? Defects may be missed, partially captured, or fully captured. Warnings from a tool should also correctly explain in detail why a flagged line may be faulty. How many one-line defects are captured and explained reasonably well (so-called "strictly captured")? Very high miss rates! Thung et al., "To What Extent Could We Detect Field Defects?" ASE 2012
  64. 64. Line Level Defect Prediction
  65. 65. Line Level Defect Prediction We have seen this bug in revision 100
  66. 66. Bug Fix Memories: extract patterns from the bug fix changes in revisions 1..n-1 into a memory of bug-fix change history. Kim et al., "Memories of Bug Fixes," FSE 2006
  67. 67. Bug Fix Memories: given code to examine, search for matching patterns in the memory built from the bug fix changes in revisions 1..n-1. Kim et al., "Memories of Bug Fixes," FSE 2006
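A crude sketch of the memory idea, assuming fix patterns are just normalized versions of the lines removed by past fixes (the normalization below is deliberately simplistic and only illustrative):

```python
import re

def normalize(line):
    """Abstract away identifiers and literals so similar code maps to one pattern."""
    line = re.sub(r'".*?"', 'STR', line)          # string literals
    line = re.sub(r'\b\d+\b', 'NUM', line)        # numeric literals
    line = re.sub(r'\b[a-z]\w*\b', 'ID', line)    # lowercase identifiers
    return re.sub(r'\s+', ' ', line).strip()

# Build the memory from buggy lines removed by past fixes (revisions 1..n-1).
memory = {normalize(l) for l in [
    'if (exec != null)',
    'smartUpdate("colspan", Integer.toString(colspan));',
]}

def suspicious(code_line):
    """Flag a line if it matches a pattern we have seen fixed before."""
    return normalize(code_line) in memory

print(suspicious('if (owner != null)'))  # True: same shape as a previously fixed line
```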
  68. 68. Fix Wizard: recurring bug fixes, e.g., the same fix applied to both setColspan and setRowspan (Figure 1: Bug Fixes at v5088-v5089 in ZK). Nguyen et al., "Recurring Bug Fixes in Object-Oriented Programs," ICSE 2010
  69. 69. Fix Wizard: graph-based object usages capture the recurring fix, e.g., matching usages of WrongValueException.<init>, Executions.getCurrent, Execution.isExplorer, Auxheader.invalidate, and Auxheader.smartUpdate in setColspan and setRowspan (Figure 2: Graph-based Object Usages for Figure 1). Nguyen et al., "Recurring Bug Fixes in Object-Oriented Programs," ICSE 2010
  70. 70. Word Level Defect Prediction
  71. 71. Word Level Defect Prediction Fix suggestion ...
  72. 72. Defect Prediction 2.0: Finer Granularity, Noise Handling, New Customers
  73. 73. Source Repository and Bug Database: all commits C in the source repository; all bugs B and fixed bugs Bf in the bug database. Bird et al., "Fair and Balanced? Bias in Bug-Fix Datasets," FSE 2009
  74. 74. Source Repository and Bug Database: all commits C, all bugs B, fixed bugs Bf; fixes are linked to bugs via log messages. Bird et al., "Fair and Balanced? Bias in Bug-Fix Datasets," FSE 2009
  75. 75. Source Repository and Bug Database: all commits C, all bugs B, fixed bugs Bf, linked fixed bugs Bfl (linked via log messages). Bird et al., "Fair and Balanced? Bias in Bug-Fix Datasets," FSE 2009
  76. 76. Source Repository and Bug Database: all commits C, all bugs B, fixed bugs Bf, linked fixed bugs Bfl, linked fixes Cfl (linked via log messages). Bird et al., "Fair and Balanced? Bias in Bug-Fix Datasets," FSE 2009
  77. 77. Source Repository and Bug Database: some commits and bugs are related but not linked. Bird et al., "Fair and Balanced? Bias in Bug-Fix Datasets," FSE 2009
  78. 78. Source Repository and Bug Database: the bug fixes Cf include the linked fixes Cfl as well as related-but-not-linked commits. Bird et al., "Fair and Balanced? Bias in Bug-Fix Datasets," FSE 2009
  79. 79. Source Repository and Bug Database: the unlinked bug fixes are noise! Bird et al., "Fair and Balanced? Bias in Bug-Fix Datasets," FSE 2009
  80. 80. How resistant is a defect prediction model to noise? (Buggy F-measure vs. training-set false negative (FN) and false positive (FP) rate for SWT, Debug, Columba, Eclipse, and Scarab.) Kim et al., "Dealing with Noise in Defect Prediction," ICSE 2011
  81. 81. How resistant is a defect prediction model to noise? (Buggy F-measure vs. training-set false negative (FN) and false positive (FP) rate for SWT, Debug, Columba, Eclipse, and Scarab.) Kim et al., "Dealing with Noise in Defect Prediction," ICSE 2011
  82. 82. How resistant is a defect prediction model to noise? (Same plot, with the 20% noise level highlighted.) Kim et al., "Dealing with Noise in Defect Prediction," ICSE 2011
  83. 83. Closest List Noise Identification (CLNI): pseudo-code of the CLNI algorithm (Figure 9). Kim et al., "Dealing with Noise in Defect Prediction," ICSE 2011
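A hedged sketch in the spirit of CLNI, using a plain nearest-neighbour disagreement check: instances whose closest neighbours in metric space mostly carry the opposite label are flagged as likely label noise (the distance measure, k, and threshold are illustrative, not the paper's exact procedure):

```python
import numpy as np

def clni_noise_candidates(X, y, k=2, threshold=0.6):
    """Flag instances whose k nearest neighbours mostly disagree with their label.
    X: (n, m) metric matrix, y: 0/1 labels. Returns indices of suspected noise."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    noisy = []
    for i in range(len(X)):
        dists = np.linalg.norm(X - X[i], axis=1)
        neighbours = np.argsort(dists)[1:k + 1]      # skip the instance itself
        disagree = np.mean(y[neighbours] != y[i])
        if disagree >= threshold:
            noisy.append(i)
    return noisy

# Instances flagged here would be relabelled or dropped before training.
X = [[10, 1], [12, 1], [11, 1], [200, 30]]
y = [0, 1, 0, 1]
print(clni_noise_candidates(X, y))   # [1]: a clean-looking file labelled buggy
```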
  84. 84. Noise detection performance (noise level = 20%): Debug: precision 0.681, recall 0.871, F-measure 0.764; SWT: precision 0.624, recall 0.830, F-measure 0.712. Kim et al., "Dealing with Noise in Defect Prediction," ICSE 2011
  85. 85. Bug prediction using cleaned data: SWT F-measure vs. noise level (0% to 45%), noisy data
  86. 86. Bug prediction using cleaned data: SWT F-measure vs. noise level (0% to 45%), noisy vs. cleaned data
  87. 87. Bug prediction using cleaned data: SWT F-measure vs. noise level (0% to 45%), noisy vs. cleaned data; 76% F-measure with 45% noise
  88. 88. ReLink: traditional heuristics (a link miner) connect the source code repository and bug database but leave unknown links; ReLink recovers links using features and combines them with the traditional links. Wu et al., "ReLink: Recovering Links between Bugs and Changes," FSE 2011
  89. 89. ReLink: traditional heuristics (a link miner) connect the source code repository and bug database but leave unknown links; ReLink recovers links using features and combines them with the traditional links. Wu et al., "ReLink: Recovering Links between Bugs and Changes," FSE 2011
  90. 90. ReLink Performance: F-measure of traditional linking vs. ReLink on ZXing, OpenIntents, and Apache. Wu et al., "ReLink: Recovering Links between Bugs and Changes," FSE 2011
  91. 91. Label Historical Changes: the change message "fix for bug 28434" identifies a fix, so Rev 101 (with BUG) is labeled buggy and Rev 102 (no BUG) clean in the development history of a file. Fischer et al., "Populating a Release History Database from Version Control and Bug Tracking Systems," ICSM 2003
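A minimal sketch of the traditional link-mining heuristic that this labeling relies on: scan commit messages for bug-report references such as "fix for bug 28434" (the regular expression and data layout are illustrative):

```python
import re

BUG_REF = re.compile(r"(?:fix(?:ed|es)?\s+(?:for\s+)?)?bug[s]?\s*#?\s*(\d+)", re.I)

def link_commits_to_bugs(commits):
    """commits: iterable of (revision, message). Returns {revision: [bug ids]}."""
    links = {}
    for rev, message in commits:
        ids = BUG_REF.findall(message)
        if ids:
            links[rev] = ids   # this revision is a fix; earlier revisions carry the bug
    return links

history = [(101, "fix for bug 28434"), (102, "update copyright headers")]
print(link_commits_to_bugs(history))   # {101: ['28434']}
```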
  92. 92. Atomic Change: the change message "fix for bug 28434" covers a single fix touching setText("t") and insertTab() between Rev 101 (with BUG) and Rev 102 (no BUG). Fischer et al., "Populating a Release History Database from Version Control and Bug Tracking Systems," ICSM 2003
  93. 93. Composite Change: a single commit containing multiple hunks, e.g., JFreeChart revision 1083 with four hunks across addOrUpdate(RegularTimePeriod, double), createCopy(RegularTimePeriod, RegularTimePeriod), and equals(Object) (Figure 5: JFreeChart revision 1083). Tao et al., "How Do Software Engineers Understand Code Changes?" FSE 2012
  94. 94. Defect Prediction 2.0: Finer Granularity, Noise Handling, New Customers
  95. 95. Warning Developers: "Safe" files (predicted as not buggy) vs. "Risky" files (predicted as buggy)
  96. 96. Change Classification: the development history of a file (Rev 1 → Rev 2 → Rev 3 → Rev 4, each introducing a change). Kim et al., "Classifying Software Changes: Clean or Buggy?" TSE 2009
  97. 97. Change Classification: changes to "Safe" files vs. changes to "Risky" files
  98. 98. Change Classification: changes to "Safe" files vs. changes to "Risky" files
  99. 99. Defect-prediction-based Change Classification: F-measure of CC vs. Cached CC on Debug UI, JDT, JEdit, PDE, POI, and Team UI
  100. 100. Warning Developers: "Safe" locations (predicted as not buggy) vs. "Risky" locations (predicted as buggy)
  101. 101. Test-case Selection
  102. 102. Test-case Selection Executing test cases
  103. 103. Test-case Selection: APFD across releases R1.0 to R1.5 for Baseline, History1, and History2. Runeson and Ljung, "Improving Regression Testing Transparency and Efficiency with History-Based Prioritization," ICST 2011
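The y-axis in this study is APFD (average percentage of faults detected). A small sketch of the standard APFD computation for a given test ordering (the test and fault identifiers below are invented):

```python
def apfd(order, faults_detected_by):
    """APFD = 1 - (sum of first-detection positions) / (n * m) + 1 / (2n),
    where n is the number of tests and m the number of faults.
    order: test cases in execution order.
    faults_detected_by: {fault_id: set of tests that detect it}."""
    n, m = len(order), len(faults_detected_by)
    position = {t: i + 1 for i, t in enumerate(order)}
    first_detect = [min(position[t] for t in tests if t in position)
                    for tests in faults_detected_by.values()]
    return 1 - sum(first_detect) / (n * m) + 1 / (2 * n)

tests = ["t3", "t1", "t2"]                       # a candidate prioritization
faults = {"f1": {"t1"}, "f2": {"t2", "t3"}}      # which tests expose which faults
print(apfd(tests, faults))                        # higher is better
```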
  104. 104. Warning Prioritization
  105. 105. Warning Prioritization
  106. 106. Warning Prioritization: precision (%) over warning instances ordered by priority, history-based ordering vs. tool ordering. Kim and Ernst, "Which Warnings Should I Fix First?" FSE 2007
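A hedged sketch of history-based warning prioritization in this spirit: weight each warning category by how often past bug-fix changes removed warnings of that category, then rank new warnings by their category weight (category names and the weighting scheme are illustrative, not the paper's exact method):

```python
from collections import Counter

def category_weights(fix_removed_warnings):
    """fix_removed_warnings: list of warning categories eliminated by bug-fix commits.
    A category removed often during fixes is treated as more likely to matter."""
    counts = Counter(fix_removed_warnings)
    total = sum(counts.values())
    return {cat: c / total for cat, c in counts.items()}

def prioritize(warnings, weights):
    """warnings: list of (location, category). Highest historical weight first."""
    return sorted(warnings, key=lambda w: weights.get(w[1], 0.0), reverse=True)

weights = category_weights(["NULL_DEREF", "NULL_DEREF", "UNUSED_IMPORT"])
print(prioritize([("Foo.java:10", "UNUSED_IMPORT"), ("Bar.java:42", "NULL_DEREF")], weights))
```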
  107. 107. Other Topics: • Explanation: why has it been predicted as defect-prone? • Cross-project prediction • Cost-effectiveness measures • Active Learning/Refinement
  108. 108. Defect Prediction 2.0: 1.0 = new metrics, algorithms, coarse granularity
  109. 109. Defect Prediction 2.0: 1.0 = new metrics, algorithms, coarse granularity; 2.0 = finer granularity, noise handling, new customers
  110. 110. Defect Prediction 2.0: 1.0 = new metrics, algorithms, coarse granularity; 2.0 = finer granularity, noise handling, new customers
  111. 111. 2013
  112. 112. MSR 2013: Back to roots. Tom Zimmermann (General Chair), Alberto Bacchelli (Mining Challenge Chair), Massimiliano Di Penta and Sung Kim (Program Co-chairs)
  113. 113. MSR 2013: Back to roots. Tom Zimmermann (General Chair), Alberto Bacchelli (Mining Challenge Chair), Massimiliano Di Penta and Sung Kim (Program Co-chairs). February 15
  114. 114. Some slides/data are borrowed with thanks from: • Tom Zimmermann, Chris Bird • Andreas Zeller • Ahmed Hassan • David Lo • Jaechang Nam, Yida Tao • Tien Nguyen • Steve Counsell, David Bowes, Tracy Hall and David Gray • Wen Zhang
