
Automatically Generated Patches as Debugging Aids: A Human Study (FSE 2014)

Yida's FSE presentation.

  1. Automatically Generated Patches as Debugging Aids: A Human Study. Yida Tao, Jindae Kim, Sunghun Kim (Dept. of CSE, The Hong Kong University of Science and Technology); Chang Xu (State Key Lab for Novel Software Technology, Nanjing University)
  2. Automatic Program Repair • Promising research progress • ClearView [1] prevented all 10 Firefox exploits • GenProg [2] fixed 55 of 105 real bugs. [1] Perkins et al., “Automatically Patching Errors in Deployed Software,” SOSP’09. [2] Le Goues et al., “A Systematic Study of Automated Program Repair: Fixing 55 out of 105 Bugs for $8 Each,” ICSE’12.
  3. Automatic Program Repair
  4. Automatic Program Repair: “It won't get your bug patched any quicker. You’ll just have shifted the coders' attention away from their own app's bugs, and onto the repair tool’s bugs.” (Slashdot discussion: http://science.slashdot.org/story/09/10/29/2248246/Fixing-Bugs-But-Bypassing-the-Source-Code)
  6. #what-could-possibly-go-wrong #program-out-of-control • Blackbox repair • Increasing maintenance cost • Vulnerable to attack. Sources: Slashdot discussion (http://science.slashdot.org/story/09/10/29/2248246/Fixing-Bugs-But-Bypassing-the-Source-Code); “A Human Study of Patch Maintainability,” ISSTA’12; “Automatic Patch Generation Learned from Human-Written Patches,” ICSE’13.
  8. Use automatically generated patches as debugging aids. Our human study: • Investigates the usefulness of generated patches as debugging aids • Discusses the impact of patch quality on debugging performance • Explores practitioners’ feedback on adopting automatic program repair
  9. Methodology
  10. Study setup: participants are given a debugging aid and use it to debug bugs.
  14. Debugging aids: low-quality generated patch, high-quality generated patch, buggy method location
  15. 95 participants: 44 CS graduate students (Grad), 23 Amazon Mechanical Turk workers (MTurk), and 28 industrial software engineers (Engr)
  19. 44 graduate students • Between-group design: three groups of 14, 15, and 15 students, each given one debugging aid (low-quality generated patch, high-quality generated patch, or buggy method location) • Onsite setting • Eclipse IDE • Supervised session
  21. Remote participants (28 Engr + 23 MTurk) • Within-group design with the same three debugging aids (low-quality generated patch, high-quality generated patch, buggy method location) • Online debugging system
  23. Bug selection criteria • Real bugs • Each bug has accepted patches written by developers • An appropriate number of bugs • Each bug has generated patches of differing quality
  27. Generated patches of different quality (Kim et al., “Automatic Patch Generation Learned from Human-Written Patches,” ICSE’13). Two auto-generated patches for the same code, with their average ranking from 85 developers and students (lower is better):
      Auto-generated patch A, high-quality patch (avg. ranking 1.6):
          for (int i = 0; i < parenCount; i++) {
              SubString sub = (SubString) parens.get(i);
              if (sub != null) {
                  args[i+1] = sub.toString();
              }
          }
      Auto-generated patch B, low-quality patch (avg. ranking 2.8):
          for (int i = 0; i < parenCount; i++) {
              SubString sub = (SubString) parens.get(i);
              args[parenCount+1] = new Integer(reImpl.leftContext.length);
          }
  31. Participants submitted 337 patches as their debugging outcome • Submitted patches by debugging aid: Location 109, LowQ 112, HighQ 116 • Submitted patches by bug: Bug1 66, Bug2 74, Bug3 59, Bug4 76, Bug5 62
  32. Evaluation of debugging performance
  36. Patch correctness • Passes the test cases • Matches the semantics of the original accepted patches • Judged by 3 evaluators
  37. Debugging time • Measured with an Eclipse plug-in and a website timer
  39. Multiple regression analysis. Dependent variables: correctness and debugging time. Independent variables $x_1, \ldots, x_4$: debugging aids, bugs, participant types, and programming experience.
      $$\text{correctness} = \alpha_0 + \alpha_1 x_1 + \alpha_2 x_2 + \alpha_3 x_3 + \alpha_4 x_4$$
      $$\text{debugging time} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4$$
  40. Post-study survey • Helpfulness of debugging aids • Difficulty of bugs • Opinions on using generated patches as debugging aids
  41. Results
  44. Finding 1: High-quality patches significantly improve debugging correctness. Percentage of correct patches: Location 48%, LowQ 33%, HighQ 71%. Positive coefficient = 1.25 (p-value = 0.00 < 0.05).
  47. Finding 2: Low-quality patches can undermine debugging correctness. Percentage of correct patches: Location 48%, LowQ 33%, HighQ 71%. Negative coefficient = -0.55 (p-value = 0.09, above the 0.05 threshold).
  50. Finding 3: High-quality patches are more useful for difficult bugs. (Figures: bug difficulty rating per bug; percentage of correct patches per bug by debugging aid: Location, LowQ, HighQ.) Bugs: Bug1 = Math-280, Bug2 = Rhino-114493, Bug3 = Rhino-192226, Bug4 = Rhino-217379, Bug5 = Rhino-76683.
  52. Finding 4: The type of debugging aid does not affect debugging time. (Figure: debugging time in minutes for Location, LowQ, and HighQ.)
  53. Finding 5: Other factors’ impact on debugging performance • Difficult bugs significantly slow down debugging • Engr and MTurk participants are more likely to debug correctly • Novices tend to benefit more from HighQ patches
  54. Finding 6: Participants consider high-quality generated patches much more helpful than low-quality patches (Mann-Whitney U test, p-value = 0.001). (Figure: helpfulness ratings of the low-quality and high-quality generated patches on a five-point scale: Very helpful, Helpful, Medium, Slightly helpful, Not helpful.)
  55. Feedback
  58. Quick starting point • Points to the buggy area • Helps brainstorming. Confusing, incomplete, misleading • Wrong lead, especially for novices • Requires further human refinement. “They would seem to be useful in helping find various ideas around fixing the issue, even if the patch isn’t always correct on its own.”
  60. “Generated patches would be good at recognizing obvious problems” “…but may not recognize more involved defects.” “Generated patches simplify the problem” “…but they may over-simplify it by not addressing the root cause.”
  62. “I would use generated patches as debugging aids, as they provide extra diagnostic information” “…along with access to standard debugging tools.”
  64. Threats to validity • Bugs and generated patches may not be representative • The quality measure of generated patches may not generalize • Results may not generalize to domain experts • Participants might blindly reuse generated patches; to mitigate this, patches submitted in less than 1 minute were removed
  65. Takeaway • Auto-generated patches can be useful as debugging aids: participants fix bugs more correctly with high-quality auto-generated patches • Quality control is required: participants’ debugging correctness is compromised by low-quality generated patches • The benefits are greatest for difficult bugs and novice developers

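On slide 27, Patch A guards `args[i+1] = sub.toString()` with a null check. The Java sketch below is an analogy, not Rhino's code: it assumes the `null` case corresponds to a regular-expression capture group that did not take part in the match, and it uses `java.util.regex`, where `Matcher.group(int)` returns `null` for such a group. The class and method names (`NullGroupDemo`, `collectGroupsGuarded`, `collectGroupsUnguarded`) are hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NullGroupDemo {

    // In the spirit of Patch A: copy capture groups into args[1..parenCount],
    // skipping groups that did not participate in the match.
    static String[] collectGroupsGuarded(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        if (!m.find()) {
            return new String[0];
        }
        int parenCount = m.groupCount();
        String[] args = new String[parenCount + 1];
        args[0] = m.group(); // the whole match
        for (int i = 0; i < parenCount; i++) {
            String sub = m.group(i + 1); // null if group i+1 did not participate
            if (sub != null) {
                // toString() is redundant on a String; kept to mirror SubString.toString().
                args[i + 1] = sub.toString();
            }
        }
        return args;
    }

    // The same loop without the null check.
    static String[] collectGroupsUnguarded(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        if (!m.find()) {
            return new String[0];
        }
        int parenCount = m.groupCount();
        String[] args = new String[parenCount + 1];
        args[0] = m.group();
        for (int i = 0; i < parenCount; i++) {
            String sub = m.group(i + 1);
            args[i + 1] = sub.toString(); // throws NullPointerException when sub == null
        }
        return args;
    }

    public static void main(String[] argv) {
        Pattern p = Pattern.compile("(a)|(b)");
        // For input "b", group 1 "(a)" does not participate.
        System.out.println(Arrays.toString(collectGroupsGuarded(p, "b"))); // [b, null, b]
        try {
            collectGroupsUnguarded(p, "b");
        } catch (NullPointerException e) {
            System.out.println("unguarded loop threw NullPointerException");
        }
    }
}
```

With the pattern `(a)|(b)` and input `b`, the guarded loop leaves `args[1]` as `null`, while the unguarded loop fails on the first iteration.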
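The counts on slide 31 can be cross-checked: both breakdowns, by debugging aid and by bug, should account for the same 337 submitted patches. A minimal Java sketch of that arithmetic (the class and method names are illustrative):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SubmissionTally {

    // Submitted-patch counts by debugging aid, as listed on slide 31.
    static Map<String, Integer> byDebuggingAid() {
        Map<String, Integer> m = new LinkedHashMap<>();
        m.put("Location", 109);
        m.put("LowQ", 112);
        m.put("HighQ", 116);
        return m;
    }

    // Submitted-patch counts by bug, as listed on slide 31.
    static Map<String, Integer> byBug() {
        Map<String, Integer> m = new LinkedHashMap<>();
        m.put("Bug1", 66);
        m.put("Bug2", 74);
        m.put("Bug3", 59);
        m.put("Bug4", 76);
        m.put("Bug5", 62);
        return m;
    }

    // Sums all counts in a breakdown.
    static int total(Map<String, Integer> counts) {
        int sum = 0;
        for (int c : counts.values()) {
            sum += c;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println("by aid: " + total(byDebuggingAid())); // by aid: 337
        System.out.println("by bug: " + total(byBug()));          // by bug: 337
    }
}
```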
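The two models on slide 39 are linear in their coefficients, so one way to estimate them is ordinary least squares, solving the normal equations `(X^T X) b = X^T y`, where `X` has a leading column of 1s for the intercept. The sketch below is illustrative only: it assumes every independent variable is already a single numeric column (a categorical factor such as the debugging aid would in practice be dummy-coded into several columns), the data in `main` are made up, and the slide does not state which regression variant or software the study used. For a binary outcome such as correctness, logistic regression is a common alternative, and for larger problems a QR-based solver is numerically more stable than the normal equations.

```java
public class MultipleRegression {

    // Fits y = b0 + b1*x1 + ... + bk*xk by ordinary least squares.
    // x[r] holds the k predictor values of observation r; returns {b0, b1, ..., bk}.
    static double[] fit(double[][] x, double[] y) {
        int n = y.length;
        int p = x[0].length + 1; // number of coefficients, including the intercept
        // Augmented normal-equation matrix [X^T X | X^T y].
        double[][] a = new double[p][p + 1];
        for (int r = 0; r < n; r++) {
            double[] row = new double[p];
            row[0] = 1.0; // intercept column
            for (int j = 1; j < p; j++) {
                row[j] = x[r][j - 1];
            }
            for (int i = 0; i < p; i++) {
                for (int j = 0; j < p; j++) {
                    a[i][j] += row[i] * row[j];
                }
                a[i][p] += row[i] * y[r];
            }
        }
        // Gauss-Jordan elimination with partial pivoting.
        for (int col = 0; col < p; col++) {
            int pivot = col;
            for (int r = col + 1; r < p; r++) {
                if (Math.abs(a[r][col]) > Math.abs(a[pivot][col])) {
                    pivot = r;
                }
            }
            double[] tmp = a[col];
            a[col] = a[pivot];
            a[pivot] = tmp;
            if (Math.abs(a[col][col]) < 1e-12) {
                throw new IllegalArgumentException("predictors are collinear; X^T X is singular");
            }
            for (int r = 0; r < p; r++) {
                if (r != col) {
                    double f = a[r][col] / a[col][col];
                    for (int c = col; c <= p; c++) {
                        a[r][c] -= f * a[col][c];
                    }
                }
            }
        }
        double[] b = new double[p];
        for (int i = 0; i < p; i++) {
            b[i] = a[i][p] / a[i][i];
        }
        return b;
    }

    public static void main(String[] args) {
        // Hypothetical observations of four predictors x1..x4 (not data from the study).
        double[][] x = {
            {0, 0, 0, 0}, {1, 0, 0, 0}, {0, 1, 0, 0}, {0, 0, 1, 0},
            {0, 0, 0, 1}, {1, 1, 1, 1}, {2, 1, 0, 3}
        };
        // Responses generated exactly from y = 1 + 2*x1 - 0.5*x2 + 0.25*x3 + 3*x4.
        double[] y = new double[x.length];
        for (int r = 0; r < x.length; r++) {
            y[r] = 1 + 2 * x[r][0] - 0.5 * x[r][1] + 0.25 * x[r][2] + 3 * x[r][3];
        }
        double[] b = fit(x, y);
        for (int i = 0; i < b.length; i++) {
            System.out.printf("coefficient %d = %.4f%n", i, b[i]);
        }
    }
}
```

Because the made-up responses fit the model exactly, the recovered coefficients match the generating ones up to floating-point error; with real data the fit would also leave residuals.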
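Slide 54 compares helpfulness ratings of low-quality and high-quality patches with a Mann-Whitney U test. As a sketch of how the U statistic itself is computed, assuming the ratings are coded numerically (for example 1 = Not helpful up to 5 = Very helpful, which the slide does not specify), the class below ranks the pooled ratings, gives tied values their average rank, and subtracts the smallest possible rank sum. The ratings in `main` are made up and are not the study's data; converting U into a p-value (normal approximation with a tie correction, or exact tables) is not shown.

```java
import java.util.Arrays;

public class MannWhitney {

    // Mann-Whitney U statistic for sample a against sample b:
    // U_a = R_a - n_a(n_a + 1)/2, where R_a is the sum of the ranks of a's values
    // in the pooled, sorted sample; tied values share their average (mid) rank.
    static double uStatistic(double[] a, double[] b) {
        double[] pooled = new double[a.length + b.length];
        System.arraycopy(a, 0, pooled, 0, a.length);
        System.arraycopy(b, 0, pooled, a.length, b.length);
        Arrays.sort(pooled);
        double rankSumA = 0;
        for (double v : a) {
            rankSumA += midRank(pooled, v);
        }
        return rankSumA - a.length * (a.length + 1) / 2.0;
    }

    // Average 1-based rank of value v in an ascending array that contains v.
    static double midRank(double[] sorted, double v) {
        int first = -1;
        int last = -1;
        for (int i = 0; i < sorted.length; i++) {
            if (sorted[i] == v) {
                if (first < 0) {
                    first = i;
                }
                last = i;
            }
        }
        return (first + last) / 2.0 + 1;
    }

    public static void main(String[] args) {
        // Hypothetical ratings (1 = Not helpful .. 5 = Very helpful), illustrative only.
        double[] lowQ = {1, 2, 2, 3, 2, 1};
        double[] highQ = {4, 3, 5, 4, 3, 4};
        double uLow = uStatistic(lowQ, highQ);
        double uHigh = uStatistic(highQ, lowQ);
        // Sanity check: U_low + U_high always equals n_low * n_high.
        System.out.println(uLow + " + " + uHigh + " = " + (uLow + uHigh)); // 1.0 + 35.0 = 36.0
    }
}
```

A small U for one group relative to `n_low * n_high` indicates that its ratings tend to rank below the other group's.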
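Slide 64 mentions removing patches submitted in less than 1 minute to mitigate blind reuse of generated patches. A minimal sketch of such a filter, assuming each submission records its debugging time in seconds and that a submission of exactly 60 seconds is kept; the `Submission` class, its fields, and the method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class QuickSubmissionFilter {

    // Hypothetical record of one submitted patch; field names are illustrative only.
    static final class Submission {
        final String participantId;
        final String debuggingAid;
        final long debuggingSeconds;

        Submission(String participantId, String debuggingAid, long debuggingSeconds) {
            this.participantId = participantId;
            this.debuggingAid = debuggingAid;
            this.debuggingSeconds = debuggingSeconds;
        }
    }

    static final long MIN_SECONDS = 60; // "less than 1 minute" threshold

    // Keeps only submissions whose debugging time is at least one minute.
    static List<Submission> dropQuickSubmissions(List<Submission> all) {
        return all.stream()
                .filter(s -> s.debuggingSeconds >= MIN_SECONDS)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Submission> all = new ArrayList<>();
        all.add(new Submission("p01", "HighQ", 45));     // dropped: under a minute
        all.add(new Submission("p02", "LowQ", 60));      // kept: exactly one minute
        all.add(new Submission("p03", "Location", 900)); // kept
        List<Submission> kept = dropQuickSubmissions(all);
        System.out.println(kept.size() + " of " + all.size() + " submissions kept"); // 2 of 3 submissions kept
    }
}
```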