Impact of Tool Support in Patch Construction

  1. Impact of Tool Support in Patch Construction. Anil Koyuncu, Tegawendé F. Bissyandé, Dongsun Kim, Jacques Klein, Martin Monperrus, and Yves Le Traon
  2. Motivation
  3. Traditional Patch Construction: Detection, Localization, Generation
  4. Thanks to…: Test Automation; Automated Bug/Fault Localization; Program Repair; Manual Patching; Fully Automated Patching; Static/Dynamic Analysis
  5. Really? They use? Or don’t use? Adopted by whom? Patch → Accepted? Stable?
  6. Subject --- Linux Kernel Project: long-life software (1991 … 2017); ~15 million LOC; rich code repository; single problem per patch (Problem → Solution)
  7. Subject --- Data Sources: Bug Reports, Change History, Developer Discussion
  8. Patch Construction Processes • Process H (Human): fully manual • Process DLH (Detection Localization Human): automated localization (static/dynamic analysis), manual patch generation • Process HMG (Human Match Generation): manual design of patch template, automatic application
  9. H Patches • Identification based on direct link to Bugzilla IDs
  10. DLH Patches • Detection based on <tool> names
  11. DLH Patches – Tools [Figure: # of patches per tool by author date, 2005 to 2016. Tools: checkpatch, Sparse, Linux Driver Verification Project, Smatch, Coverity, cppcheck, Strace, Syzkaller]
  12. HMG Patches • Mentioning “coccinelle” or “semantic patch”
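The three identification rules above (Bugzilla link for H, tool name for DLH, “coccinelle” or “semantic patch” for HMG) can be sketched as a keyword classifier. This is only an illustrative sketch, not the authors' pipeline: it assumes the rules are applied to commit messages, and the function name `classify_patch`, the Bugzilla URL pattern, and the precedence order (HMG, then DLH, then H) are all assumptions. The tool list is taken from the DLH tools slide.

```python
import re

# Tool names as listed on the "DLH Patches – Tools" slide.
DLH_TOOLS = ("checkpatch", "sparse", "linux driver verification", "smatch",
             "coverity", "cppcheck", "strace", "syzkaller")
# Markers named on the "HMG Patches" slide.
HMG_MARKERS = ("coccinelle", "semantic patch")
# Assumed shape of a Bugzilla link; the slides only say "direct link to Bugzilla IDs".
BUGZILLA_LINK = re.compile(r"bugzilla\.kernel\.org/show_bug\.cgi\?id=\d+", re.IGNORECASE)


def classify_patch(message: str) -> str:
    """Label a commit message as HMG, DLH, H, or unclassified (hypothetical helper)."""
    text = message.lower()
    # Assumed precedence: template-based patches first, then tool-detected ones.
    if any(marker in text for marker in HMG_MARKERS):
        return "HMG"
    # Whole-word match so that e.g. "sparse" does not match inside "sparseness".
    if any(re.search(r"\b" + re.escape(tool) + r"\b", text) for tool in DLH_TOOLS):
        return "DLH"
    if BUGZILLA_LINK.search(message):
        return "H"
    return "unclassified"
```

Plain keyword matching is noisy (a message about "sparse files" would be labelled DLH), so a real study would likely need manual validation of a sample.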
  13. HMG Patches – Coccinelle • SmPL (Semantic Patch Language) for specifying desired matches and transformations in C code. [Figure: an SmPL specification and the patch derived from it]
  14. Dataset • Linux 2.6.12 (June 2005) to Linux 4.8 (October 2016): 616,291 commits • H Patches: 5758 commits • DLH Patches: 729 commits • HMG Patches: 4050 commits
  15. Temporal Distribution [Figure: # of patches by author date, 2004 to 2016, for H, DLH and HMG patches]
  16. Spatial Distribution [Figure: percentage of H, DLH and HMG patches per kernel subsystem (code directory): arch, drivers, fs, include, kernel, net, sound, staging, others]
  17. Research Questions • RQ1 - Community Reaction • RQ2 - Who is using? • RQ3 - Impact on Stability • RQ4 - Kind of Bugs
  18. RQ1 - Community Reaction • Delays in integrating commits • Gaps between proposed and integrated patches
  19. RQ1 - Commit Acceptance Delay • Finding: Integration of tool-supported patches is slower than that of traditional H patches. [Figure: delay between submission date and acceptance date]
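The acceptance delay can be illustrated as the difference between two timestamps. The sketch below assumes, without confirmation from the slides, that the git author date stands in for the submission date and the git commit date for the acceptance date (both are available as strict ISO 8601 strings via `git log --format='%aI %cI'`). The function name `acceptance_delay_days` is hypothetical.

```python
from datetime import datetime


def acceptance_delay_days(author_date: str, commit_date: str) -> float:
    """Days between submission (assumed: author date) and acceptance (assumed: commit date).

    Both arguments are ISO 8601 timestamps with a UTC offset,
    e.g. "2016-03-01T10:00:00+01:00".
    """
    submitted = datetime.fromisoformat(author_date)
    accepted = datetime.fromisoformat(commit_date)
    # Timezone-aware subtraction handles differing UTC offsets correctly.
    return (accepted - submitted).total_seconds() / 86400
```

Comparing the distribution of this value across H, DLH and HMG patches would give the kind of comparison the finding describes.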
  20. RQ1 - LKML mentioning HMG (Coccinelle) • Finding: The gap is closing; patches appear to be accepted. [Figure 7: # of patches submitted / discussed / accepted (series: patch submitted in LKML, accepted commit, developer/maintainer reply). (a) Data on checkpatch-related (DLH) patches. (b) Evolution of the gap (gap slope, linear regression, over % of timeline). (c) Data on coccinelle-related (HMG) patches. (d) Evolution of the gap.] Paper excerpt: … correlate this frequency on a monthly basis with the corresponding statistics on accepted DLH patches related to the specific tools. We have crawled all emails archived in the Linux Kernel Mailing List (LKML) using Scrapy. We use heuristics to differentiate mes…
  21. RQ2 - Profile of Patch Authors • Specialty • Commitment
  22. RQ2 - Specialty • Finding: HMG patches are often generated by less specialized developers. Paper excerpt: Speciality is defined as a metric for characterizing the extent to which a developer is focused on a specific subsystem. We compute it as the percentage of patches, among all her/his patches, which a developer contributes to a specific subsystem. Thus, speciality is measured with respect to each Linux code directory. We then draw, in Figure 8, the distributions of speciality metric values of developers for the different types of patches: e.g., for an automated patch applied to a file in a subsystem, we consider the commit author speciality w.r.t. that subsystem. [Figure 8: Speciality of developers vs. patch types (% of speciality for H, DLH and HMG patches), ranging from "Contribute to all modules" to "Focus on specific modules".] H patches are mostly provided by specialized developers. This may imply that the developers focus on implementing specific functionalities over time. Similarly, DLH patches appear to be mostly…
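The speciality definition above (share of a developer's patches that touch a given subsystem) is simple enough to write down directly. In this minimal sketch, the function name `speciality` and the input shape (one subsystem label per patch) are assumptions; how a patch touching several directories is counted is not stated in the slides.

```python
from collections import Counter


def speciality(patch_subsystems: list[str]) -> dict[str, float]:
    """Per-subsystem speciality of one developer, as a percentage.

    `patch_subsystems` holds one entry per patch authored by the developer:
    the kernel code directory (e.g. "drivers", "net") that the patch touches.
    """
    counts = Counter(patch_subsystems)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {subsystem: 100.0 * n / total for subsystem, n in counts.items()}
```

For example, a developer with patches in `["drivers", "drivers", "net", "fs"]` has a speciality of 50% for `drivers` and 25% each for `net` and `fs`.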
  23. RQ2 - Commitment of developers • Finding: Patch application tools (HMG) enable developers to remain committed to the code base. • Commitment = # patches integrated into Linux × # days between first patch and last patch
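The commitment metric is taken here as the product of the two quantities the slides name: the number of patches integrated into Linux and the number of days between a developer's first and last patch. The function name `commitment` and the use of calendar dates are assumptions made for illustration.

```python
from datetime import date


def commitment(patch_dates: list[date]) -> int:
    """Commitment = (# integrated patches) * (# days between first and last patch).

    `patch_dates` holds one date per patch of a single developer.
    A developer with a single patch has a span of 0 days, hence commitment 0.
    """
    if not patch_dates:
        return 0
    span_days = (max(patch_dates) - min(patch_dates)).days
    return len(patch_dates) * span_days
```

Three patches spread over 30 days give a commitment of 90, so the metric rewards both volume and longevity of contribution.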
  24. RQ3 - Stability of Patches • To roll back changes, it is common to revert a commit, e.g., commit message: “revert <hash>”
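Revert detection from commit messages can be sketched as follows. The sketch relies on the default message that `git revert` writes ("This reverts commit <hash>."); reverts with hand-edited messages would be missed, and whether the study used exactly this pattern is not stated. The names `reverted_commits` and `revert_percentage` are hypothetical.

```python
import re

# Default line produced by `git revert`; full or abbreviated hashes are accepted.
REVERT_RE = re.compile(r"This reverts commit ([0-9a-f]{7,40})")


def reverted_commits(commits: dict[str, str]) -> set[str]:
    """Return hashes in `commits` (hash -> message) that a later commit reverts."""
    targets = [m.group(1)
               for message in commits.values()
               for m in REVERT_RE.finditer(message)]
    # Prefix match so abbreviated hashes in revert messages still resolve.
    return {h for h in commits if any(h.startswith(t) for t in targets)}


def revert_percentage(commits: dict[str, str]) -> float:
    """Percentage of the given commits that were reverted."""
    if not commits:
        return 0.0
    return 100.0 * len(reverted_commits(commits)) / len(commits)
```

Running `revert_percentage` separately on the H, DLH and HMG sets would yield per-type revert rates of the kind reported for RQ3.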
  25. RQ3 - Stability of Patches • Finding: Tool-supported patches are generally stable. [Figure: % of patches reverted. H patches: 2.81; DLH patches: 0.27; HMG patches: 0.32]
  26. RQ3 - Stability of Patches • Finding: Issues with fix patterns appear to be discovered quickly; however, bug detection tools need a long time.
  27. RQ4 - Kind of Bugs • Spread of buggy code ~ Locality of the patches • Complexity of the bugs ~ Change operations
  28. RQ4 - Bug Locality [Figure: three stacked-bar charts of % of patches for H, DLH and HMG patches, broken down by number of files (1, 2, 3, 4, 5+), hunks (1, 2, 3, 4, 5+) and lines (1, 2, 3, 4, 5+). Annotations: More Local; Several Hunks; Several Lines]
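The three locality dimensions (files, hunks, lines) can be read off a patch in unified diff format. The sketch below assumes git-style diffs (`diff --git` headers) and counts every added or removed line as a changed line; the slides do not say how lines were counted, so that choice and the name `patch_locality` are assumptions.

```python
def patch_locality(diff_text: str) -> dict[str, int]:
    """Count files, hunks and changed lines in a git-style unified diff."""
    files = hunks = lines = 0
    in_hunk = False
    for line in diff_text.splitlines():
        if line.startswith("diff --git "):
            files += 1
            in_hunk = False  # the ---/+++ file headers that follow are not changes
        elif line.startswith("@@ "):
            hunks += 1
            in_hunk = True
        elif in_hunk and line[:1] in ("+", "-"):
            # Inside a hunk, a leading + or - marks an added or removed line.
            lines += 1
    return {"files": files, "hunks": hunks, "lines": lines}
```

Bucketing these counts into 1, 2, 3, 4 and 5+ per patch type reproduces the shape of the stacked bars described in the figure.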
  29. Change operations - GumTree • AST diff tool to identify change operations, e.g., If/Unary:del, Ident/GenericString:mv, If/FunCall:add, Ident/GenericString:mv
  30. RQ4 - Change operations in patches • Finding: Several change operations. [Figure: % of patches per change operation (upd, mv, add, del) for the most frequent AST node pairs. H patches: Program/CppTop, Ident/GenericString, Program/Declaration, Compound/ExprStatement, Compound/If. DLH patches: Ident/GenericString, Compound/ExprStatement, Left/Constant, Compound/If, If/Compound. HMG patches: Ident/GenericString, GenericList/Left, Compound/ExprStatement, Program/Declaration, Compound/If]
  31. Take-aways • (1) Tools are gradually adopted. (2) HMG patches leverage micro-clones. • (1) DLH and HMG patches need more time to be accepted, perhaps due to lower severity. (2) HMG patch acceptance has been fast.
  32. Take-aways • (1) DLH and HMG patches can also change several lines. (2) HMG patches change several files due to APIs. • (1) More opportunities → HMG patches leverage redundancy. (2) Need to target more complex defects.
  33. Really? They use? Or don’t use? Adopted by whom? Patch → Accepted? Stable? • Subject --- Data Sources: Bug Reports, Change History, Developer Discussion • Patch Types: Process H (fully manual); Process DLH (automated localization by static/dynamic analysis, manual patch generation); Process HMG (manual design of patch template, automatic application) • RQ2 - Commitment of developers: Finding: Patch application tools enable developers to remain committed to contributing patches to the code base. Commitment = # patches integrated into Linux × # days between first patch and last patch