An Empirical Study on Inconsistent Changes to Code Clones at Release Level

This is a talk I gave at the 2009 Working Conference on Reverse Engineering in Lille, France, about our work on how inconsistent changes to code clones affect software quality when observed at release level.

Published in: Education, Technology

  1. 1. An Empirical Study on Inconsistent Changes to Code Clones at Release Level. Nicolas Bettenburg, Walid Ibrahim, Ahmed E. Hassan, Weiyi Shang, Bram Adams, Ying Zou
  2. 2-5. Code Clones: Recent Research in the Field
       • Cloning as Engineering Tool: “‘Cloning Considered Harmful’ Considered Harmful”, Cory Kapser and Michael W. Godfrey (Software Architecture Group (SWAG), David R. Cheriton School of Computer Science, University of Waterloo)
       • Inconsistent Clones, Single Snapshots: “Do Code Clones Matter?”, Elmar Juergens, Florian Deissenboeck, Benjamin Hummel, Stefan Wagner (Institut für Informatik, Technische Universität München)
       • Inconsistent Clones, Weekly Snapshots: “A Study of Consistent and Inconsistent Changes to Code Clones”, Jens Krinke (FernUniversität in Hagen)
  6. 6-7. Code Clones: Inconsistent Changes. “During the evolution of a system, code clones should be changed consistently to prevent bugs.” Demonstrated to be true at a micro-level!
  8. 8-15. Revision Level vs. Release Level Analysis
       • [Figure: revision timeline (r2014 ... r2209 ... r2351 ... r2682) with markers labeled “A”; revision-level activities: Experimentation, Refactoring, Bug-Fixing]
       • [Figure: “Transient Effects” plot of Amount over Time for two series, Code Clones and Inconsistent Changes]
       • [Figure: release timeline (2.1, 2.2, 2.3, 2.4, 3.0) shown alongside the revisions]
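The contrast between the two levels of analysis can be illustrated with a small sketch. It assumes, hypothetically, that each revision of one clone group is marked as inconsistent or not, and that releases are cut at known revision indices. The class name `ReleaseLevelView`, its methods, and the sample data are invented for illustration and are not taken from the study's tooling.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: what a release-level analysis sees of revision-level history. */
final class ReleaseLevelView {

    /** Keeps only the states at the revisions where a release was cut. */
    static List<Boolean> sampleAtReleases(List<Boolean> inconsistentPerRevision,
                                          List<Integer> releaseRevisionIndices) {
        List<Boolean> observed = new ArrayList<>();
        for (int index : releaseRevisionIndices) {
            observed.add(inconsistentPerRevision.get(index));
        }
        return observed;
    }

    /** Counts the entries marked as inconsistent. */
    static int countInconsistent(List<Boolean> states) {
        int count = 0;
        for (boolean inconsistent : states) {
            if (inconsistent) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Assumed data: true means the clones of one group differ at that revision.
        List<Boolean> perRevision =
                Arrays.asList(false, true, true, false, false, true, false);
        // Assumed release points: releases are cut at revisions 0, 3 and 6.
        List<Integer> releases = Arrays.asList(0, 3, 6);

        List<Boolean> perRelease = sampleAtReleases(perRevision, releases);
        System.out.println("Inconsistent at revision level: " + countInconsistent(perRevision));
        System.out.println("Inconsistent at release level:  " + countInconsistent(perRelease));
    }
}
```

With this sample data the revision-level view counts three inconsistent states and the release-level view counts none, which is the kind of transient effect the slides refer to.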
  16. 16. Study Design: Subject Systems. Two systems: (1) 22 releases over 1 year, 51 days per release, 15k lines of code; (2) 50 releases over 4 years, 36 days per release, 90k lines of code.
  17. 17-20. Study Design: Clone Detection & Tracking. Clone Reports are produced for each release (2.1, 2.2, 2.3, 2.4, 3.0), organized into Clone Groups, and linked across releases into Genealogies.
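The tracking step can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's tool: a clone group is modeled as a set of clone locations (plain strings), and a group in one release is treated as the successor of a group in the previous release if the two share at least one location. The class `CloneGenealogySketch`, its methods, and the location names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: following one clone group across releases. */
final class CloneGenealogySketch {

    /** Builds a clone group from clone locations such as "A.foo". */
    static Set<String> group(String... cloneLocations) {
        return new HashSet<>(Arrays.asList(cloneLocations));
    }

    /** Assumed linking rule: two groups belong together if they share a location. */
    static boolean sameLineage(Set<String> earlier, Set<String> later) {
        for (String location : earlier) {
            if (later.contains(location)) {
                return true;
            }
        }
        return false;
    }

    /** Number of consecutive releases in which the group, or a successor, exists. */
    static int lifetimeInReleases(List<List<Set<String>>> groupsPerRelease,
                                  int startRelease, Set<String> startGroup) {
        int lifetime = 1;
        Set<String> current = startGroup;
        for (int r = startRelease + 1; r < groupsPerRelease.size(); r++) {
            Set<String> successor = null;
            for (Set<String> candidate : groupsPerRelease.get(r)) {
                if (sameLineage(current, candidate)) {
                    successor = candidate;
                    break;
                }
            }
            if (successor == null) {
                break; // the genealogy ends before release r
            }
            current = successor;
            lifetime++;
        }
        return lifetime;
    }

    public static void main(String[] args) {
        // Assumed clone reports for four consecutive releases, one group each.
        List<List<Set<String>>> releases = new ArrayList<>();
        releases.add(Collections.singletonList(group("A.foo", "B.foo")));
        releases.add(Collections.singletonList(group("A.foo", "B.foo", "C.foo")));
        releases.add(Collections.singletonList(group("C.foo", "D.bar")));
        releases.add(Collections.singletonList(group("E.baz", "F.baz")));

        int lifetime = lifetimeInReleases(releases, 0, releases.get(0).get(0));
        System.out.println("Genealogy spans " + lifetime + " releases");
    }
}
```

A real tracker would need a more robust matching rule, for example one that tolerates renamed files or moved code. The shared-location rule is only the simplest choice that makes the idea of a genealogy concrete.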
  21. 21. Study Design: Inconsistent Changes. [Figure: clone genealogy across releases 2.1, 2.2, 2.3, 2.4, 3.0 with transitions labeled Consistent Change and Inconsistent Change]
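One way to make the distinction concrete is sketched below. It is a simplification under stated assumptions: each clone in a group is represented by a text fingerprint per release, and a change counts as consistent only if every tracked clone changed between the two releases. The class `ChangeClassifier`, the enum `ChangeKind`, and the fingerprint representation are hypothetical and are not the detection rule used in the study.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: classifying the change to one clone group between two releases. */
final class ChangeClassifier {

    enum ChangeKind { UNCHANGED, CONSISTENT, INCONSISTENT }

    /**
     * Both maps go from clone location to a fingerprint of the clone's text.
     * Clones missing from the later release are ignored in this sketch.
     */
    static ChangeKind classify(Map<String, String> before, Map<String, String> after) {
        int tracked = 0;
        int changed = 0;
        for (Map.Entry<String, String> entry : before.entrySet()) {
            String newer = after.get(entry.getKey());
            if (newer == null) {
                continue; // clone no longer present in the later release
            }
            tracked++;
            if (!newer.equals(entry.getValue())) {
                changed++;
            }
        }
        if (changed == 0) {
            return ChangeKind.UNCHANGED;
        }
        return changed == tracked ? ChangeKind.CONSISTENT : ChangeKind.INCONSISTENT;
    }

    public static void main(String[] args) {
        // Illustrative fingerprints: the cloned statement itself stands in for a hash.
        Map<String, String> before = new HashMap<>();
        before.put("newView(View, Buffer)", "new HelpViewer(\"jeditresource:/doc/welcome.html\");");
        before.put("newView(View, String)", "new HelpViewer(\"jeditresource:/doc/welcome.html\");");

        Map<String, String> after = new HashMap<>(before);
        after.put("newView(View, Buffer)", "new HelpViewer();");

        System.out.println(classify(before, after));
    }
}
```

The sample data mirrors the jEdit `newView` excerpt shown in the talk, where only one of the two cloned methods changes, so the sketch reports the change as inconsistent.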
  22. 22. Research Questions
       • Q1: What are the characteristics of long-lived clone genealogies at release level?
       • Q2: What is the effect of inconsistent changes on code quality when measured at release level?
       • Q3: What types of cloning patterns do we observe at release level?
  23. 23. Research Questions. Q1: What are the characteristics of long-lived clone genealogies at release level? Measures: Life-Time, Group Size
  24. 24-27. Lifetime of Clone Groups. [Figure: distributions for Apache Mina and jEdit; axes: Number of Releases (ticks 1 to 50), Number of Genealogies; callout: Long-lived clone groups]
  28. 28-30. Size of Clone Groups. [Figure: distributions for Apache Mina and jEdit; axes: Number of Clones (ticks 1 to 200), Number of Genealogies; callout: Mostly small clone groups]
  31. 31. Research Questions. Q2: What is the effect of inconsistent changes on code quality when measured at release level? [Figure: Inconsistent Change Reports for releases 2.1, 2.2, 2.3 feed into Inspection]
  32. 32-33. Research Question Q2. jEdit, release labels 4.0.2 and 4.0.3:

          org.gjt.sp.jedit.jEdit.newView(View, Buffer) {
              ...
              // show tip of the day
              if(newView == viewsFirst) {
                  // Don't show the welcome message if jEdit was started
                  // with the -nosettings switch
                  if(settingsDirectory != null && getBooleanProperty("firstTime"))
                      new HelpViewer("jeditresource:/doc/welcome.html");
              ...

          org.gjt.sp.jedit.jEdit.newView(View, String) {
              ...
              // show tip of the day
              if(newView == viewsFirst) {
                  // Don't show the welcome message if jEdit was started
                  // with the -nosettings switch
                  if(settingsDirectory != null && getBooleanProperty("firstTime"))
                      new HelpViewer("jeditresource:/doc/welcome.html");
              ...

  34. 34. Research Question Q2. jEdit, release labels 4.0.3 and 4.0.4:

          org.gjt.sp.jedit.jEdit.newView(View, Buffer) {
              ...
              // show tip of the day
              if(newView == viewsFirst) {
                  // Don't show the welcome message if jEdit was started
                  // with the -nosettings switch
                  if(settingsDirectory != null && getBooleanProperty("firstTime"))
                      new HelpViewer();
              ...

          org.gjt.sp.jedit.jEdit.newView(View, String) {
              ...
              // show tip of the day
              if(newView == viewsFirst) {
                  // Don't show the welcome message if jEdit was started
                  // with the -nosettings switch
                  if(settingsDirectory != null && getBooleanProperty("firstTime"))
                      new HelpViewer("jeditresource:/doc/welcome.html");
              ...
  35. 35-36. Research Question Q2
       • 748 inconsistent changes flagged by our tool
       • Manual inspection of reports and source code
       • Only 7 inconsistent changes related to bugs
       • Inconsistent changes seem to be carried out on purpose.
       Only a fraction of inconsistent changes to long-lived clones introduce bugs!
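As a quick check on the size of that fraction, the two counts from the slide (7 bug-related changes out of 748 flagged) work out to just under one percent:

```latex
\frac{7}{748} \approx 0.0094 \approx 0.94\%
```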
  37. 37. Research Questions. Q3: What types of cloning patterns do we observe at release level? [Figure: Clone Reports for releases 2.1, 2.2, 2.3, 2.4, 3.0 feed into Clone Patterns Classification]
  38. 38-40. Research Question Q3. [Figure: pie charts “Classification of Code Clones in jEdit” and “Classification of Code Clones in Apache Mina”; categories: verbatim, experiment, api, replicate and specialize, crosscutting, idioms, parametrized] Callouts: Replicate and Specialize 46% - 68%; API Usage 22% - 23%.
  41. 41-42. Research Question Q3. [Figure: the same two pie charts] Inconsistent changes are carried out on purpose because 70%-90% of the cloned code is meant to be changed separately!
  43. 43. QUESTIONS?
