Inconsistent Changes to Code Clones Often Half of Changes

An Empirical Study on
Inconsistent Changes to Code Clones
at Release Level
Nicolas Bettenburg, Weyi Shang, Walid Ibrahim, Bram Adams, Ying Zou, Ahmed E. Hassan

Code Clones: Recent Research in the Field
“Cloning Considered Harmful” Considered Harmful
Cory Kapser and Michael W. Godfrey
Software Architecture Group (SWAG)
David R. Cheriton School of Computer Science, University of Waterloo
{cjkapser, migod}@uwaterloo.ca
Abstract
Current literature on the topic of duplicated (cloned)
code in software systems often considers duplication
harmful to the system quality and the reasons commonly
cited for duplicating code often have a negative
connotation. While these positions are sometimes
correct, during our case studies we have found that this is
not universally true, and we have found several situations
where code duplication seems to be a reasonable or
even beneficial design option. For example, a method of
introducing experimental changes to core subsystems is to
duplicate the subsystem and introduce changes there in a
kind of sandbox testbed. As features mature and become
stable within the experimental subsystem, they can then
be introduced gradually into the stable code base. In this
way risk of introducing instabilities in the stable version is
minimized. This paper describes several patterns of cloning
that we have encountered in our case studies and discusses
the advantages and disadvantages associated with using
them.
1. Introduction
It is believed that most large software systems contain
a non-trivial amount of redundant code. Often referred to
as code clones, these segments of code typically involve
10–15% of the source code [24, 25]. Code clones can
arise through a number of different activities. For example,
intentional clones may be introduced through direct “copy-
and-pasting” of code. Unintentional clones on the other
hand may be the manifestation of programming idioms
related to the language or libraries the developers are using.
In much of the literature on the topic [2, 7, 12, 21, 22,
27, 28], cloning is considered harmful to the quality of the
source code. Code clones can cause additional maintenance
effort. Changes to one segment of code may need to
be propagated to several others, incurring unnecessary
maintenance costs [15]. Locating and maintaining these
clones pose additional problems if they do not evolve
synchronously. With this in mind, methods for automatic
refactoring have been suggested [4, 7], and tools specifically
to aid developers in the manual refactoring of clones have
also been developed [19].
There is no doubt that code cloning is often an indication
of sloppy design and in such cases should be considered to
be a kind of development “bad smell”. However, we have
found that there are many instances where this is simply not
the case. For example, cloning may be used to introduce
experimental optimizations to core subsystems without
negatively affecting the stability of the main code. Thus,
a variety of concerns such as stability, code ownership, and
design clarity need to be considered before any refactoring
is attempted; a manager should try to understand the reason
behind the duplication before deciding what action (if any)
to take. 1
This paper introduces eight cloning patterns that we have
uncovered during case studies on large software systems,
some of which we reported in [23, 24, 25]. These
patterns present both good and bad motivations for cloning,
and we discuss both the advantages and disadvantages of
these patterns of cloning in terms of development and
maintenance. In some cases, we identify patterns of cloning
that we believe are beneficial to the quality of the system.
From our observations we have found that refactoring may
not be the best solution in all patterns of cloning. Tools
need to be developed to aid the synchronous maintenance
of clones within a software system, such as Linked Editing
presented by Toomim et al. [29].
This paper introduces the notion of categorizing high
level patterns of cloning in a similar fashion to the
cataloging of design patterns [14] or anti-patterns [8].
There are several benefits that can be gained from
this characterization of cloning. First, it provides a
flexible framework on top of which we can document
our knowledge about how and why cloning occurs in
1 A simple (but trivial) example is the title of this paper. Although there
is a kind of duplication in the wording, no “refactoring” of the title would
carry the same connotations as the original statement.
A Study of Consistent and Inconsistent Changes to Code Clones
Jens Krinke
FernUniversität in Hagen, Germany
krinke@acm.org
Abstract
Code Cloning is regarded as a threat to software maintenance,
because it is generally assumed that a change to
a code clone usually has to be applied to the other clones
of the clone group as well. However, there exists little
empirical data that supports this assumption. This paper
presents a study on the changes applied to code clones in
open source software systems based on the changes between
versions of the system. It is analyzed if changes to code
clones are consistent to all code clones of a clone group or
not. The results show that usually half of the changes to
code clone groups are inconsistent changes. Moreover, the
study observes that when there are inconsistent changes to
a code clone group in a near version, it is rarely the case
that there are additional changes in later versions such that
the code clone group then has only consistent changes.
1 Introduction
Duplicated code is common in all kinds of software systems.
Although cut-copy-paste (-and-adapt) techniques are
considered bad practice, every programmer uses them.
Since these practices involve both duplication and modification,
they are collectively called code cloning, while
the duplicated code is called a code clone. A clone group
consists of code clones that are clones of each other
(sometimes this is also called a clone class). During the software
development cycle, code cloning is both easy and inexpensive
(in both cost and money). However, this practice can
complicate software maintenance in the following ways:
• Errors may have been duplicated (cloned) in parallel
with the cloned code.
• Modifications done to the original code must often be
applied to the cloned code as well.
Because of these problems, research has developed many
approaches to detect cloned code [5,6,9,12,16–18,20]. In
addition, some empirical work has attempted to check
whether or not the above mentioned problems are relevant
in practice. Kim et al. [15] investigated the evolution of
code clones and provided a classification for evolving code
clones. Their work already showed that during the evolution
of the code clones, consistent changes to the code clones
of a group are fewer than anticipated. Aversano et al. [4]
did a similar study and they state “that the majority of clone
classes is always maintained consistently.” Geiger et al. [10]
studied the relation of code clone groups and change couplings
(files which are committed at the same time, by the
same author, and with the same modification description),
but could not find a (strong) relation. Therefore, this work
will present an empirical study that verifies the following
hypothesis:
During the evolution of a system, code clones of
a clone group are changed consistently.
Of course, a system may contain bugs where a change
has been applied to some code clones, but has been forgotten
for other code clones of the clone group. For stable
systems it can be assumed that such bugs will be resolved
at a later time. This results in a second hypothesis:
During the evolution of a system, if code clones
of a clone group are not changed consistently, the
missing changes will appear in a later version.
This work will verify the two hypotheses by studying the
changes that are applied to code clones during 200 weeks of
evolution of five open source software systems. The contributions
of this paper are:
• A large empirical study that examines the changes to
code clones in evolving systems. This study involves
both a greater number and diversity of systems than
previous empirical studies.
• The study will show that both hypotheses are not generally
valid for the five studied systems. In summary,
clone groups are changed consistently roughly half
of the time, invalidating the first hypothesis. The second
hypothesis is only partially valid. This is because
© 2007 IEEE. To be published in the Proceedings of the 14th Working Conference on Reverse Engineering, 2007 in Vancouver, Canada. Personal use of this
material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works
for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.
Do Code Clones Matter?
Elmar Juergens, Florian Deissenboeck, Benjamin Hummel, Stefan Wagner
Institut für Informatik, Technische Universität München
Boltzmannstr. 3, 85748 Garching b. München, Germany
{juergens,deissenb,hummelb,wagnerst}@in.tum.de
Abstract
Code cloning is not only assumed to inflate maintenance
costs but also considered defect-prone as inconsistent
changes to code duplicates can lead to unexpected behavior.
Consequently, the identification of duplicated code, clone
detection, has been a very active area of research in recent
years. Up to now, however, no substantial investigation of
the consequences of code cloning on program correctness
has been carried out. To remedy this shortcoming, this paper
presents the results of a large-scale case study that was
undertaken to find out if inconsistent changes to cloned code
can indicate faults. For the analyzed commercial and open
source systems we not only found that inconsistent changes
to clones are very frequent but also identified a significant
number of faults induced by such changes. The clone detection
tool used in the case study implements a novel algorithm
for the detection of inconsistent clones. It is available
as open source to enable other researchers to use it as basis
for further investigations.
1. Clones & correctness
Research in software maintenance has shown that
many programs contain a significant amount of duplicated
(cloned) code. Such cloned code is considered harmful for
two reasons: (1) multiple, possibly unnecessary, duplicates
of code increase maintenance costs and (2) inconsistent
changes to cloned code can create faults and, hence, lead
to incorrect program behavior [20, 29]. While clone detection
has been a very active area of research in recent years,
up to now, there is no thorough understanding of the degree
of harmfulness of code cloning. In fact, some researchers
even started to doubt the harmfulness of cloning at all [17].
To shed light on the situation, we investigated the effects
of code cloning on program correctness. It is important
to understand that clones do not directly cause faults
but inconsistent changes to clones can lead to unexpected
program behavior. A particularly dangerous type of change
to cloned code is the inconsistent bug fix. If a fault was
found in cloned code but not fixed in all clone instances,
the system is likely to still exhibit the incorrect behavior.
To illustrate this, Fig. 1 shows an example, where a missing
null-check was retrofitted in only one clone instance.
This paper presents the results of a large-scale case study
that was undertaken to find out (1) if clones are changed
inconsistently, (2) if these inconsistencies are introduced
intentionally and (3) if unintentional inconsistencies can
represent faults. In this case study we analyzed three commercial
systems written in C#, one written in Cobol and one
open-source system written in Java. To conduct the study
we developed a novel detection algorithm that enables us
to detect inconsistent clones. We manually inspected about
900 clone groups to handle the inevitable false positives and
discussed each of the over 700 inconsistent clone groups
with the developers of the respective systems to determine
if the inconsistencies are intentional and if they represent
faults. Altogether, around 1800 individual clone group assessments
were manually performed in the course of the
case study. The study led to the identification of 107 faults
that have been confirmed by the systems’ developers.
Research Problem: Although most previous work agrees
that code cloning poses a problem for software maintenance,
“there is little information available concerning the
impacts of code clones on software quality” [29]. As the
consequences of code cloning on program correctness, in
particular, are not fully understood today, it remains unclear
how harmful code clones really are. We consider the absence
of a thorough understanding of code cloning precarious
for software engineering research, education and practice.
Contribution: The contribution of this paper is twofold.
First, we extend the existing empirical knowledge by a case
study that demonstrates that clones get changed inconsistently
and that such changes can represent faults. Second,
we present a novel suffix-tree based algorithm for the detection
of inconsistent clones. In contrast to other algorithms
for the detection of inconsistent clones, our tool suite is
made available for other researchers as open source.
ICSE’09, May 16-24, 2009, Vancouver, Canada
978-1-4244-3452-7/09/$25.00 © 2009 IEEE 485

Code Clones: Inconsistent Changes
“During the evolution of a system, code clones
should be changed consistently to prevent bugs.”
Demonstrated to be true at a micro-level!
Juergens et al. ICSE’09, Krinke WCRE’07

• Cloning as an engineering tool
• Reduce complexity of programming tasks
• Facilitate easier experimentation
• Copy now, refactor later!
What happens at a macro-level?

Revision Level vs. Release Level Analysis
[Figure: timeline of revisions (r2014 ... r2209 ... r2351 ... r2682) grouped into releases (2.1, 2.2, 2.3, 2.4, 3.0), annotated with Maintenance, Experimentation and Refactoring phases and two markers labelled A. Curves plot the amount of Code Clones and of Inconsistent Changes over time.]

Motivation
What does this look like if we step back and
observe the effects of inconsistent changes
at release level?

Study Design: Subject Systems
• 22 releases over 1 year, 51 days / release, 15k lines of code
• 50 releases over 4 years, 36 days / release, 90k lines of code

Study Design: Clone Detection and Tracking
Releases (2.1, 2.2, 2.3, 2.4, 3.0) → Clone Reports → Clone Groups → Genealogies
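
The tracking step on this slide (clone reports per release, grouped into clone groups, linked into genealogies) can be sketched roughly as follows. This is a minimal illustration, not the study's tool: the representation of a clone group as a set of method identifiers, the function name build_genealogies, and the rule "groups in consecutive releases belong together when they share at least one method" are all assumptions made for the example.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical representation: a clone group is a set of method identifiers.
CloneGroup = FrozenSet[str]
Genealogy = List[Tuple[str, CloneGroup]]


def build_genealogies(
    releases: List[str],
    reports: Dict[str, List[CloneGroup]],
) -> List[Genealogy]:
    """Link clone groups of consecutive releases into genealogies.

    Assumed matching rule: a group in the next release continues a
    genealogy when it shares at least one method identifier with the
    genealogy's group in the previous release.
    """
    genealogies: List[Genealogy] = []
    alive: List[Genealogy] = []  # genealogies present in the previous release
    for release in releases:
        still_alive: List[Genealogy] = []
        unmatched = list(reports.get(release, []))
        for genealogy in alive:
            _, last_group = genealogy[-1]
            match = next((g for g in unmatched if g & last_group), None)
            if match is not None:
                unmatched.remove(match)
                genealogy.append((release, match))
                still_alive.append(genealogy)
        # Groups without a predecessor start a new genealogy.
        for group in unmatched:
            genealogy = [(release, group)]
            genealogies.append(genealogy)
            still_alive.append(genealogy)
        alive = still_alive
    return genealogies


# Made-up clone reports for three releases.
reports = {
    "2.1": [frozenset({"A.foo", "B.foo"})],
    "2.2": [frozenset({"A.foo", "B.foo", "C.foo"})],
    "2.3": [frozenset({"D.bar", "E.bar"})],
}
genealogies = build_genealogies(["2.1", "2.2", "2.3"], reports)
lifetimes = [len(g) for g in genealogies]  # lifetime in number of releases
```

The length of each genealogy gives a clone group's lifetime in releases, and the size of its groups gives the number of clones, which are the two properties examined under Q1.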

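Study Design: Inconsistent Changes
[Figure: a clone genealogy across releases 2.1, 2.2, 2.3, 2.4 and 3.0, with each step between releases marked as an Inconsistent Change or a Consistent Change.]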
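
A rough way to label one step of a genealogy as a consistent or inconsistent change is sketched below. It is a simplification under stated assumptions: the function name classify_change and the dictionary layout (clone identifier mapped to source text) are hypothetical, and "consistent" here only means that every clone in the group changed between the two releases. It does not check that the same edit was applied to each clone, which a real analysis would need to do.

```python
from typing import Dict


def classify_change(before: Dict[str, str], after: Dict[str, str]) -> str:
    """Classify how one clone group changed between two releases.

    `before` and `after` map a clone identifier to its source text.
    Only clones present in both releases are compared.
    """
    shared = before.keys() & after.keys()
    changed = {clone for clone in shared if before[clone] != after[clone]}
    if not changed:
        return "unchanged"
    if changed == shared:
        return "consistent"  # assumption: all clones changed, edits not compared
    return "inconsistent"  # only some clones of the group changed


# Made-up example: only one of two cloned methods is modified.
before = {
    "newView(View, Buffer)": "show_tip()",
    "newView(View, String)": "show_tip()",
}
after = {
    "newView(View, Buffer)": "show_welcome_or_tip()",
    "newView(View, String)": "show_tip()",
}
result = classify_change(before, after)
```

Applying such a check to every pair of consecutive releases in a genealogy yields the sequence of consistent and inconsistent changes that the slide depicts.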

Research Questions
Q1: What are the characteristics of long-lived clone genealogies at release level?
Q2: What is the effect of inconsistent changes to code clones when measured at release level?
Q3: What types of cloning patterns do we observe at release level?

Research Questions
Q1: What are the characteristics of long-lived clone genealogies at release level?

Research Question Q1
[Figure: two plots comparing Apache Mina and jEdit. "Lifetime of Clone Groups": Number of Genealogies against Number of Releases. "Size of Clone Groups": Number of Genealogies against Number of Clones.]
Small, rather long-lived clone genealogies.

Research Questions
Q2: What is the effect of inconsistent changes to code clones when measured at release level?

org.gjt.sp.jedit.jEdit.newView(View, Buffer)
{
    ...
    // show tip of the day
    if(newView == viewsFirst)
    {
        // Don't show the welcome message if jEdit was started
        // with the -nosettings switch
        if(settingsDirectory != null && getBooleanProperty("firstTime"))
            new HelpViewer("jeditresource:/doc/welcome.html");
        else if(jEdit.getBooleanProperty("tip.show"))
            new TipOfTheDay(newView);
    }
    ...
org.gjt.sp.jedit.jEdit.newView(View, String)
{
    ...
    {
    }
    ...
jEdit, releases 4.0.1 and 4.0.2

{
    ...
    {
    }
    ...
{
    ...
    {
    }
    ...
jEdit, releases 4.0.2 and 4.0.3

{
    ...
    {
        new HelpViewer();
    }
    ...
{
    ...
    {
    }
    ...
jEdit, releases 4.0.3 and 4.0.4

• 748 inconsistent changes flagged by our tool
• Manual inspection of the source code
• 7 inconsistent changes related to bugs
• For 41% - 65% of genealogies: inconsistent
changes carried out on purpose.
Independent evolution of clones observed at release level.

Research Questions
Q3: What types of cloning patterns do we observe at release level?
[Figure: breakdown of the observed cloning patterns.]
Replicate and Specialize
46% - 68% inconsistent changes due to independent
evolution of cloned parts.
API Usage
22% - 23% inconsistent changes due to use of distinct
parts of the API to create functionality.
Developers seem to effectively handle inconsistent changes at macro-level.
