SlideShare a Scribd company logo
1 of 45
Download to read offline
Inconsistent Changes to Code Clones
Nicolas Bettenburg
An Empirical Study on
at Release Level
Weyi Shang Walid Ibrahim Bram Adams Ying Zou Ahmed E. Hassan
1
2
Code Clones: Recent Research in the Field
“Cloning Considered Harmful” Considered Harmful
Cory Kapser and Michael W. Godfrey
Software Architecture Group (SWAG)
David R. Cheriton School of Computer Science, University of Waterloo
{cjkapser, migod}@uwaterloo.ca
Abstract
Current literature on the topic of duplicated (cloned)
code in software systems often considers duplication
harmful to the system quality and the reasons commonly
cited for duplicating code often have a negative
connotation. While these positions are sometimes
correct, during our case studies we have found that this is
not universally true, and we have found several situations
where code duplication seems to be a reasonable or
even beneficial design option. For example, a method of
introducing experimental changes to core subsystems is to
duplicate the subsystem and introduce changes there in a
kind of sandbox testbed. As features mature and become
stable within the experimental subsystem, they can then
be introduced gradually into the stable code base. In this
way risk of introducing instabilities in the stable version is
minimized. This paper describes several patterns of cloning
that we have encountered in our case studies and discusses
the advantages and disadvantages associated with using
them.
1. Introduction
It is believed that most large software systems contain
a non-trivial amount of redundant code. Often referred to
as code clones, these segments of code typically involve
10–15% of the source code [24, 25]. Code clones can
arise through a number of different activities. For example,
intentional clones may be introduced through direct “copy-
and-pasting” of code. Unintentional clones on the other
hand may be the manifestation of programming idioms
related to the language or libraries the developers are using.
In much of the literature on the topic [2, 7, 12, 21, 22,
27, 28], cloning is considered harmful to the quality of the
source code. Code clones can cause additional maintenance
effort. Changes to one segment of code may need to
be propagated to several others, incurring unnecessary
maintenance costs [15]. Locating and maintaining these
clones pose additional problems if they do not evolve
synchronously. With this in mind, methods for automatic
refactoring have been suggested [4, 7], and tools specifically
to aid developers in the manual refactoring of clones have
also been developed [19].
There is no doubt that code cloning is often an indication
of sloppy design and in such cases should be considered to
be a kind of development “bad smell”. However, we have
found that there are many instances where this is simply not
the case. For example, cloning may be used to introduce
experimental optimizations to core subsystems without
negatively effecting the stability of the main code. Thus,
a variety of concerns such as stability, code ownership, and
design clarity need to be considered before any refactoring
is attempted; a manager should try to understand the reason
behind the duplication before deciding what action (if any)
to take. 1
This paper introduces eight cloning patterns that we have
uncovered during case studies on large software systems,
some of which we reported in [23, 24, 25]. These
patterns present both good and bad motivations for cloning,
and we discuss both the advantages and disadvantages of
these patterns of cloning in terms of development and
maintenance. In some cases, we identify patterns of cloning
that we believe are beneficial to the quality of the system.
From our observations we have found that refactoring may
not be the best solution in all patterns of cloning. Tools
need to be developed to aid the synchronous maintenance
of clones within a software system, such as Linked Editing
presented by Toomim et al. [29].
This paper introduces the notion of categorizing high
level patterns of cloning in a similar fashion to the
cataloging of design patterns [14] or anti-patterns [8].
There are several benefits that can be gained from
this characterization of cloning. First, it provides a
flexible framework on top of which we can document
our knowledge about how and why cloning occurs in
1A simple (but trivial) example is the title of this paper. Although there
is a kind of duplication in the wording, no ”refactoring” of the title would
carry the same connotations as the original statement.
A Study of Consistent and Inconsistent Changes to Code Clones
Jens Krinke
FernUniversit¨at in Hagen, Germany
krinke@acm.org
Abstract
Code Cloning is regarded as a threat to software main-
tenance, because it is generally assumed that a change to
a code clone usually has to be applied to the other clones
of the clone group as well. However, there exists little
empirical data that supports this assumption. This paper
presents a study on the changes applied to code clones in
open source software systems based on the changes between
versions of the system. It is analyzed if changes to code
clones are consistent to all code clones of a clone group or
not. The results show that usually half of the changes to
code clone groups are inconsistent changes. Moreover, the
study observes that when there are inconsistent changes to
a code clone group in a near version, it is rarely the case
that there are additional changes in later versions such that
the code clone group then has only consistent changes.
1 Introduction
Duplicated code is common in all kind of software sys-
tems. Although cut-copy-paste (-and-adapt) techniques are
considered bad practice, every programmer uses them.
Since these practices involve both duplication and mod-
ification, they are collectively called code cloning. While
the duplicated code is called a code clone. A clone group
consists of code clones that are clones of each other (some-
times this is also called a clone class). During the software
development cycle, code cloning is both easy and inexpen-
sive (in both cost and money). However, this practice can
complicate software maintenence in the following ways:
• Errors may have been duplicated (cloned) in parallel
with the cloned code.
• Modifications done to the original code must often be
applied to the cloned code as well.
Because of these problems, research has developed many
approaches to detect cloned code [5,6,9,12,16–18,20]. In
addition, some empirical work done has attempted to check
whether or not the above mentioned problems are relevant
in practice. Kim et al. [15] investigated the evolution of
code clones and provided a classification for evolving code
clones. Their work already showed that during the evolution
of the code clones, consistent changes to the code clones
of a group are fewer than anticipated. Aversano et al. [4]
did a similar study and they state “that the majority of clone
classes is always maintained consistently.” Geiger et al. [10]
studied the relation of code clone groups and change cou-
plings (files which are committed at the same time, by the
same author, and with the same modification description),
but could not find a (strong) relation. Therefore, this work
will present an empirical study that verifies the following
hypothesis:
During the evolution of a system, code clones of
a clone group are changed consistently.
Of course, a system may contain bugs where a change
has been applied to some code clones, but has been forgot-
ten for other code clones of the clone group. For stable
systems it can be assumed that such bugs will be resolved
at a later time. This results in a second hypothesis:
During the evolution of a system, if code clones
of a clone group are not changed consistently, the
missing changes will appear in a later version.
This work will verify the two hypotheses by studying the
changes that are applied to code clones during 200 weeks of
evolution of five open source software systems. The contri-
butions of this paper are:
• A large empirical study that examines the changes to
code clones in evolving systems. This study involves
both a greater number and diversity of systems than
previous empirical studies.
• The study will show that both hypotheses are not gen-
erally valid for the five studied systems. In summary,
clone groups are changed consistently in roughly half
of the time, invalidating the first hypothesis. The sec-
ond hypothesis is only partially valid. This is because
c2007 IEEE. To be published in the Proceedings of the 14th Working Conference on Reverse Engineering, 2007 in Vancouver, Canada. Personal use of this
material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works
for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.
Do Code Clones Matter?
Elmar Juergens, Florian Deissenboeck, Benjamin Hummel, Stefan Wagner
Institut f¨ur Informatik, Technische Universit¨at M¨unchen
Boltzmannstr. 3, 85748 Garching b. M¨unchen, Germany
{juergens,deissenb,hummelb,wagnerst}@in.tum.de
Abstract
Code cloning is not only assumed to inflate mainte-
nance costs but also considered defect-prone as inconsistent
changes to code duplicates can lead to unexpected behavior.
Consequently, the identification of duplicated code, clone
detection, has been a very active area of research in recent
years. Up to now, however, no substantial investigation of
the consequences of code cloning on program correctness
has been carried out. To remedy this shortcoming, this pa-
per presents the results of a large-scale case study that was
undertaken to find out if inconsistent changes to cloned code
can indicate faults. For the analyzed commercial and open
source systems we not only found that inconsistent changes
to clones are very frequent but also identified a significant
number of faults induced by such changes. The clone de-
tection tool used in the case study implements a novel algo-
rithm for the detection of inconsistent clones. It is available
as open source to enable other researchers to use it as basis
for further investigations.
1. Clones  correctness
Research in software maintenance has shown that
many programs contain a significant amount of duplicated
(cloned) code. Such cloned code is considered harmful for
two reasons: (1) multiple, possibly unnecessary, duplicates
of code increase maintenance costs and, (2) inconsistent
changes to cloned code can create faults and, hence, lead
to incorrect program behavior [20, 29]. While clone detec-
tion has been a very active area of research in recent years,
up to now, there is no thorough understanding of the degree
of harmfulness of code cloning. In fact, some researchers
even started to doubt the harmfulness of cloning at all [17].
To shed light on the situation, we investigated the ef-
fects of code cloning on program correctness. It is impor-
tant to understand, that clones do not directly cause faults
but inconsistent changes to clones can lead to unexpected
program behavior. A particularly dangerous type of change
to cloned code is the inconsistent bug fix. If a fault was
found in cloned code but not fixed in all clone instances,
the system is likely to still exhibit the incorrect behavior.
To illustrate this, Fig. 1 shows an example, where a missing
null-check was retrofitted in only one clone instance.
This paper presents the results of a large-scale case study
that was undertaken to find out (1) if clones are changed in-
consistently, (2) if these inconsistencies are introduced in-
tentionally and, (3) if unintentional inconsistencies can rep-
resent faults. In this case study we analyzed three commer-
cial systems written in C#, one written in Cobol and one
open-source system written in Java. To conduct the study
we developed a novel detection algorithm that enables us
to detect inconsistent clones. We manually inspected about
900 clone groups to handle the inevitable false positives and
discussed each of the over 700 inconsistent clone groups
with the developers of the respective systems to determine
if the inconsistencies are intentional and if they represent
faults. Altogether, around 1800 individual clone group as-
sessments were manually performed in the course of the
case study. The study lead to the identification of 107 faults
that have been confirmed by the systems’ developers.
Research Problem Although most previous work agrees
that code cloning poses a problem for software mainte-
nance, “there is little information available concerning the
impacts of code clones on software quality” [29]. As the
consequences of code cloning on program correctness, in
particular, are not fully understood today, it remains unclear
how harmful code clones really are. We consider the ab-
sence of a thorough understanding of code cloning precari-
ous for software engineering research, education and prac-
tice.
Contribution The contribution of this paper is twofold.
First, we extend the existing empirical knowledge by a case
study that demonstrates that clones get changed inconsis-
tently and that such changes can represent faults. Second,
we present a novel suffix-tree based algorithm for the detec-
tion of inconsistent clones. In contrast to other algorithms
for the detection of inconsistent clones, our tool suite is
made available for other researchers as open source.
ICSE’09, May 16-24, 2009, Vancouver, Canada
978-1-4244-3452-7/09/$25.00 © 2009 IEEE 485
2
3
Code Clones: Inconsistent Changes
“During the evolution of a system, code clones
should be changed consistently to prevent bugs.”
3
3
Code Clones: Inconsistent Changes
“During the evolution of a system, code clones
should be changed consistently to prevent bugs.”
Demonstrated to be
true at a micro-level!
Juergens et al. ICSE’09, Krinke et al. WCRE’07
3
4
Code Clones: Inconsistent Changes
• Cloning an engineering tool
• Reduce complexity of programming tasks
• Facilitate easier experimentation
• Copy now, refactor later!
4
4
Code Clones: Inconsistent Changes
• Cloning an engineering tool
• Reduce complexity of programming tasks
• Facilitate easier experimentation
• Copy now, refactor later!
What happens at a
macro-level?
4
5
Revision Level vs. Release Level Analysis
Revisionsr2014 r2351r2209... ... ... r2682
5
5
Revision Level vs. Release Level Analysis
Revisionsr2014 r2351r2209... ... ... r2682
A
5
5
Revision Level vs. Release Level Analysis
Revisionsr2014 r2351r2209... ... ... r2682
A A
5
5
Revision Level vs. Release Level Analysis
Revisionsr2014 r2351r2209... ... ... r2682
A A
5
5
Revision Level vs. Release Level Analysis
Revisionsr2014 r2351r2209... ... ... r2682
A A
5
5
Revision Level vs. Release Level Analysis
Revisionsr2014 r2351r2209... ... ... r2682
A A
5
Maintenance
Experimentation
Refactoring
5
Revision Level vs. Release Level Analysis
Revisionsr2014 r2351r2209... ... ... r2682
A A
5
Maintenance
Experimentation
Refactoring
5
Revision Level vs. Release Level Analysis
Revisionsr2014 r2351r2209... ... ... r2682
Inconsistent Changes
Code Clones
Time
Amount
A A
5
Maintenance
Experimentation
Refactoring
5
Revision Level vs. Release Level Analysis
Releases2.1 2.32.2 2.4 3.0
Revisionsr2014 r2351r2209... ... ... r2682
Inconsistent Changes
Code Clones
Time
Amount
A A
5
6
Motivation
How does this look like, if we step back and
observe the effects of inconsistent changes
at release level?
6
7
Study Design: Subject Systems
22 Releases over 1 year
51 Days / release
15k Lines of code
50 Releases over 4 years
36 Days / release
90k Lines of code
7
8
Study Design: Clone Detection  Tracking
Releases2.1 2.32.2 2.4 3.0
8
8
Study Design: Clone Detection  Tracking
Releases2.1 2.32.2 2.4 3.0
Clone
Reports
8
8
Study Design: Clone Detection  Tracking
Releases2.1 2.32.2 2.4 3.0
Clone
Reports
Clone Groups
8
8
Study Design: Clone Detection  Tracking
Releases2.1 2.32.2 2.4 3.0
Clone
Reports
Clone Groups
Genealogies
8
9
Study Design: Inconsistent Changes
2.1 2.32.2 2.4 3.0
Inconsistent Change Consistent Change
9
10
Research Questions
Q1
What are the characteristics of long-lived clone
genealogies at release level?
Q2
What is the effect of inconsistent changes to code clones
when measured at release level?
Q3
What type of cloning patterns do we observe at release
level?
10
11
Research Questions
Q1
What are the characteristics of long-lived
clone genealogies at release level?
11
12
Research Question Q1
125102050
Lifetime  of  Clone  Groups
Number  of  Releases
Apache  Mina
jEdit
Number  of  Genealogies
125102050100200
Size  of  Clone  Groups
Number  of  Clones
Apache  Mina
jEdit
Number  of  Genealogies
12
12
Research Question Q1
125102050
Lifetime  of  Clone  Groups
Number  of  Releases
Apache  Mina
jEdit
Number  of  Genealogies
125102050100200
Size  of  Clone  Groups
Number  of  Clones
Apache  Mina
jEdit
Number  of  Genealogies
12
12
Research Question Q1
125102050
Lifetime  of  Clone  Groups
Number  of  Releases
Apache  Mina
jEdit
Number  of  Genealogies
125102050100200
Size  of  Clone  Groups
Number  of  Clones
Apache  Mina
jEdit
Number  of  Genealogies
12
12
Research Question Q1
125102050
Lifetime  of  Clone  Groups
Number  of  Releases
Apache  Mina
jEdit
Number  of  Genealogies
125102050100200
Size  of  Clone  Groups
Number  of  Clones
Apache  Mina
jEdit
Number  of  Genealogies
Small, rather long lived
clone-genealogies.
12
13
Research Questions
Q2
What is the effect of inconsistent changes to
code clones when measured at release level?
13
14
Research Question Q2
org.gjt.sp.jedit.jEdit.newView(View Buffer)
{
...
// show tip of the day
if(newView == viewsFirst)
{
// Don't show the welcome message if jEdit was started
// with the -nosettings switch
if(settingsDirectory != null  getBooleanProperty(firstTime))
new HelpViewer(jeditresource:/doc/welcome.html);
else if(jEdit.getBooleanProperty(tip.show))
new TipOfTheDay(newView);
}
...
org.gjt.sp.jedit.jEdit.newView(View String)
{
...
// show tip of the day
if(newView == viewsFirst)
{
// Don't show the welcome message if jEdit was started
// with the -nosettings switch
if(settingsDirectory != null  getBooleanProperty(firstTime))
new HelpViewer(jeditresource:/doc/welcome.html);
else if(jEdit.getBooleanProperty(tip.show))
new TipOfTheDay(newView);
}
...
4.0.1 4.0.2
jEdit
14
15
Research Question Q2
org.gjt.sp.jedit.jEdit.newView(View Buffer)
{
...
// show tip of the day
if(newView == viewsFirst)
{
// Don't show the welcome message if jEdit was started
// with the -nosettings switch
if(settingsDirectory != null  getBooleanProperty(firstTime))
new HelpViewer(jeditresource:/doc/welcome.html);
else if(jEdit.getBooleanProperty(tip.show))
new TipOfTheDay(newView);
}
...
org.gjt.sp.jedit.jEdit.newView(View String)
{
...
// show tip of the day
if(newView == viewsFirst)
{
// Don't show the welcome message if jEdit was started
// with the -nosettings switch
if(settingsDirectory != null  getBooleanProperty(firstTime))
new HelpViewer(jeditresource:/doc/welcome.html);
else if(jEdit.getBooleanProperty(tip.show))
new TipOfTheDay(newView);
}
...
4.0.2 4.0.3
jEdit
15
16
Research Question Q2
org.gjt.sp.jedit.jEdit.newView(View Buffer)
{
...
// show tip of the day
if(newView == viewsFirst)
{
// Don't show the welcome message if jEdit was started
// with the -nosettings switch
if(settingsDirectory != null  getBooleanProperty(firstTime))
new HelpViewer();
else if(jEdit.getBooleanProperty(tip.show))
new TipOfTheDay(newView);
}
...
org.gjt.sp.jedit.jEdit.newView(View String)
{
...
// show tip of the day
if(newView == viewsFirst)
{
// Don't show the welcome message if jEdit was started
// with the -nosettings switch
if(settingsDirectory != null  getBooleanProperty(firstTime))
new HelpViewer(jeditresource:/doc/welcome.html);
else if(jEdit.getBooleanProperty(tip.show))
new TipOfTheDay(newView);
}
...
4.0.3 4.0.4
jEdit
16
17
Research Question Q2
• 748 inconsistent changes flagged by our tool
• Manual inspection of the source code
• 7 inconsistent changes related to bugs
• For 41% - 65% of genealogies: inconsistent
changes carried out on purpose.
17
17
Research Question Q2
• 748 inconsistent changes flagged by our tool
• Manual inspection of the source code
• 7 inconsistent changes related to bugs
• For 41% - 65% of genealogies: inconsistent
changes carried out on purpose.
Independent evolution of
clones observed at
release level.
17
18
Research Questions
Q3 What type of cloning patterns do we observe at release
level?
18
19
Research Question Q3
!# $% !# 
'()**'+,,#-. / '()**'+,,#-. $
#0#)1* $ #0#)1* 
!(!12,(#320 % !(!12,(#320 
(24#'!,25!-05*2'#!4#32 67 (24#'!,25!-05*2'#!4#32 %$
82(9!,#1 $ 82(9!,#1 
2:2(#12-,  2:2(#12-, ;
!#$$%%'#(%)*+)+!),-+!)*-$+%*+./#'0-+1%*#
!#$$%%'#(%)*+)+!),-+!)*-$+%*+23,%(
!#$%'()'*+
,#%'$%-
./0
1!2'(%3
40
5#!%3*(
60
'#%
440
!7,,8((%*9
/0
%+%73,
440#'!'3(!%-+
40
!#$%'()'*+
,#%'$%-
/:0
1!2'(%3
40 5#!%3*(
;0
'#%
40
!7,,8((%*9
0
%+%73,
40
#'!'3(!%-+
;0
19
19
Research Question Q3
!# $% !# 
'()**'+,,#-. / '()**'+,,#-. $
#0#)1* $ #0#)1* 
!(!12,(#320 % !(!12,(#320 
(24#'!,25!-05*2'#!4#32 67 (24#'!,25!-05*2'#!4#32 %$
82(9!,#1 $ 82(9!,#1 
2:2(#12-,  2:2(#12-, ;
!#$$%%'#(%)*+)+!),-+!)*-$+%*+./#'0-+1%*#
!#$$%%'#(%)*+)+!),-+!)*-$+%*+23,%(
!#$%'()'*+
,#%'$%-
./0
1!2'(%3
40
5#!%3*(
60
'#%
440
!7,,8((%*9
/0
%+%73,
440#'!'3(!%-+
40
!#$%'()'*+
,#%'$%-
/:0
1!2'(%3
40 5#!%3*(
;0
'#%
40
!7,,8((%*9
0
%+%73,
40
#'!'3(!%-+
;0
Replicate and Specialize
46% - 68%
inconsistent changes due to independent
evolution of cloned parts.
19
19
Research Question Q3
!# $% !# 
'()**'+,,#-. / '()**'+,,#-. $
#0#)1* $ #0#)1* 
!(!12,(#320 % !(!12,(#320 
(24#'!,25!-05*2'#!4#32 67 (24#'!,25!-05*2'#!4#32 %$
82(9!,#1 $ 82(9!,#1 
2:2(#12-,  2:2(#12-, ;
!#$$%%'#(%)*+)+!),-+!)*-$+%*+./#'0-+1%*#
!#$$%%'#(%)*+)+!),-+!)*-$+%*+23,%(
!#$%'()'*+
,#%'$%-
./0
1!2'(%3
40
5#!%3*(
60
'#%
440
!7,,8((%*9
/0
%+%73,
440#'!'3(!%-+
40
!#$%'()'*+
,#%'$%-
/:0
1!2'(%3
40 5#!%3*(
;0
'#%
40
!7,,8((%*9
0
%+%73,
40
#'!'3(!%-+
;0
Replicate and Specialize
46% - 68%
inconsistent changes due to independent
evolution of cloned parts.
API Usage
22% - 23%
inconsistent changes due to use of distinct
parts of the API to create functionality.
19
19
Research Question Q3
!# $% !# 
'()**'+,,#-. / '()**'+,,#-. $
#0#)1* $ #0#)1* 
!(!12,(#320 % !(!12,(#320 
(24#'!,25!-05*2'#!4#32 67 (24#'!,25!-05*2'#!4#32 %$
82(9!,#1 $ 82(9!,#1 
2:2(#12-,  2:2(#12-, ;
!#$$%%'#(%)*+)+!),-+!)*-$+%*+./#'0-+1%*#
!#$$%%'#(%)*+)+!),-+!)*-$+%*+23,%(
!#$%'()'*+
,#%'$%-
./0
1!2'(%3
40
5#!%3*(
60
'#%
440
!7,,8((%*9
/0
%+%73,
440#'!'3(!%-+
40
!#$%'()'*+
,#%'$%-
/:0
1!2'(%3
40 5#!%3*(
;0
'#%
40
!7,,8((%*9
0
%+%73,
40
#'!'3(!%-+
;0
Replicate and Specialize
46% - 68%
inconsistent changes due to independent
evolution of cloned parts.
API Usage
22% - 23%
inconsistent changes due to use of distinct
parts of the API to create functionality.
Developers seem
to effectively handle
inconsistent changes
at macro-level.
19
20
Conclusions
20
20
Conclusions
20
20
Conclusions
20
20
Conclusions
20
20
Conclusions
20
20
Conclusions
QUESTIONS?
20

More Related Content

Viewers also liked

An Empirical Study Of Function Clones In Open Source Software
An Empirical Study Of Function Clones In Open Source SoftwareAn Empirical Study Of Function Clones In Open Source Software
An Empirical Study Of Function Clones In Open Source Softwarek_furqan
 
An Empirical Study of Duplication in Cascading Style Sheets
An Empirical Study of Duplication in Cascading Style SheetsAn Empirical Study of Duplication in Cascading Style Sheets
An Empirical Study of Duplication in Cascading Style SheetsNikolaos Tsantalis
 
Late Propagation in Software Clones
Late Propagation in Software ClonesLate Propagation in Software Clones
Late Propagation in Software ClonesFoutse Khomh
 
Improving the Unification of Software Clones Using Tree and Graph Matching Al...
Improving the Unification of Software Clones Using Tree and Graph Matching Al...Improving the Unification of Software Clones Using Tree and Graph Matching Al...
Improving the Unification of Software Clones Using Tree and Graph Matching Al...Nikolaos Tsantalis
 
Unsupervised Machine Learning for clone detection
Unsupervised Machine Learning for clone detectionUnsupervised Machine Learning for clone detection
Unsupervised Machine Learning for clone detectionValerio Maggio
 

Viewers also liked (6)

An Empirical Study Of Function Clones In Open Source Software
An Empirical Study Of Function Clones In Open Source SoftwareAn Empirical Study Of Function Clones In Open Source Software
An Empirical Study Of Function Clones In Open Source Software
 
Mps2
Mps2Mps2
Mps2
 
An Empirical Study of Duplication in Cascading Style Sheets
An Empirical Study of Duplication in Cascading Style SheetsAn Empirical Study of Duplication in Cascading Style Sheets
An Empirical Study of Duplication in Cascading Style Sheets
 
Late Propagation in Software Clones
Late Propagation in Software ClonesLate Propagation in Software Clones
Late Propagation in Software Clones
 
Improving the Unification of Software Clones Using Tree and Graph Matching Al...
Improving the Unification of Software Clones Using Tree and Graph Matching Al...Improving the Unification of Software Clones Using Tree and Graph Matching Al...
Improving the Unification of Software Clones Using Tree and Graph Matching Al...
 
Unsupervised Machine Learning for clone detection
Unsupervised Machine Learning for clone detectionUnsupervised Machine Learning for clone detection
Unsupervised Machine Learning for clone detection
 

Similar to Inconsistent Changes to Code Clones Often Half of Changes

Software Product Line Analysis and Detection of Clones
Software Product Line Analysis and Detection of ClonesSoftware Product Line Analysis and Detection of Clones
Software Product Line Analysis and Detection of ClonesRSIS International
 
Annotated Bibliography .Guidelines Annotated Bibliograph.docx
Annotated Bibliography  .Guidelines Annotated Bibliograph.docxAnnotated Bibliography  .Guidelines Annotated Bibliograph.docx
Annotated Bibliography .Guidelines Annotated Bibliograph.docxjustine1simpson78276
 
Paper id 22201490
Paper id 22201490Paper id 22201490
Paper id 22201490IJRAT
 
Thesis+of+fehmi+jaafar.ppt
Thesis+of+fehmi+jaafar.pptThesis+of+fehmi+jaafar.ppt
Thesis+of+fehmi+jaafar.pptPtidej Team
 
ScalaItaly 2015 - Your Microservice as a Function
ScalaItaly 2015 - Your Microservice as a FunctionScalaItaly 2015 - Your Microservice as a Function
ScalaItaly 2015 - Your Microservice as a FunctionPhil Calçado
 
Phil Calçado - Your microservice as a function
Phil Calçado - Your microservice as a functionPhil Calçado - Your microservice as a function
Phil Calçado - Your microservice as a functionScala Italy
 
Early Detection of Collaboration Conflicts & Risks in Software Development
Early Detection of Collaboration Conflicts & Risks in Software DevelopmentEarly Detection of Collaboration Conflicts & Risks in Software Development
Early Detection of Collaboration Conflicts & Risks in Software DevelopmentRoopesh Jhurani
 
IRJET- Code Cloning using Abstract Syntax Tree
IRJET- Code Cloning using Abstract Syntax TreeIRJET- Code Cloning using Abstract Syntax Tree
IRJET- Code Cloning using Abstract Syntax TreeIRJET Journal
 
Applying Machine Learning to Software Clustering
Applying Machine Learning to Software ClusteringApplying Machine Learning to Software Clustering
Applying Machine Learning to Software Clusteringbutest
 
LOCK-FREE PARALLEL ACCESS COLLECTIONS
LOCK-FREE PARALLEL ACCESS COLLECTIONSLOCK-FREE PARALLEL ACCESS COLLECTIONS
LOCK-FREE PARALLEL ACCESS COLLECTIONSijdpsjournal
 
Lock free parallel access collections
Lock free parallel access collectionsLock free parallel access collections
Lock free parallel access collectionsijdpsjournal
 
Rhein-Main Scala Enthusiasts — Your microservice as a Function
Rhein-Main Scala Enthusiasts — Your microservice as a FunctionRhein-Main Scala Enthusiasts — Your microservice as a Function
Rhein-Main Scala Enthusiasts — Your microservice as a FunctionPhil Calçado
 
A novel approach based on topic
A novel approach based on topicA novel approach based on topic
A novel approach based on topiccsandit
 
Got Myth? Myths in Software Engineering
Got Myth? Myths in Software EngineeringGot Myth? Myths in Software Engineering
Got Myth? Myths in Software EngineeringThomas Zimmermann
 
Software Refactoring Under Uncertainty: A Robust Multi-Objective Approach
Software Refactoring Under Uncertainty:  A Robust Multi-Objective ApproachSoftware Refactoring Under Uncertainty:  A Robust Multi-Objective Approach
Software Refactoring Under Uncertainty: A Robust Multi-Objective ApproachWiem Mkaouer
 
A Novel Approach for Code Clone Detection Using Hybrid Technique
A Novel Approach for Code Clone Detection Using Hybrid TechniqueA Novel Approach for Code Clone Detection Using Hybrid Technique
A Novel Approach for Code Clone Detection Using Hybrid TechniqueINFOGAIN PUBLICATION
 
SECURITY FOR DEVOPS DEPLOYMENT PROCESSES: DEFENSES, RISKS, RESEARCH DIRECTIONS
SECURITY FOR DEVOPS DEPLOYMENT PROCESSES: DEFENSES, RISKS, RESEARCH DIRECTIONSSECURITY FOR DEVOPS DEPLOYMENT PROCESSES: DEFENSES, RISKS, RESEARCH DIRECTIONS
SECURITY FOR DEVOPS DEPLOYMENT PROCESSES: DEFENSES, RISKS, RESEARCH DIRECTIONSijseajournal
 
130404 fehmi jaafar - on the relationship between program evolution and fau...
130404   fehmi jaafar - on the relationship between program evolution and fau...130404   fehmi jaafar - on the relationship between program evolution and fau...
130404 fehmi jaafar - on the relationship between program evolution and fau...Ptidej Team
 

Similar to Inconsistent Changes to Code Clones Often Half of Changes (20)

Software Product Line Analysis and Detection of Clones
Software Product Line Analysis and Detection of ClonesSoftware Product Line Analysis and Detection of Clones
Software Product Line Analysis and Detection of Clones
 
Annotated Bibliography .Guidelines Annotated Bibliograph.docx
Annotated Bibliography  .Guidelines Annotated Bibliograph.docxAnnotated Bibliography  .Guidelines Annotated Bibliograph.docx
Annotated Bibliography .Guidelines Annotated Bibliograph.docx
 
Paper id 22201490
Paper id 22201490Paper id 22201490
Paper id 22201490
 
Thesis+of+fehmi+jaafar.ppt
Thesis+of+fehmi+jaafar.pptThesis+of+fehmi+jaafar.ppt
Thesis+of+fehmi+jaafar.ppt
 
ScalaItaly 2015 - Your Microservice as a Function
ScalaItaly 2015 - Your Microservice as a FunctionScalaItaly 2015 - Your Microservice as a Function
ScalaItaly 2015 - Your Microservice as a Function
 
Phil Calçado - Your microservice as a function
Phil Calçado - Your microservice as a functionPhil Calçado - Your microservice as a function
Phil Calçado - Your microservice as a function
 
Early Detection of Collaboration Conflicts & Risks in Software Development
Early Detection of Collaboration Conflicts & Risks in Software DevelopmentEarly Detection of Collaboration Conflicts & Risks in Software Development
Early Detection of Collaboration Conflicts & Risks in Software Development
 
IRJET- Code Cloning using Abstract Syntax Tree
IRJET- Code Cloning using Abstract Syntax TreeIRJET- Code Cloning using Abstract Syntax Tree
IRJET- Code Cloning using Abstract Syntax Tree
 
Applying Machine Learning to Software Clustering
Applying Machine Learning to Software ClusteringApplying Machine Learning to Software Clustering
Applying Machine Learning to Software Clustering
 
LOCK-FREE PARALLEL ACCESS COLLECTIONS
LOCK-FREE PARALLEL ACCESS COLLECTIONSLOCK-FREE PARALLEL ACCESS COLLECTIONS
LOCK-FREE PARALLEL ACCESS COLLECTIONS
 
Lock free parallel access collections
Lock free parallel access collectionsLock free parallel access collections
Lock free parallel access collections
 
scp
scpscp
scp
 
Most Influential Paper - SANER 2017
Most Influential Paper - SANER 2017Most Influential Paper - SANER 2017
Most Influential Paper - SANER 2017
 
Rhein-Main Scala Enthusiasts — Your microservice as a Function
Rhein-Main Scala Enthusiasts — Your microservice as a FunctionRhein-Main Scala Enthusiasts — Your microservice as a Function
Rhein-Main Scala Enthusiasts — Your microservice as a Function
 
A novel approach based on topic
A novel approach based on topicA novel approach based on topic
A novel approach based on topic
 
Got Myth? Myths in Software Engineering
Got Myth? Myths in Software EngineeringGot Myth? Myths in Software Engineering
Got Myth? Myths in Software Engineering
 
Software Refactoring Under Uncertainty: A Robust Multi-Objective Approach
Software Refactoring Under Uncertainty:  A Robust Multi-Objective ApproachSoftware Refactoring Under Uncertainty:  A Robust Multi-Objective Approach
Software Refactoring Under Uncertainty: A Robust Multi-Objective Approach
 
A Novel Approach for Code Clone Detection Using Hybrid Technique
A Novel Approach for Code Clone Detection Using Hybrid TechniqueA Novel Approach for Code Clone Detection Using Hybrid Technique
A Novel Approach for Code Clone Detection Using Hybrid Technique
 
SECURITY FOR DEVOPS DEPLOYMENT PROCESSES: DEFENSES, RISKS, RESEARCH DIRECTIONS
SECURITY FOR DEVOPS DEPLOYMENT PROCESSES: DEFENSES, RISKS, RESEARCH DIRECTIONSSECURITY FOR DEVOPS DEPLOYMENT PROCESSES: DEFENSES, RISKS, RESEARCH DIRECTIONS
SECURITY FOR DEVOPS DEPLOYMENT PROCESSES: DEFENSES, RISKS, RESEARCH DIRECTIONS
 
130404 fehmi jaafar - on the relationship between program evolution and fau...
130404   fehmi jaafar - on the relationship between program evolution and fau...130404   fehmi jaafar - on the relationship between program evolution and fau...
130404 fehmi jaafar - on the relationship between program evolution and fau...
 

More from SAIL_QU

Studying the Integration Practices and the Evolution of Ad Libraries in the G...
Studying the Integration Practices and the Evolution of Ad Libraries in the G...Studying the Integration Practices and the Evolution of Ad Libraries in the G...
Studying the Integration Practices and the Evolution of Ad Libraries in the G...SAIL_QU
 
Studying the Dialogue Between Users and Developers of Free Apps in the Google...
Studying the Dialogue Between Users and Developers of Free Apps in the Google...Studying the Dialogue Between Users and Developers of Free Apps in the Google...
Studying the Dialogue Between Users and Developers of Free Apps in the Google...SAIL_QU
 
Improving the testing efficiency of selenium-based load tests
Improving the testing efficiency of selenium-based load testsImproving the testing efficiency of selenium-based load tests
Improving the testing efficiency of selenium-based load testsSAIL_QU
 
Studying User-Developer Interactions Through the Distribution and Reviewing M...
Studying User-Developer Interactions Through the Distribution and Reviewing M...Studying User-Developer Interactions Through the Distribution and Reviewing M...
Studying User-Developer Interactions Through the Distribution and Reviewing M...SAIL_QU
 
Studying online distribution platforms for games through the mining of data f...
Studying online distribution platforms for games through the mining of data f...Studying online distribution platforms for games through the mining of data f...
Studying online distribution platforms for games through the mining of data f...SAIL_QU
 
Understanding the Factors for Fast Answers in Technical Q&A Websites: An Empi...
Understanding the Factors for Fast Answers in Technical Q&A Websites: An Empi...Understanding the Factors for Fast Answers in Technical Q&A Websites: An Empi...
Understanding the Factors for Fast Answers in Technical Q&A Websites: An Empi...SAIL_QU
 
Investigating the Challenges in Selenium Usage and Improving the Testing Effi...
Investigating the Challenges in Selenium Usage and Improving the Testing Effi...Investigating the Challenges in Selenium Usage and Improving the Testing Effi...
Investigating the Challenges in Selenium Usage and Improving the Testing Effi...SAIL_QU
 
Mining Development Knowledge to Understand and Support Software Logging Pract...
Mining Development Knowledge to Understand and Support Software Logging Pract...Mining Development Knowledge to Understand and Support Software Logging Pract...
Mining Development Knowledge to Understand and Support Software Logging Pract...SAIL_QU
 
Which Log Level Should Developers Choose For a New Logging Statement?
Which Log Level Should Developers Choose For a New Logging Statement?Which Log Level Should Developers Choose For a New Logging Statement?
Which Log Level Should Developers Choose For a New Logging Statement?SAIL_QU
 
Towards Just-in-Time Suggestions for Log Changes
Towards Just-in-Time Suggestions for Log ChangesTowards Just-in-Time Suggestions for Log Changes
Towards Just-in-Time Suggestions for Log ChangesSAIL_QU
 
The Impact of Task Granularity on Co-evolution Analyses
The Impact of Task Granularity on Co-evolution AnalysesThe Impact of Task Granularity on Co-evolution Analyses
The Impact of Task Granularity on Co-evolution AnalysesSAIL_QU
 
A Framework for Evaluating the Results of the SZZ Approach for Identifying Bu...
A Framework for Evaluating the Results of the SZZ Approach for Identifying Bu...A Framework for Evaluating the Results of the SZZ Approach for Identifying Bu...
A Framework for Evaluating the Results of the SZZ Approach for Identifying Bu...SAIL_QU
 
How are Discussions Associated with Bug Reworking? An Empirical Study on Open...
How are Discussions Associated with Bug Reworking? An Empirical Study on Open...How are Discussions Associated with Bug Reworking? An Empirical Study on Open...
How are Discussions Associated with Bug Reworking? An Empirical Study on Open...SAIL_QU
 
A Study of the Relation of Mobile Device Attributes with the User-Perceived Q...
A Study of the Relation of Mobile Device Attributes with the User-Perceived Q...A Study of the Relation of Mobile Device Attributes with the User-Perceived Q...
A Study of the Relation of Mobile Device Attributes with the User-Perceived Q...SAIL_QU
 
A Large-Scale Study of the Impact of Feature Selection Techniques on Defect C...
A Large-Scale Study of the Impact of Feature Selection Techniques on Defect C...A Large-Scale Study of the Impact of Feature Selection Techniques on Defect C...
A Large-Scale Study of the Impact of Feature Selection Techniques on Defect C...SAIL_QU
 
Studying the Dialogue Between Users and Developers of Free Apps in the Google...
Studying the Dialogue Between Users and Developers of Free Apps in the Google...Studying the Dialogue Between Users and Developers of Free Apps in the Google...
Studying the Dialogue Between Users and Developers of Free Apps in the Google...SAIL_QU
 
What Do Programmers Know about Software Energy Consumption?
What Do Programmers Know about Software Energy Consumption?What Do Programmers Know about Software Energy Consumption?
What Do Programmers Know about Software Energy Consumption?SAIL_QU
 
Threshold for Size and Complexity Metrics: A Case Study from the Perspective ...
Threshold for Size and Complexity Metrics: A Case Study from the Perspective ...Threshold for Size and Complexity Metrics: A Case Study from the Perspective ...
Threshold for Size and Complexity Metrics: A Case Study from the Perspective ...SAIL_QU
 
Revisiting the Experimental Design Choices for Approaches for the Automated R...
Revisiting the Experimental Design Choices for Approaches for the Automated R...Revisiting the Experimental Design Choices for Approaches for the Automated R...
Revisiting the Experimental Design Choices for Approaches for the Automated R...SAIL_QU
 
Measuring Program Comprehension: A Large-Scale Field Study with Professionals
Measuring Program Comprehension: A Large-Scale Field Study with ProfessionalsMeasuring Program Comprehension: A Large-Scale Field Study with Professionals
Measuring Program Comprehension: A Large-Scale Field Study with ProfessionalsSAIL_QU
 

More from SAIL_QU (20)

Studying the Integration Practices and the Evolution of Ad Libraries in the G...
Studying the Integration Practices and the Evolution of Ad Libraries in the G...Studying the Integration Practices and the Evolution of Ad Libraries in the G...
Studying the Integration Practices and the Evolution of Ad Libraries in the G...
 
Studying the Dialogue Between Users and Developers of Free Apps in the Google...
Studying the Dialogue Between Users and Developers of Free Apps in the Google...Studying the Dialogue Between Users and Developers of Free Apps in the Google...
Studying the Dialogue Between Users and Developers of Free Apps in the Google...
 
Improving the testing efficiency of selenium-based load tests
Improving the testing efficiency of selenium-based load testsImproving the testing efficiency of selenium-based load tests
Improving the testing efficiency of selenium-based load tests
 
Studying User-Developer Interactions Through the Distribution and Reviewing M...
Studying User-Developer Interactions Through the Distribution and Reviewing M...Studying User-Developer Interactions Through the Distribution and Reviewing M...
Studying User-Developer Interactions Through the Distribution and Reviewing M...
 
Studying online distribution platforms for games through the mining of data f...
Studying online distribution platforms for games through the mining of data f...Studying online distribution platforms for games through the mining of data f...
Studying online distribution platforms for games through the mining of data f...
 
Understanding the Factors for Fast Answers in Technical Q&A Websites: An Empi...
Understanding the Factors for Fast Answers in Technical Q&A Websites: An Empi...Understanding the Factors for Fast Answers in Technical Q&A Websites: An Empi...
Understanding the Factors for Fast Answers in Technical Q&A Websites: An Empi...
 
Investigating the Challenges in Selenium Usage and Improving the Testing Effi...
Investigating the Challenges in Selenium Usage and Improving the Testing Effi...Investigating the Challenges in Selenium Usage and Improving the Testing Effi...
Investigating the Challenges in Selenium Usage and Improving the Testing Effi...
 
Mining Development Knowledge to Understand and Support Software Logging Pract...
Mining Development Knowledge to Understand and Support Software Logging Pract...Mining Development Knowledge to Understand and Support Software Logging Pract...
Mining Development Knowledge to Understand and Support Software Logging Pract...
 
Which Log Level Should Developers Choose For a New Logging Statement?
Which Log Level Should Developers Choose For a New Logging Statement?Which Log Level Should Developers Choose For a New Logging Statement?
Which Log Level Should Developers Choose For a New Logging Statement?
 
Towards Just-in-Time Suggestions for Log Changes
Towards Just-in-Time Suggestions for Log ChangesTowards Just-in-Time Suggestions for Log Changes
Towards Just-in-Time Suggestions for Log Changes
 
The Impact of Task Granularity on Co-evolution Analyses
The Impact of Task Granularity on Co-evolution AnalysesThe Impact of Task Granularity on Co-evolution Analyses
The Impact of Task Granularity on Co-evolution Analyses
 
A Framework for Evaluating the Results of the SZZ Approach for Identifying Bu...
A Framework for Evaluating the Results of the SZZ Approach for Identifying Bu...A Framework for Evaluating the Results of the SZZ Approach for Identifying Bu...
A Framework for Evaluating the Results of the SZZ Approach for Identifying Bu...
 
How are Discussions Associated with Bug Reworking? An Empirical Study on Open...
How are Discussions Associated with Bug Reworking? An Empirical Study on Open...How are Discussions Associated with Bug Reworking? An Empirical Study on Open...
How are Discussions Associated with Bug Reworking? An Empirical Study on Open...
 
A Study of the Relation of Mobile Device Attributes with the User-Perceived Q...
A Study of the Relation of Mobile Device Attributes with the User-Perceived Q...A Study of the Relation of Mobile Device Attributes with the User-Perceived Q...
A Study of the Relation of Mobile Device Attributes with the User-Perceived Q...
 
A Large-Scale Study of the Impact of Feature Selection Techniques on Defect C...
A Large-Scale Study of the Impact of Feature Selection Techniques on Defect C...A Large-Scale Study of the Impact of Feature Selection Techniques on Defect C...
A Large-Scale Study of the Impact of Feature Selection Techniques on Defect C...
 
Studying the Dialogue Between Users and Developers of Free Apps in the Google...
Studying the Dialogue Between Users and Developers of Free Apps in the Google...Studying the Dialogue Between Users and Developers of Free Apps in the Google...
Studying the Dialogue Between Users and Developers of Free Apps in the Google...
 
What Do Programmers Know about Software Energy Consumption?
What Do Programmers Know about Software Energy Consumption?What Do Programmers Know about Software Energy Consumption?
What Do Programmers Know about Software Energy Consumption?
 
Threshold for Size and Complexity Metrics: A Case Study from the Perspective ...
Threshold for Size and Complexity Metrics: A Case Study from the Perspective ...Threshold for Size and Complexity Metrics: A Case Study from the Perspective ...
Threshold for Size and Complexity Metrics: A Case Study from the Perspective ...
 
Revisiting the Experimental Design Choices for Approaches for the Automated R...
Revisiting the Experimental Design Choices for Approaches for the Automated R...Revisiting the Experimental Design Choices for Approaches for the Automated R...
Revisiting the Experimental Design Choices for Approaches for the Automated R...
 
Measuring Program Comprehension: A Large-Scale Field Study with Professionals
Measuring Program Comprehension: A Large-Scale Field Study with ProfessionalsMeasuring Program Comprehension: A Large-Scale Field Study with Professionals
Measuring Program Comprehension: A Large-Scale Field Study with Professionals
 

Inconsistent Changes to Code Clones Often Half of Changes

  • 1. Inconsistent Changes to Code Clones Nicolas Bettenburg An Empirical Study on at Release Level Weyi Shang Walid Ibrahim Bram Adams Ying Zou Ahmed E. Hassan 1
  • 2. 2 Code Clones: Recent Research in the Field “Cloning Considered Harmful” Considered Harmful Cory Kapser and Michael W. Godfrey Software Architecture Group (SWAG) David R. Cheriton School of Computer Science, University of Waterloo {cjkapser, migod}@uwaterloo.ca Abstract Current literature on the topic of duplicated (cloned) code in software systems often considers duplication harmful to the system quality and the reasons commonly cited for duplicating code often have a negative connotation. While these positions are sometimes correct, during our case studies we have found that this is not universally true, and we have found several situations where code duplication seems to be a reasonable or even beneficial design option. For example, a method of introducing experimental changes to core subsystems is to duplicate the subsystem and introduce changes there in a kind of sandbox testbed. As features mature and become stable within the experimental subsystem, they can then be introduced gradually into the stable code base. In this way risk of introducing instabilities in the stable version is minimized. This paper describes several patterns of cloning that we have encountered in our case studies and discusses the advantages and disadvantages associated with using them. 1. Introduction It is believed that most large software systems contain a non-trivial amount of redundant code. Often referred to as code clones, these segments of code typically involve 10–15% of the source code [24, 25]. Code clones can arise through a number of different activities. For example, intentional clones may be introduced through direct “copy- and-pasting” of code. Unintentional clones on the other hand may be the manifestation of programming idioms related to the language or libraries the developers are using. In much of the literature on the topic [2, 7, 12, 21, 22, 27, 28], cloning is considered harmful to the quality of the source code. Code clones can cause additional maintenance effort. Changes to one segment of code may need to be propagated to several others, incurring unnecessary maintenance costs [15]. Locating and maintaining these clones pose additional problems if they do not evolve synchronously. With this in mind, methods for automatic refactoring have been suggested [4, 7], and tools specifically to aid developers in the manual refactoring of clones have also been developed [19]. There is no doubt that code cloning is often an indication of sloppy design and in such cases should be considered to be a kind of development “bad smell”. However, we have found that there are many instances where this is simply not the case. For example, cloning may be used to introduce experimental optimizations to core subsystems without negatively effecting the stability of the main code. Thus, a variety of concerns such as stability, code ownership, and design clarity need to be considered before any refactoring is attempted; a manager should try to understand the reason behind the duplication before deciding what action (if any) to take. 1 This paper introduces eight cloning patterns that we have uncovered during case studies on large software systems, some of which we reported in [23, 24, 25]. These patterns present both good and bad motivations for cloning, and we discuss both the advantages and disadvantages of these patterns of cloning in terms of development and maintenance. In some cases, we identify patterns of cloning that we believe are beneficial to the quality of the system. From our observations we have found that refactoring may not be the best solution in all patterns of cloning. Tools need to be developed to aid the synchronous maintenance of clones within a software system, such as Linked Editing presented by Toomim et al. [29]. This paper introduces the notion of categorizing high level patterns of cloning in a similar fashion to the cataloging of design patterns [14] or anti-patterns [8]. There are several benefits that can be gained from this characterization of cloning. First, it provides a flexible framework on top of which we can document our knowledge about how and why cloning occurs in 1A simple (but trivial) example is the title of this paper. Although there is a kind of duplication in the wording, no ”refactoring” of the title would carry the same connotations as the original statement. A Study of Consistent and Inconsistent Changes to Code Clones Jens Krinke FernUniversit¨at in Hagen, Germany krinke@acm.org Abstract Code Cloning is regarded as a threat to software main- tenance, because it is generally assumed that a change to a code clone usually has to be applied to the other clones of the clone group as well. However, there exists little empirical data that supports this assumption. This paper presents a study on the changes applied to code clones in open source software systems based on the changes between versions of the system. It is analyzed if changes to code clones are consistent to all code clones of a clone group or not. The results show that usually half of the changes to code clone groups are inconsistent changes. Moreover, the study observes that when there are inconsistent changes to a code clone group in a near version, it is rarely the case that there are additional changes in later versions such that the code clone group then has only consistent changes. 1 Introduction Duplicated code is common in all kind of software sys- tems. Although cut-copy-paste (-and-adapt) techniques are considered bad practice, every programmer uses them. Since these practices involve both duplication and mod- ification, they are collectively called code cloning. While the duplicated code is called a code clone. A clone group consists of code clones that are clones of each other (some- times this is also called a clone class). During the software development cycle, code cloning is both easy and inexpen- sive (in both cost and money). However, this practice can complicate software maintenence in the following ways: • Errors may have been duplicated (cloned) in parallel with the cloned code. • Modifications done to the original code must often be applied to the cloned code as well. Because of these problems, research has developed many approaches to detect cloned code [5,6,9,12,16–18,20]. In addition, some empirical work done has attempted to check whether or not the above mentioned problems are relevant in practice. Kim et al. [15] investigated the evolution of code clones and provided a classification for evolving code clones. Their work already showed that during the evolution of the code clones, consistent changes to the code clones of a group are fewer than anticipated. Aversano et al. [4] did a similar study and they state “that the majority of clone classes is always maintained consistently.” Geiger et al. [10] studied the relation of code clone groups and change cou- plings (files which are committed at the same time, by the same author, and with the same modification description), but could not find a (strong) relation. Therefore, this work will present an empirical study that verifies the following hypothesis: During the evolution of a system, code clones of a clone group are changed consistently. Of course, a system may contain bugs where a change has been applied to some code clones, but has been forgot- ten for other code clones of the clone group. For stable systems it can be assumed that such bugs will be resolved at a later time. This results in a second hypothesis: During the evolution of a system, if code clones of a clone group are not changed consistently, the missing changes will appear in a later version. This work will verify the two hypotheses by studying the changes that are applied to code clones during 200 weeks of evolution of five open source software systems. The contri- butions of this paper are: • A large empirical study that examines the changes to code clones in evolving systems. This study involves both a greater number and diversity of systems than previous empirical studies. • The study will show that both hypotheses are not gen- erally valid for the five studied systems. In summary, clone groups are changed consistently in roughly half of the time, invalidating the first hypothesis. The sec- ond hypothesis is only partially valid. This is because c2007 IEEE. To be published in the Proceedings of the 14th Working Conference on Reverse Engineering, 2007 in Vancouver, Canada. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE. Do Code Clones Matter? Elmar Juergens, Florian Deissenboeck, Benjamin Hummel, Stefan Wagner Institut f¨ur Informatik, Technische Universit¨at M¨unchen Boltzmannstr. 3, 85748 Garching b. M¨unchen, Germany {juergens,deissenb,hummelb,wagnerst}@in.tum.de Abstract Code cloning is not only assumed to inflate mainte- nance costs but also considered defect-prone as inconsistent changes to code duplicates can lead to unexpected behavior. Consequently, the identification of duplicated code, clone detection, has been a very active area of research in recent years. Up to now, however, no substantial investigation of the consequences of code cloning on program correctness has been carried out. To remedy this shortcoming, this pa- per presents the results of a large-scale case study that was undertaken to find out if inconsistent changes to cloned code can indicate faults. For the analyzed commercial and open source systems we not only found that inconsistent changes to clones are very frequent but also identified a significant number of faults induced by such changes. The clone de- tection tool used in the case study implements a novel algo- rithm for the detection of inconsistent clones. It is available as open source to enable other researchers to use it as basis for further investigations. 1. Clones correctness Research in software maintenance has shown that many programs contain a significant amount of duplicated (cloned) code. Such cloned code is considered harmful for two reasons: (1) multiple, possibly unnecessary, duplicates of code increase maintenance costs and, (2) inconsistent changes to cloned code can create faults and, hence, lead to incorrect program behavior [20, 29]. While clone detec- tion has been a very active area of research in recent years, up to now, there is no thorough understanding of the degree of harmfulness of code cloning. In fact, some researchers even started to doubt the harmfulness of cloning at all [17]. To shed light on the situation, we investigated the ef- fects of code cloning on program correctness. It is impor- tant to understand, that clones do not directly cause faults but inconsistent changes to clones can lead to unexpected program behavior. A particularly dangerous type of change to cloned code is the inconsistent bug fix. If a fault was found in cloned code but not fixed in all clone instances, the system is likely to still exhibit the incorrect behavior. To illustrate this, Fig. 1 shows an example, where a missing null-check was retrofitted in only one clone instance. This paper presents the results of a large-scale case study that was undertaken to find out (1) if clones are changed in- consistently, (2) if these inconsistencies are introduced in- tentionally and, (3) if unintentional inconsistencies can rep- resent faults. In this case study we analyzed three commer- cial systems written in C#, one written in Cobol and one open-source system written in Java. To conduct the study we developed a novel detection algorithm that enables us to detect inconsistent clones. We manually inspected about 900 clone groups to handle the inevitable false positives and discussed each of the over 700 inconsistent clone groups with the developers of the respective systems to determine if the inconsistencies are intentional and if they represent faults. Altogether, around 1800 individual clone group as- sessments were manually performed in the course of the case study. The study lead to the identification of 107 faults that have been confirmed by the systems’ developers. Research Problem Although most previous work agrees that code cloning poses a problem for software mainte- nance, “there is little information available concerning the impacts of code clones on software quality” [29]. As the consequences of code cloning on program correctness, in particular, are not fully understood today, it remains unclear how harmful code clones really are. We consider the ab- sence of a thorough understanding of code cloning precari- ous for software engineering research, education and prac- tice. Contribution The contribution of this paper is twofold. First, we extend the existing empirical knowledge by a case study that demonstrates that clones get changed inconsis- tently and that such changes can represent faults. Second, we present a novel suffix-tree based algorithm for the detec- tion of inconsistent clones. In contrast to other algorithms for the detection of inconsistent clones, our tool suite is made available for other researchers as open source. ICSE’09, May 16-24, 2009, Vancouver, Canada 978-1-4244-3452-7/09/$25.00 © 2009 IEEE 485 2
  • 3. 3 Code Clones: Inconsistent Changes “During the evolution of a system, code clones should be changed consistently to prevent bugs.” 3
  • 4. 3 Code Clones: Inconsistent Changes “During the evolution of a system, code clones should be changed consistently to prevent bugs.” Demonstrated to be true at a micro-level! Juergens et al. ICSE’09, Krinke et al. WCRE’07 3
  • 5. 4 Code Clones: Inconsistent Changes • Cloning an engineering tool • Reduce complexity of programming tasks • Facilitate easier experimentation • Copy now, refactor later! 4
  • 6. 4 Code Clones: Inconsistent Changes • Cloning an engineering tool • Reduce complexity of programming tasks • Facilitate easier experimentation • Copy now, refactor later! What happens at a macro-level? 4
  • 7. 5 Revision Level vs. Release Level Analysis Revisionsr2014 r2351r2209... ... ... r2682 5
  • 8. 5 Revision Level vs. Release Level Analysis Revisionsr2014 r2351r2209... ... ... r2682 A 5
  • 9. 5 Revision Level vs. Release Level Analysis Revisionsr2014 r2351r2209... ... ... r2682 A A 5
  • 10. 5 Revision Level vs. Release Level Analysis Revisionsr2014 r2351r2209... ... ... r2682 A A 5
  • 11. 5 Revision Level vs. Release Level Analysis Revisionsr2014 r2351r2209... ... ... r2682 A A 5
  • 12. 5 Revision Level vs. Release Level Analysis Revisionsr2014 r2351r2209... ... ... r2682 A A 5
  • 13. Maintenance Experimentation Refactoring 5 Revision Level vs. Release Level Analysis Revisionsr2014 r2351r2209... ... ... r2682 A A 5
  • 14. Maintenance Experimentation Refactoring 5 Revision Level vs. Release Level Analysis Revisionsr2014 r2351r2209... ... ... r2682 Inconsistent Changes Code Clones Time Amount A A 5
  • 15. Maintenance Experimentation Refactoring 5 Revision Level vs. Release Level Analysis Releases2.1 2.32.2 2.4 3.0 Revisionsr2014 r2351r2209... ... ... r2682 Inconsistent Changes Code Clones Time Amount A A 5
  • 16. 6 Motivation How does this look like, if we step back and observe the effects of inconsistent changes at release level? 6
  • 17. 7 Study Design: Subject Systems 22 Releases over 1 year 51 Days / release 15k Lines of code 50 Releases over 4 years 36 Days / release 90k Lines of code 7
  • 18. 8 Study Design: Clone Detection Tracking Releases2.1 2.32.2 2.4 3.0 8
  • 19. 8 Study Design: Clone Detection Tracking Releases2.1 2.32.2 2.4 3.0 Clone Reports 8
  • 20. 8 Study Design: Clone Detection Tracking Releases2.1 2.32.2 2.4 3.0 Clone Reports Clone Groups 8
  • 21. 8 Study Design: Clone Detection Tracking Releases2.1 2.32.2 2.4 3.0 Clone Reports Clone Groups Genealogies 8
  • 22. 9 Study Design: Inconsistent Changes 2.1 2.32.2 2.4 3.0 Inconsistent Change Consistent Change 9
  • 23. 10 Research Questions Q1 What are the characteristics of long-lived clone genealogies at release level? Q2 What is the effect of inconsistent changes to code clones when measured at release level? Q3 What type of cloning patterns do we observe at release level? 10
  • 24. 11 Research Questions Q1 What are the characteristics of long-lived clone genealogies at release level? 11
  • 25. 12 Research Question Q1 125102050 Lifetime  of  Clone  Groups Number  of  Releases Apache  Mina jEdit Number  of  Genealogies 125102050100200 Size  of  Clone  Groups Number  of  Clones Apache  Mina jEdit Number  of  Genealogies 12
  • 26. 12 Research Question Q1 125102050 Lifetime  of  Clone  Groups Number  of  Releases Apache  Mina jEdit Number  of  Genealogies 125102050100200 Size  of  Clone  Groups Number  of  Clones Apache  Mina jEdit Number  of  Genealogies 12
  • 27. 12 Research Question Q1 125102050 Lifetime  of  Clone  Groups Number  of  Releases Apache  Mina jEdit Number  of  Genealogies 125102050100200 Size  of  Clone  Groups Number  of  Clones Apache  Mina jEdit Number  of  Genealogies 12
  • 28. 12 Research Question Q1 125102050 Lifetime  of  Clone  Groups Number  of  Releases Apache  Mina jEdit Number  of  Genealogies 125102050100200 Size  of  Clone  Groups Number  of  Clones Apache  Mina jEdit Number  of  Genealogies Small, rather long lived clone-genealogies. 12
  • 29. 13 Research Questions Q2 What is the effect of inconsistent changes to code clones when measured at release level? 13
  • 30. 14 Research Question Q2 org.gjt.sp.jedit.jEdit.newView(View Buffer) { ... // show tip of the day if(newView == viewsFirst) { // Don't show the welcome message if jEdit was started // with the -nosettings switch if(settingsDirectory != null getBooleanProperty(firstTime)) new HelpViewer(jeditresource:/doc/welcome.html); else if(jEdit.getBooleanProperty(tip.show)) new TipOfTheDay(newView); } ... org.gjt.sp.jedit.jEdit.newView(View String) { ... // show tip of the day if(newView == viewsFirst) { // Don't show the welcome message if jEdit was started // with the -nosettings switch if(settingsDirectory != null getBooleanProperty(firstTime)) new HelpViewer(jeditresource:/doc/welcome.html); else if(jEdit.getBooleanProperty(tip.show)) new TipOfTheDay(newView); } ... 4.0.1 4.0.2 jEdit 14
  • 31. 15 Research Question Q2 org.gjt.sp.jedit.jEdit.newView(View Buffer) { ... // show tip of the day if(newView == viewsFirst) { // Don't show the welcome message if jEdit was started // with the -nosettings switch if(settingsDirectory != null getBooleanProperty(firstTime)) new HelpViewer(jeditresource:/doc/welcome.html); else if(jEdit.getBooleanProperty(tip.show)) new TipOfTheDay(newView); } ... org.gjt.sp.jedit.jEdit.newView(View String) { ... // show tip of the day if(newView == viewsFirst) { // Don't show the welcome message if jEdit was started // with the -nosettings switch if(settingsDirectory != null getBooleanProperty(firstTime)) new HelpViewer(jeditresource:/doc/welcome.html); else if(jEdit.getBooleanProperty(tip.show)) new TipOfTheDay(newView); } ... 4.0.2 4.0.3 jEdit 15
  • 32. 16 Research Question Q2 org.gjt.sp.jedit.jEdit.newView(View Buffer) { ... // show tip of the day if(newView == viewsFirst) { // Don't show the welcome message if jEdit was started // with the -nosettings switch if(settingsDirectory != null getBooleanProperty(firstTime)) new HelpViewer(); else if(jEdit.getBooleanProperty(tip.show)) new TipOfTheDay(newView); } ... org.gjt.sp.jedit.jEdit.newView(View String) { ... // show tip of the day if(newView == viewsFirst) { // Don't show the welcome message if jEdit was started // with the -nosettings switch if(settingsDirectory != null getBooleanProperty(firstTime)) new HelpViewer(jeditresource:/doc/welcome.html); else if(jEdit.getBooleanProperty(tip.show)) new TipOfTheDay(newView); } ... 4.0.3 4.0.4 jEdit 16
  • 33. 17 Research Question Q2 • 748 inconsistent changes flagged by our tool • Manual inspection of the source code • 7 inconsistent changes related to bugs • For 41% - 65% of genealogies: inconsistent changes carried out on purpose. 17
  • 34. 17 Research Question Q2 • 748 inconsistent changes flagged by our tool • Manual inspection of the source code • 7 inconsistent changes related to bugs • For 41% - 65% of genealogies: inconsistent changes carried out on purpose. Independent evolution of clones observed at release level. 17
  • 35. 18 Research Questions Q3 What type of cloning patterns do we observe at release level? 18
  • 36. 19 Research Question Q3 !# $% !# '()**'+,,#-. / '()**'+,,#-. $ #0#)1* $ #0#)1* !(!12,(#320 % !(!12,(#320 (24#'!,25!-05*2'#!4#32 67 (24#'!,25!-05*2'#!4#32 %$ 82(9!,#1 $ 82(9!,#1 2:2(#12-, 2:2(#12-, ; !#$$%%'#(%)*+)+!),-+!)*-$+%*+./#'0-+1%*# !#$$%%'#(%)*+)+!),-+!)*-$+%*+23,%( !#$%'()'*+ ,#%'$%- ./0 1!2'(%3 40 5#!%3*( 60 '#% 440 !7,,8((%*9 /0 %+%73, 440#'!'3(!%-+ 40 !#$%'()'*+ ,#%'$%- /:0 1!2'(%3 40 5#!%3*( ;0 '#% 40 !7,,8((%*9 0 %+%73, 40 #'!'3(!%-+ ;0 19
  • 37. 19 Research Question Q3 !# $% !# '()**'+,,#-. / '()**'+,,#-. $ #0#)1* $ #0#)1* !(!12,(#320 % !(!12,(#320 (24#'!,25!-05*2'#!4#32 67 (24#'!,25!-05*2'#!4#32 %$ 82(9!,#1 $ 82(9!,#1 2:2(#12-, 2:2(#12-, ; !#$$%%'#(%)*+)+!),-+!)*-$+%*+./#'0-+1%*# !#$$%%'#(%)*+)+!),-+!)*-$+%*+23,%( !#$%'()'*+ ,#%'$%- ./0 1!2'(%3 40 5#!%3*( 60 '#% 440 !7,,8((%*9 /0 %+%73, 440#'!'3(!%-+ 40 !#$%'()'*+ ,#%'$%- /:0 1!2'(%3 40 5#!%3*( ;0 '#% 40 !7,,8((%*9 0 %+%73, 40 #'!'3(!%-+ ;0 Replicate and Specialize 46% - 68% inconsistent changes due to independent evolution of cloned parts. 19
  • 38. 19 Research Question Q3 !# $% !# '()**'+,,#-. / '()**'+,,#-. $ #0#)1* $ #0#)1* !(!12,(#320 % !(!12,(#320 (24#'!,25!-05*2'#!4#32 67 (24#'!,25!-05*2'#!4#32 %$ 82(9!,#1 $ 82(9!,#1 2:2(#12-, 2:2(#12-, ; !#$$%%'#(%)*+)+!),-+!)*-$+%*+./#'0-+1%*# !#$$%%'#(%)*+)+!),-+!)*-$+%*+23,%( !#$%'()'*+ ,#%'$%- ./0 1!2'(%3 40 5#!%3*( 60 '#% 440 !7,,8((%*9 /0 %+%73, 440#'!'3(!%-+ 40 !#$%'()'*+ ,#%'$%- /:0 1!2'(%3 40 5#!%3*( ;0 '#% 40 !7,,8((%*9 0 %+%73, 40 #'!'3(!%-+ ;0 Replicate and Specialize 46% - 68% inconsistent changes due to independent evolution of cloned parts. API Usage 22% - 23% inconsistent changes due to use of distinct parts of the API to create functionality. 19
  • 39. 19 Research Question Q3 !# $% !# '()**'+,,#-. / '()**'+,,#-. $ #0#)1* $ #0#)1* !(!12,(#320 % !(!12,(#320 (24#'!,25!-05*2'#!4#32 67 (24#'!,25!-05*2'#!4#32 %$ 82(9!,#1 $ 82(9!,#1 2:2(#12-, 2:2(#12-, ; !#$$%%'#(%)*+)+!),-+!)*-$+%*+./#'0-+1%*# !#$$%%'#(%)*+)+!),-+!)*-$+%*+23,%( !#$%'()'*+ ,#%'$%- ./0 1!2'(%3 40 5#!%3*( 60 '#% 440 !7,,8((%*9 /0 %+%73, 440#'!'3(!%-+ 40 !#$%'()'*+ ,#%'$%- /:0 1!2'(%3 40 5#!%3*( ;0 '#% 40 !7,,8((%*9 0 %+%73, 40 #'!'3(!%-+ ;0 Replicate and Specialize 46% - 68% inconsistent changes due to independent evolution of cloned parts. API Usage 22% - 23% inconsistent changes due to use of distinct parts of the API to create functionality. Developers seem to effectively handle inconsistent changes at macro-level. 19