1. Late Propagation
in Software Clones
Liliane Barbour, Foutse Khomh,
and Ying Zou
2. Late Propagation (LP)
• Definition: an inconsistent change makes a clone pair
diverge, and a later change re-synchronizes the pair.
• It can be risky because failing to propagate changes
between the clones in a clone pair can lead to faults.
• In our work, we found that 8-21% of genealogies
contain a late propagation
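To make the definition concrete, here is a minimal Java sketch that scans the history of one clone pair for a divergence followed by a re-synchronization. The class name, the `PairState` enum, and the assumption that the pair starts out consistent are illustrative choices, not part of the original study's tooling.

```java
import java.util.Arrays;
import java.util.List;

final class LatePropagationDetector {
    /** Hypothetical model: state of a clone pair after each recorded change. */
    enum PairState { CONSISTENT, INCONSISTENT }

    /**
     * Returns true if the history contains an inconsistent state that is
     * later followed by a consistent state. The pair is assumed to be
     * consistent before the first recorded change.
     */
    static boolean containsLatePropagation(List<PairState> history) {
        boolean diverged = false;
        for (PairState state : history) {
            if (state == PairState.INCONSISTENT) {
                diverged = true;          // diverging change seen
            } else if (diverged) {
                return true;              // re-synchronizing change seen
            }
        }
        return false;                     // never diverged, or never re-synchronized
    }

    public static void main(String[] args) {
        // One divergence followed by one re-synchronization.
        List<PairState> history =
                Arrays.asList(PairState.INCONSISTENT, PairState.CONSISTENT);
        System.out.println(containsLatePropagation(history)); // prints true
    }
}
```

A pair that diverges and never becomes consistent again is not counted, which matches the requirement that a re-synchronizing change must follow the divergence.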
3. LP With Propagation Example from
ArgoUML
//Clone A, Revision 595
addField(new UMLComboBox(typeModel),1,0,0);
//Clone B, Revision 595
addField(new UMLComboBox(classifierModel),2,0,0);
//Diverging Change: Clone A, Revision 602
addField(new UMLComboBoxNavigator(this,"NavClass",
    new UMLComboBox(typeModel)),1,0,0);
//Re-synchronizing Change: Clone B, Revision 604
addField(new UMLComboBoxNavigator(this,"NavClass",
    new UMLComboBox(classifierModel)),2,0,0);
[Figure: timeline for Clone A and Clone B showing Revision 595, Revision 602 (diverging change), and Revision 604 (re-synchronizing change).]
4. LP Without Propagation Example
from Ant
//Clone A, Revision 270250
if( destFile == null )
{
    destFile = new File(destDir,file.getName());
}
//Clone B, Revision 270250
if (destFile == null ) {
    destFile = new File(destDir,file.getName());
}
// Diverging Change: Clone A, Revision 270264
if ( m_destFile == null )
{
    m_destFile = new File(m_destDir,m_file.getName());
}
//Re-synchronizing Change: Clone A, Revision 271109
if ( destFile == null ) {
    destFile = new File(destDir,file.getName());
}
[Figure: timeline for Clone A and Clone B showing Revision 270250, Revision 270264 (diverging change), and Revision 271109 (re-synchronizing change).]
5. Types of Late Propagation
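Clones modified during each phase, by LP type:
Propagation Category              | LP Type | Diverging Change | Period of Divergence | Re-synchronizing Change
Propagation Always Occurs         | LP1     | A                | A                    | B
Propagation Always Occurs         | LP2     | A                | A and B              | B
Propagation Always Occurs         | LP3     | A                | A                    | A and B
Propagation May or May Not Occur  | LP4     | A                | A and B              | A
Propagation May or May Not Occur  | LP5     | A                | A and B              | A and B
Propagation May or May Not Occur  | LP6     | A and B          | A and B              | A or B
Propagation May or May Not Occur  | LP7     | A and B          | A and B              | A and B
Propagation Never Occurs          | LP8     | A                | A                    | A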
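The table can be read as a lookup from three observations to an LP type. The following Java sketch encodes that lookup. The enum, class, and method names are hypothetical, and the sketch assumes that "A" denotes the clone touched by the diverging change whenever only one clone is modified.

```java
final class LpTypeClassifier {
    /** Which clone(s) of the pair were modified in a given phase. */
    enum Modified { A, B, BOTH }

    /**
     * Maps (diverging change, period of divergence, re-synchronizing change)
     * to an LP type from the table. Returns "unknown" for combinations the
     * table does not list.
     */
    static String classify(Modified diverging, Modified during, Modified resync) {
        if (diverging == Modified.BOTH) {
            if (during != Modified.BOTH) {
                return "unknown";
            }
            // LP7: both clones re-synchronized; LP6: only A or only B.
            return resync == Modified.BOTH ? "LP7" : "LP6";
        }
        if (diverging != Modified.A) {
            return "unknown"; // by convention A is the clone that diverges
        }
        if (during == Modified.A) {
            switch (resync) {
                case B:    return "LP1";
                case BOTH: return "LP3";
                default:   return "LP8";
            }
        }
        if (during == Modified.BOTH) {
            switch (resync) {
                case B:  return "LP2";
                case A:  return "LP4";
                default: return "LP5";
            }
        }
        return "unknown";
    }

    public static void main(String[] args) {
        // The Ant example: only Clone A changes in every phase.
        System.out.println(classify(Modified.A, Modified.A, Modified.A)); // prints LP8
    }
}
```

Under this encoding, the ArgoUML example (A diverges, B re-synchronizes) maps to LP1 if only A was modified during the period of divergence, and the Ant example maps to LP8.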
6. Research Questions
RQ1: Are there different types of LP?
RQ2: Are some types of LP more fault-prone than
others?
RQ3: Which type of LP experiences the highest
proportion of faults?
7. Subject Systems
System  | # LOC | # Revisions | # Gen (CCFinder) | # LP (CCFinder) | # Gen (Simian) | # LP (Simian)
ArgoUML | 3.1M  | 18k         | 14k              | 1.1k            | 111            | 23
Ant     | 2.3M  | 1.0M        | 30k              | 4.7k            | 461            | 80
9. Mining the SVN
• Use J-Rex to mine the SVN
• Heuristics used to identify reason for commit
(Mockus et al., 2000)
• Snapshots of all revisions to each Java file are stored
in an XML file
• Test files are removed
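As an illustration of what a commit-reason heuristic can look like, the Java sketch below flags a commit as a likely fault fix when its message contains certain terms. The keyword list, class name, and method name are assumptions made for this example; the slides do not state which terms or rules were actually used.

```java
import java.util.regex.Pattern;

final class CommitReasonHeuristic {
    // Hypothetical keyword list, matched as whole words and case-insensitively.
    private static final Pattern FAULT_TERMS = Pattern.compile(
            "\\b(bug|bugs|fix|fixes|fixed|fault|faults|defect|defects)\\b",
            Pattern.CASE_INSENSITIVE);

    /** Returns true if the commit message mentions a fault-related term. */
    static boolean looksLikeFaultFix(String commitMessage) {
        return FAULT_TERMS.matcher(commitMessage).find();
    }

    public static void main(String[] args) {
        System.out.println(looksLikeFaultFix("Fixed NPE in parser"));   // prints true
        System.out.println(looksLikeFaultFix("Add prefix option"));     // prints false
    }
}
```

Matching on word boundaries avoids false positives such as "prefix", at the cost of missing variants that are not in the list.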
10. Clone Detection
• Contents of each method revision extracted into
individual files
• Perform clone detection once on all snapshots
• Two existing clone detection tools are used
– Simian (text-based) and CCFinder (token-based)
11. Building Clone Genealogies
• Build clone genealogies using the existing clone list
• Query the SVN using diff to track changes to each
clone in a clone pair over time.
• If a change modifies one of the clones in a clone
pair, query the clone list for a matching clone
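The steps above can be sketched in Java as follows. For each revision that changed one of the clones, the sketch looks the pair up in a precomputed clone list and records whether the pair is still reported as a clone. The data layout (a map from revision number to the set of pair identifiers detected in that revision) and all names are hypothetical simplifications.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class GenealogyBuilder {
    /**
     * For every revision that modified either clone of the pair, records
     * true if the pair still appears in the clone list for that revision
     * (consistent) and false otherwise (inconsistent).
     */
    static List<Boolean> consistencyHistory(
            String pairId,
            List<Integer> changedRevisions,
            Map<Integer, Set<String>> cloneListByRevision) {
        List<Boolean> history = new ArrayList<>();
        for (int revision : changedRevisions) {
            Set<String> pairs = cloneListByRevision.getOrDefault(
                    revision, Collections.<String>emptySet());
            history.add(pairs.contains(pairId));
        }
        return history;
    }

    public static void main(String[] args) {
        // Hypothetical clone list: the pair is detected again only at revision 604.
        Map<Integer, Set<String>> cloneList = new HashMap<>();
        cloneList.put(604, new HashSet<>(Arrays.asList("A|B")));

        List<Boolean> history =
                consistencyHistory("A|B", Arrays.asList(602, 604), cloneList);
        System.out.println(history); // prints [false, true]
    }
}
```

A history that goes from false back to true corresponds to a divergence followed by a re-synchronization.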
13. RQ1: Are there different types of LP?
[Figure: bar chart "Breakdown of LP Type by System". x-axis: LP Types (LP1 to LP8); y-axis: Percentage of All LP Occurrences (0% to 80%); series: ArgoUML - Simian, ArgoUML - CCFinder, Ant - Simian, Ant - CCFinder.]
There is representation from multiple types of LP
and across all categories of LP.
14. RQ2: Are some types of LP more fault-
prone than others?
Part 1: Is Late Propagation fault-prone?
Part 2: Are specific types of late propagation more
fault-prone?
15. Part 1: Is Late Propagation Fault-
prone?
[Figure: bar chart "LP vs. Non-LP Odds Ratios". y-axis: Odds Ratio (0 to 4); bars: Ant - Simian, ArgoUML - CCFinder, Ant - CCFinder.]
Note: ArgoUML - Simian is omitted because it is not statistically significant.
In all statistically significant cases, the odds ratio is
greater than 1. Therefore, LP genealogies are more
fault-prone than non-LP genealogies.
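For readers unfamiliar with the measure, the odds ratio compares the odds of a fault in LP genealogies with the odds of a fault in non-LP genealogies. The Java sketch below uses the standard definition on a 2x2 table of counts. The class name, parameter names, and the example counts are made up for illustration and are not data from the study.

```java
final class OddsRatio {
    /**
     * Odds ratio = (lpFaulty / lpClean) / (nonLpFaulty / nonLpClean).
     * A value above 1 means faults are more likely in LP genealogies.
     */
    static double oddsRatio(long lpFaulty, long lpClean,
                            long nonLpFaulty, long nonLpClean) {
        if (lpClean == 0 || nonLpFaulty == 0) {
            throw new IllegalArgumentException(
                    "odds ratio is undefined when a denominator count is zero");
        }
        return ((double) lpFaulty * nonLpClean)
                / ((double) lpClean * nonLpFaulty);
    }

    public static void main(String[] args) {
        // Hypothetical counts: 30 of 100 LP genealogies and
        // 20 of 100 non-LP genealogies experienced a fault.
        System.out.println(oddsRatio(30, 70, 20, 80));
    }
}
```

With these made-up counts the ratio is 2400 / 1400, or about 1.71, which would indicate higher odds of a fault in the LP group.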
16. Part 2: Are specific types of late
propagation more fault-prone?
[Figure: bar chart "Odds Ratios Between Each LP Type and Non-LP Genealogies". x-axis: LP Type (LP1 to LP8); y-axis: Odds Ratio (0 to 16); series: Ant - Simian, ArgoUML - CCFinder, Ant - CCFinder.]
Note: ArgoUML - Simian is omitted because it is not statistically significant.
17. RQ2 Observations
• In general, some LP types are not more fault-prone
than non-LP genealogies (i.e. odds ratio < 1)
• Some types that make up a small proportion of LP
instances have a very high odds ratio
• LP7 and LP8 occur frequently but have low odds
ratios.
Each type of LP has a different level of fault-proneness.
18. RQ3: Which type of LP experiences
the highest proportion of faults?
19. RQ3: Which type of LP experiences
the highest proportion of faults?
[Figure: bar chart "Percentage of Fault Occurrences Broken Down by LP Type". x-axis: LP Type (LP1 to LP8); y-axis: Percentage of Fault Occurrences (0% to 80%); series: Ant - Simian, ArgoUML - CCFinder, Ant - CCFinder.]
Note: ArgoUML - Simian is omitted because it is not statistically significant.
20. RQ3 Observations
• LP7 and LP8 contribute a large proportion of the
faults but have lower odds ratios (RQ2)
– When faults occur, they occur in large numbers
• Overall, LP7 and LP8 are the most dangerous, with
the other types being system dependent in their
fault-proneness.
The proportion of faults is different for
each LP type.
21. Conclusion
• In general, LP genealogies are more fault-prone than
non-LP genealogies
• LP7 and LP8 are the riskiest, in terms of their fault-
proneness and magnitude of faults.
– LP8 contains no propagation of changes
– LP7 may or may not contain any propagation of
changes
• Fault-proneness and fault occurrence depend on
both the LP type and the system.