Searching for Configurations in Clone Evaluation: A Replication Study [SSBSE'16]

C. Ragkhitwetsagul, M. Paixao, M. Adham, S. Busari, J. Krinke, J. H. Drake
CENTRE FOR RESEARCH ON EVOLUTION, SEARCH AND TESTING
DEPARTMENT OF COMPUTER SCIENCE
UNIVERSITY COLLEGE LONDON

Code Clone

Clone Detectors
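if (x==0) then y=y+1;
if (check==0) then count=count+1;
Normalised tokens (every word replaced by $p):
$p ($p==0) $p $p=$p+1;
$p ($p==0) $p $p=$p+1;
Syntax-tree view (an if statement if_s with a condition and an assignment expression):
if_s: if ( cond_e ) then assign_e
if_s: if ( cond_e ) then assign_e
Tools: Deckard, CCFinder, Simian, NiCad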
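The token normalisation shown on this slide can be reproduced with a few lines of Python. This is a minimal sketch under a deliberately simplified, assumed rule: every identifier-like word (keywords included) is replaced by `$p`, and two fragments count as a clone pair when their normalised text is identical. The helper names `normalise` and `is_clone_pair` are hypothetical; real detectors such as CCFinder or NiCad apply their own, more elaborate transformations.

```python
import re

# Assumed, simplified rule: any word starting with a letter or underscore
# (identifiers and keywords alike) is replaced by the placeholder $p.
WORD = re.compile(r"[A-Za-z_]\w*")


def normalise(fragment: str) -> str:
    """Return the fragment with every word token replaced by $p."""
    return WORD.sub("$p", fragment)


def is_clone_pair(a: str, b: str) -> bool:
    """Treat two fragments as clones if their normalised forms match exactly."""
    return normalise(a) == normalise(b)


if __name__ == "__main__":
    f1 = "if (x==0) then y=y+1;"
    f2 = "if (check==0) then count=count+1;"
    print(normalise(f1))          # $p ($p==0) $p $p=$p+1;
    print(normalise(f2))          # $p ($p==0) $p $p=$p+1;
    print(f1 == f2)               # False: the raw text differs
    print(is_clone_pair(f1, f2))  # True: identical after normalisation
```

Comparing normalised text is the essence of detecting clones that differ only in names; the syntax-tree view on the slide reaches the same verdict by comparing node types instead of tokens.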

Oracle Problem in Code Clone
Since no ground truth can be established, we do not know whether code is actually cloned.

Agreement

Parameter Tuning

EvaClone
T. Wang, M. Harman, Y. Jia, & J. Krinke. Searching for Better Configurations: A Rigorous Approach to Clone Evaluation. In FSE'13.
6 clone detectors: PMD, iClones, ConQAT, Simian, NiCad, CCFinder
8 software projects: weltab, cook, snns, psql, javadoc, ant, jdtcore, swing
15 years

Maximising Agreement
[Figure: maximising the agreement among the clone detectors C, D, N and S]

EvaClone (cont.)
EvaClone favours recall over precision, so more clone candidates are reported.

Replication Study

Fitness Function
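$$\text{Fitness} = \frac{4 \times L_4 + 3 \times L_3 + 2 \times L_2 + 1 \times L_1}{4 \times L_{\text{all}}}$$
where L_k is the number of lines reported as cloned by exactly k of the four clone detectors and L_all is the number of all clone lines (lines reported by at least one detector).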
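Reading the fitness on this slide as (4·L4 + 3·L3 + 2·L2 + 1·L1) / (4 × all clone lines), where L_k counts lines reported as cloned by exactly k of the four detectors (an interpretation of the slide, not EvaClone's published code), a line found by every detector contributes its full weight and a line found by only one contributes a quarter. With four tools the fitness therefore lies between 0.25 (no agreement) and 1.0 (full agreement). The Python sketch below computes it from each tool's set of cloned lines; the `(file, line)` representation and the name `agreement_fitness` are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, Set, Tuple

Line = Tuple[str, int]  # (file name, line number): an assumed representation


def agreement_fitness(reports: Dict[str, Set[Line]]) -> float:
    """Weighted agreement: sum_k k * L_k / (n_tools * L_all), where L_k counts
    lines reported by exactly k tools and L_all lines reported by at least one."""
    n_tools = len(reports)
    votes: Counter = Counter()
    for lines in reports.values():
        votes.update(lines)  # each tool votes once per cloned line
    if not votes:
        return 0.0  # no clone lines reported at all
    lines_by_k = Counter(votes.values())  # k -> L_k
    weighted = sum(k * l_k for k, l_k in lines_by_k.items())
    return weighted / (n_tools * len(votes))


if __name__ == "__main__":
    reports = {
        "CCFinder": {("A.java", 10), ("A.java", 11)},
        "Deckard": {("A.java", 10)},
        "NiCad": {("A.java", 10), ("B.java", 3)},
        "Simian": {("A.java", 10), ("A.java", 11)},
    }
    # A.java:10 is in L_4, A.java:11 in L_2, B.java:3 in L_1 -> 7 / (4 * 3)
    print(round(agreement_fitness(reports), 3))  # 0.583
```

Because the denominator grows with every line that any single tool reports, a configuration only scores well if the other detectors also report the lines it adds.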

Replication Study (cont.)
Clone detectors: Deckard, CCFinder, Simian, NiCad (25 parameters in total)
Population size: 100
No. of generations: 100
Crossover: 0.8
Mutation: 0.1
Elitism: 0.25
2 × 10^12 possible configurations
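The search settings on this slide are those of a genetic algorithm. The Python sketch below shows one plausible reading of them; the slide gives only the rates, so the operators are assumptions: binary tournament selection, uniform crossover applied with probability 0.8, mutation 0.1 read as a per-parameter random-reset probability, and elitism read as copying the best 25% of each generation unchanged. `run_ga` and `toy_fitness` are hypothetical; in EvaClone the fitness would come from running the detectors with the candidate configuration and scoring their agreement.

```python
import random
from typing import Callable, List, Sequence, Tuple

Config = List[object]  # one value per parameter


def _tournament(scored: List[Tuple[float, Config]], rng: random.Random) -> Config:
    """Binary tournament: pick two scored individuals, return the fitter one."""
    (fa, ca), (fb, cb) = rng.sample(scored, 2)
    return ca if fa >= fb else cb


def run_ga(domains: Sequence[Sequence[object]],
           fitness: Callable[[Config], float],
           pop_size: int = 100,
           generations: int = 100,
           crossover_rate: float = 0.8,
           mutation_rate: float = 0.1,
           elitism: float = 0.25,
           seed: int = 0) -> Tuple[Config, float]:
    """Search discrete parameter domains; return (best config, best fitness)."""
    rng = random.Random(seed)
    population = [[rng.choice(d) for d in domains] for _ in range(pop_size)]
    best_cfg: Config = []
    best_fit = float("-inf")

    for gen in range(generations):
        # Evaluate and rank the current population, best first.
        scored = sorted(((fitness(c), c) for c in population),
                        key=lambda s: s[0], reverse=True)
        if scored[0][0] > best_fit:
            best_fit, best_cfg = scored[0][0], list(scored[0][1])
        if gen == generations - 1:
            break  # last generation: evaluated, nothing left to breed

        # Elitism: the top fraction survives unchanged.
        next_pop = [list(c) for _, c in scored[:int(elitism * pop_size)]]
        while len(next_pop) < pop_size:
            p1, p2 = _tournament(scored, rng), _tournament(scored, rng)
            if rng.random() < crossover_rate:
                # Uniform crossover: each parameter taken from either parent.
                child = [g1 if rng.random() < 0.5 else g2 for g1, g2 in zip(p1, p2)]
            else:
                child = list(p1)
            # Mutation: reset each parameter to a random value with mutation_rate.
            for i, domain in enumerate(domains):
                if rng.random() < mutation_rate:
                    child[i] = rng.choice(domain)
            next_pop.append(child)
        population = next_pop

    return best_cfg, best_fit


def toy_fitness(cfg: Config) -> float:
    """Hypothetical stand-in for 'run the detectors and score their agreement'."""
    return -abs(cfg[0] - 40) - abs(cfg[1] - 15)


if __name__ == "__main__":
    # Hypothetical two-parameter space: 10 x 11 = 110 configurations.
    domains = [range(10, 101, 10), range(10, 21)]
    print(run_ga(domains, toy_fitness))
```

On this tiny space the search reaches the optimum [40, 15]; in a space of the size stated on the slide, exhaustive evaluation is out of the question, which is why a search-based approach is used.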

Mockito releases (SLOC in thousands; %Inc = increase over the previous release)
Ver.     SLOC (k)  %Inc
0.9      5.5       N/A
1.0      6.7       21%
1.1      6.78      2%
1.2      6.82      1%
1.3      7.2       6%
1.4      7.6       5%
1.5      8.4       11%
1.6      8.9       7%
1.7      10.1      13%
1.8      12.4      23%
1.9      17.9      44%
1.10     22.8      28%
2.0.0    23.6      3%
2.0.44   25.3      8%
Note: releases 1.5 to 1.9 embed two complete libraries (cglib and asm), which were removed before the analysis.

RQ1: Optimised Agreement
How do the default parameters perform in terms of
clone agreement on each Mockito release compared
to the optimised ones?
[Figure: FitnessValue (y-axis) for each Mockito release from 0.9 to 2.0.44 (x-axis); series: Default, EvaClone Highest, EvaClone Lowest]
Comparison of the optimised tools' agreement (the highest and the lowest of 20 runs) with the default agreement over 14 Mockito releases

RQ2: Stability of Optimised Parameters
Are there noticeable differences in the values of
optimised parameters over releases?
(DF = default value; the remaining columns give the optimised value for each Mockito release)
Parameter     DF    0.9   1.0   1.1   1.2   1.3   1.4   1.5   1.6   1.7   1.8   1.9   1.10  2.0.0 2.0.44
CCFinder
MinToken      50    10    70    70    70    80    80    80    80    10    10    10    10    10    10
TKS           12    10    16    18    19    18    18    19    20    14    17    10    10    10    10
Deckard
MinToken      30    30    50    50    50    50    50    50    50    50    50    50    50    50    50
Stride        5     inf   8     8     8     8     8     8     8     16    5     inf   inf   inf   inf
Similarity    0.9   0.9   1.0   1.0   1.0   1.0   1.0   1.0   1.0   0.95  1.0   0.9   0.9   0.9   0.9
NiCad
MinLine       6     5     7     7     7     6     6     6     7     6     5     5     5     5     5
MaxLine       1K    200   100   100   400   400   200   200   200   200   100   100   100   200   200
UPI           0.3   0.3   0.0   0.1   0.0   0.0   0.1   0.1   0.0   0.3   0.1   0.3   0.3   0.3   0.3
Blind         0     1     0     0     0     0     0     0     1     1     1     1     1     1     1
Abstract      0     4     6     6     6     6     5     5     6     6     2     4     4     4     4

RQ2: Stability of Optimised Parameters (cont.)
Columns: parameter, default value (DF), then the optimised value for releases 0.9 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 1.10 2.0.0 2.0.44
Simian
ignoreCurlyBraces 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
ignoreIdentifiers 0 1 0 0 0 0 0 0 0 1 1 1 1 1 1
ignoreIdentifierCase 0 ✱ ✱ ✱ ✱ ✱ ✱ ✱ ✱ ✱ ✱ ✱ ✱ ✱ ✱
ignoreStrings 0 1 0 0 0 0 0 0 0 1 0 ✱ ✱ ✱ ✱
ignoreStringCase 1 ✱ 1 1 0 0 0 0 0 ✱ 0 ✱ ✱ ✱ ✱
ignoreNumbers 0 1 0 1 0 1 1 0 1 1 0 ✱ ✱ ✱ ✱
ignoreCharacters 0 0 0 1 0 0 0 1 0 0 1 ✱ ✱ ✱ ✱
ignoreCharacterCase 1 0 0 ✱ 1 1 0 ✱ 1 1 ✱ ✱ ✱ ✱ ✱
ignoreLiterals 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1
ignoreSubtypeNames 0 1 0 0 0 0 0 0 0 0 0 0 0 1 1
ignoreModifiers 1 1 1 0 1 0 0 0 0 0 0 1 1 1 1
ignoreVariableNames 0 1 0 0 0 0 0 0 0 1 1 0 0 0 1
balanceParentheses 0 0 1 1 1 1 1 1 1 0 0 0 0 0 0
balanceSquareBrackets 0 1 0 0 0 1 1 0 1 1 1 1 1 1 0
MinLine 6 5 6 6 6 6 6 6 6 7 7 5 5 5 5
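One simple way to quantify the instability visible in the RQ2 tables is to count, for each parameter, how often its optimised value changes between consecutive releases. The Python sketch below does this for a handful of rows transcribed from those tables. The choice of metric and the treatment of ✱ as a missing value (its meaning is not stated on the slides) are assumptions.

```python
# Optimised values (defaults excluded) for releases 0.9 ... 2.0.44,
# transcribed from a few rows of the RQ2 tables.
RELEASES = ["0.9", "1.0", "1.1", "1.2", "1.3", "1.4", "1.5",
            "1.6", "1.7", "1.8", "1.9", "1.10", "2.0.0", "2.0.44"]

STAR = "✱"  # marker used in the Simian table; treated as "no value" here

OPTIMISED = {
    ("CCFinder", "MinToken"): [10, 70, 70, 70, 80, 80, 80, 80, 10, 10, 10, 10, 10, 10],
    ("Deckard", "Stride"): ["inf", 8, 8, 8, 8, 8, 8, 8, 16, 5, "inf", "inf", "inf", "inf"],
    ("NiCad", "MaxLine"): [200, 100, 100, 400, 400, 200, 200, 200, 200, 100, 100, 100, 200, 200],
    ("Simian", "ignoreIdentifiers"): [1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1],
    ("Simian", "ignoreStrings"): [1, 0, 0, 0, 0, 0, 0, 0, 1, 0, STAR, STAR, STAR, STAR],
}


def count_changes(values):
    """Count consecutive release pairs whose optimised values differ,
    skipping any pair that involves the ✱ marker (an assumption)."""
    return sum(1 for a, b in zip(values, values[1:])
               if STAR not in (a, b) and a != b)


def distinct_values(values):
    """Number of different optimised values, ignoring the ✱ marker."""
    return len({v for v in values if v != STAR})


if __name__ == "__main__":
    for (tool, param), values in OPTIMISED.items():
        assert len(values) == len(RELEASES), (tool, param)
        print(f"{tool:8} {param:18} changes={count_changes(values)} "
              f"distinct={distinct_values(values)}")
```

For these rows the counts are 3 (CCFinder MinToken), 4 (Deckard Stride), 5 (NiCad MaxLine), 2 (Simian ignoreIdentifiers) and 3 (Simian ignoreStrings): every sampled parameter changes at least twice over the 14 releases.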

RQ3: Clones over Releases
How many clones in Mockito are reported with the
highest agreement over releases?
[Figure: Default vs. EvaClone]


Open Challenge
A better fitness function for EvaClone is needed.
It must not rely only on the number of cloned lines, but also take other aspects into account:
How often is a line found to be cloned to other places?
Precision vs. recall?
Location of clones

Summary
Optimised vs. default parameters
Optimised parameters are not stable over releases
Default vs. EvaClone
The fitness function needs improvement
