Mon 24 - Fri 28 April 2023 Montevideo, Uruguay
Cross-language Clone
Detection for Mobile Apps
Stephannie Jimenez, Gordana Rakic (2), Silvia Takahashi, Nicolás Cardozo
Systems and Computing Engineering - Universidad de los Andes, Bogotá - Colombia
(2) Faculty of Sciences - University of Novi Sad - Serbia
{s.jimenez16, stakahas, n.cardozo}@uniandes.edu.co, gordana.rakic@dmi.uns.ac.rs
@ncardoz
CIbSE 2023
XXVI Ibero-American Conference on Software Engineering
XXVI Congreso Iberoamericano en Ingeniería de Software
XXVI Congresso Ibero-Americano em Engenharia de Software
Code clones
Clone detection
Python code snippets
Clone type      Snippet A                 Snippet B
Type 1 clones   i = 1                     i = 1
Type 2 clones   j = 2                     l = 2
Type 3 clones   i = i + 1                 i += 1
Type 4 clones   for i in range(1,10):     i = 1 + 9
                    i += 1
Focus of clone detection
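The first two clone types can be illustrated with a small token-based check. This is only a sketch using Python's standard `tokenize` module, not the OUT OF STEP approach; the names `tokens`, `normalized`, and `classify` are hypothetical. Type 1 is taken to mean identical token streams (ignoring comments and layout), and Type 2 identical streams once identifiers and literals are replaced by placeholders.

```python
import io
import keyword
import tokenize

# Token kinds that carry layout or comments only; ignored when comparing.
SKIP = {tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
        tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER}


def tokens(src):
    """Return (type, text) pairs for the meaningful tokens of a snippet."""
    stream = tokenize.generate_tokens(io.StringIO(src).readline)
    return [(tok.type, tok.string) for tok in stream if tok.type not in SKIP]


def normalized(src):
    """Replace identifiers and literals by placeholders, keep keywords and operators."""
    out = []
    for typ, text in tokens(src):
        if typ == tokenize.NAME and not keyword.iskeyword(text):
            out.append("ID")
        elif typ in (tokenize.NUMBER, tokenize.STRING):
            out.append("LIT")
        else:
            out.append(text)
    return out


def classify(a, b):
    """Classify two snippets as 'Type 1', 'Type 2', or 'other' (assumed labels)."""
    if tokens(a) == tokens(b):
        return "Type 1"
    if normalized(a) == normalized(b):
        return "Type 2"
    return "other"
```

With the snippets from the slide, `classify("i = 1", "i = 1")` falls in the first case and `classify("j = 2", "l = 2")` in the second; `i = i + 1` against `i += 1` needs more than token normalization, which is where tree-based comparison comes in.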
How to detect clones
•Textual
•Token or lexical
•Tree
•Graph
•Hybrid
OUT OF STEP is a hybrid approach combining a generalized tree structure and the textual representation of tokens
Tools and algorithms
Tool                                Multi-language   Type         Runs
NICAD [Cordy and Roy 2011]          No               Textual      No
CCFinderX [Kamiya et al. 2002]      No               Lexical      No
Simian [Harris 2019]                No               Textual      No
Duploc [Gordon and Bannier 2021]    No               Textual      Yes
SourcererCC [Sajnani et al. 2016]   No               Lexical      No
iClones [Göde and Koschke 2009]     No               Lexical      No
PMD/CPD [pmd 2021]                  No               Lexical      Yes
Deckard [Lingxiao et al. 2018]      No               Tree-based   No
Licca [Vislavski et al. 2018]       Yes              Hybrid       Yes
1. Languages have different shapes and forms
2. We need to abstract away from language details while keeping a reference to the language
3. In current settings
Mobile development
APP
FEATURE DIVERGENCE
Clone type      Kotlin                       Dart
Type 1 clones   var i = 1                    int i = 1;
Type 2 clones   val j = 2                    int l = 2;
Type 3 clones   i = i + 1                    i += 1;
Type 4 clones   for(i in 1..10) { i += 1 }   i = 1 + 9;
Focus of clone detection
OUT OF STEP
Clone analysis process
1. Grammar definition
2. ANTLR 4 generation
3. Lexer, parser
4. eCST
5. OutOfStep
6. Clone detection: Type 1, Type 2, Type 3
[Figure: eCST fragment with tokens sum, +=, 1 and int num]
Enriched Concrete Syntax Tree (eCST)
main(int num) {
  int sum = 0;
  for (i in i..num){
    // Count numbers
    // sum = sum + 1;
    sum += 1;
  }
}
[Figure: eCST of the snippet above. Node labels: File, fun_decl, fun_body, parameter, assignment, loop, body, posfix, literal; leaf tokens: int num, sum, +=, 1. Legend: Universal nodes, Advance nodes, Stop nodes]
Enriched Concrete Syntax Tree (eCST)
Universal nodes represent syntactic grammar rules, abstracting the concepts of programming languages
Advance nodes add groupings and a more hierarchical structure to eCSTs, easing the comparison across language grammars
Example node attributes:
text: while
line: 3
column: 1
clone type: -
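The node attributes listed on the slide (text, line, column, clone type) suggest a simple record per tree node. The following is a minimal sketch of such a record, not the actual OUT OF STEP data structure; the class name `ECSTNode`, the `kind` field and its values, and the child nodes are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class ECSTNode:
    label: str                         # grammar-rule label such as "loop", or a raw token
    kind: str = "token"                # assumed values: "universal", "advance", "stop", "token"
    text: str = ""                     # source text attached to the node
    line: int = 0
    column: int = 0
    clone_type: Optional[int] = None   # None plays the role of "-" (not classified yet)
    children: List["ECSTNode"] = field(default_factory=list)

    def walk(self) -> Iterator["ECSTNode"]:
        """Yield this node and all of its descendants in pre-order."""
        yield self
        for child in self.children:
            yield from child.walk()


# The node shown on the slide: text "while", line 3, column 1, no clone type yet.
loop = ECSTNode("loop", kind="universal", text="while", line=3, column=1)

# A hypothetical child holding the tokens of `sum += 1`.
loop.children.append(
    ECSTNode("assignment", kind="universal", children=[
        ECSTNode("sum", text="sum"),
        ECSTNode("+=", text="+="),
        ECSTNode("1", text="1"),
    ])
)

labels = [node.label for node in loop.walk()]
```

Keeping line and column on every node is what lets a detected clone be reported back at its position in the original source file.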
Enriched Concrete Syntax Tree (eCST)
Stop nodes allow us to fragment the eCST, to compare code snippets across different tree locations
[Figure: two eCSTs, each with root File and two fun_decl, fun_body, attribute subtrees for get_a and get_b; the attributes a and b sit under different functions in the two trees]
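Fragmenting a tree at stop nodes can be sketched as collecting every subtree rooted at a stop label, so that fragments can be paired regardless of where they sit in each file. This is an illustrative sketch only: the `Node` class, the `fragments` function, and the choice of `fun_decl` as the stop label are assumptions, not taken from the tool.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import List, Set


@dataclass
class Node:
    label: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)


def fragments(root: Node, stop_labels: Set[str]) -> List[Node]:
    """Collect, in document order, every subtree rooted at a stop label."""
    found = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.label in stop_labels:
            found.append(node)
        # Push children reversed so the leftmost child is visited first.
        stack.extend(reversed(node.children))
    return found


def fun(name: str, attr: str) -> Node:
    """Build fun_decl -> fun_body -> attribute, mirroring the slide's trees."""
    return Node("fun_decl", name, [Node("fun_body", "", [Node("attribute", attr)])])


left = Node("File", "", [fun("get_a", "b"), fun("get_b", "a")])
right = Node("File", "", [fun("get_a", "a"), fun("get_b", "b")])

# Every fragment of one tree is a candidate for every fragment of the other.
pairs = list(product(fragments(left, {"fun_decl"}), fragments(right, {"fun_decl"})))
```

Here the four candidate pairs cover get_a and get_b in both files, which is the comparison grid used in the clone detection example.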
Clone detection metric
First, the comparison goes through a mapping of universal nodes, in which we state which nodes represent similar concepts across the languages:
type
parameter_type
function_type
...
Second, we compare the tokens' text: text1 = text2 ?
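The two comparison steps can be sketched as a lookup followed by a string comparison. This sketch assumes that the three labels listed on the slide (type, parameter_type, function_type) form one group of similar concepts; the names `SIMILAR_GROUPS`, `node_similarity`, and `same_text` are hypothetical.

```python
# Groups of universal node labels assumed to represent similar concepts.
# Only the first group comes from the slide; a real mapping would list more.
SIMILAR_GROUPS = [
    {"type", "parameter_type", "function_type"},
]


def node_similarity(label_a: str, label_b: str) -> str:
    """Step 1: compare universal node labels through the mapping."""
    if label_a == label_b:
        return "same"
    for group in SIMILAR_GROUPS:
        if label_a in group and label_b in group:
            return "similar"
    return "not similar"


def same_text(text_a: str, text_b: str) -> bool:
    """Step 2: compare the tokens' text exactly."""
    return text_a == text_b
```

Keeping the mapping as data rather than code means that supporting another language pair would mostly involve extending the groups.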
Clone detection metric
Universal node type Token comparison Clone type
Not similar Not similar No clone
Same type Exactly equal Type 1 clone
Same type Not equal Type 2 clone
Similar type Not equal Type 3 clone
m = 3T1 + 2T2 + T3
Find the most similar clone pair:
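The table and the metric can be read as a small decision function plus a weighted count. The sketch below assumes that T1, T2, and T3 are the numbers of compared node pairs classified as Type 1, 2, and 3; the slide does not define them explicitly. The function names are hypothetical, and the combination "similar type with equal text", which the table does not list, is left unclassified here.

```python
from typing import Iterable, Optional


def clone_type(node_sim: str, text_equal: bool) -> Optional[int]:
    """Map one row of the table to a clone type (None means no clone)."""
    if node_sim == "same":
        return 1 if text_equal else 2
    if node_sim == "similar" and not text_equal:
        return 3
    # "not similar", or a combination the table does not cover (assumption).
    return None


def score(types: Iterable[Optional[int]]) -> int:
    """m = 3*T1 + 2*T2 + T3, with Tk the number of pairs of Type k (assumed)."""
    counted = list(types)
    t1 = counted.count(1)
    t2 = counted.count(2)
    t3 = counted.count(3)
    return 3 * t1 + 2 * t2 + t3


# A single Type 1 pair scores 3 and a single Type 2 pair scores 2.
single_type1 = score([clone_type("same", True)])
single_type2 = score([clone_type("same", False)])
```

The weights make exact matches count most, so when several candidate pairs exist the pair with the highest m is the closest one.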
Clone detection example
Kotlin:
fun get_a() {String b = "b" }
fun get_b() {Int a = 1 }
Dart:
get_a() {int a = 1; }
get_b() {String b = "b"; }
[Figure: the eCST of each file, with root File and two fun_decl, fun_body, attribute subtrees (get_a and get_b); get_a holds attribute b in one tree and attribute a in the other, and get_b the reverse]
Clone detection example
Comparison of the function declarations:
        get_a           get_b
get_a   Type 1, m = 3   Type 2, m = 2
get_b   Type 2, m = 2   Type 1, m = 3
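Selecting the most similar pair from such a grid can be sketched as taking, for each fragment on one side, the candidate with the highest m. This is one plausible reading of "find the most similar clone pair", not a description of the tool's selection rule; `best_matches` and the `scores` dictionary are hypothetical names, and the values are the ones shown in the grid.

```python
from typing import Dict, Tuple

# m values from the grid, keyed by (fragment in file 1, fragment in file 2).
scores: Dict[Tuple[str, str], int] = {
    ("get_a", "get_a"): 3,
    ("get_a", "get_b"): 2,
    ("get_b", "get_a"): 2,
    ("get_b", "get_b"): 3,
}


def best_matches(grid: Dict[Tuple[str, str], int]) -> Dict[str, Tuple[str, int]]:
    """For each left-hand fragment, keep the right-hand fragment with the highest m."""
    best: Dict[str, Tuple[str, int]] = {}
    for (left, right), m in grid.items():
        # Strict comparison: on a tie the first candidate seen is kept.
        if left not in best or m > best[left][1]:
            best[left] = (right, m)
    return best


matches = best_matches(scores)
```

For the grid above, each function is matched with its namesake in the other file, since those pairs carry the higher score.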
Clone detection example
Comparison of the attributes:
    a               b
b   Type 2, m = 2   Type 1, m = 3
a   Type 1, m = 3   Type 2, m = 2
OUT OF STEP at work
1. Language Features
   1. Variables
   2. Functions
   3. Conditionals
   4. Loops
   5. Classes
2. Sorting Algorithms
3. Mobile Apps
https://flaglab.github.io/CloneDetection/
OUT OF STEP at work
Dart:
int sum = 0;
int i = 1;
while(i<=100) {
  sum = sum + i;
  i = i + 1;
}
Kotlin:
var sum = 0;
for (i in 1..100)
  sum = sum + i
[Figure: eCSTs of both snippets. Node labels: File, assignment, loop, binary, literal; tokens: while, for, sum, i, +]
OUT OF STEP at work
•52 applications evaluated (26 Kotlin, 26 Dart)
•12 different application domains
•8 medium and 18 large (> 2000 LOC) applications
•Different production levels (novice, intermediate, enterprise)
OUT OF STEP at work
App type       Avg. LOC        # of files      Clones
               Dart   Kotlin   Dart   Kotlin   Total    Type 1   Type 2 and 3
Shopping       1220   3180     6      51       192395   66566    125829
Health         5463   4776     46     48       733716   167673   566043
Health         2956   2236     29     21       110573   27768    82805
Games          2499   3408     20     40       260244   65166    195121
Productivity   4220   3321     85     47       584825   161981   422844
Shopping       3449   1913     74     29       240201   57909    182386
Library        3909   2432     30     32       219373   51978    167395
Shopping       3468   2951     39     84       327299   90354    236945
Shopping       8139   3771     72     84       803354   210749   592605
Health         3660   1836     28     40       106931   27068    79863
Health         4215   1087     39     23       136150   34199    101951
Bookings       4953   3433     44     50       463125   112528   350597
Services       3927   2700     27     31       178192   38384    139808
Lifestyle      3123   1547     21     26       92305    27354    64951
Productivity   2429   1652     21     26       140574   31716    108858
Bookings       1045   1982     25     67       83825    26741    57084
Discounts      2380   1781     37     27       120956   30090    90866
Pets           5754   2615     41     51       250769   70000    180769
Pets           3039   1724     23     26       96517    24828    71689
Bookings       5490   2538     58     81       564947   150226   415011
Productivity   136    104      1      1        149      23       126
Productivity   144    104      1      1        149      23       126
Productivity   57     23       1      1        90       25       65
Information    367    2383     6      47       60215    25139    35076
Lifestyle      510    1831     14     59       55591    18471    37120
SmartHome      759    3248     9      33       111810   28052    83758
Conclusion and future work
•eCSTs define universal nodes to abstract away specific language constructs, effectively enabling comparison across languages
•We are able to analyze polyglot systems
•OUT OF STEP detects clones with a precision of ~79%
•We need to extend the detection to GUIs and other language frameworks to capture the entirety of applications

[CIbSE2023] Cross-language clone detection for Mobile Apps
