Learning to Spot and Refactor Inconsistent Method Names

Kui Liu1, Dongsun Kim1, Tegawendé F. Bissyandé1, Taeyoung Kim2, Kisub Kim1, Anil Koyuncu1, Suntae Kim2, Yves Le Traon1
1 Interdisciplinary Centre for Security, Reliability and Trust (SnT), University of Luxembourg
2 Department of Software Engineering, Chonbuk National University, South Korea
Dongsun Kim, Research Associate
29th May 2019
2
Programming with Libraries
LibraryA.java
getItem()
setObject(…)
…()
Using a method relies on
its name (+ API document).
Developers often
do not check the inside
of the method.
3
A Method can Disguise
getPokemon( … )
getPokemonRealMonster( … )
What I expect
What I actually get
4
Making the name consistent is NOT easy
https://www.itworld.com/article/2833265/don-t-go-into-programming-if-you-don-t-have-a-good-thesaurus.html
Naming Things: 49%
5
Consequence of inconsistent names
There are 5K+ questions about naming issues on stackoverflow.com.
6
Naming bugs are common
A quick search on GitHub.com with simple queries such as “inconsistent”, “consistency”, “misleading”, … found 183K+ commits addressing naming issues.
7
Our Goals
Detect inconsistent method names.
Repair the names to be consistent.
8
Idea
Similar implementations would have similar names.
10
Idea
Similar implementations, but different names.
11
Approach
How to find similar names/implementations?
12
Sim( , ) = ?
14
What we need!
15
Autoencoder
16
Method → Autoencoder → Method
17
Program Vectors
Program Encoder:
M1 → <9, 2, 3, …>
M2 → <7, 1, 6, …>
M3 → <2, 8, 3, …>
M4 → <0, 1, 8, …>
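Once methods are encoded as vectors, the earlier question "Sim( , ) = ?" reduces to comparing two numeric vectors. The sketch below uses cosine similarity, which is an assumption made for illustration: the slides do not name the metric. The class name `VectorSimilarity` is hypothetical, and the sample vectors use only the three components visible on the slide.

```java
class VectorSimilarity {
    /** Cosine similarity of two equal-length vectors; returns 0 if either vector is all zeros. */
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Only the first three components shown on the slide; the rest are elided there.
        double[] m1 = {9, 2, 3};
        double[] m3 = {2, 8, 3};
        System.out.println("Sim(M1, M1) = " + cosine(m1, m1));
        System.out.println("Sim(M1, M3) = " + cosine(m1, m3));
    }
}
```

A value near 1 means the two vectors point in almost the same direction; any other distance (for example Euclidean) could be swapped in without changing the rest of the pipeline.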
18
Method = Name + Body
Name (N): getID(…) → Similar Names
Body (B): { for(…) { for(…) { for(…) … → Similar Bodies
19
Method Name Embedding
Method Names → Tokenized Names (camel case, underscore) → Sentence2vec (PV-DM) → Embedded Vectors
findField → find, Field
findMatchesHelper → find, Matches, Helper
containsTarget → contains, Target
containsField → contains, Field
findInstruction1 → find, Instruction1
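The tokenization step above can be sketched with one regular expression. This is a minimal illustration, not the authors' tokenizer: the class name `NameTokenizer` and the exact splitting rule are assumptions, chosen so that the slide's five examples come out as shown (digits stay attached to the preceding token).

```java
import java.util.ArrayList;
import java.util.List;

class NameTokenizer {
    /**
     * Splits a method name on underscores and at camel-case boundaries
     * (a lower-case letter or digit followed by an upper-case letter).
     */
    static List<String> tokenize(String methodName) {
        List<String> tokens = new ArrayList<>();
        for (String part : methodName.split("_|(?<=[a-z0-9])(?=[A-Z])")) {
            if (!part.isEmpty()) { // a leading underscore yields an empty first part
                tokens.add(part);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String name : new String[] {"findField", "findMatchesHelper", "findInstruction1"}) {
            System.out.println(name + " -> " + tokenize(name));
        }
    }
}
```

This simple rule does not break up acronym runs (a name such as `parseXMLFile` yields `parse` and `XMLFile`); a production tokenizer would likely need extra cases.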
22
Method Body Embedding
Preprocessing: Program Serialization
Method Body:
return (String[]) list.toArray(new String[0]);
Serialized AST:
[ReturnStatement, return, ArrayType, String[], Variable, listVar, Method, toArray, ArrayCreation, new, ArrayType, String[], NumberLiteral, “0”]
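The serialized form above reads like a depth-first, pre-order walk that emits each node's type followed by its label. The sketch below reproduces the slide's token list from a hand-built tree. The `Node` class and the tree shape are hypothetical stand-ins; a real pipeline would obtain the tree from a Java parser, and the slide's `listVar` suggests variable names are normalized along the way.

```java
import java.util.ArrayList;
import java.util.List;

class AstSerializer {
    /** Hypothetical minimal AST node: a node type, its label, and child nodes. */
    static final class Node {
        final String type;
        final String label;
        final List<Node> children;

        Node(String type, String label, Node... children) {
            this.type = type;
            this.label = label;
            this.children = List.of(children);
        }
    }

    /** Depth-first, pre-order walk that emits (type, label) for every node. */
    static List<String> serialize(Node root) {
        List<String> out = new ArrayList<>();
        walk(root, out);
        return out;
    }

    private static void walk(Node node, List<String> out) {
        out.add(node.type);
        out.add(node.label);
        for (Node child : node.children) {
            walk(child, out);
        }
    }

    /** Hand-built tree for: return (String[]) list.toArray(new String[0]); */
    static Node example() {
        return new Node("ReturnStatement", "return",
                new Node("ArrayType", "String[]"),
                new Node("Variable", "listVar"),
                new Node("Method", "toArray",
                        new Node("ArrayCreation", "new",
                                new Node("ArrayType", "String[]"),
                                new Node("NumberLiteral", "\"0\""))));
    }

    public static void main(String[] args) {
        System.out.println(serialize(example()));
    }
}
```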
24
Method Body Embedding
Vectorization
Serialized AST:
[ReturnStatement, return, ArrayType, String[], Variable, listVar, Method, toArray, ArrayCreation, new, ArrayType, String[], NumberLiteral, “0”]
Token Embedding (Word2Vec):
<2, 6, 9, …> <8, 4, 1, …> <9, 0, 7, …> <2, 3, 0, …> … <7, 1, 2, …> …
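Vectorization can be pictured as a table lookup followed by padding, so that every method body becomes a matrix of the same size. The sketch below is an illustration under assumptions: the class `TokenVectorizer`, the tiny embedding table, and the choice to map unknown tokens to zero rows are all hypothetical, and the numbers are only the components visible on the slide. In the approach itself the token vectors are learned with Word2Vec.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

class TokenVectorizer {
    /**
     * Builds an n-by-k matrix: row i is the embedding of token i.
     * Rows beyond the token list (and tokens missing from the table) stay zero.
     */
    static double[][] vectorize(List<String> tokens, Map<String, double[]> embedding, int n, int k) {
        double[][] matrix = new double[n][k]; // Java zero-initializes, which gives the padding rows
        for (int i = 0; i < Math.min(n, tokens.size()); i++) {
            double[] vector = embedding.get(tokens.get(i));
            if (vector != null) {
                System.arraycopy(vector, 0, matrix[i], 0, Math.min(k, vector.length));
            }
        }
        return matrix;
    }

    public static void main(String[] args) {
        // Illustrative values only.
        Map<String, double[]> embedding = Map.of(
                "ReturnStatement", new double[] {2, 6, 9},
                "return", new double[] {8, 4, 1});
        double[][] matrix = vectorize(List.of("ReturnStatement", "return"), embedding, 4, 3);
        System.out.println(Arrays.deepToString(matrix));
    }
}
```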
26
Encoding (CNN-based)
Input layer: n × k (a two-dimensional numeric vector), one row per token (ReturnStatement, return, ArrayType, String[], Variable, listVar, Method, toArray, ArrayCreation, new, ArrayType, String[], NumberLiteral, “0”), followed by zero rows
C1: 4 feature maps (convolutional layer)
S1: 4 feature maps (subsampling layer)
C2: 6 feature maps (convolutional layer)
S2: 6 feature maps (subsampling layer)
Fully connected layers: dense layer, output layer
Output is extracted features
Fig. 8: CNN architecture for extracting clustering features. C1 is the first convolutional layer, and C2 is the second one. S1 is the first subsampling layer, and S2 is the second one.
Figure source: IEEE Transactions on Software Engineering, DOI 10.1109/TSE.2018.2884955
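To make the convolution and subsampling layers concrete, here is a heavily simplified sketch of one convolution step followed by max-pooling over the token matrix. It is not the architecture in the figure: it uses a single hand-written filter that spans the full row width, where the figure has several learned feature maps, and the class `ConvSketch`, ReLU activation, and max-pooling are assumptions chosen for illustration.

```java
import java.util.Arrays;

class ConvSketch {
    /**
     * Slides an h-by-k filter down the rows of an n-by-k input (n >= h)
     * and returns n - h + 1 ReLU activations.
     */
    static double[] convolve(double[][] input, double[][] filter, double bias) {
        int h = filter.length;
        double[] out = new double[input.length - h + 1];
        for (int row = 0; row < out.length; row++) {
            double sum = bias;
            for (int i = 0; i < h; i++) {
                for (int j = 0; j < filter[i].length; j++) {
                    sum += input[row + i][j] * filter[i][j];
                }
            }
            out[row] = Math.max(0.0, sum); // ReLU
        }
        return out;
    }

    /** Subsampling: maximum over non-overlapping windows; a trailing partial window is kept. */
    static double[] maxPool(double[] features, int window) {
        double[] out = new double[(features.length + window - 1) / window];
        for (int p = 0; p < out.length; p++) {
            double best = features[p * window];
            int end = Math.min(features.length, (p + 1) * window);
            for (int i = p * window + 1; i < end; i++) {
                best = Math.max(best, features[i]);
            }
            out[p] = best;
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] input = {{1, 0}, {0, 1}, {1, 1}, {0, 0}}; // 4 tokens, 2 dimensions
        double[][] filter = {{1, 1}, {1, 1}};                // looks at 2 consecutive tokens
        double[] conv = convolve(input, filter, 0.0);
        System.out.println(Arrays.toString(conv));
        System.out.println(Arrays.toString(maxPool(conv, 2)));
    }
}
```

Stacking two such convolution and subsampling pairs and feeding the result into dense layers gives the overall shape of the figure; the dense layer's output is what the slide calls the extracted features.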
28
Inconsistency detection
Name (N): findField
Body (B): { for(…) { …
Adjacent Methods (=)
30
Inconsistency detection
= True: The method is likely to have a consistent name.
= False: The method name could be inconsistent with the implementation. → Suggest a new name.
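One way to read the True/False check is as a set comparison: collect the names adjacent to the target's name, collect the names of the methods whose bodies are adjacent to the target's body, and ask whether the two sets share anything. The sketch below implements that reading; it is an interpretation of the slide, not a confirmed description of the tool, and the class `InconsistencyCheck` and its parameter names are hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class InconsistencyCheck {
    /**
     * @param namesNearTargetName  names adjacent to the target's name in the name vector space
     * @param namesOfSimilarBodies names of methods whose bodies are adjacent to the target's body
     * @return true if the two neighborhoods share at least one name
     */
    static boolean isLikelyConsistent(List<String> namesNearTargetName,
                                      List<String> namesOfSimilarBodies) {
        Set<String> common = new HashSet<>(namesNearTargetName);
        common.retainAll(namesOfSimilarBodies); // set intersection
        return !common.isEmpty();
    }

    public static void main(String[] args) {
        // The neighbor lists are made-up inputs for illustration.
        System.out.println(isLikelyConsistent(
                List.of("findField", "getField"), List.of("getField", "lookup")));
        System.out.println(isLikelyConsistent(
                List.of("findField"), List.of("containsTarget", "containsField")));
    }
}
```

When the check returns false, the names attached to the similar bodies are the natural pool from which to suggest a replacement.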
33
Name Suggestion
Suggestion = sorting the names of similar implementations.
Four ranking strategies:
R1: Sort names by distance, without grouping identical names.
R2: Group identical names first, then sort groups by size.
R3: Group identical names first, then sort groups by average distance.
R4: Same as R3, but penalize groups with size=1.
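The four strategies can be sketched as four small sort routines over the neighbors of the target body. This is an illustration under assumptions: the `Candidate` class is hypothetical, the distances are made up, and the R4 penalty is modeled as "singleton groups rank after all larger groups", which is only one possible penalty since the slide does not say how it is applied.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class NameRanking {
    /** A neighbor: a method name and the distance of its body to the target body. */
    static final class Candidate {
        final String name;
        final double distance;
        Candidate(String name, double distance) { this.name = name; this.distance = distance; }
    }

    /** R1: order by distance only; identical names are not merged. */
    static List<String> r1(List<Candidate> candidates) {
        List<Candidate> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparingDouble((Candidate c) -> c.distance));
        List<String> names = new ArrayList<>();
        for (Candidate c : sorted) names.add(c.name);
        return names;
    }

    /** Groups distances by name, preserving first-seen order. */
    static Map<String, List<Double>> group(List<Candidate> candidates) {
        Map<String, List<Double>> groups = new LinkedHashMap<>();
        for (Candidate c : candidates) {
            groups.computeIfAbsent(c.name, k -> new ArrayList<>()).add(c.distance);
        }
        return groups;
    }

    static double average(List<Double> distances) {
        double sum = 0;
        for (double d : distances) sum += d;
        return sum / distances.size();
    }

    /** R2: larger groups first. */
    static List<String> r2(List<Candidate> candidates) {
        Map<String, List<Double>> g = group(candidates);
        List<String> names = new ArrayList<>(g.keySet());
        names.sort(Comparator.comparingInt((String n) -> -g.get(n).size()));
        return names;
    }

    /** R3: smaller average distance first. */
    static List<String> r3(List<Candidate> candidates) {
        Map<String, List<Double>> g = group(candidates);
        List<String> names = new ArrayList<>(g.keySet());
        names.sort(Comparator.comparingDouble((String n) -> average(g.get(n))));
        return names;
    }

    /** R4: as R3, but singleton groups rank after all larger groups (assumed penalty). */
    static List<String> r4(List<Candidate> candidates) {
        Map<String, List<Double>> g = group(candidates);
        List<String> names = new ArrayList<>(g.keySet());
        names.sort(Comparator
                .comparing((String n) -> g.get(n).size() == 1) // false sorts before true
                .thenComparingDouble(n -> average(g.get(n))));
        return names;
    }

    public static void main(String[] args) {
        List<Candidate> cs = List.of(new Candidate("getField", 0.1), new Candidate("findField", 0.2),
                new Candidate("findField", 0.3), new Candidate("lookup", 0.05));
        System.out.println("R1 " + r1(cs));
        System.out.println("R2 " + r2(cs));
        System.out.println("R3 " + r3(cs));
        System.out.println("R4 " + r4(cs));
    }
}
```

All sorts are stable, so ties keep the order in which names were first seen.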
34
Evaluation
35
Research Questions
RQ1: Inconsistency Identification
RQ2: Suggestion Precision
RQ3: Comparative Study
RQ4: Live Study
Training/testing data from open-source projects.
RQ3 → Comparison with an approach [*] based on a convolutional attention network.
RQ4 → Submitting our suggestion results as pull requests to open-source projects.
[*] M. Allamanis, H. Peng, and C. Sutton, “A convolutional attention network for extreme summarization of source code,” in Proceedings of the 33rd International Conference on Machine Learning. JMLR.org, 2016, pp. 2091–2100.
36
Training/Testing Set
Total: 430 projects
Training Data: 2,116,413 methods
37
Training/Testing Set
→ Testing: 2,805 methods (name pairs + bodies)
38
RQ1: Inconsistency Identification
# of neighbors to look up      k=1     k=5     k=10    k=30
Inconsistent (%)  Precision    56.8    53.7    53.3    49.9
Inconsistent (%)  Recall       84.5    55.9    46.7    28.8
Inconsistent (%)  F1           67.9    54.8    49.7    36.5
Consistent (%)    Precision    72.0    55.9    54.2    51.4
Consistent (%)    Recall       38.2    53.7    60.7    72.2
Consistent (%)    F1           49.9    54.8    57.3    60.0
Accuracy (%)                   60.9    54.8    53.8    50.9
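The F1 rows in the table follow from the precision and recall rows via the standard harmonic mean, F1 = 2PR / (P + R). The small sketch below recomputes two cells as a sanity check; the class name `F1Score` is hypothetical.

```java
class F1Score {
    /** Harmonic mean of precision and recall; 0 when both are 0. */
    static double f1(double precision, double recall) {
        if (precision + recall == 0) {
            return 0.0;
        }
        return 2 * precision * recall / (precision + recall);
    }

    public static void main(String[] args) {
        // k=1: inconsistent class, then consistent class (values in percent).
        System.out.printf("%.1f%n", f1(56.8, 84.5));
        System.out.printf("%.1f%n", f1(72.0, 38.2));
    }
}
```

Rounded to one decimal, these reproduce the reported 67.9 and 49.9 for k=1.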
39
RQ2: Name Suggestion
Accuracy (%)          R1 (k=thr)   R2 (k=10)   R3 (k=10)   R4 (k=10)
First Token, thr=1    23.4         23.2        23.0        24.1
First Token, thr=5    35.7         39.4        39.4        39.7
Full Name, thr=1      10.7         11.0        10.9        10.9
Full Name, thr=5      17.0         18.7        19.0        19.2
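The two accuracy flavors can be read as "hit within the top thr suggestions", judged either on the full name or on the first token only. The sketch below encodes that reading; it is an assumption about how the table's numbers are counted, and the class `SuggestionAccuracy`, the splitting rule, and the sample ranking are hypothetical.

```java
import java.util.List;

class SuggestionAccuracy {
    /** First token of a camel-case or underscore-separated name, e.g. findField -> find. */
    static String firstToken(String name) {
        return name.split("_|(?<=[a-z0-9])(?=[A-Z])")[0];
    }

    /** True if the expected full name is among the top thr suggestions. */
    static boolean hitFullName(List<String> ranked, String expected, int thr) {
        return ranked.subList(0, Math.min(thr, ranked.size())).contains(expected);
    }

    /** True if any of the top thr suggestions starts with the expected name's first token. */
    static boolean hitFirstToken(List<String> ranked, String expected, int thr) {
        String wanted = firstToken(expected);
        for (String suggestion : ranked.subList(0, Math.min(thr, ranked.size()))) {
            if (firstToken(suggestion).equals(wanted)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> ranked = List.of("findItem", "getField", "findField");
        System.out.println(hitFirstToken(ranked, "findField", 1));
        System.out.println(hitFullName(ranked, "findField", 1));
        System.out.println(hitFullName(ranked, "findField", 5));
    }
}
```

Accuracy is then the fraction of test methods for which the chosen hit function returns true, which explains why first-token accuracy is always at least as high as full-name accuracy at the same thr.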
40
RQ3: Comparison — Name Suggestion
Accuracy (%)
                  First Token       Full Name
                  thr=1    thr=5    thr=1    thr=5
R1                36.4     47.2     16.5     22.9
R2                34.8     50.2     17.0     25.4
R3                34.7     50.3     16.9     25.5
R4                35.4     50.5     16.0     25.7
conv_attention    22.3     33.6     0.3      0.6     (state-of-the-art)
copy_attention    23.5     44.7     0.4      1.1     (state-of-the-art)
42
RQ4: Live Study (Setup)
Training Data (10%)
→ Identify inconsistent names and suggest new names (sampled 100 cases).
→ Create a pull request.
→ Ask a maintainer to refactor the method names.
43
RQ4: Live Study
Agreed                           Agreed but not fixed
Merged    Approved    Improved   Cannot    Won’t    Disagreed    Ignored    Total
40        26          4          1         2        9            18         100
Half of them are public methods.
Developer feedback includes:
* It should follow project-specific naming conventions.
* Some method names should consider class names.
  e.g., in “XXXBuilder”, many methods cannot be named “build()” even though they return “XXXBuilder” objects.
44
Summary
Recap of earlier slides: Making the name consistent is NOT easy; Encoding (CNN-based); RQ3: Comparison; RQ4: Live Study.
45
Tool and Data
https://github.com/SerVal-DTF/debug-method-name
46
Hire me!
https://www.darkrsw.net
Hiring
http://wwwen.uni.lu/snt/research/serval
Unit 1_Lecture 2_Physical Design of IoT.pdf
StephenTec11 views
Data-centric AI and the convergence of data and model engineering: opportunit... by Paolo Missier
Data-centric AI and the convergence of data and model engineering:opportunit...Data-centric AI and the convergence of data and model engineering:opportunit...
Data-centric AI and the convergence of data and model engineering: opportunit...
Paolo Missier34 views
DALI Basics Course 2023 by Ivory Egg
DALI Basics Course  2023DALI Basics Course  2023
DALI Basics Course 2023
Ivory Egg14 views
From chaos to control: Managing migrations and Microsoft 365 with ShareGate! by sammart93
From chaos to control: Managing migrations and Microsoft 365 with ShareGate!From chaos to control: Managing migrations and Microsoft 365 with ShareGate!
From chaos to control: Managing migrations and Microsoft 365 with ShareGate!
sammart939 views
SAP Automation Using Bar Code and FIORI.pdf by Virendra Rai, PMP
SAP Automation Using Bar Code and FIORI.pdfSAP Automation Using Bar Code and FIORI.pdf
SAP Automation Using Bar Code and FIORI.pdf
TouchLog: Finger Micro Gesture Recognition Using Photo-Reflective Sensors by sugiuralab
TouchLog: Finger Micro Gesture Recognition  Using Photo-Reflective SensorsTouchLog: Finger Micro Gesture Recognition  Using Photo-Reflective Sensors
TouchLog: Finger Micro Gesture Recognition Using Photo-Reflective Sensors
sugiuralab15 views
Business Analyst Series 2023 - Week 3 Session 5 by DianaGray10
Business Analyst Series 2023 -  Week 3 Session 5Business Analyst Series 2023 -  Week 3 Session 5
Business Analyst Series 2023 - Week 3 Session 5
DianaGray10209 views
Transcript: The Details of Description Techniques tips and tangents on altern... by BookNet Canada
Transcript: The Details of Description Techniques tips and tangents on altern...Transcript: The Details of Description Techniques tips and tangents on altern...
Transcript: The Details of Description Techniques tips and tangents on altern...
BookNet Canada130 views
PharoJS - Zürich Smalltalk Group Meetup November 2023 by Noury Bouraqadi
PharoJS - Zürich Smalltalk Group Meetup November 2023PharoJS - Zürich Smalltalk Group Meetup November 2023
PharoJS - Zürich Smalltalk Group Meetup November 2023
Noury Bouraqadi120 views
AMAZON PRODUCT RESEARCH.pdf by JerikkLaureta
AMAZON PRODUCT RESEARCH.pdfAMAZON PRODUCT RESEARCH.pdf
AMAZON PRODUCT RESEARCH.pdf
JerikkLaureta15 views
Automating a World-Class Technology Conference; Behind the Scenes of CiscoLive by Network Automation Forum
Automating a World-Class Technology Conference; Behind the Scenes of CiscoLiveAutomating a World-Class Technology Conference; Behind the Scenes of CiscoLive
Automating a World-Class Technology Conference; Behind the Scenes of CiscoLive
handbook for web 3 adoption.pdf by Liveplex
handbook for web 3 adoption.pdfhandbook for web 3 adoption.pdf
handbook for web 3 adoption.pdf
Liveplex19 views
Black and White Modern Science Presentation.pptx by maryamkhalid2916
Black and White Modern Science Presentation.pptxBlack and White Modern Science Presentation.pptx
Black and White Modern Science Presentation.pptx
maryamkhalid291614 views

Learning to Spot and Refactor Inconsistent Method Names

  • 1. Learning to Spot and Refactor Inconsistent Method Names. Kui Liu (1), Dongsun Kim (1), Tegawendé F. Bissyandé (1), Taeyoung Kim (2), Kisub Kim (1), Anil Koyuncu (1), Suntae Kim (2), Yves Le Traon (1). Affiliations: (1) Interdisciplinary Centre for Security, Reliability and Trust (SnT), University of Luxembourg; (2) Department of Software Engineering, Chonbuk National University, South Korea. 29th May 2019.
  • 9. Programming with Libraries. LibraryA.java: getItem(), setObject(…), …(). Using a method relies on its name (+ API document). Developers often do not check the inside of the method.
  • 10-15. A Method can Disguise. What I expect: getPokemon( … ). What I actually get: getPokemonRealMonster( … ).
  • 16-17. Making the name consistent is NOT easy. Naming Things: 49%. Source: https://www.itworld.com/article/2833265/don-t-go-into-programming-if-you-don-t-have-a-good-thesaurus.html
  • 18. Consequence of inconsistent names: there are 5K+ questions on naming issues on stackoverflow.com.
  • 19. Naming bugs are common: we found 183K+ commits addressing naming issues on GitHub.com through a quick search with simple queries such as “inconsistent, consistency, misleading, …”
  • 20-22. Our Goals: (1) detect inconsistent method names; (2) repair the names to be consistent.
  • 40. How to find similar names/implementations? Sim( · , · ) = ?
  • 52. Program Encoder. Programs (methods M1, M2, M3, M4) go through the program encoder and come out as program vectors: <9, 2, 3, …>, <7, 1, 6, …>, <2, 8, 3, …>, <0, 1, 8, …>.
  • 53. Method = Name + Body, e.g., getID(…) { for(…) { for(…) { for(…) … Two similarity views: similar names (N) and similar bodies (B).
  • 54. Method Name Embedding. Method names are tokenized (camel case, underscore), and the tokenized names are turned into embedded vectors with Sentence2vec (PV-DM).
      Method name          Tokenized name
      findField            find, Field
      findMatchesHelper    find, Matches, Helper
      containsTarget       contains, Target
      containsField        contains, Field
      findInstruction1     find, Instruction1
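The tokenization step on this slide can be sketched as follows. This is a minimal illustration, assuming a split at underscores and at lowercase-or-digit to uppercase boundaries; the class name `NameTokenizer` and the exact splitting rule are hypothetical and are not taken from the authors' tool.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a method name into sub-tokens.
final class NameTokenizer {

    /** Splits at underscores, then at each lowercase-or-digit to uppercase boundary. */
    static List<String> tokenize(String methodName) {
        List<String> tokens = new ArrayList<>();
        for (String part : methodName.split("_")) {
            if (part.isEmpty()) {
                continue; // skip empty pieces produced by leading or doubled underscores
            }
            // Zero-width split: nothing is consumed, so the original casing is kept.
            for (String token : part.split("(?<=[a-z0-9])(?=[A-Z])")) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String name : List.of("findField", "findMatchesHelper",
                "containsTarget", "containsField", "findInstruction1")) {
            System.out.println(name + " -> " + tokenize(name));
        }
    }
}
```

The sketch keeps the casing of each token, as the slide does ("find, Field"); whether the tokens are normalized before embedding is not stated in the slides.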
  • 55-57. Method Body Embedding. Preprocessing: Program Serialization.
      Method Body: return (String[]) list.toArray(new String[0]);
      Serialized AST: [ReturnStatement, return, ArrayType, String[], Variable, listVar, Method, toArray, ArrayCreation, new, ArrayType, String[], NumberLiteral, “0”]
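A depth-first walk that emits each node's type followed by its label reproduces the token order shown on the slide. The sketch below hand-builds the tree instead of parsing Java source; the `Node` class and the tree shape are assumptions made for illustration, and a real pipeline would obtain the tree from a Java parser. The identifier `list` is hard-coded as `listVar` because that is how it appears on the slide.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical serializer over a hand-built AST; not the authors' implementation.
final class AstSerializer {

    /** Minimal AST node: a node type, a source label, and ordered children. */
    static final class Node {
        final String type;
        final String label;
        final List<Node> children;

        Node(String type, String label, Node... children) {
            this.type = type;
            this.label = label;
            this.children = List.of(children);
        }
    }

    /** Pre-order traversal that appends the node type, then the label, then the children. */
    static List<String> serialize(Node root) {
        List<String> out = new ArrayList<>();
        visit(root, out);
        return out;
    }

    private static void visit(Node node, List<String> out) {
        out.add(node.type);
        out.add(node.label);
        for (Node child : node.children) {
            visit(child, out);
        }
    }

    /** Assumed tree for: return (String[]) list.toArray(new String[0]); */
    static Node example() {
        return new Node("ReturnStatement", "return",
                new Node("ArrayType", "String[]"),
                new Node("Variable", "listVar"),
                new Node("Method", "toArray",
                        new Node("ArrayCreation", "new",
                                new Node("ArrayType", "String[]"),
                                new Node("NumberLiteral", "0"))));
    }

    public static void main(String[] args) {
        System.out.println(serialize(example()));
    }
}
```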
  • 58-62. Method Body Embedding: Vectorization. Each token of the serialized AST [ReturnStatement, return, ArrayType, String[], Variable, listVar, Method, toArray, ArrayCreation, new, ArrayType, String[], NumberLiteral, “0”] is mapped to a numeric vector by token embedding (Word2Vec): <2, 6, 9, …>, <8, 4, 1, …>, <9, 0, 7, …>, <2, 3, 0, …>, …, <7, 1, 2, …>, …
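Once every token has a vector, a method body becomes a matrix with one row per token. The sketch below shows only that lookup step, assuming a pre-trained token-to-vector table is already available; the class `TokenVectorizer`, the fixed row count `n`, the zero rows for missing tokens, and the toy 3-dimensional values are all hypothetical. Training the embedding itself (Word2Vec on the slide) is out of scope here.

```java
import java.util.List;
import java.util.Map;

// Hypothetical lookup step: serialized tokens -> fixed-size numeric matrix.
final class TokenVectorizer {

    /**
     * Builds an n x k matrix: row i is the embedding of token i.
     * Rows beyond the last token, and rows for tokens missing from the table, stay all-zero.
     * Tokens beyond position n are dropped.
     */
    static double[][] toMatrix(List<String> tokens, Map<String, double[]> embedding, int n, int k) {
        double[][] matrix = new double[n][k]; // Java zero-initialises the array
        for (int i = 0; i < Math.min(n, tokens.size()); i++) {
            double[] vector = embedding.get(tokens.get(i));
            if (vector != null) {
                System.arraycopy(vector, 0, matrix[i], 0, k);
            }
        }
        return matrix;
    }

    public static void main(String[] args) {
        // Toy values, chosen only for illustration.
        Map<String, double[]> embedding = Map.of(
                "ReturnStatement", new double[] {2, 6, 9},
                "return", new double[] {8, 4, 1});
        double[][] matrix = toMatrix(List.of("ReturnStatement", "return"), embedding, 4, 3);
        for (double[] row : matrix) {
            System.out.println(java.util.Arrays.toString(row));
        }
    }
}
```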
  • 63-65. Encoding (CNN-based). Figure: "Fig. 8: CNN architecture for extracting clustering features" (DOI 10.1109/TSE.2018.2884955, IEEE Transactions on Software Engineering). The token vectors of the serialized AST (ReturnStatement, return, ArrayType, String[], Variable, listVar, Method, toArray, ArrayCreation, new, ArrayType, String[], NumberLiteral, “0”), followed by rows of zeros, form the input: an n × k two-dimensional numeric vector. Layers shown: input layer; C1, the first convolutional layer (4 feature maps); S1, a subsampling layer (4 feature maps); C2, the second convolutional layer (6 feature maps); S2, a subsampling layer (6 feature maps); fully connected layers with a dense layer; output layer. The output of the dense layer is the extracted features.
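To make the convolution and subsampling layers concrete, here is a minimal sketch of one convolution followed by one subsampling step on a small matrix. It is not the network from the figure: the kernel size, the ReLU activation, and the use of max pooling for subsampling are assumptions (the slides state only the number of feature maps), and the class `ConvSketch` is hypothetical. Each feature map in C1 or C2 would correspond to one such kernel.

```java
// Hypothetical single-kernel illustration of a convolutional layer plus a subsampling layer.
final class ConvSketch {

    /** "Valid" 2D cross-correlation with one kernel, followed by ReLU. */
    static double[][] convolve(double[][] input, double[][] kernel) {
        int outRows = input.length - kernel.length + 1;
        int outCols = input[0].length - kernel[0].length + 1;
        double[][] out = new double[outRows][outCols];
        for (int r = 0; r < outRows; r++) {
            for (int c = 0; c < outCols; c++) {
                double sum = 0.0;
                for (int i = 0; i < kernel.length; i++) {
                    for (int j = 0; j < kernel[0].length; j++) {
                        sum += input[r + i][c + j] * kernel[i][j];
                    }
                }
                out[r][c] = Math.max(0.0, sum); // ReLU
            }
        }
        return out;
    }

    /** Non-overlapping max pooling over size x size windows; leftover edge cells are dropped. */
    static double[][] maxPool(double[][] input, int size) {
        int outRows = input.length / size;
        int outCols = input[0].length / size;
        double[][] out = new double[outRows][outCols];
        for (int r = 0; r < outRows; r++) {
            for (int c = 0; c < outCols; c++) {
                double max = Double.NEGATIVE_INFINITY;
                for (int i = 0; i < size; i++) {
                    for (int j = 0; j < size; j++) {
                        max = Math.max(max, input[r * size + i][c * size + j]);
                    }
                }
                out[r][c] = max;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] input = {
            {1, 2, 3, 4}, {5, 6, 7, 8}, {9, 10, 11, 12}, {13, 14, 15, 16}
        };
        double[][] kernel = {{1, 1}, {1, 1}};
        double[][] featureMap = convolve(input, kernel); // 3 x 3
        double[][] pooled = maxPool(featureMap, 2);      // 1 x 1
        System.out.println(java.util.Arrays.deepToString(pooled));
    }
}
```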
  • 69. Inconsistency detection. The check evaluates to True or False, with two possible outcomes: either the method is likely to have a consistent name, or the method name could be inconsistent with the implementation → suggest a new name.
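One way to read this check, stated here as an assumption because the slide text does not spell it out, is: collect the names closest to the method's own name, collect the names of the methods whose bodies are closest to its body, and flag the method when the two sets share nothing. The sketch below uses Euclidean distance over toy vectors and compares full names; the class `InconsistencyCheck`, the distance measure, and the full-name comparison are all hypothetical choices.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical check: do "similar names" and "names of similar bodies" overlap?
final class InconsistencyCheck {

    /** A method with pre-computed name and body vectors (toy values in this sketch). */
    static final class MethodEntry {
        final String name;
        final double[] nameVec;
        final double[] bodyVec;

        MethodEntry(String name, double[] nameVec, double[] bodyVec) {
            this.name = name;
            this.nameVec = nameVec;
            this.bodyVec = bodyVec;
        }
    }

    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Names of the k corpus methods nearest to the query in name space or body space. */
    static Set<String> nearestNames(MethodEntry query, List<MethodEntry> corpus, int k, boolean byBody) {
        List<MethodEntry> sorted = new ArrayList<>(corpus);
        sorted.sort(Comparator.comparingDouble((MethodEntry m) -> byBody
                ? distance(m.bodyVec, query.bodyVec)
                : distance(m.nameVec, query.nameVec)));
        Set<String> names = new LinkedHashSet<>();
        for (MethodEntry m : sorted.subList(0, Math.min(k, sorted.size()))) {
            names.add(m.name);
        }
        return names;
    }

    /** True when the two neighbour sets have no name in common. */
    static boolean looksInconsistent(MethodEntry query, List<MethodEntry> corpus, int k) {
        Set<String> similarNames = nearestNames(query, corpus, k, false);
        Set<String> namesOfSimilarBodies = nearestNames(query, corpus, k, true);
        return Collections.disjoint(similarNames, namesOfSimilarBodies);
    }

    public static void main(String[] args) {
        List<MethodEntry> corpus = List.of(
                new MethodEntry("getItem", new double[] {0, 0}, new double[] {0, 0}),
                new MethodEntry("setItem", new double[] {5, 5}, new double[] {5, 5}));
        MethodEntry query = new MethodEntry("getThing", new double[] {0, 0.2}, new double[] {5, 5.1});
        System.out.println(looksInconsistent(query, corpus, 1));
    }
}
```

In the usage example the query is named like a getter but its body vector sits next to a setter, so the two neighbour sets do not overlap and the name is flagged.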
  • 72-76. Name Suggestion. Suggestion = sorting the names of similar implementations. Four ranking strategies:
      R1: Sort names by distance, without regard to identical names.
      R2: Group identical names first, then sort the groups by size.
      R3: Group identical names first, then sort the groups by average distance.
      R4: Same as R3, but penalize groups with size = 1.
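The four strategies can be sketched over a list of candidate names, each paired with the distance between its method body and the body of the method under inspection. Several details are assumptions: larger groups and smaller distances are taken to rank first, and the R4 penalty is modelled by placing single-member groups after all other groups, since the slides do not say how the penalty is applied. The class `NameRanking` is hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical implementation of the ranking strategies R1 to R4.
final class NameRanking {

    /** A candidate name and the body distance of the method it comes from. */
    static final class Candidate {
        final String name;
        final double distance;

        Candidate(String name, double distance) {
            this.name = name;
            this.distance = distance;
        }
    }

    /** R1: sort by distance; identical names are not merged. */
    static List<String> r1(List<Candidate> candidates) {
        List<Candidate> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparingDouble((Candidate c) -> c.distance));
        List<String> names = new ArrayList<>();
        for (Candidate c : sorted) {
            names.add(c.name);
        }
        return names;
    }

    /** R2: group identical names, largest group first. */
    static List<String> r2(List<Candidate> candidates) {
        Map<String, List<Double>> groups = group(candidates);
        List<String> names = new ArrayList<>(groups.keySet());
        names.sort(Comparator.comparingInt((String n) -> -groups.get(n).size()));
        return names;
    }

    /** R3: group identical names, smallest average distance first. */
    static List<String> r3(List<Candidate> candidates) {
        Map<String, List<Double>> groups = group(candidates);
        List<String> names = new ArrayList<>(groups.keySet());
        names.sort(Comparator.comparingDouble((String n) -> average(groups.get(n))));
        return names;
    }

    /** R4: as R3, but groups of size 1 are ranked after all larger groups (assumed penalty). */
    static List<String> r4(List<Candidate> candidates) {
        Map<String, List<Double>> groups = group(candidates);
        List<String> names = new ArrayList<>(groups.keySet());
        names.sort(Comparator
                .comparingInt((String n) -> groups.get(n).size() == 1 ? 1 : 0)
                .thenComparingDouble(n -> average(groups.get(n))));
        return names;
    }

    private static Map<String, List<Double>> group(List<Candidate> candidates) {
        Map<String, List<Double>> groups = new LinkedHashMap<>();
        for (Candidate c : candidates) {
            groups.computeIfAbsent(c.name, n -> new ArrayList<>()).add(c.distance);
        }
        return groups;
    }

    private static double average(List<Double> values) {
        return values.stream().mapToDouble(Double::doubleValue).average().orElse(Double.MAX_VALUE);
    }

    public static void main(String[] args) {
        List<Candidate> candidates = List.of(
                new Candidate("getItem", 0.1), new Candidate("fetch", 0.2),
                new Candidate("getItem", 0.5), new Candidate("load", 0.3),
                new Candidate("load", 0.35), new Candidate("load", 0.9));
        System.out.println("R1 " + r1(candidates));
        System.out.println("R2 " + r2(candidates));
        System.out.println("R3 " + r3(candidates));
        System.out.println("R4 " + r4(candidates));
    }
}
```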
  • 78-81. Research Questions.
      RQ1: Inconsistency Identification
      RQ2: Suggestion Precision
      RQ3: Comparative Study → comparing with an approach* based on a convolutional attention network.
      RQ4: Live Study → submitting our suggestion results as pull requests to open-source projects.
      Training/testing data come from open-source projects.
      [*] M. Allamanis, H. Peng, and C. Sutton, “A convolutional attention network for extreme summarization of source code,” in Proceedings of the 33rd International Conference on Machine Learning. JMLR.org, 2016, pp. 2091–2100.
  • 86. Training/Testing Set. Total: 430 projects. Training data: 2,116,413 methods.
  • 92-93. RQ1: Inconsistency Identification, by number of neighbors to look up (k).
                             k=1     k=5     k=10    k=30
      Inconsistent (%)
        Precision            56.8    53.7    53.3    49.9
        Recall               84.5    55.9    46.7    28.8
        F1                   67.9    54.8    49.7    36.5
      Consistent (%)
        Precision            72.0    55.9    54.2    51.4
        Recall               38.2    53.7    60.7    72.2
        F1                   49.9    54.8    57.3    60.0
      Accuracy (%)           60.9    54.8    53.8    50.9
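As a quick consistency check on the RQ1 numbers, the F1 values match the usual harmonic mean of precision P and recall R. The definition below is the standard one and is assumed rather than quoted from the slides. For the inconsistent class at k=1 (P = 56.8, R = 84.5):

```latex
\[
F_1 = \frac{2PR}{P + R}
    = \frac{2 \times 56.8 \times 84.5}{56.8 + 84.5}
    = \frac{9599.2}{141.3}
    \approx 67.9
\]
```

The same formula gives about 49.9 for the consistent class at k=1 (P = 72.0, R = 38.2), which also agrees with the table.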
  • 94-95. RQ2: Name Suggestion, accuracy (%). The slide header also carries the column groups k=thr and k=10.
                             R1      R2      R3      R4
      First Token   thr=1    23.4    23.2    23.0    24.1
                    thr=5    35.7    39.4    39.4    39.7
      Full Name     thr=1    10.7    11.0    10.9    10.9
                    thr=5    17.0    18.7    19.0    19.2
  • 96-97. RQ3: Comparison, name suggestion accuracy (%).
                           First Token         Full Name
                           thr=1   thr=5       thr=1   thr=5
      R1                   36.4    47.2        16.5    22.9
      R2                   34.8    50.2        17.0    25.4
      R3                   34.7    50.3        16.9    25.5
      R4                   35.4    50.5        16.0    25.7
      conv_attention       22.3    33.6        0.3     0.6
      copy_attention       23.5    44.7        0.4     1.1
      conv_attention and copy_attention are the state-of-the-art baselines.
  • 98-106. RQ4: Live Study, setup. Diagram labels: Training Data; 10%. Steps:
      1. Identify inconsistent names and suggest new names (sampled 100 cases).
      2. Create a pull request.
      3. Ask a maintainer to refactor the method names.
  • 107-111. RQ4: Live Study, results.
      Agreed:                 Merged 40, Approved 26, Improved 4
      Agreed but not fixed:   Cannot 1, Won’t 2
      Disagreed:              9
      Ignored:                18
      Total:                  100
      Half of them are public methods.
      Developer feedback includes:
        * It should follow project-specific naming conventions.
        * Some method names should consider class names, e.g., in “XXXBuilder”, many methods cannot be named “build()” even though they return “XXXBuilder” objects.
  • 112. Summary. Recap of four earlier slides: RQ4: Live Study; RQ3: Comparison; Encoding (CNN-based); Making the name consistent is NOT easy.
  • 114. https://www.darkrsw.net and http://wwwen.uni.lu/snt/research/serval. Hire me! Hiring.