Learning to Spot and Refactor Inconsistent Method Names

To ensure code readability and facilitate software maintenance, program methods must be named properly. In particular, method names must be consistent with the corresponding method implementations. Debugging method names remains an important topic in the literature, where various approaches analyze commonalities among method names in a large dataset to detect inconsistent method names and suggest better ones. We note that the state-of-the-art does not analyze the implemented code itself to assess consistency. We thus propose a novel automated approach to debugging method names based on the analysis of consistency between method names and method code. The approach leverages deep feature representation techniques adapted to the nature of each artifact. Experimental results on over 2.1 million Java methods show that we can achieve up to 15 percentage points improvement over the state-of-the-art, establishing a record performance of 67.9% F1-measure in identifying inconsistent method names. We further demonstrate that our approach yields up to 25% accuracy in suggesting full names, while the state-of-the-art lags far behind at 1.1% accuracy. Finally, we report on our success in fixing 66 inconsistent method names in a live study on projects in the wild.

  1. Learning to Spot and Refactor Inconsistent Method Names. Kui Liu (1), Dongsun Kim (1), Tegawendé F. Bissyandé (1), Taeyoung Kim (2), Kisub Kim (1), Anil Koyuncu (1), Suntae Kim (2), Yves Le Traon (1). (1) Interdisciplinary Centre for Security, Reliability and Trust (SnT), University of Luxembourg; (2) Department of Software Engineering, Chonbuk National University, South Korea. 29th May 2019
  9. Programming with Libraries. LibraryA.java: getItem(), setObject(…), …(). Using a method relies on its name (+ API document). Developers often do not check the inside of the method.
  15. A Method can Disguise. getPokemon( … ): what I expect. getPokemonRealMonster( … ): what I actually get.
  17. Making the name consistent is NOT easy. Naming Things: 49%. Source: https://www.itworld.com/article/2833265/don-t-go-into-programming-if-you-don-t-have-a-good-thesaurus.html
  18. Consequence of inconsistent names. There are 5K+ questions on naming issues in stackoverflow.com.
  19. Naming bugs are common. We found 183K+ commits addressing naming issues from GitHub.com by a quick search with simple queries such as “inconsistent, consistency, misleading, …”
  22. Our Goals. Detect inconsistent method names. Repair the names to be consistent.
  35. Idea. Similar implementations would have similar names.
  37. Idea. Similar implementations, but different names.
  39. Approach
  40. How to find similar names/implementations? Sim( , ) = ?
  43. What we need!
  45. Autoencoder. Method → Autoencoder → Method
  52. Program Vectors. A program encoder turns methods into vectors: M1 → <9, 2, 3, …>, M2 → <7, 1, 6, …>, M3 → <2, 8, 3, …>, M4 → <0, 1, 8, …>
  53. Method = Name + Body. Example: getID(…) { for(…) { for(…) { for(…) … The name (N) is compared with similar names; the body (B) is compared with similar bodies.
  54. Method Name Embedding. Method names are tokenized (camel case, underscore) and the tokenized names are embedded as vectors with Sentence2vec (PV-DM). Examples: findField → find, Field; findMatchesHelper → find, Matches, Helper; containsTarget → contains, Target; containsField → contains, Field; findInstruction1 → find, Instruction1.
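
The tokenization step on the name-embedding slide can be sketched as follows. This is a minimal illustration, assuming a plain regular-expression split on underscores and lower-to-upper case boundaries; the function name `tokenize_method_name` is hypothetical and the authors' tool may split names differently. The PV-DM embedding itself is not shown.

```python
import re


def tokenize_method_name(name: str) -> list[str]:
    """Split a method name on underscores and camel-case boundaries.

    Hypothetical helper for illustration only; not taken from the
    authors' implementation.
    """
    tokens = []
    for part in name.split("_"):
        # Put a space wherever a lowercase letter or digit is directly
        # followed by an uppercase letter, then split on whitespace.
        spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", part)
        tokens.extend(spaced.split())
    return tokens


if __name__ == "__main__":
    for example in ["findField", "findMatchesHelper", "findInstruction1"]:
        print(example, "->", tokenize_method_name(example))
```

Trailing digits stay attached to the preceding word, which matches the slide's `findInstruction1 → find, Instruction1` example.
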
  57. Method Body Embedding. Preprocessing: Program Serialization. Method body: return (String[]) list.toArray(new String[0]); Serialized AST: [ReturnStatement, return, ArrayType, String[], Variable, listVar, Method, toArray, ArrayCreation, new, ArrayType, String[], NumberLiteral, “0”]
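
The serialization step can be illustrated with a depth-first walk that emits each node's type followed by its label. This is only a sketch: the `Node` class and the tree shape below are assumptions made so that the walk reproduces the token list on the slide; a real pipeline would obtain the tree from a Java parser.

```python
from typing import NamedTuple


class Node(NamedTuple):
    """Toy AST node (hypothetical; stands in for a real parser's node type)."""
    kind: str
    label: str
    children: tuple = ()


def serialize(node: Node) -> list[str]:
    """Pre-order traversal emitting [kind, label] for every node."""
    tokens = [node.kind, node.label]
    for child in node.children:
        tokens.extend(serialize(child))
    return tokens


# Assumed tree for: return (String[]) list.toArray(new String[0]);
TREE = Node("ReturnStatement", "return", (
    Node("ArrayType", "String[]"),
    Node("Variable", "listVar"),
    Node("Method", "toArray", (
        Node("ArrayCreation", "new", (
            Node("ArrayType", "String[]"),
            Node("NumberLiteral", '"0"'),
        )),
    )),
))

if __name__ == "__main__":
    print(serialize(TREE))
```
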
  62. Method Body Embedding: Vectorization. Serialized AST: [ReturnStatement, return, ArrayType, String[], Variable, listVar, Method, toArray, ArrayCreation, new, ArrayType, String[], NumberLiteral, “0”]. Token embedding (Word2Vec) maps each token to a vector: <2, 6, 9, …> <8, 4, 1, …> <9, 0, 7, …> <2, 3, 0, …> … <7, 1, 2, …> …
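
A rough sketch of turning a serialized token list into a fixed-size numeric matrix follows. Two assumptions are made loudly here: the lookup table is filled with seeded random numbers as a stand-in for trained Word2Vec vectors, and shorter sequences are padded with zero rows up to `max_len`. Both function names are hypothetical.

```python
import random


def build_toy_embeddings(vocab, dim, seed=0):
    """Assign each token a fixed pseudo-random vector.

    Stand-in for a trained Word2Vec model (assumption for illustration).
    """
    rng = random.Random(seed)
    return {tok: [rng.uniform(-1.0, 1.0) for _ in range(dim)]
            for tok in sorted(vocab)}


def vectorize(tokens, embeddings, dim, max_len):
    """Look up each token and pad with zero rows to max_len rows."""
    rows = [list(embeddings[tok]) for tok in tokens[:max_len]]
    rows += [[0.0] * dim for _ in range(max_len - len(rows))]
    return rows


if __name__ == "__main__":
    tokens = ["ReturnStatement", "return", "ArrayType", "String[]"]
    table = build_toy_embeddings(set(tokens), dim=3)
    for row in vectorize(tokens, table, dim=3, max_len=6):
        print(row)
```

Identical tokens always map to the same row, so repeated AST node types produce repeated rows in the matrix.
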
  65. Encoding (CNN-based). [Figure: Fig. 8 from DOI 10.1109/TSE.2018.2884955, IEEE Transactions on Software Engineering: CNN architecture for extracting clustering features. Input layer: the embedded tokens (ReturnStatement, return, ArrayType, String[], Variable, listVar, Method, toArray, ArrayCreation, new, ArrayType, String[], NumberLiteral, “0”) as an n × k two-dimensional numeric vector; the figure also shows rows of zeros in the input. Layers: convolutional layer C1 (4 feature maps), subsampling layer S1 (4 feature maps), convolutional layer C2 (6 feature maps), subsampling layer S2 (6 feature maps), fully connected layers, output layer. The output of the dense layer is the extracted features.]
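
To make the convolution and subsampling stages concrete, here is a deliberately simplified forward pass in plain Python. It covers only one convolutional layer and one subsampling layer, with kernels supplied by the caller rather than learned, and it assumes ReLU activation and max pooling as the subsampling operation; none of these choices is confirmed by the slides, and the dense layer is omitted.

```python
def conv2d(matrix, kernel):
    """Valid 2-D cross-correlation with one kernel, followed by ReLU."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(matrix) - kh + 1):
        row = []
        for j in range(len(matrix[0]) - kw + 1):
            total = sum(matrix[i + a][j + b] * kernel[a][b]
                        for a in range(kh) for b in range(kw))
            row.append(max(0.0, total))
        out.append(row)
    return out


def subsample(fmap, size=2):
    """Non-overlapping max pooling (assumed form of subsampling)."""
    return [[max(fmap[i + a][j + b]
                 for a in range(size) for b in range(size))
             for j in range(0, len(fmap[0]) - size + 1, size)]
            for i in range(0, len(fmap) - size + 1, size)]


def extract_features(matrix, kernels):
    """One feature map per kernel, pooled and flattened into one vector."""
    features = []
    for kernel in kernels:
        pooled = subsample(conv2d(matrix, kernel))
        features.extend(value for row in pooled for value in row)
    return features


if __name__ == "__main__":
    grid = [[float((r * 6 + c) % 5) for c in range(6)] for r in range(6)]
    kernels = [[[1.0, 0.0, -1.0]] * 3 for _ in range(4)]  # 4 maps, as in C1
    print(len(extract_features(grid, kernels)))
```
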
  69. Inconsistency detection. Example: findField { for(…) { … with name (N) and body (B). Adjacent methods are looked up for the name and for the body and then compared (=). True: the method is likely to have a consistent name. False: the method name could be inconsistent with the implementation → suggest a new name.
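
One plausible reading of this check, sketched under stated assumptions: look up the k methods nearest to the target in name-vector space and the k nearest in body-vector space, and call the name consistent when the two neighbor groups share at least one method name. The cosine distance, the set intersection on name strings, and all function names are assumptions; the slides only show that adjacent methods are compared.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity; assumes both vectors are non-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def nearest(index, vectors, k):
    """Indices of the k vectors closest to vectors[index], excluding itself."""
    others = [i for i in range(len(vectors)) if i != index]
    others.sort(key=lambda i: cosine_distance(vectors[index], vectors[i]))
    return others[:k]


def is_consistent(index, names, name_vecs, body_vecs, k):
    """True if name-neighbors and body-neighbors share a method name."""
    by_name = {names[i] for i in nearest(index, name_vecs, k)}
    by_body = {names[i] for i in nearest(index, body_vecs, k)}
    return bool(by_name & by_body)


# Toy data (made up): method 3 has a find-like name but a contains-like body.
NAMES = ["findField", "findItem", "containsField", "findTarget"]
NAME_VECS = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.8, 0.2]]
BODY_VECS = [[1.0, 0.0], [0.9, 0.2], [0.0, 1.0], [0.1, 1.0]]

if __name__ == "__main__":
    for idx, name in enumerate(NAMES):
        print(name, is_consistent(idx, NAMES, NAME_VECS, BODY_VECS, k=1))
```
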
  76. Name Suggestion. Suggestion = sorting the names of similar implementations. Four ranking strategies: R1: sort names by distance, without treating identical names specially. R2: group identical names first, then sort the groups by size. R3: group identical names first, then sort the groups by average distance. R4: same as R3, but penalize groups with size = 1.
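
The four ranking strategies can be sketched over a list of `(name, distance)` pairs taken from the methods with the most similar bodies. Several details are assumptions, since the slide does not specify them: R2 breaks ties between equally sized groups by average distance, and R4 "penalizes" singleton groups by placing them after all larger groups. The function names are hypothetical.

```python
from collections import defaultdict


def group_by_name(candidates):
    """Map each candidate name to the list of its distances."""
    groups = defaultdict(list)
    for name, distance in candidates:
        groups[name].append(distance)
    return groups


def average(values):
    return sum(values) / len(values)


def rank_r1(candidates):
    """R1: sort by distance; identical names are not merged."""
    return [name for name, _ in sorted(candidates, key=lambda pair: pair[1])]


def rank_r2(candidates):
    """R2: larger groups first (ties by average distance: an assumption)."""
    groups = group_by_name(candidates)
    return sorted(groups, key=lambda n: (-len(groups[n]), average(groups[n])))


def rank_r3(candidates):
    """R3: groups with the smallest average distance first."""
    groups = group_by_name(candidates)
    return sorted(groups, key=lambda n: average(groups[n]))


def rank_r4(candidates):
    """R4: like R3, but singleton groups go last (assumed penalty)."""
    groups = group_by_name(candidates)
    return sorted(groups, key=lambda n: (len(groups[n]) == 1, average(groups[n])))


# Made-up neighbor list for illustration.
CANDIDATES = [("getValue", 0.10), ("findField", 0.20), ("findField", 0.30),
              ("lookup", 0.25), ("findField", 0.40), ("lookup", 0.15)]

if __name__ == "__main__":
    for rank in (rank_r1, rank_r2, rank_r3, rank_r4):
        print(rank.__name__, rank(CANDIDATES))
```

On this toy input the strategies disagree about the top suggestion, which shows why the choice of ranking matters for top-1 accuracy.
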
  77. Evaluation
  81. Research Questions. RQ1: Inconsistency Identification. RQ2: Suggestion Precision. RQ3: Comparative Study. RQ4: Live Study. Training/testing data come from open-source projects. The comparative study compares with an approach* based on a convolutional attention network. The live study submits our suggestion results as pull requests to open-source projects. [*] M. Allamanis, H. Peng, and C. Sutton, “A convolutional attention network for extreme summarization of source code,” in Proceedings of the 33rd International Conference on Machine Learning. JMLR.org, 2016, pp. 2091–2100.
  86. Training/Testing Set. Total: 430 projects. Training data: 2,116,413 methods.
  91. Training/Testing Set → Testing: 2,805 methods (name pairs + bodies).
  93. RQ1: Inconsistency Identification
      # of neighbors to look up (k)      1      5     10     30
      Inconsistent (%)   Precision    56.8   53.7   53.3   49.9
                         Recall       84.5   55.9   46.7   28.8
                         F1           67.9   54.8   49.7   36.5
      Consistent (%)     Precision    72.0   55.9   54.2   51.4
                         Recall       38.2   53.7   60.7   72.2
                         F1           49.9   54.8   57.3   60.0
      Accuracy (%)                    60.9   54.8   53.8   50.9
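
As a quick sanity check on the RQ1 numbers, the F1 values for k=1 can be recomputed from the reported precision and recall with the standard harmonic-mean formula. The helper name `f1_score` is ours; the inputs are the rounded percentages from the slide, so small rounding differences are possible for other columns.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # k=1 column of the RQ1 table.
    print(round(f1_score(56.8, 84.5), 1))  # inconsistent class
    print(round(f1_score(72.0, 38.2), 1))  # consistent class
```
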
  95. RQ2: Name Suggestion
      Accuracy (%)                 R1     R2     R3     R4
      First Token   thr=1        23.4   23.2   23.0   24.1
                    thr=5        35.7   39.4   39.4   39.7
      Full Name     thr=1        10.7   11.0   10.9   10.9
                    thr=5        17.0   18.7   19.0   19.2
      (Column group labels on the slide: k=thr, k=10)
  97. RQ3: Comparison (Name Suggestion)
      Accuracy (%)       First Token       Full Name
                         thr=1   thr=5     thr=1   thr=5
      R1                  36.4    47.2      16.5    22.9
      R2                  34.8    50.2      17.0    25.4
      R3                  34.7    50.3      16.9    25.5
      R4                  35.4    50.5      16.0    25.7
      conv_attention      22.3    33.6       0.3     0.6    (state-of-the-art)
      copy_attention      23.5    44.7       0.4     1.1    (state-of-the-art)
  106. RQ4: Live Study (Setup). Training data, 10%. Identify inconsistent names and suggest new names (sampled 100 cases). Create a pull request. Ask a maintainer to refactor the method names.
  111. RQ4: Live Study
       Agreed                        Agreed but not fixed
       Merged  Approved  Improved    Cannot  Won’t          Disagreed  Ignored  Total
       40      26        4           1       2              9          18       100
       Half of them are public methods. Developer feedback includes:
       * It should follow project-specific naming conventions.
       * Some method names should consider class names, e.g., in “XXXBuilder”, many methods cannot be named “build()” even though they return “XXXBuilder” objects.
  112. Summary. Recap of earlier slides: RQ4: Live Study (40 merged, 26 approved, 4 improved, 1 cannot, 2 won’t, 9 disagreed, 18 ignored, 100 total; half of them are public methods); RQ3: Comparison; Encoding (CNN-based; figure is Fig. 8 from DOI 10.1109/TSE.2018.2884955, IEEE Transactions on Software Engineering: CNN architecture for extracting clustering features, where C1 and C2 are the first and second convolutional layers, S1 and S2 are the first and second subsampling layers, and the output of the dense layer is taken as the extracted features); Making the name consistent is NOT easy (https://www.itworld.com/article/2833265/don-t-go-into-programming-if-you-don-t-have-a-good-thesaurus.html).
  113. Tool and Data: https://github.com/SerVal-DTF/debug-method-name
  114. https://www.darkrsw.net (Hire me!) and http://wwwen.uni.lu/snt/research/serval (Hiring), Université du Luxembourg
