Suggesting Descriptive Method Names: An Exploratory Study of Two Machine Learning Approaches

Oleksandr Zaitsev
Arolla,
Inria, Univ. Lille, CNRS,
Centrale Lille,
UMR 9189 - CRIStAL
Lille, France
oleksandr.zaitsev@inria.fr
13th International Conference on the Quality of Information and Communications Technology
Stéphane Ducasse
Inria, Univ. Lille, CNRS,
Centrale Lille,
UMR 9189 - CRIStAL
Lille, France
stephane.ducasse@inria.fr
Alexandre Bergel
ISCLab,
DCC University of Chile
Santiago, Chile
abergel@dcc.uchile.cl
Mathieu Eveillard
Arolla
Paris, France
mathieu.eveillard@arolla.fr

The ratio of time spent reading versus writing is well over 10 to 1
(Kent Beck & Robert Martin)
Identifier names can take up to 72% of source code
(Deissenboeck & Pizka, 2006)

Software evolves
Good identifier names degrade and bad ones become even more misleading.
They must be constantly maintained and updated.
Developers could benefit from automated tools that suggest identifier names and check their validity.

Text Summarisation
Text:
French authorities have placed seven more
departments covering major cities such as Lille,
Strasbourg and Dijon on high alert as increases
in Covid-19 infections accelerate, the
government said Sunday. Of France's 101
mainland and overseas departments, 28 are now
considered "red zones" where authorities will be
able to impose exceptional measures to slow
the number of new coronavirus cases. The move
comes as France reported a record of nearly
9,000 daily cases on Friday, and a further 8,550
cases in the past 24 hours on Saturday, when the
nationwide test positivity rate increased to 4.7
percent…
Title:
France puts more departments
on Covid high alert
* Text taken from the news article https://www.thelocal.fr/20200906/france-puts-more-departments-on-covid-high-alert

Text Summarisation
Extractive: every word of a summary is extracted from the text of an article
Abstractive: summary may contain words that did not appear in the text

Text Summarisation
Source sentence:
French authorities have placed seven more departments covering major cities such as
Lille, Strasbourg and Dijon on high alert as increases in Covid-19 infections accelerate.
Extractive (every word of a summary is extracted from the text of an article):
French authorities have placed more departments on high Covid-19 alert
Abstractive (summary may contain words that did not appear in the text):
France puts more departments on high Covid alert

Summarising Source Code
Source code (body of a method):
"Evaluate aBlock with each of the
receiver's elements as the argument.
Return the sum of the answers."
| sum |
sum := 0.
self do: [:each |
    sum := (aBlock value: each) + sum].
^ sum
Method name:
detectSum:

Machine Learning Models for Code Summarisation
Extractive (TF-IDF + n-gram model): every word in a method name is extracted from the body of a method
Abstractive (sequence-to-sequence attention-based neural network): method names are composed of words from a general vocabulary (e.g. all words that were found in other methods)

Term Frequency - Inverse Document Frequency (TF-IDF)
Term Frequency (TF): number of occurrences of a given word w in a given document d (in the source code of the method)
Document Frequency (DF): total number of occurrences of a given word w in all documents from a corpus C (in all methods from a dataset)
Inverse Document Frequency (IDF):
$\mathrm{IDF}(w, C) = \log \frac{1}{\mathrm{DF}(w, C)}$
Term Frequency - Inverse Document Frequency (TF-IDF):
$\text{TF-IDF}(w, d, C) = \mathrm{TF}(w, d) \cdot \mathrm{IDF}(w, C)$
Words with a high TF-IDF score appear a lot in this method but rarely in other methods
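The scoring idea above can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the function name `tf_idf_scores` and the toy corpus are hypothetical, and the sketch assumes the common textbook variant IDF = log(N / DF), where N is the number of methods and DF counts the methods that contain the word. That variant differs from the simplified formula on the slide, and is used here so that scores stay non-negative.

```python
import math
from collections import Counter


def tf_idf_scores(document, corpus):
    """Score every word of `document` (a list of tokens) against `corpus`
    (a list of token lists).

    Assumption: IDF = log(N / DF), with DF counted as the number of
    documents that contain the word (a common textbook variant).
    """
    term_frequency = Counter(document)
    n_documents = len(corpus)
    scores = {}
    for word, count in term_frequency.items():
        document_frequency = sum(1 for doc in corpus if word in doc)
        if document_frequency == 0:
            scores[word] = 0.0  # word never seen in the corpus
        else:
            scores[word] = count * math.log(n_documents / document_frequency)
    return scores


# Hypothetical toy corpus: each inner list stands for the tokens of one method.
corpus = [
    ["sum", "block", "value", "each", "sum"],
    ["block", "value", "each"],
    ["each", "do", "block"],
]
scores = tf_idf_scores(corpus[0], corpus)
top = max(scores, key=scores.get)  # keyword with the highest score
```

In this toy corpus, "sum" occurs twice in the first method and nowhere else, so it receives the highest score, while "block" and "each" occur in every method and score zero.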

N-gram language model
Statistical model that assigns probabilities to sequences of words and can be used to predict the next word in a sequence.
Calculates the conditional probability*:
$P(\text{the} \mid \text{its water is so transparent that}) = \frac{C(\text{its water is so transparent that the})}{C(\text{its water is so transparent that})}$
* Example taken from Speech and Language Processing by Daniel Jurafsky & James H. Martin
https://web.stanford.edu/~jurafsky/slp3/3.pdf
N-gram model can be used to select the most likely order of words, e.g. for {is, transparent, water}:
P(water is transparent) = 0.45
P(is water transparent) = 0.24
P(transparent water is) = 0
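Choosing a word order with an n-gram model can be sketched as follows. This is a minimal, unsmoothed bigram model under stated assumptions: the function names and the three training sentences are hypothetical, and the probabilities it produces are not the ones shown on the slide.

```python
from collections import Counter
from itertools import permutations


def train_bigram_model(sentences):
    """Count bigrams and their left contexts; every sentence starts with <s>."""
    contexts, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + list(sentence)
        contexts.update(tokens[:-1])             # words that have a successor
        bigrams.update(zip(tokens, tokens[1:]))  # (previous, current) pairs
    return contexts, bigrams


def sequence_probability(words, contexts, bigrams):
    """Product of P(current | previous) = C(previous current) / C(previous)."""
    tokens = ["<s>"] + list(words)
    probability = 1.0
    for previous, current in zip(tokens, tokens[1:]):
        if contexts[previous] == 0:
            return 0.0  # unseen context: no smoothing in this sketch
        probability *= bigrams[(previous, current)] / contexts[previous]
    return probability


# Hypothetical training data, only for illustration.
training = [
    ["water", "is", "transparent"],
    ["water", "is", "clear"],
    ["is", "water", "transparent"],
]
contexts, bigrams = train_bigram_model(training)

# Score every ordering of the keyword set and keep the most probable one.
candidates = permutations(sorted({"is", "transparent", "water"}))
best_order = max(
    candidates,
    key=lambda words: sequence_probability(words, contexts, bigrams),
)
```

With this toy data the ordering "water is transparent" wins, and any ordering that starts with "transparent" gets probability zero because that word never starts a training sentence.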

Extractive Model for Generating Method Names
Step 1: Extract keywords from source code using TF-IDF
Step 2: Choose the best order of words using the n-gram model

Sequence to Sequence Neural Network
[Diagram: the encoder reads "the cat is black" into an encoded vector; the decoder starts from <s> and generates "le chat est noir" one word at a time.]
Sequence to Sequence Neural Network (for code)
[Diagram: the encoder reads the tokenised method body "self assert : <num> + <num> equals : <num> ." into an encoded vector; the decoder starts from <s> and generates the method name tokens "test", "add".]
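The token sequences in this diagram suggest a preprocessing step that splits source code into tokens, replaces numeric literals with a `<num>` placeholder, and splits the method name into lowercase words. The sketch below is one way to produce such sequences. The regular expressions, the function names and the sample method body `self assert: 1 + 2 equals: 3.` are assumptions made for illustration, not the authors' actual tokeniser.

```python
import re

# Assumed token grammar: numbers, identifiers, ":=" and single punctuation marks.
TOKEN_PATTERN = re.compile(r"\d+(?:\.\d+)?|[A-Za-z_]\w*|:=|[^\s\w]")
NUMBER_PATTERN = re.compile(r"\d+(?:\.\d+)?")


def tokenize_body(source):
    """Split a method body into tokens, masking numeric literals as <num>."""
    tokens = []
    for token in TOKEN_PATTERN.findall(source):
        tokens.append("<num>" if NUMBER_PATTERN.fullmatch(token) else token)
    return tokens


def split_name(name):
    """Split a camelCase method name into lowercase words."""
    parts = re.findall(r"[A-Z]?[a-z0-9]+|[A-Z]+(?![a-z])", name)
    return [part.lower() for part in parts]


# Hypothetical method body that yields the encoder input shown in the diagram.
body_tokens = tokenize_body("self assert: 1 + 2 equals: 3.")
# Decoder input: start marker followed by the words of the method name.
name_tokens = ["<s>"] + split_name("testAdd")
```

Masking literals keeps the vocabulary small, since the network would otherwise treat every distinct number as a separate word.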

Experimental Design
Dataset of methods: 70% training set, 10% validation set, 20% test set
Evaluation strategy: compare generated names to the real ones
$\text{Precision} = \frac{TP}{TP + FP}$
$\text{Recall} = \frac{TP}{TP + FN}$
$F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$
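A small worked example may help to read these formulas. The sketch below assumes that names are compared word by word after splitting on camelCase, so that TP is the number of words shared by the generated and the real name. That granularity, and the helper names `split_name` and `name_scores`, are assumptions made for illustration. The two names are taken from one of the generated-name examples in this deck.

```python
import re
from collections import Counter


def split_name(name):
    """Split a camelCase method name into lowercase words."""
    parts = re.findall(r"[A-Z]?[a-z0-9]+|[A-Z]+(?![a-z])", name)
    return [part.lower() for part in parts]


def name_scores(generated, real):
    """Word-level precision, recall and F1 of a generated name.

    Assumption: TP = words present in both names, FP = generated words
    missing from the real name, FN = real words missing from the generated name.
    """
    generated_words = Counter(split_name(generated))
    real_words = Counter(split_name(real))
    tp = sum((generated_words & real_words).values())
    fp = sum(generated_words.values()) - tp
    fn = sum(real_words.values()) - tp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Generated name "isComment" against the real name "testIsComment".
precision, recall, f1 = name_scores("isComment", "testIsComment")
```

Here both generated words appear in the real name (precision 1.0), but the word "test" is missing (recall 2/3), which gives an F1 of 0.8.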

Case Study: Pharo Ecosystem
- In-place method arguments:
bob send: email to: emma instead
of bob.send(email, emma)
- Many stop words
(on, with, and, to, etc.)
- Very short methods (median 3 LoC)
- Dynamically typed

Collecting the Dataset of Methods
Projects: 50 (selected by experts from Pharo open source community)
Packages: 824
Classes: 13,935
Methods: 132,046
After filtering: 92,127 methods (61%)
Training set: 64,488 methods (70%)
Validation set: 9,212 methods (10%)
Test set: 18,425 methods (20%)

[Figure: score (0% to 40%) against iteration (0 to 100,000) for four metrics (exact match, F1, precision, recall) and two models (abstractive, extractive).]

Examples of Generated Names
Method body: self assert: self newNode isComment.
Method name: testIsComment
Generated name (extractive): isComment
Generated name (abstractive): testIsComment
Method body: aVisitor visitDraggableInteraction: self with: args
Method name: acceptWith
Generated name (extractive): visitDraggableInteraction
Generated name (abstractive): accept

Method body:
aPackage isPackage
    ifFalse: [ ^ self ].
self addElement: aPackage in: self packages.
Method name: addPackage
Generated name (extractive): package
Generated name (abstractive): addPackage

Results of the Numeric Evaluation
[Bar chart: exact match, precision, recall and F1 (axis from 0% to 40%) for three models: random extractive, extractive and abstractive. Labelled values: 46%, 45%, 36%, 11%.]

Threats to validity
- The abstractive model is only as good as the method names on which it was
trained (but we handpicked the projects that follow good practices)
- We evaluated our models by comparing them to real names, which are not
necessarily good
- The quality of a generated name depends on the quality of the method body. We
cannot propose good names for badly written methods

Future work
- Human evaluation of the generated names
- Other programming languages
- Do not remove comments and literals
- Controlled experiment to observe how suggesting method names can
improve bug fixing and feature incorporation time
- Cross-project and cross-domain training

Conclusion
- We proposed and compared two machine learning models for generating
method names based on the source code of the method's body
- The extractive model (based on TF-IDF and an n-gram model) achieved the
highest recall of 45%
- The abstractive model (seq2seq neural network) achieved the highest
precision score of 46%
- 11% of method names generated by the abstractive model are exactly the
same as the ones given by the developers
