Suggesting Descriptive Method Names: An Exploratory
Study of Two Machine Learning Approaches
Oleksandr Zaitsev
Arolla,
Inria, Univ. Lille, CNRS,
Centrale Lille,
UMR 9189 - CRIStAL
Lille, France
oleksandr.zaitsev@inria.fr
13th International Conference on the Quality of Information and Communications Technology
Stéphane Ducasse
Inria, Univ. Lille, CNRS,
Centrale Lille,
UMR 9189 - CRIStAL
Lille, France
stephane.ducasse@inria.fr
Alexandre Bergel
ISCLab,
DCC University of Chile
Santiago, Chile
abergel@dcc.uchile.cl
Mathieu Eveillard
Arolla
Paris, France
mathieu.eveillard@arolla.fr
The ratio of time spent reading versus writing is well over 10 to 1
(Kent Beck & Robert Martin)
Identifier names can take up to 72% of source code
(Deissenboeck & Pizka, 2006)
Software evolves
Good identifier names degrade and bad
ones become even more misleading.
They must be constantly
maintained and updated.
Developers could benefit
from automated tools that
suggest identifier names
and check their validity.
Text Summarisation
Text:
French authorities have placed seven more
departments covering major cities such as Lille,
Strasbourg and Dijon on high alert as increases
in Covid-19 infections accelerate, the
government said Sunday. Of France's 101
mainland and overseas departments, 28 are now
considered "red zones" where authorities will be
able to impose exceptional measures to slow
the number of new coronavirus cases. The move
comes as France reported a record of nearly
9,000 daily cases on Friday, and a further 8,550
cases in the past 24 hours on Saturday, when the
nationwide test positivity rate increased to 4.7
percent…
Title:
France puts more departments
on Covid high alert
* Text taken from the news article https://www.thelocal.fr/20200906/france-puts-more-departments-on-covid-high-alert
Text Summarisation
Extractive: every word of a summary is
extracted from the text of an article
Abstractive: a summary may contain words that
did not appear in the text
Text:
French authorities have placed seven more departments covering major cities such as
Lille, Strasbourg and Dijon on high alert as increases in Covid-19 infections accelerate.
Extractive summary:
French authorities have placed more
departments on high Covid-19 alert
Abstractive summary:
France puts more departments
on high Covid alert
Summarising Source Code
Source code (body of a method):
"Evaluate aBlock with each of the
receiver's elements as the argument.
Return the sum of the answers."
| sum |
sum := 0.
self do: [:each |
    sum := (aBlock value: each) + sum].
^ sum
Method name:
detectSum:
Representing a Method as Text with Summary
Source code:
| sum |
sum := 0.
self do: [:each |
sum := (aBlock value: each) + sum].
^sum
Text:
| sum | sum := <num> . self do :
[ : each | sum := ( a block value :
each ) + sum ] . ^ sum .
Method name:
detectSum:
Short summary:
detect sum
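The conversion from source code to text shown on this slide can be sketched as follows. This is a minimal illustration, not the authors' preprocessing code: the function names `split_camel_case`, `tokenize` and `name_to_words` and the regular expressions are assumptions chosen so that the slide's example is reproduced (numbers become `<num>`, camelCase identifiers are split into lowercase words, punctuation becomes separate tokens).

```python
import re

# Order matters: match the Smalltalk assignment ":=" before a lone ":".
TOKEN_PATTERN = re.compile(r":=|\d+(?:\.\d+)?|[A-Za-z]+|\S")


def split_camel_case(identifier):
    """Split 'aBlock' into ['a', 'block'] (hypothetical helper)."""
    parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", identifier)
    return [p.lower() for p in parts] or [identifier.lower()]


def tokenize(source):
    """Turn a method body into a flat list of text tokens."""
    tokens = []
    for match in TOKEN_PATTERN.finditer(source):
        token = match.group()
        if token[0].isdigit():
            tokens.append("<num>")          # numeric literals are abstracted away
        elif token[0].isalpha():
            tokens.extend(split_camel_case(token))
        else:
            tokens.append(token)            # punctuation and operators
    return tokens


def name_to_words(selector):
    """Turn a method name such as 'detectSum:' into its words."""
    words = []
    for part in re.findall(r"[A-Za-z]+", selector):
        words.extend(split_camel_case(part))
    return words


body = "| sum | sum := 0. self do: [:each | sum := (aBlock value: each) + sum]. ^sum"
print(" ".join(tokenize(body)))
print(" ".join(name_to_words("detectSum:")))
```

With this representation, a method body and its name become a (text, short summary) pair, which is the input format both summarisation models expect.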
Machine Learning Models
for Code Summarisation
Extractive: TF-IDF + n-gram model
Every word in a method name is
extracted from the body of a method
Abstractive: sequence-to-sequence
attention-based neural network
Method names are composed of words
from a general vocabulary (e.g. all words
that were found in other methods)
Term Frequency - Inverse Document Frequency (TF-IDF)
Term Frequency (TF): number of
occurrences of a given word w in a
given document d (in the source code
of a method)
Document Frequency (DF): total
number of occurrences of a given word
w in all documents from a corpus C
(in all methods from a dataset)
Inverse Document Frequency (IDF):
$$\mathrm{IDF}(w, C) = \log \frac{1}{\mathrm{DF}(w, C)}$$
Term Frequency - Inverse
Document Frequency (TF-IDF):
$$\mathrm{TF\text{-}IDF}(w, d, C) = \mathrm{TF}(w, d) \cdot \mathrm{IDF}(w, C)$$
Words with high TF-IDF score appear a lot
in this method but rarely in other methods
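A small sketch of keyword scoring with TF-IDF is given below. It is not the authors' implementation. One deliberate deviation is labelled here and in the code: the sketch assumes the common normalised variant IDF = log(N / DF), where N is the number of methods and DF is the number of methods containing the word, so that scores are non-negative and a word present in every method scores zero. The toy corpus and the names `tf_idf_scores` and `top_keywords` are hypothetical.

```python
import math
from collections import Counter


def tf_idf_scores(corpus, index):
    """Score every word of corpus[index]; corpus is a list of token lists."""
    n_docs = len(corpus)
    doc_freq = Counter()
    for doc in corpus:
        doc_freq.update(set(doc))           # count each word once per method
    term_freq = Counter(corpus[index])
    # ASSUMPTION: normalised IDF log(N / DF) instead of the slide's log(1 / DF).
    return {w: tf * math.log(n_docs / doc_freq[w]) for w, tf in term_freq.items()}


def top_keywords(corpus, index, k):
    """Return the k highest-scoring words, ties broken alphabetically."""
    scores = tf_idf_scores(corpus, index)
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


# Hypothetical toy corpus: three tokenised method bodies.
TOY_CORPUS = [
    ["sum", "sum", "self", "do", "each", "sum", "a", "block", "value", "each", "sum", "sum"],
    ["self", "assert", "self", "new", "node", "is", "comment"],
    ["a", "package", "is", "package", "self", "add", "element", "a", "package", "in", "self", "packages"],
]

print(top_keywords(TOY_CORPUS, 0, 2))
```

In this toy corpus `self` occurs in every method, so its score is zero, while `sum` is frequent in the first method and absent elsewhere, so it ranks first.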
N-gram language model
Statistical model that assigns probabilities to sequences of words and can be used to predict
the next word in a sequence.
Calculates the conditional probability*:
$$P(\text{the} \mid \text{its water is so transparent that}) = \frac{C(\text{its water is so transparent that the})}{C(\text{its water is so transparent that})}$$
* Example taken from Speech and Language Processing by Daniel Jurafsky & James H. Martin
https://web.stanford.edu/~jurafsky/slp3/3.pdf
An n-gram model can be used to select the most
likely order of the words {is, transparent, water}:
P(water is transparent) = 0.45
P(is water transparent) = 0.24
P(transparent water is) = 0
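Choosing a word order with an n-gram model can be sketched with a bigram model estimated from raw counts. This is an illustrative simplification under stated assumptions: the deck does not give the n-gram order or any smoothing, so the sketch uses n = 2 with no smoothing, and the training sentences and function names are hypothetical. The probabilities it produces therefore differ from the numbers on the slide.

```python
from collections import Counter
from itertools import permutations


def train_bigram_counts(sentences):
    """Count contexts and bigrams, with sentence boundary markers."""
    contexts, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + list(sentence) + ["</s>"]
        contexts.update(tokens[:-1])                 # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))
    return contexts, bigrams


def sequence_probability(words, contexts, bigrams):
    """P(words) as a product of P(next | previous) = C(previous next) / C(previous)."""
    tokens = ["<s>"] + list(words) + ["</s>"]
    probability = 1.0
    for previous, current in zip(tokens, tokens[1:]):
        if contexts[previous] == 0:
            return 0.0                               # unseen context, no smoothing
        probability *= bigrams[(previous, current)] / contexts[previous]
    return probability


def best_order(words, contexts, bigrams):
    """Try every permutation of the keywords and keep the most probable one."""
    return max(
        permutations(sorted(words)),
        key=lambda order: sequence_probability(order, contexts, bigrams),
    )


# Hypothetical training data.
SENTENCES = [
    ["water", "is", "transparent"],
    ["water", "is", "clear"],
    ["is", "water", "transparent"],
]
CONTEXTS, BIGRAMS = train_bigram_counts(SENTENCES)
print(best_order({"is", "transparent", "water"}, CONTEXTS, BIGRAMS))
```

Enumerating permutations is only practical because method names are short; with k keywords there are k! candidate orders.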
Extractive Model for Generating Method Names
Step 1: Extract keywords from source code using TF-IDF
Step 2: Choose the best order of words using the n-gram model
Sequence to Sequence Neural Network
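[Figure: the encoder reads the input sentence "the cat is black" and produces an encoded vector; the decoder starts from <s> and generates "le chat est noir" one word at a time, feeding each generated word back in as the next input.]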
Sequence to Sequence Neural Network (for code)
[Figure: the encoder reads the tokenised method body "self assert : <num> + <num> equals : <num> ." and produces an encoded vector; the decoder starts from <s> and generates the words of the method name, "test add".]
Experimental Design
Dataset of methods:
70% Training set
10% Validation set
20% Test set
Evaluation strategy:
Compare generated names to
the real ones
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
$$F_1 = 2 \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
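The metrics above can be sketched for a single pair of names. The deck does not state how TP, FP and FN are counted, so this sketch assumes word-level matching that ignores order: a generated word is a true positive if it also occurs in the real name. The function name `name_scores` is hypothetical.

```python
from collections import Counter


def name_scores(generated, real):
    """Return (precision, recall, f1) for two names given as lists of words."""
    generated_words, real_words = Counter(generated), Counter(real)
    # ASSUMPTION: order-independent word overlap defines the true positives.
    tp = sum((generated_words & real_words).values())
    fp = sum(generated_words.values()) - tp      # generated but not in the real name
    fn = sum(real_words.values()) - tp           # in the real name but not generated
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# "isComment" generated for a method really named "testIsComment".
print(name_scores(["is", "comment"], ["test", "is", "comment"]))
```

Under this assumption a name that omits a word of the real name keeps full precision but loses recall, whereas an exact match scores 1.0 on all three metrics.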
Case Study: Pharo Ecosystem
- In-place method arguments:
bob send: email to: emma instead
of bob.send(email, emma)
- Many stop words
(on, with, and, to, etc.)
- Very short methods (median 3 LoC)
- Dynamically typed
Collecting the Dataset of Methods
Projects: 50
Packages: 824
Classes: 13,935
Methods: 132,046
(projects selected by experts from the Pharo open source community)
After filtering: 92,127 methods (61%)
Training set: 64,488 methods (70%)
Validation set: 9,212 methods (10%)
Test set: 18,425 methods (20%)
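A 70/10/20 split of this kind can be sketched as below. The shuffling, the fixed seed and the function name `split_dataset` are assumptions for illustration. Rounding each part down is also an assumption, although it does reproduce the three set sizes listed above from 92,127 methods, leaving two methods unassigned.

```python
import random


def split_dataset(items, train_pct=70, validation_pct=10, test_pct=20, seed=42):
    """Shuffle items and cut them into training, validation and test sets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)        # fixed seed for reproducibility
    n = len(shuffled)
    # Integer arithmetic rounds each part down; leftovers are dropped.
    n_train = n * train_pct // 100
    n_validation = n * validation_pct // 100
    n_test = n * test_pct // 100
    training = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_validation]
    test = shuffled[n_train + n_validation:n_train + n_validation + n_test]
    return training, validation, test


training, validation, test = split_dataset(range(92127))
print(len(training), len(validation), len(test))
```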
[Figure: score (exact match, F1, precision, recall; axis from 0% to 40%) of the abstractive and extractive models plotted against training iteration (0 to 100,000).]
Examples of Generated Names
Method body:
self assert: self newNode isComment.
Method name: testIsComment
Generated name (extractive): isComment
Generated name (abstractive): testIsComment

Method body:
aVisitor visitDraggableInteraction: self with: args
Method name: acceptWith
Generated name (extractive): visitDraggableInteraction
Generated name (abstractive): accept

Method body:
aPackage isPackage
ifFalse: [ ^ self ].
self addElement: aPackage in: self packages.
Method name: addPackage
Generated name (extractive): package
Generated name (abstractive): addPackage
Results of the Numeric Evaluation
[Figure: bar chart of exact match, precision, recall and F1 (axis from 0% to 40%) for the random extractive, extractive and abstractive models. Labelled values: 46%, 45%, 36%, 11%.]
Threats to validity
- The abstractive model is only as good as the method names on which it was
trained (but we handpicked projects that follow good practices)
- We evaluated our models by comparing generated names to real names, which are not
necessarily good
- The quality of a generated name depends on the quality of the method body. We
cannot propose good names for badly written methods
Future work
- Human evaluation of the generated names
- Other programming languages
- Do not remove comments and literals
- Controlled experiment to observe how suggesting method names can
improve bug fixing and feature incorporation time
- Cross-project and cross-domain training
Conclusion
- We proposed and compared two machine learning models for generating
method names from the source code of a method's body
- The extractive model (based on TF-IDF and n-gram model) achieved the
highest recall of 45%
- The abstractive model (seq2seq neural network) achieved the highest
precision score of 46%
- 11% of method names generated by the abstractive model are exactly the
same as the ones given by the developers
