Aspects of software naturalness through
the generation of IdentifierNames
Oleksandr Zaytsev
Ukrainian Catholic University
oleks@ucu.edu.ua
Stéphane Ducasse
Inria Lille - Nord Europe
stephane.ducasse@inria.fr
Alexandre Bergel
DCC, Universidad de Chile
abergel@dcc.uchile.cl
Contents
1. Introduction (problem statement)
2. Data collection
3. What makes code different from English?
4. Translating code to English
5. Evaluation
6. Conclusion
Each dot indicates a chapter
1. Introduction
Developers spend up to 90%* of time reading code
* According to Martin (2009)
Identifier names can take up to 72%* of source code
* According to Deissenboeck and Pizka (2006)
Bad names
A >> b: c
| d |
d := self e f: c; g.
^ self h i: d j
* It actually works
Good names
Collection >> union: aCollection
| set |
set := self asSet addAll: aCollection; yourself.
^ self species withAll: set asArray
* That’s the same method
Problem
1. As software evolves, names become obsolete
2. They need to be maintained and updated
3. This requires complete understanding of the whole
system at all times (impossible for big systems)
➡ We need automated tools to assist humans
Pharo
What?
• Dialect of Smalltalk
• Object-oriented
• Created by Stéphane and Marcus in 2008
Why?
• Outside C-family
• Syntax for kids
• Access to community
Challenge
Suggest descriptive names for Pharo
methods based on their source code
This is hard because...
We name methods based on how they
should be used, not how they are
implemented
Software naturalness
• Programming is a form of communication
• Source code written by people has similar regularities
as texts in natural languages
➡ We can apply NLP to source code
• We take this a bit further:

In certain applications and after good preprocessing we
can treat code as a natural language
Research questions
RQ1. Is source code of Pharo natural enough to allow us
to use its regularities in machine learning models?
RQ2. Does source code of a method contain enough
semantic information to generate a name that
would express the purpose of that method?
Contributions
• Collected datasets of Pharo and Java methods from selected projects
• Compared source code of Pharo and Java to English
• Built a deep learning model that infers the name of a method from its body
To our knowledge, this is …
• First study of software naturalness outside the
C-family of programming languages
• First application of machine learning and natural
language processing to source code of Smalltalk
for building a tool in Pharo IDE
2. Collecting the data
We have collected…
132,046 Pharo methods
136,811 Java methods
…and compared them to natural English
57,340 sentences from Brown corpus
98,552 sentences from Gutenberg corpus
Word count in datasets
[Bar chart: word count (axis from 0 to 7,000,000) for Brown and Gutenberg (natural English) and for Pharo and Java (programming languages)]
Tokenizing source code
text := string asRopedText.
attributes := (anEditorElement editor text attributesAt: aBrText end)
reject: [ :each | each = self ].
text
attributes: attributes;
foreground: BrGlamorousColors defaultButtonTextColor;
bold.
^ (BlTextElement text: text)
background: (BrGlamorousColors errorBackgroundColor alpha: 0.5);
padding: (BlInsets top: 3 left: 0 bottom: 3 right: 0);
yourself .
Tokenizing source code
text := string asRopedText .
attributes := ( anEditorElement editor text attributesAt: aBrText end )
reject: [ :each | each = self ] .
text
attributes: attributes ;
foreground: BrGlamorousColors defaultButtonTextColor ;
bold .
^ ( BlTextElement text: text )
background: ( BrGlamorousColors errorBackgroundColor alpha: 0.5 ) ;
padding: ( BlInsets top: 3 left: 0 bottom: 3 right: 0 ) ;
yourself .
Tokenizing source code
text := string asRopedText .
attributes := ( anEditorElement editor text attributesAt : aBrText end )
reject : [ : each | each = self ] .
text
attributes : attributes ;
foreground : BrGlamorousColors defaultButtonTextColor ;
bold .
^ ( BlTextElement text : text )
background : ( BrGlamorousColors errorBackgroundColor alpha : 0.5 ) ;
padding : ( BlInsets top : 3 left : 0 bottom : 3 right : 0 ) ;
yourself .
Tokenizing source code
text := string as roped text .
attributes := ( an editor element editor text attributes at : a br text end )
reject : [ : each | each = self ] .
text
attributes : attributes ;
foreground : br glamorous colors default button text color ;
bold .
^ ( bl text element text : text )
background : ( br glamorous colors error background color alpha : 0.5 ) ;
padding : ( bl insets top : 3 left : 0 bottom : 3 right : 0 ) ;
yourself .
Tokenizing source code
text := string as roped text .
attributes := ( an editor element editor text attributes at : a br text end )
reject : [ : each | each = self ] .
text
attributes : attributes ;
foreground : br glamorous colors default button text color ;
bold .
^ ( bl text element text : text )
background : ( br glamorous colors error background color alpha : <num> ) ;
padding : ( bl insets top : <num> left : <num> bottom : <num> right : <num> ) ;
yourself .
Tokenizing source code
text := string as roped text . ¶
attributes := ( an editor element editor text attributes at : a br text end ) ¶
reject : [ : each | each = self ] . ¶
¶
text ¶
attributes : attributes ; ¶
foreground : br glamorous colors default button text color ; ¶
bold . ¶
¶
^ ( bl text element text : text ) ¶
background : ( br glamorous colors error background color alpha : <num> ) ; ¶
padding : ( bl insets top : <num> left : <num> bottom : <num> right : <num> ) ; ¶
yourself .
Tokenizing source code
text := string as roped text . attributes := ( an editor element editor text
attributes at : a br text end ) reject : [ : each | each = self ] . text attributes
: attributes ; foreground : br glamorous colors default button text color ; bold .
^ ( bl text element text : text ) background : ( br glamorous colors error
background color alpha : <num> ) ; padding : ( bl insets top : <num> left : <num>
bottom : <num> right : <num> ) ; yourself .
Words: the individual tokens
Sentence: one statement, up to and including its closing period
Paragraph with 4 sentences: the whole method body
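The preprocessing steps shown on the tokenizing slides can be approximated with a few regular expressions. This is a minimal sketch, not the authors' tokenizer: the function names are made up for illustration, numbers are replaced before punctuation is padded (so that the dot in 0.5 is not read as a period), and string literals and comments in Pharo code are not handled.

```python
import re

def tokenize_method_body(source):
    """Turn a Pharo method body into a flat list of lowercase tokens."""
    # Replace numeric literals first, so the dot in 0.5 is not read as a period.
    text = re.sub(r"\b\d+(?:\.\d+)?\b", " <num> ", source)
    # Pad punctuation with spaces; := is matched first so it stays in one piece.
    text = re.sub(r":=|[.;:()\[\]|^]", lambda m: " " + m.group(0) + " ", text)
    # Split camelCase identifiers: asRopedText -> as Roped Text.
    text = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", text)
    # Lowercase and drop line breaks by splitting on any whitespace.
    return text.lower().split()

def split_sentences(tokens):
    """Group tokens into sentences; each sentence ends with a period token."""
    sentences, current = [], []
    for token in tokens:
        current.append(token)
        if token == ".":
            sentences.append(current)
            current = []
    if current:  # keep a trailing statement that has no closing period
        sentences.append(current)
    return sentences

# The method body used on the slides.
SAMPLE_BODY = """
text := string asRopedText.
attributes := (anEditorElement editor text attributesAt: aBrText end)
    reject: [ :each | each = self ].
text
    attributes: attributes;
    foreground: BrGlamorousColors defaultButtonTextColor;
    bold.
^ (BlTextElement text: text)
    background: (BrGlamorousColors errorBackgroundColor alpha: 0.5);
    padding: (BlInsets top: 3 left: 0 bottom: 3 right: 0);
    yourself .
"""

sentences = split_sentences(tokenize_method_body(SAMPLE_BODY))
print(len(sentences))
print(" ".join(sentences[0]))
```

On the sample body this yields one paragraph of four sentences, the first being `text := string as roped text .`, which matches the final slide of the sequence.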
3. What makes code different from English?
Limited vocabulary
6,415 unique words in Pharo code
50,286 unique words in Gutenberg corpus
Specialized vocabulary
Top 5 words from…
Bible: the, and, of, to, that
Reuters: the, of, to, in, and
Pharo: self, a, if, new, assert
Java: get, new, string, if, return
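Both vocabulary measurements, the number of unique words and the most frequent words, can be computed from a token list with a frequency counter. The sketch below is illustrative only: `vocabulary_stats` is a hypothetical helper and the word list is toy data, not one of the corpora from the slides.

```python
from collections import Counter

def vocabulary_stats(words, top=5):
    """Return (number of unique words, list of the `top` most frequent words)."""
    counts = Counter(words)
    most_frequent = [word for word, _ in counts.most_common(top)]
    return len(counts), most_frequent

# Toy token list standing in for a tokenized corpus.
toy_words = "self a if self new self a".split()
unique_count, top_words = vocabulary_stats(toy_words, top=2)
print(unique_count, top_words)
```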
4. Translating source code to English
Method name as short
English sentence
sumOfIntegerNumbers — method name
sum of integer numbers — short English sentence
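Converting between a method name and its short English sentence is a camelCase split in one direction and a join in the other. A minimal sketch, with hypothetical function names; treating the colons of a keyword selector such as `accept:with:` as word breaks is an assumption, and that information is lost on the way back.

```python
import re

def name_to_sentence(selector):
    """sumOfIntegerNumbers -> 'sum of integer numbers'."""
    # Assumption: colons in keyword selectors only separate words.
    spaced = selector.replace(":", " ")
    # Insert a space at every lowercase-to-uppercase boundary.
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", spaced)
    return " ".join(spaced.lower().split())

def sentence_to_name(sentence):
    """'sum of integer numbers' -> sumOfIntegerNumbers (unary selector only)."""
    words = sentence.split()
    if not words:
        return ""
    return words[0] + "".join(word.capitalize() for word in words[1:])

print(name_to_sentence("sumOfIntegerNumbers"))
print(sentence_to_name("sum of integer numbers"))
```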
Model
• Sequence to sequence neural network
• Attention-based decoder
• Teacher forcing
• GRU cells
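Teacher forcing concerns only what the decoder receives as input at each step: during training it is fed the previous word of the real name instead of its own previous prediction. The sketch below isolates that rule. It is not the authors' model: `toy_step` is a hypothetical lookup table standing in for a trained GRU decoder with attention, and it contains one deliberate mistake so the difference is visible.

```python
SOS, EOS = "<sos>", "<eos>"

def run_decoder(step, target_words, teacher_forcing):
    """Decode one name; `step` maps the previous word to the predicted next word."""
    previous, outputs = SOS, []
    for gold in target_words + [EOS]:
        predicted = step(previous)
        outputs.append(predicted)
        # Teacher forcing feeds the real word; free running feeds the prediction.
        previous = gold if teacher_forcing else predicted
    return outputs

# Hypothetical stand-in for a trained decoder: it wrongly predicts "has" after "test".
TOY_TABLE = {
    SOS: "test", "test": "has", "is": "comment",
    "has": "child", "comment": EOS, "child": EOS,
}

def toy_step(previous):
    return TOY_TABLE.get(previous, EOS)

target = ["test", "is", "comment"]
forced = run_decoder(toy_step, target, teacher_forcing=True)
free = run_decoder(toy_step, target, teacher_forcing=False)
print(forced)
print(free)
```

With teacher forcing the single mistake stays local, because the next step still sees the real word "is". Without it, the wrong word is fed back and the rest of the name drifts.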
Translating English to French
Translating method’s body to name
Example of generated
method names
self assert: self newNode isComment.
Real name: test is comment
Generated name: test is comment
Example of generated
method names
aVisitor
visitDraggableInteractreion: self
with: args.
Real name: accept with
Generated name: accept
5. Evaluation
Idea
• Real method name is ground truth.
• Good model will on average generate
names that are close to real names
Random baseline*
Idea: Select 3 random words from vocabulary
of words used in method names
* Lowest baseline
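The random baseline fits in a few lines. A minimal sketch, assuming the three words are drawn without repetition (the slide does not say); the function name and the toy vocabulary are illustrative.

```python
import random

def random_baseline(name_vocabulary, k=3, rng=random):
    """Pick k distinct words at random from the vocabulary of method-name words."""
    # sorted() gives a stable sequence, which random sampling requires.
    return rng.sample(sorted(name_vocabulary), k)

# Toy vocabulary; a seeded generator makes the draw repeatable.
toy_vocabulary = {"test", "is", "comment", "accept", "with"}
guess = random_baseline(toy_vocabulary, rng=random.Random(0))
print(" ".join(guess))
```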
TF-IDF baseline*
Idea: Select 3 words that appear often in
source code of this method and are
very rare in other methods (keywords)
* Simple, fast and surprisingly good
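The TF-IDF baseline scores each word of a method by how often it occurs in that method and how few other methods contain it, then keeps the three best. The sketch below uses the plain weighting tf × log(N / df); the exact variant used in the study is not given on the slide, so this choice, the function name, and the toy corpus are all assumptions.

```python
import math
from collections import Counter

def tfidf_name(method_tokens, corpus, k=3):
    """Return the k words of one method with the highest TF-IDF score."""
    document_sets = [set(document) for document in corpus]
    term_frequency = Counter(method_tokens)

    def score(word):
        # Number of methods containing the word (at least 1 to avoid dividing by zero).
        document_frequency = max(1, sum(word in doc for doc in document_sets))
        return term_frequency[word] * math.log(len(corpus) / document_frequency)

    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(term_frequency, key=lambda word: (-score(word), word))
    return ranked[:k]

# Toy corpus of three tokenized method bodies (punctuation already removed).
toy_corpus = [
    ["self", "assert", "self", "new", "node", "is", "comment"],
    ["self", "assert", "self", "is", "empty"],
    ["self", "new", "node"],
]
keywords = tfidf_name(toy_corpus[0], toy_corpus)
print(keywords)
```

In this toy corpus "self" occurs in every method, so its score is zero and it never appears in the suggested name, while "comment" occurs in only one method and ranks first.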
Evaluation on validation set
Evaluation on test set
[Bar chart: exact match score (axis from 0% to 14%) for Random model, TF-IDF, and Our model; the only value label shown is 13.82%]
[Bar chart: precision score (axis from 0% to 60%) for Random model, TF-IDF, and Our model]
[Bar chart: recall score (axis from 0% to 40%) for Random model, TF-IDF, and Our model]
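The slides do not define the three scores, so the sketch below assumes a common word-level reading: exact match means the generated name equals the real name word for word, precision is the share of generated words that occur in the real name, and recall is the share of real-name words that were generated. Per-method values would then be averaged over the test set. The function names are illustrative.

```python
def exact_match(real, generated):
    """True when both names consist of the same words in the same order."""
    return real.split() == generated.split()

def precision_recall(real, generated):
    """Word-level precision and recall of a generated name against the real one."""
    real_words = set(real.split())
    generated_words = set(generated.split())
    common = real_words & generated_words
    precision = len(common) / len(generated_words) if generated_words else 0.0
    recall = len(common) / len(real_words) if real_words else 0.0
    return precision, recall

# The two examples of generated method names from the slides.
print(exact_match("test is comment", "test is comment"))
print(precision_recall("accept with", "accept"))
```

Under these assumptions, the second example ("accept with" versus "accept") is not an exact match but has full precision and half recall.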
6. Conclusion
RQ1. Can we apply NLP and ML to source code of Pharo?
YES
• We can build tools for Pharo IDE using the same machine learning models that are applied to natural texts
• Similarly to natural languages, code has regularities that can be exploited by machine learning models
RQ2. Can we infer method names only from source code?
YES
• With no additional features we were able to generate descriptive method names based on source code
• We explain this by semantic information carried by identifier names inside code
Questions?
Candidate young stellar objects in the S-cluster: Kinematic analysis of a sub...Candidate young stellar objects in the S-cluster: Kinematic analysis of a sub...
Candidate young stellar objects in the S-cluster: Kinematic analysis of a sub...
 
The binding of cosmological structures by massless topological defects
The binding of cosmological structures by massless topological defectsThe binding of cosmological structures by massless topological defects
The binding of cosmological structures by massless topological defects
 
The cost of acquiring information by natural selection
The cost of acquiring information by natural selectionThe cost of acquiring information by natural selection
The cost of acquiring information by natural selection
 
Farming systems analysis: what have we learnt?.pptx
Farming systems analysis: what have we learnt?.pptxFarming systems analysis: what have we learnt?.pptx
Farming systems analysis: what have we learnt?.pptx
 
GBSN - Biochemistry (Unit 6) Chemistry of Proteins
GBSN - Biochemistry (Unit 6) Chemistry of ProteinsGBSN - Biochemistry (Unit 6) Chemistry of Proteins
GBSN - Biochemistry (Unit 6) Chemistry of Proteins
 
在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样
在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样
在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样
 
Discovery of An Apparent Red, High-Velocity Type Ia Supernova at 𝐳 = 2.9 wi...
Discovery of An Apparent Red, High-Velocity Type Ia Supernova at  𝐳 = 2.9  wi...Discovery of An Apparent Red, High-Velocity Type Ia Supernova at  𝐳 = 2.9  wi...
Discovery of An Apparent Red, High-Velocity Type Ia Supernova at 𝐳 = 2.9 wi...
 
(June 12, 2024) Webinar: Development of PET theranostics targeting the molecu...
(June 12, 2024) Webinar: Development of PET theranostics targeting the molecu...(June 12, 2024) Webinar: Development of PET theranostics targeting the molecu...
(June 12, 2024) Webinar: Development of PET theranostics targeting the molecu...
 
The debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically youngThe debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically young
 
Mending Clothing to Support Sustainable Fashion_CIMaR 2024.pdf
Mending Clothing to Support Sustainable Fashion_CIMaR 2024.pdfMending Clothing to Support Sustainable Fashion_CIMaR 2024.pdf
Mending Clothing to Support Sustainable Fashion_CIMaR 2024.pdf
 
LEARNING TO LIVE WITH LAWS OF MOTION .pptx
LEARNING TO LIVE WITH LAWS OF MOTION .pptxLEARNING TO LIVE WITH LAWS OF MOTION .pptx
LEARNING TO LIVE WITH LAWS OF MOTION .pptx
 
Compexometric titration/Chelatorphy titration/chelating titration
Compexometric titration/Chelatorphy titration/chelating titrationCompexometric titration/Chelatorphy titration/chelating titration
Compexometric titration/Chelatorphy titration/chelating titration
 
CLASS 12th CHEMISTRY SOLID STATE ppt (Animated)
CLASS 12th CHEMISTRY SOLID STATE ppt (Animated)CLASS 12th CHEMISTRY SOLID STATE ppt (Animated)
CLASS 12th CHEMISTRY SOLID STATE ppt (Animated)
 
Summary Of transcription and Translation.pdf
Summary Of transcription and Translation.pdfSummary Of transcription and Translation.pdf
Summary Of transcription and Translation.pdf
 
Randomised Optimisation Algorithms in DAPHNE
Randomised Optimisation Algorithms in DAPHNERandomised Optimisation Algorithms in DAPHNE
Randomised Optimisation Algorithms in DAPHNE
 
Methods of grain storage Structures in India.pdf
Methods of grain storage Structures in India.pdfMethods of grain storage Structures in India.pdf
Methods of grain storage Structures in India.pdf
 

Aspects of software naturalness through the generation of IdentifierNames

  • 1. Aspects of software naturalness through the generation of IdentifierNames. Oleksandr Zaytsev (Ukrainian Catholic University, oleks@ucu.edu.ua); Stéphane Ducasse (Inria Lille - Nord Europe, stephane.ducasse@inria.fr); Alexandre Bergel (DCC, Universidad de Chile, abergel@dcc.uchile.cl)
  • 2. Contents: 1. Introduction (problem statement); 2. Data collection; 3. What makes code different from English?; 4. Translating code to English; 5. Evaluation; 6. Conclusion. (Each dot indicates a chapter.)
  • 4. Developers spend up to 90% of time reading code.* (* According to Martin (2009))
  • 5. Identifier names can take up to 72% of source code.* (* According to Deissenboeck and Pizka (2006))
  • 6. Bad names: A >> b: c | d | d := self e f: c; g. ^ self h i: d j (* It actually works.)
  • 7. Good names: Collection >> union: aCollection | set | set := self asSet addAll: aCollection; yourself. ^ self species withAll: set asArray (* That’s the same method.)
  • 8. Problem: 1. As software evolves, names become obsolete. 2. They need to be maintained and updated. 3. This requires complete understanding of the whole system at all times (impossible for big systems). ➡ We need automated tools to assist humans.
  • 9. Pharo. What? Dialect of Smalltalk; object-oriented; created by Stéphane and Marcus in 2008. Why? Outside the C-family; syntax for kids; access to community.
  • 10. Challenge: Suggest descriptive names for Pharo methods based on their source code.
  • 11. This is hard because... We name methods based on how they should be used, not how they are implemented.
  • 12. Software naturalness: Programming is a form of communication. Source code written by people has regularities similar to those of texts in natural languages. ➡ We can apply NLP to source code. We take this a bit further: in certain applications, and after good preprocessing, we can treat code as a natural language.
  • 13. Research questions. RQ1. Is the source code of Pharo natural enough to allow us to use its regularities in machine learning models? RQ2. Does the source code of a method contain enough semantic information to generate a name that would express the purpose of that method?
  • 14. Contributions: Collected datasets of Pharo and Java methods from selected projects. Compared the source code of Pharo and Java to English. Built a deep learning model that infers the name of a method from its body.
  • 15. To our knowledge, this is … the first study of software naturalness outside the C-family of programming languages, and the first application of machine learning and natural language processing to Smalltalk source code for building a tool in the Pharo IDE.
  • 16. 2. Collecting the data
  • 17. We have collected… 132,046 Pharo methods and 136,811 Java methods.
  • 18. …and compared them to natural English: 57,340 sentences from the Brown corpus and 98,552 sentences from the Gutenberg corpus.
  • 19. [Bar chart: Word count in datasets. Bars: Brown and Gutenberg (natural English), Pharo and Java (programming languages). Axis from 0 to 7,000,000 words.]
  • 20-21. Tokenizing source code: text := string asRopedText. attributes := (anEditorElement editor text attributesAt: aBrText end) reject: [ :each | each = self ]. text attributes: attributes; foreground: BrGlamorousColors defaultButtonTextColor; bold. ^ (BlTextElement text: text) background: (BrGlamorousColors errorBackgroundColor alpha: 0.5); padding: (BlInsets top: 3 left: 0 bottom: 3 right: 0); yourself .
  • 22-23. Tokenizing source code: text := string asRopedText . attributes := ( anEditorElement editor text attributesAt: aBrText end ) reject: [ :each | each = self ] . text attributes: attributes ; foreground: BrGlamorousColors defaultButtonTextColor ; bold . ^ ( BlTextElement text: text ) background: ( BrGlamorousColors errorBackgroundColor alpha: 0.5 ) ; padding: ( BlInsets top: 3 left: 0 bottom: 3 right: 0 ) ; yourself .
  • 24-25. Tokenizing source code: text := string asRopedText . attributes := ( anEditorElement editor text attributesAt : aBrText end ) reject : [ : each | each = self ] . text attributes : attributes ; foreground : BrGlamorousColors defaultButtonTextColor ; bold . ^ ( BlTextElement text : text ) background : ( BrGlamorousColors errorBackgroundColor alpha : 0.5 ) ; padding : ( BlInsets top : 3 left : 0 bottom : 3 right : 0 ) ; yourself .
  • 26-27. Tokenizing source code: text := string as roped text . attributes := ( an editor element editor text attributes at : a br text end ) reject : [ : each | each = self ] . text attributes : attributes ; foreground : br glamorous colors default button text color ; bold . ^ ( bl text element text : text ) background : ( br glamorous colors error background color alpha : 0.5 ) ; padding : ( bl insets top : 3 left : 0 bottom : 3 right : 0 ) ; yourself .
  • 28. Tokenizing source code: text := string as roped text . attributes := ( an editor element editor text attributes at : a br text end ) reject : [ : each | each = self ] . text attributes : attributes ; foreground : br glamorous colors default button text color ; bold . ^ ( bl text element text : text ) background : ( br glamorous colors error background color alpha : <num> ) ; padding : ( bl insets top : <num> left : <num> bottom : <num> right : <num> ) ; yourself .
  • 29. Tokenizing source code (line breaks marked with ¶): text := string as roped text . ¶ attributes := ( an editor element editor text attributes at : a br text end ) ¶ reject : [ : each | each = self ] . ¶ ¶ text ¶ attributes : attributes ; ¶ foreground : br glamorous colors default button text color ; ¶ bold . ¶ ¶ ^ ( bl text element text : text ) ¶ background : ( br glamorous colors error background color alpha : <num> ) ; ¶ padding : ( bl insets top : <num> left : <num> bottom : <num> right : <num> ) ; ¶ yourself .
  • 30-31. Tokenizing source code: text := string as roped text . attributes := ( an editor element editor text attributes at : a br text end ) reject : [ : each | each = self ] . text attributes : attributes ; foreground : br glamorous colors default button text color ; bold . ^ ( bl text element text : text ) background : ( br glamorous colors error background color alpha : <num> ) ; padding : ( bl insets top : <num> left : <num> bottom : <num> right : <num> ) ; yourself . Labels on the slide: Words; Sentence; Paragraph with 4 sentences.
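The tokenization steps shown on slides 20 to 31 can be approximated with a short Python sketch. This is not the authors' implementation: the function names (`split_identifier`, `tokenize`, `name_to_sentence`) and the regular expressions are assumptions chosen to reproduce the slide output, and Smalltalk strings, comments, and symbols are not handled.

```python
import re

# Lexer for the simplified view on the slides (an assumption, not the authors' code):
# numbers, the := operator, identifiers, and any other single symbol.
TOKEN_RE = re.compile(r"""
    \d+(?:\.\d+)?              # numeric literal such as 3 or 0.5
  | :=                         # assignment operator, kept as one token
  | [A-Za-z_][A-Za-z0-9_]*     # identifier or keyword part
  | \S                         # any other single non-space character
""", re.VERBOSE)

# Splits camelCase identifiers into their component words.
CAMEL_RE = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+")


def split_identifier(identifier):
    """asRopedText -> ['as', 'roped', 'text']"""
    return [word.lower() for word in CAMEL_RE.findall(identifier)]


def tokenize(source):
    """Turn method source into lowercase words, symbols, and <num> placeholders."""
    tokens = []
    for raw in TOKEN_RE.findall(source):
        if raw[0].isdigit():
            tokens.append("<num>")                 # slide 28: numbers become <num>
        elif raw[0].isalpha() or raw[0] == "_":
            tokens.extend(split_identifier(raw))   # slide 26: split identifiers
        else:
            tokens.append(raw)                     # slides 22-24: separate symbols
    return tokens


def name_to_sentence(method_name):
    """Treat a method name as a short English sentence."""
    return " ".join(split_identifier(method_name))


if __name__ == "__main__":
    print(" ".join(tokenize("text := string asRopedText.")))
    print(name_to_sentence("sumOfIntegerNumbers"))
```

The same identifier-splitting helper serves both sides of the translation task: the method body becomes a sequence of words and symbols, and the method name becomes the target sentence.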
  • 32. 3. What makes code different from English?
  • 33. Limited vocabulary: 6,415 unique words in Pharo code; 50,286 unique words in the Gutenberg corpus.
  • 34. Specialized vocabulary. Top 5 words from…
 Bible: the, and, of, to, that
 Reuters: the, of, to, in, and
 Pharo: self, a, if, new, assert
 Java: get, new, string, if, return
  • 35. 4. Translating source code to English
  • 36. Method name as short English sentence: sumOfIntegerNumbers (method name) → sum of integer numbers (short English sentence)
  • 37. Model: sequence-to-sequence neural network; attention-based decoder; teacher forcing; GRU cells.
  • 40. Example of generated method names: self assert: self newNode isComment. Real name: test is comment. Generated name: test is comment.
  • 41. Example of generated method names: aVisitor visitDraggableInteractreion: self with: args. Real name: accept with. Generated name: accept.
  • 43. Idea: The real method name is the ground truth. A good model will, on average, generate names that are close to the real names.
  • 44. Random baseline*. Idea: Select 3 random words from the vocabulary of words used in method names. (* Lowest baseline)
  • 45. TF-IDF baseline*. Idea: Select 3 words that appear often in the source code of this method and are very rare in other methods (keywords). (* Simple, fast and surprisingly good)
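The two baselines on slides 44 and 45 can be sketched as follows. The slides do not give the exact weighting, so the TF-IDF formula here (term frequency times log of inverse document frequency), the alphabetical tie-breaking, and all function names are assumptions for illustration, not the authors' code. Methods are represented as token lists, and the method being named is assumed to be part of the corpus.

```python
import math
import random
from collections import Counter


def random_baseline(name_vocabulary, k=3, rng=None):
    """Pick k distinct random words from the vocabulary of method-name words."""
    rng = rng or random.Random()
    return rng.sample(sorted(name_vocabulary), k)


def document_frequencies(corpus):
    """Count in how many methods each token appears."""
    df = Counter()
    for tokens in corpus:
        df.update(set(tokens))
    return df


def tfidf_baseline(tokens, df, n_docs, k=3):
    """Pick the k words that are frequent in this method and rare in others."""
    words = [t for t in tokens if t.isalpha()]   # ignore symbols and <num>
    if not words:
        return []
    tf = Counter(words)
    scores = {
        # Assumed weighting: relative term frequency * log(N / document frequency).
        w: (count / len(words)) * math.log(n_docs / max(df[w], 1))
        for w, count in tf.items()
    }
    # Highest score first; ties are broken alphabetically to keep results stable.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:k]


if __name__ == "__main__":
    corpus = [
        ["self", "assert", ":", "self", "new", "node", "is", "comment", "."],
        ["self", "new", "node", "."],
        ["self", "assert", ":", "value", "."],
    ]
    df = document_frequencies(corpus)
    print(tfidf_baseline(corpus[0], df, len(corpus)))
    print(random_baseline({"test", "is", "comment", "accept", "with"}))
```

In this toy corpus, `self` occurs in every method and therefore scores zero, while `is` and `comment` occur only in the first method and rank highest.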
  • 47. Evaluation on test set [bar chart: exact match score for Random model, TF-IDF, and Our model; axis from 0% to 14%; value label shown: 13.82%]
  • 48. Evaluation on test set [bar chart: precision score for Random model, TF-IDF, and Our model; axis from 0% to 60%]
  • 49. Evaluation on test set [bar chart: recall score for Random model, TF-IDF, and Our model; axis from 0% to 40%]
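The three scores reported on slides 47 to 49 can be illustrated with a small sketch. The slides do not define how precision and recall are computed, so the word-overlap definitions below are an assumption, as are the function names; names are represented as lists of words.

```python
from collections import Counter


def exact_match(real, generated):
    """True when the generated name equals the real name word for word."""
    return real == generated


def precision_recall(real, generated):
    """Word-level overlap between the generated and the real name (assumed definition)."""
    overlap = sum((Counter(real) & Counter(generated)).values())
    precision = overlap / len(generated) if generated else 0.0
    recall = overlap / len(real) if real else 0.0
    return precision, recall


def evaluate(pairs):
    """Average the three scores over (real, generated) pairs; pairs must be non-empty."""
    n = len(pairs)
    exact = sum(exact_match(r, g) for r, g in pairs) / n
    precision = sum(precision_recall(r, g)[0] for r, g in pairs) / n
    recall = sum(precision_recall(r, g)[1] for r, g in pairs) / n
    return {"exact_match": exact, "precision": precision, "recall": recall}


if __name__ == "__main__":
    # The two examples from slides 40 and 41.
    pairs = [
        ("test is comment".split(), "test is comment".split()),
        ("accept with".split(), "accept".split()),
    ]
    print(evaluate(pairs))
```

Under these definitions, the generated name `accept` for the real name `accept with` has full precision (every generated word is correct) but only half recall (one of the two real words is missing), and it does not count as an exact match.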
  • 51. RQ1. Can we apply NLP and ML to source code of Pharo? YES. We can build tools for the Pharo IDE using the same machine learning models that are applied to natural texts. Similarly to natural languages, code has regularities that can be exploited by machine learning models.
  • 52. RQ2. Can we infer method names only from source code? YES. With no additional features we were able to generate descriptive method names based on source code. We explain this by the semantic information carried by identifier names inside the code.