Speculative Analysis for Comment
Quality Assessment
Pooja Rani

Advisor: Prof. Oscar Nierstrasz
Software Composition Group

University of Bern, Switzerland
Trustworthy form of documentation
/*
*

*A class representing a window on the screen
.

*For example
:

*<pre
>

*Window win = new Window(parent)
;

*win.show()
;

*</pre
>

*
 

*@author Sami Shai
o

*@version 1.13, 06/08/0
6

*@see java.awt.BaseWindo
w

*/
 

class Window extends BaseWindow
{

.
.

}
Motivation
2
High-quality code comments assist developers
3
is this a high-
quality comment?
is this a low-
quality comment?
/*
*

*A class representing a window on the screen
.

*For example
:

*<pre
>

*Window win = new Window(parent)
;

*win.show()
;

*</pre
>

*
 

*@author Sami Shai
o

*@version 1.13, 06/08/0
6

*@see java.awt.BaseWindo
w

*/
 

class Window extends BaseWindow
{

.
.

}
Problem
4
No standard definition of comment quality


No strict syntax and structure conventions


The compiler does not check comments


Lack of quality assessment tools


Makes quality assessment a non-trivial problem
Challenges
5
State of the art
6
7
Documentation (comment) generation
Documentation (comment) assessment
8
Not all of them focused mainly on comments or all aspects
of comments
Documentation (comment) assessment
9
Given the increasing use of multi-language environment,
we need a deeper understanding of developer
commenting practices and concerns to achieve high-
quality comments
10
Thesis statement
“Understanding the speci
fi
cation of high-quality comments to
build e
ff
ective assessment tools requires a multi-perspective
view of the comments”.
11
“Understanding the speci
fi
cation of high-quality comments to
build e
ff
ective assessment tools requires a multi-perspective
view of the comments. The view can be approached by”
RQ1: What do developers write in comments across languages?
Thesis statement
12
“Understanding the speci
fi
cation of high-quality comments to
build e
ff
ective assessment tools requires a multi-perspective
view of the comments. The view can be approached by”
RQ1: What do developers write in comments across languages?
RQ2: What do developers ask about code commenting practices?
Thesis statement
13
RQ3: What quality attributes are often considered in assessing comment quality?
“Understanding the speci
fi
cation of high-quality comments to
build e
ff
ective assessment tools requires a multi-perspective
view of the comments. The view can be approached by”
RQ1: What do developers write in comments across languages?
RQ2: What do developers ask about code commenting practices?
Thesis statement
14
Java, Python, Smalltalk,
 

20 projects
Classify sample comments
,

1, 066 comments
Output: a taxonomy, and a
classifier
Extract class comments,
 

37, 446 comments
RQ1: What do developers write in comments across languages?
/*
*

*A class representing a window on the screen
.

*For example
:

*<pre
>

*Window win = new Window(parent)
;

*win.show()
;

*</pre
>

*
 

*@author Sami Shai
o

*@version 1.13, 06/08/0
6

*@see java.awt.BaseWindo
w

*/
 

class Window extends BaseWindow
{

.
.

}
15
Summary
Usage
Ownership
Pointer
RQ1: What do developers write in comments across languages?
16
Java Smalltalk Python
CCTM
RQ1: What do developers write in comments across languages?
17
Recurrent natural language patterns in
writing a specific type of information
/*
*

*A class representing a window on the screen
.

*For example
:

*<pre
>

*Window win = new Window(parent)
;

*win.show()
;

*</pre
>

*
 

*@author Sami Shai
o

*@version 1.13, 06/08/0
6

*@see java.awt.BaseWindo
w

*/
 

class Window extends BaseWindow
{

.
.

}
Pointer
Summary
Automatically identify an information type
18
Represents [something]
[verb]s [noun]
Sees [something]
Pointer
/*
*

*A class representing a window on the screen
.

*For example
:

*<pre
>

*Window win = new Window(parent)
;

*win.show()
;

*</pre
>

*
 

*@author Sami Shai
o

*@version 1.13, 06/08/0
6

*@see java.awt.BaseWindo
w

*/
 

class Window extends BaseWindow
{

.
.

}
Summary
Automatically identify an information type
Recurrent natural language patterns in
writing a specific type of information
19
Techniques
Textual Analysis
Features
1) 2) 3)
Learning Phase Evaluation
4)
TEXT FEATURES
NLP RULES
J48
Naive Bayes
Random Forest,
Natural Language
Processing
Automated Classification of Comments Types
(RQ2)
Projects CCTM
Taxonomy Definition
(RQ1)
CCTM
Used recurrent patterns as
feature set + text features
Ground truth: manually
analyzed comments
Automatically identify an information type
20
0
0.2
0.4
0.6
0.8
1
S
u
m
m
a
r
y
E
x
p
a
n
d
O
w
n
e
r
s
h
i
p
P
o
i
n
t
e
r
U
s
a
g
e
D
e
p
r
e
c
a
t
i
o
n
R
a
t
i
o
n
a
l
e
Java Top Categories
NaiveBayes J48 RandomForest
S
u
m
m
a
r
y
U
s
a
g
e
E
x
p
a
n
d
D
e
v
e
l
o
p
m
e
n
t
N
o
t
e
s
P
a
r
a
m
e
t
e
r
s
Python Top Categories
NaiveBayes J48 RandomForest
R
e
s
p
o
n
s
i
b
i
l
i
t
y
I
n
t
e
n
t
C
o
l
l
a
b
o
r
a
t
o
r
s
E
x
a
m
p
l
e
s
C
l
a
s
s
R
e
f
e
r
e
n
c
e
s
K
e
y
M
e
s
s
a
g
e
I
m
p
l
e
m
e
n
t
a
t
i
o
n
P
o
i
n
t
Pharo Top Categories
NaiveBayes J48 RandomForest
Random Forest technique classifies comments better
Java Smalltalk
Python
Automatically identify an information type
21
We are currently testing our classification approach using the deep
learning technique (FastText)
Developers embed at least 10 types of information in
comments across languages

Using recurrent natural language patterns as features
improves the classification

Takeaway
22
Timeline & Contribution
Accepte
d

Empirical Software
Engineering (EMSE)
,

2021
Under minor revisio
n

Journal of Systems and Software
(JSS), 2021
How to Identify Class Comment
Types? A Multi-language Approach for
Class Comments Classi
fi
cation
An investigation of comments in Java,
Python, Smalltalk, and develop an
approach to automatically identify
information types across languages
What do class comments tell us? An
investigation of comment evolution
and practices in Pharo Smalltalk
An investigation of comment evolution,
content and their adherence to style
guideline
In the stage of acceptance
23
RQ2: What do developers ask about comments?
24
Mailing lists, Stack
Overflow, Quora
Classify sample discussion
,

1, 400 posts
Output: a taxonomy,
guidelines, tool
Extract discussion,
 

23, 631 posts
RQ2: What do developers ask about comments?
25
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
Error
Opinion
Implementation Problems
Limitation and Possibilities
Background Information
Best Practice
Implementation Strategy
Information Types
Question
Types
Commenting high levels Programming languages Tools IDEs & editors Other
RQ2: What do developers ask about comments?
Developers seek commenting conventions to write

quality comments but often find it hard to locate them
26
To what extent developers follow comment conventions in writing
comments?
Takeaway
27
Timeline & Contribution
What do Developers Discuss about
Comment Conventions?
An investigation of developer concerns
related to comments on various online
platforms
Makar: A Framework for Multi-source
Studies based on Unstructured Data
A tool to conduct study on multiple
sources such as emails, stack over
fl
ow,
GitHub
In preparation
 

Source Code Analysis and
Manipulation (SCAM),
 

2021
Accepte
d

Software Analysis, Evolution and
Reengineering (SANER), 2021
In the stage of acceptance
In the stage of submission
28
RQ3: What quality attributes are often considered in assessing
comment quality?
29
SE conferences & journal
s

195 venues
Identify candidate paper
s

2, 353 papers
Output: quality attributes,
metrics
Extract proceedings,
 

332 proceedings
Select relevant papers,
 

47 papers
RQ3: What quality attributes are often considered in assessing
comment quality?
30
RQ3: What quality attributes are often considered in assessing
comment quality?
31
RQ3: What quality attributes are often considered in assessing
comment quality?
32
RQ3: What quality attributes are often considered in assessing
comment quality?
Researchers are focusing often on handful of quality
attributes in assessing comment quality
33
Numerous studies suggest the benefits of usability, or accessibility of
comments. We need to focus on assessing comments w.r.t them.
Takeaway
34
Results are submitted to a
software engineering venue
Timeline & Contribution
Do Developers Follow Comment
Conventions?
An investigation of comment
adherence to comment conventions
A SLR about comments
An investigation of quality
attributes, and techniques used
for comment assessment
In preparation
In the stage of submission
35
Speculative Analysis of Code Comments
What information do developers
write in the comments?
- Identify information types from
sample comments
- Propose a ML pipeline to automate
Class Comment
Type Model (CCTM)
Machine learning
What quality attributes are
considered in assessing
comments ?
Conduct a systematic literature
review on code comments quality
Metrics
Quality
attributes
Relevant
Literature
Implications for researchers and developers to improve comment quality
What do developers ask about
commenting practices?
- Analyze sample posts
- Prepare a taxonomy
Challenges
Relevant topics Tool
36
The ultimate goal of automatically assessing
comments is still far away…
37
Which quality attributes do developers find important?


Which information types do developers find important?


How do various information types support developers?


An IDE plugin to support automatic assessment of
comments


Future work
38
Developer seek and follow comment conventions.

Having support to assess various aspects of comments
can help maintaining high-quality comments
Takeaway
https://twitter.com/poojaruhal http://scg.unibe.ch/staff/Pooja-Rani

Speculative analysis for comment quality assessment

  • 1.
    Speculative Analysis forComment Quality Assessment Pooja Rani Advisor: Prof. Oscar Nierstrasz Software Composition Group University of Bern, Switzerland
  • 2.
    Trustworthy form ofdocumentation /* * *A class representing a window on the screen . *For example : *<pre > *Window win = new Window(parent) ; *win.show() ; *</pre > * *@author Sami Shai o *@version 1.13, 06/08/0 6 *@see java.awt.BaseWindo w */ class Window extends BaseWindow { . . } Motivation 2 High-quality code comments assist developers
  • 3.
    3 is this ahigh- quality comment? is this a low- quality comment? /* * *A class representing a window on the screen . *For example : *<pre > *Window win = new Window(parent) ; *win.show() ; *</pre > * *@author Sami Shai o *@version 1.13, 06/08/0 6 *@see java.awt.BaseWindo w */ class Window extends BaseWindow { . . } Problem
  • 4.
    4 No standard definitionof comment quality No strict syntax and structure conventions The compiler does not check comments Lack of quality assessment tools Makes quality assessment a non-trivial problem Challenges
  • 5.
  • 6.
  • 7.
  • 8.
    8 Not all ofthem focused mainly on comments or all aspects of comments Documentation (comment) assessment
  • 9.
    9 Given the increasinguse of multi-language environment, we need a deeper understanding of developer commenting practices and concerns to achieve high- quality comments
  • 10.
    10 Thesis statement “Understanding thespeci fi cation of high-quality comments to build e ff ective assessment tools requires a multi-perspective view of the comments”.
  • 11.
    11 “Understanding the speci fi cationof high-quality comments to build e ff ective assessment tools requires a multi-perspective view of the comments. The view can be approached by” RQ1: What do developers write in comments across languages? Thesis statement
  • 12.
    12 “Understanding the speci fi cationof high-quality comments to build e ff ective assessment tools requires a multi-perspective view of the comments. The view can be approached by” RQ1: What do developers write in comments across languages? RQ2: What do developers ask about code commenting practices? Thesis statement
  • 13.
    13 RQ3: What qualityattributes are often considered in assessing comment quality? “Understanding the speci fi cation of high-quality comments to build e ff ective assessment tools requires a multi-perspective view of the comments. The view can be approached by” RQ1: What do developers write in comments across languages? RQ2: What do developers ask about code commenting practices? Thesis statement
  • 14.
    14 Java, Python, Smalltalk, 20 projects Classify sample comments , 1, 066 comments Output: a taxonomy, and a classifier Extract class comments, 37, 446 comments RQ1: What do developers write in comments across languages?
  • 15.
    /* * *A class representinga window on the screen . *For example : *<pre > *Window win = new Window(parent) ; *win.show() ; *</pre > * *@author Sami Shai o *@version 1.13, 06/08/0 6 *@see java.awt.BaseWindo w */ class Window extends BaseWindow { . . } 15 Summary Usage Ownership Pointer RQ1: What do developers write in comments across languages?
  • 16.
    16 Java Smalltalk Python CCTM RQ1:What do developers write in comments across languages?
  • 17.
    17 Recurrent natural languagepatterns in writing a specific type of information /* * *A class representing a window on the screen . *For example : *<pre > *Window win = new Window(parent) ; *win.show() ; *</pre > * *@author Sami Shai o *@version 1.13, 06/08/0 6 *@see java.awt.BaseWindo w */ class Window extends BaseWindow { . . } Pointer Summary Automatically identify an information type
  • 18.
    18 Represents [something] [verb]s [noun] Sees[something] Pointer /* * *A class representing a window on the screen . *For example : *<pre > *Window win = new Window(parent) ; *win.show() ; *</pre > * *@author Sami Shai o *@version 1.13, 06/08/0 6 *@see java.awt.BaseWindo w */ class Window extends BaseWindow { . . } Summary Automatically identify an information type Recurrent natural language patterns in writing a specific type of information
  • 19.
    19 Techniques Textual Analysis Features 1) 2)3) Learning Phase Evaluation 4) TEXT FEATURES NLP RULES J48 Naive Bayes Random Forest, Natural Language Processing Automated Classification of Comments Types (RQ2) Projects CCTM Taxonomy Definition (RQ1) CCTM Used recurrent patterns as feature set + text features Ground truth: manually analyzed comments Automatically identify an information type
  • 20.
    20 0 0.2 0.4 0.6 0.8 1 S u m m a r y E x p a n d O w n e r s h i p P o i n t e r U s a g e D e p r e c a t i o n R a t i o n a l e Java Top Categories NaiveBayesJ48 RandomForest S u m m a r y U s a g e E x p a n d D e v e l o p m e n t N o t e s P a r a m e t e r s Python Top Categories NaiveBayes J48 RandomForest R e s p o n s i b i l i t y I n t e n t C o l l a b o r a t o r s E x a m p l e s C l a s s R e f e r e n c e s K e y M e s s a g e I m p l e m e n t a t i o n P o i n t Pharo Top Categories NaiveBayes J48 RandomForest Random Forest technique classifies comments better Java Smalltalk Python Automatically identify an information type
  • 21.
    21 We are currentlytesting our classification approach using the deep learning technique (FastText) Developers embed at least 10 types of information in comments across languages Using recurrent natural language patterns as features improves the classification Takeaway
  • 22.
    22 Timeline & Contribution Accepte d EmpiricalSoftware Engineering (EMSE) , 2021 Under minor revisio n Journal of Systems and Software (JSS), 2021 How to Identify Class Comment Types? A Multi-language Approach for Class Comments Classi fi cation An investigation of comments in Java, Python, Smalltalk, and develop an approach to automatically identify information types across languages What do class comments tell us? An investigation of comment evolution and practices in Pharo Smalltalk An investigation of comment evolution, content and their adherence to style guideline In the stage of acceptance
  • 23.
    23 RQ2: What dodevelopers ask about comments?
  • 24.
    24 Mailing lists, Stack Overflow,Quora Classify sample discussion , 1, 400 posts Output: a taxonomy, guidelines, tool Extract discussion, 23, 631 posts RQ2: What do developers ask about comments?
  • 25.
    25 0% 10% 20%30% 40% 50% 60% 70% 80% 90% 100% Error Opinion Implementation Problems Limitation and Possibilities Background Information Best Practice Implementation Strategy Information Types Question Types Commenting high levels Programming languages Tools IDEs & editors Other RQ2: What do developers ask about comments?
  • 26.
    Developers seek commentingconventions to write quality comments but often find it hard to locate them 26 To what extent developers follow comment conventions in writing comments? Takeaway
  • 27.
    27 Timeline & Contribution Whatdo Developers Discuss about Comment Conventions? An investigation of developer concerns related to comments on various online platforms Makar: A Framework for Multi-source Studies based on Unstructured Data A tool to conduct study on multiple sources such as emails, stack over fl ow, GitHub In preparation Source Code Analysis and Manipulation (SCAM), 2021 Accepte d Software Analysis, Evolution and Reengineering (SANER), 2021 In the stage of acceptance In the stage of submission
  • 28.
    28 RQ3: What qualityattributes are often considered in assessing comment quality?
  • 29.
    29 SE conferences &journal s 195 venues Identify candidate paper s 2, 353 papers Output: quality attributes, metrics Extract proceedings, 332 proceedings Select relevant papers, 47 papers RQ3: What quality attributes are often considered in assessing comment quality?
  • 30.
    30 RQ3: What qualityattributes are often considered in assessing comment quality?
  • 31.
    31 RQ3: What qualityattributes are often considered in assessing comment quality?
  • 32.
    32 RQ3: What qualityattributes are often considered in assessing comment quality?
  • 33.
    Researchers are focusingoften on handful of quality attributes in assessing comment quality 33 Numerous studies suggest the benefits of usability, or accessibility of comments. We need to focus on assessing comments w.r.t them. Takeaway
  • 34.
    34 Results are submittedto a software engineering venue Timeline & Contribution Do Developers Follow Comment Conventions? An investigation of comment adherence to comment conventions A SLR about comments An investigation of quality attributes, and techniques used for comment assessment In preparation In the stage of submission
  • 35.
    35 Speculative Analysis ofCode Comments What information do developers write in the comments? - Identify information types from sample comments - Propose a ML pipeline to automate Class Comment Type Model (CCTM) Machine learning What quality attributes are considered in assessing comments ? Conduct a systematic literature review on code comments quality Metrics Quality attributes Relevant Literature Implications for researchers and developers to improve comment quality What do developers ask about commenting practices? - Analyze sample posts - Prepare a taxonomy Challenges Relevant topics Tool
  • 36.
    36 The ultimate goalof automatically assessing comments is still far away…
  • 37.
    37 Which quality attributesdo developers find important? Which information types do developers find important? How do various information types support developers? An IDE plugin to support automatic assessment of comments Future work
  • 38.
    38 Developer seek andfollow comment conventions. Having support to assess various aspects of comments can help maintaining high-quality comments Takeaway https://twitter.com/poojaruhal http://scg.unibe.ch/staff/Pooja-Rani