How to Identify Class Comment Types?
A Multi-language Approach for Class Comment Classification
Pooja Rani, Sebastiano Panichella, Manuel Leuenberger, Andrea Di Sorbo, Oscar Nierstrasz
SANER 2022 Journal First
/**
* A class representing a window on the screen.
*
* For example:
* <pre>
* Window win = new Window(parent);
* win.show();
* </pre>
*
* @author Sami Shaio
* @version 1.13, 06/08/06
* @see java.awt.BaseWindow
*/
class Window extends BaseWindow {
…
}
Motivation
Code comments are a trustworthy form of documentation
- McMillan et al. 2010
High-quality code comments assist developers
- Dekel et al. 2009
Problem
/**
* A class representing a window on the screen.
*
* For example:
* <pre>
* Window win = new Window(parent);
* win.show();
* </pre>
*
* @author Sami Shaio
* @version 1.13, 06/08/06
* @see java.awt.BaseWindow
*/
class Window extends BaseWindow {
…
}
Does this comment contain any warning/example?
Challenges
No standard definition of comments
No strict syntax and structure conventions
Lack of quality assessment tools
Makes information identification a non-trivial problem
Increasing multi-language environments
97% of open-source projects used two or more programming languages
- Tomassetti et al. 2014
Each language has its own conventions for writing comments
Given the increasing use of multi-language environments,
we need a deeper understanding of developer
commenting practices across languages
RQ1: What types of information are present in class comments? 

To what extent do information types vary across programming languages?
Information types in comments
Labels on the example comment: Summary, Usage
/**
* A class representing a window on the screen.
*
* For example:
* <pre>
* Window win = new Window(parent);
* win.show();
* </pre>
*
* @author Sami Shaio
* @version 1.13, 06/08/06
* @see java.awt.BaseWindow
*/
class Window extends BaseWindow {
…
}
RQ1: What types of information are present in class comments?
Java, Python, Smalltalk: 20 projects
Extract class comments: 37,446 comments
Classify sample comments: 1,066 comments
Output: a taxonomy and a classifier
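The slides do not say how the class comments were extracted. As a minimal sketch of the "extract class comments" step for Python sources only, the standard `ast` module can collect the docstring of every class. The `SAMPLE` source below is a hypothetical Python adaptation of the Window example, and `extract_class_comments` is an illustrative name, not part of the study's tooling.

```python
import ast


def extract_class_comments(source: str) -> dict:
    """Map each class name found in `source` to its docstring (class comment)."""
    tree = ast.parse(source)
    comments = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef):
            # get_docstring returns None when the class has no docstring
            doc = ast.get_docstring(node)
            if doc is not None:
                comments[node.name] = doc
    return comments


# Hypothetical input, adapted from the Java example used in the talk.
SAMPLE = '''
class Window:
    """A class representing a window on the screen.

    For example:
        win = Window(parent)
        win.show()
    """


class Undocumented:
    pass
'''

comments = extract_class_comments(SAMPLE)
for name, text in comments.items():
    print(name, "->", text.splitlines()[0])
```

Classes without a docstring are skipped here; Java and Smalltalk sources would need a different extractor.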
Comment taxonomies in languages
Java: Pascarella et al., 2017
Python: Zhang et al., 2018
Smalltalk: Rani et al., 2021
Categories in each taxonomy
Java: Autogenerated, Commented code, Deprecation, Directive, Exception, Expand, Formatter, Incomplete, License, Noise, Ownership, Pointer, Rationale, Summary, Todo, Under Development, Unmapped, Usage
Smalltalk: Class References, Coding Guidelines, Collaborators, Dependencies, Discourse, Examples, Instance Variables, Intent, Key Implementation Point, Key Messages, Links, Observation, Other, Preconditions, Recommendation, ReferenceToOtherResource, Responsibility, Subclasses Explanation, Todo, Unmapped, Warnings
Python: Development Notes, Exception, Expand, Links, Metadata, Noise, Parameters, Summary, Todo, Unmapped, Usage, Version
Additional labels on the slide: License/Copyright, Extension
Categories with normalized names
Java: Auto generated, Commented code, Deprecation, Directive, Exception, Expand, Incomplete, License, Noise, Ownership, Pointer, Rationale, Summary, Todo, Under development, Unmapped, Usage
Smalltalk: Class reference, Coding guideline, Collaborator, Dependency, Discourse, Example, Instance variable, Intent, Key implementation point, Key message, Links, Observation, Other, Precondition, Recommendation, ReferenceOtherResource, Responsibility, Subclass explanation, Todo, Unmapped, Warning
Python: Development notes, Exception, Expand, Links, Metadata, Parameters, Summary, Todo, Unmapped, Usage, Version
Additional labels on the slide: License, Extension, Formatter, Noise
CCTM (Class Comment Types Model)
Figure: heatmap of information types (categories) per project, with one panel each for Java, Python, and Smalltalk projects. Color scale (0 to 100) according to percentage of comments falling into a category.
Java projects: Eclipse, Guice, Guava, Vaadin, Hadoop, Spark
Java categories: Summary, Expand, Ownership, Pointer, Usage, Deprecation, Rationale, Warning, Exception, Todo, Recommendation, Precondition, Observation, Formatter, Subclass Explanation, Commented Code, Directive, Incomplete, Auto Generated
Python projects: Django, Pipenv, Pytorch, Ipython, Pandas, Requests, Mailpile
Python categories: Summary, Usage, Expand, Development Notes, Parameters, Warning, Links, Recommendation, Subclass Explanation, Exception, Version, Precondition, Coding Guideline, Todo, Observation, Dependency, Extension, Noise
Smalltalk projects: GToolkit, Seaside, Roassal, Moose, PolyMath, Petit, Pillar
Smalltalk categories: Responsibility, Intent, Collaborator, Example, Class Reference, Key Message, Implementation Point, Warning, Instance Variable, Reference Other Resource, Subclass Explanation, Precondition, Recommendation, Links, Extension, Observation, Coding Guideline, License, Discourse, Todo, Dependency, Other
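The color scale in the figure encodes, per project, the percentage of comments falling into a category. A minimal sketch of that computation follows, assuming each classified comment is stored as a set of category names (one comment can carry several information types). The labels in `project_labels` are hypothetical and are not data from the study.

```python
from collections import Counter


def category_percentages(labeled_comments):
    """Return {category: percentage of comments that contain it}.

    `labeled_comments` is a list with one set of category names per comment.
    """
    total = len(labeled_comments)
    if total == 0:
        return {}
    counts = Counter()
    for categories in labeled_comments:
        # set() guards against a category being listed twice for one comment
        counts.update(set(categories))
    return {cat: 100.0 * n / total for cat, n in counts.items()}


# Hypothetical labels for four comments of one project.
project_labels = [
    {"Summary", "Usage"},
    {"Summary"},
    {"Summary", "Expand"},
    {"Usage"},
]

percentages = category_percentages(project_labels)
for category, pct in sorted(percentages.items()):
    print(f"{category}: {pct:.0f}%")
```

Because a comment may belong to several categories, the percentages of one project do not need to sum to 100.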
RQ1: What types of information are present in class comments? To what extent do information
types vary across programming languages?
RQ2: Can machine learning be used to automatically identify class comment types according
to CCTM?
/**
* A class representing a window on the screen.
*
* For example:
* <pre>
* Window win = new Window(parent);
* win.show();
* </pre>
*
* @author Sami Shaio
* @version 1.13, 06/08/06
* @see java.awt.BaseWindow
*/
class Window extends BaseWindow {
…
}
Summary patterns: "[verb]s [noun]", "Class represents [something]"
Recurrent natural language patterns exist in various information types
How do we extract such patterns?
Extract patterns
• Intention mining can be used to automatically identify textual patterns in informal software documents.
• Di Sorbo et al. developed a tool, NEON, to detect natural language patterns.
Di Sorbo et al., An NLP-based Tool for Software Artifact Analysis. ICSME 2021
Example patterns from summary
Summary: Class represents [something]
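To make the idea of a recurrent pattern concrete, the two Summary patterns can be approximated with plain regular expressions. This is only a sketch: the talk relies on NEON for pattern detection, and the hand-written rules in `SUMMARY_PATTERNS` are hypothetical heuristics, not NEON's rules or output.

```python
import re

# Hypothetical heuristics for illustration only; they are not NEON's rules.
SUMMARY_PATTERNS = [
    # "Class represents [something]", e.g. "A class representing a window"
    re.compile(r"\b(?:this|the|a)\s+class\s+represent(?:s|ing)\s+\w+", re.IGNORECASE),
    # "[verb]s [noun]" at the start of a sentence, e.g. "Represents a window"
    re.compile(r"^\s*[A-Za-z]+s\s+(?:a|an|the)\s+\w+", re.IGNORECASE),
]


def matches_summary_pattern(sentence: str) -> bool:
    """True if any of the heuristic Summary patterns occurs in the sentence."""
    return any(pattern.search(sentence) for pattern in SUMMARY_PATTERNS)


for line in [
    "A class representing a window on the screen.",
    "Represents a window.",
    "@author Sami Shaio",
    "win.show();",
]:
    print(f"{line!r}: {matches_summary_pattern(line)}")
```

Rules like these produce false positives (any sentence that opens with a word ending in "s" followed by an article matches), which is one reason a dedicated NLP tool is preferable to ad hoc expressions.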
Automatic identification of information types
Ground truth: 1,066 classified comments (Projects, CCTM)
Techniques: Natural Language Processing (NLP), Textual Analysis (TA)
Features: recurrent NL patterns + text features (NLP Rule Features, TA Features)
Learning phase: supervised ML algorithms (J48, Naive Bayes, Random Forest)
Evaluation
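The learning phase can be illustrated with a small multinomial Naive Bayes classifier, one of the three algorithms named on the slide. This is a from-scratch sketch under strong assumptions: it uses only bag-of-words counts (a stand-in for the TA features, with no NLP rule features), assigns a single label per sentence, and is trained on six hypothetical sentences. It is not the implementation or the data used in the study.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)
        self.label_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # log prior plus the sum of smoothed log likelihoods
            score = math.log(n_docs / total_docs)
            for token in tokens:
                score += math.log((counts[token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training sentences, not taken from the ground truth.
train_texts = [
    "A class representing a window on the screen",
    "This class represents a button",
    "Represents a connection to the server",
    "For example: win = Window(parent); win.show()",
    "Usage example: call open() then read()",
    "Example of how to use the parser",
]
train_labels = ["Summary", "Summary", "Summary", "Usage", "Usage", "Usage"]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("A class representing a dialog"))
print(model.predict("Example: call show() on the window"))
```

A fuller version would add one binary feature per matched NL pattern next to the word counts and train one classifier per information type, since a comment can contain several types.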
Results
Random Forest classifies comments better than Naive Bayes and J48
Figure: accuracy (0 to 1) of NaiveBayes, J48, and RandomForest for the top categories in each language.
Top categories in Java comments: Summary, Expand, Ownership, Pointer, Usage, Deprecation, Rationale
Top categories in Python comments: Summary, Usage, Expand, DevelopmentNotes, Parameters
Top categories in Smalltalk comments: Responsibility, Intent, Collaborators, Examples, ClassReferences, KeyMessage, ImplementationPoint
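The results chart reports accuracy per category. As a sketch of how such a score can be computed for one category, the function below compares gold labels with predictions for a binary decision ("does this comment contain the category or not"). It also returns precision and recall, which are added here for illustration. The `gold` and `predicted` lists are hypothetical, not results from the study.

```python
def binary_metrics(gold, predicted):
    """Accuracy, precision, and recall for one category (True = category present)."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g and p)
    tn = sum(1 for g, p in pairs if not g and not p)
    fp = sum(1 for g, p in pairs if not g and p)
    fn = sum(1 for g, p in pairs if g and not p)
    return {
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical labels for five comments and one category, e.g. Summary.
gold = [True, True, False, False, True]
predicted = [True, False, False, True, True]

metrics = binary_metrics(gold, predicted)
print(metrics)
```

Accuracy alone can look high for rare categories, because predicting "absent" for every comment is already mostly correct, so precision and recall are worth reading alongside it.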
The ultimate goal of automatically assessing
comments is still far away…
Future work
Which information types do developers find important?
How do various information types support developers?
What quality attributes are important for comments?
An IDE plugin to support automatic assessment of comments.
Summary
How to Identify Class Comment Types? A Multi-language Approach for Class Comment Classification
https://twitter.com/poojaruhal
http://scg.unibe.ch/staff/Pooja-Rani
Paper: https://www.sciencedirect.com/science/article/pii/S0164121221001448
Replication Package on GitHub: https://github.com/poojaruhal/RP-class-comment-classification
YouTube: https://www.youtube.com/watch?v=_auMqCsxg0s