Analyzing Code Comments
Pooja Rani
PhD student
Software Composition Group
University of Bern, Switzerland
Roadmap
■ Code comment types
■ What is a good comment?
■ Challenges
■ Address the challenges
2
We use different tools and
techniques to understand code.
3
4
In GT, try to understand the class “BrLook”, “BlElement”
without class comments.
5
6
7
Code can describe how but it can
not explain why.
Code comment types
■ Documentation (/** … */)
(also used for packages / classes / methods
■ Block comments (/* … */)
■ Inline comments (//…)
8
Programming languages follow different
syntaxes for comments.
9
However, most languages support a distinct delimiter for
comments.
10
Java class comment
11
Java class comment
Python class comment
12
Java class comment Python class comment
GT class comment
13
Java method comment
14
Java method comment Python method comment
15
Java method comment Python method comment
GT method comment
What is a good comment?
16
/*
* Dear Maintainer
*
* Once you are done trying to ‘optimize’ this routine,
* and you have realized what a terrible mistake that was,
* please increment the following counter as a warning
* to the next guy.
*
* total_hours_wasted_here = 73
*
// When I wrote this, only God and I understood what I was doing
// Now, God only knows
#This is brilliant
#Thanks. It’s nap time.
What is a good comment?
17
// format matched kk:mm:ss EEE, MMM dd, yyy
Pattern timePattern = Pattern.compile("d*:d*:d* w*, w*, d*, d*");
Note that to encode a String as Base64, you first have to encode the characters as bytes
using character encoder.
It makes sense to use me if scalable element has fixed or matching parent horizontal size
but fits content vertically.
Examples
Preconditions
Warnings
■ Helps other developers in working with your code
■ Describes why, and not how
■ Reveals intent, limitation, assumptions, design decisions
■ Justifies the violation of a programming style
18
What is a good comment?
Coding style guidelines
■ To write consistent & informative comments
■ “Write a one-line summary of the class in the class
comment”.
19
Coding style guidelines
■ Java: Oracle, Apache, Google
■ Python: Pep, Google, Numpy
■ Smalltalk: Smalltalk style guide, comment template
■ Ruby: RubyStyle
20
Oracle: https://www.oracle.com/technical-resources/articles/java/javadoc-tool.html
Numpy:https://numpydoc.readthedocs.io/en/latest/format.html#docstring-standard
Smalltalk:http://sdmeta.gforge.inria.fr/FreeBooks/WithStyle/SmalltalkWithStyle.pdf
21
Pharo class comment template
Smalltalk:http://sdmeta.gforge.inria.fr/FreeBooks/WithStyle/SmalltalkWithStyle.pdf
Challenges
■ Missing comments
■ Inconsistent & Incomplete
■ Duplicate comments
■ Outdated comments
■ Takes time to write
■ Complex to understand
22
Impact overall quality of
comments
23
We use different tools and
techniques to analyze comment
quality.
Code analysis tools
■ Syntax
■ Semantics
■ Style
24
C de
Te i g
C
a a i
C di g e
chec e
S a ic a a i
R i e
Da a
A a i
IDE
25
But what about comments?
Comment analysis tools
■ Syntax
■ Semantics
■ Style
26
Da a
A a i
C e
Te i g
C
a a i
C di g e
chec e
S a ic a a i
R i e
IDE
Comment analysis tools
■ Syntax
■ Semantics
■ Style
27
Da a
A a i
C e
Te i g
C
a a i
C di g e
chec e
S a ic a a i
R i e
IDE
Javadoc, pydoc
C
heckstyle, Pylint
Documentation tools
■ Check syntax of comments
■ Do not check the content
28
Coding style checkers
■ Java: Checkstyle, PMD
■ Python: pylint, pycodestyle
■ Smalltalk: No linter
■ Ruby: RuboCop
29
Coding style checkers
■ Detect presence/absence of comments
■ Check whether comments follow style guidelines or not
to some extent.
■ Limited to selected metrics (code/comment ratio)
30
31
Comment Content Analysis
32
Information types in Java code comments
■ Analyze code comments of
big and diverse projects
■ Identified the information
types using machine
learning approaches.
33
Information types in Java code comments
34
Information types in Python code
35
Information types in Pharo code comments
1
1
2
3
4
3
5
4
8
10
11
21
29
34
33
43
40
51
57
75
134
231
256
0 50 100 150 200 250 300
License
TODO comments
Links
Coding Guidelines
Extensions
Other
Naming conventions
Observations
Subclasses explanation
Recommendations
Discourse
ReferencesOtherResources
Dependencies
Contracts
Key Messages
Instance Variables
Warnings
Implementation Points
Examples
Class References
Collaborators
Responsibility
Intent
Number of classes
Categories
36
Information types in Pharo code comments
1
1
2
3
4
3
5
4
8
10
11
21
29
34
33
43
40
51
57
75
134
231
256
0 50 100 150 200 250 300
License
TODO comments
Links
Coding Guidelines
Extensions
Other
Naming conventions
Observations
Subclasses explanation
Recommendations
Discourse
ReferencesOtherResources
Dependencies
Contracts
Key Messages
Instance Variables
Warnings
Implementation Points
Examples
Class References
Collaborators
Responsibility
Intent
Number of classes
Categories
37
Information types across Pharo projects
38
Class: FTTreeItem
I am an abstract class to define an Item use by a tree data source of Fast table.
Description
-------------------------------------------------
I define the basics methods needed by a FTTreeDataSource.
I use FTTreeItem to manage my elements and I am use by a FTFastTable.
Public API and Key Messages
-------------------------------------------------
• #data. anObject from: aFTTreeDataSource
This is my constructor that is use by FTTreeDataSource and myself
Example
-------------------------------------------------
Should not be instanciate.
Internal Representation and Key Implementation Points.
-------------------------------------------------
Instance Variables
dataSource: I am the dataSource that holds this Item.
children: I am a collection of Items calculate by the item. I contains the chldren of the
Item.
Collected data
39
Class: FTTreeItem
I am an abstract class to define an Item use by a tree data source of Fast table.
Description
-------------------------------------------------
I define the basics methods needed by a FTTreeDataSource.
I use FTTreeItem to manage my elements and I am use by a FTFastTable.
Public API and Key Messages
-------------------------------------------------
• #data. anObject from: aFTTreeDataSource
This is my constructor that is use by FTTreeDataSource and myself
Example
-------------------------------------------------
Should not be instanciate.
Internal Representation and Key Implementation Points.
-------------------------------------------------
Instance Variables
dataSource: I am the dataSource that holds this Item.
children: I am a collection of Items calculate by the item. I contains the chldren of the
Item.
Intent
Responsibility
Key	Messages
Warning
Instance	variables
Collaborator
Collected data
40
Taxonomy
41
Class: FTTreeItem
I am an abstract class to define an Item use by a tree data source of Fast table.
Description
-------------------------------------------------
I define the basics methods needed by a FTTreeDataSource.
I use FTTreeItem to manage my elements and I am use by a FTFastTable.
Public API and Key Messages
-------------------------------------------------
• #data. anObject from: aFTTreeDataSource
This is my constructor that is use by FTTreeDataSource and myself
Example
-------------------------------------------------
Should not be instanciate.
Internal Representation and Key Implementation Points.
-------------------------------------------------
Instance Variables
dataSource: I am the dataSource that holds this Item.
children: I am a collection of Items calculate by the item. I contains the chldren of the
Item.
Intent
Responsibility
Key	Messages
Warning
Instance	variables
Collaborator
Identification of patterns
42
Techniques
43
Training & Testing
44
Workflow
Comments do change over time
45
We need to understand their change behaviour.
46
Comment Evolution
Java code comment co-evolution
47
48
Java code comment co-evolution
Source: Analyzing the co-evolution of comments and source code, fig 1
Java code comment co-evolution
■ Over 50% of the comment
changes are related to
source code changes
■ Newly added code gets
barely commented
49
Pharo class comment co-evolution
50
Pharo Comment Evolution
51
Pharo Comment Evolution
52
Pharo class comment evolution
53
■ Over 50% of the comment
changes are related to
source code changes
■ Newly added code gets
commented often
54
Pharo class comment co-evolution
Not all comments need to be changed
when the code is changed & vice versa
55
We need to identify such cases when a comment needs to be changed
Automatic generation and
summarization of comments
56
Code summaries
57
■ “Automatically generated, short, yet accurate descriptions of
source code entities”.
■ They give more information than just the header or the name
of an artifact.
■ Significantly shorter and faster to read than the source code
they summarise
Example of natural language summaries
58
35
https://www.textcompactor.com/
Example of natural language summaries
59
35
https://www.textcompactor.com/
60
Example of Natural Language Summaries
MILAN, Italy, April 18. A small airplane crashed into a government
building in heart of Milan, setting the top floors on fire, Italian
police reported. There were no immediate reports on casualties as
rescue workers attempted to clear the area in the city's financial
district. Few details of the crash were available, but news reports
about it immediately set off fears that it might be a terrorist act
akin to the Sept. 11 attacks in the United States. Those fears sent
U.S. stocks tumbling to session lows in late morning trading.
Witnesses reported hearing a loud explosion from the 30-story
office building, which houses the administrative offices of the local
Lombardy region and sits next to the city's central train station.
Italian state television said the crash put a hole in the 25th floor
of the Pirelli building. News reports said smoke poured from the
opening. Police and ambulances rushed to the building in downtown
Milan. No further details were immediately available.
61
Example of natural language summaries
MILAN, Italy, April 18. A small airplane crashed into a government
building in heart of Milan, setting the top floors on fire, Italian
police reported. There were no immediate reports on casualties as
rescue workers attempted to clear the area in the city's financial
district. Few details of the crash were available, but news reports
about it immediately set off fears that it might be a terrorist act
akin to the Sept. 11 attacks in the United States. Those fears sent
U.S. stocks tumbling to session lows in late morning trading.
Witnesses reported hearing a loud explosion from the 30-story
office building, which houses the administrative offices of the local
Lombardy region and sits next to the city's central train station.
Italian state television said the crash put a hole in the 25th floor
of the Pirelli building. News reports said smoke poured from the
opening. Police and ambulances rushed to the building in downtown
Milan. No further details were immediately available.
How many victims?
Was it a terrorist act?
What was the target?
What happened?
Says who?
When, where?
Comment summaries
62
https://www.textcompactor.com/
63
https://www.textcompactor.com/
Comment summaries
Navigating classes
64
https://github.com/larsb/atunesplus/blob/master/aTunes/src/main/java/net/sourceforge/atunes/kernel/modules/repository/audio/AudioFile.java
Navigating classes
65
https://github.com/larsb/atunesplus/blob/master/aTunes/src/main/java/net/sourceforge/atunes/kernel/modules/repository/audio/AudioFile.java
We look at:
- Name of the class
- Attributes
- Methods
- Dependencies between classes
Java class summaries
■ Identify the stereotypes.
■ Gather the heuristics
■ Generate the class
summary
66
Workflow
67
JSummarizer: An automatic generator of natural language summaries for Java classes by Moreno
Summary
68
■ Missing some information
■ Not easy to understand
69
Content adequacy vs. expressiveness
To overcome these limitations…
70
Researchers proposed to detect source code
descriptions from external sources:

■ Mailing list and issue trackers

■ StackOverflow discussions
What information to include, how much,
and how to generate and present it.
71
Linguistic Analysis
72
73
TAACO
Linguistic analysis for English
74
TAACO TAALES
Linguistic analysis for English
75
TAACO TAALES ARTE
Linguistic analysis for English
76
Style analysis of Java comments
■ Metric-based methods
■ Comment too short, too long
■ Check documentable items
(tags) of a code entity
■ Readability of the comments
77
Style analysis of Java comments
■ Metric-based method
■ Check whether comments
are similar to method names
Adjust the metrics and their thresholds
according to comment.
78
Future work
■ Evaluate if a comment is good or not
■ Detect the inconsistency in the comment
■ Propose refactoring of the comment, if required.
79
Summary
80

Analyzing code comment aspects

  • 1.
    Analyzing Code Comments PoojaRani PhD student Software Composition Group University of Bern, Switzerland
  • 2.
    Roadmap ■ Code commenttypes ■ What is a good comment? ■ Challenges ■ Address the challenges 2
  • 3.
    We use differenttools and techniques to understand code. 3
  • 4.
    4 In GT, tryto understand the class “BrLook”, “BlElement” without class comments.
  • 5.
  • 6.
  • 7.
    7 Code can describehow but it can not explain why.
  • 8.
    Code comment types ■Documentation (/** … */) (also used for packages / classes / methods ■ Block comments (/* … */) ■ Inline comments (//…) 8
  • 9.
    Programming languages followdifferent syntaxes for comments. 9 However, most languages support a distinct delimiter for comments.
  • 10.
  • 11.
  • 12.
    12 Java class commentPython class comment GT class comment
  • 13.
  • 14.
    14 Java method commentPython method comment
  • 15.
    15 Java method commentPython method comment GT method comment
  • 16.
    What is agood comment? 16 /* * Dear Maintainer * * Once you are done trying to ‘optimize’ this routine, * and you have realized what a terrible mistake that was, * please increment the following counter as a warning * to the next guy. * * total_hours_wasted_here = 73 * // When I wrote this, only God and I understood what I was doing // Now, God only knows #This is brilliant #Thanks. It’s nap time.
  • 17.
    What is agood comment? 17 // format matched kk:mm:ss EEE, MMM dd, yyy Pattern timePattern = Pattern.compile("d*:d*:d* w*, w*, d*, d*"); Note that to encode a String as Base64, you first have to encode the characters as bytes using character encoder. It makes sense to use me if scalable element has fixed or matching parent horizontal size but fits content vertically. Examples Preconditions Warnings
  • 18.
    ■ Helps otherdevelopers in working with your code ■ Describes why, and not how ■ Reveals intent, limitation, assumptions, design decisions ■ Justifies the violation of a programming style 18 What is a good comment?
  • 19.
    Coding style guidelines ■To write consistent & informative comments ■ “Write a one-line summary of the class in the class comment”. 19
  • 20.
    Coding style guidelines ■Java: Oracle, Apache, Google ■ Python: Pep, Google, Numpy ■ Smalltalk: Smalltalk style guide, comment template ■ Ruby: RubyStyle 20 Oracle: https://www.oracle.com/technical-resources/articles/java/javadoc-tool.html Numpy:https://numpydoc.readthedocs.io/en/latest/format.html#docstring-standard Smalltalk:http://sdmeta.gforge.inria.fr/FreeBooks/WithStyle/SmalltalkWithStyle.pdf
  • 21.
    21 Pharo class commenttemplate Smalltalk:http://sdmeta.gforge.inria.fr/FreeBooks/WithStyle/SmalltalkWithStyle.pdf
  • 22.
    Challenges ■ Missing comments ■Inconsistent & Incomplete ■ Duplicate comments ■ Outdated comments ■ Takes time to write ■ Complex to understand 22 Impact overall quality of comments
  • 23.
    23 We use differenttools and techniques to analyze comment quality.
  • 24.
    Code analysis tools ■Syntax ■ Semantics ■ Style 24 C de Te i g C a a i C di g e chec e S a ic a a i R i e Da a A a i IDE
  • 25.
  • 26.
    Comment analysis tools ■Syntax ■ Semantics ■ Style 26 Da a A a i C e Te i g C a a i C di g e chec e S a ic a a i R i e IDE
  • 27.
    Comment analysis tools ■Syntax ■ Semantics ■ Style 27 Da a A a i C e Te i g C a a i C di g e chec e S a ic a a i R i e IDE Javadoc, pydoc C heckstyle, Pylint
  • 28.
    Documentation tools ■ Checksyntax of comments ■ Do not check the content 28
  • 29.
    Coding style checkers ■Java: Checkstyle, PMD ■ Python: pylint, pycodestyle ■ Smalltalk: No linter ■ Ruby: RuboCop 29
  • 30.
    Coding style checkers ■Detect presence/absence of comments ■ Check whether comments follow style guidelines or not to some extent. ■ Limited to selected metrics (code/comment ratio) 30
  • 31.
  • 32.
    32 Information types inJava code comments ■ Analyze code comments of big and diverse projects ■ Identified the information types using machine learning approaches.
  • 33.
    33 Information types inJava code comments
  • 34.
  • 35.
    35 Information types inPharo code comments 1 1 2 3 4 3 5 4 8 10 11 21 29 34 33 43 40 51 57 75 134 231 256 0 50 100 150 200 250 300 License TODO comments Links Coding Guidelines Extensions Other Naming conventions Observations Subclasses explanation Recommendations Discourse ReferencesOtherResources Dependencies Contracts Key Messages Instance Variables Warnings Implementation Points Examples Class References Collaborators Responsibility Intent Number of classes Categories
  • 36.
    36 Information types inPharo code comments 1 1 2 3 4 3 5 4 8 10 11 21 29 34 33 43 40 51 57 75 134 231 256 0 50 100 150 200 250 300 License TODO comments Links Coding Guidelines Extensions Other Naming conventions Observations Subclasses explanation Recommendations Discourse ReferencesOtherResources Dependencies Contracts Key Messages Instance Variables Warnings Implementation Points Examples Class References Collaborators Responsibility Intent Number of classes Categories
  • 37.
  • 38.
    38 Class: FTTreeItem I aman abstract class to define an Item use by a tree data source of Fast table. Description ------------------------------------------------- I define the basics methods needed by a FTTreeDataSource. I use FTTreeItem to manage my elements and I am use by a FTFastTable. Public API and Key Messages ------------------------------------------------- • #data. anObject from: aFTTreeDataSource This is my constructor that is use by FTTreeDataSource and myself Example ------------------------------------------------- Should not be instanciate. Internal Representation and Key Implementation Points. ------------------------------------------------- Instance Variables dataSource: I am the dataSource that holds this Item. children: I am a collection of Items calculate by the item. I contains the chldren of the Item. Collected data
  • 39.
    39 Class: FTTreeItem I aman abstract class to define an Item use by a tree data source of Fast table. Description ------------------------------------------------- I define the basics methods needed by a FTTreeDataSource. I use FTTreeItem to manage my elements and I am use by a FTFastTable. Public API and Key Messages ------------------------------------------------- • #data. anObject from: aFTTreeDataSource This is my constructor that is use by FTTreeDataSource and myself Example ------------------------------------------------- Should not be instanciate. Internal Representation and Key Implementation Points. ------------------------------------------------- Instance Variables dataSource: I am the dataSource that holds this Item. children: I am a collection of Items calculate by the item. I contains the chldren of the Item. Intent Responsibility Key Messages Warning Instance variables Collaborator Collected data
  • 40.
  • 41.
    41 Class: FTTreeItem I aman abstract class to define an Item use by a tree data source of Fast table. Description ------------------------------------------------- I define the basics methods needed by a FTTreeDataSource. I use FTTreeItem to manage my elements and I am use by a FTFastTable. Public API and Key Messages ------------------------------------------------- • #data. anObject from: aFTTreeDataSource This is my constructor that is use by FTTreeDataSource and myself Example ------------------------------------------------- Should not be instanciate. Internal Representation and Key Implementation Points. ------------------------------------------------- Instance Variables dataSource: I am the dataSource that holds this Item. children: I am a collection of Items calculate by the item. I contains the chldren of the Item. Intent Responsibility Key Messages Warning Instance variables Collaborator Identification of patterns
  • 42.
  • 43.
  • 44.
  • 45.
    Comments do changeover time 45 We need to understand their change behaviour.
  • 46.
  • 47.
    Java code commentco-evolution 47
  • 48.
    48 Java code commentco-evolution Source: Analyzing the co-evolution of comments and source code, fig 1
  • 49.
    Java code commentco-evolution ■ Over 50% of the comment changes are related to source code changes ■ Newly added code gets barely commented 49
  • 50.
    Pharo class commentco-evolution 50
  • 51.
  • 52.
  • 53.
    Pharo class commentevolution 53
  • 54.
    ■ Over 50%of the comment changes are related to source code changes ■ Newly added code gets commented often 54 Pharo class comment co-evolution
  • 55.
    Not all commentsneed to be changed when the code is changed & vice versa 55 We need to identify such cases when a comment needs to be changed
  • 56.
  • 57.
    Code summaries 57 ■ “Automaticallygenerated, short, yet accurate descriptions of source code entities”. ■ They give more information than just the header or the name of an artifact. ■ Significantly shorter and faster to read than the source code they summarise
  • 58.
    Example of naturallanguage summaries 58 35 https://www.textcompactor.com/
  • 59.
    Example of naturallanguage summaries 59 35 https://www.textcompactor.com/
  • 60.
    60 Example of NaturalLanguage Summaries MILAN, Italy, April 18. A small airplane crashed into a government building in heart of Milan, setting the top floors on fire, Italian police reported. There were no immediate reports on casualties as rescue workers attempted to clear the area in the city's financial district. Few details of the crash were available, but news reports about it immediately set off fears that it might be a terrorist act akin to the Sept. 11 attacks in the United States. Those fears sent U.S. stocks tumbling to session lows in late morning trading. Witnesses reported hearing a loud explosion from the 30-story office building, which houses the administrative offices of the local Lombardy region and sits next to the city's central train station. Italian state television said the crash put a hole in the 25th floor of the Pirelli building. News reports said smoke poured from the opening. Police and ambulances rushed to the building in downtown Milan. No further details were immediately available.
  • 61.
    61 Example of naturallanguage summaries MILAN, Italy, April 18. A small airplane crashed into a government building in heart of Milan, setting the top floors on fire, Italian police reported. There were no immediate reports on casualties as rescue workers attempted to clear the area in the city's financial district. Few details of the crash were available, but news reports about it immediately set off fears that it might be a terrorist act akin to the Sept. 11 attacks in the United States. Those fears sent U.S. stocks tumbling to session lows in late morning trading. Witnesses reported hearing a loud explosion from the 30-story office building, which houses the administrative offices of the local Lombardy region and sits next to the city's central train station. Italian state television said the crash put a hole in the 25th floor of the Pirelli building. News reports said smoke poured from the opening. Police and ambulances rushed to the building in downtown Milan. No further details were immediately available. How many victims? Was it a terrorist act? What was the target? What happened? Says who? When, where?
  • 62.
  • 63.
  • 64.
  • 65.
  • 66.
    Java class summaries ■Identify the stereotypes. ■ Gather the heuristics ■ Generate the class summary 66
  • 67.
    Workflow 67 JSummarizer: An automaticgenerator of natural language summaries for Java classes by Moreno
  • 68.
  • 69.
    ■ Missing someinformation ■ Not easy to understand 69 Content adequacy vs. expressiveness
  • 70.
    To overcome theselimitations… 70 Researchers proposed to detect source code descriptions from external sources: ■ Mailing list and issue trackers ■ StackOverflow discussions
  • 71.
    What information toinclude, how much, and how to generate and present it. 71
  • 72.
  • 73.
  • 74.
  • 75.
    75 TAACO TAALES ARTE Linguisticanalysis for English
  • 76.
    76 Style analysis ofJava comments ■ Metric-based methods ■ Comment too short, too long ■ Check documentable items (tags) of a code entity ■ Readability of the comments
  • 77.
    77 Style analysis ofJava comments ■ Metric-based method ■ Check whether comments are similar to method names
  • 78.
    Adjust the metricsand their thresholds according to comment. 78
  • 79.
    Future work ■ Evaluateif a comment is good or not ■ Detect the inconsistency in the comment ■ Propose refactoring of the comment, if required. 79
  • 80.