Supporting program comprehension with source code summarization icse nier 2010

Supporting Program Comprehension with Source Code Summarization
Sonia Haiduc*, Jairo Aponte**, Andrian Marcus*

ICSE NIER 2010

* **

Developers read source code

• Before performing maintenance on a
system, developers need to understand
its source code

• During comprehension, programmers
search and browse the code

Skimming vs. reading code
• Skimming (Starke’09): quickly reading the names of
software artifacts
+ Fast
– Insufficient information
– Shallow understanding

• Reading in depth
– Slow
– Too much information
+ Deeper understanding

Code summaries

• Automatically generated, short, yet accurate
descriptions of source code entities

• They give more information than just the
header or the name of an artifact

• Significantly shorter and faster to read than
the source code they summarize

What should we summarize?
• Code
– Packages
– Classes
– Methods
– Method sequences
– Etc.

• Other artifacts
– Bug reports (ICSE 2010 - S. Rastkar, G. Murphy, G. Murray)
– E-mails
– Etc.

What should we include in code summaries?

• Semantic information
– What does the source code do?
– Identifiers and comments that capture the main concepts

• Structural information
– How does the code work?
– Class relationships, callers and callees, members of a
class, etc.

Description: VFS virtual file system read write mkdir directory path save +
Internal classes: DirectoryEntry +
Methods: listDirectory, mkdir, constructPath +
Fields: WRITE_CAP, READ_CAP, lock +
Sub-classes: FileVFS, FavoritesVFS +
Other: ...
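
A summary like the one above has a semantic part (the description terms) and a structural part (members and relationships). The sketch below shows one way a tool could hold both parts in a single record. The `CodeSummary` class, its attribute names, and the entity name `VFS` are assumptions made for illustration; only the listed values come from the example above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CodeSummary:
    """Hypothetical container for a class-level code summary."""
    entity: str                                                  # assumed name of the summarized class
    description: List[str] = field(default_factory=list)        # semantic part: main terms
    internal_classes: List[str] = field(default_factory=list)   # structural part starts here
    methods: List[str] = field(default_factory=list)
    field_names: List[str] = field(default_factory=list)
    subclasses: List[str] = field(default_factory=list)

    def render(self) -> str:
        """Format the summary as labelled lines, skipping empty sections."""
        sections = [
            ("Description", " ".join(self.description)),
            ("Internal classes", ", ".join(self.internal_classes)),
            ("Methods", ", ".join(self.methods)),
            ("Fields", ", ".join(self.field_names)),
            ("Sub-classes", ", ".join(self.subclasses)),
        ]
        return "\n".join(f"{label}: {text}" for label, text in sections if text)


# Values copied from the example summary; the record layout is illustrative.
vfs = CodeSummary(
    entity="VFS",
    description=["VFS", "virtual", "file", "system", "read", "write",
                 "mkdir", "directory", "path", "save"],
    internal_classes=["DirectoryEntry"],
    methods=["listDirectory", "mkdir", "constructPath"],
    field_names=["WRITE_CAP", "READ_CAP", "lock"],
    subclasses=["FileVFS", "FavoritesVFS"],
)
print(vfs.render())
```

Keeping the two parts in separate attributes lets a viewer show the short description first and expand the structural sections on demand.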

How should we generate code summaries?

• Semantic information: automatic text
summarization
– Machine Learning
– Discourse-based approaches
– Term-based Text Retrieval techniques

• Structural information: static analysis
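
To make the term-based option concrete, here is a minimal sketch that ranks the words of one method by tf-idf against the other methods of a system and keeps the top few as a summary. Tf-idf is used here only as a simple stand-in for a term-based Text Retrieval technique; the slides do not say which weighting was used. The function names, the stop-word list, and the three toy Java methods are all assumptions.

```python
import math
import re
from collections import Counter
from typing import List

# Assumed stop-word list: a few Java keywords and ubiquitous type names.
STOP_WORDS = {"public", "private", "void", "new", "return", "this", "string", "int"}


def split_identifier(identifier: str) -> List[str]:
    """Split camelCase and UPPER_CASE identifiers into lowercase words."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", identifier)
    return [p.lower() for p in parts]


def tokenize(source: str) -> List[str]:
    """Extract identifier words from source text, dropping stop words."""
    words: List[str] = []
    for identifier in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source):
        words.extend(split_identifier(identifier))
    return [w for w in words if w not in STOP_WORDS and len(w) > 1]


def top_terms(target: str, corpus: List[str], k: int = 5) -> List[str]:
    """Return the k terms of `target` with the highest tf-idf score."""
    documents = [set(tokenize(doc)) for doc in corpus]
    term_freq = Counter(tokenize(target))
    n = len(documents)
    scores = {}
    for term, count in term_freq.items():
        doc_freq = sum(1 for doc in documents if term in doc)
        idf = math.log((1 + n) / (1 + doc_freq)) + 1.0  # smoothed idf
        scores[term] = count * idf
    # Sort by descending score; break ties alphabetically for stable output.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Toy corpus, invented for illustration.
methods = [
    "public void mkdir(String directoryPath) { File directory = new File(directoryPath); directory.mkdirs(); }",
    "public String getName() { return name; }",
    "public void setName(String name) { this.name = name; }",
]
print(top_terms(methods[0], methods, k=3))
```

Splitting identifiers before counting matters because most of the vocabulary in source code lives inside compound names such as `directoryPath`.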

How can we evaluate code summaries?

• How good are the automatic summaries
when compared to manual ones?

• How useful are the automatic code
summaries for SE tasks?

Preliminary evaluation

• Compared automatic code summaries
with developer code summaries

• 6 developers, 12 methods in aTunes

• Used only lexical information – 5 most
relevant terms
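
A comparison of this kind can be sketched as a term-overlap score between the automatic top-5 terms and the terms each developer chose for the same method. The slides do not state which measure was used, so the overlap metric below is an assumption, as are the function names. The term lists are invented for illustration and are not data from the study.

```python
from typing import Iterable, List


def overlap_at_k(automatic: List[str], developer: Iterable[str], k: int = 5) -> float:
    """Fraction of the top-k automatic terms that the developer also used."""
    top = automatic[:k]
    if not top:
        return 0.0
    reference = {term.lower() for term in developer}
    hits = sum(1 for term in top if term.lower() in reference)
    return hits / len(top)


def mean_overlap(automatic: List[str], developer_summaries: List[List[str]], k: int = 5) -> float:
    """Average the overlap over several developers' summaries of one method."""
    if not developer_summaries:
        return 0.0
    scores = [overlap_at_k(automatic, summary, k) for summary in developer_summaries]
    return sum(scores) / len(scores)


# Invented example: one automatic summary and two developer summaries.
automatic_summary = ["directory", "file", "path", "mkdir", "create"]
developer_summaries = [
    ["create", "directory", "path"],
    ["make", "directory", "file", "system"],
]
print(mean_overlap(automatic_summary, developer_summaries))
```

Averaging over developers keeps a single unusual manual summary from dominating the score for a method.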

Results
• Automatic source code summaries are good at
reflecting developers’ summaries

• Text Retrieval techniques work as well on
source code as on natural language in reflecting
human summaries

• Developers make use of structural information in
their code summaries:
– Method name terms
– Class name terms
– Formal parameter type terms
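
One possible way to act on this observation is to collect the words found in the method name, the class name, and the formal parameter types, and merge them with the terms chosen from the method's text. The sketch below does that. Placing the structural terms first is an arbitrary design choice, the function names are hypothetical, and the signature in the example is invented.

```python
import re
from typing import Iterable, List


def split_identifier(identifier: str) -> List[str]:
    """Split camelCase and UPPER_CASE identifiers into lowercase words."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", identifier)
    return [p.lower() for p in parts]


def structural_terms(class_name: str, method_name: str,
                     parameter_types: Iterable[str]) -> List[str]:
    """Collect unique words from the method name, class name, and parameter types."""
    terms: List[str] = []
    for identifier in [method_name, class_name, *parameter_types]:
        for word in split_identifier(identifier):
            if word not in terms:
                terms.append(word)
    return terms


def combine(lexical: List[str], structural: List[str], k: int = 5) -> List[str]:
    """Merge both term lists without duplicates, structural terms first (assumed order)."""
    merged = list(structural)
    for term in lexical:
        if term not in merged:
            merged.append(term)
    return merged[:k]


# Invented signature: FileVFS.listDirectory(String)
signature_terms = structural_terms("FileVFS", "listDirectory", ["String"])
print(combine(["directory", "path", "entry"], signature_terms, k=6))
```

How many of these terms to keep, and how to weight them against the lexical ones, is left open here.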

What are we doing now?

• What type and how much structural
information should be included in code
summaries?
• How do developers generate summaries?
• Are different summaries needed for
different tasks?
• How useful are the code summaries for
SE tasks?
• Etc.

In summary…
• Automatic code summaries:
– Short yet accurate descriptions of source code
– Can reduce the effort of program comprehension
– Embed both semantic and structural information
– Can be generated for a variety of software entities

• Visit my poster
(HINT: look for the huge and colorful one)
• www.cs.wayne.edu/~severe and
www.cs.wayne.edu/~shaiduc
• sonja@wayne.edu
