Supporting Program Comprehension with Source Code Summarization (ICSE NIER 2010)

One of the main challenges faced by today’s developers is keeping up with the staggering amount of source code that needs to be read and understood. In order to help developers with this problem and reduce the costs associated with it, one solution is to use simple textual descriptions of source code entities that developers can grasp easily, while capturing the code semantics precisely. We propose an approach to automatically determine such descriptions, based on automated text summarization technology and structural information.
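
One way to picture such a description is as a small record that pairs a handful of relevant terms (the semantic part) with a few structural facts about the entity. The sketch below is only an illustration under that assumption: the `CodeSummary` class, its field names, and the `render` method are hypothetical and are not taken from the authors' tool. The sample values are borrowed from the VFS example shown in the deck.

```python
from dataclasses import dataclass, field


@dataclass
class CodeSummary:
    """Hypothetical container pairing lexical terms with structural facts."""

    entity: str
    terms: list[str]  # semantic part: the most relevant terms
    structure: dict[str, list[str]] = field(default_factory=dict)  # structural part

    def render(self) -> str:
        """Format the summary as a 'Description' line plus one '+Label:' line per fact."""
        lines = [f"Description: {' '.join(self.terms)}"]
        for label, names in self.structure.items():
            if names:  # skip empty categories
                lines.append(f"+{label}: {', '.join(names)}")
        return "\n".join(lines)


# Values copied from the deck's VFS example; the grouping into a dict is an assumption.
summary = CodeSummary(
    entity="VFS",
    terms=["VFS", "virtual", "file", "system", "read", "write"],
    structure={
        "Methods": ["listDirectory", "mkdir", "constructPath"],
        "Fields": ["WRITE_CAP", "READ_CAP", "lock"],
        "Sub-classes": ["FileVFS", "FavoritesVFS"],
    },
)
print(summary.render())
```

Keeping the terms and the structural facts in separate fields makes it easy to vary how much of each goes into a summary.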


  1. Supporting Program Comprehension with Source Code Summarization
     Sonia Haiduc, Jairo Aponte, Andrian Marcus
     ICSE NIER 2010
  2. Developers read source code
     • Before performing maintenance on a system, developers need to understand its source code
     • During comprehension, programmers search and browse the code
  3. Skimming vs. reading code
     • Skimming (Starke’09): quickly reading the names of software artifacts
       + Fast
       – Insufficient information
       – Shallow understanding
     • Reading in depth
       – Slow
       – Too much information
       + Deeper understanding
  4. Code summaries
     • Automatically generated, short, yet accurate descriptions of source code entities
     • They give more information than just the header or the name of an artifact
     • Significantly shorter and faster to read than the source code they summarize
  5. What should we summarize?
     • Code
       – Packages
       – Classes
       – Methods
       – Method sequences
       – Etc.
     • Other artifacts
       – Bug reports (ICSE 2010 - S. Rastkar, G. Murphy, G. Murray)
       – E-mails
       – Etc.
  6. What should we include in code summaries?
     • Semantic information
       – What does the source code do?
       – Identifiers and comments that capture the main concepts
     • Structural information
       – How does the code work?
       – Class relationships, callers and callees, members of a class, etc.
  7. Description: VFS virtual file system read write mkdir directory path save
     +Internal classes: DirectoryEntry
     +Methods: listDirectory, mkdir, constructPath
     +Fields: WRITE_CAP, READ_CAP, lock
     +Sub-classes: FileVFS, FavoritesVFS
     +Other: ...
  8. How should we generate code summaries?
     • Semantic information: automatic text summarization
       – Machine Learning
       – Discourse-based approaches
       – Term-based Text Retrieval techniques
     • Structural information: static analysis
  9. How can we evaluate code summaries?
     • How good are the automatic summaries when compared to manual ones?
     • How useful are the automatic code summaries for SE tasks?
  10. Preliminary evaluation
      • Compared automatic code summaries with developer code summaries
      • 6 developers, 12 methods in ATunes
      • Used only lexical information
        – 5 most relevant terms
  11. Results
      • Automatic source code summaries are good at reflecting developers’ summaries
      • Text Retrieval techniques work as well on source code as on natural language in reflecting human summaries
      • Developers make use of structural information in their code summaries:
        – Method name terms
        – Class name terms
        – Formal parameter type terms
  12. What are we doing now?
      • What type and how much structural information should be included in code summaries?
      • How do developers generate summaries?
      • Are different summaries needed for different tasks?
      • How useful are the code summaries for SE tasks?
      • Etc.
  13. In summary…
      • Automatic code summaries:
        – Short yet accurate descriptions of source code
        – Can reduce the effort of program comprehension
        – Embed both semantic and structural information
        – Can be generated for a variety of software entities
      • Visit my poster (HINT: look for the huge and colorful one)
      • www.cs.wayne.edu/~severe and www.cs.wayne.edu/~shaiduc
      • sonja@wayne.edu
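
The slides mention term-based Text Retrieval techniques and summaries made of the 5 most relevant terms, but do not say which weighting was used. As a rough sketch of the idea, the code below splits identifiers into words, ranks the terms of one method with TF-IDF against the other methods, and measures how many developer-chosen terms the automatic list recovers. TF-IDF, the `KEYWORDS` stop list, the `overlap` measure, and the three toy method bodies are all assumptions made for illustration; none of them is the study's actual setup or metric.

```python
import math
import re
from collections import Counter

# Small, illustrative stop list of Java keywords and common type names (assumption).
KEYWORDS = {"boolean", "string", "new", "return", "void", "int", "if", "else"}


def split_identifier(name):
    """Split camelCase, UPPER_CASE and snake_case identifiers into lowercase words."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [p.lower() for p in parts]


def tokenize(source):
    """Extract identifier words from source text, dropping keywords and single letters."""
    words = []
    for ident in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source):
        words.extend(w for w in split_identifier(ident)
                     if len(w) > 1 and w not in KEYWORDS)
    return words


def top_terms(target, corpus, k=5):
    """Rank the terms of corpus[target] by TF-IDF and return the k best."""
    docs = {name: Counter(tokenize(text)) for name, text in corpus.items()}
    n_docs = len(docs)
    scores = {}
    for term, count in docs[target].items():
        df = sum(1 for d in docs.values() if term in d)  # document frequency
        scores[term] = count * math.log(n_docs / df)
    # Sort by descending score, then alphabetically so ties are deterministic.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]


def overlap(auto_terms, human_terms):
    """Fraction of the human-chosen terms that also appear in the automatic list."""
    auto, human = set(auto_terms), set(human_terms)
    return len(auto & human) / len(human) if human else 0.0


# Toy method bodies, invented for this example.
corpus = {
    "mkdir": "boolean mkdir(String directoryPath) { File dir = new File(directoryPath); return dir.mkdir(); }",
    "listDirectory": "File[] listDirectory(String directoryPath) { return new File(directoryPath).listFiles(); }",
    "save": "void save(Buffer buffer, String savePath) { buffer.writeTo(savePath); }",
}

auto = top_terms("mkdir", corpus, k=5)
print(auto)
print(overlap(auto, ["mkdir", "directory", "create"]))
```

Terms that occur in every method (here `path`) get a weight of zero, which is the usual reason for preferring TF-IDF over raw counts when the goal is to pick terms that distinguish one method from the rest.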