
ICPC 2012 - Mining Source Code Descriptions

Very often, source code lacks comments that adequately describe its behavior. In such situations developers need to infer knowledge from the source code itself, or to search for source code descriptions in external artifacts. We argue that messages exchanged among contributors/developers, in the form of bug reports and emails, are a useful source of information to help in understanding source code. However, such communications are unstructured and usually not explicitly meant to describe specific parts of the source code. Developers searching for code descriptions within communications face the challenge of filtering large amounts of data to extract the pieces of information that are important to them. We propose an approach to automatically extract method descriptions from communications in bug tracking systems and mailing lists. We have evaluated the approach on bug reports and mailing lists from two open source systems (Lucene and Eclipse). The results indicate that mailing lists and bug reports contain relevant descriptions of about 36% of the methods from Lucene and 7% from Eclipse, and that the proposed approach is able to extract such descriptions with a precision of up to 79% for Eclipse and 87% for Lucene. The extracted method descriptions can help developers in understanding the code and could also be used as a starting point for source code re-documentation.


  1. 1. Mining Source Code Descriptions from Developer Communications. Sebastiano Panichella, Jairo Aponte, Massimiliano Di Penta, Andrian Marcus, Gerardo Canfora
  2. 2. Context: Software Project. [Diagram labels: Documentation (Sequence diagram, Class diagram) describes Source Code; Developer; Program Comprehension; Maintenance Tasks; understanding; difficult understanding]
  3. 3. Context: Software Project. Coming back to reality... [Diagram labels: Documentation (Sequence diagram, Class diagram) describes Source Code; Developer; Program Comprehension; Maintenance Tasks; understanding; difficult understanding]
  4. 4. Idea
     Developers need to infer knowledge from the source code itself, or from source code descriptions in external artifacts.
     We argue that messages exchanged among contributors/developers are a useful source of information to help in understanding source code.
     Example message:
     "When call the method IndexSplitter.split(File destDir, String[] segs) from the Lucene cotrib directory (contrib/misc/src/java/org/apache/lucene/index) it creates an index with segments descriptor file with wrong data. Namely wrong is the number representing the name of segment that would be created next in this index."
     CLASS: IndexSplitter, METHOD: split
  5. 5. A Five-Step Approach for Mining Method Descriptions
  6. 6. Step 1: Downloading emails/bug reports and tracing them onto classes
     Two heuristics:
     • the discussion contains a fully-qualified class name (e.g., org.apache.lucene.analysis.MappingCharFilter); or
     • the email contains a file name (e.g., MappingCharFilter.java).
     For bug reports, we complement the above heuristics by matching the bug ID of each closed bug to the commit notes, thereby tracing the bug report to the files changed in that commit.
     Example developer discussion (traced to CLASS: IndexSplitter):
     "When call the method IndexSplitter.split(File destDir, String[] segs) from the Lucene cotrib directory (contrib/misc/src/java/org/apache/lucene/index) it creates an index with segments descriptor file with wrong data. Namely wrong is the number representing the name of segment that would be created next in this index.
     public void split(File destDir, String[] segs) throws IOException { destDir.mkdirs(); FSDirectory destFSDir = FSDirectory.open(destDir); SegmentInfos destInfos = new SegmentInfos }
     If some of the segments of the index already has this name this results either to impossibility to create new segment or in crating of an corrupted segment."
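
The two textual heuristics of Step 1 can be sketched roughly as follows. This is only an illustration: the regular expressions, the function name `trace_to_classes`, and the idea of checking candidates against a set of known class names are assumptions, not details given in the slides. The bug-ID-to-commit matching is not covered.

```python
import re

# Assumed pattern for a fully-qualified Java class name:
# lowercase package segments followed by a capitalized class name.
FQN_RE = re.compile(r"\b(?:[a-z_][a-z0-9_]*\.)+([A-Z][A-Za-z0-9_]*)\b")
# Assumed pattern for a Java source file name such as MappingCharFilter.java.
FILE_RE = re.compile(r"\b([A-Z][A-Za-z0-9_]*)\.java\b")


def trace_to_classes(text, known_classes):
    """Return the known class names that the message text refers to."""
    candidates = set(FQN_RE.findall(text)) | set(FILE_RE.findall(text))
    # Keep only names that really are classes of the analyzed system.
    return sorted(candidates & set(known_classes))


if __name__ == "__main__":
    message = ("The filter org.apache.lucene.analysis.MappingCharFilter fails; "
               "see IndexSplitter.java")
    print(trace_to_classes(message, {"MappingCharFilter", "IndexSplitter"}))
```

Restricting matches to known class names is one way to avoid tracing a message to identifiers that merely look like class names.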
  7. 7. Step 2: Extracting paragraphs
     Two heuristics:
     • We consider as paragraphs the text sections separated by one or more blank lines.
     • We prune source code fragments and/or stack traces out of the paragraph descriptions, using an approach inspired by the work of Bacchelli et al.
     Example developer discussion:
     PAR 1: "When call the method IndexSplitter.split(File destDir, String[] segs) from the Lucene cotrib directory (contrib/misc/src/java/org/apache/lucene/index) it creates an index with segments descriptor file with wrong data. Namely wrong is the number representing the name of segment that would be created next in this index."
     PAR 2: public void split(File destDir, String[] segs) throws IOException { destDir.mkdirs(); FSDirectory destFSDir = FSDirectory.open(destDir); SegmentInfos destInfos = new SegmentInfos }
     PAR 3: "If some of the segments of the index already has this name this results either to impossibility to create new segment or in crating of an corrupted segment."
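
A minimal sketch of Step 2, assuming plain-text messages. Splitting on blank lines follows the slide; the code/stack-trace detector below is a deliberately crude stand-in (line endings such as `;`, `{`, `}` and `at pkg.Class.method(...)` frames), not the approach of Bacchelli et al. that the slides refer to. All function names are hypothetical.

```python
import re

# Crude, assumed signal for code-like lines: a line ending in ';', '{' or '}',
# or a Java stack-trace frame of the form "at pkg.Class.method(File.java:12)".
CODE_HINT = re.compile(r"[;{}]\s*$|^\s*at\s+[\w.$]+\(.*\)\s*$")


def split_paragraphs(message):
    """Split a message into paragraphs separated by one or more blank lines."""
    blocks = re.split(r"\n\s*\n", message.strip())
    return [b.strip() for b in blocks if b.strip()]


def looks_like_code(paragraph):
    """Treat a paragraph as code if at least half of its lines look code-like."""
    lines = [ln for ln in paragraph.splitlines() if ln.strip()]
    hits = sum(1 for ln in lines if CODE_HINT.search(ln))
    return hits * 2 >= len(lines)


def extract_text_paragraphs(message):
    """Return the natural-language paragraphs, dropping code-like ones."""
    return [p for p in split_paragraphs(message) if not looks_like_code(p)]
```

On the example discussion, this would keep PAR 1 and PAR 3 and drop PAR 2.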
  8. 8. Step 3: Tracing paragraphs onto methods
     These paragraphs must respect the following two conditions:
     A) a valid paragraph must contain the keyword "method";
     B) the method name must be followed by an open parenthesis, i.e., we match "foo(".
     Example developer discussion:
     PAR 1: "When call the method IndexSplitter.split(File destDir, String[] segs) from the Lucene cotrib directory it creates an index with segments descriptor file with wrong data. Namely wrong is the number representing the name of segment that would be created next in this index."
     CLASS: IndexSplitter, METHOD: split(
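
Conditions A and B of Step 3 translate fairly directly into two pattern checks. The sketch below is one possible reading: matching "method" case-insensitively as a whole word is an assumption, and `methods_described` is a hypothetical name.

```python
import re


def methods_described(paragraph, method_names):
    """Return the method names a paragraph can be traced to.

    Condition A: the paragraph contains the keyword "method"
    (assumed here: whole word, case-insensitive).
    Condition B: the method name is immediately followed by "(".
    """
    if not re.search(r"\bmethod\b", paragraph, re.IGNORECASE):
        return []
    return [
        name
        for name in method_names
        if re.search(r"\b" + re.escape(name) + r"\(", paragraph)
    ]


if __name__ == "__main__":
    par = ("When call the method IndexSplitter.split(File destDir, String[] segs) "
           "it creates an index with segments descriptor file with wrong data.")
    print(methods_described(par, ["split", "close"]))
```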
  9. 9. Step 4: Heuristic based Filtering
     We defined a set of heuristics that assign each paragraph a score, to further filter the paragraphs associated with methods.
     a) Method parameters: s1 = % of parameters mentioned in the paragraph (value between 0 and 1); s1 = 1 if the method does not have parameters.
     Example paragraph:
     "Problem seems to come from method MainMethodeSearchEngine in org.eclipse.jdt.internal.ui.launcher. The Method searchMainMethods(IProgressMonitor, IJavaSearchScope, boolean), there's a call to addSubTypes(List, IProgressMonitor, IJavaSearchScope) Method if includesSubtypes flag is ON. This method add all types sub-types as soon as the given scope encloses them without testing if sub-types have a main! After return IType[] before the excecution"
     CLASS: MainMethodSearchEngine, METHOD: searchMainMethods
     % parameters = 100% -> s1 = 1
  10. 10. Step 4: Heuristic based Filtering (same example paragraph)
     b) Syntactic descriptions (mentioning return values): check whether the paragraph contains the keyword "return". s2 = 1 if yes, 0 otherwise; s2 = 1 if the method is void.
     SCORE = 1 + ...
  11. 11. Step 4: Heuristic based Filtering (same example paragraph)
     c) Overriding/Overloading: s3 = 1 if any of the "overload" or "override" keywords appears in the paragraph, 0 otherwise.
     d) Method invocations: s4 = 1 if any of the "call" or "execute" keywords appears in the paragraph, 0 otherwise.
     SCORE = 1 + 0 + 1 = 2
  12. 12. Step 4: Heuristic based Filtering
     We selected paragraphs that have:
     1. s1 ≥ thP = 0.5
     2. s2 + s3 + s4 ≥ thH = 1
     For the example paragraph: % parameters = 100% -> s1 = 1 ≥ 0.5, and SCORE = 1 + 0 + 1 = 2 ≥ 1, so the paragraph is selected (OK).
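
The four Step 4 scores and the two selection thresholds can be sketched as below. Several details are assumptions rather than facts from the slides: parameters are counted as "mentioned" when their type name occurs as a substring of the paragraph, keywords are matched case-insensitively as word prefixes (so "called" counts for "call"), and the function names are hypothetical.

```python
import re

TH_P = 0.5  # threshold on s1 (thP in the slides)
TH_H = 1    # threshold on s2 + s3 + s4 (thH in the slides)


def has_keyword(paragraph, stems):
    """True if any stem starts a word in the paragraph (assumed matching rule)."""
    return any(re.search(r"\b" + stem, paragraph, re.IGNORECASE) for stem in stems)


def score(paragraph, param_types, is_void):
    """Compute (s1, s2, s3, s4) for a paragraph traced to a method."""
    if param_types:
        # Assumption: a parameter is "mentioned" if its type name occurs in the text.
        mentioned = sum(1 for p in param_types if p in paragraph)
        s1 = mentioned / len(param_types)
    else:
        s1 = 1.0  # methods without parameters get s1 = 1
    s2 = 1 if is_void or has_keyword(paragraph, ["return"]) else 0
    s3 = 1 if has_keyword(paragraph, ["overload", "override"]) else 0
    s4 = 1 if has_keyword(paragraph, ["call", "execute"]) else 0
    return s1, s2, s3, s4


def keep(paragraph, param_types, is_void):
    """Apply the two selection conditions: s1 >= thP and s2 + s3 + s4 >= thH."""
    s1, s2, s3, s4 = score(paragraph, param_types, is_void)
    return s1 >= TH_P and (s2 + s3 + s4) >= TH_H
```

With the example paragraph and the parameter types `IProgressMonitor`, `IJavaSearchScope`, and `boolean`, this sketch gives s1 = 1 and s2 + s3 + s4 = 1 + 0 + 1 = 2, in line with the slide.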
  13. 13. Step 5: Similarity based Filtering
     We rank filtered paragraphs through their textual similarity with the method they are likely describing.
     Removing: English stop words; programming language keywords.
     Using: Camel Case splitting on the remaining words; Vector Space Model.
     Threshold: th >= 0.80

     | METHOD | PARAGRAPH | SCORE | Similarity |
     | --- | --- | --- | --- |
     | Method_3 | Paragraph_4 | 2.5 | 96.1% |
     | Method_1 | Paragraph_1 | 2.5 | 95.6% |
     | Method_2 | Paragraph_2 | 1.5 | 97.4% |
     | Method_3 | Paragraph_3 | 1.5 | 86.2% |
     | Method_1 | Paragraph_3 | 1.5 | 79.0% |
     | Method_3 | Paragraph_2 | 1.5 | 77.5% |
     | Method_2 | Paragraph_4 | 1.5 | 64.3% |
     | Method_2 | Paragraph_3 | 1.3 | 83.2% |
     | Method_3 | Paragraph_1 | 1.3 | 73.9% |
     | Method_2 | Paragraph_1 | 1.3 | 68.7% |
     | Method_1 | Paragraph_4 | 1.3 | 53.6% |
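
To make Step 5 concrete, here is a small Vector Space Model sketch using cosine similarity over raw term frequencies. The slides do not state the term weighting, so plain term counts are an assumption, and the stop-word and keyword lists are tiny illustrative placeholders. The function names are hypothetical.

```python
import math
import re
from collections import Counter

# Tiny illustrative lists; a real setup would use full stop-word/keyword lists.
STOP_WORDS = {"the", "a", "an", "of", "to", "in", "is", "it", "and", "with"}
JAVA_KEYWORDS = {"public", "void", "new", "return", "throws", "int", "class", "static"}


def tokens(text):
    """Split identifiers on camel case, lowercase, drop stop words and keywords."""
    words = []
    for ident in re.findall(r"[A-Za-z]+", text):
        parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", ident)
        words.extend(p.lower() for p in parts)
    return [w for w in words if w not in STOP_WORDS and w not in JAVA_KEYWORDS]


def cosine(text_a, text_b):
    """Cosine similarity between the term-frequency vectors of two texts."""
    va, vb = Counter(tokens(text_a)), Counter(tokens(text_b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


def rank(paragraphs, method_source, threshold=0.80):
    """Keep paragraphs whose similarity to the method reaches the threshold,
    most similar first."""
    scored = [(cosine(p, method_source), p) for p in paragraphs]
    return sorted((sp for sp in scored if sp[0] >= threshold), reverse=True)
```

The default threshold of 0.80 mirrors the value shown on the slide; `method_source` stands for the text of the method the paragraph was traced to.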
  14. 14. Empirical Study
     • Goal: analyze source code descriptions in developer discussions
     • Purpose: investigate how developer discussions describe methods of Java source code
     • Quality focus: find good method descriptions in messages exchanged among contributors/developers
     • Context: bug reports and mailing lists of two Java projects, Apache Lucene and Eclipse
  15. 15. Context
  16. 16. Research Questions
     • RQ1 (method coverage): How many methods from the analyzed software systems are described by the paragraphs identified by the proposed approach?
     • RQ2 (precision): How precise is the proposed approach in identifying method descriptions?
     • RQ3 (missing descriptions): How many potentially good method descriptions are missed by the approach?
  17. 17. RQ1: How many methods from the analyzed software systems are described by the paragraphs identified by the proposed approach?
  18. 18. RQ2: How precise is the proposed approach in identifying method descriptions? We sampled 250 descriptions from each project
  19. 19. RQ3: How many potentially good method descriptions are missed by the approach? We sampled 100 descriptions from each project.
     TABLE III: The analysis of a sample of 100 paragraphs traced to methods, but not satisfying the Step 4 heuristic

     | System | True Negatives | False Negatives |
     | --- | --- | --- |
     | Eclipse | 78 | 22 |
     | Apache Lucene | 67 | 33 |
  20. 20. Conclusion
