Comment Analysis approach for Program Comprehension (Software Engineering Workshop - Crete)

How comments can be used to locate problem domain concepts and then identify the related source code.

  1. A Comment Analysis Approach for Program Comprehension
      José L. Freitas, Daniela da Cruz, Pedro R. Henriques
      Universidade do Minho, Portugal
      Software Engineering Workshop, Crete, Oct. 12-13, 2012
      Freitas, Cruz, Henriques Comment Analysis for Program Comprehension
  2. Context
      Program Comprehension (PC) is a vital task of Software Maintenance.
      In Software Maintenance, 50% of the time is spent on comprehending the system.
      Several source code analysis approaches have been applied to develop PC tools: program slicing, control-flow analysis, data-flow analysis, etc.
  3. Motivation
      Most PC tools are based on the extraction of structural information.
      Example: function Y is called by function X n times.
      However, they do not extract the meaning of a program, or the Problem Domain concepts related to it.
      Example: function Y calculates the amount of credit of a banking account.
  4. Motivation
      Comments, alongside identifiers, can be the biggest source of semantic information in code:

          /* This function receives the id number of a banking account
             and returns the available amount of credit */
          int credit(int id) { ... }

      Why not use comments to search for the Problem Domain concepts needed to understand a program?
  5. Bad and Good Comments
      When is a comment bad, and when is it good?
      Apart from the existing controversy around this subject, a bad comment is, to begin with, one that is inconsistent with the code it comments, and therefore misleads whoever reads it.
      Brooks states that comments help comprehension if they provide Problem and Program Domain information and the means to establish bridges between those two domains.
  6. Goal
      Create a Program Comprehension tool that explores comments to search for Problem Domain concepts: Darius (1).
      (1) Named after King Darius I of Persia, the first known man to build a bridge between Europe and Asia, across the Bosphorus strait.
  7. Outline
      1 Darius Comment Evaluator: Preliminary study
      2 Darius Concept Locator: Experiment
      3 Conclusion
  8. Darius - Comment Evaluator
      The first version of Darius analyzes:
      Comment Quantity: number of comments, percentage of comments, etc.
      Comment Content: use of Problem Domain and Program Domain terms.
  9. Darius - Preliminary study
      Comment Evaluator modules (figure)
  10. Darius GUI (1)
  11. Darius - Comment Extractor module
      Darius extracts three types of comments:
      1. Inline Comments, IC for short: // ...
      2. Block Comments, BC for short: /* ... */
      3. JavaDoc Comments, JD for short: /** ... */
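A minimal Java sketch of how the three types can be told apart by their opening delimiter. It is not Darius' code: `CommentExtractor`, `Kind` and `extract` are hypothetical names, and the scanner assumes ordinary Java source without text blocks. It skips string and character literals so that a `//` inside a string is not taken for a comment, and it records the line on which each comment ends, which is where a next-line lookup would start.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a comment extractor; not Darius' actual code.
// Assumes plain Java source without text blocks ("""...""").
class CommentExtractor {
    enum Kind { IC, BC, JD } // inline //, block /* */, Javadoc /** */

    // text: raw comment including delimiters; endLine: 1-based line where it ends
    record Comment(Kind kind, String text, int endLine) {}

    static List<Comment> extract(String src) {
        List<Comment> out = new ArrayList<>();
        int n = src.length();
        int i = 0;
        int line = 1;
        while (i < n) {
            char c = src.charAt(i);
            if (c == '"' || c == '\'') {
                // Skip a string or char literal so "//" inside it is ignored.
                int j = i + 1;
                while (j < n && src.charAt(j) != c) {
                    if (src.charAt(j) == '\\') j++; // skip the escaped character
                    j++;
                }
                i = j + 1;
            } else if (src.startsWith("//", i)) {
                int end = src.indexOf('\n', i);
                if (end < 0) end = n;
                out.add(new Comment(Kind.IC, src.substring(i, end), line));
                i = end; // the newline itself is counted by the last branch
            } else if (src.startsWith("/*", i)) {
                int close = src.indexOf("*/", i + 2);
                int end = close < 0 ? n : close + 2; // unterminated: take the rest
                String body = src.substring(i, end);
                // "/**/" is an empty block comment, not Javadoc.
                Kind kind = body.startsWith("/**") && !body.equals("/**/") ? Kind.JD : Kind.BC;
                line += (int) body.chars().filter(ch -> ch == '\n').count();
                out.add(new Comment(kind, body, line));
                i = end;
            } else {
                if (c == '\n') line++;
                i++;
            }
        }
        return out;
    }
}
```

For example, `extract("int n = 0; // counter")` yields a single IC comment ending on line 1.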
  12. Darius - Comment Extractor module
      To identify which type of source code entity a comment is associated with, Darius also extracts the line that follows the comment.
      Darius associates comments with:
      1. classes
      2. interfaces
      3. methods
      4. conditionals (if)
      5. loops (while and for)
      6. switches
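One way to implement this next-line heuristic is to classify that line with a few regular expressions. The sketch below is hypothetical (`EntityClassifier` and `classify` are illustrative names) and deliberately crude: it does not skip annotations or blank lines, and a parser-based approach would be more reliable.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: guess the kind of entity a comment documents from the
// line that follows it. Regex heuristics only, not a real Java parser.
class EntityClassifier {
    enum Entity { CLASS, INTERFACE, METHOD, IF, WHILE, FOR, SWITCH, OTHER }

    private static final Pattern IF_RE = Pattern.compile("^(\\}\\s*)?(else\\s+)?if\\s*\\(");
    private static final Pattern WHILE_RE = Pattern.compile("^while\\s*\\(");
    private static final Pattern FOR_RE = Pattern.compile("^for\\s*\\(");
    private static final Pattern SWITCH_RE = Pattern.compile("^switch\\s*\\(");
    private static final Pattern INTERFACE_RE = Pattern.compile("\\binterface\\s+\\w+");
    private static final Pattern CLASS_RE = Pattern.compile("\\bclass\\s+\\w+");
    // Lines that look like calls but are statements, not declarations.
    private static final Pattern STATEMENT_RE = Pattern.compile("^(return|throw|new)\\b");
    // Modifiers and return type, a name, a parameter list, optional throws,
    // then an optional '{' or ';'.
    private static final Pattern METHOD_RE = Pattern.compile(
            "^[\\w<>\\[\\],.\\s]+\\s+\\w+\\s*\\([^)]*\\)\\s*(throws\\s+[\\w.,\\s]+)?[{;]?$");

    static Entity classify(String nextLine) {
        String s = nextLine.trim();
        if (IF_RE.matcher(s).find()) return Entity.IF;
        if (WHILE_RE.matcher(s).find()) return Entity.WHILE;
        if (FOR_RE.matcher(s).find()) return Entity.FOR;
        if (SWITCH_RE.matcher(s).find()) return Entity.SWITCH;
        if (INTERFACE_RE.matcher(s).find()) return Entity.INTERFACE;
        if (CLASS_RE.matcher(s).find()) return Entity.CLASS;
        if (STATEMENT_RE.matcher(s).find()) return Entity.OTHER;
        if (METHOD_RE.matcher(s).matches()) return Entity.METHOD;
        return Entity.OTHER;
    }
}
```

Interfaces are tested before classes so that the order of the checks, not the regexes alone, resolves which label wins.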
  13. Darius - Statistics Calculator module
      Number of comments of a project (global, per type of comment and per line of source code);
      Average number of comment lines per line of code;
      Average number of lines of a non-inline comment;
      Average number of each type of source code entity that is commented;
      Most used type of comment (global and per source code entity).
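Once each comment carries a type, a length and the entity it documents, most of these statistics are simple aggregations. The following sketch is an assumption about how they could be computed, covering only a subset of the list; `CommentStats` and its methods are hypothetical names.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of some of the statistics above, computed from comments
// that were already extracted and attached to an entity.
class CommentStats {
    enum Kind { IC, BC, JD }
    enum Entity { CLASS, INTERFACE, METHOD, IF, WHILE, FOR, SWITCH, OTHER }

    // lines: number of physical lines the comment spans
    record Comment(Kind kind, int lines, Entity entity) {}

    // Number of comments per type of comment.
    static Map<Kind, Integer> countPerKind(List<Comment> cs) {
        Map<Kind, Integer> m = new EnumMap<>(Kind.class);
        for (Comment c : cs) m.merge(c.kind(), 1, Integer::sum);
        return m;
    }

    // Comment lines per line of code.
    static double commentLinesPerLoc(List<Comment> cs, int linesOfCode) {
        int total = cs.stream().mapToInt(Comment::lines).sum();
        return (double) total / linesOfCode;
    }

    // Average length, in lines, of block and Javadoc comments.
    static double avgNonInlineLines(List<Comment> cs) {
        return cs.stream().filter(c -> c.kind() != Kind.IC)
                 .mapToInt(Comment::lines).average().orElse(0.0);
    }

    // Most used type of comment for one kind of entity (null if none is
    // commented). Ties go to the type declared first in Kind.
    static Kind mostUsedKind(List<Comment> cs, Entity e) {
        Map<Kind, Integer> m = new EnumMap<>(Kind.class);
        for (Comment c : cs) if (c.entity() == e) m.merge(c.kind(), 1, Integer::sum);
        Kind best = null;
        for (Map.Entry<Kind, Integer> en : m.entrySet())
            if (best == null || en.getValue() > m.get(best)) best = en.getKey();
        return best;
    }
}
```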
  14. Darius - Words Analyzer module
      Given a list of words extracted from the ontology of the Problem Domain, Darius computes:
      the percentage and frequency of the listed words found in comments;
      the frequency of each type of comment that contains words from the list;
      the frequency of each type of commented source code entity that contains words from the list.
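The slide does not define "frequency" precisely. The sketch below assumes it means the percentage of comments of a given type that contain at least one listed word, and computes that alongside the percentage of listed words found anywhere in the comments. `WordsAnalyzer` and its methods are hypothetical names, and words are compared as lower-cased alphabetic tokens.

```java
import java.util.Collections;
import java.util.EnumMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of two Words Analyzer metrics (definitions assumed).
class WordsAnalyzer {
    enum Kind { IC, BC, JD }
    record Comment(Kind kind, String text) {}

    // Lower-cased alphabetic words of a piece of text.
    static Set<String> words(String text) {
        Set<String> out = new HashSet<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("[^a-z]+"))
            if (!w.isEmpty()) out.add(w);
        return out;
    }

    static Set<String> normalize(List<String> list) {
        Set<String> out = new HashSet<>();
        for (String w : list) out.add(w.toLowerCase(Locale.ROOT));
        return out;
    }

    // Percentage of the listed domain words that occur in at least one comment.
    static double percentFound(List<String> domainWords, List<Comment> cs) {
        Set<String> seen = new HashSet<>();
        for (Comment c : cs) seen.addAll(words(c.text()));
        Set<String> dict = normalize(domainWords);
        long found = dict.stream().filter(seen::contains).count();
        return 100.0 * found / dict.size();
    }

    // Per comment type: percentage of comments containing at least one listed word.
    static Map<Kind, Double> percentPerKind(List<String> domainWords, List<Comment> cs) {
        Set<String> dict = normalize(domainWords);
        Map<Kind, int[]> counts = new EnumMap<>(Kind.class); // {with a listed word, total}
        for (Comment c : cs) {
            int[] k = counts.computeIfAbsent(c.kind(), x -> new int[2]);
            k[1]++;
            if (!Collections.disjoint(words(c.text()), dict)) k[0]++;
        }
        Map<Kind, Double> out = new EnumMap<>(Kind.class);
        counts.forEach((kind, k) -> out.put(kind, 100.0 * k[0] / k[1]));
        return out;
    }
}
```

The same per-type counting could be keyed by source code entity instead of comment type to obtain the third metric.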
  15. Darius - Words Analyzer module
  16. Outline (as in slide 7)
  17. Darius - Preliminary study
      For a preliminary study, 10 open-source software projects written in Java were selected.
      Open-source projects were chosen for two reasons:
      1. their source code is freely available;
      2. they are heavily used by the community, which changes and manipulates the source code over and over again.
      Such projects tend to be constantly updated, so comprehension tasks are involved, and commenting can be a proper way of helping with these tasks.
  18. Darius - Preliminary study
      Table: Description and size of each selected project

      Project       Description                 Files      LoC  Classes
      iText         PDF Library                   480   145666      403
      ganttproject  Project Management Library    530    68945      394
      gwt-dev       Google's Web Toolkit          987   192738      803
      jEdit         Text Editor                   531   176006      404
      vuze          Peer-to-peer client          3284   785935     2463
      junit         Testing Framework             154    10926      130
      jfreechart    Chart Library                 989   313231      876
      antlr         Grammar Framework             221    85867      212
      jexcelapi     Excel Library                 438    93876      166
      robocode      Programming Game of Robots    571    81519      485
      Total                                      8185  1954709     6336
  19. Darius - Preliminary study results
      Comment Quantity: 6 of the 10 test programs have ≥ 19% comments.
  20. Darius - Preliminary study results
      Table: Comments frequency in the projects (type of comments: #IC inline, #BC block, #JD JavaDoc)

      Project          #CM  CM/LOSC     #IC     #BC     #JD
      iText          13343     0.24    4930    3777    4636
      ganttproject    4468     0.11    2925     814     729
      gwt-dev        12969     0.16    7219     866    4884
      jEdit          18986     0.21     806   14421    3759
      vuze           27723     0.08   18245    2319    7159
      junit            519     0.21       2      77     440
      jfreechart     22516     0.27    6592    2530   13394
      antlr           5292     0.14    3903    1380       9
      jexcelapi       8354     0.26    2213     775    5366
      robocode        5071     0.19    3108     102    1861
      Total         119241     0.16   63633   13371   42237
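The per-project figures in this table can be cross-checked with a few lines of arithmetic. This sketch copies the values from the table (the class name `CommentTableCheck` is only illustrative) and confirms that #IC + #BC + #JD equals #CM for every project, that the #CM column adds up to the reported total of 119241, and that exactly six projects have a CM/LOSC of at least 0.19, which matches the "6/10 test programs ≥ 19% comments" result if that result refers to this column.

```java
import java.util.Arrays;

// Values copied from the "Comments frequency in the projects" table.
// Project order: iText, ganttproject, gwt-dev, jEdit, vuze, junit,
// jfreechart, antlr, jexcelapi, robocode.
class CommentTableCheck {
    // Per project: {#CM, #IC, #BC, #JD}
    static final int[][] COUNTS = {
            {13343, 4930, 3777, 4636}, {4468, 2925, 814, 729}, {12969, 7219, 866, 4884},
            {18986, 806, 14421, 3759}, {27723, 18245, 2319, 7159}, {519, 2, 77, 440},
            {22516, 6592, 2530, 13394}, {5292, 3903, 1380, 9}, {8354, 2213, 775, 5366},
            {5071, 3108, 102, 1861}};
    static final double[] CM_PER_LOSC = {0.24, 0.11, 0.16, 0.21, 0.08, 0.21, 0.27, 0.14, 0.26, 0.19};

    // True if #IC + #BC + #JD == #CM for every project.
    static boolean typesAddUp() {
        for (int[] r : COUNTS)
            if (r[1] + r[2] + r[3] != r[0]) return false;
        return true;
    }

    // Sum of the #CM column.
    static int totalComments() {
        return Arrays.stream(COUNTS).mapToInt(r -> r[0]).sum();
    }

    // Number of projects whose CM/LOSC is at least the given threshold.
    static long projectsAtLeast(double threshold) {
        return Arrays.stream(CM_PER_LOSC).filter(x -> x >= threshold).count();
    }

    public static void main(String[] args) {
        System.out.println(typesAddUp());          // true
        System.out.println(totalComments());       // 119241
        System.out.println(projectsAtLeast(0.19)); // 6
    }
}
```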
  21. Darius - Preliminary study results
      Table: Percentage of source code entities commented

      Project             If     For   While  Switch   Class Interf.  Method
      iText                5       7       7       7      89      90      76
      ganttproject         5       5       3       8      57      41      18
      gwt-dev              9      10       7       5      96      97      19
      jEdit                9       8       4       2      86      79      61
      vuze                 6       6       5       7      45      46      24
      junit                1       0       0       0      25      71      37
      jfreechart           6      10       2      18     100     100     100
      antlr               11      16       5       4      61      56      22
      jexcelapi           14      18      12       0      99     100      88
      robocode             7      11      12       3      76      94      20
      Total                7       8       6       5      69      60      45
  22. Darius - Preliminary study results
      Table: Most used type of comment per type of source code entity (NA: not applicable)

      Project             If     For   While  Switch   Class Interf.  Method
      iText               IC      IC      IC      IC      JD      JD      JD
      ganttproject        IC      IC      IC      IC      JD      JD      JD
      gwt-dev             IC      IC      IC      IC      JD      JD      JD
      jEdit               IC      IC      IC      IC      JD      JD      JD
      vuze                IC      IC      IC      IC      JD      JD      JD
      junit               IC      NA      NA      NA      JD      JD      JD
      jfreechart          IC      IC      IC      IC      JD      JD      JD
      antlr               IC      IC      IC      IC      BC      BC      BC
      jexcelapi           IC      IC      BC      NA      JD      JD      JD
      robocode            IC      IC      IC      IC      JD      JD      JD
      Total               IC      IC      IC      IC      JD      JD      JD
  23. Darius - Preliminary study results
      Comment Content: 10 of the 10 test programs have ≥ 23% Problem and Program Domain terms.
  24. Darius - Preliminary study results
      Goal: explore the content of comments by checking whether they contain Problem and Program Domain information.
      Information needed to run these tests:
      a list of Problem Domain terms for each of the software projects;
      a (single) list of Program Domain terms.
  25. Darius - Preliminary study results
      Table: Percentage of domain words found

      Project         Problem Domain  Program Domain
      iText                    92.31           86.76
      ganttproject             84.31            75.0
      gwt-dev                  56.34           86.76
      jEdit                    89.74           86.76
      vuze                     92.11           88.24
      junit                    81.82           67.65
      jfreechart               86.36           89.71
      antlr                    88.24           83.82
      jexcelapi                79.31           85.29
      robocode                 88.89           83.82
      Total                    82.21           83.38
  26. Darius - Preliminary study results
      Table: Frequency (%) of words of each domain per type of comment (Prob.: Problem Domain, Prog.: Program Domain)

                           IC              BC              JD
      Project          Prob.   Prog.   Prob.   Prog.   Prob.   Prog.
      iText            13.05   13.99    6.89    14.1    14.7   17.36
      ganttproject     13.76   13.52   14.78   11.58   14.67    14.2
      gwt-dev           0.96    19.7    2.03   18.31    2.62   22.16
      jEdit              5.1   17.15    6.44   24.76    9.28   16.69
      vuze               4.6   18.02    5.14   11.38    4.29   18.89
      junit                0    20.0   17.14   16.57   22.66   25.77
      jfreechart        20.7   20.73   16.74   12.45   15.58   21.41
      antlr            13.85   13.81   13.95    10.7    2.13   11.35
      jexcelapi        10.38   16.16   17.08   12.97   24.97   17.01
      robocode          17.0   14.06   16.52    12.6   25.13    12.5
      Total             9.97   16.27    8.33   14.58   13.13   19.13
  27. Darius - Preliminary study conclusion
      Higher-level source code entities tend to have comments oriented towards Problem Domain information, while comments on lower-level entities tend to include more Program Domain information.
  28. Outline (as in slide 7)
  29. Darius - Problem Concept Location
      Goal: search for Problem Domain concepts in order to find their mappings onto the source code.
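As a point of reference, the simplest form of comment-based concept location is a keyword match: terms from the query are looked up in each comment, and every matching comment is mapped back to the entity it documents. According to the slides on employed techniques, Darius relies on LSA and VSM; the keyword baseline below is only a hypothetical illustration, with made-up names (`ConceptLocator`, `locate`) and a small made-up stop-word list.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical keyword baseline for concept location via comments.
class ConceptLocator {
    // entity: the code entity a comment is attached to; comment: its text
    record Entry(String entity, String comment) {}
    record Hit(String entity, int matchedTerms) {}

    // Small illustrative stop-word list (assumption).
    private static final Set<String> STOP =
            Set.of("how", "a", "an", "the", "to", "is", "into", "of", "for", "in", "this");

    static Set<String> terms(String s) {
        Set<String> out = new HashSet<>();
        for (String t : s.toLowerCase(Locale.ROOT).split("[^a-z0-9]+"))
            if (!t.isEmpty() && !STOP.contains(t)) out.add(t);
        return out;
    }

    // Entities whose comment shares at least one term with the query,
    // most shared terms first (stable sort keeps index order on ties).
    static List<Hit> locate(String query, List<Entry> index) {
        Set<String> q = terms(query);
        List<Hit> hits = new ArrayList<>();
        for (Entry e : index) {
            Set<String> shared = terms(e.comment());
            shared.retainAll(q);
            if (!shared.isEmpty()) hits.add(new Hit(e.entity(), shared.size()));
        }
        hits.sort(Comparator.comparingInt(Hit::matchedTerms).reversed());
        return hits;
    }
}
```

Exact word matching misses variants such as "write" versus "writes"; that weakness is one motivation for the weighted and latent-semantic models that follow.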
  30. Darius GUI (2)
  31. Darius - Employed Techniques
      Latent Semantic Analysis (LSA): a natural language processing technique for analyzing the relationships between a set of documents and the terms they contain, by producing a set of concepts related to the documents and terms.
      LSA assumes that words that are close in meaning will occur close together in text. It constructs a matrix containing word counts per paragraph (rows represent unique words and columns represent paragraphs).
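In standard LSA the count matrix is then reduced with a truncated singular value decomposition (SVD), whose k largest singular values define the "concepts". The sketch below illustrates both steps in Java under simplifying assumptions: lower-cased alphabetic tokens, raw counts without weighting, and an SVD computed by power iteration with deflation on A^T A, which is adequate for a toy matrix but is not how a numerical library computes it. `Lsa` and `truncatedSvd` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

// Sketch of LSA: (1) term-by-document count matrix; (2) rank-k truncated SVD.
class Lsa {
    final List<String> terms; // row labels, sorted
    final double[][] a;       // a[t][d] = occurrences of term t in document d

    record Svd(double[] values, double[][] docVectors) {}

    Lsa(List<String> docs) {
        TreeSet<String> vocab = new TreeSet<>();
        List<List<String>> tokenized = new ArrayList<>();
        for (String d : docs) {
            List<String> toks = new ArrayList<>();
            for (String t : d.toLowerCase(Locale.ROOT).split("[^a-z]+"))
                if (!t.isEmpty()) toks.add(t);
            tokenized.add(toks);
            vocab.addAll(toks);
        }
        terms = new ArrayList<>(vocab);
        a = new double[terms.size()][docs.size()];
        for (int d = 0; d < docs.size(); d++)
            for (String t : tokenized.get(d))
                a[Collections.binarySearch(terms, t)][d]++;
    }

    // Top-k singular values of m and the matching right singular vectors
    // (one coordinate per document), via power iteration on g = m^T m.
    static Svd truncatedSvd(double[][] m, int k) {
        int n = m[0].length;
        double[][] g = new double[n][n];
        for (double[] row : m)
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++) g[i][j] += row[i] * row[j];
        double[] values = new double[k];
        double[][] vectors = new double[k][];
        for (int c = 0; c < k; c++) {
            double[] v = new double[n];
            for (int i = 0; i < n; i++) v[i] = 1.0 / (i + 1); // arbitrary non-uniform start
            double lambda = 0;
            for (int it = 0; it < 1000; it++) {
                double[] w = new double[n];
                for (int i = 0; i < n; i++)
                    for (int j = 0; j < n; j++) w[i] += g[i][j] * v[j];
                double norm = Math.sqrt(Arrays.stream(w).map(x -> x * x).sum());
                if (norm < 1e-12) break;              // nothing left in this direction
                for (int i = 0; i < n; i++) v[i] = w[i] / norm;
                lambda = norm;                        // eigenvalue of g = sigma^2
            }
            values[c] = Math.sqrt(lambda);
            vectors[c] = v;
            for (int i = 0; i < n; i++)               // deflate: g -= lambda v v^T
                for (int j = 0; j < n; j++) g[i][j] -= lambda * v[i] * v[j];
        }
        return new Svd(values, vectors);
    }
}
```

Document d's coordinate on concept c is `values()[c] * docVectors()[c][d]`; comparing documents in that reduced space, rather than on raw counts, is the point of LSA.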
  32. Darius - Employed Techniques
      Vector Space Model (VSM): an algebraic model for representing text documents as vectors.
      Each dimension corresponds to a separate term. If a term occurs in the document, its value in the vector is non-zero.
      A weight is used to evaluate how important a word is to a document in a collection. The importance increases proportionally to the number of times the word appears in the document.
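A common concrete choice for this weight is tf-idf, which also discounts terms that occur in many documents; the slide only mentions the term-count part, so tf-idf is an assumption here. The hypothetical `Vsm` class below builds a sparse tf-idf vector per document (absent terms are implicit zeros) and ranks documents against a query by cosine similarity.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Sketch of a vector space model with tf-idf weights and cosine ranking.
class Vsm {
    private final List<Map<String, Double>> docVectors = new ArrayList<>();
    private final Map<String, Double> idf = new HashMap<>();

    static Map<String, Integer> termCounts(String s) {
        Map<String, Integer> tf = new HashMap<>();
        for (String t : s.toLowerCase(Locale.ROOT).split("[^a-z]+"))
            if (!t.isEmpty()) tf.merge(t, 1, Integer::sum);
        return tf;
    }

    Vsm(List<String> docs) {
        List<Map<String, Integer>> tfs = new ArrayList<>();
        Map<String, Integer> df = new HashMap<>(); // documents containing each term
        for (String d : docs) {
            Map<String, Integer> tf = termCounts(d);
            for (String t : tf.keySet()) df.merge(t, 1, Integer::sum);
            tfs.add(tf);
        }
        int n = docs.size();
        df.forEach((t, k) -> idf.put(t, Math.log((double) n / k)));
        for (Map<String, Integer> tf : tfs) docVectors.add(weigh(tf));
    }

    // tf-idf: grows with the count in the document, shrinks with the number
    // of documents containing the term. Unknown terms get weight 0.
    private Map<String, Double> weigh(Map<String, Integer> tf) {
        Map<String, Double> v = new HashMap<>();
        tf.forEach((t, c) -> v.put(t, c * idf.getOrDefault(t, 0.0)));
        return v;
    }

    static double cosine(Map<String, Double> x, Map<String, Double> y) {
        double dot = 0, nx = 0, ny = 0;
        for (Map.Entry<String, Double> e : x.entrySet()) {
            dot += e.getValue() * y.getOrDefault(e.getKey(), 0.0);
            nx += e.getValue() * e.getValue();
        }
        for (double w : y.values()) ny += w * w;
        return nx == 0 || ny == 0 ? 0 : dot / Math.sqrt(nx * ny);
    }

    // Indices of the documents, most similar to the query first.
    List<Integer> rank(String query) {
        Map<String, Double> q = weigh(termCounts(query));
        double[] score = new double[docVectors.size()];
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < score.length; i++) {
            score[i] = cosine(q, docVectors.get(i));
            order.add(i);
        }
        order.sort((i, j) -> Double.compare(score[j], score[i]));
        return order;
    }
}
```

Cosine similarity is used so that long comments are not favoured merely for containing more words.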
  33. Outline (as in slide 7)
  34. Darius - Experiment with iText
      The object of study chosen for this test is iText.
      iText contains a sufficient amount of comments, and the content of those comments has a sufficient dose of Problem and Program Domain information, so this program can be explored for PC purposes using its comments.
  35. Darius - Experiment with iText
  36. Darius - Experiment with iText
      How is a PDF document created?
  37. Darius - Experiment with iText
      How to write a PDF document into an output stream?
  38. Darius - Experiment with iText
      How to add a title to a PDF?
  39. Darius - Experiment with iText
      Going deeper in the searches and using the information discovered in each executed query, the programmer can build incremental knowledge of the software.
      Using the information present in comments, the programmer should be able to figure out how every concept is implemented in the source code and how the concepts relate to one another.
  40. Outline (as in slide 7)
  41. Conclusion
      Do real-world programs actually contain enough meaningful comments to justify the analysis effort and the proposed approach?
      Using simple but effective queries, locating concepts through comment information is faster than the complex task of reading the whole source code of the program.
      Darius shows the potential value that comments hold for comprehension.
      Future work:
      questionnaires to understand how a programmer would deal with Darius;
      developing Darius as a plugin for an IDE (e.g. Eclipse).
