Sentence-to-Code Traceability Recovery with Domain Ontologies

Presented at APSEC 2010
http://dx.doi.org/10.1109/APSEC.2010.51


Transcript

  1. Sentence-to-Code Traceability Recovery with Domain Ontologies. Shinpei Hayashi, Takashi Yoshikawa, Motoshi Saeki (Tokyo Institute of Technology, Japan)
  2. Overview
     (Figure: recovering traceability links between an NL sentence and source code.)
     Results: it works well.
     − Implemented an automated tool
     − Performed a case study using JDraw
       • Recovered traceability between 7 sentences and code
       • Obtained more accurate results than without the ontology
  3. Background
     Documentation-to-code traceability links are important
     − for reducing maintenance costs
     − for software reuse and extension
     Our focus: recovering sentence-to-code traceability
     − Some software products have only documentation of simple sentences, without any detailed description, e.g., because of agile/bazaar-style development processes
  4.–6. Example of Sentence
     (Screenshots of the JDraw manual, http://jdraw.sf.net/, built up over three slides.)
     A feature is described by just a single sentence, "plain, filled and gradient filled rectangles", with no detailed documentation.
  7. Aim
     To detect the set of methods related to the input sentence.
     (Figure: the NL sentence "Users can draw a plain oval.", a set of words, is mapped to a set of methods in the source code: drawOval(), writeLog(), getCanvas(), getColor(), setPixel(), getColorPallete(), DrawPanel(), OvalTool().)
  8. Challenge
     How to find the correct set?
     − Word similarity alone leads to false positives and false negatives.
     (Figure: matching sentence words against method names misses setPixel(), whose name shares no word with the sentence, a false negative, while unrelated methods that merely contain "draw" become false positives.)
  9. Challenge (cont.)
     − Following method invocations also leads to false positives.
     (Figure: traversing invocations from drawOval() reaches unrelated methods such as writeLog(), a false positive.)
  10. Challenge (cont.)
      Another criterion is required
      − to judge whether a method invocation is needed
      − considering the problem domain
      (Figure: among the methods invoked from drawOval(), some are important to the sentence and others, such as writeLog(), are not.)
  11. Domain Ontology
      Formally represents the knowledge of the target problem domain as relationships between concepts (words).
      (Figure: an excerpt of an ontology for painting tools with the concepts canvas, draw, oval, and color. The concept "canvas" is a possible target of "draw"; the function "draw" concerns the concept "color".)
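     A minimal sketch of how such an ontology could be represented, assuming a simple relation map (this representation and its method names are illustrative, not the paper's implementation):

        import java.util.*;

        // A tiny domain-ontology model: labeled, directed relationships
        // between concepts. Illustrative only; the paper does not
        // prescribe this representation.
        class Ontology {
            private final Map<String, String> relations = new HashMap<>();

            void addRelation(String from, String label, String to) {
                relations.put(from + "->" + to, label);
            }

            // True if some relationship connects the two concepts,
            // in either direction.
            boolean related(String a, String b) {
                return relations.containsKey(a + "->" + b)
                    || relations.containsKey(b + "->" + a);
            }
        }

        // Usage, mirroring the excerpt on the slide:
        //   Ontology o = new Ontology();
        //   o.addRelation("draw", "possible target", "canvas");
        //   o.addRelation("draw", "concerns", "color");
        //   o.related("draw", "canvas")  -> true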
  12. Solution
      Choosing method invocations using the domain ontology.
      (Figure: for the sentence "Users can draw a plain oval.", an ontology relating draw, canvas, oval, and color guides the choice among the invoked methods drawOval(), writeLog(), getCanvas(), getColor(), setPixel(), getColorPallete(), DrawPanel(), and OvalTool().)
  13. System Overview
      (Figure: inputs are the sentence, the source code, and a domain ontology; outputs are functional call-graphs (FCGs) ranked by score, e.g., the 1st with score 100 and the 2nd with score 85.)
  14.–18. Procedure
      Given the NL sentence (a set of words) and the source code, the procedure has four steps:
      1. Extracting the call-graph, by static analysis
      2. Extracting words, with stemming and stopword removal
      3. Extracting functional call-graphs (FCGs)
      4. Prioritizing FCGs
      (Figures, built up over five slides: a call-graph over methods m1–m8; the word sets extracted from each method, e.g., {..., draw}, {..., oval}, {..., color}; two extracted FCGs, Sa and Sb; and their scores, 100 for Sa and 85 for Sb.)
  19.–21. Extracting FCGs
      1. Root selection
         − Choose the methods that have words from the input sentence in their names
         − These words are tagged as the methods' roles
      2. Traversal
         − Traverse method invocations forwards from the roots, if the invocation satisfies one of the traversal rules (see the sketch and the rules below)
      (Figure: in the call-graph m1–m8, the sentence words {wa, wb} select m1 with role {wa}, and m3 and m5 with role {wb}, as roots.)
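     A minimal sketch of root selection, assuming a hypothetical Method type that carries a method's extracted words and a mutable role set (neither the type nor its names come from the paper):

        import java.util.*;

        // Hypothetical model of a method: its name, the words extracted
        // from the name, and the roles assigned during FCG extraction.
        record Method(String name, Set<String> words, Set<String> roles) {}

        class RootSelection {
            // Choose methods whose words overlap the sentence's words;
            // the overlapping words become the method's roles.
            static List<Method> selectRoots(Collection<Method> methods,
                                            Set<String> sentenceWords) {
                List<Method> roots = new ArrayList<>();
                for (Method m : methods) {
                    for (String w : m.words()) {
                        if (sentenceWords.contains(w)) {
                            m.roles().add(w);  // requires a mutable role set
                        }
                    }
                    if (!m.roles().isEmpty()) {
                        roots.add(m);
                    }
                }
                return roots;
            }
        }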
  22. Traversal Rule #1 (Sentence-based)
      Holds if the callee method has a word in the given sentence.
      (Figure: an invocation is traversed because the callee's words include wb, a sentence word; the callee gains role {wb}.)
  23. Traversal Rule #2 (Ontology-based)
      Holds if the method invocation matches the relationships in the given ontology.
      (Figure: an invocation is traversed because the caller's role wb and the callee's word wc are related in the ontology; the callee gains role {wc}.)
  24. Traversal Rule #3 (Inheritance)
      Holds if the callee method has a word in the roles of the caller method.
      (Figure: an invocation is traversed because the callee's words include wc, already a role of the caller; the callee inherits the role {wc}.)
  25. Ontology-Based Rule (detail)
      Holds if the invocation matches the relationships in the given ontology.
      (Figure: the caller drawOval(), with roles {draw, oval}, invokes the callee getCanvas(), with role {canvas}; the invocation matches because the ontology relates "draw" to "canvas".)
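     The three traversal rules can be sketched as predicates, reusing the illustrative Ontology and Method types from the earlier sketches (again an assumption-laden sketch, not the tool's code):

        import java.util.Set;

        class TraversalRules {
            // Rule #1 (sentence-based): the callee has a sentence word.
            static boolean sentenceBased(Method callee, Set<String> sentence) {
                return callee.words().stream().anyMatch(sentence::contains);
            }

            // Rule #2 (ontology-based): a caller role and a callee word
            // are related in the domain ontology.
            static boolean ontologyBased(Method caller, Method callee, Ontology o) {
                return caller.roles().stream().anyMatch(
                    r -> callee.words().stream().anyMatch(w -> o.related(r, w)));
            }

            // Rule #3 (inheritance): the callee has a word that is
            // already a role of the caller.
            static boolean inheritance(Method caller, Method callee) {
                return callee.words().stream().anyMatch(caller.roles()::contains);
            }

            // An invocation is traversed if any of the three rules holds.
            static boolean traversable(Method caller, Method callee,
                                       Set<String> sentence, Ontology o) {
                return sentenceBased(callee, sentence)
                    || ontologyBased(caller, callee, o)
                    || inheritance(caller, callee);
            }
        }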
  26. Extracting FCGs (cont.)
      − The traversed methods form an FCG.
      (Figure: the traversal yields two FCGs, Sa rooted at m1 and Sb containing m6–m8.)
  27. Prioritizing FCGs
      Using a weighting scheme. Criteria: we prioritize FCGs that
      1. include methods having important roles in their names,
      2. include method invocations matching the relationships in the ontology and/or the sentence, and
      3. cover many words in the input sentence.
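     A sketch of one possible weighting scheme; the slide names the three criteria but not the formula, so the shape of the score and the weights below are pure assumptions:

        import java.util.*;

        class FcgScorer {
            static double score(Collection<Method> fcg, int matchingInvocations,
                                Set<String> sentenceWords) {
                // Criterion 1: methods whose names carry roles.
                long roleMethods = fcg.stream()
                                      .filter(m -> !m.roles().isEmpty()).count();
                // Criterion 3: sentence words covered by the FCG's methods.
                long covered = sentenceWords.stream()
                    .filter(w -> fcg.stream().anyMatch(m -> m.words().contains(w)))
                    .count();
                // Criterion 2: invocations that matched an ontology or
                // sentence rule, counted during traversal and passed in.
                return 1.0 * roleMethods + 1.0 * matchingInvocations + 2.0 * covered;
            }
        }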
  28. Case Study
      Evaluation target: JDraw 1.1.5.
      − We picked 7 sentences from JDraw's manual on the Web
      − We prepared an ontology for painting tools (38 concepts and 45 relationships)
      − An expert prepared control (answer) sets of methods
      Evaluation criteria:
      − Calculating precision and recall by comparing the extracted sets with the control sets
  29. Results
      Use of ontology                                     Yes            No
      Input sentence                                      Prec.  Recall  Prec.  Recall
      1. "plain, filled and gradient filled rectangles"   0.83   0.94    1.00   0.19
      2. "plain, filled and gradient filled ovals"        0.82   0.98    1.00   0.21
      3. "image rotation"                                 1.00   0.35    0.00   0.00
      4. "image scaling"                                  0.22   0.68    1.00   0.58
      5. "save JPEGs of configurable quality"             0.40   1.00    0.67   1.00
      6. "colour reduction"                               0.74   0.95    0.74   0.95
      7. "grayscaling"                                    -      -       -      -
      For each sentence, we picked the FCG with the highest F-measure among the top-5 ranked results.
      − F-measure = 2 / (1/precision + 1/recall)
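     As a quick worked check of the F-measure formula, using the table's own numbers: for sentence 1 with the ontology, F = 2 / (1/0.83 + 1/0.94) = 2 / (1.205 + 1.064) ≈ 0.88, while without it F = 2 / (1/1.00 + 1/0.19) ≈ 0.32.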
  30. Results (cont.)
      (Same table as slide 29.)
      Accurate results when using the ontology:
      − precision > 0.7 for 3 cases
      − recall > 0.9 for 4 cases
  31. Results (cont.)
      (Same table as slide 29.)
      Improvement from using the ontology:
      − improved recall for the 1st and 2nd cases
      − detected traceability for the 3rd case, where the ontology-less run found nothing
      Domain ontologies give us valuable guidance for sentence-to-code traceability recovery.
  32. Future Work
      − Case study++: larger systems; other/multiple domains
      − Supporting ontology construction
      − Improving NLP techniques
      − Combining with other techniques: dynamic analysis; relevance feedback
  33. Summary
      (Recap slide combining the earlier "Example of Sentence", "Solution", "Procedure", and "Results" slides.)
      Domain ontologies give us valuable guidance for sentence-to-code traceability recovery.
  34. Additional slides
  35. System Overview (detail)
      (Figure: the full pipeline. The sentence and the source code go through word extraction: sentence splitting, stemming, identifier extraction, and stop-word removal, yielding the words in the sentence and the words in the code. Static analysis of the code yields a call-graph. Traversing the call-graph, guided by the words and the domain ontology, yields functional call-graphs; prioritizing them produces the ordered functional call-graphs as output.)
  36. Extracting Call Graph
      Statically extracting invocations, with consideration of overridden methods:

        class Tool {
            public static Tool getCurrent() { /* elided on the slide */ return null; }
            public void draw() { /* ... */ }
        }
        class OvalTool extends Tool {
            @Override public void draw() { /* ... */ }
        }
        class Drawer {
            public void startDraw() {
                Tool tool = Tool.getCurrent();
                tool.draw();  // dynamic dispatch: Tool#draw or OvalTool#draw
            }
        }

      (Figure: edges are extracted from Drawer#startDraw to both Tool#draw and OvalTool#draw, since either may be invoked at runtime.)
  37. Extracting Words
      From source code: extract identifiers, then tokenize camelCase names, e.g., startsWith → {starts, with}, startDraw → {start, draw}.
      From the sentence: e.g., "draw a circle" → {draw, a, circle}.
      Normalization:
      − stemming
      − removing stop words (e.g., {draw, a, circle} becomes {draw, circle} once the stop word "a" is removed)
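     A sketch of the tokenization and normalization steps; the regex-based camelCase splitter is a standard technique, while the suffix-stripping "stemmer" below is a crude stand-in for a real one such as the Porter stemmer:

        import java.util.*;

        class WordExtraction {
            // Split a camelCase identifier into lower-case words,
            // e.g., "startDraw" -> [start, draw].
            static List<String> splitCamelCase(String identifier) {
                return Arrays.stream(identifier.split("(?<=[a-z])(?=[A-Z])"))
                             .map(String::toLowerCase)
                             .toList();
            }

            // Drop stop words and strip a trailing plural "s"; a crude
            // stand-in for real stemming (e.g., the Porter stemmer).
            static Set<String> normalize(List<String> words) {
                Set<String> stopWords = Set.of("a", "an", "the", "of");
                Set<String> result = new LinkedHashSet<>();
                for (String w : words) {
                    if (stopWords.contains(w)) continue;
                    result.add(w.length() > 3 && w.endsWith("s")
                               ? w.substring(0, w.length() - 1) : w);
                }
                return result;
            }
        }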
  38. Bad Results
      (Same table as slide 29.)
      Bad effects or no improvement for some cases:
      − reason: use of external modules
      − reason: use of compound words (grayscale vs. grayScale)
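     The compound-word failure is visible with the camelCase splitter sketched after slide 37: an identifier written as one lower-case compound never splits, so its words cannot match the sentence's.

        // Using splitCamelCase from the sketch after slide 37:
        //   splitCamelCase("grayScale") -> [gray, scale]
        //   splitCamelCase("grayscale") -> [grayscale]   (no split)
        // The two spellings yield disjoint word sets, so the sentence's
        // words and the identifier's words fail to match.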
