
Source code comprehension on evolving software


Yida's PQE

Published in: Software, Technology

  1. 1. Source Code Comprehension on Evolving Software: A Literature Survey
      • Yida Tao
      • Supervisor: Sunghun Kim
  2. 2. Motivation: Code Change Comprehension
      • Tao et al., FSE’12: code change comprehension is
          • Frequently required
          • Part of major development activities, in particular the code-review process
      • Sources
          • How do software engineers understand code changes? An exploratory study in industry. Tao et al., FSE’12
          • Expectations, outcomes, and challenges of modern code review. Bacchelli and Bird, ICSE’13
      • Bacchelli & Bird, ICSE’13
          • “…review and understand code they have not seen before may be more common than a developer working on new code”
          • “From interviews, no other code review challenge emerged as clearly as understanding the submitted change”
  3. 3. Outline: Code Change Comprehension
      • Program Differencing: describing code changes
      • Code Change Summarization: explaining code changes
      • Querying and Filtering: customization
  4. 4. Program Differencing
      • Text Differencing
      • Syntactic Differencing
      • Semantic Differencing
  5. 5. Text Differencing
      • Flat representation of a program
          • Sequence of strings
      • Unix diff
          • Outputs only added/deleted lines; cannot detect modified lines
          • Hard to determine when a code fragment is moved upward or downward
      • Ldiff (Canfora et al., ICSE’09)
          • An enhanced line differencing tool
      • Limitations
          • Changes to *characters*
          • No syntactic-structure information
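The add/delete-only behavior noted for line differencing on slide 5 can be illustrated with a minimal sketch. It uses Python's standard `difflib.SequenceMatcher` as a stand-in for Unix diff (an assumption: the two use different matching algorithms, but both report changes as deleted and added lines). The helper name `line_diff` and the sample inputs are hypothetical.

```python
import difflib


def line_diff(old_lines, new_lines):
    """Report a change purely as deleted ('-') and added ('+') lines."""
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        # A "replace" is flattened into deletions followed by additions,
        # which is why a modified line is not reported as "modified".
        changes.extend(("-", line) for line in old_lines[i1:i2])
        changes.extend(("+", line) for line in new_lines[j1:j2])
    return changes


# One modified line shows up as a delete plus an add.
old = ["int total = 0;", "for (int i = 0; i < n; i++) {", "    total += a[i];", "}"]
new = ["int total = 0;", "for (int i = 0; i < n; i++) {", "    total += b[i];", "}"]
modified = line_diff(old, new)

# A moved line also shows up as a delete plus an add of the same text.
moved = line_diff(["a", "b", "c"], ["b", "c", "a"])

for sign, line in modified + moved:
    print(sign, line)
```

Nothing in the output distinguishes the modified line from the moved one; both are a deletion paired with an addition, which matches the limitation listed on the slide.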
  6. 6. Syntactic Differencing
      • Structured representation of a program
          • Abstract syntax tree; XML
      • ChangeDistiller (Fluri et al., TSE’07)
          • Tree differencing
          • Node: bigram string similarity
          • Control structure: subtree similarity
          • Output: tree edit script (insert, delete, move, update)
      • XML differencing
          • srcML (Maletic & Collard, ICSM’04): embeds abstract syntax and structure within the source code
          • diffX (Al-Ekram et al., CASCON ’05)
      • Limitations
          • Cannot describe how the behavior of a program is changed
          • Still reports differences for behavior-preserving changes
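The "bigram string similarity" used for node matching on slide 6 can be sketched as follows. This assumes the common Dice-coefficient definition over character bigrams; the function names and the 0.6 matching threshold are illustrative choices, not taken from the slides.

```python
from collections import Counter


def bigrams(text):
    """Multiset of adjacent character pairs in text."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def bigram_similarity(a, b):
    """Dice coefficient over character bigrams, in the range [0, 1]."""
    bigrams_a, bigrams_b = bigrams(a), bigrams(b)
    total = sum(bigrams_a.values()) + sum(bigrams_b.values())
    if total == 0:
        # Both strings are shorter than two characters.
        return 1.0 if a == b else 0.0
    shared = sum((bigrams_a & bigrams_b).values())
    return 2.0 * shared / total


THRESHOLD = 0.6  # illustrative cut-off for treating two leaf nodes as a match

old_stmt = "total += a[i];"
new_stmt = "total += b[i];"
score = bigram_similarity(old_stmt, new_stmt)
print(score >= THRESHOLD)  # the statements are similar enough to pair as an update
```

Pairing the two statements lets a tree differencer report one update operation instead of a delete plus an insert.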
  7. 7. Semantic Differencing
      • Semantic diff (Jackson and Ladd, ICSM’94)
          • Method-level
          • Variable dependencies comparison
  8. 8. Semantic Differencing (cont.)
      • JDiff (Apiwattanapong et al., ASE’04, ’06)
          • Extended control-flow graph (ECFG)
          • Dynamic binding, class hierarchy, exception handling, etc.
  9. 9. Semantic Differencing (cont.)
      • Differential symbolic execution (Person et al., FSE’08)
          • “Executing” a program using symbolic values
  10. 10. Outline: Code Change Comprehension
      • Program Differencing: text differencing, syntactic differencing, semantic differencing
      • Code Change Summarization: explaining code changes
      • Querying and Filtering: customization
  11. 11. Code Change Summarization
      • LSdiff (Kim and Notkin, ICSE’09)
          • Group related changes
          • Detect potential inconsistencies in a code change
  12. 12. Code Change Summarization (cont.)
      • DeltaDoc (Buse and Weimer, ASE’10)
          • Symbolic execution: obtain path predicates for each statement in both versions
          • Identify statements that are added, deleted, or have changed predicates
          • Summarization
  13. 13. Code Change Summarization (cont.)
      • Multi-document summarization (Rastkar and Murphy, ICSE’13)
          • Linking evolutionary documents (commit logs, issue tracking entries)
          • Finding the most informative sentences to extract to form a summary
              • Similarity between a sentence and the title of the enclosing document
              • Overlap between a sentence and the adjacent document
  14. 14. Code Change Summarization (cont.)
      • Challenges
          • Evolutionary documents
              • Linkage might not be found (Bachmann et al., FSE’10; Wu et al., FSE’11)
              • Human-written documents may be unavailable or uninformative (Buse and Weimer, ASE’10; Tao et al., FSE’12)
          • Automatically generated documents
              • Verbosity
              • Uninteresting changes are identified, e.g., “all types that declared toString() added constructors” (Kim and Notkin, ICSE’09)
      • [Figure labels: LSdiff, DeltaDoc]
  15. 15. Outline: Code Change Comprehension
      • Program Differencing: text differencing, syntactic differencing, semantic differencing
      • Code Change Summarization: rules and exceptions, control-flow changes, evolutionary documentation
      • Querying and Filtering: customization
  16. 16. Querying and Filtering
      • Specifying and detecting meaningful changes (Yu et al., ASE’11)
          • Normalize the program (user-specified) before differencing
          • Non-trivial to construct the query
  17. 17. Querying and Filtering (cont.)
      • Filtering non-essential changes (Kawrykow and Robillard, ICSE’11)
          • Non-essential changes: rename-induced modifications, local variable extraction, trivial keyword modification, whitespace and documentation updates
          • ChangeDistiller (Fluri et al., TSE’07) + partial program analysis (Dagenais and Robillard, ICSE’08)
          • Goal: improving mining and recommendation accuracy rather than developers’ comprehension
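One category of non-essential change from slide 17, whitespace and comment updates, can be detected by comparing token streams instead of raw text. The sketch below is not the cited authors' technique (which builds on ChangeDistiller and partial program analysis for Java). It is a simplified illustration that assumes Python source as input and uses the standard `tokenize` module; the function names are hypothetical.

```python
import io
import tokenize

# Token kinds whose exact text is irrelevant to program behavior.
LAYOUT_ONLY = {tokenize.NEWLINE, tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER}
IGNORED = {tokenize.COMMENT, tokenize.NL}


def essential_tokens(source):
    """Token stream with comments, blank lines, and layout text stripped."""
    result = []
    for token in tokenize.generate_tokens(io.StringIO(source).readline):
        if token.type in IGNORED:
            continue
        # Keep block structure (INDENT/DEDENT) but not the indentation width.
        text = "" if token.type in LAYOUT_ONLY else token.string
        result.append((token.type, text))
    return result


def is_non_essential(old_source, new_source):
    """True when the two versions differ only in whitespace or comments."""
    return essential_tokens(old_source) == essential_tokens(new_source)


print(is_non_essential("x = 1\n", "x=1  # set x\n"))
print(is_non_essential("x = 1\n", "x = 2\n"))
```

Rename-induced modifications and local variable extraction need name resolution and are beyond what a token comparison like this can detect.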
  18. 18. Outline: Code Change Comprehension
      • Program Differencing: text differencing, syntactic differencing, semantic differencing
      • Code Change Summarization: rules and exceptions, control-flow changes, evolutionary documentation
      • Querying and Filtering: meaningful changes, non-essential changes
  19. 19. Research Directions
      • Program Differencing: text differencing, syntactic differencing, semantic differencing
      • Code Change Summarization: rules and exceptions, control-flow changes, evolutionary documentation
      • Querying and Filtering: meaningful changes, non-essential changes
      • Source code changes: work-item-based changes?
  20. 20. Work-item-based Changes
      • Multiple work-items in a single code change (e.g., a bug fix + code cleanup + a new feature)
          • Very difficult to understand (Tao et al., FSE’12)
      • [Figure: JFreeChart revision 1083, annotated “Trivial keyword removal”, “Bug fix”, “Formatting”]
  21. 21. Work-item-based Change Detection
      • Multiple work-items in a single code change (e.g., a bug fix + code cleanup + a new feature)
          • Very difficult to understand (Tao et al., FSE’12)
          • Change decomposition
              • Program slicing (entity dependencies)
              • Pattern matching (similarities)
      • A single work-item spread across multiple code changes (e.g., 5 changes to finally fix a bug completely)
          • Change aggregation
              • Linkage to the same issue
              • Heuristics such as time duration, commit authors, program dependencies, etc.
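The change-aggregation heuristics on slide 21 might look like the sketch below. Everything in it is an assumption made for illustration: the `Commit` record, the `#123` issue-reference format, and the one-hour window for same-author commits. The slides propose these as research directions and do not specify an algorithm.

```python
import re
from collections import defaultdict
from dataclasses import dataclass

ISSUE_PATTERN = re.compile(r"#(\d+)")  # assumed issue-reference format


@dataclass(frozen=True)
class Commit:
    sha: str
    author: str
    timestamp: int  # seconds since epoch
    message: str


def aggregate_by_issue(commits):
    """Group commits whose messages link to the same issue."""
    groups = defaultdict(list)
    for commit in sorted(commits, key=lambda c: c.timestamp):
        for issue_id in sorted(set(ISSUE_PATTERN.findall(commit.message))):
            groups[issue_id].append(commit.sha)
    return dict(groups)


def aggregate_by_author_window(commits, window_seconds=3600):
    """Fallback heuristic: chain commits by one author that are close in time."""
    groups = []
    last_by_author = {}  # author -> (timestamp of last commit, its group)
    for commit in sorted(commits, key=lambda c: c.timestamp):
        previous = last_by_author.get(commit.author)
        if previous is not None and commit.timestamp - previous[0] <= window_seconds:
            group = previous[1]
        else:
            group = []
            groups.append(group)
        group.append(commit.sha)
        last_by_author[commit.author] = (commit.timestamp, group)
    return groups


history = [
    Commit("a", "alice", 0, "Fix NPE in parser, see #42"),
    Commit("b", "bob", 100, "Refactor logging"),
    Commit("c", "alice", 1800, "Follow-up fix for #42"),
    Commit("d", "alice", 10000, "Code cleanup"),
]
by_issue = aggregate_by_issue(history)
by_window = aggregate_by_author_window(history)
print(by_issue)
print(by_window)
```

Issue linkage is the stronger signal when it exists. The author-and-time heuristic is a fallback for commits that carry no issue reference.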
  22. 22. Research Directions: Code Change Comprehension
      • Program Differencing: text differencing, syntactic differencing, semantic differencing
      • Code Change Summarization: rules and exceptions, control-flow changes, evolutionary documentation
      • Querying and Filtering: meaningful changes, non-essential changes
      • Work-item change detection: change decomposition, change aggregation
  23. 23. Research Directions: Code Change Comprehension
      • Program Differencing: text differencing, syntactic differencing, semantic differencing
      • Code Change Summarization: rules and exceptions, control-flow changes, evolutionary documentation
      • Querying and Filtering: meaningful changes, non-essential changes, work-item-specific changes
      • Work-item change detection: change decomposition, change aggregation
  24. 24. Research Directions: Code Change Comprehension
      • Program Differencing: text differencing, syntactic differencing, semantic differencing
      • Code Change Summarization: rules and exceptions, control-flow changes, evolutionary documentation
      • Querying and Filtering: meaningful changes, non-essential changes, work-item-specific changes
      • Concrete execution
      • Work-item change detection: change decomposition, change aggregation
  25. 25. Explaining code changes with executions of co-changed test cases
      • Test cases
          • Best documentation for source code
      • Test cases co-changed with source code
          • Documentation for code changes?
          • Mostly synchronous co-evolution of production and test code (Zaidman et al., Empirical Software Engineering’11)
      • Differential test executions
          • Co-changed test cases T
          • Executing T on the old version P and the new version P’
          • Comparing the executions to explain changed behaviors
      • From StackExchange (http://programmers.stackexchange.com/questions/154439/quality-of-code-in-unit-tests?newsletter=1&nlcode=67628%7c1a35)
          • “Unit tests are one of the best sources of documentation for your system, and arguably the most reliable form”
          • “Unit tests are often the first thing you look at when trying to grasp what some piece of code does”
          • “They can also serve as a starting point for people new to the code base”
  26. 26. Research Directions: Code Change Comprehension
      • Program Differencing: text differencing, syntactic differencing, semantic differencing
      • Code Change Summarization: rules and exceptions, control-flow changes, evolutionary documentation
      • Querying and Filtering: meaningful changes, non-essential changes, work-item-specific changes
      • Concrete execution
          • Co-changed test cases
          • Differential test execution
      • Work-item change detection: change decomposition, change aggregation
