The document surveys research on helping software engineers comprehend source code changes. It outlines techniques for differencing code at the text, syntactic, and semantic levels. It also covers summarizing code changes, explaining changes through rules and control-flow analysis, and leveraging related documentation. Future work opportunities include detecting work-item-specific changes, decomposing and aggregating changes, and explaining changes through differential execution of co-changed test cases.
1. Source Code Comprehension on Evolving Software:
A Literature Survey
Yida Tao
Supervisor: Sunghun Kim
1
2. Motivation
Code Change Comprehension
Tao et al., FSE’12
Code change comprehension is
• Frequently required
• Part of major development activities, in particular the code-review process
• How do software engineers understand code changes? An exploratory study in industry. Tao et al., FSE’12
• Expectations, outcomes, and challenges of modern code review. Bacchelli and Bird, ICSE’13
Bacchelli & Bird, ICSE’13
• “…review and understand code they have not seen before may be more common than a developer working on new code”
• “From interviews, no other code review challenge emerged as clearly as understanding the submitted change”
2
5. Text Differencing
Flat representation of a program
Sequence of strings
Unix diff
Only outputs added/deleted lines; cannot detect modified lines
Hard to determine when a code fragment is moved upward or downward
Ldiff (Canfora et al., ICSE’09)
An enhanced line differencing tool
Limitations
Differences are reported as changes to *characters*
No syntactic-structure information
5
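To make the line-level limitation concrete, here is a minimal sketch (my own, not from the survey) using Python's standard difflib: a one-word edit to a line shows up as a full deletion plus a full addition, with no notion that the line was merely modified.

```python
import difflib

old = [
    "public int total(int x) {",
    "    return tot + x;",
    "}",
]
new = [
    "public int total(int x) {",
    "    return tot + x + bonus;",  # the line was edited, not replaced
    "}",
]

# A plain line diff reports the edit as one deleted line and one added line;
# it carries no information that the two lines correspond to each other.
for line in difflib.unified_diff(old, new, fromfile="Old.java", tofile="New.java", lineterm=""):
    print(line)
```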
6. Syntactic Differencing
Structured representation of a program
Abstract syntax tree; XML
ChangeDistiller (Fluri et al., TSE’07)
Tree differencing
Node: bigram string similarity
Control structure: subtree similarity
Output: tree edit script (insert, delete, move, update)
XML differencing
srcML (Maletic & Collard, ICSM’04): embeds abstract syntax and structure within the source code
diffX (Al-Ekram et al., CASCON ’05)
Limitation
Cannot describe how the behavior of a program is changed
Still reports differences for behavior-preserving changes
6
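A small illustration of the limitation named above (my own sketch, in Python rather than Java): swapping the branches of an if/else while negating the condition preserves behavior, yet a purely syntactic comparison of the two ASTs still reports a difference.

```python
import ast

old_src = """
if x != HI:
    tot = tot + x
else:
    tot = tot + DEF
"""

new_src = """
if x == HI:
    tot = tot + DEF
else:
    tot = tot + x
"""

# Both snippets compute the same result, but their syntax trees differ,
# so any purely syntactic differencing approach will flag a change here.
old_tree = ast.dump(ast.parse(old_src))
new_tree = ast.dump(ast.parse(new_src))
print("Syntactically identical?", old_tree == new_tree)  # -> False
```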
11. Code Change Summarization
LSdiff (Kim and Notkin, ICSE’09)
Group related changes
Detect potential inconsistencies in a code change
11
12. Code Change Summarization (cont.)
DeltaDoc (Buse and Weimer, ASE’10)
Symbolic execution: obtain path predicates for each statement in both versions
Identify statements that are added, deleted, or have changed predicates
Summarization
12
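As a rough intuition for DeltaDoc's first two steps, here is a toy sketch (my own; not Buse and Weimer's actual symbolic execution): associate each statement with the condition guarding it, then report statements whose guard changed between versions.

```python
import ast  # ast.unparse requires Python 3.9+


def guards(src):
    """Map each non-if statement to the textual condition guarding it.

    A toy stand-in for path predicates: only enclosing if/else tests are tracked.
    """
    result = {}

    def walk(stmts, conds):
        for stmt in stmts:
            if isinstance(stmt, ast.If):
                test = ast.unparse(stmt.test)
                walk(stmt.body, conds + [test])
                walk(stmt.orelse, conds + [f"not ({test})"])
            else:
                result[ast.unparse(stmt)] = " and ".join(conds) or "True"

    walk(ast.parse(src).body, [])
    return result


old = "if size > LIMIT:\n    flush()\nprocess(item)\n"
new = "if size >= LIMIT:\n    flush()\nprocess(item)\n"

g_old, g_new = guards(old), guards(new)
# Statements whose guard changed (or that exist in only one version) are the
# raw material for a DeltaDoc-style summary such as
# "When size == LIMIT, flush() is now called".
for stmt in sorted(g_old.keys() | g_new.keys()):
    if g_old.get(stmt) != g_new.get(stmt):
        print(f"{stmt}: {g_old.get(stmt)} -> {g_new.get(stmt)}")
```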
13. Code Change Summarization (cont.)
Multi-document summarization (Rastkar and Murphy, ICSE’13)
Linking evolutionary documents (commit log, issue tracking entries)
Finding the most informative sentences to extract to form a summary
Similarity between a sentence and the title of the enclosing document
Overlap between a sentence and the adjacent document
13
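A minimal sketch of the kind of scoring such extractive approaches rely on (my own simplification, not Rastkar and Murphy's model): rank sentences by word overlap with the title of their enclosing document.

```python
def overlap_score(sentence, title):
    """Fraction of title words that also occur in the sentence (toy similarity)."""
    s_words = set(sentence.lower().split())
    t_words = set(title.lower().split())
    return len(s_words & t_words) / len(t_words) if t_words else 0.0


# Hypothetical issue title and candidate sentences from linked documents.
title = "NullPointerException when saving an empty chart"
sentences = [
    "The fix adds a null check before the chart is serialized.",
    "Thanks for the quick review, merging now.",
    "Saving an empty chart no longer throws a NullPointerException.",
]

# Pick the most title-like sentences as candidates for the summary.
for s in sorted(sentences, key=lambda s: overlap_score(s, title), reverse=True):
    print(f"{overlap_score(s, title):.2f}  {s}")
```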
14. Code Change Summarization (cont.)
Challenges
Evolutionary documents
Linkage might not be found (Bachmann et al., FSE’10; Wu et al., FSE’11)
Human-written documents may be unavailable or uninformative (Buse and Weimer, ASE’10; Tao et al., FSE’12)
Automatically generated documents
Verbosity
Uninteresting changes are identified, e.g., “all types that declared toString() added constructors” (Kim and Notkin, ICSE’09)
14
(Figure on slide: example output from LSdiff and DeltaDoc)
15. Outline
Program Differencing
Text Differencing
Syntactic differencing
Semantic differencing
Code Change Summarization
Rules and exceptions
Control-flow changes
Evolutionary documentation
Code Change Comprehension
Querying and Filtering
Customization
15
16. Querying and Filtering
Specifying and detecting meaningful changes (Yu et al., ASE’11)
Normalize the program (user-specified) before differencing
Non-trivial to construct the query
16
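As an informal illustration of normalizing before differencing (my own sketch; Yu et al. work on Java with richer, user-specified normalization rules), one simple normalization is to parse both versions and re-print them from the AST, which discards comment and formatting differences before the textual diff is computed.

```python
import ast  # ast.unparse requires Python 3.9+
import difflib


def normalize(src):
    """Round-trip through the AST: comments and formatting disappear,
    so only structural differences survive the normalization."""
    return ast.unparse(ast.parse(src)).splitlines()


old = "def area(r):\n    # old comment\n    return 3.14159*r*r\n"
new = "def area(r):\n    # reworded comment, extra blank line below\n\n    return 3.14159 * r * r\n"

raw_diff = list(difflib.unified_diff(old.splitlines(), new.splitlines(), lineterm=""))
norm_diff = list(difflib.unified_diff(normalize(old), normalize(new), lineterm=""))

print("raw diff lines:", len(raw_diff))          # several changed lines
print("normalized diff lines:", len(norm_diff))  # 0: nothing meaningful changed
```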
17. Querying and Filtering (cont.)
Filtering non-essential changes (Kawrykow and Robillard, ICSE’11)
Non-essential changes: rename-induced modifications, local variable extraction, trivial keyword modification, whitespace and documentation updates
ChangeDistiller (Fluri et al., TSE’07) + Partial program analysis (Dagenais and Robillard, ICSE’08)
Goal: improving mining and recommendation accuracy instead of developers’ comprehension
17
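A very small sketch of the filtering idea (my own; Kawrykow and Robillard detect far richer categories via AST analysis): drop changed line pairs whose old and new sides differ only in whitespace or comments.

```python
import re


def essential(old_line, new_line):
    """Treat a change as non-essential if both sides are identical once
    whitespace and trailing // comments are stripped (a crude heuristic)."""
    def strip(line):
        line = re.sub(r"//.*$", "", line)  # drop line comments
        return re.sub(r"\s+", "", line)    # drop all whitespace
    return strip(old_line) != strip(new_line)


changes = [
    ("int total = a+b;", "int total = a + b;   // reformatted"),
    ("return total;", "return total * factor;"),
]

for old_line, new_line in changes:
    label = "ESSENTIAL" if essential(old_line, new_line) else "non-essential"
    print(f"{label:13}  {old_line!r} -> {new_line!r}")
```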
18. Outline
Program Differencing
Text Differencing
Syntactic differencing
Semantic differencing
Code Change Summarization
Rules and exceptions
Control-flow changes
Evolutionary documentation
Querying and Filtering
Meaningful changes
Non-essential changes
Code Change Comprehension
18
19. Research Directions
Program Differencing
Text Differencing
Syntactic differencing
Semantic differencing
Code Change Summarization
Rules and exceptions
Control-flow changes
Evolutionary documentation
Querying and Filtering
Meaningful changes
Non-essential changes
Source Code Changes
Work-item-based changes?
19
20. Work-item-based Changes
Multiple work-items in a single code change (e.g., a bug fix + code cleanup + a new feature)
Very difficult to understand (Tao et al., FSE’12)
(Slide example: JFreeChart revision 1083, which mixes a trivial keyword removal, a bug fix, and formatting changes)
20
21. Work-item-based Change Detection
Multiple work-items in a single code change (e.g., a bug fix + code cleanup + a new feature)
Very difficult to understand (Tao et al., FSE’12)
Change decomposition
Program slicing (entity dependencies)
Pattern matching (similarities)
A single work-item spreads across multiple code changes (e.g., 5 changes to finally fix a bug completely)
Change aggregation
Linkage to the same issue
Heuristics like time duration, commit authors, program dependencies, etc.
21
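To make the aggregation heuristics concrete, here is a small sketch (my own, with made-up commit data) that groups commits by the issue ID mentioned in their messages, the simplest of the linkage heuristics listed above.

```python
import re
from collections import defaultdict

# Hypothetical commit history: (short hash, commit message).
commits = [
    ("a1f3", "Fix NPE when chart is empty (issue #421)"),
    ("b7c2", "Cleanup: remove unused imports"),
    ("c9d8", "issue #421: second attempt, also guard the legend renderer"),
    ("d044", "Add axis label option (issue #430)"),
]

# Aggregate commits that reference the same issue; commits without an issue
# reference would need further heuristics (time windows, authors, dependencies).
groups = defaultdict(list)
for sha, message in commits:
    match = re.search(r"#(\d+)", message)
    key = f"issue #{match.group(1)}" if match else "unlinked"
    groups[key].append(sha)

for key, shas in groups.items():
    print(key, "->", shas)
```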
22. Research Directions
Program Differencing
Text Differencing
Syntax differencing
Semantic differencing
Code Change Summarization
Rules and exceptions
Control-flow changes
Evolutionary documentation
Querying and Filtering
Meaningful changes
Non-essential changes
Code Change Comprehension
Work-item change detection
Change decomposition
Change aggregation
22
23. Research Directions
Program Differencing
Text Differencing
Syntax differencing
Semantic differencing
Code Change Summarization
Rules and exceptions
Control-flow changes
Evolutionary documentation
Querying and Filtering
Meaningful changes
Non-essential changes
Work-item-specific
changes
Code Change Comprehension
Work-item change detection
Change decomposition
Change aggregation
23
24. Research Directions
Program Differencing
Text Differencing
Syntax differencing
Semantic differencing
Code Change Summarization
Rules and exceptions
Control-flow changes
Evolutionary documentation
Querying and Filtering
Meaningful changes
Non-essential changes
Work-item-specific
changes
Code Change Comprehension
Concrete Execution
Work-item change detection
Change decomposition
Change aggregation
24
25. Explaining code changes with executions of co-changed test cases
25
Test cases
Best documentation for source code
Test cases co-changed with source code
Documentation for code changes?
Mostly synchronous co-evolution of production and test code (Zaidman et al., Empirical Software Engineering’11)
Differential test executions
Co-changed test cases T
Executing T on the old version P and the new version P’
Comparing the two executions to explain the changed behavior
From StackExchange
http://programmers.stackexchange.com/questions/154439/quality-of-code-in-unit-tests?newsletter=1&nlcode=67628%7c1a35
• “Unit tests are one of the best sources of documentation for your system, and arguably the most reliable form”
• “Unit tests are often the first thing you look at when trying to grasp what some piece of code does”
• “They can also serve as a starting point for people new to the code base”
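A minimal sketch of what differential test execution could look like (my own illustration of the proposed direction, not an existing tool): run the same co-changed test inputs against the old and new versions of a function and report where the observed behavior diverges.

```python
def discount_old(price, customer):
    # old version: every registered customer gets 5% off
    return price * 0.95 if customer.get("registered") else price


def discount_new(price, customer):
    # new version: the discount now also requires a minimum purchase
    if customer.get("registered") and price >= 100:
        return price * 0.95
    return price


# Inputs taken from (hypothetical) co-changed test cases.
test_inputs = [
    (50, {"registered": True}),
    (200, {"registered": True}),
    (80, {"registered": False}),
]

# Differential execution: same inputs, both versions, compare the outcomes.
for price, customer in test_inputs:
    old_out, new_out = discount_old(price, customer), discount_new(price, customer)
    status = "SAME" if old_out == new_out else "CHANGED"
    print(f"{status:7} price={price:<4} registered={customer['registered']!s:5} old={old_out} new={new_out}")
```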
26. Research Directions
Program Differencing
Text Differencing
Syntax differencing
Semantic differencing
Code Change Summarization
Rules and exceptions
Control-flow changes
Evolutionary documentation
Querying and Filtering
Meaningful changes
Non-essential changes
Work-item-specific changes
Code Change Comprehension
Concrete Execution
• Co-changed test cases
• Differential test execution
Work-item change detection
Change decomposition
Change aggregation
26
Editor's Notes
We know that software is continuously evolving, since developers change source code practically all the time. One consequence is that developers also have to understand these code changes, which I refer to as CCC (code change comprehension) throughout this talk. Last year, we conducted an exploratory study at Microsoft, where we sent surveys to and conducted interviews with Microsoft developers about their CCC practices. This work is published at FSE. In this work we found, first, that CCC is frequently required: the majority of developers understand code changes several times each day.
In this year’s ICSE, B, in their empirical study on modern code review, also reported similar findings: CCC is more common than understanding the entire program, but it is also the most challenging part.
These findings motivate our work, since CCC is a challenging activity but also fundamental to developers’ daily practices.
So in the literature survey, I identify 3 major categories related to CCC.
First is program differencing. This line of work tries to help developers by describing code changes.
Second is code change summarization. Studies in this category take a step further and try to reason about and explain code changes.
Third is querying and filtering. This is a sort of “customized” CCC.
Unix diff is the most well-known example in this category. But it’s also well-recognized for two major limitations.
Ldiff:
diff: Longest common subsequence
All possible hunk pairs -> similarity (vector-space cosine similarity) -> pick the topmost pairs
Line matching -> Levenshtein edit distance -> pairs above the threshold are marked as changed
Unmatched lines form new hunks -> iterate step 2 (a line-matching sketch follows)
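A minimal sketch of the line-matching step, assuming a plain Levenshtein similarity and an arbitrary threshold of 0.6; this only illustrates the idea and is not ldiff's actual implementation.

```python
# A toy sketch of ldiff-style line matching (not the actual ldiff implementation).
# Lines from a pair of matched hunks are compared by Levenshtein similarity;
# pairs above an assumed threshold are reported as changed lines rather than
# as an unrelated deletion plus addition.

def levenshtein(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def similarity(a: str, b: str) -> float:
    return 1.0 if a == b else 1.0 - levenshtein(a, b) / max(len(a), len(b))

THRESHOLD = 0.6  # assumed value, not taken from the ldiff paper

old_hunk = ["int total = 0;", "total = total + x;"]
new_hunk = ["int total = 0;", "total = total + x + offset;", "log(total);"]

for o in old_hunk:
    best = max(new_hunk, key=lambda n: similarity(o, n))
    s = similarity(o, best)
    if s == 1.0:
        print(f"UNCHANGED: {o!r}")
    elif s >= THRESHOLD:
        print(f"CHANGED:   {o!r} -> {best!r}")
    else:
        print(f"DELETED:   {o!r}")
```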
Since these techniques treat a program as plain text, they report program differences as changes to lines and characters. From a developer’s point of view, the syntactic, or structural, information about the source code is lost. This motivates another line of work, which we call “syntax differencing”.
This line of work uses structured representations of a program.
One example is ChangeDistiller, which represents a program as an abstract syntax tree (AST) and applies a tree differencing algorithm.
In addition to ASTs, studies also represent code in XML, which can also embed … Then we can apply XML differencing algorithms, such as diffX proposed in …, to compute program differences. (A rough AST-comparison sketch follows.)
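As a rough illustration of tree-based differencing, the sketch below parses two versions of a tiny function with Python's own ast module and reports node kinds that appear in only one version. This is a deliberately coarse stand-in for ChangeDistiller's tree-matching algorithm, which works on Java ASTs and produces a proper edit script; the two source snippets are assumptions for illustration.

```python
# A minimal illustration of comparing structured (tree) representations of two
# program versions, using Python's ast module as a stand-in.

import ast

old_src = "def total(x):\n    return x + 1\n"
new_src = "def total(x):\n    if x > 0:\n        return x + 1\n    return 0\n"

old_nodes = [type(n).__name__ for n in ast.walk(ast.parse(old_src))]
new_nodes = [type(n).__name__ for n in ast.walk(ast.parse(new_src))]

# Report node kinds present in only one version -- a very coarse stand-in for a
# real tree edit script (insert/delete/move/update operations).
print("Inserted node kinds:", sorted(set(new_nodes) - set(old_nodes)))
print("Deleted node kinds: ", sorted(set(old_nodes) - set(new_nodes)))
```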
In cases where developers perform behavior-preserving modifications, such as switching the order of an if-else, syntactic differencing will still report the differences, although from the developer’s perspective this might not be an important change.
Therefore, the next line of work focuses on semantic differencing of two program versions. Semantic diff operates at the method level and compares variable dependencies to derive behavioral changes.
In the old version of the method add, if x is not equal to HI, x is added to TOT; otherwise, DEF is added to the total. From this code, we can derive a list of dependencies, for example, …
In the new version, the developer simply wants to switch the order of the if-else branches but mistakenly uses an assignment instead of an equality test. Therefore, when the technique computes the variable dependencies and compares them to the previous ones, it will report that …
These behavioral differences are certainly not expected, because when x is assigned HI, the initial value of x is always lost. In such cases, semantic diff is better than syntactic diff, since it can draw developers’ attention to the program’s unexpected behavioral change. (A toy reconstruction of this dependency comparison follows.)
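A toy reconstruction of the dependency comparison described above, not the actual tool. The dependency pairs (target, source) are hand-derived from the old version of add and from the new version containing the accidental assignment; a real semantic-diff tool would compute them automatically.

```python
# Diff two hand-derived sets of variable dependencies, as a semantic-diff tool
# would report them for the add example above.

old_deps = {
    ("TOT", "TOT"), ("TOT", "x"), ("TOT", "DEF"), ("TOT", "HI"),
}
new_deps = {
    ("TOT", "TOT"), ("TOT", "DEF"), ("TOT", "HI"),
    ("x", "HI"),   # introduced by writing `x = HI` instead of `x == HI`
}

print("New dependencies: ", sorted(new_deps - old_deps))   # x now depends on HI
print("Lost dependencies:", sorted(old_deps - new_deps))   # TOT no longer uses the input x
```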
Another work, JDiff, which is published in …, addresses semantic differencing for object-oriented programs.
By simply applying syntactic differencing, we would only know that m1 is added, and … But developers may be more interested in how the behavior of the program has changed.
If the dynamic type of a is B, the call a.m1 in the new version actually invokes m1 in B.
The exception thrown will be caught by different catch blocks after the change.
JDiff extends the control-flow graph (CFG) to combine … The resulting ECFG considers dynamic binding and exception handling for the previous example, and a graph differencing algorithm can then be applied to reveal the difference. (A small dynamic-binding illustration follows.)
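A small illustration of why dynamic binding matters for change comprehension, following the a.m1 example above. The class and method names mirror that example; the Python code itself is an assumption for illustration and is unrelated to JDiff's actual ECFG construction.

```python
# Adding an override is reported by syntactic diff only as "m1 added", yet it
# changes the behavior of an *unchanged* call site through dynamic binding.

class A:
    def m1(self):
        return "A.m1"

# Old version: B inherits m1 from A.
class B_old(A):
    pass

# New version: B now overrides m1.
class B_new(A):
    def m1(self):
        return "B.m1"

def client(a: A) -> str:
    # The call site is identical in both versions; its behavior depends on the
    # dynamic type of `a`.
    return a.m1()

print(client(B_old()))  # "A.m1" before the change
print(client(B_new()))  # "B.m1" after the change: same call site, new behavior
```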
Some studies also use symbolic execution to characterize a program’s behavior. This technique … uses symbolic values instead of actual values. For example, the symbolic execution of this code fragment reads: if this condition is satisfied, return …; otherwise, if …, return ….
XXX proposed differential symbolic execution, which compares the symbolic execution of two program versions. The output has this form: under which condition do the two versions produce different results? (A solver-based sketch of this question follows.)
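As a concrete stand-in for that kind of output, the sketch below encodes two assumed versions of a small computation as Z3 expressions (requires the z3-solver package) and asks the solver for a condition under which they return different results. Real differential symbolic execution derives such summaries from the programs themselves; here both versions are made-up examples.

```python
# Ask a solver under which condition two program versions differ.
# pip install z3-solver

from z3 import Int, If, Solver, sat

x = Int("x")

old_result = If(x > 10, x - 10, x)     # old version
new_result = If(x >= 10, x - 10, x)    # new version: the boundary case changed

s = Solver()
s.add(old_result != new_result)        # "under which condition do they differ?"

if s.check() == sat:
    print("Versions differ, e.g. when x =", s.model()[x])   # expect x = 10
else:
    print("Versions are behaviorally equivalent")
```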
Now I’ve covered the three categories of program differencing. These works basically try to help CCC by describing what the code change is. The next line of work, which I call CCS (code change summarization), takes a further step and tries to explain code changes.
A program is represented as a set of predicates that describe code elements, containment relationships, and structural dependencies; these predicates are called “facts”. LSdiff then computes the changed facts between two program versions.
It infers rules from the list of changed facts,
and also infers exceptions to those rules. Example: the start methods of all of Car’s subtypes added calls to the Key.chk method, except for the subtype Kia. (A toy fact-diffing sketch follows.)
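A toy sketch of the fact-diffing and rule-with-exception idea, using the Kia example above. Honda and Ford are made-up sibling subtypes added only so the candidate rule has something to cover; this is not LSdiff's actual inference algorithm.

```python
# Diff two fact sets and check a candidate rule with exceptions.

old_facts = {
    ("calls", "Honda.start", "Engine.init"),
    ("calls", "Ford.start",  "Engine.init"),
    ("calls", "Kia.start",   "Engine.init"),
}
new_facts = {
    ("calls", "Honda.start", "Engine.init"), ("calls", "Honda.start", "Key.chk"),
    ("calls", "Ford.start",  "Engine.init"), ("calls", "Ford.start",  "Key.chk"),
    ("calls", "Kia.start",   "Engine.init"),              # Kia is the exception
}

added = new_facts - old_facts
subtypes = {"Honda", "Ford", "Kia"}          # assumed subtypes of Car

# Candidate rule: every subtype's start method added a call to Key.chk.
covered    = {s for s in subtypes if ("calls", f"{s}.start", "Key.chk") in added}
exceptions = subtypes - covered

print("Added facts:", sorted(added))
print("Rule: all Car subtypes' start() added a call to Key.chk,"
      " except", sorted(exceptions))   # -> ['Kia']
```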
Finally, DeltaDoc uses transformation heuristics to summarize these statements’ differences into human-readable documentation.
The studies we’ve seen so far all extract information from the source code itself. However, other software artifacts, such as commit logs, can also be helpful for understanding code changes, since in these artifacts we might find useful natural-language sentences related to the code changes. Motivated by this observation, … proposed …
Each sentence has some features, for example …. To locate the most informative or relevant sentences, they are ranked by their feature values.
Here is an example of their output: for this change, the summary contains a list of relevant sentences extracted from its evolutionary documents. (A toy ranking sketch follows.)
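A toy sketch of ranking candidate sentences by feature values. The two features, their weights, and the example sentences and identifiers are all assumptions for illustration, not the feature set used by the actual technique.

```python
# Rank candidate sentences from evolutionary documents (commit logs, issue
# reports) by a simple assumed feature score.

CHANGED_IDENTIFIERS = {"renderTitle", "LegendTitle"}   # assumed changed code elements

def score(sentence: str) -> float:
    words = sentence.split()
    mentions = sum(w.strip(".,()") in CHANGED_IDENTIFIERS for w in words)
    # Assumed weighting: reward mentions of changed identifiers, prefer brevity.
    return 2.0 * mentions - 0.05 * len(words)

candidates = [
    "Fixed a null check in renderTitle so empty legends no longer crash.",
    "Minor formatting cleanup across the module.",
    "LegendTitle now validates its arguments before drawing.",
]

for s in sorted(candidates, key=score, reverse=True):
    print(f"{score(s):5.2f}  {s}")
```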
The major challenges of using evolutionary documents are, first, that linkage between these documents may not exist, so we may not even be able to find documents relevant to a code change. This problem is known as the “missing link” problem and has been studied recently.
In addition, documents may not … In such cases, we cannot rely on them to extract informative change summaries.
As I introduced before, the biggest problem is verbosity. These are the rules and exceptions generated by LSdiff to describe a code change, and this is the number of lines in the change documentation. Compared to the human-written commit log (the black bar), the documentation generated by DeltaDoc is still very long.
Another challenge is whether changes that developers are not interested in can be identified automatically. For example, a rule reported by LSdiff says …, and in the user study participants complained that such a rule is not useful.
Therefore, there are studies that customize CCC so that developers can query the changes they are interested in and filter out irrelevant changes.
Non-essential changes include …, which are less likely to be of interest to developers.
They use ChangeDistiller to detect changes and apply partial program analysis (PPA) to resolve type bindings for partial programs (i.e., code changes).
However, the goal of this work is to…
In general, studies in this category focus on querying meaningful changes and filtering out non-essential changes. (A toy filtering sketch follows.)
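To close, a toy sketch of filtering one kind of non-essential change: formatting-only edits, detected by comparing whitespace-normalized text. A real approach, as noted above, works on resolved ASTs (e.g., via ChangeDistiller and PPA) rather than on raw text; the hunks below are illustrative assumptions.

```python
# Flag hunks whose old and new text differ only in whitespace/formatting.

import re

def normalize(code: str) -> str:
    # Collapse all whitespace so purely cosmetic edits compare equal.
    return re.sub(r"\s+", " ", code).strip()

def is_non_essential(old_hunk: str, new_hunk: str) -> bool:
    return normalize(old_hunk) == normalize(new_hunk)

old = "if (x != HI) {\n    tot = tot + x;\n}"
new = "if (x != HI) { tot = tot + x; }"      # formatting-only change

print(is_non_essential(old, new))  # True -> can be filtered out of the review
```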