Measuring Program Comprehension: A Large-Scale Field Study with Professionals
Ian wcre2011
1. An Exploratory Study of the Evolution of Communicated Information about the Execution of Large Software Systems
Weiyi Shang
Zhen Ming Jiang
Bram Adams
Ahmed E. Hassan
Michael W. Godfrey
University of Waterloo
Queen’s University
Mohamed Nasser
Parminder Flora
Research In Motion (RIM)
5. CI forms basis of Ecosystem of Log Processing Apps
Workload recovery
Anomaly detection
Capacity planning
System monitoring
Performance analysis
Failure diagnosis
6. How to keep Log Processing Apps in sync with CI?
Release 1, Release 2, Release 3
7. Our Study Dimensions
Quantity: How does CI evolve over time?
Type: What types of modifications happen to CI?
Content: What information is conveyed by the short-lived CI?
10. CI keeps on growing over time
[Figure: number of execution events (y-axis, 0 to 180) per release (x-axis: 0.14.0, 0.15.0, 0.16.0, 0.17.0, 0.18.0, 0.19.0, 0.20.0, 0.20.1, 0.20.2, 0.21.0)]
11. …even when system size decreases
Release   # K SLOC   # Execution log events
0.19.0    293        113
0.20.0    250        121
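The contrast between the two releases can be made concrete with a quick relative-change calculation. The sketch below uses only the two rows shown on this slide; the helper name `relative_change` is introduced here for illustration.

```python
def relative_change(old, new):
    """Return the relative change from old to new as a fraction."""
    return (new - old) / old

# Values from the slide (releases 0.19.0 and 0.20.0).
sloc_change = relative_change(293, 250)    # K SLOC
event_change = relative_change(113, 121)   # execution log events

print(f"K SLOC: {sloc_change:+.1%}")
print(f"Execution log events: {event_change:+.1%}")
```

With these numbers, code size shrinks by roughly 14.7% while the number of execution log events grows by roughly 7.1%.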
12. CI is impacted by re-engineering
[Figure: percentage of unchanged CI (y-axis, 0% to 100%) per release (x-axis: 0.15.0, 0.16.0, 0.17.0, 0.18.0, 0.19.0, 0.20.0, 0.20.1, 0.20.2, 0.21.0); annotation: large amounts of implementation changes]
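One way to arrive at an "unchanged CI" percentage of the kind plotted on this slide is to compare the sets of log message templates in two consecutive releases. The sketch below is an assumption about how such a share could be computed, not the study's actual procedure: the function name, the choice of the earlier release as the denominator, and the sample templates are all hypothetical.

```python
def unchanged_share(previous, current):
    """Fraction of the previous release's templates that survive verbatim.

    Assumption: the share is measured relative to the previous release.
    """
    previous, current = set(previous), set(current)
    if not previous:
        return 0.0
    return len(previous & current) / len(previous)

# Hypothetical log message templates for two consecutive releases.
release_a = ["task started", "task finished", "heartbeat received", "block replicated"]
release_b = ["task started", "task finished", "heartbeat received", "block copied"]

share = unchanged_share(release_a, release_b)
print(f"Unchanged CI: {share:.2%}")
```

Exact string matching is the strictest possible notion of "unchanged"; a looser comparison would report a higher share for the same two releases.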
13. How does CI evolve over time? (Quantity)
Growing & changing
Document & track
What types of modifications happen to CI? (Type)
What information is conveyed by the short-lived CI? (Content)
14. Six types of modification exist
Rephrasing
Redundant information
Adding information
Deleting information
Diverging
Merging
15. Six types of modification exist
Rephrasing, for example:
Before: Hadoop mapred Reduce task fetch n bytes
After: Hadoop MapReduce task Reduce fetch n bytes
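The rephrasing example on this slide shows why such a change matters to log-processing apps. As a minimal sketch, assume a hypothetical app that extracts the fetched byte count with a regular expression written against the older wording; the pattern, the function name `parse_fetch_bytes`, and the value 1024 substituted for `n` are illustrative assumptions, not part of the study.

```python
import re

# Hypothetical pattern written against the older wording of the message.
OLD_PATTERN = re.compile(r"^Hadoop mapred Reduce task fetch (\d+) bytes$")

def parse_fetch_bytes(line):
    """Return the byte count if the line matches the old wording, else None."""
    match = OLD_PATTERN.match(line)
    return int(match.group(1)) if match else None

# The two wordings from the slide, with n replaced by an illustrative value.
old_line = "Hadoop mapred Reduce task fetch 1024 bytes"
new_line = "Hadoop MapReduce task Reduce fetch 1024 bytes"

print(parse_fetch_bytes(old_line))
print(parse_fetch_bytes(new_line))
```

The parser recovers the value from the older line and returns `None` for the rephrased one, even though both lines report the same event.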
16. Six types of modification exist
Adding information, for example:
Before: ShuffleRamManager memory limit n MaxSingleShuffleLimit m
After: ShuffleRamManager memory limit n MaxSingleShuffleLimit m mergeThreshold Q
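Some of these modification types can be approximated mechanically. The sketch below is a simple token-based heuristic, not the classification procedure used in the study: it labels a change as adding information when the old message's tokens appear in order inside the new message, and as deleting information in the reverse case. The function names are hypothetical, and rephrasing, redundant information, diverging, and merging are all lumped together as "other".

```python
def is_subsequence(short, long):
    """True if every token of short appears in long, in the same order."""
    remaining = iter(long)
    return all(token in remaining for token in short)

def classify(old, new):
    """Heuristically label the change from one message template to another."""
    old_tokens, new_tokens = old.split(), new.split()
    if old_tokens == new_tokens:
        return "unchanged"
    if is_subsequence(old_tokens, new_tokens):
        return "adding information"
    if is_subsequence(new_tokens, old_tokens):
        return "deleting information"
    return "other"

# Message templates taken from the slides.
before = "ShuffleRamManager memory limit n MaxSingleShuffleLimit m"
after = "ShuffleRamManager memory limit n MaxSingleShuffleLimit m mergeThreshold Q"

print(classify(before, after))
print(classify(after, before))
```

A heuristic like this only narrows down candidates; deciding whether a change is a rephrasing or a genuine change of meaning still needs a human reading of the message.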
17. Six types of modification exist
Diverging, for example:
Before: Adding task to tasktracker
After: Adding Map Task to tasktracker
After: Adding Reduce Task to tasktracker
18. Six types of modification exist
Avoidable
19. Six types of modification exist
Recoverable
20. Six types of modification exist
Unavoidable
21. Most modifications can be avoided
[Bar chart: share of modifications by type (y-axis, 0% to 100%), bars grouped as avoidable, recoverable, and unavoidable]
Bar labels, in order: redundant info, rephrasing, adding info, deleting info, diverging, merging
Bar values, in order: 9.86%, 61.97%, 14.08%, 7.04%, 7.04%, 2.82%
22. How does CI evolve over time? (Quantity)
Growing & changing
Document & track
What types of modifications happen to CI? (Type)
6 types, mostly avoidable
What information is conveyed by the short-lived CI? (Content)
23. Short-lived CI contains implementation details
Hadoop saves output to a machine.
Hadoop assigns a reduce task to a machine.
Map task updates its progress.
Hadoop reads from a local file.
Hadoop Attempt saves its output and reports to the task tracker.
Implementation details highlighted: node name, local path, use of IPC, output file name
24. How does CI evolve over time? (Quantity)
Growing & changing
Document & track
What types of modifications happen to CI? (Type)
6 types, mostly avoidable
What information is conveyed by the short-lived CI? (Content)
Implementation-level details
Fragile
Maintenance effort