1. Supporting Software Evolution Using Adaptive Change Propagation Heuristics
Haroon Malik
Ahmed E. Hassan
School of Computing, Queen’s University, Canada
2. What Is Change Propagation?
It is the process of propagating code changes to other entities in a software system.
It keeps the assumptions in the system consistent after an entity is changed.
Incorrect propagation is likely to introduce bugs.
3. The Change Propagation Process
"How does a change in one source code entity propagate to other entities?"
[Figure: flowchart of the change propagation process]
New requirement or bug fix
Determine the initial entity to change
Change the entity
Determine other entities to change (consult a guru for advice)
For each suggested entity: change it, then determine other entities to change
Stop when there are no more changes
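The process on slide 3 can be read as a worklist loop. The Python sketch below is only an illustration under assumptions: `propagate_change`, `suggest` and `needs_change` are hypothetical names, `suggest` stands in for whatever advisor is consulted (a guru or a heuristic), and `needs_change` stands in for the developer's judgment.

```python
from collections import deque
from typing import Callable, Iterable, Set


def propagate_change(
    initial: str,
    suggest: Callable[[str], Iterable[str]],
    needs_change: Callable[[str], bool],
) -> Set[str]:
    """Worklist reading of the change propagation process (sketch).

    initial:      entity changed first, for a new requirement or bug fix
    suggest:      hypothetical advisor (guru or heuristic) mapping a changed
                  entity to entities that may also need to change
    needs_change: hypothetical stand-in for the developer's judgment
    """
    changed = {initial}               # "Change Entity" for the initial entity
    worklist = deque([initial])
    while worklist:                   # loop ends when there are no more changes
        entity = worklist.popleft()
        for candidate in suggest(entity):          # "Consult Guru for Advice"
            if candidate not in changed and needs_change(candidate):
                changed.add(candidate)             # change the suggested entity
                worklist.append(candidate)         # and propagate from it as well
    return changed
```

For example, with hypothetical dependencies `{"A": ["B", "D"], "B": ["C"]}` and a developer who only needs to change A, B and C, the loop returns `{"A", "B", "C"}` and never changes D.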
5. Consider a change set with A, B and C changing together
[Figure: suggestions from the HIST heuristic and the CUD heuristic (static dependency) for the change set A, B and C; suggesting B or C is helpful, suggesting D or E wastes developer time]
6. Consider a change set with A, B and C changing together
[Figure: the HIST and CUD heuristic (static dependency) suggestions from slide 5, repeated; legend: helpful vs. wasted developer time]
Which heuristics should we pick?
We should track the performance of a pool of heuristics over time, for each entity.
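7. Consider a change set with A, B and C changing together
[Figure: HIST and CUD heuristic (static dependency) suggestions for the change set A, B and C, with D as the unrelated entity; legend: helpful vs. wasted developer time]
A Best Heuristic Table (BHT) tracks and updates the best heuristic for each entity.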
8. Consider a change set with A, B and C changing together
[Figure: HIST and CUD heuristic (static dependency) suggestions for the change set A, B and C, plus a timeline showing A, E and D over time]
HIST or CUD?
The BHT says HIST always works well with A [A-Freq], so we use HIST.
Alternatively, the BHT might say that HIST worked well with A the last time A changed [A-Rec].
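11. Consider a change set with A, B and D changing together
[Figure: heuristic suggestions for the change set, drawn from entities E, D, A, B, X and Y]
Precision = correct suggestions / all suggestions = 1/5 = 20%
Recall = correct suggestions / entities that should be suggested = 1/1 = 100%
We want both high precision and high recall.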
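A minimal Python sketch of the two ratios. The function name and the convention of returning 0.0 when a set is empty are assumptions, not something the slides specify.

```python
from typing import Set, Tuple


def precision_recall(suggested: Set[str], actual: Set[str]) -> Tuple[float, float]:
    """Precision and recall of one heuristic's suggestions (sketch).

    suggested: entities the heuristic proposed
    actual:    entities that really had to change and still needed finding
    Empty sets score 0.0 (an assumed convention).
    """
    correct = len(suggested & actual)
    precision = correct / len(suggested) if suggested else 0.0
    recall = correct / len(actual) if actual else 0.0
    return precision, recall
```

With hypothetical sets chosen only to mirror the slide's ratios (five suggestions, one of them correct, one entity left to find), `precision_recall({"D", "E", "X", "Y", "Z"}, {"D"})` gives `(0.2, 1.0)`.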
12. Change Propagation Challenge
A mostly manual and time-consuming process
Relies on others:
Knowledge of senior developers, who are usually too busy to guide every change
Experience of a guru, who rarely exists in large projects
Communication among different teams, which is itself a challenge in large projects
Documentation and previous test suites, which are rarely up to date
13. Shortcomings of Current Practices
Each heuristic explores a single dimension:
HIST: given a changed entity A, a HIST heuristic suggests all entities that often changed together with A in the past.
CUD: given a changed entity A, a CUD heuristic returns all entities that depend on A or that A depends on.
FILE: given a changed entity A, a FILE heuristic returns all entities in the same file as A.
The heuristics are static: they neither adjust over time nor adapt to the particular changed entity.
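The three basic heuristics can be sketched as plain set queries. This Python sketch is illustrative, not the study's implementation: the class name `BasicHeuristics`, the dictionary inputs, and the `min_count` threshold standing in for "often" are all assumptions.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set


class BasicHeuristics:
    """Sketch of the HIST, CUD and FILE heuristics (hypothetical class).

    depends_on: static dependencies, entity -> entities it depends on
    file_of:    entity -> name of the file that defines it
    """

    def __init__(self, depends_on: Dict[str, Set[str]], file_of: Dict[str, str]):
        self.depends_on = depends_on
        self.file_of = file_of
        self.co_changes: Dict[str, Counter] = defaultdict(Counter)

    def record(self, change_set: Iterable[str]) -> None:
        """Count which entities changed together (the history HIST mines)."""
        entities = set(change_set)
        for e in entities:
            for other in entities - {e}:
                self.co_changes[e][other] += 1

    def hist(self, a: str, min_count: int = 1) -> Set[str]:
        # Entities that changed with `a` at least `min_count` times; the
        # threshold for "often" is an assumption of this sketch.
        return {e for e, n in self.co_changes[a].items() if n >= min_count}

    def cud(self, a: str) -> Set[str]:
        # Entities that `a` depends on, plus entities that depend on `a`.
        users = {e for e, deps in self.depends_on.items() if a in deps}
        return (self.depends_on.get(a, set()) | users) - {a}

    def file(self, a: str) -> Set[str]:
        # All other entities defined in the same file as `a`.
        f = self.file_of.get(a)
        return {e for e, g in self.file_of.items() if g == f and e != a}
```

None of these queries looks at how well a heuristic worked before, so the same rule is applied to every entity at every point in history, which is the limitation the slide calls static.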
14. Proposed Approach
Adaptive co-change meta-heuristics:
Track the best-performing heuristics for each entity in a Best Heuristic Table (BHT)
Update the table as the project evolves
15. BHT Update
The BHT records the best-performing heuristics:
A-Recency (A-Rec): based on the last change of an entity
A-Frequency (A-Freq): based on all changes of an entity
By continuously updating the BHT, we keep using the heuristic that has performed best so far for each entity.
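One way to picture the BHT is a per-entity record of which heuristic won on each past change: A-Rec returns the last winner, A-Freq the most frequent one. The Python sketch below assumes that every heuristic in the pool is scored after each change (for example by F-measure) and that the top score wins; the class and method names and the `default` fallback for entities without history are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict


class BestHeuristicTable:
    """Sketch of a Best Heuristic Table with A-Freq and A-Rec lookups."""

    def __init__(self, default: str):
        self.default = default                    # used before an entity has history
        self.last_winner: Dict[str, str] = {}     # A-Rec: winner on the last change
        self.win_counts: Dict[str, Counter] = defaultdict(Counter)  # A-Freq: all wins

    def update(self, entity: str, scores: Dict[str, float]) -> None:
        """Record how each heuristic scored on one change of `entity`."""
        winner = max(scores, key=scores.get)      # ties go to the first key listed
        self.last_winner[entity] = winner
        self.win_counts[entity][winner] += 1

    def a_rec(self, entity: str) -> str:
        return self.last_winner.get(entity, self.default)

    def a_freq(self, entity: str) -> str:
        counts = self.win_counts.get(entity)
        if not counts:
            return self.default
        return counts.most_common(1)[0][0]        # most frequent winner so far
```

This mirrors the slide 8 scenario: if HIST won two of A's three past changes but CUD won the most recent one, `a_freq("A")` returns `"HIST"` while `a_rec("A")` returns `"CUD"`.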
16. Empirical Study
Used change sets from 5 open source projects with over 39 years of combined development history:
PostgreSQL, FreeBSD, GCluster and GCC
Recovered change sets from source control repositories (CVS)
Replayed the history to measure the performance of each heuristic
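Replaying history means walking the change sets in commit order and letting a heuristic see only the changes that came before the one being predicted. The Python sketch below does this for a HIST-style heuristic under simplifying assumptions that may differ from the study's protocol: each entity in a change set is treated in turn as the initially changed entity, the rest of the set is what should be suggested, and queries with no suggestions are skipped. `replay_hist` is a hypothetical name.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple


def replay_hist(change_sets: List[Set[str]]) -> Tuple[float, float]:
    """Replay change sets in commit order and score a HIST-style heuristic.

    Returns (mean precision, mean recall) over the queries that produced
    at least one suggestion; (0.0, 0.0) if none did.
    """
    co_changes: Dict[str, Counter] = defaultdict(Counter)
    precisions: List[float] = []
    recalls: List[float] = []
    for change_set in change_sets:                 # chronological order
        for entity in change_set:
            expected = change_set - {entity}
            suggested = set(co_changes[entity])    # only earlier history is visible
            if not expected or not suggested:
                continue                           # nothing to score for this query
            correct = len(suggested & expected)
            precisions.append(correct / len(suggested))
            recalls.append(correct / len(expected))
        for entity in change_set:                  # reveal this change set afterwards
            for other in change_set - {entity}:
                co_changes[entity][other] += 1
    if not precisions:
        return 0.0, 0.0
    return sum(precisions) / len(precisions), sum(recalls) / len(recalls)
```

Revealing each change set only after it has been scored is the key design choice: it keeps the heuristic from predicting a change set it has already seen.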
17. Performance Measures of Heuristics
Project      HIST          CUD           FILE          A-Freq        A-Rec
             Rec   Prec    Rec   Prec    Rec   Prec    Rec   Prec    Rec   Prec
PostgreSQL   0.69  0.14    0.44  0.02    0.73  0.13    0.45  0.25    0.40  0.30
FreeBSD      0.70  0.12    0.40  0.02    0.76  0.11    0.41  0.27    0.41  0.30
GCluster     0.52  0.18    0.38  0.09    0.70  0.14    0.39  0.22    0.35  0.28
GCC          0.78  0.10    0.43  0.02    0.80  0.12    0.51  0.21    0.47  0.25
All          0.67  0.13    0.41  0.04    0.74  0.12    0.44  0.23    0.40  0.28
F-measure    0.23          0.06          0.21          0.30          0.33
(Rec = recall, Prec = precision)
Recall: adaptive heuristics are similar to the traditional heuristics
Precision: adaptive heuristics outperform the traditional heuristics
F-measure: adaptive heuristics outperform the traditional heuristics
(23% better than the best traditional heuristic, HIST)
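Assuming the F-measure row uses the standard harmonic mean of precision P and recall R, the adaptive columns can be checked against the "All" row:

```latex
F = \frac{2PR}{P + R}

F_{\text{A-Freq}} = \frac{2 \times 0.23 \times 0.44}{0.23 + 0.44} = \frac{0.2024}{0.67} \approx 0.30

F_{\text{A-Rec}} = \frac{2 \times 0.28 \times 0.40}{0.28 + 0.40} = \frac{0.224}{0.68} \approx 0.33
```

Because the harmonic mean is pulled toward the smaller value, the large precision gains of the adaptive heuristics outweigh their lower recall.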
18. Performance Characteristics of Adaptive Heuristics
To better understand our adaptive heuristics, we examined their performance along three directions:
Performance over time
BHT composition over time
BHT suggestions vs. optimal suggestions
19. Performance Over Time
[Figure: two panels plotting precision and recall by year (1993 to 2005) for HIST, CUD, FILE, A-Freq and A-Rec]
For precision:
Adaptive heuristics outperform the traditional heuristics.
For recall:
Adaptive heuristics do not perform as well as the traditional heuristics.
Overall, A-Rec has lower recall than A-Freq for all projects.
20. BHT Composition over Time
[Figure: BHT for FreeBSD. Two panels, A-Freq and A-Rec, plotting BHT composition (%) against time in days (0 to 4,000) for HIST, FILE and CUD]
All projects show the same trends.
At the start, HIST is not widely used.
As the projects evolve, HIST becomes the most effective heuristic.
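A plausible reading of "BHT composition (%)" is the share of BHT entries that currently point at each heuristic; sampling that share periodically while replaying history would yield curves like those plotted. The Python sketch below assumes that reading; `bht_composition` and the entity-to-heuristic snapshot format are hypothetical.

```python
from collections import Counter
from typing import Dict


def bht_composition(bht: Dict[str, str]) -> Dict[str, float]:
    """Percentage of BHT entries that point at each heuristic (sketch).

    bht: hypothetical snapshot mapping entity -> currently preferred heuristic
    """
    if not bht:
        return {}
    counts = Counter(bht.values())
    return {heuristic: 100.0 * n / len(bht) for heuristic, n in counts.items()}
```

For a hypothetical snapshot in which two of four entities prefer HIST, the function reports HIST at 50% and the other two heuristics at 25% each.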
21. BHT Suggestions vs. Optimal
Since we are replaying historical change sets, we can compare the adaptive heuristics against an optimal heuristic.
The optimal heuristic always (100% of the time) suggests the best heuristic.
Suggestion accuracy (percentage of correctly suggested heuristics): 76-85%
Performance:
63% of the optimal F-measure
HIST, the best-performing basic heuristic, reaches 44% of the optimal
37% room for improvement
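Because the replay knows how every heuristic would have performed on each query, an oracle can always pick the best one. The Python sketch below compares adaptive picks against such an oracle under two assumptions: a pick counts as correct when it ties the best score, and performance is compared as summed F-measure. The names are hypothetical, and the 76-85% and 63% figures above come from the study, not from this code.

```python
from typing import Dict, List, Tuple


def compare_with_optimal(
    picks: List[str], scores: List[Dict[str, float]]
) -> Tuple[float, float]:
    """Compare adaptive picks with an oracle that knows every outcome (sketch).

    picks[i]:  heuristic the adaptive meta-heuristic chose for query i
    scores[i]: F-measure each heuristic would have achieved on query i
    Returns (suggestion accuracy, adaptive F-measure as a fraction of optimal).
    """
    matches = 0
    achieved = 0.0
    optimal = 0.0
    for pick, s in zip(picks, scores):
        best = max(s.values())        # the oracle always takes the best heuristic
        if s[pick] == best:           # ties count as a correct suggestion
            matches += 1
        achieved += s[pick]
        optimal += best
    accuracy = matches / len(picks) if picks else 0.0
    ratio = achieved / optimal if optimal else 0.0
    return accuracy, ratio
```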
22. Improving the Performance of Adaptive Heuristics
Improve HIST, in the hope of improving the adaptive heuristics, by employing more advanced techniques
Two improved HIST heuristics [Hassan, Holt: 2005]:
RECN(M): given a changed entity E, RECN(M) suggests all entities that changed with E in the past M months.
FREQ(A): given a changed entity E, FREQ(A) suggests all entities that changed with E at least twice in the past and changed with E more than A% of the time.
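The two definitions above translate directly into queries over a timestamped history. In this Python sketch the function names, the `(timestamp, entities)` representation of a change set, and the approximation of a month as 30 days are assumptions; "more than A% of the time" is read as the fraction of E's change sets that also contain the candidate.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import List, Set, Tuple

ChangeSet = Tuple[datetime, Set[str]]   # (commit time, entities changed together)


def recn(history: List[ChangeSet], e: str, now: datetime, months: int) -> Set[str]:
    """RECN(M): entities that changed with `e` in the past M months.

    Assumption: a month is approximated as 30 days.
    """
    cutoff = now - timedelta(days=30 * months)
    out: Set[str] = set()
    for when, entities in history:
        if cutoff <= when <= now and e in entities:
            out |= entities - {e}
    return out


def freq(history: List[ChangeSet], e: str, a: float) -> Set[str]:
    """FREQ(A): entities that changed with `e` at least twice and in more
    than A% of the change sets that contain `e`."""
    with_e = [entities for _, entities in history if e in entities]
    counts = Counter(x for entities in with_e for x in entities if x != e)
    return {x for x, n in counts.items() if n >= 2 and n / len(with_e) > a / 100}
```

The at-least-twice rule filters out one-off co-changes, while the A% threshold filters out entities that change often overall but only occasionally alongside E.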
23. Improved HIST Heuristics
Integrated RECN(4) and FREQ(60) into the heuristic pool used by the adaptive meta-heuristics
Achieved 0.73 to 0.78 recall and 0.64 precision
Nearly 30% increase in performance:
A-Freq reaches 91% of the optimal heuristic's performance
A-Rec reaches 93% of the optimal heuristic's performance
RECN(M)   F-measure   FREQ(A)    F-measure
RECN(2)   0.39        FREQ(50)   0.39
RECN(4)   0.40        FREQ(60)   0.44
RECN(6)   0.34        FREQ(70)   0.42
RECN(8)   0.28        FREQ(80)   0.39
24. Findings
Adaptive heuristics can achieve:
0.73 to 0.78 recall and
0.64 precision
57% improvement over traditional heuristics
Performance differences are statistically significant according to a paired Wilcoxon signed-rank test at the 5% significance level (α = 0.05).
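For readers who want to see what the test computes, here is a self-contained Python sketch of the two-sided paired Wilcoxon signed-rank test using the normal approximation, without tie or continuity corrections. It is not the study's tooling: for real analyses SciPy's `scipy.stats.wilcoxon` is the usual route, and small samples call for exact p-values rather than this approximation. The function name and example data are hypothetical.

```python
import math
from typing import List, Tuple


def wilcoxon_signed_rank(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Two-sided paired Wilcoxon signed-rank test, normal approximation (sketch).

    Zero differences are dropped and tied |differences| get average ranks;
    no tie or continuity correction is applied.
    Returns (W, p) where W is the smaller of the positive and negative rank sums.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:                              # give tied |differences| average ranks
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1                 # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))      # two-sided p-value
    return min(w_plus, w_minus), p
```

Pairing ten hypothetical measurements in which the first method wins every time gives W = 0 and p of roughly 0.005, below α = 0.05.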