The document discusses techniques for analyzing software evolution and defects using mining software repositories (MSR) approaches. It describes change coupling analysis, which uses part of a meta-model to make sense of large amounts of change coupling information and address the goal of understanding how changes propagate through a software system. The technique analyzes the relationships between source code changes and software defects by examining the coupling between changes in source code and bug reports.
Evolution of Software Studied Through Mining Software Repositories
1. On the Evolution of
Source Code and Software Defects
Marco D’Ambros
REVEAL group @ Faculty of Informatics
University of Lugano, Switzerland
Dissertation committee
Prof. Michele Lanza
Prof. Carlo Ghezzi
Prof. Cesare Pautasso
Prof. Harald C. Gall
Prof. Hausi A. Müller
7. Ph.D. roadmap (figure)
A map of the route from "We are here" to the Ph.D., with stops marked x: Thesis, Analysis techniques, Tool support, Conclusion.
Hazards along the way: the swamp of procrastination, the peaks of tools madness, the haunted teachwood forest.
10. The Evolution of Software Evolution (timeline figure)
Years marked on the axis: 1975, 1980, 1982, 1986, 1988, 1990, 1992, 1995, 1996, 1997, 1999, 2000, 2003, 2004, 2006, 2007, 2008, 2010
Legend: Event, Publication, Tool
Foundation (1975 to 1990)
Tools: first software configuration management system (SCCS), RCS, CVS
Publications: Lehman's laws of software evolution; Boehm's spiral model
Infrastructure Implementation (1990 to 2000)
Tools: first bug tracking system (GNATS), Bugzilla, Subversion
Publications: cost of software maintenance estimated to be 50-75% of the total cost of software (Sommerville, Davis); first MSR approach (Ball et al.); Extreme Programming Explained: Embrace Change
The Advent of MSR (2000 to 2010)
Events: first workshop on MSR; MSR becomes a conference
Tools: Git, Jazz
Publications: cost of software maintenance estimated to be more than 85% (Erlikh); Release History Database (Fischer et al.)
18. Models (figure)
Data sources surrounding the model: SCM meta-data, source code (shown as a Java sorting fragment), IDE data, documentation, chats, unit tests, software defects, e-mail archive, bytecode
Center label: Holistic software evolution (A. Zeller, MSR keynote 2007)
Center label on the final build: Our Approach
23. An integrated view of software
evolution, combining historical
information regarding source
code and software defects,
supports an extensible set of
software maintenance tasks.
D’Ambros, 2010
34. Change Coupling Analysis
Meta-model (figure): SCM meta-data, source code, software defects, e-mail archive
Goal: Make sense of the huge amount of change coupling information
Layout of each technique slide: technique name, used part of the meta-model, goal or question, technique number
40. change cou•pling
implicit dependency of files
that frequently change together
[Gall et al., ICSM 1998]
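A minimal sketch of how the definition above could be turned into numbers, assuming the history is available as a list of commits, each holding the files it touched. The function names, the threshold `min_co_changes`, and the sample history are illustrative assumptions, not part of the slides.

```python
from collections import Counter
from itertools import combinations


def count_co_changes(commits):
    """Count, for every pair of files, how many commits changed both."""
    pair_counts = Counter()
    for changed_files in commits:
        # Sort so that each unordered pair has one canonical key.
        for a, b in combinations(sorted(set(changed_files)), 2):
            pair_counts[(a, b)] += 1
    return pair_counts


def coupled_pairs(commits, min_co_changes=2):
    """Keep the pairs that changed together at least min_co_changes times."""
    return {pair: n for pair, n in count_co_changes(commits).items()
            if n >= min_co_changes}


# Hypothetical history: each commit is the list of files it touched.
commits = [
    ["Parser.java", "Lexer.java"],
    ["Parser.java", "Lexer.java", "README"],
    ["Parser.java", "Ast.java"],
    ["Lexer.java", "Parser.java"],
]
print(coupled_pairs(commits))
```

With this history only the pair Lexer.java and Parser.java passes the threshold, since every other pair changed together just once.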
44. The Evolution Radar (tool screenshot)
Interface elements: package browser, class browser, protocol browser, method browser; main Evolution Radar visualization; secondary Evolution Radar visualization; current project package list; time
Tasks supported: system re-documentation and restructuring, assessing architecture decay, change impact analysis, coupled files, system evolution analysis
51. System Radiography view
Which components experienced
many defects?
Which defects are hard to fix?
Bug watch view
52. Bug Evolution Analysis
The visual analysis of bug
histories permits the detection
of critical software components
and exceptional bugs.
VISSOFT 2007
53. Bug-Code Co-Evolution Analysis
Meta-model (figure): SCM meta-data, source code, software defects, e-mail archive
Goal: Detect patterns in the co-evolution
of source code and defects
61. Bug-Code Co-Evolution Analysis
Detecting and visualizing
co-evolutionary patterns allows
the characterization of software
components based on their
co-evolution.
CSMR 2006, JSME 2009
82. Prediction Performance Across Five Systems (bar chart; score axis from 0 to 15)
Predictors compared: SCM meta-data, entropy of changes, code metrics, previous defects, churn of code metrics, entropy of code metrics
Churn of code metrics and entropy of code metrics: most stable performance but computationally expensive
Other annotations on the chart: performance not stable across systems; good performance and fast to compute
84. Bug Prediction
The entropy and the churn of
code metrics are the most stable
predictors across different
systems.
MSR 2010, EMSE under review
87. Measuring Change Coupling
Distribution: number of coupled classes
Force: number of co-changes
Time decay: changes far in the past count less
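The three measures can be sketched for a single class as follows. This is only an illustration under stated assumptions: commits are ordered oldest first, each commit is a set of class names, and the halving weight used for time decay is an assumed choice, not the weighting defined in the thesis.

```python
def coupling_measures(target, commits, min_co_changes=1):
    """Return (distribution, force, time_decay) for one class.

    commits is ordered oldest first; each commit is a set of class names.
    """
    co_changes = {}  # other class -> number of commits shared with target
    time_decay = 0.0
    newest = len(commits) - 1
    for index, changed in enumerate(commits):
        if target not in changed:
            continue
        others = set(changed) - {target}
        for other in others:
            co_changes[other] = co_changes.get(other, 0) + 1
        # Assumed weighting: a co-change that lies `age` commits behind
        # the newest commit counts 1 / 2**age.
        age = newest - index
        time_decay += len(others) / 2 ** age
    # Distribution: how many classes are coupled with the target.
    distribution = sum(1 for n in co_changes.values() if n >= min_co_changes)
    # Force: how many co-changes the target has in total.
    force = sum(co_changes.values())
    return distribution, force, time_decay


# Hypothetical history of four commits, oldest first.
history = [{"A", "B"}, {"A", "B", "C"}, {"B", "C"}, {"A", "C"}]
print(coupling_measures("A", history))
```

Here class A is coupled with two classes (distribution 2) through four co-changes (force 4), while the decayed value is lower than the force because the older co-changes are discounted.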
88. Correlation with Software Defects
Measures: distribution, force, time decay
Change coupling strongly correlates with software defects
0.8+ Spearman's correlation
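For readers unfamiliar with the statistic, the following is a small pure-Python sketch of Spearman's rank correlation: rank both series (tied values share the mean rank), then take the Pearson correlation of the ranks. The per-class values at the bottom are made up for illustration and are not data from the thesis.

```python
def average_ranks(values):
    """Rank values starting at 1; tied values get the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        mean_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the two rank series."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Made-up per-class values: a coupling measure and a defect count.
coupling = [1, 4, 2, 9, 7]
defects = [0, 3, 1, 8, 5]
print(spearman(coupling, defects))
```

Because only ranks matter, the statistic rewards any monotonic relationship between coupling and defects, not just a linear one.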
90. Defect Prediction (bar chart; vertical axis from 0 to 0.9)
Predictors compared: code metrics, SCM data, change coupling
Panels: explanative power, predictive power
Annotation on the final build: 14%
94. Bug Prediction with Change Coupling
Change coupling correlates
with software defects and can
be used to improve defect
prediction models.
WCRE 2009
96. Design Flaws
Class (size ∝ responsibility)
Design guideline: A class should have one responsibility
Violated
Does the presence of design flaws correlate with software defects?
And their addition?
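106. Analyzing Design Flaw Deltas (figure)
Two timelines split into intervals Δt: the number of flaws, from which the addition of flaws per interval is derived, and the defects. The addition of flaws is correlated with the defects.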
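The delta step can be sketched as below, assuming the number of flaws is sampled once per snapshot and defects are counted per interval Δt between snapshots. The function names and sample numbers are hypothetical.

```python
def flaw_deltas(flaw_counts):
    """Flaws added between consecutive snapshots (negative means removals)."""
    return [later - earlier
            for earlier, later in zip(flaw_counts, flaw_counts[1:])]


def pair_with_defects(flaw_counts, defects_per_interval):
    """Align each interval's flaw delta with the defects of that interval."""
    deltas = flaw_deltas(flaw_counts)
    if len(deltas) != len(defects_per_interval):
        raise ValueError("need one defect count per interval between snapshots")
    return list(zip(deltas, defects_per_interval))


# Hypothetical data: five snapshots give four intervals.
flaws = [3, 5, 5, 9, 8]
defects = [4, 1, 6, 0]
print(pair_with_defects(flaws, defects))
```

The resulting pairs are what a rank correlation would then be computed on, one pair per interval.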
107. Results on Six Large Systems
0.4+ flaw presence correlation
0.6+ flaw addition correlation
No flaw correlates more than others consistently across systems
110. Bug Prediction with E-mails
Meta-model (figure): SCM meta-data, source code, software defects, e-mail archive
Question: Can bug prediction techniques
be improved with e-mail data?
111. Software entities that are frequently
mentioned in development mailing
lists are defect prone
Popularity Metrics
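One simple popularity metric could be the number of e-mails that mention a class. The sketch below assumes plain-text e-mail bodies and whole-word matching on the class name; both are simplifying assumptions, and the class names and messages are illustrative only.

```python
import re


def popularity(class_names, emails):
    """Count, per class, the e-mails that mention its name as a whole word."""
    counts = {}
    for name in class_names:
        pattern = re.compile(r"\b" + re.escape(name) + r"\b")
        counts[name] = sum(1 for body in emails if pattern.search(body))
    return counts


# Illustrative mailing-list messages and class names.
emails = [
    "The IndexWriter leaks a file handle when IndexReader is open.",
    "Patch for IndexWriter attached.",
    "Unrelated: the build is green again.",
]
classes = ["IndexWriter", "IndexReader", "QueryParser"]
print(popularity(classes, emails))
```

Whole-word matching keeps a name such as IndexWriter from being counted inside a longer identifier, at the cost of missing misspelled or abbreviated mentions.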
112. Do popularity metrics correlate with defects? (bar chart)
Vertical axis: Spearman's correlation, from 0 to 0.8
Systems: Equinox, Jackrabbit, Lucene, Maven
Series: popularity metrics, lines of code
115. Do popularity metrics improve existing bug prediction techniques? (bar chart)
Vertical axis: prediction performance, from 0 to 0.5
Bars: SCM data, code metrics, SCM data + POP metrics, code metrics + POP metrics
Annotated improvements on the final build: 4% and 12%
118. Bug Prediction with E-mails
Popularity metrics extracted
from development mailing lists
correlate with defects and
improve existing bug prediction
techniques.
FASE 2010
138. Ph.D. roadmap (figure)
The map now shows an additional stop, Intermezzo, before the Conclusion.
139. Replicating
Experiments
Bug prediction benchmark
Models available through
Churrasco web interface
140. Collaboration in Churrasco (tool screenshot), STTT 2010
Interface elements: recent annotations added, user, interactive SVG visualization, people participating in the collaboration, report generator, selected figure, context menu, selected figure information, metrics mapping configurator, package selector, regular expression matcher
154. Limitations: user studies, other languages, more case studies
Future Work: exploit author data, extend the meta-model, combine techniques (the final build also shows the label Mevo)
162. Journal papers
1. On Porting Software Visualization Tools to the Web
Marco D'Ambros, Michele Lanza, Mircea Lungu, Romain Robbes
In Software Tools for Technology Transfer (STTT), Springer, 2010.
2. Distributed and Collaborative Software Evolution Analysis with Churrasco
Marco D'Ambros, Michele Lanza
In Journal of Science of Computer Programming (SCP), Vol. 75, No. 4, pp. 276-287. Elsevier, 2010.
3. Visualizing Co-Change Information with the Evolution Radar
Marco D'Ambros, Michele Lanza, Mircea Lungu
In IEEE Transactions on Software Engineering (TSE), Vol. 35, No. 5, pp. 720-735. IEEE CS Press, 2009.
4. Visual Software Evolution Reconstruction
Marco D'Ambros, Michele Lanza
In Journal on Software Maintenance and Evolution: Research and Practice (JSME), Vol. 21, No. 3, pp. 217-232, May 2009. John Wiley & Sons, 2009.
Conference papers
1. On the Impact of Design Flaws on Software Defects
Marco D'Ambros, Alberto Bacchelli, Michele Lanza
In Proceedings of QSIC 2010, pp. 23-31.
2. An Extensive Comparison of Bug Prediction Approaches
Marco D'Ambros, Michele Lanza, Romain Robbes
In Proceedings of MSR 2010, pp. 31-41.
3. Are Popular Classes More Defect Prone?
Alberto Bacchelli, Marco D'Ambros, Michele Lanza
In Proceedings of FASE 2010, pp. 59-73.
4. On the Relationship Between Change Coupling and Software Defects
Marco D'Ambros, Michele Lanza, Romain Robbes
In Proceedings of WCRE 2009, pp. 135-144.
5. Promises and Perils of Porting Software Visualization Tools to the Web
Marco D'Ambros, Mircea Lungu, Michele Lanza, Romain Robbes
In Proceedings of WSE 2009, pp. 109-118.
6. A Flexible Framework to Support Collaborative Software Evolution Analysis
Marco D'Ambros, Michele Lanza
In Proceedings of CSMR 2008, pp. 3-12.
7. Reverse Engineering with Logical Coupling
Marco D'Ambros, Michele Lanza
In Proceedings of WCRE 2006, pp. 189-198.
8. Software Bugs and Evolution: A Visual Approach to Uncover Their Relationships
Marco D'Ambros, Michele Lanza
In Proceedings of CSMR 2006, pp. 227-236.
Workshop papers
1. Churrasco: Supporting Collaborative Software Evolution Analysis
Marco D'Ambros, Michele Lanza
In Proceedings of WASDeTT 2008.
2. "A Bug's Life" - Visualizing a Bug Database
Marco D'Ambros, Michele Lanza, Martin Pinzger
In Proceedings of VISSOFT 2007, pp. 113-120.
3. The Evolution Radar: Visualizing Integrated Logical Coupling Information
Marco D'Ambros, Michele Lanza, Mircea Lungu
In Proceedings of MSR 2006, pp. 26-32.
Other publications
1. Supporting Software Evolution Analysis with Historical Dependencies and Defect Information
Marco D'Ambros
In Proceedings of ICSM 2008, pp. 412-415.
2. The Metabase: Generating Object Persistency Using Meta Descriptions
Marco D'Ambros, Michele Lanza, Martin Pinzger
In Proceedings of FAMOOSr 2007.
3. BugCrawler: Visualizing Evolving Software Systems
Marco D'Ambros, Michele Lanza
In Proceedings of CSMR 2007, pp. 333-334.
4. Applying the Evolution Radar to PostgreSQL
Marco D'Ambros, Michele Lanza
In Proceedings of MSR 2006, pp. 177-178.