1. Software Evolution and Defects from a Controlled,
Multiple, Industrial Case Study
Aiko Yamashita, S. Amirhossein Abtahizadeh, Foutse Khomh, Yann-Gaël Guéhéneuc
Centrum Wiskunde & Informatica
Oslo and Akershus University College of Applied Sciences
Polytechnique Montréal
Data Showcase - MSR 2017 - Buenos Aires, Argentina
6. Study 1
• Simula experiment
• Software replicability
• 4 Norwegian firms
Java applications with nearly the same functionality (Systems A, B, C, D)
7. Task and learning effect
Study 2
• Simula multiple case study
• Software maintainability
• 2 European firms
9. Task and learning effect
Control over task:
• Task 1: Replacing an external data source
• Task 2: New authentication mechanism
• Task 3: New reporting functionality
Control over learning effect:
(Figure: the assignment of developers to Systems A, B, C, and D)
16. Programming skills
• Measurement instrument based on a combination of speed and correctness.
• The Rasch measurement model was used (illustrated below).
• Sixty-five professional developers from eight countries participated in validating the instrument.
• They solved 19 Java programming tasks over two days.
• Six of the participants who scored better than average skill were selected.
"Construction and Validation of an Instrument for Measuring Programming Skill" (Bergersen et al., 2014)
Control over programming skills
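For orientation, a minimal sketch of the measurement model: in the dichotomous Rasch model (a simplification here; the published instrument scores items polytomously, combining solution time and correctness), the probability that a developer of ability θ solves task i of difficulty b_i is:

```latex
% Dichotomous Rasch model (simplified sketch; the actual instrument of
% Bergersen et al. uses a polytomous variant combining time and correctness).
% \theta = developer ability, b_i = difficulty of programming task i.
P(X_i = 1 \mid \theta, b_i) = \frac{e^{\theta - b_i}}{1 + e^{\theta - b_i}}
```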
17. Variables and Data Sources
(Figure from [1], reconstructed as a list.)
Moderator variables: System, Tasks, Project context, Programming skill, Development technology
Variables of interest: Code smells (num. smells**, smell density**; an assumed definition follows below), Task dates+
Maintenance outcomes: Defects*, Change size**, Effort**, Maintainability perception*, Maintenance problems**
Data sources: Source code, Daily interviews (audio files/notes), Subversion database, Think-aloud sessions (video files/notes), Task progress sheets, Eclipse activity logs, Trac (issue tracker) and acceptance test reports, Open interviews (audio files/notes), Study diary
** System and file level
* Only at system level
[1] Yamashita, 2012: "Assessing the capability of code smells to support software maintainability assessments: Empirical inquiry and methodological approach", PhD thesis
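The deck does not define smell density; a common normalization, assumed here, is the smell count per unit of code size:

```latex
% Assumed definition (not stated in the deck): smell density at file or
% system level, normalizing the raw smell count by size in lines of code.
\text{smell density} = \frac{\text{number of smells}}{\text{LOC}} \times 1000
```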
21. Source Code**
• Java, JavaScript, SQL, HTML, XML.
• Developed by 4 Norwegian companies based on the same specification.
• Resulting from the experiment reported by Anda et al. (2008): "Variability and Reproducibility in Software Engineering: A Study of Four Companies that Developed the Same System".
Java applications with nearly the same functionality (Systems A, B, C, D)
**Available at: opendata.soccerlab.polymtl.ca/git/users/root/projects
30. Code smells and evolution data**
Code smells:
• Tools for code smells: Borland Together and InCode
• Code smells detected: Data Class, Data Clumps, Duplicated Code in Conditional Branches, Feature Envy, God (Large) Class, God (Long) Method, Misplaced Class, Refused Bequest, Shotgun Surgery, Temporary Variable Used for Several Purposes, Use of Implementation Instead of Interface, and Interface Segregation Principle (ISP) Violation
• Files: InitialSmells.xls (1 version), FinalSmells.xls (12 versions)
Code evolution:
• Tool for changes: custom code written with SVNKit (a sketch follows below)
• Variables: Programmer, Revision No., Date, Full path, Filename, File extension, System, Action type (i.e., Added, Deleted, Modified, Renamed), No. lines added, No. lines deleted, No. lines changed, and Churn
• File: Changes.xls (includes the evolution of all 12 versions)
**Available at https://zenodo.org/record/293719
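A minimal sketch, not the authors' actual extraction code, of how per-file change records like those in Changes.xls can be mined with SVNKit; the repository URL is a placeholder (the original SVN repositories are available upon request):

```java
import java.util.Collection;
import org.tmatesoft.svn.core.SVNLogEntry;
import org.tmatesoft.svn.core.SVNLogEntryPath;
import org.tmatesoft.svn.core.SVNURL;
import org.tmatesoft.svn.core.internal.io.dav.DAVRepositoryFactory;
import org.tmatesoft.svn.core.io.SVNRepository;
import org.tmatesoft.svn.core.io.SVNRepositoryFactory;

public class ChangeExtractor {
    public static void main(String[] args) throws Exception {
        // Enable the http/https (DAV) protocol before opening a repository.
        DAVRepositoryFactory.setup();
        // Placeholder URL; the original SVN repositories are available upon request.
        SVNRepository repo = SVNRepositoryFactory.create(
                SVNURL.parseURIEncoded("https://example.org/svn/systemA"));

        long head = repo.getLatestRevision();
        @SuppressWarnings("unchecked")
        Collection<SVNLogEntry> log = (Collection<SVNLogEntry>)
                repo.log(new String[] { "" }, null, 1, head, true, true);

        // One row per changed file: programmer, revision, date, path, action type
        // (SVN reports A=Added, D=Deleted, M=Modified, R=Replaced).
        for (SVNLogEntry entry : log) {
            for (SVNLogEntryPath path : entry.getChangedPaths().values()) {
                System.out.printf("%s,%d,%tF,%s,%c%n",
                        entry.getAuthor(), entry.getRevision(), entry.getDate(),
                        path.getPath(), path.getType());
            }
        }
        // Line counts (lines added/deleted/changed, churn) additionally require
        // diffing consecutive revisions (e.g., with SVNKit's SVNDiffClient); omitted here.
    }
}
```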
35. Software Evolution History**
• 3 projects per system, i.e., 6 developers × 2 systems = 12 projects (cases or evolution histories)
• Technologies involved: MySQL, Apache Tomcat, SVN, Trac, MyEclipse
• Each project took 3-4 weeks, full-time.
• SVN was converted to Git and hosted at Polytechnique Montréal (a cloning sketch follows below).
(Figure: the assignment of developers to Systems A, B, C, and D)
**Available at: opendata.soccerlab.polymtl.ca/git/users/root/projects
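Since the histories were converted to Git, they can be walked with ordinary Git tooling; a minimal JGit sketch (the repository name systemA.git is hypothetical; browse the hosting page above for the actual project names):

```java
import java.io.File;
import org.eclipse.jgit.api.Git;
import org.eclipse.jgit.revwalk.RevCommit;

public class CloneHistory {
    public static void main(String[] args) throws Exception {
        // Hypothetical repository name; see
        // opendata.soccerlab.polymtl.ca/git/users/root/projects for the real list.
        String uri = "http://opendata.soccerlab.polymtl.ca/git/users/root/projects/systemA.git";
        try (Git git = Git.cloneRepository()
                .setURI(uri)
                .setDirectory(new File("systemA"))
                .call()) {
            int commits = 0;
            for (RevCommit c : git.log().call()) {  // walk the converted SVN history
                commits++;
            }
            System.out.println("Commits in this evolution history: " + commits);
        }
    }
}
```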
43. Defect Data**
• Due to the heterogeneity of the systems, no common unit testing suite is available :(
• 2 rounds of acceptance testing for each of the 12 projects
• Defects were recorded in Trac after each round of acceptance testing
• Trac was too tightly integrated with SVN, so it was not possible to install it on a server++
• 12 reports extracted from Trac: Defects_Dev{1/2/3/4/5/6}_Sys{A/B/C/D}.xlsx (a parsing sketch follows below)
++Original SVN repositories and Trac instances are available upon request
**Available at https://zenodo.org/record/293719
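The naming scheme encodes the developer and system of each report, so the 12 files can be indexed mechanically; a small sketch (the example file name is an illustrative instance of the scheme):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DefectReportIndex {
    // File-name scheme from the slide: Defects_Dev{1..6}_Sys{A..D}.xlsx
    private static final Pattern NAME =
            Pattern.compile("Defects_Dev([1-6])_Sys([A-D])\\.xlsx");

    public static void main(String[] args) {
        String fileName = "Defects_Dev3_SysB.xlsx";  // illustrative instance
        Matcher m = NAME.matcher(fileName);
        if (m.matches()) {
            int developer = Integer.parseInt(m.group(1));
            char system = m.group(2).charAt(0);
            System.out.printf("Developer %d, System %c%n", developer, system);
        }
    }
}
```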
52. Task Dates**
A problem in longitudinal, brown-field studies: the limits between tasks become "blurry".
Examples:
• A developer finishes Task 3 in System 1 in the morning and moves on to Task 1 for System 2 in the afternoon.
• A developer was working on Task 2 but had forgotten to change something in Task 1, so they temporarily switched between tasks.
We used different data sources to estimate the dates on which a developer was working on a given system and a given task (a joining sketch follows below).
(Figure from [1]: the task dates are triangulated from the data sources on the "Variables and Data Sources" slide, e.g., daily interviews, think-aloud videos, task progress sheets, Eclipse activity logs, the Subversion database, Trac and acceptance test reports, open interviews, and the study diary)
**Available at https://zenodo.org/record/293719
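One way to use the estimated dates, sketched under assumptions (the TaskInterval shape and all dates are hypothetical; the actual spreadsheet layout may differ), is to join them with the change data by developer, system, and date:

```java
import java.time.LocalDate;
import java.util.List;

public class TaskDateAssigner {
    // Hypothetical row shape for the estimated task dates: the interval
    // [start, end] during which a developer worked on a task in a system.
    record TaskInterval(int developer, char system, int task,
                        LocalDate start, LocalDate end) {}

    // Return the task whose estimated interval contains the commit date,
    // or null when the date falls in a "blurry" gap between tasks.
    static Integer taskForCommit(List<TaskInterval> intervals,
                                 int developer, char system, LocalDate date) {
        for (TaskInterval i : intervals) {
            if (i.developer() == developer && i.system() == system
                    && !date.isBefore(i.start()) && !date.isAfter(i.end())) {
                return i.task();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<TaskInterval> intervals = List.of(  // made-up illustrative dates
                new TaskInterval(1, 'A', 1, LocalDate.of(2008, 9, 1), LocalDate.of(2008, 9, 5)),
                new TaskInterval(1, 'A', 2, LocalDate.of(2008, 9, 8), LocalDate.of(2008, 9, 12)));
        System.out.println(taskForCommit(intervals, 1, 'A', LocalDate.of(2008, 9, 10)));  // 2
    }
}
```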
59. Potential usage scenarios
a) Analysis of "repeated defects" in a multiple case study
b) Studies on the impact of different metrics/attributes on software evolution
c) Further studies on inter-smell relations
d) Cost-benefit analysis of code smell removal
e) Benchmarking of diverse tools/methodologies
f) Task/context extraction, alongside ideas by [2]
[2] M. Barnett et al., "Helping Developers Help Themselves: Automatic Decomposition of Code Review Change-sets" (ICSE '15)
69. What to consider when using the data...
A. Context of the study
B. Tasks were individual
C. Time frame is approx. 1-2 sprints
D. The age of the systems (10+ years)
E. Tool for code smells not available
F. No explicit corrective tasks
G. Date accuracy for the tasks
H. Not all the commit logs were associated with an issue ID
I. Consider the trade-off between the degree of realism and the degree of control in this type of study
76. Trade-off between realism and control
(Chart: study types plotted by data richness ("thick data"), low to high, against sample size ("big data"), low to high: controlled/lab experiments, case studies, ethnography, repository analysis (OSS), "our study?", and "mega-cross-project experiments?")
80. Experimental Replication Applied to Case Study [1]
(Diagram: two replication designs, both holding the context fixed: same tasks, developers with similar skills, same project setting, same technology.)
Literal replication (same systems): two cases both maintain System A, so similar (≈) code smells are expected to lead to similar (≈) maintenance outcomes.
Theoretical replication (different systems): one case maintains System A and another System B, so different (≠) code smells are expected to lead to different (≠) maintenance outcomes.
[1] Yamashita, 2012: "Assessing the capability of code smells to support software maintainability assessments: Empirical inquiry and methodological approach", PhD thesis