Software Evolution and Defects from a Controlled,
Multiple, Industrial Case Study
Aiko Yamashita, S. Amirhossein Abtahizadeh, Foutse Khomh, Yann-Gaël Guéhéneuc
Centrum Wiskunde & Informatica · Oslo and Akershus University College of Applied Sciences · Polytechnique Montréal
Data Showcase - MSR 2017 - Buenos Aires, Argentina
Moderator Factors in Software Engineering
[Figure: a researcher observing a software project. The diagram maps variables of interest (code smells: number of smells**, smell density**; maintainability perception*; maintenance outcomes: defects*, change size**, effort**, maintenance problems**), moderator variables (system, project context, tasks, programming skill, development technology, learning effect), and data sources (source code, Subversion database, daily and open interviews with audio files/notes, Eclipse activity logs, Trac issue tracker, acceptance test reports, think-aloud video files/notes, study diary).]
Task and learning effect

Study 1
• Simula experiment
• Software replicability
• 4 Norwegian firms
• Java applications with nearly the same functionality (Systems A, B, C, D)

Study 2
• Simula multiple case study
• Software maintainability
• 2 European firms
• Control over tasks: within each system, Task 1 (replacing an external data source), Task 2 (new authentication mechanism), Task 3 (new reporting functionality)
• Control over learning effect: developers assigned across the four systems (A, B, C, D)
Control over programming skills

Programming skills
• Measurement instrument based on a combination of speed and correctness (see the sketch below)
• The Rasch measurement model was used
• Sixty-five professional developers from eight countries participated in validating the instrument
• They solved 19 Java programming tasks over two days
• Six of the participants, who scored above average skill, were selected

“Construction and Validation of an Instrument for Measuring Programming Skill” (Bergersen et al., 2014)
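To make the skill measure concrete, here is a minimal sketch of the dichotomous Rasch model on which such instruments rest: a developer with ability θ solves an item of difficulty b with probability exp(θ − b) / (1 + exp(θ − b)). This illustrates the general model only; it is not Bergersen et al.'s scoring code, and the ability and difficulty values below are invented.

```java
// Minimal sketch of the dichotomous Rasch model used for skill measurement.
// P(correct) = exp(theta - b) / (1 + exp(theta - b)),
// where theta is the person's ability and b is the item's difficulty.
// Illustrative only: the actual instrument also folds solution time into item scores.
public class RaschSketch {

    // Probability that a person with ability theta solves an item of difficulty b.
    static double pCorrect(double theta, double b) {
        double z = Math.exp(theta - b);
        return z / (1.0 + z);
    }

    public static void main(String[] args) {
        double[] itemDifficulties = {-1.0, 0.0, 1.5}; // hypothetical task difficulties
        double theta = 0.8;                           // hypothetical developer ability
        for (double b : itemDifficulties) {
            System.out.printf("difficulty %+4.1f -> P(correct) = %.2f%n", b, pCorrect(theta, b));
        }
    }
}
```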
Variables and Data Sources
[Figure from [1]: variables of interest (code smells: number of smells**, smell density**; maintainability perception*; maintenance outcomes: defects*, change size**, effort**, maintenance problems**), moderator variables (system, project context, tasks, programming skill, development technology), and data sources (source code, Subversion database, daily and open interviews with audio files/notes, task progress sheets, Eclipse activity logs, Trac issue tracker, acceptance test reports, think-aloud video files/notes, study diary, task dates).]
** System and file level
* Only at system level
[1] Yamashita, 2012: “Assessing the capability of code smells to support software maintainability assessments: Empirical inquiry and methodological approach”, PhD thesis
Source Code**
• Java, JavaScript, SQL, HTML, XML
• Java applications with nearly the same functionality (Systems A, B, C, D)
• Developed by 4 Norwegian companies from the same specification
• Resulting from the experiment reported by Anda et al. (2008): “Variability and Reproducibility in Software Engineering: A Study of Four Companies that Developed the Same System”
**Available at: opendata.soccerlab.polymtl.ca/git/users/root/projects
Code smells and evolution data**

Code Smells:
• Tools for code smells: Borland Together and InCode
• Smells detected: Data Class, Data Clumps, Duplicated Code in Conditional Branches, Feature Envy, God (Large) Class, God (Long) Method, Misplaced Class, Refused Bequest, Shotgun Surgery, Temporary Variable Used for Several Purposes, Use of Implementation Instead of Interface, and Interface Segregation Principle (ISP) Violation
• Files: InitialSmells.xls (1 version), FinalSmells.xls (12 versions)

Code Evolution:
• Tool for changes: custom code written with SVNKit (see the sketch below)
• Variables: Programmer, Revision No., Date, Full Path, Filename, File Extension, System, Action Type (i.e., Added, Deleted, Modified, Renamed), No. Lines Added, No. Lines Deleted, No. Lines Changed, and Churn
• File: Changes.xls (covers the evolution of all 12 versions)

**Available at https://zenodo.org/record/293719
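The original change-extraction tool is not included in the dataset. Below is a minimal sketch, assuming SVNKit's standard log API and a placeholder repository URL, of how the per-revision change records behind Changes.xls might be re-derived; the line counts and churn would additionally require diffing consecutive revisions.

```java
import org.tmatesoft.svn.core.SVNLogEntry;
import org.tmatesoft.svn.core.SVNLogEntryPath;
import org.tmatesoft.svn.core.SVNURL;
import org.tmatesoft.svn.core.internal.io.dav.DAVRepositoryFactory;
import org.tmatesoft.svn.core.io.SVNRepository;
import org.tmatesoft.svn.core.io.SVNRepositoryFactory;

import java.util.Collection;
import java.util.LinkedList;
import java.util.Map;

// Sketch: walk an SVN history and print, per revision, the author, date,
// and the action type (A/D/M/R) of every changed path -- the raw material
// behind the Changes.xls variables. Line counts and churn would need diffs.
public class ChangeExtractorSketch {
    public static void main(String[] args) throws Exception {
        DAVRepositoryFactory.setup(); // enable http(s):// access
        SVNURL url = SVNURL.parseURIEncoded("https://example.org/svn/systemA"); // placeholder URL
        SVNRepository repo = SVNRepositoryFactory.create(url);

        long head = repo.getLatestRevision();
        Collection<SVNLogEntry> entries = new LinkedList<>();
        // targetPaths = {""} means the whole repository; the last two flags
        // request changed paths and strict node history.
        repo.log(new String[] {""}, entries, 1, head, true, true);

        for (SVNLogEntry entry : entries) {
            Map<String, SVNLogEntryPath> changed = entry.getChangedPaths();
            for (SVNLogEntryPath path : changed.values()) {
                System.out.printf("r%d %s %s %c %s%n",
                        entry.getRevision(), entry.getAuthor(),
                        entry.getDate(), path.getType(), path.getPath());
            }
        }
    }
}
```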
Software Evolution History**
• 3 projects per system, i.e., 6 developers × 2 systems = 12 projects (cases, or evolution histories)
• Technologies involved: MySQL, Apache Tomcat, SVN, Trac, MyEclipse
• Each project took 3–4 weeks, full-time
• The SVN repositories were converted to Git and are hosted at Polytechnique Montréal (a sketch of walking a converted history follows)
[Figure: developers assigned across Systems A–D]
**Available at: opendata.soccerlab.polymtl.ca/git/users/root/projects
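Since the histories are now served as Git repositories, here is a minimal sketch, using JGit and a placeholder clone path (the dataset does not prescribe a client), of how one might tally commits per developer in one of the 12 evolution histories.

```java
import org.eclipse.jgit.api.Git;
import org.eclipse.jgit.revwalk.RevCommit;

import java.io.File;
import java.util.HashMap;
import java.util.Map;

// Sketch: count commits per author in a locally cloned evolution history,
// e.g. to cross-check the per-developer activity recorded in Changes.xls.
public class HistorySketch {
    public static void main(String[] args) throws Exception {
        // Placeholder path to a clone of one of the 12 project repositories.
        try (Git git = Git.open(new File("/path/to/clone"))) {
            Map<String, Integer> commitsPerAuthor = new HashMap<>();
            for (RevCommit commit : git.log().call()) {
                String author = commit.getAuthorIdent().getName();
                commitsPerAuthor.merge(author, 1, Integer::sum);
            }
            commitsPerAuthor.forEach((author, n) ->
                    System.out.printf("%s: %d commits%n", author, n));
        }
    }
}
```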
Defect Data**
• Due to the heterogeneity of the systems, no common unit-testing suite is available :(
• 2 rounds of acceptance testing for each of the 12 projects
• Defects were recorded in Trac after each acceptance-testing round
• Trac was too tightly integrated with SVN to be installed on a public server++
• 12 reports extracted from Trac: Defects_Dev{1/2/3/4/5/6}_Sys{A/B/C/D}.xlsx (a loading sketch follows)

++Original SVN repositories and Trac instances are available upon request
**Available at https://zenodo.org/record/293719
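As an illustration of loading these reports, here is a minimal sketch assuming Apache POI and one concrete instance of the naming scheme (Defects_Dev1_SysA.xlsx). The internal column layout of the reports is not assumed; the sketch only counts data rows.

```java
import org.apache.poi.xssf.usermodel.XSSFSheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

import java.io.FileInputStream;

// Sketch: open one Trac defect report and count its data rows
// (assuming one header row; the exact column layout is not assumed here).
public class DefectReportSketch {
    public static void main(String[] args) throws Exception {
        String file = "Defects_Dev1_SysA.xlsx"; // one instance of Defects_Dev{1..6}_Sys{A..D}.xlsx
        try (FileInputStream in = new FileInputStream(file);
             XSSFWorkbook workbook = new XSSFWorkbook(in)) {
            XSSFSheet sheet = workbook.getSheetAt(0);
            int dataRows = sheet.getLastRowNum(); // zero-based last index == rows minus the header
            System.out.printf("%s: %d recorded defects%n", file, dataRows);
        }
    }
}
```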
Task Dates**
A problem in longitudinal, brown-field studies: the boundaries between tasks become “blurry”.
Examples:
• A developer finishes Task 3 in System 1 in the morning and moves on to Task 1 for System 2 in the afternoon.
• A developer was working on Task 2, but had forgotten to change something in Task 1, and so switched temporarily between tasks.
We used different sources to estimate the dates on which a developer was working on a given system and a given task (a sketch of one such estimate follows).
[Figure: the variables-and-data-sources diagram, highlighting the sources used to estimate task dates: daily interviews (audio files/notes), the Subversion database, think-aloud video files/notes, task progress sheets, Eclipse activity logs, Trac (issue tracker), acceptance test reports, open interviews (audio files/notes), and the study diary.]
**Available at https://zenodo.org/record/293719
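Commit timestamps alone already give a rough bracket per task. The following is a minimal sketch, using a hypothetical commit record type with a task label (not the dataset's actual schema, and with invented example dates), that computes the earliest and latest commit date per task; in practice the task label itself must be recovered from issue IDs, progress sheets, and interviews, since not every commit log carries an issue ID.

```java
import java.time.LocalDate;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: bracket each task by the earliest and latest commit touching it.
// CommitRecord is a hypothetical type, not the dataset's actual schema.
public class TaskDateSketch {

    record CommitRecord(String task, LocalDate date) {}

    record Interval(LocalDate start, LocalDate end) {}

    // Merge each commit's date into the running [start, end] interval of its task.
    static Map<String, Interval> bracket(List<CommitRecord> commits) {
        Map<String, Interval> byTask = new HashMap<>();
        for (CommitRecord c : commits) {
            byTask.merge(c.task(), new Interval(c.date(), c.date()),
                    (a, b) -> new Interval(
                            a.start().isBefore(b.start()) ? a.start() : b.start(),
                            a.end().isAfter(b.end()) ? a.end() : b.end()));
        }
        return byTask;
    }

    public static void main(String[] args) {
        List<CommitRecord> commits = List.of( // invented example data
                new CommitRecord("Task 1", LocalDate.of(2008, 9, 1)),
                new CommitRecord("Task 1", LocalDate.of(2008, 9, 5)),
                new CommitRecord("Task 2", LocalDate.of(2008, 9, 4)));
        bracket(commits).forEach((task, iv) ->
                System.out.printf("%s: %s .. %s%n", task, iv.start(), iv.end()));
    }
}
```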
Potential usage scenarios
a) Analysis of “repeated defects” in a multiple case study
b) Studies on the impact of different metrics/attributes on software evolution
c) Further studies on inter-smell relations
d) Cost-benefit analysis of code smell removal
e) Benchmarking of diverse tools/methodologies
f) Task/context extraction, along the lines of the ideas in [2]

[2] M. Barnett et al., “Helping Developers Help Themselves: Automatic Decomposition of Code Review Change-sets” (ICSE ’15)

What to consider when using the data…
A. The context of the study
B. Tasks were individual
C. The time frame is approx. 1–2 sprints
D. The age of the systems (10+ years)
E. The tools used for code smell detection are not available
F. No explicit corrective tasks
G. Date accuracy for the tasks
H. Not all commit logs were associated with an issue ID
I. The trade-off between the degree of realism and the degree of control in this type of study
Trade-off between realism and control
[Chart: study types plotted by data richness (“thick data”, low to high) against sample size (“big data”, low to high): controlled/lab experiments, case studies, ethnography, repository analysis (OSS), our study?, and mega-cross-project experiments?]
Experimental Replication Applied to Case Study [1]
[Figure: two replication designs.]
Literal replication (e.g., Case 1 vs. Case 2): same system (System A), same tasks, developers with similar skills, same project setting, same technology; similar code smells (≈) are expected to lead to similar maintenance outcomes (≈).
Theoretical replication (e.g., Case 1 vs. Case 3): different systems (System A vs. System B), but same tasks, developers with similar skills, same project setting, and same technology; different code smells (≠) are expected to lead to different maintenance outcomes (≠).
[1] Yamashita, 2012: “Assessing the capability of code smells to support software maintainability assessments: Empirical inquiry and methodological approach”, PhD thesis
