1
How to Perform Experiments:
Basic Concepts
CSCI 783: Empirical Software Engineering
2
Empirical Software Engineering: How to use empirical research in software engineering?
Repetition of empirical studies is necessary!
Definition
Planning and Design
Execution
Analysis
Packaging
Definition: Determine study goal(s).
Planning and Design: Formulate research hypothesis(es). Select type of empirical study to be employed. Operationalize study goal(s) and hypotheses. Make study plan: what needs to be done, by whom, and when. Prepare material required to conduct the study.
Execution: Run study according to plan and collect required data
Analysis: Analyze collected data to answer operationalized study goals and hypotheses
Packaging: Report your studies
3
Empiricism in Software Engineering
Confirmation of more or less accepted hypotheses: for example, that object-orientation is good for reuse.
Evaluation of methods: for example, whether Java produces higher quality code than C++.
Identification of relationships: for example, finding a relationship between fault-prone components and design concepts.
Validation of models and measures: for example, validating a specific cost estimation model.
Understanding of methods, techniques, and models: for example, understanding the relationship between inspections and testing.
Guidance / Control: to help in management (for example, as input when assigning personnel to software inspections) and to support decision-making with respect to changes (for example, whether or not to introduce a new development tool).
Change / Improve
Experimentation in software engineering
4
Experiment Objective
Theory level: cause construct → cause-effect construct → effect construct
Observation level (experiment operation): treatment → treatment-outcome construct → outcome
The treatment corresponds to the independent variable; the outcome corresponds to the dependent variable.
5
What is Empirical Software Engineering Research
What kinds of questions are "interesting"?
What kinds of results help to answer these questions, and what research methods can produce these results?
What kinds of evidence can demonstrate the validity of a result, and how to distinguish good results from bad ones?
6
Types of Research Questions
What kinds of questions are "interesting"?
Method or means of development: How can we do/create (or automate doing) X? What is a better way to do/create X?
Method for analysis: How can I evaluate the quality/correctness of X? How do I choose between X and Y?
Design, evaluation, or analysis of a particular instance: What is a (better) design or implementation for application X? What is property X of artifact/method Y? How does X compare to Y? What is the current state of X / practice of Y?
Generalization or characterization: Given X, what will Y (necessarily) be? What, exactly, do we mean by X? What are the important characteristics of X? What is a good formal/empirical model for X? What are the varieties of X, and how are they related?
Feasibility: Is it possible to accomplish X at all?
7
What is Software Engineering Research
What kinds of questions are "interesting"?
What kinds of results help to answer these questions, and what
research methods can produce these results?
8
Types of Research Results
What kinds of questions are "interesting"?
What kinds of results help to answer these questions, and what research methods can produce these results?
Procedure or technique: a new or better way to do some task, such as design, implementation, measurement, evaluation, or selection from alternatives; includes techniques for implementation, representation, management, and analysis, but not advice or guidelines.
Qualitative or descriptive model: a structure or taxonomy for a problem area, an architecture style or design pattern, or a well-grounded checklist.
Empirical model: a predictive model based on observed data.
Analytic model: a structural model precise enough to support formal analysis or automatic manipulation.
Notation or tool: a formal language to support a technique or model, or an implemented tool that embodies a technique.
Specific solution: a solution to an application problem that shows the use of software engineering principles; may be a design rather than an implementation.
Answer or judgment: the result of a specific analysis, evaluation, or comparison.
Report: interesting observations, rules of thumb.
9
Software Engineering Research
Question: Feasibility, Characterization, Method/Means, Generalization, Selection
Results / method: Qualitative Model, Technique, System, Empirical Model, Analytic Model
Validation: Persuasion, Implementation, Evaluation, Analysis, Experience
10
Software Engineering Research: A common Plan
Question: Can X be done better?
Results / method: Build a Y.
Validation: Measure Y to compare it with X.
11
Software Engineering Research: A common (often bad) Plan
Question: Can X be done better?
Results / method: Build a Y.
Validation: "Look, it works!"
12
Software Engineering Research: Two other good Plans
Plan 1:
Question: Can X be done at all?
Results / method: Build a Y that does X.
Validation: "Look, it works!"
Plan 2:
Question: Is X always true of Y?
Results / method: Formally model X and prove Y.
Validation: Check the proof.
13
Goal Question Metric (GQM) Paradigm
14
Goal Question Metric (GQM) Paradigm: Example
Goal 1: Maintain [1: purpose] a maximum level of [2: issue] customer satisfaction [3: object (process)] from the Help Desk user's viewpoint [4: viewpoint].
Question 1: What is the current help desk ticket trend?
Metric 1: Number of help desk tickets closed
Metric 2: Number of new help desk tickets
Metric 3: % of tickets outside of the upper limit
Metric 4: Subjective rating of customer satisfaction
Metric 5: Number of new help desk tickets open
Question 2: Is the help desk satisfaction improving or diminishing?
Metric 6: Number of help desk calls abandoned
Metric 7: Number of help desk calls answered
Metric 8: Number of help desk calls sent to voicemail
Metric 9: Subjective rating of customer satisfaction
"If you cannot measure it, you cannot improve it." (Lord Kelvin)
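To make these metrics concrete, here is a minimal Python sketch of how Metrics 1, 2, and 3 for Question 1 might be computed from raw ticket records; the record fields and the five-day resolution limit are assumptions made for this illustration, not part of the GQM example.

# Sketch: computing a few Question 1 metrics from hypothetical help desk data.
# The record fields and the 5-day "upper limit" are illustrative assumptions.
from datetime import date

tickets = [
    {"opened": date(2024, 3, 1), "closed": date(2024, 3, 4)},
    {"opened": date(2024, 3, 2), "closed": None},             # still open
    {"opened": date(2024, 3, 3), "closed": date(2024, 3, 12)},
]

UPPER_LIMIT_DAYS = 5  # assumed control limit for resolution time

closed = [t for t in tickets if t["closed"] is not None]

metric_1 = len(closed)                  # number of help desk tickets closed
metric_2 = len(tickets)                 # number of new help desk tickets
metric_3 = 100.0 * sum(                 # % of tickets outside the upper limit
    (t["closed"] - t["opened"]).days > UPPER_LIMIT_DAYS for t in closed
) / max(len(closed), 1)

print(metric_1, metric_2, f"{metric_3:.1f}%")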
15
Experiment Definition
Definition
Experiment Definition:
Determine study goal(s)
The Goal Template:
Analyze <a process, product, method, model>
For the purpose of <characterizing, understanding, evaluating,
predicting, improving>
With respect to their <Quality Focus>
From the point of view of <Developer, Customer, Manager>
in the context of <Other Context factors that may affect
outcomes>
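As a small illustration of how the template's facets can be recorded alongside a study plan, here is a minimal Python sketch; the dataclass and its field names are assumptions made for this example, and the values are taken from the PBR/Checklist example on the next slide.

# Sketch: recording the goal-template facets as a structured study-goal record.
# The dataclass and field names are illustrative, not a prescribed notation.
from dataclasses import dataclass

@dataclass
class StudyGoal:
    analyze: str        # a process, product, method, or model
    purpose: str        # characterizing, understanding, evaluating, predicting, improving
    quality_focus: str  # with respect to ...
    viewpoint: str      # from the point of view of ...
    context: str        # other context factors that may affect outcomes

goal = StudyGoal(
    analyze="PBR and Checklist technique",
    purpose="evaluating",
    quality_focus="effectiveness and efficiency",
    viewpoint="researchers",
    context="students reading requirements documents",
)
print(goal)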
16
Experiment Definition: Example
The Goal Template:
Analyze <PBR and Checklist Technique>
For the purpose of <Evaluating>
With respect to their <Effectiveness and Efficiency>
From the point of view of <Researchers>
in the context of <Students reading requirement documents>
Example
TASK: Software-development process management
PROBLEM: During the software testing phase, many anomalies were discovered, and it is suspected that the software quality will not reach a satisfactory level by the shipping deadline.
QUESTION: Construct a GQM tree that helps you decide when it would be possible to ship the software.
17
18
Example – sentence format
Analyze the unit test process to understand the impact of adding
additional tests to project K from the viewpoint of the project
manager
19
Experiment Planning and Design
Steps: Context Selection, Hypothesis Formulation, Variable Selection, Selection of Subjects, Design, Instrumentation, Validity Evaluation
(Preceded by Experiment Definition; followed by Experiment Implementation)
20
Experiment Planning and Design: Context Selection
Context Selection:
Off-line vs. on-line
Students vs. professionals
Toy vs. real problems
21
Experiment Planning and Design (steps so far: Context Selection, Hypothesis Formulation)
22
Experiment Planning and Design: Hypothesis Formulation
Formulate a null hypothesis (there is no real underlying trend or pattern) and an alternative hypothesis.
The objective is to reject the null hypothesis with as high significance as possible.
23
Experiment Planning and Design: Hypothesis Formulation
(Example)
Null hypothesis: there is no difference in code quality between code produced using Cleanroom and code produced using our current process.
Alternative hypothesis: the quality of code produced using Cleanroom is higher than the quality of code produced using our current process.
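As a hedged illustration of how this hypothesis pair might later be tested during analysis, the sketch below compares two small, made-up samples of defect density (defects per KLOC, lower is better) with a one-sided Mann-Whitney U test from SciPy; the data, the choice of defect density as the quality measure, and the 0.05 significance level are assumptions for the example only.

# Sketch: testing H0 (no difference in code quality) against H1 (Cleanroom is better)
# on hypothetical defect-density samples (defects per KLOC, lower is better).
from scipy.stats import mannwhitneyu

cleanroom = [2.1, 1.8, 2.5, 1.6, 2.0, 1.9]   # made-up sample
current = [3.0, 2.7, 3.4, 2.2, 2.9, 3.1]     # made-up sample

# One-sided test: is defect density under Cleanroom stochastically lower?
stat, p_value = mannwhitneyu(cleanroom, current, alternative="less")

alpha = 0.05  # assumed significance level
if p_value < alpha:
    print(f"p = {p_value:.3f}: reject H0; the data favour higher quality with Cleanroom")
else:
    print(f"p = {p_value:.3f}: cannot reject H0 at alpha = {alpha}")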
24
Experiment Planning and Design (steps so far: Context Selection, Hypothesis Formulation, Variable Selection)
Experiment Planning and Design: Variable Selection
Process
Selection of Subjects
Experiment Planning and Design
27
Experiment Planning and Design (steps so far: Context Selection, Hypothesis Formulation, Variable Selection, Selection of Subjects, Design)
28
Experiment Planning and Design: Design
1. Design principles
Randomization
Blocking (e.g., on experience)
Balancing (same number of subjects in each group)
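To show how these three principles fit together in practice, here is a minimal sketch of a randomized, blocked, and balanced assignment; the subject list, the experience attribute used for blocking, and the two treatment names (borrowed from the PBR/Checklist example earlier) are hypothetical.

# Sketch: randomized, blocked, balanced assignment of subjects to two treatments.
# Subjects, the "experience" blocking attribute, and the treatments are illustrative.
import random

subjects = [
    ("S1", "junior"), ("S2", "junior"), ("S3", "junior"), ("S4", "junior"),
    ("S5", "senior"), ("S6", "senior"), ("S7", "senior"), ("S8", "senior"),
]
treatments = ["PBR", "Checklist"]

random.seed(42)  # fixed seed only to make the example reproducible
assignment = {}
for block in ("junior", "senior"):                    # blocking on experience
    members = [name for name, exp in subjects if exp == block]
    random.shuffle(members)                           # randomization within each block
    half = len(members) // 2                          # balancing: equal group sizes
    for name in members[:half]:
        assignment[name] = treatments[0]
    for name in members[half:]:
        assignment[name] = treatments[1]

for name, treatment in sorted(assignment.items()):
    print(name, "->", treatment)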
29
Experiment Planning and Design: Design
2. Design types
A large number of standard designs exist; we should select an appropriate design type depending on the treatments, the number of subjects, and the objective (hypothesis) of the experiment.
30
Experiment Planning and Design (steps so far: Context Selection, Hypothesis Formulation, Variable Selection, Selection of Subjects, Design, Instrumentation)
31
Experiment Planning and Design: Instrumentation
Objects
Guidelines
Measurement Instruments
32
Experiment Planning and Design (all steps: Context Selection, Hypothesis Formulation, Variable Selection, Selection of Subjects, Design, Instrumentation, Validity Evaluation), followed by Experiment Implementation
Experiment Planning and Design: Validity Evaluation
Construct validity: whether the theoretical constructs are interpreted and measured correctly.
Internal validity: focuses on the study design, and particularly whether the results really do follow from the data.
External validity: whether claims for the generality of the results are justified.
Reliability: whether the study yields the same results if other researchers replicate it.
For empirical work to be acceptable as a contribution to
scientific knowledge, the researcher needs to convince readers
that the conclusions drawn from an empirical study are valid.
Not surprisingly, the criteria by which researchers judge
validity depend on their philosophical stance.
For positivists, research is normally theory-driven. The key
steps include deriving study propositions from the theory,
designing the study to address the propositions, and then
drawing more general conclusions from the results. Each of
these steps must be shown to be sound. Accordingly, positivists
usually identify four criteria for validity:
• Construct validity focuses on whether the theoretical
constructs are interpreted and measured correctly. For example,
if Jane designs an experiment to test her claims about the
efficiency of fish eye views, will she interpret “efficiency” in
the same way that other researchers have, and does she have an
appropriate means for measuring it? Problems with construct
validity occur when the measured variables don’t correspond to
the intended meanings of the theoretical terms.
• Internal validity focuses on the study design, and particularly
whether the results really do follow from the data. Typical
mistakes include the failure to handle confounding variables
properly, and misuse of statistical analysis.
• External validity focuses on whether claims for the generality
of the results are justified. Often, this depends on the nature of
the sampling used in a study. For example, if Jane’s experiment
is conducted with students as her subjects, it might be hard to
convince people that the results would apply to practitioners in
general.
• Reliability focuses on whether the study yields the same
results if other researchers replicate it. Problems occur if the
researcher introduces bias, perhaps because the tool being
evaluated is one that the researcher herself has a stake in.
These criteria are useful for evaluating all positivist studies,
including controlled experiments, most case studies and survey
research. In reporting positivist empirical studies, it is
important to include a section on threats to validity, in which
potential weaknesses in the study design as well as attempts to
mitigate these threats are discussed in terms of these four
criteria. This is important because all study designs have flaws.
By acknowledging them explicitly, the researchers show that
they are aware of the flaws and have taken reasonable steps to
minimize their effects.
34
Empirical Software Engineering: How to use empirical research
in software engineering?
Definition
Planning and Design
Execution
Definition: Determine study goal(s).
Planning and Design: Formulate research hypothesis(es). Select type of empirical study to be employed. Operationalize study goal(s) and hypotheses. Make study plan: what needs to be done, by whom, and when. Prepare material required to conduct the study.
Execution: Run study according to plan and collect required data.
35
Experiment Execution
Experiment Definition → Experiment Planning and Design → Experiment Execution → Data Analysis and Interpretation
Execution steps: Train Subjects, Run Treatment, Data Collection
36
Empirical Software Engineering: How to use empirical research
in software engineering?
Definition
Planning and Design
Execution
Analysis
Definition: Determine study goal(s).
Planning and Design: Formulate research hypothesis(es). Select type of empirical study to be employed. Operationalize study goal(s) and hypotheses. Make study plan: what needs to be done, by whom, and when. Prepare material required to conduct the study.
Execution: Run study according to plan and collect required data.
Analysis: Analyze collected data to answer operationalized study goals and hypotheses.
37
Data Analysis and Interpretation
Experiment Definition → Experiment Planning and Design → Experiment Execution → Data Analysis and Interpretation → Packaging
Analysis steps: Descriptive Statistics, Hypothesis Testing, Discussion and Conclusion
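As a small, assumed example of the descriptive-statistics step, the sketch below summarizes hypothetical defect-density measurements for two groups before any hypothesis testing; all values are made up.

# Sketch: descriptive statistics for collected experiment data (hypothetical values).
import statistics

defect_density = {
    "Cleanroom": [2.1, 1.8, 2.5, 1.6, 2.0, 1.9],
    "Current": [3.0, 2.7, 3.4, 2.2, 2.9, 3.1],
}

for group, values in defect_density.items():
    print(
        f"{group}: n={len(values)} "
        f"mean={statistics.mean(values):.2f} "
        f"median={statistics.median(values):.2f} "
        f"stdev={statistics.stdev(values):.2f} "
        f"min={min(values):.2f} max={max(values):.2f}"
    )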
38
Empirical Software Engineering: How to use empirical research
in software engineering?
Definition
Planning and Design
Execution
Analysis
Packaging
Definition: Determine study goal(s).
Planning and Design: Formulate research hypothesis(es). Select type of empirical study to be employed. Operationalize study goal(s) and hypotheses. Make study plan: what needs to be done, by whom, and when. Prepare material required to conduct the study.
Execution: Run study according to plan and collect required data.
Analysis: Analyze collected data to answer operationalized study goals and hypotheses.
Packaging: Report your studies.
39
Experiment Packaging
Definition → Planning and Design → Execution → Data Analysis and Interpretation → Packaging
Report Outline:
Introduction
Problem Statement
Experiment Planning
Experiment Operation
Data Analysis
Interpretation of Results
Discussions and Conclusions
Appendix