This presentation is about a lecture I gave within the "Green Lab" course of the Computer Science master program, of the Vrije Universiteit Amsterdam.
http://www.ivanomalavolta.com
4. Vrije Universiteit Amsterdam
Goal: to determine the set of trials the experiment shall have to
make sure that the effect of the treatments is visible
4 Ivano Malavolta / S2 group / Empirical software engineering
Experiment design
Experiment design: how to organize and execute the trials
6. Vrije Universiteit Amsterdam
6
What can you decide?
β Context?
β Fixed subject: software developer
β Factor: image encoding algorithm
βͺ other factors?
β Treatments:
βͺ JPEG
βͺ PNG
β Objects: mobile apps
βͺ one app vs many apps
β How to βcoverβ all the relevant combinations
of treatments, objects, and subjects?
8. Vrije Universiteit Amsterdam
8
Randomization
PROBLEM: when executing many trials, the chosen objects,
subjects and execution ordering may lead to biased results
SOLUTION: randomize the involved objects and subjects, and
the order in which trials are executed
9. Vrije Universiteit Amsterdam
9
Randomization
β Aim: remove the effects of a specific non-controlled
independent variable
β Group the subjects/objects by that variable and then
randomize
Examples:
β We randomly choose mobile apps from a dataset
β For each app we randomly assign a specific encoding algorithm for its
images
11. Vrije Universiteit Amsterdam
11
Blocking
PROBLEM: one factor influences our results but we want to
mitigate its effects
SOLUTION: split the population in blocks with same (or
similar) level of this factor
12. Vrije Universiteit Amsterdam
12
Blocking
β The blocks are studied separately
β We DO NOT study the effects between the groups
β e.g. no conclusions on the correlation between the effects of image encoding
and the type of device
eg, Main factor: image encoding algorithm {PNG, JPEG}
Blocked factor: type of device {low-end, high-end}
Block 1: run the apps only in low-end devices
Block 2: run the apps only in high-end devices
13. Vrije Universiteit Amsterdam
13
Why more than 2 factors?
β {PNG, JPEG} β energy consumption
β {PNG, JPEG} and {single_render, par_render} β energy consumption
β factor vs block?
Ivano Malavolta / S2 group / Experiment design
16. Vrije Universiteit Amsterdam
16
Balancing
PROBLEM: many statistical analyses are more powerful and
simple when performed on balanced data
SOLUTION: consider the same (or similar) number of
subjects for each type of treatment
eg, Block 1: 20 apps
Block 2: 20 apps
18. Vrije Universiteit Amsterdam
We can have the following cases:
β 1 factor and 2 treatments (1F-2T)
β 1 factor and >2 treatments (1F-MT)
β 2 factors and 2 treatments (2F-2T)
β >2 factors, each one with >=2 treatments (MF-MT)
18 Ivano Malavolta / S2 group / Empirical software engineering
Basic design types
Basic
19. Vrije Universiteit Amsterdam
19 Ivano Malavolta / S2 group / Empirical software engineering
1 factor and 2 treatments
We assume to have 1 dependent variable P
Notation:
β πi : dependent variable mean for treatment i
β πi = avg(P)
β yij: j-th measure of the dependent variable for treatment i
Example:
β We are seeing whether different image encoding algorithms impact
energy consumption of mobile apps
β Factor: encoding algorithm
β Treatments:
β PNG
β JPEG
β Dependent variable: consumed energy during common usage scenarios
20. Vrije Universiteit Amsterdam
20 Ivano Malavolta / S2 group / Empirical software engineering
1F-2T: fully randomized design
β Each treatment is randomly assigned to the experimental objects
β Same number of objects for each treatment (balancing)
Object
(Application)
Treatment 1
(PNG)
Treatment 2
(JPEG)
1 X
2 X
3 X
4 X
Example of hypotheses:
H0 : π1 = π2
Ha : π1 != π2 or π1 > π2 or π1 < π2
Analyses:
β t-test (unpaired)
β Mann-Whitney test
21. Vrije Universiteit Amsterdam
21 Ivano Malavolta / S2 group / Empirical software engineering
1F-2T: paired comparison design
β Each treatment is applied on each object (crossover design)
β The order of the treatments is random
Object
(Application)
Treatment 1
(PNG)
Treatment 2
(JPEG)
1 1st 2nd
2 2nd 1st
3 2nd 1st
4 1st 2nd
22. Vrije Universiteit Amsterdam
22 Ivano Malavolta / S2 group / Empirical software engineering
1F-2T: paired comparison design
P
t
A1, JPEG
A1, PNG
tj
dj
πd= avg(d0...n)
23. Vrije Universiteit Amsterdam
23 Ivano Malavolta / S2 group / Empirical software engineering
1F-2T: paired comparison design
Example of hypotheses:
H0 : πd = 0
Ha : πd != 0or πd > 0or πd < 0
Analyses:
β Paired t-test
β Sign test
β Wilcoxon
24. Vrije Universiteit Amsterdam
24
How to choose between the two design?
Object
(Application)
Treatment 1
(PNG)
Treatment 2
(JPEG)
1 1st 2nd
2 2nd 1st
3 2nd 1st
4 1st 2nd
Object
(Application)
Treatment 1
(PNG)
Treatment 2
(JPEG)
1 X
2 X
3 X
4 X
25. Vrije Universiteit Amsterdam
25 Ivano Malavolta / S2 group / Empirical software engineering
1 factor and >2 treatments
In this case the factor can have more than one value
Example:
β Factor: encoding algorithms
β Treatments:
β PNG
β JPEG
β TIFF
β Dependent variable: consumed energy during common usage scenarios
26. Vrije Universiteit Amsterdam
26 Ivano Malavolta / S2 group / Empirical software engineering
1F-MT: fully randomized design
β Each treatment is randomly assigned to the experimental objects
β Same number of objects per each treatment (balancing)
Example of hypotheses:
H0 : π1 = π2 = π3 = β¦ = πt
Ha : πi != πj for at least on pair
of (i,j)
Analyses:
β ANOVA (ANalysis Of
VAriance)
β Kruskal-Wallis
Object
(Application)
Treatment 1
(PNG)
Treatment 2
(JPEG)
Treatment 3
(TIFF)
1 X
2 X
3 X
4 X
5 X
6 X
27. Vrije Universiteit Amsterdam
27 Ivano Malavolta / S2 group / Empirical software engineering
1F-MT: paired comparison design
β Each treatment is applied on each object (crossover design)
β The order of the treatments is random
Object
(Application)
Treatment 1
(PNG)
Treatment 2
(JPEG)
Treatment 3
(TIFF)
1 1st 3rd 2nd
2 3rd 1st 2nd
3 2nd 3rd 1st
4 2nd 1st 3rd
5 3rd 2nd 1st
6 1st 2nd 3rd
Example of hypotheses:
H0 : π1 = π2 = π3 = β¦ = πt
Ha : πi != πj for at least on pair
of (i,j)
Analyses:
β ANOVA
β Kruskal-Wallis
β Repeated Measures ANOVA
28. Vrije Universiteit Amsterdam
28 Ivano Malavolta / S2 group / Empirical software engineering
Some contents of lecture extracted from:
β Giuseppe Procacciantiβs lectures at VU
Acknowledgements