Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

The Green Lab - [09 B] Experiment validity

1,560 views

Published on

This presentation is about a lecture I gave within the "Green Lab" course of the Computer Science master, Software Engineering and Green IT track of the Vrije Universiteit Amsterdam: http://masters.vu.nl/en/programmes/computer-science-software-engineering-green-it/index.aspx

http://www.ivanomalavolta.com

Published in: Technology
  • Be the first to comment

  • Be the first to like this

The Green Lab - [09 B] Experiment validity

  1. 1. 1 Het begint met een idee Experiment validity Ivano Malavolta
  2. 2. Vrije Universiteit Amsterdam 2 Ivano Malavolta / S2 group / Empirical software engineering Planning phases Scope of this lecture
  3. 3. Vrije Universiteit Amsterdam 3 Ivano Malavolta / S2 group / Green Lab Experiment validity ● We aim for adequate validity, not universal validity ○ What matters is our population of interest Validity is the extent to which our results are sound and applicable to the real world ● Validity is in trade-off with experiment scope
  4. 4. Vrije Universiteit Amsterdam Threats Identification 4 ● Identifying threats helps to plan for adequate validity ● Each threat needs appropriate mitigation ● Several classifications of validity threats: ○ Campbell and Stanley [1] ○ Cook and Campbell [2] Ivano Malavolta / S2 group / Green Lab
  5. 5. Vrije Universiteit Amsterdam 5 Types of threat to validity Theory Observation Cause EffectCausation e.g. encoding algorithms e.g. Energy efficiency Treatment Experiment Outcome e.g. JPEG e.g. energy per image Ivano Malavolta / S2 group / Green Lab
  6. 6. Vrije Universiteit Amsterdam Causation Experiment 6 Types of threat to validity Theory Observation Cause Effect Treatment Outcome Construct Internal Conclusion Construct External Ivano Malavolta / S2 group / Green Lab e.g. encoding algorithms e.g. Energy efficiency e.g. JPEG e.g. energy per image
  7. 7. Vrije Universiteit Amsterdam 7 Internal validity Internal Validity: causality between treatment and outcome ● Strongly related to the experiment design and operation ○ Are my results caused by the treatment? ○ Have I considered all possible factors? Ivano Malavolta / S2 group / Green Lab
  8. 8. Vrije Universiteit Amsterdam 8 Internal validity: types of threat ● History ○ Different trials of the experiment performed in different time frames (eg, after holidays vs normal days) ● Maturation ○ Subjects may react differently over time (eg, learning effect, tiresome, boredome) ● Selection ○ Some subjects may abandon the experiment ○ Event worse, some specific type of subjects may leave it ● Direction of causal influence ○ A causes B, B causes A, or X causes A and B? Ivano Malavolta / S2 group / Green Lab
  9. 9. Vrije Universiteit Amsterdam 9 Internal validity: mitigation Analyze and identify confounding factors/noise Choose appropriate experiment design Ivano Malavolta / S2 group / Green Lab
  10. 10. Vrije Universiteit Amsterdam Conclusion Validity: statistical correctness and significance ● Are my conclusions correct? ● Are my results significant enough? 10 Conclusion validity Ivano Malavolta / S2 group / Green Lab
  11. 11. Vrije Universiteit Amsterdam 11 Conclusion validity: types of threat ● Low statistical power ○ Results not statistically significant ○ There is a significant difference but the statistical test does not reveal it due to the low number of data points ● Violated assumptions of statistical tests ○ eg, many tests assume normally distributed samples ● Fishing and error rate ○ If you are combining multiple statistical tests, also their significance should be adapted ● Reliability of measures ○ If you repeat the measurement you should get similar results → same conclusions Ivano Malavolta / S2 group / Green Lab
  12. 12. Vrije Universiteit Amsterdam 12 Conclusion validity: mitigation Select appropriate tests Use only as much significance as needed Keep environment under control Ivano Malavolta / S2 group / Green Lab
  13. 13. Vrije Universiteit Amsterdam 13 Construct validity ● Have I defined my constructs properly? ● Am I analyzing the correct variables for the effects? Construct Validity: relation between theory and observation Ivano Malavolta / S2 group / Green Lab
  14. 14. Vrije Universiteit Amsterdam 14 Construct validity: types of threat ● Inadequate preoperational explication of constructs ○ construct not well defined before being translated into measures ○ Theory unclear ○ Comparing two methods, but not clear what does mean that a method is better than another ● Mono-operation bias ○ I have one independent variable only, one single object or treatment → the experiment could not represent the theory ○ eg, inspection conducted on a single document not representative of the set of documents on which the technique is often applied ● Mono-method bias ○ When you use a single type of measures or observations ○ The experimenter may bias the measures Ivano Malavolta / S2 group / Green Lab
  15. 15. Vrije Universiteit Amsterdam 15 Construct validity: mitigation Early definition of constructs (GQM) Use appropriate experiment design Introduce redundancy for cross-checks Ivano Malavolta / S2 group / Green Lab
  16. 16. Vrije Universiteit Amsterdam 16 External validity ● Are my results valid for the whole target population? ● Have I selected a representative sample? External Validity: generalizability of the results Ivano Malavolta / S2 group / Green Lab
  17. 17. Vrije Universiteit Amsterdam 17 External validity: types of threat ● Interaction of selection and treatment ○ the population of subjects is not representative of the one for which I would like to generalize my results ○ eg, performing experiments with students to use results in industry ● Interaction of setting and treatment ○ the experimental setting or the material are not representative ○ e.g. I let the subjects using tools that they don’t use in the reality ○ e.g. Web development using textual editors ○ Use of toy objects ● Interaction of history and treatment ○ the experiment is conducted on a special time or day which affects the results ○ eg, our experiment on green software is performed after a big congress at which some subjects participated Ivano Malavolta / S2 group / Green Lab
  18. 18. Vrije Universiteit Amsterdam 18 External validity: mitigation Use an environment as realistic as possible Explicitly define and model your context Ivano Malavolta / S2 group / Green Lab
  19. 19. Vrije Universiteit Amsterdam ● You know that you have to explicitly take into account the threats to validity of your experiment ● Discussing threats actually makes your experiment stronger ○ you are not showing your weaknesses, but you are playing for replicability ● You will make tradeoffs between threats to validity in your experiment ● Consider threats to validity from the beginning ○ Reasoning on them will make you feel more confident about the scope and design of your experiment 19 What this lecture means to you? Ivano Malavolta / S2 group / Green Lab
  20. 20. Vrije Universiteit Amsterdam 20 Ivano Malavolta / S2 group / Empirical software engineering Readings Chapter 8 [1] Campbell and Stanley, Experimental and Quasi- Experimental designs for Research (1963). (Blackboard) [2] Cook and Campbell, Quasi-experimentation - Design and Analysis Issues for Field Settings (1979). Available at the VU library. Ivano Malavolta / S2 group / Green Lab
  21. 21. Vrije Universiteit Amsterdam 21 Ivano Malavolta / S2 group / Empirical software engineering Some contents of lecture extracted from: ● Giuseppe Procaccianti’s lectures at VU Acknowledgements

×