PROMISE keynote Juristo

  1. 1. Use and Misuse of the Term Experiment in MSR Research
       Natalia Juristo, University of Oulu & Technical University of Madrid
       PROMISE, September 7th 2016
  2. 2. Motivation
       • Today empiricism is everywhere in SE
       • This does not mean SE is empirically mature
       • Conducting empirical studies does not imply that they are carried out and understood properly
       • I focus here on a methodological issue in MSR research
           • The use of experiments in MSR
  3. 3. Motivation
       • For several years I have been struggling to reconcile MSR research with the more traditional SE empirical research (conducted over the last 35 years)
       • Very often I was shocked to hear empirical studies in MSR works called experiments when I do not consider them as such
       • Today I discuss research we are conducting to clarify this issue
  4. 4. Collaboration
       • This research has been conducted in collaboration with
           • Claudia Ayala
           • Xavier Franch
           • Burak Turhan
  5. 5. Evidence of Misuse
  6. 6. Small-scale Literature Review
       • We conducted a literature review to double-check the use of the term experiment in MSR works
       • 2015 MSR, ESEM and EMSE
           • MSR: 42 papers reviewed
           • ESEM: 36 papers
           • EMSE: 55 papers
  7. 7. Findings

       | Venue (2015) | Use of the term experiment | MSR vs. traditional experiment | MSR use vs. misuse |
       |---|---|---|---|
       | ESEM | 30.5% (11 out of 36) | 72.72% MSR works (8 papers); 27.28% traditional experiments (3 papers) | Wrong use: 12.5%; proper use: 87.5% |
       | MSR | 42.8% (18 out of 42) | 100% MSR works (18 papers); 0% traditional experiments | Wrong use: 44.45%; proper use: 55.55% |
       | EMSE | 52.72% (29 out of 55) | 65.51% MSR works (19 papers); 34.48% traditional experiments (10 papers) | Wrong use: 52.63%; proper use: 47.36% |

       Let me elaborate on why the term is misused.
  8. 8. What is an experiment
  9. 9. Experiment Definition
       • Empirical procedure where key variables of a reality are manipulated to investigate the impact of such variations
  10. 10. What Makes an Experiment
       • Manipulation of variables under study
           • Treatments must be assigned to experimental units
       • Controlling potential confounding variables impacting results
           • Confounding is eliminated through random assignment of treatments to units
  11. 11. What Makes an Experiment: Intervention
       • Experimentation
           • There is a purposeful intervention by researchers
           • Researchers allocate treatments to units
           • Experimental groups (exposed and unexposed) are determined by the researcher
       • Observation
           • Researchers have a passive role and do not interfere with reality
           • Data are generated directly from reality and analyzed afterwards
           • Exposure status is not determined by the researcher
  12. 12. What Makes an Experiment: Randomization
       • Experiments limit the potential for confounding factors (biases) by randomly assigning one participant pool to a treatment and another participant pool to a control or to another treatment
       • Random allocation of treatments to subjects minimizes the chance that the incidence of confounding variables (particularly unknown ones) will differ between the two groups
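The random allocation described on this slide can be sketched in a few lines of Python. This is a minimal illustration and not part of the talk: the participant labels, the two group names and the `randomize` helper are assumptions chosen for the example.

```python
import random

def randomize(units, treatments, seed=None):
    """Randomly allocate treatments to units in (near-)equal group sizes."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    # Deal the shuffled units round-robin so group sizes differ by at most one.
    return {unit: treatments[i % len(treatments)] for i, unit in enumerate(shuffled)}

participants = [f"P{i}" for i in range(1, 9)]  # hypothetical participant pool
allocation = randomize(participants, ["treatment", "control"], seed=7)
for participant in participants:
    print(participant, allocation[participant])
```

Because the researcher's code, and not the participants or their circumstances, decides who receives which treatment, known and unknown participant characteristics should be balanced across the groups on average.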
  13. 13. What Makes an Experiment: Intervention + Randomization
       • Intervention guarantees causality
           • Inspiring example
       • In a quasi-experiment the allocation of treatment is not possible
           • Although run under controlled conditions
           • The case of psychology experiments
           • Personality traits
  14. 14. What Does Not Make an Experiment
       • Randomization
       • Comparison
       • Analysis techniques
  15. 15. What Does Not Make an Experiment: Randomization
       • Randomization is a strategy aiming to reduce confounding variables (bias)
           • It is mandatory in controlled experiments
           • It can be applied to other types of empirical studies
       • Inspiring example
           • Randomization in surveys
  16. 16. What Does Not Make an Experiment: Comparison
       • Comparing the impact of different values of a variable does not mean we will be able to reveal causality
       • Comparing, within a set of data, units with different values of a variable neither makes the study an experiment nor lets us trace differences back to treatments
  17. 17. What Does Not Make an Experiment: Analysis
       • Analysis techniques do not differentiate experiments from other empirical studies
       • What allows us to reveal causality is not the type of analysis technique but the design of the study
       • Applying to a set of data an analysis technique typically used in experiments neither makes the study an experiment nor detects causality
  18. 18. What Does Not Make an Experiment
       • An MSR study
           • Applying ANOVA does not mean it is an experiment
           • Comparing pools of data differing in a variable's value does not imply it is an experiment
           • Even if MSR studies were randomized, they would not be experiments
       • Design guarantees
           • The reduction of bias and confounding variables
           • That the differences observed in behavior are caused by treatments
  19. 19. Impact of Randomization and Design
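A small simulation may help make the impact of design concrete. The sketch below is an illustration under invented assumptions, not material from the talk: a hidden confounder (`experience`) drives both the adoption of a hypothetical practice and the number of defects, while the practice itself has no effect at all. All names and parameters are assumptions.

```python
import random
from statistics import mean

def simulate(n_units, randomized, seed=0):
    """Return mean defects (treated, untreated) when the treatment has NO true effect."""
    rng = random.Random(seed)
    treated, untreated = [], []
    for _ in range(n_units):
        experience = rng.random()  # hidden confounder in [0, 1)
        if randomized:
            uses_practice = rng.random() < 0.5         # researcher flips a coin
        else:
            uses_practice = rng.random() < experience  # experienced teams self-select
        # Defects depend only on experience, never on the practice itself.
        defects = 10 * (1 - experience) + rng.gauss(0, 1)
        (treated if uses_practice else untreated).append(defects)
    return mean(treated), mean(untreated)

obs_t, obs_u = simulate(20_000, randomized=False)
exp_t, exp_u = simulate(20_000, randomized=True)
print(f"observational gap: {obs_u - obs_t:.2f}")
print(f"randomized gap:    {exp_u - exp_t:.2f}")
```

Under these assumptions the observational comparison shows a large gap (about 3.3 defects in expectation) even though the practice does nothing, whereas the randomized comparison shows a gap close to zero. The same analysis is applied in both cases; only the design differs.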
  20. 20. Types of Experiments
       • Without intervention
           • Natural environment
               • Natural experiments
       • Intervention
           • Where?
               • Artificial controlled environment
                   • Laboratory controlled experiments
               • Natural environment
                   • Field experiments
  21. 21. Laboratory experiments
       • Purposeful intervention
       • Randomized allocation of treatments
       • Artificial, highly controlled environment
      Field experiments
       • Purposeful intervention
       • Randomized allocation of treatments
       • Natural uncontrolled environment
  22. 22. Natural experiments
       • No intervention
       • In a natural uncontrolled environment
  23. 23. Mining Software Repositories
       • MSR research
           • Outcomes (such as quality and productivity) are studied in large samples of past data to
               • Apply statistical methods to test hypotheses
               • Build machine learning and mining methods on past data into tools to support programming tasks
       • The data stored in a repository have been obtained from reality (without intervention)
           • Therefore MSR works are observational studies
           • We could call them natural experiments, but that term is misleading
  24. 24. MSR and Epidemiology
  25. 25. Empirical Studies in Medicine
       [Diagram; labels recovered from the slide: Method development; Laboratory research or pre-clinical (non-human experiments); Field research (ill people; ill & healthy people; from 20-100 volunteers to 1-2M patients); Descriptive; Analytic (retrospective; prospective)]
  26. 26. Empirical Studies in Medicine
       • Analytical
           • Experimental
               • Clinical trial
               • Field trial
               • Group trial
           • Observational
               • Cohort studies (prospective study; follow-up study; concurrent study; incidence study; longitudinal study)
               • Historical cohort studies
               • Case-control studies (retrospective study; case comparison study; case history study; case compeer study; case referent study; trohoc study)
       • Descriptive
           • Individuals
               • Cross-sectional studies (prevalence study; disease frequency study; morbidity survey; health survey)
               • Case series
               • Single case
           • Population
               • Ecological studies
  27. 27. (Prospective) Cohort Study
       • Data are collected at regular intervals from a group of people who do not have the disease, over a period of time, to see who develops it (new incidence)
       • Cohort
           • Group of people who share a common characteristic within a defined period
           • e.g., are born, are exposed to a drug, vaccine or pollutant, or undergo a certain medical procedure
       • Comparison group
           • The general population from which the cohort is drawn
           • Another cohort of persons thought to have had little or no exposure to the substance under investigation, but otherwise similar
           • SE: Projects/commits that have not applied the method under study
       • Example
           • Is exposure to X (smoking) associated with outcome Y (lung cancer)?
           • Such a study would recruit a group of smokers and a group of non-smokers (the unexposed group), follow them for a set period of time, and note differences in the incidence of lung cancer between the groups at the end of this time
           • SE: A passive follow-up of projects/commits, collecting data at regular intervals and noting the quality/productivity they achieve
  28. 28. Retrospective Studies
       • The researcher collects data from past records and does not follow patients up, unlike in prospective studies
       • All the events (exposure, latent period, and subsequent outcome, i.e. development of disease) have already occurred in the past
       • Errors due to confounding and bias are more common in retrospective studies than in prospective studies
  29. 29. Retrospective Studies: Threats to Validity
       • Some key data have not been measured
       • Biases may affect the selection of controls
           • Selection bias
           • Only patients with the necessary information are selected
       • Misclassification or information bias as a result of the retrospective aspect
           • Researchers cannot control exposure or outcome assessment and instead need to rely on others for accurate recordkeeping
       • It can be very difficult to make accurate comparisons between the exposed and the non-exposed
  30. 30. Retrospective Cohort Study
       • Records of groups of individuals who are alike in many ways but differ in a certain characteristic are compared for a particular outcome
           • For example, female nurses who smoke and those who do not smoke
           • SE: Use of past data in a repository to compare a certain output of projects with characteristic A and without it
       • The researcher collects data from past records and does not follow patients up, unlike in a prospective study
  31. 31. (Retrospective) Case-Control Study
       • Records of individuals are divided into two groups differing in outcome (disease or not) and compared on the basis of some supposed causal attribute
       • Case-control studies select subjects based on their disease status (the effect)
       • Cohort studies select subjects based on their exposure status (the cause)
       • SE: Select projects/commits with a certain level (e.g., a quality value) and trace back certain project characteristics that are believed to contribute to quality
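The difference between the two selection strategies can be sketched on repository-style records. The example below is a hedged illustration only: the `Project` fields, the code-review exposure and all six records are invented, and the helper names are assumptions.

```python
from dataclasses import dataclass

@dataclass
class Project:
    name: str
    uses_code_review: bool  # exposure (the supposed cause)
    high_quality: bool      # outcome (the effect)

# Hypothetical records, invented purely for illustration.
repository = [
    Project("a", True, True), Project("b", True, True),
    Project("c", True, False), Project("d", False, True),
    Project("e", False, False), Project("f", False, False),
]

def cohort_groups(projects):
    """Cohort design: split by exposure, then compare outcome rates."""
    exposed = [p for p in projects if p.uses_code_review]
    unexposed = [p for p in projects if not p.uses_code_review]
    return exposed, unexposed

def case_control_groups(projects):
    """Case-control design: split by outcome, then look back at exposure."""
    cases = [p for p in projects if p.high_quality]
    controls = [p for p in projects if not p.high_quality]
    return cases, controls

def rate(group, attribute):
    """Share of projects in the group for which the boolean attribute is true."""
    return sum(getattr(p, attribute) for p in group) / len(group)

exposed, unexposed = cohort_groups(repository)
cases, controls = case_control_groups(repository)
print("outcome rate, exposed vs unexposed:",
      rate(exposed, "high_quality"), rate(unexposed, "high_quality"))
print("exposure rate, cases vs controls:",
      rate(cases, "uses_code_review"), rate(controls, "uses_code_review"))
```

In both designs the groups are formed from what already happened; no treatment is allocated by the researcher, so either comparison remains observational.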
  32. 32. Ecological Studies
       • Units of analysis are populations
       • Comparison of groups rather than individuals
       • Explores correlations between group-level exposure and outcomes
  33. 33. Hierarchies of Evidence
  34. 34. Hierarchy of Evidence
       • It is critical to understand which empirical study you are conducting
           • To fully understand what the results are telling us
           • The type of results depends on the type of study!
       • Evidence hierarchies reflect the relative authority of various types of empirical studies
  35. 35. Authority of Evidence
       [Diagram; labels in the order recovered from the slide: Field experiments; Observational analytical (prospective; retrospective); Observational descriptive; Laboratory experiments]
  36. 36. Psychology Hierarchy of Evidence
  37. 37. Two MSR examples
  38. 38. Example 1
       • MSR'15
           • The Uniqueness of Changes: Characteristics and Applications
           • Ray, Nagappan, Bird, Nagappan, Zimmermann
       • Why this paper
           • A very well written paper
           • Several empirical studies of different types about the same issue
           • Prominent MSR authors
  39. 39. Empirical Studies (Authors' terms)
       • Topic
           • Some changes are unique while others are not
           • They propose a way to identify uniqueness of changes
       • Empirical studies (in authors' terms)
           • Analysis of the properties of unique and non-unique changes
               • What is the extent of unique changes; who introduces unique changes; where do unique changes take place
           • Applications
               • Experiment for Risk Analysis
                   • Check whether U (unique) file commits have a higher defect rate than NU (non-unique) file commits
                   • Use the Mann-Whitney test for the comparison
               • Recommendation systems
                   • A system is embedded in the development environment to suggest changes to developers
                   • Precision and recall of the recommendations are analyzed
  40. 40. Type of Empirical Studies (Epidemiology terms)
       • Analysis of the properties of unique and non-unique changes
           • What is the extent of unique changes; who introduces unique changes; where do unique changes take place
           • Ecological study
               • Descriptive; use of population-aggregated data
       • Application: Experiment for Risk Analysis
           • Check whether U file commits have a higher defect rate than NU file commits
           • Retrospective cohort study
               • Comparison of past data
       • Application: Recommendation systems
           • A system is embedded in the development environment to suggest changes to developers; precision and recall of the recommendations are analyzed
           • Prospective observational study; ecological?
               • But no comparison is made (e.g., of the quality/productivity of developments using the recommendations)
               • Could be conducted as a field trial or a (prospective) cohort study
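To make the risk-analysis comparison concrete, here is a minimal sketch of the Mann-Whitney U statistic applied to two groups of past records. It is an illustration only: the defect rates are invented, the function is a plain pairwise implementation written for this example, and none of it reproduces the paper's data or tooling.

```python
def mann_whitney_u(sample_a, sample_b):
    """Return the Mann-Whitney U statistic for sample_a (ties count 0.5)."""
    u = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u

# Hypothetical defect rates per file, invented for illustration.
unique_changes = [0.30, 0.45, 0.50, 0.20, 0.60]
non_unique_changes = [0.10, 0.25, 0.20, 0.15, 0.35]

u_unique = mann_whitney_u(unique_changes, non_unique_changes)
pairs = len(unique_changes) * len(non_unique_changes)
# Share of (U, NU) pairs in which the unique-change file has the higher rate.
print("U =", u_unique, "of", pairs, "pairs; share =", u_unique / pairs)
```

A large U indicates that one group tends to have higher values, but because both groups come from past data with no allocation of treatments, the result supports an association rather than a causal claim. For p-values one would normally rely on a statistics library such as SciPy's `scipy.stats.mannwhitneyu`.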
  41. 41. Example 2
       • ESEM'15
           • How to make best use of cross-company data for web effort estimation
           • Minku, Sarro, Mendes, Ferrucci
       • Topic
           • Compares a cross-company (CC) dataset versus a within-company (WC) dataset for web effort estimation
           • Compares Dycom against NN-filtering
               • Dycom: Framework for learning software effort estimation models for a company based on mapping CC models to the company's context
               • NN-filtering: Nearest Neighbor filtering to make CC estimations
  42. 42. Experiments in Effort Estimation Research
       • Intervention
           • The two (effort estimation) techniques compared
       • Allocation of treatments to units?
           • Yes
           • Every project belonging to the test data set is an experimental unit
           • Experimental groups are the test data set estimated with one or the other technique
           • Typical AB designs, but others could be tried
       • Control of confounding variables through randomization?
           • No
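The allocation described on this slide, where every test project is estimated with each technique, can be sketched as follows. This is an assumption-laden illustration: the two techniques are simple stand-ins (a median baseline and a size-based productivity model), not Dycom or NN-filtering, and the (size, effort) records are invented.

```python
from statistics import mean, median

# Hypothetical (size, effort) records; not data from the paper.
train = [(10, 40), (20, 85), (30, 110), (40, 170), (50, 190)]
test = [(15, 60), (35, 140), (45, 185)]

def median_baseline(size):
    """Technique A (illustrative): always predict the median training effort."""
    return median([effort for _, effort in train])

def productivity_model(size):
    """Technique B (illustrative): size times mean effort-per-size in training."""
    return size * mean([effort / s for s, effort in train])

def absolute_errors(technique):
    # Every test project is an experimental unit that receives this treatment.
    return [abs(technique(size) - effort) for size, effort in test]

errors_a = absolute_errors(median_baseline)
errors_b = absolute_errors(productivity_model)
print("MAE A:", mean(errors_a))
print("MAE B:", mean(errors_b))
```

Here the researcher does decide which technique is applied to which unit, which is the intervention the slide refers to. Nothing in the sketch randomizes anything, matching the slide's answer to the last question.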
  43. 43. Which Uses Were Right

       | Venue (2015) | Use of the term experiment | MSR vs. traditional experiment | MSR use vs. misuse |
       |---|---|---|---|
       | ESEM | 30.5% (11 out of 36) | 72.7% MSR works (8 papers); 27.3% traditional experiments (3 papers) | Observational: 12.5%; data experiments: 87.5% |
       | MSR | 42.8% (18 out of 42) | 100% MSR works (18 papers); 0% traditional experiments | Observational: 44.4%; data experiments: 55.5% |
       | EMSE | 52.72% (29 out of 55) | 65.5% MSR works (19 papers); 34.5% traditional experiments (10 papers) | Observational: 52.6%; data experiments: 47.4% |
  44. 44. Conclusions
  45. 45. Conclusions
       • MSR is a research method by which several types of empirical studies can be conducted
       • In any case, most research is
           • Observational
           • Retrospective
               • Unless data are mined from development tools prospectively
       • Therefore the evidence obtained is of lower quality than that of
           • Observational prospective studies
           • Field experimental studies
       • It shows correlation, but it is hard to prove causation
       • More powerful types of observational studies (case-control; cohort) could provide better evidence
  46. 46. Use and Misuse of the Term Experiment in MSR Research
       Natalia Juristo, University of Oulu & Technical University of Madrid
       PROMISE, September 7th 2016
