POP77034 Experimental Methods HT2023 week 2 slides.pdf

Mar. 23, 2023
  1. POP77034 Experimental Methods for Social Scientists Dr Noah Buckley Trinity College Dublin HT2023 1
  2. The plan for today • Quick causal inference recap • Experimental design • Ethics • Next week: more experimental design 2
  3. The Fundamental Problem of Causal Inference • It is impossible to observe any unit we’re interested in (e.g., person, country, firm, school) both when it has and has not been changed by a causal action • Only in physics and chemistry are units (particles, molecules) interchangeable (“exchangeable”) enough that we don’t have to worry about this • If I give 100 euros to Mary and she gets happier than she was before, we fundamentally cannot know how happy she would have been if I had not given her 100 euros • We can use theory, intuition, anecdote, data to come up with a (very) good guess • But we can never be sure
  4. Recapping causal inference https://egap.org/resource/10-things-to-know-about-causal-inference/
  5. Recapping causal inference
  6. That’s experiments in theory, what about in practice? Real-world implementation of experiments is difficult! • When you assign a unit (e.g. person) to treatment, they may not actually take that treatment • You give them a drug but they don’t take it • You send them a YouTube video to watch but they don’t watch it, or they mute it and don’t pay attention • Same for the control group • They may go out and find the drug themselves, or stumble on the YouTube video
  7. That’s experiments in theory, what about in practice? • “Compliers are subjects who will take the treatment if and only if they were assigned to the treatment group… • Non-compliers are composed of the three remaining subgroups: • Always-takers are subjects who will always take the treatment even if they were assigned to the control group • Never-takers are subjects who will never take the treatment even if they were assigned to the treatment group • Defiers are subjects who will do the opposite of their treatment assignment status” https://en.wikipedia.org/wiki/Local_average_treatment_effect
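With these compliance types defined, the local average treatment effect (LATE) linked above can be sketched as a ratio of two intent-to-treat quantities: the effect of *assignment* on the outcome, divided by the effect of assignment on treatment take-up (the complier share, assuming no defiers). The numbers below are made up for illustration; `wald_late` is a hypothetical helper, not a function from any library.

```python
def wald_late(mean_y_assigned, mean_y_control, uptake_assigned, uptake_control):
    """Wald/LATE estimator: ITT effect on the outcome divided by the
    ITT effect on take-up (the share of compliers), assuming no defiers."""
    itt = mean_y_assigned - mean_y_control        # effect of assignment on outcome
    compliance = uptake_assigned - uptake_control  # difference in take-up rates
    return itt / compliance

# Illustrative numbers: assignment raises the outcome by 2 points, but only
# 40% of the assigned group actually took the treatment (and no controls did),
# so the effect among compliers is 2.0 / 0.4 = 5.0
late = wald_late(7.0, 5.0, 0.4, 0.0)
```

The intuition: if assignment only moved 40% of people into treatment, the 2-point ITT effect must be concentrated in that 40%, implying a larger effect among compliers.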
  8. Experimental design • Experiments have three “main” components: • Treatment • Randomization into treatment and control groups • Measurement of outcome • Let’s look at each of these components in turn • Also look at groupings of these that form common ‘types’ of experiments 8
  9. Designing a treatment Good treatments • “One hopes that the treatment alters values of the independent variable (e.g., causes subjects to think about campaign finance in terms of free speech) or induces certain beliefs among participants (e.g., how much they will get paid).” (Druckman 2020 p.82) • The treatment should: • Be efficacious • Fit with the theoretical construct the researcher is interested in • Vitamin D and…beach holiday? Multivitamin? Stern lecture from doctor? • Support for Putin and…seeing an official arrested for corruption? Watching a Navalny video about regime corruption? Reading a TI report about Russian corruption levels? • Have a basis in theory • How will the knowledge gained from the experiment fit in with other things we know about the world? 9
  10. Designing a treatment Validation, piloting • “When it comes to evaluating treatments, researchers should not trust themselves to validate them. • A crucial step taken in the design of an experiment entails validating the intervention with a sample that matches the experimental participants and/or the participants themselves.” • “One need not test the outcome variables of interest but instead assess whether participants interpret and react to the intervention as presumed (e.g., increased anxiety or social trust) 10
  11. Designing a treatment Validation, piloting 2 • Piloting has the advantage of allowing one to evaluate different approaches before implementing the actual experiment • Ideally, one pilots on a sample drawn from the same population as the experiment • If that is not possible, however, one should carefully think about possible differences between the pilot sample and the experimental sample” 11
  12. Designing a treatment Manipulation checks • “In addition to piloting, one can incorporate a manipulation check into the experiment itself to empirically assess whether respondents receive and perceive the treatment as intended.” • Example: experiment on whether seeing a news report from Fox News leads people to vote for Republicans more than a CNN report • Manipulation check: ask what the source of the clip was • Downsides: extra cost, be careful with outcome measurement 12
  13. Measurement and validity Druckman 2020 p.87, 93 • Experiments are usually* taken to have good internal validity and ‘statistical conclusion validity’ • Good treatment design, measurement, randomization will help ensure the first three of these types of validity 13
  14. External validity and generalizability Druckman p.94-102 • “External validity means generalizing across 1) samples, 2) settings, 3) treatments, and 4) outcome measures” • What is being generalized? • Existence of an effect? Precise effect size? • To what are you generalizing? • What population? • The answers to these questions depend on the goals of the experiment 14
  15. External validity and realism/naturalism • Does it matter how realistic your treatment is? • What is feasible and ethical? • Example: • Outcome: voting in an election • Conceptual treatment: watching advertisements for a candidate • Practical/actual treatment: • Have participants watch 30 minutes of the news with advertisements interspersed? • Show a series of only advertisements? How many? How many times? 15
  16. Other kinds of treatments Encouragement design • Intent-to-treat estimator • “randomly incentivize subjects recruited via survey to follow one of two Twitter accounts programmed to retweet posts by politically influential users. Subjects were periodically quizzed about the contents of their Twitter feeds and surveyed again to gauge the effect of exposure to counter-attitudinal social media content.” (Guess 2021) • Shows the trade-off between naturalism and strength of treatment • “Like the offline world, online environments are crowded and multifaceted, with many competing demands on users’ attention.” • People just don’t see or pay attention to stuff! • “at least in an intent-to-treat world, manipulating a single post, ad impression, or account exposure may not in itself be expected to produce measurably large effects.” 16
  17. Assignment 1 17
  18. Ethics Morton & Williams Chapters 11-13 • Experiments must be ethical! • Harm or risk to participants • Changing of important real- world outcomes (e.g., elections) • Deception 18
  19. Ethics Morton & Williams • Benefits vs. risks • Harms • Psychological harm • Invasion of privacy or confidentiality 19
  20. Ethics Morton & Williams • Probability and magnitude of harm • Compare to daily life and routine risks • Vulnerable subjects • Prisoners, children, disabled • When possible, experiments need to get informed consent • Not always feasible! This may be a foreign concept or may interfere with the experiment • “Informed consent has become a mainstay of research with human subjects because it serves two purposes: (1) it ensures that the subjects are voluntarily participating and that their autonomy is protected and (2) it provides researchers with legal protections in case of unexpected events.” 20
  21. Ethics Morton & Williams Chapter 13 • Deception • Concerns about contaminating a subject pool • If you must use deception, you should probably debrief 21
  22. Population and sample • The population you wish to generalize to may be: • All adult residents of Ireland • All adult voters of Ireland • Residents of Dublin between 18 and 45 years of age • Or perhaps the population is irrelevant • Your experiment will need to define a sample of that population on which your treatment will be applied 22
  23. Sampling Druckman p.109-120 • How homogenous do you think the treatment is? • If you’re interested in attitudes towards pension reform, your sample may need sufficient young and old people • Pharmaceuticals and biological sex • Urban vs. rural residents • Cost, generalizability, practicality 23
  24. Sampling: Random samples • Dial random telephone numbers • Pick names out of a list (phonebook) randomly • Where do you get the list?? • Not always legal or feasible 24 Druckman p.109-120
  25. Sampling: Convenience samples • Take whoever is convenient • or whoever selects into your sample • Put up posters, send out emails, buy advertisements • Talk to people on the street • Cheaper and easier, but sharply limits generalizability 25 Druckman p.109-120
  26. Sampling: Weighting • “Weighting requires that one obtain descriptive data of the target population, typically demographics. • For example, when the population includes all Americans, one can use the U.S. Census…for demographic population figures. • One then computes weights that account for each respondent’s probability of being included in the sample • For example, if the population consists of 50% men but the sample contains only 40% men, then male sample respondents will be weighted to count more in computations from the sample (and women will be counted less) 26 Druckman p.109-120
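The slide's 50%-men / 40%-men example can be worked through directly: each group's weight is its population share divided by its sample share. This is a minimal post-stratification sketch, with `poststrat_weights` as a hypothetical helper rather than any survey package's API.

```python
def poststrat_weights(pop_shares, sample_shares):
    """Weight for each group = population share / sample share,
    so under-represented groups count more in sample computations."""
    return {g: pop_shares[g] / sample_shares[g] for g in pop_shares}

# Slide's example: population is 50/50 men and women, sample is 40/60
w = poststrat_weights({"men": 0.50, "women": 0.50},
                      {"men": 0.40, "women": 0.60})
# men are up-weighted (0.5 / 0.4 = 1.25), women down-weighted (0.5 / 0.6 ≈ 0.83)
```

A weighted mean of any attitude variable would then multiply each respondent's value by their group weight before averaging.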
  27. Sampling: Weighting • Survey researchers commonly use weights, even with many probability samples, to ensure the accuracy of observational inferences (e.g., the percentage of men who hold a particular attitude)” (Druckman 2020 p.117) • Consider weighting if: • effects are heterogeneous in a way you can correct for • you care about the population • you are interested in precise effect size 27 Druckman p.109-120
  28. Sample size and power https://egap.org/resource/10-things-to-know-about-statistical-power/ • “Power is the ability to distinguish signal from noise.” • “If our experiments are highly-powered, we can be confident that if there truly is a treatment effect, we’ll be able to see it.” • We want to avoid false negatives and false positives • Example: • “Now suppose an experiment instead used subjects’ income as an outcome variable. • Incomes can vary pretty widely – in some places, it is not uncommon for people to have neighbors that earn two, ten, or one hundred times their daily wages. • When noise is high, experiments have more trouble. • A treatment that increased workers’ incomes by 1% would be difficult to detect, because incomes differ by so much in the first place.” 28
  29. Sample size and power https://egap.org/resource/10-things-to-know-about-statistical-power/ • The three ingredients of statistical power: • Strength of the treatment • Background noise • As the background noise of your outcome variables increases, the power of your experiment decreases • To the extent that it is possible, try to select outcome variables that have low variability • In practical terms, this means comparing the standard deviation of the outcome variable to the expected treatment effect size • Sample size • See link for formula and calculator, but also beware! Power is a slippery thing 29
  30. Sample size and power https://egap.org/resource/10-things-to-know-about-statistical-power/ • https://www.stat.ubc.ca/~rollin/stats/ssize/n2.html • https://machinelearningmastery.com/statistical-power-and-power-analysis-in-python/ • “Statistical power is the probability of a hypothesis test of finding an effect if there is an effect to be found. • A power analysis can be used to estimate the minimum sample size required for an experiment, given a desired significance level, effect size, and statistical power.” 30
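The calculators linked above implement (variants of) the standard normal-approximation formula, which combines the three ingredients from the previous slide: n per arm = 2 · ((z₁₋α/₂ + z_power) / d)², where d is the treatment effect divided by the outcome's standard deviation. A minimal sketch using only the Python standard library (not the code behind either linked calculator):

```python
from math import ceil
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.8):
    """Normal-approximation sample size per arm for a two-sided
    difference-in-means test. effect_size is standardized:
    (expected treatment effect) / (standard deviation of the outcome)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = z.inv_cdf(power)           # quantile for the desired power
    return ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)

# A "medium" standardized effect of 0.5 needs about 63 subjects per arm
# at alpha = 0.05 and 80% power:
print(n_per_group(0.5))  # -> 63
```

Note how the noisy-income example plays out here: halving the standardized effect (because the outcome's standard deviation doubled) quadruples the required sample size.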
  31. Randomization Random assignment to treatment and control groups • So you’ve got your experimental design, a sample of people to experiment on • Now you need to assign people to treatment and control • Otherwise it wouldn’t be an experiment! • Simple randomization • Complete simple randomization • Block and cluster randomization 31
  32. Randomization: Simple random assignment Druckman 2020, p.109-120 • “Simple random assignment is a term of art, referring to a procedure—a die roll or coin toss—that gives each subject an identical probability of being assigned to the treatment group • The practical drawback of simple random assignment is that when N is small, random chance can create a treatment group that is larger or smaller than what the researcher intended.” (FEDAI p.36) • “A useful special case of simple random assignment is complete random assignment, where exactly m of N units are assigned to the treatment group with equal probability.” • Be careful about defining random: things like birthday may not be completely random in a formal sense 32
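Complete random assignment as described in the FEDAI quote (exactly m of N treated) can be sketched in a few lines: build a vector with exactly m ones and shuffle it. This is an illustrative standard-library sketch, not the randomizr package referenced later in the deck.

```python
import random

def complete_random_assignment(n_units, m_treated, seed=None):
    """Assign exactly m_treated of n_units to treatment (1) and the
    rest to control (0). Unlike coin-flip (simple) assignment, the
    group sizes cannot drift by chance in small samples."""
    rng = random.Random(seed)  # seeded for reproducibility
    assignment = [1] * m_treated + [0] * (n_units - m_treated)
    rng.shuffle(assignment)
    return assignment

z = complete_random_assignment(10, 5, seed=42)
# exactly 5 treated and 5 control, regardless of the random draw
```

With simple (coin-flip) assignment on 10 units, a 6-4 or 7-3 split would be quite likely; complete assignment rules that out by construction.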
  33. Block randomization https://egap.org/resource/10-things-to-know-about-randomization/ • It is possible, when randomizing, to specify the balance of particular factors you care about between treatment and control groups • even though it is not possible to specify which particular units are selected for either group • For example, you can specify that treatment and control groups contain equal ratios of men to women 33
  34. Block randomization https://egap.org/resource/10-things-to-know-about-randomization/ • Why is this desirable? • Not because our estimate of the average treatment effect would otherwise be biased, but because it could be really noisy. • Suppose that a random assignment happened to generate a very male treatment group and a very female control group. We would observe a correlation between gender and treatment status. If we were to estimate a treatment effect, that treatment effect would still be unbiased because gender did not cause treatment status. • However, it would be more difficult to reject the null hypothesis that it was not our treatment but gender that was producing the effect. • In short, the imbalance produces a noisy estimate, which makes it more difficult for us to be confident in our estimates. 34
  35. Block randomization https://cran.r-project.org/web/packages/randomizr/vignettes/randomizr_vignette.html • “Block random assignment (sometimes known as stratified random assignment) is a powerful tool when used well. • In this design, subjects are sorted into blocks (strata) according to their pre-treatment covariates, and then complete random assignment is conducted within each block. • For example, a researcher might block on gender, assigning exactly half of the men and exactly half of the women to treatment.” 35
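The quoted design (sort into blocks, then complete random assignment within each block) can be sketched as below. This is a Python illustration of the idea, not the randomizr package's `block_ra` function, and the gender data are invented.

```python
import random

def block_random_assignment(blocks, prob=0.5, seed=None):
    """Complete random assignment within each block: exactly
    round(prob * block size) units are treated per block."""
    rng = random.Random(seed)
    assignment = [None] * len(blocks)
    members = {}
    for i, b in enumerate(blocks):          # group unit indices by block
        members.setdefault(b, []).append(i)
    for idx in members.values():
        n_treat = round(prob * len(idx))    # fixed treated count per block
        labels = [1] * n_treat + [0] * (len(idx) - n_treat)
        rng.shuffle(labels)
        for i, z in zip(idx, labels):
            assignment[i] = z
    return assignment

gender = ["M"] * 6 + ["F"] * 4  # invented covariate: 6 men, 4 women
z = block_random_assignment(gender, seed=1)
# exactly 3 of the 6 men and 2 of the 4 women are treated
```

This guarantees the gender balance that slide 34 worries about: a very male treatment group simply cannot occur.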
  36. Block randomization https://cran.r-project.org/web/packages/randomizr/vignettes/randomizr_vignette.html • “Why block? • The first reason is to signal to future readers that treatment effect heterogeneity may be of interest: is the treatment effect different for men versus women? Of course, such heterogeneity could be explored if complete random assignment had been used, but blocking on a covariate defends a researcher (somewhat) against claims of data dredging. • The second reason is to increase precision. If the blocking variables are predictive of the outcome (i.e., they are correlated with the outcome), then blocking may help to decrease sampling variability. It’s important, however, not to overstate these advantages. The gains from a blocked design can often be realized through covariate adjustment alone.” 36
  37. Block randomization https://cran.r-project.org/web/packages/randomizr/vignettes/randomizr_vignette.html 37
  38. Cluster randomization https://cran.r-project.org/web/packages/randomizr/vignettes/randomizr_vignette.html • Assigning units to treatment or control as a cluster • “Housemates in households: whole households are assigned to treatment or control • Students in classrooms: whole classrooms are assigned to treatment or control • Residents in towns or villages: whole communities are assigned to treatment or control” • Don’t do this unless you really have to! • “Clustered assignment decreases the effective sample size of an experiment. In the extreme case when outcomes are perfectly correlated with clusters, the experiment has an effective sample size equal to the number of clusters. When outcomes are perfectly uncorrelated with clusters, the effective sample size is equal to the number of subjects. Almost all cluster-assigned experiments fall somewhere in the middle of these two extremes.” 38
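The "effective sample size" idea in the quote is commonly formalized with the design effect 1 + (m − 1)·ICC, where m is the cluster size and the ICC is the intra-cluster correlation of outcomes. A sketch, assuming equal-sized clusters (the vignette does not give this formula; it is the standard textbook approximation):

```python
def effective_sample_size(n_subjects, cluster_size, icc):
    """Approximate effective N under clustered assignment, using the
    design effect 1 + (m - 1) * ICC. ICC is the intra-cluster
    correlation: 0 = outcomes uncorrelated within clusters,
    1 = perfectly correlated (every cluster acts as one unit)."""
    return n_subjects / (1 + (cluster_size - 1) * icc)

# 1000 students in 40 classrooms of 25, matching the quote's two extremes:
effective_sample_size(1000, 25, icc=0.0)  # 1000.0 -> no clustering penalty
effective_sample_size(1000, 25, icc=1.0)  # 40.0   -> as if only 40 units
```

Even a modest ICC of 0.1 cuts the effective N here to about 294, which is why the slide warns against clustering unless it is unavoidable.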
  39. Types of experiments Common groupings of treatment and measurement • More on this next week 39
  40. Experiment cookbook Druckman p.234+ • Big picture idea • Short (i.e., few pages) document on the general topic and why it is relevant to understanding social, political, and/or economic phenomena 40
  41. • Detailed literature review • An exhaustive search of research on the topic, and detailed descriptions of specific studies • It is here that the researcher should identify specific gaps in existing knowledge. 41 Experiment cookbook Druckman p.234+
  42. • Research question(s) and outcomes • Given the identification of a gap in existing work, the next step is to put forth a specific question (or questions) to be addressed • This includes identifying the precise outcome variable(s) of interest 42 Experiment cookbook Druckman p.234+
  43. • Theory and hypotheses • Development of a theory and hypotheses to be tested • Researchers should take their time to derive concrete and specific predictions • As part of this step, potential mediators and/or moderators should be specified • Also, in putting forth predictions, one must be careful to isolate the comparisons to be used. 43 Experiment cookbook Druckman p.234+
  44. • Research design • Discussion of the designs used by others who have addressed similar questions, and how the proposed design connects with previous work. In many cases, the ideal strategy is to utilize and extend prior designs. • Discussion of how such a design will provide data relevant to the larger questions. 44 Experiment cookbook Druckman p.234+
  45. • Research design (cont’d) • Identifying where the data will come from, which includes: • Consideration of the sample and any potential biases. • Detailed measures and where the measures were obtained—that is, where have they been used in prior studies? The measures need to clearly connect to the hypotheses, including the outcome variables and mediators/moderators. 45 Experiment cookbook Druckman p.234+
  46. • Research design (continued more) • In many cases, the design may be too practically complex (e.g., number of experimental conditions relative to realistic sample size), and decisions must be made on what can be trimmed without interfering with the goal of the study. • For original data collection, pre-tests of stimuli, question wordings, etc., are critical to ensure the approach has content and construct validity. • Issues related to internal and external validity should be discussed. 46 Experiment cookbook Druckman p.234+
  47. • Data collection document • If the project involves original data collection, a step-by-step plan needs to be put forth so as not to later forget such details as recruitment, implementation, etc. 47 Experiment cookbook Druckman p.234+
  48. • Data analysis plan • There needs to be a clear data analysis plan—how exactly will the data be used to test hypotheses? The researcher should directly connect the design and measures to the hypotheses. • This often involves making a table with each measure and how it maps on to speci fi c hypotheses. 48 Experiment cookbook Druckman p.234+
  49. • Then • Do the experiment 49 Experiment cookbook Druckman p.234+
  50. Next time • More on specific experimental designs • Take a look at the readings — choose chapters that are interesting to you • Assignment 1! • Due Sunday 50