The Classic Experiment
(and Its Limitations)
Class 6
Stages of the Research Process
• Research process begins with a hypothesis
about a presumed (causal?) relationship
between an independent and a dependent
variable
– We might also assume that there are conditioning
variables
• The elements of a test of this hypothesis are:
– Research design to assess the relationships between
the variables
– Recruiting subjects for testing the hypotheses
– Valid and reliable measurement of the variables
– Appropriate methods of statistical analysis that permit
inferential conclusions about the hypothesis
Research Designs
• Today, we discuss research designs, focusing on
experiments.
– Contrast this with an epidemiological model, where we infer that
group differences are attributable to the hypothesized effect in a
population.
– In an experiment, we attempt to control for those differences
between groups, so that any differences we observe between
groups are attributable to the treatment, not to preexisting
group differences
• This is why experiments are considered a “gold
standard” in identifying a causal relationship between a
dependent and an independent variable.
– Obviously, experiments are not always feasible
– Their strengths and limitations fuel ongoing debates and have
become a battleground for litigants seeking to interpret a pattern
of facts
– Examples: video games; alcohol and car crashes
Types of Research Designs
• Case studies
– Good for generating hypotheses, for understanding and
illustrating causal linkages
– Not good for testing hypotheses, or for generalizing to other
populations
• Correlational studies
– Studies that assess simultaneous changes in independent and
dependent variables.
• Example: income levels and voter preferences on surveys
• Example: diet and disease (epidemiological causation model)
– You can still make predictions from correlational studies if you
have ruled out other causes, but you cannot achieve “control”
without knowing the direction of the effect.
• True experiments
– Random assignment of subjects to groups; similarly situated
people receive different treatment, which supports ‘but for’ causation
• Examples: Perry Preschool, clinical drug trials
• Quasi-experiments
– Nonrandom assignment, with approximations and control for
between-group differences.
• Why are experiments the gold standard?
– An experiment is a design for testing
hypotheses regarding the empirical
relationship between an independent and a
dependent variable
– It is the most efficient and reliable way to rule
out spurious causation (rival hypotheses)
through random assignment of individuals to
test conditions, and therefore to establish
conditions for causal inference.
– Causality is critical for the scientific goals of
“explanation,” “prediction,” and “control.”
Why Random Assignment?
• Random assignment (RA) assigns units to conditions by chance
– Not the same as random sampling; we get to this later, as an
example of a validity threat or strength
• Prevents other causes from being correlated with treatment conditions
• When is randomization feasible? ETHICAL DECISION
– When demand outstrips supply
– When supply of X is short
– When isolation or separation of experimental group is possible
– Mandatory change (legislation)
– No preferences
– No known advantages (otherwise randomizing means denial of a possibly beneficial service)
– New organizations are created
– Lotteries
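The mechanics of random assignment, and of a lottery when demand outstrips supply, can be sketched in a few lines. This is a minimal illustration, not a prescribed procedure; the function names `randomly_assign` and `lottery` are hypothetical. The key property is that only chance decides who lands in each group, so no subject characteristic can systematically track the assignment.

```python
import random


def randomly_assign(units, seed=None):
    """Shuffle the units and split them into treatment and control groups.

    Every unit has the same chance of landing in either group, so
    pre-existing characteristics cannot systematically track assignment.
    """
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def lottery(applicants, slots, seed=None):
    """Allocate a scarce program by lottery when demand outstrips supply.

    Winners form the treatment group; applicants who were not drawn
    form a control group of people who also sought the program.
    """
    rng = random.Random(seed)
    winners = rng.sample(list(applicants), slots)
    winner_set = set(winners)
    losers = [a for a in applicants if a not in winner_set]
    return winners, losers
```

The lottery version has a design advantage worth noting: because everyone in both groups applied, the control group is matched on motivation to seek the program, which a comparison with non-applicants would not be.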
Types of Experiments
• The Classic Experimental Design
• The Post-test Only Experimental Design
– Strengths -- No test effects, no desensitization
– Weaknesses -- Problems in attribution of effects, does
not eliminate rival causal factors such as history or
test effects, introduces test effects (!)
• The Solomon Four-Group Design (Fig 8.5)
– Provides estimates of test effects, avoids reactivity
and test effects.
– Expensive, difficult to implement, especially under
field conditions
• Nested, or Hierarchical Designs
– Allows for identification of contextual effects
– Common in school research
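To make the logic of these designs concrete, here is a sketch of the standard contrasts computed from group means in a Solomon Four-Group Design. Groups 1 and 2 on their own form the classic pretest-posttest control group design. The group labels, the `SolomonMeans` container, and the example numbers are hypothetical, and a real analysis would also test these contrasts statistically rather than just read off mean differences.

```python
from dataclasses import dataclass


@dataclass
class SolomonMeans:
    """Group means for a Solomon Four-Group Design (hypothetical inputs).

    g1: pretest -> treatment -> posttest
    g2: pretest -> no treatment -> posttest
    g3: treatment -> posttest (no pretest)
    g4: posttest only (no pretest, no treatment)
    """
    g1_pre: float
    g1_post: float
    g2_pre: float
    g2_post: float
    g3_post: float
    g4_post: float


def classic_effect(m):
    """Classic design (groups 1 and 2): difference in mean gains."""
    return (m.g1_post - m.g1_pre) - (m.g2_post - m.g2_pre)


def solomon_effects(m):
    """Contrasts that separate the treatment effect from test effects."""
    return {
        # Treatment effect with no pretest involved at all.
        "treatment_unpretested": m.g3_post - m.g4_post,
        # Treatment effect among pretested subjects.
        "treatment_pretested": m.g1_post - m.g2_post,
        # Effect of taking the pretest, in the absence of treatment.
        "testing": m.g2_post - m.g4_post,
        # Does pretesting change how subjects respond to the treatment?
        "testing_x_treatment": (m.g1_post - m.g2_post) - (m.g3_post - m.g4_post),
    }


m = SolomonMeans(g1_pre=50, g1_post=62, g2_pre=50, g2_post=53, g3_post=60, g4_post=50)
print(classic_effect(m))   # 9
print(solomon_effects(m))
```

With these made-up means, the pretest alone raises scores by 3 points, and the treatment effect looks slightly smaller among pretested subjects (9) than unpretested ones (10). That is the kind of test effect the extra two groups are designed to expose, at the cost of doubling the number of groups.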
Natural Experiments
• Natural Disasters, Policy or Legislative
Changes
• Examples
– Flipping Coins in the Courtroom
– Damage Caps
– Disaster Research – Highway 880
– Waiver Laws in Adjacent Areas
Some Limitations to Experiments
• Generalizability of X -- complex realities vs.
single variables
• Representations of theory -- e.g., the meaning of
arrest
• Period effects -- problems of the day, factors
related to crimes or behaviors at one time may
not be salient at another time (e.g., Drug eras,
drug-crime relationships)
• Political Limitations (e.g., over-rides)
• Organizational resistance
When You Can’t Randomize:
Quasi-Experiments
• Theory and Logic
– Adjusting for selection differences
– This can be done either by design controls or statistical controls
or both
• No-Control Quasi-Experimental Designs
– Time series before and after an intervention
– Removed-treatment (TX) design (satisfies the essentialist view of causation)
• Critiques of multiple pretest observations
– Test effects (sensitization, etc.); works best if the pretest
observations are unobtrusive
– Change over time in status of subject vis-à-vis the preconditions
for treatment
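A time series before and after an intervention is often analyzed with segmented regression, which estimates the pre-intervention trend, the immediate level change at the intervention, and the change in slope afterward. The sketch below assumes that model, ordinary least squares, and evenly spaced observations. It is one common choice, not the only one, and it does nothing about the sensitization and changing-status critiques listed above. The function name `segmented_regression` is hypothetical.

```python
import numpy as np


def segmented_regression(y, intervention_index):
    """Fit an interrupted time series by ordinary least squares.

    Model: y_t = b0 + b1*t + b2*post_t + b3*(t - t0)*post_t,
    where post_t = 1 from the intervention (t0) onward.
    Returns [baseline level, pre-trend, level change, slope change].
    """
    y = np.asarray(y, dtype=float)
    t = np.arange(len(y), dtype=float)
    post = (t >= intervention_index).astype(float)
    time_since = (t - intervention_index) * post
    X = np.column_stack([np.ones_like(t), t, post, time_since])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef
```

The pre-trend term matters because without it, a series that was already rising or falling before the intervention would be mistaken for an intervention effect. A removed-treatment design could be modeled the same way, by adding a second break at the point where the treatment is withdrawn.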
• Matched Strategies
– Matched Cases – (Case Control Designs) Housing Discrimination
– Matched Samples -- Bishop Waiver Study
– Weaknesses and Strengths (omitted variable biases)
• Difficulties and Problems with Matching
– Endogeneity of Cause and Effect
• Strategies for Better Matches
– Use stable variables (avoid measurement errors)
– Avoid confounding of matching variables with dependent variables
(outcomes)
– Use “deep” matches – longitudinally measured or stable variables,
for example, rather than single-state variables
• Statistical Solutions
– Instrumental variables approach
– “Propensity score matching”: try to model the underlying differences
between experimental and control groups
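Propensity score matching can be sketched as follows: model each unit's probability of treatment from observed covariates, then compare each treated unit with the control whose estimated probability is closest. This minimal version makes several assumptions. It uses a logistic model fit by Newton-Raphson, one-to-one nearest-neighbor matching with replacement, and simulated data in which a single hypothetical covariate `x` drives both selection into treatment and the outcome. All names are illustrative. Matching balances only the covariates that go into the model, so the omitted-variable bias noted above still applies.

```python
import numpy as np


def fit_logistic(X, d, iters=25):
    """Logistic regression of treatment status d on covariates X
    (Newton-Raphson); returns coefficients, intercept first."""
    Xc = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Xc.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-Xc @ beta))
        W = p * (1.0 - p)
        grad = Xc.T @ (d - p)
        hess = Xc.T @ (Xc * W[:, None])
        beta += np.linalg.solve(hess, grad)
    return beta


def propensity_scores(X, d):
    """Estimated probability of treatment for every unit."""
    beta = fit_logistic(X, d)
    Xc = np.column_stack([np.ones(len(X)), X])
    return 1.0 / (1.0 + np.exp(-Xc @ beta))


def att_by_matching(y, d, scores):
    """Average treatment effect on the treated: pair each treated unit
    with the control whose propensity score is nearest (with replacement)."""
    treated = np.flatnonzero(d == 1)
    controls = np.flatnonzero(d == 0)
    diffs = []
    for i in treated:
        j = controls[np.argmin(np.abs(scores[controls] - scores[i]))]
        diffs.append(y[i] - y[j])
    return float(np.mean(diffs))


def simulate_selection(n=2000, true_effect=2.0, seed=0):
    """Hypothetical data: units with higher x (say, a prior-risk score)
    are more likely to be treated, and x also raises the outcome."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    p_treat = 1.0 / (1.0 + np.exp(-1.5 * x))
    d = (rng.random(n) < p_treat).astype(int)
    y = true_effect * d + 3.0 * x + rng.normal(size=n)
    return x, d, y
```

In this simulation, a naive comparison of treated and untreated means badly overstates the true effect, because treated units start out with higher `x`. The matched estimate lands much closer to the true value only because `x` is observed and modeled. If selection depended on an unmeasured variable, matching on `x` would not remove that bias.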
Quasi-Experimental Designs
That Use Control Groups
Experimental Validity
• Validity - whether an experiment produces “true”
or “accurate” answers
• Threats to internal validity
– Threats posed by the design of the experiment itself --
whether the observational procedures may have
produced the results. Internal validity refers to whether
the design is sound enough to justify the conclusions
reached.
• Threats to external validity
– Threats due to the limitations of the sample --
whether the research is generalizable or applicable
only to the population studied. External validity refers
to the extent to which the results can be generalized
beyond that population.
Internal Validity Threats
• History – local factors
• Maturation of subjects – they change
• Test Effects – subjects figure out test
• Instrumentation – biased instruments
• Regression to the Mean – “what goes up…”
• Selection Bias I – non-equivalent groups
• Mortality – subjects leave experiment
• Testing Effects – you know you’re being studied
• Reactivity – reactions to the researcher rather
than the stimulus
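Regression to the mean is easy to underestimate, so a small simulation may help. It assumes each subject has a stable true score plus independent measurement error on each occasion. Subjects are selected because their pretest was extreme, and no treatment is given. Their posttest mean still moves back toward the population mean, and a design without a control group would misread that movement as a treatment effect. The function name and parameters are hypothetical.

```python
import numpy as np


def regression_to_mean_demo(n=10_000, cutoff=1.5, seed=0):
    """Select subjects with extreme pretest scores and return
    (mean pretest, mean posttest) for that group. No treatment occurs."""
    rng = np.random.default_rng(seed)
    true_score = rng.normal(0.0, 1.0, n)             # stable trait
    pretest = true_score + rng.normal(0.0, 1.0, n)   # trait + noise
    posttest = true_score + rng.normal(0.0, 1.0, n)  # fresh noise
    selected = pretest > cutoff
    return pretest[selected].mean(), posttest[selected].mean()


pre, post = regression_to_mean_demo()
print(f"pretest mean {pre:.2f}, posttest mean {post:.2f}")
```

The selected group's posttest mean falls well below its pretest mean but stays above zero, the population mean. The less reliable the measure, the stronger this pull. Random assignment does not prevent regression to the mean, but it ensures that the control group regresses by the same expected amount.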
External Validity Threats
• Selection Bias II -- groups are unrepresentative of
general populations
• Multiple-treatment interference -- more than one
independent variable operating
• Halo effects -- conferring status or label that influences
behavior
• Local history – changing contexts
• Diffusion of treatment -- controls imitate experimental
subjects
• Compensatory equalization of treatment -- controls want
to receive experimental treatment
• Decay -- erosion of treatment
• Contamination -- controls (C) receive some of the experimental (E) treatment
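The arithmetic below shows why contamination and diffusion bias an experiment toward finding no effect. It is a sketch under simplifying assumptions, all labeled as such: the treatment effect is constant and additive, some share of controls is exposed to the treatment, and an exposed control gains a fraction `dose` of the full effect. The function name and parameters are hypothetical.

```python
def observed_difference(true_effect, contaminated_share, dose=1.0):
    """Expected treatment-minus-control difference under contamination.

    Illustrative assumptions: constant, additive treatment effect;
    `contaminated_share` of controls is exposed; each exposed control
    gains `dose` times the full effect (dose=1.0 means full exposure).
    """
    if not 0.0 <= contaminated_share <= 1.0:
        raise ValueError("contaminated_share must be between 0 and 1")
    if not 0.0 <= dose <= 1.0:
        raise ValueError("dose must be between 0 and 1")
    control_gain = true_effect * contaminated_share * dose
    return true_effect - control_gain


# A 10-point effect with 30% of controls fully exposed looks like 7 points.
print(observed_difference(10.0, 0.3))
```

Under these assumptions, the observed gap shrinks in proportion to how much treatment leaks into the control group. A real effect can therefore be understated or missed entirely, even when randomization was carried out correctly.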
Tradeoffs
• Must we trade internal validity for external
validity in experiments?