HONORS THESIS CAPSTONE COURSE
GOVERNMENT DEPARTMENT
DATE: FALL 2010
PROFESSOR: MICHAEL NELSON
This time...
   empirical methods
       sampling
small-N causal inference
sampling
  probability sampling
non-probability sampling
 sampling “challenges”
Groups in Sampling


The Theoretical Population


         The Study Population


                      The Sampling Frame

                                   The Sample
probability sampling from Henry
general sampling strategies   from Patton
sampling & case selection challenges
• Population Size
• Sampling Bias
   • probability of selection correlated with IV; will get the same relationship,
     but there is systematic non-representativeness
• Selection Bias
   • subset of sampling bias; probability of selection correlated with DV
   • underestimates the relationship (regression line b instead of a)
• Non-response Bias
   • possibility that you are unable to collect data; data set is unrepresentative

[Figure: two plots of y against x, each marking the population ("pop") and the
 regions the sample "gets" and "misses"; the first plot shows a single line
 labeled "a, b", the second shows separate regression lines "a" and "b"]
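The contrast between selecting on the IV and selecting on the DV can be checked with a small simulation. This is a minimal sketch on made-up data: the population, the true slope of 1.0, the cutoffs at zero, and the helper names `ols_slope` and `slope_of` are all assumptions for illustration, not part of the lecture.

```python
import random

def ols_slope(xs, ys):
    """Slope of the least-squares line of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

def slope_of(cases):
    """Fit y on x for a list of (x, y) cases."""
    return ols_slope([x for x, _ in cases], [y for _, y in cases])

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: y = 1.0 * x + noise, so the true slope is 1.0.
population = []
for _ in range(20000):
    x = rng.gauss(0, 1)
    y = 1.0 * x + rng.gauss(0, 1)
    population.append((x, y))

full = slope_of(population)
# Sampling bias: selection depends on the IV only (keep cases with x > 0).
selected_on_iv = slope_of([(x, y) for x, y in population if x > 0])
# Selection bias: selection depends on the DV (keep cases with y > 0).
selected_on_dv = slope_of([(x, y) for x, y in population if y > 0])

print(f"full population:  {full:.2f}")
print(f"selected on IV:   {selected_on_iv:.2f}")
print(f"selected on DV:   {selected_on_dv:.2f}")
```

Under these assumptions the IV-selected subset should recover roughly the same slope as the full population, while the DV-selected subset should give a noticeably flatter line.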
Causal inference
  for small-N
    research
properties of small-N research
 case study purposes & types
          strategies
Case selection

• For quantitative research, selection should be random


• For qualitative research, selection often must be done intentionally (King,
  Keohane and Verba, 1994).
properties of small-n research

• intensive
• field research in natural settings
• many kinds of data: observation, interview, archives
• typically: case-centered, not variable-centered
Case selection strategies
Case studies and
 research design
from Gerring and McDermott
           (2007)
Gerring on case studies
        Research Goals             Case Study      Cross-Case Study
        1. Hypothesis              Generating      Testing
        2. Validity                Internal        External
        3. Causal Insight          Mechanisms      Effects
        4. Scope of Proposition    Deep            Broad

        Empirical Factors          Case Study      Cross-Case Study
        5. Populations of Cases    Heterogeneous   Homogeneous
        6. Causal Strength         Strong          Weak
        7. Useful Variation        Rare            Common
        8. Data Availability       Concentrated    Dispersed

        Additional Factors         Case Study      Cross-Case Study
        1. Causal Complexity       ?               ?
        2. State of the Field      ?               ?
Case study purposes & types:
case selection as sampling

1. Descriptive Case Study: atheoretical; goal is to understand the case itself
2. Plausibility Probe: does the empirical phenomenon exist? Focus on availability of data;
   concern with plausibility of finding relationships between variables of interest
3. Hypothesis-Generating Case Study: seeks to find a generalization about cause and
   effect
4. Hypothesis-Testing Case Studies
   4.1. Critical Case
   4.2. Rival Hypotheses
   4.3. ....
Generating Hypotheses
Extreme cases

• Represent unusual values of
  the dependent or independent
  variables


• Used for hypothesis generation


• Not intended to be
  representative
Deviant cases

• Cases that deviate from the
  typical population


• A “high residual” case (outlier)


• Useful for generating
  hypotheses, especially new
  explanations for the outcome
  (dependent variable) of interest
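One way to make the "high residual" idea concrete is to fit a line across the cases and rank them by how far each sits from it. This is a minimal sketch: the six cases and their values are invented for illustration, and `fit_line` is a hypothetical helper, not a method named in the lecture.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical cases: (name, independent variable, outcome). Values are made up.
cases = [
    ("A", 1.0, 2.1),
    ("B", 2.0, 3.9),
    ("C", 3.0, 6.2),
    ("D", 4.0, 8.1),
    ("E", 5.0, 3.0),
    ("F", 6.0, 12.1),
]

intercept, slope = fit_line([x for _, x, _ in cases], [y for _, _, y in cases])

# Residual = observed outcome minus the outcome the fitted line predicts.
residuals = {name: y - (intercept + slope * x) for name, x, y in cases}

# Rank cases from largest to smallest absolute residual.
ranked = sorted(residuals, key=lambda name: abs(residuals[name]), reverse=True)
deviant = ranked[0]

print("deviant case:", deviant)
```

In this invented data set, case E falls far below the fitted line, so it is the candidate deviant case to examine for a new explanation of the outcome.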
Hypothesis-Testing Strategies: case selection

1. goal: establish the relationship between two or more variables

2. selection advice:

   2.1. choose cases that minimize variability in the other variables that might
        impact the relationship you are investigating

   2.2. representative sample
hypothesis-testing case studies


                             critical case


                      rival hypotheses
Selecting the typical
case

• Look for cases that are
  “typical” of other cases


• Idea is that these cases are
  “low residual” cases


• Useful for hypothesis testing.
Select diverse cases

• Select cases that represent
  the full range of variation


• Useful for hypothesis
  generation and hypothesis
  testing


• Represent variation in the
  population but not necessarily
  the distribution of that
  population
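The point that diverse cases capture the range of variation but not the distribution can be sketched in a few lines. The selection rule here (pick the case nearest each of k evenly spaced target values) is one possible reading, assumed for illustration; the case names and values are made up, and `pick_diverse` is a hypothetical helper.

```python
def pick_diverse(cases, k=3):
    """Pick k case names spread evenly across the observed range of values.

    cases: list of (name, value) pairs.
    """
    if k < 2 or k > len(cases):
        raise ValueError("k must be between 2 and the number of cases")
    values = [value for _, value in cases]
    low, high = min(values), max(values)
    remaining = list(cases)
    chosen = []
    for i in range(k):
        # Evenly spaced target between the lowest and highest observed value.
        target = low + i * (high - low) / (k - 1)
        best = min(remaining, key=lambda case: abs(case[1] - target))
        chosen.append(best[0])
        remaining.remove(best)
    return chosen

# Hypothetical cases, with most values clustered at the low end.
cases = [
    ("A", 0.20), ("B", 0.22), ("C", 0.25), ("D", 0.27),
    ("E", 0.30), ("F", 0.45), ("G", 0.60),
]

selected = pick_diverse(cases, k=3)
print(selected)
```

Five of the seven invented cases sit at the low end, yet two of the three selected cases come from the sparse upper range: the selection spans the variation without mirroring how the cases are distributed.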
Influential case

• Cases with influential
  configurations of the
  independent variables are
  chosen


• Useful for verifying the status of
  a highly influential case


• Not necessarily representative
Crucial case

• Cases that are likely to represent an outcome of interest


• Choice usually requires qualitative assessment of crucialness


• Useful for hypothesis testing


• Should be highly representative
Selecting cases on the Independent Variable

• You select cases based on the values of an independent variable(s)


• Requires that you know a little bit about all of the potential cases


• Requires you act as if you don’t know the values of the dependent variable
Mill’s Methods



                    agreement




                 difference
Most Similar cases

• Cases are selected based on their similarity on variables other than the
  independent variable the hypothesis is testing and the outcome of interest


• Useful for hypothesis testing and generation


• Not necessarily representative of the broader population


• Most Similar Systems analysis involves a non-equivalent group design:

                                                                  N   O   X   O
                                                                  N   O       O
Thad’s example: income inequality and civil war

[Diagram: Income Inequality, Poverty, Colonial Past, and External Threat
 shown as possible explanations of Civil War]
Case          Income Inequality   Poverty   Colonial Past   External Threat   Civil War?
Costa Rica    Moderate            Yes       Yup             Nope              No
El Salvador   High                Yes       Yup             Nope              Yes
Cuba          High                Yes       Yup             Nope              Yes

                                               adapted from Thad Kousser, UCSD
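The logic of the table can be sketched as a search for most-similar pairs: cases that match on every other variable but differ on the independent variable. The data below are taken from the table; the dictionary layout, the field names, and the helper `most_similar_pairs` are assumptions made for this illustration.

```python
from itertools import combinations

# Values from the table above; True/False stand in for Yes/Yup and No/Nope.
cases = {
    "Costa Rica": {"income_inequality": "Moderate", "poverty": True,
                   "colonial_past": True, "external_threat": False,
                   "civil_war": False},
    "El Salvador": {"income_inequality": "High", "poverty": True,
                    "colonial_past": True, "external_threat": False,
                    "civil_war": True},
    "Cuba": {"income_inequality": "High", "poverty": True,
             "colonial_past": True, "external_threat": False,
             "civil_war": True},
}

CONTROLS = ("poverty", "colonial_past", "external_threat")

def most_similar_pairs(cases, iv, controls):
    """Pairs of cases that match on all controls but differ on the IV."""
    pairs = []
    for (name_a, a), (name_b, b) in combinations(cases.items(), 2):
        same_controls = all(a[c] == b[c] for c in controls)
        if same_controls and a[iv] != b[iv]:
            pairs.append((name_a, name_b))
    return pairs

pairs = most_similar_pairs(cases, "income_inequality", CONTROLS)

# Only after pairing on the IV and controls do we compare outcomes.
for name_a, name_b in pairs:
    differs = cases[name_a]["civil_war"] != cases[name_b]["civil_war"]
    print(f"{name_a} vs {name_b}: outcome differs = {differs}")
```

Pairing on the controls and the independent variable first, and looking at the outcome only afterwards, follows the advice to act as if the values of the dependent variable were unknown. El Salvador and Cuba are not paired with each other because they share the same level of income inequality.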
Case selection challenges
Case study challenges

• Motive behind the selection of case studies is not obvious (Is it convenience? Or is
  it because they are good stories?). Without understanding this, the project is at best
  useless and at worst terribly misleading.
• Generalizability – Can the lessons learned from this case be applied to a larger
  class?
• Falsifiability – Results are presented in such a way that it would be difficult for an
  impartial researcher to replicate the project and arrive at the same result.
• No or Negative Degrees of Freedom: The researcher has more explanatory
  variables (moving pieces) than observations.
• Selection on the Dependent Variable: Choosing cases because of their
  performance on the outcome of interest.
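The degrees-of-freedom problem is simple arithmetic. This sketch uses the usual regression convention (cases minus explanatory variables minus one for the intercept), which is an assumption here; the slide itself only compares the number of variables with the number of observations. The function name is hypothetical.

```python
def degrees_of_freedom(n_cases: int, n_explanatory: int) -> int:
    """Residual degrees of freedom under the regression-with-intercept
    convention (an assumption; the slide only compares the two counts)."""
    return n_cases - n_explanatory - 1

# Three cases and four explanatory variables, as in the civil war table.
df = degrees_of_freedom(n_cases=3, n_explanatory=4)
print(df)  # -2
```

A negative result means the cases alone cannot separate the competing explanations; more observations or fewer explanatory variables are needed.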
Strategies: remember threats to internal & external validity!

• History, maturation, instrumentation (data limitations)
• Selection bias
   • KKV give the example of a business school student who wants a high-paying job and
     selects for his study sample only those graduates earning high salaries. He then
     relates salary to number of accounting courses. By excluding graduates with low
     salaries, he paradoxically underestimates the effect of additional accounting
     courses on income.
Geddes on selection bias
Geddes, continued
Strategies: combining with large-N
1. Goal: Increase number of observations
   1.1. Comparative case with large-N analysis of embedded units
2. Goal: Study causal mechanisms
   2.1. Large-N study establishes relationships between variables (causal effect)
   2.2. Small-N study establishes causal mechanism, looking at intervening steps (causal mechanism)
   2.3. Note: causal explanation requires an understanding of both the causal effect and the causal
     mechanism
3. Goal: Study of spuriousness
   3.1. Large-N study establishes relationships between variables (causal effect)
   3.2. Small-N study engages claims of spuriousness
4. Goal: Study of deviant cases
   4.1. Large-N study establishes deviant cases
   4.2. Small-N study examines deviant cases
5. Goal: Establish generality of findings
   5.1. Small-N study suggests X causes Y, but lacks external validity
   5.2. Large-N study looks to establish the generality of findings
Strategies:
Increasing leverage for causal inference in case studies

1. Congruence Method: Test a hypothesis by understanding a case; looks for fit between
   theory and case; involves multiple independent variables
2. Pattern Matching: Type of congruence testing, usually focused on a single
   independent variable; compares alternative theories with respect to multiple outcomes
3. Process Tracing: Focus is on establishing the causal mechanism, by examining fit of
   theory to intervening causal steps; how does “X” produce a series of conditions that
   come together in some way (or don’t) to produce “Y”?
4. Counterfactual Analysis: Gain leverage through rigorous, disciplined thought
   experiments
Strategies: structured, focused comparison
1.   “the comparison is focused because it deals
     selectively with only certain aspects of a historical
     case... and structured because it employs general
     questions to guide the data collection analysis in that
     historical case” - Alexander George

2.    Steps (Kaarbo and Beasley)
     2.1. Identify the research question
     2.2. Identify variables (usually from existing theory)
     2.3. Select cases: comparable cases with variation in
          the values of the dependent variable, selected
          from across population subgroups (aids external
          validity)
     2.4. Define and specify your measurement strategy for
          concepts, including a “codebook” for the
          questions you employ in data collection
     2.5. “Code-write cases”
     2.6. Comparison (search for patterns) and implications
          for theory
