This document presents the results of a simulation study and guidelines for determining minimum sample sizes in qualitative research. The study explores how many sampling steps are required to reach theoretical saturation in different scenarios by simulating populations and varying the number of codes and the probability of observing codes. The main findings are that the mean probability of observing codes matters more than the number of codes, and that purposive sampling typically requires fewer than 50 sampling steps, with around 20 steps being common. Guidelines help researchers identify the applicable scenario and choose an appropriate sampling strategy based on estimates of the key factors. The guidelines aim to provide a theoretical basis for sample sizes while accounting for the assumptions and iterative nature of qualitative research.
(I Can't Get No) Saturation: A Simulation and Guidelines for Minimum Sample Sizes in Qualitative Research
1. Copernicus Institute of Sustainable Development
Frank van Rijnsoever
f.j.vanrijnsoever@uu.nl
A random conversation…
• Question: How many interviews do I need to do?
• Answer: It depends…
• Question: Depends on what?
• Answer: It depends on who you ask.
• Answer: But since you asked me, I will give you my version of events.
Introduction (1)
• Inductive qualitative research
Is becoming more popular (Bluhm, Harman,
Lee, & Mitchell, 2011)
• e.g., innovation policy and transition studies
• Useful for exploring new concepts, theories, and processes of change in depth, among other things…
Increased attention to methodology
(Suddaby, 2006)
Sample size is a debated topic.
• Laborious process, so avoid oversampling.
• Typical recommended sizes: 15 to 25.
• Few rules (Patton, 1990), apart from ‘experience’ and the ‘judgement of the researcher’ (Sandelowski, 1995).
Introduction (2)
Aim
• “this paper explores the sample size that is required to reach theoretical saturation in various scenarios and to use these insights to formulate guidelines about purposive sampling.”
Simulation
• Insight into the mechanisms behind purposive sampling
Contributions
• Theoretical basis for sample size
• Guidelines for practitioners
Theoretical concepts
• A population is the “universe of units of analysis” from which a sample can be drawn.
• Does not have to be the same as the unit from which information is gathered.
• Population size = N
• Codes emerge from information sources that are part of a population.
• Informants for interviews, existing documents, etc.
• Denoted as i
• At each sampling step an information source is sampled from the population.
• Part of an iterative process that includes data collection, analysis, and interpretation
• Number of sampling steps = n
• Codes represent information.
“Tags” or “labels” on unique pieces of information (Bryman, 2013), e.g. concepts, properties, and relationships between codes.
Each code represents only one piece of information; there are no synonyms.
Denoted as c
• Theoretical saturation is reached when each code in the population is observed at least once.
The number of sampling steps needed to reach it is denoted $n_s$.
Two factors influence $n_s$: the number of codes and the mean probability of observing codes.
• Purposive sampling implies an informed estimation of these factors, which depend on:
The complexity of the research question,
The likelihood of an information source actually containing the code,
The willingness and ability of the source to let the code be uncovered, and
The ability of the researcher to observe the code.
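The saturation definition can be expressed directly in code. A minimal sketch, assuming each information source is stored as a 0/1 vector over the k codes (as in the notation slide); the function names and the three example sources are hypothetical.

```python
def is_saturated(sampled_sources, k):
    """True if each of the k codes is observed in at least one sampled source.

    Each source is a sequence of k zeros and ones (1 = the code is observed there).
    """
    return all(any(source[c] for source in sampled_sources) for c in range(k))


def steps_until_saturated(ordered_sources, k):
    """Sampling steps n_s needed, in the given order, to reach saturation (None if never)."""
    for n in range(1, len(ordered_sources) + 1):
        if is_saturated(ordered_sources[:n], k):
            return n
    return None


# Three hypothetical sources over k = 7 codes; all three are needed for saturation.
sources = [
    (0, 1, 1, 1, 0, 0, 1),
    (1, 0, 0, 1, 0, 1, 0),
    (0, 0, 0, 0, 1, 0, 0),
]
print(is_saturated(sources[:2], 7))  # False: no source so far contains the code at index 4
print(is_saturated(sources, 7))      # True
print(steps_until_saturated(sources, 7))
```

Here the first two sources leave one code unobserved, so saturation is only reached at the third sampling step.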
• In this paper I test the number of sampling steps required for saturation under three typical theoretical ‘sampling scenarios’:
Random chance: random sampling.
Minimal information: each sampling step yields an information source with at least one new code.
Maximal information: each sampling step yields an information source with the largest possible number of new codes.
• I simulate hypothetical populations in which I vary the number of codes (k) and the mean probability of observing codes ($\Phi_c$).
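The three scenarios differ only in how the next information source is chosen. A minimal sketch, assuming sources are 0/1 code vectors drawn without replacement; picking uniformly among qualifying sources (and breaking ties at random) is an assumption, not necessarily the study's implementation, and the function names are hypothetical.

```python
import random


def new_codes(source, seen):
    """How many codes in `source` have not been observed yet."""
    return sum(1 for c, present in enumerate(source) if present and c not in seen)


def choose_next(population, remaining, seen, scenario, rng):
    """Index of the next information source under one sampling scenario."""
    if scenario == "random":
        return rng.choice(remaining)
    gains = {i: new_codes(population[i], seen) for i in remaining}
    if scenario == "minimal":
        # Any source that adds at least one new code; fall back to a random
        # draw when no remaining source can add anything.
        useful = [i for i in remaining if gains[i] >= 1]
        return rng.choice(useful or remaining)
    if scenario == "maximal":
        # A source with the largest possible number of new codes (ties broken at random).
        best = max(gains.values())
        return rng.choice([i for i in remaining if gains[i] == best])
    raise ValueError(f"unknown scenario: {scenario!r}")


def steps_to_saturation(population, scenario, rng):
    """Sampling steps until all k codes are observed, or None if that never happens."""
    k = len(population[0])
    remaining = list(range(len(population)))
    seen = set()
    steps = 0
    while remaining and len(seen) < k:
        i = choose_next(population, remaining, seen, scenario, rng)
        remaining.remove(i)
        seen.update(c for c, present in enumerate(population[i]) if present)
        steps += 1
    return steps if len(seen) == k else None


population = [(1, 1, 1, 1), (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 1)]
rng = random.Random(7)
for scenario in ("random", "minimal", "maximal"):
    print(scenario, steps_to_saturation(population, scenario, rng))
```

In this toy population, source 0 contains all four codes, so the maximal-information scenario always saturates in one step, whereas the other two scenarios may need up to three.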
Some mathematical notation
• Codes are stored in a vector of 0s and 1s of length k. Information sources are denoted by i.
$$c_i = (c_{i1}, c_{i2}, \ldots, c_{ik}), \quad \text{e.g. } (0,1,1,1,0,0,1)$$
• Whether a code is present is represented by a random Bernoulli trial with probability Φ. All code probabilities together form a vector $\Phi_c$ of length k.
• The probability that theoretical saturation is reached ($p_n$) by random chance is given by
$$p_n = \prod_{c=1}^{k} \left(1 - (1 - \Phi_c)^{n}\right)$$
where n is the number of sampling steps.
• If all values of $\Phi_c$ are the same ($\Phi_k$), this becomes:
$$p_n = \left(1 - (1 - \Phi_k)^{n}\right)^{k}$$
• Let $n_s$ be the number of sampling steps needed to reach theoretical saturation given $\Phi_k$, k and $p_n$. Solving for n gives:
$$n_s = \frac{\ln\left(1 - \sqrt[k]{p_n}\right)}{\ln(1 - \Phi_k)}$$
• If we add a minimum number of repetitive codes ($\nu$), the formulas become:
$$p_n = \left(\left(1 - (1 - \Phi_k)^{n}\right)^{k}\right)^{\nu} \quad \text{and} \quad n_s = \frac{\ln\left(1 - \sqrt[k\nu]{p_n}\right)}{\ln(1 - \Phi_k)}$$
• Only under very specific assumptions can we calculate theoretical saturation.
• Useful for calibrating my simulation!
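The closed-form expressions for $p_n$ and $n_s$ can be checked numerically. A minimal sketch, assuming equal code probabilities for $n_s$; rounding $n_s$ up to a whole number of steps is an illustrative choice, and the hypothetical `v` argument simply reproduces the repetitive-code formula as stated on this slide.

```python
import math


def p_saturation(n, phi):
    """p_n = product over codes of (1 - (1 - Phi_c)^n) for a list of code probabilities."""
    return math.prod(1 - (1 - f) ** n for f in phi)


def n_saturation(p_n, phi_k, k, v=1):
    """n_s = ln(1 - p_n^(1/(k*v))) / ln(1 - Phi_k) for equal code probabilities Phi_k.

    v = 1 gives the basic case; v > 1 applies the repetitive-code variant.
    """
    return math.log(1 - p_n ** (1 / (k * v))) / math.log(1 - phi_k)


n_s = n_saturation(p_n=0.95, phi_k=0.2, k=20)
n_steps = math.ceil(n_s)  # sampling steps come in whole numbers
print(round(n_s, 2), n_steps, p_saturation(n_steps, [0.2] * 20))
```

For k = 20 codes, $\Phi_k = 0.2$ and $p_n = 0.95$, $n_s$ comes out at about 26.7, so 27 steps are needed, and evaluating $p_n$ at 27 steps confirms it exceeds 0.95. This kind of check is what makes the analytical case useful for calibrating a simulation.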
Methods
• The distribution of the probabilities in vector $\Phi_c$ can be represented by the beta distribution:
$$E[\Phi_c] = \frac{\alpha}{\alpha+\beta}, \qquad \operatorname{Var}[\Phi_c] = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$$
• Input for simulations
Simulate hypothetical populations
• N by k matrices with values 0 and 1
Systematically vary 𝛼, 𝛽 and k
• 𝛼 & 𝛽 are 1, 2, 3, … 10
• k = 1, 11, 21, 31, … 101
• N=5000
• 1100 hypothetical populations (10 values of α × 10 of β × 11 of k)
For all three scenarios
Set $p_n$ to 0.95 (the probability of reaching $n_s$)
• 500 trials per population
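The simulation set-up can be sketched as follows, for the random-chance scenario only. Assumptions: Python's `random.betavariate` stands in for whatever sampler the study used, sources are drawn without replacement, and $n_s$ is read off as the smallest number of steps within which a share $p_n$ of the trials reached saturation. The function names are hypothetical, and the population size and number of trials are far smaller than the study's N = 5000 and 500 trials so the sketch runs quickly.

```python
import math
import random


def beta_mean_var(alpha, beta):
    """E[Phi_c] and Var[Phi_c] when code probabilities follow Beta(alpha, beta)."""
    mean = alpha / (alpha + beta)
    var = alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1))
    return mean, var


def simulate_population(n_sources, k, alpha, beta, rng):
    """N-by-k 0/1 matrix: each code c gets Phi_c ~ Beta(alpha, beta), and each
    source contains code c with probability Phi_c (a Bernoulli trial)."""
    phi = [rng.betavariate(alpha, beta) for _ in range(k)]
    return [[int(rng.random() < phi[c]) for c in range(k)] for _ in range(n_sources)]


def random_steps(population, rng):
    """Steps until all k codes are observed when sources are drawn at random
    without replacement; math.inf if some code never occurs in the population."""
    k = len(population[0])
    order = list(range(len(population)))
    rng.shuffle(order)
    seen = set()
    for step, i in enumerate(order, start=1):
        seen.update(c for c in range(k) if population[i][c])
        if len(seen) == k:
            return step
    return math.inf


def estimate_n_s(population, trials, p_n, rng):
    """Smallest n such that at least a share p_n of the trials saturated within n steps."""
    steps = sorted(random_steps(population, rng) for _ in range(trials))
    return steps[math.ceil(p_n * trials) - 1]


rng = random.Random(2024)
population = simulate_population(n_sources=500, k=21, alpha=2, beta=5, rng=rng)
print(beta_mean_var(2, 5))
print(estimate_n_s(population, trials=200, p_n=0.95, rng=rng))
```

The full design repeats this for every combination of α, β and k and for all three scenarios; the sketch only shows one population under random chance.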
Main findings
• $\Phi_c$ matters more than k for reaching theoretical saturation.
• Purposive sampling typically requires fewer than 50 sampling steps; a common value is around 20. This is in the same range as in the literature.
• Little difference between the minimal and maximal information scenarios.
Minimal information yields more repetitive codes.
Trade-off between efficiency and repetition.
Guidelines for purposive sampling
1. Identify a population of information sources and its sub-populations.
2. Estimate the number of codes per sub-population.
3. Estimate the mean probability of a code being observed.
4. Set a degree of certainty for reaching theoretical saturation.
5. Assess which scenario is most applicable to each sub-population.
6. Choose a fitting sampling strategy.
7. Account for these steps when reporting the research.
In general, working under the assumptions of minimal information seems reasonable.
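Steps 2 to 4 can be made concrete with the random-chance formula for $n_s$ from the notation slide. A minimal sketch, assuming equal code probabilities within each sub-population; the sub-population names and estimates are hypothetical, and rounding up to whole steps is an illustrative choice. It also reports k as a ceiling for the minimal-information scenario, since there every step adds at least one new code.

```python
import math


def random_chance_steps(k, phi, p_n):
    """Whole sampling steps under random chance: ceil(ln(1 - p_n**(1/k)) / ln(1 - phi))."""
    return math.ceil(math.log(1 - p_n ** (1 / k)) / math.log(1 - phi))


def plan_sample(subpopulations, p_n=0.95):
    """Map each sub-population to (random-chance steps, minimal-information ceiling).

    `subpopulations` maps a name to (estimated k, estimated mean Phi). Under minimal
    information every step adds at least one new code, so k steps suffice as long
    as such sources remain available.
    """
    return {
        name: (random_chance_steps(k, phi, p_n), k)
        for name, (k, phi) in subpopulations.items()
    }


# Hypothetical estimates from guideline steps 1 to 3 (not results from the study).
estimates = {"firms": (20, 0.2), "policy makers": (10, 0.3)}
print(plan_sample(estimates))
```

The gap between the two numbers per sub-population indicates how much a purposive strategy could save compared with sampling by random chance.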
Limitations
• Not empirical
Not possible.
Not required.
• Mechanistic approach
But in line with the assumptions of qualitative research.
Everyone is free to apply the results as he or she wishes.
A mixture of scenarios is possible.
• Not all possibilities are simulated
But enough variation to capture plausible conditions.