Saudi Board of Preventive Medicine, Riyadh Ministry of Health, KSA
Dr. S. A. Rizwan, M.D.
Demystifying statistics series: Meta-analysis course
Analysis of small datasets
Dr. S. A. Rizwan, M.D.,
Public Health Specialist,
Saudi Board of Preventive Medicine,
Riyadh, Kingdom of Saudi Arabia
11/25/19
Outline
• What is small?
• Misconceptions about small datasets
• Where do we see small datasets?
• Problems with small datasets
• Descriptive statistics for small datasets
• Inferential statistics for small datasets
What is small?
• The n < 30 rule of thumb
• Arbitrary
• Not always correct
• For full multivariate techniques, even n = 100 may be considered small
• When do we call a study sample small?
• The outcome is highly influenced by one or two cases
• Valid estimates of parameters and standard errors (SE) are not possible
• Iterative estimation methods do not converge
• The sample size is not adequate for the expected effect size
• Distributions of data are not consistent
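The first criterion in the list above is an outcome driven by one or two cases. The minimal Python sketch below illustrates it using only the standard library. The data are hypothetical, and `max_leave_one_out_shift` is an illustrative helper, not a standard function. It reports how far the sample mean moves when a single observation is dropped.

```python
from statistics import mean


def max_leave_one_out_shift(values):
    """Largest change in the sample mean caused by dropping one observation."""
    full_mean = mean(values)
    shifts = []
    for i in range(len(values)):
        rest = values[:i] + values[i + 1:]  # the sample without observation i
        shifts.append(abs(mean(rest) - full_mean))
    return max(shifts)


# Hypothetical data: the same extreme value (40) in a sample of 5 and of 50.
small = [10, 11, 12, 13, 40]
large = small + [11] * 45

print(round(max_leave_one_out_shift(small), 2))  # 5.7
print(round(max_leave_one_out_shift(large), 2))  # 0.58
```

The same extreme case moves the mean about ten times further in the sample of 5 than in the sample of 50.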
Misconceptions about small datasets
• Some think statistics cannot be used with small datasets
• Some think small datasets are not useful
• Analyzing a small dataset is sometimes likened to making astronomical observations with binoculars (i.e., only big objects such as planets and meteors can be seen)
• However, Galileo used low-power telescopes to discover the moons of Jupiter
Where do we see small datasets?
• Trials of brand-new drugs
• Preclinical studies
• Animal experiments (especially those requiring sacrifice)
• Limited biological samples (e.g., organs)
• Proof-of-concept studies
• New or expensive technologies or tests (e.g., fMRI)
• Neurosurgery/neuropsychology
Problems with small datasets
• Non-normal distributions (which limit the choice of statistical procedures)
• Outliers
• Statistical significance less likely
• Practical significance less likely
• Perceived deficiency in generalizability
• Lower power and higher margin of error
• Only big effects can be detected, so significant effect sizes tend to be inflated
• Inflated false discovery rate
• Low reproducibility
• Reduced scope for subgroup analyses
• Because small-sample analyses require compromises, the analytic choices can be difficult to justify
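A small simulation makes the power and effect-size problems listed above visible. This is a sketch under simplifying assumptions, not a general power tool. It assumes two normal groups with a known SD of 1, a hypothetical true standardized difference of 0.3, and a two-sided z-test at alpha = 0.05, which keeps it to the Python standard library. The function name `simulate` is illustrative.

```python
import random
from statistics import NormalDist

Z = NormalDist()


def simulate(n_per_group, true_effect=0.3, sims=1000, alpha=0.05, seed=1):
    """Return (power, mean estimate among significant runs).

    Assumes two normal groups with known SD = 1 and a two-sided z-test.
    """
    rng = random.Random(seed)
    se = (2 / n_per_group) ** 0.5  # SE of a difference in means when SD = 1
    significant = []
    for _ in range(sims):
        control = sum(rng.gauss(0.0, 1.0) for _ in range(n_per_group)) / n_per_group
        treated = sum(rng.gauss(true_effect, 1.0) for _ in range(n_per_group)) / n_per_group
        diff = treated - control
        p_value = 2 * (1 - Z.cdf(abs(diff) / se))
        if p_value < alpha:
            significant.append(diff)  # keep only the "publishable" estimates
    power = len(significant) / sims
    mean_sig = sum(significant) / len(significant) if significant else float("nan")
    return power, mean_sig


for n in (10, 200):
    power, mean_sig = simulate(n)
    print(f"n = {n:3d} per group: power = {power:.2f}, "
          f"mean significant estimate = {mean_sig:.2f} (true effect 0.30)")
```

With 10 per group, only a small fraction of runs reach significance, and those runs substantially overestimate the true effect. With 200 per group, most runs are significant and their average estimate is close to 0.3.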
Problems with small datasets
• Small sample size also prevents us from properly estimating and modeling the populations we sample from.
• As a consequence, small n stops us from answering a fundamental, yet often ignored, empirical question: how do distributions differ?
Descriptive statistics for small datasets
• Mean (only sometimes appropriate)
• Median, interquartile range (IQR), range
• Log or other transformations; geometric mean
• Outlier examination
• Displaying frequencies instead of percentages
• Publishing the entire dataset as a table
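Most of the descriptive options above are available in Python's standard `statistics` module. Below is a minimal sketch on a hypothetical sample of eight values. Tukey's 1.5 × IQR fences are used as one common outlier screen; the slide itself does not prescribe a specific rule.

```python
from statistics import geometric_mean, mean, median, quantiles

# Hypothetical small sample (n = 8) with one suspicious value.
data = [2.1, 2.4, 2.9, 3.0, 3.3, 3.8, 4.1, 14.5]

q1, q2, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1
# Tukey's fences: values more than 1.5 * IQR beyond the quartiles are flagged.
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in data if x < low_fence or x > high_fence]

print(f"mean = {mean(data):.2f}, median = {median(data):.2f}")
print(f"IQR = {q1:.2f} to {q3:.2f}, range = {min(data)} to {max(data)}")
print(f"geometric mean = {geometric_mean(data):.2f}")  # needs all values > 0
print(f"flagged for review: {outliers}")

# Report counts rather than bare percentages when n is small.
k = sum(x > 3 for x in data)
print(f"{k}/{len(data)} values exceed 3")
```

A flagged value should be reviewed rather than deleted automatically. With n = 8, a single case can move the mean well away from the median, as it does here.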
Inferential statistics for small datasets
• Nonparametric/exact hypothesis tests
• (N-1) finite population correction for tests
• Power calculation in case of non-significant tests
• Data simulation techniques
• Bayesian inference
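As an example of an exact test from the list above, the sketch below hand-codes the two-sided Fisher exact test for a 2 × 2 table from hypergeometric probabilities. It uses the common convention of summing the probabilities of all tables that are no more likely than the observed one. It is illustrative only, and the function name is hypothetical. In practice one would use a vetted implementation such as SPSS Exact Tests or `scipy.stats.fisher_exact`.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Conditions on the row and column totals and sums the hypergeometric
    probabilities of all tables that are no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x, given the fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    support = range(max(0, col1 - row2), min(row1, col1) + 1)
    # A small relative tolerance so ties are not lost to floating-point error.
    return sum(p for p in map(prob, support) if p <= p_obs * (1 + 1e-7))


# Hypothetical pilot: 3 of 4 treated vs 1 of 4 controls improved.
print(round(fisher_exact_two_sided(3, 1, 1, 3), 4))  # 0.4857
```

The p-value is computed exactly from the observed margins, so it does not rely on a large-sample chi-square approximation.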
Inferential statistics for small datasets
• Confidence intervals for small datasets or non-normal distributions
• Based on the t distribution
• Log-transformed intervals
• Exact method
• Adjusted Wald interval for proportions
• Score (Wilson) method
• Bootstrapping and Monte Carlo simulations
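The sketch below contrasts three intervals for a proportion and adds a percentile bootstrap interval for a median, using only the standard library. The function names are illustrative, and the 0/10 count and the sample values are hypothetical. The Wilson (score) and Agresti-Coull (adjusted Wald) formulas are the standard textbook forms.

```python
import random
from math import sqrt
from statistics import NormalDist, median

Z95 = NormalDist().inv_cdf(0.975)  # about 1.96


def wald(x, n, z=Z95):
    """Ordinary Wald interval; unreliable for small n or extreme proportions."""
    p = x / n
    half = z * sqrt(p * (1 - p) / n)
    return max(0.0, p - half), min(1.0, p + half)


def wilson(x, n, z=Z95):
    """Score (Wilson) interval for a binomial proportion."""
    p = x / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = (z / denom) * sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return max(0.0, centre - half), min(1.0, centre + half)


def adjusted_wald(x, n, z=Z95):
    """Agresti-Coull: add z^2/2 successes and z^2/2 failures, then apply Wald."""
    n_adj = n + z * z
    p_adj = (x + z * z / 2) / n_adj
    half = z * sqrt(p_adj * (1 - p_adj) / n_adj)
    return max(0.0, p_adj - half), min(1.0, p_adj + half)


def bootstrap_median_ci(data, reps=2000, alpha=0.05, seed=1):
    """Percentile bootstrap interval for the median."""
    rng = random.Random(seed)
    stats = sorted(median(rng.choices(data, k=len(data))) for _ in range(reps))
    return stats[int(alpha / 2 * reps)], stats[int((1 - alpha / 2) * reps) - 1]


# Hypothetical pilot: 0 adverse events in 10 patients.
for name, f in (("Wald", wald), ("Wilson", wilson), ("Adjusted Wald", adjusted_wald)):
    lo, hi = f(0, 10)
    print(f"{name:13s} {lo:.3f} to {hi:.3f}")

print(bootstrap_median_ci([2.1, 2.4, 2.9, 3.0, 3.3, 3.8, 4.1, 14.5]))
```

With 0 events in 10, the ordinary Wald interval collapses to 0 to 0. The score and adjusted Wald intervals give plausible upper limits of roughly 0.28 and 0.32.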
[Figure: When to use exact tests in SPSS]
Approaches to analysis of small datasets
• Informative analysis
• Data analysis is informative when it addresses the question that motivated
the research
• Hypothesis tests should be sufficiently powered to detect meaningful effects
• If adequate power is not feasible, conduct descriptive analyses as a compromise to set the stage
• Finite population correction
• Assumes random sampling without replacement from a finite population of size N and accounts for the reduction in sampling error as the sampling fraction f = n/N increases toward 1
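The finite population correction described above multiplies the usual standard error by sqrt((N - n)/(N - 1)). For large N this is approximately sqrt(1 - f), with f = n/N. The minimal sketch below assumes simple random sampling without replacement. The function names, the SD of 10, and the population of 50 patients are hypothetical.

```python
from math import sqrt


def fpc(n, N):
    """Finite population correction factor: sqrt((N - n) / (N - 1))."""
    if N < 2 or not 0 < n <= N:
        raise ValueError("require N >= 2 and 0 < n <= N")
    return sqrt((N - n) / (N - 1))


def se_mean(sd, n, N=None):
    """Standard error of a sample mean; applies the FPC when N is given."""
    se = sd / sqrt(n)
    return se if N is None else se * fpc(n, N)


# Hypothetical: outcome SD = 10, sampling from the N = 50 patients of one clinic.
for n in (5, 25, 45, 50):
    print(f"n = {n:2d}: SE without FPC = {se_mean(10, n):.2f}, "
          f"with FPC = {se_mean(10, n, 50):.2f}")
```

When the whole population is sampled (n = N), the corrected SE is zero because no sampling error remains.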
Approaches to analysis of small datasets
• Design and measurement issues to optimize research
• If the goal is to detect a significant effect, there are two options for increasing t (in general, the t statistic is the ratio of a parameter estimate to its standard error):
• Approaches for increasing the parameter estimate
• Sharpen the focus and increase the dosage in the treatment (Rx) group
• Keep any hint of the active component out of the control group
• Focus the treatment directly on the causal mechanism
• Approaches for decreasing the SE
• Increase the sample size
• Make full use of the data, including incomplete cases, by handling missing data via imputation
• In multivariate models, add explanatory variables that reduce residual variance, at the cost of degrees of freedom (df)
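The arithmetic behind these two options can be sketched for a two-sample comparison with equal group sizes, where SE = pooled SD × sqrt(2/n). The function name and numbers are hypothetical. The point is that t scales linearly with the effect, inversely with the SD, and only with the square root of n.

```python
from math import sqrt


def two_sample_t(mean_diff, pooled_sd, n_per_group):
    """t = estimate / SE, with SE = pooled_sd * sqrt(2 / n) for equal groups."""
    se = pooled_sd * sqrt(2 / n_per_group)
    return mean_diff / se


base = two_sample_t(mean_diff=0.5, pooled_sd=1.0, n_per_group=10)
print(round(base, 2))
print(round(two_sample_t(1.0, 1.0, 10) / base, 2))  # doubling the effect doubles t
print(round(two_sample_t(0.5, 0.8, 10) / base, 2))  # a smaller residual SD raises t
print(round(two_sample_t(0.5, 1.0, 40) / base, 2))  # 4x the sample only doubles t
```

Because t grows only with sqrt(n), a larger or more focused treatment effect or a less noisy outcome can matter as much as recruiting more participants.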
Approaches to analysis of small datasets
• Design and measurement issues to optimize research
• The chosen outcome measure should be reliable, to minimize attenuation, and sensitive, to maximize the odds of detecting a difference
• Focus on proximal rather than distal outcomes, because effects on proximal outcomes are easier to demonstrate
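Attenuation can be quantified with classical test theory. If only the outcome is measured with error, its observed SD is inflated by a factor of 1/sqrt(reliability), so a standardized effect d shrinks to d × sqrt(reliability). The minimal sketch below combines this with the usual normal-approximation sample size formula, n per group = 2(z_{1-alpha/2} + z_{1-beta})^2 / d^2. The function names and the true effect of 0.8 are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def observed_d(true_d, reliability):
    """Measurement error inflates the outcome SD, shrinking d by sqrt(reliability)."""
    return true_d * sqrt(reliability)


def n_per_group(d, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for a two-sided two-sample test."""
    z = NormalDist()
    z_a, z_b = z.inv_cdf(1 - alpha / 2), z.inv_cdf(power)
    return 2 * (z_a + z_b) ** 2 / d ** 2


# Hypothetical true standardized effect of 0.8, measured with outcomes of varying reliability.
for rel in (1.0, 0.8, 0.5):
    d = observed_d(0.8, rel)
    print(f"reliability = {rel}: observed d = {d:.2f}, "
          f"n per group ~ {ceil(n_per_group(d))}")
```

Halving the reliability roughly doubles the required sample size. That cost is hardest to absorb when samples are small.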
Approaches to analysis of small datasets
• Multivariate Models
• There is substantial evidence of researchers using so-called large-sample multivariate techniques with samples that are clearly small
• In cluster studies, fewer than 30 clusters is small
• Growth models, exploratory factor analyses, and structural equation models with fewer than 100 participants are small
• For multilevel modeling, small might be considered fewer than 40 clusters
• Approaches include restricted maximum likelihood (REML), REML with the Kenward-Roger correction, and the wild cluster bootstrap
• By a stricter criterion, structural equation modeling with fewer than 200 participants is considered a small sample
Approaches to analysis of small datasets
• Bayesian methods
• Bayesian statistics incorporate prior knowledge along with a given set of
current observations in order to make statistical inferences
• The prior information could come from observational data
• Particularly useful in cases where there is a lack of current test data but there
is a strong prior understanding about the parameter
• By incorporating prior information about a parameter, a posterior distribution for that parameter can be produced, from which an adequate estimate of reliability can be obtained
• Example situations include estimating poverty in a small area, such as a school district, or estimating a treatment effect
• Bayesian modeling suggests a middle ground: an estimate that lies between the direct estimate and the regression estimate
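The middle ground described above can be made concrete with the simplest normal-normal model. Here the posterior mean is a precision-weighted average of the direct estimate and the regression (prior) estimate. This minimal sketch assumes both variances are known, in the spirit of small-area models such as Fay-Herriot, and is not a full Bayesian fit. The function name and all numbers are hypothetical.

```python
def shrinkage_estimate(direct, direct_var, regression, regression_var):
    """Normal-normal posterior for a small-area quantity.

    direct, direct_var: the area's own (noisy) estimate and its sampling variance.
    regression, regression_var: the model-based prediction, used as the prior
    mean, and the prior variance around it. Both variances are assumed known.
    """
    precision_direct = 1 / direct_var
    precision_prior = 1 / regression_var
    w = precision_direct / (precision_direct + precision_prior)  # weight on the data
    posterior_mean = w * direct + (1 - w) * regression
    posterior_var = 1 / (precision_direct + precision_prior)
    return posterior_mean, posterior_var, w


# Hypothetical school district: a small survey suggests 30% poverty
# (sampling variance 0.006); a regression on district covariates predicts
# 20% (prior variance 0.002).
est, var, w = shrinkage_estimate(0.30, 0.006, 0.20, 0.002)
print(f"weight on survey = {w:.2f}, posterior mean = {est:.3f}, "
      f"posterior SD = {var ** 0.5:.3f}")
```

The survey estimate is three times as noisy as the prior, so it gets a weight of 0.25. The combined estimate (22.5%) therefore lies between 20% and 30%, with a smaller variance than either source.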
Take home messages
• Small datasets are not all bad
• They could be useful in very specific situations
• A thorough understanding of statistical methods for small datasets is
required for proper conclusions
• Beware of conclusions drawn by applying standard large-sample statistics to small datasets
Thank you
Kindly email your queries to sarizwan1986@outlook.com