Analysis of small datasets

A primer on the principles of analyzing small datasets

  1. Demystifying statistics series: Meta-analysis course. Analysis of small datasets. Dr. S. A. Rizwan, M.D., Public Health Specialist, Saudi Board of Preventive Medicine, Ministry of Health, Riyadh, Kingdom of Saudi Arabia. 11/25/19
  2. Outline
     • What is small?
     • Misconceptions about small datasets
     • Where do we see small datasets?
     • Problems with small datasets
     • Descriptive statistics for small datasets
     • Inferential statistics for small datasets
  3. What is small?
     • The n<30 rule
       • Arbitrary
       • Not always correct
       • For full multivariate techniques, even n=100 may be considered small
     • When do we call a study sample small?
       • The outcome is highly influenced by one or two cases
       • Valid estimates of parameters and standard errors (SE) are not possible
       • Iterative methods do not converge
       • The relation between sample size and effect size is not appropriate
       • Distributions of the data are not consistent
  4. What is small?
  5. What is small?
  6. Misconceptions about small datasets
     • Some think statistics cannot be used on small datasets
     • Some think small datasets are not useful
     • Working with them is sometimes likened to making astronomical observations with binoculars (i.e., only big things like planets and meteors can be seen)
     • However, Galileo discovered the moons of Jupiter with the low-power telescopes of his time
  7. Where do we see small datasets?
     • Brand-new drug trials
     • Preclinical studies
     • Animal experiments (especially those requiring sacrifice)
     • Limited biological samples (such as organs)
     • Proof-of-concept studies
     • Brand-new or expensive technologies or tests (e.g., fMRI)
     • Neurosurgery/neuropsychology
  8. Problems with small datasets
     • Non-normal distributions (which limit the available statistical procedures)
     • Outliers
     • Statistical significance less likely
     • Practical significance less likely
     • Perceived deficiency in generalizability
     • Lower power and higher margin of error
     • Limited to seeing only big effects (inflated effect size)
     • Inflated false discovery rate
     • Low reproducibility
     • Reduced scope for multiple subgroup analyses
     • Small-sample analyses require compromises, which makes them difficult to justify
  9. Problems with small datasets
     • Small sample size also prevents us from properly estimating and modeling the populations we sample from.
     • As a consequence, small n stops us from answering a fundamental, yet often ignored, empirical question: how do distributions differ?
  10. Descriptive statistics for small datasets
     • Mean (sometimes)
     • Median, IQR, range
     • Log or other transformations, geometric mean
     • Outlier examination
     • Displaying frequencies instead of percentages
     • Publishing the entire dataset as a table
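As an illustration of the descriptive measures on this slide, here is a minimal Python sketch using only the standard library. The function name `describe_small` and the sample values are hypothetical and do not come from the slides; the quartile rule (`method="inclusive"`) is one of several conventions, so other software may report slightly different IQR limits.

```python
import statistics


def describe_small(values):
    """Robust descriptive statistics for a small, strictly positive sample."""
    ordered = sorted(values)
    # Quartiles with the "inclusive" rule (sample treated as including its extremes).
    q1, q2, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return {
        "n": len(ordered),
        "median": q2,
        "iqr": (q1, q3),
        "range": (ordered[0], ordered[-1]),
        # Geometric mean = back-transformed mean of the logs; needs values > 0.
        "geometric_mean": statistics.geometric_mean(ordered),
    }


# Hypothetical sample with one extreme value.
sample = [2, 4, 4, 5, 7, 9, 40]
summary = describe_small(sample)
arithmetic_mean = statistics.mean(sample)
```

With an outlier such as 40 in the sample, the median and geometric mean stay close to the bulk of the data, while the arithmetic mean is pulled upward, which is why the slide lists the mean only as "sometimes".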
  11. Inferential statistics for small datasets
     • Nonparametric/exact hypothesis tests
     • (N-1) finite population correction for tests
     • Power calculation in case of non-significant tests
     • Data simulation techniques
     • Bayesian inferences
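To make "exact hypothesis tests" concrete, the following sketch computes a two-sided Fisher exact test for a 2x2 table from the hypergeometric distribution, using only the Python standard library. The slides do not name a specific test, so the choice of Fisher's test, the function name, and the example table are assumptions made for illustration.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Sum the probabilities of all tables at least as unlikely as the observed one.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


# Hypothetical small study: 3 of 4 respond on treatment, 1 of 4 on control.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
```

Because the p-value is enumerated from all possible tables rather than approximated by a chi-square distribution, it remains valid when expected cell counts are very small.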
  12. Inferential statistics for small datasets
     • Confidence intervals for small datasets/non-normal distributions
       • Based on t distributions
       • Log-transformed intervals
       • Exact method
       • Adjusted Wald interval for proportions
       • Score method
       • Bootstrapping and Monte Carlo simulations
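Three of the interval methods listed on this slide can be sketched in a few lines of standard-library Python. This is an illustrative sketch, not the course's own implementation: "score method" is taken to mean the Wilson score interval and "adjusted Wald" the Agresti-Coull form, and the bootstrap uses the simple percentile method with a hypothetical sample and a fixed seed.

```python
import math
import random
import statistics

Z95 = statistics.NormalDist().inv_cdf(0.975)  # about 1.96


def wilson_interval(successes, n, z=Z95):
    """Score (Wilson) interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


def adjusted_wald_interval(successes, n, z=Z95):
    """Adjusted Wald (Agresti-Coull) interval, clipped to [0, 1]."""
    n_adj = n + z**2
    p_adj = (successes + z**2 / 2) / n_adj
    half = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    return max(0.0, p_adj - half), min(1.0, p_adj + half)


def bootstrap_median_interval(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the median."""
    rng = random.Random(seed)
    medians = sorted(
        statistics.median(rng.choices(values, k=len(values)))
        for _ in range(n_resamples)
    )
    lower = medians[int(alpha / 2 * n_resamples)]
    upper = medians[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical data: 0 events in 10 patients, and a skewed sample of 7 values.
wilson = wilson_interval(0, 10)
adjusted = adjusted_wald_interval(0, 10)
boot = bootstrap_median_interval([2, 4, 4, 5, 7, 9, 40])
```

With 0 events out of 10, the ordinary Wald interval collapses to a single point at zero, whereas both intervals above give a usable upper bound. That is the practical reason these methods are preferred for small samples.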
  13. When to use exact tests in SPSS
  14. Approaches to analysis of small datasets
     • Informative analysis
       • Data analysis is informative when it addresses the question that motivated the research
       • Hypothesis testing: should be sufficiently powered to detect meaningful effects
       • As a compromise, conduct descriptive analyses to set the stage
     • Finite population correction
       • Assumes random sampling without replacement and accounts for the reduction in sampling error as the sampling fraction f=n/N increases toward 1
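The finite population correction can be illustrated with a short sketch. It assumes the common form of the correction factor, sqrt((N - n) / (N - 1)), applied to the standard error of a mean; the slides do not state which form is intended, and the function name, sample, and population sizes are hypothetical.

```python
import math
import statistics


def standard_error_with_fpc(values, population_size):
    """Standard error of the mean with the finite population correction."""
    n = len(values)
    if not 1 < n <= population_size:
        raise ValueError("need 1 < n <= population_size")
    uncorrected = statistics.stdev(values) / math.sqrt(n)
    # The factor shrinks toward 0 as the sampling fraction f = n/N approaches 1.
    fpc = math.sqrt((population_size - n) / (population_size - 1))
    return uncorrected * fpc


sample = [2, 4, 4, 5, 7, 9, 40]  # hypothetical sample, n = 7
se_huge_population = standard_error_with_fpc(sample, 10**9)  # f close to 0
se_small_population = standard_error_with_fpc(sample, 20)    # f = 0.35
se_census = standard_error_with_fpc(sample, 7)               # f = 1
```

When the sample is the whole population (f = 1), the corrected standard error is zero, because there is no sampling error left. When the population is very large, the correction is negligible.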
  15. Approaches to analysis of small datasets
     • Design and measurement issues to optimize research
     • If the goal is to detect a significant effect, there are two options for increasing t (in general, a t statistic is the ratio of a parameter estimate to its standard error):
       • Approaches for increasing the parameter estimate
         • Sharpen the focus and increase the dosage in the treatment (Rx) group
         • Ensure there is no hint of the active component in the control group
         • Focus the treatment directly on the causal mechanism
       • Approaches for decreasing the SE
         • Increase the sample size
         • Make full use of the data, including incomplete records, by handling missing data via imputation
         • In a multivariate model, add more explanatory variables at the cost of degrees of freedom (df)
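The "estimate divided by standard error" view of t can be shown with a two-group comparison. The slide does not say which t-test is meant, so this sketch assumes an unequal-variance (Welch-type) standard error for a difference in means; the function name and the data are hypothetical.

```python
import math
import statistics


def t_ratio(treatment, control):
    """Return (estimate, standard error, t) for a difference in group means."""
    estimate = statistics.mean(treatment) - statistics.mean(control)
    # Unequal-variance (Welch-type) standard error of the difference.
    se = math.sqrt(
        statistics.variance(treatment) / len(treatment)
        + statistics.variance(control) / len(control)
    )
    return estimate, se, estimate / se


# Hypothetical outcome scores, four participants per group.
estimate, se, t_value = t_ratio([5, 6, 7, 8], [1, 2, 3, 4])

# Same group means, but a noisier outcome measure: larger SE, smaller t.
_, se_noisy, t_noisy = t_ratio([3, 5, 8, 10], [0, 1, 4, 5])
```

Both comparisons have the same difference in means (4), so the drop in t in the second case comes entirely from the larger standard error. This is the lever that reliable outcome measures and larger samples act on.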
  16. Approaches to analysis of small datasets
     • Design and measurement issues to optimize research
       • The chosen outcome measure should be reliable, to minimize attenuation, and sensitive, to maximize the odds of detecting a difference
       • Focus on proximal rather than distal outcomes, because proximal outcomes are easier to demonstrate
  17. Approaches to analysis of small datasets
     • Multivariate models
       • There is substantial evidence of people using so-called large-sample multivariate techniques with samples that are clearly small
       • In cluster studies, fewer than 30 clusters is small
       • Growth models, exploratory factor analysis studies, and structural equation models with fewer than 100 participants are small
       • For multilevel modeling, small might be considered fewer than 40 clusters (approaches include restricted maximum likelihood, restricted maximum likelihood with the Kenward-Roger correction, and the wild cluster bootstrap)
       • Structural equation modeling with fewer than 200 people is considered small sample
  18. Approaches to analysis of small datasets
     • Bayesian methods
       • Bayesian statistics incorporate prior knowledge along with a given set of current observations in order to make statistical inferences
       • The prior information could come from observational data
       • Particularly useful where current test data are lacking but there is a strong prior understanding of the parameter
       • By incorporating prior information about a parameter, a posterior distribution for that parameter can be produced and an adequate estimate of reliability can be obtained
       • Situations might include estimating poverty in a small area, such as a school district, or estimating a treatment effect
       • Bayesian modeling suggests a middle ground: an estimate that lies between the direct estimate and the regression estimate
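The way a prior and a small sample combine into a posterior can be sketched with the conjugate beta-binomial model for a proportion. This is an illustrative simplification, not the small-area model the slide alludes to: the prior plays the role of the external (regression-type) estimate, and the function name, the prior Beta(4, 16), and the data are all hypothetical.

```python
def beta_binomial_posterior(prior_a, prior_b, successes, n):
    """Update a Beta(prior_a, prior_b) prior with binomial data.

    Returns the posterior Beta parameters and the posterior mean.
    """
    post_a = prior_a + successes
    post_b = prior_b + (n - successes)
    return post_a, post_b, post_a / (post_a + post_b)


# Hypothetical prior knowledge: proportion around 0.20, worth about 20 observations.
prior_a, prior_b = 4, 16
prior_mean = prior_a / (prior_a + prior_b)

# Hypothetical small study: 5 events among 10 participants.
successes, n = 5, 10
direct_estimate = successes / n

post_a, post_b, posterior_mean = beta_binomial_posterior(prior_a, prior_b, successes, n)
```

The posterior mean (0.30) falls between the prior mean (0.20) and the direct estimate from the data (0.50), and it moves toward the direct estimate as n grows. This is the "middle ground" behaviour described on the slide.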
  19. Take home messages
     • Small datasets are not all bad
     • They could be useful in very specific situations
     • A thorough understanding of statistical methods for small datasets is required for proper conclusions
     • Beware of conclusions that use regular statistics for small datasets
  20. Thank you. Kindly email your queries to sarizwan1986@outlook.com
