Data and Statistical Considerations in Research Study Design ...



  1. 1. Data and Statistical Considerations in Research Study Design and Analysis Nathan D. Wong, PhD, FACC Associate Professor and Director Heart Disease Prevention Program University of California, Irvine
  2. 2. Questions to Ask Regarding Study Design and Performance <ul><li>Was assignment of patients to treatments randomized? </li></ul><ul><li>Were all patients who entered the trial accounted for? </li></ul><ul><li>Was follow-up sufficiently long and complete? </li></ul><ul><li>Were patients analyzed in the groups to which they were randomized (intent to treat)? </li></ul><ul><li>Were patients, health workers, and study personnel “blind” to treatment? </li></ul><ul><li>Were groups similar (or study sample representative of population) at start of the trial? (selection bias) </li></ul><ul><li>Aside from experimental intervention, were the groups treated equally? (performance bias) </li></ul><ul><li>Were objective and unbiased outcome criteria used? (detection bias) </li></ul>
  3. 3. Questions to Ask Regarding Statistical Analysis <ul><li>Was there sufficient power/sample size? </li></ul><ul><li>Was the choice of statistical analysis appropriate? </li></ul><ul><li>Was the choice (and coding/classification) of outcome and treatment variables appropriate? </li></ul><ul><li>Is there an adequate description of magnitude and precision of effect? </li></ul><ul><li>Was there adjustment for potential confounders? </li></ul>
  4. 4. Outline <ul><li>Database set-up and structure </li></ul><ul><li>Classification of study variables </li></ul><ul><li>Sample size considerations </li></ul><ul><li>Choice of statistical procedures for different study designs </li></ul>
  5. 5. Data Collection / Management <ul><li>Always have a clear plan for how to collect data: design and pilot questionnaires and case report forms. </li></ul><ul><li>The medical record should serve only as source documentation to back up what you have coded on your forms. </li></ul><ul><li>Use data entry screens with acceptable error checking, or spreadsheet software (e.g., Excel) that is convertible into a statistical package (SAS highly recommended and available via UCI site license). </li></ul><ul><li>Carefully design the structure of your database (e.g., one subject per record, study variables in columns) so that it is convertible into an analyzable format. </li></ul>
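A minimal sketch of the one-subject-per-record layout with a simple range check at data entry. The column names (`subject_id`, `treatment`, `sbp_baseline`, `sbp_followup`), the sample rows, and the plausibility limits are illustrative assumptions, not taken from the slides.

```python
import csv
import io

# Hypothetical study data: one row per subject, one column per study variable.
RAW = """subject_id,treatment,sbp_baseline,sbp_followup
001,active,152,138
002,placebo,148,147
003,active,15,140
"""

# Assumed plausibility limits for systolic blood pressure (mmHg).
SBP_RANGE = (60, 260)

def check_rows(text):
    """Return (subject_id, column, value) for every value outside SBP_RANGE."""
    problems = []
    for row in csv.DictReader(io.StringIO(text)):
        for col in ("sbp_baseline", "sbp_followup"):
            value = float(row[col])
            if not SBP_RANGE[0] <= value <= SBP_RANGE[1]:
                problems.append((row["subject_id"], col, value))
    return problems

print(check_rows(RAW))  # flags subject 003, whose baseline SBP of 15 is implausible
```

Keeping the data in this flat layout means the same file can be read by a spreadsheet or imported into a statistical package without restructuring.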
  6. 6. Variable Classification <ul><li>What is your outcome (Y) (dependent variable) of interest? </li></ul><ul><ul><li>Categorical (binary, 3 or more categories) examples: survival, CHD incidence, achievement of BP control (yes vs. no) </li></ul></ul><ul><ul><li>Continuous: change in blood pressure </li></ul></ul><ul><li>What is the main explanatory or independent variable (X) of interest? </li></ul><ul><ul><li>Categorical (binary, 3 or more categories) examples: treatment status (active vs. placebo), JNC-VI blood pressure category (optimal, normal, high normal, stage I, II, or III) </li></ul></ul><ul><ul><li>Continuous: baseline blood pressure </li></ul></ul>
  7. 7. Covariates / Confounders <ul><li>The relationship between X and Y may be partially or completely due to one or more covariates (C1, C2, C3, etc.) if these covariates are related to both X and Y </li></ul><ul><li>A comparison of baseline treatment group differences in all possible known covariates is often done and presented </li></ul><ul><li>The effect of confounders can be assessed by: </li></ul><ul><ul><li>Stratifying your analysis by levels of these variables (e.g., examine relationship of X and Y separately among levels of covariates C) </li></ul></ul><ul><ul><li>Adjusting for covariates in a multivariable analysis </li></ul></ul><ul><ul><li>Considering interaction terms to test whether effect of one factor (e.g., treatment) on outcome varies by level of another factor (e.g., gender) </li></ul></ul>
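Stratification can be illustrated with a small sketch. The counts below are invented so that the covariate C is related to both exposure X and outcome Y. The crude risk ratio then suggests an effect that disappears within each level of C.

```python
# Hypothetical counts: (events, total) for exposed and unexposed, by covariate level.
strata = {
    "C=0": {"exposed": (10, 100), "unexposed": (40, 400)},
    "C=1": {"exposed": (120, 400), "unexposed": (30, 100)},
}

def risk_ratio(exposed, unexposed):
    """Risk in exposed divided by risk in unexposed; each argument is (events, total)."""
    return (exposed[0] / exposed[1]) / (unexposed[0] / unexposed[1])

def pooled(group):
    """Collapse the strata for one exposure group, ignoring the covariate."""
    events = sum(s[group][0] for s in strata.values())
    total = sum(s[group][1] for s in strata.values())
    return events, total

crude_rr = risk_ratio(pooled("exposed"), pooled("unexposed"))
stratum_rr = {name: risk_ratio(s["exposed"], s["unexposed"]) for name, s in strata.items()}

print(f"crude RR = {crude_rr:.2f}")
for name, rr in stratum_rr.items():
    print(f"{name}: RR = {rr:.2f}")
```

In this made-up example the crude RR is about 1.86, while the RR is 1.0 in both strata, so the apparent association is entirely due to C.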
  8. 8. Fallacies in Presenting Results: Statistically vs. Clinically Significant? <ul><li>Having a large sample size can virtually assure statistically significant results, even with a very low correlation or relative risk </li></ul><ul><li>Conversely, an insufficient sample size can hide clinically important differences (the result is not significant) </li></ul><ul><li>Statistical significance is directly related to sample size and magnitude of effect or difference, and inversely related to variance in the measure </li></ul>
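A rough sketch of the sample-size point: the same small difference in means (1 mmHg, with an assumed SD of 15 mmHg) is tested at two sample sizes using a normal approximation. All numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_p(diff, sd, n_per_group):
    """Two-sided p-value for a difference in means (normal approximation, equal SD and n)."""
    se = sd * sqrt(2 / n_per_group)   # standard error of the difference
    z = diff / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

for n in (50, 10_000):
    print(n, round(two_sided_p(diff=1.0, sd=15.0, n_per_group=n), 4))
```

With 50 subjects per group the difference is far from significant. With 10,000 per group it is highly significant, although a 1 mmHg difference is unlikely to matter clinically.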
  9. 9. Sample Size Considerations <ul><li>What level of difference between the two groups constitutes a clinically significant effect? (e.g., difference in mean SBP response or difference in treatment vs. control incidence rates of CHD) </li></ul><ul><li>If the outcome is continuous, know its mean and SD. </li></ul><ul><li>Find out how large a sample is needed to detect a true difference between the groups with 80-90% probability (the power of the study, or 1-beta, where beta is the Type II error) </li></ul><ul><li>Use a reasonable alpha (Type I) error of .01 or .05: the probability of finding a significant difference when the difference is due only to sampling error. </li></ul>
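One common way to turn these inputs into a number is the normal-approximation formula for comparing two means, n per group = 2 (z<sub>1-alpha/2</sub> + z<sub>power</sub>)² SD² / difference². The sketch below assumes equal group sizes and a two-sided test. The 5 mmHg difference and 15 mmHg SD are illustrative values, not from the slides.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Subjects needed per group to detect a mean difference `delta` (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided Type I error
    z_beta = NormalDist().inv_cdf(power)            # power = 1 - beta
    return ceil(2 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2)

print(n_per_group(delta=5, sd=15, power=0.80))  # 142
print(n_per_group(delta=5, sd=15, power=0.90))  # 190
```

Dedicated sample-size software typically applies small-sample (t-distribution) corrections, so its answers may be slightly larger.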
  10. 10. Power of a Test <ul><li>Power of a test is the probability of rejecting the null hypothesis when it is false. It equals 1-beta, where beta error is the probability of accepting (failing to reject) a false null hypothesis. </li></ul><ul><li>For instance, suppose the null hypothesis is Mean group A = Mean group B. If this is not true, beta error is the likelihood of concluding that it is true. Ideally this should be below 0.20, so that power (1-beta) is at least 0.80. </li></ul>
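As a hedged illustration, power for a two-group comparison of means can be approximated from the true difference, the SD, and the group size. The function below uses a normal approximation and ignores the negligible chance of rejecting in the wrong direction. The inputs are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def power_two_means(delta, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided test for a true mean difference `delta`."""
    se = sd * sqrt(2 / n_per_group)                 # standard error of the difference
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # critical value for the chosen alpha
    return NormalDist().cdf(abs(delta) / se - z_alpha)

for n in (50, 142):
    power = power_two_means(delta=5, sd=15, n_per_group=n)
    print(f"n = {n}: power = {power:.2f}, beta = {1 - power:.2f}")
```

Under these assumptions, 50 subjects per group leaves power well under 0.80, while 142 per group reaches roughly 0.80.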
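  11. 11. Assessing Accuracy of a Test <table><tr><th></th><th colspan="3">True disease status / treatment difference</th></tr><tr><th>Test result</th><th>Diseased / Yes</th><th>Nondiseased / No</th><th>Total</th></tr><tr><td>Positive / reject null</td><td>a</td><td>b</td><td>a+b</td></tr><tr><td>Negative / accept null</td><td>c</td><td>d</td><td>c+d</td></tr><tr><td>Total</td><td>a+c</td><td>b+d</td><td>a+b+c+d</td></tr></table> <ul><li>Sensitivity = a / (a+c) </li></ul><ul><li>Specificity = d / (b+d) </li></ul><ul><li>Positive predictive value = a / (a+b) </li></ul><ul><li>Negative predictive value = d / (c+d) </li></ul><ul><li>False positive error (alpha, Type I) = b / (b+d) </li></ul><ul><li>False negative error (beta, Type II) = c / (a+c) </li></ul>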
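The accuracy measures on this slide translate directly into code. In the sketch below the function name and the example counts are hypothetical. The cell labels a, b, c, d follow the slide's 2x2 layout: a = test positive and diseased, b = test positive and nondiseased, c = test negative and diseased, d = test negative and nondiseased.

```python
def accuracy_measures(a, b, c, d):
    """Accuracy measures from the 2x2 cells a, b, c, d as labelled on the slide."""
    return {
        "sensitivity": a / (a + c),
        "specificity": d / (b + d),
        "ppv": a / (a + b),
        "npv": d / (c + d),
        "false_positive_rate": b / (b + d),   # alpha, Type I
        "false_negative_rate": c / (a + c),   # beta, Type II
    }

# Hypothetical counts for illustration only.
measures = accuracy_measures(a=90, b=30, c=10, d=870)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```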
  12. 12. Questions to ask regarding study results <ul><li>How large is the treatment effect (or likelihood of outcome)? </li></ul><ul><ul><li>Relative risk reduction (may obscure comparative absolute risks) </li></ul></ul><ul><ul><li>Absolute risk reduction </li></ul></ul><ul><li>How precise is the treatment effect (or likelihood of outcome)? </li></ul><ul><ul><li>What are the confidence intervals? </li></ul></ul><ul><ul><li>Do they exclude the null value? </li></ul></ul><ul><ul><li>(e.g., is the result statistically significant– magnitude of Chi-square or F-value) </li></ul></ul>
  13. 13. [Figure slide, not reproduced in this transcript; surviving labels: (2.6 mmol/l), (3.4 mmol/l)]
  14. 14. Examining Magnitude of Effect: HPS Study Example of Vascular Event Reduction <table><tr><th></th><th>Event Yes</th><th>Event No</th></tr><tr><td>Simvastatin / Treatment</td><td>a = 2042</td><td>b = 8227</td></tr><tr><td>Placebo / Control</td><td>c = 2606</td><td>d = 7661</td></tr></table> <ul><li>Control event rate (CER) = c/(c+d) = 2606/10267 = 0.254 </li></ul><ul><li>Experimental event rate (EER) = a/(a+b) = 2042/10269 = 0.199 </li></ul><ul><li>Relative Risk (RR) = EER/CER = 0.199/0.254 = 0.78 </li></ul><ul><li>Relative Risk Reduction (RRR) = (CER-EER)/CER = (0.254-0.199)/0.254 = 0.22 </li></ul><ul><li>Absolute Risk Reduction (ARR) = CER-EER = 0.254-0.199 = 0.055, or 5.5% </li></ul><ul><li>Number Needed to Treat (NNT) = 1/ARR = 1/0.055 = 18.2 (or about 55 events prevented per 1000 treated) </li></ul>
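The effect measures can be recomputed from the 2x2 counts on the HPS slide as a check on the arithmetic. The variable names are illustrative; the counts are those given on the slide.

```python
# 2x2 counts as given on the HPS slide.
a, b = 2042, 8227   # simvastatin: events, no events
c, d = 2606, 7661   # placebo: events, no events

eer = a / (a + b)          # experimental event rate
cer = c / (c + d)          # control event rate
rr = eer / cer             # relative risk
rrr = (cer - eer) / cer    # relative risk reduction
arr = cer - eer            # absolute risk reduction
nnt = 1 / arr              # number needed to treat

print(f"CER = {cer:.3f}, EER = {eer:.3f}")
print(f"RR = {rr:.2f}, RRR = {rrr:.2f}")
print(f"ARR = {arr:.3f}, NNT = {nnt:.1f}")
print(f"events prevented per 1000 treated = {1000 * arr:.0f}")
```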
  15. 15. Measures of Precision of Effect <ul><li>The p-value (the observed significance level, judged against the chosen alpha error) is the most commonly reported indication of the precision of the result </li></ul><ul><li>A t-statistic, Chi-square, or r-square value gives the relative magnitude of a relation between two variables. </li></ul><ul><li>An F-statistic (or multiple r-square) identifies the magnitude of the variance in the dependent variable explained by the treatment or explanatory variable(s) </li></ul><ul><li>A Wald or Likelihood Ratio Chi-square statistic is frequently used in logistic regression or Cox regression survival analysis. </li></ul>
  16. 16. Precision of Effect: The Confidence Interval <ul><li>The estimate of where the true value of a result lies is expressed as a 95% confidence interval; across repeated studies, such intervals will contain the true relative risk or odds ratio 95% of the time </li></ul><ul><li>95% confidence intervals are the RR ± 1.96 × SE (for ratio measures, usually computed on the log scale). Since SE is SD / sqrt(N), confidence intervals are narrowest (precision greatest) in larger studies. </li></ul><ul><li>95% CI of the ARR is ARR ± 1.96 × sqrt[CER × (1-CER) / (# of control patients) + EER × (1-EER) / (# of experimental patients)] </li></ul><ul><li>95% CI for NNT = 1 / [95% CI for ARR] </li></ul>
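A sketch of the ARR and NNT confidence-interval formulas from this slide, applied to the event counts from the HPS example (2606/10267 control, 2042/10269 treated). The function name is hypothetical. Inverting the ARR limits to get NNT limits is only meaningful when the ARR interval excludes zero, as it does here.

```python
from math import sqrt

def arr_confidence_interval(events_c, n_c, events_e, n_e, z=1.96):
    """95% CI for the absolute risk reduction, using the slide's standard-error formula."""
    cer = events_c / n_c
    eer = events_e / n_e
    arr = cer - eer
    se = sqrt(cer * (1 - cer) / n_c + eer * (1 - eer) / n_e)
    return arr - z * se, arr + z * se

lo, hi = arr_confidence_interval(2606, 10267, 2042, 10269)
nnt_lo, nnt_hi = 1 / hi, 1 / lo   # larger ARR corresponds to smaller NNT

print(f"ARR 95% CI: {lo:.3f} to {hi:.3f}")
print(f"NNT 95% CI: {nnt_lo:.1f} to {nnt_hi:.1f}")
```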
  17. 17. Statistics and Statistical Procedures for Cross-Sectional and Case-Control Designs <ul><ul><li>When both independent and dependent variables are continuous: Pearson correlation or linear/polynomial regression ( Cross-sectional only ) </li></ul></ul><ul><ul><li>When dependent variable is continuous and independent variables are categorical and continuous: Linear or polynomial regression </li></ul></ul>
  18. 18. Analysis for Cross-Sectional and Case Control Designs (cont.) <ul><ul><li>When both independent and dependent variables are categorical: Chi-square test of proportions; prevalence odds ratio for likelihood of factor Y in those with vs. without factor X. </li></ul></ul><ul><ul><li>When outcome is binary (e.g., survival) and explanatory variables are categorical and/or continuous: </li></ul></ul><ul><ul><ul><li>Student's t-test or Chi-square for initial analysis </li></ul></ul></ul><ul><ul><ul><li>Logistic regression (multiple logistic regression for covariate adjustment) </li></ul></ul></ul>
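For two binary variables, the chi-square statistic and the prevalence odds ratio can be computed by hand from a 2x2 table. The sketch below uses the uncorrected Pearson chi-square (no continuity correction) with 1 degree of freedom. The counts are hypothetical, with rows for factor X present/absent and columns for factor Y present/absent.

```python
from math import erfc, sqrt

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction), p-value, and odds ratio."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = erfc(sqrt(chi2 / 2))   # upper tail of the chi-square distribution with 1 df
    odds_ratio = (a * d) / (b * c)
    return chi2, p_value, odds_ratio

# Hypothetical counts: X present -> 30 with Y, 70 without; X absent -> 15 with Y, 85 without.
chi2, p_value, odds_ratio = chi_square_2x2(30, 70, 15, 85)
print(f"chi-square = {chi2:.2f}, p = {p_value:.3f}, OR = {odds_ratio:.2f}")
```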
  19. 19. Statistical Procedures for Prospective Cohort Studies <ul><li>When outcome is continuous: Linear and/or polynomial regression </li></ul><ul><li>When outcome is binary: Relative risk (RR) for incidence of disease in those with vs. without risk factor of interest, adjusted for covariates and considering follow-up time to event. Cox PH regression: $\mathrm{HR}(t, z_i) = \mathrm{HR}_0(t)\,\exp(\alpha' z_i)$ </li></ul><ul><li>If follow-up time is not known, use logistic regression: $p(Y=1 \mid r_1, r_2, \ldots) = 1 / (1 + \exp[-a - b_1 r_1 - \cdots - b_n r_n])$ </li></ul>
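The logistic formula on this slide can be evaluated directly once coefficients are available. In the sketch below the intercept, coefficients, and risk factor values are invented for illustration. In practice they would come from fitting the model in a statistical package.

```python
from math import exp

def logistic_probability(intercept, coefs, values):
    """p(Y=1 | r_1..r_n) = 1 / (1 + exp[-a - b_1 r_1 - ... - b_n r_n])."""
    linear = intercept + sum(b * r for b, r in zip(coefs, values))
    return 1 / (1 + exp(-linear))

# Hypothetical model: a = -5.0, b_1 = 0.03 per mmHg of SBP, b_2 = 0.7 for a binary risk factor.
p = logistic_probability(-5.0, [0.03, 0.7], [150, 1])
print(f"predicted probability = {p:.3f}")

# exp(b) for a binary risk factor is its adjusted odds ratio.
print(f"odds ratio for the binary factor = {exp(0.7):.2f}")
```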
  20. 20. Statistics and Statistical Procedures for Randomized Clinical Trials <ul><ul><li>Relative risk (RR) of binary event occurring in intervention vs. control group: </li></ul></ul><ul><ul><ul><li>When follow-up time is known and varies, use Cox PH regression, where RR = $e^{\beta}$ for the treatment variable. </li></ul></ul></ul><ul><ul><ul><li>When follow-up time is uniform or unknown, use logistic regression </li></ul></ul></ul><ul><ul><li>For continuously measured outcomes (e.g., changes in blood pressure): </li></ul></ul><ul><ul><ul><li>Pre-post differences in a single group examined by paired t-test </li></ul></ul></ul><ul><ul><ul><li>Treatment vs. control differences examined by Student's t-test (ANCOVA used when adjusting for covariates) </li></ul></ul></ul><ul><ul><ul><li>Repeated measures ANOVA / ANCOVA used for multiple measures across a treatment period and for covariates </li></ul></ul></ul>
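As one small illustration of the continuous-outcome case, the paired t statistic for pre-post differences in a single group is the mean difference divided by its standard error. The blood pressure values below are invented. The sketch stops at the t statistic because the Python standard library has no t distribution for the p-value, which a statistical package would supply.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(before, after):
    """Paired t statistic and degrees of freedom for pre-post measurements."""
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    se = stdev(diffs) / sqrt(n)   # standard error of the mean difference
    return mean(diffs) / se, n - 1

# Hypothetical systolic blood pressures (mmHg) before and after treatment.
before = [150, 142, 160, 155, 148]
after = [140, 138, 150, 149, 146]

t_stat, df = paired_t(before, after)
print(f"t = {t_stat:.2f} on {df} degrees of freedom")
```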
  21. 21. Will the results help me in caring for my patients? <ul><li>For a study evaluating therapy: </li></ul><ul><ul><li>Can the results be applied to my patient care? (was the study or meta-analysis large enough with adequate precision?) </li></ul></ul><ul><ul><li>Were all clinically important treatment outcomes considered? (were secondary outcomes and adverse events assessed?) </li></ul></ul><ul><ul><li>Are the likely treatment benefits worth the potential harms and costs? (does the absolute benefit outweigh the risk of adverse events and cost of therapy?) </li></ul></ul>
  22. 22. Will the results help me in caring for my patients (cont.)? <ul><li>For a study evaluating prognosis: </li></ul><ul><ul><li>Were the study patients similar to my own? (demographically representative, stage of disease) </li></ul></ul><ul><ul><li>Will the results lead directly to selecting or avoiding therapy? (useful to know clinical course of pts.) </li></ul></ul><ul><ul><li>Are the results useful for reassuring or counseling patients? (a valid, precise result of a good prognosis is useful in this case) </li></ul></ul>