Bivariate Analysis in R

Prepared for HSCI 432, SFU

Published in: Education

  1. USING R FOR EPIDEMIOLOGICAL RESEARCH
     Kiffer G. Card, PhD; Kirk J. Hepburn, MPP
  2. BIVARIATE ANALYSES IN R
  3. Outline
     • Hypothesis Testing
     • Analysis of Variance
     • Non-Parametric Tests
     • Correlation Tests
     • Skills-building Activity
  4. Which Test Should I Use?
     | Nature of Independent Variables | Nature of Dependent Variable | Test(s) |
     |---|---|---|
     | 1 IV with 2 levels (independent groups) | interval & normal | two-independent-sample t-test |
     | | ordinal or interval | Wilcoxon-Mann-Whitney test |
     | | categorical | chi-square test; Fisher's exact test |
     | 1 IV with 2 or more levels (independent groups) | interval & normal | one-way ANOVA |
     | | ordinal or interval | Kruskal-Wallis test |
     | | categorical | chi-square test |
     | 1 IV with 2 levels (dependent/matched groups) | interval & normal | paired t-test |
     | | ordinal or interval | Wilcoxon signed-rank test |
     | | categorical | McNemar test |
     | 1 IV with 2 or more levels (dependent/matched groups) | interval & normal | one-way repeated measures ANOVA |
     | | ordinal or interval | Friedman test |
     | | categorical (2 categories) | repeated measures logistic regression |
     | 2 or more IVs (independent groups) | interval & normal | factorial ANOVA |
     | | ordinal or interval | ordered logistic regression |
     | | categorical (2 categories) | factorial logistic regression |
     | 1 interval IV | interval & normal | correlation; simple linear regression |
     | | ordinal or interval | non-parametric correlation |
     | | categorical | simple logistic regression |
     | 1 or more interval IVs and/or 1 or more categorical IVs | interval & normal | multiple regression; analysis of covariance |
     | | categorical | multiple logistic regression |
  5. 2.1. HYPOTHESIS TESTING
     • Independent t-test
     • Paired t-test
     • Chi-Squared Test
     • McNemar's Test
  6. Independent t-test
     • The unpaired two-sample t-test compares the means of two independent groups.
     • alternative = "greater", "less", or "two.sided"
     • var.equal = FALSE (the default) calculates Welch's t-test; var.equal = TRUE calculates Student's t-test.
     t.test(var ~ grp, alternative = "two.sided", var.equal = TRUE, data = dat)
  7. Assumptions of Independent t-test
     • Independence of observations
     • Normal distribution within each group: test with the Shapiro-Wilk normality test
     • Equal variances (Student's t-test only): if the two groups' variances differ (heteroscedasticity), calculate Welch's t-test instead.
     # Shapiro-Wilk test, one call per group
     shapiro.test(dat$var[dat$grp == "group 1"])
     shapiro.test(dat$var[dat$grp == "group 2"])
     # F test comparing the two group variances
     res.ftest <- var.test(var ~ grp, data = dat)
     res.ftest
  8. Paired t-test
     • The paired-samples t-test compares the means of two related groups: each subject (or matched pair) contributes two values.
     • paired = TRUE indicates paired or matched samples. Recent versions of R reject paired = TRUE in the formula interface, so pass the two measurements as vectors, ordered so that the i-th elements belong to the same subject.
     t.test(dat$var[dat$grp == "group 1"], dat$var[dat$grp == "group 2"], paired = TRUE, alternative = "two.sided")
  9. Pearson's Chi-Squared Test
     • The chi-square test of independence analyzes the frequency table (i.e., contingency table) formed by two categorical variables.
     • It applies to variables with two or more categories; it is not limited to dichotomous variables.
     # chisq.test() has no data argument: pass the vectors (or a table) directly
     chisq.test(x = dat$var1, y = dat$var2)
  10. McNemar's Test
      • McNemar's test is like a paired t-test, but for a dichotomous rather than a continuous variable.
      • Use it for pre-post testing or matched pairs.
      # mcnemar.test() has no data argument: pass the two paired vectors directly
      mcnemar.test(x = dat$obs1, y = dat$obs2)
  11. 2.2. ANALYSIS OF VARIANCE
      • One-Way ANOVA
      • Assumptions of ANOVA
      • Advanced ANOVA
  12. One-Way ANOVA
      • One-way analysis of variance (ANOVA), also known as one-factor ANOVA, extends the independent two-sample t-test to compare means when there are more than two groups.
      • In one-way ANOVA, the data are organized into several groups based on a single grouping variable (also called a factor variable).
      • http://www.sthda.com/english/wiki/one-way-anova-test-in-r#what-is-one-way-anova-test
      res.aov <- aov(var ~ grp, data = dat)
      summary(res.aov)
  13. Assumptions of ANOVA
      • Independence of observations
      • Homogeneity of variance: inspect the residuals-vs-fitted plot and run Levene's test (car package).
      • If the variances are not equal, use Welch's one-way test.
      plot(res.aov, 1)                     # residuals vs. fitted values
      library(car)
      leveneTest(var ~ grp, data = dat)    # Levene's test for equal variances
      oneway.test(var ~ grp, data = dat)   # Welch's ANOVA (var.equal = FALSE by default)
  14. Assumptions of ANOVA
      • Normality of residuals
      plot(res.aov, 2)   # normal Q-Q plot of the residuals
      # Extract the residuals
      aov_residuals <- residuals(object = res.aov)
      # Run Shapiro-Wilk test
      shapiro.test(x = aov_residuals)
  15. Advanced ANOVA
      • In addition to the simple one-way ANOVA, more advanced methods can be used:
      • Two-way ANOVA compares mean differences between groups that have been split on two independent variables (called factors).
      • Multivariate analysis of variance (MANOVA) is an ANOVA with several outcome variables.
        For example, a study might try two different textbooks and measure students' improvements in math and physics. Improvement in math and improvement in physics are the two dependent variables, and the hypothesis is that both together are affected by the choice of textbook.
      • Repeated measures ANOVA is the equivalent of one-way ANOVA for related, not independent, groups; it extends the paired (dependent) t-test.
  16. 2.3. NON-PARAMETRIC TESTS
      • Parametric vs. Non-Parametric
      • Mann-Whitney Test
      • Wilcoxon Signed Rank
      • Kruskal Wallis H Test
  17. Parametric vs. Non-Parametric Tests
      | | Parametric | Non-Parametric |
      |---|---|---|
      | Key assumption | "Normally distributed" | "Distribution free" |
      | Data | Suited to normal, interval & ratio data without outliers and with equal variances | Tolerant of non-normal data, ordinal & nominal data, outliers, and unequal variances |
      | Information | More "efficient" use of "information" | Subject to "information" loss |
  18. Parametric vs. Non-Parametric
      • Parametric methods use "values"; non-parametric methods mostly use "ranks".
      How to calculate ranks:
        1. Organize the values in ascending order.
        2. Assign each value a rank.
        3. For tied values, give each the average of the ranks they would occupy.
      Example, values as collected: 121 199 123 121 120 124 122
      Value:  120  121  121  122  123  124  199
      Rank:     1  2.5  2.5    4    5    6    7
  19. Parametric vs. Non-Parametric
      | Parametric | Non-Parametric equivalent | Purpose |
      |---|---|---|
      | Independent t-test | Mann-Whitney U test | Compare two independent means |
      | Paired t-test | Wilcoxon signed-rank test | Compare two paired means |
      | One-way ANOVA | Kruskal-Wallis H test | Compare multiple independent means |
  20. Mann-Whitney Test (a.k.a. Mann-Whitney-Wilcoxon test, Wilcoxon rank-sum test)
      • The unpaired two-sample Wilcoxon test (also known as the Wilcoxon rank-sum test or Mann-Whitney test) is a non-parametric alternative to the unpaired two-sample t-test for comparing two independent groups. Use it when the data are not normally distributed.
      res <- wilcox.test(var ~ grp, data = dat)
      res
  21. Wilcoxon Signed Rank Test
      • The paired-samples Wilcoxon test (also known as the Wilcoxon signed-rank test) is a non-parametric alternative to the paired t-test for comparing paired data. Use it when the paired differences are not normally distributed.
      • As with t.test(), recent versions of R reject paired = TRUE in the formula interface; pass the two measurements as vectors ordered by subject.
      res <- wilcox.test(dat$var[dat$grp == "group 1"], dat$var[dat$grp == "group 2"], paired = TRUE)
      res
  22. Kruskal Wallis H Test
      • The Kruskal-Wallis rank-sum test is a non-parametric alternative to one-way ANOVA; it extends the two-sample Wilcoxon test to situations with more than two groups.
      kruskal.test(var ~ grp, data = dat)
  23. 2.4. CORRELATION
      • Pearson's Correlation
      • Spearman's Rank Test
      • R-squared
  24. Pearson's Correlation
      • A correlation test evaluates the association between two variables.
      res <- cor.test(dat$var1, dat$var2, method = "pearson")
      res
  25. Assumptions of Pearson's Correlation
      • Linear association: check with a scatter plot and a fitted regression line.
      • Normal distribution: check that both variables are normally distributed using the shapiro.test() function.
  26. Spearman's Rank Test
      • Spearman's rho statistic is the most widely used rank-based measure of association.
      res <- cor.test(dat$var1, dat$var2, method = "spearman")
      res
  27. Kendall Rank Test
      • The Kendall rank correlation coefficient (Kendall's tau) is another rank-based measure of association. It has better statistical properties than Spearman's rho, but it is less widely used.
      res <- cor.test(dat$var1, dat$var2, method = "kendall")
      res
  28. 2.5. Skills Building Activity
      • Infertility Data Analysis
  29. Skills Building Activity
      • For each variable in the "infert" dataset, calculate descriptive statistics for the overall sample, for cases, and for controls.
      • Identify any statistically significant differences between cases and controls using the appropriate parametric or non-parametric test.
      • Identify any statistically significant associations between the other variables in the dataset.
      • For each test conducted above, explain why you chose the test and interpret the result.
      • Describe how your analyses support or contradict those of Trichopoulos et al. (1976), "Induced Abortion and Secondary Infertility," British Journal of Obstetrics and Gynaecology.
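
Slide 6 (Independent t-test) as a runnable sketch. The slides use placeholder names (`dat`, `var`, `grp`). Here we assume R's built-in `ToothGrowth` data as a stand-in, with `len` as the outcome and the two-level `supp` as the group. Running both versions shows what `var.equal` changes.

```r
# Stand-in data (our choice, not from the slides): ToothGrowth from the datasets package
# len = tooth length, supp = supplement type ("OJ" or "VC"), 30 observations each
data(ToothGrowth)

# Student's t-test: pools the two group variances (var.equal = TRUE)
res_student <- t.test(len ~ supp, data = ToothGrowth,
                      alternative = "two.sided", var.equal = TRUE)

# Welch's t-test: the default (var.equal = FALSE), no pooling
res_welch <- t.test(len ~ supp, data = ToothGrowth,
                    alternative = "two.sided")

# Student's df is n1 + n2 - 2; Welch's df is estimated and smaller when variances differ
res_student$parameter
res_welch$parameter
```

Both groups have 30 observations, so the Student and Welch t statistics are identical here. Only the degrees of freedom, and therefore the p-values, differ.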
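
Slide 7 (Assumptions of Independent t-test), assuming `ToothGrowth` as stand-in data. Splitting the outcome by group runs one Shapiro-Wilk test per group level, so you do not need a separate `shapiro.test()` call for each level.

```r
# Stand-in data (our choice, not from the slides)
data(ToothGrowth)

# Shapiro-Wilk p-value within each group; split() returns one vector per level
sw_p <- sapply(split(ToothGrowth$len, ToothGrowth$supp),
               function(x) shapiro.test(x)$p.value)
sw_p

# F test of equal variances (H0: ratio of the two variances = 1)
res.ftest <- var.test(len ~ supp, data = ToothGrowth)
res.ftest
```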
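
Slide 8 (Paired t-test), sketched with R's built-in `sleep` data as a stand-in. The data cover 10 patients, identified by `ID`, and each has an `extra` sleep value under drug `"1"` and drug `"2"`. The pairs are lined up explicitly by patient ID. The vector interface is used because recent R versions reject `paired = TRUE` in the formula interface.

```r
# Stand-in data (our choice, not from the slides)
data(sleep)

# Pull each drug's rows and order them by patient ID so the pairs line up
g1 <- sleep[sleep$group == "1", ]
g2 <- sleep[sleep$group == "2", ]
x <- g1$extra[order(g1$ID)]
y <- g2$extra[order(g2$ID)]

res <- t.test(x, y, paired = TRUE, alternative = "two.sided")
res

# A paired t-test is a one-sample t-test on the within-pair differences
res_diff <- t.test(x - y)
```

The last line shows why the paired test works: it is a one-sample t-test on the within-patient differences, with n - 1 = 9 degrees of freedom.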
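
Slide 9 (Pearson's Chi-Squared Test), assuming the built-in `mtcars` data as a stand-in: `am` (transmission) and `vs` (engine shape) are two categorical variables. `chisq.test()` has no `data` argument, so the variables are passed either as two vectors or as a pre-built contingency table.

```r
# Stand-in data (our choice, not from the slides)
data(mtcars)

# Two vectors: chisq.test() converts them to factors and cross-tabulates them
res <- chisq.test(x = mtcars$am, y = mtcars$vs)

# Or build the contingency table first; the result is the same
tab <- table(mtcars$am, mtcars$vs)
res_tab <- chisq.test(tab)

res
res$expected  # expected counts; the chi-square approximation is doubtful if many are < 5
```

For a 2 x 2 table, `chisq.test()` applies Yates' continuity correction by default (`correct = TRUE`). When expected counts are small, the test-selection table on slide 4 lists Fisher's exact test (`fisher.test()`) as the alternative.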
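
Slide 10 (McNemar's Test) on made-up data. The `pre` and `post` vectors below are hypothetical answers from 60 people and do not come from the slides. The sketch also recomputes the statistic by hand to show that only the discordant pairs (people whose answer changed) enter the test.

```r
# Hypothetical paired yes/no answers for 60 people (illustration only)
lv <- c("no", "yes")
pre  <- factor(rep(c("yes", "no", "yes", "no"), times = c(25, 15, 5, 15)), levels = lv)
post <- factor(rep(c("yes", "yes", "no", "no"), times = c(25, 15, 5, 15)), levels = lv)

tab <- table(pre, post)
tab

res <- mcnemar.test(x = pre, y = post)
res

# Discordant cells: changed no -> yes, and changed yes -> no
n_no_yes <- tab["no", "yes"]
n_yes_no <- tab["yes", "no"]
manual <- (abs(n_no_yes - n_yes_no) - 1)^2 / (n_no_yes + n_yes_no)  # continuity-corrected
manual
```

Here 15 people changed from no to yes and 5 changed from yes to no. The continuity-corrected statistic is therefore (|15 - 5| - 1)^2 / 20 = 4.05, on 1 degree of freedom.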
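
Slide 12 (One-Way ANOVA), assuming the built-in `PlantGrowth` data as a stand-in. It compares plant `weight` across three groups (`ctrl`, `trt1`, `trt2`, with 10 plants each). The ANOVA table is pulled out as a data frame so its numbers can be reused.

```r
# Stand-in data (our choice, not from the slides)
data(PlantGrowth)

res.aov <- aov(weight ~ group, data = PlantGrowth)
summary(res.aov)

# summary() returns a list; its first element is the ANOVA table (a data frame)
tab <- summary(res.aov)[[1]]
tab[["F value"]][1]  # F = mean square (group) / mean square (residuals)
tab[["Pr(>F)"]][1]   # p-value for the group effect
```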
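
Slide 13 (Assumptions of ANOVA). The slide's `leveneTest()` comes from the add-on car package. If car is not installed, the same test can be sketched in base R. With car's default `center = median`, Levene's test is a one-way ANOVA on each observation's absolute deviation from its group median (the Brown-Forsythe variant). `PlantGrowth` is our stand-in data.

```r
# Stand-in data (our choice, not from the slides)
data(PlantGrowth)

# Absolute deviation of each observation from its own group's median
grp_median <- ave(PlantGrowth$weight, PlantGrowth$group, FUN = median)
lev_dat <- data.frame(abs_dev = abs(PlantGrowth$weight - grp_median),
                      group = PlantGrowth$group)

# One-way ANOVA on those deviations = median-centred Levene test
levene <- anova(lm(abs_dev ~ group, data = lev_dat))
levene

# If the variances look unequal: Welch's one-way test (var.equal = FALSE by default)
res_welch <- oneway.test(weight ~ group, data = PlantGrowth)
res_welch
```

If car is available, `car::leveneTest(weight ~ group, data = PlantGrowth)` should report the same F statistic.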
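
Slide 15 (Advanced ANOVA) describes two-way ANOVA without code. The sketch below is a minimal example, assuming `ToothGrowth` as stand-in data with two factors: supplement type (`supp`) and dose (`dose`, converted from numeric to a factor so that it acts as a grouping variable).

```r
# Stand-in data (our choice, not from the slides)
data(ToothGrowth)
tg <- transform(ToothGrowth, dose = factor(dose))  # 0.5, 1, 2 become factor levels

# supp * dose expands to both main effects plus their interaction
res.aov2 <- aov(len ~ supp * dose, data = tg)
aov_tab <- summary(res.aov2)[[1]]
aov_tab
```

This design is balanced (10 observations per supplement-dose combination). The sequential sums of squares that `summary()` reports therefore do not depend on the order of the terms in the formula.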
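
Slide 18's ranking example, reproduced with base R's `rank()`. By default, `rank()` gives tied values the average of the ranks they would occupy, which is the rule the slide describes.

```r
values <- c(121, 199, 123, 121, 120, 124, 122)

# ties.method = "average" is the default
ranks <- rank(values)
ranks
# returns 2.5 7.0 5.0 2.5 1.0 6.0 4.0

# The same ranks listed in ascending order of the values, as on the slide
data.frame(value = sort(values), rank = sort(ranks))
```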
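
Slide 20 (Mann-Whitney Test), assuming `ToothGrowth` as stand-in data. The sketch recomputes the W statistic that `wilcox.test()` reports from the pooled ranks, to show that the test is based on ranks rather than raw values.

```r
# Stand-in data (our choice, not from the slides)
data(ToothGrowth)
oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]

# exact = FALSE: the data contain ties, so use the normal approximation
res <- wilcox.test(oj, vc, exact = FALSE)
res

# W = (sum of the first group's ranks in the pooled sample) - n1 * (n1 + 1) / 2
r <- rank(c(oj, vc))
n1 <- length(oj)
W <- sum(r[seq_len(n1)]) - n1 * (n1 + 1) / 2
W
```

`wilcox.test(len ~ supp, data = ToothGrowth, exact = FALSE)` gives the same W, because `OJ` is the first level of `supp`.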
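
Slide 21 (Wilcoxon Signed Rank Test), assuming R's built-in `sleep` data as a stand-in: 10 patients, each measured under two drugs, with the pairs lined up by patient ID. The vector interface is used because recent R versions reject `paired = TRUE` with a formula.

```r
# Stand-in data (our choice, not from the slides)
data(sleep)
g1 <- sleep[sleep$group == "1", ]
g2 <- sleep[sleep$group == "2", ]
x <- g1$extra[order(g1$ID)]
y <- g2$extra[order(g2$ID)]

# exact = FALSE: the differences contain ties and a zero, so no exact p-value exists
res <- wilcox.test(x, y, paired = TRUE, exact = FALSE)
res
```

Every non-zero within-patient difference (drug 1 minus drug 2) is negative. The sum of the positive ranks is therefore V = 0, the most extreme value possible.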
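
Slide 22 (Kruskal Wallis H Test), assuming `ToothGrowth` as stand-in data. The second part checks the slide's claim that Kruskal-Wallis extends the two-sample Wilcoxon test. With only two groups, the two tests give the same p-value when the Wilcoxon test uses the tie-corrected normal approximation without continuity correction.

```r
# Stand-in data (our choice, not from the slides)
data(ToothGrowth)

# Kruskal-Wallis across the three dose levels
kw3 <- kruskal.test(len ~ factor(dose), data = ToothGrowth)
kw3

# Two groups only: Kruskal-Wallis H equals the squared Wilcoxon z statistic
kw <- kruskal.test(len ~ supp, data = ToothGrowth)
mw <- wilcox.test(len ~ supp, data = ToothGrowth, exact = FALSE, correct = FALSE)
c(kruskal = kw$p.value, wilcoxon = mw$p.value)
```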
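
Slide 24 (Pearson's Correlation), assuming the built-in `mtcars` data as a stand-in (`mpg` vs. `wt`). The sketch rebuilds the t statistic that `cor.test()` reports from r, using t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom.

```r
# Stand-in data (our choice, not from the slides)
data(mtcars)  # mpg = fuel economy, wt = weight (1000 lbs)

res <- cor.test(mtcars$mpg, mtcars$wt, method = "pearson")
res

# Rebuild the test statistic from the correlation coefficient
r <- cor(mtcars$mpg, mtcars$wt)
n <- nrow(mtcars)
t_manual <- r * sqrt(n - 2) / sqrt(1 - r^2)
t_manual
```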
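
Slide 26 (Spearman's Rank Test), assuming `mtcars$mpg` and `mtcars$wt` as stand-in variables. The sketch shows that Spearman's rho is Pearson's correlation applied to the ranks, which ties it back to the ranking procedure on slide 18.

```r
# Stand-in data (our choice, not from the slides)
data(mtcars)

# exact = FALSE: mpg and wt contain tied values, so no exact p-value is available
res <- cor.test(mtcars$mpg, mtcars$wt, method = "spearman", exact = FALSE)
res

# Spearman's rho = Pearson's r computed on the ranks
rho_ranks <- cor(rank(mtcars$mpg), rank(mtcars$wt))
rho_ranks
```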
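
Slide 27 (Kendall Rank Test), assuming `mtcars$mpg` and `mtcars$wt` as stand-in variables. `cor()` with `method = "kendall"` computes the same coefficient as `cor.test()`, but without the test.

```r
# Stand-in data (our choice, not from the slides)
data(mtcars)

# exact = FALSE: tied values rule out the exact p-value
res <- cor.test(mtcars$mpg, mtcars$wt, method = "kendall", exact = FALSE)
res

# The coefficient alone
tau <- cor(mtcars$mpg, mtcars$wt, method = "kendall")
tau
```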
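
A possible starting point for the slide 29 activity, using the `infert` data that ships with R (`case`: 1 = case, 0 = control). The choice of variables is illustrative only, and so is the Wilcoxon test for `spontaneous` (number of prior spontaneous abortions, coded 0, 1, 2 or more). The activity asks you to choose and justify each test yourself.

```r
data(infert)

# Descriptive statistics: overall sample
summary(infert[, c("age", "parity", "induced", "spontaneous")])

# Means for controls (case = 0) and cases (case = 1)
desc <- aggregate(cbind(age, parity) ~ case, data = infert, FUN = mean)
desc

# A categorical variable cross-tabulated against case status
tab <- with(infert, table(education, case))
tab

# One example comparison (illustrative choice): prior spontaneous abortions by case status
res <- wilcox.test(spontaneous ~ case, data = infert, exact = FALSE)
res
```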
