- 1. Week 4.1: Model Comparison • Lab: Interactions Practice • Model Comparison • Nested Models • Hypothesis Testing • REML vs ML • Non-Nested Models • Shrinkage • The Problem • Solutions
- 2. Interpreting Interactions • Numerical interaction term tells us how the interaction works: • Strengthens individual effects with the same sign as the interaction • Weakens individual effects with a different sign from the interaction • Or, again, just look at the graph ☺
- 3. Interpreting Interactions Practice • Dependent variable: Classroom learning • Independent variable 1: Intrinsic motivation • Learning because you want to learn (intrinsic) vs. to get a good grade (extrinsic) • Intrinsic motivation has a + effect on learning • Independent variable 2: Autonomy language • “You can…” (vs. “You must…”) • Also has a + effect on learning • Motivation x autonomy interaction is + • Interpretation: Combining intrinsic motivation and autonomy language especially benefits learning • “Synergistic” interaction Vansteenkiste et al., 2004, JPSP
- 4. Interpreting Interactions Practice • Dependent variable: Satisfaction with a consumer purchase • Number of choices: - effect on satisfaction • “Maximizing” strategy: - effect on satisfaction • Trying to find the best option vs. “good enough” • Choices x maximizing strategy is - • Interpretation: Having lots of choices when you’re a maximizer especially reduces satisfaction • Also a synergistic interaction (Carrillat, Ladik, & Legoux, 2011; Marketing Letters)
- 6. Model Formulae Practice • Write the R formula for each model: • 1) We’re interested in the effects of FamilySES, PriorNightSleep, and Nutrition on MathTest Performance, but we don’t expect them to interact • 2) We factorially manipulated SentenceType (active or passive) and Plausibility (low or high) in a test of TextComprehensionAccuracy
- 7. Model Formulae Practice • Write the R formula for each model: • 1) We’re interested in the effects of FamilySES, PriorNightSleep, and Nutrition on MathTest Performance, but we don’t expect them to interact • MathPerformance ~ 1 + SES + Sleep + Nutrition • 2) We factorially manipulated SentenceType (active or passive) and Plausibility (low or high) in a test of TextComprehensionAccuracy • ComprehensionAccuracy ~ 1 + SentenceType + Plausibility + SentenceType:Plausibility or ComprehensionAccuracy ~ 1 + SentenceType*Plausibility
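One way to convince yourself that the two factorial formulas above are equivalent is to ask R to expand them. This is a minimal sketch in base R; it uses the variable names from the practice problem and needs no data.

```r
# Expand each formula into its individual terms (base R, no data required)
f_long  <- ComprehensionAccuracy ~ 1 + SentenceType + Plausibility + SentenceType:Plausibility
f_short <- ComprehensionAccuracy ~ 1 + SentenceType * Plausibility

terms_long  <- attr(terms(f_long),  "term.labels")
terms_short <- attr(terms(f_short), "term.labels")

# Both formulas contain the two main effects plus their interaction
print(terms_short)
print(identical(terms_long, terms_short))
```

The `*` operator is shorthand for "both main effects and their interaction", so it is usually the more convenient way to write a full factorial design.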
- 8. Interpreting Interactions Practice • Second language proficiency: + effect on translation accuracy • Word frequency: + effect on accuracy • Frequency x proficiency interaction is - • Interpretation: Proficiency matters less when translating high-frequency words • Or: Difference between high- & low-frequency words gets smaller if you have high proficiency • “Antagonistic” interaction: combining the effects reduces or reverses the individual effects (e.g., Diependaele, Lemhöfer, & Brysbaert, 2012, QJEP)
- 9. Interpreting Interactions Practice • Retrieval practice: + effect on long-term learning • Working memory span: + effect on learning • Retrieval practice x WM span interaction is - (Agarwal et al., 2016) • Interpretation: Retrieval practice is especially beneficial for people with low working memory. • Or: Low WM confers less of a disadvantage if you do retrieval practice
- 10. Interpreting Interactions Practice • Affectionate touch: + effect on feeling of relationship security • Avoidant attachment style: - effect on security • Touch x avoidant attachment interaction is - • Interpretation: Affectionate touch enhances relationship security less for people with an avoidant attachment style (Jakubiak & Feeney, SPPS, 2016)
- 11. Interpreting Interactions Practice • Age: - effect on picture memory • Older adults have poorer memory • Emotional valence: - effect on accuracy • Positive pictures are not remembered as well compared to negative pictures • Age x Valence interaction is + • Interpretation: Age declines are smaller for positive pictures • Or: Disadvantage of positive pictures is not as strong for older adults (e.g., Mather & Carstensen, 2005, TiCS)
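The sign logic in these practice examples can be checked with a little arithmetic: the effect of one variable at a given level of the other is its own coefficient plus the interaction coefficient times that level. The sketch below uses the affectionate touch example, but the coefficient values are hypothetical and are not taken from the cited study.

```r
# Hypothetical coefficients (NOT estimates from Jakubiak & Feeney, 2016)
b_touch       <-  0.50   # effect of affectionate touch on security
b_avoid       <- -0.40   # effect of avoidant attachment on security
b_interaction <- -0.20   # touch x avoidant attachment interaction

# Effect of touch at a given avoidance level:
# main effect + interaction coefficient * avoidance level
touch_effect <- function(avoidance) b_touch + b_interaction * avoidance

avoidance_levels <- c(low = -1, average = 0, high = 1)   # z-score units
simple_slopes <- sapply(avoidance_levels, touch_effect)
print(simple_slopes)
```

With these made-up values the touch effect is 0.7 at low avoidance, 0.5 at average avoidance, and 0.3 at high avoidance: the negative interaction weakens the positive touch effect as avoidance increases.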
- 13. Model Comparison • Sometimes, we may have more than 1 model that we could consider applying to the data • 2 or more competing theoretical models • e.g., critical period in language acquisition • No critical period (Vanhove, 2013): 1 + AgeOfAcquisition • Critical period hypothesis (Hartshorne et al., 2020): 1 + AgeOfAcquisition*CriticalPeriod
- 14. Model Comparison • Sometimes, we may have more than 1 model that we could consider applying to the data • 2 or more competing theoretical models • Exploratory analysis where we don’t yet know which model would be appropriate
- 15. Dataset • Social support & health (e.g., Cohen & Wills, 1985) • lifeexpectancy.csv: • Longitudinal study of 1000 subjects – some siblings from same family, so 517 total families • Perceived social support (z-scored) • Lifespan • And several control variables
- 16. Nested Models • Three possible models of life expectancy: • Amount of weekly exercise • Amount of weekly exercise & perceived social support • Amount of weekly exercise, perceived social support, years of education, conscientiousness, yearly income, and number of vowels in your last name • These are nested models: each one can be formed by subtracting variables from the one below it (“nested inside it”)
- 18. Nested Models • Three possible models of life expectancy: • Amount of weekly exercise • Amount of weekly exercise & perceived social support • Amount of weekly exercise, perceived social support, years of education, conscientiousness, yearly income, and number of vowels in your last name • Which set of information would give us the most accurate fitted() values?
- 19. Nested Models • Three possible models of life expectancy: • Amount of weekly exercise • Amount of weekly exercise & perceived social support • Amount of weekly exercise, perceived social support, years of education, conscientiousness, yearly income, and number of vowels in your last name • The “biggest” nested model will always provide predictions that are at least as good for the data it was fit to • Adding info can only explain as much or more of the variance in that dataset
- 20. Nested Models • The “biggest” nested model will always provide predictions that are at least as good for the data it was fit to • Adding info can only explain as much or more of the variance in that dataset • Might not be much better (“number of vowels” effect zero or close to zero) but can’t be worse • Figure: The slope of the regression line relating last-name vowels to life expectancy is near 0, but that merely fails to improve predictions; it doesn’t hurt them
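The claim that a bigger nested model can never fit its own data worse is easy to see in a small simulation. This is a sketch under stated assumptions: the data are simulated (not lifeexpectancy.csv), the effect sizes are invented, and ordinary `lm()` is used instead of `lmer()` so that the example runs in base R.

```r
set.seed(1)
n <- 200

# Simulated stand-in data; column names and effect sizes are hypothetical
d <- data.frame(HrsExercise    = rnorm(n),
                SocSupport     = rnorm(n),
                LastNameVowels = rpois(n, 2))   # unrelated to lifespan by construction
d$Lifespan <- 75 + 2 * d$HrsExercise + 1 * d$SocSupport + rnorm(n, sd = 5)

m_small  <- lm(Lifespan ~ HrsExercise, data = d)
m_medium <- lm(Lifespan ~ HrsExercise + SocSupport, data = d)
m_big    <- lm(Lifespan ~ HrsExercise + SocSupport + LastNameVowels, data = d)

# Residual sum of squares on the fitting data: never increases as terms are added
rss <- c(small = deviance(m_small), medium = deviance(m_medium), big = deviance(m_big))
print(rss)
```

The irrelevant predictor lowers the residual sum of squares by a tiny amount at best, which is why a better in-sample fit alone cannot tell us whether a variable belongs in the model.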
- 22. Hypothesis Testing • Let’s think about our first two models: • model1: $E(Y_{i(j)}) = \gamma_{00} + \gamma_{10}\,\text{HrsExercise} + \gamma_{20}\,\text{SocSupport}$ • model2: $E(Y_{i(j)}) = \gamma_{00} + \gamma_{10}\,\text{HrsExercise}$ • Comparing these two statistical models closely relates to our research question: Which theoretical model best explains the data? • The theoretical model where social support does affect life expectancy (model1) • The model where social support doesn’t affect life expectancy (model2)
- 23. Hypothesis Testing • Let’s think about our first two models: • model1: $E(Y_{i(j)}) = \gamma_{00} + \gamma_{10}\,\text{HrsExercise} + \gamma_{20}\,\text{SocSupport}$ • model2: $E(Y_{i(j)}) = \gamma_{00} + \gamma_{10}\,\text{HrsExercise}$ • What are some possible values of $\gamma_{20}$ (the SocSupport effect) in model1? • 3.83 • -1.04 • 0 – there is no social support effect
- 24. Hypothesis Testing • Let’s think about our first two models: • model1: $E(Y_{i(j)}) = \gamma_{00} + \gamma_{10}\,\text{HrsExercise} + \gamma_{20}\,\text{SocSupport}$ • model2: $E(Y_{i(j)}) = \gamma_{00} + \gamma_{10}\,\text{HrsExercise}$ • What happens when $\gamma_{20}$ is equal to 0? • Anything multiplied by 0 is 0, so SocSupport just drops out of the equation: $E(Y_{i(j)}) = \gamma_{00} + \gamma_{10}\,\text{HrsExercise} + 0 \cdot \text{SocSupport}$ • Becomes the same thing as model2
- 25. Hypothesis Testing • Let’s think about our first two models: • model1: $E(Y_{i(j)}) = \gamma_{00} + \gamma_{10}\,\text{HrsExercise} + \gamma_{20}\,\text{SocSupport}$ • model2: $E(Y_{i(j)}) = \gamma_{00} + \gamma_{10}\,\text{HrsExercise}$ • model2 is just a special case of model1 • The version of model1 where $\gamma_{20}$ happens to be 0: $E(Y_{i(j)}) = \gamma_{00} + \gamma_{10}\,\text{HrsExercise} + 0 \cdot \text{SocSupport}$ • One of many possible versions of model1 • Why we say model2 is “nested” in model1
- 26. Hypothesis Testing • Let’s think about our first two models: • model1: $E(Y_{i(j)}) = \gamma_{00} + \gamma_{10}\,\text{HrsExercise} + \gamma_{20}\,\text{SocSupport}$ • model2: $E(Y_{i(j)}) = \gamma_{00} + \gamma_{10}\,\text{HrsExercise}$ • This also helps show why model1 always fits as well as model2 or better • model1 can account for the case where $\gamma_{20} = 0$: $E(Y_{i(j)}) = \gamma_{00} + \gamma_{10}\,\text{HrsExercise} + 0 \cdot \text{SocSupport}$ • But it can also account for many other cases, too
- 27. Likelihood Ratio Test • We can compare nested models (only) using the likelihood-ratio test • Remember that likelihood is what we search for in fitting an individual model (find the values with the highest likelihood) • First, fit each of the models to be compared • model1 <- lmer(Lifespan ~ 1 + HrsExercise + SocSupport + (1|Family), data=lifeexpectancy) • model2 <- lmer(Lifespan ~ 1 + HrsExercise + (1|Family), data=lifeexpectancy)
- 28. Likelihood Ratio Test • Then, compare them with anova(): • anova(model1, model2) • Order doesn’t matter • Twice the difference in log likelihoods is (approximately) distributed as a chi-square • d.f. = # of parameters added or removed • Here, χ2(1) = 8.67, p = .003 • Callouts on the anova() output: • Log likelihood will always be at least as high (better) for the more complex model … but is it SIGNIFICANTLY better? • We’ll discuss what this means in a moment (don’t worry; it’s what we want)
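To see what anova() is doing, the likelihood ratio test can be computed by hand. The sketch below assumes the lme4 package is installed. Because lifeexpectancy.csv is not available here, it simulates a small stand-in dataset with the same column names; every numeric value in the simulation is hypothetical, so the result will not match the χ2 reported on the slide.

```r
library(lme4)

set.seed(42)
# Simulated stand-in for lifeexpectancy.csv; all values are hypothetical
n_fam <- 100
fam   <- rep(seq_len(n_fam), each = 2)          # two siblings per family
d <- data.frame(Family      = factor(fam),
                HrsExercise = rnorm(2 * n_fam),
                SocSupport  = rnorm(2 * n_fam))
fam_effect <- rnorm(n_fam, sd = 3)              # family-level random intercepts
d$Lifespan <- 75 + 1.5 * d$HrsExercise + 1 * d$SocSupport +
  fam_effect[fam] + rnorm(2 * n_fam, sd = 4)

# Fit both models with ML so that their likelihoods are comparable
model1 <- lmer(Lifespan ~ 1 + HrsExercise + SocSupport + (1 | Family),
               data = d, REML = FALSE)
model2 <- lmer(Lifespan ~ 1 + HrsExercise + (1 | Family),
               data = d, REML = FALSE)

# Likelihood ratio statistic: twice the difference in log likelihoods
ll1 <- logLik(model1)
ll2 <- logLik(model2)
chisq   <- 2 * (as.numeric(ll1) - as.numeric(ll2))
df_diff <- attr(ll1, "df") - attr(ll2, "df")    # parameters added
p_value <- pchisq(chisq, df = df_diff, lower.tail = FALSE)
print(c(chisq = chisq, df = df_diff, p = p_value))

# anova() should report the same chi-square test
print(anova(model1, model2))
```

The hand computation and the anova() table should agree, which shows that the test is nothing more than a chi-square lookup on the change in log likelihood.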
- 29. Likelihood Ratio Test • t-test and LR test are very similar! • t-test: Tests whether an effect differs from 0, based on this model • Likelihood ratio: Compare to a model where the effect actually IS constrained to be 0 • With an infinitely large sample, these two tests would produce identical conclusions • With a small sample, the t-test is less likely to detect spurious differences (Luke, 2017) • But large discrepancies between the two tests are uncommon
- 30. Likelihood Ratio Test • t-test and LR test are very similar! • t-test: Tests whether an effect differs from 0, based on this model • Likelihood ratio: Compare to a model where the effect actually IS constrained to be 0 • p-value from likelihood ratio test: .0032 • p-value from lmerTest t-test: .0033
- 31. Likelihood Ratio Test • t-test and LR test are very similar! • t-test: Tests whether an effect differs from 0, based on this model • Likelihood ratio: Compare to a model where the effect actually IS constrained to be 0 • Guidance: • LR test is useful for testing groups of variables • model1 <- lmer(Lifespan ~ 1 + HrsExercise …) • model3 <- lmer(Lifespan ~ 1 + HrsExercise + SocSupport + YrsEducation + Conscientiousness …) • If testing just one variable at a time, use the t-test: it is slightly less likely to produce a Type I error
- 33. REML vs ML • Technically, two different algorithms that R can use “behind the scenes” to get the estimates • REML: Restricted Maximum Likelihood • Assumes the fixed effects structure is correct • Bad for comparing models that differ in fixed effects • ML: Maximum Likelihood • OK for comparing models • But, may underestimate variance of random effects • Ideal: ML for model comparison, REML for final results • lme4 does this automatically for you! • Defaults to REML, but automatically refits models with ML when you do a likelihood ratio test
- 34. REML vs ML • The one time you might want to mess with this: • If you are going to be doing a lot of model comparisons, can fit the model with ML to begin with • model1 <- lmer(DV ~ 1 + Predictors + (1|Family), data=lifeexpectancy, REML=FALSE) • Saves refitting for each comparison • Remember to refit the model with REML=TRUE for your final results
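The ML-first, REML-last workflow might look like the sketch below. It assumes lme4 is installed and uses the `sleepstudy` dataset that ships with lme4 as a stand-in for the course data, so the variable names differ from the slides.

```r
library(lme4)

# sleepstudy ships with lme4; it stands in for the course dataset here
model_ml <- lmer(Reaction ~ 1 + Days + (1 | Subject),
                 data = sleepstudy, REML = FALSE)
print(isREML(model_ml))      # FALSE: this fit used ML

# ... run as many model comparisons as needed on the ML fits ...

# Refit the chosen model with REML for the final reported estimates
model_final <- update(model_ml, REML = TRUE)
print(isREML(model_final))   # TRUE: this fit used REML
```

Using `update()` keeps the formula and data identical between the two fits, so only the estimation method changes.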
- 36. Non-Nested Models • Which of these pairs is not a case of nested models? • A • Accuracy ~ SentenceType + Aphasia + SentenceType:Aphasia • Accuracy ~ SentenceType + Aphasia • B • MathAchievement ~ SocioeconomicStatus • MathAchievement ~ TeacherRating + ClassSize • C • Recall ~ StudyTime • Recall ~ StudyTime + StudyStrategy
- 37. Non-Nested Models • Which of these pairs is not a case of nested models? • A • Accuracy ~ SentenceType + Aphasia + SentenceType:Aphasia • Accuracy ~ SentenceType + Aphasia • B • MathAchievement ~ SocioeconomicStatus • MathAchievement ~ TeacherRating + ClassSize • Each of these models has something that the other doesn’t have.
- 38. Non-Nested Models • Models that aren’t nested can’t be tested the same way • A non-nested comparison: • 1st model: $E(Y_{i(j)}) = \gamma_{00} + 0 \cdot \text{YrsEducation} + \gamma_{20}\,\text{IncomeThousands}$ • 2nd model: $E(Y_{i(j)}) = \gamma_{00} + \gamma_{10}\,\text{YrsEducation} + 0 \cdot \text{IncomeThousands}$ • What would support 1st model over 2nd? • $\gamma_{20}$ is significantly greater than 0, but also $\gamma_{10}$ is 0 • But remember we can’t test that something is 0 with frequentist statistics … can’t prove the H0 is true • Parametric statistics don’t apply here
- 39. Non-Nested Models: Comparison • Can be compared with information criteria • Remember our fitted values from last week? • fitted(model2) • What if we replaced all of our observations with just the fitted (predicted) values? • We’d be losing some information • However, if the model predicted the data well, we would not be losing that much • Information criteria measure how much information is lost with the fitted values (so, lower is better)
- 40. Non-Nested Models: Comparison • AIC: An Information Criterion or Akaike’s Information Criterion • -2(log likelihood) + 2k • k = # of fixed and random effects in a particular model • A model with a lower AIC is better Akaike, 1974
- 41. Non-Nested Models: Comparison • AIC: An Information Criterion or Akaike’s Information Criterion • -2(log likelihood) + 2k • k = # of fixed and random effects in a particular model • A model with a lower AIC is better • Doesn’t assume any of the models is correct • Appropriate for correlational / non-experimental data • BIC: Bayesian Information Criterion • -2(log likelihood) + log(n)k • k = # of fixed & random effects, n = num. observations • A model with a lower BIC is better • Typically prefers simpler models than AIC • Assumes that there’s a “true” underlying model in the set of variables being considered • Appropriate for experimental data Yang, 2005; Oehlert, 2012
- 42. Non-Nested Models: Comparison • Can also get these from anova(model1, model2) • Just ignore the chi-square if non-nested models • AIC and BIC do not have a significance test associated with them • The model with the lower AIC/BIC is preferred, but we don’t know how reliable this preference is
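The AIC and BIC formulas can be applied by hand to any fitted model that reports a log likelihood. This sketch compares two non-nested models on simulated data; the data, effect sizes, and the helper name `info_criteria` are all hypothetical, and `lm()` is used instead of `lmer()` so the example runs in base R. For `lm()`, k counts the regression coefficients plus the residual variance.

```r
set.seed(7)
n <- 150

# Simulated stand-in data; values are hypothetical
d <- data.frame(YrsEducation = rnorm(n, mean = 14, sd = 2))
d$IncomeThousands <- 20 + 3 * d$YrsEducation + rnorm(n, sd = 10)
d$Lifespan <- 60 + 0.8 * d$YrsEducation + 0.05 * d$IncomeThousands + rnorm(n, sd = 5)

# Two non-nested models: each has a predictor the other lacks
m_edu    <- lm(Lifespan ~ YrsEducation, data = d)
m_income <- lm(Lifespan ~ IncomeThousands, data = d)

# Hypothetical helper: apply the AIC and BIC formulas directly
info_criteria <- function(fit) {
  ll    <- logLik(fit)
  k     <- attr(ll, "df")   # number of estimated parameters
  n_obs <- nobs(fit)
  c(AIC = -2 * as.numeric(ll) + 2 * k,
    BIC = -2 * as.numeric(ll) + log(n_obs) * k)
}

by_hand <- rbind(education = info_criteria(m_edu),
                 income    = info_criteria(m_income))
print(by_hand)

# R's built-in functions should give the same numbers
print(c(AIC(m_edu), BIC(m_edu)))
```

Because log(n) exceeds 2 once n is larger than about 7, BIC penalizes each extra parameter more heavily than AIC in any realistic dataset, which is why it tends to prefer simpler models.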
- 44. Shrinkage • The “Madden curse”… • Each year, a top NFL football player is picked to appear on the cover of the Madden NFL video game • That player often doesn’t play as well in the following season • Is the cover “cursed”?
- 46. Shrinkage • What’s needed to be one of the top NFL players in a season? • You have to be a good player • Genuine predictor (signal) • And, luck on your side • Random chance or error • Top-performing player probably very good and very lucky • The next season… • Your skill may persist • Random chance probably won’t • Regression to the mean • Madden video game cover imperfectly predicts next season’s performance because it was partly based on random error
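Regression to the mean can be reproduced with a toy simulation in which performance is simply skill plus luck. This is a sketch with made-up numbers (500 players, equal amounts of skill and luck), not a model of real NFL data.

```r
set.seed(2024)
n_players <- 500
n_years   <- 200   # repeat the "cover pick" many times to average out noise

picks <- replicate(n_years, {
  skill   <- rnorm(n_players)            # persists across seasons
  season1 <- skill + rnorm(n_players)    # skill + this season's luck
  season2 <- skill + rnorm(n_players)    # same skill, fresh luck
  star    <- which.max(season1)          # the top performer gets the cover
  c(season1 = season1[star], season2 = season2[star])
})

# Average performance of the cover player in the cover season vs. the next one
cover_means <- rowMeans(picks)
print(cover_means)
```

The cover player is still well above average the following season (the skill persists) but clearly below the season that earned the cover (the luck does not), and no curse is needed to produce the drop.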
- 47. Shrinkage • Our estimates (& any choice of variables based on them) always partially reflect random chance in the dataset we used to obtain them • Won’t fit any later data set quite as well … shrinkage • Problem when we’re using the data to decide the model
- 48. Shrinkage • Our estimates (& any choice of variables based on them) always partially reflect random chance in the dataset we used to obtain them • Won’t fit any later data set quite as well … shrinkage • “If you use a sample to construct a model, or to choose a hypothesis to test, you cannot make a rigorous scientific test of the model or the hypothesis using that same sample data.” (Babyak, 2004, p. 414)
- 49. Shrinkage—Examples • Relations that we observe between a predictor variable and a dependent variable might simply be capitalizing on random chance • U.S. government puts out 45,000 economic statistics each year (Silver, 2012) • Can we use these to predict whether US economy will go into recession? • With 45,000 predictors, we are very likely to find a spurious relation by chance • Especially w/ only 15 recessions since the end of WW II
- 50. Shrinkage: Examples • Relations that we observe between a predictor variable and a dependent variable might simply be capitalizing on random chance • U.S. government puts out 45,000 economic statistics each year (Silver, 2012) • Can we use these to predict whether US economy will go into recession? • With 45,000 predictors, we are very likely to find a spurious relation by chance • Significance tests try to address this … but with 45,000 predictors, we are likely to find significant effects by chance (5% Type I error rate at α=.05)
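The 5% Type I error rate is easy to demonstrate by correlating pure noise with pure noise. In this sketch the number of observations (75) is a hypothetical choice, and the number of predictors is scaled down from 45,000 to 2,000 only to keep the run short.

```r
set.seed(123)
n_obs        <- 75     # hypothetical number of observations
n_predictors <- 2000   # scaled down from 45,000 to keep the run short

outcome <- rnorm(n_obs)   # pure noise: no predictor is truly related to it

# p-value for the correlation between each random predictor and the outcome
p_values <- replicate(n_predictors, cor.test(rnorm(n_obs), outcome)$p.value)

prop_significant <- mean(p_values < .05)
print(sum(p_values < .05))   # number of "significant" predictors found by chance
print(prop_significant)      # should be close to .05

# Expected number of false positives with 45,000 unrelated predictors
print(45000 * .05)           # 2250
```

Even though none of the predictors is related to the outcome, roughly one in twenty passes the significance test, so a search through 45,000 statistics would be expected to turn up about 2,250 spurious "findings".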
- 51. Shrinkage: Examples • Adak Island, Alaska • Daily temperature here predicts stock market activity! • r = -.87 correlation with the price of a specific group of stocks! • Completely true; I’m not making this up! • Problem with this: • With thousands of weather stations & stocks, easy to find a strong correlation somewhere, even if it’s just sampling error • Problem is that this factoid doesn’t reveal all of the other (non-significant) weather stations & stocks we searched through • Would only be impressive if this hypothesis continued to be true on a new set of weather data & stock prices (Vul et al., 2009)
- 52. Shrinkage—Examples • “Puzzlingly high correlations” in some fMRI work • Correlate each voxel in a brain scan with a behavioral measure (e.g., personality survey) • Restrict the analysis to voxels where the correlation is above some threshold • Compute final correlation in this region with behavioral measure—very high! • Problem: Voxels were already chosen based on those high correlations • Includes sampling error favoring the correlation but excludes error that doesn’t Vul et al., 2009
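The circularity problem can be illustrated with data that contain no true effect at all. This sketch is a deliberate simplification: the "voxels" are independent random numbers, the sample size and threshold are hypothetical, and it averages the per-voxel correlations rather than correlating a region average.

```r
set.seed(99)
n_subjects <- 20
n_voxels   <- 5000

# Pure noise: no voxel is truly related to the behavioral measure
behavior <- rnorm(n_subjects)
voxels   <- matrix(rnorm(n_subjects * n_voxels), nrow = n_subjects)

# Step 1: correlate every voxel with behavior and keep those above a threshold
r_all    <- as.vector(cor(voxels, behavior))
selected <- which(r_all > 0.5)

# Step 2 (circular): summarize the correlation in the voxels we just selected
r_circular <- mean(r_all[selected])

# Step 2 (independent): same voxels, but fresh data not used for selection
behavior_new  <- rnorm(n_subjects)
voxels_new    <- matrix(rnorm(n_subjects * n_voxels), nrow = n_subjects)
r_independent <- mean(cor(voxels_new[, selected, drop = FALSE], behavior_new))

print(c(n_selected = length(selected),
        circular = r_circular, independent = r_independent))
```

The circular estimate is above 0.5 by construction, because only voxels that cleared the threshold were kept, while the same voxels show a correlation near zero in data that played no role in selecting them.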
- 54. Shrinkage—Solutions • One solution: Select model(s) in advance (perhaps even pre-registered) • A theory is valuable for this • Adak Island example is implausible in part because there’s no causal reason why an island in Alaska would relate to stock prices “Just as you do not need to know exactly how a car engine works in order to drive safely, you do not need to understand all the intricacies of the economy to accurately read those gauges.” – Economic forecasting firm ECRI (quoted in Silver, 2012)
- 55. Shrinkage—Solutions • One solution: Select model(s) in advance (perhaps even pre-registered) • A theory is valuable for this • Not driven purely by the data or by chance if we have an a priori reason to favor this variable “There is really nothing so practical as a good theory.” -- Social psychologist Kurt Lewin (Lewin’s Maxim)
- 56. Shrinkage—Solutions • One solution: Select model(s) in advance (perhaps even pre-registered) • A theory is valuable for this • Not driven purely by the data or by chance if we have an a priori reason to favor this variable • Based on some other measure (e.g., another brain scan)
- 57. Shrinkage—Solutions • One solution: Select model(s) in advance (perhaps even pre-registered) • A theory is valuable for this • Not driven purely by the data or by chance if we have an a priori reason to favor this variable • Based on some other measure (e.g., another brain scan) • Based on research design • For factorial experiments, typical to include all experimental variables and interactions • Research design implies you were interested in all of these
- 58. Shrinkage: Solutions • For more exploratory analyses: Show that the finding replicates • On a second dataset • Test whether a model obtained from one subset of the data applies to another subset (cross-validation) • e.g., training and test sets • A better version: Do this with many randomly chosen subsets • Monte Carlo methods • Reading on Canvas for some general ways to do this in R
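Cross-validation with many random training and test splits could be sketched as follows. This is one possible approach, not necessarily the one in the assigned reading: the data are simulated, the 70/30 split and 200 repetitions are arbitrary choices, and `lm()` is used so the example runs in base R. Only HrsExercise has a real effect; the other 20 predictors are noise.

```r
set.seed(10)
n       <- 100
n_noise <- 20

# Simulated data: one real predictor plus 20 pure-noise predictors (X1 ... X20)
d <- data.frame(HrsExercise = rnorm(n), matrix(rnorm(n * n_noise), nrow = n))
d$Lifespan <- 75 + 2 * d$HrsExercise + rnorm(n, sd = 5)

# Root mean squared prediction error of a fitted model on a given dataset
rmse <- function(fit, data) {
  sqrt(mean((data$Lifespan - predict(fit, newdata = data))^2))
}

one_split <- function() {
  train_rows <- sample(n, size = 70)   # random 70/30 split
  train <- d[train_rows, ]
  test  <- d[-train_rows, ]
  simple   <- lm(Lifespan ~ HrsExercise, data = train)
  complex  <- lm(Lifespan ~ ., data = train)   # all 21 predictors
  c(simple_train  = rmse(simple, train),  simple_test  = rmse(simple, test),
    complex_train = rmse(complex, train), complex_test = rmse(complex, test))
}

# Repeat over many random splits and average the results
results    <- replicate(200, one_split())
mean_error <- rowMeans(results)
print(round(mean_error, 2))
```

The complex model looks better on the data it was trained on but does worse on held-out data, which is shrinkage in action; the held-out error is the more honest guide to which model will hold up on new data.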