Mixed Effects Models - Random Slopes


  1. Week 5.2: Random Slopes
     • Random Slopes: Recap of Random Intercepts; Introduction to Random Slopes; Notation; Implementation; Testing Random Effects
     • Model Convergence: Failures to Converge; Solutions; Scaling
  2. Random Intercepts Recap
     • Let's do an intervention: hours of use of math tutoring software
     • tutor.csv: 25 students in each of 10 classrooms in each of 20 schools
     [Figure: sampling hierarchy — Level 3: sampled schools (School 1, School 2, …); Level 2: sampled classrooms (Mr. Wagner's, Ms. Fulton's, Ms. Green's, Ms. Cornell's, …); Level 1: sampled students]
  3. Random Intercepts Recap
     • Let's do an intervention: hours of use of math tutoring software (TutorHours)
     • Which level(s) of the model could this variable be at?
     [Figure: sampling hierarchy]
  4. Random Intercepts Recap
     • TutorHours could be at Level 3 if use of the tutor characterizes a whole school.
     [Figure: sampling hierarchy]
  5. Random Intercepts Recap
     • TutorHours could be at Level 2 if classrooms within a school vary in tutor use, but use is consistent within a classroom.
     [Figure: sampling hierarchy]
  6. Random Intercepts Recap
     • TutorHours could be at Level 1 if students within a classroom vary in their tutor usage.
     [Figure: sampling hierarchy]
  7. Random Intercepts Recap
     • Let's check this in R:
     • tutor %>% group_by(School, TutorHours) %>% summarize(Frequency=n())
     [Figure: sampling hierarchy]
  8. Random Intercepts Recap
     • tutor %>% group_by(ClassroomID, TutorHours) %>% summarize(Frequency=n())
     • Here, classrooms within a school vary in tutor use, but use is consistent within a classroom — so TutorHours is a Level-2 (classroom) variable. A runnable version of this check is sketched below.
     [Figure: sampling hierarchy]
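     Note: a minimal, runnable sketch of the check above — assuming tutor.csv has the columns named in the slides (School, ClassroomID, TutorHours) and is read in with readr; the helper column DistinctHours is just a name chosen here:

       library(readr)
       library(dplyr)

       tutor <- read_csv("tutor.csv")

       # If TutorHours is a classroom-level (Level 2) variable, every classroom
       # should contribute exactly one distinct value:
       tutor %>%
         group_by(School, ClassroomID) %>%
         summarize(DistinctHours = n_distinct(TutorHours), .groups = "drop")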
  9. Nested Random Effects Recap
     • Let's run a model to look at the effect of TutorHours on FinalMathScore, while accounting for clustering in School and ClassroomID:
     • model.Intercepts <- lmer(FinalMathScore ~ 1 + TutorHours + ???????? + ????????, data=tutor)
     [Figure: sampling hierarchy]
  10. Nested Random Effects Recap
     • model.Intercepts <- lmer(FinalMathScore ~ 1 + TutorHours + (1|ClassroomID) + (1|School), data=tutor)
     • A runnable sketch follows below.
     [Figure: sampling hierarchy]
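     Note: a runnable sketch of this random-intercepts model, assuming the tutor data frame loaded above; lmerTest is optional and only adds p-values to the fixed-effects table:

       library(lme4)
       library(lmerTest)   # optional: p-values for fixed effects

       model.Intercepts <- lmer(FinalMathScore ~ 1 + TutorHours +
                                  (1 | ClassroomID) + (1 | School),
                                data = tutor)
       summary(model.Intercepts)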
  11. Nested Random Effects Recap
     • Why didn't we include a random intercept for Student?
     • Each child was observed only once, so there is no clustering of observations within students.
     [Figure: sampling hierarchy]
  12. Nested Random Effects Recap
     • In this dataset, classrooms within a school vary in tutor use, but use is consistent within a classroom.
     [Figure: sampling hierarchy]
  13. Nested Random Effects Recap
     • Which design do you think is better?
     • Teachers are allowed to choose how much they use the tutor in their classroom, or
     • We randomly assign a specific number of tutor hours to each classroom.
     [Figure: sampling hierarchy]
  14. Nested Random Effects Recap
     • Random assignment allows stronger claims about causality.
     • But the model is implemented the same either way; the design just affects how we interpret the results.
     [Figure: sampling hierarchy]
  15. Week 5.2: Random Slopes (outline repeated; next section: Introduction to Random Slopes)
  16. Introduction to Random Slopes
     • Right now, our assumption is that schools differ in their baseline math score (intercept).
     • What other ways might schools differ?
     • Some schools might use the tutor more effectively than others.
     • Fidelity of implementation is a major concern in research on educational and other interventions!
  17. Introduction to Random Slopes
     • Some schools might use the tutor more effectively than others.
     • High fidelity: mastery-based learning of specific skills. Low fidelity: assigning the same problems to everyone.
  18. Introduction to Random Slopes
     • There is an overall relationship between a classroom's use of the tutor and its average math score.
     • The current assumption is that the slope of this line (the tutor effect) is the same in every school.
     [Figure: scatterplot of hours that a classroom used the tutor (x-axis, roughly 10–35) against average final math score (y-axis, roughly 50–100), with a single fitted line]
  19. Random Slopes
     • So far, our model says that schools vary in baseline math score (the random intercept),
     • and that every 1 hour of tutor use ≈ 0.49 points gained in final math score (the slope is a fixed effect).
     [Figure: per-school panels (Highland, Midland, Crescent) of tutor hours vs. average final math score — different intercepts for each school; the assumption is that the slope is the same for all schools]
  20. Random Slopes
     • Schools may also vary in how much the tutor affects their classrooms' scores: a random slope of tutor usage by school.
     • Such differences may correlate with the baseline differences.
     • A plotting sketch of this kind of per-school picture follows below.
     [Figure: per-school panels (Highland, Midland, Crescent) — schools still vary in intercept, but slopes now also differ across sampled schools; schools that start off worse show a bigger benefit of the tutor]
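     Note: a hedged ggplot2 sketch of this kind of per-school plot — one fitted line of math score on tutor hours per school panel (variable names as in the slides; this is an illustration, not the course's own plotting code):

       library(ggplot2)

       ggplot(tutor, aes(x = TutorHours, y = FinalMathScore)) +
         geom_point(alpha = 0.3) +
         geom_smooth(method = "lm", se = FALSE) +  # a separate slope in each panel
         facet_wrap(~ School) +
         labs(x = "Hours that a classroom used the tutor",
              y = "Final math score")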
  21. Random Slopes: Statistical Consequences
     • It is important to capture this variability in the model.
     • Observations from the same school will show more similar effects of tutor use — another kind of clustering!
     • If we don't account for this similarity, the non-independence leads us to underestimate the true variability (the standard error). Since t = Estimate / Std. error, an underestimated standard error inflates t and the false positive rate.
     [Figure: example classrooms — Highland School (typically high fidelity): Classrooms 21, 22, 29 at 27, 28, 25 hrs; Midland School (typically low fidelity): Classrooms 30, 97, 98 at 21, 27, 28 hrs]
  22. Week 5.2: Random Slopes (outline repeated; next section: Notation)
  23. Notation
     • Level-1 model of student i:  Yi(jk) = β0jk + γ100·x1j(k) + Ei(jk)
       (end-of-year math exam score = baseline for students in this classroom + tutor-hours effect + student error)
     • Level-2 model of classroom j:  β0jk = γ000 + U0j(k)
       (classroom intercept = overall baseline + teacher effect for this classroom [error])
  24. Notation
     • Level-1 model of student i:  Yi(jk) = β0jk + γ100·x1j(k) + Ei(jk)
     • Level-2 model of classroom j:  β0jk = δ00k + U0j(k)
       (classroom intercept = baseline for classrooms in this school + teacher effect for this classroom [error])
     • Level-3 model of school k:  δ00k = γ000 + V00k
       (school intercept = overall baseline + school effect for this school [error])
  25. Notation
     • Same three-level model as slide 24.
     • Right now, the tutor slope γ100 is still just a fixed value. Maybe, like the intercept, we should allow it to vary across schools.
  26. Notation
     • Level-1 model of student i:  Yi(jk) = β0jk + δ10k·x1j(k) + Ei(jk)
       (the tutor-hours slope is now a school-level coefficient δ10k rather than the fixed γ100)
     • Level-2 model of classroom j:  β0jk = δ00k + U0j(k)
     • Level-3 model of school k:
       δ00k = γ000 + V00k   (school intercept = overall baseline + school effect [error])
       δ10k = γ100 + V10k   (tutor slope = overall tutor slope [fixed effect] + by-school adjustment for the tutor slope [error])
  27. Notation
     • Same model as slide 26. Let's start doing our algebraic substitution.
  28. Notation
     • (Same as slide 27.)
  29. Notation
     • Substituting the school-level intercept equation into the classroom-level model:
     • Level-2 model of classroom j:  β0jk = γ000 + V00k + U0j(k)
       (overall baseline + by-school adjustment for intercept [error] + teacher effect for this classroom [error])
     • Level-1 model and the slope equation are unchanged:  Yi(jk) = β0jk + δ10k·x1j(k) + Ei(jk),  with δ10k = γ100 + V10k
  30. Notation
     • (Same as slide 29.)
  31. Notation
     • Now substituting the school-level slope equation into Level 1:
     • Level-1 model of student i:  Yi(jk) = β0jk + (γ100 + V10k)·x1j(k) + Ei(jk)
       (the tutor-hours effect is the overall slope plus the by-school adjustment)
     • Level-2 model of classroom j:  β0jk = γ000 + V00k + U0j(k)
  32. Notation
     • (Same as slide 31.)
  33. Notation
     • Mixed-effects model of student i (all levels substituted in):
       Yi(jk) = γ000 + V00k + U0j(k) + (γ100 + V10k)·x1j(k) + Ei(jk)
       (overall baseline + by-school adjustment for intercept [error] + teacher effect for this classroom [error] + tutor-hours effect [overall & by school] + student error)
  34. Notation
     • Same equation as slide 33. Now apply the distributive property to (γ100 + V10k)·x1j(k).
  35. Notation
     • Yi(jk) = γ000 + γ100·x1j(k)  +  V00k + V10k·x1j(k) + U0j(k) + Ei(jk)
       FIXED: γ000 (overall baseline), γ100·x1j(k) (overall fixed effect of tutor hours)
       RANDOM: V00k (by-school adjustment for intercept [error]), V10k·x1j(k) (by-school adjustment for tutor slope [error]), U0j(k) (teacher effect for this classroom [error]), Ei(jk) (student error)
  36. Notation: Covariance Matrix
     • Because we now have more than one random variable per school, they can also covary.
     • Covariance matrix for the school random effects in the maximal model:
       [ σ²(V00)         cov(V00, V10) ]
       [ cov(V00, V10)   σ²(V10)       ]
       σ²(V00) = variance of the school intercept; σ²(V10) = variance of the by-school TutorHours slope; cov(V00, V10) = the correlation parameter.
     • In the near-maximal model without the correlation parameter, the off-diagonal entries are fixed to 0:
       [ σ²(V00)   0        ]
       [ 0         σ²(V10)  ]
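     Note: the same assumption written as a distributional statement, in LaTeX (symbols follow the notation slides above; the off-diagonal covariance is what the "correlation parameter" estimates, and the near-maximal model fixes it to 0):

       \begin{pmatrix} V_{00k} \\ V_{10k} \end{pmatrix}
         \sim \mathcal{N}\!\left(
           \begin{pmatrix} 0 \\ 0 \end{pmatrix},
           \begin{pmatrix}
             \sigma^{2}_{V_{00}} & \operatorname{cov}(V_{00}, V_{10}) \\
             \operatorname{cov}(V_{00}, V_{10}) & \sigma^{2}_{V_{10}}
           \end{pmatrix}
         \right)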
  37. Week 5.2: Random Slopes (outline repeated; next section: Implementation)
  38. Random Slopes: Implementation
     • Recall the mixed-effects equation:  Yi(jk) = γ000 + γ100·x1j(k) [FIXED] + V00k + V10k·x1j(k) + U0j(k) + Ei(jk) [RANDOM]
     • So, the lmer model. First, we need the overall (fixed) effects of the intercept and TutorHours:
     • model.Slope <- lmer(FinalMathScore ~
  39. Random Slopes: Implementation
     • model.Slope <- lmer(FinalMathScore ~ 1 + TutorHours
  40. Random Slopes: Implementation
     • Next, how about a random difference in the intercept for each ClassroomID?
     • model.Slope <- lmer(FinalMathScore ~ 1 + TutorHours
  41. Random Slopes: Implementation
     • model.Slope <- lmer(FinalMathScore ~ 1 + TutorHours + (1|ClassroomID)
  42. Random Slopes: Implementation
     • Lastly, both the intercept and the TutorHours slope need to vary across schools:
     • model.Slope <- lmer(FinalMathScore ~ 1 + TutorHours + (1|ClassroomID)
  43. Random Slopes: Implementation
     • model.Slope <- lmer(FinalMathScore ~ 1 + TutorHours + (1|ClassroomID) + (1+TutorHours|School)
  44. Random Slopes: Implementation
     • model.Slope <- lmer(FinalMathScore ~ 1 + TutorHours + (1|ClassroomID) + (1+TutorHours|School), data=tutor)
  45. Random Slopes: Implementation
     • Here's our final model again:
     • model.Slope <- lmer(FinalMathScore ~ 1 + TutorHours + (1|ClassroomID) + (1 + TutorHours|School), data=tutor)
     • (1 + TutorHours|School) is like a miniature model formula for the things we think will vary by school: schools differ in their intercept (baseline math score), and schools differ in the effectiveness of the tutor on their math scores.
     • A runnable sketch follows below.
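     Note: the final random-slope model as a runnable sketch (same data frame and variable names as in the slides above):

       model.Slope <- lmer(FinalMathScore ~ 1 + TutorHours +
                             (1 | ClassroomID) +
                             (1 + TutorHours | School),
                           data = tutor)
       summary(model.Slope)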
  46. Random Slopes: Output
     • We still have a fixed effect of TutorHours — the slope estimated across all schools.
     • NEW: the output also shows how much schools vary from the mean TutorHours slope.
     • The fixed effect is reliable. In this case, we can conclude that final math scores are generally higher for classrooms that used the tutor more, even if there are some differences across schools in the size of this effect.
     • Correlation: schools with a lower starting score show a larger benefit of the tutor.
     • A sketch of inspecting these components follows below.
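     Note: a sketch of how the new variance components and per-school estimates can be inspected; these are standard lme4 accessors, shown here as an illustration:

       VarCorr(model.Slope)        # variances + the intercept-slope correlation for School
       ranef(model.Slope)$School   # each school's adjustment to the intercept and TutorHours slope
       coef(model.Slope)$School    # each school's total intercept and slope (fixed + random)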
  47. Random Slopes: Model Comparison
     • Now that we've properly accounted for clustering, our estimate of the standard error of the tutor effect is higher, and the t value lower, than in the intercepts-only model.
     • Excluding the slope would have increased our Type I error rate.
     • Our estimate of the baseline school variance has now decreased — many of those unexplained "baseline" differences have now been explained as differing effectiveness of the tutor.
     [Comparison of model output: new model (with slope) vs. old model (intercepts only)]
  48. Random Effects: Implementation
     • By the way, how come we never discussed a (1+TutorHours|ClassroomID) slope?
     • Within a school, we can calculate a regression line relating tutor use to average math score — each classroom contributes one point.
     [Figure: scatterplot of tutor hours (roughly 20–32) vs. average final math score (roughly 65–85), one point per classroom (Classes 21–30)]
  49. Random Effects: Implementation
     • But each classroom has only a single value of TutorHours — e.g., Classroom 22 used it for 28 hours.
     • So there is no way to draw a line relating different values of TutorHours to score within a classroom. You need at least 2 points for a line!
     [Figure: the same axes showing only the single point for Class 22]
  50. Week 5.2: Random Slopes (outline repeated; next section: Testing Random Effects)
  51. Testing Random Effects
     • Does this random slope contribute significantly? That is, is there significant variation in the effectiveness of the tutor across schools?
     • We can compare the fit of models with and without the random slope, using the likelihood-ratio test:
     • anova(model.Intercepts, model.Slope)
     • Here, the model with the slope fits significantly better! (A sketch of this comparison follows below.)
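     Note: a sketch of this likelihood-ratio test; anova() refits both lmer models with ML before comparing, and refit = FALSE (shown as an alternative) keeps the REML fits instead:

       anova(model.Intercepts, model.Slope)                  # LRT after refitting with ML
       # anova(model.Intercepts, model.Slope, refit = FALSE) # or: compare the REML fits directly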
  52. Testing Random Effects
     • Remember that this tests whether there is significant variation in the slope across schools — i.e., more than expected under the H0 of no variation.
     • The more complex model will always fit numerically the same or better.
     • But if three schools had slopes of 0.39, 0.38, and 0.40, this would probably not be significant — it's consistent with what we would expect from sampling error.
  53. Testing Random Effects
     • Caveats:
     • Overfitting / shrinkage — the problem we talked about last week with using the data to decide the model; it "uses up" degrees of freedom.
     • A new sample might yield a different random-effects structure.
     • In cases where we have a clear sampling design, we would want the model to reflect that — e.g., it would be weird not to include random intercepts of classroom and school here.
  54. Week 5.2: Random Slopes (outline repeated; next section: Model Convergence — Failures to Converge)
  55. Failures to Converge
     • Remember how lmer() iteratively searches for the parameter estimates?
     • With more complex random-effects structures, it's possible that we fail to converge on a specific estimate, or hit a false convergence.
     • lmer() searched and searched but wasn't able to figure out the right parameter estimates.
     • It still returns a model, but we don't want to trust it.
  56. Failures to Converge
     • Why does this happen? Not enough data relative to the questions we're trying to ask of it, or a random-effects structure in our lmer model that is genuinely more complex than the structure of the world.
     • lmer(FinalMathScore ~ 1 + TutorHours + TeacherTheory + TOI + (1|Classroom) + (1 + TutorHours + TeacherTheory + TOI|School), data=tutor)
     • As we add slopes, there is a combinatorial explosion of correlations! Do we have enough data to estimate them all? Is the pattern of differences among schools really this complex?
  57. Week 5.2: Random Slopes (outline repeated; next section: Solutions)
  58. Failures to Converge: Solutions
     • "There is no perfect solution, but there are accepted ones." (Matuschek et al., 2017, p. 306)
  59. Failures to Converge: Solutions
     • The model we chose didn't converge when fit to this dataset. What should we do?
     • Try harder to converge (turn to slide 62)
     • Change the model (turn to slide 64)
     • Change the data (turn to slide 72)
  60. Try Harder to Converge
     • Increase the number of attempts (the default is 10,000):
     • lmer(FinalMathScore ~ 1 + TutorHours + (1|Classroom) + (1 + TutorHours + TeacherTheory + TOI|School), data=tutor, control=lmerControl(optCtrl=list(maxfun=20000)))
     • But if the model is too complex relative to the data, spending more time doesn't help. This is often not the problem, so it's often not the solution.
  61. Try Harder to Converge
     • Try a different optimizer — R's "strategy" for playing guess and check.
     • The default strategy is pretty good, but you can try others: install the packages dfoptim and optimx, then run allFit(yourModelName), which tests 7 (!) different optimizers. (A sketch follows below.)
     • You let your computer run all night with more optimizers and more iterations, but you wake up to find the model still hasn't converged. You are crushingly disappointed. Return to slide 61.
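     Note: a hedged sketch of trying several optimizers with lme4's allFit(); installing dfoptim and optimx lets it cover the full set, and the summary compares results across optimizers:

       # install.packages(c("dfoptim", "optimx"))
       library(lme4)

       fits <- allFit(model.Slope)   # refits the model with each available optimizer
       summary(fits)                 # which optimizers converged, and how the estimates compare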
  62. Change the Model
     • Centering or applying orthogonal contrast coding will often help with convergence.
     • A "bonus": often, this is what you already want for your research questions. We'll see what this is soon ☺ (A centering sketch follows below.)
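     Note: one way to center the predictor before refitting (a sketch; TutorHours.c is just a name chosen here):

       tutor <- tutor %>%
         mutate(TutorHours.c = TutorHours - mean(TutorHours, na.rm = TRUE))

       # ...then use TutorHours.c in place of TutorHours in the lmer() formula.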
  63. Change the Model
     • The model may be more complicated than needed to explain the data (it's overparameterized). Solution: make it simpler.
     • Example: we may have lots of correlation parameters, but the world isn't that complex — there isn't a complex pattern where schools separately vary in baseline differences, tutor effects, and theory-of-intelligence effects.
     • Indicator: correlations close to (or at) +1 or -1.
  64. Change the Model
     • We can often simplify a model by removing unnecessary random effects — e.g., remove the correlation parameters. This results in a near-maximal model.
     • This has essentially no (or no) effect on tests of fixed effects (Barr et al., 2013), and the correlations are often not of theoretical interest (except in some individual-difference studies). A "model tonsillectomy."
     • Syntax — use ||:
     • model.NoCorr <- lmer(FinalMathScore ~ 1 + TutorHours + (1|Classroom) + (1 + TutorHours + TeacherTheory + TOI||School), data=tutor)
     • (A minimal sketch with just the TutorHours slope follows below.)
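     Note: a minimal sketch of dropping the correlation parameter with the double-bar syntax, using just the TutorHours slope from the earlier model (the fuller model with TeacherTheory and TOI on the slide works the same way):

       model.NoCorr <- lmer(FinalMathScore ~ 1 + TutorHours +
                              (1 | ClassroomID) +
                              (1 + TutorHours || School),   # no intercept-slope correlation
                            data = tutor)

       anova(model.NoCorr, model.Slope)   # is the correlation parameter worth keeping?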
  65. Change the Model
     • How do we know if this "tonsillectomy" is justified? That is, is the model without correlation parameters a significantly poorer fit?
     • Use anova() to compare the two models.
     • We don't want to exclude something that matters, so use a liberal criterion for including random effects (e.g., p < .20; Barr et al., 2013).
  66. Change the Model
     • If it still doesn't converge, it's also possible to test whether the random slopes matter and include only the ones that significantly contribute.
     • This results in the parsimonious model, or the "maximal random-effects structure justified by the data."
     • There is little increase in Type I error compared to the maximal model, and often higher power (Matuschek et al., 2017).
  67. Change the Model
     • For experimental data, it's a good idea to at least start with the maximal model, even if we then reduce it to a more parsimonious one.
     • It's bad to neglect an important random effect — like having a repeated-measures design but not running a repeated-measures ANOVA.
     • "Neither the [maximal] nor the [minimal] linear mixed models are appropriate for most repeated measures analysis. […] We can usually find middle ground, a covariance model that adequately accounts for correlation but is more parsimonious than the maximal model. Doing so allows us full control over [T]ype I error rates without needlessly sacrificing power." (Stroup, 2012, p. 185)
  68. Change the Model
     • Be clear which model you are reporting and why you chose it.
  69. Change the Model
     • Success! Centering your predictor variables made the model converge. If not, you're sure that removing the correlation parameters would have solved it. But what about the model for Experiment 2? A feeling of dread comes over you as you prepare for the next analysis. Return to slide 61.
  70. Change the Data
     • Collect more data! Failure to converge can simply be a sign that we are asking too much of a small dataset.
     • Just because we want the data to be able to answer a question doesn't mean it can.
     • Collecting more data allowed the model to converge. But you blew your research budget for the term, and now your advisor is furious with you. Return to slide 61.
  71. Week 5.2: Random Slopes (outline repeated; next section: Scaling)
  72. Failures to Converge Due to Scaling
     • Imagine a study of second-language translation:
     • model1 <- lmer(TranslationRT ~ YearsOfStudy + PrevTrialRT + (1|Subject) + (1|Item), data=translate)
     • The algorithm doesn't work as well if the variables are on very different scales: YearsOfStudy is measured in years (a BIG effect of a 1-unit change), while PrevTrialRT is measured in msec (a TINY effect of a 1-unit change).
     • The latter effect basically just gets "rounded out."
  73. Failures to Converge Due to Scaling
     • Simple solution: change a variable to a different scale.
     • translate %>% mutate(PrevTrialRTInSeconds = PrevTrialRT / 1000) -> translate
     • (A sketch follows below.)
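     Note: a sketch of this rescaling fix, plus a standardized alternative (PrevTrialRT.z is just a name chosen here):

       library(dplyr)

       translate <- translate %>%
         mutate(PrevTrialRTInSeconds = PrevTrialRT / 1000)

       # Or standardize the predictor (mean 0, SD 1):
       translate <- translate %>%
         mutate(PrevTrialRT.z = as.numeric(scale(PrevTrialRT)))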
