# Mixed Effects Models - Random Intercepts

Jul. 2, 2021


• 1. Course Business
  • Next three weeks: random effects for different types of designs
    • This week and next: "nested" random effects
    • After that: "crossed" random effects
  • Informal Early Feedback survey will be available on Canvas after class (look under "Quizzes")
• 2. Course Business
  • Package sjPlot provides a convenient way to plot lmer results:
    • library(sjPlot)
    • model2 %>% plot_model()
  • The result is a ggplot, so all ggplot settings can be used
  • Each row is one independent variable (or interaction), shown with its confidence interval; the x-axis is the estimate of the effect (compare to 0)
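As a sketch of this workflow (assuming a model named model2 has already been fit with lme4::lmer(), as on the slide):

```r
# Plot fixed-effect estimates from a fitted lmer model.
# Assumes `model2` is an existing lmer fit.
library(lme4)
library(sjPlot)
library(ggplot2)

p <- plot_model(model2)   # forest plot of estimates with confidence intervals
p + theme_minimal() +     # any ggplot styling can be layered on top
  labs(title = "Fixed effects (compare each estimate to 0)")
```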
• 3. Week 4.2: Nested Random Effects
  • Overfitting: The Problem; Solution
  • Nested Random Effects: Introduction to Clustering; Random Effects; Modeling Random Effects in R
  • Interpretation: Random Intercept; BLUPs; Residual Error; ICC
  • Notation
  • Summary
• 4. Overfitting
  • The "Madden curse"...
  • Each year, a top NFL football player is picked to appear on the cover of the Madden NFL video game
  • That player often doesn't play as well in the following season
  • Is the cover "cursed"?
• 6. Overfitting
  • What's needed to be one of the top NFL players in a season?
    • You have to be a good player: a genuine predictor (signal)
    • And you need luck on your side: random chance, or error
  • A top-performing player was probably both very good and very lucky
  • The next season, your skill may persist, but random chance probably won't: regression to the mean
  • The Madden video game cover imperfectly predicts next season's performance because the choice was partly based on random error
• 7. Overfitting
  • Our estimates (and any choice of variables based on them) always partially reflect random chance in the dataset we used to obtain them
  • They won't fit any later dataset quite as well: shrinkage
  • This is a problem whenever we're using the data to decide on the model
  • "If you use a sample to construct a model, or to choose a hypothesis to test, you cannot make a rigorous scientific test of the model or the hypothesis using that same sample data." (Babyak, 2004, p. 414)
• 9. Overfitting: Examples
  • Relations that we observe between a predictor variable and a dependent variable might simply be capitalizing on random chance
  • The U.S. government puts out 45,000 economic statistics each year (Silver, 2012)
  • Can we use these to predict whether the US economy will go into recession?
  • With 45,000 predictors, we are very likely to find a spurious relation by chance, especially with only 15 recessions since the end of WWII
  • Significance tests try to address this, but with 45,000 predictors we are still likely to find "significant" effects by chance (5% Type I error rate at α = .05)
• 11. Overfitting: Examples
  • Adak Island, Alaska: daily temperature here predicts stock market activity!
    • r = -.87 correlation with the price of a specific group of stocks
    • Completely true; I'm not making this up!
  • The problem: with thousands of weather stations and stocks, it's easy to find a strong correlation somewhere, even if it's just sampling error
  • This factoid doesn't reveal all of the other (non-significant) weather stations and stocks we searched through
  • It would only be impressive if this hypothesis continued to be true on a new set of weather data and stock prices (Vul et al., 2009)
• 12. Overfitting: Examples
  • "Puzzlingly high correlations" in some fMRI work (Vul et al., 2009)
  • Correlate each voxel in a brain scan with a behavioral measure (e.g., a personality survey)
  • Restrict the analysis to voxels where the correlation is above some threshold
  • Compute the final correlation between this region and the behavioral measure: very high!
  • The problem: the voxels were already chosen based on those high correlations
    • Includes sampling error favoring the correlation but excludes error that doesn't
• 14. Overfitting: Solutions
  • One solution: select model(s) in advance (perhaps even pre-registered)
  • A theory is valuable for this
  • The Adak Island example is implausible in part because there's no causal reason why weather on an island in Alaska would relate to stock prices
  • "Just as you do not need to know exactly how a car engine works in order to drive safely, you do not need to understand all the intricacies of the economy to accurately read those gauges." (economic forecasting firm ECRI, quoted in Silver, 2012)
• 15. Overfitting: Solutions
  • One solution: select model(s) in advance (perhaps even pre-registered)
  • A theory is valuable for this: the choice is not driven purely by the data or by chance if we have an a priori reason to favor this variable
    • "There is really nothing so practical as a good theory." (social psychologist Kurt Lewin; Lewin's Maxim)
  • The choice can also be based on some other measure (e.g., another brain scan)
  • Or based on the research design
    • For factorial experiments, it's typical to include all experimental variables and interactions: the design implies you were interested in all of these
    • Or include variables viewed in advance as necessary controls
• 18. Overfitting: Solutions
  • For more exploratory analyses: show that the finding replicates
    • On a second dataset
    • Test whether a model obtained from one subset of the data applies to another subset (cross-validation), e.g., training and test sets
    • A better version: do this with many randomly chosen subsets (bootstrapping methods)
  • See the reading on Canvas for some general ways to do this in R
• 19. Overfitting: Solutions
  • Also: we can limit the number of variables
  • The more variables relative to our sample size, the more likely we are to be overfitting
  • Common rule of thumb (Babyak, 2004): 10-15 observations per predictor
    • e.g., 4 predictor variables of interest → N = 40 to 60 needed
• 21. Theories of Intelligence
  • For each item, rate your agreement on a scale of 0 (definitely disagree) to 7 (definitely agree):
    1. "You have a certain amount of intelligence, and you can't really do much to change it."
    2. "Your intelligence is something about you that you can't change very much."
    3. "You can learn new things, but you can't really change your basic intelligence."
• 25. Theories of Intelligence
  • Find your total, then divide by 3
  • Learners hold different views of intelligence (Dweck, 2008):
    • FIXED MINDSET (toward 7): intelligence is fixed; performance = ability
    • GROWTH MINDSET (toward 0): intelligence is malleable; performance = effort
• 26. Theories of Intelligence
  • Fixed mindset has been linked to less persistence and success in academics (and other work) (Dweck, 2008)
  • Let's see if this is true for middle-schoolers' math achievement: math.csv on Canvas
  • 30 students in each of 24 classrooms (N = 720)
  • Predictor: fixed mindset, from the 0-7 questionnaire
  • Dependent measure: score on an end-of-year standardized math exam (0 to 100)
• 27. Theories of Intelligence
  • We can start writing a regression line relating fixed mindset to end-of-year score:
  • Y_i(j) = γ10 x_1i(j)
    • Y_i(j): end-of-year math exam score; x_1i(j): fixed-mindset score
• 28. Theories of Intelligence
  • What about kids whose fixed-mindset score is 0 (completely growth mindset)?
  • These kids will probably still score decently well on the math exam
  • So include an intercept term: the math score when the fixed-mindset score = 0
  • Y_i(j) = γ00 + γ10 x_1i(j)
    • γ00: baseline (intercept)
• 29. Theories of Intelligence
  • We probably can't predict each student's math score exactly: kids differ in ways other than their fixed mindset
  • So include an error term: the residual difference between the predicted and observed score for observation i in classroom j
    • Captures what's unique about child i
    • Assumed to be independently, identically normally distributed (mean 0)
  • Y_i(j) = γ00 + γ10 x_1i(j) + E_i(j)
    • E_i(j): error for student i in classroom j
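As a sketch, this single-level model is just an ordinary regression; assuming math.csv uses the column names FinalMathScore and TOI that appear on the later slides:

```r
# Ordinary (single-level) regression, ignoring classrooms for now.
# Assumes math.csv has columns FinalMathScore and TOI.
math <- read.csv("math.csv")

model0 <- lm(FinalMathScore ~ 1 + TOI, data = math)
summary(model0)   # intercept = gamma_00, TOI slope = gamma_10
```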
• 30. Theories of Intelligence
  • [Diagram: classrooms (Mr. Wagner's, Ms. Fulton's, Ms. Green's, Ms. Cornell's) are sampled, and students are sampled within each classroom; each student i in classroom j contributes a math achievement score y_ij, a theory-of-intelligence score x_1ij, and an independent error term e_ij]
  • Where is the problem here?
  • The error terms are not fully independent
    • Students in the same classroom probably have more similar scores. This is clustering.
    • Sources: differences in classroom size, teaching style, teacher's experience...
• 32. Clustering
  • Why does clustering matter? Remember that we test effects by comparing the estimates to their standard errors:
  • t = Estimate / Std. Error
  • But if we have a lot of kids from the same classroom, they share more similarities than kids sampled from the whole population
    • This understates the standard error across subjects...
    • ...and thus overstates the significance test
  • Failing to account for clustering can lead us to detect spurious results (sometimes quite badly!)
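A small simulation can make this concrete. This is a hypothetical sketch (the seed, the classroom-level predictor, and all numbers are chosen for illustration, loosely echoing the 24-classroom design): a classroom-level predictor with no true effect is tested with an lm() that treats all 720 students as independent, so its standard error is too small.

```r
# Simulate clustered data with NO true predictor effect,
# then fit a regression that ignores the clustering.
set.seed(42)
J <- 24; n <- 30                           # 24 classrooms, 30 students each
classroom <- rep(1:J, each = n)

class_mean <- rnorm(J, mean = 66, sd = 3)  # classroom-level variation
y <- class_mean[classroom] + rnorm(J * n, sd = 5.5)

x_class <- rnorm(J)                        # classroom-level predictor, unrelated to y
x <- x_class[classroom]

fit <- lm(y ~ x)                           # pretends all 720 rows are independent
summary(fit)$coefficients                  # SE for x is understated here
```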
• 34. Fixed Effects vs. Random Effects
  • Can't we just add Classroom as another fixed-effect variable? 1 + TOI + Classroom
  • What we want to know about the Classroom variable, and how we are using it, is different from the effect of Theory of Intelligence (TOI)
• 35. Fixed Effects vs. Random Effects
  • What makes the Classroom variable different from the TOI variable?
    • If we included Classroom as a fixed effect, we'd get many, many comparisons between individual classrooms
    • But our theoretical interest is in effects of theories of intelligence, not in effects of being Ms. Fulton
    • If another researcher wanted to replicate this experiment, they could include the Theories of Intelligence scale, but they probably couldn't get the same teachers
    • We do expect our results to generalize to other teachers/classrooms, but this experiment doesn't tell us anything about how the relation would generalize to other questionnaires
• 37. Fixed Effects vs. Random Effects
  • These classrooms are just some classrooms we sampled out of the population of interest
  • Fixed effects:
    • We're interested in the specific categories/levels
    • The categories are a complete set (at least within the context of the research design)
  • Random effects:
    • We're not interested in the specific categories
    • The observed categories are simply a sample out of a larger population
• 38. Fixed Effect or Random Effect?
  • Scott is interested in the effects of distributing practice over time on statistics learning. For his experimental items, he picks 10 statistics formulae randomly out of a textbook. Then, he samples 20 Pittsburgh-area grad students as participants. Half study the items using distributed practice and half using massed practice (a single day) before they are all tested.
  1. Participant is a... random effect. Scott sampled them out of a much larger population of interest (grad students).
  2. Item is a... random effect. Scott isn't interested in these specific formulae; he picked them out randomly.
  3. Practice type (distributed vs. massed) is a... fixed effect. We're comparing these 2 specific conditions.
• 40. Fixed Effect or Random Effect?
  4. A researcher in education is interested in the relation between class size and student evaluations at the university level. The research team collects data at 10 different universities across the US. University is a... random effect. The goal is to generalize to universities as a whole, and we just sampled these 10.
  5. A planner for the city of Pittsburgh compares the availability of parking at Pitt vs. CMU. University is a... fixed effect. Now, we DO care about these two particular universities.
• 42. Fixed Effect or Random Effect?
  6. We're testing the effectiveness of a new SSRI on depressive symptoms. In our clinical trial, we manipulate the dosage of the SSRI that participants receive to be either 0 mg (placebo), 10 mg, or 20 mg per day, based on common prescriptions. Dosage is a... fixed effect. This is the variable that we're theoretically interested in and want to model. Also, 0, 10, and 20 mg exhaustively characterize dosage within this experimental design.
• 45. Modeling Random Effects
  • Let's add Classroom as a random effect to the model:
  • model1 <- lmer(FinalMathScore ~ 1 + TOI + (1|Classroom), data=math)
  • We are now controlling for some classrooms having higher scores than others
  • Still a significant TOI effect!
• 46. Modeling Random Effects
  • What is (1|Classroom) doing?
  • model1 <- lmer(FinalMathScore ~ 1 + TOI + (1|Classroom), data=math)
  • We're allowing each classroom to have a different intercept
    • Some classrooms have higher math scores on average, some lower
  • This is a random intercept
• 47. Modeling Random Effects
  • What is (1|Classroom) doing?
  • model1 <- lmer(FinalMathScore ~ 1 + TOI + (1|Classroom), data=math)
  • We are not interested in comparing the specific classrooms we sampled
  • Instead, we are modeling the variance of this population: how much do classrooms typically vary in math achievement?
• 48. Modeling Random Effects
  • Model results: the random-effects table shows two variance components:
    • Variance of the classroom intercepts: the standard deviation across classrooms is 2.86 points
    • Residual: additional, unexplained subject variance (even after accounting for classroom differences)
  • We are not interested in comparing the specific classrooms we sampled; instead, we are modeling the variance of this population: how much do classrooms typically vary in math achievement?
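In R, the fit and its variance components can be inspected like this (a sketch assuming math.csv has already been read into a data frame named math):

```r
library(lme4)

model1 <- lmer(FinalMathScore ~ 1 + TOI + (1 | Classroom), data = math)
summary(model1)   # fixed effects plus the random-effect variance table
VarCorr(model1)   # just the variance components (Classroom SD ~2.86 here)
```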
• 50. Understanding the Random Intercept
  • Think back to a normal distribution...
  • The standard normal has mean 0 and standard deviation 1
• 51. Understanding the Random Intercept
  • We can also have normal distributions with other means and standard deviations
  • This one has mean ~66 and standard deviation ~3
• 52. Understanding the Random Intercept
  • The fixed intercept tells us that the mean intercept, across all classes, is 66
• 53. Understanding the Random Intercept
  • But there is a distribution of class averages: some classrooms average higher or lower than that
  • This distribution has a standard deviation of ~2.9 (the standard deviation of the random intercept)
  • ~68% of classrooms have an intercept between 63 and 69; ~95% between 60 and 72
• 54. Understanding the Random Intercept
  • So:
    • The fixed intercept tells us the mean of the distribution: 66
    • The standard deviation of the random intercept tells us the standard deviation of that distribution: 2.9
  • lmer() assumes this distribution is normal
  • ~68% of classrooms have an intercept between 63 and 69; ~95% between 60 and 72
• 55. Understanding the Random Intercept
  • Our classrooms are a random sample from this population of classrooms with different class averages
• 56. Understanding the Random Intercept
  • The variance tells us how much variability there is across classrooms (i.e., how wide the spread of classrooms is)
  • e.g., if the SD had only been 1 → less variable: ~68% of classrooms between 65 and 67, ~95% between 64 and 68
  • Or if the SD had been 10 → much more variable: ~68% of classrooms between 56 and 76, ~95% between 46 and 86
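These proportions come straight from the normal distribution (the 68-95 rule); a quick check in R:

```r
# Proportion of a normal distribution within 1 and 2 SDs of the mean
pnorm(1) - pnorm(-1)   # ~0.683: about 68% of classrooms within 1 SD
pnorm(2) - pnorm(-2)   # ~0.954: about 95% within 2 SDs

# e.g., with intercept mean 66 and SD 2.9, the proportion between 63 and 69:
pnorm(69, mean = 66, sd = 2.9) - pnorm(63, mean = 66, sd = 2.9)
```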
• 58. Caveats
  • For a fair estimate of the population variance:
    • At least 5-6 clustering units, 10+ preferred (e.g., 5+ classrooms) (Bolker, 2018)
    • Population size at least 100x the number of groups you have (e.g., at least 2,400 classrooms in the world) (Smith, 2013)
  • If not, you should still include the random effect to account for clustering; you just wouldn't get a good estimate of the population variance
  • For a true "random effect", the observed set of categories is a sample from a larger population
    • If we're not trying to generalize to a population, we might instead call this a variable intercept model (Smith, 2013)
• 60. BLUPs
  • Where do individual classrooms fall in this distribution?
  • ranef(model1) shows you the intercepts for individual classrooms
  • These are adjustments relative to the fixed effect: Best Linear Unbiased Predictors (BLUPs)
  • e.g., Ms. Baker's classroom has a class average that is +4.5 relative to the overall intercept
  • The mean intercept is 66 points, so the intercept for Ms. Baker's class is 66 + 4.5 = 70.5
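A sketch of extracting these in R (assuming model1 from earlier; the +4.5 value is the slide's example, not something these calls will necessarily reproduce):

```r
ranef(model1)             # BLUPs: each classroom's adjustment to the intercept
coef(model1)$Classroom    # per-classroom coefficients (fixed effect + BLUP)
fixef(model1)             # the overall fixed effects, for comparison
```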
• 61. BLUPs
  • Why aren't these BLUPs displayed in our initial results from summary()?
  • For random effects, we're mainly interested in modeling variability
  • BLUPs aren't considered parameters of the model; they're not what this is a model "of"
    • We ran this analysis to model the effects of TOI on kids' math performance, not the effect of being Ms. Baker from Allentown
  • If we ran the same design with a different sample, the BLUPs probably wouldn't be the same
    • No reason to expect that Classroom #12 in the new sample will again be one of the better classrooms
  • By contrast, we do intend for our fixed effects to replicate
• 63. Residual Variance
  • We now know how to understand the Classroom variance. What about the Residual variance?
  • This is the variance of the residuals: variance in individual math scores not explained by any of our other variables:
    • Overall intercept
    • Theory of intelligence
    • Classroom differences
  • True error variance: in this case, what's unique about child i
• 64. Residual Variance
  • There is a distribution of child-level residuals
  • The mean of the distribution of residuals is 0 by definition
  • This distribution has a standard deviation of 5.5
  • ~68% of children have a residual between -5.5 and 5.5; ~95% between -11 and 11
• 66. Intraclass Correlation Coefficient
  • The intraclass correlation coefficient (ICC) measures how much variance is attributed to a particular random effect:
  • ICC = Variance of Random Effect of Interest / Sum of All Random Effect Variances
        = Classroom Variance / (Classroom Variance + Residual Variance)
        ≈ .21
• 67. Intraclass Correlation Coefficient
  • The ICC measures how much variance is attributed to a random effect
    • Proportion of all random variation that has to do with classrooms
    • 21% of random student variation is due to which classroom they are in
  • It is also the correlation among observations from the same classroom
    • High correlation among observations from the same classroom = classroom matters a lot = high ICC
    • Low correlation among observations from the same classroom = classroom not that important = low ICC
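The ICC can be computed from the fitted model's variance components (a sketch assuming model1 from earlier; with the SDs of 2.86 and 5.5 reported on the slides, this comes out to roughly .21):

```r
# ICC from the fitted model's variance components
vc <- as.data.frame(VarCorr(model1))
icc <- vc$vcov[vc$grp == "Classroom"] / sum(vc$vcov)
icc

# Or by hand from the reported standard deviations:
2.86^2 / (2.86^2 + 5.5^2)   # ~0.21
```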
• 69. Notation
  • What exactly is this model doing? Let's go back to our model of individual students (now slightly different):
  • Y_i(j) = B_0j + γ10 x_1i(j) + E_i(j)
    • Y_i(j): end-of-year math exam score; B_0j: baseline; γ10 x_1i(j): fixed-mindset effect; E_i(j): student error
  • What now determines the baseline B_0j that we should expect for students with fixed mindset = 0?
  • The baseline (intercept) for a student in classroom j now depends on two things:
  • B_0j = γ00 + U_0j
    • γ00: overall intercept across everyone; U_0j: teacher effect for this classroom (error)
• 72. Notation
  • Essentially, we have two regression models: a hierarchical linear model
  • LEVEL-1 MODEL (Student): Y_i(j) = B_0j + γ10 x_1i(j) + E_i(j)
  • LEVEL-2 MODEL (Classroom): B_0j = γ00 + U_0j
    • γ00: overall intercept across everyone; U_0j: teacher effect for this classroom (error)
• 73. Hierarchical Linear Model
  • [Diagram: the Level-2 model describes the sampled classrooms (Mr. Wagner's, Ms. Fulton's, Ms. Green's, Ms. Cornell's); the Level-1 model describes the students sampled within each classroom]
  • The Level-2 model is for the superordinate level here; the Level-1 model is for the subordinate level
  • The variance of the classroom intercept is the error variance at Level 2; the residual is the error variance at Level 1
• 74. Notation
  • Two models may seem confusing, but we can simplify with some algebra:
  • LEVEL-1 MODEL (Student): Y_i(j) = B_0j + γ10 x_1i(j) + E_i(j)
  • LEVEL-2 MODEL (Classroom): B_0j = γ00 + U_0j
• 75. Notation
  • Substituting the Level-2 model into the Level-1 model gives a single model that combines both levels: a mixed effects model
  • Combined model: Y_i(j) = γ00 + U_0j + γ10 x_1i(j) + E_i(j)
    • γ00: overall intercept; U_0j: teacher effect for this classroom (error); γ10 x_1i(j): fixed-mindset effect; E_i(j): student error
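To make the combined model concrete, here is a hypothetical simulation from it. The slope of -1.5 and the seed are made up for illustration; the intercept (66) and the SDs (2.9 for classrooms, 5.5 residual) echo the estimates discussed on the earlier slides.

```r
# Simulate data from  Y = gamma00 + U0j + gamma10 * x + E
set.seed(1)
J <- 24; n <- 30                         # 24 classrooms, 30 students each
classroom <- rep(1:J, each = n)

U0  <- rnorm(J, mean = 0, sd = 2.9)      # classroom-level error (random intercepts)
TOI <- runif(J * n, min = 0, max = 7)    # fixed-mindset scores
E   <- rnorm(J * n, mean = 0, sd = 5.5)  # student-level error

Y <- 66 + U0[classroom] + (-1.5) * TOI + E

# Refitting should approximately recover these parameters:
# lme4::lmer(Y ~ 1 + TOI + (1 | classroom))
```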
• 76. Notation
  • These are just two slightly different ways of writing the same thing: a notation difference, not a statistical one!
  • Mixed effects model: Y_i(j) = γ00 + U_0j + γ10 x_1i(j) + E_i(j)
  • Hierarchical linear model: Y_i(j) = B_0j + γ10 x_1i(j) + E_i(j), where B_0j = γ00 + U_0j
• 77. Notation
  • lme4 always uses the mixed-effects model notation:
  • lmer( FinalMathScore ~ 1 + TOI + (1|Classroom) )
  • Y_i(j) = γ00 + U_0j + γ10 x_1i(j) + E_i(j)
  • (The Level-1 error is always implied; you don't have to include it)
• 79. Summary
  • Adding a random intercept for Classroom accomplishes two things:
  • Controls for variation across classrooms
    • Deals with the clustering of observations within classrooms
    • Failing to control for this clustering inflates Type I error
  • Measures the amount of this variation
    • What is the variance of math scores across classrooms?
    • How does this compare to other sources of variance?