
An Introduction to Multilevel Regression Modeling for Prediction

To download the presentation, please go to Harlan's or Jared's websites:
jaredlander.com
harlan.harris.name

Although traditional multiple regression is an extremely powerful tool for prediction, it can be inadequate when the goal is to predict relationships that differ among groups. For example, the relationship between income and political affiliation varies among American states, and the relationship between income level and calorie intake varies among countries of the world. Traditional multiple regression will either try to estimate these relationships independently, which can be very problematic if there is not enough data, or will lump the groups together, throwing away potentially valuable differences among groups. A more powerful approach is to assume that the groups themselves have a statistical distribution, just as the error among individual observations is assumed to come from an (often normal) distribution. The data in each group are then "partially pooled" with all of the other data, appropriately splitting the difference between the two extremes. In the most general case, this is Bayesian model estimation, which can be very complex and difficult to do well. But in more common cases, simpler statistical techniques known variously as "multilevel regression," "hierarchical regression," or "mixed-effects modeling" can be used to improve the quality of predictions. In this talk, we will motivate and explain the basics of practical multilevel regression, and will demonstrate how it works using R.
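As a rough illustration (not part of the original abstract), the three approaches can be sketched in R using the pizza example from the slide transcript below. The file name and variable names come from slide 12; the exact formulas here are illustrative rather than the deck's own calls:

    library(lme4)
    za.df <- read.csv("Fake Pizza Data.csv")   # fake data from the talk's repo

    # Full pooling: one regression, neighborhoods ignored entirely
    lm.pooled <- lm(Rating ~ CostPerSlice, data = za.df)

    # No pooling: a separate intercept and slope for every neighborhood
    lm.nopool <- lm(Rating ~ CostPerSlice * Neighborhood, data = za.df)

    # Partial pooling: neighborhood intercepts and slopes are assumed to come
    # from a common normal distribution and are estimated jointly
    lm.partial <- lmer(Rating ~ CostPerSlice + (1 + CostPerSlice | Neighborhood),
                       data = za.df)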

Bios:
Harlan D. Harris, PhD, works as a statistical data scientist for Kaplan Test Prep and Admissions in New York City. He previously worked as a cognitive psychology researcher at NYU, UConn and Columbia University, and studied machine learning and cognitive science at the University of Illinois at Urbana-Champaign.

Jared Lander is a statistical consultant based in New York City. With a master's in statistics from Columbia University and a bachelor's in mathematics from Muhlenberg College, he has experience in both academic research and industry. His work for both large and small organizations ranges from music and fundraising to finance and humanitarian relief efforts. He specializes in data management, multilevel models, machine learning, generalized linear models, and statistical computing.


Transcript

  • 1. Harlan D. Harris, PhD Jared P. Lander, MA NYC Predictive Analytics Meetup October 14, 2010 Predicting Pizza in Chinatown: An Intro to Multilevel Regression
  • 2.
    • 1. How do cost and fuel type affect pizza quality?
    • 2. How do those factors vary by neighborhood?
  • 3. Linear Regression (OLS)
    • rating_i = β_0 + β_price * price_i + ε_i
    • find β's to minimize Σ ε_i²
  • 4. Linear Regression (OLS)
    • rating_i = β_0 + β_price * price_i + ε_i
    • find β's to minimize Σ ε_i²
  • 5. Multiple Regression
    • rating_i = beta[intercept] * 1 +
                 beta[price] * price_i +
                 beta[oven=wood] * I(oven_i = wood) +
                 beta[oven=coal] * I(oven_i = coal) +
                 error_i
    • Goal: find betas/coefficients that minimize Σ_i error_i²
    • 3 types of oven → 2 coefficients (gas is the reference)
  • 6. Multiple Regression (OLS)
    • rating_i = β_0 + β_price * price_i +
                 β_wood * I(oven_i = "wood") +
                 β_coal * I(oven_i = "coal") +
                 ε_i
    • find β's to minimize Σ ε_i²
  • 7. Multiple Regression (OLS) with Interactions
    • rating_i = β_0 + β_price * price_i +
                 β_wood * I(oven_i = "wood") +
                 β_wood,price * price_i * I(oven_i = "wood") +
                 β_coal * I(oven_i = "coal") +
                 β_coal,price * price_i * I(oven_i = "coal") +
                 ε_i
  • 8. Groups
    • Examples:
    • teachers / test scores
    • states / poll results
    • pizza ratings / neighborhoods
  • 9. Full Pooling (ignore groups)
    • Examples:
    • teachers / test scores
    • states / poll results
    • pizza ratings / neighborhoods
    • rating_i = β_0 + β_price * price_i + ε_i
  • 10. No Pooling (groups as factors)
    • rating_i = β_0 + β_price * price_i +
                 β_B * I(group_i = "B") +
                 β_B,price * price_i * I(group_i = "B") +
                 β_C * I(group_i = "C") +
                 β_C,price * price_i * I(group_i = "C") +
                 ε_i
  • 11. Pizzas
        Name        Rating   $/Slice   Fuel Type   Neighborhood
        Rosario’s   3.5      2.00      Gas         Lower East Side
        Ray’s       2.8      2.50      Gas         Chinatown
        Joe’s       3.3      1.75      Wood        East Village
        Pomodoro    3.8      3.50      Coal        SoHo
    • (Rating = response; $/Slice = continuous predictor; Fuel Type = categorical predictor; Neighborhood = group)
  • 12. Data Summary in R
    • > za.df <- read.csv("Fake Pizza Data.csv")
    • > summary(za.df)
           Rating        CostPerSlice    HeatSource      Neighborhood
       Min.   :0.030   Min.   :1.250   Coal: 17    Chinatown  :14
       1st Qu.:1.445   1st Qu.:2.000   Gas :158    EVillage   :48
       Median :4.020   Median :2.500   Wood: 25    LES        :35
       Mean   :3.222   Mean   :2.584               LittleItaly:43
       3rd Qu.:4.843   3rd Qu.:3.250               SoHo       :60
       Max.   :5.000   Max.   :5.250
    http://github.com/HarlanH/nyc-pa-meetup-multilevel-pizza
  • 13. Viewing the Data in R
    • > plot(za.df)
  • 14. Visualize
    • ggplot(za.df, aes(CostPerSlice, Rating, color=HeatSource)) +
        geom_point() +
        facet_wrap(~ Neighborhood) +
        geom_smooth(aes(color=NULL), color='black', method='lm', se=FALSE, size=2)
  • 15. Multiple Regression in R
    • > lm.full.main <- lm(Rating ~ CostPerSlice + HeatSource, data=za.df)
    • > plotCoef(lm.full.main)
    http://www.jaredlander.com/code/plotCoef.r
  • 16. Full-Pooling: Include Interaction
    • > lm.full.int <- lm(Rating ~ CostPerSlice * HeatSource, data=za.df)
    • > plotCoef(lm.full.int)
  • 17. Visualize the Fit (Full-Pooling)
    • > lm.full.int <- lm(Rating ~ CostPerSlice * HeatSource, data=za.df)
  • 18.  
  • 19. No Pooling Model
    • lm(Rating ~ CostPerSlice * Neighborhood + HeatSource, data=za.df)
  • 20. Visualize the Fit (No-Pooling)
    • lm(Rating ~ CostPerSlice * Neighborhood + HeatSource, data=za.df)
  • 21. Evaluation of Fitted Model
      • Cross-Validation Error
      • Adjusted R²
      • AIC
      • BIC
      • RSS
      • Tests for Normal Residuals
      • (a short R sketch comparing the earlier models on some of these criteria follows below)
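    A rough illustration (not from the slides themselves): the pooled and no-pooling fits from slides 15, 16, and 19 could be compared on a few of these criteria as follows. The object name lm.no.pool is made up here; za.df is the data frame read in on slide 12.
      lm.full.main <- lm(Rating ~ CostPerSlice + HeatSource, data = za.df)
      lm.full.int  <- lm(Rating ~ CostPerSlice * HeatSource, data = za.df)
      lm.no.pool   <- lm(Rating ~ CostPerSlice * Neighborhood + HeatSource, data = za.df)

      AIC(lm.full.main, lm.full.int, lm.no.pool)   # lower is better
      BIC(lm.full.main, lm.full.int, lm.no.pool)   # lower is better; stronger penalty for model size
      summary(lm.full.int)$adj.r.squared           # adjusted R-squared for one model
      sum(residuals(lm.full.int)^2)                # residual sum of squares (RSS)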
  • 22. Use Natural Groupings
    • Cluster Sampling
    • Intercluster Differences
    • Intracluster Similarities
  • 23. Multilevel Characteristics
    • Model gravitates toward big groups
    • Small groups gravitate toward the model
    • Best when groups are similar to each other
    • y_i = Intercept_j[i] + Slope_j[i] * x_i + noise
      • Intercept_j = Intercept_alpha + Slope_alpha * u_j + noise
      • Slope_j = Intercept_beta + Slope_beta * u_j + noise
      • (x_i is the individual-level predictor; u_j is an optional group-level predictor)
    • Model the effects of the groups
  • 24. Multi-Names for Multilevel Models
    • Multilevel
    • Hierarchical
    • Mixed-Effects
    • Bayesian
    • Partial-Pooling
  • 25. Multi-Names for Multilevel Models
    • (1) Fixed effects are constant across individuals, and random effects vary. For example, in a growth study, a model with random intercepts a_i and fixed slope b corresponds to parallel lines for different individuals i, or the model y_it = a_i + b t. Kreft and De Leeuw (1998) thus distinguish between fixed and random coefficients.
    • (2) Effects are fixed if they are interesting in themselves or random if there is interest in the underlying population. Searle, Casella, and McCulloch (1992, Section 1.4) explore this distinction in depth.
    • (3) "When a sample exhausts the population, the corresponding variable is fixed; when the sample is a small (i.e., negligible) part of the population the corresponding variable is random." (Green and Tukey, 1960)
    • (4) "If an effect is assumed to be a realized value of a random variable, it is called a random effect." (LaMotte, 1983)
    • (5) Fixed effects are estimated using least squares (or, more generally, maximum likelihood) and random effects are estimated with shrinkage ("linear unbiased prediction" in the terminology of Robinson, 1991). This definition is standard in the multilevel modeling literature (see, for example, Snijders and Bosker, 1999, Section 4.2) and in econometrics.
    • http://www.stat.columbia.edu/~cook/movabletype/archives/2005/01/why_i_dont_use.html
  • 26. Bayesian Interpretation
    • Everything has a distribution (including the groups)
    • Group-level model is prior information for the individual-level coefficients
    • Group-level model has an assumed-normal prior
    • (Can fit multilevel models with Bayesian methods, or with simpler/faster/easier approximations.)
  • 27. R Options
      • lme4::lmer()
      • nlme::lme()
      • MCMCglmm()
      • BUGS
      • Others/niche approaches…
  • 28. Back to the Pizza
    • Model the overall pattern among neighborhoods
    • Natural clustering of pizzerias in neighborhoods adds information
    • Neighborhoods with many/few pizzerias
      • Many: trust the data, à la the no-pooling model
      • Few: trust overall patterns, à la the full-pooling model
  • 29. Back to the Pizza
    • Use Neighborhoods as natural grouping
  • 30. Multilevel Pizza
    • 5 slope coefficients and 5 intercept coefficients, one of each per neighborhood
    • Slopes/intercepts are assumed to have a Gaussian distribution
    • Ideally, could describe all 5 slopes with 2 numbers (mean/variance)
    • Neighborhoods with little data don’t get the freedom to set their own coefficients; they get pulled towards the overall slope or intercept
  • 31. R syntax
    • lm.me.cost2 <- lmer(Rating ~ HeatSource +
                            (1 + CostPerSlice | Neighborhood), data=za.df)
  • 32. Results (Partial-Pooling)
    • lm.me.cost2 <- lmer(Rating ~ HeatSource +
                            (1 + CostPerSlice | Neighborhood), data=za.df)
    • (see the coefficient-inspection sketch below)
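    A minimal sketch (not from the deck) of inspecting the partial-pooling fit from the slide above; fixef(), ranef(), coef(), and VarCorr() are standard lme4 accessors.
      fixef(lm.me.cost2)                # fixed effects: intercept and HeatSource terms
      ranef(lm.me.cost2)$Neighborhood   # per-neighborhood deviations (intercept, CostPerSlice)
      coef(lm.me.cost2)$Neighborhood    # fixed + random combined, one row per neighborhood
      VarCorr(lm.me.cost2)              # estimated spread/covariance of the group-level terms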
  • 33. Predicting a New Pizzeria
    • Neighborhood: Chinatown
    • Cost: $4.20
    • Fuel: Wood
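    Continuing that scenario, a rough sketch (not the deck's code) of the point prediction computed by hand from the per-neighborhood coefficients. It assumes R's default factor coding, with Coal as the reference level of HeatSource (as in the slide 12 summary).
      # Hypothetical wood-oven pizzeria in Chinatown at $4.20 per slice
      cn <- coef(lm.me.cost2)$Neighborhood["Chinatown", ]
      cn[["(Intercept)"]] +
        cn[["CostPerSlice"]] * 4.20 +   # Chinatown's partially pooled price slope
        cn[["HeatSourceWood"]]          # wood-oven indicator = 1; Coal is the reference
      # Recent lme4 versions can get the same number with predict():
      # predict(lm.me.cost2, newdata = data.frame(CostPerSlice = 4.20,
      #                                           HeatSource = "Wood",
      #                                           Neighborhood = "Chinatown"))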
  • 34. Uncertainty in Prediction
    • Fitted coefficients are uncertain
      • arm::sim()
    • Model error term
      • rnorm(1, model matrix %*% sim$Neighborhood[ , 'Chinatown', ], variance)
    • New neighborhood – model possible coefficients
      • mvrnorm(1, c(0, 0), VarCorr(model)$Neighborhood)
    http://github.com/HarlanH/nyc-pa-meetup-multilevel-pizza
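    To make the last bullet concrete, a sketch (not the deck's exact code) of simulating ratings for the same $4.20 wood-oven pizzeria if it sat in a brand-new neighborhood: draw that neighborhood's coefficients from the estimated group-level covariance, then add residual noise. It assumes a recent lme4 (for sigma()) and MASS for mvrnorm(); note the slide 31 formula has no fixed CostPerSlice term, so a new neighborhood's slope is centered at zero. It also ignores uncertainty in the fixed effects themselves, which arm::sim() (first bullet above) can propagate.
      library(MASS)   # for mvrnorm()

      n.sims  <- 1000
      Sigma.g <- VarCorr(lm.me.cost2)$Neighborhood    # 2x2 covariance: (Intercept), CostPerSlice
      dev     <- mvrnorm(n.sims, mu = c(0, 0), Sigma = Sigma.g)

      fe   <- fixef(lm.me.cost2)
      mu   <- fe[["(Intercept)"]] + dev[, 1] +        # new neighborhood's intercept
              dev[, 2] * 4.20 +                       # new neighborhood's price slope
              fe[["HeatSourceWood"]]                  # wood oven; Coal is the reference level
      sims <- rnorm(n.sims, mean = mu, sd = sigma(lm.me.cost2))   # observation-level noise
      quantile(sims, c(0.025, 0.5, 0.975))            # rough prediction interval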
  • 35. Other Examples
    • Red State Blue State
  • 36. Other Examples
    • Tobacco Usage
  • 37. Other Examples
    • Diabetes Prevalence
  • 38. Other Examples
    • Insufficient Fruit and Vegetable Intake
  • 39. Other Examples
    • Clean Drinking Water
  • 40. Steps to Multilevel Models
    • Full-Pooling Model
    • No-Pooling Model
    • Separate Models
    • Two-Step Analysis
  • 41. How Many Groups? How Many Observations?
    • As few as one or two groups
    • Even two observations per group
    • Can have many groups with just one observation
  • 42. Larger Datasets
    • Andy Gelman: “The Blessing of Dimensionality”
    • More Data → Add Complexity
      • Because you can
  • 43. Resources
    • Gelman and Hill (ARM)
    • Pinheiro & Bates
    • Snijders and Bosker
    • R-SIG-Mixed-Models (http://glmm.wikidot.com/faq)
    • (SAS/SPSS)
  • 44. Thanks!
