A workshop on using sampling weights in multi-level modelling


- 1. Multiple regression and multilevel modelling. Please check that you have 4 files in the folder C:\KIEL. Please start SPSS now.
- 2. Introduction: (1) Brief overview of multiple regression analysis; (2) Multiple regression using PISA data; (3) Brief overview of multilevel modelling; (4) Multilevel modelling using PISA data; (5) Differences between the two types of analyses.
- 3. Overview: multiple regression
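- 4. Simple regression model: Predicting the dependent variable using a linear relationship with the independent variables. Regression analysis with one independent variable: $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$, where $\beta_0$ is the intercept (the value of $\hat{Y}_i$ when $X_i = 0$) and $\beta_1$ is the slope of the line that minimises the sum of the squared residuals $\varepsilon_i$.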
- 5. [Figure: regression line with intercept β0 and slope β1]
- 6. Multiple regression model [Figure; recoverable labels: R ≠ 0, Ŷi, r = 0]
- 7. Multiple regression model [Figure]
- 8. Multiple regression with PISA data
- 9. PISA data, plausible values: Plausible values (PVs) for cognitive performance are 5 values drawn at random from a student's most likely ability range (the posterior distribution). They give unbiased population estimates (even with one PV) and make it possible to estimate the imputation variance (measurement error). NEVER average the plausible values! Instead, compute the statistic (mean, regression coefficient, etc.) once per PV and average the 5 statistics.
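
A toy calculation shows why the plausible values themselves should not be averaged. The sketch below uses made-up numbers (two PVs for four students rather than PISA's five PVs) and illustrative variable names; it compares the average of the per-PV variances with the variance of the averaged PVs.

```python
from statistics import fmean, pvariance

# Hypothetical data: 2 plausible values for each of 4 students
# (PISA provides 5 PVs per student; 2 keep the arithmetic visible).
pv1 = [400.0, 500.0, 600.0, 500.0]
pv2 = [500.0, 400.0, 500.0, 600.0]

# Recommended: compute the statistic once per PV, then average the statistics.
variance_per_pv = [pvariance(pv1), pvariance(pv2)]
mean_of_variances = fmean(variance_per_pv)

# Not recommended: average the PVs per student first, then compute the statistic.
averaged_pv = [(a + b) / 2 for a, b in zip(pv1, pv2)]
variance_of_averaged_pv = pvariance(averaged_pv)

print(mean_of_variances)        # 5000.0
print(variance_of_averaged_pv)  # 2500.0
```

In this example, averaging the PVs first halves the estimated variance, because the averaging removes part of the spread that the plausible values are meant to preserve.
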
- 10. PISA data, student weights: The final student weight is the number of students in the population represented by each sampled student. It is the inverse of the product of the probability of selecting the student's school and the probability of selecting the student given that the school is selected, with non-response and post-stratification adjustments and trimming.
- 11. PISA data, replicate weights: 80 BRR replicate weights with Fay's k = 0.5, used to compute the sampling variance. The computation of the sampling variance using BRR weights takes the two-stage sampling method into account, takes stratification into account, and is identical for any statistic.
- 12. Error variance: The error variance is a combination of the sampling variance and the imputation variance (measurement error). The imputation variance can only be estimated when using a set of plausible values. The imputation variance is small compared to the sampling variance. The standard error is the square root of the error variance.
- 13. Computation of standard error [Formulas for the error variance, the sampling variance and the imputation variance]. The standard error is the square root of the error variance.
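
The computation can be sketched as follows. This is a minimal sketch, assuming the formulas commonly documented for PISA: the sampling variance for each PV is the sum of squared differences between the replicate estimates and the full-weight estimate, divided by G(1 - k)^2 (G replicates, Fay's factor k); the imputation variance is the variance of the per-PV estimates; and the error variance is the sampling variance plus (1 + 1/M) times the imputation variance for M plausible values. The function name and the toy numbers are hypothetical.

```python
from statistics import fmean

def pisa_standard_error(pv_estimates, replicate_estimates, fay_k=0.5):
    """Return (final estimate, standard error).

    pv_estimates: one full-weight estimate per plausible value (M values).
    replicate_estimates: for each PV, the estimates from the G replicate weights.
    The formulas are assumptions stated in the text above.
    """
    m = len(pv_estimates)
    final_estimate = fmean(pv_estimates)

    # Sampling variance: computed per PV from the replicates, then averaged.
    per_pv_sampling = []
    for theta, reps in zip(pv_estimates, replicate_estimates):
        g = len(reps)
        squared_diffs = sum((r - theta) ** 2 for r in reps)
        per_pv_sampling.append(squared_diffs / (g * (1 - fay_k) ** 2))
    sampling_variance = fmean(per_pv_sampling)

    # Imputation variance: variance of the M per-PV estimates.
    imputation_variance = sum(
        (t - final_estimate) ** 2 for t in pv_estimates
    ) / (m - 1)

    error_variance = sampling_variance + (1 + 1 / m) * imputation_variance
    return final_estimate, error_variance ** 0.5

# Toy input: 2 PVs and 4 replicates (PISA itself uses 5 PVs and 80 replicates).
estimate, se = pisa_standard_error(
    [500.0, 502.0],
    [[499.0, 501.0, 499.0, 501.0], [501.0, 503.0, 501.0, 503.0]],
)
print(estimate, round(se, 3))
```

With these toy numbers the sampling variance is 4, the imputation variance is 2, and the error variance is 4 + 1.5 × 2 = 7.
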
- 14. SPSS replicates add-in: Wi-Fi password: Hawking09+. mypisa.acer.edu.au: public data and analysis, software and manuals. Download and install the replicates add-in. Start SPSS. Copy the CD to C:\Kiel and unzip the file.
- 15. Example in SPSS: C:\Kiel\INT_Stu06_SCHWGT.sav, German data. Regress science performance on sex, immigration status and ESCS.
- 16. Overview: multilevel modelling
- 17. Example: For Japan in 2006: strong relationship between ESCS and science (38.8); large intra-class correlation in performance; small intra-class correlation in ESCS. For this example, only nine Japanese schools are selected.
- 18. Single-level regression: Overall slope is 38.8.
- 19. Multilevel model with random slopes
- 20. Multilevel model with fixed slopes: Average slope is 7.2.
- 21. Interpretation of regression coefficients: Single-level regression gives the overall relationship between ESCS and performance in a country (38.8 in Japan). Multilevel regression takes the 2-level structure of the data into account and either estimates a unique slope within each school (or the variance of the slopes) or estimates the average slope within schools (7.2 in Japan). Which type of analysis is more correct?
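
The gap between the overall slope and the average within-school slope can be reproduced with a small made-up data set. The sketch below is an illustration only: the three schools, their ESCS values and scores are hypothetical and chosen so that schools differ strongly from each other (40 points per ESCS unit between schools) while the relationship inside each school is weak (5 points per ESCS unit).

```python
from statistics import fmean

def ols_slope(x, y):
    """Slope of the least-squares line of y on x."""
    mean_x, mean_y = fmean(x), fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

# Hypothetical data: 3 schools with 3 students each.
school_mean_escs = [-1.0, 0.0, 1.0]
offsets = [-0.5, 0.0, 0.5]  # student ESCS relative to the school mean

schools = []
for school_mean in school_mean_escs:
    escs = [school_mean + d for d in offsets]
    score = [500 + 40 * school_mean + 5 * d for d in offsets]
    schools.append((escs, score))

# Single-level regression: pool all students and ignore the schools.
all_escs = [x for escs, _ in schools for x in escs]
all_score = [y for _, score in schools for y in score]
overall_slope = ols_slope(all_escs, all_score)

# Average of the slopes estimated separately within each school.
average_within_slope = fmean([ols_slope(e, s) for e, s in schools])

print(overall_slope, average_within_slope)  # 33.0 5.0
```

The pooled slope mixes the between-school and within-school relationships, which is why it is much larger than the within-school slope here.
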
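- 22. Notation MLM. Random intercept model: Level 1: $Y_{ij} = \beta_{0j} + \beta_1 X_{ij} + r_{ij}$; Level 2: $\beta_{0j} = \gamma_{00} + \gamma_{01} W_j + u_{0j}$. Random slopes and random intercept: Level 1: $Y_{ij} = \beta_{0j} + \beta_{1j} X_{ij} + r_{ij}$; Level 2: $\beta_{0j} = \gamma_{00} + \gamma_{01} W_j + u_{0j}$ and $\beta_{1j} = \gamma_{10} + \gamma_{11} W_j + u_{1j}$.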
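- 23. Random intercept. System of equations: Level 1: $Y_{ij} = \beta_{0j} + \beta_1 X_{ij} + r_{ij}$; Level 2: $\beta_{0j} = \gamma_{00} + \gamma_{01} W_j + u_{0j}$. Mixed-effects model: $Y_{ij} = \gamma_{00} + \gamma_{01} W_j + \beta_1 X_{ij} + u_{0j} + r_{ij}$, with fixed part $\gamma_{00} + \gamma_{01} W_j + \beta_1 X_{ij}$ and random part $u_{0j} + r_{ij}$.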
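- 24. Random intercept and random slopes. System of equations: Level 1: $Y_{ij} = \beta_{0j} + \beta_{1j} X_{ij} + r_{ij}$; Level 2: $\beta_{0j} = \gamma_{00} + \gamma_{01} W_j + u_{0j}$ and $\beta_{1j} = \gamma_{10} + \gamma_{11} W_j + u_{1j}$. Mixed-effects model: $Y_{ij} = \gamma_{00} + \gamma_{01} W_j + u_{0j} + (\gamma_{10} + \gamma_{11} W_j + u_{1j}) X_{ij} + r_{ij} = \gamma_{00} + \gamma_{01} W_j + \gamma_{10} X_{ij} + \gamma_{11} W_j X_{ij} + u_{1j} X_{ij} + u_{0j} + r_{ij}$, with fixed part $\gamma_{00} + \gamma_{01} W_j + \gamma_{10} X_{ij} + \gamma_{11} W_j X_{ij}$ and random part $u_{1j} X_{ij} + u_{0j} + r_{ij}$. The term $\gamma_{11} W_j X_{ij}$ is the cross-level interaction.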
- 25. Variance decomposition: In single-level regression analysis, the overall variance of the dependent variable is estimated, together with the share of this variance that is explained by the independent variables (R-squared). In multilevel analysis, the variance is decomposed into between-cluster (school) and within-cluster variance. The independent variables can explain variance at either level or at both levels.
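- 26. Variances: Total variance = within-cluster variance + between-cluster variance. Average within-cluster variance: $\sigma_r^2 = \frac{\sum_{j=1}^{n^{(2)}} \sum_{i=1}^{n^{(1)}} (y_{ij} - \bar{y}_j)^2}{n^{(2)} (n^{(1)} - 1)}$. Between-cluster variance: $\sigma_{u_0}^2 = \frac{\sum_{j=1}^{n^{(2)}} (\bar{y}_j - \bar{y})^2}{n^{(2)} - 1} - \frac{\sigma_r^2}{n^{(1)}}$. Here $n^{(2)}$ is the number of clusters and $n^{(1)}$ the number of units per cluster.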
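- 27. Intraclass correlation and explained variance. Null model: $y_{ij} = \gamma_{00} + u_{0j} + r_{ij}$. Intraclass correlation (rho) = between-cluster variance / total variance. Explained variance (R-squared) of a model with predictors: Level 1: $1 - \mathrm{var}(W)_p / \mathrm{var}(W)_n$; Level 2: $1 - \mathrm{var}(B)_p / \mathrm{var}(B)_n$, where W and B are the within- and between-cluster variances, p is the model with predictors and n is the null model.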
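
The quantities above can be computed by hand for a small balanced data set. The following is a minimal sketch, assuming the usual balanced-design (equal cluster size) estimators of the within- and between-cluster variance; the function names and the toy data are hypothetical, and real analyses would estimate these components with multilevel software.

```python
from statistics import fmean

def variance_components(clusters):
    """Return (within, between) variance for equally sized clusters.

    Assumes a balanced design: every cluster has the same number of units.
    """
    n2 = len(clusters)      # number of clusters (level 2)
    n1 = len(clusters[0])   # units per cluster (level 1)
    cluster_means = [fmean(c) for c in clusters]
    grand_mean = fmean(cluster_means)
    within = sum(
        (y - m) ** 2 for c, m in zip(clusters, cluster_means) for y in c
    ) / (n2 * (n1 - 1))
    # Variance of the cluster means, corrected for sampling noise within clusters.
    between = sum((m - grand_mean) ** 2 for m in cluster_means) / (n2 - 1) - within / n1
    return within, between

def intraclass_correlation(within, between):
    """Share of the total variance that lies between clusters."""
    return between / (within + between)

def explained_variance(var_with_predictors, var_null):
    """R-squared at one level: 1 - var(model with predictors) / var(null model)."""
    return 1 - var_with_predictors / var_null

# Toy data: 2 schools with 3 students each.
within, between = variance_components([[1.0, 2.0, 3.0], [5.0, 6.0, 7.0]])
rho = intraclass_correlation(within, between)
print(within, round(between, 4), round(rho, 4))
```

As a usage example with numbers reported later in these slides, the equal-weights estimates for Australia (between-school variance 1527, within-school variance 8605) give `intraclass_correlation(8605, 1527)`, which is about 0.15.
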
- 28. The standard error: One assumption of OLS is independence of observations. In 2-stage sampling designs, observations within clusters are often not independent. MLM allows for correlated errors and therefore gives unbiased SEs. Generally, SEs estimated with OLS are too small. However, BRR replicate weights are designed to deal with the dependence of observations within schools, so OLS with BRR gives correct standard errors!
- 29. Weighting (1): Single-level regression uses final student weights and BRR replicate weights. How do we use PISA weights in MLM? The data analysis manual says: normalise the final student weights and replicate weights and run the analysis in SPSS or SAS. We now know this is not the best way.
- 30. Weighting (2): SPSS and SAS do not treat the weights as sampling weights (they treat them as precision weights). SPSS and SAS can only weight at the student level. MLM and BRR both take the multilevel structure of the data into account, so this is done twice in the method of the PISA data analysis manual. However, there is no final consensus about the right way to use weights in MLM.
- 31. Weighting (3): In PISA, school-level sampling is much more informative than student-level sampling (stratification is at school level; students often have very similar weights within schools). Therefore, schools should be weighted by a school-level weight. Students should be weighted by a conditional student weight (the inverse of the probability of being selected given that the student's school is sampled).
- 32. Weighting (4): Options for conditional student-level weights: equal weights (weight = 1); raw conditional student weights; rescaled weights, Pfeffermann method 1, when student sampling is not informative; rescaled weights, Pfeffermann method 2, when student sampling is informative. Differences are small when cluster sizes are larger than 20 students.
- 33. Raw conditional student weights: $w^{(1)}_{i|j} = \frac{\text{W\_FSTUWT}}{\text{W\_FSCHWT}}$. The school weight is included in the school questionnaire data file. This is not exactly correct, because some adjustments are made independent of schools (e.g. the non-response adjustment). It often leads to an overestimation of the between-school variance.
- 34. Pfeffermann method 1: For when student sampling is not informative at level 1. Conditional student weights are multiplied by the sum of the weights within the cluster divided by the sum of the squared weights within the cluster: $w^{\mathrm{PFEFF1}}_{i|j} = w^{(1)}_{i|j} \cdot \frac{\sum_{i=1}^{n^{(1)}_j} w^{(1)}_{i|j}}{\sum_{i=1}^{n^{(1)}_j} \left(w^{(1)}_{i|j}\right)^2}$
- 35. Pfeffermann method 2: For when student sampling is informative. Conditional student weights are divided by the average conditional student weight in school j, or: $w^{\mathrm{PFEFF2}}_{i|j} = w^{(1)}_{i|j} \cdot \frac{n^{(1)}_j}{\sum_{i=1}^{n^{(1)}_j} w^{(1)}_{i|j}}$. This is the same as normalising the full student weights within schools.
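
The three conditional-weight options described on the preceding slides can be sketched for a single school as follows. The function names and the toy weights are hypothetical; only the variable names W_FSTUWT and W_FSCHWT come from the slides, and the scaling rules follow the verbal descriptions given there.

```python
def raw_conditional_weights(final_student_weights, school_weight):
    """Raw conditional weight: W_FSTUWT divided by the school's W_FSCHWT."""
    return [w / school_weight for w in final_student_weights]

def pfeffermann_method_1(conditional_weights):
    """Multiply by (sum of weights) / (sum of squared weights) within the school."""
    factor = sum(conditional_weights) / sum(w * w for w in conditional_weights)
    return [w * factor for w in conditional_weights]

def pfeffermann_method_2(conditional_weights):
    """Divide by the average conditional weight within the school."""
    factor = len(conditional_weights) / sum(conditional_weights)
    return [w * factor for w in conditional_weights]

# Toy school: three students with final weights 10, 20, 30 and school weight 5.
raw = raw_conditional_weights([10.0, 20.0, 30.0], 5.0)
method_1 = pfeffermann_method_1(raw)
method_2 = pfeffermann_method_2(raw)

print(raw)       # [2.0, 4.0, 6.0]
print(method_2)  # [0.5, 1.0, 1.5]
```

Under method 2 the scaled weights sum to the number of sampled students in the school (3 here); under method 1 they sum to (sum of weights)^2 / (sum of squared weights), about 2.57 in this example.
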
- 36. Let's try it out in MLwiN: Australia, because indigenous students are oversampled there and perform lower than non-indigenous students (positive correlation between conditional student weights and performance). C:\Kiel\INT_Stu06_SCHWGT.sav. I have added the full school weights (W_FSCHWT) and the normalised school weights (N_FSCHWT): N_FSCHWT = W_FSCHWT * SAMPSIZE / POPSIZE.
- 37. Comparing conditional student weights in MLwiN (1):

  | | Equal | Raw | Pfeff1 | Pfeff2 | Std MLwiN |
  |---|---|---|---|---|---|
  | Response | PV1SCIE | PV1SCIE | PV1SCIE | PV1SCIE | PV1SCIE |
  | Fixed part: CONS | 521 | 523 | 522 | 522 | 520 |
  | Random part, level SCHOOLID: CONS/CONS | 1527 | 1300 | 1517 | 1508 | 1782 |
  | Random part, level STIDSTD: CONS/CONS | 8605 | 29105 | 8172 | 8472 | 8404 |
  | -2*loglikelihood | 169452 | 170788 | 169614 | 169633 | 173562 |
  | DIC | | | | | |
  | Units: SCHOOLID | 356 | 356 | 356 | 356 | 356 |
  | Units: STIDSTD | 14170 | 14170 | 14170 | 14170 | 14170 |
- 38. Comparing conditional student weights in MLwiN (2): Equal weights (= 1) and Pfeffermann methods 1 and 2 give similar results when using PISA data. Pfeffermann method 2 is the most conservative and is recommended. Raw weights over-estimate the within-school variance (I think this is MLwiN specific; there is a similar problem with unscaled school weights).
- 39. Weights standardised by MLwiN: MLwiN's standardisation of the weights: at the school level, the full school weight is normalised at country level. The student-level weight is the Pfeffermann 2 conditional student weight * the normalised school weight * a factor to make the average student weight equal to one. It is odd that the school weight is included at both levels, but the results are the same as in HLM.
- 40. Which weights are better? In a simulation study the differences in results were minimal, but the differences were big when using data from some real countries. We do not know which method is best. It is probably safest in MLwiN to use standardised weights, because we do not know how the weights are built into their algorithm. We need to explore what other software packages do (gllamm in Stata).
- 41. References: Rabe-Hesketh, S. & Skrondal, A. (2006). Multilevel modelling of complex survey data. Journal of the Royal Statistical Society: Series A, 169, 805-827. Chantala, K., Blanchette, D. & Suchindran, C. M. (2006). Software to compute sampling weights for multilevel analysis. http://www.cpc.unc.edu/restools/data_analysis/ml_sampling_weights/Compute%20Weights%20for%20Multilevel%20Analysis.pdf
- 42. Practise MLM with PISA data
- 43. Exercise: For MLwiN, the data has to be sorted first by the highest-level ID variable, then by the second highest, etc. (SCHOOLID in PISA). MLwiN needs a constant in the data (compute CONS=1.) to estimate the intercept. Start with data from Chile, where the intraclass correlation in both science performance and ESCS is high. Start MLwiN…
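
The two preparation steps (sorting by the ID variables and adding a constant) can be sketched outside SPSS as well. The snippet below is a minimal illustration on hypothetical records; the helper name is made up, only the variable names SCHOOLID, STIDSTD, PV1SCIE and CONS come from the slides, and reading or writing the actual data files is not shown.

```python
def prepare_for_mlwin(rows):
    """Sort by school ID, then student ID, and add the constant CONS = 1."""
    ordered = sorted(rows, key=lambda r: (r["SCHOOLID"], r["STIDSTD"]))
    # Build new dictionaries so the input rows are left unchanged.
    return [{**r, "CONS": 1} for r in ordered]

# Hypothetical records in arbitrary order.
records = [
    {"SCHOOLID": 2, "STIDSTD": 1, "PV1SCIE": 480.0},
    {"SCHOOLID": 1, "STIDSTD": 2, "PV1SCIE": 510.0},
    {"SCHOOLID": 1, "STIDSTD": 1, "PV1SCIE": 495.0},
]

prepared = prepare_for_mlwin(records)
for row in prepared:
    print(row["SCHOOLID"], row["STIDSTD"], row["CONS"])
```
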
- 44. Warnings: The definition of a school is not the same in each country and is not always that clear (campus). There are differences in educational systems between or even within countries or cycles (tracked). There is a risk of "swimming" and of models that are too complicated to interpret if MLM is more data driven than theory driven. To interpret results carefully, you need to know enough about the educational system in a country or about differences across countries.
- 45. Comparing multiple regression and multilevel analysis
- 46. Comparisons:

  | OLS with BRR | MLM |
  |---|---|
  | Fixed effects | Random effects and cross-level interactions |
  | Includes measurement error | Difficult to include measurement error |
  | Takes stratification into account | I think it doesn't take school stratification into account |
  | Output is an SPSS data file for easy editing | Output is often in text format |
- 47. Options for the final part of the workshop: Try an MLM on data from your own country. Try school- and student-level variables. Try to add cross-level interactions (free the slopes). Discuss MLMs that you have tried in the past or would like to do in the future. Ask any PISA-related data analysis questions.
