Presentation for the fourth meeting of the EARLI SIG 18 Educational Effectiveness.
Abstract: Comparative international large-scale assessments (LSA) have long attracted considerable attention from policy makers and educational researchers, and with that attention has come criticism. One criticism is that the complex sampling design of LSA is not always taken into account in analyses. This paper aims to demonstrate the consequences of ignoring the sampling design of one such assessment, TIMSS 2011. Three features (sampling weights, proficiency estimation with plausible values, and variance estimation with the jackknife) are examined in single-level (students) and multilevel (students and schools) cases. The results show that the consequences can be significant, but they are not completely in line with previous literature.
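
To make the three features concrete, the following is a minimal single-level sketch, not the paper's own code. It assumes a paired jackknife (one half of a zone dropped, the other doubled) and combines the plausible values with Rubin-style rules. All function names, argument names and the synthetic data are hypothetical. Computing the sampling variance from the first plausible value only is also an assumption; the TIMSS technical documentation should be checked for the exact procedure.

```python
import random
from statistics import mean, variance


def weighted_mean(values, weights):
    """Weighted mean of values using the sampling weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def jackknife_variance(values, weights, zones, reps):
    """Paired jackknife sampling variance of the weighted mean.

    For each zone, units with rep == 0 get weight 0 and units with
    rep == 1 get double weight; all other zones keep their weights.
    """
    full = weighted_mean(values, weights)
    total = 0.0
    for h in sorted(set(zones)):
        rep_w = [w * (2 * r if z == h else 1)
                 for w, z, r in zip(weights, zones, reps)]
        total += (weighted_mean(values, rep_w) - full) ** 2
    return total


def combine_plausible_values(pv_columns, weights, zones, reps):
    """Return (point estimate, standard error) across plausible values."""
    estimates = [weighted_mean(pv, weights) for pv in pv_columns]
    m = len(estimates)
    point = mean(estimates)
    # Assumption: sampling variance is taken from the first plausible value.
    sampling_var = jackknife_variance(pv_columns[0], weights, zones, reps)
    # Between-imputation variance, inflated by (1 + 1/m).
    imputation_var = (1 + 1 / m) * variance(estimates)
    return point, (sampling_var + imputation_var) ** 0.5


# Synthetic illustration: 20 zones, 2 sampling units per zone, 10 students each.
rng = random.Random(0)
zones, reps, weights, ability = [], [], [], []
for zone in range(1, 21):
    for rep in (0, 1):
        unit_effect = rng.gauss(0, 30)       # shared by students in one unit
        unit_weight = rng.uniform(5, 50)
        for _ in range(10):
            zones.append(zone)
            reps.append(rep)
            weights.append(unit_weight)
            ability.append(500 + unit_effect + rng.gauss(0, 70))

# Five plausible values: ability plus independent measurement noise.
pvs = [[a + rng.gauss(0, 20) for a in ability] for _ in range(5)]

point, se = combine_plausible_values(pvs, weights, zones, reps)
naive_point = mean(pvs[0])                   # ignores weights and design
print(f"design-based mean {point:.1f} (SE {se:.1f}); naive mean {naive_point:.1f}")
```

Comparing the design-based estimate and its standard error with the naive unweighted mean of a single plausible value shows, on a small scale, the kind of discrepancy the abstract refers to. The multilevel case needs a model-based approach and is not covered by this sketch.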
