
Predicting the quality of a survey question from its design characteristics: SQP


Talk held at the Census Bureau, 2011.

Published in: Science

  1. 1. Introduction · Question design · Modeling measurement error · Estimating measurement error · Predicting measurement error · Concl. Predicting the quality of a survey question from its design characteristics: SQP. Daniel Oberski (joint work with Willem Saris), Universitat Pompeu Fabra.
  2. 2. Measurement: Construct → Measurement → Response → Edited data; associated errors: Validity, Measurement error, Processing error. Representation: Inferential population → Target population → Sampling frame → Sample → Respondents; associated errors: Coverage error, Sampling error, Nonresponse error. Both sides lead to the Survey statistic (Groves et al. 2004).
  3. 3. Measurement side only: Construct → Measurement → Response → Edited data; associated errors: Validity, Measurement error, Processing error.
  4. 4. • Assume the step from construct to measurement is already acceptable → Assume that the question measures an intended construct: the respondent knows the answer, can interpret the question, ... → The reaction of the respondent to the question depends on some unobserved value/opinion, which is in turn a measure of the construct. • We focus only on the degree to which the response is a good measure of this unobserved score/opinion: “measurement error”. • (NOT the degree to which the question is interpretable, measures some construct, etc.)
  9. 9. Reasons to study measurement error • Reliability is an upper bound on validity; responses can never measure the underlying construct better than the single indicator. • Unreliability increases the variance of estimators: $\operatorname{var}(\hat{\mu}) = \kappa^{-1}\sigma^{2}/n$, where $\kappa \in (0, 1)$ is the reliability. • Unreliability reduces the apparent strength of relationships between variables: $\rho_{xy} = \kappa_x \cdot \kappa_y \cdot \rho_{XY}$, where $\rho_{XY}$ is the true correlation and $\rho_{xy}$ the observed correlation. • Correlated measurement errors will make variables look more related than they really are; e.g. “How many minutes does it take to...” questions correlate partly because they are all asked in the same way.
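The two formulas on this slide can be checked with a small simulation. This is a minimal sketch, not the authors' procedure. It assumes classical, uncorrelated random errors and true scores with unit variance. It reads κ in the variance formula as the ratio of true-score variance to observed variance, and it reads κ_x and κ_y in the correlation formula as the square roots of those ratios. Both readings are assumptions of the sketch, since the slides do not spell them out. The names `simulate`, `rel_x` and `rel_y` are hypothetical.

```python
import math
import random


def pearson(a, b):
    """Plain Pearson correlation of two equally long lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(ss_a * ss_b)


def variance(a):
    """Unbiased sample variance."""
    mean_a = sum(a) / len(a)
    return sum((x - mean_a) ** 2 for x in a) / (len(a) - 1)


def simulate(n=50_000, rho_true=0.6, rel_x=0.8, rel_y=0.5, seed=1):
    """Simulate two error-prone measures (all parameter values are made up)."""
    rng = random.Random(seed)
    x_obs, y_obs = [], []
    for _ in range(n):
        # True scores: unit variance, correlation rho_true.
        true_x = rng.gauss(0.0, 1.0)
        true_y = rho_true * true_x + math.sqrt(1 - rho_true ** 2) * rng.gauss(0.0, 1.0)
        # Error variance chosen so that var(true) / var(observed) equals rel.
        x_obs.append(true_x + rng.gauss(0.0, math.sqrt(1 / rel_x - 1)))
        y_obs.append(true_y + rng.gauss(0.0, math.sqrt(1 / rel_y - 1)))
    return {
        "observed_corr": pearson(x_obs, y_obs),
        # kappa_x * kappa_y * rho_XY with kappa = sqrt(variance ratio)
        "predicted_corr": math.sqrt(rel_x) * math.sqrt(rel_y) * rho_true,
        "observed_var_x": variance(x_obs),
        # kappa^{-1} * sigma^2 with sigma^2 = 1 (true-score variance)
        "predicted_var_x": 1.0 / rel_x,
    }


result = simulate()
print(result)
```

Dividing `observed_var_x` by the sample size gives the variance of the sample mean, which under these assumptions is the true-score variance over n, inflated by the factor 1/κ.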
  15. 15. Public health ranking: Correction of regression coefficients for κ [Figure: educational differentials in subjective health with 2 s.e. interval, by country (GR, CZ, PT, SI, FI, HU, PL, SK, LU, ES, EE, DK, DE, TR, IS, NO, CH, BE, IE, FR, UA, AT, NL, SE); uncorrected regression coefficient versus measurement error-corrected coefficient; axis from -0.4 to 0.0. Values printed in the figure, in extraction order: 0.82, 0.85, 0.78, 0.73, 0.56, 0.75, 0.71, 0.81, 0.86, 0.85, 0.95, 0.84, 0.91, 0.70, 0.81, 0.87, 0.81, 0.82, 0.92, 0.85, 0.91, 0.81, 0.93, 0.99.]
  16. 16. Design characteristics of questions • Social Desirability • Centrality • Reference period • Question formulation • WH word used • Use of gradation • Balance of the request • Encouragement • Showcards present • Showcards have pictures • ... • Emphasis on subjective opinion in request • Information about the opinion of other people • Use of stimulus or statement in the question • Absolute or comparative judgment • Response scale: basic choice • Number of categories • Labels full, partial, or no • Labels full sentences • Knowledge provided • Survey mode • ... • Order of the labels • Correspondence between labels and numbers of the scale • Theoretical range of the scale • Neutral category • Number of fixed reference points • Don’t know option • Interviewer instruction • Respondent instruction • Extra motivation, info or definition available? • Agree-disagree scale • ... (Saris & Gallhofer 2007)
  17. 17. Question design choices • A great number of question design characteristics have at some point been found or suggested to influence the response; • Any question in a questionnaire represents a series of choices (conscious or not) on those characteristics: a method of asking the question; • It is clear that what counts as a good method depends strongly on the topic, for example • The frequency and importance of an event or series of events asked about determine: reasonable reference periods; reasonable categories - wide or deep; approximately or exactly (Tourangeau et al. 2000). • But are some methods generally better than others? • If so, what about those methods makes them better?
  24. 24. Talk outline (1) Question design: The influence of the method; Variation in influence of the method (2) Modeling measurement error: Definitions; Formal model and assumptions (3) Estimating measurement error: Design requirements; Estimation of the model (4) Predicting measurement error: Description of the data; Meta-analysis of the MTMM experiments; Program demonstration
  25. 25. The influence of the method: the method influences the answers
  26. 26. The influence of the method: European Social Survey, 2002. Method A: ENTER START TIME: 1 TvTot CARD 1 On an average weekday, how much time, in total, do you spend watching television? Please use this card to answer. 00 No time at all (GO TO A3); 01 Less than ½ hour; 02 ½ hour to 1 hour; 03 More than 1 hour, up to 1½ hours; 04 More than 1½ hours, up to 2 hours; 05 More than 2 hours, up to 2½ hours; 06 More than 2½ hours, up to 3 hours; 07 More than 3 hours; 88 (Don’t know); codes 01 to 88: ASK A2. A2 TvPol STILL CARD 1 And again on an average weekday, how much of your time watching television is spent watching news or programmes about politics and current affairs? Still use this card. Method B (question text recovered from a garbled font encoding; the question labels are not recoverable): On an average weekday, how much time, in total, do you spend watching television? On an average weekday, how much time, in total, do you spend listening to the radio? On an average weekday, how much time, in total, do you spend reading the newspapers? Each answer is written in as HOURS and MINUTES.
  29. 29. The influence of the method. TV watching: method A versus method B [Figure: hours of TV watching on the categorical scale (0; h<0.5; 0.5<=h<=1; 1<h<=1.5; 1.5<h<=2; 2<h<=2.5; 2.5<h<=3; h>3; axis 0 to 8000) and hours of TV watching written in as hours and minutes (axis 0 to 15), with many outlying points.]
  30. 30. The influence of the method. Radio listening: method A versus method B [Figure: hours of radio listening on the categorical scale (same categories; axis 0 to 8000) and hours of radio listening written in as hours and minutes (axis 0 to 15), with many outlying points.]
  31. 31. The influence of the method. Newspaper reading: method A versus method B [Figure: hours of newspaper reading on the categorical scale (same categories; axis 0 to 12000) and hours of newspaper reading written in as hours and minutes (axis 0 to 10000).]
  32. 32. The influence of the method. TV watching: method A versus method B [Figure: proportions per category (0; h<0.5; 0.5<=h<=1; 1<h<=1.5; 1.5<h<=2; 2<h<=2.5; 2.5<h<=3; h>3) for the categorical scale and for the write-in hours and minutes, recoded; axis 0.00 to 0.20.]
  33. 33. The influence of the method. Radio listening: method A versus method B [Figure: proportions per category for the categorical scale and for the write-in hours and minutes, recoded; axis 0.00 to 0.25.]
  34. 34. The influence of the method. Newspaper reading: method A versus method B [Figure: proportions per category for the categorical scale and for the write-in hours and minutes, recoded; axis 0.0 to 0.4.]
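The "recoded" panels compare the two methods on a common scale, which requires mapping each write-in answer onto the categories of the card. A minimal sketch of such a recoding follows. The category edges are taken from the axis labels of the figures; the function names and the example answers are hypothetical, and the talk does not show how the recoding was actually implemented.

```python
from collections import Counter

# Upper edges, in minutes, of codes 2..6 on the card used by method A
# (0.5<=h<=1, 1<h<=1.5, 1.5<h<=2, 2<h<=2.5, 2.5<h<=3).
UPPER_EDGES_MIN = [60, 90, 120, 150, 180]


def recode(hours, minutes=0):
    """Map a write-in answer (method B) to a category code 0..7 (method A)."""
    total = hours * 60 + minutes  # work in whole minutes to avoid float edges
    if total < 0:
        raise ValueError("time spent cannot be negative")
    if total == 0:
        return 0  # no time at all
    if total < 30:
        return 1  # h < 0.5
    for code, upper in enumerate(UPPER_EDGES_MIN, start=2):
        if total <= upper:
            return code
    return 7  # h > 3


def category_proportions(answers):
    """Proportion of (hours, minutes) answers falling in each code 0..7."""
    counts = Counter(recode(h, m) for h, m in answers)
    n = len(answers)
    return [counts.get(code, 0) / n for code in range(8)]


# Made-up write-in answers, for illustration only.
example = [(0, 0), (1, 0), (1, 0), (5, 0)]
print(category_proportions(example))
```

Working in whole minutes keeps the boundary cases (exactly half an hour, exactly one hour) on the side of the edge that the category labels prescribe.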
  35. 35. Variation in influence of the method. Do people answer methods differently? • The numeric method clearly produces many outliers, as well as very high values that may or may not be outliers. • To the extent that this is due to confusion of hours and minutes, version C may remedy that problem. • The distributions of hours with methods A and B (recoded) are similar but not the same: • There are far fewer people who watch very little TV with method B (9% versus 4% of 40,355 respondents), • Numeric method B has more people who watch a lot of TV. • Numeric method B has a spike at exactly 1 hour for radio and newspaper. • Overall it is clear that the method has some influence on average over all 40,355 respondents.
  42. 42. Variation in influence of the method. Is the difference between methods the same for all respondents? The same people were asked both versions. This allows us to show variation in answers to the numeric question, within categories of the categorical question.
  44. 44. Variation in influence of the method. Is the difference between methods the same for all respondents? [Figure: density of the numeric value given (axis 0 to 4), one panel per category of the categorical question: No time at all; Less than 0,5 hour; 0,5 hour to 1 hour; More than 1 hour, up to 1,5 hours; More than 1,5 hours, up to 2 hours; More than 2 hours, up to 2,5 hours; More than 2,5 hours, up to 3 hours.]
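Because the same respondents answered both versions, the numeric answers can be grouped by the category each respondent chose on the card. The sketch below summarises the spread within each category; it is only an illustration of the idea behind the density panels, with a hypothetical function name and made-up data.

```python
from collections import defaultdict
from statistics import median


def spread_by_category(pairs):
    """Summarise numeric answers (hours) within each categorical answer.

    `pairs` holds one (category_code, numeric_hours) tuple per respondent.
    """
    groups = defaultdict(list)
    for category, hours in pairs:
        groups[category].append(hours)
    return {
        category: {
            "n": len(values),
            "min": min(values),
            "median": median(values),
            "max": max(values),
        }
        for category, values in sorted(groups.items())
    }


# Made-up paired answers: (code chosen on the card, hours written in).
pairs = [(2, 0.5), (2, 1.0), (2, 3.0), (3, 1.5)]
for category, summary in spread_by_category(pairs).items():
    print(category, summary)
```

If the method effect were the same for everyone, the numeric answers within a category would cluster tightly; a wide range within one category indicates that the effect differs between persons.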
  46. Variation in influence of the method: Do people answer methods differently?
    • Not only does the method influence the distribution of answers,
    • the method effect also depends on the person.
  49. Definitions: Traits, Methods, and Persons
    • We can imagine the same question (“Trait”) being asked in different ways (“Methods”);
    • We can imagine the same method being used to ask different questions;
    • A response to a survey question is then a particular person’s answer to a Trait-Method combination.
  53. Definitions: Measurement error model
    1. Responses are a measure of some underlying score (“trait”), so that if a person’s memory were erased and the person re-interviewed, they should give a similar answer.
    2. Responses are influenced by random variation: errors, such as mistaking minutes for hours, but also variation in the information retrieved from memory.
    3. The method influences the answers on average, e.g. there might be more social desirability bias in one method than another, the scale may suggest some unspoken norm, etc.
    4. The influence of the method is different for different people: there is random variation in the differences between methods.
  54. Definitions: Modeling measurement error
  55. Definitions: Quasi-equation
    Response =
      Trait + Trait × Person: responses are a measure of some underlying score (“trait”), so that if a person’s memory were erased and the person re-interviewed, they should give a similar answer.
      + Person × Moment: responses are influenced by random variation: errors, such as mistaking minutes for hours, but also variation in the information retrieved from memory.
      + Method + Method × Trait: the method influences the answers on average, e.g. there might be more social desirability bias in one method than another, the scale may suggest some unspoken norm, etc.
      + Method × Person: the influence of the method is different for different people: random variation in the differences between methods.
  56. Definitions: Quasi-equation
    Response = Trait + Method + Trait × Method + Trait × Person + Method × Person + Person × Moment
  61. Definitions: Interpretation of the model
    If persons are a random sample from a population U, consider Person a random factor.
    1. The residual (“rest”) variance is called “random measurement error”.
    2. The proportion of residual variance in the total variance is called “unreliability” (1 − r²).
    3. The proportion of Method×Person variance in the systematic (non-residual) variance is called “common method variance” (sometimes “invalidity”) (1 − v²).
    4. The proportion of Trait×Person variance in the total variance is called the “quality” of the question (q² or κ).
    5. “Quality” (q² or κ) will equal v² · r².
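The definitions above can be sketched as a small calculation. This is a minimal illustration, not the authors' code: the function name `quality_coefficients` and the variance values are hypothetical, and it assumes that v² is taken relative to the non-residual (Trait×Person plus Method×Person) variance, which is the reading under which q² = r² · v² holds exactly.

```python
def quality_coefficients(trait_var, method_var, error_var):
    """Return (r2, v2, q2) from three variance components.

    Assumption (not stated in this form on the slides): v2 is the share of
    Trait x Person variance in the non-residual variance, so q2 = r2 * v2.
    """
    total = trait_var + method_var + error_var
    systematic = trait_var + method_var
    r2 = systematic / total        # reliability: 1 - residual share
    v2 = trait_var / systematic    # validity: 1 - method share of systematic variance
    q2 = trait_var / total         # quality: trait share of total variance
    return r2, v2, q2


# Hypothetical variance components, chosen only for illustration.
r2, v2, q2 = quality_coefficients(trait_var=0.6, method_var=0.1, error_var=0.3)
print(f"r2={r2:.3f}  v2={v2:.3f}  q2={q2:.3f}  r2*v2={r2 * v2:.3f}")
```

The product r² · v² cancels the systematic variance and leaves the Trait×Person share of the total, which is the quality as defined in item 4.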
  63. Formal model and assumptions: Model
    Response = Trait + Method + Trait × Method + Trait × Person + Method × Person + Person × Moment
    $Y_{ijk} = \tau_{ijk} + \eta_{ij} + \xi_{ik} + \epsilon_{ijk}$,
    where $i$ indexes persons, $j$ indexes traits, and $k$ indexes methods.
  64. Formal model and assumptions: Equation with interaction of Trait×Method with Trait×Person
    $Y_{ijk} = \tau_{ijk} + \lambda_{jk} \eta_{ij} + \xi_{ik} + \epsilon_{ijk}$,
    where $i$ indexes persons, $j$ indexes traits, and $k$ indexes methods.
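To make the equation concrete, the following sketch simulates responses from it. Everything here is an illustrative assumption rather than part of the talk: the function name `simulate_mtmm`, the variance values, normal distributions, independent traits, a fixed part τ that depends only on trait and method and is set to zero, and loadings λ all equal to one.

```python
import numpy as np


def simulate_mtmm(n_persons=20000, n_traits=3, n_methods=3,
                  trait_var=0.6, method_var=0.1, error_var=0.3, seed=0):
    """Simulate y[i, j, k] = tau[j, k] + lam[j, k] * eta[i, j] + xi[i, k] + eps[i, j, k]."""
    rng = np.random.default_rng(seed)
    # Trait x Person effects (traits drawn independently, a simplification).
    eta = rng.normal(0.0, np.sqrt(trait_var), size=(n_persons, n_traits))
    # Method x Person effects.
    xi = rng.normal(0.0, np.sqrt(method_var), size=(n_persons, n_methods))
    # Person x Moment residuals, one per person, trait and method.
    eps = rng.normal(0.0, np.sqrt(error_var), size=(n_persons, n_traits, n_methods))
    tau = np.zeros((n_traits, n_methods))   # fixed part, set to zero here
    lam = np.ones((n_traits, n_methods))    # loadings, all one here
    y = tau[None] + lam[None] * eta[:, :, None] + xi[:, None, :] + eps
    return y, eta, xi


y, eta, xi = simulate_mtmm()
# Same trait, different methods: the covariance reflects trait variance only.
print(np.cov(y[:, 0, 0], y[:, 0, 1])[0, 1])
# Different traits, same method: the covariance reflects method variance only.
print(np.cov(y[:, 0, 0], y[:, 1, 0])[0, 1])
```

Because the simulated traits are independent, any covariance between two different traits measured with the same method comes from the shared Method×Person term alone.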
  68. Formal model and assumptions: Assumptions in the model
    1. The (interaction) effects do not depend on other Method×Trait combinations a person might receive (“no carry-over effects”, “SUTVA”, “independence assumption”).
    2. There is no separate Person main effect: Trait and Method within Person already capture all within-person correlation (“method variance is the only systematic variance”, $\mathrm{Cov}_U(\epsilon_{ijk}, \xi_{ik}) = 0$ and $\mathrm{Cov}_U(\epsilon_{ijk}, \eta_{ij}) = 0$).
    Assumption 2 can sometimes be relaxed (Oberski et al. in Salzborn, Davidov & Reinecke (eds), 2012).
  72. Formal model and assumptions
    The parameters of interest in the model are
    • The variance over persons in the Trait effect;
    • The variance over persons in the Method effect.
    Expressed as proportions of the total variance over persons of $Y_{jk}$, these two quantities equal, respectively,
    • The reliability $\kappa_{jk}$ of a question asking Trait $j$ with Method $k$;
    • The correlation between two different questions that is purely due to them being measured with the same method.
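As a worked version of these two proportions, the block below writes them out under the model with loadings $\lambda_{jk}$. It is a sketch under two added assumptions that are not stated in this form in the talk: that $\eta_{ij}$, $\xi_{ik}$ and $\epsilon_{ijk}$ are mutually uncorrelated over persons, and that the two questions compared are otherwise uncorrelated, so that only the shared method contributes to their covariance.

```latex
% Assumes \eta_{ij}, \xi_{ik}, \epsilon_{ijk} are mutually uncorrelated over persons in U.
\begin{align*}
\operatorname{Var}_U(Y_{jk})
  &= \lambda_{jk}^2 \operatorname{Var}_U(\eta_{j})
   + \operatorname{Var}_U(\xi_{k})
   + \operatorname{Var}_U(\epsilon_{jk}), \\
\kappa_{jk}
  &= \frac{\lambda_{jk}^2 \operatorname{Var}_U(\eta_{j})}{\operatorname{Var}_U(Y_{jk})}, \\
\operatorname{Corr}_U(Y_{jk}, Y_{j'k})
  &= \frac{\operatorname{Var}_U(\xi_{k})}
          {\sqrt{\operatorname{Var}_U(Y_{jk}) \operatorname{Var}_U(Y_{j'k})}},
  \qquad j \neq j'.
\end{align*}
```

When the two questions have the same total variance, the last expression reduces to the proportion of Method variance in the total variance, which matches the second bullet above.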
  73. Estimation of measurement error with the MTMM design
  74. Design requirements: What design is needed to estimate this model?
    Response = Trait + Method + Trait × Method + Trait × Person + Method × Person + Person × Moment
    $Y_{ijk} = \tau_{ijk} + \eta_{ij} + \xi_{ik} + \epsilon_{ijk}$, where $i$ indexes persons, $j$ indexes traits, and $k$ indexes methods.
    • The model suggests that a Person×Method×Trait factorial experiment would allow for the estimation of the reliability and method variance.
    • The residual or “measurement error” term Person × Moment is estimated by the Person × Trait × Method interaction.
  75. Design requirements: What design is needed to estimate this model?
    • A Person×Method×Trait factorial experiment would ask the same question in different ways (Methods) and use different methods to ask the same questions, within each person;
    • Campbell and Fiske introduced such designs in 1959 under the name “multitrait-multimethod” (MTMM) experiment;
    • Not all Trait-Method combinations are necessary, but at least one repetition within each person is required (Saris, Satorra & Coenders, 2004);
    • Under the model and assumptions 1 and 2, the MTMM design will provide data that allow for the estimation of the reliability and method variance (“invalidity”).
  76. Design requirements: Example of an MTMM experiment
    On an average weekday, how much time, in total...
      T = 1: ...do you spend watching television?
      T = 2: ...do you spend listening to the radio?
      T = 3: ...do you spend reading the newspapers?
    Scales:
      M = 1: 8-point scale (hours)
      M = 2: Write in hours and minutes
      M = 3: 7-point scale with vague quantifiers
    Each respondent answered all three questions in two different ways. The repetition was given at the end of the interview (after approximately 50 minutes had passed).
  77. Estimation of the model: Estimation issues
    $Y_{ijk} = \tau_{ijk} + \lambda_{jk} \eta_{ij} + \xi_{ik} + \epsilon_{ijk}$.
    • The model can be estimated with regression (with Person a random factor);
      • This is not flexible enough: it gives little control over the covariance structure, and $\lambda_{jk}$ cannot be included.
    • The model can also be recognized as a factor analysis or, more generally, as a structural equation model (SEM),
      • or, through a transformation, as an IRT or latent class model.
    • The SEM framework allows enough flexibility to estimate the parameters of interest: trait, method and residual variance, or r², v², and quality q².
  78. Estimation of the model: The model as a SEM (or IRT or latent class) model
    [Figure: path diagram with method factors M1, M2, M3, trait factors T1, T2, T3, and observed variables y11, y21, y31, y12, y22, y32, y13, y23, y33.]
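Viewed as a factor model, a three-trait, three-method design implies a specific 9 × 9 covariance matrix for the observed variables. The sketch below builds that matrix with NumPy. It is an illustration under stated assumptions, not the estimation procedure used in the talk: the function name `implied_covariance`, the ordering of the observed variables (trait-major, i.e. y11, y12, y13, y21, ...), the parameter values, and the choice of uncorrelated method factors are all hypothetical.

```python
import numpy as np


def implied_covariance(trait_cov, method_var, error_var, loadings=None):
    """Model-implied covariance of the J*K observed variables.

    trait_cov:  (J, J) covariance matrix of the trait factors
    method_var: (K,)   variances of the (assumed uncorrelated) method factors
    error_var:  (J, K) residual variances
    loadings:   (J, K) trait loadings lambda_jk, defaulting to one
    Rows and columns are ordered trait-major: index = j * K + k.
    """
    trait_cov = np.asarray(trait_cov, dtype=float)
    method_var = np.asarray(method_var, dtype=float)
    error_var = np.asarray(error_var, dtype=float)
    n_traits, n_methods = error_var.shape
    if loadings is None:
        loadings = np.ones((n_traits, n_methods))
    n_obs = n_traits * n_methods
    load_trait = np.zeros((n_obs, n_traits))
    load_method = np.zeros((n_obs, n_methods))
    for j in range(n_traits):
        for k in range(n_methods):
            row = j * n_methods + k
            load_trait[row, j] = loadings[j, k]   # variable loads on its trait
            load_method[row, k] = 1.0             # and on its method
    return (load_trait @ trait_cov @ load_trait.T
            + load_method @ np.diag(method_var) @ load_method.T
            + np.diag(error_var.ravel()))


# Hypothetical parameter values for illustration only.
sigma = implied_covariance(
    trait_cov=[[0.6, 0.2, 0.1], [0.2, 0.6, 0.3], [0.1, 0.3, 0.6]],
    method_var=[0.1, 0.05, 0.2],
    error_var=np.full((3, 3), 0.3),
)
print(np.round(sigma, 2))
```

Fitting the model amounts to choosing the parameters so that this implied matrix is as close as possible to the observed covariance matrix, which is what SEM software does.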
  79. Estimation of the model: Another example
    Table 4: Experiment 2 of round 2
      Main questionnaire (“A/D”)
        Introduction: Using this card, please tell me how true each of the following statements is about your current job.
        Statements:
          - There is a lot of variety in my work
          - My job is secure
          - My health or safety is at risk because of my work
        Answer categories: not at all true; a little true; quite true; very true
      SC group 1 (IS)
        Introduction: The next 3 questions are about your current job.
        Statements:
          - Please choose one of the following to describe how varied your work is.
          - Please choose one of the following to describe how secure your job is.
          - Please choose one of the following to say how much, if at all, your work puts your health and safety at risk.
        Answer categories: not at all varied; a little varied; quite varied; very varied (same type of response scale using the terms secure and safe instead of varied)
      SC group 2 (IS)
        Statements:
          - Please indicate, on a scale of 0 to 10, how varied your work is, where 0 is not at all varied and 10 is very varied.
          - Now please indicate, on a scale of 0 to 10, how secure your job is, where 0 is not at all secure and 10 is very secure.
          - Please indicate, on a scale of 0 to 10, how much your health and safety is at risk from your work, where 0 is not at all at risk and 10 is very much at risk.
        Answer categories: horizontal 11-point scale, only labelled at the end points
    Source: Révilla, Saris & Krosnick (2010), “Comparing questions with agree/disagree response options to questions with item-specific response options”.
  80. Estimation of the model: Results from another example
    Table 5: The mean reliability, validity and quality of the three questions of experiment 2 in Round 2 of the ESS across 10 countries for the different methods (standard deviations in brackets)

    Method    Reliability r²          Validity v²             Quality q²
              Q1     Q2     Q3        Q1     Q2     Q3        Q1     Q2     Q3
    A/D(4)    .65    .59    .61       .99    .98    .99       .64    .58    .60
              (.09)  (.18)  (.15)     (.02)  (.03)  (.03)     (.10)  (.18)  (.15)
    IS(4)     .80    .80    .80       1      1      1         .80    .80    .80
              (.14)  (.13)  (.14)     (0)    (0)    (0)       (.14)  (.13)  (.14)
    IS(11)    .81    .83    .77       .98    .98    .98       .80    .82    .76
              (.09)  (.11)  (.12)     (.03)  (.03)  (.04)     (.10)  (.12)  (.14)

    Excerpt from the surrounding text of the paper (cut off at the start and end of the page):
    “[...] using a truth scale with the same number of categories for all three questions (around .7 to .9 versus .5 to .6). The position of the IS scale in the supplementary questionnaire is not an issue as the better quality of the IS scale is also observed both when it comes first and when it comes later. Possibly the order of the observations with the different scale types has an impact on the size of the differences since we see fewer differences in this second experiment than in the first, but this may also be linked to the subject matter of the experiments or to other characteristics of the methods used (such as the number of points). More research is needed to determine this, however the important point here is that in different combinations, the superiority of the IS in terms of [...] scale with 11 categories was also better than the IS scale with 4 categories. So, not only might the kind of scale (IS versus A/D) impact the total quality of a measure, but so might the length of the scale (number of response categories). However, it seems that this effect varies across countries.
    Experiments in Round 3 of the ESS
    In round 3 of the ESS again two SB-MTMM experiments have been done which allow the comparison of the IS scales with A/D scales. The attraction of these experiments is that [...]”
  81. Estimation of the model: Results from another example

    Quality q²   Q1     Q2     Q3
    A/D(4)       .64    .58    .60
                 (.10)  (.18)  (.15)
    IS(4)        .80    .80    .80
                 (.14)  (.13)  (.14)
    IS(11)       .80    .82    .76
                 (.10)  (.12)  (.14)
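As a quick arithmetic check on the relation q² = r² · v², the sketch below multiplies the mean reliabilities and validities reported in Table 5 of Révilla, Saris & Krosnick (2010) and compares the products with the reported mean qualities. The dictionary layout and the function name `largest_gap` are hypothetical. Exact agreement is not expected, since the table reports averages over 10 countries of rounded values, and the mean of a product need not equal the product of the means.

```python
# Mean r2, v2 and q2 for Q1, Q2, Q3, copied from Table 5.
TABLE5 = {
    "A/D(4)": {"r2": [0.65, 0.59, 0.61], "v2": [0.99, 0.98, 0.99], "q2": [0.64, 0.58, 0.60]},
    "IS(4)":  {"r2": [0.80, 0.80, 0.80], "v2": [1.00, 1.00, 1.00], "q2": [0.80, 0.80, 0.80]},
    "IS(11)": {"r2": [0.81, 0.83, 0.77], "v2": [0.98, 0.98, 0.98], "q2": [0.80, 0.82, 0.76]},
}


def largest_gap(table):
    """Largest absolute difference between r2 * v2 and the reported q2."""
    gaps = []
    for row in table.values():
        for r2, v2, q2 in zip(row["r2"], row["v2"], row["q2"]):
            gaps.append(abs(r2 * v2 - q2))
    return max(gaps)


for method, row in TABLE5.items():
    products = [round(r2 * v2, 3) for r2, v2 in zip(row["r2"], row["v2"])]
    print(method, "r2*v2 =", products, "reported q2 =", row["q2"])
print("largest gap:", round(largest_gap(TABLE5), 4))
```

The products stay within one hundredth of the reported qualities, so the table is consistent with quality being the product of reliability and validity.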
  84. Estimation of the model: Results from another example
    • It looks like there is much more measurement error (residual variance) in the agree-disagree questions than there is in the item-specific scales.
    • This was true over all countries (shown is the average over countries).
    • We still wonder whether the same would be found with other topics, under other conditions, and with other combinations of methods.
  85. Are some types of questions better than others?
  93. Description of the data
    • The examples given so far come from a much larger series of MTMM experiments;
    • In the European Social Survey (ESS), about six MTMM experiments are done in every round;
    • So far there have been five rounds (2002, 2004, 2006, 2008, and 2010);
    • The experiments are done in 20-30 European countries every two years;
    • The effective sample size per country is at least 1500;
    • Each experiment usually estimates the quality of 9 questions (Method-Trait combinations);
    • The range of topics is reasonably diverse, though factual questions are underrepresented;
    • In total about 5000 questions are available, but only 3000 of those will be used here for various reasons.
  95. Description of the data
      • In addition to the ESS, an older series of experiments also exists (F. Andrews; Költringer; Saris; Billiet, 1990s).
      • These add another 1089 questions for which reliability and validity coefficients have been estimated.
      • Combining the two datasets (ESS question qualities and old experiment qualities), we created a database of 3011 questions with their reliability and validity estimates.
  97. Description of the data: reliability and validity estimates of 3011 questions
      [Figure: two histograms. Left panel: reliability coefficient (x-axis, 0.4 to 1.0) against frequency (0 to 800). Right panel: validity coefficient (x-axis, 0.2 to 1.0) against frequency (0 to 1500).]
  98. Description of the data: logit transform of reliability and validity estimates
      [Figure: two histograms. Left panel: logit of the reliability coefficient (x-axis, 0 to 6) against frequency (0 to 800). Right panel: logit of the validity coefficient (x-axis, 0 to 6) against frequency (0 to 500).]
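The slide does not spell out the transform. A minimal sketch, assuming the standard logit, log(p / (1 - p)), is applied directly to each coefficient; the function names `logit` and `inv_logit` and the input values are illustrative and do not come from the talk:

```python
import math


def logit(p):
    """Map a coefficient in (0, 1) to the real line."""
    if not 0.0 < p < 1.0:
        raise ValueError("logit is only defined for values strictly between 0 and 1")
    return math.log(p / (1.0 - p))


def inv_logit(z):
    """Map a logit-scale value back to the (0, 1) coefficient scale."""
    return 1.0 / (1.0 + math.exp(-z))


# Illustrative values only, not estimates from the database.
for coefficient in (0.5, 0.8, 0.95):
    z = logit(coefficient)
    print(f"{coefficient:.2f} -> logit {z:.3f} -> back {inv_logit(z):.2f}")
```

One practical reason for modelling on this scale is that predictions made on the logit scale always map back into the interval (0, 1).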
  103. Description of the data: coding design characteristics of the 3011 questions
      • For each of the 3011 questions in all countries, a team of coders coded 40 design characteristics of the question.
      • Some codes were generated automatically by natural language processing software (syllables, words, etc.).
      • The coders were students, assistants to the local ESS coordinators, and two experts.
      • For the English source version, the experts double-coded the questions independently and then created consensus codes.
      • Non-expert codes were quality-controlled by detailed comparison with the consensus codes for the English source.
      • In a meeting between the experts and each of the other coders, the discrepancies were discussed and either corrected or left in as true differences.
  105. Description of the data
      • absolute • avgabs intro • avgabs total • avgsy total • avgwrd intro • avgwrd total • balance • centrality • computer.assisted • concept • country • domain • dont know • encourage • fixrefpoints • form basic • future • labels • instr interv • instr respon • interviewer • intr request • intropresent • knowledge • labels gramm • labels order • language • motivation • opinionother • past • position • questiontype • scal neutral • scale basic • scale corres • scale trange • scale urange • showc boxes • showc horiz • showc letter • showc over • showc quest • showc start • socdesir • stimulus • subjectiveop • symmetry • used WH word • usedshowcard • visual • from
  106. Description of the data
      Domain of question     # questions
      Internatl politics              64
      Health                         190
      Living conditions              453
      Other beliefs                  292
      Work                           469
      Personal relations             320
      Consumer behavior               34
      Leisure activts                131
      National gvt                   141
      Institutions                   284
      Political parties               30
      Trade unions                    12
      Economy                        237
      Other                          354
  107. Description of the data
      Concept of question    # questions
      Evaluative belief              713
      Feeling                        903
      Importance                      96
      Expectation                     39
      Facts, behavior                 63
      Judgement                      123
      Relationship                     8
      Evaluation                     704
      Norm                            57
      Policy                         250
      Right                            4
      Action tendency                 51
  111. Meta-analysis of the MTMM experiments: meta-analysis dataset
      • For each of the 3011 questions, we have in the database:
          • the estimated quality (reliability and validity coefficients);
          • about 50 design characteristics (through hand and automatic coding).
      • The next step was to relate the design characteristics to the quality estimates:
          • Can the quality estimates be predicted from the design characteristics?
  115. Meta-analysis of the MTMM experiments: meta-analysis
      • Prediction by random forests of regression trees (Breiman 2001).
      • Two separate models: one for the validity coefficients and one for the reliability coefficients.
      • Missing data are multiply imputed using the MICE algorithm (van Buuren & Groothuis-Oudshoorn 2011).
  117. Meta-analysis of the MTMM experiments: example regression tree for logit(reliability coefficient)
      [Figure: example regression tree for the reliability coefficient. Splits are on domain, gradation, position, concept, and ncategories. The root node has mean 1.955 (n=1988); node means range from 0.4959 (n=36) to 3.364 (n=54).]
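A regression tree of this kind is grown by repeatedly choosing the split that most reduces the within-node sum of squared errors. A minimal sketch of that single step for one numeric predictor follows; `sse` and `best_split` are hypothetical helper names, and the data are made up for illustration rather than taken from the SQP database:

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, sse_after_split) for the best binary split on x."""
    pairs = sorted(zip(x, y))
    best = (None, sse(y))  # no split: error of predicting the overall mean
    for i in range(1, len(pairs)):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold separates equal values
        threshold = (lo + hi) / 2
        left = [yy for xx, yy in pairs if xx < threshold]
        right = [yy for xx, yy in pairs if xx >= threshold]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (threshold, total)
    return best


# Made-up illustration: number of answer categories vs. logit(reliability).
ncategories = [2, 3, 4, 5, 7, 11]
logit_rel = [1.2, 1.3, 1.25, 2.4, 2.5, 2.6]
threshold, remaining = best_split(ncategories, logit_rel)
print(threshold, round(remaining, 3))
```

A full tree applies the same search recursively within each resulting node and across all predictors, stopping when nodes become too small or the improvement is negligible.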
  119. Meta-analysis of the MTMM experiments: meta-analysis with random forests
      • R² based on the out-of-bag (cross-validation) mean squared error is 85% for the validity coefficient and 60% for the reliability coefficient.
      • Importance measures indicate that domain, number of categories, concept, position in the questionnaire, number of syllables, country, number of words, fixed reference points, and other linguistic complexity measures are the most influential for reliability.
      • For validity, in addition to the above, the order of the labels (positive-negative), the centrality of the trait, and other characteristics are also important.
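To make the out-of-bag R² concrete, here is a deliberately simplified sketch. It bags one-split regression trees on a single predictor, which is a stand-in for a random forest rather than the model used in the talk (a real forest grows deep trees and samples predictors at each split). The data are synthetic, and `fit_stump` and `oob_r2` are hypothetical names:

```python
import random


def mean(values):
    return sum(values) / len(values)


def fit_stump(x, y):
    """Fit a one-split regression tree on one predictor; return a predictor."""
    best = None  # (sse, threshold, left_mean, right_mean)
    for threshold in sorted(set(x))[1:]:
        left = [yi for xi, yi in zip(x, y) if xi < threshold]
        right = [yi for xi, yi in zip(x, y) if xi >= threshold]
        lm, rm = mean(left), mean(right)
        err = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
        if best is None or err < best[0]:
            best = (err, threshold, lm, rm)
    if best is None:  # all x identical in this bootstrap sample
        overall = mean(y)
        return lambda xi: overall
    _, threshold, lm, rm = best
    return lambda xi: lm if xi < threshold else rm


def oob_r2(x, y, n_trees=200, seed=0):
    """R-squared from predictions made only by trees that did not see the case."""
    rng = random.Random(seed)
    n = len(x)
    oob_preds = [[] for _ in range(n)]
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        tree = fit_stump([x[i] for i in idx], [y[i] for i in idx])
        in_bag = set(idx)
        for i in range(n):
            if i not in in_bag:  # case i is out-of-bag for this tree
                oob_preds[i].append(tree(x[i]))
    scored = [i for i in range(n) if oob_preds[i]]
    y_bar = mean([y[i] for i in scored])
    mse = mean([(y[i] - mean(oob_preds[i])) ** 2 for i in scored])
    var = mean([(y[i] - y_bar) ** 2 for i in scored])
    return 1 - mse / var


# Synthetic data: a step in the outcome at 5 categories, plus noise.
data_rng = random.Random(1)
x = [data_rng.randint(2, 11) for _ in range(200)]
y = [1.2 + (1.2 if xi >= 5 else 0.0) + data_rng.gauss(0, 0.2) for xi in x]
r2 = oob_r2(x, y)
print(f"out-of-bag R^2: {r2:.2f}")
```

Because each case is predicted only by trees whose bootstrap sample excluded it, the resulting R² behaves like a cross-validated estimate rather than an in-sample fit.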
