About this talk
What this talk is about:
• The main concepts in surveys and questionnaires
• Some “best practices” and general principles
• There’s no way we can cover everything (not even a Ph.D. covers everything)
What this talk isn’t about:
• Statistical methods
• Sampling theory
• Scholarly literature
What you should get from this talk:
• The ability to constructively critique questionnaires
• The perspective needed to do better survey research
What is a survey?
1. Census != Survey
Census: measures an entire population
Survey: measures a sample representing a population
2. Survey != Questionnaire
Survey: a highly structured process of measuring the self-reported attitudes, opinions, beliefs, habits, and behaviors of a population via a sample
Questionnaire: the instrument used in a survey, distributed to the sample
Survey issues
Sampling: Who is the population? Is it possible to use the whole population? If not, how am I sampling? Is my method representative?
Design: Are all respondents getting the same survey? Or do I have multiple conditions?
Analysis: What are the data going to look like? How should I use counts or proportions? Are my results statistically significant?
Questionnaire issues
Question wording: Have I written this question using unambiguous language? Will every word be understood the same way by every respondent?
Response methods: What options should I give to the respondent? Should I use scales? Agree-disagree? Open-ended questions? Should I include no opinion/neutral?
Question ordering: Does it matter what order I put my questions or response options in?
The Construct
Constructs are theoretical variables that you can’t measure directly
• Examples: user satisfaction, attitude toward the mission
The questionnaire is the instrument used to measure constructs through observed variables
• Examples: Likert scales, feeling thermometers
Always consider the following: is my construct valid? Am I asking respondents questions that accurately measure this construct?
The Construct
Some things to think about for your construct:
• What’s the polarity? Does it have valence?
• How would I describe its continuum?
• What’s the dimensionality?
Questionnaires: Wording
If respondents don’t all understand your question in exactly the same way, or can’t all respond equally easily, you will get measurement error.
Vocabulary ambiguity: “Which of the following changes to Firefox would have the most impact on your experience?”
Double-barreled: “Did you know that Mozilla is a mission-driven organization to make the Internet a better place?”
Lack of balance: “Would you say that mobile Firefox is better than any other mobile browser available on the market?”
Prone to cognitive bias: “How strongly do you agree or disagree that Mozilla is a positive force for Internet privacy?”
Prone to satisficing: “Rank these 20 features in order of most useful to least.”
Questionnaires: Responses
1. Make it as easy as possible for every respondent to respond!
2. The response options should map as closely to the construct’s continuum as possible.
Questionnaires: Responses
“Can I use a rating scale?”
Unipolar measure = 5-pt scale (e.g. “Not at all” to “All the time”)
Bipolar measure = 7-pt scale with a neutral point (e.g. “Strongly agree” to “Strongly disagree”)
Questionnaires: Responses
“Should I enumerate my options or fully-label them?”
Fully-labeled, non-enumerated options for scales have been shown to be the most reliable. Remember, one respondent’s “3” might not be the same as another’s!
Questionnaires: Responses
“Should I include ‘don’t know’ / ‘no opinion’ / neutral points?”
Pro: You may get more accurate responses from low-knowledge respondents (or ones without opinions)
Con: You may see increased satisficing
Questionnaires: Responses
“Can I use ranking?”
Only with a few items, and only if you think all respondents will be able to clearly distinguish between all options.
What if most respondents don’t care about almost all of your options? What if they can’t pick the third most important item from three options that they consider equally important?
Most importantly, how are you going to do your analysis?
Questionnaires: Responses
“Can I use agree-disagree?”
Think about the eventual distribution of responses to these questions; it is almost always easier to agree than to disagree with statements.
It is harder to evaluate from a negative frame than from a positive one, so flipping the valence of a question might not help. There are, however, exceptions.
Questionnaires: Responses
“Should I ask for specific quantities?”
Humans are not very accurate at anything quantitatively specific. Stick to intervals and natural frequencies (“1 in 10”, not “10%”) as much as possible.
Questionnaires: Responses
“What kind of options should I use for habitual or behavioral questions?”
Humans are also bad at remembering their previous habits or behaviors. Use average time periods, e.g. “In an average day/week/month…”
Questionnaires: Responses
“When should I use open-ended questions?”
They are great for exploratory but not confirmatory research.
They are also useful if you don’t want to bias your respondents toward choosing options that they would not have thought of on their own.
“How many open-ended questions can I use?”
Thoughtful, deliberative responses are extremely taxing cognitively. If you want a good response rate, never make them mandatory.
If you must, use them sparingly: no more than 1-3, and try not to put them together.
Questionnaires: Ordering
Why should I care about the order of questions or responses?
Questions might have spillover influence on future responses: the answer to question x might affect responses to questions x + 1…n. This is why demographic questions tend to be put at the end of questionnaires.
Response option ordering might skew your distribution: people tend to focus more on earlier or later options, and spend less time evaluating middle options (primacy or recency effects).
One way to protect against ordering effects: randomization
• Blocks of questions: randomize between blocks and/or within blocks
• Response options: ranking, list ordering, polarity
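The randomization ideas on this slide can be sketched in a few lines of Python. This is only an illustration under assumptions: the question IDs, the block layout, and the function names `randomize_questionnaire` and `maybe_flip_scale` are made up for the example, and a real survey tool will usually provide its own randomization settings.

```python
import random

def randomize_questionnaire(blocks, rng, shuffle_blocks=True, shuffle_within=True):
    """Return a per-respondent ordering of question blocks.

    blocks: list of lists of question IDs (hypothetical names).
    rng: a random.Random instance, so each respondent's order can be reproduced.
    """
    ordered = [list(block) for block in blocks]  # copy; leave the input untouched
    if shuffle_within:
        for block in ordered:
            rng.shuffle(block)   # randomize question order within each block
    if shuffle_blocks:
        rng.shuffle(ordered)     # randomize the order of the blocks themselves
    return ordered

def maybe_flip_scale(options, rng):
    """Reverse the polarity of an ordered scale for about half of respondents.

    An ordered scale is flipped as a whole rather than shuffled, so that
    neighboring labels stay next to each other.
    """
    return list(reversed(options)) if rng.random() < 0.5 else list(options)

# Hypothetical question IDs, grouped into two blocks.
blocks = [
    ["q1_satisfaction", "q2_recommend", "q3_like"],
    ["q4_privacy", "q5_speed"],
]
agree_scale = ["Strongly agree", "Agree", "Neither", "Disagree", "Strongly disagree"]

rng = random.Random(42)  # e.g. seed with a respondent ID
order = randomize_questionnaire(blocks, rng)
scale = maybe_flip_scale(agree_scale, rng)
print(order)
print(scale)
```

Recording the seed (or the resulting order) with each response makes it possible to check for ordering effects later in the analysis.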
A few examples
Now we’re going to walk through some examples to show how question phrasing and response options can influence your conclusions.
Consider a classic example: how satisfied are users with a product?
A reasonable first approach: why don’t we just ask users how satisfied they are with the product?
A few examples
Not bad!
But what does satisfied mean? Do respondents have a set of features that they evaluate a product on? Does “satisfied” mean that the product is doing a better job on delivering those features than otherwise? Are people carefully considering each of these features when they evaluate a product for their level of satisfaction?
Maybe we should just ask about likelihood to recommend the product. After all, if they’re satisfied with it, they’re probably more likely to recommend it to other people they encounter who are in the market for a product like ours.
A few examples
Now we’re getting some interesting differences; more users say they’re willing to recommend this product than are satisfied.
At this point, we could do some interesting comparisons; who are these users who endorse one answer to the first question and then a different position on the second question?
But remember, we’re still trying to get to this idea of satisfaction. Clearly, there’s a bit of difference between satisfaction and willingness to recommend.
What if we just ask about likability? After all, both satisfaction and willingness to recommend presuppose that you generally like the product.
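The comparison suggested above, between answers to the satisfaction question and answers to the recommendation question, amounts to a cross-tabulation. Here is a minimal sketch, assuming each response is stored as a dictionary; the field names, answer labels, and the five responses are made up for illustration and are not data from the talk.

```python
from collections import Counter

def crosstab(responses, question_a, question_b):
    """Count how many respondents gave each pair of answers to two questions."""
    return Counter((r[question_a], r[question_b]) for r in responses)

# Made-up responses for illustration only; not data from the talk.
responses = [
    {"satisfaction": "Very satisfied", "recommend": "Very likely"},
    {"satisfaction": "Somewhat satisfied", "recommend": "Very likely"},
    {"satisfaction": "Somewhat satisfied", "recommend": "Very likely"},
    {"satisfaction": "Somewhat satisfied", "recommend": "Somewhat likely"},
    {"satisfaction": "Not satisfied", "recommend": "Not likely"},
]

table = crosstab(responses, "satisfaction", "recommend")
for (satisfaction, recommend), count in sorted(table.items()):
    print(f"{satisfaction:20} | {recommend:16} | {count}")
```

The cells where the two answers sit at different points on their scales (here, "Somewhat satisfied" paired with "Very likely") identify the respondents worth a closer look.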
A few examples
Likability shows a different distribution of responses than the other two questions! From this response, we see that more users report that they “like [the product] a great deal” than they report their satisfaction or their willingness to recommend.
From these three questions, we can get to a much better understanding of how attitudes towards the product can influence willingness to recommend it to others.
Let’s compare this to a well-known, widely used question for measuring customer satisfaction.
A few examples
This is an 11-pt, partially labeled, unipolar scale with a neutral point. Can you list all the problems with this approach?
A common way that people use this type of question: subtract the proportion of respondents who indicate 6 or lower from the proportion of respondents who indicate 9 or higher (apologies that the 7+ responses are lumped together in gray).
Note how the distribution of responses to this question does not give you the kind of insight that you would get from the previous three questions. Look at how all responses below “neutral” are lumped together.
Note how a single question would not capture the differences between willingness to recommend, satisfaction, and likability.
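The calculation described on this slide (often called a Net Promoter Score) can be sketched as follows, assuming ratings are integers from 0 to 10. The function name `net_score` and both sets of ratings are made up for illustration.

```python
def net_score(ratings):
    """Proportion rating 9 or higher minus proportion rating 6 or lower."""
    if not ratings:
        raise ValueError("need at least one rating")
    high = sum(1 for r in ratings if r >= 9)
    low = sum(1 for r in ratings if r <= 6)
    return (high - low) / len(ratings)

# Two made-up samples with very different distributions of responses.
mild = [9] * 5 + [6] * 5        # everyone sits close to the middle of the scale
polarized = [10] * 5 + [0] * 5  # half love the product, half hate it

print(net_score(mild))       # 0.0
print(net_score(polarized))  # 0.0
```

Both samples produce the same score, which illustrates the point above: lumping together every response of 6 or lower hides the shape of the distribution.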
Best Practices
1. Always write down your research goal. You should write it down in 2-3 sentences so that a stranger can understand it.
2. Verify that you can’t achieve your research goal through behavioral measures.
3. Try to make your research questions as clear as possible. This makes it easier to write your questionnaire to directly address your questions.
4. Work with at least one other person in creating your questionnaire.
5. Pretest your survey with naïve respondents.
6. Always think about the distribution of responses!
7. Don’t put too much emphasis on statistical significance. Remember, you can make anything significant with enough respondents.
8. Most importantly, it’s questionnaire design, not engineering. These aren’t rules, but guidelines to get better results!
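Point 7 can be illustrated with a small calculation. The sketch below is not a method from the talk: it assumes a standard two-sided, two-proportion z-test with equal group sizes and the normal approximation, and the proportions (51% vs. 50%) are made up. It holds a one-point difference fixed and varies only the number of respondents.

```python
import math

def two_proportion_p_value(p1, p2, n_per_group):
    """Two-sided p-value for a two-proportion z-test with equal group sizes.

    Uses the pooled standard error and the normal approximation; the observed
    proportions are taken as given for the sake of illustration.
    """
    pooled = (p1 + p2) / 2
    se = math.sqrt(2 * pooled * (1 - pooled) / n_per_group)
    z = (p1 - p2) / se
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))

# The same one-point difference at increasing sample sizes.
for n in (100, 10_000, 100_000):
    p = two_proportion_p_value(0.51, 0.50, n)
    print(f"n per group = {n:>7}: p = {p:.4g}")
```

The difference is the same size in every row; only the sample size changes, yet the p-value eventually drops below any conventional threshold. Whether a one-point difference matters is a separate question from whether it is statistically significant.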