What can paradata tell you about the
quality of web surveys?
Mario Callegaro Ph.D.
Senior Survey Research Scientist
User Insights team, Brand Studio
Google London
Qualtrics Converge Europe, London April 26, 2017
Disclaimer
The opinions expressed in this presentation are the author's own and do not reflect the views of
Google.
How do we know if a question works?
How do we know if a question measures what it is intended to measure?
How do we know if respondents understand the question and can appropriately respond to it?
What are paradata?
Paradata are data about the process of answering the survey itself
Taxonomy of paradata types
Paradata for web surveys can be classified into the following groups:
1. Direct paradata
• Contact-info
• Device-type paradata
• Questionnaire navigation paradata
2. Indirect paradata
• E.g. eye tracking, video recording, behavioral coding
Contact info paradata
Direct paradata: Contact info
• Outcomes of an email invitation
• Access to the questionnaire introduction page
• Last question answered before breakoff
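As a rough illustration of the last item, breakoffs can be tallied from the last question each respondent answered. The sketch below uses made-up respondent records; the names `last_answered`, `breakoff_counts`, and `breakoff_rate` are hypothetical and not tied to any survey platform.

```python
from collections import Counter

# Hypothetical data: respondent id -> last question answered.
# "Q5" is assumed to be the final question, so reaching it counts as a complete.
last_answered = {
    "r01": "Q5", "r02": "Q2", "r03": "Q5", "r04": "Q3",
    "r05": "Q3", "r06": "Q5", "r07": "Q1", "r08": "Q3",
}
FINAL_QUESTION = "Q5"


def breakoff_counts(last_by_respondent, final_question):
    """Count respondents whose last answered question was not the final one."""
    return Counter(
        q for q in last_by_respondent.values() if q != final_question
    )


def breakoff_rate(last_by_respondent, final_question):
    """Share of all starters who stopped before the final question."""
    counts = breakoff_counts(last_by_respondent, final_question)
    return sum(counts.values()) / len(last_by_respondent)


counts = breakoff_counts(last_answered, FINAL_QUESTION)
for question, n in counts.most_common():
    print(f"{question}: {n} breakoff(s)")
print(f"Overall breakoff rate: {breakoff_rate(last_answered, FINAL_QUESTION):.1%}")
```

A question with a disproportionate number of breakoffs is a candidate for closer review, though the count alone does not say why respondents stopped there.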
Survey breakoffs by question
[Chart: survey breakoffs by question, vertical axis 75 to 100; annotated item: permission asked to use school records (grades) for research purposes.]
(Sakshaug & Crawford, 2010; data courtesy of Sakshaug)
Device type paradata
Direct paradata: Device type
• User-agent string
• Screen resolution
• Browser window size
• JavaScript and Flash enabled
• IP address (mostly considered Personally Identifiable Information)
• GPS coordinates (mostly considered Personally Identifiable Information)
• Cookies
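To make the user-agent item concrete, here is a minimal sketch that sorts user-agent strings into coarse device classes. The function name `classify_device` and the substring rules are assumptions chosen for illustration; real user-agent strings vary widely, so a maintained parsing library would be more reliable in practice.

```python
def classify_device(user_agent):
    """Crude heuristic: map a user-agent string to a coarse device class."""
    ua = user_agent.lower()
    if "ipad" in ua or "tablet" in ua:
        return "tablet"
    if "mobi" in ua:
        return "smartphone"
    # Assumption: Android user agents without a "Mobile" token are tablets.
    if "android" in ua:
        return "tablet"
    return "desktop"


# Illustrative user-agent strings, not taken from any study.
samples = {
    "iphone": (
        "Mozilla/5.0 (iPhone; CPU iPhone OS 10_3 like Mac OS X) "
        "AppleWebKit/603.1.30 (KHTML, like Gecko) Version/10.0 "
        "Mobile/14E277 Safari/602.1"
    ),
    "ipad": (
        "Mozilla/5.0 (iPad; CPU OS 10_3 like Mac OS X) "
        "AppleWebKit/603.1.30 (KHTML, like Gecko) Version/10.0 "
        "Mobile/14E277 Safari/602.1"
    ),
    "windows": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36"
    ),
}

for label, ua in samples.items():
    print(label, "->", classify_device(ua))
```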
Device type: GPS coordinates example
Dayton, J. & H. Driscoll: The Next CAPI Evolution - Completing Web Surveys on Cell-Enabled iPads. AAPOR 2011
Device type: GPS coordinates example (cont.)
Dayton, J. & H. Driscoll: The Next CAPI Evolution - Completing Web Surveys on Cell-Enabled iPads. AAPOR 2011
Questionnaire navigation paradata
part 1
Direct paradata: Questionnaire navigation 1
Mouse clicks and mouse coordinates
Mouse clicks and their positions can be captured with JavaScript. Excessive mouse movements can
be a sign of problems with the question.
Change of answers
Changes of answers are an indicator of potential confusion with a question and can be used to improve
questionnaire design.
Typing and keystrokes
Typing and keystrokes can create an audit trail for each survey and can be used to detect unusual behavior
on both the respondent side and the interviewer side.
Questionnaire navigation paradata example
lXNtoilre7_2|1|M677|13|1320#
M548|174|830#
M160|101|1750#
M366|192|550#
M728|4|7690#
M489|247|610#
C493|229|3301#
R110|1#
C493|280|4301#
R110|3#
C493|345|3901#
R110|5#
C521|399|3801#
SU521|399|60|undefined#|
Stieger and Reips (2010, p. 1490)
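A log like the one above can be split into events with a few lines of code. The sketch below is only a reading of the string's visible structure: it assumes records end with `#`, fields are separated by `|`, the first two fields are a session identifier and a page number, and the leading letters stand for mouse movement (`M`), click (`C`), radio button (`R`), and submit (`SU`). Those meanings are assumptions and should be checked against Stieger and Reips (2010) before use.

```python
import re

RAW_LOG = (
    "lXNtoilre7_2|1|M677|13|1320#M548|174|830#M160|101|1750#M366|192|550#"
    "M728|4|7690#M489|247|610#C493|229|3301#R110|1#C493|280|4301#R110|3#"
    "C493|345|3901#R110|5#C521|399|3801#SU521|399|60|undefined#|"
)

# Assumed event codes: SU = submit, M = mouse move, C = click, R = radio button.
EVENT = re.compile(r"^(SU|M|C|R)(.*)$")


def parse_log(raw):
    """Split a raw log into (session_id, page, [(code, fields), ...])."""
    records = [r for r in raw.split("#") if r.strip("|")]
    # Assumption: the first record is prefixed with session id and page number.
    session_id, page, first_event = records[0].split("|", 2)
    records[0] = first_event
    events = []
    for record in records:
        match = EVENT.match(record)
        if match is None:
            continue  # skip anything that does not look like a known event
        code, rest = match.groups()
        events.append((code, rest.split("|")))
    return session_id, page, events


def count_answer_changes(events):
    """Per radio element, count selections that differ from the previous one."""
    last_value, changes = {}, {}
    for code, fields in events:
        if code != "R":
            continue
        element, value = fields[0], fields[1]
        if element in last_value and last_value[element] != value:
            changes[element] = changes.get(element, 0) + 1
        last_value[element] = value
    return changes


session_id, page, events = parse_log(RAW_LOG)
print(session_id, page, len(events))
print(count_answer_changes(events))
```

Under these assumptions the respondent selected three different values for the same radio element before submitting, which is the kind of answer change discussed on the previous slide.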
Change of answers example (Haraldsen et al., 2005)
Fully labeled vs. polar point vs. polar point with numbers vs. answer box
Stern (2008, p. 384)
Fully labeled vs. polar point vs. polar point with numbers vs. answer box
[Bar chart: mean ratings by response format (Fully labeled, Polar point, Polar point w/ #'s, Answer box); data labels in extracted order: 2, 2, 2, 3; axis 1 to 5.]
Stern (2008) & Christian (2003)
Fully labeled vs. polar point vs. polar point with numbers vs. answer box
[Bar chart: % of reciprocal changes by response format (Fully labeled, Polar point, Polar point w/ #'s, Answer box); data labels in extracted order: 2, 7, 6, 8; axis 0 to 10.]
Stern (2008)
Questionnaire navigation paradata
part 2
Direct paradata: Questionnaire navigation 2
Order of answering
On a page with multiple questions, the order of answering is an indicator of how the respondent reads
the questions
Movements across the questionnaire (forward/backward)
If the questionnaire allows going backward or going forward by skipping questions, unusual
movements are a symptom of issues with the questionnaire or the respondent
Scrolling
The amount of scrolling depends on the screen size of the device used and on the size of the
browser window used by the respondent
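As one way to look at movements across the questionnaire, the sketch below counts backward moves and forward skips in a sequence of visited page numbers. The visit sequence and the function names are hypothetical; what counts as "unusual" remains a judgment for the analyst.

```python
def count_backward_moves(page_sequence):
    """Number of transitions to a lower page number than the previous one."""
    return sum(
        1 for prev, cur in zip(page_sequence, page_sequence[1:]) if cur < prev
    )


def count_forward_skips(page_sequence):
    """Number of forward transitions that jump over at least one page."""
    return sum(
        1 for prev, cur in zip(page_sequence, page_sequence[1:]) if cur > prev + 1
    )


# Hypothetical navigation trail for one respondent (page numbers in visit order).
visits = [1, 2, 3, 2, 3, 4, 6, 5, 6, 7]

print("backward moves:", count_backward_moves(visits))
print("forward skips:", count_forward_skips(visits))
```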
Time latency paradata
Time spent per question/screen
Time latency is the most published topic in paradata research.
Studies focus on these major themes:
• Attitude strength
• Response uncertainty
• Question wording
• Response error (e.g. speeding)
• Satisficing / Optimizing
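Speeding, for instance, can be screened with a simple rule on total response time. The sketch below flags respondents whose total time falls under a fraction of the median; the data are invented and the 0.5 cutoff is an arbitrary assumption for illustration, not a recommended threshold.

```python
from statistics import median

# Hypothetical paradata: seconds spent on each of four questions, per respondent.
seconds_per_question = {
    "r01": [12.0, 8.5, 15.0, 9.5],
    "r02": [2.0, 1.5, 2.5, 2.0],
    "r03": [10.0, 9.0, 14.0, 11.0],
    "r04": [11.5, 7.5, 13.0, 10.0],
    "r05": [30.0, 12.0, 20.0, 18.0],
}
SPEEDING_FRACTION = 0.5  # assumed cutoff: under half the median total time


def flag_speeders(times, fraction):
    """Return respondent ids whose total time is below fraction * median total."""
    totals = {rid: sum(secs) for rid, secs in times.items()}
    cutoff = fraction * median(totals.values())
    return sorted(rid for rid, total in totals.items() if total < cutoff)


print(flag_speeders(seconds_per_question, SPEEDING_FRACTION))
```

Using the median rather than the mean keeps a few very slow respondents from inflating the cutoff.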
Order of response categories:
Positive vs. negative orientation
POSITIVE
How accessible have your
instructors been both in and
outside of class?
Very accessible
Somewhat accessible
Neutral
Somewhat inaccessible
Very inaccessible
Don’t know
NEGATIVE
How accessible have your
instructors been both in and
outside of class?
Very inaccessible
Somewhat inaccessible
Neutral
Somewhat accessible
Very accessible
Don’t know
Christian, Parsons & Dillman (2009)
Positive vs. negative orientation
[Bar chart: results in % for positive order vs. negative order; axis 0 to 50.]
Christian, Parsons & Dillman (2009)
Positive vs. negative orientation
[Bar chart: time spent answering the question for positive order vs. negative order; axis 0 to 2.4.]
Christian, Parsons & Dillman (2009)
Privacy and ethical issues in collecting paradata
Should we tell respondents we are collecting paradata?
What happens when we tell respondents we are collecting paradata and we ask permission to use
them?
• 59.5% agreed in the LISS Dutch panel (across experimental manipulations)
• 65.6% agreed in the Knowledge Networks U.S. panel (across experimental manipulations)
• 69.3% agreed in a U.S. volunteer non-probability panel (across experimental manipulations)
(Couper and Singer, 2013, studies done using vignettes)
Conclusions & references
Conclusions on paradata
• The amount of paradata that can be collected grows as technological capabilities grow
• Although paradata can be collected “easily” and at a low cost, we should not underestimate the
cost of managing and analysing paradata (Nicolaas, 2011)
• Paradata should not replace other ways of pretesting the questionnaire because they do not
answer all the research questions
• Paradata analysis is another tool to use in assessing the quality of a survey and in making
improvements to the questionnaire and the entire online survey experience
References on Paradata for web surveys
Callegaro, M. (2013). Paradata in web surveys (Chapter 11). In F. Kreuter (Ed.), Improving surveys with paradata: Analytic use of process information (pp. 261–279). Hoboken, NJ: Wiley. PDF available at http://research.google.com/pubs/MarioCallegaro.html
Callegaro, M., Lozar Manfreda, K., & Vehovar, V. (2015). Web survey methodology. London: Sage.
Q & A
