Working with Survey Data
Cameron Raynor
March 29, 2022
Background
Founder and Principal of RA2 in Calgary
Apply data analysis, research and digital tools for
NGOs, political groups and brands
- Political polling
- Data strategy
- Stakeholder/member relations
Survey research, NLP/machine learning, social
network analysis
Survey Data
Goal:
- Estimate opinions, attitudes, beliefs,
values, behaviour, etc. for a population
- Accurately assess the variable of interest
for the whole population, not just the
survey respondents
[Figure: Population vs. Sample. Source: Stats and R blog]
Survey Error
Survey error comes from many sources
The most commonly reported error is the margin of error, which reflects random sampling error only
● This accounts for only part of the total error
Other sources of error come down to how the data were collected, how questions were asked, etc.
● These are more difficult to estimate
[Figure: Total Survey Error. Source: Biemer (2010), Total Survey Error: Design, Implementation, and Evaluation]
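The reported margin of error is usually computed for a proportion, assuming simple random sampling and a normal approximation. The minimal Python sketch below is for illustration only, because the deck's own examples use R. The function name and the 95% default (z = 1.96) are assumptions of this sketch. The result covers random sampling error only, not the other sources listed above.

```python
import math


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of a normal-approximation confidence interval for a proportion.

    Assumes a simple random sample of size n; z = 1.96 corresponds to ~95% confidence.
    Captures random sampling error only, not the other components of total survey error.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a proportion between 0 and 1")
    if n <= 0:
        raise ValueError("n must be a positive sample size")
    return z * math.sqrt(p * (1.0 - p) / n)


# A 50/50 split among 1,000 respondents (the widest margin for a given n).
print(f"+/- {100 * margin_of_error(0.5, 1000):.1f} points")  # prints "+/- 3.1 points"
```

For weighted, clustered or stratified designs, this figure is commonly multiplied by the square root of the design effect.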
Practical Considerations
Probability vs. non-probability samples
● Probability samples are the gold standard
● In probability samples, each member of the population has a known, non-zero chance of being selected for the sample
● Newer methods fall somewhere in between, e.g., probability panels
Stratification and quotas
● Reduce bias in the sample
● Not a silver-bullet solution
Cost considerations
● Convenience samples are typically much cheaper than probability samples
Convenience considerations
● Some methods take longer to field
● Some groups may not be easy to reach
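Stratified samples and quota samples both start by deciding how many interviews each group should get. One common way to do this is proportional allocation. The Python sketch below assumes whole-number quotas allocated with the largest-remainder method. The function name and the age-group counts are hypothetical and are not census figures.

```python
def proportional_quotas(population: dict[str, int], n: int) -> dict[str, int]:
    """Split a target sample size n across strata in proportion to population counts.

    Uses the largest-remainder method so the quotas are whole interviews summing to n.
    """
    total = sum(population.values())
    exact = {k: n * count / total for k, count in population.items()}
    quotas = {k: int(x) for k, x in exact.items()}  # round every stratum down first
    shortfall = n - sum(quotas.values())
    # Give the leftover interviews to the strata with the largest fractional remainders.
    by_remainder = sorted(exact, key=lambda k: exact[k] - quotas[k], reverse=True)
    for k in by_remainder[:shortfall]:
        quotas[k] += 1
    return quotas


# Hypothetical population counts by age group (illustration only).
population = {"18-34": 27_000, "35-54": 33_500, "55+": 39_500}
print(proportional_quotas(population, 650))
```

Quotas only control the sample's composition on the variables you set them for. Respondents within each cell can still differ from the population in ways that were not measured, which is one reason quotas are not a silver bullet.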
Survey Data Quality
Nonsense/Fraudulent Responses:
● Satisficing: respondents take mental shortcuts
● Respondents may not be paying attention
● Some may just want the survey incentive (if applicable)
● Some could be malicious, trying to distort survey results
Quality Control Checks
● Straightlining: Respondent gives the same answer to every question in a grid
● Speeding: Respondent completes the survey in implausibly fast ("superhuman") time
● Trap Questions: Respondent selects implausible answers or doesn't follow instructions
Fatigue Leads to Satisficing
● Shorter is better
● Very short (fewer than 5 questions) is ideal
● Data quality drops significantly after ~20 minutes (YMMV)
For more on trap questions:
Liu and Wronski 2018: Trap questions in online surveys: Results from three web survey experiments
Kung et al. 2018: Are Attention Check Questions a Threat to Scale Validity?
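The three quality control checks can each be expressed as a simple rule on a respondent's record. The Python sketch below is hypothetical. The `Response` record layout and both thresholds are assumptions, not standards: speeding means finishing in less than half the median time, and straightlining is only checked on grids with at least three items.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Response:
    respondent_id: str
    duration_seconds: float
    grid_answers: list[int]  # answers to one grid/matrix question, e.g. a 1-5 agreement scale
    trap_answer: int         # answer to an instructed-response ("select 2 for this item") question


def quality_flags(responses, expected_trap_answer, speed_ratio=0.5, min_grid_items=3):
    """Return {respondent_id: [flag, ...]} for straightlining, speeding and failed trap questions."""
    median_duration = median(r.duration_seconds for r in responses)
    flags = {}
    for r in responses:
        found = []
        # Straightlining: the same answer to every item in a grid of reasonable length.
        if len(r.grid_answers) >= min_grid_items and len(set(r.grid_answers)) == 1:
            found.append("straightlining")
        # Speeding: finished in less than speed_ratio times the median completion time.
        if r.duration_seconds < speed_ratio * median_duration:
            found.append("speeding")
        # Trap question: did not follow the embedded instruction.
        if r.trap_answer != expected_trap_answer:
            found.append("failed_trap")
        flags[r.respondent_id] = found
    return flags


# Hypothetical respondents for illustration.
responses = [
    Response("r1", 600, [4, 2, 5, 3], 2),
    Response("r2", 580, [3, 3, 3, 3], 2),
    Response("r3", 120, [1, 4, 2, 5], 2),
    Response("r4", 640, [2, 3, 2, 4], 5),
]
for rid, found in quality_flags(responses, expected_trap_answer=2).items():
    print(rid, found or "ok")
```

One reasonable approach is to review flagged cases, or to remove only respondents with several flags, rather than dropping anyone on a single flag.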
Missing Data
Missing Completely at Random (MCAR):
● The best-case scenario
● Whether a value is missing is unrelated to any variable, observed or not, so missingness does not change the distribution of responses
Missing Not at Random (MNAR):
● There is a pattern to the missingness that depends on the unobserved value itself (e.g., people with higher incomes skipping an income question)
● Could indicate a larger issue with data collection
● May indicate response bias (social desirability bias, etc.) or bias in the sample
Missing at Random (MAR):
● Missingness is related to other observed variables, but not to the value you're measuring once those variables are accounted for
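MCAR is the best case because simply analysing the complete cases still gives an unbiased estimate, while MNAR does not. The Python simulation below shows this difference. It uses a synthetic, hypothetical "income" variable. The 30% missingness rate and the 60%/10% skip rates are arbitrary assumptions.

```python
import random
from statistics import fmean


def complete_case_means(n=100_000, seed=2022):
    """Compare the complete-case mean with the true mean under MCAR and MNAR missingness."""
    rng = random.Random(seed)
    # Hypothetical "income" variable; synthetic data, not survey results.
    values = [rng.gauss(60_000, 15_000) for _ in range(n)]
    true_mean = fmean(values)

    # MCAR: every value has the same 30% chance of being missing, regardless of anything.
    mcar_observed = [v for v in values if rng.random() >= 0.30]

    # MNAR: higher values are skipped more often, so missingness depends on the value itself.
    mnar_observed = [v for v in values if rng.random() >= (0.60 if v > 60_000 else 0.10)]

    return true_mean, fmean(mcar_observed), fmean(mnar_observed)


true_mean, mcar_mean, mnar_mean = complete_case_means()
print(f"true {true_mean:,.0f} | MCAR complete cases {mcar_mean:,.0f} | MNAR complete cases {mnar_mean:,.0f}")
```

Under MCAR the complete-case mean lands very close to the true mean. Under MNAR it is pulled well below it, and no amount of extra sample fixes that.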
Imputation with MICE
There are many methods to deal with missing data (complete-case analysis, nearest-neighbour imputation, mean or median imputation, etc.)
MICE is a top performer
● Stands for Multivariate Imputation by Chained Equations
● Uses the other variables in the dataset to estimate missing values
● Generates "plausible synthetic values"
From the documentation, by default the method uses:
● pmm, predictive mean matching (numeric data)
● logreg, logistic regression imputation (binary data, factor with 2 levels)
● polyreg, polytomous regression imputation for unordered categorical data (factor > 2 levels)
● polr, proportional odds model for ordered categorical data (> 2 levels)
General rule of thumb: only impute when up to about 5% of the data is missing
Great documentation at:
https://www.rdocumentation.org/packages/mice/versions/3.13.0/topics/mice
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3074241/
Image designed by Jaden M. Walters
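The "chained equations" idea works like this. Start with a crude fill for each missing value. Then cycle through the incomplete variables, re-imputing each one from a model fitted on the others, and repeat. The Python sketch below is a deliberately stripped-down, single-imputation version of that loop, not the R mice package. It uses plain linear-regression predictions instead of predictive mean matching, makes no random draws, and returns one completed dataset. By contrast, mice() by default draws imputations stochastically and produces m = 5 datasets. All function names here are hypothetical.

```python
from statistics import fmean


def _solve(a, b):
    """Solve the square system a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    size = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(size):
            if r != col:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][size] for i in range(size)]


def _ols(X, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    design = [[1.0] + list(row) for row in X]
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k)]
    return _solve(xtx, xty)


def chained_impute(rows, n_iter=10):
    """Fill None entries in a numeric table by cycling regressions of each column on the others."""
    n_cols = len(rows[0])
    was_missing = [[v is None for v in row] for row in rows]
    data = [list(row) for row in rows]  # copy so the input table is left untouched

    # Step 1: start from a crude fill (the column mean of observed values).
    for j in range(n_cols):
        col_mean = fmean(row[j] for row in rows if row[j] is not None)
        for row in data:
            if row[j] is None:
                row[j] = col_mean

    # Step 2: repeatedly re-impute each incomplete column from the current values of the others.
    for _ in range(n_iter):
        for j in range(n_cols):
            observed = [i for i in range(len(data)) if not was_missing[i][j]]
            missing = [i for i in range(len(data)) if was_missing[i][j]]
            if not missing:
                continue

            def predictors(i):
                return [data[i][c] for c in range(n_cols) if c != j]

            beta = _ols([predictors(i) for i in observed], [data[i][j] for i in observed])
            for i in missing:
                data[i][j] = beta[0] + sum(b * x for b, x in zip(beta[1:], predictors(i)))
    return data


# Toy data (hypothetical): column 1 is exactly 2 * column 0 + 1, with two values missing.
table = [[1.0, 3.0], [2.0, 5.0], [3.0, None], [4.0, 9.0], [5.0, None]]
for row in chained_impute(table):
    print(row)
```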
MICE Imputation Example in R
Using population data from a Fall 2021 political research poll
Makes some simplifying assumptions for demonstration purposes:
● Ignoring stratification
● Assuming MCAR (Missing Completely at Random)
Weighting with Post-Stratification
Propensity weighting
● Adjusts the survey sample to known population parameters
● Weights each respondent by the inverse of their probability of selection to remove bias
● With probability samples, selection probabilities are known
● With non-probability samples, probabilities must be estimated
Raking
● The algorithm iteratively adjusts weights until the weighted survey distributions match known population distributions
● Implemented in R by the rake() function in the survey package and the anesrake() function in the anesrake package
● One of the most common (if not the most common) weighting methods used by researchers and pollsters
● Only requires knowledge of the marginal population distributions (e.g., age and region totals separately, not their cross-tabulation)
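Raking is also called iterative proportional fitting. The plain-Python sketch below shows the core loop. For each weighting variable in turn, every respondent's weight is scaled by (target share) / (current weighted share) for their category. This repeats until no adjustment is needed. The optional `base_weights` argument can hold inverse-probability weights, as in the propensity bullets. This is not the survey or anesrake implementation. It assumes every target category has at least one respondent and that shares sum to 1 per variable. It also omits weight trimming and only returns the weights. The respondents and population shares below are hypothetical.

```python
def rake(sample, margins, base_weights=None, max_iter=1000, tol=1e-10):
    """Raking (iterative proportional fitting) of survey weights to population margins.

    sample:       list of dicts, one per respondent, e.g. {"age": "18-34", "region": "East"}
    margins:      {variable: {category: population share}}, shares summing to 1 per variable
    base_weights: optional starting weights, e.g. 1 / (selection or propensity probability)
    """
    weights = list(base_weights) if base_weights is not None else [1.0] * len(sample)
    total = sum(weights)
    for _ in range(max_iter):
        largest_adjustment = 0.0
        for var, targets in margins.items():
            # Current weighted total of each category of this variable.
            current = {cat: 0.0 for cat in targets}
            for person, w in zip(sample, weights):
                current[person[var]] += w
            # Scale every respondent so this variable's weighted shares hit the targets.
            for i, person in enumerate(sample):
                factor = targets[person[var]] * total / current[person[var]]
                largest_adjustment = max(largest_adjustment, abs(factor - 1.0))
                weights[i] *= factor
        if largest_adjustment < tol:
            break
    return weights


# Hypothetical respondents and population shares, for illustration only.
sample = [
    {"age": "18-34", "region": "East"},
    {"age": "18-34", "region": "West"},
    {"age": "35+", "region": "East"},
    {"age": "35+", "region": "East"},
    {"age": "35+", "region": "West"},
]
margins = {
    "age": {"18-34": 0.5, "35+": 0.5},
    "region": {"East": 0.4, "West": 0.6},
}
weights = rake(sample, margins)
print([round(w, 3) for w in weights])
```

Only the one-way margins are used, which is why raking needs no cross-tabulated population data.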
Post-Stratification Raking Example in R
Using population data from:
https://www12.statcan.gc.ca/census-recensement/2016/dp-pd/prof/details/download-telecharger/comp/page_dl-tc.cfm?Lang=E
Thank you!
✉ cam@ra2.io
🔗 https://www.linkedin.com/in/cameronraynor/
🌐 https://www.ra2.io

DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 

Working with survey data with Cameron Raynor

  • 1. Working with Survey Data. Cameron Raynor, March 29, 2022
  • 2. Background. Founder and Principal of RA2 in Calgary. Applies data analysis, research and digital tools for NGOs, political groups and brands: - Political polling - Data strategy - Stakeholder/member relations. Survey research, NLP/machine learning, social network analysis.
  • 3. Survey Data. Goal: - Estimate opinions, attitudes, beliefs, values, behaviour, etc. for a population - Accurately assess the variable of interest for the whole population, not just the survey respondents. Figure: Population vs. Sample (source: Stats and R blog)
  • 4. Survey Error. Survey error comes from many sources. The most commonly reported error is the margin of random sampling error ● This accounts for only part of the error. Other sources of error come down to how the data was collected, how questions were asked, etc. ● These are more difficult to estimate. Figure: Total Survey Error (source: Biemer (2010), Total Survey Error: Design, Implementation, and Evaluation)
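The margin of random sampling error is usually computed for a proportion as z·√(p(1−p)/n). The Python sketch below (the talk's own examples use R) assumes a simple random sample and a 95% confidence level (z = 1.96). It captures none of the non-sampling error sources in the Total Survey Error framework and ignores design effects from weighting or clustering; the function names are illustrative.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Margin of random sampling error for a proportion p estimated from a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

def sample_size_for(moe, p=0.5, z=1.96):
    """Smallest simple-random-sample size whose margin of error is at most moe (p=0.5 is the worst case)."""
    return math.ceil(z ** 2 * p * (1 - p) / moe ** 2)

if __name__ == "__main__":
    # A poll of 1,000 respondents with a 50/50 split: roughly +/- 3.1 percentage points
    print(f"{margin_of_error(0.5, 1000):.4f}")
    # Respondents needed for +/- 3 points in the worst case
    print(sample_size_for(0.03))
```

Because the error shrinks with √n, halving the margin of error takes roughly four times as many respondents, which is one reason the cost considerations on the next slide matter.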
  • 5. Practical Considerations. Probability vs. non-probability samples ● Probability samples are the gold standard ● In probability samples, each member of the population has a known, nonzero chance of being included in the sample ● Newer methods fall somewhere in between, e.g. probability panels. Stratification and quotas ● Reduce bias in the sample ● Not a silver-bullet solution. Cost considerations ● Convenience samples are typically much cheaper than probability samples. Convenience considerations ● Some methods take longer to field ● Some groups may not be easy to reach
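In a stratified probability sample every unit's chance of selection is known, which is what makes weighting by its inverse possible. Below is a minimal Python sketch of a proportionate stratified random sample; the frame, the regions and the function names are hypothetical, and largest-remainder rounding is just one way to turn proportional shares into whole numbers. Each selected unit carries a design weight N_h / n_h, the inverse of its selection probability n_h / N_h.

```python
import random

def proportional_allocation(stratum_sizes, n):
    """Split a total sample size n across strata in proportion to population size (largest-remainder rounding)."""
    total = sum(stratum_sizes.values())
    exact = {h: n * size / total for h, size in stratum_sizes.items()}
    alloc = {h: int(x) for h, x in exact.items()}
    leftover = n - sum(alloc.values())
    # Give the remaining units to the strata with the largest fractional parts
    for h in sorted(exact, key=lambda h: exact[h] - alloc[h], reverse=True)[:leftover]:
        alloc[h] += 1
    return alloc

def stratified_sample(frame, stratum_of, n, seed=0):
    """Draw a proportionate stratified random sample; return (unit, design_weight) pairs."""
    rng = random.Random(seed)
    strata = {}
    for unit in frame:
        strata.setdefault(stratum_of(unit), []).append(unit)
    alloc = proportional_allocation({h: len(units) for h, units in strata.items()}, n)
    sample = []
    for h, units in strata.items():
        if alloc[h] == 0:
            continue  # stratum too small to receive any of the sample
        weight = len(units) / alloc[h]  # inverse of the selection probability n_h / N_h
        sample.extend((unit, weight) for unit in rng.sample(units, alloc[h]))
    return sample

if __name__ == "__main__":
    # Hypothetical sampling frame: 1,000 people in three regions
    frame = ([(i, "North") for i in range(600)]
             + [(i, "South") for i in range(600, 900)]
             + [(i, "East") for i in range(900, 1000)])
    sample = stratified_sample(frame, stratum_of=lambda unit: unit[1], n=100)
    print(len(sample), sum(w for _, w in sample))
```

The design weights sum to the population size, so weighted totals estimate population totals. Quota samples look similar on the surface, but selection within each quota cell is not random, so no such selection probabilities exist.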
  • 6. Survey Data Quality. Nonsense/fraudulent responses ● Satisficing: respondents take mental shortcuts ● Respondents may not be paying attention ● Respondents may just want the survey incentive (if applicable) ● Responses could be malicious, intended to distort survey results. Quality control checks ● Straightlining: the respondent chooses the same answer for every question in a grid ● Speeding: the respondent completes the survey in superhuman time ● Trap questions: the respondent selects implausible answers or doesn't follow instructions. Fatigue leads to satisficing ● Shorter is better ● Very short (fewer than 5 questions) is ideal ● Data quality drops significantly after ~20 minutes (YMMV). For more on trap questions: Liu and Wronski (2018), Trap questions in online surveys: Results from three web survey experiments; Kung et al. (2018), Are Attention Check Questions a Threat to Scale Validity?
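The three quality control checks above are straightforward to automate. A minimal Python sketch, assuming each response is a dict with hypothetical keys: an `id`, the grid items, a trap item with one correct answer, and a completion time in seconds. The speeding rule used here, "faster than half the median completion time", is an illustrative threshold rather than a standard.

```python
from statistics import median

def quality_flags(responses, grid_items, trap_item, trap_answer, speed_ratio=0.5):
    """Return {respondent id: list of flags} for straightlining, speeding and failed trap questions."""
    median_duration = median(r["duration_sec"] for r in responses)
    flags = {}
    for r in responses:
        found = []
        # Straightlining: one distinct answer across every item in the grid
        if len({r[item] for item in grid_items}) == 1:
            found.append("straightlining")
        # Speeding: completion time well below the median respondent's
        if r["duration_sec"] < speed_ratio * median_duration:
            found.append("speeding")
        # Trap question, e.g. "Please select 'Strongly disagree' for this item"
        if r[trap_item] != trap_answer:
            found.append("failed_trap")
        flags[r["id"]] = found
    return flags

if __name__ == "__main__":
    grid = ["q1", "q2", "q3"]
    responses = [
        {"id": 1, "q1": 4, "q2": 2, "q3": 5, "trap": "Strongly disagree", "duration_sec": 600},
        {"id": 2, "q1": 3, "q2": 3, "q3": 3, "trap": "Strongly disagree", "duration_sec": 620},
        {"id": 3, "q1": 5, "q2": 5, "q3": 5, "trap": "Agree", "duration_sec": 150},
        {"id": 4, "q1": 1, "q2": 4, "q3": 2, "trap": "Agree", "duration_sec": 580},
    ]
    for rid, found in quality_flags(responses, grid, "trap", "Strongly disagree").items():
        print(rid, found)
```

Flags work better as prompts for review than as automatic exclusions: respondent 2 may have honestly answered 3 to every item in a short grid, while respondent 3 trips all three checks.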
  • 7. Missing Data. Missing Completely at Random (MCAR) ● The best-case scenario ● Missingness is unrelated to any variable, observed or not, so the observed responses have the same distribution as the full sample. Missing Not at Random (MNAR) ● There is a pattern to the missingness that depends on the unobserved value itself ● Could indicate a larger issue with data collection ● May indicate response bias (social desirability bias, etc.) or bias in the sample. Missing at Random (MAR) ● Missingness is related to other observed variables, but not to the value you're measuring once those variables are accounted for
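The practical difference between the three mechanisms shows up when incomplete cases are simply dropped. The Python simulation below is illustrative only: the covariate `g` (think of an age group), the outcome `y`, and all missingness rates are made-up values. Under MCAR the complete-case mean is unbiased. Under MAR it is biased, but reweighting the observed within-group means by each group's full-sample share recovers the full-sample mean, because missingness depends only on the observed `g`. Under MNAR that adjustment does not remove the bias, because missingness depends on `y` itself.

```python
import random
from statistics import mean

def simulate(mechanism, n=20_000, seed=42):
    """Return (full-sample mean, complete-case mean, group-adjusted mean) of y under one missingness mechanism."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        g = rng.random() < 0.5               # observed covariate, e.g. an age group
        y = 40 + 20 * g + rng.gauss(0, 10)   # variable of interest; higher when g is True
        if mechanism == "MCAR":
            p_missing = 0.3                  # unrelated to g or y
        elif mechanism == "MAR":
            p_missing = 0.6 if g else 0.1    # depends only on the observed g
        elif mechanism == "MNAR":
            p_missing = 0.6 if y > 55 else 0.1  # depends on the unobserved y itself
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        rows.append((g, y, rng.random() < p_missing))

    full_mean = mean(y for _, y, _ in rows)
    observed = [(g, y) for g, y, missing in rows if not missing]
    complete_case = mean(y for _, y in observed)
    # Reweight the observed within-group means by each group's share of the full sample
    adjusted = 0.0
    for level in (False, True):
        share = sum(1 for g, _, _ in rows if g == level) / n
        adjusted += share * mean(y for g, y in observed if g == level)
    return full_mean, complete_case, adjusted

if __name__ == "__main__":
    for mechanism in ("MCAR", "MAR", "MNAR"):
        full, cc, adj = simulate(mechanism)
        print(f"{mechanism}: full={full:.2f} complete-case={cc:.2f} group-adjusted={adj:.2f}")
```

The same logic explains why imputation and weighting methods that condition on observed variables can correct MAR but not MNAR: no observed variable carries the information that drives the missingness.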
  • 8. Imputation with MICE. There are many methods to deal with missing data (complete-case analysis, nearest-neighbour, mean or median imputation, etc.). MICE is a top performer ● Stands for Multivariate Imputation by Chained Equations ● Uses other variables in the dataset to estimate missing values ● Generates “plausible synthetic values”. From the documentation: by default, the method uses pmm, predictive mean matching (numeric data); logreg, logistic regression imputation (binary data, factor with 2 levels); polyreg, polytomous regression imputation for unordered categorical data (factor > 2 levels); polr, proportional odds model for ordered data (> 2 levels). A general rule of thumb is to impute only up to 5% missingness. Great documentation at https://www.rdocumentation.org/packages/mice/versions/3.13.0/topics/mice and https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3074241/. Image designed by Jaden M. Walters
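To make the chained-equations idea concrete, here is a deliberately simplified, single-imputation sketch in pure Python for two numeric columns; it is not the mice package. It starts from mean fills, then repeatedly regresses each column on the other using the rows where that column was observed, and fills each missing cell by predictive mean matching: it copies the observed value of a randomly chosen donor among the k rows whose predictions are closest. Real MICE also draws the regression parameters from their posterior and produces several imputed datasets (m = 5 by default in mice), which this sketch omits. The function names and data are hypothetical.

```python
import random
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def pmm_draw(prediction, donor_predictions, donor_values, k, rng):
    """Predictive mean matching: copy the observed value of a random donor among the k closest predictions."""
    closest = sorted(range(len(donor_predictions)),
                     key=lambda i: abs(donor_predictions[i] - prediction))[:k]
    return donor_values[rng.choice(closest)]

def chained_impute(a, b, iterations=10, k=3, seed=0):
    """Fill None values in two numeric columns by alternating PMM regressions (a on b, then b on a)."""
    rng = random.Random(seed)
    cols = [list(a), list(b)]
    missing = [[v is None for v in col] for col in cols]
    # Start from mean imputation so every row has a value for the first regressions
    for col, miss in zip(cols, missing):
        fill = mean(v for v, m in zip(col, miss) if not m)
        for i, m in enumerate(miss):
            if m:
                col[i] = fill
    for _ in range(iterations):
        for t in (0, 1):
            target, predictor, miss = cols[t], cols[1 - t], missing[t]
            observed = [i for i, m in enumerate(miss) if not m]
            intercept, slope = fit_line([predictor[i] for i in observed],
                                        [target[i] for i in observed])
            donor_predictions = [intercept + slope * predictor[i] for i in observed]
            donor_values = [target[i] for i in observed]
            for i, m in enumerate(miss):
                if m:
                    prediction = intercept + slope * predictor[i]
                    target[i] = pmm_draw(prediction, donor_predictions, donor_values, k, rng)
    return cols[0], cols[1]

if __name__ == "__main__":
    # Hypothetical data: None marks a missing answer
    a = [1.0, 2.0, None, 4.0, 5.0, 6.0, None, 8.0]
    b = [2.1, 3.9, 6.2, None, 10.1, 12.0, 13.8, 16.2]
    print(chained_impute(a, b))
```

A property of predictive mean matching visible here is that every imputed value is one that some respondent actually gave, so imputations never fall outside the observed range or take impossible values.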
  • 9. MICE Imputation Example in R. Using population data from a Fall 2021 political research poll. We will make some simplifying assumptions for demonstration purposes ● Ignoring stratification ● Assuming MCAR (Missing Completely at Random)
  • 10. Weighting with Post-Stratification. Propensity weighting ● Adjusts the survey sample to known population parameters ● Weights by the inverse probability of selection to remove bias ● With probability samples, selection probabilities are known ● With non-probability samples, probabilities are estimated. Raking ● The algorithm iteratively adjusts weights to match survey distributions to known population distributions ● Implemented in R using the rake() function from the survey package, as well as the anesrake() function from the anesrake package ● This is one of the most common weighting methods, if not the most common, used by researchers and pollsters ● Only requires knowledge of the marginal population distributions
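Raking (iterative proportional fitting) can be sketched in a few lines: for each weighting variable in turn, scale every respondent's weight so the weighted sample shares match the population margins, and repeat until no further adjustment is needed. This is a minimal Python illustration, not a reimplementation of survey::rake() or anesrake(); the R functions offer more options, such as capping extreme weights, that are omitted here. The variables and target shares are hypothetical, not census figures, and every target category is assumed to appear at least once in the sample.

```python
def rake_weights(rows, targets, max_iter=100, tol=1e-8):
    """Iterative proportional fitting.

    rows: list of dicts mapping each weighting variable to a respondent's category.
    targets: {variable: {category: population share}}; each variable's shares sum to 1.
    Returns one weight per row; the weights sum to len(rows).
    """
    weights = [1.0] * len(rows)
    for _ in range(max_iter):
        largest_change = 0.0
        for var, shares in targets.items():
            total = sum(weights)
            current = {category: 0.0 for category in shares}
            for w, row in zip(weights, rows):
                current[row[var]] += w
            # Scale each category so its weighted share equals the population share
            # (assumes every category has at least one respondent, else division by zero)
            factors = {c: shares[c] * total / current[c] for c in shares}
            weights = [w * factors[row[var]] for w, row in zip(weights, rows)]
            largest_change = max(largest_change, max(abs(f - 1) for f in factors.values()))
        if largest_change < tol:
            break  # a full pass changed nothing: all margins match
    return weights

def weighted_shares(rows, weights, var):
    """Weighted share of each category of var."""
    total = sum(weights)
    shares = {}
    for w, row in zip(weights, rows):
        shares[row[var]] = shares.get(row[var], 0.0) + w / total
    return shares

if __name__ == "__main__":
    rows = [
        {"sex": "F", "age": "18-44"}, {"sex": "F", "age": "45+"},
        {"sex": "F", "age": "45+"}, {"sex": "M", "age": "18-44"},
        {"sex": "M", "age": "45+"}, {"sex": "M", "age": "45+"},
        {"sex": "M", "age": "45+"}, {"sex": "F", "age": "18-44"},
    ]
    targets = {"sex": {"F": 0.51, "M": 0.49}, "age": {"18-44": 0.45, "45+": 0.55}}
    w = rake_weights(rows, targets)
    print([round(x, 3) for x in w])
    print(weighted_shares(rows, w, "sex"), weighted_shares(rows, w, "age"))
```

Only the marginal shares enter `targets`; the joint sex-by-age distribution is never needed, which is why raking is practical when census tables publish margins separately.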
  • 11. Post-Stratification Raking Example in R. Using population data from: https://www12.statcan.gc.ca/census-recensement/2016/dp-pd/prof/details/download-telecharger/comp/page_dl-tc.cfm?Lang=E
  • 12. Thank you! ✉ cam@ra2.io 🔗 https://www.linkedin.com/in/cameronraynor/ 🌐 https://www.ra2.io