www.rti.org
RTI International is a registered trademark and a trade name of Research Triangle Institute.
Data Quality Concerns in Scientific Tasks
Y. Patrick Hsieh
Stephanie Eckman
Herschel Sanders
Amanda Smith
Use of Crowdsourcing
• Crowdsourcing is a popular source of online workforce for scientific research
– Classifying images
– Transcribing audio files
– Coding texts or social media content
• Fast & inexpensive
• Amazon Mechanical Turk (MTurk)
These tasks are a lot like surveys
What about data quality?
Crowdsourcing vs Panels
MTurk
• Paid per HIT (Human Intelligence Task)
• Metrics available
– # of tasks completed
– % of tasks approved
• Strong norm:
– Quality work → fair pay
Online Panel
• Paid per survey
• Few quality metrics available
Do cultures & incentives lead to data quality differences?
• In surveys?
• In scientific tasks?
Motivated misreporting
Research Question
• Web survey design
Question order by format (the same sequence is shown for MTurk and for the Online Panel):
Grouped: Filter, Filter, Filter, Follow Up, Follow Up, Follow Up, Follow Up
Interleafed: Filter, Follow Up, Follow Up, Filter, Filter, Follow Up, Follow Up
2 tasks:
• Survey
• Image coding
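The two formats in the diagram differ only in when follow-up questions appear relative to the filters. A minimal Python sketch of that ordering logic follows. It assumes that follow-ups are asked only after a YES answer to their filter. The function name `question_order`, the shortened question labels, and the follow-ups listed for shoes are hypothetical, not the study instrument.

```python
def question_order(sections, fmt, said_yes):
    """Return the order in which questions are asked.

    sections: list of (filter_question, [follow_up_questions]) pairs.
    fmt:      "grouped" or "interleafed".
    said_yes: set of filter questions answered YES; only these
              trigger their follow-ups (an assumption of this sketch).
    """
    if fmt == "grouped":
        # All filters first, then the follow-ups of every YES filter.
        order = [flt for flt, _ in sections]
        for flt, follow_ups in sections:
            if flt in said_yes:
                order.extend(follow_ups)
        return order
    if fmt == "interleafed":
        # Each filter is followed immediately by its own follow-ups.
        order = []
        for flt, follow_ups in sections:
            order.append(flt)
            if flt in said_yes:
                order.extend(follow_ups)
        return order
    raise ValueError(f"unknown format: {fmt!r}")


# Hypothetical, shortened question labels for illustration only.
clothing = [
    ("Purchased pants?", ["Cost?", "Price includes tax?", "Bought online?"]),
    ("Purchased shoes?", ["Cost?", "Bought online?"]),
]
grouped = question_order(clothing, "grouped", said_yes={"Purchased pants?"})
interleafed = question_order(clothing, "interleafed", said_yes={"Purchased pants?"})
```

In the interleafed order a respondent can learn that YES triggers extra questions before reaching the next filter, which is the mechanism behind motivated misreporting. In the grouped order all filters are answered before any follow-up appears.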
2 Sources of Participants
• MTurk
– 80% prior approval rate
– In US
• Online panel
– Convenience sample in US
– Balanced to Census
MTurk participants
• Survey:
– 185/214 completed
– 59% female
– 39 years old
– 48% bachelor's or higher
• Image coding:
– 141/342 completed
– 62% female
– 50% bachelor's or higher
Online panel participants
• Survey:
– 204/260 completed
– 53% female
– 48 years old
– 37% bachelor's or higher
• Image coding:
– 141/372 completed
– 60% female
– 45% bachelor's or higher
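The completion counts above can be turned into rates for easier comparison. The sketch below assumes that the first block of figures belongs to MTurk and the second to the online panel, following the order in which the two sources are listed. The slide does not define the denominator, so `total` simply means the second number as reported.

```python
# (completed, total) as reported on the slide.
# Assumption: first block = MTurk, second block = online panel.
counts = {
    ("MTurk", "survey"): (185, 214),
    ("MTurk", "image coding"): (141, 342),
    ("Online panel", "survey"): (204, 260),
    ("Online panel", "image coding"): (141, 372),
}

# Completion rate in percent, rounded to one decimal place.
rates = {
    key: round(100 * completed / total, 1)
    for key, (completed, total) in counts.items()
}

for (source, task), rate in rates.items():
    print(f"{source:12s} {task:13s} {rate:5.1f}%")
```

Under that assumption, survey completion is much higher than image-coding completion in both sources.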
Task A: Lifestyle Survey
• 4 filter sections
– Clothing
– Consumer goods
– Leisure activity
– Credit cards
• 30 minutes
• $4 incentive
• Order of sections randomized
• Filters in forward or backward order
Example filter question with follow-ups:
Has anyone in this household purchased pants in the last 3 months?
Yes
How much did those pants cost?
Does that price include tax?
Did you buy them online?
……………….
Has anyone in this household purchased shoes in the last 3 months?
Yes?
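The randomization described for Task A (section order, plus forward or backward filter order) could be assigned per respondent roughly as follows. This is a sketch under assumptions: the function names, the use of a per-respondent seed, and the equal probability of forward and backward order are illustrative choices, not details given in the slides.

```python
import random

SECTIONS = ["Clothing", "Consumer goods", "Leisure activity", "Credit cards"]


def assign_conditions(seed):
    """Draw a section order and a filter direction for one respondent."""
    rng = random.Random(seed)  # seeded so the assignment is reproducible
    section_order = SECTIONS[:]
    rng.shuffle(section_order)
    direction = rng.choice(["forward", "backward"])
    return {"section_order": section_order, "filter_direction": direction}


def order_filters(filters, direction):
    """Return the filters of one section in forward or backward order."""
    return list(filters) if direction == "forward" else list(reversed(filters))


plan = assign_conditions(seed=7)
clothing_filters = order_filters(["pants", "shoes"], plan["filter_direction"])
```

Reversing the filter order lets an analyst separate the effect of a filter's position from the effect of its content.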
Task B: Image Coding
• Image coding task
– 40 photos of buildings in Haiti
– $6 incentive
– 50 minutes
• 4 elements
– Beam
– Column
– Slab
– Wall
• 2 filters
– Can you see element?
– Is it damaged?
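The size of the coding task follows from the numbers above. The short sketch below enumerates the photo and element combinations. Whether the damage question was asked for every pair or only for visible elements is not stated in the slides, so the second figure is an upper bound under that assumption.

```python
from itertools import product

N_PHOTOS = 40
ELEMENTS = ["Beam", "Column", "Slab", "Wall"]
FILTERS = ["Can you see element?", "Is it damaged?"]

# One (photo, element) pair per visibility judgment.
pairs = list(product(range(1, N_PHOTOS + 1), ELEMENTS))

# Upper bound on filter questions if both filters are asked for every pair.
max_filter_questions = len(pairs) * len(FILTERS)

print(len(pairs), max_filter_questions)
```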
Results: Motivated Misreporting in Survey Questions
• Expected format effect: more YES answers in GROUPED format
Results: Motivated Misreporting in Survey Questions
• Dependent variable (DV): YES response
• Controlling for:
– Demographics
– Order * section
– Format * source (MTurk / Panel)
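One way to write down a model with this outcome and these controls is a logistic regression. The slides do not name the model, so the logistic link and all symbols below are assumptions made for illustration: $Y_{ij}$ is 1 if respondent $i$ answers YES to filter $j$, $\mathbf{x}_i$ holds the demographic controls, and the last term stands for the order-by-section interactions.

```latex
\operatorname{logit}\,\Pr(Y_{ij} = 1)
  = \beta_0
  + \beta_1\,\mathrm{Interleafed}_i
  + \beta_2\,\mathrm{MTurk}_i
  + \beta_3\,(\mathrm{Interleafed}_i \times \mathrm{MTurk}_i)
  + \boldsymbol{\gamma}^{\top}\mathbf{x}_i
  + \boldsymbol{\delta}^{\top}(\mathrm{Order}_i \times \mathrm{Section}_j)
```

In this parameterization $\beta_1$ is the format effect in the panel and $\beta_1 + \beta_3$ is the format effect among MTurk workers. Motivated misreporting would show up as fewer YES answers in the interleafed format, that is, a negative format effect.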
Results: Motivated Misreporting in Image Coding
• Effect in opposite direction: More YES in Interleafed
• MTurkers answered YES more often

Average # of YES responses, by format
               Element visibility   Element damage
Grouped              68.7               49.3
Interleafed          87.1               53.1

Average # of YES responses, by source
               Element visibility   Element damage
Panel                65.4               47.1
MTurk                88.9               55.0
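The size of each gap can be read off directly from the reported averages. The snippet below only subtracts the numbers shown in the tables. It makes no claim about statistical significance, which the slides do not report.

```python
# Average number of YES responses, as reported on the slide.
by_format = {
    "Grouped": {"visibility": 68.7, "damage": 49.3},
    "Interleafed": {"visibility": 87.1, "damage": 53.1},
}
by_source = {
    "Panel": {"visibility": 65.4, "damage": 47.1},
    "MTurk": {"visibility": 88.9, "damage": 55.0},
}


def gap(table, high, low):
    """Difference in average YES counts (high minus low), per filter."""
    return {k: round(table[high][k] - table[low][k], 1) for k in table[high]}


format_gap = gap(by_format, "Interleafed", "Grouped")
source_gap = gap(by_source, "MTurk", "Panel")

print("Interleafed - Grouped:", format_gap)
print("MTurk - Panel:        ", source_gap)
```

Both gaps are larger for the visibility filter than for the damage filter.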
Take Aways (preliminary)
• Results not as expected
– Survey: Format effect only in MTurk
– MTurkers are similar to other survey respondents
– Why no format effect in panel?
  • No motivated misreporting in Panel?
  • Or misreporting in both formats?
– Image Coding: Format effect in opposite direction
• Some evidence MTurkers work harder than panelists
– Survey: less item nonresponse (NR)
– Image Coding: longer time with training materials
Discussion
• Data scientists are doing surveys to make training data
• We know a lot about survey data quality!
– Measurement error
– Nonresponse error
– Coverage error
How do these affect
• Training data?
• Model predictions?
More Information
Y. Patrick Hsieh
yph@rti.org
@coolpat
Stephanie Eckman
seckman@rti.org
@stephnie