
Mechanisms for Data Quality and Validation in Citizen Science


Presentation for a paper on ways to improve data quality for citizen science, delivered by Nathan Prestopnik at a workshop on citizen science at eScience 2011.

Speaker notes
  • Rating = classification or judgment tasks; admittedly not the clearest wording, but no one corrected this in text responses. Percentage = percentage of responding projects that use each method.
  • Percentage = percentage of responding projects that use this combination of methods. There were a few other combinations that a handful of projects used; these were the dominant ones. Surprised to see so many projects using photos, as they are hard to use and store, and by the frequency of paper data sheets.
  • Note that we did ask about numbers of contributions, but the units of contribution for each project (and even the way they count volunteers) were so different that they couldn't be used for analysis.
  • Split the framework of mechanisms in two for ease of viewing; these are methods that address the protocol as the presumed source of error. Starred items address errors arising from both protocols and participants.
  • These methods all address expected errors from participants, focusing primarily on skill evaluation and filtering or review of unusual reports.
Transcript of "Mechanisms for Data Quality and Validation in Citizen Science"

    1. Mechanisms for Data Quality and Validation in Citizen Science
       A. Wiggins, G. Newman, R. Stevenson & K. Crowston
       Presented by Nathan Prestopnik
    2. Motivation
       • Data quality and validation are a primary concern for most citizen science projects
         - More contributors = more opportunities for error
       • There has been no review of appropriate data quality and validation mechanisms
         - Diverse projects face similar challenges
       • Contributors' skills and scale of participation are important considerations in ensuring quality
    3. Methods
       • Survey
         - Questionnaire with 70 items, all optional
         - 63 completed questionnaires representing 62 projects
         - Mostly small-to-medium sized projects in the US, Canada, and UK; most focus on monitoring and observation
       • Inductive development of framework
         - Based on survey results and authors' direct experience with citizen science projects
    4. Survey: Resources
       • FTEs: 0 – 50+
         - Average: 2.4; Median: 1
         - Often small fractions of several individuals' time
       • Annual budgets: $125 – $1,000,000
         - Average: $105,000; Median: $35,000; Mode: $20,000
         - Up to 5 different funding sources, usually grants, in-kind contributions (staff time), & private donations
       • Age/duration: -1 to 100 years
         - Average age: 13 years; Median: 9 years; Mode: 2 years
    5. Survey: Methods Used

       Method                                                n    Percentage
       Expert review                                         46   77%
       Photo submissions                                     24   40%
       Paper data sheets submitted along with online entry   20   33%
       Replication/rating by multiple participants           14   23%
       QA/QC training program                                13   22%
       Automatic filtering of unusual reports                11   18%
       Uniform equipment                                      9   15%
       Validation planned but not yet implemented             5    8%
       Replication/rating by the same participant             2    3%
       Rating of established control items                    2    3%
       None                                                   2    3%
       Not sure/don't know                                    2    3%
    6. Survey: Combining Methods

       Methods                                       n    Percentage
       Single method                                 10   17%
       Multiple methods, up to 5 (average 2.5)       45   75%
       Expert review + Automatic filtering           11   18%
       Expert review + Paper data sheets             10   17%
       Expert review + Photos                        14   23%
       Expert review + Photos + Paper data sheets     6   10%
       Expert review + Replication, multiple         10   17%
    7. Survey: Resources & Methods
       • Number of validation methods and staff are positively correlated (r = 0.11)
         - More staffing = more supervisory capacity
       • Number of validation methods and budget are negatively correlated (r = -0.15)
         - If larger budgets mean more contributors, this constrains the scalability of multiple methods
         - Larger projects may use fewer but more sophisticated mechanisms
         - Suggests that human-supervised methods don't scale
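    For context on the correlations reported above, here is a minimal sketch of how Pearson's r could be computed across projects; the variable names and per-project values are invented for illustration and are not the survey's actual data.

        # Hypothetical illustration: Pearson correlation between number of
        # validation methods and staff FTEs across projects (invented values).
        import numpy as np

        num_validation_methods = np.array([1, 2, 3, 2, 5, 1, 4, 2, 3, 1])
        staff_ftes = np.array([0.5, 1.0, 2.0, 1.5, 6.0, 0.2, 3.0, 1.0, 2.5, 0.5])

        # Off-diagonal entry of the correlation matrix is Pearson's r.
        r = np.corrcoef(num_validation_methods, staff_ftes)[0, 1]
        print(f"Pearson r = {r:.2f}")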
    8. Survey: Other Validation Options
       • "Please describe any additional validation methods used in your project"
         - Several projects rely on personal knowledge of contributing individuals for data quality
         - Not scientifically robust, but understandably relevant
       • Most comments referred to details of expert review
         - Reinforces the perceived value of expertise
       • The reporting interface and its associated error-checking are often overlooked, but provide important initial data verification
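    To illustrate the kind of initial verification a reporting interface can provide, below is a minimal, hypothetical sketch of entry-time checks on a submitted observation; the field names, study-area bounds, and species list are invented for illustration and are not drawn from any particular project.

        # Hypothetical entry-time checks for a citizen science reporting form.
        # Field names, bounding box, and species list are invented examples.
        from datetime import date

        KNOWN_SPECIES = {"Danaus plexippus", "Apis mellifera", "Sturnus vulgaris"}
        LAT_RANGE = (40.0, 45.0)    # assumed study-area latitude bounds
        LON_RANGE = (-80.0, -70.0)  # assumed study-area longitude bounds

        def validate_observation(record):
            """Return a list of problems found in a submitted record (empty = passes)."""
            problems = []
            if record["species"] not in KNOWN_SPECIES:
                problems.append("unrecognized species name")
            if not (LAT_RANGE[0] <= record["lat"] <= LAT_RANGE[1]):
                problems.append("latitude outside study area")
            if not (LON_RANGE[0] <= record["lon"] <= LON_RANGE[1]):
                problems.append("longitude outside study area")
            if record["date"] > date.today():
                problems.append("observation date is in the future")
            if record["count"] < 1:
                problems.append("count must be at least 1")
            return problems

        report = {"species": "Danaus plexippus", "lat": 42.1, "lon": -76.5,
                  "date": date(2011, 8, 14), "count": 3}
        print(validate_observation(report))  # [] means the record passes the basic checks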
    9. Choosing Mechanisms
       • Data characteristics to consider when choosing mechanisms to ensure quality
         - Accuracy and precision: taxonomic, spatial, temporal, etc.
         - Error prevention: malfeasance (gaming the system), inexperience, data entry errors, etc.
       • Evaluate assumptions about error and accuracy
         - Where does error originate? How do mechanisms address it? At what step in the research process? How transparent are data review and its outcomes? How much data will be reviewed? In how much detail?
    10. Mechanisms: Protocols

        Mechanism                                    Process   Type/Detail
        QA project plans                             Before    SOP in some areas
        Repeated samples/tasks                       During    By multiple participants, single participant, or experts (calibration)
        Tasks involving control items                During    Contributions compared to known states
        Uniform/calibrated equipment                 During    Used for measurements; cost/scale tradeoff; who pays?
        Paper data sheets + online entry*            During    Extended details, verifying data entry accuracy
        Digital vouchers*                            During    Photos, audio, specimens/archives
        Data triangulation, normalization, mining*   After     Corroboration from other data sources; statistical & computer science methods
        Data documentation*                          After     Provide metadata about processes
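    As one concrete illustration of the "Repeated samples/tasks" row above, the sketch below checks agreement among multiple participants who completed the same task and routes low-agreement tasks to expert review; the data, threshold, and function names are hypothetical, not from the paper.

        # Hypothetical sketch: flag tasks where replicate classifications by
        # multiple participants disagree too much to accept automatically.
        from collections import Counter

        # task id -> labels submitted by different participants (invented data)
        replicated_labels = {
            "task-001": ["monarch", "monarch", "monarch"],
            "task-002": ["monarch", "viceroy", "monarch", "viceroy"],
            "task-003": ["viceroy"],
        }

        AGREEMENT_THRESHOLD = 0.8  # assumed cutoff; a real project would tune this
        MIN_REPLICATES = 2

        def needs_expert_review(labels):
            """True if there are too few replicates or the majority label is not dominant enough."""
            if len(labels) < MIN_REPLICATES:
                return True
            top_label, top_count = Counter(labels).most_common(1)[0]
            return top_count / len(labels) < AGREEMENT_THRESHOLD

        for task, labels in replicated_labels.items():
            print(task, "-> expert review" if needs_expert_review(labels) else "-> accepted")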
    11. Mechanisms: Participants

        Mechanism                                       Process          Types/Details
        Participant training                            Before, During   Initial; Ongoing; Formal QA/QC
        Participant testing                             Before, During   Following training; Pre/test-retest
        Rating participant performance                  During, After    Unknown to participant; Known to participant
        Filtering of unusual reports                    During, After    Automatically; Manually
        Contacting participants about unusual reports   After            May alienate/educate contributors
        Automatic recognition                           After            Techniques for image/text processing
        Expert review                                   After            By professionals, experienced contributors, or multiple parties
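    As an illustration of the "Filtering of unusual reports" row above, here is a minimal hypothetical sketch that automatically flags observations falling outside a species' expected season or plausible count; the expected ranges and species names are invented for demonstration.

        # Hypothetical automatic filter for unusual reports. Expected seasonal
        # ranges and maximum plausible counts are invented example values.
        EXPECTED = {
            # species: (first expected month, last expected month, max plausible count)
            "Danaus plexippus": (4, 10, 500),
            "Sturnus vulgaris": (1, 12, 10000),
        }

        def flag_unusual(report):
            """Return reasons a report looks unusual and should be routed to manual review."""
            reasons = []
            expected = EXPECTED.get(report["species"])
            if expected is None:
                reasons.append("species not on the project's expected list")
            else:
                first_month, last_month, max_count = expected
                if not (first_month <= report["month"] <= last_month):
                    reasons.append("outside expected season")
                if report["count"] > max_count:
                    reasons.append("implausibly high count")
            return reasons

        print(flag_unusual({"species": "Danaus plexippus", "month": 1, "count": 3}))
        # -> ['outside expected season']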
    12. Discussion
        • Need to pay more attention to the way that data are created: not just protocols, but also qualities of data like accuracy and precision
        • Clear need for quality/validation mechanisms for analysis, not only for data collection/processing
          - Data mining techniques
          - Spatio-temporal modeling
        • Scalability of validation may be limited
          - May need to plan different quality management techniques based on expected/actual project growth
    13. Future Work
        • Most projects worry more about contributor expertise than appropriate analysis methods
          - Resources are needed to support suitable analysis approaches and tools
        • Comparative evaluation of the efficacy of the data quality and validation mechanisms identified
          - Develop a QA/QC planning and evaluation tool
        • Develop examples of appropriate data documentation for citizen science projects
          - Necessary for peer review, data re-use
    14. Thanks!
        • Nate Prestopnik
        • DataONE working group on Public Participation in Scientific Research
        • US NSF grants 09-43049 & 11-11107