ESA Ignite talk on quality control for data

Talk on quality control and quality assurance for ecological data, presented as an Ignite talk at the ESA 2014 meeting in Sacramento, CA, 12 Aug 2014.


  1. Ensuring That Your Data are High Quality • Carly Strasser | @carlystrasser • California Digital Library • ESA 2014
  2. Quality assurance & control: Mechanisms for preventing errors from entering a data set
  3. Quality assurance & control: Plan • Collect • Assure • Describe • Preserve • Discover • Integrate • Analyze
  4. Why?
  5. Why?
  6. From:
  7. 4 Strategies: Prevent • Minimize • Detect • Handle (From Flickr by Elliott Teel)
  8. Prevent Errors Before Collection • Define & enforce standards: formats, codes, measurement units, metadata (From Flickr by Stacie Bee)
  9. Prevent Errors Before Collection • Define & enforce standards: formats, codes, measurement units, metadata • Assign responsibility for data quality (From Flickr by Stacie Bee)
  10. Prevent Errors Before Collection • Allow “other” values • Comments & notes fields • Allows handling of unexpected situations (From Flickr by Olga Nohra)
  11. Minimize Errors During Collection • Eliminate manual data entry • Design data storage well • Minimize repeat entry • Use consistent terminology • Atomize data (From Flickr by Butal Lee)
  12. Minimize Errors: Use databases • You should invest time in learning databases if your data sets are large or complex • Consider investing time in learning databases if: your data are small and humble; you ever intend to share your data; you are < 30 years old (From Mark Schildhauer)
  13. Minimize Errors: Tools • Databases: FileMaker Pro (Mac), Access (PC), LibreOffice
  14. Minimize Errors: Tools • Databases: FileMaker Pro (Mac), Access (PC), LibreOffice • Spreadsheets: Google Forms, LibreOffice, lists & data validation in Excel
  15. Detect Errors After Collection • Look for outliers • Goal is not to eliminate outliers but to identify potential data contamination [Plot]
  16. Detect Errors After Collection • Look for outliers • Goal is not to eliminate outliers but to identify potential data contamination • Strategies: normal probability plots, regression, scatter plots, maps [Plot]
  17. Handle Errors • Case-by-case decision • Flag them? • Remove them? • Fix them? • Document all changes (readme.txt, scripts)
  18. Handle Errors • Case-by-case decision • Flag them? • Remove them? • Fix them? • Document all changes (readme.txt, scripts) • Keep original data separate • Use scripts (raw data as .csv; R script for QAQC)
  19. 4 Strategies: Prevent • Minimize • Detect • Handle (From Flickr by Elliott Teel)
  20. Website: carlystrasser.net • Email: carlystrasser@gmail.com • Twitter: @carlystrasser • Slides: slideshare.net/carlystrasser
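
The workflow on slides 8 through 18 (enforce codes, look for outliers, flag instead of silently removing, keep the original data separate, document changes) can be sketched as a small script. This is only an illustrative sketch, not the speaker's method: the talk suggests an R script, while Python is used here; the column names `site` and `length_mm`, the allowed site codes, and the sample data are all hypothetical; and Tukey's interquartile-range fences stand in for the outlier strategies the slides list (normal probability plots, regression, scatter plots, maps).

```python
import csv
import io
import statistics

# Hypothetical controlled vocabulary and column name, chosen for illustration.
ALLOWED_SITES = {"A", "B", "C"}
VALUE_COLUMN = "length_mm"


def load_rows(csv_text):
    """Parse raw CSV text into dicts; the raw text itself is never modified."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def tukey_fences(values, k=1.5):
    """Return (low, high): the quartiles widened by k times the IQR."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    return q1 - k * spread, q3 + k * spread


def run_qc(rows):
    """Return (flagged_rows, change_log). Rows are copied and flagged, not dropped."""
    copies, values, log = [], [], []
    for row in rows:
        flags = []
        if row["site"] not in ALLOWED_SITES:
            flags.append("bad_site_code")
        try:
            value = float(row[VALUE_COLUMN])
        except ValueError:
            value = None
            flags.append("not_numeric")
        values.append(value)
        copies.append((dict(row), flags))  # work on a copy of each row

    # Fences are computed from every value that parsed as a number.
    low, high = tukey_fences([v for v in values if v is not None])
    result = []
    for number, ((row, flags), value) in enumerate(zip(copies, values), start=1):
        if value is not None and not low <= value <= high:
            flags.append("outlier")
        row["qc_flag"] = ";".join(flags) or "ok"
        if flags:
            log.append(f"row {number}: {row['qc_flag']}")
        result.append(row)
    return result, log


# Made-up sample data standing in for a raw .csv file.
RAW = """site,length_mm
A,12.1
B,11.8
A,12.5
C,11.9
B,120.0
D,12.2
A,n/a
C,12.0
"""

rows = load_rows(RAW)
checked, change_log = run_qc(rows)
for line in change_log:
    print(line)
```

The script adds a `qc_flag` column to copies of the rows and leaves the parsed originals untouched, so whether to remove or fix a flagged value remains a case-by-case decision. The printed log lines could be saved alongside a readme.txt as the record of what was flagged.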
