ESA Ignite talk on quality control for data
Talk on quality control and quality assurance for ecological data, presented as an Ignite talk at the ESA (Ecological Society of America) 2014 annual meeting in Sacramento, CA, on 12 Aug 2014.
Transcript

  • 1. Ensuring That Your Data Are High Quality. Carly Strasser (@carlystrasser), California Digital Library, ESA 2014
  • 2. Quality assurance & control: Mechanisms for preventing errors from entering a data set
  • 3. Quality assurance & control in the data life cycle: Plan, Collect, Assure, Describe, Preserve, Discover, Integrate, Analyze
  • 4. Why?
  • 5. Why?
  • 6. From:
  • 7. 4 Strategies: Prevent, Minimize, Detect, Handle (image from Flickr by Elliott Teel)
  • 8. Prevent Errors Before Collection • Define & enforce standards for formats, codes, measurement units, and metadata (image from Flickr by StacieBee)
  • 9. Prevent Errors Before Collection • Define & enforce standards for formats, codes, measurement units, and metadata • Assign responsibility for data quality (image from Flickr by StacieBee)
  • 10. Prevent Errors Before Collection • Include comments & notes fields, which allow handling of unexpected situations • Allow “other” values (image from Flickr by Olga Nohra)
  • 11. Minimize Errors During Collection • Eliminate manual data entry • Design data storage well • Minimize repeat entry • Use consistent terminology • Atomize data (one piece of information per field) (image from Flickr by Butal Lee)
  • 12. Minimize Errors: Use Databases • You should invest time in learning databases if your data sets are large or complex • Consider investing time in learning databases if your data are small and humble, if you ever intend to share your data, or if you are < 30 years old (from Mark Schildhauer)
  • 13. Minimize Errors: Tools • Databases: FileMaker Pro (Mac), Access (PC), LibreOffice
  • 14. Minimize Errors: Tools • Databases: FileMaker Pro (Mac), Access (PC), LibreOffice • Spreadsheets: Google Forms, LibreOffice, lists & data validation in Excel
  • 15. Detect Errors After Collection • Look for outliers • The goal is not to eliminate outliers but to identify potential data contamination
  • 16. Detect Errors After Collection • Look for outliers • The goal is not to eliminate outliers but to identify potential data contamination • Strategies: normal probability plots, regression, scatter plots, maps
  • 17. Handle Errors • Decide case by case: flag them? Remove them? Fix them? • Document all changes (readme.txt, scripts)
  • 18. Handle Errors • Decide case by case: flag them? Remove them? Fix them? • Document all changes (readme.txt, scripts) • Keep original data separate • Use scripts (e.g., raw data as .csv, an R script for QA/QC)
  • 19. 4 Strategies: Prevent, Minimize, Detect, Handle (image from Flickr by Elliott Teel)
  • 20. Website: carlystrasser.net • Email: carlystrasser@gmail.com • Twitter: @carlystrasser • Slides: slideshare.net/carlystrasser
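Slides 8 through 10 recommend defining and enforcing standards for formats, codes, and measurement units, and allowing “other” values backed by a notes field. The Python below is a minimal sketch of what enforcing such standards at entry time could look like. It checks one record against a date format, a controlled code list, and a unit range. The field names (`date`, `species_code`, `length_cm`, `notes`), the codes, and the range are hypothetical and do not come from the talk.

```python
from datetime import datetime

# Hypothetical standards for one field-sampling table (illustrative only).
ALLOWED_CODES = {"SP01", "SP02", "SP03", "OTHER"}  # controlled code list; OTHER is the escape hatch
DATE_FORMAT = "%Y-%m-%d"                           # agreed date format (ISO 8601 calendar date)
LENGTH_RANGE_CM = (0.0, 150.0)                     # plausible range in the agreed unit, centimetres


def validate_record(record: dict) -> list[str]:
    """Return the problems found in one record; an empty list means it passes."""
    problems = []

    # Format standard: the date must parse with the agreed format.
    try:
        datetime.strptime(record.get("date"), DATE_FORMAT)
    except (TypeError, ValueError):
        problems.append(f"date {record.get('date')!r} is not in {DATE_FORMAT} format")

    # Code standard: only codes from the controlled list are accepted.
    code = record.get("species_code")
    if code not in ALLOWED_CODES:
        problems.append(f"species_code {code!r} is not in the code list")
    elif code == "OTHER" and not (record.get("notes") or "").strip():
        # An "other" value must be explained in the notes field.
        problems.append("species_code OTHER requires an explanation in notes")

    # Unit standard: length must be a number within the plausible range (cm).
    try:
        length = float(record.get("length_cm"))
    except (TypeError, ValueError):
        problems.append(f"length_cm {record.get('length_cm')!r} is not a number")
    else:
        low, high = LENGTH_RANGE_CM
        if not low <= length <= high:
            problems.append(f"length_cm {length} is outside {low}-{high} cm")

    return problems


if __name__ == "__main__":
    good = {"date": "2014-08-12", "species_code": "SP01", "length_cm": "23.5", "notes": ""}
    bad = {"date": "12/08/2014", "species_code": "OTHER", "length_cm": "2350", "notes": ""}
    print(validate_record(good))  # []
    print(validate_record(bad))   # three problems: date format, unexplained OTHER, length range
```

Rejecting a bad record at entry, instead of cleaning it months later, is the goal of the “prevent” strategy. The spreadsheet tools on slide 14, such as lists and data validation in Excel, can do a similar job without code.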
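Slides 11 through 13 suggest minimizing repeat entry, using consistent terminology, atomizing data, and using databases. The sketch below illustrates those ideas with SQLite through Python's built-in `sqlite3` module. This is a stand-in chosen here, not one of the tools named on slide 13, and the table and column names are hypothetical. Species names are stored once in a lookup table, and each observation holds one value per column. The database itself rejects rows that break the rules.

```python
import sqlite3

# Hypothetical schema. SQLite (via Python's sqlite3) stands in for the desktop
# databases named on slide 13.
SCHEMA = """
CREATE TABLE species (
    code     TEXT PRIMARY KEY,           -- each species entered once, referenced by code
    sci_name TEXT NOT NULL UNIQUE
);
CREATE TABLE observation (
    id            INTEGER PRIMARY KEY,
    obs_date      TEXT NOT NULL
                  CHECK (obs_date GLOB '[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]'),
    site_id       TEXT NOT NULL,         -- atomized: one piece of information per column
    species_code  TEXT NOT NULL REFERENCES species(code),
    n_individuals INTEGER NOT NULL CHECK (n_individuals >= 0)
);
"""


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO species (code, sci_name) VALUES (?, ?)",
        [("SP01", "Hypothetical species one"), ("SP02", "Hypothetical species two")],
    )
    conn.commit()  # commit the lookup table so later rollbacks cannot undo it
    return conn


def try_insert(conn: sqlite3.Connection, row: tuple) -> bool:
    """Insert one observation; return False if the schema rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO observation (obs_date, site_id, species_code, n_individuals)"
                " VALUES (?, ?, ?, ?)",
                row,
            )
        return True
    except sqlite3.IntegrityError:
        return False


if __name__ == "__main__":
    conn = make_db()
    print(try_insert(conn, ("2014-08-12", "A1", "SP01", 3)))   # True
    print(try_insert(conn, ("2014-08-12", "A1", "SP99", 3)))   # False: unknown species code
    print(try_insert(conn, ("12/08/2014", "A1", "SP01", 3)))   # False: date not YYYY-MM-DD
    print(try_insert(conn, ("2014-08-12", "A1", "SP02", -1)))  # False: negative count
```

Because a species is typed once and then referenced by code, a misspelled name cannot creep into individual observations. FileMaker Pro, Access, and LibreOffice offer comparable lookup tables and field rules through their own interfaces.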
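Slide 16 lists regression among the strategies for spotting outliers. Below is a minimal sketch of one regression-based check, assuming a straight-line relationship between two variables. It fits ordinary least squares and flags points whose residual is more than `threshold` residual standard deviations from zero. The function name and the default threshold of 2.5 are assumptions made for illustration. In keeping with slide 15, the function only reports indices for review and does not delete anything.

```python
from statistics import mean, stdev


def flag_regression_outliers(x: list[float], y: list[float], threshold: float = 2.5) -> list[int]:
    """Return indices of points whose least-squares residual exceeds `threshold`
    residual standard deviations. Points are flagged for review, not removed."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need paired x and y values with at least 3 points")
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    # Ordinary least-squares slope and intercept.
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sd = stdev(residuals)  # sample standard deviation of the residuals
    if sd == 0:
        return []  # perfect fit: nothing stands out
    return [i for i, r in enumerate(residuals) if abs(r) / sd > threshold]


if __name__ == "__main__":
    xs = list(range(20))
    ys = [2 * xi + 1 for xi in xs]
    ys[10] += 40  # simulated entry error
    print(flag_regression_outliers(xs, ys))  # [10]
```

A flagged point should prompt a check of field sheets or instrument logs. It is not proof of an error. With n points, no standardized residual can exceed (n − 1)/√n, so the default threshold can only trigger with at least 9 points. Scatter plots and normal probability plots are the visual counterparts of this check.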
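Slides 17 and 18 recommend keeping original data separate, deciding case by case whether to flag, remove, or fix errors, documenting every change, and using scripts. The talk's example is raw data as .csv plus an R script. The sketch below shows the same pattern in Python instead of R. The raw CSV text is read but never modified, documented corrections and flags are applied to a copy, and every change goes into a log. The `id` and `qc_flag` column names and the correction format are hypothetical.

```python
import csv
import io


def apply_qaqc(raw_csv: str, corrections: dict, suspect_ids: set) -> tuple[str, list[str]]:
    """Return (cleaned_csv, change_log) built from `raw_csv`, which is never modified.

    corrections: {(record_id, column): new_value}, one entry per documented fix (hypothetical format).
    suspect_ids: ids of records to flag for review instead of deleting them.
    """
    reader = csv.DictReader(io.StringIO(raw_csv))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames + ["qc_flag"], lineterminator="\n")
    writer.writeheader()
    log = []
    for row in reader:
        flags = []
        for (rec_id, column), new_value in corrections.items():
            if row["id"] == rec_id:
                # Record the old and new value so every change is documented.
                log.append(f"id={rec_id} {column}: {row[column]!r} -> {new_value!r}")
                row[column] = new_value
                if "corrected" not in flags:
                    flags.append("corrected")
        if row["id"] in suspect_ids:
            flags.append("suspect")
            log.append(f"id={row['id']} flagged as suspect")
        row["qc_flag"] = ";".join(flags) or "ok"
        writer.writerow(row)
    return out.getvalue(), log


if __name__ == "__main__":
    raw = "id,site,count\n1,A1,3\n2,A1,300\n3,B2,5\n"
    # Hypothetical decisions: record 2 corrected after checking the field sheet,
    # record 3 kept but flagged for review.
    cleaned, log = apply_qaqc(raw, {("2", "count"): "30"}, {"3"})
    print(cleaned)
    print("\n".join(log))
```

In a real project the raw file would stay read-only on disk. The cleaned CSV and the change log would be written to new files, alongside a readme.txt that describes the procedure.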
