Best Practices
Creating Research Data




                         Sherry Lake
                         July 31, 2012 University of Florida Data Management Workshop
WHY?

Following these Best Practices…
• Will improve the usability of the data by you
  or by others
• Your data will be “computer ready”
• Your data will be ready to share with others
Spreadsheet Examples
Spreadsheet Problems?
Problems

• Dates are not
  stored
  consistently
• Values are labeled inconsistently
• Data coding is inconsistent
• Order of values is different
Problems

• Confusion
  between
  numbers and
  text
• Different types of data are stored in the
  same columns
• The spreadsheet loses interpretability if it
  is sorted
Best Practices Data Organization
• Lines or rows of data should be complete
   – Designed to be machine readable, not human
     readable (sort)
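A row is complete when it can still be understood after the file is sorted. The sketch below assumes a hypothetical sheet in which the site name was typed only on the first row of each group; the column names and values are invented for illustration.

```python
def fill_down(rows, column):
    """Copy the last non-empty value of `column` into the blank cells below it."""
    filled = []
    last_value = None
    for row in rows:
        row = dict(row)  # copy so the input rows are not modified
        if row[column] in ("", None):
            row[column] = last_value
        else:
            last_value = row[column]
        filled.append(row)
    return filled


# Hypothetical data: "site" is blank where the sheet relied on visual grouping.
raw_rows = [
    {"site": "North", "plot": 1, "height_cm": 12.5},
    {"site": "", "plot": 2, "height_cm": 14.0},
    {"site": "South", "plot": 1, "height_cm": 9.8},
    {"site": "", "plot": 2, "height_cm": 11.1},
]

complete_rows = fill_down(raw_rows, "site")
for row in complete_rows:
    print(row)
```

After this step every record carries its own site name, so sorting by any column no longer loses information.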
Best Practices Data Organization


• Include a Header Line as the 1st line (or record)
• Label each Column with a short but
  descriptive name
  – Names should be unique
  – Use letters, numbers, or “_” (underscore)
  – Do not include blank spaces or symbols (+ - & ^ *)
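The naming rules above can be checked automatically before a file is shared. This is a minimal sketch, not a standard tool: the function name is hypothetical, and two rules are added assumptions (names must start with a letter, and uniqueness is checked without regard to case).

```python
import re

# Letters, digits, and underscores only. Starting with a letter is an added assumption.
VALID_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def check_header(names):
    """Return a list of human-readable problems found in a header line."""
    problems = []
    seen = set()
    for name in names:
        if not VALID_NAME.fullmatch(name):
            problems.append(f"invalid characters: {name!r}")
        if name.lower() in seen:  # case-insensitive uniqueness is an assumption
            problems.append(f"duplicate name: {name!r}")
        seen.add(name.lower())
    return problems


print(check_header(["site_id", "obs_date", "temp_c"]))
print(check_header(["site id", "temp+c", "Temp_C", "temp_c"]))
```

The first header passes with no problems; the second is flagged for a blank space, a symbol, and a repeated name.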
Best Practices Data Organization


• Columns of data should be consistent
  – Use the same naming convention for text data
• Columns should include only a single kind of
  data
  – Text or “string” data
  – Integer numbers
  – Floating point or real numbers
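One way to find columns that mix kinds of data is to classify every cell and see how many kinds turn up. The sketch below assumes cells arrive as strings, as they would from a CSV file; the function names and sample values are hypothetical.

```python
def classify(value):
    """Classify a raw text cell as 'integer', 'float', or 'text'."""
    try:
        int(value)
        return "integer"
    except ValueError:
        pass
    try:
        float(value)
        return "float"
    except ValueError:
        return "text"


def column_kinds(rows, column):
    """Return the set of data kinds found in one column."""
    return {classify(row[column]) for row in rows}


# Hypothetical rows as read from a text file (all values are strings).
rows = [
    {"count": "4", "depth_m": "1.5"},
    {"count": "7", "depth_m": "2.0"},
    {"count": "about 10", "depth_m": "0.8"},
]

for column in ("count", "depth_m"):
    kinds = column_kinds(rows, column)
    status = "ok" if len(kinds) == 1 else "mixed"
    print(column, sorted(kinds), status)
```

Here the `count` column is reported as mixed because one cell holds a comment instead of a number.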
Use Standardized Formats

• ISO 8601 Standard for Date and Time
  – YYYY-MM-DDThh:mm:ss.sTZD
               2009-10-13T09:12:34.9Z
       2009-10-13T09:12:34.9+05:00
• Spatial Coordinates for Latitude/Longitude
  – +/- DD.DDDDD
        -78.476 (longitude)
        +38.029 (latitude)
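Most scripting languages can produce these formats directly, which avoids hand-typed variations. The sketch below uses Python's standard `datetime` module and writes the ISO 8601 extended form (with hyphens and colons); the helper `format_coordinate` is hypothetical, and five decimal places are chosen to match the DD.DDDDD pattern.

```python
from datetime import datetime, timedelta, timezone

# The same clock reading recorded in UTC and in a zone five hours ahead of UTC.
utc_time = datetime(2009, 10, 13, 9, 12, 34, 900000, tzinfo=timezone.utc)
offset_time = datetime(2009, 10, 13, 9, 12, 34, 900000,
                       tzinfo=timezone(timedelta(hours=5)))

iso_utc = utc_time.isoformat(timespec="milliseconds")
iso_offset = offset_time.isoformat(timespec="milliseconds")
print(iso_utc)
print(iso_offset)


def format_coordinate(value, limit):
    """Format decimal degrees with an explicit sign and five decimal places."""
    if not -limit <= value <= limit:
        raise ValueError(f"coordinate {value} is outside +/-{limit} degrees")
    return f"{value:+.5f}"


longitude = format_coordinate(-78.476, limit=180)
latitude = format_coordinate(38.029, limit=90)
print(longitude, latitude)
```

Python writes UTC as `+00:00` rather than `Z`; both are valid ISO 8601 designators for the same offset.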
File Names
File Names
• Use descriptive names
• Not too long
• Don’t use spaces
• Try to include time,
  place & theme
• May use “-” or “_”
File Names

• String words together with
  Caps (VegBiodiv_2007)
• Think about using version
  numbers
• Don’t change default
  extensions (txt, jpg, csv,…)
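File names that follow a fixed pattern are easiest to keep consistent when a small function builds them. This is a sketch under assumptions: the part order (theme, place, year, version) and the two-digit `v01` version style are illustrative choices, not a standard.

```python
import re


def make_file_name(theme, place, year, version, extension):
    """Build a name such as Theme_Place_Year_vNN.ext without spaces or symbols."""
    parts = [theme, place, str(year), f"v{version:02d}"]
    # Keep only letters, digits, and hyphens inside each part; join parts with "_".
    stem = "_".join(re.sub(r"[^A-Za-z0-9-]", "", part) for part in parts)
    return f"{stem}.{extension}"


file_name = make_file_name("VegBiodiv", "Blue Ridge", 2007, 1, "csv")
print(file_name)
```

The place name "Blue Ridge" is a made-up example; its space is removed so the result contains no blanks.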
Quality Assurance/Control
Dataset Creation & Integrity Errors
   • Use a data entry program
      – Program to catch typing errors
      – Program pull-down menu options
   • Perform double entry of the data
   • Manually check 5 – 10% of data records
   • Check for out-of-range values (plotting)
   • Check for missing or impossible values
   • Perform statistical summaries (random samples)
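Several of these checks are easy to script. The sketch below shows a range check, a missing-value check, a double-entry comparison, and a short summary; the temperature values, the plausible range, and the missing-value code -9999 are all assumptions made for illustration.

```python
import statistics


def find_suspect_values(values, low, high, missing_code):
    """Return positions of missing values and positions of out-of-range values."""
    missing = [i for i, v in enumerate(values) if v == missing_code]
    out_of_range = [i for i, v in enumerate(values)
                    if v != missing_code and not low <= v <= high]
    return missing, out_of_range


def compare_double_entry(first_pass, second_pass):
    """Return the positions where two independent entries of the data disagree."""
    return [i for i, (a, b) in enumerate(zip(first_pass, second_pass)) if a != b]


# Hypothetical air temperatures in degrees C; -9999 is an assumed missing-value code.
temps = [21.4, 22.0, -9999, 19.8, 198.0, 20.5]
missing, out_of_range = find_suspect_values(temps, low=-40.0, high=50.0,
                                            missing_code=-9999)
print("missing at:", missing, "out of range at:", out_of_range)

# Summarize only the values that passed both checks.
valid = [v for i, v in enumerate(temps) if i not in missing and i not in out_of_range]
print("min:", min(valid), "max:", max(valid), "mean:", statistics.mean(valid))

mismatches = compare_double_entry([21.4, 22.0, 19.8], [21.4, 20.2, 19.8])
print("double-entry mismatches at:", mismatches)
```

Flagged positions still need a person to decide whether the value is a typing error or a real observation.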
Analyzing Data - Notes
• Keep Original File
  – Uncorrected copy
  – Make “read-only”
• Make notes on transformations
• Any changes, save as a new file
• Use scripted code to transform and correct
  data
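Protecting the original and saving changes under a new name can both be scripted. A minimal sketch with Python's standard library follows; the file name and the `_v02` suffix are hypothetical, and the demonstration runs in a temporary folder so it touches no real data.

```python
import os
import shutil
import stat
import tempfile


def protect_original(path):
    """Remove write permission so the raw file cannot be edited by accident."""
    os.chmod(path, stat.S_IREAD)


def working_copy(original_path, suffix):
    """Copy the original's contents to a new, writable file with a suffix added."""
    root, ext = os.path.splitext(original_path)
    new_path = f"{root}_{suffix}{ext}"
    shutil.copyfile(original_path, new_path)  # copies contents, not permissions
    return new_path


with tempfile.TemporaryDirectory() as folder:
    raw_path = os.path.join(folder, "VegBiodiv_2007.csv")
    with open(raw_path, "w", encoding="utf-8") as handle:
        handle.write("site_id,height_cm\nN1,12.5\n")

    protect_original(raw_path)
    corrected_path = working_copy(raw_path, "v02")

    original_mode = stat.S_IMODE(os.stat(raw_path).st_mode)
    copy_name = os.path.basename(corrected_path)
    copy_writable = os.access(corrected_path, os.W_OK)
    print(copy_name, "writable:", copy_writable)

    # Restore write permission only so the temporary folder can be removed.
    os.chmod(raw_path, stat.S_IREAD | stat.S_IWRITE)
```

All corrections are then made to the copy, and the uncorrected file remains available for comparison.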
Analyzing Data
• Use a scripted program (R, SAS, SPSS, Matlab)
  – Steps are recorded in textual format
  – Can be easily revised and re-executed
  – Helps sharing and repetition
  – Easy to document
• GUI-based analysis may be easier, but it is harder
  to reproduce
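The slide names R, SAS, SPSS, and Matlab; the same idea is sketched here in Python purely for illustration. Each transformation is a named step, and running the steps also produces the notes on what was done. The column names, the steps, and the sample readings are hypothetical.

```python
def drop_missing(rows):
    """Remove records that have no temperature reading."""
    return [row for row in rows if row["temp_f"] is not None]


def add_celsius(rows):
    """Add a temp_c column converted from temp_f, rounded to 0.1 degree."""
    return [{**row, "temp_c": round((row["temp_f"] - 32) * 5 / 9, 1)} for row in rows]


# The ordered list of steps is the record of the analysis.
STEPS = [
    ("drop rows with missing temp_f", drop_missing),
    ("add temp_c converted from temp_f", add_celsius),
]


def run_steps(rows, steps):
    """Apply each step in order and note how the row count changed."""
    notes = []
    for description, step in steps:
        before = len(rows)
        rows = step(rows)
        notes.append(f"{description}: {before} -> {len(rows)} rows")
    return rows, notes


readings = [
    {"site_id": "N1", "temp_f": 68.0},
    {"site_id": "N2", "temp_f": None},
    {"site_id": "S1", "temp_f": 50.0},
]

cleaned, notes = run_steps(readings, STEPS)
for line in notes:
    print(line)
print(cleaned)
```

Revising the analysis means editing the step list and running the script again on the untouched original data.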
Document EVERYTHING!

• Create a Project Document File
  – More than a Lab Notebook
  – Data Management Plan
• Start at the beginning of the project and
  continue throughout data collection & analysis
  – Why you are collecting data
  – Exact details of methods of collecting & analyzing
Document EVERYTHING!
• Details such as:
  – Names of data & analysis files associated with
    study
  – Definitions for data and codes (include missing
    value codes, names)
  – Units of measure (accuracy and precision)
  – Standards or instrument calibrations
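These details can be kept in a small, machine-readable data dictionary stored next to the data file. The sketch below is one possible layout, not a metadata standard: the keys, column names, units, and the -9999 missing-value code are all assumptions for illustration.

```python
import json

# A hypothetical data dictionary for one data file.
data_dictionary = {
    "data_file": "VegBiodiv_2007.csv",
    "missing_value_code": -9999,
    "columns": [
        {"name": "site_id", "definition": "Sampling site identifier",
         "type": "text", "units": None},
        {"name": "obs_date", "definition": "Date of observation (ISO 8601)",
         "type": "text", "units": None},
        {"name": "height_cm", "definition": "Plant height, measured to 0.1 cm",
         "type": "float", "units": "cm"},
    ],
}


def undocumented_columns(header, dictionary):
    """Return header names that have no entry in the data dictionary."""
    documented = {column["name"] for column in dictionary["columns"]}
    return [name for name in header if name not in documented]


print(json.dumps(data_dictionary, indent=2))
not_documented = undocumented_columns(
    ["site_id", "obs_date", "height_cm", "notes"], data_dictionary)
print("undocumented:", not_documented)
```

Checking the header against the dictionary catches columns that were added to the data but never defined.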
Choosing File Formats

• Accessible Data (in the future)
  – Non-proprietary (software formats)
  – Open, documented standard
  – Common, used by the research community
  – Standard representation (ASCII, Unicode)
  – Unencrypted & Uncompressed
  – Media formats (hardware formats)
Preferred Format Choices
•   PDF, not Word
•   ASCII, not Excel
•   MPEG-4, not Quicktime
•   TIFF or JPEG2000, not GIF or JPG
•   XML or RDF, not RDBMS

Good if not software specific
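As an illustration of "ASCII, not Excel", tabular data can be written as comma-separated plain text that any program can open. A minimal sketch using Python's standard `csv` module follows; the function name, column names, and values are hypothetical.

```python
import csv
import io


def rows_to_csv_text(rows, columns):
    """Serialize rows to comma-separated text with a header line."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


plots = [
    {"site_id": "N1", "height_cm": 12.5},
    {"site_id": "S1", "height_cm": 9.8},
]

csv_text = rows_to_csv_text(plots, ["site_id", "height_cm"])
print(csv_text)
```

To store the result, write the text to a file with the default `.csv` extension and an explicit encoding such as UTF-8.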
Best Practices

1. Use Consistent Data Organization
2. Use Standardized Formats
3. Assign Descriptive File Names
4. Perform Basic Quality Assurance/Quality Control
5. Use Scripted Program for Analysis and Keep Notes
6. Document EVERYTHING! (Define Contents of Data
   Files)
7. Use Consistent, Stable and Open File Formats
Best Practices Bibliography
Borer, E. T., Seabloom, E. W., Jones, M. B., & Schildhauer, M. (2009). Some
   simple guidelines for effective data management. Bulletin of the Ecological
   Society of America, 90(2), 205-214.
Hook, L. A., Santhana Vannan, S.K., Beaty, T. W., Cook, R. B. and Wilson, B.E.
  (2010). Best Practices for Preparing Environmental Data Sets to Share and
  Archive. Available online (http://daac.ornl.gov/PI/BestPractices-2010.pdf)
  from Oak Ridge National Laboratory Distributed Active Archive Center, Oak
  Ridge, Tennessee, U.S.A. doi:10.3334/ORNLDAAC/BestPractices-2010.
Inter-university Consortium for Political and Social Research (ICPSR). (2012).
    Guide to social science data preparation and archiving: Best practices
    throughout the data cycle (5th ed.). Ann Arbor, MI. Retrieved
    05/31/2012, from
    http://www.icpsr.umich.edu/files/ICPSR/access/dataprep.pdf.
Data Observation Network for Earth (DataONE). (2012). DataONE Best
   Practices database. Retrieved 07/21/12, from
   http://www.dataone.org/best-practices.
Questions? Discussion?

• Sherry Lake
  Senior Scientific Data Consultant, UVA Library
• shlake@virginia.edu
• Twitter: shlakeuva
• Slideshare: http://www.slideshare.net/shlake
• Web: http://www.lib.virginia.edu/brown/data





 

Best practices data collection

  • 1. Best Practices Creating Research Data Sherry Lake July 31, 2012 University of Florida Data Management Workshop
  • 2. WHY? Following these Best Practices……. • Will improve the usability of the data by you or by others • Your data will be “computer ready” • Your data will be ready to share with others
  • 5. Problems • Dates are not stored consistently • Values are labeled inconsistently • Data coding is inconsistent • Order of values is different
  • 6. Problems • Confusion between numbers and text • Different types of data are stored in the same columns • The spreadsheet loses interpretability if it is sorted
  • 7. Best Practices Data Organization • Lines or rows of data should be complete – Designed to be machine readable, not human readable (sort)
  • 8. Best Practices Data Organization • Include a Header Line 1st line (or record) • Label each Column with a short but descriptive name – Names should be unique – Use letters, numbers, or “_” (underscore) – Do not include blank spaces or symbols (+ - & ^ *)
  • 9. Best Practices Data Organization • Columns of data should be consistent – Use the same naming convention for text data • Columns should include only a single kind of data – Text or “string” data – Integer numbers – Floating point or real numbers
  • 10. Use Standardized Formats • ISO 8601 Standard for Date and Time – YYYY-MM-DDThh:mm:ss.sTZD 2009-10-13T09:12:34.9Z 2009-10-13T09:12:34.9+05:00 • Spatial Coordinates for Latitude/Longitude – +/- DD.DDDDD -78.476 (longitude) +38.029 (latitude)
  • 12. File Names • Use descriptive names • Not too long • Don’t use spaces • Try to include time, place & theme • May use “-” or “_”
  • 13. File Names • String words together with Caps (VegBiodiv_2007) • Think about using version numbers • Don’t change default extensions (txt, jpg, csv,…)
  • 14. Quality Assurance/Control Dataset Creation & Integrity Errors • Use a data entry program – Program to catch typing errors – Program pull-down menu options • Perform double entry of the data • Manually check 5 – 10% of data records • Check for out-of-range values (plotting) • Check for missing or impossible values • Perform statistical summaries (random samples)
  • 15. Analyzing Data - Notes • Keep Original File – Uncorrected copy – Make “read-only” • Make notes on transformations • Any changes, save as a new file • Use scripted code to transform and correct data
  • 16. Analyzing Data • Use a scripted program (R, SAS, SPSS, Matlab) – Steps are recorded in textual format – Can be easily revised and re-executed – Helps sharing and repetition – Easy to document • GUI-based analysis may be easier, but harder to reproduce
  • 17. Document EVERYTHING! • Create a Project Document File – More than a Lab Notebook – Data Management Plan • Start at the beginning of the project and continue throughout data collection & analysis – Why you are collecting data – Exact details of methods of collecting & analyzing
  • 18. Document EVERYTHING! • Details such as: – Names of data & analysis files associated with study – Definitions for data and codes (include missing value codes, names) example – Units of measure (accuracy and precision) – Standards or instrument calibrations
  • 19. Choosing File Formats • Accessible Data (in the future) – Non-proprietary (software formats) – Open, documented standard – Common, used by the research community – Standard representation (ASCII, Unicode) – Unencrypted & Uncompressed – Media formats (hardware formats)
  • 20. Preferred Format Choices • PDF, not Word • ASCII, not Excel • MPEG-4, not Quicktime • TIFF or JPEG2000, not GIF or JPG • XML or RDF, not RDBMS Good if not software specific
  • 21. Best Practices 1. Use Consistent Data Organization 2. Use Standardized Formats 3. Assign Descriptive File Names 4. Perform Basic Quality Assurance/Quality Control 5. Use Scripted Program for Analysis and Keep Notes 6. Document EVERYTHING! (Define Contents of Data Files) 7. Use Consistent, Stable and Open File Formats
  • 22. Best Practices Bibliography Borer, E. T., Seabloom, E. W., Jones, M. B., & Schildhauer, M. (2009). Some simple guidelines for effective data management. Bulletin of the Ecological Society of America, 90(2), 205-214. Hook, L. A., Santhana Vannan, S.K., Beaty, T. W., Cook, R. B. and Wilson, B.E. (2010). Best Practices for Preparing Environmental Data Sets to Share and Archive. Available online (http://daac.ornl.gov/PI/BestPractices-2010.pdf) from Oak Ridge National Laboratory Distributed Active Archive Center, Oak Ridge, Tennessee, U.S.A. doi:10.3334/ORNLDAAC/BestPractices-2010. Inter-university Consortium for Political and Social Research (ICPSR). (2012). Guide to social science data preparation and archiving: Best practices throughout the data cycle (5th ed.). Ann Arbor, MI. Retrieved 05/31/2012, from http://www.icpsr.umich.edu/files/ICPSR/access/dataprep.pdf. Data Observation Network for Earth (DataONE). (2012). DataONE Best Practices database. Retrieved 07/21/12, from http://www.dataone.org/best-practices.
  • 23. Questions? Discussion? • Sherry Lake Senior Scientific Data Consultant, UVA Library • shlake@virginia.edu • Twitter: shlakeuva • Slideshare: http://www.slideshare.net/shlake • Web: http://www.lib.virginia.edu/brown/data
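
The date/time and coordinate conventions on slide 10 can be sketched with Python's standard library. This is only an illustration, not part of the original slides: the helper names `to_iso8601` and `format_coord` are hypothetical, and the sample values are the ones shown on the slide. Note that Python writes UTC as `+00:00` rather than `Z`; both are valid ISO 8601 time zone designators.

```python
from datetime import datetime, timedelta, timezone


def to_iso8601(dt: datetime) -> str:
    """Return an ISO 8601 extended-format timestamp (hypothetical helper).

    Rejects timestamps without a time zone, since the slide's format
    includes a time zone designator (TZD).
    """
    if dt.tzinfo is None:
        raise ValueError("timestamp needs a time zone designator")
    return dt.isoformat()


def format_coord(value: float) -> str:
    """Format a latitude or longitude as signed decimal degrees (+/-DD.DDDDD)."""
    return f"{value:+.5f}"


# Sample values from slide 10: 13 October 2009, 09:12:34.9
utc_stamp = to_iso8601(
    datetime(2009, 10, 13, 9, 12, 34, 900000, tzinfo=timezone.utc)
)
offset_stamp = to_iso8601(
    datetime(2009, 10, 13, 9, 12, 34, 900000, tzinfo=timezone(timedelta(hours=5)))
)

print(utc_stamp)             # 2009-10-13T09:12:34.900000+00:00
print(offset_stamp)          # 2009-10-13T09:12:34.900000+05:00
print(format_coord(-78.476))  # -78.47600
print(format_coord(38.029))   # +38.02900
```

Writing every timestamp through one helper keeps a date column in a single format, which is what makes it sortable and machine readable.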

Editor's Notes

  1. Have you ever collected data and had trouble remembering what you did at the start? Or tried to share your data with someone and they (or you) couldn’t understand it? Using “Best Practices” when you collect and record your data will improve future usability and may save time. Following these best practices (guidelines) will improve the usability of the data by you or by others, including when it is used with other data.
  2. Spreadsheets are widely used for simple analyses. They are easy to use, BUT they allow (encourage) users to structure data in ways that are hard to use with other software. You can use them like Word, with columns. Spreadsheets in this format are good for “human” interpretation, not computers, and since you will probably need to either write a program or use a software package, the “human” format is not best. These formats are good for presenting your findings, such as in publishing, but they will be harder to use with other software later on (if you need to do any analysis). It is better to store the data in ways that can be used in automated ways, with minimal human intervention.
  3. These are well data measurements, where a salinity meter was used to measure the salinity (top and bottom) and the conductivity (top and bottom). Take a look at this spreadsheet… What’s wrong with it? Could this be easily automated? Sorted?
  4. Dates are not stored consistently: sometimes the date is stored with a label (e.g., “Date:5/23/2005”), sometimes in its own cell (10/2/2005). Values are labeled inconsistently: sometimes “Conductivity Top”, other times “conductivity_top”. For Salinity, sometimes two cells are used for top and bottom; in others they are combined in one cell. Data coding is inconsistent: sometimes YSI_Model_30, sometimes “YSI Model 30”, so you can’t really tell if it’s a “label” or a data value. Tide State is sometimes a text description, sometimes a number. The order of values in the “mini-table” for a given sampling date differs: “Meter Type” comes first in the 5/23 table and second in the 10/2 table.
  5. Confusion between numbers and text: for most software, 39% or <30 are considered TEXT, not numbers (what is the average of 349 and <30?). Different types of data are stored in the same columns: many software products require that a single column contain either TEXT or NUMBERS (but not both!). The spreadsheet loses interpretability if it is sorted: dates are related to a set of attributes only by their position in the file. Once sorted, that relationship is lost. Not sure why you would sort this.
  6. The spreadsheet loses interpretability if it is sorted. Dates are related to a set of attributes only by their position in the file. Once sorted, that relationship is lost. Look what happens when we sort this… Look at the difference in this one… sort it. https://docs.google.com/spreadsheet/ccc?key=0Att-cHR6O7gCdEZ2NzRhUWFLYy1nM2FMcDhaNGRVeWc https://docs.google.com/spreadsheet/ccc?key=0Att-cHR6O7gCdHpTMC1kdWREbTNlanBwM3J5WVE3ZFE
  7. The standard convention for many software programs (usually a yes/no “check” option) is for the 1st line (record) to be a header line that lists the names of the variables in the file. The rest of the records (lines) are data. Names should not be too long: some software programs may not work with long variable names.
  8. We’ve seen that a spreadsheet or word processor can create datasets that can only be interpreted by human intervention. The “ugly spreadsheet” example would be hard to analyze even in a spreadsheet, except with lots of case-by-case human decisions. But what are some principles that characterize good archival data? Keep in mind that good formats for archiving and sharing data may not be the ones you prefer for viewing or analysis! Same naming convention for text data: use a vocabulary and keep it the same, e.g. “slack-high”, not “slack high”.
  9. There are already standards for certain types of data (like date/time, spatial coordinates). Use them; don’t invent your own. Can you think of others? In ISO 8601 date/time strings, am/pm is NOT allowed and T appears literally in the string. The minimum for a date is YYYY. YYYY = four-digit year; MM = two-digit month (01=January, etc.); DD = two-digit day of month (01 through 31); hh = two digits of hour (00 through 23); mm = two digits of minute (00 through 59); ss = two digits of second (00 through 59); s = one or more digits representing a decimal fraction of a second; TZD = time zone designator (Z or +hh:mm or -hh:mm). Decimal degrees vs. DMS (degrees, minutes, seconds): this is important when a data field could have more than one type of unit.
  10. Guidelines for file names will only help you with your files/research. Once files are “archived” they will get new names that fit the archive’s systems, usually a permanent name based on how the computer “locates” the file. Look at the file names… Context.txt, DataFile1.txt, DataFile2.txt, word6doc.zip. Long ones… Safari, Ray… good date, place. Note “_” and “-”. Think about how the name will look in a directory with lots of other files; you want to be able to “pick it out”.
  11. File names are the easiest way to indicate the contents of a file; make them terse but indicative of their content. You want to uniquely identify the data file. Don’t make them too long: some scripting programs have a file name length limit for file importing (reading). Don’t use blanks: some software may not be able to read file names with blanks. Think about how the name will look in a directory with lots of other files; you want to be able to “pick it out”.
  12. Maybe use version numbers… Don’t forget the extension (3 characters), which is used to tell the file type.
  13. Data quality control takes place at various stages: during data collection, data entry, and data checking. The quality of the data collection methods used has a significant bearing on data quality. Quality includes equipment calibration (use instrument calibration to check precision), which allows other researchers to look at your data and compare it to theirs. Transcription needs to be validated. Train coders (different people may be doing this) and create a handbook. You can create (program) data entry interfaces and verify data entry; use lists to choose from. Verification: out-of-range values, random samples, double-checking entries. Minimize manual entry. Visual Basic can create forms for Excel; Access has form creation. Check a random sample of the data. Run consistency checks. Double entry: each record is keyed in and then re-keyed against the original. Several standard packages offer this feature. In the re-entry process, the program catches discrepancies immediately. Start before data collection: define standards and document them in the handbook.
  14. You don’t want to change (or delete) something that could be important later. If you use a scripted language, you can re-run analyses.
  15. “Scripted” analysis software: R, SAS, SPSS, Matlab. Analysis scripts are written records of the various steps involved in processing and analyzing data (sort of “analytical metadata”). They are easily revised and re-executed at any time if you need to modify the analysis. A GUI, by contrast, is easier but does not leave a clear accounting of exactly what you have done. Document scripted code with comments on why data is being changed.
  16. Important to repeat!!!! More documentation: documentation can also be called metadata. Describe the data file names (especially if using acronyms and abbreviations). Record why you are collecting data; details of the methods of analysis; names of all data and analysis files; definitions for data (include coding keys); missing value codes; and units of measure. Use structured metadata (XML) format standards for your discipline (e.g., Ecological Metadata Language, EML).
  17. Documentation can also be called metadata. Describe the data file names (especially if using acronyms and abbreviations). Record why you are collecting data; details of the methods of analysis; names of all data and analysis files; definitions for data (include coding keys); missing value codes; units of measure; and calibrations, so others can compare their results with yours. Use structured metadata (XML) format standards for your discipline (e.g., Ecological Metadata Language, EML).
  18. Spreadsheets are widely used for simple analyses, but they have poor archival qualities: different versions over time are not compatible, and formulas are hard to capture or display. Plan what type of data you will be collecting. You want to choose a file format that can be read well into the future and is independent of software changes. These are formats more likely to be accessible in the future. The alternative is replacing old media and maintaining devices that can still read the proprietary formats or media types. The format of the file is a major factor in the ability to use the data in the future. As technology changes, plan for software and hardware obsolescence. System files (SAS, SPSS) are compact and efficient, but not very portable. Use software to “export” data to a portable (or transport) file. Convert proprietary formats to non-proprietary ones. Check for data errors in conversion.
  19. Examples of preferred format choices: formats for long-term digital preservation (open). Don’t expect that you (you won’t have time) or the archive will be able to convert older formats to new ones.
  20. (1) Remember to create the spreadsheet so it can be automated. (2) Date/time standards, geospatial coordinates, species, other standards from your discipline. (3) Descriptive file names: file names can help identify what’s inside. (4) Quality assurance: when planning data entry you can “program” data checks in forms (Access and Excel), create pick lists (codes), and define missing data values. (5) Make it easier to replicate data transformations; they can be documented. (6) Document EVERYTHING: dataset details, database details, collection notes (conditions). You will not remember everything 20 years from now! Record what someone would need to know about your data to use it. (7) Stable file formats: it is easier if all files are in the same format, and it helps to know which formats are better in the long term.
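
The quality checks described in the notes (missing values, text mixed into a numeric column, out-of-range values) could be scripted roughly as follows. This is a sketch under stated assumptions, not the presenter's method: the column names, the `NA` missing-value code, the sample rows, and the 0 to 45 plausible range are all invented for illustration, and `check_column` is a hypothetical helper.

```python
import csv
import io

# Hypothetical sample data: the column names, the "NA" missing-value code,
# and the values themselves are assumptions made for this illustration.
SAMPLE = """site_id,sample_date,salinity_top
W01,2005-05-23,12.4
W02,2005-05-23,NA
W03,2005-10-02,-3.0
W04,2005-10-02,<30
"""


def check_column(rows, column, low, high, missing_code="NA"):
    """Return (row_number, value, problem) for each suspect value in a numeric column."""
    problems = []
    for row_number, row in enumerate(rows, start=2):  # row 1 is the header line
        value = row[column]
        if value == missing_code:
            problems.append((row_number, value, "missing"))
            continue
        try:
            number = float(value)
        except ValueError:
            # Text such as "<30" cannot be averaged with real numbers.
            problems.append((row_number, value, "not numeric"))
            continue
        if not low <= number <= high:
            problems.append((row_number, value, "out of range"))
    return problems


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
# The 0 to 45 range is an assumed plausibility window, not a documented standard.
issues = check_column(rows, "salinity_top", low=0.0, high=45.0)
for issue in issues:
    print(issue)
```

Because the check is a script rather than a manual inspection, it can be re-run unchanged after every round of data entry, and the script itself documents what was checked.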