Managing the Research Data Life Cycle
Presented by Sherry Lake
ShLake@virginia.edu

July 31, 2012, University of Florida Data Management Workshop
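Research Life Cycle

[Diagram: the research life cycle runs through the stages Proposal Planning Writing, Project Start Up, Data Collection, Data Analysis, Data Sharing, and End of Project. The data life cycle is overlaid on it, with the labels Data Discovery, Re-Use, Re-Purpose, Data Archive, and Deposit.]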
Why Manage Data?

 Saves time

 Others can understand your data

 Makes sharing/preserving data easier
   Reinforces open scientific inquiry and replication of results

   Increases the visibility of your research

   Facilitates new discoveries

   Reduces costs by avoiding duplication

   Required by funding agencies

(Stage: Proposal Planning Writing)
Ethical and Legal Issues

 Confidentiality
   Evaluate the sensitivity of your data
   Comply with institution’s research guidelines
   Comply with regulations for health research
   May need to enable a restricted view of your data

 Intellectual Property
   Copyright
   Patents

(Stage: Proposal Planning Writing)
Data Sharing and Retention Requirements

 Be Aware of Funding Requirements
   Informal sharing statement
   Separate Data Management Plan

 Know What Your Institution Requires

 Know What Your Department Requires

 Publisher’s Requirement
   Nature Magazine

(Stage: Proposal Planning Writing)
Create a Data Management Plan

 Appoint Data Manager Contact
 Describe data to be collected and methodology
 Include guidelines on data documentation
 Plan quality assurance and backup procedures
 Plan sharing of data for public use
 Include preservation plans
 Document copyright and intellectual property rights

(Stage: Project Start Up)
Data Life Cycle within Context of the Research Life Cycle

[Diagram: the data life cycle (Data Discovery, Re-Use, Re-Purpose, Data Archive, Deposit) shown against the research life cycle stages Proposal Planning Writing, Project Start Up, Data Collection, Data Analysis, Data Sharing, and End of Project.]
Managing Data in the Data Life Cycle

 Data Collection and Organization

 Data Control & Security

 Backup & Storage

 Documentation and Metadata

 Processing and Analysis

 Preparing Data to Share
What is Data?

 Observational – data captured in real time
   Examples: sensor readings, telemetry, survey results, images
   Usually irreplaceable

 Experimental – data from lab equipment
   Examples: gene sequences, chromatograms, magnetic field readings
   Often reproducible, but can be expensive
What is Data? (continued)

 Simulation – data generated from test models
   Examples: climate models, economic models
   Models & metadata (inputs) more important than output data

 Derived or compiled – data derived or compiled from existing sources
   Examples: text and data mining, compiled database, 3D models
   Reproducible (but very expensive)
Types and Formats of Data
Types                 Examples

Text                  ASCII, Word, PDF
Numerical             ASCII, SPSS, STATA, Excel, Access, MySQL
Multimedia            JPEG, TIFF, MPEG, QuickTime
Models                3D, statistical
Software              Java, C, Fortran
Domain-specific       FITS in astronomy, CIF in chemistry
Instrument-specific   Olympus Confocal Microscope Data Format
Organizing Your Files

 File Version Control

 Directory Structure/File Naming Conventions

 File Naming Conventions for Specific Disciplines

 File Structure

 Use Same Structure for Backups
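
A file naming convention is easier to follow when it can be checked automatically. The sketch below is only an illustration: the pattern project_description_YYYYMMDD_vNN.ext, the function names, and the sample values are assumptions made for this example, not a convention prescribed in the presentation.

```python
import re
from datetime import date

# Hypothetical convention: <project>_<description>_<YYYYMMDD>_v<NN>.<ext>
# Lowercase letters, digits and hyphens only, so names sort and parse reliably.
NAME_PATTERN = re.compile(r"^[a-z0-9]+_[a-z0-9-]+_\d{8}_v\d{2}\.[a-z0-9]+$")


def build_name(project, description, day, version, ext):
    """Assemble a file name that follows the illustrative convention."""
    return f"{project}_{description}_{day:%Y%m%d}_v{version:02d}.{ext}"


def follows_convention(name):
    """Return True when a file name matches the illustrative pattern."""
    return NAME_PATTERN.match(name) is not None


name = build_name("soilsurvey", "site-a-ph", date(2012, 7, 31), 3, "csv")
print(name)
print(follows_convention(name))
print(follows_convention("final data NEW (2).xls"))
```

Putting the date in YYYYMMDD form and zero-padding the version number keeps files in chronological order when a directory is sorted by name.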
Data Security & Access Control

Protection of data from unauthorized
access, use, change, disclosure and destruction
  • Network Security
  • Physical Security
  • Computer Systems & Files
Data Security & Access Control

 Network security
   Keep confidential data off internet servers (or behind firewalls)
   Put sensitive materials on computers not connected to the
     internet

 Physical security
   Access to buildings and rooms

 Computer systems & files
   Use passwords on files/systems
   Virus protection
Data Storage

Things to consider when deciding on where and how to store
your data

 File Format

 Media Life and Format

 Disaster Recovery Plan

 Environmental Conditions

 Security
Backup Your Data

 Reduce the risk of damage or loss

 Use multiple locations (one off-site)

 Validate using checksums

 Create a backup schedule

 Use reliable backup medium

 Test your backup system (i.e., test file recovery)
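
One possible way to validate a backup with checksums is to compare a cryptographic digest of the original file with the digest of its copy. This is a minimal sketch using Python's standard hashlib module; the helper names, the choice of SHA-256, and the sample file contents are assumptions for illustration.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_is_intact(original, backup):
    """A backup is treated as intact when both digests are identical."""
    return sha256_of(original) == sha256_of(backup)


# Demonstration in a temporary directory with made-up sample data.
with tempfile.TemporaryDirectory() as workdir:
    original = Path(workdir) / "readings.csv"
    backup = Path(workdir) / "readings_backup.csv"
    original.write_text("site,ph\nA,6.8\n", encoding="utf-8")
    shutil.copy2(original, backup)
    intact_after_copy = backup_is_intact(original, backup)

    # Simulate silent corruption of the backup copy.
    backup.write_text("site,ph\nA,6.9\n", encoding="utf-8")
    intact_after_change = backup_is_intact(original, backup)

print(intact_after_copy, intact_after_change)
```

Storing the digest alongside each backup also allows the copy to be re-checked later, when the original may no longer be at hand.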
Backup & Storage Options

 Personal Computer
 Departmental or University Server
 Tape Backups
 Subject archive
 CDs or DVDs – NOT Recommended
 External Hard Drives
 Cloud Storage
Documentation

 Start at beginning of research and continue throughout

 Data documentation enables you to understand the data in
   detail

 Enables others to find it, use it and properly cite it
Data Documentation
Data documentation includes information on:
   + The Project
   + Data Collection Methods
   + Structure of the data files
   + Data sources used
   + Transformations of the data

At the data-level, information on:
   + Labels and descriptions for variables & records
   + Codes and classifications
   + Derived data algorithms
   + File format and software used
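
Data-level documentation such as variable labels and codes can be kept in a small machine-readable codebook. The sketch below shows one possible shape; the variable names, labels, and code values are invented for the example and do not come from the presentation.

```python
# Hypothetical codebook: each variable has a label and, where the values
# are coded, a mapping from code to meaning.
codebook = {
    "sex": {
        "label": "Sex of respondent",
        "codes": {1: "female", 2: "male", 9: "not reported"},
    },
    "age": {
        "label": "Age in years",
        "codes": None,  # uncoded numeric value
    },
}


def describe(variable, value):
    """Translate a stored value into a documented, human-readable form."""
    entry = codebook[variable]
    codes = entry["codes"]
    meaning = codes.get(value, "undocumented code") if codes else value
    return f"{entry['label']}: {meaning}"


print(describe("sex", 2))
print(describe("age", 34))
print(describe("sex", 5))
```

A lookup like this also surfaces values that have no documented meaning, which is a useful check before data are shared.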
Data Collection

Best Practices detailed in the
  presentation that follows.




(Stage: Data Collection)
Data Processing & Analysis

Software tools to create, process and visualize the data
   + Programming languages (Fortran, PHP, Ruby, Python, C++, etc.)
   + Data collection software (LabVIEW)
   + Analysis (SPSS, SAS, MATLAB, Mathematica, R, etc.)

(Stage: Data Analysis)
Recording Processes

Record every change to a file, no matter how small
   + Document changes to files
   + Use file naming conventions
   + Headers inside the file
   + Log files (automatic)
   + Version Control Software (e.g. SVN)
   + File sharing software (Google Drive, Dropbox, or others)

(Stage: Data Analysis)
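
Changes to files can be recorded automatically in a plain-text log. The following is a minimal sketch, assuming a CSV change log with four columns; the column names, file names, and author ID are hypothetical, and version control software such as SVN covers the same need more thoroughly.

```python
import csv
import tempfile
from datetime import datetime, timezone
from pathlib import Path

# Assumed log layout for this example.
LOG_FIELDS = ["timestamp_utc", "file", "author", "change"]


def record_change(log_path, file_name, author, change):
    """Append one row describing a change; write the header on first use."""
    log_path = Path(log_path)
    is_new = not log_path.exists()
    with log_path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=LOG_FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow({
            "timestamp_utc": datetime.now(timezone.utc).isoformat(timespec="seconds"),
            "file": file_name,
            "author": author,
            "change": change,
        })


def read_log(log_path):
    """Return all logged changes as a list of dictionaries."""
    with Path(log_path).open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


# Demonstration in a temporary directory with made-up entries.
with tempfile.TemporaryDirectory() as workdir:
    log_file = Path(workdir) / "CHANGELOG.csv"
    record_change(log_file, "readings_v01.csv", "jdoe", "Removed duplicate rows")
    record_change(log_file, "readings_v02.csv", "jdoe", "Converted units to SI")
    entries = read_log(log_file)

for entry in entries:
    print(entry["timestamp_utc"], entry["file"], entry["change"])
```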
Prepare to Share

Preparing data to share makes publishing data easier
  •   Archive Submission Policies/Guidelines
  •   File Format Conversion
  •   Documentation & Metadata
  •   Programming Code
  •   Citations to existing datasets
  •   Creation of unrestricted dataset

(Stage: Data Sharing)
Choosing File Formats

Accessible in the future
   •   Non-proprietary
   •   Open, documented standard
   •   Common, used by the research community
   •   Standard representation (ASCII, Unicode)
   •   Unencrypted
   •   Uncompressed



(Stage: Data Sharing)
Preferred Format Choices

 PDF, not Word

 ASCII, not Excel

 MPEG-4, not Quicktime

 TIFF or JPEG2000, not GIF or JPG

 XML or RDF, not RDBMS


Not software specific

(Stage: Data Sharing)
Documentation & Metadata

What is Metadata?

 Who created the data?

 What is the content of the data set?

 When was it created?

 Where was it collected?

 How was it developed?
 Why was it developed?

(Stage: Data Sharing)
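
The who, what, when, where, how, and why questions can be captured as a simple structured record and checked for completeness. This sketch does not follow DDI, FGDC, EML, or any other formal standard; the keys mirror the questions above, and every value is a made-up placeholder.

```python
import json

# The six questions a metadata record should answer.
REQUIRED_KEYS = ("who", "what", "when", "where", "how", "why")


def missing_fields(record):
    """List the questions that the record leaves unanswered."""
    return [key for key in REQUIRED_KEYS if not str(record.get(key, "")).strip()]


# Hypothetical record; all values are illustrative placeholders.
record = {
    "who": "J. Doe, Example University",
    "what": "Soil pH readings from three field sites",
    "when": "2012-06-01 to 2012-07-15",
    "where": "Field sites A, B and C",
    "how": "Handheld pH meter, one reading per site per day",
    "why": "Baseline for a soil acidity study",
}

print(json.dumps(record, indent=2))
print(missing_fields(record))
print(missing_fields({"who": "J. Doe"}))
```

A record like this is a starting point only; a discipline-specific standard and its tooling would normally replace it before deposit.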
Metadata Formats & Standards
 Provides structure to describe data
   Common terms
   Definitions
   Language
   Structure

 Many different standards (based on discipline)
   DDI
   FGDC
   EML

 Tools for creating metadata files
   Nesstar (DDI)
   Metavist (FGDC)
   Morpho (EML)

(Stage: Data Sharing)
Archiving Your Data

 Informally on a peer-to-peer basis

 Make accessible on online project web page

 Make accessible on institutional web site

 Submitting to a journal

 Deposit in discipline specific repository

 Deposit in Institutional Repository
Advantages of Repositories

 Secure Environment
 Quality of Data
 Access Control to Data
 Long-term Preservation
 Licensing Arrangements
 Backups
 Promotion of Data
 Easy Dissemination
 Online Resource Discovery
Data Repositories
 Examples of discipline-specific repositories:
  + SIMBAD (Astronomy)
  + Protein Data Bank (Biology)
  + PubChem (Chemistry)
  + GEON (Earth Science)
  + Long Term Ecological Research (Ecology)
  + ICPSR (Social Sciences)

Databib is a tool for helping people identify
and locate online repositories of research data.
http://databib.org
Data Management Bibliography

Graham, A., McNeill, K., Stout, A., & Sweeney, L. (2010). Data Management
   and Publishing. Retrieved 05/31/2012, from
   http://libraries.mit.edu/guides/subjects/data-management/.

Inter-university Consortium for Political and Social Research (ICPSR). (2012).
   Guide to social science data preparation and archiving: Best practices
   throughout the data cycle (5th ed.). Ann Arbor, MI. Retrieved
   05/31/2012, from
   http://www.icpsr.umich.edu/files/ICPSR/access/dataprep.pdf.

Van den Eynden, V., Corti, L., Woollard, M., & Bishop, L. (2011). Managing and
   Sharing Data: A Best Practice Guide for Researchers (3rd ed.). Retrieved
   05/31/2012, from
   http://www.data-archive.ac.uk/media/2894/managingsharing.pdf.
Questions?

 Sherry Lake
  Senior Scientific Data Consultant, UVA Library
 shlake@virginia.edu

 Twitter: shlakeuva

 Slideshare: http://www.slideshare.net/shlake

 Web: http://www.lib.virginia.edu/brown/data

 
Plant Tissue culture., Plasticity, Totipotency, pptx
Plant Tissue culture., Plasticity, Totipotency, pptxPlant Tissue culture., Plasticity, Totipotency, pptx
Plant Tissue culture., Plasticity, Totipotency, pptxHimansu10
 
2024.03.16 How to write better quality materials for your learners ELTABB San...
2024.03.16 How to write better quality materials for your learners ELTABB San...2024.03.16 How to write better quality materials for your learners ELTABB San...
2024.03.16 How to write better quality materials for your learners ELTABB San...Sandy Millin
 

Recently uploaded (20)

Alamkara theory by Bhamaha Indian Poetics (1).pptx
Alamkara theory by Bhamaha Indian Poetics (1).pptxAlamkara theory by Bhamaha Indian Poetics (1).pptx
Alamkara theory by Bhamaha Indian Poetics (1).pptx
 
AI Uses and Misuses: Academic and Workplace Applications
AI Uses and Misuses: Academic and Workplace ApplicationsAI Uses and Misuses: Academic and Workplace Applications
AI Uses and Misuses: Academic and Workplace Applications
 
LEAD5623 The Economics of Community Coll
LEAD5623 The Economics of Community CollLEAD5623 The Economics of Community Coll
LEAD5623 The Economics of Community Coll
 
3.12.24 The Social Construction of Gender.pptx
3.12.24 The Social Construction of Gender.pptx3.12.24 The Social Construction of Gender.pptx
3.12.24 The Social Construction of Gender.pptx
 
Auchitya Theory by Kshemendra Indian Poetics
Auchitya Theory by Kshemendra Indian PoeticsAuchitya Theory by Kshemendra Indian Poetics
Auchitya Theory by Kshemendra Indian Poetics
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - HK2 (...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - HK2 (...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - HK2 (...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - HK2 (...
 
THYROID HORMONE.pptx by Subham Panja,Asst. Professor, Department of B.Sc MLT,...
THYROID HORMONE.pptx by Subham Panja,Asst. Professor, Department of B.Sc MLT,...THYROID HORMONE.pptx by Subham Panja,Asst. Professor, Department of B.Sc MLT,...
THYROID HORMONE.pptx by Subham Panja,Asst. Professor, Department of B.Sc MLT,...
 
Research Methodology and Tips on Better Research
Research Methodology and Tips on Better ResearchResearch Methodology and Tips on Better Research
Research Methodology and Tips on Better Research
 
POST ENCEPHALITIS case study Jitendra bhargav
POST ENCEPHALITIS case study  Jitendra bhargavPOST ENCEPHALITIS case study  Jitendra bhargav
POST ENCEPHALITIS case study Jitendra bhargav
 
BBA 205 BUSINESS ENVIRONMENT UNIT I.pptx
BBA 205 BUSINESS ENVIRONMENT UNIT I.pptxBBA 205 BUSINESS ENVIRONMENT UNIT I.pptx
BBA 205 BUSINESS ENVIRONMENT UNIT I.pptx
 
ANOVA Parametric test: Biostatics and Research Methodology
ANOVA Parametric test: Biostatics and Research MethodologyANOVA Parametric test: Biostatics and Research Methodology
ANOVA Parametric test: Biostatics and Research Methodology
 
Least Significance Difference:Biostatics and Research Methodology
Least Significance Difference:Biostatics and Research MethodologyLeast Significance Difference:Biostatics and Research Methodology
Least Significance Difference:Biostatics and Research Methodology
 
LEAD6001 - Introduction to Advanced Stud
LEAD6001 - Introduction to Advanced StudLEAD6001 - Introduction to Advanced Stud
LEAD6001 - Introduction to Advanced Stud
 
Quantitative research methodology and survey design
Quantitative research methodology and survey designQuantitative research methodology and survey design
Quantitative research methodology and survey design
 
25 CHUYÊN ĐỀ ÔN THI TỐT NGHIỆP THPT 2023 – BÀI TẬP PHÁT TRIỂN TỪ ĐỀ MINH HỌA...
25 CHUYÊN ĐỀ ÔN THI TỐT NGHIỆP THPT 2023 – BÀI TẬP PHÁT TRIỂN TỪ ĐỀ MINH HỌA...25 CHUYÊN ĐỀ ÔN THI TỐT NGHIỆP THPT 2023 – BÀI TẬP PHÁT TRIỂN TỪ ĐỀ MINH HỌA...
25 CHUYÊN ĐỀ ÔN THI TỐT NGHIỆP THPT 2023 – BÀI TẬP PHÁT TRIỂN TỪ ĐỀ MINH HỌA...
 
EDD8524 The Future of Educational Leader
EDD8524 The Future of Educational LeaderEDD8524 The Future of Educational Leader
EDD8524 The Future of Educational Leader
 
3.14.24 The Selma March and the Voting Rights Act.pptx
3.14.24 The Selma March and the Voting Rights Act.pptx3.14.24 The Selma March and the Voting Rights Act.pptx
3.14.24 The Selma March and the Voting Rights Act.pptx
 
LEAD6001 - Introduction to Advanced Stud
LEAD6001 - Introduction to Advanced StudLEAD6001 - Introduction to Advanced Stud
LEAD6001 - Introduction to Advanced Stud
 
Plant Tissue culture., Plasticity, Totipotency, pptx
Plant Tissue culture., Plasticity, Totipotency, pptxPlant Tissue culture., Plasticity, Totipotency, pptx
Plant Tissue culture., Plasticity, Totipotency, pptx
 
2024.03.16 How to write better quality materials for your learners ELTABB San...
2024.03.16 How to write better quality materials for your learners ELTABB San...2024.03.16 How to write better quality materials for your learners ELTABB San...
2024.03.16 How to write better quality materials for your learners ELTABB San...
 

Managing the research life cycle

  • 1. Managing the Research Data Life Cycle Presented by Sherry Lake ShLake@virginia.edu July 31, 2012 University of Florida Data Management Workshop
  • 2. Research Life Cycle Data Re- Data Deposit Discovery Use Archive Proposal Project Data Data Data End of Planning Start Up Collection Analysis Sharing Project Writing Re- Purpose Data Life Cycle
  • 3. Why Manage Data?  Saves time  Others can understand your data  Makes sharing/preserving data easier  Reinforces open scientific inquiry and replication of results  Increases the visibility of your research  Facilitates new discoveries  Reduces costs by avoiding duplication  Required by funding agencies Proposal Planning Writing
  • 4. Ethical and Legal Issues  Confidentiality  Evaluate the sensitivity of your data  Comply with institution’s research guidelines  Comply with regulations for health research  May need to enable a restricted view of your data  Intellectual Property  Copyright  Patents Proposal Planning Writing
  • 5. Data Sharing and Retention Requirements  Be Aware of Funding Requirements  Informal sharing statement  Separate Data Management Plan  Know What Your Institution Requires  Know What Your Department Requires  Publisher’s Requirement  Nature Magazine Proposal Planning Writing
  • 6. Create a Data Management Plan  Appoint Data Manager Contact  Describe data to be collected and methodology  Include guidelines on data documentation  Plan quality assurance and backup procedures  Plan sharing of data for public use  Include preservation plans  Document copyright and intellectual property rights Project Start Up
  • 7. Data Life Cycle within Context of the Research Life Cycle Data Re- Data Deposit Discovery Use Archive Proposal Project Data Data Data End of Planning Start Up Collection Analysis Sharing Project Writing Re- Purpose Data Life Cycle
  • 8. Managing Data in the Data Life Cycle  Data Collection and Organization  Data Control & Security  Backup & Storage  Documentation and Metadata  Processing and Analysis  Preparing Data to Share
  • 9. What is Data?  Observational – data captured in real-time  Examples: Sensor readings, telemetry, survey results, images  Usually irreplaceable  Experimental – data from lab equipment  Examples: gene sequences, chromatograms, magnetic field readings  Often reproducible, but can be expensive
  • 10. What is Data?  Simulation – data generated from test models  Examples: climate models, economic models  Models & metadata (inputs) more important than output data  Derived or compiled – data  Examples: text and data mining, compiled database, 3D models  Reproducible (but very expensive)
  • 11. Types and Formats of Data Types Examples Text ASCII, Word, PDF Numerical ASCII, SPSS, STATA, Excel, Access, MySQL Multimedia Jpeg, tiff, mpeg, quicktime Models 3D, statistical Software Java, C, Fortran Domain-specific FITS in astronomy, CIF in chemistry Instrument- Olympus Confocal Microscope specific Data Format
  • 12. Organizing Your Files  File Version Control  Directory Structure/File Naming Conventions  File Naming Conventions for Specific Disciplines  File Structure  Use Same Structure for Backups
  • 13. Data Security & Access Control Protection of data from unauthorized access, use, change, disclosure and destruction • Network Security • Physical Security • Computer Systems & Files
  • 14. Data Security & Access Control  Network security  Keep confidential data off internet servers (or behind firewalls)  Put sensitive materials on computers not connected to the internet  Physical security  Access to buildings and rooms  Computer systems & files  Use passwords on files/systems  Virus protection
  • 15. Data Storage Things to consider when deciding on where and how to store your data  File Format  Media Life and Format  Disaster Recovery Plan  Environmental Conditions  Security
  • 16. Backup Your Data  Reduce the risk of damage or loss  Use multiple locations (one off-site)  Validate using checksums  Create a backup schedule  Use reliable backup medium  Test your backup system (i.e., test file recovery)
  • 17. Backup & Storage Options  Personal Computer  Departmental or University Server  Tape Backups  Subject archive  CDs or DVDs – NOT Recommended  External Hard Drives  Cloud Storage
  • 18. Documentation  Start at beginning of research and continue throughout  Data documentation enables you to understand the data in detail  Enables others to find it, use it and properly cite it
  • 19. Data Documentation Data documentation includes information on: + The Project + Data Collection Methods + Structure of the data files + Data sources used + Transformations of the data At the data-level, information on: + Labels and descriptions for variables & records + Codes and classifications + Derived data algorithms + File format and software used
  • 20. Data Collection Best Practices detailed in the presentation that follows. Data Collection
  • 21. Data Processing & Analysis Software tools to create, process and visualize the data + Programming languages (Fortran, PHP, Ruby, Python, C++, etc) + Data collection software (LabView) + Analysis (SPSS, SAS, Matlab, Mathematica, R, etc) Data Analysis
  • 22. Recording Processes Record every change to a file, no matter how small + Document changes to files + Use file naming conventions + Headers inside the file + Log files (automatic) + Version Control Software (e.g. SVN) + File sharing software (Google Drive, or DropBox, others) Data Analysis
  • 23. Prepare to Share Preparing data to share makes publishing data easier • Archive Submission Policies/Guidelines • File Format Conversion • Documentation & Metadata • Programming Code • Citations to existing datasets • Creation of un-restricted dataset Data Sharing
  • 24. Choosing File Formats Accessible in the future • Non-proprietary • Open, documented standard • Common, used by the research community • Standard representation (ASCII, Unicode) • Unencrypted • Uncompressed Data Sharing
  • 25. Preferred Format Choices  PDF, not Word  ASCII, not Excel  MPEG-4, not Quicktime  TIFF or JPEG2000, not GIF or JPG  XML or RDF, not RDBMS Not software specific Data Sharing
  • 26. Documentation & Metadata What is Metadata?  Who created the data?  What is the content of the data set?  When was it created?  Where was it collected?  How was it developed? Data  Why was it developed? Sharing
  • 27. Metadata Formats & Standards  Provides structure to describe data  Common terms  Definitions  Language  Structure  Many different standards (based on discipline)  DDI  FGDC  EML  Tools for creating metadata files  Nesstar (DDI) Data Sharing  Metavist (FGDC)  Morpho (EML)
  • 28. Archiving Your Data  Informally on a peer-to-peer basis  Make accessible on online project web page  Make accessible on institutional web site  Submitting to a journal  Deposit in discipline specific repository  Deposit in Institutional Repository
  • 29. Advantages of Repositories  Secure Environment  Backups  Quality of Data  Promotion of Data  Access Control to Data  Easy Dissemination  Long-term Preservation  Online Resource Discovery  Licensing Arrangements
  • 30. Data Repositories  Example of discipline specific repositories: + SIMBAD (Astronomy) + Protein Data Bank (Biology) + PubChem (Chemistry) + GEON (Earth Science) + Long Term Ecological Research (Ecology) + ICPSR (Social Sciences) Databib is a tool for helping people identify and locate online repositories of research data. http://databib.org
  • 31. Data Management Bibliography Graham, A., McNeill, K., Stout, A., & Sweeney, L. (2010). Data Management and Publishing. Retrieved 05/31/2012, from http://libraries.mit.edu/guides/subjects/data-management/. Inter-university Consortium for Political and Social Research (ICPSR). (2012). Guide to social science data preparation and archiving: Best practices throughout the data cycle (5th ed.). Ann Arbor, MI. Retrieved 05/31/2012, from http://www.icpsr.umich.edu/files/ICPSR/access/dataprep.pdf. Van den Eynden, V., Corti, L., Woollard, M. & Bishop, L. (2011). Managing and Sharing Data: A Best Practice Guide for Researchers (3rd ed.). Retrieved 05/31/2012, from http://www.data-archive.ac.uk/media/2894/managingsharing.pdf
  • 32. Questions?  Sherry Lake Senior Scientific Data Consultant, UVA Library  shlake@virginia.edu  Twitter: shlakeuva  Slideshare: http://www.slideshare.net/shlake  Web: http://www.lib.virginia.edu/brown/data 32
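
The file naming conventions mentioned on slides 12 and 22 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the pattern (project, description, date, version number, initials) and the helper names `slug` and `build_filename` are hypothetical, not a convention prescribed by the presentation or by any discipline.

```python
import re
from datetime import date


def slug(text):
    """Lowercase the text and replace runs of non-alphanumerics with a hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def build_filename(project, description, when, version, initials, extension):
    """Build a name like project_description_YYYYMMDD_vNN_initials.ext.

    The field order and separators are an assumed convention for illustration.
    """
    return "{}_{}_{}_v{:02d}_{}.{}".format(
        slug(project),
        slug(description),
        when.strftime("%Y%m%d"),        # sortable date stamp
        version,                        # zero-padded so v02 sorts before v10
        slug(initials),                 # "who" made this version
        extension.lstrip(".").lower(),  # extension kept for the application
    )


if __name__ == "__main__":
    print(build_filename("Soil Study", "raw data", date(2012, 7, 31), 1, "SL", ".csv"))
    # prints soil-study_raw-data_20120731_v01_sl.csv
```

Whatever pattern a project chooses, the point is to write it down once and generate names from it, so that versions and authors stay recoverable from the filename alone.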

Editor's Notes

  1. This class is aimed at those engaged in the life cycle of research, from applying for a research grant, through data collection, and ultimately to preparing the data for deposit in a public archive. Some projects generate so much data that managing it takes up much of the scientists’ time. Data management primarily occurs within the life cycle of a research project. Data sharing plans should be developed in conjunction with an archive to maximize the utility of the data to research and to ensure the availability of the data in the future.
  2. Steps in the Research Life Cycle. Proposal Planning & Writing: conduct a review of existing data sets; determine whether the project will produce a new dataset (or combine existing ones); investigate archiving challenges, consent and confidentiality; identify potential users of your data; determine costs related to archiving; contact archives for advice (look for archives). Project Start Up: create a data management plan; make decisions about document form and content; conduct pretests and tests of materials and methods. Data Collection: follow best practice; organize files, backups and storage, and QA for data collection; set up access control and security. Data Analysis: manage file versions; document analysis and file manipulations. Data Sharing: determine file formats; contact an archive for advice; do more documenting and cleaning up of data. End of Project: write the paper; submit report findings; deposit data in a data archive (repository). Remember: managing data in a research project is a process that runs throughout the project. Good data management is the foundation for good research, especially if you are going to share your data. Good management is essential to ensure that data can be preserved and remain accessible in the long term, so it can be re-used and understood by other researchers. When managed and preserved properly, research data can be successfully used for future scientific purposes.
  3. Planning the management of your data before you begin your research AND throughout its life cycle is essential to ensure its current usability and its long-term preservation and access. You can focus on research, not user requests: with a repository keeping your data, you can focus on your research rather than fielding requests or worrying about data on a web page. Your project may have lots of people working on it, and you will need to know what each is doing and has done. The project may last years. Funding agencies now require a data management plan. You can understand your data at a later time: having your data documented will allow future users to understand your data and be able to use it. It takes less time to get data ready to share: if you follow the plan, the data should be ready for archiving, because documenting the data throughout ensures that a proper description of the data is maintained.
  4. Will the data contain direct or indirect identifiers that could be used to identify research participants? Challenges for archiving data: you need to think about consent. Links on UVa compliance in research are on the handout. Health research links are on the handout too. The Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule is the first comprehensive Federal protection for the privacy of personal health information. Your discipline may have other policies, e.g. the National Academy of Engineering (link on the handout). Intellectual property: determine copyright and ownership of research data. If you’ve gathered the data from multiple sources, you need to obtain permission to publish it.
  5. Regarding sharing and retention of research data generated from a proposal/project: before you start your plan, check the mandates, policies, and procedures of the grant funder and UVa. Examples from UVa: UVa’s policy on recordkeeping in research; UVa’s Health System Office of Research. NIH: the NIH Data Sharing Policy & Implementation Guidance (2003) suggests including the following in proposals: schedule for data sharing; format of the final dataset; documentation to be provided; analytical tools to be provided, if any; need for a data sharing agreement; mode of data sharing. NIH generally requires that files resulting from research awards be retained for at least three years after the final financial report has been filed. However, Commonwealth of Virginia record retention regulations are more strict (see below) and require that such records be retained five years after filing of the final financial report of a funding period. NSF: develop and submit specific plans to share materials collected with NSF support, except where this is inappropriate or impossible. These plans should cover how and where these materials will be stored at reasonable cost, and how access will be provided to other researchers, generally at their cost. UVa: data and notebooks resulting from sponsored research are the property of the University of Virginia. It is the responsibility of the principal investigator to retain all raw data in laboratory notebooks (or other appropriate format) for at least five years after completion of the research project (i.e., publication of a paper describing the work, or termination of the supporting research grant, whichever comes first) unless required to be retained longer by contract, law, regulation, or by some reasonable continuing need to refer to them. UVa Health System: its responsible conduct of research covers data management (protection, sharing, retention times).
  6. So how do I get started managing data? The handout has a link to Managing & Sharing Data with more details, and also a link to a Data Management Plan form. The plan should be written down, sort of like an instruction book.
  7. Life cycle of a research project with respect to the data it creates. Data Collection: data collection, entry, checking and cleaning. Data Analysis: analyze data, derive “new” data, document data. Data Sharing: prepare data for submission. Managing the data in the Data Life Cycle includes backup and storage, version control, file conversions, and security and access control. Document all data details.
  8. Here are the details of what we are going to manage in the Data Life Cycle.
  9. National Science Board. (2005). Long-lived digital data collections: Enabling research and education in the 21st century. Retrieved from http://www.nsf.gov/pubs/2005/nsb0540/nsb0540.pdf. Observational data cannot be recollected, remeasured, or verified, and are archived indefinitely. Data are typically time and/or location dependent. This context is set by the fact that much of the value of observational data is in its secondary analysis. Experimental data can often be reproduced, although there are cases where experimental conditions or variables are unknown. Experimental data may be associated with a particular methodology or instrument.
  10. These are sometimes lumped together as computational data. Data that is the result of computer models or simulations can be reproduced if adequate information is provided about the computer hardware, software, and inputs. Statistical data, computational models, and simulations can also be recreated and verified, as long as sufficient disciplines… Can you think of anything else as “data”? Most of the time we are managing the “digital” data; what about the non-digital: lab notebooks, notes, …?
  11. Shows the many differing types of data and the many different formats for each one. Things to consider when choosing file formats: the collection/analysis format does not have to be the same as the preservation format, but if it is not, the data will need to be converted for archiving (interchangeable format; will talk about this later). You can choose one format to do analysis, because it may be faster to work in a proprietary format, but you will need to change to a non-proprietary format later for archiving (Prepare to Share). Migrate data into a format with these characteristics. Also keep a copy in the original software format.
  12. Keep track of versions of documentation and data. Use directory structure and file naming conventions to help, or use version control software. Always record every change to a file, no matter how small. Record relationships between files. Directory structure: the top-level folder should include the project name and date; each subsequent level should have its naming convention documented, e.g. categorize by people, experiment, or dataset version. File naming conventions: reserve the 3-letter file extension for application-specific codes, identify the project in the file name, and use dates in filenames; some disciplines have their own recommendations for file naming. File structure: flat files vs. (relational) database. Keep the directory structure the same for backups. I’ll go over more detail with examples in the next presentation on best practices.
  13. Assign the master copy to one team member. Restrict write access to specific members. Record changes with version control. Network: keep confidential data off internet servers (or behind firewalls); put sensitive materials on computers not connected to the internet. Physical security: consider who has access to your office, and whether you allow repairs by an outside company. Computer: keep virus protection up to date; check that your computer has a login password; do not send personal or confidential data via e-mail or FTP, but transmit it encrypted; impose confidentiality agreements on data users. The linked Managing and Sharing Data document has an in-depth section on Ethics, Consent and Confidentiality.
  14. Data storage for collected data and for backups. Consider storage and backup options the same. Use formats that will be usable in the long term and are not dependent on a software version. The media life of CDs and DVDs is not reliable; you may have to replace old media and maintain devices that can still read the proprietary formats or media types. Copy or migrate data files to new media between 2 and 5 years after they were created. Appropriate environmental conditions will increase the life span of media. Check the recommended environmental conditions for your particular media. Make sure the storage location is free from risk of fire and flood. Store “paper” data properly. Be aware of thefts, file changes and “loss” (data only on paper?).
  15. Why back up data? Keeping reliable backups is an integral part of data management. Regular backups protect against data loss due to hardware failure, software or media faults, virus infection or hacking, power failure, and human error. Recommendation: 3 backup copies (original, external/local, external/remote). Use full and incremental backups. Check the integrity of the files to ensure they were transmitted without error (checksum and file size): calculate a “value” for a block of data, perform the calculation on both files, and if you get the same “number” then the copy is OK. If using a departmental server, check on the backup/restore procedures (how quickly can you get files restored?). You may want to have the backup procedures controlled by you. Test your backup system, test restoring files, and don’t over re-use backup media.
  16. Use some options for “storage” and others for backups. Cloud storage examples: Google Docs, DropBox, Windows Live SkyDrive, SpiderOak.
  17. Documentation should start with the Data Management Plan. Starting at the beginning and continuing throughout reduces the likelihood that you will forget aspects of your data later. Document data collection, lab notebooks, and digitization info. Think about non-digital materials (papers, photos, reports, lab notebooks); they should be digitized and stored with the digital data. In order for the data to be used properly once it’s been archived, the data must be documented. Data documentation (otherwise known as metadata) enables you to understand the data in detail, and enables others to find it, use it and properly cite it. Use versioning software for the documentation files too.
  18. Conform to community standards for recording data and metadata that adequately describe the context and quality of the data and help others find and use it. Document data validation and other quality assurance procedures, and modifications of the data. Information should include: title, creator, subject, funders, rights, dates, location, methodology, data processing, sources, file formats, variable lists, and code lists. You may need to put this info in a metadata standard such as DDI, MODS, FGDC, DarwinCore, or EML.
  19. Keep a copy of the data in its original form. Maintain it and the final version as read-only. With detailed documentation, someone could replicate your findings from the original set to the final one. As you analyze your data, there will be various changes, additions and deletions to the dataset. Recording them enables reproducibility (others can validate findings) and executability (others can re-run or re-use the analysis).
  20. 1st version: original data collection. 2nd version: “cleaned” dataset. 3rd version: combining variables & analysis. Filenames include version # and “who”.
  21. There are lots of ways to share your data without depositing it in a repository: e-mailing it to requestors, or posting it to a web site, Google, or another “cloud” sharing site. But then you have to maintain it, and it makes “finding” your data harder. Depositing it in an archive makes it easier to discover and preserve. If it’s well documented, then it is easy to use. Make sure the confidentiality of respondent data is preserved; you will need to create a version of the dataset without personal info.
  22. The safest option to guarantee long-term data access is to convert data to standard formats. This benefits you, the researcher, even if you are not planning on sharing (publishing). These are formats more likely to be accessible in the future. The format of the file is a major factor in the ability to use the data in the future. As technology changes, plan for software and hardware obsolescence. System files (SAS, SPSS) are compact and efficient, but not very portable. Use software to “export” data to a portable (or transport) file: an “interchangeable format”. Convert proprietary formats to non-proprietary ones. Check for data errors in conversion.
  23. Examples of preferred format choices: formats for long-term digital preservation (open). Don’t expect that you (you won’t have time) or the archive will be able to convert older formats to new ones. There is a good chart in the UK document Managing and Sharing Data (page 9).
  24. Let’s stop and make sure everyone knows or can define “metadata”: what you use to describe your data, the pieces of information that will allow someone to understand your data and how it was collected, thus enabling another person to replicate your results. In order for the data to be used properly once it’s been archived, the data must be documented. If you have been documenting your data and files all along, this step should be easy.
  25. In order for the data to be used properly once it’s been archived, the data must be documented. The metadata accompanying a file should be written for a user 20 years into the future, or for someone who does not know you or your work.
  26. Where you archive your data has an impact on “who can find” your data. Are you looking for long-term preservation (how long would your data be useful)? Each option has advantages and disadvantages. Data centers may not be able to accept all data. Start looking at where you want to archive while doing your project. Base your data management plan on the expectations and criteria for archiving.
  27. Data repositories may have criteria to evaluate and select datasets for preservation.
  28. Select those that could provide long-term access
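
The backup integrity check described in the notes on backing up data (compare file size and checksum for the original and the copy) might look like the following in Python. This is a minimal sketch, not a complete backup tool: the choice of SHA-256 and the function names `file_checksum` and `backup_matches` are assumptions made for illustration; `hashlib` is part of the Python standard library.

```python
import hashlib
from pathlib import Path


def file_checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_matches(original, backup):
    """Return True if the backup has the same size and checksum as the original."""
    original, backup = Path(original), Path(backup)
    # Cheap check first: different sizes mean the copy cannot be identical.
    if original.stat().st_size != backup.stat().st_size:
        return False
    # Same "number" for both files means the copy is OK.
    return file_checksum(original) == file_checksum(backup)
```

A call such as `backup_matches("data/survey.csv", "/mnt/backup/data/survey.csv")` (hypothetical paths) could be run after each scheduled backup; storing the checksums alongside the files also lets you detect later corruption of the backup medium.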