Introduction to Data Management and Sharing
 

Scholars and researchers are being asked by an increasing number of research sponsors and journals to outline how they will manage and share their research data. This is an introduction to data management and sharing practices with some specific information for Columbia University researchers.

Usage Rights

CC Attribution License

Introduction to Data Management and Sharing: Presentation Transcript

  • Introduction to Data Management and Sharing
    • University Libraries/Information Services
    • Office of Research Compliance and Training
    • Why is there a new focus on data management and sharing?
  • Data sharing is not widely practiced…
    • Lack of time
      • for data clean up, user questions
    • Lack of recognition
      • not valued in promotion/tenure
    • Lack of control
      • worries about scooping, misinterpretation
    • Legal concerns
      • copyright, patents
    • Inadequate infrastructure
  • … yet its value is recognized
    • Data sharing was a key element of:
      • Human Genome Project
      • NIH-funded Alzheimer’s study published in April 2011
      • Sloan Digital Sky Survey
  • There are new possibilities…
    • Networked digital technology creates new potential for:
    • data collection
    • data analysis
    • data “mash ups”
    • collaboration
    • citizen science
    National Science Foundation
  • … and science is in the spotlight
    • “The impact of science on people’s lives, and the implications of scientific assessments for society and the economy are now so great that people won’t just believe scientists when they say ‘trust me, I’m an expert.’ … Science has to adapt.” - Geoffrey Boulton, chair of working group for study: Science as a public enterprise: opening up scientific information, 5.13.11
  • These factors have changed the conversation, resulting in…
  • Calls for data accessibility…
    • “It is obvious that making data widely available is an essential element of scientific research.” - Science editorial “Making Data Maximally Available,” 2.11.11
  • … and new data management policies
    • NSF and other research sponsors are strengthening their data management and sharing policies to help:
    • increase the accessibility of data
    • create standards and protocols
    • develop interoperable data repositories
    • encourage transparency of research
  • Submitting a proposal to the NSF?
    • You must:
    • Submit a two-page data management plan with your proposal.
    • Share your research data (or justify why you should not share it).
  • Publishing in a Nature journal?
    • “… authors are required to make materials, data and associated protocols promptly available to readers.”
  • More than ever, researchers are expected to make their data accessible to—and usable by—others.
    • This means…
    • Having a data management plan is more important than ever.
    Library of Congress
  • Data management plan (DMP)
    • A data management plan outlines how you will collect, organize, manage, store, secure, back up, preserve, and share your data.
    Academic Commons
  • Other DMP elements
    • Designating who is responsible for data management
    • Tools or software needed to create/process/visualize the data
    • Compliance with policies and regulations
    NIST
  • Columbia DMP Template
    • Columbia provides a DMP template.
    • Though it was created in response to NSF requirements, you can use it as a guide for creating any DMP.
    • You can find the template on the NSF Data Management Requirements page of this website.
    • Some points to consider when creating your DMP
  • Your data storage needs
    • Data formats and size
    • Retention period
    • Privacy or security requirements
    • Backup plan
    • Access requirements
    Pittsburgh Supercomputing Center
  • Data storage planning
    • Plan for the entire life-cycle.
    • Establish a baseline and project the rate of growth for the duration of the project.
    CDC/Dorothy Roland
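The baseline-and-growth estimate described in this slide can be sketched in a few lines of Python. This is only an illustration: the function name `project_storage_gb`, the compound monthly growth model, and the sample figures (50 GB baseline, 5% monthly growth, 36 months) are assumptions made for the example, not values from the presentation.

```python
def project_storage_gb(baseline_gb: float, monthly_growth_rate: float, months: int) -> list[float]:
    """Project storage needs at the end of each month.

    Assumes compound growth: each month's total is the previous month's
    total multiplied by (1 + monthly_growth_rate). A real project may grow
    in steps (for example, per instrument run), so treat this as a rough guide.
    """
    sizes = []
    size = baseline_gb
    for _ in range(months):
        size *= 1 + monthly_growth_rate
        sizes.append(round(size, 2))  # round only the reported value
    return sizes


if __name__ == "__main__":
    # Hypothetical project: 50 GB today, growing 5% per month for 3 years.
    projection = project_storage_gb(baseline_gb=50.0, monthly_growth_rate=0.05, months=36)
    print(f"After 12 months: {projection[11]} GB")
    print(f"After 36 months: {projection[-1]} GB")
```

The final figure in the projection is the one to compare against the capacity of the storage options under consideration, with some headroom for backups and derived files.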
  • Two types of storage
    • Active
        • Frequent additions and updates
    • Archival
        • In fixed form; only need periodic access
    CDC
  • Active storage at Columbia
    • School/department/division servers
        • Many researchers use servers managed by “local” IT groups.
    • CUIT
        • 20-80 MB personal storage
        • Central LAN service
    • Center for Digital Research & Scholarship
        • Consultation available
  • Archival storage at Columbia
    • Digital
        • Academic Commons is Columbia’s online research repository.
    • Physical
        • Consult the appropriate Columbia University Libraries archive.
  • Best archival file formats
    • Nonproprietary file formats
    • Uncompressed and unencrypted files
    • Consider ease of migration going forward
    • May need to archive software as well as data
    INL
  • Data retention requirements
  • Other important retention policies
    • NIH
      • 3 years
    • NSF
      • Check with individual NSF directorates
    • Health Insurance Portability and Accountability Act (HIPAA)
      • At least 6 years
    USGS
  • Data security and integrity
    • Security
      • Protect data from unauthorized access or accidental disclosure.
    • Integrity
      • Ensure that data remains unaltered before, during, and after analysis and presentation.
    NPS
  • Data security requirements
    • Your data may be subject to laws and policies such as:
      • HIPAA (Health Insurance Portability and Accountability Act)
      • IRB (Institutional Review Board)
      • Columbia computing policies
        • See the Computing and Technology section of the Columbia Administrative Policy Library
  • Physical security best practices
      • Restricted access to research facilities, computers, data
      • Only trusted individuals troubleshoot computer problems
      • Lab notebooks, samples in locked cabinets
    Lawrence Berkeley National Laboratory
  • Digital security best practices
      • Sensitive data on computers not connected to Internet
      • Virus protection up to date
      • No confidential data via e-mail or FTP
      • Passwords to access files and computers
      • Proper data disposal at end of retention period
    Lawrence Livermore National Laboratory
  • Data backup best practices
    • Make 3 copies
      • Original
      • External/local
      • External/remote – different geographic area
    • Verify recovery is possible
      • Checksum validation
      • Test file restore after initial setup and periodically thereafter
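Checksum validation of backup copies could look something like the following Python sketch. It assumes SHA-256 as the checksum (the presentation does not name an algorithm), and the helper names `sha256_of` and `verify_copies` are hypothetical, introduced only for this example.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copies(original: Path, copies: list[Path]) -> dict[str, bool]:
    """Compare each backup copy against the original.

    A copy passes only if it exists and its checksum matches the original's.
    """
    expected = sha256_of(original)
    return {
        str(copy): copy.exists() and sha256_of(copy) == expected
        for copy in copies
    }
```

Calling `verify_copies(Path("data.csv"), [local_copy, remote_copy])` returns a mapping from each copy's path to `True` or `False`. A matching checksum shows the bytes are identical, but it is still worth opening a restored file occasionally to confirm it is usable.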
  • Data backup options
    • Hard drive
    • Tape back-up
    • Server
    • Cloud storage
        • Amazon S3
    • Subject repositories/data centers
        • Examples: PubChem, Dryad, IRI/LDEO
    NIH
  • Sharing requirements
    • How, when, and what you share depends on:
        • Data format
        • Restrictions on data
        • Funder and publisher guidelines
        • Customary embargo periods
        • Availability of appropriate repositories or other vehicles for sharing
    NIH
  • Sample data sharing guidelines
  • Sharing restrictions
      • Under HIPAA (Health Insurance Portability and Accountability Act), you cannot share information that compromises the confidentiality or privacy of human subjects. Any data resulting from studies using human subjects must be scrubbed of identifying information.
      • You may have other reasons that justify not sharing your data, and you can detail these in your data management plan. Funders may allow exceptions to data sharing policies.
  • Don’t forget metadata
    • Metadata is structured information that describes, explains, locates, and otherwise makes it easier to retrieve and use an information resource.
    BLM NTSC
  • Metadata facilitates use of your data
      • “The metadata accompanying your data should be written for a user 20 years into the future -- what does that person need to know to use your data properly? Prepare the metadata for a user who is unfamiliar with your project, methods, or observations.” - Oak Ridge National Laboratory Distributed Active Archive Center
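One lightweight way to keep metadata with a data file is a small "sidecar" record stored next to it. The Python sketch below is an illustration only: the field list in `REQUIRED_FIELDS`, the function names, and the `.metadata.json` suffix are assumptions made for this example and do not follow any particular metadata standard. A discipline standard should take precedence where one applies.

```python
import json
from pathlib import Path

# Hypothetical minimum set of fields for this sketch; not taken from any standard.
REQUIRED_FIELDS = (
    "title",
    "creator",
    "date_collected",
    "description",
    "methods",
    "variables",
    "license",
)


def missing_fields(record: dict) -> list[str]:
    """List required fields that are absent or empty in the record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


def write_sidecar(data_file: Path, record: dict) -> Path:
    """Write the record as JSON next to the data file and return the new path.

    Raises ValueError if any required field is missing, so gaps are caught
    while the details are still fresh rather than years later.
    """
    gaps = missing_fields(record)
    if gaps:
        raise ValueError(f"metadata is missing: {', '.join(gaps)}")
    sidecar = data_file.with_name(data_file.name + ".metadata.json")
    sidecar.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return sidecar
```

For a file named `survey.csv`, `write_sidecar` would produce `survey.csv.metadata.json` in the same folder, so the description travels with the data when it is copied or archived.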
  • Major metadata standards
    • Darwin Core (Biology)
    • DDI (Data Documentation Initiative, for social and behavioral sciences data)
    • DIF (Directory Interchange Format for scientific data)
    • EML (Ecological Metadata Language)
    • FGDC/CSDGM (geographic data)
    • NBII (National Biological Information Infrastructure)
  • Repositories for data sharing
    • Online data repositories
      • organized around institutions or subjects
      • often open access
      • archival, not active
      • may offer:
          • long-term preservation and access
          • search engine optimization
          • permanent URL or DOI
  • Columbia’s repository
    • Academic Commons accepts materials from faculty, students, and staff.
      • secure replicated storage
      • accurate metadata
      • globally accessible repository
      • contextual linking between data and publications
      • a permanent URL
  • Some subject-based repositories
    • Space science mission repository
    • Cryospheric data repository
    • Macromolecular structural data repository
    • Marine data repository
    • Biological activities of small molecules data repository
  • More subject-based repositories
    • Deep-sea core samples repository housed at LDEO
    • Data repository for archeology and related disciplines
    • Basic and applied biosciences data repository
    • Geodesy data repository
    • Social science data repository
  • Licensing your data
    • Copyright issues around data can be complex
    • These groups offer “ready-made” licenses for data that help clarify any restrictions on reuse
  • For more information
    • Data Management section of Scholarly Communication Program website
    • Sponsored Projects Administration
    • Office of Research Compliance and Training
    • Center for Digital Research and Scholarship
    • CUIT
    • Computing and Technology section of Columbia Administrative Policy Library