Data Management Plans (DMP) for NSF
Presentation Transcript

  • Brad Houston University Records Officer December 2, 2011
    • Document describing data (and/or digital materials) that have been or will be gathered in a study or project.
    • Often includes details on how data will be organized, preserved, and accessed
    • Facilitates re-use of data sets by either PI or other researchers
    • Required component of grants for MANY agencies (including NSF and NIH)
    • Starting January 2011 for NEW, non-collaborative proposals
    • Not voluntary – “integral part” of proposal
    • Data Management Plans for all data resulting from any level of NSF funding
    • Supplementary 2-page document (max)
    • Optional: Also part of 15-page (max) Project Description
    • Must address both physical and digital data
    • “Efficiency and effectiveness” of the DMP will be considered by NSF and disciplinary division or directorate
    • Must include sufficient information that peer reviewers and project monitors can assess present proposal and past performance
    • As of January 2011, proposals will not be accepted without an accompanying data plan!
      • “Such dissemination of data is necessary for the community to stimulate new advances as quickly as possible and to allow prompt evaluation of the results by the scientific community.” – NSF (italics mine)
      • Part of Openness trend in federal government (data.gov - Open Government Initiative)
      • NIH Public Access Policy (2008)
      • Public access to federally funded research hearings - Information Policy, Census and National Archives Subcommittee of U.S. Congress (July, 2010)
    • It makes your research easier!
    • Data available in case you need it later
    • Helps avoid accusations of fraud or bad science
    • To share it for others to use and learn from
    • To get credit for producing it
    • To keep from drowning in irrelevant stuff
      • ... especially at grant/project end
    • Gene expression microarray data: “Publicly available data was significantly (p=0.006) associated with a 69% increase in citations, independently of journal impact factor, date of publication, and author country of origin.”
      • Piwowar, Heather et al. “Sharing detailed research data is associated with increased citation rate.” PLoS ONE 2007. DOI: 10.1371/journal.pone.0000308
    • Maybe there’s an advantage here!
    • Discuss specific requirements for NSF Data Management plans
    • Suggest ways to manage, share, and archive data more effectively
    • Provide resources for more information
    • Planning, retention, data defined
    • What data are you collecting or making?
    • Can it be recreated? How much would that cost?
    • How much of it? How fast is it growing? Does it change?
    • What file format(s)?
    • What’s your infrastructure for data collection and storage like?
    • How do you find it, or find what you’re looking for in it?
    • How easy is it to get new people up to speed? Or share data with others?
    • Who are the audiences for your data?
      • You (including Future You), your lab colleagues (including future ones), your PIs
      • Disciplinary colleagues, at your institution or at others
      • Colleagues in allied disciplines
      • The world!
    • What are your obligations to others?
      • Funder requirements
      • Confidentiality issues
      • IP questions
      • Security
    • How do you and your lab get from where you are to where you need to be?
      • Document, document, document all decisions and all processes!
    • Secret sauce: the more you strategize upfront, the less angst and panic later.
      • “Make it up as you go along” is very bad practice!
      • But the best-laid plans go agley... so be flexible.
      • And watch your field! Best practices are still in flux.
    • Four kinds of data defined by OMB:
      • Observational
        • Examples: Sensor data, telemetry, survey data, sample data, neuroimages.
      • Experimental
        • Examples: gene sequences, chromatograms, toroid magnetic field data.
      • Simulation
        • Examples: climate models, economic models.
      • Derived or compiled
        • Examples: text and data mining, compiled database, 3D models, data gathered from public documents.
    • Preliminary analyses
      • Raw data is included in this definition
    • Drafts of scientific papers
    • Plans for future research
    • Peer reviews or communications with colleagues
    • Physical objects, such as gel samples
    • As early as possible, but no later than guidelines laid down by relevant Directorate
      • Engineering Section: “no later than the acceptance for publication of the main findings of the final data”
      • Earth Sciences: “No later than two (2) years after the data were collected.”
      • Social and Economic Sciences: “within one year after the expiration of an award”
    • Be aware of concerns that may require earlier or later disclosure
      • FERPA? Human Subjects data? HIPAA?
    • Again, specific retention periods will depend on the type of data and the Directorate
      • Example: Engineering Section suggests retention period of “three years after either completion of the grant project or public release of research data, whichever is later”
    • Certain types of data will need to be retained longer
      • Patent data, longitudinal data sets, etc.
    • Ask: is your data of permanent value?
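The Engineering example above ("three years after either completion of the grant project or public release of research data, whichever is later") is simple date arithmetic. A minimal sketch follows; the function name and parameters are hypothetical, and the three-year default applies only to that one example, so check your own Directorate's guidance before relying on it.

```python
from datetime import date
from typing import Optional


def retention_end(project_completed: date,
                  data_released: Optional[date],
                  years: int = 3) -> date:
    """Earliest date the data could be discarded under a
    'N years after completion or release, whichever is later' rule."""
    # Start the clock at the later of the two events that have a date.
    start = max(d for d in (project_completed, data_released) if d is not None)
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        # start was Feb 29 and the target year is not a leap year
        return start.replace(year=start.year + years, day=28)


print(retention_end(date(2014, 6, 30), date(2015, 1, 15)))
```

Data of permanent value (patent data, longitudinal data sets) falls outside this calculation entirely.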
    • Analyzed data (incl. images, tables and tables of numbers used for making graphs)
    • Metadata that defines how data was generated, such as experiment descriptions, computer code, and computer-calculation input
    • Investigators are expected to preserve/share primary data, samples, physical collections, & supporting materials
    • Provide easily accessible information about data holdings, including quality assessments and guidance/finding aids
    • Data may be made available through submission to a national data center, publication in a journal or book, or an accessible website or institutional archive
    • Requirements and their implications
    • All submitted plans must include, at minimum:
      • Expected Data: types, physical/electronic collections, materials to be produced
      • Standards for data and metadata format and content
      • Policies for access and sharing, including provisions for appropriate protection of privacy, confidentiality, security, intellectual property, etc.
      • Policies and provisions for re-use, re-distribution, and the production of derivatives
      • Plans for archiving data, samples, and other research products, and for preservation of access to them
    • In short: What kind of data will be produced by your research processes?
    • Keep in mind:
      • File formats of complete data sets
      • Any software or code that will be needed/produced
      • Physical samples or other individual data points
        • Some divisions require retention of physical samples; consult your Program Officer
    • In short: how will you organize your data within datasets to make it widely accessible, and how will you make data sets identifiable?
    • Keep in mind:
      • Any data formatting standards for your particular discipline
      • Any metadata (author, date, subject, etc.) that your program attaches automatically, and what you will need to attach manually
      • How will you find your data for later consultation? How will others find it?
    • In short: How will you allow other researchers to find and use your data?
    • Keep in mind:
      • How will other researchers find your data? (i.e. How will you publicize its existence?)
      • How will you provide access to your data? (CD-RW? Data repository? Download via PantherFile?)
      • How will you prepare your data for sharing?
        • Do you need to depersonalize or declassify anything?
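One way to start depersonalizing tabular data before sharing is to drop direct identifiers and replace subject IDs with pseudonyms. The sketch below is illustrative only: the column names, the salt, and the function name are all assumptions, and this step alone is not enough to satisfy HIPAA, FERPA, or human-subjects requirements.

```python
import hashlib

# Hypothetical column names; adjust to the identifiers in your own data.
DIRECT_IDENTIFIERS = {"name", "email", "phone"}


def pseudonymize(rows, id_field="subject_id", salt="replace-with-a-secret"):
    """Return copies of the rows without direct identifiers,
    with the ID field replaced by a salted hash prefix."""
    out = []
    for row in rows:
        cleaned = {k: v for k, v in row.items() if k not in DIRECT_IDENTIFIERS}
        digest = hashlib.sha256((salt + str(row[id_field])).encode("utf-8")).hexdigest()
        cleaned[id_field] = digest[:12]  # same subject always maps to the same pseudonym
        out.append(cleaned)
    return out


rows = [{"subject_id": 17, "name": "Ada", "email": "ada@example.org", "score": 3}]
print(pseudonymize(rows))
```

Keep the salt out of the shared data set; anyone who has it can regenerate the pseudonyms from known IDs.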
    • Data Management Plans are required even if a project is not expected to generate data that requires sharing
    • DMP should clearly explain non-sharing in light of COI standards (peer review)
    • Between the lines: Not sharing will require justification and close scrutiny by NSF
    • Sharing is preferred
    • In short: How will researchers obtain the appropriate permissions to use your data?
    • Keep in mind:
      • Is a blanket permissions statement or a case-by-case policy more efficient/practical?
      • What responsibilities will users of your data have re: privacy, intellectual property, etc.?
      • How will you deal with users who violate these provisions?
    • In short: How will you make sure your data stays intact and available once you are done using it?
    • Keep in mind:
      • What are your retention requirements? Is this a permanent data set?
      • What storage media will you use? Are you prepared to migrate/emulate as needed?
      • Do you have a data backup plan?
    • Preparing, sharing, and archiving your data sets
    • Think about where you will put your data
      • Local? Network drive? Online data management system?
    • Think about how you (or others) will find your data
      • Think about how others may use your data, when found
    • Think about how to store your data in the long term (or whether to store it long-term at all)
    • Will anybody be able to read these files at the end of your time horizon?
    • Where possible, prefer file formats that are:
      • Open, standardized
      • Documented
      • In wide use
      • Easy to data-mine, transform, recast
    • If you need to transform data for durability, do it now, not later.
    • Fundamental question: What would someone unfamiliar with your data need in order to find, evaluate, understand, and reuse them?
    • Consider the differences between someone inside your lab, someone outside your lab but in your field, and someone outside your field.
    • Two parts: metadata and methods
    • About the project
      • Title, people, key dates, funders and grants
    • About the data
      • Title, key dates, creator(s), subjects, rights, included files, format(s), versions, checksums
      • Interpretive aids: codebooks, data dictionaries, algorithms, code
    • Keep this with the data; think of it as a README file
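The project and data metadata listed above can be gathered into a plain-text README that lives beside the files. This is a minimal sketch, not a standard: the section headings, the `README.txt` file name, and the choice of SHA-256 for checksums are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Checksum a file in chunks so large data files do not fill memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def build_readme(data_dir: Path, project: dict, dataset: dict) -> str:
    """Assemble README text: project metadata, data metadata, file checksums."""
    lines = ["ABOUT THE PROJECT"]
    lines += [f"{key}: {value}" for key, value in project.items()]
    lines += ["", "ABOUT THE DATA"]
    lines += [f"{key}: {value}" for key, value in dataset.items()]
    lines += ["", "INCLUDED FILES (SHA-256)"]
    for p in sorted(data_dir.rglob("*")):
        if p.is_file() and p.name != "README.txt":
            lines.append(f"{sha256_of(p)}  {p.relative_to(data_dir).as_posix()}")
    return "\n".join(lines) + "\n"
```

A typical use would be `Path("mydata/README.txt").write_text(build_readme(Path("mydata"), project, dataset))`, rerun whenever files are added or changed so the checksums stay current.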
    • Reason #1 for not reusing someone else’s data: “I don’t know enough about how it was gathered to trust it.”
    • Document what you did. (A published article may or may not be enough.)
    • Document any limitations of what you did.
    • If you ran code on the data, document the code and keep it with the data.
    • Need a codebook? Or a data dictionary?
      • If I can’t identify at sight what each bit of your dataset means, yes, you do need a codebook or data dictionary.
      • DO NOT FORGET UNITS!
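A data dictionary can be as simple as one entry per column, and a short check can catch columns that have no entry or no units. The sketch below uses invented column names and an invented `codebook_problems` helper purely for illustration.

```python
# Hypothetical codebook: one entry per column in the data set.
CODEBOOK = {
    "site_id": {"description": "Sampling site identifier", "units": "none"},
    "temp": {"description": "Water temperature at 1 m depth", "units": "degrees Celsius"},
    "depth": {"description": "Secchi depth", "units": ""},  # units were forgotten
}


def codebook_problems(columns, codebook):
    """List columns that are undocumented or missing a description or units."""
    problems = []
    for col in columns:
        entry = codebook.get(col)
        if entry is None:
            problems.append(f"{col}: no codebook entry")
            continue
        for field in ("description", "units"):
            if not str(entry.get(field, "")).strip():
                problems.append(f"{col}: missing {field}")
    return problems


print(codebook_problems(["site_id", "temp", "depth", "ph"], CODEBOOK))
```

Writing "none" explicitly for unitless columns makes it possible to tell a deliberate choice from an omission.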
    • Your own drive (PC, server, flash drive, etc.)
      • And if you lose it? Or it breaks?
    • Somebody else’s drive
      • Departmental or campus drive
      • “Cloud” drive
      • Do they care as much about your data as you do?
    • What about versioning?
    • Library motto: Lots Of Copies Keeps Stuff Safe.
      • Two onsite copies, one offsite copy.
      • Keep confidentiality and security requirements in mind, of course
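Multiple copies only help if they are still identical. The sketch below checks that a file exists in every copy location and that all copies share one checksum; the function name, the three-copy default (two onsite, one offsite), and the directory layout are assumptions for illustration.

```python
import hashlib
from pathlib import Path


def copies_agree(relative_path, copy_roots, min_copies=3):
    """True if at least min_copies copies exist and all are byte-identical."""
    digests = set()
    found = 0
    for root in copy_roots:
        candidate = Path(root) / relative_path
        if candidate.is_file():
            found += 1
            # read_bytes loads the whole file; fine for a sketch, chunk it for big files
            digests.add(hashlib.sha256(candidate.read_bytes()).hexdigest())
    return found >= min_copies and len(digests) == 1
```

Running a check like this on a schedule catches silent corruption or a missing copy while a good copy still exists to restore from.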
    • If data need to persist beyond project end, you have to deal with a new kind of risk: organizational risk.
      • Servers come and go. So do labs. So do entire departments.
      • This is especially important if you share data! Don’t let it 404!
    • You need to find a trustworthy partner.
      • On campus: try the library or Research and Sponsored Programs. (UITS has a role but can’t do it alone!)
      • Off campus: look for a disciplinary data repository, or a journal that accepts data. (It’s a good idea to do this as part of your planning process.)
    • Let somebody else worry! You have new projects to get on with.
    • Where to go for help and more information
    • Informational websites
      • UW-Madison: http://dataplan.wisc.edu/
      • UW-Milwaukee: http://dataplan.uwm.edu
      • Don’t just use the site for your own campus!
    • Data experts
      • IT cyberinfrastructure experts
      • Archivists/records managers
    • MINDS@UW: minds.wisconsin.edu
      • Data in final form that make sense as discrete files
    • For Information:
      • NSF Grant Proposal Guide
        • http://www.nsf.gov/pubs/policydocs/pappguide/nsf11001/gpg_index.jsp
      • MIT Data Management and Publishing
        • http://libraries.mit.edu/guides/subjects/data-management/index.html
    • For storage/management (non-exhaustive):
      • A partial list of potential repositories: http://libraries.mit.edu/guides/subjects/data-management/publishing.html
      • Ask: can my home institution provide better service?
    • For assistance with writing your plan:
      • California Digital Library DMP Creation Tool
        • https://dmp.cdlib.org/ (select “None of the Above”)
      • Data Conservancy DMP Template/Questionnaire
        • http://dataconservancy.org/dataManagementPlans
      • DataONE Best Practices Examples
        • http://www.dataone.org/plans
    • Make sure your data plan covers at least the minimum requirements set out by NSF
    • Create appropriate metadata to help you manage and find data
    • Use open, universal standards and file formats
    • Be prepared to preserve access tools along with data itself
    • Be aware of time periods for data sharing and retention
    • Contact the presenter
      • Brad Houston, UW-Milwaukee
      • [email_address] (414) 229-6979
    • This presentation available online at:
      • http://www.slideshare.net/herodotusjr/data-management-plans-dmp-for-nsf