Wisconsin Cyberinfrastructure Days November 5, 2010 Dorothea Salo & Brad Houston
Document describing data (and/or digital materials) that have been or will be gathered in a study or project.
Often includes details on how data will be organized, preserved, and accessed
Facilitates re-use of data sets by either PI or other researchers
Required component of grants for MANY agencies (including NSF and NIH)
Starting January 2011 for NEW, non-collaborative proposals
Not voluntary – “integral part” of proposal
Data Management Plans for all data resulting from any level of NSF funding
Supplementary 2-page document (max)
Optional: Also part of 15-page (max) Project Description
Must address both physical and digital data
“Efficiency and effectiveness” of the DMP will be considered by NSF and disciplinary division or directorate
Must include sufficient information for peer reviewers and project monitors to assess both the present proposal and past performance
“Such dissemination of data is necessary for the community to stimulate new advances as quickly as possible and to allow prompt evaluation of the results by the scientific community.” – NSF (italics mine)
Part of Openness trend in federal government (data.gov - Open Government Initiative)
NIH Public Access Policy (2008)
Public access to federally funded research hearings - Information Policy, Census and National Archives Subcommittee of U.S. Congress (July, 2010)
It makes your research easier!
Data available in case you need it later
Helps avoid accusations of fraud or bad science
To share it for others to use and learn from
To get credit for producing it
To keep from drowning in irrelevant stuff
... especially at grant/project end
Gene expression microarray data: “Publicly available data was significantly (p=0.006) associated with a 69% increase in citations, independently of journal impact factor, date of publication, and author country of origin.”
Piwowar, Heather et al. “Sharing detailed research data is associated with increased citation rate.” PLoS ONE 2007. DOI: 10.1371/journal.pone.0000308
Maybe there’s an advantage here!
Discuss specific requirements for NSF Data Management plans
Suggest ways to manage, share, and archive data more effectively
Provide resources for more information
Requirements, retention, and planning
What data are you collecting or making?
Can it be recreated? How much would that cost?
How much of it? How fast is it growing? Does it change?
What file format(s)?
What’s your infrastructure for data collection and
How do you find it, or find what you’re looking for in it?
How easy is it to get new people up to speed? Or share data with others?
Who are the audiences for your data?
You (including Future You), your lab colleagues (including future ones), your PIs
Disciplinary colleagues, at your institution or at others
Colleagues in allied disciplines
What are your obligations to others?
How do you and your lab get from where you are to where you need to be?
Document, document, document all decisions and all processes!
Secret sauce: the more you strategize upfront, the less angst and panic later.
“Make it up as you go along” is very bad practice!
But the best-laid plans go agley... so be flexible.
And watch your field! Best practices are still in flux.
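One way to start answering the inventory questions above (how much data, how fast it is growing, which file formats) is to let a short script count what is already on disk. The following is only a minimal sketch in Python, not a prescribed tool: the function name `inventory` and the directory passed to it are hypothetical, and it assumes the data live in an ordinary folder tree.

```python
import os
from collections import defaultdict


def inventory(root):
    """Return {extension: (file_count, total_bytes)} for every file under root.

    Extensions are lower-cased so that .CSV and .csv are counted together;
    files without an extension are grouped under "(none)".
    """
    totals = defaultdict(lambda: [0, 0])
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            ext = os.path.splitext(name)[1].lower() or "(none)"
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                # Skip files that vanish or cannot be read mid-scan.
                continue
            totals[ext][0] += 1
            totals[ext][1] += size
    return {ext: (count, size) for ext, (count, size) in totals.items()}
```

Calling `inventory("path/to/lab/data")` (a placeholder path) and saving the result with a date gives a baseline; running it again a few months later shows how fast each format is growing.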
All submitted plans must include, at minimum:
Expected Data: types, physical/electronic collections, materials to be produced
Standards for data and metadata format and content
Policies for access and sharing, including provisions for appropriate protection of privacy, confidentiality, security, intellectual property, etc.
Policies and provisions for re-use, re-distribution, and the production of derivatives
Plans for archiving data, samples, and other research products, and for preservation of access to them