1. Where’s Your Data?
What's so important about
managing data?
Sherry Lake
University of Virginia Library
Curry Research Conference February 1, 2013
2. Why are you here?
• You’re managing data (your own or your
lab’s)
• Or you think maybe you should be
– Who has told you to?
• Or you think you will have to in the future
• You’re not sure why it matters
• You’re curious and want to know how
managing data affects you
3. Discussion
• Stories about poor data management
• In groups, assess & report out (2 minutes
max)
– What bad data-related practices are talked
about?
– What problems did the bad practice(s) cause?
– What happened to the researcher because of
poor practice?
– How could this have been prevented?
4. Why Manage Research Data?
• Saves Time
– Simplifies your research & increases your
research efficiency
• Makes preserving data for the long term
easier
– Takes less time to get data ready to share
• Supports Sharing
– Can focus on research not user requests
– Lets others understand your data
– Increases Research Impact
• Meets grant requirements
5. Who Cares?
From Flickr by Redden-McAllister
From Flickr by AJC1 www.rba.gov.au
6. Why Share Data?
• Required by funding agencies
• Reinforces open scientific inquiry
• Increases the visibility of your research &
your own research reputation
• Facilitates new discoveries
• Reduces costs by avoiding duplication
• Makes it easier to re-use and verify data sets
7. Who is Requiring Data Sharing?
• Publishers
– Nature Publishing Group
– American Naturalist
– Evolution
– Journal of Evolutionary Biology
– ESA journals
• Funding agencies
– National Institutes of Health (NIH)
– NIH Public Access Mandate (for publications)
– National Science Foundation (NSF)
– Institute of Museum and Library Services (IMLS)
8. Dissemination & Sharing of Research Results:
“Investigators are expected to share with other
researchers, at no more than incremental
cost and within a reasonable time, the
primary data, samples, physical collections
and other supporting materials created or
gathered in the course of work under NSF
grants. Grantees are expected to encourage
and facilitate such sharing.”
NSF: Award & Administration Guide (AAG)
Chapter VI.D.4
9. Plans for Data Management & Sharing of the
Products of Research
• Proposals must include a supplementary
document of no more than two pages labeled:
“Data Management Plan”
• Document should describe how the proposal will
conform to NSF sharing policy
NSF: Grant Proposal Guide (GPG) Chapter II.C.2.j
10. What is a Data Management Plan?
• Brief description of how you will comply
with funder’s data sharing policy
• Reviewed as part of a grant application
• How do I create one?
• What is included?
11. DMPTool
• Online Data Management Plan creation tool
• Helps researchers meet requirements of NSF
and other U.S. funding agencies
• Guides researchers through the process of
creating a data management plan
• Is available to everyone
• Provides additional help for researchers at
UVa
23. Parts of an NSF Data Management Plan
I. Products of the Research: The types of data, samples,
physical collections, software, curriculum materials, and
other materials to be produced in the course of the project.
II. Data Formats: The standards to be used for data and
metadata format and content (where existing standards are
absent or deemed inadequate, this should be documented
along with any proposed solutions or remedies).
III. Access to Data and Data Sharing Practices and Policies:
Policies for access and sharing including provisions for
appropriate protection of privacy, confidentiality, security,
intellectual property, or other rights or requirements.
IV. Policies for Re-Use, Re-Distribution, and Production of
Derivatives.
V. Archiving of Data: Plans for archiving data, samples, and
other research products, and for preservation of access to
them.
Grant Proposal Guide (GPG) Chapter II.C.2.j
24. What is a Data Management Plan?
A comprehensive plan of how you will
manage your research data throughout the
lifecycle of your research project
1. Project description
2. Survey of existing data
3. Data to be created
a. Data organization methods (optional)
4. Data sharing and archiving
5. Data administration issues:
a. Funding and legislative requirements
b. Data owners and stakeholders
c. Access and security
d. Backups
6. Responsibilities
7. Data documentation and metadata
8. Budget
Now let’s focus on why everyone is here: to learn about data management. Why should you care about data management? Why should you manage your data? Saving time: How many of you have had trouble finding your own data files or understanding them, or couldn’t tell which file was the most recent? If you keep all your data organized and documented as you go through your research, it is easier to share your data at the end. When you share (i.e., in an archive), others can access your data without having to bother you, and if it is well documented, others can understand it. It has been shown that research whose data is easy to find and easy to reuse is associated with an increased citation rate: Piwowar, Heather A., Roger S. Day, and Douglas B. Fridsma. “Sharing Detailed Research Data Is Associated With Increased Citation Rate.” PLoS ONE 2.3: 5. As you will see, a key benefit of having a Data Management Plan is that it helps with sharing data, which is the main goal for grants.
If you have a journal article, you have a record of what you did stored in journals. But the data underlying the results are really important.
• Funders care.
• Colleagues: potential collaborators.
• Institutions (not shown here).
• Tenure committees: more so in the future.
• You: you need to care because you might need to go back to the data in a few years, and you will need a good description.
• Future scientists: they could potentially use your data to discover important things. We need to be thinking about the future (providing data for them).
In recent years, several national scientific organizations have issued statements and policies underscoring the need for prompt archiving of data, and some funding agencies have started to require that the data they fund be deposited in a public archive. http://www.carlboettiger.info/archives/905
Making your data available to other researchers through repositories can increase your prominence and show continued use of the data and the relevance of your research. Enabling others to use your data reinforces open scientific inquiry and can lead to new discoveries (maybe in a different discipline) and new collaborations, and it prevents duplication of effort.
NYTimes, on Alzheimer’s research: “parked our egos and intellectual-property noses outside the door and agreed that all of our data would be public immediately.”
If you look at this list you will see journals in the life sciences, but requiring data sharing is a trend.
The Office of Management and Budget (OMB) Circular A-110 provides the federal administrative requirements for grants and agreements with institutions of higher education, hospitals, and other non-profit organizations. In 1999, it was revised to provide public access under some circumstances to research data through the Freedom of Information Act (FOIA). Funding agencies have implemented the OMB requirement in various ways.
PubMed for NIH papers.
NIH 2003 Data Sharing Policy: In NIH's view, all data should be considered for data sharing. Data should be made as widely and freely available as possible while safeguarding the privacy of participants, and protecting confidential and proprietary data. To facilitate data sharing, investigators submitting a research application requesting $500,000 or more of direct costs in any single year to NIH on or after October 1, 2003 are expected to include a plan for sharing final research data for research purposes, or state why data sharing is not possible.
The NIH Public Access Policy ensures that the public has access to the published results of NIH funded research. It requires scientists to submit final peer-reviewed journal manuscripts that arise from NIH funds to the digital archive PubMed Central upon acceptance for publication. To help advance science and improve human health, the Policy requires that these papers are accessible to the public on PubMed Central no later than 12 months after publication.
Remember: managing your data will make it easier to share!
The rest of this talk will focus on the Data Management Plan requirements for the NSF. The sharing requirement has been in the Grant Policy Manual since 2002. Even though this “sharing” requirement was in the Admin Guide, there had been little if any enforcement. There was only a “check box” in the FastLane system.
The DMP should describe how the proposal will conform to NSF policy on the dissemination and sharing of research results (see AAG Chapter VI.D.4), and may include the parts listed here, which come from the generic guidelines. A valid Data Management Plan may include only the statement that no detailed plan is needed, as long as the statement is accompanied by a clear justification.
As we will see, the NSF is really concerned with managing data in order to share it. Currently, it is not interested in researchers providing a more comprehensive Data Management Plan throughout the research life cycle. We recommend initiating a more comprehensive Data Management Plan to see the full benefits of managing your data. But to comply with the NSF mandate, all you need is a 2-page description of what data you have and how you will share it.
Home page. The slide is clickable and goes to the DMPTool page.
Many ways to get started & logged in
UVa is set up to use Netbadge for authentication, so you do not need to create a separate account (or password).
• Data types: choosing file formats
• Organizing your files: file naming conventions, version control
• Access control & security: physical, network, computer systems and files
• Backup & storage: media
• Data quality control: collection, preparation
• Data processing & analysis
• File format conversions: accessible in the future, not software/hardware dependent
• Document all data details
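As one possible illustration of the file naming and version control items in these notes, here is a minimal Python sketch. The naming pattern (project, description, ISO date, two-digit version number), the function names build_name and latest_version, and the sample file names are all assumptions made for this example. They are not a convention prescribed by this talk, the NSF, or DMPTool.

```python
import re
from datetime import date

# Hypothetical pattern: <project>_<description>_<YYYY-MM-DD>_v<NN>.<ext>
_VERSION = re.compile(r"_v(\d+)\.[^.]+$")


def _slug(text):
    """Lowercase the text and replace runs of other characters with '-'."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def build_name(project, description, day, version, ext):
    """Build a consistent, sortable file name from its parts."""
    return (
        f"{_slug(project)}_{_slug(description)}_"
        f"{day.isoformat()}_v{version:02d}.{ext}"
    )


def latest_version(names):
    """Return the file name with the highest version number, or None."""
    versioned = []
    for name in names:
        match = _VERSION.search(name)
        if match:  # skip files that do not follow the pattern
            versioned.append((int(match.group(1)), name))
    return max(versioned)[1] if versioned else None


if __name__ == "__main__":
    print(build_name("Curry", "Survey Responses", date(2013, 2, 1), 3, "csv"))
    print(latest_version(["a_v01.csv", "a_v10.csv", "a_v02.csv", "notes.txt"]))
```

Putting the date in ISO order and zero-padding the version number means that an ordinary alphabetical sort in a file browser also sorts the files chronologically, which is one way to avoid the "which file is the most recent?" problem.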
• DIY, just-in-time design
• Two navigation paths
• Grad student lifecycle
• Data needs
• Different types of content delivery (videos, instructions, how-to, descriptions, case studies)
• Opportunity to engage and own via social media bars and comments
• Learn about good data management practices.