2. MSU Libraries
Research Data Management
Introductions
• Please tell us your name and department
• A brief description of your primary research area
• What do you consider to be your research data?
• Experience and/or comfort level with managing research data?
cc http://www.flickr.com/photos/quinnanya/
Agenda
• Introduction
• Background
  – The Impetus: NSF Data Management Plan Mandate
  – The Effect: Policy to Practice
  – The Response: Changing Data Landscape
• Fundamental Practices
  – File Organization
  – Data Documentation
  – Reliable Backup
  – Data Publishing, Sharing, & Reuse
  – Protecting Data & Responsible Reuse
• Data Lifecycle Resources
Data Management. Isn't that… trivial?
• Not so much. Data is a primary output of research, and high-quality data are very expensive to produce. Data may be collected in nanoseconds, but generating them takes the expert application of research protocol and design.
CC-BY-SA-3.0 Rob Lavinsky
Data is the input of a process that generates higher orders of understanding.
Wisdom
Knowledge
Information
Data
Understanding is hierarchical! (Russell Ackoff)
The scientific method "is often misrepresented as a fixed sequence of steps," rather than being seen for what it truly is, "a highly variable and creative process" (AAAS 2000:18).
Gauch, Hugh G. Scientific Method in Practice. New York: Cambridge University Press, 2010. Print. (Emphasis added)
Public Trust of Science: The Crisis Triforce
• Scholarly communication crisis
  – Challenge: Increasing costs of access
  – Opportunity: Open Access
• Reproducibility crisis
  – Challenge: Failure-to-replicate rates
  – Opportunity: Open Science
• Higher education crisis
  – Challenge: Value of education
  – Opportunity: Open Education
Crisis, Part 1: Scholarly Communication, "The cage"
• https://www.lib.msu.edu/about/collections/scholcomm/more/
• http://www.arl.org/storage/documents/monograph-serial-costs.pdf
Crisis, Part 2: Replication Crisis, "The Canary"
• Ioannidis JPA (2005) Why Most Published Research Findings Are False. PLoS Med 2(8): e124. doi:10.1371/journal.pmed.0020124
• "Estimating the reproducibility of psychological science". Science 349 (6251). August 28, 2015. doi:10.1126/science.aac4716. Retrieved September 12, 2015.
Crisis, Part 3: Higher Education, "The coal mine"
http://news.bbc.co.uk/onthisday/hi/dates/stories/december/30/newsid_2547000/2547587.stm
The Research Depth Chart
Scientific Method
Research Design
Research Method
Research Tasks
(Axis: more generic at the top, more domain-specific at the bottom)
Figure: the research process
• Study Concept: Problem Identification, Literature Review, Environmental Scan, Funding & Proposal
• Research Design: Research Methodology, Research Workflow, Hypothesis Formation, Design Validation
• Research Activity
• Data Management: Data Organization, Data Storage, Data Description, Data Sharing
• Scholarly Communication: Report Findings, Publish, Peer Review
Data Management
• The process of planning for and implementing a system of care for your research data before, during, and after a research project in order to ensure a (re)usable resource.
So why are we here?
Good science!
Government and Research Funder Mandates
But why are we really here?
• Impetus: NSF has mandated that all grant applications submitted on or after January 18, 2011 must include a supplementary "Data Management Plan"
• Effect: The original NSF mandate has had a domino effect, and many funders now require, or state guidelines for, data management of grant-funded research
• Response: Data management has not traditionally received full treatment in (many) graduate and doctoral curricula; intervention is necessary
Positive reinforcement…
• National Science Foundation Data Management Plan mandate (January 18, 2011)
• Presidential Memorandum on Managing Government Records (August 24, 2012)
  – Managing Government Records Directive: All permanent electronic records in Federal agencies will be managed electronically to the fullest extent possible for eventual transfer and accessioning by NARA in an electronic format.
Positive reinforcement… (cont.)
• White House policy memo (February 22, 2013)
  – Increasing Access to the Results of Federally Funded Scientific Research: Federal agencies with more than $100M in R&D expenditures must develop plans to make the published results of federally funded research freely available to the public within one year of publication.
• OSTP policy memo (March 20, 2014)
  – Improving the Management of and Access to Scientific Collections: directs each Federal agency that owns, maintains, or otherwise financially supports permanent scientific collections to develop a draft scientific-collections management and access policy within six months.
Positive reinforcement… (cont. w/ teeth!)
• AHRQ = "…all AHRQ-funded researchers will be required to include a data management plan for sharing final research data in digital format, or state why data sharing is not possible."
• NASA = "This plan extends NASA's culture of open data access to all NASA-funded research."
• USDA = Phased approach beginning with DMP
• More: http://www.arl.org/focus-areas/public-access-policies/federally-funded-research/2696-white-house-directive-on-public-access-to-federally-funded-research-and-data#agency-policies
Federally funded research data
• 1980: Forsham v. Harris
  – Research data not subject to FOIA
• 1999: Data Access Act of 1999
  – OMB Circular A-110 revised: data produced with Federal monies subject to FOIA
• 2003: NIH Data Sharing Plans
  – Grants over $500,000 require plans
• 2010: America COMPETES Reauthorization Act
  – OSTP must coordinate policies for dissemination and stewardship of scholarly publications and data produced with Federal funds
• 2011: NSF Data Management Plans
  – Grants require a supplementary DMP
• 2013: OSTP memo, "Increasing Access to the Results of Federally Funded Research"
  – Requires agencies to develop plans for public access to publications and data
Fischer, E. A. (2013). Public Access to Data from Federally Funded Research: Provisions in OMB Circular A-110 (Congressional Research Service). Retrieved from http://congressional.proquest.com.proxy2.cl.msu.edu/congressional/docview/t21.d22.crs-2013-rsi-0116?accountid=12598
Funder Policies
NASA "promotes the full and open sharing of all data"
"requires that data… be submitted to and archived by designated national data centers."
"expects the timely release and sharing of final research data"
"IMLS encourages sharing of research data."
"…should describe how the project team will manage and disseminate data generated by the project"
• Policies for re-use, re-distribution, and creation of derivatives
• Plans for archiving data, samples, and other research outcomes, and for maintaining access
• Types of data, samples, physical collections, and software generated
• Standards for data and metadata format and content
• Access and sharing policies, with stipulations for privacy, confidentiality, security, intellectual property, or other rights or requirements
• NSF will not evaluate any proposal missing a DMP
• The PI may state that the project will not generate data
• The DMP is reviewed as part of the intellectual merit or broader impacts of the application, or both
• Costs to implement the DMP may be included in the proposal's budget
• The DMP may be up to two pages long
• Investigators seeking $500,000 or more in direct costs in any year should include a description of how final research data will be shared, or explain why data sharing is not possible.
• The precise content of the data-sharing plan will vary, depending on the data being collected and how the investigator is planning to share the data.
• More stringent data management and sharing requirements may apply in specific NIH Funding Opportunity Announcements. Principal Investigators must discuss how these requirements will be met in their Data Sharing Plans.
• Roles and responsibilities
• Expected data
• Period of data retention
• Data formats and dissemination
• Data storage and preservation of access
Local Policy
University Research Council Best Practices: https://rio.msu.edu/research-data
Research Data: Management, Control, and Access
  – To assure that research data are appropriately recorded, archived for a reasonable period of time, and available for review under the appropriate circumstances.
• Ownership = MSU
• "Stewardship" = You
• Period of Retention = 3 years
• Transfer of Responsibility = Written Request
Broader Response: Changing Data Landscapes
• Data Management Competencies
  – Standards & Best Practices
  – Discipline-Specific Discourse
• Data sharing and open data
  – Data sets as publications
  – Data journals
  – Citations for data (e.g., used in secondary analysis)
  – Data as supplementary materials to traditional articles
  – Data repositories and archives
Science Paradigms
• Thousand years ago: science was empirical, describing natural phenomena
• Last few hundred years: theoretical branch, using models, generalizations
  $$\left(\frac{\dot{a}}{a}\right)^2 = \frac{4\pi G\rho}{3} - K\frac{c^2}{a^2}$$
• Last few decades: a computational branch, simulating complex phenomena
• Today: data exploration (eScience), unify theory, experiment, and simulation
  – Data captured by instruments or generated by simulator
  – Processed by software
  – Information/Knowledge stored in computer
  – Scientist analyzes database / files using data management and statistics
Slide credit: Gray, J. & Szalay, A. (11 January 2007). eScience Talk at NRC-CSTB meeting. http://research.microsoft.com/en-us/um/people/gray/talks/NRC-CSTB_eScience.ppt
Curation responsibilities (Carlson, The Chronicle, 2006)
"Data from Big Science is … easier to handle, understand and archive. Small Science is horribly heterogeneous and far more vast. In time Small Science will generate 2-3 times more data than Big Science."
Figure: big science data and small science data; curated by domain? by institution?
MacColl, John (2010). The Role of libraries in data curation. RLG Partnership Annual Meeting, Chicago. June 2010
What's in it for me?
• Better organization = fewer headaches
  – Course management
  – Bibliographic management
  – File management
  – Research
• Career advancement
  – Publish datasets and list them on your CV
  – Data management is an "unnamed practice": name it for yourself and your students!
Data Sharing Impacts
• Reinforces open scientific inquiry
• Encourages diversity of analysis and opinion
• Promotes new research, testing of new or alternative hypotheses and methods of analysis
• Supports studies on data collection methods and measurement
cc http://www.flickr.com/photos/pinchof_10/
Data Sharing Impacts
• Facilitates education of new researchers
• Enables exploration of topics not envisioned by initial investigators
• Permits creation of new datasets by combining data from multiple sources
Data as Publication
Figure 1. To be published, datasets are typically deposited in a repository to make them available, documented to support reproduction and reuse, and assigned an identifier to facilitate citation.
Kratz J and Strasser C (2014) [version 3] Data publication consensus and controversies. F1000Research 3:94 (doi: 10.12688/f1000research.3979.3)
What could come next?
• Increasing emphasis on making data available over long time periods from institutionally maintained repositories.
  – The end of project- or PI-based data repositories
• Increasing pressure for rapid release of data and project transparency
  – The annual reports may be used to dictate data release in the near future
• How these challenges are resolved is a core institutional responsibility. It is important that MSU be a leader in this initiative.
  – Institutions are broadly competing to establish these data repositories, and I suspect these will be a competitive indicator for proposal success in the future.
Slide from Messina, Joseph. (2015) Data Plans in SBE and Geography. CI Forum October 22, 2015
http://retractionwatch.com/2014/01/07/doing-the-right-thing-authors-retract-brain-paper-with-systematic-human-error-in-coding/
Research Data Management Fundamentals
• Documentation
• File Organization
• Storage & Backup
• Data Publishing, Sharing, & Reuse
• Protecting Data & Responsible Reuse
Documentation Practices: Overview
• Researchers benefit from proper documentation to decipher or reuse their own datasets, even before thinking about sharing
• Think "downstream"
Documentation Practices: Overview
1. At minimum, create a README file to document your project
2. Use standards for describing data, including metadata standards
3. If applicable, use in-line code commentary to explain code
(cc) Will Scullin
Create a README file
• At minimum, store documentation with the data in a readme.txt file or equivalent, covering:
  – What the data consist of
  – How the data were collected
  – Restrictions on distribution or use
  – Other descriptive information
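A README is easier to keep up if it starts from a template. The sketch below assumes Python 3.9+ and a plain-text readme.txt saved in the data folder; the function name `write_readme` and the exact section headings are illustrative choices, not a standard. Sections left unanswered are marked TODO so gaps stay visible.

```python
from pathlib import Path

# Section headings follow the checklist above; the wording is illustrative.
README_SECTIONS = [
    "What the data consist of",
    "How the data were collected",
    "Restrictions on distribution or use",
    "Other descriptive information",
]


def write_readme(data_dir: Path, title: str, answers: dict[str, str]) -> Path:
    """Write readme.txt next to the data; unanswered sections are marked TODO."""
    lines = [title, "=" * len(title), ""]
    for section in README_SECTIONS:
        lines += [section, "-" * len(section), answers.get(section, "TODO"), ""]
    readme = data_dir / "readme.txt"
    readme.write_text("\n".join(lines), encoding="utf-8")
    return readme


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        path = write_readme(
            Path(tmp),
            "Krill backscatter micrographs",  # hypothetical project title
            {"What the data consist of": "TIFF micrographs of krill samples."},
        )
        print(path.read_text(encoding="utf-8"))
```

Plain text is used because it stays readable without special software, in line with the file-format advice later in this deck.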
Use Metadata Standards
• "Data about data"
• Standardized way of describing data
• Explains who, what, where, and when of data creation and methods of use
• Data more easily found
• Data more easily compared to other data sets
Use Metadata Standards
Basic project metadata:
• Title, Creator, Identifier, Subject, Funders, Rights, Access Information, List of File Names
• Language, Dates, Location, Methodology, Data Processing, Sources
• File Formats, File Structure, Variable List, Code Lists, Versions, Checksums
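Several items in this list (file names, formats, checksums) can be generated rather than typed. A minimal sketch, assuming Python 3.9+ and that the record is exported as JSON; the field names are illustrative and do not follow any particular metadata standard, and SHA-256 is one common checksum choice among several.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_metadata(data_dir: Path, title: str, creator: str, rights: str) -> dict:
    """Basic project metadata plus a list of files with formats and checksums."""
    files = sorted(p for p in data_dir.rglob("*") if p.is_file())
    return {
        "title": title,
        "creator": creator,
        "rights": rights,
        "files": [
            {
                "name": p.relative_to(data_dir).as_posix(),
                "format": p.suffix.lstrip(".").lower(),
                "bytes": p.stat().st_size,
                "sha256": sha256_of(p),
            }
            for p in files
        ],
    }


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Hypothetical data file for the demonstration.
        (root / "survey_20120403.csv").write_text("id,score\n1,5\n", encoding="utf-8")
        record = build_metadata(root, "Usability Survey", "Doe, J.", "CC BY 4.0")
        print(json.dumps(record, indent=2))
```

Recomputing the checksums later and comparing them with the stored values is a simple way to confirm that files have not changed or been corrupted.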
Use Metadata Standards
• Dublin Core: a commonly used descriptive metadata format that facilitates dataset discovery across the Web.
• Data Documentation Initiative (DDI): defines metadata content, presentation, transport, and preservation for the social and behavioral sciences.
• ISO 19115:2003: describes geographic data such as maps and charts.
• More examples: http://www.lib.msu.edu/about/diginfo/collect.jsp
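To make Dublin Core concrete, here is a minimal sketch that serializes a dataset description using Python's standard xml.etree.ElementTree. The namespace http://purl.org/dc/elements/1.1/ and the 15 element names are those of the Dublin Core Metadata Element Set; the `<metadata>` wrapper element and the sample values are assumptions, since each repository defines its own container and required fields.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
# The 15 elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
}
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields: dict[str, list[str]]) -> str:
    """Serialize Dublin Core elements as XML; each element may repeat."""
    root = ET.Element("metadata")  # hypothetical wrapper; repositories define their own
    for element, values in fields.items():
        if element not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {element}")
        for value in values:
            ET.SubElement(root, f"{{{DC_NS}}}{element}").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(dublin_core_record({
        "title": ["Krill backscatter micrographs"],
        "creator": ["Sharpe, W."],
        "date": ["2011-01-17"],
        "format": ["image/tiff"],
        "rights": ["CC BY 4.0"],
    }))
```

Rejecting unknown element names catches typos such as "author" for "creator" before a record reaches a repository.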
Use In-Line Code Commentary
Example of R code commentary
# Standard normal cumulative distribution function (CDF) at -1.96, 0, and 1.96;
# returns approximately 0.025, 0.500, 0.975
pnorm(c(-1.96,0,1.96))
• If applicable, in-line code commentary helps explain code
File Organization Practices: Overview
1. Design a file plan for your research project
2. Use file naming conventions that work for your project
3. Choose file formats to maximize usefulness
"When I was a freshman I named my assignments Paper, Paperr, Paperrr, Paperrrr"
- Undergrad
Design a File Plan
• File structure is the framework
• A classification system makes it easier to locate folders/files
• Benefits:
  – Simple organization intuitive to team members and colleagues
  – Reduces duplicate copies in personal drives and e-mail attachments
Design a File Plan
Choose a sortable directory hierarchy
• Example 1: Investigator, Process, Date
  Collie/TEI_Encoding/20110117
• Example 2: Instrument, Date, Sample
  Usability Survey/2012043/sample_1
Design a File Plan
Example documentation of Directory Hierarchy:
/[Project]/[Grant Number]/[Event]/[Investigator/Date]
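A documented hierarchy is easiest to follow when folders are created from it rather than by hand. A minimal sketch using Python's pathlib; it assumes "[Investigator/Date]" means two nested levels and that dates are written YYYYMMDD. The function name `project_path` and the sample project, grant, and event values are hypothetical.

```python
from pathlib import Path


def project_path(base: Path, project: str, grant: str, event: str,
                 investigator: str, date: str) -> Path:
    """Return base/project/grant/event/investigator/date.

    Assumes "[Investigator/Date]" means two nested levels, with dates
    written YYYYMMDD so that sibling folders sort chronologically.
    """
    parts = [project, grant, event, investigator, date]
    for part in parts:
        if not part or "/" in part or "\\" in part:
            raise ValueError(f"invalid path component: {part!r}")
    if not (len(date) == 8 and date.isdigit()):
        raise ValueError(f"date must be YYYYMMDD: {date!r}")
    return base.joinpath(*parts)


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        folder = project_path(Path(tmp), "KrillStudy", "GRANT-0000",
                              "Cruise01", "sharpeW", "20110117")
        folder.mkdir(parents=True, exist_ok=True)  # creates every missing level
        print(folder.relative_to(tmp))
```

Rejecting slashes inside a component keeps one level of the plan from silently turning into several.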
Use File Naming Conventions
• Enable better access/retrieval of files
• Create logical sequences for file sorting
• More easily identify what you're searching for
Use File Naming Conventions
• Meaningful but short (255-character limit)
• Use alphanumeric characters
  – Example: abc123
• Capital letters or underscores differentiate between words
• Surname first, followed by initials of first name
Use File Naming Conventions
• Year-month-day format for dates, with or without hyphens
  Example 1: 2006-03-13
  Example 2: 20060313
• Decide on a simple versioning method
  Example: file_v001
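Why year-month-day dates and zero-padded version numbers: most file listings sort names character by character, and only these forms make that order match chronological and version order. A small Python illustration; all file names are hypothetical, and `next_version` is an illustrative helper that assumes the `_vNNN` pattern from the example above.

```python
import re

# Hypothetical file names: the same three dates written month-first and year-first.
month_first = ["report_03-13-2006.csv", "report_11-02-2005.csv", "report_01-20-2006.csv"]
year_first = ["report_2006-03-13.csv", "report_2005-11-02.csv", "report_2006-01-20.csv"]
# The same versions with and without zero padding.
unpadded = ["file_v2.txt", "file_v10.txt", "file_v1.txt"]
padded = ["file_v002.txt", "file_v010.txt", "file_v001.txt"]

_VERSION = re.compile(r"_v(\d{3,})(?=\.|$)")


def next_version(name: str) -> str:
    """Increment a zero-padded _vNNN tag, e.g. file_v009.txt -> file_v010.txt."""
    match = _VERSION.search(name)
    if match is None:
        raise ValueError(f"no _vNNN version tag in {name!r}")
    number = int(match.group(1)) + 1
    return f"{name[:match.start(1)]}{number:03d}{name[match.end(1):]}"


if __name__ == "__main__":
    # Plain lexical sorting, as in most file browsers.
    print(sorted(month_first))  # November 2005 sorts last: not chronological
    print(sorted(year_first))   # chronological
    print(sorted(unpadded))     # file_v10 sorts before file_v2
    print(sorted(padded))       # file_v001, file_v002, file_v010
    print(next_version("file_v009.txt"))
```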
Use File Naming Conventions
• To create consistent file names, specify a template such as:
  [investigator]_[descriptor]_[YYYYMMDD].[ext]
This: sharpeW_krillMicrograph_backscatter3_20110117.tif   Not this: KrillData2011.tif
This: borgesJ_collocation_20080414.xml   Not this: Borges_Textbase.xml
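A naming template can be checked automatically. This sketch encodes [investigator]_[descriptor]_[YYYYMMDD].[ext] as a Python regular expression, allowing underscores inside the descriptor as in the krill example, and also applies the 255-character limit and ASCII-alphanumeric rule from the earlier slide. The pattern and the function name `check_file_name` are one possible interpretation, not an established tool.

```python
import re
from datetime import datetime

# [investigator]_[descriptor]_[YYYYMMDD].[ext]; the descriptor may itself
# contain underscores, as in the krill micrograph example.
NAME_PATTERN = re.compile(
    r"^(?P<investigator>[A-Za-z]+)_"
    r"(?P<descriptor>[A-Za-z0-9_]+)_"
    r"(?P<date>\d{8})\."
    r"(?P<ext>[A-Za-z0-9]+)$"
)
MAX_NAME_LENGTH = 255


def check_file_name(name: str) -> list[str]:
    """Return a list of problems; an empty list means the name conforms."""
    problems = []
    if len(name) > MAX_NAME_LENGTH:
        problems.append(f"longer than {MAX_NAME_LENGTH} characters")
    match = NAME_PATTERN.match(name)
    if match is None:
        problems.append("does not match [investigator]_[descriptor]_[YYYYMMDD].[ext]")
        return problems
    try:
        datetime.strptime(match["date"], "%Y%m%d")  # rejects e.g. month 13
    except ValueError:
        problems.append(f"{match['date']} is not a valid YYYYMMDD date")
    return problems


if __name__ == "__main__":
    for name in ["sharpeW_krillMicrograph_backscatter3_20110117.tif",
                 "KrillData2011.tif",
                 "borgesJ_collocation_20080414.xml",
                 "Borges_Textbase.xml"]:
        print(name, check_file_name(name) or "OK")
```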
Choose Appropriate File Formats
• Non-proprietary
• Open, documented standard
• Common usage by research community
• Standard representation (ASCII, Unicode)
• Unencrypted
• Uncompressed
Choose Appropriate File Formats
Format genre: optimal standards
• TEXT: .txt; .odt; .xml; .html
• AUDIO: .flac; .wav
• VIDEO: .mp2/.mp4; .mkv
• IMAGE: .tif; .png; .svg; .jpg
• DATA: .sql; .csv
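Before depositing data, it can help to list which files are not yet in one of these formats. A minimal sketch in Python: the extension sets are copied from the table above, the function name `audit_formats` is hypothetical, and the check looks only at file extensions, so it cannot tell whether a file's contents actually conform to the format.

```python
from collections import Counter
from pathlib import PurePath

# Preferred extensions per format genre, taken from the table above.
PREFERRED = {
    "text": {".txt", ".odt", ".xml", ".html"},
    "audio": {".flac", ".wav"},
    "video": {".mp2", ".mp4", ".mkv"},
    "image": {".tif", ".png", ".svg", ".jpg"},
    "data": {".sql", ".csv"},
}
ALL_PREFERRED = set().union(*PREFERRED.values())


def audit_formats(file_names: list[str]) -> tuple[list[str], Counter]:
    """Split names into preferred formats and a count of other extensions."""
    preferred: list[str] = []
    other: Counter = Counter()
    for name in file_names:
        ext = PurePath(name).suffix.lower()
        if ext in ALL_PREFERRED:
            preferred.append(name)
        else:
            other[ext or "(no extension)"] += 1
    return preferred, other


if __name__ == "__main__":
    names = ["survey.csv", "survey.xlsx", "notes.docx", "scan.TIF", "README"]
    keep, convert = audit_formats(names)
    print("preferred:", keep)
    print("consider converting:", dict(convert))
```

A file flagged here is not necessarily unusable; the list is a prompt to consider exporting an open-format copy alongside the original.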
Storage & Backup Practices
1. Avoid single points of failure
2. Ensure data redundancy & replication
3. Understand common types of storage
(cc) George Ornbo
Data are at significant risk of loss without a storage and backup plan
Avoid Single Points of Failure
A single point of failure occurs when it would take only one event to destroy all data on a device
• Use managed networked storage when possible
• Move data off portable media
• Never rely on one copy of data
• Do not rely on CD or DVD copies remaining readable
• Be wary of software lifespans
Ensure Data Redundancy
• An effective data storage plan provides for three copies:
  – Primary authoritative copy
  – Secondary local backup
  – Tertiary remote backup
• Geographically distribute and secure the copies
  – Local vs. remote, depending on needed recovery time
• Personal computers, external hard drives, and departmental or university servers may be used
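A copy only counts as a backup if it can be shown to match the original. This minimal sketch copies a primary file to two backup locations and verifies each copy with a SHA-256 checksum, assuming Python 3.9+. The two directories stand in for the secondary local and tertiary remote copies (for example an external drive and a mounted network share); the paths and the function name `replicate` are hypothetical, and copying alone does not make storage geographically distributed.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replicate(primary: Path, backup_dirs: list[Path]) -> dict[Path, bool]:
    """Copy the primary file into each backup directory and verify every copy.

    Returns {copy_path: checksum_matches_primary}.
    """
    expected = sha256_of(primary)
    results: dict[Path, bool] = {}
    for target_dir in backup_dirs:
        target_dir.mkdir(parents=True, exist_ok=True)
        copy = Path(shutil.copy2(primary, target_dir / primary.name))  # copy2 keeps timestamps
        results[copy] = sha256_of(copy) == expected
    return results


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        primary = root / "primary" / "survey.csv"
        primary.parent.mkdir()
        primary.write_text("id,score\n1,5\n", encoding="utf-8")
        # Stand-ins for a local backup drive and a remote, off-site location.
        report = replicate(primary, [root / "local_backup", root / "remote_backup"])
        for copy, ok in report.items():
            print(copy.relative_to(root), "verified" if ok else "MISMATCH")
```

Rerunning only the verification step on a schedule would also detect copies that degrade after they were written.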
Ensure Data Redundancy
• Cloud storage
  – Amazon S3
  – Google
  – MS Azure
  – DuraCloud
  – Rackspace
  – Glacier
Note that many enterprise cloud storage services charge for data transfers in and out ($$$)
Understand Common Types of Storage
• Optical Media
• Portable Flash Media
• Commercial Hard Drives
• Commercial NAS
• Cloud Storage
• Enterprise Network Storage
• Trusted Archival Storage
Understand Common Types of Storage
• Features of storage types:
  – Portable data transfers
  – Short-term storage
  – Project-term storage
  – Networked data transfer
  – Long-term storage
  – Reliable backup option
Understand Common Types of Storage
Table: storage types (optical media, portable flash media, commercial hard drives, commercial NAS, cloud storage, enterprise network storage, trusted archival storage) compared on portable data transfer, short-term storage, project-term storage, networked data transfer, long-term storage, and reliable backup option.
Understand Common Types of Storage
Media Storage @ MSU
• Optical Media
  – MSU Computer Store: sells optical media and hardware accessories
  – UAHC Media Storage Service: offers physical lock-box-like storage for MSU
• Flash Media
  – MSU Computer Store: sells optical media and hardware accessories
  – UAHC Media Storage Service: offers physical lock-box-like storage for MSU
• Commercial Hard Drives
  – MSU Computer Store: sells optical media and hardware accessories
  – UAHC Media Storage Service: offers physical lock-box-like storage for MSU
• Enterprise Cloud Storage
  – Angel: free; ideal for collaboration, not storage space; phase-out 2015
  – Desire2Learn: free; ideal for collaboration, not storage space; replaces Angel
  – GoogleApps: free; ideal for collaboration, not intended as storage space
• Enterprise Network Storage
  – AFS Space: free up to 1 GB; additional space can be purchased with a departmental account
  – IT Services Individual, Mid-Tier and Enterprise Storage: fee-based
  – HPCC Home or Research: free up to 1 TB; fee-based additions available
• Trusted Archival Storage
  – Disciplinary Repositories: disciplinary repositories offer archival services for pertinent research data.
Data Publishing, Sharing, Reuse
1. Time-intensive, with potentially high return on investment
2. Publish data in several data publication venues to more broadly share results of research
Research datasets on par with peer-reviewed journal articles as first-class scholarly contributions
Sharing & Publishing Data
• Data preparation for sharing and publication is a time-intensive process
• Potential positive outcomes:
  – Increased research impact and citations
  – Additional scientific inquiry enabled
  – Opportunities for co-authorship and collaboration
  – Enhanced competitiveness of your grant proposals
Data Publication Venues
• Multiple ways to publish research data
  – Faculty or project website
  – Journal supplementary materials
  – Disciplinary data repository (data archive)
• Varying levels of support for indexing, access controls, and long-term curation
Data Publication Venues
• Disciplinary Data Repository
  – Securely share data, ensure long-term access
  – High visibility
  – Often offer persistent citations
  – Availability varies across domains
  – Databib.org directory
Protecting Data & Responsible Reuse
1. Consider how to protect data and intellectual property rights while encouraging reuse
2. Keep in mind ethical concerns when sharing data
(cc) Will Scullin
Intellectual Property
• IP refers to the exclusive rights of creators of works
• Individual data (facts) cannot be protected by US copyright
• The organization of data (such as a database), creative works produced from data, and the research instruments used may be protected
Intellectual Property
• The principal investigator's institution holds IP rights
• Provide a clearly stated license for producing derivatives, reusing, and redistributing datasets
• License under Creative Commons
• State any restrictions or embargoes on use
• Provide an example of how the work should be cited, to encourage proper attribution on reuse
• Document any IP / copyright issues
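One way to "provide an example of how the work should be cited" is to generate the citation string from the same fields used in the metadata. A minimal sketch, assuming Python 3.9+; the element order (creators, year, title, version, publisher, license, DOI link) loosely follows common data-citation practice such as DataCite's recommendation but is not an exact reproduction of any style guide. The class name and the DOI prefix 10.5555 in the example are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetCitation:
    creators: list[str]        # "Surname, Initials"
    year: int
    title: str
    publisher: str             # usually the repository holding the data
    doi: str                   # bare DOI, e.g. "10.5555/example" (hypothetical)
    version: Optional[str] = None
    license: Optional[str] = None

    def render(self) -> str:
        """Assemble a human-readable citation, skipping optional parts that are unset."""
        parts = [f"{'; '.join(self.creators)} ({self.year}). {self.title.rstrip('.')}."]
        if self.version:
            parts.append(f"Version {self.version}.")
        parts.append(f"{self.publisher}.")
        if self.license:
            parts.append(f"License: {self.license}.")
        parts.append(f"https://doi.org/{self.doi}")
        return " ".join(parts)


if __name__ == "__main__":
    citation = DatasetCitation(
        creators=["Borges, J."],
        year=2008,
        title="Collocation textbase",
        publisher="Example Data Repository",
        doi="10.5555/example.123",
        version="1.0",
        license="CC BY 4.0",
    )
    print(citation.render())
```

Putting the license next to the citation keeps the reuse terms and the attribution request together in one place, for example at the top of the README.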
Ethics & Data Sharing
• Keep in mind the following ethical concerns when sharing your data:
  – Privacy
  – Confidentiality
  – Security and integrity of the data
• For data involving human subjects, obtain written permission or consent stating how the data may be reused
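For tabular human-subjects data, one common technical step toward confidentiality is to drop direct identifiers and replace participant IDs with keyed pseudonyms before sharing. The sketch below uses HMAC-SHA256 from Python's standard library; the function names, the column names, and the 16-hex-digit truncation are illustrative assumptions. Pseudonymization is not anonymization: indirect identifiers can still re-identify people, the key must be kept separate from the shared data, and the consent terms and IRB requirements remain the deciding factors.

```python
import csv
import hashlib
import hmac
import io


def pseudonymize(value: str, key: bytes) -> str:
    """Map an identifier to a stable keyed hash (HMAC-SHA256, first 16 hex digits)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def pseudonymize_csv(text: str, id_column: str, drop_columns: set[str], key: bytes) -> str:
    """Replace id_column with pseudonyms and remove direct identifiers in drop_columns."""
    reader = csv.DictReader(io.StringIO(text))
    fields = [f for f in reader.fieldnames if f not in drop_columns]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row[id_column] = pseudonymize(row[id_column], key)
        writer.writerow({f: row[f] for f in fields})
    return out.getvalue()


if __name__ == "__main__":
    raw = "participant,email,score\nP001,a@example.org,5\nP002,b@example.org,7\n"
    # In practice the key is stored apart from the shared data (e.g. with the PI).
    print(pseudonymize_csv(raw, "participant", {"email"}, key=b"keep-this-secret"))
```

Because the same key always yields the same pseudonym, records can still be linked across files from the same study without exposing the original IDs.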
Best Practices = High Impact Data
• File organization ensures easier access to and retrieval of data
• Documentation makes datasets accessible and intelligible to users
• Storage and backup safeguard data
• Data publishing and sharing encourage the most widespread reuse of data
• Data protection ensures responsible reuse
Volunstrordinaries!
Aaron Collie
Devin Higgins
Brandon Locke
Ranti Junus
Judy Matthews
Tina Qin
We teach people about RDM
Librarianship, Training, Assessment, Consultation, Ad-hoc
• 6-12 new clients per semester
• 100% satisfied / 100% would use again
• 71% of new clients are referrals
• 60% requested additional services
• 15% through NFO, 14% through website
RDM@MSU 101
• Who: You, as the designated steward
• What: "the data"
• When: Minimum 3 years after publication/degree
• Where: Managed networked storage
• Why: Legal, Ethical, Scholarly
• How: With fidelity and documentation sufficient to reproduce the research
Contact
Aaron Collie
collie@msu.edu
@aaroncollie
http://www.lib.msu.edu/rdmg
Editor's Notes
Data management is about more than just the lost backpack. It is about expert application, and expert application in any industry is expensive.
In the academic industry, data is the input to our final product. It takes years of training and experience to succeed in this field.
Research is a scientific process, and we use an overarching model to describe it at a high level. But this is a conceptual model, not a process model. It is also a fairly sterile model; we know that because it does not apply prescriptively to all academic disciplines.
In practice, research is a complicated process. It is a creative process as well as a scientific process.
This has been noticed.
Research is hard, managing research is boring.
And right now is a really good time to be O.K. with this, because science is under attack. And, given the interesting times we live in, it might be a good opportunity to get our house in order.
So back to this: the scientific method is a conceptual model, not a process model, and a fairly sterile one, because it does not apply prescriptively to all academic disciplines.
You might think of the scientific method as an iceberg model. At the tip of the iceberg are these general activities, but research isn't really conducted at that high a level.
Research is a thing that happens at many levels simultaneously. The more experience you gain with research, the more of the depth chart you develop expertise within.
Data management is a subprocess of research. It is part of a holistic research method that includes a ton of other functions like funding, literature reviews, workflows and publication.
Today we are going to focus on just one of these areas: data management.
HANDOUT: DMP (blue)
Federal debate on the right of public access to government-funded research data dates back to 1980.
Forsham v. Harris: research data are not subject to FOIA. Private grantees are not agencies subject to FOIA, and the data had not been created or obtained by a federal agency. The plaintiffs were physicians seeking to obtain data underlying a Dept. of Health, Education, and Welfare report on diabetes treatment regimens.
Scientific research funding via grants was meant to allow the scientific knowledge system to operate on its established norms free from partisan influence.
The 1999 "Shelby Amendment" changed this by revising OMB Circular A-110, making research data subject to FOIA if it is "research data relating to published research findings produced under an award that were used by the Federal Government in developing an agency action that has the force and effect of law."
Senator Shelby advocated for the amendment on the basis of (1) transparency, (2) accountability.
Further policy arguments for data access: "return on scientific capital"
The amendment was based on an effort to obtain data from Harvard's "Six Cities" study (funded by NIH), which showed a link between particulate air pollution and health and was the scientific basis of debates around EPA air quality regulations.
NIH: a 2003 policy requires data sharing plans for grants over $500K, and since 1994 NIH policy has included requirements for making results available to the public.
NSF has had a data sharing policy in place since 1989, which states that grantees are expected to share data and are encouraged to facilitate data sharing, but it does not specifically require formal data publication. Neither does the 2011 DMP policy, although it has been a major force in increased attention to open data.
In recent years, reports from the National Academies and elsewhere on the merits of cyberinfrastructure (CI) and data sharing have snowballed.
In 2010, the America COMPETES Reauthorization Act (ACRA) required the Director of OSTP to coordinate agency policies "related to the dissemination and long-term stewardship of the results of unclassified research, including digital data and peer-reviewed scholarly publications, supported wholly, or in part, by funding from the Federal science agencies."
As a result, the 2013 OSTP memo requires federal agencies funding more than $100 million in R&D annually to develop and implement plans for public access to publications and data.
National Oceanic and Atmospheric Administration (NOAA)
IMLS encourages sharing of research data. Applications that develop digital products must fill out an additional form with ten questions focused on "Developing Data Management Plans for Research Projects."
"The federal government has the right to obtain, reproduce, publish or otherwise use the data first produced under an award and authorize others to do so for government purposes."
Ex: Digging Into Data
HANDOUT: DMP examples (white)
NSF's data management plan requirement
May be up to two pages long
PI may state that project will not generate data or samples
DMP is reviewed as part of intellectual merit or broader impacts of application, or both
(OMB Circular A-110, Sec. 53; 42 CFR Part 50, Subpart A)
Replication, transparency, re-use, mashups, repurposing, extending grant dollars, and enabling more research…
In fact, it is one thing to share data and quite another to publish data.
Data publication allows data to be a "first-class object" in the scholarly record, enabling its collection, management, curation, and citation.
Dissenting opinion: data publication is a round peg in a square hole; we shouldn't try to make data conform to an antiquated system of scholarly communication.
Pragmatic approach: treat data as a publication.
Bad press
Benefits include:
Electronic documents maintained together in one place and easily accessible to project staff
Data backed up and recoverable in the event of system failure
Promotes a culture of sharing information as an institutional resource, rather than individual ownership
Reduces duplicate copies in personal drives and email attachments
Starting point
Nuances of metadata: data dictionaries, lab notebooks/journals
Starting point
Descriptive documentation that accompanies a dataset
Better project transitions
Electronic documents maintained together in one place, easily accessible to project staff
Reduces duplicate copies in personal drives and email attachments
(Hierarchical/taxonomical/temporal)
You will know how to name future folders as your project grows.
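As an illustration of a hierarchical and temporal naming scheme, the Python sketch below builds paths of the form project/year/date_site_description_version.ext. The convention itself, the function name build_data_path, and the example values are hypothetical; adapt them to your project and discipline. ISO 8601 dates (YYYY-MM-DD) and zero-padded version numbers keep file listings sorted in chronological and version order.

```python
import re
from datetime import date
from pathlib import PurePosixPath


def slug(text: str) -> str:
    """Lowercase text and replace each run of non-alphanumeric characters with one hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def build_data_path(project: str, collected: date, site: str,
                    description: str, version: int, ext: str) -> PurePosixPath:
    """Hypothetical convention, e.g. soil-study/2013/2013-04-09_site-a_soil-moisture_v01.csv"""
    filename = (f"{collected.isoformat()}_{slug(site)}_{slug(description)}"
                f"_v{version:02d}.{ext.lstrip('.')}")
    # Project folder (hierarchical level) / year folder (temporal level) / file name.
    return PurePosixPath(slug(project), str(collected.year), filename)


print(build_data_path("Soil Study", date(2013, 4, 9), "Site A", "Soil Moisture", 1, "csv"))
# prints soil-study/2013/2013-04-09_site-a_soil-moisture_v01.csv
```

Generating names from one function, rather than typing them by hand, is one way to keep a convention consistent as new folders and files are added.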
Good practices
Good choices include…
Consider later lifecycle activities
Flexible
Which formats will be used for analysis, preservation, etc.?
Data are at significant risk of loss without a storage and backup plan. Risks include:
Hardware / network failures
Bit rot
Human error
Reliance on a single commercial-grade hard drive
Effective data storage plan provides for:
Primary authoritative copy
Secondary local backup
Tertiary remote backup
One event might be a dropped hard drive
Good practices
Be wary of software lifespans, such as with course management software like ANGEL or Desire2Learn
Examples of 3 copies
original + external/local + external/remote
original + 2 formats on 2 drives in 2 locations
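Three copies only protect data if the copies really are identical, since bit rot or human error can silently alter one of them. The sketch below shows a minimal fixity check in Python: it computes a SHA-256 checksum (via the standard hashlib module) for the original and compares each backup against it. It assumes all copies are reachable as local paths (for example, mounted drives); the function names and the demo data are hypothetical, and this is not an MSU service.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 hex digest of a file, read in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copies(original: Path, *copies: Path) -> list[tuple[str, bool]]:
    """False means the copy is missing or its bytes differ from the original."""
    reference = sha256_of(original)
    return [(str(c), c.is_file() and sha256_of(c) == reference) for c in copies]


def demo() -> list[tuple[str, bool]]:
    """Hypothetical demo: one good local backup, one silently altered remote backup."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        original = root / "primary" / "data.csv"
        local = root / "local" / "data.csv"
        remote = root / "remote" / "data.csv"
        for p in (original, local, remote):
            p.parent.mkdir()
        original.write_bytes(b"site,value\nA,1\n")
        local.write_bytes(original.read_bytes())
        remote.write_bytes(b"site,value\nA,7\n")  # simulated corruption
        return verify_copies(original, local, remote)


for path, ok in demo():
    print("OK  " if ok else "FAIL", path)
```

Running such a check on a schedule, and after every copy operation, turns "we have three copies" into "we have three verified copies."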
Mention new Backup Media Storage service offered by the University Archives.
ANGEL, Desire2Learn, and Google Apps might be considered cloud offerings from MSU. They are good for collaboration and short-term use, but don't use them for long-term storage.
Not immune to data loss: Dedoose example.
In booklet
For example…
Include description
ANGEL and Desire2Learn are not intended as storage space.
For more information on disciplinary repositories, contact RDMG or peruse Databib.org
The principal investigator's institution usually holds the IP rights.
File organization ensures easier access and retrieval of data during and after project
Documentation makes datasets accessible and intelligible to users
Storage and backup safeguards data against technical failure, human error, and natural catastrophe
Data publishing and sharing encourages the most widespread reuse of data
Data protection ensures responsible reuse in light of intellectual property and ethical concerns
Increase impact of data and promote new research opportunities
Service model
A Plus / Delta exercise focusing on extant infrastructure and services
Weave in known MSU resources
Discussion starters:
How do you interact with your department, college, university, and external bodies?
What makes managing research data difficult?
What services/tools do you need/want?
Advice Website
Database designers
Targeted seminar series
Data storage and curation options