Managing Research Data
Part 1
[Figure: Planning, Working, Finalizing, Sharing Data]
This work is licensed under a Creative Commons
Attribution 4.0 International License.
WHY – WHAT – WHO – WHEN & HOW
WHY manage data
WHAT research data are
WHO manages research data
WHEN & HOW data management is done
This two-part course is a collaboration between CU Libraries/
Information Services and the Office of Research Compliance &
Training. The purpose of this course is to familiarize you with the
various aspects of research data management (RDM):
Part 1:
•  Why RDM is both recommended and required
•  What research data are
•  Who is responsible for RDM
Part 2:
•  When RDM activities occur
•  How you can carry out RDM activities
This course will guide you through these areas, offering in-depth
details on each of them. Please refer to the top navigation to keep
track of which area you are currently exploring.
Learning objectives:
At the end of this training you will be able to:
•  Define & identify research data
•  Understand the demands of responsible conduct of research with
regard to research data management
•  Understand the reasons behind the federal mandates of research data
management
Links to many of the references and
policies referred to in this course can be
found on the final slides.
Have Fun!
Why should you care about
Research Data Management?
Managing research data:
SAVES TIME
Taking time to plan for your expected data, back them up, and document
them in detail saves time otherwise lost in searching for, recovering, and
deciphering data in the future
SIMPLIFIES YOUR LIFE
Managing your data by adopting an organization scheme, developing a
description standard, and creating a preservation plan avoids future
confusion and turmoil
INCREASES RESEARCH EFFICIENCY
By saving time and avoiding confusion you will be more efficient! Manage
your data for the future and you will be able to more easily find, access,
understand, and use your data
ENSURES RESEARCH INTEGRITY
Good research data management makes it more feasible to fulfill the
commitments of responsible research
Up to 80% of data lost within 20 years of
publication:
http://www.nature.com/news/scientists-losing-data-at-a-rapid-rate-1.14416
516 ecology papers published between 1991 and 2011
The chance of data being accessible fell by 17% per year
Vines, T. H. et al. Curr. Biol. http://dx.doi.org/10.1016/j.cub.2013.11.014 (2013)
When you engage in research at Columbia
University you must:
•  Be ethical in the conduct of the research
•  Abide by regulations and policies
•  Be responsible stewards of the research dollars and other resources
•  Share the results of your research for the good of society
Managing data is a critical responsibility
for all researchers
Managing research data enables
sharing, which:
•  Increases visibility
•  Facilitates discovery
•  Satisfies funder & journal requirements
•  Reinforces open scientific inquiry
•  Establishes priority & enables citation
•  Speeds research
Adapted from: https://libraries.mit.edu/guides/subjects/data-management/why.html & http://researchdata.wisc.edu/share-your-data/data-access-2/
Sharing enables breakthroughs
that lead to economic
development:
“Scientific research supported by the Federal Government
catalyzes innovative breakthroughs that drive our economy. The
results of that research become the grist for new insights and are
assets for progress in areas such as health, energy, the
environment, agriculture, and national security.”
http://www.whitehouse.gov/sites/default/files/microsites/ostp/ostp_public_access_memo_2013.pdf
“…a research project's success is measured … also by the data it
makes available to the wider community.”
“It is obvious that making data widely available is an essential
element of scientific research.”
The directive to make federally funded research data openly
accessible “is integrally tied to and supports the mission of
higher education to produce, preserve, and share scholarship. It
therefore provides the community with an opportunity to
marshal its resources to improve the interoperability of research
support systems and maximize the value of research funding.”
Association of Research Libraries (ARL) on the Office of Science &
Technology Policy memorandum “Increasing Access to the Results of
Federally Funded Scientific Research”
Data sharing is required by:
•  Funders
–  Federal agencies
–  Foundations
•  Journals
National Science Foundation (NSF):
https://www.nsf.gov/eng/general/dmp.jsp
“Beginning January 18, 2011, proposals submitted to NSF must
include a supplementary document of no more than two pages
labeled "Data Management Plan" (DMP) . This supplementary
document should describe how the proposal will conform to
NSF policy on the dissemination and sharing of research results.
Proposals that do not include a DMP will not be able to be
submitted.”
National Institutes of Health (NIH):
1 http://grants.nih.gov/grants/policy/data_sharing/
2 http://grants.nih.gov/grants/guide/notice-files/NOT-OD-03-032.html
“Data sharing is essential for expedited translation of research
results into knowledge, products and procedures to improve
human health.”1
“…all investigator-initiated applications with direct costs greater
than $500,000 in any single year will be expected to address
data sharing in their application”2
…and not just them! More federal
agencies will be requiring public
access to data:
“The Office of Science and Technology Policy (OSTP) hereby
directs each Federal agency with over $44 million in annual
conduct of research and development expenditures to develop a
plan to support increased public access to the results of research
funded by the Federal Government.”
“…digitally formatted scientific data resulting from unclassified
research supported wholly or in part by Federal funding should
be stored and publicly accessible to search, retrieve, and
analyze.” (2013)
http://www.whitehouse.gov/sites/default/files/microsites/ostp/ostp_public_access_memo_2013.pdf
Journal Sharing Policies:
“It is the policy of the American Economic Review to
publish papers only if the data used in the analysis are
clearly and precisely documented and are readily available
to any researcher for purposes of replication. Authors of
accepted papers that contain empirical work, simulations,
or experimental work must provide to the Review, prior to
publication, the data, programs, and other details of the
computations sufficient to permit replication.”
http://www.aeaweb.org/aer/data.php
“…authors are required to make materials, data and
associated protocols promptly available to readers.”
http://www.nature.com/srep/policies/index.html
Journal Sharing Policies:
“…trials of drugs and medical devices will be considered
for publication only if the authors commit to making the
relevant anonymised patient level data available on
reasonable request”
http://www.bmj.com/about-bmj/resources-authors/article-types/research
“All data necessary to understand,
assess, and extend the conclusions of
the manuscript must be available to any
reader of Science. All computer codes
involved in the creation or analysis of
data must also be available...”
http://www.sciencemag.org/site/feature/contribinfo/prep/gen_info.xhtml#dataavail
Benefits of good data management & sharing
practices:
•  Increase citations
•  Avoid retractions (& potential misconduct questions)
•  Advance knowledge
•  Enable reproducibility
Increase citations:
Publicly available data was significantly associated with a 69%
increase in citations, independent of journal impact factor, date
of publication, and author country of origin using linear
regression.
Piwowar HA, Day RS, Fridsma DB (2007). Sharing Detailed Research Data Is
Associated with Increased Citation Rate. PLoS ONE 2(3): e308.
doi:10.1371/journal.pone.0000308
Avoid retractions:
http://retractionwatch.wordpress.com/2013/10/30/nejm-paper-on-sleep-apnea-retracted-when-original-data-cant-be-found/
Advance knowledge:
http://www.sciencedaily.com/releases/2013/09/130903194155.htm
“70 percent of published genetic sequence comparisons are not
publicly accessible, leaving researchers worldwide unable to get
to critical data they may need to tackle a host of problems
ranging from climate change to disease control.”
Enable reproducibility:
Looking at 238 recently published papers, pulled from five
fields of biomedicine, a team of scientists found that just
under 50 percent of the research materials, from lab mice
to antibodies, used in the work could not be identified.
This phenomenon impedes the ability of scientists to
reproduce & extend published studies.
Vasilevsky NA, Brush MH, Paddock H, Ponting L, Tripathy SJ et al. (2013) On the
reproducibility of science: unique identification of research resources in the
biomedical literature. PeerJ 1:e148 http://dx.doi.org/10.7717/peerj.148
TAKE-AWAYS
•  Researchers share the products of their research (e.g.,
publications, data) for the good of:
–  Society
–  Advancement of science
–  Themselves
•  Data management is required by:
–  Funding bodies
–  Institutions
•  Data sharing is a requirement of:
–  Funding bodies
–  Publishers
What are the research data
that need to be managed?
Defining Research Data:
“…information created [or discovered] in the course of
research”1
Material or information “on which an argument, theory, test or
hypothesis, or another research output is based.” 2
“(i) Research data is defined as the recorded factual material
commonly accepted in the [research] community as necessary to
validate research findings…”3
1 Marieke Guy. http://www.slideshare.net/MariekeGuy/bridging-the-gap-between-researchers-and-research-data-management , #2
2 Queensland University of Technology. Manual of Procedures and Policies. Section
2.8.3. http://www.mopp.qut.edu.au/D/D_02_08.jsp
3 http://www.whitehouse.gov/omb/circulars_a110#36
Data may be collected in many ways:
Through real time & unique observations, from repeatable
experiments or simulations, or through derivations from unique
collections of data, as a few examples.
Data collection method costs and
risks:
Some data may be impossible to replace; other data may merely
be very expensive to replace. Alternatively, some data are so cheap
and quick to acquire that it is less expensive to repeat the
collection process than to store the data, e.g., some gene
sequences.
Data may be classified by collection
method, including:
OBSERVATIONS – collected in real time / irreplaceable
e.g. survey results, images, telemetry, sensor readings, some literary/historical
sources, recordings
EXPERIMENTS – reproducible/ variable expense
e.g. chromatograms, antenna mappings, word frequency
SIMULATIONS – Models & inputs used to create datasets
e.g. economic models, climate models
DERIVATIONS/COMPILATIONS – reproducible/expensive
e.g. text or data mining, 3D models, compiled database
RESEARCH PROCESS DATA – real time / irreplaceable
e.g. survey instruments, data description/documentation, developed software,
algorithms, code/script, instrument settings
TRUE OR FALSE:
In scientific research, only the information and observations
that are collected as part of your research are considered data.
FALSE
Data are not only the information and observations made as
part of scientific research but also the materials, the means, and
the products of that research.
Examples:
•  Survey instruments
•  Associated software
•  Cell lines
•  Specimens
Information exists in different forms
during the research process:
RAW OR PRIMARY– Lab notebooks, observational notes,
instrument readings, images, footage, individual survey
responses, historical sources, textual analysis, etc.
PROCESSED– Statistical analyses, sources organized as evidence,
rich descriptions, aggregated survey responses, etc.
PUBLISHED– Distribution in some finalized format to those
outside of the project. Distribution may occur in both static
and dynamic (e.g. longitudinal data sets with annual
reporting) instances, etc.
The federal Office of Management
and Budget offers the following
explanation (summarized):
“Research data means the recorded factual material commonly
accepted in the scientific community as necessary to validate
research findings”
Exclusions:
•  Preliminary analyses
•  Drafts of scientific papers
•  Plans for future research
•  Peer reviews
•  Communications with colleagues
•  Trade secrets
•  Commercial information
•  Materials necessary to be held
confidential by a researcher until they
are published
•  Information which is protected under
law
•  Personnel and medical information
•  Information the disclosure of which
would constitute a clearly unwarranted
invasion of personal privacy, such as
information that could be used to
identify a particular person in a
research study.
2 CFR § 200.315, Intangible property, e (3)
http://federalregister.gov/a/2013-30465
Sensitive data:
Some data on research subjects may require special protections
because they are highly sensitive and highly regulated. These
sensitive data may require encryption and other security
measures:
•  Protected Health Information (PHI) e.g. insurance
information, health conditions, etc.
•  Personally Identifying Information (PII) e.g. financial
information, social security numbers, etc.
There are a number of university policies that govern handling
information of these types. Special training is required for
researchers and others handling PHI.
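Knowing which data are sensitive is the first step in protecting them. As a minimal sketch only, the Python below flags and masks strings that look like U.S. Social Security numbers in the common NNN-NN-NNNN layout. The pattern and the names `find_ssn_like` and `redact_ssn_like` are assumptions for illustration and do not come from the course or from any university policy. A pattern match like this catches only one obvious format of one kind of PII.

```python
import re
from typing import List

# Assumed pattern: Social Security numbers written as NNN-NN-NNNN.
# Other layouts (no hyphens, spaces) are deliberately not covered here.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def find_ssn_like(text: str) -> List[str]:
    """Return every substring of `text` that matches NNN-NN-NNNN."""
    return SSN_PATTERN.findall(text)


def redact_ssn_like(text: str, mask: str = "[REDACTED]") -> str:
    """Replace every NNN-NN-NNNN match in `text` with `mask`."""
    return SSN_PATTERN.sub(mask, text)


if __name__ == "__main__":
    note = "Participant 17, SSN 123-45-6789, enrolled in spring."
    print(find_ssn_like(note))
    print(redact_ssn_like(note))
```

A scan like this can help spot files that need a closer look before they are stored or shared. The applicable university policies and training remain the authority on what counts as sensitive.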
Release of sensitive data can damage:
-  Individuals whose data were released: identity theft, financial
loss, privacy violations, etc.
-  Research team members: loss of reputation, loss of position
-  Research institution: financial liability
UNIVERSITY RESOURCES:
-  Office of HIPAA Compliance website
-  HIPAA training
-  IRB website and training
-  Data Classification Policy
-  Policy on Electronic Data Security Breach Reporting and
Response
-  Other IT Security Policies
Take-aways:
•  Definitions of data are varied – Use the one(s) appropriate to
your research community
•  Some data are sensitive
–  Know which data they are
–  Know and take the proper precautions to protect these data
Who is responsible for
Research Data Management?
Who is responsible for research data?
The PI is ultimately responsible for the data, and is the steward
of the data (more on this later).
It is incumbent upon every member of the research team to
safeguard research products (more on this later, too).
PI responsibilities:
“The full administrative, fiscal and scientific responsibility for
the management of a sponsored project resides with the
principal investigator named in the award”
Faculty Handbook 2008
As with all aspects of a proposal submission, the PI must be
involved with establishing and describing an appropriate data
management plan, as required.
PI responsibilities:
The PI is responsible for the collection, management,
maintenance and retention of research data accumulated during
a research project. It is the PI’s responsibility to:
•  Determine what records need to be retained to comply with
sponsor requirements
•  Adopt an orderly system of data organization
•  Communicate the chosen system to all members of a research
group & to the appropriate administrative personnel
•  Establish & maintain procedures for protection of essential
records in the event of a natural disaster or other emergency
Sponsored Projects Handbook
Research team member
responsibilities:
Everyone involved in the research project is responsible for
adhering to the statements and requirements presented in the
data management plan, and all other data management
practices related to the research project.
These may include practices of handling:
•  Physical data e.g., lab notebooks, samples, data
documentation (aka metadata), etc.
•  Electronic data e.g., file naming conventions, generating
metadata, keeping an e-lab notebook, data storage, data back-
ups, annotating findings
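File naming conventions are one of the electronic data practices listed above. As a minimal sketch only, the Python below builds names under a hypothetical convention of project, short description, ISO 8601 date, and zero-padded version number. The convention and the function name `build_filename` are assumptions for illustration and are not prescribed by the course; a research team would agree on its own scheme in its data management plan.

```python
import re
from datetime import date


def _slug(text: str) -> str:
    """Lowercase `text` and replace runs of non-alphanumerics with a hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def build_filename(project: str, description: str, on: date,
                   version: int, ext: str) -> str:
    """Build a name like project_description_YYYY-MM-DD_vNN.ext."""
    return "{}_{}_{}_v{:02d}.{}".format(
        _slug(project),
        _slug(description),
        on.isoformat(),      # ISO dates sort chronologically as plain text
        version,
        ext.lstrip("."),     # accept "csv" or ".csv"
    )


if __name__ == "__main__":
    print(build_filename("Soil Survey", "pH readings",
                         date(2015, 5, 12), 3, ".csv"))
```

Putting the date in ISO order and padding the version number means that an ordinary alphabetical file listing also sorts the files by date and version.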
Take-aways:
•  PI is responsible for all aspects of the grant, including data
management
•  All members of the research team are responsible for
adhering to the data management plan
Research data management can be complex, but there are
resources available
See Part 2 of this course for details on WHEN & HOW to
practice Research Data Management
à SEE NEXT PAGE!
Resources for Research Data Management:
Title URL
Scholarly Communications Program, Data Management http://scholcomm.columbia.edu/data-management/
Research and Data Integrity Program (ReaDI) http://www.columbia.edu/cu/compliance/docs/ReaDI_Program/index.html
Data Management Plan Templates http://scholcomm.columbia.edu/data-management/data-management-plan-templates/
CUIT Research Computing Services http://rcs.columbia.edu
Academic Commons Archival Storage http://academiccommons.columbia.edu/about
Citation Management http://library.columbia.edu/research/citation-management.html
Managing Secure Information - Training http://columbia.sighttraining.com
Data Security Policies http://policylibrary.columbia.edu/category/computingtechnology
REFERENCES
•  Scott, Mark, Boardman, Richard P., Reed, Philippa A.S. and Cox, Simon J. (2012)
Introducing research data. Southampton, GB, University of Southampton, 29pp.
http://eprints.soton.ac.uk/338816/
•  Responsible research data management and the prevention of scientific
misconduct www.knaw.nl/Content/Internet_KNAW/publicaties/pdf/2013449.pdf
•  http://dmconsult.library.virginia.edu/
Created by: Amy Nurnberger, 2015-05-12

Research Data Management: Part 1, Principles & Responsibilities

  • 1.
    Managing Research Data Part1 Planning Working Finalizing Sharing Data This work is licensed under a Creative Commons Attribution 4.0 International License. WHY – WHAT– WHO – WHEN & HOW
  • 2.
    WHY manage data- WHAT research data are- WHO manages research data - WHEN & HOW data management is done - Planning Working Finalizing Sharing Data Managing Research Data This work is licensed under a Creative Commons Attribution 4.0 International License.
  • 3.
    This two-part courseis a collaboration between CU Libraries/ Information Services and the Office of Research Compliance & Training. The purpose of this course is to familiarize you with the various aspects of research data management (RDM) by taking 3 Managing Research Data 44/ Managing Research Data This course will guide you through these areas, offering in-depth details on each of them. Please refer to the top navigation to keep track of which area you are currently exploring. •  Why RDM is both recommended and required •  What research data are •  Who is responsible for RDM •  When RDM activities occur •  How you can carry out RDM activities Part 1: Part 2:
  • 4.
    Learning objectives: At theend of this training you will be able to: •  Define & identify research data •  Understand the demands of responsible conduct of research with regard to research data management •  Understand the reasons behind the federal mandates of research data management 4 Managing Research Data 44/ Managing Research Data
  • 5.
    Links to manyof the references and policies referred to in this course can be found on the final slides. Have Fun! 5 Managing Research Data 44/ Managing Research Data
  • 6.
    Why should youcare about Research Data Management? WHY –WHAT – WHO – WHEN & HOW 644/ Managing Research Data
  • 7.
    WHY –WHAT –WHO – WHEN & HOW Managing research data: SAVES TIME Taking time to plan for your expected data, back them up, and document them in detail saves time otherwise lost in searching for, recovering, and deciphering data in the future SIMPLIFIES YOUR LIFE Managing your data, by adopting an organization scheme, developing a description standard, and creating a preservation plan avoids future confusion and turmoil INCREASES RESEARCH EFFICIENCY By saving time and avoiding confusion you will be more efficient! Manage your data for the future and you will be able to more easily find, access, understand, and use your data ENSURES RESEARCH INTEGRITY Good research data management makes it more feasible to fulfill the commitments of responsible research 744/ Managing Research Data
  • 8.
    Up to 80%of data lost within 20 years of publication: http://www.nature.com/news/scientists-losing-data-at-a-rapid-rate-1.14416 516 ecology papers published between 1991 and 2011 The chance of data being accessible fell by 17% per year Vines, T. H. et al. Curr. Biol. http://dx.doi.org/10.1016/j.cub. 2013.11.014 (2013) 8 WHY –WHAT – WHO – WHEN & HOW 44/ Managing Research Data
  • 9.
    When you engagein research at Columbia University you must: •  Be ethical in the conduct of the research •  Abide by regulations and policies •  Be responsible stewards of the research dollars and other resources •  Share the results of your research for the good of society Managing data is a critical responsibility for all researchers 9 WHY –WHAT – WHO – WHEN & HOW 44/ Managing Research Data
  • 10.
    •  Increases visibility • Facilitates discovery •  Satisfies funder & journal requirements •  Reinforces open scientific inquiry •  Establishes priority & enables citation •  Speeds research Adapted from: https://libraries.mit.edu/guides/subjects/data-management/ why.html & http://researchdata.wisc.edu/share-your-data/data-access-2/10 WHY –WHAT – WHO – WHEN & HOW Managing research data enables sharing, which: 44/ Managing Research Data
  • 11.
    Sharing enables breakthroughs thatlead to economic development: http://www.whitehouse.gov/sites/default/files/microsites/ostp/ ostp_public_access_memo_2013.pdf11 WHY –WHAT – WHO – WHEN & HOW “Scientific research supported by the Federal Government catalyzes innovative breakthroughs that drive our economy. The results of that research become the grist for new insights and are assets for progress in areas such as health, energy, the environment, agriculture, and national security.” 44/ Managing Research Data
  • 12.
    “…a research project's successis measured … also by the data it makes available to the wider community.” “It is obvious that making data widely available is an essential element of scientific research.” 12 WHY –WHAT – WHO – WHEN & HOW 44/ Managing Research Data
  • 13.
    The directive tomake federally funded research data openly accessible “is integrally tied to and supports the mission of higher education to produce, preserve, and share scholarship. It therefore provides the community with an opportunity to marshal its resources to improve the interoperability of research support systems and maximize the value of research funding.” Association of Research Libraries (ARL) on the Office of Science & Technology Policy memorandum “Increasing Access to the Results of Federally Funded Scientific Research” 13 WHY –WHAT – WHO – WHEN & HOW 44/ Managing Research Data
  • 14.
    •  Funders –  Federalagencies –  Foundations •  Journals 14 WHY –WHAT – WHO – WHEN & HOW 44/ Managing Research Data Data sharing is required by:
  • 15.
    National Science Foundation(NSF): https://www.nsf.gov/eng/general/dmp.jsp15 WHY –WHAT – WHO – WHEN & HOW 44/ Managing Research Data “Beginning January 18, 2011, proposals submitted to NSF must include a supplementary document of no more than two pages labeled "Data Management Plan" (DMP) . This supplementary document should describe how the proposal will conform to NSF policy on the dissemination and sharing of research results. Proposals that do not include a DMP will not be able to be submitted.”
  • 16.
    National Institutes ofHealth (NIH): 1 http://grants.nih.gov/grants/policy/data_sharing/ 2 http://grants.nih.gov/grants/guide/notice-files/NOT-OD-03-032.html16 WHY –WHAT – WHO – WHEN & HOW 44/ Managing Research Data “Data sharing is essential for expedited translation of research results into knowledge, products and procedures to improve human health.”1 “…all investigator-initiated applications with direct costs greater than $500,000 in any single year will be expected to address data sharing in their application”2
  • 17.
    “The Office ofScience and Technology Policy (OSTP) hereby directs each Federal agency with over $44 million in annual conduct of research and development expenditures to develop a plan to support increased public access to the results of research funded by the Federal Government.” “…digitally formatted scientific data resulting from unclassified research supported wholly or in part by Federal funding should be stored and publicly accessible to search, retrieve, and analyze.” (2013) http://www.whitehouse.gov/sites/default/files/microsites/ostp/ ostp_public_access_memo_2013.pdf17 WHY –WHAT – WHO – WHEN & HOW 44/ Managing Research Data …and not just them! More federal agencies will be requiring public access to data:
  • 18.
    http://www.nature.com/srep/policies/index.html http://http://www.aeaweb.org/aer/data.php Journal Sharing Policies: “Itis the policy of the American Economic Review to publish papers only if the data used in the analysis are clearly and precisely documented and are readily available to any researcher for purposes of replication. Authors of accepted papers that contain empirical work, simulations, or experimental work must provide to the Review, prior to publication, the data, programs, and other details of the computations sufficient to permit replication.” “…authors are required to make materials, data and associated protocols promptly available to readers.” 1844/ Managing Research Data WHY –WHAT – WHO – WHEN & HOW
  • 19.
    http://www.sciencemag.org/site/feature/contribinfo/prep/gen_info.xhtml#dataavail http://www.bmj.com/about-bmj/resources-authors/article-types/research “…trials of drugsand medical devices will be considered for publication only if the authors commit to making the relevant anonymised patient level data available on reasonable request” “All data necessary to understand, assess, and extend the conclusions of the manuscript must be available to any reader of Science. All computer codes involved in the creation or analysis of data must also be available...” 19 WHY –WHAT – WHO – WHEN & HOW 44/ Managing Research Data Journal Sharing Policies:
  • 20.
    Benefits of gooddata management & sharing practices: 20 WHY –WHAT – WHO – WHEN & HOW 44/ Managing Research Data •  Increase citations •  Avoid retractions (& potential misconduct questions) •  Advance knowledge •  Enable reproducibility
  • 21.
    Increase citations: Piwowar HA,Day RS, Fridsma DB (2007). Sharing Detailed Research Data Is Associated with Increased Citation Rate. PLoS ONE 2(3): e308. doi:10.1371/ journal.pone.0000308 21 WHY –WHAT – WHO – WHEN & HOW 44/ Managing Research Data Publicly available data was significantly associated with a 69% increase in citations, independent of journal impact factor, date of publication, and author country of origin using linear regression.
  • 22.
  • 23.
    Advance knowledge: http://www.sciencedaily.com/releases/2013/09/130903194155.htm23 WHY –WHAT– WHO – WHEN & HOW 44/ Managing Research Data “70 percent of published genetic sequence comparisons are not publicly accessible, leaving researchers worldwide unable to get to critical data they may need to tackle a host of problems ranging from climate change to disease control.”
  • 24.
    Enable reproducibility: Looking at238 recently published papers, pulled from five fields of biomedicine, a team of scientists found that just under 50 percent of the research materials, from lab mice to antibodies, used in the work could not be identified. This phenomenon impedes the ability of scientists to reproduce & extend published studies. Vasilevsky NA, Brush MH, Paddock H, Ponting L, Tripathy SJ et al. (2013) On the reproducibility of science: unique identification of research resources in the biomedical literature. PeerJ 1:e148 http://dx.doi.org/10.7717/peerj.148 24 WHY –WHAT – WHO – WHEN & HOW 44/ Managing Research Data
  • 25.
    TAKE-AWAYS 25 WHY –WHAT –WHO – WHEN & HOW 44/ Managing Research Data •  Researchers share the products of their research (e.g., publications, data) for the good of: –  Society –  Advancement of science –  Themselves •  Data management is required by: –  Funding bodies –  Institutions •  Data sharing is a requirement of: –  Funding bodies –  Publishers
  • 26.
    What are theresearch data that need to be managed? WHY –WHAT – WHO – WHEN & HOW 2644/ Managing Research Data
  • 27.
    WHY –WHAT –WHO – WHEN & HOW Defining Research Data: 1 Marieke Guy. http://www.slideshare.net/MariekeGuy/bridging-the-gap-between- researchers-and-research-data-management , #2 2 Queensland University of Technology. Manual of Procedures and Policies. Section 2.8.3. http://www.mopp.qut.edu.au/D/D_02_08.jsp 3 http://www.whitehouse.gov/omb/circulars_a110#362744/ Managing Research Data “…information created [or discovered] in the course of research”1 Material or information “on which an argument, theory, test or hypothesis, or another research output is based.” 2 “(i) Research data is defined as the recorded factual material commonly accepted in the [research] community as necessary to validate research findings…”3
  • 28.
    Data may becollected in many ways: 2844/ Managing Research Data WHY –WHAT – WHO – WHEN & HOW Through real time & unique observations, from repeatable experiments or simulations, or through derivations from unique collections of data, as a few examples. Data collection method costs and risks: Some data may be impossible to replace, some data may merely be very expensive replace. Alternatively, some data are so cheap and quick to acquire that it is less expensive to repeat the collection process than to store the data, e.g., some gene sequences.
  • 29.
    Data may be classified by collection method. Categories include:
    OBSERVATIONS – collected in real time / irreplaceable
    e.g. survey results, images, telemetry, sensor readings, some literary/historical sources, recordings
    EXPERIMENTS – reproducible / variable expense
    e.g. chromatograms, antenna mappings, word frequency
    SIMULATIONS – models & inputs used to create datasets
    e.g. economic models, climate models
    DERIVATIONS/COMPILATIONS – reproducible / expensive
    e.g. text or data mining, 3D models, compiled database
    RESEARCH PROCESS DATA – real time / irreplaceable
    e.g. survey instruments, data description/documentation, developed software, algorithms, code/script, instrument settings
  • 30.
    TRUE OR FALSE:
    In scientific research, only the information and observations that are collected as part of your research are considered data.
  • 31.
    FALSE
    Data are not only the information and observations made as part of scientific research but also the materials, the means, and the products of that research.
    Examples:
    •  Survey instruments
    •  Associated software
    •  Cell lines
    •  Specimens
  • 32.
    Information exists in different forms during the research process:
    RAW OR PRIMARY – Lab notebooks, observational notes, instrument readings, images, footage, individual survey responses, historical sources, textual analysis, etc.
    PROCESSED – Statistical analyses, sources organized as evidence, rich descriptions, aggregated survey responses, etc.
    PUBLISHED – Distribution in some finalized format to those outside of the project. Distribution may be static or dynamic (e.g. longitudinal data sets with annual reporting).
  • 33.
    The federal Office of Management and Budget offers the following explanation (summarized):
    “Research data means the recorded factual material commonly accepted in the scientific community as necessary to validate research findings”
    Exclusions:
    •  Preliminary analyses
    •  Drafts of scientific papers
    •  Plans for future research
    •  Peer reviews
    •  Communications with colleagues
    •  Trade secrets
    •  Commercial information
    •  Materials necessary to be held confidential by a researcher until they are published
    •  Information which is protected under law
    •  Personnel and medical information
    •  Information the disclosure of which would constitute a clearly unwarranted invasion of personal privacy, such as information that could be used to identify a particular person in a research study
    2 CFR § 200.315(e)(3), Intangible property
    http://federalregister.gov/a/2013-30465
  • 34.
    Sensitive data:
    Some data on research subjects may require special protections because they are highly sensitive and highly regulated. These sensitive data may require encryption and other security measures:
    •  Personal Health Information (PHI), e.g. insurance information, health conditions, etc.
    •  Personally Identifying Information (PII), e.g. financial information, social security numbers, etc.
    There are a number of university policies that govern handling information of these types. Special training is required for researchers and others handling PHI.
  • 35.
    Release of sensitive data can damage:
    -  Individuals whose data were released: identity theft, financial loss, privacy violations, etc.
    -  Research team members: loss of reputation, loss of position
    -  Research institution: financial liability
    UNIVERSITY RESOURCES:
    -  Office of HIPAA Compliance website
    -  HIPAA training
    -  IRB website and training
    -  Data Classification Policy
    -  Policy on Electronic Data Security Breach Reporting and Response
    -  Other IT Security Policies
  • 36.
    Take-aways:
    •  Definitions of data are varied
       –  Use the one(s) appropriate to your research community
    •  Some data are sensitive
       –  Know which data they are
       –  Know and take the proper precautions to protect these data
  • 37.
    Who is responsible for Research Data Management?
  • 38.
    Who is responsible for research data?
    The principal investigator (PI) is ultimately responsible for the data, and is the steward of the data (more on this later).
    It is incumbent upon every member of the research team to safeguard research products (more on this later, too).
  • 39.
    PI responsibilities:
    “The full administrative, fiscal and scientific responsibility for the management of a sponsored project resides with the principal investigator named in the award” (Faculty Handbook 2008)
    As with all aspects of a proposal submission, the PI must be involved with establishing and describing an appropriate data management plan, as required.
  • 40.
    PI responsibilities:
    The PI is responsible for the collection, management, maintenance and retention of research data accumulated during a research project. It is the PI’s responsibility to:
    •  Determine what records need to be retained to comply with sponsor requirements
    •  Adopt an orderly system of data organization
    •  Communicate the chosen system to all members of a research group & to the appropriate administrative personnel
    •  Establish & maintain procedures for protection of essential records in the event of a natural disaster or other emergency
    (Sponsored Projects Handbook)
  • 41.
    Research team member responsibilities:
    Everyone involved in the research project is responsible for adhering to the statements and requirements presented in the data management plan, and all other data management practices related to the research project. These may include practices of handling:
    •  Physical data, e.g., lab notebooks, samples, data documentation (aka metadata), etc.
    •  Electronic data, e.g., file naming conventions, generating metadata, keeping an e-lab notebook, data storage, data back-ups, annotating findings
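    As one illustration of the electronic data practices listed above, the following minimal Python sketch shows what a shared file naming convention and a small metadata record might look like. The convention itself (project_description_YYYYMMDD_vNN.ext), the function names, and the metadata fields are all hypothetical assumptions made for this example; your data management plan or research community may prescribe something different.

```python
import json
import re
from datetime import date

# Hypothetical convention (an assumption for illustration only):
#   <project>_<description>_<YYYYMMDD>_v<NN>.<ext>
NAME_PATTERN = re.compile(r"^[a-z0-9]+_[a-z0-9-]+_\d{8}_v\d{2}\.[a-z0-9]+$")


def slug(text):
    """Lowercase text and replace runs of other characters with a hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def build_filename(project, description, collected, version, ext):
    """Build a file name that follows the hypothetical convention above."""
    project_part = slug(project).replace("-", "")
    return (
        f"{project_part}_{slug(description)}_"
        f"{collected:%Y%m%d}_v{version:02d}.{ext.lower()}"
    )


def follows_convention(filename):
    """Return True if the file name matches the hypothetical convention."""
    return NAME_PATTERN.match(filename) is not None


def build_metadata(filename, creator, description, collected):
    """Return a small descriptive record to store alongside the data file."""
    return {
        "file": filename,
        "creator": creator,
        "description": description,
        "date_collected": collected.isoformat(),
    }


if __name__ == "__main__":
    name = build_filename("Soil Study", "pH sensor readings", date(2015, 5, 12), 1, "CSV")
    print(name)
    print(follows_convention("Final data (2).xlsx"))
    record = build_metadata(name, "A. Researcher", "Hourly pH readings, plot 3", date(2015, 5, 12))
    print(json.dumps(record, indent=2))
```

    A check like follows_convention could be run over a shared folder to flag files that drifted from the agreed scheme, and the metadata record could be saved as a text file next to the data it describes.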
  • 42.
    Take-aways:
    •  PI is responsible for all aspects of the grant, including data management
    •  All members of the research team are responsible for adhering to the data management plan
    Research data management can be complex, but there are resources available
    See Part 2 of this course for details on WHEN & HOW to practice Research Data Management
    → SEE NEXT PAGE!
  • 43.
    Resources for Research Data Management:
    •  Scholarly Communications Program, Data Management: http://scholcomm.columbia.edu/data-management/
    •  Research and Data Integrity Program (ReaDI): http://www.columbia.edu/cu/compliance/docs/ReaDI_Program/index.html
    •  Data Management Plan Templates: http://scholcomm.columbia.edu/data-management/data-management-plan-templates/
    •  CUIT Research Computing Services: http://rcs.columbia.edu
    •  Academic Commons Archival Storage: http://academiccommons.columbia.edu/about
    •  Citation Management: http://library.columbia.edu/research/citation-management.html
    •  Managing Secure Information - Training: http://columbia.sighttraining.com
    •  Data Security Policies: http://policylibrary.columbia.edu/category/computingtechnology
  • 44.
    REFERENCES
    •  Scott, Mark, Boardman, Richard P., Reed, Philippa A.S. and Cox, Simon J. (2012) Introducing research data. Southampton, GB, University of Southampton, 29pp. http://eprints.soton.ac.uk/338816/
    •  Responsible research data management and the prevention of scientific misconduct. www.knaw.nl/Content/Internet_KNAW/publicaties/pdf/2013449.pdf
    •  http://dmconsult.library.virginia.edu/
    Created by: Amy Nurnberger, 2015-05-12
    This work is licensed under a Creative Commons Attribution 4.0 International License.