4. Data: Some Definitions
Digital Curation Centre (UK): “Data, any information in binary digital form, is at
the centre of the Curation Lifecycle.”
Office of Management and Budget: “Research data means the recorded factual
material commonly accepted in the scientific community as necessary to
validate research findings.”
The Oxford English Dictionary (OED) defines “data” as:
Related items of (chiefly numerical) information considered collectively,
typically obtained by scientific work and used for reference, analysis, or
calculation.
Data can take both analogue and digital forms.
5. Data in the Sciences and Humanities
BICEP2, South Pole telescope (BICEP2 Collaboration, 2014)
Performativity, Place, Space (Burgess and Hamming, 2011)
6. Every discipline has data!
Types of data include:
• observational data
• laboratory experimental data
• computer simulation
• textual analysis
• physical artifacts or relics
Examples of data include:
• Audio and video files
• Code or scripts
• Digital text
• Lab notebooks
• Geospatial images
• Instrumental data
• Photographs
• Rock samples
• Survey results
• Scanned documents
• Spreadsheets
• Video games
https://www.flickr.com/photos/23165290@N00/9338136777/ (CC BY-SA 2.0)
7. Federal Funding Agency Requirements
https://www.flickr.com/photos/pdenker/2556591663/ (CC BY 2.0)
8. Brief History of Data Sharing Requirements
• February 26, 2003: NIH requires a data sharing plan for applications requesting $500,000 or more in direct costs in any single year.
• January 18, 2011: NSF requires Data Management Plans (DMPs) to be submitted with all new grant proposals.
• February 22, 2013: The White House Office of Science and Technology Policy (OSTP) issues a memo on public access to the results of federally funded research.
http://www.whitehouse.gov/sites/default/files/microsites/ostp/ostp_public_access_memo_2013.pdf
• March 24, 2014: OSTP issues a follow-up memo.
http://www.whitehouse.gov/sites/default/files/microsites/ostp/OpenAccess_March-2014.pdf
• July 24, 2014: DOE releases its Public Access Plan for article and data sharing.
• November 13, 2014: OSTP issues a progress update on policies to increase public access to the results of federally funded scientific research.
http://www.whitehouse.gov/sites/default/files/microsites/ostp/public_access_report_to_congress_ostp_11.13.14.pdf
• 2015: 16 agencies and departments have released their responses.
9. Agencies Responding to the OSTP Memo
Agency for Healthcare Research and Quality (AHRQ)
HHS Office of the Assistant Secretary for Preparedness and Response (ASPR)
Centers for Disease Control and Prevention (CDC)
Department of Commerce (DOC)
Department of Defense (DOD)
Department of Energy (DOE)
Department of the Interior (DOI)
Department of Health and Human Services (HHS)
Department of Homeland Security (DHS)
Department of Transportation (DOT)
Department of Education (ED)
Environmental Protection Agency (EPA)
Food and Drug Administration (FDA)
National Aeronautics and Space Administration (NASA)
National Institutes of Health (NIH)
National Institute of Standards and Technology (NIST)
National Oceanic and Atmospheric Administration (NOAA)
National Science Foundation (NSF)
Office of the Director of National Intelligence (ODNI)
Smithsonian Institution (SI)
United States Agency for International Development (USAID)
United States Department of Agriculture (USDA)
United States Department of Veterans Affairs (VA)
10. Agency Responses Summary: Articles
AGENCIES USING PUBMED CENTRAL
Agency for Healthcare Research and Quality (AHRQ)
HHS Office of the Assistant Secretary for Preparedness and Response (ASPR)
Centers for Disease Control and Prevention (CDC)
Food and Drug Administration (FDA)
National Aeronautics and Space Administration (NASA)
National Institutes of Health (NIH)
National Institute of Standards and Technology (NIST)
United States Department of Veterans Affairs (VA)
AGENCIES USING DOE’S PAGES (Public Access Gateway for Energy & Science)
Department of Energy (DOE)
National Science Foundation (NSF)
AGENCIES WITH OWN REPOSITORIES
Department of Defense (DOD): Defense Technical Information Center (DTIC)
National Oceanic and Atmospheric Administration (NOAA)
United States Department of Agriculture (USDA): USDA public access archive system
OTHER (TBD)
Department of Transportation (DOT)
United States Agency for International Development (USAID)
United States Geological Survey (USGS)
11. Agency Responses Summary
Time Frame for Depositing Data in a Publicly Accessible Repository
At time of article publication
Agency for Healthcare Research and Quality (AHRQ)
Department of Energy (DOE)
Food and Drug Administration (FDA)
National Institutes of Health (NIH)
National Institute of Standards and Technology (NIST)
National Science Foundation (NSF) (exploring this option)
United States Agency for International Development (USAID)
With article publication or within 30 months of collection
HHS Office of the Assistant Secretary for Preparedness and Response (ASPR)
Centers for Disease Control and Prevention (CDC)
With article publication or within 1 year of collection
National Oceanic and Atmospheric Administration (NOAA)
At time of publication or within a reasonable time period after publication
National Aeronautics and Space Administration (NASA)
Within a reasonable time
Department of Defense (DOD): Defense Technical Information Center (DTIC)
Not specified
United States Department of Veterans Affairs (VA)
United States Department of Agriculture (USDA)
Department of Transportation (DOT)
United States Geological Survey (USGS)
12. Journal Requirements
PLOS journals require authors to make all data underlying the findings
described in their manuscript fully available without restriction, with rare
exception.
13. Why do funders and the broader science community want to share and preserve data?
https://www.flickr.com/photos/joyvanb/11111295964/ (CC BY-NC-ND 2.0)
18. Benefits of Sharing Data
• Clearly documents and provides evidence for research, in conjunction with published results.
• Helps meet copyright and ethical requirements (e.g., HIPAA).
• Increases the impact of research through data citation.
• Preserves data for long-term access and prevents data loss.
• Describes and shares data with others to enable new discoveries and research.
• Prevents duplication of research.
• Accelerates the pace of research.
• Promotes reproducibility of research.
19. Recognition
From the NSF Grant Proposal Guide (NSF 13-1), summary of changes:
Chapter II.C.2.f(i)(c), Biographical Sketch(es), has been revised to rename the
“Publications” section to “Products” and amend terminology and instructions
accordingly. This change makes clear that products may include, but are not
limited to, publications, data sets, software, patents, and copyrights.
20. Data Management
• Managing data effectively across the data lifecycle is critical to the success of a research project.
– Make a data management plan.
• Data management refers to all aspects of creating, housing, delivering, maintaining, archiving, and preserving data.
• It is an essential area of the responsible conduct of research.
• All subject areas (humanities, social sciences, and hard sciences) engage with data in many formats.
• Without documentation and management, the potential uses of data are limited.
21. Common Data Lifecycle Stages
From: Fary, Michael and Owen, Kim, “Developing an Institutional Research Data Management Plan Service,” EDUCAUSE ACTI white paper, January 2013, http://net.educause.edu/ir/library/pdf/ACTI1301.pdf
22. Aspects of Research Data Management
• DMPs/Planning
• Storage & backup
• File organization & naming
• Documentation & metadata
• Legal/ethical considerations
• Sharing & reuse
• Preservation & Archiving
24. Points to address in your Data Management Plan (DMP)
• Types of data to be produced.
• Standards or descriptions that would be used with the data (metadata).
• How these data will be accessed and shared.
• Policies and provisions for data sharing and reuse.
• Provisions for archiving and preservation.
https://flickr.com/photos/inl/5097547405 (CC BY 2.0)
30. Metadata
• Commonly defined as “data about data”
• Information that describes the data
• When talking to faculty, avoid library jargon such as “metadata”; it can confuse researchers.
https://www.flickr.com/photos/musebrarian/3289649684/ (CC BY-NC-SA 2.0)
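To make “data about data” concrete, here is a minimal Python sketch of a descriptive record kept next to a dataset. The keys borrow Dublin Core element names (title, creator, subject, description, date); the function name `make_metadata`, the example values, and the choice of JSON are illustrative assumptions, not a standard required by any funder or repository.

```python
import json
from datetime import date


def make_metadata(title: str, creator: str, description: str,
                  keywords: list[str], created: date) -> dict:
    """Return a simple descriptive record ("data about data") for a dataset.

    The keys borrow Dublin Core element names; this field set is an
    illustrative assumption, not a required schema.
    """
    return {
        "title": title,
        "creator": creator,
        "subject": keywords,
        "description": description,
        "date": created.isoformat(),  # ISO 8601 date, e.g. "2015-05-01"
    }


# Hypothetical dataset; in practice, save this record alongside the data
# file it describes (for example, as a JSON "sidecar" file).
record = make_metadata(
    title="Blood pressure by age (pilot)",
    creator="Example Lab",
    description="Systolic and diastolic readings with participant age.",
    keywords=["blood pressure", "age"],
    created=date(2015, 5, 1),
)
print(json.dumps(record, indent=2))
```

Even this small record answers the questions a later user (or the original researcher, years on) will ask first: what the data is, who made it, and when.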
31. Some good data practices
File organization and naming
• Label and define the content of your data files in a systematic way
• Use descriptive file names
– For example, name files for their content (age, blood pressure, etc.), not with cryptic acronyms like FIAGC (“Fluffy is a great cat”)
• Use consistent date formatting (e.g., YYMMDD)
• Keep file names short (no more than 25 characters)
• Don’t use special characters
• Use underscores instead of blank spaces
• Keep track of versions
• Don’t use confusing labels (e.g., Pete’s data, final, final2, really final, really really final)
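The naming rules above can be combined into a small helper. This is a sketch assuming a hypothetical pattern `YYMMDD_description_vNN.ext`; `make_filename` is an illustrative name, and applying the 25-character limit to the name without its extension is an assumption, since the slide does not say whether the extension counts.

```python
import re
from datetime import date

MAX_STEM_LEN = 25  # slide guideline; assumed here to exclude the extension


def make_filename(description: str, when: date, version: int, ext: str) -> str:
    """Build a name as YYMMDD_description_vNN.ext (a hypothetical pattern)."""
    # Lowercase, then turn runs of blank spaces into single underscores.
    slug = re.sub(r"\s+", "_", description.strip().lower())
    # Drop special characters: keep only letters, digits, and underscores.
    slug = re.sub(r"[^a-z0-9_]", "", slug)
    # Date first (YYMMDD, as on the slide) so names sort by date;
    # an explicit version number replaces labels like "final2".
    stem = f"{when.strftime('%y%m%d')}_{slug}_v{version:02d}"
    if len(stem) > MAX_STEM_LEN:
        raise ValueError(f"{stem!r} has {len(stem)} characters; limit is {MAX_STEM_LEN}")
    return f"{stem}.{ext}"


print(make_filename("blood pressure", date(2015, 5, 1), 2, "csv"))
# prints 150501_blood_pressure_v02.csv
```

The slide's YYMMDD format is kept here; a four-digit year (YYYYMMDD, `%Y%m%d`) would avoid ambiguity across centuries at the cost of two characters.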
35. Toy Story 2
How Toy Story 2 Almost Got Deleted: Stories From Pixar Animation: ENTV
https://www.youtube.com/watch?v=8dhp_20j0Ys
36. Storage, backup, and securing data
• Have at least 3 copies of your data
• Avoid using your personal computer, USB sticks, or CDs if you can
– They break, get lost, and lose data over time
• Use an external hard drive if you can
• Use cloud storage if you can (but be careful with sensitive data)
• Northwestern has a subscription to Box.net for faculty, staff, and graduate students
– See http://www.it.northwestern.edu/file-sharing/box.html
flickr.com/photos/s_w_ellis/3877534599 (CC BY 2.0)
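As a sketch of the “at least 3 copies” rule, the Python below copies a file to two more locations and compares SHA-256 checksums, so a copy that has silently changed can be detected. The function names, directory names, and the use of SHA-256 are assumptions for illustration; in practice the copies would live on separate devices or services (for example, an external drive and Box), not in one temporary folder.

```python
import hashlib
import shutil
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading 1 MiB at a time."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replicate(source: Path, destinations: list[Path]) -> dict[Path, bool]:
    """Copy `source` into each destination directory and verify every copy.

    Returns {copy_path: True if the copy's checksum matches the source}.
    """
    expected = file_sha256(source)
    results: dict[Path, bool] = {}
    for dest_dir in destinations:
        dest_dir.mkdir(parents=True, exist_ok=True)
        # copy2 copies file contents plus metadata such as modification time.
        copy = Path(shutil.copy2(source, dest_dir))
        results[copy] = file_sha256(copy) == expected
    return results


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        original = root / "150501_bp_v01.csv"
        original.write_bytes(b"age,systolic\n42,120\n")
        # Original + two copies = three copies. The directory names are
        # hypothetical stand-ins for an external drive and a cloud-synced folder.
        report = replicate(original, [root / "external_drive", root / "cloud_sync"])
        for copy, ok in report.items():
            print(copy.parent.name, "OK" if ok else "MISMATCH")
```

Recording the checksum once and re-running `file_sha256` on each copy later is a simple way to check that stored data has not degraded over time.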
37. Preservation and Sharing of Data
• Some options for preserving and sharing data:
– Self-archive
– Institutional repository
– Open data repository
– National or international data archive or repository
By Florian Hirzinger - www.fh-ap.com (Own work (Florian Hirzinger)) [CC BY-SA 3.0 (http://creativecommons.org/licenses/by-sa/3.0) or GFDL (http://www.gnu.org/copyleft/fdl.html)], via Wikimedia Commons
38. Northwestern Libraries
• Stewardship, institutional memory
• Long tradition of broad subject expertise, with liaisons to and in every discipline
• Potential data services:
– finding data
– licensing data
– depositing data
– software for working with data
– assistance/support with DMPs
– training
– metadata assistance
– outreach
42. Considerations for the medical campus
• All human subjects data is subject to IRB approval
– Implications for researchers’ knowledge of data management plans
– Researchers need exposure to, and awareness of, the new NIH sharing plan
43. Resources at the CDSI
http://www.nucats.northwestern.edu/centers-programs/cdsi
44. Resources at the CDSI
REDCap secure survey platform
• REDCap
– http://www.nucats.northwestern.edu/resources-services/data-informatics-services/software-tools/redcap
• REDCap (Research Electronic Data Capture) is a secure, web-based application for building and managing online data capture for research studies
45. Precision medicine
• Precision medicine is the #1 priority for DJ
Patil, Chief Data Scientist and Deputy Chief
Technology Officer for Data Policy at the
White House in the Office of Science and
Technology Policy
– Source: NSF Data Science webinar with DJ Patil, May 1, 2015
46. Resources at the CDSI – i2b2
Informatics for Integrating Biology & the Bedside
i2b2 at NUCATS
47. Finding partners
• Get to know your department’s grant officers in the Office for Sponsored Research (OSR): http://osr.northwestern.edu/?src=or-hdr
48. Finding partners
• NUIT Research Computing
– http://www.it.northwestern.edu/research/
– Seminars & events
– Visualization and consultation services
• Sometimes knowing the resources means
knowing where to refer the user
49. Preparing to meet a researcher
• Know their work
– Read their papers, or at least scan them
– This helps you to ask meaningful questions about
their data
– It also helps build rapport
• Go to their seminars or department meetings
• As mentioned earlier, avoid library jargon
– Ask the user to explain or describe their data
50. RESOURCES:
Northwestern University Library Data Management LibGuide:
http://libguides.northwestern.edu/datamanagement
DMPTool: https://dmptool.org/
Northwestern University's Research Data: Ownership, Retention and Access Policy:
http://www.research.northwestern.edu/policies/documents/research_data.pdf
Cunera Buys, e-Science Librarian: c-buys@northwestern.edu