Research Data Management
Nikesh Narayanan
IT Librarian, Zayed University, Dubai
What are research data?
ALL MANNER OF THINGS PRODUCED IN THE COURSE OF RESEARCH
What is Research Data?
Data that are collected, observed or created for purposes of analysis to produce original research results.
Types of Research Data
Instrument measurements
Experimental observations
Still images, video and audio
Text documents, spreadsheets, databases
Quantitative data (e.g. household survey data)
Survey results & interview transcripts
Simulation data, models & software
Slides, artefacts, specimens, samples
Sketches, diaries, lab notebooks …
What is Research Data Management?
It covers the planning, collection, organization, management, storage, security, backup, preservation and sharing of your data, and it ensures that research data are managed according to legal, statutory, ethical and funding body requirements (Whyte, A. & Tedds, J., 2011).
Why manage research data?
• Ensuring research integrity and reproducibility
• Increasing your research efficiency
• Ensuring research data and records are accurate, complete, authentic and reliable
• Saving time and resources in the long run
• Enhancing data security and minimizing the risk of data loss
• Preventing duplication of effort by enabling others to use your data
• Meeting funding body grant requirements (if applicable)
What is involved in RDM?
Data Management Planning
Creating data
Documenting data
Accessing / using data
Storage and backup
Sharing data
Research Data Management: Life Cycle
FAIR Principles (Findable, Accessible, Interoperable, Reusable)
Research Data Management: Stages
Data Management Plan (DMP)
• A data management plan (DMP) contains all the information related to managing the data for your project: what data, stored where and by whom, how it is looked after and when it is made public.
• Researchers need to write the plan in compliance with funder and institutional requirements.
• Various tools and best-practice guides are available to help in this process.
Funding Agency Requirements: Examples
DMP: Common Questions
Description of data to be collected/created (e.g. content, type, format, volume)
Standards/methodologies for data collection and management
Ethics and intellectual property (highlight any restrictions on data sharing, e.g. embargoes, confidentiality)
Plans for data sharing and access (how, when and to whom)
Strategy for long-term preservation
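To make the common questions concrete, here is a minimal Python sketch that treats them as the sections of a plan and reports which ones are still unanswered. The class and field names are hypothetical and chosen only for illustration; real funders and DMP tools use their own templates, wording and section order.

```python
from __future__ import annotations

from dataclasses import dataclass, fields


@dataclass
class DataManagementPlan:
    """Hypothetical skeleton mirroring the common DMP questions.

    Field names are illustrative; funders and DMP tools define their own
    templates and wording.
    """

    data_description: str = ""    # content, type, format, volume
    standards_methods: str = ""   # standards/methodologies for collection and management
    ethics_and_ip: str = ""       # restrictions such as embargoes or confidentiality
    sharing_and_access: str = ""  # how, when and to whom data will be shared
    preservation: str = ""        # long-term preservation strategy

    def missing_sections(self) -> list[str]:
        """Return the names of sections that are empty or whitespace only."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


# Usage: a partially drafted plan (the example text is made up).
draft = DataManagementPlan(
    data_description="Interview transcripts, plain text (UTF-8), about 200 files",
    sharing_and_access="Deposit anonymised transcripts in a repository after publication",
)
print(draft.missing_sections())
# ['standards_methods', 'ethics_and_ip', 'preservation']
```

A checklist like this only flags empty sections; judging whether an answer satisfies a particular funder still requires reading that funder's guidance.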
DMP tools
• DMPTool (https://dmptool.org/) is a free, open-source online application from the University of California Curation Center (UC3) of the California Digital Library. It helps researchers create data management plans.
• DMPonline (https://dmponline.dcc.ac.uk/), provided by the Digital Curation Centre (DCC), University of Edinburgh
• RDM Plan Template: University of Melbourne, Australia
How DMP tools help researchers
A variety of plans based on funder/institutional requirements
DMP templates
DMP guidelines
Best Practices in Research Data Management
• File organization & formats
• Metadata
• Dealing with sensitive data
• Data sharing
• Data citation
Guidelines for choosing formats
When selecting file formats for archiving, the formats should ideally be:
• Non-proprietary
• Unencrypted
• Uncompressed
• In common usage by the research community
• Interoperable among diverse platforms and applications
• Fully published and available royalty-free
• Fully and independently implementable by multiple software providers on multiple platforms without any intellectual property restrictions for necessary technology
• Developed and maintained by an open standards organization with a well-defined, inclusive process for evolution of the standard
Source: Stanford Libraries
Some preferred file formats
Containers: TAR, GZIP, ZIP
Databases: XML, CSV
Geospatial: SHP, DBF, GeoTIFF, NetCDF
Moving images: MOV, MPEG, AVI, MXF
Sounds: WAVE, AIFF, MP3, MXF
Statistics: ASCII, DTA, POR, SAS, SAV
Still images: TIFF, JPEG 2000, PDF, PNG, GIF, BMP
Tabular data: CSV
Text: XML, PDF/A, HTML, ASCII, UTF-8
Web archive: WARC
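As an illustration of the "Tabular data: CSV" recommendation, the sketch below saves records as UTF-8 encoded CSV using only Python's standard `csv` module and then reads them back to confirm nothing was lost. The survey records and file name are made up for illustration.

```python
from __future__ import annotations

import csv
import os
import tempfile


def write_csv(path: str, rows: list[dict], fieldnames: list[str]) -> None:
    """Write records to a UTF-8 CSV file, one column per field name.

    newline="" is what the csv module documentation asks for, so the
    writer controls line endings itself; fields containing commas,
    quotes or newlines are quoted automatically.
    """
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


def read_csv(path: str) -> list[dict]:
    """Read the file back; every value comes back as a string."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


# Usage: hypothetical survey records, written to a temporary directory.
records = [
    {"respondent": "R001", "site": "Dubai", "comment": "Prefers evenings, not mornings"},
    {"respondent": "R002", "site": "Abu Dhabi", "comment": "Café staff, answered in Arabic"},
]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "survey.csv")
    write_csv(path, records, ["respondent", "site", "comment"])
    round_trip = read_csv(path)

print(round_trip == records)  # True
```

Because CSV stores every value as text, column meanings, units and codes should be documented separately, for example in the metadata that accompanies the file.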
These sites provide a detailed discussion of file formats:
• ANDS: File format guide
• Stanford Libraries: Best practice for file formats
• University of Leicester: File formats and software
Metadata
• What is metadata?
Metadata is defined as "structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource." Metadata is often called data about data or information about information.
Metadata type | Description | Example properties
Descriptive metadata | Common fields that help users discover online sources through searching and browsing | Title, author, subject, genre, publication date
Technical metadata | Fields that describe the information required to access the data | File type, file size, creation date/time, compression scheme
Metadata standards/schemas may vary from discipline to discipline. Dublin Core is one of the most commonly used generic metadata standards.
Discipline-specific metadata: examples
• Agricultural Metadata Element Set (AgMES)
• Astronomy Visualization Metadata (AVM) standard
• Access to Biological Collection Data (ABCD)
• Institute of Electrical and Electronics Engineers (IEEE) Learning Object Metadata (LOM) standard
• More examples are available in the Texas Tech University Libraries guide: https://guides.library.ttu.edu/c.php?g=765394&p=5697292
Sensitive data
Sensitive data is information that must be protected against unwarranted disclosure. It includes, but is not limited to, personal data, proprietary data and other restricted or confidential data that should be protected from unauthorised access.
Sharing Sensitive Information: Important Points
• Including provision for data sharing when gaining informed consent
• Protecting people's identities by anonymising data where needed
• Considering controlled access to data
• Applying an appropriate licence
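One common technical step toward protecting identities is pseudonymisation: direct identifiers are dropped or replaced with codes before data are shared. The Python sketch below uses a keyed hash (HMAC-SHA256 from the standard library) so that the same person always receives the same code. This is a swapped-in illustration, not a complete anonymisation method. Pseudonymised data can still be re-identified through indirect identifiers such as age or location, and under frameworks such as the EU GDPR it still counts as personal data. The field names, records and key handling shown are all assumptions.

```python
from __future__ import annotations

import hashlib
import hmac
import secrets


def pseudonymise(identifier: str, key: bytes, length: int = 12) -> str:
    """Map a direct identifier to a repeatable pseudonym using HMAC-SHA256.

    The same identifier and key always give the same pseudonym, so records
    can still be linked across files. Without the key, someone holding a
    list of likely names cannot simply recompute the pseudonyms.
    """
    digest = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()
    return "P-" + digest[:length]


def pseudonymise_records(
    records: list[dict], id_field: str, drop_fields: set[str], key: bytes
) -> list[dict]:
    """Drop the listed direct identifiers and replace id_field with a pseudonym."""
    shared = []
    for record in records:
        cleaned = {k: v for k, v in record.items() if k not in drop_fields}
        cleaned[id_field] = pseudonymise(str(record[id_field]), key)
        shared.append(cleaned)
    return shared


# Usage: made-up participant data. Keep the key apart from the shared files.
key = secrets.token_bytes(32)
participants = [
    {"name": "Jane Doe", "email": "jane@example.org", "age_band": "30-39", "answer": "Yes"},
    {"name": "John Roe", "email": "john@example.org", "age_band": "40-49", "answer": "No"},
]
shared = pseudonymise_records(participants, id_field="name", drop_fields={"email"}, key=key)
print(shared[0])  # name replaced by a "P-" code; email removed
```

Whoever holds the key can regenerate the mapping, so the key should be stored and access-controlled like the original identifiers.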
Data Sharing
Benefits:
• Avoiding duplication
• Scientific integrity
• More collaboration
• Better research
• Increased citation
Drivers:
• Public expectations
• Government agenda
• Institutional agenda
Data Sharing: Important Points
• Institutional Policies: Researchers should be aware of their institution's policies on data sharing.
• Funder Policies: Researchers should be aware of any funder policies that may stipulate how data must be disseminated and shared, and any restrictions on doing so.
• Research Collaboration Agreement: Researchers should agree on how, when and by whom the data will be accessed, used and disseminated in the future, if appropriate.
• Usage of Extant Proprietary Data: Researchers should seek permission from the data owner or producer before sharing the original or derived data, if appropriate.
• Re-use of Others' Data: If the research data were collected by someone else, researchers should give credit to the data producers with a proper data citation rather than sharing the data themselves.
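A data citation normally includes the elements DataCite recommends: creator, publication year, title, version, publisher (often the repository) and a persistent identifier such as a DOI. The Python sketch below assembles those elements into one string. The exact punctuation and ordering are assumptions, so a journal's or style guide's own format should take precedence. All names and the DOI in the usage example are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DatasetCitation:
    creators: list[str]          # "Family, Initials" form assumed
    year: int                    # publication year of the dataset
    title: str
    publisher: str               # usually the repository holding the data
    doi: str                     # bare DOI such as "10.1234/..." (hypothetical)
    version: str | None = None

    def to_text(self) -> str:
        """Build: Creators (Year). Title. Version. Publisher. DOI link."""
        parts = [f"{'; '.join(self.creators)} ({self.year}). {self.title}."]
        if self.version:
            parts.append(f"Version {self.version}.")
        parts.append(f"{self.publisher}.")
        parts.append(f"https://doi.org/{self.doi}")
        return " ".join(parts)


# Usage: every value below is made up for illustration.
citation = DatasetCitation(
    creators=["Researcher, A.", "Analyst, B."],
    year=2021,
    title="Example household survey data",
    publisher="Example Repository",
    doi="10.1234/example-dataset",
    version="1.0",
)
print(citation.to_text())
# Researcher, A.; Analyst, B. (2021). Example household survey data. Version 1.0. Example Repository. https://doi.org/10.1234/example-dataset
```

Including the version matters because repositories such as those listed below can hold several versions of the same dataset, each with its own identifier.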
Data Repositories: Directories
• re3data: Registry of data repositories
• FAIRsharing.org: Catalogue of databases and related resources
• DataCite: Database of datasets and repositories
• European Union Open Data Portal: Catalogue of open datasets
• Data Citation Index (DCI): Database of datasets (accessed through Web of Science)
• EMBL-EBI: Database of repositories and other resources
• Google Dataset Search
Data Repositories: General
• Harvard Dataverse: run by Harvard University
• Dryad Digital Repository: A broad life-sciences and medicine repository for data underlying publications
• Figshare: Provides limited free storage space for research data from various disciplines
• Mendeley Data: An open research data repository by Elsevier, where researchers can store and share their research data
• Zenodo: A repository for research outputs from all fields of science
• CKAN (https://ckan.org/): Open-source software for building data portals
Subject-specific repositories
• Chemistry
   • Biological Magnetic Resonance Data Bank
   • Cambridge Structural Database (CSD)
   • ChemSpider
   • ChemSynthesis
   • Crystallography Open Database
   • PubChem
• Computer Science
   • CodePlex Archive
   • Cooperative Association for Internet Data Analysis (CAIDA)
   • GitHub
   • Launchpad
   • SourceForge
• Earth and Environmental Science
   • Climate Change Knowledge Portal
   • National Centers for Environmental Information (NCEI)
   • National Ecological Observatory Network (NEON)
   • National Snow and Ice Data Center (NSIDC)
• Geoscience
   • Geospatial at Data.gov
   • Marine Geoscience Data System (MGDS)
   • NASA's Earthdata
   • National Geospatial Digital Archive (NGDA)
• Biology and Life Sciences
   • The Cell Image Library
   • Plant Expression Database (PLEXdb)
   • Universal Protein Resource (UniProt)
   • Worldwide Protein Data Bank (wwPDB)
• Humanities
   • Archaeology Data Service (ADS)
   • Cultural Policy and the Arts National Data Archive (CPANDA)
   • National Archive of Data on Arts and Culture (NADAC)
   • TextGrid
   • The Digital Archaeological Record (tDAR)
   • Open Context
• Physics, Astrophysics and Astronomy
   • HEPData
   • National Nuclear Data Center (NNDC)
   • NIST Atomic Spectra Database
   • NoMaD Repository
   • UK Solar System Data Centre (UKSSDC)
• Social Sciences
   • Australian Data Archive
   • Inter-university Consortium for Political and Social Research (ICPSR)
   • Qualitative Data Repository (QDR)
   • UK Data Archive
How libraries can engage in RDM
defining the institutional strategy
developing RDM policy
delivering training courses
helping researchers to write DMPs
advising on data sharing and citation
setting up data repositories
Why should libraries support RDM?
Existing leadership roles in data and open access
Often run publication repositories
Good relationships with researchers
Proven liaison and negotiation skills
Knowledge of information management, metadata, etc.
A highly relevant skill set
Possible Library RDM Roles
Leading on (institutional) data policy
Bringing data into undergraduate research-based learning
Teaching data literacy to postgraduate students
Developing researcher data awareness
Providing advice, e.g. on writing DMPs
Explaining the impact of sharing data, and how to cite data
Developing and managing access to data collections
Documenting what datasets an institution has
Promoting data reuse by making known what is available
Potential Challenges
How deep is our understanding of research, especially scientific research, and how strong is our subject knowledge?
Translating library practices to research data issues
Will researchers look to libraries for this support?
Infrastructure still needs to be resourced and developed
Thank you
