Introduction to Research Data Management - 2014-02-26 - Mathematical, Physical and Life Sciences Division, University of Oxford

Introduction to research
data management

Slides provided by the Research Support
Team, IT Services, University of Oxford

WHAT IS RESEARCH DATA
MANAGEMENT?
data management

What is data?
“A reinterpretable representation of information in a formalized
manner suitable for communication, interpretation, or processing.”
Digital Curation Centre

Slide adapted from
the PrePARe Project


What is data?

Any information you use in your
research

Slide adapted from
the PrePARe Project


Introductions


• What sort of data do you use?
• Where does it come from?
  • Are you creating new data?
  • Are you working with pre-existing data?
• Where is your data stored?

What is data management?


• A general term covering how you organize, structure, store, and care for the information used or generated during a research project
  • How you deal with information on a day-to-day basis over the lifetime of a project
  • What happens to data in the longer term – what you do with it after the project concludes

Carrots and sticks


Carrots:
• Work efficiently and with minimum hassle now
• Save time and avoid problems in the future
• Make it easy to share your data

Sticks:
• University of Oxford Policy on the Management of Research Data and Records
• Funding body requirements

University of Oxford policy

Introduced July 2012







• The full policy can be viewed on the University of Oxford Research Data Management website
• Research data defined as the information needed 'to support or validate a research project's observations, findings or outputs'
• Research data should be:
  • Accurate, complete, identifiable, retrievable, and securely stored
  • Able to be made available to others



• Research data should be retained for 'as long as they are of continuing value to the researcher and the wider research community' – but a minimum of three years
  • Specific requirements from funders take precedence
• Researchers are responsible for:
  • Planning for the ongoing custodianship of their data
  • Developing and documenting clear data management procedures
  • Ensuring that legal, ethical, and funding body requirements are met
• Policy applies to University staff and doctoral students
  • Depositing relevant research data may ultimately become a condition of award for doctorates

Funders' requirements

• Funding bodies are taking an increasing interest in what happens to research data
• You may be required to make your data publicly available at the end of a project
  • Check the small print in your grant conditions
• Many funders require a data management plan as part of grant applications
• Oxford's RDM website provides a summary of requirements

DAY-TO-DAY DATA
MANAGEMENT

Can you find what you
need, when you need it?

'What a mess' by .pst, via Flickr: http://www.flickr.com/photos/psteichen/3915657914/

Hierarchical systems vs. tagging


• Hierarchical organization uses nested folders
  • Default option for most operating systems
• Tagging allows more flexibility
  • Items can be in multiple categories
  • Some operating systems support tagging
  • File tagging software is also available
• Sort… or search?

Adding tags in Windows 7


Hyperlinks and shortcuts




• Hyperlinks are not just for websites – they can also lead to other files on your computer
• Use shortcuts to avoid duplicating files
  • Create project folders as an easy way to access related material

File naming


• Aim for concise but informative names
  • Ideally, you should be able to tell what's in a file without opening it
• Think about the ordering of elements within a filename
  • YYYY-MM-DD dates allow chronological sorting
  • You can force an order by adding a number at the beginning of the name
• Consider including version information

File naming strategies – examples


Order by date:
  2012-12-15_analysis_JARID1A.xlsx
  2012-12-15_raw-data_JARID1A.txt
  2013-04-12_analysis_ASPH.xlsx
  2013-04-12_raw-data_ASPH.txt

Order by type:
  Analysis_ASPH_2012-12-15.xlsx
  Analysis_JARID1A_2013-04-12.xlsx
  Raw-data_ASPH_2012-12-15.txt
  Raw-data_JARID1A_2013-04-12.txt

Order by subject:
  ASPH_analysis_2012-12-15.xlsx
  ASPH_raw-data_2012-12-15.txt
  JARID1A_analysis_2013-04-12.xlsx
  JARID1A_raw-data_2013-04-12.txt

Forced order with numbering:
  01_JARID1A_raw-data_2013-04-12.txt
  02_JARID1A_analysis_2013-04-12.xlsx
  03_ASPH_raw-data_2012-12-15.txt
  04_ASPH_analysis_2012-12-15.xlsx
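Names like those in the file naming examples can be generated in code rather than typed by hand, which helps keep them consistent. The following is a minimal Python sketch, not a prescribed tool: the helpers `build_filename` and `number_files` and the `ORDERS` table are hypothetical names introduced for illustration, and the subject and type labels are borrowed from the examples.

```python
from datetime import date

# Hypothetical lookup for illustration: each strategy is just a different
# ordering of the same three name elements.
ORDERS = {
    "date": ("date", "type", "subject"),
    "type": ("type", "subject", "date"),
    "subject": ("subject", "type", "date"),
}


def build_filename(subject: str, content_type: str, created: date,
                   extension: str, order: str = "date") -> str:
    """Join the name elements with underscores in the chosen order."""
    elements = {
        "date": created.isoformat(),  # YYYY-MM-DD, so names sort chronologically
        "type": content_type,
        "subject": subject,
    }
    stem = "_".join(elements[key] for key in ORDERS[order])
    return f"{stem}.{extension}"


def number_files(names: list[str]) -> list[str]:
    """Force an explicit order by prefixing zero-padded sequence numbers."""
    return [f"{index:02d}_{name}" for index, name in enumerate(names, start=1)]


names = [
    build_filename("ASPH", "raw-data", date(2013, 4, 12), "txt"),
    build_filename("JARID1A", "analysis", date(2012, 12, 15), "xlsx"),
]
print(sorted(names))        # a plain alphabetical sort is also chronological
print(number_files(names))  # numbering overrides the alphabetical order
```

Zero-padding the sequence number (01, 02, …) matters once there are ten or more files, because otherwise 10 would sort before 2.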

File naming strategies – examples

In retrospect I am not very happy with the method I
used for naming files. The biggest problem was with
the newspaper articles I downloaded… I named the
files only based on the topic of the article, without
mentioning the name of the periodical and the year
of publication, which would have been very useful
later, when I began writing the thesis.
– Doctoral student researching communication history


Are you using the right tools for the job?


• Take time to assess whether your current software and methods are meeting your needs
• Sticking with old familiars can be false economy
• Ask friends and colleagues for recommendations

Research Skills Toolkit


• Website and hands-on workshops
• A guide to software, University services, and other tools and resources for research
• Requires SSO login

http://www.skillstoolkit.ox.ac.uk/

IT Learning Programme


• Over 200 different IT courses
• Covering software, skills, and new technologies
  http://www.oucs.ox.ac.uk/itlp/
• ITLP Portfolio offers course materials and other resources
  http://portfolio.it.ox.ac.uk/

ORDS – Online Research Database Service

• Specifically designed for academic research data
• Cloud-hosted and automatically backed up
• Web interface makes collaboration straightforward
• If desired, databases can easily be made public
• Designed to permit easy archiving
• Currently being used by a small group of test users – will become more widely available later in 2014
• http://ords.ox.ac.uk/

KEEPING YOUR DATA SAFE

Backing up is
easier than
replacing
lost data…

http://blogs.ch.cam.ac.uk/pmr/2011/08/01/why-you-need-a-data-management-plan/

Slide adapted from
the PrePARe Project


Make multiple copies…

…and keep them in different places

Automate the
process if you can
Slide adapted from
the PrePARe Project
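
Automating the copy step can be as simple as a short script run on a schedule. The following is a minimal Python sketch under stated assumptions: `back_up` and `sha256_of` are hypothetical helper names, the destination folders stand in for whatever separate locations you use (for example an external drive and a network share), and a SHA-256 checksum is used here as one way to confirm that each copy matches the original. It is an illustration, not a substitute for a managed back-up service.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source: Path, destinations: list[Path]) -> list[Path]:
    """Copy `source` into every destination folder and verify each copy."""
    expected = sha256_of(source)
    copies = []
    for folder in destinations:
        folder.mkdir(parents=True, exist_ok=True)
        target = folder / source.name
        shutil.copy2(source, target)  # copy2 also tries to preserve timestamps
        if sha256_of(target) != expected:
            raise OSError(f"Checksum mismatch for {target}")
        copies.append(target)
    return copies
```

A call such as `back_up(Path("results.xlsx"), [Path("copy_a"), Path("copy_b")])` (placeholder paths) would leave one verified copy in each folder; scheduling the script is left to the operating system's own task scheduler.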


Example back-up plan







• Raw data from instruments are stored on the instrument PC, which is backed up every couple of months to DVDs
• Much raw data also transferred to desktop computers – usually stored on external hard drives
• Analysed data (e.g. Excel spreadsheets and PowerPoint files) are stored in a shared folder on a departmental server which is backed up daily
• Lab books are stored inside the laboratory in locked cupboards

IT Services: Data Back-up on the HFS
• HFS is Oxford's central back-up and archiving service
• Free of charge to University staff and postgraduates
• Automated back-ups of machines connected to University network
• Copies kept in multiple places

Think about your storage media…

… and about file formats

Slide adapted from
the PrePARe Project


For discussion


• What data management challenges have you encountered?
• What strategies have you personally found useful?
• Be ready to feed back to the group

DOCUMENTATION AND
METADATA

Documentation and metadata


• Documentation is the contextual information required to make data intelligible and aid interpretation
  • A users' guide to your data
  • May be given at study level or data level
• Metadata is similar, but usually more structured
  • Conforms to set standards
  • Machine readable

Make material understandable

What's obvious now might not be in a few months, years, decades…

MAKE SURE YOU CAN UNDERSTAND IT LATER

Adapted from 'Clay Tablets with Linear B Script' by Dennis, via Flickr: http://www.flickr.com/photos/archer10/5692813531/

Slide adapted from
the PrePARe Project

Make material verifiable

Image by woodleywonderworks, via Flickr:
http://www.flickr.com/photos/wwworks/4588700881/

• Detailing your methods helps people understand what you did
• Reduces risk of misinterpretation
• Helps make your work reproducible
• Conclusions can be verified

Slide adapted from
the PrePARe Project

Exercise


• In small groups, look at the sample data sheet
• Imagine you have just downloaded this dataset from an archive
• What contextual or explanatory information is missing?
• What additional documentation would you like to see supplied
  • At the data level?
  • At the study level?

Documentation – what to include
• Who created it, when and why
• Description of the item
• Methodology and methods
• Units of measurement
• Definitions of jargon, acronyms and code
• References to related data

Slide adapted from
the PrePARe Project
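One lightweight way to capture the items on the "what to include" checklist is a plain-text README stored next to the data. The following is a minimal Python sketch: the `write_readme` helper is a hypothetical name introduced for illustration, the section headings are taken from the checklist, and any values passed in are the researcher's own.

```python
from pathlib import Path

# Headings follow the documentation checklist; the helper itself is hypothetical.
SECTIONS = [
    "Who created it, when and why",
    "Description of the item",
    "Methodology and methods",
    "Units of measurement",
    "Definitions of jargon, acronyms and code",
    "References to related data",
]


def write_readme(folder: Path, answers: dict[str, str]) -> Path:
    """Write README.txt in `folder`, flagging any section left unanswered."""
    lines = []
    for heading in SECTIONS:
        lines.append(heading.upper())
        lines.append(answers.get(heading, "TODO: not yet documented"))
        lines.append("")  # blank line between sections
    readme = folder / "README.txt"
    readme.write_text("\n".join(lines), encoding="utf-8")
    return readme
```

Leaving an explicit TODO marker for unanswered sections makes gaps visible to anyone who later opens the folder, including your future self.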

Metadata – data about data


• A formal, structured description of a dataset
• Used by archives to create catalogue records
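Because metadata is structured and machine readable, it can be checked automatically before a dataset is deposited. The following is a minimal Python sketch: the field names in `REQUIRED_FIELDS` are illustrative assumptions only (real archives publish their own schemas), and the record values are placeholders rather than real data.

```python
import json

# Illustrative field names only; a real archive would define its own schema.
REQUIRED_FIELDS = ("title", "creator", "date_created", "description", "licence")


def missing_fields(record: dict) -> list[str]:
    """Return the required fields that are absent or empty in `record`."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


record = {
    "title": "Example dataset",  # placeholder values, not real data
    "creator": "A. Researcher",
    "date_created": "2014-02-26",
    "description": "Placeholder description of the dataset.",
}

print(json.dumps(record, indent=2))  # the record in a machine-readable form
print(missing_fields(record))        # fields still to be supplied
```

JSON is used here only because it ships with Python; the same check would apply to XML or any other structured format an archive accepts.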

ISA tools software suite
Open source metadata tracking tools for the life sciences

http://isa-tools.org/

Missing metadata – or the riddle of the sixth toe

• This painting shows Georgiana, Duchess of Devonshire as Diana
  • … or maybe Cynthia
• She has six toes – but no one knows why

Public domain image from Wikimedia Commons:
http://commons.wikimedia.org/wiki/File:Georgiana_Cavendish,_Duchess_of_Devonshire_as_Diana.jpg

WHAT HAPPENS AT THE END
OF THE PROJECT?

Data archiving


• Data generated during a research project is valuable
• Don't leave it languishing on your hard drive
• Consider depositing it in an archive or repository
  • A number of national disciplinary archives exist
  • DataBib provides a catalogue: http://databib.org/
  • Oxford will soon have its own data archive
• If possible, make it available for others to re-use

Why share data? Reputation


• Get credit for high quality research
• Recognition for contribution to research community
• Open data leads to increased citations
  • Of the data itself
  • Of associated papers

Slide adapted from
the PrePARe Project

Why share data? Reuse


• Reduces duplication of effort
• Allows public research funding to be used more effectively
• Extend research beyond your discipline
• Perhaps into contexts not currently envisaged

Slide adapted from
the PrePARe Project

Why share data? Be a trailblazer!


• A paradigm shift in how research outputs are viewed is occurring
• Data outputs are of increasing importance – and are likely to become even more so
• Major journals are increasingly looking to publish datasets alongside articles
• Be at the forefront of an important shift in the academic world

Figshare


• Free online data sharing platform
  • Shared research is allocated a DataCite DOI
• A possible alternative to conventional repositories
  • If no suitable repository is available
  • If you need a data sharing solution in a hurry

Video by NYU Health Sciences Libraries: http://www.youtube.com/watch?v=N2zK3sAtr-4


Data sharing – concerns


• Ethical concerns
  • Confidential or sensitive data
• Legal concerns
  • Third party data
• Professional concerns
  • Intended publication
  • Commercial issues (e.g. patent protection)

Data sharing – concerns
• Redact or embargo if there is good reason
• Planning ahead can reduce difficulties

Slide adapted from
the PrePARe Project


Data licensing


• A licence clarifies the conditions for accessing and making use of a dataset
  • User knows what's allowed without asking further permission
  • Doesn't exclude possibility of specific requests to go beyond the terms of the licence
• For databases, structure and content may be covered by separate rights

Data licences - examples


• Creative Commons licences
  • Six different flavours, plus CC0 public domain dedication
  • Widely used and recognized
  • http://creativecommons.org/
• Open Data Commons
  • Specifically designed for datasets
  • Recognizes the structure/content distinction
  • http://opendatacommons.org/

Data licensing - guidance

• 'How to License Research Data'
  • A guide from the Digital Curation Centre

http://www.dcc.ac.uk/resources/how-guides/license-research-data

DATA MANAGEMENT
PLANNING

Data management plans


• A document which may be created in the early stages of a project
  • While planning, applying for funding, or setting up
  • An initial plan may be expanded later
• Details plans and expectations for data
  • Nature of data and its creation or acquisition
  • Storage and security
  • Preservation and sharing

Exercise


• Using the resources available, have a go at drafting a data management plan for your own research
• If there are questions you can't answer at this stage, make a note of
  • What you need to find out
  • Decisions you need to make

Digital Curation Centre


• A national service providing advice and resources
• Create a data management plan using the DMP online tool

http://www.dcc.ac.uk/

https://dmponline.dcc.ac.uk/

'In preparing for battle, I have always found that plans are useless but planning is indispensable.'
Dwight D. Eisenhower

UNIVERSITY SERVICES

ORA-Data and DataFinder



• Two forthcoming University of Oxford services
• Launch date TBC

ORA-Data (formerly DataBank)



• University of Oxford's institutional data archive
• Long term preservation for datasets without another natural home
  • In some cases, may be a suitable home for DPhil data
• Datasets will be assigned DOIs
• Will work alongside ORA-Publications to form a composite University archive
  • Possible to link publications and datasets in ORA
• Depositors can opt to make datasets publicly available, embargoed for a fixed period, or hidden

DataFinder


• A catalogue of datasets
  • Information on the nature, location, and availability of the data
• Will harvest metadata from ORA-Data and other compatible data stores
  • So anything in ORA-Data will have a record in DataFinder
• Researchers depositing data elsewhere strongly encouraged to add a record to DataFinder
• Should provide a substantial resource for researchers seeking datasets for reuse

FURTHER INFORMATION AND
RESOURCES

Research data management website
• Oxford's central advisory website
• University policy is available
• Questions? Email researchdata@ox.ac.uk

http://researchdata.ox.ac.uk/

IT Services: Research Support Team


• Can assist with technical aspects of research projects at all stages of the project lifecycle
  • Help with DMPs, selecting software or storage, etc.
  • But the earlier you seek advice, the better
• For more information, see our website: http://research.it.ox.ac.uk

Research Data MANTRA


• Free online interactive training modules
• Aimed at postgraduates and early career researchers

http://datalib.edina.ac.uk/mantra/

Any questions?

Ask now, or email us on
researchdata@ox.ac.uk


Rights and re-use








• This slideshow is part of a series of research data management training resources prepared by the DaMaRO Project at the University of Oxford
• With the exception of clip art used with permission from Microsoft, commercial logos and trademarks, and images credited to other sources, the slideshow is made available under a Creative Commons Attribution Non-Commercial Share-Alike License
• Parts of this slideshow draw on teaching materials produced by the PrePARe Project, DATUM for Health, and DataTrain Archaeology
• Within the terms of this licence, we actively encourage sharing, adaptation, and re-use of this material
