Jisc Research Data Management Shared Service Workshop: An institutional perspective
1. Jisc Research Data Management
Shared Service Workshop:
An institutional perspective
Jenny Mitcham
Digital Archivist
Borthwick Institute for Archives
University of York
22nd February 2016
2. The RDM Problem at York
• At the University of York, RDM is not *yet* a
solved problem
• We have a repository:
– But not all the necessary workflows in place to ingest
and manage research data
• We have a CRIS that researchers can use to enter
metadata about their research data:
– But ad hoc manual workflows for getting hold of the
dataset
• We are not currently addressing all the long term
preservation needs of the research data
3. What am I talking about?
You
Digital preservation refers to the series of managed activities necessary to ensure continued access to digital materials for as long as necessary.
Digital preservation ... refers to all of the actions required to maintain access to digital materials beyond the limits of media failure or technological change.*
Me
Oh I see! Not just storage then?
* Text shamelessly stolen from the DPC Preservation Handbook
4. This is a digital archive
The Open Archival Information System (OAIS)
5. Filling the digital preservation gap:
Project aim
“…to investigate
Archivematica and explore
how it might be used to
provide digital preservation
functionality within a wider
infrastructure for Research
Data Management.”
6. To find out more…
Date: 24th February
Time: 13.50-14.10
Place: Zurich 1
Session: B3: Digital Preservation
Paper: "Filling the Preservation Gap" for
Research Data by Jenny Mitcham
7. We wanted to be able to answer the
following questions...
• What is the nature of current research data at
York (i.e. file format, size, sensitivity)?
• How is research data stored, managed and
shared currently?
• What are the barriers to people managing
their data effectively?
• Where are the gaps in current provision, and
what services do we need to provide to fill
these gaps?
8. RDM questionnaire (DAF audit)
• Questionnaire based
(loosely) on DCC’s DAF
• Informed by examples
from other institutions
• Used Google Forms
• Sent to research staff and
students
• March-May 2013
• 188 responses
12. Will you deposit your data with an
archive?
Reasons why not:
• It is not something I had ever considered - 42%
• It is not something my funder requires - 35%
• There isn't a suitable data centre for my discipline - 18%
13. Data management issues
Issue: number of responses (percentage)
• Large volume of data caused problems managing and accessing it: 75 (41%)
• Problem finding or accessing research data from former colleagues, e.g. PhD students or research staff who have left the University: 69 (38%)
• Problem locating where files are stored: 62 (34%)
• Absence of file naming conventions made it difficult to find the file you were looking for: 56 (31%)
• Insufficient digital storage space: 56 (31%)
• Lack of version control caused confusion: 52 (29%)
• Inability to read files in old software formats on old media or because of expired software licences: 44 (24%)
• No data management issues: 44 (24%)
• Difficulty interpreting data due to inadequate or lost documentation: 43 (24%)
• Insufficient physical storage space: 23 (13%)
• Problems establishing ownership of data: 14 (8%)
• Problems reading files because of security and encryption: 10 (5%)
• Other: 9 (5%)
14. Value of research data
“There has probably been an awful lot
of good data lost due to poor practice
in archiving ...”
“Storing vast datasets which are not part of
the final publication adds a lot of cost for
very little benefit.”
“Unprocessed data is generally large
and difficult to analyse, unless the
analysis tools are provided in the
archive.”
“I hope strongly that in the future I might
contribute to a widely available repository
for musical instruction/examples ....both for
other players/composers and for
musicological researchers.”
Researchers
15. What does research data look like?
York RDM questionnaire
2013: Please select the main
types of electronic research
data you generate
19. The importance of identification
How well are our top 20
formats represented in
Pronom?
• Better than expected
• Sometimes partial
• Sometimes quite
generic (without a
version number)
MATLAB N
SPSS Partial
Stata N
R N
EndNote Partial
NVivo N
LaTeX Partial
Python N
Wolfram Mathematica Partial
Gaussian N
ChemDraw Partial
SAS Partial
ArcGIS Partial
GraphPad Prism Partial
Adobe Photoshop Partial
ATLAS.ti N
C++ N
Eclipse NA? No native file formats
MS Excel Y
RSB - ImageJ Partial
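To see the overall picture behind "better than expected", the per-format results above can be tallied. The following is only a minimal sketch: the statuses are transcribed from this slide, while the dictionary name `PRONOM_STATUS` and the helper `tally` are hypothetical names introduced for illustration and are not part of Pronom or any identification tool.

```python
from collections import Counter

# Status of each of the top 20 formats in Pronom, transcribed from the slide.
# "NA" stands for the slide's "NA?" entry for Eclipse (no native file formats).
# The names PRONOM_STATUS and tally are illustrative assumptions only.
PRONOM_STATUS = {
    "MATLAB": "N",
    "SPSS": "Partial",
    "Stata": "N",
    "R": "N",
    "EndNote": "Partial",
    "NVivo": "N",
    "LaTeX": "Partial",
    "Python": "N",
    "Wolfram Mathematica": "Partial",
    "Gaussian": "N",
    "ChemDraw": "Partial",
    "SAS": "Partial",
    "ArcGIS": "Partial",
    "GraphPad Prism": "Partial",
    "Adobe Photoshop": "Partial",
    "ATLAS.ti": "N",
    "C++": "N",
    "Eclipse": "NA",
    "MS Excel": "Y",
    "RSB - ImageJ": "Partial",
}


def tally(status_by_format):
    """Count how many formats fall under each status label."""
    return Counter(status_by_format.values())


counts = tally(PRONOM_STATUS)
total = len(PRONOM_STATUS)
for status in ("Y", "Partial", "N", "NA"):
    print(f"{status}: {counts[status]} of {total}")
```

On the figures from this slide, the tally gives 1 format fully represented, 10 partially represented, 8 not represented and 1 not applicable.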
22. Some final points
• It is great that Jisc has included digital
preservation as an element of the shared
service
• …but it is not just a question of adopting the
tools
– we may also need to enhance them
– and integrate them
• We need to work with the wider digital
preservation community too
23. Where to find out more
http://www.york.ac.uk/borthwick/