A collaborative approach to
“filling the digital preservation
gap” for Research Data
Management
Jenny Mitcham
Digital Archivist
Borthwick Institute for Archives
University of York
10 September 2015
What am I talking about...?
You
I’m going to talk to you
about digital archiving...
Me
Does she
mean storage?
OAIS blaa blaa AIP blaa
TRAC, PREMIS blaa...
You
Who invited
her anyway?
Me
You
Digital preservation is all
about the active
management of data
Me
Is she still just
talking about
storage...?
So what am I talking about?
You
Digital preservation refers to
the series of managed
activities necessary to ensure
continued access to digital
materials for as long as
necessary.
Digital preservation ...refers
to all of the actions required
to maintain access to digital
materials beyond the limits of
media failure or technological
change.*
Me
Oh I see!
zzzzzzzz
Not just
storage then?
* Text shamelessly stolen from the DPC Preservation Handbook
This is a digital archive
The Open Archival Information System (OAIS)
Archaeologist not archivist
Me on a beach in Devon
Digital archivist at the Archaeology Data
Service (ADS) (2003-2012)
Digital archivist at the Borthwick Institute
for Archives (2012-?)
Filling the digital preservation gap:
Project aim
“…to investigate
Archivematica and explore
how it might be used to
provide digital preservation
functionality within a wider
infrastructure for Research
Data Management.”
This is a collaboration
University of Hull:
• Chris Awre – Head of Information Services, Library and
Learning Innovation
• Richard Green – Independent Consultant
• Simon Wilson – University Archivist
University of York:
• Julie Allinson – Manager,
Digital York
• Jen Mitcham – Digital Archivist
Artefactual Systems
Jisc
Project structure
• Phase 1 – explore: testing, research,
thinking; produce a report (3 months)
• Phase 2 – develop: make
Archivematica better for RDM, plan
implementation (4 months)
• Phase 3 – implement: set up proof of
concepts at York and Hull (6 months)
Why do we need digital preservation?
Why do we need digital preservation
for research data?
• There is a digital preservation gap in current RDM
infrastructures
• We can’t ignore digital preservation – moving targets for
data retention mean we need to take this seriously
• Funder requirements around retention:
– NERC – data should be retained for a minimum of 10 years, but
for projects of major importance this may need to be 20 years
or longer
– STFC – expects data to be retained for a minimum of 10 years, and
data that cannot be re-measured should be retained indefinitely
– Wellcome Trust – expects data to be kept for a minimum of 10
years but suggests longer periods for certain types of data
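The retention rules above can be written down as a simple lookup, which makes it easier to see how they differ between funders. This is a sketch under stated assumptions, not a reading of any funder's policy: the function name `earliest_review_year`, the flags `major_importance` and `unrepeatable`, and the choice to count the retention period from the project's end year are all hypothetical. It also treats the figures as fixed floors, whereas the funders present them as minimums (the Wellcome Trust's "longer periods for certain types of data" is not modelled).

```python
from typing import Optional

# Hypothetical encoding of the funder minimums summarised on the slide.
# Counting from the project's end year is an assumption for illustration.
MINIMUM_YEARS = {"NERC": 10, "STFC": 10, "Wellcome Trust": 10}


def earliest_review_year(funder: str, end_year: int, *,
                         major_importance: bool = False,
                         unrepeatable: bool = False) -> Optional[int]:
    """Return the first year the data could be reviewed for disposal,
    or None if it should be kept indefinitely."""
    if funder == "STFC" and unrepeatable:
        # Data that cannot be re-measured: retain indefinitely.
        return None
    years = MINIMUM_YEARS[funder]
    if funder == "NERC" and major_importance:
        # "May need to be 20 years or longer": treat 20 as the floor.
        years = 20
    return end_year + years
```

For example, `earliest_review_year("NERC", 2015, major_importance=True)` returns 2035, while `earliest_review_year("STFC", 2015, unrepeatable=True)` returns `None`, signalling indefinite retention.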
University of York RDM questionnaire 2013
• Which data management issues have you come
across in your research over the last five years?
– “Inability to read files in old software formats on old
media or because of expired software licences”
– 24% of 181 researchers who answered this question
admitted this had been a problem for them
Why do we need digital preservation
for research data?
Why Archivematica?
“The goal of the Archivematica project is to give
archivists and librarians with limited technical
and financial capacity the tools, methodology
and confidence to begin preserving digital
information today.”
Why Archivematica?
• Standards-based
• Open Source
• Flexible and customisable
• Compatible with hundreds of file formats
• Advanced search and storage management
• Integrated with third-party systems
From https://www.archivematica.org/en/
What does Archivematica do?
The short answer:
“It packages data up in a standards compliant
way and prepares it to be stored for the long
term”
What does Archivematica do?
The longer answer:
• Assigns unique identifiers
• Creates a checksum for each object
• Creates a text file with a directory tree of the transfer
• Option to quarantine data for a specified period
• Runs virus checks
• Cleans up file and directory names (removing characters that may cause
problems)
• Runs identification tools so you can find out what file formats you have
• Extracts data from zip files (or not if you would rather not)
• Extracts metadata embedded in the files (if you want)
• Normalises files (if a migration path exists)
• ...
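A few of the steps listed above (unique identifiers, a checksum for each object, a directory tree of the transfer, and cleaned-up file names) can be sketched in plain Python to show what they involve. This is not Archivematica's code: the use of SHA-256, the safe-character set in `sanitise_name`, and the names `sha256_of` and `process_transfer` are assumptions for illustration. The sketch records a proposed clean name instead of renaming files, and it skips quarantine, virus checks, format identification, extraction and normalisation.

```python
import hashlib
import re
import uuid
from pathlib import Path
from typing import Dict, List, Tuple


def sanitise_name(name: str) -> str:
    """Replace anything outside letters, digits, '.', '_' and '-' with '_'."""
    return re.sub(r"[^A-Za-z0-9._-]", "_", name)


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Checksum a file in chunks so large research files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def process_transfer(transfer_dir: Path) -> Tuple[List[Dict[str, str]], str]:
    """Give every file a UUID and checksum, and build an indented directory tree."""
    records: List[Dict[str, str]] = []
    tree_lines: List[str] = []
    for path in sorted(transfer_dir.rglob("*")):
        rel = path.relative_to(transfer_dir)
        indent = "    " * (len(rel.parts) - 1)
        tree_lines.append(indent + rel.name + ("/" if path.is_dir() else ""))
        if path.is_file():
            records.append({
                "uuid": str(uuid.uuid4()),
                "path": rel.as_posix(),
                # Proposed clean name; the original file is left untouched.
                "sanitised_name": sanitise_name(rel.name),
                "sha256": sha256_of(path),
            })
    return records, "\n".join(tree_lines)
```

Recording the checksum at this point gives a baseline: recomputing it later and comparing the two values is how silent corruption in storage would be detected.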
What does Archivematica do?
The really, really long answer (if you have time):
• Read the manual
https://www.archivematica.org/en/docs/archivematica-1.4/
What does research data look like?
York RDM questionnaire
2013: Please select the main
types of electronic research
data you generate
Top research data applications at York
What does research data look like?
York RDM questionnaire 2013:
If your project is not yet
complete, can you make an
estimate of the ‘final’ size of
your digital data
Value of research data
“There has probably been an awful lot
of good data lost due to poor practice
in archiving ...”
“Storing vast datasets which are not
part of the final publication adds a lot
of cost for very little benefit.”
“Unprocessed data is generally large
and difficult to analyse, unless the
analysis tools are provided in the
archive.”
“I hope strongly that in the future I
might contribute to a widely available
repository for musical
instruction/examples ....both for
other players/composers and for
musicological researchers.”
Researchers
...a pragmatic approach?
How could you use Archivematica?
• Host it in-house and link it to an existing
repository/access system (for example DSpace,
CONTENTdm, Fedora/Hydra ...or a CRIS)
• Host it in-house and use as a standalone system
(you would need to have a storage system in place and establish a
way of facilitating access to the data)
• Sign up for a hosted instance of Archivematica
with archivesDIRECT (combines Archivematica with
DuraCloud storage)
• Sign up for a hosted instance of Archivematica
with Arkivum (combines Archivematica with Arkivum storage)
Why would we recommend
Archivematica for RDM?
• It is flexible and can be configured in different ways for
different institutional needs and workflows
• It allows many of the tasks around digital preservation to
be carried out in an automated fashion
• It can be used alongside other existing systems as part of a
wider workflow for research data
• It is a good digital preservation solution for those with
limited resources
• It is an evolving solution that is continually driven and
enhanced by and for the digital preservation community
• It gives institutions greater confidence that they will be
able to continue to provide access to usable copies of
research data over time
What are the downsides?
• It isn’t a magic bullet
• There is no guarantee your data will be
readable in the future
• It can only be as good as current digital
preservation practice
• It can be fiddly to install correctly
• The GUI isn’t that intuitive
• You need staff who understand it
FAQs
• Why do we need a digital preservation system for research
data?
• What are the risks if we don't address digital preservation?
• Why are we interested in Archivematica?
• Why do we recommend Archivematica to help preserve
research data?
• What does Archivematica actually do?
• How could Archivematica be incorporated into a wider
technical infrastructure for research data management?
• What does research data look like?
• How would Archivematica handle research data?
• What are the limitations of Archivematica for research data?
• What costs are associated with using Archivematica?
• ....
Read all about it!
http://digital-archiving.blogspot.co.uk/
RDM Workflows at York
• We get a copy of data from a researcher
• We transfer it to Archivematica
• Archivematica packages it up for storage and
creates the Archival Information Package (AIP)
• Archivematica sends the AIP to archival storage
• Metadata is published in data catalogue
• If someone requests the data, Archivematica will
create a Dissemination Information Package (DIP)
• The DIP will be uploaded to the Digital Library for access
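The York workflow above splits into steps that happen for every dataset and steps that only happen when someone asks for the data. A minimal sketch of that split, using hypothetical `Dataset` and `Stage` names, records which stages a dataset has passed through. It does not call Archivematica, archival storage or the data catalogue; it only models the order of events.

```python
from enum import Enum
from typing import List


class Stage(Enum):
    RECEIVED = 1     # copy of the data obtained from the researcher
    TRANSFERRED = 2  # transferred into Archivematica
    AIP_STORED = 3   # AIP created and sent to archival storage
    CATALOGUED = 4   # metadata published in the data catalogue
    DIP_CREATED = 5  # DIP created because someone requested the data
    ACCESSIBLE = 6   # DIP uploaded to the Digital Library


class Dataset:
    """Tracks one hypothetical research dataset through the workflow."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.history: List[Stage] = [Stage.RECEIVED]

    @property
    def stage(self) -> Stage:
        return self.history[-1]

    def deposit(self) -> None:
        """Steps that happen for every dataset, requested or not."""
        if self.stage is not Stage.RECEIVED:
            raise ValueError(f"{self.name} has already been deposited")
        self.history += [Stage.TRANSFERRED, Stage.AIP_STORED, Stage.CATALOGUED]

    def request_access(self) -> None:
        """Steps that only happen when someone requests the data."""
        if self.stage is Stage.ACCESSIBLE:
            return  # an access copy already exists; nothing to do
        if self.stage is not Stage.CATALOGUED:
            raise ValueError(f"{self.name} must be deposited before it is requested")
        self.history += [Stage.DIP_CREATED, Stage.ACCESSIBLE]
```

Creating the DIP only on request means no access copy is generated, stored or maintained for data that nobody asks for.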
How do York plan to use Archivematica?
How can we improve Archivematica?
1. Enabling better workflows for RDM (producing a
DIP on request)
2. Allowing the DIP (access copy of data) to be
usable by different repository systems
3. Helping to reduce bottlenecks for big data
4. Improving workflows for unidentified files
5. Enabling easier querying of data within
Archivematica by third-party applications
6. Better documentation
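Item 4 concerns files that identification tools cannot match to a known format. A minimal sketch of one possible triage step sorts files into identified and unidentified groups so the latter can be queued for manual review. It assumes a tiny hand-written table of file signatures ("magic numbers") in place of a real format registry such as PRONOM; the names `SIGNATURES`, `identify` and `triage` are hypothetical.

```python
from pathlib import Path
from typing import Dict, Iterable, List, Optional, Tuple

# Illustrative signatures only. Real identification tools consult much richer
# registries. Many formats (e.g. .docx, .xlsx) are ZIP containers, so a ZIP
# match here is not a complete identification.
SIGNATURES = {
    b"%PDF-": "PDF document",
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"\xff\xd8\xff": "JPEG image",
    b"GIF8": "GIF image",
    b"PK\x03\x04": "ZIP container",
}


def identify(path: Path) -> Optional[str]:
    """Return a format label if the file starts with a known signature."""
    with path.open("rb") as handle:
        header = handle.read(16)
    for magic, label in SIGNATURES.items():
        if header.startswith(magic):
            return label
    return None


def triage(paths: Iterable[Path]) -> Tuple[Dict[Path, str], List[Path]]:
    """Split files into identified ones and ones that need manual review."""
    identified: Dict[Path, str] = {}
    unidentified: List[Path] = []
    for path in paths:
        label = identify(path)
        if label is None:
            unidentified.append(path)
        else:
            identified[path] = label
    return identified, unidentified
```

Research data often includes instrument output or bespoke formats that no registry knows about, so the unidentified list is the part of the workflow that needs a human decision.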
Where to find out more
http://www.york.ac.uk/borthwick/
Where to find out more
Any questions?
Thanks for listening
You
Has she
finished yet...?
Me
Feel free to contact me at
jenny.mitcham@york.ac.uk
Me
...I really am
keen to talk to
you about this
project
Useful links:
Borthwick website: http://www.york.ac.uk/borthwick/
Digital archiving blog: http://digital-archiving.blogspot.co.uk/
Archivematica: https://www.archivematica.org/en/
Report: http://dx.doi.org/10.6084/m9.figshare.1481170

Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangeThinkInnovation
 
Harnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptxHarnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptxParas Gupta
 
PLE-statistics document for primary schs
PLE-statistics document for primary schsPLE-statistics document for primary schs
PLE-statistics document for primary schscnajjemba
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...gajnagarg
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...ZurliaSoop
 
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制vexqp
 
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制vexqp
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...Health
 
SR-101-01012024-EN.docx Federal Constitution of the Swiss Confederation
SR-101-01012024-EN.docx  Federal Constitution  of the Swiss ConfederationSR-101-01012024-EN.docx  Federal Constitution  of the Swiss Confederation
SR-101-01012024-EN.docx Federal Constitution of the Swiss ConfederationEfruzAsilolu
 
Switzerland Constitution 2002.pdf.........
Switzerland Constitution 2002.pdf.........Switzerland Constitution 2002.pdf.........
Switzerland Constitution 2002.pdf.........EfruzAsilolu
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubaikojalkojal131
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样wsppdmt
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabiaahmedjiabur940
 

Recently uploaded (20)

Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Data Analyst Tasks to do the internship.pdf
Data Analyst Tasks to do the internship.pdfData Analyst Tasks to do the internship.pdf
Data Analyst Tasks to do the internship.pdf
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangePredicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
 
Harnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptxHarnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptx
 
PLE-statistics document for primary schs
PLE-statistics document for primary schsPLE-statistics document for primary schs
PLE-statistics document for primary schs
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
 
Sequential and reinforcement learning for demand side management by Margaux B...
Sequential and reinforcement learning for demand side management by Margaux B...Sequential and reinforcement learning for demand side management by Margaux B...
Sequential and reinforcement learning for demand side management by Margaux B...
 
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
 
SR-101-01012024-EN.docx Federal Constitution of the Swiss Confederation
SR-101-01012024-EN.docx  Federal Constitution  of the Swiss ConfederationSR-101-01012024-EN.docx  Federal Constitution  of the Swiss Confederation
SR-101-01012024-EN.docx Federal Constitution of the Swiss Confederation
 
Switzerland Constitution 2002.pdf.........
Switzerland Constitution 2002.pdf.........Switzerland Constitution 2002.pdf.........
Switzerland Constitution 2002.pdf.........
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubai
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
 

A collaborative approach to "filling the digital preservation gap" for Research Data Management

  • 1. A collaborative approach to “filling the digital preservation gap” for Research Data Management Jenny Mitcham Digital Archivist Borthwick Institute for Archives University of York 10 September 2015
  • 2. What am I talking about...? Me: I’m going to talk to you about digital archiving... You: Does she mean storage? OAIS blaa blaa AIP blaa TRAC, PREMIS blaa... Who invited her anyway? Me: Digital preservation is all about the active management of data. You: Is she still just talking about storage...?
  • 3. So what am I talking about? Me: Digital preservation refers to the series of managed activities necessary to ensure continued access to digital materials for as long as necessary. Digital preservation ...refers to all of the actions required to maintain access to digital materials beyond the limits of media failure or technological change.* You: Oh I see! zzzzzzzz Not just storage then? * Text shamelessly stolen from the DPC Preservation Handbook
  • 4. This is a digital archive: the Open Archival Information System (OAIS)
  • 5. Archaeologist, not archivist. Me on a beach in Devon
  • 6. Digital archivist at the Archaeology Data Service (ADS) (2003-2012)
  • 7. Digital archivist at the Borthwick Institute for Archives (2012-?)
  • 8. Filling the digital preservation gap: Project aim “…to investigate Archivematica and explore how it might be used to provide digital preservation functionality within a wider infrastructure for Research Data Management.”
  • 9. This is a collaboration University of Hull: • Chris Awre – Head of Information Services, Library and Learning Innovation • Richard Green – Independent Consultant • Simon Wilson – University Archivist University of York: • Julie Allinson – Manager, Digital York • Jen Mitcham – Digital Archivist Also: • Artefactual Systems • Jisc
  • 10. Project structure • Phase 1 – explore: testing, research, thinking; produce a report (3 months) • Phase 2 – develop: make Archivematica better for RDM, plan implementation (4 months) • Phase 3 – implement: set up proofs of concept at York and Hull (6 months)
  • 11. Why do we need digital preservation?
  • 12. Why do we need digital preservation for research data? • There is a digital preservation gap in current RDM infrastructures • We can’t ignore digital preservation – moving targets for data retention mean we need to take this seriously • Funder requirements around retention: – NERC - data should be retained for a minimum of 10 years but for projects of major importance this may need to be 20 years or longer – STFC - expect data to be retained for a minimum of 10 years and data that cannot be re-measured should be retained indefinitely – Wellcome Trust – expect data to be kept for a minimum of 10 years but suggest longer periods for certain types of data
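The funder minimums above can be captured as a small policy table, which makes it easy to see when a dataset could first be reviewed for disposal. This is a minimal sketch, not any funder's tool: the figures come from the slide, but the helper names are hypothetical and the assumption that the retention clock starts at project end is ours, so check each funder's actual wording before relying on it.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class RetentionRule:
    minimum_years: int                            # minimum quoted by the funder
    major_importance_years: Optional[int] = None  # longer floor, if the funder names one
    indefinite_if_unrepeatable: bool = False      # keep forever if data cannot be re-measured


# Figures from the funder requirements listed above. ASSUMPTION: the
# retention clock starts at project end (real policies may differ).
RULES = {
    "NERC": RetentionRule(10, major_importance_years=20),
    "STFC": RetentionRule(10, indefinite_if_unrepeatable=True),
    "Wellcome Trust": RetentionRule(10),
}


def add_years(d: date, years: int) -> date:
    """Add whole calendar years, moving 29 February to 28 February if needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def earliest_review_date(funder: str, project_end: date,
                         major_importance: bool = False,
                         can_be_remeasured: bool = True) -> Optional[date]:
    """Hypothetical helper: earliest date the data could be reviewed for
    disposal, or None if it must be retained indefinitely."""
    rule = RULES[funder]
    if rule.indefinite_if_unrepeatable and not can_be_remeasured:
        return None
    years = rule.minimum_years
    if major_importance and rule.major_importance_years is not None:
        years = rule.major_importance_years
    return add_years(project_end, years)
```

For example, `earliest_review_date("NERC", date(2015, 9, 10))` gives 10 September 2025. The 20-year NERC figure is treated only as a floor ("20 years or longer"), and Wellcome's "longer periods for certain types of data" is not modelled here.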
  • 13. Why do we need digital preservation for research data? University of York RDM questionnaire 2013 • Which data management issues have you come across in your research over the last five years? – “Inability to read files in old software formats on old media or because of expired software licences” – 24% of 181 researchers who answered this question admitted this had been a problem for them
  • 14. Why Archivematica? “The goal of the Archivematica project is to give archivists and librarians with limited technical and financial capacity the tools, methodology and confidence to begin preserving digital information today.”
  • 15. Why Archivematica? • Standards-based • Open Source • Flexible and customisable • Compatible with hundreds of file formats • Advanced search and storage management • Integrated with third-party systems From https://www.archivematica.org/en/
  • 16. What does Archivematica do? The short answer: “It packages data up in a standards compliant way and prepares it to be stored for the long term”
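One of the standards involved is BagIt: Archivematica's AIPs are BagIt bags that also carry a METS description of their contents. The sketch below shows only the BagIt side of "packaging data up" (a `data/` payload directory, a `bagit.txt` declaration and a SHA-256 manifest, following the RFC 8493 layout). It is an illustration, not Archivematica's own code, and `make_bag` and `sha256_of` are hypothetical names.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large research files fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def make_bag(source_dir: Path, bag_dir: Path) -> Path:
    """Copy source_dir into bag_dir/data and write minimal BagIt tag files."""
    data_dir = bag_dir / "data"
    shutil.copytree(source_dir, data_dir)  # raises if data_dir already exists
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    # One "checksum  path" line per payload file, paths relative to the bag root.
    lines = []
    for path in sorted(p for p in data_dir.rglob("*") if p.is_file()):
        rel = path.relative_to(bag_dir).as_posix()
        lines.append(f"{sha256_of(path)}  {rel}\n")
    manifest = bag_dir / "manifest-sha256.txt"
    manifest.write_text("".join(lines), encoding="utf-8")
    return manifest
```

A real AIP adds more (tag manifests, METS, PREMIS events), but even this minimal manifest lets anyone re-verify every payload file's fixity later.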
  • 17. What does Archivematica do? The longer answer: • Assigns unique identifiers • Creates a checksum for each object • Creates a text file with a directory tree of the transfer • Option to quarantine data for a specified period • Runs virus checks • Cleans up file and directory names (removing characters that may cause problems) • Runs identification tools so you can find out what file formats you have • Extracts data from zip files (or not if you would rather not) • Extracts metadata embedded in the files (if you want) • Normalises files (if a migration path exists) • ...
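Two of the steps in this list, cleaning up file and directory names and writing a text file with the directory tree, are simple enough to sketch. The allow-list below (ASCII letters, digits, `.`, `-`, `_`) is a hypothetical rule set chosen for illustration; it is not Archivematica's actual sanitisation logic, and both function names are ours.

```python
import re
import unicodedata
from pathlib import Path

# Hypothetical allow-list: ASCII letters, digits, dot, hyphen, underscore.
_UNSAFE = re.compile(r"[^A-Za-z0-9._-]")


def sanitize_name(name: str) -> str:
    """Strip accents to plain ASCII, then replace any other unsafe character with '_'."""
    ascii_name = (unicodedata.normalize("NFKD", name)
                  .encode("ascii", "ignore").decode("ascii"))
    cleaned = _UNSAFE.sub("_", ascii_name)
    return cleaned or "_"  # never return an empty name


def directory_tree(root: Path) -> str:
    """Return an indented text listing of root (directories end in '/')."""
    lines = [root.name + "/"]

    def walk(d: Path, depth: int) -> None:
        for child in sorted(d.iterdir(), key=lambda p: p.name):
            lines.append("    " * depth + child.name + ("/" if child.is_dir() else ""))
            if child.is_dir():
                walk(child, depth + 1)

    walk(root, 1)
    return "\n".join(lines) + "\n"
```

For instance, `sanitize_name("Résumé (final).docx")` yields `Resume__final_.docx`. A production version would also record the original name (Archivematica logs such changes as preservation events) and handle two names that clean up to the same result.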
  • 18. What does Archivematica do? The really really long answer (if you have time): • Read the manual https://www.archivematica.org/en/docs/archivematica-1.4/
  • 19. What does research data look like? York RDM questionnaire 2013: Please select the main types of electronic research data you generate
  • 20. Top research data applications at York
  • 21. What does research data look like? York RDM questionnaire 2013: If your project is not yet complete, can you make an estimate of the ‘final’ size of your digital data
  • 22. Value of research data “There has probably been an awful lot of good data lost due to poor practice in archiving ...” “Storing vast datasets which are not part of the final publication adds a lot of cost for very little benefit.” “Unprocessed data is generally large and difficult to analyse, unless the analysis tools are provided in the archive.” “I hope strongly that in the future I might contribute to a widely available repository for musical instruction/examples ....both for other players/composers and for musicological researchers.” Researchers
  • 24. How could you use Archivematica? • Host it in-house and link it to an existing repository/access system (for example DSpace, CONTENTdm, Fedora/Hydra ...or a CRIS) • Host it in-house and use as a standalone system (you would need to have a storage system in place and establish a way of facilitating access to the data) • Sign up for a hosted instance of Archivematica with archivesDIRECT (combines Archivematica with DuraCloud storage) • Sign up for a hosted instance of Archivematica with Arkivum (combines Archivematica with Arkivum storage)
  • 25. Why would we recommend Archivematica for RDM? • It is flexible and can be configured in different ways for different institutional needs and workflows • It allows many of the tasks around digital preservation to be carried out in an automated fashion • It can be used alongside other existing systems as part of a wider workflow for research data • It is a good digital preservation solution for those with limited resources • It is an evolving solution that is continually driven and enhanced by and for the digital preservation community • It gives institutions greater confidence that they will be able to continue to provide access to usable copies of research data over time
  • 26. What are the downsides? • It isn’t a magic bullet • There is no guarantee your data will be readable in the future • It can only be as good as current digital preservation practice • It can be fiddly to install correctly • The GUI isn’t that intuitive • You need staff who understand it
  • 27. FAQs • Why do we need a digital preservation system for research data? • What are the risks if we don't address digital preservation? • Why are we interested in Archivematica? • Why do we recommend Archivematica to help preserve research data? • What does Archivematica actually do? • How could Archivematica be incorporated into a wider technical infrastructure for research data management? • What does research data look like? • How would Archivematica handle research data? • What are the limitations of Archivematica for research data? • What costs are associated with using Archivematica? • ....
  • 28. Read all about it! http://digital-archiving.blogspot.co.uk/
  • 29. RDM Workflows at York • We get a copy of data from a researcher • We transfer it to Archivematica • Archivematica packages it up for storage and creates the Archival Information Package (AIP) • Archivematica sends the AIP to archival storage • Metadata is published in data catalogue • If someone requests the data Archivematica will create a Dissemination Information Package (DIP) • DIP will be uploaded to Digital Library for access
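The York workflow is essentially a fixed sequence of ingest stages followed by on-demand creation of access copies. The sketch below models that sequence as a small state machine; the stage names, `Dataset`, `advance` and `request_access` are all hypothetical and stand in for the real hand-offs between the researcher, Archivematica, archival storage, the data catalogue and the Digital Library.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import List


class Stage(Enum):
    RECEIVED = auto()     # copy of the data obtained from the researcher
    TRANSFERRED = auto()  # handed to Archivematica as a transfer
    AIP_STORED = auto()   # AIP created and sent to archival storage
    CATALOGUED = auto()   # metadata published in the data catalogue


_ORDER = list(Stage)  # definition order = workflow order


@dataclass
class Dataset:
    identifier: str
    stage: Stage = Stage.RECEIVED
    dips: List[str] = field(default_factory=list)


def advance(ds: Dataset) -> Stage:
    """Move a dataset to the next ingest stage; steps cannot be skipped."""
    i = _ORDER.index(ds.stage)
    if i == len(_ORDER) - 1:
        raise ValueError(f"{ds.identifier} is already catalogued")
    ds.stage = _ORDER[i + 1]
    return ds.stage


def request_access(ds: Dataset) -> str:
    """On request, record a new DIP (access copy) for upload to the Digital Library."""
    if ds.stage is not Stage.CATALOGUED:
        raise ValueError(f"{ds.identifier} is not yet available")
    dip_id = f"{ds.identifier}-dip-{len(ds.dips) + 1}"
    ds.dips.append(dip_id)
    return dip_id
```

Creating DIPs only when someone asks keeps storage costs down for data that may never be requested, which is why "producing a DIP on request" appears among the proposed improvements.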
  • 30. How do York plan to use Archivematica?
  • 31.
  • 32. How can we improve Archivematica? 1. Enable better workflows for RDM (producing a DIP on request) 2. Allow the DIP (access copy of data) to be usable by different repository systems 3. Help reduce bottlenecks for big data 4. Workflows for unidentified files 5. Enable easier querying of data within Archivematica by third-party applications 6. Better documentation
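Point 4 matters because research data often includes formats no identification tool recognises. As a rough illustration of the idea, the sketch below identifies files by a few well-known leading byte signatures and routes anything unmatched to a review queue. Real tools such as DROID, FIDO or Siegfried match against the far larger PRONOM registry; the signature list, `identify` and `triage` here are hypothetical simplifications.

```python
from pathlib import Path
from typing import Dict, Iterable, List, Optional, Tuple

# A handful of well-known leading byte signatures (illustrative only).
SIGNATURES: List[Tuple[bytes, str]] = [
    (b"%PDF-", "PDF document"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"PK\x03\x04", "ZIP container (also used by DOCX/XLSX/ODF)"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"\xff\xd8\xff", "JPEG image"),
]
_MAX_SIG = max(len(sig) for sig, _ in SIGNATURES)


def identify(path: Path) -> Optional[str]:
    """Return a format name if the file starts with a known signature, else None."""
    with path.open("rb") as f:
        head = f.read(_MAX_SIG)
    for sig, name in SIGNATURES:
        if head.startswith(sig):
            return name
    return None


def triage(paths: Iterable[Path]) -> Tuple[Dict[Path, str], List[Path]]:
    """Split files into identified ones and a review queue of unidentified ones."""
    identified: Dict[Path, str] = {}
    review_queue: List[Path] = []
    for p in paths:
        fmt = identify(p)
        if fmt is None:
            review_queue.append(p)
        else:
            identified[p] = fmt
    return identified, review_queue
```

The review queue is the interesting part for RDM: rather than failing or silently ignoring unknown files, a workflow can flag them for a person to investigate and, where possible, contribute new signatures back to PRONOM.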
  • 33. Where to find out more http://www.york.ac.uk/borthwick/
  • 34. Where to find out more
  • 35. Any questions? Thanks for listening. You: Has she finished yet...? Me: Feel free to contact me at jenny.mitcham@york.ac.uk ...I really am keen to talk to you about this project. Useful links: Borthwick website: http://www.york.ac.uk/borthwick/ Digital archiving blog: http://digital-archiving.blogspot.co.uk/ Archivematica: https://www.archivematica.org/en/ Report: http://dx.doi.org/10.6084/m9.figshare.1481170