Slides for the third sandpit workshop of the Research Data Spring project "A consortia-based approach to Research Data Management systems within small and specialist institutions". The project is led by CREST, Leeds Trinity University, Arkivum and ULCC.
3. Objectives
» Understand how small and specialist institutions can best implement RDM:
   » Streamlined in-house systems (Pure)
   » Hosted data repository services (EPrints/Arkivum)
   » Customisation for specific disciplines (Arts/RDIVA)
   » Shared staff and expertise (CREST)
» Capture requirements, prototype systems, conduct trials, publish findings, sustain results
4. Reports
CREST RDMS survey: http://dx.doi.org/10.6084/m9.figshare.1480453
Case Study 1: A consortial approach to RDMS: a streamlined commercial option (Pure: Elsevier)
https://dx.doi.org/10.6084/m9.figshare.1478780
Case Study 2: Working with EPrints to implement an open source approach to RDM in the visual arts
https://dx.doi.org/10.6084/m9.figshare.1480454
Case Study 3: RDM workflows and integrations for HEIs using hosted services
https://dx.doi.org/10.6084/m9.figshare.1476832
Figshare stats: 1,700+ views, 500+ downloads, top 5% of Altmetric
5. Blogs, workshops, presentations
» CREST RDMS Blog
» CREST Heads of Research
» RDIVA workshop
» OAI9
» RDMF
» RDS
» D-Lib Article
» Submitted Practice Paper to IDCC16
» OR2016 (submission in development)
» JISC Podcast
» RC Concordat for Open Research Data
» Input to JISC Shared Services
7. Progress on academic engagement with the Pure system:
» The repository currently holds metadata for over 1,600 research outputs (over 60% of these are journal articles, books and book chapters).
» 15 outputs contain full text (a low figure, but it will increase as new outputs are published and added to the repository).
» Although no datasets have yet been added to the repository, progress has been made on facilitating RDM within the University:
   » appointment of a Repository and Open Access Officer whose role includes assisting researchers with depositing research data in the repository and developing research data management plans;
   » inclusion of RDM in the University Ethics Policy and ethics approval process.
9. Objectives
» Metadata in EPrints and data in Arkivum
» Direct upload/download of data from the archive
   » Good for large datasets
   » Good for lots of files
» Ease of use (drag and drop, desktop integration, security)
» Flexible deployment models (hosted, onsite, mixed)
» Institutional control (QC, embargoes, access control)
10. Research data deposit and archive
[Diagram: deposit and archive workflow between Researcher, Editor, Browser, EPrints, Archive, Archive Cache, Arkivum Service and DataCite. Labelled steps: 1. Create record; 2. Create data folder; 3. Files; 4. Data ready; 5. Review; 6. Check; 7. Approve; 8. Protect; 9. Files; 10 to 12. Files safe; 13. Mint DOI; 14 and 15. DOI.]
17. Next steps
» Continue to develop hosted service
» Trials, tests, pilots
» CREST
» Dissemination and sharing of results
» Conferences and events
» CREST members
» Roll-out beyond CREST
» ULCC, Arkivum
» Engagement with JISC RDM Shared Service
» Glasgow School of Art
<p&p> CC NC ND
https://flic.kr/p/5euqU2
The problem with the original plugin and workflow, however, is that all the data has to pass physically through the EPrints server.
This does not work well for large datasets, where either the files are very large or there are very many of them; EPrints web upload does not handle these cases well.
It also doesn’t work well when the data archive is local to an institution but the EPrints repository is remote, for example because some of the data is confidential and needs to be kept separate from a publications repository.
This led us to create a more flexible integration.
We still support the original workflow, but we now also allow data to be uploaded to and downloaded from the data archive directly. This is done through ownCloud, software for building private Dropbox-style file-sharing systems. With ownCloud it is possible to put a very friendly front end on the archive so that files can be copied in and out easily. It has built-in support for mobile clients, desktop clients and web upload.
The workflow looks quite similar to before, but instead of the data being uploaded to EPrints and then passed to the archive, EPrints now causes a folder to be created in ownCloud, ready for the user to upload their data. The user is given the ownCloud link and can immediately upload their data to the archive. If they have an ownCloud account, the process is even more seamless: for example, a folder appears in their local ownCloud desktop client and they can drag and drop files straight into it.
EPrints knows the location of the data in the archive and so can include a list of the data files in the EPrints record. The editor can also be given access to the files, so they can still QA the data if need be before it is approved for long-term archiving.
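The deposit flow described above can be illustrated with a toy model. This is a minimal, hypothetical sketch: `HypotheticalArchive` stands in for the ownCloud front end to the Arkivum archive, and `DepositRecord`, the method names and the URL pattern are illustrative assumptions, not the EPrints, ownCloud or Arkivum APIs. It shows the key design point: the record stores only the folder location and reads the file list from the archive, so the data never passes through the repository.

```python
import uuid
from dataclasses import dataclass


class HypotheticalArchive:
    """In-memory stand-in for an ownCloud front end over the Arkivum archive.

    Hypothetical: the URL scheme and method names are illustrative only.
    """

    def __init__(self, base_url: str = "https://files.example.ac.uk") -> None:
        self.base_url = base_url
        self.folders: dict[str, dict[str, bytes]] = {}

    def create_folder(self) -> str:
        # A fresh, empty upload folder identified by a random id.
        folder_id = uuid.uuid4().hex
        self.folders[folder_id] = {}
        return folder_id

    def folder_url(self, folder_id: str) -> str:
        # The link handed to the researcher (hypothetical URL scheme).
        return f"{self.base_url}/s/{folder_id}"

    def upload(self, folder_id: str, name: str, data: bytes) -> None:
        # Files land directly in the archive folder; the repository never handles the bytes.
        self.folders[folder_id][name] = data

    def list_files(self, folder_id: str) -> list[str]:
        return sorted(self.folders[folder_id])


@dataclass
class DepositRecord:
    """Hypothetical repository record that stores only the archive folder location."""

    title: str
    archive: HypotheticalArchive
    folder_id: str = ""
    approved: bool = False

    def request_upload_folder(self) -> str:
        # The repository asks the archive front end for a folder and returns its link.
        self.folder_id = self.archive.create_folder()
        return self.archive.folder_url(self.folder_id)

    def file_list(self) -> list[str]:
        # The file listing is read from the archive, not from repository storage.
        return self.archive.list_files(self.folder_id)

    def approve(self, editor_checked: bool) -> None:
        # Approval for long-term archiving only after the editor has checked the files.
        if not editor_checked:
            raise ValueError("the editor must check the files before approval")
        self.approved = True


archive = HypotheticalArchive()
record = DepositRecord("Survey data", archive)
link = record.request_upload_folder()
archive.upload(record.folder_id, "responses.csv", b"id,answer\n1,yes\n")
archive.upload(record.folder_id, "codebook.pdf", b"%PDF-1.4")
print(record.file_list())  # ['codebook.pdf', 'responses.csv']
record.approve(editor_checked=True)
```

In a real deployment the folder would be created and listed through the archive's own interfaces; the toy model only captures who holds what: the repository holds metadata and a pointer, the archive holds the data.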
The workflow report we produced earlier in the project explains in detail how the new integration will work, including how access to archived data can be granted and controlled.
The EPrints user wants to upload data to the EPrints record.
EPrints asks ownCloud for a folder to upload the data to. The folder is given as a URL that the user can click on.
The user can then add the folder to the ownCloud client on their desktop for easy drag and drop of data, or simply upload data over the web.
The user copies in their data files, and ownCloud tells them when the files have been uploaded.
The files appear in the Arkivum archive.
The files also appear in the EPrints record.
Embargoes and other access restrictions can then be applied to the files using standard EPrints functionality.
If another user tries to access the files through EPrints, they are shown only the files they are allowed to access. If they open these files, they are given an ownCloud link to a folder containing just those files. This means they cannot see any of the other data in the archive or download anything they are not allowed to.
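The access rule in the last step can be sketched as a filter followed by a per-user share. This is a hypothetical illustration: `ArchivedFile`, `visible_files`, `create_restricted_share`, the group names and the share URL pattern are all assumptions, not the EPrints access-control or ownCloud sharing API. In the real system the embargo dates and restrictions would come from the EPrints record.

```python
import uuid
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class ArchivedFile:
    """Hypothetical view of an archived file and the restrictions set in the repository."""

    name: str
    embargo_until: Optional[date] = None
    allowed_groups: frozenset = frozenset({"public"})


def visible_files(files: list[ArchivedFile], user_groups: set[str], today: date) -> list[str]:
    """Return the names of files the user may access, in record order."""
    permitted = []
    for f in files:
        if f.embargo_until is not None and today < f.embargo_until:
            continue  # still under embargo
        if f.allowed_groups & user_groups:
            permitted.append(f.name)  # user belongs to at least one allowed group
    return permitted


def create_restricted_share(
    files: list[ArchivedFile],
    user_groups: set[str],
    today: date,
    base_url: str = "https://files.example.ac.uk",
) -> dict:
    """Build a per-user share exposing only the permitted files (hypothetical URL scheme)."""
    permitted = visible_files(files, user_groups, today)
    token = uuid.uuid4().hex  # a new share, so no other archive folder is exposed
    return {"url": f"{base_url}/s/{token}", "files": permitted}


files = [
    ArchivedFile("summary.csv"),
    ArchivedFile("interviews.zip", allowed_groups=frozenset({"project-team"})),
    ArchivedFile("images.tar", embargo_until=date(2030, 1, 1)),
]
share = create_restricted_share(files, {"public"}, date(2016, 1, 1))
print(share["files"])  # ['summary.csv']
```

Creating a fresh share per request, rather than handing out the original upload folder, is what keeps the rest of the archive invisible to the requesting user.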