IASSIST40 W1:
Data management &
curation workshop
Toronto, 3 June, 2014
Robin Rice
EDINA and Data Library
University of Edinburgh
Edinburgh DataShare
• An institutional OA data repository based in DSpace
• Multidisciplinary, multiple data types
• Supports University Research Data Mgmt Policy
• For UoE researchers & their collaborators only
• For research projects without a domain repository
• A full service since 2010; recent University development funding is now allowing enhancements
• Promoted as part of the RDM programme, one of many ‘tools for the job’
Scope
• No limits in terms of subject matter or data types
• From mathematical proofs to audio files of Shilluk tribal narratives
• Meeting new users across the university
• 21 ‘schools’ make up our top-level ‘communities’; several are still empty
• Open data a tough sell for (some) academics
• Interviews with Creative Arts researchers, who are not comfortable with the term ‘data’ in their context
• Social Sciences well covered by the UK Data Archive
Policies
• No mandate for deposit; can only promote benefits
• No deposit of sensitive data, please
• No user registration or enforcement of T&Cs; data are open, embargoed, or available via ‘request copy’ from the depositor
• Self-deposit model, so keep the workflow simple (KISS)
o Guidance, such as checklist for deposit, user guide with screenshots
o Meetings to discuss data welcome; assisted deposit where warranted
• Basic quality assurance checks by staff
(documentation exists, file formats, file integrity)
• Open Data Commons Attribution licence by
default; open metadata
• Preservation policy; depositor agreement; service
level definition
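The basic staff QA checks listed above (documentation exists, file formats, file integrity) could be scripted roughly as follows. This is a minimal sketch, not DataShare's actual procedure: the format lists in `SUPPORTED` and `KNOWN`, the documentation file-name prefixes in `DOC_NAMES`, and the function names are all hypothetical, and it assumes expected MD5 checksums were recorded when the files were uploaded.

```python
import hashlib
from pathlib import Path

# Hypothetical format tiers; the real lists are not given in the slides.
SUPPORTED = {".csv", ".txt", ".pdf", ".xml", ".tif", ".wav"}
KNOWN = {".xlsx", ".docx", ".sav", ".dta", ".zip"}

# Hypothetical file-name prefixes that count as documentation.
DOC_NAMES = ("readme", "documentation", "codebook")


def md5sum(path: Path) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def format_tier(path: Path) -> str:
    """Classify a file by extension as supported, known, or unknown."""
    ext = path.suffix.lower()
    if ext in SUPPORTED:
        return "supported"
    if ext in KNOWN:
        return "known"
    return "unknown"


def qa_report(files: list[Path], expected_md5: dict[str, str]) -> dict:
    """Run the three basic checks over the files of one deposit."""
    has_docs = any(p.stem.lower().startswith(DOC_NAMES) for p in files)
    formats = {p.name: format_tier(p) for p in files}
    # A file passes the integrity check only if its checksum matches the recorded one.
    integrity = {p.name: md5sum(p) == expected_md5.get(p.name) for p in files}
    return {"documentation": has_docs, "formats": formats, "integrity": integrity}
```

The checks stay deliberately shallow, matching the "basic" level described: they flag problems for a curator to follow up rather than trying to validate file contents.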
Discoverability
• Keep DSpace up to date; good search engine hits
• Focus on discoverability metadata (DCMI); ‘type’
• System-generated ‘suggested citation’ based on
required fields on landing page
• Handle on landing page; DOIs coming soon
• Harvested by Data Citation Index
• Metadata links to data sources, articles, other
versions
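Because the suggested citation is generated from required fields only, it can be assembled mechanically. The sketch below builds one from a hypothetical `ItemMetadata` record; the field set and the citation layout (creators, year, title, type, publisher, identifier) are assumptions for illustration, not DataShare's actual format.

```python
from dataclasses import dataclass


@dataclass
class ItemMetadata:
    # Hypothetical required fields, loosely modelled on Dublin Core.
    creators: list[str]
    year: int
    title: str
    publisher: str
    identifier: str  # a persistent identifier such as a Handle (later, a DOI)
    resource_type: str = "dataset"


def suggested_citation(m: ItemMetadata) -> str:
    """Build a citation string from the required fields only."""
    missing = [name for name in ("creators", "title", "publisher", "identifier")
               if not getattr(m, name)]
    if missing:
        raise ValueError("missing required fields: " + ", ".join(missing))
    authors = "; ".join(m.creators)
    return (f"{authors} ({m.year}). {m.title}, [{m.resource_type}]. "
            f"{m.publisher}. {m.identifier}")
```

For example, with made-up values, `suggested_citation(ItemMetadata(["Rice, Robin"], 2014, "Example survey", "University of Edinburgh", "hdl:example/123"))` yields `Rice, Robin (2014). Example survey, [dataset]. University of Edinburgh. hdl:example/123`. The explicit check rejects empty values, so a record with a blank required field fails loudly instead of producing a partial citation.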
Platform
• DSpace originally; in 2013 the RDM Steering Group suggested pilot depositors to test fitness for purpose
• Continuity has advantages, but we have also looked at Fedora, Dataverse, CKAN and Invenio
• Still mainly an upload/download model: no visualisation, modelling or online analysis
• SWORD and batch ingest recently enabled for uploading large and/or voluminous datasets
• Becoming part of a local system including CRIS,
Data Asset Register, Vault, Active Data Store
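DSpace's batch import reads items in its Simple Archive Format: one directory per item containing a `dublin_core.xml` metadata file, a `contents` file listing the bitstreams, and the files themselves. Assuming batch ingest here goes through that format (the slides do not say), packaging one item might look like the sketch below; `write_saf_item` is a hypothetical helper, not part of DSpace.

```python
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path


def write_saf_item(item_dir: Path, dc_fields: list[tuple[str, str, str]],
                   files: list[Path]) -> None:
    """Write one item in DSpace Simple Archive Format.

    dc_fields holds (element, qualifier, value) triples; use "none"
    as the qualifier for unqualified Dublin Core elements.
    """
    item_dir.mkdir(parents=True, exist_ok=True)

    # dublin_core.xml: one <dcvalue> element per metadata value.
    root = ET.Element("dublin_core")
    for element, qualifier, value in dc_fields:
        dc = ET.SubElement(root, "dcvalue", element=element, qualifier=qualifier)
        dc.text = value
    ET.ElementTree(root).write(item_dir / "dublin_core.xml",
                               encoding="utf-8", xml_declaration=True)

    # Copy each data file into the item directory and list it in "contents",
    # one file name per line.
    names = []
    for f in files:
        shutil.copy2(f, item_dir / f.name)
        names.append(f.name)
    (item_dir / "contents").write_text("\n".join(names) + "\n", encoding="utf-8")
```

Each item gets its own subdirectory (for example `item_000`, `item_001`) under a single archive directory, which is then handed to DSpace's import tool. SWORD deposit is a separate route that packages content differently and is not shown.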
Metadata standards
• Based on DCMI plus a few necessary system-related fields
• Needs to be lightweight & multidisciplinary
• Conforms to DataCite minimum fields (DOIs soon)
• Discovery metadata only; documentation files
required to allow re-use (part of manual QA check)
• File formats are classed as supported, known, or unknown, with guidance given to depositors
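Checking that a record meets the DataCite minimum fields can be automated. The sketch below uses the five mandatory properties of the DataCite kernel current in 2014 (Identifier, Creator, Title, Publisher, PublicationYear); later schema versions also make ResourceType mandatory. The mapping from Dublin Core field names to DataCite properties in `DATACITE_MANDATORY` is an assumption, not DataShare's actual crosswalk.

```python
import re

# Hypothetical crosswalk from DataCite mandatory properties to Dublin Core
# field names; the real DataShare mapping may differ.
DATACITE_MANDATORY = {
    "Identifier": "dc.identifier",
    "Creator": "dc.creator",
    "Title": "dc.title",
    "Publisher": "dc.publisher",
    "PublicationYear": "dc.date.issued",
}


def missing_datacite_fields(record: dict[str, list[str]]) -> list[str]:
    """Return the DataCite mandatory properties that have no usable value."""
    missing = []
    for prop, dc_field in DATACITE_MANDATORY.items():
        # Ignore blank or whitespace-only values.
        values = [v.strip() for v in record.get(dc_field, []) if v.strip()]
        if not values:
            missing.append(prop)
        elif prop == "PublicationYear" and not re.fullmatch(r"\d{4}", values[0][:4]):
            # The date must begin with a four-digit year, e.g. "2014-06-03".
            missing.append(prop)
    return missing
```

An empty result means the record carries enough discovery metadata to register a DOI; the documentation files needed for re-use are still a separate, manual QA check.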
Thank you
Image: ‘Binary’ by Xerones, CC BY-NC
