1. IASSIST40 W1:
Data management &
curation workshop
Toronto, 3 June, 2014
Robin Rice
EDINA and Data Library
University of Edinburgh
2. Edinburgh DataShare
• An institutional OA data repository built on DSpace
• Multidisciplinary, multiple data types
• Supports University Research Data Mgmt Policy
• For UoE researchers & their collaborators only
• For research projects without a domain repository
• Has been a full service since 2010; recent University
funding for development is allowing enhancements
• Promoted as part of RDM programme, one of many
‘tools for the job’
3. Scope
• No limits in terms of subject matter or data types
• From mathematical proofs to audio files of Shilluk
tribal narratives
• Meeting new users across university
• 21 ‘schools’ make up our top level ‘communities’ –
several still empty
• Open data a tough sell for (some) academics
• Interviews with Creative Arts researchers: not
comfortable with the term ‘data’ in their context
• Social Sciences covered well by UK Data Archive
4. Policies
• No mandate for deposit; can only promote benefits
• No deposit of sensitive data, please
• No user registration or enforcement of T&Cs; open
data or embargo or ‘request copy’ from depositor
• Self-deposit model: so KISS, in terms of workflow
o Guidance, such as checklist for deposit, user guide with screenshots
o Meetings to discuss data welcome; assisted deposit where warranted
• Basic quality assurance checks by staff
(documentation exists, file formats, file integrity)
• Open Data Commons Attribution licence by
default; open metadata
• Preservation policy; depositor agreement; service
level definition
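The basic quality assurance checks listed above (documentation exists, file integrity) could be sketched roughly as follows. This is an illustration only, not the DataShare staff procedure: the function names, the list of documentation file names, and the use of SHA-256 checksums supplied by the depositor are all assumptions.

```python
import hashlib
from pathlib import Path

# Hypothetical naming convention for documentation files (an assumption,
# not taken from the slides).
DOC_NAMES = ("readme", "documentation", "codebook")


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def basic_qa(files, expected_checksums):
    """Return a list of human-readable problems found in a deposit.

    files: paths of the deposited files.
    expected_checksums: mapping of file name to the SHA-256 digest the
    depositor reported (files without an entry are not checked).
    """
    problems = []
    paths = [Path(f) for f in files]

    # Check 1: at least one file looks like documentation.
    if not any(p.stem.lower().startswith(DOC_NAMES) for p in paths):
        problems.append("no documentation file found")

    # Check 2: file integrity against the reported checksums.
    for p in paths:
        expected = expected_checksums.get(p.name)
        if expected is not None and sha256_of(p) != expected:
            problems.append(f"checksum mismatch: {p.name}")

    return problems
```

An empty list means the deposit passed both checks; a file format check could be added in the same style.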
5. Discoverability
• Keep DSpace up to date; search engine hits good
• Focus on discoverability metadata (DCMI); ‘type’
• System-generated ‘suggested citation’ based on
required fields on landing page
• Handle on landing page. DOIs coming soon
• Harvested by Data Citation Index
• Metadata links to data sources, articles, other
versions
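A system-generated ‘suggested citation’ built from required fields might look something like the sketch below. The field names, the author-year layout, and the sample record are assumptions for illustration; the slides do not give the actual DataShare citation template.

```python
# Fields assumed to be required at deposit time (hypothetical set).
REQUIRED_FIELDS = ("creator", "date", "title", "publisher", "identifier")


def suggested_citation(record):
    """Build a citation string from a metadata record (a dict).

    Raises ValueError if any required field is missing or empty.
    """
    missing = [f for f in REQUIRED_FIELDS if not record.get(f)]
    if missing:
        raise ValueError("missing required fields: " + ", ".join(missing))

    creators = "; ".join(record["creator"])
    return (
        f"{creators} ({record['date']}). {record['title']}. "
        f"{record['publisher']}. {record['identifier']}"
    )


# Hypothetical record; the handle below is a placeholder, not a real item.
example = {
    "creator": ["Smith, J.", "Jones, A."],
    "date": "2014",
    "title": "Example dataset",
    "publisher": "University of Edinburgh",
    "identifier": "http://hdl.handle.net/0000/0000",
}
print(suggested_citation(example))
```

Because the citation is derived only from required fields, every landing page can display one without extra input from the depositor.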
6. Platform
• DSpace originally; in 2013 the RDM Steering Group
suggested pilot depositors to test fitness for purpose
• Advantages of continuity but we have also looked
at: Fedora, Dataverse, CKAN, Invenio
• Still mainly upload/download model, no
visualisation, modeling, online analysis
• SWORD and batch ingest recently enabled for
large and/or voluminous datasets (upload)
• Becoming part of a local system including CRIS,
Data Asset Register, Vault, Active Data Store
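For batch ingest, one option DSpace offers is its Simple Archive Format: one directory per item holding a dublin_core.xml file, a contents file listing the file names, and the files themselves. The slides do not say which batch format DataShare uses, so the sketch below is an assumption, and the helper name write_saf_item and the sample metadata are hypothetical.

```python
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path


def write_saf_item(item_dir, metadata, files):
    """Write one item directory in DSpace Simple Archive Format.

    metadata: mapping of (element, qualifier) to a list of values,
    e.g. {("title", "none"): ["Example dataset"]}.
    files: paths of the data files to copy into the item directory.
    """
    item_dir = Path(item_dir)
    item_dir.mkdir(parents=True, exist_ok=True)

    # dublin_core.xml: one <dcvalue> per metadata value.
    root = ET.Element("dublin_core")
    for (element, qualifier), values in metadata.items():
        for value in values:
            node = ET.SubElement(
                root, "dcvalue", element=element, qualifier=qualifier
            )
            node.text = value
    ET.ElementTree(root).write(
        str(item_dir / "dublin_core.xml"),
        encoding="utf-8",
        xml_declaration=True,
    )

    # Copy the files and list them, one name per line, in 'contents'.
    names = []
    for f in files:
        src = Path(f)
        shutil.copy2(src, item_dir / src.name)
        names.append(src.name)
    (item_dir / "contents").write_text(
        "\n".join(names) + "\n", encoding="utf-8"
    )
    return item_dir
```

A directory of such item folders can then be handed to the DSpace batch importer; for very large datasets this avoids pushing files through the web upload form.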
7. Metadata standards
• Based on DCMI and a few necessary system-
related fields
• Needs to be lightweight & multidisciplinary
• Conforms to DataCite minimum fields (DOIs soon)
• Discovery metadata only; documentation files
required to allow re-use (part of manual QA check)
• File formats are supported, known, or unknown, with
guidance given to depositors.
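Conformance to the DataCite minimum fields can be checked mechanically before a DOI is requested. The sketch below assumes the mandatory set is Identifier, Creator, Title, Publisher and PublicationYear (later DataCite schema versions also require ResourceType); the record layout and function name are hypothetical, and the mapping from DataShare's DCMI fields is not shown.

```python
# Mandatory properties assumed for this sketch; check the DataCite schema
# version in use before relying on this list.
DATACITE_MANDATORY = (
    "identifier",
    "creator",
    "title",
    "publisher",
    "publicationYear",
)


def missing_datacite_fields(record):
    """Return the mandatory DataCite fields that are absent or empty."""
    missing = []
    for field in DATACITE_MANDATORY:
        value = record.get(field)
        if isinstance(value, str):
            value = value.strip()  # treat whitespace-only strings as empty
        if not value:
            missing.append(field)
    return missing


# Hypothetical record with no publisher and a blank title.
record = {
    "identifier": "http://hdl.handle.net/0000/0000",
    "creator": ["Smith, J."],
    "title": "  ",
    "publicationYear": "2014",
}
print(missing_datacite_fields(record))
```

A record that returns an empty list has the minimum discovery metadata; documentation files are still needed for re-use and remain a manual check.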