Here are some potential barriers to data sharing and some possible actions to address them:- Confidentiality/privacy of human subjects - Anonymize or aggregate data, obtain consent for broad sharing - Intellectual property/commercialization - Use licenses that allow non-commercial or academic use and re-use- National security/cultural sensitivities - Restrict access to authorized users from certain regions/groups- Lack of standards/documentation - Improve metadata and documentation to enable others to understand and use- Lack of skills/resources - Provide training and support services, work with repositories that host and preserve- Embargo periods - Negotiate reasonable time periods with funders, make data
1. Research Data Management for librarians
Sarah Jones & Marieke Guy
Digital Curation Centre
Miggie Pickton, University of Northampton
2. About this course
Short presentations with exercises and discussion
Five main sections
― Research data and RDM (30 mins)
― Data Management Planning (30 mins)
― Data sharing (20 mins)
― Skills (30 mins)
― RDM at Northampton (30 mins)
Coffee break halfway through, after data sharing
3. Introductions
Introduce yourself and offer a reflection on the questions:
What is your understanding of research?
Do you know anything about data management?
What do you want to find out today?
Do you see a role for librarians in supporting RDM?
5. Exercise: What are research data?
In pairs, list as many types of data as you can, focusing
(if appropriate) on the subject areas you support
You have 5 minutes
6. What are research data?
All manner of things produced
in the course of research
7. Defining research data
Research data are collected, observed or created, for
the purposes of analysis to produce and validate
original research results
Both analogue and digital materials are 'data'
Lab notebooks and software may be classed as 'data'
Digital data can be:
― created in a digital form ('born digital')
― converted to a digital form (digitised)
8. Types of research data
Instrument measurements
Experimental observations
Still images, video and audio
Text documents, spreadsheets, databases
Quantitative data (e.g. household survey data)
Survey results & interview transcripts
Simulation data, models & software
Slides, artefacts, specimens, samples
Sketches, diaries, lab notebooks …
9. What is data management?
“the active management and appraisal of data over
the lifecycle of scholarly and scientific interest”
Digital Curation Centre
10. What is involved in RDM?
Data Management Planning
Creating data
Documenting data
Accessing / using data
Storage and backup
Sharing data
Preserving data
[Diagram: data lifecycle with the stages Create, Document, Use, Store, Share, Preserve]
11. RDM principles and advice to share with researchers
n.b. Data Management Planning and Data Sharing are
covered in separate sections
See in particular:
UK Data Archive, Managing and sharing data: best practice for researchers
http://data-archive.ac.uk/media/2894/managingsharing.pdf
12. Data creation
Decide what data will be created and how - this should
be communicated to the whole research team
Develop procedures for consistency and data quality
Choose appropriate software and formats - some are
better for long-term preservation and reuse
Ensure consent forms, licences and partnership
agreements don’t limit options to share data if desired
13. Documentation
Collect together all the information users would
need to understand and reuse the data
Create metadata at the time - it’s hard to do later
Use standards where possible
Name, structure and version files clearly
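One way to make the "name, structure and version files clearly" advice concrete is a small helper that builds file names from the same parts every time. The sketch below is only an illustration: the function name `build_filename`, the order of the parts and the two-digit version suffix are assumptions, not a convention prescribed by this course.

```python
import re
from datetime import date


def build_filename(project: str, description: str, when: date,
                   version: int, ext: str) -> str:
    """Build a consistent name: project_description_YYYY-MM-DD_vNN.ext.

    The pattern itself is a hypothetical convention chosen for this example.
    """
    def slug(text: str) -> str:
        # Lower-case the text and replace anything that is not a letter
        # or digit with a single hyphen, so names are safe on any system.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    extension = ext.lstrip(".").lower()
    return f"{slug(project)}_{slug(description)}_{when.isoformat()}_v{version:02d}.{extension}"


name = build_filename("RDM Survey", "Interview transcripts",
                      date(2013, 2, 1), 2, ".CSV")
print(name)
```

Putting the date in ISO format and padding the version number means that files sort in a sensible order in a normal directory listing.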
14. Access and use
Restrict access to those who need to read/edit data
Consider the data security implications of where you
store data and from which devices you access files
Choose appropriate methods to transfer / share data
― filestores & encrypted media rather than email & Dropbox
15. Storage and backup
Use managed services where possible e.g. University
filestores rather than local or external hard drives
Ask the local IT team for advice
3… 2… 1… backup!
― at least 3 copies of a file
― on at least 2 different media
― with at least 1 offsite
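The 3-2-1 rule above can be expressed as a simple check. This is a minimal sketch, assuming each copy is described by where it is held, what kind of medium it is on and whether it is offsite; the `Copy` class and the `meets_3_2_1` function are hypothetical names made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    location: str   # e.g. "University filestore" (illustrative label)
    medium: str     # e.g. "network storage", "external hard drive"
    offsite: bool   # True if the copy is held away from the main site


def meets_3_2_1(copies: list[Copy]) -> bool:
    """Return True if the copies satisfy the 3-2-1 backup rule."""
    enough_copies = len(copies) >= 3                      # at least 3 copies
    enough_media = len({c.medium for c in copies}) >= 2   # on at least 2 media
    one_offsite = any(c.offsite for c in copies)          # at least 1 offsite
    return enough_copies and enough_media and one_offsite


copies = [
    Copy("University filestore", "network storage", offsite=False),
    Copy("Office PC", "internal hard drive", offsite=False),
    Copy("Partner institution", "network storage", offsite=True),
]
print(meets_3_2_1(copies))
```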
16. Data selection
It’s not possible to keep everything. Select based on:
― What has to be kept e.g. data underlying publications
― What legally must be destroyed
― What can’t be recreated e.g. environmental recordings
― What is potentially useful to others
― The scientific or historical value
― ...
How to select and appraise research data:
www.dcc.ac.uk/resources/how-guides/appraise-select-research-data
17. Data preservation
Be aware of requirements to preserve data
Consult and work with experts in this field
Use available subject repositories, data centres and
structured databases
― http://databib.org
19. Data Management Planning
DMPs are written at the start of a project to define:
What data will be collected or created
How the data will be documented and described
Where the data will be stored
Who will be responsible for data security and backup
Which data will be shared and/or preserved
How the data will be shared and with whom
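The six points a DMP defines can double as a completeness checklist for a draft plan. The sketch below assumes a plan is held as a dictionary of free-text answers; the short keys (`"data"`, `"documentation"` and so on) and the function `unanswered` are hypothetical and are not part of any funder template or of DMPonline.

```python
# Hypothetical short keys mapped to the six points listed on the slide.
DMP_POINTS = {
    "data": "What data will be collected or created",
    "documentation": "How the data will be documented and described",
    "storage": "Where the data will be stored",
    "responsibility": "Who will be responsible for data security and backup",
    "selection": "Which data will be shared and/or preserved",
    "sharing": "How the data will be shared and with whom",
}


def unanswered(plan: dict[str, str]) -> list[str]:
    """List the points that a draft plan has not yet addressed."""
    return [
        point
        for key, point in DMP_POINTS.items()
        if not plan.get(key, "").strip()   # missing or blank answer
    ]


draft = {
    "data": "Interview audio and transcripts",
    "storage": "University filestore",
    "sharing": "   ",  # blank answers count as unanswered
}
for point in unanswered(draft):
    print("Still to address:", point)
```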
20. Why develop a DMP?
DMPs are often submitted with grant applications, but
are useful whenever researchers are creating data.
They can help researchers to:
Make informed decisions to anticipate & avoid problems
Avoid duplication, data loss and security breaches
Develop procedures early on for consistency
Ensure data are accurate, complete, reliable and secure
21. Which funders require a DMP?
www.dcc.ac.uk/resources/policy-and-legal/overview-funders-data-policies
22. What do research funders want?
A brief plan submitted in grant applications, and in the
case of NERC, a more detailed plan once funded
1-3 sides of A4 as attachment or a section in Je-S form
Typically a prose statement covering suggested themes
Outline data management and sharing plans, justifying
decisions and any limitations
23. Five common themes / questions
Description of data to be collected / created
(i.e. content, type, format, volume...)
Standards / methodologies for data collection & management
Ethics and Intellectual Property
(highlight any restrictions on data sharing e.g. embargoes, confidentiality)
Plans for data sharing and access
(i.e. how, when, to whom)
Strategy for long-term preservation
24. Exercise: My DMP - a satire
Read through the satirical DMP
Highlight examples of bad practice
Suggest alternative methods / approaches
You have 15 minutes
My Data Management Plan – a satire, Dr C. Titus Brown
http://ivory.idyll.org/blog/data-management.html
25. A useful framework to get started
Think about why the questions are being asked
Look at examples to get an idea of what to include
www.icpsr.umich.edu/icpsrweb/content/datamanagement/dmp/framework.html
26. Help from the DCC
https://dmponline.dcc.ac.uk
www.dcc.ac.uk/resources/how-guides/develop-data-plan
27. How DMPonline works
Create a plan based on relevant funder / institutional templates...
...and then answer the questions using the guidance provided
28. Supporting researchers with DMPs
Various types of support could be provided by libraries:
Guidelines and templates on what to include in plans
Example answers, guidance and links to local support
A library of successful DMPs to reuse
Training courses and guidance websites
Tailored consultancy services
Online tools (e.g. customised DMPonline)
29. Tips to share: writing DMPs
Keep it simple, short and specific
Seek advice - consult and collaborate
Base plans on available skills and support
Make sure implementation is feasible
Justify any resources or restrictions needed
Also see: http://www.youtube.com/watch?v=7OJtiA53-Fk
31. What is data sharing?
“… the practice of making data used for scholarly
research available to others.” [Wikipedia]
Who’s involved?
the data sharer
the data repository
the secondary data user
support staff!
32. Reasons to share data
BENEFITS
Avoid duplication
Scientific integrity
More collaboration
Better research
Increased citation
― 69% increase shown in study (Piwowar, 2007, PLoS)
DRIVERS
Public expectations
Government agenda
RCUK Data Policy
― www.rcuk.ac.uk/research/Pages/DataPolicy.aspx
Northampton RDM policy
― http://tiny.cc/Research-Data-Policy
33. The expectation of public access
The RCUK Common Principles state that:
“Publicly funded research data are a public good,
produced in the public interest, which should be
made openly available with as few restrictions as
possible in a timely and responsible manner that
does not harm intellectual property.”
34. Exercise: barriers to data sharing
Briefly list some reasons why certain data can’t be
shared and consider whether any actions could be
taken to reduce or overcome these restrictions
You have 10 minutes
Constraints on data sharing Possible solutions / approaches
35. Managing restrictions on sharing
Ethics
Balance data protection with data sharing
Informed consent – cover current and future use
Confidentiality – is anonymisation appropriate?
Access control – who, what, when?
IPR
Clarify copyright before research starts
Consider licensing options e.g. Creative Commons
36. Select formats for data sharing
It’s better to use formats that are:
Unencrypted
Uncompressed
Non-proprietary and not patent-encumbered
Open, documented standard
Standard representation (ASCII, Unicode)
Type             Recommended                     Avoid for data sharing
Tabular data     CSV, TSV, SPSS portable         Excel
Text             Plain text, HTML, RTF;          Word
                 PDF/A only if layout matters
Media            Container: MP4, Ogg             Quicktime
                 Codec: Theora, Dirac, FLAC      H264
Images           TIFF, JPEG2000, PNG             GIF, JPG
Structured data  XML, RDF                        RDBMS
(Source: Research360)
Further examples: http://www.data-archive.ac.uk/create-manage/format/formats-table
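The format recommendations can be turned into a quick check on a list of files before deposit. This is a rough sketch only: mapping the named formats to file extensions (for example `.por` for SPSS portable and `.mov` for Quicktime) is an assumption made here, a file extension does not reveal the codec inside a media container, and `sharing_advice` is a hypothetical helper rather than part of any repository tool.

```python
from pathlib import Path

# Assumed extension mappings for the formats named in the table.
RECOMMENDED = {".csv", ".tsv", ".por", ".txt", ".html", ".rtf", ".mp4",
               ".ogg", ".flac", ".tif", ".tiff", ".jp2", ".png", ".xml", ".rdf"}
AVOID = {".xls", ".xlsx", ".doc", ".docx", ".mov", ".gif", ".jpg", ".jpeg"}


def sharing_advice(filename: str) -> str:
    """Classify a file as 'recommended', 'avoid' or 'unknown' by extension."""
    ext = Path(filename).suffix.lower()
    if ext in RECOMMENDED:
        return "recommended"
    if ext in AVOID:
        return "avoid"
    return "unknown"   # not covered by the table, so ask for advice


for name in ["results.CSV", "notes.docx", "model.bin"]:
    print(name, "->", sharing_advice(name))
```

Files flagged "avoid" are candidates for conversion to a recommended format before sharing; files flagged "unknown" need a human decision.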
37. How to share research data
Use appropriate repositories
― http://databib.org
License the data so it is clear how it can be reused
― www.dcc.ac.uk/resources/how-guides/license-research-data
Make sure it’s clear how to cite the data
― http://www.dcc.ac.uk/resources/how-guides/cite-datasets
39. How are libraries engaging in RDM?
The library is leading on most DCC institutional engagements.
They are involved in:
defining the institutional strategy
developing RDM policy
delivering training courses
helping researchers to write DMPs
advising on data sharing and citation
setting up data repositories
www.dcc.ac.uk/community/institutional-engagements
[Diagram: Library, Research Office, IT, ...]
40. Why should libraries support RDM?
RDM requires the input of all support services, but
libraries are taking the lead in the UK – why?
― existing data and open access leadership roles
― often run publication repositories
― have good relationships with researchers
― proven liaison and negotiation skills
― knowledge of information management, metadata etc
― highly relevant skill set
41. Exercise: skills to support RDM
Based on the activities we discussed earlier, consider who
may have relevant skills or expertise to share.
You have 15 minutes
Activity              Library and Learning Services   IT Services   Other professional services
Copyright
Data citation
Information literacy
Data storage
Digital preservation
Metadata
...
42. Possible Library RDM roles
Leading on local (institutional) data policy
Bringing data into undergraduate research-based learning
Teaching data literacy to postgraduate students
Developing researcher data awareness
Providing advice, e.g. on writing DMPs or on RDM within a project
Explaining the impact of sharing data, and how to cite data
Signposting who in the Uni to consult in relation to a particular question
Auditing to identify data sets for archiving or RDM needs
Developing and managing access to data collections
Documenting what datasets an institution has
Developing local data curation capacity
Promoting data reuse by making known what is available
(Source: RDMRose Lite)
43. An exciting opportunity
“Researchers need help to manage
their data. This is a really exciting
opportunity for libraries….”
Liz Lyon, VALA 2012
Leadership
Providing tools and support
Advocacy and training
Developing data informatics capacity & capability
44. Potential challenges
Librarians are already over-taxed!
― Other challenges in supporting research (Auckland, 2012)
― Getting up-to-speed and keeping up-to-date
How deep is our understanding of research, especially
scientific research, and how strong is our subject knowledge?
Translating library practices to research data issues
Will researchers look to libraries for this support?
Still need to resource and develop infrastructure
(Source: RDMRose Lite)
46. RDM drivers at Northampton
REF: research environment; impact
Institutional reputation
Pressure from funders: government; RCUK; EPSRC
(sharing mandates)
Publisher demands: evidence to support published work
Legislative requirements: FOI/EIR requests; Data
Protection
Long term (open) access: reuse and repurpose
Good research practice
47. A (very) brief history of RDM at Northampton
May-June 2010
• First research data (DAF) project aims to establish researchers’ current RDM practices
October 2010
• DAF project report presented to University Research Committee (URC)
• URC working group convened to develop research data policy
Jan-June 2011
• Research Data Policy proposed, refined and approved by URC
April 2012
• Research data roadmap created in response to EPSRC requirements
• DCC ‘engagement’ starts
Ongoing
• RDM training and guidance for researchers – led by DCC, supported by LLS
• Piloting of TUNDRA2 for research data storage and access
48. Northampton RDM policy
Adopt the RCUK code of good practice
Write and follow a Data Management Plan
Make data accessible wherever possible
Deposit in a repository for preservation
www.northampton.ac.uk/info/20283/academic-research/1606/research-data-policy
49. UoN research data roadmap
Maps current and planned practice to EPSRC expectations
Covers: awareness of regulatory environment; connection with
published papers; access to datasets; use of metadata; and
data curation
Coverage extended to all subject areas to encourage good data
management practice and ensure equality of provision
Roadmap approved by R&EC in April 2012
But extra resources still need approval by UET
50. DCC Engagement
So far DCC staff have run training sessions on:
― Managing your PhD data (for research students)
― Managing data through the research lifecycle (Business)
― Meeting funders’ requirements for RDM (Social Sciences)
And provided guidance:
― Creation of a DMPonline template for the University of Northampton, with
attached guidelines
― Development of a guide to meeting ESRC data management planning
requirements (in conjunction with John Horton)
We have also run one-to-one RDM clinics for researchers
Still to come:
― Further training for Schools
― Series of posts on the Research Support Hub
― Further support for research data storage...
51. TUNDRA2 for research data
The University (led by Phil Oakman) is rolling out TUNDRA2
― open content management system
― to store, manage and preserve files
― facility to share internally and externally
Jane Callaghan & colleagues are piloting this for managing
research data in her big European project
Phil hopes to develop a generic template in TUNDRA2 that
will serve other research projects
Let us know if you know of others who would like to be
involved
52. Exercise: supporting RDM at Northampton?
In small groups, discuss which activities you think
should fall within your role and which shouldn’t.
Do you feel confident to support RDM?
How would you like to see things develop?
You have 15 minutes
54. Summary
In the light of external drivers, researchers at
Northampton need support for RDM
LLS has a key role in shaping services for researchers
in this area
LLS staff have an opportunity to apply their skills in a
new and exciting way
55. Feedback
Has the event met your expectations?
― If not, what would you have liked to see more / less of?
Was the content useful?
Did you like the mix of exercises?
56. Acknowledgement
Ideas and content have been taken from various courses:
― Skills matrix, ADMIRe project, University of Nottingham
http://admire.jiscinvolve.org/wp/2012/09/18/rdmnottingham-training-event
― DIY Training Kit for Librarians, University of Edinburgh
http://datalib.edina.ac.uk/mantra/libtraining.html
― Managing your research data, Research360, University of Bath
http://opus.bath.ac.uk/32296
― RDMRose Lite, University of Sheffield
http://rdmrose.group.shef.ac.uk/?page_id=364
― RoaDMaP training materials, University of Leeds
http://library.leeds.ac.uk/roadmap-project-outputs
― SupportDM modules, University of East London
http://www.uel.ac.uk/trad/outputs/resources