A talk given at a Vitae event in Leeds, 2015-12-01, on how universities and other research organisations can help their researchers practise open research, with a special focus on the training resources provided by FOSTER.
Supporting open research - how to help your researchers - Vitae15
1. Supporting open research – how
to help your researchers
Kevin Ashley
Digital Curation Centre
www.dcc.ac.uk
@kevingashley
Kevin.ashley@ed.ac.uk
Reusable with attribution: CC-BY
The DCC is supported by Jisc, the
European Commission and the
University of Edinburgh
With content from Sarah Jones (DCC Glasgow),
Martin Donnelly (DCC Edinburgh), Astrid Orth &
Birgit Schmidt (State and University Library
Göttingen), Dan North (LIBER)
2. My home – the DCC
• Mission – to
increase capability
and capacity for
research data
services in UK
institutions
• Not just a UK
problem – an
international one
• Training, shared
services, guidance,
policy, standards,
futures
2015-12-01 Kevin Ashley – Vitae event, Leeds - CC-BY 2
3. DCC networks and partnerships
Original Slide:
Martin Donnelly,
DCC
4. What is open research?
“research carried out and communicated in a manner
which allows others to contribute, collaborate and add
to the research effort, with all kinds of data, results
and protocols made freely available at different stages
of the research process.”
Research Information Network, Open Science case studies
www.rin.ac.uk/our-work/data-management-and-curation/open-science-case-studies
5. More than open access publishing
CC-BY Andreas Neuhold
https://commons.wikimedia.org/wiki/File:Open_Science_-_Prinzipien.png
6. Open methods
• Documenting and sharing workflows and methods
• Sharing code and tools to allow others to reproduce
work
• Using web based tools to facilitate collaboration and
interaction from the outside world
• Open notebook science – “when there is a URL to a
laboratory notebook that is freely available and
indexed on common search engines.”
http://drexel-coas-elearning.blogspot.co.uk/2006/09/open-notebook-science.html
7. Research as a cycle
SHARE
…and
RE-USE
The DataONE
lifecycle model
8. What should the institution do for
researchers?
• Provide incentives (not all need them)
• Provide support:
– examples
– tools and services
– Provide knowledge and skills
• Being open with own content
• Using others’ open content
9. Incentives
COMPLIANCE-DRIVEN
• Your funder requires open
data/papers/code etc
• University policy requires it
• Professional practice/ethics
require it
ADVANTAGE-DRIVEN
• You will get results more
quickly
• Your work will be more
highly-cited
• You are more likely to be
invited to join collaborative
research proposals
• You will be more
employable
10. Increased use and economic benefit
Up to 2008
• Sold through the US Geological
Survey for US$600 per scene
• Sales of 19,000 scenes per year
• Annual revenue of $11.4 million
Since 2009
• Freely available over the internet
• Google Earth now uses the images
• Transmission of 2,100,000
scenes per year.
• Estimated to have created value for the
environmental management industry of
$935 million, with direct benefit of
more than $100 million per year to the
US economy
• Has stimulated the development of
applications from a large number of
companies worldwide
The case of NASA Landsat satellite imagery of the Earth’s surface:
http://earthobservatory.nasa.gov/IOTD/view.php?id=83394&src=ve
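The figures on this slide can be cross-checked with a few lines of arithmetic. The Python sketch below uses only the numbers quoted above; the variable names are illustrative and are not part of the original talk.

```python
# Figures quoted on the slide (Landsat, before and after open release).
PRICE_PER_SCENE_USD = 600
SCENES_SOLD_PER_YEAR = 19_000
SCENES_TRANSMITTED_PER_YEAR = 2_100_000

# Revenue under the paid model: price per scene times scenes sold.
annual_revenue = PRICE_PER_SCENE_USD * SCENES_SOLD_PER_YEAR

# How many times more scenes are delivered once the data is free.
usage_ratio = SCENES_TRANSMITTED_PER_YEAR / SCENES_SOLD_PER_YEAR

print(f"Annual revenue up to 2008: ${annual_revenue:,}")
print(f"Increase in scenes delivered: about {usage_ratio:.0f}x")
```

The product matches the $11.4 million annual revenue on the slide, and the ratio shows that use grew roughly a hundredfold after open release.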
11. DCC Policy Summary
http://www.dcc.ac.uk/resources/policy-and-legal
12. Incentive - Citability
• Making data available increases citations
• Everyone – academic, funder, institution –
loves citations
• Want evidence?
– Alter, Pienta, Lyle – 240%, social sciences *
– Piwowar, Vision – 9% (microarray data)†
– Henneken, Accomazzi – 20% (astronomy) #
† Piwowar H, Vision TJ. (2013) Data reuse & the open data citation advantage. PeerJ PrePrints 1:e1v1
http://dx.doi.org/10.7287/peerj.preprints.1v1
* Amy Pienta, George Alter, Jared Lyle, (2010) The Enduring Value of Social Science Research: The Use and Reuse of Primary Research Data.
http://hdl.handle.net/2027.42/78307
# Edwin Henneken, Alberto Accomazzi, (2011) Linking to Data - Effect on Citation Rates in Astronomy. http://arxiv.org/abs/1111.3618
13. Not just with data…
Vandewalle (2012) DOI: 10.1109/MCSE.2012.63
Slide: Neil Chue Hong
14. Examples - Data reuse stories
• The palaeontologist who saved years of work with
archaeological data
• The 19th-century ships’ logs that help us model climate
change
• The ‘noise’ from research radar that mapped dust from
Eyjafjallajökull
• See also:
– Whyopenresearch.org
15. Data reuse from Hubble
17. Some example services
• Storage – persistent, shareable
• Permanent, citeable identifiers
• Database as a service (e.g. Oxford ORDS)
• Embed tools in Excel – DataUp, others
• Workflow management – Taverna
• Training for early career researchers
20. Open access button
The Open Access Button helps you get the research
you want right now (without paying for it), and adds
papers you still need to your wishlist.
https://openaccessbutton.org
21. How to make data open?
1. Choose your dataset(s)
- What can you make open? You may need to revisit this step if you
encounter problems later.
2. Apply an open licence
- Determine what IP exists. Apply a suitable licence e.g. CC-BY
3. Make the data available
- Provide the data in a suitable format. Use repositories.
4. Make it discoverable
- Post on the web, register in catalogues…
https://okfn.org
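The four steps above can be sketched as a simple release checklist. This is a minimal illustration, not an official tool: the `Dataset` class, its field names, and the set of licence identifiers are all assumptions introduced here, and step 1 (choosing the dataset) is taken as already done.

```python
from dataclasses import dataclass, field

# Assumption: SPDX-style identifiers for the open licences named in the talk.
OPEN_LICENCES = {"CC0-1.0", "CC-BY-4.0"}


@dataclass
class Dataset:
    """Hypothetical record of how far a dataset is through the release steps."""
    name: str
    licence: str = ""
    repository_url: str = ""
    catalogues: list = field(default_factory=list)


def outstanding_steps(ds):
    """Return the release steps (2 to 4) that are not yet complete."""
    steps = []
    if ds.licence not in OPEN_LICENCES:
        steps.append("apply an open licence")
    if not ds.repository_url:
        steps.append("make the data available")
    if not ds.catalogues:
        steps.append("make it discoverable")
    return steps


# A dataset with a non-commercial licence and no deposit yet.
draft = Dataset(name="example-dataset", licence="CC-BY-NC-4.0")
print(outstanding_steps(draft))
```

A licence with an NC clause is treated as incomplete here, in line with the point that such clauses are not open.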
22. www.dcc.ac.uk/resources/how-guides/license-research-data
Licensing research data openly
This DCC guide outlines the pros and cons
of each approach and gives practical
advice on how to implement your licence
CREATIVE COMMONS LIMITATIONS
NC Non-Commercial
What counts as commercial?
ND No Derivatives
Severely restricts use
Licences with these clauses are not open
Horizon 2020 Open Access
guidelines point to:
CC-0 or CC-BY
23. EUDAT licensing tool
Answer questions to determine which licence(s)
are appropriate to use
http://ufal.github.io/lindat-license-selector
24. Metadata standards to use
Use relevant standards for interoperability
www.dcc.ac.uk/resources/metadata-standards
25. Data repositories
http://databib.org
http://service.re3data.org/search
• Does your publisher or funder suggest a repository?
• Are there data centres or community databases for your discipline?
• Does your university offer support for long-term preservation?
Zenodo
• OpenAIRE-CERN joint effort
• Multidisciplinary repository
• Multiple data types
– Publications
– Long tail of research
data
• Citable data (DOI)
• Links funding, publications,
data & software
www.zenodo.org
26. OBJECTIVES
• To support different stakeholders, especially
younger researchers, in adopting open access in the
context of the European Research Area (ERA) and in
complying with the open access policies and rules
of participation set out for Horizon 2020
• To integrate open access principles and practice in
the current research workflow by targeting the
young researcher training environment
• To strengthen institutional training capacity to
foster compliance with the open access policies of
the ERA and Horizon 2020 (beyond the FOSTER
project)
• To facilitate the adoption, reinforcement and
implementation of open access policies from other
European funders, in line with the EC’s
recommendation, in partnership with PASTEUR4OA
project
Facilitate Open Science Training for European Research
The FOSTER project
27. METHODS
• Identifying existing content
• Developing a portal to support e-learning
• Delivering face-to-face training
Facilitate Open Science Training for European Research
The project
29. Training Toolkit
• Helping you to create your own Training Event
• Suggestion: 6 TYPES OF TRAINING SESSIONS
1. Expert talk: ‘ex cathedra’ talk by an external expert on the subject, preferably
followed by Q&A.
2. Talk by peers: experience-based talk by a peer, preferably followed by Q&A.
3. Panel session: panel consisting of three or more experts, preferably with audience
engagement.
4. Workshop: informal, hands-on session led by an expert. Can be aimed at creation
of tools/policies or just include practical exercises.
5. Group work/Break-out sessions: informal sessions where experts and/or peers
share knowledge and/or experiences.
6. E-learning: using online educational technologies for learning and teaching (online
courses, webinars, etc.).
30. FOSTER Portal
• All materials are free
to use, and can be
edited, repurposed,
recombined to suit
your own training
needs.
• Many of these
materials are being
compiled into
courses. [Work in
progress.]
32. Summary
• Persuade those who need persuading that
open research is good for them and you
• Support them with services and training
• You aren’t alone – help is available
Editor's Notes
I’m Kevin Ashley; I run an organisation called the Digital Curation Centre (DCC) in the UK, and I’ve been invited here today to talk about the lessons we have learnt about establishing research data management services.
My home – the DCC – is a national service whose role is to increase the capability and capacity of UK research institutions – mainly universities – to run their own research data services. Where it makes sense, we also run some national services which those universities use. Because this is not just a national problem, we work alongside many partners and colleagues around the world. We provide training, shared services, guidance and policy; we develop standards; and we look at possible future directions in this area.
But the strapline – the phrase at the heading of our web pages – bears closer examination. “Because good research needs good data” is behind all of what we do and much of what I’ll say today. The data we use isn’t always ours, but it always needs to be good.
And a couple I forgot to put on the original slide
INNOVATION: Enables new uses for publication (e.g. text mining and automation)
There’s also an economic benefit, as seen in the case of the NASA Landsat satellite images. These were sold until 2008 for $600 a scene. Now they’re freely available and used by Google Earth. Previously they sold 19,000 scenes a year, whereas now they transmit 2.1 million. The economic return has risen sharply too: annual sales revenue was $11.4 million, whereas open release is estimated to have created $935 million of value for the environmental management industry, with direct benefit of more than $100 million per year to the US economy. The release has also stimulated the development of applications from companies worldwide.
This case study comes from the Royal Society Report on Science as an Open Enterprise.
Did I mention that making data available increases citations? This is a win all round. If you don’t believe me, here are three studies from three different areas that all show robust, positive correlations. The effect size varies with discipline, but we have enough evidence now that anyone who says that their area is different needs to come up with evidence to show why.
There are many such stories of unexpected data reuse; these are a few examples. The last, exemplified in the Old Weather project, is seeing the original data being reused for at least the third time and in doing so is helping both climatologists and family historians through a single piece of transcription work. An impressive result.
Many of you may be familiar with this graph from the Hubble Space Telescope data archive. It tells the same story in a different way, and also tells a story about the transformation of astronomy as a discipline. In the days of photographic plates, sharing (analogue) astronomical data was difficult. Digital instruments transformed this, and some time around 2000, more research was being done with old data than with new data.
Which leads to our second lesson. Some people say data sharing and reuse is a difficult change for researchers. In some disciplines, it is. But many have been doing it for some time, and those that have changed have benefited as a result.
Guidance from the DCC can also help researchers to understand data licensing. This guide outlines the pros and cons of each approach, e.g. the limitations of some CC options.
The OA guidelines under Horizon 2020 point to CC-0 or CC-BY as a straightforward and effective way to make it possible for others to mine, exploit and reproduce the data. See p11 at: http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-pilot-guide_en.pdf
To make sure their data can be understood by themselves, their community and others, researchers should create metadata and documentation.
Metadata is basic descriptive information to help identify and understand the structure of the data e.g. title, author...
Documentation provides the wider context. It’s useful to share the methodology / workflow, software and any information needed to understand the data e.g. explanation of abbreviations or acronyms
There are lots of standards that can be used. The DCC started a catalogue of disciplinary metadata standards which is now being taken forward as an international initiative via an RDA working group
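As a rough illustration of the basic descriptive metadata mentioned above, the Python sketch below builds a minimal record and checks it for missing fields. The field names and values are hypothetical and do not follow any particular standard; a real deposit should use the metadata standard of the researcher's discipline or chosen repository.

```python
import json

# Hypothetical example record; every value here is a placeholder.
record = {
    "title": "Example survey dataset",
    "creator": "A. Researcher",
    "licence": "CC-BY-4.0",
    "description": "Placeholder description of how the data was collected.",
    # Documentation: explain abbreviations so others can read the data.
    "abbreviations": {"RDM": "research data management"},
}

# Assumption: the minimum fields needed to identify and reuse the data.
REQUIRED = ("title", "creator", "licence")


def missing_fields(rec):
    """List required fields that are absent or empty in the record."""
    return [name for name in REQUIRED if not rec.get(name)]


print(json.dumps(record, indent=2))
print("Missing:", missing_fields(record))
```

Keeping the record in a plain, machine-readable form such as JSON makes it easier to convert to whichever standard a repository requires.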
The FOSTER project – what and how.
FOSTER’s training strategy uses a combination of methods and activities, from face-to-face training, to the use of e-learning, blended and self-learning, as well as the dissemination of training materials/contents/curricula via a dedicated training portal, plus a helpdesk. Face-to-face training targets graduate schools in European universities and in particular will train trainers/teachers/multipliers who can conduct further training and dissemination activities in their institution, country and disciplinary community. FOSTER combines experiences and materials to showcase best practices, setting the scene for an active learning and teaching community for open access practices across Europe.
The main outcomes of the project are:
• The FOSTER portal, hosting training courses and curricula;
• Facilitation of FOSTER training events and the creation of training content across Europe;
• Identification of existing content that can be reused in training activities, and development, creation or enhancement of content where needed.
It is important to note that this is not about standardising the training that people receive: individual organisations will always have varying needs in terms of their training provision. But it is about reducing the confusion or complexity that can come when materials are dispersed in discrete units across the far corners of the internet. And by organising those materials into courses, the aim is to create a cohesive e-learning experience out of some of the modular content that exists, so that you can target particular topics and match them to the needs of your staff. Although the portal contents are not exhaustive, they will continue to grow, especially in response to community feedback, and all of the portal contents have been evaluated for quality by FOSTER partners.