The document provides an introduction to open science and the European Open Science Cloud (EOSC). It discusses the concepts of open access, open data, open methods, and FAIR data principles. It describes the EOSC as a federation of research infrastructures and services that aims to enable multidisciplinary discovery and use. Key benefits of the EOSC for researchers include access to more services, funding for compute resources, easier discovery of related data, and greater collaboration abilities.
University of Liverpool Researcher KnowHow session presented by Judith Carr.
At the end of this session you will know what the FAIR data principles are, what is required and be in a position to think how these would relate to your research practice.
Introduction to Persistent Identifiers | www.eudat.eu | EUDAT
What are persistent identifiers? Why use persistent identifiers? Different persistent identifier systems; The HANDLE system; EPIC PID system; Policies; Use cases
Ver 2 July 2017
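As an illustrative sketch of the resolution idea behind the Handle and DOI systems covered in the EUDAT deck (the helper function and example handle below are assumptions for illustration, not from the slides), a persistent identifier is typically turned into a working link by prepending a resolver proxy:

```python
# Illustrative sketch: building resolver URLs for persistent identifiers.
# The proxy addresses are the public Handle and DOI proxies; the helper
# name and the non-DOI example identifier are invented for illustration.

HANDLE_PROXY = "https://hdl.handle.net/"
DOI_PROXY = "https://doi.org/"

def resolution_url(pid: str) -> str:
    """Return the HTTP URL at which a persistent identifier resolves.

    DOIs are handles whose prefix starts with '10.', so they are routed
    to the DOI proxy; other handles go to the general Handle proxy.
    """
    if pid.startswith("10."):
        return DOI_PROXY + pid
    return HANDLE_PROXY + pid

print(resolution_url("10.5281/zenodo.1065991"))
# https://doi.org/10.5281/zenodo.1065991
```

The point of the indirection is that the proxy can be repointed at a new location when the data moves, so citations keep working.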
An introduction to the FAIR principles and a discussion of key issues that must be addressed to ensure data is findable, accessible, interoperable and re-usable. The session explored the role of the CDISC and DDI standards for addressing these issues.
Presented by Gareth Knight at the ADMIT Network conference, organised by the Association for Data Management in the Tropics, in Antwerp, Belgium on December 1st 2015.
An overview on FAIR Data and FAIR Data stewardship, and the roadmap for FAIR Data solutions coordinated by the Dutch Techcentre for Life Sciences. This presentation was given at the Netherlands eScience Center's "Essential skills in data-intensive research" course week.
Open Data Institute Course - Open Data in a Day conducted by Registered ODI Trainer Ian Henshaw on October 14, 2015 in RTP, NC USA - Deck #1 Introduction to Open Data
The Data Plan as a Tool for Open Science | Lourdes Feria
Everything you need to know to prepare your research data plan: What type of data will you create? How will you document it? How will you look after sensitive data? What will you do with the data at the end? How will you share it? Every thesis writer and academic who produces scientific publications needs to know this tool.
Data Catalog as the Platform for Data Intelligence | Alation
Data catalogs are in wide use today across hundreds of enterprises as a means to help data scientists and business analysts find and collaboratively analyze data. Over the past several years, customers have increasingly used data catalogs in applications beyond their search & discovery roots, addressing new use cases such as data governance, cloud data migration, and digital transformation. In this session, the founder and CEO of Alation will discuss the evolution of the data catalog, the many ways in which data catalogs are being used today, the importance of machine learning in data catalogs, and discuss the future of the data catalog as a platform for a broad range of data intelligence solutions.
Presentation given at Macquarie University in support of the ARDC 'institutional role in the data commons' project on "Implementing FAIR: Standards in Research Data Management" https://ardc.edu.au/news/data-and-services-discovery-activities-successful-applicants/
Talk given at Fronteers 2015 in Amsterdam.
In a world where many of our digital spaces are becoming more closed than ever, open data is a concept that is rapidly on the rise.
In this talk we'll explore what open data is (and what it isn't), and why we should care about it. We'll look at how you can introduce it into your projects with regards to practical publication and consumption, and discuss some useful tools and reference points.
Open data isn't just dry and technical - it gives us great scope to be creative, and throughout this talk we'll go through some of the amazing things that it has been used for globally in the hope that it will inspire you to create something amazing yourself.
How a Semantic Layer Makes Data Mesh Work at Scale | DATAVERSITY
Data Mesh is a trending approach to building a decentralized data architecture by leveraging a domain-oriented, self-service design. However, the pure definition of Data Mesh lacks a center of excellence or central data team and doesn’t address the need for a common approach for sharing data products across teams. The semantic layer is emerging as a key component to supporting a Hub and Spoke style of organizing data teams by introducing data model sharing, collaboration, and distributed ownership controls.
This session will explain how data teams can define common models and definitions with a semantic layer to decentralize analytics product creation using a Hub and Spoke architecture.
Attend this session to learn about:
- The role of a Data Mesh in the modern cloud architecture.
- How a semantic layer can serve as the binding agent to support decentralization.
- How to drive self service with consistency and control.
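The Hub and Spoke idea above can be sketched in a few lines: a central team publishes shared metric definitions once, and domain teams reference them rather than re-implementing the logic. All metric, table, and function names below are invented for illustration and are not from the session:

```python
# Illustrative sketch of a semantic-layer idea: the "hub" team owns the
# metric definitions; "spoke" domain teams render them against their own
# tables, so every team computes the metric the same way.

SHARED_METRICS = {
    # metric name -> (business description, SQL expression)
    "active_users": ("Distinct users with an event in the period",
                     "COUNT(DISTINCT user_id)"),
    "revenue": ("Sum of completed order amounts",
                "SUM(amount) FILTER (WHERE status = 'completed')"),
}

def metric_sql(name: str, table: str) -> str:
    """Render the shared definition of a metric against a team's table."""
    _, expr = SHARED_METRICS[name]
    return f"SELECT {expr} FROM {table}"

# Two spoke teams reuse one definition, applied to their own tables:
print(metric_sql("active_users", "marketing.events"))
print(metric_sql("active_users", "product.events"))
```

The design choice the sketch illustrates is distributed ownership of data products with centralized ownership of definitions, which is the "binding agent" role the session ascribes to the semantic layer.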
Ethics of Big Data is about finding alignment between an organization's core values and their day-to-day actions in a way that balances risk and innovation. As Big Data brings business operations and practices deeper and more fully into individual lives, it is creating a forcing function that raises ethical questions about our values around concepts like identity, privacy, ownership, and reputation. How we understand those values and align them with our actions when innovating products and services using Big Data technologies benefits from a framework that provides a common vocabulary and encourages explicit discussion.
The material will address the intersection of ethics and Big Data; what it is and what it isn't. Specifically, how to approach and generate dialog about an abstract subject with direct, real-world implications. A general framework for talking about ethics in the context of Big Data will be introduced.
Aspects include:
1. Direct relevance to your data handling practices
2. How Big Data is influencing important concepts including identity, privacy, ownership, and reputation
3. Ethical Decision Points
4. Value Personas as a tool for encouraging discussion and generating agreement and alignment between values and actions
5. Balancing the benefits of Big Data innovation and the risks of harm
The webcast will present key concepts from the forthcoming book Ethics of Big Data.
Active Governance Across the Delta Lake with Alation | Databricks
Alation provides a single interface through which users and stewards can apply active and agile data governance across Databricks Delta Lake and the Databricks SQL Analytics Service. Understand how Alation can expand adoption of the data lake while enabling safe and responsible data consumption.
Juanjo Hierro - Introduction and overview of FIWARE Vision on Data Spaces.pdf | FIWARE
This session will bring you the opportunity to discover how FIWARE will make Data Spaces happen! Contents will give all the details and insights around the path taken in this strategic area. An introduction will provide the overall vision on Data Spaces, the status of the Data Spaces Business Alliance (DSBA) Technical Convergence activities, and initial considerations around the concept of FIWARE Data Space Connector, the first dataspace connector that will comply with the Data Space Business Alliance recommendations.
Different coordination and support actions of the Digital Europe Programme (DEP) in the Data Spaces domain will also be presented, as well as initial outputs from these projects. It will provide insights about the opportunities to influence and drive decisions within this important program of the European Union.
A series of presentations will deep dive into technical details about the minimum viable framework recommended in DSBA: the standards proposed and how they integrate together. Concretely, presentations will focus on the pillars linked to decentralized Trust, Identity & Access Management and the pillar for Data Value creation covering aspects for Monetization and Marketplace services.
Several presentations will tackle elements that open the discussion around the evolution of Data Spaces, as well as components expected to be integrated in the concept of Data Space Connector. They will be followed by use cases that provide insight on what is being developed and testimonies on how technologies based on Data Spaces concepts previously displayed are being used in real life scenarios.
RWDG Webinar: Data Steward Definition and Other Data Governance Roles | DATAVERSITY
The role of the Data Steward is critical to the success of a Data Governance program. There are several approaches to Stewardship, including assigning people to be Data Stewards, identifying existing Data Stewards, and recognizing Data Stewards according to their relationship to the data they define, produce, and use. However, Stewards are only one of several Data Governance roles that must be considered.
In this month’s RWDG webinar, Bob Seiner will discuss several approaches to defining the role of the Data Steward, as well as the other roles necessary for Data Governance program success. Data Governance roles must include operational, tactical, strategic, and supporting levels of responsibility. Spend an hour with Bob, where he will share a customizable Operating Model of Data Governance roles and responsibilities.
In this webinar, Bob will discuss:
• Several approaches to defining Data Stewards and Stewardship
• How to select the Stewardship approach that is right for you
• Different levels of Stewards required for a successful program
• An Operating Model of DG Roles that can be molded to fit in any culture
• Why the approach to defining DG roles can make or break the program
Data is everywhere, and delivering trustable data to anyone who needs it has become a challenge. But innovative technologies come to the rescue: through smart semantics, metadata management, auto-profiling, faceted search, and collaborative data curation there is a way to establish a Wikipedia-like approach for your data. Find out how Talend will help you operationalize more data faster and increase data usage for everyone with an Enterprise Data Catalog.
The ability to continuously innovate is crucial for business growth – and often necessary for survival. Leaders in an uncertain and fast-paced global business regularly seek innovation to revitalise rigid business models and processes. However, they are aware that ‘innovation is hard’ and fraught with uncertainty. I contend that Big Data Analytics – in addition to its many other business benefits – can guide the innovation process to make it more efficient, effective and predictable.
Big Data Analytics promotes the application of a data-driven mindset that ‘listens to the data’ for new insights and disrupts entrenched thinking that hinders innovation. It applies what-if analysis to assess impact of new ideas on key business metrics and uses evidence-based business performance analysis to track the impact of innovation. Integrating Big Data Analytics into the business planning and operational processes provides valuable feedback loops and enables an adaptive innovation process.
In short, Big Data Analytics can spark innovation, guide its refinement and adoption processes and sustain its ongoing implementation.
Data Catalogs Are the Answer – What is the Question? | DATAVERSITY
Organizations with governed metadata made available through their data catalog can answer questions their people have about the organization’s data. These organizations get more value from their data, protect their data better, gain improved ROI from data-centric projects and programs, and have more confidence in their most strategic data.
Join Bob Seiner for this lively webinar where he will talk about the value of a data catalog and how to build the use of the catalog into your stewards’ daily routines. Bob will share how the tool must be positioned for success and viewed as a must-have resource that is a steppingstone and catalyst to governed data across the organization.
Data Catalog for Better Data Discovery and Governance | Denodo
Watch full webinar here: https://buff.ly/2Vq9FR0
Data catalogs are en vogue, answering critical data governance questions like “Where all does my data reside?” “What other entities are associated with my data?” “What are the definitions of the data fields?” and “Who accesses the data?” Data catalogs maintain the necessary business metadata to answer these questions and many more. But that’s not enough. To be useful, data catalogs need to deliver these answers to business users right within the applications they use.
In this session, you will learn:
*How data catalogs enable enterprise-wide data governance regimes
*What key capability requirements should you expect in data catalogs
*How data virtualization combines dynamic data catalogs with delivery
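The governance questions quoted in the abstract above map directly onto the business metadata a catalog entry holds. A minimal sketch, with all dataset, field, and consumer names invented for illustration (this is not Denodo's data model):

```python
# Illustrative sketch of catalog business metadata: one entry records
# where the data resides, what its fields mean, and who accesses it --
# the questions the webinar says a catalog must answer.
from dataclasses import dataclass, field

@dataclass
class CatalogEntry:
    name: str
    location: str                                   # where the data resides
    field_definitions: dict                         # business definitions
    consumers: list = field(default_factory=list)   # who accesses the data

catalog = {
    "orders": CatalogEntry(
        name="orders",
        location="warehouse.sales.orders",
        field_definitions={"amount": "Order total in EUR, incl. VAT"},
        consumers=["finance-dashboard", "churn-model"],
    )
}

entry = catalog["orders"]
print(entry.location)   # warehouse.sales.orders
print(entry.consumers)  # ['finance-dashboard', 'churn-model']
```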
An open science presentation focusing on the benefits to be gained and basic practices to follow. This was given on behalf of FOSTER at the Open Science Boos(t)camp event at KU Leuven on 24th October 2014.
An introduction to open science, why it's important and how to do it. This presentation was given at the European Medical Students Association (EMSA) event, 'Open Access in Action' in Berlin on 14th-15th September 2015
Presentation investigating the state of FAIR practice and what is needed to turn FAIR data into reality, given at the Danish FAIR conference in Copenhagen on 20th November 2018. https://vidensportal.deic.dk/en/Programme/FAIR_Toolbox_Nov2018 The presentation reflects on recent FAIR studies and international initiatives and outlines the recommendations emerging from the European Commission's FAIR Data Expert Group report - http://tinyurl.com/FAIR-EG
On November 21st 2014 at the Tufts University Medford campus and November 25th 2014 at the campus of the University of Massachusetts Medical School in Worcester, the BLC and Digital Science hosted a workshop focused on better understanding the research information management landscape.
Mark Hahnel, CEO of Figshare discussed more specific aspects of the research data management landscape and various approaches to address the growing suite of mandates.
Open Data in a Big Data World: easy to say, but hard to do? | LEARN Project
Presentation at 3rd LEARN workshop on Research Data Management, “Make research data management policies work”
Helsinki, 28 June 2016, by Sarah Callaghan, STFC Rutherford Appleton Laboratory
A presentation offering an introduction to managing and sharing research data given at the Czech Open Science days as part of the EC-funded FOSTER project.
FAIR Data in trustworthy repositories: the basics | OpenAIRE
This video illustrates how certified digital repositories contribute to making and keeping research data findable, accessible, interoperable and reusable (FAIR). Trustworthy repositories support Open Access to data, as well as Restricted Access when necessary, and they offer support for metadata, sustainable and interoperable file formats, and persistent identifiers for future citation. Presented by Marjan Grootveld (DANS, OpenAIRE).
Main references
• Core Trust Seal for trustworthy digital repositories: https://www.coretrustseal.org/
• EUDAT FAIR checklist: https://doi.org/10.5281/zenodo.1065991
• European Commission’s Guidelines on FAIR data management: http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf
• FAIR data principles: www.force11.org/group/fairgroup/fairprinciples
• Overview of metadata standards and tools: https://rdamsc.dcc.ac.uk/
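In the spirit of the EUDAT FAIR checklist referenced above, the elements the video highlights (persistent identifiers, licences, interoperable file formats) can be checked mechanically against a metadata record. This is an illustrative sketch, not the checklist itself; the criteria, field names, and format list are assumptions:

```python
# Illustrative sketch: flag the FAIR-relevant elements missing from a
# metadata record. The chosen criteria and the "sustainable formats"
# set are invented for illustration.
SUSTAINABLE_FORMATS = {"csv", "json", "xml", "txt", "pdf"}

def fair_gaps(record: dict) -> list:
    """Return the FAIR-relevant elements missing from a metadata record."""
    gaps = []
    if not record.get("identifier"):
        gaps.append("persistent identifier (findable/citable)")
    if not record.get("license"):
        gaps.append("licence (reusable)")
    if record.get("format", "").lower() not in SUSTAINABLE_FORMATS:
        gaps.append("sustainable file format (interoperable)")
    return gaps

record = {"identifier": "10.5281/zenodo.1065991", "format": "csv"}
print(fair_gaps(record))   # ['licence (reusable)']
```

A certified repository performs checks of this kind at deposit time, which is how it keeps data FAIR over the long term rather than relying on each depositor.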
How open data contribute to improving the world. The life science use case. The technical, social, ethical issues.
This was a talk given within the iGEM 2020 programme by the London Imperial College students group (https://2020.igem.org/Team:Imperial_College), in a webinar organised by the SOAPLab group on the topic of Ethics of Automation. Dr Brandon Sepulvado was the other speaker of the day.
Keynote presentation given at the Data Fellows 2023 workshop in Berlin on 22-23 June. Presentation gives examples of good communication to explain data management concepts and how to use games and other forms of interactivity in training events
Presentation given at the DMPonline 10 year anniversary week, reflecting on lessons learned developing the business model. See https://www.dcc.ac.uk/events/dmponline-10th-year-anniversary-celebration-week and #10yearsDMPonline
Keynote presentation given at the 10th anniversary of the 4TU.researchdata repository https://data.4tu.nl/info/en/news-events/training-events/news-item/4turesearchdatas-role-in-fostering-open-science-10th-anniversary-celebration-29-sep-2020-1530-1730-c/
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf | Paige Cruz
Monitoring and observability aren’t traditionally found in software curriculums and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is a part of your current company’s observability stack.
While the dev and ops silo continues to crumble….many organizations still relegate monitoring & observability as the purview of ops, infra and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party, and will share these foundational concepts to build on.
Pushing the limits of ePRTC: 100ns holdover for 100 daysAdtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor... | Neo4j
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo... | James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
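The deployment bill of materials (DBOM) mentioned above can be pictured as a record of what was deployed, where, and with which artifact digests. The record structure and names below are assumptions for illustration, not the OpsMx format:

```python
# Illustrative sketch of capturing a deployment bill of materials (DBOM):
# each deployed artifact is recorded with a content digest so the exact
# bits running in an environment can later be verified.
import hashlib
import json

def artifact_digest(content: bytes) -> str:
    """Content-address an artifact with a SHA-256 digest."""
    return "sha256:" + hashlib.sha256(content).hexdigest()

def dbom_record(service: str, environment: str, artifacts: dict) -> str:
    """Serialize a deployment record with a digest per artifact."""
    return json.dumps({
        "service": service,
        "environment": environment,
        "artifacts": {name: artifact_digest(blob)
                      for name, blob in artifacts.items()},
    }, sort_keys=True)

record = dbom_record("payments", "production",
                     {"app.jar": b"example-bytes"})
print(record)
```

Because the digests are content-derived, comparing a fresh digest of what is running against the stored DBOM detects tampering or drift in production.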
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionAggregage
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024Neo4j
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
20 Comprehensive Checklist of Designing and Developing a WebsitePixlogix Infotech
Dive into the world of Website Designing and Developing with Pixlogix! Looking to create a stunning online presence? Look no further! Our comprehensive checklist covers everything you need to know to craft a website that stands out. From user-friendly design to seamless functionality, we've got you covered. Don't miss out on this invaluable resource! Check out our checklist now at Pixlogix and start your journey towards a captivating online presence today.
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
UiPath Test Automation using UiPath Test Suite series, part 5DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 5. In this session, we will cover CI/CD with devops.
Topics covered:
CI/CD with in UiPath
End-to-end overview of CI/CD pipeline with Azure devops
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
Full-RAG: A modern architecture for hyper-personalizationZilliz
Mike Del Balso, CEO & Co-Founder at Tecton, presents "Full RAG," a novel approach to AI recommendation systems, aiming to push beyond the limitations of traditional models through a deep integration of contextual insights and real-time data, leveraging the Retrieval-Augmented Generation architecture. This talk will outline Full RAG's potential to significantly enhance personalization, address engineering challenges such as data management and model training, and introduce data enrichment with reranking as a key solution. Attendees will gain crucial insights into the importance of hyperpersonalization in AI, the capabilities of Full RAG for advanced personalization, and strategies for managing complex data integrations for deploying cutting-edge AI solutions.
Essentials of Automations: The Art of Triggers and Actions in FMESafe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
Removing Uninteresting Bytes in Software FuzzingAftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing xml documents, and Binutil's readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format). Our preliminary results show that AFL+DIAR does not only discover new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
Climate Impact of Software Testing at Nordic Testing DaysKari Kakkonen
My slides at Nordic Testing Days 6.6.2024
Climate impact / sustainability of software testing discussed on the talk. ICT and testing must carry their part of global responsibility to help with the climat warming. We can minimize the carbon footprint but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be added with sustainability, and then measured continuously. Test environments can be used less, and in smaller scale and on demand. Test techniques can be used in optimizing or minimizing number of tests. Test automation can be used to speed up testing.
Climate Impact of Software Testing at Nordic Testing Days
Introduction to Open Science and EOSC
1. Introduction to Open Science and EOSC
www.geant.org
Sarah Jones
EOSC Engagement Manager
sarah.jones@geant.org
Twitter: @sarahroams
Predictive Epigenetics PEP-NET training network
1st April 2020
4. Defining Open Science
"science carried out and communicated in a manner which allows others to contribute, collaborate and add to the research effort, with all kinds of data, results and protocols made freely available at different stages of the research process."
Research Information Network, Open Science case studies
www.rin.ac.uk/our-work/data-management-and-curation/open-science-case-studies
7. Open access to publications
• Free, immediate, online access to the results of research
• Two routes to make sure anyone can access your papers
 – Gold route: paying APCs (article processing charges) so the publisher makes the copy open
 – Green route: self-archiving an Open Access copy in a repository
• Find out what your publisher allows on SHERPA RoMEO
 – www.sherpa.ac.uk/romeo
8. Open data
"Open data and content can be freely used, modified and shared by anyone for any purpose"
http://opendefinition.org
Tim Berners-Lee's proposal for five star open data (http://5stardata.info):
1. make your stuff available on the Web (whatever format) under an open licence
2. make it available as structured data (e.g. Excel instead of a scan of a table)
3. use non-proprietary formats (e.g. CSV instead of Excel)
4. use URIs to denote things, so that people can point at your stuff
5. link your data to other data to provide context
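The jump from two to three stars is often just a serialisation change. A minimal sketch in Python, using only the standard library (the measurement records are invented for illustration):

```python
import csv
import io

# Hypothetical records that might otherwise be shared as a scanned
# table (one star) or a proprietary spreadsheet (two stars).
rows = [
    {"site": "A", "year": 2020, "temp_c": 9.4},
    {"site": "B", "year": 2020, "temp_c": 11.2},
]

def to_csv(records):
    """Serialise records to CSV, an open, non-proprietary format (three stars)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["site", "year", "temp_c"])
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()

print(to_csv(rows))
```

Four and five stars would then add stable URIs for the things described and links out to related datasets, which are publishing decisions rather than code.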
9. Open methods
• Documenting and sharing workflows and methods
• Sharing code and tools to allow others to reproduce work
• Using web-based tools to facilitate collaboration and interaction from the outside world in your research
• Using tools like MyExperiment and Taverna
10. Reliance on specialist research software
[Chart: results of a survey of researchers from 15 UK Russell Group universities, conducted by the SSI between August and October 2014 (DOI: 10.5281/zenodo.14809). It asked "Do you use research software?" and "What would happen to your research without software?", and reported the proportions (56% and 71% shown) who develop their own software and who have no formal software training.]
Slide from Neil Chue-Hong, Software Sustainability Institute
12. Degrees of openness
• Open: content that can be freely used, modified and shared by anyone for any purpose (five star open data)
• Restricted: limits on who can use the data, how or for what purpose – charges for use, data sharing agreements, restrictive licences, peer-to-peer exchange, …
• Closed: unable to share, or under embargo
13. And what is FAIR?
• FAIR ≠ Open
• FAIR ensures data can be found, understood and reused
• Data can be shared under restrictions and still be FAIR
"As open as possible, as closed as necessary"
Image CC-BY-SA by SangyaPundir; image CC-BY by European Commission FAIR data expert group
14. What FAIR means: 15 principles
Findable
F1. (meta)data are assigned a globally unique and eternally persistent identifier.
F2. data are described with rich metadata.
F3. (meta)data are registered or indexed in a searchable resource.
F4. metadata specify the data identifier.
Accessible
A1. (meta)data are retrievable by their identifier using a standardized communications protocol.
A1.1 the protocol is open, free, and universally implementable.
A1.2 the protocol allows for an authentication and authorization procedure, where necessary.
A2. metadata are accessible, even when the data are no longer available.
Interoperable
I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.
I2. (meta)data use vocabularies that follow FAIR principles.
I3. (meta)data include qualified references to other (meta)data.
Reusable
R1. meta(data) have a plurality of accurate and relevant attributes.
R1.1. (meta)data are released with a clear and accessible data usage license.
R1.2. (meta)data are associated with their provenance.
R1.3. (meta)data meet domain-relevant community standards.
Slide CC-BY by Erik Schultes, Leiden UMC; doi: 10.1038/sdata.2016.18
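Principles F1 and A1 combine neatly in practice: a DOI is a globally unique identifier, and prepending the doi.org resolver turns it into a retrieval URL over a standard protocol (HTTPS). A small offline sketch; the regex below is a deliberately simplified approximation of DOI syntax, not the full specification:

```python
import re

# Simplified DOI shape: "10.<registrant>/<suffix>" (an assumption, not
# the complete DOI syntax rules).
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

def doi_resolver_url(doi):
    """Build the canonical resolver URL for a DOI.

    This illustrates principle A1: (meta)data are retrievable by their
    identifier using a standardized communications protocol.
    """
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a DOI: {doi!r}")
    return f"https://doi.org/{doi}"

# The FAIR principles paper itself, as cited on the slide:
print(doi_resolver_url("10.1038/sdata.2016.18"))
```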
15. The FAIR data principles explained
• Clarifications from GO FAIR
• Each principle is a link to further clarification, examples and context
https://www.go-fair.org/fair-principles
Example – R1. Meta(data) are richly described with a plurality of accurate and relevant attributes:
• By giving data many 'labels', it will be much easier to find and reuse the data.
• Provide not just metadata that allows discovery, but also metadata that richly describes the context under which that data was generated.
• "Plurality" indicates that metadata should be as generous as possible, even to the point of providing information that may seem irrelevant.
16. FAIR data checklist
• Findable
 – Persistent identifier
 – Metadata online
• Accessible
 – Data online
 – Restrictions where needed
• Interoperable
 – Use standards, controlled vocabs
 – Common (open) formats
• Reusable
 – Rich documentation
 – Clear usage licence
https://doi.org/10.5281/zenodo.5111307
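The checklist items can all be captured in a small machine-readable metadata record. A sketch below; the field names are our own shorthand, not a formal schema such as DataCite or Dublin Core, and the dataset URL is invented (the identifier is the checklist's own DOI from the slide):

```python
import json

# Illustrative record covering the checklist: identifier (Findable),
# access URL (Accessible), open format (Interoperable), licence and
# description (Reusable).
record = {
    "identifier": "https://doi.org/10.5281/zenodo.5111307",
    "title": "Example dataset",
    "description": "Rich documentation of how the data were produced.",
    "access_url": "https://example.org/data.csv",   # hypothetical
    "format": "text/csv",
    "license": "CC-BY-4.0",
}

# Serialising to JSON keeps the metadata itself in an open,
# machine-readable format.
metadata_json = json.dumps(record, indent=2)
print(metadata_json)
```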
17. FAIR is nothing new
• Various research communities have been sharing their data in a 'FAIR' way long before the term emerged
• Meaningful and memorable articulation of concepts
• Natural desire to want to be 'fair'
• FAIR is gaining significant international traction
20. Get a citation advantage
A study that analysed the citation counts of 10,555 papers on gene expression studies that created microarray data showed that "studies that made data available in a public repository received 9% more citations than similar studies for which the data was not made available".
Data reuse and the open data citation advantage, Piwowar, H. & Vision, T., https://peerj.com/articles/175
21. Increased use and economic benefit
The case of NASA Landsat satellite imagery of the Earth's surface (http://earthobservatory.nasa.gov/IOTD/view.php?id=83394&src=ve):
Up to 2008:
• Sold through the US Geological Survey for US$600 per scene
• Sales of 19,000 scenes per year
• Annual revenue of $11.4 million
Since 2009:
• Freely available over the internet; Google Earth now uses the images
• Transmission of 2,100,000 scenes per year
• Estimated to have created value for the environmental management industry of $935 million, with direct benefit of more than $100 million per year to the US economy
• Has stimulated the development of applications from a large number of companies worldwide
22. Funder imperatives...
"Open Research Europe requires open access to research data supporting articles under the principle 'as open as possible, as closed as necessary', according to the policy of Horizon Europe. Data should be deposited in trusted data repositories."
https://open-research-europe.ec.europa.eu/for-authors/data-guidelines#opendata
23. But there are also opportunity costs
By Emilio Bruna: http://brunalab.org/blog/2014/09/04/the-opportunity-cost-of-my-openscience-was-35-hours-690
For his paper he calculated the following:
1. Double-checking the main dataset and reformatting to submit to Dryad: 5 hours
2. Creating the complementary file and preparing metadata: 3 hours
3. Submission of these two files and the metadata to Dryad: 45 minutes
4. Preparing a map of the locations: 1 hour
5. Submission of the map to Figshare: 15 minutes
6. Cleaning up and documenting the code, uploading it to GitHub: 25 hours
7. Cost of archiving in Dryad: US$90
8. Page charges: $600
24. FAIR and Open both central to EOSC
• EC and Member States committed to FAIR and Open
• Pursue this in research policy and grant conditions
• Lots of investment in infrastructure to support data sharing
• Ultimately supports the science ecosystem and ensures greater return on investment
27. Large EC initiative
• Collaboration between the European Commission and Member States to "make Open Science the new normal"
• Established the EOSC Association as a legal entity to govern and oversee the implementation
• Huge investment in infrastructure – €350 million in the initial development phase and at least €1 billion co-investment foreseen for the next 7 years
[Diagram: governance structure linking the EOSC Association, Steering Board and European Commission]
28. Long history of political agreements and activity
Lots of groundwork since 2015:
• Council Conclusions
• Expert Group reports
• EC documents
• Major investment in EOSC-related projects to develop the infrastructure and platform
30. The EOSC platform
• A web of FAIR data and services
• Federation of eInfra and Research Infrastructures (RIs)
• Environment in which data can be brought together with services to perform analyses and address societal challenges
32. FAIR is central to principles in EOSC
• Is the glue that connects data and services
• Requirement for FAIR to support reuse
• Use community standards
• Share all types of output (openly)
35. EOSC Portal
• Currently the primary resource for navigating EOSC: https://eosc-portal.eu
• Includes a virtual tour for new users
• The catalogue and marketplace is how you discover, access and compose resources
37. Access to free storage, compute and support services
C-SCALE will federate compute and data resources from the Copernicus DIAS, the national Collaborative Ground Segments and the European Open Science Cloud (EOSC) towards a European open source Big (Copernicus) Data Analytics platform:
• Storage services: up to 12 PB
• Cloud services: up to 17,728,500 CPU hours
• HPC/HTC services: up to 3,100,000 CPU hours
• GPU services: up to 6,000 GPU hours
DICE makes available a set of data management services (and associated resources) for researchers and research communities from any scientific domain, including:
• Data archives (up to 25 PB)
• Policy-based data archives (up to 17 PB)
• Personal and project workspaces (up to 5 PB)
• Data repository services for data sharing (up to 8 PB)
• Data discovery services (with PID and DOI services and metadata harvesting)
EGI-ACE will deliver the EOSC Compute Platform and will contribute to the EOSC Data Commons. Services offered include compute and storage resources, compute platform services, data management services and related user support and training. The total capacity that EGI-ACE makes available through the call between 2021-2023 is:
• 80,000,000 CPU hours
• 250,000 GPU hours
• 20 PB storage
Further support services include:
• Support for the Argos DMP service by drafting discipline-specific DMPs; Horizon Europe DMP support
• Setting up your own community research gateway (connect.openaire.eu) and Zenodo communities
• Access to open science metrics for your projects, institution or community
• A service to anonymise your data and comply with GDPR
• Support and mentoring on Horizon Europe open access mandates
Another provider offers three core services for Research Lifecycle Management:
• ROHub: a tool to facilitate the exchange of information across the scientific community
• Text Enrichment and Mining: a service which automatically extracts valuable information and metadata from bibliographic sources and other text documents
• Datacube technology for Earth Observation (EO) data management: efficient access to extensive collections of multi-temporal and multi-dimensional EO imagery, also allowing interoperability among the different information layers
https://marketplace.eosc-portal.eu
39. Recommendations for users
EOSC Future is using AI techniques to make recommendations to users:
• relevant projects, data, publications, training materials
• potential collaborators (people, task forces, communities)
Recommendations are based on:
• viewing history
• order history
• general popularity
• popularity among users with a similar background/interests
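The "popularity among users with a similar background" signal above can be sketched in a few lines. This is a toy illustration only, not the EOSC Future recommender: the view records, interest tags and resource names are all invented.

```python
from collections import Counter

# Invented view log: (user, interest tag, resource viewed).
views = [
    ("alice", "life-sciences", "dataset-genomes"),
    ("bob",   "life-sciences", "dataset-genomes"),
    ("bob",   "life-sciences", "training-fair"),
    ("carol", "physics",       "dataset-lhc"),
]

def recommend(interest, seen=(), top_n=2):
    """Rank resources by view count among users sharing an interest tag,
    excluding anything the current user has already seen."""
    counts = Counter(res for _, tag, res in views
                     if tag == interest and res not in seen)
    return [res for res, _ in counts.most_common(top_n)]

print(recommend("life-sciences"))
```

A production system would blend this signal with the others listed on the slide (order history, general popularity, the user's own viewing history).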
40. Benefits of EOSC for researchers
• Federated identity management – ease of single sign-on
• Access to a greater number of services
• Funding provided to pay for compute, e.g. EGI-ACE, DICE
• Discovery of related data from other disciplines/sectors
• Greater ability to collaborate and address key research questions
43. How to make data open?
1. Choose your dataset(s)
 – What data can you make open? You may need to revisit this step if you encounter problems later.
2. Apply an open licence
 – Determine what IP exists. Apply a suitable licence, e.g. CC-BY.
3. Make the data available
 – Provide the data in a suitable format. Use repositories.
4. Make it discoverable
 – Post on the web, register in catalogues…
https://okfn.org
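Steps 2 and 3 above can be sketched as a small packaging helper: put the data in an open format next to an explicit licence statement, ready for deposit. File names here are illustrative only; repository deposit (step 3) and catalogue registration (step 4) happen on the repository's side.

```python
from pathlib import Path

def publish_open_data(out_dir, csv_text, licence="CC-BY-4.0"):
    """Write a dataset in an open format (CSV) alongside a clear
    licence statement, as a minimal open-data package."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / "data.csv").write_text(csv_text)
    (out / "LICENSE.txt").write_text(
        f"This dataset is released under the {licence} licence.\n")
    return sorted(p.name for p in out.iterdir())

print(publish_open_data("open_dataset", "site,temp_c\nA,9.4\n"))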
47. How to select a repository?
• Look for provision from your community, university, publisher, funder, etc.
• Check they match your particular data needs, e.g. formats accepted; mixture of Open and Restricted Access.
• See if they provide guidance on how to cite the deposited data.
• Do they assign a persistent and globally unique identifier for sustainable citations and for links back to particular researchers and grants?
• Look for certification as a 'Trustworthy Digital Repository' with an explicit ambition to keep the data available in the long term.
48. Use metadata standards
Metadata Standards Directory
• Broad, disciplinary listing of standards and tools; maintained by an RDA group
• http://rd-alliance.github.io/metadata-directory
FAIRsharing
• A portal of data standards, databases, and policies
• Focused on life, environmental and biomedical sciences
• https://fairsharing.org
49. Choose appropriate file formats
If you want your data to be re-used and sustainable in the long term, you typically want to opt for open, non-proprietary formats.
• Tabular data – recommended: CSV, TSV, SPSS portable; avoid for data sharing: Excel
• Text – recommended: plain text, HTML, RTF (PDF/A only if layout matters); avoid: Word
• Media – recommended: MP4 or Ogg containers with Theora, Dirac or FLAC codecs; avoid: QuickTime, H.264
• Images – recommended: TIFF, JPEG2000, PNG; avoid: GIF, JPG
• Structured data – recommended: XML, RDF; avoid: RDBMS
Further examples: https://ukdataservice.ac.uk/learning-hub/research-data-management/format-your-data/recommended-formats
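The guidance above amounts to a simple lookup table. A sketch that flags file extensions the slide advises against sharing; the mapping is transcribed from the list above and would need extending for a real discipline-specific policy:

```python
from pathlib import Path

# Extensions the sharing guidance advises against, with the
# recommended alternative (transcribed from the format list above).
AVOID = {
    ".xls":  "use CSV or TSV",
    ".xlsx": "use CSV or TSV",
    ".doc":  "use plain text, HTML or RTF",
    ".docx": "use plain text, HTML or RTF",
    ".gif":  "use TIFF, JPEG2000 or PNG",
    ".jpg":  "use TIFF, JPEG2000 or PNG",
    ".mov":  "use an MP4 or Ogg container",
}

def sharing_advice(filename):
    """Return format advice for a file based on its extension."""
    ext = Path(filename).suffix.lower()
    return AVOID.get(ext, "format looks fine for sharing")

print(sharing_advice("results.xlsx"))
```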
51. More on life science tools and infrastructure coming up in Susanna's talk
Image: Sangharsh Lohakare https://unsplash.com/photos/Iy7QyzOs1bo
Journal prices have outpaced inflation by more than 250% over the past 30 years.
There are 15 entire disciplines where the average price of one journal for one year is over £1,000 (chemistry £4,227, physics £3,229). One journal, Tetrahedron, costs over £40,000.
It is irrational that scientists are paid by governments to do research and the papers are then locked away behind paywalls. Journals don't do the research, employ the people or pay the reviewers.
In the last four years, we have investigated and understood the challenges of the UK research community.
Anecdotally, people working in this area had a lot of evidence that researchers relied on software, but no studies had been conducted. So we did this ourselves.
Two questions were of interest: do you use software, and, possibly more important, what would happen to your research without software? The answers suggest around 170,000 researchers in the UK could not conduct their research without software.
This is more than just a reliance on Word or web browsers – specialist software is written into the research workflows of people from psychology to physics, from the life sciences to literature. The reliance isn't confined to the "traditionally" computationally intensive subjects; it's a feature of all disciplines.
This also means that around 140,000 researchers are relying on their own coding skills.
Certain research communities have also seen the benefit of sharing data as it speeds up the process of discovery. This article shows how researchers in the field of Alzheimer’s research have agreed as a community to share data immediately to make scientific breakthroughs.
There’s also a citation advantage for individual researchers. This study by Heather Piwowar and Todd Vision looked at 10,555 paper of gene expression studies that had shared the associated microarray data. Those studies that shared data received 9% more citations.
There’s also an economic benefit, as seen by the case of the NASA landsat satellite images. These were sold until 2008 for $600 a scene. Now they’re freely available and used by Google Earth. Previously they sold 19,000 images a year, whereas now they transmit 2.1 million. The revenue has gone up incredibly too from $11.4 million to an estimated value of $935 million with direct benefit of more than $100 million. The release has also stimulated the development of applications from companies worldwide.
This case study comes from the Royal Society Report on Science as an Open Enterprise.
The background to this is about making the most of the data that has been created through publicly funded research. The guidelines speak of:
Improved quality of results
Greater efficiency
Faster to market = faster growth
Improved transparency of the scientific process
It’s not all positive though – otherwise why isn’t everyone already doing this? There is a certain amount of effort and cost to open science, which this blog post by Emilio Bruna highlights. He calculated the cost of sharing his data for one paper and came to a total of 35 hours and $690. He breaks this down into the cost of preparing the dataset, creating complementary metadata and associated files, cleaning up and documenting the code (which involves a big mental leap), and the charges applied.
What exactly EOSC is remains a question we are asking ourselves, but some commonality of vision is emerging.
I like this picture as it represents some of that for me:
Federation of services
Interconnecting / interoperable
User in the centre
Greenfield site? Open to ideas / creativity?
Guidance from the DCC can also help researchers to understand data licensing. This guide outlines the pros and cons of each approach e.g. the limitations of some CC options
The OA guidelines under Horizon 2020 point to CC-0 or CC-BY as a straightforward and effective way to make it possible for others to mine, exploit and reproduce the data. See p11 at: http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-pilot-guide_en.pdf