3. Introduction EOSC
Open and FAIR data
EOSC, EOSC-hub, OpenAIRE Advance – Ellen Leenarts
Making your data Open and FAIR – Marjan Grootveld
Services in the lifecycle
Questions and answers
Data services when you need them in the research
data lifecycle – Ellen and Marjan
Please put “Question” before your questions in the chatbox
Slides will be made available afterwards.
5. The EOSC is part of the overall European Cloud Initiative, which ultimately aims to connect business, industry and public facilities through the cloud.
EOSC-building projects are for instance
• OpenAIRE-Advance
• EOSC-hub
• EOSCpilot
• eInfraCentral
• FREYA
6. The EOSC-hub project mobilises providers from the EGI Federation, EUDAT CDI,
INDIGO-DataCloud and major research e-infrastructures offering services for advanced
data-driven research and innovation.
These resources are offered via the Hub – the integration and management system of
the European Open Science Cloud, acting as a single entry point for all stakeholders.
EOSC-hub: Services for the European Open Science Cloud
5/15/2018
7. • Full title: Integrating and managing services for the European Open
Science Cloud
• 100 Partners, 76 beneficiaries (75 funded)
• 3,874 PMs, 108 FTEs, more than 200 technical and scientific staff
involved
• €33,331,180, funded by:
- European Commission: €30,000,000 (call H2020-EINFRA-2016-2017)
- The participants of the EGI Foundation: €3,331,180
• 36 months: January 2018 – December 2020
Project fact sheet EOSC-hub
8. 1. Implement, monitor, align Open Science policies across Europe and the world
2. Harvesting of OA output, linking to contextual information
3. Deploy services to embed Open Science into researcher workflows
4. Develop global open standards for linking all research
5. Train for Open Science, for FAIR Science
What is OpenAIRE?
OpenAIRE is about opening, sharing and reusing research outcomes
9. • Both in EINFRA-12 (topic A and B)
- EOSC-hub ~ storage, compute, application services
- OpenAIRE ~ RDM; Publication services
• Let’s support Open Science together!
- Joint work plan
Technical integration of online services
Dissemination, community building, support, training
Governance
EOSC-hub – OpenAIRE-Advance collaboration
12. Horizon2020: Open and FAIR
Source: Daniel Spichtinger, European Commission DG RTD, Unit A.6. – October 11, 2017
13. • Findable
– Assign persistent IDs, provide rich metadata, register in a searchable resource, ...
• Accessible
– Retrievable by their ID using a standard protocol, metadata remain accessible even if data aren’t...
• Interoperable
– Use formal, broadly applicable languages, use standard vocabularies, qualified references...
• Reusable
– Rich, accurate metadata, clear licences, provenance, use of community standards...
FAIR data principles
www.force11.org/group/fairgroup/fairprinciples
http://www.nature.com/articles/sdata201618
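Illustration (not part of the original slides): a minimal Python sketch of "retrievable by their ID using a standard protocol". It turns a DOI into an HTTPS request against the public doi.org resolver. The helper names are hypothetical; the example DOI is the EUDAT FAIR checklist cited elsewhere in this deck. The Accept header asks for DataCite JSON through DOI content negotiation, and whether a given DOI supports that media type is an assumption here.

```python
import urllib.request

DOI_RESOLVER = "https://doi.org/"  # public DOI resolver

# Prefixes people commonly paste in front of a DOI.
_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalise_doi(raw: str) -> str:
    """Return the bare DOI ('10.xxxx/suffix') from a pasted string."""
    doi = raw.strip()
    for prefix in _PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI: {raw!r}")
    return doi


def build_metadata_request(raw_doi: str) -> urllib.request.Request:
    """Build (but do not send) a request for the DOI's metadata."""
    return urllib.request.Request(
        DOI_RESOLVER + normalise_doi(raw_doi),
        # Assumed media type for DataCite JSON via content negotiation.
        headers={"Accept": "application/vnd.datacite.datacite+json"},
    )


if __name__ == "__main__":
    request = build_metadata_request("doi:10.5281/zenodo.1065991")
    print(request.full_url)
    # To actually fetch the metadata: urllib.request.urlopen(request)
```

The request is only constructed, not sent, so the sketch runs offline.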
14. H2020 DMP Guidelines: “This template is inspired by FAIR as a general concept.”
Meaning: find your own (disciplinary) practice.
Guidelines on FAIR data management in Horizon 2020:
http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf
Principles =/= practice
GO FAIR: initiative towards the internet of FAIR data
and services. Started in Europe, but reaches out wide.
https://www.dtls.nl/fair-data/go-fair/
Infographic EC: http://ec.europa.eu/research/images/infographics/policy/open-data-2016-w920.png
16. Sample lifecycle 1
The integrated scientific life cycle of embedded networked sensor research. From: Alberto Pepe, Matthew
Mayernik, Christine L. Borgman, Herbert Van de Sompel: “From Artifacts to Aggregations: Modeling Scientific Life
Cycles on the Semantic Web”. https://arxiv.org/ftp/arxiv/papers/0906/0906.2549.pdf
17. Sample lifecycle 2: data lifecycle as part of research lifecycle
“Open Access Tube Map” (CC-BY) - Awre, Chris L.; Stainthorp, Paul; and Stone, Graham (2016) “Supporting Open
Access Processes Through Library Collaboration”, Collaborative Librarianship: Vol. 8: Iss. 2, Article 8.
Data lifecycle
18. Sample lifecycle 3: what OpenAIRE and EOSC-hub support (or plan to support)
Presented by Gergely Sipos during the EOSC-hub – OpenAIRE webinar “National nodes meetup”, April 24 2018.
Retrieved May 8 2018 from https://www.openaire.eu/webinars/
19. Sample lifecycle 4: EOSC-hub research data lifecycle
Lifecycle stages shown in the figure: Processing & Analysis; Data Management, Curation & Preservation; Access, Deposition & Sharing; Discover & Reuse
20. Our favourite: simplified research data lifecycle
Lifecycle stages: creating data, processing data, analysing data, preserving data, giving access to data, re-using data
Based on UK Data Archive lifecycle: https://www.ukdataservice.ac.uk/manage-data/lifecycle
Used in OpenAIRE RDM briefing paper: https://www.openaire.eu/briefpaper-rdm-infonoads
21. What would a re-user need?
Planning for FAIR: think backwards
Lifecycle stages: creating data, processing data, preserving data, giving access to data, re-using data
“Lots of documentation is needed”
EUDAT FAIR checklist - CC-BY Sarah Jones & Marjan Grootveld, EUDAT. https://doi.org/10.5281/zenodo.1065991
22. • Metadata (persistent identifier included) is needed to locate research data and get a first idea of the
content.
• Use relevant standards to enable interoperability.
• Check which standards the long-term repository supports or expects.
Metadata
• Arts and humanities
• Engineering
• Life sciences
• Physical sciences and mathematics
• Social and behavioral sciences
• General research data: e.g. Dublin Core and
DataCite
http://rd-alliance.github.io/metadata-directory
https://rdamsc.dcc.ac.uk/
Extra: metadata tools: https://rdamsc.dcc.ac.uk/tool-index
https://fairsharing.org/
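Illustration (not part of the original slides): a minimal Python sketch of a completeness check on a general-purpose metadata record before deposit. The field list mirrors the mandatory DataCite properties (identifier, creator, title, publisher, publication year, resource type), but the dictionary keys are a simplified, hypothetical layout, not the official DataCite serialisation. The record itself is invented sample data.

```python
# Simplified key names standing in for the mandatory DataCite properties.
REQUIRED_FIELDS = (
    "identifier",
    "creators",
    "title",
    "publisher",
    "publication_year",
    "resource_type",
)


def missing_fields(record: dict) -> list:
    """Return the required fields that are absent or empty in the record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


# Hypothetical record: the DOI, names and values are placeholders.
record = {
    "identifier": "10.1234/example-dataset",
    "creators": ["A. Researcher"],
    "title": "Example survey data",
    "publisher": "",            # left empty on purpose
    "publication_year": 2018,
    # "resource_type" is missing on purpose
}

if __name__ == "__main__":
    print("Missing:", missing_fields(record))
```

A repository or community standard will usually require more than this minimum, so the field list is a starting point to extend.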
23. • Code book explaining the variables
• Study design
• Lab journal
• IPython or Jupyter notebook
• Statistical queries
• Software or instruments to understand or to reproduce the data
• Machine configurations
• Informed consent information
• Data usage licence
• …
In short: document and preserve everything that is needed to reproduce the study –
ideally following the standard in your discipline
Documentation?
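Illustration (not part of the original slides): a code book is easier to keep in sync with the data when it is machine-readable. The Python sketch below checks that every column of a CSV file has an entry in the code book. The variable names, descriptions and sample data are hypothetical.

```python
import csv
import io

# Hypothetical code book: variable name -> human-readable explanation.
CODEBOOK = {
    "participant_id": "Pseudonymous participant number",
    "age_years": "Age at time of interview, in whole years",
    "score": "Test score on a 0-100 scale",
}

# Hypothetical data file; the 'site' column is not documented.
SAMPLE_CSV = "participant_id,age_years,score,site\n1,34,78,A\n2,51,64,B\n"


def undocumented_columns(csv_text: str, codebook: dict) -> list:
    """Return the CSV header names that have no code book entry."""
    header = next(csv.reader(io.StringIO(csv_text)))
    return [column for column in header if column not in codebook]


if __name__ == "__main__":
    print("Undocumented:", undocumented_columns(SAMPLE_CSV, CODEBOOK))
```

Running such a check whenever the data change helps keep the documentation from drifting behind the dataset.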
24. Interoperability
Before clocks were invented, people kept time using different instruments to observe the Sun’s zenith at noon. Towns and cities set clocks based on sunsets and sunrises. Time calculation became a serious problem for people travelling by train, sometimes hundreds of miles in a day. UTC is the World's Time Standard.
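Illustration (not part of the original slides): the time-standard analogy applies directly to data. The Python sketch below converts timestamps recorded with different local offsets to UTC in ISO 8601 notation, so that records from different sites become comparable. The two fixed offsets and the sample times are assumptions chosen for the example.

```python
from datetime import datetime, timedelta, timezone


def to_utc_iso(local: datetime) -> str:
    """Convert an offset-aware timestamp to an ISO 8601 string in UTC."""
    if local.tzinfo is None:
        # A timestamp without an offset cannot be interpreted reliably.
        raise ValueError("timestamp needs an explicit UTC offset")
    return local.astimezone(timezone.utc).isoformat()


# Two hypothetical measurement sites with fixed offsets (assumed values).
SITE_A = timezone(timedelta(hours=2))  # UTC+2
SITE_B = timezone(timedelta(hours=3))  # UTC+3

reading_a = datetime(2018, 5, 15, 14, 0, tzinfo=SITE_A)
reading_b = datetime(2018, 5, 15, 15, 0, tzinfo=SITE_B)

if __name__ == "__main__":
    # Different local clock times, same moment once expressed in UTC.
    print(to_utc_iso(reading_a))
    print(to_utc_iso(reading_b))
```

Rejecting timestamps without an offset is a deliberate choice: guessing the time zone would hide exactly the ambiguity a shared standard removes.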
27. Simplified lifecycle with FAIR support
Lifecycle stages: creating data, processing data, analysing data, preserving data, giving access to data, re-using data
• Plan for FAIR and well-managed data with EasyDMP or DMPonline. Because a DMP is a living document, ask yourself in each stage of the lifecycle if there are reasons to update or refine it.
• PIDs, referencing data: make data findable and citable: DOIs from B2SHARE and Zenodo; B2HANDLE
• HPC data transfer from public data servers: B2STAGE
• Document what you do; store your mutable data (versions!) in B2DROP
• Move data to HPC; keep documenting; analysing: High Throughput Compute
• Prepare data for sharing: Amnesia
• Deposit data with metadata and documentation for interoperability and reusability with B2SHARE, B2SAFE, Online Storage
• Annotate the data for reusability: B2NOTE
• Metadata support findability and the decision to reuse; should be interoperable itself: B2SHARE, B2FIND, Datahub
• Promote Open / Restricted access to data and invite reuse; add a clear usage licence: Zenodo, B2SHARE
• Different stages of the lifecycle can benefit from different services: Marketplace
OpenAIRE/EOSC-hub webinar 15-05-2018
“How to manage your data to make them Open and FAIR”
28. B2FIND
Making Open Science findable
http://b2find.eudat.eu/
Provided through EOSC-hub
● Cross-disciplinary metadata and discovery service (B2FIND) allowing research infrastructures (RIs) to
make their data findable and discoverable in a central catalogue
○ Metadata can be harvested via OAI-PMH. Other APIs, such as JSON APIs and CSW 2.0, can also be used to
collect the metadata from the communities.
○ The project provides support for integrating community data catalogues
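Illustration (not part of the original slides): a minimal Python sketch of what harvesting via OAI-PMH involves. It builds a ListRecords request for Dublin Core (oai_dc) metadata and extracts identifiers and titles from a response. The base URL is a placeholder, not a real B2FIND or community endpoint, and the sample response is hand-written for the example.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

BASE_URL = "https://example.org/oai"  # placeholder endpoint (assumption)

NAMESPACES = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec  # optional: restrict to one set
    return base_url + "?" + urlencode(params)


# Hand-written sample of a ListRecords response with one record.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example dataset</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def identifiers_and_titles(xml_text):
    """Return (identifier, title) pairs for every record in a response."""
    root = ET.fromstring(xml_text)
    pairs = []
    for record in root.iterfind(".//oai:record", NAMESPACES):
        identifier = record.findtext("oai:header/oai:identifier", namespaces=NAMESPACES)
        title = record.findtext(".//dc:title", namespaces=NAMESPACES)
        pairs.append((identifier, title))
    return pairs


if __name__ == "__main__":
    print(list_records_url(BASE_URL))
    print(identifiers_and_titles(SAMPLE_RESPONSE))
```

A real harvester would also follow resumption tokens for large result sets; that is left out to keep the sketch short.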
29. B2DROP
Sync and share research data (https://www.eudat.eu/services/b2drop)
Provided through EOSC-hub:
● Store and share data with colleagues and team members, including research
data not finalised for publishing
○ Cloud storage to share data with fine-grained access controls
○ Synchronise multiple versions of data across different devices, including workflow and
computing environments
○ Publish data via B2SHARE
30. B2SHARE
Store and publish data (https://b2share.eudat.eu/)
Provided through EOSC-hub:
● Data repository & publishing service (B2SHARE) allowing RIs to publish and
manage data in a persistent way
○ Use of DataCite DOIs & EPIC PID
○ Domain specific metadata extensions
○ Manage the publishing life cycle with version control
○ Community defined authorisation rules
○ Annotations via defined ontologies
31. B2SHARE - Public license selector
Choose a public license by
answering some
questions regarding
access to your dataset.
Suggestions depend on
several factors:
- Type of data
- Original licenses
- Data consumer access
and distribution rights
Or use the search
functionality.
32. B2NOTE
Use annotations to structure your data (https://b2note.eudat.eu/)
Provided through EOSC-hub:
● Manage and share annotations on data with colleagues and team members
○ Annotations are keywords or commentaries attached to an object that explain or classify it.
○ B2NOTE annotation service is integrated with the B2SHARE service and technology
○ B2NOTE can be easily integrated with other community data repository services
○ Provide training on semantic annotations
33. Marketplace
Provided through EOSC-hub:
● Marketplace: multi-tenant user-facing platform for service providers to publish
their EOSC services and EOSC-compliant data repositories, and collect service
orders
○ Mature services and curated data
○ The RI retains control of and accountability for the services and data published, and participates in the
management of the Hub service portfolio
○ Support for the use of common service templates
https://marketplace.egi.eu/
34. • Micro data often reveal important private information, e.g., medical condition of a person
- Individuals are afraid to provide their data
- Companies are afraid to share data with experts
- GDPR makes a strict protection scheme obligatory
• The key idea in anonymization is that identifying information is removed from the
published data, so no sensitive information can be attributed to a person – not even after
data linking
• The aim of anonymization methods is to allow sharing such data, without compromising
the privacy of the users.
Amnesia: making personal data shareable
OpenAIRE Amnesia webinar 24-04-2018
https://www.openaire.eu/amnesia-data-anonymization-made-easy
35. • Amnesia not only removes direct identifiers like names, social security numbers et cetera, but
also transforms secondary identifiers like birth date and zip code so that individuals cannot be
identified in the data.
• Amnesia is available as a public beta version at
- https://amnesia.openaire.eu
• The online version is mostly for demonstration and testing purposes (sample datasets included)
• Sensitive data can be anonymized locally by downloading the application
- Security
- Scalability
• OpenAIRE is in the process of adjusting it to health data, and looking for your feedback!
- amnesia-helpdesk@imis.athena-innovation.gr
Amnesia status
OpenAIRE Amnesia webinar 24-04-2018
https://www.openaire.eu/amnesia-data-anonymization-made-easy
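Illustration (not part of the original slides, and not Amnesia's own algorithm or API): a minimal Python sketch of the idea described above. Direct identifiers are dropped, secondary identifiers (birth date, zip code) are generalised, and the size of the smallest group sharing the same generalised values is checked. All records, field names and the generalisation rules are hypothetical.

```python
from collections import Counter

DIRECT_IDENTIFIERS = {"name"}
SECONDARY_IDENTIFIERS = ("birth_date", "zip")

# Invented sample records.
records = [
    {"name": "Alice", "birth_date": "1980-03-12", "zip": "10115", "diagnosis": "asthma"},
    {"name": "Bob", "birth_date": "1980-11-02", "zip": "10117", "diagnosis": "diabetes"},
    {"name": "Carol", "birth_date": "1975-06-30", "zip": "20095", "diagnosis": "asthma"},
    {"name": "Dave", "birth_date": "1975-01-19", "zip": "20099", "diagnosis": "flu"},
]


def generalise(record: dict) -> dict:
    """Drop direct identifiers and coarsen the secondary identifiers."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    out["birth_date"] = out["birth_date"][:4]   # keep the year only
    out["zip"] = out["zip"][:3] + "**"          # keep the first three digits
    return out


def smallest_group(rows, keys=SECONDARY_IDENTIFIERS) -> int:
    """Size of the smallest group of rows sharing the same key values."""
    counts = Counter(tuple(row[k] for k in keys) for row in rows)
    return min(counts.values())


if __name__ == "__main__":
    anonymised = [generalise(r) for r in records]
    print("Smallest group before:", smallest_group(records))
    print("Smallest group after:", smallest_group(anonymised))
```

A smallest group of 1 means at least one person is unique on birth date and zip code and could be re-identified by data linking; generalisation is meant to raise that number.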
36. • Catch-all repository for EU-funded research
• Up to 50 GB per upload
• Data stored in the CERN Data Center
• Persistent identifiers (DOIs) for every upload, with DOI versioning
• Includes article-level metrics
• Free for the long tail of science
• Open to all research outputs from all disciplines
• GitHub integration
• Easily add EC funding information and report via OpenAIRE
Short facts about Zenodo
Zenodo: https://zenodo.org/
37. DOI versioning in Zenodo
http://blog.zenodo.org/2017/05/30/doi-versioning-launched/
39. • Recall that research funders like the EC and (academic) employers increasingly demand DMPs
• Tools available for writing your DMP
Data management planning
DMPOnline: https://dmponline.dcc.ac.uk/
EasyDMP: https://easydmp.sigma2.no/
40. Both tools…
• … contain the EC’s Horizon2020 DMP template
• … allow you to collaborate with others on your DMP (under construction)
• … allow you to export your DMP
• … plan to support “machine-actionable DMPs”
DMP-writing tools
Guidance follows EC Guidance text more closely
Additional DCC guidance
Guidance is more interpretative
Pull-down menus to select e.g. metadata schema
and file formats
Any feedback? support@easydmp.sigma2.no
DMPOnline: https://dmponline.dcc.ac.uk/
EasyDMP: https://easydmp.sigma2.no/
41. • When you integrate Open Science in your European research proposal, this makes your
proposal more competitive.
- Grigorov, Ivo; Elbæk, Mikael; Rettberg, Najla; Davidson, Joy: “Winning Horizon 2020 with Open
Science”. https://doi.org/10.5281/zenodo.12247
• There is evidence that grant proposals are receiving praise for including a DMP outline – even
though in H2020 a DMP is not required at the proposal stage, and not a competitive point.
• Quotes from EC evaluation reviews of grant proposals:
- “a clear description is provided of how core data sets and model development can be shared broadly
within the scientific community”
- “data storage and accessibility issues are not considered sufficiently”
- “there is very good realization of the commercial potential of the project outcomes, which is
reflected in the establishment of a data management plan, including IP related issues.”
So: you had better start early on a concrete and convincing DMP ;-)
Did you know?
Thanks to Ivo Grigorov (Technical University of Denmark, FOSTER project) for sharing these quotes.
Webinar May 14th 2018
43. • Findable
– Assign persistent IDs, provide rich metadata,
register in a searchable resource, ...
• Accessible
– Retrievable by their ID using a standard protocol,
metadata remain accessible even if data aren’t...
• Interoperable
– Use formal, broadly applicable languages, use
standard vocabularies, qualified references...
• Reusable
– Rich, accurate metadata, clear licences,
provenance, use of community standards...
FAIR data principles
Services to improve FAIR & Open
• Amnesia
• B2FIND
• B2DROP
• B2SHARE
• B2NOTE
• DMPonline and EasyDMP
• Marketplace
• Zenodo
• Et cetera!
44. @EOSC_eu
@openaire_eu
Questions?
Acknowledgements: we reused slides from the EOSC-hub, OpenAIRE-Advance and EUDAT projects. Thanks to Gergely
Sipos (EGI), Shaun de Witt (CCFE), Manolis Terrovitis (Research Center Athena, IMSI), Najla Rettberg (University of
Göttingen), Pedro Principe (University of Minho), Ivo Grigorov (Technical University of Denmark) and the EUDAT training team.
Of the four FAIR principles, interoperability is often considered the most problematic. However, we humans have a long history of reaching consensus on standards and common reference points. The degree of interoperability will clearly vary between disciplines, and that’s ok, as long as we see it as an ambition and a goal for the disciplines we work in.