
WEBINAR: "How to manage your data to make them Open and FAIR"

Joint OpenAIRE and EOSC-hub webinar delivered by Marjan Grootveld and Ellen Leenarts (DANS) on May 15, 2018

Published in: Science


  1. 1. @EOSC_eu @OpenAIRE_eu | www.eosc-hub.eu | www.openaire.eu. EOSC-hub and OpenAIRE-Advance receive funding from the European Union's Horizon 2020 research and innovation programme under grant agreement numbers 777536 and 777541, respectively.
  2. 2. How to manage your data to make them Open and FAIR. Ellen Leenarts (DANS) and Marjan Grootveld (DANS), May 15, 2018
  3. 3. Agenda: • Introduction EOSC: EOSC, EOSC-hub, OpenAIRE-Advance (Ellen Leenarts) • Open and FAIR data: making your data Open and FAIR (Marjan Grootveld) • Services in the lifecycle: data services when you need them in the research data lifecycle (Ellen and Marjan) • Questions and answers. Please put “Question” before your questions in the chatbox. Slides will be made available afterwards.
  4. 4. EOSC and EOSC-building projects
  5. 5. The EOSC is part of the overall European Cloud Initiative, which ultimately aims to connect business, industry and public facilities through the cloud. EOSC-building projects include, for instance: • OpenAIRE-Advance • EOSC-hub • EOSCpilot • eInfraCentral • FREYA
  6. 6. EOSC-hub: Services for the European Open Science Cloud. The EOSC-hub project mobilises providers from the EGI Federation, EUDAT CDI, INDIGO-DataCloud and major research e-infrastructures offering services for advanced data-driven research and innovation. These resources are offered via the Hub, the integration and management system of the European Open Science Cloud, which acts as a single entry point for all stakeholders.
  7. 7. Project fact sheet EOSC-hub • Full title: Integrating and managing services for the European Open Science Cloud • 100 partners, 76 beneficiaries (75 funded) • 3,874 person-months (PMs), 108 FTEs, more than 200 technical and scientific staff involved • €33,331,180, funded by: - European Commission: €30,000,000 (call H2020-EINFRA-2016-2017) - The participants of the EGI Foundation: €3,331,180 • 36 months: January 2018 – December 2020
  8. 8. What is OpenAIRE? OpenAIRE is about opening, sharing and reusing research outcomes. 1. Implement, monitor and align Open Science policies across Europe and the world 2. Harvest OA output and link it to contextual information 3. Deploy services to embed Open Science into researcher workflows 4. Develop global open standards for linking all research 5. Train for Open Science, for FAIR Science
  9. 9. EOSC-hub – OpenAIRE-Advance collaboration • Both in EINFRA-12 (topic A and B) - EOSC-hub ~ storage, compute, application services - OpenAIRE ~ RDM; publication services • Let’s support Open Science together! - Joint workplan: technical integration of online services; dissemination, community building, support and training; governance
  10. 10. Open and FAIR data
  11. 11. Open and FAIR data management: Open? FAIR?
  12. 12. Horizon2020: Open and FAIR Source: Daniel Spichtinger, European Commission DG RTD, Unit A.6. – October 11, 2017
  13. 13. FAIR data principles • Findable – Assign persistent IDs, provide rich metadata, register in a searchable resource, ... • Accessible – Retrievable by their ID using a standard protocol, metadata remain accessible even if data aren’t... • Interoperable – Use formal, broadly applicable languages, use standard vocabularies, qualified references... • Reusable – Rich, accurate metadata, clear licences, provenance, use of community standards... Sources: www.force11.org/group/fairgroup/fairprinciples http://www.nature.com/articles/sdata201618
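To make the "retrievable by their ID using a standard protocol" point concrete, here is a minimal sketch. It turns a DOI into an HTTPS URL at the doi.org resolver and prepares a request for machine-readable metadata through HTTP content negotiation. The helper names are illustrative and not from the slides, and the example only builds the request without sending it. Whether a given DOI answers this media type depends on its registration agency, which is an assumption here.

```python
from urllib.request import Request

DOI_RESOLVER = "https://doi.org/"  # public DOI resolver


def doi_to_url(doi: str) -> str:
    """Return the resolver URL for a bare DOI such as '10.5281/zenodo.1065991'."""
    return DOI_RESOLVER + doi.strip()


def metadata_request(doi: str) -> Request:
    """Build (but do not send) a request asking for CSL JSON metadata.

    Assumption: the registration agency behind the DOI supports content
    negotiation for this media type.
    """
    return Request(
        doi_to_url(doi),
        headers={"Accept": "application/vnd.citationstyles.csl+json"},
    )


# DOI taken from the EUDAT FAIR checklist slide in this deck.
req = metadata_request("10.5281/zenodo.1065991")
print(req.full_url)
print(req.get_header("Accept"))
```

Passing the request to `urllib.request.urlopen` would perform the actual lookup; that step is left out so the sketch runs without network access.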
  14. 14. Principles =/= practice. H2020 DMP Guidelines: “This template is inspired by FAIR as a general concept.” Meaning: find your own (disciplinary) practice. Guidelines on FAIR data management in Horizon 2020: http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf GO FAIR: an initiative towards the internet of FAIR data and services. It started in Europe, but reaches out widely. https://www.dtls.nl/fair-data/go-fair/ EC infographic: http://ec.europa.eu/research/images/infographics/policy/open-data-2016-w920.png
  15. 15. Intermezzo: research data lifecycles
  16. 16. Sample lifecycle 1 The integrated scientific life cycle of embedded networked sensor research. From: Alberto Pepe, Matthew Mayernik, Christine L. Borgman, Herbert Van de Sompel: “From Artifacts to Aggregations: Modeling Scientific Life Cycles on the Semantic Web”. https://arxiv.org/ftp/arxiv/papers/0906/0906.2549.pdf
  17. 17. Sample lifecycle 2: data lifecycle as part of research lifecycle. “Open Access Tube Map” (CC-BY): Awre, Chris L.; Stainthorp, Paul; and Stone, Graham (2016) “Supporting Open Access Processes Through Library Collaboration”, Collaborative Librarianship: Vol. 8: Iss. 2, Article 8.
  18. 18. Sample lifecycle 3: what OpenAIRE and EOSC-hub support (or plan to support) Presented by Gergely Sipos during the EOSC-hub – OpenAIRE webinar “National nodes meetup”, April 24 2018. Retrieved May 8 2018 from https://www.openaire.eu/webinars/
  19. 19. Sample lifecycle 4: EOSC-hub research data lifecycle. Figure: four numbered stages, labelled Discover & Reuse; Processing & Analysis; Data Management, Curation & Preservation; Access, Deposition & Sharing.
  20. 20. Our favourite: simplified research data lifecycle • CREATING DATA • PROCESSING DATA • ANALYSING DATA • PRESERVING DATA • GIVING ACCESS TO DATA • RE-USING DATA. Based on UK Data Archive lifecycle: https://www.ukdataservice.ac.uk/manage-data/lifecycle Used in OpenAIRE RDM briefing paper: https://www.openaire.eu/briefpaper-rdm-infonoads
  21. 21. Planning for FAIR: think backwards. What would a re-user need? • CREATING DATA • PROCESSING DATA • PRESERVING DATA • GIVING ACCESS TO DATA • RE-USING DATA. “Lots of documentation is needed”. EUDAT FAIR checklist (CC-BY), Sarah Jones & Marjan Grootveld, EUDAT. https://doi.org/10.5281/zenodo.1065991
  22. 22. Metadata • Metadata (persistent identifier included) are needed to locate research data and get a first idea of the content. • Use relevant standards to enable interoperability. • Check which standards the long-term repository supports or expects. Metadata standards by discipline: • Arts and humanities • Engineering • Life sciences • Physical sciences and mathematics • Social and behavioral sciences • General research data: e.g. Dublin Core and DataCite. Directories: http://rd-alliance.github.io/metadata-directory https://rdamsc.dcc.ac.uk/ Extra, metadata tools: https://rdamsc.dcc.ac.uk/tool-index https://fairsharing.org/
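As a rough illustration of checking metadata against a general-purpose standard before deposit, the sketch below tests a record for the six mandatory properties of the DataCite Metadata Schema (4.x). The flat dictionary layout, the key spellings, the helper name and the sample record are simplifying assumptions for this example; they are not the official DataCite serialisation, and a repository may require more fields.

```python
from __future__ import annotations

# Mandatory properties of the DataCite Metadata Schema (version 4.x).
# The key spellings below are a simplification chosen for this sketch.
DATACITE_MANDATORY = (
    "identifier",
    "creators",
    "titles",
    "publisher",
    "publicationYear",
    "resourceType",
)


def missing_mandatory(record: dict) -> list[str]:
    """Return the mandatory properties that are absent or empty in the record."""
    return [key for key in DATACITE_MANDATORY if not record.get(key)]


# Hypothetical record, deliberately lacking a publication year.
record = {
    "identifier": "10.1234/example-dataset",
    "creators": ["Doe, Jane"],
    "titles": ["Example dataset"],
    "publisher": "Example Repository",
    "resourceType": "Dataset",
}

print(missing_mandatory(record))  # ['publicationYear']
```

A check like this is most useful early in the lifecycle, when the missing information is still easy to obtain.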
  23. 23. Documentation? • Code book explaining the variables • Study design • Lab journal • iPython or Jupyter notebook • Statistical queries • Software or instruments to understand or to reproduce the data • Machine configurations • Informed consent information • Data usage licence • … In short: document and preserve everything that is needed to reproduce the study, ideally following the standard in your discipline.
  24. 24. Interoperability. Before clocks were invented, people kept time using different instruments to observe the Sun’s zenith at noon. Towns and cities set clocks based on sunsets and sunrises. Time calculation became a serious problem for people travelling by train, sometimes hundreds of miles in a day. UTC is the world's time standard.
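The time-standard analogy carries over directly to data files: timestamps recorded in local time are only interoperable if their offset is known. A minimal sketch, assuming a hypothetical measurement taken at 14:00 in a UTC+2 zone, converts it to UTC in ISO 8601 form; the function name and the sample time are illustrative only.

```python
from datetime import datetime, timedelta, timezone


def to_utc_iso(local: datetime) -> str:
    """Convert an offset-aware timestamp to an ISO 8601 string in UTC."""
    if local.tzinfo is None:
        # A naive timestamp is ambiguous, which is exactly the
        # interoperability problem described on the slide.
        raise ValueError("timestamp needs an explicit UTC offset")
    return local.astimezone(timezone.utc).isoformat()


# Hypothetical local time in a UTC+2 zone.
utc_plus_2 = timezone(timedelta(hours=2))
measured_at = datetime(2018, 5, 15, 14, 0, tzinfo=utc_plus_2)

print(to_utc_iso(measured_at))  # 2018-05-15T12:00:00+00:00
```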
  25. 25. Services at the point of need – Research data lifecycle
  26. 26. EOSC-hub and OpenAIRE services for a researcher? (Image: Focus Clinic via Flickr, CC)
  27. 27. Simplified lifecycle with FAIR support. Stages: CREATING DATA, PROCESSING DATA, ANALYSING DATA, PRESERVING DATA, GIVING ACCESS TO DATA, RE-USING DATA. Different stages of the lifecycle can benefit from different services (Marketplace): • Plan for FAIR and well-managed data with EasyDMP or DMPonline. Because a DMP is a living document, ask yourself in each stage of the lifecycle if there are reasons to update or refine it. • PIDs: reference data and make them findable and citable with DOIs from B2SHARE and Zenodo, or with B2HANDLE. • HPC data transfer from public data servers: B2STAGE. • Document what you do; store your mutable data (versions!) in B2DROP. • Move data to HPC; keep documenting; analyse with High Throughput Compute. • Prepare data for sharing: Amnesia. • Deposit data with metadata and documentation for interoperability and reusability: B2SHARE, B2SAFE, Online Storage. • Metadata support findability and the decision to reuse, and should be interoperable themselves: B2SHARE, B2FIND, Datahub. • Annotate the data for reusability: B2NOTE. • Promote open or restricted access to data and invite reuse; add a clear usage licence: Zenodo, B2SHARE. OpenAIRE/EOSC-hub webinar 15-05-2018 “How to manage your data to make them Open and FAIR”
  28. 28. B2FIND: making Open Science findable (http://b2find.eudat.eu/). Provided through EOSC-hub: ● Cross-disciplinary metadata and discovery service (B2FIND) allowing RIs to make their data findable and discoverable in a central catalogue ○ Metadata can be harvested via OAI-PMH; APIs such as JSON APIs and CSW 2.0 can also be used to collect metadata from the communities ○ The project provides support for integrating community data catalogues
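For readers unfamiliar with OAI-PMH harvesting, the sketch below shows the shape of a `ListRecords` request and how record identifiers can be read from a response. The verb, the `metadataPrefix` and `resumptionToken` arguments, the `oai_dc` prefix and the XML namespace come from the OAI-PMH 2.0 protocol. The endpoint URL, the function names and the sample response are hypothetical, and nothing here describes how B2FIND itself is implemented.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build an OAI-PMH ListRecords URL.

    When continuing a harvest, resumptionToken is an exclusive argument,
    so metadataPrefix is omitted in that case.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def record_identifiers(response_xml):
    """Extract the record identifiers from a ListRecords response."""
    root = ET.fromstring(response_xml)
    return [el.text for el in root.findall(".//oai:header/oai:identifier", OAI_NS)]


# Hypothetical, heavily shortened response from a community catalogue.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:1</identifier></header></record>
    <record><header><identifier>oai:example.org:2</identifier></header></record>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("https://example.org/oai"))
print(record_identifiers(SAMPLE_RESPONSE))
```

A real harvester would fetch each URL, read the `resumptionToken` element from the response, and repeat until the token is empty.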
  29. 29. B2DROP: sync and share research data (https://www.eudat.eu/services/b2drop). Provided through EOSC-hub: ● Store and share data with colleagues and team members, including research data not finalised for publishing ○ Cloud storage to share data with fine-grained access controls ○ Synchronise multiple versions of data across different devices, including workflow and computing environments ○ Publish data via B2SHARE
  30. 30. B2SHARE: store and publish data (https://b2share.eudat.eu/). Provided through EOSC-hub: ● Data repository & publishing service (B2SHARE) allowing RIs to publish and manage data in a persistent way ○ Use of DataCite DOIs & EPIC PIDs ○ Domain-specific metadata extensions ○ Manage the publication life cycle with version control ○ Community-defined authorisation rules ○ Annotations via defined ontologies
  31. 31. B2SHARE public license selector. Choose a public license by answering some questions regarding access to your dataset. Suggestions depend on several factors: - Type of data - Original licenses - Data consumer access and distribution rights. Or use the search functionality.
  32. 32. B2NOTE: use annotations to structure your data (https://b2note.eudat.eu/). Provided through EOSC-hub: ● Manage and share annotations on data with colleagues and team members ○ Annotations are keywords or commentaries attached to an object that explain or classify it ○ The B2NOTE annotation service is integrated with the B2SHARE service and technology ○ B2NOTE can be easily integrated with other community data repository services ○ Training on semantic annotations is provided
  33. 33. Marketplace (https://marketplace.egi.eu/). Provided through EOSC-hub: ● Marketplace: multi-tenant user-facing platform for service providers to publish their EOSC services and EOSC-compliant data repositories, and to collect service orders ○ Mature services and curated data ○ The RI retains control of and accountability for the services and data published, and participates in the management of the Hub service portfolio ○ Support for the use of common service templates
  34. 34. Amnesia: making personal data shareable • Micro data often reveal important private information, e.g. the medical condition of a person - Individuals are afraid to provide their data - Companies are afraid to share data with experts - GDPR makes a strict protection scheme obligatory • The key idea in anonymization is that identifying information is removed from the published data, so that no sensitive information can be attributed to a person, not even after data linking • The aim of anonymization methods is to allow sharing such data without compromising the privacy of the users. OpenAIRE Amnesia webinar 24-04-2018 https://www.openaire.eu/amnesia-data-anonymization-made-easy
  35. 35. Amnesia status • Amnesia not only removes direct identifiers like names, social security numbers et cetera, but also transforms secondary identifiers like birth date and zip code so that individuals cannot be identified in the data. • Amnesia is available as a public beta version at https://amnesia.openaire.eu • The online version is mostly for demonstration and testing purposes (sample datasets included) • Sensitive data can be anonymized locally by downloading the application - Security - Scalability • OpenAIRE is in the process of adjusting it to health data, and is looking for your feedback: amnesia-helpdesk@imis.athena-innovation.gr
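The distinction between direct and secondary identifiers can be illustrated with a small, self-contained sketch. It drops direct identifiers, coarsens birth date and zip code, and then reports the size of the smallest group of records sharing the same secondary identifiers (the "k" of the k-anonymity criterion, used here as one common yardstick). This is not Amnesia's code or API; the field names, generalisation rules and sample records are all assumptions made for the example.

```python
from collections import Counter

DIRECT_IDENTIFIERS = {"name", "ssn"}            # removed entirely
QUASI_IDENTIFIERS = ("birth_date", "zip_code")  # generalised, not removed


def generalise(record: dict) -> dict:
    """Drop direct identifiers and coarsen the secondary identifiers."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    out["birth_date"] = record["birth_date"][:4]      # keep the year only
    out["zip_code"] = record["zip_code"][:2] + "***"  # keep a 2-digit prefix
    return out


def smallest_group(records: list, keys=QUASI_IDENTIFIERS) -> int:
    """Size of the smallest set of records sharing the same key values."""
    counts = Counter(tuple(r[k] for k in keys) for r in records)
    return min(counts.values())


# Entirely fictional sample data.
people = [
    {"name": "A", "ssn": "001", "birth_date": "1980-03-12", "zip_code": "12345", "diagnosis": "flu"},
    {"name": "B", "ssn": "002", "birth_date": "1980-11-02", "zip_code": "12399", "diagnosis": "asthma"},
    {"name": "C", "ssn": "003", "birth_date": "1975-06-30", "zip_code": "54321", "diagnosis": "flu"},
    {"name": "D", "ssn": "004", "birth_date": "1975-01-15", "zip_code": "54388", "diagnosis": "diabetes"},
]

anonymised = [generalise(p) for p in people]
print(smallest_group(people))      # every person is unique before generalising
print(smallest_group(anonymised))  # each person now shares values with another
```

Coarser generalisation raises the group size but lowers the analytical value of the data, so the rules have to be chosen per dataset.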
  36. 36. Short facts about Zenodo (https://zenodo.org/) • Catch-all repository for EU-funded research • Up to 50 GB per upload • Data stored in the CERN Data Center • Persistent identifiers (DOIs) for every upload, with DOI versioning • Includes article-level metrics • Free for the long tail of science • Open to all research outputs from all disciplines • GitHub integration • Easily add EC funding information and report via OpenAIRE
  37. 37. DOI versioning in Zenodo http://blog.zenodo.org/2017/05/30/doi-versioning-launched/
  38. 38. GitHub > Zenodo. Zenodo: https://zenodo.org/
  39. 39. Data management planning • Recall that research funders like the EC and (academic) employers increasingly demand DMPs • Tools are available for writing your DMP: DMPonline: https://dmponline.dcc.ac.uk/ EasyDMP: https://easydmp.sigma2.no/
  40. 40. DMP-writing tools. Both tools… • … contain the EC’s Horizon2020 DMP template • … allow you to collaborate with others on your DMP (under construction) • … allow you to export your DMP • … plan to support “machine-actionable DMPs”. Points of comparison: guidance follows EC guidance text more closely; additional DCC guidance; guidance is more interpretative; pull-down menus to select e.g. metadata schema and file formats. Any feedback? support@easydmp.sigma2.no DMPonline: https://dmponline.dcc.ac.uk/ EasyDMP: https://easydmp.sigma2.no/
  41. 41. Did you know? • Integrating Open Science in your European research proposal makes the proposal more competitive. - Grigorov, Ivo; Elbæk, Mikael; Rettberg, Najla; Davidson, Joy: “Winning Horizon 2020 with Open Science”. https://doi.org/10.5281/zenodo.12247 • There is evidence that grant proposals are receiving praise for including a DMP outline, even though in H2020 a DMP is not required at the proposal stage and is not a competitive point. • Quotes from EC evaluation reviews of grant proposals: - “a clear description is provided of how core data sets and model development can be shared broadly within the scientific community” - “data storage and accessibility issues are not considered sufficiently” - “there is very good realization of the commercial potential of the project outcomes, which is reflected in the establishment of a data management plan, including IP related issues.” So you had better start early on a concrete and convincing DMP ;-) Thanks to Ivo Grigorov (Technical University of Denmark, FOSTER project) for sharing these quotes. Webinar May 14th 2018
  42. 42. To conclude
  43. 43. FAIR data principles • Findable – Assign persistent IDs, provide rich metadata, register in a searchable resource, ... • Accessible – Retrievable by their ID using a standard protocol, metadata remain accessible even if data aren’t... • Interoperable – Use formal, broadly applicable languages, use standard vocabularies, qualified references... • Reusable – Rich, accurate metadata, clear licences, provenance, use of community standards... Services to improve FAIR & Open: • Amnesia • B2FIND • B2DROP • B2SHARE • B2NOTE • DMPonline and EasyDMP • Marketplace • Zenodo • Et cetera!
  44. 44. @EOSC_eu @openaire_eu Questions? Acknowledgements: we reused slides from the EOSC-hub, OpenAIRE-Advance and EUDAT projects. Thanks to Gergely Sipos (EGI), Shaun de Witt (CCFE), Manolis Terrovitis (Research Center Athena, IMSI), Najla Rettberg (University of Göttingen), Pedro Principe (University of Minho), Ivo Grigorov (Technical University of Denmark) and the EUDAT training team.
  45. 45. Thank you! Ellen Leenarts & Marjan Grootveld, name.surname@dans.knaw.nl
