Openness, exchange, FAIR DATA – oh brave new world that has such vision! (Dr. Marjan Grootveld)


Openness, exchange, FAIR data - oh brave new world. For some researchers, this is no longer a vision but already their day-to-day reality. For many others, however, terms like ‘open’, ‘FAIR data’ or ‘data exchange’ pose a challenge. What contribution can we make to ensure that new data comply with the FAIR Data Principles, and how can we measure the FAIRness of existing data? “Trust” is a key aspect: trust that others interpret ‘your’ data correctly, for example, or trust in data repositories.

Published in: Education

  1. 1. @openaire_eu Openness, exchange, FAIR Data – oh brave new world that has such vision in’t! ETH Library’s 17:15 Colloquium 25 October 2018 Marjan Grootveld DANS/OpenAIRE @MarjanGrootveld
  2. 2. Setting the scene MIRANDA: O, wonder! How many goodly creatures are there here! How beauteous mankind is! O brave new world, That has such people in’t! Source: Shakespeare, The Tempest “Often when the phrase "brave new world" is used today, it is meant to imply positive if challenging change.” This is certainly true for Open Science and FAIR data. For example, - What exactly is FAIR in my field? - Relevant legislation in international collaboration? - What does “publishing my data” mean? - Public-private research: contracts? Academic freedom? - Etcetera! Key message: no-one knows all the answers, so let’s proceed in small steps – with a focus on trustworthy repositories today. Image CC0 -
  3. 3. Contents: Introductions; Data; FAIR data; Repositories; FAIR data in repositories
  4. 4. Introductions
  5. 5. What is DANS? Institute of Dutch Academy and Research Funding Organisation (KNAW & NWO) since 2005. First predecessor dates back to 1964 (Steinmetz Foundation); Historical Data Archive 1989. Mission: promote and provide permanent access to digital research resources.
  6. 6. DANS services DataverseNL: short- and mid-term data storage. EASY: certified long-term Electronic Archiving System for self-deposit. NARCIS: gateway to scholarly information in the Netherlands.
  7. 7. OpenAIRE Advance • Open Access Infrastructure for Research in Europe • Funded by Horizon 2020 to develop and maintain the infrastructure to support OA policy of the EU • Supports H2020 OA mandates o 100% OA on scientific publications o Open Research Data Pilot • A National Open Access Desk in each country • DANS leads the task group Research Data Management
  8. 8. OpenAIRE support: FAIR data in trustworthy repositories: the basics
  9. 9. OpenAIRE guides: FAIR data in trustworthy repositories: the basics
  10. 10. There’s value in data
  11. 11. Viking Mars Lander 11 Data now available from
  12. 12. Climatological database for the world’s oceans Image copied from Every yellow dot represents a ship report. Project website: 12
  13. 13. Happy funders Photo by – CC0 - 13
  14. 14. FAIR data
  15. 15. FAIR data principles 1. Findable – Easy to find by both humans and computer systems, based on mandatory metadata descriptions that allow the discovery of interesting datasets; 2. Accessible – Stored for the long term such that they can be easily accessed and/or downloaded with well-defined licence and access conditions (Open Access when possible), whether at the level of metadata or at the level of the actual data content; 3. Interoperable – Ready to be combined with other datasets by humans as well as computer systems; 4. Re-usable – Ready to be used for future research and to be processed further using computational methods. Note: Accessible does not imply Open for everyone.
  16. 16. FAIR data High-level Expert Group’s recommendations. FAIR data Expert Group presentation: action-plan. Rec. 10: Trusted Digital Repositories. Repositories need to be encouraged and supported to achieve CoreTrustSeal certification. The development of rival repository accreditation schemes, based solely on the FAIR principles, should be discouraged. Rec. 29: Implement FAIR metrics. (…) Repositories should publish assessments of the FAIRness of datasets, where practical, based on community review and the judgement of data stewards.
  17. 17. Open and FAIR: European Commission European Commission in the Guidelines: “Where will the data and associated metadata, documentation and code be deposited? Preference should be given to certified repositories which support open access where possible.” 17
  18. 18. Open and FAIR: Switzerland
  19. 19. Publish data also in your own interest ;-) 19
  20. 20. Focus on re-using data. Creating data, processing data, preserving data, giving access to data, re-using data. “Lots of documentation is needed” EUDAT FAIR checklist - CC-BY Sarah Jones & Marjan Grootveld, EUDAT.
  21. 21. Metadata. Metadata are needed to find the research data and get a first idea of the content. SNSF: intrinsic metadata: e.g. author's name, content of dataset, associated publication, etc.; submitter-defined metadata: e.g. definition of variable names, etc. Use relevant community standards to enable interoperability. Check which standards the long-term repository supports or expects. Extra: metadata tools: index
  22. 22. Documentation? Code book explaining the variables; Study design; Informed consent information; Lab journal; iPython or Jupyter notebook; Statistical queries; Software or instruments to understand or to reproduce the data; Machine configurations; Data usage licence; … In short: document and preserve everything that is needed to replicate the study – ideally following the standard in your discipline.
  23. 23. Interoperability Before clocks were invented, people kept time using different instruments to observe the Sun’s zenith at noon. Towns and cities set clocks based on sunsets and sunrises. Time calculation became a serious problem for people travelling by train, sometimes hundreds of miles in a day. UTC is the World's Time Standard. 23
  24. 24. Interoperability and sustainability Repositories may demand or prefer certain file formats, for long-term accessibility. No world-wide standard; may depend on the research domains that a repository serves. It’s recommended to contact your repository at an early stage. 24
  25. 25. Repositories
  26. 26. Select a trustworthy repository. For giving (i.e. archiving & sharing) and taking (i.e. reusing) data, choose a repository that: is certified as a ‘Trustworthy Digital Repository’; matches your data needs regarding file formats, access, licences; supports metadata standards; provides a persistent and globally unique identifier; provides guidance on data citation; provides clear information about costs (if any). Contact your repository in good time and benefit from its FAIR support.
  27. 27. Standards of trust 27
  28. 28. CoreTrustSeal https://www.coretrustseal.org/ CTS = Core Trust Seal DSA = Data Seal of Approval WDS = World Data System
  29. 29. World Glacier Monitoring Service 29
  30. 30. CoreTrustSeal requirements relate to FAIR data R2. The repository maintains all applicable licenses covering data access and use and monitors compliance. R3. The repository has a continuity plan to ensure ongoing access to and preservation of its holdings. R4. The repository ensures, to the extent possible, that data are created, curated, accessed, and used in compliance with disciplinary and ethical norms. R7. The repository guarantees the integrity and authenticity of the data. R8. The repository accepts data and metadata based on defined criteria to ensure relevance and understandability for data users. R10. The repository assumes responsibility for long-term preservation and manages this function in a planned and documented way. R11. The repository has appropriate expertise to address technical data and metadata quality and ensures that sufficient information is available for end users to make quality-related evaluations. R13. The repository enables users to discover the data and refer to them in a persistent way through proper citation. R14. The repository enables reuse of the data over time, ensuring that appropriate metadata are available to support the understanding and use of the data. 30
  31. 31. 31
  32. 32. EUDAT licensing wizard helps you pick licences for data & software Licensing research data and software 32 and integrated in Choose a public license by answering some questions regarding access to your dataset. Suggestions depend on several factors: - Type of data - Original licenses - Data consumer access and distribution rights EUDAT services are provided through EOSC-hub.
  33. 33. Zenodo. Catch-all repository for EU-funded research. Up to 50 GB per upload. Data stored in the CERN Data Center. Persistent identifiers (DOIs) for every upload, with DOI versioning. Includes article-level metrics. Free for the long tail of science. Open to all research outputs from all disciplines. GitHub integration. Easily add EC funding information and report via OpenAIRE.
  34. 34. In the making. The working group “Research data” within Science Europe is preparing criteria for the selection of trustworthy repositories (and core requirements for data management plans). Goal: data management based on these criteria will help researchers ensure that data are FAIR. Well aligned with CoreTrustSeal requirements, but no certificate. Science Europe will probably recommend that repositories that are not certified yet seek certification.
  35. 35. FAIR data in repositories
  36. 36. Assessing the FAIRness of data. Making data FAIR is essential. But how? For instance by inspecting how FAIR existing datasets are: go to a data repository and look at a dataset. Would you feel comfortable with reusing the data? Why? Why not? What would help you to trust these data? How can we measure this in a more structured way? Prototypes for FAIR data metrics are being developed. Who should do the assessment? Researchers from the data producer’s domain? Actual re-users? Data repository staff? A machine?
  37. 37. GO FAIR Metrics Group. Aim: to define metrics enabling both qualitative and quantitative assessment of the degree to which online resources comply with the 15 principles of FAIR Data. Founding members: Mark Wilkinson, Universidad Politécnica de Madrid; Susanna Sansone, University of Oxford; Michel Dumontier, Maastricht University; Peter Doorn, DANS; Luiz Olavo Bonino, VU Amsterdam/DTL; Erik Schultes, Dutch Techcentre for Life Sciences
  38. 38. FAIR “light” assessment Findable (defined by metadata (PID included) and documentation) 1. Neither PID nor metadata/documentation 2. PID without or with insufficient metadata 3. Sufficient/limited metadata without PID 4. PID with sufficient metadata 5. Extensive metadata and rich additional documentation available Accessible (defined by presence of user license) 1. Neither metadata nor data are accessible 2. Metadata are accessible but data are not accessible (no clear terms of reuse in license) 3. User restrictions apply (i.e. privacy, commercial interests, embargo period) 4. Public access (after registration) 5. Open access unrestricted Interoperable (defined by data format) 1. Proprietary (privately owned), non-open format data 2. Proprietary format, accepted by Certified Trustworthy Data Repository 3. Non-proprietary, open format = ‘preferred format’ 4. As well as being in the preferred format, data are standardised using a standard vocabulary format (for the research field to which the data pertain) 5. Data additionally linked to other data to provide context
  39. 39. FAIR badge scheme • FAIR as proxy for data “quality” or “fitness for (re-)use” • We want to create a badge system using the FAIR principles to assess data sets in a Trustworthy Digital Repository • Score each FAIR dimension on a 5-point scale • Operationalised the original principles to ensure no interactions among dimensions, to ease scoring – challenging ;-) • Consider Reusability as the resultant of the other three: the average FAIRness as an indicator of data quality, (F+A+I)/3=R • Outcome of the assessment: a FAIR badge per dataset • Developed a data assessment prototype: FAIRdat • Redesigning the prototype as a FAIR checklist for researchers. Example badge: F A I R, 2 User Reviews, 1 Archivist Assessment, 24 Downloads
  40. 40. Display FAIR badges in any repository 40
  41. 41. Aim for a FAIR-aligned research data lifecycle. Credit researchers and others who seek value in and add value to existing data. Support FAIR and Open data in trustworthy repositories, for instance by sticking to standards for data documentation and file formats (“replication packages”); checking how FAIR existing data are and learning from this; pushing your favourite repository to make processes and policies transparent, and get certified; teaching early-career researchers to manage data to make them FAIR and as open as possible. It’s all about TRUST! More information: OpenAIRE/EOSC-hub webinar: OpenAIRE webinar on Open Research Data in Horizon 2020: EUDAT/OpenAIRE/DANS webinar: Photo by Yuri Catalano – CC0 -
  42. 42. Thank you! Marjan Grootveld ETH Library’s 17:15 Colloquium, 25-10-2018
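
The badge arithmetic on slide 39 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the FAIRdat prototype: the class name `FairScores`, the function `badge` and the text layout of the badge are hypothetical. What comes from the slides is the 5-point scale per dimension and the rule (F+A+I)/3=R.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FairScores:
    """Scores for one dataset on the 5-point scale (1 = lowest, 5 = highest).

    Hypothetical container; only the scale and the averaging rule come from the slides.
    """
    findable: int
    accessible: int
    interoperable: int

    def __post_init__(self) -> None:
        # Reject scores outside the 5-point scale.
        for name in ("findable", "accessible", "interoperable"):
            value = getattr(self, name)
            if not 1 <= value <= 5:
                raise ValueError(f"{name} must be between 1 and 5, got {value}")

    @property
    def reusable(self) -> float:
        """Reusability as the average of the other three: R = (F + A + I) / 3."""
        return (self.findable + self.accessible + self.interoperable) / 3


def badge(scores: FairScores) -> str:
    """Format a plain-text badge; this layout is an assumption for illustration."""
    return (f"F{scores.findable} A{scores.accessible} "
            f"I{scores.interoperable} R{scores.reusable:.1f}")


example = FairScores(findable=4, accessible=5, interoperable=3)
print(badge(example))  # prints "F4 A5 I3 R4.0"
```

Deriving R from the other three scores, rather than storing it, mirrors the slide's choice to treat Reusability as the resultant of F, A and I.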