
OSFair2017 workshop | Monitoring the FAIRness of data sets - Introducing the DANS approach to FAIR metrics

Elly Dijk & Peter Doorn present the DANS approach to FAIR metrics

Workshop title: Open Science Monitor

Workshop overview:
What are the measurable components of Open Science? How do we build a trustworthy, global open science monitor? This workshop will discuss a potential framework to measure Open Science, covering the path from the publishing of an open policy (registries of policies and how these are represented or machine-read), to the use of open methodologies, and the opening up of research results, their recording and measurement.

DAY 2 - PARALLEL SESSION 5

  1. Monitoring the FAIRness of Data Sets – Introducing the DANS Approach to FAIR Metrics
     Elly Dijk & Peter Doorn, DANS. With thanks to Emily Thomas for some slides.
     www.dans.knaw.nl
     Open Science Monitor – a workshop organised by OpenAIRE, EOSCpilot, Jisc, and DANS at the OPEN SCIENCE FAIR, Athens, 7 Sept 2017
  2. What is FAIR data and why is it important for Open Science?
     • Open Science has become a policy priority: not only publications need to be openly accessible, but also other research outputs, especially research data
     • Principle: open if possible, protected if necessary
     • Legitimate restrictions to openness: to protect privacy; the data creator must be able to publish first
     • Openness itself is not enough, data must also be: Findable (F), Accessible (A), Interoperable (I), Reusable (R)
  3. DANS: an institute of the Dutch Academy (KNAW) and the Dutch research funding organisation (NWO) since 2005. Its first predecessor dates back to 1964 (Steinmetz Foundation); the Historical Data Archive followed in 1989. Mission: to promote and provide permanent access to digital research resources. DANS is about keeping data FAIR.
  4. FAIR guiding principles
     1. Findable – easy to find by both humans and computer systems, based on mandatory metadata that allow the discovery of interesting datasets;
     2. Accessible – stored for the long term so that they can easily be accessed and/or downloaded, with well-defined license and access conditions (Open Access when possible), whether at the level of the metadata or of the actual data content;
     3. Interoperable – ready to be combined with other datasets by humans as well as by computer systems;
     4. Reusable – ready to be used for future research and to be processed further using computational methods.
     Source: Wilkinson et al. (2016), see next slide
  5. Paper: Wilkinson et al., “The FAIR Guiding Principles for scientific data management and stewardship”, Scientific Data 3, 160018 (2016). http://www.nature.com/articles/sdata201618
  6. Everybody loves FAIR! Everybody wants to be FAIR… But what does that mean? How do we put the principles into practice?
  7. Our struggle with FAIR operationalization
     To be Findable:
     • F1. (meta)data are assigned a globally unique and eternally persistent identifier.
     • F2. data are described with rich metadata.
     • F3. (meta)data are registered or indexed in a searchable resource.
     • F4. metadata specify the data identifier.
     To be Accessible:
     • A1. (meta)data are retrievable by their identifier using a standardized communications protocol.
     • A1.1. the protocol is open, free, and universally implementable.
     • A1.2. the protocol allows for an authentication and authorization procedure, where necessary.
     • A2. metadata are accessible, even when the data are no longer available.
     Questions raised on the slide: there are no rankings; what is the relation between F2 and F4, or between A1 and F2–F4? What if the data never were available (A2)? Do access licenses belong under Reuse? The access criteria relate mainly to machine access.
     Help page: “The FAIR Data Principles explained”
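Several of these criteria hint at machine-actionable tests. As a minimal sketch (not any official implementation), a check for A1/A1.1 could simply try to resolve an identifier over HTTPS, an open and universally implementable protocol; the function name and the assumption that the identifier is a DOI are ours:

```python
# Minimal sketch of a machine-actionable test for A1/A1.1: "(meta)data are
# retrievable by their identifier using a standardized communications
# protocol". Assumes the identifier is a DOI and the protocol is HTTPS;
# illustrative only, not the FAIR Metrics Group's implementation.
import urllib.error
import urllib.request

def resolves_over_https(doi: str, timeout: float = 10.0) -> bool:
    """Return True if https://doi.org/<doi> resolves to a live landing page."""
    req = urllib.request.Request(f"https://doi.org/{doi}", method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status < 400  # redirects are followed automatically
    except (urllib.error.URLError, TimeoutError):
        return False

# Example: the DOI of the FAIR principles paper itself.
print(resolves_over_https("10.1038/sdata.2016.18"))
```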
  8. Difficulties in FAIR operationalization II
     To be Interoperable:
     • I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.
     • I2. (meta)data use vocabularies that follow FAIR principles.
     • I3. (meta)data include qualified references to other (meta)data.
     To be Reusable:
     • R1. (meta)data have a plurality of accurate and relevant attributes.
     • R1.1. (meta)data are released with a clear and accessible data usage license.
     • R1.2. (meta)data are associated with their provenance.
     • R1.3. (meta)data meet domain-relevant community standards.
     Questions raised on the slide: I2 has a Droste effect (self-reference); which other (meta)data does I3 refer to? Is R1 not about Findability again? Are licenses not about access conditions, and what if a license says “not accessible”? “Their” provenance: whose? Is R1.3 not about Findability again?
     For FAIR help see: https://www.dtls.nl/fair-data/fair-principles-explained/
  9. Our evaluation of the original FAIR principles:
     1. There are dependencies among the dimensions, the criteria are difficult to understand and measure, there is no rank order from “low” to “high” FAIRness, and the grouping of criteria under dimensions is disputable.
     2. Do we need or want additional dimensions, principles or criteria?
        • Is “openness” a separate dimension, not included in FAIR?
        • Is it desirable/possible to say something about “substantive” data quality, such as the accuracy/precision or correctness of the data?
        • What about long-term access? For how long does data remain FAIR?
        • Should data security be included?
     3. Several FAIR criteria can be solved at the level of the repository.
     4. Do we need separate FAIR criteria for different disciplines?
        • e.g. machine-actionable data are more important in some fields than in others; note that data accessibility by machines is defined partly by technical specs (A1) and partly by licenses (R1.1)
     Therefore, DANS started work on a FAIR “light” implementation in the context of data repositories compliant with the Data Seal of Approval.
  10. GO-FAIR Initiative: FAIR Metrics Group
     • Aim: to define metrics enabling both qualitative and quantitative assessment of the degree to which online resources comply with the 15 Principles of FAIR Data
     • Founding members:
        • Mark Wilkinson, Universidad Politécnica de Madrid
        • Susanna Sansone, University of Oxford
        • Michel Dumontier, Maastricht University
        • Peter Doorn, DANS
        • Luiz Olavo Bonino, VU/DTL
        • Erik Schultes, DTL
     • https://www.dtls.nl/fair-data/fair-metrics-group/ and http://fairmetrics.org/
  11. FAIR Metrics Framework
  12. FAIR Metrics Form (FAIR Metrics Group) – fields:
     • Metric Identifier
     • Metric Name
     • To which principle does it apply?
     • What is being measured?
     • Why should we measure it?
     • What must be provided?
     • How do we measure it?
     • What is a valid result?
     • For which digital resource(s) is this relevant?
     • Examples of their application across types of digital resource
     • Comment
     Goals:
     • Develop a broadly usable framework and tool for FAIR assessment
     • Harmonize current ongoing efforts
     Expected outcomes:
     • A checklist of clearly defined metrics
     • An indication of how these metrics can be readily implemented (e.g. self-reporting, automated)
     • Proposal(s) for undertaking a FAIR assessment of a digital resource
     • One or more implementations for performing an assessment
     Can we do it?
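The form's fields translate naturally into a record type. Here is a hypothetical rendering in Python, with field names taken from the slide; the class and its layout are ours, not an official schema:

```python
# Hypothetical rendering of the FAIR Metrics form as a record type.
# Field names follow the slide; the class itself is not an official schema.
from dataclasses import dataclass

@dataclass
class FairMetricForm:
    identifier: str          # Metric Identifier
    name: str                # Metric Name
    principle: str           # To which principle does it apply?
    measured: str            # What is being measured?
    rationale: str           # Why should we measure it?
    requirements: str        # What must be provided?
    procedure: str           # How do we measure it?
    valid_result: str        # What is a valid result?
    relevant_resources: str  # For which digital resource(s) is this relevant?
    examples: str            # Examples of application across resource types
    comment: str = ""        # Comment (optional)
```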
  13. The DANS approach to FAIR
     Data Seal of Approval principles (for data repositories, 2006) vs FAIR principles (for data sets, 2014):
     • data can be found on the internet ↔ Findable
     • data are accessible ↔ Accessible
     • data are in a usable format ↔ Interoperable
     • data are reliable ↔ Reusable
     • data can be referred to (citable)
     The resemblance is not perfect:
     • usable format (DSA) is an aspect of interoperability (FAIR)
     • reliability (DSA) is a precondition for reusability (FAIR)
     • FAIR explicitly addresses machine readability
     A DSA-certified repository already guarantees a baseline data quality level.
  14. All data sets in a Trusted Repository are FAIR, but some are more FAIR than others
  15. FAIR “light” assessment
     Findable (defined by metadata, PID included, and documentation):
     1. No PID nor metadata/documentation
     2. PID without or with insufficient metadata
     3. Sufficient/limited metadata without PID
     4. PID with sufficient metadata
     5. Extensive metadata and rich additional documentation available
     Accessible (defined by the presence of a user license):
     1. Neither metadata nor data are accessible
     2. Metadata are accessible but the data are not (no clear terms of reuse in the license)
     3. User restrictions apply (e.g. privacy, commercial interests, embargo period)
     4. Public access (after registration)
     5. Open access, unrestricted
     Interoperable (defined by data format):
     1. Proprietary (privately owned), non-open data format
     2. Proprietary format, accepted by a Certified Trustworthy Data Repository
     3. Non-proprietary, open format = “preferred format”
     4. In addition to the preferred format, the data are standardised using a standard vocabulary (for the research field to which the data pertain)
     5. The data are additionally linked to other data to provide context
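For illustration, the rubric can be written down as a lookup table; the level descriptions below are condensed from the slide, and the data structure itself is ours, not a DANS artefact:

```python
# The FAIR "light" rubric as a lookup table. Level descriptions are
# condensed from the slide; the structure is illustrative only.
FAIR_LIGHT_RUBRIC = {
    "F": {  # Findable: defined by metadata (PID included) and documentation
        1: "no PID nor metadata/documentation",
        2: "PID without or with insufficient metadata",
        3: "sufficient/limited metadata without PID",
        4: "PID with sufficient metadata",
        5: "extensive metadata and rich additional documentation",
    },
    "A": {  # Accessible: defined by the presence of a user license
        1: "neither metadata nor data are accessible",
        2: "metadata accessible, data not (no clear terms of reuse)",
        3: "user restrictions apply (privacy, commercial interests, embargo)",
        4: "public access (after registration)",
        5: "open access, unrestricted",
    },
    "I": {  # Interoperable: defined by data format
        1: "proprietary, non-open format",
        2: "proprietary format accepted by a certified repository",
        3: "non-proprietary, open 'preferred' format",
        4: "preferred format plus a standard vocabulary for the field",
        5: "additionally linked to other data to provide context",
    },
}

# Example lookup: what does an Accessibility score of 4 mean?
print(FAIR_LIGHT_RUBRIC["A"][4])  # public access (after registration)
```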
  16. Towards a FAIR framework? Analogous to the certification framework, with its Formal – Extended – Core levels? All noses in the same direction (i.e. is everyone aligned)?
  17. DANS FAIR assessment profile
     • Developing the data assessment tool: FAIRdat
     • We want to create a badge system using the FAIR principles to assess data sets in a TDR
     • Operationalise the original principles so that there are no interactions among dimensions, to ease scoring
     • Consider Reusability as the resultant of the other three: the average FAIRness as an indicator of data quality, (F+A+I)/3 = R (see the sketch below)
     • Manual and automatic scoring
     • New principles still in progress
     [Badge mockup: F, A, I, R scores; 2 user reviews, 1 archivist assessment, 24 downloads]
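A minimal sketch of the proposed scoring rule, assuming 1–5 scores per dimension as in the FAIR “light” rubric (the function name is ours):

```python
# Minimal sketch of the proposed rule that Reusability (and the overall
# FAIRness) is the resultant of the other three dimensions:
# R = (F + A + I) / 3. Scores follow the 1-5 FAIR "light" rubric.
def fair_score(f: int, a: int, i: int) -> float:
    """Compute R, the average FAIRness, from 1-5 scores for F, A and I."""
    for name, score in (("F", f), ("A", a), ("I", i)):
        if not 1 <= score <= 5:
            raise ValueError(f"{name} score must be 1-5, got {score}")
    return (f + a + i) / 3

# Example: PID with sufficient metadata (F=4), unrestricted open access
# (A=5), preferred open format (I=3) gives R = 4.0.
print(fair_score(4, 5, 3))  # 4.0
```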
  18. Ideas for our operationalization: an attempt to operationalize ‘R’ (proposed solution)
     • R1. (meta)data are richly described with a plurality of accurate and relevant attributes ≈ F2 (“data are described with rich metadata”) → F
     • R1.1. (meta)data are released with a clear and accessible data usage license → A; aren’t licenses about access conditions?
     • R1.2. (meta)data are associated with detailed provenance → F; provenance appears as a criterion for findable metadata
     • R1.3. (meta)data meet domain-relevant community standards → I; ≈ standardized vocabularies
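The proposed remapping fits in one small table (labels condensed from the slide; the structure is ours):

```python
# The proposed remapping of the R sub-principles onto F, A and I, as a
# lookup table; condensed from the slide, structure illustrative only.
R_REMAPPING = {
    "R1":   "F",  # plurality of accurate, relevant attributes ~ F2 rich metadata
    "R1.1": "A",  # usage licenses are access conditions
    "R1.2": "F",  # provenance is a criterion for findable metadata
    "R1.3": "I",  # community standards ~ standardized vocabularies
}
```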
  19. Creating the assessment tool FAIRdat
  20. Accompanying documentation
  21. Future goals… the FAIRdat website
  22. …FAIR database, with fields such as:
     • Identifier
     • Name of Data Set
     • Depositor / “owner”
     • Repository
     • Discipline
     • …
     • FAIR scores per item
     • Total FAIR score
     • Badge
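A hypothetical record layout for such a database, with field names taken from the slide; the class and its defaults are illustrative, not a published DANS schema:

```python
# Hypothetical record layout for the envisaged FAIR database; field names
# follow the slide, everything else is illustrative.
from dataclasses import dataclass, field

@dataclass
class FairDatabaseEntry:
    identifier: str            # PID of the data set
    name: str                  # Name of Data Set
    depositor: str             # Depositor / "owner"
    repository: str            # Repository holding the data set
    discipline: str            # Discipline
    item_scores: dict[str, int] = field(default_factory=dict)
                               # FAIR scores per item, e.g. {"F": 4, "A": 5, "I": 3}
    total_score: float = 0.0   # Total FAIR score, e.g. (F + A + I) / 3
    badge: str = ""            # Badge, e.g. a URL to the badge image
```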
  23. Display FAIR badges in any repository (Zenodo, Dataverse, Mendeley Data, figshare, B2SAFE, …)
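A sketch of how a repository landing page might embed such a badge, assuming a hypothetical badge endpoint (no such URL is named in the slides):

```python
# Sketch of embedding a FAIR badge on a repository landing page.
# The badge service URL and the PID below are hypothetical.
def badge_embed(pid: str) -> str:
    """Return an HTML snippet a repository could place on a landing page."""
    base = "https://fairdat.example.org/badge"  # hypothetical badge service
    return f'<img src="{base}?pid={pid}" alt="FAIR badge for {pid}">'

print(badge_embed("10.1234/example-dataset"))  # hypothetical PID
```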
  24. Monitoring Open Science: research data
     • How can we use this tool for monitoring Open Science?
     • Do you think people are willing to assess research data in repositories?
     • And who will do the assessment? (depositors, data experts at repositories, users)
     • Using the FAIR metrics:
        • in one’s own repository – harvested by OpenAIRE
        • in the FAIR database – harvested by OpenAIRE
  25. Thank you for your attention!
     • You’re invited to visit the DANS/EOSCpilot workshop and try FAIRdat: “FAIR metrics: starring your datasets”, Friday 8 September at 11:30h, Book Castle, Ground Floor
     • elly.dijk@dans.knaw.nl | peter.doorn@dans.knaw.nl
     • www.dans.knaw.nl | www.eoscpilot.eu | www.openaire.eu
     The European Open Science Cloud for Research pilot project (EOSCpilot, www.eoscpilot.eu) is funded by the European Commission, DG Research & Innovation, under contract no. 739563.
