
FAIR principles and metrics for evaluation

A talk about current work on developing FAIR metrics and assessing repositories.


  1. FAIR principles and metrics for evaluation. Michel Dumontier, Ph.D., Distinguished Professor of Data Science. @micheldumontier::#DANSLOD:2017-05-01
  2. Principles to enhance the value of all digital resources and their metadata: data, images, software, web services, repositories. http://www.nature.com/articles/sdata201618
  3. (image-only slide)
  4. Rapid Adoption of the Principles. Developed and endorsed by researchers, publishers, funding agencies, and industry partners. As of May 2017, 100+ citations since the 2016 publication. Included in the G20 communiqué, EOSC, H2020, NIH, and more…
  5. Hypothesis: improving the FAIRness of digital resources will increase their reuse.
  6. What is FAIRness? FAIRness reflects the extent to which a digital resource addresses the FAIR principles, as per the expectations defined by a community of stakeholders.
  7. How do we assess compliance with the FAIR principles?
     • The principles identify what needs to be there, but they do not say what is necessary and/or sufficient.
     • They also do not tell you how to achieve FAIR.
     • Going beyond the principles requires some thought about what constitutes FAIRness and how we measure it.
  8. Fundamental Questions
     • In what ways can we assess the FAIRness of a digital resource?
     • To what degree can we automate this assessment?
     • Must we treat each type of digital resource differently?
     • Who will use the metrics? The producers, the funders, or the users?
     • Can one resource be more FAIR than another?
     • Will/should FAIRness assessments impact funding decisions?
     • Should only one organization define these metrics, or can anybody make their own? What happens if a digital resource scores well against one set of metrics, but not another?
  9. Horizon 2020: Data Management Plan. Section 2, FAIR data:
     1. Making data findable, including provisions for metadata (5 questions)
     2. Making data openly accessible (10 questions)
     3. Making data interoperable (4 questions)
     4. Increasing data re-use, through clarifying licenses (4 questions)
     Additional sections:
     1. Data summary (6 questions, 5 of which also cover aspects of FAIRness)
     2. Allocation of resources (4 questions)
     3. Data security (2 questions)
     4. Ethical aspects (2 questions)
     5. Other issues (2 questions)
     A total of 23 + 16 = 39 questions!! https://goo.gl/Strjua
  10. FAIRness of repositories
     • IDCC17 practice paper “Are the FAIR Data Principles fair?” by Alastair Dunning, Madeleine de Smaele, and Jasmin Böhmer.
     • The web interfaces, help pages, and metadata records of over 40 data repositories were examined to score each repository against the FAIR principles.
     • ~2 months of work.
     Data: http://dx.doi.org/10.4121/uuid:5146dd06-98e4-426c-9ae5-dc8fa65c549f
     Paper: https://zenodo.org/record/321423#.WNFNrTvytm8
  11. 37 repositories
  12. Scoring the resources
  13. (image-only slide)
  14. (image-only slide)
  15. Overall evaluation
  16. Summary of Study
     • An impressive first attempt at an assessment of FAIRness across repositories.
     • Issues:
       – Lack of a fully described mechanism by which repository owners can provide the necessary information.
       – Fully manual effort; as far as I know, inter-annotator agreement was not established.
       – Not easy to scale. Can we automate it?
  17. Measures for Digital Repositories
     • Data Seal of Approval: 6 core requirements, 16 criteria
     • DIN 31644 (Information and documentation – Criteria for trustworthy digital archives): 10 core requirements, 34 criteria
     • ISO 16363 (Audit and certification of trustworthy digital repositories): 100+ criteria
  18. DSA
     • The data can be found on the Internet.
     • The data are accessible (clear rights and licences).
     • The data are in a usable format.
     • The data are reliable.
     • The data are identified in a unique and persistent way so that they can be referred to.
  19. The 16 DSA requirements:
     1. mission to provide access to and preserve data
     2. licenses covering data access and use, and monitoring of compliance
     3. continuity plan
     4. ensures that data are created and used in compliance with norms
     5. adequate funding and qualified staff through clear governance
     6. mechanism(s) for expert guidance and feedback
     7. guarantees the integrity and authenticity of the data
     8. accepts data and metadata to ensure relevance and understandability
     9. applies documented processes in archival
     10. responsibility for preservation that is documented
     11. expertise to address data and metadata quality
     12. archiving according to defined workflows
     13. enables discovery and citation
     14. enables reuse with appropriate metadata
     15. infrastructure
     16. infrastructure
     https://www.datasealofapproval.org
  20. Data Seal of Approval: self-assessment in the DSA online tool. The online tool takes you through the 16 requirements and provides you with support. Once you have completed your self-assessment, you can submit it for peer review.
  21. • Score data on each FAIR dimension (e.g. from 1 to 5).
     • Total score of FAIRness as an indicator of data quality.
     • Scoring can only be partly automatic; not all principles can be established objectively:
       – scoring at ingest by the data archivists of a TDR (trusted digital repository)
       – after reuse, by data users (community review)
     Peter Doorn: https://dans.knaw.nl/nl/actueel/PresentationP.D..pdf
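To make the proposal above concrete, here is a minimal sketch of how such per-dimension scoring might be aggregated. The dimension names, the equal weighting, and the averaging are assumptions for illustration, not part of the DANS proposal.

```python
# A minimal sketch of 1-to-5 scoring per FAIR dimension (assumed equal
# weighting and averaging; not the official DANS implementation).

FAIR_DIMENSIONS = ("findable", "accessible", "interoperable", "reusable")

def fairness_score(scores: dict[str, int]) -> float:
    """Average per-dimension scores (each 1-5) into a single FAIRness value."""
    for dim in FAIR_DIMENSIONS:
        if not 1 <= scores.get(dim, 0) <= 5:
            raise ValueError(f"'{dim}' must be scored from 1 to 5")
    return sum(scores[dim] for dim in FAIR_DIMENSIONS) / len(FAIR_DIMENSIONS)

# Example: a dataset scored at ingest by an archivist.
print(fairness_score({"findable": 4, "accessible": 5,
                      "interoperable": 2, "reusable": 3}))  # -> 3.5
```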
  22. DANS FAIR metrics proposal
  23. (image-only slide)
  24. (image-only slide)
  25. (image-only slide)
  26. (image-only slide) — see http://www.w3.org/TR/hcls-dataset/
  27. VALIDATA DEMO: http://hw-swel.github.io/Validata/
     • RDF constraint validation tool
     • Configurable to any profile
     • Declarative, reusable schema description
     • Shape Expression (ShEx) constraints
     • Open-source JavaScript implementation
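As a rough illustration of the kind of constraint a ShEx profile expresses, the sketch below pairs a toy shape with a hand-rolled equivalent check in Python using rdflib. The shape, the property choices, and the example metadata are all assumptions; a real ShEx processor such as Validata would evaluate the shape directly.

```python
# Illustration only: a toy ShEx-style shape and a hand-rolled equivalent
# check with rdflib. A ShEx processor (e.g. Validata) would evaluate the
# shape itself; here we just test the same two constraints manually.
from rdflib import Graph
from rdflib.namespace import DCTERMS

# A shape like this says: a dataset description needs a title and a license.
# (Prefix declarations omitted; property choices are illustrative.)
SHAPE = """
<DatasetShape> {
  dcterms:title   xsd:string ;
  dcterms:license IRI
}
"""

metadata = """
@prefix dcterms: <http://purl.org/dc/terms/> .
<http://example.org/dataset/1>
    dcterms:title "An example dataset" ;
    dcterms:license <http://creativecommons.org/licenses/by/4.0/> .
"""

g = Graph().parse(data=metadata, format="turtle")
dataset = next(g.subjects(DCTERMS.title, None))
conforms = (g.value(dataset, DCTERMS.title) is not None
            and g.value(dataset, DCTERMS.license) is not None)
print("conforms:", conforms)  # -> conforms: True
```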
  28. NIH Commons Framework Working Group on FAIR Metrics. Aim: to identify and prototype methods to assess the FAIRness of a digital resource.
     – Identify and include initial stakeholders.
     – Develop and discuss potential metrics.
     – Explore ways in which to report and assess metrics.
  29. What is a metric?
     • A metric is a standard of measurement.
     • It must provide a clear definition of what is being measured and why one wants to measure it.
     • It must describe the process by which a valid measurement result is obtained, so that the result can be reproduced by others, and it must specify what a valid result is.
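One way to read the definition above is as a data structure: every metric carries a definition, a rationale, a reproducible procedure, and a statement of what counts as valid. The sketch below captures that reading in Python; the class and field names are my own, not a published schema.

```python
# Sketch of the parts a metric must specify, per the slide above: what is
# measured, why, how a measurement is obtained, and what a valid result is.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Metric:
    name: str                          # what is being measured
    rationale: str                     # why one wants to measure it
    procedure: Callable[[str], bool]   # reproducible measurement process
    valid_result: str                  # what counts as a valid result

    def measure(self, resource_url: str) -> bool:
        """Run the documented procedure so others can reproduce the result."""
        return self.procedure(resource_url)
```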
  30. Example of a FAIRness Metric
     F1: (meta)data are assigned a globally unique and persistent identifier.
     Aspect: identifier persistence.
     Rationale: an identifier can be used to find, access, and reuse a resource. As such, it must be available to users for the longest term possible; otherwise we will not be able to perform those functions with the identifier in hand.
     Relevant FAIR principles: F, A, I, R.
     Metric: availability of a data management plan that includes a section dealing with continuity and contingencies related to the persistence of identifiers. The value of the metric is true or false.
     Procedure: check and verify that the URL in the resource metadata points to a data management plan with a continuity section. The document should follow a community standard, or recommend a basic structure.
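A hedged sketch of the stated procedure follows. Since the slide does not fix a community standard for DMP structure, the keyword test for a continuity section is a stand-in assumption.

```python
# Sketch of the F1 procedure above: verify that a URL from the resource
# metadata resolves to a data management plan that has a continuity section.
# The keyword test is a stand-in; a real check would parse a DMP that
# follows an agreed community structure.
import requests

def check_identifier_persistence(dmp_url: str) -> bool:
    """True if the DMP URL resolves and mentions continuity planning."""
    try:
        resp = requests.get(dmp_url, timeout=10)
    except requests.RequestException:
        return False  # unreachable plan -> metric evaluates to false
    if resp.status_code != 200:
        return False
    text = resp.text.lower()
    return "continuity" in text or "persistence of identifiers" in text
```

Wired into the `Metric` sketch above, this function would serve as the metric's `procedure`, and the true/false return value matches the slide's stated metric values.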
  31. Current Thinking: FAIRness Index
     • A FAIRness Index is a collection of metrics that are aligned to the FAIR principles and can be consistently and transparently evaluated.
     • A community, comprised of clearly defined stakeholders (researchers, publishers, users, etc.), may define its own FAIRness Index that expresses what makes a digital resource ideally or maximally FAIR.
  32. Stakeholders
     People worried about: findability, accessibility, interoperability, reuse, provenance, licensing, citation, value.
     People who are: potential users, resource creators, academics, publishers, industry, the public, funding agencies.
  33. Ways we can gather information to assess FAIRness:
     A) Self-assessment
     B) Self-appointed FAIR assessment team
     C) Automated assessment
     D) Crowdsourcing
     E) All of the above
  34. Sample Findable Metrics
     • Is there structured metadata describing the resource?
       – Check for embedded metadata as microdata or linked data.
       – Check for hyperlinked documents with standardized formats: HCLS dataset description, DCAT, schema.org annotations, etc.
     • Are entries identified with a persistent identifier?
       – Is there a DOI with scholarly publications?
       – Is there a permanent URL for each item (without query parameters)?
       – Is there a resource type specified, and does it use a well-known vocabulary such as EDAM, identifiers.org, etc.?
     • Can the resource be found in a recognized repository?
       – e.g. a database in Biosharing
       – e.g. a tool in ELIXIR bio.tools
       – e.g. gene expression data in GEO
     • Can the resource be found with a web search engine?
       – At what rank does the resource appear when the identifier or title is used in a web search?
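The first check above (embedded structured metadata) lends itself to automation. Here is a hedged sketch that looks for JSON-LD blocks in a landing page; parsing HTML with a regular expression is a deliberate simplification, and a fuller check would use a proper HTML parser and also cover microdata and RDFa.

```python
# Sketch of the first Findable check above: does the landing page embed
# structured metadata as JSON-LD?
import json
import re
import requests

def embedded_jsonld(landing_page_url: str) -> list[dict]:
    """Return any JSON-LD blocks embedded in the page ([] if none parse)."""
    html = requests.get(landing_page_url, timeout=10).text
    pattern = r'<script[^>]*type="application/ld\+json"[^>]*>(.*?)</script>'
    blocks = []
    for raw in re.findall(pattern, html, re.DOTALL | re.IGNORECASE):
        try:
            blocks.append(json.loads(raw))
        except json.JSONDecodeError:
            pass  # a malformed block simply does not count
    return blocks
```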
  35. Sample FAIR Metrics
     Accessible metrics:
     • Are the (meta)data accessible by permanent URL?
     • Can you obtain the resource in a standardized language (e.g. HTML, XML, JSON, JSON-LD)?
     • Are the data downloadable, in bulk or in part, with an application programming interface (API)? Is the API documented using Swagger or smartAPI, or does it follow the Hydra protocol?
     Interoperable metrics:
     • Are the (meta)data described with a community vocabulary?
     • Are the data and metadata linked to other datasets, vocabularies, and ontologies?
     • Are the data and metadata expressed in universal languages (e.g. XML, JSON, JSON-LD, RDF/XML)?
     Reusable metrics:
     • Is there a license specified? Is it a standardized license? Is it linked to in the resource metadata?
     • Is it clear how the work should be cited? See the FORCE11 Data Citation Implementation Pilot and bioCADDIE Working Group 5.
     • Is there any indication of reuse beyond its original context and original creators?
     • Is there any indication of access through published statistics?
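The first two Accessible checks can likewise be automated. The sketch below probes whether a URL resolves and whether content negotiation yields one of a set of standardized formats; the list of acceptable media types is an assumption for illustration.

```python
# Sketch of two Accessible checks above: does the permanent URL resolve,
# and does content negotiation return a standardized format?
import requests

ACCEPTED = ("application/ld+json", "application/json",
            "application/xml", "text/html")

def accessible_in_standard_format(url: str) -> bool:
    """True if the URL resolves and serves one of the standard formats."""
    resp = requests.get(url, headers={"Accept": ", ".join(ACCEPTED)},
                        timeout=10)
    media_type = resp.headers.get("Content-Type", "").split(";")[0].strip()
    return resp.status_code == 200 and media_type in ACCEPTED
```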
  36. Early stages of thinking about assessing the FAIRness of digital resources. Your input can help shape this emerging phenomenon.
     Questions:
     1. Does it make sense to assess the FAIRness of digital resources, and what are the implications of doing so?
     2. What are the barriers to realizing the FAIR vision?
     michel.dumontier@maastrichtuniversity.nl
     Website: http://maastrichtuniversity.nl/ids
     Presentations: http://slideshare.com/micheldumontier
