FAIR Data in Trustworthy Data Repositories Webinar - 12-13 December 2016 | www.eudat.eu
This webinar was co-organised by DANS, EUDAT and OpenAIRE and was held on 12 and 13 December 2016.


Everybody wants to play FAIR, but how do we put the principles into practice?

There is a growing demand for quality criteria for research datasets. In this webinar we will argue that the DSA (Data Seal of Approval for data repositories) and FAIR principles get as close as possible to giving quality criteria for research data. They do not do this by trying to make value judgements about the content of datasets, but rather by qualifying the fitness for data reuse in an impartial and measurable way. By bringing the ideas of the DSA and FAIR together, we will be able to offer an operationalization that can be implemented in any certified Trustworthy Digital Repository.

In 2014 the FAIR Guiding Principles (Findable, Accessible, Interoperable and Reusable) were formulated. The well-chosen FAIR acronym is highly attractive: it is one of those ideas that almost automatically sticks in your mind once you have heard it. Within a relatively short time, the FAIR data principles have been adopted by many stakeholder groups, including research funders.

The FAIR principles are remarkably similar to the underlying principles of DSA (2005): the data can be found on the Internet, are accessible (clear rights and licenses), in a usable format, reliable and are identified in a unique and persistent way so that they can be referred to. Essentially, the DSA presents quality criteria for digital repositories, whereas the FAIR principles target individual datasets.



In this webinar the two sets of principles will be discussed and compared, and a tangible operationalization will be presented.

www.dans.knaw.nl
DANS is an institute of KNAW and NWO

FAIR Data in Trustworthy Data Repositories: Everybody wants to play FAIR, but how do we put the principles into practice?
Peter Doorn, Director DANS
Ingrid Dillo, Deputy Director DANS
EUDAT/OpenAIRE webinar, 12 and 13 December 2016
Who we are
Watch our videos on YouTube: https://www.youtube.com/user/DANSDataArchiving

What you can expect to learn
• General understanding of core requirements for trustworthy data repositories
• General understanding of the FAIR principles
• Introduction to a possible way of operationalizing the FAIR principles
DANS and DSA
• 2005: DANS founded to promote and provide permanent access to digital research information
• Formulates quality guidelines for digital repositories, including DANS (TRAC, Nestor)
• 2006: 5 basic principles as the basis for 16 DSA guidelines
• 2009: international DSA Board
• Over 60 seals acquired around the globe, but with a focus on Europe

The certification landscape

DSA and WDS: look-alikes
Commonalities:
• Lightweight, community review
Complementarity:
• Geographical spread
• Disciplinary spread

Partnership
Goals:
• Realizing efficiencies
• Simplifying assessment options
• Stimulating more certifications
• Increasing impact on the community
Outcomes:
• Common catalogue of requirements for core repository assessment
• Common procedures for assessment
• Shared testbed for assessment

New common requirements
• Context (1)
• Organizational infrastructure (6)
• Digital object management (8)
• Technology (2)
• Additional information and applicant feedback (2)
1. Organisational infrastructure
R1. The repository has an explicit mission to provide access to and preserve data in its domain.
R2. The repository maintains all applicable licenses covering data access and use and monitors compliance.
R3. The repository has a continuity plan to ensure ongoing access to and preservation of its holdings.
R4. The repository ensures, to the extent possible, that data are created, curated, accessed, and used in compliance with disciplinary and ethical norms.
R5. The repository has adequate funding and sufficient numbers of qualified staff managed through a clear system of governance to effectively carry out the mission.
R6. The repository adopts mechanism(s) to secure ongoing expert guidance and feedback (either in-house or external, including scientific guidance, if relevant).

2. Digital object management (1)
R7. The repository guarantees the integrity and authenticity of the data.
R8. The repository accepts data and metadata based on defined criteria to ensure relevance and understandability for data users.
R9. The repository applies documented processes and procedures in managing archival storage of the data.
R10. The repository assumes responsibility for long-term preservation and manages this function in a planned and documented way.

2. Digital object management (2)
R11. The repository has appropriate expertise to address technical data and metadata quality and ensures that sufficient information is available for end users to make quality-related evaluations.
R12. Archiving takes place according to defined workflows from ingest to dissemination.
R13. The repository enables users to discover the data and refer to them in a persistent way through proper citation.
R14. The repository enables reuse of the data over time, ensuring that appropriate metadata are available to support the understanding and use of the data.

3. Technical infrastructure
R15. The repository functions on well-supported operating systems and other core infrastructural software and is using hardware and software technologies appropriate to the services it provides to its Designated Community.
R16. The technical infrastructure of the repository provides for protection of the facility and its data, products, services, and users.
New requirements are out now!
http://www.datasealofapproval.org/en/news-and-events/news/2016/11/25/wds-and-dsa-announce-unified-requirements-core-cert/
https://www.icsu-wds.org/news/news-archive/wds-dsa-unified-requirements-for-core-certification-of-trustworthy-data-repositories
Back to 2005: DSA principles
The DSA is intended to ensure that:
• The data can be found on the internet
• The data are accessible (clear rights and licenses)
• The data are in a usable format
• The data are reliable
• The data are identified in a unique and persistent way so that they can be referred to

FAIR Data Principles
Workshop Leiden 2014: minimal set of community-agreed guiding principles to make data more easily discoverable, accessible, appropriately integrated and re-usable, and adequately citable.
FAIR principles:
• Findable
• Accessible
• Interoperable
• Reusable
(both for machines and for people)

FAIR principles
In the FAIR approach, data should be:
1. Findable - Easy to find by both humans and computer systems, based on mandatory description of the metadata that allow the discovery of interesting datasets;
2. Accessible - Stored for the long term such that they can be easily accessed and/or downloaded, with well-defined license and access conditions (Open Access when possible), whether at the level of metadata or at the level of the actual data content;
3. Interoperable - Ready to be combined with other datasets by humans as well as computer systems;
4. Reusable - Ready to be used for future research and to be processed further using computational methods.
Source: http://www.dtls.nl/fair-data/
Paper in:
http://www.nature.com/articles/sdata201618

Everybody loves FAIR!
Everybody wants to be FAIR… But what does that mean? How to put the principles into practice?

Implementing the FAIR Principles?
See: http://datafairport.org/fair-principles-living-document-menu and https://www.force11.org/group/fairgroup/fairprinciples

Different implementations for different aims?
1. FAIR data management: posing requirements for new data creation
2. FAIR data assessment: establishing the profile of existing data
3. FAIR data technologies: transformation tools to make data FAIR
Creation - Assessment - Transformation
Creation: New Horizon 2020 guidelines on FAIR Data Management
Section 2. FAIR data:
1. Making data findable, including provisions for metadata (5 questions)
2. Making data openly accessible (10 questions)
3. Making data interoperable (4 questions)
4. Increase data re-use, through clarifying licenses (4 questions)
Additional sections:
1. Data summary (6 questions, 5 of which also cover aspects of FAIRness)
3. Allocation of resources (4 questions)
4. Data security (2 questions)
5. Ethical aspects (2 questions)
6. Other issues (2 questions)
Total of 23 + 16 = 39 questions!

This may result in: a FAIR DMP
Assess: Resemblance DSA - FAIR principles

DSA principles (for data repositories)  | FAIR principles (for data sets)
data can be found on the internet       | Findable
data are accessible                     | Accessible
data are in a usable format             | Interoperable
data are reliable                       | Reusable
data can be referred to (citable)       |

The resemblance is not perfect:
• usable format (DSA) is an aspect of interoperability (FAIR)
• reliability (DSA) is a condition for reuse (FAIR)
• FAIR explicitly addresses machine readability
• citability is in FAIR an aspect of findability
Combine and operationalize: DSA & FAIR
• Growing demand for quality criteria for research datasets and a way to assess their fitness for use
• Combine the principles of core repository certification and FAIR
• Use the principles as quality criteria:
  - Core certification: digital repositories
  - FAIR: research data (sets)
• Operationalize the principles to make them easily implementable in any trustworthy digital repository

How could it work?
• Consider F, A and I as separate dimensions of data quality
• Interaction effects among dimensions complicate scoring, as do elements that occur under different FAIR dimensions
• Score datasets on each dimension (from 1 to 5)
• Consider Reusability as the resultant of the other three:
  - Consider R, the average FAIRness, as an indicator of data quality
  - (F + A + I) / 3 = R
• Make scoring as automatic as possible, although not all principles can be established objectively:
  - scoring at ingest by data archivists of the TDR
  - after reuse by data users (community review)

Each dataset a FAIR profile
Examples:
• Dataset X has FAIR profile F4-A3-I2 → R = 3
  - PID with limited metadata (well findable; 4)
  - Accessible with some restrictions (3)
  - Fairly low interoperability (2)
• Dataset Y has FAIR profile F5-A2-I3 → R = 3.3
  - PID and extensive metadata (very easy to find; 5)
  - Only metadata accessible (2)
  - Average rating on interoperability (3)
Numbers of assessments, reviews and downloads are indicated as well
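The scoring rule on these slides can be sketched as a small helper. This is a hypothetical illustration, not DANS code: it combines per-dimension scores into the profile string and the derived R = (F + A + I) / 3.

```python
def fair_profile(f, a, i):
    """Combine F, A and I scores (each 1-5) into a FAIR profile string
    and the derived R score, following R = (F + A + I) / 3."""
    for name, score in (("F", f), ("A", a), ("I", i)):
        if not 1 <= score <= 5:
            raise ValueError(f"{name} score must be between 1 and 5")
    r = round((f + a + i) / 3, 1)
    return f"F{f}-A{a}-I{i}", r

# Dataset X from the slide: well findable (4), some restrictions (3),
# fairly low interoperability (2)
print(fair_profile(4, 3, 2))  # -> ('F4-A3-I2', 3.0)
```

Rounding to one decimal reproduces the slide's "R = 3,3" for Dataset Y (F5-A2-I3).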
By the way, alternative visualisation ideas…
• FAIR dice, suggested by Herbert van de Sompel
• FAIR letters, suggested by Ian Duncan
• FAIR leaves, suggested by Wouter Haak
• FAIR clover 1, FAIR clover 2, FAIR sunflower
Operationalising F - Findable
Findable - defined by metadata, documentation (and identifier for citation):
1. No PID and no metadata/documentation
2. PID without or with insufficient* metadata
3. Sufficient* metadata without PID
4. PID with sufficient* metadata
   - Information on data provenance
5. PID, rich metadata and additional documentation
   - Additional explanation of how data can be used
* Sufficient = enough metadata to understand what the data is about
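The five F levels could be mechanised along these lines. This is only a sketch: the `metadata` level labels are invented here, and deciding whether metadata is "sufficient" or "rich" remains a human judgement, as the slides note.

```python
def findability_score(has_pid, metadata):
    """Score the F dimension (1-5) from the five-level scale above.
    `metadata` is one of "none", "insufficient", "sufficient", "rich";
    a human assessor supplies that judgement."""
    if has_pid and metadata == "rich":
        return 5
    if has_pid and metadata == "sufficient":
        return 4
    if not has_pid and metadata in ("sufficient", "rich"):
        return 3
    if has_pid:
        return 2  # PID without, or with insufficient, metadata
    return 1      # no PID and no usable metadata/documentation
```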
Operationalising A - Accessible
Accessible - defined by presence of a user license [metadata retrievable by identifier: already included under F]:
1. No user license / unclear conditions of reuse / neither metadata nor data are accessible
2. Metadata are accessible (even when the data are not or no longer available)
3. User restrictions apply (of any kind, including privacy, commercial interests, embargo period, etc.)
4. Public Access (after registration)
5. Open Access (unrestricted, CC0 - perhaps also CC BY?)
Note 1: Some people want “Openness” to be separate from Accessible
Note 2: A3 could be seen as an acceptable threshold (e.g. by funders)
Operationalising I - Interoperable
Interoperable - defined by the data format:
1. Proprietary, non-open format data
2. Proprietary format, accepted by a DSA-certified Trusted Data Repository
3. Non-proprietary, open format (= “preferred” or “archival” format)
4. Data is additionally harmonized/standardized, using standard vocabularies
5. Data is additionally linked to other data to provide context
Note: this is an adaptation of Tim Berners-Lee's 5-star open data plan!
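The I scale lends itself to a lookup over format lists. The lists below are invented placeholders; a real implementation would use the repository's own registry of proprietary, accepted and preferred/archival formats.

```python
# Hypothetical format lists for illustration only.
PROPRIETARY = {".xls", ".sav", ".dta"}
ACCEPTED_PROPRIETARY = {".xlsx", ".docx"}
OPEN_ARCHIVAL = {".csv", ".txt", ".xml", ".json"}

def interoperability_score(extension, harmonized=False, linked=False):
    """Score the I dimension (1-5) from the data format, following the
    adapted 5-star scheme above. Levels 4 and 5 build on an open,
    archival format (level 3)."""
    ext = extension.lower()
    if ext in OPEN_ARCHIVAL:
        score = 3
        if harmonized:
            score = 4
            if linked:
                score = 5
    elif ext in ACCEPTED_PROPRIETARY:
        score = 2
    else:
        score = 1  # proprietary or unknown formats scored conservatively
    return score
```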
First we attempted to operationalise R - Reusable as well… but we changed our mind
Reusable - is it a separate dimension? Partly subjective: it depends on what you want to use the data for!

Idea for operationalization                                         | Why we rejected it
Clear provenance of data (to facilitate both replication and reuse) | Aspect of F4
Data is in a TDR (unsustained data will not remain usable)          | Aspect of the repository → Data Seal of Approval
Explication of how data was or can be used is available             | Aspect of F5
Data automatically usable by machines                               | Aspect of I5
Data is reliable (replicable)                                       | Can only be known after re-analysis
What it would look like in the DANS EASY archive

Or in Zenodo, Dataverse, Mendeley Data, figshare, B2SAFE, … (if they comply with the DSA)
Towards a FAIR Data Assessment Tool
• Independent website like the DSA website
• Repositories will link to the FAIR assessment website
• The website will provide:
  - Assessment tool (“questionnaire” with explanation and examples for each criterion)
  - Online database containing:
    • Repository holding the dataset
    • PID (+ basic metadata such as name) of dataset
    • Reviewer info (ID can be withheld; anonymous reviews should be possible)
    • FAIR profile and scores
  - Analytics of FAIR profiles
Can FAIR Data Assessment be automatic?

Criterion                        | Automatic? | Subjective? | Comments
F1 No PID / No metadata          | Y          | N           |
F2 PID / Insufficient metadata   | Semi       | Semi        | Insufficient metadata is subjective
F3 No PID / Sufficient metadata  | Semi       | Semi        | Sufficient metadata is subjective
F4 PID / Sufficient metadata     | Semi       | Semi        | Sufficient metadata is subjective
F5 PID / Rich metadata           | Semi       | Semi        | Rich metadata is subjective
A1 No license / No access        | Y          | N           |
A2 Metadata accessible           | Y          | N           |
A3 User restrictions             | Y          | N           |
A4 Public access                 | Y          | N           |
A5 Open access                   | Y          | N           |
I1 Proprietary format            | Semi       | N           | Depends on list of proprietary formats
I2 Accepted format               | Semi       | Semi        | Depends on list of accepted formats
I3 Archival format               | Semi       | Semi        | Depends on list of archival formats
I4 + Harmonized                  | N          | Semi        | Depends on domain vocabularies
I5 + Linked                      | Semi       | N           | Depends on semantic methods used

Optional: qualitative assessment / data review
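The rows marked "Automatic? = Y" could be scripted along these lines. The record layout and the DOI syntax check are assumptions for illustration; the subjective criteria are only flagged for a human reviewer, as the table suggests.

```python
import re

# Simplistic DOI syntax check; a real PID check would also resolve the identifier.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

def automatic_checks(record):
    """Run only the fully automatic checks from the table: PID presence
    (towards F) and the declared license/access level (A1-A5).
    `record` is a hypothetical metadata dict; subjective criteria
    (metadata sufficiency, F2-F5) are left for human review."""
    return {
        "has_pid": bool(DOI_PATTERN.match(record.get("doi", ""))),
        "access": record.get("license", "none"),
        "needs_review": ["metadata sufficiency (F2-F5)"],
    }
```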
Main takeaways
• The core certification requirements and the FAIR principles form a perfect couple for quality assessment of research data and trustworthy data repositories
• DANS is developing an operationalization of these principles:
  - Data archive staff will assess FAIRness of data upon ingest
  - Data users will assess FAIRness upon reuse
  - Scoring mechanism as automatic as possible
• Ideally, all certified trustworthy repositories should contain FAIR data
• But: the FAIR scoring mechanism is applicable in any repository
Hands-on FAIR data management: IDCC workshops
12th International Digital Curation Conference, Edinburgh, 20-23 February 2017

How EUDAT Services could support FAIR data (workshop 4)
… In this workshop we relate the FAIR principles to EUDAT's Service Suite. You will experiment with community metadata and with an annotation tool for curating and retrieving data. Community representatives will talk about their experiences…

OpenAIRE services and tools for Open Research Data in H2020 (workshop 6)
… The workshop will focus on practical issues of storage and data sharing aspects, on how to select an appropriate data repository, on guidance for efficient data management plans, on the monitoring of contextualized (linked) research…

Essentials 4 Data Support: the Train-the-Trainer version (workshop 9)
… You learn how Research Data Netherlands made the popular ‘Essentials 4 Data Support’ training with rich online content, weekly assignments and expert speakers… to practice hands-on writing DMPs and making a data management stakeholder analysis…

For more information and registration see: http://www.dcc.ac.uk/events/idcc17
Thank you for listening
peter.doorn@dans.knaw.nl
ingrid.dillo@dans.knaw.nl
www.dans.knaw.nl
