Thank you for inviting me to give a brief perspective on the FAIR data principles. First of all, I am curious: who among you has already heard of the FAIR data principles?
I am a partnerships programs manager and I work for ANDS. This is a Commonwealth Government funded project under the NCRIS program, like ALA. This year we are working in an aligned fashion with two other NCRIS facilities, RDS and Nectar. Together we are working on services over research data, tools for research communities and increasing the value of research data. This year we have picked up the FAIR data principles as a very useful way of thinking about research data and making it more reusable.
So enough about us, and on to the main topic of my talk today: the FAIR data principles. A community of experts met in Leiden in the Netherlands back in 2015 and came up with this set of principles. These have since really taken off and attracted a lot of attention. If you want to read more of the detail, I would recommend reading the Nature article and the list of resources on our website (see the last slide of this presentation). FAIR stands for Findable, Accessible, Interoperable and Re-usable.
I think there are a number of reasons why the principles have proven so popular.
The principles have received international recognition as a really useful way of thinking about data and how it can be made more reusable.
A number of elements of the FAIR data principles are of particular interest and help explain their success. The principles state that data should be not only human readable but also machine readable. Making data machine readable supports much broader reuse and application of the data. It allows big data approaches to be applied and separate data collections to be brought together, and it lets machines do pattern recognition and other forms of analysis. This will support new, innovative methods of research and allow for unexpected findings and outcomes.
The principles are technology agnostic: they do not recommend one technology over another.
The principles are discipline independent and can be picked up by any research discipline or area.
They address both the metadata describing the data and how the data itself can be made more reusable.
I will now briefly go through the principles one by one.
I won’t go through all these principles in detail as that would take too much time, but I will try to highlight a few pointers.
Findable: the principles recommend that the data have a persistent identifier (like a DOI or a Handle) assigned to them so they can be found in the longer term. The data should also be well described and findable through a repository and through relevant disciplinary and national registries (here in Australia, for example, TERN and Research Data Australia, as well as relevant international disciplinary registries).
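To make the role of a persistent identifier a little more concrete, here is a minimal sketch in Python that turns a DOI, written in any of a few common forms, into a link through the public doi.org resolver. The function name and the example DOI are made up for illustration; only the https://doi.org/ prefix is standard.

```python
def doi_to_url(doi: str) -> str:
    """Return the doi.org resolver URL for a DOI written in a common form."""
    cleaned = doi.strip()
    # Strip the prefixes people often paste in front of the bare DOI.
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if cleaned.lower().startswith(prefix):
            cleaned = cleaned[len(prefix):]
            break
    # Every DOI starts with the directory indicator "10.".
    if not cleaned.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    return "https://doi.org/" + cleaned


# Hypothetical DOI, used only to show the shape of the result.
print(doi_to_url("doi:10.1234/example-traits"))
```

Because the identifier stays the same even when the data moves to a new server, a citation that stores the DOI rather than a web address keeps working over the longer term.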
The data should be accessible, but that does not necessarily mean it has to be Open. There are very good reasons why some data cannot be made Open. In the context of trait data that might be, for example, to protect the location of an endangered species.
If it is not Open, it should at least be accessible through appropriate protocols, and it should be clear how the procedure works, for example by first getting approval from an ethics committee.
On the technical side, data can be really large and complex. In that case downloading the whole dataset does not make sense, and it should be possible for a machine to access a selected part of the data using community-agreed data services.
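As a rough sketch of what accessing a selected part of the data can look like, the Python below builds a request for one variable within a region and a time period instead of a full download. The endpoint, the parameter names and the variable name are all hypothetical; a real community service (an OGC data service, for instance) defines its own.

```python
from urllib.parse import urlencode


def subset_request_url(base_url, variable, bbox, start, end):
    """Build a query URL asking a data service for one slice of a dataset."""
    min_lon, min_lat, max_lon, max_lat = bbox
    params = {
        "variable": variable,  # hypothetical parameter names throughout
        "bbox": f"{min_lon},{min_lat},{max_lon},{max_lat}",
        "start": start,
        "end": end,
    }
    return base_url + "?" + urlencode(params)


# Hypothetical service address and variable name.
url = subset_request_url(
    "https://data.example.org/traits",
    "leaf_area",
    (140.0, -38.0, 150.0, -30.0),
    "2017-01-01",
    "2017-12-31",
)
print(url)
```

The point is only that the selection happens on the service side, so the machine receives the part of the data it asked for rather than the whole collection.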
Both the data and the metadata should be interoperable.
That means using a file format and a data format that is agreed in the research area/discipline so others can pick up the data and connect it with other data sets.
Using community agreed vocabularies is very useful in this regard.
Also in the metadata it is good to link to other information using identifiers so the re-user can find related information.
For example: ORCIDs for authors; DOIs to link to publications, tools, software and workflows; and grant IDs to link to grant information.
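A minimal sketch of such linked metadata, in Python, might look like the record below. The field names and relation labels are my own assumptions rather than any particular metadata standard, the DOIs and grant ID are placeholders, and the ORCID is the fictitious sample iD that ORCID uses in its own documentation.

```python
import json

# Placeholder identifiers only; none of these point at a real dataset.
metadata = {
    "title": "Example leaf trait measurements",
    "identifier": "https://doi.org/10.1234/example-traits",
    "related": [
        {"relation": "creator", "scheme": "ORCID",
         "id": "https://orcid.org/0000-0002-1825-0097"},
        {"relation": "isDescribedBy", "scheme": "DOI",
         "id": "https://doi.org/10.1234/example-paper"},
        {"relation": "fundedBy", "scheme": "grant",
         "id": "EXAMPLE-GRANT-001"},
    ],
}


def links_by_relation(record, relation):
    """Return the identifiers that are linked with the given relation."""
    return [ref["id"] for ref in record["related"] if ref["relation"] == relation]


print(json.dumps(metadata, indent=2))
print(links_by_relation(metadata, "creator"))
```

Stating the kind of relation next to each identifier is what lets a re-user, or a machine, tell an author apart from a funder or a related publication.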
The last of the principles is to make the data re-usable. You could argue that all the previous principles are also needed to make data reusable. The main point here is that it is important to assign a clear, standard, machine-readable licence to the data (for Open Data a Creative Commons licence is ideal).
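To illustrate the difference between a machine-readable licence and a free-text one, here is a small Python sketch. The "licence" field name and the two records are hypothetical; the URL is the standard address of the Creative Commons Attribution 4.0 licence.

```python
def licence_is_machine_readable(record: dict) -> bool:
    """True when the licence is given as a URL that software can match exactly."""
    licence = record.get("licence", "")
    return licence.startswith(("https://", "http://"))


# Hypothetical records: one with a licence URL, one with free text only.
open_record = {
    "title": "Leaf traits",
    "licence": "https://creativecommons.org/licenses/by/4.0/",
}
vague_record = {
    "title": "Seed traits",
    "licence": "Please ask before reusing",
}

print(licence_is_machine_readable(open_record))
print(licence_is_machine_readable(vague_record))
```

This is a deliberately simple check: a harvesting tool can act on the first record automatically, while the second needs a person to read it and make contact.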
It is also important to add information about the provenance of the data to the dataset. This gives the re-user a better picture of how the data was collected: from which instrument, using which settings, and which subsequent manipulations have been applied to the data.
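One possible way to keep that provenance with the dataset is as a small structured record. The Python sketch below uses class and field names of my own choosing and placeholder values; it is not a community provenance standard, only an illustration of recording the instrument, its settings and each later manipulation.

```python
from dataclasses import dataclass, field, asdict


@dataclass
class ProcessingStep:
    description: str  # what was done to the data
    software: str     # what it was done with


@dataclass
class Provenance:
    collected_by: str
    instrument: str
    settings: dict
    steps: list = field(default_factory=list)

    def add_step(self, description: str, software: str) -> None:
        """Append one manipulation, keeping the steps in the order applied."""
        self.steps.append(ProcessingStep(description, software))


# Placeholder values throughout.
prov = Provenance(
    collected_by="field team (placeholder)",
    instrument="leaf area meter (placeholder)",
    settings={"resolution_mm": 0.1},
)
prov.add_step("removed records with a missing species name", "hypothetical cleaning script v1")
prov.add_step("converted area from mm2 to cm2", "hypothetical cleaning script v1")

# asdict() gives plain dicts and lists that can be stored next to the data.
record = asdict(prov)
print(record)
```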
We do not work in national isolation but work together with international partners in this space on developments surrounding research data and tools.
Thank you very much for your attention. And thank you again to Ginny for arranging this webinar and putting FAIR into perspective with other thoughts on FAIR.
FAIR traits data
Partnerships Programs Manager
17 May 2018
What are the FAIR data principles?
• Drafted in a workshop in 2015
• Nature article and support by FORCE11
• Received international recognition
• Human readable and machine readable
• Technology agnostic
• Discipline independent
• Both the data and the metadata
F1. (meta)data are assigned a globally unique and eternally persistent identifier.
(e.g. DOIs, Handles, PURLs)
F2. data are described with rich metadata.
(e.g. Dublin Core, Darwin Core)
F3. (meta)data are registered or indexed in a searchable resource.
(e.g. TERN, RDA, TRY)
F4. metadata specify the data identifier.
A1. (meta)data are retrievable by their identifier using a standardized communications protocol.
(e.g. HTTP, OGC data services)
A1.1. the protocol is open, free, and universally implementable.
A1.2. the protocol allows for an authentication and authorization procedure, where necessary.
A2. metadata are accessible, even when the data are no longer available.
I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.
(e.g. NetCDF, RDF representation)
I2. (meta)data use vocabularies (and ontologies) that follow FAIR principles.
I3. (meta)data include qualified references to other (meta)data.
(e.g. ORCID, DOIs, Physical Object IDs, Grant IDs, RAIDs)
R1. (meta)data have a plurality of accurate and relevant attributes.
R1.1. (meta)data are released with a clear and accessible data usage license.
(e.g. Creative Commons)
R1.2. (meta)data are associated with their provenance.
(e.g. information on observations, capture, circumstances, processing)
R1.3. (meta)data meet domain-relevant community standards.
More information on the FAIR
For webinars, resources, workshops
Partnerships Programs Manager
03 9905 6273
With the exception of third party images or where otherwise indicated, this work is licensed under the Creative
Commons 4.0 International Attribution Licence.
ANDS, Nectar and RDS are supported by the Australian Government through the National Collaborative Research
Infrastructure Strategy Program (NCRIS).