Applying FAIR (Findable, Accessible, Interoperable and Reusable) principles to linked datasets: opportunities and challenges
I am Keith Russell, Manager of Engagements at the Australian Research Data Commons (ARDC), just one of the organisations worldwide that have embraced the FAIR data principles and are trying to raise awareness and understanding of the principles amongst researchers and those who support researchers.
In 2016 we started a project with SAHMRI to look at what it would mean to combine and share datasets in the health and medical field. We decided to bring together datasets around colorectal cancer. Both
Interventions: (i) increasing FIT participation rates; (ii) increasing colonoscopy participation; (iii) changing ages of screening eligibility; and (iv) cost-benefits of each scenario.
All of the economic data modelling to estimate the relative impact of changes to screening and diagnostic intervention is complete. The modelling has worked on improving bowel cancer screening to 60% and follow-up colonoscopy, where indicated, to 90%. It has demonstrated potential mortality reductions and cost-effectiveness for interventions to increase participation. The results are currently being written up for the Medical Journal of Australia.
Not related specifically to the data linkage project, but complementary to this work, Dr Dan Worthley (gastroenterologist and lead of the No Australian Dying of Bowel Cancer project) and his clinical colleagues, Dr Peter Bampton, Dr Tarik Sammour, Dr Gregor Brown and Dr David Hewett, published a letter in the Medical Journal of Australia this week to make their case for uniform ratings for colonoscopy providers based on each provider’s affordability, availability and ability to perform the service (the 3 A's). Providing patients with real-time, accurate information about their colonoscopy options using these three indicators is expected to improve patient autonomy and access and allow them to make a rational choice. Such information is also expected to help optimise the benefits of the National Bowel Cancer Screening Program (NBCSP) and allow public and private sectors to work more effectively together to eradicate bowel cancer death in Australia.
With regard to the data linkage specifically: David Roder’s team has been testing the quality of the linked CRC data, and is planning completion of the staging of private hospital data, now that we’ve achieved access to the relevant source data. Meanwhile, the present data have been used to demonstrate the impact of bowel screening participation on stage distribution and inferred mortality reduction in SA. Survival has also been analysed by screening history.
In addition, time lags from a positive screening test to colonoscopy have been analysed and compared across sociodemographic sub-groups and with clinical standards. These data have been provided to the Data Steering Committee and presented to the GIT (gastrointestinal) tumour group at a face-to-face meeting. Other analyses of time from diagnosis to treatment, and of associations with cancer survival, are presently underway. This is the current state of play. I’m sure that’s far more background than you need. Good luck with the talk; sorry I can’t be there.
What does it actually mean to implement these in a discipline?
That is why we are organising the global sprint: to develop materials to help researchers when they are in a specific discipline and want to make their data FAIR.
And to give you some inspiration for possible things to cover: think about what persistent identifiers are used in that discipline, and what discovery metadata is common and relevant.
Transforming practice and approaches around the country. By partnering. Coherent research environment. Data intensive e-Infrastructures.
In exploring making this data FAIR, the challenges and questions that arose were: sensitive health data; multiple data custodians; and, to move beyond the original, approved linkage project, custodians will all need to be consulted.
We don’t yet know the appetite of all the different custodians, including clinician scientists and state and Commonwealth government agencies, for the linked data set to be made FAIR. We don’t yet know the appetite of the data linkage platform partner (SURE) or the resourcing/capacity.
The project continues to be a pathfinder in this field, addressing questions that have not been raised in linking human identifiable datasets
Applying FAIR principles to linked datasets: Opportunities and Challenges
Colorectal cancer data
Keith Russell, Manager Engagements
Value of the data
• Colorectal Cancer (CRC) is one priority area of the SA Academic Health Science and
• The Beat Bowel Cancer Project has the ambitious target of eliminating premature and
preventable mortality from CRC
• There is a high burden of disease in Australia, and bowel cancer is Australia’s second
leading cause of cancer death
• CRC is highly amenable to intervention:
Linkage, first steps in making it FAIR
To succeed, the Beat Bowel Cancer Project needs data to drive rational decision making about where
to focus interventions
Economic modelling is being undertaken to predict potential deaths prevented from a range of
Data linkage is being undertaken to enable a ‘whole-of-system’ view of CRC, to identify unexplained
variations in health care. Linkage includes:
• Population cancer registry data (incidence, mortality and survival by cancer type, age, sex, SES,
remoteness, Aboriginal status, and country of birth);
• SA clinical cancer registry data (stage and grade); SA inpatient and radiotherapy data, clinician
held registries, Commonwealth MBS/PBS claims data and National Bowel Cancer Screening
CRC project achievements
• All state based data linked, analysed and reported on. New insights in terms of
burdens of mortality.
• Commonwealth data linked
• A national first – a population-based health-system-wide data set for research, health
service planning and evaluation, monitoring effectiveness, cost-effectiveness and
equity in service delivery
• The data linkage project can now: inform our local clinicians; inform our health
services; enable the Beat Bowel Cancer Project to target interventions to improve
• Findings will be published in the scientific literature.
… but what did making the data FAIR
more broadly mean in practice?
F1. (meta)data are assigned a globally unique and eternally
F2. data are described with rich metadata.
F3. (meta)data are registered or indexed in a searchable
F4. metadata specify the data identifier.
● Minting a DOI
● Providing Rich Metadata
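As a rough illustration of how the two actions above line up with F1 to F4, the sketch below checks a discovery-metadata record for the pieces each principle asks for. The field names, the DOI and the record contents are hypothetical placeholders, not the project's actual metadata or any registry's schema.

```python
# Minimal sketch: check a discovery-metadata record against F1-F4.
# All field names and values below are hypothetical placeholders.

RICH_FIELDS = ("title", "description", "subjects", "custodians")


def findability_gaps(record):
    """Return the labels of the findability principles the record does not yet meet."""
    gaps = []
    # F1: the collection carries a globally unique identifier (here, a DOI string).
    if not record.get("identifier"):
        gaps.append("F1")
    # F2: every descriptive field is present and non-empty.
    if not all(record.get(field) for field in RICH_FIELDS):
        gaps.append("F2")
    # F3: the record is listed in at least one searchable registry.
    if not record.get("registered_in"):
        gaps.append("F3")
    # F4: the metadata explicitly names the identifier of the data it describes.
    describes = record.get("describes")
    if not describes or describes != record.get("identifier"):
        gaps.append("F4")
    return gaps


example = {
    "identifier": "10.0000/example-crc-linked",  # placeholder, not a minted DOI
    "describes": "10.0000/example-crc-linked",
    "title": "Linked colorectal cancer data collection (example)",
    "description": "Population-based linked data set for CRC research (example text).",
    "subjects": ["colorectal cancer", "data linkage"],
    "custodians": ["example custodian"],
    "registered_in": ["Research Data Australia"],
}
draft = {"title": "Untitled draft"}

print(findability_gaps(example))  # complete record: no gaps expected
print(findability_gaps(draft))    # draft record: every principle still open
```

A check like this only confirms that the fields exist; whether the metadata is rich enough for the discipline remains a human judgement.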
A1 (meta)data are retrievable by their
identifier using a standardized
A1.1 the protocol is open, free, and
A1.2 the protocol allows for
an authentication and authorization
procedure, where necessary.
A2 metadata are accessible, even when
the data are no longer available.
● Metadata will be accessible from
Research Data Australia
● DOI will resolve to a landing page e.g. SA
Translation Centre, which is maintained
● The Data Collection would be accessible
from within SURE, to other researchers
with the appropriate ethics approvals
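The split described above (open metadata and landing page, mediated access to the data) can be sketched as follows. The `https://doi.org/` prefix is the public DOI resolver; the DOI, the function names and the access rule are illustrative assumptions, and the real approval process within SURE is not modelled.

```python
from urllib.parse import quote


def doi_landing_url(doi):
    """Build the resolver URL that should lead to the collection's landing page."""
    prefix, sep, suffix = doi.partition("/")
    if not (prefix.startswith("10.") and sep and suffix):
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode any characters that are unsafe in a URL path.
    return "https://doi.org/" + quote(doi, safe="/")


def can_access(resource, has_approvals=False):
    """Hypothetical rule: metadata is open to all, data needs prior approvals."""
    if resource == "metadata":
        return True
    if resource == "data":
        return has_approvals
    raise ValueError(f"unknown resource: {resource!r}")


# Placeholder DOI, not a real one.
print(doi_landing_url("10.0000/example-crc-linked"))
print(can_access("metadata"), can_access("data"), can_access("data", has_approvals=True))
```

Keeping the metadata rule unconditional mirrors A2: the description stays reachable even for someone who cannot, or can no longer, reach the data.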
I1. (meta)data use a formal,
accessible, shared, and broadly
applicable language for knowledge
I2. (meta)data use vocabularies (and ontologies)
that follow FAIR principles.
I3. (meta)data include qualified
references to other (meta)data.
• Data has already been assembled using firmly
established methods and standard language
• Data collection will be described using FoR
codes, MeSH subject headings,
• Links established to investigators' ORCID IDs,
relevant future Grant IDs and publications
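Before a record is linked to an investigator's ORCID iD, the iD can be sanity-checked, because its last character is a check digit computed with ISO 7064 MOD 11-2. The sketch below is one way to do that; the function names are hypothetical, and the sample iD is the one ORCID publishes for its fictitious researcher, not a project investigator.

```python
import re

ORCID_PATTERN = re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]")


def orcid_check_character(base_digits):
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid):
    """Check the 0000-0000-0000-000X shape and the final check character."""
    if not ORCID_PATTERN.fullmatch(orcid):
        return False
    compact = orcid.replace("-", "")
    return orcid_check_character(compact[:15]) == compact[15]


print(is_valid_orcid("0000-0002-1825-0097"))  # ORCID's documented sample iD
print(is_valid_orcid("0000-0002-1825-0098"))  # same iD with the last digit altered
```

This only catches typing errors; it does not confirm that the iD belongs to the named investigator.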
R1. (meta)data have a plurality of accurate and
R1.1. (meta)data are released with a clear and
accessible data usage license.
R1.2. (meta)data are associated with
R1.3. (meta)data meet domain-relevant
The specific reuse conditions would
be provided to each new research
project that gains ethics and data
custodian approvals to use the Data
Collection via SURE
Challenges and questions that arose
Sensitive health data from multiple data custodians
To move beyond the original, approved linkage project, custodians will all need to be consulted
We don’t yet know the appetite of all the different custodians, including clinician scientists and
state and Commonwealth government agencies, for the linked data set to be made FAIR
We don’t yet know the appetite of the data linkage platform partner (SURE) or the resourcing/capacity
The project continues to be a pathfinder in this field, addressing questions that have not been
raised in linking human identifiable datasets
With the exception of third party images or where otherwise indicated, this work is licensed under the
Creative Commons 4.0 International Attribution Licence
M: 04 2745 23 42
The ARDC is supported by the Australian
Government through the National
Collaborative Research Infrastructure
Research Project Achievements
Professor David Roder