FAIR as a Basis for the Cancer
Research Data Commons
Ian Fore, D.Phil.
NCI Center for Biomedical Informatics and Information Technology
@ianmfore
Demystifying FAIR Science: Examples, Tools and Use Cases
ISMB/ECCB 2019
July 23 2019
Three themes
• Experience of FAIR
• Capturing diversity of science
• Allow more people to be compliant
• Understanding collaboration
Gartner hype cycle
Under the hype curve
“there are emerging indications that the original meanings of findable,
accessible, interoperable, and reusable sometimes may be stretched”
“the proposed implementation of these principles, with the goal of an
Internet of FAIR Data and Services, is beginning to raise concern and
confusion”
Cloudy, increasingly FAIR; revisiting the FAIR Data guiding
principles for the European Open Science Cloud
Mons et al (2017)
Information Services & Use, vol. 37, pp. 49-56
FAIR Evaluation
Desirable characteristics of FAIR Assessment
• Transparency
• What are the criteria
• Who is doing the assessment
• Qualifications for doing so
• Not just a score
• Non judgmental
• Reflect qualities of a given resource
• Not strict
• Important to leave scope for the novelty of science
CC BY-SA 4.0
Case statement of the WG
2019-04-03 www.rd-alliance.org - @resdatall 3
Challenge
Ambiguity and wide range of interpretations of FAIRness
Lack of a common set of core assessment criteria and a minimum set of shared
guidelines
Approach
Bring together stakeholders
Build on existing approaches and expertise
Intended results
RDA Recommendation of core assessment criteria
Generic and expandable self-assessment model
Self-assessment toolset
FAIR data checklist
Measuring maturity
Alternative #1
Five options for R1.1 [metadata/data]
Level 0: no licence
Level 1: non standard licence in a human-readable format allowing access
Level 2: standard licence in a human-readable format allowing access
Level 3: standard open licence in a human-readable format allowing reuse
Level 4: standard open licence in a machine-readable format allowing reuse
Level 5: standard open licence in a machine-readable format with clear
criteria allowing reuse
Each option is defining a maturity level
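The ladder above can be read as a small decision procedure. The following is a minimal Python sketch, assuming a hypothetical `Licence` record; its field names (`standard`, `open`, `machine_readable`, `clear_criteria`) are illustrative and are not part of the RDA working group's material.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Licence:
    """Hypothetical description of a licence attached to (meta)data."""
    standard: bool = False          # a recognised, standard licence
    open: bool = False              # an open licence that allows reuse
    machine_readable: bool = False  # expressed in a machine-readable format
    clear_criteria: bool = False    # reuse criteria are stated clearly


def r1_1_maturity(licence: Optional[Licence]) -> int:
    """Map a licence description to the maturity levels 0-5 listed above."""
    if licence is None:
        return 0  # Level 0: no licence
    if not licence.standard:
        return 1  # Level 1: non-standard licence
    if not licence.open:
        return 2  # Level 2: standard licence, access only
    if not licence.machine_readable:
        return 3  # Level 3: standard open licence, human-readable
    if not licence.clear_criteria:
        return 4  # Level 4: standard open licence, machine-readable
    return 5      # Level 5: machine-readable with clear criteria
```

Combinations the ladder does not list (for example an open but non-standard licence) fall to the lowest rung whose condition fails, which is one possible reading rather than a rule stated in the slides.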
Method step 9
[Figure: FAIRness compliance for R1.1, shown as maturity Levels 0 to 5]
Data Commons Framework
[Diagram: NCI Cancer Research Data Commons. Labels recovered from the figure:]
• Data types: Genomics, As Is Genomics, Clinical Proteomics, Imaging, Immuno-oncology, Cancer Models, Biomarkers
• Cloud Resources: SBG CGC, Broad FireCloud, ISB CGC (Elastic Compute, Query, Visualization, Tool Deployment, Analysis)
• Framework components: Web Interface, APIs, Data Submission, Authentication & Authorization, Data Models & Dictionaries, Computational Workspaces, Tool Repositories, Metadata Validation & Tools
• Users: Data Contributors and Consumers
• Footnoted (*): Clinical Proteomics Tumor Analysis Consortium; The Cancer Imaging Archive (TCIA)
GDC and Cloud Resources are available now; Framework, As Is Genomics, PDC, IDC are in development; all else is notional.
NCBI Sequence Data Delivery Pilot (SDDP)
• Sequence data will be stored in commercial cloud
• Data submission by existing dbGaP mechanisms
• Familiar to investigators
• Minimal change to existing dbGaP sequence submission
• VCF and phenotype observations in dbGaP
• Relieves pressure on SRA by allowing NCI to provide directly
for storage of data from its programs
Data “As Is?”
• Not as harmonized and restructured as e.g. Genomic Data Commons
• Goal:
• Usable by those not engaged in the creation or production of the dataset
• FAIR as a general principle
• Ensure the context of samples is captured
Standards essential but…
• Science inherently non-standard
• Even when standard – large diversity of biomedical data
• Need to understand data standards from other domains
• Domains constantly in flux
• Data models reflect unique study designs
• Harmonization has proven expensive
Shift what we ask for
Not just
“These are the columns you must provide”
“Fit your data into our model”
But also (not strict)
What are the variables that define your study?
What is the model of your data? Standardized or not.
Standardizing the description of the non-standard
One dbGaP study - GECCO
<variable id="phv00357184.v1">
<name>SEX</name>
<description>Sex of participant</description>
<type>string</type>
<value>Female</value>
<value>Male</value>
</variable>
Another dbGaP study - GTEx
<variable id="phv00169062.v7">
<name>SEX</name>
<description>Sex</description>
<type>integer, encoded value</type>
<comment>Sex The Donor's Identification of
sex based upon self-report, family/next of kin,
or medical record abstraction. </comment>
<value code="1">Male</value>
<value code="2">Female</value>
</variable>
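Because each study declares its own encoding, a consumer can reconcile the two `SEX` variables from the descriptions alone. The following is a minimal sketch, assuming abbreviated copies of the XML fragments shown above are available as strings; the helpers `value_map` and `decode` are illustrative names, not part of dbGaP.

```python
import xml.etree.ElementTree as ET

# Abbreviated copies of the two variable descriptions shown above.
GECCO = """<variable id="phv00357184.v1">
  <name>SEX</name>
  <type>string</type>
  <value>Female</value>
  <value>Male</value>
</variable>"""

GTEX = """<variable id="phv00169062.v7">
  <name>SEX</name>
  <type>integer, encoded value</type>
  <value code="1">Male</value>
  <value code="2">Female</value>
</variable>"""


def value_map(variable_xml: str) -> dict:
    """Build a raw-value -> label lookup from a variable description."""
    root = ET.fromstring(variable_xml)
    # A coded value maps its code to the label; an uncoded value maps to itself.
    return {v.get("code", v.text): v.text for v in root.findall("value")}


def decode(variable_xml: str, raw: str) -> str:
    """Translate one raw data value into its human-readable label."""
    return value_map(variable_xml)[raw]
```

With these helpers, `decode(GECCO, "Female")` and `decode(GTEX, "2")` resolve to the same label even though the studies store the value differently, with no harmonized model imposed on either study.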
What is needed for standard description of
variables?
<variable id="phv00217659.v3">
<name>SMUBRTRM</name>
<description>Uberon Term, anatomical location as described by the Uber Anatomy
Ontology (UBERON)</description>
<type>string</type>
<comment>Generated at LDACC Term as specified by the Uber Anatomy
Ontology (UBERON),
http://bioportal.bioontology.org/ontologies/UBERON. </comment>
</variable>
<variable id="phv00169242.v7">
<name>SMTSISCH</name>
<description>Total Ischemic time for a sample</description>
<type>integer</type>
<unit>Minutes</unit>
<comment>Sample Ischemic Time Interval between actual death, presumed death,
or cross clamp application and final tissue stabilization. </comment>
</variable>
From GTEx Sample Attributes
What is the standard to describe data
Things, Attributes, Relationships
• Some existing candidates
• Metadata repositories
• ISO11179 style (caDSR, CDISC, etc.)
• RDA Data Type Registries
• GA4GH – Discovery Work Stream – Search/SchemaBlocks
• BDBag
• PFB – Portable Format for Biomedical Data
• JSON-LD
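As one illustration of the last candidate, the GTEx ischemic-time variable shown earlier could be described in JSON-LD. This is a sketch only: the choice of schema.org's `PropertyValue` type with its `propertyID` and `unitText` properties is an assumption made for illustration, not something the talk or dbGaP prescribes.

```python
import json

# Hypothetical JSON-LD description of the GTEx variable SMTSISCH.
# Using schema.org PropertyValue is an assumption made for illustration.
variable = {
    "@context": "https://schema.org/",
    "@type": "PropertyValue",
    "propertyID": "phv00169242.v7",
    "name": "SMTSISCH",
    "description": "Total Ischemic time for a sample",
    "unitText": "Minutes",
}

# Serialize to a JSON-LD document that can travel alongside the data.
doc = json.dumps(variable, indent=2)
print(doc)
```

The point of the sketch is that the unit and identifier travel with the variable in a machine-readable form, whichever of the candidate standards ends up carrying them.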
Data Sharing:
It’s not the technology, it’s the
attitude we take
Greg Simon, Director – Cancer Moonshot Task
Force
NIH Institutes Represented in BD2K Radical
Collaboration Training
• Center for Scientific Review
• National Center for Advancing Translational
Sciences
• National Cancer Institute
• National Human Genome Research Institute
• National Heart, Lung, and Blood Institute
• National Institute on Aging
• National Institute of Allergy and Infectious
Diseases
• National Institute of Biomedical Imaging and
Bioengineering
• National Institute of Child Health and Human
Development
• National Institute on Drug Abuse
• National Institute of General Medical Sciences
• National Institute of Mental Health
• National Institute on Minority Health and
Health Disparities
• NIH Office of the Director
Thanks to: Phil Bourne and Warren Kibbe
we wrestle with the question of
whether natural selection inherently
favors selfish behavior. Is the
process of evolutionary competition
cruel, or does it sometimes pay to be
nice?
https://www.wnycstudios.org/story/104082-prisoners-dilemma
FIRO-B
All human beings share some basic similarities
All people want to feel: Significant, Competent, Likable
All people have some fear of being: Ignored, Humiliated, Rejected
All people have behavior preferences about: Inclusion, Control, Openness
By creating environments that invite people to feel significant, competent and likable, you
reduce the level of fear and create environments that are more conducive to honesty,
collaboration, accountability and fun! It is in these environments where people bring their best
to the workplace and productivity soars in an atmosphere of trust.
By being accountable for your own internal environment, you take responsibility for your own
psychological patterns and mental models and do not inappropriately associate your own
deeper self-concept with the substantive issues you are working with.
Celeste Blackman, Green Zone Culture Group
Operational phase matters more than the
outset
• FAIR has served to bring the community together
• Success depends on the follow through
• There is no silver bullet
• Fred Brooks
Submit Proposal to Build the Cancer Data Aggregator
• Accessible via NCI Cloud Resources and
other applications.
• API layer will allow users to query across:
• NCI Cancer Research Data Commons
(CRDC)
• NCI Data Coordinating Centers
(e.g. HTAN DCC)
• Additional Repositories
(e.g. KidsFirst DRC)
Proposals due by August 15, 2019: https://go.usa.gov/xyKe3

Complementary feeding in infant IAP PROTOCOLSComplementary feeding in infant IAP PROTOCOLS
Complementary feeding in infant IAP PROTOCOLS
 
CHEMOTHERAPY_RDP_CHAPTER 1_ANTI TB DRUGS.pdf
CHEMOTHERAPY_RDP_CHAPTER 1_ANTI TB DRUGS.pdfCHEMOTHERAPY_RDP_CHAPTER 1_ANTI TB DRUGS.pdf
CHEMOTHERAPY_RDP_CHAPTER 1_ANTI TB DRUGS.pdf
 
The Nervous and Chemical Regulation of Respiration
The Nervous and Chemical Regulation of RespirationThe Nervous and Chemical Regulation of Respiration
The Nervous and Chemical Regulation of Respiration
 
Hemodialysis: Chapter 5, Dialyzers Overview - Dr.Gawad
Hemodialysis: Chapter 5, Dialyzers Overview - Dr.GawadHemodialysis: Chapter 5, Dialyzers Overview - Dr.Gawad
Hemodialysis: Chapter 5, Dialyzers Overview - Dr.Gawad
 
Efficacy of Avartana Sneha in Ayurveda
Efficacy of Avartana Sneha in AyurvedaEfficacy of Avartana Sneha in Ayurveda
Efficacy of Avartana Sneha in Ayurveda
 
Abortion PG Seminar Power point presentation
Abortion PG Seminar Power point presentationAbortion PG Seminar Power point presentation
Abortion PG Seminar Power point presentation
 
OCT Training Course for clinical practice Part 1
OCT Training Course for clinical practice Part 1OCT Training Course for clinical practice Part 1
OCT Training Course for clinical practice Part 1
 
Medical Quiz ( Online Quiz for API Meet 2024 ).pdf
Medical Quiz ( Online Quiz for API Meet 2024 ).pdfMedical Quiz ( Online Quiz for API Meet 2024 ).pdf
Medical Quiz ( Online Quiz for API Meet 2024 ).pdf
 
REGULATION FOR COMBINATION PRODUCTS AND MEDICAL DEVICES.pptx
REGULATION FOR COMBINATION PRODUCTS AND MEDICAL DEVICES.pptxREGULATION FOR COMBINATION PRODUCTS AND MEDICAL DEVICES.pptx
REGULATION FOR COMBINATION PRODUCTS AND MEDICAL DEVICES.pptx
 
Physical demands in sports - WCSPT Oslo 2024
Physical demands in sports - WCSPT Oslo 2024Physical demands in sports - WCSPT Oslo 2024
Physical demands in sports - WCSPT Oslo 2024
 
Muscles of Mastication by Dr. Rabia Inam Gandapore.pptx
Muscles of Mastication by Dr. Rabia Inam Gandapore.pptxMuscles of Mastication by Dr. Rabia Inam Gandapore.pptx
Muscles of Mastication by Dr. Rabia Inam Gandapore.pptx
 
Ketone bodies and metabolism-biochemistry
Ketone bodies and metabolism-biochemistryKetone bodies and metabolism-biochemistry
Ketone bodies and metabolism-biochemistry
 
Outbreak management including quarantine, isolation, contact.pptx
Outbreak management including quarantine, isolation, contact.pptxOutbreak management including quarantine, isolation, contact.pptx
Outbreak management including quarantine, isolation, contact.pptx
 
All info about Diabetes and how to control it.
 All info about Diabetes and how to control it. All info about Diabetes and how to control it.
All info about Diabetes and how to control it.
 

Fore FAIR ISMB 2019

  • 1. FAIR as a Basis for the Cancer Research Data Commons Ian Fore, D.Phil. NCI Center for Biomedical Informatics and Information Technology @ianmfore Demystifying FAIR Science: Examples, Tools and Use Cases ISMB/ECCB 2019 July 23 2019
  • 2. Three themes • Experience of FAIR • Capturing diversity of science • Allow more people to be compliant • Understanding collaboration
  • 4. Under the hype curve “there are emerging indications that the original meanings of findable, accessible, interoperable, and reusable sometimes may be stretched” “the proposed implementation of these principles, with the goal of an Internet of FAIR Data and Services, is beginning to raise concern and confusion” Cloudy, increasingly FAIR; revisiting the FAIR Data guiding principles for the European Open Science Cloud. Mons et al. (2017) Information Services & Use, vol. 37, pp. 49-56
  • 6. Desirable characteristics of FAIR Assessment • Transparency • What are the criteria • Who is doing the assessment • Qualifications for doing so • Not just a score • Non judgmental • Reflect qualities of a given resource • Not strict • Important to leave scope for the novelty of science
  • 7.
  • 8. Case statement of the WG (CC BY-SA 4.0; 2019-04-03, www.rd-alliance.org, @resdatall) Challenge: • Ambiguity and wide range of interpretations of FAIRness • Lack of a common set of core assessment criteria and a minimum set of shared guidelines Approach: • Bring together stakeholders • Build on existing approaches and expertise Intended results: • RDA Recommendation of core assessment criteria • Generic and expandable self-assessment model • Self-assessment toolset • FAIR data checklist
  • 9. Measuring maturity, Alternative #1 (CC BY-SA 4.0; 2019-04-03, www.rd-alliance.org, @resdatall) Five options for R1.1 [metadata/data] • Level 0: no licence • Level 1: non standard licence in a human-readable format allowing access • Level 2: standard licence in a human-readable format allowing access • Level 3: standard open licence in a human-readable format allowing reuse • Level 4: standard open licence in a machine-readable format allowing reuse • Level 5: standard open licence in a machine-readable format with clear criteria allowing reuse Each option is defining a maturity level. Method step 9. [Figure: FAIRness compliance for R1.1, scale from Level 0 to Level 5]
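The R1.1 levels on this slide can be read as a ladder of yes/no questions about a licence. The following is a minimal sketch of that reading, not the RDA working group's method: the `Licence` class, its field names, and the `r1_1_maturity` function are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass
class Licence:
    """Hypothetical description of a licence; field names are assumptions."""
    present: bool                       # is there any licence at all?
    standard: bool = False              # a recognised, standard licence?
    open: bool = False                  # an open licence that allows reuse?
    machine_readable: bool = False      # expressed in a machine-readable format?
    clear_reuse_criteria: bool = False  # are the reuse criteria spelled out?


def r1_1_maturity(lic: Licence) -> int:
    """Return the first level on the slide whose next requirement is not met."""
    if not lic.present:
        return 0  # Level 0: no licence
    if not lic.standard:
        return 1  # Level 1: non standard licence
    if not lic.open:
        return 2  # Level 2: standard licence allowing access
    if not lic.machine_readable:
        return 3  # Level 3: standard open licence, human-readable
    if not lic.clear_reuse_criteria:
        return 4  # Level 4: standard open licence, machine-readable
    return 5      # Level 5: machine-readable with clear criteria


if __name__ == "__main__":
    example = Licence(present=True, standard=True, open=True)
    print(r1_1_maturity(example))
```

A resource with a standard open licence published only as human-readable text would land on Level 3 under this reading; the result is a position on a scale rather than a pass or fail, in line with the "not just a score" point made earlier in the deck.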
  • 10.
  • 11. Data Commons Framework [diagram labels] Clinical Proteomics Imaging Genomics Immuno-oncology Cancer Models Biomarkers NCI Cancer Research Data Commons SBG CGC Broad FireCloud ISB CGC Elastic Compute Query Visualization Clinical Proteomics Tumor Analysis Consortium* Tool Deployment The Cancer Imaging Archive* TCIA Web Interface APIs Data Submission Authentication & Authorization Data Models & Dictionaries Computational Workspaces Data Contributors and Consumers Tool Repositories Metadata Validation & Tools Analysis As Is Genomics. GDC and Cloud Resources are available now; Framework, As Is Genomics, PDC, IDC are in development; all else is notional.
  • 12. NCBI Sequence Data Delivery Pilot (SDDP) • Sequence data will be stored in commercial cloud • Data submission by existing dbGaP mechanisms • Familiar to investigators • Minimal change to existing dbGaP sequence submission • VCF and phenotype observations in dbGaP • Relieves pressure on SRA by allowing NCI to provide directly for storage of data from its programs
  • 13. Data “As Is?” • Not as harmonized and restructured as e.g. Genomic Data Commons • Goal: • Usable by those not engaged in the creation or production of the dataset • FAIR as a general principle • Ensure the context of samples is captured
  • 14. Standards essential but… • Science inherently non-standard • Even when standard – large diversity of biomedical data • Need to understand data standards from other domains • Domains constantly in flux • Data models reflect unique study designs • Harmonization has proven expensive
  • 15. Shift what we ask for Not just “These are the columns you must provide” “Fit your data into our model” But also (not strict) What are the variables that define your study? What is the model of your data? Standardized or not.
  • 16. Standardizing the description of the non-standard
    One dbGaP study - GECCO:
      <variable id="phv00357184.v1">
        <name>SEX</name>
        <description>Sex of participant</description>
        <type>string</type>
        <value>Female</value>
        <value>Male</value>
      </variable>
    Another dbGaP study - GTEx:
      <variable id="phv00169062.v7">
        <name>SEX</name>
        <description>Sex</description>
        <type>integer, encoded value</type>
        <comment>Sex The Donor's Identification of sex based upon self-report, family/next of kin, or medical record abstraction.</comment>
        <value code="1">Male</value>
        <value code="2">Female</value>
      </variable>
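Because both studies describe SEX in the same XML shape, a consumer can read each description and decode raw values without the studies agreeing on an encoding. The sketch below illustrates that idea using only the Python standard library; the `describe` and `decode` helpers are hypothetical names, and the XML strings are copied from the slide rather than fetched from dbGaP.

```python
import xml.etree.ElementTree as ET

# Variable descriptions as shown on the slide (GECCO, then GTEx).
GECCO_SEX = """
<variable id="phv00357184.v1">
  <name>SEX</name>
  <description>Sex of participant</description>
  <type>string</type>
  <value>Female</value>
  <value>Male</value>
</variable>
"""

GTEX_SEX = """
<variable id="phv00169062.v7">
  <name>SEX</name>
  <description>Sex</description>
  <type>integer, encoded value</type>
  <value code="1">Male</value>
  <value code="2">Female</value>
</variable>
"""


def describe(xml_text: str) -> dict:
    """Turn one <variable> element into a plain dictionary."""
    var = ET.fromstring(xml_text)
    values = {}
    for value in var.findall("value"):
        label = value.text.strip()
        # Coded values carry a "code" attribute; uncoded values stand for themselves.
        values[value.get("code", label)] = label
    return {
        "id": var.get("id"),
        "name": var.findtext("name"),
        "type": var.findtext("type"),
        "values": values,
    }


def decode(description: dict, raw) -> str:
    """Map a raw cell value to its label using the variable's own description."""
    return description["values"][str(raw)]


if __name__ == "__main__":
    gecco, gtex = describe(GECCO_SEX), describe(GTEX_SEX)
    # Same variable name, different types and encodings.
    print(gecco["type"], "|", gtex["type"])
    print(decode(gecco, "Female"), decode(gtex, 2))
```

The design point is that nothing here requires either study to change its data: the description travels with the variable, and the mapping to a shared label is derived from it.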
  • 17. What is needed for standard description of variables?
      <variable id="phv00217659.v3">
        <name>SMUBRTRM</name>
        <description>Uberon Term, anatomical location as described by the Uber Anatomy Ontology (UBERON)</description>
        <type>string</type>
        <comment>Generated at LDACC Term as specified by the Uber Anatomy Ontology (UBERON), http://bioportal.bioontology.org/ontologies/UBERON.</comment>
      </variable>
      <variable id="phv00169242.v7">
        <name>SMTSISCH</name>
        <description>Total Ischemic time for a sample</description>
        <type>integer</type>
        <unit>Minutes</unit>
        <comment>Sample Ischemic Time Interval between actual death, presumed death, or cross clamp application and final tissue stabilization.</comment>
      </variable>
    From GTEx Sample Attributes
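One way to approach the question on this slide is to check what a variable description already states in a structured way and what it leaves in free text. The sketch below is an illustration under assumptions, not a dbGaP tool: `review_variable` and its rules are hypothetical, and treating a URL inside `<comment>` as a sign of a free-text-only reference is a heuristic chosen for this example.

```python
import re
import xml.etree.ElementTree as ET

# Variable descriptions as shown on the slide (GTEx sample attributes).
UBERON_TERM = """
<variable id="phv00217659.v3">
  <name>SMUBRTRM</name>
  <description>Uberon Term, anatomical location as described by the Uber Anatomy Ontology (UBERON)</description>
  <type>string</type>
  <comment>Generated at LDACC Term as specified by the Uber Anatomy Ontology (UBERON), http://bioportal.bioontology.org/ontologies/UBERON.</comment>
</variable>
"""

ISCHEMIC_TIME = """
<variable id="phv00169242.v7">
  <name>SMTSISCH</name>
  <description>Total Ischemic time for a sample</description>
  <type>integer</type>
  <unit>Minutes</unit>
  <comment>Sample Ischemic Time Interval between actual death, presumed death, or cross clamp application and final tissue stabilization.</comment>
</variable>
"""


def review_variable(xml_text: str) -> list:
    """List the gaps in a <variable> description (hypothetical rules)."""
    var = ET.fromstring(xml_text)
    findings = []
    vtype = (var.findtext("type") or "").strip()
    if not vtype:
        findings.append("no type given")
    # A plain numeric variable is hard to reuse without a unit.
    if vtype == "integer" and var.find("unit") is None:
        findings.append("numeric variable without a unit")
    # A URL that appears only in the comment is readable by people, not by software.
    if re.search(r"https?://\S+", var.findtext("comment") or ""):
        findings.append("reference given only as a URL in free text")
    return findings


if __name__ == "__main__":
    for xml_text in (UBERON_TERM, ISCHEMIC_TIME):
        print(review_variable(xml_text))
```

Under these rules the ischemic-time variable raises no findings because its type and unit are explicit, while the UBERON variable is flagged because its ontology is named only inside prose.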
  • 18. What is the standard to describe data Things, Attributes, Relationships • Some existing candidates • Metadata repositories • ISO11179 style (caDSR, CDISC, etc.) • RDA Data Type Registries • GA4GH – Discovery Work Stream – Search/SchemaBlocks • BDBag • PFB – Portable Format for Biomedical Data • JSON-LD
  • 19. Data Sharing: It’s not the technology, it’s the attitude we take Greg Simon, Director – Cancer Moonshot Task Force
  • 20.
  • 21.
  • 22. NIH Institutes Represented in BD2K Radical Collaboration Training • Center for Scientific Review • National Center for Advancing Translational Sciences • National Cancer Institute • National Human Genome Research Institute • National Heart, Lung, and Blood Institute • National Institute on Aging • National Institute of Allergy and Infectious Diseases • National Institute of Biomedical Imaging and Bioengineering • National Institute of Child Health and Human Development • National Institute on Drug Abuse • National Institute of General Medical Sciences • National Institute of Mental Health • National Institute on Minority Health and Health Disparities • NIH Office of the Director Thanks to: Phil Bourne and Warren Kibbe
  • 23. we wrestle with the question of whether natural selection inherently favors selfish behavior. Is the process of evolutionary competition cruel, or does it sometimes pay to be nice? https://www.wnycstudios.org/story/104082-prisoners-dilemma
  • 24. FIRO-B All Human beings share some basic similarities • All people want to feel: Significant, Competent, Likable • All people have some fear of being: Ignored, Humiliated, Rejected • All people have behavior preferences about: Inclusion, Control, Openness By creating environments that invite people to feel significant, competent and likable, you reduce the level of fear and create environments that are more conducive to honesty, collaboration, accountability and fun! It is in these environments where people bring their best to the workplace and productivity soars in an atmosphere of trust. By being accountable for your own internal environment, you take responsibility for your own psychological patterns and mental models and do not inappropriately associate your own deeper self-concept with the substantive issues you are working with. Celeste Blackman, Green Zone Culture Group
  • 25. Operational phase matters more than the outset • FAIR has served to bring the community together • Success depends on the follow through • There is no silver bullet • Fred Brooks
  • 26. Submit Proposal to Build the Cancer Data Aggregator • Accessible via NCI Cloud Resources and other applications. • API layer will allow users to query across: • NCI Cancer Research Data Commons (CRDC) • NCI Data Coordinating Centers (e.g. HTAN DCC) • Additional Repositories (e.g. KidsFirst DRC) Proposals due by August 15, 2019: https://go.usa.gov/xyKe3

Editor's Notes

  1. FAIR Findable, Accessible, Interoperable, Reusable
  2. Building upon this foundation, our vision for a Cancer Research Data Commons: Is a virtual, expandable infrastructure that will support collaboration among researchers, clinicians, patients, computational scientists, and tool developers. It will house multiple cloud-based Commons Nodes for multiple data types initially including: Genomic, Imaging, and Proteomics data. In the future, we envision additional nodes that support other data types. ANIMATE: The genomics data node is built upon the foundation of the GDC, which provides a means of data submission, user interfaces, and visualization tools. ANIMATE: “As is genomics” is similar, in principle, to the Sequence Read Archive within dbGaP. ANIMATE: The Cloud Resources provide search, compute, and analytical resources, as well as a way for researchers to use their own data and tools. ANIMATE: The Data Commons Framework provides secure access, an approach to metadata validation, user workspaces, shared data models and dictionaries to facilitate interoperability, and a tool repository to allow users to use and share new algorithms, tools, pipelines, and visualizations. ANIMATE: The Proteomics Data node will incorporate data from the Clinical Proteomics Tumor Analysis Consortium (CPTAC) and other sources. ANIMATE: The Imaging Data node will work with the Cancer Imaging Archive (TCIA) and other sources of imaging data. In terms of status, GDC and Cloud Resources are operational; Framework, As Is Genomics, PDC, and IDC are in development; and more is planned for the future.
  3. There should be a different obligation: to require the investigator to share what they know rather than a fixed set of attributes. This should be more motivating to the investigator as it unlocks the creativity inherent in how they designed their study. “Metadata” is seen as a burden. Can we lessen the burden by making sure we capture what scientists are doing anyway?
  4. Two examples here: (1) A variable that uses an ontology. How could the term be more explicitly referenced? (2) A numeric variable where the type (integer) and unit are specified.