DataFAIRy bioassays pilot project - lessons learned and future outlook
Isabella Feierberg, AstraZeneca
Samantha Jeschonek, Collaborative Drug Discovery
Nick Lynch, Curlew Research
2021-06-02
Why DataFAIRy?
Substantial investments are being made in AI, ML and FAIR data across life science industry and academia
Available metadata in (public domain) data repositories is often insufficient for answering current and future business questions
Pharma companies already pay for curation of partially overlapping public domain data (e.g., ChEMBL, papers, chemistry patents)
There is a need for FAIR public domain data with high quality annotations using public ontologies and a common data model
Example
Predict kinase selectivity
Get all kinase activity/selectivity data from ChEMBL, in-house, collaborators
Build model
Use model
Update model with new data
What we want
Well annotated data
Lots of data
High quality data
Siloed data is not helpful
My organization’s data
Public data
Partner’s data
The proposed DataFAIRy operational model (2018)
Unstructured public data → Curation and QC by independent domain experts → FAIR data
DataFAIRy Partners
Cost-shared annotation of public domain bioassay descriptions with high quality, using an agreed data model, making data FAIR
FAIR = Findable, Accessible, Interoperable, Reusable
Small molecule bioassays make up a good pilot case
Chemogenomic model building
Assay development, e.g., assay conditions and tool compounds
Enriching public chemogenomics data with FAIR metadata will show impact across the cheminformatics domain
Project planning – what is available in the public domain?
Project team
Roche: Rama Balakrishnan, Martin Romacker
Novartis: Anosha Siripala, Gabriel Backiananthan
BMS: Dana Vanderwall
AstraZeneca: Tim Ikeda, Isabella Feierberg
Collaborative Drug Discovery: Samantha Jeschonek, Jason Harris, Whitney Smith
Pistoia Alliance: Vladimir Makarov, Thomas Liener
Feasibility study, guidance for a larger initiative, example creation
Pilot project (2020) – Summary
496 public domain assay descriptions were curated and converted into FAIR information objects using an agreed data model, which was guided by jointly defined business questions. The metadata was uploaded to PubChem.
Learning points were captured along with recommendations for future endeavors
Pilot Project - Business questions
1. Biology oriented literature mining for discovery project planning
2. Assay technology oriented
3. Chemistry/tool compound oriented
4. Specific assay conditions
5. Computational chemogenomic modelling (e.g., target activity, “PAINS”)
26 initial questions, pruned down to 15, across 5 main categories
CDD’s BioAssay Express = NLP/ML + Human in the Loop
Pilot Project - Assay selection
1. 245 commercial panel assays: ThermoFisher’s kinase selectivity Z’-LYTE panel
   - Downloaded vendor’s PDF document with assay protocol
2. 42 PubChem NCATS assays – qHTS, large datasets
   - Assay Description and Assay Protocol sections in plain text on PubChem page
3. 210 publication assays: ChEMBL assays where the target is EGFR and the reference is Open Access
   - Paper/supplementary material, references
100 of these 496 annotated assays were subjected to manual QC by project team members
Pilot: 100 QC’d assays (~20%)
• Learning points are largely extrapolated from the 100 QC’d assays
• 89 ChEMBL assays, 5 NCATS assays
• 6 ThermoFisher panel assays QC’d
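The QC coverage quoted on this slide can be checked with a short calculation. The sketch below uses only the counts stated in the deck (assays selected per source, assays manually QC'd per source, and the reported total of 496); the variable and function names are hypothetical and chosen for illustration only.

```python
# Assays selected per source, as stated on the assay selection slide.
selected = {
    "ThermoFisher panel": 245,
    "PubChem NCATS": 42,
    "ChEMBL publication": 210,
}

# Assays manually QC'd per source, as stated on the QC slide.
qc_checked = {
    "ThermoFisher panel": 6,
    "PubChem NCATS": 5,
    "ChEMBL publication": 89,
}

# Total number of annotated assays as reported in the deck.
TOTAL_ANNOTATED = 496


def qc_coverage(checked, selected):
    """Return the fraction of selected assays that were QC'd, per source."""
    return {source: checked[source] / selected[source] for source in selected}


coverage = qc_coverage(qc_checked, selected)
total_checked = sum(qc_checked.values())
overall = total_checked / TOTAL_ANNOTATED

for source, fraction in coverage.items():
    print(f"{source}: {qc_checked[source]}/{selected[source]} = {fraction:.1%}")
print(f"overall: {total_checked}/{TOTAL_ANNOTATED} = {overall:.1%}")
```

Run as written, this shows that the QC sample is weighted heavily toward the ChEMBL publication assays, which is worth keeping in mind when the learning points are extrapolated to the commercial panel and NCATS assays.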
How well did the pilot assays get annotated?
Pilot Project – Learnings
• Review of supplements and citations → High cost. Choose assays wisely.
• No persistent links exist for commercial assay panel protocols
• Errors propagate between papers
• Commercial assay panels were the easiest to annotate (low-hanging fruit)
• Fully automated is not fully accurate: Benefit from good work practices: audit trail, versioning, iterative QC by experts
• Need for a common community data standard for future assay publications.
• Hard and expensive to annotate old assay protocols from literature: A need for published assay protocols to be well-annotated in public databanks and linked to the publication
Value statement
“Richly annotated FAIR bioassay data has been very valuable for an internal data
integration project, where it has provided additional terminology aiding the
assimilation of the chemogenomics datasets used by the machine-learning models.
The extra annotations better harmonise our dataset with those from external
partners, enabling the federated platform to provide superior multi-task predictions
across range of panels and safety screens in a privacy preserving way”
Lewis Mervin, Machine Learning and Cheminformatics Expert, Molecular AI,
Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca
Next steps:
Optimize process, data sources, tools, QC within quality constraints. Define quality metrics.
Define and promote a community standard for assay reporting and publishing; align with vendors, publishers, government agencies.
Attract new project members and sufficient funding to start the next phase
Scale up (x 10-100) in next steps. Having more partners lowers cost per partner per assay and overhead cost
Thanks to
AstraZeneca: Nigel Green, David Hayes, Tom Plasterer
BMS: Rick Bishop
Janssen: Herman van Vlijmen
Novartis: Fabien Pernot
MMV: Jeremy Burrows
PubChem: Evan Bolton
ChEMBL: Anna Gaulton, Andrew Leach
Roche: Olivier Roche
Medicines Discovery Catapult: John Overington, Mark Davies
Pangeadata.ai: Vibhor Gupta
University of Miami: Stephan Schürer
BioSci Consulting: Scott Wagers
Collaborative Drug Discovery: Barry Bunin, Frank Cole, Alex Clark, Hande Kücük McGinty (now Univ. of Ohio)
Pistoia Alliance: Carmen Nitsche (now at CCDC), Nick Lynch (now at Curlew Research)

DataFAIRy bioassays pilot -- lessons learned and future outlook

  • 1. DataFAIRy bioassays pilot project - lessons learned and future outlook Isabella Feierberg, AstraZeneca Samantha Jeschonek, Collaborative Drug Discovery Nick Lynch, Curlew Research 2021-06-02
  • 2. Why DataFAIRy? Substantial investments are being made in AI, ML and FAIR data across the life science industry and academia. Available metadata in (public domain) data repositories is often insufficient for answering current and future business questions. Pharma companies already pay for curation of partially overlapping public domain data (e.g., ChEMBL, papers, chemistry patents). There is a need for FAIR public domain data with high-quality annotations using public ontologies and a common data model.
  • 4. What we want: well-annotated data, lots of data, high-quality data.
  • 5. Siloed data is not helpful: my organization’s data, public data, partner’s data.
  • 6. The proposed DataFAIRy operational model (2018). Diagram labels: curation and QC by independent domain experts; unstructured public data; FAIR data; DataFAIRy partners. Cost-shared, high-quality annotation of public domain bioassay descriptions using an agreed data model, making the data FAIR. FAIR = Findable, Accessible, Interoperable, Reusable.
  • 7. Small molecule bioassays make up a good pilot case. Chemogenomic model building; assay development, e.g., assay conditions and tool compounds; project planning (what is available in the public domain?). Enriching public chemogenomics data with FAIR metadata will show impact across the cheminformatics domain.
  • 8. Project team. Roche: Rama Balakrishnan, Martin Romacker. Novartis: Anosha Siripala, Gabriel Backiananthan. BMS: Dana Vanderwall. AstraZeneca: Tim Ikeda, Isabella Feierberg. Collaborative Drug Discovery: Samantha Jeschonek, Jason Harris, Whitney Smith. Pistoia Alliance: Vladimir Makarov, Thomas Liener.
  • 9. Pilot project (2020) – Summary. Feasibility study, guidance for a larger initiative, example creation. 496 public domain assay descriptions were curated and converted into FAIR information objects using an agreed data model, which was guided by jointly defined business questions. Upload of the metadata to PubChem. Learning points were captured along with recommendations for future endeavors.
  • 10. Pilot Project - Business questions. 26 initial questions, pruned down to 15, across 5 main categories: (1) biology-oriented literature mining for discovery project planning; (2) assay technology oriented; (3) chemistry/tool compound oriented; (4) specific assay conditions; (5) computational chemogenomic modelling (e.g., target activity, “PAINS”).
  • 11. CDD’s BioAssay Express = NLP/ML + Human in the Loop
  • 12. Pilot Project - Assay selection. (1) 245 commercial panel assays: ThermoFisher’s kinase selectivity Z’-lyte panel. Source: downloaded vendor’s PDF document with assay protocol. (2) 42 PubChem NCATS assays: qHTS, large datasets. Source: Assay Description and Assay Protocol sections in plain text on the PubChem page. (3) 210 publication assays: ChEMBL assays where the target is EGFR and the reference is Open Access. Source: paper/supplementary material, references. 100 of these 496 annotated assays were subjected to manual QC by project team members.
  • 13. Pilot: 100 QC'd assays (~20%). Learning points are largely extrapolated from the 100 QC'd assays: 89 ChEMBL assays, 5 NCATS assays and 6 ThermoFisher panel assays.
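
As a small worked check of the "~20%" figure, the sketch below recomputes QC coverage from the counts given on the assay selection and QC slides. Pairing the 89 QC'd ChEMBL assays with the 210 publication assays is an assumption based on the slide wording; the variable and function names are illustrative only.

```python
# Counts as stated on the slides. Note that the three per-source counts
# sum to 497, while the slides state 496 annotated assays in total; the
# stated total is used for the overall figure.
TOTAL_ANNOTATED = 496

# source name -> (assays annotated, assays manually QC'd)
assays = {
    "ThermoFisher panel": (245, 6),
    "PubChem NCATS": (42, 5),
    "ChEMBL publication (EGFR)": (210, 89),
}


def qc_coverage(annotated: int, qc_checked: int) -> float:
    """Return the percentage of annotated assays that were manually QC'd."""
    return 100.0 * qc_checked / annotated


total_qc = sum(qc for _, qc in assays.values())
overall_pct = qc_coverage(TOTAL_ANNOTATED, total_qc)

print(f"Overall: {total_qc}/{TOTAL_ANNOTATED} = {overall_pct:.1f}%")
for source, (annotated, qc) in assays.items():
    print(f"{source}: {qc}/{annotated} = {qc_coverage(annotated, qc):.1f}%")
```

The per-source percentages show that the QC sample is weighted heavily toward the publication assays, which is worth keeping in mind when extrapolating the learning points to the commercial panel and NCATS assays.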
  • 14. How well did the pilot assays get annotated?
  • 15. Pilot Project – Learnings. Review of supplements and citations → high cost; choose assays wisely. No persistent links exist for commercial assay panel protocols. Errors propagate between papers. Commercial assay panels were the easiest to annotate (low-hanging fruit). Fully automated is not fully accurate: benefit from good work practices (audit trail, versioning, iterative QC by experts). Need for a common community data standard for future assay publications. Hard and expensive to annotate old assay protocols from literature: a need for published assay protocols to be well-annotated in public databanks and linked to the publication.
  • 16. Value statement: “Richly annotated FAIR bioassay data has been very valuable for an internal data integration project, where it has provided additional terminology aiding the assimilation of the chemogenomics datasets used by the machine-learning models. The extra annotations better harmonise our dataset with those from external partners, enabling the federated platform to provide superior multi-task predictions across range of panels and safety screens in a privacy preserving way” (Lewis Mervin, Machine Learning and Cheminformatics Expert, Molecular AI, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca)
  • 17. Next steps: Optimize process, data sources, tools and QC within quality constraints. Define quality metrics. Define and promote a community standard for assay reporting and publishing, aligning with vendors, publishers and government agencies. Attract new project members and sufficient funding to start the next phase. Scale up (×10-100) in next steps. Having more partners lowers cost per partner per assay and overhead cost.
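
The cost-sharing claim can be illustrated with a minimal sketch. It assumes costs are split equally among partners and that total cost is a per-assay curation cost plus a fixed overhead; neither assumption comes from the slides, and all numbers below are hypothetical placeholders in arbitrary units.

```python
def cost_per_partner_per_assay(
    n_partners: int,
    n_assays: int,
    cost_per_assay: float,
    overhead: float,
) -> float:
    """Equal-split cost borne by one partner for one assay (hypothetical model)."""
    if n_partners < 1 or n_assays < 1:
        raise ValueError("n_partners and n_assays must be at least 1")
    total_cost = n_assays * cost_per_assay + overhead
    return total_cost / (n_partners * n_assays)


# Hypothetical inputs, for illustration only (not figures from the pilot).
N_ASSAYS = 5000          # roughly a x10 scale-up of the pilot
COST_PER_ASSAY = 100.0   # arbitrary units
OVERHEAD = 50000.0       # arbitrary units

for partners in (5, 10, 20):
    share = cost_per_partner_per_assay(partners, N_ASSAYS, COST_PER_ASSAY, OVERHEAD)
    print(f"{partners} partners: {share:.2f} per partner per assay")
```

Under this model, doubling the number of partners halves each partner's share, while increasing the number of assays spreads the fixed overhead more thinly.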
  • 18. Thanks to: AstraZeneca: Nigel Green, David Hayes, Tom Plasterer. BMS: Rick Bishop. Janssen: Herman van Vlijmen. Novartis: Fabien Pernot. MMV: Jeremy Burrows. PubChem: Evan Bolton. ChEMBL: Anna Gaulton, Andrew Leach. Roche: Olivier Roche. Medicines Discovery Catapult: John Overington, Mark Davies. Pangeadata.ai: Vibhor Gupta. University of Miami: Stephan Schürer. BioSci Consulting: Scott Wagers. Collaborative Drug Discovery: Barry Bunin, Frank Cole, Alex Clark, Hande Kücük McGinty (now Univ. of Ohio). Pistoia Alliance: Carmen Nitsche (now at CCDC), Nick Lynch (now at Curlew Research).