Prediction of novel targets using disease association data from Open Targets
Enrico Ferrero, PhD, Associate GSK Fellow
Scientific Leader, Computational Biology, Target Sciences
GSK
BioData World Congress
03.11.2017
@enricoferrero
Data + AI = drugs?
BBC News, 2017; Nature Biotechnology, 2017
The pharma AI space is getting crowded
Developing a new drug: 15+ years, $2B+
So, what’s wrong?
Harrison, Nat Rev Drug Discov, 2016
Cook et al., Nat Rev Drug Discov, 2014
Rethink the drug discovery pipeline
Manhattan Institute, 2012
Late-phase failures cost (a lot) more
Spend more time and resources in target discovery
Reduce attrition in later phases
But how do we find good targets?
Nelson et al., Nat Genet, 2015
Open Targets
Koscielny et al., 2016
Could it be as easy as spotting spam emails?
▪ Is it possible to predict novel therapeutic targets using available gene–disease association data?
▪ Is Open Targets just a catalogue of gene–disease associations, or can we learn from it what makes a good target?
A positive–unlabelled (PU) semi-supervised learning approach
▪ Obtain all gene–disease associations and supporting evidence from the Open Targets platform. For each gene, create numeric features by taking the mean score across all diseases for each evidence type:
▪ Genetic associations (germline)
▪ Somatic mutations
▪ Significant gene expression changes
▪ Disease-relevant phenotype in animal model
▪ Pathway-level evidence
▪ Gather positive labels from Pharmaprojects, considering only targets with drugs currently on the market, in clinical trials or in preclinical studies. Because only positive labels are available, a semi-supervised framework is used: Pharmaprojects targets constitute the positive class (P), while the rest of the proteome forms the unlabelled class (U), which contains both negatives and yet-to-be-discovered positives.
▪ All positive cases (1421) and an equal number of randomly selected unlabelled cases (2842 in total) are set aside and split into training (80%) and testing (20%) sets. The remaining unlabelled genes form the prediction set, on which the final model makes its predictions.
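The data preparation above can be sketched in Python as follows. This is a minimal illustration, not the original pipeline. It assumes hypothetical (gene, disease, evidence_type, score) tuples with one score per evidence type per gene–disease pair. It also assumes the mean is taken over the diseases each gene is associated with, with missing evidence counted as 0, and that the 80/20 split is a plain random shuffle. The evidence-type names are placeholders.

```python
import random
from collections import defaultdict

# Hypothetical names for the five evidence types listed above.
EVIDENCE_TYPES = ["genetic_association", "somatic_mutation",
                  "rna_expression", "animal_model", "affected_pathway"]


def build_features(records):
    """Mean score per evidence type across each gene's associated diseases.

    Assumes `records` holds (gene, disease, evidence_type, score) tuples;
    an evidence type missing for a gene-disease pair contributes 0.
    """
    sums = defaultdict(lambda: dict.fromkeys(EVIDENCE_TYPES, 0.0))
    diseases = defaultdict(set)
    for gene, disease, evidence, score in records:
        if evidence in sums[gene]:  # ignore evidence types outside the list
            sums[gene][evidence] += score
        diseases[gene].add(disease)
    return {gene: [sums[gene][e] / len(diseases[gene]) for e in EVIDENCE_TYPES]
            for gene in sums}


def pu_split(genes, positives, train_frac=0.8, seed=0):
    """Balance P against a random sample of U, then split into train/test.

    Label 1 = positive (known target); label 0 = unlabelled, NOT a true negative.
    Unlabelled genes that were not sampled form the prediction set.
    """
    rng = random.Random(seed)
    pos = sorted(g for g in genes if g in positives)
    unl = sorted(g for g in genes if g not in positives)
    sampled_unl = rng.sample(unl, len(pos))  # equal number of U cases
    labelled = [(g, 1) for g in pos] + [(g, 0) for g in sampled_unl]
    rng.shuffle(labelled)
    n_train = round(train_frac * len(labelled))
    held_out = set(sampled_unl)
    prediction_set = [g for g in unl if g not in held_out]
    return labelled[:n_train], labelled[n_train:], prediction_set
```

With 1421 positives, this gives 2842 labelled genes, split into 2274 for training and 568 for testing.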
Finding structure and most important features
t-SNE dimensionality reduction reveals structure in the observations
Most important features according to chi-squared test and information gain
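The two feature-ranking criteria can be illustrated with a small pure-Python sketch. The slides do not say how the continuous scores were discretised, so the equal-width binning on [0, 1] and the default of 4 bins are assumptions. Labels are 1 for P and 0 for U.

```python
import math
from collections import Counter


def discretise(values, n_bins=4):
    """Equal-width bins on [0, 1]; a score of exactly 1.0 goes in the top bin."""
    return [min(int(v * n_bins), n_bins - 1) for v in values]


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(bins, labels):
    """H(label) minus the bin-weighted entropy of the label within each bin."""
    n = len(labels)
    conditional = 0.0
    for b in set(bins):
        subset = [y for x, y in zip(bins, labels) if x == b]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional


def chi_squared(bins, labels):
    """Pearson chi-squared statistic of the bin-by-label contingency table."""
    n = len(labels)
    observed = Counter(zip(bins, labels))
    bin_totals, label_totals = Counter(bins), Counter(labels)
    stat = 0.0
    for b in bin_totals:
        for y in label_totals:
            expected = bin_totals[b] * label_totals[y] / n
            stat += (observed[(b, y)] - expected) ** 2 / expected
    return stat
```

Features are ranked by either score, higher meaning more informative about P versus U.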
Nested cross-validation and bagging for tuning and model selection
Bischl et al., 2012
Wikipedia
Four classifiers are independently tuned, trained and evaluated on the training set using nested cross-validation (an inner 4-fold loop for hyperparameter tuning and an outer 4-fold loop to estimate performance):
▪ Random forest
▪ Feed-forward neural network with single hidden layer
▪ Support vector machine with radial kernel
▪ Gradient boosting machine with AdaBoost exponential loss function
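The nested cross-validation scheme can be sketched as follows. To keep the example self-contained, a k-nearest-neighbours classifier with k as its only hyperparameter stands in for the four classifiers above, and the grid of k values is hypothetical. What carries over is the structure: an inner 4-fold loop picks the hyperparameter using only the outer training folds, and an outer 4-fold loop scores the tuned model on a fold it never saw during tuning.

```python
import random
from collections import Counter


def knn_predict(train, x, k):
    """Stand-in classifier: majority label among the k nearest training rows."""
    nearest = sorted(train, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x)))[:k]
    return Counter(y for _, y in nearest).most_common(1)[0][0]


def make_folds(data, n_folds, rng):
    """Shuffle row indices and deal them into n_folds roughly equal folds."""
    idx = list(range(len(data)))
    rng.shuffle(idx)
    return [[data[i] for i in idx[f::n_folds]] for f in range(n_folds)]


def accuracy(train, test, k):
    return sum(knn_predict(train, x, k) == y for x, y in test) / len(test)


def cv_accuracy(folds, k):
    """Mean held-out accuracy of k-NN over the given folds."""
    scores = []
    for i, held_out in enumerate(folds):
        fit = [row for j, fold in enumerate(folds) if j != i for row in fold]
        scores.append(accuracy(fit, held_out, k))
    return sum(scores) / len(folds)


def nested_cv(data, grid, outer=4, inner=4, seed=0):
    """Return (chosen k, outer-fold accuracy) for each outer fold.

    `data` is a list of (feature_tuple, label) rows.
    """
    rng = random.Random(seed)
    outer_folds = make_folds(data, outer, rng)
    results = []
    for i, test in enumerate(outer_folds):
        train = [row for j, fold in enumerate(outer_folds) if j != i for row in fold]
        # Inner loop: tune k on this outer fold's training data only,
        # using the same inner split for every candidate value.
        inner_folds = make_folds(train, inner, rng)
        best_k = max(grid, key=lambda k: cv_accuracy(inner_folds, k))
        # Outer loop: score the tuned model on data never used for tuning.
        results.append((best_k, accuracy(train, test, best_k)))
    return results
```

The mean of the outer-fold accuracies is the performance estimate used to compare classifiers.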
In PU learning, U contains both positive and negative cases, which makes classifiers unstable. Bagging (bootstrap aggregating) can improve the performance of unstable classifiers by repeatedly resampling P and U with replacement (bootstrap), training a classifier on each resample and aggregating the predictions by majority voting:
▪ Bagging with 100 iterations was applied to the neural network, the support vector machine and the gradient boosting machine.
▪ Random forests are already a special case of bagging.
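A minimal sketch of bagging for PU learning follows. A nearest-centroid classifier stands in for the neural network, SVM and gradient boosting machine, purely to keep the example short. Drawing each bootstrap sample at the same size as the original P and U sets is also an assumption.

```python
import random


def nearest_centroid_fit(pos, unl):
    """Stand-in base learner: one centroid for P and one for U."""
    def centroid(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]
    return centroid(pos), centroid(unl)


def nearest_centroid_predict(model, x):
    """Predict 1 (P) if x is closer to the P centroid than to the U centroid."""
    c_pos, c_unl = model
    d_pos = sum((a - b) ** 2 for a, b in zip(x, c_pos))
    d_unl = sum((a - b) ** 2 for a, b in zip(x, c_unl))
    return 1 if d_pos < d_unl else 0


def bagged_pu_predict(pos, unl, queries, n_iter=100, seed=0):
    """Bootstrap P and U, fit one model per iteration, aggregate by majority vote.

    Returns (majority-vote labels, fraction of models voting positive).
    """
    rng = random.Random(seed)
    votes = [0] * len(queries)
    for _ in range(n_iter):
        boot_pos = rng.choices(pos, k=len(pos))  # sample with replacement
        boot_unl = rng.choices(unl, k=len(unl))
        model = nearest_centroid_fit(boot_pos, boot_unl)
        for i, x in enumerate(queries):
            votes[i] += nearest_centroid_predict(model, x)
    labels = [1 if 2 * v > n_iter else 0 for v in votes]  # strict majority
    return labels, [v / n_iter for v in votes]
```

Because each model sees a different mix of hidden positives in U, averaging over many resamples damps the effect of any single unlucky draw.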
Assessing performance and investigating results
Neural network classifier achieves 71% accuracy (0.76 AUC) on the test set
More advanced targets have higher disease association evidence
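For reference, the two metrics can be computed as in the sketch below. AUC is written in its rank (Mann–Whitney) form: the probability that a randomly chosen positive is scored above a randomly chosen class-0 case, with ties counted as one half. The 0.5 decision threshold is an assumption. In this PU setting the class-0 cases in the test set are unlabelled rather than confirmed negatives.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of cases whose thresholded score matches the label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def roc_auc(labels, scores):
    """Rank-based AUC: share of (positive, class-0) pairs ordered correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise loop is quadratic, which is fine for a test set of a few hundred genes.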
Validation of predictions with literature mining
Significant overlap between neural network predictions and text mining results (p = 5.05e-172)
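The slide does not say which test produced this p-value. A common choice for the significance of an overlap between two gene sets is the one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test), sketched below as an assumption. All four inputs are hypothetical: the universe of genes considered, the number predicted as targets, the number flagged by text mining, and the observed overlap.

```python
from math import comb


def overlap_p_value(n_total, n_predicted, n_text_mined, n_overlap):
    """P(X >= n_overlap) for X ~ Hypergeometric(n_total, n_text_mined, n_predicted).

    Exact integer arithmetic keeps very small tail probabilities accurate
    until the final division.
    """
    upper = min(n_predicted, n_text_mined)
    tail = sum(comb(n_text_mined, k) * comb(n_total - n_text_mined, n_predicted - k)
               for k in range(n_overlap, upper + 1))
    return tail / comb(n_total, n_predicted)
```

Python's integer division of two big integers returns a correctly rounded float, so p-values as small as the one reported remain representable.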
Automating drug target discovery with machine learning
▪ The gene–disease association data from Open Targets contains enough information to predict, with decent accuracy (71% accuracy, 0.76 AUC on the test set), whether a protein would make a therapeutic target.
▪ According to our model, the most informative evidence types are animal models showing disease-relevant phenotypes, dysregulated gene expression in disease tissue and genetic associations between gene and disease.
▪ The ability to predict late-stage targets with greater accuracy confirms that a clear link between target and disease is essential to maximise the chances of success in the clinic.
▪ Limitations:
▪ No prediction of the specific indication;
▪ No tractability considerations.
Thank you!
▪ Philippe Sanseau
▪ Ian Dunham
▪ Gautier Koscielny
▪ Giovanni Dall’Olio
▪ Pankaj Agarwal
▪ Mark Hurle
▪ Steven Barrett
▪ Nicola Richmond
▪ Jin Yao
