Illuminating the Druggable Genome and the Quest for New Drug Targets
Tudor I. Oprea
Global Pharma R&D Informatics
Congress
Lisbon, Portugal
▪ We navigate “through time and space” by defining & observing conventions
▪ Geographical maps are a matter of convention…
▪ Informatics and data science need conventions, without which there can be little progress
▪ What do we know?
▪ When do we know it?
▪ Philosophers and scientists alike struggle with this one
…humans are not well equipped to handle contradictions; machines are worse
▪ Machine learning (binary classification models) reflects a simplified view of the Universe: “A” or “non-A”, True or False. No alternatives
▪ Science exists in a world of relative truths: Stanley Cup winners change annually, gravitational waves were only recently confirmed…
▪ Humans accept political polls and weather on TV as substitutes for truth…
▪ We’re perfectly capable of living in the murky world of half-truths and half-lies, where facts can change overnight.
▪ Facts (and data) have an expiration date
▪ No mathematical models can help discriminate truth from falsehood
▪ Confirmation bias pervades Publications. Obfuscation pervades Patents.
…we publish positive results, rarely bad ones, and mask “true IP” in patents
▪ How we write papers: Scientist or team seeks “A”, finds “A1” – then the paper gets written about “A1”
▪ Serendipity & confirmation bias are rarely acknowledged, but we tend to find what we seek…
▪ We retro-fit new data into old models, until forced to accept “new science”
▪ Worse, people lie – IRL, but in science as well
▪ With machine learning models, we take an optimistic view – our model would have worked if only…
▪ By the time we can have meaningful predictions, the system is capable of modeling/capturing all states…
▪ The ubiquity of machine learning has shifted the narrative from explanatory
science (Aristotle’s original intent) to predictive science
…yet we can’t handle Black Swans (Turkey surprise) or the fallacy of induction
▪ Despite considerable many-valued logic work (e.g., Gödel, Łukasiewicz), we continue to develop ML models using true/false statements.
▪ We should shift from binary models to four-valued logic: A, non-A, both (A, non-A) and neither A nor non-A. For example, to drug {confirmed} vs. non-drug, we should add both drug and non-drug (e.g., an antifungal used as pesticide), and neither (“don’t care”)
▪ This may improve machine learning … as we sift through computational assertions
that are assigned varied degrees of truth values, it is likely that we will be better
able to model (and understand) biology.
▪ Fast predictions further shifting the paradigm from “important” to “urgent”
6/20/17 revision
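The four-valued labeling proposed above can be sketched in a few lines. This is only an illustration, assuming that each compound carries two independent evidence flags (evidence that it is a drug, evidence that it is a non-drug); the names `Truth4` and `label` are hypothetical and do not come from the talk.

```python
from enum import Enum


class Truth4(Enum):
    """Four truth values: A, non-A, both, and neither."""
    A = "A"
    NON_A = "non-A"
    BOTH = "both"
    NEITHER = "neither"


def label(evidence_for: bool, evidence_against: bool) -> Truth4:
    """Map two independent evidence flags onto one of four truth values."""
    if evidence_for and evidence_against:
        return Truth4.BOTH
    if evidence_for:
        return Truth4.A
    if evidence_against:
        return Truth4.NON_A
    return Truth4.NEITHER


# Hypothetical example: an antifungal that is also used as a pesticide
# has evidence on both sides, so it is labeled "both".
antifungal_pesticide = label(evidence_for=True, evidence_against=True)
unstudied_compound = label(evidence_for=False, evidence_against=False)
```

Keeping the two evidence flags separate, rather than collapsing them into one boolean, is what lets "both" and "neither" survive into the training labels.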
How reliable are Big Data?
Can we sift “True Data” from “False Data”?!
https://en.wiktionary.org/wiki/embiggen
▪ Most protein classification schemes are based on structural and functional criteria.
▪ For therapeutic development, it is useful to understand how much and what types of data are available for a given protein, thereby highlighting well-studied and understudied targets.
▪ Proteins annotated as drug targets are Tclin
▪ Proteins for which potent small molecules are known are Tchem
▪ Proteins for which biology is better understood are Tbio
▪ Proteins that lack antibodies, publications or Gene RIFs are Tdark
11/09/16 revision; T. Oprea et al., Nature Rev. Drug Discov. poster, Jan 2017
▪ Tclin proteins are associated with drug Mechanism of Action (MoA)
▪ Tchem proteins have bioactivities in ChEMBL and DrugCentral, + human curation for some targets
▪ Kinases: <= 30 nM
▪ GPCRs: <= 100 nM
▪ Nuclear Receptors: <= 100 nM
▪ Ion Channels: <= 10 μM
▪ Non-IDG Family Targets: <= 1 μM
10/19/16 revision
Bioactivities of approved drugs (by Target class)
ChEMBL: database of bioactive chemicals
https://www.ebi.ac.uk/chembl/
DrugCentral: online drug compendium
http://drugcentral.org/
▪ Tbio proteins lack small molecule annotation cf. Tchem criteria, and satisfy one of these criteria:
▪ protein is above the cutoff criteria for Tdark
▪ protein is annotated with a GO Molecular Function or Biological Process leaf term(s) with an Experimental Evidence code
▪ protein has confirmed OMIM phenotype(s)
▪ Tdark (“ignorome”) have little information available, and satisfy these criteria:
▪ PubMed text-mining score from Jensen Lab < 5
▪ <= 3 Gene RIFs
▪ <= 50 Antibodies available according to antibodypedia.com
8/20/15 revision
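The Tclin / Tchem / Tbio / Tdark rules listed above can be read as a small decision procedure. The sketch below is one possible reading, not the IDG implementation: the `Protein` record, its field names, and the `min_dark_criteria` parameter are hypothetical. In particular, it assumes a protein counts as "dark" only when all three Tdark criteria hold, which is how the slide reads; the parameter is exposed in case a looser rule is intended.

```python
from dataclasses import dataclass
from typing import Optional

# Potency cutoffs in nM per target family, as listed on the slide.
TCHEM_CUTOFF_NM = {
    "Kinase": 30.0,
    "GPCR": 100.0,
    "Nuclear Receptor": 100.0,
    "Ion Channel": 10_000.0,  # 10 uM
}
DEFAULT_CUTOFF_NM = 1_000.0  # non-IDG family targets: 1 uM


@dataclass
class Protein:
    """Hypothetical record holding only the fields the rules need."""
    family: str
    has_drug_moa: bool = False                 # annotated drug mechanism of action
    best_activity_nm: Optional[float] = None   # most potent known bioactivity
    pubmed_score: float = 0.0                  # text-mining score
    gene_rifs: int = 0
    antibodies: int = 0
    has_go_experimental_leaf: bool = False
    has_omim_phenotype: bool = False


def assign_tdl(p: Protein, min_dark_criteria: int = 3) -> str:
    """Return the target development level under the assumptions above."""
    if p.has_drug_moa:
        return "Tclin"
    cutoff = TCHEM_CUTOFF_NM.get(p.family, DEFAULT_CUTOFF_NM)
    if p.best_activity_nm is not None and p.best_activity_nm <= cutoff:
        return "Tchem"
    # Count how many of the three Tdark criteria are met.
    dark_hits = sum([p.pubmed_score < 5, p.gene_rifs <= 3, p.antibodies <= 50])
    is_dark = dark_hits >= min_dark_criteria
    if not is_dark or p.has_go_experimental_leaf or p.has_omim_phenotype:
        return "Tbio"
    return "Tdark"
```

For example, `assign_tdl(Protein(family="Kinase", best_activity_nm=25))` falls under the 30 nM kinase cutoff and is labeled Tchem, while the same potency against a protein with a drug MoA annotation would be labeled Tclin first.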
8/31/16 revision; T. Oprea et al., Nature Rev. Drug Discov. poster, Jan 2017
Tdark parameters differ from the other TDLs across the 4 external metrics cf. Kruskal-Wallis post-hoc pairwise Dunn tests
3/28/17 revision
“Counts by Name” == users accessing the STRING website and typing in a gene symbol.
“Counts by Link” == users accessing the network for a gene in STRING by linking to it
from another resource
Data courtesy of Christian von Mering, STRING-db
3/28/17 revision
▪ Are Tdark proteins underfunded because there is no scientific
interest in this category, or is the lack of knowledge perpetuated by
lack of funding?
▪ It is possible that the absence of high quality, well characterized molecular probes may be a root cause for this situation.
▪ However, lack of tools leads to lack of interest, and lack of interest diminishes the probability of such tools being developed
▪ Leptin Receptor: Tdark in 1995, Marketed Drug in 2014
▪ Smoothened Receptor: Tdark in 1997, Marketed Drug in 2012
▪ Sphingosine 1-phosphate receptor 1: Tdark in 1997, Marketed Drug in 2010
▪ Orexin Receptors: Tdark in 1997, Marketed Drug in 2014
▪ Proprotein convertase subtilisin/kexin type 9 (PCSK9): Tdark in 1998, Marketed drugs in 2015
▪ Ghrelin Receptor: Tdark in 1999, successful phase 3 RCT in 2016
▪ TNFRSF17, aka BCMA: Tdark in 2000, phase 1 RCT in 2017
▪ On average it takes 15-20 years for Tdark to bear fruit
6/20/17 revision; Tdark analyses by Cristian Bologa
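The "15-20 years" figure can be checked against the dates listed above with simple arithmetic. The sketch below uses only the years given on the slide; the variable names are illustrative, and the choice to average over the five targets with a marketed drug (leaving out the two still in clinical trials) is an assumption.

```python
from statistics import mean

# (target, year classified Tdark, milestone year, milestone), copied from the slide.
TIMELINE = [
    ("Leptin receptor", 1995, 2014, "marketed drug"),
    ("Smoothened receptor", 1997, 2012, "marketed drug"),
    ("Sphingosine 1-phosphate receptor 1", 1997, 2010, "marketed drug"),
    ("Orexin receptors", 1997, 2014, "marketed drug"),
    ("PCSK9", 1998, 2015, "marketed drug"),
    ("Ghrelin receptor", 1999, 2016, "phase 3 RCT"),
    ("TNFRSF17 (BCMA)", 2000, 2017, "phase 1 RCT"),
]

# Years elapsed between "Tdark" and the listed milestone, per target.
lags = {name: end - start for name, start, end, _ in TIMELINE}

# Restrict the average to targets that actually reached the market.
marketed_lags = [end - start for _, start, end, m in TIMELINE if m == "marketed drug"]
mean_marketed_lag = mean(marketed_lags)

print(f"Lag range for marketed drugs: {min(marketed_lags)}-{max(marketed_lags)} years")
print(f"Mean lag for marketed drugs: {mean_marketed_lag:.1f} years")
```

The two targets still in trials have lags that are lower bounds on time to market, which is why they are kept out of the mean.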
▪ Initially to answer “how many drugs are out there”…
▪ Mapped products (what patients and docs call “drugs”) onto active ingredients (what scientists call “drugs”)
▪ Also wanted to know how many drug targets there are…
10/20/16 revision; Oleg Ursu et al., Nucl. Acids Res., database issue, 2017, doi:10.1093/nar/gkw993
10/20/16 revision; R. Santos & O. Ursu et al., Nature Rev. Drug Discov., 2017, doi:10.1038/nrd.2016.230
Drugs distributed by ATC codes (levels 1-2). Concentric rings indicate ATC levels. Histograms represent the number of drugs distributed per year of first approval. Maximum scale: 100.
11/09/16 revision
Human drug targets distributed by ATC codes (levels 1-2). Concentric rings indicate ATC levels. Histograms represent the global income derived from drugs that act on these targets via Mode of Action annotations. Maximum scale: 287 billion USD.
T. Oprea et al., Nature Rev. Drug Discov. poster, Jan 2017
▪ Text mining of all NIH grants (NIH RePORTER) for the 2000-2015 period suggests that 8858 proteins received zero funding.
▪ Of these, 6051 are Tdark and 2616 are Tbio (this is expected).
▪ However, 119 are Tchem and 72 are Tclin. Possible explanations: old drug targets or funded elsewhere.
▪ We are not able to track patterns of funded research in EU or other institutions. However, most of these proteins are likely to be “in the dark” given overall PubMed/Patent data
▪ Pharma and Academia could pay more attention to these 8858 underfunded proteins.
Note: NIH is the only Funding Agency that makes available all funded research data in bulk (for text mining).
Other agencies (e.g., ESF, IMI, even ERC) do not provide bulk access to funded research abstracts.
T. Oprea et al., Nature Rev. Drug Discov., 2017 (submitted). 5/26/17 revision
Over 37% of the proteins remain poorly described (Tdark)
~10% of the Proteome (Tclin & Tchem) can be targeted by small molecules
We address ~15% of human diseases *) at most with therapeutic agents
*) disease-ontology.org catalogs ~9,000 disease concepts. This resource lacks ~6,000 rare diseases. We estimate ~15,000 disease concepts; ~2,500 have therapeutic agents
11/29/17 revision
▪ IDG KMC2 seeks knowledge gaps across the five branches of the “knowledge tree”:
▪ Genotype; Phenotype; Interactions & Pathways; Structure & Function; and Expression, respectively.
▪ We can use biological systems network modeling to infer novel relationships based on available evidence, and infer new “function” and “role in disease” data based on other layers of evidence
▪ Primary focus on Tdark & Tbio
T. Oprea et al., IDG KMC Phase 2, anticipated start 2018 (if funded). 11/29/17 revision
▪ A meta-path encodes type-specific network topology between the source node (e.g., Protein target) and the destination node (e.g., Disease or Function)
▪ Target –– (member of) → PPI Network ← (member of) –– Protein –– (associated
with) → Disease
▪ Target –– (expressed in) → Tissue ← (localized in) –– Disease
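The two meta-paths above can be turned into numeric features by counting how many concrete paths connect a target to a disease along each pattern. The sketch below is a minimal illustration on toy data: the node names, relation names, and the helpers `build_index` and `count_metapath` are all hypothetical, and path counting is only one of several possible meta-path features.

```python
from collections import defaultdict

# Typed, directed edges (source, relation, destination). All toy data.
EDGES = [
    ("T1", "member_of", "PPI_A"),
    ("P2", "member_of", "PPI_A"),
    ("P3", "member_of", "PPI_A"),
    ("P2", "associated_with", "DiseaseX"),
    ("P3", "associated_with", "DiseaseX"),
    ("T1", "expressed_in", "Heart"),
    ("DiseaseX", "localized_in", "Heart"),
]


def build_index(edges):
    """Index edges for traversal along (fwd) and against (rev) their direction."""
    fwd, rev = defaultdict(set), defaultdict(set)
    for src, rel, dst in edges:
        fwd[(src, rel)].add(dst)
        rev[(dst, rel)].add(src)
    return fwd, rev


def count_metapath(source, dest, steps, fwd, rev):
    """Count path instances from source to dest along a meta-path.

    steps is a list of (relation, direction); ">" follows the edge
    direction and "<" traverses it backwards.
    """
    frontier = {source: 1}
    for relation, direction in steps:
        index = fwd if direction == ">" else rev
        nxt = defaultdict(int)
        for node, n_paths in frontier.items():
            for neighbor in index.get((node, relation), ()):
                nxt[neighbor] += n_paths
        frontier = nxt
    return frontier.get(dest, 0)


fwd, rev = build_index(EDGES)

# Target -(member of)-> PPI Network <-(member of)- Protein -(associated with)-> Disease
ppi_path = [("member_of", ">"), ("member_of", "<"), ("associated_with", ">")]
# Target -(expressed in)-> Tissue <-(localized in)- Disease
tissue_path = [("expressed_in", ">"), ("localized_in", "<")]

ppi_count = count_metapath("T1", "DiseaseX", ppi_path, fwd, rev)
tissue_count = count_metapath("T1", "DiseaseX", tissue_path, fwd, rev)
```

Each (target, disease, meta-path) count can then become one column of a feature matrix for a downstream classifier.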
▪ Build data matrix from protein knowledge
graph along metapaths:
▪ Protein – Protein Interactions
▪ Pathways
▪ GO terms
▪ Gene expression
▪ ...
▪ For the specific disease/phenotype “Dilated cardiomyopathy” in OMIM / TCRD
▪ 35 genes linked to this disease in TCRD
▪ Remaining genes assumed not linked
▪ Random forest binary classifier
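The setup above (a small set of known positives, everything else assumed negative, a random forest on top) can be sketched with scikit-learn. Everything in this example is synthetic: the feature matrix is random, the feature names are placeholders, and a signal is injected into one column purely so the result is interpretable. scikit-learn's impurity-based `feature_importances_` is used here as a stand-in for variable importance; it is not necessarily the VIMP measure used in the original analysis.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for a matrix built along meta-paths:
# 500 "genes", 4 features, 35 labeled positives (as on the slide).
n_genes, n_features, n_positive = 500, 4, 35
feature_names = ["ppi_meta_path", "pathway", "go_term", "expression"]  # placeholders

X = rng.random((n_genes, n_features))
y = np.zeros(n_genes, dtype=int)
y[:n_positive] = 1           # known disease genes
X[:n_positive, 0] += 2.0     # injected signal so feature 0 separates the classes

# class_weight="balanced" compensates for the 35 vs 465 class imbalance.
model = RandomForestClassifier(
    n_estimators=200, class_weight="balanced", random_state=0
)
model.fit(X, y)

importances = dict(zip(feature_names, model.feature_importances_))
top_feature = max(importances, key=importances.get)
print("Most important feature:", top_feature)
```

Because the negatives are only "assumed not linked", genes that the model scores as positive despite a negative label are candidates for follow-up rather than plain errors.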
PP: protein – protein interaction with specified protein
VIMP higher for positive vs negative
Two are Tbio: LMNA – prelamin-A/C; DES – desmin
One is Tchem: PSEN2 – presenilin 2 (for Alzheimer’s)
High VIMP overall: RBM20 (Tbio), MYBPC3 (Tbio), TNNT2 (Tbio)
▪ Genes not annotated in OMIM with “Dilated cardiomyopathy” classified as positives:
▪ NKX2-5 [https://www.ncbi.nlm.nih.gov/pubmed/25503402]
▪ CACNA1C [https://www.ncbi.nlm.nih.gov/pubmed/22589547]
▪ HSPB1 (HSP27) [https://www.ncbi.nlm.nih.gov/pubmed/17873025]
▪ These “false positives” are, in fact, involved in regulating muscle contraction
▪ Given Presenilin 2 is a target for dilated cardiomyopathy, can we find a link between cardiovascular disease and Alzheimer’s?
Top Dx Prior to Alzheimer’s (5 yrs or more):
Essential hypertension
Hyperlipidemia
Type 2 Diabetes mellitus
Hypercholesterolemia
Coronary atherosclerosis
Atrial Fibrillation
…
Persistent mental disorders
Memory Loss
http://rpubs.com/cbologa/alzheimers_disease
C. Bologa, O. Ursu, J. J. Yang & T. I. Oprea (UNM), data from Cerner HealthFacts DB. 11/29/17 revision
10/20/16 revision
A. B. Jensen et al., Nature Communications 2014, 5:4022
C. Bologa, O. Ursu, J. J. Yang & T. I. Oprea (UNM), data from Cerner HealthFacts DB
Our Cerner HealthFacts chronic kidney disease (CKD) data support the Disease Trajectory work done on the population of Denmark (15 yrs)
11/6/17 revision; T. Oprea et al., IDG KMC Phase 2, anticipated start 2018 (if funded)
PJ Kindermans, KR Müller et al. 2/10/16 revision
1-class SVM model based on expression data
Tclin is “known” (everything else is “not known”)
Goal: Identify Tdark proteins that share similar expression patterns to Tclin
• N is the number of tissues & x_i is the expression profile component, normalized by the maximal component value.
Data from GTEx v7
• N = 53
• All non-selective genes have τ = 0.43;
• Selective genes, τ = 1;
• No data, τ = 0.
TSI, τ, provides enrichment for disease & tissue specific genes for drug targets
(GSK paper from V. Kumar et al., Sci Rep 2016, 6: 36205, PMC5099936) Except…
TSI does not discriminate well Tclin among TDL classes…
TSI formula from Yanai et al., Bioinformatics 2005, 21:650-659, bti042. 11/6/17 revision
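The tissue specificity index (TSI, τ) described above can be computed directly from an expression profile. The sketch below assumes the standard form of the Yanai et al. index, τ = Σ(1 − x_i) / (N − 1) with each x_i normalized by the profile maximum; the formula itself is not reproduced in the slide text, so treat it as an assumption. The function name is hypothetical, and returning 0 for an all-zero profile follows the "No data, τ = 0" convention on the slide.

```python
from typing import Sequence


def tissue_specificity_index(expression: Sequence[float]) -> float:
    """Compute tau for one gene from its per-tissue expression values.

    tau = sum(1 - x_i) / (N - 1), where x_i is the expression in tissue i
    divided by the maximal expression across the N tissues.
    """
    n_tissues = len(expression)
    if n_tissues < 2:
        raise ValueError("at least two tissues are required")
    max_value = max(expression)
    if max_value <= 0:
        return 0.0  # no expression data: tau = 0, as on the slide
    return sum(1 - value / max_value for value in expression) / (n_tissues - 1)


# Expressed in a single tissue only: fully selective.
selective = tissue_specificity_index([0.0, 0.0, 10.0])
# Graded expression across three tissues: intermediate selectivity.
graded = tissue_specificity_index([10.0, 5.0, 0.0])
```

With GTEx v7 the profile would have N = 53 entries per gene; the three-tissue inputs here are only for illustration.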
M. E. Dickinson et al., Nature 2016, 537:508-514
IMPC is currently composed of 18 research institutions and 5 national funders
[Figure: mouse model status of IDG genes. Legend: IMPC unassigned; IMPC in progress; IMPC planned; IMPC model; MGI models; Unassigned; Vomeronasal and taste receptors; Aborted; Other MGI models]
Chart values as extracted (not matched to legend entries): 17, 28, 17, 1, 63, 24, 50, 79, 168
95% of eligible IDG genes (339/356) have plans, attempts, or models
384 genes were prioritized by IDG KMC (2014-2016)
306 genes are Tbio
90 genes are Tdark
42 genes are Tchem
11/29/17 revision; slide from Steve Murray, Jackson Lab
11/7/17 revision
Accurate Information is the Foundation of in silico Methods
the age of informatics-driven
drug/target discovery is here
Project conceived by
ALEX ZHAVORONKOV, PHD
Insilico Medicine, Inc
Emerging Technology Centers
Johns Hopkins University
B301, 1101 33rd Street
Baltimore, MD, 21218
• Drug Discovery
• Drug Repurposing
• Pathway Activation Analysis
• Biomarker Development
• Clin. Trials Predictors
• Aging Research
PHARMA.AI
MEDICINAL CHEMISTRY REIMAGINED
11/29/17 revision
11/29/17 revision
MEDICINAL CHEMISTS VS PHARMA.AI
▪ The drug industry reward system is based on patents
▪ These are awarded for drugs, not targets
▪ Clinically validated targets lead to the me-too phenomenon
▪ Time to pool resources together on Target Selection (which includes target
identification, target validation and all –omic research)
▪ Industry should lead a Target Selection Consortium: Partner with academia and co-sponsor “double blind” target finding (avoid the reproducibility crisis)
▪ @John Baldoni / GSK: we’d like to team up with OpenTargets & ATOM
5/20/17 revision
▪ We somehow expect that in the near future, artificial intelligence (AI; machine learning; singularity) will be able to process all data related to biomolecules, their biologic endpoints, ex vivo, in vivo and clinical data, and discover drugs
▪ The problem with that: GIGO
▪ Garbage in, Garbage out.
https://www.linkedin.com/pulse/why-ai-ready-drug-discovery-tudor-oprea
5/20/17 revision
▪ An epistemological question: Can AI discover new knowledge? To date, no credible evidence of this has been provided. Chatbots, winning at chess, Go and Jeopardy! do not count.
▪ We live in the world of alternative facts when it comes to research (not just politics). People lie. See work by JP Ioannidis, but also notes from Bayer (Asadullah et al) and Amgen (Begley et al). As long as AI gets false data, we cannot provide what’s needed.
▪ As of now, patents cannot name AI as inventors. Patent offices world-wide grant patents to people. What happens when AI does become the inventor? Will this happen?!
▪ Will it matter?
https://www.linkedin.com/pulse/why-ai-ready-drug-discovery-tudor-oprea
5/20/17 revision
▪ The unrealized promise relates to our (in)ability to explain the two pillars of clinical drug effectiveness: Safety and Efficacy.
▪ In short, be it Drug Safety or Patient Safety, or indeed Clinical Efficacy, we remain unable to model these processes as function of molecular structure.
T. I. Oprea et al., Drug Discov. Today Therap. Strategies 2011, 8:61-69
Presented at the Global Pharma R&D Informatics Congress.
For more information, visit:
www.global-engage.com

Next-Gen Drug Discovery: An Integrated Micro-Droplet Based PlatformNext-Gen Drug Discovery: An Integrated Micro-Droplet Based Platform
Next-Gen Drug Discovery: An Integrated Micro-Droplet Based PlatformLaura Berry
 
Lighting Rockets at the UChicago Microbiome Launchpad
Lighting Rockets at the UChicago Microbiome LaunchpadLighting Rockets at the UChicago Microbiome Launchpad
Lighting Rockets at the UChicago Microbiome LaunchpadLaura Berry
 
Human Microbiota: Proof of Concept to Production
Human Microbiota: Proof of Concept to ProductionHuman Microbiota: Proof of Concept to Production
Human Microbiota: Proof of Concept to ProductionLaura Berry
 

More from Laura Berry (20)

Domains of unknown function are essential in yeast
Domains of unknown function are essential in yeastDomains of unknown function are essential in yeast
Domains of unknown function are essential in yeast
 
Data-driven design of cell factories and communities
Data-driven design of cell factories and communitiesData-driven design of cell factories and communities
Data-driven design of cell factories and communities
 
Synthetic Biology via programmable directed evolution
Synthetic Biology via programmable directed evolutionSynthetic Biology via programmable directed evolution
Synthetic Biology via programmable directed evolution
 
Machine reading for cancer biology
Machine reading for cancer biologyMachine reading for cancer biology
Machine reading for cancer biology
 
3decision®: Bringing structural data analytics to the masses
3decision®: Bringing structural data analytics to the masses3decision®: Bringing structural data analytics to the masses
3decision®: Bringing structural data analytics to the masses
 
Network-driven drug discovery: A computational network biology approach to dr...
Network-driven drug discovery: A computational network biology approach to dr...Network-driven drug discovery: A computational network biology approach to dr...
Network-driven drug discovery: A computational network biology approach to dr...
 
Measuring project success and Shannon's maxim: the enemy knows the system
Measuring project success and Shannon's maxim: the enemy knows the systemMeasuring project success and Shannon's maxim: the enemy knows the system
Measuring project success and Shannon's maxim: the enemy knows the system
 
The Largest General Translational Informatics Public Private Partnership to Date
The Largest General Translational Informatics Public Private Partnership to DateThe Largest General Translational Informatics Public Private Partnership to Date
The Largest General Translational Informatics Public Private Partnership to Date
 
The challenges of Analytical Data Management in R&D
The challenges of Analytical Data Management in R&DThe challenges of Analytical Data Management in R&D
The challenges of Analytical Data Management in R&D
 
Will data scientists lead the discovery of cancer therapeutics?
Will data scientists lead the discovery of cancer therapeutics?Will data scientists lead the discovery of cancer therapeutics?
Will data scientists lead the discovery of cancer therapeutics?
 
Improving exome sequencing, targeted sequencing, and low frequency variant de...
Improving exome sequencing, targeted sequencing, and low frequency variant de...Improving exome sequencing, targeted sequencing, and low frequency variant de...
Improving exome sequencing, targeted sequencing, and low frequency variant de...
 
Population scale sequencing by cost-efficient targeted NGS
Population scale sequencing by cost-efficient targeted NGSPopulation scale sequencing by cost-efficient targeted NGS
Population scale sequencing by cost-efficient targeted NGS
 
Disease interpretation of whole genome sequence variants
Disease interpretation of whole genome sequence variantsDisease interpretation of whole genome sequence variants
Disease interpretation of whole genome sequence variants
 
Targeting giants - the pharmacology of adhesions GPCRs
Targeting giants - the pharmacology of adhesions GPCRsTargeting giants - the pharmacology of adhesions GPCRs
Targeting giants - the pharmacology of adhesions GPCRs
 
Inventing, developing and commercialising targeted small-molecule drugs for p...
Inventing, developing and commercialising targeted small-molecule drugs for p...Inventing, developing and commercialising targeted small-molecule drugs for p...
Inventing, developing and commercialising targeted small-molecule drugs for p...
 
Using antitumor agents to probe the sensitivity contexts of cancer cells and ...
Using antitumor agents to probe the sensitivity contexts of cancer cells and ...Using antitumor agents to probe the sensitivity contexts of cancer cells and ...
Using antitumor agents to probe the sensitivity contexts of cancer cells and ...
 
Accessing genetically tagged heterocycle libraries via a chemoresistant DNA s...
Accessing genetically tagged heterocycle libraries via a chemoresistant DNA s...Accessing genetically tagged heterocycle libraries via a chemoresistant DNA s...
Accessing genetically tagged heterocycle libraries via a chemoresistant DNA s...
 
Next-Gen Drug Discovery: An Integrated Micro-Droplet Based Platform
Next-Gen Drug Discovery: An Integrated Micro-Droplet Based PlatformNext-Gen Drug Discovery: An Integrated Micro-Droplet Based Platform
Next-Gen Drug Discovery: An Integrated Micro-Droplet Based Platform
 
Lighting Rockets at the UChicago Microbiome Launchpad
Lighting Rockets at the UChicago Microbiome LaunchpadLighting Rockets at the UChicago Microbiome Launchpad
Lighting Rockets at the UChicago Microbiome Launchpad
 
Human Microbiota: Proof of Concept to Production
Human Microbiota: Proof of Concept to ProductionHuman Microbiota: Proof of Concept to Production
Human Microbiota: Proof of Concept to Production
 

Recently uploaded

Neurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trNeurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trssuser06f238
 
Call Girls in Aiims Metro Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Aiims Metro Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Aiims Metro Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Aiims Metro Delhi 💯Call Us 🔝9953322196🔝 💯Escort.aasikanpl
 
Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Patrick Diehl
 
Artificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PArtificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PPRINCE C P
 
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfAnalytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfSwapnil Therkar
 
Manassas R - Parkside Middle School 🌎🏫
Manassas R - Parkside Middle School 🌎🏫Manassas R - Parkside Middle School 🌎🏫
Manassas R - Parkside Middle School 🌎🏫qfactory1
 
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...lizamodels9
 
Forest laws, Indian forest laws, why they are important
Forest laws, Indian forest laws, why they are importantForest laws, Indian forest laws, why they are important
Forest laws, Indian forest laws, why they are importantadityabhardwaj282
 
‏‏VIRUS - 123455555555555555555555555555555555555555
‏‏VIRUS -  123455555555555555555555555555555555555555‏‏VIRUS -  123455555555555555555555555555555555555555
‏‏VIRUS - 123455555555555555555555555555555555555555kikilily0909
 
Evidences of Evolution General Biology 2
Evidences of Evolution General Biology 2Evidences of Evolution General Biology 2
Evidences of Evolution General Biology 2John Carlo Rollon
 
Transposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.pptTransposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.pptArshadWarsi13
 
TOPIC 8 Temperature and Heat.pdf physics
TOPIC 8 Temperature and Heat.pdf physicsTOPIC 8 Temperature and Heat.pdf physics
TOPIC 8 Temperature and Heat.pdf physicsssuserddc89b
 
Solution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutionsSolution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutionsHajira Mahmood
 
zoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistanzoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistanzohaibmir069
 
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.PraveenaKalaiselvan1
 
insect anatomy and insect body wall and their physiology
insect anatomy and insect body wall and their  physiologyinsect anatomy and insect body wall and their  physiology
insect anatomy and insect body wall and their physiologyDrAnita Sharma
 
Welcome to GFDL for Take Your Child To Work Day
Welcome to GFDL for Take Your Child To Work DayWelcome to GFDL for Take Your Child To Work Day
Welcome to GFDL for Take Your Child To Work DayZachary Labe
 
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxAnalytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxSwapnil Therkar
 

Recently uploaded (20)

Neurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trNeurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 tr
 
Call Girls in Aiims Metro Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Aiims Metro Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Aiims Metro Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Aiims Metro Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
 
Volatile Oils Pharmacognosy And Phytochemistry -I
Volatile Oils Pharmacognosy And Phytochemistry -IVolatile Oils Pharmacognosy And Phytochemistry -I
Volatile Oils Pharmacognosy And Phytochemistry -I
 
Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?
 
Artificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PArtificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C P
 
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfAnalytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
 
Manassas R - Parkside Middle School 🌎🏫
Manassas R - Parkside Middle School 🌎🏫Manassas R - Parkside Middle School 🌎🏫
Manassas R - Parkside Middle School 🌎🏫
 
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
 
Forest laws, Indian forest laws, why they are important
Forest laws, Indian forest laws, why they are importantForest laws, Indian forest laws, why they are important
Forest laws, Indian forest laws, why they are important
 
‏‏VIRUS - 123455555555555555555555555555555555555555
‏‏VIRUS -  123455555555555555555555555555555555555555‏‏VIRUS -  123455555555555555555555555555555555555555
‏‏VIRUS - 123455555555555555555555555555555555555555
 
Evidences of Evolution General Biology 2
Evidences of Evolution General Biology 2Evidences of Evolution General Biology 2
Evidences of Evolution General Biology 2
 
Transposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.pptTransposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.ppt
 
TOPIC 8 Temperature and Heat.pdf physics
TOPIC 8 Temperature and Heat.pdf physicsTOPIC 8 Temperature and Heat.pdf physics
TOPIC 8 Temperature and Heat.pdf physics
 
Solution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutionsSolution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutions
 
Engler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomyEngler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomy
 
zoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistanzoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistan
 
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
 
insect anatomy and insect body wall and their physiology
insect anatomy and insect body wall and their  physiologyinsect anatomy and insect body wall and their  physiology
insect anatomy and insect body wall and their physiology
 
Welcome to GFDL for Take Your Child To Work Day
Welcome to GFDL for Take Your Child To Work DayWelcome to GFDL for Take Your Child To Work Day
Welcome to GFDL for Take Your Child To Work Day
 
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxAnalytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
 

Illuminating the druggable genome and the quest for new drug targets

  • 1. Illuminating the Druggable Genome and the Quest for New Drug Targets. Tudor I. Oprea. Global Pharma R&D Informatics Congress, Lisbon, Portugal
  • 2. ▪ We navigate “through time and space” by defining & observing conventions ▪ Geographical maps are a matter of convention… ▪ Informatics and data science need conventions, without which there can be little progress ▪ What do we know? ▪ When do we know it?
  • 3. ▪ Philosophers and scientists alike struggle with this one …humans are not well equipped to handle contradictions; machines are worse ▪ Machine learning (binary classification models) reflects a simplified view of the Universe: “A” or “non-A”, True or False. No alternatives ▪ Science exists in a world of relative truths: Stanley Cup winners change annually, gravitational waves were only recently confirmed ▪ Humans accept political polls and weather on TV as substitutes for truth… ▪ We’re perfectly capable of living in the murky world of half-truths and half-lies, where facts can change overnight. ▪ Facts (and data) have an expiration date ▪ No mathematical models can help discriminate truth from falsehood
  • 4. ▪ Confirmation bias pervades publications. Obfuscation pervades patents. …we publish positive results, rarely bad ones, and mask “true IP” in patents ▪ How we write papers: a scientist or team seeks “A”, finds “A1” – then the paper gets written about “A1” ▪ Serendipity & confirmation bias are rarely acknowledged, but we tend to find what we seek… ▪ We retro-fit new data into old models, until forced to accept “new science” ▪ Worse, people lie – IRL, but in science as well ▪ With machine learning models, we take an optimistic view – our model would have worked if only… ▪ By the time we can have meaningful predictions, the system is capable of modeling/capturing all states…
  • 5. ▪ The ubiquity of machine learning has shifted the narrative from explanatory science (Aristotle’s original intent) to predictive science …yet we can’t handle Black Swans (Turkey surprise) or the fallacy of induction ▪ Despite considerable many-valued logic work (e.g., Gödel, Łukasiewicz), we continue to develop ML models using true/false statements. ▪ We should shift from binary models to four-valued logic: A, non-A, both (A, non-A) and neither A nor non-A. For example, to drug {confirmed} vs. non-drug, we should add both drug and non-drug (e.g., an antifungal used as pesticide), and neither (“don’t care”) ▪ This may improve machine learning … as we sift through computational assertions that are assigned varied degrees of truth values, it is likely that we will be better able to model (and understand) biology. ▪ Fast predictions are further shifting the paradigm from “important” to “urgent”
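The four-valued labelling proposed on slide 5 can be sketched in a few lines. This is only an illustration, not the author's implementation: the names `Truth4` and `label`, and the idea of deriving a label from two independent evidence flags, are assumptions introduced here.

```python
from enum import Enum


class Truth4(Enum):
    """The four truth values named on slide 5 (class name is hypothetical)."""
    A = "A"              # e.g. confirmed drug
    NON_A = "non-A"      # e.g. confirmed non-drug
    BOTH = "both"        # e.g. an antifungal also used as a pesticide
    NEITHER = "neither"  # no evidence either way ("don't care")


def label(evidence_for: bool, evidence_against: bool) -> Truth4:
    """Combine two independent evidence flags into one four-valued label.

    Assumption: evidence for and against 'A' is tracked separately,
    so contradictory evidence maps to BOTH instead of being discarded.
    """
    if evidence_for and evidence_against:
        return Truth4.BOTH
    if evidence_for:
        return Truth4.A
    if evidence_against:
        return Truth4.NON_A
    return Truth4.NEITHER
```

A binary classifier collapses `BOTH` and `NEITHER` into one of the other two classes; keeping them separate lets a downstream model treat contradictory and missing evidence differently.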
  • 6. 6/20/17 revision. How reliable are Big Data? Can we sift “True Data” from “False Data”?! https://en.wiktionary.org/wiki/embiggen
  • 7. ▪ Most protein classification schemes are based on structural and functional criteria. ▪ For therapeutic development, it is useful to understand how much and what types of data are available for a given protein, thereby highlighting well-studied and understudied targets. ▪ Proteins annotated as drug targets are Tclin ▪ Proteins for which potent small molecules are known are Tchem ▪ Proteins for which biology is better understood are Tbio ▪ Proteins that lack antibodies, publications or Gene RIFs are Tdark. 11/09/16 revision. T. Oprea et al., Nature Rev. Drug Discov. poster, Jan 2017
  • 8. ▪ Tclin proteins are associated with drug Mechanism of Action (MoA) ▪ Tchem proteins have bioactivities in ChEMBL and DrugCentral, plus human curation for some targets ▪ Kinases: <= 30 nM ▪ GPCRs: <= 100 nM ▪ Nuclear Receptors: <= 100 nM ▪ Ion Channels: <= 10 μM ▪ Non-IDG Family Targets: <= 1 μM. 10/19/16 revision. Bioactivities of approved drugs (by target class). ChEMBL: database of bioactive chemicals https://www.ebi.ac.uk/chembl/ DrugCentral: online drug compendium http://drugcentral.org/
  • 9. ▪ Tbio proteins lack small molecule annotation cf. Tchem criteria, and satisfy one of these criteria: ▪ protein is above the cutoff criteria for Tdark ▪ protein is annotated with a GO Molecular Function or Biological Process leaf term(s) with an Experimental Evidence code ▪ protein has confirmed OMIM phenotype(s) ▪ Tdark (“ignorome”) proteins have little information available, and satisfy these criteria: ▪ PubMed text-mining score from Jensen Lab < 5 ▪ <= 3 Gene RIFs ▪ <= 50 Antibodies available according to antibodypedia.com. 8/20/15 revision
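The target development level (TDL) rules on slides 7 to 9 read like a decision procedure, so a minimal sketch may help. Everything below is illustrative: the `Protein` record, its field names, and the `tdl` function are hypothetical, the precedence Tclin > Tchem > Tbio > Tdark is inferred from the slide wording, and because the slide does not say how many of the three Tdark conditions must hold, that number is a parameter (default: all three).

```python
from dataclasses import dataclass
from typing import Optional

# Tchem potency cutoffs in nM, as listed on slide 8
TCHEM_CUTOFF_NM = {
    "Kinase": 30.0,
    "GPCR": 100.0,
    "Nuclear Receptor": 100.0,
    "Ion Channel": 10_000.0,  # 10 uM
}
DEFAULT_CUTOFF_NM = 1_000.0   # non-IDG family targets, 1 uM


@dataclass
class Protein:
    """Hypothetical record holding the evidence the slides refer to."""
    family: str
    has_drug_moa: bool = False               # annotated drug mechanism of action
    best_potency_nm: Optional[float] = None  # most potent known small molecule
    pubmed_score: float = 0.0                # Jensen Lab text-mining score
    gene_rifs: int = 0
    antibodies: int = 0
    has_experimental_go_leaf: bool = False
    has_omim_phenotype: bool = False


def tdl(p: Protein, min_dark_criteria: int = 3) -> str:
    """Assign a TDL; min_dark_criteria is an assumption, not from the slides."""
    if p.has_drug_moa:
        return "Tclin"
    cutoff = TCHEM_CUTOFF_NM.get(p.family, DEFAULT_CUTOFF_NM)
    if p.best_potency_nm is not None and p.best_potency_nm <= cutoff:
        return "Tchem"
    # Count how many of the three Tdark conditions from slide 9 are met
    dark_hits = sum([p.pubmed_score < 5, p.gene_rifs <= 3, p.antibodies <= 50])
    above_dark_cutoff = dark_hits < min_dark_criteria
    if above_dark_cutoff or p.has_experimental_go_leaf or p.has_omim_phenotype:
        return "Tbio"
    return "Tdark"
```

For example, `tdl(Protein(family="Kinase", best_potency_nm=25.0))` falls under the 30 nM kinase cutoff and is labelled Tchem, while the same protein with a 500 nM compound and no other evidence is labelled Tdark.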
  • 10. 8/31/16 revision. T. Oprea et al., Nature Rev. Drug Discov. poster, Jan 2017. Tdark parameters differ from the other TDLs across the 4 external metrics, cf. Kruskal-Wallis post-hoc pairwise Dunn tests
  • 11. 3/28/17 revision. “Counts by Name” == users accessing the STRING website and typing in a gene symbol. “Counts by Link” == users accessing the network for a gene in STRING by linking to it from another resource. Data courtesy of Christian von Mering, STRING-db
  • 12. 3/28/17 revision ▪ Are Tdark proteins underfunded because there is no scientific interest in this category, or is the lack of knowledge perpetuated by lack of funding? ▪ It is possible that the absence of high-quality, well-characterized molecular probes may be a root cause for this situation. ▪ However, lack of tools leads to lack of interest, and lack of interest diminishes the probability of such tools being developed
  • 13. ▪ Leptin Receptor: Tdark in 1995, Marketed Drug in 2014 ▪ Smoothened Receptor: Tdark in 1997, Marketed Drug in 2012 ▪ Sphingosine 1-phosphate receptor 1: Tdark in 1997, Marketed Drug in 2010 ▪ Orexin Receptors: Tdark in 1997, Marketed Drug in 2014 ▪ Proprotein convertase subtilisin/kexin type 9 (PCSK9): Tdark in 1998, Marketed drugs in 2015 ▪ Ghrelin Receptor: Tdark in 1999, successful phase 3 RCT in 2016 ▪ TNFRSF17, aka BCMA: Tdark in 2000, phase 1 RCT in 2017 ▪ On average it takes 15-20 years for Tdark to bear fruit. 6/20/17 revision. Tdark analyses by Cristian Bologa
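As a quick check on slide 13's "15-20 years" figure, the lag between the Tdark year and the first marketed drug can be computed from the dates the slide lists. The sketch below uses only the five targets with a marketed drug; the ghrelin receptor and BCMA are left out because the slide reports trial milestones for them, not approvals. The variable names are introduced here for illustration.

```python
from statistics import mean

# (Tdark year, year of first marketed drug), copied from slide 13
marketed = {
    "Leptin receptor": (1995, 2014),
    "Smoothened receptor": (1997, 2012),
    "Sphingosine 1-phosphate receptor 1": (1997, 2010),
    "Orexin receptors": (1997, 2014),
    "PCSK9": (1998, 2015),
}

# Years elapsed between "dark" status and a marketed drug, per target
lags = {name: approved - dark for name, (dark, approved) in marketed.items()}
average_lag = mean(lags.values())
```

The individual lags range from 13 to 19 years and average 16.2 years, which is consistent with the slide's estimate.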
  • 14. ▪ Initially to answer “how many drugs are out there”… ▪ Mapped products (what patients and docs call “drugs”) onto active ingredients (what scientists call “drugs”) ▪ Also wanted to know how many drug targets there are… 10/20/16 revision. Oleg Ursu et al., Nucl Acids Res, database issue, 2017, doi:10.1093/nar/gkw993
  • 15. 10/20/16 revision. R. Santos & O. Ursu et al., Nature Rev. Drug Discov, 2017, doi:10.1038/nrd.2016.230. Drugs distributed by ATC codes (levels 1-2). Concentric rings indicate ATC levels. Histograms represent the number of drugs distributed per year of first approval. Maximum scale: 100.
  • 16. 11/09/16 revision. Human drug targets distributed by ATC codes (levels 1-2). Concentric rings indicate ATC levels. Histograms represent the global income derived from drugs that act on these targets via Mode of Action annotations. Maximum scale: 287 billion USD. T. Oprea et al., Nature Rev. Drug Discov. poster, Jan 2017
  • 17. ▪ Text mining of all NIH grants (NIH RePORTER) for the 2000-2015 period suggests that 8858 proteins received zero funding. ▪ Of these, 6051 are Tdark and 2616 are Tbio (this is expected). ▪ However, 119 are Tchem and 72 are Tclin. Possible explanations: old drug targets or funded elsewhere. ▪ We are not able to track patterns of funded research in the EU or other institutions. However, most of these proteins are likely to be “in the dark” given overall PubMed/Patent data ▪ Pharma and Academia could pay more attention to these 8858 underfunded proteins. Note: NIH is the only Funding Agency that makes available all funded research data in bulk (for text mining). Other agencies (e.g., ESF, IMI, even ERC) do not provide bulk access to funded research abstracts. T. Oprea et al., Nature Rev. Drug Discov, 2017 (submitted). 5/26/17 revision
  • 18. Over 37% of the proteins remain poorly described (Tdark). ~10% of the Proteome (Tclin & Tchem) can be targeted by small molecules. We address at most ~15% of human diseases *) with therapeutic agents. *) disease-ontology.org catalogs ~9,000 disease concepts. This resource lacks ~6,000 rare diseases. We estimate ~15,000 disease concepts; ~2,500 have therapeutic agents. 11/29/17 revision
  • 19. ▪ IDG KMC2 seeks knowledge gaps across the five branches of the “knowledge tree”: ▪ Genotype; Phenotype; Interactions & Pathways; Structure & Function; and Expression, respectively. ▪ We can use biological systems network modeling to infer novel relationships based on available evidence, and infer new “function” and “role in disease” data based on other layers of evidence ▪ Primary focus on Tdark & Tbio. T. Oprea et al., IDG KMC Phase 2, anticipated start 2018 (if funded). 11/29/17 revision
  • 20. ▪ A meta-path encodes type-specific network topology between the source node (e.g., Protein target) and the destination node (e.g., Disease or Function) ▪ Target –– (member of) → PPI Network ← (member of) –– Protein –– (associated with) → Disease ▪ Target –– (expressed in) → Tissue ← (localized in) –– Disease. T. Oprea et al., IDG KMC Phase 2, anticipated start 2018 (if funded). 11/29/17 revision
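To make the two meta-paths on slide 20 concrete, here is a minimal sketch that counts how many path instances connect a target to a disease in a small typed graph. It is not the IDG pipeline: the function name `count_metapath`, the edge-type strings, and the toy nodes (`T1`, `P1`, `N1`, `D1`, `heart`) are all hypothetical and carry no biological meaning.

```python
from collections import defaultdict


def count_metapath(edges, start, end, steps):
    """Count instances of a meta-path from `start` to `end`.

    edges: iterable of (source, edge_type, destination) triples.
    steps: list of (edge_type, direction); "->" follows an edge forward,
           "<-" follows it backward, mirroring the arrows on the slide.
    """
    forward, backward = defaultdict(set), defaultdict(set)
    for src, etype, dst in edges:
        forward[(etype, src)].add(dst)
        backward[(etype, dst)].add(src)

    counts = {start: 1}  # number of partial paths ending at each node
    for etype, direction in steps:
        index = forward if direction == "->" else backward
        nxt = defaultdict(int)
        for node, n in counts.items():
            for neighbour in index.get((etype, node), ()):
                nxt[neighbour] += n
        counts = nxt
    return counts.get(end, 0)


# Toy graph, purely illustrative
toy_edges = [
    ("T1", "member_of", "N1"),
    ("P1", "member_of", "N1"),
    ("P2", "member_of", "N1"),
    ("P1", "associated_with", "D1"),
    ("P2", "associated_with", "D1"),
    ("T1", "expressed_in", "heart"),
    ("D1", "localized_in", "heart"),
]

# Target -(member of)-> PPI Network <-(member of)- Protein -(associated with)-> Disease
via_ppi = count_metapath(
    toy_edges, "T1", "D1",
    [("member_of", "->"), ("member_of", "<-"), ("associated_with", "->")],
)

# Target -(expressed in)-> Tissue <-(localized in)- Disease
via_tissue = count_metapath(
    toy_edges, "T1", "D1",
    [("expressed_in", "->"), ("localized_in", "<-")],
)
```

Counts like `via_ppi` and `via_tissue`, computed per target and per meta-path, are one plausible way to fill the columns of a feature matrix for a downstream classifier.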
  • 21. ▪ Build data matrix from protein knowledge graph along metapaths: ▪ Protein–Protein Interactions ▪ Pathways ▪ GO terms ▪ Gene expression ▪ ... ▪ For the specific disease/phenotype “Dilated cardiomyopathy” in OMIM/TCRD ▪ 35 genes linked to this disease in TCRD ▪ Remaining genes assumed not linked ▪ Random forest binary classifier. T. Oprea et al., IDG KMC Phase 2, anticipated start 2018 (if funded). 11/29/17 revision
  • 22. PP: protein–protein interaction with specified protein. VIMP higher for positive vs negative. Two are Tbio: LMNA – prelamin-A/C; DES – desmin. One is Tchem: PSEN2 – presenilin 2 (for Alzheimer’s). High VIMP overall: RBM20 (Tbio), MYBPC3 (Tbio), TNNT2 (Tbio). 11/29/17 revision. T. Oprea et al., IDG KMC Phase 2, anticipated start 2018 (if funded)
  • 23. ▪ Genes not annotated in OMIM with “Dilated cardiomyopathy” classified as positives: ▪ NKX2-5 [https://www.ncbi.nlm.nih.gov/pubmed/25503402] ▪ CACNA1C [https://www.ncbi.nlm.nih.gov/pubmed/22589547] ▪ HSPB1 (HSP27) [https://www.ncbi.nlm.nih.gov/pubmed/17873025] ▪ These “false positives” are, in fact, involved in regulating muscle contraction ▪ Given Presenilin 2 is a target for dilated cardiomyopathy, can we find a link between cardiovascular disease and Alzheimer’s? T. Oprea et al., IDG KMC Phase 2, anticipated start 2018 (if funded). 11/29/17 revision
  • 24. Top Dx Prior to Alzheimer’s (5 yrs or more): Essential hypertension; Hyperlipidemia; Type 2 Diabetes mellitus; Hypercholesterolemia; Coronary atherosclerosis; Atrial Fibrillation; … Persistent mental disorders; Memory Loss. http://rpubs.com/cbologa/alzheimers_disease C. Bologa, O. Ursu, J.J. Yang & T.I. Oprea (UNM), data from Cerner HealthFacts DB. 11/29/17 revision
  • 25. 10/20/16 revision. A.B. Jensen et al., Nature Communications 2014, 5:4022. C. Bologa, O. Ursu, J.J. Yang & T.I. Oprea (UNM), data from Cerner HealthFacts DB. Our Cerner HealthFacts chronic kidney disease (CKD) data support the Disease Trajectory work done on the population of Denmark (15 yrs)
  • 26. 11/6/17 revision. T. Oprea et al., IDG KMC Phase 2, anticipated start 2018 (if funded)
  • 27. PJ Kindermans, KR Müller et al. 2/10/16 revision. 1-class SVM model based on expression data. Tclin is “known” (everything else is “not known”). Goal: identify Tdark proteins that share similar expression patterns to Tclin
  • 28. • N is the number of tissues & x_i is the expression profile component, normalized by the maximal component value. Data from GTEx v7 • N = 53 • All non-selective genes have τ = 0.43; • Selective genes, τ = 1; • No data, τ = 0. TSI, τ, provides enrichment for disease & tissue specific genes for drug targets (GSK paper from V. Kumar et al., Sci Rep 2016, 6: 36205, PMC5099936). Except… TSI does not discriminate well Tclin among TDL classes… TSI formula from Yanai et al., Bioinformatics 2005, 21:650-659, bti042. 11/6/17 revision
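The formula image from slide 28 did not survive in the transcript. Assuming the slide used the standard form of the Yanai et al. tissue specificity index, τ = Σ_i (1 − x_i) / (N − 1) with x_i the expression in tissue i divided by the maximum over all N tissues, a minimal sketch looks like this. The function name `tissue_specificity_index` and the choice to return 0 for an all-zero profile (matching the slide's "No data, τ = 0") are assumptions made here.

```python
def tissue_specificity_index(profile):
    """Tissue specificity index (tau) for one gene.

    profile: non-negative expression values, one per tissue (N >= 2).
    Returns a value between 0 and 1; an all-zero profile is treated
    as "no data" and returns 0.0.
    """
    n = len(profile)
    if n < 2:
        raise ValueError("need at least two tissues")
    peak = max(profile)
    if peak <= 0:
        return 0.0  # no expression data
    # x_i = expression normalized by the maximal component value
    return sum(1.0 - value / peak for value in profile) / (n - 1)
```

A gene expressed in exactly one tissue gets τ = 1 under this definition, which agrees with the "selective genes" case on the slide. With GTEx v7 the profile would have N = 53 entries.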
  • 29. M.E. Dickinson et al., Nature 2016, 537:508-514. IMPC is currently composed of 18 research institutions and 5 national funders
  • 30. [Chart: status of IDG genes in IMPC and MGI. Legend: IMPC Unassigned; IMPC in progress; IMPC planned; IMPC model; MGI models; Unassigned; Vomeronasal and taste receptors; Aborted; Other MGI models. Segment values as extracted: 17, 28, 17, 1, 63, 24, 50, 79, 168] 95% of eligible IDG genes (339/356) have plans, attempts, or models. 384 genes were prioritized by IDG KMC (2014-2016). 306 genes are Tbio, 90 genes are Tdark, 42 genes are Tchem. 11/29/17 revision. Slide from Steve Murray, Jackson Lab
  • 31. 11/7/17 revision. Accurate Information is the Foundation of in silico Methods. The age of informatics-driven drug/target discovery is here
  • 32. Project conceived by ALEX ZHAVORONKOV, PHD Insilico Medicine, Inc Emerging Technology Centers Johns Hopkins University B301, 1101 33rd Street Baltimore, MD, 21218 • Drug Discovery • Drug Repurposing • Pathway Activation Analysis • Biomarker Development • Clin. Trials Predictors • Aging Research PHARMA.AI MEDICINAL CHEMISTRY REIMAGINED 11/29/17 revision
  • 35. ▪ The drug industry reward system is based on patents ▪ These are awarded for drugs, not targets ▪ Clinically validated targets lead to the me-too phenomenon ▪ Time to pool resources together on Target Selection (which includes target identification, target validation and all –omic research) ▪ Industry should lead a Target Selection Consortium: partner with academia and co-sponsor “double blind” target finding (avoid the reproducibility crisis) ▪ @John Baldoni / GSK: we’d like to team up with OpenTargets & ATOM
  • 36. 5/20/17 revision ▪ We somehow expect that in the near future, artificial intelligence (AI; machine learning; singularity) will be able to process all data related to biomolecules, their biologic endpoints, ex vivo, in vivo and clinical data, and discover drugs ▪ The problem with that: GIGO ▪ Garbage in, Garbage out. https://www.linkedin.com/pulse/why-ai-ready-drug-discovery-tudor-oprea
  • 37. 5/20/17 revision ▪ An epistemological question: Can AI discover new knowledge? To date, no credible evidence of this has been provided. Chatbots and winning at chess, Go and Jeopardy! do not count. ▪ We live in the world of alternative facts when it comes to research (not just politics). People lie. See work by JP Ioannidis, but also notes from Bayer (Asadullah et al) and Amgen (Begley et al). As long as AI gets false data, we cannot provide what’s needed. ▪ As of now, patents cannot name AI as inventors. Patent offices world-wide grant patents to people. What happens when AI does become the inventor? Will this happen?! ▪ Will it matter? https://www.linkedin.com/pulse/why-ai-ready-drug-discovery-tudor-oprea
  • 38. 5/20/17 revision ▪ The unrealized promise relates to our (in)ability to explain the two pillars of clinical drug effectiveness: Safety and Efficacy. ▪ In short, be it Drug Safety or Patient Safety, or indeed Clinical Efficacy, we remain unable to model these processes as a function of molecular structure. T.I. Oprea et al., Drug Discov. Today Therap. Strategies 2011, 8:61-69
  • 39.
  • 40. Presented at the Global Pharma R&D Informatics Congress. For more information, visit: www.global-engage.com