21 June, 2019
Big Data Mining and AI for Drug Repurposing
Pistoia Alliance Centre of Excellence for AI in Life Sciences and Elsevier
Datathon Report
Panelists: Aleksandar Poleksic, Professor, University of Northern Iowa
Bruce Aronow, Co-director of the Computational Medicine Center at Cincinnati Children’s Hospital Medical Center
Finlay Maclean, Elsevier, London UK
Jabe Wilson, Elsevier
Moderator: Vladimir Makarov
This webinar is being recorded
©PistoiaAlliance
Introduction to Today’s Speakers
Aleksandar Poleksic, Professor, University of Northern Iowa
Finlay Maclean, Elsevier, London UK
Bruce Aronow, Co-director, Computational Medicine Center at Cincinnati Children’s Hospital Medical Center
Jabe Wilson, Elsevier
Predictive Analytics for Drug Repurposing
• Collaboration across pharma, academic, and non-profit organizations
• Data from both Elsevier and 3rd-party sources
• Machine learning and other analytics methods used to predict drugs that could be repurposed for disease treatment
• Results validated by leading experts in the disease (chronic pancreatitis)
• Our partner Mission Cure is planning to take drugs to patient trials by January 2020
• “The datathon exceeded our expectations, producing 5 repurposing candidates to address multiple chronic pancreatitis targets” (Megan Golden, CEO, Mission Cure)
1. March - July 2019: Finish identifying the most promising candidates; identify which ones need additional preclinical work
2. July - December 2019: Fund and conduct preclinical work; plan pilots/trials for the safest, most promising candidates
3. July 24-26, 2019: PancreasFest meeting in Pittsburgh: coordinate preclinical work and plan clinical trials with PIs
4. January - June 2020: Conduct small open-label pilots with the safest, most promising candidates and informed patient volunteers
5. July 2020 - June 2022: Conduct repurposing clinical trials using efficient trial designs (e.g., aggregated n-of-1 trials); develop a master trial protocol
6. July 2022 - June 2024: Implement the master trial to test multiple promising therapeutic candidates alone and in combination
7. July 2024 - June 2027: Continue the master trial until therapies are identified
Thank you
Modeling a Disease: Identifying Attributes, Causes, Effects, Modifiers, and Treatments
[Figure: Elements of any disease. Disease-specific concepts: disease entity (disease names); phenotypes of disease (pathologic attributes and associations); disease causes/factors (genetics; infectious; immuno/allergic; environmental; drugs); pathway-network target associations (gene functions/annotations, gene interactions; regulators of genes, pathways, cells, tissues; phenotypes); effects and causal relationships (drug associations: adverse events, other indications, e.g., AERSMine, https://research.cchmc.org/aers/). Information sources: OMIM; HPO (Human Phenotype Ontology); ICD; UMLS; Wikipedia; ClinVar; ClinGen; MP; Human Cell Atlas (cell type transcriptome, tissue cell map; http://toppcell.cchmc.org).]
ToppCell database: using single-cell gene expression data to understand gene networks responsible for organ health and disease
[Figure: ToppCell workflow (Eric Bardes). Single-cell dataset(s) → processing (normalization; clustering; differential analysis; …) → learned or user-defined cell annotation → user-defined, biological pathway-based, and cell type-specific gene lists → machine learning-based analysis → interactive heatmaps for re-analysis, searching, clustering, grouping, and enrichment]
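The slide lists normalization, clustering, and differential analysis but does not describe ToppCell's actual methods. As a minimal sketch, assuming a cells × genes count matrix and precomputed cluster labels, the snippet below applies library-size normalization with log1p and then a crude "mean in cluster minus mean elsewhere" score to rank marker genes. The function names are hypothetical, and real pipelines use proper statistical tests rather than this difference of means.

```python
import numpy as np

def log_normalize(counts, scale=1e4):
    """Library-size normalize a cells x genes count matrix, then apply log1p."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # leave empty cells at zero instead of dividing by zero
    return np.log1p(counts / totals * scale)

def rank_markers(expr, labels, top_k=2):
    """For each cluster, rank genes by mean expression inside the cluster
    minus mean expression in all other cells (a crude differential score)."""
    labels = np.asarray(labels)
    markers = {}
    for lab in np.unique(labels):
        inside = expr[labels == lab].mean(axis=0)
        outside = expr[labels != lab].mean(axis=0)
        score = inside - outside
        markers[str(lab)] = [int(g) for g in np.argsort(score)[::-1][:top_k]]
    return markers

# Toy data: 4 cells x 3 genes; gene 0 is high in "acinar", gene 2 in "beta".
counts = [[90, 5, 5], [80, 10, 10], [5, 5, 90], [10, 5, 85]]
labels = ["acinar", "acinar", "beta", "beta"]
result = rank_markers(log_normalize(counts), labels, top_k=1)
print(result)
```

With real data, the cluster labels would come from the clustering step rather than being supplied by hand.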
ToppCell: Leveraging the Human Cell Atlas
Data mining by organ/cell type: search/cluster/enrich/net
Derive models for:
• Differentiation
• Organogenesis
• Pathways/networks
• Cell-cell interactions
• Physiology
• Pathology
Pancreas tissue → individual single cells
[Figure: marker genes in pancreatic cell types, grouped as exocrine and endocrine; columns: acinar, ductal, St, alpha, delta, PP, beta]
Portal Views for Data Mining/Systems Biology-Driven Analyses
(1) Find/select cell clusters, gene modules, and anatomical contexts
(2) Carry out enrichment analyses and machine learning prioritization of genes, pathways, and interactions
(3) Assemble, save, share, and export integrated systems biology network models
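The portal runs enrichment analyses, but the slide does not name the statistic. A common choice for gene-set enrichment is a one-sided hypergeometric test: given a gene module, how surprising is its overlap with an annotation such as a pathway? The sketch below uses only the standard library. Whether ToppCell uses exactly this test, and which multiple-testing correction it applies, are assumptions here.

```python
from math import comb

def hypergeom_enrichment_p(k, n_module, k_annotated, n_universe):
    """One-sided P(X >= k) for X ~ Hypergeometric(n_universe, k_annotated, n_module).

    k           -- annotated genes observed in the module
    n_module    -- size of the gene module
    k_annotated -- genes in the universe carrying the annotation
    n_universe  -- total genes considered
    """
    total = comb(n_universe, n_module)
    upper = min(n_module, k_annotated)
    # Sum the hypergeometric probabilities of every overlap at least as large as k.
    tail = sum(
        comb(k_annotated, i) * comb(n_universe - k_annotated, n_module - i)
        for i in range(k, upper + 1)
    )
    return tail / total

# Illustrative numbers only: 3 of 3 module genes fall in a 3-gene annotation out of 10 genes.
print(hypergeom_enrichment_p(3, 3, 3, 10))  # 1/120
```

Exact integer arithmetic via `math.comb` avoids underflow for moderate sizes. For genome-scale universes, a library implementation of the hypergeometric survival function is the more practical choice.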
Tissue/Sample-Associated Cell Population Gene Modules: cell type-centered signatures allow analysis of cell class and subclass similarities and differences.
Use Case: compare/combine alveolar epithelial cell subtype gene signatures per region, per stage, and per cell type/subtype
[Figure: Systems Biology via the LungMAP Portal; axis: single cells (1,004 shown)]
Note the profound differences in functional associations between the AT1 and AT2 subtype signatures. Yet it is precisely through the combination of their specialized biological functions that alveolar structure and physiology achieve highly efficient air-blood gas exchange. This illustrates the value of giving users subtype- and stage-specific gene modules for multi-module and multimodal (multi-technology) biological network analyses.
Single Cell Atlas(es): per protocol, source, cell types, subtypes, and developmental stages (example: mouse Fluidigm LungMAP, all distal, all stages, by cell type)
[Figure: gene module heatmap by anatomic region, cell type, subtype, and developmental stage; axis: 16,400 genes (redundancy OK)]
User selects cell types/gene modules for biological network analyses:
AT1: cell junctions; cell projections; cytoskeleton; angiogenesis; vascular morphogen
AT2: surfactant biology; lipid biosynth; vesicles; lamellar body; secretion
https://research.cchmc.org/aers/
Drugs with high risk / elevated safety signal for pancreatitis
Drugs Associated with (Unexpectedly) HIGH risk of Pancreatitis
Drugs Associated with (Unexpectedly) LOW risk of Pancreatitis
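The slides group drugs by elevated, or unexpectedly low, pancreatitis safety signal but do not say how AERSMine computes that signal. One widely used disproportionality measure for adverse event report databases is the reporting odds ratio (ROR). In this framework, a confidence interval lying entirely above 1 suggests elevated reporting, and one lying entirely below 1 suggests lower-than-expected reporting. The sketch below assumes ROR with a Wald interval. That choice is an assumption for illustration, not AERSMine's documented method.

```python
from math import exp, log, sqrt

def reporting_odds_ratio(a, b, c, d, z=1.96):
    """Reporting odds ratio from a 2x2 table of spontaneous reports.

    a -- reports with the drug and the event (e.g., pancreatitis)
    b -- reports with the drug and other events
    c -- reports with other drugs and the event
    d -- reports with other drugs and other events
    Returns (ror, lower, upper) using a Wald interval on log(ROR).
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive (consider a continuity correction)")
    ror = (a * d) / (b * c)
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(ROR)
    return ror, exp(log(ror) - z * se), exp(log(ror) + z * se)

# Illustrative counts only, not real AERS data.
ror, lo, hi = reporting_odds_ratio(10, 90, 100, 9800)
print(round(ror, 2), lo > 1)
```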
Using Heterogeneous Ensemble Classifiers for Drug-Target Interaction Prediction
Finlay MacLean
The Problem
- Search space is huge
  - Chemogenomic
  - Pharmacologic
- Known information is sparse and heavily biased
  - Only positive measurements
- Possible data sources are huge
  - Multi-domain, multi-level information
Yella J, Yaddanapudi S, Wang Y, Jegga A. Changing trends in computational drug repositioning. Pharmaceuticals. 2018 Jun;11(2):57.
The Data
- 765 disease-associated targets
- 119,401 positive interactions
- 203 targets with known bioactivities
- 44,161 unique substances
- 2,766 possible repurposable drugs
- 15 main genetic drivers
[Figures: cumulative bioactivities for disease-associated targets (axis: number of targets, cumulative); binding affinity for disease-implicated substances]
Kernels and Similarity Metrics
Substances
- Morgan fingerprints (radius 3) to encode substructures
- Tanimoto distance to determine substructure similarity
Targets
- Smith-Waterman local alignment
Harish Kandan, Understanding the kernel trick. https://towardsdatascience.com/understanding-the-kernel-trick-e0bc6112ef78
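The two similarity measures above can be sketched in plain Python. Fingerprints are assumed to be given as sets of "on" bit indices; in practice, a cheminformatics toolkit such as RDKit would generate the radius-3 Morgan bits. The Smith-Waterman function uses a simple match/mismatch score and a linear gap penalty as stand-ins. Protein alignment normally uses a substitution matrix (e.g., BLOSUM62) and affine gaps, and the scoring parameters here are assumptions. Normalizing the alignment score by the self-alignment scores is one common way to turn it into a kernel entry, although the resulting matrix is not guaranteed to be positive semi-definite.

```python
from math import sqrt

def tanimoto(bits_a, bits_b):
    """Tanimoto similarity between two fingerprints given as sets of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)  # Tanimoto distance would be 1 minus this

def smith_waterman(s, t, match=2, mismatch=-1, gap=-2):
    """Best local alignment score (linear gap penalty, no traceback)."""
    prev = [0] * (len(t) + 1)
    best = 0
    for i in range(1, len(s) + 1):
        cur = [0] * (len(t) + 1)
        for j in range(1, len(t) + 1):
            diag = prev[j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)  # 0 restarts the local alignment
            best = max(best, cur[j])
        prev = cur
    return best

def sw_similarity(s, t):
    """Alignment score normalized by the two self-alignment scores."""
    denom = sqrt(smith_waterman(s, s) * smith_waterman(t, t))
    return smith_waterman(s, t) / denom if denom else 0.0

print(tanimoto({1, 2, 3}, {2, 3, 4}))   # 0.5
print(smith_waterman("ACGT", "ACGT"))   # 8
```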
Kernel Explosion
- Apply the Kronecker product to the drug and target kernels
- Train a support vector machine on the Kronecker kernel
- Training kernel:
  203 targets with known bioactivities
  44,161 bioactive substances
  41,209 (203²) target-pair entries x 1,950,193,921 (44,161²) substance-pair entries = 80 trillion entries!
  Hundreds of terabytes (about 640 TB at 8 bytes per entry)!
- I wish I had a cluster that big!
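The size of the full Kronecker training kernel follows directly from the counts above. The memory figure depends on the numeric precision, and the sketch assumes 8-byte float64 entries. The last lines confirm on a toy example that `np.kron` of an a × a kernel and a b × b kernel yields an (a·b) × (a·b) pairwise kernel.

```python
import numpy as np

def kronecker_kernel_size(n_targets, n_substances, bytes_per_entry=8):
    """Entries and bytes of the full pairwise (Kronecker) training kernel."""
    target_entries = n_targets ** 2          # target x target kernel
    substance_entries = n_substances ** 2    # substance x substance kernel
    entries = target_entries * substance_entries
    return target_entries, substance_entries, entries, entries * bytes_per_entry

t, s, n, b = kronecker_kernel_size(203, 44161)
print(f"{t:,} x {s:,} = {n:,} entries, about {b / 1e12:,.0f} TB as float64")
# prints: 41,209 x 1,950,193,921 = 80,365,541,290,489 entries, about 643 TB as float64

# Toy check: the Kronecker product of a 2 x 2 and a 3 x 3 kernel is 6 x 6.
K_t = np.array([[1.0, 0.5], [0.5, 1.0]])
K_d = np.eye(3)
print(np.kron(K_t, K_d).shape)  # (6, 6)
```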
Ensemble Learning
- Train multiple models!
  1. Each takes a subset of the data
  2. Each self-evaluates
  3. Evaluate a meta-learner
  4. Feed in genetic drivers of CP (chronic pancreatitis)
  5. Predict on repurposable drugs
  6. Take a weighted average of the results
- Optimization limit reached at around 0.94 AUROC (for kernels of 30 substances and 30 targets)
- Largest kernels still only around 1,000
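Steps 2 and 6 (self-evaluation and weighted averaging) can be sketched as follows. The sketch assumes that each model's weight is its own validation AUROC. The slide does not say how the weights were chosen, and the meta-learner in step 3 is not shown. AUROC is computed here through its equivalence to the Mann-Whitney U statistic, with ties counted as half.

```python
import numpy as np

def auroc(y_true, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true], scores[~y_true]
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("need both positive and negative labels")
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

def weighted_ensemble(predictions, weights):
    """Weighted average of per-model score vectors (rows = models, columns = items)."""
    predictions = np.asarray(predictions, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return weights @ predictions / weights.sum()

# Hypothetical usage: weight each model by its own validation AUROC.
val_labels = [1, 0, 1, 0]
val_scores = [[0.9, 0.1, 0.8, 0.2], [0.6, 0.4, 0.3, 0.7]]
weights = [auroc(val_labels, s) for s in val_scores]
test_scores = [[0.7, 0.2], [0.5, 0.5]]
print(weights, weighted_ensemble(test_scores, weights))
```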
Kronecker-RLS
Pahikkala T, Airola A. RLScore: regularized least-squares learners. The Journal of Machine Learning Research. 2016 Jan 1;17(1):7803-7.
Nascimento AC, Prudêncio RB, Costa IG. A multiple kernel learning algorithm for drug-target interaction prediction. BMC Bioinformatics. 2016 Dec;17(1):46.
Kashima H, Oyama S, Yamanishi Y, Tsuda K. On pairwise kernels: an efficient alternative and generalization analysis. Adv Data Min Knowl Disc. 2009;5476:1030–7.
- Take advantage of inherent symmetry
- Eigendecompose the similarity kernels
- Take advantage of the kernel ‘trick’
- Employ regularised least squares
- Feed into the ensemble!
- Homogeneous bagging ensemble performed best
Final ensemble: 30 models, each:
- Trained and optimized on 500 substances and the 200 most bioactive targets
- Evaluated (model-level)
- Evaluated (ensemble-level)
- Predict!
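The eigendecomposition approach above is what makes Kronecker RLS tractable. The training system (K_t ⊗ K_d + σI) vec(A) = vec(Y) can be solved without ever forming the Kronecker kernel, using only the eigendecompositions of the two small kernels. The NumPy sketch below is a from-scratch illustration of that closed form, not RLScore's API. It assumes symmetric positive semi-definite kernels and a single regularization parameter `sigma`.

```python
import numpy as np

def kron_rls_fit(K_d, K_t, Y, sigma=1.0):
    """Closed-form Kronecker RLS dual coefficients A (drugs x targets).

    Solves (K_t kron K_d + sigma*I) vec(A) = vec(Y), with column-major vec,
    via eigendecompositions of the drug kernel K_d and the target kernel K_t.
    """
    lam_d, U_d = np.linalg.eigh(K_d)
    lam_t, U_t = np.linalg.eigh(K_t)
    Z = U_d.T @ Y @ U_t                  # rotate labels into the joint eigenbasis
    Z = Z / (np.outer(lam_d, lam_t) + sigma)  # invert the diagonal (Lambda + sigma*I)
    return U_d @ Z @ U_t.T               # rotate back

def kron_rls_predict(K_d_new, K_t_new, A):
    """Scores for (new drug, new target) pairs from kernels against the training items."""
    return K_d_new @ A @ K_t_new.T

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))
K_d = X @ X.T                            # toy PSD drug kernel (4 drugs)
T = rng.normal(size=(3, 2))
K_t = T @ T.T                            # toy PSD target kernel (3 targets)
Y = rng.integers(0, 2, size=(4, 3)).astype(float)
A = kron_rls_fit(K_d, K_t, Y, sigma=0.5)
train_scores = kron_rls_predict(K_d, K_t, A)
```

Each eigendecomposition costs O(n³) only in the size of its own kernel. This avoids the product-sized system that made the SVM approach infeasible.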
Improvements
- Sparse data: CGKronRLS (semi-supervised learning)
- Other pairwise relationships can be used: KronRLS-MKL (multiple kernel learning)
- Use of Gaussian interaction profiles
- Sequential model execution and storage
- Boosting instead of bagging (sample-level optimization)
- Making NumPy/BLAS work on distributed GPUs
- Employ a meta-learning rather than a voting classifier
Pahikkala T. Fast gradient computation for learning with tensor product kernels and sparse training labels. Structural, Syntactic, and Statistical Pattern Recognition (S+SSPR). Lecture Notes in Computer Science, vol. 8621, pp. 123–132. 2014.
Nascimento AC, Prudêncio RB, Costa IG. A multiple kernel learning algorithm for drug-target interaction prediction. BMC Bioinformatics. 2016 Dec;17(1):46.
Pahikkala T, Airola A. RLScore: regularized least-squares learners. The Journal of Machine Learning Research. 2016 Jan 1;17(1):7803-7.
Kashima H, Oyama S, Yamanishi Y, Tsuda K. On pairwise kernels: an efficient alternative and generalization analysis. Adv Data Min Knowl Disc. 2009;5476:1030–7.
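"Gaussian interaction profiles" refers to a kernel built from the known interaction profiles themselves. It is commonly defined (van Laarhoven et al., Bioinformatics 2011) as K(i, j) = exp(-γ‖y_i − y_j‖²), with the bandwidth γ normalized by the mean squared profile norm. A minimal sketch follows, assuming γ' = 1 as the default. Passing the transposed interaction matrix gives the target-side kernel.

```python
import numpy as np

def gip_kernel(Y, gamma_prime=1.0):
    """Gaussian interaction profile kernel over the rows of an interaction matrix Y.

    Rows are drugs for a drug kernel; pass Y.T for a target kernel.
    """
    Y = np.asarray(Y, dtype=float)
    sq_norms = (Y ** 2).sum(axis=1)
    mean_sq = sq_norms.mean()
    gamma = gamma_prime / mean_sq if mean_sq > 0 else 0.0  # bandwidth normalization
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (Y @ Y.T)
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))  # clip tiny negative round-off

Y = np.array([[1, 0, 1], [1, 0, 1], [0, 1, 0]])
K_drug = gip_kernel(Y)        # 3 x 3, rows 0 and 1 identical -> similarity 1.0
K_target = gip_kernel(Y.T)    # 3 x 3 target-side kernel
```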
Using “compressed sensing” to support drug repurposing for chronic pancreatitis
Prof. Aleksandar Poleksić
Department of Computer Science
University of Northern Iowa
Compressed sensing for ADR prediction
• Idea: factor the $m \times n$ matrix $R$ into the product of two lower-dimensional matrices: $R = FG'$
Logistic matrix factorization
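Loss function:
$$\sum_{i,j} w_{i,j}\left\{\ln\left(1 + e^{f_i g_j'}\right) - \left(r_{i,j} + q_{i,j}\right) f_i g_j'\right\} + \lambda_F \|F\|_F^2 + \lambda_G \|G\|_F^2 + \frac{\lambda_M}{2} \sum_{i,j} m_{i,j} \left\|F(i,:) - F(j,:)\right\|_2^2 + \frac{\lambda_N}{2} \sum_{i,j} n_{i,j} \left\|G(i,:) - G(j,:)\right\|_2^2$$
where $f_i = F(i,:)$ and $g_j = G(j,:)$.
M, N – similarity matrices
Q – impute matrix
W – weight matrix
lambdas – tunable parameters
P – output probabilities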
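Optimization
Loss function:
$$\sum_{i,j} w_{i,j}\left\{\ln\left(1 + e^{f_i g_j'}\right) - \left(r_{i,j} + q_{i,j}\right) f_i g_j'\right\} + \lambda_F \|F\|_F^2 + \lambda_G \|G\|_F^2 + \lambda_M \operatorname{tr}\left(F'(D_M - M)F\right) + \lambda_N \operatorname{tr}\left(G'(D_N - N)G\right)$$
Partial derivatives:
$$\frac{\partial}{\partial F} = \left[W \odot \left(P - (R + Q)\right)\right] G + 2\lambda_F F + 2\lambda_M (D_M - M) F$$
$$\frac{\partial}{\partial G} = \left[W' \odot \left(P' - (R' + Q')\right)\right] F + 2\lambda_G G + 2\lambda_N (D_N - N) G$$
where $D_M$ and $D_N$ are the diagonal matrices of the row sums of the symmetric matrices $M$ and $N$, and $P = \sigma(FG')$ elementwise.
Minimization algorithm: gradient descent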
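A minimal NumPy sketch of the weighted logistic matrix factorization above follows. It implements the trace-form loss and the two partial derivatives term by term and minimizes them by plain gradient descent. This is an illustration, not the author's code. The rank, step size, λ values, iteration count, and random initialization are all assumptions, and M and N must be symmetric for the 2λ(D − M) gradient terms to hold.

```python
import numpy as np

def lmf_loss_and_grads(F, G, R, Q, W, M, N, lam_F, lam_G, lam_M, lam_N):
    """Loss and partial derivatives of the weighted logistic matrix factorization.
    M (rows x rows) and N (cols x cols) must be symmetric similarity matrices."""
    X = F @ G.T                                   # x_ij = f_i g_j'
    P = np.exp(-np.logaddexp(0.0, -X))            # P = sigmoid(X), computed stably
    L_M = np.diag(M.sum(axis=1)) - M              # D_M - M
    L_N = np.diag(N.sum(axis=1)) - N              # D_N - N
    loss = (np.sum(W * (np.logaddexp(0.0, X) - (R + Q) * X))  # ln(1 + e^x), stably
            + lam_F * np.sum(F ** 2) + lam_G * np.sum(G ** 2)
            + lam_M * np.trace(F.T @ L_M @ F)
            + lam_N * np.trace(G.T @ L_N @ G))
    E = W * (P - (R + Q))                         # W ⊙ (P - (R + Q))
    dF = E @ G + 2 * lam_F * F + 2 * lam_M * (L_M @ F)
    dG = E.T @ F + 2 * lam_G * G + 2 * lam_N * (L_N @ G)
    return loss, dF, dG

def fit_lmf(R, Q, W, M, N, rank=2, lr=0.01, steps=200,
            lams=(0.1, 0.1, 0.1, 0.1), seed=0):
    """Plain gradient descent from a small random initialization.
    lams = (lam_F, lam_G, lam_M, lam_N)."""
    rng = np.random.default_rng(seed)
    F = 0.1 * rng.standard_normal((R.shape[0], rank))
    G = 0.1 * rng.standard_normal((R.shape[1], rank))
    history = []
    for _ in range(steps):
        loss, dF, dG = lmf_loss_and_grads(F, G, R, Q, W, M, N, *lams)
        history.append(loss)
        F = F - lr * dF
        G = G - lr * dG
    P = np.exp(-np.logaddexp(0.0, -(F @ G.T)))    # predicted probabilities
    return F, G, P, history
```

For ADR prediction, R would hold known drug-ADR associations, and the resulting P gives the probabilities for unobserved pairs.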
SIDER benchmark
Q: Is a new chemical likely to cause hepatotoxicity?
Q: Is a new chemical likely to cause a serious rare side effect?
ADR prediction for candidate CP drugs
LACOSAMIDE ADRs (SMILES: CC(=O)NC(COC)C(=O)NCC1=CC=CC=C1)
ADR_Name (CUI)             Prob
Nausea (C0027497)          0.99
Vomiting (C0042963)        0.97
Asthenia (C0015672)        0.916
Dizziness (C0012833)       0.912
Headache (C0018681)        0.908
Dry_mouth (C0043352)       0.881
Diarrhea (C0011991)        0.874
Dermatitis (C0015230)      0.818
Constipation (C0009806)    0.789
Somnolence (C2830004)      0.738
Tremor (C0040822)          0.733
[Figure: Lacosamide ADR profile]
http://gpubox.cs.uni.edu
Candidate CP drugs – network prediction
[Figure: network schema linking Compound and Disease nodes via “treats” and “resembles” edges]
Drug Prob Z-score
Hyoscyamine 0.35 6.61
Irinotecan 0.26 4.94
Varenicline 0.20 3.74
Octreotide 0.17 3.26
Propantheline 0.17 3.19
Citalopram 0.17 3.18
Acamprosate 0.14 2.61
Disulfiram 0.13 2.44
Epirubicin 0.12 2.31
Tamoxifen 0.11 2.08
Doxorubicin 0.11 2.05
Naltrexone 0.11 2.03
Paclitaxel 0.10 1.90
Erlotinib 0.10 1.85
Topotecan 0.10 1.83
Sorafenib 0.09 1.64
Proguanil 0.08 1.42
Metformin 0.07 1.30
Telbivudine 0.06 1.13
Orlistat 0.06 1.06
Apply compressed sensing on the drug-disease network*:
• 1552 compounds
• 137 diseases
• 755 known treatments
* Himmelstein, D.S. & Baranzini, S.E. PLoS Comput Biol 11, e1004259 (2015).
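The slide does not define its Z-score column. The table's Prob-to-Z relationship is roughly linear, which is consistent with each compound's predicted "treats" probability being standardized against the mean and standard deviation over all scored compounds. Under that assumption, the ranking could be computed as below. The use of the population standard deviation (ddof=0) is also an assumption, and the example values are toy numbers.

```python
import numpy as np

def rank_by_zscore(names, probs, top_k=3):
    """Standardize predicted probabilities across all compounds and return the top_k
    as (name, probability, z-score) tuples, highest z first."""
    probs = np.asarray(probs, dtype=float)
    z = (probs - probs.mean()) / probs.std()   # population standard deviation
    order = np.argsort(z)[::-1][:top_k]
    return [(names[i], float(probs[i]), float(z[i])) for i in order]

# Toy values only; the real ranking would standardize over all 1552 compounds.
ranked = rank_by_zscore(["a", "b", "c", "d"], [0.1, 0.1, 0.1, 0.5], top_k=2)
print(ranked[0])
```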
Lacosamide – network prediction
[Figure: network schema linking Gene, Compound, and Disease nodes via “treats”, “palliates”, and “resembles” edges]
Lacosamide-binds-gene-associates-pancreatitis:
CFTR (0.0806)
ALB (0.0522)
PTGS2 (0.0224)
MPO (0.0192)
CYP1A1 (0.0151)
ACE (0.0088)
ABCB1 (0.0086)
FDX1 (0.0065)
CXCL8 (0.0063)
TNF (0.0039)
AHR (0.0035)
ADRB2 (0.0028)
CA8 (0.0028)
SLC12A1 (0.0027)
BCHE (0.0017)
ADRB1 (0.0015)
Collaborators:
Prof. Lei Xie, CUNY Graduate Center
References:
1. Poleksic, A., & Xie, L. (2018). Predicting serious rare adverse reactions of novel chemicals. Bioinformatics, 34(16), 2835-2842.
2. Lim, H., Gray, P., Xie, L., & Poleksic, A. (2016). Improved genome-scale multi-target virtual screening via a novel collaborative filtering approach to cold-start problem. Scientific Reports, 6, 38860.
3. Poleksic, A., & Xie, L. (2019). Database of Adverse Events Associated with Drugs and Drug Combinations, in review.
Poll Question:
In what other medical area should we run the next pre-competitive research exercise?
A. Oncology
B. Heart Disease
C. Diabetes
D. Obesity
E. Some other unmet need (send
Audience Q&A
Please use the Question function in GoToWebinar
Upcoming Webinars
1. Date TBD – July 2019: User Experience (UX) Design for AI
2. Date TBD: Virtual Roundtable: Innovative Pathways through the FDA & EMEA (with the Westchester Biotech Project)
3. Planning: Ethics and AI
Please suggest other topics
info@pistoiaalliance.org @pistoiaalliance www.pistoiaalliance.org
Thank You

Editor's Notes

  • #12 - There is no absolute line between disease modelling and target identification; the next method illustrates this. - This tool was developed by Dr Bruce Aronow and his research group. - The Cell Atlas is an incredible project; this tool builds on it to gain a greater understanding of disease mechanisms. - TODO: make labels bigger (Y axis: cell type gene modules)