Pistoia Alliance-Elsevier Datathon

In the late fall and winter of 2018, the Pistoia Alliance, in cooperation with Elsevier and the charitable organizations Cures Within Reach and Mission: Cure, ran a datathon to find drugs suitable for treating childhood chronic pancreatitis, a rare disease that causes extreme suffering. The datathon identified four candidate compounds in just under three months. In this webinar our speakers discuss the technologies that made this leap possible.

Published in: Health & Medicine

  1. 1. 21 June 2019. Big Data Mining and AI for Drug Repurposing: Pistoia Alliance Centre of Excellence for AI in Life Sciences and Elsevier Datathon Report. Panelists: Aleksandar Poleksic, Professor, University of Northern Iowa; Bruce Aronow, Co-director of the Computational Medicine Center at Cincinnati Children’s Hospital Medical Center; Finlay Maclean, Elsevier, London, UK; Jabe Wilson, Elsevier. Moderator: Vladimir Makarov
  2. 2. This webinar is being recorded
  3. 3. ©PistoiaAlliance Introduction to Today’s Speakers: Aleksandar Poleksic, Professor, University of Northern Iowa; Finlay Maclean, Elsevier, London, UK; Bruce Aronow, Co-director, Computational Medicine Center at Cincinnati Children’s Hospital Medical Center; Jabe Wilson, Elsevier
  4. 4. Predictive Analytics for Drug Repurposing
  5. 5. Predictive Analytics for Drug Repurposing
      • Collaboration across pharma, academic and non-profit organizations
      • Data from both Elsevier and third-party sources
      • Machine learning and other analytics methods used to predict drugs to be repurposed for disease treatment
      • Results validated by leading experts in the disease (chronic pancreatitis)
      • Our partner Mission: Cure is planning to take drugs to patient trials by January 2020
      • “The datathon exceeded our expectations, producing 5 repurposing candidates to address multiple chronic pancreatitis targets” (Megan Golden, CEO, Mission: Cure)
  6. 6. Predictive Analytics for Drug Repurposing
      1. March - July 2019: Finish identifying the most promising candidates; identify which ones need additional preclinical work
      2. July - December 2019: Fund and conduct preclinical work; plan pilots/trials for safest, most promising candidates
      3. July 24-26, 2019: PancreasFest meeting in Pittsburgh: coordinate preclinical work and plan clinical trials with PIs
      4. January - June 2020: Conduct small open-label pilots with safest, most promising candidates and informed patient volunteers
      5. July 2020 - June 2022: Conduct repurposing clinical trials using efficient trial designs (e.g. aggregated n-of-1 trials); develop master trial protocol
      6. July 2022 - June 2024: Implement master trial to test multiple promising therapeutic candidates alone and in combination
      7. July 2024 - June 2027: Continue master trial until therapies identified
  7. 7. Thank you
  8. 8. Modeling a Disease: Identifying Attributes, Causes, Effects, Modifiers, and Treatments
      Elements of any disease (diagram of disease-specific concepts):
      • Disease entity: disease names
      • Disease causes/factors: genetics; infectious; immuno/allergic; environmental; drugs
      • Pathway-network target associations: gene functions/annotations; gene interactions; regulators of genes, pathways, cells, tissues; phenotypes
      • Other diagram labels: phenotypes of disease; pathologic attributes and associations; effects and causal relationships
      Information sources: OMIM; HPO (Human Phenotype Ontology); ICD; UMLS; Wikipedia; ClinVar; ClinGen; MP; Human Cell Atlas (cell type transcriptome, tissue cell map; http://toppcell.cchmc.org); drug associations (adverse events, other indications), e.g. AERSMine, https://research.cchmc.org/aers/
  9. 9. Disease Elements
  10. 10. ToppCell database: using single-cell gene expression data to understand gene networks responsible for organ health and disease (Eric Bardes)
      Workflow diagram labels: single-cell dataset(s); processing (normalization; clustering; differential analysis; …); learned cell annotation; user-defined cell annotation; cell type-specific gene list; user-defined gene list; biological pathway-based gene list; machine learning-based analysis; interactive heatmaps; re-analysis; searching; clustering; grouping; enrichment
  11. 11. ToppCell: Leveraging the Human Cell Atlas
      Data mining by organ/cell type: Search/Cluster/Enrich/Net
      Derive models for:
      • Differentiation
      • Organogenesis
      • Pathways/networks
      • Cell-cell interactions
      • Physiology
      • Pathology
      Pancreas tissue → individual single cells
  12. 12. Figure: marker genes in pancreatic cell types. Panel labels: exocrine, endocrine; acinar, ductal, St, alpha, delta, PP, beta
  13. 13. Portal Views for Data Mining/Systems Biology-Driven Analyses: Systems Biology via the LungMAP Portal
      (1) Find/select cell clusters/gene modules and anatomical contexts
      (2) Carry out enrichment analyses and machine learning prioritization of genes, pathways, interactions
      (3) Assemble/save/share/export integrated systems biological network models
      Single Cell Atlas(es): per protocol, source, cell types, subtypes, and developmental stages (example: mouse Fluidigm-LungMAP, all distal, all stages, by cell type). Anatomic regions; cell types; subtypes; developmental stages.
      Tissue/sample-associated cell population gene modules: cell type-centered signatures allow for the analysis of cell class and subclass similarities and differences. Use case: compare/combine alveolar epithelial cell subtypes. The user selects cell types/gene modules for biological network analyses.
      Heatmap (figure): gene signatures per region, per stage, per cell type/subtype; single cells (1,004 shown) by 16,400 genes (redundancy OK).
      AT1: cell junctions; cell projections; cytoskeleton; angiogenesis; vascular morphogen
      AT2: surfactant biology; lipid biosynth; vesicles; lamellar body; secretion
      Note the profound functional association differences between AT1 and AT2 subtype signatures. However, it is precisely through the combinations of their specialized biological functions that alveolar structure and physiological function achieve highly efficient air-blood gas exchange. This illuminates the utility of providing users with subtype- and stage-specific gene modules for multimodule and multimodal/technology-based biological network analyses.
  14. 14. https://research.cchmc.org/aers/
  15. 15. https://research.cchmc.org/aers/ Drugs with high risk / elevated safety signal for pancreatitis
  16. 16. Drugs Associated with (Unexpectedly) HIGH risk of Pancreatitis
  17. 17. Drugs Associated with (Unexpectedly) LOW risk of Pancreatitis
  18. 18. Using Heterogeneous Ensemble Classifiers for Drug Target Interaction Prediction (Finlay MacLean)
  19. 19. The Problem
      - Search space is huge
      - Chemogenomic
      - Pharmacologic
      - Known information is sparse and heavily biased
      - Only positive measurements
      - Possible data sources huge
      - Multidomain multilevel information
      Yella J, Yaddanapudi S, Wang Y, Jegga A. Changing trends in computational drug repositioning. Pharmaceuticals. 2018 Jun;11(2):57.
  20. 20. The Data
      - 765 disease-associated targets
      - 119401 positive interactions
      - 203 targets with known bioactivities
      - 44161 unique substances
      - 2766 possible repurposable drugs
      - 15 main genetic drivers
      Figures: cumulative bioactivities for disease-associated targets (axis: no. of targets, cumulative); binding affinity for disease-implicated substances
  21. 21. Kernels and Similarity Metrics
      Substances
      - Morgan fingerprint (radius 3) to encode substructures
      - Tanimoto distance to determine substructure similarity
      Targets
      - Local Smith-Waterman alignment
      Harish Kandan, Understanding the kernel trick. https://towardsdatascience.com/understanding-the-kernel-trick-e0bc6112ef78
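The two similarity measures named on this slide can be sketched in a few lines. This is a minimal illustration, not the speakers' implementation. Fingerprints are represented here as plain Python sets of on-bit indices (a real pipeline would generate Morgan fingerprints with a cheminformatics toolkit), the Smith-Waterman scoring values (`match`, `mismatch`, `gap`) are illustrative assumptions, and normalizing the alignment score by the self-alignment scores is a common convention assumed here rather than taken from the slides.

```python
from math import sqrt


def tanimoto_similarity(bits_a, bits_b):
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    if not bits_a and not bits_b:
        return 1.0  # assumption: two empty fingerprints count as identical
    shared = len(bits_a & bits_b)
    return shared / (len(bits_a) + len(bits_b) - shared)


def tanimoto_distance(bits_a, bits_b):
    """Distance form used when a dissimilarity is needed."""
    return 1.0 - tanimoto_similarity(bits_a, bits_b)


def smith_waterman_score(seq_a, seq_b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score with a linear gap penalty (hypothetical scores)."""
    prev = [0] * (len(seq_b) + 1)
    best = 0
    for a in seq_a:
        curr = [0]
        for j, b in enumerate(seq_b, start=1):
            diag = prev[j - 1] + (match if a == b else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def normalized_sw_similarity(seq_a, seq_b, **scoring):
    """Alignment score scaled by the two self-alignment scores (assumed convention)."""
    self_a = smith_waterman_score(seq_a, seq_a, **scoring)
    self_b = smith_waterman_score(seq_b, seq_b, **scoring)
    if self_a == 0 or self_b == 0:
        return 0.0
    return smith_waterman_score(seq_a, seq_b, **scoring) / sqrt(self_a * self_b)
```

Filling a substance-by-substance matrix with `tanimoto_similarity` and a target-by-target matrix with `normalized_sw_similarity` would give the two similarity kernels that the later slides combine.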
  22. 22. Kernel Explosion
      - Apply Kronecker multiplication to drug and target kernels
      - Train a support vector machine on the Kronecker kernel
      - Training kernel: 203 targets with known bioactivities and 44161 bioactive substances give a 203 x 203 target kernel (41 209 entries) and a 44161 x 44161 substance kernel (1 950 193 921 entries); their Kronecker product has 41 209 x 1 950 193 921 = about 80 trillion entries, roughly 500 TB
      - I wish I had a cluster that big...
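The size estimate on this slide can be checked with a few lines of arithmetic. The sketch below only assumes a storage width per entry, which the slide does not state: 8-byte and 4-byte floats give figures on either side of the quoted "~500TB".

```python
n_targets = 203          # targets with known bioactivities (from the slide)
n_substances = 44161     # bioactive substances (from the slide)

target_kernel_entries = n_targets ** 2          # 203 x 203 target kernel
substance_kernel_entries = n_substances ** 2    # 44161 x 44161 substance kernel

# The Kronecker product kernel is indexed by (substance, target) pairs,
# so it has (203 * 44161) rows and the same number of columns.
n_pairs = n_targets * n_substances
kron_entries = n_pairs ** 2

# Assumed storage widths; the slide does not say which one it used.
terabytes_float64 = kron_entries * 8 / 1e12
terabytes_float32 = kron_entries * 4 / 1e12

print(f"target kernel entries:    {target_kernel_entries:,}")
print(f"substance kernel entries: {substance_kernel_entries:,}")
print(f"Kronecker kernel entries: {kron_entries:,}")
print(f"dense storage: {terabytes_float32:.0f} TB (float32), "
      f"{terabytes_float64:.0f} TB (float64)")
```

Either way the dense kernel is hundreds of terabytes, which is why the following slides avoid ever forming it explicitly.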
  23. 23. Ensemble Learning
      - Train multiple models!
        1. Each takes a subset of the data
        2. Each self-evaluates
        3. Evaluate meta-learner
        4. Feed genetic driver of CP
        5. Predict on repurposable drugs
        6. Weighted average of results
      - Reach optimization limit around 0.94 AUROC (for kernels of 30 substances and 30 targets).
      - Largest kernels still only around 1000.
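The self-evaluation and weighted-averaging steps listed on this slide might look roughly like the sketch below. It is illustrative only: the slide does not say how the weights were chosen, so using each model's own validation AUROC as its weight is an assumption, and all scores and labels in the example are made up.

```python
def auroc(scores, labels):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def weighted_average(predictions, weights):
    """Combine per-model score lists into one list, weighting each model."""
    total = sum(weights)
    n_items = len(predictions[0])
    return [
        sum(w * preds[i] for preds, w in zip(predictions, weights)) / total
        for i in range(n_items)
    ]


# Hypothetical example: three models, each self-evaluated on held-out pairs.
validation_labels = [1, 0, 1, 0]
validation_scores = [
    [0.9, 0.2, 0.7, 0.4],   # model 1 ranks every positive above every negative
    [0.6, 0.5, 0.4, 0.3],   # model 2 makes one ranking mistake
    [0.8, 0.1, 0.9, 0.3],   # model 3 ranks perfectly
]
model_weights = [auroc(s, validation_labels) for s in validation_scores]

# Each model's scores for the same three candidate drugs (made-up numbers).
candidate_scores = [
    [0.70, 0.20, 0.50],
    [0.40, 0.60, 0.50],
    [0.80, 0.10, 0.60],
]
ensemble_scores = weighted_average(candidate_scores, model_weights)
```

Weighting by validation performance lets weaker sub-models contribute less without discarding them, which is one simple alternative to plain voting.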
  24. 24. Kronecker-RLS
      - Take advantage of inherent symmetry
      - Eigendecompose similarity kernels
      - Take advantage of kernel ‘trick’
      - Employ regularised least squares
      - Feed into ensemble!
      - Homogeneous bagging ensemble performed best
      Final ensemble: 30 models, each:
      - Trained and optimized on 500 substances and 200 most bioactive targets
      - Evaluated (model-level)
      - Evaluated (ensemble-level)
      - Predict!
      Pahikkala T, Airola A. RLScore: regularized least-squares learners. The Journal of Machine Learning Research. 2016 Jan 1;17(1):7803-7.
      Nascimento AC, Prudêncio RB, Costa IG. A multiple kernel learning algorithm for drug-target interaction prediction. BMC bioinformatics. 2016 Dec;17(1):46.
      Kashima H, Oyama S, Yamanishi Y, Tsuda K. On pairwise kernels: an efficient alternative and generalization analysis. Adv Data Min Knowl Disc. 2009; 5476:1030–7.
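The eigendecomposition idea on this slide can be sketched with NumPy. This is a generic Kronecker regularized least squares solver written for illustration, not the RLScore code the speaker cites. The function names and the regularization value `lam` are hypothetical, and both kernels are assumed to be symmetric positive semidefinite. The point is that the linear system over all substance-target pairs is solved using only the two small kernels, without forming their Kronecker product.

```python
import numpy as np


def kron_rls_fit(K_d, K_t, Y, lam=1.0):
    """Solve K_d @ A @ K_t + lam * A = Y for the dual coefficients A.

    K_d: (n_d, n_d) substance kernel, K_t: (n_t, n_t) target kernel,
    Y:   (n_d, n_t) interaction labels. Both kernels must be symmetric.
    """
    evals_d, Q_d = np.linalg.eigh(K_d)
    evals_t, Q_t = np.linalg.eigh(K_t)
    # Rotate the labels into the joint eigenbasis of the two kernels.
    Y_rot = Q_d.T @ Y @ Q_t
    # In that basis the system is diagonal: one scalar division per pair.
    A_rot = Y_rot / (np.outer(evals_d, evals_t) + lam)
    return Q_d @ A_rot @ Q_t.T


def kron_rls_predict(K_d, K_t, A):
    """Scores for every (substance, target) pair covered by the kernels."""
    return K_d @ A @ K_t


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_d = rng.standard_normal((5, 3))
    X_t = rng.standard_normal((4, 3))
    K_d, K_t = X_d @ X_d.T, X_t @ X_t.T        # toy positive semidefinite kernels
    Y = (rng.random((5, 4)) < 0.3).astype(float)
    A = kron_rls_fit(K_d, K_t, Y, lam=0.5)
    print(kron_rls_predict(K_d, K_t, A).round(3))
```

The cost is dominated by the two eigendecompositions, so it grows with the cube of each kernel's side length rather than with the size of the full pairwise kernel.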
  25. 25. Improvements
      - Sparse data
      - CGKronRLS (semi-supervised learning)
      - Other pairwise relationships can be used
      - KronRLS-MKL (multiple kernel learning)
      - Use of Gaussian Interaction Profiles
      - Sequential model execution and storage
      - Boosting instead of bagging (sample-level optimization)
      - Making numpy/BLAS work on distributed GPUs
      - Employ a meta-learner, not a voting classifier
      Pahikkala T. Fast gradient computation for learning with tensor product kernels and sparse training labels. Structural, Syntactic, and Statistical Pattern Recognition (S+SSPR). Lecture Notes in Computer Science, vol. 8621, pages 123–132. 2014.
      Nascimento AC, Prudêncio RB, Costa IG. A multiple kernel learning algorithm for drug-target interaction prediction. BMC bioinformatics. 2016 Dec;17(1):46.
      Pahikkala T, Airola A. RLScore: regularized least-squares learners. The Journal of Machine Learning Research. 2016 Jan 1;17(1):7803-7.
      Kashima H, Oyama S, Yamanishi Y, Tsuda K. On pairwise kernels: an efficient alternative and generalization analysis. Adv Data Min Knowl Disc. 2009; 5476:1030–7.
  26. 26. Using “compressed sensing” to support drug repurposing for chronic pancreatitis Prof. Aleksandar Poleksić Department of Computer Science University of Northern Iowa
  27. 27. Compressed sensing for ADR prediction • Idea: factor $R_{m \times n}$ into the product of two lower-dimensional matrices, $R = FG'$
  28. 28. Logistic matrix factorization
      Idea: factor $R_{m \times n}$ into the product of two lower-dimensional matrices, $R = FG'$
      Loss function:
      $$\sum_{i,j} w_{i,j}\left\{\ln\left(1 + e^{f_i g_j'}\right) - (r_{i,j} + q_{i,j})\, f_i g_j'\right\} + \lambda_F \|F\|_2^2 + \lambda_G \|G\|_2^2 + \lambda_M \sum_{i,j} m_{i,j} \|F(i,:) - F(j,:)\|_2^2 + \lambda_N \sum_{i,j} n_{i,j} \|G(i,:) - G(j,:)\|_2^2$$
      M, N: similarity matrices; Q: impute matrix; W: weight matrix; lambdas: tunable parameters; P: output probabilities
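  29. 29. Optimization
      Loss function:
      $$\sum_{i,j} w_{i,j}\left\{\ln\left(1 + e^{f_i g_j'}\right) - (r_{i,j} + q_{i,j})\, f_i g_j'\right\} + \lambda_F \|F\|_2^2 + \lambda_G \|G\|_2^2 + \lambda_M\, \mathrm{tr}\left(F'(D_M - M)F\right) + \lambda_N\, \mathrm{tr}\left(G'(D_N - N)G\right)$$
      Partial derivatives:
      $$\partial/\partial F = \{W \odot (P - (R + Q))\}\, G + 2\lambda_F F + 2\lambda_M (D_M - M) F$$
      $$\partial/\partial G = \{W^T \odot (P^T - (R^T + Q^T))\}\, F + 2\lambda_G G + 2\lambda_N (D_N - N) G$$
      Minimization algorithm: gradient descent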
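The loss and gradients on this slide translate fairly directly into NumPy. The sketch below is a toy illustration under stated assumptions, not the authors' implementation. It takes P to be the element-wise logistic function of FG', takes D_M and D_N to be the diagonal row-sum matrices of M and N (the slides do not define them), and assumes M and N are symmetric. The function names, learning rate, rank, and step count are all hypothetical.

```python
import numpy as np


def laplacian(S):
    """D - S, with D the diagonal matrix of row sums of a symmetric S."""
    return np.diag(S.sum(axis=1)) - S


def loss(F, G, R, Q, W, M, N, lam_F, lam_G, lam_M, lam_N):
    X = F @ G.T
    # logaddexp(0, x) = ln(1 + e^x), computed without overflow.
    fit = np.sum(W * (np.logaddexp(0.0, X) - (R + Q) * X))
    ridge = lam_F * np.sum(F ** 2) + lam_G * np.sum(G ** 2)
    smooth = (lam_M * np.trace(F.T @ laplacian(M) @ F)
              + lam_N * np.trace(G.T @ laplacian(N) @ G))
    return fit + ridge + smooth


def gradients(F, G, R, Q, W, M, N, lam_F, lam_G, lam_M, lam_N):
    P = 1.0 / (1.0 + np.exp(-(F @ G.T)))     # assumed: P = sigmoid(F G')
    E = W * (P - (R + Q))
    dF = E @ G + 2 * lam_F * F + 2 * lam_M * laplacian(M) @ F
    dG = E.T @ F + 2 * lam_G * G + 2 * lam_N * laplacian(N) @ G
    return dF, dG


def fit(R, Q, W, M, N, rank=2, lams=(0.1, 0.1, 0.1, 0.1),
        lr=0.05, steps=200, seed=0):
    """Plain fixed-step gradient descent from a small random start."""
    rng = np.random.default_rng(seed)
    F = 0.1 * rng.standard_normal((R.shape[0], rank))
    G = 0.1 * rng.standard_normal((R.shape[1], rank))
    for _ in range(steps):
        dF, dG = gradients(F, G, R, Q, W, M, N, *lams)
        F, G = F - lr * dF, G - lr * dG
    return F, G
```

After fitting, `1 / (1 + np.exp(-(F @ G.T)))` gives the matrix of predicted probabilities for every row-column pair, including pairs that were unobserved in R.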
  30. 30. SIDER benchmark
  31. 31. Q: Is a new chemical likely to cause hepatotoxicity?
  32. 32. Q: Is a new chemical likely to cause a serious rare side effect?
  33. 33. ADR prediction for candidate CP drugs: lacosamide ADR profile
      Lacosamide SMILES: CC(=O)NC(COC)C(=O)NCC1=CC=CC=C1
      ADR name (CUI)              Prob
      Nausea (C0027497)           0.99
      Vomiting (C0042963)         0.97
      Asthenia (C0015672)         0.916
      Dizziness (C0012833)        0.912
      Headache (C0018681)         0.908
      Dry mouth (C0043352)        0.881
      Diarrhea (C0011991)         0.874
      Dermatitis (C0015230)       0.818
      Constipation (C0009806)     0.789
      Somnolence (C2830004)       0.738
      Tremor (C0040822)           0.733
      http://gpubox.cs.uni.edu
  34. 34. Candidate CP drugs: network prediction
      Apply compressed sensing on drug-disease network*:
      • 1552 compounds
      • 137 diseases
      • 755 known treatments
      Network schema (figure): nodes Compound, Disease; edges treats, resembles
      Drug             Prob    Z-score
      Hyoscyamine      0.35    6.61
      Irinotecan       0.26    4.94
      Varenicline      0.20    3.74
      Octreotide       0.17    3.26
      Propantheline    0.17    3.19
      Citalopram       0.17    3.18
      Acamprosate      0.14    2.61
      Disulfiram       0.13    2.44
      Epirubicin       0.12    2.31
      Tamoxifen        0.11    2.08
      Doxorubicin      0.11    2.05
      Naltrexone       0.11    2.03
      Paclitaxel       0.10    1.90
      Erlotinib        0.10    1.85
      Topotecan        0.10    1.83
      Sorafenib        0.09    1.64
      Proguanil        0.08    1.42
      Metformin        0.07    1.30
      Telbivudine      0.06    1.13
      Orlistat         0.06    1.06
      * Himmelstein, D.S. & Baranzini, S.E. PLoS Comput Biol 11, e1004259 (2015).
  35. 35. Lacosamide: network prediction
      Network schema (figure): nodes Gene, Compound, Disease; edges treats, palliates, resembles, resembles
      Lacosamide-binds-gene-associates pancreatitis:
      CFTR (0.0806); ALB (0.0522); PTGS2 (0.0224); MPO (0.0192); CYP1A1 (0.0151); ACE (0.0088); ABCB1 (0.0086); FDX1 (0.0065); CXCL8 (0.0063); TNF (0.0039); AHR (0.0035); ADRB2 (0.0028); CA8 (0.0028); SLC12A1 (0.0027); BCHE (0.0017); ADRB1 (0.0015)
  36. 36. Collaborators: Prof. Lei Xie, CUNY Graduate Center References: 1. Poleksic, A., & Xie, L. (2018). Predicting serious rare adverse reactions of novel chemicals. Bioinformatics, 34(16), 2835-2842. 2. Lim, H., Gray, P., Xie, L., & Poleksic, A. (2016). Improved genome-scale multi-target virtual screening via a novel collaborative filtering approach to cold-start problem. Scientific reports, 6, 38860. 3. Poleksic, A., & Xie, L. (2019). Database of Adverse Events Associated with Drugs and Drug Combinations, in review.
  37. 37. Poll Question: In what other medical area should we run the next pre-competitive research exercise? A. Oncology B. Heart Disease C. Diabetes D. Obesity E. Some other unmet need (send
  38. 38. Audience Q&A: Please use the Question function in GoToWebinar
  39. 39. Upcoming Webinars
      1. Date TBD – July 2019: User Experience (UX) Design for AI
      2. Date TBD: Virtual Roundtable: Innovative Pathways through the FDA & EMEA (with the Westchester Biotech Project)
      3. Planning: Ethics and AI
      Please suggest other topics
  40. 40. info@pistoiaalliance.org @pistoiaalliance www.pistoiaalliance.org Thank You