Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

MOLIERE: Automatic Biomedical Hypothesis Generation System

368 views

Published on

This presentation highlights the network construction and query process for the MOLIERE automatic hypothesis generation system. This is in reference to https://arxiv.org/abs/1702.06176

Published in: Science
  • Nice !! Download 100 % Free Ebooks, PPts, Study Notes, Novels, etc @ https://www.ThesisScientist.com
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here

MOLIERE: Automatic Biomedical Hypothesis Generation System

  1. 1. MOLIERE Automatic Biomedical Hypothesis Generation System Justin Sybrandt Michael Shtutman Ilya Safro Clemson University School of Computing University of South Carolina College of Pharmacy Clemson University School of Computing
  2. 2. UNDISCOVERED PUBLIC KNOWLEDGE • Proposed by Swanson in 1986 • The set of public knowledge is too large • Contains implicit connections 2
  3. 3. HUMAN LIMITS • No person can read everything • 2,000 – 4,000 new papers daily • People Specialize • Limits knowledge sharing across disciplines • Bias and Inconsistent Assumptions 3
  4. 4. WHAT IS A HYPOTHESIS? 4 DRUG DISEASE ?Inhibits Causes • An idea or explanation for something that is based on known facts but has not yet been proven • (Definition of “hypothesis” from the Cambridge Academic Content Dictionary © Cambridge University Press)
  5. 5. AUTOMATIC HYPOTHESIS GENERATION 5 DISEASEDRUG Network of Medical Objects Connections NotWell Studied
  6. 6. AUTOMATIC HYPOTHESIS GENERATION 6 DISEASEDRUG Network of Medical Objects Connections NotWell Studied
  7. 7. AUTOMATIC HYPOTHESIS GENERATION 7 DISEASEDRUG Network of Medical Objects Connections NotWell Studied
  8. 8. 8 Methodology Highlights ARROWSMITH • One of the first hypothesis generation systems. • Found link between Fish Oil and Raynaud’s Disease. DiseaseConnect • Finds connections between genes and diseases. • Displays information in an interactive prompt. BioLDA • Constructs high quality topic models aided by domain-specific information. • Identified link betweenVenlafaxine and HTR1A. RELATED APPROACHES
  9. 9. 9 Methodology Limitation Reference ARROWSMITH Limited Document Set Neil R Smalheiser and Don R Swanson. 1998. DiseaseConnect Limited Document Set Chun-Chi Liu et al. 2014. BioLDA LimitedVocabulary Huijun Wang et al. 2011. RELATED APPROACHES
  10. 10. NEW APPROACH 10 • Construct a large network • Identify meaningful paths • Extend paths to neighboring nodes • Mine neighborhoods for important topics
  11. 11. NETWORK CONSTRUCTION 11 QUERY PROCESS
  12. 12. NETWORK CONSTRUCTION • National Library of Medicine (NLM) • National Center for Biotechnology Information (NCBI) • MEDLINE • 25 Million Documents • Titles and Abstracts 12
  13. 13. • SPECALIST NLPTOOLSET • Natural LanguageToolKit (NLTK) 13 some example text that hopefully more understandable than typical medical abstract This is some example text that is hopefully more understandable than the typical medical abstract. Raw Data CleanedText NETWORK CONSTRUCTION
  14. 14. NETWORK CONSTRUCTION • Topical Pattern Mining • Groups together common phrases • Creates 2,3,…,n-grams 14 example text hopefully more understandable typical medical abstract some example text that hopefully more understandable than typical medical abstract CleanedText Phrases
  15. 15. NETWORK CONSTRUCTION • FastText: Projects phrases into real valued vector space • Long word embedding composed from subwords 15 example text hopefully more understandable typical medical abstract Phrases Point Cloud
  16. 16. 16 Point Cloud Centroid NETWORK CONSTRUCTION • Embed documents by averaging over point clouds
  17. 17. NETWORK CONSTRUCTION • Construct KNN • Fast Library for Approximate Nearest Neighbors 17 All Centroids Network
  18. 18. NETWORK CONSTRUCTION 18 • UMLS Metathesarus • Curated keyword network • 2 Million Nodes • Superset of MESH
  19. 19. NETWORK CONSTRUCTION 19 • Edge weight ~ Distance • Inv.TF-IDF cross-layer edges • Edges normalized [0,1]
  20. 20. NETWORK CONSTRUCTION 20 QUERY PROCESS
  21. 21. QUERY PROCESS 21
  22. 22. 22 • User selects two nodes • Restrained to keywords • Can generalize to two sets QUERY PROCESS
  23. 23. QUERY PROCESS 23 • Identify shortest path between query sets
  24. 24. QUERY PROCESS 24 • N: Abstracts close to those in original path • C: Abstracts which share path-adjacent keywords
  25. 25. QUERY PROCESS 25 • PLDA+: Identifies topics present in a set of text. Topic … … … Topic … … … Topic … … … • Topic patterns shed light on results
  26. 26. QUERY PROCESS • Hypothesis represented as a topic model • Shown: Venlaflaxine vs. HTR1A 26 TOPIC: 0 antidepressant_drugs milnacipran org selected ht TOPIC: 1 increase reduced treatment dorsal_raphe_nucleus effect TOPIC: 2 rats sert ht_receptor escitalopram potency
  27. 27. RESULTS 27 VENLAFAXINE DRUG REPURPOSING
  28. 28. ? VENLAFAXINE EXAMPLE • Venlafaxine: • Treats depression / anxiety • HTR[12]A: • Linked to depression / anxiety • No paper linked these concepts 28 Venlafaxine HTR1A HTR2A Depression Anxiety
  29. 29. VENLAFAXINE RESULTS 29 0 2 4 6 8 10 12 14 16 18 20 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 KeywordCount Topic Number Depression Related Keywords PerTopic HTR1A HTR2A 0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 KeywordCount Topic Number Anxiety Related Keywords PerTopic HTR1A HTR2A
  30. 30. • Hypothesis represented as a topic model • Shown: Venlaflaxine vs. HTR1A 30 TOPIC: 0 antidepressant_drugs milnacipran org selected ht TOPIC: 1 increase reduced treatment dorsal_raphe_nucleus effect TOPIC: 2 rats sert ht_receptor escitalopram potency antidepressant produces serotonin “Sertraline” antidepressant controlled by HTR1A antidepressant VENLAFAXINE RESULTS
  31. 31. DRUG REPURPOSING EXAMPLE 31 • Drugs can be modified to treat new diseases • Decreases drug development time and costs HIVDRUG ? Cancer DDX3
  32. 32. DRUG REPURPOSING EXPERIMENT 32 • Ran nearly 10,000 queries involving DDX3: Signal Transduction Transcription Adhesion Cancer Development Translation RNA DDX3
  33. 33. DRUG REPURPOSING EXPERIMENT 33 • Ran nearly 10,000 queries involving DDX3: • Expecting: WNT Signaling Pathways Cell – Cell Adhesion Cell – Matrix Adhesion
  34. 34. DRUG REPURPOSING RESULTS 34 WNT Signaling Pathways Cell – Cell Adhesion Cell – Matrix Adhesion substrate adhesion RGD cell adhesion domain cell adhesion factor focal adhesion kinase cell-cell adhesion regulation of cell-cell adhesion cell-adhesion molecules signal-transduction associated kinases cell adhesion kinase
  35. 35. APPLICATIONS 35 • Drug Repurposing • Extensions to new domains • Patents, Economics, etc. • Coping with Deadlines
  36. 36. OPEN RESEARCH QUESTIONS • Result Interpretation • SystemVerification • Automatic Network Tuning • Streaming Network Reconstruction • Inclusion of Additional Data Sources 36
  37. 37. THANK YOU J. Sybrandt, M. Shtutman, I. Safro “MOLIERE:Automatic Biomedical Hypothesis Generation System” Code and Data: https://people.cs.clemson.edu/~isafro/software.html Email: JSYBRAN@CLEMSON.EDU

×