Delroy Cameron's Dissertation Defense: A Context-Driven Subgraph Model for Literature-based Discovery

Literature-Based Discovery (LBD) refers to the process of uncovering hidden connections that are implicit in scientific literature. Numerous hypotheses have been generated from scientific literature, which influenced innovations in diagnosis, treatment, prevention and overall public health. However, much of the existing research on discovering hidden connections among concepts has used distributional statistics and graph-theoretic measures to capture implicit associations. Such metrics do not explicitly capture the semantics of hidden connections. ...

While effective in some situations, the practice of relying on domain expertise, structured background knowledge and heuristics to complement distributional and graph-theoretic approaches has serious limitations. ...

This dissertation proposes an innovative context-driven, automatic subgraph creation method for finding hidden and complex associations among concepts, along multiple thematic dimensions. It outlines definitions for context and shared context, based on implicit and explicit (or formal) semantics, which compensate for deficiencies in statistical and graph-based metrics. It also eliminates the need for heuristics a priori. An evidence-based evaluation of the proposed framework showed that 8 out of 9 existing scientific discoveries could be recovered using this approach. Additionally, insights into the meaning of associations could be obtained using provenance provided by the system. In a statistical evaluation to determine the interestingness of the generated subgraphs, it was observed that an arbitrary association is mentioned in only approximately 4 articles in MEDLINE, on average. These results suggest that leveraging implicit and explicit context, as defined in this dissertation, is an advancement of the state-of-the-art in LBD research.

Ph.D. Committee: Drs. Amit Sheth (Advisor), TK Prasad, Michael Raymer,
Ramakanth Kavuluru (UKY), Thomas C. Rindflesch (NLM) and Varun Bhagwan (Yahoo! Labs)

Relevant Publications (more at: http://knoesis.wright.edu/students/delroy/)

D. Cameron, R. Kavuluru, T. C. Rindflesch, O. Bodenreider, A. P. Sheth, K. Thirunarayan. Leveraging Distributional Semantics for Domain Agnostic Literature-Based Discovery (under preparation)

D. Cameron, O. Bodenreider, H. Yalamanchili, T. Danh, S. Vallabhaneni, K. Thirunarayan, A. P. Sheth, T. C. Rindflesch. A Graph-based Recovery and Decomposition of Swanson’s Hypothesis using Semantic Predications. Journal of Biomedical Informatics (JBI13), 46(2): 238–251, 2013

D. Cameron, R. Kavuluru, O. Bodenreider, P. N. Mendes, A. P. Sheth, K. Thirunarayan. Semantic Predications for Complex Information Needs in Biomedical Literature. International Bioinformatics and Biomedical Conference (BIBM11), pp. 512–519, 2011 (acceptance rate=19.4%)

D. Cameron, P. N. Mendes, A. P. Sheth, V. Chan. Semantics-empowered Text Exploration for Knowledge Discovery. ACM Southeast Conference (ACMSE10), 14, 2010

  • Thank everyone for coming.
    Feel free to ask questions
  • Explored the Research Question of: Characteristics of Inheritance of Traits across Generations of Peas
    Gregor Johann Mendel – Debunked Blending Inheritance, Founder of Genetics, Pea Hybridization, 1866

    EXPERIMENTATION

    OBSERVATION
    - Inheritance of traits across generations seemed to extend beyond the immediate parents in the lineage

    EXPLANATION
    - Inheritance of traits appears to be influenced by the presence of dominant and recessive factors, which split, then independently recombine

    THEORY
    - Law of Segregation
    - Law of Independent assortment

    Explored the Research Question of: The mechanism of Cell Division (cytology) in the embryos of Grasshoppers
    Walter Sutton & Theodor Boveri – Cytology 1903, Genetic Inheritance, each cell split is equally likely – gives the causal mechanism for Mendel’s law

    OBSERVED
    - splitting of chromosomes in the cells of grasshoppers (meiosis)

    EXPLAINED
    - Mendel's laws of inheritance applied to chromosomes at the cellular level in living organisms

    THEORIZED
    - Chromosomes are the basis for genetic inheritance

    Jorn Dyerberg & Hans Olaf Bang (1913–1994) – The Greenland Eskimo
    OBSERVED
    - Greenland Eskimos, no AMI

    EXPLAINED
    - diet rich in omega-3 fatty acids

    THEORIZED
    - marine oils can treat thrombosis, atherosclerosis, and AMI
  • LBD is now driven by digital data (in silico as opposed to in vivo)
    Four activities involved in the science of making discoveries under the guidance of a Human
  • An information processing system that leverages rich representations of textual content from scientific literature based on implicit and explicit context can provide effective means for literature-based discovery. This has been convincingly demonstrated through rediscovery of several well-known associations (between biomedical concepts) and their substantiation using MEDLINE and the Medical Subject Headings (MeSH) vocabulary.

  • Vioxx Brand Name (Rofecoxib is a nonsteroidal anti-inflammatory drug - NSAID)
    - stronger pain medication than Naproxen (Brand Name Aleve)
    - easier on the stomach than Naproxen

    2004 Merck’s Clinical Trial - proved risk of heart attack

    Lawsuit by 50,000 patients

  • Vioxx (anti-inflammatory)
    - stronger
    - less severe side effects (easier on the stomach)

    Lawsuit by 50,000 patients

  • LBD is different from traditional research
    Direct observations of the object of interest

    Keyword-based – error prone due to absence of text normalization to standard concepts
    Concept-based – (also Semantics-based, concepts but no explicit relationships)
    Relations-based – (explicit relationships) but limited complexity, unable to capture causality, mechanisms of interaction
    Graph-based - Giant Component, Clustering Coefficient, Geodesic, Centrality (betweenness, closeness)
    Hybrid – combine machine learning, summarization with traditional LBD approaches
  • Rich representations
    Personalization
    Google Knowledge Graph
    Human Activity Modeling
    Mobile Applications/Advertising (get examples)

    Two goals for automation:
    Create subgraphs that capture complex associations
    Along multiple thematic dimensions

    Use of background knowledge to improve LBD
    BKR
    MeSH



  • Context
    overcome combinatorial explosion
    enable scalability
  • Problem definition
    In terms of path relatedness
    Decomposed to semantic predication relatedness

    To achieve this, we have studied characteristics of MEDLINE abstracts
    Articles have properties/attributes
    Provide various levels of abstraction of the full text
  • Given a way to represent context of a path, subgraphs can be automatically created in 6 steps
  • Frequency is the epiphenomenon of context
  • Compute Path Relatedness
    Two Objectives
    Binarize the vectors
  • Notice the binary vectors

    MeSH Semantic Similarity
    Set-based (Jaccard, Dice)
    Path Length (Rada, Wu&Palmer, Leacock&Chodorow)
    Information Content (Lin, Resnik, Jiang&Conrath)
    Gloss Vectors(LSI)
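The set-based measures named above (Jaccard, Dice) can be sketched in a few lines. This is a minimal illustration only: the function names and the two example descriptor sets are hypothetical and are not taken from the dissertation's implementation.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: |A ∩ B| / |A ∪ B| (0.0 when both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def dice(a: set, b: set) -> float:
    """Dice similarity: 2|A ∩ B| / (|A| + |B|) (0.0 when both sets are empty)."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


# Hypothetical descriptor sets for two paths (illustrative values only).
ctx_i = {"Platelet Aggregation", "Epoprostenol", "Prostaglandins"}
ctx_j = {"Platelet Aggregation", "Platelet Activation", "Epoprostenol"}

print(jaccard(ctx_i, ctx_j))  # 2 shared out of 4 distinct descriptors
print(dice(ctx_i, ctx_j))     # 2 * 2 shared out of 3 + 3 descriptors
```

Dice weights the overlap more heavily than Jaccard, so for the same pair of sets it never yields a lower score.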
  • Mean – weighted average of the points
    Variance – average of the squared distances from the mean
    Standard Deviation – square root of Variance (What is normal, what is not)
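A small sketch of how these three quantities could yield a cut-off placed a fixed number of standard deviations above the mean, in the spirit of the "2nd Standard Deviation from Mean" threshold listed in the transcript. The function name `sigma_threshold` and the sample scores are assumptions for illustration; the population standard deviation is used here, which may differ from the choice made in the dissertation.

```python
import statistics


def sigma_threshold(scores, k=2.0):
    """Return mean + k * (population standard deviation) of the scores."""
    mean = statistics.fmean(scores)
    std = statistics.pstdev(scores)  # square root of the population variance
    return mean + k * std


# Hypothetical path relatedness scores (not from the experiments).
scores = [1.0, 2.0, 3.0, 4.0, 5.0]
print(statistics.fmean(scores))      # mean
print(statistics.pvariance(scores))  # variance
print(sigma_threshold(scores, k=2))  # cut-off two standard deviations above the mean
```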
  • Single-link
    Cluster if maximum similarity is above the threshold
    Straggly Clusters

    Complete-link
    Cluster if minimum similarity is above threshold
    Strict, compact clusters

    Group-average
    Average of intra-cluster + inter-cluster
    Well connected but more broad connections than complete link
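The three linkage criteria can be written as small functions over a pairwise similarity. This is a sketch under assumptions: the function names, the toy similarity, and the numeric items are hypothetical, and group-average is computed over all pairs in the merged cluster (intra-cluster and inter-cluster), following the description above.

```python
from itertools import product


def single_link(c1, c2, sim):
    """Similarity of the most similar cross-cluster pair."""
    return max(sim(a, b) for a, b in product(c1, c2))


def complete_link(c1, c2, sim):
    """Similarity of the least similar cross-cluster pair."""
    return min(sim(a, b) for a, b in product(c1, c2))


def group_average(c1, c2, sim):
    """Average similarity over all distinct pairs in the merged cluster."""
    merged = list(c1) + list(c2)
    pairs = [(a, b) for i, a in enumerate(merged) for b in merged[i + 1:]]
    return sum(sim(a, b) for a, b in pairs) / len(pairs)


def toy_sim(a, b):
    """Toy similarity on numbers: closer values are more similar."""
    return 1.0 / (1.0 + abs(a - b))


c1, c2 = [0, 1], [3]
print(single_link(c1, c2, toy_sim))
print(complete_link(c1, c2, toy_sim))
print(group_average(c1, c2, toy_sim))
```

Two clusters are merged when the chosen criterion is above the threshold; single-link is the most permissive of the three and complete-link the strictest.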
  • Definitional Knowledge – Top-down
    Assertional Knowledge – Bottom-up

    Using both together is probably best.
  • Analogy
    Google Knowledge Graph
    IBM Human Activity Modeling
    Yahoo Personalization
    Biomedicine Literature-based Discovery
    Mobile Applications
  • Transcript

    • 1. A CONTEXT-DRIVEN SUBGRAPH MODEL FOR LITERATURE-BASED DISCOVERY PH.D. DISSERTATION DEFENSE DELROY CAMERON AUGUST 18, 2014 PH.D. COMMITTEE AMIT P. SHETH (ADVISOR) KRISHNAPRASAD THIRUNARAYAN MICHAEL RAYMER RAMAKANTH KAVULURU (UKY) THOMAS C. RINDFLESCH (NIH) VARUN BHAGWAN (YAHOO! LABS) All truths are easy to understand once they are discovered; the point is to discover them. (Galileo Galilei, 1564–1642)
    • 2. 2 Historical Perspectives Walter Sutton (1877 – 1916) Theodor Boveri (1862 – 1915) Gregor Johann Mendel (1822 – 1884) Mendelian Laws of Inheritance (1866) Boveri-Sutton Chromosome Theory (1903)
    • 3. 3 Science of Making Discoveries Discovery Information Processing System What is promising?
    • 4. 4 Thesis Statement An information processing system that leverages rich representations of textual content from scientific literature based on implicit and explicit context can provide effective means for literature-based discovery.
    • 5. 5 Motivation Rofecoxib Osteoarthritis 1999 TREAT Merck & Co. Increased risk of Heart Attack 2002 2004 $254.3 million Settlement 2005 Vioxx Withdrawn $4.85 billion Settlement Confirmed by Clinical Trial 2007 2011 $950 million Settlement 2013 $23 million Settlement
    • 6. 6 Motivation Literature-Based Discovery (LBD)
    • 7. 7 Literature-Based Discovery (LBD) ABC Model AnC Model Context-Driven Subgraph Model A CB A CB1 B2 Bi Source: Wikipedia - http://en.wikipedia.org/wiki/Don_R._Swanson Keyword-based Concept-based Relations-based 2006 2011 1986 1996 ARROWSMITH v1 Term Frequency 1999 IRIDESCENT Term Co-occurrence 2001 DAD MetaMAP UMLS 2003 Litlinker MeSH, UMLS, Rules Level of Support Contribution #1 Context-Driven Subgraph Model for LBD SemBT Semantic Predications Level of Support Discovery Browsing Degree Centrality Cooperative Reciprocity Manual 2013 Manjal UMLS, MeSH Topic Profiles, TF-IDF 2004 Rajolink MeSH, Rarity BioSbKDS UMLS Relations MeSH 2005 BITOLA UMLS, MeSH Assoc. Rules, Confidence Graph-based ACS (2004) MeSH, Hebbian Learning A CB CAUSES INHIBITS A C PRODUCES INHIBITS Discovery Patterns Hybrid ARROWSMITH v2 8 Features (2007) Semantic MEDLINE Summarization Discovery Browsing Epiphanet Predications-based Semantic Indexing CoPub Keywords, Mutual Information 2010 Literature-based discovery refers to the use of papers and other academic publications (the “literature”) to find new relationships between existing knowledge (the “discovery”). Definition courtesy of Wikipedia: http://en.wikipedia.org/wiki/Literature-based_discovery
    • 8. 8 Application: Raynaud Syndrome – Fish Oil ISA Prostaglandin I3 CONVERTS_TO Dietary Fish Oils Platelet Aggregation DISRUPTS ISA DISRUPTS DISRUPTS Epoprostenol DISRUPTS ISA STIMULATES Prostaglandin CONVERTS_TO Raynaud Syndrome TREATS CAUSES D. Cameron, O. Bodenreider, H. Yalamanchili, T. Danh, S. Vallabhaneni, K. Thirunarayan, A. P. Sheth, T. C. Rindflesch. A Graph-based Recovery and Decomposition of Swanson’s Hypothesis using Semantic Predications. Journal of Biomedical Informatics (JBI13). 46(2): 238–251, 2013. Dietary Fish Oils Platelet Aggregation Raynaud Syndrome DISRUPTS CAUSES Dietary Fish Oils Platelet Aggregation Raynaud Syndrome Keyword/ Concept based Relations based Subgraph based Inferred predicates
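As a rough illustration of what moving from isolated predications to paths between a source and a target looks like, the sketch below enumerates directed paths in a tiny predications graph. The triples are the Raynaud Syndrome and Dietary Fish Oils associations listed on the Scenario 1 slide later in this transcript; the function name `find_paths`, the hop limit, and the depth-first strategy are assumptions for illustration, not the dissertation's algorithm.

```python
def find_paths(triples, source, target, max_hops=3):
    """Enumerate simple directed paths (lists of triples) from source to target."""
    outgoing = {}
    for subj, pred, obj in triples:
        outgoing.setdefault(subj, []).append((pred, obj))

    paths = []

    def walk(node, path, seen):
        if node == target:
            paths.append(list(path))
            return
        if len(path) == max_hops:
            return
        for pred, nxt in outgoing.get(node, []):
            if nxt not in seen:  # avoid revisiting concepts
                path.append((node, pred, nxt))
                walk(nxt, path, seen | {nxt})
                path.pop()

    walk(source, [], {source})
    return paths


triples = [
    ("Dietary Fish Oils", "INHIBITS", "Blood Viscosity"),
    ("Blood Viscosity", "CAUSES", "Raynaud Syndrome"),
    ("Dietary Fish Oils", "INHIBITS", "Platelet Aggregation"),
    ("Platelet Aggregation", "CAUSES", "Raynaud Syndrome"),
    ("Dietary Fish Oils", "INHIBITS", "Vasoconstriction"),
    ("Vasoconstriction", "CAUSES", "Raynaud Syndrome"),
]

paths = find_paths(triples, "Dietary Fish Oils", "Raynaud Syndrome", max_hops=2)
intermediates = [p[0][2] for p in paths]
print(intermediates)
```

Each returned path keeps its predicates, which is what distinguishes a relations-based view from a keyword or concept co-occurrence view.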
    • 9. 9 Comparison Scenario Intermediate Cameron [19] Srinivasan [88, 89] Weeber [101, 102] Gordon [36,37,38] Hristovski [40] Raynaud Syndrome – Dietary Fish Oils Blood Viscosity × × × × × Platelet Aggregation × × × × × Vascular Reactivity × × × × Ramakrishnan [72]* ? ? ? Table 1: Comparison of intermediates rediscovered for Raynaud Syndrome – Dietary Fish Oil
    • 10. DISRUPTS ISA ISA Dietary Fish Oils Platelet Aggregation DISRUPTS Raynaud Syndrome CAUSES Prostaglandins CONVERTS_TO Prostacyclin (PGI2) DISRUPTS Prostaglandin I3 (PGI3) TREATS STIMULATES Raynaud Syndrome Dietary Fish Oils Fatty Acid Essential Fatty Acid Triglyceride Lipid ISA DISRUPTS CAUSES ISA INHIBIT AFFECTS ISA INHIBITS Blood Viscosity Cellular Activity Blood Physiology Problem How to automate this? Tissue Function D. Cameron, O. Bodenreider, H. Yalamanchili, T. Danh, S. Vallabhaneni, K. Thirunarayan, A. P. Sheth, T. C. Rindflesch. A Graph-based Recovery and Decomposition of Swanson’s Hypothesis using DISRUPTS ISA Dietary Fish Oils Prostaglandin I3 (PGI3) Prostacyclin (PGI2) Raynaud Syndrome CAUSES Vasoconstriction INHIBIT CONVERTS_TO AFFECTS DISRUPTS TREATS
    • 11. Literature-Based Discovery Context-Driven Subgraph Model Foundations Automatic Subgraph Creation Experimental Results Dissertation Contributions Knowledge Exploration Limitations & Future Work
    • 12. PREDICATIONS GRAPH 12
    • 13. 13 . . . Subgraph Model Predications Graph (G) Candidate Graph (RG) Subgraphs (SG) No two contexts are the same R(s,t)(c1) R(s,t)(c2) R(s,t)(ck) R(s,t) . . . . . . What is context?
    • 14. Literature-Based Discovery Context-Driven Subgraph Model Foundations Automatic Subgraph Creation Experimental Results Dissertation Contributions Knowledge Exploration Limitations & Future Work
    • 15. 15 • Path Relatedness • Semantic Predication Context Context Distribution Assumption: The context of a semantic predication can be expressed as the distribution of all MeSH descriptors associated with all articles that contain it. Semantic Underpinnings Relational Semantic Summary Textual Semantic Summary Concept-Level Semantic Summary Interchangeability Assumption: The concept-level and relational semantic summary of a MEDLINE article are interchangeable.
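The Context Distribution Assumption on this slide can be illustrated with a short sketch: the context of a semantic predication is built by counting the MeSH descriptors of every article that contains it. The article identifiers, descriptor assignments, and the function name `predication_context` below are hypothetical toy values, not MEDLINE data.

```python
from collections import Counter


def predication_context(predication, article_predications, article_mesh):
    """Count MeSH descriptors over all articles that contain the predication."""
    context = Counter()
    for article_id, predications in article_predications.items():
        if predication in predications:
            context.update(article_mesh.get(article_id, []))
    return context


p = ("Dietary Fish Oils", "INHIBITS", "Platelet Aggregation")
q = ("Platelet Aggregation", "CAUSES", "Raynaud Syndrome")

# Toy corpus: which predications each article contains, and its descriptors.
article_predications = {"a1": {p}, "a2": {p, q}, "a3": {q}}
article_mesh = {
    "a1": ["Platelet Aggregation", "Fish Oils"],
    "a2": ["Platelet Aggregation", "Epoprostenol"],
    "a3": ["Raynaud Disease"],
}

ctx = predication_context(p, article_predications, article_mesh)
print(ctx.most_common())
```

Descriptors from article `a3` do not contribute to the context of `p`, because that article does not contain the predication.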
    • 16. 16 Linguistic Underpinnings Linguistic items with similar distributions have similar meanings “You shall know a word by the company it keeps” – J. R. Firth 1957 Semantic Predications with shared contexts in their distributions are related Distributional Semantics Context-sensitive nature of meaning
    • 17. Literature-Based Discovery Context-Driven Subgraph Model Foundations Automatic Subgraph Creation Experimental Results Dissertation Contributions Knowledge Exploration Limitations & Future Work
    • 18. 18 Automatic Subgraph Creation Contribution #2 Context of a path as a vector of MeSH Descriptors [Figure: paths pi and pj with MeSH context vectors over descriptors m1, m2, m5, m7, m8, m9, related through the MeSH Hierarchy; label: Semantic Relatedness of MeSH Context Vectors]
    • 19. 19 Path Relatedness Objective #1: Maximize weights of In-Context Descriptors Objective #2: Minimize weights of Out-Of-Context Descriptors p – path t – semantic predication [Figure: weighted MeSH context vectors C(pi) and C(pj) over descriptors m1–m13]
    • 20. 20 Path Relatedness: Shared Context Contribution #3 Structured Background Knowledge for computing shared context of paths [Figure: binary context vectors C(pi) and C(pj); labelled descriptors: Platelet aggregation, Platelet activation, Epoprostenol, Platelet adhesiveness, Prostaglandins (m3 m4 m5 m9 m10 m11 m12 m13)] G-Tree platelet aggregation hemostasis Blood physiological process Blood physiological phenomena Circulatory and respiratory physiological phenomena platelet adhesiveness platelet activation Epoprostenol D-Tree Prostaglandins I Arachidonic Acids Fatty Acids, Unsaturated Fatty Acids Lipids Prostaglandins Eicosanoids
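A minimal sketch of the two ideas on this slide: binarizing weighted context vectors, then finding descriptors that two paths share either exactly or through background knowledge. Everything named here is an assumption for illustration. In particular, the `parent` map is a toy stand-in for the MeSH hierarchy, and treating descriptors with the same parent as related is a simplification, not the dissertation's semantic similarity measure.

```python
def binarize(context):
    """Keep only the descriptors with a non-zero weight."""
    return {d for d, w in context.items() if w > 0}


def related(d1, d2, parent):
    """Toy relatedness: identical descriptors, or siblings under one parent."""
    return d1 == d2 or (d1 in parent and parent.get(d1) == parent.get(d2))


def shared_context(ci, cj, parent):
    """Pairs of descriptors (one per path) that are related."""
    bi, bj = binarize(ci), binarize(cj)
    return sorted((a, b) for a in bi for b in bj if related(a, b, parent))


# Hypothetical weighted context vectors for two paths.
ci = {"Platelet Aggregation": 3, "Epoprostenol": 2, "Lipids": 0}
cj = {"Platelet Activation": 1, "Epoprostenol": 4, "Lipids": 5}

# Toy parent map (not the actual MeSH tree).
parent = {"Platelet Aggregation": "Hemostasis", "Platelet Activation": "Hemostasis"}

exact = binarize(ci) & binarize(cj)
pairs = shared_context(ci, cj, parent)
print(exact)
print(pairs)
```

Exact overlap finds only the descriptor present in both vectors, while the hierarchy-aware comparison also pairs descriptors that differ in name but sit close together in the background knowledge.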
    • 21. 21 Path Relatedness Score *Dictionary of Distances, Elena Deza, Michel-Marie Deza, Elsevier, 2006
    • 22. 22 Hierarchical Agglomerative Clustering 1. Bucket Population 2. Bucket Merging 3. Subgraph Ranking [Figure: A–C paths grouped into buckets from Iteration 1 to Iteration n using the Path Relatedness Threshold; panels: Bucket Population, Bucket Merging]
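The general shape of threshold-based agglomerative clustering can be sketched as follows. This is a naive illustration under assumptions: it uses complete-link merging, toy numeric items, and a made-up similarity, and it recomputes all cluster pairs on every iteration, so it does not attain the Θ(N² log N) complexity reported later in the transcript.

```python
def agglomerate(items, sim, threshold):
    """Merge the most similar pair of clusters until none reaches the threshold."""
    clusters = [[x] for x in items]
    while len(clusters) > 1:
        best, pair = None, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Complete-link: score a pair of clusters by its least similar members.
                score = min(sim(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or score > best:
                    best, pair = score, (i, j)
        if best < threshold:
            break  # no remaining pair is related enough to merge
        i, j = pair
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def toy_sim(a, b):
    """Toy similarity on numbers: negative absolute difference."""
    return -abs(a - b)


result = agglomerate([1.0, 1.1, 5.0, 5.2], toy_sim, threshold=-0.5)
print(result)
```

Raising the threshold produces more, smaller clusters; lowering it lets loosely related items merge.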
    • 23. 23 Summary of Metrics • Path Relatedness – Model: MeSH Context Vectors – Metrics: Semantics-enhanced shared context, Log Reduction – Threshold: ?? • MeSH Semantic Similarity – Model: MeSH Hierarchy – Metrics: Dice Similarity – Threshold: Manually
    • 24. 24 Automatic Threshold Selection RS-DFO Experiment Manual Threshold = 3.0 Gaussian Distribution [Axes: Path Relatedness Score vs. Number of Path Pairs]
    • 25. 25 Automatic Threshold Selection Gaussian Function [Axes: Path Relatedness Score vs. Expected Value]
    • 26. 26 Automatic Threshold Selection • Gaussian Distribution Diagram courtesy of Wikipedia* Points of Inflection
    • 27. 27 Threshold Comparisons Scenario Path Relatedness Score Max 2 Std Dev. Manual 3 Std Dev. RS-DFO 2.68 3.0 3.04 3.38 Testosterone-Sleep 3.35 3.5 3.8262 6.22 DEHP-Sepsis 3.94 4.0 4.53 4.84 Table 2: Path Relatedness Threshold Comparisons
    • 28. 28 Bucket Merging Ba Bb Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze: Introduction to information retrieval. Cambridge University Press 2008, ISBN 978-0-521-86571-5, pp. I-XXI, 1-482 Straggly Clusters Compact Clusters Broad Clusters
    • 29. 29 Subgraph Ranking Intra-Cluster Rank
    • 30. 30 Singleton Ranking Association Rarity
    • 31. 31 Summary of Metrics • Path Relatedness – Model: MeSH Context Vectors – Metrics: Semantics-enhanced shared context, Log Reduction – Manual Threshold for Semantic Similarity, Dice Similarity – Threshold: 2nd Standard Deviation from Mean of Gaussian • Bucket Relatedness – Model: Set of Paths – Metric: Inter-Cluster Similarity – Threshold: 2nd Standard Deviation from Mean of Gaussian • Subgraph Ranking – Metrics: Intra-Cluster Similarity, Singleton Rank (Association Rarity)
    • 32. 32 Algorithm Time Complexity: Θ(N² log N)
    • 33. Literature-Based Discovery Context-Driven Subgraph Model Foundations Automatic Subgraph Creation Experimental Results Dissertation Contributions Knowledge Exploration Limitations & Future Work
    • 34. 34 Raynaud Syndrome – Dietary Fish Oil Inferred predicates Path Relatedness Threshold = 3σ
    • 35. Scenario 1: Raynaud Syndrome – Dietary Fish Oil
      Details: Cut-off date: Nov. 1985; by D. R. Swanson (Article)
      Intermediate | Association | Status
      Blood Viscosity | Dietary Fish Oils INHIBITS Blood Viscosity; Blood Viscosity CAUSES Raynaud Syndrome | ZR-15
      Platelet Aggregation | Dietary Fish Oils INHIBITS Platelet Aggregation; Platelet Aggregation CAUSES Raynaud Syndrome | S1
      Vasoconstriction | Dietary Fish Oils INHIBITS Vasoconstriction; Vasoconstriction CAUSES Raynaud Syndrome |
      Legend: ZR – zero rarity singleton; S – Subgraph; Not Found
      Results available online: http://wiki.knoesis.org/index.php/Obvio#Automatic_Subgraph_Creation
      Obvio Web Application: http://knoesis-hpco.cs.wright.edu/obvio/
    • 36. Scenario 2: Magnesium – Migraine
      Details: Cut-off date: Apr. 1987; by D. R. Swanson (Article)
      Intermediate | Association | Status
      Calcium Channel Blockers | Magnesium ISA Calcium Channel Blocker; Calcium Channel Blockers TREATS Migraine | S22
      Epilepsy | Magnesium AFFECTS Epilepsy; Epilepsy CO_EXISTS_WITH Migraine | S9
      Hypoxia | Magnesium INHIBITS Hypoxia; Hypoxia ASSOCIATED_WITH Migraine |
      Inflammation | Magnesium INHIBITS Inflammation; Inflammation CAUSES Migraine | ZR-3
      Platelet Activity | Magnesium INHIBITS Platelet Aggregation; Platelet Aggregation CAUSES Migraine | S1
      Prostaglandins | Magnesium STIMULATES Prostaglandins; Prostaglandins DISRUPTS Migraine | S4
      Stress/Type A Personality | Stress INHIBITS Magnesium; Stress ASSOCIATED_WITH Migraine |
      Serotonin | Magnesium INHIBITS Serotonin; Serotonin CAUSES Migraine | S1
      Cortical Depression | Magnesium INHIBITS Spreading Cortical Depression; Spreading Cortical Depression CAUSES Migraine |
      Substance P | Magnesium INHIBITS Substance P; Substance P CAUSES Migraine |
      Vascular Mechanisms | Magnesium INHIBITS Vasoconstriction; Vasoconstriction CAUSES Migraine | S9
    • 37. Scenario 3: Somatomedin C – Arginine
      Details: Cut-off date: Apr. 1989; by D. R. Swanson (Article)
      Intermediate | Association | Status
      Growth Hormone | Arginine STIMULATES Growth Hormone; Growth Hormone STIMULATES Somatomedins (IGF1) | S5
      Body Weight (body mass) | Somatomedins (IGF1) STIMULATES Growth; Arginine STIMULATES Growth | S7
      Malnutrition | Somatomedins TREATS Malnutrition; Arginine TREATS Malnutrition | S7
      Wound Healing (NK activity) | Somatomedins STIMULATES Wound Healing; Arginine STIMULATES Wound Healing |
    • 38. Scenario 4: Indomethacin – Alzheimer’s Disease
      Details: Cut-off date: Jul. 1995; by Swanson/Smalheiser (Article)
      Intermediate | Association | Status
      Acetylcholine | Indomethacin INHIBITS Acetylcholine; Acetylcholine CAUSES Alzheimers | S4
      Lipid Peroxidation | Indomethacin INHIBITS Lipid Peroxidation; Lipid Peroxidation CAUSES Alzheimers | S2
      M2-Muscarinic | Indomethacin INHIBITS M2-Muscarinic; M2-Muscarinic CAUSES Alzheimers |
      Membrane Fluidity | Indomethacin INHIBITS Membrane Fluidity; Membrane Fluidity CAUSES Alzheimers |
      Lymphocytes | Indomethacin STIMULATES Natural Killer T-Cell Activity; T-Cell Activity INHIBITS Alzheimers | S14
      Thyrotropin | Indomethacin STIMULATES Thyrotropin; Thyrotropin AFFECTS Alzheimers | ZR-20
      T-lymphocytes (T-Cells) | Indomethacin STIMULATES T-lymphocytes; T-lymphocyte Activity INHIBITS Alzheimers | S3
    • 39. Scenario 5: Estrogen – Alzheimer’s Disease
      Details: Cut-off date: Jul. 1995; by Swanson/Smalheiser (Article)
      Intermediate | Association | Status
      Antioxidant Activity | Estrogen INHIBITS Antioxidant Activity; Antioxidant Activity CAUSES Alzheimers | S4
      Apolipoprotein E (ApoE) | Estrogen INHIBITS ApoE; ApoE CAUSES Alzheimers | S3
      Calbindin D28k | Estrogen REGULATES Calbindin D28k; Calbindin D28k AFFECTS Alzheimers | S4
      Cathepsin D | Estrogen STIMULATES Cathepsin D; Cathepsin D PREVENTS Alzheimers |
      Cytochrome C Oxidase Subunit III | Estrogen STIMULATES Cytochrome C Oxidase Subunit III; Cytochrome C Oxidase Subunit III AFFECTS Alzheimers |
      Glutamate | Estrogen STIMULATES Glutamate; Glutamate AFFECTS Alzheimers |
      Receptor Polymorphism | Estrogen EXHIBITS Receptor Polymorphism; Receptor Polymorphism AFFECTS Alzheimers |
    • 40. Scenario 6: Calcium Independent PLA2 – Schizophrenia
      Details: Cut-off date: 1997; by Swanson/Smalheiser (Article)
      Intermediate | Association | Status
      Oxidative Stress | Oxidative Stress INHIBITS Calcium-Independent PLA2; Oxidative Stress CAUSES Schizophrenia | ZR-2
      Selenium | Selenium INHIBITS Calcium-Independent PLA2; Selenium PREVENTS Schizophrenia | ZR-2
      Vitamin E | Vitamin E INHIBITS Calcium-Independent PLA2; Vitamin E PREVENTS Schizophrenia | ZR-2
    • 41. Scenario 7: Chlorpromazine – Cardiac Hypertrophy
      Details: Cut-off date: 01/01/2002; by J. D. Wren (Article)
      Intermediate | Association | Status
      Calcineurin | Chlorpromazine INHIBITS Calcineurin; Calcineurin CAUSES Cardiac Hypertrophy | S5
      Isoproterenol | Chlorpromazine INHIBITS Isoproterenol; Isoproterenol CAUSES Cardiomegaly | S12
    • 42. Scenario 8: Testosterone – Sleep
      Details: Cut-off date: 01/01/2012; by Miller/Rindflesch (Article)
      Intermediate | Association | Status
      Cortisol/Hydrocortisone | Testosterone INHIBITS Cortisol; Cortisol DISRUPTS Sleep | S7
    • 43. Scenario 9: Diethylhexyl Phthalate (DEHP) – Sepsis
      Details: Cut-off date: 01/01/2013; by Cairelli/Rindflesch (Article)
      Intermediate | Association | Status
      PParGamma | DEHP STIMULATES PParGamma; PParGamma INHIBITS Sepsis |
    • 44. 44 Statistical Evaluation Association Rarity Interestingness
    • 45. 45 Statistical Evaluation
      Table 3: Rarity and Interestingness score of the subgraphs in the rediscoveries
      Experiment | # Unique Associations | Total MEDLINE Frequency | Rarity r(E) | Interestingness I(E)
      Raynaud-Fish Oil | 10 | 0 | 0.00 | 1.00
      Magnesium-Migraine | 48 | 27 | 0.56 | 0.64
      SomaC-Arginine | 18 | 306 | 17.00 | 0.06
      Indomethacin-Alzheimers | 21 | 9 | 0.43 | 0.70
      Estrogen-Alzheimers | 42 | 36 | 0.86 | 0.54
      PLA2-Schizophrenia | 10 | 0 | 0.00 | 1.00
      CPZ-Cardiac Hypertrophy | 21 | 2 | 0.10 | 0.91
      Testosterone-Sleep | 61 | 654 | 10.72 | 0.09
      Average | 29 | 129 | 3.71 | 0.62
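The formulas behind Table 3 are not included in this transcript. The values are, however, consistent with rarity being the total MEDLINE frequency divided by the number of unique associations, and interestingness being 1 / (1 + rarity). The sketch below checks that reading against a few rows; both formulas and the function names are inferences from the table, not definitions quoted from the dissertation.

```python
def rarity(total_medline_frequency, unique_associations):
    """Average number of MEDLINE mentions per association (inferred formula)."""
    return total_medline_frequency / unique_associations


def interestingness(r):
    """Inverse relationship with rarity (inferred formula): 1 / (1 + r)."""
    return 1.0 / (1.0 + r)


# (experiment, unique associations, total MEDLINE frequency) from Table 3.
rows = [
    ("Raynaud-Fish Oil", 10, 0),
    ("Magnesium-Migraine", 48, 27),
    ("SomaC-Arginine", 18, 306),
    ("Testosterone-Sleep", 61, 654),
]

for name, unique, freq in rows:
    r = rarity(freq, unique)
    print(f"{name}: r(E)={r:.2f}, I(E)={interestingness(r):.2f}")
```

Under this reading, an experiment whose associations never appear in MEDLINE has rarity 0 and the maximum interestingness of 1, matching the Raynaud-Fish Oil and PLA2-Schizophrenia rows.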
    • 46. Literature-Based Discovery Context-Driven Subgraph Model Foundations Automatic Subgraph Creation Experimental Results Dissertation Contributions Knowledge Exploration Limitations & Future Work
    • 47. 47 Predications-based Knowledge Exploration Corpus Predications Graph Definitional Knowledge (UMLS + MeSH) Provenance Knowledge Abstraction D. Cameron, R. Kavuluru, O. Bodenreider, P. N. Mendes, A. P. Sheth, K. Thirunarayan. Semantic Predications for Complex Information Needs in Biomedical Literature. International Bioinformatics and Biomedical Conference (BIBM11). 512–519, 2011. Contribution #4 Combining Assertional and Definitional Knowledge for Knowledge Exploration
    • 48. 48 Levels of Contexts A CB Predication Context A CB1 B2 Bi Path Context A CB1 B2 B3 A CB1 B2 Shared Context A C PRODUCES INHIBITS Subgraph Context … … … … … … A C A C A C … Dimensions
    • 49. Literature-Based Discovery Context-Driven Subgraph Model Foundations Automatic Subgraph Creation Experimental Results Dissertation Contributions Knowledge Exploration Limitations & Future Work
    • 50. 50 Dissertation Contributions 1. Context-Driven Subgraph Model – Knowledge Rediscovery & Decomposition 2. Predication/Path Context – Vector of MeSH Descriptors 3. Shared Context – Background Knowledge (MeSH Hierarchy) 4. Semantic Predications-based Text Exploration – Obvio Web Application
    • 51. 51 Innovation System/Technique Technique Type Automatic Relational Evidence-based Thematic Results #Discoveries #Rediscoveries IRIDESCENT [108] Keyword 1 0 ARROWSMITH [84] Keyword/Concept 5 0 DAD [101,102] Concept 0 2 BITOLA [46] Concept 0 1 Litlinker [110] Concept 0 2 Manjal [87,88] Concept × 0 5 SemBT [40,41,42] Relations × × 0 1 BioSbKDS [47] Relations × × 0 1 Wilkowski [107] Graph × × 0 0 Ramakrishnan [72] Graph × × 0 1* Zhang [114] Graph × × × 0 0 Obvio [19, 21] Graph × × × × 0 8 ARROWSMITH v2 [86,98] Hybrid × 0 6* Semantic MEDLINE [18,63] Hybrid × × 2 0 Note: References are from the PhD Dissertation manuscript entitled: A Context Driven Subgraph Model for Literature-Based Discovery Table 4: Comparison of capabilities and accomplishments of LBD techniques
    • 52. Literature-Based Discovery Context-Driven Subgraph Model Foundations Automatic Subgraph Creation Experimental Results Dissertation Contributions Knowledge Exploration Limitations & Future Work
    • 53. 53 Limitations 1. Manual Threshold – MeSH Semantic Similarity 2. Path Relatedness Threshold – Only Approximate Gaussian 3. Definition of Context
    • 54. 54 Levels of Semantic Representation Keywords Concepts MeSH Descriptors Semantic Predications Ensemble of Features Relationships A B Semantic Predication PREDICATE
    • 55. 55 Limitations 1. Manual Threshold – MeSH Semantic Similarity 2. Path Relatedness Threshold – Only Approximate Gaussian 3. Definition of Context 4. MEDLINE Querying – Deep integration of Assertional/Definitional 5. Contradiction Detection 6. Statistical Evaluation 7. Scalability of Clustering Algorithm 8. Subgraph Labeling
    • 56. 56 Take Away • Future of Information Processing – Rich Knowledge Representations o Implicit, Formal, Powerful semantics – Application to Literature-Based Discovery
    • 57. 57 Conclusion • Context-Driven Subgraph Model – Manually create Complex Associations – Automatic Subgraph Creation o Novel definitions for Context and Shared Context o Multiple Thematic Dimensions – Predications-based Knowledge Exploration o Predicates o Highlighted MEDLINE sentences – Knowledge Rediscovery o 8 out of 9 existing scientific discoveries
    • 58. 58 Publications
      1. D. Cameron, R. Kavuluru, T. C. Rindflesch, O. Bodenreider, A. P. Sheth, K. Thirunarayan. Context-Driven Automatic Subgraph Creation for Literature-Based Discovery (under preparation)
      2. D. Cameron, A. P. Sheth, N. Jaykumar, G. Anand, K. Thirunarayan, G. A. Smith. A Hybrid Approach to Finding Relevant Social Media Content for Domain Specific Information Needs. (submitted to the Journal of Web Semantics)
      3. D. Cameron, O. Bodenreider, H. Yalamanchili, T. Danh, S. Vallabhaneni, K. Thirunarayan, A. P. Sheth, T. C. Rindflesch. A Graph-based Recovery and Decomposition of Swanson’s Hypothesis using Semantic Predications. Journal of Biomedical Informatics (JBI13). 46(2): 238–251, 2013.
      4. D. Cameron, G. A. Smith, R. Daniulaityte, A. P. Sheth, D. Dave, L. Chen, G. Anand, R. Carlson, K. Z. Watkins, R. Falck. PREDOSE: A Semantic Web Platform for Drug Abuse Epidemiology using Social Media. Journal of Biomedical Informatics (JBI13). 46(6): 985–997, 2013.
      5. R. Daniulaityte, R. Carlson, R. Falck, D. Cameron, S. Perera, L. Chen, A. P. Sheth. “I just wanted to tell you that Loperamide WILL WORK”: A Web-Based Study of Extra-medical use of Loperamide. Journal of Drug and Alcohol Dependence (DAD13) 130(1–3): 241–244, 2013.
      6. D. Cameron, V. Bhagwan, A. P. Sheth. Towards Comprehensive Longitudinal Healthcare Data Capture. International Workshop on Semantic Web in Literature-Based Discovery (SWLBD12). 241–247, 2012.
      7. R. Daniulaityte, R. Carlson, R. Falck, D. Cameron, S. Perera, L. Chen, A. P. Sheth. A Web-Based Study of Extra-medical use of Loperamide. The College on Problems of Drug Dependence (CPDD12), 2012.
      8. D. Cameron, R. Kavuluru, O. Bodenreider, P. N. Mendes, A. P. Sheth, K. Thirunarayan. Semantic Predications for Complex Information Needs in Biomedical Literature. International Bioinformatics and Biomedical Conference (BIBM11). 512–519, 2011.
      9. D. Cameron, B. Aleman-Meza, I. B. Arpinar, S. L. Decker, A. P. Sheth. A Taxonomy-based Model for Expertise Extrapolation. International Conference on Semantic Computing (ICSC10). 333–240, 2010.
      10. D. Cameron, P. N. Mendes, A. P. Sheth, V. Chan. Semantics-empowered Text Exploration for Knowledge Discovery. ACM Southeast Conference (ACMSE10). 14, 2010.
      11. C. Thomas, W. Wang, P. Mehra, D. Cameron, P. N. Mendes, A. P. Sheth. What Goes Around Comes Around – Improving Linked Open Data through On-Demand Model Creation. Web Science Conference (WebSci10), 2010.
      12. P. N. Mendes, P. Kapanipathi, D. Cameron, A. P. Sheth. Dynamic Associative Relationships on the Linked Data Web. Web Science Conference (WebSci10), 2010.
    • 59. 59 Research Expertise Literature-Based Discovery Text Mining Question Answering [1] Information Retrieval [2] [3] [6] [4] [8] [10] [5] [7]
    • 60. 60 Parting Words “...some day the piecing together of dissociated knowledge will open up such terrifying vistas of reality,...that we shall either go mad from the revelation or flee from the deadly light into the peace and safety of a new dark age.” – H. P. Lovecraft (The Call of Cthulhu, The Horror in Clay). H. P. Lovecraft. The Call of Cthulhu. In S. T. Joshi, editor. The Call of Cthulhu and Other Weird Stories. Penguin Books Ltd., London, 1999
    • 61. 61 Acknowledgements • Olivier Bodenreider • Marcelo Fiszman • Mike Cairelli • Swapna Abhyankar • Drashti Dave • Dongwook Shin • Special Thanks o Pavan o Shreyansh o Swapnil o Nishita • PREDOSE Team o Nishita o Gaurish o Alan o Revathy
    • 62. 62 Ph.D. Committee Members Amit P. Sheth (Advisor) T.K. Prasad Michael Raymer Ramakanth Kavuluru Thomas C. Rindflesch Varun Bhagwan