48
Levels of Contexts
A CB
Predication
Context
A CB1 B2 Bi
Path
Context
A CB1 B2 B3
A CB1 B2
Shared
Context
A C
PRODUCES
I...
50
Dissertation Contributions
1. Context-Driven Subgraph Model
– Knowledge Rediscovery & Decomposition
2. Predication/Path...
51
Innovation
System/Technique
Technique
Type
Automatic Relational
Evidence-
based
Thematic
Results
#Discoveries #Rediscov...
53
Limitations
1. Manual Threshold
– MeSH Semantic Similarity
2. Path Relatedness Threshold
– Only Approximate Gaussian
3....
54
Levels of Semantic Representation
Keywords
Concepts
MeSH Descriptors
Semantic Predications
Ensemble of Features
Relatio...
56
Take Away
• Future of Information Processing
– Rich Knowledge Representations
o Implicit, Formal, Powerful semantics
– ...
57
Conclusion
• Context-Driven Subgraph Model
– Manually create Complex Associations
– Automatic Subgraph Creation
o Novel...
58
Publications
1. D. Cameron, R. Kavuluru, T. C. Rindflesch, O. Bodenreider, A. P. Sheth, K. Thirunarayan. Context-Driven...
59
Research Expertise
Literature-Based
Discovery
Text MiningQuestion
Answering
[1]
Information
Retrieval
[2]
[3]
[6]
[4]
[...
60
Parting Words
“...some day the piecing together of dissociated knowledge will open up such
terrifying vistas of reality...
61
Acknowledgements
• Olivier Bodenreider
• Marcelo Fiszman
• Mike Cairelli
• Swapna Abhyankar
• Drashti Dave
• Dongwook S...
62
Ph.D. Committee Members
Amit P. Sheth
(Advisor)
T.K. Prasad Michael Raymer
Ramakanth Kavuluru Thomas C. Rindflesch Varu...

Delroy Cameron's Dissertation Defense: A Context-Driven Subgraph Model for Literature-based Discovery


Literature-Based Discovery (LBD) refers to the process of uncovering hidden connections that are implicit in scientific literature. Numerous hypotheses have been generated from scientific literature, which influenced innovations in diagnosis, treatment, prevention and overall public health. However, much of the existing research on discovering hidden connections among concepts has used distributional statistics and graph-theoretic measures to capture implicit associations. Such metrics do not explicitly capture the semantics of hidden connections. ...

While effective in some situations, the practice of relying on domain expertise, structured background knowledge and heuristics to complement distributional and graph-theoretic approaches has serious limitations. ...

This dissertation proposes an innovative context-driven, automatic subgraph creation method for finding hidden and complex associations among concepts, along multiple thematic dimensions. It outlines definitions for context and shared context, based on implicit and explicit (or formal) semantics, which compensate for deficiencies in statistical and graph-based metrics. It also eliminates the need for heuristics a priori. An evidence-based evaluation of the proposed framework showed that 8 out of 9 existing scientific discoveries could be recovered using this approach. Additionally, insights into the meaning of associations could be obtained using provenance provided by the system. In a statistical evaluation to determine the interestingness of the generated subgraphs, it was observed that an arbitrary association is mentioned in only approximately 4 articles in MEDLINE, on average. These results suggest that leveraging implicit and explicit context, as defined in this dissertation, is an advancement of the state-of-the-art in LBD research.

Ph.D. Committee: Drs. Amit Sheth (Advisor), TK Prasad, Michael Raymer,
Ramakanth Kavuluru (UKY), Thomas C. Rindflesch (NLM) and Varun Bhagwan (Yahoo! Labs)

Relevant Publications (more at: http://knoesis.wright.edu/students/delroy/)

D. Cameron, R. Kavuluru, T. C. Rindflesch, O. Bodenreider, A. P. Sheth, K. Thirunarayan. Leveraging Distributional Semantics for Domain Agnostic Literature-Based Discovery (under preparation)

D. Cameron, O. Bodenreider, H. Yalamanchili, T. Danh, S. Vallabhaneni, K. Thirunarayan, A. P. Sheth, T. C. Rindflesch. A Graph-based Recovery and Decomposition of Swanson’s Hypothesis using Semantic Predications. Journal of Biomedical Informatics (JBI13), 46(2): 238–251, 2013

D. Cameron, R. Kavuluru, O. Bodenreider, P. N. Mendes, A. P. Sheth, K. Thirunarayan. Semantic Predications for Complex Information Needs in Biomedical Literature. International Bioinformatics and Biomedical Conference (BIBM11), pp. 512–519, 2011 (acceptance rate=19.4%)

D. Cameron, P. N. Mendes, A. P. Sheth, V. Chan. Semantics-empowered Text Exploration for Knowledge Discovery. ACM Southeast Conference (ACMSE10), 14, 2010

  • Thank everyone for coming.
    Feel free to ask questions
  • Explored the Research Question of: Characteristics of Inheritance of Traits across Generations of Peas
    Gregor Johann Mendel – Debunked Blending Inheritance, Founder of Genetics, Pea Hybridization, 1866

    EXPERIMENTATION

    OBSERVATION
    - Inheritance of traits across generations seemed to extend beyond the immediate parents in the lineage

    EXPLANATION
    - Inheritance of traits appears to be influenced by the presence of dominant and recessive factors, which split, then independently recombine

    THEORY
    - Law of Segregation
    - Law of Independent assortment

    Explored the Research Question of: The mechanism of Cell Division (cytology) in the embryos of Grasshoppers
    Walter Sutton & Theodor Boveri – Cytology 1903, Genetic Inheritance, each cell split is equally likely – gives the causal mechanism for Mendel’s law

    OBSERVED
    - splitting of chromosomes in the cells of grasshoppers (meiosis)

    EXPLAINED
    - Mendel's laws of inheritance applied to chromosomes at the cellular level in living organisms

    THEORIZED
    - Chromosomes are the basis for genetic inheritance

    Jorn Dyerberg & Hans Olaf Bang (1913–1994) – The Greenland Eskimo
    OBSERVED
    - Greenland Eskimos, no AMI

    EXPLAINED
    - diet rich in omega-3 fatty acids

    THEORIZED
    - marine oils can treat thrombosis, atherosclerosis, and AMI
  • LBD is now driven by digital data (in silico as opposed to in vivo)
    Four activities involved in the science of making discoveries under the guidance of a Human
  • An information processing system that leverages rich representations of textual content from scientific literature based on implicit and explicit context can provide effective means for literature-based discovery. This has been convincingly demonstrated through rediscovery of several well-known associations (between biomedical concepts) and their substantiation using MEDLINE and the Medical Subject Headings (MeSH) vocabulary.

  • Vioxx Brand Name (Rofecoxib is a nonsteroidal anti-inflammatory drug - NSAID)
    - stronger pain medication than Naproxen (Brand Name Aleve)
    - easier on the stomach than Naproxen

    2004 Merck’s Clinical Trial - proved risk of heart attack

    Lawsuit by 50,000 patients

  • Vioxx (anti-inflammatory)
    - stronger
    - less severe side effects (easier on the stomach)

    Lawsuit by 50,000 patients

  • LBD is different from traditional research
    Direct observations of the object of interest

    Keyword-based – error prone due to absence of text normalization to standard concepts
    Concept-based – (also Semantics-based, concepts but no explicit relationships)
    Relations-based – (explicit relationships) but limited complexity, unable to capture causality, mechanisms of interaction
    Graph-based - Giant Component, Clustering Coefficient, Geodesic, Centrality (betweenness, closeness)
    Hybrid – combine machine learning, summarization with traditional LBD approaches
  • Rich representations
    Personalization
    Google Knowledge Graph
    Human Activity Modeling
    Mobile Applications/Advertising (get examples)

    Two goals for automation:
    Create subgraphs that capture complex associations
    Along multiple thematic dimensions

    Use of background knowledge to improve LBD
    BKR
    MeSH



  • Context
    overcome combinatorial explosion
    enable scalability
  • Problem definition
    In terms of path relatedness
    Decomposed to semantic predication relatedness

    To achieve this, we have studied characteristics of MEDLINE abstracts
    Articles have properties/attributes
    Provide various levels of abstraction of the full text
  • Given a way to represent context of a path, subgraphs can be automatically created in 6 steps
  • Frequency is the epiphenomenon of context
  • Compute Path Relatedness
    Two Objectives
    Binarize the vectors
  • Notice the binary vectors

    MeSH Semantic Similarity
    Set-based (Jaccard, Dice)
    Path Length (Rada, Wu&Palmer, Leacock&Chodorow)
    Information Content (Lin, Resnik, Jiang&Conrath)
    Gloss Vectors(LSI)
  • Mean – weighted average of the points
    Variance – average of the squared distances from the mean
    Standard Deviation – square root of the Variance (what is normal, what is not)
  • Single-link
    Cluster if maximum similarity is above the threshold
    Straggly Clusters

    Complete-link
    Cluster if minimum similarity is above threshold
    Strict, compact clusters

    Group-average
    Average similarity over all pairs, both intra-cluster and inter-cluster
    Well connected but more broad connections than complete link
  • Definitional Knowledge – Top-down
    Assertional Knowledge – Bottom-up

    Using both together is probably best.
  • Analogy
    Google Knowledge Graph
    IBM Human Activity Modeling
    Yahoo Personalization
    Biomedicine Literature-based Discovery
    Mobile Applications
  • Transcript of "Delroy Cameron's Dissertation Defense: A Context-Driven Subgraph Model for Literature-based Discovery"

    1. A CONTEXT-DRIVEN SUBGRAPH MODEL FOR LITERATURE-BASED DISCOVERY. PH.D. DISSERTATION DEFENSE. DELROY CAMERON, AUGUST 18, 2014. PH.D. COMMITTEE: AMIT P. SHETH (ADVISOR), KRISHNAPRASAD THIRUNARAYAN, MICHAEL RAYMER, RAMAKANTH KAVULURU (UKY), THOMAS C. RINDFLESCH (NIH), VARUN BHAGWAN (YAHOO! LABS). "All truths are easy to understand once they are discovered; the point is to discover them." (Galileo Galilei, 1564–1642)
    2. Historical Perspectives: Walter Sutton (1877 – 1916); Theodor Boveri (1862 – 1915); Gregor Johann Mendel (1822 – 1884). Mendelian Laws of Inheritance (1866); Boveri-Sutton Chromosome Theory (1903)
    3. Science of Making Discoveries: Discovery; Information Processing System. What is promising?
    4. Thesis Statement: An information processing system that leverages rich representations of textual content from scientific literature based on implicit and explicit context can provide effective means for literature-based discovery.
    5. Motivation. [Timeline figure: Rofecoxib (Merck & Co.) TREAT Osteoarthritis, 1999; Increased risk of Heart Attack; Confirmed by Clinical Trial; Vioxx Withdrawn; settlements of $254.3 million, $4.85 billion, $950 million and $23 million; year labels 2002, 2004, 2005, 2007, 2011, 2013.]
    6. Motivation: Literature-Based Discovery (LBD)
    7. Literature-Based Discovery (LBD)
       Models: ABC Model (A -> B -> C); AnC Model (A -> B1 -> B2 -> ... -> Bi -> C); Context-Driven Subgraph Model. Source: Wikipedia - http://en.wikipedia.org/wiki/Don_R._Swanson
       [Timeline figure of LBD systems, with year labels from 1986 to 2013, grouped as Keyword-based, Concept-based, Relations-based, Graph-based and Hybrid. Systems and labels shown: ARROWSMITH v1 (Term Frequency); IRIDESCENT (Term Co-occurrence); DAD (MetaMAP, UMLS); Litlinker (MeSH, UMLS, Rules, Level of Support); Manjal (UMLS, MeSH, Topic Profiles, TF-IDF); Rajolink (MeSH, Rarity); BioSbKDS (UMLS Relations, MeSH); BITOLA (UMLS, MeSH, Assoc. Rules, Confidence); ACS (2004) (MeSH, Hebbian Learning); SemBT (Semantic Predications, Level of Support); Discovery Browsing (Degree Centrality, Cooperative Reciprocity, Manual); ARROWSMITH v2 (8 Features, 2007); Semantic MEDLINE (Summarization, Discovery Browsing); Epiphanet (Predications-based Semantic Indexing); CoPub (Keywords, Mutual Information); Discovery Patterns (predicates CAUSES, INHIBITS, PRODUCES between A, B and C).]
       Contribution #1: Context-Driven Subgraph Model for LBD
       Literature-based discovery refers to the use of papers and other academic publications (the “literature”) to find new relationships between existing knowledge (the “discovery”). Definition courtesy of Wikipedia: http://en.wikipedia.org/wiki/Literature-based_discovery
    8. Application: Raynaud Syndrome – Fish Oil
       [Figure: three views of the same association. Keyword/Concept based: Dietary Fish Oils, Platelet Aggregation, Raynaud Syndrome. Relations based: Dietary Fish Oils DISRUPTS Platelet Aggregation; Platelet Aggregation CAUSES Raynaud Syndrome. Subgraph based: nodes Dietary Fish Oils, Prostaglandin I3, Epoprostenol, Prostaglandin, Platelet Aggregation, Raynaud Syndrome; predicates ISA, CONVERTS_TO, DISRUPTS, STIMULATES, TREATS, CAUSES. Legend: Inferred predicates.]
       D. Cameron, O. Bodenreider, H. Yalamanchili, T. Danh, S. Vallabhaneni, K. Thirunarayan, A. P. Sheth, T. C. Rindflesch. A Graph-based Recovery and Decomposition of Swanson’s Hypothesis using Semantic Predications. Journal of Biomedical Informatics (JBI13). 46(2): 238–251, 2013.
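The relations-based view on this slide can be illustrated with a small sketch. It assumes predications are stored as (subject, predicate, object) triples and enumerates short paths between a source and a target concept, as in the ABC model. The function names `build_graph` and `find_paths` are hypothetical, and the four triples are taken from the slides; this is not the dissertation's implementation.

```python
from collections import defaultdict

# Predications as (subject, predicate, object) triples, as they appear on the slides.
PREDICATIONS = [
    ("Dietary Fish Oils", "DISRUPTS", "Platelet Aggregation"),
    ("Platelet Aggregation", "CAUSES", "Raynaud Syndrome"),
    ("Dietary Fish Oils", "INHIBITS", "Blood Viscosity"),
    ("Blood Viscosity", "CAUSES", "Raynaud Syndrome"),
]


def build_graph(predications):
    """Index triples by subject: subject -> [(predicate, object), ...]."""
    graph = defaultdict(list)
    for subject, predicate, obj in predications:
        graph[subject].append((predicate, obj))
    return graph


def find_paths(graph, source, target, max_edges=2):
    """Enumerate simple paths from source to target with at most max_edges predications."""
    paths = []

    def walk(node, path, visited):
        if node == target:
            paths.append(list(path))
            return
        if len(path) == max_edges:
            return
        for predicate, nxt in graph.get(node, []):
            if nxt not in visited:
                path.append((node, predicate, nxt))
                walk(nxt, path, visited | {nxt})
                path.pop()

    walk(source, [], {source})
    return paths


graph = build_graph(PREDICATIONS)
paths = find_paths(graph, "Dietary Fish Oils", "Raynaud Syndrome")
for path in paths:
    print(" ; ".join(f"{s} {p} {o}" for s, p, o in path))
```

With `max_edges=2` this corresponds to the A -> B -> C case; raising the limit allows longer chains of intermediates.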
    9. Comparison
       Columns: Scenario; Intermediate; Cameron [19]; Srinivasan [88, 89]; Weeber [101, 102]; Gordon [36,37,38]; Hristovski [40]
       Scenario: Raynaud Syndrome – Dietary Fish Oils
       Blood Viscosity: × × × × ×
       Platelet Aggregation: × × × × ×
       Vascular Reactivity: × × × ×
       Ramakrishnan [72]*: ? ? ?
       Table 1: Comparison of intermediates rediscovered for Raynaud Syndrome – Dietary Fish Oil
    10. [Figure: expanded subgraph for Raynaud Syndrome – Dietary Fish Oils. Nodes: Dietary Fish Oils, Fatty Acid, Essential Fatty Acid, Triglyceride, Lipid, Prostaglandins, Prostacyclin (PGI2), Prostaglandin I3 (PGI3), Platelet Aggregation, Blood Viscosity, Vasoconstriction, Cellular Activity, Blood Physiology, Tissue Function, Raynaud Syndrome. Predicates: ISA, CONVERTS_TO, DISRUPTS, INHIBITS, AFFECTS, STIMULATES, TREATS, CAUSES.]
        Problem: How to automate this?
        D. Cameron, O. Bodenreider, H. Yalamanchili, T. Danh, S. Vallabhaneni, K. Thirunarayan, A. P. Sheth, T. C. Rindflesch. A Graph-based Recovery and Decomposition of Swanson’s Hypothesis using Semantic Predications. Journal of Biomedical Informatics (JBI13). 46(2): 238–251, 2013.
    11. Outline: Literature-Based Discovery; Context-Driven Subgraph Model; Foundations; Automatic Subgraph Creation; Experimental Results; Dissertation Contributions; Knowledge Exploration; Limitations & Future Work
    12. PREDICATIONS GRAPH
    13. Subgraph Model: Predications Graph (G); Candidate Graph (RG); Subgraphs (SG). No two contexts are the same: R(s,t) is divided into R(s,t)(c1), R(s,t)(c2), ..., R(s,t)(ck). What is context?
    14. Outline: Literature-Based Discovery; Context-Driven Subgraph Model; Foundations; Automatic Subgraph Creation; Experimental Results; Dissertation Contributions; Knowledge Exploration; Limitations & Future Work
    15. Semantic Underpinnings
        • Path Relatedness
        • Semantic Predication Context
        Context Distribution Assumption: The context of a semantic predication can be expressed as the distribution of all MeSH descriptors associated with all articles that contain it.
        Relational Semantic Summary; Textual Semantic Summary; Concept-Level Semantic Summary
        Interchangeability Assumption: The concept-level and relational semantic summary of a MEDLINE article are interchangeable.
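The Context Distribution Assumption can be made concrete with a minimal sketch. It counts the MeSH descriptors of every article that contains a given predication. The toy corpus `ARTICLES`, its descriptor lists, and both function names are hypothetical, and summing predication contexts to get a path context is an assumption made here for illustration; the slides do not state how the path vector is aggregated.

```python
from collections import Counter

P1 = ("Dietary Fish Oils", "DISRUPTS", "Platelet Aggregation")
P2 = ("Platelet Aggregation", "CAUSES", "Raynaud Syndrome")

# Hypothetical toy corpus: each article lists the predications extracted from it
# and the MeSH descriptors assigned to it.
ARTICLES = {
    "a1": {"predications": {P1},
           "mesh": ["Fish Oils", "Platelet Aggregation", "Epoprostenol"]},
    "a2": {"predications": {P1, P2},
           "mesh": ["Platelet Aggregation", "Raynaud Disease"]},
    "a3": {"predications": {P2},
           "mesh": ["Raynaud Disease", "Vasoconstriction"]},
}


def predication_context(predication, articles):
    """Distribution of MeSH descriptors over all articles containing the predication."""
    counts = Counter()
    for article in articles.values():
        if predication in article["predications"]:
            counts.update(article["mesh"])
    return counts


def path_context(path, articles):
    """Context of a path, here assumed to be the sum of its predication contexts."""
    total = Counter()
    for predication in path:
        total += predication_context(predication, articles)
    return total


print(predication_context(P1, ARTICLES))
print(path_context([P1, P2], ARTICLES))
```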
    16. Linguistic Underpinnings
        Distributional Semantics: Linguistic items with similar distributions have similar meanings. “You shall know a word by the company it keeps” – J. R. Firth 1957
        Semantic Predications with shared contexts in their distributions are related
        Context-sensitive nature of meaning
    17. Outline: Literature-Based Discovery; Context-Driven Subgraph Model; Foundations; Automatic Subgraph Creation; Experimental Results; Dissertation Contributions; Knowledge Exploration; Limitations & Future Work
    18. Automatic Subgraph Creation
        [Figure: MeSH hierarchy and the MeSH context vectors of two paths pi and pj over descriptors such as m1, m2, m5, m7, m8, m9; Semantic Relatedness of MeSH Context Vectors.]
        Contribution #2: Context of a path as a vector of MeSH Descriptors
    19. Path Relatedness
        Objective #1: Maximize weights of In-Context Descriptors
        Objective #2: Minimize weights of Out-Of-Context Descriptors
        [Figure: weighted context vectors C(pi) and C(pj) over MeSH descriptors m1 to m13; p – path, t – semantic predication.]
    20. Path Relatedness: Shared Context
        [Figure: binarized context vectors C(pi) and C(pj); descriptors m3, m4, m5 and m9 to m13 include Platelet aggregation, Platelet activation, Epoprostenol, Platelet adhesiveness, Prostaglandins. MeSH G-Tree terms: platelet aggregation, platelet adhesiveness, platelet activation, hemostasis, Blood physiological process, Blood physiological phenomena, Circulatory and respiratory physiological phenomena. MeSH D-Tree terms: Epoprostenol, Prostaglandins I, Prostaglandins, Eicosanoids, Arachidonic Acids, Fatty Acids, Unsaturated, Fatty Acids, Lipids.]
        Contribution #3: Structured Background Knowledge for computing shared context of paths
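One way to read "shared context" with structured background knowledge is sketched below. Two descriptors are treated as matching when they are identical, or when the Dice similarity of their MeSH ancestor sets reaches a manual threshold. The `ANCESTORS` table is a hypothetical, simplified stand-in for the MeSH hierarchy, the threshold value 0.5 is arbitrary, and none of this reproduces the path relatedness score on the next slide, whose formula is not in the transcript.

```python
# Hypothetical, simplified ancestor sets (each descriptor plus some of its MeSH ancestors).
ANCESTORS = {
    "Platelet Aggregation": {"Platelet Aggregation", "Hemostasis",
                             "Blood Physiological Processes"},
    "Platelet Activation": {"Platelet Activation", "Hemostasis",
                            "Blood Physiological Processes"},
    "Epoprostenol": {"Epoprostenol", "Prostaglandins I", "Prostaglandins",
                     "Eicosanoids"},
    "Lipids": {"Lipids"},
}


def dice(a, b):
    """Dice coefficient of two sets: 2|A ∩ B| / (|A| + |B|)."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def binarize(weights):
    """Keep only descriptors with a positive weight (the binary context vector)."""
    return {descriptor for descriptor, weight in weights.items() if weight > 0}


def shared_context(ctx_i, ctx_j, ancestors, threshold=0.5):
    """Descriptors of ctx_i that match a descriptor of ctx_j exactly or semantically."""
    set_i, set_j = binarize(ctx_i), binarize(ctx_j)
    shared = set()
    for m in set_i:
        for n in set_j:
            similar = dice(ancestors.get(m, {m}), ancestors.get(n, {n})) >= threshold
            if m == n or similar:
                shared.add(m)
                break
    return shared


ctx_i = {"Platelet Aggregation": 3, "Lipids": 1, "Epoprostenol": 0}
ctx_j = {"Platelet Activation": 2, "Epoprostenol": 1}
print(shared_context(ctx_i, ctx_j, ANCESTORS))
```

In this toy case "Platelet Aggregation" counts as shared because it sits close to "Platelet Activation" in the hierarchy, even though the two vectors have no descriptor in common.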
    21. Path Relatedness Score. *Dictionary of Distances, Elena Deza, Michel-Marie Deza, Elsevier, 2006
    22. Hierarchical Agglomerative Clustering
        [Figure: paths between A and C grouped into buckets over Iteration 1 ... Iteration n, subject to the Path Relatedness Threshold.]
        1. Bucket Population
        2. Bucket Merging
        3. Subgraph Ranking
    23. Summary of Metrics
        • Path Relatedness
          – Model: MeSH Context Vectors
          – Metrics: Semantics-enhanced shared context, Log Reduction
          – Threshold: ??
        • MeSH Semantic Similarity
          – Model: MeSH Hierarchy
          – Metrics: Dice Similarity
          – Threshold: Manually
    24. Automatic Threshold Selection. RS-DFO Experiment; Manual Threshold = 3.0; Gaussian Distribution. [Plot axes: Path Relatedness Score; Number of Path Pairs]
    25. Automatic Threshold Selection. Gaussian Function. [Plot axes: Path Relatedness Score; Expected Value]
    26. Automatic Threshold Selection
        • Gaussian Distribution: Points of Inflection. Diagram courtesy of Wikipedia*
    27. Threshold Comparisons (Path Relatedness Score)
        Scenario | 2 Std Dev. | Manual | 3 Std Dev. | Max
        RS-DFO | 2.68 | 3.0 | 3.04 | 3.38
        Testosterone-Sleep | 3.35 | 3.5 | 3.8262 | 6.22
        DEHP-Sepsis | 3.94 | 4.0 | 4.53 | 4.84
        Table 2: Path Relatedness Threshold Comparisons
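The "2 Std Dev." and "3 Std Dev." thresholds can be sketched as follows, assuming the path relatedness scores of all path pairs are approximately Gaussian and that the threshold lies k standard deviations above the mean. The function names and the toy scores are hypothetical; they are not values from the experiments.

```python
import statistics


def gaussian_threshold(scores, k=2.0):
    """Threshold placed k population standard deviations above the mean score."""
    mean = statistics.fmean(scores)
    std = statistics.pstdev(scores)
    return mean + k * std


def related_pairs(pair_scores, k=2.0):
    """Keep the path pairs whose relatedness score reaches the automatic threshold."""
    threshold = gaussian_threshold(list(pair_scores.values()), k)
    return {pair for pair, score in pair_scores.items() if score >= threshold}


toy_scores = [1.0, 2.0, 3.0, 4.0, 5.0]
print(gaussian_threshold(toy_scores, k=2))  # mean 3.0 plus two standard deviations
print(gaussian_threshold(toy_scores, k=3))
```

A larger k gives a stricter threshold, so fewer path pairs are considered related and the resulting buckets are smaller.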
    28. Bucket Merging of buckets Ba and Bb: Straggly Clusters; Compact Clusters; Broad Clusters. Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze: Introduction to information retrieval. Cambridge University Press 2008, ISBN 978-0-521-86571-5, pp. I-XXI, 1-482
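Threshold-based bucket merging under the three linkage criteria named on this slide can be sketched as below. Single-link uses the maximum pairwise similarity between two buckets, complete-link the minimum, and group-average the mean over all pairs of the would-be merged bucket. The functions `linkage_similarity` and `merge_buckets` and the toy similarity `sim` are hypothetical, and this naive loop recomputes similarities at every step, so it does not achieve the Θ(N² log N) bound quoted for the algorithm.

```python
from itertools import combinations


def linkage_similarity(a, b, sim, method):
    """Similarity between buckets a and b under the chosen linkage criterion."""
    cross = [sim(x, y) for x in a for y in b]
    if method == "single":
        return max(cross)
    if method == "complete":
        return min(cross)
    if method == "average":
        merged = a + b
        pairs = [sim(x, y) for x, y in combinations(merged, 2)]
        return sum(pairs) / len(pairs)
    raise ValueError(f"unknown linkage: {method}")


def merge_buckets(buckets, sim, threshold, method="single"):
    """Repeatedly merge the most similar pair of buckets while it meets the threshold."""
    buckets = [list(b) for b in buckets]
    while len(buckets) > 1:
        i, j = max(
            combinations(range(len(buckets)), 2),
            key=lambda ij: linkage_similarity(buckets[ij[0]], buckets[ij[1]], sim, method),
        )
        if linkage_similarity(buckets[i], buckets[j], sim, method) < threshold:
            break
        buckets[i] = buckets[i] + buckets[j]
        del buckets[j]  # j > i, so index i is unaffected
    return buckets


def sim(x, y):
    """Toy similarity on numbers: closer values are more similar."""
    return 1 / (1 + abs(x - y))


items = [[0], [1], [2], [10]]
print(merge_buckets(items, sim, threshold=0.4, method="single"))
print(merge_buckets(items, sim, threshold=0.4, method="complete"))
print(merge_buckets(items, sim, threshold=0.4, method="average"))
```

On this toy input single-link chains 0, 1 and 2 together, while complete-link stops after merging 0 and 1, which mirrors the straggly versus compact behaviour described on the slide.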
    29. Subgraph Ranking: Intra-Cluster Rank
    30. Singleton Ranking: Association Rarity
    31. Summary of Metrics
        • Path Relatedness
          – Model: MeSH Context Vectors
          – Metrics: Semantics-enhanced shared context, Log Reduction
          – Manual Threshold for Semantic Similarity, Dice Similarity
          – Threshold: 2nd Standard Deviation from Mean of Gaussian
        • Bucket Relatedness
          – Model: Set of Paths
          – Metric: Inter-Cluster Similarity
          – Threshold: 2nd Standard Deviation from Mean of Gaussian
        • Subgraph Ranking
          – Metrics: Intra-Cluster Similarity, Singleton Rank (Association Rarity)
    32. Algorithm. Time Complexity: Θ(N² log N)
    33. Outline: Literature-Based Discovery; Context-Driven Subgraph Model; Foundations; Automatic Subgraph Creation; Experimental Results; Dissertation Contributions; Knowledge Exploration; Limitations & Future Work
    34. Raynaud Syndrome – Dietary Fish Oil. Legend: Inferred predicates. Path Relatedness Threshold = 3σ
    35. 35. Scenario 1: Raynaud Syndrome – Dietary Fish Oil Details Intermediate Association Status Cut-off date: Nov. 1985 By. D. R. Swanson (Article) Blood Viscosity Dietary Fish Oils INHIBITS Blood Viscosity Blood Viscosity CAUSES Raynaud Syndrome ZR-15 Platelet Aggregation Dietary Fish Oils INHIBITS Platelet Aggregation Platelet Aggregation CAUSES Raynaud Syndrome S1 Vasoconstriction Dietary Fish Oils INHIBITS Vasoconstriction Vasoconstriction CAUSES Raynaud Syndrome Legend ZR-zero rarity singleton S-Subgraph Not Found Results available online: http://wiki.knoesis.org/index.php/Obvio#Automatic_Subgraph_Creation Obvio Web Application: http://knoesis-hpco.cs.wright.edu/obvio/
    36. 36. Scenario 2: Magnesium – Migraine Details Intermediate Association Status Cut-off date: Apr. 1987 By. D. R. Swanson (Article) Calcium Channel Blockers Magnesium ISA Calcium Channel Blocker Calcium Channel Blockers TREATS Migraine S22 Epilepsy Magnesium AFFECTS Epilepsy Epilepsy CO_EXISTS_WITH Migraine S9 Hypoxia Magnesium INHIBITS Hypoxia Hypoxia ASSOCIATED_WITH Migraine Inflammation Magnesium INHIBITS Inflammation Inflammation CAUSES Migraine ZR-3 Platelet Activity Magnesium INHIBITS Platelet Aggregation Platelet Aggregation CAUSES Migraine S1 Prostaglandins Magnesium STIMULATES Prostaglandins Prostaglandins DISRUPTS Migraine S4 Stress/Type A Personality STRESS INHIBITS Magnesium Stress ASSOICATED_WITH Migraine Serotonin Magnesium INHIBITS Serotonin Serotonin CAUSES Migraine S1 Cortical Depression Magnesium INHIBITS Spreading Cortical Depression Spreading Cortical Depression CAUSES Migraine Substance P Magnesium INHIBITS Substance P Substance P CAUSES Migraine Vascular Mechanisms Magnesium INHIBITS Vasoconstriction Vasoconstriction CAUSES Migraine S9 Legend ZR-zero rarity singleton S-Subgraph Not Found Results available online: http://wiki.knoesis.org/index.php/Obvio#Automatic_Subgraph_Creation Obvio Web Application: http://knoesis-hpco.cs.wright.edu/obvio/
    37. 37. Scenario 3: Somatomedin C – Arginine Details Intermediate Association Status Cut-off date: Apr. 1989 By. D. R. Swanson (Article) Growth Hormone Arginine STIMULATES Growth Hormone Growth Hormone STIMULATES Somatomedins (IGF1) S5 Body Weight (body mass) Somatomedins (IGF1) STIMULATES Growth Arginine STIMULATES Growth S7 Malnutrition Somatomedins TREATS Malnutrition Arginine TREATS Malnutrition S7 Wound Healing (NK activity) Somatomedins STIMULATES Wound Healing Arginine STIMULATES Wound Healing Results available online: http://wiki.knoesis.org/index.php/Obvio#Automatic_Subgraph_Creation Obvio Web Application: http://knoesis-hpco.cs.wright.edu/obvio/ Legend ZR-zero rarity singleton S-Subgraph Not Found
    38. 38. Scenario 4: Indomethacin – Alzheimer’s Disease Details Intermediate Association Status Cut-off date: Jul. 1995 By. Swanson/Smal heiser (Article) Acetylcholine Indomethacin INHIBITS Acetylcholine Acetylcholine CAUSES Alzheimers S4 Lipid Peroxidation Indomethacin INHIBITS Lipid Peroxidation Lipid Peroxidation CAUSES Alzheimers S2 M2-Muscarinic Indomethacin INHIBITS M2- Muscarinic M2-Muscarinic CAUSES Alzheimers Membrane Fluidity Indomethacin INHIBITS Membrane Fluidity Membrane Fluidity CAUSES Alzheimers Lymphocytes Indomethacin STIMULATES Natural Killer T-Cell Activity T-Cell Activity INHIBITS Alzheimers S14 Thyrotropin Indomethacin STIMULATES Thyrotropin Thyrotropin AFFECTS Alzheimers ZR-20 T-lymphocytes (T-Cells) Indomethacin STIMULATES T- lymphocytes T-lymphocyte Activity INHIBITS Alzheimers S3 Legend ZR-zero rarity singleton S-Subgraph Not Found Results available online: http://wiki.knoesis.org/index.php/Obvio#Automatic_Subgraph_Creation Obvio Web Application: http://knoesis-hpco.cs.wright.edu/obvio/
    39. 39. Scenario 5: Estrogen – Alzheimer’s Disease Details Intermediate Association Status Cut-off date: Jul. 1995 By. Swanson/Smal heiser (Article) Antioxidant Activity Estrogen INHIBITS Antioxidant Activity Antioxidant Activity CAUSES Alzheimers S4 Aliproprotein E (ApoE) Estrogen INHIBITS ApoE ApoE CAUSES Alzheimers S3 Calbindin D28k Estrogen REGULATES Caldindin D28k Calbindin D28k AFFECTS Alzheimers S4 Cathepsin D Estrogen STIMULATES Cathepsin D Cathepsin D PREVENTS Alzheimers Cytochrome C Oxidase Subunit III Estrogen STIMULATES Cytochrome C Oxidase Subunit III Cytochrome C Oxidase Subunit III AFFECTS Alzheimers Glutamate Estrogen STIMULATES Glutamate Glutamate AFFECTS Alzheimers Receptor Polymorphism Estrogen EXHIBITS Receptor Polymorphism Receptor Polymorphism AFFECTS Alzheimers Legend ZR-zero rarity singleton S-Subgraph Not Found Results available online: http://wiki.knoesis.org/index.php/Obvio#Automatic_Subgraph_Creation Obvio Web Application: http://knoesis-hpco.cs.wright.edu/obvio/
    40. 40. Scenario 6: Calcium Independent PLA2 – Schizophrenia Details Intermediate Association Status Cut-off date: 1997 By. Swanson/Smal heiser (Article) Oxidative Stress Oxidative Stress INHIBITS Calcium- Independent PLA2 Oxidative Stress CAUSES Schizophrenia ZR-2 Selenium Selenium INHIBITS Calcium- Independent PLA2 Selenium PREVENTS Schizophrenia ZR-2 Vitamin E Vitamin E INHIBITS Calcium- Independent PLA2 Vitamin E PREVENTS Schizophrenia ZR-2 Legend ZR-zero rarity singleton S-Subgraph Not Found Results available online: http://wiki.knoesis.org/index.php/Obvio#Automatic_Subgraph_Creation Obvio Web Application: http://knoesis-hpco.cs.wright.edu/obvio/
    41. 41. Scenario 7: Chlorpromazine – Cardiac Hypertrophy Details Intermediate Association Status Cut-off date: 01/01/2002 By. J. D. Wren (Article) Calcineurin Chlorpromazine INHIBITS Calcineurin Calcineurin CAUSES Cardiac Hypertrophy S5 Isoproterenol Chlorpromazine INHIBITS Isoproterenol Isoproterenol CAUSES Cardiamegaly S12 Legend ZR-zero rarity singleton S-Subgraph Not Found Results available online: http://wiki.knoesis.org/index.php/Obvio#Automatic_Subgraph_Creation Obvio Web Application: http://knoesis-hpco.cs.wright.edu/obvio/
    42. Scenario 8: Testosterone – Sleep. Details: Cut-off date: 01/01/2012; By: Miller/Rindflesch (Article). Intermediates (association; status): (1) Cortisol/Hydrocortisone: Testosterone INHIBITS Cortisol, Cortisol DISRUPTS Sleep; status S7. Legend: ZR: zero rarity; singleton; S: Subgraph; Not Found.
    43. Scenario 9: Diethylhexyl Phthalate (DEHP) – Sepsis. Details: Cut-off date: 01/01/2013; By: Cairelli/Rindflesch (Article). Intermediates (association; status): (1) PParGamma: DEHP STIMULATES PParGamma, PParGamma INHIBITS Sepsis. Legend: ZR: zero rarity; singleton; S: Subgraph; Not Found.
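Each scenario slide lists associations of the form A PREDICATE B and B PREDICATE C, where B is the intermediate linking the two endpoint concepts. The sketch below is only an illustration of that shape: the triples are copied from slides 41 to 43, while the names `PREDICATIONS`, `build_graph`, and `abc_paths` are hypothetical and are not taken from the dissertation or from Obvio.

```python
from collections import defaultdict

# Semantic predications as (subject, predicate, object) triples.
# The triples come from the scenario slides; the representation is an assumption.
PREDICATIONS = [
    ("Chlorpromazine", "INHIBITS", "Calcineurin"),
    ("Calcineurin", "CAUSES", "Cardiac Hypertrophy"),
    ("Testosterone", "INHIBITS", "Cortisol"),
    ("Cortisol", "DISRUPTS", "Sleep"),
    ("DEHP", "STIMULATES", "PParGamma"),
    ("PParGamma", "INHIBITS", "Sepsis"),
]


def build_graph(predications):
    """Index the triples by subject: subject -> list of (predicate, object)."""
    graph = defaultdict(list)
    for subject, predicate, obj in predications:
        graph[subject].append((predicate, obj))
    return graph


def abc_paths(graph, source, target):
    """Return every two-step path source -p1-> B -p2-> target."""
    paths = []
    for p1, intermediate in graph.get(source, []):
        for p2, obj in graph.get(intermediate, []):
            if obj == target:
                paths.append((source, p1, intermediate, p2, target))
    return paths


graph = build_graph(PREDICATIONS)
for path in abc_paths(graph, "Testosterone", "Sleep"):
    print(" ".join(path))
```

This only recovers single-intermediate (ABC) paths; the longer paths and subgraphs described elsewhere in the talk would need a more general traversal.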
    44. Statistical Evaluation Association Rarity Interestingness
    45. Statistical Evaluation. Table 3: Rarity and Interestingness scores of the subgraphs in the rediscoveries. Columns: Experiment | # Unique Associations | Total MEDLINE Frequency | Rarity r(E) | Interestingness I(E). Rows: Raynaud-Fish Oil | 10 | 0 | 0.00 | 1.00; Magnesium-Migraine | 48 | 27 | 0.56 | 0.64; SomaC-Arginine | 18 | 306 | 17.00 | 0.06; Indomethacin-Alzheimers | 21 | 9 | 0.43 | 0.70; Estrogen-Alzheimers | 42 | 36 | 0.86 | 0.54; PLA2-Schizophrenia | 10 | 0 | 0.00 | 1.00; CPZ-Cardiac Hypertrophy | 21 | 2 | 0.10 | 0.91; Testosterone-Sleep | 61 | 654 | 10.72 | 0.09; Average | 29 | 129 | 3.71 | 0.62.
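The formulas for rarity and interestingness are not preserved in this transcript. The values in Table 3 (slide 45) are, however, consistent with r(E) = total MEDLINE frequency / number of unique associations and I(E) = 1 / (1 + r(E)). The sketch below recomputes the table under that assumption; the formulas are inferred from the numbers, not quoted from the slides, and the function names are hypothetical.

```python
# (experiment, unique associations, total MEDLINE frequency), copied from Table 3.
TABLE_3 = [
    ("Raynaud-Fish Oil", 10, 0),
    ("Magnesium-Migraine", 48, 27),
    ("SomaC-Arginine", 18, 306),
    ("Indomethacin-Alzheimers", 21, 9),
    ("Estrogen-Alzheimers", 42, 36),
    ("PLA2-Schizophrenia", 10, 0),
    ("CPZ-Cardiac Hypertrophy", 21, 2),
    ("Testosterone-Sleep", 61, 654),
]


def rarity(unique_associations, total_frequency):
    """Assumed definition: mean MEDLINE frequency per unique association."""
    return total_frequency / unique_associations


def interestingness(r):
    """Assumed definition: 1 when rarity is 0, approaching 0 as rarity grows."""
    return 1.0 / (1.0 + r)


scores = {}
for name, n_unique, freq in TABLE_3:
    r = rarity(n_unique, freq)
    scores[name] = (r, interestingness(r))

# Averages are taken over the unrounded per-experiment scores.
mean_rarity = sum(r for r, _ in scores.values()) / len(scores)
mean_interestingness = sum(i for _, i in scores.values()) / len(scores)

for name, (r, i) in scores.items():
    print(f"{name}: r(E)={r:.2f} I(E)={i:.2f}")
print(f"Average: r(E)={mean_rarity:.2f} I(E)={mean_interestingness:.2f}")
```

Under these assumed definitions, an association set that never co-occurs in MEDLINE before the cut-off (frequency 0) gets the maximum interestingness of 1, which matches the Raynaud-Fish Oil and PLA2-Schizophrenia rows.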
    46. Literature-Based Discovery; Context-Driven Subgraph Model; Foundations; Automatic Subgraph Creation; Experimental Results; Dissertation Contributions; Knowledge Exploration; Limitations & Future Work
    47. Predications-based Knowledge Exploration. Figure labels: Corpus, Predications Graph, Definitional Knowledge (UMLS + MeSH), Provenance, Knowledge Abstraction. Reference: D. Cameron, R. Kavuluru, O. Bodenreider, P. N. Mendes, A. P. Sheth, K. Thirunarayan. Semantic Predications for Complex Information Needs in Biomedical Literature. International Bioinformatics and Biomedical Conference (BIBM11). 512–519, 2011. Contribution #4: Combining Assertional and Definitional Knowledge for Knowledge Exploration
    48. Levels of Contexts (figure): Predication Context (A, B, C); Path Context (A, B1, B2, ..., Bi, C); Shared Context (two paths between A and C: via B1, B2, B3 and via B1, B2); Subgraph Context (A, C; predicates PRODUCES, INHIBITS); Dimensions
    49. Literature-Based Discovery; Context-Driven Subgraph Model; Foundations; Automatic Subgraph Creation; Experimental Results; Dissertation Contributions; Knowledge Exploration; Limitations & Future Work
    50. Dissertation Contributions: 1. Context-Driven Subgraph Model – Knowledge Rediscovery & Decomposition. 2. Predication/Path Context – Vector of MeSH Descriptors. 3. Shared Context – Background Knowledge (MeSH Hierarchy). 4. Semantic Predications-based Text Exploration – Obvio Web Application.
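Contribution 2 describes predication and path context as a vector of MeSH descriptors. The sketch below shows one way such vectors could be represented and compared. It is a sketch under stated assumptions: the transcript does not say how contexts are aggregated or compared, so summing predication contexts into a path context and using cosine similarity are stand-ins, the descriptor counts are invented for illustration, and `path_context` and `cosine` are hypothetical names.

```python
import math
from collections import Counter


def path_context(predication_contexts):
    """Assumed aggregation: a path's context is the sum of its predications' contexts."""
    total = Counter()
    for context in predication_contexts:
        total.update(context)
    return total


def cosine(a, b):
    """Cosine similarity between two sparse descriptor-count vectors (a stand-in measure)."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Illustrative counts only: MeSH descriptor -> number of supporting articles.
path_1 = path_context([
    Counter({"Fish Oils": 2, "Platelet Aggregation": 1}),
    Counter({"Platelet Aggregation": 2, "Raynaud Disease": 1}),
])
path_2 = path_context([
    Counter({"Fish Oils": 1, "Blood Viscosity": 2}),
    Counter({"Blood Viscosity": 1, "Raynaud Disease": 2}),
])

print(f"relatedness(path_1, path_2) = {cosine(path_1, path_2):.3f}")
```

A threshold on a score like this is one place where the manual thresholds listed under Limitations could enter; the actual measure and threshold used in the dissertation may differ.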
    51. Innovation. Table 4: Comparison of capabilities and accomplishments of LBD techniques. Columns: System/Technique | Technique Type | Automatic | Relational | Evidence-based | Thematic | Results: #Discoveries | #Rediscoveries. Rows: IRIDESCENT [108] | Keyword | 1 | 0; ARROWSMITH [84] | Keyword/Concept | 5 | 0; DAD [101,102] | Concept | 0 | 2; BITOLA [46] | Concept | 0 | 1; Litlinker [110] | Concept | 0 | 2; Manjal [87,88] | Concept | × | 0 | 5; SemBT [40,41,42] | Relations | × × | 0 | 1; BioSbKDS [47] | Relations | × × | 0 | 1; Wilkowski [107] | Graph | × × | 0 | 0; Ramakrishnan [72] | Graph | × × | 0 | 1*; Zhang [114] | Graph | × × × | 0 | 0; Obvio [19, 21] | Graph | × × × × | 0 | 8; ARROWSMITH v2 [86,98] | Hybrid | × | 0 | 6*; Semantic MEDLINE [18,63] | Hybrid | × × | 2 | 0. Note: References are from the PhD Dissertation manuscript entitled: A Context Driven Subgraph Model for Literature-Based Discovery.
    52. Literature-Based Discovery; Context-Driven Subgraph Model; Foundations; Automatic Subgraph Creation; Experimental Results; Dissertation Contributions; Knowledge Exploration; Limitations & Future Work
    53. Limitations: 1. Manual Threshold – MeSH Semantic Similarity. 2. Path Relatedness Threshold – Only Approximate Gaussian. 3. Definition of Context.
    54. Levels of Semantic Representation: Keywords; Concepts; MeSH Descriptors; Semantic Predications; Ensemble of Features; Relationships. Semantic Predication: A PREDICATE B
    55. Limitations: 1. Manual Threshold – MeSH Semantic Similarity. 2. Path Relatedness Threshold – Only Approximate Gaussian. 3. Definition of Context. 4. MEDLINE Querying – Deep integration of Assertional/Definitional. 5. Contradiction Detection. 6. Statistical Evaluation. 7. Scalability of Clustering Algorithm. 8. Subgraph Labeling.
    56. Take Away • Future of Information Processing – Rich Knowledge Representations o Implicit, Formal, Powerful semantics – Application to Literature-Based Discovery
    57. Conclusion • Context-Driven Subgraph Model – Manually create Complex Associations – Automatic Subgraph Creation o Novel definitions for Context and Shared Context o Multiple Thematic Dimensions – Predications-based Knowledge Exploration o Predicates o Highlighted MEDLINE sentences – Knowledge Rediscovery o 8 out of 9 existing scientific discoveries
    58. Publications 1. D. Cameron, R. Kavuluru, T. C. Rindflesch, O. Bodenreider, A. P. Sheth, K. Thirunarayan. Context-Driven Automatic Subgraph Creation for Literature-Based Discovery (under preparation). 2. D. Cameron, A. P. Sheth, N. Jaykumar, G. Anand, K. Thirunarayan, G. A. Smith. A Hybrid Approach to Finding Relevant Social Media Content for Domain Specific Information Needs (submitted to the Journal of Web Semantics). 3. D. Cameron, O. Bodenreider, H. Yalamanchili, T. Danh, S. Vallabhaneni, K. Thirunarayan, A. P. Sheth, T. C. Rindflesch. A Graph-based Recovery and Decomposition of Swanson’s Hypothesis using Semantic Predications. Journal of Biomedical Informatics (JBI13). 46(2): 238–251, 2013. 4. D. Cameron, G. A. Smith, R. Daniulaityte, A. P. Sheth, D. Dave, L. Chen, G. Anand, R. Carlson, K. Z. Watkins, R. Falck. PREDOSE: A Semantic Web Platform for Drug Abuse Epidemiology using Social Media. Journal of Biomedical Informatics (JBI13). 46(6): 985–997, 2013. 5. R. Daniulaityte, R. Carlson, R. Falck, D. Cameron, S. Perera, L. Chen, A. P. Sheth. “I just wanted to tell you that Loperamide WILL WORK”: A Web-Based Study of Extra-medical use of Loperamide. Journal of Drug and Alcohol Dependence (DAD13). 130(1–3): 241–244, 2013. 6. D. Cameron, V. Bhagwan, A. P. Sheth. Towards Comprehensive Longitudinal Healthcare Data Capture. International Workshop on Semantic Web in Literature-Based Discovery (SWLBD12). 241–247, 2012. 7. R. Daniulaityte, R. Carlson, R. Falck, D. Cameron, S. Perera, L. Chen, A. P. Sheth. A Web-Based Study of Extra-medical use of Loperamide. The College on Problems of Drug Dependence (CPDD12), 2012. 8. D. Cameron, R. Kavuluru, O. Bodenreider, P. N. Mendes, A. P. Sheth, K. Thirunarayan. Semantic Predications for Complex Information Needs in Biomedical Literature. International Bioinformatics and Biomedical Conference (BIBM11). 512–519, 2011. 9. D. Cameron, B. Aleman-Meza, I. B. Arpinar, S. L. Decker, A. P. Sheth. A Taxonomy-based Model for Expertise Extrapolation. International Conference on Semantic Computing (ICSC10). 333–340, 2010. 10. D. Cameron, P. N. Mendes, A. P. Sheth, V. Chan. Semantics-empowered Text Exploration for Knowledge Discovery. ACM Southeast Conference (ACMSE10). 14, 2010. 11. C. Thomas, W. Wang, P. Mehra, D. Cameron, P. N. Mendes, A. P. Sheth. What Goes Around Comes Around – Improving Linked Open Data through On-Demand Model Creation. Web Science Conference (WebSci10), 2010. 12. P. N. Mendes, P. Kapanipathi, D. Cameron, A. P. Sheth. Dynamic Associative Relationships on the Linked Data Web. Web Science Conference (WebSci10), 2010.
    59. Research Expertise (figure): Literature-Based Discovery; Text Mining; Question Answering; Information Retrieval. Publications placed in the figure: [1] [2] [3] [6] [4] [8] [10] [5] [7]
    60. Parting Words: “...some day the piecing together of dissociated knowledge will open up such terrifying vistas of reality,...that we shall either go mad from the revelation or flee from the deadly light into the peace and safety of a new dark age.” – H. P. Lovecraft (The Call of Cthulhu, The Horror in Clay). Source: H. P. Lovecraft. The Call of Cthulhu. In S. T. Joshi, editor. The Call of Cthulhu and Other Weird Stories. Penguin Books Ltd., London, 1999.
    61. Acknowledgements • Olivier Bodenreider • Marcelo Fiszman • Mike Cairelli • Swapna Abhyankar • Drashti Dave • Dongwook Shin • Special Thanks o Pavan o Shreyansh o Swapnil o Nishita • PREDOSE Team o Nishita o Gaurish o Alan o Revathy
    62. Ph.D. Committee Members: Amit P. Sheth (Advisor), T.K. Prasad, Michael Raymer, Ramakanth Kavuluru, Thomas C. Rindflesch, Varun Bhagwan