48th ACM Southeast Conference. ACMSE 2010.
Oxford, Mississippi. April 15-17, 2010.




  Semantic Predications for Complex Information
         Needs in Biomedical Literature

          Delroy Cameron, Ramakanth Kavuluru, Pablo N. Mendes
                 Amit P. Sheth, Krishnaprasad Thirunarayan
                    Ohio Center for Excellence in Knowledge-enabled Computing
                              Kno.e.sis Center, Wright State University
                                      Dayton, OH 45435, USA


                                     Olivier Bodenreider
                                      National Library of Medicine
                                      Bethesda, MD 20894, USA


                   2011 International Conference on Bioinformatics and Biomedicine (BIBM)
                                   12-15 November, 2011 Atlanta, Georgia
MOTIVATION

  • Information Retrieval
     Interaction Sequence
       – Keyword Search
       – Document Selection
       – Document Inspection
       – Query Reformulation
     [Annotation: Exploratory Search]
    Document-Centric Model
       – Hyperlink-driven Browsing
       – Information is within Documents

     Limitations
       – Query Reformulation
       – Constrained Navigation




“How do mutations in the Presenilin-1 (PS1) gene affect Alzheimer’s disease (AD)?”




             PS1 associated_with Alzheimer’s Disease
    . . . mutations in PS1 lead to Alzheimer’s disease by increasing the
    extracellular levels of [amyloid peptide 42] Aβ42. (Source: PMID 10652366)

                                Semantic Predications

             chromosome 14 finding_site_of presenilin 1
    . . . familial early onset Alzheimer’s disease is caused by point mutations in
    the amyloid precursor protein gene on chromosome 21, in the presenilin
    2 (PS2) gene on chromosome 1, or, most frequently, in the presenilin
    1 (PS1) gene on chromosome 14. . . (Source: PMID 9013610)
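
A semantic predication can be read as a subject-predicate-object triple that keeps a pointer to the document it came from. The sketch below is only an illustration of that idea: the `Predication` class and the `documents_mentioning` helper are hypothetical names, and concept normalization (for example, treating "PS1" and "presenilin 1" as the same concept) is deliberately not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Predication:
    """A subject-predicate-object triple plus the PubMed ID it was found in."""
    subject: str
    predicate: str
    obj: str
    pmid: str  # provenance: the source document


# The two predications shown on the slide, with their source PMIDs.
predications = [
    Predication("PS1", "associated_with", "Alzheimer's Disease", "10652366"),
    Predication("chromosome 14", "finding_site_of", "presenilin 1", "9013610"),
]


def documents_mentioning(concept, preds):
    """Return the sorted PMIDs of predications whose subject or object is `concept`."""
    return sorted({p.pmid for p in preds if concept in (p.subject, p.obj)})


print(documents_mentioning("PS1", predications))
```

Keeping the PMID on each triple is what later allows a path in the graph to be mapped back to documents.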
COMPLEX INFO NEEDS



   Literature-Based Discovery (LBD)
       Don R. Swanson’s Hypotheses
         o   Raynaud’s Syndrome-Dietary Fish Oil
         o   Magnesium-Migraine


   Question Answering
     Text REtrieval Conference (TREC)
     2006, 28 questions

                 TREC Genomics Track - http://ir.ohsu.edu/genomics/

PROBLEM



  COMPLEX BIOMEDICAL QUESTION
  ANSWER DOCUMENTS
   o LEGAL SPANS
  SEMANTIC PREDICATIONS-BASED RETRIEVAL




REACHABILITY



    “the notion of being able to get from one vertex in a
    directed graph to some other vertex”

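 Labeled Graph

 [Figure: labeled directed graph with vertices a, b, and c; Magnesium and Calcium Channel Blocker appear as example vertices joined by ISA and INVERSE_ISA edges]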
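
The reachability definition above can be sketched as a breadth-first search over a labeled directed graph. The edge list here is made up for illustration (only the vertex names a, b, c and the labels ISA and INVERSE_ISA come from the slide), and `build_adjacency` and `reachable` are hypothetical helper names.

```python
from collections import deque

# Illustrative labeled edges: (source, label, target).
edges = [
    ("a", "ISA", "b"),
    ("b", "INVERSE_ISA", "a"),
    ("b", "ISA", "c"),
]


def build_adjacency(edges):
    """Map each vertex to its outgoing (label, target) pairs."""
    adj = {}
    for src, label, dst in edges:
        adj.setdefault(src, []).append((label, dst))
        adj.setdefault(dst, [])
    return adj


def reachable(adj, start, goal):
    """Breadth-first search: True if `goal` can be reached from `start`."""
    seen = {start}
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        if vertex == goal:
            return True
        for _label, nxt in adj.get(vertex, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


adj = build_adjacency(edges)
print(reachable(adj, "a", "c"), reachable(adj, "c", "a"))
```

Because the graph is directed, c is reachable from a in this toy example but a is not reachable from c.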




REACHABILITY-DOCS

   “is the notion of being able to cover the documents in a
   document set, using the vertices in a directed graph
   from one vertex to some other vertex”




               Predications Graph Plane




                              Document Plane
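
One possible reading of document reachability is: walk the predications graph from a start vertex and collect the documents attached to what is visited, then compare against a target document set. The sketch below assumes documents are attached to edges (each predication's provenance) rather than to vertices; that choice, the function name `covered_documents`, and the toy data are assumptions, not the authors' implementation.

```python
from collections import deque


def covered_documents(edges, start):
    """Collect document IDs on every edge reachable from `start`.

    `edges` is a list of (source, target, doc_ids) tuples, where doc_ids is
    the set of documents in which that predication was found.
    """
    adj = {}
    for src, dst, docs in edges:
        adj.setdefault(src, []).append((dst, docs))
    seen, covered = {start}, set()
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        for nxt, docs in adj.get(vertex, []):
            covered |= docs
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return covered


# Toy predications graph plane with provenance into the document plane.
edges = [
    ("a", "b", {"d1", "d2"}),
    ("b", "c", {"d3"}),
    ("x", "y", {"d4"}),  # not reachable from "a"
]
target_docs = {"d1", "d3", "d4"}
covered = covered_documents(edges, "a")
missing = target_docs - covered
print(sorted(covered), sorted(missing))
```

In this toy case d4 stays uncovered because its predication is disconnected from the start vertex, which is the kind of gap background knowledge is meant to bridge.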


Knowledge Abstraction
 • Stopping Conditions
    – No Reachable Docs in PG
    – No Successors in PG + Reachable Docs in DP
    – No Successors in PG + No Reachable Docs in DP

 [Figure: Predications Graph Plane over Document Plane; concepts C0007613 (Cell physiology), C1261468 (Cell fusion), and C0040682 (cell transformation), with two coexists_with edges]

Reachability Framework

  [Figure: three stacked planes. The External Knowledge Base Graph Plane connects to the Predications Graph (PG) Plane through Knowledge Abstraction; the PG Plane connects to the Document Plane through Provenance.]
DATASET

    • TREC 2006 Corpus
           26 Questions
           162,259 full text documents
           12,641,116 text items

     Gold Standard
        1381 Gold Standard Documents
        3461 Text items

    • Biomedical Knowledge Repository (BKR)
           13 million from UMLS Metathesaurus
           8 million from Literature using SemRep


  TREC Genomics Track - http://ir.ohsu.edu/genomics/
EXPERIMENTS

  • Experiment I: Single-Graph
     2,105 Vertices, 16,942 Edges
     > 13,000 unique predications
     240 no predications, 3 not processed
     121,162 text items

  • Experiment II: Multiple-Graph
     26 predications graphs




OBSERVATIONS


    Absence of predications in text
     Predication extraction methods (SemRep)
     Absence of direct connections among text items
     Ambiguity in written language
    Abstractions may lead to information overflow
     Quality of background knowledge




“How do mutations in the Pes gene affect cell growth?”




                                 DNA Replication

                           G1_phase           G2_phase
CONCLUSION


 •   Novel Knowledge-driven framework
 •   Semantic Predications to link scientific content
 •   Alternative to Query reformulation
 •   Background knowledge boosts recall

 • Effective at Coarse granularity
 • Poor at fine granularity


FUTURE WORK


 •   Path/Predication Ranking
 •   Path/Predication Collapsing
 •   Knowledge Abstraction tuning
 •   Scalability
 •   Computational Complexity

 • Replicate Swanson’s Hypotheses

ACKNOWLEDGEMENT

 National Library of Medicine (NIH/NLM)

 Human Performance & Cognition Ontology Project @knoesis

     Cartic Ramakrishnan
     Michael Cooney
     Gary Alan Smith
     Paul Fultz II
     Jeffrey Ali Hyacinthe
     Thomas C. Rindflesch
     Mohamed Cyclegar
     Dongwook Shin
     John Nguyen
     May Cheh

QUESTIONS




TREC Genomics Track 2006 (Questions)
  Topic ID   Question
    160      What is the role of PrnP in mad cow disease?
    161      What is the role of IDE in Alzheimer's disease?
    163      What is the role of APC (adenomatous polyposis coli) in colon cancer?
    165      How do Cathepsin D (CTSD) and apolipoprotein E (ApoE) interactions contribute to Alzheimer's disease?
    167      How does nucleoside diphosphate kinase (NM23) contribute to tumor progression?
    168      How does BARD1 regulate BRCA1 activity?
    169      How does APC (adenomatous polyposis coli) protein affect actin assembly?
    170      How does COP2 contribute to CFTR export from the endoplasmic reticulum?
    172      How does p53 affect apoptosis?
    173      How do alpha7 nicotinic receptor subunits affect ethanol metabolism?
    174      How does BRCA1 ubiquitinating activity contribute to cancer?
    176      How does Sec61-mediated CFTR degradation contribute to cystic fibrosis?
    177      How do Bop-Pes interactions affect cell growth?
    178      How do interactions between insulin-like GFs and the insulin receptor affect skin biology?
    179      How do interactions between HNF4 and COUP-TF1 suppress liver function?
    180      How do Ret-GDNF interactions affect liver development?
    184      How do mutations in the Pes gene affect cell growth?
    185      How do mutations in the hypocretin receptor 2 gene affect narcolepsy?
    186      How do mutations in the Presenilin-1 gene affect Alzheimer's disease?



Most Relevant Subtree
             a
                        Level-0

                        Level-1



                        Level-2


                        Level-3


                        Level-4


Editor's Notes

  • #19
    1. Pescadillo affects rRNA Processing, which may affect the S Phase, since the WDR12 Gene affects Cell Physiology, with which rRNA Processing coexists.
    2. However, the S Phase (synthesis phase) of the cell cycle is also a process of Aneuploidy, which has a negative effect on cell growth.
    3. Since DNA replication is known to occur in the S Phase, between the G1 and G2 phases, mutations of Pescadillo could lead to G1 Phase arrest, thereby showing growth arrest.
    4. As confirmed in the text, a specific temperature-sensitive mutant experiences this condition.