The knowledge-driven exploration of integrated biomedical knowledge sources facilitates the generation of new hypotheses
To be presented at Linked Science, Oct 24, 2011

  • Introduce the topic. Introduce the data sources (BKR and Chagas disease). Introduce the authors.
  • Fact: 11,000 articles in PubMed discuss Chagas disease. The left figure shows one PubMed article and the right figure shows the DNA microarray result for T. cruzi genes. One data source is pre-publication (experimental data); the other is post-publication (text). The question now is how to make use of post-publication data in ongoing research on Chagas disease, or how to use experimental data to validate a finding in an article.
  • Question to be answered: give me a list of T. cruzi genes that are inhibited by a drug.
  • Links between BKR and PKR: 60% of the concepts in the PEO and PLO ontologies are in the UMLS Metathesaurus. PEO and PLO cover the concepts in the PKR data sets. The Gene Ontology (GO) is integrated into the UMLS Metathesaurus, and some GO terms are used to annotate the PKR data sets. Orthology relations link human genes and T. cruzi genes.
  • Explain the components of a predication using the human gene CALR; it is introduced here to make the connection with ortholog information.
  • Use ortholog information as the linking source.
  • 1. Knowledge-driven exploration process: literature-based discovery infers a new relation from an existing chain of named relationships. Assume we know the relation R1 connecting A to B and the relation R2 connecting B to D; we can then infer a new relation R3 between A and D. The chain may be extended to include 3, 4, or more triples. Note: solid arrows denote direct relations, dashed arrows indirect relations.
    2. Two steps in the process:
       - Among the 20 million predications in the knowledge bases, how can biologists navigate the chains of named relationships that interest them? Our tool provides an interface that helps biologists navigate such chains. Graph expansion: from one given concept, the tool can display a graph of related concepts. Graph restriction: a concept may be connected to other concepts via several relationships; a biologist can restrict the graph to display only the relationships of interest.
       - The biologists use their background knowledge to formulate new hypotheses from the existing chain of named relationships.
  • We will demonstrate one sample of the exploration process using our tool, iExplore. The biologists on our team were interested in this conceptual chain of named relationships: a drug treats Chagas disease, this drug inhibits some human genes, and these human genes have T. cruzi orthologs. In order to accept the hypothesis, we will need to find data that support it. Play the video here.
  • Explain the overview: the left side controls the settings; the right side displays the graph of semantic predications. Oriented to non-technical users: easy to use, intuitive.
  • The figure summarizes the relationships among the concepts we are investigating. From this graph, the biologists can generate this hypothesis.
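
The navigation step described in the notes (graph expansion, graph restriction, and following a chain of named relationships) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the iExplore implementation: the triples are held in memory, the `HAS_ORTHOLOG` and `INTERACTS_WITH` predicate names are assumed, and the gene identifiers are placeholders rather than real data.

```python
from collections import namedtuple

# A predication is a (subject, predicate, object) triple.
Triple = namedtuple("Triple", ["subject", "predicate", "object"])

# Illustrative triples only: the gene identifiers are placeholders, not real data,
# and the predicate names are assumptions made for this sketch.
TRIPLES = [
    Triple("itraconazole", "TREATS", "Chagas disease"),
    Triple("itraconazole", "INHIBITS", "human_gene_1"),
    Triple("itraconazole", "INHIBITS", "human_gene_2"),
    Triple("human_gene_1", "HAS_ORTHOLOG", "tcruzi_gene_1"),
    Triple("human_gene_2", "INTERACTS_WITH", "human_gene_3"),
]


def expand(triples, concept):
    """Graph expansion: every triple that starts at the given concept."""
    return [t for t in triples if t.subject == concept]


def restrict(triples, predicate):
    """Graph restriction: keep only triples with the relationship of interest."""
    return [t for t in triples if t.predicate == predicate]


def follow_chain(triples, start, predicates):
    """Follow a chain of named relationships from start; return the node paths."""
    paths = [[start]]
    for predicate in predicates:
        next_paths = []
        for path in paths:
            # Expand from the last node, then restrict to the wanted relationship.
            for t in restrict(expand(triples, path[-1]), predicate):
                next_paths.append(path + [t.object])
        paths = next_paths
    return paths


# Drug -> inhibited human gene -> T. cruzi ortholog.
paths = follow_chain(TRIPLES, "itraconazole", ["INHIBITS", "HAS_ORTHOLOG"])
candidates = [path[-1] for path in paths]
```

Each returned path is a candidate chain for the biologist to interpret; the interpretation step itself (deciding whether the end of the chain supports a hypothesis) stays with the human.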
  • Transcript

    • 1. The knowledge-driven exploration of integrated biomedical knowledge sources facilitates the generation of new hypotheses
        • Vinh Nguyen, Olivier Bodenreider, Todd Mining, Amit Sheth
    • 2. Context
        • Chagas disease
            – Infectious disease caused by the Trypanosoma cruzi (T. cruzi) parasite
            – Affects 8-10 million people in South America, about 20,000 deaths per year
            – Finding drug targets and a vaccine
        • 10/21/2011
    • 3. Context
    • 4. Why integrate the two?
        • Synergistically exploit the two sources
            – If the literature indicates a certain finding (for the same or another organism [mice, rat], for the same or a different process [e.g., gene expression]), does my data support the same finding?
            – I find a pattern in my data, or a pathway component: is this reported in the literature by someone in the same or a similar context?
    • 5. Knowledge-driven, integrated exploitation of complementary sources can facilitate the generation of new hypotheses for additional investigations and insights.
    • 6. Use case
        • Data sources
            – Text data from PubMed
            – Experimental data about Chagas disease
        • Objectives/capabilities
            – Query experimental and literature (text) data together
            – Find new insights into Chagas disease
    • 7. Background Knowledge
        • Two knowledge bases were linked/partly integrated
            – The Biomedical Knowledge Repository (BKR) [NLM]
            – The Parasite Knowledge Repository for Chagas disease (PKR)
    • 8. Knowledge source - BKR
        • Created at NLM by Dr. Olivier Bodenreider and Dr. Thomas Rindflesch
        • ~20 million predications extracted by SemRep from 6 million PubMed articles
        • Example
    • 9. Knowledge source - PKR
        • Created by the T. cruzi project at Kno.e.sis, WSU, with CTEGD, UGA, primarily for semantic analysis of parasite experimental data from/including:
            – The expression level of T. cruzi genes (from microarray analysis data)
            – The presence of proteins encoded by T. cruzi genes (using proteome analysis)
        • Ontologies for annotation
            – Parasite Experiment Ontology (PEO)
            – Parasite Lifecycle Ontology (PLO)
    • 10. Linking the two knowledge bases
        • Orthology relation
            – Gene function is usually conserved across species
    • 11. How to explore the integrated knowledge bases?
        • Knowledge-driven exploration process
        • Two steps of the process
            – Navigate the chain of named relationships: graph expansion, graph restriction
            – Interpret the chain to infer a new hypothesis
    • 12. Example of exploration process
        • Sample chain
    • 13. Exploration process – Demo
        • An iExplore video demo is available at http://www.youtube.com/watch?v=-7-BgBDTSjc
    • 14. Exploration process - Interpretation
        • Sample chain -> Hypothesis
            – The T. cruzi orthologs of human genes inhibited by itraconazole may also be inhibited by itraconazole, and thus would be candidates for further studies into the mechanism(s) of action of itraconazole on T. cruzi
    • 15. Significance
        • Biologically
            – Biologists follow the exploration process to generate novel hypotheses
            – Can easily be generalized to other diseases
        • Technically
            – Technical details (e.g., SPARQL queries) are hidden from the user
            – Scalable over millions of predications
    • 16. Technologies
        • Semantic Web
            – RDF, SPARQL standards
            – Virtuoso triple store
        • iExplore development
            – Client side: Flex application
            – Server side: Java
    • 17. Conclusion
        • Integrate knowledge bases
            – BKR
            – PKR
        • Knowledge-driven exploration
            – Two-step process: navigation and interpretation
            – Assists biologists in exploring the integrated knowledge bases
    • 18. Future work
        • Validation of novel hypotheses
            – Collaborate with UGA researchers on the biomedical aspects
            – Incorporate the validation into iExplore
        • Automate the interpretation step
            – Learn how biologists use their background knowledge to interpret the chain of named relationships
    • 19. Acknowledgements
        • Dr. Thomas Rindflesch, Dr. Cartic Ramakrishnan, May Cheh, Dr. Priti Parikh, Dr. Clement McDonald, Joshua Dotson, Dr. Bastien Rance, Sarasi Lalithsena, Jonathan Mortensen, John Nguyen, Thang Nguyen, Lister Hill Center
    • 21. For further information, please visit http://knoesis.wright.edu/iExplore
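
The slides note that technical details such as SPARQL queries are hidden from the user. One way a tool could do that, sketched here as an assumption rather than as how iExplore works, is to translate a chain of relationships picked in the interface into a SPARQL query string. The `http://example.org/` namespace and the `inhibits` and `hasOrtholog` predicate URIs are hypothetical and are not the vocabulary of BKR or PKR.

```python
def chain_to_sparql(start_uri, predicate_uris):
    """Build a SPARQL SELECT query that follows a chain of predicates from start_uri."""
    if not predicate_uris:
        raise ValueError("the chain needs at least one predicate")
    # One variable per hop: ?n1 is reached by the first predicate, ?n2 by the second, ...
    variables = [f"?n{i}" for i in range(1, len(predicate_uris) + 1)]
    # The subject of each hop is the start node, then the previous hop's variable.
    subjects = [f"<{start_uri}>"] + variables[:-1]
    patterns = [
        f"  {s} <{p}> {o} ."
        for s, p, o in zip(subjects, predicate_uris, variables)
    ]
    return "SELECT " + " ".join(variables) + "\nWHERE {\n" + "\n".join(patterns) + "\n}"


# Hypothetical namespace and predicate names, for illustration only.
EX = "http://example.org/"
query = chain_to_sparql(EX + "itraconazole", [EX + "inhibits", EX + "hasOrtholog"])
print(query)
```

The resulting string could then be sent to any SPARQL endpoint, for example one exposed by a triple store such as Virtuoso; the user only ever sees the graph, not the query.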