"Ontology-centric navigation of the scientific literature"


Bridging Worlds Conference 2008, Singapore
Day Two Track Three
Speaker 1: Christopher Baker

Published in: Education, Technology

  1. Ontology-centric Knowledge Navigation of the scientific literature. Christopher J. O. Baker, Institute for InfoComm Research, A*STAR, Singapore
  2. Motivation
     • Scientists typically need to integrate a spectrum of information to complete a task successfully.
     • On average, a scientist or knowledge worker spends 1 day per week searching for, integrating and analyzing information, 50% of which is in unstructured digital formats.
     • Access to information structured according to explicit knowledge representations or taxonomies is a fundamental concern of all scientists.
     • Moving beyond keyword search requires tools that provide matching from the lexical to the semantic, conceptual and contextual levels of information. This entails an infrastructure for indexing text segments according to domain-specific metadata.
  3. In the future…
     • Users will be involved in the design of information systems.
     • Publishers will charge users for value-added search (who will build such search systems?).
     • Users will search across semantically integrated data sources and data types (how to facilitate system creation and adoption?).
     • Knowledge-driven systems will be rapidly built and deployed with the engagement of domain experts in a knowledge engineering team.
  4. Literature-driven, Ontology-centric Knowledge Integration and Navigation
     [Figure labels: 500 documents, blogs, newsfeeds; Content delivery; Text Mining; Ontology Population; Ontology to browse; Visual Query; Reasoning; 50 sentences to read; using expressive semantics]
  5. W3C Semantic Web Technologies • URI / LSID • Ontologies • Reasoners • Query Languages • Web Services • Service Registries • Agents • Multi-Agent Systems • Workflow Engines • GRID / Semantic GRID • Text Mining • Service-Oriented Architecture
  6. Controlled Vocabularies, Ontologies
     [Figure: spectrum from controlled vocabularies to ontologies. Labels: Catalog/ID; Terms/Glossary; Thesauri (“narrower term”); Informal is-a; Formal is-a; Formal instance; Frames (properties); Value restrictions; General logical constraints; is-a; part-of]
     • Capture knowledge: the meaning of important vocabulary (classes, properties/relations and instance data in a domain model).
     • Make the content in information sources explicit.
     • Common domain terminology.
     • Index and query model to a repository of information.
     • Basis for interoperability between information systems.
  7. Lipid Ontology
     > Implementation: OWL-DL
     > DL expressivity: ALCHIQ
     > Uses LIPIDMAPS systematic nomenclature
     > 560 named classes
     > 352 lipid subclasses
     > 71 object properties (incl. inverses)
     > 4 datatype properties
     > Lipid instance: LIPIDMAPS systematic name
     > Lipid hierarchy depth: 8 levels
     [Figure labels: Graph fragment; DL Axioms; Lipid Hierarchy; Concept Definitions; Domain knowledge vs information system metadata]
  8. Ontologies Online
  9. Ontology-centric knowledge architecture
  10. Ontology-centric Knowledge Integration
      • Content Delivery Platform - Automated Content Acquisition: document delivery from online databases; tools for conversion to text-minable text
      • Text Mining - Customized and Automated: regular expressions, named entities, relations
      • Knowledge Engineering – Ontology Creation: domain modeling / customized rapid prototyping
      • Ontology Population – Automated Instantiation: sentences as instances / co-occurrence and named relations (rules)
      [Figure labels: Domain specific; raw text]
  11. Domain Ontology vs Mixed Metadata: a literature specification
  12. Ontology Population Workflow
      • Ontology-based information retrieval applies NLP to link documents to existing ontologies.
      • Ontology-driven NLP: NLP that actively uses ontological resources for NLP tasks.
      • Ontological NLP: ontologies are used as a knowledge base for NLP tasks, and the results of NLP analyses are also exported into an ontology that can then support subsequent semantic queries using description logic reasoners and A-box reasoning.
      • Ontology-based NLP: the results of NLP are exported to another ontology, using external resources for text processing.
      Witte et al. 2007
  13. Text Mining
      • Class instance generation from full text
        – Named entity recognition (gazetteer based)
        – Dictionary-based matching of text tokens to domain-specific vocabularies, i.e. LipidBank, Lipidmaps, KEGG, IUPAC, plus curated Swissprot terms and the disease ontology of CGM
        – Normalization and grounding to canonical names
      • Relation detection (role assertions)
        – Co-occurrence and rule-based relation detection of binary pairs, from which knowledgebase instances are generated
        – Primary set of binary interactions mined from text: Lipid-Protein, Lipid-Disease, Protein-Disease
        – Domain-specific library of curated biological relations
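The gazetteer step on slide 13 can be illustrated with a small sketch. This is not the authors' pipeline: the dictionary entries, canonical names and the function name `tag_entities` are hypothetical, and a plain regular expression stands in for whatever text-mining framework was actually used. It shows dictionary matching of surface forms followed by normalization to one canonical name.

```python
import re

# Hypothetical mini-gazetteer: lowercase surface form -> (entity type, canonical name).
# A real system would load these from LipidBank, Lipidmaps, KEGG, Swissprot, etc.
GAZETTEER = {
    "sphingosine-1-phosphate": ("Lipid", "sphingosine 1-phosphate"),
    "s1p": ("Lipid", "sphingosine 1-phosphate"),
    "ceramide": ("Lipid", "ceramide"),
    "protein kinase c": ("Protein", "protein kinase C"),
    "pkc": ("Protein", "protein kinase C"),
}


def build_pattern(gazetteer):
    # Longest surface forms first, so multi-word names win over shorter ones.
    forms = sorted(gazetteer, key=len, reverse=True)
    alternation = "|".join(re.escape(form) for form in forms)
    # The lookarounds stop matches inside longer words or hyphenated names.
    return re.compile(r"(?<![\w-])(" + alternation + r")(?![\w-])", re.IGNORECASE)


PATTERN = build_pattern(GAZETTEER)


def tag_entities(sentence):
    """Return (surface form, entity type, canonical name) for each gazetteer hit."""
    hits = []
    for match in PATTERN.finditer(sentence):
        entity_type, canonical = GAZETTEER[match.group(1).lower()]
        hits.append((match.group(1), entity_type, canonical))
    return hits


tagged = tag_entities("S1P binds PKC in vitro.")
```

Both "S1P" and "sphingosine-1-phosphate" ground to the same canonical name here, which is the point of the normalization step: downstream instantiation sees one lipid, not two.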
  14. Knowledgebase Instantiation
      1) Rule-based identification of sentences containing target keywords
      2) Instantiation with the JENA API (http://jena.sourceforge.net/). Target keywords found in sentences are instantiated to the corresponding ontology class.
      • Lipid / Protein / Disease instances are instantiated to the respective ontology classes (as tagged by the gazetteer).
      • Binary pairs are instantiated to the respective object properties as role assertions.
      • Sentences are instantiated to the respective datatype properties.
      For each lipid identified in a sentence, the corresponding data are instantiated to the ontology from Lipid Data Warehouse records, requiring no further text processing:
      • Lipid - LIPIDMAPS systematic name and its associated
      • Lipid - IUPAC name, Lipid - synonyms, Lipid - Database ID.
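The instantiation steps on slide 14 can be sketched without the JENA API by treating the knowledgebase as a set of (subject, predicate, object) triples. This is a simplified stand-in, not the authors' code: the property names `occursInSentence` and `hasText` and the function `instantiate` are hypothetical.

```python
def instantiate(sentence_id, sentence_text, entities, pairs):
    """Build triples for one tagged sentence.

    entities: list of (canonical name, ontology class) from the gazetteer.
    pairs:    list of (subject name, object property, object name) role assertions.
    """
    triples = set()
    # The sentence itself becomes an instance carrying its text as a datatype value.
    triples.add((sentence_id, "rdf:type", "Sentence"))
    triples.add((sentence_id, "hasText", sentence_text))  # hypothetical datatype property
    for name, ontology_class in entities:
        # Each tagged entity is instantiated to its ontology class.
        triples.add((name, "rdf:type", ontology_class))
        triples.add((name, "occursInSentence", sentence_id))  # hypothetical object property
    for subject, object_property, obj in pairs:
        # Binary pairs become role assertions on object properties.
        triples.add((subject, object_property, obj))
    return triples


kb = instantiate(
    "sentence_1",
    "Ceramide binds protein kinase C.",
    entities=[("ceramide", "Lipid"), ("protein kinase C", "Protein")],
    pairs=[("ceramide", "interactsWithProtein", "protein kinase C")],
)
```

Using a set means that re-processing the same sentence does not duplicate assertions, which mirrors how an RDF model ignores repeated statements.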
  15. Knowledgebase Instantiation: Rule-Based Sentence Processing
      <Lipid> AND <Protein> AND LipidProteinInteraction-TriggerWord, e.g. "interact", "bind", "mediate"
      <Lipid> AND <Disease> AND LipidDiseaseInteraction-TriggerWord, e.g. "involve", "cause"
      [Figure labels: Lipid Class; Protein Instance; Lipid Instance; Lipid Instance]
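The two rules on slide 15 can be read as: a sentence yields a relation only if it contains an entity of each required type and at least one trigger word. A minimal sketch under that reading follows; the trigger stems come from the slide, while the prefix matching, the relation labels as return values and the function name `detect_relations` are assumptions made for illustration.

```python
import re

# (left entity type, right entity type, relation label, trigger stems from the slide)
RULES = [
    ("Lipid", "Protein", "LipidProteinInteraction", ("interact", "bind", "mediate")),
    ("Lipid", "Disease", "LipidDiseaseInteraction", ("involve", "cause")),
]


def detect_relations(sentence, entities):
    """entities: list of (name, entity type) already tagged in the sentence."""
    words = re.findall(r"[a-z]+", sentence.lower())
    relations = []
    for left_type, right_type, label, stems in RULES:
        # Prefix matching so that "binds" or "mediated" hit "bind" / "mediate".
        # Irregular forms such as "bound" are missed by this simplification.
        if not any(word.startswith(stem) for word in words for stem in stems):
            continue
        lefts = [name for name, kind in entities if kind == left_type]
        rights = [name for name, kind in entities if kind == right_type]
        relations.extend((left, label, right) for left in lefts for right in rights)
    return relations


found = detect_relations(
    "Ceramide binds protein kinase C.",
    [("ceramide", "Lipid"), ("protein kinase C", "Protein")],
)
```

A sentence that mentions both entities but no trigger word returns an empty list, which is how rules like these would cut co-occurrence sentences down to interaction sentences.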
  16. Knowledge Integration and Query
      [Figure: workflow. User input query; Search engine; Web content or full text papers; NLP tagging; docs tagged with relevant name entities; Ontology instantiation; Knowledge (“Instantiated ontology”); Navigation vehicle; Output for end user; User]
      • Papers identified: 262
        – 121 papers with no lipid-protein relations
        – 141 papers contributed to ontology instantiation
      • Names tagged: 186 lipid names, 528 protein names
      • After normalisation and grounding: 92 Lipidmaps systematic names, 52 IUPAC names, 412 exact synonyms, 6 broad synonyms, 319 protein names
      • Cross link to 59 Lipidbank entries
      • Sentences: 1356 co-occurrence sentences before rules, 683 interaction sentences after rules
      • 92 Lipidmaps names instantiated to 35 classes (2.6 lipids per class)
      • Instantiation time: 22 seconds
      Baker CJ, Kanagasabai R, Ang WT, Veeramani A, Low HS, and Wenk MR. Towards ontology-driven navigation of the lipid bibliosphere. BMC Bioinformatics. 2008;9 Suppl 1:S5.
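The figures on slide 16 are internally consistent, which a few lines of arithmetic confirm. The variable names are illustrative only; the retained-sentence percentage is derived here and is not stated on the slide.

```python
papers_total = 262
papers_without_relations = 121
papers_contributing = 141

# The two paper subsets add up to the total identified.
subsets_match_total = papers_without_relations + papers_contributing == papers_total

# 92 Lipidmaps names over 35 classes rounds to the 2.6 lipids per class on the slide.
lipids_per_class = round(92 / 35, 1)

# Share of co-occurrence sentences kept after applying the rules (derived figure).
percent_sentences_retained = round(100 * 683 / 1356, 1)
```

So roughly half of the co-occurrence sentences survive the rule filter.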
  17. Knowledge Integration and Query
      [Figure: workflow. User input query; Search engine; Web content or full text papers; NLP tagging; docs tagged with relevant name entities; Ontology instantiation; Knowledge (“Instantiated ontology”); Navigation vehicle; Output for end user; User]
      Baker CJ, Kanagasabai R, Ang WT, Veeramani A, Low HS, and Wenk MR. Towards ontology-driven navigation of the lipid bibliosphere. BMC Bioinformatics. 2008;9 Suppl 1:S5.
  18. Knowlegator
      [Screenshot labels: Query Composition Panel; Results Panel; Ontology Content; Query Syntax; Query Engine; Concept; Dialogue; Properties; Overview]
  19. Complex Query Generation
      [Figure labels: Informatician; Domain expert]
      Find documents and sentences describing protein-lipid interactions and corresponding lipid synonyms.
  20. Pathway Discovery Algorithm
      Finds transitive paths across the graph between source and target concepts. The path length and result size can be defined. Paths can run over any object properties or over user-defined object properties only, e.g. protein interacts with protein.
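The slide does not say how the pathway discovery algorithm is implemented. One plausible reading is a bounded depth-first enumeration of simple paths over the instantiated graph, sketched below. The function name `find_paths`, its parameters and the example property names are hypothetical; edges are followed in the asserted direction only.

```python
def find_paths(triples, source, target, max_length=3, max_results=10,
               allowed_properties=None):
    """Enumerate simple paths from source to target.

    triples:            iterable of (subject, object property, object).
    max_length:         maximum number of edges in a path.
    max_results:        stop after this many paths.
    allowed_properties: if given, only these object properties are traversed.
    """
    adjacency = {}
    for subject, prop, obj in triples:
        if allowed_properties is None or prop in allowed_properties:
            adjacency.setdefault(subject, []).append((prop, obj))

    results = []

    def extend(node, path, visited):
        if len(results) >= max_results:
            return
        if node == target and path:
            results.append(list(path))
            return
        if len(path) >= max_length:
            return
        for prop, neighbour in adjacency.get(node, []):
            if neighbour in visited:
                continue  # keep paths simple: no node is visited twice
            path.append((node, prop, neighbour))
            visited.add(neighbour)
            extend(neighbour, path, visited)
            visited.remove(neighbour)
            path.pop()

    extend(source, [], {source})
    return results


graph = [
    ("proteinA", "interactsWith", "proteinB"),
    ("proteinB", "interactsWith", "proteinC"),
    ("proteinA", "involvedIn", "proteinC"),
]
all_paths = find_paths(graph, "proteinA", "proteinC")
ppi_only = find_paths(graph, "proteinA", "proteinC",
                      allowed_properties={"interactsWith"})
```

Restricting `allowed_properties` corresponds to the "user-defined object properties only" option on the slide, and `max_length` and `max_results` correspond to the configurable path length and result size.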
  21. Pathway Knowledge Discovery
      2 concepts or keywords ... across multiple relations
      Results with semantic labelling
      Kanagasabai R, Low HS, Ang WT, Wenk MR, Baker CJO. Ontology-centric navigation of pathway information mined from text. Bio-Ontologies SIG: Knowledge in Biology, ISMB, July 2008.
  22. Pathway Knowledge Discovery 2
  23. Navigation of Cancer Pathways
  24. 1 search term (instance or concept) generates a list of natural language questions answerable by the ontology and a direct link to answers.
      Ang WT, Kanagasabai R, Baker CJ. Knowledge Translation: Computing the query potential of bio-ontologies. Genome Informatics Workshop 2008. Submitted …
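The slide does not describe how the questions are generated. One simple way to picture the idea is template filling over the ontology's object properties: every property whose domain or range matches the search term yields one question. The sketch below assumes that approach; the schema entries, the question wording and the function name `candidate_questions` are all hypothetical.

```python
def candidate_questions(term, schema):
    """schema: list of (domain class, object property, range class)."""
    questions = []
    for domain, prop, range_class in schema:
        if term == domain:
            questions.append(
                f"Which {range_class} is related to a given {domain} by '{prop}'?")
        elif term == range_class:
            questions.append(
                f"Which {domain} is related to a given {range_class} by '{prop}'?")
    return questions


# Hypothetical fragment of an ontology schema.
SCHEMA = [
    ("Lipid", "interactsWith", "Protein"),
    ("Lipid", "involvedIn", "Disease"),
]
lipid_questions = candidate_questions("Lipid", SCHEMA)
protein_questions = candidate_questions("Protein", SCHEMA)
```

Because each question is tied to one property, it could be linked directly to the query that answers it, which matches the "direct link to answers" on the slide.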
  25. Application Workflow
  26. Semantic Technologies Architecture
  27. Knowledge Services: Development
      [Figure labels: Knowledge Worker involved in Discovery; Navigation Paradigms; Ontology Engineering; Quality; Semantic; Evolution; Data; Maintenance; Integration; NLP & Text Mining; Databases; Multi-user involvement; Domain Expert; Text Mining; Ontology Engineer; Semantics Engineer; Ontology Engineer; Phase 1; Phase 2]
  28. Annotation Services
  29. Acknowledgements
      Semantic Technology Group: Christopher J. O. Baker, Kanagasabai Rajaraman, Menaka Rajapakse, Anitha Veeramani, Ang Wee Tiong, Alexander Garcia (Alumnus)
      Collaborators: Markus R Wenk, NUS; Low Hong-Sang, NUS; Choo Kar Heng, I2R; Shoba Ranganathan, NUS; Suisheng Tan, I2R