An Up-to-date Knowledge Base and Focused Exploration System for Human Performance and Cognition


Talk given at Human Performance Directorate, Air Force Research Lab, Dayton OH on February 17, 2011.

  • Just say that knowledge bases are useful for two specific purposes. In the next slide, talk about what exactly a KB is and what it looks like.
  • Emphasize that predicates are the primary indicators of new knowledge. People are aware of the concepts, but what is interesting is which predicates (if any) connect those concepts.
  • The edges here show part_of and is_a relations: neuroscience is a subclass of science, while brain is part of the general area of cognitive science. So we do not model a strict ontology here.
  • NCBI: National Center for Biotechnology Information. OBO: Open Biomedical Ontologies. NCBO: National Center for Biomedical Ontology. SNOMED: Systematized Nomenclature of Medicine.
  • RRF: Rich Release Format.

    1. An Up-to-date Knowledge Base and Focused Exploration System for Human Performance and Cognition
       Amit Sheth, LexisNexis Ohio Eminent Scholar; Director, Kno.e.sis Center, Wright State University
       Ramakanth Kavuluru, Postdoctoral Research Scientist, Kno.e.sis Center
       Thanks to Dr. Victor Chan for support and guidance
       HPC-KB team: Christopher Thomas, Wenbo Wang, Alan Smith, Paul Fultz, Delroy Cameron, Priti Parikh
    2. Focused Knowledge Bases
       A knowledge base (KB) functions as a
       • standalone reference for a particular domain of interest
       • backend for knowledge-based search, browsing, and exploration of literature
    3. What is a KB?
       “A body of knowledge describing a topic or domain of interest”
       • categories or classes: Neurotransmitters, Disease
       • individuals (instances of classes): Dopamine, Magnesium, Migraine
       • roles (properties/predicates): inhibits, is a, part of
       • assertions (triples): Dopamine is a neurotransmitter; Magnesium treats Migraine
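The four ingredients on this slide (classes, individuals, roles, assertions) can be sketched as a small set of triples. This is only an illustrative model, not the HPC-KB storage format; the `Triple` class, the helper names, and the assertion that Migraine is a Disease are assumptions added for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One assertion: subject, predicate (role), object."""
    subject: str
    predicate: str
    obj: str


# Assertions from the slide, plus one assumed class membership for Migraine.
kb = {
    Triple("Dopamine", "is_a", "Neurotransmitter"),
    Triple("Migraine", "is_a", "Disease"),  # assumption, implied by the class list
    Triple("Magnesium", "treats", "Migraine"),
}


def instances_of(triples, cls):
    """Individuals asserted to be instances of the given class."""
    return sorted(t.subject for t in triples if t.predicate == "is_a" and t.obj == cls)


def facts_about(triples, entity):
    """All assertions that mention the entity as subject or object."""
    hits = [t for t in triples if entity in (t.subject, t.obj)]
    return sorted(hits, key=lambda t: (t.subject, t.predicate, t.obj))


print(instances_of(kb, "Neurotransmitter"))
print(facts_about(kb, "Migraine"))
```

Keeping class membership as ordinary `is_a` triples means hierarchical and non-hierarchical facts can be queried the same way.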
    4. Then, What are Ontologies?
       • “Ontology is the basic structure or armature around which a knowledge base can be built” (Swartout and Tate, 1999)
       • “An ontology is an explicit representation of a shared understanding of the important concepts in some domain of interest.” (Kalfoglou, 2001)
       • So, mostly static blocks of well-accepted and consensual knowledge
    5. Ontologies in Life Sciences
       • The National Center for Biomedical Ontology (NCBO) - Open Biomedical Ontologies (OBO)
         • About 200 ontologies and 1.5 million terms
       • Only part_of and is_a relations in the Gene Ontology
         • Histolysis is_a positive regulation of cell size
       • Requests for changes are expert reviewed before modifications
    6. What about Emergent Knowledge, Richer Relationships?
       • New scientific results and insights are published every day, backed by experimentation
         • PubMed: 18+ million articles; 1300 new per day
       • Also, what about other predicates besides is_a and part_of (e.g., the UMLS Semantic Network of 54 predicates)?
       • Need a way of capturing and meaningfully utilizing this emerging knowledge
    7. Enter HPC-KB
       [Diagram labels: NLP, Patterns, SCOONER]
    8. Steps in Creating the HPC-KB
       • Carve a focused domain hierarchy out of Wikipedia
       • Extract mentions of entities and relationships in the relevant scientific literature (PubMed abstracts) to support non-hierarchical guidance
       • Map extracted entity mentions to concepts and extracted predicates to relationships to create the knowledge base
    9. Workflow Overview
       [Diagram labels:]
       • HPC keywords
       • Doozer: Base Hierarchy from Wikipedia
       • Focused Pattern based extraction
       • SenseLab Neuroscience Ontologies
       • Initial KB Creation
       • Meta Knowledgebase
       • PubMed Abstracts
       • Knoesis: Parsing based NLP Triples
       • Enrich Knowledge Base
       • NLM: Rule based BKR Triples
       • Final Knowledge Base
    10. Hierarchy Using Wikipedia Categories and Graph Structure
    11. Triple Extraction
        • Open Extraction
          • No fixed number of predetermined entities and predicates
          • At Knoesis: NLP (parsing and dependency trees)
        • Supervised Extraction
          • Predetermined set of entities and predicates
          • At Knoesis: Pattern based extraction to connect entities in the base hierarchy using statistical techniques
          • At NLM: NLP and rule based approaches
    12. Mapping of Triples to Hierarchy
        • Entities in both subject and object must contain at least one concept from the hierarchy to be mapped to the KB
        • Preliminary synonyms based on anchor labels and page redirects in Wikipedia
          • Prolactostatin redirects to Dopamine
        • Predicates (verbs) and entities are subjected to stemming using WordNet
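The mapping rule on this slide can be sketched as a filter: a triple is kept only when both its subject and its object contain a hierarchy concept, after redirect-style synonyms are resolved. The concept set, the substring matching, and the function names below are assumptions for illustration; the WordNet stemming step is omitted.

```python
# Hypothetical slice of the hierarchy and a redirect-derived synonym table.
HIERARCHY = {"dopamine", "migraine", "magnesium", "synaptic plasticity"}
SYNONYMS = {"prolactostatin": "dopamine"}  # Wikipedia redirect named on the slide


def find_concept(mention):
    """Return a hierarchy concept contained in the mention, or None."""
    text = mention.lower()
    for surface, concept in SYNONYMS.items():
        text = text.replace(surface, concept)
    # Prefer longer concepts; break ties alphabetically for determinism.
    for concept in sorted(HIERARCHY, key=lambda c: (-len(c), c)):
        if concept in text:
            return concept
    return None


def map_triple(subject, predicate, obj):
    """Map an extracted triple onto hierarchy concepts, or drop it (None)."""
    s, o = find_concept(subject), find_concept(obj)
    if s is None or o is None:
        return None
    return (s, predicate, o)


# Made-up input strings, used only to exercise the filter.
print(map_triple("Prolactostatin levels", "related_to", "chronic migraine"))
print(map_triple("unknown compound", "related_to", "chronic migraine"))
```

Plain substring matching is the simplest stand-in for "contains a concept"; a real pipeline would match on token boundaries to avoid accidental hits.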
    13. HPC-KB Stats
        • Number of entities: 2 million
        • Number of non-trivial facts: 3 million
        Examples:
        • NLP based: calcium-binding protein S100B modulates long-term synaptic plasticity
        • Pattern based: Olfactory Bulb has physical part of anatomic structure Mitral cell
        • Extracted from all abstracts until Aug 2008
        (Note: Abstracts accessible through Scooner: Oct 2010)
    15. Full Architecture
    16. Scooner Features
        • Knowledge-based browsing: Relations window, inverse relations, creating trails
        • Persistent projects: Work bench, browsing history, comments, filtering
        • Collaboration: comments, dashboard, exporting (sub)projects, importing projects
    17. Scooner Demonstration Video
        http://slidesha.re/scooner-video
    18. Comments on Scooner
        • “The ability to browse predications together with documents will likely reduce the cognitive load required for encountering interesting facts, both for novice users and domain experts.” (Thomas Rindflesch, Researcher in Biomedical Informatics, NLM)
        • “Being able to keep track of multiple articles is a really nice tool to have, and it cuts down on time between jumping back and forth between articles” (Anonymous comment from an AFRL researcher)
    19. Next: Knexpace
        • Automatic Updates
          • Index new abstracts as they arrive
          • Extract relationships as new abstracts arrive
          • Periodically update indices for abstracts and triples
        • Other Maintenance
          • Admin interfaces
          • Application software support
    20. Improving the KB Quality & Filtering
        • Adhere to a stricter schema
          • A fixed number of predicates, with a fixed range and domain for each predicate
        • Example, for immunology and chemical warfare agents:
          • Predicate: activates
          • Restricted Domain: Cytokines OR μRNA
          • Restricted Range: Macrophages
        • Also useful for directly launching queries of the form
          • Question: ?x activates Macrophages
          • Answer: TNF-α and IFN-γ
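The stricter schema on this slide can be sketched as a table of domain and range restrictions that candidate triples must satisfy before entering the KB, followed by the `?x activates Macrophages` query. The entity typing table, the candidate triples, and the function names are assumptions for illustration only.

```python
# Schema from the slide: "activates" takes Cytokines or μRNA as subject
# and Macrophages as object.
SCHEMA = {
    "activates": {"domain": {"Cytokine", "μRNA"}, "range": {"Macrophage"}},
}

# Hypothetical entity typing; a real system would take this from the hierarchy.
ENTITY_TYPE = {
    "TNF-α": "Cytokine",
    "IFN-γ": "Cytokine",
    "Macrophages": "Macrophage",
}


def is_valid(triple):
    """True if the predicate is in the schema and subject/object types fit."""
    subject, predicate, obj = triple
    rule = SCHEMA.get(predicate)
    if rule is None:
        return False
    return ENTITY_TYPE.get(subject) in rule["domain"] and ENTITY_TYPE.get(obj) in rule["range"]


def query_subjects(triples, predicate, obj):
    """Answer a pattern of the form ?x <predicate> <obj>."""
    return sorted(s for s, p, o in triples if p == predicate and o == obj)


candidates = [
    ("TNF-α", "activates", "Macrophages"),
    ("IFN-γ", "activates", "Macrophages"),
    ("Macrophages", "activates", "TNF-α"),   # rejected: wrong domain and range
    ("TNF-α", "modulates", "Macrophages"),   # rejected: predicate not in schema
]
kb = [t for t in candidates if is_valid(t)]
print(query_subjects(kb, "activates", "Macrophages"))
```

Rejecting triples at load time keeps the query side trivial: every stored `activates` fact already satisfies the restrictions.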
    21. Normalize Entities and Predicates
        How do we find
        • that Bovine spongiform encephalopathy and mad cow disease are the same (use the UMLS Metathesaurus!)
        • that long term memory and long lasting memory are the same computationally (UMLS does not work)
        • more complex similarities: HepG(2) cell line and Human Hepatoma G2 cells
        • which textual forms map to the fixed set of predicates: do modulates, regulates, stimulates all map to affects? (NLM expert collaboration & AFRL help)
        NLM tools: MetaMap, SemRep, and other lexical tools
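For the predicate half of this problem, the simplest possible form is a lookup from surface verbs to a fixed predicate set. The sketch below assumes, purely for illustration, that the slide's open question is answered "yes" and that modulates, regulates, and stimulates all collapse to affects; the table and function name are hypothetical and do not reflect MetaMap or SemRep behavior.

```python
# Assumed mapping from surface verbs to canonical predicates.
PREDICATE_MAP = {
    "modulates": "affects",
    "regulates": "affects",
    "stimulates": "affects",
}


def normalize_predicate(verb):
    """Map a surface verb to its canonical predicate; unknown verbs pass through."""
    key = verb.strip().lower()
    return PREDICATE_MAP.get(key, key)


for verb in ["Modulates", "regulates", "treats"]:
    print(verb, "->", normalize_predicate(verb))
```

Letting unknown verbs pass through unchanged makes gaps in the table visible, so they can be reviewed by domain experts rather than silently dropped.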
    22. Provenance and Other Metadata
        • Original abstract PMIDs will be captured for each triple
        • Other data: authors, journal names
        • How about other metadata for filtering:
          • In vitro / in vivo
          • If in vivo, which organism? If mice, which type (e.g., Ames Dwarf)?
          • Which techniques are used (e.g., flow cytometry)?
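One way to picture this metadata is as a provenance record attached to each triple, which then drives filtering. The field names, the placeholder PMIDs, and the sample records below are all assumptions made for the sketch; none of them come from the HPC-KB itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Provenance:
    pmid: str                        # source abstract identifier (placeholder values below)
    setting: str                     # "in vitro" or "in vivo"
    organism: Optional[str] = None   # only meaningful for in vivo studies
    technique: Optional[str] = None  # e.g. "flow cytometry"


@dataclass(frozen=True)
class AnnotatedTriple:
    subject: str
    predicate: str
    obj: str
    source: Provenance


def filter_triples(triples, setting=None, organism=None):
    """Keep triples whose provenance matches every filter that is given."""
    kept = []
    for t in triples:
        if setting is not None and t.source.setting != setting:
            continue
        if organism is not None and t.source.organism != organism:
            continue
        kept.append(t)
    return kept


# Made-up records, used only to exercise the filter.
sample = [
    AnnotatedTriple("A", "affects", "B", Provenance("PMID-PLACEHOLDER-1", "in vivo", "mouse")),
    AnnotatedTriple("A", "affects", "C", Provenance("PMID-PLACEHOLDER-2", "in vitro")),
    AnnotatedTriple("B", "affects", "C", Provenance("PMID-PLACEHOLDER-3", "in vivo", "rat")),
]
print(len(filter_triples(sample, setting="in vivo")))
print(len(filter_triples(sample, setting="in vivo", organism="mouse")))
```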
    23. New Knowledge Example
        • VIP Peptide – increases – Catecholamine Biosynthesis
        • Catecholamines – induce – β-adrenergic receptor activity
        • β-adrenergic receptors – are involved in – fear conditioning
        • VIP Peptide – affects – fear conditioning
        • In Cattle
        • In Rats
        • In Humans
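The chain on this slide can be read as a path through a graph of triples: if each link's object matches the next link's subject, the endpoints become a candidate new relationship. The sketch below assumes the entity mentions have already been normalized so that adjacent triples share a node name (for example, "Catecholamine Biosynthesis" and "Catecholamines" are collapsed to one node); the function name and the breadth-first search are illustrative choices, not the method used in the talk.

```python
from collections import deque

# Triples from the slide, with adjacent mentions collapsed to shared node names.
TRIPLES = [
    ("VIP Peptide", "increases", "Catecholamines"),
    ("Catecholamines", "induce", "β-adrenergic receptors"),
    ("β-adrenergic receptors", "are involved in", "fear conditioning"),
]


def find_chain(triples, start, goal):
    """Breadth-first search for a shortest chain of triples from start to goal."""
    outgoing = {}
    for t in triples:
        outgoing.setdefault(t[0], []).append(t)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for t in outgoing.get(node, []):
            if t[2] not in seen:
                seen.add(t[2])
                queue.append((t[2], path + [t]))
    return None


chain = find_chain(TRIPLES, "VIP Peptide", "fear conditioning")
if chain:
    for s, p, o in chain:
        print(f"{s} -- {p} --> {o}")
    print("Candidate: VIP Peptide -- affects --> fear conditioning")
```

A chain found this way is only a hypothesis. The organism annotations on the slide suggest one reason for caution: each link may have been observed in a different species.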
    24. Domain Specific Provenance
        • For Immunology and Warfare Agent effects
          • Which pathogen: Francisella tularensis
          • Which strains: U112, LVS, …
          • What is measured: cells (dendritic, macrophages, monocytes), proteins, cytokines, chemokines
        • Need to find preexisting taxonomies for organisms, techniques, pathogens; or need to build some and integrate
        • The greater the specificity, the higher the KB quality
    25. Better Ranking of Abstracts and Triples
        • Use search phrases to fine-tune ranking of abstracts and triples
          • Just because the user clicked on neurogenesis does not mean she wants to know everything about it. Predict which triples or abstracts the user is likely to be most interested in during the current search session.
        • Show top-k entities in the result set to facilitate filtering based on top concepts
        • Visualize networks of a set of browsed triples
    26. Semantic Integration
        • Famous OBO Ontologies (total 7 foundry ontologies)
          • Gene Ontology
          • PRotein Ontology
        • Other domain specific OBO candidate ontologies
          • NCBI organismal classification
          • Infectious Disease
          • Human Disease (Tularemia is found here, and also in SNOMED and others; which to choose for mapping?)
          • Pathogen transmission
    27. Semantic Integration, contd…
        • Linked Open Drug Data
          • DrugBank: drugs and drug targets (pathways, structures)
          • Diseasome: disorders and disease-gene associations
          • LinkedCT: Clinical trials
        • Mendelian Inheritance
          • Online Mendelian Inheritance in Animals (OMIA)
          • Online Mendelian Inheritance in Man (OMIM)
        • Different formats: owl, obo, rrf
    28. Tools, Vocabularies, Ontologies
        • UMLS
