A knowledge capture framework for domain specific search systems

This is the product rollout presentation at AFRL on creating a focused knowledge base, search, and retrieval system for the domain of human performance and cognition.

Published in Education, Technology
  • Just say that knowledge bases are useful for two specific purposes. In the next slide, talk about what exactly a KB is and what it looks like.
  • Emphasize that predicates are the primary indicators of new knowledge. People are aware of the concepts, but what is interesting is which predicates (if any) connect those concepts.
  • The edges here show part_of and is_a relations: neuroscience is a subclass of science, while brain is part of the general area of cognitive science. So we do not model a strict ontology here.
  • NCBI: National Center for Biotechnology Information; OBO: Open Biomedical Ontologies; NCBO: National Center for Biomedical Ontology; SNOMED: Systematized Nomenclature of Medicine
  • RRF: Rich Release Format


  • 1. An Up-to-date Knowledge Base and Focused Exploration System for Human Performance and Cognition
    LexisNexis Ohio Eminent Scholar
    Director, Kno.e.sis Center
    Wright State University
    Postdoctoral Research Scientist
    Kno.e.sis Center
    Thanks to Dr. Victor Chan for support and guidance
    HPC-KB team: Christopher Thomas, Wenbo Wang, Alan Smith, Paul Fultz, Delroy Cameron, Priti Parikh
  • 2. Focused Knowledge Bases
    A knowledge base (KB) functions as a
    standalone reference for a particular domain of interest
    backend for knowledge-based search, browsing, and exploration of literature
  • 3. What is a KB?
    “A body of knowledge describing a topic or domain of interest”
    categories or classes – Neurotransmitters, Disease
    individuals (instances of classes)
    Dopamine, Magnesium, Migraine
    roles (properties/predicates) – inhibits, is a, part of
    assertions (triples)
    Dopamine is a neurotransmitter
    Magnesium treats Migraine
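
A minimal sketch of how the assertions on this slide could be held as subject-predicate-object triples. The `Triple` type, the `kb` set, and the `objects_of` helper are illustrative assumptions, not the representation used in HPC-KB.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One assertion: subject, predicate (role), object."""
    subject: str
    predicate: str
    obj: str


# The two assertions from the slide; storing them in a set is an assumption.
kb = {
    Triple("Dopamine", "is_a", "Neurotransmitter"),
    Triple("Magnesium", "treats", "Migraine"),
}


def objects_of(kb, subject, predicate):
    """Return, in sorted order, every object asserted for (subject, predicate)."""
    return sorted(t.obj for t in kb
                  if t.subject == subject and t.predicate == predicate)


print(objects_of(kb, "Dopamine", "is_a"))
```

Here classes ("Neurotransmitter") and individuals ("Dopamine") are both plain strings; a fuller model would distinguish them.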
  • 4. Then, What are Ontologies?
    “Ontology is the basic structure or armature around which a knowledge base can be built” (Swartout and Tate, 1999)
    “An ontology is an explicit representation of a shared understanding of the important concepts in some domain of interest.” (Kalfoglou, 2001)
    So, mostly static blocks of well accepted and consensual knowledge
  • 5. Ontologies in Life Sciences
    The National Center for Biomedical Ontology (NCBO) - Open Biomedical Ontologies (OBO)
    About 200 ontologies and 1.5 million terms
    Only part_of and is_a relations in the Gene Ontology
    Histolysis is_a positive regulation of cell size
    Requests for changes are expert reviewed before modifications
  • 6. What about Emergent Knowledge, Richer Relationships?
    New scientific results and insights, backed by experimentation, are published every day
    PubMed: 18+ million articles; 1300 new per day
    Also, what about other predicates besides is_a and part_of (e.g., the UMLS Semantic Network of 54 predicates)?
    Need a way of capturing and meaningfully utilizing this emerging knowledge
  • 7. Enter HPC-KB
  • 8. Steps in Creating the HPC-KB
    Carve a focused domain hierarchy out of Wikipedia
    Extract mentions of entities and relationships in the relevant scientific literature (PubMed abstracts) to support non-hierarchical guidance.
    Map extracted entity mentions to concepts and extracted predicates to relationships to create the knowledge-base
  • 9. Workflow Overview
    HPC keywords
    Doozer: Base Hierarchy from Wikipedia
    Focused Pattern based extraction
    SenseLab Neuroscience Ontologies
    Initial KB Creation
    Meta Knowledgebase
    PubMed Abstracts
    Knoesis: Parsing based NLP Triples
    Enrich Knowledge Base
    NLM: Rule based BKR Triples
    Final Knowledge Base
  • 10. Hierarchy Using Wikipedia Categories and Graph Structure
  • 11. Triple Extraction
    Open Extraction
    No fixed number of predetermined entities and predicates
    At Knoesis – NLP (parsing and dependency trees)
    Supervised Extraction
    Predetermined set of entities and predicates
    At Knoesis – Pattern based extraction to connect entities in the base hierarchy using statistical techniques
    At NLM – NLP and rule based approaches
  • 12. Mapping of Triples to Hierarchy
    Entities in both subject and object must contain at least one concept from the hierarchy to be mapped to the KB
    Preliminary synonyms based on anchor labels and page redirects in Wikipedia
    Prolactostatin redirects to Dopamine
    Predicates (verbs) and entities are subjected to stemming using WordNet
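
The mapping rule above can be sketched roughly as follows. This is a simplified illustration, not the HPC-KB implementation: the `HIERARCHY` set is a hypothetical stand-in for the Wikipedia-derived hierarchy, concepts are assumed to be single lowercase words, and the WordNet stemming step is omitted. Only the Prolactostatin redirect comes from the slide.

```python
# Hypothetical hierarchy concepts (single lowercase words for simplicity).
HIERARCHY = {"dopamine", "prolactin"}

# Synonyms from Wikipedia page redirects; this entry is the slide's example.
REDIRECTS = {"prolactostatin": "dopamine"}


def concepts_in(mention):
    """Return the hierarchy concepts found among the words of a mention."""
    tokens = [REDIRECTS.get(tok, tok) for tok in mention.lower().split()]
    return {tok for tok in tokens if tok in HIERARCHY}


def map_triple(subject, predicate, obj):
    """Map an extracted triple onto the hierarchy.

    Returns None unless BOTH subject and object contain at least one
    hierarchy concept, which is the rule stated on the slide.
    """
    subj_concepts = concepts_in(subject)
    obj_concepts = concepts_in(obj)
    if not subj_concepts or not obj_concepts:
        return None
    return (sorted(subj_concepts), predicate, sorted(obj_concepts))


print(map_triple("Prolactostatin", "inhibits", "prolactin secretion"))
print(map_triple("Prolactostatin", "inhibits", "unrelated term"))
```

The second call returns `None` because the object mention contains no concept from the assumed hierarchy.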
  • 13. HPC-KB Stats
    • Number of Entities -- 2 Million
    • 14. Number of non-trivial facts -- 3 Million
    NLP based: calcium-binding protein S100B modulates long-term synaptic plasticity
    Pattern based: Olfactory Bulb has physical part of anatomic structure Mitral cell
    • Extracted from all abstracts until Aug 2008
    (Note: Abstracts accessible through Scooner: Oct 2010)
  • 15. Full Architecture
  • 16. Scooner Features
    Knowledge-based browsing: Relations window, inverse relations, creating trails
    Persistent projects: Work bench, browsing history, comments, filtering
    Collaboration: comments, dashboard, exporting (sub)projects, importing projects
  • 17. Comments on Scooner
    “The ability to browse predications together with documents will likely reduce the cognitive load required for encountering interesting facts, both for novice users and domain experts.” – Thomas Rindflesch, Researcher in Biomedical Informatics, NLM.
    “Being able to keep track of multiple articles is a really nice tool to have, and it cuts down on time between jumping back and forth between articles” - Anonymous comment from an AFRL researcher
  • 18. Future Work: Knexpace
    Automatic Updates
    Index new abstracts as they arrive
    Extract relationships as new abstracts arrive
    Periodically update indices for abstracts and triples
    Other Maintenance
    Admin interfaces
    Application software support
  • 19. Improving the KB Quality & Filtering
    Adhere to a stricter schema
    Having a fixed number of predicates and a fixed range and domain for each predicate
    Ex: For the immunology and chemical warfare agents
    Predicate: activates
    Restricted Domain: Cytokines OR μRNA
    Restricted Range: Macrophages
    Also useful to directly launch queries of the form
    Question: ?x activates Macrophages
    Answer: TNF-α and IFN-γ
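
A small sketch of the stricter schema idea, assuming a dictionary-based schema and a lookup table of entity classes. The `SCHEMA`, `ENTITY_CLASS`, and function names are hypothetical, and "Unclassified entity" is a made-up item included only to show a rejected triple.

```python
# Fixed domain and range for each predicate, as on the slide.
SCHEMA = {
    "activates": {"domain": {"Cytokines", "μRNA"}, "range": {"Macrophages"}},
}

# Assumed class assignments for a few entities.
ENTITY_CLASS = {
    "TNF-α": "Cytokines",
    "IFN-γ": "Cytokines",
    "Macrophages": "Macrophages",
}


def is_valid(triple):
    """Accept a triple only if its predicate is known and the subject and
    object classes fall inside that predicate's domain and range."""
    subject, predicate, obj = triple
    rule = SCHEMA.get(predicate)
    if rule is None:
        return False
    return (ENTITY_CLASS.get(subject) in rule["domain"]
            and ENTITY_CLASS.get(obj) in rule["range"])


def query_subjects(kb, predicate, obj):
    """Answer queries of the form '?x predicate obj'."""
    return sorted(s for s, p, o in kb if p == predicate and o == obj)


candidates = [
    ("TNF-α", "activates", "Macrophages"),
    ("IFN-γ", "activates", "Macrophages"),
    ("Unclassified entity", "activates", "Macrophages"),  # rejected by schema
]
kb = [t for t in candidates if is_valid(t)]
print(query_subjects(kb, "activates", "Macrophages"))
```

Filtering at load time means the query itself stays a simple pattern match over already-validated triples.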
  • 20. Normalize Entities and Predicates
    How do we find
    Bovine spongiform encephalopathy and mad cow disease are the same (use the UMLS Metathesaurus!)
    long term memory and long lasting memory are the same computationally (UMLS does not work)
    More complex similarities: HepG(2) cell line and Human Hepatoma G2 cells
    which textual forms map to the fixed set of predicates: Do modulates, regulates, stimulates all map to affects? (NLM expert collaboration & AFRL help)
    NLM tools: MetaMap, SemRep, and other lexical tools
  • 21. Provenance and other meta data
    Original abstract PMIDs will be captured for each triple.
    Other data: authors, journal names
    How about other meta data for filtering:
    In Vitro / In Vivo
    If in vivo, which organism? If mice, which type (e.g., Ames Dwarf)?
    Which techniques are used (e.g., Flow Cytometry)?
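
One possible way to attach this metadata to triples and filter on it is sketched below. The field names, the `filter_triples` helper, and all example values (including the placeholder PMIDs and entity names) are assumptions for illustration, not real data or the planned HPC-KB design.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Provenance:
    pmid: str                           # PMID of the source abstract
    setting: Optional[str] = None       # "in vitro" or "in vivo"
    organism: Optional[str] = None      # only meaningful for in vivo work
    techniques: Tuple[str, ...] = ()    # e.g. ("flow cytometry",)


@dataclass(frozen=True)
class AnnotatedTriple:
    subject: str
    predicate: str
    obj: str
    source: Provenance


def filter_triples(triples, setting=None, organism=None, technique=None):
    """Keep triples whose provenance matches every filter that is given."""
    kept = []
    for t in triples:
        src = t.source
        if setting is not None and src.setting != setting:
            continue
        if organism is not None and src.organism != organism:
            continue
        if technique is not None and technique not in src.techniques:
            continue
        kept.append(t)
    return kept


# Placeholder records; none of these values come from real abstracts.
EXAMPLES = [
    AnnotatedTriple("entity A", "affects", "entity B",
                    Provenance("PMID-PLACEHOLDER-1", "in vivo", "mouse",
                               ("flow cytometry",))),
    AnnotatedTriple("entity C", "affects", "entity D",
                    Provenance("PMID-PLACEHOLDER-2", "in vitro")),
]
print([t.subject for t in filter_triples(EXAMPLES, setting="in vivo")])
```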
  • 22. New Knowledge Example
    VIP Peptide – increases – Catecholamine Biosynthesis
    Catecholamines – induce – β-adrenergic receptor activity
    β-adrenergic receptors – are involved – fear conditioning
    VIP Peptide – affects – fear conditioning
    In Cattle
    In Rats
    In Humans
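
The chain on this slide can be read as path finding over triples. The sketch below uses breadth-first search and is only an illustration under stated assumptions: the `CANONICAL` table, which collapses "Catecholamine Biosynthesis" to "Catecholamines" and "β-adrenergic receptor activity" to "β-adrenergic receptors", is a simplification introduced here so the three statements connect, and the organism of each statement is not checked.

```python
from collections import deque

# The three supporting statements from the slide.
TRIPLES = [
    ("VIP Peptide", "increases", "Catecholamine Biosynthesis"),
    ("Catecholamines", "induce", "β-adrenergic receptor activity"),
    ("β-adrenergic receptors", "are involved", "fear conditioning"),
]

# Assumed normalization so related mentions share one node.
CANONICAL = {
    "Catecholamine Biosynthesis": "Catecholamines",
    "β-adrenergic receptor activity": "β-adrenergic receptors",
}


def canon(entity):
    return CANONICAL.get(entity, entity)


def find_chain(triples, start, goal):
    """Breadth-first search for a chain of triples linking start to goal.

    Returns the list of (subject, predicate, object) steps, or None.
    """
    graph = {}
    for s, p, o in triples:
        graph.setdefault(canon(s), []).append((p, canon(o)))

    start, goal = canon(start), canon(goal)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for predicate, nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(node, predicate, nxt)]))
    return None


for step in find_chain(TRIPLES, "VIP Peptide", "fear conditioning"):
    print(step)
```

A chain found this way is only a candidate for a statement such as "VIP Peptide affects fear conditioning"; whether it holds still depends on the context of each supporting triple.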
  • 23. Domain Specific Provenance
    For Immunology and Warfare Agent effects
    Which pathogen: Francisella tularensis
    Which Strains: U112, LVS, …
    What is measured: Cells (dendritic, macrophages, monocytes), Proteins, cytokines, chemokines
    Need to find preexisting taxonomies for organisms, techniques, pathogens; or need to build some and integrate
    The greater the specificity, the higher the KB quality
  • 24. Better Ranking of Abstracts and Triples
    Use search phrases to fine tune ranking of abstracts and triples
    Just because the user clicked on neurogenesis does not mean she wants to know everything about it. Predict which triples or abstracts the user is most likely to be interested in during the current search session.
    Show top-k entities in the result set to facilitate filtering based on top concepts
    Visualize networks of a set of browsed triples
  • 25. Semantic Integration
    Famous OBO Ontologies (total 7 foundry ontologies)
    Gene Ontology
    PRotein Ontology
    Other domain specific OBO candidate ontologies
    NCBI organismal classification
    Infectious Disease
    Human Disease (Tularemia found here, and also in SNOMED, & others too, what to choose for mapping?)
    Pathogen transmission
  • 26. Semantic Integration, contd…
    Linked Open Drug Data
    DrugBank: drugs and drug targets (pathways, structures)
    Diseasome: disorders and disease-gene associations
    LinkedCT: Clinical trials
    Mendelian Inheritance
    Online Mendelian Inheritance in Animals (OMIA)
    Online Mendelian Inheritance in Man (OMIM)
    Different formats: OWL, OBO, RRF
  • 27. Tools, Vocabularies, Ontologies