Speaker Notes
  • First I will give a little context for ontologies and ontology learning. Given the time constraints, going through the tools one by one is not feasible, so I will give an overview of the techniques commonly used in the tools. I broke the tools into the following categories (explain categories). Then I will cover some possible future work in the conclusion.
  • Ontologies capture concepts and the relationships between them; relationships are both taxonomic (the class hierarchy) and non-taxonomic (all other relationships). Examples: concepts such as medicine and disease; a taxonomic relationship such as doctor and patient being subclasses of person; non-taxonomic relationships such as those between medicine and disease, or between symptoms and disease. Manual ontology building often requires an expert, and experts disagree, so it is not practical for large-scale ontologies; this is also a problem for maintenance. Reusing, maintaining, and combining existing ontologies has the same problems.
  • Ontology learning aims to automate the ontology creation process. It uses techniques from other fields such as machine learning and inductive programming (e.g., clustering and rule-based methods). It is still a long way from being fully automatic and usable on a large scale by novices, and it requires validation and input from the user throughout the process.
  • ASIUM – processes the input text for syntactic structure; creates initial clusters using word frequency; uses a clustering algorithm with user validation at each layer of the clustering.
  • HASTI – one of the most robust end-to-end tools; preprocesses the text using NLP; uses a modular architecture for each knowledge base and extraction unit; uses templates to extract the knowledge; uses other machine learning techniques, such as clustering, to maintain and add to the ontology.
  • LTG – preprocessing using NLP techniques; runs a series of algorithms on the parsed output.
  • Mo’K workbench – preprocessing using NLP techniques; runs a series of algorithms on the parsed output.
  • OntoGen – similar to ASIUM; NLP is run on the input documents, then unsupervised and supervised learning algorithms are run to generate suggestions.
  • OntoLT – NLP is run to create an XML version of the input with NLP annotations; XSLT-based rules are then run to extract the knowledge in the input.
  • SOAT – NLP is run on the input text; rules are used to start with a root concept and grow the taxonomy from it.
  • SVETLAN – preprocessing using NLP techniques; runs a hierarchical clustering algorithm to create a taxonomy using syntactic and semantic similarity.
  • Cooperative learning approach: these tools check with the user at each step for validation, or suggest actions for the user to take.
    • ASIUM – requires a lot of user interaction for validation at each step of the clustering.
    • HASTI – could allow more user adaptation, like the other tools.
    • LTG – an earlier tool that does not actually output an ontology.
    • Mo’K workbench – constrains the types of learning algorithms that can be used.
    • OntoGen – requires a lot of user interaction for validation at each step of the clustering.
    • OntoLT – requires hand-crafted XSLT rules and operators to build the ontology.
    • SOAT – requires a large amount of “high quality input” for domain coverage and concept learning.
    • SVETLAN – more of a support tool to learn a basic taxonomy than an ontology learning tool.
  • Very similar to the first group, but these tools also use structure present in existing ontologies, taxonomies, and knowledge bases; they preprocess the input text and use the same types of learning algorithms.
    • OntoEdit/KAON/Text-To-Onto/Text2Onto – modular architecture; NLP processing of the input text; an algorithm library is used to run several algorithms on the input text; some algorithms use information from input ontologies; algorithm output is in a standard format, so results can be combined to create a meta-learner.
    • OntoLearn – extracts terminology and then uses the input to determine which terms are used only in that domain; creates a concept forest for those terms using the input ontology and inductive learning rules; adds the concept forest back to the ontology and trims the ontology to represent only that domain; more robust than the DODDLE tools because it uses semantic interpretation to associate an appropriate concept identifier with each term in the ontology.
    • ONTOTEXT – a rule-based approach to learning ontology elements and extracting the knowledge.
    • TFIDF – NLP processing of the input; extracts single-word terms; learns multi-word terms and identifies patterns; extracts related terms by applying the learned patterns to the corpus, then returns to the previous step; patterns are learned using existing patterns in the input ontology.
  • OntoEdit/KAON/Text-To-Onto/Text2Onto – one of the most promising tools.
  • OntoLearn – depends on enough of the ontology domain being represented in the input ontology, taxonomy, or knowledge base.
  • ONTOTEXT – requires a large number of hand-created rules to learn even a small part of the ontology.
  • TFIDF – depends on enough of the ontology domain being represented in the input ontology, taxonomy, or knowledge base.
  • DODDLE and DODDLE II focus on building a hierarchically structured set of domain terms. They create an initial ontology by using text matching to map domain terms to a dictionary, then trim the initial ontology by determining which parts should carry over to the final ontology and by looking for inconsistent relationships and badly balanced sub-trees. DODDLE II adds non-taxonomic relationship discovery, using word space and word co-occurrence to determine the strength of word relations via word-vector similarity measures.
  • DODDLE and DODDLE II are highly dependent on the ability of simple text matching to map domain terms to the general dictionary when creating the domain ontology. They could use more sophisticated matching techniques; the problem can be alleviated if a domain-specific dictionary is available.
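The word-vector similarity mentioned in these notes reduces to cosine similarity over co-occurrence vectors. A minimal sketch follows; the terms and co-occurrence counts are invented for illustration, not taken from DODDLE II:

```python
# Cosine similarity over word co-occurrence vectors, as a sketch of the
# word-space measure used to score relation strength between terms.
# All co-occurrence counts below are invented for illustration.
import math

def cosine(u, v):
    """Cosine of the angle between two sparse count vectors (dicts)."""
    shared = set(u) & set(v)
    dot = sum(u[w] * v[w] for w in shared)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Each vector: how often the term co-occurs with various context words.
medicine = {"treat": 4, "dose": 3, "patient": 2}
disease = {"treat": 3, "symptom": 5, "patient": 2}
weather = {"rain": 6, "sunny": 4}

# A strongly related pair scores higher than an unrelated one.
print(cosine(medicine, disease) > cosine(medicine, weather))
```

A tool would keep only candidate relations whose similarity exceeds some threshold, leaving the rest for the user to review.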
  • Two subclasses in this group: enhancing an existing ontology, and merging two ontologies.
    • OntoBuilder and WebKB – create an initial simple ontology from the input and then augment it using further web page input; use HTML page structure heavily to find important information; WebKB is based on supervised machine learning.
    • SyndiKate – uses grammatical information available in the input text to learn new terms and add them to the input ontology; assigns terms to the ontology using an iterative labeling technique.
    • GLUE – creates mappings between two ontologies using probability distributions; learns classifiers to map between the two ontologies.
    • OntoLift – requires the database owner to describe the structure of the underlying data and the types of queries that can be issued to the database; uses mapping rules for the mapping.
    • OntoMerge – translates both ontologies to a semantic representation; uses bridging axioms to map between the two ontologies.
    • PROMPT and Chimaera – do the simple text-based matching for the user; create a list of suggested mappings and allow the user to validate them; check for inconsistencies in added mappings; useful for ontology maintenance.
  • OntoBuilder and WebKB – assume the input text is web based, so coverage depends on the input web pages.
  • SyndiKate – requires a large amount of input data.
  • GLUE – only creates one-to-one mappings.
  • OntoLift – requires a lot of upfront work by the data providers.
  • OntoMerge – adds an extra translation layer between the syntactic and semantic layers.
  • PROMPT and Chimaera – require a lot of user interaction.
  • Selecting a tool: the most important consideration for a user selecting a tool is the type of input they have available; the next most important is the type of knowledge learned (concepts, taxonomic and non-taxonomic relationships). These priorities could be reversed if you are willing to create the types of input needed. Users will probably need to combine several tools to get what is needed, which leads to the workbench approach. Suggested future work – combined workbench: the best approach seen in the tools covered is the workbench approach that allows using multiple tools, ideally with an easily extended algorithm library that uses a common, combinable representation and can take in new algorithms as they are created. The workbench will probably also need to validate the output of the various steps and algorithms, which leads to the next point.
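The workbench idea described in this note amounts to a plugin architecture: every learning algorithm emits results in one common, combinable representation so a meta-learner can merge them. A minimal sketch, with all function names, the tuple format, and the toy corpus invented for illustration:

```python
# Sketch of the combined-workbench idea: each algorithm is a plugin emitting
# (subject, relation, object, confidence) tuples in one shared representation,
# so a meta-learner can pool and reconcile their output. All names invented.

def pattern_algorithm(corpus):
    """Hearst-style lexical pattern: 'X such as Y' suggests Y is-a X."""
    results = []
    for sentence in corpus:
        words = sentence.split()
        for i, w in enumerate(words):
            if w == "such" and 0 < i and i + 2 < len(words) and words[i + 1] == "as":
                results.append((words[i + 2], "is-a", words[i - 1], 0.8))
    return results

def cooccurrence_algorithm(corpus):
    """Naive stand-in for a statistical learner: emits one weak relation."""
    return [("symptom", "related-to", "disease", 0.4)]

def run_workbench(corpus, algorithms):
    """Meta-learner: pool all tuples, keep the highest confidence per triple."""
    merged = {}
    for algo in algorithms:
        for s, r, o, conf in algo(corpus):
            key = (s, r, o)
            merged[key] = max(conf, merged.get(key, 0.0))
    return merged

corpus = ["diseases such as influenza spread quickly"]
print(run_workbench(corpus, [pattern_algorithm, cooccurrence_algorithm]))
```

Because every plugin shares the same output shape, a new algorithm can be dropped into the list without changing the meta-learner, which is the property the combined-workbench proposal depends on.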
  • Suggested future work – validation: validation of ontologies (manually created or machine learned) is still an open research area, and it needs to be resolved before ontologies can be trusted for important tasks. Machine learning validation: precision is used to measure the correctness of an ontology by looking for false positives; recall is used to measure the coverage of the ontology. Validation is an important open area of research in ontology learning, as noted by all of the papers; it could also help solve other problems, such as the heavy reliance on user interaction, which could be replaced with automated validation techniques.
  • A fully automated solution is still a long way off; the workbench approach will help, as will validation support to replace the user in validating the steps of the process. Semantic web: even though it still has a long way to go, the hopes for its possibilities are strong, and its beginnings are already visible.

Transcript

  • 1. Ontology Learning Tools: A Survey of Existing Tools – Patrick Cash
  • 2. Outline
    • Ontology learning
    • Ontology learning from text
      • Learning from text alone
      • Learning from text and other resources
    • Ontology learning from structured data
      • Learning from a machine readable dictionary
      • Learning from existing ontologies
    • Conclusion
  • 3. Ontology Learning
    • Ontology
      • An explicit formal specification of a shared conceptualization of a domain
        • Facilitates knowledge sharing and machine understanding of knowledge
      • Manually building ontologies is a tedious task that becomes a bottleneck to knowledge acquisition
        • Ontology learning techniques were created to address this problem
  • 4. Ontology Learning
    • Uses machine learning and other AI techniques to learn ontology structure from input data
    • Fully automatic ontology learning remains in the distant future
      • Most tools are semi-automatic and require human (expert) intervention using cooperative learning approaches for ontology building
  • 5. Ontology Learning from Text
    • Learning from text alone
      • Natural language processing of the input text to find lexical and syntactic structure
      • Machine learning algorithms used to derive ontology structure out of this structure
        • Clustering – using a user-supplied similarity measure
        • Rule/template-based knowledge extraction
        • Workbench tools allow the use of multiple algorithms
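The clustering step above can be sketched as a simple agglomerative procedure. The Jaccard overlap below stands in for whatever similarity measure the user supplies, and the terms and context sets are invented for illustration:

```python
# Minimal sketch of the clustering step: agglomerative merging of domain
# terms under a user-supplied similarity measure (here, Jaccard overlap of
# the contexts a term occurs in). Terms and contexts are invented.

def jaccard(a, b):
    """User-supplied similarity: overlap of the context sets of two clusters."""
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster_terms(contexts, threshold=0.3):
    """Greedily merge the most similar clusters until no pair is similar enough."""
    clusters = [([term], set(ctx)) for term, ctx in contexts.items()]
    while len(clusters) > 1:
        # Find the most similar pair of clusters.
        i, j = max(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda p: jaccard(clusters[p[0]][1], clusters[p[1]][1]),
        )
        if jaccard(clusters[i][1], clusters[j][1]) < threshold:
            break  # a cooperative tool would ask the user to validate each merge
        terms_j, ctx_j = clusters.pop(j)
        terms_i, ctx_i = clusters[i]
        clusters[i] = (terms_i + terms_j, ctx_i | ctx_j)
    return [terms for terms, _ in clusters]

contexts = {
    "doctor": {"treats", "diagnoses", "prescribes"},
    "nurse": {"treats", "diagnoses", "assists"},
    "disease": {"symptom", "causes"},
}
print(cluster_terms(contexts))  # doctor and nurse share contexts and merge
```

In the cooperative tools described here, the point marked in the loop is where the user would accept or reject each proposed merge, layer by layer.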
  • 6. Ontology Learning from Text
    • Problems with these tools
      • Requires large amounts of user interaction in validation
        • Cooperative learning approach
        • Not practical for large scale ontologies
      • Requires hand created rules and operators
      • Requires large amount of “high quality input” for domain coverage and concept learning
  • 7. Ontology Learning from Text
    • Learning from text and other resources
      • Natural language processing of the input text to find structure and extract domain keywords
      • Machine learning algorithms used to derive ontology structure out of the lexical and syntactic structure
        • Clustering and rule/template-based techniques
      • Uses structure in input ontologies like WordNet
        • Uses collocation and other attributes from the input ontology to form the new ontology
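The domain-keyword extraction step used by tools in this group can be illustrated with a bare-bones TF-IDF scorer: words frequent in the domain corpus but rare elsewhere score highest. The tiny corpora below are invented, and real tools run full NLP preprocessing first:

```python
# Bare-bones TF-IDF scoring for candidate domain terms. Words that are
# frequent in the domain corpus but rare across the whole collection get
# the highest scores. The example corpora are invented for illustration.
import math
from collections import Counter

def tfidf_scores(domain_docs, background_docs):
    all_docs = domain_docs + background_docs
    df = Counter()                     # document frequency across all docs
    for doc in all_docs:
        df.update(set(doc.split()))
    tf = Counter()                     # term frequency in the domain corpus only
    for doc in domain_docs:
        tf.update(doc.split())
    return {w: tf[w] * math.log(len(all_docs) / df[w]) for w in tf}

domain = ["disease symptom disease", "symptom disease medicine"]
background = ["weather sunny rain", "rain sunny sunny"]
scores = tfidf_scores(domain, background)
print(max(scores, key=scores.get))  # "disease" dominates the domain corpus
```

The top-scoring terms would then be handed to the pattern-learning and relation-extraction steps described above.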
  • 8. Ontology Learning from Text
    • Problems with these tools
      • Many of the systems are rule based and require many hand created rules
        • Not practical for large ontologies
      • Depends on enough of the ontology domain being represented in the input ontology, taxonomy or knowledge base
        • Assumption will often not hold for technical or specialized domains
  • 9. Ontology Learning from Structured Data
    • Learning from a machine readable dictionary
      • Takes as input a machine readable dictionary and a set of domain keywords
      • Uses structure in machine readable dictionary
        • Creates initial ontology using information from machine readable dictionary
        • Adds domain keywords to the initial ontology
        • Trims initial ontology to create a domain specific ontology
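The three steps above (build from the dictionary, match domain keywords, trim) can be sketched as a walk over hypernym chains. The tiny dictionary and keywords are invented for illustration:

```python
# Sketch of ontology learning from a machine-readable dictionary:
# keep only the hypernym chains reachable from the domain keywords, which
# simultaneously builds the initial taxonomy and trims off-domain branches.
# The dictionary contents and keywords below are invented.

dictionary = {            # term -> hypernym (parent concept)
    "influenza": "disease",
    "disease": "condition",
    "condition": "state",
    "hammer": "tool",
    "tool": "artifact",
}

def build_domain_taxonomy(dictionary, keywords):
    """Return (term, parent) edges reachable from the domain keywords."""
    keep = set()
    for word in keywords:
        # Simple text matching, as these tools do: walk the hypernym
        # chain upward from each keyword found in the dictionary.
        while word in dictionary:
            keep.add((word, dictionary[word]))
            word = dictionary[word]
    return keep

edges = build_domain_taxonomy(dictionary, ["influenza"])
print(sorted(edges))  # the hammer/tool branch is trimmed away
```

Everything not reachable from a keyword (here, the hammer/tool branch) is trimmed, yielding the domain-specific ontology the slide describes.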
  • 10. Ontology Learning from Structured Data
    • Problems with these tools
      • Depend on enough of the ontology domain being represented in the machine readable dictionary
        • Assumption will often not hold for technical or specialized domains
        • Matching to dictionary is based only on simple text or regular expression matching
  • 11. Ontology Learning from Structured Data
    • Learning from existing ontologies
      • Takes as input an existing ontology and enhances it
        • Uses classification techniques on the input data to determine where it is added to the input ontology
          • Supervised learning techniques with labeled training data
      • Takes as input two or more ontologies and translates or merges them by mapping between them
        • Simple text matching or mapping rules
        • Iterative labeling using statistical machine learning
        • Uses classifiers to map instances from one ontology into the other
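The simple text matching mentioned above can be sketched as normalized name comparison that produces mapping suggestions for the user to validate, in the spirit of tools like PROMPT and Chimaera. The class names below are invented:

```python
# Sketch of text-based mapping suggestions between two ontologies:
# normalize class names, then propose exact matches for the user to accept
# or reject. Class names are invented for illustration.

def normalize(name):
    """Strip case and common separators so near-identical names compare equal."""
    return name.lower().replace("-", "").replace("_", "").replace(" ", "")

def suggest_mappings(classes_a, classes_b):
    """Pair up classes from two ontologies whose normalized names match."""
    index = {normalize(c): c for c in classes_b}
    return [(a, index[normalize(a)]) for a in classes_a if normalize(a) in index]

onto_a = ["Medical-Doctor", "Patient", "Symptom"]
onto_b = ["medical_doctor", "patient", "Treatment"]
print(suggest_mappings(onto_a, onto_b))
```

The output is only a suggestion list; as the slides note, such tools rely on the user to validate each proposed mapping and to catch inconsistencies the matcher cannot see.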
  • 12. Ontology Learning from Structured Data
    • Problems with these tools
      • Makes several assumptions about the structure of the input data
        • Several of the tools are based on extraction of knowledge from HTML pages with specific structures (e.g., forms)
      • Requires large amount of upfront work done by information provider
        • Translation into a tool specific representation
        • Explicit mapping of input data’s structure using schemas and other hooks into the input knowledge base
  • 13. Conclusion
    • Selecting an ontology learning tool
      • Types of input available
      • Types of knowledge learned
    • Combined workbench/framework
      • The most robust tools are workbench or framework based using a modular architecture so that different learning techniques can be used in different use cases
      • A common representation will be needed for tools to work together
  • 14. Conclusion
    • Ontology Validation
      • Approaches used in current research
        • Validate against a gold standard ontology created by an expert
          • Not practical for large ontologies
        • Using machine learning validation techniques to validate the learned ontology
          • Precision: ratio of relevant terms retrieved over the entire number of terms in the ontology
          • Recall: ratio of relevant terms retrieved over the entire number of relevant terms
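Computed against a gold-standard term set, the precision and recall measures defined above reduce to two set operations. The term sets here are invented for illustration:

```python
# Precision and recall of a learned ontology's terms against a gold-standard
# term set, matching the slide's definitions. Term sets are invented.

def precision_recall(learned, gold):
    """Precision = correctness of learned terms; recall = coverage of gold terms."""
    relevant_retrieved = learned & gold
    precision = len(relevant_retrieved) / len(learned)
    recall = len(relevant_retrieved) / len(gold)
    return precision, recall

learned = {"disease", "symptom", "doctor", "weather"}
gold = {"disease", "symptom", "doctor", "medicine", "patient"}
p, r = precision_recall(learned, gold)
print(p, r)  # 3 of 4 learned terms are relevant; 3 of 5 relevant terms found
```

This is the same trade-off the slide implies: a heavily trimmed ontology raises precision at the cost of recall, and vice versa.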
  • 15. Conclusion
    • Ontology learning techniques are still a long way from a fully automatic solution
    • Ontology learning techniques are necessary to make widespread ontology use, and the possibilities that entails, practical
      • Tools that implement these techniques in a user friendly way are necessary for making these techniques available to non-expert users creating ontologies
  • 16.
    • Questions ?