Motivation      Related work   Deficiencies        Research approach   Results   Discussion   Sum. Fw   Questions




               PrOntoLearn: Unsupervised lexico-semantic
             ontology generation using probabilistic methods

                   Saminda Abeyruwan¹  Ubbo Visser¹  Vance Lemmon²  Stephan Schürer³

             ¹ Department of Computer Science, University of Miami
             ² The Miami Project to Cure Paralysis, University of Miami Miller School of Medicine
             ³ Department of Molecular and Cellular Pharmacology, University of Miami Miller School of Medicine

                                             URSW 2010 7th November, 2010



 Outline

        1    Motivation

        2    Related work

        3    Deficiencies

        4    Research approach

        5    Results

        6    Discussion

        7    Summary & Future work

        8    Questions



 Motivation

       Why?
             1   An ontology is a formal, explicit specification of a shared
                 conceptualisation [TRG93, RS98]
             2   Knowledge bases are represented by ontologies [UMLS09]
             3   Formalizing an ontology for a domain is a tedious and cumbersome
                 process (the knowledge acquisition bottleneck, KAB)
             4   Substantially large text corpora are available to be classified into an
                 ontology [BAO09]
             5   Text corpora of the domain of discourse contain
                        Redundancy
                        Structured and unstructured text
                        Noisy data (uncertainty expressed as degree of belief)
                        Lexical ambiguities
                        Semantic heterogeneity problems
             6   The KAB is actively investigated by the Semantic Web
                 community



 General idea


       General idea
             1   Reverse-engineer an ontology bottom-up (lexicon ⇒ ontology)
             2   Bayesian reasoning to deal with degree of belief
             3   Conceptualization is learned through probabilistic reasoning
             4   Lexico-semantic structures are extracted from WordNet 3.0 [WN3009]
             5   Use a top-down approach to check the consistency of the generated
                 ontology
             6   Constrained by conditions and hypotheses
             7   Serialize the learned ontology into OWL DL and query it using
                 SPARQL
       “A little semantics goes a long way” - Hendler hypothesis [JH03]



 Probabilistic reasoning & Heterogeneity


       Probabilistic reasoning
             P-CLASSIC [DK97]
             P-OWL extension [ZD04]
             P-SHIF(D), P-SHOIN(D) & P-Pellet [TL07, PP08]

       Heterogeneity
             Read the Web project [TM09, TM10]
             SEAL, iSEAL & ASIA [RW07, RW08, RW09]
             Taxonomy induction [RS06]
             LOD [JB09, LD06]



 Knowledge acquisition & ontology learning

       Knowledge acquisition
             Approaches [PC09, DSK09, LS09, HC09, JB05]
                    Large scale knowledge extraction
                    Knowledge integration
                    Extracting commonsensical knowledge
                    Textual entailment with first-order-logic
             Tools [TTO00, SS09, OLSW02, TTO01, HT09]
                    Text-To-Onto, Text2Onto, OntoWare.org, LExO & HermiT


       Ontology learning
             Learning [PC09, PH05, CC08, JL09, LBM08]
                    Dealing with uncertainty and inconsistency
                    Semantic concepts with unsupervised statistical learning
                    Semantic Web Services & folksonomy
             Formal concept analysis [PC05]



 Deficiencies
       Related work
             Pros
                        Learning terms, synonyms, concepts, taxonomies, rules, relations and
                        axioms for an ontology O
                        NLP, dictionary parsing, statistical methods, machine learning
                        techniques and co-occurrence among terms
             Cons
                        Top-down approach: a classification or an ontology is given
                        Uncertainty is handled by a domain expert
                        Most of the conceptualisation is learned by predefined rules

       Our approach
             1   Substantially large text corpora
             2   Uncertainty is represented with a probabilistic approach
             3   Unsupervised learning
             4   Hypothesis: ontology generation is much faster
             5   Goal: to achieve maximum confidence



 Goals



       Goals
             1   To generate a consistent lexico-semantic ontology O with a T-Box
                 and an A-Box that can be serialized into OWL DL
             2   Querying via SPARQL [SPARQL08] [JENA09]

       How do we start?
             1   Corpus C contains many documents $d_i \in C$, for $i = 1, 2, 3, \ldots$
             2   The learned lexicon set $L = \{w_1, w_2, \ldots, w_n\}$ contains a finite list of words $w_j$,
                 and the group set $G = \{g_1, g_2, \ldots, g_m\}$ contains a finite set of groups $g_k$



 Overall process




       Definition
       The lexicon L is the set of words from the universe of English
       vocabulary that are part-of-speech tagged with the Penn Treebank English
       POS tag set [PT10] and whose tag is one of the following:

                            Term      Description
                            NN        Noun, singular or mass
                            NNP       Proper Noun, singular
                            NNS       Noun, plural
                            NNPS      Proper Noun, plural
                            JJ        Adjective
                            JJR       Adjective, comparative
                            JJS       Adjective, superlative
                            VB        Verb, base form
                            VBD       Verb, past tense
                            VBG       Verb, gerund or present participle
                            VBN       Verb, past participle
                            VBP       Verb, non-3rd person singular present
                            VBZ       Verb, 3rd person singular present
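The definition splits the admitted tags into two word classes, which the dataset statistics later count separately as concept words and verbs. A minimal Python sketch of that lookup follows; the names `CONCEPT_TAGS`, `VERB_TAGS` and `word_class` are illustrative and not taken from PrOntoLearn. NNPS is included because the definition lists it, although the dataset slides name only NN, NNP, NNS, JJ, JJR and JJS for concept words.

```python
from typing import Optional

# Penn Treebank tags admitted into the lexicon L (from the definition above),
# split into the two word classes counted in the dataset statistics.
CONCEPT_TAGS = {"NN", "NNP", "NNS", "NNPS", "JJ", "JJR", "JJS"}
VERB_TAGS = {"VB", "VBD", "VBG", "VBN", "VBP", "VBZ"}


def word_class(tag: str) -> Optional[str]:
    """Return 'concept' or 'verb' for an admitted tag, None otherwise."""
    if tag in CONCEPT_TAGS:
        return "concept"
    if tag in VERB_TAGS:
        return "verb"
    return None  # e.g. DT, IN, CD: not part of the lexicon


if __name__ == "__main__":
    for word, tag in [("pathway", "NN"), ("comprised", "VBN"), ("of", "IN")]:
        print(word, tag, word_class(tag))
```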



 Phases


       Phases
             1   Pre-processing
                        Stanford tagger (Penn Treebank POS tag set)
                        Filter elements for the lexicon
             2   Syntactic analysis
                        Bootstrap algorithm to count frequencies of words and groups
                        Normalizing, stemming and lemmatization of words
             3   Semantic analysis
                        Bayesian reasoning to produce concepts and relations
                        Subsumption hierarchy induction
                        Hyponymy and meronymy analysis
             4   Representation
                        Serialize to OWL DL



 Pre-processing

       Filter
       Regex [a-zA-Z]+[- ]?\w*, word length (2)

       Example
             1   The mevalonate pathway is comprised of three consecutive reactions
                 that are catalyzed by the enzymes mevalonate kinase (MK; E.C.
                 2.7.1.36), phosphomevalonate kinase (PMK; E.C. 2.7.4.2), and
                 diphosphomevalonate decarboxylase (PDM-DC; E.C. 4.1.1.33).
             2   The DT mevalonate JJ pathway NN is VBZ comprised VBN of IN
                 three CD consecutive JJ reactions NNS that WDT are VBP
                 catalyzed VBN by IN the DT enzymes NNS mevalonate VBP
                 kinase NN -LRB- -LRB- MK NNP ; : E.C. NNP 2.7.1.36 CD
                 -RRB- -RRB- , , phosphomevalonate JJ kinase NN -LRB- -LRB-
                 PMK NNP ; : E.C. NNP 2.7.4.2 CD -RRB- -RRB- , , and CC
                 diphosphomevalonate JJ decarboxylase NN -LRB- -LRB-
                 PDM-DC NN ; : E.C. NNP 4.1.1.33 CD -RRB- -RRB- . .
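A sketch of the pre-processing filter applied to a slice of the tagged example above. Two readings are assumptions: the slide's `w*` is taken to be `\w*` (a backslash lost in extraction), and "word length (2)" is read as a minimum length of two characters. The helper `filter_tokens` and the constants are hypothetical names.

```python
import re
from typing import List, Tuple

# Filter pattern from the slide; the backslash in \w* is an assumption.
WORD_RE = re.compile(r"[a-zA-Z]+[- ]?\w*")
MIN_LEN = 2  # assumed reading of the word-length parameter (2)

# Tags admitted into the lexicon (nouns, adjectives, verbs).
ADMITTED_TAGS = {"NN", "NNP", "NNS", "NNPS", "JJ", "JJR", "JJS",
                 "VB", "VBD", "VBG", "VBN", "VBP", "VBZ"}

# (word, tag) pairs copied from the tagged example sentence.
SAMPLE = [("The", "DT"), ("mevalonate", "JJ"), ("pathway", "NN"),
          ("is", "VBZ"), ("comprised", "VBN"), ("of", "IN"),
          ("three", "CD"), ("MK", "NNP"), (";", ":"), ("E.C.", "NNP"),
          ("2.7.1.36", "CD"), ("PDM-DC", "NN")]


def filter_tokens(tagged: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep tokens whose tag is admitted and whose text passes the regex and length checks."""
    kept = []
    for word, tag in tagged:
        if tag not in ADMITTED_TAGS:
            continue
        if WORD_RE.fullmatch(word) is None:  # e.g. "E.C." is rejected
            continue
        if len(word) < MIN_LEN:
            continue
        kept.append((word, tag))
    return kept


if __name__ == "__main__":
    print(filter_tokens(SAMPLE))
```

Using `fullmatch` rather than `search` is a deliberate choice here: with `search`, "E.C." would pass because its prefix "E" matches the pattern.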



 Syntactic analysis


       Bootstrap
             1   For each document $d_i \in C$, $i = 1, 2, 3, \ldots$
             2   Read each sentence $s_j$ of $d_i$ using OpenNLP
                 ($s_j \in d_i$, $j = 1, 2, 3, \ldots$)
             3   Generate the lexicon L according to the definition of the lexicon
             4   Normalize each word $w_k \in L$: find its lemma using WordNet 3.0,
                 or its stem
             5   Form candidate semantic groups $g_l$ using an N-gram model over the words $w_k$
                 [SJB09]
             6   Extract candidate binary relationships $v_i(g_j, g_k)$, $v_i, g_k \in L$, using the pattern
                 $(NW^{*}\, OW^{*}\, VW\, NW^{*}\, OW^{*})^{*}$
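The relation pattern can be read as a regular expression over word classes: N for concept words (NW), V for verbs (VW) and O for other words (OW). The sketch below is one such reading under stated assumptions, not the authors' implementation: it requires at least one concept word on each side of a verb run, labels the relation with the last verb of the run, and joins each concept-word run with underscores to form a group name. The tag sets, `PATTERN` and `candidate_relations` are hypothetical.

```python
import re
from typing import List, Tuple

CONCEPT_TAGS = {"NN", "NNP", "NNS", "NNPS", "JJ", "JJR", "JJS"}
VERB_TAGS = {"VB", "VBD", "VBG", "VBN", "VBP", "VBZ"}

# N+ O* V+ O* followed by N+; the right-hand group sits in a lookahead
# so it can also serve as the left-hand group of the next relation.
PATTERN = re.compile(r"(N+)O*(V+)O*(?=(N+))")


def token_class(tag: str) -> str:
    """Map a Penn Treebank tag to N (concept word), V (verb) or O (other)."""
    if tag in CONCEPT_TAGS:
        return "N"
    if tag in VERB_TAGS:
        return "V"
    return "O"


def candidate_relations(sentence: List[Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """Return (verb, left_group, right_group) candidates for one tagged sentence."""
    words = [w for w, _ in sentence]
    classes = "".join(token_class(t) for _, t in sentence)  # one char per token
    relations = []
    for m in PATTERN.finditer(classes):
        verb = words[m.end(2) - 1]  # last verb of the run, e.g. "comprised" in "is comprised"
        left = "_".join(words[m.start(1):m.end(1)])
        right = "_".join(words[m.start(3):m.end(3)])
        relations.append((verb, left, right))
    return relations


# First clause of the tagged example sentence.
SENTENCE = [("The", "DT"), ("mevalonate", "JJ"), ("pathway", "NN"), ("is", "VBZ"),
            ("comprised", "VBN"), ("of", "IN"), ("three", "CD"),
            ("consecutive", "JJ"), ("reactions", "NNS"), ("that", "WDT"),
            ("are", "VBP"), ("catalyzed", "VBN"), ("by", "IN"), ("the", "DT"),
            ("enzymes", "NNS"), ("mevalonate", "VBP"), ("kinase", "NN")]

if __name__ == "__main__":
    for rel in candidate_relations(SENTENCE):
        print(rel)
```

On this clause the sketch yields three candidates; the third one (with "mevalonate" as the verb) comes from a tagging error, which is one reason the later probabilistic pruning steps matter.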



 N-Gram model
       3-Gram model

       4-Gram model

       Probability
                                $P(w_i \mid g_j)$, where $i > 0$, $j > 0$
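The slide gives only the quantity $P(w_i \mid g_j)$. The sketch below shows one hypothetical way to obtain it from bootstrap frequencies: groups are sliding windows of n consecutive concept words, and $P(w_i \mid g_j)$ is the corpus frequency of $w_i$ normalized over the distinct words of $g_j$. Both the grouping and the estimator are assumptions for illustration, and the function names are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

Group = Tuple[str, ...]


def ngram_groups(concept_words: List[str], n: int = 3) -> List[Group]:
    """Sliding windows of n consecutive concept words (empty if fewer than n words)."""
    return [tuple(concept_words[i:i + n]) for i in range(len(concept_words) - n + 1)]


def bootstrap_counts(sentences: List[List[str]], n: int = 3) -> Tuple[Counter, Counter]:
    """Count word and group frequencies over sentences of normalized concept words."""
    word_freq: Counter = Counter()
    group_freq: Counter = Counter()
    for words in sentences:
        word_freq.update(words)
        group_freq.update(ngram_groups(words, n))
    return word_freq, group_freq


def p_word_given_group(word: str, group: Group, word_freq: Counter) -> float:
    """Hypothetical estimator: frequency of `word` normalized over the distinct words of `group`."""
    if word not in group:
        return 0.0
    total = sum(word_freq[w] for w in set(group))
    return word_freq[word] / total if total else 0.0


if __name__ == "__main__":
    corpus = [["acetylcholine", "rat", "receptor"],
              ["acetylcholine", "calcium", "receptor"],
              ["acetylcholine", "assay", "buffer"]]
    wf, gf = bootstrap_counts(corpus)
    g = ("acetylcholine", "rat", "receptor")
    for w in g:
        print(w, p_word_given_group(w, g, wf))
```

Under this estimator the probabilities over a group's distinct words sum to one, so a frequent word such as "acetylcholine" dominates the groups it appears in.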



 T-Box subsumption model


       Subsumption model
       [Figure: Bayesian networks BN1 to BN5 over word nodes w1 to w5 and group nodes g1 to g4]



 T-Box relations model

       Relations model

       Semantic mapping
                            $p(C_1, C_2 \mid V) = p(C_1 \mid V)\, p(C_2 \mid V) \rightarrow V(C_1, C_2)$
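The semantic mapping factorizes the joint probability of two concepts given a verb, i.e. it treats $C_1$ and $C_2$ as conditionally independent given $V$. The sketch below computes that product under two assumptions that the slides do not confirm: $p(C \mid V)$ is estimated as the share of $V$'s argument slots (left or right) filled by $C$ in the candidate triples, and the pruning parameter RF acts as a threshold on the product. The example triples and all function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (verb V, concept C1, concept C2)


def concept_given_verb(triples: List[Triple]) -> Dict[str, Dict[str, float]]:
    """Estimate p(C | V) as the share of V's argument slots (left or right) filled by C."""
    slots: Dict[str, Counter] = defaultdict(Counter)
    for verb, c1, c2 in triples:
        slots[verb][c1] += 1
        slots[verb][c2] += 1
    return {v: {c: n / sum(cnt.values()) for c, n in cnt.items()}
            for v, cnt in slots.items()}


def relation_score(p: Dict[str, Dict[str, float]], verb: str, c1: str, c2: str) -> float:
    """p(C1, C2 | V) = p(C1 | V) * p(C2 | V), i.e. conditional independence given V."""
    dist = p.get(verb, {})
    return dist.get(c1, 0.0) * dist.get(c2, 0.0)


def accepted_relations(triples: List[Triple], rf: float) -> List[Triple]:
    """Keep distinct V(C1, C2) whose score reaches rf (rf as a threshold is an assumption)."""
    p = concept_given_verb(triples)
    return sorted(t for t in set(triples) if relation_score(p, *t) >= rf)


if __name__ == "__main__":
    observed = [("catalyze", "enzyme", "reaction"),
                ("catalyze", "enzyme", "reaction"),
                ("catalyze", "kinase", "reaction"),
                ("inhibit", "compound", "kinase")]
    print(accepted_relations(observed, rf=0.1))
```

Because each verb's distribution is normalized over its own argument slots, a verb with few, consistent arguments (here "inhibit") scores higher than a verb whose arguments are spread over many concepts.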



 Semantic analysis & representation



       Semantics
             1   Calculate probabilities
             2   T-Box subsumption model. Pruning parameter KF
             3   T-Box relations model. Pruning parameter RF
             4   Antonymy pruning
             5   Subsumption hierarchy induction
             6   Hyponymy and meronymy analysis using WordNet-recognizable words
             7   Serialize models to OWL DL
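The slides do not spell out how KF prunes the subsumption model. As a hypothetical illustration only, the sketch below treats KF as a threshold on $P(w \mid g)$: each group is attached under its most probable word, with $P(w \mid g)$ taken as word frequency normalized over the group's distinct words, and the edge is kept only if that probability reaches KF. The function name, the estimator, the tie-breaking rule and the example frequencies are all assumptions.

```python
from typing import Dict, List, Mapping, Sequence, Tuple

Group = Tuple[str, ...]


def induce_subsumption(groups: Sequence[Group], word_freq: Mapping[str, int],
                       kf: float) -> Dict[str, List[str]]:
    """Attach each group under its most probable word if P(w | g) >= kf (hypothetical rule)."""
    hierarchy: Dict[str, List[str]] = {}
    for group in groups:
        words = sorted(set(group))  # sorted so that ties break alphabetically
        total = sum(word_freq.get(w, 0) for w in words)
        if total == 0:
            continue
        best = max(words, key=lambda w: word_freq.get(w, 0))  # first maximum wins
        if word_freq.get(best, 0) / total >= kf:
            hierarchy.setdefault(best, []).append("_".join(group))
    return hierarchy


if __name__ == "__main__":
    groups = [("acetylcholine", "rat", "receptor"),
              ("acetylcholine", "calcium", "receptor"),
              ("assay", "compound", "line")]
    freq = {"acetylcholine": 6, "rat": 1, "receptor": 3, "calcium": 1,
            "assay": 2, "compound": 2, "line": 2}
    for kf in (0.3, 0.5):
        print(kf, induce_subsumption(groups, freq, kf))
```

Raising KF in this sketch drops groups whose words are evenly frequent (such as "assay_compound_line"), since no single word is a clear super-concept for them.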



 Example: T-Box Subsumption

       Example



 Example: T-Box Relations

       Example



 Example: Subsumption hierarchy induction

       Example



 Datasets



       Datasets
             1   PubChem assays, a large public high-throughput screening dataset
                 [BAO09] (primary, qualitative evaluation; Semantic Web Challenge
                 2010, http://bioassayontology.org)
             2   Sample collection of 218 web pages extracted from the University of
                 Miami, Dept. of Computer Science (www.cs.miami.edu) domain
                 (quantitative evaluation)
             3   Sample collection of 38 PDF files from the ISWC 2009 proceedings
                 (secondary)



 Dataset: www.cs.miami.edu domain

       Dataset
             Title                  Statistics   Description
             Documents              218          All documents are XHTML formatted with a given template
             Unique ConceptWords    5,384        Normalized candidate concept words from NN, NNP, NNS, JJ, JJR
                                                 & JJS using [a-zA-Z]+[- ]?\w*
             Unique Verbs           835          Normalized verbs from VB, VBD, VBG, VBN, VBP
                                                 & VBZ using [a-zA-Z]+[- ]?\w*
             Total ConceptWords     39,455
             Total Verbs            4,797
             Total Lexicon          44,252       L = ConceptWords ∪ Verbs
             Total Groups           39,455



 Dataset: www.cs.miami.edu domain, quantitative

             KF      Reference ontology 1         Reference ontology 2
                     Prec.   Rec.   F1            Prec.   Rec.   F1
             0.1     0.209   1      0.309         0.424   1      0.596
             0.2     0.194   1      0.325         0.388   1      0.559
             0.3     0.257   1      0.410         0.445   1      0.616
             0.4     0.257   1      0.410         0.438   1      0.609
             0.5     0.257   1      0.410         0.438   1      0.609
             0.6     0.248   1      0.397         0.424   1      0.595
             0.7     0.244   1      0.393         0.415   1      0.587
             0.8     0.236   1      0.383         0.412   1      0.583
             0.9     0.237   1      0.383         0.405   1      0.576
             1.0     0.13    1      0.232         0.309   1      0.472
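F1 is the harmonic mean of precision and recall, $F_1 = 2PR/(P+R)$. Because recall is 1 in every row, it reduces to $2P/(1+P)$, so each F1 value follows from precision alone. A short check in Python; the function name `f1_score` is illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Precision values from reference ontology 1 (KF = 0.2) and
    # reference ontology 2 (KF = 0.3); recall is 1 in both rows.
    for p in (0.194, 0.445):
        print(p, round(f1_score(p, 1.0), 3))
```

For example, P = 0.194 gives 0.325 and P = 0.445 gives 0.616, matching the corresponding rows.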



 Dataset: PubChem dataset (primary)

       Dataset
             Title                  Statistics   Description
             Documents              1,759        All documents are XHTML formatted with a given template
             Unique ConceptWords    13,017       Normalized candidate concept words from NN, NNP, NNS, JJ, JJR
                                                 & JJS using [a-zA-Z]+[- ]?\w*
             Unique Verbs           1,337        Normalized verbs from VB, VBD, VBG, VBN, VBP
                                                 & VBZ using [a-zA-Z]+[- ]?\w*
             Total ConceptWords     631,623
             Total Verbs            109,421
             Total Lexicon          741,044      L = ConceptWords ∪ Verbs
             Total Groups           631,623



 Dataset: BioAssay ontology dataset (primary)




       Evaluation: qualitative
             Availability of ground truth
             Domain expert evaluation (Prof. Stephan Schuerer)
             Results for 3-gram
                    Rich vocabulary
                    Good structure
             Suitable as a seeding ontology to influence domain experts' decisions



 Dataset: BioAssay ontology dataset (primary)

       Excerpt of the generated ontology
       Thing
         assay
           0_acetylcholine
             acetylcholine_plate_step
                 screen {some} assay_compound_line
                 screen {some} cell_compound_line
             acetylcholine_calcium_receptor
                 add {some} acetylcholine_assay_buffer
                 add {some} assay_buffer_second
                 add {some} buffer_second_stimulation
                 add {some} second_step_stimulation
             acetylcholine_receptor_turnover
                 add {some} acetylcholine_assay_buffer
                 add {some} assay_buffer_second
             acetylcholine_rat_receptor
                 screen {some} assay_compound_line
                 screen {some} cell_compound_line
             acetylcholine_nanomolar_plate
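The slides serialize the learned model to OWL DL through Jena. As a library-free illustration of what the excerpt expresses, the sketch below writes it in OWL 2 functional-style syntax instead, reading "p {some} C" as an existential restriction (ObjectSomeValuesFrom). The namespace IRI is a placeholder, and functional syntax is a substitution made for brevity, not the format PrOntoLearn emits.

```python
from typing import List, Tuple

# Placeholder namespace; the slides do not give the ontology IRI.
BASE = "http://example.org/prontolearn/bioassay"

# (subclass, superclass) pairs read off the excerpt (owl:Thing omitted).
SUBCLASS_OF: List[Tuple[str, str]] = [
    ("0_acetylcholine", "assay"),
    ("acetylcholine_plate_step", "0_acetylcholine"),
    ("acetylcholine_calcium_receptor", "0_acetylcholine"),
    ("acetylcholine_receptor_turnover", "0_acetylcholine"),
    ("acetylcholine_rat_receptor", "0_acetylcholine"),
    ("acetylcholine_nanomolar_plate", "0_acetylcholine"),
]

# (class, property, filler) triples for "property {some} filler".
SOME_VALUES: List[Tuple[str, str, str]] = [
    ("acetylcholine_plate_step", "screen", "assay_compound_line"),
    ("acetylcholine_plate_step", "screen", "cell_compound_line"),
    ("acetylcholine_calcium_receptor", "add", "acetylcholine_assay_buffer"),
    ("acetylcholine_calcium_receptor", "add", "assay_buffer_second"),
    ("acetylcholine_calcium_receptor", "add", "buffer_second_stimulation"),
    ("acetylcholine_calcium_receptor", "add", "second_step_stimulation"),
    ("acetylcholine_receptor_turnover", "add", "acetylcholine_assay_buffer"),
    ("acetylcholine_receptor_turnover", "add", "assay_buffer_second"),
    ("acetylcholine_rat_receptor", "screen", "assay_compound_line"),
    ("acetylcholine_rat_receptor", "screen", "cell_compound_line"),
]


def to_functional_syntax(subclass_of: List[Tuple[str, str]],
                         some_values: List[Tuple[str, str, str]],
                         base: str = BASE) -> str:
    """Render subclass axioms and existential restrictions as an OWL 2 functional-syntax document."""
    classes = sorted({c for pair in subclass_of for c in pair}
                     | {c for c, _, _ in some_values}
                     | {f for _, _, f in some_values})
    props = sorted({p for _, p, _ in some_values})
    lines = [f"Prefix(:=<{base}#>)", f"Ontology(<{base}>"]
    lines += [f"  Declaration(Class(:{c}))" for c in classes]
    lines += [f"  Declaration(ObjectProperty(:{p}))" for p in props]
    lines += [f"  SubClassOf(:{sub} :{sup})" for sub, sup in subclass_of]
    lines += [f"  SubClassOf(:{c} ObjectSomeValuesFrom(:{p} :{f}))"
              for c, p, f in some_values]
    lines.append(")")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_functional_syntax(SUBCLASS_OF, SOME_VALUES))
```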



 Discussion



       Discussion
             NLP expressions and our expression; semantic attachment
             Substantial amount of data
             Distinction between concepts and individuals of the concepts
             Words not recognized by WordNet: Porter stemming algorithm
             Complexity
                    Syntactic layer: $O(M \times \max(s_j) \times \max(w_k))$
                    Semantic layer: $O(|L| \times |SuperConcepts|)$
                    Representation layer: complexity of the Jena object model serializer
             Pellet and FaCT++ reasoner output



 Summary & Future work


       Summary
             Goal: construction of an ontology for an arbitrary corpus
             Achievement: seed ontology construction for an arbitrary text corpus
             Probabilistic reasoning to classify lexico-semantic structures

       Future work
             Add a set of English grammar rules to the N-gram models to
             obtain variable window sizes
             Extract information from other sources to provide human-readable
             concepts and roles
             Computational lexical semantics
             Expand the scope by adding more Penn Treebank tags



 Questions



 References

             UMLS. Unified Medical Language System. http://www.nlm.nih.gov/research/umls/, 2009.

             Alfresco Share Team. Alfresco BioAssayOntology. University of Miami, http://share.ccs.miami.edu/share/page/site-index, 2009.

             T. Berners-Lee. Linked Data. W3C Design Issues, 2006.

             J. Volker, P. Haase and P. Hitzler. Learning Expressive Ontologies. Studies on the Semantic Web, Volume 2, 2009.

             T. Mitchell. Populating the Semantic Web by Macro-Reading Internet Text. ISWC Keynote, 2009.

             A. Maedche and S. Staab. The TEXT-TO-ONTO Ontology Learning Environment. ICCS, 2000.

             A. Maedche. Ontology Learning for the Semantic Web. Kluwer Academic Publishers, 2002.

             S. Staab and R. Studer. Handbook on Ontologies. International Handbooks on Information Systems, 2009.

             A. Maedche and R. Volz. The Ontology Extraction and Maintenance Framework Text-To-Onto. In Proceedings of the ICDM'01
             Workshop on Integrating Data Mining and Knowledge Management, 2001.

             P. Cimiano and J. Volker. Text2Onto: A Framework for Ontology Learning and Data-driven Change Discovery. Proceedings of
             the 10th International Conference on Applications of Natural Language to Information Systems (NLDB), volume 3513 of LNCS,
             pages 227-238, Alicante, Spain, Springer, 2005.

             P. Haase and J. Volker. Ontology Learning and Reasoning: Dealing with Uncertainty and Inconsistency. Proceedings of the
             Workshop on Uncertainty Reasoning for the Semantic Web (URSW), pages 45-55, 2005.

             P. Clark and P. Harrison. Large-Scale Extraction and Use of Knowledge from Text. K-CAP, 2009.

             D.S. Kim, K. Barker and B. Porter. Knowledge Integration Across Multiple Texts. K-CAP, 2009.

             L. Schubert. Can We Derive General World Knowledge from Texts? K-CAP, 2009.

             H.C. Cankaya and D. Moldovan. Method for Extracting Commonsense Knowledge. K-CAP, 2009.



             J. Bos and K. Markert. Recognising Textual Entailment with Logical Inference. Proceedings of the Conference on Human
             Language Technology and Empirical Methods in Natural Language Processing, Vancouver, British Columbia, Canada,
             pages 628-635, 2005.

             C. Chemudugunta, A. Holloway, P. Smyth and M. Steyvers. Modeling Documents by Combining Semantic Concepts with
             Unsupervised Statistical Learning. LNCS 5318, 2008.

             L.B. Marinho, K. Buza and L. Schmidt-Thieme. Folksonomy-Based Collabulary Learning. LNCS 5318, 2008.

             J.L. Ambite, S. Darbha, A. Goel, C.A. Knoblock, K. Lerman, R. Parundekar and T. Russ. Automatically Constructing Semantic Web
             Services from Online Sources. LNCS 5823, 2009.

             S. Russell and P. Norvig. Artificial Intelligence: A Modern Approach, 2nd ed. Prentice Hall Series in Artificial Intelligence, 2001.

             S. Banerjee and T. Pedersen. The Design, Implementation and Use of the Ngram Statistics Package. LNCS 2588, 2009.

             T. Pedersen, M. Kayaalp and R. Bruce. Significant Lexical Relationships. 13th National Conference on Artificial Intelligence,
             1996.

             OpenNLP. http://opennlp.sourceforge.net/, 2009.

             WordNet 3.0. http://wordnet.princeton.edu/, 2009.

             A. Ghazvinian, N.F. Noy, C. Jonquet, N. Shah and M.A. Musen. What Four Million Mappings Can Tell You about Two Hundred
             Ontologies. LNCS 5823, 2009.

             J. Dolby, A. Fokoue, A. Kalyanpur, E. Schonberg and K. Srinivas. Extracting Enterprise Vocabularies Using Linked Open Data.
             LNCS 5823, 2009.

             R. Snow, D. Jurafsky and A.Y. Ng. Semantic Taxonomy Induction from Heterogeneous Evidence. Proceedings of the 21st
             International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational
             Linguistics, Sydney, Australia, pages 801-808, 2006.

             R. Wang and W.W. Cohen. Language-Independent Set Expansion of Named Entities Using the Web. Proceedings of the 2007
             Seventh IEEE International Conference on Data Mining, 2007.



             R. Wang and W.W. Cohen. Iterative Set Expansion of Named Entities Using the Web. 2008 8th International Conference on
             Data Mining, 2008.

             R. Wang and W.W. Cohen. Automatic Set Instance Extraction Using the Web. In Proceedings of the 47th Annual Meeting of
             the ACL and the 4th IJCNLP of the AFNLP, 2009.

             SPARQL. SPARQL Query Language for RDF. W3C Recommendation 15 January 2008,
             http://www.w3.org/TR/rdf-sparql-query/, 2008.

             Jena. A Semantic Web Framework for Java. http://jena.sourceforge.net/, 2009.

             P. Cimiano. Ontology Learning and Population from Text: Algorithms, Evaluation and Applications. Springer, 2006.

             P. Mika. Social Networks and the Semantic Web. Springer, 2007.

             T.R. Gruber. A Translation Approach to Portable Ontology Specifications. Knowledge Acquisition, 5(2):199-220, 1993.

             R. Studer, R. Benjamins and D. Fensel. Knowledge Engineering: Principles and Methods. Data & Knowledge Engineering,
             25(1-2):161-198, 1998.

             J. Hendler. On Beyond Ontology. Keynote talk, Second International Semantic Web Conference, 2003.

             P. Cimiano, A. Maedche, S. Staab and J. Volker. Ontology Learning. Handbook on Ontologies, pages 254-267, 2009.

             D. Koller, A. Levy and A. Pfeffer. P-CLASSIC: A Tractable Probabilistic Description Logic. In Proceedings of AAAI-97,
             pages 390–397, 1997.

             Z. Ding and Y. Peng. A Probabilistic Extension to Ontology Language OWL. Proceedings of the 37th Hawaii International
             Conference on System Sciences, 2004.

             T. Lukasiewicz. Probabilistic Description Logics for the Semantic Web. Technical Report Nr. 1843-06-05, Institut für
             Informationssysteme, Technische Universität Wien, 2007.

             Pellet Pronto. http://pellet.owldl.com/pronto/, 2008.



             A. Carlson, J. Betteridge, R. C. Wang, E. R. Hruschka Jr. and T. M. Mitchell. Coupled Semi-Supervised Learning for Information
             Extraction. Proceedings of the Third ACM International Conference on Web Search and Data Mining (WSDM 2010), 2010.

             P. Cimiano, A. Hotho and S. Staab. Learning Concept Hierarchies from Text Corpora Using Formal Concept Analysis. Journal
             of Artificial Intelligence Research, pages 305–339, 2005.

             The Penn Treebank Project. The Penn Treebank Project, http://www.cis.upenn.edu/~treebank/, 2010.

             HermiT. Reasoning with Large Ontologies, http://www.comlab.ox.ac.uk/projects/HermiT/, 2010.

PrOntoLearn: Unsupervised Lexico-Semantic Ontology Generation using Probabilistic Methods

  • 1. Motivation Related work Deficiencies Research approach Results Discussion Sum. Fw Questions PrOntoLearn: Unsupervised lexico-semantic ontology generation using probabilistic methods Saminda Abeyruwan1 Ubbo Visser1 Vance Lemmon2 Stephan Sch¨rer3 u Department of Computer Science, University of Miami The Miami Project to Cure Paralysis, University of Miami Miller School of Medicine Department of Molecular and Cellular Pharmacology, University of Miami Miller School of Medicine URSW 2010 7th November, 2010
  • 2. Motivation Related work Deficiencies Research approach Results Discussion Sum. Fw Questions Outline 1 Motivation 2 Related work 3 Deficiencies 4 Research approach 5 Results 6 Discussion 7 Summary & Future work 8 Questions
  • 3. Motivation Related work Deficiencies Research approach Results Discussion Sum. Fw Questions Motivation Why? 1 An ontology is a formal, explicit specification of a shared conceptualisation [TRG93, RS98] 2 Knowledge-bases are represented by ontologies [UMLS09] 3 Formalizing an ontology for a domain is a tedious and cumbersome process (Knowledge acquisition bottleneck (KAB)) 4 Substantially large text corpora available to be classified into an ontology [BAO09] 5 Text corpora of the domain of discourse contains Redundancy Structured and unstructured text Noisy data (Uncertainty via Degree of belief) Lexical disambiguities Semantic heterogeneity problems 6 Research on KAB is highly investigated by the Semantic (Web) Community
  • 4. Motivation Related work Deficiencies Research approach Results Discussion Sum. Fw Questions General idea General idea 1 Reverse engineering an ontology (bottom-up) (Lexicon ⇒ An ontology) 2 Bayesian reasoning to deal with degree of belief 3 Conceptualization is learned through probabilistic reasoning 4 Lexicon-semantic structues extracted from Wordnet 3.0 [WN3009] 5 Use top-down approach to check the consistency of the generated ontology 6 Constrained by conditions and hypotheses 7 Serialize the learned ontology into OWL DL and query using SPARQL “A little semantics goes a long way” - Hendler hypothesis [JH03]
  • 5. Motivation Related work Deficiencies Research approach Results Discussion Sum. Fw Questions Probabilistic reasoning & Heterogeneity Probabilistic reasoning P-CLASSIC [DK97] P-OWL extension [ZD04] P-SHIF(D), P-SHOIN(D) & P-Pellet [TL07, PP08] Heterogeneity Read the web project [TM09, TM10] SEAL, iSEAL & ASIA [RW07, RW08, RW09] Taxonomy induction [RS06] LOD [JB09, LD06]
  • 6. Motivation Related work Deficiencies Research approach Results Discussion Sum. Fw Questions Knowledge acquisition & ontology learning Knowledge acquisition Approaches [PC09, DSK09, LS09, HC09, JB05] Large scale knowledge extraction Knowledge integration Extracting commonsensical knowledge Textual entailment with first-order-logic Tools [TTO00, SS09, OLSW02, TTO01, HT09] Text-To-Onto, Text2Onto, OntoWare.org LExO & HermiT Ontology learning Learning [PC09, PH05, CC08, JL09, LBM08] Dealing with uncertainty and inconsistency Semantic concepts with unsupervised statistical learning Semantic Web Services & floksonomy Formal concept analysis [PC05]
  • 7. Motivation Related work Deficiencies Research approach Results Discussion Sum. Fw Questions Deficiencies Related work Pros Learning terms, synonyms, concepts, taxonomies, rules, relations and axioms for ontology O NLP, dictionary passing, statistical methods & machine learning techniques and co-occurrence among terms Cons Top-down approach. Classification or an ontology is given Uncertainty is dealt with a domain expert Most of the conceptualisation is learned by predefined rules Our approach 1 Substantially large text corpora 2 Uncertainty is represented with probabilistic approach 3 Unsupervised learning 4 Hypothesis: an ontology generation is much faster 5 Goal: to achieve maximum confidence
  • 8. Goals
    Goals
      1. Generate a consistent lexico-semantic ontology O, with a T-Box and an A-Box, that can be serialized into OWL DL
      2. Query it via SPARQL [SPARQL08] [JENA09]
    How do we start?
      1. A corpus C contains documents d_i ∈ C for i = 1, 2, 3, ...
      2. The learned lexicon set L contains a finite list of words w_j (L = {w_1, w_2, ..., w_n}), and the group set G contains a finite set of groups g_k (G = {g_1, g_2, ..., g_m})
  • 9. Overall process
    [Figure: overall process]
  • 10. Definition
    The lexicon L is the set of words from the universe of English vocabulary that are part-of-speech tagged with the Penn Treebank English POS tag set [PT10] and whose tag is one of the following:

      Tag    Description
      NN     Noun, singular or mass
      NNP    Proper noun, singular
      NNS    Noun, plural
      NNPS   Proper noun, plural
      JJ     Adjective
      JJR    Adjective, comparative
      JJS    Adjective, superlative
      VB     Verb, base form
      VBD    Verb, past tense
      VBG    Verb, gerund or present participle
      VBN    Verb, past participle
      VBP    Verb, non-3rd person singular present
      VBZ    Verb, 3rd person singular present
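    A minimal Python sketch of the lexicon definition, splitting tagged tokens into the concept-word and verb categories that the dataset tables count separately. The dataset slides list concept words as NN, NNP, NNS, JJ, JJR and JJS; grouping NNPS with them is an assumption here, as is reducing normalization to lower-casing (the deck uses WordNet lemmatization or stemming). Function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Assumption: NNPS is treated as a concept-word tag alongside the six listed ones.
CONCEPT_TAGS = frozenset({"NN", "NNP", "NNS", "NNPS", "JJ", "JJR", "JJS"})
VERB_TAGS = frozenset({"VB", "VBD", "VBG", "VBN", "VBP", "VBZ"})


def lexicon_category(tag: str) -> Optional[str]:
    """Return 'concept', 'verb', or None if the tag is outside the lexicon."""
    if tag in CONCEPT_TAGS:
        return "concept"
    if tag in VERB_TAGS:
        return "verb"
    return None


def lexicon_stats(tagged: List[Tuple[str, str]]) -> Dict[str, int]:
    """Unique and total counts of concept words and verbs in (word, tag) pairs."""
    totals: Counter = Counter()
    uniques = {"concept": set(), "verb": set()}
    for word, tag in tagged:
        category = lexicon_category(tag)
        if category is None:
            continue  # token is not part of the lexicon
        totals[category] += 1
        uniques[category].add(word.lower())  # stand-in for real normalization
    return {
        "unique_concept_words": len(uniques["concept"]),
        "unique_verbs": len(uniques["verb"]),
        "total_concept_words": totals["concept"],
        "total_verbs": totals["verb"],
        # Total lexicon = concept words plus verbs, as in the dataset tables.
        "total_lexicon": totals["concept"] + totals["verb"],
    }
```

    For example, `lexicon_stats([("enzymes", "NNS"), ("are", "VBP"), ("by", "IN")])` counts one concept word and one verb and ignores the preposition.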
  • 11. Phases
    1. Pre-processing
       - Stanford tagger (Penn Treebank POS tag set)
       - Filter elements for the lexicon
    2. Syntactic analysis
       - Bootstrap algorithm to count frequencies of words and groups
       - Normalizing, stemming and lemmatization of words
    3. Semantic analysis
       - Bayesian reasoning to produce concepts and relations
       - Subsumption hierarchy induction
       - Hyponym and meronym analysis
    4. Representation
       - Serialize to OWL DL
  • 12. Pre-processing
    Filter: regex ([a-zA-Z]+[- ]?\w*), length of a word (2)
    Example
      1. The mevalonate pathway is comprised of three consecutive reactions that are catalyzed by the enzymes mevalonate kinase (MK; E.C. 2.7.1.36), phosphomevalonate kinase (PMK; E.C. 2.7.4.2), and diphosphomevalonate decarboxylase (PDM-DC; E.C. 4.1.1.33).
      2. The/DT mevalonate/JJ pathway/NN is/VBZ comprised/VBN of/IN three/CD consecutive/JJ reactions/NNS that/WDT are/VBP catalyzed/VBN by/IN the/DT enzymes/NNS mevalonate/VBP kinase/NN -LRB-/-LRB- MK/NNP ;/: E.C./NNP 2.7.1.36/CD -RRB-/-RRB- ,/, phosphomevalonate/JJ kinase/NN -LRB-/-LRB- PMK/NNP ;/: E.C./NNP 2.7.4.2/CD -RRB-/-RRB- ,/, and/CC diphosphomevalonate/JJ decarboxylase/NN -LRB-/-LRB- PDM-DC/NN ;/: E.C./NNP 4.1.1.33/CD -RRB-/-RRB- ./.
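    A sketch of the pre-processing filter applied to tagger output in word/TAG form. Two readings are assumptions: the regex is taken as `[a-zA-Z]+[- ]?\w*` (a backslash before `w` appears to have been lost), and "length of a word (2)" is taken as a minimum length of two characters. The helper names are hypothetical.

```python
import re
from typing import List, Tuple

# Assumed reading of the slide's filter regex; fullmatch rejects tokens such as
# "E.C.", "2.7.1.36" or "-LRB-" that do not start with letters or contain dots.
WORD_PATTERN = re.compile(r"[a-zA-Z]+[- ]?\w*")
MIN_LENGTH = 2  # assumption: words shorter than two characters are dropped


def parse_tagged(text: str) -> List[Tuple[str, str]]:
    """Split 'word/TAG word/TAG ...' into (word, tag) pairs."""
    pairs = []
    for item in text.split():
        word, tag = item.rsplit("/", 1)  # rsplit keeps tokens like "-LRB-/-LRB-" intact
        pairs.append((word, tag))
    return pairs


def keep(word: str) -> bool:
    """True if the word passes both the length and the regex filter."""
    return len(word) >= MIN_LENGTH and WORD_PATTERN.fullmatch(word) is not None


def filter_tagged(pairs: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    return [(w, t) for w, t in pairs if keep(w)]


sample = "The/DT mevalonate/JJ pathway/NN -LRB-/-LRB- MK/NNP ;/: E.C./NNP 2.7.1.36/CD -RRB-/-RRB- ./."
filtered = filter_tagged(parse_tagged(sample))
# filtered == [("The", "DT"), ("mevalonate", "JJ"), ("pathway", "NN"), ("MK", "NNP")]
```

    Tokens that survive this filter would still be checked against the lexicon tag set, so "The/DT" drops out at that later step.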
  • 14. Syntactic analysis
    Bootstrap
      1. For each document d_i ∈ C, i = 1, 2, 3, ...
      2. Read each sentence s_j from d_i using OpenNLP (s_j ∈ d_i for j = 1, 2, 3, ...)
      3. Generate the lexicon L according to the definition of the lexicon
      4. Normalize each lexis w_k ∈ L: lemmatize it using WordNet 3.0, or stem it
      5. Build candidate semantic groups g_l for lexis w_k using an N-gram model [SJB09]
      6. Extract candidate binary relationships v_i(g_j, g_k), v_i, g_k ∈ L, using the pattern (NW* OW* VW* NW* OW*)
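    A simplified sketch of step 6, assuming NW, VW and OW stand for concept (noun or adjective) words, verbs and other words. Each token is mapped to one character (N, V or O) and a regex is run over that string. Requiring at least one noun, one verb and one more noun is a simplification of the starred pattern, and matches are non-overlapping, so a noun run shared by two relations is used only once. All names are hypothetical.

```python
import re
from typing import List, Tuple

CONCEPT_TAGS = {"NN", "NNP", "NNS", "NNPS", "JJ", "JJR", "JJS"}
VERB_TAGS = {"VB", "VBD", "VBG", "VBN", "VBP", "VBZ"}

# Simplified reading of (NW* OW* VW* NW* OW*): noun run, optional other words,
# verb run, optional other words, noun run.
RELATION = re.compile(r"(N+)O*(V+)O*(N+)")


def word_class(tag: str) -> str:
    if tag in CONCEPT_TAGS:
        return "N"
    if tag in VERB_TAGS:
        return "V"
    return "O"


def candidate_relations(tagged: List[Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """Return (subject_group, verb, object_group) candidates for one sentence."""
    classes = "".join(word_class(tag) for _, tag in tagged)  # one char per token
    words = [w.lower() for w, _ in tagged]
    triples = []
    for m in RELATION.finditer(classes):
        subj = "_".join(words[m.start(1):m.end(1)])
        verb = "_".join(words[m.start(2):m.end(2)])
        obj = "_".join(words[m.start(3):m.end(3)])
        triples.append((subj, verb, obj))
    return triples
```

    On the first clause of the pre-processing example (The/DT mevalonate/JJ pathway/NN is/VBZ comprised/VBN of/IN three/CD consecutive/JJ reactions/NNS) this yields `("mevalonate_pathway", "is_comprised", "consecutive_reactions")`.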
  • 15. N-gram model
    [Figures: 3-gram model and 4-gram model]
    Probability P(w_i | g_j), where i > 0, j > 0
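    Group names in the BioAssay excerpt later in the deck (acetylcholine_assay_buffer, assay_buffer_second, buffer_second_stimulation) look like consecutive 3-grams of concept words joined with underscores. The sketch below assumes exactly that: a sliding window of size n over a sentence's concept words, with frequencies counted across sentences. The dataset tables report as many groups as concept-word occurrences, which suggests the original builds one group per occurrence (for example with padding); this sketch does not reproduce that detail. Function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List


def ngram_groups(concept_words: List[str], n: int = 3) -> List[str]:
    """Consecutive n-grams of concept words joined with '_'.

    Sentences with fewer than n concept words yield no group in this sketch.
    """
    words = [w.lower() for w in concept_words]
    return ["_".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def count_groups(sentences: Iterable[List[str]], n: int = 3) -> Counter:
    """Frequency of each candidate group over all sentences."""
    counts: Counter = Counter()
    for concept_words in sentences:
        counts.update(ngram_groups(concept_words, n))
    return counts


words = ["acetylcholine", "assay", "buffer", "second", "stimulation"]
trigrams = ngram_groups(words, 3)
# ['acetylcholine_assay_buffer', 'assay_buffer_second', 'buffer_second_stimulation']
fourgrams = ngram_groups(words, 4)
# ['acetylcholine_assay_buffer_second', 'assay_buffer_second_stimulation']
```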
  • 17. T-Box subsumption model
    [Figure: subsumption model as Bayesian networks BN1 to BN5 connecting words w1 to w5 with groups g1 to g4]
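    One way to read the subsumption model is that a group g shared by several words (as g2 appears under more than one word in the figure) must be assigned to its most probable super concept w, using Bayes' rule P(w | g) = P(g | w) P(w) / P(g). The sketch below is a hypothetical estimate, not the authors' formula: P(w) is the relative frequency of w, P(g | w) is g's share of all group occurrences containing w, P(g) normalizes over the words of g, and the pruning parameter KF is assumed to be a minimum posterior.

```python
from collections import Counter
from typing import Dict, Set, Tuple


def subsumption_posteriors(group_counts: Dict[str, int],
                           word_counts: Dict[str, int]) -> Dict[str, Dict[str, float]]:
    """Hypothetical P(w | g) for every word w inside every group g."""
    total_words = sum(word_counts.values())
    # Total frequency of the groups that contain each word.
    group_mass: Counter = Counter()
    for group, freq in group_counts.items():
        for word in set(group.split("_")):
            group_mass[word] += freq

    posteriors = {}
    for group, freq in group_counts.items():
        joint = {}
        for word in set(group.split("_")):
            p_word = word_counts[word] / total_words
            p_group_given_word = freq / group_mass[word]
            joint[word] = p_group_given_word * p_word  # numerator of Bayes' rule
        evidence = sum(joint.values())  # P(g), normalizing over candidate super concepts
        posteriors[group] = {w: j / evidence for w, j in joint.items()}
    return posteriors


def prune(posteriors: Dict[str, Dict[str, float]], kf: float) -> Set[Tuple[str, str]]:
    """Keep (sub_concept, super_concept) pairs with P(w | g) >= KF (assumed meaning)."""
    return {(g, w) for g, ps in posteriors.items() for w, p in ps.items() if p >= kf}


post = subsumption_posteriors({"assay_buffer": 2, "assay_plate": 1},
                              {"assay": 6, "buffer": 2, "plate": 2})
kept = prune(post, 0.6)  # {("assay_buffer", "assay")}
```

    With these toy counts, assay_buffer is placed under assay (posterior 2/3), while assay_plate splits evenly between assay and plate and is pruned at KF = 0.6.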
  • 18. T-Box relations model
    Relations model: semantic mapping, with concepts C_1 and C_2 treated as conditionally independent given the verb V
      $p(C_1, C_2 \mid V) = p(C_1 \mid V)\, p(C_2 \mid V) \rightarrow V(C_1, C_2)$
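    A sketch of the factorized score, assuming p(C_1 | V) and p(C_2 | V) are estimated as the fraction of a verb's candidate triples whose first or second argument is that concept, and assuming the pruning parameter RF is a minimum score. Both estimates and the function name are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (C1, V, C2)


def relation_scores(triples: List[Triple], rf: float) -> Dict[Triple, float]:
    """Score V(C1, C2) as p(C1 | V) * p(C2 | V); keep scores >= RF."""
    verb_freq = Counter(v for _, v, _ in triples)
    first_arg = Counter((v, c1) for c1, v, _ in triples)
    second_arg = Counter((v, c2) for _, v, c2 in triples)
    scores = {}
    for c1, v, c2 in set(triples):
        p1 = first_arg[(v, c1)] / verb_freq[v]   # p(C1 | V)
        p2 = second_arg[(v, c2)] / verb_freq[v]  # p(C2 | V)
        score = p1 * p2  # conditional independence of C1 and C2 given V
        if score >= rf:
            scores[(c1, v, c2)] = score
    return scores


triples = [("enzyme", "catalyze", "reaction"), ("enzyme", "catalyze", "reaction"),
           ("kinase", "catalyze", "reaction"), ("kinase", "catalyze", "pathway")]
kept = relation_scores(triples, rf=0.2)
# enzyme/kinase -> reaction score 0.5 * 0.75 = 0.375; kinase -> pathway (0.125) is pruned
```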
  • 19. Semantic analysis & representation
    Semantics
      1. Calculate probabilities
      2. T-Box subsumption model, with pruning parameter KF
      3. T-Box relations model, with pruning parameter RF
      4. Antonymy pruning
      5. Subsumption hierarchy induction
      6. Hyponymy and meronymy analysis using WordNet-recognizable words
      7. Serialize the models to OWL DL
  • 20. Example: T-Box subsumption
    [Figure: example]
  • 21. Example: T-Box relations
    [Figure: example]
  • 22. Example: subsumption hierarchy induction
    [Figure: example]
  • 23. Datasets
    1. PubChem assays, a large public high-throughput screening dataset [BAO09] (primary; qualitative evaluation; Semantic Web Challenge 2010, http://bioassayontology.org)
    2. A sample collection of 218 web pages extracted from the University of Miami, Dept. of Computer Science (www.cs.miami.edu) domain (quantitative evaluation)
    3. A sample collection of 38 PDF files from the ISWC 2009 proceedings (secondary)
  • 24. Dataset: www.cs.miami.edu domain
      Title                 Statistics   Description
      Documents             218          All documents are XHTML, formatted with a given template
      Unique ConceptWords   5,384        Normalized candidate concept words from NN, NNP, NNS, JJ, JJR & JJS, using [a-zA-Z]+[- ]?\w*
      Unique Verbs          835          Normalized verbs from VB, VBD, VBG, VBN, VBP & VBZ, using [a-zA-Z]+[- ]?\w*
      Total ConceptWords    39,455
      Total Verbs           4,797
      Total Lexicon         44,252       L = ConceptWords ∪ Verbs
      Total Groups          39,455
  • 25. Dataset: www.cs.miami.edu domain, quantitative
      KF    Ref. ontology 1          Ref. ontology 2
            Prec.   Rec.   F1        Prec.   Rec.   F1
      0.1   0.209   1      0.309     0.424   1      0.596
      0.2   0.194   1      0.325     0.388   1      0.559
      0.3   0.257   1      0.410     0.445   1      0.616
      0.4   0.257   1      0.410     0.438   1      0.609
      0.5   0.257   1      0.410     0.438   1      0.609
      0.6   0.248   1      0.397     0.424   1      0.595
      0.7   0.244   1      0.393     0.415   1      0.587
      0.8   0.236   1      0.383     0.412   1      0.583
      0.9   0.237   1      0.383     0.405   1      0.576
      1.0   0.13    1      0.232     0.309   1      0.472
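    Because recall is 1 at every KF, F1 = 2PR/(P + R) reduces to 2P/(1 + P). This small check (the function name is illustrative) reproduces, for example, F1 ≈ 0.596 for P = 0.424. For the KF = 0.1 row of reference ontology 1 it gives ≈ 0.346 rather than the listed 0.309, so one of those two values may have been mis-transcribed on the slide.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


ref2_kf03 = round(f1(0.445, 1), 3)  # 0.616, as in the table
ref1_kf01 = round(f1(0.209, 1), 3)  # 0.346, versus 0.309 on the slide
```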
  • 27. Dataset: PubChem dataset (primary)
      Title                 Statistics   Description
      Documents             1,759        All documents are XHTML, formatted with a given template
      Unique ConceptWords   13,017       Normalized candidate concept words from NN, NNP, NNS, JJ, JJR & JJS, using [a-zA-Z]+[- ]?\w*
      Unique Verbs          1,337        Normalized verbs from VB, VBD, VBG, VBN, VBP & VBZ, using [a-zA-Z]+[- ]?\w*
      Total ConceptWords    631,623
      Total Verbs           109,421
      Total Lexicon         741,044      L = ConceptWords ∪ Verbs
      Total Groups          631,623
  • 28. Dataset: BioAssay ontology dataset (primary)
    Evaluation: qualitative
      - Ground truth available through domain expert evaluation (Prof. Stephan Schürer)
    Results for 3-gram
      - Rich vocabulary
      - Good structure
      - Suitable as a seeding ontology to influence domain experts' decisions
  • 29. Dataset: BioAssay ontology dataset (primary)
    [Figure: excerpt of the learned ontology under Thing and assay, with concepts such as 0_acetylcholine, acetylcholine_calcium_receptor, acetylcholine_plate_step, acetylcholine_receptor_turnover, acetylcholine_rat_receptor and acetylcholine_nanomolar_plate, and existential restrictions such as screen {some} assay_compound_line, screen {some} cell_compound_line, add {some} acetylcholine_assay_buffer, add {some} assay_buffer_second, add {some} buffer_second_stimulation and add {some} second_step_stimulation]
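    The {some} edges read like OWL existential restrictions (C ⊑ ∃R.D). The deck serializes with Jena in Java; the Python sketch below instead writes such axioms as plain Turtle text to show what the OWL DL output could look like. The namespace http://example.org/prontolearn# is hypothetical, and because the figure does not make clear which concept carries which restriction, the sample axiom is illustrative only.

```python
from typing import List, Tuple

# Hypothetical namespace for the learned ontology.
PREFIXES = """@prefix : <http://example.org/prontolearn#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
"""


def existential_axiom(sub_class: str, prop: str, filler: str) -> str:
    """Turtle for sub_class ⊑ ∃prop.filler (an owl:someValuesFrom restriction)."""
    return (
        f":{sub_class} a owl:Class ;\n"
        f"    rdfs:subClassOf [ a owl:Restriction ;\n"
        f"        owl:onProperty :{prop} ;\n"
        f"        owl:someValuesFrom :{filler} ] .\n"
    )


def to_turtle(axioms: List[Tuple[str, str, str]]) -> str:
    """Declare properties and filler classes, then emit one restriction per axiom."""
    props = sorted({p for _, p, _ in axioms})
    fillers = sorted({f for _, _, f in axioms})
    lines = [PREFIXES]
    lines += [f":{p} a owl:ObjectProperty ." for p in props]
    lines += [f":{c} a owl:Class ." for c in fillers]
    lines += [existential_axiom(*a) for a in axioms]
    return "\n".join(lines)


turtle = to_turtle([("acetylcholine_receptor_turnover", "add", "acetylcholine_assay_buffer")])
```

    The resulting text can be loaded by any Turtle-aware tool and queried with SPARQL.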
  • 30. Discussion
    - NLP expressions and our expression
    - Semantic attachment
    - Substantial amount of data
    - Distinction between concepts and individuals of the concepts
    - WordNet-unrecognizable words: Porter stemming algorithm
    Complexity
      - Syntactic layer: O(M × max(s_j) × max(w_k))
      - Semantic layer: O(|L| × |SuperConcepts|)
      - Representation layer: complexity of the Jena object model serializer
    - Pellet and FaCT++ reasoner output
  • 31. Summary & Future work
    Summary
      - Goal: the construction of an ontology for an arbitrary text corpus
      - Achievement: seed ontology construction for an arbitrary text corpus
      - Probabilistic reasoning to classify lexico-semantic structures
    Future work
      - Include a set of English grammar rules in the N-gram models to get variable window sizes
      - Extract information from other sources to provide human-readable concepts and roles
      - Computational lexical semantics
      - Expand the scope by adding more Penn Treebank tags
  • 33. Questions
  • 34. References
    UMLS. Unified Medical Language System, http://www.nlm.nih.gov/research/umls/, 2009.
    Alfresco Share Team. Alfresco BioAssayOntology, University of Miami, http://share.ccs.miami.edu/share/page/site-index, 2009.
    T. Berners-Lee. Linked Data, W3C Design Issues, 2006.
    J. Völker, P. Haase and P. Hitzler. Learning Expressive Ontologies, Volume 2, Studies on the Semantic Web, 2009.
    T. Mitchell. Populating the Semantic Web by Macro-Reading Internet Text, ISWC Keynote, 2009.
    A. Maedche and S. Staab. The TEXT-TO-ONTO Ontology Learning Environment, ICCS, 2000.
    A. Maedche. Ontology Learning for the Semantic Web, Kluwer Academic Publishers, 2002.
    S. Staab and R. Studer. Handbook on Ontologies, International Handbooks on Information Systems, 2009.
    A. Maedche and R. Volz. The Ontology Extraction and Maintenance Framework Text-To-Onto, In Proceedings of the ICDM'01 Workshop on Integrating Data Mining and Knowledge Management, 2001.
    P. Cimiano and J. Völker. Text2Onto: A Framework for Ontology Learning and Data-driven Change Discovery, Proceedings of the 10th International Conference on Applications of Natural Language to Information Systems (NLDB), volume 3513 of LNCS, pages 227-238, Alicante, Spain, Springer, 2005.
    P. Haase and J. Völker. Ontology Learning and Reasoning: Dealing with Uncertainty and Inconsistency, Proceedings of the Workshop on Uncertainty Reasoning for the Semantic Web (URSW), pages 45-55, 2005.
    P. Clark and P. Harrison. Large-Scale Extraction and Use of Knowledge from Text, K-CAP, 2009.
    D.S. Kim, K. Barker and B. Porter. Knowledge Integration Across Multiple Texts, K-CAP, 2009.
    L. Schubert. Can We Derive General World Knowledge from Texts?, K-CAP, 2009.
    H.C. Cankaya and D. Moldovan. Method for Extracting Commonsense Knowledge, K-CAP, 2009.
  • 35.
    J. Bos and K. Markert. Recognising Textual Entailment with Logical Inference, Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, Vancouver, British Columbia, Canada, pages 628-635, 2005.
    C. Chemudugunta, A. Holloway, P. Smyth and M. Steyvers. Modeling Documents by Combining Semantic Concepts with Unsupervised Statistical Learning, LNCS 5318, 2008.
    L.B. Marinho, K. Buza and L. Schmidt-Thieme. Folksonomy-Based Collabulary Learning, LNCS 5318, 2008.
    J.L. Ambite, S. Darbha, A. Goel, C.A. Knoblock, K. Lerman, R. Parundekar and T. Russ. Automatically Constructing Semantic Web Services from Online Sources, LNCS 5823, 2009.
    S. Russell and P. Norvig. Artificial Intelligence: A Modern Approach, 2nd ed., Prentice Hall Series in Artificial Intelligence, 2001.
    S. Banerjee and T. Pedersen. The Design, Implementation and Use of the Ngram Statistics Package, LNCS 2588, 2009.
    T. Pedersen, M. Kayaalp and R. Bruce. Significant Lexical Relationships, 13th National Conference on Artificial Intelligence, 1996.
    OpenNLP. http://opennlp.sourceforge.net/, 2009.
    WordNet 3.0. http://wordnet.princeton.edu/, 2009.
    A. Ghazvinian, N.F. Noy, C. Jonquet, N. Shah and M.A. Musen. What Four Million Mappings Can Tell You about Two Hundred Ontologies, LNCS 5823, 2009.
    J. Dolby, A. Fokoue, A. Kalyanpur, E. Schonberg and K. Srinivas. Extracting Enterprise Vocabularies Using Linked Open Data, LNCS 5823, 2009.
    R. Snow, D. Jurafsky and A.Y. Ng. Semantic Taxonomy Induction from Heterogeneous Evidence, Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics, Sydney, Australia, pages 801-808, 2006.
    R. Wang and W.W. Cohen. Language-Independent Set Expansion of Named Entities using the Web, Proceedings of the 2007 Seventh IEEE International Conference on Data Mining, 2007.
  • 36.
    R. Wang and W.W. Cohen. Iterative Set Expansion of Named Entities using the Web, 2008 8th International Conference on Data Mining, 2008.
    R. Wang and W.W. Cohen. Automatic Set Instance Extraction using the Web, In Proceedings of the 47th Annual Meeting of the ACL and the 4th IJCNLP of the AFNLP, 2009.
    SPARQL. SPARQL Query Language for RDF, W3C Recommendation 15 January 2008, http://www.w3.org/TR/rdf-sparql-query/, 2008.
    Jena. A Semantic Web Framework for Java, http://jena.sourceforge.net/, 2009.
    P. Cimiano. Ontology Learning and Population from Text: Algorithms, Evaluation and Applications, Springer, 2006.
    P. Mika. Social Networks and the Semantic Web, Springer, 2007.
    T.R. Gruber. A Translation Approach to Portable Ontology Specifications, Knowledge Acquisition, 5(2):199-220, 1993.
    R. Studer, R. Benjamins and D. Fensel. Knowledge Engineering: Principles and Methods, Data & Knowledge Engineering, 25(1-2):161-198, 1998.
    J. Hendler. On Beyond Ontology, Keynote talk, Second International Semantic Web Conference, 2003.
    P. Cimiano, A. Maedche, S. Staab and J. Völker. Ontology Learning, Handbook on Ontologies, pages 254-267, 2009.
    D. Koller, A. Levy and A. Pfeffer. P-CLASSIC: A Tractable Probabilistic Description Logic, In Proceedings of AAAI-97, pages 390-397, 1997.
    Z. Ding and Y. Peng. A Probabilistic Extension to Ontology Language OWL, Proceedings of the 37th Hawaii International Conference on System Sciences, 2004.
    T. Lukasiewicz. Probabilistic Description Logics for the Semantic Web, Technical Report Nr. 1843-06-05, Institut für Informationssysteme, Technische Universität Wien, 2007.
    Pellet Pronto. http://pellet.owldl.com/pronto/, 2008.
  • 37.
    A. Carlson, J. Betteridge, R.C. Wang, E.R. Hruschka Jr. and T.M. Mitchell. Coupled Semi-Supervised Learning for Information Extraction, Proceedings of the Third ACM International Conference on Web Search and Data Mining (WSDM 2010), 2010.
    P. Cimiano, A. Hotho and S. Staab. Learning Concept Hierarchies from Text Corpora Using Formal Concept Analysis, Journal of Artificial Intelligence Research, pages 305-339, 2005.
    The Penn Treebank Project. http://www.cis.upenn.edu/~treebank/, 2010.
    HermiT. Reasoning with Large Ontologies, http://www.comlab.ox.ac.uk/projects/HermiT/, 2010.