Università degli studi di Bari “Aldo Moro”
                                 Dipartimento di Informatica

                      Cooperating Techniques for
             Extracting Conceptual Taxonomies from Text
                                   S. Ferilli, F. Leuzzi, F. Rotella

                AI*IA 2011 XIIth Conference of the Italian Association for Artificial Intelligence
                             Workshop on Mining Complex Patterns (MCP 2011)
                                     Palermo, Italy, September 17, 2011
          1. Introduction & Objectives
          2. Extraction of knowledge from text
          3. Knowledge representation formalism
          4. Identification of relevant concepts
          5. Generalization of similar concepts
          6. Reasoning ‘by association’
          7. Conclusions & Future works

Cooperating Techniques for Extracting Conceptual Taxonomies from Text - S. Ferilli, F. Leuzzi, F. Rotella   2
          The spread of electronic documents and document
          repositories has generated the need for automatic techniques
          to understand and handle the documents content in order to
          help users in satisfying their information needs.

          Full Text Understanding is not trivial, due to:
          1. the intrinsic ambiguity of natural language;
          2. the huge amount of common sense and conceptual background
             knowledge required.

          Lexical and/or conceptual taxonomies are useful for facing
          these problems, although building them manually is very costly
          and error prone.

          This lack is a strong motivation towards the automatic
          construction of conceptual networks by mining large amounts
          of documents in natural language.

          However, even assuming a correct knowledge representation,
          we are still far from simulating human abilities.

          1. Definition of a representation formalism for knowledge
             extracted from natural language texts

          2. Extraction of concepts and relevance assessment

          3. Generalization of concepts having similar descriptions

          4. Definition of a kind of reasoning by concept association that
             looks for possible indirect connections between two
             identified concepts

Extraction of knowledge from text
          Knowledge is extracted by processing each sentence separately.

                    Stanford Parser [1]  ->  Stanford Dependencies [2]

          The final output of the Stanford Dependencies is a typed
          syntactic structure of each sentence.

Knowledge representation
          Among all grammatical roles played by words in a sentence,
          only subject, verb and complement have been considered.
          In the final conceptual graph, subjects and complements will
          represent concepts, while verbs will express relations between
          them.
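
A minimal sketch of how such triples could be turned into a graph. The function name `build_concept_graph`, the edge dictionary layout and the sample triples are assumptions made for illustration; the slides do not specify the data structure actually used.

```python
def build_concept_graph(triples):
    """Collect concepts (nodes) and verb-labelled relations (edges).

    Each triple is (subject, verb, complement): subject and complement
    become nodes, the verb labels the edge between them.
    """
    nodes = set()
    edges = {}  # (subject, complement) -> set of verbs linking them
    for subject, verb, complement in triples:
        nodes.update((subject, complement))
        edges.setdefault((subject, complement), set()).add(verb)
    return nodes, edges


# Hypothetical triples, as they might come out of the parsing step.
triples = [
    ("user", "access", "network"),
    ("user", "use", "network"),
    ("network", "connect", "computer"),
]
nodes, edges = build_concept_graph(triples)
```

Keeping a set of verbs per edge preserves every relation observed between the same pair of concepts.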


Identification of relevant concepts
       A mix of several techniques is brought to cooperation for
       identifying relevant concepts:

       ●   Hub Words [3]: words having high frequency whose relevance is
           computed as:

           \[ W(t) = \alpha w_0 + \beta n + \gamma \sum_{i=1}^{n} w(t_i) \]

           where: w_0, initial weight; n, number of relationships;
                     w(t_i), tf*idf weight of the i-th word related to t.

       ●   Keyword extraction techniques from single documents.
       ●   EM Clustering provided by Weka [4] based on Euclidean
           distance.

Identification of relevant concepts
          Inspired by the Hub Words approach, we have defined a
          Relevance Weight:

          \[ W(\bar{c}) = \alpha \frac{w(\bar{c})}{\max_c w(c)}
                        + \beta \frac{e(\bar{c})}{\max_c e(c)}
                        + \gamma \frac{\sum_{(c,\bar{c})} w(c)}{e(\bar{c})}
                        + \delta \frac{d_M - d(\bar{c})}{d_M}
                        + \varepsilon \frac{k(\bar{c})}{\max_c k(c)} \]

          where the five terms are labelled A, B, C, D and E respectively,
          and \alpha + \beta + \gamma + \delta + \varepsilon = 1.

          Nodes in the network are ranked by decreasing Relevance Weight.
          A suitable cut-point in the ranking is determined by choosing
          the first item c_k such that:

          \[ W(c_k) - W(c_{k+1}) \ge p \cdot \max_{i=0,\dots,n-1} \left( W(c_i) - W(c_{i+1}) \right) \]

          where: p \in [0,1]
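
The weighting and the cut-point rule above can be sketched as follows. This is only an illustration under stated assumptions: the dictionaries `w`, `e`, `d`, `k`, the `neighbors` mapping and all numeric values are hypothetical inputs, and the function names are not taken from the original work.

```python
def relevance_weight(node, w, e, neighbors, d, d_max, k, coeffs):
    """Sum of the five components A..E for one node.

    w: initial weights, e: number of edges, d: distance from the
    cluster center, k: keyword-extraction score (all dicts by node).
    coeffs: (alpha, beta, gamma, delta, epsilon), expected to sum to 1.
    """
    alpha, beta, gamma, delta, epsilon = coeffs
    comp_a = alpha * w[node] / max(w.values())
    comp_b = beta * e[node] / max(e.values())
    # Average initial weight of the neighbors (0 for an isolated node).
    comp_c = gamma * sum(w[n] for n in neighbors[node]) / e[node] if e[node] else 0.0
    comp_d = delta * (d_max - d[node]) / d_max
    comp_e = epsilon * k[node] / max(k.values())
    return comp_a + comp_b + comp_c + comp_d + comp_e


def cut_point(ranked_weights, p):
    """Index k of the first item whose gap to the next one is at least
    p times the largest gap; ranked_weights must be in decreasing order."""
    gaps = [ranked_weights[i] - ranked_weights[i + 1]
            for i in range(len(ranked_weights) - 1)]
    if not gaps:
        return 0
    threshold = p * max(gaps)
    for index, gap in enumerate(gaps):
        if gap >= threshold:
            return index
    return len(gaps) - 1


# Hypothetical two-node example.
w = {"a": 2.0, "b": 1.0}
e = {"a": 1, "b": 1}
neighbors = {"a": ["b"], "b": ["a"]}
d = {"a": 0.0, "b": 1.0}
k = {"a": 1.0, "b": 0.5}
weight_a = relevance_weight("a", w, e, neighbors, d, 2.0, k,
                            (0.2, 0.2, 0.2, 0.2, 0.2))
cut_strict = cut_point([0.9, 0.85, 0.5, 0.45], p=1.0)
cut_loose = cut_point([0.9, 0.85, 0.5, 0.45], p=0.1)
```

With p = 1 the cut falls at the largest gap in the ranking, while smaller values of p accept the first sufficiently large gap and therefore tend to keep fewer concepts.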
Identification of relevant concept
               Relevance Weight in details
                          Definition of the Initial Weight

          The whole set of triples <subject,verb,complement> is
          represented in a Concepts x Attributes matrix V recalling the
          classical Terms x Documents Vector Space Model.

          Resembling tf*idf:
          \[ \frac{f_{i,j}}{\sum_k f_{k,j}} \cdot \log \frac{|A|}{|\{ j : c_i \in a_j \}|} \]

          Therefore component A is:
          \[ \alpha \frac{w(\bar{c})}{\max_c w(c)} \]
          where w(c) is the initial weight assigned to node c computed
          according to the above tf*idf schema.
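
As a rough illustration of the tf*idf-like schema, the sketch below computes one weight per cell of a Concepts x Attributes frequency matrix. The matrix values are invented, and the sketch stops at the per-cell weights because the slides do not say how they are aggregated into the single value w(c).

```python
import math


def tfidf_like_weights(freq):
    """freq[i][j] = frequency of concept i with attribute j.

    Returns a matrix of the same shape where each cell is
    (f_ij / sum_k f_kj) * log(|A| / |{j : c_i in a_j}|).
    """
    n_concepts = len(freq)
    n_attrs = len(freq[0])
    # Column totals: sum over all concepts for each attribute.
    col_totals = [sum(freq[k][j] for k in range(n_concepts))
                  for j in range(n_attrs)]
    weights = []
    for i in range(n_concepts):
        # Number of attributes in which concept i occurs at all.
        n_containing = sum(1 for j in range(n_attrs) if freq[i][j] > 0)
        idf = math.log(n_attrs / n_containing) if n_containing else 0.0
        row = [(freq[i][j] / col_totals[j]) * idf if col_totals[j] else 0.0
               for j in range(n_attrs)]
        weights.append(row)
    return weights


# Hypothetical 2 concepts x 2 attributes matrix.
weights = tfidf_like_weights([[2, 0], [2, 4]])
```

In this toy matrix the second concept occurs with every attribute, so its logarithmic factor is zero and all its weights vanish, which mirrors the behaviour of idf for terms that appear in every document.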

Identification of relevant concept
               Relevance Weight in details
                                   Connections Number
          Component B considers the number of connections (edges) in
          which \bar{c} is involved:
          \[ \beta \frac{e(\bar{c})}{\max_c e(c)} \]

                          Neighborhood Weight Summary
          Component C takes into account the average
          initial weight of all neighbors of \bar{c}:
          \[ \gamma \frac{\sum_{(c,\bar{c})} w(c)}{e(\bar{c})} \]

Identification of relevant concept
               Relevance Weight in details
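                            Inverse Distance from Center
          Component D represents the closeness to the center of the cluster:
          \[ \delta \frac{d_M - d(\bar{c})}{d_M} \]

                                           KE Influence
          Component E takes into account the outcome of three keyword
          extraction (KE) techniques suitably weighted:
          \[ \varepsilon \frac{k(\bar{c})}{\max_c k(c)} \]
          where
          \[ k(\bar{c}) = \varsigma\, k_{co\text{-}occurrences}(\bar{c}) + \eta\, k_{synset}(\bar{c}) + \theta\, k_{mvn}(\bar{c}) \]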

Identification of relevant concept
               Relevance Weight in details
          ●    KE based on co-occurrences:
               \[ k_{co\text{-}occurrences} = \varsigma \frac{\chi}{\max_{cluster} \chi} \]

          ●    KE based on WordNet Synsets:
               \[ k_{synset} = \eta \frac{kw_{synset}}{\max(kw_{synset})} \]

          ●    KE by means of a Multivariate Normal Distribution (MVN):
               \[ k_{mvn} = \theta \frac{kw_{mvn}}{\max(kw_{mvn})} \]

Identification of relevant concept
           Test #      α        β        γ        δ        ε        p
              1       0.10     0.10     0.30     0.25     0.25     1.0
              2       0.20     0.15     0.15     0.25     0.25     0.7
              3       0.15     0.25     0.30     0.15     0.15     1.0

           Test #   Concept       A          B        C         D        E        W
              1     network       0.100      0.100    0.021     0.178    0.250    0.649
                    access        0.001      0.001    0.154     0.239    0.250    0.646
                    subset        6.32E-4    0.001    0.150     0.239    0.250    0.641
              2     network       0.200      0.150    0.0105    0.178    0.250    0.789
              3     network       0.150      0.250    0.021     0.146    0.150    0.717
                    user          0.127      0.195    0.022     0.146    0.150    0.641
                    number        0.113      0.187    0.022     0.146    0.150    0.619
                    individual    0.103      0.174    0.020     0.146    0.150    0.594

Generalization of similar concepts
                         Pairwise clustering
          Takes into account the description of each concept, consisting of
          a binary vector that represents the presence or absence (1 or 0
          respectively) of a <subject,complement> relation between
          the involved concepts. The Hamming distance provides a
          similarity evaluation between them.
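
A small sketch of the pairwise comparison described above. Dividing the Hamming distance by the vector length, the threshold value and the sample vectors are assumptions made for this example; the slides do not state how the distance is normalized.

```python
def hamming_distance(u, v):
    """Fraction of positions in which two binary vectors differ."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a != b for a, b in zip(u, v)) / len(u)


def similar_pairs(descriptions, threshold):
    """Return the concept pairs whose descriptions are within threshold."""
    names = sorted(descriptions)
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if hamming_distance(descriptions[first], descriptions[second]) <= threshold:
                pairs.append((first, second))
    return pairs


# Hypothetical binary descriptions (1 = relation present, 0 = absent).
descriptions = {
    "access": [1, 0, 1, 0],
    "subset": [1, 0, 1, 1],
    "user":   [0, 1, 0, 0],
}
candidates = similar_pairs(descriptions, threshold=0.3)
```

Pairs that pass the threshold are only candidates for generalization; in the approach presented here they are further checked through the taxonomical similarity.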

Generalization of similar concepts
            WordNet is an external resource that has some useful
            features:
            1. it is a lexical taxonomy;
            2. each concept is described as a set of synonyms (synset);
            3. synsets are interlinked by means of conceptual-semantic
               and lexical relations.

            We focus on hypernymy, a relation that links the
            current synset to more general ones.


Generalization of similar concepts
            Taxonomical similarity function
    More general: provides a similarity value on the basis of
    common relations, without focusing on the specific path.

    More specific: provides a similarity value on the basis of
    common relations, relying on the specific path.

Generalization of similar concepts
                       WSD Domain Driven
          One Domain per Discourse assumption: many uses of a word
          in a coherent portion of text tend to share the same domain.
       ●   Prevalent domain
       ●   Extraction of all synsets for each term
       ●   Extraction of all domains for each synset
       ●   Choice of prevalent domain synset
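
One possible reading of the diagram is sketched below: collect the synsets of every term, collect the domains of every synset, take the most frequent domain as the prevalent one, and pick for each term a synset belonging to it. The ordering of the steps, the function name and the tiny synset/domain inventory are all assumptions; the identifiers such as `network#1` are invented and do not come from WordNet.

```python
from collections import Counter


def domain_driven_wsd(terms, synsets_of, domains_of):
    """Choose, for each term, a synset in the prevalent domain.

    synsets_of: term -> list of synset ids
    domains_of: synset id -> list of domain labels
    """
    counts = Counter()
    for term in terms:
        for synset in synsets_of.get(term, []):
            counts.update(domains_of.get(synset, []))
    if not counts:
        return None, {}
    prevalent = counts.most_common(1)[0][0]
    chosen = {}
    for term in terms:
        # First synset of the term tagged with the prevalent domain, if any.
        chosen[term] = next(
            (s for s in synsets_of.get(term, [])
             if prevalent in domains_of.get(s, [])),
            None,
        )
    return prevalent, chosen


# Invented inventory, for illustration only.
synsets_of = {
    "network": ["network#1", "network#2"],
    "platform": ["platform#1", "platform#2"],
    "user": ["user#1"],
}
domains_of = {
    "network#1": ["computer_science"],
    "network#2": ["broadcasting"],
    "platform#1": ["architecture"],
    "platform#2": ["computer_science"],
    "user#1": ["computer_science"],
}
prevalent, chosen = domain_driven_wsd(
    ["network", "platform", "user"], synsets_of, domains_of)
```

A term with no synset in the prevalent domain is mapped to `None` here; how such cases are handled in the original system is not described in the slides.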

Generalization of similar concepts
          Two toy experiments have been performed, with the Hamming
          distance threshold set to 0.001 and 0.0001 respectively,
          while the taxonomical similarity function threshold was kept
          equal to 0.4.

Reasoning ‘by association’
                      Breadth-First Search
          Given two nodes (concepts), a Breadth-First Search starts
          from both nodes: the former searches the latter's frontier and
          vice versa, until the two frontiers meet at common nodes.
          Then the path is restored going backward to the roots in both
          directions.
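
The search can be sketched as a two-sided Breadth-First Search over an adjacency dictionary. This is an illustrative sketch, not the authors' implementation: the graph is treated as undirected, verb labels on the edges are ignored, and the sample graph is invented.

```python
from collections import deque


def bidirectional_bfs(graph, start, goal):
    """Return a path connecting start and goal, or None if there is none.

    graph: node -> list of adjacent nodes (assumed undirected).
    """
    if start == goal:
        return [start]
    parents_start = {start: None}
    parents_goal = {goal: None}
    frontier_start = deque([start])
    frontier_goal = deque([goal])

    def expand(frontier, parents, other_parents):
        # Expand one whole level; return the meeting node if found.
        for _ in range(len(frontier)):
            node = frontier.popleft()
            for neighbor in graph.get(node, ()):
                if neighbor not in parents:
                    parents[neighbor] = node
                    if neighbor in other_parents:
                        return neighbor
                    frontier.append(neighbor)
        return None

    while frontier_start and frontier_goal:
        meeting = expand(frontier_start, parents_start, parents_goal)
        if meeting is None:
            meeting = expand(frontier_goal, parents_goal, parents_start)
        if meeting is not None:
            # Walk back to the start root, then forward to the goal root.
            path = []
            node = meeting
            while node is not None:
                path.append(node)
                node = parents_start[node]
            path.reverse()
            node = parents_goal[meeting]
            while node is not None:
                path.append(node)
                node = parents_goal[node]
            return path
    return None


# Invented concept graph.
graph = {
    "adult": ["freedom", "platform"],
    "freedom": ["adult"],
    "platform": ["adult", "technology"],
    "technology": ["platform", "internet"],
    "internet": ["technology"],
    "island": [],
}
path = bidirectional_bfs(graph, "adult", "internet")
```

Searching from both ends keeps each frontier small, since the two searches only need to cover about half of the connecting path each.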

Reasoning ‘by association’
          The table below shows a sample of possible outcomes.
          E.g., an interpretation of case 5 can be:
          “the adults write about freedom and use platform, that is
          recognized as a technology, as well as the internet”.

    This work proposes an approach to automatically extract conceptual
    taxonomies from natural language texts.

    It works by mixing different techniques in order to:
    ●   identify relevant terms/concepts in text;
    ●   generalize similar concepts;
    ●   perform some kind of reasoning “by association”.

    Preliminary experiments show that this approach can be viable
    although extensions and refinements are needed.
    A reliable outcome might help users in understanding the text
    content and machines to automatically perform some kind of
    reasoning on the taxonomy.
Future works
          1. Extending the knowledge representation formalism to
             express negation.

          2. Defining a strategy to make a better choice of weights in
             Relevance Weight computation.

          3. Enriching the adjacency matrix to improve concept

          4. Exploring alternatives to the One Domain per Discourse (ODD)
             assumption, to overcome its limits.

          5. Extending the taxonomical similarity measures, which currently
             take into account only the hypernym relation, with other
             relations in order to obtain a more accurate similarity.

          6. Defining a strategy to prefer one verb rather than keeping all
             of them in the reasoning ‘by association’ phase.

          [1] Dan Klein and Christopher D. Manning. Fast exact
          inference with a factored model for natural language parsing.
          In Advances in Neural Information Processing Systems,
          volume 15. MIT Press, 2003.
          [2] Marie-Catherine de Marneffe, Bill MacCartney, and
          Christopher D. Manning. Generating typed dependency parses
          from phrase structure trees. In LREC, 2006.
          [3] Sang Ok Koo, Soo Yeon Lim, and Sang-Jo Lee. Constructing
          an ontology based on hub words. In ISMIS’03, pages 93–97, 2003.
          [4] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann,
          and I.H. Witten. The WEKA data mining software: an update.
          SIGKDD Explorations, 11(1):10–18, 2009.

Cooperating Techniques for Extracting Conceptual Taxonomies from Text - S. Ferilli, F. Leuzzi, F. Rotella   24

More Related Content

What's hot

Integration in Finite Terms
Integration in Finite TermsIntegration in Finite Terms
Integration in Finite Terms
Kp Hart
Text smilarity02 corpus_based
Text smilarity02 corpus_basedText smilarity02 corpus_based
Text smilarity02 corpus_based
Jarrar.lecture notes.aai.2012s.descriptionlogic
Jarrar.lecture notes.aai.2012s.descriptionlogicJarrar.lecture notes.aai.2012s.descriptionlogic
Jarrar.lecture notes.aai.2012s.descriptionlogic
Exempler approach
Exempler approachExempler approach
Exempler approach
C Meenakshi Meyyappan
Extending the knowledge level of cognitive architectures with Conceptual Spac...
Extending the knowledge level of cognitive architectures with Conceptual Spac...Extending the knowledge level of cognitive architectures with Conceptual Spac...
Extending the knowledge level of cognitive architectures with Conceptual Spac...
Antonio Lieto
Introduction to Distributional Semantics
Introduction to Distributional SemanticsIntroduction to Distributional Semantics
Introduction to Distributional Semantics
Andre Freitas
ConNeKTion: A Tool for Exploiting Conceptual Graphs Automatically Learned fro...
ConNeKTion: A Tool for Exploiting Conceptual Graphs Automatically Learned fro...ConNeKTion: A Tool for Exploiting Conceptual Graphs Automatically Learned fro...
ConNeKTion: A Tool for Exploiting Conceptual Graphs Automatically Learned fro...
University of Bari (Italy)
RuleML2015 The Herbrand Manifesto - Thinking Inside the Box
RuleML2015 The Herbrand Manifesto - Thinking Inside the Box RuleML2015 The Herbrand Manifesto - Thinking Inside the Box
RuleML2015 The Herbrand Manifesto - Thinking Inside the Box
Lifelong Topic Modelling presentation
Lifelong Topic Modelling presentation Lifelong Topic Modelling presentation
Lifelong Topic Modelling presentation
Daniele Di Mitri
ConNeKTion: A Tool for Exploiting Conceptual Graphs Automatically Learned fro...
ConNeKTion: A Tool for Exploiting Conceptual Graphs Automatically Learned fro...ConNeKTion: A Tool for Exploiting Conceptual Graphs Automatically Learned fro...
ConNeKTion: A Tool for Exploiting Conceptual Graphs Automatically Learned fro...
University of Bari (Italy)
Constructive Description Logics 2006
Constructive Description Logics 2006Constructive Description Logics 2006
Constructive Description Logics 2006
Valeria de Paiva
Dependent Types in Natural Language Semantics
Dependent Types in Natural Language SemanticsDependent Types in Natural Language Semantics
Dependent Types in Natural Language Semantics
Daisuke BEKKI
How to Ground A Language for Legal Discourse In a Prototypical Perceptual Sem...
How to Ground A Language for Legal Discourse In a Prototypical Perceptual Sem...How to Ground A Language for Legal Discourse In a Prototypical Perceptual Sem...
How to Ground A Language for Legal Discourse In a Prototypical Perceptual Sem...
L. Thorne McCarty
A survey on parallel corpora alignment
A survey on parallel corpora alignment A survey on parallel corpora alignment
A survey on parallel corpora alignment
Cerutti--TAFA 2011
Cerutti--TAFA 2011Cerutti--TAFA 2011
Cerutti--TAFA 2011
Federico Cerutti
Constructive Hybrid Logics
Constructive Hybrid LogicsConstructive Hybrid Logics
Constructive Hybrid Logics
Valeria de Paiva
Truth as a logical connective?
Truth as a logical connective?Truth as a logical connective?
Truth as a logical connective?
Shunsuke Yatabe

What's hot (19)

Integration in Finite Terms
Integration in Finite TermsIntegration in Finite Terms
Integration in Finite Terms
Text smilarity02 corpus_based
Text smilarity02 corpus_basedText smilarity02 corpus_based
Text smilarity02 corpus_based
Jarrar.lecture notes.aai.2012s.descriptionlogic
Jarrar.lecture notes.aai.2012s.descriptionlogicJarrar.lecture notes.aai.2012s.descriptionlogic
Jarrar.lecture notes.aai.2012s.descriptionlogic
Exempler approach
Exempler approachExempler approach
Exempler approach
Extending the knowledge level of cognitive architectures with Conceptual Spac...
Extending the knowledge level of cognitive architectures with Conceptual Spac...Extending the knowledge level of cognitive architectures with Conceptual Spac...
Extending the knowledge level of cognitive architectures with Conceptual Spac...
Introduction to Distributional Semantics
Introduction to Distributional SemanticsIntroduction to Distributional Semantics
Introduction to Distributional Semantics
ConNeKTion: A Tool for Exploiting Conceptual Graphs Automatically Learned fro...
ConNeKTion: A Tool for Exploiting Conceptual Graphs Automatically Learned fro...ConNeKTion: A Tool for Exploiting Conceptual Graphs Automatically Learned fro...
ConNeKTion: A Tool for Exploiting Conceptual Graphs Automatically Learned fro...
RuleML2015 The Herbrand Manifesto - Thinking Inside the Box
RuleML2015 The Herbrand Manifesto - Thinking Inside the Box RuleML2015 The Herbrand Manifesto - Thinking Inside the Box
RuleML2015 The Herbrand Manifesto - Thinking Inside the Box
Lifelong Topic Modelling presentation
Lifelong Topic Modelling presentation Lifelong Topic Modelling presentation
Lifelong Topic Modelling presentation
ConNeKTion: A Tool for Exploiting Conceptual Graphs Automatically Learned fro...
ConNeKTion: A Tool for Exploiting Conceptual Graphs Automatically Learned fro...ConNeKTion: A Tool for Exploiting Conceptual Graphs Automatically Learned fro...
ConNeKTion: A Tool for Exploiting Conceptual Graphs Automatically Learned fro...
Constructive Description Logics 2006
Constructive Description Logics 2006Constructive Description Logics 2006
Constructive Description Logics 2006
Dependent Types in Natural Language Semantics
Dependent Types in Natural Language SemanticsDependent Types in Natural Language Semantics
Dependent Types in Natural Language Semantics
How to Ground A Language for Legal Discourse In a Prototypical Perceptual Sem...
How to Ground A Language for Legal Discourse In a Prototypical Perceptual Sem...How to Ground A Language for Legal Discourse In a Prototypical Perceptual Sem...
How to Ground A Language for Legal Discourse In a Prototypical Perceptual Sem...
A survey on parallel corpora alignment
A survey on parallel corpora alignment A survey on parallel corpora alignment
A survey on parallel corpora alignment
Cerutti--TAFA 2011
Cerutti--TAFA 2011Cerutti--TAFA 2011
Cerutti--TAFA 2011
Constructive Hybrid Logics
Constructive Hybrid LogicsConstructive Hybrid Logics
Constructive Hybrid Logics
Truth as a logical connective?
Truth as a logical connective?Truth as a logical connective?
Truth as a logical connective?

Viewers also liked

Technical report jada
Technical report jadaTechnical report jada
Technical report jada
University of Bari (Italy)
A Run Length Smoothing-Based Algorithm for Non-Manhattan Document Segmentation
A Run Length Smoothing-Based Algorithm for Non-Manhattan Document SegmentationA Run Length Smoothing-Based Algorithm for Non-Manhattan Document Segmentation
A Run Length Smoothing-Based Algorithm for Non-Manhattan Document Segmentation
University of Bari (Italy)
A Domain Based Approach to Information Retrieval in Digital Libraries - Rotel...
A Domain Based Approach to Information Retrieval in Digital Libraries - Rotel...A Domain Based Approach to Information Retrieval in Digital Libraries - Rotel...
A Domain Based Approach to Information Retrieval in Digital Libraries - Rotel...
University of Bari (Italy)
Recognising the Social Attitude in Natural Interaction with Pedagogical Agents
Recognising the Social Attitude in Natural Interaction with Pedagogical AgentsRecognising the Social Attitude in Natural Interaction with Pedagogical Agents
Recognising the Social Attitude in Natural Interaction with Pedagogical Agents
University of Bari (Italy)
Recognising the Social Attitude in Natural Interaction with Pedagogical Agents
Recognising the Social Attitude in Natural Interaction with Pedagogical AgentsRecognising the Social Attitude in Natural Interaction with Pedagogical Agents
Recognising the Social Attitude in Natural Interaction with Pedagogical Agents
University of Bari (Italy)
A Run Length Smoothing-Based Algorithm for Non-Manhattan Document Segmentation
A Run Length Smoothing-Based Algorithm for Non-Manhattan Document SegmentationA Run Length Smoothing-Based Algorithm for Non-Manhattan Document Segmentation
A Run Length Smoothing-Based Algorithm for Non-Manhattan Document Segmentation
University of Bari (Italy)

Viewers also liked (6)

Technical report jada
Technical report jadaTechnical report jada
Technical report jada
A Run Length Smoothing-Based Algorithm for Non-Manhattan Document Segmentation
A Run Length Smoothing-Based Algorithm for Non-Manhattan Document SegmentationA Run Length Smoothing-Based Algorithm for Non-Manhattan Document Segmentation
A Run Length Smoothing-Based Algorithm for Non-Manhattan Document Segmentation
A Domain Based Approach to Information Retrieval in Digital Libraries - Rotel...
A Domain Based Approach to Information Retrieval in Digital Libraries - Rotel...A Domain Based Approach to Information Retrieval in Digital Libraries - Rotel...
A Domain Based Approach to Information Retrieval in Digital Libraries - Rotel...
Recognising the Social Attitude in Natural Interaction with Pedagogical Agents
Recognising the Social Attitude in Natural Interaction with Pedagogical AgentsRecognising the Social Attitude in Natural Interaction with Pedagogical Agents
Recognising the Social Attitude in Natural Interaction with Pedagogical Agents
Recognising the Social Attitude in Natural Interaction with Pedagogical Agents
Recognising the Social Attitude in Natural Interaction with Pedagogical AgentsRecognising the Social Attitude in Natural Interaction with Pedagogical Agents
Recognising the Social Attitude in Natural Interaction with Pedagogical Agents
A Run Length Smoothing-Based Algorithm for Non-Manhattan Document Segmentation
A Run Length Smoothing-Based Algorithm for Non-Manhattan Document SegmentationA Run Length Smoothing-Based Algorithm for Non-Manhattan Document Segmentation
A Run Length Smoothing-Based Algorithm for Non-Manhattan Document Segmentation

Similar to Cooperating Techniques for Extracting Conceptual Taxonomies from Text

Cooperating Techniques for Extracting Conceptual Taxonomies from Text
Cooperating Techniques for Extracting Conceptual Taxonomies from TextCooperating Techniques for Extracting Conceptual Taxonomies from Text
Cooperating Techniques for Extracting Conceptual Taxonomies from Text
Fulvio Rotella
An Approach To Assess The Existence Of A Proposed Intervention In Essay-Argum...
An Approach To Assess The Existence Of A Proposed Intervention In Essay-Argum...An Approach To Assess The Existence Of A Proposed Intervention In Essay-Argum...
An Approach To Assess The Existence Of A Proposed Intervention In Essay-Argum...
Heather Strinden
word2vec, node2vec, graph2vec, X2vec: Towards a Theory of Vector Embeddings o...
word2vec, node2vec, graph2vec, X2vec: Towards a Theory of Vector Embeddings o...word2vec, node2vec, graph2vec, X2vec: Towards a Theory of Vector Embeddings o...
word2vec, node2vec, graph2vec, X2vec: Towards a Theory of Vector Embeddings o...
Subhajit Sahu
An Entity-Driven Recursive Neural Network Model for Chinese Discourse Coheren...
An Entity-Driven Recursive Neural Network Model for Chinese Discourse Coheren...An Entity-Driven Recursive Neural Network Model for Chinese Discourse Coheren...
An Entity-Driven Recursive Neural Network Model for Chinese Discourse Coheren...
Like Alice in Wonderland: Unraveling Reasoning and Cognition Using Analogies ...
Like Alice in Wonderland: Unraveling Reasoning and Cognition Using Analogies ...Like Alice in Wonderland: Unraveling Reasoning and Cognition Using Analogies ...
Like Alice in Wonderland: Unraveling Reasoning and Cognition Using Analogies ...
Facultad de Informática UCM
FCA-MERGE: Bottom-Up Merging of Ontologies
FCA-MERGE: Bottom-Up Merging of OntologiesFCA-MERGE: Bottom-Up Merging of Ontologies
FCA-MERGE: Bottom-Up Merging of Ontologies
Cross-lingual event-mining using wordnet as a shared knowledge interface
Cross-lingual event-mining using wordnet as a shared knowledge interfaceCross-lingual event-mining using wordnet as a shared knowledge interface
Cross-lingual event-mining using wordnet as a shared knowledge interface
Topic Extraction on Domain Ontology
Topic Extraction on Domain OntologyTopic Extraction on Domain Ontology
Topic Extraction on Domain Ontology
Keerti Bhogaraju
Ajay Ohri
Lean Logic for Lean Times: Varieties of Natural Logic
Lean Logic for Lean Times: Varieties of Natural LogicLean Logic for Lean Times: Varieties of Natural Logic
Lean Logic for Lean Times: Varieties of Natural Logic
Valeria de Paiva
Blei ngjordan2003
Blei ngjordan2003Blei ngjordan2003
Blei ngjordan2003
Ajay Ohri
Mahmoud Abdullah
Eswcsummerschool2010 ontologies final
Eswcsummerschool2010 ontologies finalEswcsummerschool2010 ontologies final
Eswcsummerschool2010 ontologies final
Elena Simperl
Cerutti--Knowledge Representation and Reasoning (postgrad seminar @ Universit...
Cerutti--Knowledge Representation and Reasoning (postgrad seminar @ Universit...Cerutti--Knowledge Representation and Reasoning (postgrad seminar @ Universit...
Cerutti--Knowledge Representation and Reasoning (postgrad seminar @ Universit...
Federico Cerutti
Method for ontology generation from concept maps in shallow domains
Method for ontology generation from concept maps in shallow domainsMethod for ontology generation from concept maps in shallow domains
Method for ontology generation from concept maps in shallow domains
Luigi Ceccaroni
Discovering Novel Information with sentence Level clustering From Multi-docu...
Discovering Novel Information with sentence Level clustering  From Multi-docu...Discovering Novel Information with sentence Level clustering  From Multi-docu...
Discovering Novel Information with sentence Level clustering From Multi-docu...
Identifying the semantic relations on
Identifying the semantic relations onIdentifying the semantic relations on
Identifying the semantic relations on

Cooperating Techniques for Extracting Conceptual Taxonomies from Text

  • 1. Università degli studi di Bari “Aldo Moro”, Dipartimento di Informatica, L.A.C.A.M. Cooperating Techniques for Extracting Conceptual Taxonomies from Text. S. Ferilli, F. Leuzzi, F. Rotella. AI*IA 2011, XIIth Conference of the Italian Association for Artificial Intelligence, Workshop on Mining Complex Patterns (MCP 2011), Palermo, Italy, September 17, 2011.
  • 2. Overview: 1. Introduction & Objectives; 2. Extraction of knowledge from text; 3. Knowledge representation formalism; 4. Identification of relevant concepts; 5. Generalization of similar concepts; 6. Reasoning ‘by association’; 7. Conclusions & Future works.
  • 3. Introduction. The spread of electronic documents and document repositories has generated the need for automatic techniques to understand and handle document content, in order to help users satisfy their information needs. Full Text Understanding is not trivial, due to: 1. the intrinsic ambiguity of natural language; 2. the huge amount of common sense and conceptual background knowledge required. Lexical and/or conceptual taxonomies are useful for facing these problems, even though building them manually is very costly and error-prone.
  • 4. Introduction. The lack of such taxonomies is a strong motivation towards the automatic construction of conceptual networks by mining large amounts of documents in natural language. However, even assuming a correct knowledge representation, we are still far from simulating human abilities.
  • 5. Objectives: 1. Definition of a representation formalism for knowledge extracted from natural language texts. 2. Extraction of concepts and relevance assessment. 3. Generalization of concepts having similar descriptions. 4. Definition of a kind of reasoning by concept association that looks for possible indirect connections between two identified concepts.
  • 6. Extraction of knowledge from text. Knowledge is extracted by processing each sentence separately with the Stanford Parser [1], followed by the Stanford Dependencies [2]. The final output of the Stanford Dependencies is a typed syntactic structure of each sentence.
  • 7. Knowledge representation formalism. Among all grammatical roles played by words in a sentence, only subject, verb and complement have been considered. In the final conceptual graph, subjects and complements represent concepts, while verbs express relations between them. [Figure: two concept nodes, each a subject or complement, linked by a verb edge.]
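The representation described above can be sketched in a few lines. This is only an illustration under assumptions: the function name `build_concept_graph`, the node/edge layout, and the sample triples are hypothetical and not taken from the slides.

```python
def build_concept_graph(triples):
    """Build a concept graph from <subject, verb, complement> triples.

    Nodes are concepts (subjects and complements); each edge between a
    subject and a complement is labelled with the set of verbs relating them.
    """
    nodes = set()
    edges = {}
    for subject, verb, complement in triples:
        nodes.update((subject, complement))
        edges.setdefault((subject, complement), set()).add(verb)
    return nodes, edges


# Hypothetical triples, as they might come out of the sentence-level extraction.
triples = [
    ("user", "access", "network"),
    ("user", "use", "network"),
    ("network", "connect", "computer"),
]
nodes, edges = build_concept_graph(triples)
```

Keeping all verbs on an edge, rather than one, mirrors the fact that several sentences may relate the same pair of concepts in different ways.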
  • 8. Identification of relevant concepts. A mix of several techniques is brought to cooperation for identifying relevant concepts:
    ● Hub Words [3]: high-frequency words whose relevance is computed as $W(t) = \alpha w_0 + \beta n + \gamma \sum_{i=1}^{n} w(t_i)$, where $w_0$ is the initial weight, $n$ is the number of relationships, and $w(t_i)$ is the tf*idf weight of the $i$-th word related to $t$.
    ● Keyword extraction techniques from single documents.
    ● EM clustering provided by Weka [4], based on Euclidean distance.
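As a small worked example of the Hub Words weight, the sketch below assumes that n equals the number of related words, so the sum runs over exactly those words. The function name and the sample values are hypothetical.

```python
def hub_word_weight(w0, related_weights, alpha, beta, gamma):
    """Hub-word relevance: alpha*w0 + beta*n + gamma*sum of related tf*idf weights.

    w0              -- initial weight of the word t
    related_weights -- tf*idf weights w(t_i) of the words related to t
    """
    n = len(related_weights)  # assumption: one relationship per related word
    return alpha * w0 + beta * n + gamma * sum(related_weights)


# Hypothetical values: 1.0*0.5 + 0.1*2 + 2.0*(0.2 + 0.3) = 1.7
example_weight = hub_word_weight(0.5, [0.2, 0.3], alpha=1.0, beta=0.1, gamma=2.0)
```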
  • 9. Identification of relevant concepts. Inspired by the Hub Words approach, we have defined a Relevance Weight made of five components, A to E:
    $$W(\bar{c}) = \underbrace{\alpha \frac{w(\bar{c})}{\max_c w(c)}}_{A} + \underbrace{\beta \frac{e(\bar{c})}{\max_c e(c)}}_{B} + \underbrace{\gamma \frac{\sum_{(c,\bar{c})} w(c)}{e(\bar{c})}}_{C} + \underbrace{\delta \frac{d_M - d(\bar{c})}{d_M}}_{D} + \underbrace{\varepsilon \frac{k(\bar{c})}{\max_c k(c)}}_{E}$$
    where $\alpha + \beta + \gamma + \delta + \varepsilon = 1$. Nodes in the network are ranked by decreasing Relevance Weight. A suitable cut-point in the ranking is determined by choosing the first item $c_k$ such that
    $$W(c_k) - W(c_{k+1}) \ge p \cdot \max_{i=0,\dots,n-1} \left( W(c_i) - W(c_{i+1}) \right)$$
    where $p \in [0,1]$.
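The weighted sum and the cut-point rule can be sketched as follows. The sketch assumes the five components are already normalised as in the formula, and that the items up to and including c_k are the ones kept. Function names and sample numbers are hypothetical.

```python
def relevance_weight(components, coefficients):
    """Weighted sum of the normalised components A..E.

    components   -- the five ratios (before multiplying by the coefficients)
    coefficients -- (alpha, beta, gamma, delta, epsilon), which must sum to 1
    """
    if abs(sum(coefficients) - 1.0) > 1e-9:
        raise ValueError("coefficients must sum to 1")
    return sum(k * v for k, v in zip(coefficients, components))


def cut_point(ranked_weights, p):
    """Index k of the first item whose gap to the next item is at least
    p times the largest gap in the ranking (weights sorted decreasingly)."""
    if len(ranked_weights) < 2:
        raise ValueError("need at least two ranked items")
    gaps = [a - b for a, b in zip(ranked_weights, ranked_weights[1:])]
    threshold = p * max(gaps)
    for k, gap in enumerate(gaps):
        if gap >= threshold:
            return k


# Hypothetical ranking: the largest gap (0.35) follows the second item.
ranking = [0.90, 0.85, 0.50, 0.45, 0.40]
k = cut_point(ranking, p=0.7)
kept = ranking[: k + 1]
```

With p = 1 the cut falls at the largest gap; smaller values of p allow an earlier, smaller gap to trigger the cut.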
  • 10. Identification of relevant concepts: Relevance Weight in detail. Definition of the Initial Weight. The whole set of triples <subject, verb, complement> is represented in a Concepts x Attributes matrix V, recalling the classical Terms x Documents Vector Space Model. The weighting resembles tf*idf:
    $$\frac{f_{i,j}}{\sum_k f_{k,j}} \cdot \log \frac{|A|}{|\{ j : c_i \in a_j \}|}$$
    Therefore component A is $\alpha \frac{w(\bar{c})}{\max_c w(c)}$, where $w(c)$ is the initial weight assigned to node $c$, computed according to the above tf*idf scheme.
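A minimal sketch of this tf*idf-like weighting on a small Concepts x Attributes matrix is given below. It assumes V holds raw counts f[i][j] and that |A| is the number of attribute columns. How the per-attribute values are combined into a single w(c) is not stated here, so the sketch stops at the weighted matrix. The function name is hypothetical.

```python
import math


def tfidf_like_weights(V):
    """Weight each cell of a Concepts x Attributes count matrix as
    (f[i][j] / sum_k f[k][j]) * log(|A| / |{j : f[i][j] > 0}|)."""
    n_attributes = len(V[0])
    column_totals = [sum(row[j] for row in V) for j in range(n_attributes)]
    weighted = []
    for row in V:
        attrs_with_concept = sum(1 for f in row if f > 0)
        idf = math.log(n_attributes / attrs_with_concept) if attrs_with_concept else 0.0
        weighted.append([
            (f / column_totals[j]) * idf if column_totals[j] else 0.0
            for j, f in enumerate(row)
        ])
    return weighted


# Hypothetical counts: concept 0 occurs in one attribute, concept 1 in both.
V = [[2, 0],
     [2, 4]]
W = tfidf_like_weights(V)
```

A concept that appears in every attribute gets a zero logarithmic factor, just as a term occurring in every document does in classical tf*idf.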
  • 11. Identification of relevant concepts: Relevance Weight in detail. Connections Number: component B considers the number of connections (edges) in which $\bar{c}$ is involved, $\beta \frac{e(\bar{c})}{\max_c e(c)}$. Neighborhood Weight Summary: component C takes into account the average initial weight of all neighbors of $\bar{c}$, $\gamma \frac{\sum_{(c,\bar{c})} w(c)}{e(\bar{c})}$.
  • 12. Identification of relevant concepts: Relevance Weight in detail. Inverse Distance from Center: component D represents the closeness to the center of the cluster, $\delta \frac{d_M - d(\bar{c})}{d_M}$. KE Influence: component E takes into account the outcome of three keyword extraction (KE) techniques, suitably weighted, $\varepsilon \frac{k(\bar{c})}{\max_c k(c)}$, where $k(\bar{c}) = \zeta\, k_{\text{co-occurrences}}(\bar{c}) + \eta\, k_{\text{synset}}(\bar{c}) + \theta\, k_{\text{mvn}}(\bar{c})$.
  • 13. Identification of relevant concepts: Relevance Weight in detail.
    ● KE based on $\chi^2$ co-occurrences: $k_{\text{co-occurrences}} = \zeta \frac{\chi^2}{\max_{\text{cluster}} \chi^2}$
    ● KE based on WordNet synsets: $k_{\text{synset}} = \eta \frac{kw_{\text{synset}}}{\max(kw_{\text{synset}})}$
    ● KE by means of the Multivariate Normal Distribution (MVN): $k_{\text{mvn}} = \theta \frac{kw_{\text{mvn}}}{\max(kw_{\text{mvn}})}$
  • 14. Identification of relevant concepts: evaluations.
    Parameter settings:
    Test #   α      β      γ      δ      ε      p
    1        0.10   0.10   0.30   0.25   0.25   1.0
    2        0.20   0.15   0.15   0.25   0.25   0.7
    3        0.15   0.25   0.30   0.15   0.15   1.0
    Results:
    Test #   Concept      A         B       C        D       E       W
    1        network      0.100     0.100   0.021    0.178   0.250   0.649
    1        access       0.001     0.001   0.154    0.239   0.250   0.646
    1        subset       6.32E-4   0.001   0.150    0.239   0.250   0.641
    2        network      0.200     0.150   0.0105   0.178   0.250   0.789
    3        network      0.150     0.250   0.021    0.146   0.150   0.717
    3        user         0.127     0.195   0.022    0.146   0.150   0.641
    3        number       0.113     0.187   0.022    0.146   0.150   0.619
    3        individual   0.103     0.174   0.020    0.146   0.150   0.594
  • 15. Generalization of similar concepts: pairwise clustering. Clustering takes into account the description of each concept, consisting of a binary vector that represents the presence or absence (1 or 0 respectively) of a <subject, complement> relation between the involved concepts. The Hamming distance between two such vectors provides an evaluation of their similarity.
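The comparison of two concept descriptions can be sketched as below. The sketch assumes a normalised Hamming distance (fraction of differing positions), which is a guess and is not stated on the slide. The function name and the sample vectors are hypothetical.

```python
def hamming_distance(u, v, normalized=True):
    """Hamming distance between two equal-length binary vectors.

    With normalized=True the count of differing positions is divided by the
    vector length, giving a value in [0, 1].
    """
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    if not u:
        raise ValueError("vectors must not be empty")
    differing = sum(a != b for a, b in zip(u, v))
    return differing / len(u) if normalized else differing


# Hypothetical descriptions: 1 marks a relation with the concept in that column.
concept_a = [1, 0, 1, 1]
concept_b = [1, 1, 1, 0]
distance = hamming_distance(concept_a, concept_b)
```

Two concepts would then be candidates for generalization when their distance falls below a chosen threshold.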
  • 16. Generalization of similar concepts: WordNet. WordNet is an external resource that has some useful properties: 1. it is a lexical taxonomy; 2. each concept is described as a set of synonyms (synset); 3. synsets are interlinked by means of conceptual-semantic and lexical relations. We focus on hypernymy, a relation that links the current synset to more general ones.
  • 17. Generalization of similar concepts: taxonomical similarity function. More general: provides a similarity value on the basis of common relations, without focusing on the specific path. More specific: provides a similarity value on the basis of common relations, relying on the specific path.
  • 18. Generalization of similar concepts: domain-driven WSD. One Domain per Discourse assumption: many uses of a word in a coherent portion of text tend to share the same domain. The steps involved are: identification of the prevalent domain; extraction of all synsets for each term; extraction of all domains for each synset; choice of the synset belonging to the prevalent domain.
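The domain-driven disambiguation can be illustrated with a toy sketch. It does not call any real WordNet API: the two dictionaries stand in for the synset and domain lookups, and every synset identifier and domain label in them is hypothetical. Taking the most frequent domain across all candidate synsets as the prevalent one is also an assumption.

```python
from collections import Counter


def prevalent_domain(terms, synsets_of, domains_of):
    """Most frequent domain among all candidate synsets of all terms."""
    counts = Counter()
    for term in terms:
        for synset in synsets_of.get(term, []):
            counts.update(domains_of.get(synset, []))
    return counts.most_common(1)[0][0] if counts else None


def disambiguate(terms, synsets_of, domains_of):
    """For each term, pick the first synset belonging to the prevalent domain
    (None when no candidate synset belongs to it)."""
    domain = prevalent_domain(terms, synsets_of, domains_of)
    chosen = {}
    for term in terms:
        candidates = synsets_of.get(term, [])
        chosen[term] = next(
            (s for s in candidates if domain in domains_of.get(s, [])), None
        )
    return domain, chosen


# Toy lookup tables (hypothetical identifiers and labels).
synsets_of = {
    "network": ["network#computing", "network#broadcast"],
    "platform": ["platform#stage", "platform#computing"],
    "user": ["user#computing"],
}
domains_of = {
    "network#computing": ["computer_science"],
    "network#broadcast": ["telecommunication"],
    "platform#stage": ["theatre"],
    "platform#computing": ["computer_science"],
    "user#computing": ["computer_science"],
}
domain, senses = disambiguate(["network", "platform", "user"], synsets_of, domains_of)
```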
  • 19. Generalization of similar concepts: evaluations. Two toy experiments have been performed with the Hamming distance threshold equal to 0.001 and 0.0001 respectively, while the taxonomical similarity function threshold has been kept equal to 0.4.
  • 20. Reasoning ‘by association’: Breadth-First Search. Given two nodes (concepts), a Breadth-First Search starts from both nodes: each search looks for the other's frontier, until the two frontiers meet at common nodes. The path is then reconstructed by going backward to the two roots in both directions.
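A minimal sketch of such a two-sided search follows. It assumes an undirected graph stored as an adjacency dictionary, ignores the verb labels on the edges, and returns a connecting path that is not guaranteed to be the shortest. The function name and the sample graph are hypothetical.

```python
from collections import deque


def bidirectional_bfs(graph, source, target):
    """Return a path between source and target, or None if none exists."""
    if source == target:
        return [source]
    parents_s, parents_t = {source: None}, {target: None}
    queue_s, queue_t = deque([source]), deque([target])

    def expand(queue, own, other):
        # Expand one full level; return the first node also seen by the other side.
        for _ in range(len(queue)):
            node = queue.popleft()
            for neighbour in graph.get(node, ()):
                if neighbour not in own:
                    own[neighbour] = node
                    queue.append(neighbour)
                    if neighbour in other:
                        return neighbour
        return None

    def walk_back(parents, node):
        # Follow parent links from node back to the root of that search.
        path = []
        while node is not None:
            path.append(node)
            node = parents[node]
        return path

    while queue_s and queue_t:
        meet = expand(queue_s, parents_s, parents_t)
        if meet is None:
            meet = expand(queue_t, parents_t, parents_s)
        if meet is not None:
            return walk_back(parents_s, meet)[::-1] + walk_back(parents_t, meet)[1:]
    return None


# Hypothetical concept graph (edges are symmetric).
graph = {
    "adult": ["freedom", "platform"],
    "freedom": ["adult"],
    "platform": ["adult", "technology"],
    "technology": ["platform", "internet"],
    "internet": ["technology"],
}
path = bidirectional_bfs(graph, "adult", "internet")
```

Searching from both ends keeps each frontier small, which matters when the concept network is large and the two concepts are far apart.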
  • 21. Reasoning ‘by association’: evaluations. The table below shows a sample of possible outcomes. E.g., an interpretation of case 5 can be: “the adults write about freedom and use platform, that is recognized as a technology, as well as the internet”.
  • 22. Conclusions. This work proposes an approach to automatically extract a conceptual taxonomy from natural language texts. It mixes different techniques in order to: ● identify relevant terms/concepts in the text; ● generalize similar concepts; ● perform some kind of reasoning “by association”. Preliminary experiments show that this approach can be viable, although extensions and refinements are needed. A reliable outcome might help users understand the text content, and help machines automatically perform some kind of reasoning on the taxonomy.
  • 23. Future works. 1. Extending the knowledge representation formalism to express negation. 2. Defining a strategy to make a better choice of weights in the Relevance Weight computation. 3. Enriching the adjacency matrix to improve concept descriptions. 4. Exploring alternatives to ODD, to overcome its limits. 5. Taxonomical similarity measures currently take into account only the hypernym relation, while a more accurate similarity could be obtained by adding other relations. 6. Defining a strategy to prefer one verb rather than keeping all of them in the reasoning ‘by association’ phase.
  • 24. References
    [1] Dan Klein and Christopher D. Manning. Fast exact inference with a factored model for natural language parsing. In Advances in Neural Information Processing Systems, volume 15. MIT Press, 2003.
    [2] Marie-Catherine de Marneffe, Bill MacCartney, and Christopher D. Manning. Generating typed dependency parses from phrase structure trees. In LREC, 2006.
    [3] Sang Ok Koo, Soo Yeon Lim, and Sang-Jo Lee. Constructing an ontology based on hub words. In ISMIS’03, pages 93–97, 2003.
    [4] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I.H. Witten. The WEKA data mining software: an update. SIGKDD Explorations, 11(1):10–18, 2009.