Cooperating Techniques for Extracting Conceptual Taxonomies from Text

The current abundance of electronic documents requires automatic techniques that support users in understanding their content and extracting useful information. To this aim, it is important to have conceptual taxonomies that express common sense and implicit relationships among concepts. This work proposes a mix of several techniques that are brought to cooperation for learning them automatically. Although the work is at a preliminary stage, interesting initial results suggest that extending and improving the approach is worthwhile.
More details can be found here:
http://www.di.uniba.it/~loglisci/MCP2011/mce2011.pdf


Transcript of "Cooperating Techniques for Extracting Conceptual Taxonomies from Text"

  1. Università degli studi di Bari “Aldo Moro”, Dipartimento di Informatica
     Cooperating Techniques for Extracting Conceptual Taxonomies from Text
     S. Ferilli, F. Leuzzi, F. Rotella
     L.A.C.A.M., http://lacam.di.uniba.it:8000
     AI*IA 2011, XIIth Conference of the Italian Association for Artificial Intelligence
     Workshop on Mining Complex Patterns (MCP 2011), Palermo, Italy, September 17, 2011
  2. Overview
     1. Introduction & Objectives
     2. Extraction of knowledge from text
     3. Knowledge representation formalism
     4. Identification of relevant concepts
     5. Generalization of similar concepts
     6. Reasoning ‘by association’
     7. Conclusions & Future work
  3. Introduction
     The spread of electronic documents and document repositories has generated the need for automatic techniques that understand and handle document content, in order to help users satisfy their information needs.
     Full Text Understanding is not trivial, due to:
     1. the intrinsic ambiguity of natural language;
     2. the huge amount of common sense and conceptual background knowledge required.
     Lexical and/or conceptual taxonomies help to address these problems, but building them manually is very costly and error-prone.
  4. Introduction
     The lack of such taxonomies is a strong motivation for the automatic construction of conceptual networks by mining large amounts of natural language documents.
     However, even assuming a correct knowledge representation, we are still far from simulating human abilities.
  5. Objectives
     1. Definition of a representation formalism for knowledge extracted from natural language texts
     2. Extraction of concepts and relevance assessment
     3. Generalization of concepts having similar descriptions
     4. Definition of a kind of reasoning by concept association that looks for possible indirect connections between two identified concepts
  6. Extraction of knowledge from text
     Knowledge is extracted by processing each sentence separately, using the Stanford Parser [1] and the Stanford Dependencies [2].
     The final output of the Stanford Dependencies is a typed syntactic structure of each sentence.
  7. Knowledge representation formalism
     Among all grammatical roles played by words in a sentence, only subject, verb and complement have been considered.
     In the final conceptual graph, subjects and complements represent concepts, while verbs express relations between them. Each sentence is thus reduced to <subject, verb, complement> triples.
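As an illustration of this formalism, the following minimal sketch stores triples as a graph in which subjects and complements are nodes and verbs label the edges. The function name, the nested-dictionary layout and the sample triples are assumptions made for this example, not the authors' implementation.

```python
from collections import defaultdict

def build_concept_graph(triples):
    """Map each subject to {complement: set of verbs linking the two}."""
    graph = defaultdict(lambda: defaultdict(set))
    for subject, verb, complement in triples:
        # Subjects and complements become nodes; the verb labels the edge.
        graph[subject][complement].add(verb)
    return graph

# Hypothetical triples, as they might come out of the parsing step.
triples = [
    ("user", "access", "network"),
    ("user", "join", "network"),
    ("network", "provide", "platform"),
]
graph = build_concept_graph(triples)
```

Keeping a set of verbs per edge preserves every relation observed between two concepts, which matches the deck's later remark that all verbs are currently kept.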
  8. Identification of relevant concepts
     A mix of several techniques is brought to cooperation for identifying relevant concepts:
     ● Hub Words [3]: words having high frequency, whose relevance is computed as:
       $$W(t) = \alpha w_0 + \beta n + \gamma \sum_{i=1}^{n} w(t_i)$$
       where: $w_0$ is the initial weight; $n$ is the number of relationships; $w(t_i)$ is the tf*idf weight of the i-th word related to $t$.
     ● Keyword extraction techniques from single documents.
     ● EM Clustering provided by Weka [4], based on Euclidean distance.
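The Hub Words weight is a plain weighted sum, which a few lines of code make concrete. This is a sketch under the assumption that the sum runs over the n words related to t; the function name and the numbers in the example are hypothetical.

```python
def hub_word_weight(w0, related_weights, alpha, beta, gamma):
    """W(t) = alpha*w0 + beta*n + gamma*sum of tf*idf weights of related words."""
    n = len(related_weights)  # number of relationships of t
    return alpha * w0 + beta * n + gamma * sum(related_weights)

# Hypothetical values: initial weight 0.5, two related words.
example = hub_word_weight(0.5, [0.2, 0.3], alpha=1.0, beta=0.1, gamma=2.0)
```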
  9. Identification of relevant concepts
     Inspired by the Hub Words approach, we have defined a Relevance Weight made up of five components, A to E:
     $$W(\bar{c}) = \underbrace{\alpha \frac{w(\bar{c})}{\max_c w(c)}}_{A} + \underbrace{\beta \frac{e(\bar{c})}{\max_c e(c)}}_{B} + \underbrace{\gamma \frac{\sum_{(\bar{c},c)} w(c)}{e(\bar{c})}}_{C} + \underbrace{\delta \frac{d_M - d(\bar{c})}{d_M}}_{D} + \underbrace{\varepsilon \frac{k(\bar{c})}{\max_c k(c)}}_{E}$$
     where: $\alpha + \beta + \gamma + \delta + \varepsilon = 1$
     Nodes in the network are ranked by decreasing Relevance Weight. A suitable cut-point in the ranking is determined by choosing the first item $c_k$ such that:
     $$W(c_k) - W(c_{k+1}) \geq p \cdot \max_{i=0,\dots,n-1} \left( W(c_i) - W(c_{i+1}) \right)$$
     where: $p \in [0, 1]$
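The cut-point rule can be sketched as follows. The code assumes that the concepts ranked up to and including the cut-point item are the ones retained, which the slide does not state explicitly; the function name and the sample weights are hypothetical.

```python
def select_relevant(weights, p):
    """Rank concepts by decreasing weight and cut at the first gap
    that is at least p times the largest gap between consecutive items."""
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    gaps = [ranked[i][1] - ranked[i + 1][1] for i in range(len(ranked) - 1)]
    if not gaps:
        return [name for name, _ in ranked]
    threshold = p * max(gaps)
    for k, gap in enumerate(gaps):
        if gap >= threshold:
            # Assumption: items 0..k are kept, the rest are discarded.
            return [name for name, _ in ranked[: k + 1]]
    return [name for name, _ in ranked]

# Hypothetical Relevance Weights.
weights = {"a": 0.9, "b": 0.85, "c": 0.8, "d": 0.4, "e": 0.35}
strict = select_relevant(weights, p=1.0)   # cut at the largest gap
loose = select_relevant(weights, p=0.1)    # cut at the first noticeable gap
```

With p = 1 the cut falls at the largest drop in the ranking; lower values of p move the cut earlier, to the first drop that is large enough.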
  10. Identification of relevant concepts: Relevance Weight in detail
     Definition of the Initial Weight
     The whole set of triples <subject, verb, complement> is represented in a Concepts x Attributes matrix V, recalling the classical Terms x Documents Vector Space Model.
     The weighting resembles tf*idf:
     $$\frac{f_{i,j}}{\sum_k f_{k,j}} \cdot \log \frac{|A|}{|\{ j : c_i \in a_j \}|}$$
     Therefore component A is:
     $$\alpha \frac{w(\bar{c})}{\max_c w(c)}$$
     where $w(c)$ is the initial weight assigned to node $c$, computed according to the above tf*idf scheme.
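A possible reading of the tf*idf-like weighting is sketched below. It assumes that f[i][j] counts how often concept i occurs with attribute j, that |A| is the number of attributes, and that the denominator of the logarithm counts the attributes in which the concept appears. How the per-cell values are aggregated into a single w(c) is not stated on the slide, so the sketch stops at the per-cell weights.

```python
import math

def tfidf_like_weights(freq):
    """freq[i][j]: frequency of concept i with attribute j (assumed layout)."""
    n_concepts, n_attrs = len(freq), len(freq[0])
    column_totals = [sum(freq[i][j] for i in range(n_concepts))
                     for j in range(n_attrs)]
    weights = []
    for row in freq:
        attrs_with_concept = sum(1 for f in row if f > 0)
        # log(|A| / number of attributes containing the concept)
        idf = math.log(n_attrs / attrs_with_concept) if attrs_with_concept else 0.0
        weights.append([
            (f / column_totals[j] if column_totals[j] else 0.0) * idf
            for j, f in enumerate(row)
        ])
    return weights

# Hypothetical 2 concepts x 2 attributes matrix.
w = tfidf_like_weights([[2, 0], [2, 1]])
```

As with classical tf*idf, a concept that occurs with every attribute gets weight zero, since it does not discriminate between attributes.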
  11. Identification of relevant concepts: Relevance Weight in detail
     Connections Number
     Component B considers the number of connections (edges) in which $\bar{c}$ is involved:
     $$\beta \frac{e(\bar{c})}{\max_c e(c)}$$
     Neighborhood Weight Summary
     Component C takes into account the average initial weight of all neighbors of $\bar{c}$:
     $$\gamma \frac{\sum_{(\bar{c},c)} w(c)}{e(\bar{c})}$$
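Components B and C can be sketched on a small adjacency structure. The code assumes an undirected graph without parallel edges, so that the number of edges of a node equals its number of neighbors; the function names, the graph and the weights are hypothetical.

```python
def connections_component(graph, node, beta):
    """B: number of edges of the node, normalised by the maximum over all nodes."""
    max_edges = max(len(neighbours) for neighbours in graph.values())
    return beta * len(graph[node]) / max_edges if max_edges else 0.0

def neighbourhood_component(graph, initial_weight, node, gamma):
    """C: average initial weight of the node's neighbours."""
    neighbours = graph[node]
    if not neighbours:
        return 0.0
    return gamma * sum(initial_weight[n] for n in neighbours) / len(neighbours)

# Hypothetical graph and initial weights.
graph = {
    "network": {"user", "access"},
    "user": {"network"},
    "access": {"network"},
}
initial_weight = {"network": 1.0, "user": 0.4, "access": 0.2}

b_network = connections_component(graph, "network", beta=0.1)
c_network = neighbourhood_component(graph, initial_weight, "network", gamma=0.3)
```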
  12. Identification of relevant concepts: Relevance Weight in detail
     Inverse Distance from Center
     Component D represents the closeness to the center of the cluster:
     $$\delta \frac{d_M - d(\bar{c})}{d_M}$$
     KE Influence
     Component E takes into account the outcome of three keyword extraction (KE) techniques, suitably weighted:
     $$\varepsilon \frac{k(\bar{c})}{\max_c k(c)}$$
     where:
     $$k(\bar{c}) = \varsigma k_{\text{co-occurrences}}(\bar{c}) + \eta k_{\text{synset}}(\bar{c}) + \theta k_{\text{mvn}}(\bar{c})$$
  13. Identification of relevant concepts: Relevance Weight in detail
     ● KE based on $\chi^2$ co-occurrences:
       $$k_{\text{co-occurrences}} = \varsigma \frac{\chi^2}{\max_{\text{cluster}} \chi^2}$$
     ● KE based on WordNet Synsets:
       $$k_{\text{synset}} = \eta \frac{kw_{\text{synset}}}{\max(kw_{\text{synset}})}$$
     ● KE by means of Multivariate Normal Distribution (MVN):
       $$k_{\text{mvn}} = \theta \frac{kw_{\text{mvn}}}{\max(kw_{\text{mvn}})}$$
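The combination of the three KE outcomes can be illustrated as below. This sketch assumes that each technique yields one raw score per concept, that each score is normalised by the maximum for that technique, and that the weights ς, η and θ are applied exactly once. The function name and the scores are hypothetical.

```python
def ke_influence(chi2, synset, mvn, zeta, eta, theta):
    """Each argument maps concept -> raw score from one KE technique."""
    def normalise(scores):
        top = max(scores.values())
        return {c: (s / top if top else 0.0) for c, s in scores.items()}

    n_chi2, n_synset, n_mvn = normalise(chi2), normalise(synset), normalise(mvn)
    # Weighted sum of the three normalised scores, one value per concept.
    return {c: zeta * n_chi2[c] + eta * n_synset[c] + theta * n_mvn[c]
            for c in chi2}

# Hypothetical raw scores for two concepts.
k = ke_influence(
    chi2={"a": 4.0, "b": 2.0},
    synset={"a": 1.0, "b": 1.0},
    mvn={"a": 1.5, "b": 3.0},
    zeta=0.5, eta=0.25, theta=0.25,
)
```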
  14. Identification of relevant concepts: Evaluations

     Parameter settings:

     Test #   α      β      γ      δ      ε      p
     1        0.10   0.10   0.30   0.25   0.25   1.0
     2        0.20   0.15   0.15   0.25   0.25   0.7
     3        0.15   0.25   0.30   0.15   0.15   1.0

     Results:

     Test #   Concept      A         B       C        D       E       W
     1        network      0.100     0.100   0.021    0.178   0.250   0.649
              access       0.001     0.001   0.154    0.239   0.250   0.646
              subset       6.32E-4   0.001   0.150    0.239   0.250   0.641
     2        network      0.200     0.150   0.0105   0.178   0.250   0.789
     3        network      0.150     0.250   0.021    0.146   0.150   0.717
              user         0.127     0.195   0.022    0.146   0.150   0.641
              number       0.113     0.187   0.022    0.146   0.150   0.619
              individual   0.103     0.174   0.020    0.146   0.150   0.594
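As a quick sanity check, W in each row should equal the sum of components A to E, up to the rounding of the printed values. The sketch below copies the figures from the evaluation table; the helper name and the tolerance are choices made for this example.

```python
# (test number, concept) -> (A, B, C, D, E, W), copied from the table.
rows = {
    (1, "network"):    (0.100, 0.100, 0.021, 0.178, 0.250, 0.649),
    (1, "access"):     (0.001, 0.001, 0.154, 0.239, 0.250, 0.646),
    (1, "subset"):     (6.32e-4, 0.001, 0.150, 0.239, 0.250, 0.641),
    (2, "network"):    (0.200, 0.150, 0.0105, 0.178, 0.250, 0.789),
    (3, "network"):    (0.150, 0.250, 0.021, 0.146, 0.150, 0.717),
    (3, "user"):       (0.127, 0.195, 0.022, 0.146, 0.150, 0.641),
    (3, "number"):     (0.113, 0.187, 0.022, 0.146, 0.150, 0.619),
    (3, "individual"): (0.103, 0.174, 0.020, 0.146, 0.150, 0.594),
}

def largest_rounding_gap(rows):
    """Largest absolute difference between A+B+C+D+E and the reported W."""
    return max(abs(sum(values[:5]) - values[5]) for values in rows.values())

gap = largest_rounding_gap(rows)
```

The gap stays within the precision of three-decimal rounding, so the printed components are consistent with the reported totals.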
  15. Generalization of similar concepts: Pairwise clustering
     Pairwise clustering takes into account the description of each concept, consisting of a binary vector that represents the presence or absence (1 or 0, respectively) of a <subject, complement> relation between the involved concepts.
     The Hamming distance between two such vectors provides an evaluation of how similar the concepts are.
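The Hamming distance between two binary descriptions can be computed as in the sketch below. Dividing by the vector length is an assumption, not something the slide specifies; the function name and the vectors are hypothetical.

```python
def hamming_distance(u, v, normalise=True):
    """Number of positions where two equal-length binary vectors differ,
    optionally divided by the vector length (an assumption of this sketch)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    differing = sum(a != b for a, b in zip(u, v))
    return differing / len(u) if normalise else differing

# Hypothetical concept descriptions over four possible relations.
d = hamming_distance([1, 0, 1, 1], [1, 1, 1, 0])
```

Two concepts would then be candidates for generalization when their distance falls below a chosen threshold.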
  16. Generalization of similar concepts: WordNet
     WordNet (http://wordnet.princeton.edu/) is an external resource that has some useful properties:
     1. it is a lexical taxonomy;
     2. each concept is described as a set of synonyms (synset);
     3. synsets are interlinked by means of conceptual-semantic and lexical relations.
     We focus on hypernymy, a relation that links the current synset to more general ones.
  17. Generalization of similar concepts: Taxonomical similarity function
     Two variants are considered:
     ● More general: provides a similarity value on the basis of common relations, without focusing on the specific path.
     ● More specific: provides a similarity value on the basis of common relations, relying on the specific path.
  18. Generalization of similar concepts: Domain-driven WSD
     One Domain per Discourse assumption: many uses of a word in a coherent portion of text tend to share the same domain.
     The procedure involves the following steps:
     ● Prevalent domain identification
     ● Extraction of all synsets for each term
     ● Extraction of all domains for each synset
     ● Choice of the synset in the prevalent domain
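The domain-driven disambiguation can be sketched with toy dictionaries standing in for the lexical resources. Everything here is an assumption made for illustration: the function names, the synset and domain labels, counting domains by simple frequency, and falling back to the first synset when none matches the prevalent domain.

```python
from collections import Counter

def prevalent_domain(terms, synsets_of, domains_of):
    """Count the domains of all synsets of all terms; return the most frequent."""
    counts = Counter()
    for term in terms:
        for synset in synsets_of.get(term, []):
            counts.update(domains_of.get(synset, []))
    return counts.most_common(1)[0][0] if counts else None

def disambiguate(terms, synsets_of, domains_of):
    """For each term, choose a synset belonging to the prevalent domain."""
    domain = prevalent_domain(terms, synsets_of, domains_of)
    chosen = {}
    for term in terms:
        candidates = synsets_of.get(term, [])
        in_domain = [s for s in candidates if domain in domains_of.get(s, [])]
        # Fallback (assumption): first listed synset, or None if there is none.
        chosen[term] = (in_domain or candidates or [None])[0]
    return domain, chosen

# Hypothetical synsets and domains.
synsets_of = {
    "network": ["network.computer", "network.broadcast"],
    "platform": ["platform.stage", "platform.computer"],
    "user": ["user.computer"],
}
domains_of = {
    "network.computer": ["computer_science"],
    "network.broadcast": ["telecommunication"],
    "platform.stage": ["theatre"],
    "platform.computer": ["computer_science"],
    "user.computer": ["computer_science"],
}
domain, senses = disambiguate(["network", "platform", "user"], synsets_of, domains_of)
```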
  19. Generalization of similar concepts: Evaluations
     Two toy experiments have been performed, with the Hamming distance threshold set to 0.001 and 0.0001 respectively, while the threshold of the taxonomical similarity function has been kept at 0.4.
  20. Reasoning ‘by association’: Breadth-First Search
     Given two nodes (concepts), a Breadth-First Search starts from both nodes: each search looks for the other's frontier, until the two frontiers meet in a common node.
     The path is then reconstructed by going backward to the roots in both directions.
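The two-sided search can be sketched as follows, assuming the conceptual graph is available as an undirected adjacency dictionary. The function names and the toy graph are hypothetical; this is one possible way to alternate the two searches, not necessarily the authors' implementation.

```python
from collections import deque

def bidirectional_bfs(graph, start, goal):
    """Return a path from start to goal, or None if they are not connected."""
    if start == goal:
        return [start]
    parents_a, parents_b = {start: None}, {goal: None}
    frontier_a, frontier_b = deque([start]), deque([goal])

    def expand(frontier, parents, other_parents):
        # Expand one full level; return a node reached by both searches, if any.
        for _ in range(len(frontier)):
            node = frontier.popleft()
            for neighbour in graph.get(node, ()):
                if neighbour not in parents:
                    parents[neighbour] = node
                    if neighbour in other_parents:
                        return neighbour
                    frontier.append(neighbour)
        return None

    def walk_back(parents, node):
        path = []
        while node is not None:
            path.append(node)
            node = parents[node]
        return path

    while frontier_a and frontier_b:
        meeting = expand(frontier_a, parents_a, parents_b)
        if meeting is None:
            meeting = expand(frontier_b, parents_b, parents_a)
        if meeting is not None:
            # Go backward to both roots and join the two half-paths.
            return walk_back(parents_a, meeting)[::-1] + walk_back(parents_b, meeting)[1:]
    return None

# Hypothetical undirected concept graph.
graph = {
    "adult": ["freedom", "platform"],
    "freedom": ["adult"],
    "platform": ["adult", "technology"],
    "technology": ["platform", "internet"],
    "internet": ["technology"],
    "island": [],
}
path = bidirectional_bfs(graph, "adult", "internet")
```

Each search keeps its own parent map, so the final path is obtained by walking from the meeting node back to both roots, as described above.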
  21. Reasoning ‘by association’: Evaluations
     The table on this slide shows a sample of possible outcomes. E.g., an interpretation of case 5 can be: “the adults write about freedom and use platform, that is recognized as a technology, as well as the internet”.
  22. Conclusions
     This work proposes an approach to automatically extract conceptual taxonomies from natural language texts.
     It works by mixing different techniques in order to:
     ● identify relevant terms/concepts in the text;
     ● generalize similar concepts;
     ● perform some kind of reasoning “by association”.
     Preliminary experiments show that this approach can be viable, although extensions and refinements are needed.
     A reliable outcome might help users to understand the text content, and machines to automatically perform some kind of reasoning on the taxonomy.
  23. Future work
     1. Extending the knowledge representation formalism to express negation.
     2. Defining a strategy to make a better choice of weights in the Relevance Weight computation.
     3. Enriching the adjacency matrix to improve concept descriptions.
     4. Exploring alternatives to the One Domain per Discourse (ODD) assumption, to overcome its limits.
     5. Extending the taxonomical similarity measures, which currently take into account only the hypernym relation; a more accurate similarity can be obtained by adding other relations.
     6. Defining a strategy to prefer one verb rather than keeping all of them in the reasoning ‘by association’ phase.
  24. References
     [1] Dan Klein and Christopher D. Manning. Fast exact inference with a factored model for natural language parsing. In Advances in Neural Information Processing Systems, volume 15. MIT Press, 2003.
     [2] Marie-Catherine de Marneffe, Bill MacCartney, and Christopher D. Manning. Generating typed dependency parses from phrase structure trees. In LREC, 2006.
     [3] Sang Ok Koo, Soo Yeon Lim, and Sang-Jo Lee. Constructing an ontology based on hub words. In ISMIS’03, pages 93–97, 2003.
     [4] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I.H. Witten. The WEKA data mining software: an update. SIGKDD Explorations, 11(1):10–18, 2009.