Cooperating Techniques for Extracting Conceptual Taxonomies from Text

  • Università degli studi di Bari “Aldo Moro”, Dipartimento di Informatica
    Cooperating Techniques for Extracting Conceptual Taxonomies from Text
    S. Ferilli, F. Leuzzi, F. Rotella
    L.A.C.A.M., http://lacam.di.uniba.it:8000
    AI*IA 2011, XIIth Conference of the Italian Association for Artificial Intelligence, Workshop on Mining Complex Patterns (MCP 2011), Palermo, Italy, September 17, 2011
  • Overview
    1. Introduction & Objectives
    2. Extraction of knowledge from text
    3. Knowledge representation formalism
    4. Identification of relevant concepts
    5. Generalization of similar concepts
    6. Reasoning ‘by association’
    7. Conclusions & Future work
  • Introduction
    The spread of electronic documents and document repositories has generated the need for automatic techniques to understand and handle document content, in order to help users satisfy their information needs. Full Text Understanding is not trivial, due to:
    1. the intrinsic ambiguity of natural language;
    2. the huge amount of common sense and conceptual background knowledge required.
    Lexical and/or conceptual taxonomies are useful for facing these problems, but building them manually is very costly and error prone.
  • Introduction
    The lack of ready-made taxonomies is a strong motivation for the automatic construction of conceptual networks by mining large amounts of documents in natural language. However, even assuming a correct knowledge representation, we are still far from simulating human abilities.
  • Objectives
    1. Definition of a representation formalism for knowledge extracted from natural language texts
    2. Extraction of concepts and relevance assessment
    3. Generalization of concepts having similar descriptions
    4. Definition of a kind of reasoning by concept association that looks for possible indirect connections between two identified concepts
  • Extraction of knowledge from text
    Knowledge is extracted by processing each sentence separately, in a pipeline made of the Stanford Parser [1] followed by the Stanford Dependencies [2]. The final output of the Stanford Dependencies is a typed syntactic structure of each sentence.
  • Knowledge representation formalism
    Among all grammatical roles played by words in a sentence, only subject, verb and complement have been considered. In the final conceptual graph, subjects and complements represent concepts (nodes), while verbs express the relations between them (edges), so that each sentence contributes triples of the form <subject, verb, complement>.
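As an illustration of this formalism, the following minimal sketch builds a concept graph from triples. The function name `build_concept_graph`, the choice of an undirected graph, and the sample triples are assumptions made for the example; they are not taken from the slides.

```python
from collections import defaultdict


def build_concept_graph(triples):
    """Build a concept graph from (subject, verb, complement) triples.

    Subjects and complements become nodes; each edge stores the set of
    verbs relating the two concepts. Edges are stored in both directions
    (an assumption: the slides do not say whether the graph is directed).
    """
    graph = defaultdict(lambda: defaultdict(set))
    for subject, verb, complement in triples:
        graph[subject][complement].add(verb)
        graph[complement][subject].add(verb)
    return graph


# Hypothetical triples, not taken from the authors' corpus.
triples = [
    ("user", "access", "network"),
    ("user", "use", "network"),
    ("adult", "use", "platform"),
]
graph = build_concept_graph(triples)
```

Keeping all verbs on an edge, rather than one, mirrors the fact that two concepts can be related by several verbs across sentences.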
  • Identification of relevant concepts
    A mix of several techniques is brought to cooperate in identifying relevant concepts:
    ● Hub Words [3]: words having high frequency, whose relevance is computed as
      $$W(t) = \alpha w_0 + \beta n + \gamma \sum_{i=1}^{n} w(t_i)$$
      where $w_0$ is the initial weight, $n$ is the number of relationships, and $w(t_i)$ is the tf*idf weight of the i-th word related to $t$.
    ● Keyword extraction techniques from single documents.
    ● EM clustering provided by Weka [4], based on Euclidean distance.
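The Hub Words weight can be computed directly from its definition. The sketch below assumes that the sum runs over all n words related to t (the upper limit is not legible in the slide); the function name and the numeric values are hypothetical.

```python
def hub_weight(w0, related_tfidf, alpha, beta, gamma):
    """Hub Words weight: W(t) = alpha*w0 + beta*n + gamma*sum_i w(t_i).

    w0            -- initial weight of the word t
    related_tfidf -- tf*idf weights of the words related to t
                     (n is taken to be the length of this list, an assumption)
    """
    n = len(related_tfidf)
    return alpha * w0 + beta * n + gamma * sum(related_tfidf)


# Hypothetical values chosen only to make the arithmetic easy to follow:
# 0.5 * 1.0 + 0.25 * 3 + 0.25 * (0.5 + 0.25 + 0.25) = 0.5 + 0.75 + 0.25
score = hub_weight(w0=1.0, related_tfidf=[0.5, 0.25, 0.25],
                   alpha=0.5, beta=0.25, gamma=0.25)
```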
  • Identification of relevant concepts
    Inspired by the Hub Words approach, we have defined a Relevance Weight made up of five components, A to E:
    $$W(\bar{c}) = \underbrace{\alpha \frac{w(\bar{c})}{\max_c w(c)}}_{A} + \underbrace{\beta \frac{e(\bar{c})}{\max_c e(c)}}_{B} + \underbrace{\gamma \frac{\sum_{(\bar{c},c)} w(c)}{e(\bar{c})}}_{C} + \underbrace{\delta \frac{d_M - d(\bar{c})}{d_M}}_{D} + \underbrace{\varepsilon \frac{k(\bar{c})}{\max_c k(c)}}_{E}$$
    where $\alpha + \beta + \gamma + \delta + \varepsilon = 1$.
    Nodes in the network are ranked by decreasing Relevance Weight. A suitable cut-point in the ranking is determined by choosing the first item $c_k$ such that:
    $$W(c_k) - W(c_{k+1}) \ge p \cdot \max_{i=0,\dots,n-1} \left( W(c_i) - W(c_{i+1}) \right)$$
    where $p \in [0,1]$.
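The cut-point rule can be sketched as follows. The function name `select_relevant` and the weights are hypothetical, and the sketch assumes that the items ranked up to and including c_k are the ones kept as relevant, which the slide does not state explicitly.

```python
def select_relevant(weights, p):
    """Rank concepts by decreasing weight and cut at the first large gap.

    weights -- dict mapping concept name to its Relevance Weight
    p       -- fraction in [0, 1] of the largest gap that triggers the cut
    Returns the names ranked up to and including the cut-point item c_k
    (keeping c_k itself is an assumption of this sketch).
    """
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    gaps = [ranked[i][1] - ranked[i + 1][1] for i in range(len(ranked) - 1)]
    if not gaps:
        return [name for name, _ in ranked]
    threshold = p * max(gaps)
    for k, gap in enumerate(gaps):
        if gap >= threshold:
            return [name for name, _ in ranked[: k + 1]]
    # Only reachable if p > 1: no gap is large enough, keep everything.
    return [name for name, _ in ranked]


# Hypothetical weights; the gaps are 0.1, 0.4, 0.1 and 0.05.
weights = {"a": 0.9, "b": 0.8, "c": 0.4, "d": 0.3, "e": 0.25}
strict = select_relevant(weights, p=1.0)   # cuts at the largest gap
loose = select_relevant(weights, p=0.2)    # cuts at the first gap >= 0.08
```

With p = 1 the cut falls exactly at the largest gap in the ranking; lowering p moves the cut earlier, to the first gap that is at least that fraction of the largest one.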
  • Identification of relevant concepts
    Relevance Weight in detail: definition of the Initial Weight
    The whole set of triples <subject,verb,complement> is represented in a Concepts x Attributes matrix V, recalling the classical Terms x Documents Vector Space Model. The weighting resembles tf*idf:
    $$\frac{f_{i,j}}{\sum_k f_{k,j}} \cdot \log \frac{|A|}{|\{ j : c_i \in a_j \}|}$$
    Therefore component A is:
    $$\alpha \frac{w(\bar{c})}{\max_c w(c)}$$
    where $w(c)$ is the initial weight assigned to node $c$, computed according to the above tf*idf schema.
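A minimal sketch of the tf*idf-like weighting on a Concepts x Attributes frequency matrix is given below. It computes one weight per matrix cell; how the cells are aggregated into a single initial weight per concept is not specified in the slides and is left out. The function name and the sample matrix are hypothetical.

```python
import math


def tfidf_matrix(freq):
    """Apply the tf*idf-like schema to a Concepts x Attributes matrix.

    freq[i][j] is the frequency of concept i in attribute j. Each cell gets
    (f_ij / sum_k f_kj) * log(|A| / |{j : c_i in a_j}|), where |A| is the
    number of attributes (columns).
    """
    n_attrs = len(freq[0])
    col_totals = [sum(row[j] for row in freq) for j in range(n_attrs)]
    weights = []
    for row in freq:
        attrs_with_concept = sum(1 for f in row if f > 0)
        # A concept that never occurs gets idf 0 to avoid dividing by zero.
        idf = math.log(n_attrs / attrs_with_concept) if attrs_with_concept else 0.0
        weights.append([
            (f / col_totals[j]) * idf if col_totals[j] else 0.0
            for j, f in enumerate(row)
        ])
    return weights


# Hypothetical frequencies: concept 0 occurs in one attribute only,
# concept 1 occurs in all four (so its idf factor is log(1) = 0).
freq = [
    [2, 0, 0, 0],
    [2, 1, 1, 1],
]
w = tfidf_matrix(freq)
```

As in the classical scheme, a concept that appears with every attribute is considered uninformative and receives weight zero.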
  • Identification of relevant concepts
    Relevance Weight in detail
    Connections Number: component B considers the number of connections (edges) in which $\bar{c}$ is involved:
    $$\beta \frac{e(\bar{c})}{\max_c e(c)}$$
    Neighborhood Weight Summary: component C takes into account the average initial weight of all neighbors of $\bar{c}$:
    $$\gamma \frac{\sum_{(\bar{c},c)} w(c)}{e(\bar{c})}$$
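Components B and C depend only on the graph structure and on the initial weights, so they can be sketched on a plain adjacency dictionary. The sketch assumes that e(c) counts the distinct neighbors of c (one edge per pair of concepts); function names, graph and weights are hypothetical.

```python
def connection_component(neighbors, node, beta):
    """Component B: beta * e(node) / max_c e(c)."""
    max_edges = max(len(adjacent) for adjacent in neighbors.values())
    return beta * len(neighbors[node]) / max_edges


def neighborhood_component(neighbors, weights, node, gamma):
    """Component C: gamma * (sum of neighbors' initial weights) / e(node)."""
    adjacent = neighbors[node]
    if not adjacent:
        return 0.0  # an isolated node has no neighborhood to average
    return gamma * sum(weights[c] for c in adjacent) / len(adjacent)


# Hypothetical graph and initial weights.
neighbors = {
    "network": {"user", "access"},
    "user": {"network"},
    "access": {"network"},
}
weights = {"network": 1.0, "user": 0.5, "access": 0.25}

b_network = connection_component(neighbors, "network", beta=0.5)
b_user = connection_component(neighbors, "user", beta=0.5)
c_network = neighborhood_component(neighbors, weights, "network", gamma=0.5)
c_user = neighborhood_component(neighbors, weights, "user", gamma=0.5)
```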
  • Identification of relevant concepts
    Relevance Weight in detail
    Inverse Distance from Center: component D represents the closeness to the center of the cluster:
    $$\delta \frac{d_M - d(\bar{c})}{d_M}$$
    KE Influence: component E takes into account the outcome of three KE techniques, suitably weighted:
    $$\varepsilon \frac{k(\bar{c})}{\max_c k(c)}$$
    where:
    $$k(\bar{c}) = \varsigma k_{\text{co-occurrences}}(\bar{c}) + \eta k_{\text{synset}}(\bar{c}) + \theta k_{\text{mvn}}(\bar{c})$$
  • Identification of relevant concepts
    Relevance Weight in detail
    ● KE based on $\chi^2$ co-occurrences: $k_{\text{co-occurrences}} = \varsigma \frac{\chi^2}{\max_{\text{cluster}} \chi^2}$
    ● KE based on WordNet synsets: $k_{\text{synset}} = \eta \frac{kw_{\text{synset}}}{\max(kw_{\text{synset}})}$
    ● KE by means of Multivariate Normal Distribution (MVN): $k_{\text{mvn}} = \theta \frac{kw_{\text{mvn}}}{\max(kw_{\text{mvn}})}$
  • Identification of relevant concepts
    Evaluations
    Parameter settings:
    Test #   α      β      γ      δ      ε      p
    1        0.10   0.10   0.30   0.25   0.25   1.0
    2        0.20   0.15   0.15   0.25   0.25   0.7
    3        0.15   0.25   0.30   0.15   0.15   1.0
    Results:
    Test #   Concept      A         B       C        D       E       W
    1        network      0.100     0.100   0.021    0.178   0.250   0.649
    1        access       0.001     0.001   0.154    0.239   0.250   0.646
    1        subset       6.32E-4   0.001   0.150    0.239   0.250   0.641
    2        network      0.200     0.150   0.0105   0.178   0.250   0.789
    3        network      0.150     0.250   0.021    0.146   0.150   0.717
    3        user         0.127     0.195   0.022    0.146   0.150   0.641
    3        number       0.113     0.187   0.022    0.146   0.150   0.619
    3        individual   0.103     0.174   0.020    0.146   0.150   0.594
  • Generalization of similar concepts
    Pairwise clustering
    The clustering takes into account the description of each concept, consisting of a binary vector that represents the presence or absence (1 or 0 respectively) of a <subject,complement> relation between the involved concepts. The Hamming distance between two such vectors provides an evaluation of their similarity: the smaller the distance, the more similar the concepts.
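The comparison of two concept descriptions can be sketched as below. The normalized variant (distance divided by vector length) is an assumption of this sketch: the slides do not say whether the distance is normalized, although the very small thresholds used in the evaluation suggest some form of scaling. Vectors and names are hypothetical.

```python
def hamming_distance(u, v):
    """Number of positions in which two equal-length binary vectors differ."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a != b for a, b in zip(u, v))


def normalized_hamming(u, v):
    """Hamming distance divided by the vector length (assumed scaling)."""
    return hamming_distance(u, v) / len(u)


# Hypothetical descriptions: one entry per concept in the network, set to 1
# when a <subject,complement> relation with that concept exists.
concept_a = [1, 0, 1, 1]
concept_b = [1, 1, 1, 0]

distance = hamming_distance(concept_a, concept_b)
scaled = normalized_hamming(concept_a, concept_b)
```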
  • Generalization of similar concepts
    WordNet
    WordNet (http://wordnet.princeton.edu/) is an external resource that has some useful properties:
    1. it is a lexical taxonomy;
    2. each concept is described as a set of synonyms (synset);
    3. synsets are interlinked by means of conceptual-semantic and lexical relations.
    We focus on hypernymy, a relation that links the current synset to more general ones.
  • Generalization of similar concepts
    Taxonomical similarity function
    ● More general: provides a similarity value on the basis of common relations, without focusing on the specific path.
    ● More specific: provides a similarity value on the basis of common relations, relying on the specific path.
  • Generalization of similar concepts
    Domain-driven Word Sense Disambiguation (WSD)
    One Domain per Discourse assumption: many uses of a word in a coherent portion of text tend to share the same domain. The procedure involves:
    ● Identification of the prevalent domain
    ● Extraction of all synsets for each term
    ● Extraction of all domains for each synset
    ● Choice of the synset belonging to the prevalent domain
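One possible reading of this procedure is sketched below: the domains of all candidate synsets are counted over the whole text, the most frequent one is taken as prevalent, and each term is assigned a synset in that domain. The ordering of the steps, the fallback to the first listed synset, and all names and data (plain dictionaries standing in for WordNet and for a domain labelling) are assumptions of this sketch.

```python
from collections import Counter


def prevalent_domain(terms, synsets_of, domains_of):
    """Return the domain occurring most often among all candidate synsets."""
    counts = Counter()
    for term in terms:
        for synset in synsets_of.get(term, []):
            counts.update(domains_of.get(synset, []))
    return counts.most_common(1)[0][0] if counts else None


def disambiguate(terms, synsets_of, domains_of):
    """Pick, for each term, a synset labelled with the prevalent domain."""
    domain = prevalent_domain(terms, synsets_of, domains_of)
    chosen = {}
    for term in terms:
        candidates = synsets_of.get(term, [])
        in_domain = [s for s in candidates if domain in domains_of.get(s, [])]
        # Assumed fallback: first listed synset, or None if there is none.
        chosen[term] = (in_domain or candidates or [None])[0]
    return domain, chosen


# Hypothetical sense inventory and domain labels (not real WordNet ids).
synsets_of = {
    "network": ["network#broadcast", "network#computing"],
    "platform": ["platform#stage", "platform#computing"],
    "user": ["user#person"],
}
domains_of = {
    "network#broadcast": ["telecommunication"],
    "network#computing": ["computer_science"],
    "platform#stage": ["architecture"],
    "platform#computing": ["computer_science"],
    "user#person": ["computer_science", "person"],
}
domain, senses = disambiguate(["network", "platform", "user"],
                              synsets_of, domains_of)
```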
  • Generalization of similar concepts
    Evaluations
    Two toy experiments have been performed, with the Hamming distance threshold equal to 0.001 and 0.0001 respectively, while the taxonomical similarity function threshold has been kept equal to 0.4.
  • Reasoning ‘by association’
    Breadth-First Search
    Given two nodes (concepts), a Breadth-First Search starts from both nodes: each search looks for the other's frontier, until the two frontiers meet at common nodes. The path is then reconstructed by going backward to the roots in both directions.
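The search described above can be sketched as a bidirectional Breadth-First Search over an adjacency dictionary. This is an illustrative sketch rather than the authors' implementation: the function names are hypothetical, the two searches are expanded one level at a time in alternation, and the small graph is only loosely inspired by the example discussed in the evaluation.

```python
from collections import deque


def _walk_back(node, parents):
    """Follow parent links from node back to the root of its search."""
    path = []
    while node is not None:
        path.append(node)
        node = parents[node]
    return path


def bidirectional_bfs(graph, start, goal):
    """Return a path connecting start and goal, or None if there is none."""
    if start not in graph or goal not in graph:
        return None
    if start == goal:
        return [start]
    parents_s, parents_g = {start: None}, {goal: None}
    queue_s, queue_g = deque([start]), deque([goal])

    def expand(queue, parents, other_parents):
        # Expand one full level; stop as soon as the other search is touched.
        for _ in range(len(queue)):
            node = queue.popleft()
            for neighbor in graph.get(node, ()):
                if neighbor not in parents:
                    parents[neighbor] = node
                    if neighbor in other_parents:
                        return neighbor
                    queue.append(neighbor)
        return None

    while queue_s and queue_g:
        meet = expand(queue_s, parents_s, parents_g)
        if meet is None:
            meet = expand(queue_g, parents_g, parents_s)
        if meet is not None:
            # Go backward to both roots and join the two half-paths.
            head = _walk_back(meet, parents_s)[::-1]
            tail = _walk_back(parents_g[meet], parents_g)
            return head + tail
    return None


# Hypothetical concept graph.
graph = {
    "adult": {"freedom", "platform"},
    "freedom": {"adult"},
    "platform": {"adult", "technology"},
    "technology": {"platform", "internet"},
    "internet": {"technology"},
    "isolated": set(),
}
path = bidirectional_bfs(graph, "adult", "internet")
```

Searching from both ends keeps each frontier small, which matters when the concept network is large and the two concepts are only indirectly related.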
  • Reasoning ‘by association’
    Evaluations
    The table below shows a sample of possible outcomes. E.g., an interpretation of case 5 can be: “the adults write about freedom and use platform, that is recognized as a technology, as well as the internet”.
  • Conclusions
    This work proposes an approach to automatically extract a conceptual taxonomy from natural language texts. It mixes different techniques in order to:
    ● identify relevant terms/concepts in text;
    ● generalize similar concepts;
    ● perform some kind of reasoning “by association”.
    Preliminary experiments show that this approach can be viable, although extensions and refinements are needed. A reliable outcome might help users to understand the text content, and machines to automatically perform some kind of reasoning on the taxonomy.
  • Future work
    1. Extending the knowledge representation formalism to express negation.
    2. Defining a strategy to make a better choice of weights in the Relevance Weight computation.
    3. Enriching the adjacency matrix to improve concept descriptions.
    4. Exploring alternatives to ODD (One Domain per Discourse), to overcome its limits.
    5. Extending the taxonomical similarity measures, which currently take into account only the hypernym relation, with other relations to obtain a more accurate similarity.
    6. Defining a strategy to prefer one verb rather than keeping all of them in the reasoning ‘by association’ phase.
  • References
    [1] Dan Klein and Christopher D. Manning. Fast exact inference with a factored model for natural language parsing. In Advances in Neural Information Processing Systems, volume 15. MIT Press, 2003.
    [2] Marie-Catherine de Marneffe, Bill MacCartney, and Christopher D. Manning. Generating typed dependency parses from phrase structure trees. In LREC, 2006.
    [3] Sang Ok Koo, Soo Yeon Lim, and Sang-Jo Lee. Constructing an ontology based on hub words. In ISMIS’03, pages 93–97, 2003.
    [4] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I.H. Witten. The WEKA data mining software: an update. SIGKDD Explorations, 11(1):10–18, 2009.