Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
Introduction
Techniques
Overview and Summary
Multilingual Text Classification
Gerard de Melo, Stefan Siersdorfer
Max Planck...
Introduction
Techniques
Overview and Summary
Text Classification
Text Classification
task: automatically assign text
documen...
Introduction
Techniques
Overview and Summary
Text Classification
Text Classification
task: automatically assign text
documen...
Introduction
Techniques
Overview and Summary
Text Classification
Text Classification
task: automatically assign text
documen...
Introduction
Techniques
Overview and Summary
Text Classification
Text Classification
task: automatically assign text
documen...
Introduction
Techniques
Overview and Summary
Machine Translation
Mapping to Semantic Concept
Weight Propagation
Machine Tr...
Introduction
Techniques
Overview and Summary
Machine Translation
Mapping to Semantic Concept
Weight Propagation
Machine Tr...
Introduction
Techniques
Overview and Summary
Machine Translation
Mapping to Semantic Concept
Weight Propagation
Machine Tr...
Introduction
Techniques
Overview and Summary
Machine Translation
Mapping to Semantic Concept
Weight Propagation
Semantic C...
Introduction
Techniques
Overview and Summary
Machine Translation
Mapping to Semantic Concept
Weight Propagation
Semantic C...
Introduction
Techniques
Overview and Summary
Machine Translation
Mapping to Semantic Concept
Weight Propagation
Semantic C...
Introduction
Techniques
Overview and Summary
Machine Translation
Mapping to Semantic Concept
Weight Propagation
Semantic C...
Introduction
Techniques
Overview and Summary
Machine Translation
Mapping to Semantic Concept
Weight Propagation
Weight Pro...
Introduction
Techniques
Overview and Summary
Machine Translation
Mapping to Semantic Concept
Weight Propagation
Weight Pro...
Introduction
Techniques
Overview and Summary
Machine Translation
Mapping to Semantic Concept
Weight Propagation
Weight Pro...
Introduction
Techniques
Overview and Summary
Overview and Summary
Overview and Summary
Ontology Region Mapping
1 optionall...
Introduction
Techniques
Overview and Summary
Overview and Summary
Overview and Summary
Ontology Region Mapping
1 optionall...
Introduction
Techniques
Overview and Summary
Overview and Summary
Overview and Summary
Ontology Region Mapping
1 optionall...
Upcoming SlideShare
Loading in …5
×

Multilingual Text Classification using Ontologies

944 views

Published on

In this paper, we investigate strategies for automatically classifying documents in different languages thematically, geographically or according to other criteria. A novel linguistically motivated text representation scheme is presented that can be used with machine learning algorithms in order to learn classifications from pre-classified examples and then automatically classify documents that might be provided in entirely different languages. Our approach makes use of ontologies and lexical resources but goes beyond a simple mapping from terms to concepts by fully exploiting the external knowledge manifested in such resources and mapping to entire regions of concepts. For this, a graph traversal algorithm is used to explore related concepts that might be relevant. Extensive testing has shown that our methods lead to significant improvements compared to existing approaches.

  • Be the first to comment

Multilingual Text Classification using Ontologies

  1. 1. Introduction Techniques Overview and Summary Multilingual Text Classification Gerard de Melo, Stefan Siersdorfer Max Planck Institute for Computer Science Saarbr¨ucken, Germany 2007-04-04 G. de Melo, S. Siersdorfer, Max-Planck-Institut Informatik Multilingual Text Classification
  2. 2. Introduction Techniques Overview and Summary Text Classification Text Classification task: automatically assign text documents to classes (e.g. thematically, geographically) machine learning algorithms, e.g. SVM, can learn from pre-classified training documents multilingual case: documents in multiple languages applications: news wire filtering, library management, e-mail, etc. G. de Melo, S. Siersdorfer, Max-Planck-Institut Informatik Multilingual Text Classification
  3. 3. Introduction Techniques Overview and Summary Text Classification Text Classification task: automatically assign text documents to classes (e.g. thematically, geographically) machine learning algorithms, e.g. SVM, can learn from pre-classified training documents multilingual case: documents in multiple languages applications: news wire filtering, library management, e-mail, etc. G. de Melo, S. Siersdorfer, Max-Planck-Institut Informatik Multilingual Text Classification
  4. 4. Introduction Techniques Overview and Summary Text Classification Text Classification task: automatically assign text documents to classes (e.g. thematically, geographically) machine learning algorithms, e.g. SVM, can learn from pre-classified training documents multilingual case: documents in multiple languages applications: news wire filtering, library management, e-mail, etc. G. de Melo, S. Siersdorfer, Max-Planck-Institut Informatik Multilingual Text Classification
  5. 5. Introduction Techniques Overview and Summary Text Classification Text Classification task: automatically assign text documents to classes (e.g. thematically, geographically) machine learning algorithms, e.g. SVM, can learn from pre-classified training documents multilingual case: documents in multiple languages applications: news wire filtering, library management, e-mail, etc. G. de Melo, S. Siersdorfer, Max-Planck-Institut Informatik Multilingual Text Classification
  6. 6. Introduction Techniques Overview and Summary Machine Translation Mapping to Semantic Concept Weight Propagation Machine Translation for Multilingual TC idea: simply translate all documents into a single language LI (prior work by Jalam 2002, Rigutini et al. 2005) shortcomings of this approach lexical variety in LI (English: huge vocabulary, many synonyms) variety of expression in source languages lexical ambiguity in LI (unnecessary introduction of additional ambiguity) G. de Melo, S. Siersdorfer, Max-Planck-Institut Informatik Multilingual Text Classification
  7. 7. Introduction Techniques Overview and Summary Machine Translation Mapping to Semantic Concept Weight Propagation Machine Translation for Multilingual TC idea: simply translate all documents into a single language LI (prior work by Jalam 2002, Rigutini et al. 2005) shortcomings of this approach lexical variety in LI (English: huge vocabulary, many synonyms) variety of expression in source languages lexical ambiguity in LI (unnecessary introduction of additional ambiguity) G. de Melo, S. Siersdorfer, Max-Planck-Institut Informatik Multilingual Text Classification
  8. 8. Introduction Techniques Overview and Summary Machine Translation Mapping to Semantic Concept Weight Propagation Machine Translation for Multilingual TC idea: simply translate all documents into a single language LI (prior work by Jalam 2002, Rigutini et al. 2005) shortcomings of this approach lexical variety in LI (English: huge vocabulary, many synonyms) variety of expression in source languages lexical ambiguity in LI (unnecessary introduction of additional ambiguity) Spanish coche −→ car French voiture −→ automobile G. de Melo, S. Siersdorfer, Max-Planck-Institut Informatik Multilingual Text Classification
  9. 9. Introduction Techniques Overview and Summary Machine Translation Mapping to Semantic Concept Weight Propagation Semantic Concepts Idea map all words to semantic concepts (e.g. WordNet synsets), thus distinguishing different senses of a word while identifying synonyms disambiguate using context information construct feature vectors by counting occurrences of concepts rather than terms G. de Melo, S. Siersdorfer, Max-Planck-Institut Informatik Multilingual Text Classification
  10. 10. Introduction Techniques Overview and Summary Machine Translation Mapping to Semantic Concept Weight Propagation Semantic Concepts Idea map all words to semantic concepts (e.g. WordNet synsets), thus distinguishing different senses of a word while identifying synonyms disambiguate using context information construct feature vectors by counting occurrences of concepts rather than terms G. de Melo, S. Siersdorfer, Max-Planck-Institut Informatik Multilingual Text Classification
  11. 11. Introduction Techniques Overview and Summary Machine Translation Mapping to Semantic Concept Weight Propagation Semantic Concepts Idea map all words to semantic concepts (e.g. WordNet synsets), thus distinguishing different senses of a word while identifying synonyms disambiguate using context information construct feature vectors by counting occurrences of concepts rather than terms G. de Melo, S. Siersdorfer, Max-Planck-Institut Informatik Multilingual Text Classification
  12. 12. Introduction Techniques Overview and Summary Machine Translation Mapping to Semantic Concept Weight Propagation Semantic Concepts Problems understemming polysemy: highly related senses are treated as distinct incongruent concepts between languages variety of expression lexical lacunae English I have a headache I have a headache Spanish Me duele la cabeza *It hurts the head to me French J’ai mal `a la t^ete *I have pain at the head G. de Melo, S. Siersdorfer, Max-Planck-Institut Informatik Multilingual Text Classification
  13. 13. Introduction Techniques Overview and Summary Machine Translation Mapping to Semantic Concept Weight Propagation Weight Propagation propagate weight from original concepts to related concepts choose path to c maximizing its weight Dijkstra-like algorithm in order to assign maximal possible weight to a concept G. de Melo, S. Siersdorfer, Max-Planck-Institut Informatik Multilingual Text Classification
  14. 14. Introduction Techniques Overview and Summary Machine Translation Mapping to Semantic Concept Weight Propagation Weight Propagation propagate weight from original concepts to related concepts choose path to c maximizing its weight Dijkstra-like algorithm in order to assign maximal possible weight to a concept G. de Melo, S. Siersdorfer, Max-Planck-Institut Informatik Multilingual Text Classification
  15. 15. Introduction Techniques Overview and Summary Machine Translation Mapping to Semantic Concept Weight Propagation Weight Propagation propagate weight from original concepts to related concepts choose path to c maximizing its weight Dijkstra-like algorithm in order to assign maximal possible weight to a concept G. de Melo, S. Siersdorfer, Max-Planck-Institut Informatik Multilingual Text Classification
  16. 16. Introduction Techniques Overview and Summary Overview and Summary Overview and Summary Ontology Region Mapping 1 optionally translate the documents – or use a multilingual lexical resource (aligned wordnets) 2 map terms to concepts 3 search for highly related concepts G. de Melo, S. Siersdorfer, Max-Planck-Institut Informatik Multilingual Text Classification
  17. 17. Introduction Techniques Overview and Summary Overview and Summary Overview and Summary Ontology Region Mapping 1 optionally translate the documents – or use a multilingual lexical resource (aligned wordnets) 2 map terms to concepts 3 search for highly related concepts G. de Melo, S. Siersdorfer, Max-Planck-Institut Informatik Multilingual Text Classification
  18. 18. Introduction Techniques Overview and Summary Overview and Summary Overview and Summary Ontology Region Mapping 1 optionally translate the documents – or use a multilingual lexical resource (aligned wordnets) 2 map terms to concepts 3 search for highly related concepts entire regions of concepts are relevant, so propagate a part of the concept’s weight to related concepts G. de Melo, S. Siersdorfer, Max-Planck-Institut Informatik Multilingual Text Classification

×