  1. 1. NATURAL LANGUAGE PROCESSING (NLP): TAMIL - HINDI CONVERSION. Authors: R. Muthu Kumaran (II-CSE) and R. Panneer Selvam (II-ECE), Arasu Engineering College.
  2. 2. INTRODUCTION Nowadays most information is available electronically. Indeed, there has been an explosion of text and multimedia content on the World Wide Web. For many people, a large and growing fraction of work and leisure time is spent navigating and accessing this universe of information. The presence of so much text in electronic form is a huge challenge to NLP. The Universal Networking Language (UNL) is an electronic language in the form of a semantic network that acts as an intermediate representation to express and exchange every kind of information. The UNL represents information, i.e. meaning, sentence by sentence. Sentence information is represented as a hyper-graph with Universal Words (UWs) as nodes and relations as arcs.
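The graph view of a sentence described above can be sketched in a few lines of Python. This is a minimal illustration, not the UNL system's own data model: the class and function names are hypothetical, the hyper-graph is simplified to a list of labelled arcs, and the Universal Words are written as bare strings. The relation labels `agt` (agent) and `obj` (object) follow UNL naming.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass(frozen=True)
class Relation:
    """One labelled arc between two Universal Words (UWs)."""
    label: str    # relation name, e.g. "agt" (agent) or "obj" (object)
    source: str   # UW the arc starts from
    target: str   # UW the arc points to


@dataclass
class UNLSentence:
    """Meaning of a single sentence, stored as a list of labelled arcs."""
    relations: List[Relation] = field(default_factory=list)

    def add(self, label: str, source: str, target: str) -> None:
        self.relations.append(Relation(label, source, target))

    def universal_words(self) -> Set[str]:
        """Every UW that appears at either end of an arc."""
        words: Set[str] = set()
        for rel in self.relations:
            words.update((rel.source, rel.target))
        return words

    def targets(self, source: str, label: str) -> List[str]:
        """UWs reached from `source` through arcs carrying `label`."""
        return [r.target for r in self.relations
                if r.source == source and r.label == label]


def build_example() -> UNLSentence:
    # Hand-built illustration for "John eats rice"; UW strings are simplified.
    sentence = UNLSentence()
    sentence.add("agt", "eat", "John")
    sentence.add("obj", "eat", "rice")
    return sentence


if __name__ == "__main__":
    s = build_example()
    print(sorted(s.universal_words()))
    print(s.targets("eat", "agt"))
```

Because the structure holds only concepts and relations, nothing in it depends on the word order or morphology of any one natural language.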
  3. 3. UNL FEATURES
       • Text, once converted into UNL, can be converted to many different languages. For example, once a home page is expressed in UNL, it can be read in a variety of natural languages.
       • The meaning representation is directly available to retrieval and indexing mechanisms and to tools for automatic summarization and knowledge extraction; it is converted to a natural language only when communicating with a human user.
       • UNL greatly reduces the cost of developing the knowledge or content needed for knowledge processing, because knowledge and content can be shared.
       • Furthermore, if the knowledge required for a task is described in UNL, the software only needs to interpret unambiguous intermediate instructions written in that language to perform its functions.
  4. 4. [Diagram: UNL linked to Tamil, Hindi, French and Russian through enconversion (natural language to UNL) and deconversion (UNL to natural language).]
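One way to see why routing every language through UNL is attractive is to count the components needed. The sketch below is an illustration under a simple assumption that is not stated in the slides: without an intermediate language, every ordered pair of languages needs its own translator, while with UNL each language needs one enconverter and one deconverter. The function names are hypothetical.

```python
def direct_translators(n_languages: int) -> int:
    """Translators needed when every ordered language pair is handled directly."""
    return n_languages * (n_languages - 1)


def hub_converters(n_languages: int) -> int:
    """Components needed with an intermediate language:
    one enconverter plus one deconverter per language."""
    return 2 * n_languages


if __name__ == "__main__":
    # The four languages shown in the diagram: Tamil, Hindi, French, Russian.
    n = 4
    print(direct_translators(n), hub_converters(n))
```

Under this assumption the two counts are equal at three languages, and the intermediate-language arrangement needs fewer components from four languages onward.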
  5. 5. [Diagram: Enconverter. Labels: Analysis Rules; Dictionary; Node List of words (W) at nodes n(i-1), n(i), n(i+1), n(i+2), n(i+3); Node-net; V, TM, N, GM.]
  6. 6. Several analysis frameworks are currently available for language conversion:
       • Aspects Model (Standard Theory)
       • Extended Standard Theory (EST)
  7. 7. ASPECTS MODEL STANDARD THEORY In Aspects of the Theory of Syntax, nouns are chosen on the basis of context-free rules; verbs are then chosen on the basis of context-sensitive rules, which are stated in terms of lexical features. Since nouns are the first words to be chosen, they are identified by lexical features only. Verbs and adjectives require additional features to indicate the environments in which they can appear. The Aspects grammar was organized into three major components.
  8. 8. EXTENDED STANDARD THEORY Ray Jackendoff offered a substantial criticism of the Standard Theory and showed that surface structure plays a much more important role in semantic interpretation than deep structure. Here a partial representation of meaning is determined by grammatical structure. The logical form is derived step by step, through a derivational process analogous to those of syntax and phonology.
  9. 9. ADVANTAGES
       • Developing Machine Translation (MT) systems between Tamil and other languages, particularly English and Hindi
       • Building lexical resources in Tamil that are essential for researchers and developers
       • Developing basic tools for computational work in Tamil, such as a morphological analyzer and a Part-Of-Speech (POS) tagger
       • Applying NLP tools to Information Extraction from domain-specific texts, so as to build Information Extraction systems for domains such as medicine and agriculture
  10. 10. TAMIL-HINDI SYSTEM Tamil-Hindi was chosen for machine-aided translation (MAT) because both are free word-order languages, unlike English, which is a positional language. Our ultimate aim is to build a Human Aided Machine Translation System for Hindi-Tamil. An MT system basically has three major components. [Diagram: Tamil to Hindi Translation. A Tamil word passes through the Morphological Analyser (MA), the Mapping Unit and the Generator to yield a Hindi word.]
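The three-component arrangement can be sketched as a chain of three functions. Everything here is a placeholder chosen for illustration: the function names, the whitespace-based "analysis", and the two romanized dictionary entries are assumptions, not part of the system described in the slides. Unknown units are marked rather than dropped, which is one simple way to leave them for a human editor in a human-aided setup.

```python
from typing import Dict, List

# Toy bilingual entries in romanized form; illustrative only.
TOY_DICTIONARY: Dict[str, str] = {
    "viidu": "ghar",        # house
    "thanneer": "paanii",   # water
}


def analyse(sentence: str) -> List[str]:
    """Stand-in for the Morphological Analyser: only splits on whitespace."""
    return sentence.split()


def map_units(units: List[str], dictionary: Dict[str, str]) -> List[str]:
    """Stand-in for the Mapping Unit: dictionary lookup per unit.
    Units missing from the dictionary are marked for human review."""
    return [dictionary.get(unit, f"<{unit}?>") for unit in units]


def generate(units: List[str]) -> str:
    """Stand-in for the Generator: joins the mapped units into a string."""
    return " ".join(units)


def translate(sentence: str, dictionary: Dict[str, str] = TOY_DICTIONARY) -> str:
    """Run the three stages in order: analyse, map, generate."""
    return generate(map_units(analyse(sentence), dictionary))


if __name__ == "__main__":
    print(translate("viidu thanneer"))
```

Keeping the stages as separate functions means each one can be replaced by a real component without touching the other two.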
  11. 11. [Diagram: Morphological Analyser (MA). A Tamil sentence is split into words, and each word is split into morphons.]
  12. 12. Morphons: root word, help word, tense marker, GNP (gender, number, person) marker, vibhakti (case marker).
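The five morphon slots map naturally onto a small record type. The sketch below is an assumption about how such a record might look, with hypothetical field names. The sample value is a hand-made romanized segmentation of the Tamil word "vandhaan" ("he came") given only for illustration; it is not taken from the slides and is not the output of any analyser.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Morphons:
    """Pieces a word is split into; only the root is mandatory."""
    root: str
    help_word: Optional[str] = None      # auxiliary element, if any
    tense_marker: Optional[str] = None
    gnp_marker: Optional[str] = None     # gender, number, person
    vibhakti: Optional[str] = None       # case marker

    def parts(self) -> List[Tuple[str, str]]:
        """Slots that are actually filled, in a fixed order."""
        slots = [
            ("root", self.root),
            ("help_word", self.help_word),
            ("tense_marker", self.tense_marker),
            ("gnp_marker", self.gnp_marker),
            ("vibhakti", self.vibhakti),
        ]
        return [(name, value) for name, value in slots if value is not None]


# Hand-made illustration: "vandhaan" = vaa (come) + ndh (past) + aan (3rd masc. sing.)
EXAMPLE = Morphons(root="vaa", tense_marker="ndh", gnp_marker="aan")

if __name__ == "__main__":
    for name, value in EXAMPLE.parts():
        print(name, value)
```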
  13. 13. Example : “ ”
  14. 14. [Diagram: Mapping Unit. Morphons are converted using the Dictionary and passed to the Generator.]
  15. 16. [Diagram: Generators. Root words and help words are combined into words, and the words into a sentence.]
  16. 19. CONCLUSION This paper describes the development of a Tamil-Hindi translation system. In Tamil, most of the information needed to generate a sentence from a UNL structure is handled at the morphological and syntactic levels. This modest system could help alleviate some of the most pressing issues in NLP. The applications of NLP are as vast as an ocean, and we have seen only a small drop of it. In the future, NLP will help people communicate comfortably with computers.
  17. 24.
       • Morphological Analysis
       • Semantic Analysis
  18. 25.
       • Natural Access to Internet & Other Resources
           • Headline Generation
           • Headline Translation
           • Document Translation
           • Multilingual Multi-document Summarization
       • Cross-lingual Information Management
           • Multilingual and Cross-lingual IR
           • Open Domain Question Answering
  19. 26. Component: Morphological Analyzer for Tamil; Morphological Analyzer for Hindi (would like to collaborate with consortium)
       • Performance of these techniques in other languages: Kimmo Analyser, 95% (English)
       • Tamil Morphological Analyser, present performance: 92%
       • 1st year: 96%
       • 2nd year: 98-99%
       • Language pair: Tamil-Hindi
  20. 27. Component: POS Tagger
       • Performance of these techniques in other languages: Brill's Tagger, 99% (English)
       • Tamil, present performance: 90+%
       • 1st year: 96%
       • 2nd year: 98-99%
       • Language pair: Tamil-Hindi
       • Evaluation metrics in addition to the domain: Precision and Recall
  21. 28. Component: NP Chunker
       • Performance of these techniques in other languages: FnTBL, 98%
       • Tamil, present performance: 94+%
       • 1st year: 96%
       • 2nd year: 98-99%
       • Language pair: Tamil-Hindi
       • Domain for which the performance will be optimized: Crime/Tourism
       • Other evaluation metrics in addition to the domain: Precision and Recall
  22. 29. Component: Transfer Grammar Component
       • Performance of these techniques in other languages: NA
       • Tamil, present performance: 50%
       • 1st year: 90%
       • 2nd year: 95% and above
       • Language pair: Tamil-Hindi
       • Other evaluation metrics in addition to the domain: Precision and Recall
  23. 30. Component: Word Generator and Local Language Splitter for Target Language
       • Present performance: 50%
       • 1st year: 90%
       • 2nd year: 95% and above
       • Language pair: Tamil-Hindi
       • Other evaluation metrics in addition to the domain: Precision, Recall and F measure
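The evaluation metrics named for these components, precision, recall and F measure, can be computed from a set of predicted items and a set of reference items. The sketch below uses the standard definitions; the function name and the sample chunk spans are hypothetical and do not come from the project's data.

```python
from typing import Hashable, Iterable, Tuple


def precision_recall_f(predicted: Iterable[Hashable],
                       gold: Iterable[Hashable],
                       beta: float = 1.0) -> Tuple[float, float, float]:
    """Return (precision, recall, F-beta) for predicted items against gold items."""
    if beta <= 0:
        raise ValueError("beta must be positive")
    predicted_set, gold_set = set(predicted), set(gold)
    true_positives = len(predicted_set & gold_set)

    # Precision: share of predicted items that are correct.
    precision = true_positives / len(predicted_set) if predicted_set else 0.0
    # Recall: share of gold items that were found.
    recall = true_positives / len(gold_set) if gold_set else 0.0

    if precision + recall == 0.0:
        return precision, recall, 0.0
    b2 = beta * beta
    f_score = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f_score


if __name__ == "__main__":
    # Hypothetical NP chunk spans as (start, end) token offsets.
    predicted_chunks = {(0, 2), (3, 5), (6, 7)}
    gold_chunks = {(0, 2), (3, 4), (6, 7)}
    print(precision_recall_f(predicted_chunks, gold_chunks))
```

With `beta=1.0` the F measure is the harmonic mean of precision and recall, so a component cannot score well by maximizing one at the expense of the other.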
  24. 31. Lexical resource: Hindi-Tamil Bilingual Dictionary
       • Final size of the lexical resource: 30,000 root words
       • Average size of such a resource in other languages: 20,000 root words
       • 1st year: 15,000 root words
       • 2nd year: 15,000 root words
       • Language pair: Tamil-Hindi