Improvement of English to Persian Machine Translation via N-grams of Part-of-Speech tags


Adel Rahimi from Sharif University of Technology worked to improve English to Persian machine translation by using n-grams of part-of-speech tags. This method analyzes sequences of part-of-speech tags in the translated text, which helped correct syntactical errors. The approach achieved a 65% accuracy level in evaluating correctness of translated sentences compared to the original Persian sentences.


Improvement of English to Persian Machine Translation via N-grams of Part-of-Speech tags
Adel Rahimi
Sharif University of Technology
adel.rahimi@mehr.sharif.edu
3rd Regional Conference On New Achievements In Electrical And Computer Engineering

Hi! I’m Adel Rahimi.
I work at the Sharif Speech and Language Processing Lab.
I love NLP and data mining.
You can find me at:
http://mehr.sharif.edu/~adel.rahimi
Adel.rahimi@mehr.sharif.edu

IN SHORT
Machine translation has always been an interesting topic in NLP, and it keeps improving. We tried a new method to align English-to-Persian machine-translated texts: n-gram modelling over part-of-speech-tagged tokens. This method improved accuracy on syntactically mistranslated sentences.

PREVIOUS STUDIES
▫ Orch (1999) translated word by word and then reordered the words to follow the target language’s syntactic structure.
▫ Koehn (2009) proposed translating phrases regardless of word structure.
▫ Kumar & Byrne (2008), Blackwell (2006), and Kumar (2003) all looked for ways to use finite-state transducers.

HOW WAS IT DONE?

METHODOLOGY
We used n-grams of POS-tagged items:
من این کد و من میخواهم (word for word: "I this code and I want")
pronoun pronoun noun conjunction pronoun verb
من خواهم رفت ("I will go")
pronoun verb
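The slide names the technique but not its implementation. As a minimal sketch, assuming a sentence has already been POS-tagged, contiguous n-grams can be read off the tag sequence with a sliding window. The function name `pos_ngrams` is hypothetical and is not taken from the talk.

```python
from typing import List, Tuple


def pos_ngrams(tags: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return every contiguous n-gram of a POS tag sequence."""
    return [tuple(tags[i:i + n]) for i in range(len(tags) - n + 1)]


# Tag sequence of the first example sentence on the slide.
tags = "pronoun pronoun noun conjunction pronoun verb".split()

bigrams = pos_ngrams(tags, 2)
trigrams = pos_ngrams(tags, 3)

for gram in bigrams:
    print(" ".join(gram))
```

A sequence shorter than n, such as the two-tag "pronoun verb" example with n = 3, simply yields no n-grams.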

THE DATASET
Number  String
1       n n pro spec
2       n n pro qua spec n
3       n p n p v adv
4       n pro p adv v pro
5       p n adj adj n
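To illustrate how tag strings like the five dataset rows could feed an n-gram model, the sketch below counts tag bigrams across them. This is an assumption about the workflow, not the author's code; the helper name `bigram_counts` is hypothetical, and a real model would be built from far more than five rows.

```python
from collections import Counter

# The five tag strings shown on the dataset slide.
dataset = [
    "n n pro spec",
    "n n pro qua spec n",
    "n p n p v adv",
    "n pro p adv v pro",
    "p n adj adj n",
]


def bigram_counts(rows):
    """Count adjacent tag pairs over all rows."""
    counts = Counter()
    for row in rows:
        tags = row.split()
        counts.update(zip(tags, tags[1:]))
    return counts


counts = bigram_counts(dataset)
print(counts[("n", "pro")])  # occurrences of "n" followed by "pro"
```

Relative frequencies derived from such counts are one common way to estimate how plausible a tag sequence is.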

HOW ABOUT THE ACCURACY?

Original Persian sentence: این یک متریک بسیار متداول است
Original English sentence: This is a very common metric
Translated sentence: یک متریک این بسیار متداول است
POS sequence of the translated sentence: n n pro adj adj v
Corrected POS sequence: pro n n adj adj v

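The slides do not say how the corrected tag order is chosen. One possible reading, sketched here purely as an assumption, is to try every reordering of the translated tag sequence and keep the one whose bigrams are best attested in reference Persian tag sequences. The `reference` rows are hypothetical toy data, and the scoring rule (counting attested bigrams) is a deliberately simple stand-in for a real n-gram model.

```python
from itertools import permutations

# Hypothetical reference tag sequences for well-formed Persian sentences.
reference = [
    "pro n n adj adj v",
    "pro n adj v",
    "n adj adj v",
]


def bigrams(tags):
    """Tag bigrams with sentence-boundary markers added."""
    padded = ["<s>"] + list(tags) + ["</s>"]
    return list(zip(padded, padded[1:]))


# Every bigram that occurs in at least one reference sequence.
attested = {bg for row in reference for bg in bigrams(row.split())}


def score(tags):
    """Number of bigrams in the candidate that are attested in the reference."""
    return sum(bg in attested for bg in bigrams(tags))


def best_reordering(tags):
    """Brute-force search over distinct permutations of the tag sequence."""
    candidates = sorted(set(permutations(tags)))
    return max(candidates, key=score)


translated = "n n pro adj adj v".split()
corrected = best_reordering(translated)
print(" ".join(corrected))  # pro n n adj adj v
```

With this toy reference the translated order scores 5 of 7 bigrams and the corrected order scores 7 of 7. Brute-force permutation grows factorially with sentence length, so it is only practical for short sentences.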
65 percent accuracy

THANKS
Any questions?
Contact me at:
▫ Mehr.sharif.edu/~adel.rahimi
▫ Adel.rahimi@mehr.sharif.edu


