Evaluation of hindi english mt systems, challenges and solutions

HUL 455
Evaluation of Hindi-English MT systems: Challenges and Solutions
A Presentation by:
Sajeed Mahaboob
2011ME1111

MACHINE TRANSLATION
Translation can be defined as the act or process of translating,
especially from one language into another.
MT investigates the use of computer software to translate text
or speech from one language (SL) to another language (TL).
It is an automated system.

It analyzes text in the Source Language (SL), processes it, and
produces “equivalent” text in the Target Language (TL).
It should work without human intervention.
MT systems are supposed to break the language barrier.

METHODS AND STRATEGIES
Direct Method
Transfer Method
Interlingual Method

DIRECT METHOD
The majority of MT systems of the 1950s and
1960s were based on this approach.
Designed in all details specifically for one
particular pair of languages.
Word by word matches of the SL and TL.
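As a rough illustration of word-by-word matching, the sketch below looks up each source word in a tiny bilingual dictionary and keeps the source word order. The lexicon entries and the function name `direct_translate` are invented for this example and are not taken from any of the systems discussed.

```python
# Toy direct (word-by-word) translation. The lexicon is a made-up example:
# "वह" is gender-neutral in Hindi, and mapping it to "she" is an arbitrary choice.
TOY_LEXICON = {
    "वह": "she",
    "जाती": "goes",
    "है": "is",
}


def direct_translate(sentence, lexicon=TOY_LEXICON):
    """Replace each source word with its dictionary match, keeping word order."""
    # Treat the Devanagari danda as a separator so it does not stick to a word.
    words = sentence.replace("।", " ").split()
    # Words missing from the lexicon are passed through unchanged.
    return " ".join(lexicon.get(word, word) for word in words)


print(direct_translate("वह जाती है।"))
```

The result keeps Hindi word order and renders the auxiliary literally, which shows why word-by-word matching alone is not enough for a fluent translation.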

TRANSFER METHOD
Works in two stages, built on underlying representations
of both the SL and TL texts.
The first stage converts SL texts into SL
‘transfer’ representations.
The second stage converts these into TL
‘transfer’ representations.

INTERLINGUAL METHOD
Converts SL texts into semantico-syntactic
representations common to more than one
language.
From such ‘interlingual’ representations,
texts are then generated in other
languages.

MT IN INDIA: WHY DO WE NEED IT?
India is a multilingual country where the spoken language is said to change every 50
miles.
22 official languages and approximately 2000 dialects are spoken.
State governments carry out their official work in their respective regional
languages.
Translating documents manually is very time-consuming and costly.

ENGLISH-HINDI MT SYSTEMS
MANTRA MT (1997)
Developed for information preservation. The text available in one Indian
language is made accessible in another Indian language with the help of
this system.
It uses an XTAG-based super tagger and a light dependency analyzer to
analyze the input English text. The system produces
several outputs for a given input.

MANTRA MT (1999)
It translates English text into Hindi in the specific domain of personnel
administration, which includes gazette notifications, office orders, office
memorandums and circulars.
It uses the Tree Adjoining Grammar (TAG) formalism to represent the
English and Hindi grammar.
It uses tree transfer for translating from English to Hindi.
The system was tested on the translation of administrative documents such
as appointment letters, notifications and circulars issued by the central
government, from English to Hindi.

English–Hindi Translation System
A system based on the transfer-based translation approach, which uses
grammatical rules of the source and target languages and a
bilingual dictionary for translation.
The translation module consists of pre-processing, an English tree
generator, post-processing of the English tree, generation of the Hindi tree,
post-processing of the Hindi tree, and output generation.
The domain of the system was weather narration.

EVALUATION OF ENGLISH-HINDI MT SYSTEMS
Low accuracy, fluency and acceptability of the output of any machine translation
system adversely affect the reliability and usage of that system. Evaluation
can ascertain how and in what ways the results of these systems are
lacking.
Evaluation is one of the most important parts of the development of MT systems,
and one cannot claim an MT system’s success without evaluation.
The need for evaluating an MT system is therefore always a high
priority.
Here, we evaluate the output for the Hindi-English language pair from
two MT systems: Bing and Google.

Google MT/Translator is based on statistical and machine learning
approaches that rely on parallel corpora. It runs for 73 language pairs.
Bing (Microsoft) MT is also based on statistical and machine learning
approaches that rely on parallel corpora. It also uses language-specific
rule-based components to decode and encode sentences from one language
to another.
This approach is described as “linguistically informed statistical machine translation”. Bing MT
runs for 44 language pairs.

EVALUATION STRATEGIES
Evaluation strategies fall into two main groups: (a) automatic
evaluation and (b) manual or human evaluation.
Automatic evaluation of an MT system is very difficult and is not as effective
as human evaluation. Several tested automatic measures are
frequently used, for example BLEU, mWER, mPER and NIST.
Human evaluation is considered time-consuming and costly, but
it is the best strategy for improving an MT system’s accuracy.
A sentence commonly has more than one valid translation.
In such cases a human translator-cum-evaluator can judge the output
correctly.
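To make the point about multiple valid translations concrete, here is a minimal sketch of a multi-reference word error rate, taking mWER to mean the word error rate against the closest of several reference translations. The function names and the second reference sentence are assumptions made for this example, and every reference is assumed to be non-empty.

```python
def edit_distance(hyp, ref):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(ref) + 1))
    for i, hyp_word in enumerate(hyp, start=1):
        curr = [i]
        for j, ref_word in enumerate(ref, start=1):
            cost = 0 if hyp_word == ref_word else 1
            curr.append(min(prev[j] + 1,          # skip a hypothesis word
                            curr[j - 1] + 1,      # skip a reference word
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def multi_reference_wer(hypothesis, references):
    """Lowest word error rate of the hypothesis against any reference."""
    hyp = hypothesis.lower().split()
    rates = []
    for ref_text in references:
        ref = ref_text.lower().split()
        rates.append(edit_distance(hyp, ref) / len(ref))
    return min(rates)


# "she goes" is the manual translation used in the slides;
# "she is going" is an invented alternative reference.
references = ["she goes", "she is going"]
print(multi_reference_wer("he is", references))
print(multi_reference_wer("she goes", references))
```

A score of 0 means the output matches one reference exactly. Such a metric still only rewards overlap with the listed references, which is why a human evaluator remains useful when other valid translations exist.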

CHALLENGES DURING EVALUATION
Sentences from the health and cuisine domains of the ILCI3 corpora are used
for evaluating the MT systems.
These sentences are entered in each of the systems in bulk and the output is
crawled, and discrepancies are marked.
In the resulting English output, several problems are noted particularly with
respect to gender agreement, structural mapping, Named Entity Recognition
(NER) and plural marker morphemes.

During the evaluation process the following kinds of
challenges are encountered.
1. Tokenization
2. Morph Issue
3. Structural/grammatical Differences
4. Errors with Gender agreement
5. Parser Issues

TOKENIZATION
 (i) With/Without Punctuation :
 (a) वह जाती है।
She goes by. (BO)
He is. (GO)
 (b) वह जाती है
He is (BO)
He is (GO)
Manual Translation: She goes.
 Examples (a) and (b) show how a sentence-final punctuation mark can significantly
affect translation; BO and GO denote the Bing and Google outputs. This variation in results is seen only in Bing. Google is consistent.
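A check like the one in examples (a) and (b) can be sketched as follows: build the with-mark and without-mark variants of each input and compare what a system returns for both. The `translate` argument is a hypothetical callable standing in for an MT system; the two stubs below simply replay the outputs quoted on the slide rather than calling any real service.

```python
SENTENCE_FINAL_MARKS = "।.?!"  # Devanagari danda plus common Latin marks


def punctuation_variants(sentence):
    """Return (with_mark, without_mark) versions of one input sentence."""
    stripped = sentence.strip()
    bare = stripped.rstrip(SENTENCE_FINAL_MARKS).rstrip()
    # Reuse the sentence's own final mark if present, else default to a danda.
    mark = stripped[len(bare):].strip() or "।"
    return bare + mark, bare


def is_consistent(translate, sentence):
    """True if the output is the same with and without the final mark."""
    with_mark, without_mark = punctuation_variants(sentence)
    # A trailing full stop in the English output is ignored in the comparison.
    return translate(with_mark).rstrip(".") == translate(without_mark).rstrip(".")


# Stubs that replay the slide's outputs (not real API calls).
bing_stub = {"वह जाती है।": "She goes by.", "वह जाती है": "He is"}.get
google_stub = {"वह जाती है।": "He is.", "वह जाती है": "He is"}.get

print(is_consistent(bing_stub, "वह जाती है।"))
print(is_consistent(google_stub, "वह जाती है।"))
```

Ignoring the trailing full stop in the output is a design choice of this sketch, so that only differences in wording count as inconsistency.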

TRANSLITERATION ISSUE:
 (b) एक नौन-स्टिक तवा गरम करें
A naun-stick frying pan and heat (BO)
A Non - stick frying pan and heat (GO)
 Manual Translation: Heat the non-stick fry pan

MORPH ISSUE
 (i) Unknown words:
 छुआरे डालकर मिलाएं और
एक मिनट पकाएँ
One minute into the match and put chuare (BO)
Mix and cook one minute, add Cuare (GO)
 Manual Translation: Put date palm, stir and cook for a minute.

 (ii) Error with Paradigm fixation:
 कॅन्सर 1000 से अधिक बीमारियों
 का एक समूह है
 Cancer is a group of more than 1000 berryman (BO)
 Cancer is a group of more than 1000 illnesses (GO)
 कॅन्सर 1000 से अधिक बीमारी
का एक समूह है
 Cancer is a group of more than 1,000 diseases (BO)
 Cancer is a group of more than 1000 illnesses (GO)
 Manual Translation: Cancer is a group of more than 1000 diseases.

STRUCTURAL/GRAMMATICAL DIFFERENCES
 वी. आइ. पी. क्या है?
 What is the VIP? (BO)
 VIP what is it? (GO)
 Manual Translation: What is the VIP?
ERRORS WITH GENDER AGREEMENT
 वह जाती है।
 She goes by. (BO)
 He is. (GO)
 Manual Translation: She goes.

PARSER ISSUES
आँख की मांसपेशियों की कमजोरी के कारण लेंस अपना आकार नहीं बदल पाता पढ़ते या नजदीकी काम
करते समय प्रकाश की किरणे रेटिना के पीछे पड़ती है यह 40 वर्ष और उससे ऊपर की उम्र में पाई जाती
है
Due to the weakness of the muscles of the eye lens cannot read or
change their size does proximity to work while the light rays have
it 40 years behind the retina and above in age (BO)
NO OUTPUT (GO)

A human evaluation strategy was adopted to evaluate the Bing
(Microsoft) and Google MT (Hindi-English) output.
Methodology of MT testing:
For testing the MT systems, 1,000 sentences were used. Their outputs were
then distributed among three human evaluators, who scored the MT
outputs for comprehensibility and fluency.

Instructions for Evaluators:
 Read the target language translated output first.
 Judge each sentence for its comprehensibility.
 Rate it on the scale 0 to 4.
 Read the original source sentence only to verify the faithfulness of the translation (only for
reference).
 Do not read the source language sentence first.
 If the rating needs revision, change it to the new rating.

Guidelines for evaluation (on a 5-point scale, 0-4):
 The following score is to be given to each output
sentence:
(A) For comprehensibility
4 = all meaning
3 = most meaning
2 = much meaning
1 = little meaning
0 = none

 (B) For fluency
4 = Flawless or perfect (like someone who knows the language)
3 = Good or comprehensible but has quite a few errors (like someone
speaking Hindi getting all its genders wrong)
2 = Non-native or comprehensible but has quite a few errors (like
someone who can speak your language but makes lots of errors;
however, you can make sense of what is being said)
1 = Disfluent, or some parts make sense but it is not comprehensible overall
(like listening to a language that has a lot of borrowed words from your
language: you understand those words but nothing more)
0 = Incomprehensible or nonsense (the sentence does not make any
sense at all, like someone speaking to you in a language you do not
know)

EVALUATION METHOD
If scoring is done for N sentences and each of the N sentences is given a score
as above, the two parameters are as follows:
(a) Comprehensibility = (Number of sentences with the score of 2, 3, or 4) / N
(b) Fluency = $\frac{1}{N} \sum_{i=1}^{N} S_i$

 where S_i is the score of the i-th sentence. For instance, if N = 10 and the scores obtained
for the 10 sentences are S1=3, S2=3, S3=2, S4=1, S5=4, S6=0, S7=0, S8=1, S9=0,
S10=0, this gives the following histogram:
 Number of sentences with score 4 = 1
 Number of sentences with score 3 = 2
 Number of sentences with score 2 = 1
 Number of sentences with score 1 = 2
 Number of sentences with score 0 = 4
 Weighted sum = (4×1) + (3×2) + (2×1) + (1×2) + (0×4) = 14, which gives:
 Comprehensibility = 4/10 = 40% (because 4 out of 10 sentences have a score of 2, 3, or 4)
 Fluency = 14/10 = 1.4 (on a scale of 0-4),
 i.e. 1.4/4 = 35% (on a scale with a maximum of 100)
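The two measures can be reproduced with a few lines of Python. This is a minimal sketch of the definitions given above, using the ten example scores; the function names are chosen for this example only.

```python
from collections import Counter


def comprehensibility(scores):
    """Fraction of sentences rated 2, 3, or 4."""
    return sum(1 for s in scores if s >= 2) / len(scores)


def fluency(scores):
    """Mean score on the 0-4 scale."""
    return sum(scores) / len(scores)


scores = [3, 3, 2, 1, 4, 0, 0, 1, 0, 0]  # S1..S10 from the worked example

# Histogram of scores, highest score first.
print(sorted(Counter(scores).items(), reverse=True))
print(f"Comprehensibility: {comprehensibility(scores):.0%}")
print(f"Fluency: {fluency(scores):.1f} on the 0-4 scale, "
      f"{fluency(scores) / 4:.0%} of the maximum")
```

Dividing the mean score by 4 rescales fluency to a percentage of the maximum possible score.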

Table 1: Score Table to Compute Comprehensibility
Table 2: Score Table to Compute Fluency

We have thus evaluated the Bing and Google MT systems. When
we examined these systems, we found many
errors. Fluency was
found to be very low, but the output was mostly comprehensible. In
comparison, Google was found to be better than Bing MT in
comprehensibility.

SUGGESTIONS
When giving the input sentences, tokenize them and avoid using the full stop
marker in sentence-final position.
Both MT systems should improve their morph dictionaries using corpus data
and formulate linguistic rules for paradigm fixation (how to analyze inflectional
and derivational categories). If the MT systems are trained on a larger number
of words and sentences, the parsing issues might be resolved.
With these steps, the systems should improve and their errors should decrease to some
extent, raising the fluency as well as the comprehensibility of the Bing and Google MT
systems.

