Corpus Effects on the Evaluation of Automated Transliteration Systems

Corpus Effects
on the Evaluation
of Automated
Transliteration
Systems
Sarvnaz Karimi
Andrew Turpin
Falk Scholer
Introduction
Corpus
Experiments
Conclusion
Corpus Effects on the Evaluation of
Automated Transliteration Systems
Sarvnaz Karimi
Andrew Turpin
Falk Scholer
School of Computer Science and Information Technology
RMIT University, Melbourne, Australia
26 June 2007

Transliteration
Machine Transliteration:
Automatically transforming a word written in a source
language into a word in a target language.
Example:
Prague (source) to پراگ (target)
Evaluation:
Machine-generated words are compared with human-generated
ones. Human judgment is the gold standard.

Transliteration is Subjective
How do we define a correct transliteration?
Prague: which of several plausible Persian spellings is correct?
[Figure: Source Word → Automatic Transliterator → Target Word, compared against Source Word → Human Transliterator → Target Word (the STANDARD). Words shown: Praha, Prague, Prago, Prag, Praag.]

Evaluating Algorithms or Corpus?
When evaluating a transliteration algorithm, can
a testing corpus mislead us in our judgments?
[Figure: "Algorithm + Corpus" versus "Algorithm + Corpus Specification"]

Experimental Scheme
◮ Transliteration systems: Two grapheme-based
algorithms previously examined for the English-Persian
language pair. We refer to them as system A and
system B.
◮ Corpus: We constructed a controlled corpus
(language origin, number of transliterators,
transliterators' language knowledge).
◮ Evaluation measures: Word accuracy and its
variants, human agreement, and entropy of transliteration
rules (transliterator's consistency).

Controlled corpus
We built a corpus with the following specifications:
◮ Three datasets (English, Arabic and Dutch)
containing 500 word-pairs each.
◮ Seven transliterators (Persian native speakers).
◮ All of the transliterators knew English and Arabic and
had no knowledge of Dutch.
◮ All of the transliterators had at least a Bachelor's
degree.
◮ The origin of the words was not given to them.

Word Accuracy
\[ \mathrm{WA} = \frac{\text{number of correct transliterations}}{\text{total number of test words}} \]
If more than one judgment is available we can define:
1. Uniform Word Accuracy (UWA):
All the variants suggested by the transliterators are equally valid.
2. Weighted Word Accuracy (WWA):
Each transliteration is assigned a weight based on the number
of people who suggested that variant.
3. Majority Word Accuracy (MWA):
Only the transliteration suggested by the majority of the
transliterators is accepted as the correct one.
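The three variants can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `word_accuracies`, the toy Latin-script strings, the choice of weight (the fraction of transliterators who suggested the system's output), and the first-seen tie-break for the majority variant are all assumptions made for the example.

```python
from collections import Counter

def word_accuracies(system_output, judgments):
    """Return (UWA, WWA, MWA) for one system.

    system_output: dict mapping a source word to the system's top transliteration.
    judgments: dict mapping a source word to the list of human transliterations,
               one entry per transliterator.
    """
    n = len(judgments)
    uwa = wwa = mwa = 0.0
    for word, variants in judgments.items():
        counts = Counter(variants)
        guess = system_output.get(word)
        if guess in counts:
            # UWA: any human variant counts as fully correct.
            uwa += 1
            # WWA (assumed weighting): share of transliterators who gave this variant.
            wwa += counts[guess] / len(variants)
        # MWA: only the most frequent variant counts (assumed tie-break: first seen).
        majority = counts.most_common(1)[0][0]
        if guess == majority:
            mwa += 1
    return uwa / n, wwa / n, mwa / n

# Hypothetical toy data; the strings stand in for target-language spellings.
judgments = {
    "prague": ["prag", "prag", "praag", "prag"],
    "delft": ["delft", "delf", "delft", "delf"],
}
system_output = {"prague": "praag", "delft": "delft"}
uwa, wwa, mwa = word_accuracies(system_output, judgments)
```

On this toy input the same system output scores differently under each measure, which is why the measure in use has to be stated alongside any accuracy figure.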

Language Origin and the Ranking of the Systems
Corpora with different language origins:
[Figure: word accuracy (%) on corpora E7, D7, A7 and EDA7; series: UWA (SYS-B), UWA (SYS-A), MWA (SYS-B), MWA (SYS-A)]
Randomly selected EDA sub-corpora:
[Figure: word accuracy (%) against corpus (0 to 100); series: UWA (SYS-B), UWA (SYS-A), MWA (SYS-B), MWA (SYS-A)]
The ranking of the systems remains constant, but the accuracy values do not.
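A check of this kind can be sketched as follows. It is only an illustration under stated assumptions: the function names, the sample size, the number of trials, the system names `sys_x` and `sys_y`, and the toy data are all hypothetical, and accuracy is computed in the uniform sense (any accepted variant counts as correct).

```python
import random

def accuracy(system_output, gold, words):
    """Uniform word accuracy of one system on the given words."""
    return sum(system_output[w] in gold[w] for w in words) / len(words)

def ranking_on_subcorpora(systems, gold, sample_size, trials, seed=0):
    """Rank the systems on several randomly drawn sub-corpora."""
    rng = random.Random(seed)
    words = sorted(gold)
    rankings = []
    for _ in range(trials):
        sample = rng.sample(words, sample_size)
        scores = {name: accuracy(out, gold, sample) for name, out in systems.items()}
        # Best system first; the tuple records the ranking for this sub-corpus.
        rankings.append(tuple(sorted(scores, key=scores.get, reverse=True)))
    return rankings

# Hypothetical toy corpus: sys_x is always right, sys_y is always wrong.
gold = {f"w{i}": {f"t{i}"} for i in range(6)}
systems = {
    "sys_x": {f"w{i}": f"t{i}" for i in range(6)},
    "sys_y": {f"w{i}": "?" for i in range(6)},
}
rankings = ranking_on_subcorpora(systems, gold, sample_size=3, trials=5)
```

If every tuple in `rankings` is identical, the ranking is stable across the sampled sub-corpora even when the individual accuracy values differ.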

Accuracy and Single Transliterators
System A:
[Figure: word accuracy (%) for transliterators T1 to T7 on corpora E7, D7, A7 and EDA7; annotated values: 17.2 and 39.0]
System B:
[Figure: word accuracy (%) for transliterators T1 to T7 on corpora E7, D7, A7 and EDA7; annotated values: 23.2 and 56.2]
Evaluation can be heavily biased by the judgments of a single transliterator.

Accuracy and Number of Transliterators
Transliteration using a combination of transliterators
(EDA corpus)
A corpus for training and testing a transliteration
system should be built using more than one transliterator.

Human Agreement
How far do humans themselves agree on
transliteration?
Raw agreement adapted to calculate human agreement:
\[ \mathrm{PA} = \frac{\text{total number of actual agreements}}{\text{total number of possible agreements}} \]
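One way to read this ratio is as pairwise agreement: for each source word, count the pairs of transliterators who produced the same output, and divide by the number of pairs. The sketch below follows that reading, which is an assumption; the function name `raw_agreement` and the toy data are hypothetical.

```python
from itertools import combinations

def raw_agreement(judgments):
    """Pairwise raw agreement over all source words.

    judgments: dict mapping a source word to the list of human
               transliterations, one entry per transliterator.
    """
    actual = possible = 0
    for variants in judgments.values():
        for a, b in combinations(variants, 2):
            possible += 1        # every pair of transliterators could agree
            actual += (a == b)   # count the pairs that actually do
    return actual / possible if possible else 0.0

# Hypothetical toy data with three transliterators per word.
judgments = {"w1": ["x", "x", "y"], "w2": ["z", "z", "z"]}
pa = raw_agreement(judgments)
```

Here one of three pairs agrees on `w1` and all three pairs agree on `w2`, so four of the six possible agreements are realized.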

Inter-Transliterator Agreement and Perceived Difficulty
Transliterator's perception of the task
(H: hard, M: medium, E: easy)
Transliterator  English  Dutch  Arabic
1               H        H      M
2               M        M      E
3               M        H      M
4               M        M      E
5               M        H      E
6               M        H      E
7               M        H      M
Measured Agreement:
English: 33.6%
Dutch: 15.5%
Arabic: 33.3%
There is a direct relation between the transliterators' knowledge of
the language from which the words come and their
agreement.

Transliterator Consistency
A transliterator's transliteration habits define the rules
used to transform words.
Rules (source character → target character, probability):
C → (…, 0.6)
C → (…, 0.3)
C → (…, 0.1)
[Figure: entropy (0.0 to 0.6) for transliterators T1 to T7 on corpora E7, D7, A7 and EDA7]
[Figure: word accuracy (%) for transliterators T1 to T7 on corpora E7, D7, A7 and EDA7]
The consistency with which transliterators employ their own
rules has a direct effect on the system’s accuracy.
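As an illustration of how consistency can be quantified, the sketch below computes the Shannon entropy of one rule distribution, using the three probabilities from the example rules for C. It assumes entropy in bits with no normalization; the measure plotted on the slide may be defined or scaled differently, and the function name `rule_entropy` is hypothetical.

```python
import math

def rule_entropy(probabilities):
    """Shannon entropy (in bits) of one source character's rule distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# Probabilities of the three target choices for the source character C.
h = rule_entropy([0.6, 0.3, 0.1])

# A perfectly consistent transliterator always applies the same rule.
h_consistent = rule_entropy([1.0])
```

Lower entropy means the transliterator applies the same rule more often, so `h_consistent` is zero while the three-way split above gives roughly 1.3 bits.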

Conclusions
Main findings of our experiments:
1. Although different transliteration systems may have
different accuracy levels on different corpora, their
ranking holds across these corpora.
2. One transliteration system can achieve different
accuracy on corpora constructed by different
transliterators. The variation can be up to 30% in
terms of word accuracy.
3. The origin of the source words has a direct effect on
system performance. English-origin words are
generally transliterated more accurately than Arabic-
and Dutch-origin words.

Suggestions
◮ When building a collection for transliteration,
we should construct it with the assistance
of multiple transliterators (at least four), or make sure that
the transliterations come from different sources that are
more likely to reflect different people's knowledge.
◮ When we report our results, we should report:
1. The origin of the source words.
2. The number of transliterators who constructed the corpus,
or the exact process of corpus construction.
Corpus specifications are as important as
algorithms, and must be stated clearly in our
experiments.

Thank You!
