This document summarizes research on selecting the best machine translation (MT) system for a given input sentence without knowledge of the internal workings of each system. It presents a methodology that extracts features from the input sentence (phrase-structure features, probabilistic features, and dependency-based features) and uses them to train a machine learning classifier to predict which system will produce the highest-quality translation. Experiments were conducted on English-to-Bangla translation systems using different datasets and classifiers. The results showed that the IB1 classifier achieved the best performance when using the proposed features for automatically selecting the most appropriate translation system.
How to Know Best Machine Translation System in Advance before Translating a Sentence?
1. How to Know Best Machine Translation System in Advance before Translating a Sentence?
Bibekananda Kundu and Sanjay Kumar Choudhury
Centre for Development of Advanced Computing
December 19, 2014
{bibekananda.kundu, sanjay.choudhury}@cdac.in
2. 2/19
Contents
* Research Problem
* Methodology and Contributions
* Feature Set for Selecting Best MT System
* Experiments
* Results and Discussion
* Conclusions
3. 3/19
* Research Problem
Was my camera repaired already?
Candidate Bangla translations, of varying quality, produced by MT1, MT2, MT3, ...:
. আমার ক ােমরা িক ইিতমে মরামত করা হে িছল ?
. আমার ক ােমরা িক ইিতমে মরামত করা িছেলা ?
. িক আমার ক ােমরা ইতঃ েব মরামত করা হেয়িছল ?
. আমার ক ােমরা ইিতমে মরামত করা ?
. আমার ক ােমরা ইিতমে িছল মরামত ?
How can we identify, from a set of MT systems and in advance, the system capable of producing the most appropriate translation for a source sentence, without any knowledge of the internal workings of these MT systems?
8. 8/19
* Feature Set for Selecting Best MT System
* Phrase-structure features: represent the structural
complexity of a sentence.
* Dependency-based features: represent how words in
a sentence depend on each other, even over long distances.
* Probabilistic features: represent complexity in
terms of out-of-vocabulary (OOV) words, the likelihood of a
sentence, the likelihood of a dependency relation, and the
tendency of a source word to map to multiple target words or
vice versa.
9. 9/19
* Feature Set for Selecting Best MT System
* Phrase-structure features :
. Number of Unique POS Tags (NUPT)
. POS Tag Density (PTD)
. Maximum and Mean Depth
. Number of Internal Nodes
. Maximum and Mean Number of Child Nodes for each Node
Figure : Phrase-structure parse of the example sentence:
(S1 (SQ (AUX Was) (NP (PRP$ my) (NN camera)) (VP (VBN repaired) (ADVP (RB already))) (. ?)))
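As an illustration, the phrase-structure features listed above can be computed from a bracketed parse such as the one on this slide. The tiny parser and the exact feature definitions below are our own sketch (e.g. depth is measured at the leaf words), not the authors' implementation:

```python
# Sketch: computing phrase-structure features from a bracketed parse.
# Feature names follow the slide; the precise computations are assumptions.

def parse(s):
    """Parse a bracketed tree string into (label, children) tuples."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    def helper(i):
        assert tokens[i] == "("
        label = tokens[i + 1]
        i += 2
        children = []
        while tokens[i] != ")":
            if tokens[i] == "(":
                child, i = helper(i)
                children.append(child)
            else:                          # a leaf word
                children.append((tokens[i], []))
                i += 1
        return (label, children), i + 1
    tree, _ = helper(0)
    return tree

def features(tree):
    pos_tags, depths, child_counts = [], [], []
    internal, words = 0, 0
    def walk(node, depth):
        nonlocal internal, words
        label, children = node
        if not children:                   # leaf: a word
            words += 1
            depths.append(depth)
            return
        if len(children) == 1 and not children[0][1]:
            pos_tags.append(label)         # preterminal = POS tag
        else:
            internal += 1                  # true internal node
        child_counts.append(len(children))
        for c in children:
            walk(c, depth + 1)
    walk(tree, 0)
    return {
        "NUPT": len(set(pos_tags)),                      # unique POS tags
        "PTD": len(pos_tags) / max(words, 1),            # tags per word
        "max_depth": max(depths),
        "mean_depth": sum(depths) / len(depths),
        "internal_nodes": internal,
        "max_children": max(child_counts),
        "mean_children": sum(child_counts) / len(child_counts),
    }

tree = parse("(S1 (SQ (AUX Was) (NP (PRP$ my) (NN camera)) "
             "(VP (VBN repaired) (ADVP (RB already))) (. ?)))")
print(features(tree))   # NUPT=6: {AUX, PRP$, NN, VBN, RB, .}
```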
10. 10/19
* Feature Set for Selecting Best MT System
* Probabilistic features :
.
Joint Probability of Input Sentence (JPIS): We
have approximated JPIS using trigram sequences.
P(S = w1 w2 w3 · · · wn) = P(w1) × P(w2|w1) × P(w3|w1 w2) × · · · × P(wn|wn−2 wn−1)
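A minimal sketch of how JPIS could be estimated is a trigram model over a corpus. The toy corpus and the add-alpha smoothing below are illustrative assumptions; in the experiments a language model built with the Moses toolkit would supply the real probabilities:

```python
# Sketch: approximating the joint probability of a sentence (JPIS)
# with a smoothed trigram model. Corpus and smoothing are toy choices.
import math
from collections import defaultdict

def train_trigram(corpus):
    """Collect trigram counts and their bigram-context counts, with <s> padding."""
    tri, bi = defaultdict(int), defaultdict(int)
    for sent in corpus:
        toks = ["<s>", "<s>"] + sent + ["</s>"]
        for i in range(2, len(toks)):
            tri[tuple(toks[i - 2:i + 1])] += 1
            bi[tuple(toks[i - 2:i])] += 1
    return tri, bi

def log_jpis(sent, tri, bi, vocab_size, alpha=1.0):
    """log P(w1 .. wn): sum of add-alpha smoothed trigram log-probabilities."""
    toks = ["<s>", "<s>"] + sent + ["</s>"]
    logp = 0.0
    for i in range(2, len(toks)):
        num = tri[tuple(toks[i - 2:i + 1])] + alpha
        den = bi[tuple(toks[i - 2:i])] + alpha * vocab_size
        logp += math.log(num / den)
    return logp

corpus = [["was", "my", "camera", "repaired", "already", "?"],
          ["my", "camera", "was", "repaired", "?"]]
tri, bi = train_trigram(corpus)
vocab = {w for s in corpus for w in s} | {"</s>"}
# A fluent (seen) word order scores higher than a scrambled one.
print(log_jpis(["my", "camera", "was", "repaired", "?"], tri, bi, len(vocab)))
print(log_jpis(["repaired", "?", "my", "was", "camera"], tri, bi, len(vocab)))
```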
11. 11/19
* Feature Set for Selecting Best MT System
* Probabilistic features :
Joint Probability Using N-gram Dependency (JPUND):
a dependency-based language model is described in
(Shen et al. 2008). JPUND for the dependency tree below is
calculated as:
JPUND = PT(repaired)
× PL(camera | repaired_head)
× PL(my | camera_head)
× PL(was | my, camera_head)
× PR(already | repaired_head)
× PR(? | already, repaired_head)
Figure : A dependency tree of "Was my camera repaired already ?", rooted at "repaired", with relations nsubj, poss, cop, advmod and punct.
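The JPUND factorisation above can be evaluated directly once the root probability PT and the left/right attachment probabilities PL, PR are available. The probability tables below are invented toy values purely for illustration; only the factorisation itself comes from the slide:

```python
# Sketch: evaluating the slide's JPUND expansion (Shen et al. 2008 style)
# with made-up probability tables. "X-head" marks the conditioning head word.
import math

PT = {"repaired": 0.2}                          # P(word is the root)
PL = {                                          # left-attachment factors
    ("camera", ("repaired-head",)): 0.3,
    ("my", ("camera-head",)): 0.4,
    ("was", ("my", "camera-head")): 0.25,
}
PR = {                                          # right-attachment factors
    ("already", ("repaired-head",)): 0.35,
    ("?", ("already", "repaired-head")): 0.5,
}

def log_jpund():
    """Sum the log of each factor in the slide's JPUND expansion."""
    factors = [
        PT["repaired"],
        PL[("camera", ("repaired-head",))],
        PL[("my", ("camera-head",))],
        PL[("was", ("my", "camera-head"))],
        PR[("already", ("repaired-head",))],
        PR[("?", ("already", "repaired-head"))],
    ]
    return sum(math.log(p) for p in factors)

print(math.exp(log_jpund()))   # product of the six toy factors
```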
12. 12/19
* Feature Set for Selecting Best MT System
* Dependency based features :
. Number of Dependency Links (NDL)
. Maximum Dependency Distance (MDD)
. Maximum amongst the Number of Dependents of a Word (MNDW)
Figure : The same dependency tree of "Was my camera repaired already ?" (repeated from the previous slide).
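A sketch of the three dependency-based features, computed from the tree's arcs represented as (head, dependent, relation) triples with 1-based token positions. The exact definitions (e.g. measuring dependency distance as the difference in token positions, and excluding the ROOT arc) are our assumptions:

```python
# Sketch: dependency-based features NDL, MDD and MNDW for the example
# sentence "Was my camera repaired already ?" (head index 0 = ROOT).
from collections import Counter

arcs = [
    (0, 4, "root"),      # ROOT -> repaired
    (4, 3, "nsubj"),     # repaired -> camera
    (3, 2, "poss"),      # camera  -> my
    (4, 1, "cop"),       # repaired -> Was
    (4, 5, "advmod"),    # repaired -> already
    (4, 6, "punct"),     # repaired -> ?
]

def dependency_features(arcs):
    links = [(h, d) for h, d, _ in arcs if h != 0]        # drop the ROOT arc
    return {
        "NDL": len(links),                                 # dependency links
        "MDD": max(abs(h - d) for h, d in links),          # longest distance
        "MNDW": max(Counter(h for h, _ in links).values()) # busiest head
    }

print(dependency_features(arcs))
```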
13. 13/19
* Experiments
* Questions to answer:
. Can features extracted from source sentences predict the quality of an MT system?
. Which machine learning algorithm is most appropriate for this classification task?
. How does the selection of different types of features influence the performance of the classifiers?
14. 14/19
* Experiments
* English-Bangla MT Systems
. AnglaMT: http://tdil-dc.in
. GoogleMT: https://translate.google.co.in
* Data Preparation
. 20K Basic Travel Expression Corpus (BTEC)
. 50K ILCI corpus: http://www.tdil-dc.in/
* Tools used in these experiments
. WEKA: http://www.cs.waikato.ac.nz/ml/weka
. Charniak parser: http://cs.brown.edu/~ec/
. Malt parser: http://www.maltparser.org/
. Moses toolkit: http://www.statmt.org/moses/
16. 16/19
* Conclusions
. A machine learning approach for selecting the MT system
that will produce the most appropriate translation, before
translating the input sentence.
. Our approach uses phrase-structure, probabilistic and
dependency features.
. The features used in this paper can also be applied to
similar NLP tasks where measuring the confidence of a
system is required.
. Experiments show that the IB1 classifier provides the best
performance when compared to other classifiers.
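IB1 (Aha et al., 1991) is essentially a 1-nearest-neighbour classifier. The pure-Python sketch below, over invented feature vectors and labels, illustrates how the extracted features could drive the final system choice; the feature values are not from the paper:

```python
# Sketch: IB1-style 1-nearest-neighbour selection of an MT system.
# Training instances pair toy feature vectors (e.g. NUPT, PTD, MDD)
# with the system that produced the best translation for that sentence.
import math

def ib1_predict(train, query):
    """Return the label of the training instance closest to `query`."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    _, label = min(train, key=lambda t: dist(t[0], query))
    return label

train = [
    ((6, 1.0, 3), "MT1"),
    ((9, 1.2, 6), "MT2"),
    ((12, 1.5, 9), "MT3"),
]
print(ib1_predict(train, (7, 1.1, 4)))   # nearest to the first instance: MT1
```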