Internal Evaluation of an MT System, German to English

This presentation analyses the translation quality of an MT system for German-to-English translation using automatic evaluation metrics.

I report the results of my experiments with phrase-based statistical machine translation using Moses. Nine setups were considered, including the baseline and a tuning step. The best setup takes advantage of tuning and shows an improvement in translation quality in terms of BLEU.

The final experiments briefly explore a factored MT system, using POS tags as an additional factor in the target language (English).
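Since BLEU is the headline metric throughout the evaluation, a small sketch may help fix the idea. This is not the scorer used in the experiments (those use corpus-level counts as in Papineni et al. 2002); it is a simplified, smoothed sentence-level version, and the function name `bleu` is my own.

```python
import math
from collections import Counter

def bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU with a brevity penalty.

    Geometric mean of smoothed n-gram precisions (n = 1..max_n),
    multiplied by a brevity penalty for short candidates.
    """
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
        ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        # Clipped counts: each candidate n-gram matches at most as often
        # as it appears in the reference.
        overlap = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
        total = max(sum(cand_ngrams.values()), 1)
        # Add-one smoothing so one empty n-gram order does not zero the score.
        log_precisions.append(math.log((overlap + 1) / (total + 1)))
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(log_precisions) / max_n)
```

A perfect match scores 1.0; truncated or divergent outputs are penalized both by precision and by the brevity penalty.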

Published in: Technology

1. Internal Evaluation, German to English translation. Nervo Verdezoto D. [email_address] , [email_address]
2. Outline <ul><li>The Problem </li></ul><ul><li>Objectives </li></ul><ul><li>Literature Review </li></ul><ul><li>Baseline </li></ul><ul><li>Experiments </li></ul><ul><li>Results </li></ul><ul><li>Summary </li></ul>
3. The Problem <ul><li>Word order differences </li></ul><ul><ul><li>English is an SVO language while German is an SOV language. </li></ul></ul><ul><ul><li>Long-distance distortions are observed frequently. </li></ul></ul>
4. Outline <ul><li>The Problem </li></ul><ul><li>Objectives </li></ul><ul><li>Literature Review </li></ul><ul><li>Software & Hardware </li></ul><ul><li>Baseline </li></ul><ul><li>Experiments </li></ul><ul><li>Results </li></ul><ul><li>Summary </li></ul>
5. Objectives <ul><li>Improve the baseline </li></ul><ul><li>Get familiar with the Moses toolkit </li></ul><ul><li>Get familiar with scientific reporting </li></ul>
6. Outline <ul><li>The Problem </li></ul><ul><li>Objectives </li></ul><ul><li>Literature Review </li></ul><ul><li>Software & Hardware </li></ul><ul><li>Baseline </li></ul><ul><li>Experiments </li></ul><ul><li>Results </li></ul><ul><li>Summary </li></ul>
7. Literature Review <ul><li>Pre-processing. </li></ul><ul><ul><li>Alternative word reordering and alignment mechanisms as optional phrase table modifications, using a POS-based reordering model [1] [2] [4] [7]. </li></ul></ul><ul><ul><li>Compound splitting, compound merging and part-of-speech/morphological sequence models [3] [4] [6]. </li></ul></ul><ul><ul><li>Augmenting the training corpus with an extracted dictionary [3] [5]. </li></ul></ul><ul><ul><li>Factored phrase-based statistical machine translation systems, with different levels of linguistic information (e.g. lemmas, POS, etc.) [5] [8]. </li></ul></ul><ul><li>Post-processing. </li></ul><ul><ul><li>Out-of-vocabulary words (e.g. proper names which do not get capitalized properly, or numbers with different formats in German and English) [3]. </li></ul></ul><ul><li>Others </li></ul>
8. Road Map <ul><li>The Problem </li></ul><ul><li>Objectives </li></ul><ul><li>Literature Review </li></ul><ul><li>Software & Hardware </li></ul><ul><li>Baseline </li></ul><ul><li>Experiments </li></ul><ul><li>Results </li></ul><ul><li>Summary </li></ul>
9. Software & Hardware <ul><li>Software </li></ul><ul><ul><li>Moses, an open source MT toolkit. </li></ul></ul><ul><ul><li>The IRSTLM language modeling toolkit. </li></ul></ul><ul><ul><li>GIZA++, to perform word alignments. </li></ul></ul><ul><ul><li>C&C Language Processing Tools, for the CCG supertagger. </li></ul></ul><ul><ul><li>The word_error.pl script (provided by CMU). </li></ul></ul><ul><li>Hardware </li></ul><ul><ul><li>Intel(R) Core(TM) 2 CPU T7200 @ 2.00 GHz with 3 GB of RAM and 32-bit Linux (Ubuntu). </li></ul></ul><ul><li>see MOSES: http://www.statmt.org/moses/ </li></ul><ul><li>see SUPERTAGGER: http://svn.ask.it.usyd.edu.au/trac/candc/wiki </li></ul><ul><li>see WER (CMU): http://www.cs.cmu.edu/~roni/11761-s07/assignments/assignment6/word_error.pl </li></ul>
10. Outline <ul><li>The Problem </li></ul><ul><li>Objectives </li></ul><ul><li>Software & Hardware </li></ul><ul><li>Baseline </li></ul><ul><li>Experiments </li></ul><ul><li>Results </li></ul><ul><li>Summary </li></ul>
11. Baseline. Table 1. Statistics of the dataset (German <-> English): Training: 78,524 sentences (German: 1,581,042 words; English: 1,684,639 words); Dev: 2,000 sentences (German: 55,118 words; English: 58,761 words); Test: 2,000 sentences (German: 55,580 words; English: 59,153 words). Figure 1. Sample of the training corpus: de: wiederaufnahme der sitzungsperiode / en: resumption of the session; de: ich erkläre die am freitag , dem 17. dezember unterbrochene sitzungsperiode des europäischen parlaments für wiederaufgenommen , wünsche ihnen nochmals alles gute zum jahreswechsel und hoffe , daß sie schöne ferien hatten . / en: i declare resumed the session of the european parliament adjourned on friday 17 december 1999 , and i would like once again to wish you a happy new year in the hope that you enjoyed a pleasant festive period .
12. Baseline - Results <ul><li>Evaluation measures </li></ul><ul><ul><li>BLEU </li></ul></ul><ul><ul><li>NIST </li></ul></ul><ul><ul><li>WER </li></ul></ul><ul><ul><li>PER </li></ul></ul><ul><li>Tuning: MERT (1000 sentences) </li></ul>Table 2. Performance of initial models. Measure | Model 0 (Baseline) | Model 1 (Tuning): BLEU 23.24% | 23.62%; NIST 6.5426 | 6.4539; WER 69.09% | 70.90%; PER 18.82% | 16.43%
13. Outline <ul><li>The Problem </li></ul><ul><li>Objectives </li></ul><ul><li>Baseline </li></ul><ul><li>Software & Hardware </li></ul><ul><li>Experiments </li></ul><ul><li>Results </li></ul><ul><li>Summary </li></ul>
14. Experiments. Setup descriptions: Setups 1 and 2: filter sentences by maximum length (baseline: 40; setup 1: 45; setup 2: 35). Setups 3, 4 and 5: combination of the baseline, setup 1 and setup 2 with a lexicalized reordering model (reordering configuration msd-bidirectional-fe and distortion limit 6); setup 3: filter 40, setup 4: filter 45, setup 5: filter 35. Setup 6: splitting the source data was attempted but did not work. Setups 7 and 8: adding part-of-speech information using a factored translation model on the target data (English); LM: setup 7 (3-gram), setup 8 (5-gram). Setup 9: using Moses with a factored translation model on the source side (German) did not work; training the supertagger on a German corpus (the TIGER corpus) failed because of a problem with the format of the files, see http://www.ims.uni-stuttgart.de/projekte/TIGER/TIGERCorpus/annotation
15. Factored Translation Model <ul><li>Birch, A., Osborne, M., Koehn, P. CCG Supertags in Factored Statistical Machine Translation. Proceedings of the Second Workshop on Statistical Machine Translation, pages 9–16, Prague, June 2007. Association for Computational Linguistics. </li></ul>
16. Factored translation model <ul><li>Preprocessing </li></ul>nervo@nervo-laptop:~/supertagger/candc-1.00$ bin/pos --model models/pos --input data/baseline.de-en.tok.clean.en --output data/baseline.de-en.tok.clean.postag.en nervo@nervo-laptop:~/supertagger/candc-1.00$ bin/super --model models/super --input data/baseline.de-en.tok.clean.postag.lowercased.en --output data/baseline.de-en.tok.clean.postag.lowercased.supertag.en nervo@nervo-laptop:~/MOSESMT/baseline-system/baseline-system$ cat trainingcorpus/baseline.de-en.tok.clean.postag.en | perl ../../moses-scripts/lowercase.perl > trainingcorpus/baseline.de-en.tok.clean.postag.lowercased.en Figure 3. Sample of pre-processing (supertagger). resumption|resumption|nn of|of|in the|the|dt session|session|nn i|i|prp declare|declare|vbp resumed|resumed|vbn the|the|dt session|session|nn of|of|in the|the|dt european|european|nnp parliament|parliament|nnp adjourned|adjourned|vbd on|on|in friday|friday|nnp 17|17|cd december|december|nnp 1999|1999|cd ,|,|, and|and|cc i|i|prp would|would|md like|like|vb once|once|rb again|again|rb to|to|to wish|wish|vb you|you|prp a|a|dt happy|happy|jj new|new|jj year|year|nn in|in|in the|the|dt hope|hope|nn that|that|in you|you|prp enjoyed|enjoyed|vbd a|a|dt pleasant|pleasant|jj festive|festive|jj period|period|nn .|.|. Figure 4. Sample of training data with supertags. Original corpus: resumption of the session. POS-tagged corpus: resumption|resumption|nn of|of|in the|the|dt session|session|nn
17. Training <ul><li>../moses/scripts/training/train-factored-phrase-model.perl -bin-dir /home/nervo/MOSESMT/bin/ -scripts-root-dir /home/nervo/MOSESMT/moses/scripts -root-dir . -corpus trainingcorpus/baseline.de-en.tok.clean.lowercased -f de -e en -first-step 1 -last-step 9 -max-phrase-length 5 -reordering msd-bidirectional-fe -lm 0:3:/home/nervo/MOSESMT/SETUP7/lm/baseline.irstlm.bin:0 -lm 2:3:/home/nervo/MOSESMT/SETUP7/lm/baseline.pos.irstlm.bin:0 >& logsetup7.train </li></ul><ul><li>../moses/scripts/training/train-factored-phrase-model.perl -bin-dir /home/nervo/MOSESMT/bin/ -scripts-root-dir /home/nervo/MOSESMT/moses/scripts -root-dir . -corpus trainingcorpus/baseline.de-en.tok.clean.lowercased -f de -e en -first-step 1 -last-step 9 -reordering msd-bidirectional-fe -lm 0:5:/home/nervo/MOSESMT/SETUP8/lm/baseline.irstlm.bin:0 -lm 2:5:/home/nervo/MOSESMT/SETUP8/lm/baseline.pos.irstlm.bin:0 >& logsetup8.train </li></ul>
18. Changes in moses.ini <ul><li># translation tables: source-factors, target-factors, number of scores, file </li></ul><ul><li>[ttable-file] </li></ul><ul><li>0 0,1,2 5 /home/nervo/MOSESMT/SETUP8/model/phrase-table.gz </li></ul><ul><li># language models: type(srilm/irstlm), factors, order, file </li></ul><ul><li>[lmodel-file] </li></ul><ul><li>1 0 5 /home/nervo/MOSESMT/SETUP8/lm/baseline.irstlm.bin </li></ul><ul><li>1 2 5 /home/nervo/MOSESMT/SETUP8/lm/baseline.pos.irstlm.bin </li></ul>
19. SETUP 7
20. Outline <ul><li>The Problem </li></ul><ul><li>Objectives </li></ul><ul><li>Baseline </li></ul><ul><li>Software & Hardware </li></ul><ul><li>Experiments </li></ul><ul><li>Results </li></ul><ul><li>Summary </li></ul>
21. Experimental results. Measure | Baseline (40) | Setup 1 (45) | Setup 2 (35): BLEU 23.24% | 23.20% | 22.88%; NIST 6.5426 | 6.5490 | 6.4742; WER 69.09% | 69.08% | 69.58%; PER 18.82% | 18.80% | 18.84%. Measure | Setup 3 (40) | Setup 4 (45) | Setup 5 (35): BLEU 23.03% | 23.06% | 22.56%; NIST 6.5168 | 6.5349 | 6.4485; WER 68.52% | 68.29% | 69.07%; PER 19.46% | 19.72% | 19.46%. Measure | Setup 7 (40, 3-gram) | Setup 8 (40, 5-gram) | Setup 9: BLEU 21.51% | 21.59% | –; NIST 6.1699 | 6.1754 | –; WER 73.29% | 73.45% | –; PER 17.74% | 17.39% | –.
22. Model example. REFERENCE: he wanted the presidency to outline the way forward at nice . | BASELINE: he HAS EXPRESSED THE WISH THAT the presidency IN NICE the way *** AHEAD AUFZEIGT | TUNING: he HAS EXPRESSED THE WISH THAT the presidency IN NICE the way *** AHEAD AUFZEIGT | SETUP1: he HAS EXPRESSED THE WISH THAT the presidency IN NICE the way *** *** AHEAD | SETUP2: he HAS EXPRESSED the WISH THAT THE PRESIDENCY IN NICE way AUFZEIGT THE FUTURE | SETUP3: he HAS EXPRESSED THE WISH THAT the presidency IN NICE , the way *** AHEAD AUFZEIGT | SETUP4: he HAS EXPRESSED THE WISH THAT the presidency IN NICE the way *** *** AHEAD | SETUP5: he HAS EXPRESSED THE WISH THAT the presidency IN NICE , THE FUTURE PATH AUFZEIGT | SETUP6: -- | SETUP7: he HAS EXPRESSED THE WISH THAT the presidency IN NICE , THE FUTURE PATH SHOWS | SETUP8: he HAS EXPRESSED THE WISH THAT the presidency IN NICE , the way *** AHEAD SHOWS
23. Outline <ul><li>The Problem </li></ul><ul><li>Objectives </li></ul><ul><li>Baseline </li></ul><ul><li>Software & Hardware </li></ul><ul><li>Experiments </li></ul><ul><li>Results </li></ul><ul><li>Summary </li></ul>
24. Summary and Conclusion <ul><li>The BLEU score after tuning is higher than the scores of all other setups. </li></ul><ul><li>Manual analysis can help to understand the role of POS tagging in the target data. </li></ul><ul><li>Future work will try to develop another strategy for choosing the right parameters, and to experiment further with the factored sequence model in order to find a good way to select and set these parameters while taking advantage of these systems. </li></ul>
25. REFERENCES <ul><li>Niehues, J., Kolss, M. A POS-Based Model for Long-Range Reordering in SMT. Proceedings of the Fourth Workshop on Statistical Machine Translation, pages 206–214, Athens, Greece, March 2009. Association for Computational Linguistics. </li></ul><ul><li>Niehues, J., Herrmann, T., Kolss, M. The Universität Karlsruhe Translation System for the EACL-WMT 2009. Proceedings of the Fourth Workshop on Statistical Machine Translation, pages 80–84, Athens, Greece, March 2009. Association for Computational Linguistics. </li></ul><ul><li>Holmqvist, M., Stymne, S., Foo, J., Ahrenberg, L. Improving alignment for SMT by reordering and augmenting the training corpus. Proceedings of the Fourth Workshop on Statistical Machine Translation, pages 120–124, Athens, Greece, March 2009. Association for Computational Linguistics. </li></ul><ul><li>Popovic, M., Vilar, D., Stein, D., Matusov, E., Ney, H. The RWTH Machine Translation System for WMT 2009. Proceedings of the Fourth Workshop on Statistical Machine Translation, pages 66–69, Athens, Greece, March 2009. Association for Computational Linguistics. </li></ul><ul><li>Stymne, S. A Comparison of Merging Strategies for Translation of German Compounds. Proceedings of the EACL 2009 Student Research Workshop, pages 61–69, Athens, Greece, April 2009. Association for Computational Linguistics. </li></ul><ul><li>Dyer, C. Using a maximum entropy model to build segmentation lattices for MT. Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the ACL, pages 406–414, Boulder, Colorado, June 2009. Association for Computational Linguistics. </li></ul><ul><li>Fraser, A. Experiments in morphosyntactic processing for translating to and from German. Proceedings of the Fourth Workshop on Statistical Machine Translation, pages 115–119, Athens, Greece, March 2009. Association for Computational Linguistics. </li></ul><ul><li>Holmqvist, M., Stymne, S., Ahrenberg, L. Getting to know Moses: Initial experiments on German–English factored translation. Proceedings of the Second Workshop on Statistical Machine Translation, pages 181–184, Prague, June 2007. Association for Computational Linguistics. </li></ul><ul><li>Birch, A., Osborne, M., Koehn, P. CCG Supertags in Factored Statistical Machine Translation. Proceedings of the Second Workshop on Statistical Machine Translation, pages 9–16, Prague, June 2007. Association for Computational Linguistics. </li></ul><ul><li>Papineni, K., Roukos, S., Ward, T., Zhu, W.-J. 2002. BLEU: a method for automatic evaluation of machine translation. Proceedings of the 40th Annual Meeting of the ACL, pages 311–318. </li></ul><ul><li>Federico, M., Bertoldi, N. A Word-to-Phrase Statistical Translation Model. ACM Transactions on Speech and Language Processing, Vol. 2, No. 2, December 2005, pages 1–24. </li></ul><ul><li>Abdul-Rauf, S., Schwenk, H. Exploiting Comparable Corpora with TER and TERp. Proceedings of the 2nd Workshop on Building and Using Comparable Corpora, ACL-IJCNLP 2009, pages 46–54, Suntec, Singapore, August 2009. ACL and AFNLP. </li></ul><ul><li>Koehn, P., Hoang, H., Birch, A., Callison-Burch, C., Federico, M., Bertoldi, N., Cowan, B., Shen, W., Moran, C., Zens, R., Dyer, C., Bojar, O., Constantin, A., Herbst, E. 2007. Moses: Open source toolkit for statistical machine translation. Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (ACL '07), Companion Volume, June 2007. </li></ul><ul><li>Stolcke, A. 2002. SRILM – an extensible language modeling toolkit. Proceedings of the Intl. Conf. on Spoken Language Processing. </li></ul><ul><li>Och, F., Ney, H. 2003. A systematic comparison of various statistical alignment models. Computational Linguistics, 29(1):19–51, March 2003. </li></ul><ul><li>Och, F. 2003. Minimum error rate training for statistical machine translation. Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics (ACL-2003), Sapporo, Japan. </li></ul><ul><li>Clark, S., Curran, J. 2004. Parsing the WSJ using CCG and log-linear models. Proceedings of the Association for Computational Linguistics, pages 103–110, Barcelona, Spain. </li></ul><ul><li>Clark, S. 2002. Supertagging for combinatory categorial grammar. Proceedings of the International Workshop on Tree Adjoining Grammars, pages 19–24, Venice, Italy. </li></ul><ul><li>Callison-Burch, C., Osborne, M., Koehn, P. 2006. Re-evaluating the role of BLEU in machine translation research. Proceedings of the European Chapter of the Association for Computational Linguistics, Trento, Italy. </li></ul>
26. <ul><li>THANK YOU </li></ul>Nervo Verdezoto D. [email_address] , [email_address]
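The WER and PER figures in the result tables are based on edit-distance scripts such as the CMU word_error.pl listed on slide 9. As a minimal sketch of the idea (my own code, not the CMU script), word error rate is the word-level Levenshtein distance divided by the reference length:

```python
def wer(hypothesis, reference):
    """Word error rate: word-level edit distance / reference length."""
    hyp, ref = hypothesis.split(), reference.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + sub)  # substitution or match
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)
```

PER differs in that it ignores word order, comparing bags of words instead of aligned sequences, which is why the PER scores in the tables are much lower than the WER scores.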
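Setups 1 and 2 vary the maximum sentence length used to filter the training corpus (35, 40 or 45 tokens). A sketch of that kind of filtering, loosely modeled on what Moses's corpus-cleaning step does; the function name, the ratio check, and its default value are my own assumptions, not the exact Moses behaviour:

```python
def filter_parallel(src_lines, tgt_lines, max_len=40, ratio=9.0):
    """Keep sentence pairs where both sides are non-empty, neither side
    exceeds max_len tokens, and the length ratio is not extreme."""
    kept = []
    for src, tgt in zip(src_lines, tgt_lines):
        s, t = src.split(), tgt.split()
        if not s or not t:
            continue  # drop empty lines
        if len(s) > max_len or len(t) > max_len:
            continue  # drop over-long sentences (the 35/40/45 knob)
        if len(s) / len(t) > ratio or len(t) / len(s) > ratio:
            continue  # drop badly mismatched pairs
        kept.append((src, tgt))
    return kept
```

Raising the limit (setup 1, 45) keeps more long sentences at the cost of noisier alignments; lowering it (setup 2, 35) does the opposite, which matches the small BLEU differences in the results table.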
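The factored training data on slide 16 uses a pipe-separated factor format, one `surface|lemma|pos` token per word. Two small helpers sketch writing and reading that format; the helper names `to_factored` and `from_factored` are my own, not part of Moses:

```python
def to_factored(tagged_tokens):
    """Join (surface, lemma, pos) triples into factored format,
    e.g. [("resumption", "resumption", "nn")] -> 'resumption|resumption|nn'."""
    return " ".join("|".join(t) for t in tagged_tokens)

def from_factored(line, factor=0):
    """Extract one factor (0 = surface, 1 = lemma, 2 = POS) from a factored line."""
    return [tok.split("|")[factor] for tok in line.split()]
```

In setups 7 and 8, the second language model (`-lm 2:...`) is trained over factor 2, i.e. the POS sequence of the English side, which is what the factored training commands on slide 17 configure.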
