Hybrid Solutions for Translation: Going hybrid
Qun Liu (DCU)
Dr. Manuel Herranz (Pangeanic)
12 November 2013, Birmingham, UK

PART A
Qun Liu (DCU)
qliu@computing.dcu.ie

Outline
• Why Hybrid MT?
• An overview of Hybrid MT
• Typical Hybrid MT Approaches
• Conclusion

Winter School 2013, Birmingham
MT Approaches
• RBMT: Rule-based Machine Translation
• EBMT: Example-based Machine Translation
• TM: Translation Memory
• SMT: Statistical Machine Translation

RBMT: Vauquois’ Triangle
[Diagram: Interlingua at the apex; Analysis and Generation on the sides; Semantic Transfer and Syntactic Transfer as intermediate levels; Direct translation from Source Language to Target Language at the base.]

RBMT: Rules for Components
[Diagram mapping each pipeline stage to its rule resources: Morphological Analysis (source morphological rules), Syntactic Analysis/Parsing (source grammar), Semantic Analysis (source semantic rules), Lexical Transfer (bilingual lexicon), Syntactic Transfer (syntactic mapping rules), Semantic Transfer (semantic mapping rules), and Semantic, Syntactic and Morphological Generation (target semantic rules, target grammar, target morphological rules).]
RBMT: an Example
[Five worked-example slides; the figures are not preserved in this transcript.]
RBMT
• RBMT uses human-encoded linguistic rules for translation.
• Developing an RBMT system is very expensive because it needs a great deal of human labour and takes a long time (years).
• RBMT systems can reach good translation quality after years of development in a given domain.
• Well-developed RBMT systems tend to capture large-scale sentence structure better than SMT systems, but perform worse on small-scale expressions.

EBMT
• An EBMT system translates sentences by analogy with existing translation examples.
• EBMT does not need deep analysis of the source text and may produce high-quality translations when similar examples are found.
• [Example slide; figure not preserved in this transcript.]
• The quality of EBMT increases as we collect more examples.
• A problem for EBMT is the coverage of the examples, especially for long sentences.

TM
• A Translation Memory directly outputs an existing target sentence when a very similar source sentence is found in the memory; otherwise it outputs nothing.
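The TM lookup described above can be sketched in a few lines. This is a toy illustration: `difflib`'s similarity ratio stands in for a real fuzzy-match score, and the memory entry and the 0.8 threshold are made up.

```python
# Toy translation-memory lookup. difflib's ratio stands in for a real
# fuzzy-match score; the entry and the 0.8 threshold are illustrative.
from difflib import SequenceMatcher

memory = {"press the power button": "appuyez sur le bouton d'alimentation"}

def tm_lookup(source, threshold=0.8):
    """Return the stored target of the closest source segment,
    or None when nothing in the memory is similar enough."""
    best_target, best_score = None, 0.0
    for tm_source, tm_target in memory.items():
        score = SequenceMatcher(None, source, tm_source).ratio()
        if score > best_score:
            best_target, best_score = tm_target, score
    return best_target if best_score >= threshold else None

print(tm_lookup("press the power button"))       # exact hit -> stored target
print(tm_lookup("a completely different text"))  # no close match -> None
```

This also shows the TM's all-or-nothing behaviour: below the threshold it returns nothing rather than a degraded translation.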
SMT
• SMT builds statistical models to predict the probability of a target sentence being the translation of a given source sentence.
• Translating a source sentence is then a search for the target sentence with the highest translation probability.
• A large number of translation pairs (a parallel corpus) is needed to estimate the model parameters.
• To predict the translation, sentence pairs are broken into smaller translation equivalences, at the word level, the phrase level or the syntax-rule level.

Word-based SMT

Source         Target     Probability
Bushi (布什)   Bush       0.7
               President  0.2
               US         0.1
yu (与)        and        0.6
               with       0.4
juxing (举行)  hold       0.7
               had        0.3
le (了)        hold       0.01
...            ...        ...

Winter School 2013, Birmingham
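Read as data, the table above is just a conditional probability dictionary p(target | source). A minimal sketch of word-by-word translation, ignoring the language model and reordering that a real word-based decoder also uses:

```python
# Word translation probabilities from the slide, p(target | source).
# A toy sketch: a real word-based SMT decoder also combines these with
# a language model and alignment/reordering probabilities.
lexicon = {
    "Bushi":  {"Bush": 0.7, "President": 0.2, "US": 0.1},
    "yu":     {"and": 0.6, "with": 0.4},
    "juxing": {"hold": 0.7, "had": 0.3},
}

def translate_word_by_word(source_words):
    """Pick the most probable translation for each source word."""
    return [max(lexicon[w], key=lexicon[w].get) for w in source_words]

print(translate_word_by_word(["Bushi", "yu", "juxing"]))
# ['Bush', 'and', 'hold']
```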
Phrase-based SMT

Source                         Target             Probability
Bushi (布什)                   Bush               0.5
                               president Bush     0.3
                               the US president   0.2
Bushi yu (布什与)              Bush and           0.8
                               the president and  0.2
yu Shalong (与沙龙)            and Shalong        0.6
                               with Shalong       0.4
juxing le huitan (举行了会谈)  hold a meeting     0.7
                               had a meeting      0.3

Winter School 2013, Birmingham
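A phrase table like the one above can be exercised with a greedy longest-match segmenter. This is a deliberate simplification: a real phrase-based decoder beam-searches over all segmentations, reorderings and a language model. The entries and probabilities are taken from the slide (with the romanization "huitan" for 会谈).

```python
# Toy phrase table from the slide: source phrase -> [(target, prob), ...].
phrase_table = {
    ("Bushi",): [("Bush", 0.5), ("president Bush", 0.3)],
    ("Bushi", "yu"): [("Bush and", 0.8), ("the president and", 0.2)],
    ("yu", "Shalong"): [("and Shalong", 0.6), ("with Shalong", 0.4)],
    ("juxing", "le", "huitan"): [("hold a meeting", 0.7), ("had a meeting", 0.3)],
}

def greedy_phrase_translate(words):
    """Segment the source greedily into the longest known phrases,
    then emit each phrase's most probable translation (monotone order)."""
    out, i = [], 0
    while i < len(words):
        for j in range(len(words), i, -1):      # longest match first
            seg = tuple(words[i:j])
            if seg in phrase_table:
                out.append(max(phrase_table[seg], key=lambda t: t[1])[0])
                i = j
                break
        else:
            out.append(words[i])                # pass unknown words through
            i += 1
    return " ".join(out)

print(greedy_phrase_translate(["Bushi", "yu", "Shalong", "juxing", "le", "huitan"]))
# Bush and Shalong hold a meeting
```

Note how greedy matching consumes "Bushi yu" and then leaves "Shalong" untranslated; a real decoder would also consider the segmentation "Bushi" + "yu Shalong" and score both with the language model.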
Hierarchical Phrase-based SMT

Source                           Target           Probability
juxing le huitan (举行了会谈)    hold a meeting   0.6
                                 had a meeting    0.3
X huitan (X会谈)                 X a meeting      0.8
                                 X a talk         0.2
juxing le X (举行了X)            hold a X         0.5
                                 had a X          0.5
Bushi yu Shalong (布什与沙龙)    Bush and Sharon  0.8
Bushi X (布什X)                  Bush X           0.7
X yu Y (X与Y)                    X and Y          0.9

Syntax-based SMT

Source                                           Target           Probability
VPB(VS(juxing) AS(le) NPB(huitan)) (举行了会谈)  hold a meeting   0.6
                                                 have a meeting   0.3
                                                 have a talk      0.1
VPB(VS(juxing) AS(le) x1:NPB) (举行了x1)         hold a x1        0.5
                                                 have a x1        0.5
VP(PP(P(yu) x1:NPB) x2:VPB) (与 x1 x2)           x2 with x1       0.9
IP(x1:NPB VP(x2:PP x3:VPB))                      x1 x3 x2         0.7

Winter School 2013, Birmingham
SMT
• SMT is cheap.
• SMT systems can be developed in a short time.
• SMT needs a large parallel corpus.
• SMT produces good-quality translations if we have plenty of in-domain data.
• SMT quality drops dramatically on out-of-domain data.
• SMT output is fluent in short phrases but not good at large-scale sentence structure (especially for distant language pairs).

Why Hybrid MT?
• Each MT approach has its pros and cons.
• We want to take advantage of different MT approaches.
• We do not want to waste our investments in existing MT systems.

Outline
• Why Hybrid MT?
• An overview of Hybrid MT
• Typical Hybrid MT Approaches
• Conclusion
An overview of Hybrid MT
• Selective MT: loose coupling
• Pipelined MT: medium coupling
• Mixture MT: close coupling

Selective MT
• Given translations generated by different approaches, Selective MT tries to select the best one, or to select the best parts of different translations and combine them into a new one.
• [Architecture diagram, shown twice: a Source is translated by MT1, MT2 and MT3; a Select component chooses the final Target.]
• Typical Selective MT:
  • System Recommendation
  • System Combination
    • Sentence-level combination
    • Word-level combination
Pipelined MT
• Pipelined MT adopts one approach as the main approach and uses another approach for monolingual pre-processing or post-processing.
• Pipeline: Pre-Processing → Main Approach → Post-Processing
• Typical Pipelined MT:
  • Statistical Post-Editing for RBMT
  • Rule-based Pre-reordering for SMT
Mixture MT
• Mixture MT adopts one approach as the main approach but uses one or more different approaches in some of its components.
• [Architecture diagram not preserved in this transcript.]
• Typical Mixture MT:
  • Statistical Parsing in RBMT
  • Rule-based Named Entity Translation in SMT
  • Human-Encoded Rules in SMT
  • SMT Decoding with TM Phrases

Outline
• Why Hybrid MT?
• An overview of Hybrid MT
• Typical Hybrid MT Approaches
• Conclusion

Typical Hybrid MT Approaches
• Selective MT
  • System Recommendation
  • System Combination
• Pipelined MT
• Mixture MT
System Recommendation
• Yifan He, Yanjun Ma, Josef van Genabith and Andy Way. Bridging SMT and TM with System Recommendation. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL 2010), pages 622–630, Uppsala, Sweden, 11–16 July 2010.
• Intuition:
  • When the translation memory is large enough, the trained SMT system is comparable with the TM output in translation quality, so a selection problem arises.
  • System recommendation recommends SMT output to a TM user when it predicts that the SMT output is more suitable for post-editing than the hits provided by the TM.
• [Architecture diagram: the parallel corpus feeds both the TM and the SMT system; a System Recommendation component chooses between their outputs.]
• An SVM binary classifier is adopted.
• The classifier is trained on human-annotated data.
• A confidence score is given for the recommendation.
• Features:
  • SMT system features: the features used in the SMT system
  • TM feature: fuzzy match cost
  • System-independent features: source-side language model score and perplexity, target-side language model perplexity, the pseudo-source fuzzy match score, and the IBM Model 1 score
• Evaluation metrics: precision |A ∩ B| / |A| and recall |A ∩ B| / |B|, where A is the set of recommended MT outputs and B is the set of MT outputs that have lower TER than the TM hits.
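With A and B defined as above, the evaluation metrics are ordinary set precision and recall. A small sketch, using made-up sentence IDs for illustration:

```python
def precision_recall(recommended, better_than_tm):
    """A = recommended MT outputs; B = MT outputs with lower TER than
    the TM hit. Precision = |A ∩ B| / |A|, Recall = |A ∩ B| / |B|."""
    A, B = set(recommended), set(better_than_tm)
    hits = len(A & B)
    return hits / len(A), hits / len(B)

# Made-up sentence IDs: four recommendations, three truly better outputs.
p, r = precision_recall({1, 2, 3, 4}, {2, 3, 5})
print(p, r)  # 0.5 and 0.666...
```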
System Recommendation
[Two results slides; the tables are not preserved in this transcript.]

Typical Hybrid MT Approaches
• Selective MT
  • System Recommendation
  • System Combination
• Pipelined MT
• Mixture MT
System Combination
• Rosti, A. V. I., Ayan, N. F., Xiang, B., Matsoukas, S., Schwartz, R. M., & Dorr, B. J. (2007, April). Combining Outputs from Multiple Machine Translation Systems. In HLT-NAACL (pp. 228–235).
• Rosti, A. V. I., Matsoukas, S., & Schwartz, R. (2007, June). Improved word-level system combination for machine translation. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (p. 312).
• He, X., Yang, M., Gao, J., Nguyen, P., & Moore, R. (2008). Indirect-HMM-based hypothesis alignment for combining outputs from machine translation systems. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (pp. 98–107). Association for Computational Linguistics.
• Feng, Y., Liu, Y., Mi, H., Liu, Q., & Lü, Y. (2009). Lattice-based system combination for statistical machine translation. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing (pp. 1105–1113). Association for Computational Linguistics.

Sentence-Level System Combination
• Kumar, S., & Byrne, W. J. (2004, May). Minimum Bayes-Risk Decoding for Statistical Machine Translation. In HLT-NAACL (pp. 169–176).
• Suppose we have several MT systems.
• For a given source text F, each MT system outputs an n-best list of target texts.
• If possible, each MT system gives each target text a probability P(E|F); otherwise we may treat the n-best target texts as equally probable.
• Minimum Bayes-Risk (MBR) selects the candidate with the lowest expected loss against the other candidates:

  Ê = argmin_E Σ_E′ L(E, E′) P(E′|F)
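The MBR rule can be sketched directly from its definition. The hypotheses, probabilities and the 0/1 loss below are toy stand-ins; real systems typically use a BLEU- or TER-based loss over a merged n-best list.

```python
def mbr_select(hypotheses, loss):
    """hypotheses: list of (translation, probability P(E|F)) pairs.
    Return the candidate with the lowest expected loss (Bayes risk):
    risk(E) = sum over E' of P(E'|F) * L(E, E')."""
    def risk(e):
        return sum(p * loss(e, e_prime) for e_prime, p in hypotheses)
    return min((e for e, _ in hypotheses), key=risk)

# Toy n-best list; with a 0/1 loss, MBR picks the best-supported string
# rather than simply the single highest-probability entry.
hyps = [("show me the map", 0.4),
        ("show me the map", 0.35),
        ("show map", 0.25)]
print(mbr_select(hyps, lambda a, b: float(a != b)))
```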
Word-Level System Combination
• Select a translation candidate as a skeleton (backbone) with Minimum Bayes-Risk.
• Construct a confusion network by aligning all the words in the other translation candidates to the words in the skeleton.
• Select the best path through the confusion network to generate a new translation.
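The three steps above can be sketched as follows. The sketch assumes the hard part, word alignment against the skeleton, has already been done (each candidate is padded to the skeleton's length, with "" marking an empty arc), so each position becomes a confusion-network column decided by majority vote. The example sentences echo the slide's "Please show me on the map" output.

```python
from collections import Counter

def confusion_network_vote(skeleton, aligned_candidates):
    """Word-level combination sketch. Each candidate has already been
    word-aligned to the skeleton (same length; '' = empty arc). Each
    position is one confusion-network column; the majority word wins."""
    out = []
    for column in zip(skeleton, *aligned_candidates):
        word, _count = Counter(column).most_common(1)[0]
        if word:                      # drop empty arcs from the output
            out.append(word)
    return " ".join(out)

skeleton = ["please", "show", "me", "on", "the", "map"]
candidates = [
    ["please", "show", "me", "on", "a",   "map"],
    ["",       "show", "me", "on", "the", "map"],
]
print(confusion_network_vote(skeleton, candidates))
# please show me on the map
```

Real systems additionally weight arcs by system confidence and use a language model when choosing the best path, rather than a plain majority vote.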
Word-Level System Combination: Example
[Three figure slides: the translation candidates with the chosen skeleton; word alignment of each candidate against the skeleton; and the resulting confusion network. Final output: "Please show me on the map."]

Word-Level System Combination
• System combination has proved very effective.
• In the NIST Open MT Evaluation Chinese–English task, MSR-NRC-SRI ranked first by using system combination technologies.
• In later NIST evaluations, separate tracks were defined for participants using and not using system combination technologies.
Typical Hybrid MT Approaches
• Selective MT
• Pipelined MT
  • Statistical Post-Editing for RBMT
  • Rule-based Pre-reordering for SMT
• Mixture MT

Statistical Post-Editing for RBMT
• Dugast, L., Senellart, J., & Koehn, P. (2007, June). Statistical post-editing on SYSTRAN's rule-based translation system. In Proceedings of the Second Workshop on Statistical Machine Translation (pp. 220–223). Association for Computational Linguistics.
• Simard, M., Ueffing, N., Isabelle, P., & Kuhn, R. (2007, June). Rule-based Translation With Statistical Phrase-based Post-editing. In Proceedings of the Second Workshop on Statistical Machine Translation (pp. 203–206). Prague, Czech Republic.

Statistical Post-Editing
• Suppose we have a very good RBMT system and a large parallel corpus that can be used for SMT training. Both RBMT and SMT have advantages and disadvantages; can we benefit from both methods?
• A Statistical Post-Editing (SPE) system is a monolingual SMT system that takes the result of an RBMT system as input and generates an improved target output.
  Pipeline: Source Text → RBMT → RBMT Result → SPE → SPE Result

Statistical Post-Editing: Training
• Training data: the source side of the parallel corpus is translated by the RBMT system, and the SPE system is trained on the resulting (RBMT output, human target) pairs.
• RBMT usually generates better word order, while SMT makes better lexical selections.
• RBMT+SPE outperforms both the original RBMT and SMT systems.
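The data flow in the training diagram can be sketched as below. `rbmt_translate`, `spe_translate` and `train_smt`-style components are hypothetical stand-ins for a real RBMT engine and an SMT toolkit; only the data flow reflects the slides.

```python
# Sketch of the SPE data flow. The translation functions are hypothetical
# stand-ins for a real RBMT engine and a trained SMT post-editor.
def build_spe_training_data(parallel_corpus, rbmt_translate):
    """Turn a (source, target) parallel corpus into a monolingual
    (RBMT output, human target) corpus for training the post-editor."""
    return [(rbmt_translate(src), tgt) for src, tgt in parallel_corpus]

def translate_with_spe(source, rbmt_translate, spe_translate):
    """At run time, the SPE system post-edits the RBMT result."""
    return spe_translate(rbmt_translate(source))
```

For example, with toy stand-ins `rbmt_translate=str.upper` and `spe_translate=str.lower`, `translate_with_spe("Hi", ...)` yields `"hi"`: the second system rewrites the first system's output rather than the source.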
Typical Hybrid MT Approaches
• Selective MT
• Pipelined MT
  • Statistical Post-Editing for RBMT
  • Rule-based Pre-reordering for SMT
• Mixture MT

Rule-based Pre-reordering for SMT
• Elia Yuste, Manuel Herranz, Alexandra Helle and Hirokazu Suzuki. Go Hybrid: Pangeanic's and Toshiba's First Steps Towards EN-JP MT Hybridization. AAMT Journal, No. 50, December 2011. (Part B of this tutorial)
• Xia, F., & McCord, M. (2004, August). Improving a statistical MT system with automatically learned rewrite patterns. In Proceedings of the 20th International Conference on Computational Linguistics (p. 508). Association for Computational Linguistics.
• A phrase-based SMT (PBSMT) system makes good lexical choices but, without linguistic knowledge, is not good at long-distance reordering.
• A rule-based word reordering on the source side makes the word order of the source text much more similar to the word order of the target side.
  Pipeline: Source Text → Pre-Reordering → Reordered Source Text → PBSMT → Target Text
• Training: the source side of the parallel corpus is pre-reordered, and the PBSMT system is trained on the (reordered source, target) pairs.

Pre-reordering: Training
• The pre-reordering rules can be acquired automatically from the parallel corpus, using automatic word alignment and parse trees on both sides:
  • Parse the source sentence.
  • Parse the target sentence.
  • Align the words and phrases on both sides.
  • Extract the rewrite rules.
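Applying an extracted rewrite rule amounts to permuting the children of a matching parse node. A minimal sketch: the single rule below (move a verb phrase's PP modifier after the verb, toward English order) is an illustrative example, not one taken from the tutorial's actual rule tables.

```python
# A rewrite rule: (node label, child labels) -> permutation of children.
# Illustrative rule: VP(PP VPB) -> VP(VPB PP), i.e. move the PP after
# the verb phrase, toward English word order.
rules = {("VP", ("PP", "VPB")): (1, 0)}

def apply_rules(tree):
    """tree = (label, [children]) for internal nodes, a plain str for words.
    Recursively rewrite children, then permute them if a rule matches."""
    if isinstance(tree, str):
        return tree
    label, children = tree
    children = [apply_rules(c) for c in children]
    child_labels = tuple(c[0] if isinstance(c, tuple) else c for c in children)
    if (label, child_labels) in rules:
        children = [children[i] for i in rules[(label, child_labels)]]
    return (label, children)

def leaves(tree):
    """Read the reordered sentence back off the tree."""
    if isinstance(tree, str):
        return [tree]
    return [w for c in tree[1] for w in leaves(c)]

t = ("VP", [("PP", ["yu Shalong"]), ("VPB", ["juxing le huitan"])])
print(" ".join(leaves(apply_rules(t))))
# juxing le huitan yu Shalong
```

The reordered source ("juxing le huitan yu Shalong") now follows the verb-before-PP order of the English target, which is exactly what makes the downstream PBSMT's job easier.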
Rule-based Pre-reordering for SMT: Example
[Five figure slides: parse trees and alignments; rule extraction; rule organization and filtering; applying rewrite rules; and results.]
Typical Hybrid MT Approaches
• Selective MT
• Pipelined MT
• Mixture MT
  • Statistical Parsing in RBMT
  • Rule-based Named Entity Translation in SMT
  • Human-Acquired Rules in SMT
  • SMT Decoding with TM Phrases

Statistical Parsing in RBMT
• Statistical parsing outperforms rule-based parsing if we have a large-scale treebank.
• It is therefore reasonable to use statistical algorithms in the parsing component of an RBMT system.

Rule-based Named Entity Translation in SMT
• Ney, H. (2013). Statistical MT Systems Revisited: How much Hybridity do they have? In Proceedings of the Second Workshop on Hybrid Approaches to Translation (p. 7), Sofia, Bulgaria, August 8, 2013.

Numerical Expression Translation
• Chinese groups large numbers by ten-thousands (万, "wan"), so Chinese "350 wan 1749" should become English "3,501,749" (3 million, 501 thousand and 749); literal renderings such as "3501749" or "350,1749" are wrong.
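This kind of numerical conversion is exact, which is why a small rule-based component handles it more reliably than statistics. A minimal sketch of the wan-grouped to Western-grouped conversion:

```python
# Chinese groups large numbers by ten-thousands (wan = 10^4).
# "350 wan 1749" therefore denotes 350 * 10_000 + 1749.
def wan_to_int(wan_part, rest):
    """Combine a wan-grouped number into a plain integer."""
    return wan_part * 10_000 + rest

def int_to_english(n):
    """Render with Western thousands grouping."""
    return f"{n:,}"

print(int_to_english(wan_to_int(350, 1749)))
# 3,501,749
```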
Human-Acquired Rules in SMT
• Li, X., Lü, Y., Meng, Y., Liu, Q., & Yu, H. (2011). Feedback Selecting of Manually Acquired Rules Using Automatic Evaluation. In Proceedings of the 4th Workshop on Patent Translation (pp. 52–59), MT Summit XIII, Xiamen, China, September 2011.
• These rules are used in the decoding process together with the hierarchical phrases in an SMT system.

SMT Decoding with TM Phrases
• Philipp Koehn and Jean Senellart. (2010). Convergence of translation memory and statistical machine translation. In AMTA Workshop on MT Research and the Translation Industry (pp. 21–31).
• Wang, K., Zong, C., & Su, K. Y. (2013). Integrating Translation Memory into Phrase-Based Machine Translation during Decoding. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (pp. 11–21), Sofia, Bulgaria, August 4–9, 2013.
• Yanjun Ma, Yifan He, Andy Way and Josef van Genabith. (2011). Consistent translation using discriminative learning: a translation memory-inspired approach. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics (pp. 1239–1248), Portland, Oregon.
• Yifan He, Yanjun Ma, Andy Way and Josef van Genabith. (2011). Rich linguistic features for translation memory-inspired consistent translation. In Proceedings of the Thirteenth Machine Translation Summit (pp. 456–463).
• The idea: extract TM phrases from similar sentences in the translation memory and use them in the decoding process at run time.
Outline
• Why Hybrid MT?
• An overview of Hybrid MT
• Typical Hybrid MT Approaches
• Conclusion

Conclusion
• Different MT approaches have advantages and disadvantages, which are usually complementary.
• Hybrid MT can benefit from different MT approaches.
• Three categories of Hybrid MT were introduced: Selective, Pipelined and Mixture.
• In fact, almost all real MT systems are hybrid systems.

Thank you!
Q&A