This document summarizes a paper that explores relearning a rule-based machine translation (RBMT) system with statistical methods. It compares the original SYSTRAN RBMT system, a relearnt statistical model of SYSTRAN called SYSTRAN Relearnt, and a baseline statistical model called SYSTRAN Relearnt-0. The statistical models are trained without human-translated parallel corpora, using SYSTRAN translations as the target side instead. Evaluation shows SYSTRAN Relearnt achieves 5 BLEU points more than the baseline by using a real English language model and tuning set. An error analysis of 100 sentences counts common error types such as missing words, extra words, and translation choice, in order to discriminate between the statistical nature of the systems and the effect of their training data.
Summary of Can we relearn an RBMT system?
Hiroshi Matsumoto
Nagaoka University of Technology EEI Dept.
March 5, 2013
Outline
1 About this paper
2 Introduction
3 Systems
4 Models
5 Results
About this paper:
Title: Can we relearn an RBMT system?
Authors: Loïc Dugast, Jean Senellart and Philipp Koehn
Booktitle: Proceedings of the Third Workshop on Statistical
Machine Translation
Pages: 175-178
Year: 2008
Organization: Association for Computational Linguistics
Introduction
Two major research approaches:
1 Rule-based Systems
Manually written rules associated with bilingual dictionaries
2 Statistical Machine Translation
Statistical framework based on large amounts of monolingual
and parallel corpora
Aims of this research:
finding efficient combination setups
discriminating strengths/weaknesses of rule-based and
statistical systems
Systems
SYSTRAN:
a pure rule-based system
SYSTRAN Relearnt:
a statistical model of the rule-based engine
Relearnt uses a real English language model
SYSTRAN Relearnt-0:
a plain statistical model of SYSTRAN
Moses: the open-source phrase-based SMT toolkit used to train the relearnt models
Models
Training w/o human ref. translation
Problem
Statistical models rely on parallel corpora, which are not
always available.
Existing workarounds include domain adaptation and statistical
post-editing.
Here, the authors propose a new solution.
Submitted system:
The source-language (SL) side of a parallel corpus was
translated with the rule-based engine to produce the target
side of the training data
The LM was trained on the real target-language (TL) data
Non-submitted system:
Both corpora were built from newspaper text
The SL corpus was translated by the rule-based system to produce
the parallel training data, while the TL corpus was used to train
a LM
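The relearning setup above can be sketched in a few lines of Python. This is a minimal sketch, not the paper's pipeline: `rbmt_translate` is a hypothetical toy stand-in for the SYSTRAN engine, here just a word-for-word dictionary lookup.

```python
# Sketch of building synthetic parallel training data from an RBMT engine.
# rbmt_translate is a hypothetical stand-in for the SYSTRAN rule-based system.

def rbmt_translate(sentence: str) -> str:
    """Toy word-for-word 'rule-based' translation (French -> English)."""
    toy_dictionary = {"le": "the", "chat": "cat", "dort": "sleeps"}
    return " ".join(toy_dictionary.get(w, w) for w in sentence.split())

def build_synthetic_corpus(source_sentences):
    """Pair each source sentence with its rule-based translation.

    The resulting (source, target) pairs serve as the parallel training
    data for the statistical model; no human reference is needed.
    """
    return [(src, rbmt_translate(src)) for src in source_sentences]

parallel = build_synthetic_corpus(["le chat dort"])
print(parallel)  # [('le chat dort', 'the cat sleeps')]
```

The statistical engine then trains on these pairs exactly as it would on a human-translated corpus, while the language model can still come from real target-language text.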
Results
Results #1
Baseline vs. Relearnt-0:
The Relearnt-0 model scores slightly below the rule-based original
Relearnt vs. Relearnt-0:
Relearnt gains 5 BLEU points over Relearnt-0 by using a real
English language model and tuning set
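To make the "5 BLEU points" figure concrete, here is a minimal sentence-level BLEU implementation in plain Python (single reference, uniform n-gram weights). It is an illustrative sketch of the metric, not the corpus-level, smoothed BLEU used in the paper's evaluation.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(hypothesis, reference, max_n=4):
    """Minimal sentence-level BLEU (single reference, uniform weights)."""
    hyp, ref = hypothesis.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        hyp_ngrams, ref_ngrams = ngrams(hyp, n), ngrams(ref, n)
        overlap = sum((hyp_ngrams & ref_ngrams).values())  # clipped counts
        total = max(sum(hyp_ngrams.values()), 1)
        if overlap == 0:
            return 0.0  # unsmoothed: any zero precision zeroes the score
        precisions.append(overlap / total)
    # Brevity penalty punishes hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / max(len(hyp), 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

print(round(100 * bleu("the cat sleeps on the mat",
                       "the cat sleeps on the mat"), 1))  # 100.0
```

A perfect match scores 100; each missed n-gram or length mismatch pulls the score down, so a 5-point gap between two systems on the same test set is a substantial difference.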
Results #2
To discriminate between the statistical nature of a translation
system and the fact that it was trained on the relevant domain,
the authors defined 11 error types and
counted occurrences over 100 randomly picked sentences
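The counting step amounts to tallying per-sentence error annotations over a fixed taxonomy. A minimal sketch, assuming hand-made annotations (the type names and sample data below are hypothetical; the slides show only four of the paper's 11 types):

```python
from collections import Counter

# Hypothetical error taxonomy; the paper defines 11 types in total.
ERROR_TYPES = ["missing word", "extra word", "unknown word", "translation choice"]

# Hypothetical annotations: each analysed sentence lists the error
# types a human judge found in its translation.
annotations = [
    ["missing word"],
    ["extra word", "translation choice"],
    [],  # a sentence with no errors
    ["translation choice"],
]

# Flatten the per-sentence lists and count occurrences of each type.
counts = Counter(err for sent in annotations for err in sent)
for err_type in ERROR_TYPES:
    print(f"{err_type}: {counts[err_type]}")
```

Comparing these counts across the rule-based, relearnt, and baseline outputs is what lets the authors attribute each error profile to the system's nature rather than its training domain.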
Missing words
A typical statistical error, but no evidence was found here
Extra words
Rule-based systems tend to produce something extra
Unknown words
Words absent from the rule-based system's dictionaries
Translation choice
A strength of statistical systems
Results #3