Good to Go: More Preferable MT than HT Natsuki Wakabayashi (ISE)
ISE, a Gold partner of Systran, has made remarkable progress in its MT business. Last year we gave a successful presentation on MT evaluation at the TC Symposium, which drew significant attention from the Japanese audience, and we continue to help our clients adopt MT solutions. In this session, we present the favorable verification results, the challenges in carrying out the projects, and better, more practical ways to lead them to success.
ISE - TAUS Tokyo Forum 2015
1. Good to Go: More Preferable MT than HT
ISE MT Project 2015
ISE
Wakabayashi
electrosuisse japan
Nakamura
2. A leading company that pioneers new areas in technical communication technologies
• Date of establishment
– October, 1979
• Business sites
– Tokyo (Headquarters), Osaka, Kobe
– Beijing, Shanghai, Switzerland
• Affiliated business
– Electrosuisse Japan (Kobe)
• Our Business
– Technical Communication
– Interface Design
– Systems Design & Development
– Technical Consulting
• Customer Fields
– Japanese Governmental Agencies,
Educational/Research Institutions
– Financial Institutions, Trading,
Manufacturing, Information Services
About ISE
Information System Engineering,
Electrosuisse Co. Ⓒ 2015
2015/4/10
3. Background
• Typical Japanese manufacturers
– In Japanese: Excessively detailed manuals
– Other languages incl. English: not handled well in house → outsourcing to LSPs → data accumulation
– Result (fact): English (master) → EU langs → large TMs
• Depending on the language: almost 1 million TM segment pairs
– Utilizing TMs in MT to make L10n more effective and efficient:
• Source language: English, not Japanese
• Main usage: Eng to European languages
• Which is the most appropriate MT tool for us?
4. Integrating MT system into L10n
• Purpose: Efficient L10n for the documents of a global business (simship)
• ISE: System solution provider for clients (manufacturers), working for them (in house)
5. Project Phases
1. From English data: utilizing TMs in MT to make L10n more effective and efficient (faster and at lower cost)
2. From Japanese data: applying JA-EN MT to facilitate communication, including SNS and customer support
6. 1st Phase : from EN
• Evaluation for Preparation
– Source Language : EN
– Target Language : FR, DE, CN
– Domain : ICT Equipment
– Document Type : Manual
– Corpus volumes (after cleaning)
• FR : approx. 300K TUs
• DE : approx. 250K TUs
• CN : approx. 200K TUs
7. Evaluation Process
• Corpus Cleaning
– Removing duplicate and/or conflicting language pairs from
corpora
• Corpus Training
– Building up translation models using each corpus
• Tuning/Improvement Cycle
– Applying translation rules, terminology, user dictionary,
and normalization dictionary
• Measurement
– HT and MT+PE translation time
• Evaluation
– Productivity, Quality
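The corpus cleaning step above can be sketched in a few lines of Python. This is only an illustration, not the procedure actually used in the project: the function name `clean_corpus` is hypothetical, and the policy of discarding every source segment that has more than one distinct target is an assumption about how "conflicting" pairs might be handled.

```python
from collections import defaultdict


def clean_corpus(pairs):
    """Drop exact duplicates and source segments with conflicting targets.

    pairs: iterable of (source, target) strings taken from a TM.
    Returns a list of (source, target) tuples in first-seen order.
    """
    targets_by_source = defaultdict(list)
    for source, target in pairs:
        src, tgt = source.strip(), target.strip()
        if not src or not tgt:
            continue  # skip empty segments
        if tgt not in targets_by_source[src]:
            targets_by_source[src].append(tgt)  # duplicates collapse here
    # Assumption: a source with several different targets is "conflicting"
    # and is removed entirely rather than resolved by hand.
    return [
        (src, tgts[0])
        for src, tgts in targets_by_source.items()
        if len(tgts) == 1
    ]
```

In practice a conflicting pair might instead be reviewed by a linguist; dropping it is simply the most conservative choice for training data.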
8. BLEU/Perfect/TER/WER
• BLEU
– Excellent levels
• Target score : 50+
• Perfect
– Excellent levels except DE
• Target score: 25+
• TER/WER
– Good scores for each language
– Effective and practical levels
• Target score: Under 40
         FR     DE     CN
BLEU     CLEAR  CLEAR  CLEAR
Perfect  CLEAR  UNDER  CLEAR
TER      CLEAR  CLEAR  CLEAR
WER      CLEAR  CLEAR  CLEAR
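As a reminder of what the WER target measures, here is a minimal Python sketch of word error rate: the word-level edit distance between MT output and a reference, divided by the reference length. It assumes scores are reported as percentages (so that "Under 40" means below 40%) and uses whitespace tokenization, which would not work for Chinese without a segmenter. The function names are hypothetical and this is not the tooling used in the project.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by reference length, in percent."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return 100.0 * prev[-1] / len(ref)


def meets_target(score, target=40.0):
    """True when an error-rate score is under the target (lower is better)."""
    return score < target
```

For example, scoring "press power button" against the reference "press the power button" gives one missing word out of four, i.e. 25.0, which is under the target of 40.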
9. Productivity Evaluation
• Methodology
– Translation Targets
• Pick 120 sentences from the manual
– Measure translation time for each language
• HT
• MT + PE1
• MT + PE2
10. Productivity Evaluation
• Doubled productivity compared with standard HT
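One simple way to express such a productivity comparison is as a ratio of translation times on the same set of sentences. The sketch below assumes this definition; the slides do not state the exact formula, and the function names are hypothetical.

```python
def throughput(words, minutes):
    """Words translated per hour."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return words * 60.0 / minutes


def productivity_gain(ht_minutes, mtpe_minutes):
    """How many times faster MT+PE is than HT on the same text."""
    if ht_minutes <= 0 or mtpe_minutes <= 0:
        raise ValueError("times must be positive")
    return ht_minutes / mtpe_minutes
```

With purely hypothetical timings of 120 minutes for HT and 60 minutes for MT+PE, `productivity_gain(120, 60)` returns 2.0, which is what "doubled productivity" would look like under this definition.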
11. Quality Evaluation
• Methodology
– Testers evaluate the following 5 translations:
i. Original translation
ii. HT
iii. MT
iv. MT + PE1
v. MT + PE2
– Testers do not know which is which
– Perfect score: 100
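A blind test of this kind can be prepared by shuffling the candidate translations and hiding their labels behind neutral IDs. The following Python sketch shows one way to do that; the function name `blind_sheet`, the `T1`..`T5` ID scheme, and the use of a fixed seed are all assumptions made for illustration, not details taken from the project.

```python
import random


def blind_sheet(candidates, seed):
    """Shuffle labelled translations and hide the labels from testers.

    candidates: dict mapping a label (e.g. "HT", "MT+PE1") to its translation.
    Returns (sheet, key): sheet is a list of (item_id, translation) shown to
    testers; key maps item_id back to the label and stays with the organizer.
    """
    rng = random.Random(seed)  # fixed seed keeps the sheet reproducible
    labels = list(candidates)
    rng.shuffle(labels)
    sheet = []
    key = {}
    for n, label in enumerate(labels, start=1):
        item_id = f"T{n}"
        sheet.append((item_id, candidates[label]))
        key[item_id] = label
    return sheet, key
```

Testers score each item on the sheet out of 100, and the organizer uses the key afterwards to attribute the scores to the original translation, HT, MT, MT+PE1, and MT+PE2.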
12. Quality Evaluation
• From HT
– Excellent!
• From MT
– Good scores: FR, CN
• From MT+PE
– Achieved the quality level of the original translations
– DE: PE raised the score from C (raw MT) to B
– Matched the quality of the originals
                      FR  DE  CN
Reference (Original)  A   B   A
HT                    A   A   A
MT                    A   C   B
MT+PE1                A   B   A
MT+PE2                A   B   A
14. 2nd Phase: JA to EN
15. Problems in JA>EN
• Terminology/word usages (expressions)
• Katakana words
• Itemization
• Viewing points
• Parallelism
• Modifications
• Grammar
• No subjects: you or we (instruction or descriptive)
• Passive
• Syntax
– Ha-ga construction
– Ergative case
– Others
• Singular or plural
• Definite article
• Pronominalization
16. Improvements
• Improving Japanese text quality by applying the Plain and Logical Japanese 77 Rules
• Just over half of the rules (42 of 77) are at least somewhat effective (subjective evaluation)
– Effective: 26
– Somewhat effective: 16
– No effect: 35
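The proportions behind these counts are easy to check. The short Python snippet below uses only the three counts from the slide; the variable names are illustrative.

```python
RULES_TOTAL = 77
counts = {"effective": 26, "somewhat effective": 16, "none": 35}

# The three categories should account for all 77 rules.
assert sum(counts.values()) == RULES_TOTAL

helpful = counts["effective"] + counts["somewhat effective"]
share = 100.0 * helpful / RULES_TOTAL
print(f"{helpful} of {RULES_TOTAL} rules help ({share:.1f}%)")
# prints "42 of 77 rules help (54.5%)"
```

Counting only the clearly effective rules gives 26 of 77, or roughly one third.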
17. Viewing Points
• 主語も述語も共有しない重文は複数の文に分ける Split a compound sentence whose clauses share neither subject nor predicate into separate sentences
– Before: まず手動でおおよそのポイントを調整し、その調整が合ったところを自動的に認識して、正確なポイントを検出します。
– First you adjust the approximate point manually, recognizing the place where the adjustment is agreeable automatically, you detect the accurate point.
– After: 手動でおおよそのポイントを合わせます。機器は、その調整されたポイントを自動的に認識して、正確なポイントを検出します。
– Adjust the approximate point manually. The equipment, recognizing the adjusted points automatically, detects the accurate point.
18. Modifications
• 連用修飾の数量表現は、連体修飾句に言い換える Rephrase quantity expressions used as adverbial modifiers as adnominal modifiers
– Before: このメーリングリストのファシリテーターが複数必要になります。
– The facilitator of this mailing list is several needed.
– After: このメーリングリストには、複数のファシリテーターが必要です。
– Several facilitators are necessary in this mailing list.
19. Parallelism
• 並列関係にあるものは、並列であることを明示する(パラレリズム) Express parallel items in parallel grammatical form (parallelism)
– Before: 正常運転を行ったときはプラスの反応を示し、逆転する場合はマイナスの反応が示されます。
– When normal operation shows the plus reaction, when it is reversed, the negative reaction is shown.
– After: 正常運転を行なったときはプラスの反応が示され、逆転運転を行なったときはマイナスの反応が示されます。
– When doing normal operation, the plus reaction is shown, when doing reversal driving, the negative reaction is shown.
20. No subjects: you or we (instruction or descriptive)
• 抽象的な品詞よりも、より具体的な品詞を使う Use concrete words rather than abstract ones
– Before: 開始点Aより終点Bまで実線を引く。
– From the start point A the solid line is pulled to terminus B.
– After: 実線を開始点Aから終点Bまで書く。
– Write the solid line from the start point A to the end point B.
21. Ha-ga structure
• 主題を表す副助詞「は」は使わない Do not use the adverbial particle “ha” (は) as a topic marker
– Before: このマニュアルは、イラストが効果的です。
– This as for the manual, illustration is effective.
– After: このマニュアルに描かれているイラストは効果的です。
– The illustration which is drawn in this manual is effective.
• Singular or plural?
22. Ergative case
• 「なる」表現は「する」表現や受身形に言い換える Clarify the agent by replacing “naru” (become) expressions with “suru” (do) expressions or the passive
– Before: 私たちは6月に結婚することになりました。
– Before: そのマニュアルに修正を入れることになりました。
– We came to the point of getting married in June.
– That it came to the point of inserting correction in the manual.
– After: 私たちは6月に結婚します。
– After: CS部門がそのマニュアルを修正します。
– We get married in June.
– CS section corrects that manual.
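Rules like this one lend themselves to a simple pre-editing check that flags candidate sentences for a writer to review. The Python sketch below is a rough heuristic only: the regular expression covers just a few common "ことになる" endings, the names `NARU_PATTERN` and `flag_naru_sentences` are hypothetical, and nothing here reflects the actual tooling behind the 77 rules.

```python
import re

# Assumption: a handful of common "koto ni naru" forms is enough to
# illustrate the idea; a real checker would need morphological analysis.
NARU_PATTERN = re.compile(r"ことにな(?:りました|ります|った|る)")


def flag_naru_sentences(text):
    """Return the sentences in text that contain a 'koto ni naru' form."""
    # Split on the Japanese full stop, keeping it with its sentence.
    sentences = re.findall(r"[^。]+。?", text)
    return [s.strip() for s in sentences if NARU_PATTERN.search(s)]
```

Run on the two sentences "私たちは6月に結婚することになりました。CS部門がそのマニュアルを修正します。", the function flags only the first, which is the one the rule asks the writer to rephrase.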
23. Singular or plural
• この章では以下のことを説明します:
• In this chapter thing below is explained:
• The following are explained in this chapter:
24. Summary 1
• Advantages for clients
– Integrating an MT system smoothly into the current document production system
– Decreasing L10n costs
– Gaining various by-products (e.g., more document categories can be translated)
• Advantages for ISE (Solution provider)
– Stepping into a new business field (MT)
– Adding new value to its solution business
25. Summary 2
• MT by-products
– Information sharing in a global business
– Speeding up development and sales
– Revitalizing and facilitating internal communication using SNS
26. Summary 3
• What ISE can do for you:
– Training for Japanese writing: Plain and Logical Japanese 77 Rules
– Improving Japanese documents (Pre-editing)
– Training for English writing
– Post-editing
27. Contact
ISE
Wakabayashi
natsuki.wakabayashi@ise.co.jp
electrosuisse japan
Nakamura
tetsuzo.nakamura@electrosuisse.co.jp