Can we MT Japanese? Andrew Jones, Nikon Precision Technology.
This is a case study of our attempts to machine translate (MT) Japanese technical information as quickly and accurately as possible, in as few steps as possible, with limited linguistic knowledge and even less time. What we have accomplished so far is determined to a great extent by certain characteristics of modern Japanese. Much of Japanese society prizes standardization (think of tatami) and precision (reaching the nanometer level), but written texts often seem more like a Zen garden: they make sense in some ways, but there is nothing “regular” about them. Still, these are the features we must be aware of, and take advantage of, to make progress toward what our customers demand: one-button, real-time translation of texts and, next, chat.
3. 3
The general situation
§ Nikon Precision is a photolithography equipment manufacturer with subsidiaries and offices worldwide
§ Translate avg. of 3 million characters per month (technical content) → actual translation need is 1.5x that
§ It is difficult material → 80% of (human) translator candidates fail the test
§ In-house we have 7 linguists, 3 editors, 2 DTP, 1 PM + outside linguists and editors
§ Our system is a Wordbee (WB) TMS with Microsoft HUB connection for MT
4. 4
The specific problem
Field engineers face the same communication barrier
as the Dutch in the 1600s!
The difference…
Tools
TAUS ☺
Expectations
5. 5
MT needs/Previous MT use
§ Large amount of Japanese material produced hourly
– Non-Japanese speakers cannot understand any of it
– Human translation cannot cope with volume
– Translation delays lower overall work efficiency and cause miscommunication
§ Rule-based MT previously used
– Easy to use, one-button translation
– “Better than nothing”
– Expensive (licenses for each user)
– Difficult to centrally manage terminology
– No longer developed
§ Various on-line MT services also used (Bing, etc.)
– Security issues
§ Need secure, low cost, customizable MT solution
– Utilizes translation memories and glossaries
– Can improve over time
– Can understand context
– Can learn by itself
6. 6
Sample MT system evaluation results – Usability scores
*1 = Poor, 2 = Low Medium, 3 = Medium, 4 = Good, 5 = Excellent
- MT name kept hidden from evaluators
- System 20 got the highest overall scores
8. 8
The big MT expectation game
§ Until recently, much of the history of MT has been a history
of broken expectations
– “Anybody can translate, so machines can do it”
– Raised expectations are crushed
§ With great improvements in many language pairs,
expectations may be nearing reality, but
§ Japanese->English is in its infancy, yet the mistaken
expectation syndrome is alive and well
§ This was the case for us → big expectations
suddenly emerged…even exceeding the great
need…and we felt a lot of pressure
9. 9
MT system highlights and lowlights
§ Highlights
– Tight integration of HUB and WB (leverage memory matches)
– Color coding of results (raw MT results in red; 90-99% memory matches in yellow)
– Post-editing (PE) is part of workflow
§ Lowlights
– Sign-in and three clicks required
• Bing is one click!
• Not integrated with MS Office
– Expectations…
• In Nikon, “machine” = super accuracy, perfectly clear…
– Cannot work offline
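The color coding listed under Highlights can be read as a simple rule per segment. The following is a minimal sketch, not Wordbee's actual logic: the function name, the `origin` labels, and the choice to leave every other segment uncolored are assumptions made for illustration, since the slide only names the red and yellow cases.

```python
from typing import Optional


def highlight_color(origin: str, match_percent: Optional[int] = None) -> Optional[str]:
    """Return the review color for a translated segment (illustrative only).

    origin is a hypothetical label: "mt" for raw machine translation,
    "memory" for a translation-memory match.
    """
    if origin == "mt":
        # Raw MT output is flagged most strongly for the post-editor.
        return "red"
    if origin == "memory" and match_percent is not None and 90 <= match_percent <= 99:
        # High fuzzy matches still need a check against the new source.
        return "yellow"
    # The slide does not describe any other case, so no color is assumed.
    return None
```

For example, `highlight_color("memory", 95)` returns `"yellow"`, while a 100% match is left unmarked under this assumption.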
10. 10
What current MT is good for
§ Frequently revised (high memory match), highly consistent language → PE only
§ Noun piles (e.g., software strings → thousands per second, compared to thousands per day for HT)
§ Gist for non-J readers who have no idea what even the subject of a text is
11. 11
What current MT is not good for
§ Intent is not clear, even just superficially (e.g., love
of 曖昧/vagueness; 難しい≠“difficult”)
§ Inconsistent language, logic (MT does not ask
author what s/he means, does not learn from
context/immediate experience)
§ Poorly structured text (e.g., mysterious use of Excel
for written content)
§ Most sentences over 7 words; handling word order
and other linguistic & technical challenges
12. 12
So, can we MT Japanese?
No
The answer still depends on expected outcome
• For gisting? → Yes
• For use with a well-developed TM + PE → Maybe
• For reliable translations → No
13. 13
Getting to “yes”
§ Conclusion
– As a global company based in Japan, with fast-paced technical
life cycles, there is a great need for J->E translation
• We are just starting to fill this need
– There are even greater expectations for what MT can do
• The expectation grows and seems higher after every MT
improvement…
– To better meet expectations of both management ($) and
engineers (know-how), we need to keep up with new MT
developments and find the best solution
§ Our hope
– Every LSP is trying to do MT, but at least for J->E…
– A broad cooperative effort may be the only real hope
15. 15
Nikon translation matrix
[Figure: two quality (Qly) vs. quantity (Qty) charts, one for a typical Silicon Valley company and one for Nikon, placing HT, MT, and chat MT against document types: email, procedures, tech bulletins, sales material, customer PPT, user guides]
– Typical Silicon Valley company: often a direct correlation between quality/human translation, and quantity/MT
– Nikon: high quantity with high quality*, using MT + TM + PE or MT + PE
*Mistakes are extremely expensive, and there is not much need for a social media presence
16. 16
MT system setup - Workflow
§ Request
– Requestor submits source documents through the WB Portal
§ Translation
– WB Memory: matching translation from memory
– No matching memory: MT (HUB Engine)
– Raw MT download: FlashTrans raw output
§ Requestor decision process
– Is this information required? No → no need/stop processing
– Yes → is the information sufficient? Yes → use the translation as-is
– No → request post-edit from Language Services
§ Post-edit
– Post-edit produces the edited translation, which becomes the final translation
– Post-edited data stored in WB memory
– Post-edited data used for engine training
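The requestor decision process in this workflow comes down to two yes/no questions. The sketch below is one way to write that branch logic. The `Action` enum and the function name are hypothetical, introduced only for illustration; they are not part of FlashTrans or Wordbee.

```python
from enum import Enum


class Action(Enum):
    STOP = "no need, stop processing"
    USE_AS_IS = "use the translation as-is"
    REQUEST_POST_EDIT = "request post-edit from Language Services"


def requestor_decision(required: bool, sufficient: bool) -> Action:
    """Map the two workflow questions onto the next step.

    required:   "Is this information required?"
    sufficient: "Is the information sufficient?" (only asked if required)
    """
    if not required:
        # The raw output showed the text is not needed, so no further cost.
        return Action.STOP
    if sufficient:
        # Raw output (memory match or MT) is good enough for the reader.
        return Action.USE_AS_IS
    # Otherwise a human post-edit is requested; per the workflow, the
    # result is stored in memory and later used for engine training.
    return Action.REQUEST_POST_EDIT
```

Only the last branch costs linguist time, which is why raw output is shown to the requestor first.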
17. 17
How do you rate translation quality?
Feedback from field - Initial survey result (sample 1)
- The majority of users evaluated the MT system as very useful
- Japanese users evaluated the system as less useful than English users
18. 18
Feedback from field - Initial survey result (sample 2)
How do you rate the convenience of FlashTrans?
- Many expressed a preference for off-line access
- Again, Japanese users' evaluation is lower
- Japanese users in Japan are having problems with slow network speed at their work
sites
19. 19
Linguistic problems 1/2
§ Negation
Source: ただし、この状態はB社の装置の状態ほど悪くはありません。
MT: However, this status is as bad as the status of the machine for the company B.
(The source says the status is not as bad; the MT output drops the negation.)
§ Katakana recognition
Source: これは清掃とウエハロット選別をすれば大丈夫という事で宜しいのでしょうか。
MT: Are you sure you want a clean and ウエハロット selection can be avoided if this is?
(ウエハロット, “wafer lot,” is left untranslated.)
§ Wrong word order
Source: XYZ上面にアイボルト(x4)を取付ける。
MT: on top of the XYZ eye bolt (x4) install the.
Correct: Install the eye bolts (x4) on the top of the XYZ.
20. 20
Linguistic problems 2/2
§ Irregular capitalization in target segments
§ Extra spaces in target segments
§ Double-byte characters in target segments
(配管を接続する。)
( connect the tubing. )
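The three surface problems above (capitalization, extra spaces, double-byte characters) are mechanical, so they could be handled by a cleanup pass before post-editing. The following is a minimal sketch under stated assumptions: the function is hypothetical and not part of the system described, NFKC normalization also rewrites other compatibility characters (for example, an ellipsis character becomes three periods), and capitalizing the first letter would be wrong for segments that begin with a case-sensitive identifier.

```python
import re
import unicodedata


def clean_target_segment(segment: str) -> str:
    """Tidy an English MT target segment (illustrative post-processing)."""
    # Fold full-width (double-byte) letters, digits, brackets, and spaces
    # into their ASCII equivalents.
    text = unicodedata.normalize("NFKC", segment)
    # Collapse runs of whitespace and trim the ends.
    text = re.sub(r"\s+", " ", text).strip()
    # Remove spaces just inside opening and closing brackets.
    text = re.sub(r"([(\[])\s+", r"\1", text)
    text = re.sub(r"\s+([)\]])", r"\1", text)
    # Capitalize the first alphabetic character of the segment.
    for i, ch in enumerate(text):
        if ch.isalpha():
            text = text[:i] + ch.upper() + text[i + 1:]
            break
    return text
```

Applied to the example above, `clean_target_segment("( connect the tubing. )")` gives `"(Connect the tubing.)"`.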
22. 22
Efforts and effects for improving MT process
(Table: Efforts / Effects / Note)
§ Post-editing guidelines (light, medium, heavy)
– Effect: Big step for overall process improvement
– Note: This is a process change. PE is faster, more effective based on light, medium, heavy project rules
§ Periodic retraining (large data for specific problems; adding TM to MT from HT)
– Effect: Biggest effect, though relatively small
– Note: Not programmatic (no access to phrase table, etc.) and so inefficient
§ Adding terms to dictionary
– Effect: Slight but steady
– Note: Very good on noun piles (e.g., UI strings) and unique items such as personal names (e.g., 山口 was Mountain Mouth and now is Yamaguchi)
§ Post-editing additional e-mail material
– Effect: Mostly helps in the "set phrase" 決まり文句, Nikon-go (Nikon-ese) area
– Note: Initially spent most time on this but got the least impact, so stopped
§ Set phrases added in separate TM for 100% match
– Effect: Slight but steady
– Note: This is follow-up after we stopped PE of emails (above)
§ Source document writing guidelines
– Effect: Negative
– Note: Few follow the guidelines (no writing tool available), and no one is happy with them
§ Pre-edit of the source document
– Effect: Negative
– Note: Pre-edit takes longer than HT, and few SMEs have pre-edit skills
§ Concentrate on already well-written, well-structured documents
– Effect: High (memory tuning is easier, more TM)
– Note: This is a process change. Spend less time on non-standard language (emails) and more on standardized language
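The "separate TM for 100% match" effort amounts to an exact-match lookup that runs before the MT engine is called. The sketch below shows that idea only. The function names, the NFKC-based key normalization, and the sample entry are assumptions for illustration; the sample phrase is not taken from Nikon's memory, and the real lookup is done inside the TMS.

```python
import unicodedata
from typing import Callable, Dict, Tuple


def normalize_key(text: str) -> str:
    # Fold width variants and trim whitespace so trivially different
    # spellings of the same set phrase still match exactly.
    return unicodedata.normalize("NFKC", text).strip()


def build_set_phrase_tm(entries: Dict[str, str]) -> Dict[str, str]:
    # Store entries under normalized keys.
    return {normalize_key(src): tgt for src, tgt in entries.items()}


def translate_segment(
    source: str,
    set_phrase_tm: Dict[str, str],
    mt: Callable[[str], str],
) -> Tuple[str, str]:
    """Return (translation, origin), trying the set-phrase TM first."""
    key = normalize_key(source)
    if key in set_phrase_tm:
        # 100% match: the vetted human translation is used unchanged.
        return set_phrase_tm[key], "set-phrase TM"
    # No exact match: fall back to the MT engine (passed in as a callable).
    return mt(source), "MT"


# Hypothetical entry and a stand-in for the MT engine.
tm = build_set_phrase_tm({"よろしくお願いします。": "Thank you in advance."})
result = translate_segment(" よろしくお願いします。 ", tm, lambda s: "<raw MT>")
```

Keeping set phrases in their own memory means they can be maintained separately from the document memories, without retraining the engine.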
23. 23
Near-term Efforts
Our internal near-term efforts
- Integrate MT evaluation tools into TMS
- Develop multiple MT engines:
- SMT engine for procedural documents
- Hybrid for e-mail
- Integration of MT, TMS, and SharePoint
- Integration of FlashTrans in MS Office
- Use MT for chat
- MT offline