A "simplified guide to SMT" is about as simple as a "simplified guide to Photoshop." Professional tools require expertise. The questions are: what level of expertise is required, how do you acquire it, and what processes contribute to a successful SMT program? These fundamentals are the same whether you plan to use an outsourcing service or operate an in-house system. This session reviews them with examples drawn from use cases with PTTools' DoMT Desktop, a commercial application with a Moses kernel.
This presentation is a part of the MosesCore project that encourages the development and usage of open source machine translation tools, notably the Moses statistical MT toolkit. MosesCore is supported by the European Commission Grant Number 288487 under the 7th Framework Programme.
For the latest updates go to http://www.statmt.org/mosescore/
or follow us on Twitter - #MosesCore
TAUS Machine Translation Showcase, The Simplified Guide to Getting Started in SMT, Precision Translation Tools, 2014
1. TAUS MACHINE TRANSLATION SHOWCASE
Vancouver, Canada
The Simplified Guide to Getting Started in SMT
Wednesday, 29 October 2014
Tom Hoar, Precision Translation Tools
The research within the project MosesCore leading to these results has received funding from the European Union 7th Framework Programme, grant agreement no 288487
2. The Simplified Guide to Getting Started in SMT
Professional tools
Professional expertise
3. PTTools
• Software vendor - founded Feb 2010
– Adobe : Photoshop
– PTTools : DoMT
• DoMT brand
– DoMT Desktop: organize and manage training corpora, models and custom workflows.
– DoMT Server: automation solution
• Customer education
Who We Are
4. AGENDA
Current State of SMT
Getting Started
Skill Requirements
Use Cases
Q&A
Current SMT
5. Current State
• Who has not heard of SMT?
• Requires powerful, expensive hardware
• Huge translation memories
• Complicated processes
• Dearth of skilled personnel
6. Then vs Now
2007 vs 2014
• Hardware: 50 CPUs in private cloud (2007); one 24-CPU machine (2014)
• Mega corpus: 2 weeks (2007); 36 hours (2014)
• Cost: US $100K++ (2007); US $1,500 (2014)
1992 vs 2014
• Computer: SGI @ $100K (1992); Dell @ $5,000 (2014)
• Software: Eclipse Alias @ $25K (1992); Adobe CS Cloud $1,500 (2014)
• Graphic Production: $300 per hour (1992); $30++ per hour (2014)
7. Business Models
• Where is the work done?
• Who does the work?
• Outsourced
– Free
– For Fee
• Insourced
– Enterprise Server
– Desktop Application
8. Reality 2014
• Inexpensive capable hardware exists
• Translation memories within reach
• Processes migrating to software
• Training available for existing personnel
9. AGENDA
Current State of SMT
Getting Started
Skill Requirements
Use Cases
Q&A
“Simple Guide”
10. Is Academic Moses Enough?
“There are considerable amounts of additional functionality... that are not included in Moses that are essential in order to offer a strong and innovative commercial MT platform.”
– Philipp Koehn – Professor, University of Edinburgh
(http://kv-emptypages.blogspot.com/2013/09/understanding-mt-customization.html)
11. Getting Started
• Manage Corpora
• Manage SMT Models
• Produce MT
• Post Edit Results
15. Post-edit Results
• Subject of other presentations
• Recycle as new corpus?
16. AGENDA
Current State of SMT
Getting Started
Skill Requirements
Use Cases
Q&A
Human Resources
17. SMT Specialists
• Computational linguists are scientists who specialize in language and computing to create and advance the science.
• Specialists are localization engineers who review the data and select tools to prepare a training corpus that minimizes post-editing in commercial production.
18. Specialist’s Required Skills
• Organization skills (e.g. manage TMs)
• Observant of patterns
• Willingness to learn
• Regular expressions – helpful
• Programming skills – unnecessary
• Computational linguists – unnecessary
• System Administrator – unnecessary
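The skills list rates regular expressions as helpful. As a minimal sketch of why, the snippet below uses Python's standard `re` module to strip inline markup and normalize whitespace in translation memory segments before they go into a training corpus. The tag pattern, the sample text, and the names `clean_segment` and `clean_pairs` are illustrative assumptions; they are not part of DoMT or Moses.

```python
import re

# Assumed pattern: inline tags such as <b>, </b> or <ph id="1"/> left over from TM exports.
TAG = re.compile(r"<[^>]+>")
SPACES = re.compile(r"\s+")


def clean_segment(text: str) -> str:
    """Replace inline tags with a space, then collapse runs of whitespace."""
    without_tags = TAG.sub(" ", text)
    return SPACES.sub(" ", without_tags).strip()


def clean_pairs(pairs):
    """Clean both sides of each (source, target) pair; drop pairs with an empty side."""
    cleaned = []
    for source, target in pairs:
        s, t = clean_segment(source), clean_segment(target)
        if s and t:
            cleaned.append((s, t))
    return cleaned


if __name__ == "__main__":
    print(clean_segment("Click <b>Save</b>  now"))
```

Replacing a tag with a space avoids gluing neighboring words together, at the cost of occasionally leaving a space before punctuation; a production cleanup would likely need more patterns than this.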
21. AGENDA
Current State of SMT
Getting Started
Skill Requirements
Use Cases
Q&A
Use Cases
22. Use Cases
• Large LSP
– Extensive MT experience
– CSA Top 10
• 2 Medium LSPs
– Post-editing experience
– In-house localization engineers
• Freelance Translator
– United Nations contractor
– Technically savvy
23. Welocalize
• Work: Software localization
• Hardware: Virtual machines for pilot
• SMT models: EN-ES, EN-DE, EN-ZH, EN-RU
• Corpus: All corpora < 500,000 segment pairs
• Training: 3-month pilot
• Results: “Approached outsourcing vendors”
– Zero-edit measure: 25-45%
24. EQHO Communications
• Work: Software localization
• Hardware: $1,500 new 6-core computer
• SMT model: EN <-> European language
• Corpus: ~130,000 segment pairs
• Training: 3-month pilot
• Results: BLEU scores 80 to 85
– Zero-edit measure: 23-43%
25. Mid-sized European LSP
• Work: Financial and regulatory reports
• SMT model: EN <-> European language
• Corpus: ~800,000 segment pairs (25 years)
• Training: 20 hours of tutorials over 2 months
• Homework: Categorize TMs for 4+ months
• Results: BLEU scores rose from low 50s to mid-80s
26. Freelance Translator
• Work: United Nations environmental reports
• Hardware: $1,500 new 6-core computer
• SMT model: EN <-> European language
• Corpus: ~250,000 segment pairs (25 years)
• Training: 40 hours of tutorials over 2 months
• Results: BLEU scores 75 to 85
– Zero-edit measure: averaged 35%
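Several use cases report a "zero-edit measure" without defining it. A minimal sketch follows, assuming the measure is the percentage of machine-translated segments that the post-editor left unchanged. That definition, the whitespace normalization, and the name `zero_edit_rate` are assumptions made for illustration; the deck does not say how the figures were computed.

```python
def zero_edit_rate(mt_segments, post_edited_segments):
    """Percentage of MT segments identical to their post-edited version.

    Assumption: segments are compared after collapsing whitespace, so that
    spacing-only differences do not count as edits.
    """
    if len(mt_segments) != len(post_edited_segments):
        raise ValueError("Both lists must contain the same number of segments")
    if not mt_segments:
        return 0.0
    unchanged = sum(
        1
        for mt, pe in zip(mt_segments, post_edited_segments)
        if " ".join(mt.split()) == " ".join(pe.split())
    )
    return 100.0 * unchanged / len(mt_segments)


if __name__ == "__main__":
    mt = ["Open the file", "Save changes", "Close window", "Print page"]
    pe = ["Open the file", "Save the changes", "Close the window", "Print this page"]
    print(zero_edit_rate(mt, pe))  # one of four segments is unchanged
```

Under this reading, a result in the 25-45% range would mean roughly a quarter to just under half of the segments needed no post-editing.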
27. Conclusion
• Regardless of business model
– Manage Corpora
– Generate Models
– Produce MT
– Publish Results
• Re-purpose existing staff with training
• Rightsourcing