TAUS MACHINE TRANSLATION SHOWCASE
Vancouver, Canada
The Simplified Guide to Getting Started in SMT
Wednesday, 29 October 2014
Tom Hoar, Precision Translation Tools
The research within the project MosesCore leading to these results has received funding from the European Union 7th Framework Programme, grant agreement no 288487
  
The Simplified Guide to Getting Started in SMT
Professional tools
Professional expertise
  
Who We Are
PTTools
•  Software vendor - founded Feb 2010
– Adobe : Photoshop
– PTTools : DoMT
•  DoMT brand
– DoMT Desktop: organize and manage training corpora, models and custom workflows.
– DoMT Server: automation solution
•  Customer education
AGENDA
Current State of SMT
Getting Started
Skill Requirements
Use Cases
Q&A
  
Current SMT
Current State
•  Who has not heard of SMT?
•  Requires powerful, expensive hardware
•  Huge translation memories
•  Complicated processes
•  Dearth of skilled personnel
  
Then vs Now

                    2007                       2014
Hardware            50 CPUs in private cloud   One 24-CPU machine
Mega corpus         2 weeks                    36 hours
Cost                US $100K++                 US $1,500

                    1992                       2014
Computer            SGI @ $100K                Dell @ $5,000
Software            Eclipse Alias @ $25K       Adobe CS Cloud $1,500
Graphic Production  $300 per hour              $30++ per hour
  
Business Models
•  Where is the work done?
•  Who does the work?
•  Outsourced
– Free
– For Fee
•  Insourced
– Enterprise Server
– Desktop Application
  
Reality 2014
•  Inexpensive capable hardware exists
•  Translation memories within reach
•  Processes migrating to software
•  Training available for existing personnel
  
“Simple Guide”
Is Academic Moses Enough?
“There are considerable amounts of additional functionality... that are not included in Moses that are essential in order to offer a strong and innovative commercial MT platform.”
– Philipp Koehn – Professor, University of Edinburgh
(http://kv-emptypages.blogspot.com/2013/09/understanding-mt-customization.html)
Getting Started
•  Manage Corpora
•  Manage SMT Models
•  Produce MT
•  Post Edit Results
  
Manage Corpora
•  Acquire
– Translation memory archives
– Public corpora
– Convert docs
– Recycle post-edited MT
•  Process
– Transform/filter
– Curate/categorize
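The "Transform/filter" step can be illustrated with a minimal sketch. It assumes the corpus is already a list of (source, target) string pairs; the function name and the thresholds max_ratio and max_tokens are illustrative choices, not values from the talk. It also splits on whitespace, so it would not suit unsegmented languages such as Chinese without a tokenizer.

```python
def filter_segment_pairs(pairs, max_ratio=3.0, max_tokens=80):
    """Return cleaned, de-duplicated (source, target) pairs.

    max_ratio and max_tokens are hypothetical thresholds for illustration.
    """
    seen = set()
    kept = []
    for source, target in pairs:
        # Collapse runs of whitespace and trim both sides.
        source = " ".join(source.split())
        target = " ".join(target.split())
        # Drop pairs where either side is empty.
        if not source or not target:
            continue
        src_len = len(source.split())
        tgt_len = len(target.split())
        # Drop overly long segments.
        if src_len > max_tokens or tgt_len > max_tokens:
            continue
        # Drop pairs whose lengths differ too much to be likely translations.
        if max(src_len, tgt_len) / min(src_len, tgt_len) > max_ratio:
            continue
        # Drop exact duplicates while preserving the original order.
        key = (source, target)
        if key in seen:
            continue
        seen.add(key)
        kept.append(key)
    return kept


if __name__ == "__main__":
    sample = [
        ("Hello  world", "Hola mundo"),
        ("Hello world", "Hola mundo"),  # duplicate after normalization
        ("", "x"),                      # empty source
        ("a", "b c d e f"),             # length ratio too high
        ("Save", "Guardar"),
    ]
    for pair in filter_segment_pairs(sample):
        print(pair)
```

The thresholds are the part most worth tuning per language pair and content type.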
  
Manage SMT Models
•  Train translation models
•  Train language model
•  Tune SMT model
•  Evaluate SMT model
•  Deploy SMT engine
•  Versioning
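Tuning and evaluation each need segment pairs that were held out of training. A minimal sketch of such a split follows, assuming the corpus is a list of (source, target) pairs; the function name, the set sizes and the seed are illustrative assumptions, not settings from the talk.

```python
import random


def split_corpus(pairs, tune_size=2000, test_size=2000, seed=13):
    """Split pairs into disjoint train, tune and test sets.

    tune_size, test_size and seed are hypothetical defaults for illustration.
    """
    if tune_size + test_size >= len(pairs):
        raise ValueError("corpus too small for the requested held-out sets")
    shuffled = list(pairs)
    # A fixed seed makes the split reproducible, which helps when
    # comparing model versions against the same test set.
    random.Random(seed).shuffle(shuffled)
    tune = shuffled[:tune_size]
    test = shuffled[tune_size:tune_size + test_size]
    train = shuffled[tune_size + test_size:]
    return train, tune, test


if __name__ == "__main__":
    corpus = [(f"source {i}", f"target {i}") for i in range(100)]
    train, tune, test = split_corpus(corpus, tune_size=10, test_size=10)
    print(len(train), len(tune), len(test))
```

Keeping the test set fixed across retraining runs is what makes scores from different model versions comparable.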
  
Produce MT
•  Manual
– Import/export TMX
– Import/export XLIFF
– Doc-to-doc support
•  Automation
– TMS Integration
– CAT Integration
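As an illustration of the TMX import step, the sketch below reads segment pairs from a TMX document with Python's standard library. It is a simplified example and not how DoMT does it: the function name is hypothetical, language codes are matched exactly (so "en-US" would not match "en"), and the sample document is trimmed to the elements the code reads.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_tmx_pairs(tmx_text, source_lang, target_lang):
    """Return (source, target) pairs from the <tu> elements of a TMX string."""
    source_lang = source_lang.lower()
    target_lang = target_lang.lower()
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        segments = {}
        for tuv in tu.iter("tuv"):
            # Older TMX versions use a plain "lang" attribute.
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() concatenates all text in the segment,
                # including text held inside inline elements.
                segments[lang] = "".join(seg.itertext()).strip()
        if source_lang in segments and target_lang in segments:
            pairs.append((segments[source_lang], segments[target_lang]))
    return pairs


# Trimmed sample for illustration; a real TMX header carries more attributes.
SAMPLE_TMX = """<tmx version="1.4">
  <header srclang="en"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Save</seg></tuv>
      <tuv xml:lang="es"><seg>Guardar</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Open file</seg></tuv>
      <tuv xml:lang="es"><seg>Abrir archivo</seg></tuv>
    </tu>
  </body>
</tmx>"""

if __name__ == "__main__":
    for pair in read_tmx_pairs(SAMPLE_TMX, "en", "es"):
        print(pair)
```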
  
Post-edit Results
•  Subject of other presentations
•  Recycle as new corpus?
  
Human Resources
SMT Specialists
•  Computational linguists are scientists who specialize in language and computing to create and advance the science.
•  Specialists are localization engineers who review the data and select tools to prepare a training corpus that minimizes post-editing in commercial production.
  
Specialist’s Required Skills
•  Organization skills (e.g. manage TMs)
•  Observant of patterns
•  Willingness to learn
•  Regular expressions – helpful
•  Programming skills – unnecessary
•  Computational linguists – unnecessary
•  System Administrator – unnecessary
  
Observant of Patterns
Technical pattern
Linguistic patterns
Observant of Patterns
<ut>{cs6f1cf6lang1024 </ut>&lt;span class=&quot;small-text&quot;&gt;<ut>} </ut>Copyright © 1997-2009 &amp;nbsp; n n
•  Archived TMX content
– RTF
– HTML & XML-escaped HTML
– XML
– Broken programmer’s markup
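Spotting the technical pattern in a segment like the one above is what makes regular expressions "helpful". The following is a rough sketch of a cleanup for that segment, assuming the goal is plain text for training. The function name and the two-pass unescape are assumptions made for this example; double unescaping would be wrong for content where an escaped entity is meant to stay literal.

```python
import html
import re

# <ut>...</ut> wraps formatting codes carried over from the original file.
UT_RE = re.compile(r"<ut>.*?</ut>", re.DOTALL)
# Any remaining element-style tag, such as <span class="...">.
TAG_RE = re.compile(r"<[^>]+>")


def clean_segment(raw):
    """Strip placeholder tags and escaped HTML, returning plain text."""
    text = UT_RE.sub("", raw)    # remove <ut> placeholders and their payload
    text = html.unescape(text)   # &lt;span ...&gt; becomes <span ...>
    text = html.unescape(text)   # second pass: &amp;nbsp; -> &nbsp; -> no-break space
    text = TAG_RE.sub("", text)  # remove the now-visible HTML tags
    return " ".join(text.split())  # collapse whitespace, including no-break spaces


if __name__ == "__main__":
    raw = (
        "<ut>{cs6f1cf6lang1024 </ut>&lt;span class=&quot;small-text&quot;&gt;"
        "<ut>} </ut>Copyright © 1997-2009 &amp;nbsp;"
    )
    print(clean_segment(raw))
```

Each archive tends to need its own small set of patterns, which is why the slide stresses observation over programming skill.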
  
Use Cases
•  Large LSP
– Extensive MT experience
– CSA Top 10
•  2 Medium LSPs
– Post-editing experience
– In-house localization engineers
•  Freelance Translator
– United Nations contractor
– Technically savvy
  
Welocalize
•  Work: Software localization
•  Hardware: Virtual machines for pilot
•  SMT models: EN-ES, EN-DE, EN-ZH, EN-RU
•  Corpus: All corpora < 500,000 segment pairs
•  Training: 3-month pilot
•  Results: “Approached outsourcing vendors”
– Zero-edit measure: 25-45%
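The slides report a zero-edit measure without defining it. A minimal sketch follows, assuming it means the share of MT segments that the post-editor left unchanged; that definition and the function name are assumptions for illustration, not something the talk states.

```python
def zero_edit_rate(mt_segments, post_edited_segments):
    """Fraction of MT segments identical to their post-edited version.

    Assumed definition: a segment counts as zero-edit when the MT output
    and the post-edited text match after trimming surrounding whitespace.
    """
    if len(mt_segments) != len(post_edited_segments):
        raise ValueError("both lists must have the same number of segments")
    if not mt_segments:
        return 0.0
    unchanged = sum(
        1
        for mt, edited in zip(mt_segments, post_edited_segments)
        if mt.strip() == edited.strip()
    )
    return unchanged / len(mt_segments)


if __name__ == "__main__":
    mt = ["Guardar archivo", "Abrir el archivo", "Cerrar", "Imprimir pagina"]
    edited = ["Guardar archivo", "Abrir archivo", "Cerrar", "Imprimir página"]
    print(f"{zero_edit_rate(mt, edited):.0%}")
```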
  
EQHO Communications
•  Work: Software localization
•  Hardware: $1,500 new 6-core computer
•  SMT model: EN <-> European language
•  Corpus: ~130,000 segment pairs
•  Training: 3-month pilot
•  Results: BLEU scores 80 to 85
– Zero-edit measure: 23-43%
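For readers new to the BLEU scores quoted in these use cases, here is a simplified sketch of the metric: clipped n-gram precision up to 4-grams, geometric mean, and a brevity penalty, with one reference per segment. It assumes whitespace tokenization and a 0 to 100 scale, so its numbers will not exactly match production tools such as the multi-bleu.perl script shipped with Moses. The function names are illustrative.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU on a 0-100 scale, one reference per hypothesis."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_tokens, ref_tokens = hyp.split(), ref.split()
        hyp_len += len(hyp_tokens)
        ref_len += len(ref_tokens)
        for n in range(1, max_n + 1):
            hyp_counts = ngram_counts(hyp_tokens, n)
            ref_counts = ngram_counts(ref_tokens, n)
            # Counter intersection keeps the minimum count per n-gram,
            # which is the "clipped" match count BLEU uses.
            matches[n - 1] += sum((hyp_counts & ref_counts).values())
            totals[n - 1] += sum(hyp_counts.values())
    # Without smoothing, any n-gram order with no matches gives a score of 0.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Penalize output that is shorter than the reference.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


if __name__ == "__main__":
    reference = ["the cat sat on the mat"]
    print(corpus_bleu(["the cat sat on the mat"], reference))
    print(corpus_bleu(["the cat sat on a mat"], reference))
```

Scores in the 80s, as reported here, usually indicate test content very close to the training corpus, which is typical of narrow-domain work.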
  
Mid-sized European LSP
•  Work: Financial and regulatory reports
•  SMT model: EN <-> European language
•  Corpus: ~800,000 segment pairs (25 years)
•  Training: 20 hours of tutorials over 2 months
•  Homework: Categorize TMs for 4+ months
•  Results: BLEU scores rose from low 50s to mid-80s
  
Freelance Translator
•  Work: United Nations environmental reports
•  Hardware: $1,500 new 6-core computer
•  SMT model: EN <-> European language
•  Corpus: ~250,000 segment pairs (25 years)
•  Training: 40 hours of tutorials over 2 months
•  Results: BLEU scores 75 to 85
– Zero-edit measure: averaged 35%
  
Conclusion
•  Regardless of business model
– Manage Corpora
– Generate Models
– Produce MT
– Publish Results
•  Re-purpose existing staff with training
•  Rightsourcing
  

Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 

Recently uploaded (20)

DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 

TAUS Machine Translation Showcase, The Simplified Guide to Getting Started in SMT, Precision Translation Tools, 2014

  • 1. TAUS MACHINE TRANSLATION SHOWCASE   Vancouver, Canada   The Simplified Guide to Getting Started in SMT   Wednesday, 29 October 2014   Tom Hoar, Precision Translation Tools   The research within the project MosesCore leading to these results has received funding from the European Union 7th Framework Programme, grant agreement no 288487
  • 2. The Simplified Guide to Getting Started in SMT   Professional tools   Professional expertise
  • 3. PTTools   • Software vendor - founded Feb 2010   – Adobe : Photoshop   – PTTools : DoMT   • DoMT brand   – DoMT Desktop: organize and manage training corpora, models and custom workflows.   – DoMT Server: automation solution   • Customer education   Who We Are
  • 4. AGENDA   Current State of SMT   Getting Started   Skill Requirements   Use Cases   Q&A   Current SMT
  • 5. Current State   • Who has not heard of SMT?   • Requires powerful, expensive hardware   • Huge translation memories   • Complicated processes   • Dearth of skilled personnel
  • 6. Then vs Now
                            2007                       2014
      Hardware              50 CPUs in private cloud   One 24-CPU machine
      Mega corpus           2 weeks                    36 hours
      Cost                  US $100K++                 US $1,500

                            1992                       2014
      Computer              SGI @ $100K                Dell @ $5,000
      Software              Eclipse Alias @ $25K       Adobe CS Cloud $1,500
      Graphic Production    $300 per hour              $30++ per hour
  • 7. Business Models   • Where is the work done?   • Who does the work?   • Outsourced   – Free   – For Fee   • Insourced   – Enterprise Server   – Desktop Application
  • 8. Reality 2014   • Inexpensive capable hardware exists   • Translation memories within reach   • Processes migrating to software   • Training available for existing personnel
  • 9. AGENDA   Current State of SMT   Getting Started   Skill Requirements   Use Cases   Q&A   “Simple Guide”
  • 10. Is Academic Moses Enough?   “There are considerable amounts of additional functionality... that are not included in Moses that are essential in order to offer a strong and innovative commercial MT platform.”   – Philipp Koehn – Professor, University of Edinburgh   (http://kv-emptypages.blogspot.com/2013/09/understanding-mt-customization.html)
  • 11. Getting Started   • Manage Corpora   • Manage SMT Models   • Produce MT   • Post Edit Results
  • 12. Manage Corpora   • Acquire   – Translation memory archives   – Public corpora   – Convert docs   – Recycle post-edited MT   • Process   – Transform/filter   – Curate/categorize
  • 13. Manage SMT Models   • Train Translation models   • Train Language model   • Tune SMT model   • Evaluate SMT model   • Deploy SMT engine   • Versioning
  • 14. Produce MT   • Manual   – Import/export TMX   – Import/export XLIFF   – Doc-to-doc support   • Automation   – TMS Integration   – CAT Integration
  • 15. Post-edit Results   • Subject of other presentations   • Recycle as new corpus?
  • 16. AGENDA   Current State of SMT   Getting Started   Skill Requirements   Use Cases   Q&A   Human Resources
  • 17. SMT Specialists   • Computational linguists are scientists who specialize in language and computing to create and advance the science.   • Specialists are localization engineers who review the data and select tools to prepare a training corpus that minimizes post-editing in commercial production.
  • 18. Specialist’s Required Skills   • Organization skills (e.g. manage TM’s)   • Observant of patterns   • Willingness to learn   • Regular expression – helpful   • Programming skills – unnecessary   • Computational linguists – unnecessary   • System Administrator – unnecessary
  • 19. Observant of Patterns   Technical pattern   Linguistic patterns
  • 20. Observant of Patterns   <ut>{cs6f1cf6lang1024 </ut> &lt;span class=&quot;small-text&quot;&gt; <ut>} </ut>Copyright © 1997-2009 &amp;nbsp; n n   • Archived TMX content   – RTF   – HTML & XML-escaped HTML   – XML   – Broken programmer’s markup
  • 21. AGENDA   Current State of SMT   Getting Started   Skill Requirements   Use Cases   Q&A   Use Cases
  • 22. Use Cases   • Large LSP   – Extensive MT experience   – CSA Top 10   • 2 Medium LSP’s   – Post-editing experience   – In-house localization engineers   • Freelance Translator   – United Nations contractor   – Technically savvy
  • 23. Welocalize   • Work: Software localization   • Hardware: Virtual machines for pilot   • SMT models: EN-ES, EN-DE, EN-ZH, EN-RU   • Corpus: All corpora < 500,000 segment pairs   • Training: 3-month pilot   • Results: “Approached outsourcing vendors”   – Zero-edit measure: 25-45%
  • 24. EQHO Communications   • Work: Software localization   • Hardware: $1,500 new 6-core computer   • SMT model: EN <-> European language   • Corpus: ~130,000 segment pairs   • Training: 3-month pilot   • Results: BLEU’s 80 to 85   – Zero-edit measure: 23-43%
  • 25. Mid-sized European LSP   • Work: Financial and regulatory reports   • SMT model: EN <-> European language   • Corpus: ~800,000 segment pairs (25 years)   • Training: 20 hours of tutorials over 2 months   • Homework: Categorize TM’s for 4+ months   • Results: BLEU’s rose from low 50’s to mid-80’s
  • 26. Freelance Translator   • Work: United Nations environmental reports   • Hardware: $1,500 new 6-core computer   • SMT model: EN <-> European language   • Corpus: ~250,000 segment pairs (25 years)   • Training: 40 hours of tutorials over 2 months   • Results: BLEU’s 75 to 85   – Zero-edit measure: averaged 35%
  • 27. Conclusion   • Regardless of business model   – Manage Corpora   – Generate Models   – Produce MT   – Publish Results   • Re-purpose existing staff with training   • Rightsourcing
  • 28. AGENDA   Current State of SMT   Getting Started   Skill Requirements   Use Cases   Q&A
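
The markup debris shown on slide 20 (RTF control runs inside <ut> elements, XML-escaped HTML, doubly escaped entities) is the kind of technical pattern that a few regular expressions can strip before training. The following is a minimal Python sketch, not a method taken from DoMT or Moses: the function name clean_segment and the three patterns are hypothetical, and the sample is the slide's segment without its trailing "n n" residue.

```python
import html
import re

# Hypothetical cleanup rules for illustration; not part of DoMT or Moses.
UT_TAG = re.compile(r"<ut>.*?</ut>", re.DOTALL)  # TMX inline "unknown tag" elements
MARKUP = re.compile(r"<[^>]+>")                  # any remaining HTML/XML tag
SPACES = re.compile(r"\s+")                      # whitespace runs, incl. no-break spaces


def clean_segment(segment: str) -> str:
    """Strip inline markup from one TMX segment and normalize whitespace."""
    # Drop the <ut> wrappers together with the RTF control runs they contain.
    text = UT_TAG.sub(" ", segment)
    # Unescape until stable so doubly escaped entities (&amp;nbsp;) resolve.
    for _ in range(3):
        unescaped = html.unescape(text)
        if unescaped == text:
            break
        text = unescaped
    # Remove the tags that unescaping revealed, then collapse whitespace.
    text = MARKUP.sub(" ", text)
    return SPACES.sub(" ", text).strip()


sample = (
    '<ut>{cs6f1cf6lang1024 </ut> &lt;span class=&quot;small-text&quot;&gt; '
    '<ut>} </ut>Copyright © 1997-2009 &amp;nbsp;'
)
print(clean_segment(sample))  # prints: Copyright © 1997-2009
```

The tag pattern is deliberately blunt: it would also delete a literal "<" ... ">" pair in ordinary text, so a production filter would need to review what it removes rather than apply it blindly.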