TAUS MT SHOWCASE, Creating Competitive Advantage with Rapid Customization & Deployment of Moses, Tony O’Dowd, KantanMT, 10 October 2013

This presentation is a part of the MosesCore project that encourages the development and usage of open source machine translation tools, notably the Moses statistical MT toolkit. 

MosesCore is supported by the European Commission Grant Number 288487 under the 7th Framework Programme. 




For the latest updates go to http://www.statmt.org/mosescore/
or follow us on Twitter - #MosesCore

Published in: Technology, Business

  1. TAUS MACHINE TRANSLATION SHOWCASE
     Creating Competitive Advantage with Rapid Customization & Deployment of Moses
     10:20 – 10:30, Thursday, 10 October 2013
     Tony O’Dowd, KantanMT
  2. No Hardware. No Software. No Hassle MT.
     Tony O’Dowd, Founder & Chief Architect
     Localization World 2013
  3. What we aim to cover today?
     - User Scenario #1
       - Building Production MT Systems
         - Structured Approach
         - Build – Measure – Learn Process
     - User Scenario #2
       - Retraining with Post-Edits
       - RoundTable Inc. – their story
     - User Scenario #3
       - Selecting the best engine for the job
         - Milengo – their approach
         - Getting the Translator involved
     - Q&A
     20 Minutes
     TAUS – MT Showcase
  4. What is KantanMT.com?
     - Statistical MT System
     - Cloud-based
       - Highly scalable
       - Inexpensive to operate
       - Quick to deploy
     - Our Vision
       - To put Machine Translation Customization, Improvement and Deployment into your hands
     Fully Operational 7 months
     Active KantanMT Engines: 6,632
     Training Words Uploaded: 23,653,605,925
     Member Words Translated: 362,291,925
  5. Measure – KantanMT engine calibration
     - Track using KantanWatch™
     - Compare engines quickly
     - Monitor production data
     - Use your own test/tune data sets
  6. Learn – KantanMT Experimentation
  7. Learn – KantanMT Experimentation
     - What to look out for?
       BLEU: 24%
       F-Measure: 50%
       TER: 66%
       Wordcount: 172K
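The slide reports an F-Measure alongside BLEU and TER but does not say how KantanMT computes it. As a minimal sketch, assuming a word-level (unigram) F-measure between one MT output and one reference translation, the score is the harmonic mean of precision and recall over matching words. The function name `f_measure` and the whitespace tokenisation are illustrative choices, not KantanMT's implementation.

```python
from collections import Counter


def f_measure(hypothesis: str, reference: str) -> float:
    """Unigram F1 between an MT output and a reference (illustrative only)."""
    # Bag-of-words counts; lowercasing and whitespace splitting are assumptions.
    hyp = Counter(hypothesis.lower().split())
    ref = Counter(reference.lower().split())

    # Words shared by both, counted with multiplicity.
    overlap = sum((hyp & ref).values())
    if overlap == 0:
        return 0.0

    precision = overlap / sum(hyp.values())  # share of MT words found in the reference
    recall = overlap / sum(ref.values())     # share of reference words found in the MT output
    return 2 * precision * recall / (precision + recall)


score = f_measure("the cat", "the cat sat on")
```

In this example the output covers only half of the reference, so precision is 1.0, recall is 0.5, and the harmonic mean is pulled towards the lower of the two.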
  8. Learn – KantanMT Experimentation
     - Learn from examining the output
       Ratings shown: Low, OK, High, Low
       - Catalogue Errors
         - Untranslated text
         - Incorrect numeric formatting
         - Invalid characters
         - High level of post-editing required
       - Conclusions
         - Engine coverage is bad due to low wordcount
         - Post-Editing is high due to low engine coverage
         - Training data doesn’t contain correct numeric formatting
         - Bad formatting in training data
  9. Learn – KantanMT Experimentation
     - Learn from examining the output
       Ratings shown: Low, OK, High, Low
       - Action Plan
         - Coverage – More training data required, relevant and of high quality. Also use a Glossary File to improve terminology consistency and accuracy.
         - Numeric Formatting – Use PEX rule to post-edit translation and fix numeric formats
         - Invalid Character – Use PEX rule to fix this invalid character issue
         - Post-Editing – By increasing the quantity of training data the KantanMT engine will perform better overall
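The action plan relies on PEX rules to repair numeric formats after translation, but the slides do not show the PEX rule syntax. The following is therefore not a PEX rule. It is a hedged Python sketch of the same idea: a search-and-replace pass over MT output that rewrites English-style numbers (1,234.56) into a comma-decimal style (1.234,56). The target format, the pattern `NUMBER`, and the function `fix_numeric_format` are all assumptions made for illustration.

```python
import re

# English-style number: optional thousands groups, optional decimal part.
# The lookarounds skip longer digit runs and dotted strings such as dates.
NUMBER = re.compile(r"(?<![\d.,])\d{1,3}(?:,\d{3})*(?:\.\d+)?(?![.,]?\d)")

# Swap the two separators in a single pass so they do not overwrite each other.
SWAP = str.maketrans({",": ".", ".": ","})


def fix_numeric_format(text: str) -> str:
    """Rewrite 1,234.56 as 1.234,56 in MT output (illustrative rule only)."""
    return NUMBER.sub(lambda m: m.group(0).translate(SWAP), text)


fixed = fix_numeric_format("Total: 1,234.56 units")
```

A rule like this is deliberately narrow. Version numbers such as 2.0 would still be rewritten, so a production rule would need to be checked against real engine output before use.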
  10. Action Plan – focus on improving measurements
  11. Build – Measure – Learn: The Results
      - Analyse output
        - Untranslated text
        - Numeric Formatting
        - Invalid Character
  12. User Scenario #2
      Early Adopter
      - Long history of MT usage
      - In-house expertise
      - Large customer demand
      - Using MT since 2005
      - Now manage their own in-house system on KantanMT.com
      - Goal
        - Faster project turnaround times
        - More service offerings to client base
        - More production capacity
        - Cost efficiencies
      About RoundTable Studio
      RoundTable Studio is a leading provider of translation and localization services for the Spanish and Brazilian Portuguese language markets.
  13. User Scenario #2
      Early Adopter
      - Business Scenario
        - Continuous translation quality improvement
        - Reduced post-editing/turn-around times
  14. User Scenario #2
      Early Adopter
      - Results
        - Greater production capacity
        - Improvement in quality
        - Faster project turn-around times
      “Since signing up with KantanMT, we have been able to take on more work and increase our capacity levels”
      Laura Grossi – MT Specialist, RoundTable Studio
  15. User Scenario #3
      - Long history of MT usage
      - In-house expertise
      - Large customer demand
      - Originally outsourced MT
        - 3rd party consultancy company
      - Vendor Agnostic
        - Microsoft Translator Hub
        - KantanMT.com
        - All systems are cloud based
      - Like hands-on approach to managing their own MT engines
      About Milengo
      Milengo provides translation, localization and related language services specializing in software, website and documentation localization.
  16. User Scenario #3
      - Business Scenario
        - Select best engine for language combination
        - Client requests a job that involves an MT component
      - Finding Training Data
        - Data is aggregated from the client's previous translations
      - Building Engines
        - Same training data is provided to each engine
        - Same language combinations
        - Iterative process until satisfied with system performance (internal process)
  17. User Scenario #3
      - Translation Quality Analysis
        - Sample of 1,000 segments selected
        - Tabulated & anonymised
        - Dispatched to Senior Translators
      Columns shown in the review sheet: Source; MT Target; Spacing; Syntax and Grammar; Locale Adaptation; Tags and Markup; Sentence Structure; Punctuation; Wrong Part of Speech; Style; Wrong Word Form; Capitalization; Text/Information added; Literal translation; Compliance with client specs; Source not Translated/Omissions; Wrong Spelling; Wrong terminology; Overall quality (1-4); Fluency (Score 1-5); Adequacy (Score 1-5)
      Tech
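The sampling step on this slide can be sketched as follows. This is a hypothetical illustration, not Milengo's tooling: it assumes "anonymised" means that engine names are hidden from reviewers behind neutral labels, and that each engine has translated the same list of source segments. The function `build_review_sheet` and its field names are invented for the example.

```python
import random


def build_review_sheet(segments, engine_outputs, sample_size=1000, seed=0):
    """Sample segments and hide engine names behind neutral labels.

    segments: list of source strings.
    engine_outputs: dict mapping engine name -> list of MT targets,
        aligned with `segments` (an assumption of this sketch).
    Returns (rows, key), where `key` maps real engine names to blind labels
    and is kept away from the reviewers.
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    count = min(sample_size, len(segments))
    indices = sorted(rng.sample(range(len(segments)), count))

    # Shuffle engines so that label order does not reveal which engine is which.
    engines = sorted(engine_outputs)
    rng.shuffle(engines)
    key = {name: f"Engine {chr(ord('A') + i)}" for i, name in enumerate(engines)}

    rows = [
        {
            "segment_id": i,
            "source": segments[i],
            "engine": key[name],
            "mt_target": engine_outputs[name][i],
        }
        for i in indices
        for name in engines
    ]
    return rows, key
```

Each row can then be extended with the error categories and the quality, fluency and adequacy scores listed on the slide before the sheet goes to the senior translators.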
  18. User Scenario #3
      - Feedback collated from Senior Translators
        - Match best engine for language quality
        - Very unique – pseudo-crowd sourcing of most appropriate engine
      - Match engine to best language support
        - Translators always involved in engine selection process
      - Feedback to client
        - Match requirements and quality expectations
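Collating translator feedback into an engine choice could look like the sketch below. The slides do not describe how the scores are combined, so the ranking rule here (mean overall quality first, then mean adequacy, then mean fluency) is purely an assumption, as are the function `rank_engines` and the field names. The score scales follow the slide: overall quality 1-4, fluency 1-5, adequacy 1-5.

```python
from collections import defaultdict
from statistics import mean

SCORE_FIELDS = ("overall", "fluency", "adequacy")


def rank_engines(reviews):
    """Average translator scores per engine and rank engines best-first.

    reviews: list of dicts with keys "engine", "overall", "fluency", "adequacy".
    """
    by_engine = defaultdict(list)
    for review in reviews:
        by_engine[review["engine"]].append(review)

    # Mean of each score per engine.
    summary = {
        engine: {field: mean(r[field] for r in rows) for field in SCORE_FIELDS}
        for engine, rows in by_engine.items()
    }

    # Assumed tie-break order: overall quality, then adequacy, then fluency.
    ranked = sorted(
        summary,
        key=lambda e: (summary[e]["overall"], summary[e]["adequacy"], summary[e]["fluency"]),
        reverse=True,
    )
    return ranked, summary


reviews = [
    {"engine": "Engine A", "overall": 3, "fluency": 4, "adequacy": 4},
    {"engine": "Engine A", "overall": 4, "fluency": 5, "adequacy": 4},
    {"engine": "Engine B", "overall": 2, "fluency": 3, "adequacy": 4},
    {"engine": "Engine B", "overall": 3, "fluency": 3, "adequacy": 3},
]
ranked, summary = rank_engines(reviews)
```

Keeping the per-engine summary alongside the ranking makes it easier to report back to the client why one engine was matched to a given language combination.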
  19. User Scenario #3
      - Levels of post-editing services
      - Adequacy Review
        - All meaning expressed in the source segment appears in the translated segment
        - Structural integrity – tags, placeholders
        - Fit-for-purpose quality
      - Fluency Review
        - No grammar errors, excellent word selection and good syntax
        - Publishable quality
      - Client picks review
        - To fit budget, time-frame, audience, channel etc.
  20. Tony O’Dowd
      tonyod@kantanmt.com