TAUS MT SHOWCASE, Creating Competitive Advantage with Rapid Customization & Deployment of Moses, Tony O’Dowd, KantanMT, 10 October 2013
 

This presentation is a part of the MosesCore project that encourages the development and usage of open source machine translation tools, notably the Moses statistical MT toolkit. 

MosesCore is supported by the European Commission Grant Number 288487 under the 7th Framework Programme. 




For the latest updates go to http://www.statmt.org/mosescore/
or follow us on Twitter - #MosesCore

Usage Rights

CC Attribution License

Presentation Transcript

  • 1. TAUS MACHINE TRANSLATION SHOWCASE: Creating Competitive Advantage with Rapid Customization & Deployment of Moses. 10:20 – 10:30, Thursday, 10 October 2013. Tony O’Dowd, KantanMT
  • 2. No Hardware. No Software. No Hassle MT. Tony O’Dowd, Founder & Chief Architect. Localization World 2013
  • 3. What we aim to cover today (20 minutes)
      • User Scenario #1: Building Production MT Systems
          • Structured approach
          • Build – Measure – Learn process
      • User Scenario #2: Retraining with Post-Edits
          • RoundTable Inc. – their story
      • User Scenario #3: Selecting the best engine for the job
          • Milengo – their approach
          • Getting the translator involved
      • Q&A
    TAUS – MT Showcase
  • 4. What is KantanMT.com?
      • Statistical MT system
      • Cloud-based
          • Highly scalable
          • Inexpensive to operate
          • Quick to deploy
      • Our vision: to put Machine Translation customization, improvement and deployment into your hands
      • Fully operational: 7 months
          • Active KantanMT engines: 6,632
          • Training words uploaded: 23,653,605,925
          • Member words translated: 362,291,925
  • 5. Measure – KantanMT engine calibration
      • Track using KantanWatch™
      • Compare engines quickly
      • Monitor production data
      • Use your own test/tune data sets
  • 6. Learn – KantanMT Experimentation
  • 7. Learn – KantanMT Experimentation
      • What to look out for?
          • BLEU: 24%
          • F-Measure: 50%
          • TER: 66%
          • Wordcount: 172K
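The three scores on slide 7 are standard automatic MT metrics. As a rough illustration of what each one measures, here is a minimal Python sketch for a single tokenised segment. It is a simplification and not how KantanMT computes its figures: the BLEU uses add-one smoothing, the F-measure works on unigrams only, and the TER omits the block-shift operation of full TER. The function names are illustrative.

```python
from collections import Counter
import math


def ngram_counts(tokens, n):
    """Count the n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(hyp, ref, max_n=4):
    """Sentence-level BLEU with add-one smoothing and a brevity penalty."""
    if not hyp:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = ngram_counts(hyp, n)
        ref_counts = ngram_counts(ref, n)
        # Clipped matches: an n-gram counts at most as often as it occurs in the reference.
        matches = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = sum(hyp_counts.values())
        log_precisions.append(math.log((matches + 1) / (total + 1)))
    brevity = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
    return brevity * math.exp(sum(log_precisions) / max_n)


def f_measure(hyp, ref):
    """Harmonic mean of unigram precision and recall."""
    overlap = sum((Counter(hyp) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(hyp)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def ter(hyp, ref):
    """Word-level edit distance divided by reference length (no block shifts)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1] / max(len(ref), 1)


hyp = "the cat sat on mat".split()
ref = "the cat sat on the mat".split()
print(f"BLEU={bleu(hyp, ref):.2f}  F={f_measure(hyp, ref):.2f}  TER={ter(hyp, ref):.2f}")
```

Higher BLEU and F-measure are better, while lower TER is better, which is why a TER of 66% alongside a BLEU of 24% points to heavy post-editing.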
  • 8. Learn – KantanMT Experimentation
      • Learn from examining the output
      • Ratings shown: Low / OK / High / Low
      • Catalogue errors
          • Untranslated text
          • Incorrect numeric formatting
          • Invalid characters
          • High level of post-editing required
      • Conclusions
          • Engine coverage is bad due to low wordcount
          • Post-editing is high due to low engine coverage
          • Training data doesn’t contain correct numeric formatting
          • Bad formatting in training data
  • 9. Learn – KantanMT Experimentation
      • Learn from examining the output
      • Ratings shown: Low / OK / High / Low
      • Action plan
          • Coverage – More training data required, relevant and of high quality. Also use a Glossary File to improve terminology consistency and accuracy.
          • Numeric Formatting – Use PEX rule to post-edit translation and fix numeric formats
          • Invalid Character – Use PEX rule to fix this invalid character issue
          • Post-Editing – By increasing the quantity of training data the KantanMT engine will perform better overall
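The action plan on slide 9 fixes numeric formatting and invalid characters with rules applied to the MT output after translation. The slides do not show PEX rule syntax, so the sketch below is not PEX. It only illustrates the general idea of rule-based automatic post-editing with Python regular expressions. Both rules are hypothetical: one assumes the invalid character is the Unicode replacement character, the other assumes the target locale swaps the English thousands and decimal separators.

```python
import re


def swap_separators(match):
    """Turn 1,234.56 into 1.234,56 by swapping ',' and '.' in the matched number."""
    return match.group(0).translate(str.maketrans(",.", ".,"))


# Hypothetical post-edit rules as (compiled pattern, replacement) pairs,
# applied in order to every translated segment.
RULES = [
    # Remove the Unicode replacement character, a common sign of encoding damage.
    (re.compile("\ufffd"), ""),
    # Numbers with thousands separators (1,234 or 1,234.56) or plain decimals (3.5).
    (re.compile(r"\b\d{1,3}(?:,\d{3})+(?:\.\d+)?\b|\b\d+\.\d+\b"), swap_separators),
]


def post_edit(segment):
    """Apply every rule to one translated segment and return the result."""
    for pattern, replacement in RULES:
        segment = pattern.sub(replacement, segment)
    return segment


print(post_edit("Preis: 1,234.56 EUR\ufffd"))
```

A rule this simple would also rewrite version strings such as 1.2.3, so in practice each rule needs testing against real output before it goes into production.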
  • 10. Action Plan – focus on improving measurements
  • 11. Build, Measure, Learn: The Results
      • Analyse output
          • Untranslated text
          • Numeric formatting
          • Invalid character
  • 12. User Scenario #2 (Early Adopter)
      • Long history of MT usage
          • In-house expertise
          • Large customer demand
          • Using MT since 2005
          • Now manage their own in-house system on KantanMT.com
      • Goal
          • Faster project turnaround times
          • More service offerings to client base
          • More production capacity
          • Cost efficiencies
      • About RoundTable Studio: RoundTable Studio is a leading provider of translation and localization services for the Spanish and Brazilian Portuguese language markets.
  • 13. User Scenario #2
      • Business scenario
          • Continuous translation quality improvement
          • Reduced post-editing/turn-around times
  • 14. User Scenario #2
      • Results
          • Greater production capacity
          • Improvement in quality
          • Faster project turn-around times
      • “Since signing up with KantanMT, we have been able to take on more work and increase our capacity levels” (Laura Grossi, MT Specialist, RoundTable Studio)
  • 15. User Scenario #3
      • Long history of MT usage
          • In-house expertise
          • Large customer demand
          • Originally outsourced MT
          • 3rd party consultancy company
      • Vendor agnostic
          • Microsoft Translator Hub
          • KantanMT.com
          • All systems are cloud based
          • Like hands-on approach to managing their own MT engines
      • About Milengo: Milengo provides translation, localization and related language services specializing in software, website and documentation localization.
  • 16. User Scenario #3
      • Business scenario
          • Select best engine for language combination
          • Client requests a job that involves an MT component
      • Finding training data
          • Data is aggregated from the client’s previous translations
      • Building engines
          • Same training data is provided to each engine
          • Same language combinations
          • Iterative process until satisfied with system performance (internal process)
  • 17. User Scenario #3
      • Translation Quality Analysis
          • Sample of 1,000 segments selected
          • Tabulated & anonymised
          • Dispatched to Senior Translators
      • Evaluation sheet columns: Source; MT Target; Spacing; Syntax and Grammar; Locale Adaptation; Tags and Markup; Sentence Structure; Punctuation; Wrong Part of Speech; Style; Wrong Word Form; Capitalization; Text/Information added; Literal translation; Compliance with client specs; Source not Translated/Omissions; Wrong Spelling; Wrong terminology; Overall quality (1-4); Fluency (Score 1-5); Adequacy (Score 1-5)
      • Other label on slide: Tech
  • 18. User Scenario #3
      • Feedback collated from Senior Translators
          • Match best engine for language quality
          • Very unique – pseudo-crowd sourcing of most appropriate engine
          • Match engine to best language support
      • Translators always involved in engine selection process
      • Feedback to client
          • Match requirements and quality expectations
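Slides 17 and 18 describe senior translators scoring anonymised output from each engine, with the feedback then collated to pick an engine. A minimal sketch of that collation step follows. It assumes each rating arrives as a tuple of anonymised engine label, fluency (1-5) and adequacy (1-5), and that engines are ranked by the sum of their mean scores. Both the data layout and the ranking rule are assumptions, not Milengo's documented process.

```python
from collections import defaultdict
from statistics import mean


def collate(ratings):
    """Group (engine, fluency, adequacy) ratings and average them per engine."""
    by_engine = defaultdict(list)
    for engine, fluency, adequacy in ratings:
        by_engine[engine].append((fluency, adequacy))
    return {
        engine: {
            "fluency": mean(f for f, _ in rows),
            "adequacy": mean(a for _, a in rows),
            "segments": len(rows),
        }
        for engine, rows in by_engine.items()
    }


def best_engine(summary):
    """Pick the engine with the highest combined mean fluency and adequacy."""
    return max(summary, key=lambda e: summary[e]["fluency"] + summary[e]["adequacy"])


# Illustrative ratings only; the engine labels are anonymised placeholders.
ratings = [
    ("Engine A", 4, 5),
    ("Engine A", 3, 4),
    ("Engine B", 2, 3),
    ("Engine B", 3, 3),
]
summary = collate(ratings)
print(summary)
print("Selected:", best_engine(summary))
```

Keeping the engine labels anonymised until after the scores are averaged prevents a translator's view of a vendor from influencing the ratings.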
  • 19. User Scenario #3
      • Levels of post-editing services
      • Adequacy review
          • All meaning expressed in the source segment appears in the translated segment
          • Structural integrity – tags, placeholders
          • Fit-for-purpose quality
      • Fluency review
          • No grammar errors, excellent word selection and good syntax
          • Publishable quality
      • Client picks review
          • To fit budget, time-frame, audience, channel etc.
  • 20. Tony O’Dowd, tonyod@kantanmt.com