TAUS MT SHOWCASE, Moses Past, Present and Future, Hieu Hoang, University of Edinburgh, 12 June 2013
 


This presentation is a part of the MosesCore project that encourages the development and usage of open source machine translation tools, notably the Moses statistical MT toolkit.

MosesCore is supported by the European Commission Grant Number 288487 under the 7th Framework Programme.

For the latest updates, follow us on Twitter - #MosesCore


Usage Rights

CC Attribution License

    Presentation Transcript

    • TAUS Machine Translation Showcase: Moses Past, Present and Future. 09:20 – 09:40, Wednesday, 12 June 2013. Hieu Hoang, University of Edinburgh
    • Statistical Machine Translation with Moses. Hieu Hoang, Localization World 2013
    • Agenda
        • What is Statistical Machine Translation?
        • What is Moses?
            – Common misconceptions
        • Coming up
        • What can we do for you?
        Moses by Hieu Hoang, University of Edinburgh
    • What is Statistical Machine Translation?
        It is very tempting to say that a book written in Chinese is simply a book written in English which was coded into the “Chinese code.” If we have useful methods for solving almost any cryptographic problem, may it not be that with proper interpretation we already have useful methods for translation?
        Warren Weaver, 1949
    • What is Statistical Machine Translation?
        • NLP Application
            – search engines, text mining etc.
        • Big-data
            – bi-text from the Internet
                • e.g. multilingual websites, documents
            – large monolingual data
        • Learn to translate
            – from previous translations
            – models of language
    • What is Statistical Machine Translation?
        [Diagram] Training: Training Data (bi-text, monolingual data) and Linguistic Tools (dictionary) feed the SMT System (translation model, language model, lots of numbers…)
        [Diagram] Using: Source Text goes into the SMT System (translation model, language model, lots of numbers…)
    • What is a model?
        • Translation Model
        • Language Model
            – (of the target language)
        thanks to Precision Translation Tools
    • What is a model?
        • Translation model
            – source → translation
            – probability
        source           target           probability
        den Vorschlag    the proposal     0.6227
                         ‘s proposal      0.1068
                         a proposal       0.0341
                         the idea         0.0250
                         this proposal    0.0227
                         proposal         0.0205
        ….               ….
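
A minimal sketch of how a translation model like the one on this slide could be held in memory and queried. The probabilities are copied from the slide; the dictionary layout and the function name `translation_options` are illustrative assumptions, not the format or API that Moses itself uses.

```python
# Phrase-table entries copied from the slide:
# source phrase -> {target phrase: probability}.
# The nested-dict layout is an assumption made for illustration only.
PHRASE_TABLE = {
    "den Vorschlag": {
        "the proposal": 0.6227,
        "'s proposal": 0.1068,
        "a proposal": 0.0341,
        "the idea": 0.0250,
        "this proposal": 0.0227,
        "proposal": 0.0205,
    }
}


def translation_options(source_phrase, table=PHRASE_TABLE):
    """Return (target, probability) pairs, most probable first.

    An unknown source phrase yields an empty list.
    """
    options = table.get(source_phrase, {})
    return sorted(options.items(), key=lambda item: item[1], reverse=True)


best_target, best_prob = translation_options("den Vorschlag")[0]
print(best_target, best_prob)
```

Picking the top entry alone ignores context; in the setup the slides describe, the language model also has a say in which candidate reads well in the target language.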
    • What is a model?
        • Language model
            – Likelihood of sentence
            – in target language
        text                      probability
        I would like              0.489
        would like to             0.905
        like to commend           0.002
        to commend the            0.472
        commend the rapporteur    0.147
        ….                        ….
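
A rough sketch of how the numbers on this slide could be combined into a likelihood for a whole sentence. It assumes each value is the probability of the third word given the two before it, which the slide does not state explicitly, and the function name `sentence_log_score` is hypothetical. Log-probabilities are summed rather than multiplying raw probabilities, to avoid numerical underflow on long sentences.

```python
import math

# Trigram scores copied from the slide. Treating each number as the
# probability of the third word given the first two is an assumption.
TRIGRAM_PROB = {
    ("I", "would", "like"): 0.489,
    ("would", "like", "to"): 0.905,
    ("like", "to", "commend"): 0.002,
    ("to", "commend", "the"): 0.472,
    ("commend", "the", "rapporteur"): 0.147,
}


def sentence_log_score(words, table=TRIGRAM_PROB):
    """Sum the log-probabilities of every trigram in the word list.

    Returns -inf if any trigram is missing from the table (a real
    language model would smooth or back off instead).
    """
    total = 0.0
    for i in range(len(words) - 2):
        trigram = tuple(words[i:i + 3])
        if trigram not in table:
            return float("-inf")
        total += math.log(table[trigram])
    return total


sentence = "I would like to commend the rapporteur".split()
log_score = sentence_log_score(sentence)
print(f"log score = {log_score:.4f}, probability = {math.exp(log_score):.3e}")
```

The low score for "like to commend" dominates the result, which shows how a single unlikely word sequence pulls down the likelihood of the whole sentence.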
    • What is Moses?
        • Replacement for Pharaoh
            – Academic software
            – Closed-source
        • Open source
        • Re-written, clean code
            – More features
        • Large developer community
            – Initiated by Hieu Hoang
            – Developed at NLP Workshop
    • Agenda
        • What is Statistical Machine Translation?
        • What is Moses?
            – Timeline
            – Common misconceptions
        • Coming up
        • What can we do for you?
    • What is Moses? Common Misconceptions
        • Only for Linux
        • Difficult to use
        • Unreliable
        • Only phrase-based
        • Developed by one person
        • Slow
    • Only works on Linux
        • Tested on
            – Windows 7 (32-bit) with Cygwin 6.1
            – Mac OSX 10.7 with MacPorts
            – Ubuntu 12.10, 32 and 64-bit
            – Debian 6.0, 32 and 64-bit
            – Fedora 17, 32 and 64-bit
            – openSUSE 12.2, 32 and 64-bit
        • Project files for
            – Visual Studio
            – Eclipse on Linux and Mac OSX
    • Difficult to use
        • Easier compile and install
            – Boost bjam
            – No installation required
        • Binaries available for
            – Linux
            – Mac
            – Windows/Cygwin
            – Moses + Friends
                • IRSTLM
                • GIZA++ and MGIZA
        • Ready-made models trained on Europarl
    • Unreliable
        • Monitor check-ins
        • Unit tests
        • More regression tests
        • Nightly tests
            – Run end-to-end training
            – http://www.statmt.org/moses/cruise/
        • Tested on all major OSes
        • Train Europarl models
            – Phrase-based, hierarchical, factored
            – 8 language-pairs
            – http://www.statmt.org/moses/RELEASE-1.0/models/
    • Only phrase-based model
            – replacement for Pharaoh
            – extension of Pharaoh
        • From the beginning
            – Factored models
            – Lattice and confusion network input
            – Multiple LMs, multiple phrase-tables
        • Since 2009
            – Hierarchical model
            – Syntactic models
    • Developed by one person
        • ANYONE can contribute
            – 50 contributors
        [Chart: ‘git blame’ of Moses repository, share per contributor, axis 0% to 40%]
    • Slow
        thanks to Ken!!
        [Figure: Decoding. Model score against CPU seconds/sentence (excluding loading), for Moses, cdec and Joshua]
    • Slow
        • Multithreaded
        • Reduced disk IO
            – compress intermediate files
        • Reduce disk space requirement
        Training
        Time (mins)     1-core    2-cores      4-cores      8-cores      Size (MB)
        Phrase-based    60        47 (79%)     37 (63%)     33 (56%)     893
        Hierarchical    1030      677 (65%)    473 (45%)    375 (36%)    8300
    • What is Moses? Common Misconceptions
        • Only for Linux → Windows, Linux, Mac
        • Difficult to use → Easier compile and install
        • Unreliable → Multi-stage testing
        • Only phrase-based → Hierarchical, syntax model
        • Developed by one person → everyone
        • Slow → Fastest decoder, multithreaded training, less IO
    • Coming up…
        • Code cleanup
        • Incremental Training
        • Better translation
            – smaller model
            – bigger data
            – faster training and decoding
        • Applications
            – CAT tools
            – Speech translation
    • Applications: Computer-Aided Translation
        • EU Project
            – CASMACAT
            – MATECAT
    • What can we do for you?
            – simpler Moses
            – graphical interface
            – Windows compatibility
            – terminology and glossary
            – incremental training
        • What can you do for us?
            – code
            – data
            – funding