Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASEMoses on the Cloud forDo-It-Yourself MachineTranslationranslationBy Andrejs V...
Moses on the Cloud forDo-It-Yourself Machine      Translation             s      Andrejs VasiļjevsChairman of the Board, T...
• Language technology  developer• Localization service  provider• Leadership in smaller  languages• Offices in Riga (Latvi...
machine translationmachine translation
d i s r u p t i v eINNOVATIONd i s r u p t i v e
CHALLENGE
one sizefits all?
[ttable-file]0 0 5 /.../unfactored/model/phrase-table.0-0.gz% ls steps/1/LM_toy_tokenize.1* | catsteps/1/LM_toy_tokenize.1...
buildyour ownMT engine!
scustomized MT
Tilde / CoordinatorLATVIAUniversity of EdinburghUKUppsala UniversitySWEDENCopehagen UniversityDENMARKUniversity of ZagrebC...
• Online collaborative platform for  MT building from user-provided  data• Repository of parallel and  monolingual corpora...
• User-driven cloud-based MT  factory, based on open-source  MT tools• Services for data collection, MT  generation, custo...
• Stores SMT training data             • Supports different formats –               TMX, XLIFF, PDF, DOC, plain           ...
cMT
• Integration with CAT tools              • Integration in web pages              • Integration in web browsers           ...
Sharing of training data                                          Training                                         Using  ...
System                                                                                                                    ...
Latvian           %32.9%*        productivity           * Skadiņš R., Puriņš M., Skadiņa I., Vasiļjevs A., Evaluation of S...
Czech   Polish                 %                 productivity        28.5%25.1%                 * LetsMT! Project Delivera...
• incremental training,New Moses            • distributed language modelsfeatures            • interpolated language model...
tilde.com                                               technologies                                                     f...
TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Monaco, Andrejs Vasiljevs, Tilde, 25 March 2012
TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Monaco, Andrejs Vasiljevs, Tilde, 25 March 2012
TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Monaco, Andrejs Vasiljevs, Tilde, 25 March 2012
TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Monaco, Andrejs Vasiljevs, Tilde, 25 March 2012
TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Monaco, Andrejs Vasiljevs, Tilde, 25 March 2012
TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Monaco, Andrejs Vasiljevs, Tilde, 25 March 2012
Upcoming SlideShare
Loading in …5
×

TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Monaco, Andrejs Vasiljevs, Tilde, 25 March 2012

3,849 views

Published on

LETS MT!
This presentation is a part of the MosesCore project that encourages the development and usage of open source machine translation tools, notably the Moses statistical MT toolkit.
MosesCore is supporetd by the European Commission Grant Number 288487 under the 7th Framework Programme.
Latest news on Twitter - #MosesCore

Published in: Technology
  • Be the first to comment

  • Be the first to like this

TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Monaco, Andrejs Vasiljevs, Tilde, 25 March 2012

  1. 1. TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASEMoses on the Cloud forDo-It-Yourself MachineTranslationranslationBy Andrejs Vasiļjevs
  2. 2. Moses on the Cloud forDo-It-Yourself Machine Translation s Andrejs VasiļjevsChairman of the Board, Tilde andrejs@tilde.com
  3. 3. • Language technology developer• Localization service provider• Leadership in smaller languages• Offices in Riga (Latvia), Tallinn (Estonia) and Vilnius (Lithuania)• 135 employees• Strong R&D team• 9 PhDs and candidates
  4. 4. machine translationmachine translation
  5. 5. d i s r u p t i v eINNOVATIONd i s r u p t i v e
  6. 6. CHALLENGE
  7. 7. one sizefits all?
  8. 8. [ttable-file]0 0 5 /.../unfactored/model/phrase-table.0-0.gz% ls steps/1/LM_toy_tokenize.1* | catsteps/1/LM_toy_tokenize.1steps/1/LM_toy_tokenize.1.DONEsteps/1/LM_toy_tokenize.1.INFOsteps/1/LM_toy_tokenize.1.STDERRsteps/1/LM_toy_tokenize.1.STDERR.digeststeps/1/LM_toy_tokenize.1.STDOUT% train-model.perl --corpus factored-corpus/proj-syndicate --root-dir unfactored --f de --e en --lm 0:3:factored-corpus/surface.lm:0% moses -f moses.ini -lmodel-file "0 0 3../lm/europarl.srilm.gz“use-berkeley = truealignment-symmetrization-method = berkeleyberkeley-train = $moses-script- just usedir/ems/support/berkeley-train.shberkeley-process = $moses-script-dir/ems/support/berkeley-process.shberkeley-jar = /your/path/to/berkeleyaligner- Moses2.1/berkeleyaligner.jarberkeley-java-options = "-server -mx30000m -ea"berkeley-training-options = "-Main.iters 5 5 - ?EMWordAligner.numThreads 8"berkeley-process-options = "-EMWordAligner.numThreads 8"berkeley-posterior = 0.5tokenizein: raw-stemout: tokenized-stemdefault-name: corpus/tokpass-unless: input-tokenizer output-tokenizertemplate-if: input-tokenizer IN.$input-extension OUT.$input-extensiontemplate-if: output-tokenizer IN.$output-extension OUT.$output-extensionparallelizable: yesworking-dir = /home/pkoehn/experiment
  9. 9. buildyour ownMT engine!
  10. 10. scustomized MT
  11. 11. Tilde / CoordinatorLATVIAUniversity of EdinburghUKUppsala UniversitySWEDENCopehagen UniversityDENMARKUniversity of ZagrebCROATIAMoraviaCZECH REPUBLICSemLabNETHERLANDS
  12. 12. • Online collaborative platform for MT building from user-provided data• Repository of parallel and monolingual corpora for MT generation• Automated training of SMT systems from specified collections of data• Users can specify particular training data collections and build customised MT engines from these collections• Users can also use LetsMT! platform for tailoring MT system to their needs from their non- public data
  13. 13. • User-driven cloud-based MT factory, based on open-source MT tools• Services for data collection, MT generation, customization and running of variety of user- tailored MT systems• Application in localization among the key usage scenarios• Strong synergy with FP7 project ACCURAT to advance data-driven machine translation for under- resourced languages and domains
  14. 14. • Stores SMT training data • Supports different formats – TMX, XLIFF, PDF, DOC, plain text • Converts to unified format • Performs format conversions and alignmentResourceRepository
  15. 15. cMT
  16. 16. • Integration with CAT tools • Integration in web pages • Integration in web browsers • API-level integrationintegration
  17. 17. Sharing of training data Training Using Web page Anonymous access Web page Procesing, Evaluation ... translation widget SMT Resource SMT Multi-Model Repository Repository Web browserUpload Giza++ (trained SMT models) Moses SMT toolkit Plug-ins SMT Resource SMT System Directory Directory Web service Authenticated access CAT tools Moses decoder System management, user authentication, access rights control ...
  18. 18. System s Architecture Web Browser CAT tools CAT tools CAT tools Widget ... Browsers plug-ins REST, SOAP, ... http/https TCP/IP REST https REST https https htmlInterface Layer Web Page UI Public API User interface REST/SOAP REST/SOAP webpage UI, web service API http httpApplication Logic Layer Resource Repository Adapter SMT training Translation Application Logic Resource Repository REST Data Storage Layer High-performance Computing (HPC) Cluster (Resource Repository) stores MT training data and RR API trained models REST HPC frontend SGE CPU File Share CPU CPU CPU SVN CPU CPU High-performance Computing CPU CPU CPU System DB Cluster executes all computationally heavy tasks: SMT training, MT service, Processing and aligning of training data etc.
  19. 19. Latvian %32.9%* productivity * Skadiņš R., Puriņš M., Skadiņa I., Vasiļjevs A., Evaluation of SMT in localization to under-resourced inflected language, in Proceedings of the 15th International Conference of the European Association for Machine Translation EAMT 2011, p. 35-40, May 30-31, 2011, Leuven, Belgium
  20. 20. Czech Polish % productivity 28.5%25.1% * LetsMT! Project Deliverable D6.4
  21. 21. • incremental training,New Moses • distributed language modelsfeatures • interpolated language models for domain adaptation • randomized language models to train using huge corpora • translation of formatted texts • running Moses decoder in a server mode
  22. 22. tilde.com technologies for smaller languagesThe research within the LetsMT! project leading to these results has received funding from the ICT Policy Support Programme (ICT PSP), Theme 5 – Multilingual web, grant agreement no 250456

×