LETS MT!
This presentation is a part of the MosesCore project that encourages the development and usage of open source machine translation tools, notably the Moses statistical MT toolkit.
MosesCore is supported by the European Commission Grant Number 288487 under the 7th Framework Programme.
Latest news on Twitter - #MosesCore
TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Monaco, Andrejs Vasiljevs, Tilde, 25 March 2012
1. TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE
Moses on the Cloud for Do-It-Yourself Machine Translation
By Andrejs Vasiļjevs
2. Moses on the Cloud for Do-It-Yourself Machine Translation
Andrejs Vasiļjevs
Chairman of the Board, Tilde
andrejs@tilde.com
3. • Language technology developer
• Localization service provider
• Leadership in smaller languages
• Offices in Riga (Latvia), Tallinn (Estonia) and Vilnius (Lithuania)
• 135 employees
• Strong R&D team
• 9 PhDs and candidates
13. Tilde (coordinator), Latvia
University of Edinburgh, UK
Uppsala University, Sweden
Copenhagen University, Denmark
University of Zagreb, Croatia
Moravia, Czech Republic
SemLab, Netherlands
14. • Online collaborative platform for MT building from user-provided data
• Repository of parallel and monolingual corpora for MT generation
• Automated training of SMT systems from specified collections of data
• Users can specify particular training data collections and build customised MT engines from these collections
• Users can also use the LetsMT! platform to tailor an MT system to their needs using their own non-public data
15. • User-driven cloud-based MT factory, based on open-source MT tools
• Services for data collection, MT generation, customization and running of a variety of user-tailored MT systems
• Application in localization is among the key usage scenarios
• Strong synergy with the FP7 project ACCURAT to advance data-driven machine translation for under-resourced languages and domains
16. Resource Repository
• Stores SMT training data
• Supports different formats: TMX, XLIFF, PDF, DOC, plain text
• Converts to unified format
• Performs format conversions and alignment
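The conversion step on this slide can be illustrated with a minimal Python sketch that reads a TMX translation memory and extracts aligned sentence pairs, the kind of data SMT training consumes. This is not the LetsMT! implementation: the function name `tmx_to_pairs`, the sample data and the choice of plain (source, target) tuples as the "unified format" are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Attribute name that ElementTree reports for xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def tmx_to_pairs(tmx_text, src_lang, tgt_lang):
    """Return a list of (source, target) segment pairs from a TMX string.

    Hypothetical helper for illustration; not part of LetsMT! or Moses.
    """
    src_lang, tgt_lang = src_lang.lower(), tgt_lang.lower()
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):  # one translation unit per aligned segment
        segs = {}
        for tuv in tu.findall("tuv"):
            # TMX 1.4 uses xml:lang; older files use a plain lang attribute.
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() also collects text nested in inline markup.
                segs[lang] = "".join(seg.itertext()).strip()
        # Keep only units that have both requested languages.
        if src_lang in segs and tgt_lang in segs:
            pairs.append((segs[src_lang], segs[tgt_lang]))
    return pairs


# Made-up sample data, trimmed to the elements the sketch reads.
SAMPLE_TMX = """<tmx version="1.4">
  <header srclang="en"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Open the file.</seg></tuv>
      <tuv xml:lang="lv"><seg>Atveriet failu.</seg></tuv>
    </tu>
  </body>
</tmx>"""

pairs = tmx_to_pairs(SAMPLE_TMX, "en", "lv")
```

Formats without explicit alignment (PDF, DOC, plain text) would additionally need text extraction and sentence alignment, which this sketch does not cover.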
18. • Integration with CAT tools
• Integration in web pages
• Integration in web browsers
• API-level integration
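As a rough illustration of API-level integration, the Python sketch below builds (but does not send) an HTTP request that a client tool might issue to a translation web service. Everything specific here is hypothetical: the `/translate` path, the JSON field names, the `X-Api-Key` header and the host `mt.example.com` are placeholders, not the actual LetsMT! API.

```python
import json
import urllib.request


def build_translate_request(base_url, system_id, text, api_key):
    """Build a POST request for a hypothetical REST translation endpoint.

    The path, field names and header are illustrative assumptions only.
    """
    payload = json.dumps({"systemID": system_id, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/translate",
        data=payload,
        headers={"Content-Type": "application/json", "X-Api-Key": api_key},
        method="POST",
    )


# Placeholder values; a real client would pass the result to urllib.request.urlopen().
req = build_translate_request(
    "https://mt.example.com/api/", "demo-en-lv", "Hello", "secret"
)
```

A CAT tool plug-in or browser widget would wrap a call like this and insert the returned translation into its own user interface.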
19. [Diagram with three stages: Sharing of training data, Training, Using.
Components shown: Upload; SMT Resource Repository and SMT Resource Directory; Processing, Evaluation ...; Giza++ and the Moses SMT toolkit; SMT Multi-Model Repository (trained SMT models) and SMT System Directory; Moses decoder.
Access channels shown: web page, web page translation widget, web browser plug-ins, web service, CAT tools; anonymous access and authenticated access.
Across all stages: system management, user authentication, access rights control ...]
20. System Architecture
[Diagram: layered architecture.
Clients: web browser, CAT tools, widget, browser plug-ins.
Interface Layer: webpage UI, web service API (Web Page UI, Public API).
Application Logic Layer: Resource Repository Adapter, SMT training, Translation.
Data Storage Layer (Resource Repository): stores MT training data and trained models.
High-performance Computing (HPC) Cluster: executes all computationally heavy tasks: SMT training, MT service, processing and aligning of training data etc.
Other components shown: RR API, File Share, SVN, System DB, HPC frontend, SGE, CPU nodes.
Protocols shown: REST, SOAP, http/https, TCP/IP.]
25. Latvian: 32.9%* productivity increase
* Skadiņš R., Puriņš M., Skadiņa I., Vasiļjevs A. Evaluation of SMT in localization to under-resourced inflected language. In Proceedings of the 15th International Conference of the European Association for Machine Translation (EAMT 2011), pp. 35-40, May 30-31, 2011, Leuven, Belgium.
27. New Moses features
• incremental training
• distributed language models
• interpolated language models for domain adaptation
• randomized language models to train using huge corpora
• translation of formatted texts
• running Moses decoder in a server mode
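To make the idea of interpolated language models concrete, here is a minimal Python sketch of linear interpolation, a common way to combine an in-domain model with a general one: p(w) = λ1·p1(w) + λ2·p2(w), with the weights summing to 1. The slide does not say which interpolation method is meant, so this is an assumption. The toy unigram probabilities and the weights 0.7/0.3 are invented for illustration; real systems interpolate n-gram models with context and tune the weights on held-out in-domain text.

```python
def interpolate(models, weights, word):
    """Linearly interpolate the probability of `word` across several models.

    Each model is a dict mapping word -> probability (toy unigram models).
    """
    if len(models) != len(weights):
        raise ValueError("need exactly one weight per model")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    # Words missing from a model contribute probability 0 from that model.
    return sum(w * m.get(word, 0.0) for m, w in zip(models, weights))


# Invented toy distributions: a small in-domain model and a general model.
in_domain = {"file": 0.5, "menu": 0.5}
general = {"file": 0.1, "menu": 0.1, "house": 0.8}

# Favour the in-domain model while keeping coverage from the general one.
p_file = interpolate([in_domain, general], [0.7, 0.3], "file")
p_house = interpolate([in_domain, general], [0.7, 0.3], "house")
```

The interpolated model still assigns some probability to "house", which the in-domain model has never seen, while in-domain words such as "file" keep most of their weight.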
28. tilde.com: technologies for smaller languages
The research within the LetsMT! project leading to these results has received funding from the ICT Policy Support
Programme (ICT PSP), Theme 5 – Multilingual web, grant agreement no 250456