3. INTRODUCTION
TRANSLATION??
Translation is the communication of the meaning of
a source-language text by means of an equivalent
target-language text.
TRANSLITERATION??
It is the conversion of a text from one script to another.
4. INTRODUCTION
Why TRANSLATION??
Being able to establish links between two languages
allows for transferring resources from one language to
another.
Books written in unknown foreign languages can be
read by translating the contents of the book into our
own language.
6. INTRODUCTION
Natural Language Processing (NLP)
NLP is a field of Computer Science, Artificial
Intelligence and Linguistics concerned with the
interactions between computers and human (natural)
languages.
Applications of NLP
Machine Translation
database access
information retrieval
7. Machine Translation??
Machine Translation is the automatic translation,
for example using a computer system, from a first
language (the source language) into another
language (the target language).
8. Background
Automatic language translation was one of
the first natural language processing applications
developed in computer science.
Research has explored rule-based, example-based, knowledge-based
and statistical approaches.
Statistical Machine Translation (SMT) is the preferred
approach in much industrial and academic research.
9. Rule-based Machine Translation: a system of lexical,
grammatical, and reordering rules is created for each
source/target language pair. The rules are then applied to the
source text to produce output.
Example-based Machine Translation: a bilingual text
corpus is used directly for comparison against the source
text, and case-based reasoning is applied to create the
output.
10. What is Moses?
It is an open source toolkit
for Statistical Machine Translation (SMT)
Moses is released under the LGPL license
It uses standard external toolkits such as GIZA++ and
SRILM
11. Statistical Machine Translation??
The goal is to produce, for a given source sentence, the
target sentence that maximizes the translation probability
A statistical MT system is modeled as three separate
parts:
language model
translation model
decoder
12. language model (LM): assigns a probability to any
target string of words {P(e)}
An LM is a probability distribution over strings S that
attempts to reflect how frequently a string S occurs as
a sentence.
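As a concrete illustration, a toy bigram LM can be estimated from raw counts. The tiny corpus and the maximum-likelihood estimation below are illustrative assumptions, not part of the slides:

```python
from collections import Counter

def train_bigram_lm(sentences):
    """Count unigrams and bigrams over sentences padded with <s>/</s>."""
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        unigrams.update(tokens[:-1])
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def sentence_prob(sentence, unigrams, bigrams):
    """P(e) as a product of bigram probabilities P(w_i | w_{i-1})."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    p = 1.0
    for prev, cur in zip(tokens, tokens[1:]):
        if unigrams[prev] == 0:
            return 0.0
        p *= bigrams[(prev, cur)] / unigrams[prev]
    return p

unigrams, bigrams = train_bigram_lm(["the house is small", "the house is big"])
print(sentence_prob("the house is small", unigrams, bigrams))  # 0.5
```

Real toolkits such as SRILM add smoothing and work in log space; this sketch only shows the counting idea.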
13. translation model (TM): assigns a probability to any
pair of target and source strings {P(f|e)}
decoder: determines the translation based on the
probabilities of the LM & TM
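Taken together, the three parts implement the standard noisy-channel formulation of SMT (the equation itself is standard background, not shown on the slides):

```latex
\hat{e} \;=\; \arg\max_{e} P(e \mid f) \;=\; \arg\max_{e} \underbrace{P(e)}_{\text{language model}} \; \underbrace{P(f \mid e)}_{\text{translation model}}
```

The decoder performs the argmax search over candidate target sentences e.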
14. GIZA++
It is used for computing word alignments
This toolkit is an implementation of the original IBM
models that started statistical machine translation research.
16. First the language pair is aligned bidirectionally, e.g.
English to German and German to English.
This generates two word alignments, which are then combined:
Intersection: yields a high-precision alignment consisting of
high-confidence alignment points.
Union: yields a high-recall alignment with additional
alignment points.
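The symmetrization step above can be sketched with plain set operations. The two alignments below are hypothetical outputs of the two GIZA++ runs, chosen only for illustration:

```python
# Word alignments as sets of (source_pos, target_pos) index pairs.
e2g = {(0, 0), (1, 1), (2, 2), (3, 3)}   # English -> German run
g2e = {(0, 0), (1, 1), (2, 3), (3, 3)}   # German -> English run, flipped to the same orientation

intersection = e2g & g2e   # high precision: only points both runs agree on
union        = e2g | g2e   # high recall: all points proposed by either run

print(sorted(intersection))  # [(0, 0), (1, 1), (3, 3)]
print(sorted(union))         # [(0, 0), (1, 1), (2, 2), (2, 3), (3, 3)]
```

Moses's actual heuristics (e.g. grow-diag-final) start from the intersection and selectively add points from the union.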
17. SRILM
It is used for language modeling.
It consists of the following components:
A set of C++ class libraries implementing language
models, supporting data structures and miscellaneous
utility functions.
A set of executable programs built on top of these
libraries to perform standard tasks such as training
LMs and testing them on data.
A collection of miscellaneous scripts facilitating minor
related tasks.
18. Moses Translation Process
It involves
segmenting the source sentence into source phrases,
translating each source phrase into a target phrase,
and optionally reordering the target phrases into a target
sentence.
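The segment-and-translate steps can be sketched as follows. The German-English phrase dictionary and the greedy longest-match segmentation are illustrative assumptions; reordering is omitted for brevity:

```python
# A hypothetical German -> English phrase dictionary (illustrative only).
phrase_table = {
    ("das", "haus"): "the house",
    ("ist",): "is",
    ("klein",): "small",
}

def translate(source_tokens):
    """Greedy left-to-right segmentation into known phrases, then translation."""
    output, i = [], 0
    while i < len(source_tokens):
        # Prefer the longest phrase starting at position i.
        for length in range(len(source_tokens) - i, 0, -1):
            phrase = tuple(source_tokens[i:i + length])
            if phrase in phrase_table:
                output.append(phrase_table[phrase])
                i += length
                break
        else:
            output.append(source_tokens[i])  # pass unknown words through
            i += 1
    return " ".join(output)

print(translate("das haus ist klein".split()))  # the house is small
```

The real decoder scores many competing segmentations and orderings instead of committing greedily.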
19. Moses Toolkit
Consists of all the components needed to preprocess
data and train the language models and the translation
models.
Also contains tools for tuning these models using
minimum error rate training.
It relies on external tools such as GIZA++ & SRILM.
20. Moses Toolkit
The decoder is the core component of Moses.
A phrase-based decoder is used.
The decoder's job is to find the highest scoring sentence
in the target language corresponding to the source
sentence.
It is also possible to output a ranked list of translation
candidates.
21. Principles used when developing the Moses decoder
Accessibility
Easy to maintain
Flexibility
Easy for distributed team development
Portability
It was developed in C++ for efficiency and follows a
modular, object-oriented design.
22. The decoding process can vary in several ways:
-Input: can be a plain sentence
-Translation model
-Decoding algorithm
-Language model
23. Contributed Tools
Moses Server - provides an XML-RPC interface to the
decoder
Web translation - a set of scripts to translate web pages
Analysis tools - scripts to visualize and analyze
Moses output
24. Moses Decoder
A simple translation model
Contains two files:
Phrase-table (phrase translation table), with entries such as:
der ||| the ||| 0.3
moses.ini (configuration file)
The decoder is controlled by moses.ini
25. Phrase table:
The phrase translation tables are the main knowledge
source for the machine translation decoder.
Such an entry means that the probability of translating the
English word the from the German der is 0.3.
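Each phrase-table line uses ||| as a field separator, which makes the format easy to read programmatically. The following sketch parses only the three fields shown on the slide; real Moses phrase tables append further fields (more scores, word alignments, counts):

```python
def parse_phrase_table_line(line):
    """Split a Moses phrase-table entry on the '|||' field separator."""
    fields = [f.strip() for f in line.split("|||")]
    source, target, scores = fields[0], fields[1], fields[2]
    return source, target, [float(s) for s in scores.split()]

source, target, scores = parse_phrase_table_line("der ||| the ||| 0.3")
print(source, target, scores)  # der the [0.3]
```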
26. Configuration file
The decoder is controlled by the Moses configuration
file moses.ini
The translation model files and language model files are
specified here.
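A minimal moses.ini in the classic format might look like the sketch below. The section names follow the older Moses configuration style; all paths, score counts and weight values are placeholders, not values from the slides:

```ini
# translation table: factor mapping, number of scores, path (placeholder)
[ttable-file]
0 0 0 5 /path/to/model/phrase-table

# language model: type, factor, n-gram order, path (placeholder)
[lmodel-file]
0 0 3 /path/to/lm/corpus.arpa

# feature weights (placeholder values, normally set by tuning)
[weight-t]
0.2 0.2 0.2 0.2 0.2
[weight-l]
0.5
[weight-d]
0.3
[weight-w]
-1
```

Newer Moses releases use a different [feature]/[weight] layout, so the installed documentation should be consulted for the exact syntax.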
28. Moses Decoder
Tuning for Quality
The probability cost is assigned by four models:
Phrase translation table (phi(f|e))
ensures that the source and target language
phrases are good translations of each other
Language model (LM(e))
ensures that the output is fluent target language
29. Reordering model (D(e,f))
allows for the reordering of the input sentence
Word penalty (W(e))
ensures that the translations do not get too long or
too short
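The four models above are combined into one translation score. In the standard phrase-based formulation this is a weighted product (the weights lambda are what minimum error rate training tunes; the formula is standard background, not written out on the slides):

```latex
p(e \mid f) \;\propto\; \phi(f \mid e)^{\lambda_{\phi}} \; \mathrm{LM}(e)^{\lambda_{\mathrm{LM}}} \; D(e,f)^{\lambda_{D}} \; W(e)^{\lambda_{W}}
```

In practice the decoder works with the logarithm of this product, so the score is a weighted sum of log model scores.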
30. Moses Decoder
Tuning for Speed
Speed-ups are achieved by limiting the search space
of the decoder:
• Translation table size
• Hypothesis stack size
31. Translation table size
One strategy is to reduce the number of translation
options used for each input phrase, i.e., the number of
table entries that are retrieved.
There are two ways to limit the table size:
I. a fixed limit on the number of translation options
retrieved
II. a threshold: the phrase translation probability has to
be above some value
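Both limits can be sketched in a few lines. The option list and cut-off values below are hypothetical, chosen only to show the two pruning criteria working together:

```python
def prune_options(options, max_options=3, min_prob=0.05):
    """Apply both limits: drop low-probability entries, then keep the best few."""
    kept = [(target, p) for target, p in options if p >= min_prob]   # threshold
    kept.sort(key=lambda tp: tp[1], reverse=True)
    return kept[:max_options]                                        # fixed limit

# Hypothetical translation options for one input phrase.
options = [("house", 0.6), ("home", 0.25), ("building", 0.1),
           ("shell", 0.04), ("casing", 0.01)]
print(prune_options(options))  # [('house', 0.6), ('home', 0.25), ('building', 0.1)]
```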
32. Hypothesis stack size
Another way to reduce the search space is to reduce
the size of the hypothesis stacks.
For each number of foreign words translated, the decoder
keeps a stack of the best partial translations.
33. Moses Decoder
Limit on Distortion
The reordering cost is measured by the number of words
skipped when foreign phrases are picked out of order.
The reordering cost enters the overall score used to
find the highest-probability translation.
36. Decoding Algorithm
The decoder uses a beam search algorithm
The output sentence is generated left to right in the form
of hypotheses
Final states in the search are hypotheses that cover all
foreign words.
37. Beam Search
An efficient search algorithm that quickly finds the
highest probability translation among an exponential
number of choices.
The search through the space of generated hypotheses is
performed using a beam search that keeps, in each node,
the list of the top best translations for that node.
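The stack-based beam search described above can be sketched for the monotone (no-reordering) case. The phrase table and probabilities are hypothetical, and real Moses hypotheses carry much more state (coverage vectors, LM context, future cost estimates):

```python
import math

# Hypothetical German -> English phrase options (illustrative only).
phrase_table = {
    ("das",): [("the", 0.7), ("this", 0.3)],
    ("haus",): [("house", 0.8), ("home", 0.2)],
    ("das", "haus"): [("the house", 0.6)],
}

def decode(source, beam_size=2):
    """Monotone stack decoding: stacks[i] holds the best partial
    translations covering the first i source words."""
    stacks = [[] for _ in range(len(source) + 1)]
    stacks[0].append(("", 0.0))  # (partial translation, log score)
    for i in range(len(source)):
        for prefix, score in stacks[i]:
            for j in range(i + 1, len(source) + 1):
                phrase = tuple(source[i:j])
                for target, p in phrase_table.get(phrase, []):
                    hyp = (prefix + " " + target).strip()
                    stacks[j].append((hyp, score + math.log(p)))
        # histogram pruning: keep only the beam_size best hypotheses per stack
        for stack in stacks:
            stack.sort(key=lambda h: h[1], reverse=True)
            del stack[beam_size:]
    best, _ = stacks[len(source)][0]
    return best

print(decode("das haus".split()))  # the house
```

Pruning each stack to a fixed size is exactly the "hypothesis stack size" speed-up from the earlier slide.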
38. The score for the translation is computed using the
weights of the individual phrases that make up the
translation and the overall LM probability of the
combination.
The scores are computed by querying the standard
Moses Phrase Table and the LM for the target
language.
39. Language Models
The decoder works with the following language models:
SRI language model
IRST language model
RandLM
KenLM (included by default in Moses)
41. Moses servers are installed on one or several computers
On each Moses server, a daemon (daemon.pl) accepts
network connections on a given port and copies
everything it receives from the connection to Moses.
A separate machine runs Apache or other web server
software.
Through the web server, CGI scripts (index.cgi, translate.cgi)
are served to clients.
42. A client requests index.cgi via the web server; a form
containing a text box is served back for entering the URL.
The form is submitted to "translate.cgi", which does the
job:
it fetches the page from the web
extracts the plain text from it
sends it to a Moses server
inserts the translation back into the document and returns it to the client
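The daemon simply relays whatever arrives on the socket to Moses, so a client can talk to it with plain TCP. The line-based protocol in this sketch (one newline-terminated sentence in, one translation line back) is an assumption about daemon.pl's behavior, not something the slides specify:

```python
import socket

def translate_remote(sentence, host, port):
    """Send one newline-terminated sentence to a Moses daemon and read
    back the reply line. The line-based protocol is an assumption."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(sentence.encode("utf-8") + b"\n")
        reply = b""
        while not reply.endswith(b"\n"):
            chunk = sock.recv(4096)
            if not chunk:   # server closed the connection
                break
            reply += chunk
        return reply.decode("utf-8").strip()
```

translate.cgi would call something like this once per extracted sentence, round-robining over the configured Moses servers.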
43. Setting up MOSES server
Choosing machines for Moses servers
Running Moses is a slow and expensive process, so the
machines used must have fast processors and as many
GBs of memory as possible.
Install Moses
For each Moses server, we need to install and configure
the language pair that we wish to use.
44. Setting up MOSES server
Install daemon.pl
Open bin/daemon.pl and edit the $MOSES and
$MOSES_INI paths to point to the locations of the Moses
binary and the Moses configuration file.
Choose a port number
Pick any port number between 1024 and 49151 for the
daemon process to listen on.
45. Setting up MOSES server
Start the daemon
To activate the Moses server, type in a shell on the server:
./daemon.pl <hostname> <port>
where hostname is the name of the host where Moses is
installed and port is the selected port.
46. Setting up MOSES server
Configure the web server to connect to the Moses server
The final step is to tell the front-end web server where to
find the back-end Moses servers:
in the translate.cgi script, set the
@MOSES_ADDRESS array to the list of hostname:port
strings identifying the Moses servers.
48. Installing Moses
Need to install Boost:
sudo apt-get install libboost-all-dev
Get the source code:
git clone git://github.com/moses-smt/mosesdecoder.git
49. Installing GIZA++
wget http://giza-pp.googlecode.com/files/giza-pp-v1.0.7.tar.gz
tar xzvf giza-pp-v1.0.7.tar.gz
cd giza-pp
make
cd ~/mosesdecoder
mkdir tools
cp ~/giza-pp/GIZA++-v2/GIZA++ ~/giza-pp/GIZA++-v2/snt2cooc.out ~/giza-pp/mkcls-v2/mkcls tools
50. Installing IRSTLM
tar zxvf irstlm-5.80.01.tgz
cd irstlm-5.80.01
./regenerate-makefiles.sh
./configure --prefix=$HOME/irstlm
make install
51. Moses Platform
The primary development platform for Moses is Linux.
Linux is also the recommended platform, since it is easier to
get support for it.
However, Moses works on other platforms too.
53. Importance of Moses
Moses is installable software, unlike online-only
translation systems.
Online systems cannot be trained on our own data.
There is also a privacy problem if you have to
translate sensitive information.
54. Conclusion
Moses is an open source toolkit, so users can
modify and customize it to their own needs
and requirements.