Moses
Moses is a machine translation tool.
Presentation Transcript

  • Presented by NIKHIL.P MCA S4 CHINTECH
  • INTRODUCTION: What is translation? Translation is the communication of the meaning of a source-language text by means of an equivalent target-language text. What is transliteration? Transliteration is the conversion of a text from one script to another.
  • INTRODUCTION: Why translation? Being able to establish links between two languages allows for transferring resources from one language to another. Books written in unknown foreign languages can be read by translating their contents into our own language.
  • [Diagram: Machine Translation situated among Computer Science topics such as Databases, Robotics, Artificial Intelligence, Algorithms, Natural Language Processing, Information Retrieval, Networking, and Search]
  • INTRODUCTION: Natural Language Processing (NLP) is a field of Computer Science, Artificial Intelligence, and Linguistics concerned with the interactions between computers and human (natural) languages. Applications of NLP include machine translation, database access, and information retrieval.
  • Machine Translation? Machine Translation is automatic translation, for example by a computer system, from a first language (the source language) into another language (the target language).
  • Background: Automatic machine translation was one of the first natural language processing applications developed in computer science. The field explores rule-based, example-based, knowledge-based, and statistical approaches. Statistical Machine Translation (SMT) is the preferred approach in much industrial and academic research.
  • Rule-based Machine Translation: a system of lexical, grammatical, and reordering rules is created for each source/target language pair; the rules are then applied to the source text to produce the output. Example-based Machine Translation: a bilingual text corpus is used directly for comparison against the source text, and case-based reasoning is applied to create the output.
  • What is Moses? It is an open-source toolkit for Statistical Machine Translation (SMT). Moses is released under the LGPL license. It uses standard external toolkits such as GIZA++ and SRILM.
  • Statistical Machine Translation? The goal is to produce, from a source sentence, the target sentence that maximizes the translation probability. A statistical MT system is modeled as three separate parts: a language model, a translation model, and a decoder.
  • Language model (LM): assigns a probability P(e) to any target string of words e. An LM is a probability distribution over strings that attempts to reflect how frequently a string occurs as a sentence.
  • Translation model (TM): assigns a probability P(f|e) to any pair of source and target strings. Decoder: determines the translation based on the probabilities from the LM and TM.
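As a point of reference, this is the standard noisy-channel formulation of SMT (stated here for completeness, not taken from the slides): the decoder searches for

        \hat{e} = \arg\max_{e} P(e) \, P(f \mid e)

where P(e) comes from the language model and P(f|e) from the translation model.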
  • GIZA++: It is used for creating word alignments. This toolkit is an implementation of the original IBM Models that started statistical machine translation research.
  • First the parallel corpus is aligned bidirectionally, e.g. English to German and German to English. This generates two word alignments, which are then combined: taking the intersection gives a high-precision alignment of high-confidence alignment points, while taking the union gives a high-recall alignment with additional alignment points.
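In practice the alignment step is usually driven through the Moses training script rather than by calling GIZA++ directly. A rough sketch, with placeholder paths and flags taken from the Moses baseline documentation as I recall them (verify against your release); the -alignment option selects the symmetrization heuristic (intersect, union, grow-diag-final-and, ...):

        ~/mosesdecoder/scripts/training/train-model.perl \
            -root-dir train -corpus corpus/train.clean -f de -e en \
            -alignment grow-diag-final-and -reordering msd-bidirectional-fe \
            -lm 0:3:$HOME/lm/train.blm.en:8 \
            -external-bin-dir ~/mosesdecoder/tools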
  • SRILM: It is used for language modeling. It consists of the following components: a set of C++ class libraries implementing language models, supporting data structures, and miscellaneous utility functions; a set of executable programs built on top of these libraries to perform standard tasks such as training LMs and testing them on data; and a collection of miscellaneous scripts facilitating minor related tasks.
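As a brief illustration (file names are placeholders; the options are standard SRILM flags), a trigram language model can be trained and then checked on held-out data like this:

        # train a 3-gram LM with interpolated modified Kneser-Ney smoothing
        ngram-count -order 3 -interpolate -kndiscount -text corpus.en -lm corpus.en.lm
        # measure perplexity on held-out text
        ngram -lm corpus.en.lm -ppl heldout.en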
  • Moses Translation Process: it involves segmenting the source sentence into source phrases, translating each source phrase into a target phrase, and optionally reordering the target phrases into a target sentence.
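For illustration only (the segmentation and translations here are invented, loosely following the toy example in the Moses documentation): the German input "das ist ein kleines haus" might be segmented into the source phrases [das] [ist] [ein kleines] [haus], translated phrase by phrase into [this] [is] [a small] [house], and output, after any reordering, as "this is a small house".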
  • Moses Toolkit: consists of all the components needed to preprocess data and to train the language models and the translation models. It also contains tools for tuning these models using minimum error rate training (MERT; a sketch follows below), and it works with external tools such as GIZA++ and SRILM.
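A minimal sketch of the MERT tuning step, assuming a held-out development set dev.de/dev.en and the directory layout used in the Moses baseline instructions (names are placeholders; check the mert-moses.pl options for your release):

        ~/mosesdecoder/scripts/training/mert-moses.pl \
            dev.de dev.en ~/mosesdecoder/bin/moses train/model/moses.ini \
            --mertdir ~/mosesdecoder/bin/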
  • Moses Toolkit: the decoder is the core component of Moses; a phrase-based decoder is used. The job of the decoder is to find the highest-scoring sentence in the target language corresponding to a given source sentence. It is also possible to output a ranked list of translation candidates.
  • Principles used when developing the Moses decoder: accessibility, ease of maintenance, flexibility, ease of distributed team development, and portability. It was developed in C++ for efficiency and follows a modular, object-oriented design.
  • The decoding process can be varied in several ways: the input (which can be a plain sentence), the translation model, the decoding algorithm, and the language model.
  • Contributed Tools: Moses Server provides an XML-RPC interface to the decoder; Web Translation is a set of scripts to translate web pages; Analysis Tools are scripts to visualize and analyze Moses output.
  • Moses Decoder A simple translation model Contains two files: Phrase-table(phrase translation table) {de ||| the ||| 0.3 ||| |||} Moses.ini(configuration file) The decoder is controlled by moses.ini
  • Phrase table: the phrase translation tables are the main knowledge source for the machine translation decoder. The entry above means that the probability of translating the German der into the English the is 0.3.
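For concreteness, a few more toy entries in the same source ||| target ||| score format (the probabilities are invented for illustration; real Moses phrase tables carry several scores plus alignment and count fields on each line):

        der ||| the ||| 0.3
        das ||| this ||| 0.4
        ein kleines ||| a small ||| 0.8
        haus ||| house ||| 1.0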
  • Configuration file: the decoder is controlled by the Moses configuration file, moses.ini. The translation model files and language model files are specified there.
  • Moses Decoder Trace This option reveals which phrase translation were used in the best translation found by the decoder.
  • Moses Decoder Tuning for Quality the probability cost is assigned by four models  Phrase translation table (phi(f|e) ensures that both source and target language phrases are good translation of each other  Language model (LM(e)) ensures that the output is fluent target language
  •  Reordering model (D(e,f)) allows for the re-ordering of the input sentence  Word penalty (W(e)) to ensure that the translation do not get too long or too short
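Following the form used in the Moses documentation, the four components are combined as a weighted product, with the weights set in the configuration file and adjusted during tuning:

        p(e \mid f) \propto \phi(f \mid e)^{w_{\phi}} \times \mathrm{LM}(e)^{w_{LM}} \times D(e,f)^{w_{d}} \times W(e)^{w_{w}}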
  • Moses Decoder Tuning for Speed speed-ups are achieved by limiting the search space of the decoder • Translation table size • Hypothesis stack size
  • Translation table size: one strategy is to reduce the number of translation options used for each input phrase, i.e. the number of table entries that are retrieved. There are two ways to limit the table size: (i) a fixed limit on how many translation options are retrieved, and (ii) a threshold that the phrase translation probability has to be above.
  • Hypothesis stack size: another way to reduce the search space is to reduce the size of the hypothesis stacks. For each number of foreign words translated, the decoder keeps a stack of the best partial translations.
  • Moses Decoder Limit on Distortion  Reordering cost is measured by the number of words skipped when foreign phrases are picked out of order.  Reordering cost is computed for finding the best target pair probability.
  • Moses Decoder (diagram slide)
  • Decoding Algorithm: the decoder uses a beam search algorithm. The output sentence is generated left to right in the form of hypotheses. Final states in the search are hypotheses that cover all foreign words.
  • Beam search is an efficient search algorithm that quickly finds a high-probability translation among the exponential number of choices. The search through the space of generated hypotheses keeps, at each node, only a list of the top best translations for that node.
  • The score for the translation is computed using the weights of the individual phrases that make up the translation and the overall LM probability of the combination. The scores are computed by querying the standard Moses Phrase Table and the LM for the target language.
  • Language Models: the decoder works with the following language models: the SRI language model (SRILM), the IRST language model (IRSTLM), RandLM, and KenLM, which is included by default in Moses.
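As a hedged example of the KenLM route (in recent Moses releases the KenLM tools lmplz and build_binary are installed under bin/ alongside the decoder; file names are placeholders):

        # estimate a 3-gram LM and binarize it for fast loading
        ~/mosesdecoder/bin/lmplz -o 3 < corpus.en > corpus.en.arpa
        ~/mosesdecoder/bin/build_binary corpus.en.arpa corpus.en.blm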
  • Translating Webpages with Moses
  • Moses servers are installed on one or several machines. On each Moses server, a daemon (daemon.pl) accepts network connections on a given port and copies everything it receives from the connection to Moses. A separate machine runs Apache or any other web server software; through the web server, CGI scripts (index.cgi, translate.cgi) are served to clients.
  • A client requests index.cgi via the web server, and a form containing a text box is served back for entering the URL. The form is submitted to translate.cgi, which does the job: it fetches the page from the web, extracts the plain text from it, sends that text to a Moses server, inserts the translation back into the document, and returns it to the client.
  • Setting up MOSES server Choosing machines for moses servers running Moses is slow and expensive process, so the machine used must have a fast processor and as many GB’s of memory as possible. Install Moses for each moses server, need to install and configure the language pair that we wish to use.
  • Setting up MOSES server Install daemon.pl open bin/daemon.pl and edit the $MOSES and $MOSES_INI paths to point to the location of moses binary and moses configuration file. Choose a port number pick any port number between 1024 and 49151 for the daemon process to listen on.
  • Setting up MOSES server Start the daemon to activate Moses server, type in a shell on the server, ./daemon.pl <hostname> <port> hostname is the name of the host where Moses is installed. port is the selected port
  • Setting up MOSES server Configure web server to connect to Moses server final step is to tell the front-end Web server where to find the back-end Moses server in the translate.cgi script set the @MOSES_ADDRESS array to the list of hostname:port strings identifying the Moses servers.
  • Comparison with Pharaoh and Phramer for a French-English (fr-en) translation of 2000 sentences.
  • Installing Moses: first install Boost, then get the source code:
        sudo apt-get install libboost-all-dev
        git clone git://github.com/moses-smt/mosesdecoder.git
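The slides stop at fetching the source; as a hedged note, the decoder is then typically compiled with the bundled bjam build script (add --with-boost=<path> if Boost lives in a non-standard location):

        cd mosesdecoder
        ./bjam -j4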
  • Installing GIZA++:
        wget http://giza-pp.googlecode.com/files/giza-pp-v1.0.7.tar.gz
        tar xzvf giza-pp-v1.0.7.tar.gz
        cd giza-pp
        make
        cd ~/mosesdecoder
        mkdir tools
        cp ~/giza-pp/GIZA++-v2/GIZA++ ~/giza-pp/GIZA++-v2/snt2cooc.out ~/giza-pp/mkcls-v2/mkcls tools
  • Installing IRSTLM:
        tar zxvf irstlm-5.80.01.tgz
        cd irstlm-5.80.01
        ./regenerate-makefiles.sh
        ./configure --prefix=$HOME/irstlm
        make install
  • Moses Platform: the primary development platform for Moses is Linux, and Linux is the recommended platform since it is easier to get support for it. However, Moses works on other platforms as well.
  • Moses Releases: Moses 1.0 (28 Jan 2013) and Moses 0.91 (12 Oct 2012).
  • Importance of Moses: Moses is installable software, unlike online-only translation systems. Online systems cannot be trained on our own data, and there is also a privacy problem if you have to translate sensitive information.
  • Conclusion: Moses is an open-source toolkit, so users can modify and customize it based on their needs and requirements.
  • References: www.statmt.org/moses/ and www.crosslang.com/en/machine-translation/custom-built-mt-engines/moses-smt
  • Questions??