We propose two encoder-decoder architectures for this task. Our word-based architecture (WORD) is similar to that of Luong et al. (2015).
The context vector is combined with the decoder hidden state through a one-layer MLP (yellow), after which an affine transformation followed by a softmax is applied to obtain a distribution over the next word/tag.
Here, we model the probability of the target given the source, p(t | s), with an encoder neural network that summarizes the source sequence and a decoder neural network that generates a distribution over the target words and tags at each step given the source.
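As a rough illustration of this decoding step (a NumPy sketch, not the actual Torch implementation; all function names, weights, and dimensions here are invented for the example), the context vector and decoder hidden state are combined through a one-layer MLP, then an affine transformation and a softmax produce the distribution over the next token:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def decoder_step(h_t, c_t, W_mlp, b_mlp, W_out, b_out):
    """One decoding step: combine decoder state h_t with context c_t
    through a one-layer MLP, then apply affine + softmax over the vocabulary."""
    combined = np.tanh(W_mlp @ np.concatenate([h_t, c_t]) + b_mlp)
    return softmax(W_out @ combined + b_out)

# Toy dimensions: hidden size 4, vocabulary size 6.
rng = np.random.default_rng(0)
h_t = rng.standard_normal(4)
c_t = rng.standard_normal(4)
W_mlp = rng.standard_normal((4, 8)); b_mlp = np.zeros(4)
W_out = rng.standard_normal((6, 4)); b_out = np.zeros(6)

p_next = decoder_step(h_t, c_t, W_mlp, b_mlp, W_out, b_out)
```

The result `p_next` is a proper probability distribution over the 6-word toy vocabulary.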
FeaturesEmbedding
--[[ A nngraph unit that maps feature ids to embeddings. When using multiple features, this can be the concatenation or the sum of each individual embedding. ]]
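The actual unit is a Lua/Torch nngraph module; as a minimal sketch of the same idea in NumPy (all names and dimensions here are hypothetical), each feature id is looked up in its own embedding table and the results are merged by concatenation or summation:

```python
import numpy as np

def features_embedding(feature_ids, tables, merge="concat"):
    """Map one id per feature to a single vector: look each id up in its
    embedding table, then merge by concatenation or sum."""
    vecs = [table[idx] for table, idx in zip(tables, feature_ids)]
    if merge == "concat":
        return np.concatenate(vecs)
    elif merge == "sum":
        # Summing requires all tables to share the same embedding size.
        return np.sum(vecs, axis=0)
    raise ValueError(merge)

# Two features: a word id (vocab 10, dim 3) and a POS-tag id (vocab 5, dim 3).
rng = np.random.default_rng(1)
word_table = rng.standard_normal((10, 3))
tag_table = rng.standard_normal((5, 3))

concat_vec = features_embedding([7, 2], [word_table, tag_table], "concat")
sum_vec = features_embedding([7, 2], [word_table, tag_table], "sum")
```

Concatenation yields a 6-dimensional vector here, summation a 3-dimensional one.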
FeaturesGenerator
--[[ Feature decoder generator. Given the RNN state, produce a categorical distribution over the token and each feature. Implements $$[softmax(W^1 h + b^1), softmax(W^2 h + b^2), \ldots, softmax(W^n h + b^n)]$$. --]]
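A NumPy sketch of that formula (the real generator is a Torch module; the names and toy sizes below are invented): one independent affine + softmax head per output, all reading the same RNN state:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def features_generator(h, heads):
    """Given an RNN state h, produce one softmax distribution per output
    (the token plus each feature):
    [softmax(W1 h + b1), ..., softmax(Wn h + bn)]."""
    return [softmax(W @ h + b) for W, b in heads]

rng = np.random.default_rng(2)
h = rng.standard_normal(4)
heads = [
    (rng.standard_normal((6, 4)), np.zeros(6)),  # token head: vocab of 6
    (rng.standard_normal((3, 4)), np.zeros(3)),  # feature head: vocab of 3
]
dists = features_generator(h, heads)
```

Each entry of `dists` is a separate categorical distribution, so token and feature are predicted jointly from the same state but normalized independently.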
GlobalAttention
The current target hidden state h_j is combined with each of the memory vectors in the source to produce attention weights.
--[[ Global attention takes a matrix and a query vector. It then computes a parameterized convex combination of the matrix based on the input query to compute the context vector.

    H_1 H_2 H_3 ... H_n
      q   q   q      q
      |   |   |      |
       \  |   |     /
          .....
        \   |   /
            a

Constructs a unit mapping: $$(H_1 .. H_n, q) => (a)$$, where H is of size `batch x n x dim` and q is of size `batch x dim`. The full function is $$\tanh(W_2 [(softmax((W_1 q + b_1) H) H), q] + b_2)$$. --]]
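Unrolling that formula for a single (un-batched) example as a NumPy sketch (the real unit is a Torch nngraph module; names and toy sizes below are hypothetical): score each memory vector against the transformed query, softmax the scores into convex weights, take the weighted sum of the memory, and feed the concatenated context and query through the final tanh layer:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def global_attention(H, q, W1, b1, W2, b2):
    """Global attention over source memory H (n x dim) with query q (dim,).
    Implements tanh(W2 [(softmax((W1 q + b1) H) H), q] + b2)."""
    scores = H @ (W1 @ q + b1)   # one score per source position
    a = softmax(scores)          # convex-combination (attention) weights
    context = a @ H              # weighted sum of memory vectors
    return np.tanh(W2 @ np.concatenate([context, q]) + b2), a

rng = np.random.default_rng(3)
n, dim = 5, 4
H = rng.standard_normal((n, dim))
q = rng.standard_normal(dim)
W1 = rng.standard_normal((dim, dim)); b1 = np.zeros(dim)
W2 = rng.standard_normal((dim, 2 * dim)); b2 = np.zeros(dim)

h_tilde, attn = global_attention(H, q, W1, b1, W2, b2)
```

The attention weights `attn` sum to 1, so the context is a convex combination of the source memory vectors, as the comment above describes.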
Open-Source Neural Machine Translation in Torch
What is OpenNMT?
OpenNMT was originally developed by Yoon Kim and harvardnlp.
Major source contributions and support come from SYSTRAN.
Basically it is:
“A Modularized Translation Program using Seq2Seq Attention Model”
Features of OpenNMT
Simple general-purpose interface, requires only source/target files.
Speed and memory optimizations for high-performance multi-GPU training.
Includes a dependency-free C++ translator for model deployment.
Latest research features to improve translation performance.
Pretrained models available for several language pairs.
Extensions to allow other sequence generation tasks such as summarization and image-to-text.
Active open community welcoming both academic and industrial requests and contributions.
The combination of the context vector and the decoder state is used to compute the probability of the next token via an affine transformation and a softmax.
SYSTRAN picked this phrase to show the superiority of their system.
“I am officially running … for president of the United States,
and we are going to make our country great again.”
Notice that understanding the meaning of ‘running’ requires the context of the whole sentence.
Korean - 저는 공식적으로 미국 대통령에 입후보하고 있으며, 우리는 우리 나라를 다시 위대하게 만들 것입니다. (“I am officially standing as a candidate for president of the United States…” — ‘running’ correctly rendered in its candidacy sense)
Chinese - 我是在正式竞选美国总统，我们将再次使我国伟大。 (“I am officially campaigning for president of the United States…” — ‘running’ correctly rendered as campaigning)
Korean - 나는 공식적으로 미국 대통령을 위해 달리고있다. 그리고 우리는 다시 우리나라를 위대하게 만들 것이다. (“I am officially running [on foot] for the president of the United States…” — ‘running’ mistranslated in its physical sense)
Chinese - 我正式运行......为美国总统，我们将使我们的国家再次大。 (“I officially run/execute … for president of the United States, and we will make our country big again.” — ‘running’ mistranslated in its machine/program sense, and the ending is ungrammatical)
English to Korean - 나는 미국의 대통령을 공식적으로 운영하고 있으며, 우리 나라를 다시 위대하게 만들 것이다. (“I am officially operating/managing the president of the United States…” — ‘running’ mistranslated in its ‘running an organization’ sense)
Korean to English - I am officially running for president of the United States, and we will make our country great again.
Notice that it is not too difficult to write an OpenNMT clone in Python if you use TensorFlow.
Many of the functionalities in OpenNMT are already implemented in TensorFlow.
Plan 1: Modularize the TensorFlow seq2seq example and add some convenient tools.
Plan 2: Make this a console application.
As a new side project, we plan to publish it by the end of the winter break (Jan. 23).
Contribute to OpenNMT, or publish a new open-source project?
OpenNMT in Python
Mar. 3, 2020
Nov. 2, 2017
A brief introduction to OpenNMT: Open-Source Neural Machine Translation in Torch.