EMNLP reading @komachi lab (Tokyo Metropolitan University)
On morphological analysis in English, German, and Indonesian. The approach may be adaptable to Japanese word segmentation with word normalization.
1. 20170218
Neural Morphological Analysis: Encoding-Decoding Canonical
Segments
Kann and Cotterell
EMNLP2016 reading @komachi_lab
introduced by Yoshiaki Kitagawa
2. Abstract
✤ Canonical morphological segmentation aims to divide words into a
sequence of standardized segments.
✤ We propose a character-based neural encoder-decoder model for this task.
✤ We extend our model to include morpheme-level and lexical information
through a neural reranker.
✤ We set a new state of the art for the task, improving previous results by up
to 21% accuracy in three languages.
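To make the task concrete, here is a toy illustration of what canonical segmentation produces. The word forms and segmentations below are hypothetical examples in the spirit of the task, not drawn from the paper's data:

```python
# Canonical segmentation restores the standard orthographic form of each
# morpheme, undoing spelling changes at morpheme boundaries.
# (Toy dictionary for illustration only.)
examples = {
    "funniest": ["funny", "est"],          # a surface split would give "funni|est"
    "achievability": ["achieve", "able", "ity"],
}

def canonical_segments(word):
    """Look up the canonical segmentation of a word (toy lookup)."""
    return examples[word]

print(canonical_segments("funniest"))  # ['funny', 'est']
```

The point is that the output segments are standardized word forms, so the model must both segment and transduce characters.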
6. Neural Reranker (1/2)
✤ The reranker rescores canonical segmentations from a candidate set, which
in our setting is sampled from the encoder-decoder distribution p_ED.
✤ The parameters W and u are the weights of the projection and hidden layers.
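A minimal sketch of such a reranker, assuming a one-hidden-layer scorer of the form u^T tanh(W f(c)) over a feature vector f(c) of each candidate, normalized over the sampled candidate set. The feature function and dimensions here are placeholders, not the authors' exact implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
d_feat, d_hid = 8, 4
W = rng.normal(size=(d_hid, d_feat))  # projection layer weights
u = rng.normal(size=d_hid)            # hidden layer weights

def score(features):
    """Unnormalized reranker score: u^T tanh(W f(c))."""
    return u @ np.tanh(W @ features)

def rerank(candidate_features):
    """Softmax over scores of the sampled candidate set."""
    s = np.array([score(f) for f in candidate_features])
    e = np.exp(s - s.max())  # subtract max for numerical stability
    return e / e.sum()

# Five hypothetical candidate feature vectors; pick the best-scoring one.
probs = rerank([rng.normal(size=d_feat) for _ in range(5)])
best = int(probs.argmax())
```

Because the normalization runs only over the sampled candidates, the reranker never has to sum over the full (exponential) space of segmentations.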
8. Training
✤ First: the encoder-decoder
We train an ensemble of five encoder-decoder models.
The encoder and decoder RNNs each have 100 hidden units.
The embedding size is 300.
We use ADADELTA (Zeiler, 2012) with a minibatch size of 20.
✤ Second: the reranking model
To train the reranking model, we first gather the sample set S_w on the training data.
We optimize the log-likelihood of the training data using ADADELTA.
For generalization, we employ L2 regularization, performing a grid search to
determine the coefficient λ ∈ {0.0, 0.1, 0.2, 0.3, 0.4, 0.5}.
9. Result
✤ RR: this work, the encoder-decoder + reranker
✤ ED: the encoder-decoder alone
✤ Joint: (Cotterell et al., 2016), the previous state of the art; a joint model
of separate transduction and segmentation components.
✤ WFST: (Mohri et al., 2002) with a log-linear parameterization.
✤ UB: upper bound
10. Conclusion and Future Work
✤ We developed a model consisting of an encoder-decoder and a neural
reranker for the task of canonical morphological segmentation.
✤ Our model combines character-level information with morpheme-level
features and external information about words.