EMNLP reading @komachi lab (Tokyo Metropolitan University)
About morphological analysis in English, German, and Indonesian. It may be possible to adapt the approach to Japanese word segmentation with word normalization.
6. Dataset Collection
Training data: sentence-triple pairs. To keep the quality high, the following approach is used.
Two-step approach
1. Use co-reference resolution (Clark and Manning, 2016) and heuristics to
rewrite nouns in the first sentence as the main entity
Example: first sentence from the Wikipedia article on Barack Obama
He was reelected to the Illinois Senate in 1998.
-> Barack Obama was reelected to the Illinois Senate in 1998.
2. Dictionary-based paraphrasing and a sentence filter
PATTY (Nakashole et al., 2012), POLY (Grycner and Weikum, 2016), PPDB (Ganitkevitch et al., 2013) ->
together these provide 540 predicates and 24,013 unique paraphrases
Relationship paraphrase example: “place of birth” -> {born in, was born in, ...}
Sentence filter example:
For ⟨Barack Obama, place of birth, Honolulu⟩:
OK: Barack Obama was born in 1961 in Honolulu, Hawaii.
NG (filtered out): Barack Obama visited Honolulu in 2010.
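The two steps above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: `replace_leading_pronoun` is a hypothetical, naive stand-in for a real co-reference resolver (it only handles a sentence-initial pronoun), and `keep_sentence` assumes the filter keeps a sentence when it mentions both entities and at least one paraphrase of the predicate. The paraphrase dictionary here holds only the two entries shown on the slide.

```python
# Hypothetical pronoun list; a real system would use a co-reference resolver.
PRONOUNS = {"he", "she", "it", "they"}

def replace_leading_pronoun(sentence, main_entity):
    """Step 1 (naive stand-in): swap a sentence-initial pronoun for the main entity."""
    first, sep, rest = sentence.partition(" ")
    if first.lower() in PRONOUNS:
        return main_entity + sep + rest
    return sentence

def keep_sentence(sentence, triple, paraphrases):
    """Step 2 (assumed rule): keep the sentence only if it mentions both
    entities and some paraphrase of the triple's predicate."""
    head, predicate, tail = triple
    text = sentence.lower()
    if head.lower() not in text or tail.lower() not in text:
        return False
    return any(p.lower() in text for p in paraphrases.get(predicate, []))

# Toy paraphrase dictionary with the entries from the slide.
PARAPHRASES = {"place of birth": ["born in", "was born in"]}
TRIPLE = ("Barack Obama", "place of birth", "Honolulu")

rewritten = replace_leading_pronoun(
    "He was reelected to the Illinois Senate in 1998.", "Barack Obama"
)
ok = "Barack Obama was born in 1961 in Honolulu, Hawaii."
ng = "Barack Obama visited Honolulu in 2010."

print(rewritten)
print(keep_sentence(ok, TRIPLE, PARAPHRASES))
print(keep_sentence(ng, TRIPLE, PARAPHRASES))
```

Plain substring matching is used only to keep the sketch short; it would miss inflected or reordered paraphrases.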
7. Joint Learning of Word and Entity Embeddings
The objective function J is the sum of the entity embedding loss J_E and the word embedding loss J_W
J_E: TransE (Bordes et al., 2013), J_W: Skip-gram (Mikolov et al., 2013)
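For reference, the joint objective can be written out as below. The slide only names the two components, so the expanded forms are the standard TransE margin loss and the Skip-gram log-likelihood from the cited papers. The notation (margin γ, distance d, triple set S, corrupted triples S', corpus length T, window size c) and the sign convention that turns the Skip-gram likelihood into a loss are assumptions, not taken from the slide.

```latex
\begin{align}
J   &= J_E + J_W \\
J_E &= \sum_{(h,r,t)\in S}\ \sum_{(h',r,t')\in S'_{(h,r,t)}}
       \max\bigl(0,\ \gamma + d(\mathbf{h}+\mathbf{r},\,\mathbf{t})
                          - d(\mathbf{h}'+\mathbf{r},\,\mathbf{t}')\bigr) \\
J_W &= -\frac{1}{T}\sum_{i=1}^{T}\ \sum_{\substack{-c \le j \le c \\ j \ne 0}}
       \log p\bigl(w_{i+j} \mid w_i\bigr)
\end{align}
```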
8. N-gram Based Attention Model
A standard attention model cannot capture the fact that one entity may correspond to multiple words
Example: New York University is a single entity made up of 3 words
So attention is computed over n-grams up to N=3
n indicates the n-gram combination
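The idea can be illustrated with a small sketch. The slide does not say how n-gram vectors are composed or how attention is scored, so two assumptions are made here: an n-gram vector is the average of its word vectors, and scores are plain dot products with a query vector. All function names and the toy 2-d embeddings are hypothetical.

```python
import math

def ngram_vectors(word_vecs, n):
    """Average each window of n consecutive word vectors (assumed combination rule)."""
    dim = len(word_vecs[0])
    grams = []
    for i in range(len(word_vecs) - n + 1):
        window = word_vecs[i:i + n]
        grams.append([sum(vec[d] for vec in window) / n for d in range(dim)])
    return grams

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def ngram_attention(query, word_vecs, max_n=3):
    """For each n in 1..max_n, return (attention weights, weighted summary) over n-grams."""
    dim = len(query)
    result = {}
    for n in range(1, max_n + 1):
        grams = ngram_vectors(word_vecs, n)
        if not grams:  # sentence shorter than n tokens
            continue
        # Dot-product scoring (assumption; the slide gives no scoring function).
        scores = [sum(q * g for q, g in zip(query, gram)) for gram in grams]
        weights = softmax(scores)
        summary = [sum(w * gram[d] for w, gram in zip(weights, grams))
                   for d in range(dim)]
        result[n] = (weights, summary)
    return result

# Toy 2-d embeddings with made-up numbers.
tokens = ["studied", "at", "New", "York", "University"]
vectors = [[0.1, 0.0], [0.0, 0.1], [1.0, 0.2], [0.9, 0.3], [1.0, 0.1]]
query = [1.0, 0.0]

attn = ngram_attention(query, vectors)
weights3, _ = attn[3]
best = max(range(len(weights3)), key=weights3.__getitem__)
best_trigram = " ".join(tokens[best:best + 3])
print(best_trigram)
```

With these toy vectors the trigram covering the three entity words receives the largest weight at n=3, which is the behaviour a word-level attention cannot express directly.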