Neural relation extraction for knowledge base enrichment introduced by Yoshiaki Kitagawa

Neural Relation Extraction for
Knowledge Base Enrichment
Bayu Distiawan Trisedya, Gerhard Weikum, Jianzhong Qi, Rui Zhang
ACL 2019, paper reading group, presenter: Yoshiaki Kitagawa

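Abstract
• Relation extraction research aimed at improving knowledge base (KB) quality
• The task is to extract entities and relations from sentences
• Problem: previous work depends on Named Entity Disambiguation (NED), so NED errors carry over into the extracted triples
• To address this, the paper proposes an n-gram based attention model
• The model learns word and entity embeddings jointly, which reduces the impact of NED errors
• Achieves state-of-the-art results on two datasets (F1 improvements of 15.51% and 8.38%)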

Introduction
• A triple is written ⟨h, r, t⟩
  • h: head entity
  • t: tail entity
  • r: relationship
• Typically, KB coverage is higher for entities than for relations
• This work aims to find the relation that holds between h and t
• The model treats the task as translating an input sentence into canonicalized output
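
The "sentence in, canonicalized triple out" framing can be sketched as follows. This is a minimal illustration, not the paper's data format: the `Triple` class, the `to_target_sequence` helper, and the `ENT_`/`REL_` identifiers are hypothetical placeholders for canonical KB IDs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    head: str      # h: head entity (canonical ID, hypothetical here)
    relation: str  # r: relationship (canonical ID, hypothetical here)
    tail: str      # t: tail entity (canonical ID, hypothetical here)


def to_target_sequence(triples):
    """Flatten triples into one token sequence, usable as a seq2seq target."""
    tokens = []
    for triple in triples:
        tokens.extend([triple.head, triple.relation, triple.tail])
    return tokens


# Input side: the raw sentence. Output side: canonical IDs, not surface words.
sentence = "Barack Obama was born in 1961 in Honolulu, Hawaii."
example = Triple("ENT_Barack_Obama", "REL_place_of_birth", "ENT_Honolulu")
target = to_target_sequence([example])
```

Because the target consists of IDs rather than words from the sentence, the decoder has to pick the right entity itself, which is where a separate NED step is avoided.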

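Contribution
• Proposes an end-to-end model that extracts triples in canonical form directly from sentences, avoiding the accuracy loss caused by NED errors
• The model consists of:
  • an n-gram based attention model (to map multi-word mentions to entities efficiently)
  • joint learning of word and entity embeddings (for entity disambiguation)
  • beam search and a triple classifier (to further improve accuracy)
• Evaluated on two datasets, reaching state-of-the-art results. To improve the training data, distant supervision is applied together with co-reference resolution and paraphrasing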

Overview
• The system is divided into three main modules
  • Dataset collection
  • Embedding
  • Neural relation extraction

Dataset Collection
• Training data: sentence-triple pairs. The following approach is used to keep their quality high
• Two-step approach
1. Apply co-reference resolution (Clark and Manning, 2016) and heuristics to replace co-referring nouns in the first sentence with the main entity
  • Example: first sentence from the Wikipedia article on Barack Obama
  • He was reelected to the Illinois Senate in 1998.
  • -> Barack Obama was reelected to the Illinois Senate in 1998.
2. Dictionary-based paraphrasing and a sentence filter
  • PATTY (Nakashole et al., 2012), POLY (Grycner and Weikum, 2016), PPDB (Ganitkevitch et al., 2013): together these provide 540 predicates and 24,013 unique paraphrases
  • Relationship paraphrase example: “place of birth” -> {born in, was born in, ...}
  • Sentence filter example:
  • For ⟨Barack Obama, place of birth, Honolulu⟩
  • OK: Barack Obama was born in 1961 in Honolulu, Hawaii.
  • NG (filtered out): Barack Obama visited Honolulu in 2010.

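The sentence filter from step 2 can be sketched as below. This is a simplified guess at the idea using plain substring matching; the slides do not describe the paper's exact matching procedure. `PARAPHRASES` is a toy dictionary built from the example above, and `keep_sentence` is a hypothetical helper name.

```python
# Toy paraphrase dictionary (assumption): the real one is built from
# PATTY, POLY and PPDB and is far larger.
PARAPHRASES = {"place of birth": ["born in", "was born in"]}


def keep_sentence(sentence, triple, paraphrases=PARAPHRASES):
    """Keep a sentence only if it mentions the head, the tail,
    and at least one paraphrase of the relation."""
    head, relation, tail = triple
    text = sentence.lower()
    if head.lower() not in text or tail.lower() not in text:
        return False
    return any(phrase in text for phrase in paraphrases.get(relation, []))


triple = ("Barack Obama", "place of birth", "Honolulu")
kept = keep_sentence("Barack Obama was born in 1961 in Honolulu, Hawaii.", triple)
dropped = keep_sentence("Barack Obama visited Honolulu in 2010.", triple)
```

Both sentences mention the head and tail entities, so plain distant supervision would pair both with the triple; only the paraphrase check separates them.
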
Joint Learning of Word and Entity Embeddings
• The objective function is J = J_E + J_W, the sum of the entity embedding objective J_E and the word embedding objective J_W
  • J_E: TransE (Bordes et al., 2013)
  • J_W: Skip-gram (Mikolov et al., 2013)
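
A minimal sketch of a summed objective of this shape, assuming the textbook forms of both losses: a margin-based TransE loss for J_E and skip-gram with negative sampling for J_W. The paper's exact formulation may differ; all function names are illustrative.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def transe_distance(h, r, t):
    """L2 norm of h + r - t: small when the triple is plausible."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def transe_loss(h, r, t, t_neg, margin=1.0):
    """J_E for one example: margin ranking loss against a corrupted tail."""
    return max(0.0, margin + transe_distance(h, r, t) - transe_distance(h, r, t_neg))


def skipgram_loss(center, context, negatives):
    """J_W for one example: skip-gram with negative sampling."""
    def log_sigmoid(x):
        return -math.log1p(math.exp(-x))

    loss = -log_sigmoid(dot(center, context))
    for neg in negatives:
        loss -= log_sigmoid(-dot(center, neg))
    return loss


def joint_loss(entity_example, word_example):
    """J = J_E + J_W for one entity example and one word example."""
    return transe_loss(*entity_example) + skipgram_loss(*word_example)


entity_example = ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, 1.5])  # h, r, t, corrupted t
word_example = ([0.0, 0.0], [1.0, 1.0], [[1.0, 0.0]])  # center, context, negatives
loss = joint_loss(entity_example, word_example)
```

Minimizing both terms over shared parameters is what places words and entities in one embedding space.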

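N-gram Based Attention Model
• A standard attention model cannot capture the fact that one entity may correspond to several words
  • Example: “New York University” is three words but a single entity
• The model therefore computes attention over n-grams up to N = 3
• n indicates the n-gram combination (n = 1, ..., N)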

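A rough sketch of attention over n-grams up to N = 3. Three simplifications are assumptions, not the paper's formulation: each n-gram is represented by the mean of its word vectors, scores are plain dot products with a decoder query, and a single softmax runs over all n-grams. The paper uses learned parameters.

```python
import math


def ngram_spans(length, max_n=3):
    """All contiguous spans (start, n) for n = 1..max_n."""
    return [(i, n) for n in range(1, max_n + 1) for i in range(length - n + 1)]


def ngram_vector(word_vecs, start, n):
    """Mean of the n word vectors in the span (an assumption of this sketch)."""
    dim = len(word_vecs[0])
    return [sum(word_vecs[start + k][d] for k in range(n)) / n for d in range(dim)]


def ngram_attention(query, word_vecs, max_n=3):
    """Return the spans and their softmax-normalized attention weights."""
    spans = ngram_spans(len(word_vecs), max_n)
    vecs = [ngram_vector(word_vecs, start, n) for start, n in spans]
    scores = [sum(q * v for q, v in zip(query, vec)) for vec in vecs]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return spans, [e / total for e in exps]


# Three toy word vectors standing in for "New", "York", "University".
word_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
spans, weights = ngram_attention([1.0, 1.0], word_vecs)
```

The span `(0, 3)` covers the whole phrase, so the model can attend to "New York University" as one unit rather than three separate words.
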
Triple Generation
• Problem:
  • The model outputs a triple (recap)
  • Because of embedding similarity, the wrong entity is sometimes selected
  • Example: “New York City” and “Chicago”
• Two strategies address this
  • Modified beam search
    • Compute the edit distance between each output entity and the n-grams of the input sentence, re-rank the candidates, and keep the top k (k = 10)
  • Triple classifier (filters invalid triples)
    • A binary classifier
    • Quality of entity embeddings (Socher et al., 2013)
    • Uses the embeddings produced by joint learning to compute score = (h + r − t)
    • Negative examples are created by negative sampling

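Both strategies can be sketched in a few lines. In this illustration, candidates are ranked by edit distance alone, because the slides do not say how the distance is combined with the decoder's own score. The triple score is taken as the L2 norm of h + r − t, a common TransE choice and an assumption here. All function names are hypothetical.

```python
import math


def edit_distance(a, b):
    """Levenshtein distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def sentence_ngrams(sentence, max_n=3):
    """Word n-grams (n = 1..max_n) of a non-empty sentence, lowercased."""
    tokens = sentence.lower().split()
    return [" ".join(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)]


def rerank(candidates, sentence, k=10):
    """Sort candidate entity names by their closest n-gram in the sentence."""
    grams = sentence_ngrams(sentence)

    def best_distance(name):
        return min(edit_distance(name.lower(), gram) for gram in grams)

    return sorted(candidates, key=best_distance)[:k]


def triple_score(h, r, t):
    """Norm of h + r - t; lower means the triple is more plausible."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


ranked = rerank(["Chicago", "New York City"], "She moved to New York City in 2010")
```

Here "New York City" matches a trigram of the sentence exactly (distance 0) and moves ahead of "Chicago", even if the two entities have similar embeddings. A binary classifier could then threshold `triple_score` to filter invalid triples, trained against negatives made by corrupting valid ones.
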
Conclusion
• To address NED error propagation, a known problem in relation extraction for improving KB quality, the paper proposes:
  • A distant supervision method for collecting high-quality data
    • co-reference resolution and paraphrase detection
  • Model
    • n-gram based attention model
    • joint learning of entity and word embeddings
    • modified beam search and triple classification
• Result: outperforms state-of-the-art models on two datasets
  • by 15.51% and 8.38% (F1 score)
