Dissecting contextual word embeddings

Dissecting Contextual Word Embeddings:
Architecture and Representation

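Key points
- Surveyed pre-trained bidirectional language models (BiLMs), which in recent years have brought large accuracy gains on a wide range of tasks
- Compared three BiLM implementations (LSTM, CNN, and self-attention) on textual entailment (MultiNLI), semantic role labeling, constituency parsing, and named entity recognition
- Investigated BiLM properties by visualizing word-to-phrase similarity, visualizing BiLM word representations, and comparing per-layer results and layer importance on each task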

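Bidirectional language model
- Given a sentence, the model makes predictions in two directions (some parameters, such as the token representation and softmax layer, are shared between the directions)
  - Forward: reading left to right, predict the next token from the preceding tokens
  - Backward: reading right to left, predict the previous token from the following tokens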
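The two prediction directions can be illustrated with a toy model. This is a minimal sketch that uses count-based bigram models with add-one smoothing as stand-ins for the neural forward and backward LMs. The helper names `train_bigram`, `log_prob`, and `bilm_log_likelihood` are hypothetical. The BiLM score of a sentence is the sum of the forward and backward log-likelihoods.

```python
import math
from collections import Counter


def _tokens(sent, reverse):
    # The backward LM reads the sentence right to left.
    body = list(reversed(sent)) if reverse else list(sent)
    return ["<s>"] + body + ["</s>"]


def train_bigram(sentences, reverse=False):
    """Count (context, next-token) pairs: a toy stand-in for one LM direction."""
    pairs, contexts, vocab = Counter(), Counter(), set()
    for sent in sentences:
        toks = _tokens(sent, reverse)
        vocab.update(toks)
        for prev, nxt in zip(toks, toks[1:]):
            pairs[(prev, nxt)] += 1
            contexts[prev] += 1
    return pairs, contexts, vocab


def log_prob(model, sent, reverse=False):
    """Sum of log p(next | previous) along one direction, add-one smoothed."""
    pairs, contexts, vocab = model
    toks = _tokens(sent, reverse)
    return sum(
        math.log((pairs[(p, n)] + 1) / (contexts[p] + len(vocab)))
        for p, n in zip(toks, toks[1:])
    )


def bilm_log_likelihood(fwd, bwd, sent):
    """BiLM objective for one sentence: forward + backward log-likelihood."""
    return log_prob(fwd, sent) + log_prob(bwd, sent, reverse=True)


corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
fwd = train_bigram(corpus)
bwd = train_bigram(corpus, reverse=True)
print(bilm_log_likelihood(fwd, bwd, ["the", "cat", "sat"]))
```

A real BiLM replaces the counts with LSTM, CNN, or Transformer encoders, but the objective keeps the same two-direction shape.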

Model 1: ELMo (LSTM)
- A BiLM built with LSTMs
- A weighted average of all layers is fed to each downstream task
(Figure from "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding")
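The weighted average of layers can be sketched as ELMo's scalar mix, gamma * sum_j softmax(s)_j * h_j, where the scalars s_j and gamma are learned per task. The sketch below shows only the forward computation for one token and represents each layer's vector as a plain Python list. `scalar_mix` is a hypothetical name.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scalar_mix(layer_vectors, scalars, gamma=1.0):
    """ELMo-style mix for one token: gamma * sum_j softmax(scalars)_j * h_j.

    layer_vectors: one vector (list of floats) per BiLM layer.
    scalars: one unnormalized weight per layer (learned per task in practice).
    """
    weights = softmax(scalars)
    mixed = [0.0] * len(layer_vectors[0])
    for w, vec in zip(weights, layer_vectors):
        for i, value in enumerate(vec):
            mixed[i] += w * value
    return [gamma * x for x in mixed]


# Equal scalars reduce the mix to a plain average of the layers.
print(scalar_mix([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.0, 0.0, 0.0]))
```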

Model 2: Gated Convolutional Neural Network (GCNN)
- A bidirectional language model built with CNNs
- Less accurate than the LSTM, but faster to compute
(Figure labels: 3 weeks, 2 weeks)
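The gated convolution can be sketched as a gated linear unit over a causal 1-D convolution, in the style of Dauphin et al.'s gated CNN language model: (x * W + b) multiplied elementwise by sigmoid(x * V + c). This is a minimal single-channel sketch. Real models use many channels and stacked, residual blocks, and the names `causal_conv1d` and `glu_layer` are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def causal_conv1d(xs, kernel, bias):
    """1-D convolution over a single channel, left-padded with zeros.

    Output position t only sees xs[t - k + 1 .. t], so the layer never looks at
    future tokens (required for a forward language model).
    """
    k = len(kernel)
    padded = [0.0] * (k - 1) + list(xs)
    return [
        sum(w * padded[t + i] for i, w in enumerate(kernel)) + bias
        for t in range(len(xs))
    ]


def glu_layer(xs, w, b, v, c):
    """Gated linear unit: (x * W + b) elementwise-times sigmoid(x * V + c)."""
    linear = causal_conv1d(xs, w, b)
    gate = causal_conv1d(xs, v, c)
    return [a * sigmoid(g) for a, g in zip(linear, gate)]


# A backward LM would run the same kind of layer over the reversed sequence.
print(glu_layer([1.0, 2.0, 3.0], [0.0, 1.0], 0.0, [0.0, 0.0], 0.0))
```

Unlike an LSTM, every output position can be computed in parallel, which is where the speed advantage comes from.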

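Model 3: Transformer
- The unit introduced in "Attention Is All You Need"
- A block of multi-head self-attention followed by a position-wise feed-forward network (MLP), each wrapped with a residual connection and layer normalization
- The language model follows GPT (which raises the question of whether it is truly bidirectional; BERT had not been released yet)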
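The core of the block is scaled dot-product self-attention. For a left-to-right (GPT-style) language model, it needs a causal mask so that each position attends only to itself and earlier positions. This is a minimal single-head sketch. The learned query/key/value projections, multiple heads, the MLP, and normalization are omitted, and `causal_self_attention` is a hypothetical name.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def causal_self_attention(q, k, v):
    """Single-head scaled dot-product attention with a causal (left-to-right) mask.

    q, k, v: lists of per-position vectors. Position t attends only to
    positions 0..t, as in a forward language model.
    """
    d = len(q[0])
    outputs = []
    for t, q_t in enumerate(q):
        scores = [
            sum(a * b for a, b in zip(q_t, k[s])) / math.sqrt(d) for s in range(t + 1)
        ]
        weights = softmax(scores)
        outputs.append(
            [sum(weights[s] * v[s][i] for s in range(t + 1)) for i in range(len(v[0]))]
        )
    return outputs


# With all-zero queries every visible position gets equal weight.
print(causal_self_attention([[0.0, 0.0]] * 3, [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
                            [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]))
```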

Parameter counts, etc.
- The LSTM achieves the best (lowest) perplexity
- The CNN and Transformer are faster
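Perplexity, the metric compared here, is the exponential of the average negative log-probability the model assigns to each observed token. Lower is better. A minimal sketch follows, where `perplexity` is a hypothetical helper that takes the per-token probabilities directly.

```python
import math


def perplexity(token_probs):
    """exp of the mean negative log-probability of the observed tokens."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)


# A model that assigns probability 1/4 to every observed token has perplexity 4.
print(perplexity([0.25, 0.25, 0.25, 0.25]))
```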

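Tasks
- MultiNLI: textual entailment (natural language inference); extends SNLI, which covers only a single domain, to multiple genres
  - Data: MultiNLI
  - Model: the Chen2018 method (ESIM)
(Figure from "A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference")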
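ESIM is a full model. One step that is easy to isolate is its local inference enhancement. It concatenates each encoded premise token vector a with its attention-aligned counterpart a~ from the hypothesis, producing [a; a~; a - a~; a * a~]. A minimal sketch of just that step, with the hypothetical name `enhance`:

```python
def enhance(a_bar, a_tilde):
    """ESIM local inference enhancement for one token: [a; a~; a - a~; a * a~]."""
    diff = [x - y for x, y in zip(a_bar, a_tilde)]
    prod = [x * y for x, y in zip(a_bar, a_tilde)]
    return list(a_bar) + list(a_tilde) + diff + prod


print(enhance([1.0, 2.0], [3.0, 4.0]))  # → [1.0, 2.0, 3.0, 4.0, -2.0, -2.0, 3.0, 8.0]
```

The difference and product terms expose contradiction- and agreement-like signals that a plain concatenation would leave the next layer to learn.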

Tasks
- Semantic Role Labeling: labels are assigned with BIO tags
  - Given a sentence, extract information such as who did what to whom
  - Data: OntoNotes 5.0
  - Model: He2017
  - Assigns labels to words or phrases in a sentence that indicate their semantic role in the sentence
  - It consists of the detection of the semantic arguments associated with the predicate or verb of a sentence and their classification into their specific roles. For example, given a sentence like "Mary sold the book to John", the task would be to recognize the verb "to sell" as representing the predicate, "Mary" as representing the seller (agent), "the book" as representing the goods (theme), and "John" as representing the recipient.
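Because the roles are emitted as BIO tags, a tagger's output has to be grouped back into labeled spans. The sketch below decodes the "Mary sold the book to John" example. The role labels (AGENT, THEME, RECIPIENT, V) are illustrative, not the PropBank label set used with OntoNotes, and `bio_to_spans` is a hypothetical helper.

```python
def bio_to_spans(tokens, tags):
    """Group BIO tags into (label, text) spans; 'O' tokens belong to no span.

    An I- tag whose label differs from the open span is treated as a new span.
    """
    spans, label, words = [], None, []
    for tok, tag in zip(tokens, tags):
        continues = tag.startswith("I-") and tag[2:] == label
        if continues:
            words.append(tok)
            continue
        if label is not None:  # close the currently open span
            spans.append((label, " ".join(words)))
        label, words = (None, []) if tag == "O" else (tag[2:], [tok])
    if label is not None:
        spans.append((label, " ".join(words)))
    return spans


tokens = "Mary sold the book to John".split()
tags = ["B-AGENT", "B-V", "B-THEME", "I-THEME", "O", "B-RECIPIENT"]
print(bio_to_spans(tokens, tags))
# → [('AGENT', 'Mary'), ('V', 'sold'), ('THEME', 'the book'), ('RECIPIENT', 'John')]
```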

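Semantic role labeling
- A Japanese example (http://www.cl.cs.okayama-u.ac.jp/study/project/asa/)
  - Input: 昨日彼は私に手紙を送った． ("Yesterday he sent me a letter.")
  - Output:
    [昨日] (yesterday): Location (time)(point) => to be changed to Time (point) (2015.10)
    [彼は] (he): Agent
    [私に] (to me): Goal (person)
    [手紙を] (a letter): Theme
    [送った] (sent): State change - change of location - change of location (physical) (between persons) - transfer of possession to another - provision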

Tasks
- Constituency Parsing
  - Data: Penn Treebank
  - Model: Joshi2018 Reconciled Span Parser
- Named Entity Recognition (NER)
  - Data: CoNLL 2003
  - Model: Peters2018 (character-based CNN word representations + biLSTM + CRF loss)
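With a CRF loss, the NER model predicts the best tag sequence by Viterbi decoding over per-token emission scores and tag-to-tag transition scores. This is a minimal sketch. The scores would come from the biLSTM and learned transition parameters, and the values and the name `viterbi` here are hypothetical.

```python
def viterbi(emissions, transitions, labels):
    """Highest-scoring label sequence under a linear-chain CRF.

    emissions[t][y]: score of label y at position t.
    transitions[(y_prev, y)]: score of moving from y_prev to y (missing = 0.0).
    """
    best = [{y: emissions[0][y] for y in labels}]
    back = [{}]
    for t in range(1, len(emissions)):
        best.append({})
        back.append({})
        for y in labels:
            prev = max(labels, key=lambda p: best[t - 1][p] + transitions.get((p, y), 0.0))
            best[t][y] = best[t - 1][prev] + transitions.get((prev, y), 0.0) + emissions[t][y]
            back[t][y] = prev
    # Follow back-pointers from the best final label.
    path = [max(labels, key=lambda y: best[-1][y])]
    for t in range(len(emissions) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


labels = ["O", "B-PER", "I-PER"]
emissions = [
    {"O": 0.0, "B-PER": 2.0, "I-PER": 1.0},  # "John"
    {"O": 0.0, "B-PER": 1.0, "I-PER": 1.5},  # "Smith"
    {"O": 2.0, "B-PER": 0.0, "I-PER": 0.0},  # "said"
]
transitions = {("O", "I-PER"): -10.0}  # discourage I- right after O
print(viterbi(emissions, transitions, labels))  # → ['B-PER', 'I-PER', 'O']
```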

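Analysis 1: Similarity within a sentence
- Cosine similarity between every pair of tokens, visualized (yellow = similar)
- Lower layers capture narrow (local) syntactic features; higher layers capture broader ones
- Red: similarity is grouped by phrase
- White: appears to capture coreference
- Purple: verbs are similar to each other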
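The heatmap is a pairwise cosine-similarity matrix over the token vectors of one sentence at one layer. A minimal sketch with hypothetical helper names follows. Plotting the matrix is left to any heatmap tool.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_matrix(vectors):
    """Pairwise cosine similarity between the token vectors of one sentence/layer."""
    return [[cosine(u, v) for v in vectors] for u in vectors]


for row in similarity_matrix([[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]):
    print([round(x, 3) for x in row])
```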

Analysis 2: Visualizing span representations
- Span representations are built and visualized with t-SNE
  - Spans: 3,000 labeled chunks and 500 spans not labeled as chunks, from the CoNLL 2000 chunking dataset
- Spans cluster by syntactic structure
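The slide does not say how a span vector is built. A common choice, assumed here rather than taken from the source, combines the first and last token vectors with their elementwise product. The resulting vectors would then be projected to 2-D with t-SNE (for example, scikit-learn's `sklearn.manifold.TSNE`). `span_representation` is a hypothetical name.

```python
def span_representation(token_vectors, start, end):
    """Assumed span encoding: [h_start; h_end; h_start * h_end], end inclusive."""
    first, last = token_vectors[start], token_vectors[end]
    return list(first) + list(last) + [a * b for a, b in zip(first, last)]


sentence = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # toy per-token vectors
print(span_representation(sentence, 0, 2))  # → [1.0, 2.0, 5.0, 6.0, 5.0, 12.0]
```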

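Analysis 3: Unsupervised coreference resolution
- Evaluated on resolving pronouns to antecedents within a single sentence in CoNLL 2012
- Baselines
  - Lee2017: 64%
  - Rules
    - Nearest preceding noun: 27%
    - First noun in the sentence: 35%
    - + additional rules (details unclear): 41%
    - + number agreement: 47%
- Method
  - Subtract the mean from the pronoun's vector, since local similarity is high to begin with
  - Choose the most similar noun among those appearing before the pronoun
  - Accuracy: 57%
  - Number agreement is also enforced; removing this rule lowers accuracy by 2-3 points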
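The method above can be sketched as follows. Subtract a mean vector, keep only the nouns before the pronoun that agree with it in number, and pick the candidate with the highest cosine similarity. The per-token `is_noun` and `number` annotations, the mean vector, and the name `resolve_pronoun` are hypothetical inputs for illustration.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def resolve_pronoun(vectors, is_noun, number, pronoun_idx, mean):
    """Return the index of the predicted antecedent, or None if there is no candidate.

    vectors: contextual vectors for one sentence; mean: a mean vector to subtract.
    is_noun / number: hypothetical per-token annotations ("sg" or "pl").
    """
    def center(vec):
        return [a - m for a, m in zip(vec, mean)]

    pronoun = center(vectors[pronoun_idx])
    # Candidates: nouns before the pronoun that agree with it in number.
    candidates = [
        i for i in range(pronoun_idx)
        if is_noun[i] and number[i] == number[pronoun_idx]
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda i: cosine(pronoun, center(vectors[i])))


# Toy sentence: "dogs cats said they" with made-up 2-D vectors.
vectors = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.1, 0.9]]
print(resolve_pronoun(vectors, [True, True, False, False], ["pl", "pl", None, "pl"], 3, [0.0, 0.0]))
```

Dropping the `number[i] == number[pronoun_idx]` filter corresponds to removing the agreement rule mentioned above.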

Analysis 4: Semantic representations
- Mikolov's word analogy task was run on the word embeddings
- The contextual models' embeddings capture syntactic relations but not semantic ones
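Mikolov's analogy task answers "a is to b as c is to ?" by finding the word closest (by cosine) to the offset vector b - a + c, excluding the three query words. This is a minimal sketch on made-up toy vectors. `analogy` is a hypothetical helper.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def analogy(a, b, c, vocab):
    """Answer "a is to b as c is to ?" with the vector offset b - a + c.

    vocab: dict word -> vector. The three query words are excluded from the answer.
    """
    target = [vb - va + vc for va, vb, vc in zip(vocab[a], vocab[b], vocab[c])]
    candidates = [w for w in vocab if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, vocab[w]))


toy = {
    "man": [1.0, 0.0, 0.0],
    "woman": [1.0, 1.0, 0.0],
    "king": [1.0, 0.0, 1.0],
    "queen": [1.0, 1.0, 1.0],
    "apple": [0.0, 0.0, 1.0],
}
print(analogy("man", "woman", "king", toy))  # → queen
```

Semantic analogies (like this one) and syntactic ones (like walk : walked) are scored separately, which is how the two kinds of features can be told apart.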

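Analysis 5: Role of each layer
- Which layers matter most differs by task
  - Penn Treebank POS tagging: lower layers
  - Constituency parsing: middle layers
  - Coreference: upper layers
  - MNLI: middle layers
  - NER: lower layers matter more, but all layers contribute to accuracy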
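One way to summarize layer importance, assuming it is read off ELMo-style learned softmax layer weights, is to compute the weight-averaged layer position and bucket it into lower, middle, or upper. This is a minimal sketch. The 1/3 and 2/3 cut-offs, the example scalars, and the name `dominant_region` are hypothetical illustration choices.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dominant_region(scalars):
    """Return 'lower', 'middle', or 'upper' for a list of learned layer scalars.

    Uses the softmax-weighted mean layer index, normalized to [0, 1].
    """
    weights = softmax(scalars)
    position = sum(j * w for j, w in enumerate(weights)) / (len(weights) - 1)
    if position < 1 / 3:
        return "lower"
    if position <= 2 / 3:
        return "middle"
    return "upper"


print(dominant_region([5.0, 0.0, 0.0, 0.0, 0.0, 0.0]))  # → lower
print(dominant_region([0.0, 0.0, 5.0, 5.0, 0.0, 0.0]))  # → middle
```

A weighted mean hides U-shaped profiles (weight on both ends averages to "middle"), so the full weight list is worth inspecting alongside this summary.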
