[DLHacks] Running XLNet and Visualizing Its Outputs

Tsuruoka Lab, The University of Tokyo, B4 中村朝陽

Agenda
Code: Trying out XLNet
Introduction: What is XLNet?
Results: Visualizing the latent space obtained from XLNet
Conclusion: Summary and discussion
I ran XLNet locally and visualized its outputs

XLNet drew attention in June 2019 as a language model that substantially outperformed BERT
XLNet: a new pretraining method for NLP that significantly improves upon BERT on 20 tasks (e.g., SQuAD, GLUE, RACE)
19 JUN 2019
arXiv: https://arxiv.org/abs/1906.08237
GitHub (code + pre-trained models): https://github.com/zihangdai/xlnet
1. Introduction: What is XLNet?

1. Introduction: What is XLNet?
https://mc.ai/bertを超えたxlnetの紹介/
https://qiita.com/Kaniikura/items/242988e96cd78148be60

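1. Introduction: Problem and proposed method
Points out a problem in how the probability of a sentence is factorized
Proposes training that takes all factorization orders into account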

For a given factorization order, the information available for predicting x3 is limited
Example: when factorizing in the order 2→4→3→1, only x2 and x4 may be used (top right)
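As a rough illustration of the rule above, the following sketch lists which positions may be used to predict a target under a given factorization order. The function name `visible_context` is made up for this example and is not part of the XLNet code.

```python
from itertools import permutations


def visible_context(order, target):
    """Positions (1-indexed) that come before `target` in the factorization order."""
    idx = order.index(target)
    return sorted(order[:idx])


# The example from the slide: order 2 -> 4 -> 3 -> 1, predicting x3.
print(visible_context([2, 4, 3, 1], 3))  # [2, 4]

# The usual left-to-right order only ever exposes the left context.
print(visible_context([1, 2, 3, 4], 3))  # [1, 2]

# Across all 4! orders, x3 is predicted from every subset of {x1, x2, x4}.
contexts = {tuple(visible_context(list(p), 3)) for p in permutations([1, 2, 3, 4])}
print(sorted(contexts))
```

Enumerating every order shows that x3 is eventually predicted from each possible subset of the other positions, which is the sense in which all factorization orders are taken into account.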
1. Introduction: Concrete example

XLNet vs. BERT vs. GPT
BERT cannot capture dependencies between masked words
GPT cannot capture dependencies such as those running from a word on the right to a word on the left
New York is a city
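The XLNet paper uses this same sentence to make the comparison concrete. Suppose "New" and "York" are the two prediction targets, and assume XLNet samples the factorization order [is, a, city, New, York]. The two objectives then reduce to the following (notation as I recall it from the paper):

```latex
\begin{align}
\mathcal{J}_{\text{BERT}}  &= \log p(\text{New} \mid \text{is a city}) + \log p(\text{York} \mid \text{is a city}) \\
\mathcal{J}_{\text{XLNet}} &= \log p(\text{New} \mid \text{is a city}) + \log p(\text{York} \mid \text{New}, \text{is a city})
\end{align}
```

The only difference is the extra conditioning on "New" in the second XLNet term, which is the dependency between masked words that BERT drops.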
1. Introduction: Comparison with BERT and GPT
GPT is only able to cover the dependency

I want to visualize XLNet's output vectors with PCA, t-SNE, and similar methods
2. Code: What I want to do
[Figure: the input "[CLS] This is a pen . [SEP] [PAD] …" is fed to XLNet (Embeddings, Transformer Layer 0, Transformer Layer 1, …, Transformer Layer 22), and the final layer output is visualized]
Each output tensor has shape (seq_len, n_dim) = (200, 1024)
200 tokens * 32 sentences
→ 6,400 points in 1024 dimensions
Meta data attached to each vector:
Token   sentence_id (0 - 31)
This    0
is      0
[SEP]   0
[PAD]   0
This    0
That    1
これ    0
あれ    1
:       :
私      31
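The bookkeeping described above can be sketched as follows. This is only an illustration under assumptions: the outputs are random stand-ins rather than real XLNet activations, PCA is done with a plain NumPy SVD instead of TensorBoard's projector, and the names `outputs`, `points`, `sentence_ids`, and `pca_2d` are invented for this example.

```python
import numpy as np

n_sentences, seq_len, n_dim = 32, 200, 1024

# Stand-in for the final-layer outputs (random values, used only for their shape).
rng = np.random.default_rng(0)
outputs = rng.standard_normal((n_sentences, seq_len, n_dim)).astype(np.float32)

# One row per token: 32 * 200 = 6,400 points in 1024 dimensions.
points = outputs.reshape(-1, n_dim)

# Metadata aligned with the rows of `points`: which sentence each token came from.
sentence_ids = np.repeat(np.arange(n_sentences), seq_len)


def pca_2d(x):
    """Project rows of x onto their first two principal components."""
    centered = x - x.mean(axis=0, keepdims=True)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


projected = pca_2d(points)
print(points.shape, sentence_ids.shape, projected.shape)
```

Keeping the metadata in the same row order as the vectors is what later allows coloring and filtering by sentence id.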

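I turned the parts that were hard to follow in the official GitHub repository into a notebook
2. Code: Trying out XLNet
- There are scripts for fine-tuning on text classification, SQuAD, and RACE
  → (For research) I just wanted to be able to run inference
- I wanted to simply try the model out, not fine-tune it
  → "Try modifying run_classifier.py a little!"
  → More than 800 lines…
- I am not very familiar with TensorFlow to begin with
  → Volunteers are actively working on a PyTorch version, but I wanted to try it right away and could not wait
→ So I made a Jupyter notebook that loads the pre-trained model and gets the outputs for sentences you prepare yourself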
https://github.com/tyo-yo/analyze_xlnet/blob/master/analyze_xlnet.2019.07.01.ipynb

There is not much code; it is essentially my experiments cleaned up and published
2. Code: Git repository layout

Three things: the model weights, the SentencePiece model (how text is split), and the config
2. Code: Downloading the pre-trained data

See this notebook
2. Code: Trying out XLNet
https://github.com/tyo-yo/analyze_xlnet/blob/master/analyze_xlnet.2019.07.01.ipynb

Two things are specified: the configuration of the model itself and the run configuration
2. Code: Model configuration

Because this is TensorFlow, the computation graph is defined first
2. Code: Defining the model (computation graph)

SentencePiece model; vocabulary size: about 32,000 pieces
2. Code: Tokenization

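Two modes (user_defined_symbols, control_symbols): how do you switch between them?
2. Code: Tokenization
user_defined_symbols: special symbols are recognized during tokenization
control_symbols: special symbols are not recognized during tokenization (they are inserted separately after tokenization)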
https://github.com/google/sentencepiece/issues/215
The pre-trained model uses control_symbols mode, which makes it awkward to work with…
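One way to live with a control_symbols model is to tokenize only the plain text and add the special-token ids yourself afterwards. The sketch below uses a toy whitespace tokenizer as a stand-in for SentencePiece; the vocabulary, the ids, and both function names are hypothetical. The [CLS] … [SEP] layout follows the diagram on the earlier slide, not any particular library's convention.

```python
# Toy vocabulary; in practice the ids come from the SentencePiece model.
VOCAB = {"<unk>": 0, "[CLS]": 1, "[SEP]": 2, "[PAD]": 3,
         "This": 4, "is": 5, "a": 6, "pen": 7, ".": 8}
SPECIALS = ("[CLS]", "[SEP]", "[PAD]")


def encode_plain(text):
    """Stand-in tokenizer that mimics control_symbols mode: it never emits special ids."""
    ids = []
    for tok in text.split():
        if tok in SPECIALS:
            # A literal "[SEP]" in the text is treated as ordinary text.
            # (Real SentencePiece would split it into ordinary pieces; the toy uses <unk>.)
            ids.append(VOCAB["<unk>"])
        else:
            ids.append(VOCAB.get(tok, VOCAB["<unk>"]))
    return ids


def encode_with_specials(text):
    """Tokenize the plain text first, then insert the special-token ids by hand."""
    return [VOCAB["[CLS]"]] + encode_plain(text) + [VOCAB["[SEP]"]]


print(encode_with_specials("This is a pen ."))  # [1, 4, 5, 6, 7, 8, 2]
```

With user_defined_symbols, by contrast, the tokenizer itself would map a literal special symbol in the text to its id, so no manual insertion step would be needed.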

Results are obtained by passing a feed_dict to the computation graph
2. Code: Execution
(Using the SentencePiece model above, turn "sentences" into a "batch of id sequences")
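Turning sentences into a batch of id sequences mostly means truncating and padding to a fixed length and recording which positions are padding. A minimal sketch follows, assuming right padding as in the slide's diagram and a mask that is 1 for padding and 0 for real tokens. `make_batch`, the ids, and the mask convention here are assumptions for illustration, so check the notebook for what the model actually expects.

```python
def make_batch(id_seqs, seq_len, pad_id):
    """Pad or truncate each id sequence to seq_len and build a padding mask."""
    input_ids, input_mask = [], []
    for ids in id_seqs:
        ids = ids[:seq_len]                      # truncate sequences that are too long
        n_pad = seq_len - len(ids)
        input_ids.append(ids + [pad_id] * n_pad)
        input_mask.append([0] * len(ids) + [1] * n_pad)  # 1 marks padding
    return input_ids, input_mask


batch_ids, batch_mask = make_batch([[1, 4, 2], [1, 4, 5, 6, 2]], seq_len=4, pad_id=3)
print(batch_ids)   # [[1, 4, 2, 3], [1, 4, 5, 6]]
print(batch_mask)  # [[0, 0, 0, 1], [0, 0, 0, 0]]
```

The resulting lists are what a feed_dict would map the graph's input placeholders to when the session is run.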

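When I ran a similar visualization experiment with BERT, I observed the following
1. BERT encodes meaning as a "hierarchical point cloud"
e.g., [centroid of a phrase cluster] = [meaning of the phrase]?
(Language cluster >) Sentence cluster > Phrase cluster
2. No parallel corpus was used to pre-train multilingual BERT, yet correlations appeared between Japanese and English
• Analogy relations like those in Word2Vec
• A Japanese token and its nearest English token are translations of each other
　　　e.g., “鎖” <-> “isolation”
Note: details are given in the Appendix
3. Results: Hypothesis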

20 [CLS] しかも配分に当たって大石自らは分配金受け取りを辞退したので、藩士たちの支持を集めた。 [SEP]
20 [CLS] Moreover, Oishi gained their support since he declined to receive a dividend. [SEP]
BERT forms a cluster for each sentence and, within each one, phrase-level clusters as well
1. BERT encodes meaning as a "hierarchical point cloud"
3. Results: Hypothesis

Hypothesis (a hunch): XLNet should surely show beautifully clean clusters

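The hypothesis appears to have been wrong
1. XLNet did not seem to make much effort to group each sentence into a cluster
Phrase-level clusters were observed, but there were few sentence clusters
Some short sentences (roughly 10 tokens or fewer) did form clusters
3. Results: Results
[Figure: t-SNE of outputs, colored by sentence id | filtered to sentence id == 20]
Result (sadly): that was not the case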


3. Results: Details
[Figure panels: XLNet (Large, Mono-lingual) | BERT (Base, Multi-lingual); label: Sentence clusters]
In BERT, each sentence formed a clean cluster, but in XLNet the sentences are not separated so cleanly

[Figure labels: "O / ishi", "CLS", ". / SEP", "Most contents of the sentence are here", "Phrase clusters are distributed"]
3. Results: Details
In XLNet, a single sentence was scattered across phrase-level clusters


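Why was "hierarchical clustering" not observed with XLNet?
→ Because the experimental conditions differed? (needs investigation)
Or was the earlier BERT visualization experiment the unusual result?
4. Conclusion: Discussion
                XLNet                        BERT
Size            Large (24 layers, 1024 dim)  Base (12 layers, 768 dim)
Multilingual?   Monolingual                  Multilingual
Architecture    Transformer-XL               Transformer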

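4. Conclusion: Summary of what was done
Wrote and published code for using the XLNet pre-trained model
Briefly introduced XLNet
Visualized XLNet's output vectors and compared them with BERT
I ran XLNet locally and visualized its outputs
Thoughts: I hope it becomes this easy to use in PyTorch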

• Filter by metadata
• Color by metadata
• PCA and t-SNE
• Select a word to see details
• Search the nearest points in original space by Euclidean or cosine distance
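The nearest-point search in the last bullet can be approximated outside TensorBoard with a few lines of NumPy. This is a sketch rather than TensorBoard's implementation; the function name `nearest` and the toy points are made up for illustration.

```python
import numpy as np


def nearest(points, query_idx, k=5, metric="cosine"):
    """Indices of the k points closest to points[query_idx] in the original space."""
    q = points[query_idx]
    if metric == "cosine":
        norms = np.linalg.norm(points, axis=1) * np.linalg.norm(q)
        dist = 1.0 - (points @ q) / np.maximum(norms, 1e-12)
    elif metric == "euclidean":
        dist = np.linalg.norm(points - q, axis=1)
    else:
        raise ValueError(f"unknown metric: {metric}")
    order = np.argsort(dist)
    order = order[order != query_idx]  # drop the query point itself
    return order[:k].tolist()


toy = np.array([[1.0, 0.0], [10.0, 0.5], [1.2, 1.0], [0.0, 1.0]])
print(nearest(toy, 0, k=1, metric="cosine"))     # [1]: almost the same direction
print(nearest(toy, 0, k=1, metric="euclidean"))  # [2]: smallest straight-line distance
```

The toy data shows why the choice matters: a long vector pointing the same way wins under cosine distance, while a short nearby vector wins under Euclidean distance.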
5.1 What you can do with TensorBoard

Encode 32 sentences of Japanese-English parallel corpus using pre-trained
BERT, and visualize the output of each layer with PCA and t-SNE
5.2 Multilingual BERT visualization experiment
[Figure: the inputs "[CLS] This is a pen . [SEP] [PAD] …" and "[CLS] これはペンです。[SEP] [PAD] …" are fed to BERT (Embeddings, Transformer Layer 0, Transformer Layer 1, …); the Layer 0 output, …, the Layer 10 output, and the final layer output are collected]
Each output tensor has shape (seq_len, n_dim) = (50, 768)
50 tokens * 32 sentences * 2 languages * 12 layers
→ 38,400 points in 768 dimensions
Meta data attached to each vector:
Token   Lang (en, ja)   sentence_id (0 - 31)   layer_id (0 - 11)
This    en              0                      0
is      en              0                      0
[SEP]   en              0                      0
[PAD]   en              0                      0
This    en              0                      1
That    en              1                      0
これ    ja              0                      0
あれ    ja              1                      0
:       :               :                      :
私      ja              31                     11
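TensorBoard's Embedding Projector reads this kind of per-point metadata from a tab-separated file, with a header row when there is more than one column. The sketch below writes such a file for a few of the rows above. The helper name `write_metadata` is hypothetical, and it assumes the vectors are saved in the same row order.

```python
import csv
import io


def write_metadata(rows, fileobj):
    """Write (token, lang, sentence_id, layer_id) rows as a TSV with a header line."""
    writer = csv.writer(fileobj, delimiter="\t", lineterminator="\n")
    writer.writerow(["Token", "Lang", "sentence_id", "layer_id"])
    for row in rows:
        writer.writerow(row)


rows = [("This", "en", 0, 0), ("is", "en", 0, 0), ("これ", "ja", 0, 0), ("私", "ja", 31, 11)]
buf = io.StringIO()  # an in-memory file; a real run would open e.g. a .tsv file on disk
write_metadata(rows, buf)
print(buf.getvalue())

# Total number of points described on the slide.
n_points = 50 * 32 * 2 * 12
print(n_points)  # 38400
```

Each metadata column then becomes available for the "Filter by metadata" and "Color by metadata" operations listed in 5.1.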

Input sentences: Japanese-English Bilingual Corpus of Wikipedia's Kyoto Articles
(sentence_id=0～15)
5.2.1 Multilingual BERT visualization experiment: Input sentences (1)
0 [CLS] 通称は久川靱負。 [SEP]
1 [CLS] 江戸時代は鎖国と呼ばれる状態にあり、外国との交流は中国（明・清）やオランダなどを除いて断絶したが、250年に亘る「泰平」を築いた。 [SEP]
2 [CLS] 大和茶（やまとちゃ）は、奈良県大和高原を中心とする地域で生産される日本茶のひとつ。 [SEP]
3 [CLS] 義顕は父の幼名である小太郎を受け継いでいることから、母は義貞の正室とする説がある。 [SEP]
4 [CLS] 「御殿女中の夏服には、辻、茶屋辻、晒布の三種があり、身分の高低によって使い分けられていた」。 [SEP]
5 [CLS] 日本の暦については、Template季節の話題、Template今日のこよみ、Template今日は何の日も参照。 [SEP]
6 [CLS] 国鉄本社とすれば、京阪神緩行線と京浜東北線が同じような線区に見えたのだろうが、実際のところは大きな違いがあった。 [SEP]
7 [CLS] 藤原隆家（ふじわらのたかいえ、天元 (日本)2年（979年） - 寛徳元年正月1日（1044年2月2日））は、平安時代の公卿。 [SEP]
8 [CLS] 1422年、観世大夫の座を長男の観世元雅に譲り、自身は出家した。 [SEP]
9 [CLS] 慶長14年（1609年）7月、参議烏丸光広ら公家衆の密通が露顕する事件（猪熊事件）が起きると、乱交の手引きをしていた教利は京都所司代の追及を恐れて九州へ逃亡する。 [SEP]
10 [CLS] 後宇多天皇宸翰消息（五月十一日　御法号）。 [SEP]
11 [CLS] korerano点において日本料理との共通点を持つ。 [SEP]
12 [CLS] 乗車時間は2分ほどの距離だが、寺に100円の寄付金を納めた人が無料で乗車できる（事実上、運賃が片道100円）。 [SEP]
13 [CLS] 日本は、日清戦争と日露戦争に勝利を収めた後、列強の一角を占めるようになった。 [SEP]
14 [CLS] 第十条 - 殺害刃傷罪科事。 [SEP]
15 [CLS] 富子が討手を差し向けて暗殺したとも言われる。 [SEP]
0 [CLS] His common name was Yukie HISAKAWA. [SEP]
1 [CLS] In the Edo period, Japan was in a state of what is called national isolation, ending cultural exchanges with foreign countries except for China (Ming, Qing) and the Netherlands, establishing peace which continued for 2
2 [CLS] Yamato-cha green tea is a kind of Japanese tea produced in an area centering Yamato Plateau in Nara Prefecture. [SEP]
3 [CLS] Since Yoshiaki received the childhood name of his father, Kotaro, there was a theory that his mother was the seishitsu (legal wife) of Yoshisada. [SEP]
4 [CLS] There are three kinds of summer wear for goten jochu (palace maids) which are Tsuji, Chayatsuji and Sarashinuno (bleached cloth), and they were properly used according to the social position.' [SEP]
5 [CLS] For Japanese calendar, also refer to Template seasonal topic, Template today's calendar, and Template what happened on this date in the past. [SEP]
6 [CLS] There was a big difference between the Keihanshin Local Line and the Keihin Tohoku Line, though they were seemingly similar from the viewpoint of JNR Head Office,. [SEP]
7 [CLS] FUJIWARA no Takaie (979 - February 8, 1044) was a court noble during the Heian period. [SEP]
8 [CLS] In 1422, Zeami assigned the position of Kanze-dayu to his eldest son, Motomasa KANZE and he became a priest. [SEP]
9 [CLS] In August 1609, when adulteries of Kugeshu (court nobles) such as Sangi (Councilor) [SEP] Mitsuhiro KARASUMARU were discovered (the Inokuma Incident), Noritoshi, who had been helping their promiscuities, was a
10 [CLS] Letter from Emperor Go-Uda [SEP] (May 11, On Receiving a Buddhist Name) [SEP]
11 [CLS] Korean dishes are provided with many features of Japanese dishes. [SEP]
12 [CLS] The time spent in the cable car is only about 2 minutes, and those who contribute 100 yen to the temple can ride the cable car for free (essentially, the fare is 100 yen one way). [SEP]
13 [CLS] After winning the Sino-Japanese War and Russo-Japanese War, Japan became one of the great world powers. [SEP]
14 [CLS] Article 10: Crimes of murder and bodily injury [SEP]
15 [CLS] It was said that Tomiko had sent a assassin to kill Imamairi. [SEP]

5.2.1 Multilingual BERT visualization experiment: Input sentences (2)
16 [CLS] 秋季 10月25日～ 11月10日。 [SEP]
17 [CLS] 部民制（べみんせい）とは、ヤマト王権の制度であり、王権への従属・奉仕の体制、朝廷の仕事分掌の体制をいう。 [SEP]
18 [CLS] 細川満元（ほそかわみつもと、天授 (日本)4年/永和 (日本)4年（1378年）- 応永33年10月16日 (旧暦)（1426年11月15日））は、室町時代前期の管領。 [SEP]
19 [CLS] そして「己（おのれ）もそうしなければ、餓死をする体なのだ。 [SEP] 」と言い残し、漆黒の闇の中へ消えていった。 [SEP]
20 [CLS] しかも配分に当たって大石自らは分配金受け取りを辞退したので、藩士たちの支持を集めた。 [SEP]
21 [CLS] 柚子胡椒 - 少しだけ取って入れる。 [SEP]
22 [CLS] 扇子を開く角度は、大体90度から180度の間であり、円を三等分した中心角120度前後のものが主流である。 [SEP]
23 [CLS] 森羅万象の擬人化。 [SEP]
24 [CLS] ＜内宮＞1993年9月25日/2013年9月＜外宮＞1993年9月27日/2013年9月。 [SEP]
25 [CLS] 1957年、東宝系の東京映画へ移籍。 [SEP]
26 [CLS] 山名師義の四男。 [SEP]
27 [CLS] この葛城地域には、古墳時代前期の中頃から有力な古墳の造営が始まった。 [SEP]
28 [CLS] 鋸を引いたり、斧を振ったり単純な往復運動をするものが多い。 [SEP]
29 [CLS] 三上兵部、樹下茂国らを弟子とした。 [SEP]
30 [CLS] 御縁輪のはた板ニハしやちほこひれうかゝせられ候。 [SEP]
31 [CLS] 羽二重は日本では近世から始められたと伝わっている伝統的な織物である。 [SEP]
16 [CLS] Autumn: From October 25 to November 10 [SEP]
17 [CLS] Bemin system is a system during the Yamato sovereignty, which refers to the system of subordination and service to the sovereignty and the system of the division of duties at the Imperial Court. [SEP]
18 [CLS] Mitsumoto HOSOKAWA (1378 - November 15, 1425) was a Kanrei (shogunal deputy) lived in the early Muromachi period. [SEP]
19 [CLS] He then said, 'That's what I have to do to keep from starving to death,' and disappeared into the darkness of the night. [SEP]
20 [CLS] Moreover, Oishi gained their support since he declined to receive a dividend. [SEP]
21 [CLS] Yuzu kosho (a spicy, hot Japanese condiment made from yuzu rind, chili, and salt): A touch of Yuzu Kosho may be added to suiji in a bowl. [SEP]
22 [CLS] The angle a Sensu or Ogi when unfolded varies from 90 - 180 degrees, with around 120 degrees being the norm. [SEP]
23 [CLS] The personification of shinrabansho [SEP]
24 [CLS] Naiku: September 25, 1993/September 2013; [SEP] Geku: September 27, 1993/September 2013 [SEP]
25 [CLS] In 1957, he moved to Tokyo Eiga Haikyu [SEP] (Tokyo film distribution company) affiliated with Toho Co., Ltd. [SEP]
26 [CLS] He was the fourth son to Moroyashi YAMANA. [SEP]
27 [CLS] In the Katsuragi region, construction of famous tumulus started from the middle of the early Kofun period (Tumulus period). [SEP]
28 [CLS] Most of the old mechanical dolls seen outside Japan do a simple reciprocation, such as sawing and ax-swinging. [SEP]
29 [CLS] He took Hyobu MIKAMI, Sigekuni JUGE and others as his disciples. [SEP][CLS] On the wall panel, Shachihoko(mythical creature with a tiger's head and the body of a fish)was dynamically painted. [SEP]
30 [CLS] On the wall panel, Shachihoko(mythical creature with a tiger's head and the body of a fish)was dynamically painted. [SEP]
31 [CLS] Habutae is a traditional Japanese woven cloth, which is said to have originated in the early-modern times. [SEP]
Input sentences: Japanese-English Bilingual Corpus of Wikipedia's Kyoto Articles
(sentence_id=16～31)

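When I ran a similar visualization experiment with BERT, I observed the following
1. BERT encodes meaning as a "hierarchical point cloud"
e.g., [centroid of a phrase cluster] = [meaning of the phrase]?
(Language cluster >) Sentence cluster > Phrase cluster
2. No parallel corpus was used to pre-train multilingual BERT, yet correlations appeared between Japanese and English
• Analogy relations like those in Word2Vec
• A Japanese token and its nearest English token are translations of each other
　　　e.g., “鎖” <-> “isolation”
Note: details are given in the Appendix
5.2.2 Multilingual BERT visualization experiment: Results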

All layers, Both languages, All sentences, All tokens (PCA)
Color by layer_id
5.2.3 Multilingual BERT visualization experiment: Detailed results
Color by language | Color by sentence_id | Select [PAD]
→ Next: Remove [PAD]

All layers, Both languages, All sentences, Tokens without [PAD] (PCA)
Color by layer_id
Color by language | Color by sentence_id | Select sentence_id=20
→ Next: Isolate sentence_id=20 to observe changes between layers

All layers, Both languages, sentence_id=20, Tokens without [PAD]
Color by layer_id (PCA)
Color by language (t-SNE) | Color by sentence_id (t-SNE)
→ Next: Look at the center figure up close

Word vectors become contextualized by passing layers
Also, the words in the same phrase are getting closer
Color by layer_id (t-SNE)
Label by layer_id | Zoom
→ Next: Look at the red cluster (outputs of the final layer)
Label by token

The final layer (layer_id=11) is somewhat special
Color by layer_id (t-SNE)
Zoom, Label by token

There are phrase clusters in the sentence cluster
The final layer, Both languages, Sentence_id = 20, Tokens without [PAD], t-SNE

The final layer, Both languages, All sentences, Tokens without [PAD]
Color by sentence_id (PCA) | Color by language (PCA)
Color by sentence_id (t-SNE) | Color by language (t-SNE)
Language clusters
Language axis
Sentence clusters

The nearest English token to “鎖” is “isolation”, which is a translation of “鎖国”
Nearest points of “鎖” in the original space
The final layer, Both languages, All sentences, Tokens without [PAD]

There are some analogies between sentence clusters like word2vec
The final layer, Both languages, Sentence_ids=0, 10, 20, 30, Tokens without [PAD], PCA
The final layer, Both languages, Sentence_ids=20, 21, 22, 23, Tokens without [PAD], PCA
