2. Overview
■ Embeddings in deep learning
• Converting input data into vectors of fixed dimensionality (see the sketch after this list)
• e.g., words, images
• Generally obtained through training
• Too high-dimensional for humans to interpret directly
• The meaning carried by each axis cannot be identified
■ Interpreting embeddings
• Analysis using Independent Component Analysis (ICA)
• Sparser and more interpretable than with Principal Component Analysis (PCA)
• Obtains axes that carry the same meaning across different modalities
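To make the first bullet concrete, here is a minimal PyTorch sketch (my illustration, not from the paper or the slides) of a learned embedding table; the vocabulary size and dimensionality are illustrative.

```python
import torch
import torch.nn as nn

# A learned embedding table: maps each of 10,000 token ids to a
# 300-dimensional vector. The weights are trained end to end, so
# individual axes carry no predefined human-readable meaning.
vocab_size, dim = 10_000, 300
embedding = nn.Embedding(vocab_size, dim)

token_ids = torch.tensor([42, 7, 1999])   # e.g. three word ids
vectors = embedding(token_ids)            # shape: (3, 300)
print(vectors.shape)
```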
Discovering Universal Geometry in Embeddings with ICA
Hiroaki Yamagiwa*1, Momose Oyama*1,2, Hidetoshi Shimodaira1,2
1 Kyoto University  2 RIKEN
hiroaki.yamagiwa@sys.i.kyoto-u.ac.jp, oyama.momose@sys.i.kyoto-u.ac.jp, shimo@i.kyoto-u.ac.jp
Abstract

This study utilizes Independent Component Analysis (ICA) to unveil a consistent semantic structure within embeddings of words or images. Our approach extracts independent semantic components from the embeddings of a pre-trained model by leveraging anisotropic information that remains after the whitening process in Principal Component Analysis (PCA). We demonstrate that each embedding can be expressed as a composition of a few intrinsic interpretable axes and that these semantic axes remain consistent across different languages, algorithms, and modalities. The discovery of a universal semantic structure in the geometric patterns of embeddings enhances our understanding of the representations in embeddings.
5. Independent Component Analysis (ICA)
■ ICA
• Transforms the basis so that the resulting components are independent
• Takes the kurtosis and skewness of the independent components into account
• PCA implicitly assumes the components follow a Gaussian distribution
• Only the mean and variance are taken into account
• 𝐒 = 𝐗𝐁
• 𝐁: transformation matrix
■ FastICA
• Applies a rotation matrix 𝐑_ICA to the whitened data 𝐙 (see the sketch after this list)
• 𝐒 = 𝐙𝐑_ICA
• The rotation is optimized with Newton's method so that the entropy of each component is minimized (i.e., each component is maximally non-Gaussian)
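A toy NumPy/scikit-learn sketch of the two-step view above: PCA whitening fixes the mean and covariance, and FastICA then supplies the remaining rotation 𝐑_ICA. The Laplace sources and mixing matrix are synthetic stand-ins, not the paper's data.

```python
import numpy as np
from sklearn.decomposition import PCA, FastICA

rng = np.random.default_rng(0)
# Toy data: 1,000 samples of 5 non-Gaussian (Laplace) sources,
# mixed by an unknown linear transformation.
S_true = rng.laplace(size=(1000, 5))
X = S_true @ rng.normal(size=(5, 5))

# Step 1: PCA whitening. Z has zero mean and identity covariance,
# but it is determined only up to rotation, since PCA uses
# second-order statistics (mean and variance) alone.
Z = PCA(whiten=True).fit_transform(X)

# Step 2: FastICA finds the extra rotation that makes each
# component maximally non-Gaussian (fixed-point Newton iterations).
ica = FastICA(whiten=False, random_state=0)
S = ica.fit_transform(Z)          # S = Z @ R_ica
R_ica = ica.components_.T         # the learned rotation matrix

# The learned matrix is (numerically) orthogonal: a pure rotation.
print(np.allclose(R_ica.T @ R_ica, np.eye(5), atol=1e-5))
```

PCA alone would stop at Z: any rotation of Z has the same mean and covariance, which is why higher-order statistics such as skewness and kurtosis are needed to pin the axes down.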
3.3 Interpretability and low-dimensionality of ICA-transformed embeddings

Fig. 1 illustrates that the key components of a word embedding are formed by specific axes that capture the meanings associated with each word. Specifically, this figure showcases five axes selected from the ICA-transformed word embeddings, that is, five columns of S. The word embeddings X have a dimensionality of d = 300 and were trained on the text8 corpus using Skip-gram with negative sampling. For details of the experiment, refer to Appendix B.

Interpretability. Each of these axes represents a distinct meaning and can be interpreted by examining the top words based on their normalized component values. For example, words like dishes, meat, noodles have high values on axis 16, while [...]
For PCA, kurtosis and skewness are close to 0 (near-Gaussian).
Figure 5: Scatterplots of word embeddings along specific axes: (0, 1), (2, 3), (49, 50), and (99, 100). The axes for ICA and PCA-transformed embeddings were arranged in descending order of skewness and variance, respectively.

Figure 6: Measures of non-Gaussianity for each axis of ICA and PCA-transformed word embeddings. Additionally, the measures for each component of the raw word embedding and a Gaussian random variable are plotted as baselines. Larger values indicate deviation from the normal distribution. The axes found by ICA are far more non-Gaussian than those found by PCA. For more details, refer to Appendix B.
[...] is approximately represented by these two axes. Quantitative evaluation is provided in Section 6.
4 Universality across languages

This section examines the results of conducting ICA on word embeddings, each trained individually from different language corpora. Interestingly, the [...]
Deviation from the normal distribution (annotation on Fig. 6).
6. Applying Independent Component Analysis
■ Applying ICA to word embeddings (see the sketch after this list)
• Model: Skip-gram with negative sampling (Mikolov et al., NIPS 2013)
• Dimensionality: 300
• Data: text8 (Mahoney, 2011)
• Number of tokens: 17.0×10^6
• Vocabulary size: 254×10^3
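A sketch of this setup using gensim and scikit-learn. The training hyperparameters below are illustrative assumptions rather than the paper's exact configuration, and `axis = 16` simply mirrors the food axis mentioned in the excerpt above.

```python
import numpy as np
import gensim.downloader as api
from gensim.models import Word2Vec
from sklearn.decomposition import FastICA

# Train a 300-d Skip-gram model with negative sampling on text8
# (window, min_count, etc. are illustrative, not the paper's setup).
corpus = api.load("text8")                       # iterable of token lists
model = Word2Vec(corpus, vector_size=300, sg=1, negative=5,
                 window=5, min_count=5, workers=4)

X = model.wv.vectors                             # (vocab, 300) embeddings
X = X - X.mean(axis=0)                           # center before ICA

# ICA-transform the embeddings: rows of S are per-word component values.
ica = FastICA(n_components=300, whiten="unit-variance", random_state=0)
S = ica.fit_transform(X)

# Interpret an axis by its top words (normalized component values).
S_norm = S / np.linalg.norm(S, axis=1, keepdims=True)
axis = 16
top = np.argsort(-S_norm[:, axis])[:10]
print([model.wv.index_to_key[i] for i in top])
```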
Figure 1: (Left) Heatmap of normalized ICA-transformed embeddings [...]
10. Comparison with other models
■ Comparison with fastText and BERT
• BERT model
• Number of tokens: 100,000
• Dimensionality: 768
Results. The upper panels of Fig. 3 display the cross-correlation coefficients between the first 100 axes of English and those of Spanish embeddings. The significant diagonal elements and negligible off-diagonal elements observed in Fig. 3a suggest a strong alignment of axes, indicating a good correspondence between the ICA-transformed embeddings. Conversely, the less pronounced diagonal elements and non-negligible off-diagonal elements observed in Fig. 3b indicate a less favorable alignment between the PCA-transformed embeddings. The semantic axes identified by ICA, referred to as 'independent semantic axes' or 'independent semantic components', appear to be nearly universal, regardless of the language. In Fig. 2a, heatmaps visualize the ICA-transformed embeddings. The upper panels display the top 5 words for each of the [...]
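The cross-correlation heatmaps described above can be reproduced schematically as follows; `S_en` and `S_es` are hypothetical ICA-transformed embedding matrices whose rows are paired by a translation dictionary (random arrays here, just to make the sketch runnable).

```python
import numpy as np

def axis_cross_correlation(S_src: np.ndarray, S_tgt: np.ndarray) -> np.ndarray:
    """Correlation between every axis of S_src and every axis of S_tgt.

    Rows of the two matrices must correspond to the same items,
    e.g. English words paired with their Spanish translations.
    """
    A = (S_src - S_src.mean(axis=0)) / S_src.std(axis=0)
    B = (S_tgt - S_tgt.mean(axis=0)) / S_tgt.std(axis=0)
    return (A.T @ B) / len(A)     # shape: (d_src, d_tgt)

# Hypothetical inputs standing in for ICA-transformed embeddings.
rng = np.random.default_rng(0)
S_en = rng.laplace(size=(5000, 100))
S_es = rng.laplace(size=(5000, 100))

C = axis_cross_correlation(S_en, S_es)
# After matching each target axis to its best source axis, a strong
# diagonal (as in Fig. 3a) indicates shared semantic axes.
best_match = np.abs(C).argmax(axis=0)
```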
13. Quantitative evaluation
■ Word Intrusion Task
• Measures how well a set of k words with similar meanings is separated from a word with a different meaning
• e.g., {windows, Microsoft, unix, linux, os} vs. hamster
■ Analogy Task
• Inferring a word from the relation between a pair of words (see the sketch after this list)
• e.g., France : Paris = Italy : ?
■ Word Similarity Task
• Measures agreement with human-rated similarity between words
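A minimal sketch of the analogy task's standard vector-arithmetic formulation (nearest neighbor to v_b - v_a + v_c, with the query words excluded); `wv` is a hypothetical word-to-vector dictionary.

```python
import numpy as np

def solve_analogy(wv: dict[str, np.ndarray], a: str, b: str, c: str) -> str:
    """Answer 'a : b = c : ?' by nearest neighbor to v_b - v_a + v_c."""
    target = wv[b] - wv[a] + wv[c]
    target = target / np.linalg.norm(target)
    best_word, best_sim = None, -np.inf
    for word, vec in wv.items():
        if word in (a, b, c):              # query words are excluded
            continue
        sim = float(vec @ target) / np.linalg.norm(vec)
        if sim > best_sim:
            best_word, best_sim = word, sim
    return best_word

# e.g. solve_analogy(wv, "france", "paris", "italy") should return
# "rome" for a well-trained embedding.
```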
Results. In Fig. 8a, the heatmaps of the five image models and fastText exhibit similar patterns, indicating a shared set of independent semantic axes in the ICA-transformed embeddings for both images and words. While we expected a good alignment between ViT-Base and fastText, it is noteworthy that we also observe a good alignment of ResMLP (Touvron et al., 2023) and Swin Transformer (Liu et al., 2021) with fastText. Furthermore, Fig. 3a demonstrates a very good alignment of axes between ViT-Base and ResMLP. This suggests that the independent semantic components captured by ICA are not specific to a particular image model but are shared across multiple models. On the other hand, PCA gives a less favorable alignment, as seen in Figs. 3b and 8b.
6 Quantitative evaluation

We quantitatively evaluated the interpretability [...]
            ZCA    PCA    Varimax  ICA
DistRatio   1.04   1.13   1.26     1.57

Table 2: A large value of DistRatio indicates the consistency of word meaning in the word intrusion task. We set k = 5, and the reported score is the average of 10 runs with randomly selected intruder words.
Figure 9: The performance of several whitened word embeddings when reducing the non-zero components. The values are averaged over 14 analogy tasks or six word similarity tasks. The performance of a specific [...]
14. Quantitative evaluation
■ Cross-Lingual Alignment
• Word translation task across multiple languages
Dataset            Method  Original  Rand.
157langs-fastText  LS      84.72     55.28
                   Proc    84.95     18.72
                   ICA     19.48     18.60
                   PCA      0.92      0.64
MUSE-fastText      LS      78.91     51.44
                   Proc    78.41     20.19
                   ICA     43.27     43.31
                   PCA      2.13      0.40
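The LS and Proc baselines in the table fit a linear map from source- to target-language embeddings on a dictionary of translation pairs. Below is a minimal sketch of both, using the standard formulations (unconstrained least squares, and orthogonal Procrustes via SVD); these are assumed implementations, not code from the paper.

```python
import numpy as np

def least_squares_map(X_src: np.ndarray, X_tgt: np.ndarray) -> np.ndarray:
    """LS: unconstrained linear map W minimizing ||X_src @ W - X_tgt||_F."""
    W, *_ = np.linalg.lstsq(X_src, X_tgt, rcond=None)
    return W

def procrustes_map(X_src: np.ndarray, X_tgt: np.ndarray) -> np.ndarray:
    """Proc: best orthogonal map, via SVD of X_src.T @ X_tgt."""
    U, _, Vt = np.linalg.svd(X_src.T @ X_tgt)
    return U @ Vt

# Hypothetical usage: fit the map on embeddings of known translation
# pairs, then translate a new word by mapping its vector into the
# target space and taking the nearest-neighbor word there.
```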