15. Input layer (x)

The input vector x is V-dimensional: a 1-of-K (one-hot) vector in which exactly one element is 1 and all the others are 0.

Example: V = {I, like, black, coffee, am, cat}, with the input word w_I = "like":
x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix} \begin{matrix} \cdots \text{I} \\ \cdots \text{like} \\ \cdots \text{black} \\ \cdots \text{coffee} \\ \cdots \text{am} \\ \cdots \text{cat} \end{matrix}
16. Weights to the hidden layer (W)

W is the matrix formed by lining up the word vectors of every word in the vocabulary as columns:
W = \begin{pmatrix} v_{11} & v_{21} & v_{31} & v_{41} & v_{51} & v_{61} \\ v_{12} & v_{22} & v_{32} & v_{42} & v_{52} & v_{62} \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ v_{1N} & v_{2N} & v_{3N} & v_{4N} & v_{5N} & v_{6N} \end{pmatrix} = \begin{pmatrix} v_1 & v_2 & v_3 & v_4 & v_5 & v_6 \end{pmatrix}

(The number of rows, N, is the dimension of the hidden layer.)
When h \in \mathbb{R}^{200} and x \in \mathbb{R}^{10^6}, W has size 200 \times 10^6.
17. Hidden layer (h)

The hidden layer is computed as h = Wx (with no activation function).

The hidden-layer output becomes v_{w_I} without any complex computation; this is where the 1-of-K encoding pays off (h is an N-dimensional vector). Each v represents the "meaning" of a word.
h = W x = \begin{pmatrix} v_1 & v_2 & v_3 & v_4 & v_5 & v_6 \end{pmatrix} \begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix} = v_2 = v_{w_I}
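To make slides 15-17 concrete, here is a minimal numpy sketch; the toy vocabulary and dimensions are assumptions for illustration, not the slides' 200 x 10^6 setup. It shows that multiplying W by a 1-of-K vector is just a column lookup:

```python
import numpy as np

# Toy sizes; the slides use N = 200 and V = 10^6.
vocab = ["I", "like", "black", "coffee", "am", "cat"]
V, N = len(vocab), 3

rng = np.random.default_rng(0)
W = rng.normal(size=(N, V))   # column W[:, i] is the word vector v_i

# 1-of-K input vector x for the target word w_I = "like"
x = np.zeros(V)
w_I = vocab.index("like")
x[w_I] = 1.0

# Hidden layer: h = W x, with no activation function.
h = W @ x

# Because x is one-hot, h equals the column v_{w_I}: the matrix
# product reduces to a simple lookup, which is why 1-of-K pays off.
assert np.allclose(h, W[:, w_I])
```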
18. Weights to the output layer (W')

The weight matrix W' from the hidden layer to the output layer likewise lines up word vectors v', this time as rows:
W' = \begin{pmatrix} v'^T_1 \\ v'^T_2 \\ v'^T_3 \\ v'^T_4 \\ v'^T_5 \\ v'^T_6 \end{pmatrix} = \begin{pmatrix} v'_{11} & v'_{12} & \cdots & v'_{1N} \\ v'_{21} & v'_{22} & \cdots & v'_{2N} \\ v'_{31} & v'_{32} & \cdots & v'_{3N} \\ v'_{41} & v'_{42} & \cdots & v'_{4N} \\ v'_{51} & v'_{52} & \cdots & v'_{5N} \\ v'_{61} & v'_{62} & \cdots & v'_{6N} \end{pmatrix}
When h \in \mathbb{R}^{200} and x \in \mathbb{R}^{10^6}, W' has size 10^6 \times 200.
19. Output layer (u_c)

Each element of u_c is the inner product of a context-word output vector and the target word's input vector:
u_{c,i} = v'^T_i \cdot v_{w_I}, \qquad u_c = W' v_{w_I} = \begin{pmatrix} v'^T_1 \\ v'^T_2 \\ v'^T_3 \\ v'^T_4 \\ v'^T_5 \\ v'^T_6 \end{pmatrix} v_{w_I}
The output-layer units are the weights to the output layer times the hidden layer:
u_c = \begin{pmatrix} u_{c,1} \\ u_{c,2} \\ u_{c,3} \\ u_{c,4} \\ u_{c,5} \\ u_{c,6} \end{pmatrix} = \begin{pmatrix} v'^T_1 \cdot v_{w_I} \\ v'^T_2 \cdot v_{w_I} \\ v'^T_3 \cdot v_{w_I} \\ v'^T_4 \cdot v_{w_I} \\ v'^T_5 \cdot v_{w_I} \\ v'^T_6 \cdot v_{w_I} \end{pmatrix}
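Continuing the numpy sketch from slide 17, the output layer is a single matrix-vector product; entry i of u_c is exactly the inner product v'^T_i . v_{w_I}. The row layout of W_prime and the toy shapes are my assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
V, N = 6, 3                        # toy sizes as before
W_prime = rng.normal(size=(V, N))  # row W_prime[i] is the output vector v'_i
v_wI = rng.normal(size=N)          # hidden layer h = v_{w_I}

# Output layer: u_c = W' v_{w_I}
u_c = W_prime @ v_wI

# Each entry is an inner product of an output vector with v_{w_I}.
assert np.allclose(u_c[1], W_prime[1] @ v_wI)
```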
34. Objective function after the speedup

After the speedup (negative sampling), the objective function is

E = -\log\sigma(v'^T_{w_O} \cdot v_{w_I}) - \sum_{v \in V_{Neg}} \log\sigma(-v'^T_v \cdot v_{w_I})

Differentiating E with respect to each element of W and W', with

t_i = \begin{cases} 1 & (i = w_O) \\ 0 & (\text{otherwise}) \end{cases}

gives the update rule for W':

v'_{ij} := v'_{ij} - \eta\,(\sigma(v'^T_i \cdot v_{w_I}) - t_i)\,v_{w_I,j}

and the update rule for W:

v_{w_I,i} := v_{w_I,i} - \eta \sum_{v \in \{w_O\} \cup V_{Neg}} (\sigma(v'^T_v \cdot v_{w_I}) - t_v)\,v'_{v,i}
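Here is a minimal numpy sketch of one SGD step under these update rules. The function name, array layout, and learning rate eta are assumptions for illustration, not the lecturer's code:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def negative_sampling_step(W, W_prime, w_I, w_O, neg_ids, eta=0.025):
    """One SGD step of the negative-sampling objective on this slide.

    W        : (N, V) array, column j is the input vector v_j
    W_prime  : (V, N) array, row i is the output vector v'_i
    w_I, w_O : indices of the input word and the observed context word
    neg_ids  : indices of the sampled negative words V_Neg (t_v = 0)
    """
    h = W[:, w_I].copy()   # h = v_{w_I}, copied so all updates use the old value
    grad_h = np.zeros_like(h)
    for i, t_i in [(w_O, 1.0)] + [(j, 0.0) for j in neg_ids]:
        e = sigmoid(W_prime[i] @ h) - t_i   # sigma(v'_i^T . v_{w_I}) - t_i
        grad_h += e * W_prime[i]            # one term of the sum in the W update
        W_prime[i] -= eta * e * h           # v'_i := v'_i - eta * e * v_{w_I}
    W[:, w_I] -= eta * grad_h               # v_{w_I} := v_{w_I} - eta * sum(...)
```

Note that each step touches only the |V_Neg| + 1 output vectors involved and the single input vector v_{w_I}, rather than all V output vectors, which is where the speedup comes from.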