These slides were used by Umemoto of our company at an internal technical study session.
They explain the Transformer, an architecture that has attracted much attention in recent years.
"Arithmer Seminar" is weekly held, where professionals from within and outside our company give lectures on their respective expertise.
The slides are made by the lecturer from outside our company, and shared here with his/her permission.
Arithmer Inc. is a mathematics company that originated in the Graduate School of Mathematical Sciences at the University of Tokyo. We apply modern mathematics to bring advanced new AI systems into solutions across many fields. At Arithmer we believe it is our job to work out how to use AI well to make work more efficient and to produce results that are useful to people and society.
Tensor representations in signal processing and machine learning (tutorial talk), by Tatsuya Yokota
Tutorial talk at APSIPA ASC 2020.
Title: Tensor representations in signal processing and machine learning.
Introduction to tensor decomposition
Basics of tensor decomposition
2. Introduction
A tensor is a general name for a multi-dimensional array.
With the growth of information sensing, demand for tensor data analysis is increasing substantially.
[Figure: arrays of increasing order: 1d-tensor, 2d-tensor, 3d-tensor, 4d-tensor, 5d-tensor]
Examples:
Multi-channel time series (2d)
Multi-channel time-frequency signal (3d)
MRI data for multiple subjects (4d)
Multi-channel time-frequency signals for multiple mental tasks and subjects (5d)
[Figure: a 5d array laid out over subjects 1…N and tasks 1…M]
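As a concrete illustration, the example orders above map directly onto NumPy array shapes (the sizes below are hypothetical, chosen only for illustration):

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
series_2d = np.zeros((8, 1000))            # multi-channel time series: channel x time
tf_3d     = np.zeros((8, 1000, 32))        # multi-channel time-frequency signal
mri_4d    = np.zeros((64, 64, 40, 5))      # MRI volumes for 5 subjects
eeg_5d    = np.zeros((8, 1000, 32, 5, 3))  # channel x time x freq x subject x task

for name, arr in [("2d", series_2d), ("3d", tf_3d), ("4d", mri_4d), ("5d", eeg_5d)]:
    print(name, arr.ndim)
```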
9. Matrix-matrix product:
(I×J) matrix · (J×K) matrix = (I×K) matrix
Tensor-matrix (mode-n) products:
(I×J×K) tensor ×1 (L×I) matrix = (L×J×K) tensor
(I×J×K) tensor ×2 (L×J) matrix = (I×L×K) tensor
(I×J×K) tensor ×3 (L×K) matrix = (I×J×L) tensor
Tensor computations (6)
[Figure: the mode-1 product computed by matricization: unfold the (I×J×K) tensor into an (I×JK) matrix, left-multiply by the (L×I) matrix to get an (L×JK) matrix, then fold the result back into an (L×J×K) tensor.]
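The unfold-multiply-fold procedure above can be sketched in NumPy as follows (the function name is mine, not from the slides):

```python
import numpy as np

def mode_n_product(T, M, mode):
    """Mode-n product T x_n M: contract dimension `mode` of tensor T
    with the columns of matrix M, via matricization."""
    Tm = np.moveaxis(T, mode, 0)             # bring the mode to the front
    unfolded = Tm.reshape(Tm.shape[0], -1)   # matricize: (I_n x prod(rest))
    folded = (M @ unfolded).reshape((M.shape[0],) + Tm.shape[1:])
    return np.moveaxis(folded, 0, mode)      # fold back into a tensor

I, J, K, L = 4, 5, 6, 3
T = np.random.rand(I, J, K)
print(mode_n_product(T, np.random.rand(L, I), 0).shape)  # (3, 5, 6)
print(mode_n_product(T, np.random.rand(L, J), 1).shape)  # (4, 3, 6)
print(mode_n_product(T, np.random.rand(L, K), 2).shape)  # (4, 5, 3)
```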
22. Tucker decomposition
Tucker decomposition, or higher-order singular value decomposition (HOSVD), is a mathematical decomposition model for tensors.
It can be used for:
dimensionality reduction (compression),
feature extraction (sparse / nonnegative / independent),
completion (estimation of missing values),
prediction (regression), and so on.
[Figure: Y ≈ A D Bᵀ with Y (I×J), A (I×R), D (R×R), Bᵀ (R×J); Tucker decomposition is a generalization of this matrix factorization.]
23. Low-rank approximation of a tensor
Low-rank approximation in matrix decomposition:
Ex) rank-R approximation
What is a multilinear tensor rank (MT rank)?
Ex) (R1, R2, R3)-rank approximation of a 3d-tensor
The multilinear tensor rank is the size of the core tensor.
[Figure: matrix case: Y (I×J) ≈ A (I×R) · D (R×R) · Bᵀ (R×J). Tensor case: Y (I1×I2×I3) ≈ G ×1 A ×2 B ×3 C with core G (R1×R2×R3) and factors A (I1×R1), B (I2×R2), C (I3×R3).]
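A minimal sketch of the (R1, R2, R3)-rank approximation by truncated HOSVD for the 3d case (variable names are mine; this is one standard way to compute a Tucker model, not necessarily the exact procedure used later in the slides):

```python
import numpy as np

def unfold(T, mode):
    # Mode-n matricization: (I_n x prod(other dims))
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def hosvd(T, ranks):
    # Truncated HOSVD: each factor holds the leading left singular
    # vectors of the corresponding unfolding; the core is T contracted
    # with the (orthonormal) factors.
    factors = [np.linalg.svd(unfold(T, n), full_matrices=False)[0][:, :r]
               for n, r in enumerate(ranks)]
    A, B, C = factors
    G = np.einsum('ijk,ia,jb,kc->abc', T, A, B, C)
    return G, factors

def tucker_reconstruct(G, factors):
    A, B, C = factors
    return np.einsum('abc,ia,jb,kc->ijk', G, A, B, C)

# A tensor with exact MT rank (2, 3, 4) is recovered exactly.
rng = np.random.default_rng(0)
G0 = rng.standard_normal((2, 3, 4))
A0 = rng.standard_normal((10, 2))
B0 = rng.standard_normal((12, 3))
C0 = rng.standard_normal((14, 4))
Y = np.einsum('abc,ia,jb,kc->ijk', G0, A0, B0, C0)
G, facs = hosvd(Y, (2, 3, 4))
print(np.allclose(tucker_reconstruct(G, facs), Y))  # True
```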
24. Compression & rank
Appropriate accuracy and compression ratio are both important for data compression.
Rank is a trade-off parameter between the two properties.
The compression ratio changes linearly with the rank, but in many real problems the accuracy changes non-linearly.
It is therefore important to estimate an appropriate MT rank for compression.
[Figure: a higher rank gives high accuracy but a low compression ratio; a lower rank gives low accuracy but a high compression ratio. Plots: compression ratio vs. rank and accuracy vs. rank.]
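The rank/compression trade-off can be made concrete by counting stored parameters: a Tucker model stores the core plus the factor matrices instead of the full tensor (sizes below are illustrative, not from the slides):

```python
# Compression ratio of a Tucker model: (core + factor entries) / full tensor.
def tucker_ratio(dims, ranks):
    full = 1
    for d in dims:
        full *= d
    core = 1
    for r in ranks:
        core *= r
    factors = sum(d * r for d, r in zip(dims, ranks))
    return (core + factors) / full

# A (100 x 100 x 100) tensor at MT rank (10, 10, 10):
# core 1000 + factors 3000 entries vs. 1,000,000 entries.
print(tucker_ratio((100, 100, 100), (10, 10, 10)))  # 0.004
```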
25. Noise reduction & rank
We assume that the observed data is generated by a low-rank Tucker model plus additive noise.
Assumption: the generating model can be characterized as a Tucker model.
The rank is an important parameter for noise reduction.
[Figure: accuracy vs. rank for noisy data: too low a rank is insufficient to reconstruct the signal, while too high a rank over-fits the noise.]
26. Introduction to research on tensor rank estimation
Results:
Tensor rank estimation using sparse Tucker decomposition:
T. Yokota, A. Cichocki. Multilinear tensor rank estimation via sparse Tucker decomposition. In Proceedings of SCIS&ISIS 2014, pp. 478-483, 2014.
Tensor rank estimation using information criteria:
T. Yokota, N. Lee, and A. Cichocki. Robust multilinear tensor rank estimation using higher order singular value decomposition and information criteria. IEEE Transactions on Signal Processing, vol. 65, issue 5, pp. 1196-1206, 2017.
28. Proposed method & algorithm
Pruning Sparse Tucker Decomposition (PSTD) combines:
L1-norm minimization of the core tensor,
an error bound between the input tensor and the reconstructed tensor,
and an orthogonality constraint on the factor matrices.
[Figure: alternating optimization. Each factor matrix is updated in turn by orthogonal least squares while the other factors and the core are fixed; the core tensor is then updated by LASSO, making it sparse; finally a coefficient-based pruning step removes unused components while the factors stay orthogonal.]
29. Sub-problem for U
The main problem is split into sub-problems; the sub-problem for U is an orthogonal dictionary learning problem.
Criterion: a Lagrangian formulation with a Lagrange coefficient.
Update rule: compute the least-squares solution, then orthogonalize it.
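The "least-squares solution, then orthogonalize" update can be sketched as the orthogonal Procrustes solution: minimizing ||Y - U Z||_F subject to UᵀU = I is solved in closed form by the polar factor of Y Zᵀ. Whether PSTD orthogonalizes in exactly this way is my assumption; the sketch shows the standard closed form:

```python
import numpy as np

def orthogonal_update(Y, Z):
    # Orthogonal Procrustes: argmin_U ||Y - U Z||_F s.t. U^T U = I
    # is the polar factor of Y Z^T, computed from its SVD.
    P, _, Qt = np.linalg.svd(Y @ Z.T, full_matrices=False)
    return P @ Qt

rng = np.random.default_rng(1)
U_true, _ = np.linalg.qr(rng.standard_normal((8, 3)))  # known orthogonal factor
Z = rng.standard_normal((3, 20))                       # fixed coefficients
U = orthogonal_update(U_true @ Z, Z)                   # recover U_true from noiseless data
print(np.allclose(U, U_true))  # True
```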
30. Sub-problem for G
We estimate the optimal λ corresponding to the error bound ε by binary search:
a large λ gives a sparse core but a large error;
a small λ gives a dense core but a small error.
[Figure: the tensor form of the original problem is rewritten, via Lagrange's method, in vector form (vectorized Y and G) as a sparse-coding LASSO regression; the error is a non-linear, monotonically increasing function of λ.]
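The binary search over λ can be sketched as follows, assuming the dictionary (the Kronecker product of the orthogonal factor matrices) is orthonormal so that the LASSO solution reduces to soft-thresholding of the coefficients; the function names and tolerance are mine:

```python
import numpy as np

def soft_threshold(x, lam):
    # Closed-form LASSO solution for an orthonormal dictionary.
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def search_lambda(coeffs, eps, iters=50):
    # The error is monotonically increasing in lambda, so binary-search
    # for the largest lambda whose reconstruction error stays within eps.
    lo, hi = 0.0, np.abs(coeffs).max()
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        err = np.linalg.norm(coeffs - soft_threshold(coeffs, lam))
        if err > eps:
            hi = lam   # error too large -> decrease lambda
        else:
            lo = lam   # within the bound -> try a sparser solution
    return lo

rng = np.random.default_rng(2)
c = rng.standard_normal(100)
lam = search_lambda(c, eps=1.0)
g = soft_threshold(c, lam)
print(np.linalg.norm(c - g) <= 1.0 + 1e-6)  # True
```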
31. Pruning step
We now have a sparse core tensor G.
Detection of redundant slices: a slice whose entries are nearly zero implies that the corresponding dictionary atom is not used for representing the data (it was deleted by sparse coding).
Redundant slices and their dictionaries are pruned in all directions.
[Figure: slice the core, unfold each slice, and take the sum of absolute values (L1-norm); slices with nearly zero norm, and the corresponding dictionaries, are pruned.]
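The slice-wise L1-norm test can be sketched as follows (names and the tolerance are my choices): for one mode, compute the L1-norm of each slice of the core and drop near-zero slices together with the corresponding factor columns.

```python
import numpy as np

def prune_mode(G, A, mode, tol=1e-8):
    # L1-norm of each mode-`mode` slice of the core.
    moved = np.moveaxis(G, mode, 0)
    norms = np.abs(moved).reshape(moved.shape[0], -1).sum(axis=1)
    keep = norms > tol
    # Drop near-zero slices and the matching dictionary columns.
    return np.moveaxis(moved[keep], 0, mode), A[:, keep]

G = np.zeros((3, 4, 5))
G[0, 0, 0] = 1.0
G[2, 1, 3] = -2.0        # slice 1 along mode 0 is all zero -> pruned
A = np.eye(6)[:, :3]     # toy factor matrix for mode 0
G2, A2 = prune_mode(G, A, mode=0)
print(G2.shape, A2.shape)  # (2, 4, 5) (6, 2)
```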
32. Experiments: convergence
Data: synthetic data.
Generated core tensor: (10×20×30).
Generated factor matrices: (25×10), (50×20), (75×30).
The input tensor is generated by the Tucker model plus Gaussian noise (SNR = 10 dB), and the PSTD algorithm is applied.
Results:
Final objective value: 1.5e-2 ± 9.8e-5
MT rank: completely estimated
Sparsity of G: 42.9 ± 0.626 %
Decrease in iterations: 99.9 %
34. Experiments: image compression (1)
We applied PSTD to image compression.
We varied:
the SNR parameter = {25, 30, …, 45} for the error bound,
and the quantization parameter q over various values.
[Figure: a (1024×1024) image is reshaped into an (8×8×16384) tensor and compressed by PSTD; the bases of PSTD are compared with the bases of JPEG. The encoding pipeline involves quantization (q), sorting of coefficients, zero run-length coding, differencing of the DC components, and Huffman coding.]