Tensor Train in machine learning
Alexander Novikov
October 11, 2016
Alexander Novikov Tensor Train in machine learning October 11, 2016 1 / 26
Recommender systems
Assume low-rank structure.
Tensor Train summary
Tensor Train (TT) decomposition [Oseledets 2011]:
A compact representation for tensors (=multidimensional array);
Allows for efficient application of linear algebra operations.
Low-rank decomposition
(Figure: the element $A_{23}$ computed as the product of the row $G_1[i_1{=}2]$ and the column $G_2[i_2{=}3]$.)

$$A_{i_1 i_2} = \underbrace{G_1[i_1]}_{1 \times r}\, \underbrace{G_2[i_2]}_{r \times 1}, \qquad A = G_1 G_2,$$

where $G_1$ is a collection of rows and $G_2$ is a collection of columns.
Tensor Train decomposition
An example of computing one element of a 4-dimensional tensor:

(Figure: $A_{2423} = G_1[i_1{=}2]\, G_2[i_2{=}4]\, G_3[i_3{=}2]\, G_4[i_4{=}3]$.)

$$A_{i_1 \ldots i_d} = \underbrace{G_1[i_1]}_{1 \times r}\, \underbrace{G_2[i_2]}_{r \times r} \cdots \underbrace{G_d[i_d]}_{r \times 1}$$
Tensor Train decomposition Cont’d
Tensor $A$ is said to be in the TT-format if

$$A_{i_1,\ldots,i_d} = G_1[i_1]\, G_2[i_2] \cdots G_d[i_d], \qquad i_k \in \{1, \ldots, n\},$$

where $G_k[i_k]$ is a matrix of size $r_{k-1} \times r_k$, with $r_0 = r_d = 1$.

Notation & terminology:
$G_k$ — TT-cores;
$r_k$ — TT-ranks;
$r = \max_{k=0,\ldots,d} r_k$ — the maximal TT-rank.

The TT-format uses $O(n d r^2)$ memory to store $n^d$ elements. It is efficient only if the TT-rank is small.
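The definition above maps directly onto code. A minimal numpy sketch (illustrative, not the author's implementation): each core is stored as an array of shape (r_{k-1}, n, r_k), and evaluating one element costs d small matrix products.

```python
import numpy as np

# Store a d-dimensional tensor as a list of TT-cores, core k having
# shape (r_{k-1}, n, r_k); names and sizes here are illustrative.
d, n, r = 4, 5, 3
ranks = [1, r, r, r, 1]
rng = np.random.default_rng(0)
cores = [rng.random((ranks[k], n, ranks[k + 1])) for k in range(d)]

def tt_element(cores, index):
    """A[i1, ..., id] = G1[i1] G2[i2] ... Gd[id]."""
    res = np.ones((1, 1))
    for core, i in zip(cores, index):
        res = res @ core[:, i, :]   # (1, r_{k-1}) @ (r_{k-1}, r_k)
    return res[0, 0]

value = tt_element(cores, (0, 2, 1, 4))
# Storage: O(d * n * r^2) numbers instead of n^d.
```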
TT-format: example
$$A_{i_1,i_2,i_3} = i_1 + i_2 + i_3, \qquad i_1 \in \{1,2,3\},\; i_2 \in \{1,2,3,4\},\; i_3 \in \{1,2,3,4,5\}.$$

$$A_{i_1,i_2,i_3} = G_1[i_1]\, G_2[i_2]\, G_3[i_3],$$
$$G_1[i_1] = \begin{pmatrix} i_1 & 1 \end{pmatrix}, \qquad G_2[i_2] = \begin{pmatrix} 1 & 0 \\ i_2 & 1 \end{pmatrix}, \qquad G_3[i_3] = \begin{pmatrix} 1 \\ i_3 \end{pmatrix}.$$

Let's check:

$$A(i_1, i_2, i_3) = \begin{pmatrix} i_1 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ i_2 & 1 \end{pmatrix} \begin{pmatrix} 1 \\ i_3 \end{pmatrix} = \begin{pmatrix} i_1 + i_2 & 1 \end{pmatrix} \begin{pmatrix} 1 \\ i_3 \end{pmatrix} = i_1 + i_2 + i_3.$$
$$G_1 = \left( \begin{pmatrix} 1 & 1 \end{pmatrix}, \begin{pmatrix} 2 & 1 \end{pmatrix}, \begin{pmatrix} 3 & 1 \end{pmatrix} \right)$$

$$G_2 = \left( \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}, \begin{pmatrix} 1 & 0 \\ 2 & 1 \end{pmatrix}, \begin{pmatrix} 1 & 0 \\ 3 & 1 \end{pmatrix}, \begin{pmatrix} 1 & 0 \\ 4 & 1 \end{pmatrix} \right)$$

$$G_3 = \left( \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \begin{pmatrix} 1 \\ 3 \end{pmatrix}, \begin{pmatrix} 1 \\ 4 \end{pmatrix}, \begin{pmatrix} 1 \\ 5 \end{pmatrix} \right)$$
The tensor has 3 · 4 · 5 = 60 elements.
The TT-format uses 32 parameters to describe it.
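The example can be verified by brute force. A small numpy sketch (the lambdas and index ranges simply mirror the slide):

```python
import numpy as np

# The slide's cores for A[i1, i2, i3] = i1 + i2 + i3, checked over
# all 60 index combinations.
G1 = lambda i1: np.array([[i1, 1]])           # 1 x 2
G2 = lambda i2: np.array([[1, 0], [i2, 1]])   # 2 x 2
G3 = lambda i3: np.array([[1], [i3]])         # 2 x 1

ok = all(
    (G1(i1) @ G2(i2) @ G3(i3))[0, 0] == i1 + i2 + i3
    for i1 in range(1, 4) for i2 in range(1, 5) for i3 in range(1, 6)
)
# Parameter count: 3 * 2 + 4 * 4 + 5 * 2 = 32, versus 60 tensor elements.
```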
Sum of tensors
Tensors $A$ and $B$ are in the TT-format:

$$A_{i_1 \ldots i_d} = G^A_1[i_1] \cdots G^A_d[i_d], \qquad B_{i_1 \ldots i_d} = G^B_1[i_1] \cdots G^B_d[i_d].$$

Find the TT-format of

$$C = A + B, \qquad C_{i_1 \ldots i_d} = A_{i_1 \ldots i_d} + B_{i_1 \ldots i_d}.$$
TT-cores of the result:

$$G^C_k[i_k] = \begin{pmatrix} G^A_k[i_k] & 0 \\ 0 & G^B_k[i_k] \end{pmatrix}, \qquad k = 2, \ldots, d - 1,$$

$$G^C_1[i_1] = \begin{pmatrix} G^A_1[i_1] & G^B_1[i_1] \end{pmatrix}, \qquad G^C_d[i_d] = \begin{pmatrix} G^A_d[i_d] \\ G^B_d[i_d] \end{pmatrix}.$$
TT-ranks of the result are sums of the TT-ranks.
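The block construction above can be sketched in numpy, with cores stored as (r_{k-1}, n, r_k) arrays (the function name is illustrative):

```python
import numpy as np

def tt_sum(cores_a, cores_b):
    """TT-cores of C = A + B: concatenate border cores, block-diagonal middle cores."""
    d = len(cores_a)
    out = []
    for k, (ga, gb) in enumerate(zip(cores_a, cores_b)):
        ra1, n, ra2 = ga.shape
        rb1, _, rb2 = gb.shape
        if k == 0:            # first core: rows side by side, [G^A_1  G^B_1]
            out.append(np.concatenate([ga, gb], axis=2))
        elif k == d - 1:      # last core: columns stacked vertically
            out.append(np.concatenate([ga, gb], axis=0))
        else:                 # middle cores: block-diagonal slices
            core = np.zeros((ra1 + rb1, n, ra2 + rb2))
            core[:ra1, :, :ra2] = ga
            core[ra1:, :, ra2:] = gb
            out.append(core)
    return out
```

The resulting core shapes make the rank statement concrete: every intermediate rank of C is the sum of the corresponding ranks of A and B.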
TT-rounding
Given a tensor $A$ in the TT-format with rank $r$, the TT-rounding procedure [Oseledets, 2011]

$$\tilde{A} = \text{tt-round}(A, \varepsilon), \qquad \varepsilon > 0,$$

finds a tensor $\tilde{A}$ such that

1. $\|A - \tilde{A}\|_F \le \varepsilon \|A\|_F$;
2. the TT-rank of $\tilde{A}$ is minimal among all $B$ with $\|A - B\|_F \le \frac{\varepsilon}{\sqrt{d-1}} \|A\|_F$,

where $\|A\|_F = \sqrt{\sum_{i_1,\ldots,i_d} A^2_{i_1,\ldots,i_d}}$.
How to find TT-decomposition of a given tensor
Analytical formulas for special cases;
An exact algorithm based on the SVD for medium-sized tensors: e.g. for a $5^8 \approx 400\,000$-element tensor it takes 8 ms on my laptop;
For large tensors (e.g. $2^{50}$ elements), approximate algorithms that look at only a fraction of the tensor elements: DMRG-cross [Savostyanov and Oseledets, 2011], AMEn-cross [Dolgov and Savostyanov, 2013].
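The SVD-based algorithm sweeps over the modes: reshape into a matrix, take a truncated SVD, keep the left factor as a core, and carry the rest forward. A minimal numpy sketch (illustrative, not the paper's reference implementation):

```python
import numpy as np

def tt_svd(tensor, eps=1e-10):
    """TT-SVD sketch: successive reshapes and truncated SVDs, first mode to last."""
    shape = tensor.shape
    d = len(shape)
    cores, r_prev = [], 1
    mat = tensor.reshape(shape[0], -1)          # unfold along the first mode
    for k in range(d - 1):
        u, s, vt = np.linalg.svd(mat, full_matrices=False)
        rank = max(1, int(np.sum(s > eps * s[0])))   # drop tiny singular values
        cores.append(u[:, :rank].reshape(r_prev, shape[k], rank))
        # Carry S V^T forward, folding in the next mode.
        mat = (s[:rank, None] * vt[:rank]).reshape(rank * shape[k + 1], -1)
        r_prev = rank
    cores.append(mat.reshape(r_prev, shape[-1], 1))
    return cores
```

On the running example A[i1,i2,i3] = i1 + i2 + i3 this recovers TT-ranks (2, 2), consistent with the hand-built cores earlier in the deck.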
TT-format operations
Operation                   Rank of the result
C = c · A                   r(C) = r(A)
C = A + c                   r(C) = r(A) + 1
C = A + B                   r(C) ≤ r(A) + r(B)
C = A ∘ B (elementwise)     r(C) ≤ r(A) r(B)
C = round(A, ε)             r(C) ≤ r(A)
sum of all elements of A    —
‖A‖_F                       —
(Ask me about differential equations)
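Assuming the product row in the table denotes the elementwise (Hadamard) product, the result's core slices are Kronecker products of the operands' slices, which is exactly why the ranks multiply. A numpy sketch (function name illustrative):

```python
import numpy as np

def tt_hadamard(cores_a, cores_b):
    """TT-cores of C = A ∘ B: slice-wise Kronecker products, so ranks multiply."""
    out = []
    for ga, gb in zip(cores_a, cores_b):
        n = ga.shape[1]
        slices = [np.kron(ga[:, i, :], gb[:, i, :]) for i in range(n)]
        out.append(np.stack(slices, axis=1))
    return out
```

Correctness follows from the mixed-product property (X ⊗ Y)(Z ⊗ W) = XZ ⊗ YW: the chained Kronecker products collapse to the product of the two scalar chains.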
Example application: TensorNet
1. Neural networks use fully-connected layers: $y = f(Wx + b)$.
2. The matrix $W$ has millions of parameters.
3. Let's store and train the matrix $W$ in the TT-format.

This can't work for general matrices, but for the VGG-16 net we compressed a 4048 × 4048 matrix to 320 parameters without loss of accuracy.
Linear model
Model:

$$y(x) = \langle w, x \rangle + b, \qquad b \in \mathbb{R},\; w \in \mathbb{R}^d$$

Loss function:

$$\sum_{k=1}^{N} \ell\bigl( \langle w, x^{(k)} \rangle + b,\; y^{(k)} \bigr).$$
Linear regression
Logistic regression
Linear SVM
...
Need for interactions
Linear models give everyone the same recommendations.
The same story holds e.g. in bag-of-words text tasks.
Use interactions (products of features)!
Models with interactions
$$y(x) = b + \langle w, x \rangle + \sum_{i,j} P_{ij} x_i x_j, \qquad b \in \mathbb{R},\; w \in \mathbb{R}^d,\; P \in \mathbb{R}^{d \times d}$$

For $d$ features there are $d^2$ parameters: overfitting on sparse data.
The complexity is also $O(d^2)$.
For recommender systems $d$ is in the millions.
SVM with a polynomial kernel has the same drawbacks.
Factorization machines
$$y(x) = b + \langle w, x \rangle + \sum_{i,j} P_{ij} x_i x_j$$

Factorization machines [Rendle 2010] use a rank-$r$ factorization of $P$:

$$y(x) = b + \langle w, x \rangle + \sum_{i,j} \sum_{f=1}^{r} V_{if} V_{jf} x_i x_j, \qquad b \in \mathbb{R},\; w \in \mathbb{R}^d,\; V \in \mathbb{R}^{d \times r}$$

The matrix $P = V V^\top$ is not sparse, but structured (low-rank).
Control the number of parameters with $r$.
Can represent almost any matrix with large enough $r$.
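A direct numpy sketch of the factorization machine prediction above (names are illustrative, not from an FM library; the double sum here includes the diagonal terms, matching the formula as written):

```python
import numpy as np

def fm_predict(x, b, w, V):
    """y(x) = b + <w, x> + sum_{i,j} P_ij x_i x_j with P = V V^T."""
    interactions = x @ (V @ V.T) @ x   # low-rank interaction term
    return b + w @ x + interactions

d, r = 6, 2
rng = np.random.default_rng(0)
x, w = rng.normal(size=d), rng.normal(size=d)
V = rng.normal(size=(d, r))
y = fm_predict(x, b=0.5, w=w, V=V)
# d * r + d + 1 parameters instead of d^2 + d + 1.
```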
High order analysis
Factorization machines model (3rd order):

$$y(x) = b + \langle w, x \rangle + \sum_{i,j} \sum_{f=1}^{r} V_{if} V_{jf} x_i x_j + \sum_{i,j,k} \sum_{f=1}^{r} U_{if} U_{jf} U_{kf} x_i x_j x_k.$$

In fact, factorization machines just use the CP-decomposition for the weight tensor $P_{ijk}$:

$$P_{ijk} = \sum_{f=1}^{r} U_{if} U_{jf} U_{kf}.$$

But:
They converge poorly with high order;
The complexity of inference and learning grows.
Exponential machines
Let's encode interactions by a binary code: every bit indicates whether the corresponding feature is included in the current interaction.

Exponential machines example ($d = 3$):

$$y(x) = W_{000} + W_{100} x_1 + W_{010} x_2 + W_{001} x_3 + W_{110} x_1 x_2 + W_{101} x_1 x_3 + W_{011} x_2 x_3 + W_{111} x_1 x_2 x_3.$$
In general:

$$y(x) = \sum_{i_1=0}^{1} \cdots \sum_{i_d=0}^{1} W_{i_1,\ldots,i_d}\, x_1^{i_1} \cdots x_d^{i_d}, \qquad W \in \mathbb{R}^{2 \times \ldots \times 2} \text{ with TT-rank } r.$$

Captures all $2^d$ interactions.
Control the number of parameters with the TT-rank $r$.
Can represent any polynomial function with large enough $r$.
Exponential machines inference
Linear $O(r^2 d)$ inference:

$$y(x) = \sum_{i_1,\ldots,i_d} G_1[i_1] \cdots G_d[i_d] \prod_{k=1}^{d} x_k^{i_k} = \sum_{i_1,\ldots,i_d} \bigl( x_1^{i_1} G_1[i_1] \bigr) \cdots \bigl( x_d^{i_d} G_d[i_d] \bigr)$$

$$= \Bigl( \sum_{i_1=0}^{1} x_1^{i_1} G_1[i_1] \Bigr) \cdots \Bigl( \sum_{i_d=0}^{1} x_d^{i_d} G_d[i_d] \Bigr) = \underbrace{A_1}_{1 \times r}\, \underbrace{A_2}_{r \times r} \cdots \underbrace{A_d}_{r \times 1},
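The derivation above turns into a few lines of numpy (a sketch; cores stored as (r_{k-1}, 2, r_k) arrays, names illustrative):

```python
import numpy as np

def exm_predict(x, cores):
    """O(r^2 d) inference: collapse each core to A_k = G_k[0] + x_k * G_k[1],
    then multiply the small matrices left to right."""
    res = np.ones((1, 1))
    for xk, core in zip(x, cores):
        ak = core[:, 0, :] + xk * core[:, 1, :]   # A_k = sum_{i_k} x_k^{i_k} G_k[i_k]
        res = res @ ak
    return res[0, 0]
```

For small d this can be checked against the brute-force sum over all 2^d index tuples.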
Exponential machines learning
$$\underset{W}{\text{minimize}} \quad \sum_{k=1}^{N} \ell\bigl( \langle W, X^{(k)} \rangle,\; y^{(k)} \bigr), \qquad \text{subject to } \text{TT-rank}(W) = r_0.$$

1. Autodiff to compute gradients with respect to the TT-cores $G_k$;
2. OR Riemannian optimization.

Theorem [Holtz, 2012]: the set of all $d$-dimensional tensors with fixed TT-rank $r$,

$$\mathcal{M}_r = \{ W \in \mathbb{R}^{2 \times \ldots \times 2} : \text{TT-rank}(W) = r \},$$

forms a Riemannian manifold.
Riemannian optimization
(Diagram: one step of Riemannian optimization. From the current point $W_t \in \mathcal{M}_r$, the gradient $-\frac{\partial L}{\partial W_t}$ is projected onto the tangent space $T_{W}\mathcal{M}_r$ to give $-G_t$; a step along $-G_t$ followed by TT-rounding returns to the manifold $\mathcal{M}_r$, yielding $W_{t+1}$.)
Riemannian optimization Cont’d
Loss function:

$$L(W) = \sum_{k=1}^{N} \ell\bigl( \langle W, X^{(k)} \rangle,\; y^{(k)} \bigr)$$

Gradient:

$$\frac{\partial L}{\partial W} = \sum_{k=1}^{N} \frac{\partial \ell}{\partial y}\, X^{(k)},$$

where $X^{(k)}$ is of TT-rank 1:

$$X_{i_1 \ldots i_d} = \prod_{k=1}^{d} x_k^{i_k}.$$
Experiments: optimization
(Figure: train loss versus time, log-log scale, for Cores GD, Cores SGD 100, Cores SGD 500, Riemann GD, Riemann 100, Riemann 500, and Riemann GD with random initialization. (a) Car dataset; (b) HIV dataset.)
Experiments: classification
1. We generated $10^5$ train and $10^5$ test objects with $d = 30$ features.
2. $X_{ij} \sim U\{-1, +1\}$.
3. Ground truth with 3 interactions of order 2: $y(x) = \varepsilon_1 x_1 x_5 + \varepsilon_2 x_3 x_8 + \varepsilon_3 x_4 x_5$; $\varepsilon_1, \varepsilon_2, \varepsilon_3 \sim U(-1, 1)$.
4. We used 20 interactions of order 6.
Method Test AUC Training time (s) Inference time (s)
Log. reg. 0.50 ± 0.0 0.4 0.0
RF 0.55 ± 0.0 21.4 1.3
SVM RBF 0.50 ± 0.0 2262.6 1076.1
SVM poly. 2 0.50 ± 0.0 1152.6 852.0
SVM poly. 6 0.56 ± 0.0 4090.9 754.8
2-nd order FM 0.50 ± 0.0 638.2 0.1
6-th order FM 0.57 ± 0.05 1412.0 0.2
ExM rank 2 0.54 ± 0.05 198.4 0.1
ExM rank 4 0.69 ± 0.02 443.0 0.1
ExM rank 8 0.75 ± 0.02 998.3 0.2
Conclusion
The Tensor Train decomposition compactly represents tensors.
We can parametrize machine learning models with TT-tensors:
e.g. the weights of a neural network,
or modeling all $2^d$ interactions (products of features).
Control the number of underlying parameters via the TT-rank.
Riemannian optimization sometimes outperforms SGD for learning.
There is Python code for everything: TT, TensorNet, and Exponential Machines.
Alexander Novikov Tensor Train in machine learning October 11, 2016 26 / 26

More Related Content

What's hot

Nonnegative Matrix Factorization
Nonnegative Matrix FactorizationNonnegative Matrix Factorization
Nonnegative Matrix FactorizationTatsuya Yokota
 
グラフ構造データに対する深層学習〜創薬・材料科学への応用とその問題点〜 (第26回ステアラボ人工知能セミナー)
グラフ構造データに対する深層学習〜創薬・材料科学への応用とその問題点〜 (第26回ステアラボ人工知能セミナー)グラフ構造データに対する深層学習〜創薬・材料科学への応用とその問題点〜 (第26回ステアラボ人工知能セミナー)
グラフ構造データに対する深層学習〜創薬・材料科学への応用とその問題点〜 (第26回ステアラボ人工知能セミナー)STAIR Lab, Chiba Institute of Technology
 
2値分類・多クラス分類
2値分類・多クラス分類2値分類・多クラス分類
2値分類・多クラス分類t dev
 
Generalization of Tensor Factorization and Applications
Generalization of Tensor Factorization and ApplicationsGeneralization of Tensor Factorization and Applications
Generalization of Tensor Factorization and ApplicationsKohei Hayashi
 
第8回関西CV・PRML勉強会(Meanshift)
第8回関西CV・PRML勉強会(Meanshift)第8回関西CV・PRML勉強会(Meanshift)
第8回関西CV・PRML勉強会(Meanshift)Yutaka Yamada
 
Genetic algorithm raktim
Genetic algorithm raktimGenetic algorithm raktim
Genetic algorithm raktimRaktim Halder
 
오토인코더의 모든 것
오토인코더의 모든 것오토인코더의 모든 것
오토인코더의 모든 것NAVER Engineering
 
確率的推論と行動選択
確率的推論と行動選択確率的推論と行動選択
確率的推論と行動選択Masahiro Suzuki
 
総和伝搬法を用いた分散近似メッセージ伝搬アルゴリズム
総和伝搬法を用いた分散近似メッセージ伝搬アルゴリズム総和伝搬法を用いた分散近似メッセージ伝搬アルゴリズム
総和伝搬法を用いた分散近似メッセージ伝搬アルゴリズムRyo Hayakawa
 
Neural Networks: Support Vector machines
Neural Networks: Support Vector machinesNeural Networks: Support Vector machines
Neural Networks: Support Vector machinesMostafa G. M. Mostafa
 
Anomaly detection 系の論文を一言でまとめた
Anomaly detection 系の論文を一言でまとめたAnomaly detection 系の論文を一言でまとめた
Anomaly detection 系の論文を一言でまとめたぱんいち すみもと
 
Recurrent and Recursive Networks (Part 1)
Recurrent and Recursive Networks (Part 1)Recurrent and Recursive Networks (Part 1)
Recurrent and Recursive Networks (Part 1)sohaib_alam
 
Denoising Diffusion Probabilistic Modelsの重要な式の解説
Denoising Diffusion Probabilistic Modelsの重要な式の解説Denoising Diffusion Probabilistic Modelsの重要な式の解説
Denoising Diffusion Probabilistic Modelsの重要な式の解説Tomonari Masada
 
Variational Autoencoders VAE - Santiago Pascual - UPC Barcelona 2018
Variational Autoencoders VAE - Santiago Pascual - UPC Barcelona 2018Variational Autoencoders VAE - Santiago Pascual - UPC Barcelona 2018
Variational Autoencoders VAE - Santiago Pascual - UPC Barcelona 2018Universitat Politècnica de Catalunya
 

What's hot (20)

Nonnegative Matrix Factorization
Nonnegative Matrix FactorizationNonnegative Matrix Factorization
Nonnegative Matrix Factorization
 
グラフ構造データに対する深層学習〜創薬・材料科学への応用とその問題点〜 (第26回ステアラボ人工知能セミナー)
グラフ構造データに対する深層学習〜創薬・材料科学への応用とその問題点〜 (第26回ステアラボ人工知能セミナー)グラフ構造データに対する深層学習〜創薬・材料科学への応用とその問題点〜 (第26回ステアラボ人工知能セミナー)
グラフ構造データに対する深層学習〜創薬・材料科学への応用とその問題点〜 (第26回ステアラボ人工知能セミナー)
 
2値分類・多クラス分類
2値分類・多クラス分類2値分類・多クラス分類
2値分類・多クラス分類
 
π計算
π計算π計算
π計算
 
SINGLE-SOURCE SHORTEST PATHS
SINGLE-SOURCE SHORTEST PATHS SINGLE-SOURCE SHORTEST PATHS
SINGLE-SOURCE SHORTEST PATHS
 
Generalization of Tensor Factorization and Applications
Generalization of Tensor Factorization and ApplicationsGeneralization of Tensor Factorization and Applications
Generalization of Tensor Factorization and Applications
 
第8回関西CV・PRML勉強会(Meanshift)
第8回関西CV・PRML勉強会(Meanshift)第8回関西CV・PRML勉強会(Meanshift)
第8回関西CV・PRML勉強会(Meanshift)
 
Genetic algorithm raktim
Genetic algorithm raktimGenetic algorithm raktim
Genetic algorithm raktim
 
오토인코더의 모든 것
오토인코더의 모든 것오토인코더의 모든 것
오토인코더의 모든 것
 
Wasserstein GAN
Wasserstein GANWasserstein GAN
Wasserstein GAN
 
確率的推論と行動選択
確率的推論と行動選択確率的推論と行動選択
確率的推論と行動選択
 
CSC446: Pattern Recognition (LN6)
CSC446: Pattern Recognition (LN6)CSC446: Pattern Recognition (LN6)
CSC446: Pattern Recognition (LN6)
 
総和伝搬法を用いた分散近似メッセージ伝搬アルゴリズム
総和伝搬法を用いた分散近似メッセージ伝搬アルゴリズム総和伝搬法を用いた分散近似メッセージ伝搬アルゴリズム
総和伝搬法を用いた分散近似メッセージ伝搬アルゴリズム
 
Neural Networks: Support Vector machines
Neural Networks: Support Vector machinesNeural Networks: Support Vector machines
Neural Networks: Support Vector machines
 
Random Forestsとその応用
Random Forestsとその応用Random Forestsとその応用
Random Forestsとその応用
 
Anomaly detection 系の論文を一言でまとめた
Anomaly detection 系の論文を一言でまとめたAnomaly detection 系の論文を一言でまとめた
Anomaly detection 系の論文を一言でまとめた
 
Matrix Factorization
Matrix FactorizationMatrix Factorization
Matrix Factorization
 
Recurrent and Recursive Networks (Part 1)
Recurrent and Recursive Networks (Part 1)Recurrent and Recursive Networks (Part 1)
Recurrent and Recursive Networks (Part 1)
 
Denoising Diffusion Probabilistic Modelsの重要な式の解説
Denoising Diffusion Probabilistic Modelsの重要な式の解説Denoising Diffusion Probabilistic Modelsの重要な式の解説
Denoising Diffusion Probabilistic Modelsの重要な式の解説
 
Variational Autoencoders VAE - Santiago Pascual - UPC Barcelona 2018
Variational Autoencoders VAE - Santiago Pascual - UPC Barcelona 2018Variational Autoencoders VAE - Santiago Pascual - UPC Barcelona 2018
Variational Autoencoders VAE - Santiago Pascual - UPC Barcelona 2018
 

Similar to Tensor Train decomposition in machine learning

New data structures and algorithms for \\post-processing large data sets and ...
New data structures and algorithms for \\post-processing large data sets and ...New data structures and algorithms for \\post-processing large data sets and ...
New data structures and algorithms for \\post-processing large data sets and ...Alexander Litvinenko
 
Low Power Adaptive FIR Filter Based on Distributed Arithmetic
Low Power Adaptive FIR Filter Based on Distributed ArithmeticLow Power Adaptive FIR Filter Based on Distributed Arithmetic
Low Power Adaptive FIR Filter Based on Distributed ArithmeticIJERA Editor
 
Integration techniques
Integration techniquesIntegration techniques
Integration techniquesKrishna Gali
 
Talk on Resource Allocation Strategies for Layered Multimedia Multicast Services
Talk on Resource Allocation Strategies for Layered Multimedia Multicast ServicesTalk on Resource Allocation Strategies for Layered Multimedia Multicast Services
Talk on Resource Allocation Strategies for Layered Multimedia Multicast ServicesAndrea Tassi
 
Design and Implementation of Parallel and Randomized Approximation Algorithms
Design and Implementation of Parallel and Randomized Approximation AlgorithmsDesign and Implementation of Parallel and Randomized Approximation Algorithms
Design and Implementation of Parallel and Randomized Approximation AlgorithmsAjay Bidyarthy
 
Linear regression without tears
Linear regression without tearsLinear regression without tears
Linear regression without tearsAnkit Sharma
 
High Speed Memory Efficient Multiplier-less 1-D 9/7 Wavelet Filters Based NED...
High Speed Memory Efficient Multiplier-less 1-D 9/7 Wavelet Filters Based NED...High Speed Memory Efficient Multiplier-less 1-D 9/7 Wavelet Filters Based NED...
High Speed Memory Efficient Multiplier-less 1-D 9/7 Wavelet Filters Based NED...IJERA Editor
 
Digital Signal Processing[ECEG-3171]-Ch1_L03
Digital Signal Processing[ECEG-3171]-Ch1_L03Digital Signal Processing[ECEG-3171]-Ch1_L03
Digital Signal Processing[ECEG-3171]-Ch1_L03Rediet Moges
 
preTEST3A Double Integrals Solved
preTEST3A Double Integrals SolvedpreTEST3A Double Integrals Solved
preTEST3A Double Integrals SolvedA Jorge Garcia
 
ENBIS 2018 presentation on Deep k-Means
ENBIS 2018 presentation on Deep k-MeansENBIS 2018 presentation on Deep k-Means
ENBIS 2018 presentation on Deep k-Meanstthonet
 
preTEST3A Double Integrals
preTEST3A Double IntegralspreTEST3A Double Integrals
preTEST3A Double IntegralsA Jorge Garcia
 
Small updates of matrix functions used for network centrality
Small updates of matrix functions used for network centralitySmall updates of matrix functions used for network centrality
Small updates of matrix functions used for network centralityFrancesco Tudisco
 
Efficient Analysis of high-dimensional data in tensor formats
Efficient Analysis of high-dimensional data in tensor formatsEfficient Analysis of high-dimensional data in tensor formats
Efficient Analysis of high-dimensional data in tensor formatsAlexander Litvinenko
 

Similar to Tensor Train decomposition in machine learning (20)

New data structures and algorithms for \\post-processing large data sets and ...
New data structures and algorithms for \\post-processing large data sets and ...New data structures and algorithms for \\post-processing large data sets and ...
New data structures and algorithms for \\post-processing large data sets and ...
 
QMC: Undergraduate Workshop, Introduction to Monte Carlo Methods with 'R' Sof...
QMC: Undergraduate Workshop, Introduction to Monte Carlo Methods with 'R' Sof...QMC: Undergraduate Workshop, Introduction to Monte Carlo Methods with 'R' Sof...
QMC: Undergraduate Workshop, Introduction to Monte Carlo Methods with 'R' Sof...
 
Type and proof structures for concurrency
Type and proof structures for concurrencyType and proof structures for concurrency
Type and proof structures for concurrency
 
Low Power Adaptive FIR Filter Based on Distributed Arithmetic
Low Power Adaptive FIR Filter Based on Distributed ArithmeticLow Power Adaptive FIR Filter Based on Distributed Arithmetic
Low Power Adaptive FIR Filter Based on Distributed Arithmetic
 
Integration techniques
Integration techniquesIntegration techniques
Integration techniques
 
Randomized algorithms ver 1.0
Randomized algorithms ver 1.0Randomized algorithms ver 1.0
Randomized algorithms ver 1.0
 
Talk on Resource Allocation Strategies for Layered Multimedia Multicast Services
Talk on Resource Allocation Strategies for Layered Multimedia Multicast ServicesTalk on Resource Allocation Strategies for Layered Multimedia Multicast Services
Talk on Resource Allocation Strategies for Layered Multimedia Multicast Services
 
SASA 2016
SASA 2016SASA 2016
SASA 2016
 
Design and Implementation of Parallel and Randomized Approximation Algorithms
Design and Implementation of Parallel and Randomized Approximation AlgorithmsDesign and Implementation of Parallel and Randomized Approximation Algorithms
Design and Implementation of Parallel and Randomized Approximation Algorithms
 
Linear regression without tears
Linear regression without tearsLinear regression without tears
Linear regression without tears
 
High Speed Memory Efficient Multiplier-less 1-D 9/7 Wavelet Filters Based NED...
High Speed Memory Efficient Multiplier-less 1-D 9/7 Wavelet Filters Based NED...High Speed Memory Efficient Multiplier-less 1-D 9/7 Wavelet Filters Based NED...
High Speed Memory Efficient Multiplier-less 1-D 9/7 Wavelet Filters Based NED...
 
Digital Signal Processing[ECEG-3171]-Ch1_L03
Digital Signal Processing[ECEG-3171]-Ch1_L03Digital Signal Processing[ECEG-3171]-Ch1_L03
Digital Signal Processing[ECEG-3171]-Ch1_L03
 
Section4 stochastic
Section4 stochasticSection4 stochastic
Section4 stochastic
 
preTEST3A Double Integrals Solved
preTEST3A Double Integrals SolvedpreTEST3A Double Integrals Solved
preTEST3A Double Integrals Solved
 
R Language Introduction
R Language IntroductionR Language Introduction
R Language Introduction
 
ENBIS 2018 presentation on Deep k-Means
ENBIS 2018 presentation on Deep k-MeansENBIS 2018 presentation on Deep k-Means
ENBIS 2018 presentation on Deep k-Means
 
preTEST3A Double Integrals
preTEST3A Double IntegralspreTEST3A Double Integrals
preTEST3A Double Integrals
 
2020 preTEST3A
2020 preTEST3A2020 preTEST3A
2020 preTEST3A
 
Small updates of matrix functions used for network centrality
Small updates of matrix functions used for network centralitySmall updates of matrix functions used for network centrality
Small updates of matrix functions used for network centrality
 
Efficient Analysis of high-dimensional data in tensor formats
Efficient Analysis of high-dimensional data in tensor formatsEfficient Analysis of high-dimensional data in tensor formats
Efficient Analysis of high-dimensional data in tensor formats
 

Recently uploaded

Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystSamantha Rae Coolbeth
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptxthyngster
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...Suhani Kapoor
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...Pooja Nehwal
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 
Digi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptxDigi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptxTanveerAhmed817946
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一ffjhghh
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiSuhani Kapoor
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts ServiceSapana Sha
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubaihf8803863
 

Recently uploaded (20)

Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data Analyst
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 
Decoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in ActionDecoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in Action
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
Digi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptxDigi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptx
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts Service
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 

Tensor Train decomposition in machine learning

  • 1. Tensor Train in machine learning Alexander Novikov October 11, 2016 Alexander Novikov Tensor Train in machine learning October 11, 2016 1 / 26
  • 2. Recommender systems Assume low-rank structure. Alexander Novikov Tensor Train in machine learning October 11, 2016 2 / 26
  • 3. Tensor Train summary Tensor Train (TT) decomposition [Oseledets 2011]: A compact representation for tensors (=multidimensional array); Allows for efficient application of linear algebra operations. Alexander Novikov Tensor Train in machine learning October 11, 2016 3 / 26
  • 4. Low-rank decomposition A23 = G1 G2 i2 = 3i1 = 2 Ai1i2 = G1[i1] 1×r G2[i2] r×1 A = G1G2 G1 – collection of rows, G2 – collection of columns: Alexander Novikov Tensor Train in machine learning October 11, 2016 4 / 26
  • 5. Tensor Train decomposition A2423 = G1 G2 G3 G4 i2 = 4 i3 = 2 i4 = 3 i1 = 2 Ai1...id = G1[i1] 1×r G2[i2] r×r . . . Gd [id ] r×1 An example of computing one element of 4-dimensional tensor: Alexander Novikov Tensor Train in machine learning October 11, 2016 5 / 26
  • 6. Tensor Train decomposition, cont'd. A tensor A is said to be in the TT-format if
    A_{i1,...,id} = G1[i1] G2[i2] · · · Gd[id], ik ∈ {1, . . . , n},
    where Gk[ik] is a matrix of size r_{k−1} × r_k, with r0 = rd = 1.
    Notation & terminology: Gk are the TT-cores; rk are the TT-ranks; r = max_{k=0,...,d} rk is the maximal TT-rank.
    The TT-format uses O(d n r²) memory to store n^d elements, so it is efficient only if the TT-rank is small.
  • 8. TT-format: example. A_{i1,i2,i3} = i1 + i2 + i3, with i1 ∈ {1, 2, 3}, i2 ∈ {1, 2, 3, 4}, i3 ∈ {1, 2, 3, 4, 5}. Then A_{i1,i2,i3} = G1[i1] G2[i2] G3[i3] with
    G1[i1] = [i1  1],  G2[i2] = [1  0; i2  1],  G3[i3] = [1; i3].
    Let's check: A(i1, i2, i3) = [i1  1] [1  0; i2  1] [1; i3] = [i1 + i2  1] [1; i3] = i1 + i2 + i3.
  • 9. TT-format: example, cont'd. A_{i1,i2,i3} = i1 + i2 + i3 = G1[i1] G2[i2] G3[i3], with the full cores
    G1 = { [1  1], [2  1], [3  1] },
    G2 = { [1 0; 1 1], [1 0; 2 1], [1 0; 3 1], [1 0; 4 1] },
    G3 = { [1; 1], [1; 2], [1; 3], [1; 4], [1; 5] }.
    The tensor has 3 · 4 · 5 = 60 elements; the TT-format uses 32 parameters to describe it.
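These cores can be verified numerically; a short numpy check (1-based indices as on the slide, shifted for 0-based array access):

```python
import numpy as np

# The slide's cores for A[i1, i2, i3] = i1 + i2 + i3.
G1 = [np.array([[i1, 1.0]]) for i1 in range(1, 4)]              # 1 x 2 each
G2 = [np.array([[1.0, 0.0], [i2, 1.0]]) for i2 in range(1, 5)]  # 2 x 2 each
G3 = [np.array([[1.0], [i3]]) for i3 in range(1, 6)]            # 2 x 1 each

ok = all(
    (G1[i1 - 1] @ G2[i2 - 1] @ G3[i3 - 1]).item() == i1 + i2 + i3
    for i1 in range(1, 4) for i2 in range(1, 5) for i3 in range(1, 6)
)
n_params = sum(g.size for g in G1 + G2 + G3)
print(ok, n_params)  # True 32
```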
  • 11. Sum of tensors. Tensors A and B are in the TT-format: A_{i1...id} = G^A_1[i1] · · · G^A_d[id], B_{i1...id} = G^B_1[i1] · · · G^B_d[id]. Find the TT-format of C = A + B, C_{i1...id} = A_{i1...id} + B_{i1...id}. TT-cores of the result:
    G^C_k[ik] = [G^A_k[ik]  0; 0  G^B_k[ik]] (block-diagonal), k = 2, . . . , d − 1,
    G^C_1[i1] = [G^A_1[i1]  G^B_1[i1]] (row concatenation),
    G^C_d[id] = [G^A_d[id]; G^B_d[id]] (column concatenation).
    TT-ranks of the result are the sums of the TT-ranks.
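The concatenation/block-diagonal construction above can be sketched with dense numpy cores (a toy representation for illustration, not a TT-library API; a TT tensor is stored here as a list of cores, each a list of matrices):

```python
import numpy as np

def blkdiag(a, b):
    out = np.zeros((a.shape[0] + b.shape[0], a.shape[1] + b.shape[1]))
    out[:a.shape[0], :a.shape[1]] = a
    out[a.shape[0]:, a.shape[1]:] = b
    return out

def tt_add(A, B):
    d, C = len(A), []
    for k in range(d):
        if k == 0:                    # first core: row concatenation
            C.append([np.hstack([a, b]) for a, b in zip(A[k], B[k])])
        elif k == d - 1:              # last core: column concatenation
            C.append([np.vstack([a, b]) for a, b in zip(A[k], B[k])])
        else:                         # middle cores: block-diagonal
            C.append([blkdiag(a, b) for a, b in zip(A[k], B[k])])
    return C

def tt_eval(cores, idx):              # multiply the core matrices out
    m = cores[0][idx[0]]
    for k in range(1, len(cores)):
        m = m @ cores[k][idx[k]]
    return m.item()

# Quick check on two random 2 x 3 x 2 TT tensors of rank 2.
rng = np.random.default_rng(1)
shape, r = (2, 3, 2), 2
rand_tt = lambda: [
    [rng.standard_normal((1, r)) for _ in range(shape[0])],
    [rng.standard_normal((r, r)) for _ in range(shape[1])],
    [rng.standard_normal((r, 1)) for _ in range(shape[2])],
]
A, B = rand_tt(), rand_tt()
C = tt_add(A, B)
idx = (1, 2, 0)
print(np.isclose(tt_eval(C, idx), tt_eval(A, idx) + tt_eval(B, idx)))  # True
```

Multiplying the block structure out gives exactly G^A_1 · · · G^A_d + G^B_1 · · · G^B_d, and the ranks add.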
  • 12. TT-rounding. Given a tensor A in the TT-format with TT-rank r, TT-rounding [Oseledets, 2011], Ã = tt-round(A, ε) with ε > 0, finds a tensor Ã such that:
    1. ||A − Ã||_F ≤ ε ||A||_F;
    2. the TT-rank of Ã is minimal among all B with ||A − B||_F ≤ (ε / √(d − 1)) ||A||_F,
    where ||A||_F = sqrt(Σ_{i1,...,id} A²_{i1,...,id}).
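TT-rounding sweeps truncated SVDs over the cores; the single-matrix analogue of one truncation step — keep the smallest rank that meets a relative Frobenius-error budget — can be sketched as follows (a simplified illustration, not the full TT algorithm):

```python
import numpy as np

def svd_truncate(A, eps):
    """Keep the smallest rank such that ||A - A_r||_F <= eps * ||A||_F."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # tail[k] = sum of squared singular values discarded when keeping rank k
    tail = np.concatenate([np.cumsum((s ** 2)[::-1])[::-1], [0.0]])
    budget = (eps * np.linalg.norm(A)) ** 2
    rank = int(np.argmax(tail <= budget))   # first rank within the budget
    return (U[:, :rank] * s[:rank]) @ Vt[:rank], rank

# A nearly rank-3 matrix is truncated back to rank 3.
rng = np.random.default_rng(2)
A = rng.standard_normal((30, 3)) @ rng.standard_normal((3, 40))
A += 1e-8 * rng.standard_normal((30, 40))
A_r, rank = svd_truncate(A, eps=1e-4)
print(rank, np.linalg.norm(A - A_r) <= 1e-4 * np.linalg.norm(A))  # 3 True
```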
  • 13. How to find the TT-decomposition of a given tensor: analytical formulas for special cases; an exact algorithm based on the SVD for medium-sized tensors (e.g. a 5^8 ≈ 400 000-element tensor takes 8 ms on my laptop); for large tensors (e.g. 2^50 elements), approximate algorithms that look at only a fraction of the tensor elements: DMRG-cross [Savostyanov and Oseledets, 2011], AMEn-cross [Dolgov and Savostyanov, 2013].
  • 14. TT-format operations (operation → rank of the result):
    C = c · A                  r(C) = r(A)
    C = A + c                  r(C) = r(A) + 1
    C = A + B                  r(C) ≤ r(A) + r(B)
    C = A ∘ B (element-wise)   r(C) ≤ r(A) r(B)
    C = round(A, ε)            r(C) ≤ r(A)
    sum of elements, ||A||_F   (scalars)
    (Ask me about differential equations.)
  • 15. Example application: TensorNet. 1. Neural networks use fully-connected layers: y = f(Wx + b). 2. The matrix W has millions of parameters. 3. Let's store and train the matrix W in the TT-format. This can't work for general matrices, but for the VGG-16 net we compressed a 4096 × 4096 matrix to 320 parameters without loss of accuracy.
  • 16. Linear model. y(x) = w⊤x + b, b ∈ R, w ∈ R^d. Loss function: Σ_{k=1}^N ℓ(w⊤x^(k) + b, y^(k)). Covers linear regression, logistic regression, linear SVM, ...
  • 17. Need for interactions. Linear models give everyone the same recommendations. The same story holds e.g. in bag-of-words text tasks. Use interactions (products of features)!
  • 18. Models with interactions. y(x) = b + w⊤x + Σ_{i,j} P_{ij} x_i x_j, with b ∈ R, w ∈ R^d, P ∈ R^{d×d}. For d features this means d² parameters: overfitting on sparse data. Complexity is also O(d²), and for recommender systems d is in the millions. An SVM with a polynomial kernel has the same drawbacks.
  • 19. Factorization machines. y(x) = b + w⊤x + Σ_{i,j} P_{ij} x_i x_j. Factorization machines [Rendle 2010] use rank r for P:
    y(x) = b + w⊤x + Σ_{i,j} Σ_{f=1}^r V_{if} V_{jf} x_i x_j, with b ∈ R, w ∈ R^d, V ∈ R^{d×r}.
    The matrix P = VV⊤ is not sparse, but structured (low-rank). Control the number of parameters with r; can represent almost any matrix with large r.
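The double sum over (i, j) as written above collapses into Σ_f (Σ_i V_if x_i)², so prediction costs O(dr) instead of O(d²). A numpy sketch checking the identity (note: Rendle's original FM restricts the sum to i < j, while the slide writes the unrestricted sum, which is what is implemented here):

```python
import numpy as np

def fm_predict(x, b, w, V):
    """O(d r): sum_{i,j,f} V_if V_jf x_i x_j equals ||V^T x||^2."""
    z = V.T @ x                      # r-dimensional
    return b + w @ x + z @ z

def fm_predict_naive(x, b, w, V):    # literal double sum, for checking
    s = b + w @ x
    for i in range(len(x)):
        for j in range(len(x)):
            s += (V[i] @ V[j]) * x[i] * x[j]
    return s

rng = np.random.default_rng(3)
d, r = 6, 2
x, w = rng.standard_normal(d), rng.standard_normal(d)
V, b = rng.standard_normal((d, r)), 0.5
print(np.isclose(fm_predict(x, b, w, V), fm_predict_naive(x, b, w, V)))  # True
```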
  • 20. High-order analysis. The third-order factorization machines model:
    y(x) = b + w⊤x + Σ_{i,j} Σ_{f=1}^r V_{if} V_{jf} x_i x_j + Σ_{i,j,k} Σ_{f=1}^r U_{if} U_{jf} U_{kf} x_i x_j x_k.
    In fact, factorization machines just use the CP-decomposition for the weight tensor P: P_{ijk} = Σ_{f=1}^r U_{if} U_{jf} U_{kf}. But they converge poorly at high orders, and the complexity of inference and learning grows.
  • 22. Exponential machines. Let's encode interactions by a binary code: every bit indicates whether the corresponding feature is included in the current interaction. Example (d = 3):
    y(x) = W000 + W100 x1 + W010 x2 + W001 x3 + W110 x1 x2 + W101 x1 x3 + W011 x2 x3 + W111 x1 x2 x3.
    In general:
    y(x) = Σ_{i1=0}^1 . . . Σ_{id=0}^1 W_{i1,...,id} x1^{i1} . . . xd^{id}, W ∈ R^{2×...×2} with TT-rank r.
    This captures all 2^d interactions; control the number of parameters with the TT-rank r; can represent any polynomial function with large r.
  • 23. Exponential machines: inference. Linear O(r²d) inference:
    y(x) = Σ_{i1,...,id} (G1[i1] . . . Gd[id]) Π_{k=1}^d x_k^{i_k}
         = Σ_{i1,...,id} (x1^{i1} G1[i1]) . . . (xd^{id} Gd[id])
         = (Σ_{i1=0}^1 x1^{i1} G1[i1]) . . . (Σ_{id=0}^1 xd^{id} Gd[id])
         = A1 A2 . . . Ad,
    where A1 is 1×r, the middle factors are r×r, and Ad is r×1.
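The collapsed-factor inference can be sketched in a few lines of numpy and checked against the explicit sum over all 2^d interactions (toy sizes; the tuple-of-two-matrices core layout is illustrative, not the Exponential Machines code):

```python
import numpy as np
from itertools import product

def exm_predict(cores, x):
    """O(r^2 d): collapse each core to A_k = G_k[0] + x_k * G_k[1], then multiply."""
    m = cores[0][0] + x[0] * cores[0][1]
    for k in range(1, len(cores)):
        m = m @ (cores[k][0] + x[k] * cores[k][1])
    return m.item()

def exm_predict_brute(cores, x):     # explicit sum over all 2^d interactions
    d, s = len(cores), 0.0
    for idx in product((0, 1), repeat=d):
        m = cores[0][idx[0]]
        for k in range(1, d):
            m = m @ cores[k][idx[k]]
        s += m.item() * np.prod([x[k] ** idx[k] for k in range(d)])
    return s

rng = np.random.default_rng(4)
d, r = 4, 3
cores = [(rng.standard_normal((1 if k == 0 else r, 1 if k == d - 1 else r)),
          rng.standard_normal((1 if k == 0 else r, 1 if k == d - 1 else r)))
         for k in range(d)]
x = rng.standard_normal(d)
print(np.isclose(exm_predict(cores, x), exm_predict_brute(cores, x)))  # True
```

The fast path touches each core once, while the brute-force sum visits all 2^d binary index tuples.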
  • 24. Exponential machines: learning.
    minimize_W Σ_{k=1}^N ℓ(⟨W, X^(k)⟩, y^(k)) subject to TT-rank(W) = r0.
    1. Autodiff to compute gradients with respect to the TT-cores G;
    2. or Riemannian optimization.
    Theorem [Holtz, 2012]: the set of all d-dimensional tensors with fixed TT-rank r, M_r = {W ∈ R^{2×...×2} : TT-rank(W) = r}, forms a Riemannian manifold.
  • 25. Riemannian optimization. (Figure: the gradient −∂L/∂W_t is projected onto the tangent space T_{W_t}M_r of the manifold M_r, giving −G_t; a step along −G_t followed by TT-rounding maps back onto M_r, producing W_{t+1}.)
  • 26. Riemannian optimization, cont'd. Loss function L(W) = Σ_{k=1}^N ℓ(⟨W, X^(k)⟩, y^(k)). Gradient: ∂L/∂W = Σ_{k=1}^N (∂ℓ/∂ŷ) X^(k), where each X^(k) is of TT-rank 1: X_{i1...id} = Π_{k=1}^d x_k^{i_k}.
  • 27. Experiments: optimization. (Plots of training loss vs. time on (a) the Car dataset and (b) the HIV dataset, comparing gradient methods on the cores — GD, SGD with batch sizes 100 and 500 — against Riemannian GD and Riemannian SGD with batch sizes 100 and 500, plus Riemannian GD with random initialization.)
  • 28. Experiments: classification. 1. We generated 10^5 train and 10^5 test objects with d = 30 features. 2. X_ij ~ U{−1, +1}. 3. Ground truth: 3 interactions of order 2, y(x) = ε1 x1 x5 + ε2 x3 x8 + ε3 x4 x5; ε1, ε2, ε3 ~ U(−1, 1). 4. We used 20 interactions of order 6.
    Method          Test AUC      Training time (s)   Inference time (s)
    Log. reg.       0.50 ± 0.0        0.4                 0.0
    RF              0.55 ± 0.0       21.4                 1.3
    SVM RBF         0.50 ± 0.0     2262.6              1076.1
    SVM poly. 2     0.50 ± 0.0     1152.6               852.0
    SVM poly. 6     0.56 ± 0.0     4090.9               754.8
    2nd-order FM    0.50 ± 0.0      638.2                 0.1
    6th-order FM    0.57 ± 0.05    1412.0                 0.2
    ExM rank 2      0.54 ± 0.05     198.4                 0.1
    ExM rank 4      0.69 ± 0.02     443.0                 0.1
    ExM rank 8      0.75 ± 0.02     998.3                 0.2
  • 29. Conclusion. The Tensor Train decomposition compactly represents tensors. We can parametrize machine learning models with TT-tensors, e.g. the weights of a neural network, or model all 2^d interactions (products of features), controlling the number of underlying parameters via the TT-rank. Riemannian optimization sometimes outperforms SGD for learning. There is Python code for everything: TT, TensorNet, and Exponential Machines.