1. Motivation: why do we need low-rank tensors?
2. Tensors of the second order (matrices)
3. CP, Tucker and tensor train tensor formats
4. Many classical kernels have (or can be approximated in) a low-rank tensor format
5. Post-processing: computation of mean, variance, level sets, frequency
Optimal interval clustering: Application to Bregman clustering and statistical mixture learning (Frank Nielsen)
We present a generic dynamic programming method to compute the optimal clustering of n scalar elements into k pairwise disjoint intervals. This case includes 1D Euclidean k-means, k-medoids, k-medians, k-centers, etc. We extend the method to incorporate cluster size constraints and show how to choose the appropriate k by model selection. Finally, we illustrate and refine the method on two case studies: Bregman clustering and statistical mixture learning maximizing the complete likelihood.
http://arxiv.org/abs/1403.2485
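The dynamic program behind optimal interval clustering can be sketched for the 1D Euclidean k-means case. This is a generic sketch of the idea in the abstract, not the paper's implementation; the function name and the O(k n^2) recursion are mine (the paper's method is more general and handles other Bregman costs):

```python
import numpy as np

def optimal_interval_kmeans(x, k):
    """Optimal clustering of 1D data into k contiguous intervals,
    minimising the within-cluster sum of squared deviations (1D k-means).
    Plain O(k n^2) dynamic program with O(1) interval costs via prefix sums."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    p1 = np.concatenate(([0.0], np.cumsum(x)))       # prefix sums of x
    p2 = np.concatenate(([0.0], np.cumsum(x * x)))   # prefix sums of x^2

    def cost(i, j):
        # SSE of x[i:j] around its mean, for 0 <= i < j <= n
        s, q, m = p1[j] - p1[i], p2[j] - p2[i], j - i
        return q - s * s / m

    INF = float("inf")
    D = np.full((k + 1, n + 1), INF)   # D[c][j]: best cost of x[:j] using c intervals
    D[0][0] = 0.0
    back = np.zeros((k + 1, n + 1), dtype=int)
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                v = D[c - 1][i] + cost(i, j)
                if v < D[c][j]:
                    D[c][j], back[c][j] = v, i
    # backtrack the interval boundaries
    bounds, j = [], n
    for c in range(k, 0, -1):
        i = back[c][j]
        bounds.append((i, j))
        j = i
    return D[k][n], bounds[::-1]
```

For `x = [1, 2, 9, 10, 11, 30]` and `k = 3` this recovers the intuitive clusters `{1,2}, {9,10,11}, {30}`.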
We combined low-rank tensor techniques with the FFT to compute the kriging estimate, the estimation variance, and the conditional covariance. We are able to solve 3D problems at very high resolution.
Efficient Analysis of high-dimensional data in tensor formats (Alexander Litvinenko)
We solve a PDE with uncertain coefficients. The solution is approximated in the Karhunen-Loeve/PCE basis. How can the maximum, the frequency, and the probability density function be computed with almost linear complexity? We offer various methods.
Multi-linear algebra and different tensor formats with applications (Alexander Litvinenko)
A short overview of well-known tensor formats, elliptic PDEs with uncertain coefficients, some academic examples of separable functions, and post-processing in tensor format.
Tucker tensor analysis of Matérn functions in spatial statistics (Alexander Litvinenko)
1. Motivation: improve statistical models
2. Motivation: disadvantages of matrices
3. Tools: Tucker tensor format
4. Tensor approximation of the Matérn covariance function via FFT
5. Typical statistical operations in Tucker tensor format
6. Numerical experiments
My talk at the International Conference on Computational Finance 2019 (ICCF2019). The talk is about designing new efficient methods for option pricing under the rough Bergomi model.
Error Estimates for Multi-Penalty Regularization under General Source Condition (csandit)
In learning theory, the convergence issues of the regression problem are investigated with the least-squares Tikhonov regularization scheme in both the RKHS norm and the L2 norm. We consider the multi-penalized least-squares regularization scheme under a general source condition with polynomial decay of the eigenvalues of the integral operator. One of the motivations for this work is to discuss the convergence issues of the widely considered manifold regularization scheme. The optimal convergence rates of the multi-penalty regularizer are achieved in the interpolation norm using the concept of effective dimension. Furthermore, we propose the penalty balancing principle, based on augmented Tikhonov regularization, for the choice of regularization parameters. The superiority of multi-penalty regularization over single-penalty regularization is shown using an academic example and the moon data set.
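The basic object of the abstract, a least-squares functional with several penalty terms, has a simple closed form that can be sketched as follows. This is a generic two-penalty sketch (function name and the choice of penalties are mine), not the paper's interpolation-norm analysis or its parameter-choice rule:

```python
import numpy as np

def multi_penalty_ls(A, b, L, lam1, lam2):
    """Minimise ||A x - b||^2 + lam1 ||x||^2 + lam2 ||L x||^2.
    The minimiser solves the normal equations
    (A^T A + lam1 I + lam2 L^T L) x = A^T b."""
    n = A.shape[1]
    M = A.T @ A + lam1 * np.eye(n) + lam2 * (L.T @ L)
    return np.linalg.solve(M, A.T @ b)
```

With `lam2 = 0` this reduces to ordinary (single-penalty) Tikhonov regularization; a second penalty such as `||L x||^2` with a graph Laplacian `L` is the typical form of a manifold-regularization term.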
Hierarchical Deterministic Quadrature Methods for Option Pricing under the Rough Bergomi Model (Chiheb Ben Hammouda)
Conference talk at the SIAM Conference on Financial Mathematics and Engineering, held virtually, June 1-4, 2021, about our recently published work "Hierarchical adaptive sparse grids and quasi-Monte Carlo for option pricing under the rough Bergomi model".
- Link of the paper: https://www.tandfonline.com/doi/abs/10.1080/14697688.2020.1744700
A One-Pass Triclustering Approach: Is There any Room for Big Data? (Dmitrii Ignatov)
An efficient one-pass online algorithm for triclustering of binary data (triadic formal contexts) is proposed. This algorithm is a modified version of the basic algorithm of the OAC-triclustering approach, but it has linear time and memory complexity with respect to the cardinality of the underlying ternary relation and can easily be parallelized for the analysis of big datasets. The results of computer experiments show the efficiency of the proposed algorithm.
We present recent results on the numerical analysis of quasi-Monte Carlo quadrature methods, applied to forward and inverse uncertainty quantification for elliptic and parabolic PDEs. Particular attention will be placed on higher-order QMC, the stable and efficient generation of interlaced polynomial lattice rules, and the numerical analysis of multilevel QMC finite element discretizations with applications to computational uncertainty quantification.
The peer-reviewed International Journal of Engineering Inventions (IJEI) was started with a mission to encourage contributions to research in science and technology, and to encourage and motivate researchers in challenging areas of the sciences and technology.
R package 'bayesImageS': a case study in Bayesian computation using Rcpp and OpenMP (Matt Moores)
There are many approaches to Bayesian computation with intractable likelihoods, including the exchange algorithm, approximate Bayesian computation (ABC), thermodynamic integration, and composite likelihood. These approaches vary in accuracy as well as scalability for datasets of significant size. The Potts model is an example where such methods are required, due to its intractable normalising constant. This model is a type of Markov random field, which is commonly used for image segmentation. The dimension of its parameter space increases linearly with the number of pixels in the image, making this a challenging application for scalable Bayesian computation. My talk will introduce various algorithms in the context of the Potts model and describe their implementation in C++, using OpenMP for parallelism. I will also discuss the process of releasing this software as an open source R package on the CRAN repository.
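The Potts model referred to above can be illustrated with a minimal single-site Gibbs sweep. This is a didactic toy in Python, not the bayesImageS C++/OpenMP implementation; the function name and grid setup are mine:

```python
import numpy as np

def gibbs_sweep(z, beta, q, rng):
    """One single-site Gibbs sweep for a q-state Potts model on a 2D grid:
    p(z[i, j] = s | rest) is proportional to
    exp(beta * number of 4-neighbours currently labelled s)."""
    H, W = z.shape
    for i in range(H):
        for j in range(W):
            counts = np.zeros(q)
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                a, b = i + di, j + dj
                if 0 <= a < H and 0 <= b < W:
                    counts[z[a, b]] += 1
            p = np.exp(beta * counts)
            p /= p.sum()
            z[i, j] = rng.choice(q, p=p)
    return z
```

For `beta` above the critical value the labels order into large patches, which is the behaviour that makes the normalising constant intractable and motivates the algorithms listed in the abstract.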
Low rank tensor approximation of probability density and characteristic functions (Alexander Litvinenko)
Very often one has to deal with high-dimensional random variables (RVs). A high-dimensional RV can be described by its probability density function (\pdf) and/or by the corresponding probability characteristic function (\pcf), or by a function representation. Here the interest is mainly in computing characterisations like the entropy, relations between two distributions such as their Kullback-Leibler divergence, or more general measures such as $f$-divergences, among others. These are all computed from the \pdf, which is often not available directly, and it is a computational challenge even to represent it in a numerically feasible fashion when the dimension $d$ is even moderately large. It is an even stronger numerical challenge to then actually compute these characterisations in the high-dimensional case.
In this regard, in order to obtain a computationally feasible task, we propose to represent the density as a high-order tensor product and to approximate it in a low-rank format.
Computing f-Divergences and Distances of High-Dimensional Probability Densities (Alexander Litvinenko)
Talk presented at the SIAM IS 2022 conference.
Very often, in the course of uncertainty quantification tasks or data analysis, one has to deal with high-dimensional random variables (RVs) (with values in $\Rd$). Just like any other RV, a high-dimensional RV can be described by its probability density function (\pdf) and/or by the corresponding probability characteristic function (\pcf), or by a more general representation as a function of other, known, random variables. Here the interest is mainly in computing characterisations like the entropy, the Kullback-Leibler divergence, or more general $f$-divergences. These are all computed from the \pdf, which is often not available directly, and it is a computational challenge even to represent it in a numerically feasible fashion when the dimension $d$ is even moderately large. It is an even stronger numerical challenge to then actually compute these characterisations in the high-dimensional case. In this regard, in order to obtain a computationally feasible task, we propose to approximate the density by a low-rank tensor.
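On a full grid (feasible only in low dimension, which is exactly the limitation the low-rank approach removes) the characterisations above reduce to weighted sums over the discretised \pdf. A minimal 1D sketch, with grid and example densities chosen by me:

```python
import numpy as np

# Discretise two 1D Gaussian pdfs on a regular grid and compute divergences
# as quadrature sums -- the full-grid baseline that low-rank tensor formats
# replace when d is large (illustrative sketch only).
x = np.linspace(-10.0, 10.0, 4001)
h = x[1] - x[0]

def gauss(x, mu, sig):
    return np.exp(-0.5 * ((x - mu) / sig) ** 2) / (sig * np.sqrt(2 * np.pi))

p, q = gauss(x, 0.0, 1.0), gauss(x, 1.0, 1.5)
kl = h * np.sum(p * np.log(p / q))                         # Kullback-Leibler divergence
hell2 = 0.5 * h * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)   # squared Hellinger distance
```

For Gaussians the KL divergence is known in closed form, which provides a check: KL(N(0,1) || N(1, 1.5^2)) = ln(1.5) + (1 + 1)/(2 * 1.5^2) - 1/2, approximately 0.3499. In dimension $d$ the analogous sum has $n^d$ terms, which is what motivates the tensor representation.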
Computing f-Divergences and Distances of High-Dimensional Probability Densities (Alexander Litvinenko)
Poster presented at the Stochastic Numerics and Statistical Learning: Theory and Applications Workshop at KAUST, Saudi Arabia.
The task considered here was the numerical computation of characterising statistics of high-dimensional pdfs, as well as of their divergences and distances, where the pdf in the numerical implementation was assumed to be discretised on some regular grid. Even for moderate dimension $d$, full storage of and computation with such objects quickly become infeasible. We have demonstrated that high-dimensional pdfs, pcfs, and some functions of them can be approximated and represented in a low-rank tensor data format. The use of low-rank tensor techniques reduces the computational complexity and the storage cost from exponential, $\C{O}(n^d)$, to linear in the dimension $d$, e.g. $\C{O}(d n r^2)$ for the TT format. Here $n$ is the number of discretisation points in one direction, $r < n$ is the maximal tensor rank, and $d$ is the problem dimension. The particular data format is rather unimportant; any of the well-known tensor formats (CP, Tucker, hierarchical Tucker, tensor-train (TT)) can be used, and we used the TT data format. Much of the presentation, and in fact the central train of thought, is independent of the actual representation.
In the beginning, the representation was motivated through three possible ways in which one may arrive at such a representation of the pdf: the pdf may be given in some approximate analytical form, e.g. as a function tensor product of lower-dimensional pdfs with a product measure; it may be obtained from an analogous representation of the pcf and subsequent use of the Fourier transform; or it may come from a low-rank functional representation of a high-dimensional RV, again via its pcf.
The theoretical underpinnings of the relation between pdfs and pcfs, as well as their properties, were recalled in the Theory section, as they are important to preserve in the discrete approximation. This also introduced the concepts of the convolution algebra and of the point-wise multiplication (Hadamard) algebra, concepts which become especially important if one wants to characterise sums of independent RVs or mixture models, a topic we did not touch on for the sake of brevity but which follows very naturally from the developments here. The Hadamard algebra is especially important for the algorithms that compute various point-wise functions in the sparse formats.
Multidimensional integrals may be approximated by weighted averages of integrand values. Quasi-Monte Carlo (QMC) methods are more accurate than simple Monte Carlo methods because they carefully choose where to evaluate the integrand. This tutorial focuses on how quickly QMC methods converge to the correct answer as the number of integrand values increases. The answer may depend on the smoothness of the integrand and the sophistication of the QMC method. QMC error analysis may assume that the integrand belongs to a reproducing kernel Hilbert space, or it may assume that the integrand is an instance of a stochastic process with known covariance structure. These two approaches have interesting parallels. This tutorial also explores how the computational cost of achieving a good approximation to the integral depends on the dimension of the domain of the integrand. Finally, this tutorial explores methods for determining how many integrand values are needed to satisfy the error tolerance. Relevant software is described.
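The MC-versus-QMC comparison above can be made concrete with a very simple low-discrepancy point set. The Halton sequence used here is a minimal stand-in chosen by me for the sketch; it is far simpler than the lattice rules and digital nets a real QMC tutorial would cover:

```python
import numpy as np

def halton(n, d):
    """First n points of the (unscrambled) Halton sequence in d dimensions:
    coordinate j is the radical-inverse of the point index in the j-th prime base."""
    primes = [2, 3, 5, 7, 11, 13][:d]
    out = np.empty((n, d))
    for j, b in enumerate(primes):
        for i in range(n):
            f, r, x = 1.0, i + 1, 0.0
            while r > 0:
                f /= b
                x += f * (r % b)
                r //= b
            out[i, j] = x
    return out

# Smooth test integrand on [0,1]^5 with a known integral: prod_j exp(x_j).
d, n = 5, 4096
f = lambda u: np.prod(np.exp(u), axis=1)
true_val = (np.e - 1.0) ** d

rng = np.random.default_rng(0)
mc_err = abs(f(rng.random((n, d))).mean() - true_val)     # plain Monte Carlo
qmc_err = abs(f(halton(n, d)).mean() - true_val)          # quasi-Monte Carlo
```

For a smooth integrand like this, the QMC error typically decays close to O(1/n) (up to logarithmic factors) while plain MC decays like O(1/sqrt(n)), which is the gap the tutorial quantifies.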
New data structures and algorithms for post-processing large data sets and ... (Alexander Litvinenko)
In this work, we describe advanced numerical tools for working with multivariate functions and for the analysis of large data sets. These tools drastically reduce the required computing time and storage cost and therefore allow us to consider much larger data sets or finer meshes. Covariance matrices are crucial in spatio-temporal statistical tasks, but are often very expensive to compute and store, especially in 3D. Therefore, we approximate covariance functions by cheap surrogates in a low-rank tensor format. We apply the Tucker and canonical tensor decompositions to a family of Matérn- and Slater-type functions with varying parameters and demonstrate numerically that their approximations exhibit exponentially fast convergence. We prove the exponential convergence of the Tucker and canonical approximations in the tensor rank parameters. Several statistical operations are performed in this low-rank tensor format, including evaluating the conditional covariance matrix and the spatially averaged estimation variance, and computing a quadratic form, the determinant, the trace, the log-likelihood, the inverse, and the Cholesky decomposition of a large covariance matrix. Low-rank tensor approximations reduce the computing and storage costs substantially. For example, the storage cost is reduced from an exponential $O(n^d)$ to a linear scaling $O(d r n)$, where $d$ is the spatial dimension, $n$ is the number of mesh points in one direction, and $r$ is the tensor rank. Prerequisites for the applicability of the proposed techniques are the assumptions that the data, locations, and measurements lie on a tensor (axes-parallel) grid and that the covariance function depends on a distance,...
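The fast convergence claimed for Slater-type functions can be observed numerically in a tiny 2D version: sample the kernel on a tensor grid and look at how quickly the singular values of the sample matrix decay. Grid size, domain, and truncation rank below are my illustrative choices, not the paper's experiments:

```python
import numpy as np

# Sample the Slater-type kernel f(x, y) = exp(-sqrt(x^2 + y^2)) on a 2D
# tensor grid and measure the best rank-r approximation error via the SVD.
n = 64
t = np.linspace(-1.0, 1.0, n)
X, Y = np.meshgrid(t, t, indexing="ij")
A = np.exp(-np.sqrt(X**2 + Y**2))

s = np.linalg.svd(A, compute_uv=False)   # singular values, descending
r = 16
# Eckart-Young: relative Frobenius error of the best rank-r approximation
err = np.sqrt(np.sum(s[r:] ** 2)) / np.linalg.norm(A)
```

The rapid decay of `s` is the matrix analogue of the small Tucker/canonical ranks exploited in the paper; in 3D the same effect is what makes the covariance surrogates cheap.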
Response Surface in Tensor Train format for Uncertainty Quantification (Alexander Litvinenko)
We apply the low-rank Tensor Train format to solve PDEs with uncertain coefficients. First, we approximate the uncertain permeability coefficient in the TT format, then the operator, and then apply iterations to solve the stochastic Galerkin system.
In this work we discuss how to compute the KLE with complexity O(k n log n), how to approximate large covariance matrices (in the H-matrix format), and how to use the Lanczos method.
We solve an elliptic PDE with uncertain coefficients. We apply the Karhunen-Loeve expansion to separate the stochastic part from the spatial part. The corresponding eigenvalue problem with the covariance function is solved via the hierarchical matrix technique. We also demonstrate how low-rank tensor methods can be applied to high-dimensional problems (e.g., to compute higher-order statistical moments). We provide explicit formulas to compute statistical moments of order k with linear complexity.
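The truncated KLE step can be sketched with dense linear algebra on a small 1D grid; the abstracts above use H-matrices and the Lanczos method precisely to make this step scale to large n. Grid size, correlation length, and the 95% energy criterion below are my illustrative choices:

```python
import numpy as np

# Truncated Karhunen-Loeve expansion of a 1D random field with exponential
# covariance C(x, y) = exp(-|x - y| / ell) on a uniform grid (dense eigensolve).
n, ell = 200, 0.3
x = np.linspace(0.0, 1.0, n)
C = np.exp(-np.abs(x[:, None] - x[None, :]) / ell)

w, V = np.linalg.eigh(C)          # eigenvalues ascending
w, V = w[::-1], V[:, ::-1]        # reorder descending
# number of modes capturing 95% of the variance ("energy")
m = int(np.searchsorted(np.cumsum(w) / w.sum(), 0.95)) + 1

rng = np.random.default_rng(0)
xi = rng.standard_normal(m)                  # independent standard Gaussians
field = V[:, :m] @ (np.sqrt(w[:m]) * xi)     # one truncated-KLE realisation
```

The eigenvalue decay means that `m` is much smaller than `n`, which is what makes the separation of stochastic and spatial parts pay off.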
Low-rank methods for analysis of high-dimensional data, SIAM CSE talk 2017 (Alexander Litvinenko)
Overview of our latest works on applying low-rank tensor techniques to a) solving PDEs with uncertain coefficients (or multi-parametric PDEs), b) post-processing high-dimensional data, and c) computing the largest element, level sets, and the top 5% of elements.
Existence of Solutions of Fractional Neutral Integrodifferential Equations wi... (inventionjournals)
The International Journal of Engineering and Science Invention (IJESI) is an international journal intended for professionals and researchers in all fields of computer science and electronics. IJESI publishes research articles and reviews across the whole field of engineering, science, and technology, including new teaching methods, assessment, validation, and the impact of new technologies, and it will continue to provide information on the latest trends and developments in this ever-expanding subject. Papers are selected through double peer review to ensure originality, relevance, and readability. The articles published in our journal can be accessed online.
ON RUN-LENGTH-CONSTRAINED BINARY SEQUENCES (ijitjournal)
A class of binary sequences, constrained with respect to the length of zero runs, is considered. For such sequences, termed (d, k)-sequences, new combinatorial and computational results are established. Explicit expressions for enumerating (d, k)-sequences of finite length are obtained. Efficient computational procedures for calculating the capacity of a (d, k)-code are given. A simple method for constructing a near-optimal (d, k)-code is proposed. Illustrative numerical examples further demonstrate the theoretical results.
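Both the enumeration and the capacity computation mentioned above can be sketched with a small transfer-matrix program. The convention used here (internal zero runs in [d, k], boundary runs at most k) is one common choice, stated explicitly; function names are mine:

```python
import numpy as np
from collections import defaultdict

def count_dk(n, d, k):
    """Count binary (d, k)-sequences of length n by dynamic programming.
    Convention: every zero run between two ones has length in [d, k];
    leading and trailing zero runs have length at most k.
    State: (a one has been emitted yet?, current trailing zero-run length)."""
    states = defaultdict(int)
    states[(False, 0)] = 1
    for _ in range(n):
        nxt = defaultdict(int)
        for (seen, z), c in states.items():
            if z + 1 <= k:                      # append '0'
                nxt[(seen, z + 1)] += c
            if (not seen) or d <= z:            # append '1'
                nxt[(True, 0)] += c
        states = nxt
    return sum(states.values())

def count_bruteforce(n, d, k):
    """Exhaustive check of the same convention, for validation."""
    from itertools import product
    total = 0
    for bits in product((0, 1), repeat=n):
        ones = [i for i, b in enumerate(bits) if b]
        if not ones:
            total += (n <= k)
            continue
        total += (ones[0] <= k
                  and (n - 1 - ones[-1]) <= k
                  and all(d <= b - a - 1 <= k for a, b in zip(ones, ones[1:])))
    return total

def capacity(d, k):
    """Capacity (bits/symbol) = log2 of the spectral radius of the
    transfer matrix on trailing-zero-run states 0..k."""
    T = np.zeros((k + 1, k + 1))
    for z in range(k + 1):
        if z + 1 <= k:
            T[z, z + 1] = 1.0      # append '0'
        if z >= d:
            T[z, 0] = 1.0          # append '1'
    return float(np.log2(np.max(np.abs(np.linalg.eigvals(T)))))
```

For (d, k) = (1, 3) the spectral radius solves the characteristic equation of the run-length recurrence, giving a capacity of roughly 0.55 bits per symbol.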
Saltwater intrusion occurs when sea levels rise and saltwater moves onto the land. Usually, this occurs during storms, high tides, and droughts, or when saltwater penetrates freshwater aquifers and raises the groundwater table. Since groundwater is an essential resource for nutrition and irrigation, its salinization may lead to catastrophic consequences. Many acres of farmland may be lost because the soil becomes too wet or too salty to grow crops. Therefore, accurate modeling of different scenarios of saline flow is essential to help farmers and researchers develop strategies to improve soil quality and decrease the effects of saltwater intrusion.
Saline flow is density-driven and described by a system of time-dependent nonlinear partial differential equations (PDEs). It features convection dominance and can demonstrate very complicated behavior.
As a specific model, we consider a Henry-like problem with uncertain permeability and porosity.
These parameters may strongly affect the flow and transport of salt.
Poster to be presented at the Stochastic Numerics and Statistical Learning: Theory and Applications Workshop 2024, KAUST, Saudi Arabia, https://cemse.kaust.edu.sa/stochnum/events/event/snsl-workshop-2024.
In this work we have considered a setting that mimics the Henry problem \cite{Simpson2003,Simpson04_Henry}, modeling seawater intrusion into a 2D coastal aquifer. The pure water recharge from the ``land side'' resists the salinisation of the aquifer due to the influx of saline water through the ``sea side'', thereby achieving some equilibrium in the salt concentration. In our setting, following \cite{GRILLO2010}, we consider a fracture on the sea side that significantly increases the permeability of the porous medium.
The flow and transport essentially depend on the geological parameters of the porous medium, including the fracture. We investigated the effects of various uncertainties on saltwater intrusion. We assumed uncertainties in the fracture width, the porosity of the bulk medium, its permeability and the pure water recharge from the land side. The porosity and permeability were modeled by random fields, the recharge by a random but periodic intensity and the thickness by a random variable. We calculated the mean and variance of the salt mass fraction, which is also uncertain.
The main question we investigated in this work was how well the MLMC method can be used to compute statistics of different QoIs. We found that the answer depends on the choice of the QoI. First, not every QoI requires a hierarchy of meshes and MLMC. Second, MLMC requires stable convergence rates for $\EXP{g_{\ell} - g_{\ell-1}}$ and $\Var{g_{\ell} - g_{\ell-1}}$. These rates should be independent of $\ell$. If these convergence rates vary for different $\ell$, then it will be hard to estimate $L$ and $m_{\ell}$, and MLMC will either not work or be suboptimal. We were not able to obtain stable convergence rates for all levels $\ell=1,\ldots,5$ when the QoI was an integral as in \eqref{eq:integral_box}. We found that the rate $\alpha$ differed between $\ell=1,\ldots,4$ and $\ell=5$. Further investigation is needed to find the reason for this. Another difficulty is the dependence on time, i.e. the number of levels $L$ and the numbers of samples $m_{\ell}$ depend on $t$. At the beginning the variability is small, then it increases, and after the process of mixing salt and fresh water has stopped, the variance decreases again.
The number of random samples required at each level was estimated by calculating the decay of the variances and the computational cost for each level. These estimates depend on the minimisation function in the MLMC algorithm.
To achieve the efficiency of the MLMC approach presented in this work, it is essential that the complexity of the numerical solution of each random realisation is proportional to the number of grid vertices on the grid levels.
We investigated the applicability and efficiency of the MLMC approach to the Henry-like problem with uncertain porosity, permeability and recharge. These uncertain parameters were modelled by random fields with three independent random variables. Permeability is a function of porosity. Both functions are time-dependent, have multi-scale behaviour and are defined for two layers. The numerical solution for each random realisation was obtained using the well-known ug4 parallel multigrid solver. The number of random samples required at each level was estimated by calculating the decay of the variances and the computational cost for each level.
The MLMC method was used to compute the expected value and variance of several QoIs, such as the solution at a few preselected points $(t,\bx)$, the solution integrated over a small subdomain, and the time evolution of the freshwater integral. We have found that some QoIs require only 2-3 mesh levels and samples from finer meshes would not significantly improve the result. Other QoIs require more grid levels.
1. Investigated the efficiency of MLMC for the Henry problem with uncertain porosity, permeability, and recharge.
2. Uncertainties are modeled by random fields.
3. MLMC can be much faster than MC, up to 3200 times faster!
4. The time dependence is challenging.
Remarks:
1. Check whether MLMC is needed at all.
2. The optimal number of samples depends on the point $(t,\bx)$.
3. An advanced MLMC may give better estimates of $L$ and $m_{\ell}$.
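The telescoping structure and per-level sample allocation discussed above can be sketched on a toy problem. Everything below (the QoI, the level hierarchy, the allocation rule) is my illustrative choice, not the Henry-problem solver or the ug4 setup:

```python
import numpy as np

# Toy MLMC sketch: QoI g(theta) = int_0^1 exp(theta * x) dx with theta ~ U(0,1);
# the level-l approximation g_l uses the midpoint rule with 2^l subintervals.
# The telescoping identity E[g_L] = E[g_0] + sum_{l=1}^{L} E[g_l - g_{l-1}]
# is estimated with m_l samples per level, m_l decreasing with l because the
# variance of the corrections g_l - g_{l-1} decays with the mesh size.
rng = np.random.default_rng(0)

def g_level(theta, l):
    h = 1.0 / 2**l
    xm = (np.arange(2**l) + 0.5) * h                 # midpoints
    return h * np.exp(np.outer(theta, xm)).sum(axis=1)

L = 5
m = [4000 // 2**l + 10 for l in range(L + 1)]        # simple geometric allocation
est = 0.0
for l in range(L + 1):
    theta = rng.random(m[l])                         # same samples for both levels
    diff = g_level(theta, l) - (g_level(theta, l - 1) if l > 0 else 0.0)
    est += diff.mean()
```

The exact value is E[g] = the integral of (e^theta - 1)/theta over [0, 1], which equals the series sum of 1/(n * n!); the MLMC estimate matches it to a few times 10^-3 here. In the paper the same machinery is applied with multigrid PDE solves in place of `g_level`.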
Density Driven Groundwater Flow with Uncertain Porosity and Permeability (Alexander Litvinenko)
In this work, we solved the density-driven groundwater flow problem with uncertain porosity and permeability. An accurate solution of this time-dependent and nonlinear problem is impossible because of the presence of natural uncertainties in the reservoir, such as porosity and permeability.
Therefore, we estimated the mean value and the variance of the solution, as well as the propagation of uncertainties from the random input parameters to the solution.
We started by defining the Elder-like problem. Then we described the multi-variate polynomial approximation (\gPC) approach and used it to estimate the required statistics of the mass fraction.
Utilizing the \gPC method allowed us
to reduce the computational cost compared to the classical quasi Monte Carlo method.
\gPC assumes that the output function $\sol(t,\bx,\thetab)$ is square-integrable and smooth w.r.t. the uncertain input variables $\thetab$.
Many factors, such as non-linearity, multiple solutions, multiple stationary states, time dependence and complicated solvers, make the investigation of the convergence of the \gPC method a non-trivial task.
We used an easy-to-implement, but only sub-optimal, \gPC technique to quantify the uncertainty. For example, it is known that as the degree of global polynomials (Hermite, Lagrange, and similar) increases, Runge's phenomenon appears. Here, local polynomials, splines, or mixtures of them would probably be better. Additionally, we used an easy-to-parallelise quadrature rule, which was also only suboptimal. For instance, an adaptive choice of sparse grid (or collocation) points \cite{ConradMarzouk13,nobile-sg-mc-2015,Sudret_sparsePCE,CONSTANTINE12,crestaux2009polynomial} would be better, but we were limited by the usage of parallel methods, and adaptive quadrature rules are not (so well) parallelisable. In conclusion, we can report that: a) we developed a highly parallel method to quantify uncertainty in the Elder-like problem; b) with \gPC of degree 4 we can achieve results similar to those of the \QMC method.
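The non-intrusive projection step behind a \gPC surrogate can be sketched in one variable. This is a minimal sketch assuming a scalar QoI of a single standard Gaussian input; the toy QoI, the quadrature order, and all names are mine, not the paper's solver output:

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval
from math import factorial

def pce_coeffs(u, p, nq=20):
    """Project u(theta), theta ~ N(0, 1), onto probabilists' Hermite
    polynomials He_0..He_p via Gauss-Hermite quadrature:
    c_k = E[u(theta) He_k(theta)] / E[He_k(theta)^2], with E[He_k^2] = k!."""
    t, w = hermegauss(nq)              # nodes/weights for weight exp(-t^2/2)
    w = w / np.sqrt(2.0 * np.pi)       # normalise to the standard Gaussian measure
    c = np.empty(p + 1)
    for k in range(p + 1):
        ek = np.zeros(k + 1)
        ek[k] = 1.0
        c[k] = np.sum(w * u(t) * hermeval(t, ek)) / factorial(k)
    return c

# Toy QoI u(theta) = theta^2; mean and variance follow from the coefficients:
# mean = c_0 and var = sum_{k>=1} c_k^2 * k! by orthogonality.
c = pce_coeffs(lambda t: t**2, p=4)
mean = c[0]
var = sum(c[k] ** 2 * factorial(k) for k in range(1, 5))
```

For this toy QoI the expansion is exact (theta^2 = He_2(theta) + 1), so the surrogate reproduces the known mean 1 and variance 2; in the paper the coefficients are computed from PDE solves at the quadrature points, which is what makes the approach embarrassingly parallel.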
In the numerical section we considered two different aquifers - a solid parallelepiped and a solid elliptic cylinder. One of our goals was to see how the domain geometry influences the formation, the number and the shape of fingers.
Since the considered problem is nonlinear,
a high variance in the porosity may result in totally different solutions; for instance, the number of fingers, their intensity and shape, the propagation time, and the velocity may vary considerably.
The number of cells in the presented experiments varied from $241{,}152$ to $15{,}433{,}728$ for the cylindrical domain and from $524{,}288$ to $4{,}194{,}304$ for the parallelepiped. The maximal number of parallel processing units was $600\times 32$, where $600$ is the number of parallel nodes and $32$ is the number of computing cores on each node. The total computing time varied from 2 hours for the coarse mesh to 24 hours for the finest mesh.
We consider a class of density-driven flow problems. We are particularly interested in the problem of the salinization of coastal aquifers. We consider the Henry saltwater intrusion problem with uncertain porosity, permeability, and recharge parameters as a test case.
Uncertainties arise from a lack of knowledge, inaccurate measurements, and the inability to measure parameters at every spatial or temporal location. This problem is nonlinear and time-dependent. The solution is the salt mass fraction, which is uncertain and changes in time. Uncertainties in porosity, permeability, recharge, and mass fraction are modeled using random fields. This work investigates the applicability of the well-known multilevel Monte Carlo (MLMC) method for such problems. The MLMC method can reduce the total computational and storage costs. Moreover, the MLMC method runs multiple scenarios on different spatial and time meshes and then estimates the mean value of the mass fraction.
The parallelization is performed in both the physical space and stochastic space. To solve every deterministic scenario, we run the parallel multigrid solver ug4 in a black-box fashion.
We use the solution obtained from the quasi-Monte Carlo method as a reference solution.
We investigated the applicability and efficiency of the MLMC approach for the Henry-like problem with uncertain porosity, permeability, and recharge. These uncertain parameters were modeled by random fields with three independent random variables. The numerical solution for each random realization was obtained using the well-known ug4 parallel multigrid solver. The number of required random samples on each level was estimated by computing the decay of the variances and computational costs for each level. We also computed the expected value and variance of the mass fraction in the whole domain, the evolution of the pdfs, the solutions at a few preselected points $(t,\bx)$, and the time evolution of the freshwater integral value. We have found that some QoIs require only 2-3 of the coarsest mesh levels, and samples from finer meshes would not significantly improve the result. Note that a different type of porosity may lead to a different conclusion.
The results show that the MLMC method is faster than the QMC method at the finest mesh. Thus, sampling at different mesh levels makes sense and helps to reduce the overall computational cost.
Here the interest is mainly to compute characterisations like the entropy,
the Kullback-Leibler divergence, more general $f$-divergences, or other such characteristics based on
the probability density. The density is often not available directly,
and it is a computational challenge to just represent it in a numerically
feasible fashion in case the dimension is even moderately large. It
is an even stronger numerical challenge to then actually compute said characteristics
in the high-dimensional case.
The task considered here was the numerical computation of characterising statistics of
high-dimensional pdfs, as well as their divergences and distances,
where the pdf in the numerical implementation was assumed discretised on some regular grid.
We have demonstrated that high-dimensional pdfs,
pcfs, and some functions of them
can be approximated and represented in a low-rank tensor data format.
Utilisation of low-rank tensor techniques helps to reduce the computational complexity
and the storage cost from exponential $\C{O}(n^d)$ to linear in the dimension $d$, e.g.\
$O(d n r^2 )$ for the TT format. Here $n$ is the number of discretisation
points in one direction, $r<<n$ is the maximal tensor rank, and $d$ the problem dimension.
Talk presented at the Workshop on Imaging With Uncertainty Quantification (IUQ), September 2022,
https://people.compute.dtu.dk/pcha/CUQI/IUQworkshop.html
We consider a weakly supervised classification problem. It
is a classification problem where the target variable can be unknown
or uncertain for some subset of samples. This problem appears when
the labeling is impossible, time-consuming, or expensive. Noisy measurements
and lack of data may prevent accurate labeling. Our task
is to build an optimal classification function. For this, we construct and
minimize a specific objective function, which includes the fitting error on
labeled data and a smoothness term. Next, we use covariance and radial
basis functions to define the degree of similarity between points. The further
process involves the repeated solution of an extensive linear system
with the graph Laplacian operator. To speed up this solution process,
we introduce low-rank approximation techniques. We call the resulting
algorithm WSC-LR. Then we use the WSC-LR algorithm for the analysis of
CT brain scans to recognize ischemic stroke disease. We also compare
WSC-LR with other well-known machine learning algorithms.
Identification of unknown parameters and prediction of missing values. Compar...Alexander Litvinenko
H-matrix approximation of large Mat\'{e}rn covariance matrices, Gaussian log-likelihoods.
Identifying unknown parameters and making predictions
Comparison with machine learning methods.
kNN is easy to implement and shows promising results.
Computation of electromagnetic fields scattered from dielectric objects of un...Alexander Litvinenko
We develop fast and efficient stochastic methods for characterizing scattering
from objects of uncertain shapes. This is highly needed in the
fields of electromagnetics, optics, and photonics.
The continuation multilevel Monte Carlo (CMLMC) method is
used together with a surface integral equation solver. The
CMLMC method optimally balances statistical errors due to
sampling of the parametric space, and numerical errors due
to the discretization of the geometry using a hierarchy of
discretizations, from coarse to fine. The number of realizations
of finer discretizations can be kept low, with most samples
computed on coarser discretizations to minimize computational
work. Consequently, the total execution time is significantly
reduced, in comparison to the standard MC scheme.
Identification of unknown parameters and prediction with hierarchical matrice...Alexander Litvinenko
We compare four numerical methods for the prediction of missing values in four different datasets.
These methods are 1) the hierarchical maximum likelihood estimation (H-MLE), and three machine learning (ML) methods, which include 2) k-nearest neighbors (kNN), 3) random forest, and 4) Deep Neural Network (DNN).
From the ML methods, the best results (for considered datasets) were obtained by the kNN method with three (or seven) neighbors.
On one dataset, the MLE method showed a smaller error than the kNN method, whereas, on another, the kNN method was better.
The MLE method requires a lot of linear algebra computations and works fine on almost all datasets. Its result can be improved by taking a smaller threshold and more accurate hierarchical matrix arithmetic. To our surprise, the well-known kNN method produced results similar to H-MLE and worked much faster.
Computation of electromagnetic fields scattered from dielectric objects of un...Alexander Litvinenko
Computational tools for characterizing electromagnetic scattering from objects with uncertain shapes are needed in various applications ranging from remote sensing at microwave frequencies to Raman spectroscopy at optical frequencies. Often, such computational tools use the Monte Carlo (MC) method to sample a parametric space describing geometric uncertainties. For each sample, which corresponds to a realization of the geometry, a deterministic electromagnetic solver computes the scattered fields. However, for an accurate statistical characterization the number of MC samples has to be large. In this work, to address this challenge, the continuation multilevel Monte Carlo (\CMLMC) method is used together with a surface integral equation solver.
The \CMLMC method optimally balances statistical errors due to sampling of
the parametric space, and numerical errors due to the discretization of the geometry using a hierarchy of discretizations, from coarse to fine.
The number of realizations of finer discretizations can be kept low, with most samples
computed on coarser discretizations to minimize computational cost.
Consequently, the total execution time is significantly reduced, in comparison to the standard MC scheme.
Propagation of Uncertainties in Density Driven Groundwater FlowAlexander Litvinenko
Major goal: estimate the risks of pollution in a subsurface flow.
How? We solve the density-driven groundwater flow problem with uncertain porosity and permeability.
We set up density-driven groundwater flow problem,
review stochastic modeling and stochastic methods, use UG4 framework (https://gcsc.uni-frankfurt.de/simulation-and-modelling/ug4),
model uncertainty in porosity and permeability,
2D and 3D numerical experiments.
Simulation of propagation of uncertainties in density-driven groundwater flowAlexander Litvinenko
Consider stochastic modelling of the density-driven subsurface flow in 3D. This talk was presented by Dmitry Logashenko at the IMG conference in Kunming, China, August 2019.
Large data sets result in large dense matrices, say with 2,000,000 rows and columns. How can one work with such large matrices? How can one approximate them? How can one compute the log-likelihood, the determinant, or the inverse? All answers are in this work.
In this paper, we solve a semi-supervised regression
problem. Due to the lack of knowledge about the
data structure and the presence of random noise, the considered data model is uncertain. We propose a method which combines graph Laplacian regularization and cluster ensemble methodologies. The co-association matrix of the ensemble is calculated on both labeled and unlabeled data; this matrix is used as a similarity matrix in the regularization framework to derive the predicted outputs. We use the low-rank decomposition of the co-association matrix to significantly speed up calculations and reduce memory requirements. Two clustering problem examples are presented.
Full version is here https://arxiv.org/abs/1901.03919
Major goal: estimate the risks of pollution in a subsurface flow.
How? We solve density-driven groundwater flow with uncertain porosity and permeability.
1. We set up density-driven groundwater flow problem
2. Review stochastic modeling and stochastic methods
3. Modeling of uncertainty in porosity and permeability
4. Numerical methods to solve deterministic problem
5. 2D and 3D examples with 0.5-8 million mesh points.
Efficient Simulations for Contamination of Groundwater Aquifers under Uncerta...Alexander Litvinenko
1. Solved time-dependent density driven flow problem with uncertain porosity and permeability in 2D and 3D
2. Computed propagation of uncertainties in porosity into the mass fraction.
3. Computed the mean, variance, exceedance probabilities, quantiles, risks.
4. Such QoIs as the number of fingers, their size, shape, and propagation time can be unstable.
5. For moderate perturbations, our gPCE surrogate results are similar to qMC results.
6. Used a highly scalable solver on up to 800 computing nodes.
2. Objectives of low-rank/sparse data approximations
1. Drastically reduce computing time and memory requirements. This also reduces energy consumption and CO2 emissions.
2. Extract knowledge from large high-dimensional datasets.
How?
1. Develop new low-rank/sparse data structures/formats.
2. Represent/approximate multidimensional operators and functions in these data formats.
3. Develop (new) linear algebra algorithms for these tensor formats.
3. Literature and software used
- Tensor book of W. Hackbusch (2012) and two books of Boris and Venera Khoromskij
- Dissertations of I. Oseledets and M. Espig
- Articles of Tyrtyshnikov et al., De Lathauwer et al., L. Grasedyck, B. Khoromskij, M. Espig
- Lecture courses and presentations of Boris and Venera Khoromskij and D. Kressner
- Software by T. Kolda et al.; M. Espig et al.; D. Kressner, C. Tobler; I. Oseledets et al.; L. De Lathauwer
4. History (not a full list) of using tensor approximations
- Canonical in 1927, Tucker in 1966, Tensor Train in 2010
- 1997: signal processing (Lieven De Lathauwer)
- 2005 and later: computational physics and chemistry - Hartree-Fock and Schrödinger equations (Hackbusch, Tyrtyshnikov, Kressner, Espig, Khoromskij(aja), Grasedyck, Oseledets, ...)
- 2007: uncertainty quantification, SPDEs, parametric PDEs (Nobile, all of the above)
- ~2009: spatial statistics
- ~2015: machine learning
5. Challenging applications
- quantum mechanics
- modelling of multi-particle interactions in large molecular systems such as proteins and biomolecules
- modelling of large atomic (metallic) clusters
- stochastic and parametric equations
- machine learning, data mining and information technologies
- multidimensional dynamical systems
- data compression
- financial mathematics
- analysis of satellite data
6. Curse of dimensionality
Assume we have $n^d$ data. Our aim is to reduce the storage/complexity from $O(n^d)$ to $O(dn)$.
If $n = 100$ and $d = 10$, then just to store the data one needs $8\cdot 100^{10} \approx 8\cdot 10^{20}$ bytes $= 8\cdot 10^{8}$ terabytes.
If we assume that a modern computer compares $10^7$ numbers per second, then the total time to inspect $10^{20}$ elements is $10^{13}$ seconds, or $\approx 3\cdot 10^5$ years. In some chemical applications we had $n = 100$ and $d = 800$.
- How to compute maxima and minima?
- How to compute level sets, i.e., all elements in an interval $[a,b]$?
- How to compute the number of elements in an interval $[a,b]$?
7. Example: tensors appear in stochastic PDEs
$$-\nabla\cdot(\kappa(x,\omega)\nabla u(x,\omega)) = f(x,\omega),\quad x\in G\subset\mathbb{R}^d,$$
where $\omega\in\Omega$ and $U = L^2(G)$.
Write first the Karhunen-Loeve expansion and then, for uncorrelated random variables, the polynomial chaos expansion:
$$u(x,\omega) = \sum_{i=1}^{K}\sqrt{\lambda_i}\,\phi_i(x)\,\xi_i(\omega) = \sum_{i=1}^{K}\sqrt{\lambda_i}\,\phi_i(x)\sum_{\alpha\in J}\xi_i^{(\alpha)}H_\alpha(\theta(\omega)), \qquad (1)$$
where $\xi_i^{(\alpha)}$ is a tensor. Note that $\alpha = (\alpha_1,\alpha_2,\ldots,\alpha_M,\ldots)$ is a multi-index, and
$$\sum_{\alpha\in J}\xi_i^{(\alpha)}H_\alpha(\theta(\omega)) := \sum_{\alpha_1=1}^{p_1}\cdots\sum_{\alpha_M=1}^{p_M}\xi_i^{(\alpha_1,\ldots,\alpha_M)}\prod_{j=1}^{M}h_{\alpha_j}(\theta_j). \qquad (2)$$
The same decomposition holds for $\kappa(x,\omega)$.
8. Final discretized stochastic PDE
$Au = f$, where
$$A := \sum_{l=1}^{s}\tilde{A}_l\otimes\bigotimes_{\mu=1}^{M}\Delta_{l\mu},\quad \tilde{A}_l\in\mathbb{R}^{N\times N},\ \Delta_{l\mu}\in\mathbb{R}^{R_\mu\times R_\mu},$$
$$u := \sum_{j=1}^{r}u_j\otimes\bigotimes_{\mu=1}^{M}u_{j\mu},\quad u_j\in\mathbb{R}^{N},\ u_{j\mu}\in\mathbb{R}^{R_\mu},$$
$$f := \sum_{k=1}^{R}\tilde{f}_k\otimes\bigotimes_{\mu=1}^{M}g_{k\mu},\quad \tilde{f}_k\in\mathbb{R}^{N},\ g_{k\mu}\in\mathbb{R}^{R_\mu}.$$
(Figure: examples of stochastic Galerkin matrices.)
The system is then solved iteratively with a tensor preconditioner [PhD thesis of E. Zander, 2012].
M. Espig, W. Hackbusch, A. Litvinenko, H.G. Matthies, P. Wähnert, Efficient low-rank approximation of the stochastic Galerkin matrix in tensor formats, Computers & Mathematics with Applications 67(4), 818-829, 2014.
Also see E. Ullmann, Chr. Schwab, B. Khoromskij, Schneider, Ballani, Kressner, Tobler, and many others.
9. Tensor of order 2
Let $M := U\Sigma V^T \approx \tilde{U}\tilde{\Sigma}\tilde{V}^T = M_k$, $k\ll\min\{n,m\}$ (truncated singular value decomposition).
Denote $A := \tilde{U}\tilde{\Sigma}$ and $B := \tilde{V}$; then $M_k = AB^T$.
Storage of $A$ and $B^T$ is $k(n+m)$, in contrast to $nm$ for $M$.
(Figure: schema of the full and truncated SVD of $M$.)
10. Arithmetic operations with low-rank matrices
Let $v\in\mathbb{R}^m$, and suppose $M_k = AB^T\in\mathbb{R}^{n\times m}$, $A\in\mathbb{R}^{n\times k}$, $B\in\mathbb{R}^{m\times k}$, is given.
Matrix-vector product: $M_k v = AB^Tv = A(B^Tv)$. Cost $O(km + kn)$.
Suppose $M' = CD^T$, $C\in\mathbb{R}^{n\times k}$, $D\in\mathbb{R}^{m\times k}$.
Matrix addition: $M_k + M' = A_{\rm new}B_{\rm new}^T$, where $A_{\rm new} := [A\ C]\in\mathbb{R}^{n\times 2k}$ and $B_{\rm new} := [B\ D]\in\mathbb{R}^{m\times 2k}$.
The cost of rank truncation from rank $2k$ back to $k$ is $O((n+m)k^2 + k^3)$.
11. Post-processing: compute mean and variance
Let $W := [v_1,v_2,\ldots,v_m]$, where the $v_i$ are vectors (e.g., solution vectors of a Navier-Stokes equation).
Given the tSVD $W_k = AB^T \approx W\in\mathbb{R}^{n\times m}$ with $A := U_kS_k$ and $B := V_k$:
$$\bar{v} = \frac{1}{m}\sum_{i=1}^{m}v_i = \frac{1}{m}\sum_{i=1}^{m}A\,b_i = A\bar{b}, \qquad (3)$$
$$C = \frac{1}{m-1}W_cW_c^T = \frac{1}{m-1}AB^TBA^T = \frac{1}{m-1}AA^T, \qquad (4)$$
where $W_c$ denotes the centered matrix, and $B^TB = I$ since $B = V_k$ has orthonormal columns.
The diagonal of $C$ can be computed with complexity $O(k^2(m+n))$.
If $\|W - W_k\|\le\varepsilon$, then
a) $\|\bar{v} - \bar{v}_k\| \le \frac{1}{\sqrt{n}}\varepsilon$,
b) $\|C - C_k\| \le \frac{1}{m-1}\varepsilon^2$.
12. Example from CFD and aerodynamics
Inflow and airfoil shape are uncertain.
Data compression achieved by an updated SVD, built from $m = 600$ MC simulations; the SVD is updated every 10 samples; $n = 260{,}000$.
Updated SVD: relative errors and memory requirements:
rank k | pressure | turb. kin. energy | memory [MB]
10 | 1.9e-2 | 4.0e-3 | 21
20 | 1.4e-2 | 5.9e-3 | 42
50 | 5.3e-3 | 1.5e-4 | 104
The dense matrix $M\in\mathbb{R}^{260000\times 600}$ costs 1250 MB of storage.
1. A. Litvinenko, H.G. Matthies, T.A. El-Moselhy, Sampling and low-rank tensor approximation of the response surface, Monte Carlo and Quasi-Monte Carlo Methods 2012, 535-551, 2013.
2. A. Litvinenko, H.G. Matthies, Numerical methods for uncertainty quantification and Bayesian update in aerodynamics, Management and Minimisation of Uncertainties and Errors in Numerical Aerodynamics, pp. 265-282, Springer, Berlin, 2013.
13. Example: a high-dimensional PDE
$$-\nabla^2 u = f \ \text{ on } G = [0,1]^d,\quad u|_{\partial G} = 1,$$
with the right-hand side
$$f(x_1,\ldots,x_d) \propto \sum_{k=1}^{d}\ \prod_{\ell=1,\ \ell\neq k}^{d}x_\ell(1-x_\ell).$$
Solved via a finite-difference method with $n = 100$ grid points in each direction.
The tensor $u$ has $N = n^d$ entries.
Applications: computing the electron density and the Hartree potential of molecules (see the dissertation of M. Espig).
14. Computing time to compute the maximum
d | N = n^d | years to inspect all N entries | actual time [s] (see Espig's dissertation)
25 | 10^50 | 1.6 x 10^33 | 0.16
50 | 10^100 | 1.6 x 10^83 | 0.42
75 | 10^150 | 1.6 x 10^133 | 1.16
100 | 10^200 | 1.6 x 10^183 | 2.58
125 | 10^250 | 1.6 x 10^233 | 4.97
150 | 10^300 | 1.6 x 10^283 | 8.56
Assumed $2\times 10^9$ FLOPs/sec on a 2 GHz CPU.
M. Espig, W. Hackbusch, A. Litvinenko, H.G. Matthies, E. Zander, Iterative algorithms for the post-processing of high-dimensional data, Journal of Computational Physics 410, 109396, 2020.
15. A tensor is a multi-index array
where multi-indices are used instead of indices.
Let $w\in\mathbb{R}^N$, $N = 10^{12}$. We can reshape it into a matrix $W\in\mathbb{R}^{10^6\times 10^6}$, which is a tensor of 2nd order;
or a tensor of 3rd order in $\mathbb{R}^{10^4\times 10^4\times 10^4}$;
or a tensor of 6th order in $\mathbb{R}^{10^2\times\ldots\times 10^2}$ (6 factors);
or a tensor of 12th order in $\mathbb{R}^{10\times\ldots\times 10}$ (12 factors).
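A small illustration with $N = 2^{12}$ instead of $10^{12}$, so that it actually runs:

```python
import numpy as np

# A long vector can be viewed as a tensor of any order with the same
# number of entries (here N = 4096 = 2^12 to keep the example runnable)
w = np.arange(4096, dtype=float)

W2 = w.reshape(64, 64)        # order-2 tensor (a matrix)
W3 = w.reshape(16, 16, 16)    # order-3 tensor
W6 = w.reshape((4,) * 6)      # order-6 tensor
W12 = w.reshape((2,) * 12)    # order-12 tensor ("quantics" view)
```

All reshapes keep the same data; only the indexing changes.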
16. Definition of a tensor of order d
A tensor of order $d$ is a multidimensional array over a $d$-tuple index set $I = I_1\times\cdots\times I_d$:
$$A = [a_{i_1\ldots i_d} : i_\ell\in I_\ell]\in\mathbb{R}^I,\quad I_\ell = \{1,\ldots,n_\ell\},\ \ell = 1,\ldots,d.$$
$A$ is an element of the linear space
$$V_n = \bigotimes_{\ell=1}^{d}V_\ell,\quad V_\ell = \mathbb{R}^{I_\ell},$$
equipped with the Euclidean scalar product $\langle\cdot,\cdot\rangle : V_n\times V_n\to\mathbb{R}$, defined as
$$\langle A,B\rangle := \sum_{(i_1\ldots i_d)\in I}a_{i_1\ldots i_d}\,b_{i_1\ldots i_d}\quad\text{for } A,B\in V_n.$$
17. Canonical, Tucker and TT tensor formats
Canonical in 1927 (F.L. Hitchcock), Tucker in 1966, TT in 2010.
(Figure: a) schema of the CP tensor decomposition of a 3D tensor; b) Tucker; c) TT decompositions. The wagons denote the TT cores, and each wheel denotes an index $i_\nu$.)
18. Tensor formats: CP, Tucker, TT
$$A(i_1,i_2,i_3) \approx \sum_{\alpha=1}^{r}u_1(i_1,\alpha)\,u_2(i_2,\alpha)\,u_3(i_3,\alpha)$$
$$A(i_1,i_2,i_3) \approx \sum_{\alpha_1,\alpha_2,\alpha_3}c(\alpha_1,\alpha_2,\alpha_3)\,u_1(i_1,\alpha_1)\,u_2(i_2,\alpha_2)\,u_3(i_3,\alpha_3)$$
$$A(i_1,\ldots,i_d) \approx \sum_{\alpha_1,\ldots,\alpha_{d-1}}G_1(i_1,\alpha_1)\,G_2(\alpha_1,i_2,\alpha_2)\cdots G_d(\alpha_{d-1},i_d)$$
Discrete: $G_k(i_k)$ is an $r_{k-1}\times r_k$ matrix, with $r_0 = r_d = 1$.
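A sketch of how one entry of a TT tensor is evaluated by multiplying the small core matrices (random cores; the helper names are ours):

```python
import numpy as np

rng = np.random.default_rng(3)
d, n, r = 5, 4, 3
# TT cores G_k of shape (r_{k-1}, n, r_k), with r_0 = r_d = 1
ranks = [1] + [r] * (d - 1) + [1]
cores = [rng.standard_normal((ranks[k], n, ranks[k + 1])) for k in range(d)]

def tt_entry(cores, idx):
    """A(i_1,...,i_d) = G_1(i_1) G_2(i_2) ... G_d(i_d): a product of small matrices."""
    v = np.ones((1, 1))
    for G, i in zip(cores, idx):
        v = v @ G[:, i, :]
    return v[0, 0]

def tt_full(cores):
    """Contract all cores into the full tensor (exponential cost; only for checking)."""
    A = cores[0]
    for G in cores[1:]:
        A = np.tensordot(A, G, axes=([-1], [0]))
    return A.reshape([G.shape[1] for G in cores])

A = tt_full(cores)
```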
19. Tensors and matrices
Rank-1 tensor:
$$A = u_1\otimes u_2\otimes\ldots\otimes u_d =: \bigotimes_{\mu=1}^{d}u_\mu,\qquad A_{i_1,\ldots,i_d} = (u_1)_{i_1}\cdots(u_d)_{i_d}.$$
A rank-1 tensor $A = u\otimes v$ is equivalent to the rank-1 matrix $A = uv^T$, where $u\in\mathbb{R}^n$, $v\in\mathbb{R}^m$; a rank-$k$ tensor $A = \sum_{i=1}^{k}u_i\otimes v_i$ corresponds to the matrix $A = \sum_{i=1}^{k}u_iv_i^T$.
The Kronecker product $A\otimes B\in\mathbb{R}^{nm\times nm}$ is a block matrix whose $ij$-th block is $[A_{ij}B]$.
20. Examples (B. Khoromskij's lecture)
Rank 1: $f = \exp(f_1(x_1)+\ldots+f_d(x_d)) = \prod_{j=1}^{d}\exp(f_j(x_j))$.
Rank 2: $f = \sin\bigl(\sum_{j=1}^{d}x_j\bigr)$, since
$$2i\,\sin\Bigl(\sum_{j=1}^{d}x_j\Bigr) = e^{\,i\sum_{j=1}^{d}x_j} - e^{-i\sum_{j=1}^{d}x_j}.$$
The rank-$d$ function $f(x_1,\ldots,x_d) = x_1+x_2+\ldots+x_d$ can be approximated by rank 2 with any prescribed accuracy:
$$f \approx \frac{\prod_{j=1}^{d}(1+\varepsilon x_j)}{\varepsilon} - \frac{\prod_{j=1}^{d}1}{\varepsilon} + O(\varepsilon),\quad\text{as }\varepsilon\to 0.$$
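The rank-2 approximation of $x_1+\ldots+x_d$ can be checked numerically (a small sketch; the error behaves like $O(\varepsilon)$):

```python
import numpy as np

rng = np.random.default_rng(4)
d = 10
x = rng.uniform(0, 1, size=d)
f_exact = x.sum()

errors = []
for eps in (1e-2, 1e-4):
    # difference of two rank-1 terms: (prod_j (1 + eps*x_j) - 1) / eps
    f_rank2 = (np.prod(1.0 + eps * x) - 1.0) / eps
    errors.append(abs(f_rank2 - f_exact))
# the error shrinks proportionally to eps
```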
24. Definitions of CP
Let $T := \bigotimes_{\mu=1}^{d}\mathbb{R}^{n_\mu}$ be the tensor product constructed from the vector spaces $(\mathbb{R}^{n_\mu},\langle\cdot,\cdot\rangle_{\mathbb{R}^{n_\mu}})$ ($d\ge 3$).
A tensor representation $U$ is a multilinear map $U : P\to T$, where the parametric space is $P = \times_{\nu=1}^{D}P_\nu$ ($d\le D$).
Further, $P_\nu$ depends on some representation rank parameter $r_\nu\in\mathbb{N}$.
A standard example of a tensor representation is the canonical tensor format.
(!!!) We distinguish between a tensor $v\in T$ and its tensor format representation $p\in P$, where $v = U(p)$.
25. r-terms, tensor rank, canonical tensor format
The set $\mathcal{R}_r$ of tensors which can be represented in $T$ with $r$ terms is defined as
$$\mathcal{R}_r(T) := \mathcal{R}_r := \Bigl\{\sum_{i=1}^{r}\bigotimes_{\mu=1}^{d}v_{i\mu}\in T : v_{i\mu}\in\mathbb{R}^{n_\mu}\Bigr\}. \qquad (10)$$
Let $v\in T$. The tensor rank of $v$ in $T$ is
$$\mathrm{rank}(v) := \min\{r\in\mathbb{N}_0 : v\in\mathcal{R}_r\}. \qquad (11)$$
Example: the Laplace operator in 3D,
$$\Delta_3 = \Delta_1\otimes I\otimes I + I\otimes\Delta_1\otimes I + I\otimes I\otimes\Delta_1.$$
26. Definitions of CP
The canonical tensor format is defined by the mapping
$$U_{cp} : \times_{\mu=1}^{d}\mathbb{R}^{n_\mu\times r}\to\mathcal{R}_r, \qquad (12)$$
$$\hat{v} := (v_{i\mu} : 1\le i\le r,\ 1\le\mu\le d)\ \mapsto\ U_{cp}(\hat{v}) := \sum_{i=1}^{r}\bigotimes_{\mu=1}^{d}v_{i\mu}.$$
27. Properties of CP
Let $r_1,r_2\in\mathbb{N}$, $u\in\mathcal{R}_{r_1}$ and $v\in\mathcal{R}_{r_2}$. We have:
(i) $\langle u,v\rangle_T = \sum_{j_1=1}^{r_1}\sum_{j_2=1}^{r_2}\prod_{\mu=1}^{d}\langle u_{j_1\mu},v_{j_2\mu}\rangle_{\mathbb{R}^{n_\mu}}$. The computational cost of $\langle u,v\rangle_T$ is $O\bigl(r_1r_2\sum_{\mu=1}^{d}n_\mu\bigr)$.
(ii) $u+v\in\mathcal{R}_{r_1+r_2}$.
(iii) The Hadamard product $u\odot v$ can be computed in the canonical tensor format with $r_1r_2\sum_{\mu=1}^{d}n_\mu$ arithmetic operations.
Let $R_1 = A_1B_1^T$ and $R_2 = A_2B_2^T$ be rank-$k$ matrices; then $R_1+R_2 = [A_1\ A_2][B_1\ B_2]^T$ is a rank-$2k$ matrix. Rank truncation!
31. Properties of the Hadamard product and the Fourier transform
Let $u = \sum_{j=1}^{k}\bigotimes_{i=1}^{d}u_{ji}$, $u_{ji}\in\mathbb{R}^n$. Then
$$\mathcal{F}^{(d)}(u) = \sum_{j=1}^{k}\bigotimes_{i=1}^{d}\mathcal{F}_i(u_{ji}),\quad\text{where }\ \mathcal{F}^{(d)} = \bigotimes_{i=1}^{d}\mathcal{F}_i. \qquad (13)$$
Let $S = AB^T = \sum_{i=0}^{k_1}a_ib_i^T\in\mathbb{R}^{n\times m}$ and $T = CD^T = \sum_{j=0}^{k_2}c_jd_j^T\in\mathbb{R}^{n\times m}$, where $a_i,c_j\in\mathbb{R}^n$, $b_i,d_j\in\mathbb{R}^m$, and $k_1,k_2\ll n,m$. Then
$$\mathcal{F}^{(2)}(S\circ T) = \sum_{i=0}^{k_1}\sum_{j=0}^{k_2}\mathcal{F}(a_i\circ c_j)\,\mathcal{F}(b_i\circ d_j)^T.$$
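Property (13) can be verified numerically on a small random CP tensor (a sketch; the full tensor is formed only to check the claim):

```python
import numpy as np
from functools import reduce

rng = np.random.default_rng(5)
d, n, r = 3, 8, 2
factors = [[rng.standard_normal(n) for _ in range(d)] for _ in range(r)]
outer = lambda fs: reduce(np.multiply, np.ix_(*fs))  # rank-1 tensor from d vectors

# Full tensor u = sum_j u_{j1} x ... x u_{jd} (only for the check)
U = sum(outer(fs) for fs in factors)

# Eq. (13): the d-dimensional FFT acts factor-wise on each rank-1 term
U_fft_cp = sum(outer([np.fft.fft(f) for f in fs]) for fs in factors)
```

The factor-wise FFT touches $r\,d$ vectors of length $n$ instead of all $n^d$ entries.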
32. Tensor format
$$A = \sum_{i_1=1}^{k_1}\cdots\sum_{i_d=1}^{k_d}c_{i_1,\ldots,i_d}\cdot u^1_{i_1}\otimes\ldots\otimes u^d_{i_d}, \qquad (14)$$
with core tensor $c\in\mathbb{R}^{k_1\times\ldots\times k_d}$ and rank $(k_1,\ldots,k_d)$.
Nonlinear fixed-rank approximation problem:
$$X = \operatorname{argmin}_{\mathrm{rank}(Y)\le(k_1,\ldots,k_d)}\|A - Y\|. \qquad (15)$$
- The problem is well-posed but not solved
- There are many local minima
- HOSVD (De Lathauwer et al.) yields a rank-$(k_1,\ldots,k_d)$ tensor $Y$ with $\|A - Y\| \le \sqrt{d}\,\|A - X\|$
- Reliable arithmetic, but exponential scaling of the core ($c\in\mathbb{R}^{k_1\times k_2\times\ldots\times k_d}$)
33. Example: canonical rank d, whereas TT rank 2
The $d$-Laplacian over a uniform tensor grid is known to have the Kronecker rank-$d$ representation
$$\Delta_d = A\otimes I_N\otimes\ldots\otimes I_N + I_N\otimes A\otimes\ldots\otimes I_N + \ldots + I_N\otimes I_N\otimes\ldots\otimes A \in\mathbb{R}^{N^d\times N^d}, \qquad (16)$$
with $A = \Delta_1 = \mathrm{tridiag}\{-1,2,-1\}\in\mathbb{R}^{N\times N}$ and $I_N$ the $N\times N$ identity. Notice that the canonical rank is $\mathrm{rank}_C(\Delta_d) = d$, while the TT rank of $\Delta_d$ equals 2 for any dimension, due to the explicit representation
$$\Delta_d = \bigl(\Delta_1\ \ I\bigr)\times\begin{pmatrix}I & 0\\ \Delta_1 & I\end{pmatrix}\times\cdots\times\begin{pmatrix}I & 0\\ \Delta_1 & I\end{pmatrix}\times\begin{pmatrix}I\\ \Delta_1\end{pmatrix}, \qquad (17)$$
where the rank-product operation "$\times$" is defined as the regular matrix product of the two corresponding core matrices, their blocks being multiplied by means of the tensor product.
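The canonical Kronecker-sum representation (16) can be built directly for small $d$ and $N$ (a sketch; the TT representation (17) is not constructed here):

```python
import numpy as np

def lap1d(N):
    """1D finite-difference Laplacian A = tridiag{-1, 2, -1}."""
    return 2.0 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)

def lap_kron(d, N):
    """d-dimensional Laplacian as the canonical rank-d Kronecker sum of eq. (16)."""
    A, I = lap1d(N), np.eye(N)
    total = np.zeros((N ** d, N ** d))
    for pos in range(d):       # one Kronecker (rank-1) term per dimension
        term = np.ones((1, 1))
        for q in range(d):
            term = np.kron(term, A if q == pos else I)
        total += term
    return total

L3 = lap_kron(3, 4)  # 64 x 64 matrix, a sum of d = 3 Kronecker terms
```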
34. Linear algebra in the CP format
Addition:
$$w = u + v = \sum_{j=1}^{r_u}\bigotimes_{\nu=1}^{d}u_j^{(\nu)} + \sum_{k=1}^{r_v}\bigotimes_{\mu=1}^{d}v_k^{(\mu)} = \sum_{j=1}^{r_u+r_v}\bigotimes_{\nu=1}^{d}w_j^{(\nu)},$$
where $w_j^{(\nu)} := u_j^{(\nu)}$ for $j\le r_u$ and $w_j^{(\nu)} := v_{j-r_u}^{(\nu)}$ for $r_u < j\le r_u+r_v$. Cost: $O(1)$.
The Hadamard product:
$$w = u\odot v = \sum_{j=1}^{r_u}\sum_{k=1}^{r_v}\bigotimes_{\nu=1}^{d}\bigl(u_j^{(\nu)}\odot v_k^{(\nu)}\bigr).$$
The new rank is generally $r_u\,r_v$, and the computational cost is $O(r_u\,r_v\,n\,d)$ arithmetic operations.
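A small numerical check of the CP addition and Hadamard product rules (the full tensors are formed only for verification):

```python
import numpy as np
from functools import reduce

rng = np.random.default_rng(6)
d, n, ru, rv = 3, 6, 2, 3
u_f = [[rng.standard_normal(n) for _ in range(d)] for _ in range(ru)]
v_f = [[rng.standard_normal(n) for _ in range(d)] for _ in range(rv)]
outer = lambda fs: reduce(np.multiply, np.ix_(*fs))

U = sum(outer(fs) for fs in u_f)
V = sum(outer(fs) for fs in v_f)

# Addition: concatenating the term lists costs O(1); rank becomes ru + rv
w_f = u_f + v_f

# Hadamard product: all pairwise mode-wise products; rank becomes ru * rv
h_f = [[uj * vk for uj, vk in zip(fu, fv)] for fu in u_f for fv in v_f]
```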
38. The Euclidean inner product
is computed as follows:
$$\langle u,v\rangle = \Bigl\langle\sum_{j=1}^{r_u}\bigotimes_{\nu=1}^{d}u_j^{(\nu)},\ \sum_{k=1}^{r_v}\bigotimes_{\nu=1}^{d}v_k^{(\nu)}\Bigr\rangle = \sum_{j=1}^{r_u}\sum_{k=1}^{r_v}\prod_{\nu=1}^{d}\bigl\langle u_j^{(\nu)},v_k^{(\nu)}\bigr\rangle.$$
The computational cost of the inner product is $O(r_u\,r_v\,n\,d)$.
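The inner-product formula can be checked on small random CP tensors (a sketch):

```python
import numpy as np
from functools import reduce
from math import prod

rng = np.random.default_rng(7)
d, n, ru, rv = 4, 5, 3, 2
u_f = [[rng.standard_normal(n) for _ in range(d)] for _ in range(ru)]
v_f = [[rng.standard_normal(n) for _ in range(d)] for _ in range(rv)]

# <u, v> = sum_{j,k} prod_nu <u_j^(nu), v_k^(nu)>, cost O(ru * rv * n * d)
ip_cp = sum(prod(float(uj @ vk) for uj, vk in zip(fu, fv))
            for fu in u_f for fv in v_f)

# Full tensors (exponential cost, only for verification)
outer = lambda fs: reduce(np.multiply, np.ix_(*fs))
U = sum(outer(fs) for fs in u_f)
V = sum(outer(fs) for fs in v_f)
```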
39. Advantages and disadvantages
Denote by $k$ the rank, $d$ the dimension, and $n$ the number of dofs in 1D:
1. CP: ill-posed approximation algorithm [V. de Silva, L.-H. Lim '08], storage $O(dnk)$, approximations hard to compute
2. Tucker: reliable arithmetic based on SVD, $O(dnk + k^d)$
3. Hierarchical Tucker: based on SVD, storage $O(dnk + dk^3)$, truncation $O(dnk^2 + dk^4)$
4. TT: based on SVD, $O(dnk^2)$ or $O(dnk^3)$, stable
5. Quantics-TT: $O(n^d)\to O(d\log_q n)$
40. How to compute the mean value in the CP format
Let $u = \sum_{j=1}^{r}\bigotimes_{\mu=1}^{d}u_{j\mu}\in\mathcal{R}_r$. Then the mean value $\bar{u}$ can be computed as a scalar product:
$$\bar{u} = \Bigl\langle\sum_{j=1}^{r}\bigotimes_{\mu=1}^{d}u_{j\mu},\ \bigotimes_{\mu=1}^{d}\frac{1}{n_\mu}\tilde{1}_\mu\Bigr\rangle = \sum_{j=1}^{r}\prod_{\mu=1}^{d}\frac{1}{n_\mu}\bigl\langle u_{j\mu},\tilde{1}_\mu\bigr\rangle = \sum_{j=1}^{r}\prod_{\mu=1}^{d}\frac{1}{n_\mu}\sum_{k=1}^{n_\mu}(u_{j\mu})_k, \qquad (18)$$
where $\tilde{1}_\mu := (1,\ldots,1)^T\in\mathbb{R}^{n_\mu}$.
The numerical cost is $O\bigl(r\sum_{\mu=1}^{d}n_\mu\bigr)$.
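Equation (18) can be checked numerically (a sketch; the full tensor is formed only for comparison):

```python
import numpy as np
from functools import reduce

rng = np.random.default_rng(8)
d, n, r = 4, 6, 3
factors = [[rng.standard_normal(n) for _ in range(d)] for _ in range(r)]

# Mean over all n^d entries: sum_j prod_mu (mean of the mu-th factor),
# cost O(r * d * n) instead of O(n^d)
mean_cp = sum(np.prod([f.mean() for f in fs]) for fs in factors)

outer = lambda fs: reduce(np.multiply, np.ix_(*fs))
U = sum(outer(fs) for fs in factors)
```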
41. How to compute the variance in the CP format
Let $u\in\mathcal{R}_r$ and
$$\tilde{u} := u - \bar{u}\bigotimes_{\mu=1}^{d}\tilde{1}_\mu = \sum_{j=1}^{r+1}\bigotimes_{\mu=1}^{d}\tilde{u}_{j\mu}\in\mathcal{R}_{r+1}. \qquad (20)$$
Then the variance $\mathrm{var}(u)$ of $u$ can be computed as follows:
$$\mathrm{var}(u) = \frac{\langle\tilde{u},\tilde{u}\rangle}{\prod_{\mu=1}^{d}n_\mu} = \frac{1}{\prod_{\mu=1}^{d}n_\mu}\Bigl\langle\sum_{i=1}^{r+1}\bigotimes_{\mu=1}^{d}\tilde{u}_{i\mu},\ \sum_{j=1}^{r+1}\bigotimes_{\nu=1}^{d}\tilde{u}_{j\nu}\Bigr\rangle = \sum_{i=1}^{r+1}\sum_{j=1}^{r+1}\prod_{\mu=1}^{d}\frac{1}{n_\mu}\bigl\langle\tilde{u}_{i\mu},\tilde{u}_{j\mu}\bigr\rangle.$$
The numerical cost is $O\bigl((r+1)^2\sum_{\mu=1}^{d}n_\mu\bigr)$.
S. Dolgov, B.N. Khoromskij, A. Litvinenko, H.G. Matthies, Computation of the response surface in the tensor train data format, arXiv:1406.2816, 2014.
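The variance formula can be checked the same way (a sketch; centering adds one rank-1 term to the CP representation):

```python
import numpy as np
from functools import reduce

rng = np.random.default_rng(9)
d, n, r = 3, 5, 2
factors = [[rng.standard_normal(n) for _ in range(d)] for _ in range(r)]
outer = lambda fs: reduce(np.multiply, np.ix_(*fs))
U = sum(outer(fs) for fs in factors)

# Mean value, as on the previous slide
mean_cp = sum(np.prod([f.mean() for f in fs]) for fs in factors)

# Centered tensor in CP form: append the rank-1 term -mean * 1 x ... x 1
ones = [np.ones(n) for _ in range(d)]
centered = factors + [[-mean_cp * ones[0]] + ones[1:]]

# var(u) = <u_tilde, u_tilde> / n^d via pairwise 1D inner products
var_cp = sum(np.prod([float(a @ b) / n for a, b in zip(fa, fb)])
             for fa in centered for fb in centered)
```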
42. Conclusion
We discussed:
- Motivation: why we need low-rank tensors
- Tensors of the second order (matrices)
- CP, Tucker and tensor train tensor formats
- Many classical kernels have (or can be approximated in) a low-rank tensor format
- Post-processing: computation of the mean, variance, level sets, frequency
43. Tensor software
Ivan Oseledets et al., Tensor Train toolbox (Matlab): http://spring.inm.ras.ru/osel
D. Kressner, C. Tobler, Hierarchical Tucker Toolbox (Matlab): http://www.sam.math.ethz.ch/NLAgroup/htucker_toolbox.html
M. Espig et al., Tensor Calculus library (C): http://gitorious.org/tensorcalculus
45. Two types of tensors
1. Function-related and 2. data-related.
Function-related tensors we considered earlier:
Ex1: a d-dimensional function $f(x_1,\ldots,x_d) = \sin(x_1+\ldots+x_d)$ discretised on an axis-parallel grid.
Ex2: the solution of a stochastic PDE.
These tensors are usually given implicitly.
Data-related tensors:
Ex1: user $i_1$ cited work $i_2$ from author $i_3$ published in year $i_4$.
Ex2: a tensor of size $400\times 480\times 360\times 3$, containing 400 CT images of $480\times 360$ pixels and 3 colours.
Ex3: a disease forecast depending on temperature $i_1$, blood pressure $i_2$, and other blood parameters $i_3,i_4,i_5$.
These tensors are given explicitly.
46. What is needed for tensor approximation?
1. Does a low-rank tensor approximation always exist?
Yes, one always exists; the question is only what the rank
(or ranks) is. In the worst case the rank is huge, of order O(n) (or
O(N)).
2. What is needed for a low-rank tensor approximation?
For function-related tensors:
1. Decay of the eigenvalues is crucial.
2. Low-rank tensor approximability and function separability
are strongly connected.
3. Smoothness is not necessary; piecewise smoothness can be
enough.
4. Smoothness is not sufficient.
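The separability point can be seen already for d = 2 by sampling a function on a grid and inspecting the singular values of the resulting matrix. A small NumPy sketch (the function choices and tolerance are mine): $e^{x+y} = e^x e^y$ is separable, so its samples form an exactly rank-one matrix, while $|x-y|$ is not separable:

```python
import numpy as np

x = np.linspace(0.0, 1.0, 50)
A = np.exp(x[:, None] + x[None, :])      # exp(x+y) = exp(x)*exp(y): separable
B = np.abs(x[:, None] - x[None, :])      # |x-y|: not separable

sA = np.linalg.svd(A, compute_uv=False)
sB = np.linalg.svd(B, compute_uv=False)
rankA = int(np.sum(sA > 1e-10 * sA[0]))  # numerical rank at relative tol 1e-10
rankB = int(np.sum(sB > 1e-10 * sB[0]))
assert rankA == 1 and rankB > 1
```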
39 / 52
47. How to compute tensor decomposition?
This is not so easy (we give only a short list):
1. Factors of the CP decomposition: a minimization problem is
solved by a quasi-Newton method or ALS (dissertation of M.
Espig, Leipzig 2007)
2. Factors of Tucker decomp.: HOSVD (De Lathauwer,...)
3. TT, Hierarchical Tucker: SVD, QR (Kressner, Tobler,
Hackbusch, Ballani, Grasedyck,...), Cauchy integral formula,
sinc quadrature (Boris and Venera Khoromskij,...)
4. (adaptive) cross methods (Dolgov, Oseledets, Bebendorf,...)
5. Successive rank-1 approximation(A. Nouy)
6. Randomized methods
See more in the tensor book of W. Hackbusch '13, the two books of Boris and Venera Khoromskij, the book chapters of A. Nouy and I. Oseledets, the two tensor overview papers by B. Khoromskij '11 and by Kressner/Grasedyck/Tobler '13, and the tensor dissertations at
https://www.mis.mpg.de/scicomp/phdthesis.de.html
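As an illustration of item 2, a minimal HOSVD sketch in NumPy (sizes, seed, and truncation tolerance are my choices; a mode-by-mode SVD of the unfoldings yields the Tucker factors):

```python
import numpy as np

rng = np.random.default_rng(1)
# A tensor of exact multilinear rank (2,2,2): random core times random factors
G = rng.standard_normal((2, 2, 2))
Us = [rng.standard_normal((n, 2)) for n in (6, 7, 8)]
A = np.einsum('abc,ia,jb,kc->ijk', G, *Us)

# HOSVD: the left singular vectors of each mode-m unfolding give factor m
factors = []
for m in range(3):
    unfold = np.moveaxis(A, m, 0).reshape(A.shape[m], -1)
    U, s, _ = np.linalg.svd(unfold, full_matrices=False)
    rk = int(np.sum(s > 1e-8 * s[0]))        # numerical multilinear rank
    factors.append(U[:, :rk])

# Core tensor and reconstruction
core = np.einsum('ijk,ia,jb,kc->abc', A, *factors)
A_hat = np.einsum('abc,ia,jb,kc->ijk', core, *factors)
err = np.linalg.norm(A_hat - A) / np.linalg.norm(A)
assert err < 1e-8
```

Since A has exact multilinear rank (2,2,2), the HOSVD reconstruction is exact up to roundoff; for general tensors it gives a quasi-optimal truncation.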
40 / 52
48. Practical exercises Ex1:
Implement in Matlab.
Generate two tensors u and v of order d in the CP tensor
format with tensor ranks $r_u = 3$ and $r_v = 4$, respectively.
1. Compute u + v. What is the rank of this sum?
2. Compute the scalar product $\langle u, v\rangle$.
3. Apply the d-dimensional Fast Fourier transform (FFT) $\mathcal{F}^d$ to u, i.e.,
$w = \mathcal{F}^d u$.
4. Generate a full d-dimensional tensor $u \in \mathbb{R}^{n\times\ldots\times n}$ (d times),
$n = 2^M$, and apply the FFT, i.e., $\mathcal{F}^d(u)$. Measure the computing
time and the needed memory. Now assume
$u \approx \sum_{i=1}^{r}\bigotimes_{\mu=1}^{d} u_{i\mu}$. Apply $\mathcal{F}^d$ to the CP representation, i.e.,
$\mathcal{F}^d\big(\sum_{i=1}^{r}\bigotimes_{\mu=1}^{d} u_{i\mu}\big)$. Measure and compare again the
computational time and the memory requirement. Play with
different M, d, and r.
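A possible solution sketch for items 1–3 in NumPy (the exercise asks for Matlab; the idea carries over directly, and the sizes here are mine). Addition concatenates factors, the inner product reduces to factor-wise inner products, and $\mathcal{F}^d$ acts factor-wise:

```python
import numpy as np

rng = np.random.default_rng(2)
d, n, ru, rv = 3, 8, 3, 4
U = [rng.standard_normal((ru, n)) for _ in range(d)]   # CP factors of u
V = [rng.standard_normal((rv, n)) for _ in range(d)]   # CP factors of v

# 1. u + v in CP format: stack the factors; the representation rank is ru + rv
W = [np.vstack([U[m], V[m]]) for m in range(d)]

# 2. <u, v> = sum_{i,j} prod_m <u_{i,m}, v_{j,m}>
G = np.ones((ru, rv))
for m in range(d):
    G *= U[m] @ V[m].T
dot_cp = G.sum()

# 3. F^d applied factor-wise: F^d(sum_i (x) u_{i,m}) = sum_i (x) F(u_{i,m})
FU = [np.fft.fft(U[m], axis=1) for m in range(d)]

# Reference checks against full tensors (small sizes only!)
full = lambda F: sum(np.einsum('i,j,k->ijk', F[0][t], F[1][t], F[2][t])
                     for t in range(F[0].shape[0]))
assert np.allclose(full(W), full(U) + full(V))
assert np.isclose(dot_cp, np.vdot(full(U), full(V)))
assert np.allclose(np.fft.fftn(full(U)), full(FU))
```

The factor-wise FFT costs $O(r\,d\,n\log n)$ instead of $O(n^d \log n^d)$ for the full tensor, which is the point of item 4.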
41 / 52
49. Practical exercise Ex2:
Prove numerically that for the Laplace operator, discretised
with finite differences on a 3D axis-parallel grid on $[0,1]^3$ with
step size h, it holds that
$$\Delta_3 = \Delta_1 \otimes I \otimes I + I \otimes \Delta_1 \otimes I + I \otimes I \otimes \Delta_1,$$
where $\Delta_1$ is the discretised Laplace operator in 1D.
Hint:
1. You may use this Matlab code
https://www.mathworks.com/matlabcentral/fileexchange/27279-laplacian-in-1d-2d-or-3d
to generate $\Delta_3$.
2. Use the Matlab functions kron() and eye() to compute $\Delta_1 \otimes I \otimes I$.
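A NumPy sketch of the numerical check (the exercise suggests Matlab's kron() and eye(); this uses their NumPy counterparts, with my choice of grid size and Dirichlet boundary conditions):

```python
import numpy as np

n = 10                                   # interior grid points per direction
h = 1.0 / (n + 1)
I = np.eye(n)
# 1D finite-difference Laplacian (Dirichlet): tridiag(-1, 2, -1) / h^2
D1 = (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2

# Kronecker assembly of the 3D Laplacian
D3 = (np.kron(np.kron(D1, I), I)
      + np.kron(np.kron(I, D1), I)
      + np.kron(np.kron(I, I), D1))

# Check against applying D1 along each mode of a random grid function
u = np.random.default_rng(3).standard_normal((n, n, n))
Au = (np.einsum('ab,bjk->ajk', D1, u)    # mode 1
      + np.einsum('ab,ibk->iak', D1, u)  # mode 2
      + np.einsum('ab,ijb->ija', D1, u)) # mode 3
assert np.allclose(D3 @ u.ravel(), Au.ravel())
```

With C-ordered vectorisation, $\Delta_1 \otimes I \otimes I$ acts on the first grid index, matching the mode-1 contraction above.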
42 / 52
50. Practical exercise Ex3:
Let C and D be two square matrices of size n × n and m × m, respectively. Let
the eigenvalues of C be $\mu_1,\ldots,\mu_n$ and the eigenvalues of D be
$\lambda_1,\ldots,\lambda_m$. What are the eigenvalues of the matrix $C \otimes D$?
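One way to test a conjectured answer numerically: check whether the spectrum of $C \otimes D$ coincides with the set of pairwise products $\{\mu_i\lambda_j\}$. A NumPy sketch (symmetric matrices chosen so that all eigenvalues are real; sizes and seed are mine):

```python
import numpy as np

rng = np.random.default_rng(4)
C = rng.standard_normal((3, 3)); C = C + C.T   # symmetrise: real spectrum
D = rng.standard_normal((4, 4)); D = D + D.T

mu = np.linalg.eigvalsh(C)
lam = np.linalg.eigvalsh(D)
eig_kron = np.linalg.eigvalsh(np.kron(C, D))

# Conjecture: eig(C (x) D) = { mu_i * lam_j }
products = np.sort(np.outer(mu, lam).ravel())
assert np.allclose(eig_kron, products)
```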
43 / 52
51. Practical exercise Ex4:
Let $\mathrm{cov}(x,y) = \exp(-|x-y|^2)$, where
$x = (x_1,\ldots,x_d)$, $y = (y_1,\ldots,y_d) \in \mathcal{D} \subset \mathbb{R}^d$, $d = 3$. Then
$$\mathrm{cov}(x,y) = \exp(-|x_1-y_1|^2)\cdot\exp(-|x_2-y_2|^2)\cdot\exp(-|x_3-y_3|^2),$$
and the corresponding covariance matrix factorises as $C = C_1 \otimes \ldots \otimes C_d$.
Assume that the d Cholesky decompositions exist, i.e., $C_i = L_i \cdot L_i^T$,
$i = 1,\ldots,d$. Use properties of the Kronecker tensor product to
compute the factorisation
$$C_1 \otimes \ldots \otimes C_d =: L \cdot L^T$$
in terms of the factors $L_i$, i.e., compute L and $L^T$.
Are L and $L^T$ also lower- and upper-triangular matrices?
Generate all needed intermediate matrices and visualise L in
Matlab.
Show that the computational complexity is reduced from
$O(N\log N)$, $N = n^d$, to $O(dn\log n)$.
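A NumPy sketch of the construction (the exercise asks for Matlab; grid size, a small diagonal jitter for numerical stability, and the use of the mixed-product rule $(A\otimes B)(C\otimes D) = AC \otimes BD$ are my choices):

```python
import numpy as np

n, d = 5, 3
x = np.linspace(0.0, 1.0, n)
# 1D Gaussian covariance factor; jitter keeps the Cholesky factor well defined
C1 = np.exp(-(x[:, None] - x[None, :])**2) + 1e-8 * np.eye(n)
Cs = [C1.copy() for _ in range(d)]
Ls = [np.linalg.cholesky(C) for C in Cs]        # C_i = L_i L_i^T, L_i lower

# Mixed-product rule: (L1 (x) L2 (x) L3)(L1 (x) L2 (x) L3)^T = C1 (x) C2 (x) C3
L = np.kron(np.kron(Ls[0], Ls[1]), Ls[2])
C = np.kron(np.kron(Cs[0], Cs[1]), Cs[2])
assert np.allclose(L @ L.T, C)
# A Kronecker product of lower-triangular matrices is again lower-triangular
assert np.allclose(L, np.tril(L))
```

Only the d small factors $L_i$ ever need to be computed and stored; forming the large L above is done here purely for verification.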
44 / 52
52. Practical exercise Ex5:
Assume that the inverse matrices $C_i^{-1}$, $i = 1,\ldots,d$, exist. Use
properties of the Kronecker tensor product to compute
$$(C_1 \otimes \ldots \otimes C_d)^{-1} = \;? \qquad (21)$$
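A numerical check of the natural conjecture, namely that the inverse is the Kronecker product of the inverses (a NumPy sketch; the diagonal shift just keeps the random factors well conditioned):

```python
import numpy as np

rng = np.random.default_rng(5)
# Small, well-conditioned square factors C_1, C_2, C_3
Cs = [rng.standard_normal((k, k)) + 4.0 * np.eye(k) for k in (2, 3, 4)]

K = np.kron(np.kron(Cs[0], Cs[1]), Cs[2])
invs = [np.linalg.inv(C) for C in Cs]
Kinv = np.kron(np.kron(invs[0], invs[1]), invs[2])

# Conjecture (21): (C1 (x) C2 (x) C3)^{-1} = C1^{-1} (x) C2^{-1} (x) C3^{-1}
assert np.allclose(K @ Kinv, np.eye(K.shape[0]))
```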
45 / 52
53. Practical exercise Ex6:
Let $C \approx \tilde{C} = \sum_{i=1}^{r}\bigotimes_{\mu=1}^{d} C_{i\mu}$; then
$$\mathrm{diag}(\tilde{C}) = \mathrm{diag}\Big(\sum_{i=1}^{r}\bigotimes_{\mu=1}^{d} C_{i\mu}\Big) = \sum_{i=1}^{r}\bigotimes_{\mu=1}^{d}\mathrm{diag}\big(C_{i\mu}\big), \qquad (22)$$
$$\mathrm{trace}(\tilde{C}) = \mathrm{trace}\Big(\sum_{i=1}^{r}\bigotimes_{\mu=1}^{d} C_{i\mu}\Big) = \sum_{i=1}^{r}\prod_{\mu=1}^{d}\mathrm{trace}(C_{i\mu}). \qquad (23)$$
For determinants,
$$\det(C_1 \otimes C_2) = \det(C_1)^{n_2}\cdot\det(C_2)^{n_1},$$
$$\log\det(C_1 \otimes C_2) = \log\big(\det(C_1)^{n_2}\cdot\det(C_2)^{n_1}\big) = n_2\log\det C_1 + n_1\log\det C_2,$$
$$\log\det(C_1 \otimes C_2 \otimes C_3) = n_2 n_3\log\det C_1 + n_1 n_3\log\det C_2 + n_1 n_2\log\det C_3.$$
46 / 52
54. Practical exercise Ex7:
Let us solve a 2D Poisson problem with uncertain coefficients,
uncertain boundary conditions and uncertain right-hand side.
Use the following commands:
mkdir foo
cd foo
git clone https://github.com/ezander/sglib.git
git clone https://github.com/ezander/sglib-testing.git
Start Matlab, then open and run the two startup scripts
sglib/startup.m
sglib-testing/startup.m
Then open and run the script
sglib-testing/demo/old/eigel/sample_solve_spde.m
Read on below...
47 / 52
55. This code will solve a second-order elliptic PDE with uncertain
coefficients; the settings are listed below (you can play with them!)
2D L-shape domain, N = 557.
KLE terms for $q(x,\omega) = e^{\kappa(x,\omega)}$: $l_k = 10$,
stoch. dim. $m_k = 10$ and $p_k = 2$,
shifted lognormal distribution for $\kappa(x,\omega)$;
$\mathrm{cov}_\kappa(x,y)$ is of Gaussian type, $\ell_x = \ell_y = 0.3$.
RHS: $l_f = 10$, $m_f = 10$, $p_f = 2$ and Beta(4,2) distribution for the RVs;
$\mathrm{cov}_f(x,y)$ is of Gaussian type, $\ell_x = \ell_y = 0.6$.
Total stoch. dim. $m_u = m_k + m_f = 20$, $|\mathcal{J}| = 231$.
The solution will be the tensor
$$u = \sum_{j=1}^{231}\bigotimes_{\mu=1}^{21} u_{j\mu} \in \mathbb{R}^{557} \otimes \bigotimes_{\mu=1}^{20}\mathbb{R}^{3}.$$
48 / 52
56. Interesting questions to ask
1. How difficult is it to convert one tensor representation into
another?
2. How should one truncate $\mathcal{T}(\mathcal{T}(u_1 + u_2) + u_3)$ or $\mathcal{T}(u_1 + u_2 + u_3)$?
3. Which tensor representation (tensor format) is the best?
4. Does a low-rank representation not exist, or can I just not find
it?
5. Sparse grids, low-rank tensors, or Monte Carlo?
6. Does the notion of tensor eigenvalues exist?
7. What are other differences from usual matrices?
49 / 52
57. What we did not discuss in these lectures
Algorithms for basic operations with low-rank matrices and
tensors: randomized compression, alternating optimization,
Riemannian optimization, nuclear norm minimization, adaptive
cross approximation and variants, functions of tensors.
Further interesting and promising applications: image
processing, matrix and tensor completion, model reduction,
solution of large- and extreme-scale linear algebra problems
from various applications (dynamics and control, quantum
computing, ...), tensors in deep learning.
See all these topics in the lectures of Prof. D. Kressner (EPFL):
https://www5.in.tum.de/wiki/index.php/Low_Rank_Approximation
50 / 52
58. Literature
1. M. Espig, Dissertation, Leipzig 2008.
2. M. Espig, W. Hackbusch, A regularized Newton method for the
efficient approximation of tensors represented in the canonical
tensor format, MPI Leipzig 2010
3. H.G. Matthies, Uncertainty Quantification with Stochastic
Finite Elements, Encyclopedia of Computational Mechanics,
Wiley, 2007.
4. B.N. Khoromskij, A. Litvinenko, H.G. Matthies, Application of
hierarchical matrices for computing the Karhunen-Loève
expansion, Computing 84 (1-2), 49-67, 2009
5. S. Dolgov, B.N. Khoromskij, A. Litvinenko, H.G. Matthies,
Polynomial Chaos Expansion of Random Coefficients and the
Solution of Stochastic Partial Differential Equations in the
Tensor Train Format, SIAM/ASA J. Uncertainty Quantification 3
(1), 1109-1135, 2015
51 / 52
59. Literature
6. S. Dolgov, A. Litvinenko, D. Liu, Kriging in tensor train data
format, Conf. Proceedings, 3rd International Conference on
Uncertainty Quantification in CSE, pp 309-329,
https://doi.org/10.7712/120219.6343.18651, 2019
7. A. Litvinenko, D. Keyes, V. Khoromskaia, B.N. Khoromskij, H.G.
Matthies, Tucker tensor analysis of Matérn functions in
spatial statistics, Computational Methods in Applied
Mathematics, 19 (1), 101-122, 2019
8. A. Litvinenko, H.G. Matthies, Inverse problems and
uncertainty quantification arXiv preprint:1312.5048, 2013
52 / 52