2019 Fall Series: Postdoc Seminars - Special Guest Lecture, Attacking the Curse of Dimensionality Using Sums of Separable Functions - Martin Mohlenkamp, September 11, 2019
Naive computations involving a function of many variables suffer from the curse of dimensionality: the computational cost grows exponentially with the number of variables. One approach to bypassing the curse is to approximate the function as a sum of products of functions of one variable and compute in this format. When the variables are indices, a function of many variables is called a tensor, and this approach is to approximate and use the tensor in the (so-called) canonical tensor format. In this talk I will describe how such approximations can be used in numerical analysis and in machine learning.
1. Attacking the Curse of Dimensionality
using Sums of Separable Functions
Martin J. Mohlenkamp
Department of Mathematics
http://www.ohiouniversityfaculty.com/mohlenka/
SAMSI, September 2019
2. Abstract
Naive computations involving a function of many variables suffer from the curse of dimensionality: the computational cost grows exponentially with the number of variables. One approach to bypassing the curse is to approximate the function as a sum of products of functions of one variable and compute in this format. When the variables are indices, a function of many variables is called a tensor, and this approach is to approximate and use the tensor in the (so-called) canonical tensor format. In this talk I will describe how such approximations can be used in numerical analysis and in machine learning.
3. Goals of this Talk
Show you a tool that you may find useful.
Hint at other things I know that you may find useful.
Not Goals
Convince you that this tool is better than other methods.
Show that I am great.
4. The Curse of Dimensionality (discrete setting)
d     Name     Notation               Storage
1     Vector   v_j                    $
2     Matrix   A_{jk}                 $$
3     Tensor   T_{jkm}                $$$
>3    Tensor   T(j_1, ..., j_d)       $^d
(The slide's "Visual" column showed pictures for d ≤ 3 and a "?" for d > 3.)
The cost to do anything, even store the object,
grows exponentially in the dimension d.
5. The Curse of Dimensionality (function setting)
To approximate a function $f(x_1, x_2, \ldots, x_d)$ that has smoothness $p$ to accuracy $\epsilon$ costs
$$\epsilon^{-d/p} = \left(\epsilon^{-1/p}\right)^d = \$^d.$$
This curse is unavoidable for general function spaces (smoothness classes).
If a method seems to avoid it, look for
“constants” that grow exponentially in d,
inductive proofs that require d! terms, and
assumptions that imply a vanishing set of functions as d increases.
(Exercise: Think about how this applies to Monte Carlo methods.)
6. Philosophy
Naturally occurring functions of many variables are not general.
If a method can match what really occurs in some application,
then it can avoid the curse.
Non-trivial, non-circular characterizations of the set of functions that a
given method can match are hard. (I know of none.)
Instead we start from inspiration:
Neural networks are inspired by the visual cortex of cats.
The following method is inspired by partial differential equations in
physics (e.g. heat flow).
7. Approximation by Sums of Separable Tensors/Functions
In dimension d, a rank r approximation of a tensor T is
$$T(j_1, j_2, \ldots, j_d) \approx G(j_1, \ldots, j_d) = \sum_{l=1}^{r} \prod_{i=1}^{d} G_i^l(j_i),$$
or equivalently
$$T \approx G = \sum_{l=1}^{r} G^l = \sum_{l=1}^{r} \bigotimes_{i=1}^{d} G_i^l.$$
Instead of $^d, storage is r·d·$, which is no longer exponential.
To do functions, just change notation:
$$f(x_1, x_2, \ldots, x_d) \approx g(x_1, \ldots, x_d) = \sum_{l=1}^{r} \prod_{i=1}^{d} g_i^l(x_i).$$
With large enough r this can approximate anything within $\epsilon$.
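To make the format concrete, here is a minimal numpy sketch (my illustration, not code from the talk; the name `cp_to_full` is made up) that expands a tensor stored in canonical format and compares the storage counts:

```python
import numpy as np

def cp_to_full(factors):
    """Expand a canonical-format tensor, given as a list of d factor
    matrices G_i of shape (n, r), into the full d-way array
    T(j_1, ..., j_d) = sum_l prod_i G_i[j_i, l].
    For illustration only: the point of the format is to never do this."""
    T = np.zeros(tuple(F.shape[0] for F in factors))
    for l in range(factors[0].shape[1]):
        term = factors[0][:, l]
        for F in factors[1:]:
            term = np.multiply.outer(term, F[:, l])  # build the rank-1 term
        T += term
    return T

d, n, r = 5, 10, 3
factors = [np.random.rand(n, r) for _ in range(d)]
print("separable storage:", r * d * n)  # r·d·$
print("full storage:     ", n ** d)     # $^d
assert cp_to_full(factors).shape == (n,) * d
```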
8. Basic Computational Paradigm
1. Start with operators/matrices and functions/vectors that can be represented within $\epsilon$ with low rank.
2. Do linear algebra operations with them, e.g. (see the sketch after this list)
$$\tilde{g} = Lg = \sum_{l=1}^{r} \sum_{m=1}^{r_1} \prod_{i=1}^{d} \left(L_i^l\, g_i^m\right)(x_i).$$
The computational cost is $\mathcal{O}(d \cdot r \cdot r_1)$, which is linear in d rather than exponential.
3. Adaptively re-minimize the rank of the output of each operation, controlling the approximation error.
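As a sketch of step 2 (again my illustration, under the assumption that the operator is itself stored as a sum of separable terms): applying the operator touches each direction independently, so the cost is linear in d, while the output rank multiplies.

```python
import numpy as np

def apply_separable_op(op_terms, g_factors):
    """Apply L = sum_{l=1}^{r} L_1^l x ... x L_d^l to g of rank r1 in
    canonical format.  op_terms: list of r lists of d matrices (n x n);
    g_factors: list of d factor matrices (n x r1).
    Returns the factors of Lg, of rank r*r1; the cost is linear in d."""
    d = len(g_factors)
    out = [[] for _ in range(d)]
    for term in op_terms:            # operator terms l = 1..r
        for i in range(d):           # act on each direction separately
            out[i].append(term[i] @ g_factors[i])
    # stacking columns realizes the double sum over (l, m)
    return [np.hstack(cols) for cols in out]

d, n, r, r1 = 4, 8, 2, 3
L = [[np.random.rand(n, n) for _ in range(d)] for _ in range(r)]
g = [np.random.rand(n, r1) for _ in range(d)]
print([F.shape for F in apply_separable_op(L, g)])  # four factors, each (8, 6)
```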
9. Example: Power Method
The slide shows a flow diagram: apply $L$ (rank $r$) to $g_1$ (rank $r_1$), multiplying to get $\tilde{g}$ of rank $r \cdot r_1$; reduce the rank $r \cdot r_1 \to r_2$ to obtain $g_2$; apply $L$ (rank $r$) to $g_2$ (rank $r_2$); and so on.
10. Reducing the Rank
We wish to (well) approximate
$$\tilde{g} = \sum_{m=1}^{R} \prod_{i=1}^{d} \tilde{g}_i^m \quad\text{by}\quad g = \sum_{l=1}^{r} \prod_{i=1}^{d} g_i^l,$$
with r small(er).
This is NP-hard, but we can try optimization algorithms:
From an initial g, iteratively modify $\{g_i^l\}$ to reduce the error $\|\tilde{g} - g\|_2^2$.
You can try your favorite generic method:
Newton’s method and variations
gradient descent and variations
GMRES, BFGS, other acronyms
etc.
Often any method will do, but sometimes all of them struggle.
(I have worked years on challenges with this optimization problem.)
11. Alternating Least Squares (ALS)
This optimization problem has a multilinear structure we can use.
Loop until the error is small enough or r seems insufficient:
    Loop through the directions k = 1, . . . , d.
    Fix $\{g_i^l\}$ for $i \neq k$, and solve a linear least-squares problem for new $g_k^l$.
The normal equations are
$$\begin{pmatrix} \prod_{i\neq k}\langle g_i^1, g_i^1\rangle & \cdots & \prod_{i\neq k}\langle g_i^1, g_i^r\rangle \\ \vdots & \ddots & \vdots \\ \prod_{i\neq k}\langle g_i^r, g_i^1\rangle & \cdots & \prod_{i\neq k}\langle g_i^r, g_i^r\rangle \end{pmatrix} \begin{pmatrix} g_k^1 \\ \vdots \\ g_k^r \end{pmatrix} = \begin{pmatrix} \sum_{q=1}^{R} \tilde{g}_k^q \prod_{i\neq k}\langle g_i^1, \tilde{g}_i^q\rangle \\ \vdots \\ \sum_{q=1}^{R} \tilde{g}_k^q \prod_{i\neq k}\langle g_i^r, \tilde{g}_i^q\rangle \end{pmatrix}.$$
ALS is old, simple, stepwise robust, adaptable, and widely used,
but does not make the underlying optimization problem any easier.
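A minimal numpy sketch of this ALS sweep for the rank-reduction problem of the previous slide (my illustration: no regularization, restarts, or stopping tests, all of which matter in practice):

```python
import numpy as np

def als_reduce(T_factors, r, n_passes=50):
    """Approximate a canonical-format tensor of rank R (T_factors: list
    of d matrices, each n x R) by one of rank r, sweeping the directions
    and solving the normal equations above in each direction k.
    Assumes the Gram matrix stays invertible."""
    G = [np.random.rand(F.shape[0], r) for F in T_factors]
    for _ in range(n_passes):
        for k in range(len(T_factors)):
            A = np.ones((r, r))                      # prod_{i!=k} <g_i^l, g_i^m>
            B = np.ones((r, T_factors[0].shape[1]))  # prod_{i!=k} <g_i^l, ~g_i^q>
            for i, (Gi, Ti) in enumerate(zip(G, T_factors)):
                if i != k:
                    A *= Gi.T @ Gi
                    B *= Gi.T @ Ti
            # normal equations: A @ (new g_k values).T = B @ T_k.T
            G[k] = np.linalg.solve(A, B @ T_factors[k].T).T
    return G
```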
12. Extended Computational Paradigm
(developed mainly for quantum mechanics)
Some symmetries can be enforced implicitly in the inner product.
Example: The antisymmetrizer $\mathcal{A}$ creates the beast
$$\mathcal{A} \prod_{i=1}^{N} \phi_i(\gamma_i) = \frac{1}{N!} \begin{vmatrix} \phi_1(\gamma_1) & \phi_1(\gamma_2) & \cdots & \phi_1(\gamma_N) \\ \phi_2(\gamma_1) & \phi_2(\gamma_2) & \cdots & \phi_2(\gamma_N) \\ \vdots & \vdots & & \vdots \\ \phi_N(\gamma_1) & \phi_N(\gamma_2) & \cdots & \phi_N(\gamma_N) \end{vmatrix},$$
but inner products with it are computed simply as
$$\left\langle \mathcal{A}\prod_i \tilde{\phi}_i,\ \mathcal{A}\prod_i \phi_i \right\rangle = \frac{|L|}{N!} \quad\text{with}\quad L(i,j) = \langle \tilde{\phi}_i, \phi_j \rangle.$$
13. Extended Computational Paradigm
If L does not have low rank but $\langle Lg_1, g \rangle$ is computable, then you cannot use the basic paradigm
$$g_1 \xrightarrow{\text{apply } L} Lg_1 = \tilde{g} \xrightarrow{\text{reduce rank}} g_2,$$
but you can sometimes still run ALS to form g.
Example: the electron-electron interaction (multiplication) operator
$$W = \frac{1}{2} \sum_{i=1}^{N} \sum_{j \neq i} \frac{1}{\|r_i - r_j\|}$$
cannot be written with small r, but
$$\left\langle \mathcal{A} W \prod_i \tilde{\phi}_i,\ \mathcal{A} \prod_i \phi_i \right\rangle$$
is computable (formula suppressed).
14. Extended Computational Paradigm
If you know why your function cannot be written with small r, you might be able to extend the sum-of-separable format.
Example: To capture the interelectron cusp, we can use
$$\mathcal{A} \sum_{p=0}^{P} \left( \frac{1}{2} \sum_{m \neq n} w_p(|\gamma_m - \gamma_n|) \right) \sum_{q=1}^{r_p} \prod_{i=1}^{N} \phi_i^{p,q}(\gamma_i).$$
Example: To scale to large systems (composed of subsystems) we can use
$$\mathcal{A} \sum_{q=1}^{r} \prod_{k=1}^{K} \left( \sum_{q_k=1}^{r_k} \prod_{i_k=1}^{N_k} \phi_{k,i_k}^{q,q_k}(\gamma_{k,i_k}) \right).$$
15. Conclusions, Part I
Sums of separable functions give a tractable way to represent (some)
functions of many variables.
You can compute with them, to solve PDEs etc.
There are various extensions.
(There are difficulties too, which I skip.)
16. Multivariate Regression
Beginning with scattered data in high dimensions
$$\mathcal{D} = \left\{ (\mathbf{x}^j, y^j) = (x_1^j, \cdots, x_d^j;\ y^j) \right\}_{j=1}^{N},$$
define an empirical inner product between functions
$$\langle f, g \rangle = \sum_{j=1}^{N} f(\mathbf{x}^j)\, g(\mathbf{x}^j),$$
which also works between a function and our data,
$$\left\langle \{(\mathbf{x}^j, y^j)\}_{j=1}^{N},\ g \right\rangle = \sum_{j=1}^{N} y^j g(\mathbf{x}^j).$$
The (empirical) least-squares error is then
$$\left\| \{(\mathbf{x}^j, y^j)\}_{j=1}^{N} - g \right\|^2 = \sum_{j=1}^{N} \left( y^j - g(\mathbf{x}^j) \right)^2.$$
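In code the empirical inner product is a one-liner (a sketch; here `X` holds the points $\mathbf{x}^j$ as rows and `f`, `g` act on such arrays):

```python
import numpy as np

def empirical_inner(f, g, X):
    """<f, g> = sum_j f(x^j) g(x^j), with X of shape (N, d)."""
    return np.sum(f(X) * g(X))

def data_inner(y, g, X):
    """<{(x^j, y^j)}, g> = sum_j y^j g(x^j)."""
    return np.sum(y * g(X))
```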
17. Regression with a Sum of Separable Functions
Construct g(x) such that $g(\mathbf{x}^j) \approx y^j$ with
$$g(\mathbf{x}) = \sum_{l=1}^{r} \prod_{i=1}^{d} g_i^l(x_i).$$
We can use an ALS approach (see the sketch below):
Loop until you are happy or the metaparameters seem inappropriate:
    Loop through the directions k = 1, . . . , d.
    Fix $\{g_i^l\}$ for $i \neq k$, and update $\{g_k^l\}_l$ to reduce (minimize) the error
    $$\sum_{j=1}^{N} \left( y^j - \sum_{l=1}^{r} g_k^l(x_k^j) \prod_{i \neq k} g_i^l(x_i^j) \right)^2.$$
If we choose each $g_k^l$ to be a linear combination of some basis functions, then we get a linear least-squares problem in its coefficients. Otherwise (and for other loss functions) it is nonlinear.
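Here is a compact sketch of that ALS regression loop (my illustration under stated assumptions: each $g_k^l$ is a polynomial in one variable, squared loss, data scaled to [0, 1], a fixed number of passes):

```python
import numpy as np

def als_regression(X, y, r, n_basis=5, n_passes=20):
    """Fit g(x) = sum_l prod_i g_i^l(x_i) to data X (N, d), y (N,)
    by alternating linear least squares on the basis coefficients."""
    N, d = X.shape
    # basis functions evaluated at the data: B[i] is (N, n_basis)
    B = [np.vander(X[:, i], n_basis, increasing=True) for i in range(d)]
    # coefficients C[i] (n_basis, r); start near the constant function
    C = [np.vstack([np.ones(r), 0.01 * np.random.randn(n_basis - 1, r)])
         for _ in range(d)]
    F = [B[i] @ C[i] for i in range(d)]        # factor values, (N, r)
    for _ in range(n_passes):
        for k in range(d):
            P = np.ones((N, r))                # product over i != k
            for i in range(d):
                if i != k:
                    P *= F[i]
            # linear least squares in the coefficients of the g_k^l
            A = np.einsum('nm,nl->nml', B[k], P).reshape(N, -1)
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            C[k] = coef.reshape(n_basis, r)
            F[k] = B[k] @ C[k]
    return C

def predict(C, X, n_basis=5):
    """Evaluate the fitted sum of separable functions at new points."""
    F = np.ones((X.shape[0], C[0].shape[1]))
    for i, Ci in enumerate(C):
        F *= np.vander(X[:, i], n_basis, increasing=True) @ Ci
    return F.sum(axis=1)
```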
18. Comments
The usual issues (noise, local minima, over-fitting) and
standard techniques (regularization, cross-validation) apply.
The cost for an optimization pass is linear in both d and N,
so the method is feasible for large data sets in high dimensions.
As of 2009, this regression method was competitive on a standard set
of benchmark problems (see the paper).
As of 2010, a classification method based on these principles was
competitive on a standard set of benchmark problems (see a paper by
Jochen Garcke).
19. Regression on Molecules and Materials
$\mathcal{D} = \{(\sigma^j, y^j)\}_{j=1}^{N}$, where $\sigma^j$ is a material/molecular structure, which is an unordered set of atoms $a = (t, r)$, where t is a species type (e.g. t = Mo), and r is a location in 3-dimensional space.
A structure can be mapped to a set $V_\sigma$ whose elements $(w, v)$ are a weight w and an ordered list of atoms v called a view.
The set $V_\sigma$ is invariant under rotations, translations, and the order the atoms are given in.
21. Regression with Consistent Functions
From a function g on ordered lists of atoms, we can build a function on structures that is rotation and translation invariant by defining
$$Cg(\sigma) = \sum_{(w,v) \in V_\sigma} w\, g(v).$$
We can then attempt to minimize the least-squares error
$$\|\mathcal{D} - Cg\|^2 = \frac{1}{N} \sum_{j=1}^{N} \left( y^j - Cg(\sigma^j) \right)^2 = \frac{1}{N} \sum_{j=1}^{N} \left( y^j - \sum_{(w,v) \in V_{\sigma^j}} w\, g(v) \right)^2.$$
If
$$g([a_1, a_2, \ldots]) := g([a_1, a_2, \ldots, a_d]) = \sum_{l=1}^{r} \prod_{i=1}^{d} g_i^l(a_i),$$
then ALS can be run. Each $g_i^l$ is a function of $a = (t, r)$, so its domain is several copies of $\mathbb{R}^3$, which is tractable.
22. Conclusions, Part II
Sums of separable functions give a tractable way to represent (some)
functions of many variables.
You can do regression with them, for machine learning etc.
There are various extensions.
(There are difficulties too, which I skip.)
23. Examples: Gaussians and Radial Functions
$$a \exp\left(-b \|\mathbf{x}\|^2\right) = a \prod_{i=1}^{d} \exp\left(-b x_i^2\right)$$
If $\phi(y) \approx \sum_{l=1}^{r} a_l e^{-b_l y^2}$ for $0 \le y$, then
$$\phi(\|\mathbf{x}\|) \approx \sum_{l=1}^{r} a_l \exp\left(-b_l \sum_{i=1}^{d} x_i^2\right) = \sum_{l=1}^{r} a_l \prod_{i=1}^{d} \exp\left(-b_l x_i^2\right),$$
with rank r independent of d (but be careful about where the ≈ is used).
This construction is especially useful for Green's functions such as $1/\|r\|$.
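A sketch of how this pays off on a grid (the $a_l$, $b_l$ below are placeholder values, not a fitted Gaussian-sum approximation): the full $n^d$ grid of values of $\phi(\|\mathbf{x}\|)$ is represented by $r \cdot d \cdot n$ numbers.

```python
import numpy as np

# Placeholder Gaussian-sum parameters; in practice a_l, b_l are fitted
# so that phi(y) ≈ sum_l a_l exp(-b_l y^2) on the range of interest.
a = np.array([0.6, 0.3, 0.1])
b = np.array([0.5, 5.0, 50.0])

n, d = 100, 6
t = np.linspace(-1.0, 1.0, n)
# One (r x n) array per direction; together they encode phi(||x||)
# on the full n^d grid without ever forming it.
factors = [np.exp(-np.outer(b, t ** 2)) for _ in range(d)]

def value_at(idx):
    """Assemble phi(||x||) at the grid multi-index idx from the factors."""
    prod = a.copy()
    for i, j in enumerate(idx):
        prod = prod * factors[i][:, j]
    return prod.sum()
```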
24. Example: Linear Model
If we can write
$$\phi(t) \approx \sum_{l=1}^{r} \alpha_l \exp(\beta_l t),$$
then the linear model has
$$\phi\left(\sum_{i=1}^{d} a_i x_i + b\right) \approx \sum_{l=1}^{r} \alpha_l \exp\left(\beta_l \left(\sum_{i=1}^{d} a_i x_i + b\right)\right) = \sum_{l=1}^{r} \alpha_l \exp(\beta_l b) \prod_{i=1}^{d} \exp(\beta_l a_i x_i).$$
Properties of $\phi$ matter, but the orientation of the axes does not.
(Although if only one $a_i$ is nonzero, then r = 1.)
25. Example: Additive Model
$$f(\mathbf{x}) = \sum_{i=1}^{d} f_i(x_i) = \left.\frac{d}{dt} \prod_{i=1}^{d} \bigl(1 + t f_i(x_i)\bigr)\right|_{t=0} = \lim_{h \to 0} \frac{1}{2h}\left[ \prod_{i=1}^{d} \bigl(1 + h f_i(x_i)\bigr) - \prod_{i=1}^{d} \bigl(1 - h f_i(x_i)\bigr) \right].$$
At r = 2 the minimization problem is ill-posed.
Ill-posedness can allow useful approximations.
There can be large cancellations and ill-conditioning.
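A quick numerical look at that limit (the $f_i$ are my example choices): the rank-2 difference quotient reproduces the additive function well, but the two products are both close to 1 and nearly cancel, which previews the conditioning issue.

```python
import numpy as np

# Rank-2 finite-difference representation of f(x) = sum_i f_i(x_i).
fis = [np.sin, np.cos, np.tanh]          # example f_i, so d = 3
x = np.array([0.3, 1.1, -0.7])
h = 1e-5

exact = sum(f(xi) for f, xi in zip(fis, x))
plus = np.prod([1 + h * f(xi) for f, xi in zip(fis, x)])
minus = np.prod([1 - h * f(xi) for f, xi in zip(fis, x)])
print(exact, (plus - minus) / (2 * h))   # agree to roughly 1e-10
# plus and minus differ only in their last digits: the cancellation
# (and hence the loss of precision) grows as h shrinks.
```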
26. Example: Sine of the sum of several variables
As long as $\sin(\alpha_k - \alpha_j) \neq 0$ for all $j \neq k$,
$$\sin\left(\sum_{j=1}^{d} x_j\right) = \sum_{j=1}^{d} \sin(x_j) \prod_{k=1,\, k \neq j}^{d} \frac{\sin(x_k + \alpha_k - \alpha_j)}{\sin(\alpha_k - \alpha_j)},$$
which is rank d.
Ordinary trigonometric expansions yield $r = 2^{d-1}$.
Over the complex numbers, r = 2. The field matters.
The representation is not unique. (For generic tensors it is.)
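The identity is easy to check numerically (a quick sketch):

```python
import numpy as np

# Check the rank-d identity for sin(x_1 + ... + x_d).
d = 4
alpha = 0.7 * np.arange(1, d + 1)   # any alphas with sin(a_k - a_j) != 0
x = np.random.rand(d)

lhs = np.sin(x.sum())
rhs = 0.0
for j in range(d):
    term = np.sin(x[j])
    for k in range(d):
        if k != j:
            term *= np.sin(x[k] + alpha[k] - alpha[j]) / np.sin(alpha[k] - alpha[j])
    rhs += term
print(lhs, rhs)                      # agree to machine precision
```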
27. Example: Do not add Constraints!
If $\{g_j\}_{j=1}^{2d}$ form an orthonormal set and
$$g(\mathbf{x}) = \prod_{i=1}^{d} g_i(x_i) + \prod_{i=1}^{d} \bigl(g_i(x_i) + g_{i+d}(x_i)\bigr),$$
then an orthogonality constraint would force us to multiply out,
$$g(\mathbf{x}) = \prod_{i=1}^{d} g_i(x_i) + g_1(x_1) \prod_{i=2}^{d} \bigl(g_i(x_i) + g_{i+d}(x_i)\bigr) + g_{1+d}(x_1) \prod_{i=2}^{d} \bigl(g_i(x_i) + g_{i+d}(x_i)\bigr) = \cdots$$
and have $r = 2^d$ instead of $r = 2$.
28. Final Thoughts
There are no theorems that this approach is good,
but there are intriguing examples.
There are not many alternatives for computing in high dimensions.
(There are alternative tensor formats.)
See http://www.ohiouniversityfaculty.com/mohlenka/
for papers.
Talk with me if any of this seems useful for you.