This document discusses emulators (surrogates), which approximate expensive models by a linear combination of basis functions fitted to available data. Two classes of emulators are discussed: linear approximations, where the basis functions are fixed beforehand and the coefficients depend linearly on the data, and nonlinear approximations, where the coefficients or the basis functions themselves depend nonlinearly on the data. Experimental design, the choice of where to collect data, is critical for building an accurate emulator.
MUMS Opening Workshop - Emulators for models and complexity reduction - Akil Narayan, August 21, 2018
1. Emulators for models and complexity reduction
Akil Narayan¹
¹ Department of Mathematics, and Scientific Computing and Imaging (SCI) Institute, University of Utah
August 2018
SAMSI MUMS opening workshop
2. Models and emulators
y = u(x) + ε,    x ∈ D ⊆ R^d,    y ∈ R^P
The parameters/factors x govern the bulk behavior of the response u.
The noise or error ε can account for model discrepancy.
The observable y can be deterministic or stochastic.
3. Models and emulators
Available data: noisy measurements y, abstractly treated as samples at specific values of x.
Emulators are generally built to be consistent with data. Their purpose can be to
  extrapolate/interpolate data
  accelerate queries of the model
  analyze for variances, screening, sensitivity, etc.
4. Models and emulators
I will primarily discuss emulator constructions from applied mathematics/scientific computing.
We are interested in things like stability, accuracy, consistency, etc.
Take-home point: experimental design is critical in building good emulators.
5. Building emulators
Many mathematical emulator models have the form
u(x) ≈ u_N(x) := ∑_{n=1}^{N} c_n φ_n(x).
Information about y: sample data (x_m, y_m), m = 1, …, M.
Two general types of approximations:
linear approximations: u_N is linear in the data.
  The φ_n(·) are prescribed a priori, and the map {y_m} → {c_n} is linear.
nonlinear approximations: u_N depends nonlinearly on the data.
  Computation of the c_n may be nonlinear; identification of the φ_n may depend on the data.
6. Building emulators
The form of φ_n does not by itself determine whether the approximation is linear or nonlinear.
Some linear approximations:
  interpolation
  quadrature
  least-squares
Some nonlinear approximations:
  radial basis/kernel approximations
  non-quadratic regularized approximation
  proper orthogonal decomposition
7. Building emulators
Example: If M ≥ N, the coefficients c_j are computable via least-squares:
y = (y_1, y_2, …, y_M)^T ≈ A c,    where A ∈ R^{M×N} has entries A_{mn} = φ_n(x_m) and c = (c_1, c_2, …, c_N)^T.
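To make this concrete, here is a minimal Python/NumPy sketch of the least-squares construction above. The target function u(x) = exp(x), the Legendre basis, and the sizes M, N are placeholder choices for illustration, not from the talk.

import numpy as np

# Hypothetical setup: approximate u(x) = exp(x) on [-1, 1] using N Legendre
# polynomials, from M >= N noiseless samples. All choices are illustrative.
M, N = 40, 10
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=M)              # sample locations x_m
y = np.exp(x)                                   # data y_m = u(x_m)

# A_{mn} = phi_n(x_m): Legendre polynomials of degree 0, ..., N-1
A = np.polynomial.legendre.legvander(x, N - 1)

# c* = argmin_c ||A c - y||_2
c, *_ = np.linalg.lstsq(A, y, rcond=None)

# Evaluate the emulator u_N(x) = sum_n c_n phi_n(x) on a test grid
xt = np.linspace(-1.0, 1.0, 201)
uN = np.polynomial.legendre.legvander(xt, N - 1) @ c
print("max pointwise error:", np.abs(uN - np.exp(xt)).max())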
8. Emulators as model reduction
Emulators are built in the hope that x → u(x) is a map of low complexity.
If true, and an efficient model to capture this complexity is discoverable, then
u(x) ≈ u_N(x) = ∑_{n=1}^{N} c_n φ_n(x),    V := span{φ_1, …, φ_N}
can be achieved with “small” N.
9. Emulators as model reduction
Identify V.
Efficiently construct u_N from V.
Neither of these is particularly easy in general.
Scientific models are complex; is this even feasible with a reasonable N?
10. An explicit example
Example: Consider the solution u(z; x) to the parameterized PDE:
−∇_z · (a(z; x) ∇_z u(z; x)) = f(z),    (z, x) ∈ Ω × D,
u(z; x) = 0,    (z, x) ∈ ∂Ω × D.
For each x, u(·; x) ∈ H = H¹(Ω). Let the diffusion coefficient be given by
a(z; x) = ∑_{j=1}^{∞} x_j ψ_j(z).
11. An explicit example
If x = (x_1, x_2, …) ∈ D = [−1, 1]^∞, and there is some p ≤ 1 such that
∑_{j=1}^{∞} ‖ψ_j‖_{L∞(Ω)}^p < ∞,
then an emulator u_N can be constructed such that
‖u − u_N‖_{L²(D,H)} ≲ N^{−r},    r = 1/p − 1/2.
[Cohen, DeVore, Schwab 2010]
12. Adapted vs linear
An approximation to u:
u ≈ u_N(z; x) = ∑_{n=1}^{N} c_n(z) φ_n(x),    V := span{φ_1, …, φ_N}
Non-adapted approximation: With V chosen, construct u_N so that
‖u − u_N‖_{L²(D,R^P)} ≲ inf_{v∈V} ‖u − v‖_{L²(D,R^P)}.
The main task is to compute u_N from a given V.
13. Adapted vs linear
Adapted approximation: Find V and u_N so that
‖u(x) − u_N(x)‖_{R^P} is “small” for all x ∈ D.
14. Adapted vs linear
Adapted approximations are always nonlinear.
Non-adapted approximations can be linear.
15. Emulators and sampling/experimental design
y = u + ε,    u ≈ u_N = ∑_{n=1}^{N} c_n φ_n(x) ∈ V,    {(x_m, y_m)}_{m=1}^{M} → {c_n}_{n=1}^{N}
Desiderata:
  ‖u − u_N‖_B small for a normed vector space B
  M of “reasonable” size
Accuracy, both in identification of V and in computation of u_N, depends largely on sample design, i.e., the choice of x_1, …, x_M.
16. Emulators and sampling/experimental design
Good sample design can minimize required data size M
Intelligent sampling enables efficient emulator construction
17. Summary of methods
We’ll see how sampling design affects approximation statements for three strategies:
  Discrete least-squares: linear approximation, M ≥ N
  Compressive sampling: nonlinear approximation, M ≪ N
  Reduced order modeling: nonlinear approximation, N ∼ M = O(1)
I’ll discuss optimal mathematical statements one can make, taking the form
‖u − u_N‖_B ≲ K_N × (best approximation error) + ε,    provided M ≥ K_M.
I will focus on the role that sampling plays in these techniques.
18. Summary of methods
Warning: There are entire sub-fields of applied math and statistics
concerning sampling that I will ignore.
(Because they’re not directly relevant to the message.)
20. Part I: Linear approximation
Discrete least squares
Non-adapted basis functions, linear approximation construction procedure
21. An aside – polynomials and PCE
u(x) + ε = y ≈ u_N(x) = ∑_{n=1}^{N} c_n φ_n(x),    x ∈ D ⊆ R^d,  y ∈ R,  c ∈ R^N
Non-adapted approximation: the φ_n are chosen a priori.
We often choose polynomials. (Cf. PCE)
Why?
22. An aside – polynomials and PCE
Polynomials are easy to compute with/evaluate.
Polynomial expansions are (reasonably) easy to manipulate, multiply, differentiate, etc.
Polynomials provide best approximation numbers that behave optimally: with P^d_k the space of d-variate polynomials of total degree k and N = dim P = dim P^d_k,
inf_{P ⊂ H^s, dim P = N}  sup_{f ∈ H^s, ‖f‖_{H^s} = 1}  inf_{p ∈ P} ‖f − p‖_{H^s} ∼ N^{−s/d},
sup_{f ∈ H^s, ‖f‖_{H^s} = 1}  inf_{p ∈ P^d_k} ‖f − p‖_{H^s} ≲ N^{−s/d}.
That is, polynomial spaces achieve (up to constants) the optimal Kolmogorov N-width rate for Sobolev classes. [Pinkus 1985]
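For reference, a small sketch (illustrative only, not from the talk) of how the total-degree polynomial space P^d_k can be realized numerically: build the multi-index set {α : |α| ≤ k} and evaluate the corresponding tensor-product Legendre basis. Its dimension is N = dim P^d_k = C(k + d, d).

import itertools
import numpy as np

def total_degree_indices(d, k):
    # All multi-indices alpha in N_0^d with alpha_1 + ... + alpha_d <= k
    return [a for a in itertools.product(range(k + 1), repeat=d) if sum(a) <= k]

def legendre_basis(X, indices):
    # X: (M, d) points in [-1, 1]^d; returns the (M, N) matrix of values
    # phi_alpha(x_m) = prod_j P_{alpha_j}(x_m^{(j)}).
    M, d = X.shape
    kmax = max(max(a) for a in indices)
    P = [np.polynomial.legendre.legvander(X[:, j], kmax) for j in range(d)]
    A = np.ones((M, len(indices)))
    for n, alpha in enumerate(indices):
        for j, aj in enumerate(alpha):
            A[:, n] *= P[j][:, aj]
    return A

d, k = 2, 3
idx = total_degree_indices(d, k)
print("N = dim P^d_k =", len(idx))      # = C(k + d, d) = 10 for d = 2, k = 3
X = np.random.default_rng(1).uniform(-1, 1, size=(5, d))
print(legendre_basis(X, idx).shape)     # (5, 10)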
23. Mathematical preliminaries
u(x) + ε = y ≈ u_N(x) = ∑_{n=1}^{N} c_n φ_n(x),    x ∈ D ⊆ R^d,  y ∈ R,  c ∈ R^N
The least-squares problem is an approximation of the form
c* = (c*_1, …, c*_N)^T = argmin_{c ∈ R^N} ∑_{m=1}^{M} [u_N(x_m) − y_m]².
24. Mathematical preliminaries
Let V := span{φ_1, …, φ_N}. Least-squares is, equivalently,
v* = argmin_{v ∈ V} ∑_{m=1}^{M} (v(x_m) − y_m)².
V is an a priori space of functions.
What is the “best” approximation we can hope for?
25. Mathematical preliminaries
Given a probability measure µ on D, approximation will take place in an L² space:
⟨g, h⟩_µ := ∫_D g(x) h(x) dµ(x),    L²_µ(D) := { g : D → R : ‖g‖_µ < ∞ }.
The best approximation error to u from the subspace V is
σ_V(u) := inf_{v ∈ V} ‖u − v‖_µ.
26. Mathematical preliminaries
Randomized sampling: x_m sampled iid from µ, and no noise, y_m = u(x_m):
u_N = argmin_{v ∈ V} ∑_{m=1}^{M} (v(x_m) − y_m)².
Law of large numbers: M ↑ ∞ ⇒ ‖u_N − u‖_µ → σ_V(u).
27. “Standard” Monte Carlo
Approximate a function
u(x) = exp −ω x −
1
π
2
, x ∈ [−1, 1], = 0,
with µ uniform on [−1, 1], from the space of potential surrogates
V = span 1, . . . , xN−1
Data xm sampled iid from µ
Convergence observed, but slow
Why does this happen, and can we fix it?
50 100 150 200 250 300
10−5
10−3
10−1
101
103
105
107
M
Mean-squareerror
D = [−1, 1], N = 50
Optimal error
MC
A. Narayan (U. Utah) Emulators and surrogates
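One way to see this numerically: for an L²_µ-orthonormal basis, the scaled Gram matrix (1/M) AᵀA should approach the identity, but with x_m drawn iid from µ it remains badly conditioned unless M greatly exceeds N. A minimal sketch (not the talk's exact experiment), using an orthonormal Legendre basis for the same space V and placeholder sample sizes:

import numpy as np

rng = np.random.default_rng(0)
N = 50                                     # dim V, as in the slide's example
norm = np.sqrt(2 * np.arange(N) + 1)       # sqrt(2n+1) P_n is orthonormal w.r.t. uniform mu on [-1, 1]

for M in (60, 120, 300, 1200):
    x = rng.uniform(-1.0, 1.0, size=M)     # x_m iid from mu
    A = np.polynomial.legendre.legvander(x, N - 1) * norm
    G = (A.T @ A) / M                      # should approximate the identity matrix
    print(M, np.linalg.cond(G))            # conditioning improves only slowly with M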
28. “Standard” Monte Carlo
Sampling from a standard distribution is frequently suboptimal
29. Convergence results
Proximity to the optimal solution is guaranteed with enough samples.
Define
K_µ(V) := sup_{x ∈ D}  sup_{v ∈ V \ {0}}  |v(x)|² / ‖v‖²_µ.
If x_1, …, x_M are sampled iid from µ, then
M / log M ≥ [(2 + 2r) / log(e/2)] K_µ(V)
guarantees that, with probability ≥ 1 − 2M^{−r},
E ‖u − u_N‖²_µ ≤ (1 + 2 / [(1 − log 2)(1 + r) log M]) σ_V(u)² + 8U² M^{−r},
where U = sup_{x ∈ D} |u(x)|, and
u_N = argmin_{v ∈ V} ∑_{m=1}^{M} (v(x_m) − y_m)².
[Cohen, Davenport, Leviatan 2013]
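If the φ_n form an L²_µ-orthonormal basis for V (as they will later), then K_µ(V) = sup_{x∈D} ∑_n φ_n(x)², so it can be estimated by evaluating the basis on a fine grid. A minimal sketch for the Legendre example on [−1, 1] (grid resolution and r are placeholder choices); the computed value matches the K_µ(V) ∼ N² behavior noted for polynomial spaces two slides later:

import numpy as np

N, r = 50, 1
xx = np.linspace(-1.0, 1.0, 20001)                         # dense grid on D = [-1, 1]
norm = np.sqrt(2 * np.arange(N) + 1)
Phi = np.polynomial.legendre.legvander(xx, N - 1) * norm   # orthonormal basis values
K = np.sum(Phi**2, axis=1).max()                           # K_mu(V) = sup_x sum_n phi_n(x)^2
print("K_mu(V) ~", K, "(equals N^2 =", N**2, "for this basis)")

# Roughly the smallest M satisfying M / log M >= (2 + 2r)/log(e/2) * K_mu(V)
target = (2 + 2 * r) / np.log(np.e / 2) * K
M = int(target)
while M / np.log(M) < target:
    M = int(M * 1.01) + 1
print("sample count suggested by the bound:", M)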
30. Randomized sampling – Monte Carlo
M / log M ≥ [(2 + 2r) / log(e/2)] K_µ(V),    K_µ(V) = sup_{x ∈ D} sup_{v ∈ V \ {0}} |v(x)|² / ‖v‖²_µ.
The smallest (best) value of K_µ(V) is N.
31. Randomized sampling – Monte Carlo
Example: Linear models, N = d + 1:
φ_1(x) = 1,    φ_{j+1}(x) = x_j,    j = 1, …, d.
Let µ be the standard Gaussian measure over D = R^d. Then K_µ(V) = ∞.
Analysis suggests this is a pretty bad sampling design, but in practice it’s fine.
32. Randomized sampling – Monte Carlo
In the earlier polynomial example on [−1, 1], K_µ(V) ∼ N².
In practice, K_µ(V) depends exponentially on d.
The ideal case: K_µ(V) ∼ N. To accomplish this, use biased sampling.
33. Randomized sampling – weighted methods
Lesson: sampling x_m ∼ µ is usually not optimal, and sometimes terrible.
Standard least-squares:
argmin_c ‖Ac − y‖_2.
Weighted least-squares:
argmin_c ‖Ac − y‖_{2,w} = argmin_c ‖√W Ac − √W y‖_2,
where W = diag(w_1, …, w_M) contains positive weights w_j.
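A minimal sketch of the weighted solve via row rescaling (placeholder random data; the weights below are uniform only to check the reduction to ordinary least squares — the next slides choose them as dµ/dµ_V):

import numpy as np

def weighted_lstsq(A, y, w):
    # Solve argmin_c ||A c - y||_{2,w} = argmin_c ||sqrt(W) A c - sqrt(W) y||_2
    # by rescaling each row m with sqrt(w_m).
    sw = np.sqrt(np.asarray(w, dtype=float))
    c, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return c

# Placeholder check: with uniform weights this reduces to ordinary least squares.
rng = np.random.default_rng(0)
A, y = rng.normal(size=(30, 5)), rng.normal(size=30)
print(np.allclose(weighted_lstsq(A, y, np.ones(30)),
                  np.linalg.lstsq(A, y, rcond=None)[0]))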
34. Randomized sampling – optimality
We can entirely circumvent the K_µ(V) problem by changing sampling measures.
Assume φ_1, …, φ_N is an L²_µ-orthonormal basis for V. Generate x_1, …, x_M iid from µ_V, where
dµ_V(x) = (1/N) ∑_{n=1}^{N} φ_n²(x) dµ(x).
Use weights
w_m = (dµ/dµ_V)(x_m) = N / ∑_{n=1}^{N} φ_n²(x_m).
Our weighted least-squares estimator is defined by
c* = argmin_c ‖Ac − y‖_{2,w}.
The measure µ_V is called the induced distribution for V.
35. Randomized sampling – optimality
Let x_1, …, x_M ∼ µ_V, with u_N(x) = ∑_{n=1}^{N} c*_n φ_n(x) computed via
c* = argmin_c ‖Ac − y‖_{2,w}.
Then
M / log M ≥ [(2 + 2r) / log(e/2)] N
guarantees that, with probability ≥ 1 − 2M^{−r},
E ‖u − u_N‖²_µ ≤ (1 + 2 / [(1 − log 2)(1 + r) log M]) σ_V(u)² + 8U² M^{−r}.
[Cohen, Migliorati 2017]
Note: This M/N dependence is essentially optimal.
36. The induced distribution
The induced distribution µV can be substantially different from µ.
x ∈ D = R²,  dµ(x) ∝ exp(−‖x‖²_2),  x = (x^(1), x^(2)),
V = span{ (x^(1))^{α₁} (x^(2))^{α₂} : (α₁ + 1)(α₂ + 1) ≤ 26 }.
[Figure: scatter plots of samples from µ (left) and from µ_V (right) over [−6, 6]².]
Under certain conditions, one can sample from this distribution very efficiently, in particular with linear complexity in d. [AN 2017]
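A grid-based sketch of sampling from µ_V in one dimension, for µ uniform on [−1, 1] and a Legendre basis: µ_V is the uniform mixture of the densities φ_n² dµ, so draw a mixture component n and then invert its numerically tabulated CDF. This is only illustrative; [AN 2017] gives an efficient, exact sampler. The basis size and grid resolution are placeholder choices.

import numpy as np

rng = np.random.default_rng(0)
N = 20
xx = np.linspace(-1.0, 1.0, 100001)                         # dense grid on [-1, 1]
norm = np.sqrt(2 * np.arange(N) + 1)
Phi = np.polynomial.legendre.legvander(xx, N - 1) * norm    # L^2_mu-orthonormal basis on the grid

def sample_induced(M):
    # mu_V is the uniform mixture over n of the densities phi_n(x)^2 dmu(x):
    # draw a mixture component n, then invert its numerically tabulated CDF.
    out = np.empty(M)
    for i in range(M):
        n = rng.integers(N)
        cdf = np.cumsum(Phi[:, n] ** 2)
        cdf /= cdf[-1]
        out[i] = np.interp(rng.uniform(), cdf, xx)
    return out

x = sample_induced(500)
Phix = np.polynomial.legendre.legvander(x, N - 1) * norm
w = N / np.sum(Phix**2, axis=1)                             # w_m = (dmu/dmu_V)(x_m)
print(x[:4], w[:4])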
37. Randomized sampling – examples
This analysis tends to give accurate estimates
[Figure: L²_µ error vs. M for N = 5 (left) and N = 45 (right), comparing sampling from µ and from µ_V; the optimal error is ‖ε‖_µ.]
38. Randomized sampling – examples
Moral of the story:
randomized sampling according to µ is generally bad
randomized sampling according to µV is generally good
Intelligent sampling allows efficient, near-optimal computation of emulators.
39. Odds and ends
Robust and accurate least-squares emulators for linear approximations can be built with biased sampling.
Estimates are optimal: M ≳ N implies ‖u − u_N‖_µ ≲ σ_V(u).
Estimates are d-independent.
Sampling is efficient if both µ and the φ_n are tensor-product.
Convergence results are robust to noise ε > 0.
No significant changes if y is vector-/function-valued.
40. Part II: Nonlinear approximation
Sparse approximation
Non-adapted basis functions, nonlinear approximation construction procedure
41. Limited measurements
u(x) + ε = y ≈ u_N(x) = ∑_{n=1}^{N} c_n φ_n(x),    x ∈ D ⊆ R^d,  y ∈ R,  c ∈ R^N,
V = span{φ_1, …, φ_N},    {(x_m, y_m)}_{m=1}^{M} → {c_n}_{n=1}^{N}.
When d > 1, it is common for an a priori N = dim V to be very large.
Least-squares: collecting M ∼ dim V measurements can be infeasible.
42. Limited measurements
What happens when M < N? The system
Ac ≈ y
is now underdetermined. Unique solutions can be guaranteed if functional structure is imposed.
43. Sparse approximation
u_N(x) = ∑_{n=1}^{N} c_n φ_n(x)
If M < N measurements are available, can we recover the largest M coefficients from the vector c?
Assume
y(x) = ∑_{n=1}^{N} c_n φ_n(x) + ε(x),    |ε| < η.
The compressibility of y is measured by
σ_{V,s}(c) = inf_{d ∈ R^N, ‖d‖_0 ≤ s} ‖c − d‖_1,    ‖d‖_0 := #{ j ∈ {1, …, N} : d_j ≠ 0 }.
44. Sparse approximation
Measurements: y_m = y(x_m), leading to the system Ac ≈ y.
y is assumed to be compressible (i.e., c is assumed compressible).
With a limited number, M, of measurements, we seek to approximate the best s-term approximation of c.
Ideally, s ∼ M.
This is not possible if the sampling points are arbitrarily chosen.
45. Compressed sensing
It is possible to recover the best s-term approximation with high probability.
Assume the x_m are sampled iid from µ, that the φ_n are L²_µ-orthonormal, and that
M ≳ C K_µ s log³(s) log N.
For any c ∈ R^N, let y_m = y(x_m) = ∑_{n=1}^{N} c_n φ_n(x_m) + ε(x_m), and assume |ε| ≤ η.
Then, with probability exceeding 1 − N^{−γ log³(s)}, the solution c* to the optimization problem
min ‖d‖_1  such that  ‖Ad − y‖_2 ≤ η √M,
satisfies
‖c − c*‖_1 ≤ C₁ σ_{V,s}(c) + C₂ √s η.
Above,
K_µ = max_{n=1,…,N} ‖φ_n‖_{L∞(D)}.
[Rauhut 2010], [Rauhut, Ward 2010]
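As an illustration of the ℓ¹ approach (not the talk's experiments), here is a sketch that attempts to recover a sparse Legendre expansion from M < N noiseless samples drawn from µ. Rather than the constrained program above, it solves the closely related LASSO relaxation with a FISTA loop; the sizes, sparsity, and regularization parameter are placeholder choices.

import numpy as np

def fista_lasso(A, y, lam, iters=5000):
    # Accelerated proximal gradient (FISTA) for
    #   argmin_d 0.5*||A d - y||_2^2 + lam*||d||_1
    L = np.linalg.norm(A, 2) ** 2             # Lipschitz constant of the smooth part
    d = np.zeros(A.shape[1]); z = d.copy(); t = 1.0
    for _ in range(iters):
        g = z - A.T @ (A @ z - y) / L
        d_new = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)
        t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        z = d_new + ((t - 1.0) / t_new) * (d_new - d)
        d, t = d_new, t_new
    return d

rng = np.random.default_rng(0)
M, N, s = 60, 100, 4                           # placeholder sizes: M < N, s-sparse truth
x = rng.uniform(-1.0, 1.0, size=M)             # x_m iid from mu (uniform on [-1, 1])
norm = np.sqrt(2 * np.arange(N) + 1)
A = np.polynomial.legendre.legvander(x, N - 1) * norm   # L^2_mu-orthonormal Legendre basis
c_true = np.zeros(N)
c_true[rng.choice(N, s, replace=False)] = rng.normal(size=s)
y = A @ c_true                                 # noiseless data (eta = 0)
c_hat = fista_lasso(A, y, lam=1e-3)
print("relative l1 error:", np.linalg.norm(c_hat - c_true, 1) / np.linalg.norm(c_true, 1))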
46. Recovery of models with sparse representations
[Figure: transition plots for uniform random variables, d = 2, comparing recovery under sampling from the random variable’s distribution, the CSA method, and asymptotic sampling.]
We observe poor recovery since K_µ in the sample requirement is poorly behaved:
M ≳ C K_µ s log³(s) log N,    K_µ = max_{n=1,…,N} ‖φ_n‖_{L∞(D)}.
This requirement is heavily dependent on µ.
47. Better sampling
Again, choosing a better sampling strategy ameliorates this issue.
Sample x_m ∼ µ_V, and solve
min ‖d‖_1  such that  ‖Ad − y‖_{2,w} ≤ η √M,
where w are weights that make the discrete sampling unbiased.
[Figure: ℓ² error vs. number of samples M for a parameterized diffusion equation, comparing the weighted/CSA sampling approach with standard MC and asymptotic sampling; high-degree polynomials in up to 20 dimensions.]
[Jakeman, AN 2017], [Guo, Zhou, Chen, AN 2016]
48. Part III: Nonlinear approximation
Dimension reduction/reduced modeling
Adapted basis functions, nonlinear approximation construction procedure
49. Dimension reduction
u(x) + ε = y ≈ u_N(x) = ∑_{n=1}^{N} c_n φ_n(x),    x ∈ D ⊆ R^d,  y ∈ R^P,  c_n ∈ R^P.
A “sample” y_m is a vector, possibly of large size, P ≫ 1.
In scientific models, P is also an indicator of the effort to obtain y_m.
Construct V and φ_1, …, φ_N by analyzing
{(x_m, y_m)}_{m=1}^{M},    (x_m, y_m) ∈ R^d × R^P.
The φ_n are adapted to the data.
Though the φ_n have no explicit form, evaluating such functions can be much cheaper than gathering more data.
50. Reduced basis methods
Gather (x_m, y_m) from a scientific model.
The reduced basis method (RBM) for nonlinear, adapted approximation constructs the emulator
u_N(x) = ∑_{n=1}^{N} c_n φ_n(x) = ∑_{n=1}^{N} y_n ℓ_n(x).
Here:
  We need at least N = M data samples y_m.
  The ℓ_n are cardinal Lagrange functions, satisfying ℓ_n(x_m) = δ_{n,m}. They have no explicit form.
  The ℓ_n are defined implicitly from the scientific model. (Via a Galerkin procedure.)
  This is not POD.
The space V = span{φ_n}_{n=1}^{N} is constructed/defined from the data and the model.
There is no reason to believe this is a good idea unless the x_m are chosen well!
51. Reduced basis methods
End goal: evaluation of the surrogate u_N should cost less than acquiring more data. Costs:
  Evaluating the Lagrange functions ℓ_n is the hard part – complexity usually scales like N³.
  The full model y_m is queried only at x_m, and nowhere else.
  Details of the computational efficiency of the surrogate u_N depend on the particular problem.
In practice, N ∼ O(10).
52. Lagrange functions
[Figure: the N = 10 cardinal Lagrange functions over µ ∈ [−1, 1], the error indicator ∆̃₁₀(µ), the error ‖u^N(µ) − u^N₁₀(µ)‖_X compared against the indicator, and the decay of the error with the number of bases N.]
53. RBM accuracy
Does u_N computed via RBM provide a good emulator for u? Depends on the sampling.
Let u(x_m) ∈ H. Suppose we choose
x_{n+1} = argmax_{x ∈ D} ‖u_n(x) − u(x)‖_H.
(This can be approximated without knowing u!)
Then,
‖u − u_N‖_{L∞(D,H)} ≲ σ_N(U),
where
U := { u(x) : x ∈ D } ⊂ H,
σ_N(U) := inf_{dim V = N}  sup_{v ∈ U}  inf_{v_N ∈ V} ‖v − v_N‖_H.
[DeVore et al 2013], [Binev et al 2013]
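A toy sketch of the greedy selection rule on a finite training set of snapshots. In a genuine RBM the true error ‖u_n(x) − u(x)‖_H is not available and is replaced by a computable a posteriori error indicator; here the true projection error is used purely to illustrate the argmax selection, and the parameterized model u is a hypothetical placeholder.

import numpy as np

def greedy_basis(U, N):
    # U: (P, K) array whose columns are snapshots u(x_k) for K candidate parameters.
    # Repeatedly pick the snapshot worst-approximated by the current basis and add it.
    idx, Q = [], np.zeros((U.shape[0], 0))
    for _ in range(N):
        resid = U - Q @ (Q.T @ U)                 # projection error onto current span
        k = int(np.argmax(np.linalg.norm(resid, axis=0)))
        idx.append(k)
        Q = np.column_stack([Q, resid[:, k] / np.linalg.norm(resid[:, k])])
    return idx, Q

# Hypothetical model: u(x)(z) = 1 / (1 + x z) on a grid in z, parameter x in [0, 1].
z = np.linspace(0.0, 1.0, 400)
xs = np.linspace(0.0, 1.0, 200)
U = 1.0 / (1.0 + np.outer(z, xs))
idx, Q = greedy_basis(U, N=8)
print("selected parameters:", np.round(np.array(xs)[idx], 3))
print("max error over training set:", np.linalg.norm(U - Q @ (Q.T @ U), axis=0).max())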
54. RBM accuracy
Surrogates for nontrivial problems can be constructed.
(−∆)^s u(z; x) = f(z; ν),    (z, x) ∈ Ω × D,
u(z; x) = 0,    (z, x) ∈ ∂Ω × D.
Parameters/variables are x = (s, ν).
[Figure: RBM error decay with N, and the cumulative computation time for M queries of the full-order model versus the RBM surrogate; on the left the case s ∈ D₁ with N = 7, on the right s ∈ D₂.]
[Antil, Chen, AN 2018]
55. Building emulators
Surrogate models can be enormously useful.
Linear approximations with non-adapted basis functions:
  “Easiest” to construct, with the weakest accuracy guarantees.
  Querying the surrogate is generally very fast.
  Useful for analyzing large datasets.
Nonlinear approximations with non-adapted basis functions:
  Harder to construct, but more general accuracy guarantees.
  Querying the surrogate is still very fast.
  Useful when data is limited.
Nonlinear approximations with adapted basis functions:
  Generally very hard to construct.
  Very attractive accuracy bounds, when it is possible to certify them.
  Depend heavily on the data, the model, and the transparency of the model.
56. Building emulators
Challenges:
high dimensionality (d, P, or N)
adaptivity and hierarchical constructions
57. mathematics of reduced order models
algorithms for approximation and complexity reduction
computational statistics and data-driven techniques
https://icerm.brown.edu/programs/sp-s20/
58. References
Chkifa, Cohen, Migliorati, Nobile, & Tempone, "Discrete least squares polynomial approximation with random evaluations – application to parametric and stochastic elliptic PDEs", ESAIM: Mathematical Modelling and Numerical Analysis, 49:3 (2015).
Cohen, Davenport, & Leviatan, "On the Stability and Accuracy of Least Squares Approximations", Foundations of Computational Mathematics, 13:5 (2013).
Cohen & Migliorati, "Optimal weighted least-squares methods", arXiv:1608.00512 [math, stat].
Jakeman, Narayan, & Zhou, "A Christoffel function weighted least squares algorithm for collocation approximations", Mathematics of Computation, 86:306 (2017).
Narayan, "Computation of Induced Orthogonal Polynomial Distributions", arXiv:1704.08465 [math] (2017).
Narayan & Zhou, "Stochastic Collocation on Unstructured Multivariate Meshes", Communications in Computational Physics, 18:1 (2015).