I will review the discussions of the working group on statistical analysis and algorithms for situations where generating observations of multivariate functions is expensive, such as running a time-consuming computer code, conducting medical trials, performing large-scale computer simulations (e.g., climate models), or collecting financial market data. This can result in having only a few samples of high-dimensional input variables. Our approaches are based on the assumption that the cost of the observations will far exceed the computational cost of the post-processing algorithms. I will describe our discussions on how to extract the most information out of the available small samples and how to adaptively identify future samples to minimize the uncertainty in the quantities of interest.
Small Sample Analysis and Algorithms for Multivariate Functions
1. Small Sample Analysis and Algorithms for Multivariate Functions
Mac Hyman
Tulane University
Joint work with Lin Li, Jeremy Dewar, and Mu Tian (SUNY),
SAMSI WG5, May 7, 2018
Hyman, Li, Dewar and Tian Small Sample Analysis SAMSI WG5 1 / 27
2. The Problem: Accurate integration of multivariate functions
Goal: To estimate the integral I = ∫_Ω f(x) dx,
where f(x) : Ω → R, Ω ⊂ R^d.
We are focused on situations where:
• There are few samples (n < a few thousand), and the effective dimension is relatively small, x ∈ R^d, d < 50;
• Function evaluations (samples) f(x) are (very) expensive, such as a large-scale simulation, and additional samples may not be obtainable;
• Little a priori information about f(x) is available; and
• We might not have control over the sample locations, which can be far from a desired distribution (e.g., missing data).
Identify new sample locations to minimize the MSE based on existing information.
3. The Problem: Accurate integration of multivariate functions
Goal: To estimate the integral I = ∫_Ω f(x) dx,
where f(x) : Ω → R, Ω ⊂ R^d.
Four approaches that work pretty well in practice.
How do they work in theory?
1. Detrending using covariates
2. Voronoi Weighted Quadrature
3. Surrogate Model Quadrature
4. Adaptive Sampling Based on Kriging SE Estimates
4. Detrending before Integrating
1. Detrending using covariates
Detrending first approximates the underlying function with an easily
integrated surrogate model (covariate).
The integral is then estimated by the exact integral of surrogate +
an approximation of the residual.
For example, f(x) can be approximated by a linear combination of simple basis functions, such as Legendre polynomials, p(x) = Σ_{i=1}^{t} β_i ψ_i(x), which can be integrated exactly, and define

I(f) = ∫_Ω f(x) dx (1)
     = ∫_Ω p(x) dx + ∫_Ω [f(x) − p(x)] dx. (2)

The goal is to pick p(x) to minimize the residual ∫_Ω [f(x) − p(x)] dx.
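The detrending recipe above can be sketched in a few lines. This is a minimal illustration, assuming an additive quadratic surrogate (the slides use Legendre polynomial bases) whose basis functions have known exact integrals over [0,1]^d; the test integrand and sample sizes are arbitrary choices for the example.

```python
import numpy as np

def detrended_mc_integral(f, samples):
    """Estimate I = integral of f over [0,1]^d by detrending: fit an
    additive quadratic surrogate p(x) = b0 + sum_i (bi*xi + ci*xi^2) by
    least squares, integrate p exactly, and add the plain Monte Carlo
    mean of the residual f - p."""
    n, d = samples.shape
    fx = f(samples)
    # Design matrix for the basis [1, x_1..x_d, x_1^2..x_d^2]
    A = np.hstack([np.ones((n, 1)), samples, samples**2])
    beta, *_ = np.linalg.lstsq(A, fx, rcond=None)
    # Exact integrals of the basis functions over [0,1]^d: 1, 1/2, 1/3
    exact_part = beta @ np.concatenate([[1.0], np.full(d, 0.5), np.full(d, 1/3)])
    residual_part = np.mean(fx - A @ beta)
    return exact_part + residual_part

rng = np.random.default_rng(0)
X = rng.random((600, 3))
f = lambda x: np.cos(x).sum(axis=1)   # exact integral over [0,1]^3 is 3*sin(1)
est = detrended_mc_integral(f, X)
```

Because the surrogate absorbs most of the variation of f, the Monte Carlo error acts only on the small residual, as in the error bounds on the next slide.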
5. Detrending before Integrating
The error for the detrended integral is proportional to the
standard deviation of the residual p(x) − f (x), not f (x)
The residual errors are the only errors in the integration approximation
I(f) = ∫_Ω f(x) dx (3)
     ≈ Î(f) = ∫_Ω p(x) dx + (1/n) Σ_{i=1}^{n} [f(x_i) − p(x_i)]. (4)

PMC error bound: ||e_n|| = O(σ(f − p)/√n), and
QMC error bound: ||e_n|| ≤ O(V[f − p](log n)^{d−1}/n)
1. The error bounds are based on σ(f − p) and V [f − p] instead of σ(f )
and V [f ]. The least squares fit reduces these quantities.
2. The convergence rates are the same; the constants are reduced.
6. Quintic detrending reduces MC and QMC errors by a factor of 100
Error Î(f) − I(f) distributions for I(f) = ∫_{[0,1]^6} Π_i cos(i x_i) dx
Error Distributions (6D, 600 points) for PMC (top) and LDS QMC
(bottom) for detrending with a cubic and quintic, K = 3, 5, polynomial.
The x-axis bounds are 10 times smaller for the LDS/QMC samples.
7. Detrending reduces the error constant, not the convergence rates
Detrending doesn’t change the convergence rates: PMC (O(n^{−1/2})) and QMC (O(n^{−1})) for ∫_{[0,1]^6} Π_i cos(i x_i) dx
Errors for PMC (upper lines −−) and QMC (lower lines − · −) for
constant K = 0 (left), cubic K = 3 (center), and quintic K = 5 (right).
8. Detrending reduces the error constant, not the convergence rates
Mean errors for ∫_{[0,1]^5} Π_i cos(i x_i) dx with detrending
Detrending errors for degrees K = 0, 1, 2, 3, 4, 5 and 500-4000 samples.
The convergence rates don’t change, but the constant is reduced by a factor of roughly 1000.
9. Curse of Dimensionality for polynomial detrending
High degree polynomials in high dimensions are quickly
constrained by the Curse of Dimensionality
For the least squares coefficients to be identifiable, the number of samples
must be ≥ the number of coefficients in the detrending function.
Degree \ Dimension    1     2     3      4      5       10        20
0                     1     1     1      1      1       1         1
1                     2     3     4      5      6       11        21
2                     3     6     10     15     21      66        231
3                     4     10    20     35     56      286       1,771
4                     5     15    35     70     126     1,001     10,626
5                     6     21    56     126    252     3,003     53,130
10                    11    66    286    1,001  3,003   184,756   30,045,015
The mixed-variable terms in multivariate polynomials create an explosion in the number of terms, C(d + K, K) for total degree K in dimension d, as a function of the degree and dimension.
For example, a 5th degree polynomial in 20 dimensions has 53,130 terms.
The complexity of this approach grows linearly in the number of
basis functions, and as O(n3) as the number of samples increases.
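The table entries are the binomial coefficients C(d + K, K), the number of monomials of total degree at most K in d variables, which can be checked directly:

```python
from math import comb

def n_terms(dim, degree):
    """Number of coefficients in a full multivariate polynomial of
    total degree <= degree in dim variables: C(dim + degree, degree)."""
    return comb(dim + degree, degree)

print(n_terms(20, 5))   # a 5th degree polynomial in 20 dimensions
print(n_terms(20, 10))  # degree 10 in 20 dimensions
```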
10. Sparse model selection
The L_p sparse penalty regularization method:

p(x) = Σ_{i=1}^{t} β_i ψ_i(x) (5)
β = argmin_β { (1/2)||Aβ − f||_2^2 + (λ/p)||β||_p^p } (6)

This system can be solved using a cyclic coordinate descent algorithm, or by factored iteratively reweighted least-squares (IRLS), solving a linear system of n equations (n = number of samples) on each iteration.
If the function f (x) varies along some directions more than others, then
sparse subset selection extracts the appropriate basis functions based on
the effective dimension of the active subspaces.
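As an illustration of the sparse penalty in (6), here is a minimal sketch of the p = 1 case solved by iterative soft-thresholding (ISTA), a proximal-gradient alternative to the cyclic coordinate descent or IRLS mentioned above; the problem sizes, λ, and ground truth are arbitrary choices for the example.

```python
import numpy as np

def lasso_ista(A, f, lam, n_iter=500):
    """Solve min_beta 0.5*||A beta - f||_2^2 + lam*||beta||_1
    by iterative soft-thresholding (proximal gradient descent)."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ beta - f)
        z = beta - grad / L                # gradient step
        beta = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return beta

rng = np.random.default_rng(1)
A = rng.standard_normal((100, 30))
true_beta = np.zeros(30)
true_beta[[2, 7]] = [1.5, -2.0]            # sparse ground truth
f = A @ true_beta + 0.01 * rng.standard_normal(100)
beta = lasso_ista(A, f, lam=0.5)
```

The penalty drives the coefficients of inactive basis functions to exactly zero, which is how the sparse subset selection extracts the active directions.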
11. Sparse model selection
Sparse subset detrending allows high degree polynomial
dictionaries for sparse sample distributions.
∫_{[0,1]^6} Π_i cos(i x_i) dx; errors for PMC (top) and QMC (bottom), degrees K = 0, 3, 5. The K = 5 fits keep 35% (PMC) or 29% (QMC) of the terms.
12. Least squares detrending = a weighted quadrature rule
Least squares detrending is equivalent to using a weighted
quadrature rule
The integral of a least squares detrending fit through the data points can
be represented as a weighted quadrature rule:
I(f) = ∫_Ω f(x) dx = Σ_i ∫_{Ω_i} f(x) dx = Σ_i w_i f̄_i ≈ Σ_i ŵ_i f(x_i),

where f̄_i is the mean of f over the Voronoi cell Ω_i (with volume w_i) around x_i, and ŵ_i ≈ w_i. The error depends on (f̄_i − f(x_i)) and (w_i − ŵ_i).
When the sample points have low discrepancy, then w_i ≈ ŵ_i = 1/n is a good approximation.
Can this be improved if we replace the weights with a better approximation of the Voronoi volume?
13. Voronoi Weighted Quadrature
2. Voronoi Weighted Quadrature
The Voronoi weighted quadrature rule is defined as

In(f) = Σ_{i=1}^{n} w_i f(x_i),

where w_i is the Voronoi volume associated with the sample x_i: the volume of the region closer to x_i than to any other sample point.
• The Voronoi weighted quadrature rule, In(f), is exact if f(x) is piecewise constant over each Voronoi volume.
• Solving for the exact Voronoi volumes is expensive in high dimensions
and suffers from the curse of dimensionality.
• Solution: Use LDS to approximate these volumes.
14. Voronoi Weighted Quadrature
Voronoi Weighted Quadrature
The weights for the Voronoi quadrature rule

In(f) = Σ_i w_i f(x_i)

can be approximated using nearest neighbors of a dense reference LDS.
Step 1: Generate a dense LDS, {x̂_j}, with N_LDS points.
Step 2: Compute the distance from each LDS point to the original sample set.
Step 3: Define W_i as the number of LDS points closest to x_i.
Step 4: Rescale these counts to define the weights w_i = W_i / N_LDS (and normalize by the domain volume, if needed).
The weights w_i converge to the Voronoi volumes as O(1/N_LDS).
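Steps 1-4 above can be sketched directly; the Sobol generator and the KD-tree are illustrative choices for the LDS and the nearest-neighbor search, and the integrand is an arbitrary smooth test function.

```python
import numpy as np
from scipy.stats import qmc
from scipy.spatial import cKDTree

def voronoi_weighted_quadrature(f, samples, n_lds=2**14, seed=0):
    """Approximate each sample's Voronoi volume in [0,1]^d by the fraction
    of a dense low-discrepancy (Sobol) point set nearest to it, then form
    the weighted quadrature sum In(f) = sum_i w_i f(x_i)."""
    n, d = samples.shape
    lds = qmc.Sobol(d, seed=seed).random(n_lds)   # Step 1: dense LDS
    _, nearest = cKDTree(samples).query(lds)      # Step 2: nearest sample index
    counts = np.bincount(nearest, minlength=n)    # Step 3: counts W_i
    w = counts / n_lds                            # Step 4: weights (unit cube)
    return w @ f(samples)

rng = np.random.default_rng(2)
X = rng.random((200, 3))
f = lambda x: np.cos(x).sum(axis=1)   # exact integral over [0,1]^3 is 3*sin(1)
est = voronoi_weighted_quadrature(f, X)
```

The KD-tree keeps the nearest-neighbor search cheap, so the cost is dominated by the size of the reference LDS rather than by the expensive samples.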
15. Voronoi Weighted Quadrature
Voronoi Volumes Estimated by Low Discrepancy Sample
The fraction of LDS samples nearest to each sample is used to estimate
the Voronoi volume for the sample as a fraction of the domain volume.
The approach also works for samples living in an irregular "blob" subdomain.
16. Voronoi Weighted Quadrature
Simple 3D example
[Figure: log10(mean error) vs. log10(number of samples) for the trig integration error in 3D. Left panel: MC error vs. MC Voronoi error. Right panel: LDS error vs. LDS Voronoi error.]
In 3D, the Voronoi weights are much more effective in reducing the errors when the original sample is iid MC than when it is an LDS QMC sample.
In(f) = Σ_i ŵ_i f(x_i), where ŵ_i is the estimate of the Voronoi volume of x_i.
17. Voronoi Weighted Quadrature
Simple 6D example
[Figure: log10(mean error) vs. log10(number of samples) for the trig integration error in 6D. Left panel: MC error vs. MC Voronoi error. Right panel: LDS error vs. LDS Voronoi error.]
In 6D, the Voronoi weighted quadrature reduces the errors for iid MC samples. The approach is not effective for LDS in higher dimensions.
We are looking for ideas to explain why the Voronoi weighted quadrature approach is less effective for LDS in higher dimensions.
18. Surrogate Model Quadrature
3. Surrogate Model Quadrature
Interpolate samples to a dense LDS, and use standard QMC quadrature on the surrogate points.
Step 1: Generate a dense LDS sample, {x̂_j}, with N_LDS points.
Step 2: Use kriging to approximate f̂(x̂_j) at the LDS points.
Step 3: Estimate the integral by

I(f) ≈ (1/N_LDS) Σ_j f̂(x̂_j).
We use the DACE kriging package with a quadratic polynomial basis
based on distance-weighted least-squares with radial Gaussian weights.
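A minimal sketch of the three steps, using a hand-rolled constant-mean Gaussian-kernel kriging model as a stand-in for the DACE package used in the slides; the fixed length scale and nugget are illustrative assumptions, not tuned hyperparameters.

```python
import numpy as np
from scipy.stats import qmc

def surrogate_lds_quadrature(f, samples, n_lds=2**12, length=0.5, nugget=1e-6):
    """Krige the expensive samples onto a dense Sobol set in [0,1]^d and
    average the surrogate values, approximating I(f) by the QMC mean."""
    n, d = samples.shape
    fx = f(samples)
    lds = qmc.Sobol(d, seed=0).random(n_lds)          # Step 1: dense LDS
    def k(a, b):
        # Gaussian covariance between two point sets
        sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-sq / (2.0 * length**2))
    K = k(samples, samples) + nugget * np.eye(n)      # regularized Gram matrix
    mu = fx.mean()                                    # constant prior mean
    alpha = np.linalg.solve(K, fx - mu)
    f_hat = mu + k(lds, samples) @ alpha              # Step 2: kriging mean at LDS
    return f_hat.mean()                               # Step 3: QMC average

rng = np.random.default_rng(3)
X = rng.random((150, 3))
f = lambda x: np.cos(x).sum(axis=1)   # exact integral over [0,1]^3 is 3*sin(1)
est = surrogate_lds_quadrature(f, X)
```

Centering on the sample mean keeps the prediction from shrinking toward zero in regions far from the samples; the nugget regularizes the otherwise ill-conditioned Gaussian Gram matrix.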
19. Surrogate Model Quadrature
Simple 3D example
[Figure: log10(mean error) vs. log10(number of samples) for the trig integration error in 3D. Left panel: MC error vs. MC SLDS error. Right panel: LDS error vs. LDS SLDS error.]
In 3D, the surrogate data points are effective in reducing the errors for both iid MC and LDS samples.
20. Surrogate Model Quadrature
Simple 6D example
[Figure: log10(mean error) vs. log10(number of samples) for the trig integration error in 6D. Left panel: MC error vs. MC SLDS error. Right panel: LDS error vs. LDS SLDS error.]
In 6D, the surrogate data points are effective in reducing the errors for both iid MC and LDS samples.
21. Comparing Voronoi and Surrogate Model Quadrature
The surrogate quadrature is consistently better than the
Voronoi quadrature
[Figure: log10(mean error) vs. log10(number of samples) for the trig integration error in 3D. Left panel: MC error, MC Voronoi error, MC SLDS error. Right panel: LDS error, LDS Voronoi error, LDS SLDS error.]
Both methods reduce the errors in this 3D problem.
22. Comparing Voronoi and Surrogate Model Quadrature
The surrogate quadrature is consistently better than the
Voronoi quadrature
[Figure: log10(mean error) vs. log10(number of samples) for the trig integration error in 6D. Left panel: MC error, MC Voronoi error, MC SLDS error. Right panel: LDS error, LDS Voronoi error, LDS SLDS error.]
• In this 6D problem, both methods reduce the error when the original sample is not an LDS.
• When the original sample is an LDS, the Voronoi quadrature doesn’t improve the accuracy, while the surrogate model continues to be effective.
23. Adaptive Sampling Quadrature
4. Adaptive Sampling Based on Kriging SE Estimates
Instead of adding new samples to ’fill in the holes’ of the existing distribution, use kriging error estimates to guide future samples.
Iterate until converged, or until the maximum number of function values is reached:
Step 1: Generate a dense LDS sample, {x̂_j}, with N_LDS points.
Step 2: Use kriging to approximate the function, f̂(x̂_j), and estimate the standard errors, SE_j, at the LDS points.
Step 3: If max{SE_j} > tolerance, then evaluate the function at the point with the largest SE_j, and return to Step 2.
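The adaptive loop can be sketched as follows, reusing a simple fixed-hyperparameter Gaussian kriging model (an illustrative stand-in; in practice the hyperparameters would be refit) and ranking a Sobol candidate set by its predictive variance, which has the same argmax as the standard error.

```python
import numpy as np
from scipy.stats import qmc

def adaptive_kriging_sampling(f, X, y, n_new=10, length=0.3, nugget=1e-8):
    """Repeatedly fit a Gaussian-kernel kriging model, score a dense Sobol
    candidate set by predictive variance, and evaluate f where the
    uncertainty is largest (Steps 1-3 above)."""
    d = X.shape[1]
    cand = qmc.Sobol(d, seed=0).random(2**10)     # Step 1: dense LDS candidates
    def k(a, b):
        sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-sq / (2.0 * length**2))
    for _ in range(n_new):
        K = k(X, X) + nugget * np.eye(len(X))
        Kc = k(cand, X)
        # Step 2: predictive variance (unit prior variance) at each candidate
        var = 1.0 - np.einsum('ij,ij->i', Kc, np.linalg.solve(K, Kc.T).T)
        j = int(np.argmax(var))                   # Step 3: largest SE
        X = np.vstack([X, cand[j]])
        y = np.append(y, f(cand[j:j + 1])[0])
    return X, y

f = lambda x: np.cos(x).sum(axis=1)
rng = np.random.default_rng(4)
X0 = rng.random((20, 2))
X, y = adaptive_kriging_sampling(f, X0, f(X0))
```

Once a candidate is evaluated, its predictive variance collapses to the nugget level, so the loop naturally moves on to the next largest hole in the design.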
24. Adaptive Sampling Quadrature
Initial Random Sample
[Figure: two panels over [0,1]^2 showing the initial random samples and the kriging standard error estimates.]
• = current samples. Large standard errors are shown as red circles and smaller errors as blue.
The next sample will be evaluated at the point with the largest SE.
25. Adaptive Sampling Quadrature
First and second adaptive samples are in the corners
[Figure: two panels over [0,1]^2 after the first and second adaptive samples.]
Large standard errors are shown as red circles and smaller errors as blue.
• = current samples; • = largest SE and next sample.
26. Adaptive Sampling Quadrature
The new samples fill in the holes to reduce the uncertainty
[Figure: two panels over [0,1]^2 after additional adaptive samples.]
Large standard errors are shown as red circles and smaller errors as blue.
• = current samples; • = largest SE and next sample.
27. Future Research Questions
Future Research Questions
Continue exploring surrogate models for guiding adaptive sampling.
Develop theory for how many LDS surrogate samples are needed for the
Voronoi weights and surrogate quadrature methods.
Use the surrogate approach to interpolate to a sparse grid, instead of the
LDS, and use higher order quadrature rules.
Combine the surrogate LDS methods with the detrending approaches.
Develop kriging methods that preserve local positivity, monotonicity, and
convexity of the data for both design of experiment surrogate models and
surrogate quadrature methods.