I will review the discussions of the working group on statistical analysis and algorithms for situations where generating observations of multivariate functions is expensive, such as running a time-consuming computer code, conducting medical trials, performing large-scale computer simulations (e.g., climate models), or collecting financial market data. This can result in having only a few samples of high-dimensional input variables. Our approaches are based on the assumption that the cost of the observations will far exceed the computational cost of the post-processing algorithms. I will describe our discussions on how to extract the most information out of the available small samples and how to adaptively identify future samples to minimize the uncertainty in the quantities of interest.
Small Sample Analysis and Algorithms for Multivariate Functions
1. Small Sample Analysis and Algorithms for Multivariate Functions
Mac Hyman
Tulane University
Joint work with Lin Li, Jeremy Dewar, and Mu Tian (SUNY),
SAMSI WG5, May 7, 2018
Hyman, Li, Dewar and Tian Small Sample Analysis SAMSI WG5 1 / 27
2. The Problem: Accurate integration of multivariate functions
Goal: To estimate the integral I = ∫_Ω f(x) dx,
where f(x) : Ω → R, Ω ⊂ R^d.
We are focused on situations where:
• There are few samples (n < a few thousand), and the effective dimension is relatively small, x ∈ R^d, d < 50;
• Function evaluations (samples) f(x) are (very) expensive, such as a large-scale simulation, and additional samples may not be obtainable;
• Little a priori information about f(x) is available; and
• We might not have control over the sample locations, which can be far from a desired distribution (e.g., missing data).
Identify new sample locations to minimize the MSE based on existing information.
3. The Problem: Accurate integration of multivariate functions
Goal: To estimate the integral I = ∫_Ω f(x) dx,
where f(x) : Ω → R, Ω ⊂ R^d.
Four approaches that work pretty well in practice.
How do they work in theory?
1. Detrending using covariates
2. Voronoi Weighted Quadrature
3. Surrogate Model Quadrature
4. Adaptive Sampling Based on Kriging SE Estimates
4. Detrending before Integrating
1. Detrending using covariates
Detrending first approximates the underlying function with an easily
integrated surrogate model (covariate).
The integral is then estimated by the exact integral of surrogate +
an approximation of the residual.
For example, f(x) can be approximated by a linear combination of simple basis functions, such as Legendre polynomials, p(x) = Σ_{i=1}^{t} β_i ψ_i(x), which can be integrated exactly, and define

I(f) = ∫_Ω f(x) dx (1)
     = ∫_Ω p(x) dx + ∫_Ω [f(x) − p(x)] dx. (2)

The goal is to pick p(x) to minimize the residual ∫_Ω [f(x) − p(x)] dx.
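The detrending recipe above can be sketched in a few lines. This is a minimal illustration, assuming an additive quadratic surrogate (the slides use Legendre polynomial bases) whose basis functions have known exact integrals over [0,1]^d; the test integrand and sample sizes are arbitrary choices for the example.

```python
import numpy as np

def detrended_mc_integral(f, samples):
    """Estimate I = integral of f over [0,1]^d by detrending: fit an
    additive quadratic surrogate p(x) = b0 + sum_i (bi*xi + ci*xi^2) by
    least squares, integrate p exactly, and add the plain Monte Carlo
    mean of the residual f - p."""
    n, d = samples.shape
    fx = f(samples)
    # Design matrix for the basis [1, x_1..x_d, x_1^2..x_d^2]
    A = np.hstack([np.ones((n, 1)), samples, samples**2])
    beta, *_ = np.linalg.lstsq(A, fx, rcond=None)
    # Exact integrals of the basis functions over [0,1]^d: 1, 1/2, 1/3
    exact_part = beta @ np.concatenate([[1.0], np.full(d, 0.5), np.full(d, 1/3)])
    residual_part = np.mean(fx - A @ beta)
    return exact_part + residual_part

rng = np.random.default_rng(0)
X = rng.random((600, 3))
f = lambda x: np.cos(x).sum(axis=1)   # exact integral over [0,1]^3 is 3*sin(1)
est = detrended_mc_integral(f, X)
```

Because the surrogate absorbs most of the variation of f, the Monte Carlo error acts only on the small residual, as in the error bounds on the next slide.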
5. Detrending before Integrating
The error for the detrended integral is proportional to the
standard deviation of the residual p(x) − f (x), not f (x)
The residual errors are the only errors in the integration approximation
I(f) = ∫_Ω f(x) dx (3)
     ≈ Î(f) = ∫_Ω p(x) dx + (1/n) Σ_{i=1}^{n} [f(x_i) − p(x_i)]. (4)

PMC error bound: ||e_n|| = O(σ(f − p)/√n), and
QMC error bound: ||e_n|| ≤ O(V[f − p](log n)^{d−1}/n)
1. The error bounds are based on σ(f − p) and V [f − p] instead of σ(f )
and V [f ]. The least squares fit reduces these quantities.
2. The convergence rates are the same; the constants are reduced.
6. Quintic detrending reduces MC and QMC errors by a factor of 100
Error Î(f) − I(f) distributions for I(f) = ∫_{[0,1]^6} Π_i cos(i x_i) dx
Error Distributions (6D, 600 points) for PMC (top) and LDS QMC
(bottom) for detrending with a cubic and quintic, K = 3, 5, polynomial.
The x-axis bounds are 10 times smaller for the LDS/QMC samples.
7. Detrending reduces the error constant, not the convergence rates
Detrending doesn’t change the convergence rates: PMC (O(n^{−1/2})) and QMC (O(n^{−1})) for ∫_{[0,1]^6} Π_i cos(i x_i) dx
Errors for PMC (upper lines −−) and QMC (lower lines − · −) for
constant K = 0 (left), cubic K = 3 (center), and quintic K = 5 (right).
8. Detrending reduces the error constant, not the convergence rates
Mean errors for ∫_{[0,1]^5} Π_i cos(i x_i) dx with detrending
Detrending errors for degrees K = 0, 1, 2, 3, 4, 5 and 500-4000 samples.
The convergence rates don’t change, but the constant is reduced by a factor of roughly 1000.
9. Curse of Dimensionality for polynomial detrending
High degree polynomials in high dimensions are quickly
constrained by the Curse of Dimensionality
For the least squares coefficients to be identifiable, the number of samples
must be ≥ the number of coefficients in the detrending function.
Degree \ Dimension    1     2     3      4      5       10        20
0                     1     1     1      1      1       1         1
1                     2     3     4      5      6       11        21
2                     3     6     10     15     21      66        231
3                     4     10    20     35     56      286       1,771
4                     5     15    35     70     126     1,001     10,626
5                     6     21    56     126    252     3,003     53,130
10                    11    66    286    1,001  3,003   184,756   30,045,015
The mixed-variable terms in multivariate polynomials create an explosion in the number of terms, C(d + K, K) for total degree K in dimension d, as a function of the degree and dimension.
For example, a 5th degree polynomial in 20 dimensions has 53,130 terms.
The complexity of this approach grows linearly in the number of
basis functions, and as O(n3) as the number of samples increases.
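The table entries are the binomial coefficients C(d + K, K), the number of monomials of total degree at most K in d variables, which can be checked directly:

```python
from math import comb

def n_terms(dim, degree):
    """Number of coefficients in a full multivariate polynomial of
    total degree <= degree in dim variables: C(dim + degree, degree)."""
    return comb(dim + degree, degree)

print(n_terms(20, 5))   # a 5th degree polynomial in 20 dimensions
print(n_terms(20, 10))  # degree 10 in 20 dimensions
```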
10. Sparse model selection
The L_p sparse penalty regularization method:

p(x) = Σ_{i=1}^{t} β_i ψ_i(x) (5)
β = argmin_β { (1/2)||Aβ − f||_2^2 + (λ/p)||β||_p^p } (6)

This system can be solved using a cyclic coordinate descent algorithm, or by factored iteratively reweighted least-squares (IRLS), solving a linear system of n equations (n = number of samples) on each iteration.
If the function f (x) varies along some directions more than others, then
sparse subset selection extracts the appropriate basis functions based on
the effective dimension of the active subspaces.
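As an illustration of the sparse penalty in (6), here is a minimal sketch of the p = 1 case solved by iterative soft-thresholding (ISTA), a proximal-gradient alternative to the cyclic coordinate descent or IRLS mentioned above; the problem sizes, λ, and ground truth are arbitrary choices for the example.

```python
import numpy as np

def lasso_ista(A, f, lam, n_iter=500):
    """Solve min_beta 0.5*||A beta - f||_2^2 + lam*||beta||_1
    by iterative soft-thresholding (proximal gradient descent)."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ beta - f)
        z = beta - grad / L                # gradient step
        beta = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return beta

rng = np.random.default_rng(1)
A = rng.standard_normal((100, 30))
true_beta = np.zeros(30)
true_beta[[2, 7]] = [1.5, -2.0]            # sparse ground truth
f = A @ true_beta + 0.01 * rng.standard_normal(100)
beta = lasso_ista(A, f, lam=0.5)
```

The penalty drives the coefficients of inactive basis functions to exactly zero, which is how the sparse subset selection extracts the active directions.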
11. Sparse model selection
Sparse subset detrending allows high degree polynomial
dictionaries for sparse sample distributions.
∫_{[0,1]^6} Π_i cos(i x_i) dx; errors for PMC (top) and QMC (bottom), degrees K = 0, 3, 5. The K = 5 fits keep 35% (PMC) or 29% (QMC) of the terms.
12. Least squares detrending = a weighted quadrature rule
Least squares detrending is equivalent to using a weighted
quadrature rule
The integral of a least squares detrending fit through the data points can
be represented as a weighted quadrature rule:
I(f) = ∫_Ω f(x) dx = Σ_i ∫_{Ω_i} f(x) dx = Σ_i w_i f̄_i ≈ Σ_i ŵ_i f(x_i),

where f̄_i is the mean of f over the Voronoi cell Ω_i (with volume w_i) around x_i, and ŵ_i ≈ w_i. The error depends on (f̄_i − f(x_i)) and (w_i − ŵ_i).
When the sample points have low discrepancy, then w_i ≈ ŵ_i = 1/n is a good approximation.
Can this be improved if we replace the weights with a better approximation of the Voronoi volume?
13. Voronoi Weighted Quadrature
2. Voronoi Weighted Quadrature
The Voronoi weighted quadrature rule is defined as

In(f) = Σ_{i=1}^{n} w_i f(x_i),

where w_i is the Voronoi volume associated with the sample x_i: the volume of the region closer to x_i than to any other sample point.
• The Voronoi weighted quadrature rule, In(f), is exact if f(x) is piecewise constant over each Voronoi volume.
• Solving for the exact Voronoi volumes is expensive in high dimensions
and suffers from the curse of dimensionality.
• Solution: Use LDS to approximate these volumes.
14. Voronoi Weighted Quadrature
Voronoi Weighted Quadrature
The weights for the Voronoi quadrature rule

In(f) = Σ_i w_i f(x_i)

can be approximated using nearest neighbors of a dense reference LDS.
Step 1: Generate a dense LDS, {x̂_j}, with N_LDS points.
Step 2: Compute the distance from each LDS point to the original sample set.
Step 3: Define W_i as the number of LDS points closest to x_i.
Step 4: Rescale these counts to define the weights w_i = W_i / N_LDS (and normalize by the domain volume, if needed).
The weights w_i converge to the Voronoi volumes as O(1/N_LDS).
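Steps 1-4 above can be sketched directly; the Sobol generator and the KD-tree are illustrative choices for the LDS and the nearest-neighbor search, and the integrand is an arbitrary smooth test function.

```python
import numpy as np
from scipy.stats import qmc
from scipy.spatial import cKDTree

def voronoi_weighted_quadrature(f, samples, n_lds=2**14, seed=0):
    """Approximate each sample's Voronoi volume in [0,1]^d by the fraction
    of a dense low-discrepancy (Sobol) point set nearest to it, then form
    the weighted quadrature sum In(f) = sum_i w_i f(x_i)."""
    n, d = samples.shape
    lds = qmc.Sobol(d, seed=seed).random(n_lds)   # Step 1: dense LDS
    _, nearest = cKDTree(samples).query(lds)      # Step 2: nearest sample index
    counts = np.bincount(nearest, minlength=n)    # Step 3: counts W_i
    w = counts / n_lds                            # Step 4: weights (unit cube)
    return w @ f(samples)

rng = np.random.default_rng(2)
X = rng.random((200, 3))
f = lambda x: np.cos(x).sum(axis=1)   # exact integral over [0,1]^3 is 3*sin(1)
est = voronoi_weighted_quadrature(f, X)
```

The KD-tree keeps the nearest-neighbor search cheap, so the cost is dominated by the size of the reference LDS rather than by the expensive samples.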
15. Voronoi Weighted Quadrature
Voronoi Volumes Estimated by Low Discrepancy Sample
The fraction of LDS samples nearest to each sample is used to estimate
the Voronoi volume for the sample as a fraction of the domain volume.
The approach also works for samples living in an irregular "blob" subdomain.
16. Voronoi Weighted Quadrature
Simple 3D example
[Figure: log10(mean error) vs. log10(number of samples) for the trig integration error in 3D. Left panel: MC error vs. MC Voronoi error. Right panel: LDS error vs. LDS Voronoi error.]
In 3D, the Voronoi weights are much more effective in reducing the errors when the original sample is iid MC than when it is an LDS QMC sample.
In(f) = Σ_i ŵ_i f(x_i), where ŵ_i is the estimate of the Voronoi volume of x_i.
17. Voronoi Weighted Quadrature
Simple 6D example
[Figure: log10(mean error) vs. log10(number of samples) for the trig integration error in 6D. Left panel: MC error vs. MC Voronoi error. Right panel: LDS error vs. LDS Voronoi error.]
In 6D, the Voronoi weighted quadrature reduces the errors for iid MC samples. The approach is not effective for LDS in higher dimensions.
We are looking for ideas to explain why the Voronoi weighted quadrature approach is less effective for LDS in higher dimensions.
18. Surrogate Model Quadrature
3. Surrogate Model Quadrature
Interpolate samples to a dense LDS, and use standard QMC quadrature on the surrogate points.
Step 1: Generate a dense LDS sample, {x̂_j}, with N_LDS points.
Step 2: Use kriging to approximate f̂(x̂_j) at the LDS points.
Step 3: Estimate the integral by

I(f) ≈ (1/N_LDS) Σ_j f̂(x̂_j).
We use the DACE kriging package with a quadratic polynomial basis
based on distance-weighted least-squares with radial Gaussian weights.
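A minimal sketch of the three steps, using a hand-rolled constant-mean Gaussian-kernel kriging model as a stand-in for the DACE package used in the slides; the fixed length scale and nugget are illustrative assumptions, not tuned hyperparameters.

```python
import numpy as np
from scipy.stats import qmc

def surrogate_lds_quadrature(f, samples, n_lds=2**12, length=0.5, nugget=1e-6):
    """Krige the expensive samples onto a dense Sobol set in [0,1]^d and
    average the surrogate values, approximating I(f) by the QMC mean."""
    n, d = samples.shape
    fx = f(samples)
    lds = qmc.Sobol(d, seed=0).random(n_lds)          # Step 1: dense LDS
    def k(a, b):
        # Gaussian covariance between two point sets
        sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-sq / (2.0 * length**2))
    K = k(samples, samples) + nugget * np.eye(n)      # regularized Gram matrix
    mu = fx.mean()                                    # constant prior mean
    alpha = np.linalg.solve(K, fx - mu)
    f_hat = mu + k(lds, samples) @ alpha              # Step 2: kriging mean at LDS
    return f_hat.mean()                               # Step 3: QMC average

rng = np.random.default_rng(3)
X = rng.random((150, 3))
f = lambda x: np.cos(x).sum(axis=1)   # exact integral over [0,1]^3 is 3*sin(1)
est = surrogate_lds_quadrature(f, X)
```

Centering on the sample mean keeps the prediction from shrinking toward zero in regions far from the samples; the nugget regularizes the otherwise ill-conditioned Gaussian Gram matrix.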
19. Surrogate Model Quadrature
Simple 3D example
[Figure: log10(mean error) vs. log10(number of samples) for the trig integration error in 3D. Left panel: MC error vs. MC SLDS error. Right panel: LDS error vs. LDS SLDS error.]
In 3D, the surrogate data points are effective in reducing the errors for both iid MC and LDS samples.
20. Surrogate Model Quadrature
Simple 6D example
[Figure: log10(mean error) vs. log10(number of samples) for the trig integration error in 6D. Left panel: MC error vs. MC SLDS error. Right panel: LDS error vs. LDS SLDS error.]
In 6D, the surrogate data points are effective in reducing the errors for both iid MC and LDS samples.
21. Comparing Voronoi and Surrogate Model Quadrature
The surrogate quadrature is consistently better than the
Voronoi quadrature
[Figure: log10(mean error) vs. log10(number of samples) for the trig integration error in 3D. Left panel: MC error, MC Voronoi error, MC SLDS error. Right panel: LDS error, LDS Voronoi error, LDS SLDS error.]
Both methods reduce the errors in this 3D problem.
22. Comparing Voronoi and Surrogate Model Quadrature
The surrogate quadrature is consistently better than the
Voronoi quadrature
[Figure: log10(mean error) vs. log10(number of samples) for the trig integration error in 6D. Left panel: MC error, MC Voronoi error, MC SLDS error. Right panel: LDS error, LDS Voronoi error, LDS SLDS error.]
• In this 6D problem, both methods reduce the error when the original sample is not an LDS.
• When the original sample is an LDS, the Voronoi quadrature doesn’t improve the accuracy, while the surrogate model continues to be effective.
23. Adaptive Sampling Quadrature
4. Adaptive Sampling Based on Kriging SE Estimates
Instead of adding new samples to ’fill in the holes’ of the existing distribution, use kriging error estimates to guide future samples.
Iterate until converged, or until the maximum number of function values is reached:
Step 1: Generate a dense LDS sample, {x̂_j}, with N_LDS points.
Step 2: Use kriging to approximate the function, f̂(x̂_j), and estimate the standard errors, SE_j, at the LDS points.
Step 3: If max{SE_j} > tolerance, then evaluate the function at the point with the largest SE_j, and return to Step 2.
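The adaptive loop can be sketched as follows, reusing a simple fixed-hyperparameter Gaussian kriging model (an illustrative stand-in; in practice the hyperparameters would be refit) and ranking a Sobol candidate set by its predictive variance, which has the same argmax as the standard error.

```python
import numpy as np
from scipy.stats import qmc

def adaptive_kriging_sampling(f, X, y, n_new=10, length=0.3, nugget=1e-8):
    """Repeatedly fit a Gaussian-kernel kriging model, score a dense Sobol
    candidate set by predictive variance, and evaluate f where the
    uncertainty is largest (Steps 1-3 above)."""
    d = X.shape[1]
    cand = qmc.Sobol(d, seed=0).random(2**10)     # Step 1: dense LDS candidates
    def k(a, b):
        sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-sq / (2.0 * length**2))
    for _ in range(n_new):
        K = k(X, X) + nugget * np.eye(len(X))
        Kc = k(cand, X)
        # Step 2: predictive variance (unit prior variance) at each candidate
        var = 1.0 - np.einsum('ij,ij->i', Kc, np.linalg.solve(K, Kc.T).T)
        j = int(np.argmax(var))                   # Step 3: largest SE
        X = np.vstack([X, cand[j]])
        y = np.append(y, f(cand[j:j + 1])[0])
    return X, y

f = lambda x: np.cos(x).sum(axis=1)
rng = np.random.default_rng(4)
X0 = rng.random((20, 2))
X, y = adaptive_kriging_sampling(f, X0, f(X0))
```

Once a candidate is evaluated, its predictive variance collapses to the nugget level, so the loop naturally moves on to the next largest hole in the design.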
24. Adaptive Sampling Quadrature
Initial Random Sample
[Figure: two panels over [0,1]^2 showing the initial random samples and the kriging standard error estimates.]
• = current samples. Large standard errors are shown as red circles and smaller errors as blue.
The next sample will be evaluated at the point with the largest SE.
25. Adaptive Sampling Quadrature
First and second adaptive samples are in the corners
[Figure: two panels over [0,1]^2 after the first and second adaptive samples.]
Large standard errors are shown as red circles and smaller errors as blue.
• = current samples; • = largest SE and next sample.
26. Adaptive Sampling Quadrature
The new samples fill in the holes to reduce the uncertainty
[Figure: two panels over [0,1]^2 after additional adaptive samples.]
Large standard errors are shown as red circles and smaller errors as blue.
• = current samples; • = largest SE and next sample.
27. Future Research Questions
Future Research Questions
Continue exploring surrogate models for guiding adaptive sampling.
Develop theory for how many LDS surrogate samples are needed for the
Voronoi weights and surrogate quadrature methods.
Use the surrogate approach to interpolate to a sparse grid, instead of the
LDS, and use higher order quadrature rules.
Combine the surrogate LDS methods with the detrending approaches.
Develop kriging methods that preserve local positivity, monotonicity, and
convexity of the data for both design of experiment surrogate models and
surrogate quadrature methods.