This paper applies inverse transform sampling to select training points for surrogate models. Inverse transform sampling generates a sequence of uniformly distributed real numbers between 0 and 1, which serve as probabilities at the sample points; the coordinates of the sample points are then obtained by evaluating the inverse of the Cumulative Distribution Function (CDF) at these probabilities. The inputs to the surrogate models are assumed to be independent random variables. Sample points obtained by inverse transform sampling effectively represent the frequency of occurrence of the inputs: the distributions of the inputs are fitted to their observed data, these fitted distributions are used for the inverse transform sampling, and the resulting sample points are denser in regions where the Probability Density Functions (PDF) are higher. This sampling approach thus concentrates sample points in the regions of the input space that occur most frequently in the observations of the random variables. Inverse transform sampling is applied to the development of surrogate models for window performance evaluation. Distributions are fitted for three climatic conditions: (i) the outside temperature, (ii) the wind speed, and (iii) the solar radiation. The sample climatic conditions obtained by inverse transform sampling are used as training points to evaluate the heat transfer through a generic triple-pane window. Using the simulation results at these sample points, surrogate models are developed to represent the heat transfer through the window as a function of the climatic conditions. It is observed that surrogate models developed using inverse transform sampling can provide higher accuracy for the window performance evaluation than those developed using the Sobol sequence directly.
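A minimal Python sketch of the sampling procedure described above, assuming the three climatic inputs have already been fitted with scipy.stats distributions; the specific distribution families and parameter values below are illustrative assumptions, not the forms fitted in the paper:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)
n_samples = 200

# Illustrative fitted marginals (the paper fits these to observed data;
# the families and parameters here are assumptions for demonstration).
temperature = stats.norm(loc=10.0, scale=8.0)      # outside temperature [deg C]
wind_speed = stats.weibull_min(c=2.0, scale=6.0)   # wind speed [m/s]
solar_rad = stats.gamma(a=1.5, scale=200.0)        # solar radiation [W/m^2]

# Inverse transform sampling: draw uniform probabilities in (0, 1), then
# map them through each inverse CDF (scipy exposes this as .ppf).
u = rng.uniform(size=(n_samples, 3))
training_points = np.column_stack([
    temperature.ppf(u[:, 0]),
    wind_speed.ppf(u[:, 1]),
    solar_rad.ppf(u[:, 2]),
])
# Each row is one (temperature, wind speed, solar radiation) training point;
# points fall more densely where the fitted PDFs are higher.
```

Because the inputs are treated as independent, a separate uniform stream is inverted through each marginal CDF; substituting a Sobol sequence (e.g., scipy.stats.qmc.Sobol) for the uniform draws before the ppf step would give the quasi-random alternative against which the paper compares.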
- Project Title: Seoul City Weather Data Analysis
- Course name: Principles and Practice in Data Mining
- Semester: Autumn 2016
- Professor: Yuran SEO
- Sungkyunkwan University
- Department: Consumer & Family Science
- Name: Lee dong hee
- Contact: molou@naver.com
Quality control of rain gauge measurements using telecommunication microwave ... - Joerg Rieckermann
Accurate rain rate measurements are essential for many hydrological applications. Although the rain gauge remains the reference instrument for measuring rain rate, the strong spatial and temporal variability of rainfall makes it difficult to spot faulty rain gauges. Owing to the poor spatial representativeness of point rainfall measurements, this is particularly difficult where gauge density is low. Taking advantage of the high density of telecommunication microwave links in urban areas, a consistency check is proposed to identify faulty rain gauges using nearby microwave links. The methodology is tested on a data set from operational rain gauges and microwave links in Zürich (Switzerland). Malfunctions of rain gauges leading to errors in the occurrence of dry/rainy periods are well identified. In addition, the gross errors affecting quantitative rain gauge measurements during rainy periods, such as blocking at a constant value, random noise, and systematic bias, can be detected. The proposed approach can be implemented in real time.
Use of Probabilistic Statistical Techniques in AERMOD Modeling Evaluations - Sergio A. Guerra
The advent of the short-term National Ambient Air Quality Standards (NAAQS) prompted modelers to reassess common practices in dispersion modeling analyses. The probabilistic nature of the new short-term standards also opens the door to alternative modeling techniques that are based on probability. One of these is the Monte Carlo technique, which can be used to account for emission variability in permit modeling.
Currently, it is assumed that a given emission unit is in operation at its maximum capacity every hour of the year. This assumption may be appropriate for facilities that operate at full capacity most of the time. However, in most cases, emission units operate at variable loads that produce variable emissions. Thus, assuming constant maximum emissions is overly conservative for facilities such as power plants that are not in operation all the time and which exhibit high concentrations during very short periods of time.
Another element of conservatism in NAAQS demonstrations relates to combining predicted concentrations from the AMS/EPA Regulatory Model (AERMOD) with observed (monitored) background concentrations. Normally, some of the highest monitored observations are added to the AERMOD results yielding a very conservative combined concentration.
A case study is presented to evaluate the use of alternative probabilistic methods to complement the shortcomings of current dispersion modeling practices. This case study includes the use of the Monte Carlo technique and the use of a reasonable background concentration to combine with the AERMOD predicted concentrations. The use of these methods is in harmony with the probabilistic nature of the NAAQS and can help demonstrate compliance through dispersion modeling analyses, while still being protective of the NAAQS.
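As a rough, self-contained illustration of the Monte Carlo idea described in this abstract (the on/off fraction, load distribution, and emission rate below are hypothetical placeholders, not the presentation's inputs):

```python
import numpy as np

rng = np.random.default_rng(1)
n_hours = 8760  # one year of hourly operation

# Hypothetical load profile: the unit is off about 40% of hours and
# otherwise runs at a variable fraction of capacity, rather than at
# 100% of capacity every hour of the year.
on = rng.random(n_hours) > 0.4
load = np.where(on, rng.beta(a=5, b=2, size=n_hours), 0.0)

max_emission = 12.0              # g/s at maximum capacity (hypothetical)
hourly_emissions = load * max_emission

# The constant-maximum assumption overstates the emission profile:
print("Monte Carlo mean emission rate:", hourly_emissions.mean())
print("Constant-maximum assumption:  ", max_emission)
```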
Estimating Ammonia Emissions from Livestock Operations Using Low-Cost, Time-A... - LPE Learning Center
For more: http://www.extension.org/67697 Recent EPA regulations on ammonia (NH3) and other gaseous emissions require managers of animal feeding operations (AFOs) to report their annual emissions of greenhouse gases (GHGs), with the possibility of federal funding in the near future being allocated to enforce GHG reporting and to levy large fines against AFOs that exceed the regulatory limits for GHG emissions. The current method of estimating NH3 emissions for AFOs is a "back of the envelope" calculation based upon the population and type of animal within an individual AFO; a sketch of this kind of calculation follows.
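A minimal version of such a population-times-emission-factor estimate; the per-head factors below are hypothetical placeholders, not regulatory values:

```python
# Back-of-the-envelope NH3 estimate: population x per-animal emission factor.
# The factors below are illustrative placeholders, not regulatory values.
emission_factor_kg_per_head_yr = {"dairy_cow": 28.0, "swine": 6.5, "broiler": 0.2}
herd = {"dairy_cow": 1200, "swine": 400, "broiler": 0}

annual_nh3_kg = sum(emission_factor_kg_per_head_yr[animal] * count
                    for animal, count in herd.items())
print(f"Estimated NH3 emissions: {annual_nh3_kg:.0f} kg/yr")
```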
Amy Stidworthy - Optimising local air quality models with sensor data - DMUG17 - IES / IAQM
An unapologetically technical conference, DMUG remains the key annual event for experts in this field. Unmissable speakers will be examining topical issues in emissions, exposure and dispersion modelling.
INNOVATIVE DISPERSION MODELING PRACTICES TO ACHIEVE A REASONABLE LEVEL OF CON... - Sergio A. Guerra
Presentation delivered at the Annual Air and Waste Management Association conference in Long Beach, California, on June 26, 2014.
Innovative dispersion modeling techniques are presented, including ARM2, EMVAP, and the 50th percentile background concentration. The case study involves peaking engines that are used 250 hours per year. These intermittent sources are required to undergo a modeling evaluation in many states. Current modeling techniques grossly overestimate the emissions from these sporadic sources.
INNOVATIVE DISPERSION MODELING PRACTICES TO ACHIEVE A REASONABLE LEVEL OF CON... - Sergio A. Guerra
Presentation delivered at the Board meeting of the Upper Midwest section of the Air and Waste Management Association on September 16, 2014.
Innovative dispersion modeling techniques are presented, including ARM2, EMVAP, and the 50th percentile background concentration. The case study involves peaking engines that are used 250 hours per year. These intermittent sources are required to undergo a modeling evaluation in many states. Current modeling techniques grossly overestimate the emissions from these sporadic sources.
New generation of high sensitivity airborne potassium magnetometers - Gem Systems
Overview
Airborne Trends in Mineral Exploration
Why Potassium?
Benefits of Potassium Vapour Magnetometers
How we did it!
Bird’s family
Gradiometers – Rationale
Tri-Directional Gradiometer – Bird
GEM DAS
Sample Customer Maps
Conclusion
Source: http://www.gemsys.ca/technology/tech-notes-papers/
Advanced Modeling Techniques for Permit Modeling - Turning challenges into o... - Sergio A. Guerra
Advanced modeling techniques can be used in AERMOD to refine the inputs that are entered into the model to obtain more accurate results. This presentation covers:
-AERMOD’s Temporal Mismatch Limitation
-Building Downwash Limitations in BPIP/PRIME
-Advanced Modeling Techniques to Overcome these Limitations
Solutions include:
Equivalent Building Dimensions (EBD)
Emission Variability Processor (EMVAP)
Updated ambient ratio method (ARM2)
Pairing AERMOD values with the 50th % background concentrations in cumulative analyses.
AIR DISPERSION MODELING HIGHLIGHTS FROM 2012 ACE - Sergio A. Guerra
Presentation includes some highlights from the dispersion modeling papers presented at the Annual AWMA conference in San Antonio, TX. Topics covered include EMVAP, distance limitations of AERMOD, and two case studies comparing predicted and monitored data.
Presented at the A&WMA UMS Board Meeting on August 21, 2012.
Dispersion modeling requirements are increasingly common in air permitting projects and in many cases become the bottleneck in permitting. Unlike any other consulting firm, CPP promotes cutting-edge techniques that can reduce excessive conservatism in permit modeling to a reasonable level that still protects public health. At CPP we start with the standard modeling techniques and apply the following advanced analysis tools, as needed, to optimize your permitting strategy:
• Analysis of BPIP output to verify if AERMOD is overpredicting,
• Screening tool to assess the benefit of refining the BPIP building dimensions inputs,
• Use of Equivalent Building Dimension (EBD) studies to correct building wake effects in AERMOD,
• Evaluation of background concentrations to determine a reasonable value to combine with predicted concentrations,
• Use of the Monte Carlo approach (i.e., EMVAP) to address sources with variable emissions,
• Use of the adjusted friction velocity (u-star) option in AERMET to address AERMOD’s overestimation during low wind stable hours,
• Site analysis to determine whether stacks taller than formula GEP stack heights are justified,
• Site specific wind tunnel modeling to determine GEP stack heights and Equivalent Building Dimensions,
• Site-specific wind erosion inputs, and
• Area and volume source enhancements.
The PuffR R Package for Conducting Air Quality Dispersion Analyses - Richard Iannone
PuffR is all about helping you conduct dispersion modelling using the CALPUFF modelling system. It is a software package currently being developed using the R statistical programming language. Dispersion modelling is a great tool for understanding how pollutants disperse from sources to receptors, and, how these dispersed pollutants affect populations’ exposure. The presentation goes over basic concepts in air dispersion modelling using CALPUFF, the goals of the project are outlined, the PuffR workflow is described, and a project roadmap is provided.
EFFECTS OF MET DATA PROCESSING IN AERMOD CONCENTRATIONS - Sergio A. Guerra
The current study evaluates the effect that different parameters used to process meteorological data have on AERMOD concentrations. Specifically, this study evaluates the effect of using AERMET processed with: 1-minute wind data collected by the Automated Surface Observing System (ASOS) and pre-processed using AERMINUTE, refined National Climatic Data Center (NCDC) station location and anemometer height, surface moisture, and urban/rural options. In this evaluation, one year of meteorological data was processed with nine different sets of input parameters and then used in AERMOD to run short, medium, and tall stack scenarios for 1-hour, 24-hour, and annual averaging periods. Downwash and terrain effects were not considered in this study. The results indicate that the three stack scenarios are sensitive to the location used for the meteorological station. Anemometer height changes had a small effect on concentrations for all scenarios except the tall stack scenario, which produced a modest increase in concentrations for the annual averaging period. Surface moisture was not found to have a strong effect on the scenarios evaluated. The use of AERMINUTE data resulted in significantly higher concentrations for the 1-hour (85%), 24-hour (81%), and annual (88%) averaging periods. The ice-free wind group station option in AERMINUTE was also evaluated: when using AERMINUTE without specifying that the station is part of the ice-free wind group stations, the concentrations obtained for the tall stack scenario were lower for the 1-hour (64%), 24-hour (68%), and annual (78%) averaging periods. Finally, in the urban/rural evaluation, the greatest effect is observed in the medium stack scenario, where concentrations double for the 1-hour scenario when using the rural option. However, in the tall stack scenario, significantly lower concentrations were obtained by using the urban parameter for the three averaging periods evaluated.
Presented at the 10th Conference on Air Quality Modeling, EPA Research Triangle Park, NC Campus, on March 15, 2012; at the AWMA UMS Dispersion Modeling Workshop on May 15, 2012; and at the Annual AWMA Conference on June 20, 2012.
Presentation includes information related to gently sloping terrain, AERMINUTE, and EPA formula height.
Presented at the 27th Annual Conference on the Environment on November 13, 2012.
Pairing AERMOD concentrations with the 50th percentile monitored value - Sergio A. Guerra
Presentation delivered to the Background Concentrations Workgroup for Air Dispersion Modeling, organized by the Minnesota Pollution Control Agency, on May 29, 2014. Three topics are covered: 1) screening monitoring data, 2) AERMOD's time-space mismatch, and 3) the proposed 50th percentile background method.
NOVEL DATA ANALYSIS TECHNIQUE USED TO EVALUATE NOX AND CO2 CONTINUOUS EMISSIO... - Sergio A. Guerra
The current study presents a new data analysis technique developed while evaluating continuous emission data collected from a trash compactor. The evaluation involved tailpipe sampling with a portable emission monitoring system (PEMS) on a diesel-fueled 525-horsepower trash compactor. The sampling campaign was conducted by running the compactor with regular No. 2 diesel, B20, and ULSD fuels. The purpose was to determine the possible emission reductions in nitrogen oxides (NOx) and carbon dioxide (CO2) from the use of B20 and ULSD in an off-road vehicle. The results from the NOx analysis are discussed.
The initial data analysis identified two important issues. The first was a bias in the calculated F values due to the very large number of samples (N); the large N influenced the probability values and indicated a false statistical significance for all factors tested. Additionally, the data observations were found to be highly autocorrelated. Thus, a time-interval data reduction technique was used to address these two limitations to the robustness of the statistical analyses. The result in each case was a subset of quasi-independent observations sampled at an interval of 800 seconds, which promptly resolved the autocorrelation and false statistical significance issues. Since false statistical significance and autocorrelation are inherent in continuous data, the positive results obtained from this technique can be far-reaching. The technique allowed a valid use of the general linear model (GLM), with engine speed as the covariate, to test the day, fuel type, and compactor factors. This is most relevant given advancements in data collection capabilities that require data handling techniques to satisfy the statistical assumptions necessary for valid analyses.
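A minimal sketch of such a time-interval reduction: retain one observation per 800-second window so the retained points are approximately independent (the 1 Hz sampling rate and the synthetic NOx series are illustrative assumptions):

```python
import numpy as np

def thin_by_interval(t_seconds, interval=800.0):
    """Keep the first observation in each `interval`-second window,
    yielding a quasi-independent subset of an autocorrelated series."""
    t = np.asarray(t_seconds, dtype=float)
    keep, next_t = [], t[0]
    for i, ti in enumerate(t):
        if ti >= next_t:
            keep.append(i)
            next_t = ti + interval
    return np.asarray(keep)

# Example: 1 Hz NOx readings thinned to one sample every 800 s.
t = np.arange(0.0, 20000.0, 1.0)
nox = np.random.default_rng(2).normal(350.0, 40.0, t.size)
idx = thin_by_interval(t)
nox_subset = nox[idx]   # ~25 quasi-independent observations
```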
Background Concentrations and the Need for a New System to Update AERMOD - Sergio A. Guerra
Presentation delivered at the EPA 11th Conference on Air Quality Modeling at RTP, NC.
Topics covered include background concentrations and the need for a new system to update AERMOD. The talk evaluates what is being proposed in the draft guidance related to background concentrations and presents an alternative approach to determine background concentrations for dispersion modeling evaluations, along with a review of the lessons learned from Appendix W and a proposed new method to incorporate science into the model.
Using Physical Modeling to Evaluate Re-entrainment of Stack Emissions - Sergio A. Guerra
Fume re-entry is an important concern for many types of facilities, such as hospitals and laboratories, that emit pathogens and toxic chemicals that may impact public health by being re-entrained into the building through nearby air intakes. Numerical methods can be used to evaluate dispersion of pollutants from stacks at sensitive receptors. However, numerical methods have limitations and simplifications that can significantly affect their predictions. An alternate way of analyzing stack re-entrainment is with physical modeling in a wind tunnel. In such a study, a scale model that accounts for buildings, topography, and vegetation is used with planned and alternate stack designs to determine the toxic emission impacts on air intakes and other sensitive locations. In a wind tunnel study, different stack designs and possible mitigation options can be evaluated. This method is superior to numerical methods (e.g., dispersion models) because it accounts for the immediate structures, topography, and vegetation that are often ignored or oversimplified in numerical methods.
This presentation will show a hypothetical case study evaluating a site with toxic air emissions using AERMOD and physical modeling.
Highlights from the 2016 Guideline on Air Quality Models Conference - Sergio A. Guerra
The revision of the Guideline on AQ Models (Appendix W) will prompt many changes in the way dispersion modeling is conducted for regulatory purposes. Some of the changes to the Guideline include enhancements and bug fixes to the AERMOD modeling system, new screening techniques to address ozone and secondary PM2.5, delisting CALPUFF as the preferred long-range transport model, and updates on the use of meteorological input data. These changes will have a significant impact on the regulated community. In anticipation of these updates, the Air & Waste Management Association will hold its 6th Specialty Conference: “Guideline on Air Quality Models: The New Path” to provide a technical forum to discuss the Guideline. This talk covered the main highlights from this conference including the presentations from EPA on the status and future direction of the Guideline. Learn how these changes may impact dispersion modeling evaluations for short and long range transport.
Presentation by Pooyan Ghasemi and Mario Martinelli, Deltares, at the Geo Customer Day 2018, during the Deltares Software Days - 2018 Edition. Thursday, June 7, 2018, Delft.
In spite of the recent developments in surrogate modeling techniques, the low fidelity of these models often limits their use in practical engineering design optimization. When surrogate models are used to represent the behavior of a complex system, it is challenging to simultaneously obtain high accuracy over the entire design space. When such surrogates are used for optimization, it becomes challenging to find the optimum/optima with certainty. Sequential sampling methods offer a powerful solution to this challenge by providing the surrogate with reasonable accuracy where and when needed. When surrogate-based design optimization (SBDO) is performed using sequential sampling, the typical SBDO process is repeated multiple times, where each time the surrogate is improved by the addition of new sample points. This paper presents a new adaptive approach to add infill points during SBDO, called Adaptive Sequential Sampling (ASS). In this approach, both local exploitation and global exploration aspects are considered for updating the surrogate during optimization, where multiple iterations of the SBDO process are performed to increase the quality of the optimal solution. This approach adaptively improves the accuracy of the surrogate in the region of the current global optimum as well as in the regions of higher relative errors. Based on the initial sample points and the fitted surrogate, the ASS method adds infill points at each iteration in the locations of: (i) the current optimum found based on the fitted surrogate; and (ii) the points generated using cross-over between sample points that have relatively higher cross-validation errors. The Nelder and Mead Simplex method is adopted as the optimization algorithm. The effectiveness of the proposed method is illustrated using a series of standard numerical test problems.
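The core of one ASS iteration can be sketched compactly. The sketch below makes several simplifying assumptions beyond what the abstract states: the surrogate is a scipy RBF interpolant, the crossover operator is an arithmetic mean of the two points with the worst leave-one-out errors, and the expensive model is a cheap analytic stand-in:

```python
import numpy as np
from scipy.interpolate import RBFInterpolator
from scipy.optimize import minimize

rng = np.random.default_rng(3)

def expensive_model(X):
    # Cheap analytic stand-in for the expensive system evaluation.
    return np.sum((X - 0.3) ** 2, axis=-1)

X = rng.uniform(0, 1, (15, 2))       # initial sample points
y = expensive_model(X)

for _ in range(5):                   # SBDO iterations
    surrogate = RBFInterpolator(X, y)

    # (i) Local exploitation: optimum of the current surrogate, found with
    # Nelder-Mead as in the abstract (the start-point heuristic is ours).
    res = minimize(lambda x: surrogate(x[None, :])[0],
                   X[np.argmin(y)], method="Nelder-Mead")
    x_opt = np.clip(res.x, 0.0, 1.0)

    # (ii) Global exploration: leave-one-out cross-validation error per
    # point, then crossover between the two points with the worst errors.
    cv = np.array([
        abs(RBFInterpolator(np.delete(X, i, axis=0),
                            np.delete(y, i))(X[i:i + 1])[0] - y[i])
        for i in range(len(X))
    ])
    x_child = X[np.argsort(cv)[-2:]].mean(axis=0)

    X_new = np.vstack([x_opt, x_child])    # infill points for this iteration
    X = np.vstack([X, X_new])
    y = np.concatenate([y, expensive_model(X_new)])
```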
This paper advances the Domain Segmentation based on Uncertainty in the Surrogate (DSUS) framework, which is a novel approach to characterize the uncertainty in surrogates. The leave-one-out cross-validation technique is adopted in the DSUS framework to measure local errors of a surrogate. A method is proposed in this paper to evaluate the performance of the leave-one-out cross-validation errors as local error measures. This method evaluates local errors by comparing: (i) the leave-one-out cross-validation error with (ii) the actual local error estimated within a local hypercube for each training point. The comparison results show that the leave-one-out cross-validation strategy can capture the local errors of a surrogate. The DSUS framework is then applied to key aspects of wind resource assessment and wind farm cost modeling. The uncertainties in the wind farm cost and the wind power potential are successfully characterized, which provides designers/users more confidence when using these models.
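A small sketch of the comparison this abstract describes, contrasting (i) the leave-one-out error at each training point with (ii) the actual error averaged over a local hypercube around it; the RBF surrogate, the hypercube half-width, and the test function are all illustrative assumptions:

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

rng = np.random.default_rng(4)

def f(X):
    # Known test function, so the "actual" local error is computable.
    return np.sin(4 * X[:, 0]) + np.cos(3 * X[:, 1])

X = rng.uniform(0, 1, (30, 2))
y = f(X)

def loo_error(i):
    """(i) Leave-one-out cross-validation error at training point i."""
    s = RBFInterpolator(np.delete(X, i, axis=0), np.delete(y, i))
    return abs(s(X[i:i + 1])[0] - y[i])

def local_error(i, half_width=0.05, n_test=100):
    """(ii) Mean surrogate error over a hypercube centered at point i."""
    s = RBFInterpolator(X, y)
    T = np.clip(X[i] + rng.uniform(-half_width, half_width, (n_test, 2)), 0, 1)
    return np.mean(np.abs(s(T) - f(T)))

# Pair the two measures at every training point and compare them.
pairs = [(loo_error(i), local_error(i)) for i in range(len(X))]
```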
Surrogate-based design is an effective approach for modeling computationally expensive system behavior. In such applications, it is often challenging to characterize the expected accuracy of the surrogate. In addition to global and local error measures, regional error measures can be used to understand and interpret the surrogate accuracy in the regions of interest. This paper develops the Regional Error Estimation of Surrogate (REES) method to quantify the level of the error in any given subspace (or region) of the entire domain, when all the available training points have been invested to build the surrogate. In this approach, the accuracy of the surrogate in each subspace is estimated by modeling the variations of the mean and the maximum error in that subspace with increasing number of training points (in an iterative process). A regression model is used for this purpose. At each iteration, the intermediate surrogate is constructed using a subset of the entire training data, and tested over the remaining points. The evaluated errors at the intermediate test points at each iteration are used for training the regression model that represents the error variation with sample points. The effectiveness of the proposed method is illustrated using standard test problems. To this end, the predicted regional errors of the surrogate constructed using all the training points are compared with the regional errors estimated over a large set of test points.
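A compact sketch of the REES idea under simplifying assumptions: an RBF surrogate, one hand-picked example region, and a power-law regression family for the error trend:

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

rng = np.random.default_rng(5)
f = lambda X: np.sum(np.sin(3 * X), axis=1)
X = rng.uniform(0, 1, (40, 2))
y = f(X)

in_region = lambda P: np.all(P < 0.5, axis=1)   # example region: [0, 0.5]^2

sizes, med_errs = [], []
for n in (10, 15, 20, 25, 30, 35):              # iterative process
    train = rng.choice(len(X), n, replace=False)
    test = np.setdiff1d(np.arange(len(X)), train)
    s = RBFInterpolator(X[train], y[train])     # intermediate surrogate
    e = np.abs(s(X[test]) - y[test])            # errors at intermediate tests
    mask = in_region(X[test])
    if mask.any():
        sizes.append(n)
        med_errs.append(np.median(e[mask]))

# Regress the regional error against the number of training points
# (a power law err ~ a * n^b is assumed here), then extrapolate to the
# full training-set size to predict the final regional error.
b, log_a = np.polyfit(np.log(sizes), np.log(med_errs), 1)
predicted_regional_error = np.exp(log_a) * len(X) ** b
```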
Owing to the multitude of surrogate modeling techniques developed in the recent years and the diverse characteristics offered by them, automated adaptive model selection approaches could be helpful in selecting the most suitable surrogate for a given problem. Surrogate selection could be performed at three different levels: (i) model type selection, (ii) basis (or kernel) function selection, and (iii) hyper-parameter selection, where hyper-parameters are those kernel parameters that are generally given by the users. Unlike the majority of existing model selection techniques, this paper explores the development of a method that performs selection coherently at all the three levels. In this context, the REES method is used to provide measures of the median and maximum errors of a candidate surrogate model. Two approaches are used for the 3-level selection: (i) a Cascaded approach performs each level in a nested loop, in the order model-kernel-hyper-parameters; (ii) a more advanced One-Step approach solves a MINLP to simultaneously optimize the model, kernel, and hyper-parameters. In both approaches, multiobjective optimization is performed to yield the best trade-offs between the estimated median and maximum errors. Candidate surrogates that are considered include (i) Kriging, (ii) Radial Basis Function (RBF), and (iii) Support Vector Regression (SVR), and multiple candidate kernels are allowed within these surrogate models. The 3-level REES-based model selection is compared with model selection based on error estimated on a large set of additional test points, for validation purposes. Numerical experiments on 2-variable, 6-variable, and 18-variable test problems, and a wind farm power generation problem, show that the proposed approach provides unique flexibility in model selection and is also reasonably accurate when compared with selection based on errors estimated on additional test points.
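A skeletal version of the Cascaded approach, with scikit-learn models as stand-ins (a Gaussian process plays the role of Kriging) and plain cross-validated error in place of the REES median/maximum error measures:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, Matern
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(6)
X = rng.uniform(0, 1, (60, 2))
y = np.sin(4 * X[:, 0]) + X[:, 1] ** 2

# Nested enumeration in the order model -> kernel -> hyper-parameter.
candidates = []
for length in (0.1, 0.3, 1.0):                      # hyper-parameter level
    for kernel in (RBF(length), Matern(length)):    # kernel level
        candidates.append((f"GP {kernel}",
                           GaussianProcessRegressor(kernel=kernel)))
for C in (1.0, 10.0):                               # a second model type
    candidates.append((f"SVR-rbf C={C}", SVR(kernel="rbf", C=C)))

def cv_error(model):
    # Cross-validated MSE as a single scalar score per candidate.
    return -cross_val_score(model, X, y,
                            scoring="neg_mean_squared_error", cv=5).mean()

best_name, _ = min(candidates, key=lambda c: cv_error(c[1]))
print("Selected candidate:", best_name)
```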
Application of on-line data analytics to a continuous process polybutene unit - Emerson Exchange
This Emerson Exchange 2013 presentation summarizes the 2013 field trial results achieved by applying on-line continuous data analytics to Lubrizol's continuous polybutene process. Continuous data analytics may be used to provide an on-line prediction of quality parameters and enable on-line detection of fault conditions. Information is provided on improvements made in the model used for quality parameter prediction and on how the field trial platform was integrated into the process unit. Presenters Qiwei Li, production engineer, Efren Hernandez and Robert Wojewodka, Lubrizol Corp., and Terry Blevins, principal technologist at Emerson, won best in conference in the process optimization track for this presentation.
Automating Speed: A Proven Approach to Preventing Performance Regressions in ... - Hosted by Confluent
"Regular performance testing is one of the pillars of Kafka Streams’ reliability and efficiency. Beyond ensuring dependable releases, regular performance testing supports engineers in new feature development with the ability to easily test the performance impact of their features, compare different approaches, etc.
In this session, Alex and John share their experience from developing, using, and maintaining a performance testing framework for Kafka Streams that has prevented multiple performance regressions over the last 5 years. They cover guiding principles and architecture, how to ensure statistical significance and stability of results, and how to automate regression detection for actionable notifications.
This talk sheds light on how Apache Kafka is able to foster a vibrant open-source community while maintaining a high performance bar across many years and releases. It also empowers performance-minded engineers to avoid common pitfalls and bring high-quality performance testing to their own systems.
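As a hedged illustration of the statistical core of such automated regression detection (not the framework described in the talk): compare repeated benchmark throughput samples from a baseline and a candidate build, and notify only when the slowdown is both practically meaningful and statistically significant:

```python
import numpy as np
from scipy import stats

def detect_regression(baseline, candidate, min_drop=0.05, alpha=0.01):
    """Flag a regression when candidate throughput is lower than baseline
    by at least `min_drop` and the difference is statistically significant."""
    baseline = np.asarray(baseline, dtype=float)
    candidate = np.asarray(candidate, dtype=float)
    drop = 1.0 - candidate.mean() / baseline.mean()
    # One-sided Welch's t-test: is the candidate slower than the baseline?
    _, p = stats.ttest_ind(baseline, candidate, equal_var=False,
                           alternative="greater")
    return drop > min_drop and p < alpha

# Example: throughput (records/s) from five repeated runs of each build.
base = [102_300, 101_800, 102_900, 101_500, 102_100]
cand = [96_400, 95_900, 97_100, 96_800, 96_200]
print(detect_regression(base, cand))   # True -> send a notification
```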
The analysis of complex system behavior often demands expensive experiments or computational simulations. Surrogate modeling techniques are often used to provide a tractable and inexpensive approximation of such complex system behavior. Owing to the lack of knowledge regarding the suitability of particular surrogate modeling techniques, a model selection approach can be helpful in choosing the best surrogate technique. Popular model selection approaches include: (i) split sample, (ii) cross-validation, (iii) bootstrapping, and (iv) Akaike's information criterion (AIC) (Queipo et al. 2005; Bozdogan et al. 2000). However, the effectiveness of these model selection methods is limited by the lack of accurate measures of local and global errors in surrogates.
This paper develops a novel and model-independent concept to quantify the local/global reliability of surrogates, to assist in model selection (in surrogate applications). This method is called the Generalized-Regional Error Estimation of Surrogate (G-REES). In this method, intermediate surrogates are iteratively constructed over heuristic subsets of the available sample points (i.e., intermediate training points), and tested over the remaining available sample points (i.e., intermediate test points). The fraction of sample points used as intermediate training points is fixed at each iteration, with the total number of iterations being pre-specified. The estimated median and maximum relative errors for the heuristic subsets at each iteration are used to fit a distribution of the median and maximum error, respectively. The statistical mode of the median and the maximum error distributions are then determined. These mode values are then represented as functions of the density of training points (at the corresponding iteration). Regression methods, called Variation of Error with Sample Density (VESD), are used for this purpose. The VESD models are then used to predict the expected median and maximum errors, when all the sample points are used as training points.
The effectiveness of the proposed model selection criterion is explored by finding the best surrogate among candidates including: (i) Kriging, (ii) Radial Basis Functions (RBF), (iii) Extended Radial Basis Functions (ERBF), and (iv) Quadratic Response Surface (QRS), for standard test functions and a wind farm capacity factor function. The results are compared with the relative accuracy of the surrogates evaluated on additional test points, and also with the prediction sum of squares (PRESS) error given by leave-one-out cross-validation.
The application of G-REES to a standard test problem with two design variables (the Branin-Hoo function) shows that the proposed method predicts the median and the maximum value of the global error with a higher level of confidence than PRESS. It also shows that model selection based on the G-REES method is significantly more reliable than that currently performed using error measures such as PRESS.
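A simplified sketch of the G-REES/VESD pipeline described above, with an RBF surrogate standing in for the candidate model and a histogram peak standing in for the mode of the fitted error distribution:

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

rng = np.random.default_rng(7)
f = lambda X: np.sum(np.cos(3 * X), axis=1)
X = rng.uniform(0, 1, (50, 2))
y = f(X)

densities, modes = [], []
for frac in np.linspace(0.4, 0.8, 5):          # pre-specified iterations
    n_train = int(frac * len(X))
    med_errs = []
    for _ in range(20):                        # heuristic subsets
        train = rng.choice(len(X), n_train, replace=False)
        test = np.setdiff1d(np.arange(len(X)), train)
        s = RBFInterpolator(X[train], y[train])
        med_errs.append(np.median(np.abs(s(X[test]) - y[test])))
    # Mode of the median-error distribution via a histogram peak (a crude
    # stand-in for the distribution fitting done in G-REES).
    hist, edges = np.histogram(med_errs, bins=8)
    k = np.argmax(hist)
    modes.append(0.5 * (edges[k] + edges[k + 1]))
    densities.append(n_train)

# VESD: regress the mode against training-point density (power law assumed),
# then predict the expected median error with all points used for training.
b, log_a = np.polyfit(np.log(densities), np.log(modes), 1)
expected_median_error = np.exp(log_a) * len(X) ** b
```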
Wind farm development is an extremely complex process, most often driven by three important performance criteria: (i) annual energy production, (ii) lifetime costs, and (iii) net impact on surroundings. Generally, planning a commercial scale wind farm takes several years. Undesirable concept-to-installation delays are primarily attributed to the lack of an upfront understanding of how different factors collectively affect the overall performance of a wind farm. More specifically, it is necessary to understand the balance between the socio-economic, engineering, and environmental objectives at an early stage in the design process. This paper proposes a Wind Farm Tradeoff Visualization (WiFToV) framework that aims to develop first-of-its-kind generalized guidelines for the conceptual design of wind farms, especially at early stages of wind farm development. Two major performance objectives are considered in this work: (i) cost of energy (COE) and (ii) land area per MW installed (LAMI). The COE is estimated using the Wind Turbine Design Cost and Scaling Model (WTDCS) and the Annual Energy Production (AEP) model incorporated by the Unrestricted Wind Farm Layout Optimization (UWFLO) framework. The LAMI is estimated using an optimal-layout based land usage model, which is treated as a post-process of the wind farm layout optimization. A Multi-Objective Mixed-Discrete Particle Swarm Optimization (MO-MDPSO) algorithm is used to perform the bi-objective optimization, which simultaneously optimizes the location and types of turbines. Together with a novel Pareto translation technique, the proposed WiFToV framework allows the exploration of the trade-off between COE and LAMI, and their variations with respect to multiple values of nameplate capacity.
Effective and time-efficient decision-making in the early stages of wind farm planning can lay the foundation of a successful wind energy project. Undesirable concept-to-installation delays in wind farm development are often caused by conflicting decisions from the major parties involved (e.g., developer, investors, landowners, and local communities), which in turn can be (in a major part) attributed to the lack of an upfront understanding of the trade-offs between the technical, socio-economic, and environmental-impact aspects of the wind farm for the given site. This paper proposes a consolidated visualization platform for wind farm planning, which could facilitate informed and co-operative decision-making by the parties involved. This visualization platform offers a GUI-based land shape chart, which shows how the energy production capacity and the corresponding required optimal land shape vary with different land area and nameplate capacity decisions. In order to develop this chart, a bi-objective optimization problem is formulated (using the Unrestricted Wind Farm Layout Optimization framework) to maximize the capacity factor and minimize the land usage, subject to different nameplate capacity decisions. The application of an Optimal Layout-based land usage estimate allows the wind farm layout optimization to run without pre-specifying any farm boundaries; the optimal land shape is instead determined as a post-process, using the convex hull and minimum bounding rectangle concepts, based on the optimal arrangement of turbines (a sketch of this construction follows). Three land shape charts are generated under three characteristic wind patterns: (i) a single dominant wind direction, (ii) two opposite dominant wind directions, and (iii) two orthogonal dominant wind directions, all three patterns comprising the same wind speed distribution. The results indicate that the optimal land shape is highly sensitive to the variation in LAMI for small-capacity wind farms (few turbines) and to the variation in nameplate capacity for small allowed land areas. For the same decided nameplate capacity and LAMI values, we observe reasonable similarity in the optimal land shapes and the maximum energy production potentials given the "single dominant direction" and the "two opposite dominant directions" wind patterns; the optimal land shapes and the maximum energy production potentials yielded by the "two orthogonal dominant directions" wind pattern are however observed to be relatively different from the other two cases.
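A sketch of the land-shape post-process named above: take the convex hull of the optimized turbine coordinates, then search the hull-edge orientations for the minimum-area bounding rectangle (the turbine layout below is random placeholder data):

```python
import numpy as np
from scipy.spatial import ConvexHull

def min_bounding_rectangle(points):
    """Minimum-area rectangle enclosing `points`; the optimal rectangle is
    aligned with one of the edges of the convex hull."""
    hull = points[ConvexHull(points).vertices]
    edges = np.diff(np.vstack([hull, hull[:1]]), axis=0)
    angles = np.arctan2(edges[:, 1], edges[:, 0])
    best_area, best = np.inf, None
    for a in angles:
        c, s = np.cos(-a), np.sin(-a)
        P = hull @ np.array([[c, -s], [s, c]]).T   # rotate hull by -a
        w, h = P.max(axis=0) - P.min(axis=0)
        if w * h < best_area:
            best_area, best = w * h, (w, h, a)
    return best_area, best

# Placeholder "optimized" turbine coordinates in meters; the rectangle
# area divided by nameplate capacity would give the land area per MW.
turbines = np.random.default_rng(8).uniform(0, 2000, (25, 2))
area, (width, height, angle) = min_bounding_rectangle(turbines)
```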
Complex system design problems tend to be high-dimensional and nonlinear, and also often involve multiple objectives and mixed-integer variables. Heuristic optimization algorithms have the potential to address the typical (if not most) characteristics of such complex problems. Among them, the Particle Swarm Optimization (PSO) algorithm has gained significant popularity due to its maturity and fast convergence abilities. This paper seeks to translate the unique benefits of PSO from solving typical continuous single-objective optimization problems to solving multi-objective mixed-discrete problems, which is relatively new ground for PSO application. The previously developed Mixed-Discrete Particle Swarm Optimization (MDPSO) algorithm, which includes an exclusive diversity preservation technique to prevent premature particle clustering, has been shown to be a powerful single-objective solver for highly constrained MINLP problems. In this paper, we make fundamental advancements to the MDPSO algorithm, enabling it to solve challenging multi-objective problems with mixed-discrete design variables. In the velocity update equation, the explorative term is modified to point towards the non-dominated solution that is the closest to the corresponding particle (at any iteration). The fractional domain in the diversity preservation technique, which was previously defined in terms of a single global leader, is now applied to multiple global leaders in the intermediate Pareto front. The multi-objective MDPSO (MO-MDPSO) algorithm is tested using a suite of diverse benchmark problems and a disc-brake design problem. To illustrate the advantages of the new MO-MDPSO algorithm, the results are compared with those given by the popular Elitist Non-dominated Sorting Genetic Algorithm-II (NSGA-II).
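A minimal sketch of the modified velocity update described above; the inertia and acceleration coefficients are conventional placeholder values, and the discrete-variable handling and diversity preservation steps of MO-MDPSO are omitted:

```python
import numpy as np

def mo_mdpso_velocity(v, x, p_best, archive, w=0.7, c1=1.5, c2=1.5, rng=None):
    """Velocity update in which the explorative term points towards the
    non-dominated archive member closest to each particle."""
    rng = rng or np.random.default_rng()
    # Distance from every particle to every archived non-dominated solution.
    d = np.linalg.norm(x[:, None, :] - archive[None, :, :], axis=2)
    g_best = archive[np.argmin(d, axis=1)]     # closest leader per particle
    r1, r2 = rng.random(x.shape), rng.random(x.shape)
    return w * v + c1 * r1 * (p_best - x) + c2 * r2 * (g_best - x)

# Example shapes: 30 particles, 4 design variables, 10 archive members.
rng = np.random.default_rng(9)
x = rng.random((30, 4))
v = np.zeros_like(x)
p_best = x.copy()                    # personal bests (initialized to x)
archive = rng.random((10, 4))        # current non-dominated set (placeholder)
v = mo_mdpso_velocity(v, x, p_best, archive, rng=rng)
x = x + v                            # position update
```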
The performance of a wind farm is affected by several key factors that can be classified into two categories: the natural factors and the design factors. Hence, the planning of a wind farm requires a clear quantitative understanding of how the balance between the concerned objectives (e.g., socio-economic, engineering, and environmental objectives) is affected by these key factors. This understanding is lacking in the state of the art in wind farm design. The wind farm capacity factor is one of the primary performance criteria of a wind energy project. For a given land (or sea area) and wind resource, the maximum capacity factor of a particular number of wind turbines can be reached by optimally adjusting the layout of turbines. However, this layout adjustment is constrained owing to the limited land resource. This paper proposes a Bi-level Multi-objective Wind Farm Optimization (BMWFO) framework for planning effective wind energy projects. Two important performance objectives considered in this paper are: (i) wind farm Capacity Factor (CF) and (ii) Land Area per MW Installed (LAMI). Turbine locations, land area, and nameplate capacity are treated as design variables in this work. In the proposed framework, the Capacity Factor - Land Area per MW Installed (CF - LAMI) trade-off is parametrically represented as a function of the nameplate capacity. Such a helpful parameterization of trade-offs is unique in the wind energy literature. The farm output is computed using the wind farm power generation model adopted from the Unrestricted Wind Farm Layout Optimization (UWFLO) framework. The Smallest Bounding Rectangle (SBR) enclosing all turbines is used to calculate the actual land area occupied by the farm site. The wind farm layout optimization is performed in the lower level using Mixed-Discrete Particle Swarm Optimization (MDPSO), while the CF - LAMI trade-off is parameterized in the upper level. In this work, the CF - LAMI trade-off is successfully quantified by nameplate capacity in the 20 MW to 100 MW range. The Pareto curves obtained from the proposed framework provide important insights into the trade-offs between the two performance objectives, which can significantly streamline the decision-making process in wind farm development.
The creation of wakes, with unique turbulence characteristics, downstream of turbines significantly increases the complexity of the boundary layer flow within a wind farm. In conventional wind farm design, analytical wake models are generally used to compute the wake-induced power losses, with different wake models yielding significantly different estimates. In this context, the wake behavior, and subsequently the farm power generation, can be expressed as functions of a series of key factors. A quantitative understanding of the relative impact of each of these factors is paramount to the development of more reliable power generation models; such an understanding is however missing in the current state of the art in wind farm design. In this paper, we quantitatively explore how the farm power generation, estimated using four different analytical wake models, is influenced by the following key factors: (i) incoming wind speed, (ii) land configuration, and (iii) ambient turbulence. The sensitivity of the maximum farm output potential to the input factors, when using different wake models, is also analyzed. The extended Fourier Amplitude Sensitivity Test (eFAST) method is used to perform the sensitivity analysis. The power generation model and the optimization strategy are adopted from the Unrestricted Wind Farm Layout Optimization (UWFLO) framework. In the case of an array-like turbine arrangement, both the first-order and the total-order sensitivity indices of the power output with respect to the incoming wind speed were found to reach a value of 99%, irrespective of the choice of wake model. However, in the case of maximum power output, significant variation (around 30%) in the indices was observed across different wake models, especially when the incoming wind speed is close to the rated speed of the turbines.
The power generation of a wind farm is significantly less than the sum of the power that would be generated by each turbine operating as a standalone entity. This power reduction can be attributed to the energy loss due to wake effects, that is, the velocity deficit in the wind downstream of a turbine. In the case of wind farm design, the wake losses are generally quantified using wake models. The effectiveness of wind farm design (seeking to maximize the farm output) therefore depends on the accuracy and the reliability of the wake models. This paper compares the impact of the following four analytical wake models on the wind farm power generation: (i) the Jensen model, (ii) the Larsen model, (iii) the Frandsen model, and (iv) the Ishihara model. The sensitivity of this impact to the Land Area per Turbine (LAT) and the incoming wind speed is also investigated. The wind farm power generation model used in this paper is adopted from the Unrestricted Wind Farm Layout Optimization (UWFLO) methodology. Single wake case studies show that the velocity deficit and the wake diameter estimated by the different analytical wake models can be significantly different. A maximum difference of 70% was also observed between the wind farm capacity factor values estimated using different wake models.
Development of a family of products that satisfies different sectors of the market introduces significant challenges to today's manufacturing industries, from development time to aftermarket services. A product family with a common platform paradigm offers a powerful solution to these daunting challenges. The Comprehensive Product Platform Planning (CP3) framework formulates a flexible product family model that (i) seeks to eliminate traditional boundaries between modular and scalable families, (ii) allows the formation of sub-families of products, and (iii) yields the optimal depth and number of platforms. In this paper, a solution strategy is introduced within the CP3 framework that obviates two common assumptions, namely: (i) that the identification of platform/non-platform design variables and the determination of variable values are separate processes, and (ii) that the cost reduction of creating product platforms is independent of the total number of each product manufactured. A new Cost Decay Function (CDF) is developed to approximate the reduction in cost with increasing commonalities among products, for a specified capacity of production. The Mixed Integer Non-Linear Programming (MINLP) problem, presented by the CP3 model, is solved using a novel Platform Segregating Mapping Function (PSMF). The proposed CP3 framework is implemented on a family of universal electric motors.
This paper explores the adaptive optimal design of Active Thermally Insulated (ATI) windows to significantly improve energy efficiency. The ATI window design uses thermostats to actively control thermoelectric (TE) units and fans to regulate the overall thermodynamic properties of the windows. The windows are used to maintain a comfortable indoor temperature. Since weather conditions vary with different geographical locations and with time, the thermodynamic properties of the windows should adapt accordingly. The electric power supplied to the TE units and the fans is dynamically controlled so as to provide an optimal operation under varying weather conditions. Optimization of the ATI window design is a multiobjective problem that minimizes both the heat transferred through the window and the electric power consumption. The heat transfer through the ATI windows is analyzed using FLUENT, and the optimization is performed using MATLAB. Since the computational expense of optimization for numerous weather conditions is excessive, the power supplies are optimized under a reasonably small number of weather conditions. Based on the optimal results obtained for these conditions, a surrogate model is developed to represent the optimal results over a wide range of weather conditions. The surrogate model is used to evaluate optimal power supplies with respect to different values of outside temperature, wind speed, and intensity of solar radiation. Thermometers, anemometers, and solar radiation sensors are used to sense these weather conditions. With the inputs from the sensors, the thermostats determine the operating conditions and calculate the corresponding optimal power supplies using the surrogate model. Since the ATI windows are operated with optimal power supplies, high energy efficiency is achieved.
The Comprehensive Product Platform Planning (CP3) framework presents a flexible mathematical model of the platform planning process, which allows (i) the formation of sub-families of products, and (ii) the simultaneous identification and quantification of platform/scaling design variables. The CP3 model is founded on a generalized commonality matrix that represents the product platform plan, and yields a mixed binary-integer nonlinear programming problem. In this paper, we develop a methodology to reduce the high dimensional binary integer problem to a more tractable integer problem, where the commonality matrix is represented by a set of integer variables. Subsequently, we determine the feasible set of values for the integer variables in the case of families with 3-7 kinds of products. The cardinality of the feasible set is found to be orders of magnitude smaller than the total number of unique combinations of the commonality variables. In addition, we also present the development of a generalized approach to Mixed-Discrete Non-Linear Optimization (MDNLO) that can be implemented through standard non-gradient based optimization algorithms. This MDNLO technique is expected to provide a robust and computationally inexpensive optimization framework for the reduced CP3 model. The generalized approach to MDNLO uses continuous optimization as the primary search strategy; however, it evaluates the system model only at the feasible locations in the discrete variable space.
The planning of a wind farm, which minimizes the project costs and maximizes the power generation capacity, presents significant challenges to today's wind energy industry. An optimal wind farm planning strategy that accounts for the key (designable) factors influencing the net power generation offers a powerful solution to these daunting challenges. This paper explores the influences of (i) the number of turbines, (ii) the farm size, and (iii) the use of a combination of turbines with differing rotor diameters, on the optimal power generated by a wind farm. We use a recently developed method of arranging turbines in a wind farm (the Unrestricted Wind Farm Layout Optimization (UWFLO)) to maximize the farm efficiency. Response surface based cost models are used to estimate the cost of the wind farm as a function of the turbine rotor diameters and the number of turbines. Optimization is performed using a Particle Swarm Optimization (PSO) algorithm. A robust mixed-discrete version of the PSO algorithm is implemented to appropriately account for the discrete choice of feasible rotor diameters. The use of an optimal combination of turbines with differing rotor diameters was observed to significantly improve the net power generation. Exploration of the influences of (i) the number of turbines, and (ii) the farm size, on the cost per kW of power produced provided interesting observations.
This paper compares the performances of standard surrogate models in the development of an optimal control framework. The optimal control strategy is implemented on an Active Thermoelectric (ATE) window design. The ATE window design uses thermoelectric units to actively regulate the overall thermodynamic properties of the windows. The optimization of the design is a multiobjective problem, where both the heat transferred through the window and the electric power consumption are minimized. The power supplies and the heat transfer are optimized under a reasonable number of randomly sampled environmental conditions. The resulting optimal designs are represented as functions of the corresponding environmental conditions using surrogate models. To this end, four types of surrogate models are used, namely: (i) Quadratic Response Surface Methodology (QRSM), (ii) Radial Basis Functions (RBF), (iii) Extended Radial Basis Functions (E-RBF), and (iv) Kriging. Their performances are compared using two accuracy metrics: the Root Mean Squared Error (RMSE) and the Maximum Absolute Error (MAE). We found that no single surrogate modeling method is superior to the others over the whole domain for the optimal control of the ATE window.
A Response Surface Based Wind Farm Cost (RS-WFC) model is developed to evaluate the economics of wind farms. The RS-WFC model is developed using Extended Radial Basis Functions (E-RBF) for onshore wind farms in the U.S. This model is then used to explore the influence of different design and economic parameters, including the number of turbines, the rotor diameter, and the labor cost, on the cost of a wind farm. The RS-WFC model is composed of three parts that estimate (i) the installation cost, (ii) the annual Operation and Maintenance (O&M) cost, and (iii) the total annual cost of a wind farm. The accuracy of the cost model is favorably established through comparison with pertinent commercial data. Moreover, the RS-WFC model is integrated with an analytical power generation model of a wind farm. A recently developed Unrestricted Wind Farm Layout Optimization (UWFLO) model is used to determine the power generated by a farm. The ratio of the total annual cost to the energy generated by the wind farm in one year (commonly known as the Cost of Energy, COE) is minimized in this paper. The results show that the COE could decrease significantly through layout optimization, yielding millions in annual cost savings.
This paper presents a new method (the Unrestricted Wind Farm Layout Optimization (UWFLO)) of arranging turbines in a wind farm to achieve maximum farm efficiency. The powers generated by individual turbines in a wind farm are dependent on each other, due to velocity deficits created by the wake effect. A standard analytical wake model has been used to account for the mutual influences of the turbines in a wind farm. A variable induction factor, dependent on the approaching wind velocity, estimates the velocity deficit across each turbine. Optimization is performed using a constrained Particle Swarm Optimization (PSO) algorithm. The model is validated against experimental data from a wind tunnel experiment on a scaled-down wind farm, and reasonable agreement between the model and experimental results is obtained. A preliminary wind farm cost analysis is also performed to explore the effect of using turbines with different rotor diameters on the total power generation. The use of differing rotor diameters is observed to play an important role in improving the overall efficiency of a wind farm.
This paper develops a cost model for onshore wind farms in the U.S. This model is then used to analyze the influence of different design and economic parameters on the cost of a wind farm. A response surface based cost model is developed using Extended Radial Basis Functions (E-RBF). The E-RBF approach, a combination of radial and non-radial basis functions, can provide the designer with significant flexibility and freedom in the metamodeling process. The E-RBF based cost model is composed of three parts that can estimate (i) the installation cost, (ii) the annual Operation and Maintenance (O&M) cost, and (iii) the total annual cost of a wind farm. The input parameters for the E-RBF based cost model include the rotor diameter of a wind turbine, the number of wind turbines in a wind farm, the construction labor cost, the management labor cost, and the technician labor cost. The accuracy of the model is favorably explored through comparison with pertinent real world data. It is found that the cost of a wind farm is appreciably sensitive to the rotor diameter and the number of wind turbines for a given desirable total power output.
In the past decade or so, there has been appreciable progress in developing renewable energy resources; among them, wind energy has taken a lead, and is currently contributing towards 2.5% of the global electricity consumption (WWEA, 2011). On the downside, the variability of this resource itself has been one of the major factors restricting its potential growth – wind speed and direction show strong temporal variations. In addition, the distribution of wind conditions varies significantly from year to year. The resulting ill-predictability of the annual distribution of wind conditions introduces significant uncertainties in the estimated resource potential and the predicted performance of the wind farm.
The planning of a wind farm, which minimizes the lifecycle project costs and maximizes the reliability of the expected power generation, presents significant challenges to today's wind energy industry. An optimal wind farm planning strategy that simultaneously (i) accounts for the key engineering design factors, and (ii) addresses the major sources of uncertainty in a wind farm, can offer a powerful solution to these daunting challenges. In this paper, we develop a new methodology to characterize the long term uncertainties, particularly those introduced by the ill-predictability of the annual variation in wind conditions (wind speed and direction, and air density). The annual variation in wind conditions is represented using a non-parametric wind distribution model. The uncertainty in the predicted annual wind distribution is then characterized using a set of lognormal distributions. The uncertainties in the estimated (i) farm power generation and (ii) Cost of Energy (COE) are represented as functions of the variances of these lognormal distributions. Subsequently, we minimize the uncertainty in the COE through wind farm optimization. To this end, we apply the Unrestricted Wind Farm Layout Optimization (UWFLO) framework, which provides a comprehensive platform for wind farm design. This methodology for robust wind farm optimization is applied to design a 25 MW wind farm in North Dakota. We found that layout optimization is appreciably sensitive to the uncertainties in wind conditions.
The determination of complex underlying relationships between system parameters from simulated and/or recorded data requires advanced interpolating functions, also known as surrogates. The development of surrogates for such complex relationships often requires the modeling of high dimensional and non-smooth functions using limited information. To this end, the hybrid surrogate modeling paradigm, where different surrogate models are aggregated, offers a robust solution. In this paper, we develop a new high fidelity surrogate modeling technique that we call the Reliability Based Hybrid Functions (RBHF). The RBHF formulates a reliable Crowding Distance-Based Trust Region (CD-TR), and adaptively combines the favorable characteristics of different surrogate models. The weight of each contributing surrogate model is determined based on the local reliability measure for that surrogate model in the pertinent trust region. Such an approach is intended to exploit the advantages of each component surrogate. This approach seeks to simultaneously capture the global trend of the function and the local deviations. In this paper, the RBHF integrates four component surrogate models: (i) the Quadratic Response Surface Model (QRSM), (ii) the Radial Basis Functions (RBF), (iii) the Extended Radial Basis Functions (E-RBF), and (iv) the Kriging model. The RBHF is applied to standard test problems. Subsequent evaluations of the Root Mean Squared Error (RMSE) and the Maximum Absolute Error (MAE) illustrate the promising potential of this hybrid surrogate modeling approach.
The development of large scale wind farms that can produce energy at a cost comparable to that of other conventional energy resources presents significant challenges to today's wind energy industry. The consideration of the key design and environmental factors that influence the performance of a wind farm is a crucial part of the solution to this challenge. In this paper, we develop a methodology to account for the configuration of the farm land (length-to-breadth ratio and North-South-East-West orientation) within the scope of wind farm optimization. This approach appropriately captures the correlation between (i) the land configuration, (ii) the farm layout, and (iii) the selection of turbine types. Simultaneous optimization of the farm layout and turbine selection is performed to minimize the Cost of Energy (COE) for a set of sample land configurations. The optimized COE and farm efficiency are then represented as functions of the land aspect ratio and the land orientation. To this end, we apply a recently developed response surface method known as the Reliability-Based Hybrid Functions. The overall wind farm design methodology is applied to design a 25 MW farm in North Dakota. This case study provides helpful insights into the influence of the land configuration on the optimum farm performance that can be obtained for a particular site.
This paper explores the effectiveness of the recently developed surrogate modeling method, the Adaptive Hybrid Functions (AHF), through its application to complex engineered systems design. The AHF is a hybrid surrogate modeling method that seeks to exploit the advantages of each component surrogate. In this paper, the AHF integrates three component surrogate models: (i) the Radial Basis Functions (RBF), (ii) the Extended Radial Basis Functions (E-RBF), and (iii) the Kriging model, by characterizing and evaluating the local measure of accuracy of each model. The AHF is applied to model complex engineering systems and an economic system, namely: (i) wind farm design; (ii) product family design (for universal electric motors); (iii) three-pane window design; and (iv) onshore wind farm cost estimation. We use three differing sampling techniques to investigate their influence on the quality of the resulting surrogates. These sampling techniques are (i) Latin Hypercube Sampling (LHS), (ii) Sobol's quasirandom sequence, and (iii) Hammersley Sequence Sampling (HSS). Cross-validation is used to evaluate the accuracy of the resulting surrogate models. As expected, the accuracy of the surrogate model was found to improve with increase in the sample size. We also observed that the Sobol's and the LHS sampling techniques performed better in the case of high-dimensional problems, whereas the HSS sampling technique performed better in the case of low-dimensional problems. Overall, the AHF method was observed to provide acceptable-to-high accuracy in representing complex design systems.
Sampling-SDM2012_Jun
1. Improving the Accuracy of Surrogate Models Using Inverse Transform Sampling
Junqiang Zhang*, Achille Messac#, Jie Zhang*, and Souma Chowdhury*
* Rensselaer Polytechnic Institute, Department of Mechanical, Aerospace, and Nuclear Engineering
# Syracuse University, Department of Mechanical and Aerospace Engineering
53rd AIAA/ASME/ASCE/AHS/ASC Structures, Structural Dynamics and Materials Conference
8th AIAA Multidisciplinary Design Optimization Specialist Conference
April 23 - 26, 2012
Honolulu, Hawaii
2. Introduction
• Sampling is an important component of optimization, numerical simulations, design of experiments, and uncertainty analysis.
• Surrogate modeling is concerned with the construction of approximation models to estimate the system performance, and to develop relationships between specific system inputs and outputs.
• It is expected that an intelligent selection of sample points can increase the accuracy of surrogate models.
3. Sampling Based on Probability Distribution
• Observations of inputs often follow a distribution.
• A set of sample points representative of the naturally occurring distribution of inputs is often desirable.
[Figures: PDF surface of a population distribution over (x1, x2), and the corresponding sample points in the x1-x2 plane]
4. Presentation Outline
• Research Objectives and Motivation
• Probability-based sampling methods overview
• Inverse transform sampling
• Surrogate model development
  • Surrogate model performance comparison
  • Performance in increasing sample space
• Concluding remarks
5. Motivation and Research Objectives
Motivation:
• Certain inputs occur more frequently, comprising regions of high interest in the condition space.
• It is desirable to have higher accuracy in the system response (surrogate) in these regions of higher interest.
Objectives:
• Develop a sampling strategy for surrogate model development that promotes higher accuracy in the regions of high interest (of the observed input).
7. Inverse Transform Sampling: Key Features
Inverse transform sampling can:
• Sample more points in the regions where random variables have higher probability densities; and
• Sample fewer points in the regions where random variables have low probability densities.
The probability of the random variables is used as the metric of distance instead of the Euclidean distance, so the sample points are uniform in terms of the probability differences.
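To make the key idea concrete, the following is a minimal sketch of one-dimensional inverse transform sampling; the choice of Python with NumPy/SciPy and the Gaussian example are illustrative assumptions, not the deck's prescribed tooling.

```python
# Minimal sketch of 1-D inverse transform sampling (illustrative assumption:
# Python with NumPy and SciPy; the deck itself is tool-agnostic here).
import numpy as np
from scipy import stats

# Probabilities spread uniformly over (0, 1); a low-discrepancy sequence
# could be used here instead of equal spacing (see Step 3 later).
u = np.linspace(0.01, 0.99, 31)

# The inverse CDF (percent-point function) maps probabilities to coordinates.
# Equal spacing in probability yields points that crowd where the PDF is high.
x = stats.norm.ppf(u, loc=0.0, scale=5.0)
```

Because consecutive points differ by a fixed probability increment, the spacing between coordinates shrinks exactly where the density is high, which is the "probability as the metric of distance" idea stated above.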
8. Procedure: Step 1 (Random Variable Observations)
The four steps of the procedure: (1) Random Variable Observations; (2) Distribution Function Fitting; (3) Generating the Sequence of CDFs; (4) Coordinates Evaluation.
The occurrence of the sampling variables should be sufficiently observed.
[Figure: scatter plot of observed (x1, x2) data]
9. Procedure: Step 2 (Distribution Function Fitting)
Approaches:
• The least squares method
• The least absolute deviations method
• The generalized method of moments
• The Maximum Likelihood Estimation
[Figure: fitted joint PDF surface over (x1, x2)]
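As a hedged illustration of Step 2, the sketch below fits a Weibull distribution to observed data by Maximum Likelihood Estimation; the use of scipy.stats and the data file name are assumptions for illustration.

```python
# Hypothetical Step 2 sketch: fit a distribution to observations by MLE.
import numpy as np
from scipy import stats

obs = np.loadtxt("observations.txt")  # hypothetical file of observed values

# scipy.stats fits parameters by Maximum Likelihood Estimation by default;
# floc=0 fixes the location parameter, as is common for wind-speed data.
c, loc, scale = stats.weibull_min.fit(obs, floc=0)
fitted = stats.weibull_min(c, loc=loc, scale=scale)  # frozen fitted distribution
```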
10. Procedure: Step 3 (Generating the Sequence of CDFs)
• A CDF increases from 0 to 1.
• Low-discrepancy sampling methods generate uniformly distributed sequences between 0 and 1 in all dimensions of a sample space, for example:
  • The Van der Corput sequence
  • The Halton/Hammersley sequence
  • The Sobol sequence
  • The Faure sequence
[Figure: low-discrepancy points in the unit square of CDF(x1) vs. CDF(x2)]
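A hedged sketch of Step 3, assuming SciPy 1.7+ and its scipy.stats.qmc module (the deck does not prescribe a generator implementation):

```python
# Step 3 sketch: a Sobol sequence supplies uniformly distributed CDF values
# in every dimension of the sample space.
from scipy.stats import qmc

sampler = qmc.Sobol(d=2, scramble=False)  # one dimension per random variable
cdf_values = sampler.random(32)           # 32 points in [0, 1)^2; powers of 2
                                          # preserve the Sobol balance property
```

Note that the unscrambled sequence starts at the origin (all zeros), which must be skipped or scrambled away before the inverse-CDF mapping of Step 4, since, for example, a Gaussian inverse CDF is unbounded at 0.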
11. Procedure: Step 4 (Coordinates Evaluation)
• Coordinates are evaluated using the inverse function of the CDF, through:
  • Analytical expressions, or
  • Numerical approaches:
    • Newton's method
    • The Levenberg-Marquardt algorithm
    • The trust region methods
[Figure: resulting sample points in the (x1, x2) plane]
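When the inverse CDF has no analytical expression, Step 4 reduces to solving CDF(x) = u numerically. The slide lists Newton, Levenberg-Marquardt, and trust-region methods; the sketch below uses a simple bracketing root search instead (scipy.optimize.brentq), which is an assumption for illustration, not the deck's prescribed solver.

```python
# Step 4 sketch: numerically invert a CDF by root finding when no
# closed-form inverse exists.
from scipy import stats
from scipy.optimize import brentq

def inverse_cdf(u, dist, lo, hi):
    """Return x such that dist.cdf(x) = u, searched on the bracket [lo, hi]."""
    return brentq(lambda x: dist.cdf(x) - u, lo, hi)

# Example: the 75th-percentile coordinate of a Gamma-distributed variable
# (parameter values are illustrative only).
x = inverse_cdf(0.75, stats.gamma(a=2.0, scale=150.0), 0.0, 1e5)
```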
13. Window Performance Evaluation
• The heat transfer rate through a triple pane window is evaluated under varying climatic conditions.
• A CFD model of the triple pane window is created.
• The sample climatic conditions are the boundary conditions of the window CFD model.
[Figure: cross section of the triple pane window]
14. Window Performance Evaluation
Step 1: Random Variable Observations
Three climatic conditions are observed:
• Air temperature
• Wind speed
• Solar radiation
Data: Michigan, ND; 3720 hourly observations for each of January and August, from 2006 to 2010.
15. Window Performance Evaluation
Step 2: Distribution Function Fitting
Distributions fitted to the three climatic conditions:
• Air temperature: Gaussian
• Wind speed: Weibull
• Solar radiation: Gamma
Parameters are fitted using Maximum Likelihood Estimation. (Data: Michigan, ND; 3720 hourly observations for each of January and August, 2006 to 2010.)
16. Window Performance Evaluation
Step 3: Generating the Sequence of CDFs
The uniformly distributed CDF values are generated using the Sobol sequence, applied to the three fitted distributions (Gaussian air temperature, Weibull wind speed, Gamma solar radiation; parameters fitted by MLE to the Michigan, ND observations).
[Figures: fitted CDFs of temperature, wind speed, and solar radiation]
17. Window Performance Evaluation
Step 4: Coordinates Evaluation
The coordinates of the sample climatic conditions are evaluated by inverting the fitted CDFs (Gaussian air temperature, Weibull wind speed, Gamma solar radiation) at the generated sequence values.
[Figures: inverse evaluation of the fitted CDFs of temperature, wind speed, and solar radiation]
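Putting Steps 2-4 together for the three climatic inputs, the following is a hedged end-to-end sketch. The file names, the use of scipy.stats/qmc, and the sample count of 32 are illustrative assumptions; the deck uses 31 points and does not show its implementation.

```python
# End-to-end sketch of Steps 2-4 for the three climatic variables.
import numpy as np
from scipy import stats
from scipy.stats import qmc

temp = np.loadtxt("temperature.txt")       # hypothetical hourly observations
wind = np.loadtxt("wind_speed.txt")
solar = np.loadtxt("solar_radiation.txt")  # daylight hours only: the Gamma
                                           # support requires positive values

dists = [
    stats.norm(*stats.norm.fit(temp)),                        # Gaussian
    stats.weibull_min(*stats.weibull_min.fit(wind, floc=0)),  # Weibull
    stats.gamma(*stats.gamma.fit(solar, floc=0)),             # Gamma
]

# Step 3: uniform CDF values; scrambling avoids the all-zeros origin point,
# whose Gaussian inverse CDF would be unbounded.
u = qmc.Sobol(d=3, scramble=True, seed=0).random(32)

# Step 4: the inverse CDFs map the Sobol points to climatic coordinates.
samples = np.column_stack([d.ppf(u[:, i]) for i, d in enumerate(dists)])
```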
18. Distribution of Sample Points
[Figures: sample climatic conditions for January and for August]
The sample points crowd in the regions where the PDF is high.
19. Surrogate Model Development
• The heat transfer rate through the window is evaluated at the 31 sample climatic conditions for January and for August, respectively.
• Two surrogate models, one for January and one for August, are developed using Kriging.
[Diagram: inputs (outdoor temperature, wind speed, solar radiation) → Kriging → output (heat flux)]
In this paper, we use the MATLAB Kriging toolbox DACE (Design and Analysis of Computer Experiments), developed by Dr. Nielsen.
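The deck's surrogates are built with the MATLAB DACE toolbox; as a rough Python stand-in (an assumption for illustration, not the authors' implementation), a Gaussian-process regressor plays the same Kriging role:

```python
# Hedged Kriging stand-in: Gaussian-process regression with scikit-learn.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

X = np.load("sample_conditions.npy")  # hypothetical (31, 3): temp, wind, solar
y = np.load("heat_flux.npy")          # hypothetical (31,): CFD heat flux results

kernel = ConstantKernel() * RBF(length_scale=[1.0, 1.0, 1.0])  # anisotropic
surrogate = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)

# Predict the heat flux for a new climatic condition (values illustrative).
q_hat = surrogate.predict(np.array([[285.0, 5.0, 300.0]]))
```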
20. Surrogate Model Performance Criteria
For January and August, the 3720 hourly climatic conditions are used to evaluate the errors of each surrogate.
The performance of a surrogate can be evaluated using:
• Root Mean Squared Error (RMSE)
• Root Mean Squared Percentage Error (RMSPE)
• Maximum Absolute Error (MAE)
• Maximum Percentage Error (MPE)
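The four metrics are straightforward to compute. A sketch follows (the function and array names are assumptions); it also shows why near-zero heat-flux values inflate the percentage metrics reported on the next slide.

```python
# Sketch of the four performance metrics from this slide.
import numpy as np

def surrogate_errors(y_true, y_pred):
    err = y_pred - y_true
    pct = err / y_true  # unstable when y_true is near zero, which is what
                        # drives the very large MPE values reported below
    return {
        "RMSE":  np.sqrt(np.mean(err ** 2)),
        "RMSPE": np.sqrt(np.mean(pct ** 2)),
        "MAE":   np.max(np.abs(err)),  # Maximum Absolute Error, per the deck
        "MPE":   np.max(np.abs(pct)),
    }
```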
21. Surrogate Model Performance Comparison

Month   | Method  | RMSE  | MAE  | RMSPE | MPE
January | Inverse | 0.047 | 0.49 | 0.64% | 7.2%
January | Sobol   | 0.054 | 0.30 | 0.68% | 9.3%
August  | Inverse | 0.079 | 0.54 | 11%   | 318%
August  | Sobol   | 0.094 | 0.32 | 85%   | 4373%

• RMSE, RMSPE, and MPE: inverse transform sampling performs better than the Sobol sequence.
• MAE: inverse transform sampling has larger MAE values.
22. Performance in Increasing Sample Space
• All the hourly climatic conditions are classified into nested regions of increasing PDF values in the sample space.
• The performance of the surrogate models is evaluated over these increasing sample-space regions.
[Figure: the 3720 hourly climatic conditions in temperature (K), wind speed (m/s), and solar radiation (W/m2) space, with nested regions covering 0.1%, 0.8%, ..., 100% of the sample space]
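One way to realize this classification is sketched below under the independence assumption stated earlier, ranking conditions by their fitted joint PDF value; the region fractions are taken from the figure annotations, and this ranking-based construction is an interpretation rather than the deck's documented algorithm.

```python
# Sketch: rank hourly conditions by their fitted joint PDF value and form
# nested regions containing the densest fraction of the sample space.
import numpy as np

def pdf_regions(points, dists, fractions=(0.001, 0.008, 0.1, 1.0)):
    # Independent inputs: the joint PDF is the product of the marginals.
    density = np.prod(
        [d.pdf(points[:, i]) for i, d in enumerate(dists)], axis=0)
    order = np.argsort(density)[::-1]  # densest conditions first
    n = len(points)
    return {f: order[: max(1, int(f * n))] for f in fractions}
```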
23. Performance in Increasing Sample Space
[Figures: Root Mean Squared Error and Root Mean Squared Percentage Error of the January surrogate model vs. increasing percentage of the sample space]
24. Performance in Increasing Sample Space
[Figures: Maximum Absolute Error and Maximum Percentage Error of the January surrogate model vs. increasing percentage of the sample space]
25. Performance in Increasing Sample Space
[Figures: Root Mean Squared Error and Root Mean Squared Percentage Error of the August surrogate model vs. increasing percentage of the sample space]
26. Performance in Increasing Sample Space
[Figures: Maximum Absolute Error and Maximum Percentage Error of the August surrogate model vs. increasing percentage of the sample space]
27. Conclusions
• Inverse transform sampling is uniquely helpful for surrogate development where the system inputs follow a certain distribution.
• The CDF values of the inputs are made to follow a low-discrepancy (quasirandom) sequence, such as the Sobol sequence.
• For the window performance evaluation, the surrogate models developed using inverse transform sampling have lower root mean squared error than those developed using the Sobol sequence directly.
• For the window performance evaluation, the surrogate models developed using inverse transform sampling have higher maximum absolute error than those developed using the Sobol sequence directly.
28. Future Work
• Extend the applicability of inverse transform sampling to correlated multi-variate/multi-input systems.
29. Acknowledgement
• I would like to acknowledge my research adviser, Prof. Achille Messac, for his immense help and support in this research.
• Support from the NSF Awards is also acknowledged.
30. Selected References
• Husslage, B. G., Rennen, G., van Dam, E. R., and den Hertog, D., "Space-filling Latin Hypercube Designs for Computer Experiments," Optimization and Engineering, Vol. 12, 2011, pp. 611-632.
• Clarkson, K. L. and Shor, P. W., "Applications of Random Sampling in Computational Geometry, II," Discrete and Computational Geometry, Vol. 4, 1989, pp. 387-421.
• Goldreich, O., Computational Complexity: A Conceptual Perspective, Cambridge University Press, 1st ed., 2008.
• LaValle, S. M., Planning Algorithms, Cambridge University Press, 2006.
• Niederreiter, H., "Point Sets and Sequences with Small Discrepancy," Monatshefte für Mathematik, Vol. 104, December 1987, pp. 273-337.
• van der Corput, J. G., "Verteilungsfunktionen," Nederl. Akad. Wetensch. Proc., Vol. 38, 1935, pp. 813-821.
• Diaconis, P., "The Distribution of Leading Digits and Uniform Distribution Mod 1," The Annals of Probability, Vol. 5, No. 1, Feb. 1977, pp. 72-81.
• Sobol, I. M., "Uniformly Distributed Sequences with an Additional Uniform Property," USSR Computational Mathematics and Mathematical Physics, Vol. 16, 1976, pp. 236-242.
• Faure, H., "Discrépance de suites associées à un système de numération (en dimension s)," Acta Arithmetica, Vol. 41, 1982, pp. 337-351.
• Miller, F., Vandome, A., and John, M., Inverse Transform Sampling, VDM Verlag Dr. Mueller e.K., 2010.
• von Neumann, J., "Various Techniques Used in Connection with Random Digits," Nat. Bureau Stand. Appl. Math. Ser., Vol. 12, 1951, pp. 36-38.
• Marshall, A. W., "The Use of Multi-stage Sampling Schemes in Monte Carlo Computations," in H. A. Meyer (ed.), Symposium on Monte Carlo Methods, John Wiley & Sons, New York, 1956, pp. 123-140.
• Gilks, W., Richardson, S., and Spiegelhalter, D., Markov Chain Monte Carlo in Practice, Interdisciplinary Statistics, Chapman & Hall, 1996.
31. Performance in Increasing Sample Space
• All the hourly climatic conditions are classified into regions with increasing PDF values in the sample space.
• For each variable, the probability is the integral of the fitted PDF over the shortest interval.
• The performance of the surrogate models is evaluated in the increasing sample space.
32. Review
• Sampling sequences
  • Latin hypercube
  • Random
  • Pseudorandom
  • Low-dispersion
  • Low-discrepancy
• Generating sample points from a probability distribution
  • Inverse transform sampling
  • Rejection sampling
  • Importance sampling
  • Markov Chain Monte Carlo
    • Metropolis-Hastings sampling
    • Gibbs sampling
33. Comparisons and Analyses
[Figures: Voronoi diagrams of the sample points from the Sobol sequence and from inverse transform sampling]
• A Voronoi diagram is a special kind of decomposition of a metric space, determined by distances to a specified discrete set of points in the space.
• Each point has a cell that includes the region closer to that point than to any other.
• The cell boundaries are equidistant from the two nearest points.
Editor's Notes
Inverse transform sampling was applied to a unimodal distribution to sample 31 points in the previous slides, to show how the sampling approach works.
It can also be used to sample more points. The figure on the left shows that, when the number of sample points is increased from 31 to 127, the sample points obtained from this approach still crowd in the region where the probability density is high. Although the number of points in the regions with low probability density also increases, it grows at a lower rate than the number of points in the regions with high probability density.
Inverse transform sampling can also be used to sample multi-modal distributions. The two figures show the sample points for a bimodal distribution and a quad-modal distribution. The sample points crowd in the regions where the probability density is high.
Higher MAE: in the regions with very low probability density, the sample points are far from each other, so the absolute errors at some points in these regions are higher for inverse transform sampling than for the Sobol sequence.
August's high MPE of 4373%: at some points, the actual values are very close to zero, and these values are used as denominators.
For the Sobol sequence, the Root Mean Squared Error and the Root Mean Squared Percentage Error are similar across different percentages of the sample space.
For inverse transform sampling, the Root Mean Squared Error and the Root Mean Squared Percentage Error increase as the percentage increases; their overall performance is still better than that of the Sobol sequence.
As the percentage of the sample space increases, the maximum absolute error and the maximum percentage error both increase.
When the whole space is reached, the density of sample points obtained by inverse transform sampling becomes lower in some subspaces than that of the Sobol sequence, and the maximum absolute error for the inverse transform becomes higher than that of Sobol. However, the maximum percentage error for the inverse transform is still lower than that for Sobol.
The spikes in the root mean squared percentage error and the maximum percentage error occur because, in the 12.5% region but not in the 10.0% region, the actual heat flux through the window is very close to zero; this near-zero value is used as the denominator in the percentage error evaluation.