The document compares three methods for selecting input variables in uncertainty quantification and sensitivity analysis of computer models: random sampling, stratified sampling, and Latin hypercube sampling. It shows that stratified sampling and Latin hypercube sampling produce estimators with lower variance than random sampling for important statistics like the sample mean and variance. When applied to a fluid dynamics computer code called SOLA-PLOOP, Latin hypercube sampling produced estimators with standard deviations around one-fourth those of random sampling, demonstrating its superior performance.
Water Industry Process Automation & Control Monthly - April 2024
A comparison of three methods for selecting values of input variables in the analysis
1. American Society for Quality
A Comparison of Three Methods for Selecting Values of Input Variables in the Analysis of
Output from a Computer Code
Author(s): M. D. Mckay, R. J. Beckman, W. J. Conover
Source: Technometrics, Vol. 42, No. 1, Special 40th Anniversary Issue (Feb., 2000), pp. 55-61
Published by: American Statistical Association and American Society for Quality
Stable URL: http://www.jstor.org/stable/1271432
Accessed: 18/09/2010 22:32
Your use of the JSTOR archive indicates your acceptance of JSTOR's Terms and Conditions of Use, available at
http://www.jstor.org/page/info/about/policies/terms.jsp. JSTOR's Terms and Conditions of Use provides, in part, that unless
you have obtained prior permission, you may not download an entire issue of a journal or multiple copies of articles, and you
may use content in the JSTOR archive only for your personal, non-commercial use.
Please contact the publisher regarding any further use of this work. Publisher contact information may be obtained at
http://www.jstor.org/action/showPublisher?publisherCode=astata.
Each copy of any part of a JSTOR transmission must contain the same copyright notice that appears on the screen or printed
page of such transmission.
JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide range of
content in a trusted digital archive. We use information technology and tools to increase productivity and facilitate new forms
of scholarship. For more information about JSTOR, please contact support@jstor.org.
American Statistical Association and American Society for Quality are collaborating with JSTOR to digitize,
preserve and extend access to Technometrics.
http://www.jstor.org
2. A Comparison of Three Methods for Selecting
Values of Input Variables in the Analysis of
Output From a Computer Code
M. D. MCKAYAND R. J. BECKMAN
LosAlamosScientificLaboratory
P.O.Box 1663
LosAlamos,NM87545
W. J. CONOVER
Departmentof Mathematics
TexasTechUniversity
Lubbock,TX79409
Two types of sampling plans are examined as alternatives to simple random sampling in Monte
Carlo studies. These plans areshown to be improvementsover simple randomsamplingwith respect
to variancefor a class of estimators which includes the sample mean andthe empirical distribution
function.
KEY WORDS: Latin hypercubesampling;Sampling techniques;Simulation techniques;Variance
reduction.
1. INTRODUCTION
Numerical methods have been used for years to provide
approximatesolutions to fluid flow problems that defy ana-
lytical solutions because of their complexity. A mathemati-
cal model is constructedto resemble the fluid flow problem,
and a computer program (called a "code"), incorporating
methods of obtaining a numerical solution, is written.Then
for any selection of input variables X = (X,..., XK) an
output variable Y = h(X) is produced by the computer
code. If the code is accurate the output Y resembles what
the actualoutputwould be if an experimentwere performed
under the conditions X. It is often impractical or impossi-
ble to perform such an experiment.Moreover,the computer
codes are sometimes sufficiently complex so that a single
set of input variables may require several hours of time on
the fastest computers presently in existence in orderto pro-
duce one output. We should mention that a single output
Y is usually a graph Y(t) of output as a function of time,
calculated at discrete time points t, to < t < tl.
When modeling real world phenomena with a computer
code one is often faced with the problem of what values
to use for the inputs. This difficulty can arise from within
the physical process itself when system parametersare not
constant, but vary in some manner about nominal values.
We model our uncertainty about the values of the inputs
by treating them as random variables. The information de-
sired from the code can be obtained from a study of the
probability distribution of the output Y(t). Consequently,
we model the "numerical"experiment by Y(t) as an un-
known transformationh(X) of the inputs X, which have a
known probability distribution F(x) for x c S. Obviously
several values of X, say XI,..., XN, must be selected as
successive inputs sets in order to obtain the desired infor-
mation concerning Y(t). When N must be small because
of the runningtime of the code, the input variables should
be selected with great care.
The next section describes three methods of selecting
(sampling) input variables. Sections 3, 4 and 5 are devoted
to comparing the three methods with respect to their per-
formance in an actual computer code.
The computer code used in this paper was developed
in the Hydrodynamics Group of the Theoretical Division
at the Los Alamos Scientific Laboratory, to study reac-
tor safety (Hirt and Romero 1975). The computer code is
named SOLA-PLOOP and is a one-dimensional version of
anothercode SOLA (Hirt,Nichols, and Romero 1975). The
code was used by us to model the blowdown depressuriza-
tion of a straightpipe filled with water at fixed initial tem-
perature and pressure. Input variables include: X1, phase
change rate;X2, dragcoefficient for driftvelocity; X3, num-
berof bubblesperunitvolume;andX4, pipe roughness.The
inputvariablesareassumedto be uniformly distributedover
given ranges. The output variable is pressure as a function
of time, where the initial time to is the time the pipe rup-
tures and depressurizationinitiates, and the final time tl is
20 milliseconds later.The pressure is recorded at 0.1 milli-
second time intervals.The code was used repeatedly so that
the accuracy and precision of the three sampling methods
could be compared.
2. A DESCRIPTIONOF THE THREE METHODS
USED FOR SELECTINGTHE VALUES
OF INPUTVARIABLES
From the many different methods of selecting the values
of input variables, we have chosen three that have consid-
erable intuitive appeal. These are called random sampling,
stratifiedsampling, and Latin hypercube sampling.
RandomSampling. Let the input values XI,..., XN be
a random sample from F(x). This method of sampling is
perhaps the most obvious, and an entire body of statistical
literaturemay be used in making inferences regarding the
distributionof Y(t).
? 1979 American Statistical Association
and the American Society for Quality
TECHNOMETRICS,FEBRUARY2000, VOL.42, NO. 1
55
3. M. D. MCKAY,R. J. BECKMAN,AND W. J. CONOVER
StratifiedSampling. Using stratifiedsampling,all areas
of the samplespaceof X arerepresentedby inputvalues.
Let the samplespaceS of X be partitionedintoI disjoint
strataSi. Let pi = P(X E Si) representthe size of Si.
Obtain a random sample Xij, j = 1,..., ni from Si. Then
of coursethe ni sum to N. If I = 1, we have random
samplingoverthe entiresamplespace.
LatinHypercubeSampling. The same reasoning thatled
to stratifiedsampling,ensuringthatall portionsof S were
sampled,couldleadfurther.If we wishto ensurealsothat
each of the inputvariablesXk has all portionsof its dis-
tributionrepresentedby inputvalues,we can divide the
rangeof each Xk into N strataof equalmarginalproba-
bility 1/N, and sampleonce from each stratum.Let this
sample be Xkj, j = 1,..., N. These form the Xk compo-
nent, k = 1,..., K, in Xi, i = 1,..., N. The components
of the variousXk's are matchedat random.This method
of selectinginputvaluesis anextensionof quotasampling
(Steinberg1963),andcan be viewed as a K-dimensional
extensionof Latinsquaresampling(Raj1968).
One advantageof the Latinhypercubesampleappears
when the output Y(t) is dominatedby only a few of
the componentsof X. This methodensuresthateach of
those componentsis representedin a fully stratifiedman-
ner, no matterwhich componentsmight turn out to be
important.
WementionherethattheN intervalsontherangeof each
componentof X combineto form NK cells whichcover
the samplespaceof X. Thesecells, whicharelabeledby
coordinatescorrespondingto the intervals,areused when
findingthepropertiesof the samplingplan.
2.1 Estimators
IntheAppendix(Section8),stratifiedsamplingandLatin
hypercubesamplingareexaminedandcomparedtorandom
samplingwithrespectto theclassof estimatorsof theform
N
T(Y1,..., YN) = (1/N) g(Yi),
i=l
whereg(.) = arbitraryfunction.
If g(Y) = Y thenT representsthe samplemeanwhichis
used to estimate E(Y). If g(Y) = yr we obtain the rth
sample moment. By letting g(Y) = 1 for Y < y, 0 other-
wise, we obtaintheusualempiricaldistributionfunctionat
thepointy. Ourinterestis centeredaroundtheseparticular
statistics.
Let T denotetheexpectedvalueof T whentheYt'scon-
stitutearandomsamplefromthedistributionof Y = h(X).
We show in the Appendixthat both stratifiedsampling
and Latinhypercubesamplingyield unbiasedestimators
of r.
If TR is theestimateof T froma randomsampleof size
N, andTs is the estimatefroma stratifiedsampleof size
N, thenVar(Ts)< Var(TR)whenthe stratifiedplanuses
equalprobabilitystratawith one sampleper stratum(all
pi = 1/N and nij = 1). No direct means of comparing the
varianceof the correspondingestimatorfromLatinhyper-
cube sampling,TL,to Var(Ts)has been found.However,
TECHNOMETRICS,FEBRUARY2000, VOL.42, NO. 1
thefollowingtheorem,provedin theAppendix,relatesthe
variances of TL and TR.
Theorem. If Y = h(X1,... XK) is monotonic in each
of its arguments,andg(Y) is a monotonicfunctionof Y,
then Var(TL)< Var(TR).
2.2 The SOLA-PLOOP Example
The three samplingplans were comparedusing the
SOLA-PLOOPcomputercodewithN = 16.Firstarandom
sampleconsistingof 16 valuesof X = (X1,X2,X3,X4)
wasselected,enteredasinputs,and16graphsof Y(t) were
observedas outputs.Theseoutputvalueswereusedin the
estimators.
Forthe stratifiedsamplingmethodtherangeof eachin-
put variablewas dividedat the medianinto two partsof
equalprobability.Thecombinationsof rangesthusformed
produced24 = 16 strataSi. Oneobservationwas obtained
atrandomfromeachSi as input,andtheresultingoutputs
wereusedto obtaintheestimates.
ToobtaintheLatinhypercubesampletherangeof each
inputvariableXi was stratifiedinto 16 intervalsof equal
probability,andoneobservationwasdrawnatrandomfrom
eachinterval.These16valuesforthe4 inputvariableswere
matchedatrandomto form 16 inputs,andthus 16 outputs
fromthecode.
The entireprocessof samplingand estimatingfor the
threeselectionmethodswas repeated50 timesin orderto
getsomeideaof theaccuraciesandprecisionsinvolved.The
total computertime spentin runningthe SOLA-PLOOP
code in this studywas 7 hourson a CDC-6600.Some of
the standarddeviationplotsappearto be inconsistentwith
the theoreticalresults.These occasionaldiscrepanciesare
believedto arisefromthenon-independenceof theestima-
torsovertimeandthe smallsamplesizes.
3. ESTIMATINGTHE MEAN
Thegoodnessof an unbiasedestimatorof the meancan
be measuredby the size of its variance.Foreachsampling
method,theestimatorof E(Y(t)) is of theform
N
Y(t) = (l/N) E Yi(t)
i=l
(3.1)
where
i=1,...,N.
In the case of the stratifiedsample,the Xi comesfrom
stratumSi, pi = 1/N and ni = 1. For the Latin hypercube
sample,theXi is obtainedin themannerdescribedearlier.
Each of the three estimators YR,Ys, and YLis an unbiased
estimatorof E(Y(t)). The variancesof the estimatorsare
givenin (3.2):
Var(Y(t)) = (1/N)Var(Y(t))
N
Var(Ys(t)) = Var(YR(t))- (1/N2) (pi - ,)2
i=l
56
Yi(t) = h(X),
4. A COMPARISONOF THREE METHODS FOR SELECTINGVALUESOF INPUT
150-0
ANDOM ............
STRATI .........
LAT --
I I
0
2
I-
(nV)
La.
z
<
LLJ
2~E
RANDOM
STRATIFIED
LATIN100-0
50-0
0-0 5-0 1I0
TIME
1-02015.0 20-0
Figure 1. Estimating the Mean: The Sample Mean of YR(t), Ys(t),
and YL(t).
Var(YL(t))= Var(YR(t))+ ((N - 1)/N)
1/(NK(N- I)K)) E (i
R
- )(tj - t) (3.2)
TIME
Figure 3. Estimating the Variance:The Sample Mean of S2 (t), S (t),
and S2 (t).
YL(t) clearly demonstrates superiority as an estimator in
this example, with a standarddeviationroughly one-fo[u]rth
that of the random sampling estimator.
where p = E(Y(t)),
pi = E(Y(t)lX E Si) in the stratified sample, or
Pi - E(Y(t)lX e cell i) in the Latin hypercube
sample,
and R means the restricted space of all pairs ,ui,/j having
no cell coordinates in common.
For the SOLA-PLOOP computer code the means and
standarddeviations, based on 50 observations, were com-
puted for the estimators just described. Comparative plots
of the means are given in Figure 1. All of the plots of the
means are comparable, demonstrating the unbiasedness of
the estimators.
Comparativeplots of the standarddeviations of the es-
timators are given in Figure 2. The standarddeviation of
Ys(t) is smaller than that of YR(t) as expected. However,
or
O0
I-
2
I-
(C)
Lj
La.
0
0
4. ESTIMATINGTHE VARIANCE
For each sampling method, the form of the estimator of
the variance is
N
S2(t) - (1/N)Y (Y(t)- Y(t))2,
i=l
and its expectation is
E(S2(t)) Var(Y(t)) - Var(Y(t)),
(4.1)
(4.2)
where Y(t) is one of YR(t),Ys(t), or YL(t).
In the case of the random sample, it is well known that
NS2/(N- 1) is an unbiased estimator of the variance of
Y(t). The bias in the case of the stratified sample is un-
known. However, because Var(Ys(t)) < Var(YR(t)),
(1- 1/N)Var(Y(t)) < E(S2(t)) < Var(Y(t)). (4.3)
50'0 -
2-5 -
20 -
1-5 -
1.0 -
0-5 -
RANDOM .
STRATFIED ----------
LATIN
: ''
:, i
.@: ,;i/ .
S
., :t
'~.....~- -"-- '--t-- -~....._.._ ....~`
I:
I-
V)
LA-
0
O
c5
u'
40-0 -
30-0 -
20-0 -
10-0 -
RANDOM
STRATIFIED
LATIN
i''
ii
I 1'
1 :,
1
1 '
1 '
1 't
?f?
I
I
"r,
5-0 10-0
TIME
15-15.0 20-0
0.0 5.0 10-0
TIME
15-0 2015.0 20.0
Figure 2. Estimating the Mean: The Standard Deviation of YR(t),
Ys(t), and YL(t).
Figure 4. Estimating the Variance: The Standard Deviation of S2(t),
Ss(t), and S2(t).
TECHNOMETRICS,FEBRUARY2000, VOL.42, NO. 1
140-0 -
CY
O0
I-
2
V)
LJ6
LA.
0
zLLJ
w
120-0-
100*0-
80.0 -
60-0 -
0'0
0.0
Ai"-(n A -.I.v -V
n.n ' a -1. I-,.,
..- /-a
57
5. M. D. MCKAY,R. J. BECKMAN,AND W. J. CONOVER
o-
RANDOM
STRATIFID .......- I
LATIN --
/
/
Il/
//
"_'
......./""
20-0 30-0 40-0 50-0 60-0 70-0
PRESSURE
2.1, theexpectedvalueof G(y,t) underthethreesampling
plansis thesame,andunderrandomsampling,theexpected
value of G(y, t) is D(y, t).
The variancesof the threeestimatorsaregivenin (5.2).
Di againrefersto eitherstratumi orcell i, as appropriate,
andR representsthesamerestrictedspaceasit didin (3.2).
Var(GR(y,t)) = (1/N)D(y, t)(l - D(y, t))
Var(Gs(y, t)) = Var(GR(y,t))
N
- (1/N2) (D (y, t)- D(y, t))2
t=l
I80 980.0 90'0
Figure 5. Estimating the CDF: The Sample Mean of GR(Y,t), Gs(y,
t), and GL(y, t) at t = 1.4.
Thebiasin the Latinhypercubeplanis also unknown,but
for the SOLA-PLOOPexampleit wassmall.Variancesfor
theseestimatorswerenotfound.
AgainusingtheSOLA-PLOOPexample,meansandstan-
darddeviations(basedon 50 observations)werecomputed.
The meanplots are given in Figure3. They indicatethat
all threeestimatorsare in relativeagreementconcerning
thequantitiestheyareestimating.Intermsof standardde-
viationsof the estimators,Figure4 shows that,although
stratifiedsamplingyieldsaboutthe sameprecisionas does
randomsampling,Latinhypercubefurnishesa clearlybet-
terestimator.
5. ESTIMATINGTHE DISTRIBUTIONFUNCTION
The distribution function, D(y,t), of Y(t) = h(X) may
beestimatedbytheempiricaldistributionfunction.Theem-
piricaldistributionfunctioncanbe writtenas
N
G(y, t) = (1/N) u(y- Yi(t)), (5.1)
i=-i
whereu(z) = 1 for z > 0 and is zero otherwise.Since
equation(5.1) is of the formof the estimatorsin Section
015 -
~~~~~~~RANDOMWY~~~~~~
.
0o .T5RA D ..........
----
< o0-0- LAW -- /
I- :
vG:y,-, tatt=4(:,a
_
.....*
:
:-
^ ^ *../:-
Eo , . .: .
0. 2''
0-00 /J * I I I I
20'0 30-0 4O?0 5-0s 0'0 700 0-o0 90-0
PRESSURE
Figure 6. Estimating the CDF: The Standard Deviation of GR(y, t),
Gs(y, t), and GL(y, t) at t = 1.4.
Var(GL(y,t)) = Var(GR(y,t))
+ ((N - 1)/N. 1/NK(N - 1)K) E (Di(y,t)
R
- D(y, t)). (Dj(y, t) - D(y, t)). (5.2)
As with the cases of the meanandvarianceestimators,
thedistributionfunctionestimatorswerecomparedfor the
threesamplingplans.Figures5 and6 give the meansand
standarddeviationsof the estimatorsat t = 1.4 ms. This
time pointwas chosento correspondto the time of max-
imumvariancein the distributionof Y(t). Againthe esti-
matesobtainedfroma Latinhypercubesampleappearto
be moreprecisein generalthanthe othertwo typesof es-
timates.
6. DISCUSSION AND CONCLUSIONS
We havepresentedthreesamplingplansandassociated
estimatorsof themean,thevariance,andthepopulationdis-
tributionfunctionof the outputof a computercode when
theinputsaretreatedasrandomvariables.Thefirstmethod
is simplerandomsampling.The secondmethodinvolves
stratifiedsamplingandimprovesuponthefirstmethod.The
thirdmethodis calledhereLatinhypercubesampling.It is
an extensionof quotasampling(Steinberg1963),andit is
a firstcousinto the "randombalance"designdiscussedby
Satterthwaite(1959),Budne(1959),Youdenet al. (1959),
Anscombe(1959),andto thehighlyfractionalizedfactorial
designs discussedby Enrenfeldand Zacks (1951, 1967),
Dempster(1960, 1961), and Zacks (1963, 1964), and to
latticesamplingas discussedby Jessen(1975).This third
methodimprovesuponsimplerandomsamplingwhencer-
tain monotonicityconditionshold, andit appearsto be a
goodmethodto use for selectingvaluesof inputvariables.
7. ACKNOWLEDGMENTS
WeextendaspecialthankstoRonaldK.Lohrding,forhis
earlysuggestionsrelatedtothisworkandforhiscontinuing
supportandencouragement.We also thankourcolleagues
LarryBruckner,BenDuran,C.Phive,andTomBoullionfor
theirdiscussionsconcerningvariousaspectsof theproblem,
andDaveWhitemanfor assistancewiththecomputer.
Thispaperwaspreparedunderthe supportof the Anal-
ysis DevelopmentBranch,Divisionof ReactorSafetyRe-
search,NuclearRegulatoryCommission.
1.0 -
0.8 -
0*6-
0-4 -
0-2 -
cr
0
I-4
V)
2
L&J
La.
0
z
LLJ
:2
0'0
TECHNOMETRICS,FEBRUARY2000, VOL.42, NO. 1
lm . . .. k-
58
6. A COMPARISONOF THREE METHODSFOR SELECTINGVALUESOF INPUT
8. APPENDIX
In the sectionsthatfollow we presentsome generalre-
sults aboutstratifiedsamplingand Latinhypercubesam-
pling in orderto make comparisonswith simplerandom
sampling.Wemovefromthegeneralcaseof stratifiedsam-
pling to stratifiedsamplingwith proportionalallocation,
andthen to proportionalallocationswith one observation
perstratum.WeexamineLatinhypercubesamplingforthe
equalmarginalprobabilitystratacase only.
8.1 TypeIEstimators
LetX denotea K variaterandomvariablewithprobabil-
ity densityfunction(pdf) f(x) for x E S. Let Y denotea
univariatetransformationof X givenby Y = h(X). Inthe
contextof thispaperwe assume
X f(x),xeS KNOWNpdf
Y = h(X) UNKNOWNbutobservable
transformationof X.
The class of estimatorsto be consideredare those of the
form
N
T(Ul,..., iUN)=-(l/N) Eg (ui), (8.1)
t=l
where g(.) is an arbitrary,knownfunction.In particular
we use g(u) = ur to estimatemoments,andg(u) = 1 for
u > 0,= 0 elsewhere,to estimatethedistributionfunction.
The samplingschemesdescribedin the following sec-
tions will be comparedto randomsamplingwith respect
to T. The symbolTR denotesT(Y1,..., YN)whenthe ar-
gumentsY, ..., YNconstitutea randomsampleof Y. The
meanandvarianceof TRaredenotedby r and02/N. The
statisticT given by (8.1) will be evaluatedat arguments
arisingfrom stratifiedsamplingto form Ts, andat argu-
mentsarisingfromLatinhypercubesamplingto formTL.
The associatedmeansand varianceswill be comparedto
thosefor randomsampling.
8.2 StratifiedSampling
Lettherangespace,S, of X bepartitionedintoI disjoint
subsetsSi of size pi = P(X c Si), with
I
5Pi 1.
i=l
Let Xij,j = 1,., ni, be a randomsamplefrom stratum
Si. Thatis, let Xij iid f(x)/pi,j = 1,..., ni, forx ESi,
butwithzerodensityelsewhere.Thecorrespondingvalues
of Y aredenotedby Yij= h(Xij), andthestratameansand
variancesof g(Y) aredenotedby
i = E(g(Yij))- j g(y)(l/pi)f(x) dx
Si
a-2 - Var(g(Yj)) =
S (g(y) -
)2(1/pi)f(x)dx.i I s
Itis easyto seethatif weusethegeneralform
I ni
Ts = (pi/ni) E g(Yij),
i=l j=l
thatTs is an unbiasedestimatorof r with variancegiven by
(8.2)Var(Ts) = (p2/ni)o2.
i=l
Thefollowingresultscanbe foundin Tocher(1963).
StratifiedSampling with Proportional Allocation. If the
probabilitysizes, pi, of the strataand the samplesizes,
ni, are chosen so that ni = piN, proportional allocation
is achieved.Inthiscase (8.2)becomes
I
Var(Ts) = Var(TR)- (I/N) EPi(iii - r)2.
i=l
(8.3)
Thus,we see thatstratifiedsamplingwithproportionalal-
locationoffersanimprovementoverrandomsampling,and
thatthe variancereductionis a functionof the differences
betweenthe stratameans,i andtheoverallmeanr.
Proportional Allocation with One Sample per Stratum.
Any stratifiedplan which employssubsampling,ni > 1,
canbe improvedby furtherstratification.Whenall ni = 1,
(8.3)becomes
N
Var(Ts) = Var(TR) - (1/N2) E (i - r)2.
i=l
(8.4)
8.3 LatinHypercube Sampling
In stratifiedsamplingthe range space S of X can be
arbitrarilypartitionedto form strata.In Latinhypercube
samplingthepartitionsareconstructedina specificmanner
usingpartitionsof therangesof eachcomponentof X. We
will onlyconsiderthecasewherethecomponentsof X are
independent.
Let the rangesof each of the K componentsof X be
partitionedinto N intervalsof probabilitysize 1/N. The
Cartesianproductof these intervalspartitionsS into NK
cellseachof probabilitysizeN-K. Eachcellcanbelabeled
by a set of K cell coordinates mi = (mil, i2,..., iK)
wheremij is the intervalnumberof componentXj repre-
sentedin cell i. A Latinhypercubesampleof size N is ob-
tained from a randomselection N of the cells ml,..., mN,
withtheconditionthatforeachj theset {mij}N is a per-
mutationof theintegers1,..., N. Onerandomobservation
is madein eachcell. Thedensityfunctionof X givenX c
cell i is NKf(x) if x E cell i, zero otherwise. The marginal
(unconditional)distributionof Yi(t)is easilyseento be the
sameas thatfor a randomlydrawnX as follows:
P(Y < y) = P(Yi < ylX E cell q)P(X c cell q)
all cells q
= E ll N K(x)dx(1/NK)
h(x)<y
-Jh(x)<y
f(x) dx.
TECHNOMETRICS,FEBRUARY2000, VOL.42, NO. 1
59
7. M. D. MCKAY,R. J. BECKMAN,AND W.J. CONOVER
From this we have TL as an unbiased estimator of T.
To arrive at a form for the variance of TL we introduce
indicator variables wt, with
f 1 if cell i is in the sample
Wi- l 0 if not.
The estimator can now be written as
NK
TL = (1/N) z wig(Yi),
i=l1
(8.5)
where Yi = h(Xi) and Xi c cell i. The variance of TL is
given by
NK
Var(TL)= (1/N2) Var(wig(Yi))
i=l
NK NK
+ (1/N2) 5 Cov(wig(Yi), Wjg(Yj)). (8.6)
i=l j=l
jii
The following results about the wi are immediate:
1. P (wi = 1) = (1/NK-1) = E(wi) = E(w2)
Var(wi) (1/NK-1)(1- 1INK-1).
2. If wi and wj correspond to cells having no cell coor-
dinates in common, then
E(wiwj) = E(wiw lwwj= O)P(wj = 0)
+ E(wiwjlwj = 1)P(wj = 1)
= 1/(N(N- 1))K-1
3. If wi and wj correspond to cells having at least one
common cell coordinate, then
E(iwjw) =0.
Now
Var(wig(Yi)) = E(w2)Var g(Yi) + E2(g(Yi))Var(wt) (8.7)
so that
NK
E Var(wig(Yi))
i=l
NK
N-K+l E E(g(Yi)
i=l
i)2
NK
+ (N-K+I1 N-2K+2) E 2 (8.8)
='-1
where ui = E{g(Y))X e cell i}. Since
E(g(Y)- i)2
-- NK (g(y) - 7)2f(x) dx + (i -
wcelli
we have
5 Var(wig(Yi))
i
N Var(Y)- N-K+1 E (i
i
+ (N-K+1 _ N-2K+2) E
Furthermore
NK NK
E Z Cov(wig(Yi),wjg(Yj))
i=1 j=1
i#j
- EZ E ijE{wiwj} - N-2K+2 ZE ipj (8.11)
i#j i?j
which combines with (8.10) to give
Var(TL) = (1/N)Var(Y) - N-K-1 (t _ )2
i
+ (N-K-1 N-2NK)-
2
+ (N - 1)-K+1NK-1
R
-N -2K
E ij-^
EE^.~~.,
(8.12)
where R means the restricted space of NK(N - 1)K pairs
([i, ,j) correspondingto cells having no cell coordinates in
common. After some algebra, and with K
ui = NKT, the
final form for Var(TL)becomes
Var(TL) = Var(TR) + (N - 1)/N[N-K(N - 1)-K
*E (pi- T)(pu- T)]. (8.13)
R
Note that Var(TL)< Var(TR)if and only if
N -K(N - 1)-K (/i - )(1jt - T) < 0, (8.14)
R
which is equivalent to saying that the covariance between
cells having no cell coordinates in common is negative. A
sufficient condition for (8.14) to hold is given by the fol-
lowing theorem.
Theorem. If Y = h(X1,..., XK) is monotonic in each
of its arguments,and if g(Y) is a monotonic function of Y,
then Var(TL)< Var(TR).
Proof The proof employs a theorem by Lehmann
(1966). Two functions r(x1,..., XK) and s(y1,..., YK) are
said to be concordant in each argument if r and s either
increase or decrease together as a function of xi - yi, with
all xj,j 7 i and yj,j - i held fixed, for each i. Also,
)2 (8.9) a pair of random variables (X, Y) are said to be nega-
tively quadrant dependent if P(X < x, Y < y) < P(X <
x)P(Y < y) for all x,y. Lehmann's theorem states that
_ T)2 if (i) (X1, Y1),(X2, Y2),... (XK, YK) are independent, (ii)
(Xi, Y,) is negatively quadrantdependent for all i, and (iii)
X = r(X1,...,XK) and Y = s(Y1,...,YK) are concor-
2. (8.10) dant in each argument,then (X, Y) is negatively quadrant
dependent.
TECHNOMETRICS,FEBRUARY2000, VOL.42, NO. 1
60
8. A COMPARISONOF THREE METHODSFOR SELECTINGVALUESOF INPUT
We earlierdescribeda stage-wiseprocessfor selecting
cellsforaLatinhypercubesample,whereacell waslabeled
by cell coordinates mi,..., miK. Two cells (I1,..., IK) and
(ml,..., mK) with no coordinates in common may be se-
lectedas follows.Randomlyselecttwo integers(R11,R21)
withoutreplacementfromthefirstN integers1,..., N. Let
11 = R11 and m1 = R21. Repeat the procedure to obtain
(R12,R22), (R13,R23), . .,(R1K, R2K) and let lk = RIk
and mk = R2k. Thus two cells are randomly selected and
lk 7 mk for k = 1,..., K.
Note thatthe pairs (Rlk, R2k), k = 1,..., K, aremutually
independent. Also note that because P(Rlk < x, R2k <
y) = [xy - min(x,y)]/(n(n - 1)) < P(Rlk < x)P(R2k <
y), where[.] representsthe"greatestinteger"function,each
pair (Rlk, R2k) is negatively quadrantdependent.
Let /ul be the expectedvalue of g(Y) withinthe cell
designated by (I1,..., 1K), and let /2 be similarly defined
for (ml,... ,mK). Then /1 = ii(R11,R12,... ,R1K) and
A12 -= I(R21, R22, ..., R2K) are concordant in each argu-
ment underthe assumptionsof the theorem.Lehmann's
theorem then yields that 1iand /2 are negatively quadrant
dependent.Therefore,
P(PI1< X, l2 < y) < P(li1 < x)P(i2 < y).
UsingHoeffding'sequation
Cov(X, Y) =
1+oo r+00
[P(X < x, Y < y)
- P(X < x)P(Y < y)] dx dy,
(seeLehmann(1966)foraproof),wehaveCov(/Al,/2) < 0.
Since Var(TL) = Var(TR)+ (N - 1)/N Cov(i,Lu2), the
theoremis proved.
Sinceg(t) asusedin bothSections3 and5 is anincreas-
ing functionof t, we cansay thatif Y = h(X) is a mono-
tonic functionof each of its arguments,Latinhypercube
samplingis betterthanrandomsamplingforestimatingthe
meanandthepopulationdistributionfunction.
[Received January 1977. Revised May 1978.]
REFERENCES
Anscombe,F. J. (1959),"QuickAnalysisMethodsfor RandomBalance
ScreeningExperiments,"Technometrics,1, 195-209.
Budne,T.A. (1959),"TheApplicationof RandomBalanceDesigns,Tech-
nometrics, 1, 139-155.
Dempster,A.P.(1960),"RandomAllocationDesignsI:OnGeneralClasses
of EstimationMethods,"TheAnnals of MathematicalStatistics, 31, 885-
905.
(1961),"RandomAllocationDesignsII:ApproximateTheoryfor
Simple Random Allocation," TheAnnals of Mathematical Statistics, 32,
387-405.
Ehrenfeld,S., andZacks,S. (1951),"RandomizationandFactorialExper-
iments,"TheAnnals of Mathematical Statistics, 32, 270-297.
(1967), "TestingHypothesesin RandomizedFactorialExperi-
ments,"TheAnnals of Mathematical Statistics, 38, 1494-1507.
Hirt,C.W.,Nichols,B.D.,andRomero,N.C.(1975),"SOLA-A Numeri-
calSolutionAlgorithmforTransientFluidFlows,"ScientificLaboratory
ReportLA-5852,LosAlamosNationalLaboratory,NM.
Hirt,C.W.,andRomero,N. C.(1975),"Applicationof aDrift-FluxModel
to Flashingin StraightPipes,"ScientificLaboratoryReportLA-6005-
MS,LosAlamosNationalLaboratory,NM.
Jessen,RaymondJ.(1975),"SquareandCubicLatticeSampling,"Biomet-
rics,31,449-471.
Lehmann,E. L. (1966),"SomeConceptsof Dependence,"TheAnnalsof
Mathematical Statistics, 35, 1137-1153.
Raj,Des. (1968),SamplingTheory,New York:McGraw-Hill.
Satterthwaite,F.E. (1959),"RandomBalanceExperimentation,"Techno-
metrics, 1, 111-137.
Steinberg,H. A. (1963),"GeneralizedQuotaSampling,"NuclearScience
and Engineering, 15, 142-145.
Tocher,K.D. (1963),TheArtofSimulation,Princeton,NJ:VanNostrand.
Youden,W.J., Kempthorne,O.,Tukey,J. W.,Box, G. E. P.,andHunter,
J. S. (1959), Discussionof the papersof Messrs.Satterthwaiteand
Budne, Technometrics,1, 157-193.
Zacks,S. (1963),"Ona CompleteClassof LinearUnbiasedEstimators
for RandomizedFactorialExperiments,"TheAnnalsof Mathematical
Statistics, 34, 769-779.
(1964), "GeneralizedLeastSquaresEstimatorsfor Randomized
FractionalReplication Designs," TheAnnals of Mathematical Statistics,
35, 696-704.
TECHNOMETRICS,FEBRUARY2000, VOL.42, NO. 1
61