324 The Immunoassay Handbook
log of the standard concentrations. The logit transform
yields unreliable results at the low and high concentration
regions, is not effective with asymmetrical immunometric
assays, and is not used much anymore.
Sometimes a log function has been used in an attempt to
reduce asymmetry in sigmoidal dose–response curves
when computing with a four-parameter logistic (4PL).
However, the log transform has the opposite effect and
increases asymmetry and worsens the 4PL ﬁt, when the
transition point is above the midpoint of the curve, which
is typical for most ELISA and immunometric assays.
In practice, unless the errors introduced by the transfor-
mation are correctly handled in the regression ﬁtting,
these transformations will introduce more error in addi-
tion to the random error and the lack-of-ﬁt error already
present in the regression.
Determining the Response–Error Relationship
It is necessary to determine the random error associated
with each sample response in order to obtain what is called
the maximum likelihood estimate of the true curve in
regression theory. In immunoassays, the random error,
expressed as the variance of the response, is caused by
- Magnitude of the response,
- Kinetics of the antigen–antibody reaction at each concentration,
- Signal error from the detector.
It is common for the variances at the high and low
responses to differ by three or four orders of magnitude.
This is not surprising since the responses themselves can
differ by two or more orders of magnitude.
Most detectors produce signal noise with a standard
deviation that is proportional to the magnitude of the
response. An exception is the error from isotopic and
luminescent detectors. These detectors measure discrete
“counts” of light photons, and their error is Poisson, i.e.,
the square root of the number of counts.
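The counting-statistics point can be illustrated with a small sketch (the function name is ours; the square-root relationship itself is standard Poisson behavior):

```python
import math

def counting_pct_cv(counts):
    """Poisson counting error: the standard deviation is sqrt(counts),
    so the %CV is 100 * sqrt(counts) / counts = 100 / sqrt(counts)."""
    return 100.0 / math.sqrt(counts)

# 10,000 counts give a 1% counting CV; 1,000,000 counts give 0.1%.
```

Higher counts therefore reduce the relative counting error, but only as the square root of the counting time or label amount.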
The error from the kinetics associated with antibody
binding is nonlinear, with the result that the kinetic varia-
tions in the reaction change disproportionately as the ratio
of analyte to primary binder and tracer binding changes. It
is this second factor that is responsible for most of the dis-
similarity in variance patterns between test methods, which
is not surprising since the reaction kinetics between differ-
ent antibodies vary so much. As a result, immunoassay
responses are always heteroscedastic and do not have a
constant variance. Because of the substantial error from
the kinetics, these variances cannot be made homoscedas-
tic, or constant, using, for example, a log transformation
or a simple 1/Y or 1/Y² formula and must be determined
individually for each test method.
This response–error relationship is usually expressed as
a regression function itself. The variance of the standard
dilutions can usually be approximated by a power function
of the response:

Var(Y) = A·Y^B

where A is a function of the magnitude of the responses
and the average noise level, and B falls in a range of 1.0–
2.2. In some cases, adding a constant minimum variance
parameter (C) will improve the variance ﬁt of very low responses.
Since it is impractical to run enough replicates to get a
reliable estimate of the true variance function from a sin-
gle assay, responses from a pool of randomly selected
assays of that test method are required. This has the added
beneﬁt of incorporating the differences in intra-assay vari-
ation observed between assays. A one-way analysis of vari-
ance (ANOVA) is performed separately on each dilution,
using the replicates from each assay. The power function is then fitted as
a linear regression of the log of the error mean square of each dilution
against the log of its mean response to generate the variance regression. These expected
variances are then used to weight the dose–response
regressions using the inverse of the variance at each point.
See Fig. 2.
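The pooled-assay procedure above can be sketched as follows (a hedged illustration; the function name and data layout are ours, and Var(Y) = A·Y^B is the power-function response–error relationship described in the text):

```python
import numpy as np

def variance_regression(dilutions):
    """Fit the power-function response-error relationship Var(Y) = A * Y**B.

    `dilutions` maps each standard dilution to a list of replicate arrays,
    one array per pooled assay (this layout is an assumption for the sketch).
    For each dilution, the one-way ANOVA error mean square pools the
    within-assay variance; a linear regression of log(error mean square)
    on log(mean response) then yields log(A) as intercept and B as slope.
    """
    log_means, log_vars = [], []
    for reps_per_assay in dilutions.values():
        all_reps = np.concatenate(reps_per_assay)
        # ANOVA error mean square: pooled within-assay variance
        sse = sum(((r - np.mean(r)) ** 2).sum() for r in reps_per_assay)
        df = sum(len(r) - 1 for r in reps_per_assay)
        log_means.append(np.log(all_reps.mean()))
        log_vars.append(np.log(sse / df))
    B, logA = np.polyfit(log_means, log_vars, 1)
    return np.exp(logA), B
```

The resulting A and B define the expected variance at any response, which supplies the inverse-variance weights for the dose–response regression.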
FIGURE 1 Logit–log method.
FIGURE 2 Only weighting equations from pooled assays model the
actual variances of a dose response curve. (The color version of this
ﬁgure may be viewed at www.immunoassayhandbook.com)
These variance regressions, which are obtained from
the normal behavior of randomly-selected assays from a
test method, can also be used to evaluate the precision of
replicates from assays of that test method. These
variance regressions are also required to compute the
error proﬁles that will be discussed below, as well as
determining the limits of quantitation, the limits of
detection, and parallelism curves in potency assays.
A good curve model should possess three properties. First,
the curve model must do a good job of approximating the
shape of the true curve. If the curve model does not do
this, there is no way to compensate for this lack-of-ﬁt com-
ponent of the total error. Second, a good curve model
must be able to average out as much of the random varia-
tion as practical to produce concentration estimates with
low error. Third, a good curve model must be able to pre-
dict accurate concentration estimates for points between
the anchor points of the standard dilutions.
Empirical, or interpolatory, methods such as point-to-
point and cubic splines have been used because they are
simple to run. These functions pass exactly through the
mean data points. Because these empirical methods pass
through the data points, there is no averaging of the data
to reduce random variation. Since the random error of
these points shifts the data from their true value, these
empirical curves are guaranteed not to be good estimates
of the true curve. Point-to-point curves do not attempt to
approximate the area between the data points (see Fig. 3),
rendering concentration estimates in these regions unreliable.
Cubic splines are not always monotonic and can oscil-
late up and down because of the random variation in every
node point, instead of producing a continuous, smooth
functional form (see Fig. 4).
Because of these and other weaknesses with empirical
methods, the concentration estimates contain a greater
amount of error than curve regressions with fewer
parameters that are able to average out the random variation.
Regression methods ﬁt a given functional form or model
to the data, so that errors in the calibrator points are par-
tially corrected for, making the calibration curve more
robust. These regression models are chosen for their abil-
ity, with parameterization, to assume a shape that matches
the dose–response shape of the standard curve. Use of a
good regression model is especially important if the cali-
brators are run as singletons.
The statistical technique most commonly used to esti-
mate the parameters in any regression method is least-
squares ﬁtting. In least-squares ﬁtting, the vertical
response distance, or residual, of each point from the curve
is calculated and squared. The sum of these squared resid-
uals is called the sum of squares error (SSE). The least-
squares procedure selects the curve that gives the smallest
SSE. An illustration of this is shown in Fig. 5.
It should be obvious that there is a problem if the SSE is
simply the sum of the individual squared residuals. If the
individual errors between points are proportional to each
other, the squared difference between the observed and
the curve responses will be very, very high at the high-
response end and very, very low at the low-response end.
For example, if the error of all points is 5%, at the high end an observed
response of 11,500 against a computed curve value of 11,000 gives a residual
of 500, or a squared residual of 250,000. Conversely, a 5% difference at the
low end, an observed 105 minus a computed 100, gives a residual of 5, or a
squared residual of 25.
Clearly this means that the regression algorithms would be
ﬁtting the curve using, essentially, only the high end. The
low end would contribute virtually nothing to the SSE that
would not be overwhelmed by the high end no matter how
bad the ﬁt was at the low end.
FIGURE 3 Point-to-point linear interpolation.
FIGURE 4 Spline function between nodes.

The random error estimate generated from the replicate variances of the pooled assays described above is
the same error as the squared residual error observed
with the ﬁtted regression models, assuming no model
lack-of-ﬁt error. Therefore, the expected variance from
the variance regression will be the same as the squared
residual error at each response, when averaged from
pooled assays. When each squared residual is divided by
its expected variance from the variance regression, i.e.,
weighting the responses, all points contribute equally
to the ﬁtted curve. This means that those concentra-
tions that have the least relative error have a greater
impact on the curve than those points with more pro-
portional error. Now, the best ﬁtting curve, i.e., the
curve with the lowest SSE, will be the optimal maximum
likelihood estimate of the true curve, as predicted from
regression theory. That explains why sample concentra-
tions computed from unweighted regression ﬁts can differ from properly
weighted curves by hundreds of percent.
The weighted SSE is sometimes referred to as wSSE
(weighted sum of squares error), or RSSE for residual
sum of squares error.
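A minimal sketch of this weighting (the function name is ours; the power-function variance model Var = A·Y^B follows the response–error relationship described earlier):

```python
import numpy as np

def weighted_sse(observed, predicted, A, B):
    """Weighted sum of squared residuals: each squared residual is divided
    by its expected variance from the variance regression Var(Y) = A * Y**B,
    so high- and low-response points contribute on the same scale."""
    predicted = np.asarray(predicted, float)
    resid = np.asarray(observed, float) - predicted
    return float(np.sum(resid ** 2 / (A * predicted ** B)))
```

With a 5% error model (A = 0.0025, B = 2, i.e., Var = (0.05·Y)²), the two example points above contribute comparably: the high point (11,500 observed against 11,000 computed) yields a weighted squared residual of about 0.83, and the low point (105 against 100) yields exactly 1.0.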
Statistical theory adds the stipulation that the error at
each point is normally distributed. In practice, this works
ﬁne so long as each dilution point displays a central ten-
dency in its distribution. This behavior is generally true
for all immunoassays.
Data points should not be used when the signal values
from the detector are beyond the linear range of their
capabilities. All detectors have a ﬁnite range where
their signal is proportionately linear to the amount of
label material producing the signal. When assay
ranges are extended below or above this linear range,
the mean signal itself is wrong, and the distribution of
that signal becomes distorted and is no longer normally distributed.
It is necessary to have a means of assessing the quality of a
curve ﬁt. There are no metrics that can assess empirical
methods. When measuring unweighted nonlinear regres-
sions, there are no statistically-appropriate metrics that are
meaningful, so the goodness-of-ﬁt is usually assessed with
the r2. This metric is a measure of the proportion of the
responses that ﬁt the regression model, not the residual
amount that did not ﬁt. The r2 is primarily a measure of
whether there is a causal relationship between the concen-
tration and its associated response and is not well suited for
nonlinear regressions. This is why even obviously bad
curves usually have good r2 values.
With weighted regressions, the SSE itself is a metric that
is a direct measure of how well the curve model ﬁts the data.
The SSE is often expressed as a residual variance by dividing
the SSE by the degrees of freedom of the curve (number
of points minus number of parameters) to normalize the
metric between different curve models and data point num-
bers. However, neither of these metrics provides any infor-
mation about how good or bad the ﬁt is relative to all of the
values that could be obtained from good assay curves.
It is a statistical property of the weighted SSE that, if the
responses are normally distributed (a requirement satisﬁed
with the central tendency of this data), the SSE is a χ2-
distributed value at number of points minus number of
parameters degrees of freedom.
This property allows a χ2 probability to be determined.
The p value of the SSE can be viewed as the fraction of an
inﬁnite number of assays that, if performed under exactly
the same conditions, would be expected to have a worse
curve ﬁt, i.e., a larger SSE, than the curve ﬁt of the assay under
consideration. This χ2 probability has been called a fit probability.
Since this metric is a probability, a ﬁt probability of 0.01 or above
can be considered acceptable.

FIGURE 5 Illustration of wSSE from individual squared residuals.
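The fit probability is the upper tail of the χ2 distribution at the weighted SSE. A stdlib-only sketch follows (the incomplete-gamma routine uses the classic series/continued-fraction expansions; with SciPy available, `scipy.stats.chi2.sf(wsse, df)` computes the same quantity):

```python
import math

def _gamma_q(a, x, eps=1e-12, itmax=300):
    """Regularized upper incomplete gamma Q(a, x): series for x < a + 1,
    modified Lentz continued fraction otherwise."""
    if x <= 0.0:
        return 1.0
    norm = math.exp(-x + a * math.log(x) - math.lgamma(a))
    if x < a + 1.0:
        # series for the lower function P(a, x); Q = 1 - P
        term = 1.0 / a
        total = term
        n = a
        for _ in range(itmax):
            n += 1.0
            term *= x / n
            total += term
            if abs(term) < abs(total) * eps:
                break
        return 1.0 - total * norm
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, itmax):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h * norm

def fit_probability(wsse, n_points, n_params):
    """Chi-squared fit probability: the fraction of hypothetical repeat
    assays expected to show a worse (larger) weighted SSE.
    df = number of points minus number of parameters."""
    df = n_points - n_params
    return _gamma_q(df / 2.0, wsse / 2.0)
```

For example, a weighted SSE near its degrees of freedom gives a fit probability around 0.5, while a weighted SSE far above the degrees of freedom gives a probability below the 0.01 acceptance threshold.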
WEIGHTED LEAST-SQUARES METHODS
There are three weighted least-squares regression meth-
ods in general use for immunoassay dose–response curves.
These are linear regression, the 4PL and the ﬁve-parame-
ter logistic (5PL). All three of these methods use weighting
for the individual points, and all of them ﬁnd the lowest
SSE for the solution, a process called minimization. All
least-square regression curves require at least one more
data point than there are parameters in the model. These
extra degrees of freedom are what allow the averaging of
the error in these models.
Linear Regression Model
Linear regression is the simplest of these methods
because it is a closed form function that can be solved alge-
braically. This means that there will be an exact solution
for the regression parameters. This makes the computa-
tion simple enough to perform on a handheld calculator,
or simple software programs, and all will get the same
solution. The formula:

Y = a + bX

where Y is the response and X is the concentration, generates a straight
line having a slope of b and a Y intercept of a. Concentrations are
determined by inverting the formula to X = (Y − a)/b.
The problem, of course, is that all but the shortest immu-
noassay curves are nonlinear. Various methods have been
used to “linearize” this curve, the most popular being the
logit transform discussed above. These linearization
schemes were necessitated by the poor or nonexistent com-
puting resources that were available at the time. But attempt-
ing to linearize a nonlinear curve is a poor solution, and for
many years, these transformation attempts have been
replaced by nonlinear curve models like the 4PL and 5PL.
The linear logit–log model is sometimes considered to
be related to the 4PL model (a 4PL curve transforms to a
straight line in logit–log space). These in turn have been
shown to have certain approximations to the mass action
model, the only model where the parameters are measures
of physical properties. But the complexities of ascertaining
these physical properties, and modeling them in a kinetic
regression formula, have prevented any practical applica-
tion of such a model from appearing.
Nonlinear Curve Models
Nonlinear curve modeling is much more difﬁcult than
linear regression. Finding the solutions to these models
requires using numeric processes to ﬁnd a solution. Numeric
processes are iterative processes that incrementally reparam-
eterize the coefﬁcients to ﬁnd better solutions (i.e., lower
SSEs). There are two major steps involved, and each of
them is critical to the process. The ﬁrst is ﬁnding the initial
starting estimates of the parameters, and the second is ﬁnd-
ing the best solution in the region the ﬁrst step placed the
ﬁtting algorithms. The region identiﬁed in the starting esti-
mates is very important because, in the four- or ﬁve-dimen-
sional geometric space of the 4PL or 5PL, respectively,
there is one, and only one, set of coefﬁcients that is the
global minimum, i.e., the best ﬁt. But there are many local
minima that can fool the ﬁtting algorithms into settling on
a solution that is not the best ﬁt, and sometimes a local set
of minima gives a terrible ﬁt. Sometimes the ﬁtting algo-
rithm cannot ﬁnd a solution at all.
Marquardt–Levenberg and Gauss–Newton are popu-
lar minimization algorithms used in many immunoassay
software programs. They are generally adequate for ﬁnd-
ing solutions to 4PL models. But these minimization algo-
rithms have well-documented problems ﬁnding solutions
in 5PL space. Software programs that use more powerful
numeric algorithms are more successful with the 5PL.
Although they require more computational speed and
memory, modern PCs are generally powerful enough to
run these programs.
It is important to note that these nonlinear curve models
are mathematical shape functions only. Their parameters
do not correlate with any physical properties of the immu-
noassay reaction. Sometimes the coefﬁcients between sim-
ilar appearing curves can be widely different, especially the
b, c, and g coefﬁcients of the 5PL, with very little differ-
ence in the shapes. This is to be expected and does not
matter so long as the SSE is low.
The 4PL and 5PL models
The 4PL model is widely used, in large part because the
model is easier to ﬁt computationally than the 5PL model.
Widespread use of the 5PL has only become possible with
the more powerful ﬁtting algorithms available in recent
years. The 4PL model works well when the dose–response
curve is symmetrical. The formula for the 4PL curve is:

y = d + (a − d)/(1 + (x/c)^b)

where a and d are the asymptotic ends, b controls the transition between
the two asymptotes, and c is the transition point midway between the two
asymptotes where the curve changes inﬂection.
Concentration estimates for the 4PL curve can be obtained from the formula:

x = c((a − d)/(y − d) − 1)^(1/b)
The 4PL model can be extended by adding a ﬁfth param-
eter, g, which controls the degree of asymmetry of the
curve. With the extra ﬂexibility afforded by the asymmetry
parameter, the 5PL model is able to eliminate the lack-
of-ﬁt error that occurs when the 4PL is ﬁtted to asym-
metric dose–response data. The 5PL model provides an
excellent compromise between over-parameterized mod-
els that can ﬁt data closely at the cost of a large variance in
the predictions and under-parameterized models that suf-
fer from large lack-of-ﬁt errors. This is because the fewest
number of parameters that any general asymmetric sig-
moidal function can possess is ﬁve: one for the upper
asymptote, one for the lower asymptote, one for the over-
all length of the function’s transition region, one for the
location of the transition region, and one for the degree of
asymmetry. It is unlikely that any function with fewer than
ﬁve parameters will have the ﬂexibility necessary to produce
a high-quality ﬁt to asymmetric sigmoidal dose–response curves.
The formula for the 5PL curve is:

y = d + (a − d)/(1 + (x/c)^b)^g
where a and d are the asymptotic ends, b controls the tran-
sition between the two asymptotes, c is the transition point
where the curve changes inﬂection, and g, with b, controls
the rate of approach to the lower asymptote. Note that in
the 5PL, the transition point c is not in the center between
the two asymptotes except when g=1, in which case the
5PL reduces to a 4PL. Concentration estimates for the 5PL curve can be
obtained from the formula:

x = c(((a − d)/(y − d))^(1/g) − 1)^(1/b)
The families of curve shapes that the 5PL can assume by
varying one parameter at a time are shown in Fig. 6.
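Both logistic forms can be written down directly. Here is a minimal sketch using the standard 5PL parameterization and its exact inverse (the function names are ours; setting g = 1 recovers the 4PL):

```python
import numpy as np

def logistic_5pl(x, a, b, c, d, g):
    """5PL response curve: y = d + (a - d) / (1 + (x / c)**b)**g.
    With g = 1 this reduces to the symmetric 4PL."""
    return d + (a - d) / (1.0 + (x / c) ** b) ** g

def inverse_5pl(y, a, b, c, d, g):
    """Concentration from response:
    x = c * (((a - d) / (y - d))**(1/g) - 1)**(1/b)."""
    return c * (((a - d) / (y - d)) ** (1.0 / g) - 1.0) ** (1.0 / b)
```

In practice the five parameters would be estimated by weighted least squares with a robust minimizer; the functions above only fix the curve shape and its inversion for reading sample concentrations off a fitted curve.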
Figure 6 makes several characteristics of the 5PL
function apparent. The function approaches a horizon-
tal asymptote as the dose approaches zero, and it
approaches a horizontal asymptote as the dose approaches
inﬁnity. Between the asymptotic regions of the curve is a
transition region which contains a single inﬂection point.
On either side of the inﬂection point the curve will approach the left
and right asymptotes at different rates unless g = 1.

FIGURE 6 Effects of varying the parameters of the 5PL function.
Consideration of Fig. 6 gives some insight into how the
ﬁve parameters of the 5PL function affect the resulting
curves. Parameters a and d control the position of the
curve’s horizontal asymptotes. Examining the behavior of
the curve in its asymptotic regions provides additional
insights, especially into the roles of b and g. When
approaching the “a” asymptote, only parameter b controls
the rate of approach to the asymptote. However, when
approaching the “d” asymptote, the rate of approach is
controlled by the product bg. This coupling of parameters
is one of the reasons the 5PL is harder to model.
Table 1 summarizes the effect of the parameters a, b, and d on the slope
of the logistic function.
Because the 4PL function is point symmetric on semi-
log axes about its midpoint (g=1), the literature has
adopted two conventions for eliminating this redundancy
in the parameterization of the 4PL function: either a>d is
ﬁxed and the sign of b determines the slope of the logistic
or b>0 is ﬁxed and the ordering of a and d determines the
slope of the logistic. In the 4PL case, neither of these conventions
restricts the range of functions that can be represented.
In contrast, the 5PL function has no symmetry. There-
fore, the cases of Table 1 all yield distinct functional forms.
Cases 1 and 4 are both suitable for modeling decreasing
dose–response data, while cases 2 and 3 are both suitable
for modeling increasing dose–response data. The a>d
form can model sharp transitions at the high end of the
curve (with shallow transitions at the low end), and the
a<d form can model sharp transitions at the low end of
the curve (with shallow transitions at the high end), but
neither can model the reverse. These cases can arise with
some bioassay dose–response curves but are not typically
observed with cell-free immunoassay data. For immunoassay curves,
either 5PL form can model the curves equally well.
Figures 7 and 8 show an immunoassay curve ﬁt with a weighted 4PL and
the same data ﬁtted with a weighted 5PL. The squared residuals plotted
below the graphs, together with the curve-ﬁt metrics, illustrate the
greater curve error of the 4PL compared to the 5PL.
Few immunoassay curves are completely symmetric,
with immunometric or sandwich assays such as ELISAs
being particularly asymmetric. Also, improvements in
signal-to-noise ratios have tended to increase the amount of
asymmetry observed in these dose–response curves. The
5PL will have a lower SSE because of its greater ﬂexibility
in shape and is usually the better choice. But if the curve is
fairly symmetrical, the SSEs from the 4PL and 5PL will be
similar. This in turn may result in a slightly higher ﬁt prob-
ability for the 4PL than the 5PL due to the extra degree of
freedom of the 4PL curve. In this case, either curve model
would be justiﬁed. If the curve does not reach the inﬂection
point, there is only one end region and the 4PL will be sufﬁcient.
ESTABLISHING STANDARD DILUTION CONCENTRATIONS
Standard dilution concentrations should be chosen that
generate an even gradient of responses. It is not necessary
that the concentrations be evenly distributed on the log
scale. What is necessary is that the response gradient should
be evenly separated when plotted on a linear scale, and
none of the responses should be clumped together at the
same response value. The plot of the squared residuals of
the assay below the assay curve in Fig. 9 shows how the
distribution should appear. Note that there is an even
distribution of individual responses for the entire dose range.
In Fig. 10, the squared residuals of another assay show
the effect of an uneven gradient of responses on the mag-
nitude of the squared residuals in the region with the
clumped responses, and hence on the SSE of the weighted ﬁt.
This clumping of responses relative to the rest of the
responses results in uneven ﬁtting throughout the curve, as
shown by the squared residual plot. Concentrations should
be chosen that evenly span the range of dose-dependent
changes in response. When the dose no longer has an
effect on the response (the plateau regions), there should
not be more than one dilution.
More than one dilution in the plateau regions makes it
likely that these points will be nonmonotonic. Even if the
variation of these nonmonotonic points is within their
variance ranges, they will decrease the acceptability of
the curve ﬁt. Since these points only contribute noise,
they only shift the shape from the more accurate estima-
tion of the true curve derived from the more sensitive
points. This is especially true when hook effects are not modeled.
The purpose of immunoassay tests is to determine the
concentrations of unknown specimens. These concentra-
tion estimates are only valid if the error of their estimates
is not so high that the result is meaningless. There are ﬁve
factors that make up the error of the concentration estimate:
- The response variation at that point,
- The response/dose slope at that point,
- The closeness of adjacent standard calibrators to that point,
TABLE 1 The Relationship between the Order of a and d, the Sign of b,
and the Slope of the Monotonic 5PL Function

Case  Order of a and d  Sign of b  Slope
1     a>d               b>0        Down
2     a>d               b<0        Up
3     a<d               b>0        Up
4     a<d               b<0        Down

Note: For g=1 (4PL curve), case no. 1 generates the same functional form as
case no. 4, and case no. 2 generates the same functional form as case no. 3.
For the 5PL, all four cases produce distinct functional forms.
- The number of replicates of the calibrator points and the unknown sample,
- The amount of error in the regression.
Each of these factors can be addressed during assay
development to reduce their contribution, but this is only
practical when the amount of error can be measured. The
response variation, discussed earlier, can be reduced by
further optimizing incubation conditions and reagents.
The response/dose slope can be made steeper in diagnosti-
cally important regions by varying assay conditions. Cali-
brator concentrations, number of dilutions, and replicate
numbers can be adjusted. More appropriate curve models
can be selected. But what is needed to measure the effect of
any changes to the assay is an accurate determination of
the error proﬁle throughout the dose range.
The mathematical determination of this error is very com-
plex for the nonlinear curves produced by immunoassays.
Earlier methods from the literature simpliﬁed the error
determinations by only including the error from the ﬁrst two
bullet points above. But these methods can miss a third or
more of the actual error and give unreliable error estimates at
the low and high ends. These earlier methods were usually
termed precision proﬁles, and the error was expressed as a
%CV to approximate the empirical values of the variation
obtained from repeated measurements of known specimens.
Recently, estimations using Monte Carlo methods have
enabled accurate determinations of the error proﬁle of an
individual assay to be made that include all of the error
factors. In these more accurate error proﬁles, the error is
typically expressed as a percent error of the concentration.
Individual sample concentration error from both individ-
ual replicates and dilution averages is then derived from
these error proﬁles. These error proﬁles are of inestimable utility when
developing an assay, when troubleshooting the effects of changed conditions
on assay results, and for establishing the limits of quantitation that will
determine the reportable range of an assay.

FIGURE 7 The SSE and ﬁt probability of the 4PL ﬁt are 29.931 and …
FIGURE 8 The SSE and ﬁt probability of the 5PL ﬁt are 1.386 and …
An error proﬁle plot from a standard curve plots the
concentration on the x-axis and the %Error on the y-axis.
Determinations of the limits of quantitation can be easily
made by setting the amount of error allowed for report-
able results and calculating the lowest and highest concen-
trations that do not exceed that amount of error. Three
sets of immunoassay data were computed using the pro-
gram StatLIA® from Brendan Technologies, Inc., in the
examples below using a weighted 5PL curve model, as
described in Gottschalk and Dunn (2005a). All of the
curves had ﬁt probabilities above 0.3. One assay curve was
normal (Fig. 11), one assay curve matched an incubation
temperature that was too low (Fig. 12), and one assay
curve increased the number of dilution points at each end
and reduced the number of dilutions in the center of the
curve (Fig. 13). Error proﬁles at a 95% signiﬁcance level
were computed for each curve, and the results are displayed in the ﬁgures.
Assuming a 50% acceptable %Error for the limits of
quantitation, the minimum and maximum acceptable
concentrations (MinAC and MaxAC, respectively) can
be easily seen from the error proﬁles below their respective
dose–response curves. Note the reduced reportable range
and increased %Errors in Fig. 12, which had suboptimal
incubation conditions. In Fig. 13, note how increasing the
number of dilutions at each end increased the reportable
range half an order of magnitude at the low end and 30%
at the high end when compared to the limits in Fig. 11.
Also note the increased %Error in the regions where there
are no close standard points to anchor the regression.
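Reading the limits of quantitation off an error profile can be sketched as below (the arrays and function name are hypothetical; the 50% default threshold mirrors the example above):

```python
import numpy as np

def quantitation_limits(conc, pct_error, max_error=50.0):
    """Return (MinAC, MaxAC): the lowest and highest concentrations whose
    %Error does not exceed the reporting threshold. Assumes `conc` is
    sorted ascending and the error profile is roughly U-shaped."""
    conc = np.asarray(conc, float)
    ok = np.flatnonzero(np.asarray(pct_error, float) <= max_error)
    if ok.size == 0:
        return None, None
    return conc[ok[0]], conc[ok[-1]]
```

Interpolating between profile points would refine the limits; the sketch simply takes the outermost concentrations that meet the error threshold.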
FIGURE 9 The concentrations are all 1:2 serial dilutions and evenly
spaced apart, and the responses are also evenly spaced apart. The
squared residuals are similar in magnitude for all doses.
FIGURE 10 The concentrations are all 1:2 serial dilutions and evenly
spaced apart, but the responses are not evenly spaced. The squared
residuals are all high where the responses are clumped and small where
they are sparsely distributed.
Identifying outliers is a requirement for all immunoassay
data. Outliers are not to be confused with failed samples.
Outliers are responses whose values are so far removed from
the normal distribution of that sample that the probability of
membership in that population is prohibitively low,
generally less than 0.0001. A failed sample response, how-
ever, is statistically likely still to be a member of that popula-
tion but has behavior that does not pass acceptability, such
as a probability less than 0.01 but above 0.0001. The differ-
ence between the two is that an outlier response can, and
should, be ignored and the assay recomputed; but a failed
sample response should not be ignored even if the assay fails.
There are two types of outlier responses: precision
outliers and residual outliers. Precision outliers are rep-
licate samples that have too much difference between their
response values. Often this variation is expressed as %CV
(the standard deviation of the responses as a percentage of
their mean). Acceptability is determined by an arbitrarily
selected %CV applied to all replicate sets. When three or
more replicates are assayed and the %CV is unacceptable,
attempts are sometimes made to identify the replicate
response farthest from the other samples as an outlier.
Occasionally, Dixon or similar outlier tests are used to ﬁl-
ter these outlier replicates. The problem with methods like
Dixon is that the sample size is too small for a reliable esti-
mation of the distribution of that sample.
A more sound method for evaluating replicate behavior
is to use the variance regression discussed above. Using the
ratio of the observed replicate variance divided by the
expected variance of that response generates an F statistic,
and an F probability can then be obtained using the appropriate degrees
of freedom: the number of replicates minus one for the numerator, and
the sum of the individual ANOVA error degrees of freedom minus two for
the denominator. These F probabilities (precision probabilities) are
tailored for that test method because the comparison is to the normal
behavior at that response level in that
test method, as determined by its historical behavior. If
there are three or more replicates, a Grubbs threshold can
be determined using the degrees of freedom of the vari-
ance regression to determine if a replicate can be ignored.
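The replicate-precision check can be sketched as follows (the Monte Carlo tail is a stand-in for an exact F distribution such as `scipy.stats.f.sf`; A and B are the variance-regression parameters from earlier, and the function name is ours):

```python
import numpy as np

def precision_probability(replicates, A, B, denom_df, n_sim=200_000, seed=0):
    """F probability that replicate scatter is consistent with the test
    method's historical variance at that response level.
    F = observed replicate variance / expected variance (A * mean**B);
    numerator df = n_reps - 1, denominator df = pooled ANOVA error df."""
    reps = np.asarray(replicates, float)
    f_stat = reps.var(ddof=1) / (A * reps.mean() ** B)
    draws = np.random.default_rng(seed).f(reps.size - 1, denom_df, n_sim)
    return f_stat, float((draws > f_stat).mean())
```

A low precision probability flags a replicate set whose scatter is far beyond the normal behavior of that test method at that response level.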
FIGURE 11 Normal curve.
FIGURE 12 Depressed curve.
FIGURE 13 Dilution point effects (MinAC and MaxAC limits are marked on each error profile).
Residual outliers are isolated points on the curve that do
not match the behavior of the other points and make it
impossible to get an acceptable curve ﬁt. These residual out-
liers can be the mean dilution response from replicates with
acceptable precision, meaning that the dilution itself was
improperly prepared, or from a single outlier replicate
response. Residual dilution outliers are identiﬁed by comput-
ing an F probability using the squared residual of the point as
the F statistic, with the number of points minus number of
parameters as the numerator degrees of freedom, and the
variance regression degrees of freedom for the denominator.
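Under the same variance model, the residual-outlier probability for a single calibrator point can be sketched as below (again a Monte Carlo stand-in for the exact F tail; the names are illustrative, and the degrees of freedom follow the text):

```python
import numpy as np

def residual_outlier_probability(observed, predicted, A, B,
                                 n_points, n_params, var_reg_df,
                                 n_sim=200_000, seed=0):
    """F probability for one point's weighted squared residual.
    F = squared residual / expected variance (A * predicted**B);
    numerator df = points - params, denominator df = variance-regression df."""
    f_stat = (observed - predicted) ** 2 / (A * predicted ** B)
    draws = np.random.default_rng(seed).f(n_points - n_params,
                                          var_reg_df, n_sim)
    return float((draws > f_stat).mean())
```

A vanishingly small probability identifies a dilution point whose residual cannot be reconciled with the test method's expected variance at that response.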
Stored Calibration Curves, Factory Master Curves, and Adjusters
In life sciences and pharmaceutical research, and many
other ﬁelds where immunoassays are applied, the samples
are usually run in batches with freshly-prepared standard
curves made by diluting a master standard solution. Assay
kits usually contain a set of calibrators supplied with the
reagents. But in the clinical diagnostic ﬁeld, to improve the
turnaround time, individual samples can be loaded onto a
random-access analyzer at any time. This requires that
calibration curves are stable over a period of time. Most
immunoassay systems on the market have calibration
curves that are stable for at least 2 weeks and across reagent
packs within one lot, although controls should still be run
regularly. This reﬂects the level of stability and consis-
tency now achievable with reagents and equipment.
Recently, this stability has also led to a reduction in the
number of calibrators required for user calibrations,
through the provision of a master calibration curve from
the manufacturer. A reduced number of calibrators are run
by the user to adjust the master curve to take account of
bias due to the user’s analyzer. Hence the calibrators, in this
context, are really adjusters.
In order to achieve random-access operation, the num-
ber of requirements that must be fulﬁlled by the curve-ﬁt
and adjustment algorithms, and by the assay reagents and
instrumentation, is signiﬁcantly greater than for a batch
assay. That being said, manufacturers have been able to
provide remarkably precise sample measurements consid-
ering the many potential sources of error, through rigor-
ous attention to detail and extensive validation. It is beyond
the scope of this chapter to comment on all the possible
methods for using stored and reduced calibration curves,
however, there are a few general principles.
There are two different stages to the process: establish-
ing the initial calibration curve—sometimes referred to as
the “master curve”—at the manufacturer’s laboratory and
the user adjustment in the local analyzer. This corrects for
bias due to the analyzer or changes in the reagents since
they were manufactured.
MASTER CALIBRATION CURVE
A master calibration curve is established by the manufac-
turer for each reagent pack lot created. Its details are usu-
ally encoded in some way on the packs, by a conventional
barcode, a two-dimensional barcode, a magnetic card, or a
smart card. The data may also be made available from a
web site on the Internet or communicated to analyzers via
e-mail. For each reagent pack lot, around 6–10 calibrator
concentrations are used with high order replication. The
responses are obtained from at least two different analyz-
ers. Generally, at least 20 “replicates” for each calibrator
are used to establish the master calibration curve. The cali-
bration curve-ﬁtting method chosen is usually 4PL or
5PL. Due to the closed nature of commercial systems, the
curve-ﬁtting process can be specially tailored to the par-
ticular assay or family of assays.
There are two major advantages of master curves, other
than their obvious convenience for the user. First, the
number of replicates is much greater than in conventional
user calibrations. This is economically viable because the
master calibration is only run once for all of the users of
each lot. More concentration levels may be used, and many
replicates run at each level, without excessive cost. The
second advantage is that a single set of master calibrators may be used to provide calibration curves for all the analyzers, possibly for many years, removing a potential source of bias (due to calibration error) that arises when conventional selling calibrators are calibrated from the master set. However, for the purist, there is no substitute for running
freshly made standards, unknown samples, and controls
together in a batch format, because it removes many
potential sources of error. But this is impractical for many applications.
No matter how many analyzers are used by the manufacturer to produce a master curve, any bias in the user's analyzer will bias results if the master calibration curve is used unadjusted. For this reason, some local calibration activity is necessary for each analyzer.
The user calibration consists of running two or three
adjuster calibrators with known concentrations and an
algorithm to move the master calibration curve based on
the signal levels for each adjuster. For example, if the mas-
ter calibration curve gives a signal level of 1000 signal units
at 100 concentration units and the adjuster, which has a
concentration of 100 units, gives a signal level of 950 signal
units, the algorithm may lead to a shift in the master cali-
bration curve of 5% at this point. Using multiple adjusters,
the entire master calibration curve is moved to allow for
bias in the user analyzer. However, this highlights the
weakness of the adjustment stage. In this example, was the
difference of 5% between the signal levels due to genuine
bias of the user analyzer from the analyzers used for the
master calibration or was it just due to normal assay varia-
tion? The error in this type of calibration system thus
derives largely from the assay imprecision when the
adjuster calibrators are run at the user laboratory. Any
deterioration of the calibrators during shipment from the
manufacturer, or due to insufﬁcient refrigeration on stor-
age, can cause the entire calibration curve to be biased.
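The shift described above can be sketched as a single-point proportional rescale of the stored master curve; the function name and the master-curve points below are hypothetical:

```python
def adjust_master_responses(master_points, expected_signal, observed_signal):
    """Rescale the stored master-curve responses by the ratio of the
    adjuster's observed signal to the signal predicted by the master
    curve at the adjuster concentration (a one-point adjustment)."""
    shift = observed_signal / expected_signal
    return [(conc, resp * shift) for conc, resp in master_points]

# Master curve predicts 1000 signal units at 100 concentration units,
# but the adjuster reads 950: every response is pulled down by 5%.
master = [(10.0, 150.0), (100.0, 1000.0), (500.0, 2800.0)]
adjusted = adjust_master_responses(master, expected_signal=1000.0,
                                   observed_signal=950.0)
```

As the text points out, a single adjuster cannot distinguish genuine analyzer bias from ordinary assay variation, so the 5% shift may itself be noise.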
For the immunoassay system designer, the questions are:
how many adjusters should be used, where should they be
placed, and how should the user’s analyzer process the
information they produce? Clearly the best results would
be achieved by running at least four adjusters in duplicate,
but this would negate the potential economic advantages
of the master curve. The compromise solution is that two
or three adjusters are usually run, sometimes in singleton
and sometimes in duplicate.
Little has appeared in the literature about the theory of
curve ﬁtting associated with factory calibration with user
adjustment, but we can offer some guidelines in respect of
the problems involved. One key principle applies through-
out: the total number of adjuster replicates must at least equal
the number of parameters in the model that may change from
the master curve determination.
Linear Master Curves
It is helpful to distinguish between different types of lin-
earity. There is the direct linear form where there are only
two parameters: slope (m) and intercept (c), that need to be
determined. These parameters could well appear as the
natural parameters after suitable transformations of
response and dose. The other situation is where there is a
pseudo-linear curve shape, where the response and dose
metameters are linear in the direct sense, but there are
other unknown parameters needed to specify the transfor-
mation. An example is the 4PL, where if it is assumed that
the NSB and B0 values are known, the logit–log transfor-
mation produces a linear plot. These two situations need
to be differentiated.
Direct linear form
There are two cases to consider: ﬁxed and non-ﬁxed slope.
If the master curve has a ﬁxed slope, then there is only one
parameter to determine, namely the intercept. The mini-
mum number of adjusters is therefore 1. For a homosce-
dastic assay, if the slope of the line is m, the error in the
response metameter for the adjuster σA, and the error in
the response metameter for the unknowns σU, then the
error in the interpolated dose is:
σU and σA can be reduced by replication. So if the maxi-
mum error in prediction is speciﬁed, this will not only put
constraints on m and σU, but also on σA, a result that could
mean extra adjusters being needed in the form of
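For a homoscedastic fixed-slope line, the propagated dose error can be sketched as below: the unknown's and the adjuster's response errors add in quadrature and are divided by the slope, and replication shrinks each contribution by √n (the function name and numbers are illustrative):

```python
import math

def dose_error(m, sigma_u, sigma_a, n_u=1, n_a=1):
    """Standard error of an interpolated dose for a fixed-slope linear
    curve: dose = (response_unknown - intercept) / m, where the
    intercept is set by the adjuster, so both response errors propagate.
    Replication of unknowns (n_u) or adjusters (n_a) reduces the
    corresponding error by sqrt(n)."""
    su = sigma_u / math.sqrt(n_u)
    sa = sigma_a / math.sqrt(n_a)
    return math.sqrt(su ** 2 + sa ** 2) / m

e_single = dose_error(m=2.0, sigma_u=3.0, sigma_a=4.0)        # sqrt(9 + 16) / 2
e_replicated = dose_error(m=2.0, sigma_u=3.0, sigma_a=4.0, n_a=4)
# Quadrupling adjuster replicates halves only the adjuster contribution.
```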
If the master curve has a non-ﬁxed slope, then the mini-
mum number of adjusters needed is two as there are the
two unknown parameters: intercept (c) and slope (m).
Standard statistical theory indicates that the placement of
the adjusters should be such that they span the linear
range, thus avoiding the increased loss of precision due to extrapolation.
Pseudo-linear form
As mentioned earlier, the use of only two adjusters could be
challenged on theoretical grounds, for most immunoas-
says, since the linear relationship might be a consequence
of a transformation from a mathematical form that had
more than two parameters. For example, the logit–log
transformation might well linearize a plot, but there are
fundamentally four parameters to describe the response,
the slope and the intercept in the logit–log domain, and the
NSB and B0. If the NSB and B0 are “known” then the logit–
log plot is a truly linear one with the slope and intercept
deﬁning its properties. If NSB and B0 are “unknown,” then
extra information must be introduced into the process to
infer their values. One possible approach is now described.
Suppose conventional calibration curves are established in a number of assay runs at the factory; a plot of percent bound against dose may then produce a profile that is very stable across assays, even though the signal levels vary considerably between assays. This constant feature
can be exploited to reduce the number of calibrators/
adjusters needed. Suppose two adjusters have the ﬁxed per-
centage bound B1 and B2, respectively, and for a particular
assay run, they have responses R1 and R2, respectively.
Constant percent bound means the relationships can be written as:

B1 = 100 (R1 − NSB) / (B0 − NSB)
B2 = 100 (R2 − NSB) / (B0 − NSB)

Here B1, B2, R1, and R2 are known, and so NSB and B0 can be determined. This is all that is required, together with the master curve, to run the assay. The statistical
problem that has to be resolved by the manufacturer is to
determine the optimum adjuster concentrations, taking
account of the impact on the interpolated dose. Also, with
any calibration curve with asymptotes (NSB and B0), what
procedure is adopted by the software when responses fall
outside of the range of the calibration?
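Assuming the conventional definition of percent bound, Bi = 100·(Ri − NSB)/(B0 − NSB), the two adjuster readings give two linear equations in the two unknowns NSB and B0, which can be solved directly. This is an illustrative sketch, not any manufacturer's algorithm:

```python
def solve_asymptotes(b1, b2, r1, r2):
    """Recover NSB and B0 from two adjusters with stable percent-bound
    values b1, b2 and measured responses r1, r2, using
    b_i = 100 * (r_i - NSB) / (B0 - NSB), rearranged into the linear
    system b_i * B0 + (100 - b_i) * NSB = 100 * r_i."""
    det = 100.0 * (b1 - b2)
    b0 = (100.0 * r1 * (100.0 - b2) - 100.0 * r2 * (100.0 - b1)) / det
    nsb = (100.0 * r2 * b1 - 100.0 * r1 * b2) / det
    return nsb, b0

# Adjusters fixed at 80% and 20% bound read 900 and 300 signal units:
nsb, b0 = solve_asymptotes(80.0, 20.0, 900.0, 300.0)
# Recovers NSB = 100 and B0 = 1100 signal units.
```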
Integration of the Model, the Master Curve
and the Adjustment Process
Most manufacturers currently provide adjusters so that the
curve shape can be adjusted empirically, by pulling it up
and/or down, without taking account of the underlying
model. This approach is validated by measuring precision
on many assays, over the full reagent shelf lives. But the
method is very prone to single-point error and outliers. A
better approach may be to record the key model parame-
ters for the master curve, and then use the adjuster data to
change the particular parameters in the model that have
been shown, experimentally, to change from analyzer to
analyzer and across the reagent shelf life. The number of
adjusters would then be determined by the number of
parameters that can change.
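As a sketch of this parameter-constrained approach, suppose validation has shown that only the two 4PL asymptotes drift from analyzer to analyzer while the shape parameters are stable. With the shape fixed, the 4PL is linear in the asymptotes, so exactly two adjusters determine them. All names and values here are hypothetical:

```python
def fourpl(x, a, b, c, d):
    """Four-parameter logistic: a and d are the asymptotes, c the
    mid-point concentration, b the slope factor."""
    return d + (a - d) / (1.0 + (x / c) ** b)

def readjust_asymptotes(x1, y1, x2, y2, b, c):
    """Re-estimate only the asymptotes (a, d) from two adjuster
    readings, keeping the master curve's shape parameters b and c.
    With b, c fixed, y = a * f + d * (1 - f) where f = 1/(1 + (x/c)**b),
    which is a 2x2 linear system in a and d."""
    f1 = 1.0 / (1.0 + (x1 / c) ** b)
    f2 = 1.0 / (1.0 + (x2 / c) ** b)
    det = f1 * (1.0 - f2) - f2 * (1.0 - f1)
    a = (y1 * (1.0 - f2) - y2 * (1.0 - f1)) / det
    d = (f1 * y2 - f2 * y1) / det
    return a, d

# Master shape parameters; the user analyzer shows shifted asymptotes.
b_m, c_m = 1.2, 10.0
y1 = fourpl(1.0, 0.05, b_m, c_m, 2.0)    # low adjuster reading
y2 = fourpl(100.0, 0.05, b_m, c_m, 2.0)  # high adjuster reading
a_new, d_new = readjust_asymptotes(1.0, y1, 100.0, y2, b_m, c_m)
# Recovers a = 0.05 and d = 2.0 without disturbing b and c.
```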
Master curve model
We can start the process of integration with the master
curve model. The increased number of calibrators and
replicates run in the factory laboratory allows for more
complex modeling. For example a 4PL ﬁt could be applied
with additional constants included for variables unique to
the chemistry and system, such as a reduction of horserad-
ish peroxidase efﬁciency in the presence of high substrate
levels, at very low concentrations in an immunometric
assay. With as many as 200 replicates for the master cali-
bration, there would be a more than adequate supply of
data. So the master curve model could have many con-
stants, reﬂecting the true variables in the assay. This would
allow very precise, but constrained, modeling of the mas-
ter calibration curve.
The provision of many extra data points has another ben-
eﬁt. It is possible to systematically determine the optimum
values of the parameters in the curve-ﬁt model in a logical
fashion. Conventional curve ﬁtting may involve the deter-
mination and ﬁxing of parameters in a stepwise fashion to
obtain best ﬁt (least sum of squares) leaving one last param-
eter to be optimized. This may have to take on an extreme
value to make the model ﬁt, due to errors in the determi-
nation of the values of the other parameters. However,
using the larger amount of data available in a factory cali-
bration, it may be possible to determine each parameter
independently, before ﬁnally making small adjustments to
the values to obtain the best ﬁt. Previous theoretical
knowledge about the parameters that can be affected by
reagent lot-to-lot variation can be used to restrain parameter changes from going outside previously established limits.
Parameters affected by analyzer-to-analyzer variation
Analyzer variation needs to be investigated to identify
which parameters in the curve-ﬁt model are affected. For
example, there could be a variation in absolute signal lev-
els, variation in low signal sensitivity, or differences in
high-signal saturation characteristics. These should be
incorporated in the model using as few parameters as pos-
sible. They may be assay speciﬁc. For example, analyzer
variation in the background noise level at low concentra-
tions may only be a significant factor in a sensitive TSH assay.
During development of new kits, the model could be
explored further. Is the assay competitive or immunomet-
ric? How linear is the dose response? Are the zero refer-
ence calibrators, used for master curve generation, truly
zero? Is the assay likely to involve very low levels of enzyme
at the signal generation stage? The aim of this work would
be to derive the basic model for the assay, perhaps chosen
from a family of options for the system. Any parameters in
the system model not relevant to the assay could be set to
a ﬁxed number or removed from the model.
During transport and stability studies, the model would be
used to determine which parameters change in value with
time. It may be that more than one parameter changes
with time, but that two parameters vary according to a
ﬁxed ratio. Knowledge of this can be used to determine the
number of adjusters and their concentrations.
Number of adjusters
The number of adjusters can be determined from knowl-
edge about analyzer–analyzer and stability effects. The
minimum number depends on the number of parameters
(or linked pairs of parameters) that can change.
Position of adjuster concentrations
Knowledge of the changes that can occur may be applied
to the choice of adjuster concentration. For example, if
background signal at zero concentration does not vary, the
lowest adjuster does not have to be at zero.
Replication of adjusters
This is a matter of trade-off between convenience and
avoidance of adjustment bias. However, the replication is
strongly inﬂuenced by the number of adjusters, the preci-
sion of the system in the user’s laboratory, and the desired
precision for the assay. As a rule of thumb, unless at least four adjuster replicates are run overall (e.g., four adjusters in singleton or two adjusters in duplicate), adjustment bias will be a significant source of overall assay imprecision.
Method used to adjust master curve using
adjuster signal levels
Using the accumulated information about the assay,
obtained during development, it should be possible to use
the adjuster data to modify the relevant parameters in the
model, while retaining the parameter values that are not
expected to change. In this way, the model is less likely to
be forced into bias simply due to the error in the adjuster
signal determination. However, the problem still remains
that adjuster signal determination error can unduly inﬂu-
ence the parameter(s) allowed to change and distort the
curve. It is important that the system has some error checks
to warn of signal changes that are outside of expected limits.
MODELING CALIBRATION CURVE
CHANGES OVER SHELF LIFE
For reagents that are very stable, periodic recalibration
within the shelf life may not be necessary. However, a con-
cept that does not seem to have attracted much attention
in the literature is that of modeling the time dependence of
parameter values. As an example, suppose an appropriate
calibration curve-ﬁt model for a particular assay is the 4PL.
Also suppose that a time series plot of the four parameter
values reveals a proﬁle that can be accurately modeled over
a period of time. If this turned out to be the case, then no
adjusters would be required, since all that would be neces-
sary is the date of use of the reagents. A periodic calibra-
tion of the analyzer may be required, but this may not need
to be assay speciﬁc. We are aware of this technique being
used on one system, but it places great demands on the
manufacturer to produce materials that have consistent
and predictable changes during the shelf life. It is not likely
to be robust to temperature ﬂuctuation during transporta-
tion or storage.
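The idea can be sketched with a hypothetical drift model, for example an exponential decay of the top asymptote with reagent age, so that the stored parameter is corrected from the date of use alone (the model form and decay constant below are illustrative, not from any real system):

```python
import math

def aged_b0(b0_at_manufacture, days_since_manufacture, decay_per_day):
    """Correct the stored top asymptote for reagent age under an
    assumed exponential decay profile established during stability
    studies; no adjusters are needed, only the date of use."""
    return b0_at_manufacture * math.exp(-decay_per_day * days_since_manufacture)

b0_fresh = aged_b0(2000.0, 0, 0.001)   # 2000.0 at manufacture
b0_aged = aged_b0(2000.0, 60, 0.001)   # ~1883.5 after 60 days
```

As the text warns, such a model presumes tightly controlled transport and storage; a temperature excursion breaks the assumed profile.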
USE OF ELECTRONIC DATA
As explained earlier, use of a master curve determined
using a large number of replicates from a set of secondary
reference standards at the manufacturer’s laboratory has
several advantages that are offset by the additional error
due to use of very few adjuster concentrations and repli-
cates in the user laboratory. In the future, we may see
increasing use of master curve updating via the Internet,
using modems in immunoassay analyzers. User analyzer
calibration would comprise a periodic determination of
analyzer bias from the master analyzers at the manufac-
turer, which may not be assay speciﬁc.
The key issues about using stored calibration curves are
stability and analyzer-to-analyzer variation. If, in a con-
ventional assay, the underlying dose–response relationship
requires k parameters to determine its form, then there
should be at least k adjusters used, unless it has been shown
experimentally that fewer parameters can change over
time or between analyzers. It would be interesting to see if
manufacturers could explain the fundamental theory of
their stored calibration systems, as well as providing data
to support stability and precision claims. There is no doubt
that these “black box” proprietary calibration systems
should be approached with caution, with great attention
paid to quality assurance and control schemes, to check
that calibration integrity has been maintained.
STATLIA Quantum®, by Brendan Technologies (www.
brendan.com), offers a complete data reduction program
for quantitative immunoassays, potency bioassays, and
qualitative screening tests. The program is a fully inte-
grated enterprise system networking multiple detectors,
liquid handlers, workstations, users, and LIM systems to a
central SQL database. The program features the compa-
ny’s TrueFit™ weighted data reduction and SmartQC™
quality control (QC) and method analysis. Assays are compared with their historical performance, providing comprehensive statistical QC analysis of assay performance, customizable assay and unknown acceptance criteria, and test method performance qualification (PQ) analysis.
The program features a large number of graphs in 2D and
3D, many report templates, and a report designer for cus-
tomizing reports, PQ reports, and other reports in Excel®,
PDF, HTML, TIFF, and CSV formats.
References and Further Reading
Bates, D.M. and Watts, D.G. Nonlinear Regression Analysis and Its Applications.
(Wiley, New York, 1988).
Baud, M., Mercier, M. and Chatelain, F. Transforming signals into quantitative
values and mathematical treatment of data. Scand. J. Clin. Lab. Invest. 51(Suppl.
205), 120–130 (1991).
Belanger, B., Davidian, M. and Giltinan, D. The effect of variance function estima-
tion on nonlinear calibration inference in immunoassay data. Biometrics 52,
Box, G.E.P. and Hunter, W.G. A useful method for model-building. Technometrics
4, 301–318 (1962).
Daniels, P.B. The fitting, acceptance, and processing of standard curve data in
automated immunoassay systems, as exemplified by the Serono SR1 analyzer.
Clin. Chem. 40, 513–517 (1994).
Davidian, M., Carroll, R.J. and Smith, W. Variance functions and the minimum
detectable concentration in assays. Biometrika 75, 549–556 (1988).
DeSilva, B., Smith, W., Weiner, R., Kelley, M., Smolec, J., Lee, B., Khan, M.,
Tacey, R., Hill, H. and Celniker, A. Recommendations for the bioanalytical
method validation of ligand-binding assay to support pharmacokinetic assess-
ments of macromolecules. Pharm. Res. 20, 1885–1900 (2003).
Draper, N.R. and Smith, H. Applied Regression Analysis. 3rd edn, (Wiley, New York, 1998).
Dudley, R.A., Edwards, P., Ekins, R.P., et al. Guidelines for immunoassay data
processing. Clin. Chem. 31, 1264–1271 (1985).
Feldman, H. and Rodbard, D. Mathematical theory of radioimmunoassay. In:
Competitive Protein Binding Assays (eds Daughaday, W.H. and Odell, W.D.),
158–203 (Lippincott, Philadelphia, 1971).
Findlay, J.W., Smith, W.C., Lee, J.W., Nordblom, G.D., Das, I., DeSilva, B.S.,
Khan, M.N. and Bowsher, R.R. Validation of immunoassays for bioanalysis: a
pharmaceutical industry perspective. J. Pharm. Biomed. Anal. 21, 1249–1273 (2000).
Finney, D.J. Statistical Method in Biological Assay. (Charles Griffin, London, 1978).
Gerlach, R.W., White, R.J., Deming, S.N., Palasota, J.A. and Van Emon, J.M. An
evaluation of five commercial immunoassay data analysis software systems.
Anal. Biochem. 212, 185–193 (1993).
Gottschalk, P.G. and Dunn, J.R. Determining the error of dose estimates and
minimum and maximum acceptable concentrations from assays with nonlinear
dose-response curves. Comput. Methods Programs Biomed. 80, 204–215 (2005a).
Gottschalk, P.G. and Dunn, J.R. The five parameter logistic: a characterization and
comparison with the four parameter logistic. Anal. Biochem. 343, 54–65 (2005b).
Haven, M.C., Orsulak, P.J., Arnold, L.L. and Crowley, G. Data-reduction methods
for immunoradiometric assays of thyrotropin compared. Clin. Chem. 33,
Healy, M.J.R. Statistical analysis of radioimmunoassay data. Biochem. J. 130,
Lynch, M.J. Extended standard curve stability on the CCD Magic Lite immunoas-
say system using a two-point adjustment. J. Biolumin. Chemilumin. 4, 615–619
Maciel, R.J. Standard curve-fitting in immunodiagnostics: a primer. J. Clin.
Immunoassay 8, 98–106 (1985).
Malan, P.G., Cox, M.G., Long, E.M.R. and Ekins, R.P. A multi-binding site
model-based curve-fitting program for the computation of RIA data. In:
Radioimmunoassay and Related Procedures in Medicine, vol. I, 425–455 (IAEA,
Nisbet, J.A., Owen, J.A. and Ward, G.E. A comparison of five curve-fitting proce-
dures in radioimmunoassay. Ann. Clin. Biochem. 23, 694–698 (1986).
Nix, B. and Wild, D.G. Data processing. In: Immunoassays, (ed Gosling, J.P.),
(Oxford University Press, Oxford, 2000).
Peterman, J.H. Immunochemical considerations in the analysis of data from non-
competitive solid-phase immunoassays. In: Immunochemistry of Solid-Phase
Immunoassay, (ed Butler, J.E.), (CRC Press, Boca Raton, 1991).
Plikaytis, B.D., Turner, S.H., Gheesling, L.L. and Carlone, G.M. Comparisons of
standard curve-fitting methods to quantitate Neisseria meningitidis Group A
polysaccharide antibody levels by enzyme-linked immunosorbent assay. J. Clin.
Microbiol. 29, 1439–1446 (1991).
Raab, G.M. Estimation of a variance function, with application to immunoassay.
Appl. Stat. 3, 32–40 (1981).
Raggatt, P.R. Data manipulation. In: Principles and practice of immunoassay, 2nd edn
(eds Price, C.P. and Newman, D.J.), 269–297 (Macmillan, London, 1997).
Rodbard, D. Statistical quality control and routine data processing for radioim-
munoassay and immunometric assays. Clin. Chem. 20, 1255–1270 (1974).
Rodbard, D. and Feldman, Y. Kinetics of two-site immunoradiometric (‘sandwich’)
assays-I. Mathematical models for simulation, optimization and curve-fitting.
Immunochemistry 15, 71–76 (1978).
Rodbard, D. and Hutt, D.M. Statistical analysis of radioimmunoassays and immu-
noradiometric (labeled antibody) assays: a generalized, weighted, iterative,
least-squares method for logistic curve-fitting. In: Radioimmunoassay and Related
Procedures in Medicine, vol. I , 165–192 (IAEA, Vienna, 1974).
Rodbard, D., Munson, P.J. and De Lean, A. Improved curve-fitting, parallelism
testing, characterization of sensitivity, validation and optimization for radioli-
gand assays. In: Radioimmunoassay and Related Procedures in Medicine, Proceedings
of the Symposium, West Berlin, 1977, (IAEA, Vienna, 1978).
Rogers, R.P.C. Data analysis and quality control of assays: a practical primer. In:
Practical Immunoassay, the State of the Art (ed Butt, W.R.) 253–308 (Marcel
Dekker, New York, 1984).
Wilkins, T.A., Chadney, D.C., Bryant, J., et al. Non-linear least-squares curve fit-
ting of a simple theoretical model using a mini-computer. Ann. Clin. Biochem.
15, 123–135 (1978a).
Wilkins, T.A., Chadney, D.C., Bryant, J., et al. Non-linear least-squares curve fit-
ting of a simple statistical model to radioimmunoassay dose-response data using
a mini-computer. In: Radioimmunoassay and Related Procedures in Medicine,
399–423 (IAEA, Vienna, 1978b).