This document analyzes the sampling design for water quality monitoring in Lake Kuortaneenjarvi, Finland. It used a nonlinear water quality model and historical data from 1980. The Wynn-Fedorov algorithm was used to construct a D-optimal design for estimating 6 key parameters in the model related to cyanobacteria and algae growth and temperature dependence. The nominal design suggested 7 daily observations between June and October, when rapid changes occur in algal succession and biomass. A Monte Carlo study found the design was relatively robust to 10% parameter uncertainty. The timing of sampling was analyzed for estimating each parameter using the information matrix.
An Analysis Of A Sampling Design A Case Study Of Lake Eutrophication
Computational Statistics & Data Analysis 8 (1989) 81-91
North-Holland
An analysis of a sampling design -
A case study of lake eutrophication
Juhani KETTUNEN, Hannu SIRVIÖ and Olli VARIS
Helsinki University of Technology, Laboratory of Hydrology and Water Resources Engineering,
Rakentajanaukio 4, SF-02150 Espoo, Finland.
Received 19 May 1988
Revised 8 September 1988
Abstract: The design of water quality sampling in Lake Kuortaneenjarvi, Finland, was studied. A
nonlinear water quality model together with historical data from a one-year cycle were used as the
basis of the design. A design for the nominal solution of the model was constructed using the
Wynn-Fedorov algorithm. According to the results, 7 daily observations of the lake contained the
essential information for the estimation of the 6 most sensitive parameters of the model. A
Monte-Carlo study was carried out to analyze the robustness of the design with respect to the
nominal parameter vector. The results indicated that the design was relatively insensitive to a
10% C.V. in the parameter vector. However, the nominal design scattered from exactly fixed daily
observations into a recommendation to observe in a few intensively measured periods. The timing of
sampling with respect to the estimation of each of the parameters studied was analyzed using the
trace of the inverse of the information matrix. Rapid changes in the algal succession and maximal
biomasses contributed greatly to the timing of the sampling suggested by the design.
Keywords: Wynn-Fedorov algorithm, Lake water quality, Monte-Carlo simulation, Nonlinear
modelling, Observational design, Sensitivity analysis.
1. Introduction
Modeling and forecasting of water quality in lakes usually has to be based
on a very limited number of observations compared with the complexity of the
systems. The sampling frequency very often ranges from four times a year up to
monthly or biweekly monitoring. More frequent sampling is a rarity. This is due
to limited resources in terms of laboratory capacity, the low level of automation
in monitoring, etc. The variety in the time constants of the processes
contributing to water quality problems is very large. In chemical, biochemical
and biological processes, the time constants are several orders of magnitude
smaller than these sampling intervals. Ecological processes induced by them, such
as seasonal succession patterns and periodicity of plankton, are definitely easier
to detect using data with the intervals mentioned above. However, owing to the
high number of factors driving the ecosystem, the data requirements for
unambiguous identification of the processes resulting in lake water quality
problems are extremely high in comparison to the resources.
0167-9473/89/$3.50 © 1989, Elsevier Science Publishers B.V. (North-Holland)
Jorgensen [3] pointed out the importance of observational design for the
parameter estimation of lake models. He suggested concentrating sampling into a
few intensively measured periods of the annual cycle of the ecosystem, which
facilitates effective allocation of observational resources. Mejer and Jorgensen [4]
elaborated the idea further. They suggested intensive sampling in the periods of
maximum change in the ecosystem state and showed how the data obtained could be
utilized in nonlinear estimation.
Observational design for parameter estimation can also be approached from
regression design theory. An application of this was given by Kettunen [5],
who reported a D-optimal design based on a nonlinear model of Lake
Kuortaneenjarvi, Western Finland. The design was constructed by maximizing the
determinant of the information matrix (equivalently, minimizing the determinant
of its inverse) using the simplex search technique.
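As a minimal illustration of the D-criterion, the sketch below compares the determinant of the information matrix for two candidate sampling schedules. The toy model y(t) = a exp(-k t), its parameter values and both schedules are assumptions chosen for illustration only; they are not the lake model or the designs of this study.

```python
import numpy as np

def information_matrix(times, a=1.0, k=0.1):
    """Normalised information matrix for the toy model y(t) = a * exp(-k * t)."""
    t = np.asarray(times, dtype=float)
    # Sensitivities of y with respect to a and k, one row per sampling time.
    S = np.column_stack([np.exp(-k * t), -a * t * np.exp(-k * t)])
    return S.T @ S / t.size

def d_criterion(times, a=1.0, k=0.1):
    """D-criterion: determinant of the information matrix (larger is better)."""
    return float(np.linalg.det(information_matrix(times, a, k)))

clustered = [9.0, 10.0, 11.0]   # three samples close together (hypothetical)
spread = [0.0, 10.0, 20.0]      # three samples spread over the interval (hypothetical)
print(d_criterion(clustered), d_criterion(spread))
```

For this toy model the spread schedule gives the larger determinant, because nearly identical sampling times produce nearly parallel sensitivity rows.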
In this study, the sampling program of Lake Kuortaneenjarvi was analyzed
further. The aim was to design the observations to support the estimation of the 6
most characteristic rate and temperature dependency parameters of cyanobacteria
and other algae in the model simulating the lake. The Wynn-Fedorov algorithm
(Fedorov [1]) was used to construct the design. Unlike the simplex search, this
algorithm converges to a unique and exact solution. The sensitivity of the solution
to the nominal parameter values and the physical interpretation of the design were
studied.
2. Modeling formalism
The system experiment model was represented on the observation interval $[t_0, T]$
by the structure (1)-(4).

$$x'(t, p) = f[x(t, p), u(t), t; p], \qquad x_0 = x(t_0, p), \tag{1}$$

$$y(t, p) = g[x(t, p); p], \tag{2}$$

$$z(t_k, p) = y(t_k, p) + e(t_k), \qquad k = 1, 2, \ldots, N, \tag{3}$$

$$h[x(t, p), u(t), p] \geq 0, \tag{4}$$

where $x \in R^n$ is a state vector of water quality; $'$ denotes the time derivative of the
system, $u \in R^r$ is the input vector, $y \in R^m$ is the output vector of the model; $f$ is
a nonlinear vector valued function defining the postulated structure of the model
dynamics parameterized by a parameter vector $p \in R^q$; $g$ is a nonlinear vector
valued function describing the known measurement process; $h$ represents all
auxiliary, mainly differential and algebraic equality or inequality constraints
known a priori relating $x$, $u$ and $p$; $z(t_k, p) \in R^m$ is a vector valued discrete
time measurement vector at measurement times $t_k$. $z$ relates the output $y(t_k, p)$
of the deterministic model with the measurement errors $e(t_k)$, which are assumed
to be white, Gaussian noise with zero mean and known variance $\sigma^2(t_k) I_m$. $N$ is
the number of discrete samples in time.
The task of the observation design is to choose the sampling times $t_k$ which
guarantee the most accurate parameter estimates of the model. To solve the
problem, the model given below was assumed to be structurally correct and the
population mean of the parameter vector was assumed to be known a priori.
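The measurement structure in equation (3) can be sketched in a few lines of Python. The output function `toy_output`, the sampling times and the noise levels below are hypothetical stand-ins, not the lake model; the sketch only shows how discrete measurements are formed by adding zero mean Gaussian noise of known variance to the deterministic output.

```python
import numpy as np

def toy_output(t):
    """Hypothetical two-component output y(t); stands in for g[x(t, p); p]."""
    return np.array([10.0 * np.exp(-0.2 * t), 5.0 * (1.0 - np.exp(-0.2 * t))])

def simulate_measurements(output, times, sigma, seed=0):
    """Eq. (3): z(t_k) = y(t_k) + e(t_k), with e ~ N(0, sigma_i^2) per component."""
    rng = np.random.default_rng(seed)
    y = np.array([output(t) for t in times])        # shape (N, m)
    noise = rng.normal(0.0, sigma, size=y.shape)    # white Gaussian noise
    return y + noise

times = [0.0, 5.0, 10.0, 20.0]                      # assumed sampling times t_k
z = simulate_measurements(toy_output, times, sigma=np.array([0.5, 0.25]))
print(z.shape)
```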
3. The algorithm for D-optimality
A D-optimal sampling schedule was constructed using the Wynn-Fedorov algorithm [1]
(see also Silvey [6], Fedorov et al. [2]). The optimal design was iterated in
7 successive stages:
(1) The initial design $t_1, \ldots, t_N$ having a non-singular information matrix $M$
was fixed according to the prior knowledge of the system. Each sampling time
was supposed to have an equal weight $p_{k,N} = 1/N$. The iteration counter ITER was
set to zero.
(2) The inverse of the information matrix $M^{-1}$ was calculated from the
equation:

$$M^{-1} = \left[ \sum_{k=1}^{N} p_{k,N}\, S_k^T S_k \right]^{-1} \tag{5}$$

where $S_k$ is the Jacobian matrix of the model response at observation time $t_k$
given by

$$S_k = \begin{bmatrix} \partial y_1/\partial p_1 & \cdots & \partial y_1/\partial p_q \\ \vdots & & \vdots \\ \partial y_m/\partial p_1 & \cdots & \partial y_m/\partial p_q \end{bmatrix}_k \tag{6}$$

In order to avoid computational problems and to give equal weights to separate
state variables and parameters, the elements of the Jacobian matrix $S_k$, denoted
by $s(t_k)_{ij}$, were scaled and substituted by first order difference approximations
according to the formula:

$$s(t_k)_{ij} = \frac{y_i(t_k, p_j + \Delta p_j) - y_i(t_k, p_j)}{\Delta p_j} \cdot \frac{p_j}{y_{i,\max}} \tag{7}$$

where $y_{i,\max}$ denotes the maximum value of the i:th component of the output
vector, and $i = 1, \ldots, m$, $j = 1, \ldots, q$, $k = 1, \ldots, N$. The perturbation of the
parameters $\Delta p_j$ was 1% of the nominal value. The scaling was necessary due to
the mathematically impractical units of the model output, and it corresponds to the
case where the efficiency weighting matrix (Fedorov [1]) is diagonal, having the
elements $p_j / y_{i,\max}$.
(3) The iteration counter was incremented, ITER = ITER + 1. A new observation
time $t_{N+1}$ was added to the design by carrying out a grid search maximizing the
following criterion:

$$d_N = \max_{t_{N+1}} \operatorname{Tr}\{ S_{N+1} M_N^{-1} S_{N+1}^T \}. \tag{8}$$

(4) The efficiency weights $\alpha$ were updated using the equation:

$$\alpha_{N+1} = \frac{d_N - q}{q\,(d_N - 1)} \tag{9}$$
(5) The inverse of the information matrix $M^{-1}$ was updated using the recursive
equation:

$$M_{N+1}^{-1} = (1 - \alpha_{N+1})^{-1} \left[ I_q - \frac{\alpha_{N+1}}{1 - \alpha_{N+1}}\, M_N^{-1} S_{N+1}^T \left( I_m + \frac{\alpha_{N+1}}{1 - \alpha_{N+1}}\, S_{N+1} M_N^{-1} S_{N+1}^T \right)^{-1} S_{N+1} \right] M_N^{-1} \tag{10}$$
(6) The weights of the supporting points of the design were updated using the
equations

$$p_{k,N+1} = (1 - \alpha_{N+1})\, p_{k,N}$$

for a point $t_k$ already existing in the design;

$$p_{k,N+1} = (1 - \alpha_{N+1})\, p_{k,N} + \alpha_{N+1}$$

for a design point replicated at iteration $N + 1$;

$$p_{k,N+1} = \alpha_{N+1} \tag{11}$$

for a completely new design point.
(7) A design point having a weight of less than 1% was removed from the design
and $N$ was updated. If $|d_N - q| > 0.2$ and ITER < ITERMAX, the matrix $M^{-1}$
was updated using the equation in step (5) and the iteration was continued from
(3). Otherwise the run was terminated.
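The seven stages above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the two-output function `model`, its three parameters, the candidate grid and the initial design are hypothetical, and the information matrix is re-inverted directly at every iteration instead of applying the recursive update of step (5). The stopping rule relies on the fact that, for any design with support on the grid, the maximum of the criterion in step (3) cannot fall below the number of parameters q.

```python
import numpy as np

def model(t, p):
    """Hypothetical two-output toy model (not the lake model of the paper)."""
    a, b, k = p
    return np.array([a * np.exp(-k * t), b * (1.0 - np.exp(-k * t))])

def scaled_sensitivity(t, p, y_max, rel_step=0.01):
    """Forward differences with a 1 % step, scaled by p_j / y_i,max."""
    y0 = model(t, p)
    S = np.empty((y0.size, p.size))
    for j in range(p.size):
        dp = rel_step * p[j]
        p_pert = p.copy()
        p_pert[j] += dp
        S[:, j] = (model(t, p_pert) - y0) / dp * p[j] / y_max
    return S

def wynn_fedorov(grid, init_times, p, tol=0.2, iter_max=200, min_weight=0.01):
    """Iterate steps (1)-(7); returns (weights by time, last d value, iterations)."""
    q = p.size
    y_max = np.max([model(t, p) for t in grid], axis=0)
    sens = {t: scaled_sensitivity(t, p, y_max) for t in grid}
    weights = {t: 1.0 / len(init_times) for t in init_times}       # step (1)
    d_max, n_iter = float("nan"), 0
    for n_iter in range(1, iter_max + 1):
        # Step (2): information matrix of the current design and its inverse.
        M = sum(w * sens[t].T @ sens[t] for t, w in weights.items())
        M_inv = np.linalg.inv(M)
        # Step (3): grid search for the time maximising Tr{S M^-1 S^T}.
        d = {t: float(np.trace(sens[t] @ M_inv @ sens[t].T)) for t in grid}
        t_new = max(d, key=d.get)
        d_max = d[t_new]
        if abs(d_max - q) <= tol:                                    # step (7)
            break
        alpha = (d_max - q) / (q * (d_max - 1.0))                    # step (4)
        # Step (6): shrink old weights, then add alpha to the chosen point.
        weights = {t: (1.0 - alpha) * w for t, w in weights.items()}
        weights[t_new] = weights.get(t_new, 0.0) + alpha
        # Step (7): drop points below 1 % and renormalise the rest.
        weights = {t: w for t, w in weights.items() if w >= min_weight}
        total = sum(weights.values())
        weights = {t: w / total for t, w in weights.items()}
    return weights, d_max, n_iter

grid = list(range(0, 31))                       # candidate sampling days (assumed)
p_nominal = np.array([10.0, 5.0, 0.2])          # hypothetical nominal parameters
design, d_final, n_iter = wynn_fedorov(grid, [0, 15, 30], p_nominal)
for t in sorted(design):
    print(t, round(design[t], 3))
```

Recomputing the inverse at each step costs more than the recursion but keeps the sketch short and makes the removal of low-weight points straightforward.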
4. The model and the prior data
Due to high nutrient concentrations, Lake Kuortaneenjarvi suffers from excessive
biomasses of cyanobacteria (blue-green algae), which hamper fisheries and
recreational use. A simulation model was formulated for the main transformations
of phosphorus and nitrogen and the biomasses of nitrogen-fixing cyanobacteria
and other planktonic algae in the lake (Varis [7,8]). The state variables x of
the model are given in Table 1 and the structure of the model is illustrated in
Figure 1. The model was constructed mainly by inference based on prior
assumptions or knowledge, and the parameters were estimated from the data
collected in 1980. The data consisted of daily measurements of hydrological and
meteorological inputs, as well as of physico-chemical and algal biomass
observations that were carried out 23 and 10 times, respectively, during 1980.
The model is highly nonlinear, having 33 parameters (Varis [8]). In the analysis
of averaged model sensitivity to the nominal parameter values, Varis [7] concluded
that, of the model outputs, the concentrations of inorganic nutrients and the
biomasses of phytoplankton and cyanobacteria are extremely sensitive to errors in
the parameters of growth and respiration and in the temperature dependencies of
algal growth. Hence, it was found necessary to design further observations for
re-evaluating them.

Table 1
State variables of the model used in the observation design (for details, see Varis [8])

Symbol  Interpretation                          Unit     Remarks
DIP     Dissolved inorganic phosphorus          mg m^-3  Observed in 1980
DIN     Dissolved inorganic nitrogen            mg m^-3  Observed in 1980
C       Biomass of cyanobacteria                mg m^-3  Observed in 1980
F       Biomass of phytoplankton (C excluded)   mg m^-3  Observed in 1980
CN      Nitrogen in cyanobacteria cells         mg m^-3
CP      Phosphorus in cyanobacteria cells       mg m^-3
FN      Nitrogen in phytoplankton cells         mg m^-3
FP      Phosphorus in phytoplankton cells       mg m^-3
ALP     Allochthonous detrital phosphorus       mg m^-3
AUP     Autochthonous detrital phosphorus       mg m^-3
ND      Detrital nitrogen                       mg m^-3
POSE    Organic phosphorus in sediment          mg m^-3  (per volume of lake water)
PSSE    Inorganic phosphorus in sediment        mg m^-3  (per volume of lake water)
NOSE    Organic nitrogen in sediment            mg m^-3  (per volume of lake water)
NSSE    Inorganic nitrogen in sediment          mg m^-3  (per volume of lake water)

Three further state variables carry the remark "Indirectly observed in 1980"; the rows to which these remarks belong cannot be recovered from the source layout.
Fig. 1. The flow diagram of the Lake Kuortaneenjarvi model (for the symbols, see Table 1). In the diagram, * marks loss due to outflow and ** loss due to sediment consolidation.
5. The nominal design
An observational program was designed for obtaining optimal data for the
estimation of the parameters in Table 2. The system output vector was taken to
consist of four components: dissolved inorganic phosphorus (DIP) and nitrogen
(DIN), and biomasses of phytoplankton (F) and cyanobacteria (C). The simulated
nominal response of the model is shown in Figure 2.
An initial estimate for the design was obtained by Kettunen [5], who concluded
that all the informative observational times for the model took place between the
beginning of June and the end of October (also referred to as days 150...300
counted from the beginning of the year). 21 uniformly distributed sampling times
(151, 158, 165, ..., 298) were fixed on the observational interval.

[Figure: four panels, (a) DIP, (b) DIN, (c) C and (d) F, each in mg/m^3 against time in days (0-360).]
Fig. 2. Simulated and observed values of dissolved inorganic phosphorus (a), dissolved inorganic
nitrogen (b), biomass of cyanobacteria (c) and phytoplankton biomass (d) in Lake Kuortaneenjarvi,
1980.

Table 2
Parameters in the design

Symbol  Interpretation                                        Nominal value  Unit
p1      Temperature coeff. of F growth in van't Hoff's eq.    1.02           -
p2      Max. growth rate of F                                 0.65           1/d
p3      Respiration rate of F                                 0.16           1/d
p4      Max. growth rate of C                                 0.60           1/d
p5      Temperature coeff. of C growth in van't Hoff's eq.    1.09           -
p6      Respiration rate of C                                 0.185          1/d

[Figure: probability (%) against month, June to September.]
Fig. 3. The nominal observational design. Optimal sampling times and their weights.

Table 3
Correlation of the model parameters at the nominal design

      p1      p2      p3      p4      p5      p6
p1    1
p2    0.77    1
p3    0.57    0.95    1
p4   -0.14    0.20    0.37    1
p5   -0.15   -0.38   -0.39    0.13    1
p6   -0.07    0.35    0.53    0.95   -0.18    1
The algorithm converged rapidly towards the solution. After 7 iterations, all
the final design points were present in the design. After 20 iterations, the only
change was the introduction of replicates at the points chosen in previous
iterations. The final design consisted of 7 sampling instants (Figure 3). Two
samples were timed in June, one in July, three in August and one in September.
However, according to the nominal design, 30% of the repeated observations were
allocated to June, 54% to August and only 3% and 14% to July and September,
respectively.
The correlation of the model parameters was approximated at the nominal
design points from the sensitivity matrix $S^T S$ (Table 3). According to the results,
parameters p2 and p3 as well as parameters p4 and p6 correlated significantly.
Also parameters p1 and p2 as well as p1 and p3 were strongly correlated.
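One common way to obtain such a correlation table from sensitivities is sketched below. It assumes that the covariance of the parameter estimates is proportional to the inverse of S^T S, which is a standard approximation but not necessarily the exact computation used here; the matrix `S` in the example is hypothetical.

```python
import numpy as np

def parameter_correlation(S):
    """Approximate parameter correlation matrix from stacked sensitivities S.

    S has one row per observation and one column per parameter. The covariance
    of the estimates is taken as proportional to (S^T S)^-1 (an assumption).
    """
    cov = np.linalg.inv(S.T @ S)
    sd = np.sqrt(np.diag(cov))
    return cov / np.outer(sd, sd)

# Hypothetical sensitivities of 4 observations to 2 parameters.
S = np.array([[1.0, 0.9], [0.5, 0.6], [0.2, 0.1], [0.8, 0.3]])
R = parameter_correlation(S)
print(np.round(R, 2))
```

Off-diagonal values close to 1 in magnitude indicate parameter pairs that the design cannot separate well.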
6. Sensitivity of the design
The convergence of the design algorithm and the sensitivity of the design to the
nominal parameter values were studied using Monte-Carlo analysis. 30 simulation
runs were performed. In each of the runs, zero mean white Gaussian noise was
added to the nominal parameter vector. The variance of the noise was $(0.1\,p_i)^2$
for the parameters p2, p3, p4 and p6, and $(0.05\,p_i)^2$ for the parameters p1 and
p5. In each run, all the parameters were perturbed. The initial design was equal to
the one used in the construction of the nominal design.

[Figure: efficiency weight α against iteration number.]
Fig. 4. The development of efficiency weights α in the iteration. Maximum, average and minimum
values of α.
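The parameter perturbation described above can be sketched as follows, using the nominal values of Table 2. The random seed and the function name `perturbed_parameters` are illustrative assumptions; each perturbed vector would then be passed to the design algorithm in place of the nominal one.

```python
import numpy as np

# Nominal values of p1..p6 from Table 2.
nominal = np.array([1.02, 0.65, 0.16, 0.60, 1.09, 0.185])
# Coefficient of variation: 5 % for p1 and p5, 10 % for the other parameters.
cv = np.array([0.05, 0.10, 0.10, 0.10, 0.05, 0.10])

def perturbed_parameters(n_runs=30, seed=0):
    """Draw n_runs parameter vectors with zero mean Gaussian noise added."""
    rng = np.random.default_rng(seed)          # seed is an arbitrary choice
    noise = rng.normal(0.0, cv * nominal, size=(n_runs, nominal.size))
    return nominal + noise

samples = perturbed_parameters()
print(samples.shape)
```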
Despite the perturbations, the algorithm converged rather rapidly to the
final design. A good indicator of this is the trajectory of the efficiency weights
plotted as a function of iteration (Figure 4). It stabilized after 70-80 iterations.
The perturbation of parameters scattered the nominal design into less distinct
sampling instants (Figure 5). However, 39% and 41% of the sampling capacity
were still allocated to June and August, respectively. The scattering of the design
points was greatest in June.
The instants providing the best support for the estimation of each model
parameter studied were analysed by studying the development of the diagonal
elements of $M^{-1}$. The design runs with 100 iteration steps were performed for the
nominal solution with a randomly chosen initial design condition. At each
iteration, a negative change in the diagonal element was interpreted as an
improvement in the accuracy of the respective parameter due to the observation.

[Figure: probability (%) against month, June to September, for the nominal and the perturbed designs.]
Fig. 5. Results of the Monte-Carlo analysis. The distribution of sampling points in time and their
3-day cumulative weights.

Fig. 6. Frequency of improvements of the parameter accuracy at design points.
The frequency of the occurrence of improvements at a certain design point was
counted and plotted in Figure 6. According to the results, the sampling instant
159 improved the accuracy of the parameter p1 only. Sampling at 169 improved
the accuracy of parameters p1-p3; sampling at 214 improved all the parameters
but p1, and sampling at 225 improved the parameters p3, p4 and p6. Parameter p4
was the only one to be improved remarkably by timing the observation at the
instant 234. Sampling at 250 improved all the parameters characterizing the
biological activity of the cyanobacteria.
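For orientation, the sampling instants named above can be converted to calendar dates. The sketch assumes that day 1 corresponds to 1 January 1980, the year of the prior data; the helper `day_to_date` is introduced here for illustration only.

```python
from datetime import date, timedelta

def day_to_date(day_of_year, year=1980):
    """Convert a day number counted from the beginning of the year to a date."""
    return date(year, 1, 1) + timedelta(days=day_of_year - 1)

instants = [159, 169, 214, 225, 234, 250]
for d in instants:
    print(d, day_to_date(d).isoformat())
```

Under this assumption the six instants fall on 7 and 17 June, 1, 12 and 21 August, and 6 September.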
7. Discussion and conclusion
When comparing the application of the Wynn-Fedorov algorithm [1] with the
application of the simplex search (Kettunen [5]), the following remarks can be made.
Unlike in the simplex search, one need not fix the number of points in the initial
design when using the Wynn-Fedorov algorithm. The variance function is distinct
and more straightforward to calculate than the determinant criterion, even if the
application of both criteria resulted in the same design. The Wynn-Fedorov
algorithm is of on-line type, and hence it is more flexible than the off-line
simplex search used by Kettunen [5]. Yet, the maximization of the variance function
$\operatorname{Tr}(S M^{-1} S^T)$ was not applied in that study. The simplex search,
however, is much more economical in computational time.
The design obtained may be subject to bias due to the significant multicollin-
earity involved in the model. However, the design can be postulated to form a
very logical entity with ecological prior knowledge.
The phytoplankton growth is at its highest at the instants 159 and 169. This is
likely to be the reason for the identifiability of the parameters p1 -p3 char-
acterizing phytoplankton activity. The instant 225 represents the phase in the
algal succession in which the cyanobacteria are becoming dominant, and the
phytoplankton biomass is rapidly decreasing. Sampling at this instant supports
the distinction of the parameters representing the properties of these two groups
of algae. Sampling after that supports the estimation of the cyanobacterial
parameters.
The perturbations introduced into the parameters may appear modest in
comparison with the rather heuristic approach in the model identification.
However, for instance in the most sensitive model parameters, which are the
temperature coefficients, a perturbation of 5% is very large compared with the
range of literature values for this kinetic parameter.
The modest scattering of the solution observed in the Monte-Carlo analysis
does not constitute a practical problem in scheduling a sampling program. This is
due to the character of the physical problem; the inputs to the system are subject
to very large variations in nature, at least in the climatic zone considered. Hence,
the algal periodicity can have very different outcomes in different years.
The stability of the design obtained and the agreement of the design with the
one obtained by the simplex search may be an implication of the property of the
model encountered by Varis [9]. The competitive abilities of the two algal groups
were studied by input perturbation analysis. Despite remarkable responses in
biomasses, the timings of the main succession events (growth and decay seasons,
competition phases, timings of biomass peaks, etc.) appeared to be rather stable.
Whether this is also a property of the lake studied is presently unclear. In any
case, the seasons at the 63rd latitude and the hydrological variability due to
distinct flood seasons in the region studied support this idea.
In the study of the temporal input sensitivity of the Lake Kuortaneenjarvi
model (Varis [10]), remarkable lags were encountered between the instant of input
(nutrient loads, temperature, irradiance) perturbation and the maximal outcome
response. The instants in which the output (cyanobacteria, phytoplankton) sensi-
tivities were greatest, occurred with input perturbations in late May. However,
the output sensitivity peak took place as late as in late August and early
September. An extension of the parameter sensitivity based study, a design taking
into account also the model sensitivity to inputs, is hence likely to be the next
step of the study.
The main practical restrictions of the results obtained in this study are the
following. The model used in this study has been constructed to simulate the algal
succession in one lake. As mentioned, it is based on the system behavior observed
only during one year. Also the assumption of the correctness of the model is very
restrictive. However, the design appears to be very logical. The practical idea of
the temporal pattern of the design to be derived from the ecological succession
pattern observed and postulated, in addition to the methodological experiences
gained, are likely to find more general applicability.
Acknowledgements
The comments of Prof. M. Straskraba and Dr. L. Lhotka and the useful remarks of
the referee on our approach are acknowledged. The Maj and Tor Nessling Foundation
and the Academy of Finland supported us financially.
References
[1] V.V. Fedorov, Theory of Optimal Experiments (Academic Press, New York, 1972).
[2] V.V. Fedorov, S. Leonov, M. Antonovski and S. Pitovranov, The Experimental Design of an
Observation Network: Software and Examples, Working Paper WP-87-05, International In-
stitute for Applied Systems Analysis, 2361 Laxenburg, Austria (1987).
[3] S.E. Jorgensen, Lake Management (Pergamon Press, Oxford, 1980).
[4] H. Mejer and L. Jorgensen, Identification methods applied to two Danish lakes, in: M.B. Beck
and G. van Straten (Eds.), Uncertainty and Forecasting of Water Quality (Springer-Verlag,
Berlin, 1983).
[5] J. Kettunen, Design of limnological observations for detecting processes in lakes and
reservoirs, in: M. Straskraba, M. Tundisi and A. Duncan (Eds.), Comparative Limnology and
Water Quality Modeling of Reservoirs, in print.
[6] S.D. Silvey, Optimal Design (Chapman and Hall, London, 1980).
[7] O. Varis, A water quality model for Lake Kuortaneenjarvi (in Finnish, with English summary)
M.Sc. Thesis. Division of Water Engineering, Helsinki University of Technology (1984).
[8] O. Varis, Water quality model for Lake Kuortaneenjarvi, a polyhumic Finnish lake, Aqua
Fennica 14 (1984) 179-187.
[9] O. Varis, Impacts of growth factors on competitive ability of blue-green algae analyzed with
whole-lake simulation, in: M. Straskraba, M. Tundisi and A. Duncan (Eds.), Comparative
Limnology and Water Quality Modeling of Reservoirs, in print.
[10] O. Varis, The temporal sensitivity of Aphanizomenon flos-aquae dominance - a whole-lake
simulation study with input perturbations, to appear in Ecological Modelling.