ASSESSMENT OF SEASONAL AND POLLUTING EFFECTS
ON THE QUALITY OF RIVER WATER BY EXPLORATORY
DATA ANALYSIS
MARISOL VEGA*, RAFAEL PARDO*, ENRIQUE BARRADO and LUIS DEBÁN
Departamento de Química Analítica, Facultad de Ciencias, Universidad de Valladolid, 47005 Valladolid, Spain
(First received December 1996; accepted March 1998)
Abstract: 22 physico-chemical variables have been analysed in water samples collected every three months for two and a half years from three sampling stations located along a 25 km section of a river affected by man-made and seasonal influences. Exploratory analysis of the experimental data has been carried out by box plots, ANOVA, display methods (principal component analysis) and unsupervised pattern recognition (cluster analysis) in an attempt to discriminate sources of variation of water quality. PCA has allowed the identification of a reduced number of "latent" factors with a hydrochemical meaning: mineral contents, man-made pollution and water temperature. Spatial (pollution of anthropogenic origin) and temporal (seasonal and climatic) sources of variation affecting the quality and hydrochemistry of the river water have been differentiated and assigned to polluting sources. An ANOVA of the rotated principal components has demonstrated that (i) mineral contents are seasonal and climate dependent, thus pointing to a natural origin for this polluting form, and (ii) pollution by organic matter and nutrients originates from anthropogenic sources, mainly municipal wastewater. The application of PCA and cluster analysis has achieved a meaningful classification of river water samples based on seasonal and spatial criteria. © 1998 Elsevier Science Ltd. All rights reserved
Key words: water quality, surface water, hydrochemistry, exploratory data analysis, ANOVA, box plot,
principal component analysis, pattern recognition, cluster analysis.
INTRODUCTION
River basins generally constitute areas with a high population density owing to favourable living conditions such as the availability of fertile lands, water for irrigation, industrial or drinking purposes, and efficient means of transportation. Rivers play a major role in assimilating or carrying off industrial and municipal wastewater, manure discharges and runoff from agricultural fields, roadways and streets, which are responsible for river pollution (Stroomberg et al., 1995; Ward and Elliot, 1995). Rivers are also the main water resource in inland areas for drinking, irrigation and industrial purposes; reliable information on water quality is therefore a prerequisite for effective and efficient water management.
The discharge of industrial and municipal wastewater and manure can be considered a constant polluting source, whereas surface runoff is seasonal and strongly affected by climate. Flow in rivers is a function of many factors, including precipitation, surface runoff, interflow, groundwater flow and pumped inflow and outflow. Seasonal variations of these factors have a strong effect on flow rates and hence on the concentration of pollutants in the river water.
Long-term surveys and monitoring programs of water quality are an adequate approach to a better knowledge of river hydrochemistry and pollution, but they produce large data sets which are often difficult to interpret (Dixon and Chiswell, 1996). Most discussions on trend detection focus on analysing a single variable, while routine monitoring programs ordinarily measure several variables. The problem of data reduction and interpretation of multiconstituent chemical and physical measurements can be approached through the application of multivariate statistical methods and exploratory data analysis (Massart et al., 1988; Wenning and Erickson, 1994). The usefulness of multivariate statistical tools in the treatment of analytical and environmental data is reflected by the increasing number of papers cited in Analytical Chemistry Reviews (Brown et al., 1994, 1996).
Cluster analysis and principal component analysis (PCA) have been widely used as they are unbiased methods which can indicate associations between samples and/or variables (Wenning and Erickson, 1994). These associations, based on similar magnitudes or variations in chemical and physical constituents, may indicate the presence of seasonal or man-made influences. Hierarchical agglomerative cluster analysis indicates groupings of samples by linking inter-sample similarities and illustrates the overall similarity of variables in the data set (Massart and Kaufman, 1983). PCA is used to reduce the dimensionality of the data set by explaining the correlation among a large set of variables in terms of a small number of underlying factors or principal components without losing much information (Jackson, 1991; Meglen, 1992); it also allows associations between variables to be assessed, since the components indicate the participation of individual chemicals in several influence factors. Exploratory data analysis has been used to evaluate the water quality of rivers, and seasonal, spatial and anthropogenic influences have been demonstrated (Brown et al., 1980; Bartels et al., 1985; Grimalt et al., 1990; Librando, 1991; Andrade et al., 1992; Aruga et al., 1993; Elosegui and Pozo, 1994; Pardo et al., 1994; Battegazzore and Renoldi, 1995; Voutsa et al., 1995).

Wat. Res. Vol. 32, No. 12, pp. 3581-3592, 1998
© 1998 Elsevier Science Ltd. All rights reserved
Printed in Great Britain
0043-1354/98 $19.00 + 0.00
PII: S0043-1354(98)00138-9
*Author to whom all correspondence should be addressed. [E-mail: solvega@wamba.cpd.uva.es].
In this work, PCA, analysis of variance (ANOVA) and agglomerative hierarchical cluster analysis have been used to investigate the water quality of the Pisuerga river (Duero basin, Spain), to assess the influence that pollution and seasonality have on the quality of river water, and to discriminate the individual effects of climate and human activities on the river hydrochemistry.
METHODS
Sampling stations
The Pisuerga river belongs to the Duero river basin, which is located in the Castilla y León region (Centre-North of Spain). The inland geographic situation of the basin, surrounded by mountains, results in an extremely continental climate. Precipitation in the area is scarce, ranging from 313 to 571 mm yr⁻¹, with a mean of 442 mm yr⁻¹. Precipitation is maximum in November (49.8 mm) and minimum in August (13.2 mm). The annual mean temperature is 12 °C, and extreme values of -4 °C and 32 °C are registered in January and July, respectively.
The river flows in a north-south direction from the northern mountains through a high tableland to join the Duero river, and is the main drainage stream in that direction; in spring, snowmelt in the northern mountains causes a marked increase in river flow. Along its course, the river passes through limestone, marl, gypsum and sandstone soils, which are the main contributors to the high levels of minerals in the river water. Important agricultural activity devoted to irrigated crops takes place in riverine areas, where the use of nitrogenous fertilisers is common practice. Twelve kilometres upstream of its mouth, the river crosses the town of Valladolid, the major industrial centre of the region, with a population of ca. 400 000. Municipal wastewater is discharged directly into the river (estimated volume ca. 57 million m³ yr⁻¹), as the wastewater and sewage treatment plant is still being built. Moreover, although large industries settled in the area treat their wastewater, small industries are suspected of discharging residues into the river. The combination of a high population density in the area and an extreme continental climate causes river hydrology, and hence river pollution, to be strongly influenced by seasonality.
The investigated river section is located at 41°23'24"N and 04°27'00"W, at an average altitude of 690 m above sea level. It covers a length of 25 km, from Cabezón de Pisuerga, a small village located 13 km upstream of Valladolid, to the village of Simancas, at the mouth of the Pisuerga river, 12 km downstream of Valladolid. Major industrial activity in the area is concentrated in the north of the city, upstream of the bridge called Puente Mayor, and municipal discharges into the river occur mainly between Puente Mayor and Simancas.
Sampling stations were located at Cabezón de Pisuerga, Puente Mayor and Simancas in an attempt to isolate and identify the polluting sources: at Cabezón de Pisuerga the river has not yet received industrial or municipal wastewater, and the water quality at this station can be considered to reflect pollution from overland flow and from agricultural and manure discharges; Puente Mayor reflects the situation in which industrial wastewater, but no municipal residues, has been discharged; at Simancas the river has received all the polluting discharges.
The selected stations were sampled every three months for two and a half years. A total of 10 samples were collected from each station on the following dates (dd/mm/yy): 06/04/90, 03/07/90, 09/10/90, 09/01/91, 03/04/91, 02/07/91, 10/10/91, 09/01/92, 10/04/92 and 06/07/92. Samples are identified throughout by a four-character code XYZZ, where X is the sampling station (C, Cabezón; P, Puente Mayor; S, Simancas), Y is the month of sampling (A, April; J, July; O, October; E, January) and ZZ is the year (90, 91 or 92).
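Because the XYZZ codes recur in the figures and in the discussion of individual samples (e.g. SJ90), a small decoder can make the scheme explicit. The sketch below is hypothetical (the function name and dictionaries are not from the paper) and simply encodes the letter conventions stated above.

```python
# Hypothetical helper (not from the paper): decode the XYZZ sample codes.
STATIONS = {"C": "Cabezón de Pisuerga", "P": "Puente Mayor", "S": "Simancas"}
MONTHS = {"A": "April", "J": "July", "O": "October", "E": "January"}


def decode_sample_code(code: str) -> tuple[str, str, int]:
    """Split a code such as 'SJ90' into (station, month, four-digit year)."""
    if len(code) != 4 or not code[2:].isdigit():
        raise ValueError(f"expected a four-character code XYZZ, got {code!r}")
    station, month, yy = code[0], code[1], code[2:]
    if station not in STATIONS or month not in MONTHS:
        raise ValueError(f"unknown station or month letter in {code!r}")
    # All samples were taken between 1990 and 1992, so 19YY is unambiguous here.
    return STATIONS[station], MONTHS[month], 1900 + int(yy)


print(decode_sample_code("SJ90"))  # ('Simancas', 'July', 1990)
```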
Analytical procedures
Sample containers were 1 l polyethylene bottles with hermetic-locking caps. Bottles and caps were cleaned by soaking in 50% HCl for three days, rinsed with deionized water and soaked in 2 M HNO3 for another three days, then finally rinsed with deionized water, drained, wrapped in polyethylene bags and stored until required. Samples were collected with a Go-Flo device from the middle of the stream at a depth of 15 cm, from the stone bridges at each of the sampling stations. Prior to sample collection, the sampling device and containers were rinsed twice with the water to be sampled.
Temperature, pH, conductivity and dissolved oxygen were measured in situ. Duplicate samples were taken at each sampling station and immediately filtered under nitrogen pressure through cellulose nitrate filters (pore size 0.45 µm) into acid-washed polyethylene bottles. One duplicate was acidified to pH 2 by adding 100 µl of 10 M HCl to each 100 ml sample and was used for the determination of metals, hardness, nitrogen (as ammonia, nitrite and nitrate) and phosphorus (as phosphate). The second duplicate was kept at its natural pH and used for the determination of the remaining anions (bicarbonate, chloride and sulphate), conductivity and organic matter (as chemical oxygen demand, COD, and biochemical oxygen demand, BOD). Samples were immediately transported to the laboratory and stored at 4 °C until analysis, which was completed within one week.
Twenty-two physico-chemical parameters were determined following standard and recommended methods of analysis (APHA-AWWA-WPCF, 1985; AOAC, 1990). Table 1 lists the variables measured and their units, the analytical techniques employed, and the abbreviations used henceforth. A total of 660 analyses were carried out (22 variables in 30 samples). Each analysis was performed in duplicate and the mean values were used for calculations.
Data treatment
Exploratory data analysis was performed by linear display methods (principal component analysis) and by unsupervised pattern recognition techniques (hierarchical cluster analysis) on experimental data normalized to zero mean and unit variance, in order to avoid misclassifications arising from the different orders of magnitude of both the numerical values and the variances of the parameters analysed. As the methods of classification used here are non-parametric, they make no assumptions about the underlying statistical distribution of the data, and therefore no evaluation of the normal (Gaussian) distribution of the data is necessary (Sharaf et al., 1986).
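The normalization described above (often called autoscaling, or z-scores) can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the authors' software: rows are samples, columns are variables, and the sample standard deviation (ddof=1) is assumed. A constant column would have zero standard deviation and must be dropped beforehand.

```python
import numpy as np


def autoscale(X: np.ndarray) -> np.ndarray:
    """Centre each column to zero mean and scale it to unit variance (z-scores).

    X has one row per sample and one column per variable (e.g. 30 x 22).
    The sample standard deviation (ddof=1) is used; this choice is an assumption.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    return (X - mean) / std


# Two variables on very different scales (conductivity-like and iron-like values).
X = np.array([[589.0, 0.10], [599.0, 0.12], [629.0, 0.11], [402.0, 0.01]])
Z = autoscale(X)
```

After scaling, both columns have mean 0 and standard deviation 1, so conductivity no longer dominates distance or variance calculations simply because its numbers are larger.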
Principal component analysis was applied to the normalized data to assess associations between variables, since this method reveals the participation of individual chemicals in several influence factors, which commonly occurs in hydrochemistry. Diagonalization of the correlation matrix transforms the original p correlated variables into p uncorrelated (orthogonal) variables called principal components (PCs), which are weighted linear combinations of the original variables (Mellinger, 1987; Meglen, 1992; Wenning and Erickson, 1994). The characteristic roots (eigenvalues) of the PCs are a measure of their associated variances, and the sum of the eigenvalues equals the total number of variables, p. The correlations between PCs and original variables are given by the loadings, and the individual transformed observations are called scores.
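The procedure just described (diagonalize the correlation matrix, sort eigenvalues, derive loadings and scores) can be sketched with NumPy as follows. It is a minimal illustration, not the program used in the study; the synthetic data are made up. Loadings are taken as eigenvectors scaled by the square root of their eigenvalues, which makes them equal to the correlations between the autoscaled variables and the PC scores.

```python
import numpy as np


def pca_correlation(X: np.ndarray):
    """PCA by diagonalising the correlation matrix of X (rows = samples).

    Returns eigenvalues (descending), loadings (variables x PCs) and scores
    (samples x PCs). Loadings are eigenvectors scaled by sqrt(eigenvalue), so
    they equal the correlations between the original variables and the PCs.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # autoscaled data
    R = np.corrcoef(Z, rowvar=False)                  # p x p correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)              # eigh: R is symmetric; ascending order
    order = np.argsort(eigvals)[::-1]                 # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
    scores = Z @ eigvecs                              # transformed observations
    return eigvals, loadings, scores


rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
X[:, 1] += 2 * X[:, 0]  # make two variables correlated
eigvals, loadings, scores = pca_correlation(X)
print(eigvals.sum())  # equals the number of variables (5), up to rounding
```

The sum of the eigenvalues equals p because it is the trace of the correlation matrix, whose diagonal consists of p ones.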
Cluster analysis is an unsupervised pattern recognition technique that uncovers the intrinsic structure or underlying behaviour of a data set without making a priori assumptions about the data, in order to classify the objects of the system into categories or clusters based on their nearness or similarity. In hierarchical cluster analysis, the distance between samples is used as a measure of similarity. Hierarchical agglomerative cluster analysis was carried out on the normalised data by means of the complete linkage (furthest neighbour), average linkage (between and within groups) and Ward's methods, using squared Euclidean distances as a measure of similarity (Massart and Kaufman, 1983; Willet, 1987).
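To show what one of these linkage rules does, here is a deliberately naive sketch of agglomerative clustering with complete (furthest-neighbour) linkage on squared Euclidean distances. It illustrates the technique only; average linkage and Ward's method differ in how the distance between merged clusters is defined, and in practice a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` (methods "complete", "average", "ward") would normally be used. The toy points are invented.

```python
import numpy as np


def complete_linkage(Z: np.ndarray):
    """Naive agglomerative clustering with complete (furthest-neighbour) linkage.

    Z: autoscaled data, one row per sample. Dissimilarity is squared Euclidean
    distance. Returns the merges as (cluster_a, cluster_b, distance), where
    clusters are frozensets of sample indices.
    """
    Z = np.asarray(Z, dtype=float)
    diff = Z[:, None, :] - Z[None, :, :]
    D = (diff ** 2).sum(axis=-1)  # squared Euclidean distances
    clusters = [frozenset([i]) for i in range(len(Z))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # complete linkage: distance between the two furthest members
                d = max(D[i, j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], float(d)))
        merged = clusters[a] | clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges


pts = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.0, 5.2]])
merges = complete_linkage(pts)
for a, b, d in merges:
    print(sorted(a), sorted(b), round(d, 3))
```

The list of merges and their distances is exactly the information a dendrogram displays.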
RESULTS AND DISCUSSION
Table 2 summarises the mean value and standard deviation of the 22 measured variables in the river water samples from the three stations. Note the high dispersion of most variables (high standard deviations), which indicates variability in chemical composition between samples and points to temporal variations likely caused by polluting sources and/or climatic factors.
Table 2 also includes the recommended guide levels of these variables and the maximum levels allowed by European Directive 80/778/EEC concerning the quality of water intended for human consumption. It must be emphasised that the average concentrations of some variables, such as chloride, COD, iron, manganese, sodium, ammonia, nitrite, phosphate and sulphate, are higher than those recommended by European legislation; this water resource is therefore not adequate for human consumption or industrial purposes and needs to be purified.
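The comparison with the guide levels can be checked directly against Table 2. The sketch below transcribes the station means and guide levels for the variables with a numeric guide level (nitrite is left out because its guide level is "absence"; its means of 0.13 to 0.35 mg l⁻¹ also exceed the 0.1 mg l⁻¹ maximum). The dictionary and function names are illustrative only.

```python
# Station means and guide levels transcribed from Table 2 (mg/l; COD in mg O2/l).
GUIDE = {"Cl": 25, "COD": 2, "Fe": 0.05, "K": 10, "Mg": 30, "Mn": 0.02,
         "Na": 20, "NH4": 0.05, "NO3": 25, "PO4": 0.3, "SO4": 25}
MEANS = {  # (Cabezón, Puente Mayor, Simancas)
    "Cl": (23.3, 24.3, 28.3), "COD": (3.1, 3.6, 5.0), "Fe": (0.10, 0.12, 0.11),
    "K": (4.8, 5.2, 6.2), "Mg": (14.0, 14.8, 15.4), "Mn": (0.03, 0.03, 0.04),
    "Na": (19.4, 20.2, 25.6), "NH4": (0.63, 0.51, 1.66), "NO3": (11.2, 11.9, 10.4),
    "PO4": (0.84, 0.86, 1.61), "SO4": (105.4, 108.9, 112.7),
}
STATIONS = ("Cabezón", "Puente Mayor", "Simancas")


def exceedances(station_index: int) -> list[str]:
    """Variables whose mean at the given station is above the guide level."""
    return sorted(v for v, m in MEANS.items() if m[station_index] > GUIDE[v])


for i, name in enumerate(STATIONS):
    print(name, exceedances(i))
```

On these figures, chloride exceeds its guide level only at Simancas and sodium only at Puente Mayor and Simancas, consistent with the downstream deterioration discussed later.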
High levels of phosphate may originate from municipal wastewater discharges, since phosphate is an important component of detergents. The nitrate in the river section sampled is suspected to originate from overland runoff from riverine agricultural fields, where irrigated horticultural crops are grown and inorganic fertilisers (usually ammonium nitrate) are frequently used. This practice could also explain the high levels of ammonia, but this pollutant may also originate from the decomposition of nitrogen-containing organic compounds, such as proteins and urea, present in municipal wastewater discharges. In the presence of high levels of organic matter, nitrate can be reduced to some extent to nitrite, which could explain the high concentration of this pollutant in some samples. The high sulphate contents found in the waters of the Pisuerga river are probably a consequence of the composition of the soils irrigated by the river, which are formed mainly of limestone, marl and gypsum.
Exploratory data analysis using box plots
Normal probability plots of the variables, in conjunction with the Anderson-Darling normality test, demonstrated that most variables were not normally distributed. However, when these normality tests were applied to the individual sampling stations, most variables were normally distributed, which points to differences in water composition among the stations.

Table 1. Physico-chemical parameters determined and analytical techniques used
Variable                    Abbreviation  Analytical technique         Units
Biochemical oxygen demand   BOD           potentiometry/O2 probe       mg O2 l⁻¹
Calcium                     Ca            flame AAS                    mg l⁻¹
Chloride                    Cl            ion chromatography           mg l⁻¹
Chemical oxygen demand      COD           redox titrimetry (KMnO4)     mg O2 l⁻¹
Conductivity                COND          conductometry                µmho cm⁻¹
Dissolved solids            DS            drying at 180 °C/weighing    mg l⁻¹
Iron                        Fe            flame AAS                    mg l⁻¹
Flow rate                   FLOW          (*)                          m³ s⁻¹
Hardness                    HARD          EDTA titrimetry              mg CaCO3 l⁻¹
Bicarbonate                 HCO3          acid-base titrimetry         mg l⁻¹
Potassium                   K             flame AES                    mg l⁻¹
Magnesium                   Mg            flame AAS                    mg l⁻¹
Manganese                   Mn            flame AAS                    mg l⁻¹
Sodium                      Na            flame AES                    mg l⁻¹
Ammonium                    NH4           spectrophotometry            mg l⁻¹
Nitrite                     NO2           spectrophotometry            mg l⁻¹
Nitrate                     NO3           spectrophotometry            mg l⁻¹
Dissolved oxygen            OXYG          potentiometry/O2 probe       mg l⁻¹
pH                          pH            potentiometry/pH probe       pH units
Phosphate                   PO4           ion chromatography           mg l⁻¹
Sulphate                    SO4           ion chromatography           mg l⁻¹
Temperature                 TEMP          temperature probe            °C
(*) Data supplied by Confederación Hidrográfica del Duero.
Box plots (also called box-and-whisker plots) of individual variables at the three sampling stations were examined. Figure 1 shows box plots for some meaningful variables related to the quality of river water, such as conductivity (mineralization), COD, dissolved oxygen and ammonium. The line across the box represents the median, whereas the bottom and top of the box show the locations of the first and third quartiles (Q1 and Q3). The whiskers are the lines that extend from the bottom and top of the box to the lowest and highest observations inside the region defined by Q1 - 1.5(Q3 - Q1) and Q3 + 1.5(Q3 - Q1). Individual points with values outside these limits (outliers) are plotted with asterisks.
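The quantities drawn in each box plot can be computed as below. This is a sketch, assuming NumPy's default linear interpolation for the quartiles; statistical packages differ slightly in how they define Q1 and Q3, so the values may not match the software used for Fig. 1 exactly.

```python
import numpy as np


def box_plot_stats(values):
    """Quartiles, whisker ends and outliers following the box-plot rules above.

    Quartiles use numpy's default linear interpolation (an assumption; other
    packages define Q1 and Q3 slightly differently).
    """
    x = np.sort(np.asarray(values, dtype=float))
    q1, median, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = x[(x >= lo_fence) & (x <= hi_fence)]
    return {
        "median": median, "q1": q1, "q3": q3,
        # whiskers reach the most extreme observations inside the fences
        "whisker_low": inside.min(), "whisker_high": inside.max(),
        "outliers": x[(x < lo_fence) | (x > hi_fence)].tolist(),
    }
```

For example, `box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 100])` gives Q1 = 3, median = 5 and Q3 = 7, so the fences are -3 and 13; the upper whisker stops at 8 and the value 100 is reported as an outlier.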
Table 2. Statistical descriptives for the 30 samples analysed (units as in Table 1)
               Cabezón          Puente Mayor          Simancas
Variable      Mean Std.Dev.     Mean Std.Dev.     Mean Std.Dev.     Min.     Max.  Guide level*  Max.*
BOD            2.8      0.8      3.2      0.7      3.7      1.2      1.5      6.5
Ca            77.0      9.6     77.1      7.4     76.5      8.9     58.8     91.2           100
Cl            23.3      7.7     24.3      8.0     28.3      9.9     12.2     46.1            25    200
COD            3.1      1.2      3.6      0.8      5.0      2.0      0.7       10             2      5
COND           589      123      599       98      629      115      402      773           400
DS             398       81      410       67      427       69      273      524                 1500
Fe            0.10     0.05     0.12     0.04     0.11     0.05     0.01     0.19          0.05    0.2
FLOW          45.0     42.6     37.0     20.9     37.5     21.2     14.8    129.2
HARD         250.1     43.6    253.1     32.6    254.4     35.1    179.1    302.9
HCO3         150.4     17.8    142.8     20.9    156.1     23.4     96.1    176.8
K              4.8      1.9      5.2      1.8      6.2      2.2      2.8     10.4            10     12
Mg            14.0      5.3     14.8      4.8     15.4      4.5      6.2     23.8            30     50
Mn            0.03     0.02     0.03     0.02     0.04     0.02     0.01     0.08          0.02   0.05
Na            19.4      9.5     20.2      7.7     25.6     10.0      7.1     40.5            20    150
NH4           0.63     0.62     0.51     0.23     1.66     0.92     0.05     3.61          0.05    0.5
NO2           0.32     0.32     0.13     0.09     0.35     0.30     0.03     1.08       Absence    0.1
NO3           11.2      7.3     11.9      7.0     10.4      8.4      0.3     29.9            25     50
OXYG           8.1      1.8      8.4      1.8      4.9      3.3      0.7     11.4
pH             8.0      0.2      8.1      0.5      7.6      0.3      7.2      8.8       6.5-8.5    9.5
PO4           0.84     0.32     0.86     0.30     1.61     0.63     0.35     2.50           0.3    3.3
SO4          105.4     34.9    108.9     28.7    112.7     28.1       50      150            25    250
TEMP          13.6      5.9     14.5      7.7     14.3      7.3      2.2     24.9            12     25
(*) Recommended guide levels and maximum concentrations allowed by the European Directive 80/778/EEC concerning the quality of water intended for human consumption.
Fig. 1. Box plots for conductivity, COD, dissolved oxygen and ammonium in Cabezón (C), Puente Mayor (P) and Simancas (S).
Box plots provide a visual impression of the location and shape of the underlying distributions. For example, box plots with long whiskers at the top of the box (such as that for ammonium at Simancas) indicate that the underlying distribution is skewed toward high concentrations. Box plots with a large spread indicate seasonal variations of the water composition (see the conductivity box plot). Inspection of these plots also revealed differences among the three stations. For example, dissolved oxygen at Simancas is lower and has a greater spread than at Cabezón and Puente Mayor. At the same time, COD and ammonium are higher at Simancas, pointing to a deterioration of water quality downstream, likely caused by the discharge of municipal wastewater.
Analysis of variance (ANOVA) examines the different effects (usually called sources of variation) operating simultaneously on a response, in order to decide which effects are statistically significant and to estimate their contribution to the variability of the response (Scheffé, 1959; Ross, 1988). Two-way ANOVA of the individual variables showed the existence of seasonal and/or spatial differences. For example, significant seasonal differences were found for conductivity, temperature and flow, whereas for ammonium, phosphate and pH the differences were mainly due to the sampling station. For COD and BOD both sources of variation were significant. Box plots and ANOVA showed similar trends for each variable; however, these are univariate techniques, inadequate for investigating our multivariate data table because the variables are correlated.
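A two-way ANOVA with station and season as factors can be sketched as below. The paper does not state the exact model, so this assumes a main-effects (additive) model without interaction, with the season taken from the month letter. Because every sampling date was sampled at all three stations, each station has the same mix of seasons (a proportional design), which is what makes the simple sum-of-squares decomposition below valid. The synthetic data in the usage note are invented.

```python
import numpy as np


def two_way_anova_additive(y, station, season):
    """Main-effects (no interaction) two-way ANOVA for a proportional design.

    y, station, season: equal-length sequences. The factor sums of squares are
    only additive when every station has the same mix of seasons, as in this
    survey. Returns the F statistics {"station": F, "season": F}.
    """
    y = np.asarray(y, dtype=float)
    station, season = np.asarray(station), np.asarray(season)
    grand = y.mean()
    ss_total = ((y - grand) ** 2).sum()

    def ss_factor(labels):
        # weighted squared deviations of group means from the grand mean
        return sum((labels == g).sum() * (y[labels == g].mean() - grand) ** 2
                   for g in np.unique(labels))

    ss_a, ss_b = ss_factor(station), ss_factor(season)
    df_a, df_b = len(np.unique(station)) - 1, len(np.unique(season)) - 1
    df_e = len(y) - 1 - df_a - df_b
    ms_e = (ss_total - ss_a - ss_b) / df_e  # residual mean square
    return {"station": (ss_a / df_a) / ms_e, "season": (ss_b / df_b) / ms_e}
```

With 3 stations, 4 seasons and 30 samples, the F statistics have (2, 24) and (3, 24) degrees of freedom. A variable with a strong station effect but no seasonal pattern, such as `[5.0 if s == "S" else 0.0 for s in stations]` plus small noise, yields a large station F and a small season F.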
Principal component analysis
The covariance matrix of the 22 analysed variables was calculated from data normalised as described under Data treatment and therefore coincides with the correlation matrix (Table 3). Because the three sampling stations were combined to calculate the correlation matrix, the correlation coefficients should be interpreted with caution, as they are affected simultaneously by spatial and temporal variations. Nevertheless, some clear hydrochemical relationships can be readily inferred. High positive correlations are observed between bicarbonate, sulphate, chloride, calcium, magnesium, potassium, sodium, dissolved solids, conductivity and hardness (r = 0.572 to 0.977), which are responsible for water mineralization. Flow rate is negatively correlated with most variables, since an increase in flow rate dilutes contaminants. This anti-correlation is highly significant for the "mineral" components (conductivity, hardness, dissolved solids, magnesium and sulphate). BOD and COD are strongly correlated (r = 0.893), and both are also correlated with ammonia, phosphate (closely related to contamination by organic matter) and potassium. As expected, dissolved oxygen is negatively correlated with temperature because the solubility of oxygen in water decreases with increasing temperature; BOD, COD and nitrogen and phosphorus compounds are also anti-correlated with dissolved oxygen, as organic matter is partially oxidized by oxygen, whilst nutrients are responsible for the eutrophication of freshwater, causing a further increase in organic matter concentration and hence in oxygen demand. Iron, nitrate and pH showed no significant correlation with any other variables.

Table 3. Correlation matrix of the 22 physico-chemical parameters determined
         BOD     Ca     Cl    COD   COND     DS     Fe   FLOW   HARD   HCO3      K     Mg     Mn     Na    NH4    NO2    NO3   OXYG     pH    PO4    SO4   TEMP
BOD    1.000
Ca    -0.117  1.000
Cl     0.413  0.758  1.000
COD    0.893 -0.036  0.516  1.000
COND   0.260  0.887  0.916  0.321  1.000
DS     0.316  0.825  0.881  0.334  0.974  1.000
Fe     0.177 -0.270 -0.137  0.065 -0.154 -0.102  1.000
FLOW  -0.164 -0.497 -0.394 -0.108 -0.592 -0.571 -0.048  1.000
HARD   0.229  0.898  0.860  0.240  0.977  0.951 -0.151 -0.659  1.000
HCO3   0.270  0.648  0.712  0.347  0.774  0.762 -0.251 -0.484  0.770  1.000
K      0.679  0.442  0.748  0.649  0.701  0.713 -0.100 -0.356  0.656  0.644  1.000
Mg     0.552  0.579  0.772  0.484  0.849  0.868  0.016 -0.683  0.879  0.725  0.736  1.000
Mn     0.492  0.109  0.434  0.437  0.333  0.311  0.464 -0.431  0.346  0.285  0.423  0.521  1.000
Na     0.238  0.809  0.914  0.350  0.929  0.902 -0.118 -0.419  0.841  0.705  0.697  0.683  0.280  1.000
NH4    0.709  0.110  0.483  0.773  0.378  0.384  0.094 -0.170  0.291  0.485  0.663  0.419  0.359  0.468  1.000
NO2    0.324  0.062  0.190  0.258  0.195  0.233 -0.110 -0.198  0.222  0.381  0.381  0.341  0.329  0.118  0.327  1.000
NO3   -0.010 -0.021 -0.172 -0.114 -0.018  0.072  0.208  0.187 -0.019 -0.211 -0.047 -0.014 -0.314 -0.074  0.021 -0.109  1.000
OXYG  -0.531 -0.009 -0.375 -0.634 -0.282 -0.246 -0.016  0.389 -0.247 -0.435 -0.476 -0.444 -0.613 -0.286 -0.559 -0.555  0.453  1.000
pH    -0.541  0.402 -0.031 -0.544  0.112  0.030 -0.337 -0.132  0.159 -0.048 -0.112 -0.144 -0.292  0.076 -0.477 -0.365 -0.173  0.442  1.000
PO4    0.434  0.209  0.506  0.601  0.409  0.378  0.026 -0.395  0.342  0.532  0.503  0.406  0.590  0.451  0.613  0.453 -0.376 -0.847 -0.374  1.000
SO4    0.130  0.902  0.873  0.209  0.971  0.944 -0.097 -0.594  0.949  0.682  0.572  0.781  0.297  0.900  0.224  0.112  0.014 -0.182  0.160  0.338  1.000
TEMP   0.290 -0.080  0.122  0.278  0.092  0.041  0.022 -0.481  0.150  0.142  0.198  0.359  0.568 -0.025 -0.031  0.359 -0.501 -0.712 -0.070  0.463  0.074  1.000
Bartlett's sphericity test gave a chi-square statistic of 1006.6, far above the critical value of approximately 267.5 for 231 degrees of freedom at the 5% significance level. This confirms that the variables are not orthogonal but correlated, so the data variability can be explained by a smaller number of variables (called principal components).
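Bartlett's sphericity statistic is computed from the determinant of the correlation matrix R: chi2 = -[(n - 1) - (2p + 5)/6] ln|R|, with p(p - 1)/2 degrees of freedom. For p = 22 variables this gives the 231 degrees of freedom quoted above. The sketch below is an illustration in NumPy, not the authors' code; if SciPy is available, the critical value can be obtained with `scipy.stats.chi2.ppf(0.95, 231)`.

```python
import numpy as np


def bartlett_sphericity(R: np.ndarray, n: int):
    """Bartlett's test that a p x p correlation matrix R is an identity matrix.

    chi2 = -[(n - 1) - (2p + 5) / 6] * ln|R|, with p(p - 1)/2 degrees of freedom.
    n is the number of samples. Returns (chi2, degrees_of_freedom).
    """
    R = np.asarray(R, dtype=float)
    p = R.shape[0]
    sign, logdet = np.linalg.slogdet(R)  # log-determinant, numerically stable
    if sign <= 0:
        raise ValueError("R must be positive definite")
    chi2 = -((n - 1) - (2 * p + 5) / 6) * logdet
    return chi2, p * (p - 1) // 2
```

With n = 30 and p = 22 the multiplier is 29 - 49/6 ≈ 20.83, so the reported statistic of 1006.6 corresponds to ln|R| ≈ -48.3. For uncorrelated variables (R equal to the identity) the statistic is 0.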
Principal components were extracted by the R-mode principal component method, which mathematically transforms the original data with no assumptions about the form of the covariance matrix. This analysis allows a clustering of variables on the basis of mutual correlations, and a grouping of objects based on their similarities. For this analysis, the covariance matrix was diagonalised and the characteristic roots (eigenvalues) were obtained. The transformed variables, or principal components (PCs), were obtained as weighted linear combinations of the original variables.
The Scree plot (Fig. 2) was used to identify the number of PCs to be retained in order to comprehend the underlying data structure (Jackson, 1991). The Scree plot shows a pronounced change of slope after the third eigenvalue; Cattell and Jaspers (1967) suggested using all the PCs up to and including the first one after the break, so four PCs were retained. These PCs have eigenvalues greater than unity and explain 81.5% of the variance, or information, contained in the original data set. Projections of the original variables on the subspace of the PCs are called loadings and coincide with the correlation coefficients between PCs and variables. Loadings of the four retained PCs are presented in Table 4. PC1 explains 46.1% of the variance and has high loadings for most variables: chloride, bicarbonate, sulphate, conductivity, dissolved solids, hardness, calcium, potassium, magnesium, sodium and, to a lesser extent, BOD, COD, manganese, ammonia and phosphate. These variables were shown to be correlated (see the correlation matrix, Table 3). Flow rate and dissolved oxygen have negative loadings on PC1. PC2 explains 19.0% of the variance and includes calcium, dissolved oxygen and pH (positive loadings), and BOD, COD, nitrite, phosphate and manganese (negative loadings). PC3 (9.8% of the variance) has a positive loading for nitrate and a negative one for temperature. Finally, PC4 explains 6.6% of the total variability of the original data and is dominated by iron.
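Because the data were autoscaled, the 22 eigenvalues sum to 22, so each percentage of explained variance is simply the eigenvalue divided by 22. The short check below uses the eigenvalues transcribed from Table 4 and also applies the eigenvalue-greater-than-one (Kaiser) criterion mentioned above. Note that the unrounded cumulative value is about 81.55%; the 81.5% in Table 4 is the sum of the rounded percentages.

```python
# Eigenvalues of the four retained PCs, transcribed from Table 4.
eigenvalues = [10.148, 4.181, 2.154, 1.459]
p = 22  # number of autoscaled variables = sum of all 22 eigenvalues

percent = [100 * ev / p for ev in eigenvalues]
cumulative = [sum(percent[: k + 1]) for k in range(len(percent))]
for ev, pc, cum in zip(eigenvalues, percent, cumulative):
    print(f"{ev:7.3f}  {pc:5.1f}%  cumulative {cum:5.2f}%")

# Kaiser criterion: keep components whose eigenvalue exceeds 1
print(all(ev > 1 for ev in eigenvalues))
```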
As can be seen in Table 4, PC1 has high loadings for most variables, which hinders its hydrochemical interpretation. Likewise, variables related to anthropogenic pollution, such as BOD, COD, phosphorus or nitrogen compounds, load heavily on both PC1 and PC2, so PC2 cannot be explained only in terms of organic pollution. A rotation of the principal components can achieve a simpler and more meaningful representation of the underlying factors by decreasing the contribution of variables with minor significance and increasing that of the more significant ones. Rotation produces a new set of factors, each one involving primarily a subset of the original variables with as little overlap as possible, so that the original variables are divided into groups that are somewhat independent of each other (Sharaf et al., 1986; Massart et al., 1988). Although rotation does not affect the goodness of fit of the principal component solution, it modifies the variance explained by each factor.

Fig. 2. Scree plot of the characteristic roots (eigenvalues) of principal components (r) and varifactors (q).

Table 4. Loadings of 22 experimental variables on four significant principal components for 30 river water samples
Variable                   PC1     PC2     PC3     PC4
BOD                      0.523  -0.635   0.353   0.022
Ca                       0.702   0.656  -0.073  -0.027
Cl                       0.914   0.164   0.101  -0.073
COD                      0.574  -0.618   0.322  -0.154
COND                     0.925   0.365   0.046   0.036
DS                       0.909   0.335   0.139   0.076
Fe                      -0.074  -0.328   0.195   0.826
FLOW                    -0.628  -0.095   0.424  -0.347
HARD                     0.897   0.394  -0.037   0.101
HCO3                     0.821   0.116  -0.050  -0.242
K                        0.828  -0.139   0.215  -0.147
Mg                       0.901   0.020   0.013   0.216
Mn                       0.547  -0.479  -0.253   0.459
Na                       0.864   0.317   0.140  -0.067
NH4                      0.590  -0.468   0.446  -0.199
NO2                      0.388  -0.400  -0.152  -0.218
NO3                     -0.160   0.223   0.710   0.303
OXYG                    -0.576   0.669   0.329   0.136
pH                      -0.120   0.712  -0.401  -0.082
PO4                      0.650  -0.495  -0.205  -0.151
SO4                      0.851   0.458   0.003   0.140
TEMP                     0.306  -0.469  -0.708   0.143
Eigenvalue              10.148   4.181   2.154   1.459
% Variance explained      46.1    19.0     9.8     6.6
% Cum. variance           46.1    65.1    74.9    81.5
A varimax rotation of the principal components led to 22 rotated PCs (henceforth called varifactors), whose eigenvalues are plotted in Fig. 2. The Scree plot shows a pronounced change of slope after the third eigenvalue; therefore, four varifactors explaining 67.8% of the variance were retained (Cattell and Jaspers, 1967). Eigenvalues and loadings of these varifactors are displayed in Table 5. Note that rotation has increased the number of factors needed to explain the same amount of variance of the original data set, so the first two varifactors used for graphical representation explain less variance. However, smaller groups of variables can now be associated with individual rotated factors with a clearer hydrochemical meaning.
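A varimax rotation finds an orthogonal rotation of the loading matrix that maximizes the variance of the squared loadings within each factor, pushing each loading toward either 0 or ±1. The sketch below uses the widely used SVD-based iteration for the orthomax family (gamma = 1 gives varimax). It is an illustration only: the authors' software is not stated, and Kaiser row normalization is not applied here (an assumption). Because the rotation is orthogonal, each variable's communality (the row sum of squared loadings) is unchanged.

```python
import numpy as np


def varimax(loadings: np.ndarray, gamma: float = 1.0,
            max_iter: int = 100, tol: float = 1e-8):
    """Orthogonal varimax rotation of a (variables x factors) loading matrix.

    Returns (rotated_loadings, rotation_matrix). gamma=1 gives varimax.
    """
    L = np.asarray(loadings, dtype=float)
    p, k = L.shape
    R = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        d_old = d
        LR = L @ R
        # gradient of the orthomax criterion with respect to the rotation
        B = L.T @ (LR ** 3 - (gamma / p) * LR @ np.diag((LR ** 2).sum(axis=0)))
        U, s, Vt = np.linalg.svd(B)
        R = U @ Vt  # closest orthogonal matrix to B
        d = s.sum()
        if d_old != 0 and d / d_old < 1 + tol:
            break  # criterion no longer improving
    return L @ R, R
```

As a usage example, loadings with a clean two-group structure that have been rotated away by some angle are brought back to that simple structure (up to column order and sign) by `varimax(L)`.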
Varifactor 1 explains 37.2% of the total variance
and is highly participated by calcium, chloride, con-
ductivity, dissolved solids, hardness, bicarbonate,
magnesium, sodium and sulphate, and can be thus
interpreted as a mineral component of the river
water. This clustering of variables points to a com-
mon origin for these minerals, likely from dissol-
ution of limestone, marl and gypsum soils. Flow
rate contributes negatively to this factor, which can
be explained considering that dilution processes of
dissolved minerals increase with ¯ow. Varifactor 2
contains 16.7% of the variance and includes BOD,
COD and ammonia, whereas pH and oxygen have
a negative contribution to this varifactor. This vari-
factor can be explained taking into account that
high levels of dissolved organic matter consume
large amounts of oxygen; organic matter in urban
wastewater consists mainly of carbohydrates, pro-
teins and lipids which, as the amount of available
dissolved oxygen decreases, undergo anaerobic fer-
mentation processes leading to ammonia and or-
ganic acids. Hydrolysis of these acidic materials
causes a decrease of water pH values. Potassium
contributes in the same extent to varifactor 1 and 2.
Varifactor 3 (8.0% of variance) has a high positive loading for temperature and a negative loading for dissolved oxygen, since the solubility of gases in water decreases with increasing temperature. Flow rate would be expected to have a high negative loading on varifactor 3, as high temperatures correspond to dry, hot seasons such as summer, when flow rate is lower; however, its loading, although negative, is small (−0.323) because a persistent drought during 1990 caused low flow rates even in winter. Finally, varifactor 4 (5.9% of variance) is dominated by iron and manganese, which are hydrochemically related.

Table 5. Loadings of 22 experimental variables on the first four rotated PCs for 30 river water samples

Variable                Varifactor 1  Varifactor 2  Varifactor 3  Varifactor 4
BOD                            0.116         0.934         0.163         0.111
Ca                             0.920        −0.179        −0.093        −0.119
Cl                             0.893         0.326         0.048        −0.034
COD                            0.180         0.912         0.159         0.011
COND                           0.973         0.148         0.049        −0.038
DS                             0.950         0.183        −0.001         0.001
Fe                            −0.131         0.072         0.012         0.970
FLOW                          −0.496        −0.005        −0.323        −0.094
HARD                           0.952         0.089         0.106        −0.033
HCO3                           0.697         0.184         0.024        −0.139
K                              0.584         0.614         0.089        −0.043
Mg                             0.766         0.359         0.289         0.071
Mn                             0.248         0.290         0.387         0.472
Na                             0.918         0.180        −0.070         0.003
NH4                            0.225         0.761        −0.190         0.065
NO2                            0.105         0.170         0.182        −0.061
NO3                            0.014        −0.003        −0.260         0.104
OXYG                          −0.132        −0.418        −0.540        −0.016
pH                             0.169        −0.434        −0.018        −0.201
PO4                            0.276         0.350         0.244         0.045
SO4                            0.981         0.008         0.059         0.022
TEMP                          −0.003         0.114         0.919         0.031
Eigenvalue                     8.175         3.677         1.763         1.292
% Variance explained            37.2          16.7           8.0           5.9
% Cum. variance                 37.2          53.9          61.9          67.8

Fig. 3. Scores of river water samples on the bidimensional plane defined by the first two varifactors. Space reduction from 22 to 2 dimensions (53.9% of the total variance). Samples collected at Cabezón de Pisuerga (.), Puente Mayor (Q) and Simancas (R) in January (E), April (A), July (J) and October (O) from 1990 to 1992.
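As an arithmetic check on the last three rows of Table 5: assuming, as the percentages imply, that the PCA was run on the 22 standardized variables, each variable contributes unit variance, so the total variance is 22 and each percentage is the eigenvalue divided by 22. A minimal Python sketch:

```python
# Eigenvalues from Table 5; with 22 standardized variables the total variance is 22
eigenvalues = [8.175, 3.677, 1.763, 1.292]
n_variables = 22

explained = [100 * ev / n_variables for ev in eigenvalues]
cumulative = [sum(explained[: i + 1]) for i in range(len(explained))]

print([round(x, 1) for x in explained])   # [37.2, 16.7, 8.0, 5.9]
print([round(x, 1) for x in cumulative])  # [37.2, 53.9, 61.9, 67.8]
```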
Figure 3 displays a plot of sample scores on the bidimensional plane defined by varifactors 1 (mineral contents) and 2 (anthropogenic contamination, namely organic matter). High positive scores on varifactor 1 or 2 indicate high mineral contents or high organic pollution, respectively, whereas samples with high negative scores on varifactor 1 or 2 correspond to higher flow rate or dissolved oxygen, thus indicating better water quality. From Fig. 3 it can be concluded that sample SJ90 (collected at Simancas in July 1990) shows the worst quality, with high levels of both minerals and organics. Samples collected in January and April 1991 are projected onto negative values of varifactor 1 and therefore show the lowest mineral contents. As pointed out above, the winter of 1990 was extremely dry, and this is reflected by the high scores on varifactor 2 of samples collected in April and July 1990.
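The scores plotted in Fig. 3 are projections of the autoscaled samples onto the rotated components. The paper does not state which software or scoring method was used, so the following is only a minimal numpy sketch under stated assumptions: PCA on the correlation matrix, Kaiser's varimax rotation, and regression-type scores scaled to unit variance (for PCA these reduce to Z V Λ^(-1/2) R, with R the varimax rotation). The helper names `varimax` and `pca_varifactors` are hypothetical. Unit-variance scores would give a total sum of squares of n − 1 = 29 per varifactor over 30 samples, which is consistent with the totals reported in Table 6.

```python
import numpy as np


def varimax(loadings, max_iter=100, tol=1e-6):
    """Kaiser's varimax rotation (gamma = 1) of a variables x factors loading matrix."""
    p, k = loadings.shape
    rotation = np.eye(k)
    objective = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Gradient of the varimax criterion; its SVD gives the closest orthogonal update
        target = rotated ** 3 - rotated @ np.diag((rotated ** 2).sum(axis=0)) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        previous, objective = objective, s.sum()
        if previous != 0 and objective < previous * (1 + tol):
            break
    return loadings @ rotation, rotation


def pca_varifactors(data, n_factors):
    """data: samples x variables. Returns rotated loadings, unit-variance scores, % variance."""
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)   # autoscaling
    corr = np.corrcoef(z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)                     # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_factors]               # keep the largest ones
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs * np.sqrt(eigvals)                       # variable-component correlations
    rotated, rotation = varimax(loadings)
    scores = (z @ eigvecs / np.sqrt(eigvals)) @ rotation        # standardized, then rotated
    explained = 100 * eigvals / corr.shape[0]
    return rotated, scores, explained


# Usage with synthetic data (30 samples x 6 variables); not the Pisuerga data set
rng = np.random.default_rng(0)
base = rng.normal(size=(30, 2))
data = np.hstack([base[:, [0]] + 0.3 * rng.normal(size=(30, 3)),
                  base[:, [1]] + 0.3 * rng.normal(size=(30, 3))])
rotated, scores, explained = pca_varifactors(data, n_factors=2)
print(np.round(rotated, 2))
print(np.round(scores.T @ scores, 6))   # approximately 29 * identity
```

Because the rotation is orthogonal, it redistributes variance between the retained components without changing each variable's communality (the row sums of squared loadings).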
Box plots of varifactors 1, 2 and 3 at the three sampling stations are shown in Fig. 4. Some important conclusions can be derived from these plots: varifactor 1 (mineral contents) and varifactor 3 (temperature) show a large spread around the median, pointing to an important contribution of sampling time to the variance of these varifactors. On the other hand, varifactor 2 (anthropogenic pollution) exhibits a small spread, but its median increases slightly from Cabezón to Simancas, indicating that sampling station is the most important source of variation for this varifactor, which is scarcely affected by sampling time.
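The reading of Fig. 4 rests on two statistics per station: the median (location) and the spread, summarized here by the interquartile range. A minimal Python sketch, assuming scores are grouped by station in a plain dictionary; `box_summary` is a hypothetical helper, the example values are invented, and the quartile rule is an assumption, since box-plot programs differ in how they compute quartiles.

```python
from statistics import quantiles


def box_summary(scores_by_station):
    """Five-number summary and interquartile range of varifactor scores per sampling station."""
    summary = {}
    for station, scores in scores_by_station.items():
        # 'inclusive' interpolates between order statistics; other software may use other rules
        q1, median, q3 = quantiles(scores, n=4, method="inclusive")
        summary[station] = {
            "min": min(scores), "q1": q1, "median": median,
            "q3": q3, "max": max(scores), "iqr": q3 - q1,
        }
    return summary


# Hypothetical scores, for illustration only
example = {"C": [-1.0, -0.5, 0.0, 0.5, 1.0], "S": [0.2, 0.4, 0.6, 0.8, 1.0]}
for station, stats in box_summary(example).items():
    print(station, stats["median"], stats["iqr"])
```

A large interquartile range within a station points to variation over time, whereas a shift of the median between stations points to a spatial effect.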
Fig. 4. Box plots for the three most significant varifactors in Cabezón (C), Puente Mayor (P) and Simancas (S).

Two-way ANOVA was carried out on the three most relevant varifactors, and the results of the F-test are displayed in Table 6. Normal probability plots of the varifactors at the individual sampling stations showed that the varifactors were normally distributed, except varifactor 2 at Simancas. However, the F-test as applied in ANOVA is not very sensitive to departures from normality (Miller and Miller, 1984) and was therefore used to interpret the sources of variation.

Table 6. Two-way ANOVA and F-test of the three most relevant rotated PCs

Source of variation        SS  d.f.  Variance       F  Pooled SS'  % Contribution
Varifactor 1
  Sampling time        18.399     9     2.044   3.521      13.629            47.0
  Sampling station      0.150     2     0.075   0.129
  Residual             10.451    18     0.581              15.371            53.0
  Total                29.000    29                        29.000           100.0
Varifactor 2
  Sampling time        11.809     9     1.312   1.941
  Sampling station      5.026     2     2.513   3.718       3.250            11.2
  Residual             12.165    18     0.676              25.750            88.8
  Total                29.000    29                        29.000           100.0
Varifactor 3
  Sampling time        25.306     9     2.812  15.428      23.643            81.5
  Sampling station      0.414     2     0.207   1.135
  Residual              3.281    18     0.182               5.357            18.5
  Total                29.000    29                        29.000           100.0

SS = sum of squares; d.f. = degrees of freedom; SS' = pooled sum of squares.
F calculated as variance of the effect/variance of the residual.
Fcrit is 2.456 for 9 and 18 degrees of freedom and 3.555 for 2 and 18 d.f. (p = 0.05).
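With one sample per combination of sampling time (10 levels) and sampling station (3 levels), the design in Table 6 is a two-way ANOVA without replication, which is why the residual has (10 − 1)(3 − 1) = 18 degrees of freedom. A minimal pure-Python sketch of the decomposition and the F ratios; `two_way_anova` is a hypothetical helper, and `y[i][j]` is assumed to hold the varifactor score of the sample taken at time i and station j.

```python
def two_way_anova(y):
    """Two-way ANOVA without replication; rows = sampling times, columns = stations."""
    a, b = len(y), len(y[0])
    grand = sum(map(sum, y)) / (a * b)
    row_means = [sum(row) / b for row in y]
    col_means = [sum(row[j] for row in y) / a for j in range(b)]

    ss_total = sum((v - grand) ** 2 for row in y for v in row)
    ss_time = b * sum((m - grand) ** 2 for m in row_means)
    ss_station = a * sum((m - grand) ** 2 for m in col_means)
    ss_res = ss_total - ss_time - ss_station      # interaction is confounded with error

    df_time, df_station = a - 1, b - 1
    df_res = df_time * df_station
    var_res = ss_res / df_res
    return {
        "time": (ss_time, df_time, ss_time / df_time / var_res),
        "station": (ss_station, df_station, ss_station / df_station / var_res),
        "residual": (ss_res, df_res, None),
        "total": (ss_total, a * b - 1, None),
    }


# Toy 2 x 3 table (hypothetical values) to show the bookkeeping
table = two_way_anova([[1, 3, 2], [4, 5, 9]])
for source, (ss, df, f) in table.items():
    print(source, round(ss, 3), df, None if f is None else round(f, 3))
```

Critical values such as Fcrit(9, 18) = 2.456 at p = 0.05 can be read from F tables or obtained with, for example, `scipy.stats.f.ppf(0.95, 9, 18)`. Because there is a single observation per cell, the time x station interaction cannot be separated from the residual, which is relevant to the interaction discussed for varifactor 2.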
Sources of variation that can affect sample projections on the varifactors are sampling time (seasonal effect) and sampling station (geographical or polluting effect). A comparison of the variance estimates by means of the Fisher ratio (F) indicates that, at the 95% confidence level, variation between sampling times contributes significantly to the total variance of varifactor 1 (F > Fcrit(9, 18, p = 0.05)), whereas variation between sampling stations does not (F < Fcrit(2, 18, p = 0.05)). Since varifactor 1 can be interpreted as the inorganic (mineral) content of the water, which increases with decreasing flow rate, it can be concluded that mineral levels in the river water investigated are seasonal and climate dependent and unaffected by sampling location, thus pointing to a natural (non-anthropogenic) origin for this polluting form. For varifactor 2 (organic matter, nitrogen and phosphorus), only the differences between sampling stations contribute significantly to the variance. This indicates that organic pollution of the river water originates from anthropogenic sources, mainly municipal wastewater discharged into the river between Puente Mayor and Simancas. Sampling stations were shown not to contribute to the variance of varifactor 3 (temperature), whereas highly significant differences were found between sampling times, showing that only climate and seasonality are responsible for variations in water temperature and that there is no thermal pollution in the river section investigated.
Those sources of variation that were shown not to contribute significantly to the variance of the varifactors (F < Fcritical) were pooled with the residual variance (Ross, 1988), and the contribution of each remaining effect to the variability of the varifactor was estimated from the recalculated (pooled) sum of squares as

\[
\%\,\mathrm{Contribution} = \frac{SS'}{SS_T} \times 100,
\]

where SS' is the pooled sum of squares and SS_T the total sum of squares. The pooled sum of squares of a significant effect is its sum of squares minus its degrees of freedom ν times the residual variance after pooling, V'_e:

\[
SS' = SS - \nu\, V'_e .
\]

For varifactor 1, for example, V'_e = (10.451 + 0.150)/(18 + 2) = 0.530 and SS' = 18.399 − 9 × 0.530 = 13.629, the value listed in Table 6. It can be seen in Table 6 that seasonality contributes 47.0% and 81.5% to the variability of varifactors 1 (mineral composition) and 3 (temperature), respectively, evidencing the strong effect that climate has on the variables explained by these varifactors. In contrast, sampling location makes a negligible contribution to varifactors 1 and 3, but contributes 11.2% to the variability of varifactor 2 (anthropogenic pollution). This contribution is smaller than that of the residual (88.8%), indicating a possible interaction between both sources of variation: although the effect of sampling time (season) is not significant, it cannot be completely discarded (F < Fcritical but F > 1), since climate also makes a small contribution to varifactor 2 through seasonal variations of flow rate, which dilute pollutants of anthropogenic origin.
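The pooling step can be checked against Table 6. In the sketch below, `pooled_contributions` is a hypothetical helper: it pools each effect whose F falls below its critical value into the residual, subtracts from each remaining effect its degrees of freedom times the pooled residual variance, and expresses the results as percentages of the total sum of squares. With the varifactor 1 sums of squares it gives approximately 13.629 (47.0%) for sampling time and 15.371 (53.0%) for the residual.

```python
def pooled_contributions(effects, ss_res, df_res, f_crit):
    """effects: {name: (SS, df)}; f_crit: {name: critical F at the chosen level}."""
    var_res = ss_res / df_res
    ss_total = ss_res + sum(ss for ss, _ in effects.values())

    # Pool non-significant effects (F < Fcrit) into the residual
    pooled_ss, pooled_df, kept = ss_res, df_res, {}
    for name, (ss, df) in effects.items():
        if (ss / df) / var_res < f_crit[name]:
            pooled_ss += ss
            pooled_df += df
        else:
            kept[name] = (ss, df)
    var_pooled = pooled_ss / pooled_df

    # Pooled sum of squares SS' = SS - df * V'_e; % contribution = SS' / SS_T * 100
    result = {name: ss - df * var_pooled for name, (ss, df) in kept.items()}
    result["residual"] = ss_total - sum(result.values())
    return {name: (ss, 100 * ss / ss_total) for name, ss in result.items()}


f_crit = {"time": 2.456, "station": 3.555}          # Table 6 footnote, p = 0.05
vf1 = pooled_contributions({"time": (18.399, 9), "station": (0.150, 2)}, 10.451, 18, f_crit)
for name, (ss, pct) in vf1.items():
    print(name, round(ss, 3), round(pct, 1))
```

Running the same function on the varifactor 2 and 3 rows of Table 6 reproduces their pooled sums of squares to within the rounding of the published values.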
Spatio-temporal variations of water quality can be readily visualised in Fig. 5, where varifactors 1, 2 and 3 have been plotted against sampling time for the three stations investigated: Cabezón, Puente Mayor and Simancas. The average flow rate for the three stations has been plotted simultaneously to show the relationship between water quality and flow rate. Again, the inverse relationship between flow rate and rotated factors 1 and 3 (mineral components in water and temperature, respectively) can be observed, whilst for varifactor 2 (organic pollution and nutrients) this negative correlation is less marked. The interaction between sampling location and sampling time is illustrated in Fig. 5: the maximum variability of varifactor 2 along the sampled river section occurs in dry seasons (July and October), when river flow rate decreases. This can be interpreted by taking into account that municipal wastewater discharges into the Pisuerga river are the main and nearly constant source of organic matter, so that an increase in river flow rate dilutes the pollutants and hence makes the differences between sampling stations less evident. Figure 5 also shows that sample scores on varifactor 2 are always higher for samples collected at Simancas, whilst Cabezón and Puente Mayor scores are similar, indicating that the main discharges of organic matter and nutrients are located between Puente Mayor and Simancas, which confirms municipal wastewater as the principal source of organic pollutants for the Pisuerga river. These conclusions are in good agreement with the spatio-temporal profile exhibited by the complexing capacity of the Pisuerga river water (Pardo et al., 1994). Furthermore, differences in sample scores between Simancas and the other two sampling stations were larger in dry seasons (July and October), thus confirming the spatio-temporal interaction on varifactor 2.

Fig. 5. Spatio-temporal fluctuations of varifactors 1, 2 and 3 and their relationship with river flow rate (solid line). Sampling stations: Cabezón de Pisuerga (), Puente Mayor (q) and Simancas (r).
Temporal variation of some of the original variables associated with contamination of the river water is depicted in Fig. 6. Conductivity behaves in the same way as varifactor 1 (compare with Fig. 5), since this variable is closely related to the mineral composition of river water and therefore to varifactor 1. COD and ammonia are associated with organic pollution, and their profiles are therefore similar to that of varifactor 2. As can be seen in Fig. 6, the greatest variation of these contaminants occurs at Simancas, as large amounts of municipal wastewater are discharged upstream of this station. Dissolved oxygen also shows a periodic profile related to seasonality, with strong decreases at Simancas caused by the high levels of oxygen-consuming organic matter.

Fig. 6. Temporal variations of some original variables associated with river water pollution and their relation with flow rate (solid line). Sampling stations: Cabezón de Pisuerga (), Puente Mayor (q) and Simancas (r).
Cluster analysis
Cluster analysis allows river water samples to be grouped on the basis of similarities in their chemical composition. Unlike PCA, which normally uses only two or three PCs for display purposes, cluster analysis uses all the variance or information contained in the original data set. Hierarchical agglomerative clustering by Ward's method was selected for sample classification because it has a small space-distorting effect, uses more information on cluster contents than other methods, and has proved to be an extremely powerful grouping mechanism (Willet, 1987); in addition, Ward's method yielded the most meaningful clusters. The method was applied to normalised data using squared Euclidean distances as a measure of similarity (Massart and Kaufman, 1983). A similar classification pattern was obtained by the average linkage (between groups) method.
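Ward's criterion merges, at each step, the pair of clusters whose fusion least increases the within-cluster sum of squares. A minimal pure-Python sketch, assuming autoscaled data and squared Euclidean distances as described above; it uses the Lance-Williams update formula for Ward's method, and `autoscale`, `ward_linkage` and the returned merge list are hypothetical names. The merge heights are not rescaled, so they are not directly comparable with the rescaled distances quoted for Fig. 7.

```python
from statistics import mean, stdev


def autoscale(samples):
    """Normalise each variable to zero mean and unit standard deviation."""
    columns = list(zip(*samples))
    stats = [(mean(c), stdev(c)) for c in columns]
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in samples]


def sq_euclidean(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def ward_linkage(points):
    """Return merges as (sorted member indices, merge height), in merge order."""
    size = {i: 1 for i in range(len(points))}           # active cluster id -> size
    members = {i: [i] for i in size}
    dist = {(i, j): sq_euclidean(points[i], points[j])
            for i in size for j in size if i < j}
    merges, next_id = [], len(points)
    while len(size) > 1:
        i, j = min(dist, key=dist.get)                   # closest pair of clusters
        d_ij = dist.pop((i, j))
        n_i, n_j = size.pop(i), size.pop(j)
        for k, n_k in size.items():
            d_ki = dist.pop((min(i, k), max(i, k)))
            d_kj = dist.pop((min(j, k), max(j, k)))
            # Lance-Williams update for Ward's method on squared Euclidean distances
            dist[(k, next_id)] = ((n_i + n_k) * d_ki + (n_j + n_k) * d_kj
                                  - n_k * d_ij) / (n_i + n_j + n_k)
        size[next_id] = n_i + n_j
        members[next_id] = sorted(members.pop(i) + members.pop(j))
        merges.append((members[next_id], d_ij))
        next_id += 1
    return merges


samples = [[1.0, 2.0], [1.2, 2.1], [5.0, 8.0], [5.3, 8.2]]   # toy data, not from the paper
for group, height in ward_linkage(autoscale(samples)):
    print(group, round(height, 3))
```

The merge list carries the same information as a dendrogram: each entry records which samples are joined and at what height.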
The dendrogram of samples obtained by Ward's method is shown in Fig. 7. Two well-differentiated clusters can be seen, each formed by two subgroups, with river water quality decreasing from top to bottom. The first group from the top is formed by samples collected in January and April 1991 and one sample collected at Cabezón in April 1992. In the PCA classification these samples scored high and negative on varifactor 1 and close to 0 on varifactor 2 (see Fig. 3), indicating the lowest levels of both minerals and organic matter, as most of them were collected in January and April 1991, when river flow rate is at its maximum owing to snowmelt at the river's sources. This cluster is linked at a rescaled distance of about 7 to another small but tight group that includes the samples taken in July 1991 and July 1992 (except that from Simancas) and sample PO91. In the PCA analysis these samples were also grouped at intermediate negative values on the varifactor 1 axis. The second main cluster is formed by two subgroups linked at a rescaled distance of 10. The first includes very similar samples collected in January and April 1992, together with samples CO90 and CO91, and corresponds to samples with high positive scores on varifactor 1 and negative scores on varifactor 2 in the PCA analysis (see Fig. 3), pointing to high levels of minerals and low levels of anthropogenic pollutants. The second subgroup includes samples collected in 1990 (April, July and October) and samples collected at Simancas in July and October 1991. These samples correspond to dry seasons and to the most contaminated station (Simancas) and show the worst water quality in terms of both minerals and organic matter.

Fig. 7. Dendrogram based on agglomerative hierarchical clustering (Ward's method) for 30 river water samples collected at Cabezón de Pisuerga (C), Puente Mayor (P) and Simancas (S) in January (E), April (A), July (J) and October (O) from 1990 to 1992.
CONCLUSIONS
Environmental analytical chemistry generates multidimensional data that require multivariate statistics to analyse and interpret the underlying information. Water quality data from a river have been analysed by unsupervised pattern recognition (hierarchical cluster analysis) and display methods (principal component analysis) to extract correlations and similarities between variables and to classify river water samples into groups of similar quality. PCA found a reduced number of ``latent'' variables (principal components) that explain most of the variance of the experimental data set. A varimax rotation of these PCs led to a reduced number of varifactors, each related to a small group of experimental variables with a hydrochemical meaning: mineral contents for varifactor 1, anthropogenic pollutants for varifactor 2 and water temperature for varifactor 3.

PCA in combination with ANOVA has allowed the identification and assessment of spatial (pollution of anthropogenic origin) and temporal (seasonal and climatic) sources of variation affecting the quality and hydrochemistry of river water. Man-made pollution was shown to originate from municipal wastewater discharged into the river between the sampling stations of Puente Mayor and Simancas; temporal effects were associated with seasonal variations of river flow rate, which dilute pollutants and hence cause variations in water quality. The application of PCA and cluster analysis has achieved a meaningful classification of hydrochemical variables and of river water samples based on seasonal and spatial criteria. Both multivariate techniques led to very similar classification patterns.
Acknowledgement. The authors wish to thank the Confederación Hidrográfica del Duero (Valladolid, Spain) for providing river flow rate data.
REFERENCES
Andrade J. M., Prada D., Muniategui S., González E. and Alonso E. (1992) Multivariate analysis of environmental data for two hydrographic basins. Anal. Lett. 25, 379-399.
AOAC (1990) Official Methods of Analysis, Vol. 1, 15th edn., Association of Official Analytical Chemists, Arlington, VA, U.S.A., p. 312.
APHA-AWWA-WPCF (1985) Standard Methods for the Examination of Water and Wastewater, 16th edn., American Public Health Association, American Water Works Association, Water Pollution Control Federation, U.S.A.
Aruga R., Negro G. and Ostacoli G. (1993) Multivariate data analysis applied to the investigation of river pollution. Fresenius J. Anal. Chem. 346, 968-975.
Bartels J. H. M., Janse T. A. H. M. and Pijpers F. W. (1985) Classification of the quality of surface waters by means of pattern recognition. Anal. Chim. Acta 177, 35-45.
Battegazzore M. and Renoldi M. (1995) Integrated chemical and biological evaluation of the quality of the river Lambro (Italy). Wat. Air Soil Poll. 83, 375-390.
Brown S. D., Skogerboe R. K. and Kowalski B. R. (1980) Pattern recognition assessment of water quality data: coal strip mine drainage. Chemosphere 9, 265-276.
Brown S. D., Blank T. B., Sum S. T. and Weyer L. G. (1994) Chemometrics. Anal. Chem. 66, 315R-359R.
Brown S. D., Sum S. T. and Despagne F. (1996) Chemometrics. Anal. Chem. 68, 21R-61R.
Cattell R. B. and Jaspers J. (1967) A general plasmode (No. 30-10-5-2) for factor analytic exercises and research. Mult. Behav. Res. Monogr. 67, 1-212.
Dixon W. and Chiswell B. (1996) Review of aquatic monitoring program design. Wat. Res. 30, 1935-1948.
Elosegui A. and Pozo J. (1994) Spatial vs temporal variability in the physical and chemical characteristics of the Aguera stream (Northern Spain). Acta Ecologica - Int. J. Ecol. 15, 543-559.
Grimalt J. O., Olive J. and Gómez-Belinchón J. I. (1990) Assessment of organic source contributions in coastal waters by principal component and factor analysis of the dissolved and particulate hydrocarbon and fatty acid contents. Int. J. Environ. Anal. Chem. 38, 305-320.
Jackson J. E. (1991) A User's Guide to Principal Components. Wiley, New York.
Librando V. (1991) Chemometric evaluation of surface water quality at regional level. Fresenius J. Anal. Chem. 339, 613-619.
Massart D. L. and Kaufman L. (1983) The Interpretation of Analytical Chemical Data by the Use of Cluster Analysis. Wiley, New York.
Massart D. L., Vandeginste B. G. M., Deming S. N., Michotte Y. and Kaufman L. (1988) Chemometrics: A Textbook. Elsevier, Amsterdam.
Meglen R. R. (1992) Examining large databases: a chemometric approach using principal component analysis. Mar. Chem. 39, 217-237.
Mellinger M. (1987) Multivariate data analysis: its methods. Chemometr. Intell. Lab. Systems 2, 29-36.
Miller J. C. and Miller J. N. (1984) Statistics for Analytical Chemistry. Ellis Horwood Series in Analytical Chemistry, Wiley, New York.
Pardo R., Barrado E., Vega M., Deban L. and Tascón M. L. (1994) Voltammetric complexation capacity of waters from the Pisuerga river. Wat. Res. 28, 2139-2146.
Ross P. J. (1988) Taguchi Techniques for Quality Engineering. McGraw-Hill, New York.
Scheffé H. (1959) The Analysis of Variance. Wiley, New York.
Sharaf M. A., Illman D. L. and Kowalski B. R. (1986) Chemometrics. Wiley, New York.
Stroomberg G. J., Freriks I. L., Smedes F. and Cofino W. P. (1995) In Quality Assurance in Environmental Monitoring, ed. P. Quevauviller. VCH, Weinheim.
Voutsa D., Zachariadis G., Samara C. and Kouimtzis T. (1995) Evaluation of chemical parameters in Aliakmon river in Northern Greece. 2. Dissolved and particulate heavy metals. J. Environ. Sci. Hlth. Part A: Environ. Sci. Engng 30, 1-13.
Ward A. D. and Elliot W. J. (1995) In Environmental Hydrology, ed. A. D. Ward and W. J. Elliot, pp. 1. CRC Press, Boca Raton.
Wenning R. J. and Erickson G. A. (1994) Interpretation and analysis of complex environmental data using chemometric methods. Trends Anal. Chem. 13, 446-457.
Willet P. (1987) Similarity and Clustering in Chemical Information Systems. Research Studies Press, Wiley, New York.
A Review of Problem Solving Capabilities in Lean Process Management.pdfA Review of Problem Solving Capabilities in Lean Process Management.pdf
A Review of Problem Solving Capabilities in Lean Process Management.pdf
Karen Benoit
 
Art Archaeology the Ineligible project (2020) - extended book chapter.pdf
Art Archaeology  the Ineligible project (2020) - extended book chapter.pdfArt Archaeology  the Ineligible project (2020) - extended book chapter.pdf
Art Archaeology the Ineligible project (2020) - extended book chapter.pdf
Karen Benoit
 

More from Karen Benoit (20)

Writing A Sociology Essay Navigating The Societal La
Writing A Sociology Essay Navigating The Societal LaWriting A Sociology Essay Navigating The Societal La
Writing A Sociology Essay Navigating The Societal La
 
Citations Examples For Research Paper. Online assignment writing service.
Citations Examples For Research Paper. Online assignment writing service.Citations Examples For Research Paper. Online assignment writing service.
Citations Examples For Research Paper. Online assignment writing service.
 
5Th Grade 5 Paragraph Essay Samples - Structurin
5Th Grade 5 Paragraph Essay Samples - Structurin5Th Grade 5 Paragraph Essay Samples - Structurin
5Th Grade 5 Paragraph Essay Samples - Structurin
 
In An Essay Films Are Underlined - Persepolisthesis.Web.Fc2.Com
In An Essay Films Are Underlined - Persepolisthesis.Web.Fc2.ComIn An Essay Films Are Underlined - Persepolisthesis.Web.Fc2.Com
In An Essay Films Are Underlined - Persepolisthesis.Web.Fc2.Com
 
Essay On Newspaper PDF Newspapers Public Opinion
Essay On Newspaper  PDF  Newspapers  Public OpinionEssay On Newspaper  PDF  Newspapers  Public Opinion
Essay On Newspaper PDF Newspapers Public Opinion
 
Printable Writing Paper Ramar Pinterest Ramar
Printable Writing Paper  Ramar  Pinterest  RamarPrintable Writing Paper  Ramar  Pinterest  Ramar
Printable Writing Paper Ramar Pinterest Ramar
 
How To Write A Good Expository Essay -. Online assignment writing service.
How To Write A Good Expository Essay -. Online assignment writing service.How To Write A Good Expository Essay -. Online assignment writing service.
How To Write A Good Expository Essay -. Online assignment writing service.
 
8 Tips That Will Make You Guru In Essay Writing - SCS
8 Tips That Will Make You Guru In Essay Writing - SCS8 Tips That Will Make You Guru In Essay Writing - SCS
8 Tips That Will Make You Guru In Essay Writing - SCS
 
Benefits Of Tertiary Education. What Are The Be
Benefits Of Tertiary Education. What Are The BeBenefits Of Tertiary Education. What Are The Be
Benefits Of Tertiary Education. What Are The Be
 
Essay On Money Money Essay For Students And Children In En
Essay On Money  Money Essay For Students And Children In EnEssay On Money  Money Essay For Students And Children In En
Essay On Money Money Essay For Students And Children In En
 
ALBERT CAMUS ON THE NOTION OF SUICIDE, AND THE VALUE OF.pdf
ALBERT CAMUS ON THE NOTION OF SUICIDE, AND THE VALUE OF.pdfALBERT CAMUS ON THE NOTION OF SUICIDE, AND THE VALUE OF.pdf
ALBERT CAMUS ON THE NOTION OF SUICIDE, AND THE VALUE OF.pdf
 
Automation A Robotic Arm (FYP) Thesis.pdf
Automation  A Robotic Arm (FYP) Thesis.pdfAutomation  A Robotic Arm (FYP) Thesis.pdf
Automation A Robotic Arm (FYP) Thesis.pdf
 
12th Report on Carcinogens.pdf
12th Report on Carcinogens.pdf12th Report on Carcinogens.pdf
12th Report on Carcinogens.pdf
 
11.Bio Inspired Approach as a Problem Solving Technique.pdf
11.Bio Inspired Approach as a Problem Solving Technique.pdf11.Bio Inspired Approach as a Problem Solving Technique.pdf
11.Bio Inspired Approach as a Problem Solving Technique.pdf
 
A Brief Overview Of Ethiopian Film History.pdf
A Brief Overview Of Ethiopian Film History.pdfA Brief Overview Of Ethiopian Film History.pdf
A Brief Overview Of Ethiopian Film History.pdf
 
A Commentary on Education and Sustainable Development Goals.pdf
A Commentary on Education and Sustainable Development Goals.pdfA Commentary on Education and Sustainable Development Goals.pdf
A Commentary on Education and Sustainable Development Goals.pdf
 
A Historical Overview of Writing and Technology.pdf
A Historical Overview of Writing and Technology.pdfA Historical Overview of Writing and Technology.pdf
A Historical Overview of Writing and Technology.pdf
 
A History of Ancient Rome - Mary Beard.pdf
A History of Ancient Rome - Mary Beard.pdfA History of Ancient Rome - Mary Beard.pdf
A History of Ancient Rome - Mary Beard.pdf
 
A Review of Problem Solving Capabilities in Lean Process Management.pdf
A Review of Problem Solving Capabilities in Lean Process Management.pdfA Review of Problem Solving Capabilities in Lean Process Management.pdf
A Review of Problem Solving Capabilities in Lean Process Management.pdf
 
Art Archaeology the Ineligible project (2020) - extended book chapter.pdf
Art Archaeology  the Ineligible project (2020) - extended book chapter.pdfArt Archaeology  the Ineligible project (2020) - extended book chapter.pdf
Art Archaeology the Ineligible project (2020) - extended book chapter.pdf
 

Recently uploaded

How to Manage Your Lost Opportunities in Odoo 17 CRM
How to Manage Your Lost Opportunities in Odoo 17 CRMHow to Manage Your Lost Opportunities in Odoo 17 CRM
How to Manage Your Lost Opportunities in Odoo 17 CRM
Celine George
 
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
National Information Standards Organization (NISO)
 
Cognitive Development Adolescence Psychology
Cognitive Development Adolescence PsychologyCognitive Development Adolescence Psychology
Cognitive Development Adolescence Psychology
paigestewart1632
 
Film vocab for eal 3 students: Australia the movie
Film vocab for eal 3 students: Australia the movieFilm vocab for eal 3 students: Australia the movie
Film vocab for eal 3 students: Australia the movie
Nicholas Montgomery
 
BBR 2024 Summer Sessions Interview Training
BBR  2024 Summer Sessions Interview TrainingBBR  2024 Summer Sessions Interview Training
BBR 2024 Summer Sessions Interview Training
Katrina Pritchard
 
Présentationvvvvvvvvvvvvvvvvvvvvvvvvvvvv2.pptx
Présentationvvvvvvvvvvvvvvvvvvvvvvvvvvvv2.pptxPrésentationvvvvvvvvvvvvvvvvvvvvvvvvvvvv2.pptx
Présentationvvvvvvvvvvvvvvvvvvvvvvvvvvvv2.pptx
siemaillard
 
The History of Stoke Newington Street Names
The History of Stoke Newington Street NamesThe History of Stoke Newington Street Names
The History of Stoke Newington Street Names
History of Stoke Newington
 
How to Add Chatter in the odoo 17 ERP Module
How to Add Chatter in the odoo 17 ERP ModuleHow to Add Chatter in the odoo 17 ERP Module
How to Add Chatter in the odoo 17 ERP Module
Celine George
 
Reimagining Your Library Space: How to Increase the Vibes in Your Library No ...
Reimagining Your Library Space: How to Increase the Vibes in Your Library No ...Reimagining Your Library Space: How to Increase the Vibes in Your Library No ...
Reimagining Your Library Space: How to Increase the Vibes in Your Library No ...
Diana Rendina
 
Wound healing PPT
Wound healing PPTWound healing PPT
Wound healing PPT
Jyoti Chand
 
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
Nguyen Thanh Tu Collection
 
Your Skill Boost Masterclass: Strategies for Effective Upskilling
Your Skill Boost Masterclass: Strategies for Effective UpskillingYour Skill Boost Masterclass: Strategies for Effective Upskilling
Your Skill Boost Masterclass: Strategies for Effective Upskilling
Excellence Foundation for South Sudan
 
How to Make a Field Mandatory in Odoo 17
How to Make a Field Mandatory in Odoo 17How to Make a Field Mandatory in Odoo 17
How to Make a Field Mandatory in Odoo 17
Celine George
 
ANATOMY AND BIOMECHANICS OF HIP JOINT.pdf
ANATOMY AND BIOMECHANICS OF HIP JOINT.pdfANATOMY AND BIOMECHANICS OF HIP JOINT.pdf
ANATOMY AND BIOMECHANICS OF HIP JOINT.pdf
Priyankaranawat4
 
Main Java[All of the Base Concepts}.docx
Main Java[All of the Base Concepts}.docxMain Java[All of the Base Concepts}.docx
Main Java[All of the Base Concepts}.docx
adhitya5119
 
Natural birth techniques - Mrs.Akanksha Trivedi Rama University
Natural birth techniques - Mrs.Akanksha Trivedi Rama UniversityNatural birth techniques - Mrs.Akanksha Trivedi Rama University
Natural birth techniques - Mrs.Akanksha Trivedi Rama University
Akanksha trivedi rama nursing college kanpur.
 
Leveraging Generative AI to Drive Nonprofit Innovation
Leveraging Generative AI to Drive Nonprofit InnovationLeveraging Generative AI to Drive Nonprofit Innovation
Leveraging Generative AI to Drive Nonprofit Innovation
TechSoup
 
How to Create a More Engaging and Human Online Learning Experience
How to Create a More Engaging and Human Online Learning Experience How to Create a More Engaging and Human Online Learning Experience
How to Create a More Engaging and Human Online Learning Experience
Wahiba Chair Training & Consulting
 
MARY JANE WILSON, A “BOA MÃE” .
MARY JANE WILSON, A “BOA MÃE”           .MARY JANE WILSON, A “BOA MÃE”           .
MARY JANE WILSON, A “BOA MÃE” .
Colégio Santa Teresinha
 
PIMS Job Advertisement 2024.pdf Islamabad
PIMS Job Advertisement 2024.pdf IslamabadPIMS Job Advertisement 2024.pdf Islamabad
PIMS Job Advertisement 2024.pdf Islamabad
AyyanKhan40
 

Recently uploaded (20)

How to Manage Your Lost Opportunities in Odoo 17 CRM
How to Manage Your Lost Opportunities in Odoo 17 CRMHow to Manage Your Lost Opportunities in Odoo 17 CRM
How to Manage Your Lost Opportunities in Odoo 17 CRM
 
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
 
Cognitive Development Adolescence Psychology
Cognitive Development Adolescence PsychologyCognitive Development Adolescence Psychology
Cognitive Development Adolescence Psychology
 
Film vocab for eal 3 students: Australia the movie
Film vocab for eal 3 students: Australia the movieFilm vocab for eal 3 students: Australia the movie
Film vocab for eal 3 students: Australia the movie
 
BBR 2024 Summer Sessions Interview Training
BBR  2024 Summer Sessions Interview TrainingBBR  2024 Summer Sessions Interview Training
BBR 2024 Summer Sessions Interview Training
 
Présentationvvvvvvvvvvvvvvvvvvvvvvvvvvvv2.pptx
Présentationvvvvvvvvvvvvvvvvvvvvvvvvvvvv2.pptxPrésentationvvvvvvvvvvvvvvvvvvvvvvvvvvvv2.pptx
Présentationvvvvvvvvvvvvvvvvvvvvvvvvvvvv2.pptx
 
The History of Stoke Newington Street Names
The History of Stoke Newington Street NamesThe History of Stoke Newington Street Names
The History of Stoke Newington Street Names
 
How to Add Chatter in the odoo 17 ERP Module
How to Add Chatter in the odoo 17 ERP ModuleHow to Add Chatter in the odoo 17 ERP Module
How to Add Chatter in the odoo 17 ERP Module
 
Reimagining Your Library Space: How to Increase the Vibes in Your Library No ...
Reimagining Your Library Space: How to Increase the Vibes in Your Library No ...Reimagining Your Library Space: How to Increase the Vibes in Your Library No ...
Reimagining Your Library Space: How to Increase the Vibes in Your Library No ...
 
Wound healing PPT
Wound healing PPTWound healing PPT
Wound healing PPT
 
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
 
Your Skill Boost Masterclass: Strategies for Effective Upskilling
Your Skill Boost Masterclass: Strategies for Effective UpskillingYour Skill Boost Masterclass: Strategies for Effective Upskilling
Your Skill Boost Masterclass: Strategies for Effective Upskilling
 
How to Make a Field Mandatory in Odoo 17
How to Make a Field Mandatory in Odoo 17How to Make a Field Mandatory in Odoo 17
How to Make a Field Mandatory in Odoo 17
 
ANATOMY AND BIOMECHANICS OF HIP JOINT.pdf
ANATOMY AND BIOMECHANICS OF HIP JOINT.pdfANATOMY AND BIOMECHANICS OF HIP JOINT.pdf
ANATOMY AND BIOMECHANICS OF HIP JOINT.pdf
 
Main Java[All of the Base Concepts}.docx
Main Java[All of the Base Concepts}.docxMain Java[All of the Base Concepts}.docx
Main Java[All of the Base Concepts}.docx
 
Natural birth techniques - Mrs.Akanksha Trivedi Rama University
Natural birth techniques - Mrs.Akanksha Trivedi Rama UniversityNatural birth techniques - Mrs.Akanksha Trivedi Rama University
Natural birth techniques - Mrs.Akanksha Trivedi Rama University
 
Leveraging Generative AI to Drive Nonprofit Innovation
Leveraging Generative AI to Drive Nonprofit InnovationLeveraging Generative AI to Drive Nonprofit Innovation
Leveraging Generative AI to Drive Nonprofit Innovation
 
How to Create a More Engaging and Human Online Learning Experience
How to Create a More Engaging and Human Online Learning Experience How to Create a More Engaging and Human Online Learning Experience
How to Create a More Engaging and Human Online Learning Experience
 
MARY JANE WILSON, A “BOA MÃE” .
MARY JANE WILSON, A “BOA MÃE”           .MARY JANE WILSON, A “BOA MÃE”           .
MARY JANE WILSON, A “BOA MÃE” .
 
PIMS Job Advertisement 2024.pdf Islamabad
PIMS Job Advertisement 2024.pdf IslamabadPIMS Job Advertisement 2024.pdf Islamabad
PIMS Job Advertisement 2024.pdf Islamabad
 

Assessment Of Seasonal And Polluting Effects On The Quality Of River Water By Exploratory Data Analysis

INTRODUCTION

River basins generally constitute areas with a high population density owing to favourable living conditions, such as the availability of fertile lands, water for irrigation, industrial or drinking purposes, and efficient means of transportation. Rivers play a major role in assimilating or carrying off industrial and municipal wastewater, manure discharges and runoff from agricultural fields, roadways and streets, which are responsible for river pollution (Stroomberg et al., 1995; Ward and Elliot, 1995). Rivers are also the main water resource in inland areas for drinking, irrigation and industrial purposes. Reliable information on water quality is therefore a prerequisite for effective and efficient water management.

The discharge of industrial and municipal wastewater and manure can be considered a constant polluting source. Surface runoff cannot: it is seasonal and strongly affected by climate. Flow in rivers is a function of many factors, including precipitation, surface runoff, interflow, groundwater flow, and pumped inflow and outflow. Seasonal variations of these factors have a strong effect on flow rates and hence on the concentration of pollutants in the river water.

Long-term surveys and monitoring programs of water quality are an adequate approach to a better knowledge of river hydrochemistry and pollution. However, they produce large data sets that are often difficult to interpret (Dixon and Chiswell, 1996). Most discussions on trend detection focus on analysing a single variable, whereas routine monitoring programs ordinarily measure several variables. Reducing and interpreting multiconstituent chemical and physical measurements can be approached through multivariate statistical methods and exploratory data analysis (Massart et al., 1988; Wenning and Erickson, 1994).
The usefulness of multivariate statistical tools in the treatment of analytical and environmental data is reflected by the increasing number of papers cited in Analytical Chemistry Reviews (Brown et al., 1994, 1996). Cluster analysis and principal component analysis (PCA) have been widely used because they are unbiased methods that can indicate associations between samples and/or variables (Wenning and Erickson, 1994). These associations, based on similar magnitudes or variations in chemical and physical constituents, may indicate the presence of seasonal or man-made influences.

Hierarchical agglomerative cluster analysis indicates groupings of samples by linking inter-sample similarities, and it illustrates the overall similarity of variables in the data set (Massart and Kaufman, 1983). PCA reduces the dimensionality of the data set by explaining the correlation among a large set of variables in terms of a small number of underlying factors, or principal components, without losing much information (Jackson, 1991; Meglen, 1992). It also allows associations between variables to be assessed, since the components indicate the participation of individual chemicals in several influence factors. Exploratory data analysis has been used to evaluate the water quality of rivers, and it has revealed seasonal, spatial and anthropogenic influences (Brown et al., 1980; Bartels et al., 1985; Grimalt et al., 1990; Librando, 1991; Andrade et al., 1992; Aruga et al., 1993; Elosegui and Pozo, 1994; Pardo et al., 1994; Battegazzore and Renoldi, 1995; Voutsa et al., 1995).

In this work, PCA, analysis of variance (ANOVA) and agglomerative hierarchical cluster analysis have been used to investigate the water quality of the Pisuerga river (Duero basin, Spain). The aims were to assess the influence that pollution and seasonality have on the quality of the river water, and to discriminate the individual effects of climate and human activities on the river hydrochemistry.

Wat. Res. Vol. 32, No. 12, pp. 3581–3592, 1998. © 1998 Elsevier Science Ltd. All rights reserved. Printed in Great Britain. 0043-1354/98 $19.00 + 0.00. PII: S0043-1354(98)00138-9. *Author to whom all correspondence should be addressed. [E-mail: solvega@wamba.cpd.uva.es].

METHODS

Sampling stations

The Pisuerga river belongs to the Duero river basin, which is located in the Castilla y León region (centre-north of Spain). The inland geographic situation of the basin, surrounded by mountains, gives rise to an extremely continental climate. Precipitation in the area is scarce, ranging from 313 to 571 mm yr⁻¹ with a mean of 442 mm yr⁻¹. It is highest in November (49.8 mm) and lowest in August (13.2 mm). The annual mean temperature is 12°C, and extreme values of −4°C and 32°C are registered in January and July, respectively.
The river flows from north to south, from the northern mountains across a high tableland, until it joins the Duero river. It is the main drainage stream in that direction. In spring, snowmelt in the northern mountains causes a marked increase in river flow. Along its course, the river passes through limestone, marl, gypsum and sandstone soils, which are the main contributors to the high levels of minerals in the river water. Significant agricultural activity devoted to irrigated crops takes place in riverine areas, where the use of nitrogenous fertilisers is common practice.

At 12 km upstream of its mouth, the river crosses the town of Valladolid, the major industrial centre of the region, with a population of ca. 400 000. Municipal wastewater is discharged directly into the river (estimated volume ca. 57 million m³ yr⁻¹), because the wastewater and sewage treatment plant is still being built. Moreover, although the large industries settled in the area treat their wastewater, small industries are suspected of discharging residues into the river. The combination of a high population density and an extreme continental climate causes river hydrology, and hence river pollution, to be strongly influenced by seasonality.

The investigated river section is located at 41°23'24"N and 04°27'00"W, at an average altitude of 690 m above sea level. It covers a length of 25 km, from Cabezón de Pisuerga, a small village located 13 km upstream of Valladolid, to the village of Simancas, at the mouth of the Pisuerga river, 12 km downstream of Valladolid. Major industrial activity in the area is concentrated north of the city, upstream of the bridge called Puente Mayor. Municipal discharges into the river occur mainly between Puente Mayor and Simancas.
Sampling stations were located at Cabezón de Pisuerga, Puente Mayor and Simancas in an attempt to isolate and identify the polluting sources:

- At Cabezón de Pisuerga the river has not yet received industrial or municipal wastewater. Water quality at this station can therefore be considered to reflect pollution from overland flow and from agricultural and manure discharges.
- Puente Mayor reflects the situation in which industrial wastewater, but no municipal residues, has been discharged.
- At Simancas the river has received all the polluting discharges.

The selected stations were sampled every three months for two and a half years. A total of 10 samples were collected from each station on the following dates (dd/mm/yy): 06/04/90, 03/07/90, 09/10/90, 09/01/91, 03/04/91, 02/07/91, 10/10/91, 09/01/92, 10/04/92 and 06/07/92. Samples are identified throughout by a four-character code XYZZ, where:

- X is the sampling station (C, Cabezón; P, Puente Mayor; S, Simancas);
- Y is the month of sampling (A, April; J, July; O, October; E, January);
- ZZ is the year (90, 91 or 92).

Analytical procedures

Sample containers were 1 l polyethylene bottles fitted with hermetically locking caps. Bottles and caps were cleaned by soaking in 50% HCl for three days and rinsing with deionized water. They were then soaked in 2 M HNO3 for another three days, finally rinsed with deionized water, drained, wrapped in polyethylene bags and stored until required. Samples were collected with a Go-Flo device from the middle of the stream at a depth of 15 cm, from the stone bridges at each sampling station. Prior to sample collection, the sampling device and containers were rinsed twice with the water to be sampled. Temperature, pH, conductivity and dissolved oxygen were measured in situ.
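The four-character XYZZ sample code defined under Sampling stations is easy to generate and parse programmatically. The sketch below is illustrative only and rests on a few assumptions:

- The function names are hypothetical.
- The two-digit year is mapped to 19xx, which holds only for this 1990–1992 survey.
- "E" for January is presumably from the Spanish "enero".

Running it regenerates the 30 codes from the listed sampling dates.

```python
from __future__ import annotations

# Station and month letters as defined in the text. "E" for January is presumably
# from the Spanish "enero". Mapping ZZ to 19ZZ is an assumption valid only here.
STATIONS = {"C": "Cabezón de Pisuerga", "P": "Puente Mayor", "S": "Simancas"}
MONTHS = {"E": 1, "A": 4, "J": 7, "O": 10}
MONTH_LETTER = {number: letter for letter, number in MONTHS.items()}

# Sampling dates exactly as listed in the text (dd/mm/yy).
SAMPLING_DATES = ["06/04/90", "03/07/90", "09/10/90", "09/01/91", "03/04/91",
                  "02/07/91", "10/10/91", "09/01/92", "10/04/92", "06/07/92"]


def encode_sample_code(station: str, ddmmyy: str) -> str:
    """Build the XYZZ code from a station letter and a dd/mm/yy date (hypothetical helper)."""
    _, month, year = ddmmyy.split("/")
    return station + MONTH_LETTER[int(month)] + year


def decode_sample_code(code: str) -> tuple[str, int, int]:
    """Split an XYZZ code into (station name, month number, four-digit year)."""
    if len(code) != 4 or code[0] not in STATIONS or code[1] not in MONTHS:
        raise ValueError(f"unrecognised sample code {code!r}")
    return STATIONS[code[0]], MONTHS[code[1]], 1900 + int(code[2:])


codes = [encode_sample_code(s, d) for s in STATIONS for d in SAMPLING_DATES]
print(len(codes), codes[:3])         # 30 ['CA90', 'CJ90', 'CO90']
print(decode_sample_code("SJ91"))    # ('Simancas', 7, 1991)
```

Keeping the letter tables in dictionaries makes the scheme the single source of truth for both directions. An invalid code fails loudly instead of being silently misread.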
Duplicate samples were taken at each sampling station and immediately filtered under nitrogen pressure through cellulose nitrate filters (pore size 0.45 μm) into acid-washed polyethylene bottles.

- One duplicate was acidified to pH 2 by adding 100 μl of 10 M HCl to each 100 ml of sample. It was used to determine metals, hardness, nitrogen (as ammonia, nitrite and nitrate) and phosphorus (as phosphate).
- The second duplicate was kept at its natural pH. It was used to determine the remaining anions (bicarbonate, chloride and sulphate), conductivity and organic matter (as chemical oxygen demand, COD, and biochemical oxygen demand, BOD).

Samples were immediately transported to the laboratory and stored at 4°C until analysis, which was completed within one week.

Twenty-two physico-chemical parameters have been determined following standard and recommended methods of analysis (APHA-AWWA-WPCF, 1985; AOAC, 1990). Table 1 lists the variables measured, their units, the analytical techniques employed and the abbreviations used henceforth. A total of 660 analyses were carried out (22 variables in 30 samples). Two replicates of each analysis were performed, and mean values were used for calculations.

Data treatment

Exploratory data analysis was performed by a linear display method (principal component analysis) and by an unsupervised pattern recognition technique (hierarchical cluster analysis). Both were applied to experimental data normalized to zero mean and unit variance. This avoids misclassifications arising from the different orders of magnitude, in both numerical value and variance, of the parameters analysed. As the classification methods used here are non-parametric, they make no assumptions about the underlying statistical distribution of the data. Therefore no evaluation of normal (Gaussian) distribution of the data is necessary (Sharaf et al., 1986).

Principal component analysis was applied to the normalized data to assess associations between variables. This method reveals the participation of individual chemicals in several influence factors, which commonly occurs in hydrochemistry. Diagonalization of the correlation matrix transforms the original p correlated variables into p uncorrelated (orthogonal) variables called principal components (PCs). The PCs are weighted linear combinations of the original variables (Mellinger, 1987; Meglen, 1992; Wenning and Erickson, 1994). The characteristic roots (eigenvalues) of the PCs measure their associated variances. The sum of the eigenvalues equals the total number of variables, since each normalized variable contributes a variance of one. The correlations between the PCs and the original variables are given by the loadings, and the individual transformed observations are called scores.

Cluster analysis is an unsupervised pattern recognition technique that uncovers the intrinsic structure or underlying behaviour of a data set without making a priori assumptions about the data. It classifies the objects of the system into categories, or clusters, based on their nearness or similarity. In hierarchical cluster analysis, the distance between samples is used as a measure of similarity. Hierarchical agglomerative cluster analysis was carried out on the normalized data by three methods: complete linkage (furthest neighbour), average linkage (between and within groups) and Ward's method. Squared Euclidean distance was used as the measure of similarity (Massart and Kaufman, 1983; Willet, 1987).
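The normalization and PCA steps described above can be sketched in plain Python. The sketch works as follows:

- Each variable is z-scored.
- The correlation matrix is formed.
- The matrix is diagonalized with the cyclic Jacobi eigenvalue method.
- Loadings are obtained as eigenvector × √eigenvalue, and scores as projections of the standardized samples.

This is a minimal illustration under stated assumptions, not the authors' software. The function names are hypothetical, the data are synthetic, and the varimax rotation mentioned in the abstract is not shown. The printed eigenvalue sum illustrates that the eigenvalues add up to the number of variables.

```python
import math
import random
import statistics


def zscore_columns(data):
    """Scale each column (variable) to zero mean and unit sample variance."""
    cols = list(zip(*data))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.stdev(c) for c in cols]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in data]


def correlation_matrix(z):
    """For z-scored data the correlation matrix is Z^T Z / (n - 1)."""
    n, p = len(z), len(z[0])
    return [[sum(z[k][i] * z[k][j] for k in range(n)) / (n - 1)
             for j in range(p)] for i in range(p)]


def jacobi_eigen(a, tol=1e-20, max_sweeps=100):
    """Eigen-decomposition of a symmetric matrix by cyclic Jacobi rotations.

    Returns (eigenvalues, eigenvectors) sorted by decreasing eigenvalue;
    eigenvectors[k] is the unit eigenvector paired with eigenvalues[k].
    """
    p = len(a)
    a = [row[:] for row in a]
    v = [[float(i == j) for j in range(p)] for i in range(p)]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(p) for j in range(p) if i != j) < tol:
            break
        for i in range(p - 1):
            for j in range(i + 1, p):
                if abs(a[i][j]) < 1e-15:
                    continue
                # Rotation angle that zeroes the (i, j) off-diagonal element.
                theta = (a[j][j] - a[i][i]) / (2.0 * a[i][j])
                t = math.copysign(1.0, theta) / (abs(theta) + math.hypot(theta, 1.0))
                c = 1.0 / math.hypot(t, 1.0)
                s = t * c
                for k in range(p):  # A <- A P (columns i and j)
                    a[k][i], a[k][j] = c * a[k][i] - s * a[k][j], s * a[k][i] + c * a[k][j]
                for k in range(p):  # A <- P^T A (rows i and j)
                    a[i][k], a[j][k] = c * a[i][k] - s * a[j][k], s * a[i][k] + c * a[j][k]
                for k in range(p):  # V <- V P accumulates the eigenvectors
                    v[k][i], v[k][j] = c * v[k][i] - s * v[k][j], s * v[k][i] + c * v[k][j]
    order = sorted(range(p), key=lambda k: a[k][k], reverse=True)
    return [a[k][k] for k in order], [[v[r][k] for r in range(p)] for k in order]


def pca(data):
    """PCA of the correlation matrix: eigenvalues, loadings and scores."""
    z = zscore_columns(data)
    eigvals, eigvecs = jacobi_eigen(correlation_matrix(z))
    # Loading of variable j on PC k (their correlation) = v_kj * sqrt(lambda_k).
    loadings = [[vj * math.sqrt(max(lam, 0.0)) for vj in vec]
                for lam, vec in zip(eigvals, eigvecs)]
    # Scores: projection of each standardized sample onto each eigenvector.
    scores = [[sum(zj * vj for zj, vj in zip(row, vec)) for vec in eigvecs] for row in z]
    return eigvals, loadings, scores


# Synthetic data, NOT from the study: 30 samples x 4 variables; variables 1 and
# 2 share one latent factor, variable 3 follows another, variable 4 is noise.
rng = random.Random(0)
rows = []
for _ in range(30):
    f1, f2 = rng.gauss(0, 1), rng.gauss(0, 1)
    rows.append([f1 + 0.3 * rng.gauss(0, 1), 2 * f1 + 0.3 * rng.gauss(0, 1),
                 f2 + 0.3 * rng.gauss(0, 1), rng.gauss(0, 1)])

eigvals, loadings, scores = pca(rows)
print([round(lam, 3) for lam in eigvals])
print(round(sum(eigvals), 6))  # equals the number of variables, 4
```

Jacobi rotations are chosen here only because they need no third-party library and are numerically robust for small symmetric matrices. In practice any symmetric eigensolver gives the same components, up to the sign of each eigenvector.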
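Of the three linkage rules listed above, complete linkage (furthest neighbour) with squared Euclidean distance is the simplest to write out. The sketch below is a naive version of it:

- Two clusters are merged when the largest distance between their members is the smallest among all cluster pairs.
- The merge history can be cut to give a chosen number of groups.

The function names and the six sample points are hypothetical. The points are assumed to be already normalized, as in the study. Average linkage and Ward's method are not shown.

```python
import itertools


def squared_euclidean(x, y):
    """Squared Euclidean distance between two samples."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def complete_linkage(points):
    """Agglomerative clustering with complete (furthest-neighbour) linkage.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples, with
    clusters as frozensets of sample indices. The exhaustive pair search is
    slow for large n but fine for a few dozen samples.
    """
    n = len(points)
    dist = [[squared_euclidean(points[i], points[j]) for j in range(n)] for i in range(n)]
    clusters = [frozenset([i]) for i in range(n)]
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in itertools.combinations(range(len(clusters)), 2):
            # Complete linkage: distance between clusters = largest member-to-member distance.
            d = max(dist[i][j] for i in clusters[a] for j in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [clusters[a] | clusters[b]]
    return merges


def cut_tree(merges, n, k):
    """Replay the first n - k merges to obtain k clusters (sorted index lists)."""
    clusters = {frozenset([i]) for i in range(n)}
    for a, b, _ in merges[: n - k]:
        clusters -= {a, b}
        clusters.add(a | b)
    return sorted(sorted(c) for c in clusters)


# Hypothetical, already standardized samples (NOT data from the study).
points = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [3.0, 3.1], [3.2, 2.9], [2.9, 3.0]]
merges = complete_linkage(points)
print(cut_tree(merges, len(points), 2))  # [[0, 1, 2], [3, 4, 5]]
```

The merge distances in `merges` are the heights at which a dendrogram would join the branches. For the 30 river samples, the same loop would run on the 22 normalized variables.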
RESULTS AND DISCUSSION

Table 2 briefly summarises the mean value and standard deviation of the 22 measured variables in the river water samples from the three stations. Most variables show high dispersion (high standard deviations). This indicates variability in chemical composition between samples, pointing to temporal variations probably caused by polluting sources and/or climatic factors. Table 2 also includes the recommended guide levels and the maximum levels allowed for these variables by European Directive 80/778/EEC concerning the quality of water intended for human consumption. Average concentrations of some variables, such as chloride, COD, iron, manganese, sodium, ammonia, nitrite, phosphate and sulphate, are higher than those recommended by European legislation. This water resource is therefore not adequate for human consumption or industrial purposes and needs to be purified.

High levels of phosphate may originate from municipal wastewater discharges, since phosphate is an important component of detergents. The nitrate in the sampled river section is suspected to originate from overland runoff from riverine agricultural fields, where irrigated horticultural crops are grown and inorganic fertilisers (usually ammonium nitrate) are frequently used. This practice could also explain the high levels of ammonia. However, this pollutant may also originate from the decomposition of nitrogen-containing organic compounds, such as proteins and urea, present in municipal wastewater discharges. In the presence of high levels of organic matter, nitrate can be reduced to some extent to nitrite, which could explain the high concentration of this pollutant in some samples. The high sulphate contents found in the waters of the Pisuerga river are probably a consequence of the nature of the soils crossed by the river, which consist mainly of limestone, marl and gypsum.
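The comparison with the European guide levels is simple arithmetic on Table 2. The sketch below flags, for each concentration variable, the stations whose mean exceeds the guide level. Its scope and assumptions are as follows:

- The values are copied from Table 2, and the function name is hypothetical.
- Conductivity, temperature and pH are left out. Dissolved solids is also left out, since it has only a maximum level.
- Nitrite's guide level "Absence" is encoded as 0.0, which is an assumption.
- Comparing a mean with a guide level is only a coarse screen, not a compliance assessment.

```python
STATIONS = ("Cabezón", "Puente Mayor", "Simancas")

# variable: ((mean at Cabezón, Puente Mayor, Simancas), guide level), from Table 2.
# Nitrite's guide level "Absence" is encoded as 0.0 (assumption).
TABLE2 = {
    "Ca": ((77.0, 77.1, 76.5), 100),
    "Cl": ((23.3, 24.3, 28.3), 25),
    "COD": ((3.1, 3.6, 5.0), 2),
    "Fe": ((0.10, 0.12, 0.11), 0.05),
    "K": ((4.8, 5.2, 6.2), 10),
    "Mg": ((14.0, 14.8, 15.4), 30),
    "Mn": ((0.03, 0.03, 0.04), 0.02),
    "Na": ((19.4, 20.2, 25.6), 20),
    "NH4": ((0.63, 0.51, 1.66), 0.05),
    "NO2": ((0.32, 0.13, 0.35), 0.0),
    "NO3": ((11.2, 11.9, 10.4), 25),
    "PO4": ((0.84, 0.86, 1.61), 0.3),
    "SO4": ((105.4, 108.9, 112.7), 25),
}


def stations_above_guide(table):
    """Map each variable to the stations whose mean exceeds its guide level."""
    result = {}
    for variable, (means, guide) in table.items():
        above = [name for name, mean in zip(STATIONS, means) if mean > guide]
        if above:
            result[variable] = above
    return result


for variable, names in stations_above_guide(TABLE2).items():
    print(f"{variable:<4} {', '.join(names)}")
```

The flagged variables are chloride, COD, iron, manganese, sodium, ammonium, nitrite, phosphate and sulphate, matching the list in the text. The per-station breakdown adds one detail: the chloride mean exceeds the guide level only at Simancas, and the sodium mean only at Puente Mayor and Simancas.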
Exploratory data analysis using box plots

Normal probability plots of the variables, in conjunction with the Anderson–Darling normality test, demonstrated that most variables were not normally distributed. However, when the same normality tests were applied to the data of each sampling station separately, most variables were normally distributed. This points to the existence of differences in water composition among stations.

Box plots (also called box-and-whisker plots) of individual variables at the three sampling stations were examined. Figure 1 shows box plots for some variables that are meaningful for river water quality: conductivity (mineralization), COD, dissolved oxygen and ammonium. The line across the box represents the median, and the bottom and top of the box show the locations of the first and third quartiles (Q1 and Q3). The whiskers are the lines that extend from the bottom and top of the box to the lowest and highest observations inside the region defined by Q1 − 1.5(Q3 − Q1) and Q3 + 1.5(Q3 − Q1). Individual points with values outside these limits (outliers) are plotted as asterisks.

Table 1. Physico-chemical parameters determined and analytical techniques used

Variable                   Abbr.  Analytical technique              Units
Biochemical oxygen demand  BOD    potentiometry/O2 probe            mg O2 l⁻¹
Calcium                    Ca     flame AAS                         mg l⁻¹
Chloride                   Cl     ion chromatography                mg l⁻¹
Chemical oxygen demand     COD    redox titrimetry (KMnO4)          mg O2 l⁻¹
Conductivity               COND   conductometry                     µmho cm⁻¹
Dissolved solids           DS     drying at 180 °C/weighing         mg l⁻¹
Iron                       Fe     flame AAS                         mg l⁻¹
Flow rate                  FLOW   (*)                               m³ s⁻¹
Hardness                   HARD   EDTA titrimetry                   mg CaCO3 l⁻¹
Bicarbonate                HCO3   acid–base titrimetry              mg l⁻¹
Potassium                  K      flame AES                         mg l⁻¹
Magnesium                  Mg     flame AAS                         mg l⁻¹
Manganese                  Mn     flame AAS                         mg l⁻¹
Sodium                     Na     flame AES                         mg l⁻¹
Ammonium                   NH4    spectrophotometry                 mg l⁻¹
Nitrite                    NO2    spectrophotometry                 mg l⁻¹
Nitrate                    NO3    spectrophotometry                 mg l⁻¹
Dissolved oxygen           OXYG   potentiometry/O2 probe            mg l⁻¹
pH                         pH     potentiometry/pH probe            pH units
Phosphate                  PO4    ion chromatography                mg l⁻¹
Sulphate                   SO4    ion chromatography                mg l⁻¹
Temperature                TEMP   temperature probe                 °C
(*) Data supplied by Confederación Hidrográfica del Duero.

Table 2. Statistical descriptives for the 30 samples analysed
(C = Cabezón, P = Puente Mayor, S = Simancas; SD = standard deviation)

Variable  C mean    C SD  P mean    P SD  S mean    S SD    Min.    Max.     Guide*   Max.*
BOD          2.8     0.8     3.2     0.7     3.7     1.2     1.5     6.5          –       –
Ca          77.0     9.6    77.1     7.4    76.5     8.9    58.8    91.2        100       –
Cl          23.3     7.7    24.3     8.0    28.3     9.9    12.2    46.1         25     200
COD          3.1     1.2     3.6     0.8     5.0     2.0     0.7      10          2       5
COND         589     123     599      98     629     115     402     773        400       –
DS           398      81     410      67     427      69     273     524          –    1500
Fe          0.10    0.05    0.12    0.04    0.11    0.05    0.01    0.19       0.05     0.2
FLOW        45.0    42.6    37.0    20.9    37.5    21.2    14.8   129.2          –       –
HARD       250.1    43.6   253.1    32.6   254.4    35.1   179.1   302.9          –       –
HCO3       150.4    17.8   142.8    20.9   156.1    23.4    96.1   176.8          –       –
K            4.8     1.9     5.2     1.8     6.2     2.2     2.8    10.4         10      12
Mg          14.0     5.3    14.8     4.8    15.4     4.5     6.2    23.8         30      50
Mn          0.03    0.02    0.03    0.02    0.04    0.02    0.01    0.08       0.02    0.05
Na          19.4     9.5    20.2     7.7    25.6    10.0     7.1    40.5         20     150
NH4         0.63    0.62    0.51    0.23    1.66    0.92    0.05    3.61       0.05     0.5
NO2         0.32    0.32    0.13    0.09    0.35    0.30    0.03    1.08    Absence     0.1
NO3         11.2     7.3    11.9     7.0    10.4     8.4     0.3    29.9         25      50
OXYG         8.1     1.8     8.4     1.8     4.9     3.3     0.7    11.4          –       –
pH           8.0     0.2     8.1     0.5     7.6     0.3     7.2     8.8    6.5–8.5     9.5
PO4         0.84    0.32    0.86    0.30    1.61    0.63    0.35    2.50        0.3     3.3
SO4        105.4    34.9   108.9    28.7   112.7    28.1      50     150         25     250
TEMP        13.6     5.9    14.5     7.7    14.3     7.3     2.2    24.9         12      25
(*) Recommended guide levels and maximum concentrations allowed by the European Directive 80/778/EEC concerning the quality of water intended for human consumption.

Fig. 1. Box plots for conductivity, COD, dissolved oxygen and ammonium in Cabezón (C), Puente Mayor (P) and Simancas (S).
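The whisker and outlier rule used for Fig. 1 takes only a few lines to compute. The sketch below uses NumPy and is not the software the authors used. Quartile definitions differ slightly between packages, and here Q1 and Q3 come from `numpy.percentile` with its default linear interpolation. The function name `boxplot_stats` and the example values are hypothetical.

```python
import numpy as np


def boxplot_stats(values):
    """Median, quartiles, whisker ends and outliers (1.5 x IQR rule)."""
    x = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = x[(x >= low_fence) & (x <= high_fence)]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # whiskers stop at the most extreme observations inside the fences
        "whisker_low": inside.min(),
        "whisker_high": inside.max(),
        # points beyond the fences are drawn individually (asterisks in Fig. 1)
        "outliers": np.sort(x[(x < low_fence) | (x > high_fence)]),
    }


stats = boxplot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])  # hypothetical values
print(stats["median"], stats["whisker_high"], stats["outliers"])
```

In this example, the single high value produces a long upper tail. It is flagged as an outlier, and the upper whisker stops at the largest value inside the fence.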
Box plots give a visual impression of the location and shape of the underlying distributions. For example, box plots with long whiskers at the top of the box (such as that for ammonium at Simancas) indicate that the underlying distribution is skewed toward high concentrations. Box plots with a large spread indicate seasonal variations in water composition (see the conductivity box plot). These plots also reveal differences among the three stations. For example, dissolved oxygen at Simancas is lower and has a greater spread than at Cabezón and Puente Mayor. At the same time, COD and ammonium are higher at Simancas. This points to a deterioration of water quality downstream, likely caused by the discharge of municipal wastewater.

Analysis of variance (ANOVA) examines the different effects (usually called sources of variation) operating simultaneously on a response. It decides which effects are statistically significant and estimates their contribution to the variability of the response (Scheffé, 1959; Ross, 1988). Two-way ANOVA of the individual variables, with sampling time and sampling station as factors, showed the existence of seasonal and/or spatial differences. For example, significant seasonal differences were found for conductivity, temperature and flow rate, whereas for ammonium, phosphate and pH the differences were mainly due to the sampling station. For COD and BOD, both sources of variation were significant. Box plots and ANOVA showed similar trends for each variable. However, they are univariate techniques and are inadequate for investigating our multivariate data table, because the variables are correlated.

Principal component analysis

The covariance matrix of the 22 analysed variables was calculated from data normalised as described in Section 2.3. It therefore coincides with the correlation matrix (Table 3).
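A quick numerical check illustrates why the covariance matrix of the normalised data coincides with the correlation matrix. The sketch assumes that the normalisation of Section 2.3 is autoscaling: each column is mean-centred and then divided by its sample standard deviation. The function name `autoscale` and the random data are hypothetical.

```python
import numpy as np


def autoscale(X):
    """Centre each column and divide it by its sample standard deviation."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


rng = np.random.default_rng(0)
# Hypothetical 30 samples x 4 variables on very different scales.
X = rng.normal(size=(30, 4)) * np.array([1.0, 10.0, 100.0, 0.1])
X[:, 1] += 5.0 * X[:, 0]  # introduce a correlation between two columns

Z = autoscale(X)
cov_Z = np.cov(Z, rowvar=False)        # ddof=1 by default
corr_X = np.corrcoef(X, rowvar=False)
print(np.allclose(cov_Z, corr_X))
```

The two matrices are identical only if the standard deviation and the covariance use the same denominator (n − 1 here). Mixing the two conventions would scale every element by n/(n − 1).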
Because the data from the three sampling stations were combined to calculate the correlation matrix, the correlation coefficients should be interpreted with caution: they are affected simultaneously by spatial and temporal variations. Nevertheless, some clear hydrochemical relationships can be readily inferred. High positive correlations (r = 0.572 to 0.977) are observed between bicarbonate, sulphate, chloride, calcium, magnesium, potassium, sodium, dissolved solids, conductivity and hardness, which are responsible for water mineralization. Flow rate is negatively correlated with most variables, since an increase in flow rate dilutes contaminants. This anti-correlation is highly significant for the "mineral" components (conductivity, hardness, dissolved solids, magnesium and sulphate). BOD and COD are strongly correlated with each other (r = 0.893). They are also correlated with ammonia, phosphate (closely related to contamination by organic matter) and potassium.

As expected, dissolved oxygen is negatively correlated with temperature, because the solubility of oxygen in water decreases with increasing temperature. BOD, COD and the nitrogen and phosphorus compounds are also anti-correlated with dissolved oxygen. Organic matter is partially oxidized by dissolved oxygen. In addition, nutrients are responsible for the eutrophication of freshwater, which further increases the organic matter concentration and, hence, the oxygen demand. Iron, nitrate and pH showed no significant correlation with any other variable.

Table 3. Correlation matrix of the 22 physico-chemical parameters determined (lower triangle)

         BOD     Ca     Cl    COD   COND     DS     Fe   FLOW   HARD   HCO3      K     Mg     Mn     Na    NH4    NO2    NO3   OXYG     pH    PO4    SO4   TEMP
BOD    1.000
Ca    -0.117  1.000
Cl     0.413  0.758  1.000
COD    0.893 -0.036  0.516  1.000
COND   0.260  0.887  0.916  0.321  1.000
DS     0.316  0.825  0.881  0.334  0.974  1.000
Fe     0.177 -0.270 -0.137  0.065 -0.154 -0.102  1.000
FLOW  -0.164 -0.497 -0.394 -0.108 -0.592 -0.571 -0.048  1.000
HARD   0.229  0.898  0.860  0.240  0.977  0.951 -0.151 -0.659  1.000
HCO3   0.270  0.648  0.712  0.347  0.774  0.762 -0.251 -0.484  0.770  1.000
K      0.679  0.442  0.748  0.649  0.701  0.713 -0.100 -0.356  0.656  0.644  1.000
Mg     0.552  0.579  0.772  0.484  0.849  0.868  0.016 -0.683  0.879  0.725  0.736  1.000
Mn     0.492  0.109  0.434  0.437  0.333  0.311  0.464 -0.431  0.346  0.285  0.423  0.521  1.000
Na     0.238  0.809  0.914  0.350  0.929  0.902 -0.118 -0.419  0.841  0.705  0.697  0.683  0.280  1.000
NH4    0.709  0.110  0.483  0.773  0.378  0.384  0.094 -0.170  0.291  0.485  0.663  0.419  0.359  0.468  1.000
NO2    0.324  0.062  0.190  0.258  0.195  0.233 -0.110 -0.198  0.222  0.381  0.381  0.341  0.329  0.118  0.327  1.000
NO3   -0.010 -0.021 -0.172 -0.114 -0.018  0.072  0.208  0.187 -0.019 -0.211 -0.047 -0.014 -0.314 -0.074  0.021 -0.109  1.000
OXYG  -0.531 -0.009 -0.375 -0.634 -0.282 -0.246 -0.016  0.389 -0.247 -0.435 -0.476 -0.444 -0.613 -0.286 -0.559 -0.555  0.453  1.000
pH    -0.541  0.402 -0.031 -0.544  0.112  0.030 -0.337 -0.132  0.159 -0.048 -0.112 -0.144 -0.292  0.076 -0.477 -0.365 -0.173  0.442  1.000
PO4    0.434  0.209  0.506  0.601  0.409  0.378  0.026 -0.395  0.342  0.532  0.503  0.406  0.590  0.451  0.613  0.453 -0.376 -0.847 -0.374  1.000
SO4    0.130  0.902  0.873  0.209  0.971  0.944 -0.097 -0.594  0.949  0.682  0.572  0.781  0.297  0.900  0.224  0.112  0.014 -0.182  0.160  0.338  1.000
TEMP   0.290 -0.080  0.122  0.278  0.092  0.041  0.022 -0.481  0.150  0.142  0.198  0.359  0.568 -0.025 -0.031  0.359 -0.501 -0.712 -0.070  0.463  0.074  1.000

Bartlett's sphericity test gave a value of 1006.6 for the Bartlett chi-square statistic (the critical value is 234 for 231 degrees of freedom at the 95% confidence level). This confirms that the variables are not orthogonal but correlated, so the variability of the data can be explained by a smaller number of variables (called principal components).

Principal components were extracted by the R-mode principal component method. This method transforms the original data mathematically, with no assumptions about the form of the covariance matrix. The analysis clusters variables on the basis of their mutual correlations and groups objects on the basis of their similarities. For this analysis, the covariance matrix was diagonalised and its characteristic roots (eigenvalues) were obtained. The transformed variables, or principal components (PCs), were obtained as weighted linear combinations of the original variables.

The scree plot (Fig. 2) was used to identify the number of PCs to be retained in order to comprehend the underlying data structure (Jackson, 1991). The scree plot shows a pronounced change of slope after the third eigenvalue. Cattell and Jaspers (1967) suggested using all the PCs up to and including the first one after the break, so four PCs were retained. These four PCs have eigenvalues greater than unity and explain 81.5% of the variance, or information, contained in the original data set.
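Bartlett's sphericity test and the eigenvalue-based choice of components can be sketched as follows. The sketch uses the usual form of the statistic, χ² = −[(n − 1) − (2p + 5)/6] ln|R|, with p(p − 1)/2 degrees of freedom. For p = 22 this gives the 231 degrees of freedom quoted above. Whether the authors' software used exactly this form is an assumption. The data (three latent factors plus noise) and the helper names are hypothetical.

```python
import numpy as np
from scipy.stats import chi2


def bartlett_sphericity(X):
    """Bartlett's test of the hypothesis that the correlation matrix is identity."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    _, logdet = np.linalg.slogdet(R)          # ln|R|, numerically stable
    statistic = -((n - 1) - (2 * p + 5) / 6.0) * logdet
    df = p * (p - 1) // 2
    return statistic, df, chi2.sf(statistic, df)


def eigen_summary(X):
    """Eigenvalues of R (descending), % variance explained, count of eigenvalues > 1."""
    R = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    eigvals = np.linalg.eigvalsh(R)[::-1]
    explained = 100.0 * eigvals / eigvals.sum()
    return eigvals, explained, int(np.sum(eigvals > 1.0))


rng = np.random.default_rng(1)
latent = rng.normal(size=(30, 3))                   # hypothetical factors
X = latent @ rng.normal(size=(3, 22)) + 0.3 * rng.normal(size=(30, 22))

stat, df, p_value = bartlett_sphericity(X)
eigvals, explained, n_kaiser = eigen_summary(X)
print(f"chi2 = {stat:.1f}, df = {df}, critical(0.95) = {chi2.ppf(0.95, df):.1f}")
print("cumulative % variance:", np.round(np.cumsum(explained)[:4], 1))
print("eigenvalues > 1:", n_kaiser)
```

The eigenvalues of a correlation matrix always sum to the number of variables. Each component's percentage of variance is therefore its eigenvalue divided by 22. The scree plot is simply `eigvals` plotted against component number.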
The projections of the original variables onto the subspace of the PCs are called loadings. They coincide with the correlation coefficients between the PCs and the variables. The loadings of the four retained PCs are presented in Table 4. PC1 explains 46.1% of the variance and receives high contributions from most variables. These are chloride, bicarbonate, sulphate, conductivity, dissolved solids, hardness, calcium, potassium, magnesium and sodium, and, to a lesser extent, BOD, COD, manganese, ammonia and phosphate. These variables were shown to be correlated (see the correlation matrix, Table 3). Flow rate and dissolved oxygen have negative loadings on PC1. PC2 explains 19.0% of the variance. It has positive loadings for calcium, dissolved oxygen and pH and negative loadings for BOD, COD, nitrite, phosphate and manganese. PC3 (9.8% of the variance) has a positive contribution from nitrate and a negative one from temperature. Finally, PC4 explains 6.6% of the total variability of the original data and is dominated by iron.

As can be seen in Table 4, most variables load highly on PC1, which hinders its hydrochemical interpretation. Likewise, variables related to anthropogenic pollution, such as BOD, COD and the phosphorus and nitrogen compounds, load highly on both PC1 and PC2. PC2 therefore cannot be explained only in terms of organic pollution. A rotation of the principal components can achieve a simpler and more meaningful representation of the underlying factors. It does so by decreasing the contribution of less significant variables to each PC and increasing that of the more significant ones. Rotation produces a new set of factors, each involving primarily a subset of the original variables with as little overlap as possible. The original variables are thus divided into groups that are somewhat independent of each other (Sharaf et al., 1986; Massart et al., 1988). Rotation does not affect the goodness of fit of the principal component solution, but it does modify the variance explained by each factor.

Fig. 2. Scree plot of the characteristic roots (eigenvalues) of the principal components and of the varifactors.

Table 4. Loadings of 22 experimental variables on the four significant principal components for 30 river water samples

Variable                  PC1     PC2     PC3     PC4
BOD                     0.523  -0.635   0.353   0.022
Ca                      0.702   0.656  -0.073  -0.027
Cl                      0.914   0.164   0.101  -0.073
COD                     0.574  -0.618   0.322  -0.154
COND                    0.925   0.365   0.046   0.036
DS                      0.909   0.335   0.139   0.076
Fe                     -0.074  -0.328   0.195   0.826
FLOW                   -0.628  -0.095   0.424  -0.347
HARD                    0.897   0.394  -0.037   0.101
HCO3                    0.821   0.116  -0.050  -0.242
K                       0.828  -0.139   0.215  -0.147
Mg                      0.901   0.020   0.013   0.216
Mn                      0.547  -0.479  -0.253   0.459
Na                      0.864   0.317   0.140  -0.067
NH4                     0.590  -0.468   0.446  -0.199
NO2                     0.388  -0.400  -0.152  -0.218
NO3                    -0.160   0.223   0.710   0.303
OXYG                   -0.576   0.669   0.329   0.136
pH                     -0.120   0.712  -0.401  -0.082
PO4                     0.650  -0.495  -0.205  -0.151
SO4                     0.851   0.458   0.003   0.140
TEMP                    0.306  -0.469  -0.708   0.143
Eigenvalue             10.148   4.181   2.154   1.459
% Variance explained     46.1    19.0     9.8     6.6
% Cum. variance          46.1    65.1    74.9    81.5

A varimax rotation of the principal components produced 22 rotated PCs (henceforth called varifactors), whose eigenvalues are plotted in Fig. 2. The scree plot shows a pronounced change of slope after the third eigenvalue, so four varifactors, explaining 67.8% of the variance, were retained (Cattell and Jaspers, 1967). The eigenvalues and loadings of these varifactors are displayed in Table 5. Rotation has increased the number of factors needed to explain the same amount of variance in the original data set. As a result, the first two varifactors used for graphical representation explain less variance (53.9%) than the first two PCs (65.1%). However, smaller groups of variables can now be associated with individual rotated factors that have a clearer hydrochemical meaning.

Varifactor 1 explains 37.2% of the total variance and has high loadings for calcium, chloride, conductivity, dissolved solids, hardness, bicarbonate, magnesium, sodium and sulphate. It can thus be interpreted as the mineral component of the river water. This clustering of variables points to a common origin for these minerals, likely the dissolution of limestone, marl and gypsum soils. Flow rate contributes negatively to this factor, which can be explained by the dilution of dissolved minerals as flow increases. Varifactor 2 accounts for 16.7% of the variance and includes BOD, COD and ammonia, whereas pH and dissolved oxygen contribute negatively to it.
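Loadings, their varimax rotation and the corresponding sample scores can be sketched in NumPy. For autoscaled data, the loading of a variable on a component is the eigenvector element multiplied by the square root of the eigenvalue. This is why loadings equal the variable–component correlations. Each component's share of the variance is its eigenvalue divided by the number of variables, for example 10.148/22 ≈ 46.1% for PC1 in Table 4 and 8.175/22 ≈ 37.2% for varifactor 1 in Table 5.

The `varimax` function below is the widely used SVD-based iteration. It is a sketch, not the authors' software. For brevity it rotates only the k retained components, whereas the text reports rotating all 22. The unit-variance score convention is also an assumption, since the scaling used for the score plot (Fig. 3) is not stated. All function names and data are hypothetical.

```python
import numpy as np


def pca(Z, k):
    """Loadings and unit-variance scores of the k largest PCs of autoscaled Z."""
    R = np.corrcoef(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1][:k]
    V, lam = eigvecs[:, order], eigvals[order]
    loadings = V * np.sqrt(lam)        # = correlations between variables and PCs
    scores = Z @ V / np.sqrt(lam)      # standardised scores, unit variance
    return loadings, scores


def varimax(L, gamma=1.0, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation; returns rotated loadings and rotation matrix."""
    p, k = L.shape
    T = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        LR = L @ T
        # SVD step of the classical varimax iteration (gamma = 1 gives varimax)
        G = L.T @ (LR ** 3 - (gamma / p) * LR @ np.diag(np.sum(LR ** 2, axis=0)))
        U, s, Vt = np.linalg.svd(G)
        T = U @ Vt                      # orthogonal rotation matrix
        new_criterion = s.sum()
        if criterion > 0 and new_criterion < criterion * (1 + tol):
            break
        criterion = new_criterion
    return L @ T, T


rng = np.random.default_rng(2)
# Hypothetical data: 30 samples, 8 variables driven by 3 latent factors.
X = rng.normal(size=(30, 3)) @ rng.normal(size=(3, 8)) + 0.5 * rng.normal(size=(30, 8))
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

L, F = pca(Z, k=3)
L_rot, T = varimax(L)
F_rot = F @ T            # varifactor scores; F_rot @ L_rot.T equals F @ L.T
print(np.round(L_rot, 3))
```

Rotation leaves each variable's communality (the row sum of squared loadings) and the reconstruction F·Lᵀ unchanged. It only redistributes the explained variance among the factors. This is consistent with the statement that rotation does not affect the goodness of fit.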
Varifactor 2 can be explained by taking into account that high levels of dissolved organic matter consume large amounts of oxygen. Organic matter in urban wastewater consists mainly of carbohydrates, proteins and lipids. As the amount of available dissolved oxygen decreases, these undergo anaerobic fermentation processes that yield ammonia and organic acids. Hydrolysis of these acidic materials lowers the pH of the water. Potassium contributes to the same extent to varifactors 1 and 2.

Varifactor 3 (8.0% of the variance) has a high positive loading for temperature and a negative loading for dissolved oxygen, since the solubility of gases in water decreases with increasing temperature. Flow rate would be expected to have a high negative loading on varifactor 3, because high temperatures correspond to dry, hot seasons such as summer, when flow rate is lower. Its loading is indeed negative but small (−0.323), because during 1990 a persistent drought caused low flow rates even in winter. Finally, varifactor 4 (5.9% of the variance) is dominated by iron and manganese, which are hydrochemically related.

Table 5. Loadings of 22 experimental variables on the first four rotated PCs (VF = varifactor) for 30 river water samples

Variable                  VF1     VF2     VF3     VF4
BOD                     0.116   0.934   0.163   0.111
Ca                      0.920  -0.179  -0.093  -0.119
Cl                      0.893   0.326   0.048  -0.034
COD                     0.180   0.912   0.159   0.011
COND                    0.973   0.148   0.049  -0.038
DS                      0.950   0.183  -0.001   0.001
Fe                     -0.131   0.072   0.012   0.970
FLOW                   -0.496  -0.005  -0.323  -0.094
HARD                    0.952   0.089   0.106  -0.033
HCO3                    0.697   0.184   0.024  -0.139
K                       0.584   0.614   0.089  -0.043
Mg                      0.766   0.359   0.289   0.071
Mn                      0.248   0.290   0.387   0.472
Na                      0.918   0.180  -0.070   0.003
NH4                     0.225   0.761  -0.190   0.065
NO2                     0.105   0.170   0.182  -0.061
NO3                     0.014  -0.003  -0.260   0.104
OXYG                   -0.132  -0.418  -0.540  -0.016
pH                      0.169  -0.434  -0.018  -0.201
PO4                     0.276   0.350   0.244   0.045
SO4                     0.981   0.008   0.059   0.022
TEMP                   -0.003   0.114   0.919   0.031
Eigenvalue              8.175   3.677   1.763   1.292
% Variance explained     37.2    16.7     8.0     5.9
% Cum. variance          37.2    53.9    61.9    67.8

Fig. 3. Scores of river water samples on the two-dimensional plane defined by the first two varifactors. Space reduction from 22 to 2 dimensions (53.9% of the total variance). Samples collected at Cabezón de Pisuerga, Puente Mayor and Simancas in January (E), April (A), July (J) and October (O) from 1990 to 1992.

Figure 3 plots the sample scores on the two-dimensional plane defined by varifactor 1 (mineral contents) and varifactor 2 (anthropogenic contamination, namely organic matter). High positive scores on varifactor 1 or 2 indicate high mineral contents or high organic pollution, respectively. Samples with high negative scores on varifactor 1 or 2 correspond to a higher flow rate or dissolved oxygen, indicating better water quality. From Fig. 3 it can be concluded that sample SJ90 (collected at Simancas in July 1990) has the worst quality, with high levels of both minerals and organics. Samples collected in January and April 1991 project onto negative values of varifactor 1 and therefore have the lowest mineral contents. As pointed out above, the winter of 1990 was extremely dry. This is reflected in the high scores on varifactor 2 of the samples collected in April and July 1990.

Box plots of varifactors 1, 2 and 3 at the three sampling stations are shown in Fig. 4. Some important conclusions can be drawn from these plots. Varifactor 1 (mineral contents) and varifactor 3 (temperature) show a large spread around the median, pointing to an important contribution of sampling time to the variance of these varifactors.
On the other hand, varifactor 2 (anthropogenic pollution) shows a small spread, but its median increases slightly from Cabezón to Simancas. This indicates that sampling station is the most important source of variation for this varifactor, which is scarcely affected by sampling time.

A two-way ANOVA was carried out on the three most relevant varifactors, and the results of the F-test are displayed in Table 6. Normal probability plots of the varifactors at individual sampling stations showed that the varifactors were normally distributed, except for varifactor 2 at Simancas. However, the F-test as applied in ANOVA is not very sensitive to departures from normality (Miller and Miller, 1984), and it was therefore used to interpret the sources of variation.

Fig. 4. Box plots for the three most significant varifactors in Cabezón (C), Puente Mayor (P) and Simancas (S).

Table 6. Two-way ANOVA and F-test of the three most relevant rotated PCs

Source of variation        SS  df  Variance       F  Pooled SS  % Contribution
Varifactor 1
Sampling time         18.399   9     2.044   3.521     13.629            47.0
Sampling station       0.150   2     0.075   0.129
Residual              10.451  18     0.581             15.371            53.0
Total                 29.000  29                       29.000           100.0
Varifactor 2
Sampling time         11.809   9     1.312   1.941
Sampling station       5.026   2     2.513   3.718      3.250            11.2
Residual              12.165  18     0.676             25.750            88.8
Total                 29.000  29                       29.000           100.0
Varifactor 3
Sampling time         25.306   9     2.812  15.428     23.643            81.5
Sampling station       0.414   2     0.207   1.135
Residual               3.281  18     0.182              5.357            18.5
Total                 29.000  29                       29.000           100.0
SS = sum of squares; df = degrees of freedom. F is calculated as the variance of the effect divided by the variance of the residual. Fcrit is 2.456 for 9 and 18 degrees of freedom and 3.555 for 2 and 18 degrees of freedom (p = 0.05).

The sources of variation that can affect sample projections on the varifactors are sampling time (seasonal effect) and sampling station (geographical or polluting effect). A comparison of the variance estimates by means of the Fisher ratio (F) gives the following results at the 95% confidence level. For varifactor 1, variation between sampling times contributes significantly to the total variance (F > Fcrit(9, 18; p = 0.05)). Variation between sampling stations does not (F < Fcrit(2, 18; p = 0.05)). Varifactor 1 can be interpreted as the inorganic (mineral) content of the water, which increases with decreasing flow rate. It can therefore be concluded that mineral levels in the river water investigated are seasonal and climate dependent and are unaffected by sampling location. This points to a natural (non-anthropogenic) origin for this form of pollution.

For varifactor 2 (organic matter, nitrogen and phosphorus), only differences between sampling stations contribute significantly to the variance. This indicates that organic pollution of the river water originates from anthropogenic sources, mainly municipal wastewater discharged into the river between Puente Mayor and Simancas. For varifactor 3 (temperature), sampling stations were shown not to contribute to the variance, whereas highly significant differences were found between sampling times. Only climate and seasonality are therefore responsible for variations in water temperature, and there is no thermal pollution in the river section investigated.
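The F ratios in Table 6 come from a two-way ANOVA without replication: 10 sampling times × 3 stations, one varifactor score per cell. The percentage contributions come from pooling the non-significant effects into the residual (Ross, 1988). The sketch below assumes a particular pooling convention: each significant effect keeps SS′ = SS − df·Ve, where Ve is the pooled residual variance, and the remainder of the total goes to the residual. Applied to the sums of squares in Table 6, this assumption reproduces the 47.0%, 11.2% and 81.5% reported there. The function names are hypothetical, and the critical F values are taken from `scipy.stats.f.ppf`.

```python
import numpy as np
from scipy.stats import f


def two_way_anova(Y):
    """Two-way ANOVA without replication; Y has shape (times, stations)."""
    Y = np.asarray(Y, dtype=float)
    a, b = Y.shape
    grand = Y.mean()
    ss = {
        "time": b * np.sum((Y.mean(axis=1) - grand) ** 2),
        "station": a * np.sum((Y.mean(axis=0) - grand) ** 2),
    }
    ss_total = np.sum((Y - grand) ** 2)
    ss["residual"] = ss_total - ss["time"] - ss["station"]
    df = {"time": a - 1, "station": b - 1, "residual": (a - 1) * (b - 1)}
    ms_resid = ss["residual"] / df["residual"]
    F = {k: (ss[k] / df[k]) / ms_resid for k in ("time", "station")}
    return ss, df, F, ss_total


def pooled_contribution(ss, df, significant, ss_total):
    """% contribution after pooling non-significant effects into the residual."""
    pooled = [k for k in ss if k not in significant]       # includes "residual"
    v_e = sum(ss[k] for k in pooled) / sum(df[k] for k in pooled)
    return {k: 100.0 * (ss[k] - df[k] * v_e) / ss_total for k in significant}


# Sums of squares and degrees of freedom for varifactor 1, taken from Table 6.
ss1 = {"time": 18.399, "station": 0.150, "residual": 10.451}
df1 = {"time": 9, "station": 2, "residual": 18}
print(round(f.ppf(0.95, 9, 18), 3), round(f.ppf(0.95, 2, 18), 3))
print(pooled_contribution(ss1, df1, ["time"], 29.0))
```

For varifactor 1, pooling the station effect into the residual gives Ve = 10.601/20. The time effect then keeps 18.399 − 9 × 0.530 ≈ 13.629, which is 47.0% of the total sum of squares of 29.000.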
The sources of variation shown not to contribute significantly to the variance of the varifactors (F < Fcritical) were combined with the residual variance (Ross, 1988). From the recalculated sums of squares, the contribution of each effect to the variability of the varifactor was estimated as

$$\%\,\text{Contribution} = \frac{SS'}{SS_T} \times 100$$

where $SS'$ is the pooled sum of squares and $SS_T$ is the total sum of squares. Table 6 shows that seasonality accounts for 47.0% and 81.5% of the variability of varifactor 1 (mineral composition) and varifactor 3 (temperature), respectively. This demonstrates the strong effect of climate on the variables explained by these varifactors. Sampling location, in contrast, makes a negligible contribution to varifactors 1 and 3, but accounts for 11.2% of the variability of varifactor 2 (anthropogenic pollution). This contribution is smaller than that of the residual (88.8%), which suggests a possible interaction between the two sources of variation. Although the effect of sampling time (season) on varifactor 2 is not significant, it cannot be completely discarded (F < Fcritical but F > 1). Climate also makes a small contribution to varifactor 2 through seasonal variations of flow rate, which dilute pollutants of anthropogenic origin.

Spatio-temporal variations of water quality can be readily visualised in Fig. 5, where varifactors 1, 2 and 3 are plotted against sampling time for the three stations investigated: Cabezón, Puente Mayor and Simancas. The average flow rate for the three stations is plotted at the same time to show the relationship between water quality and flow rate. Again, an inverse relationship can be observed between flow rate and rotated factors 1 and 3 (mineral components in water and temperature, respectively). For varifactor 2 (organic pollution and nutrients) this negative correlation is less marked. Fig. 5 also illustrates the interaction between sampling location and sampling time. The maximum variability of varifactor 2 along the sampled river section occurs in the dry seasons (July and October), when the river flow rate decreases. Municipal wastewater discharges into the Pisuerga river are the main and nearly constant source of organic matter. An increase in river flow rate therefore dilutes the pollutants and makes differences between sampling stations less evident.

Figure 5 also shows that sample scores on varifactor 2 are always higher for samples collected at Simancas, whilst the Cabezón and Puente Mayor scores are similar. This indicates that the main discharges of organic matter and nutrients are located between Puente Mayor and Simancas, and confirms municipal wastewater as the principal source of organic pollutants in the Pisuerga river. These conclusions are in good agreement with the spatio-temporal profile of the complexing capacity of the Pisuerga river water (Pardo et al., 1994). Furthermore, the differences in sample scores between Simancas and the other two sampling stations were larger in the dry seasons (July and October). This confirms the spatio-temporal interaction on varifactor 2.

Fig. 5. Spatio-temporal fluctuations of varifactors 1, 2 and 3 and their relationship with river flow rate (line). Sampling stations: Cabezón de Pisuerga, Puente Mayor and Simancas.

Fig. 6 shows the temporal variation of some individual variables associated with contamination of the river water. Conductivity behaves in the same way as varifactor 1 (see Fig. 5 for comparison), since this variable is closely related to the mineral composition of the river water and therefore to varifactor 1. COD and ammonia are associated with organic pollution, so their profiles are similar to that of varifactor 2. As can be seen in Fig. 6, these contaminants vary most at Simancas, because large amounts of municipal wastewater are discharged upstream of this station. Dissolved oxygen also shows a periodic profile related to seasonality, with strong decreases at Simancas caused by the high levels of oxygen-consuming organic matter.

Fig. 6. Temporal variations of some original variables associated with river water pollution and their relationship with flow rate (line). Sampling stations: Cabezón de Pisuerga, Puente Mayor and Simancas.

Cluster analysis

Cluster analysis groups river water samples on the basis of the similarity of their chemical composition. PCA normally uses only two or three PCs for display purposes. Cluster analysis, in contrast, uses all the variance, or information, contained in the original data set. Hierarchical agglomerative clustering by Ward's method was selected for sample classification for several reasons. It has a small space-distorting effect, it uses more information on cluster contents than other methods, and it has proved to be an extremely powerful grouping mechanism (Willet, 1987). In addition, Ward's method yielded the most meaningful clusters. The method was applied to normalised data, with squared Euclidean distances as the measure of similarity (Massart and Kaufman, 1983). The average linkage (between groups) method gave a similar classification pattern.

The dendrogram of samples obtained by Ward's method is shown in Fig. 7. Two well-differentiated clusters can be seen, each formed by two subgroups, with river water quality decreasing from top to bottom. The first group from the top consists of samples collected in January and April 1991 and one sample collected at Cabezón in April 1992. In the PCA, these samples scored high and negative on varifactor 1 and close to 0 on varifactor 2 (see Fig. 3). This indicates the lowest levels of both minerals and organic matter, as these samples were collected in January and April 1991, when the river flow rate is at its maximum owing to snowmelt at the river sources. This cluster is linked at a rescaled distance of about 7 to another small but tight group. This second group includes the samples taken in July 1991 and July 1992 (except those from Simancas) and sample PO91.
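Ward's agglomerative clustering of autoscaled samples can be sketched with SciPy. This is not the software used in the paper. `scipy.cluster.hierarchy.linkage(..., method="ward")` works on the raw observation vectors with Euclidean distances and reports merge heights on its own scale. Only the resulting grouping, not the distance axis, is comparable with the rescaled distances of Fig. 7. The data (a "clean" and a "polluted" set of samples) and the helper name `ward_groups` are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def ward_groups(X, n_groups):
    """Autoscale the samples, build a Ward linkage and cut it into n_groups."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    link = linkage(Z, method="ward", metric="euclidean")
    return link, fcluster(link, t=n_groups, criterion="maxclust")


rng = np.random.default_rng(5)
# Hypothetical data: 15 "clean" and 15 "polluted" samples, 22 variables each.
clean = rng.normal(loc=0.0, scale=1.0, size=(15, 22))
polluted = rng.normal(loc=4.0, scale=1.0, size=(15, 22))
link, labels = ward_groups(np.vstack([clean, polluted]), n_groups=2)
print(labels)
```

The linkage matrix `link` can be passed to `scipy.cluster.hierarchy.dendrogram` to draw a tree comparable in structure to Fig. 7.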
In the PCA, the samples of this small group also lay at intermediate negative values on the varifactor 1 axis. The second main cluster is formed by two subgroups linked at a rescaled distance of 10. The first subgroup includes very similar samples collected in January and April 1992, together with samples CO90 and CO91. These samples scored high and positive on varifactor 1 and negative on varifactor 2 in the PCA (see Fig. 3), pointing to high levels of minerals and low levels of anthropogenic pollutants. The second subgroup includes the samples collected in 1990 (April, July and October) and the samples collected at Simancas in July and October 1991. These samples correspond to the dry seasons and to the most contaminated station (Simancas). They show the worst water quality in terms of both minerals and organic matter.

Fig. 7. Dendrogram based on agglomerative hierarchical clustering (Ward's method) for 30 river water samples collected at Cabezón de Pisuerga (C), Puente Mayor (P) and Simancas (S) in January (E), April (A), July (J) and October (O) from 1990 to 1992.

CONCLUSIONS

Environmental analytical chemistry generates multidimensional data that require multivariate statistics to analyse and interpret the underlying information. Water quality data from a river were analysed by unsupervised pattern recognition (hierarchical cluster analysis) and display methods (principal component analysis). The aim was to extract correlations and similarities between variables and to classify river water samples into groups of similar quality. PCA found a reduced number of "latent" variables (principal components) that explain most of the variance of the experimental data set. A varimax rotation of these PCs produced a reduced number of varifactors, each related to a small group of experimental variables with a hydrochemical meaning: mineral contents for varifactor 1, anthropogenic pollutants for varifactor 2 and water temperature for varifactor 3.

PCA combined with ANOVA allowed the identification and assessment of the spatial (pollution of anthropogenic origin) and temporal (seasonal and climatic) sources of variation affecting the quality and hydrochemistry of the river water. Man-made pollution was shown to originate from municipal wastewater discharged into the river between the sampling stations of Puente Mayor and Simancas. Temporal effects were associated with seasonal variations of river flow rate, which dilute pollutants and hence cause variations in water quality. The application of PCA and cluster analysis achieved a meaningful classification of hydrochemical variables and of river water samples based on seasonal and spatial criteria. Both multivariate techniques led to very similar classification patterns.

Acknowledgement. The authors wish to thank the Confederación Hidrográfica del Duero (Valladolid, Spain) for providing river flow rate data.

REFERENCES

Andrade J. M., Prada D., Muniategui S., González E. and Alonso E. (1992) Multivariate analysis of environmental data for two hydrographic basins. Anal. Lett. 25, 379–399.
AOAC (1990) Official Methods of Analysis, Vol. 1, 15th edn. Association of Official Analytical Chemists, Arlington, VA, U.S.A., p. 312.
APHA-AWWA-WPCF (1985) Standard Methods for the Examination of Water and Wastewater, 16th edn. American Public Health Association, American Water Works Association, Water Pollution Control Federation, U.S.A.
Aruga R., Negro G. and Ostacoli G. (1993) Multivariate data analysis applied to the investigation of river pollution. Fresenius J. Anal. Chem. 346, 968–975.
Bartels J. H. M., Janse T. A. H. M. and Pijpers F. W. (1985) Classification of the quality of surface waters by means of pattern recognition. Anal. Chim. Acta 177, 35–45.
Battegazzore M. and Renoldi M. (1995) Integrated chemical and biological evaluation of the quality of the river Lambro (Italy). Wat. Air Soil Poll. 83, 375–390.
Brown S. D., Skogerboe R. K. and Kowalski B. R. (1980) Pattern recognition assessment of water quality data: coal strip mine drainage. Chemosphere 9, 265–276.
Brown S. D., Blank T. B., Sum S. T. and Weyer L. G. (1994) Chemometrics. Anal. Chem. 66, 315R–359R.
Brown S. D., Sum S. T. and Despagne F. (1996) Chemometrics. Anal. Chem. 68, 21R–61R.
Cattell R. B. and Jaspers J. (1967) A general plasmode (No. 30-10-5-2) for factor analytic exercises and research. Mult. Behav. Res. Monogr. 67, 1–212.
Dixon W. and Chiswell B. (1996) Review of aquatic monitoring program design. Wat. Res. 30, 1935–1948.
Elosegui A. and Pozo J. (1994) Spatial vs temporal variability in the physical and chemical characteristics of the Aguera stream (Northern Spain). Acta Ecologica, Int. J. Ecol. 15, 543–559.
Grimalt J. O., Olive J. and Gómez-Belinchón J. I. (1990) Assessment of organic source contributions in coastal waters by principal component and factor analysis of the dissolved and particulate hydrocarbon and fatty acid contents. Int. J. Environ. Anal. Chem. 38, 305–320.
Jackson J. E. (1991) A User's Guide to Principal Components. Wiley, New York.
Librando V. (1991) Chemometric evaluation of surface water quality at regional level. Fresenius J. Anal. Chem. 339, 613–619.
Massart D. L. and Kaufman L. (1983) The Interpretation of Analytical Chemical Data by the Use of Cluster Analysis. Wiley, New York.
Massart D. L., Vandeginste B. G. M., Deming S. N., Michotte Y. and Kaufman L. (1988) Chemometrics: A Textbook. Elsevier, Amsterdam.
Meglen R. R. (1992) Examining large databases: a chemometric approach using principal component analysis. Mar. Chem. 39, 217–237.
Mellinger M. (1987) Multivariate data analysis: its methods. Chemometr. Intell. Lab. Systems 2, 29–36.
Miller J. C. and Miller J. N. (1984) Statistics for Analytical Chemistry. Ellis Horwood Series in Analytical Chemistry, Wiley, New York.
Pardo R., Barrado E., Vega M., Deban L. and Tascón M. L. (1994) Voltammetric complexation capacity of waters from the Pisuerga river. Wat. Res. 28, 2139–2146.
Ross P. J. (1988) Taguchi Techniques for Quality Engineering. McGraw-Hill, New York.
Scheffé H. (1959) The Analysis of Variance. Wiley, New York.
Sharaf M. A., Illman D. L. and Kowalski B. R. (1986) Chemometrics. Wiley, New York.
Stroomberg G. J., Freriks I. L., Smedes F. and Cofino W. P. (1995) In Quality Assurance in Environmental Monitoring, ed. P. Quevauviller. VCH, Weinheim.
Voutsa D., Zachariadis G., Samara C. and Kouimtzis T. (1995) Evaluation of chemical parameters in Aliakmon river in Northern Greece. 2. Dissolved and particulate heavy metals. J. Environ. Sci. Hlth. Part A: Environ. Sci. Engng 30, 1–13.
Ward A. D. and Elliot W. J. (1995) In Environmental Hydrology, ed. A. D. Ward and W. J. Elliot, p. 1. CRC Press, Boca Raton.
Wenning R. J. and Erickson G. A. (1994) Interpretation and analysis of complex environmental data using chemometric methods. Trends Anal. Chem. 13, 446–457.
Willet P. (1987) Similarity and Clustering in Chemical Information Systems. Research Studies Press, Wiley, New York.