Analysis of water quality in the Keilor
Lodge locality and recommendations
for a suitable sampling program
Dean Iovenitti
Industrial Application of Mathematics and Statistics 2 report
RMIT University
May 2016
With thanks to City West Water: to the Operations Research division for their mathematical
and statistical assistance with this project, and to George Ruta for providing and clearly
explaining information about the water quality processes that City West Water
undertakes.
Table of Contents
1. Executive Summary
2. Introduction
2.1 – City West Water & data description
2.2 – Aims
3. Background
3.1 – Water quality processes
3.2 – KAPTA probe
3.3 – Cost of sampling
3.4 – Issues with the data
4. Method
4.1 – Data preparation
4.2 – Bootstrapping
5. Results
5.1 – Graphical results & comments
5.1.1 – Probe data
5.1.2 – Manual data
5.2 – Comparison of probe & manual data
5.3 – Statistical analysis
5.3.1 – Hypotheses and assumptions
5.3.2 – Skewness
5.3.3 – Testing for normality
5.3.4 – Bootstrapping
6. Conclusions
6.1 – Recommendations
6.2 – Further Analysis
7. References
8. Appendix 1
9. Appendix 2
10. Appendix 3
11. Appendix 4
1. Executive Summary
This report will provide an analysis of water quality data obtained from the Keilor Lodge
area and put forth recommendations for a suitable sampling program. Probe data was
obtained by VEOLIA Water and manual data was provided by ALS Laboratories over a
period of 23 months from January 2013 to November 2014.
The probe data in particular had not previously been analysed, so the client is interested in
what the data indicate, such as whether there are any unusual patterns.
Graphical representation showed that chlorine concentrations and electrical conductivity
levels were all within the recommended guidelines for both the probe and manual data.
Analysis was performed using an improvised bootstrapping technique, which led to the
conclusion that there is a statistical difference between the variances of the manual and
probe data. However, graphically there was not a “significant” difference between the two
methods.
Recommendations include sampling using either method but if cost were a factor, then
manual sampling would provide a cheaper option. If diurnal patterns were to be analysed, a
probe can be temporarily inserted by CWW.
Further analysis could be conducted on finding or developing a different method to deal with
the large difference in sample sizes for the two methods.
2. Introduction
2.1 – City West Water & data description
City West Water (CWW), one of three water retail businesses in metropolitan Melbourne,
provides drinking water, sewerage, trade waste and recycled water services to customers in
Melbourne’s CBD and western suburbs [1]. This involves developing conservation and
contingency plans for water resource management and emergency situations [2].
The probe data was provided by VEOLIA Water and recorded values in a portion of the
Keilor Lodge area for chlorine (in mg/L), electrical conductivity (in µS/cm, micro-Siemens
per cm), temperature (in °C) and pressure (in bar). These values were recorded from the 16th of
January 2013 at 4:54pm to the 8th of November 2014 at 6:56am at 5-minute intervals,
giving a total of 185,232 measurements for each parameter. A probe can only detect
one form of chlorine in the water, namely hypochlorous acid (HOCl), so its readings differ from
the chlorine values obtained via manual sampling (see below).
The manual data was supplied by ALS Laboratories and samples were taken for analysis for
chlorine and electrical conductivity (or simply conductivity) with units of mg/L and µS/cm
respectively. Samples were taken exclusively on weekdays; on average one chlorine sample
per weekday, resulting in 526 measurements from 41 locations throughout the Keilor area.
Conductivity samples were taken fortnightly resulting in 47 measurements from 29 locations
in the Keilor area.
Manual sampling measures three different forms of chlorine (HOCl, OCl⁻ and unreacted Cl₂),
resulting in a higher measured concentration than that of the probe. This should be taken
into consideration when the manual and probe data are compared later in this report.
Manual tests were undertaken by a NATA-accredited laboratory.
2.2 – Aims
The objective of the report is to determine the characteristic features (if any) of chlorine
concentration, temperature and conductivity from data in the Keilor Lodge area. The
development of a suitable sampling program for City West Water to consider implementing
will also be investigated.
3. Background
3.1 – Water quality processes
City West Water undertakes sampling to determine the quality of the water that it
supplies to its customers. This includes taking bacterial samples (checking whether E. coli is
present, which indicates faecal contamination and hence a health risk) in addition to physical
samples (such as pH and colour) and chemical samples (such as l) [3].
The main characteristics investigated in this report are chlorine concentration and electrical
conductivity, and to a lesser extent temperature. Chlorine is important because it is a
disinfectant that defends against microbes in the water (G. Ruta [City West Water, Victoria]
2016, pers. comm., 24 May 2016). The chlorine concentration cannot be too low, otherwise the
microbes are not eliminated, and cannot be too high, otherwise the concentration could exceed
the acceptable guideline (i.e. become toxic for human consumption). Electrical conductivity
(or conductivity) measures the ions in the water that conduct electricity. It is an extremely
sensitive measure, so if there is contamination, a large spike (much greater than the “normal”
values) will be measured. Temperature is not a significant characteristic but it is worthwhile
to understand whether it displays any interesting patterns.
3.2 – KAPTA probe
The KAPTA™ 3000-AC4 probe is a battery-operated device that was rented from VEOLIA
Water for the 23-month period from January 2013 to November 2014. It can record chlorine
concentration, conductivity, pressure and temperature simultaneously [4]. To install
one of these probes, a small hole is drilled into the appropriate water pipe and a rod is
inserted into the pipe so that the data can be recorded. Any gaps are sealed so that the
fitting is watertight, and the probe is left untouched for the duration of the sampling (G. Ruta
[City West Water, Victoria] 2016, pers. comm., 18 March 2016).
3.3 – Cost of sampling
The cost of one chlorine sample is $31. Other measurements are taken at the same
time as the chlorine sample, but it will be assumed that it costs $31 to take a chlorine
sample. One conductivity sample costs $5. There were a total of 52 conductivity
samples taken which cost a total of 52 × $5 = $260. 526 chlorine samples were taken which
cost a total of 526 × $31 = $16,306. This results in a grand total of $16,566 for all the
manual chlorine and conductivity samples.
Renting a KAPTA probe costs approximately $900 per month (M. Ramov [City West
Water, Victoria] 2016, pers. comm., 11 May 2016). A probe was inserted for about 23
months so that means the total cost of renting the probe was 23 × $900 = $20,700. Note
that this records chlorine concentration, conductivity, pressure and temperature.
Furthermore, the rental cost is not reduced if the probe records less frequently, and recording
less frequently does not necessarily extend the battery life of the probe (M. Ramov [City West
Water, Victoria] 2016, pers. comm., 11 May 2016).
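The cost comparison in this section is simple arithmetic, and can be reproduced with a short sketch. The example below is written in Python rather than the MATLAB used in the appendix, uses only the unit prices and counts quoted in this section, and its variable names are illustrative.

```python
# Unit prices and counts as quoted in section 3.3; names are illustrative.
CHLORINE_COST_PER_SAMPLE = 31      # dollars
CONDUCTIVITY_COST_PER_SAMPLE = 5   # dollars
PROBE_RENT_PER_MONTH = 900         # dollars, approximate

chlorine_samples = 526
conductivity_samples = 52
probe_months = 23

manual_total = (chlorine_samples * CHLORINE_COST_PER_SAMPLE
                + conductivity_samples * CONDUCTIVITY_COST_PER_SAMPLE)
probe_total = probe_months * PROBE_RENT_PER_MONTH
difference = probe_total - manual_total

print(f"Manual sampling: ${manual_total:,}")
print(f"Probe rental:    ${probe_total:,}")
print(f"Difference:      ${difference:,}")
```

On these figures the probe rental exceeds the manual sampling cost by $4,134 over the 23 months, although the probe also records pressure and temperature.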
3.4 – Issues with the data
There were several relatively minor issues with the data. The first was that there
were a few occasions when the probe failed to record measurements for a period of
time. The most significant of these was a period of 7 days from April 30th to May 7th 2014;
to a lesser extent, no data was recorded for about 6 hours on the 27th of July 2014, from 8am
to 2pm. Over the 7 days without data, the chlorine
concentration increased by 0.06 mg/L, temperature decreased by 0.3°C and conductivity
remained the same. Since only about 2000 values are missing out of more than 180,000 values
recorded by the probe, little information is lost.
The times for when each manual sample was taken could have been obtained, but it would
have required an extensive search on the part of the personnel in the water quality
department of CWW. If these times were acquired, then the manual data times could be
compared with those of the probe to determine whether the times when the manual samples
were taken were a true reflection of that particular day or week.
An interesting challenge was finding a method that could statistically test whether there is a
difference between the probe and manual data. This was more difficult than expected,
and with the help of the Operations Research team at CWW an improvised bootstrapping
method was developed and then implemented in MATLAB (see section 4.2, Bootstrapping, for
more details).
4. Method
4.1 – Data preparation
The creation of pivot tables in Excel was an essential part of the preparation for analysis,
since these simplified the data so that the features of each parameter
(chlorine, conductivity or temperature) could be determined. The resulting graphs are
analysed in section 5.1, Graphical results & comments. Pressure was not analysed in this
report, at CWW’s request (G. Ruta [City West Water,
Victoria] 2016, pers. comm., 18 March 2016).
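The pivot-table step groups each reading by calendar month (or by hour of day) and summarises each group. A minimal Python sketch of the same grouping is given below, assuming the readings are available as (timestamp, value) pairs. The function name and the sample readings are hypothetical and are not taken from the CWW data.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def summarise_by(readings, key):
    """Group (timestamp, value) pairs by key(timestamp); return (mean, min, max) per group."""
    groups = defaultdict(list)
    for timestamp, value in readings:
        groups[key(timestamp)].append(value)
    return {k: (mean(v), min(v), max(v)) for k, v in sorted(groups.items())}


# Hypothetical chlorine readings in mg/L (not the report's data).
readings = [
    (datetime(2013, 1, 16, 16, 54), 0.08),
    (datetime(2013, 1, 16, 16, 59), 0.10),
    (datetime(2013, 2, 1, 3, 0), 0.04),
    (datetime(2013, 2, 1, 3, 5), 0.06),
]

# Monthly summary, keyed by (year, month); hourly summary, keyed by hour of day.
monthly = summarise_by(readings, key=lambda t: (t.year, t.month))
hourly = summarise_by(readings, key=lambda t: t.hour)
```

Passing a different key function is enough to switch between the monthly and hourly views used in the figures of section 5.1.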
4.2 – Bootstrapping
Bootstrapping is a resampling technique that can be applied to data that is not normally
distributed. In its classical form it is used to test for a difference in means between two
samples. However, since the comparison between the probe and manual data concerns their
variation, it is more appropriate to develop a method that tests for a difference in variances
rather than a difference in means. The reasons for using this method will be made apparent
throughout this report.
The following method is a variation of bootstrapping that tests for a difference in
variances; it was modified by Stuart Roberts (Statistical Analyst in the
Operations Research team at CWW).
There are two sample distributions, the manual and the probe data, which will be called M
and P respectively for the purposes of this method. It will be assumed that the two samples
accurately describe the population distribution for both M and P. Below is the bootstrapping
method for variances.
Step 1: From each of the samples M and P, one value is randomly selected x times,
where x is the size of that sample. Sampling is with replacement, which results in
two new sample groups, one for M and one for P, each the same size as its
original sample.
Step 2: The sample variance of each new sample group is calculated and recorded,
giving variances Sm and Sp for the manual and probe data respectively.
Step 3: Steps 1 and 2 are repeated a large number of times, say 1000 times, so that there are
1000 values of both Sm and Sp, with Sm = {Sm1, Sm2, …, Sm1000} and Sp = {Sp1, Sp2, …, Sp1000}.
Step 4: The 5th and 95th percentiles are calculated for both Sm and Sp.
Step 5: Lastly, if the percentile ranges (Step 4) of Sm and Sp do not overlap, then there is
sufficient evidence to conclude that the variances differ.
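The five steps above can be sketched in a few lines of code. The example below is a Python illustration, not the MATLAB program used for this report (see Appendix 1). The function names, the seeds and the generated data are assumptions made for the sake of a runnable example.

```python
import random
import statistics


def bootstrap_variance_interval(sample, n_resamples=1000, lower=5, upper=95, seed=None):
    """Return the (lower, upper) percentiles of the bootstrap sample variances."""
    rng = random.Random(seed)
    size = len(sample)
    variances = []
    for _ in range(n_resamples):           # Step 3: repeat many times
        # Step 1: resample with replacement, same size as the original sample.
        resample = rng.choices(sample, k=size)
        # Step 2: record the sample variance of the resample.
        variances.append(statistics.variance(resample))
    # Step 4: cut points 1..99; index p - 1 holds the p-th percentile.
    cuts = statistics.quantiles(variances, n=100, method="inclusive")
    return cuts[lower - 1], cuts[upper - 1]


def intervals_overlap(a, b):
    """True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical illustrative data, not the report's measurements.
rng = random.Random(0)
manual = [rng.gauss(76.0, 13.0) for _ in range(47)]
probe = [rng.gauss(69.0, 4.0) for _ in range(500)]

manual_interval = bootstrap_variance_interval(manual, seed=1)
probe_interval = bootstrap_variance_interval(probe, seed=2)
# Step 5: non-overlapping ranges are taken as evidence of different variances.
variances_differ = not intervals_overlap(manual_interval, probe_interval)
```

Each sample is resampled to its own size, so the much larger probe sample produces a much narrower percentile range than the manual sample.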
5. Results
5.1 – Graphical results & comments
5.1.1 – Probe data
Below is a table summarising some important statistical measures for the raw probe data.
                   Chlorine   Conductivity   Temperature
Mean               0.0903     69.0           16.7
Variance           0.0043     18.8           14.0
Minimum value      0          55             11.1
Maximum value      0.38       110            23.7
Table 1: Some important statistical measures evaluated from the probe chlorine,
conductivity and temperature data.
Firstly, the graphs for the probe monthly chlorine, conductivity and temperature data are
shown in Figures 1, 2 and 3 respectively. The bars, lines and circles represent the range for
that particular month.
Figure 1: Probe chlorine concentration monthly averages and ranges from January 2013 to
November 2014 measured in mg/L.
The monthly averages of chlorine concentration (Figure 1) appear to have yearly seasonality,
meaning the peaks occur about 12 months apart. These peaks (and troughs) are not
defined by a single observation; each peak or trough lasts multiple months. Peak
chlorine concentrations occur during the winter months and potentially a month either side
of winter, whereas the troughs occur predominantly in the summer months and in
March. The ranges from January 2014 to May 2014 (except March) are
reasonably large, with the smallest ranges occurring during the summer months. Additionally,
the overall range of the monthly averages is from
about 0.025 to 0.18 mg/L, with the range for winter months generally larger than the range
for the summer months.
Figure 2: Probe conductivity monthly averages and ranges from January 2013 to November
2014 measured in µS/cm.
The monthly conductivity averages (Figure 2) show a predominantly decreasing trend
over the sampling period, except from about January to April in both 2013 and 2014, where
there is an increasing trend. Even though there is no particular peak or
trough month, the data appear to be seasonal, with a cycle of 12 months. The
overall range of the monthly averages is from 60.7 to 74.5 µS/cm, with ranges for each
month predominantly less than about 20 µS/cm (e.g. from 70 to 75 µS/cm) but going as
high as 40 µS/cm (e.g. from 70 to 100 µS/cm in February 2013).
The overall decreasing trend can be attributed to an increase in the water
supplied to all City West Water localities (including Keilor Lodge) from the Silvan
Reservoir. Since the water supplied from the Silvan Reservoir has a lower conductivity [3],
the conductivity levels decrease.
Figure 3: Probe temperature monthly averages and ranges from January 2013 to November
2014 measured in °C.
Temperature monthly averages have a strong cyclic pattern that is more prominent than
that of chlorine. The peaks and troughs are more clearly defined (generally one point), with
the 2013 and 2014 averages almost identical. The peaks occur during either February or
March (both considered hot months on average) and the troughs occur around July and
August (both considered cold months on average). This is to be expected, since December
through to March will, for the most part, have much hotter temperatures than June through
September. The range of monthly averages is from about 11.5 to 22.5°C. The range for each
month is about the same from one month to the next, with all ranges less than 5°C.
Figure 4: Probe chlorine concentration hourly averages measured in mg/L.
Chlorine hourly averages show a decrease of about 0.05 mg/L from 9pm to 3am.
This “dip” can be attributed to the chlorine dissipating while the water sits in the pipe
overnight. The subsequent rise is probably due to people waking up and getting ready to go
to work: once the water starts flowing in the pipe, it becomes chlorinated again, which
results in a rise in the chlorine concentration. Outside this “dip”, the mean is roughly
constant, with the range of values from about 9am to 9pm less than 0.01 mg/L
compared with the overall range of about 0.06 mg/L.
Figure 5: Probe conductivity hourly averages measured in µS/cm.
Conductivity hourly averages are roughly constant over a 24-hour period with minimal trend
and no obvious cycles. The range is from 68.3 to 70.6 µS/cm (about 2.3 µS/cm).
[Figure 4 plot: Hourly average chlorine concentration; x-axis: time of day (Midnight to 11pm); y-axis: Chlorine (mg/L), 0 to 0.2]
[Figure 5 plot: Hourly conductivity average; x-axis: time of day (Midnight to 11pm); y-axis: Conductivity (µS/cm), 0 to 120]
Figure 6: Probe temperature hourly averages measured in °C.
[Figure 6 plot: Hourly average temperature; x-axis: time of day (Midnight to 11pm); y-axis: Temperature (°C), 10 to 25]
Temperature hourly averages are virtually constant over 24 hours with no other important
features to mention.
5.1.2 – Manual data
Below is a table summarising some important statistical measures for the manual data.
                   Chlorine   Conductivity   Temperature
Mean               0.2033     76.5           16.7
Variance           0.0175     178.6          14.0
Minimum value      0          62             11.1
Maximum value      0.38       120            23.7
Table 2: Some important statistical values evaluated from the manual chlorine, conductivity
and temperature data.
Figure 7: Manual conductivity data from January 2013 to November 2014.
[Figure 7 plot: All manual conductivity measurements; x-axis: month, 2013 to 2014; y-axis: Conductivity (µS/cm), 0 to 120]
The conductivity data appear to have a roughly constant mean with larger variation during
the first six months of 2013 than any other period when the samples were taken. The large
variations were caused by fluctuations in water source. The range of values is from 62 to 120
µS/cm.
Next, the manual chlorine data are analysed. Since there are about 500 measurements,
averages of the data will be taken, and hence analysed, using the same process as that of the
probe data. Below is the graph showing the chlorine concentration monthly averages and
ranges.
Figure 8: Manual chlorine concentration monthly averages and ranges from January 2013 to
November 2014.
Figure 8 shows that there is seasonality in the data (yearly) with the peaks and troughs
lasting for several months in addition to no apparent overall trend. The overall range is from
0 to 0.65 mg/L with the range for each month about the same from one month to the next.
5.2 – Comparison of probe & manual data
Since the only manual data provided were for chlorine concentration and conductivity,
only these two parameters can be compared against the probe data. Below is a combined
graph of the manual and probe monthly averages for chlorine.
Figure 9: Manual & probe chlorine monthly averages from January 2013 to November 2014.
[Figure 9 plot: Monthly average chlorine concentrations (probe & manual); x-axis: month, 2013 to 2014; y-axis: Chlorine (mg/L), 0 to 0.3; series: Probe, Manual]
Besides the difference in means of the probe and manual chlorine monthly averages, both
series display similar features including seasonality, variation and overall shape. Thus, it
appears that, for longer-term trends, taking manual measurements on a daily basis
is just as good as inserting a probe and recording measurements every 5 minutes for
obtaining an overall picture of how the chlorine concentration is faring.
Since the manual samples of conductivity were taken fortnightly, a logical approach would
be to compare the fortnightly raw manual data against the fortnightly averages of the probe
data. However, this would be an unfair comparison, since raw values would be compared
against averages, and the averages would lack the variation that is present
in the raw data. Similarly, monthly averages of the manual conductivity data will have more
variation than those of the probe data, so in this case quarterly averages will be compared.
Below is a combined graph of the quarterly averages for the manual and probe conductivity
data (Figure 10).
Figure 10: Manual and probe conductivity quarterly averages from January 2013 to
November 2014.
[Figure 10 plot: Quarterly average conductivity (probe & manual); x-axis: March, June, September and December of 2013 and 2014; y-axis: Conductivity (µS/cm), 0 to 120; series: Probe, Manual]
The manual quarterly averages are marginally higher than the probe averages for the
corresponding quarters, so there is not a very large difference between the two methods when
observing the quarterly averages. Therefore, the manual conductivity data appear to give a
satisfactory overall picture over the 23-month period. Additional manual sampling information
for conductivity may need to be obtained for a better comparison with the probe data.
Hence, it appears that using manual samples to obtain chlorine concentrations and
conductivity gives just as good an indication about overall trend, mean and variation as that
of the probe.
5.3 – Statistical analysis
5.3.1 – Hypotheses and assumptions
The first stage of any statistical analysis is establishing the hypotheses that will be tested.
For this report, the hypotheses are:
H0: the variances of the probe and manual data are the same and therefore either sampling
method could be used
H1: the variances of the probe and manual data are different
where H0 and H1 are the null and alternative hypotheses, respectively.
Many statistical tests rely on assumptions that need to be satisfied
before the test can be suitably conducted. A test of variances requires
three assumptions: the samples need to be randomly selected,
independent and normally distributed.
5.3.2 – Skewness
Before deciding which test can be applied to the data, or determining whether the data are
normally distributed, an analysis of whether the data are skewed (positively, negatively or
not at all) should be conducted. This will help establish whether the data are normally
distributed or if a transformation is required. Figures 11 and 12 show histograms for the
manual chlorine and probe chlorine data respectively.
Figure 11: Histogram of the manual chlorine data, showing frequency of events.
[Figure 11 plot: Histogram of Chlorine manual; x-axis: Chlorine manual, 0.0 to 0.6; y-axis: Frequency, 0 to 40]
Figure 12: Histogram of the probe chlorine data, showing frequency of events.
From the above figures, it can be seen that both histograms are positively skewed (or skewed
to the right). This would suggest, even before conducting a test for normality, that neither
the manual nor probe chlorine data will be normally distributed.
The same observation can be made from the conductivity histograms (i.e. that the
manual and probe data are positively skewed) and hence these data sets will most likely not
follow a normal distribution.
For the conductivity histograms regarding the manual and probe data, see Appendix 2.
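The visual impression of skewness from a histogram can be supported by a numerical measure. The Python sketch below computes the moment coefficient of skewness, which is positive for right-skewed data and zero for symmetric data. It was not part of the analysis in this report, and the function name and sample values are hypothetical.

```python
def sample_skewness(values):
    """Moment coefficient of skewness: g1 = m3 / m2**1.5, using population moments."""
    n = len(values)
    centre = sum(values) / n
    m2 = sum((x - centre) ** 2 for x in values) / n
    m3 = sum((x - centre) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


# Hypothetical values, not the report's data.
right_skewed = [0.05, 0.08, 0.10, 0.10, 0.12, 0.15, 0.20, 0.45, 0.60]
symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]

print(sample_skewness(right_skewed))  # positive: long right tail
print(sample_skewness(symmetric))     # zero: symmetric about the mean
```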
5.3.3 – Testing for normality
Before a test that assumes normality can be applied, the data need to be tested for
normality at the 0.05 level of significance. If the p-value of the normality test is below 0.05,
the hypothesis that the data are normally distributed is rejected; otherwise there is
insufficient evidence to conclude that the data are not normally distributed. Since the
comparison is limited to probe and manual data for the chlorine and conductivity parameters
only, four tests for normality were undertaken. Below is the probability plot of the manual
chlorine data.
Figure 13: Probability plot of the manual chlorine data, testing for normality at the 0.05
level of significance.
The hypothesis that the data are normally distributed for the manual chlorine data is
rejected since the p-value is less than the 0.05 significance level. Thus, the manual chlorine
data are not normally distributed.
Moreover, when testing for normality for the probe chlorine data and both conductivity data
sets from manual and probe, similar results occurred with all of the remaining data sets not
passing the test for normality.
For the probability plots associated with each of these data sets, see Appendix 3.
Since the test for normality failed for the probe and manual data (for both chlorine and
conductivity), a 2-sample variance test could not be conducted, since this test assumes
that the data are normally distributed. Furthermore, since an objective is to determine
whether the probe and manual data have similar variation, non-parametric tests such as
the Mann-Whitney and Kruskal-Wallis tests are not suitable, because they do not test for a
difference in variance. Hence, a different method had to be found to test whether the probe
and manual data have statistically different variances.
To determine whether the data can be transformed, or if it can be fitted to a distribution,
the Individual Distribution Identification in Minitab was utilised to check if the data fitted a
known distribution, such as exponential, log, Weibull, etc. This was firstly conducted for the
manual chlorine data which resulted in several plots with one of these shown below in Figure
14.
Figure 14: Distribution identification plot attempting to fit a known distribution to the
manual chlorine data. Shown are four distributions with p-values all less than 0.05, meaning
none of the four distributions fit the manual chlorine data.
The extra distribution identification plots for the manual chlorine data can be found in
Appendix 4.
These additional distribution identification plots all had p-values less than 0.05 which meant
that no suitable transformation or distribution fitted the manual chlorine data. Hence, no
appropriate tests (i.e. tests that rely on the assumption that the data are normal) were
conducted. The same procedure was applied to the probe chlorine data in addition to the
conductivity manual and probe data, but achieved the same results except for the manual
conductivity data.
A transformation was found that fits the manual conductivity data (Figure 15): a
Box-Cox transformation with λ = -5, meaning the reciprocal was taken and then
raised to the fifth power. However, since the same transformation did not make the
probe conductivity data normally distributed, no single transformation or distribution
fits both conductivity data sets.
Figure 15: Distribution identification plot attempting to fit a known distribution to the
manual conductivity data.
Additional distribution plots for the probe chlorine data and conductivity manual and probe
data can be found in Appendix 4.
Therefore, an alternative method had to be developed in order for the data to be tested.
5.3.4 – Bootstrapping
After developing a suitable program in MATLAB (following the method outlined in section 4.2,
Bootstrapping), the sample variances for the chlorine and conductivity data (for both probe
and manual) could be calculated. The chlorine data will be discussed first, followed by the
conductivity data.
The average of the sample variances for probe chlorine was determined to be 0.004343 (mg/L)².
The 5th and 95th percentiles of the sample variances from the manual data were calculated as
0.015569 (mg/L)² and 0.019452 (mg/L)² respectively. Since there is no overlap of percentiles,
there is evidence to conclude that there is a difference between the variances of chlorine for
the probe and manual data.
Similarly, for the conductivity data the average of the sample variances from the probe was
determined as 18.79 (µS/cm)². Additionally, the 5th and 95th percentiles of the sample variances
from the manual data were calculated as 81.31 (µS/cm)² and 281.96 (µS/cm)² respectively. Since
there is no overlap of percentiles, there is evidence to conclude that there is a difference
between the variances of conductivity for the probe and manual data.
From the bootstrapping results, it can be established that there is a difference between the
variances of chlorine for the probe and manual data, and similarly for the conductivity
variances. For the manual data, the 5th and 95th percentiles of the variances are a few times
larger than the corresponding values for the probe data, which is attributed to the large
difference in sample size between the two methods.
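The comparisons reported in this section can be restated as a simple range check. The Python sketch below uses only the figures quoted above and tests whether the probe's average bootstrap variance lies inside the manual data's 5th to 95th percentile range; the function name is hypothetical.

```python
def inside(value, low, high):
    """True if value lies within the closed range [low, high]."""
    return low <= value <= high


# Figures quoted in section 5.3.4:
# (probe average variance, manual 5th percentile, manual 95th percentile).
reported = {
    "chlorine": (0.004343, 0.015569, 0.019452),
    "conductivity": (18.79, 81.31, 281.96),
}

for parameter, (probe_avg, manual_p5, manual_p95) in reported.items():
    overlaps = inside(probe_avg, manual_p5, manual_p95)
    ratio = manual_p5 / probe_avg
    print(f"{parameter}: overlap={overlaps}, "
          f"manual 5th percentile is {ratio:.1f}x the probe average")
```

For both parameters the probe average falls below the manual range, which is the basis for concluding that the variances differ.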
6. Conclusions
Statistically, the probe data has less variability than the manual data for both the chlorine
concentration and electrical conductivity. This is due to the large difference in sample size
between the two methods.
However, when observing the graphs which were comparing the probe and manual data,
there was not a “significant” difference between the two methods for either chlorine
concentration or electrical conductivity.
6.1 – Recommendations
Based on the results and the consequent conclusions, the first recommendation would be that
either method, probe or manual sampling, is acceptable for observing the long-term
characterisation of the water quality, even though the variances were found to be
significantly different.
Secondly, the cheaper option of manual sampling can be implemented instead of the probe.
However, if diurnal patterns are to be analysed, it is advised that City West
Water temporarily rent a probe to record these patterns.
6.2 – Further Analysis
The first option for further analysis would be to find a transformation, possibly using
a different statistical package such as SPSS, SAS or R, under which both the manual and
probe data sets can be statistically tested.
Further investigation could be conducted on grouping the probe data by month (i.e. January
2013 with January 2014, February 2013 with February 2014, etc.) to determine whether the
data can be transformed in this way.
An additional option for analysis could be to find a statistical method that could deal with
the large difference in sample size between the two methods.
Finally, more analysis could be conducted to establish whether randomly selecting 47
(equivalent to number of conductivity values) or 447 values (equivalent to number of
chlorine values) from the probe data produces different results. Furthermore, randomly
selecting without replacement could be implemented for the probe data only, in conjunction
with the method outlined in the previous sentence.
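The random-selection idea in the final paragraph can be sketched as follows. This is a hypothetical Python illustration (standard library only), not an analysis that was carried out: the function name, the number of repeats and the placeholder data are assumptions, and real probe readings would replace the placeholder list.

```python
import random
import statistics


def subsample_variances(probe_values, subsample_size, n_repeats=1000, rng=None):
    """Draw `subsample_size` probe values without replacement, `n_repeats`
    times, and return the sample variance of each draw."""
    if subsample_size > len(probe_values):
        raise ValueError("subsample_size exceeds the number of probe values")
    rng = rng or random.Random()
    return [statistics.variance(rng.sample(probe_values, subsample_size))
            for _ in range(n_repeats)]


if __name__ == "__main__":
    # Placeholder data only; these are not real probe measurements.
    placeholder_probe = [float(v) for v in range(1000)]
    # 47 matches the number of manual conductivity values quoted above.
    variances = subsample_variances(placeholder_probe, 47, n_repeats=5,
                                    rng=random.Random(0))
    print(len(variances))
```

Matching the subsample size to the manual sample size (47 or 447, as quoted above) would let the two sets of variances be compared on equal sample sizes.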
7. References
[1] City West Water 2016, Who We Are, viewed 9 May 2015,
<https://www.citywestwater.com.au/about_us/who_we_are.aspx>.
[2] City West Water 2016, Where We Fit in, viewed 22 May 2015,
<https://www.citywestwater.com.au/about_us/where_we_fit_in.aspx>.
[3] City West Water 2015, Drinking Water Quality Report 2015, City West Water,
Melbourne.
[4] KAPTA™ 3000-AC4, In-line Multi-parameter Water Sensor 2014, viewed 26 March 2016,
<http://www.endetec.com/endetec/ressources/files/1/20117,LIT-EN-032-02_KAPTA-3000-AC4-Produ.pdf>.
8. Appendix 1
MATLAB code for the improvised bootstrapping technique. The example code is for the
manual conductivity data.
clear;
% Improvised bootstrap for the 47 manual conductivity measurements
in = fopen('Conductivity manual data only.txt','r'); % Open the text file to read the values
Manualcond = fscanf(in, '%f', [1,inf]); % Read all the values in the text file
fclose(in); % Close the input file
Mcond = zeros(1000,47); % Create a 1000 x 47 matrix of zeros to hold the resamples
new_out = fopen('Conductivity manual random sample.txt','w'); % Create a new text file
for i = 1:1000
    for j = 1:47
        % Select a random value, with replacement, from the data every time
        % through the j loop (datasample requires the Statistics Toolbox)
        Mcond(i,j) = datasample(Manualcond,1);
        fprintf(new_out, '%4.2f ', Mcond(i,j)); % Store the selected value in the new file
    end
    fprintf(new_out, '\n'); % End the row for this resample
end
fclose(new_out); % Close the file
% Create a new text file to store the variances of the manual conductivity data
varfile = fopen('Variance is here Cond manual.txt','w');
% Calculate the sample variance of each row of Mcond, i.e. of each row written
% to "Conductivity manual random sample.txt"
varcalc = var(Mcond,0,2);
fprintf(varfile, '%20.10f', varcalc); % Store the variances in the new text file
fclose(varfile); % Close the file
9. Appendix 2
[Figure: Histogram of Conductivity manual. x-axis: Conductivity manual; y-axis: Frequency.]
[Figure: Histogram of Conductivity probe. x-axis: Conductivity probe; y-axis: Frequency.]
10. Appendix 3
11. Appendix 4
Manual chlorine probability plots, attempting to find an appropriate
transformation
All p-values in the above two probability plots are less than 0.05 which means that the
manual chlorine data doesn’t fit any of the above distributions.
From the above graph, since the p-value is less than 0.05, no appropriate Johnson
transformation can be applied to the data.
Manual conductivity probability plots, attempting to find an appropriate
transformation
Since Minitab found a sufficient transformation using the Box-Cox transformation,
there is no need to consider the Johnson transformation. Additionally, the other
transformations all had p-values less than 0.05, so they would in any case be
inappropriate for the manual conductivity data.
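For reference, the standard one-parameter Box-Cox transformation can be sketched as below. This is a generic Python illustration of the textbook definition, not Minitab's implementation, and the value of lambda that Minitab selected for the manual conductivity data is not given in this report; `lam` is therefore a hypothetical input.

```python
import math


def box_cox(values, lam):
    """Box-Cox transform of strictly positive values:
    (x**lam - 1) / lam for lam != 0, and ln(x) for lam == 0."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]
```

The transformed values would then be tested for normality in the same way as the original data.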
Probe conductivity probability plots, attempting to find an appropriate
transformation
All p-values in the above four probability plots are less than 0.05 which means that the
probe conductivity data doesn’t fit any of the above distributions.
Probe chlorine probability plots, attempting to find an appropriate
transformation
All p-values in the above three probability plots are less than 0.05, which means that the
probe chlorine data does not fit any of the above distributions.

  • 1. Page 1 of 31 Analysis of water quality in the Keilor Lodge locality and recommendations for a suitable sampling program Dean Iovenitti Industrial Application of Mathematics and Statistics 2 report RMIT University May 2016
  • 2. Page 2 of 31 With thanks to City West Water, the Operations Research division for their mathematical and statistical assistance with this project and to George Ruta for providing and clearly explaining information about the water quality processes that City West Water undertakes.
  • 3. Page 3 of 31 Table of Contents 1. Executive Summary .......................................................................................................... 4 2. Introduction ...................................................................................................................... 5 2.1 – City West Water & data description......................................................................... 5 2.2 – Aims ......................................................................................................................... 5 3. Background ....................................................................................................................... 6 3.1 – Water quality processes............................................................................................. 6 3.2 – KAPTA probe........................................................................................................... 6 3.3 – Cost of sampling ....................................................................................................... 6 3.4 – Issues with the data .................................................................................................. 7 4. Method.............................................................................................................................. 8 4.1 – Data preparation....................................................................................................... 8 4.2 – Bootstrapping ........................................................................................................... 8 5. Results .............................................................................................................................. 9 5.1 – Graphical results & comments .................................................................................. 9 5.1.1 – Probe data.......................................................................................................... 
9 5.1.2 – Manual data ......................................................................................................12 5.2 – Comparison of probe & manual data........................................................................13 5.3 – Statistical analysis....................................................................................................14 5.3.1 – Hypotheses and assumptions..............................................................................14 5.3.2 - Skewness ............................................................................................................15 5.3.3 – Testing for normality .........................................................................................16 5.3.4 – Bootstrapping ....................................................................................................18 6. Conclusions ......................................................................................................................20 6.1 – Recommendations ....................................................................................................20 6.2 – Further Analysis ......................................................................................................20 7. References ........................................................................................................................21 8. Appendix 1.......................................................................................................................22 9. Appendix 2.......................................................................................................................23 10. Appendix 3.....................................................................................................................24 11. Appendix 4.....................................................................................................................25
  • 4. Page 4 of 31 1. Executive Summary This report will provide an analysis of water quality data obtained from the Keilor Lodge area and put forth recommendations for a suitable sampling program. Probe data was obtained by VEOLIA Water and manual data was provided by ALS Laboratories over a period of 23 months from January 2013 to November 2014. The probe data in particular had not been previously analysed so therefore the client is interested in what is happening with the data and what the data is indicating such as if there are any unusual patterns in the data. All water quality indicators, such as chlorine and conductivity, were within the recommended guidelines. Graphical representation showed that chlorine concentrations and electrical conductivity levels were all within the recommended guidelines for both the probe and manual data. Analysis was performed using an improvised bootstrapping technique which led to the conclusion that there is not a statistical difference between the variances of the manual and probe data. However, graphically there was not a “significant” difference between the two methods. Recommendations include sampling using either method but if cost were a factor, then manual sampling would provide a cheaper option. If diurnal patterns were to be analysed, a probe can be temporarily inserted by CWW. Further analysis could be conducted on finding or developing a different method to deal with the large difference in sample sizes for the two methods.
  • 5. Page 5 of 31 2. Introduction 2.1 – City West Water & data description City West Water (CWW), one of three water retail businesses in metropolitan Melbourne, provide drinking water, sewerage, trade waste and recycled water services to customers in Melbourne’s CBD and western suburbs [1]. This involves developing conservation and contingency plans for water resource management and emergency situations [2]. The probe data was provided by VEOLIA Water and recorded values in a portion of the Keilor Lodge area for chlorine (in mg/L), electrical conductivity (in µS/cm, micro-Siemens per cm), temperature in °C and pressure in bar. These values were recorded from the 16th of January 2013 at 4:54pm to the 8th of November 2014 at 6:56am in 5 minute intervals culminating in a total of 185,232 measurements for each parameter. A probe can only detect one form of chlorine, namely hypochlorous acid (HOCl), in the water which is different from the chlorine values obtained via manual sampling (see next section). The manual data was supplied by ALS Laboratories and samples were taken for analysis for chlorine and electrical conductivity (or simply conductivity) with units of mg/L and µS/cm respectively. Samples were taken exclusively on weekdays; on average one chlorine sample per weekday, resulting in 526 measurements from 41 locations throughout the Keilor area. Conductivity samples were taken fortnightly resulting in 47 measurements from 29 locations in the Keilor area. This form of sampling measures for three different forms of chlorine including HOCl, OCl- and unreacted Cl2, resulting in a higher measured concentration than that of the probe. Hence when the comparison is made between the manual and probe data later in this report, this should be taken into consideration. Manual tests were undertaken by NATA accredited laboratory. 
2.2 – Aims The objective of the report is to determine characteristic (if any) features of chlorine concentration, temperature and conductivity from data in the Keilor Lodge area. The development of a suitable sampling program will also be investigated for City West Water to consider implementing.
  • 6. Page 6 of 31 3. Background 3.1 – Water quality processes City West Water undertakes sampling to determine the quality of the water that they are supplying to their customers. This includes taking bacterial samples (looking if E. coli is present which indicates faecal contamination and hence a health risk) in addition to physical samples (such as pH and colour) and chemical samples (such as l) [3]. The main characteristics that will be investigated in this report are chlorine concentration and electrical conductivity, in addition to temperature but to a lesser extent. Chlorine is an important feature because it is a form of disinfectant to defend against any microbes in the water (G. Ruta [City West Water, Victoria] 2016, pers. comm., 24 May 2016). Also, the chlorine concentration cannot be too low otherwise the microbes are not eliminated and conversely cannot be too high else the concentration could be above the acceptable guideline (i.e. toxic to human consumption). Electrical conductivity (or conductivity) measures for ions that conduct electricity in the water. This is an extremely sensitive measure so if there is a contamination, then a large spike (much greater than the “normal” values) will be measured. Temperature is not a significant characteristic but it is worthwhile to understand whether it displays any interesting patterns. 3.2 – KAPTA probe The KAPTATM 3000-AC4 probe is a battery operated device that was rented from VEOLIA Water for the 23-month period from January 2013 to November 2014. It can record chlorine concentration, conductivity, pressure and temperature simultaneously [4]. When inserting one of these probes, a small hole is drilled into the appropriate water pipe and then a rod inserted into the pipe so the data can be recorded. Any gaps are sealed so as to become waterproof with the probe being left untouched for the duration of the sampling (G. Ruta [City West Water, Victoria] 2016, pers. comm., 18 March 2016). 
3.3 – Cost of sampling The cost for one chlorine sample is $31. However, other measurements are taken at the same time that the chlorine is taken, but it will be assumed that it costs $31 to take a chlorine sample. For one conductivity sample, it costs $5. There were a total of 52 conductivity samples taken which cost a total of 52 × $5 = $260. 526 chlorine samples were taken which cost a total of 526 × $31 = $16,306. This results in a grand total of $16,566 for all the manual chlorine and conductivity samples. Renting a KAPTA probe costs approximately $900 per month (M. Ramov [City West Water, Victoria] 2016, pers. comm., 11 May 2016). A probe was inserted for about 23 months so that means the total cost of renting the probe was 23 × $900 = $20,700. Note that this records chlorine concentration, conductivity, pressure and temperature. Furthermore, the probe doesn’t cost less if it records less frequently however, this doesn’t necessarily mean that the battery life of the probe will be extended (M. Ramov [City West Water, Victoria] 2016, pers. comm., 11 May 2016).
  • 7. Page 7 of 31 3.4 – Issues with the data There were several relatively minor issues with the data. The first of these being that there were a few occasions when the probe failed to record measurements for a certain period of time. The most significant of these was a period of 7 days from April 30th to May 7th 2014 and to a lesser extent for a period of about 6 hours on the 27th of July 2014 from 8am to 2pm when no data was recorded. For the period of 7 days without data, the chlorine concentration increased by 0.06mg/L, temperature decreased by 0.3°C and conductivity remained the same. Considering only 2000 values (approximately) are missing from the probe data, little information is being lost as there are more than 180,000 values recorded from the probe. The times for when each manual sample was taken could have been obtained, but it would have required an extensive search on the part of the personnel in the water quality department of CWW. If these times were acquired, then the manual data times could be compared with those of the probe to determine whether the times when the manual samples were taken were a true reflection of that particular day or week. An interesting challenge was finding a method that could statistical test whether there is a difference between the probe and manual data. This was more difficult than was expected and with the help of the Operations Research team at CWW an improvised bootstrapping method was developed and then implemented in MATLAB (see Bootstrapping for more details).
  • 8. Page 8 of 31 4. Method
    4.1 – Data preparation
    The creation of pivot tables in Excel was an essential part of preparing the data for analysis, since these simplified the data so that the features of each parameter (chlorine, conductivity or temperature) could be determined. The resulting graphs are analysed in section 5.1, Graphical results & comments. Pressure was not analysed in this report, at CWW’s request (G. Ruta [City West Water, Victoria] 2016, pers. comm., 18 March 2016).
    4.2 – Bootstrapping
    Bootstrapping is a resampling technique that can be applied to data that are not normally distributed. In its classical form it tests for a difference in means between two samples. However, since the comparison between the probe and manual data concerns variation, it is more appropriate to test for a difference in variances rather than a difference in means. The reasons for using this method will be made apparent throughout this report.
    The following method is a variation of bootstrapping that tests for a difference in variances; it was modified by Stuart Roberts (Statistical Analyst in the Operations Research team at CWW). There are two samples, the manual and the probe data, which will be called M and P respectively. It is assumed that each sample accurately describes its population distribution. The bootstrapping method for variances is as follows.
    Step 1: From each of the samples M and P, one value is randomly selected x times, where x is the size of that sample. Sampling is with replacement, so the result is two new sample groups, one for M and one for P, each the same size as its original sample.
    Step 2: The sample variance of each new sample group is calculated and recorded, giving variances Sm and Sp for the manual and probe data respectively.
    Step 3: Steps 1 and 2 are repeated a large number of times, say 1000 times, giving 1000 values of each variance: Sm = {Sm1, Sm2, …, Sm1000} and Sp = {Sp1, Sp2, …, Sp1000}.
    Step 4: The 5th and 95th percentiles are calculated for both Sm and Sp.
    Step 5: Lastly, if the percentile ranges (Step 4) of Sm and Sp do not overlap, then there is sufficient evidence to conclude that the variances differ.
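Steps 1 to 5 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the MATLAB program used for the report: the function names are hypothetical, the percentile uses a simple nearest-rank rule (other definitions give slightly different values), and the two samples are made-up numbers rather than the CWW data.

```python
import math
import random


def sample_variance(values):
    """Unbiased sample variance (divides by n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


def percentile(sorted_values, q):
    """Nearest-rank percentile (q from 0 to 100) of an already sorted list."""
    rank = max(1, math.ceil(q * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def bootstrap_variance_range(sample, n_resamples=1000, rng=None):
    """Steps 1 to 4: resample with replacement, record each variance,
    then return the (5th, 95th) percentiles of the recorded variances."""
    rng = rng or random.Random()
    variances = []
    for _ in range(n_resamples):
        resample = rng.choices(sample, k=len(sample))  # with replacement
        variances.append(sample_variance(resample))
    variances.sort()
    return percentile(variances, 5), percentile(variances, 95)


def ranges_overlap(a, b):
    """Step 5: True if the closed intervals a and b share any point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Made-up samples for illustration only: a small, widely spread "manual"
# sample and a larger, tightly spread "probe" sample.
rng = random.Random(2016)
manual = [rng.gauss(76.5, 13.0) for _ in range(47)]
probe = [rng.gauss(69.0, 4.0) for _ in range(500)]

manual_range = bootstrap_variance_range(manual, rng=rng)
probe_range = bootstrap_variance_range(probe, rng=rng)
print("manual 5th/95th percentiles:", manual_range)
print("probe 5th/95th percentiles:", probe_range)
print("variances differ:", not ranges_overlap(manual_range, probe_range))
```

Using the 5th and 95th percentiles gives a 90% range for each bootstrapped variance; the non-overlap rule in Step 5 is a conservative way of comparing the two ranges.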
  • 9. Page 9 of 31 5. Results
    5.1 – Graphical results & comments
    5.1.1 – Probe data
    Below is a table summarising some important statistical measures for the raw probe data.
                       Chlorine    Conductivity    Temperature
      Mean             0.0903      69.0            16.7
      Variance         0.0043      18.8            14.0
      Minimum value    0           55              11.1
      Maximum value    0.38        110             23.7
    Table 1: Some important statistical measures evaluated from the probe chlorine, conductivity and temperature data.
    Firstly, the graphs of the probe monthly chlorine, conductivity and temperature data are shown in Figures 1, 2 and 3 respectively. The bars, lines and circles represent the range for that particular month.
    Figure 1: Probe chlorine concentration monthly averages and ranges from January 2013 to November 2014 measured in mg/L.
    The monthly averages of chlorine concentration (Figure 1) appear to have yearly seasonality, meaning the peaks are about 12 months apart. These peaks (and troughs) are not defined by a single observation; each lasts multiple months. Peak chlorine concentrations occur during the winter months and potentially a month either side of winter, whereas the troughs occur predominantly in the summer months and March. The ranges from January 2014 to May 2014 (except March) are reasonably large, with the smallest ranges occurring during the summer months. Additionally, the monthly averages range from about 0.025 to 0.18 mg/L, with the range for winter months generally larger than the range for summer months.
  • 10. Page 10 of 31 Figure 2: Probe conductivity monthly averages and ranges from January 2013 to November 2014 measured in µS/cm.
    The monthly conductivity averages (Figure 2, above) show a predominantly decreasing trend over the sampling period, except for the months from about January to April, where there is an increasing trend in both 2013 and 2014. Even though there is no particular peak or trough month, the data appear to be seasonal with a yearly (12-month) cycle. The overall range of the monthly averages is from 60.7 to 74.5 µS/cm, with ranges for each month predominantly less than about 20 µS/cm (e.g. from 70 to 75 µS/cm) but going as high as 40 µS/cm (e.g. from 70 to 100 µS/cm in February 2013). The overall decreasing trend can be attributed to an increase in the water supplied to all City West Water localities (including Keilor Lodge) from the Silvan Reservoir. Since the water supplied from the Silvan Reservoir has a lower conductivity [3], the conductivity levels decrease.
    Figure 3: Probe temperature monthly averages and ranges from January 2013 to November 2014 measured in °C.
    Temperature monthly averages have a strong cyclic pattern that is more prominent than that of chlorine. The peaks and troughs are more clearly defined (generally one point), with the 2013 and 2014 averages almost identical. The peaks occur during either February or March (both considered hot months on average) and the troughs occur around July and August (both considered cold months on average). This is to be expected since December
  • 11. Page 11 of 31 through to March will have, for the majority, much hotter temperatures than June through September, with the monthly averages ranging from about 11.5 to 22.5°C. The range for each month is about the same from one month to the next, with all ranges less than 5°C.
    Figure 4: Probe chlorine concentration hourly averages measured in mg/L.
    Chlorine hourly averages show a decrease of about 0.05 mg/L from 9pm to 3am. This “dip” can be attributed to the chlorine dissipating while the water sits in the pipe overnight. The subsequent rise is probably due to people waking up and getting ready to go to work: once the water starts flowing in the pipe it becomes chlorinated again, which results in a rise in the chlorine concentration. Outside this “dip” (from about 9am to 9pm, when there is no chlorine dissipation), the mean is roughly constant, with a range of less than 0.01 mg/L compared with the overall range of about 0.06 mg/L.
    Figure 5: Probe conductivity hourly averages measured in µS/cm.
    Conductivity hourly averages are roughly constant over a 24-hour period with minimal trend and no obvious cycles. The range is from 68.3 to 70.6 µS/cm (about 2.3 µS/cm).
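The hourly averages in Figures 4 to 6 were produced with Excel pivot tables. For readers without Excel, the same grouping can be sketched in Python as below. The function name and the three sample readings are hypothetical; the real probe export may use a different layout.

```python
from collections import defaultdict
from datetime import datetime


def hourly_averages(readings):
    """Group (timestamp, value) pairs by hour of day and average each group.

    Returns a dict mapping hour (0 to 23) to the mean value for that hour,
    pooled over all days in the data.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for timestamp, value in readings:
        sums[timestamp.hour] += value
        counts[timestamp.hour] += 1
    return {hour: sums[hour] / counts[hour] for hour in sorted(sums)}


# Made-up chlorine readings (mg/L) for illustration only.
example_readings = [
    (datetime(2014, 1, 1, 3, 0), 0.10),
    (datetime(2014, 1, 1, 3, 5), 0.20),
    (datetime(2014, 1, 2, 9, 0), 0.30),
]
example_averages = hourly_averages(example_readings)
print(example_averages)
```

Grouping by `timestamp.month` and `timestamp.year` instead of `timestamp.hour` would give monthly averages of the kind shown in Figures 1 to 3.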
  • 12. Page 12 of 31 Figure 6: Probe temperature hourly averages measured in °C.
    Temperature hourly averages are virtually constant over 24 hours, with no other important features to mention.
    5.1.2 – Manual data
    Below is a table summarising some important statistical measures for the manual data.
                       Chlorine    Conductivity    Temperature
      Mean             0.2033      76.5            16.7
      Variance         0.0175      178.6           14.0
      Minimum value    0           62              11.1
      Maximum value    0.38        120             23.7
    Table 2: Some important statistical values evaluated from the manual chlorine, conductivity and temperature data.
    Figure 7: Manual conductivity data from January 2013 to November 2014.
    The conductivity data appear to have a roughly constant mean, with larger variation during the first six months of 2013 than in any other period when samples were taken. The large variations were caused by fluctuations in water source. The values range from 62 to 120 µS/cm.
    Next, the manual chlorine data are analysed. Since there are about 500 measurements, averages of the data are taken and analysed using the same process as for the probe data. Below is the graph showing the chlorine concentration monthly averages and ranges.
  • 13. Page 13 of 31 Figure 8: Manual chlorine concentration monthly averages and ranges from January 2013 to November 2014.
    Figure 8 shows that there is yearly seasonality in the data, with the peaks and troughs lasting for several months, and no apparent overall trend. The overall range is from 0 to 0.65 mg/L, with the range for each month about the same from one month to the next.
    5.2 – Comparison of probe & manual data
    Since the only manual data provided were for chlorine concentration and conductivity, only these two parameters can be compared against the probe data. Below is a combined graph of the manual and probe monthly averages for chlorine.
    Figure 9: Manual & probe chlorine monthly averages from January 2013 to November 2014.
    Besides the difference in means of the probe and manual chlorine monthly averages, both series display similar features including seasonality, variation and overall shape. Thus, it
  • 14. Page 14 of 31 appears that, when looking at longer-term trends, taking manual measurements on a daily basis is just as good as inserting a probe and recording measurements every 5 minutes for obtaining an overall picture of how the chlorine concentration is faring.
    Since the manual samples of conductivity were taken fortnightly, a logical first step would be to compare the fortnightly raw manual data against the fortnightly averages of the probe data. However, this would be an unfair comparison, since raw values would be compared against averages, and the averages would lack the variation that is present in the raw data. Similarly, monthly averages of the manual conductivity data, each based on only about two samples, will have more variation than those of the probe data, so in this case quarterly averages are compared. Below is a combined graph of the quarterly averages for the manual and probe conductivity data (Figure 10).
    Figure 10: Manual and probe conductivity quarterly averages from January 2013 to November 2014.
    The manual quarterly averages are marginally higher than the probe averages for the corresponding quarters, so the difference between the two methods is not very large at the quarterly level. Therefore, the manual conductivity data appear to give a satisfactory overall picture over the 23-month period. Additional manual sampling information for conductivity may need to be obtained for a better comparison with the probe data.
    Hence, it appears that using manual samples to obtain chlorine concentrations and conductivity gives just as good an indication of overall trend, mean and variation as the probe.
    5.3 – Statistical analysis
    5.3.1 – Hypotheses and assumptions
    The first stage of any statistical analysis is establishing a hypothesis that will be tested. For this report, the hypotheses are:
    H0: the variances of the probe and manual data are the same, and therefore either sampling method could be used
    H1: the variances of the probe and manual data are different
  • 15. Page 15 of 31 Where H0 and H1 are the null and alternate hypotheses, respectively.
    Many statistical tests rely on assumptions that need to be satisfied before the test can be suitably conducted. If a test of variances is to be conducted, then three assumptions are required: the samples need to be randomly selected, independent and normally distributed.
    5.3.2 – Skewness
    Before deciding which test can be applied to the data, or determining whether the data are normally distributed, the data should first be checked for skewness (positive, negative or none). This helps establish whether the data are normally distributed or whether a transformation is required. Figures 11 and 12 show histograms for the manual chlorine and probe chlorine data respectively.
    Figure 11: Histogram of the manual chlorine data, showing frequency of events.
    Figure 12: Histogram of the probe chlorine data, showing frequency of events.
  • 16. Page 16 of 31 From the above figures, it can be seen that both histograms are positively skewed (skewed to the right). This suggests, even before conducting a test for normality, that neither the manual nor the probe chlorine data will be normally distributed. The same observation can be made from the conductivity histograms (i.e. the manual and probe data are positively skewed), and hence these data sets will most likely not follow a normal distribution. For the conductivity histograms of the manual and probe data, see Appendix 2.
    5.3.3 – Testing for normality
    Before a test that assumes normality can be applied, the data need to be tested for normality at the 0.05 level of significance. If the p-value of the normality test is less than 0.05, the hypothesis that the data are normally distributed is rejected; otherwise there is insufficient evidence to reject normality. Since the comparison is limited to probe and manual data for the chlorine and conductivity parameters only, four tests for normality were undertaken. Below is the probability plot of the manual chlorine data.
    Figure 13: Probability plot of the manual chlorine data, testing for normality at the 0.05 level of significance.
    The hypothesis that the manual chlorine data are normally distributed is rejected, since the p-value is less than the 0.05 significance level. Thus, the manual chlorine data are not normally distributed. Testing the probe chlorine data and both conductivity data sets (manual and probe) gave similar results: none of the remaining data sets passed the test for normality. For the probability plots associated with each of these data sets, see Appendix 3.
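The visual skewness check in section 5.3.2 can be complemented with a numerical one. The Python sketch below computes the moment coefficient of skewness (third central moment divided by the 1.5 power of the second); a positive value indicates a longer right tail. This is an illustration only: the function name is hypothetical, the example values are made up, and the report itself relied on histograms in Minitab.

```python
def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5.

    Positive for right-skewed data, negative for left-skewed data and
    zero for perfectly symmetric data.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Made-up chlorine-like values (mg/L): mostly low with one high reading.
right_skewed = [0.0, 0.05, 0.05, 0.1, 0.1, 0.15, 0.6]
symmetric = [1, 2, 3, 4, 5]

print(skewness(right_skewed))  # positive: long right tail
print(skewness(symmetric))     # zero: symmetric about the mean
```

A clearly positive coefficient points the same way as the histograms, but it is not a substitute for the formal normality test described in section 5.3.3.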
  • 17. Page 17 of 31 Since the test for normality failed for the probe and manual data (for both chlorine and conductivity), a 2-sample variance test could not be conducted, because this test assumes that the data are normally distributed. Furthermore, since an objective is to determine whether the probe and manual data have similar variation, non-parametric tests such as the Mann-Whitney and Kruskal-Wallis tests are unsuitable, as they do not test for a difference in variance. Hence, a different method had to be found to test whether the probe and manual data have statistically different variances.
    To determine whether the data could be transformed or fitted to a known distribution (such as exponential, log, Weibull, etc.), the Individual Distribution Identification tool in Minitab was used. This was first conducted for the manual chlorine data, which resulted in several plots, one of which is shown below in Figure 14.
    Figure 14: Distribution identification plot attempting to fit a known distribution to the manual chlorine data. Shown are four distributions with p-values all less than 0.05, meaning none of the four distributions fit the manual chlorine data.
    The remaining distribution identification plots for the manual chlorine data can be found in Appendix 4. These additional plots all had p-values less than 0.05, which meant that no suitable transformation or distribution fitted the manual chlorine data. Hence, no tests that rely on the assumption of normality were conducted. The same procedure was applied to the probe chlorine data and to the manual and probe conductivity data, with the same results except for the manual conductivity data.
  • 18. Page 18 of 31 A transformation was found that fits the manual conductivity data (Figure 15): a Box-Cox transformation with λ = -5, meaning the reciprocal of each value was taken and then raised to the fifth power. However, the same transformation did not make the probe conductivity data normally distributed, so no single transformation or distribution is appropriate for both conductivity data sets.
    Figure 15: Distribution identification plot attempting to fit a known distribution to the manual conductivity data.
    Additional distribution plots for the probe chlorine data and the manual and probe conductivity data can be found in Appendix 4. Therefore, an alternative method had to be developed in order for the data to be tested.
    5.3.4 – Bootstrapping
    After developing a suitable program in MATLAB (outlined in section 4.2, Bootstrapping, with the code in Appendix 1), the sample variances for the chlorine and conductivity data (for both probe and manual) could be calculated. The chlorine data are discussed first, followed by the conductivity data.
    The average of the sample variances for probe chlorine was determined to be 0.004343 (mg/L)². The 5th and 95th percentiles of the sample variances from the manual data were calculated as 0.015569 (mg/L)² and 0.019452 (mg/L)² respectively. Since there is no overlap of percentiles, there is evidence to conclude that there is a difference between the variances of chlorine for the probe and manual data.
    Similarly, for the conductivity data the average of the sample variances from the probe was determined as 18.79 (µS/cm)². Additionally, the 5th and 95th percentiles of the sample variances from the manual data were calculated as 81.31 (µS/cm)² and 281.96 (µS/cm)² respectively. Since
  • 19. Page 19 of 31 there is no overlap of percentiles, there is evidence to conclude that there is a difference between the variances of conductivity for the probe and manual data.
    From the bootstrapping results, it can be established that there is a difference between the variances of chlorine for the probe and manual data, and similarly for the conductivity variances. For the manual data, the 5th and 95th percentiles of the variances are a few times larger than those of the probe data, which is attributed to the large difference in sample size between the two methods.
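The size of the gap between the two methods can be read straight from the figures reported in section 5.3.4. The Python sketch below uses only those reported numbers. Because the probe percentiles themselves are not quoted, it uses the probe's average bootstrapped variance as a stand-in, which is an assumption made for this illustration; the dictionary keys are hypothetical names.

```python
# Values reported in section 5.3.4 (variances, in squared units).
reported = {
    "chlorine": {"probe_mean_var": 0.004343, "manual_p5": 0.015569, "manual_p95": 0.019452},
    "conductivity": {"probe_mean_var": 18.79, "manual_p5": 81.31, "manual_p95": 281.96},
}

summary = {}
for name, r in reported.items():
    # Does the probe's average variance fall inside the manual 5th-95th range?
    inside = r["manual_p5"] <= r["probe_mean_var"] <= r["manual_p95"]
    # How many times larger is the manual 5th percentile than the probe value?
    ratio = r["manual_p5"] / r["probe_mean_var"]
    summary[name] = (inside, ratio)
    print(f"{name}: probe value inside manual range = {inside}, ratio = {ratio:.1f}")
```

For both parameters the probe value sits below the manual 5th percentile by a factor of more than three, which is consistent with the "a few times larger" description above.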
  • 20. Page 20 of 31 6. Conclusions
    Statistically, the probe data have less variability than the manual data for both chlorine concentration and electrical conductivity. This is attributed to the large difference in sample size between the two methods. However, in the graphs comparing the probe and manual data, there was not a “significant” difference between the two methods for either chlorine concentration or electrical conductivity.
    6.1 – Recommendations
    Based on the results and the consequent conclusions, the first recommendation is that either method, probe or manual sampling, is acceptable for long-term characterisation of the water quality, even though the variances were found to be significantly different. Secondly, the cheaper option of manual sampling can be implemented instead of the probe. However, if diurnal patterns are to be analysed, it is advised that City West Water temporarily rent a probe to record these patterns.
    6.2 – Further Analysis
    The first option for further analysis would be to find a transformation, possibly using different statistical software such as SPSS, SAS or R, such that both the manual and probe data sets can be statistically tested. Further investigation could be conducted by grouping the probe data by month (i.e. January 2013 with January 2014, February 2013 with February 2014, etc.) and determining whether the grouped data can be transformed. An additional option would be to find a statistical method that can deal with the large difference in sample size between the two methods. Finally, more analysis could be conducted to establish whether randomly selecting 47 values (equivalent to the number of conductivity values) or 447 values (equivalent to the number of chlorine values) from the probe data produces different results. Furthermore, random selection without replacement could be applied to the probe data only, in conjunction with this subsampling approach.
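The last suggestion, drawing 47 or 447 values from the probe data without replacement, could be explored along the following lines. This Python sketch is one possible reading of that proposal, not an analysis that was carried out for the report; the function names are hypothetical and the "probe-like" data are randomly generated stand-ins.

```python
import random


def sample_variance(values):
    """Unbiased sample variance (divides by n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


def subsample_variances(data, size, n_repeats=1000, rng=None):
    """Repeatedly draw `size` values WITHOUT replacement from `data`
    and return the sample variance of each draw."""
    rng = rng or random.Random()
    return [sample_variance(rng.sample(data, size)) for _ in range(n_repeats)]


# Randomly generated stand-in for the probe conductivity series.
rng = random.Random(1)
probe_like = [rng.gauss(69.0, 4.3) for _ in range(5000)]

variances_47 = subsample_variances(probe_like, 47, rng=rng)     # matches manual conductivity count
variances_447 = subsample_variances(probe_like, 447, rng=rng)   # matches manual chlorine count

print("n = 47:  min %.2f, max %.2f" % (min(variances_47), max(variances_47)))
print("n = 447: min %.2f, max %.2f" % (min(variances_447), max(variances_447)))
```

Comparing the spread of variances at the two subsample sizes would indicate how much of the difference seen in section 5.3.4 depends on sample size alone.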
  • 21. Page 21 of 31 7. References
    [1] City West Water 2016, Who We Are, viewed 9 May 2015, <https://www.citywestwater.com.au/about_us/who_we_are.aspx>.
    [2] City West Water 2016, Where We Fit in, viewed 22 May 2015, <https://www.citywestwater.com.au/about_us/where_we_fit_in.aspx>.
    [3] City West Water 2015, Drinking Water Quality Report 2015, City West Water, Melbourne.
    [4] KAPTA™ 3000-AC4, In-line Multi-parameter Water Sensor 2014, viewed 26 March 2016, <http://www.endetec.com/endetec/ressources/files/1/20117,LIT-EN-032-02_KAPTA-3000-AC4-Produ.pdf>.
  • 22. Page 22 of 31 8. Appendix 1
    MATLAB code for the improvised bootstrapping technique. The example code is for the manual conductivity data.

    clear;
    % 47 manual conductivity measurements
    in = fopen('Conductivity manual data only.txt','r'); % Open the text file to read the values
    Manualcond = fscanf(in, '%f', [1,inf]); % Read all the values in the text file
    Mcond = zeros(1000,47); % Create a 1000 x 47 matrix of zeros (one row per resample)
    fclose(in); % Close the file
    new_out = fopen('Conductivity manual random sample.txt','w'); % Create a new text file
    for i = 1:1000
        for j = 1:47
            Mcond(i,j) = datasample(Manualcond,1); % Select a random value, with replacement, each time through the j loop
            fprintf(new_out, '%4.2f ', Mcond(i,j)); % Store the selected value in the newly created file
        end;
        fprintf(new_out, '\n'); % Start a new line after each resample
    end;
    fclose(new_out); % Close the file
    varfile = fopen('Variance is here Cond manual.txt','w'); % Create a new text file to store the variances of the manual conductivity data
    varcalc = var(Mcond,0,2); % Calculate the variance of each row of Mcond (the resamples written to "Conductivity manual random sample")
    fprintf(varfile, '%20.10f', varcalc); % Store the variances in the newly created text file
    fclose(varfile); % Close the file
  • 23. Page 23 of 31 9. Appendix 2
    [Figure: Histogram of Conductivity manual (frequency against manual conductivity)]
    [Figure: Histogram of Conductivity probe (frequency against probe conductivity)]
  • 24. Page 24 of 31 10. Appendix 3
  • 25. Page 25 of 31 11. Appendix 4 Manual chlorine probability plots, attempting to find an appropriate transformation All p-values in the above two probability plots are less than 0.05, which means that the manual chlorine data do not fit any of the above distributions.
  • 26. Page 26 of 31 From the above graph, since the p-value < 0.05, no appropriate Johnson transformation can be made to the data. Manual conductivity probability plots, attempting to find an appropriate transformation
  • 27. Page 27 of 31 Since Minitab found a sufficient transformation using the Box-Cox transformation, there is no need to consider the Johnson transformation. Additionally, the other transformations all had p-values less than 0.05, so they would in any case be inappropriate for the manual conductivity data.
  • 28. Page 28 of 31 Probe conductivity probability plots, attempting to find an appropriate transformation
  • 29. Page 29 of 31 All p-values in the above four probability plots are less than 0.05 which means that the probe conductivity data doesn’t fit any of the above distributions.
  • 30. Page 30 of 31 Probe chlorine probability plots, attempting to find an appropriate transformation
  • 31. Page 31 of 31 All p-values in the above three probability plots are less than 0.05, which means that the probe chlorine data do not fit any of the above distributions.