1. Page 1 of 31
Analysis of water quality in the Keilor
Lodge locality and recommendations
for a suitable sampling program
Dean Iovenitti
Industrial Application of Mathematics and Statistics 2 report
RMIT University
May 2016
With thanks to City West Water, to the Operations Research division for their mathematical
and statistical assistance with this project, and to George Ruta for providing and clearly
explaining information about the water quality processes that City West Water
undertakes.
Table of Contents
1. Executive Summary
2. Introduction
  2.1 – City West Water & data description
  2.2 – Aims
3. Background
  3.1 – Water quality processes
  3.2 – KAPTA probe
  3.3 – Cost of sampling
  3.4 – Issues with the data
4. Method
  4.1 – Data preparation
  4.2 – Bootstrapping
5. Results
  5.1 – Graphical results & comments
    5.1.1 – Probe data
    5.1.2 – Manual data
  5.2 – Comparison of probe & manual data
  5.3 – Statistical analysis
    5.3.1 – Hypotheses and assumptions
    5.3.2 – Skewness
    5.3.3 – Testing for normality
    5.3.4 – Bootstrapping
6. Conclusions
  6.1 – Recommendations
  6.2 – Further Analysis
7. References
8. Appendix 1
9. Appendix 2
10. Appendix 3
11. Appendix 4
1. Executive Summary
This report will provide an analysis of water quality data obtained from the Keilor Lodge
area and put forth recommendations for a suitable sampling program. Probe data was
obtained by VEOLIA Water and manual data was provided by ALS Laboratories over a
period of 23 months from January 2013 to November 2014.
The probe data in particular had not previously been analysed, so the client is interested in
what the data indicate, such as whether there are any unusual patterns.
Graphical representation showed that chlorine concentrations and electrical conductivity
levels were all within the recommended guidelines for both the probe and manual data.
Analysis was performed using an improvised bootstrapping technique, which led to the
conclusion that there is a statistical difference between the variances of the manual and
probe data. However, graphically there was not a “significant” difference between the two
methods.
Recommendations include sampling using either method but if cost were a factor, then
manual sampling would provide a cheaper option. If diurnal patterns were to be analysed, a
probe can be temporarily inserted by CWW.
Further analysis could be conducted on finding or developing a different method to deal with
the large difference in sample sizes for the two methods.
2. Introduction
2.1 – City West Water & data description
City West Water (CWW), one of three water retail businesses in metropolitan Melbourne,
provides drinking water, sewerage, trade waste and recycled water services to customers in
Melbourne’s CBD and western suburbs [1]. This involves developing conservation and
contingency plans for water resource management and emergency situations [2].
The probe data was provided by VEOLIA Water and recorded values in a portion of the
Keilor Lodge area for chlorine (in mg/L), electrical conductivity (in µS/cm, micro-Siemens
per cm), temperature (in °C) and pressure (in bar). These values were recorded from the 16th of
January 2013 at 4:54pm to the 8th of November 2014 at 6:56am in 5-minute intervals,
culminating in a total of 185,232 measurements for each parameter. The probe can only detect
one form of chlorine in the water, namely hypochlorous acid (HOCl), so its values differ from
the chlorine values obtained via manual sampling (see below).
The manual data was supplied by ALS Laboratories and samples were taken for analysis for
chlorine and electrical conductivity (or simply conductivity) with units of mg/L and µS/cm
respectively. Samples were taken exclusively on weekdays; on average one chlorine sample
per weekday, resulting in 526 measurements from 41 locations throughout the Keilor area.
Conductivity samples were taken fortnightly resulting in 47 measurements from 29 locations
in the Keilor area.
Manual sampling measures three different forms of chlorine, namely HOCl, OCl⁻
and unreacted Cl₂, resulting in a higher measured concentration than that of the probe.
This should be taken into consideration when the manual and probe data are compared
later in this report.
Manual tests were undertaken by a NATA-accredited laboratory.
2.2 – Aims
The objective of the report is to determine characteristic features (if any) of chlorine
concentration, temperature and conductivity from data in the Keilor Lodge area. The
development of a suitable sampling program will also be investigated for City West Water to
consider implementing.
3. Background
3.1 – Water quality processes
City West Water undertakes sampling to determine the quality of the water that it
supplies to its customers. This includes taking bacterial samples (checking whether E. coli is
present, which indicates faecal contamination and hence a health risk) in addition to physical
samples (such as pH and colour) and chemical samples (such as l) [3].
The main characteristics that will be investigated in this report are chlorine concentration
and electrical conductivity, and to a lesser extent temperature. Chlorine is
important because it is a disinfectant that defends against any microbes in the
water (G. Ruta [City West Water, Victoria] 2016, pers. comm., 24 May 2016). The
chlorine concentration cannot be too low, otherwise the microbes are not eliminated, and
conversely cannot be too high, otherwise the concentration could exceed the acceptable guideline
(i.e. become toxic to humans). Electrical conductivity (or conductivity) measures the
ions that conduct electricity in the water. This is an extremely sensitive measure, so if there
is a contamination, then a large spike (much greater than the “normal” values) will be
measured. Temperature is not a significant characteristic but it is worthwhile to understand
whether it displays any interesting patterns.
3.2 – KAPTA probe
The KAPTA™ 3000-AC4 probe is a battery-operated device that was rented from VEOLIA
Water for the 23-month period from January 2013 to November 2014. It can record chlorine
concentration, conductivity, pressure and temperature simultaneously [4]. To insert
one of these probes, a small hole is drilled into the appropriate water pipe and a rod is
inserted into the pipe so the data can be recorded. Any gaps are sealed to make the fitting
watertight, and the probe is left untouched for the duration of the sampling (G. Ruta
[City West Water, Victoria] 2016, pers. comm., 18 March 2016).
3.3 – Cost of sampling
The cost of one chlorine sample is $31. Other measurements are taken at the same
time as the chlorine sample, but for this report the full $31 is attributed to the chlorine
sample. One conductivity sample costs $5. A total of 52 conductivity samples were
taken, costing 52 × $5 = $260, and 526 chlorine samples were taken, costing
526 × $31 = $16,306. This results in a grand total of $16,566 for all the
manual chlorine and conductivity samples.
Renting a KAPTA probe costs approximately $900 per month (M. Ramov [City West
Water, Victoria] 2016, pers. comm., 11 May 2016). The probe was inserted for about 23
months, so the total cost of renting the probe was 23 × $900 = $20,700. Note
that the probe records chlorine concentration, conductivity, pressure and temperature.
Furthermore, the rental cost is not reduced if the probe records less frequently, and
recording less frequently does not necessarily extend the battery life of the probe (M. Ramov
[City West Water, Victoria] 2016, pers. comm., 11 May 2016).
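The cost comparison above can be reproduced with a few lines of MATLAB. This is only an arithmetic sketch using the unit costs and sample counts quoted in this section; the variable names are illustrative and are not taken from the project code.

```matlab
% Unit costs and counts quoted in Section 3.3 (variable names are illustrative)
chlorineCost      = 31;   % dollars per manual chlorine sample
conductivityCost  = 5;    % dollars per manual conductivity sample
nChlorine         = 526;  % manual chlorine samples
nConductivity     = 52;   % manual conductivity samples costed in this section
probeRentPerMonth = 900;  % approximate monthly rental of the KAPTA probe
nMonths           = 23;   % rental period in months

manualTotal = nChlorine*chlorineCost + nConductivity*conductivityCost;
probeTotal  = nMonths*probeRentPerMonth;

fprintf('Manual sampling: $%d\n', manualTotal);
fprintf('Probe rental:    $%d\n', probeTotal);
fprintf('Difference:      $%d\n', probeTotal - manualTotal);
```

On these figures the probe rental exceeds the manual sampling cost by $4,134 over the 23 months.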
3.4 – Issues with the data
There were several relatively minor issues with the data. The first was that there
were a few occasions when the probe failed to record measurements for a certain period of
time. The most significant of these was a period of 7 days from April 30th to May 7th 2014,
and to a lesser extent a period of about 6 hours on the 27th of July 2014 from 8am to
2pm, when no data was recorded. Across the 7-day gap, the chlorine
concentration increased by 0.06 mg/L, temperature decreased by 0.3°C and conductivity
remained the same. Considering that only about 2000 values are missing from the probe
data, little information is lost, as more than 180,000 values were recorded by the
probe.
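As a rough check on the "about 2000 missing values" figure, the two gaps can be converted to a count of readings. The sketch below assumes the probe would have recorded every 5 minutes throughout both gaps, as described in Section 2.1; the variable names are illustrative.

```matlab
perHour  = 60/5;                % readings per hour at 5-minute intervals
gapWeek  = 7*24*perHour;        % readings missed over the 7-day gap
gapHours = 6*perHour;           % readings missed over the 6-hour gap
missing  = gapWeek + gapHours;  % approximate number of missing readings
recorded = 185232;              % readings per parameter (Section 2.1)
fprintf('Missing: %d readings (%.1f%% of the recorded total)\n', ...
        missing, 100*missing/recorded);
```

Under that assumption the gaps amount to 2088 readings, or about 1.1% of the recorded total.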
The times for when each manual sample was taken could have been obtained, but it would
have required an extensive search on the part of the personnel in the water quality
department of CWW. If these times were acquired, then the manual data times could be
compared with those of the probe to determine whether the times when the manual samples
were taken were a true reflection of that particular day or week.
An interesting challenge was finding a method that could statistically test whether there is a
difference between the probe and manual data. This was more difficult than expected,
and with the help of the Operations Research team at CWW an improvised bootstrapping
method was developed and then implemented in MATLAB (see Section 4.2, Bootstrapping, for
more details).
4. Method
4.1 – Data preparation
The creation of pivot tables in Excel was an essential part of the preparation for analysis
since these helped to simplify the data such that the features of each particular parameter
(chlorine, conductivity or temperature) could be determined. The resulting graphs are
analysed in Section 5.1, Graphical results & comments. Pressure was not analysed in this
report, at CWW’s request (G. Ruta [City West Water,
Victoria] 2016, pers. comm., 18 March 2016).
4.2 – Bootstrapping
Bootstrapping is a resampling technique that can be applied to data that are not normally
distributed. In its classical or general form it tests for a difference in means between two
samples. However, since the comparison between the probe and manual data concerns their
variation, it is more appropriate to develop a method that tests for a difference in variances
rather than a difference in means. The reasons for using this method will be made apparent
throughout this report.
The following method is a variation of bootstrapping that tests for a difference in
variances. It was modified by Stuart Roberts (Statistical Analyst in the
Operations Research team at CWW).
There are two samples, the manual and the probe data, which will be called M
and P respectively for the purposes of this method. It will be assumed that the two samples
accurately describe the population distributions for M and P. Below is the bootstrapping
method for variances.
Step 1: From each of the samples M and P, one value is randomly selected n times, where n is
the size of that sample. Sampling is with replacement, which results in two new sample
groups, one for M and one for P, each the same size as its original sample.
Step 2: The sample variance of each new sample group is calculated and recorded,
giving variances Sm and Sp for the manual and probe data respectively.
Step 3: Steps 1 and 2 are repeated a large number of times, say 1000 times, so that there are
1000 values of both Sm and Sp with Sm = {Sm1, Sm2, …, Sm1000} and Sp = {Sp1, Sp2, …, Sp1000}.
Step 4: The 5th and 95th percentiles are calculated for both Sm and Sp.
Step 5: Lastly, if the percentile ranges (Step 4) of Sm and Sp do not overlap, then there is
sufficient evidence to conclude that there is a difference in variance.
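Steps 1 to 5 can be sketched in MATLAB as follows. This is a minimal illustration, not the code from Appendix 1: the two samples are synthetic stand-ins, all variable names are hypothetical, and the percentiles are taken by a simple nearest-rank rule on the sorted variances (an assumption made to avoid toolbox functions; the original program may compute them differently).

```matlab
rng(1);                          % fix the seed so the run is repeatable
M = 70 + 12*randn(47, 1);        % synthetic stand-in for the manual sample
P = 69 + 4*randn(5000, 1);       % synthetic stand-in for the probe sample
B = 1000;                        % number of bootstrap repetitions (Step 3)

Sm = zeros(B, 1);                % bootstrap variances, manual
Sp = zeros(B, 1);                % bootstrap variances, probe
for b = 1:B
    % Step 1: resample each group with replacement, keeping its original size
    mStar = M(randi(numel(M), numel(M), 1));
    pStar = P(randi(numel(P), numel(P), 1));
    % Step 2: record the sample variance of each resample
    Sm(b) = var(mStar);
    Sp(b) = var(pStar);
end

% Step 4: 5th and 95th percentiles (nearest-rank on the sorted values)
SmSorted = sort(Sm);
SpSorted = sort(Sp);
lo = ceil(0.05*B);               % rank of the 5th percentile
hi = ceil(0.95*B);               % rank of the 95th percentile
rangeM = [SmSorted(lo), SmSorted(hi)];
rangeP = [SpSorted(lo), SpSorted(hi)];

% Step 5: the ranges overlap unless one lies entirely above the other
overlap = rangeM(1) <= rangeP(2) && rangeP(1) <= rangeM(2);
fprintf('Manual: [%.2f, %.2f]\n', rangeM);
fprintf('Probe:  [%.2f, %.2f]\n', rangeP);
fprintf('Ranges overlap: %d\n', overlap);
```

Each group is resampled at its own size, so the much larger probe sample produces a far narrower percentile range than the manual sample.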
5. Results
5.1 – Graphical results & comments
5.1.1 – Probe data
Below is a table summarising some important statistical measures for the raw probe data.
Statistic        Chlorine   Conductivity   Temperature
Mean             0.0903     69.0           16.7
Variance         0.0043     18.8           14.0
Minimum value    0          55             11.1
Maximum value    0.38       110            23.7
Table 1: Some important statistical measures evaluated from the probe chlorine,
conductivity and temperature data.
Firstly, the graphs for the probe monthly chlorine, conductivity and temperature data are
shown in Figures 1, 2 and 3 respectively. The bars, lines and circles represent the range for
that particular month.
Figure 1: Probe chlorine concentration monthly averages and ranges from January 2013 to
November 2014 measured in mg/L.
The monthly averages of chlorine concentration (Figure 1) appear to have yearly seasonality,
meaning the peaks occur about 12 months apart. The peaks and troughs are not
defined by a single observation; each lasts multiple months. Peak
chlorine concentrations occur during the winter months, and potentially a month either side
of winter, whereas the troughs occur predominantly throughout the summer months and
March. The ranges from January 2014 to May 2014 (except March) are
reasonably large, with the smallest ranges occurring during the summer months. Additionally,
the monthly averages range from about 0.025 to 0.18 mg/L, with the range for winter months
generally larger than the range for the summer months.
Figure 2: Probe conductivity monthly averages and ranges from January 2013 to November
2014 measured in µS/cm.
The monthly conductivity averages (Figure 2, above) show a predominantly decreasing trend
over the sampling period except for the months from about January to April where there is
an increasing trend, for both 2013 and 2014. Even though there is no particular peak or
trough month, the data appears to be seasonal (yearly), meaning cycles of 12 months. The
overall range of the monthly averages is from 60.7 to 74.5 µS/cm, with ranges for each
month predominantly less than about 20 µS/cm (e.g. from 70 to 75 µS/cm) but going as
high as 40 µS/cm (e.g. from 70 to 100 µS/cm in February 2013).
The overall decreasing trend can be attributed to an increase in the water
supplied to all City West Water localities (including Keilor Lodge) from the Silvan
Reservoir. Since the water supplied from the Silvan Reservoir has a lower
conductivity [3], the conductivity levels decrease.
Figure 3: Probe temperature monthly averages and ranges from January 2013 to November
2014 measured in °C.
Temperature monthly averages have a strong cyclic pattern that is more prominent than
that of chlorine. The peaks and troughs are more clearly defined (generally one point), with
the 2013 and 2014 averages almost identical. The peaks occur during either February or
March (both considered hot months on average) and the troughs occur around July and
August (both considered cold months on average). This is to be expected since December
through to March will, for the most part, have much hotter temperatures than June through
September. The monthly averages range from about 11.5 to 22.5°C. The range within each
month is about the same from one month to the next, with all ranges less than 5°C.
Figure 4: Probe chlorine concentration hourly averages measured in mg/L.
Chlorine hourly averages show a decrease of about 0.05 mg/L from 9pm to 3am.
This “dip” can be attributed to the chlorine dissipating while the water sits in the pipe
overnight. The subsequent rise is probably due to people waking up and getting ready to go
to work: once the water starts flowing in the pipe it becomes chlorinated again, which results
in a rise in the chlorine concentration. Besides this “dip”, the mean is roughly constant (with
no chlorine dissipation), with the range of values from about 9am to 9pm less than 0.01 mg/L
compared with the overall range of about 0.06 mg/L.
Figure 5: Probe conductivity hourly averages measured in µS/cm.
Conductivity hourly averages are roughly constant over a 24-hour period with minimal trend
and no obvious cycles. The range is from 68.3 to 70.6 µS/cm (about 2.3 µS/cm).
[Figure 4 plot: hourly average chlorine concentration; x-axis: time of day (midnight to 11pm); y-axis: chlorine (mg/L), 0 to 0.2]
[Figure 5 plot: hourly conductivity average; x-axis: time of day (midnight to 11pm); y-axis: conductivity (µS/cm), 0 to 120]
Figure 6: Probe temperature hourly averages measured in °C.
Temperature hourly averages are virtually constant over 24 hours with no other important
features to mention.
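The hourly averages in Figures 4 to 6 were produced with Excel pivot tables. A comparable grouping can be sketched in MATLAB, assuming each reading carries a time stamp in minutes; the readings below are synthetic and the variable names are illustrative only.

```matlab
rng(2);                                       % repeatable synthetic data
nDays     = 3;                                % small synthetic example
minutes   = (0:5:nDays*24*60 - 5)';           % minutes since midnight of day 1
hourOfDay = mod(floor(minutes/60), 24);       % 0..23 for every reading
chlorine  = 0.09 + 0.01*randn(numel(minutes), 1);   % synthetic readings (mg/L)

% Mean of all readings sharing the same hour of day (indices must start at 1)
hourlyMean = accumarray(hourOfDay + 1, chlorine, [24 1], @mean);
```

The same pattern gives monthly averages if the grouping index is the month number instead of the hour of day.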
5.1.2 – Manual data
Below is a table summarising some important statistical measures for the manual data.
Statistic        Chlorine   Conductivity   Temperature
Mean             0.2033     76.5           16.7
Variance         0.0175     178.6          14.0
Minimum value    0          62             11.1
Maximum value    0.38       120            23.7
Table 2: Some important statistical values evaluated from the manual chlorine, conductivity
and temperature data.
Figure 7: Manual conductivity data from January 2013 to November 2014.
The conductivity data appear to have a roughly constant mean with larger variation during
the first six months of 2013 than any other period when the samples were taken. The large
variations were caused by fluctuations in water source. The range of values is from 62 to 120
µS/cm.
Next, the manual chlorine data are analysed. Since there are about 500 measurements,
averages of the data will be taken, and hence analysed, using the same process as that of the
probe data. Below is the graph showing the chlorine concentration monthly averages and
ranges.
[Figure 6 plot: hourly average temperature; x-axis: time of day (midnight to 11pm); y-axis: temperature (°C), 10 to 25]
[Figure 7 plot: all manual conductivity measurements; x-axis: month, 2013 to 2014; y-axis: conductivity (µS/cm), 0 to 120]
Figure 8: Manual chlorine concentration monthly averages and ranges from January 2013 to
November 2014.
Figure 8 shows that there is seasonality in the data (yearly) with the peaks and troughs
lasting for several months in addition to no apparent overall trend. The overall range is from
0 to 0.65 mg/L with the range for each month about the same from one month to the next.
5.2 – Comparison of probe & manual data
Since the only manual data provided was for chlorine concentrations and conductivity, then
only these two parameters can be compared against the probe data. Below is a combined
graph of the manual and probe monthly averages for chlorine.
Figure 9: Manual & probe chlorine monthly averages from January 2013 to November 2014.
[Figure 9 plot: monthly average chlorine concentrations (probe & manual); x-axis: month, 2013 to 2014; y-axis: chlorine (mg/L), 0 to 0.3]
Besides the difference in means of the probe and manual chlorine monthly averages, both
series display similar features including seasonality, variation and overall shape. Thus, it
appears that for longer-term trends, taking manual measurements on a daily basis
is just as good as inserting a probe and recording measurements every 5 minutes for
obtaining an overall picture of how the chlorine concentration is faring.
Since the manual samples of conductivity were taken fortnightly, a natural approach would
be to compare the fortnightly raw manual data against the fortnightly averages of the probe
data. However, this would be an unfair comparison since raw values would be compared
against averages, and the averages would lack the variation that is present
in the raw data. Similarly, monthly averages of the manual conductivity data, each based on
only about two samples, will have more variation than those of the probe data, so for this
case quarterly averages will be compared.
Below is a combined graph of the quarterly averages for the manual and probe conductivity
data (Figure 10).
Figure 10: Manual and probe conductivity quarterly averages from January 2013 to
November 2014.
The manual quarterly averages are marginally higher than the probe averages for the
corresponding quarters, so there is not a very large difference between the two methods
when observing the quarterly averages. Therefore, the manual conductivity data appear to
give a satisfactory overall picture over the 23-month period. Additional manual sampling
information for conductivity may need to be obtained for a better comparison with the probe
data.
Hence, it appears that using manual samples to obtain chlorine concentrations and
conductivity gives just as good an indication about overall trend, mean and variation as that
of the probe.
5.3 – Statistical analysis
5.3.1 – Hypotheses and assumptions
The first stage of any statistical analysis is establishing a hypothesis that will be tested. So
for this report, the hypotheses are:
H0: the variances of the probe and manual data are the same and therefore either sampling
method could be used
H1: the variances of the probe and manual data are different
where H0 and H1 are the null and alternative hypotheses, respectively.
Many statistical tests rely on assumptions that need to be satisfied
before the test can be suitably conducted. If a test of variances is to be conducted, then
three assumptions are required: the samples need to be randomly selected,
independent and normally distributed.
5.3.2 – Skewness
Before deciding which test can be applied to the data or determining whether the data are
normally distributed, an analysis of whether the data are skewed (either positively, negatively
or not at all) should be conducted. This will help establish whether the data are normally
distributed or if a transformation is required. Figures 11 and 12 show histograms for the
manual chlorine and probe chlorine data respectively.
Figure 11: Histogram of the manual chlorine data, showing frequency of events.
Figure 12: Histogram of the probe chlorine data, showing frequency of events.
[Figure 11 plot: histogram of manual chlorine; x-axis: chlorine manual, 0.0 to 0.6; y-axis: frequency, 0 to 40]
From the above figures, it can be seen that both histograms are positively skewed (or skewed
to the right). This would suggest, even before conducting a test for normality, that neither
the manual nor probe chlorine data will be normally distributed.
The same conclusion can be drawn from the conductivity histograms (i.e. that the
manual and probe data are positively skewed) and hence these data sets will most likely not
follow a normal distribution.
For the conductivity histograms regarding the manual and probe data, see Appendix 2.
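The skewness seen in the histograms can also be quantified numerically. The sketch below computes the moment coefficient of skewness by hand; the values are made up for illustration and are not taken from the Keilor Lodge data, and this calculation is not part of the original analysis.

```matlab
x  = [0.02 0.05 0.05 0.08 0.10 0.12 0.15 0.20 0.35 0.60]';  % made-up values (mg/L)
d  = x - mean(x);         % deviations from the mean
m2 = mean(d.^2);          % second central moment
m3 = mean(d.^3);          % third central moment
g1 = m3 / m2^1.5;         % moment coefficient of skewness
fprintf('Skewness = %.2f\n', g1);
```

A positive value indicates a longer right tail, which is the pattern described for both chlorine histograms.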
5.3.3 – Testing for normality
Before a parametric test of variances can be applied, the data need to be checked for
normality at the 0.05 level of significance. If the p-value of the normality test is less than
0.05, the hypothesis of normality is rejected; otherwise there is insufficient evidence to
conclude that the data are not normally distributed. Since the comparison is limited
to probe and manual data for the chlorine and conductivity parameters only, four tests for
normality were undertaken. Below is the probability plot of the manual chlorine data.
Figure 13: Probability plot of the manual chlorine data, testing for normality at the 0.05
level of significance.
The hypothesis that the data are normally distributed for the manual chlorine data is
rejected since the p-value is less than the 0.05 significance level. Thus, the manual chlorine
data are not normally distributed.
Moreover, when testing for normality for the probe chlorine data and both conductivity data
sets from manual and probe, similar results occurred with all of the remaining data sets not
passing the test for normality.
For the probability plots associated with each of these data sets, see Appendix 3.
Since the test for normality failed for the probe and manual data (for both chlorine and
conductivity), a 2-sample variance test could not be conducted because this test assumes
that the data are normally distributed. Furthermore, since an objective is to determine
whether the probe and manual data have similar variation, non-parametric tests such as
the Mann-Whitney and Kruskal-Wallis tests are unsuitable because they do not test for a
difference in variance. Hence, a different method had to be found to test whether the probe
and manual data have statistically different variances.
To determine whether the data can be transformed, or fitted to a distribution,
the Individual Distribution Identification in Minitab was utilised to check if the data fitted a
known distribution, such as exponential, log, Weibull, etc. This was firstly conducted for the
manual chlorine data, which resulted in several plots, one of which is shown below in Figure
14.
Figure 14: Distribution identification plot attempting to fit a known distribution to the
manual chlorine data. Shown are four distributions with p-values all less than 0.05, meaning
none of the four distributions fit the manual chlorine data.
The extra distribution identification plots for the manual chlorine data can be found in
Appendix 4.
These additional distribution identification plots all had p-values less than 0.05, which meant
that no suitable transformation or distribution fitted the manual chlorine data. Hence, no
parametric tests (i.e. tests that rely on the assumption that the data are normal) could be
conducted. The same procedure was applied to the probe chlorine data and to the
conductivity manual and probe data, with the same results except for the manual
conductivity data.
A transformation was found that fits the manual conductivity data (Figure 15), namely a
Box-Cox transformation with λ = -5, meaning the reciprocal was taken and then the data
was raised to the fifth power. However, since the same transformation did not make the
probe conductivity data normally distributed, no single transformation or distribution is
appropriate for both conductivity data sets.
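For reference, the transformation described above can be written in one line of MATLAB. The sketch assumes the simple power form of the Box-Cox transformation, y = x^λ, which matches the description of taking the reciprocal and raising it to the fifth power; the values are made up for illustration.

```matlab
x      = [62 70 75 80 95 120]';   % made-up conductivity values (µS/cm)
lambda = -5;                      % Box-Cox parameter reported for the manual data
y      = x.^lambda;               % equivalent to (1./x).^5
```

Because λ is negative, the transformation reverses the ordering of the values, so the largest conductivity readings become the smallest transformed values.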
Figure 15: Distribution identification plot attempting to fit a known distribution to the
manual conductivity data.
Additional distribution plots for the probe chlorine data and conductivity manual and probe
data can be found in Appendix 4.
Therefore, an alternative method had to be developed in order for the data to be tested.
5.3.4 – Bootstrapping
After developing a suitable program in MATLAB (outlined in Section 4.2, Bootstrapping), the
sample variances for the chlorine and conductivity data (for both probe and manual) could
be calculated. The chlorine data will be discussed first, followed by the
conductivity data.
The average of the sample variances for probe chlorine was determined to be 0.004343 (mg/L)².
The 5th and 95th percentiles of the sample variances from the manual data were calculated as
0.015569 (mg/L)² and 0.019452 (mg/L)² respectively. Since the probe variance lies well below
the manual percentile range, so that there is no overlap of percentiles,
there is evidence to conclude that there is a difference between the variances of chlorine for
the probe and manual data.
Similarly, for the conductivity data the average of the sample variances from the probe was
determined as 18.79 (µS/cm)². Additionally, the 5th and 95th percentiles of the sample variances
from the manual data were calculated as 81.31 (µS/cm)² and 281.96 (µS/cm)² respectively. Since
there is no overlap of percentiles, there is evidence to conclude that there is a difference
between the variances of conductivity for the probe and manual data.
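The comparison can be checked directly from the figures reported above. The sketch below only restates those reported values and tests whether the probe variance falls inside the manual percentile range; the variable names are illustrative.

```matlab
% Values reported in Section 5.3.4
probeVarChlorine = 0.004343;              % mean bootstrap variance, probe chlorine
manualChlorine   = [0.015569 0.019452];   % 5th and 95th percentiles, manual chlorine
probeVarConduct  = 18.79;                 % mean bootstrap variance, probe conductivity
manualConduct    = [81.31 281.96];        % 5th and 95th percentiles, manual conductivity

% Does the probe variance fall inside the manual percentile range?
insideChlorine = probeVarChlorine >= manualChlorine(1) && ...
                 probeVarChlorine <= manualChlorine(2);
insideConduct  = probeVarConduct >= manualConduct(1) && ...
                 probeVarConduct <= manualConduct(2);

% How many times larger the manual percentiles are than the probe variance
ratioChlorine = manualChlorine / probeVarChlorine;
ratioConduct  = manualConduct  / probeVarConduct;
fprintf('Chlorine ratios:     %.1f to %.1f\n', ratioChlorine);
fprintf('Conductivity ratios: %.1f to %.1f\n', ratioConduct);
```

Both flags come out false, and the manual percentiles are about 3.6 to 4.5 times the probe variance for chlorine and about 4.3 to 15.0 times for conductivity.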
From the bootstrapping results, it can be established that there is a difference between the
variances of chlorine for the probe and manual data, and similarly for the conductivity
variances. For the manual data, the 5th and 95th percentiles of the variances are several times
larger than those of the probe data, which is attributed to the large difference in sample
size between the two methods.
6. Conclusions
Statistically, the probe data has less variability than the manual data for both the chlorine
concentration and electrical conductivity. This is due to the large difference in sample size
between the two methods.
However, when observing the graphs which were comparing the probe and manual data,
there was not a “significant” difference between the two methods for either chlorine
concentration or electrical conductivity.
6.1 – Recommendations
Based on the results and the consequent conclusions, the first recommendation would be that
either method, probe or manual sampling, is acceptable for observing the long-term
characterisation of the water quality, even though the variances were found to be
significantly different.
Secondly, the cheaper option of manual sampling can be implemented instead of the probe.
However, if diurnal patterns were to be analysed, then City West
Water could temporarily rent a probe to record these patterns.
6.2 – Further Analysis
The first option for further analysis would be to find some transformation, possibly utilising
a different statistical software such as SPSS, SAS or R, such that both the manual and
probe data sets can be statistically tested.
Further investigation could be conducted on grouping the probe data by month (i.e. January
2013 with January 2014, February 2013 with February 2014, etc.) and determining whether the
data can be transformed in this way.
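As a rough illustration of the month grouping suggested above, the MATLAB sketch below pools readings by calendar month so that January 2013 and January 2014 fall in the same group. The timestamps t and readings probeCl are hypothetical synthetic placeholders (hourly values over two years), since the layout of the probe export is not described here; only the grouping step is shown, not any subsequent transformation.

```matlab
% Sketch only: synthetic hourly readings, NOT the probe data.
rng(3);                                             % repeatable placeholder data
t = (datetime(2013,1,1):hours(1):datetime(2014,12,31,23,0,0))';  % hypothetical timestamps
probeCl = 0.5 + 0.05*randn(numel(t),1);             % hypothetical chlorine readings

m = month(t);                                       % 1..12; the year is ignored, so
                                                    % Jan 2013 and Jan 2014 share group 1
monthN   = accumarray(m, 1, [12 1]);                % number of readings in each month
monthVar = accumarray(m, probeCl, [12 1], @var);    % sample variance within each month

disp(table((1:12)', monthN, monthVar, ...
    'VariableNames', {'Month','Count','Variance'}));
```

Each monthly group could then be examined separately, for example with the same probability plots used in Appendix 4.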
An additional option for analysis could be to find a statistical method that could deal with
the large difference in sample size between the two methods.
Finally, more analysis could be conducted to establish whether randomly selecting 47
(equivalent to number of conductivity values) or 447 values (equivalent to number of
chlorine values) from the probe data produces different results. Furthermore, randomly
selecting without replacement could be implemented for the probe data only, in conjunction
with the method outlined in the previous sentence.
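One way the subsampling idea above might be explored is sketched below in MATLAB. The vector probeCond is a hypothetical synthetic placeholder for the probe conductivity record, and the number of repeats (1000) is an assumption chosen to mirror the bootstrap in Appendix 1. Each repeat draws 47 distinct probe values, so selection is without replacement within a repeat.

```matlab
% Sketch only: synthetic placeholder data, NOT the probe record.
rng(2);                                    % repeatable placeholder data
probeCond = 80 + 4*randn(1, 100000);       % hypothetical probe conductivity values

nManual  = 47;                             % size of the manual conductivity sample
nRepeats = 1000;                           % assumed number of repeated subsamples
subVar   = zeros(nRepeats, 1);             % variance of each subsample

for k = 1:nRepeats
    % randperm(n, k) returns k distinct indices, i.e. sampling without replacement
    pick = randperm(numel(probeCond), nManual);
    subVar(k) = var(probeCond(pick));
end

fprintf('Mean subsample variance: %.4f\n', mean(subVar));
```

Setting nManual to 447 gives the corresponding sketch for the chlorine data. The resulting variances come from samples of the same size as the manual data, which makes the two sets of variances more directly comparable.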
7. References
[1] City West Water 2016, Who We Are, viewed 9 May 2015,
<https://www.citywestwater.com.au/about_us/who_we_are.aspx>.
[2] City West Water 2016, Where We Fit in, viewed 22 May 2015,
<https://www.citywestwater.com.au/about_us/where_we_fit_in.aspx>.
[3] City West Water 2015, Drinking Water Quality Report 2015, City West Water,
Melbourne.
[4] KAPTA™ 3000-AC4, In-line Multi-parameter Water Sensor 2014, viewed 26 March
2016, <http://www.endetec.com/endetec/ressources/files/1/20117,LIT-EN-032-02_KAPTA-3000-AC4-Produ.pdf>.
8. Appendix 1
MATLAB code for the improvised bootstrapping technique. The example code is for the
manual conductivity data.

clear;
% 47 manual conductivity measurements
% Open the text file, read all the values and close the file
in = fopen('Conductivity manual data only.txt','r');
Manualcond = fscanf(in, '%f', [1,inf]);
fclose(in);

% Create a 1000 x 47 matrix of zeroes; each row will hold one resample
Mcond = zeros(1000,47);

% Create a new text file to store the resamples
new_out = fopen('Conductivity manual random sample.txt','w');
for i = 1:1000
    for j = 1:47
        % Select a random value, with replacement, from the data every time
        % through the j loop
        Mcond(i,j) = datasample(Manualcond,1);
        % Store the selected value in the newly created file
        fprintf(new_out, '%4.2f ', Mcond(i,j));
    end
    fprintf(new_out, '\n'); % Start a new line after each resample of 47 values
end
fclose(new_out);

% Create a new text file to store the variances of the manual conductivity data
varfile = fopen('Variance is here Cond manual.txt','w');
% Calculate the variance of each row (each resample) stored in the text file
% "Conductivity manual random sample"
varcalc = var(Mcond,0,2);
fprintf(varfile, '%20.10f', varcalc); % Store the variances in the new file
fclose(varfile);
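For comparison, the same resampling could be written without the nested loops. The sketch below is not the code used for the report; it is an assumed vectorised equivalent in which Manualcond is replaced by a hypothetical placeholder vector so that the example runs on its own. It uses randi to draw all the indices at once instead of calling datasample once per value.

```matlab
% Sketch only: placeholder data standing in for the 47 manual measurements.
rng(4);                              % repeatable placeholder data
Manualcond = 60 + 60*rand(1,47);     % hypothetical 1 x 47 vector of conductivity values

B = 1000;                            % number of bootstrap resamples
n = numel(Manualcond);               % each resample has the original sample size (47)

idx     = randi(n, B, n);            % B x n indices, drawn uniformly with replacement
Mcond   = Manualcond(idx);           % B x n matrix; each row is one resample
varcalc = var(Mcond, 0, 2);          % B x 1 vector of row-wise sample variances

fprintf('5th and 95th percentiles: %.4f, %.4f\n', prctile(varcalc, [5 95]));
```

Indexing a vector with a matrix of indices returns a matrix of the same shape as the index matrix, so every row of Mcond is an independent resample, as in the loop version.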
9. Appendix 2
[Figure: Histogram of Conductivity manual. x-axis: Conductivity manual; y-axis: Frequency]
[Figure: Histogram of Conductivity probe. x-axis: Conductivity probe; y-axis: Frequency]
11. Appendix 4
Manual chlorine probability plots, attempting to find an appropriate
transformation
All p-values in the above two probability plots are less than 0.05 which means that the
manual chlorine data doesn’t fit any of the above distributions.
From the above graph, since the p-value is less than 0.05, no appropriate Johnson
transformation can be made to the data.
Manual conductivity probability plots, attempting to find an appropriate
transformation
Since Minitab found a sufficient transformation using the Box-Cox transformation,
there is no need to consider the Johnson transformation. Additionally, the other
transformations all had p-values less than 0.05, so they would be inappropriate
for the manual conductivity data.
Probe conductivity probability plots, attempting to find an appropriate
transformation
All p-values in the above four probability plots are less than 0.05 which means that the
probe conductivity data doesn’t fit any of the above distributions.
Probe chlorine probability plots, attempting to find an appropriate
transformation
All p-values in the above three probability plots are less than 0.05 which means that the
probe chlorine data doesn’t fit any of the above distributions.