World Bank & Government of The Netherlands funded
Training module # WQ - 48
Applied Statistics
New Delhi, September 2000
CSMRS Building, 4th Floor, Olof Palme Marg, Hauz Khas,
New Delhi – 11 00 16 India
Tel: 68 61 681 / 84 Fax: (+ 91 11) 68 61 685
E-Mail: dhvdelft@del2.vsnl.net.in
DHV Consultants BV & DELFT HYDRAULICS
with
HALCROW, TAHAL, CES, ORG & JPS
HP Training Module File: “ 48 Applied Statistics.doc” Version: September 2000 Page 1
Table of contents
1. Module context
2. Module profile
3. Session plan
4. Overhead/flipchart master
5. Evaluation sheets
6. Handout
7. Additional handout
8. Main text
1. Module context
This module discusses statistical procedures that chemists commonly use to evaluate the
precision and accuracy of analytical results. Prior training in ‘Basic Statistics’,
module no. 47, or equivalent is necessary to complete this module satisfactorily.
While designing a training course, the relationship between this module and the others
should be maintained by keeping them close together in the syllabus and placing them in a
logical sequence. The actual selection of the topics and the depth of training will, of
course, depend on the training needs of the participants, i.e. their knowledge level and skills
performance at the start of the course.
2. Module profile
Title : Applied Statistics
Target group : HIS function(s): Q2, Q3, Q5, Q6
Duration : One session of 90 min
Objectives : After the training the participants will be able to:
• Apply common statistical tests for evaluation of the precision of data
Key concepts : • Confidence interval
• Regression analysis
Training methods : Lecture, exercises, OHS
Training tools required : Board, flipchart
Handouts : As provided in this module
Further reading and references :
• ‘Analytical Chemistry’, Douglas A. Skoog and Donald M. West,
Saunders College Publishing, 1986 (Chapter 3)
• ‘Statistical Procedures for Analysis of Environmental Monitoring
Data and Risk Assessment’, Edward A. McBean and Frank A.
Rovers, Prentice Hall, 1998.
3. Session plan
No | Activities | Time | Tools
1 | Preparations | |
2 | Introduction | 10 min | OHS
• Need for statistical procedures
• Scope of the module
3 | Confidence interval | 30 min | OHS
• Definition
• Significance of standard deviation and its correct estimation
• When population SD is known
• When population SD is not known
4 | Rejection of data | 20 min | OHS
• Caution in rejection
• Procedures
5 | Regression analysis | 20 min | OHS
6 | Wrap up | 10 min |
4. Overhead/flipchart master
OHS format guidelines
Type of text | Style | Setting
Headings | OHS-Title | Arial 30-36, with bottom border line (not: underline)
Text | OHS-lev1, OHS-lev2 | Arial 24-26, maximum two levels
Case | | Sentence case. Avoid full text in UPPERCASE.
Italics | | Use occasionally and in a consistent way
Listings | OHS-lev1, OHS-lev1-Numbered | Big bullets. Numbers for definite series of steps. Avoid roman numbers and letters.
Colours | | None, as these get lost in photocopying and some colours do not reproduce at all.
Formulas/Equations | OHS-Equation | Use of a table will ease horizontal alignment over more lines (columns). Use equation editor for advanced formatting only.
Applied Statistics
• Confidence intervals
• Number of replicate observations
• Rejection of data
• Regression analysis
Confidence Intervals (1)
• Population mean µ is always unknown
• Sample mean x̄
- x̄ → µ only for a large number of replicates
• Estimation of limits for µ from x̄
(x̄ – a) < µ < (x̄ + a)
• Value of a depends on
- confidence level (probability of occurrence)
- standard deviation
• Difference between the limits is the confidence interval
Confidence Intervals
Figure 1. Normal distribution curve and areas under curve for different distances
Confidence Intervals (2)
• When s is a good estimate of σ
- for a single observation, (x – zσ) < µ < (x + zσ)
Confidence level, % 50 90 95 99 99.7
z 0.67 1.64 1.96 2.58 3
- as z (and hence the interval) increases, the confidence level also increases
Example (1)
• Mercury concentration in fish was determined as 1.8 µg/kg.
Calculate limits for µ for 95% confidence level if σ = 0.1 µg/kg
- For 95% confidence level z = 1.96
- Limits for mean
1.8 – 1.96 x 0.1 < µ < 1.8 + 1.96 x 0.1
1.6 < µ < 2.0
• What are the limits for 99.7% confidence level?
Confidence Intervals (3)
• Confidence interval when the mean, x̄, of n replicates is available
x̄ – zσ/√n < µ < x̄ + zσ/√n
• Confidence interval is reduced compared to a single measurement as n increases
n 1 2 3 4 9 16
√n 1 1.4 1.7 2 3 4
• Analyse 2 to 4 replicates; more replicates give diminishing returns
Example (2)
• Average mercury conc. from 3 replicates was 1.67 µg/kg
• Calculate limits for 95% confidence level, σ = 0.1 µg/kg
1.67 – 1.96 x 0.1/√3 < µ < 1.67 + 1.96 x 0.1/√3
1.56 < µ < 1.78
Example (3)
• How many replicate analyses will be required to decrease the
95% confidence interval to ± 0.07, σ = 0.1?
Interval = ± zσ/√n
± 0.07 = ± 1.96 x 0.1/√n
n = 7.8
• 8 measurements will give slightly better than 95% chance for µ
to be within x̄ ± 0.07
Confidence Intervals (4)
• When σ is unknown
• Sample s based on n measurements is available
x̄ – ts/√n < µ < x̄ + ts/√n
Degrees of freedom 1 4 10 ∞
t95 12.7 2.78 2.23 1.96
t99 63.7 4.60 3.17 2.58
• Note t → z as degrees of freedom increase to ∞
Example (4)
• Groundwater samples were analysed for TOC: n = 5, x̄ = 11.77
mg/L, s = 3.2 mg/L
• Find the 95% confidence limits for the true mean
- Degrees of freedom = n – 1 = 5 – 1 = 4
- t95 = 2.78 (from table)
11.77 – 2.78 x 3.2/√5 < µ < 11.77 + 2.78 x 3.2/√5
7.79 < µ < 15.75
Detection of Outliers
• Extreme values, very high or low, will influence calculated
statistics
• Rejection
- sound basis
- may be due to an unrecorded event
- errors in transcription, reading of instruments, inconsistent
methodology, etc.
- otherwise apply statistical yardsticks
Q Test
• Qexp = (difference between suspect value and its nearest neighbour) / (spread of entire set)
• Reject if Qexp > Qcrit
Total no. of observations 3 5 7 10
Qcrit 90% 0.94 0.64 0.51 0.41
Qcrit 96% 0.98 0.73 0.59 0.48
• Other criteria are also available
Regression Analysis
• Quantification of relationships between variables
- used for predictions
- calibration of instruments
• Least-squares method
- linear relationship, y = a + bx
- deviations due to random errors
Calibration of GC
xi = pesticide conc., µg/L; yi = peak area, cm²
xi yi xi² yi² xiyi
0.352 1.09 0.12390 1.1881 0.38368
0.803 1.78 0.64481 3.1684 1.42934
1.08 2.60 1.16640 6.7600 2.80800
1.38 3.03 1.90440 9.1809 4.18140
1.75 4.01 3.06250 16.0801 7.01750
∑ 5.365 12.51 6.90201 36.3775 15.81992
Sxx = ∑xi² – (∑xi)²/n = 1.145365
Syy = ∑yi² – (∑yi)²/n = 5.07748
Sxy = ∑xiyi – ∑xi ∑yi/n = 2.39669
b = Sxy/Sxx = 2.09
a = ȳ – bx̄ = 0.26
Figure 2 - Calibration Curve (peak area, cm², against concentration of pesticide, µg/L)
5. Evaluation sheets
6. Handout
Add copy of Main text in chapter 8, for all participants.
7. Additional handout
These handouts are distributed during delivery and contain test questions, answers to
questions, special worksheets, optional information, and other matters you would not like to
be seen in the regular handouts.
It is a good practice to pre-punch these additional handouts, so the participants can easily
insert them in the main handout folder.
8. Main text
Contents
1. Confidence Intervals
2. Detection of Data Outliers
3. Regression Analysis
Applied Statistics
For small data sets, there is always the question as to how well the data represent the
population and what useful information can be inferred regarding the population
characteristics. Statistical calculations are used for this purpose. This module deals with four
such applications:
1. Definition of an interval around the mean of a data set within which the true mean can
be expected to be found with a particular degree of probability.
2. Determination of the number of measurements needed to obtain the true mean within a
predetermined interval around the experimental mean with a particular degree of
probability.
3. Determining if an outlying value in a set of replicate measurements should be retained in
calculating the mean for the set.
4. The fitting of a straight line to a set of experimental points.
1. Confidence Intervals
The true mean of a population, µ, is always unknown. However, limits can be set about the
experimentally determined mean, x̄, within which the true mean may be expected to occur
with a given degree of probability. These bounds are called confidence limits and the interval
between these limits, the confidence interval.
For a given set of data, if the confidence interval about the mean is made wide, the
probability that the true mean falls within the limits is large. On the other hand, when the
limits are set close to the mean, the probability that the true mean falls within the limits
becomes smaller or, in other words, there is a larger chance that it lies outside the limits.
Usually confidence intervals are calculated for a probability of 95%.
For a given probability of occurrence, the confidence limits depend on the value of the
standard deviation, s, of the observed data and the certainty with which this quantity can be
taken to represent the true standard deviation, σ, of the population.
1.1 Confidence limits where σ is known
When the number of repetitive observations is 20 or more, the standard deviation of the
observed set of data, s, can be taken to be a good approximation of the standard deviation
of the population, σ.
However, it may not always be possible to perform such a large number of repetitive
analyses, particularly when costly and time consuming extraction and analytical procedures
are involved. In such cases, data from previous records or different laboratories may be
pooled, provided that identical precautions and analytical steps are followed in each case.
Further, care should be taken that the medium sampled is also similar, for example, analysis
results of groundwater samples having high TDS should not be pooled with the results of
surface waters, which usually have low TDS.
As discussed in the previous module, most randomly varying data can be approximated by a
normal distribution curve. For a normal distribution, 68% of the observations lie between the
limits µ ± σ, 96% between the limits µ ± 2σ, 99.7% between the limits µ ± 3σ, etc. (Figure 1).
Figure 1. Normal distribution curve and areas under curve for different distances
For a single observation, xi, the confidence limits for the population mean are given by:
confidence limit for µ = xi ± zσ (1)
where z assumes different values depending upon the desired confidence level as given in
Table 1.
Table 1: Values of z for various confidence levels
Confidence level, % z
50 0.67
68 1.00
80 1.29
90 1.64
95 1.96
96 2.00
99 2.58
99.7 3.00
99.9 3.29
Example 1:
Mercury concentration in the sample of a fish was determined to be 1.80 µg/kg. Calculate
the 50% and 95% confidence limits for this observation. Based on previous analysis records,
it is known that the standard deviation of such observations, following similar analysis
procedures, is 0.1 µg/kg and it closely represents the population standard deviation, σ.
From Table 1, it is seen that z = 0.67 and 1.96 for the two confidence levels in question.
Upon substitution in Equation 1, we find that
50% confidence limit = 1.80 ± 0.67 x 0.1 = 1.80 ± 0.07 µg/kg
95% confidence limit = 1.80 ± 1.96 x 0.1 = 1.80 ± 0.20 µg/kg
Therefore the chances are 50 in 100 that the true mean lies between 1.73 and 1.87 µg/kg,
and 95 in 100 that it lies within the wider limits of 1.60 and 2.00 µg/kg.
Equation 1 applies to the result of a single measurement. If n replicate observations are
made and their average, x̄, is taken, the confidence interval decreases. In such a case the
limits are given by:
confidence limit for µ = x̄ ± zσ/√n (2)
Example 2:
Calculate the confidence limits for the problem of Example 1, if three samples of the fish
were analysed yielding an average of 1.67 µg/kg Hg.
Substitution in Equation 2 gives:
50% confidence limit = 1.67 ± 0.67 x 0.1/ √3 = 1.67 ± 0.04 µg/kg
95% confidence limit = 1.67 ± 1.96 x 0.1/ √3 = 1.67 ± 0.11 µg/kg
For the same odds the confidence limits are now substantially smaller and the result can be
said to be more accurate and probably more useful.
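Equations 1 and 2 can be checked with a few lines of Python. The following is a minimal sketch, not part of the original module: the function name z_confidence_limits and the dictionary Z_TABLE are illustrative, and the z values are simply copied from Table 1.

```python
import math

# z values copied from Table 1 (confidence level in % -> z)
Z_TABLE = {50: 0.67, 68: 1.00, 80: 1.29, 90: 1.64, 95: 1.96,
           96: 2.00, 99: 2.58, 99.7: 3.00, 99.9: 3.29}


def z_confidence_limits(mean, sigma, n=1, level=95):
    """Return (lower, upper) confidence limits for the true mean.

    Implements Equation 2; with n = 1 it reduces to Equation 1.
    sigma is assumed to be a good estimate of the population SD.
    """
    half_width = Z_TABLE[level] * sigma / math.sqrt(n)
    return mean - half_width, mean + half_width


# Example 1: single observation of 1.80 µg/kg, sigma = 0.1 µg/kg
print(z_confidence_limits(1.80, 0.1, n=1, level=95))

# Example 2: mean of three replicates, 1.67 µg/kg
print(z_confidence_limits(1.67, 0.1, n=3, level=95))
```

Calling the function with level=50 gives the 50% limits of the two examples in the same way.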
Note that Equation 2 indicates that, compared to a single observation, the confidence
interval can be halved by increasing the number of analyses to 4 (√4 = 2). Increasing the
number of measurements beyond 4 does not decrease the confidence interval proportionately.
To narrow the interval to one fourth, 16 measurements would be required (√16 = 4), thus
giving diminishing returns. Consequently, 2 to 4 replicate measurements are made in most cases.
Equation 2 can also be used to find the number of replicate measurements required such
that with a given probability the true mean would be found within a predetermined interval.
This is illustrated in Example 3.
Example 3:
How many replicate measurements of the specimen in Example 1 would be needed to
decrease the 95% confidence interval to ±0.07 µg/kg?
Substituting for confidence interval in Equation 2:
0.07 = 1.96 x 0.1/√n
√n = 1.96 x 0.1/0.07 = 2.8, n = 7.8
Thus, 8 measurements will provide slightly better than 95% chance of the true mean lying
within ±0.07 µg/kg of the experimental mean.
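The rearrangement of Equation 2 used in Example 3 can be sketched as follows. The function name replicates_needed is illustrative only; the calculation assumes, as in the example, that σ is known.

```python
import math


def replicates_needed(z, sigma, half_width):
    """Solve z*sigma/sqrt(n) = half_width for n (Equation 2 rearranged).

    Returns the exact (fractional) n and the next whole number of
    replicates, since a fraction of an analysis cannot be made.
    """
    n_exact = (z * sigma / half_width) ** 2
    return n_exact, math.ceil(n_exact)


# Example 3: 95% confidence (z = 1.96), sigma = 0.1, target interval ±0.07
n_exact, n_whole = replicates_needed(z=1.96, sigma=0.1, half_width=0.07)
print(round(n_exact, 1), n_whole)
```

Rounding up rather than to the nearest integer keeps the achieved interval no wider than the target.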
1.2 Confidence limits where σ is unknown
When the number of individual measurements in a set of data is small, the reproducibility of
the calculated value of the standard deviation, s, is decreased. Therefore, for a given
probability, the confidence interval must be larger under these circumstances.
To account for the potential variability in s, the confidence limits are calculated using the
statistical parameter t:
confidence limit for µ = x̄ ± ts/√n (3)
In contrast to z in Equation 2, t depends not only on the desired confidence level, but also
upon the number of degrees of freedom available in the calculation of s. Table 2 provides
values for t for various degrees of freedom and confidence levels. Note that the values of t
become equal to those for z (Table 1) as the number of degrees of freedom becomes
infinite.
Table 2: Values of t for various confidence levels and degrees of freedom
Degrees of freedom t (90%) t (95%) t (99%)
1 6.31 12.7 63.7
2 2.92 4.30 9.92
3 2.35 3.18 5.84
4 2.13 2.78 4.60
5 2.02 2.57 4.03
6 1.94 2.45 3.71
8 1.86 2.31 3.36
10 1.81 2.23 3.17
12 1.78 2.18 3.06
13 1.77 2.16 3.01
14 1.76 2.14 2.98
∞ 1.64 1.96 2.58
Example 4:
A chemist obtained the following data for the concentration of total organic carbon (TOC) in
groundwater samples: 13.55, 6.39, 13.81, 11.20, 13.88 mg/L. Calculate the 95% confidence
limits for the mean of the data.
From the given data, n = 5, x̄ = 11.77 mg/L and s = 3.2 mg/L. For degrees of freedom = 5 – 1 = 4
and 95% confidence level, t from Table 2 = 2.78. Therefore, from Equation 3:
confidence limit for µ = x̄ ± ts/√n = 11.77 ± 2.78 x 3.2/√5
= 11.77 ± 3.98 mg/L
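The calculation in Example 4 can be reproduced from the raw data with Python's standard library. This is a sketch under stated assumptions: the name t_confidence_limits and the dictionary T95 are illustrative, and the t values are copied from the 95% column of Table 2 rather than computed.

```python
import math
import statistics

# 95% t values copied from Table 2 (degrees of freedom -> t)
T95 = {1: 12.7, 2: 4.30, 3: 3.18, 4: 2.78, 5: 2.57,
       6: 2.45, 8: 2.31, 10: 2.23}


def t_confidence_limits(data):
    """Return (mean, s, half_width) for the 95% limits of Equation 3."""
    n = len(data)
    mean = statistics.mean(data)
    s = statistics.stdev(data)      # sample SD, n - 1 in the denominator
    t = T95[n - 1]                  # degrees of freedom = n - 1
    half_width = t * s / math.sqrt(n)
    return mean, s, half_width


# TOC results from Example 4, mg/L
toc = [13.55, 6.39, 13.81, 11.20, 13.88]
mean, s, half_width = t_confidence_limits(toc)
print(f"{mean:.2f} ± {half_width:.2f} mg/L (s = {s:.2f} mg/L)")
```

Because s is carried unrounded, the last digit of the half-width may differ slightly from a hand calculation that rounds s to 3.2 first.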
2. Detection of Data Outliers
Data outliers are extreme (high or low) values that diverge widely from the main body of the
data set. The presence of one or more outliers may greatly influence any calculated statistics
and yield biased results. However, there is also the possibility that the outlier is a legitimate
member of the data set. Outlier detection tests are to determine whether there is sufficient
statistical evidence to conclude that an observation appears extreme and does not belong to
the data set and should be rejected.
Data outliers may result from faulty instruments, errors in transcription, misreading of
instruments, inconsistent methodology of sampling and analysis, and so on. These aspects
should be investigated and, if any such reason can be assigned to an outlier, the value
may be safely deleted from consideration. However, it should be kept in mind that the
suspect data may rightfully belong to the set and may be the consequence of an unrecorded
event, such as a short rainfall, intrusion of sea water or a spill.
Table 3: Critical values for rejection quotient, Qcrit (reject if Qexp > Qcrit)
Number of observations Qcrit (90% confidence) Qcrit (96% confidence)
3 0.94 0.98
4 0.76 0.85
5 0.64 0.73
6 0.56 0.64
7 0.51 0.59
8 0.47 0.54
9 0.44 0.51
10 0.41 0.48
Of the numerous statistical criteria available for detection of outliers, the commonly used
Q test is discussed here. To apply the Q test, the difference between the questionable result
and its closest neighbour is divided by the spread of the entire set. The resulting ratio, Qexp,
is compared with the rejection value, Qcrit, given in Table 3 for the number of observations
and the chosen degree of confidence. If Qexp is larger than Qcrit, a statistical basis for
rejection exists. The table shows only some selected values. Any standard statistical analysis
book may be consulted for a complete set.
Example 5:
Fluoride concentrations in a well were measured as 2.77, 2.80, 2.90, 2.92,
3.45, 3.95, 4.44, 4.61, 5.21, 7.46. Use the Q test to examine whether the highest value is an
outlier.
Qexp = (7.46 – 5.21) / (7.46 – 2.77) = 2.25/4.69 = 0.48
Since 0.48 is larger than 0.41, the Qcrit value for 10 observations at 90% confidence, there
is a basis for excluding the value at that confidence level. At 96% confidence, Qexp only
just equals Qcrit (0.48), so the case for rejection is marginal.
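The Q test described above is easy to script. The sketch below is illustrative only: the function name q_experimental and the two dictionaries are not from the module, and the critical values are copied from Table 3.

```python
# Critical values copied from Table 3 (number of observations -> Qcrit)
Q_CRIT_90 = {3: 0.94, 4: 0.76, 5: 0.64, 6: 0.56,
             7: 0.51, 8: 0.47, 9: 0.44, 10: 0.41}
Q_CRIT_96 = {3: 0.98, 4: 0.85, 5: 0.73, 6: 0.64,
             7: 0.59, 8: 0.54, 9: 0.51, 10: 0.48}


def q_experimental(values, suspect="high"):
    """Gap between the suspect value and its nearest neighbour,
    divided by the spread (largest minus smallest) of the whole set."""
    ordered = sorted(values)
    spread = ordered[-1] - ordered[0]
    if suspect == "high":
        gap = ordered[-1] - ordered[-2]
    else:
        gap = ordered[1] - ordered[0]
    return gap / spread


# Fluoride data from Example 5; the highest value is the suspect
fluoride = [2.77, 2.80, 2.90, 2.92, 3.45, 3.95, 4.44, 4.61, 5.21, 7.46]
q_exp = q_experimental(fluoride, suspect="high")
n = len(fluoride)
print(f"Qexp = {q_exp:.3f}")
print("reject at 90%:", q_exp > Q_CRIT_90[n])
print("reject at 96%:", q_exp > Q_CRIT_96[n])
```

The function returns the unrounded ratio so that borderline cases are compared with Qcrit before any rounding takes place.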
Other criteria that can be used to evaluate an apparent outlier are:
- Plotting of a scatter diagram indicating inter-parameter relationship between two
constituents. If there is a correlation between the two constituents, an outlier would lie a
significant distance from the general trend.
- Applying test for normal distribution by calculating the mean and standard deviation of all
the data and determining if the extreme value is outside the mean ±3 times the standard
deviation limit. If so, it is indeed an unusual value. (A cumulative normal probability plot
may also be used for this test.)
- When the data set is not large, calculating and comparing the standard deviation, with
and without the suspect observations. A suspect value which has a considerable
influence on the calculated standard deviation suggests an outlier.
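Two of the criteria listed above, the mean ± 3 standard deviations check and the comparison of standard deviations with and without the suspect value, can be sketched as follows. The function names are illustrative, and the fluoride data of Example 5 are reused purely as an example.

```python
import statistics


def three_sd_check(values, suspect):
    """Return (mean, sd, outside), where outside is True if the suspect
    lies beyond mean ± 3 SD. Mean and SD are computed from all the data,
    including the suspect value itself."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    outside = abs(suspect - mean) > 3 * sd
    return mean, sd, outside


def sd_with_and_without(values, suspect):
    """Return the sample SD of the full set and of the set with one
    occurrence of the suspect value removed."""
    remaining = list(values)
    remaining.remove(suspect)
    return statistics.stdev(values), statistics.stdev(remaining)


fluoride = [2.77, 2.80, 2.90, 2.92, 3.45, 3.95, 4.44, 4.61, 5.21, 7.46]
mean, sd, outside = three_sd_check(fluoride, 7.46)
sd_all, sd_without = sd_with_and_without(fluoride, 7.46)
print(f"mean = {mean:.2f}, SD = {sd:.2f}, outside 3 SD: {outside}")
print(f"SD with suspect = {sd_all:.2f}, without = {sd_without:.2f}")
```

For a small data set a single extreme value inflates the standard deviation used in the 3 SD check, so the two criteria need not agree; this is one reason several criteria are usually considered together.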
3. Regression Analysis
In assessing environmental quality, it is often of interest to quantify the relationship
between two or more variables. This may allow filling of missing data for one constituent and
may also help in predicting future levels of a constituent.
Regression analysis is focused on determining the degree to which one or more variables
depend on another variable, the independent variable. Thus regression is a means of
calibrating the coefficients of a predictive equation. In correlation analysis neither of the
variables is identified as more important than the other. Correlation is not causation; it
provides a measure of the goodness of fit.
In the calibration step of most analytical procedures, a ‘best’ straight line is fitted to
the observed response of the detector system when known amounts of analyte (standards)
are analysed. This section discusses the regression analysis procedure employed for this
purpose.
The least-squares method is the most straightforward regression procedure. Application of the
method requires two assumptions: that a linear relationship exists between the amount of
analyte (x) and the magnitude of the measured response (y), and that any deviation of
individual points from the straight line is entirely the consequence of indeterminate error
in the measurement, that is, no significant error exists in the composition of the standards.
The line generated is of the form:
y = a + bx (4)
where a is the value of y when x is zero (the intercept) and b is the slope. The method
minimises the sum of the squares of the vertical displacements of the data points from the
best-fit line.
For convenience, the following three quantities are defined:
Sxx = ∑(xi – x̄)² = ∑xi² – (∑xi)²/n
Syy = ∑(yi – ȳ)² = ∑yi² – (∑yi)²/n
Sxy = ∑(xi – x̄)(yi – ȳ) = ∑xiyi – ∑xi ∑yi/n
Calculating these quantities permits the determination of the following:
1. The slope of the line b:
b = Sxy/Sxx (5)
2. The intercept a:
a = ȳ – bx̄ (6)
3. The standard deviation about the regression line, sr:
sr = √{(Syy – b²Sxx)/(n – 2)} (7)
4. The standard deviation of the slope sb:
sb = √(sr²/Sxx) (8)
5. The standard deviation of the results based on the calibration curve sc:
sc = (sr/b) x √{(1/m) + (1/n) + (ȳc – ȳ)²/(b²Sxx)} (9)
where ȳc is the mean of m replicate measurements made using the calibration curve.
Example 6:
The following table gives calibration data for chromatographic analysis of a pesticide and
computations for fitting a straight line according to the least-squares method.
xi = pesticide conc., µg/L; yi = peak area, cm²
xi yi xi² yi² xiyi
0.352 1.09 0.12390 1.1881 0.38368
0.803 1.78 0.64481 3.1684 1.42934
1.08 2.60 1.16640 6.7600 2.80800
1.38 3.03 1.90440 9.1809 4.18140
1.75 4.01 3.06250 16.0801 7.01750
∑ 5.365 12.51 6.90201 36.3775 15.81992
Therefore
Sxx = ∑xi² – (∑xi)²/n = 1.145365
Syy = ∑yi² – (∑yi)²/n = 5.07748
Sxy = ∑xiyi – ∑xi ∑yi/n = 2.39669
and from Equations 5 & 6
b = 2.0925 ≈ 2.09 and a = 0.2567 ≈ 0.26
Thus the equation for the least-squares line is
y = 0.26 + 2.09x
Note that rounding should not be performed until the end in order to avoid rounding errors.
The data points and the fitted line are plotted in Figure 2.
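The least-squares computation of Example 6 can be sketched in Python as below. The function name least_squares and the returned dictionary keys are illustrative; the formulas are those given for Sxx, Syy and Sxy and Equations 5 and 6.

```python
def least_squares(x, y):
    """Fit y = a + b*x by least squares using Sxx, Syy and Sxy."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sxx = sum(xi * xi for xi in x) - sum_x ** 2 / n
    syy = sum(yi * yi for yi in y) - sum_y ** 2 / n
    sxy = sum(xi * yi for xi, yi in zip(x, y)) - sum_x * sum_y / n
    b = sxy / sxx                      # slope, Equation 5
    a = sum_y / n - b * sum_x / n      # intercept, Equation 6
    return {"Sxx": sxx, "Syy": syy, "Sxy": sxy, "a": a, "b": b}


# Calibration data from Example 6
conc = [0.352, 0.803, 1.08, 1.38, 1.75]   # pesticide conc., µg/L
area = [1.09, 1.78, 2.60, 3.03, 4.01]     # peak area, cm²

fit = least_squares(conc, area)
print(f"Sxx = {fit['Sxx']:.6f}, Syy = {fit['Syy']:.5f}, Sxy = {fit['Sxy']:.5f}")
print(f"y = {fit['a']:.2f} + {fit['b']:.2f}x")
```

Because the squares are not rounded before summing, Sxx may differ from the tabulated value in the last decimal place; the slope and intercept are unaffected at the precision quoted.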
Figure 2: Calibration Curve
Example 7:
Using the relationship derived in Example 6, calculate
1. the concentration of pesticide in a sample if a peak area of 2.65 cm² was obtained.
2. the standard deviation of the result based on the single measurement of 2.65.
3. the standard deviation of the result if 2.65 represents the average of 4
replicates.
1. Substituting in the equation derived in the previous example
2.65 = 0.26 + 2.09x
x = 1.14 µg/L
2. From Equation 7, sr = √{(5.07748 – 2.0925² x 1.145365)/(5 – 2)} = 0.144.
Substituting in Equation 9 for m = 1
sc = (0.144/2.09) x √{(1/1) + (1/5) + (2.65 – 12.51/5)²/(2.09² x 1.145)} = ±0.08 µg/L
3. Substituting in Equation 9 for m = 4
sc = (0.144/2.09) x √{(1/4) + (1/5) + (2.65 – 12.51/5)²/(2.09² x 1.145)} = ±0.05 µg/L
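Equations 7 and 9 as used in Example 7 can be sketched as follows. The function name calibration_result is illustrative; Sxx, Syy, Sxy and the sums are taken from Example 6, and the slope and intercept are recomputed unrounded from them.

```python
import math

# Quantities from Example 6
N = 5
SXX, SYY, SXY = 1.145365, 5.07748, 2.39669
X_MEAN, Y_MEAN = 5.365 / N, 12.51 / N
B = SXY / SXX                 # slope, Equation 5
A = Y_MEAN - B * X_MEAN       # intercept, Equation 6


def calibration_result(y_c, m):
    """Return (x_c, s_r, s_c) for a mean response y_c of m replicates.

    x_c is read from the calibration line, s_r follows Equation 7 and
    s_c follows Equation 9.
    """
    s_r = math.sqrt((SYY - B ** 2 * SXX) / (N - 2))
    x_c = (y_c - A) / B
    s_c = (s_r / B) * math.sqrt(
        1 / m + 1 / N + (y_c - Y_MEAN) ** 2 / (B ** 2 * SXX)
    )
    return x_c, s_r, s_c


for m in (1, 4):
    x_c, s_r, s_c = calibration_result(2.65, m)
    print(f"m = {m}: x = {x_c:.2f} µg/L, sr = {s_r:.3f}, sc = {s_c:.2f} µg/L")
```

Averaging four replicates reduces the 1/m term from 1 to 0.25, which is where most of the gain in precision comes from in this example.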
[Figure 2 plot: peak area, cm², against concentration of pesticide, µg/L]

More Related Content

Viewers also liked

Download-manuals-surface water-software-processingofstreamflowdata
 Download-manuals-surface water-software-processingofstreamflowdata Download-manuals-surface water-software-processingofstreamflowdata
Download-manuals-surface water-software-processingofstreamflowdata
hydrologyproject0
 
Download-manuals-water quality-wq-manuals-49qualityassuranceandwithinlaborat...
 Download-manuals-water quality-wq-manuals-49qualityassuranceandwithinlaborat... Download-manuals-water quality-wq-manuals-49qualityassuranceandwithinlaborat...
Download-manuals-water quality-wq-manuals-49qualityassuranceandwithinlaborat...
hydrologyproject0
 
Gw02 role of dwlr data in groundwater resource estimation
Gw02 role of dwlr data in groundwater resource estimationGw02 role of dwlr data in groundwater resource estimation
Gw02 role of dwlr data in groundwater resource estimation
hydrologyproject0
 
Proceedings of technical papers
Proceedings of technical papersProceedings of technical papers
Proceedings of technical papers
hydrologyproject0
 
Himachal may, 2014
Himachal  may, 2014Himachal  may, 2014
Himachal may, 2014
hydrologyproject0
 
Ch sw water availibility study and supply demand analysis in kharun sub basin...
Ch sw water availibility study and supply demand analysis in kharun sub basin...Ch sw water availibility study and supply demand analysis in kharun sub basin...
Ch sw water availibility study and supply demand analysis in kharun sub basin...
hydrologyproject0
 
Cwc
CwcCwc

Viewers also liked (7)

Download-manuals-surface water-software-processingofstreamflowdata
 Download-manuals-surface water-software-processingofstreamflowdata Download-manuals-surface water-software-processingofstreamflowdata
Download-manuals-surface water-software-processingofstreamflowdata
 
Download-manuals-water quality-wq-manuals-49qualityassuranceandwithinlaborat...
 Download-manuals-water quality-wq-manuals-49qualityassuranceandwithinlaborat... Download-manuals-water quality-wq-manuals-49qualityassuranceandwithinlaborat...
Download-manuals-water quality-wq-manuals-49qualityassuranceandwithinlaborat...
 
Gw02 role of dwlr data in groundwater resource estimation
Gw02 role of dwlr data in groundwater resource estimationGw02 role of dwlr data in groundwater resource estimation
Gw02 role of dwlr data in groundwater resource estimation
 
Proceedings of technical papers
Proceedings of technical papersProceedings of technical papers
Proceedings of technical papers
 
Himachal may, 2014
Himachal  may, 2014Himachal  may, 2014
Himachal may, 2014
 
Ch sw water availibility study and supply demand analysis in kharun sub basin...
Ch sw water availibility study and supply demand analysis in kharun sub basin...Ch sw water availibility study and supply demand analysis in kharun sub basin...
Ch sw water availibility study and supply demand analysis in kharun sub basin...
 
Cwc
CwcCwc
Cwc
 

Download-manuals-surface water-software-48appliedstatistics

  • 1. World Bank & Government of The Netherlands funded Training module # WQ - 48 Applied Statistics New Delhi, September 2000 CSMRS Building, 4th Floor, Olof Palme Marg, Hauz Khas, New Delhi – 11 00 16 India Tel: 68 61 681 / 84 Fax: (+ 91 11) 68 61 685 E-Mail: dhvdelft@del2.vsnl.net.in DHV Consultants BV & DELFT HYDRAULICS with HALCROW, TAHAL, CES, ORG & JPS
  • 2. HP Training Module File: “48 Applied Statistics.doc” Version: September 2000
    Table of contents
    1. Module context
    2. Module profile
    3. Session plan
    4. Overhead/flipchart master
    5. Evaluation sheets
    6. Handout
    7. Additional handout
    8. Main text
  • 3. 1. Module context
    This module discusses statistical procedures that chemists commonly use to evaluate the precision and accuracy of analytical results. Prior training in ‘Basic Statistics’, module no. 47, or equivalent, is necessary to complete this module satisfactorily.
    When designing a training course, the relationship between this module and the others should be maintained by keeping them close together in the syllabus and placing them in a logical sequence. The actual selection of topics and the depth of training will, of course, depend on the training needs of the participants, i.e. their knowledge level and skills at the start of the course.
  • 4. 2. Module profile
    Title: Applied Statistics
    Target group: HIS function(s): Q2, Q3, Q5, Q6
    Duration: One session of 90 min
    Objectives: After the training the participants will be able to:
      • Apply common statistical tests for evaluation of the precision of data
    Key concepts:
      • Confidence interval
      • Regression analysis
    Training methods: Lecture, exercises, OHS
    Training tools required: Board, flipchart
    Handouts: As provided in this module
    Further reading and references:
      • ‘Analytical Chemistry’, Douglas A. Skoog and Donald M. West, Saunders College Publishing, 1986 (Chapter 3)
      • ‘Statistical Procedures for Analysis of Environmental Monitoring Data and Risk Assessment’, Edward A. McBean and Frank A. Rovers, Prentice Hall, 1998
  • 5. 3. Session plan
    No  Activities                                       Time     Tools
    1   Preparations
    2   Introduction:                                    10 min   OHS
        • Need for statistical procedures
        • Scope of the module
    3   Confidence interval                              30 min   OHS
        • Definition
        • Significance of standard deviation and its correct estimation
        • When population SD is known
        • When population SD is not known
    4   Rejection of data                                20 min   OHS
        • Caution in rejection
        • Procedures
    5   Regression analysis                              20 min   OHS
    6   Wrap up                                          10 min
  • 6. 4. Overhead/flipchart master
    OHS format guidelines
    Type of text         Style                         Setting
    Headings:            OHS-Title                     Arial 30-36, with bottom border line (not: underline)
    Text:                OHS-lev1, OHS-lev2            Arial 24-26, maximum two levels
    Case:                                              Sentence case. Avoid full text in UPPERCASE.
    Italics:                                           Use occasionally and in a consistent way
    Listings:            OHS-lev1, OHS-lev1-Numbered   Big bullets. Numbers for definite series of steps. Avoid roman numbers and letters.
    Colours:                                           None, as these get lost in photocopying and some colours do not reproduce at all.
    Formulas/Equations:  OHS-Equation                  Use of a table will ease horizontal alignment over more lines (columns). Use equation editor for advanced formatting only.
  • 7. Applied Statistics
    • Confidence intervals
    • Number of replicate observations
    • Rejection of data
    • Regression analysis
  • 8. Confidence Intervals (1)
    • The population mean µ is always unknown
    • Sample mean x̄
      - x̄ → µ only for a large number of replicates
    • Estimation of limits for µ from x̄: (x̄ – a) < µ < (x̄ + a)
    • The value of a depends on
      - the confidence level (probability of occurrence)
      - the standard deviation
    • The difference between the limits is the interval
  • 9. Confidence Intervals
    Figure 1. Normal distribution curve and areas under the curve for different distances
  • 10. Confidence Intervals (2)
    • When s is a good estimate of σ
      - for a single observation x: (x – zσ) < µ < (x + zσ)
      Confidence level, %   50     90     95     99     99.7
      z                     0.67   1.64   1.96   2.58   3
      - as z (and hence the interval) increases, the confidence level also increases
  • 11. Example (1)
    • The mercury concentration in fish was determined as 1.8 µg/kg. Calculate the limits for µ at the 95% confidence level if σ = 0.1 µg/kg
      - For the 95% confidence level, z = 1.96
      - Limits for the mean:
        1.8 – 1.96 × 0.1 < µ < 1.8 + 1.96 × 0.1
        1.6 < µ < 2.0
    • What are the limits for the 99.7% confidence level?
  • 12. Confidence Intervals (3)
    • Confidence interval when the mean, x̄, of n replicates is available:
      x̄ – zσ/√n < µ < x̄ + zσ/√n
    • The confidence interval is reduced compared to a single measurement as n increases
      n     1     2     3     4     9     16
      √n    1     1.4   1.7   2     3     4
    • Analyse 2 to 4 replicates; more replicates give diminishing returns
  • 13. Example (2)
    • The average mercury concentration from 3 replicates was 1.67 µg/kg
    • Calculate the limits for the 95% confidence level, σ = 0.1 µg/kg
      1.67 – 1.96 × 0.1/√3 < µ < 1.67 + 1.96 × 0.1/√3
      1.56 < µ < 1.78
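Examples (1) and (2) apply the same formula with n = 1 and n = 3. The following is a minimal Python sketch of that calculation, assuming σ is known; the function name `z_confidence_interval` is illustrative and not part of the module, and z is computed with the standard library rather than read from the table.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(mean, sigma, n=1, confidence=0.95):
    """Return (lower, upper) limits for the population mean when sigma is known."""
    # Two-sided z value for the requested confidence level (about 1.96 for 95 %).
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * sigma / sqrt(n)
    return mean - half_width, mean + half_width


# Example (1): single observation of 1.8 ug/kg, sigma = 0.1 ug/kg
single = z_confidence_interval(1.8, 0.1)
# Example (2): mean of 3 replicates, 1.67 ug/kg, sigma = 0.1 ug/kg
triplicate = z_confidence_interval(1.67, 0.1, n=3)

print(tuple(round(v, 2) for v in single))
print(tuple(round(v, 2) for v in triplicate))
```

Passing a different `confidence` value, for instance 0.997, gives the limits for other confidence levels in the same way.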
  • 14. Example (3)
    • How many replicate analyses are required to reduce the 95% confidence interval to ± 0.07, given σ = 0.1?
      Interval = ± zσ/√n
      ± 0.07 = ± 1.96 × 0.1/√n
      n = 7.8
    • 8 measurements will give a slightly better than 95% chance that µ lies within x̄ ± 0.07
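Solving the interval expression for n gives n = (zσ / half-width)², rounded up to the next whole measurement. A small sketch of that step follows, with the hypothetical helper name `replicates_needed` and σ again assumed known.

```python
from math import ceil
from statistics import NormalDist


def replicates_needed(sigma, half_width, confidence=0.95):
    """Smallest n for which z * sigma / sqrt(n) does not exceed half_width."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    # Rearranged from half_width = z * sigma / sqrt(n); round up to a whole number.
    return ceil((z * sigma / half_width) ** 2)


# Example (3): sigma = 0.1, target interval of +/- 0.07 at 95 % confidence
n_required = replicates_needed(0.1, 0.07)
print(n_required)
```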
  • 15. Confidence Interval (4)
    • When σ is unknown
    • A sample s based on n measurements is available:
      x̄ – ts/√n < µ < x̄ + ts/√n
      Degrees of freedom   1      4      10     ∞
      t95                  12.7   2.78   2.23   1.96
      t99                  63.7   4.60   3.17   2.58
    • Note: t → z as the degrees of freedom increase to ∞
  • 16. Example (4)
    • Groundwater samples were analysed for TOC: n = 5, x̄ = 11.7 mg/L, s = 3.2 mg/L
    • Find the 95% confidence limits for the true mean
      - Degrees of freedom = n – 1 = 5 – 1 = 4
      - t95 = 2.78 (from table)
        11.7 – 2.78 × 3.2/√5 < µ < 11.7 + 2.78 × 3.2/√5
        7.72 < µ < 15.68
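When only the sample standard deviation s is available, the z value is replaced by the tabulated t value for n – 1 degrees of freedom. The sketch below uses only the 95% t values given in the slide's table; the names `T95` and `t_confidence_interval` are illustrative, and degrees of freedom outside that table are deliberately rejected rather than interpolated.

```python
from math import sqrt

# Two-sided 95 % t values copied from the slide's table, keyed by degrees of freedom.
T95 = {1: 12.7, 4: 2.78, 10: 2.23}


def t_confidence_interval(mean, s, n, t_table=T95):
    """Return (lower, upper) limits for the true mean when sigma is unknown."""
    df = n - 1
    if df not in t_table:
        raise KeyError(f"no tabulated t value for {df} degrees of freedom")
    half_width = t_table[df] * s / sqrt(n)
    return mean - half_width, mean + half_width


# Example (4): TOC in groundwater, n = 5, mean = 11.7 mg/L, s = 3.2 mg/L
toc_limits = t_confidence_interval(11.7, 3.2, 5)
print(tuple(round(v, 2) for v in toc_limits))
```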
  • 17. Detection of Outliers
    • Extreme values, very high or very low, will influence calculated statistics
    • Rejection requires a sound basis
      - the value may be due to an unrecorded event
      - errors in transcription, reading of instruments, inconsistent methodology, etc.
      - otherwise apply statistical yardsticks
  • 18. Q Test
    • Qexp = (Difference between suspect and neighbour) / (Spread of the entire set)
    • Reject if Qexp > Qcrit
      Total no. of observations   3      5      7      10
      Qcrit 90%                   0.94   0.64   0.51   0.41
      Qcrit 96%                   0.98   0.73   0.59   0.48
    • Other criteria are also available
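The Q test can be sketched in a few lines of Python. This is an illustration under stated assumptions: the critical values are the 90% entries from the slide's table, the function name `q_test` is hypothetical, and the five measurements are invented purely to exercise the code.

```python
# 90 % critical values copied from the slide's table, keyed by number of observations.
Q_CRIT_90 = {3: 0.94, 5: 0.64, 7: 0.51, 10: 0.41}


def q_test(values, q_table=Q_CRIT_90):
    """Test the most extreme value; return (suspect, q_exp, reject)."""
    data = sorted(values)
    spread = data[-1] - data[0]
    if spread == 0:
        raise ValueError("all values are identical; nothing to test")
    low_gap = data[1] - data[0]      # gap between lowest value and its neighbour
    high_gap = data[-1] - data[-2]   # gap between highest value and its neighbour
    # The suspect is whichever end lies further from its neighbour.
    suspect, gap = (data[0], low_gap) if low_gap > high_gap else (data[-1], high_gap)
    q_exp = gap / spread
    return suspect, q_exp, q_exp > q_table[len(data)]


# Invented data for illustration only (not from the module).
suspect, q_exp, reject = q_test([10.1, 10.2, 10.0, 10.3, 11.5])
print(suspect, round(q_exp, 2), reject)
```

A rejection flagged this way is only a statistical indication; the slide on detection of outliers still asks for a sound basis before a value is discarded.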
  • 19. Regression Analysis
    • Quantification of relationships between variables
      - used for predictions
      - calibration of instruments
    • Least-square method
      - linear relationship, y = a + bx
      - deviations due to random errors
  • 20. Calibration of GC

      xi (pesticide conc., µg/L)   yi (peak area, cm²)   xi²       yi²       xi·yi
      0.352                        1.09                  0.12390   1.1881    0.38368
      0.803                        1.78                  0.64481   3.1684    1.42934
      1.08                         2.60                  1.16640   6.7600    2.80800
      1.38                         3.03                  1.90440   9.1809    4.18140
      1.75                         4.01                  3.06250   16.0801   7.01750
      ∑ 5.365                      12.51                 6.90201   36.3775   15.81992

      $S_{xx} = \sum x_i^2 - (\sum x_i)^2/n = 1.145365$
      $S_{yy} = \sum y_i^2 - (\sum y_i)^2/n = 5.07748$
      $S_{xy} = \sum x_i y_i - \sum x_i \sum y_i/n = 2.39669$
      $b = S_{xy}/S_{xx} = 2.09$
      $a = \bar{y} - b\bar{x} = 0.26$
  • 21. Figure 2 - Calibration Curve (x-axis: concentration of pesticide, µg/L; y-axis: peak area, cm²)
  • 22. 5. Evaluation sheets
  • 24. 6. Handout
  • 25. Applied Statistics
    • Confidence intervals
    • Number of replicate observations
    • Rejection of data
    • Regression analysis

    Confidence Intervals (1)
    • Population mean µ is always unknown
    • Sample mean x̄
      - only for a large number of replicates does x̄ → µ
    • Estimation of limits for µ from x̄: $(\bar{x} - a) < \mu < (\bar{x} + a)$
    • Value of a depends on
      - confidence level (probability of occurrence)
      - standard deviation
    • Difference between the limits is the interval

    Confidence Intervals
    Figure 1. Normal distribution curve and areas under curve for different distances
  • 26. Confidence Intervals (2)
    • When s is a good estimate of σ
      - for a single observation, $(x - z\sigma) < \mu < (x + z\sigma)$

      Confidence level, %   50     90     95     99     99.7
      z                     0.67   1.64   1.96   2.58   3

      - as z or the interval increases, the confidence level also increases

    Example (1)
    • Mercury concentration in fish was determined as 1.8 µg/kg.
    • Calculate limits for µ for 95% confidence level if σ = 0.1 µg/kg
      - For 95% confidence level z = 1.96
      - Limits for mean
        $1.8 - 1.96 \times 0.1 < \mu < 1.8 + 1.96 \times 0.1$
        $1.6 < \mu < 2.0$
    • What are the limits for 99.7% confidence level?

    Confidence Intervals (3)
    • Confidence interval when mean, x̄, of n replicates is available
      $\bar{x} - z\sigma/\sqrt{n} < \mu < \bar{x} + z\sigma/\sqrt{n}$
    • Confidence interval is reduced compared to a single measurement as n increases

      n     1    2     3     4    9    16
      √n    1    1.4   1.7   2    3    4

    • Analyse 2 to 4 replicates; more replicates give diminishing returns

    Example (2)
    • Average mercury conc. from 3 replicates was 1.67 µg/kg.
    • Calculate limits for 95% confidence level, σ = 0.1 µg/kg
      $1.67 - 1.96 \times 0.1/\sqrt{3} < \mu < 1.67 + 1.96 \times 0.1/\sqrt{3}$
      $1.56 < \mu < 1.78$
  • 27. Example (3)
    • How many replicate analyses will be required to decrease the 95% confidence interval to ± 0.07, σ = 0.1?
      Interval = $\pm z\sigma/\sqrt{n}$
      $\pm 0.07 = \pm 1.96 \times 0.1/\sqrt{n}$, so n = 7.8
    • 8 measurements will give a slightly better than 95% chance for µ to be within x̄ ± 0.07

    Confidence Intervals (4)
    • When σ is unknown
    • Sample s based on n measurements is available
      $\bar{x} - ts/\sqrt{n} < \mu < \bar{x} + ts/\sqrt{n}$

      Degrees of freedom   1      4      10     ∞
      T95                  12.7   2.78   2.23   1.96
      T99                  63.7   4.60   3.17   2.58

    • Note T → z as degrees of freedom increase to ∞

    Example (4)
    • Groundwater samples were analysed for TOC: n = 5, x̄ = 11.77 mg/L, s = 3.2 mg/L
    • Find the 95% confidence limits for the true mean
      - Degrees of freedom = n - 1 = 5 - 1 = 4
      - T95 = 2.78
      $11.77 - 2.78 \times 3.2/\sqrt{5} < \mu < 11.77 + 2.78 \times 3.2/\sqrt{5}$
      $7.79 < \mu < 15.75$
  • 28. Detection of Outliers
    • Extreme values, very high or low, will influence calculated statistics
    • Rejection
      - needs a sound basis
      - the value may be due to an unrecorded event
      - errors in transcription, reading of instruments, inconsistent methodology, etc.
      - otherwise apply statistical yardsticks

    Q Test
    • $Q_{exp} = \dfrac{\text{difference between suspect and neighbour}}{\text{spread of entire set}}$
    • Reject if Qexp > Qcrit

      Total no. of observations   3      5      7      10
      Qcrit 90%                   0.94   0.64   0.51   0.41
      Qcrit 96%                   0.98   0.73   0.59   0.48

    • Other criteria are also available

    Regression Analysis
    • Quantification of relationships between variables
      - used for predictions
      - calibration of instruments
    • Least-square method
      - linear relationship, y = a + bx
      - deviations due to random errors
  • 29. Calibration of GC

      xi (pesticide conc., µg/L)   yi (peak area, cm²)   xi²       yi²       xi·yi
      0.352                        1.09                  0.12390   1.1881    0.38368
      0.803                        1.78                  0.64481   3.1684    1.42934
      1.08                         2.60                  1.16640   6.7600    2.80800
      1.38                         3.03                  1.90440   9.1809    4.18140
      1.75                         4.01                  3.06250   16.0801   7.01750
      ∑ 5.365                      12.51                 6.90201   36.3775   15.81992

      $S_{xx} = \sum x_i^2 - (\sum x_i)^2/n = 1.145365$
      $S_{yy} = \sum y_i^2 - (\sum y_i)^2/n = 5.07748$
      $S_{xy} = \sum x_i y_i - \sum x_i \sum y_i/n = 2.39669$
      $b = S_{xy}/S_{xx} = 2.09$
      $a = \bar{y} - b\bar{x} = 0.26$

    Figure 2 - Calibration Curve (x-axis: concentration of pesticide, µg/L; y-axis: peak area, cm²)
  • 30. Add copy of Main text in chapter 8, for all participants.
  • 31. 7. Additional handout
    These handouts are distributed during delivery and contain test questions, answers to questions, special worksheets, optional information, and other matters you would not like to be seen in the regular handouts. It is good practice to pre-punch these additional handouts, so that the participants can easily insert them in the main handout folder.
  • 33. 8. Main text
    Contents
    1. Confidence Intervals
    2. Detection of Data Outliers
    3. Regression Analysis
  • 34. Applied Statistics

For small data sets, there is always the question of how well the data represent the population and what useful information can be inferred about the population characteristics. Statistical calculations are used for this purpose. This module deals with four such applications:

1. Definition of an interval around the mean of a data set within which the true mean can be expected to be found with a particular degree of probability.
2. Determination of the number of measurements needed to obtain the true mean within a predetermined interval around the experimental mean with a particular degree of probability.
3. Determining whether an outlying value in a set of replicate measurements should be retained in calculating the mean for the set.
4. The fitting of a straight line to a set of experimental points.

1. Confidence Intervals

The true mean of a population, µ, is always unknown. However, limits can be set about the experimentally determined mean, x̄, within which the true mean may be expected to occur with a given degree of probability. These bounds are called confidence limits and the interval between these limits, the confidence interval.

For a given set of data, the wider the confidence interval about the mean, the greater the probability that the true mean lies within the limits. Conversely, when the limits are set close to the experimental mean, the probability that they enclose the true mean becomes smaller. Confidence intervals are usually calculated at the 95% confidence level.
For a given probability of occurrence, the confidence limits depend on the value of the standard deviation, s, of the observed data and the certainty with which this quantity can be taken to represent the true standard deviation, σ, of the population.

1.1 Confidence limits where σ is known

When the number of repetitive observations is 20 or more, the standard deviation of the observed set of data, s, can be taken to be a good approximation of the standard deviation of the population, σ. However, it may not always be possible to perform such a large number of repetitive analyses, particularly when costly and time-consuming extraction and analytical procedures are involved. In such cases, data from previous records or different laboratories may be pooled, provided that identical precautions and analytical steps are followed in each case. Further, care should be taken that the medium sampled is also similar; for example, analysis results of groundwater samples having high TDS should not be pooled with the results of surface waters, which usually have low TDS.

As discussed in the previous module, most randomly varying data can be approximated by a normal distribution curve. For a normal distribution, 68% of the observations lie between the limits µ ± σ, about 95% between the limits µ ± 2σ, 99.7% between the limits µ ±
  • 35. 3σ, etc. (Figure 1). For a single observation, xi, the confidence limits for the population mean are given by:

    confidence limit for µ = $x_i \pm z\sigma$    (1)

where z assumes different values depending upon the desired confidence level, as given in Table 1.

Figure 1. Normal distribution curve and areas under curve for different distances

Table 1: Values of z for various confidence levels

    Confidence level, %    z
    50                     0.67
    68                     1.00
    80                     1.29
    90                     1.64
    95                     1.96
    95.4                   2.00
    99                     2.58
    99.7                   3.00
    99.9                   3.29

Example 1: The mercury concentration in a sample of fish was determined to be 1.80 µg/kg. Calculate the 50% and 95% confidence limits for this observation. Based on previous analysis records, it is known that the standard deviation of such observations, following similar analysis procedures, is 0.1 µg/kg and that it closely represents the population standard deviation, σ.
  • 36. From Table 1, it is seen that z = 0.67 and 1.96 for the two confidence levels in question. Upon substitution in Equation 1, we find that

    50% confidence limit = 1.80 ± 0.67 × 0.1 = 1.80 ± 0.07 µg/kg
    95% confidence limit = 1.80 ± 1.96 × 0.1 = 1.80 ± 0.20 µg/kg

In other words, there is a 50% chance that the true mean lies between 1.73 and 1.87 µg/kg, and a 95% chance that it lies within the wider limits of 1.6 and 2.0 µg/kg.

Equation 1 applies to the result of a single measurement. If a number of observations, n, is made and the average of the replicate samples is taken, the confidence interval decreases. In such a case the limits are given by:

    confidence limit for µ = $\bar{x} \pm z\sigma/\sqrt{n}$    (2)

Example 2: Calculate the confidence limits for the problem of Example 1, if three samples of the fish were analysed yielding an average of 1.67 µg/kg Hg.

Substitution in Equation 2 gives:

    50% confidence limit = 1.67 ± 0.67 × 0.1/√3 = 1.67 ± 0.04 µg/kg
    95% confidence limit = 1.67 ± 1.96 × 0.1/√3 = 1.67 ± 0.11 µg/kg

For the same odds the confidence limits are now substantially narrower, and the result can be said to be more precise and probably more useful.

Note that Equation 2 indicates that the confidence interval can be halved by increasing the number of analyses to 4 (√4 = 2) compared to a single observation. Increasing the number of measurements beyond 4 does not decrease the confidence interval proportionately. To narrow the interval to one fourth, 16 measurements would be required (√16 = 4), thus giving diminishing returns. Consequently, 2 to 4 replicate measurements are made in most cases.

Equation 2 can also be used to find the number of replicate measurements required such that, with a given probability, the true mean would be found within a predetermined interval. This is illustrated in Example 3.
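Equations 1 and 2, as applied in Examples 1 and 2, can be sketched in a few lines of Python. This is only an illustration: the function name `z_confidence_limits` and the dictionary `Z_TABLE` are not part of the module, and the z values are simply copied from Table 1.

```python
import math

# Selected z values copied from Table 1 (confidence level in % -> z).
Z_TABLE = {50: 0.67, 68: 1.00, 80: 1.29, 90: 1.64, 95: 1.96,
           99: 2.58, 99.7: 3.00, 99.9: 3.29}


def z_confidence_limits(mean, sigma, n=1, level=95):
    """Return (lower, upper) limits for the true mean when sigma is known.

    Implements Equation 2; with n = 1 it reduces to Equation 1.
    """
    half_width = Z_TABLE[level] * sigma / math.sqrt(n)
    return mean - half_width, mean + half_width


# Example 1: a single observation of 1.80 µg/kg, sigma = 0.1 µg/kg
single = z_confidence_limits(1.80, 0.1, n=1, level=95)
# Example 2: mean of three replicates, 1.67 µg/kg
triple = z_confidence_limits(1.67, 0.1, n=3, level=95)

print(f"single observation: {single[0]:.2f} to {single[1]:.2f} µg/kg")
print(f"mean of 3:          {triple[0]:.2f} to {triple[1]:.2f} µg/kg")
```

Keeping the z values in a lookup table mirrors the way the module uses Table 1; a statistics library could supply them instead.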
Example 3: How many replicate measurements of the specimen in Example 1 would be needed to decrease the 95% confidence interval to ±0.07?

Substituting for the confidence interval in Equation 2:

    $0.07 = 1.96 \times 0.1/\sqrt{n}$
    $\sqrt{n} = 1.96 \times 0.1/0.07 = 2.8$, so n = 7.8

Thus, 8 measurements will provide a slightly better than 95% chance of the true mean lying within ±0.07 of the experimental mean.
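The calculation in Example 3 amounts to solving Equation 2 for n and rounding up to the next whole measurement. A minimal sketch follows; the function name `replicates_needed` is chosen here for illustration only.

```python
import math


def replicates_needed(sigma, half_width, z=1.96):
    """Solve z * sigma / sqrt(n) = half_width for n (Equation 2).

    Returns the exact (fractional) n and the next whole number of
    measurements, since a fraction of an analysis cannot be made.
    """
    n_exact = (z * sigma / half_width) ** 2
    return n_exact, math.ceil(n_exact)


# Example 3: sigma = 0.1, target interval ±0.07 at the 95% level (z = 1.96)
n_exact, n_whole = replicates_needed(sigma=0.1, half_width=0.07)
print(f"n = {n_exact:.1f}, so {n_whole} replicates are needed")
```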
  • 37. 1.2 Confidence limits where σ is unknown

When the number of individual measurements in a set of data is small, the reproducibility of the calculated value of the standard deviation, s, is decreased. Therefore, for a given probability, the confidence interval must be larger under these circumstances. To account for the potential variability in s, the confidence limits are calculated using the statistical parameter t:

    confidence limit for µ = $\bar{x} \pm ts/\sqrt{n}$    (3)

In contrast to z in Equation 2, t depends not only on the desired confidence level, but also upon the number of degrees of freedom available in the calculation of s. Table 2 provides values of t for various degrees of freedom and confidence levels. Note that the values of t become equal to those for z (Table 1) as the number of degrees of freedom becomes infinite.

Table 2: Values of t for various confidence levels and degrees of freedom

    Degrees of     Confidence level, %
    freedom        90      95      99
    1              6.31    12.7    63.7
    2              2.92    4.30    9.92
    3              2.35    3.18    5.84
    4              2.13    2.78    4.60
    5              2.02    2.57    4.03
    6              1.94    2.45    3.71
    8              1.86    2.31    3.36
    10             1.81    2.23    3.17
    12             1.78    2.18    3.06
    13             1.77    2.16    3.01
    14             1.76    2.14    2.98
    ∞              1.64    1.96    2.58

Example 4: A chemist obtained the following data for the concentration of total organic carbon (TOC) in groundwater samples: 13.55, 6.39, 13.81, 11.20, 13.88 mg/L. Calculate the 95% confidence limits for the mean of the data.

From the given data, n = 5, x̄ = 11.77 mg/L and s = 3.2 mg/L. For degrees of freedom = 5 - 1 = 4 and 95% confidence level, t from Table 2 = 2.78. Therefore, from Equation 3:

    confidence limit for µ = $\bar{x} \pm ts/\sqrt{n}$ = 11.77 ± 2.78 × 3.2/√5 = 11.77 ± 3.98 mg/L
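Example 4 can be reproduced from the raw TOC values with the sketch below. It assumes the 95% t values are looked up from a small dictionary copied from Table 2; the names `T95` and `t_confidence_limits` are illustrative. Because the code carries the unrounded mean and standard deviation through the calculation, its last digit may differ slightly from a hand calculation with rounded values.

```python
import math
import statistics

# Two-sided 95% t values copied from Table 2 (degrees of freedom -> t).
T95 = {1: 12.7, 2: 4.30, 3: 3.18, 4: 2.78, 5: 2.57,
       6: 2.45, 8: 2.31, 10: 2.23, 14: 2.14}


def t_confidence_limits(values, t_table=T95):
    """Return (mean, s, half_width) for Equation 3 when sigma is unknown."""
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)      # sample standard deviation (n - 1)
    t = t_table[n - 1]                # degrees of freedom = n - 1
    half_width = t * s / math.sqrt(n)
    return mean, s, half_width


toc = [13.55, 6.39, 13.81, 11.20, 13.88]   # mg/L, data of Example 4
mean, s, half_width = t_confidence_limits(toc)
print(f"mean = {mean:.2f} mg/L, s = {s:.2f} mg/L, "
      f"95% limits = {mean:.2f} ± {half_width:.2f} mg/L")
```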
  • 38. 2. Detection of Data Outliers

Data outliers are extreme (high or low) values that diverge widely from the main body of the data set. The presence of one or more outliers may greatly influence any calculated statistics and yield biased results. However, there is also the possibility that the outlier is a legitimate member of the data set. Outlier detection tests are used to determine whether there is sufficient statistical evidence to conclude that an observation is so extreme that it does not belong to the data set and should be rejected.

Data outliers may result from faulty instruments, errors in transcription, misreading of instruments, inconsistent methodology of sampling and analysis, and so on. These aspects should be investigated, and if any such reason can be attached to an outlier, the value may be safely deleted from consideration. However, it should be kept in mind that the suspect data may rightfully belong to the set and may be the consequence of an unrecorded event, such as a short rainfall, intrusion of sea water or a spill.

Table 3: Critical values for rejection quotient

    Number of        Qcrit (reject if Qexp > Qcrit)
    observations     90% confidence    96% confidence
    3                0.94              0.98
    4                0.76              0.85
    5                0.64              0.73
    6                0.56              0.64
    7                0.51              0.59
    8                0.47              0.54
    9                0.44              0.51
    10               0.41              0.48

Of the numerous statistical criteria available for detection of outliers, the commonly used Q test is discussed here. To apply the Q test, the difference between the questionable result and its closest neighbour is divided by the spread of the entire set. The resulting ratio, Qexp, is compared with the rejection values, Qcrit, given in Table 3, which are critical for a particular degree of confidence. If Qexp is larger than Qcrit, a statistical basis for rejection exists. The table shows only some selected values. Any standard statistical analysis book may be consulted for a complete set.
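The Q test described above can be sketched as a short function. The critical values are copied from Table 3 and the data are the fluoride measurements of Example 5; the names `Q_CRIT` and `q_experimental` are illustrative and not part of the module.

```python
# Critical values copied from Table 3:
# number of observations -> (Qcrit at 90%, Qcrit at 96%)
Q_CRIT = {3: (0.94, 0.98), 4: (0.76, 0.85), 5: (0.64, 0.73),
          6: (0.56, 0.64), 7: (0.51, 0.59), 8: (0.47, 0.54),
          9: (0.44, 0.51), 10: (0.41, 0.48)}


def q_experimental(values, suspect="high"):
    """Return Qexp for the highest (or lowest) value in the set.

    Qexp = gap between the suspect value and its nearest neighbour,
    divided by the spread (max - min) of the entire set.
    """
    data = sorted(values)
    spread = data[-1] - data[0]
    if suspect == "high":
        gap = data[-1] - data[-2]
    else:
        gap = data[1] - data[0]
    return gap / spread


fluoride = [2.77, 2.80, 2.90, 2.92, 3.45, 3.95, 4.44, 4.61, 5.21, 7.46]
q_exp = q_experimental(fluoride, suspect="high")
q90, q96 = Q_CRIT[len(fluoride)]
print(f"Qexp = {q_exp:.3f}")
print(f"reject at 90% confidence: {q_exp > q90}")
print(f"reject at 96% confidence: {q_exp > q96}")
```

Comparing the unrounded Qexp against both tabulated levels makes borderline cases visible, which rounding to two decimals can hide.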
Example 5: The concentration of fluoride in a well was measured as 2.77, 2.80, 2.90, 2.92, 3.45, 3.95, 4.44, 4.61, 5.21, 7.46. Use the Q test to examine whether the highest value is an outlier.

    Qexp = (7.46 - 5.21) / (7.46 - 2.77) = 2.25 / 4.69 = 0.48

Since 0.48 is larger than 0.41, the Qcrit value for 10 observations at 90% confidence, there is a statistical basis for excluding the value at that level. At 96% confidence (Qcrit = 0.48) the case is borderline, since Qexp does not exceed the critical value.

Other criteria that can be used to evaluate an apparent outlier are:

- Plotting a scatter diagram showing the inter-parameter relationship between two constituents. If there is a correlation between the two constituents, an outlier would lie a significant distance from the general trend.
  • 39. - Applying a test for normal distribution by calculating the mean and standard deviation of all the data and determining whether the extreme value lies outside the limits of the mean ± 3 times the standard deviation. If so, it is indeed an unusual value. (A cumulative normal probability plot may also be used for this test.)
- When the data set is not large, calculating and comparing the standard deviation with and without the suspect observations. A suspect value which has a considerable influence on the calculated standard deviation suggests an outlier.

3. Regression Analysis

In assessing environmental quality, it is often of interest to quantify the relationship between two or more variables. This may allow filling in of missing data for one constituent and may also help in predicting future levels of a constituent.

Regression analysis is focused on determining the degree to which one or more variables depend on another variable, the independent variable. Thus regression is a means of calibrating the coefficients of a predictive equation. In correlation analysis neither of the variables is identified as more important than the other. Correlation is not causation. It provides a measure of the goodness of fit.

In the calibration step of most analytical procedures, a 'best' straight line is fitted to the observed response of the detector system when known amounts of analyte (standards) are analysed. This section discusses the regression analysis procedure employed for this purpose.

The least-squares method is the most straightforward regression procedure.
Application of the method requires two assumptions: that a linear relationship exists between the amount of analyte (x) and the magnitude of the measured response (y), and that any deviation of individual points from the straight line is entirely the consequence of indeterminate error, that is, no significant error exists in the composition of the standards.

The line generated is of the form:

    $y = a + bx$    (4)

where a is the value of y when x is zero (the intercept) and b is the slope. The method minimises the sum of the squares of the vertical displacements of the data points from the best-fit line.

For convenience, the following three quantities are defined:

    $S_{xx} = \sum (x_i - \bar{x})^2 = \sum x_i^2 - (\sum x_i)^2/n$
    $S_{yy} = \sum (y_i - \bar{y})^2 = \sum y_i^2 - (\sum y_i)^2/n$
    $S_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y}) = \sum x_i y_i - \sum x_i \sum y_i/n$

Calculating these quantities permits the determination of the following:

1. The slope of the line, b: $b = S_{xy}/S_{xx}$    (5)
2. The intercept, a: $a = \bar{y} - b\bar{x}$    (6)
  • 40. 3. The standard deviation about the regression line, sr: $s_r = \sqrt{(S_{yy} - b^2 S_{xx})/(n - 2)}$    (7)
4. The standard deviation of the slope, sb: $s_b = \sqrt{s_r^2/S_{xx}}$    (8)
5. The standard deviation of the results based on the calibration curve, sc: $s_c = \dfrac{s_r}{b}\sqrt{\dfrac{1}{m} + \dfrac{1}{n} + \dfrac{(\bar{y}_c - \bar{y})^2}{b^2 S_{xx}}}$    (9)

where ȳc is the mean of m replicate measurements made using the calibration curve.

Example 6: The following table gives calibration data for the chromatographic analysis of a pesticide and the computations for fitting a straight line according to the least-squares method.

    xi (pesticide conc., µg/L)   yi (peak area, cm²)   xi²       yi²       xi·yi
    0.352                        1.09                  0.12390   1.1881    0.38368
    0.803                        1.78                  0.64481   3.1684    1.42934
    1.08                         2.60                  1.16640   6.7600    2.80800
    1.38                         3.03                  1.90440   9.1809    4.18140
    1.75                         4.01                  3.06250   16.0801   7.01750
    ∑ 5.365                      12.51                 6.90201   36.3775   15.81992

Therefore

    $S_{xx} = \sum x_i^2 - (\sum x_i)^2/n = 1.145365$
    $S_{yy} = \sum y_i^2 - (\sum y_i)^2/n = 5.07748$
    $S_{xy} = \sum x_i y_i - \sum x_i \sum y_i/n = 2.39669$

and from Equations 5 and 6

    b = 2.0925 = 2.09 and a = 0.2567 = 0.26

Thus the equation for the least-squares line is

    y = 0.26 + 2.09x

Note that rounding should not be performed until the end, in order to avoid rounding errors. The data points and the fitted line are plotted in Figure 2.
  • 41. Figure 2: Calibration Curve (x-axis: concentration of pesticide, µg/L; y-axis: peak area, cm²)

Example 7: Using the relationship derived in Example 6, calculate
1. the concentration of pesticide in a sample for which a peak area of 2.65 was obtained;
2. the standard deviation of the result based on the single measurement of 2.65;
3. the standard deviation of the result if 2.65 represents the average of 4 replicates.

1. Substituting in the equation derived in the previous example:

    2.65 = 0.26 + 2.09x
    x = 1.14 µg/L

2. Substituting in Equation 9 for m = 1, with sr = 0.144 obtained from Equation 7 using the values of Example 6:

    $s_c = \dfrac{0.144}{2.09}\sqrt{\dfrac{1}{1} + \dfrac{1}{5} + \dfrac{(2.65 - 12.51/5)^2}{2.09^2 \times 1.145}} = \pm 0.08$ µg/L

3. Substituting in Equation 9 for m = 4:

    $s_c = \dfrac{0.144}{2.09}\sqrt{\dfrac{1}{4} + \dfrac{1}{5} + \dfrac{(2.65 - 12.51/5)^2}{2.09^2 \times 1.145}} = \pm 0.05$ µg/L
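Example 7 can be reproduced with the sketch below. It assumes the rounded values used in the text (a = 0.26, b = 2.09, sr = 0.144, Sxx = 1.145); the function names `concentration` and `s_c` are illustrative only.

```python
import math

# Values carried over from Example 6, rounded as in the text
a, b = 0.26, 2.09          # intercept and slope of the calibration line
s_r, sxx = 0.144, 1.145    # std. deviation about regression, Sxx
n = 5                      # number of calibration standards
y_mean = 12.51 / n         # mean peak area of the standards


def concentration(y_c):
    """Invert the calibration line y = a + b*x for a measured peak area."""
    return (y_c - a) / b


def s_c(y_c, m):
    """Standard deviation of a result read from the calibration curve
    (Equation 9); y_c is the mean of m replicate measurements."""
    return (s_r / b) * math.sqrt(
        1 / m + 1 / n + (y_c - y_mean) ** 2 / (b ** 2 * sxx)
    )


print(f"x = {concentration(2.65):.2f} µg/L")
print(f"s_c (single measurement) = {s_c(2.65, m=1):.2f} µg/L")
print(f"s_c (mean of 4)          = {s_c(2.65, m=4):.2f} µg/L")
```

Averaging four replicates lowers sc mainly through the 1/m term; the 1/n term, set by the number of calibration standards, remains as a floor.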