2. MODULE 3. Inferential statistics. Hypothesis or significance tests in chemistry
L 5. Sampling and the Central Limit Theorem. Confidence intervals for population mean and variance. 1
PC 5. Practical applications of confidence interval calculations in analytical chemistry. Testing the accuracy of analytical methods using certified reference material. 2 10
L 6. Hypothesis testing and statistical tests. Parametric statistical tests for single samples. 1
PC 6. One-sample z and t tests. 2 15
IWST 3. Exercises and problems regarding confidence intervals and t tests
L 7. Statistical tests for two samples or treatments. F-test, t-test, paired t-test. 1
PC 7. Exercises and practical problems involving the t-test and paired t-test in Excel. 2 10
IWS 2. Individual work with exercises and problems regarding inferential statistics. 20
Midterm control 1. 100
Learning resources:
1. Stephen Kokoska, Introductory Statistics: A Problem-Solving Approach, Publisher: WH Freeman; 3rd edition, chapters 7, 8
2. James Miller, Jane Miller, Robert Miller, Statistics and Chemometrics for Analytical Chemistry, Publisher: Pearson Education; 7th edition, chapter 2
3. Sampling. Central limit theorem.
Theorem: Consider a population, normally distributed or not, with mean μ and variance σ², from which k samples are drawn, each containing n independent measurements (k series of n random measurements). Then, for a sufficiently large n, the sample mean x̄ is a new random variable with an approximately normal distribution with mean μ and variance σ²/n.
Consequences:
1. x̄ → μ (the sample mean x̄ tends towards the true population mean μ).
2. σ_x̄ = σ/√n, the standard error of the mean (the standard deviation of the sample mean decreases with √n, so precision increases as n increases).
3. If the sampled population is normal, or approximately normal, x̄ has a normal distribution even for small sample sizes n.
4. If we work with large samples (large n), we can use the normal distribution even if the population from which we sample is not normal.
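The theorem and its consequences can be checked numerically. The sketch below is only an illustration under assumed conditions: the exponential population (mean 1, standard deviation 1), the sample sizes, and the function name `simulate_sample_means` are choices made here, not part of the lecture.

```python
import math
import random
import statistics

def simulate_sample_means(n, k, seed=0):
    """Draw k samples of size n from an exponential population (mean 1, sd 1)
    and return the list of k sample means."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(k)
    ]

SIGMA = 1.0  # standard deviation of the assumed (non-normal) population

for n in (2, 10, 50):
    means = simulate_sample_means(n, k=5000)
    print(
        f"n={n:3d}  mean of sample means={statistics.fmean(means):.3f}  "
        f"sd of sample means={statistics.stdev(means):.3f}  "
        f"sigma/sqrt(n)={SIGMA / math.sqrt(n):.3f}"
    )
```

The mean of the sample means should stay close to μ = 1, while their standard deviation should follow σ/√n as n grows.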
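[Figure: Population (parameters: μ, σ) and Sample (statistics: x̄, s). Sampling and probability lead from the population to the sample; inference leads from the sample back to the population distribution. Descriptive statistics summarize the sample; inferential statistics draw conclusions about the population.]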
4. Example:
Suppose salaries at a very large company have a mean of $62,000 and a standard deviation of $32,000.
a. What is the probability that a randomly selected employee will have a salary of at least $66,000? p(x > $66,000) = ?
b. What is the probability that 100 randomly selected employees will have an average salary of at least $66,000? p(x̄ > $66,000) = ?
n=50
[Figure: Three different original, or underlying, populations, and approximations to the distribution of the sample mean for various sample sizes n.]
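The two probabilities in the salary example can be sketched as follows. Part (a) needs an extra assumption that the example does not state, namely that individual salaries are approximately normal; part (b) relies only on the Central Limit Theorem with n = 100. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

MU, SIGMA = 62_000, 32_000  # population mean and standard deviation ($)
THRESHOLD = 66_000
N = 100                     # sample size in part (b)

# (a) One employee. ASSUMPTION: salaries are roughly normal (not stated in the example).
p_single = 1 - NormalDist(MU, SIGMA).cdf(THRESHOLD)

# (b) Mean of N employees. By the CLT, the sample mean is ~ N(mu, sigma^2 / n).
p_mean = 1 - NormalDist(MU, SIGMA / sqrt(N)).cdf(THRESHOLD)

print(f"(a) p(x    > 66000) = {p_single:.4f}")
print(f"(b) p(xbar > 66000) = {p_mean:.4f}")
```

The sample mean is far less likely to exceed the threshold than a single salary, because its standard error is σ/√n = $3,200 rather than $32,000.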
5. α: significance level (e.g. 10% or 0.1; 5% or 0.05; 1% or 0.01)
1 – α: confidence level, or confidence probability (e.g. 0.9 or 90%; 0.95 or 95%; 0.99 or 99%)
[Figure: One-sided confidence interval (area 1 – α on one side of the critical value zα, area α in the tail) and two-sided confidence interval (area 1 – α between –zα/2 and zα/2, area α/2 in each tail).]
Confidence interval of the mean (measurement uncertainty)
A confidence interval (CI) for a population parameter is an interval of values constructed so that, with a specified
degree of confidence, the value of the population parameter lies within it.
(x̄ – ε) < μ < (x̄ + ε)
[Figure: number line centered on x̄, running from (x̄ – ε) to (x̄ + ε). μ is in this interval (the confidence zone), with a critical area on either side.]
6. A. Confidence interval of the mean 𝝁 when σ is known or n > 30 (less often in practice)
Example: A study investigated the 2D/4D ratio of European women at a large university. A sample of 135 women resulted in an average 2D/4D ratio of 0.988. Knowing σ = 0.028, what is the confidence interval of the population mean at a confidence level of 95%? And at 99%?
Answer: 0.988 ± 0.005 and 0.988 ± 0.006, respectively.
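• A sample is drawn from a population and n measurements are made.
• The sample mean x̄ is calculated; according to the Central Limit Theorem, X̄ ~ N(μ, σ²/n), i.e. the standard error of the mean is σ/√n. Here σ is known and, being a population parameter, it is constant.
• We use the standardized variable z ~ N(0, 1):
\[ z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \]
• We choose the confidence level 1 – α that we want to use, or equivalently the significance level α. Then
\[ P\left(-z_{\alpha/2} < z < z_{\alpha/2}\right) = 1 - \alpha, \quad \text{thus} \quad -z_{\alpha/2} < \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} < z_{\alpha/2}, \]
which can be written
\[ \bar{x} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \]
• So, with probability 1 – α, μ lies within x̄ ± z_{α/2} × σ/√n.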
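The steps above can be sketched in a few lines of Python using only the standard library. The function name `z_interval` is illustrative; the numbers are those of the 2D/4D example.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(mean, sigma, n, confidence):
    """Two-sided confidence interval for the mean when sigma is known."""
    alpha = 1 - confidence
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    half_width = z_crit * sigma / sqrt(n)
    return mean - half_width, mean + half_width

# 2D/4D example: x-bar = 0.988, sigma = 0.028, n = 135
for conf in (0.95, 0.99):
    lo, hi = z_interval(0.988, 0.028, 135, conf)
    print(f"{conf:.0%}: {lo:.4f} < mu < {hi:.4f}  (half-width {(hi - lo) / 2:.4f})")
```

Rounded to three decimals, the half-widths agree with the answer given for the example.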
7. B. Confidence interval of the mean 𝝁 when σ is unknown and/or n < 30
Example: A grain producer produces packages of cereals with a declared mass of 750 g. From one batch, 7 packages were weighed, leading to an average of 795.3 g and s = 17.8 g. What is the confidence interval of the mean at a confidence level of 95%? And at 99%?
Answer: 795.3 ± 16.5 g and 795.3 ± 24.9 g, respectively.
[Figure: standard normal distribution (z) compared with the t distribution with 6 degrees of freedom; the critical value tα/2,ν cuts off a tail area of α/2.]
• A sample is drawn from a population and n measurements are made.
• The sample mean x̄ and the sample standard deviation s are calculated.
• We use the t variable
\[ t = \frac{\bar{x} - \mu}{s/\sqrt{n}} \]
which has a t distribution with ν = n − 1 degrees of freedom.
• We choose the confidence level 1 – α that we want to use, or equivalently the significance level α. Then
\[ P\left(-t_{\alpha/2,\nu} < t < t_{\alpha/2,\nu}\right) = 1 - \alpha, \quad \text{thus} \quad -t_{\alpha/2,\nu} < \frac{\bar{x} - \mu}{s/\sqrt{n}} < t_{\alpha/2,\nu}, \]
which can be written
\[ \bar{x} - t_{\alpha/2,\nu}\,\frac{s}{\sqrt{n}} < \mu < \bar{x} + t_{\alpha/2,\nu}\,\frac{s}{\sqrt{n}} \]
• So, with probability 1 – α, μ lies within x̄ ± t_{α/2,ν} × s/√n.
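The t-based interval can be sketched in the same way. Python's standard library has no t quantile function, so the sketch below obtains t_{α/2,ν} by integrating the t density numerically (Simpson's rule) and bisecting; this is one possible approach chosen here for self-containment, and in practice a statistical table or `scipy.stats.t.ppf` would normally be used. All function names are illustrative.

```python
from math import gamma, pi, sqrt

def t_pdf(x, nu):
    """Density of Student's t distribution with nu degrees of freedom."""
    c = gamma((nu + 1) / 2) / (sqrt(nu * pi) * gamma(nu / 2))
    return c * (1 + x * x / nu) ** (-(nu + 1) / 2)

def t_cdf(x, nu, steps=2000):
    """P(T <= x) for x >= 0, using Simpson's rule on [0, x] (steps must be even)."""
    h = x / steps
    total = t_pdf(0.0, nu) + t_pdf(x, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    return 0.5 + total * h / 3

def t_critical(confidence, nu):
    """Two-sided critical value t_{alpha/2, nu}, found by bisection.
    Intended for the small nu typical of laboratory replicates."""
    target = 1 - (1 - confidence) / 2
    lo, hi = 0.0, 200.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, nu) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def t_interval(mean, s, n, confidence):
    """Two-sided confidence interval for the mean when sigma is unknown."""
    half_width = t_critical(confidence, n - 1) * s / sqrt(n)
    return mean - half_width, mean + half_width

# Cereal example: x-bar = 795.3 g, s = 17.8 g, n = 7
for conf in (0.95, 0.99):
    lo, hi = t_interval(795.3, 17.8, 7, conf)
    print(f"{conf:.0%}: 795.3 ± {(hi - lo) / 2:.1f} g")
```

For ν = 6 the critical values come out close to the tabulated 2.447 (95%) and 3.707 (99%), which reproduces the intervals given in the example.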
8. Confidence interval of the variance 𝝈𝟐
The confidence interval of the variance at significance level α: with a probability of 1 – α, the variance σ² lies within the range:
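For reference, the standard chi-square interval for the variance is written out below. It assumes a normally distributed population, and the notation is a convention chosen here: \(\chi^2_{p,\nu}\) denotes the upper-tail critical value, i.e. the value with area p to its right, for ν = n − 1 degrees of freedom.

```latex
\[
\frac{(n-1)\,s^2}{\chi^2_{\alpha/2,\,\nu}}
\;<\; \sigma^2 \;<\;
\frac{(n-1)\,s^2}{\chi^2_{1-\alpha/2,\,\nu}},
\qquad \nu = n - 1
\]
```

Because the chi-square distribution is not symmetric, the interval is not centered on s².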
9. Applications
P1. The blood concentration of lead was measured for 50 children at a school near a large intersection with heavy traffic. The mean Pb concentration of this sample was 10.12 ng/mL and the standard deviation was 0.64 ng/mL.
a. Calculate the confidence interval for the mean concentration of Pb in the blood for the whole school at a confidence level of 95%.
b. How large would the sample have to be to obtain a confidence interval 0.20 ng/mL wide (±0.10 ng/mL) at a confidence level of 95%?
P2. Seven measurements of the pH of a buffer solution yielded the following results: 5.12; 5.20; 5.15; 5.17; 5.16; 5.19; 5.15. Calculate the confidence interval of the mean at a confidence level of 95%.
P3. A spectrophotometer is checked at a given wavelength with a standard solution having a true absorbance value of 0.470. Ten independent measurements yielded the following values: 0.465; 0.463; 0.456; 0.459; 0.461; 0.465; 0.462; 0.457; 0.461; 0.461. Calculate the confidence interval of the mean absorbance and decide, at a confidence level of 95%, whether the spectrophotometer commits systematic errors. (Nargiz must send this one to me; it is optional for the others.)
10. IWS for week 5 (homework), to be submitted by 6 November 2023, 17:00 Almaty local time.
P1 (5 points, for last week). Many lakes are carefully monitored for pH, total phosphorus, chlorophyll, nitrogen, and total suspended solids. These data are used to characterize the condition of the lake and to chart year-to-year variability. Based on information from the Lake Partner Program, Ontario Ministry of the Environment, Aberdeen Lake has a mean total phosphorus concentration of 14.6 mg/liter and a standard deviation of 5.8 mg/liter. Suppose a day is selected at random, and a total phosphorus measurement from Aberdeen Lake is obtained.
a. What is the probability that the total phosphorus is less than 13 mg/liter?
b. What is the probability that the total phosphorus differs from the mean by more than 5 mg/liter?
P4 (3 points). Ten Hg concentration measurements in a commercial sample of liquefied gas yielded the following results: 23.3; 22.5; 21.9; 21.5; 19.9; 21.3; 21.7; 23.8; 22.6; 24.7 ng/mL. Calculate the confidence interval of the mean at a confidence level of 99%.
P5 (3 points). A 0.1 M solution of a monoprotic acid was used to titrate 10 mL of a 0.1 M monobasic alkaline solution. Five successive measurements yielded the values 9.88; 10.18; 10.23; 10.39; 10.21 mL. Calculate the confidence interval of the mean and decide, at a confidence level of 95%, whether there is evidence of a systematic error.