Population and sample mean

Avjinder Singh Kaler and Kristi Mai

• Estimating a Population Mean
  • 𝜎 unknown
  • 𝜎 known
• Estimating the difference between two population means
  • Independent samples
  • Dependent samples

Main Ideas:
• The sample mean is the best point estimate of the population mean
• We can use a sample mean to construct a C.I. to estimate the true value of a
population mean
• We must learn how to find the sample size necessary to estimate a population mean
Recall:
• $\bar{x} = \dfrac{\sum x}{n}$: sample mean
• $\bar{x}$ targets $\mu$ and is an individual value that is used as an estimate (i.e., it is a point
estimate for $\mu$)
Notice: There are two situations when estimating a population mean
1. 𝜎, the population standard deviation, is known
2. 𝜎, the population standard deviation, is NOT known

• Margin of Error (estimating the population mean when 𝜎 is known)
• $E = z_{\alpha/2} \cdot \dfrac{\sigma}{\sqrt{n}}$
• Notice: The margin of error changes when what we are estimating changes!!
• Constructing a C.I.
• Requirements:
  • The sample must be a SRS
  • The value of 𝜎 is known
  • The population is normal OR 𝑛 > 30
• C.I.:
  • $\bar{x} - E < \mu < \bar{x} + E$
  • Same as: $\bar{x} \pm E$ and $(\bar{x} - E,\ \bar{x} + E)$
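The interval above can be sketched in a few lines of Python. This is a minimal illustration, not the slides' own method (the slides use a calculator or StatCrunch); the function name `z_interval` and the input values are hypothetical and chosen only for demonstration.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(xbar, sigma, n, confidence=0.95):
    """Confidence interval for a population mean when sigma is known."""
    alpha = 1 - confidence
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    margin = z_crit * sigma / sqrt(n)             # margin of error E
    return xbar - margin, xbar + margin


# Hypothetical inputs (not from the slides): xbar = 50, sigma = 10, n = 100.
low, high = z_interval(xbar=50.0, sigma=10.0, n=100, confidence=0.95)
print(f"{low:.2f} < mu < {high:.2f}")
```

With these assumed inputs the margin of error is about 1.96, so the interval is roughly 48.04 to 51.96.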

• Minimum required sample size
• Sample size needed:
• If 𝜎 is known:
$n = \left[ \dfrac{z_{\alpha/2} \cdot \sigma}{E} \right]^2$
• If not a whole number, ALWAYS round up to the nearest
whole number for minimum required sample sizes
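As a rough sketch of the sample size rule above, the snippet below computes the minimum n and always rounds up. The function name `min_sample_size` and the inputs (𝜎 = 15, E = 3, 95% confidence) are hypothetical values used only for illustration.

```python
from math import ceil
from statistics import NormalDist


def min_sample_size(sigma, margin, confidence=0.95):
    """Minimum n needed to estimate a mean to within `margin` (sigma known)."""
    alpha = 1 - confidence
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    # Always round UP: rounding down would give a margin larger than requested.
    return ceil((z_crit * sigma / margin) ** 2)


# Hypothetical inputs (not from the slides).
n_required = min_sample_size(sigma=15.0, margin=3.0, confidence=0.95)
print(n_required)
```

Here the unrounded value is about 96.04, so the minimum required sample size is 97.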

Some Key Points:
• The sample mean is still the best point estimate of the population
mean
• We can use a sample mean to construct a C.I. to estimate the true
value of a population mean even when we do not know the
population standard deviation
• We see that if requirements are generally met but 𝜎 is unknown, we
must use a t-distribution

The Student t Distribution:
• If a population has a normal distribution, then the following statistic follows a Student t distribution with 𝑑𝑓 = 𝑛 − 1:
$t = \dfrac{\bar{x} - \mu}{s / \sqrt{n}}$
• The above formula is a t-score; a measure of relative standing
• We are estimating the unknown population standard deviation with the sample standard
deviation
• This estimation would typically lead to unreliability and so we compensate for this inherent
unreliability with wider intervals and “fatter tails” displayed in the density curve
• We must utilize a t-table or t-calculator when using the t-distribution
• We NEED degrees of freedom
  • Degrees of Freedom (𝑑𝑓) for a collection of sample data is the number of sample values that can vary
after certain restrictions have been imposed upon all the data values

• Recall:
• $s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$: sample standard deviation
• Margin of Error
  • $E = t_{\alpha/2} \cdot \dfrac{s}{\sqrt{n}}$ with $df = n - 1$
  • Notice: The margin of error also changes when the
information we have changes

Constructing a C.I.
• Requirements:
• The sample must be a SRS
• The value of 𝜎 is NOT known
• The population is normal OR 𝑛 > 30
• C.I.:
• $\bar{x} - E < \mu < \bar{x} + E$
• Same As: $\bar{x} \pm E$ and $(\bar{x} - E,\ \bar{x} + E)$
• Notice that the C.I. appears to be the same – however, it will NOT be the
same as the previous CI for 𝜇 because (with our uncertainty about 𝜎) the
margin of error changed

• The Student t distribution is different for different sample sizes
• The t distribution has the same general symmetric bell shape as the Normal
distribution, but reflects the greater variability that is expected when samples
are smaller
• The t distribution has a mean of 𝑡 = 0 just as the standard normal distribution has
a mean of 𝑧 = 0

Choosing a method (flowchart):
• Is the population normal OR is n > 30?
  • No: use a nonparametric method or bootstrapping technique
  • Yes: is 𝜎 known or unknown?
    • Known: use the normal distribution (Normal calculator)
    • Unknown: use the t distribution (t-calculator)
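The decision flow above can be written as a small helper. This is only a sketch of the flowchart's logic; the function name `choose_method` and its boolean parameters are hypothetical.

```python
def choose_method(normal_or_large_sample: bool, sigma_known: bool) -> str:
    """Pick an inference method for a single population mean.

    normal_or_large_sample: True if the population is normal OR n > 30.
    sigma_known: True if the population standard deviation is known.
    """
    if not normal_or_large_sample:
        return "nonparametric method or bootstrapping"
    if sigma_known:
        return "normal distribution"
    return "t distribution"


# Example: n = 11 from a roughly normal population, sigma unknown.
print(choose_method(normal_or_large_sample=True, sigma_known=False))
```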

𝜎 known
• Requirements:
  • The value of 𝜎 is known
• Test Statistic: $z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
  • 𝜇: population mean (assumed true under 𝐻0)
• Note: p-values and critical values are from the Z-table

𝜎 NOT known
• Requirements:
  • The value of 𝜎 is NOT known
• Test Statistic: $t = \dfrac{\bar{x} - \mu}{s / \sqrt{n}}$; $df = n - 1$
  • 𝜇: population mean (assumed true under 𝐻0)
• Note: p-values and critical values are from the t-table

Listed below are the measured radiation emissions (in W/kg) corresponding to
a sample of cell phones.
0.38 0.55 1.54 1.55 0.50 0.60 0.92 0.96 1.00 0.86 1.46
Use a 0.05 level of significance to test the claim that cell phones have a mean
radiation level that is less than 1.00 W/kg.
The summary statistics are: $\bar{x} = 0.938$ and $s = 0.423$.

Requirement Check:
1. We assume the sample is a simple random sample.
2. The sample size is n = 11, which is not greater than 30, so we must check
a normal quantile plot for normality.
Note: (See plot on the right)
The points are reasonably close to a straight line
and there is no other pattern, so we conclude that
the data appear to be from a normally distributed
population.

Step 1: The claim that cell phones have a mean radiation level less than 1.00
W/kg is expressed as μ < 1.00 W/kg.
Step 2: The alternative to the original claim is μ ≥ 1.00 W/kg.
Step 3: The hypotheses are written as:
$H_0: \mu = 1.00$ W/kg
$H_1: \mu < 1.00$ W/kg
Step 4: The stated level of significance is 𝛼 = 0.05.
Step 5: Because the claim is about a population mean μ, the statistic most
relevant to this test is the sample mean: $\bar{x}$.

Step 6: Calculate the test statistic and then find the P-value or the critical value
using StatCrunch.
$t = \dfrac{\bar{x} - \mu_{\bar{x}}}{s / \sqrt{n}} = \dfrac{0.938 - 1.00}{0.423 / \sqrt{11}} = -0.486$

Step 7: Critical Value Method: Because the test statistic of t = –0.486 does not
fall in the critical region bounded by the critical value of t = –1.812, fail to reject
the null hypothesis.

Step 7: P-value method:
Using StatCrunch, the P-value computed is 0.3187. Since the P-value is
greater than α = 0.05, we fail to reject the null hypothesis.
Step 8:
Because we fail to reject the null hypothesis, we conclude that there is not
sufficient evidence to support the claim that cell phones have a mean
radiation level that is less than 1.00 W/kg.
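The test above can be reproduced from the raw data with Python's standard library. This is a sketch rather than the StatCrunch workflow the slides use; the critical value −1.812 is taken from the example (left-tailed, 𝛼 = 0.05, df = 10) instead of being computed, and the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# Radiation emissions (W/kg) from the example.
emissions = [0.38, 0.55, 1.54, 1.55, 0.50, 0.60, 0.92, 0.96, 1.00, 0.86, 1.46]

mu_0 = 1.00          # value of mu assumed under H0
t_critical = -1.812  # left-tailed critical value quoted in the example (df = 10)

n = len(emissions)
xbar = mean(emissions)
s = stdev(emissions)  # sample standard deviation (divides by n - 1)

t_stat = (xbar - mu_0) / (s / sqrt(n))
reject_h0 = t_stat < t_critical  # left-tailed test

print(f"xbar = {xbar:.3f}, s = {s:.3f}, t = {t_stat:.3f}, reject H0: {reject_h0}")
```

Computed from the unrounded data, t comes out near −0.485; the slides obtain −0.486 because they plug in the rounded summary statistics. Either way the statistic is far from the critical region, so the null hypothesis is not rejected.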

We can use a confidence interval for testing a claim about μ.
For a two-tailed test with a 0.05 significance level, we construct a 95% confidence
interval.
For a one-tailed test with a 0.05 significance level, we construct a 90% confidence
interval.

Using the cell phone example, construct a confidence interval that can be used to
test the claim that μ < 1.00 W/kg, assuming a 0.05 significance level.
Note that a left-tailed hypothesis test with α = 0.05 corresponds to a 90%
confidence interval.
Using StatCrunch, the confidence interval is:
0.707 W/kg < μ < 1.169 W/kg
Because the value of μ = 1.00 W/kg is contained in the interval, we fail to reject the
null hypothesis that μ = 1.00 W/kg.
Based on the sample of 11 values, we do not have sufficient evidence to support
the claim that the mean radiation level is less than 1.00 W/kg.
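The interval quoted above can be checked by hand. The sketch below assumes the rounded summary statistics from the example and the t-table value 1.812 (df = 10, area 0.05 in one tail), which matches the critical value used in the hypothesis test.

```python
from math import sqrt

xbar, s, n = 0.938, 0.423, 11  # summary statistics from the example
t_crit = 1.812                 # t_{alpha/2} for a 90% interval with df = 10

margin = t_crit * s / sqrt(n)  # margin of error E
ci_low, ci_high = xbar - margin, xbar + margin

# The claimed boundary value 1.00 W/kg lies inside the interval,
# so the interval agrees with failing to reject H0.
contains_claimed_value = ci_low < 1.00 < ci_high

print(f"{ci_low:.3f} W/kg < mu < {ci_high:.3f} W/kg")
```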

When σ is known, we use a test that involves the standard normal distribution.
In reality, it is very rare to test a claim about an unknown population mean
while the population standard deviation is somehow known.
The procedure is essentially the same as a t test, with the following
exception: The test statistic is
$z = \dfrac{\bar{x} - \mu_{\bar{x}}}{\sigma / \sqrt{n}}$
The P-value and critical values can be computed using StatCrunch.

If we repeat the cell phone radiation example, with the assumption that
σ = 0.480 W/kg, the test statistic is:
$z = \dfrac{\bar{x} - \mu_{\bar{x}}}{\sigma / \sqrt{n}} = \dfrac{0.938 - 1.00}{0.480 / \sqrt{11}} = -0.43$
The example refers to a left-tailed test, so the P-value is the area to the left
of z = –0.43, which is 0.3342.
Since the P-value is greater than 𝛼 = 0.05, we fail to reject the null and
reach the same conclusion as before.

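The z test for the cell phone example with σ assumed to be 0.480 W/kg can be sketched as follows. It uses `statistics.NormalDist` from the Python standard library in place of StatCrunch; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

xbar, mu_0, sigma, n = 0.938, 1.00, 0.480, 11  # values from the example
alpha = 0.05

z_stat = (xbar - mu_0) / (sigma / sqrt(n))
p_value = NormalDist().cdf(z_stat)  # left-tailed: area to the left of z
reject_h0 = p_value < alpha

print(f"z = {z_stat:.2f}, P-value = {p_value:.4f}, reject H0: {reject_h0}")
```

The P-value is computed from the unrounded z, which is why it agrees with the quoted 0.3342 rather than the table value for exactly z = −0.43.
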
Main Ideas:
• The sample mean is the best point estimate of the population mean
• We can use two independent sample means to construct a
confidence interval that can be used to estimate the true value of the
underlying difference in the corresponding population means
• We can also test claims about the difference between two population
means

Dependent samples
• Two samples are dependent if the sample values are paired
Independent samples
• Two samples are independent if the sample values from one are not related to or somehow
naturally paired/matched with the sample values from the other

Requirements:
• Population standard deviations (𝜎1 and 𝜎2) are NOT known and
NOT assumed equal
• The two samples are independent
• Both samples are SRS
• Both 𝑛1 > 30 and 𝑛2 > 30 OR both samples come from populations
that are normal

• Margin of Error
$E = t_{\alpha/2} \cdot \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$ and $df = \min(n_1 - 1,\ n_2 - 1)$
• C.I.: $(\bar{x}_1 - \bar{x}_2) - E < \mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + E$
• Notice that we are often interested in whether or not 0 is included
within the limits of the confidence interval constructed, i.e., whether or
not 𝜇1 − 𝜇2 = 0 is reasonable

• Requirements:
• Requirements and degrees of freedom (df) are the same as in the
C.I. before
• Test Statistic: $t = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$

Researchers conducted trials to investigate the effects of color on
creativity.
Subjects with a red background were asked to think of creative uses for
a brick; other subjects with a blue background were given the same
task.
Responses were scored by a panel of judges.
Researchers make the claim that “blue enhances performance on a
creative task”. Test the claim using a 0.01 significance level.

Requirement check:
1. The values of the two population standard deviations are unknown
and assumed not equal.
2. The subject groups are independent.
3. The samples are simple random samples.
4. Both sample sizes exceed 30.
The requirements are all satisfied.

The data:
Background color    Sample size    Sample mean            Sample standard deviation
Red background      n1 = 35        $\bar{x}_1 = 3.39$     s1 = 0.97
Blue background     n2 = 36        $\bar{x}_2 = 3.97$     s2 = 0.63

Step 1: The claim that “blue enhances performance on a creative task”
can be restated as “people with a blue background (group 2) have a
higher mean creativity score than those in the group with a red background
(group 1)”. This can be expressed as μ1 < μ2.
Step 2: If the original claim is false, then μ1 ≥ μ2.
Step 3: The hypotheses can be written as:
$H_0: \mu_1 = \mu_2$
$H_1: \mu_1 < \mu_2$
OR
$H_0: \mu_1 - \mu_2 = 0$
$H_1: \mu_1 - \mu_2 < 0$

Step 4: The significance level is α = 0.01.
Step 5: Because we have two independent samples and we are testing a claim
about two population means, we use a t-distribution.
Step 6: Calculate the test statistic.
$t = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \dfrac{(3.39 - 3.97) - 0}{\sqrt{\dfrac{0.97^2}{35} + \dfrac{0.63^2}{36}}} = -2.979$

Step 6: Because we are using a t-distribution, the critical value of t = –2.441
is found using StatCrunch. We use 34 degrees of freedom.

Step 7: Because the test statistic t = –2.979 does fall in the critical region (it is less than
the critical value t = –2.441), we reject the null hypothesis μ1 – μ2 = 0.
P-Value Method: StatCrunch provides a P-value, and the area to the left of
the test statistic of t = –2.979 is 0.0021. Since this is less than the significance
level of 0.01, we reject the null hypothesis.
Conclusion: There is sufficient evidence to support the claim that the red
background group has a lower mean creativity score than the blue
background group.
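The two-sample test statistic can be verified from the summary statistics. This sketch follows the slides' conservative choice df = min(n1 − 1, n2 − 1) and takes the critical value −2.441 from the example rather than computing it; the variable names are illustrative.

```python
from math import sqrt

# Summary statistics from the example (group 1 = red, group 2 = blue).
n1, xbar1, s1 = 35, 3.39, 0.97
n2, xbar2, s2 = 36, 3.97, 0.63

hypothesized_diff = 0.0  # mu1 - mu2 under H0
t_critical = -2.441      # left-tailed, alpha = 0.01, df = 34 (from the example)

standard_error = sqrt(s1**2 / n1 + s2**2 / n2)
t_stat = ((xbar1 - xbar2) - hypothesized_diff) / standard_error
df = min(n1 - 1, n2 - 1)

reject_h0 = t_stat < t_critical  # left-tailed test

print(f"t = {t_stat:.3f}, df = {df}, reject H0: {reject_h0}")
```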

Using the data from this color creativity example, construct a 98%
confidence interval estimate for the difference between the mean
creativity score for those with a red background and the mean
creativity score for those with a blue background.

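Using StatCrunch, the 98% confidence interval obtained is:
$-1.05 < \mu_1 - \mu_2 < -0.11$
By hand, with $\bar{x}_1 = 3.39$ and $\bar{x}_2 = 3.97$:
$E = t_{\alpha/2} \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}} = 2.441 \sqrt{\dfrac{0.97^2}{35} + \dfrac{0.63^2}{36}} = 0.475261$
$(\bar{x}_1 - \bar{x}_2) - E < (\mu_1 - \mu_2) < (\bar{x}_1 - \bar{x}_2) + E$
$-1.06 < (\mu_1 - \mu_2) < -0.10$
The hand calculation uses the conservative df = 34; statistical software generally uses a larger,
more precise df, which explains the slightly narrower StatCrunch limits.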

We are 98% confident that the limits –1.05 and –0.11 actually do contain
the difference between the two population means.
Because those limits do not include 0, our interval suggests that there is
a significant difference between the two means.
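The hand calculation of this interval can be sketched as below, assuming the t-table value 2.441 (df = 34, area 0.01 in one tail) that the example uses. With that conservative df the limits round to about −1.06 and −0.10, slightly wider than the StatCrunch limits of −1.05 and −0.11 quoted above.

```python
from math import sqrt

# Summary statistics from the example (group 1 = red, group 2 = blue).
n1, xbar1, s1 = 35, 3.39, 0.97
n2, xbar2, s2 = 36, 3.97, 0.63

t_crit = 2.441  # t_{alpha/2} for 98% confidence, df = min(n1 - 1, n2 - 1) = 34

margin = t_crit * sqrt(s1**2 / n1 + s2**2 / n2)  # margin of error E
diff = xbar1 - xbar2
ci_low, ci_high = diff - margin, diff + margin

# If 0 is outside the interval, mu1 - mu2 = 0 is not a plausible value.
zero_in_interval = ci_low <= 0 <= ci_high

print(f"E = {margin:.6f}")
print(f"{ci_low:.2f} < mu1 - mu2 < {ci_high:.2f}, contains 0: {zero_in_interval}")
```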

These methods are rarely used in practice because the underlying
assumptions are usually not met.
1. The two population standard deviations are both known
• the test statistic will be a z instead of a t and use the standard
normal model.
2. The two population standard deviations are unknown but assumed
to be equal
• pool the sample variances

When 𝜎1 and 𝜎2 are both known, the test statistic will be:
$z = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$
P-values and critical values are found using StatCrunch.

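Confidence interval (𝜎1 and 𝜎2 known):
$(\bar{x}_1 - \bar{x}_2) - E < (\mu_1 - \mu_2) < (\bar{x}_1 - \bar{x}_2) + E$
$E = z_{\alpha/2} \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$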

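When 𝜎1 and 𝜎2 are unknown but assumed equal, the test statistic will be
$t = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_p^2}{n_1} + \dfrac{s_p^2}{n_2}}}$
Where the pooled sample variance is
$s_p^2 = \dfrac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{(n_1 - 1) + (n_2 - 1)}$
with
$df = n_1 + n_2 - 2$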

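Confidence interval (pooled sample variance):
$(\bar{x}_1 - \bar{x}_2) - E < (\mu_1 - \mu_2) < (\bar{x}_1 - \bar{x}_2) + E$
$E = t_{\alpha/2} \sqrt{\dfrac{s_p^2}{n_1} + \dfrac{s_p^2}{n_2}}$
$df = n_1 + n_2 - 2$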
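To make the pooled formulas concrete, here is a minimal sketch. The slides do not apply the pooled method to any data set; the color creativity summary statistics are reused below purely as an illustration, and the function names `pooled_variance` and `pooled_t` are hypothetical.

```python
from math import sqrt


def pooled_variance(n1, s1, n2, s2):
    """Pooled sample variance s_p^2 (assumes equal population variances)."""
    return ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / ((n1 - 1) + (n2 - 1))


def pooled_t(n1, xbar1, s1, n2, xbar2, s2, hypothesized_diff=0.0):
    """Pooled two-sample t statistic and its degrees of freedom."""
    sp2 = pooled_variance(n1, s1, n2, s2)
    standard_error = sqrt(sp2 / n1 + sp2 / n2)
    t_stat = ((xbar1 - xbar2) - hypothesized_diff) / standard_error
    return t_stat, n1 + n2 - 2


# Illustration only: summary statistics borrowed from the color example.
sp2 = pooled_variance(35, 0.97, 36, 0.63)
t_stat, df = pooled_t(35, 3.39, 0.97, 36, 3.97, 0.63)
print(f"s_p^2 = {sp2:.3f}, t = {t_stat:.3f}, df = {df}")
```

Note that the pooled version gains degrees of freedom (69 here versus 34) but is only appropriate when the equal-variance assumption is reasonable.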

Independent Samples (Two Additional Methods)
• 𝜎1 and 𝜎2 known: Z Test / Z Interval
• 𝜎1 = 𝜎2 assumed: Pooled Sample Variance
Dependent Samples
• When samples are paired, we use a different methodology

Main Ideas:
• The sample mean is still the best point estimate of the population mean
• We can use two dependent sample means to construct a confidence interval
that can be used to estimate the true value of the underlying difference in the
corresponding population means
• We can also test claims about the difference between two population means
• In experimental design, using dependent samples is generally better and more
practical than assuming two independent samples

Notation:
• $d$: the individual difference between two values in a single matched pair
• $n$: the number of pairs of data
• $\mu_d$: the population mean of the differences for all the pairs of data
• $\bar{d}$: the sample mean of the differences for the paired sample data
• $s_d$: the sample standard deviation of the differences for the paired sample data

Requirements
• The sample data are dependent
• Both samples are SRS
• Either 𝑛 > 30 OR the paired differences come from a population that is
normal
Margin of Error
• $E = t_{\alpha/2} \cdot \dfrac{s_d}{\sqrt{n}}$ with $df = n - 1$
C.I.
• $\bar{d} - E < \mu_d < \bar{d} + E$
Notice that we are often interested in whether or not 0 is included within the limits
of the confidence interval constructed, i.e., whether or not 𝜇 𝑑 = 0 is reasonable

Requirements:
• Requirements and Degrees of freedom (𝑑𝑓) are the same as in the C.I.
above
Test Statistic:
$t = \dfrac{\bar{d} - \mu_d}{s_d / \sqrt{n}}$

Use the sample data below with a significance level of 0.05 to test the
claim that for the population of heights of presidents and their main
opponents, the differences have a mean greater than 0 cm (so presidents
tend to be taller than their opponents).
Height (cm) of President        189   173   183   180   179
Height (cm) of Main Opponent    170   185   175   180   178
Difference d                     19   -12     8     0     1

Requirement Check:
1. The samples are dependent because the values are paired.
2. The pairs of data are randomly selected.
3. The number of data points is 5, so normality should be checked (and it is
assumed the condition is met).

Step 1: The claim is that µd > 0 cm.
Step 2: If the original claim is not true, we have µd ≤ 0 cm.
Step 3: The hypotheses can be written as:
$H_0: \mu_d = 0$ cm
$H_1: \mu_d > 0$ cm

Step 4: The significance level is α = 0.05.
Step 5: We use the Student t-distribution.
The summary statistics are: $\bar{d} = 3.2$ and $s_d = 11.4$

Step 6: Determine the value of the test statistic:
$t = \dfrac{\bar{d} - \mu_d}{s_d / \sqrt{n}} = \dfrac{3.2 - 0}{11.4 / \sqrt{5}} = 0.628$
with df = 5 – 1 = 4

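Step 6: Using StatCrunch, the P-value is 0.282.
Using the critical value method: with df = 4 and α = 0.05 in a right-tailed test, the critical
value is t = 2.132, and the test statistic t = 0.628 does not exceed it.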

Step 7: Because the P-value exceeds 0.05, or because the test statistic
does not fall in the critical region, we fail to reject the null hypothesis.
Conclusion: There is not sufficient evidence to support the claim that for
the population of heights of presidents and their main opponent, the
differences have a mean greater than 0 cm.
In other words, these data do not show that presidents tend to be taller than their
opponents.
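The paired test can be reproduced from the raw heights with Python's standard library. This is a sketch rather than the StatCrunch procedure; the critical value 2.132 is the t-table value for df = 4 with area 0.05 in the right tail, and the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

president = [189, 173, 183, 180, 179]  # heights in cm, from the example
opponent = [170, 185, 175, 180, 178]

# Work with the paired differences d = president - opponent.
d = [p - o for p, o in zip(president, opponent)]

n = len(d)
d_bar = mean(d)
s_d = stdev(d)  # sample standard deviation of the differences

mu_d0 = 0.0         # mean difference assumed under H0
t_critical = 2.132  # right-tailed, alpha = 0.05, df = n - 1 = 4

t_stat = (d_bar - mu_d0) / (s_d / sqrt(n))
reject_h0 = t_stat > t_critical  # right-tailed test

print(f"d = {d}")
print(f"d_bar = {d_bar:.1f}, s_d = {s_d:.1f}, t = {t_stat:.3f}, reject H0: {reject_h0}")
```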

Confidence Interval: Support the conclusions with a 90% confidence
interval estimate for µd.
$E = t_{\alpha/2} \cdot \dfrac{s_d}{\sqrt{n}} = 2.132 \cdot \dfrac{11.4}{\sqrt{5}} = 10.8694$
$\bar{d} - E < \mu_d < \bar{d} + E$
$3.2 - 10.8694 < \mu_d < 3.2 + 10.8694$
$-7.7 < \mu_d < 14.1$

We have 90% confidence that the limits of –7.7 cm and 14.1 cm contain
the true mean of the differences in height (president’s height – opponent’s
height).
See that the interval does contain the value of 0 cm, so it is very possible
that the mean of the differences is equal to 0 cm, indicating that there is
no significant difference between the heights.
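The 90% interval for µd can be checked with a few lines of arithmetic. The sketch assumes the rounded summary statistics from the example and the t-table value 2.132 for df = 4.

```python
from math import sqrt

d_bar, s_d, n = 3.2, 11.4, 5  # summary statistics from the example
t_crit = 2.132                # t_{alpha/2} for 90% confidence, df = 4

margin = t_crit * s_d / sqrt(n)  # margin of error E
ci_low, ci_high = d_bar - margin, d_bar + margin

# 0 cm inside the interval means "no mean difference" remains plausible.
zero_in_interval = ci_low <= 0 <= ci_high

print(f"E = {margin:.4f}")
print(f"{ci_low:.1f} cm < mu_d < {ci_high:.1f} cm, contains 0: {zero_in_interval}")
```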

Complete the following:
• Practice Problems 5
• Practice Problems 6
