Lecture 10 - Basic Statistics and the Z-test
C2 Foundation Mathematics (Standard Track)
Dr Linda Stringer Dr Simon Craik
Transcript of "C2 st lecture 10 basic statistics and the z test handout"

  1. Lecture 10 - Basic Statistics and the Z-test. C2 Foundation Mathematics (Standard Track). Dr Linda Stringer (l.stringer@uea.ac.uk), Dr Simon Craik (s.craik@uea.ac.uk). INTO City/UEA London.
  2. Lecture 9 skills. Calculate the following measures of location (averages): mode, median, mean. Calculate the following measures of dispersion (measures of spread): interquartile range, standard deviation, absolute deviation. Perform a Z-test: write the null and alternative hypotheses, look up the critical value, calculate the test statistic, make the decision, write a conclusion.
  3. A data set. A data set is usually a list of values (numbers) that has been gathered in a survey. We will use the following data set to demonstrate the ideas in the first part of this lecture. A statistician wants to find how many pets the average person has. He interviews 10 people and gets the following values: 0 2 0 1 0 8 2 1 0 0
  4. Bar charts. A bar chart showing how many pets 10 people have: 0 2 0 1 0 8 2 1 0 0. [Figure: bar chart with the people (1 to 10) on the horizontal axis and the number of pets (0 to 8) on the vertical axis.]
  5. Pie charts. A pie chart of the data 0 2 0 1 0 8 2 1 0 0. [Figure: pie chart with sectors 0 pets: 50%, 1 pet: 20%, 2 pets: 20%, 8 pets: 10%.]
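  6. Histogram. A histogram of the data 0 2 0 1 0 8 2 1 0 0, showing how many people have each number of pets. [Figure: histogram with the number of pets (0, 1, 2, 8) on the horizontal axis and the number of people (1 to 5) on the vertical axis.]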
  7. Mode. In a data set the mode is the most frequent value (the value which occurs most often). The mode is a type of average. Example: Find the mode of the following data set: 0 2 0 1 0 8 2 1 0 0. In this data set the mode is 0.
  8. Mode. There can be more than one mode in a data set. Example: 0 5 5 0 1 5 0 1 6. There are two modes, 0 and 5, because each occurs three times.
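The mode rule can be checked with a few lines of Python. This is only a sketch: the function name `modes` is made up for illustration, and it returns every value that shares the highest count, so it also handles data sets with more than one mode.

```python
from collections import Counter


def modes(data):
    """Return every value that occurs most often, in increasing order."""
    counts = Counter(data)             # value -> number of occurrences
    highest = max(counts.values())     # the largest frequency in the data
    return sorted(value for value, count in counts.items() if count == highest)


pets = [0, 2, 0, 1, 0, 8, 2, 1, 0, 0]
print(modes(pets))                         # [0]
print(modes([0, 5, 5, 0, 1, 5, 0, 1, 6]))  # [0, 5]
```

Python's standard library offers `statistics.multimode` for the same job; it returns the modes in order of first appearance rather than sorted.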
  9. Median. The median is the middle value in an ordered data set. It is another type of average. First order the data, with values increasing from left to right. Let $n$ be the size of the data set (the number of values). If $\frac{n}{2}$ is an integer (whole number) then the median is the midpoint of the $\frac{n}{2}$th value and the $(\frac{n}{2} + 1)$th value (to find the midpoint, add the values together and divide by 2). If $\frac{n}{2}$ is not an integer then round it up to the nearest integer, which is $\frac{n+1}{2}$. The median is the $\frac{n+1}{2}$th value. Alternatively, find the median by crossing off pairs of values, starting from the ends of the data set.
  10. Example. Order the data: 0 0 0 0 0 1 1 2 2 8. $n = 10$ (the number of values). $\frac{n}{2} = \frac{10}{2} = 5$, which is an integer. The median is the midpoint of the 5th and 6th values, $\frac{0+1}{2} = 0.5$.
  11. Example 2. Order the data: 0 0 0 1 1 5 5 5 6. $n = 9$ (the number of values). $\frac{n}{2} = \frac{9}{2} = 4.5$, which is not an integer. Round up to 5. The median is the 5th value, which is 1.
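The two median cases translate directly into code. The sketch below assumes a hypothetical helper named `median_by_rule`; it follows the rule from the slides (midpoint when $n/2$ is an integer, otherwise round up) and reproduces both worked examples.

```python
import math


def median_by_rule(data):
    """Median using the lecture's rule on the ordered data."""
    ordered = sorted(data)
    n = len(ordered)
    if n % 2 == 0:
        k = n // 2
        # midpoint of the k-th and (k+1)-th values (lists are 0-indexed)
        return (ordered[k - 1] + ordered[k]) / 2
    k = math.ceil(n / 2)        # round n/2 up to (n+1)/2
    return ordered[k - 1]       # the k-th value


print(median_by_rule([0, 2, 0, 1, 0, 8, 2, 1, 0, 0]))  # 0.5
print(median_by_rule([0, 5, 5, 0, 1, 5, 0, 1, 6]))     # 1
```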
  12. Interquartile range. First order the data, with values increasing from left to right. We want to find two values: the first quartile $Q_1$ and the third quartile $Q_3$. Let $n$ be the size of the data set (the number of values). To find $Q_1$ we multiply $n$ by $\frac{1}{4}$. If $\frac{n}{4}$ is an integer (whole number) then $Q_1$ is the midpoint of the $\frac{n}{4}$th value and the $(\frac{n}{4} + 1)$th value. If $\frac{n}{4}$ is not an integer then round it up to the nearest integer; $Q_1$ is the corresponding value. To find $Q_3$ we multiply $n$ by $\frac{3}{4}$. If $\frac{3n}{4}$ is an integer then $Q_3$ is the midpoint of the $\frac{3n}{4}$th value and the $(\frac{3n}{4} + 1)$th value. If $\frac{3n}{4}$ is not an integer then round it up to the nearest integer; $Q_3$ is the corresponding value. The interquartile range is $Q_3 - Q_1$.
  13. Example. Order the data: 0 0 0 0 0 1 1 2 2 8. $\frac{n}{4} = \frac{10}{4} = 2.5$, which is not an integer. Round up to 3. $Q_1$ is the third value, so $Q_1 = 0$. $\frac{3n}{4} = \frac{3 \times 10}{4} = 7.5$, which is not an integer. Round up to 8. $Q_3$ is the eighth value, so $Q_3 = 2$. The interquartile range is $Q_3 - Q_1 = 2 - 0 = 2$.
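The quartile rule can be sketched in the same way. The helper name `quartile` and its argument `k` (1 for $Q_1$, 3 for $Q_3$) are assumptions made for this illustration. Integer arithmetic with `divmod` avoids any floating-point doubt about whether $kn/4$ is a whole number.

```python
def quartile(data, k):
    """Q1 (k=1) or Q3 (k=3) using the lecture's rounding rule."""
    ordered = sorted(data)
    n = len(ordered)
    whole, remainder = divmod(k * n, 4)   # k*n/4 = whole + remainder/4
    if remainder == 0:
        # k*n/4 is an integer: midpoint of that value and the next one
        return (ordered[whole - 1] + ordered[whole]) / 2
    # otherwise round up: the (whole+1)-th value sits at index `whole`
    return ordered[whole]


pets = [0, 2, 0, 1, 0, 8, 2, 1, 0, 0]
q1, q3 = quartile(pets, 1), quartile(pets, 3)
print(q1, q3, q3 - q1)   # 0 2 2
```

Other conventions for quartiles exist (software packages often interpolate between values), so library functions may give slightly different answers from this rule on the same data.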
  14. Sigma notation $\Sigma$. Given a data set $X$, we denote the sum of all the values $x$ in $X$ by $\sum x$. Example: If $X$ = 0 2 0 1 0 8 2 1 0 0 then $\sum x = 0 + 2 + 0 + 1 + 0 + 8 + 2 + 1 + 0 + 0 = 14$.
  15. Mean. The mean is our third average. In a data set of size $n$ the mean, denoted $\bar{x}$, is the sum of all the values divided by $n$: $\bar{x} = \frac{\sum x}{n}$. Example: What is the mean number of pets? Calculate the sum of all the values and divide by $n$: $\bar{x} = \frac{\sum x}{n} = \frac{0 + 2 + 0 + 1 + 0 + 8 + 2 + 1 + 0 + 0}{10} = \frac{14}{10} = 1.4$.
  16. Standard deviation, $\sigma$. The standard deviation, $\sigma$, is a measure of dispersion. First calculate the variance, $\sigma^2$. The standard deviation, $\sigma$, is the square root of the variance. There are two formulae for variance. They give the same answer. Usually the second formula is easier to use. $\sigma^2 = \frac{\sum (x - \bar{x})^2}{n} = \frac{\sum x^2}{n} - \bar{x}^2$. When you have found the variance, do not forget to take the square root! $\sigma = \sqrt{\frac{\sum x^2}{n} - \bar{x}^2}$.
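  17. Proof that the two formulae for standard deviation are equivalent. $\sigma^2 = \frac{\sum (x - \bar{x})^2}{n} = \frac{\sum (x^2 - 2x\bar{x} + \bar{x}^2)}{n} = \frac{\sum x^2}{n} - 2\bar{x}\,\frac{\sum x}{n} + \frac{\sum \bar{x}^2}{n} = \frac{\sum x^2}{n} - 2\bar{x}^2 + \bar{x}^2\,\frac{\sum 1}{n} = \frac{\sum x^2}{n} - \bar{x}^2$. The last two steps use $\frac{\sum x}{n} = \bar{x}$ and $\sum 1 = n$.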
  18. Example. What is the standard deviation of the following data? 0 2 0 1 0 8 2 1 0 0. Use the second formula to calculate the variance, $\sigma^2 = \frac{\sum x^2}{n} - \bar{x}^2$. We previously worked out the mean $\bar{x} = 1.4$. $\sum x^2 = 0^2 + 2^2 + 0^2 + 1^2 + 0^2 + 8^2 + 2^2 + 1^2 + 0^2 + 0^2 = 74$. The variance is $\sigma^2 = \frac{\sum x^2}{n} - \bar{x}^2 = \frac{74}{10} - 1.4^2 = 5.44$. The standard deviation is $\sigma = \sqrt{5.44} = 2.33$ to 2 d.p.
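As a quick numerical check of the mean and of both variance formulae on the pets data, here is a short Python sketch. The variable names are chosen for illustration only.

```python
import math

pets = [0, 2, 0, 1, 0, 8, 2, 1, 0, 0]
n = len(pets)

mean = sum(pets) / n                                          # sum of values / n
variance_first = sum((x - mean) ** 2 for x in pets) / n       # first formula
variance_second = sum(x ** 2 for x in pets) / n - mean ** 2   # second formula
sigma = math.sqrt(variance_second)                            # take the square root

print(round(mean, 2))             # 1.4
print(round(variance_first, 2))   # 5.44
print(round(variance_second, 2))  # 5.44
print(round(sigma, 2))            # 2.33
```

Both formulae divide by $n$, so the matching standard-library function is `statistics.pstdev` (population standard deviation), not `statistics.stdev`, which divides by $n - 1$.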
  19. Absolute value. The absolute value function gives the size of a number without its sign, so the result is never negative: $|x| = x$ if $x \geq 0$, and $|x| = -x$ if $x < 0$. Examples: $|5| = 5$, $|-8| = 8$, $|-1.213| = 1.213$, $|1,000,000| = 1,000,000$.
  20. Absolute deviation. The absolute deviation measures the average distance from each value to the mean. It is another measure of dispersion. As a formula: $AD = \frac{\sum |x - \bar{x}|}{n}$.
  21. Example. What is the absolute deviation of the data 0 2 0 1 0 8 2 1 0 0? The mean is $\bar{x} = 1.4$. We first work out $|x - \bar{x}|$ for each value: 1.4 0.6 1.4 0.4 1.4 6.6 0.6 0.4 1.4 1.4. These sum to 15.6, so the absolute deviation is $AD = \frac{\sum |x - \bar{x}|}{n} = \frac{15.6}{10} = 1.56$.
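The absolute deviation formula is short enough to write as one function. The name `absolute_deviation` is hypothetical; the sketch simply averages the distances $|x - \bar{x}|$.

```python
def absolute_deviation(data):
    """Average distance from each value to the mean."""
    n = len(data)
    mean = sum(data) / n
    return sum(abs(x - mean) for x in data) / n


pets = [0, 2, 0, 1, 0, 8, 2, 1, 0, 0]
print(round(absolute_deviation(pets), 2))   # 1.56
```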
  22. Hypothesis testing. We use hypothesis testing to compare the mean of a very large data set, a population mean, with the mean of a sample data set, a sample mean. Example: A lightbulb company says their lightbulbs last a mean time of 1000 hours with a standard deviation of 50. We think their lightbulbs last longer than this and propose a test at a 5% level of significance. We buy 75 lightbulbs and they last a mean time of 1022 hours. The population mean is 1000 hours. The sample is the 75 lightbulbs that we test. The sample mean is 1022 hours.
  23. Hypothesis testing. The null hypothesis, $H_0$, is a statement which is assumed to be true. Sample data is collected and tested to see if it is consistent with the null hypothesis. If the sample mean is significantly different from the population mean, then we say that we have sufficient evidence to reject the null hypothesis, $H_0$, in favour of the alternative hypothesis, $H_1$.
  24. The null hypothesis and the alternative hypothesis. The null hypothesis concerns the population mean. It is of the form $H_0 : \mu = A$, where $\mu$ is the population mean and $A$ is the hypothetical value. The alternative hypothesis is that the null hypothesis is incorrect, and will be one of $H_1 : \mu \neq A$, $H_1 : \mu < A$ or $H_1 : \mu > A$. The wording of the question will tell you which of these to use.
  25. Significance level. The null hypothesis will always be tested to a given level of significance. A 5% level of significance means we are testing whether, if the null hypothesis were true, the probability of getting sample data at least as extreme as ours is less than 0.05. If the probability is less than 0.05, we reject the null hypothesis in favour of the alternative hypothesis. A 1% level of significance corresponds to a probability of 0.01.
  26. Critical value. A critical value is the value beyond which we reject the null hypothesis. It tells us the boundary of the critical region(s). In a Z-test this depends on the alternative hypothesis and the significance level. We look up the critical value(s) in tables:

      Significance level   One-tail   Two-tail
      5%                   1.65       1.96
      1%                   2.33       2.58
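The tabulated critical values can be reproduced from the standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the function name `critical_value` and its parameters are assumptions made for this illustration. For a two-tailed test the significance level is split equally between the two tails.

```python
from statistics import NormalDist


def critical_value(significance, tails):
    """Positive critical value of the standard normal distribution.

    significance: for example 0.05 or 0.01
    tails: 1 for a one-tailed test, 2 for a two-tailed test
    """
    tail_area = significance / tails            # area of each rejection region
    return NormalDist().inv_cdf(1 - tail_area)  # Z-score with that area above it


for significance in (0.05, 0.01):
    for tails in (1, 2):
        print(significance, tails, round(critical_value(significance, tails), 3))
```

The computed 5% one-tail value is about 1.645, which the table above rounds to 1.65; the other three agree with the table to 2 d.p.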
  27. $H_1 : \mu \neq A$. If our alternative hypothesis is $H_1 : \mu \neq A$ we are doing a two-tailed test and we have 2 critical values, one negative and one positive. Each critical value is the boundary of a rejection region. For a 5% level of significance the critical values are −1.96 and 1.96. [Figure: normal curve with both tails shaded, below −1.96 and above 1.96.] The rejection (shaded) regions have a combined area of 0.05.
  28. $H_1 : \mu > A$. If our alternative hypothesis is $H_1 : \mu > A$ we are doing a one-tailed test and we have 1 critical value, which is positive. The critical value is the boundary of the rejection region. For a 5% level of significance the critical value is 1.65. [Figure: normal curve with the upper tail shaded, above 1.65.] The rejection region has an area of 0.05.
  29. $H_1 : \mu < A$. If our alternative hypothesis is $H_1 : \mu < A$ we are doing a one-tailed test and we have 1 critical value, which is negative. The critical value is the boundary of the rejection region. For a 5% level of significance the critical value is −1.65. [Figure: normal curve with the lower tail shaded, below −1.65.] The rejection region has an area of 0.05.
  30. Test statistic. The test statistic is the difference between the sample mean, $\bar{x}$, and the (hypothetical) population mean $A$, divided by the standard error. The standard error is $\sigma/\sqrt{n}$ for the Z-test and $s/\sqrt{n}$ for the T-test, where $n$ is the sample size, $\sigma$ is the population standard deviation and $s$ is the sample standard deviation. The Z-test statistic is $Z = \frac{\bar{x} - A}{\sigma/\sqrt{n}}$. If the test statistic lies beyond the critical value(s) (in the rejection region) we reject $H_0$. If it does not, we accept $H_0$.
  31. Z-test - Example 1. Research says that the mean height for a man is 182 cm with a standard deviation of 9 cm. We suspect men might be shorter than this. We get the heights of 100 men and their mean height is 176 cm. We test at a 1% level of significance.
  32. Z-test - Example 1. The null hypothesis and alternative hypothesis are $H_0 : \mu = 182$ and $H_1 : \mu < 182$. We are doing a 1-tailed test at a 1% level of significance so the critical value is $C = -2.33$. The test statistic is $Z = \frac{176 - 182}{9/\sqrt{100}} = -6.67$. Since $-6.67 < -2.33$, we reject the null hypothesis.
  33. Z-test - Example 2. A company says employees are supposed to work an average of 40 hours a week with a standard deviation of 5 hours. Alfred wants to know if he fits this to a 5% level of significance. He notes down how many hours he works over 48 weeks and has a mean of 39 hours.
  34. Z-test - Example 2. The null hypothesis and alternative hypothesis are $H_0 : \mu = 40$ and $H_1 : \mu \neq 40$. We are doing a 2-tailed test at a 5% level of significance so the critical values are $C = -1.96$ and $C = 1.96$. The test statistic is $Z = \frac{39 - 40}{5/\sqrt{48}} = -1.39$. Since $-1.96 < -1.39 < 1.96$, we accept the null hypothesis.
  35. Z-test - Example 3. A lightbulb company says their lightbulbs last a mean time of 1000 hours with a standard deviation of 50. We think their lightbulbs last longer than this and propose a test at a 5% level of significance. We buy 75 lightbulbs and they last a mean time of 1022 hours.
  36. Z-test - Example 3. The null hypothesis and alternative hypothesis are $H_0 : \mu = 1000$ and $H_1 : \mu > 1000$. We are doing a 1-tailed test at a 5% level of significance so the critical value is $C = 1.65$. The test statistic is $Z = \frac{1022 - 1000}{50/\sqrt{75}} = 3.81$. Since $1.65 < 3.81$, we reject the null hypothesis.
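The three worked examples all follow the same steps, so they can be sketched as one small function. Everything named here is an assumption made for illustration: the function `z_test`, the dictionary `CRITICAL_VALUES` (filled with the tabulated values from the lecture) and the strings `"less"`, `"greater"` and `"not equal"` used to select the alternative hypothesis.

```python
import math

# Tabulated critical values: (significance level in %, number of tails) -> value
CRITICAL_VALUES = {(5, 1): 1.65, (5, 2): 1.96, (1, 1): 2.33, (1, 2): 2.58}


def z_test(sample_mean, A, sigma, n, significance_percent, alternative):
    """Return (Z, reject_H0) for H0: mu = A against the chosen alternative."""
    z = (sample_mean - A) / (sigma / math.sqrt(n))   # test statistic
    if alternative == "not equal":                   # two-tailed test
        c = CRITICAL_VALUES[(significance_percent, 2)]
        reject = z < -c or z > c
    elif alternative == "greater":                   # one-tailed, upper tail
        reject = z > CRITICAL_VALUES[(significance_percent, 1)]
    elif alternative == "less":                      # one-tailed, lower tail
        reject = z < -CRITICAL_VALUES[(significance_percent, 1)]
    else:
        raise ValueError("alternative must be 'less', 'greater' or 'not equal'")
    return z, reject


examples = [
    ("heights", z_test(176, 182, 9, 100, 1, "less")),
    ("working hours", z_test(39, 40, 5, 48, 5, "not equal")),
    ("lightbulbs", z_test(1022, 1000, 50, 75, 5, "greater")),
]
for name, (z, reject) in examples:
    print(name, round(z, 2), "reject H0" if reject else "accept H0")
```

Run on the three examples, this gives test statistics of −6.67, −1.39 and 3.81, with the null hypothesis rejected in the first and third cases and accepted in the second, matching the slides.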
  37. Z-test summary. You will be given: 1. Population mean, $A$; 2. Population standard deviation, $\sigma$; 3. Significance level; 4. Sample mean, $\bar{x}$; 5. Sample size, $n$; 6. Quantifying word. You have to work out: 1. Null hypothesis and alternative hypothesis; 2. Critical value(s); 3. Test statistic; 4. Decision: accept or reject $H_0$ (sketch a picture if possible); 5. Conclusion.
  38. The theory behind the Z-test and the T-test. If samples of size $n$ are taken from a population with mean $A$ and standard deviation $\sigma$, then the sample means are (approximately, for large $n$) normally distributed, with mean $A$ and standard deviation $\sigma/\sqrt{n}$. When we calculate the test statistic, we are calculating the Z-score of the sample mean. The critical value is the Z-score of a sample mean which we have a 5% (or 1%) probability of obtaining. For further information, try a statistics book from the library, or the Khan Academy videos on YouTube.
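A small simulation can make the claim about sample means concrete. This is a sketch under stated assumptions, none of which come from the lecture: the population is taken to be uniform on the interval from 0 to 12 (so $A = 6$ and $\sigma = 12/\sqrt{12}$), samples have size $n = 36$, and 5000 samples are drawn with a fixed random seed.

```python
import random
import statistics

rng = random.Random(0)               # fixed seed so the run is repeatable

LOW, HIGH = 0.0, 12.0                # hypothetical population: uniform on [0, 12]
A = (LOW + HIGH) / 2                 # population mean
sigma = (HIGH - LOW) / 12 ** 0.5     # population standard deviation of a uniform
n = 36                               # sample size
number_of_samples = 5000

# Draw many samples of size n and record the mean of each one.
sample_means = [
    statistics.fmean(rng.uniform(LOW, HIGH) for _ in range(n))
    for _ in range(number_of_samples)
]

print("mean of sample means:", statistics.fmean(sample_means), "expected:", A)
print("sd of sample means:  ", statistics.pstdev(sample_means),
      "expected:", sigma / n ** 0.5)
```

The two printed pairs should be close to each other: the sample means centre on $A$ and their spread is close to $\sigma/\sqrt{n}$, even though the population itself is not normal.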
  39. Normal distribution $X \sim N(\mu, \sigma^2)$. The normal distribution is defined as $f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$, where $\sigma$ is the population standard deviation and $\mu$ is the population mean. [Figure: graph of $f(x)$ for $\mu = 0$ and $\sigma = 1$, with $x$ from −4 to 4 and $y$ from 0 to 0.5.] Probabilities correspond to areas under this curve.
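The density formula and the link between probabilities and areas can both be checked numerically. In this sketch the function name `normal_pdf` is an assumption; `statistics.NormalDist` is the standard-library class used to get areas under the curve through its `cdf` method.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density f(x) of the normal distribution N(mu, sigma^2)."""
    exponent = -((x - mu) ** 2) / (2 * sigma ** 2)
    return math.exp(exponent) / (sigma * math.sqrt(2 * math.pi))


# Height of the standard normal curve at its peak, x = 0
print(round(normal_pdf(0), 4))     # 0.3989

# Area under the standard normal curve between -1.96 and 1.96
standard = NormalDist()            # mu = 0, sigma = 1
area = standard.cdf(1.96) - standard.cdf(-1.96)
print(round(area, 3))              # 0.95
```

The area of about 0.95 between −1.96 and 1.96 leaves a combined area of about 0.05 in the two tails, which is why 1.96 is the two-tailed critical value at the 5% level.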