# Unit 2 - Statistics

Published in: Education

### Unit 2 - Statistics

1. Unit 2 - Statistics: Essential Mathematics 40S
2. Statistics
    - Quite often you will encounter statistics in your day-to-day life. These statistics may be mentioned on the news, in commercials, in weather reports or on the Internet. It's important for you to be able to interpret these statistics, as they may affect your opinion and the choices you make.
3. Measures of Central Tendency
    - In this lesson you will learn about measures of central tendency. Large amounts of data are often summarized by stating the values of the mean, median and mode.
    - Although these three measures of central tendency are usually located near the centre of a group of data, they often have different calculated values.
4. Calculating the Mean, Median and Mode
    - The word average is often used in everyday language to describe the sum of a set of values divided by the total number of values.
    - In statistics, this term is known as the mean or arithmetic mean.
    - There are two other common statistical terms that are used to refer to the centre of a set of values: the median and the mode.
    - The median of a set of values is the middle value when the values are arranged in ascending or descending order. The mode of a set of values is the value that occurs the most often.
5. Calculating the Mean, Median and Mode: Example 1
    - Calculate the mean, the median and the mode for the following set of values: 55, 62, 70, 77, 78, 78
    - Sample Solution:
        - Mean = (55 + 62 + 70 + 77 + 78 + 78)/6 = 420/6 = 70
        - Median = (70 + 77)/2 = 73.5 (If there is an even number of values you must average the two values in the middle.)
6. Calculating the Mean, Median and Mode: Example 1 (continued)
    - Sample Solution: Mode = 78 (the number that occurs the most often)
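
The three measures in Example 1 can be checked with a short sketch. It assumes Python's standard `statistics` module; the variable names are illustrative only and are not part of the original slides.

```python
import statistics

# Data set from Example 1
values = [55, 62, 70, 77, 78, 78]

# Mean: sum of the values divided by the number of values
mean = statistics.mean(values)

# Median: middle value of the sorted data; with an even count,
# the two middle values (70 and 77) are averaged
median = statistics.median(values)

# Mode: the value that occurs most often
mode = statistics.mode(values)

print("mean:", mean, "median:", median, "mode:", mode)
```
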
7. Calculating the Mean, Median and Mode: Example 2
    - Kansas Ross has a mean mark of 50% for her first three math tests and then she earns a mark of 70% on her fourth test. Kansas states that since the average of 50 and 70 is 60, her new mean math mark is 60%. Do you think Kansas is correct? Explain your reasoning.
8. Calculating the Mean, Median and Mode: Example 2
    - Sample Solution:
    - Kansas is incorrect. If her first three tests have a mean mark of 50%, then the sum of her first three tests must be 150 (150/3 = 50). Then the sum of her four tests will be 150 + 70 = 220, so the mean of her four tests is 220/4 = 55.
    - Kansas did not take into account that each of the four tests should be weighted equally. Doing so gives a mean of 55%, not 60%.
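
The reasoning in Example 2 can be sketched in a few lines. The variable names below are hypothetical and chosen only for illustration.

```python
# Kansas's first three tests have a mean of 50%, so their sum is 3 * 50 = 150
first_three_mean = 50
number_of_first_tests = 3
fourth_mark = 70

# Correct approach: rebuild the total of all four marks, then divide by 4
total_of_four = first_three_mean * number_of_first_tests + fourth_mark
new_mean = total_of_four / (number_of_first_tests + 1)

# Kansas's approach: average the old mean with the new mark,
# which treats three tests as if they were one
kansas_mean = (first_three_mean + fourth_mark) / 2

print("correct mean:", new_mean, "Kansas's value:", kansas_mean)
```
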
9. The Effect of Outliers on the Measures of Central Tendency
    - In statistics, an outlier is an observation that is numerically distant from the rest of the data. It's a value that lies outside (and is much larger or smaller than) the other values in a set of data.
    - For example, in the scores 3, 25, 27, 28, 29, 32, 33, 85, both 3 and 85 are considered "outliers" because they are numerically distant from the other numbers in the data set.
10. The Effect of Outliers on the Measures of Central Tendency: Example
    - There are six people in a group who are 61, 61, 63, 64, 66, and 90 inches tall.
        - a) Determine the mean, median and mode.
        - b) What is the outlier? If you remove the outlier, which measure(s) of central tendency is affected the most?
11. The Effect of Outliers on the Measures of Central Tendency: Solution
    - The mean is 67.5, the median is 63.5 (halfway between 63 and 64) and the mode is 61.
    - 90 is the outlier. If you remove the 90 from the set of values, the new mean is 63, and the median is also 63. The mode is unchanged at 61. The outlier affects the mean more because, when dealing with the median, it doesn't matter what the actual value of the outlier is. Whether it was 90 or 130, taking it out would have the same effect on the median. The mean depends on the actual value of the outlier.
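
The effect described in this solution can be checked with a minimal sketch, assuming Python's standard `statistics` module. The helper `summarize` is a hypothetical name introduced for illustration.

```python
import statistics

def summarize(data):
    """Return (mean, median, mode) for a list of numbers."""
    return statistics.mean(data), statistics.median(data), statistics.mode(data)

heights = [61, 61, 63, 64, 66, 90]           # data from the example
without_outlier = [61, 61, 63, 64, 66]       # same data with 90 removed
bigger_outlier = [61, 61, 63, 64, 66, 130]   # outlier changed from 90 to 130

with_90 = summarize(heights)
without_90 = summarize(without_outlier)
with_130 = summarize(bigger_outlier)

# The median and mode are the same whether the outlier is 90 or 130;
# only the mean follows the actual size of the outlier.
print(with_90, without_90, with_130)
```
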
12. Determining the Trimmed Mean
    - A trimmed mean is calculated by discarding a certain percentage of the lowest and the highest scores, then calculating the mean of the remaining scores.
    - For example, a mean trimmed 25% is calculated by discarding the top and bottom 25% of the scores, then taking the mean of the remaining scores.
    - A trimmed mean is less susceptible to the effects of extreme scores (outliers) than the arithmetic mean. The trimmed mean is designed to reduce the impact of outliers.
13. Determining the Trimmed Mean
    - These are the steps to follow to determine the trimmed mean:
        - Find the number of observations, denoted n.
        - Reorder them from smallest to largest.
        - Find the proportion trimmed, p = P/100, where P = % trimmed.
        - Calculate np to determine how many values to trim at each end.
        - Discard those values and calculate the mean of the remaining values.
14. Determining the Trimmed Mean: Example
    - Find the 10% trimmed mean of 2, 35, 46, 47, 51, 51, 59, 60, 61, 121.
15. Determining the Trimmed Mean: Solution
    - n = 10, p = 10/100 = 0.10, np = 10 × 0.1 = 1, which is an integer, so trim exactly one observation at each end. Trim off 2 and 121, which leaves you with 8 observations.
    - If np has a fractional part, you can round it to the nearest integer to determine how many values to trim at each end.
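16. Determining the Trimmed Mean
    - The 10% trimmed mean is the mean of the 8 remaining observations: (35 + 46 + 47 + 51 + 51 + 59 + 60 + 61)/8 = 410/8 = 51.25.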
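
The trimmed-mean steps can be sketched as a small function. This is an illustration under stated assumptions, not a definitive implementation: `trimmed_mean` is a hypothetical name, and rounding an exact half upward is an assumption, since the slides only say to round np to the nearest integer.

```python
import math

def trimmed_mean(values, percent):
    """Mean of the values after trimming `percent` % from each end."""
    n = len(values)                      # number of observations
    ordered = sorted(values)             # smallest to largest
    # np = n * (P / 100), rounded to the nearest integer
    # (an exact half is rounded up here, which is an assumption)
    trim = math.floor(n * percent / 100 + 0.5)
    if 2 * trim >= n:
        raise ValueError("trimming would remove every value")
    kept = ordered[trim:n - trim]        # drop `trim` values at each end
    return sum(kept) / len(kept)

data = [2, 35, 46, 47, 51, 51, 59, 60, 61, 121]
result = trimmed_mean(data, 10)
print("10% trimmed mean:", result)
```
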
17. Determine the Weighted Mean of a Set of Data
    - The weighted mean is similar to an arithmetic mean (the most common type of average). But with the weighted mean, instead of each of the data points contributing equally to the final average, some data points contribute more than others. If all the weights are equal, then the weighted mean is the same as the arithmetic mean.
18. Determine the Weighted Mean of a Set of Data
    - Weighted mean = $\dfrac{\sum w x}{\sum w}$, where $x$ is each value and $w$ is its weight.
19. Determine the Weighted Mean of a Set of Data
    - Consider the following example:
    - Your math teacher has two math classes. One class has 5 students, while the other has 10 students. The grades in each class on a test were:
        - Class 1: 55, 69, 80, 84, 62
        - Class 2: 70, 90, 55, 84, 88, 93, 78, 69, 98, 75
    - The mean for class 1 is 70, and the mean for class 2 is 80.
20. Determine the Weighted Mean of a Set of Data
    - If you calculate the mean of the two class means you get 75 (70 + 80 = 150, and 150/2 = 75). However, this does not account for the different number of students in each class, and the value of 75 does not reflect the mean student grade for all 15 students.
    - The accurate student mean for all of the students, without regard to which class they are in, can be found by totalling all of the grades and dividing by 15 students.
21. Determine the Weighted Mean of a Set of Data
    - This can also be accomplished by using a weighted mean of the class means: (5 × 70 + 10 × 80)/(5 + 10) = 1150/15 ≈ 76.7
    - The use of a weighted mean makes it possible to find the mean student grade in the case where only the class means and the number of students in each class are available.
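
The two-class example can be checked numerically. The sketch below uses the grades from the slides; the variable names are illustrative only.

```python
class_1 = [55, 69, 80, 84, 62]
class_2 = [70, 90, 55, 84, 88, 93, 78, 69, 98, 75]

mean_1 = sum(class_1) / len(class_1)
mean_2 = sum(class_2) / len(class_2)

# Plain mean of the two class means: ignores class size
mean_of_means = (mean_1 + mean_2) / 2

# Mean of all 15 grades taken together
all_grades = class_1 + class_2
overall_mean = sum(all_grades) / len(all_grades)

# Weighted mean of the class means, weighted by number of students
weighted = (len(class_1) * mean_1 + len(class_2) * mean_2) / (len(class_1) + len(class_2))

print(mean_of_means, round(overall_mean, 1), round(weighted, 1))
```

The weighted mean of the class means agrees with the mean of all 15 grades, while the plain mean of the two class means does not.
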
22. Determine the Weighted Mean of a Set of Data
    - To calculate the weighted mean for a set of data follow these steps:
        - Multiply each value by its weight.
        - Add up the products of value multiplied by weight to get the total value.
        - Add the weights themselves to get the total weight.
        - Divide the total value by the total weight.
23. Determine the Weighted Mean of a Set of Data: Example
    - One hundred people were surveyed to find out how many days they exercised per week. The following chart summarizes the results of the survey. What is the mean number of days that this group of people exercised per week?

    | Number of days of exercise per week | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
    |---|---|---|---|---|---|---|---|---|
    | Number of people | 6 | 5 | 7 | 15 | 29 | 16 | 14 | 8 |
24. Determine the Weighted Mean of a Set of Data: Sample Solution
    - 1. Multiply each value by its weight. 0 × 6 = 0, 1 × 5 = 5, 2 × 7 = 14, 3 × 15 = 45, 4 × 29 = 116, 5 × 16 = 80, 6 × 14 = 84, 7 × 8 = 56
    - 2. Add up the products of value multiplied by weight to get the total value. Sum = 0 + 5 + 14 + 45 + 116 + 80 + 84 + 56 = 400
    - 3. Add the weights themselves to get the total weight. There are 100 people in the survey.
    - 4. Divide the total value by the total weight. 400/100 = 4 days. The mean number of days that this group of people exercised per week is 4 days.
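
The four weighted-mean steps can be sketched as a general function and applied to the survey data. `weighted_mean` is a hypothetical helper written for illustration, not a library call.

```python
def weighted_mean(values, weights):
    """Weighted mean: total of value * weight divided by total weight."""
    if len(values) != len(weights):
        raise ValueError("each value needs exactly one weight")
    # Steps 1 and 2: multiply each value by its weight and add the products
    total_value = sum(v * w for v, w in zip(values, weights))
    # Step 3: add the weights themselves
    total_weight = sum(weights)
    # Step 4: divide the total value by the total weight
    return total_value / total_weight

days = [0, 1, 2, 3, 4, 5, 6, 7]           # days of exercise per week
people = [6, 5, 7, 15, 29, 16, 14, 8]     # number of people reporting each value

mean_days = weighted_mean(days, people)
print("mean days of exercise per week:", mean_days)
```

With equal weights the function reduces to the ordinary arithmetic mean, for example `weighted_mean([70, 80], [1, 1])` gives 75.
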
25. Practice
    - Statistics: Measures of Central Tendency, Worksheet #1
26. Statistics
    - One of the most fundamental principles in statistics is that of variability. The study and understanding of variability is important in medicine, manufacturing, science, meteorology, business and many aspects of our daily lives. How effective is a particular drug? What is the average life span of a D-cell battery? What's the probability of precipitation tomorrow? Which age group watches the most TV in a week? Statistics, and more specifically the study of variability, helps us to answer questions like these. In this unit you will be learning about the variability of data.
27. Percentiles
    - One way to find out how well you have done on a test is to convert your test mark to a percent score. This percent score indicates how well you would have done on the test if it were marked out of 100.
    - Although somewhat meaningful in itself, your score takes on more meaning when it is compared to that of your classmates. How many students scored higher than you did? How many scored lower? The study of percentiles helps us answer these questions.
28. Percentiles
    - Cara writes a test and scores 48 out of a possible 60 marks.
    - Therefore, 48 out of 60 as a percent score is 80%.
    - A mark of 80% seems like a very good mark. It is often given a letter grade of 'A' and is associated with excellence.
    - Is it?
29. Percentiles
    - Suppose 100 students have written the same test as Cara. Suppose only 10 of these students score less than 48 out of 60. How does Cara's mark compare with the marks of the other students who have written the same test? Assume that no other student has scored exactly 48 out of 60.
    - Solution: You can compare Cara's mark with those of the other students who have written the same test using a percent bar. Since the majority of the students have scored higher than 48 out of 60, Cara's mark of 48 out of 60 is not that impressive.
30. Percentiles
    - Suppose, however, that of the 100 students who have written the same test as Cara, 90 of them score lower than 48 out of 60. How does Cara's mark now compare with the marks of the other students? Assume no other student scores 48 out of 60.
    - Solution: You can again compare Cara's mark with those of the other students with a percent bar. Relative to the other test scores, Cara's mark of 48 out of 60 is very good.
31. Percentile Rank
    - A score becomes more meaningful when it is compared to other scores. One way to compare a score is to assign it a percentile rank. A percentile rank indicates the percent of all scores that fall below a particular score.
32. Percentile Rank
    - In the first example, where 10% of the students have scored below 48 out of 60, Cara's percentile rank would be 10 out of 100 or in the 10th percentile. A mark in the 10th percentile indicates that Cara has scored better than only 10% of all the students who have written the test.
    - In the second example, where 90% of students have scored below 48 out of 60, Cara's percentile rank would be 90 out of 100 or in the 90th percentile. A mark in the 90th percentile indicates that Cara has scored better than 90% of the students who have written the test.
    - A percentile rank compares the number of scores less than or equal to a given score to the total number of scores. The higher the percentile rank, the better the score compares to the other scores. The lower the percentile rank, the poorer the score compares to the other scores.
33. Calculating Percentile Rank
    - The formula is as follows: $PR = \dfrac{B + 0.5E}{n} \times 100$
        - B = the number of scores Below a given score
        - E = the number of scores Equal to the given score, including the given score. If there are no other scores equal to the given score, then E = 1.
        - n = the total number of scores
    - The percentile formula takes all the scores less than the given score (B) and adds these to half the scores equal to the given score (E). This sum is then converted to a percent (percentile) by dividing by the total number of scores (n) and then multiplying that value by 100. Note that the percentile rank is usually rounded up to the next whole number.
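
The percentile rank calculation described above can be sketched as follows. The function name `percentile_rank` and the sample marks are hypothetical, chosen for illustration only. The sketch assumes the given score appears in the list (so E is at least 1) and uses integer arithmetic so that rounding up is not disturbed by floating-point error.

```python
def percentile_rank(scores, score):
    """Percentile rank (B + 0.5E) / n * 100, rounded up to a whole number."""
    below = sum(1 for s in scores if s < score)    # B
    equal = sum(1 for s in scores if s == score)   # E, includes the score itself
    n = len(scores)                                # total number of scores
    if equal == 0:
        raise ValueError("the given score must be one of the scores")
    # (B + E/2) / n * 100 rewritten as (2B + E) * 100 / (2n) to stay in integers
    numerator = (2 * below + equal) * 100
    denominator = 2 * n
    return -(-numerator // denominator)            # ceiling division

# Hypothetical class marks out of 60 (not from the slides)
marks = [40, 45, 48, 48, 50, 52, 55, 58, 59, 60]
rank_of_48 = percentile_rank(marks, 48)   # B = 2, E = 2, n = 10
rank_of_60 = percentile_rank(marks, 60)   # B = 9, E = 1, n = 10
print(rank_of_48, rank_of_60)
```
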
34. Percentiles: Notebook Assignment, Page 390, Q. 1 - 8
35. Standard Deviation
    - Measures of central tendency give us a sense of the 'average' of all values in a set of data. The range measures the variability of the data in that it is the difference between the greatest and least values. Although useful, these statistics don't give a complete picture of the data set. Standard deviation is a more complex measure of variability that measures the distance that each piece of data is from the mean.
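36. Standard Deviation
    - The standard deviation of a sample is represented by the symbol $S_x$ and is calculated using the following formula: $S_x = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$, where $x$ is each score, $\bar{x}$ is the mean and $n$ is the number of values.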
37. Standard Deviation
    - To calculate the standard deviation follow the 6 steps outlined below:
    - Step 1: Determine the mean ($\bar{x}$).
    - Step 2: Determine the difference between each score ($x$) and the mean ($\bar{x}$). This calculation is represented by the following: $x - \bar{x}$
38. Standard Deviation
    - Step 3: Square each difference by multiplying each difference by itself.
    - Step 4: Determine the sum of these squares. This sum is represented by the following: $\sum (x - \bar{x})^2$
39. Standard Deviation
    - Step 5: Divide the sum of the squares by n - 1. (Recall that n is the number of values.) This calculation is called the variance and is represented by the following: $\dfrac{\sum (x - \bar{x})^2}{n - 1}$
    - Step 6: To determine the standard deviation, calculate the square root of the variance. This calculation is represented by the following: $S_x = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$
41. Summary
42. Standard Deviation: Notebook Assignment, Page 399, Q. 1 - 5
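
The six steps for the sample standard deviation can be sketched in code. The data set below is hypothetical (it does not come from the slides), and `sample_standard_deviation` is an illustrative name. The result is compared with `statistics.stdev` from Python's standard library, which also divides by n - 1.

```python
import math
import statistics

def sample_standard_deviation(data):
    """Sample standard deviation following the six steps in the slides."""
    n = len(data)
    mean = sum(data) / n                        # Step 1: the mean
    differences = [x - mean for x in data]      # Step 2: each score minus the mean
    squares = [d * d for d in differences]      # Step 3: square each difference
    sum_of_squares = sum(squares)               # Step 4: add the squares
    variance = sum_of_squares / (n - 1)         # Step 5: divide by n - 1
    return math.sqrt(variance)                  # Step 6: square root of the variance

# Hypothetical data: mean is 5 and the squared differences add up to 32
data = [2, 4, 4, 4, 5, 5, 7, 9]
s_x = sample_standard_deviation(data)
print("standard deviation:", round(s_x, 3))
```
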
43. Distribution of Data
    - Data samples are often collected from very large populations. The heights of Senior 4 students in Manitoba, the life expectancy of new automobiles, the mass of a new penny and the number of CDs sold monthly are all examples of such large populations. When this type of data is displayed in a frequency histogram, a bell-shaped curve often results.
44. Distribution of Data
    - A graph of this shape is called a normal curve and the distribution of the data along this curve is called a normal distribution. Because many naturally occurring sets of data follow a normal distribution, the normal curve is widely used in statistics.
45. Normal Distribution
    - The following histogram shows the test results for a larger population of students.
46. Characteristics of Normal Distribution
    - Observe the characteristics of this histogram:
        - The tops of the bars are connected, producing a smooth curve.
        - This smooth curve is bell shaped.
        - Most students' scores are clustered around the mean score.
        - The histogram is symmetrical on either side of the mean.
        - Very few students scored less than 45 or greater than 85.
47. Characteristics of Normal Distribution
    - There is a significant relationship between any normal distribution and the standard deviation introduced earlier.
    - Every normal distribution has the same percent of its data within given standard deviations of its mean. The following graph indicates the percents of data within one, two, and three standard deviations from the mean for any normal distribution.
48. Characteristics of Normal Distribution
    - Every normal curve has the following characteristics:
        - It is bell shaped and extends in both directions.
        - The mean is at the centre of the curve and the curve is symmetrical about the mean. This means that the curve can be folded along the line marking the mean and the left side of the curve will fall on top of the right side.
        - The mean equals the median. There are an equal number of pieces of data below and above the mean.
        - The scores that make up the normal distribution tend to cluster around the middle with very few values more than three standard deviations away from the mean on either side.
        - Approximately 68% (34% + 34%) of all the data falls within one standard deviation of the mean.
        - Approximately 28% (14% + 14%) of all data falls between one and two standard deviations of the mean.
        - Approximately 4% (2% + 2%) of all data falls between two and three standard deviations of the mean.
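
The percentages quoted for the normal curve can be compared with values computed from the curve itself. The sketch below assumes Python 3.8 or later, where `statistics.NormalDist` is available; `fraction_within` is a hypothetical helper name. The slide figures (68%, 28%, 4%) are rounded values, so the computed results differ slightly, for example about 27% rather than 28% between one and two standard deviations.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1.
# The same fractions hold for any normal distribution.
standard_normal = NormalDist(mu=0, sigma=1)

def fraction_within(k):
    """Fraction of the data within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

within_1 = fraction_within(1)
between_1_and_2 = fraction_within(2) - fraction_within(1)
between_2_and_3 = fraction_within(3) - fraction_within(2)

for label, value in [("within 1 SD", within_1),
                     ("between 1 and 2 SD", between_1_and_2),
                     ("between 2 and 3 SD", between_2_and_3)]:
    print(label, f"{value * 100:.1f}%")
```
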
49. Normal Distribution: Notebook Assignment, Page 408, Q. 1 - 7