nossi ch 9
Contemporary Math ch 9 power point


  • 1. Chapter 9 Collecting and Interpreting Data
  • 2. Section 9.1 Populations, Samples, and Data
    • Goals
      • Study populations and samples
      • Study data
        • Quantitative data
        • Qualitative data
      • Study bias
      • Study simple random sampling
  • 3. Populations and Samples
    • The entire set of objects being studied is called the population.
    • The members of a population are called elements.
  • 4. Populations and Samples, cont’d
    • Any characteristic of elements of the population is called a variable.
      • Quantitative variables can be expressed as numbers.
      • Qualitative variables cannot be expressed as numbers; they are usually expressed as categories.
  • 5. Populations and Samples
    • A census measures the variable for every element of the population.
      • A census is time-consuming and expensive, unless the population is very small.
    • Instead of dealing with the entire population, a subset, called a sample, is usually selected for study.
  • 6. Example 1
    • Suppose you want to determine voter opinion on a ballot measure. You survey potential voters among pedestrians on Main Street during lunch.
      • What is the population?
      • What is the sample?
      • What is the variable being measured?
  • 7. Example 1
      • Solution: The population consists of all the people who intend to vote on the ballot measure.
  • 8. Example 1
      • Solution: The sample consists of all the people you interviewed on Main Street who intend to vote on the ballot measure.
  • 9. Example 1
      • Solution: The variable being measured is the voter’s intent to vote “yes” or “no” on the ballot measure.
  • 10.
    • Qualitative data with a natural ordering is called ordinal.
      • For example, a ranking of a pizza on a scale of “Excellent” to “Poor” is ordinal.
    • Qualitative data without a natural ordering is called nominal.
      • For example, eye color is nominal.
  • 11. Example 2
    • Suppose you survey potential voters among the people on Main Street during lunch to determine their political affiliation and age, as well as their opinion on the ballot measure.
    • Classify the variables as quantitative or qualitative.
  • 12. Example 2
    • Solution:
      • Political affiliation is a qualitative variable (categories)
      • Age is a quantitative variable (numbers)
      • Opinion on the ballot measure is a qualitative variable (categories)
  • 13. Common Sources of Bias
    • Faulty sampling: The sample is not representative.
    • Faulty questions: The questions are worded to influence the answers.
    • Faulty interviewing: Interviewers fail to survey the entire sample, misread questions, and/or misinterpret answers.
  • 14. Common Sources of Bias, cont’d
    • Lack of understanding or knowledge: The person being interviewed does not understand the question or needs more information.
    • False answers: The person being interviewed intentionally gives incorrect information.
  • 15. Example 3
    • Suppose you wish to determine voter opinion regarding eliminating the capital gains tax. You survey potential voters on a street corner near Wall Street in New York City.
    • Identify a source of bias in this poll.
  • 16. Example 3
    • Solution: One source of bias in choosing the sample is that people who work on Wall Street would benefit from eliminating the tax, so they are more likely than the average voter to favor eliminating it.
      • This is faulty sampling.
  • 17. Example 4
    • Suppose a car manufacturer wants to test the reliability of 1000 alternators. They will test the first 30 from the lot for defects.
    • Identify any potential sources of bias.
  • 18. Example 4
    • Solution: One source of bias could be that the first 30 alternators are chosen for the sample. It may be that defects are either much more likely at the beginning of a production run or much less likely at the beginning. In either case, the sample would not be representative.
      • This is potentially faulty sampling.
  • 19. Simple Random Samples
    • Given a population and a desired sample size, a simple random sample is any sample chosen in such a way that all samples of the same size are equally likely to be chosen.
  • 20. Simple Random Samples, cont’d
    • One way to choose a simple random sample is to use a random number generator or table.
      • A random number generator is a computer or calculator program designed to produce numbers with no apparent pattern.
      • A random number table is a table produced with a random number generator.
        • An example of the first few rows of a random number table is shown on the next slide.
  • 21. Random Number Table
  • 22. Example 5
    • Choose a simple random sample of size 5 from 12 semifinalists: Astoria, Beatrix, Charles, Delila, Elsie, Frank, Gaston, Heidi, Ian, Jose, Kirsten, and Lex.
  • 23. Example 5, cont’d
    • Solution: Assign numerical labels to the population elements, in any order, as shown below:
  • 24. Example 5
    • Choose a random spot in the table to begin.
      • One option is to start at the top of the third column and to read down, looking at the last 2 digits in each number. This choice is arbitrary. There are many ways to use this table.
      • Numbers that correspond to population labels are recorded, ignoring duplicates, until 5 such numbers have been found.
  • 25. Example 5
  • 26. Example 5
    • The numbers located are 01, 06, 10, 11, and 07.
      • The simple random sample consists of Beatrix, Gaston, Heidi, Kirsten, and Lex.
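As an illustration, and not the method used on the slides, software can stand in for the random number table. The Python sketch below uses the standard library's `random.sample`, which draws without replacement so that every group of 5 semifinalists is equally likely. The function name `simple_random_sample` and the seed value are assumptions; the seed only makes a run repeatable.

```python
import random

# The 12 semifinalists from Example 5.
semifinalists = ["Astoria", "Beatrix", "Charles", "Delila", "Elsie", "Frank",
                 "Gaston", "Heidi", "Ian", "Jose", "Kirsten", "Lex"]

def simple_random_sample(population, size, seed=None):
    """Draw `size` distinct elements; every subset of that size is equally likely."""
    rng = random.Random(seed)  # private generator; the seed is only for repeatability
    return rng.sample(population, size)

sample = simple_random_sample(semifinalists, 5, seed=7)
print(sample)
```

A different seed, or no seed, gives a different sample that is equally valid.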
  • 27. Example 6
    • Choose a simple random sample of size 8 from the states of the United States of America.
  • 28. Example 6
  • 29. Example 6
    • We randomly choose to start at the top row, left column of the number table and read the last 2 digits of each entry across the row.
      • The entries are 03918 77195 47772 21870 87122 99445 10041 31795 63857 64569 34893 20429 43537 25368 95237 17707 34280 04755 64301 66836 12201 …
  • 30. Example 6
      • The numbers obtained from the table are 18, 22, 45, 41, 29, 37, 07, 01.
      • The states selected for the sample are Washington, Florida, Vermont, West Virginia, Arkansas, Kentucky, Nevada, and Alaska.
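The table-reading rule from Examples 5 and 6 can be sketched in Python: keep the last 2 digits of each entry, skip numbers that are not labels, skip duplicates, and stop once the sample is full. This rests on two assumptions. First, the row is read as the table's five-digit entries. Second, the states are labeled 00 through 49, because the slide's label list is not reproduced here; labels 01 through 50 give the same result for this row. The function name `sample_from_table` is hypothetical.

```python
def sample_from_table(entries, valid_labels, size):
    """Read the last two digits of each entry, keep valid population labels,
    ignore duplicates, and stop once `size` labels have been found."""
    chosen = []
    for entry in entries:
        label = entry[-2:]  # last two digits, kept as a string such as "07"
        if label in valid_labels and label not in chosen:
            chosen.append(label)
            if len(chosen) == size:
                break
    return chosen

# First-row entries from Example 6, as five-digit strings.
row = ("03918 77195 47772 21870 87122 99445 10041 31795 63857 64569 "
       "34893 20429 43537 25368 95237 17707 34280 04755 64301 66836 12201").split()

# Assumption: the 50 states carry the labels 00 through 49.
state_labels = {f"{i:02d}" for i in range(50)}
print(sample_from_table(row, state_labels, 8))
```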
  • 31. Section 9.2 Survey Sampling Methods
    • Goals
      • Study sampling methods
        • Independent sampling
        • Systematic sampling
        • Quota sampling
        • Stratified sampling
        • Cluster sampling
  • 32. 9.2 Initial Problem
    • You need to interview at least 800 people nationwide.
      • You need a different interviewer for each county.
      • Each interviewer costs $50 plus $10 per interview.
      • Your budget is $15,000 .
    • Which is better, a simple random sample of all adults in the U.S. or a simple random sample of adults in randomly selected counties?
      • The solution will be given at the end of the section.
  • 33. Sample Survey Design
    • Simple random sampling can be expensive and time-consuming in practice.
    • Statisticians have developed sample survey designs that are less expensive alternatives to simple random sampling.
  • 34. Independent Sampling
    • In independent sampling, each member of the population has the same fixed chance of being selected for the sample.
      • The size of the sample is not fixed ahead of time.
    • For example, in a 50% independent sample, each element of the population has a 50% chance of being selected.
  • 35. Example 1
    • Find a 50% independent sample of the 12 semifinalists:
    • Astoria, Beatrix, Charles, Delila, Elsie, Frank, Gaston, Heidi, Ian, Jose, Kirsten, and Lex.
  • 36. Example 1
    • One suggestion is to let the digits 0, 1, 2, 3, or 4 represent “select this contestant” and let the remaining digits represent “do not select this contestant”.
    • We randomly choose column 6 in the random number table and look at the first 12 digits: 99445 20429 04.
    • Contestants: Astoria, Beatrix, Charles, Delila, Elsie, Frank, Gaston, Heidi, Ian, Jose, Kirsten, and Lex
      • The first 9 indicates that Astoria is not selected.
      • The second 9 indicates that Beatrix is not selected.
      • The first 4 indicates that Charles is selected, and so on.
    • The 50% independent sample is Charles, Delila, Frank, Gaston, Heidi, Ian, Kirsten, and Lex.
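Here is a Python sketch of independent sampling. The first function applies the slide's digit rule, where 0 through 4 means "select," to the 12 digits read from the table. The second is an illustrative alternative in which the computer gives every element the same chance p. Because each element is decided separately, the sample size is not fixed in advance. Both function names are hypothetical.

```python
import random

contestants = ["Astoria", "Beatrix", "Charles", "Delila", "Elsie", "Frank",
               "Gaston", "Heidi", "Ian", "Jose", "Kirsten", "Lex"]

def independent_sample_from_digits(population, digits):
    """50% independent sample: one digit per element, in order;
    digits 0-4 mean 'select' and 5-9 mean 'do not select'."""
    return [name for name, d in zip(population, digits) if int(d) <= 4]

def independent_sample(population, p, seed=None):
    """Select each element independently with probability p."""
    rng = random.Random(seed)
    return [x for x in population if rng.random() < p]

# The 12 digits read from column 6 of the table in Example 1.
print(independent_sample_from_digits(contestants, "994452042904"))
```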
  • 37. Systematic Sampling
    • In systematic sampling, we decide ahead of time what proportion of the population we wish to sample.
    • For a 1-in-k systematic sample:
      • List the population elements in some order.
      • Randomly choose a number, r, from 1 to k.
      • The elements selected are those labeled r, r + k, r + 2k, r + 3k, …
  • 38. Example 3
    • Use systematic sampling to select a 1-in-10 systematic sample of the 100 automobiles produced in one day at a factory.
  • 39. Example 3
    • Solution: List the automobiles in some order.
    • Suppose we randomly choose r = 5.
      • Since r = 5 and k = 10, the automobiles selected for the sample are those labeled 5, 15, 25, 35, 45, 55, 65, 75, 85, and 95.
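The 1-in-k rule can be sketched in Python. The sketch assumes the elements are labeled 1 through n, as the automobiles are in Example 3. The function name `systematic_sample` is hypothetical, and the starting value r is drawn at random when none is supplied.

```python
import random

def systematic_sample(n, k, r=None, seed=None):
    """1-in-k systematic sample from elements labeled 1..n:
    returns r, r + k, r + 2k, ... with r chosen at random from 1..k if not given."""
    if r is None:
        r = random.Random(seed).randint(1, k)
    return list(range(r, n + 1, k))

print(systematic_sample(100, 10, r=5))
```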
  • 40. Example 3
    • A systematic sample is easier to choose than an independent sample.
    • However, the regularity in the selection of a systematic sample can sometimes be a source of bias.
  • 41. Quota Sampling
    • In quota sampling, the sample is chosen to be representative for known important variables.
      • Quotas may be set for age groups, genders, ethnicities, occupations, and so on.
      • There is no way to know ahead of time which variables are important enough to require quotas.
      • Quota sampling is not always reliable.
  • 42. Stratified Sampling
    • In stratified sampling, the population is subdivided into 2 or more nonoverlapping subsets, each of which is called a stratum. Examples of strata are:
      • Men and women
      • Children, working adults, retired adults
  • 43. Example 4
    • Select a stratified random sample of 10 men and 10 women from a population of 200 (100 men and 100 women).
    • Solution: The 2 strata are men and women.
    • Choose a simple random sample from the men.
      • Number the 100 men with labels 00 through 99.
      • Use the random number table to choose 10 men.
    • Repeat for the women.
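Here is a minimal Python sketch of Example 4 that uses software rather than the random number table. A separate simple random sample is drawn within each stratum. The labels M00 through M99 and W00 through W99 are hypothetical stand-ins for the 100 men and 100 women, and the function name `stratified_sample` is also hypothetical.

```python
import random

def stratified_sample(strata, sizes, seed=None):
    """Draw a simple random sample of sizes[name] elements from each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, sizes[name]) for name, members in strata.items()}

# Hypothetical labels for the 200 people: 100 men and 100 women.
strata = {
    "men": [f"M{i:02d}" for i in range(100)],
    "women": [f"W{i:02d}" for i in range(100)],
}
sample = stratified_sample(strata, {"men": 10, "women": 10}, seed=1)
print(sample)
```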
  • 44. Example 4
    • The stratified random sample is represented below.
  • 45. Cluster Sampling
    • In cluster sampling, the population is divided into nonoverlapping subsets called sampling units or clusters.
      • Clusters may vary in size.
    • A frame is a complete list of the sampling units.
    • A sample is a collection of sampling units selected from the frame.
    • Examples:
        • Counties
        • Cities
        • Colleges
  • 46. Sampling Summary
  • 47. 9.2 Initial Problem Solution
    • You need to interview at least 800 people nationwide.
      • You need a different interviewer for each county.
      • Each interviewer costs $50 plus $10 per interview.
      • Your budget is $15,000.
    • Which is better, a simple random sample of all adults in the U.S. or a simple random sample of adults in randomly selected counties?
  • 48. Initial Problem Solution, cont’d
    • A simple random sample is unbiased, so this might seem to be the best choice.
    • However, there are 3130 counties in the U.S.
      • If, for example, you get people in your sample from only 400 of the counties, it would cost you 400($50) + 800($10) = $28,000.
    • You cannot afford to choose a simple random sample.
  • 49. Initial Problem Solution, cont’d
    • The second type of sample is a much less expensive choice.
    • You must pay 800($10) = $8000 for the interviews, which leaves $7000 for hiring interviewers.
      • You can select a simple random sample of up to 140 counties.
      • Then select a simple random sample of people from each selected county, for a total of 800 people.
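The cost arithmetic in this solution can be checked with a short Python sketch. The $50 fee per interviewer (one interviewer per county) and the $10 fee per interview come from the problem. The function names are hypothetical.

```python
def survey_cost(counties, interviews, per_interviewer=50, per_interview=10):
    """Total cost: one interviewer per county plus a fee for each interview."""
    return counties * per_interviewer + interviews * per_interview

def max_counties(budget, interviews, per_interviewer=50, per_interview=10):
    """Largest number of counties affordable after paying for every interview."""
    return (budget - interviews * per_interview) // per_interviewer

print(survey_cost(400, 800))     # cost if the 800 people come from 400 counties
print(max_counties(15000, 800))  # counties affordable on a $15,000 budget
```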
  • 50. Section 9.3 Central Tendency and Variability
    • Goals
      • Study measures of central tendency
        • Mean
        • Median
        • Mode
      • Study measures of dispersion (spread of the data)
        • Range
        • Quartiles
        • Standard deviation
  • 51. The Mean
    • The mean is the most common type of average.
      • This is an arithmetic mean.
    • If there are N numbers in a data set, $x_1, x_2, \ldots, x_N$, the mean is: $\bar{x} = \dfrac{x_1 + x_2 + \cdots + x_N}{N}$
  • 52. Example 1
    • Find the mean of the data set.
      • 1, 1, 2, 2, 3
    • Solution:
      • The mean is (1 + 1 + 2 + 2 + 3)/5 = 9/5 = 1.8.
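The mean formula translates directly into Python: add the N values and divide by N. This is a minimal sketch, and the function name `mean` is hypothetical. Python's `statistics.mean` gives the same result.

```python
def mean(values):
    """Arithmetic mean: the sum of the N values divided by N."""
    return sum(values) / len(values)

print(mean([1, 1, 2, 2, 3]))
```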
  • 53. Example 2
    • A college graduate reads that a company with 5 employees has a mean salary of $48,000.
    • How might this be misleading?
  • 54. Example 2
    • One possibility is that every employee earns a salary of $48,000.
    • Another possibility is that the owner makes $120,000, while the other 4 employees each earn $30,000.
  • 55. The Median
    • The median is the “middle number” of a data set when the values are arranged from smallest to largest.
      • If there is an odd number of data points, the data point exactly in the middle of the list is the median.
      • If there is an even number of data points, the mean of the two data points in the middle of the list is the median.
  • 56. Example 3
    • Find the mean and median of each data set.
      • 0, 2, 4
      • 0, 2, 4, 10
      • 0, 2, 4, 10, 1000
  • 57. Example 3, cont’d
    • Solution for 0, 2, 4
      • The median is 2.
      • The mean is (0 + 2 + 4)/3 = 6/3 = 2.
  • 58. Example 3, cont’d
    • Solution for 0, 2, 4, 10
      • The median is (2 + 4)/2 = 3.
      • The mean is (0 + 2 + 4 + 10)/4 = 16/4 = 4.
  • 59. Example 3, cont’d
    • Solution for 0, 2, 4, 10, 1000
      • The median is 4.
      • The mean is (0 + 2 + 4 + 10 + 1000)/5 = 1016/5 = 203.2.
  • 60. Example 3, cont’d
    • One very large or very small data value can change the mean dramatically.
    • Large or small data values do not have much of an effect on the median.
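The median rule from slide 55 and the comparison in Example 3 can be reproduced with a short Python sketch. The function name `median` is hypothetical, and Python's `statistics.median` follows the same rule. The loop prints the mean and median of each data set, so you can see that adding 1000 moves the mean far more than the median.

```python
def median(values):
    """Middle value of the sorted data; the mean of the two middle values if N is even."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

for data in ([0, 2, 4], [0, 2, 4, 10], [0, 2, 4, 10, 1000]):
    print(data, "mean =", sum(data) / len(data), "median =", median(data))
```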
  • 61. Symmetric Distributions
    • If the mean and median of a data set are equal, the data distribution is called symmetric.
    • An example of a symmetric data set is shown below.
  • 62. Skewed Distributions
    • A distribution is skewed left if the mean is less than the median.
    • A distribution is skewed right if the mean is greater than the median.
  • 63. The Mode
    • The mode is the most frequently occurring value in a data set.
    • A data set may have:
      • No mode.
      • One mode.
      • Multiple modes.
  • 64. Example 5
    • Find the mode(s) of the following set of test scores: 26, 32, 54, 62, 67, 70, 71, 71, 74, 76, 80, 81, 84, 87, 87, 87, 89, 93, 95, 96.
    • Solution: The value 87 occurs more times than any other score. The mode is 87.
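This Python sketch handles all three cases on slide 63 by counting occurrences with the standard library's `collections.Counter`. The function name `modes` is hypothetical. It also follows one common convention that the slides do not spell out: a data set has no mode when no value repeats.

```python
from collections import Counter

def modes(values):
    """Return every value that occurs most often, in increasing order.
    An empty list means no mode (assumed convention: no value repeats)."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []  # every value occurs once, so there is no mode
    return sorted(v for v, c in counts.items() if c == top)

scores = [26, 32, 54, 62, 67, 70, 71, 71, 74, 76,
          80, 81, 84, 87, 87, 87, 89, 93, 95, 96]
print(modes(scores))
```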
  • 65. Example 5, cont’d
  • 66. The Weighted Mean
    • A weighted mean is calculated when different data points have different levels of importance, called weights.
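    • If the numbers in a data set, $x_1, x_2, \ldots, x_N$, have weights $w_1, w_2, \ldots, w_N$, then the weighted mean is: $\dfrac{w_1 x_1 + w_2 x_2 + \cdots + w_N x_N}{w_1 + w_2 + \cdots + w_N}$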
  • 67. Example 6
    • Suppose your grades one semester are:
      • An A in a 5-credit course
      • A B in a 4-credit course
      • A C in two 3-credit courses
    • What is your GPA that semester?
  • 68. Example 6
    • Solution: A grade of A is worth 4 points, a B 3 points, and a C 2 points.
    • The weights are the number of credits.
    • Your GPA is the weighted mean of your grades: [4(5) + 3(4) + 2(3) + 2(3)]/(5 + 4 + 3 + 3) = 44/15 ≈ 2.93.
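The GPA calculation can be expressed as a Python sketch of the weighted mean. Each grade point (A = 4, B = 3, C = 2) is multiplied by its credits, and the total is divided by the total credits. The function name `weighted_mean` is hypothetical.

```python
def weighted_mean(values, weights):
    """Sum of value * weight divided by the sum of the weights."""
    return sum(x * w for x, w in zip(values, weights)) / sum(weights)

grades = [4, 3, 2, 2]   # A, B, C, C as grade points
credits = [5, 4, 3, 3]  # the weights
gpa = weighted_mean(grades, credits)
print(round(gpa, 2))
```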
  • 69. Measures of Variability
    • The measures of central tendency describe only part of the behavior of a data set.
    • Statistics that tell us how the data varies from its center are called measures of variability or measures of spread.
    • The measures of variability studied here are:
      • Range
      • Quartiles
      • Standard deviation
  • 70. The Range
    • The range of a data set is the difference between the largest data value and the smallest data value.
  • 71. Example 8
    • Compute the mean and the range for each data set.
      • 3, 4, 5, 6, 7, 8
      • 0, 2, 5, 7, 8, 11
  • 72. Example 8, cont’d
    • Solution:
      • 3, 4, 5, 6, 7, 8
        • The mean is 5.5.
        • The range is 8 – 3 = 5.
      • 0, 2, 5, 7, 8, 11
        • The mean is 5.5.
        • The range is 11 – 0 = 11.
    • The two data sets have the same mean, but the difference in ranges shows that the second data set is more spread out.
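Example 8 can be reproduced in Python to show two data sets with the same mean but different ranges. This is a minimal sketch, and the function name `data_range` is hypothetical.

```python
def data_range(values):
    """Difference between the largest and the smallest data value."""
    return max(values) - min(values)

for data in ([3, 4, 5, 6, 7, 8], [0, 2, 5, 7, 8, 11]):
    print(data, "mean =", sum(data) / len(data), "range =", data_range(data))
```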
  • 73. Quartiles
    • Quartiles are measures of location that divide a data set approximately into fourths.
    • The quartiles are labeled as the
      • first quartile, $q_1$
      • second quartile, $q_2$
        • The second quartile is the same as the median.
      • third quartile, $q_3$
  • 74. Quartiles
    • To find the quartiles, arrange the data values in order from smallest to largest. Then:
      • Step 1. Find the median. This is also the second quartile.
      • Step 2. If the number of data points is even, go to Step 3. If the number of data points is odd, remove the median from the list before going to Step 3.
  • 75. Quartiles
      • Step 3. Divide the remaining data points into a lower half and an upper half.
      • Step 4. The first quartile, $q_1$, is the median of the lower half of the data.
      • Step 5. The third quartile, $q_3$, is the median of the upper half of the data.
  • 76. Quartiles, cont’d
    • The interquartile range, IQR, is the difference between the third and first quartiles.
      • IQR = $q_3 - q_1$
      • The IQR is a measure of variability.
        • About half of the data points lie between $q_1$ and $q_3$.
  • 77. Example 10
    • Find the median, the first and third quartiles, and the interquartile range for the test scores:
    • 26, 32, 54, 62, 67, 70, 71, 71, 74, 76, 80, 81, 84, 87, 87, 87, 89, 93, 95, 96.
  • 78. Example 10
    • 26, 32, 54, 62, 67, 70, 71, 71, 74, 76, 80, 81, 84, 87, 87, 87, 89, 93, 95, 96.
    • The median is (76 + 80)/2 = 78.
      • Since there is an even number of data points, we do not remove the median from the list.
    • The first quartile is the median of the lower half of the list: 26, 32, 54, 62, 67, 70, 71, 71, 74, 76.
      • The first quartile is (67 + 70)/2 = 68.5.
  • 79. Example 10
    • The third quartile is the median of the upper half of the list:
    • 80, 81, 84, 87, 87, 87, 89, 93, 95, 96.
      • The third quartile is (87 + 87)/2 = 87.
    • The IQR is $q_3 - q_1$ = 87 – 68.5 = 18.5.
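The quartile steps can be sketched in Python and checked against Example 10. Following the slides, the median is left out of both halves when the number of data points is odd. The function names are hypothetical. Statistical software often uses interpolation methods instead, so it may report slightly different quartiles for the same data.

```python
def median_sorted(s):
    """Median of a list that is already sorted."""
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 == 1 else (s[mid - 1] + s[mid]) / 2

def quartiles(values):
    """Return (q1, q2, q3) by the median-of-halves method on the slides."""
    s = sorted(values)
    n = len(s)
    lower = s[: n // 2]        # values below the middle position
    upper = s[(n + 1) // 2 :]  # values above the middle position (median skipped if n is odd)
    return median_sorted(lower), median_sorted(s), median_sorted(upper)

scores = [26, 32, 54, 62, 67, 70, 71, 71, 74, 76,
          80, 81, 84, 87, 87, 87, 89, 93, 95, 96]
q1, q2, q3 = quartiles(scores)
print(q1, q2, q3, "IQR =", q3 - q1)
```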
  • 80. The Five-Number Summary
    • The five-number summary of a data set is a list of 5 informative numbers related to that set:
      • The smallest value, s
      • The first quartile, $q_1$
      • The median, m
      • The third quartile, $q_3$
      • The largest value, L
    • The numbers are always written in this order.
  • 81. Example 11
    • Consider the set of test scores from the previous example:
    • 26, 32, 54, 62, 67, 70, 71, 71, 74, 76, 80, 81, 84, 87, 87, 87, 89, 93, 95, 96.
    • The five-number summary for this data set is 26, 68.5, 78, 87, 96.
  • 82. Box-and-Whisker Plot
    • The box-and-whisker plot, also called a box plot, is a graphical representation of the five-number summary of a data set.
      • The box (rectangle) represents the IQR.
        • The location of the median is marked within the box.
      • The whiskers (lines) represent approximately the lowest and highest 25% of the data.
  • 83. Box-and-Whisker Plot
  • 84. Example 12
    • The list of test scores from the previous example had a five-number summary of
    • 26, 68.5, 78, 87, 96.
    • The box-and-whisker plot for this data set is shown below.
  • 85. Example 13
    • The monthly rainfall for 2 cities is shown below.
    • Use box-and-whisker plots to compare the rainfall amounts.
  • 86. Example 13, cont’d
    • Solution: In St. Louis, MO, the rainfalls were: 2.21, 2.23, 2.31, 2.64, 2.96, 3.20, 3.26, 3.29, 3.74, 4.10, 4.12.
      • The median is 3.08.
      • The first quartile is 2.475.
      • The third quartile is 3.515.
    • The five-number summary for St. Louis is 2.21, 2.475, 3.08, 3.515, 4.12.
  • 87. Example 13, cont’d
    • Solution, cont’d: In Portland, OR, the rainfalls were: 0.46, 1.13, 1.47, 1.61, 2.08, 2.31, 3.05, 3.61, 3.93, 5.17, 6.14, 6.16.
      • The median is 2.68.
      • The first quartile is 1.54.
      • The third quartile is 4.55.
    • The five-number summary for Portland is 0.46, 1.54, 2.68, 4.55, 6.16.
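This Python sketch computes the five-number summary and checks it against the Portland rainfall data. It uses the same median-of-halves quartile rule as the slides. The function names are hypothetical.

```python
def median_sorted(s):
    """Median of a list that is already sorted."""
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 == 1 else (s[mid - 1] + s[mid]) / 2

def five_number_summary(values):
    """Return (smallest, q1, median, q3, largest)."""
    s = sorted(values)
    n = len(s)
    lower, upper = s[: n // 2], s[(n + 1) // 2 :]
    return s[0], median_sorted(lower), median_sorted(s), median_sorted(upper), s[-1]

portland = [0.46, 1.13, 1.47, 1.61, 2.08, 2.31, 3.05, 3.61, 3.93, 5.17, 6.14, 6.16]
print([round(v, 3) for v in five_number_summary(portland)])
```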
  • 88. Example 13
    • Solution, cont’d: The 2 box-and-whisker plots are shown above.
    • Note that the amount of rainfall in Portland, OR, varies much more from month to month than it does in St. Louis, MO.
  • 89. Standard Deviation
    • The standard deviation is a widely used measure of variability.
    • Calculating the standard deviation requires several intermediate steps, which will be illustrated using the data set of incomes shown below.
  • 90. Deviation From The Mean
    • The difference between a data point and the mean of the data set is called the deviation from the mean of that data point.
  • 91. Deviation From The Mean, cont’d
    • The mean income is $35,800.
  • 92. Sample Variance
    • The variance of the incomes is calculated by first squaring all the deviations.
  • 93. Sample Variance, cont’d
    • The squared deviations are added and then divided by n – 1 = 9 – 1 = 8.
  • 94. Standard Deviation
    • Standard deviation is the square root of the variance.
    • The standard deviation of the incomes is:
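The steps on the preceding slides can be collected into one formula: find each deviation, square it, add the squares, divide by n − 1, and take the square root. In the sketch below, $x_1, \ldots, x_n$ are the n data values, $\bar{x}$ is their mean, $s^2$ is the sample variance, and $s$ is the sample standard deviation. This notation is introduced here for illustration.

```latex
s^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1},
\qquad
s = \sqrt{s^2}
```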
  • 95. Example 14
    • Find the sample standard deviation of the weights (in pounds) in the 2 data sets.
      • Turkeys: 17, 18, 19, 20, 21
      • Dogs: 13, 16, 19, 22, 25
  • 96. Example 14
    • Solution:
    • The sample mean for the turkeys is 19 pounds.
    • The sample mean for the dogs is also 19 pounds.
      • We note that although the means are the same, the standard deviations should reflect the amount of variability in the data values.
  • 97. Example 14
    • The deviations from the mean for the turkey weights are found.
  • 98. Example 14
    • The sample variance of the turkey weights is 2.5 square pounds.
    • The sample standard deviation of the turkey weights is 1.58 pounds.
  • 99. Example 14
    • The deviations from the mean for the dog weights are found.
  • 100. Example 14
    • The sample variance of the dog weights is 22.5 square pounds.
    • The sample standard deviation of the dog weights is 4.74 pounds.
  • 101. Example 14
    • The sample standard deviation of the turkey weights is 1.58 pounds.
    • The sample standard deviation of the dog weights is 4.74 pounds.
    • The standard deviation of the sample of dog weights is larger than the standard deviation of the sample of turkey weights because there was a much wider spread among the dog weights.
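The calculation in Example 14 can be reproduced with a from-scratch Python sketch of the sample variance (dividing by n − 1) and the sample standard deviation. The function names are hypothetical. Python's `statistics.variance` and `statistics.stdev` use the same n − 1 convention.

```python
import math

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    m = sum(values) / n
    return sum((x - m) ** 2 for x in values) / (n - 1)

def sample_std_dev(values):
    """Square root of the sample variance."""
    return math.sqrt(sample_variance(values))

turkeys = [17, 18, 19, 20, 21]
dogs = [13, 16, 19, 22, 25]
for name, data in (("turkeys", turkeys), ("dogs", dogs)):
    print(name, sample_variance(data), round(sample_std_dev(data), 2))
```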
  • 102. 9.3 Initial Problem Solution
    • Which stockbroker should you choose if you want to minimize risk while maintaining a steady rate of growth?
      • One stockbroker’s recommendations had percentage gains of 21%, -3%, 16%, 27%, 9%, 11%, 13%, 6%, and 17%.
      • The other’s recommendations had percentage gains of 11%, 13%, 16%, 8%, 5%, 14%, 15%, 17%, and 18%.
  • 103. Initial Problem Solution
    • First you could calculate the mean rate of return for each stockbroker.
      • Both stockbrokers have a mean rate of return of 13%.
      • Since the average growth rates are the same, you can measure the variability to determine which stockbroker’s recommendations have less variability.
  • 104. Initial Problem Solution, cont’d
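    • First stockbroker: the squared deviations from the mean of 13% add to 610, so the sample variance is 610/8 = 76.25 and the sample standard deviation is √76.25 ≈ 8.73 percentage points.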
  • 105. Initial Problem Solution, cont’d
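    • Second stockbroker: the squared deviations from the mean of 13% add to 148, so the sample variance is 148/8 = 18.5 and the sample standard deviation is √18.5 ≈ 4.30 percentage points.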
  • 106. Initial Problem Solution, cont’d
    • The standard deviation of the second portfolio (4.30) is much smaller than that of the first portfolio (8.73).
    • Since the mean growth rates are the same, the second stockbroker should be chosen in order to minimize risk.
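As a cross-check, the stockbroker comparison can be run with Python's standard `statistics` module. `statistics.stdev` divides by n − 1, which matches the sample standard deviation used in this section. The list names are illustrative.

```python
import statistics

broker_1 = [21, -3, 16, 27, 9, 11, 13, 6, 17]  # percentage gains, first stockbroker
broker_2 = [11, 13, 16, 8, 5, 14, 15, 17, 18]  # percentage gains, second stockbroker

for name, gains in (("first", broker_1), ("second", broker_2)):
    print(name, "mean =", statistics.mean(gains),
          "sample sd =", round(statistics.stdev(gains), 2))
```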
  • 107. Ch 9 Assignment
    • You must show some work for calculations to receive full credit.
    • Section 9.1 pg 573 (1,3,4,13,14,19,23,25,27)
    • Section 9.2 pg 586 (1,2,21,27,33,39)
    • Section 9.3 pg 614 (1, 5, 15, 16, 19, 21, 33 (also find the standard deviation, which is the square root of the variance), 35)
    • I will also be giving an extra credit assignment: you will review an article from the Tennessean. This assignment can count as a homework assignment.