Any characteristic of elements of the population is called a variable.
Quantitative variables can be expressed as numbers.
Qualitative variables cannot be expressed as numbers; they are usually expressed as categories.
5.
Populations and Samples
A census measures the variable for every element of the population.
A census is time-consuming and expensive, unless the population is very small.
Instead of dealing with the entire population, a subset, called a sample, is usually selected for study.
6.
Example 1
Suppose you want to determine voter opinion on a ballot measure. You survey potential voters among pedestrians on Main Street during lunch.
What is the population?
What is the sample?
What is the variable being measured?
7.
Example 1
Solution: The population consists of all the people who intend to vote on the ballot measure.
8.
Example 1
Solution: The sample consists of all the people you interviewed on Main Street who intend to vote on the ballot measure.
9.
Example 1
Solution: The variable being measured is the voter’s intent to vote “yes” or “no” on the ballot measure.
10.
Qualitative data with a natural ordering is called ordinal.
For example, a ranking of a pizza on a scale of “Excellent” to “Poor” is ordinal.
Qualitative data without a natural ordering is called nominal.
For example, eye color is nominal.
11.
Example 2
Suppose you survey potential voters among the people on Main Street during lunch to determine their political affiliation and age, as well as their opinion on the ballot measure.
Classify the variables as quantitative or qualitative.
12.
Example 2
Solution:
Political affiliation is a qualitative variable (categories)
Age is a quantitative variable (numbers)
Opinion on the ballot measure is a qualitative variable (categories)
13.
Common Sources of Bias
Faulty sampling: The sample is not representative.
Faulty questions: The questions are worded to influence the answers.
Faulty interviewing: Interviewers fail to survey the entire sample, misread questions, and/or misinterpret answers.
14.
Common Sources of Bias, cont’d
Lack of understanding or knowledge: The person being interviewed does not understand the question or needs more information.
False answers: The person being interviewed intentionally gives incorrect information.
15.
Example 3
Suppose you wish to determine voter opinion regarding eliminating the capital gains tax. You survey potential voters on a street corner near Wall Street in New York City.
Identify a source of bias in this poll.
16.
Example 3
Solution: One source of bias in choosing the sample is that people who work on Wall Street would benefit from the elimination of the tax and are more likely to favor it than the average voter.
This is faulty sampling.
17.
Example 4
Suppose a car manufacturer wants to test the reliability of 1000 alternators. They will test the first 30 from the lot for defects.
Identify any potential sources of bias.
18.
Example 4
Solution: One source of bias could be that the first 30 alternators are chosen for the sample. It may be that defects are either much more likely at the beginning of a production run or much less likely at the beginning. In either case, the sample would not be representative.
This is potentially faulty sampling.
19.
Simple Random Samples
Given a population and a desired sample size, a simple random sample is any sample chosen in such a way that all samples of the same size are equally likely to be chosen.
20.
Simple Random Samples, cont’d
One way to choose a simple random sample is to use a random number generator or table.
A random number generator is a computer or calculator program designed to produce numbers with no apparent pattern.
A random number table is a table produced with a random number generator.
An example of the first few rows of a random number table is shown on the next slide.
21.
Random Number Table
22.
Example 5
Choose a simple random sample of size 5 from 12 semifinalists: Astoria, Beatrix, Charles, Delila, Elsie, Frank, Gaston, Heidi, Ian, Jose, Kirsten, and Lex.
23.
Example 5, cont’d
Solution: Assign numerical labels to the population elements, in any order, as shown below:
24.
Example 5
Choose a random spot in the table to begin.
One option is to start at the top of the third column and to read down, looking at the last 2 digits in each number. This choice is arbitrary. There are many ways to use this table.
Numbers that correspond to population labels are recorded, ignoring duplicates, until 5 such numbers have been found.
25.
Example 5
26.
Example 5
The numbers located are 01, 06, 10, 11, and 07.
The simple random sample consists of Beatrix, Gaston, Heidi, Kirsten, and Lex.
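As a rough sketch of the same kind of selection done with software rather than a printed table, the Python code below draws a simple random sample of size 5 from the 12 semifinalists. It uses the standard library's random.sample, which draws without replacement. The seed value 2014 is an arbitrary assumption used only to make the run repeatable, so the names drawn will generally differ from the table-based answer above.

```python
import random

semifinalists = [
    "Astoria", "Beatrix", "Charles", "Delila", "Elsie", "Frank",
    "Gaston", "Heidi", "Ian", "Jose", "Kirsten", "Lex",
]

# A seeded generator (the seed is an arbitrary, assumed value) makes the
# example repeatable; drop the seed to get a fresh sample on every run.
rng = random.Random(2014)

# random.sample draws without replacement, so no name can appear twice
# and every group of 5 names is equally likely to be chosen.
sample = rng.sample(semifinalists, 5)
print(sample)
```

Because the draw is without replacement, there is no need to skip duplicates by hand as there is when reading a random number table.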
27.
Example 6
Choose a simple random sample of size 8 from the states of the United States of America.
28.
Example 6
29.
Example 6
We randomly choose to start at the top row, left column of the number table and read the last 2 digits of each entry across the row.
One suggestion is to let the digits 0, 1, 2, 3, or 4 represent “select this contestant” and let the remaining digits represent “do not select this contestant”.
We randomly choose column 6 in the random number table and look at the first 12 digits: 99445 20429 04.
Contestants: Astoria, Beatrix, Charles, Delila, Elsie, Frank, Gaston, Heidi, Ian, Jose, Kirsten, and Lex
The first 9 indicates that Astoria is not selected.
The second 9 indicates that Beatrix is not selected.
The 4 represents that Charles is selected, and so on…
The 50% independent sample is Charles, Delila, Frank, Gaston, Heidi, Ian, Kirsten, and Lex.
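The digit-by-digit rule above can be checked with a short Python sketch. It pairs each contestant with one of the 12 digits read from the table and keeps the contestant when the digit is 0 through 4. The variable names are illustrative only.

```python
contestants = [
    "Astoria", "Beatrix", "Charles", "Delila", "Elsie", "Frank",
    "Gaston", "Heidi", "Ian", "Jose", "Kirsten", "Lex",
]

# The first 12 digits read from column 6 of the random number table.
digits = "99445 20429 04".replace(" ", "")

# Digits 0-4 mean "select this contestant"; digits 5-9 mean "do not select".
selected = [name for name, d in zip(contestants, digits) if int(d) <= 4]
print(selected)
```

Each contestant is decided separately, so the sample size is not fixed in advance; here 8 of the 12 contestants happen to be selected.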
37.
Systematic Sampling
In systematic sampling, we decide ahead of time what proportion of the population we wish to sample.
For a 1-in-k systematic sample:
List the population elements in some order.
Randomly choose a number, r, from 1 to k.
The elements selected are those labeled r, r + k, r + 2k, r + 3k, …
38.
Example 3
Use systematic sampling to select a 1-in-10 systematic sample of the 100 automobiles produced in one day at a factory.
39.
Example 3
Solution: List the automobiles in some order.
Suppose we randomly choose r = 5.
Since r = 5 and k = 10, the automobiles selected for the sample are those labeled 5, 15, 25, 35, 45, 55, 65, 75, 85, and 95.
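A minimal Python sketch of the 1-in-k procedure follows. The function name systematic_sample and its parameters are hypothetical, chosen for illustration. It assumes the population elements are labeled 1 through population_size.

```python
import random

def systematic_sample(population_size, k, r=None, rng=random):
    """Return the labels r, r + k, r + 2k, ... that do not exceed population_size."""
    if r is None:
        r = rng.randint(1, k)  # randint includes both endpoints, 1 and k
    if not 1 <= r <= k:
        raise ValueError("r must be between 1 and k")
    return list(range(r, population_size + 1, k))

# The automobile example: 100 cars, a 1-in-10 sample, starting at r = 5.
labels = systematic_sample(100, 10, r=5)
print(labels)
```

Leaving out r lets the function pick the starting point at random, which is the only random step in the procedure.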
40.
Example 3
A systematic sample is easier to choose than an independent sample.
However, the regularity in the selection of a systematic sample can sometimes be a source of bias.
41.
Quota Sampling
In quota sampling, the sample is chosen to be representative for known important variables.
Quotas may be set for age groups, genders, ethnicities, occupations, and so on.
There is no way to know ahead of time which variables are important enough to require quotas.
Quota sampling is not always reliable.
42.
Stratified Sampling
In stratified sampling, the population is subdivided into 2 or more nonoverlapping subsets, each of which is called a stratum. Examples of strata are:
Men and women
Children, working adults, retired adults
43.
Example 4
Select a stratified random sample of 10 men and 10 women from a population of 200 (100 men and 100 women).
Solution: The 2 strata are men and women.
Choose a simple random sample from the men.
Number the 100 men with labels 00 through 99.
Use the random number table to choose 10 men.
Repeat for the women.
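The steps above can be sketched in Python by drawing a separate simple random sample from each stratum. The labels M00 to M99 and W00 to W99 and the seed are assumptions made for illustration; a software generator stands in for the random number table.

```python
import random

rng = random.Random(9)  # assumed seed, used only to make the run repeatable

# Label the 100 men and the 100 women 00 through 99 within each stratum.
strata = {
    "men": [f"M{i:02d}" for i in range(100)],
    "women": [f"W{i:02d}" for i in range(100)],
}

# Draw a simple random sample of 10 from each stratum separately.
stratified_sample = {
    name: rng.sample(members, 10) for name, members in strata.items()
}

for name, chosen in stratified_sample.items():
    print(name, sorted(chosen))
```

Sampling within each stratum guarantees exactly 10 men and 10 women, which a single simple random sample of 20 from all 200 people would not.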
44.
Example 4
The stratified random sample is represented below.
45.
Cluster Sampling
In cluster sampling, the population is divided into nonoverlapping subsets called sampling units or clusters.
Clusters may vary in size.
A frame is a complete list of the sampling units.
A sample is a collection of sampling units selected from the frame.
Examples:
Counties
Cities
Colleges
46.
Sampling Summary
47.
9.2 Initial Problem Solution
You need to interview at least 800 people nationwide.
You need a different interviewer for each county.
Each interviewer costs $50 plus $10 per interview.
Your budget is $15,000.
Which is better, a simple random sample of all adults in the U.S. or a simple random sample of adults in randomly selected counties?
48.
Initial Problem Solution, cont’d
A simple random sample is unbiased, so this might seem to be the best choice.
However, there are 3130 counties in the U.S.
If, for example, you get people in your sample from only 400 of the counties, it would cost you 400($50) + 800($10) = $28,000.
You cannot afford to choose a simple random sample.
49.
Initial Problem Solution, cont’d
The second type of sample is a much less expensive choice.
You must pay 800($10) = $8000 for the interviews, which leaves $7000 for hiring interviewers.
You can select a simple random sample of up to 140 counties.
Then select a simple random sample of people from each selected county, for a total of 800 people.
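The budget arithmetic for both sampling plans can be reproduced with a short Python sketch. The constant and function names are illustrative; the dollar figures come from the problem statement.

```python
INTERVIEWER_FEE = 50      # dollars per interviewer (one interviewer per county)
COST_PER_INTERVIEW = 10   # dollars per interview
BUDGET = 15_000           # dollars
INTERVIEWS = 800          # minimum number of people to interview

def total_cost(counties, interviews):
    """Cost of hiring one interviewer per county plus the per-interview fees."""
    return counties * INTERVIEWER_FEE + interviews * COST_PER_INTERVIEW

# Simple random sample that happens to touch 400 counties.
srs_cost = total_cost(400, INTERVIEWS)

# Money left for interviewers after paying for the 800 interviews,
# converted into the largest number of counties the budget allows.
max_counties = (BUDGET - INTERVIEWS * COST_PER_INTERVIEW) // INTERVIEWER_FEE

print(srs_cost, max_counties)
```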
50.
Section 9.3 Central Tendency and Variability
Goals
Study measures of central tendency
Mean
Median
Mode
Study measures of dispersion (spread of the data)
Range
Quartiles
Standard deviation
51.
The Mean
The mean is the most common type of average.
This is an arithmetic mean.
If there are N numbers in a data set, $x_1, x_2, \ldots, x_N$, the mean is:
$$\text{mean} = \frac{x_1 + x_2 + \cdots + x_N}{N}$$
52.
Example 1
Find the mean of each data set.
1, 1, 2, 2, 3
Solution:
The mean is
$$\frac{1 + 1 + 2 + 2 + 3}{5} = \frac{9}{5} = 1.8$$
53.
Example 2
A college graduate reads that a company with 5 employees has a mean salary of $48,000.
How might this be misleading?
54.
Example 2
One possibility is that every employee earns a salary of $48,000.
Another possibility is that the owner makes $120,000, while the other 4 employees each earn $30,000.
55.
The Median
The median is the “middle number” of a data set when the values are arranged from smallest to largest.
If there are an odd number of data points, the data point exactly in the middle of the list is the median.
If there are an even number of data points, the mean of the two data points in the middle of the list is the median.
56.
Example 3
Find the mean and median of each data set.
0, 2, 4
0, 2, 4, 10
0, 2, 4, 10, 1000
57.
Example 3, cont’d
Solution for 0, 2, 4
The median is 2.
The mean is:
$$\frac{0 + 2 + 4}{3} = 2$$
58.
Example 3, cont’d
Solution: for 0, 2, 4, 10
The median is:
$$\frac{2 + 4}{2} = 3$$
The mean is:
$$\frac{0 + 2 + 4 + 10}{4} = 4$$
59.
Example 3, cont’d
Solution: for 0, 2, 4, 10, 1000
The median is 4.
The mean is:
$$\frac{0 + 2 + 4 + 10 + 1000}{5} = 203.2$$
60.
Example 3, cont’d
One very large or very small data value can change the mean dramatically.
Large or small data values do not have much of an effect on the median.
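To see the effect of an extreme value numerically, the sketch below computes the mean and median of the three data sets from Example 3 with Python's standard statistics module.

```python
from statistics import mean, median

data_sets = [
    [0, 2, 4],
    [0, 2, 4, 10],
    [0, 2, 4, 10, 1000],
]

# For each data set, record the pair (mean, median).
summary = [(mean(data), median(data)) for data in data_sets]

for data, (m, med) in zip(data_sets, summary):
    print(data, "mean =", m, "median =", med)
```

Adding the single value 1000 moves the mean from 4 to 203.2, while the median only moves from 3 to 4.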
61.
Symmetric Distributions
If the mean and median of a data set are equal, the data distribution is called symmetric.
An example of a symmetric data set is shown below.
62.
Skewed Distributions
A distribution is skewed left if the mean is less than the median.
A distribution is skewed right if the mean is greater than the median.
63.
The Mode
The mode is the most commonly occurring value in a data set.
A data set may have:
No mode.
One mode.
Multiple modes.
64.
Example 5
Find the mode(s) of the following set of test scores: 26, 32, 54, 62, 67, 70, 71, 71, 74, 76, 80, 81, 84, 87, 87, 87, 89, 93, 95, 96.
Solution: The value 87 occurs more times than any other score. The mode is 87.
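The count can be checked with a brief Python sketch. It uses statistics.multimode (available in Python 3.8 and later), which returns every value tied for the highest count, so it also handles data sets with more than one mode.

```python
from statistics import multimode

scores = [26, 32, 54, 62, 67, 70, 71, 71, 74, 76,
          80, 81, 84, 87, 87, 87, 89, 93, 95, 96]

# multimode returns a list of all values that occur most often.
modes = multimode(scores)
print(modes)
```

Note that when every value occurs exactly once, multimode returns all of the values; under the convention used here, such a data set is said to have no mode.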
65.
Example 5, cont’d
66.
The Weighted Mean
A weighted mean is calculated when different data points have different levels of importance, called weights.
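If the numbers in a data set, $x_1, x_2, \ldots, x_N$, have weights $w_1, w_2, \ldots, w_N$, then the weighted mean is:
$$\frac{w_1 x_1 + w_2 x_2 + \cdots + w_N x_N}{w_1 + w_2 + \cdots + w_N}$$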
67.
Example 6
Suppose your grades one semester are:
An A in a 5-credit course
A B in a 4-credit course
A C in two 3-credit courses
What is your GPA that semester?
68.
Example 6
Solution: A grade of A is worth 4 points, a B 3 points, and a C 2 points.
The weights are the number of credits.
Your GPA is the weighted mean of your grades:
$$\text{GPA} = \frac{5(4) + 4(3) + 3(2) + 3(2)}{5 + 4 + 3 + 3} = \frac{44}{15} \approx 2.93$$
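A small Python sketch of the weighted mean, applied to this GPA example, is given below. The function name weighted_mean is hypothetical. The grade points are the values and the course credits are the weights.

```python
def weighted_mean(values, weights):
    """Sum of weight * value, divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

grade_points = [4, 3, 2, 2]   # A, B, C, C
credits = [5, 4, 3, 3]        # credits for each course

gpa = weighted_mean(grade_points, credits)
print(round(gpa, 2))
```

If every weight were equal, this would reduce to the ordinary mean of the grade points.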
69.
Measures of Variability
The measures of central tendency describe only part of the behavior of a data set.
Statistics that tell us how the data varies from its center are called measures of variability or measures of spread.
The measures of variability studied here are:
Range
Quartiles
Standard deviation
70.
The Range
The range of a data set is the difference between the largest data value and the smallest data value.
71.
Example 8
Compute the mean and the range for each data set.
3, 4, 5, 6, 7, 8
0, 2, 5, 7, 8, 11
72.
Example 8, cont’d
Solution:
3, 4, 5, 6, 7, 8
The mean is 5.5.
The range is 8 – 3 = 5.
0, 2, 5, 7, 8, 11
The mean is 5.5.
The range is 11 – 0 = 11.
The two data sets have the same mean, but the difference in ranges shows that the second data set is more spread out.
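The comparison in Example 8 can be reproduced with a few lines of Python. The helper name data_range is an illustrative choice.

```python
from statistics import mean

def data_range(data):
    """Largest data value minus smallest data value."""
    return max(data) - min(data)

first = [3, 4, 5, 6, 7, 8]
second = [0, 2, 5, 7, 8, 11]

for data in (first, second):
    print(data, "mean =", mean(data), "range =", data_range(data))
```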
73.
Quartiles
Quartiles are measures of location that divide a data set approximately into fourths.
The quartiles are labeled as the
first quartile, q1
second quartile, q2
The second quartile is the same as the median.
third quartile, q3
74.
Quartiles
To find the quartiles:
Step 1: Arrange the data values in order from smallest to largest.
Step 2: Find the median. This is also the second quartile. If the number of data points is even, go to Step 3. If the number of data points is odd, remove the median from the list before going to Step 3.
75.
Quartiles
Step 3: Divide the remaining data points into a lower half and an upper half.
The first quartile, q1, is the median of the lower half of the data.
The third quartile, q3, is the median of the upper half of the data.
76.
Quartiles, cont’d
The interquartile range, IQR, is the difference between the third and first quartiles.
IQR = q3 - q1
The IQR is a measure of variability.
About half of the data points lie between q1 and q3.
77.
Example 10
Find the median, the first and third quartiles, and the interquartile range for the test scores:
The five-number summary for this data set is 26, 68.5, 78, 87, 96.
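The quartile procedure can be sketched in Python as shown below. The function name five_number_summary is hypothetical, and the code assumes the test scores are the 20 scores listed in Example 5, which do reproduce the summary stated above. It follows the median-of-halves method from these slides; statistical software often uses other quartile conventions that can give slightly different values.

```python
from statistics import median

def five_number_summary(data):
    """Return (minimum, q1, median, q3, maximum) using the median-of-halves method."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]
    # With an odd number of points, the median itself is left out of both halves.
    upper = values[half + 1:] if n % 2 else values[half:]
    return (values[0], median(lower), median(values), median(upper), values[-1])

# Assumed data: the test scores from Example 5.
scores = [26, 32, 54, 62, 67, 70, 71, 71, 74, 76,
          80, 81, 84, 87, 87, 87, 89, 93, 95, 96]

summary = five_number_summary(scores)
iqr = summary[3] - summary[1]
print(summary, "IQR =", iqr)
```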
82.
Box-and-Whisker Plot
The box-and-whisker plot, also called a box plot, is a graphical representation of the five-number summary of a data set.
The box (rectangle) represents the IQR.
The location of the median is marked within the box.
The whiskers (lines) represent the lower and upper 25% of the data.
83.
Box-and-Whisker Plot
84.
Example 12
The list of test scores from the previous example had a five-number summary of
26, 68.5, 78, 87, 96.
The box-and-whisker plot for this data set is shown below.
85.
Example 13
The monthly rainfall for 2 cities is shown below.
Use box-and-whisker plots to compare the rainfall amounts.
86.
Example 13, cont’d
Solution: In St. Louis, MO, the rainfalls were: 2.21, 2.23, 2.31, 2.64, 2.96, 3.20, 3.26, 3.29, 3.74, 4.10, 4.12.
The median is 3.08.
The first quartile is 2.475.
The third quartile is 3.515.
The five-number summary for St. Louis is 2.21, 2.475, 3.08, 3.515, 4.12.
87.
Example 13, cont’d
Solution, cont’d: In Portland, OR, the rainfalls were: 0.46, 1.13, 1.47, 1.61, 2.08, 2.31, 3.05, 3.61, 3.93, 5.17, 6.14, 6.16.
The median is 2.68.
The first quartile is 1.54.
The third quartile is 4.55.
The five-number summary for Portland is 0.46, 1.54, 2.68, 4.55, 6.16.
88.
Example 13
Solution, cont’d: The 2 box-and-whisker plots are shown above.
Note that the amount of rainfall in Portland, OR, varies much more from month to month than it does in St. Louis, MO.
89.
Standard Deviation
The standard deviation is a widely used measure of variability.
Calculating the standard deviation requires several intermediate steps, which will be illustrated using the data set of incomes shown below.
90.
Deviation From The Mean
The difference between a data point and the mean of the data set is called the deviation from the mean of that data point.
91.
Deviation From The Mean, cont’d
The mean income is $35,800.
92.
Sample Variance
The variance of the incomes is calculated by first squaring all the deviations.
93.
Sample Variance, cont’d
The squared deviations are added and then divided by n – 1 = 9 – 1 = 8.
94.
Standard Deviation
Standard deviation is the square root of the variance.
The standard deviation of the incomes is:
95.
Example 14
Find the sample standard deviation of the weights (in pounds) in the 2 data sets.
Turkeys: 17, 18, 19, 20, 21
Dogs: 13, 16, 19, 22, 25
96.
Example 14
Solution:
The sample mean for the turkeys is 19 pounds.
The sample mean for the dogs is also 19 pounds.
We note that although the means are the same, the standard deviations should reflect the amount of variability in the data values.
97.
Example 14
The deviations from the mean for the turkey weights are found.
98.
Example 14
The sample variance of the turkey weights is 2.5 square pounds.
The sample standard deviation of the turkey weights is 1.58 pounds.
99.
Example 14
The deviations from the mean for the dog weights are found.
100.
Example 14
The sample variance of the dog weights is 22.5 square pounds.
The sample standard deviation of the dog weights is 4.74 pounds.
101.
Example 14
The sample standard deviation of the turkey weights is 1.58 pounds.
The sample standard deviation of the dog weights is 4.74 pounds.
The standard deviation of the sample of dog weights is larger than the standard deviation of the sample of turkey weights because there was a much wider spread among the dog weights.
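The intermediate steps (deviations from the mean, squared deviations, division by n – 1, square root) can be traced with the following Python sketch. The function names are illustrative.

```python
from math import sqrt

def sample_variance(data):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(data)
    mean = sum(data) / n
    squared_deviations = [(x - mean) ** 2 for x in data]
    return sum(squared_deviations) / (n - 1)

def sample_std_dev(data):
    """Square root of the sample variance."""
    return sqrt(sample_variance(data))

turkeys = [17, 18, 19, 20, 21]
dogs = [13, 16, 19, 22, 25]

for name, weights in (("turkeys", turkeys), ("dogs", dogs)):
    print(name, sample_variance(weights), round(sample_std_dev(weights), 2))
```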
102.
9.3 Initial Problem Solution
Which stockbroker should you choose if you want to minimize risk while maintaining a steady rate of growth?
One stockbroker’s recommendations had percentage gains of 21%, -3%, 16%, 27%, 9%, 11%, 13%, 6%, and 17%.
The other’s recommendations had percentage gains of 11%, 13%, 16%, 8%, 5%, 14%, 15%, 17%, and 18%.
103.
Initial Problem Solution
First you could calculate the mean rate of return for each stockbroker.
Both stockbrokers have a mean rate of return of 13%.
Since the average growth rates are the same, you can measure the variability to determine which stockbroker’s recommendations have the least variability.
104.
Initial Problem Solution, cont’d
First stockbroker:
105.
Initial Problem Solution, cont’d
Second stockbroker:
106.
Initial Problem Solution, cont’d
The standard deviation of the second stock portfolio (4.30) is much smaller than the standard deviation of the first stock portfolio (8.73).
Since the growth rates were the same, the second stockbroker should be chosen in order to minimize risk.
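The comparison of the two stockbrokers can be verified with Python's standard statistics module, whose stdev function computes the sample standard deviation (dividing by n – 1). The variable names are illustrative.

```python
from statistics import mean, stdev

# Percentage gains for each stockbroker's recommendations.
first = [21, -3, 16, 27, 9, 11, 13, 6, 17]
second = [11, 13, 16, 8, 5, 14, 15, 17, 18]

for name, gains in (("first", first), ("second", second)):
    print(name, "mean =", mean(gains), "std dev =", round(stdev(gains), 2))
```

Both means equal 13, so the smaller standard deviation of the second list is what identifies the lower-risk choice.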
107.
Ch 9 Assignment
You must show some work for calculations to receive full credit.
Section 9.1 pg 573 (1,3,4,13,14,19,23,25,27)
Section 9.2 pg 586 (1,2,21,27,33,39)
Section 9.3 pg 614 (1,5,15,16,19,21,33 and find standard deviation=square root of the variance, 35)
I will also be giving an extra credit assignment. You will review an article from the Tennessean. This assignment can count as a homework assignment.