Question
Which of the following data sets is most likely to be normally
distributed? For other choices, explain why you believe they
would not follow a normal distribution.
The hand span (measured from the tip of the thumb to the tip of
the extended 5th finger) of a random sample of high school
seniors.
The annual salaries of all employees of a large shipping
company
The annual salaries of a random sample of 50 CEOs of major
companies (25 men and 25 women)
The dates of 100 pennies taken from a cash drawer in a
convenience store
Question
Assume that the weight of 1-year-old girls in the US is
normally distributed with a mean of 9.5 kg and a standard
deviation of 1.1 kg. Without using a calculator (use the empirical
rule: 68%, 95%, 99.7%), estimate the percentage of 1-year-old
girls in the US that meet each of the following conditions. Draw a
sketch and shade the appropriate region for each problem…
Less than 8.1 kg
Between 7.3 and 11.7 kg.
More than 12.8 kg.
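The arithmetic behind the empirical-rule estimates can be sketched in Python 3.8+ (for `statistics.NormalDist`). The exact normal-CDF values are included only as a cross-check, since the exercise asks for hand estimates; `z_score` is an illustrative helper name. Note that 8.1 kg is not a whole number of standard deviations from the mean, so the rule brackets that answer rather than pinning it down.

```python
from statistics import NormalDist

MU, SIGMA = 9.5, 1.1  # mean and standard deviation in kg, from the problem

def z_score(x: float) -> float:
    """Number of standard deviations x lies from the mean."""
    return (x - MU) / SIGMA

# Locate each cutoff in standard-deviation units.
for x in (8.1, 7.3, 11.7, 12.8):
    print(f"x = {x:4.1f} kg -> z = {z_score(x):+.2f}")

# Empirical-rule reading: 7.3 and 11.7 sit at -2 and +2 SD (about 95% between);
# 12.8 sits at +3 SD (about (100 - 99.7) / 2 = 0.15% above).
# 8.1 sits about 1.27 SD below the mean, so the rule only brackets it
# between 2.5% (below -2 SD) and 16% (below -1 SD).
weights = NormalDist(MU, SIGMA)
p_below_8_1 = weights.cdf(8.1)
p_between = weights.cdf(11.7) - weights.cdf(7.3)
p_above_12_8 = 1 - weights.cdf(12.8)
print(f"P(X < 8.1)        = {p_below_8_1:.4f}")
print(f"P(7.3 < X < 11.7) = {p_between:.4f}")
print(f"P(X > 12.8)       = {p_above_12_8:.4f}")
```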
Question
The grades on a marketing research course midterm are
normally distributed with a mean of 81 and a standard deviation
of 6.3. Calculate the z score for each of the following exam
grades. Draw and label a sketch for each example.
65
83
93
100
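A z score is (grade - mean) / standard deviation. A minimal sketch of that formula for the four grades, where the helper name `z_score` is illustrative:

```python
MEAN, SD = 81, 6.3  # midterm mean and standard deviation from the problem

def z_score(grade: float, mean: float = MEAN, sd: float = SD) -> float:
    """Standardize a grade: how many SDs it lies above (+) or below (-) the mean."""
    return (grade - mean) / sd

for grade in (65, 83, 93, 100):
    print(f"grade {grade:3d}: z = {z_score(grade):+.2f}")
```

Rounded to two decimals this gives z ≈ -2.54, 0.32, 1.90 and 3.02, so 65 lies well below the mean and 100 lies about three standard deviations above it.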
Question…
What is the relative frequency of observations below 1.18? That
is, find the relative frequency of the event Z < 1.18.
z     .00    .01    ...   .08    .09
0.0   .5000  .5040  ...   .5319  .5359
0.1   .5398  .5438  ...   .5714  .5753
...   ...    ...    ...   ...    ...
1.0   .8413  .8438  ...   .8599  .8621
1.1   .8643  .8665  ...   .8810  .8830
1.2   .8849  .8869  ...   .8997  .9015
...   ...    ...    ...   ...    ...
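Reading a cumulative table means splitting z into a row (the first decimal) and a column (the second decimal): for 1.18, row 1.1 and column .08 give .8810. The sketch below mimics that lookup on an excerpt of the table and cross-checks it with Python's `statistics.NormalDist`; `Z_TABLE` and `table_lookup` are illustrative names, and the lookup only handles non-negative z with two decimals.

```python
from statistics import NormalDist

# Excerpt of the cumulative table above: row label -> {column: P(Z < row + column)}.
Z_TABLE = {
    1.1: {0.00: 0.8643, 0.01: 0.8665, 0.08: 0.8810, 0.09: 0.8830},
}

def table_lookup(z: float) -> float:
    """Look up P(Z < z) for 0 <= z with two decimals: row = tenths, column = hundredths."""
    hundredths = round(z * 100)        # 1.18 -> 118
    row = (hundredths // 10) / 10      # 118 -> 1.1
    col = (hundredths % 10) / 100      # 118 -> 0.08
    return Z_TABLE[row][col]

p_table = table_lookup(1.18)
p_exact = NormalDist().cdf(1.18)
print(f"table: {p_table:.4f}, exact: {p_exact:.4f}")
```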
Question
Find the value z such that the event Z > z has relative
frequency 0.80.
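Because P(Z > z) = 0.80 is equivalent to P(Z < z) = 0.20, the answer is the 20th percentile of the standard normal distribution, which must be negative. A minimal check using the inverse CDF in Python's `statistics.NormalDist`:

```python
from statistics import NormalDist

# P(Z > z) = 0.80 is the same as P(Z < z) = 1 - 0.80 = 0.20,
# so z is the 20th percentile of the standard normal distribution.
z = NormalDist().inv_cdf(1 - 0.80)
print(f"z = {z:.2f}")  # negative, because 80% of the area lies to its right
```

With the table, the same answer comes from finding the entry closest to .80 (z = 0.84) and flipping the sign by symmetry, giving z ≈ -0.84.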
Question
For borrowers with good credit, the mean debt for revolving
and installment accounts is $15,015. Assume the standard
deviation is $3,540 and that debt amounts are normally
distributed.
What is the probability that the debt for a borrower with good
credit is more than $18,000?
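The two steps are to standardize $18,000 and then take the upper-tail area. A short sketch with `statistics.NormalDist` (the variable names are illustrative):

```python
from statistics import NormalDist

MEAN_DEBT, SD_DEBT = 15_015, 3_540  # dollars, from the problem
threshold = 18_000

z = (threshold - MEAN_DEBT) / SD_DEBT   # about 0.84
p_more = 1 - NormalDist().cdf(z)        # upper-tail area beyond the threshold
print(f"z = {z:.2f}, P(debt > $18,000) = {p_more:.4f}")
```

The exact tail is close to 0.20; rounding z to 0.84 and using the table gives 1 - .7995 = .2005.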
Question
The average stock price for companies making up the S&P 500
is $30, and the standard deviation is $8.20. Assume the stock
prices are normally distributed.
How high does a stock price have to be to put a company in the
top 10% … ?
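The top 10% begins at the 90th percentile, so the cutoff is mean + z × SD with z the 90th percentile of the standard normal distribution. A minimal sketch, assuming Python 3.8+ for `statistics.NormalDist`:

```python
from statistics import NormalDist

MEAN_PRICE, SD_PRICE = 30.0, 8.20  # dollars, from the problem

# The top 10% begins at the 90th percentile: x = mean + z * sd with z ≈ 1.28.
z_90 = NormalDist().inv_cdf(0.90)
cutoff = MEAN_PRICE + z_90 * SD_PRICE
print(f"z = {z_90:.2f}, cutoff = ${cutoff:.2f}")
```

A stock price of roughly $40.51 or more puts a company in the top 10%.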
Question
The scores on a statewide geometry exam were normally
distributed with μ = 72 and σ = 8. What fraction of test-takers had
a grade between 70 and 72 on the exam? Use the cumulative
z-table provided below.
z      .00    .01    .02    .03    .04    .05    .06    .07    .08    .09
0.00   .5000  .5040  .5080  .5120  .5160  .5199  .5239  .5279  .5319  .5359
0.10   .5398  .5438  .5478  .5517  .5557  .5596  .5636  .5675  .5714  .5753
0.20   .5793  .5832  .5871  .5910  .5948  .5987  .6026  .6064  .6103  .6141
0.30   .6179  .6217  .6255  .6293  .6331  .6368  .6406  .6443  .6480  .6517
0.40   .6554  .6591  .6628  .6664  .6700  .6736  .6772  .6808  .6844  .6879
0.50   .6915  .6950  .6985  .7019  .7054  .7088  .7123  .7157  .7190  .7224
0.60   .7257  .7291  .7324  .7357  .7389  .7422  .7454  .7486  .7517  .7549
0.70   .7580  .7611  .7642  .7673  .7704  .7734  .7764  .7794  .7823  .7852
0.80   .7881  .7910  .7939  .7967  .7995  .8023  .8051  .8078  .8106  .8133
0.90   .8159  .8186  .8212  .8238  .8264  .8289  .8315  .8340  .8365  .8389
1.00   .8413  .8438  .8461  .8485  .8508  .8531  .8554  .8577  .8599  .8621
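Standardizing gives z = (70 - 72) / 8 = -0.25 and z = 0 for 72. The table only lists non-negative z, so the sketch uses symmetry: the area between -0.25 and 0 equals the area between 0 and 0.25, which is .5987 - .5000. The exact CDF is included as a cross-check.

```python
from statistics import NormalDist

MU, SIGMA = 72, 8  # exam mean and SD, from the problem

z_low = (70 - MU) / SIGMA   # -0.25
z_high = (72 - MU) / SIGMA  #  0.00 (the mean itself)

# Table route: by symmetry, P(-0.25 < Z < 0) = P(0 < Z < 0.25) = .5987 - .5000
table_answer = 0.5987 - 0.5000
exact_answer = NormalDist().cdf(z_high) - NormalDist().cdf(z_low)
print(f"table: {table_answer:.4f}, exact: {exact_answer:.4f}")
```

About 9.87% of test-takers scored between 70 and 72.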
Question
The scores on a marketing research take-home exam are normally
distributed with μ = 70.25 and σ = 3.
Henry scored 71 on the exam. Henry’s exam grade was higher
than what percentage of test-takers?
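The percentage of test-takers below Henry is the cumulative area to the left of his z score. A minimal sketch:

```python
from statistics import NormalDist

MU, SIGMA = 70.25, 3
z = (71 - MU) / SIGMA              # 0.75 / 3 = 0.25
share_below = NormalDist().cdf(z)  # table: row 0.20, column .05 -> .5987
print(f"z = {z:.2f}; Henry scored higher than about {share_below:.2%} of test-takers")
```

Henry's grade was higher than roughly 59.87% of test-takers.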
Question
You sample 36 apples from your farm’s harvest of over 200,000
apples. The mean weight of the sample is 112 grams (with a 40
gram sample standard deviation). What is the probability that
the mean weight of all 200,000 apples is between 100 and 124 grams?
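One common reading, assumed here, is that the question asks how much confidence the interval from 100 to 124 g carries for the population mean weight. Both endpoints are 12 g from the sample mean, and the standard error is 40 / √36 ≈ 6.67 g, so the interval spans ±1.8 standard errors. The sketch uses the normal (z) approximation used throughout these notes; because σ is estimated from the sample, a t-based calculation would give a slightly lower confidence.

```python
import math
from statistics import NormalDist

n, sample_mean, sample_sd = 36, 112.0, 40.0  # from the problem

# Standard error of the mean, using the sample SD as an estimate of sigma.
se = sample_sd / math.sqrt(n)  # 40 / 6 ≈ 6.67 g

# 100 g and 124 g are each 12 g from the sample mean: 12 / 6.67 = 1.8 standard errors.
z = (124 - sample_mean) / se
confidence = NormalDist().cdf(z) - NormalDist().cdf(-z)
print(f"SE = {se:.2f} g, z = {z:.2f}, confidence ≈ {confidence:.2%}")
```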
Question
In a local teaching district, a technology grant is available to
teachers to install a cluster of four computers in their
classrooms. Of the 6,250 teachers in the district, 250 were
randomly selected and asked whether they felt that computers were
an essential teaching tool for their classroom. Of those selected,
142 teachers felt that computers were an essential teaching tool.
Calculate a 99% confidence interval for the proportion of
teachers who felt that computers are an essential teaching tool.
How could the survey be changed to narrow the confidence
interval while maintaining the 99% confidence level?
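A sketch of the normal-approximation (Wald) interval p̂ ± z·√(p̂(1 − p̂)/n), assuming that textbook formula is the one intended; it ignores the finite-population correction (250 of 6,250 teachers is 4% of the population). The helper names are illustrative, and the n = 1000 case is hypothetical, assuming p̂ stays at 0.568. It shows that at a fixed 99% level, a larger sample narrows the interval.

```python
import math
from statistics import NormalDist

def margin_of_error(p_hat: float, n: int, confidence: float = 0.99) -> float:
    """Half-width of the normal-approximation (Wald) interval for a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 2.576 for 99%
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

def proportion_ci(successes: int, n: int, confidence: float = 0.99):
    """Return (lower, upper) bounds of the Wald interval."""
    p_hat = successes / n
    m = margin_of_error(p_hat, n, confidence)
    return p_hat - m, p_hat + m

low, high = proportion_ci(142, 250)
print(f"p-hat = {142 / 250:.3f}, 99% CI = ({low:.4f}, {high:.4f})")

# Same 99% level, larger (hypothetical) sample with the same p-hat:
# quadrupling n from 250 to 1000 halves the margin of error.
for n in (250, 1000):
    print(f"n = {n:4d}: margin = ±{margin_of_error(142 / 250, n):.4f}")
```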
Statistical Analysis
Episode #1:
Prior Data Analysis
Logic of Statistical Inference
Now… statistics…
Part #1
Prior Data Analysis
(descriptive statistics)
Data screening, and why?
One needs to get a feel for the data. Understanding the sample
data is a MUST before making any statistical inference.
Looking at one variable at a time, and then at bivariate
relationships, gives you a feel for the preliminary results.
Early detection of issues suggests adjustments to make before
moving on (e.g., outliers, missing observations).
Univariate analysis
First, check the distribution of every variable (univariate
analysis) … why?
Important terms:
Skewness – distribution’s deviation from a perfectly
symmetrical shape (positive skewness, negative skewness)
Kurtosis – general peakedness of a distribution (platykurtic,
leptokurtic, mesokurtic)
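Both terms can be computed from the central moments of the data: skewness g1 = m3 / m2^1.5 and excess kurtosis g2 = m4 / m2² − 3, which is 0 for a normal distribution. The sketch below uses these simple moment definitions, while statistical packages often report bias-adjusted versions, so their numbers differ somewhat for small samples. The data sets are hypothetical.

```python
from statistics import fmean

def moment_skew_kurtosis(data):
    """Moment-based skewness (g1) and excess kurtosis (g2) of a sample.

    g1 > 0: longer right tail; g1 < 0: longer left tail.
    g2 > 0: leptokurtic; g2 < 0: platykurtic; g2 near 0: mesokurtic (like the normal).
    """
    m = fmean(data)
    n = len(data)
    m2 = sum((x - m) ** 2 for x in data) / n
    m3 = sum((x - m) ** 3 for x in data) / n
    m4 = sum((x - m) ** 4 for x in data) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 2, 10]  # hypothetical, e.g. a few very high salaries
print(moment_skew_kurtosis(symmetric))
print(moment_skew_kurtosis(right_skewed))
```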
Univariate analysis
Bivariate analysis
Once you know each variable well, you can start looking at
their relationships
2 categorical variables – crosstabulation (joint distribution of
2 variables)
2 continuous variables – scatterplot
1 categorical and 1 continuous – compare the boxplots / stem-and-leaf plots
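A crosstabulation simply counts how often each combination of two categorical values occurs. A minimal sketch with `collections.Counter`, using hypothetical paired observations and plain-text output:

```python
from collections import Counter

# Hypothetical paired observations of two categorical variables.
records = [
    ("female", "yes"), ("female", "no"), ("male", "yes"),
    ("male", "yes"), ("female", "yes"), ("male", "no"),
]

counts = Counter(records)  # joint frequency of each (row, column) pair
rows = sorted({r for r, _ in records})
cols = sorted({c for _, c in records})

# Print a small table: one row per category of the first variable, plus row totals.
print("".ljust(8) + "".join(c.ljust(6) for c in cols) + "total")
for r in rows:
    row_counts = [counts[(r, c)] for c in cols]
    print(r.ljust(8) + "".join(str(k).ljust(6) for k in row_counts) + str(sum(row_counts)))
```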
Cross tabulation
Scatterplot
Part #2
Logic of Statistical Inference
Statistical inference
Statistics is the analysis of population characteristics by
inference from sampling.
Statistical analysis has two foci: descriptive statistics and
statistical inference
Descriptive statistics = organizing and describing data obtained
from a sample of observations
Statistical inference – using sample statistics to estimate the value
of measures in the population from which the sample was
drawn (based on probability theory)
The goal of statistical inference is to make precise estimates of
population parameters, with known risks of error based on
observations from samples (random sampling error)
Statistical inference
Sampling Distribution
When conducting research, scientists seldom take more than one
sample from a population. This single sample becomes the basis
upon which inferences are made. Consider for a moment the
possibility of selecting numerous samples using identical
random sampling procedures from the same population. We
would now have multiple instances of whatever statistic we
were interested in examining… the differences between these
sample statistics might give us some notion concerning how
well our sampling procedure was working.
Each sample of the same size would provide one observation to
be included in the distribution (where the data points are the
sample statistics of each sample being drawn from the
respective population)
Sampling Distribution
Sampling distribution is the distribution of a sample statistic
that would be obtained if all possible samples of the same size
(N) were drawn from a given population.
Sampling Distribution
Central Limit Theorem
Distribution of a large number of sample means or sample
proportions will approximate a normal distribution, regardless
of the distribution of the population from which they were
drawn, provided the sample size is sufficiently large.
It specifies that the mean of the sampling distribution will be
equal to the mean of the population.
As the sample size increases, the sample statistic does a better
job of estimating the corresponding population parameter. The
standard deviation of the sampling distribution is called the
“standard error” (or sampling error).
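The idea of drawing many samples and collecting one statistic from each can be simulated directly. This sketch assumes a hypothetical, strongly right-skewed exponential population with mean 1 and SD 1. It draws 5,000 samples of size 50 and checks that the sample means center on the population mean with a spread close to σ/√n, the standard error.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable
POP_MEAN = 1.0           # exponential population: mean 1, SD 1, strongly right-skewed
n, n_samples = 50, 5_000

# Each sample contributes one observation (its mean) to the sampling distribution.
sample_means = [
    statistics.fmean(rng.expovariate(1 / POP_MEAN) for _ in range(n))
    for _ in range(n_samples)
]

mean_of_means = statistics.fmean(sample_means)
standard_error = statistics.stdev(sample_means)
print(f"mean of sample means = {mean_of_means:.3f} (population mean {POP_MEAN})")
print(f"SD of sample means   = {standard_error:.3f} (theory: 1/sqrt(50) = {1 / n ** 0.5:.3f})")
```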
Normal distribution
Theoretical probability distribution with a symmetrical,
unimodal, bell-shaped curve. It is based on a probability
density function.
ND is bell shaped and has only one mode [the particular value that
occurs most frequently].
It is symmetric around the mean, not skewed (mean, median, and mode
are all equal).
The area of a region under the curve between any two values of
a variable equals the probability of observing a value in that
range when an observation is randomly selected from the
distribution.
The area between the mean and a given number of standard
deviations from the mean is the same for all NDs [the area
between the mean and plus or minus one SD takes in 68.26% of
the observations].
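The claim that the area within a given number of SDs is the same for every normal distribution can be checked numerically. The sketch uses three normal distributions that appear in these notes. The exact one-SD area rounds to 68.27%; the 68.26% figure comes from doubling the table value .3413.

```python
from statistics import NormalDist

# Area within k SDs of the mean, for several different normal distributions.
results = {}
for mu, sigma in [(0, 1), (81, 6.3), (9.5, 1.1)]:
    dist = NormalDist(mu, sigma)
    areas = [dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma) for k in (1, 2, 3)]
    results[(mu, sigma)] = areas
    print(f"N({mu}, {sigma}): " + ", ".join(f"{a:.4f}" for a in areas))
```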
Standard normal distribution
A standard normal distribution is a normal distribution with
mean 0 and standard deviation 1.
From the 68-95-99.7 rule we know that for a variable with the
standard normal distribution, 68% of the observations fall
between -1 and 1 (within 1 standard deviation of the mean of 0),
95% fall between -2 and 2 (within 2 standard deviations of the
mean) and 99.7% fall between -3 and 3 (within 3 standard
deviations of the mean).
No naturally measured variable has this distribution. However,
all other normal distributions are equivalent to this distribution
when the unit of measurement is changed to measure standard
deviations from the mean. (That's why this distribution is
important: it is used to handle problems involving any normal
distribution.)
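Changing the unit of measurement to standard deviations from the mean leaves probabilities unchanged. A small check using the midterm-grade distribution (mean 81, SD 6.3) that appears in the exercises:

```python
from statistics import NormalDist

MEAN, SD = 81, 6.3       # the midterm-grade distribution used in the exercises
grades = NormalDist(MEAN, SD)
standard = NormalDist()  # mean 0, SD 1

x = 93
z = (x - MEAN) / SD
# Same probability whether we work in grade units or in standard-deviation units.
print(f"P(grade < {x}) = {grades.cdf(x):.4f}, P(Z < {z:.2f}) = {standard.cdf(z):.4f}")
```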
Summary on Statistical Inference
SI involves generalization from sample statistics to population
parameters
To conduct inferential analysis, we must have a theory that
underlies the process. The theory is based on probability.
There are two kinds of error in samples: bias and random
sampling error. Through random selection of the sample, bias
can be eliminated and random sampling error can be measured.
Sampling distributions are theoretical probability distributions
that describe the relationship between populations and samples.
The standard deviation of the sampling distribution is called the
standard error; when it is based on sample statistics, it estimates
random sampling error.
As the size of the sample increases, sampling error and the
standard error will decrease.
The standard error is used both to develop interval estimates of
population parameters and to conduct hypothesis testing.
Confidence Interval
A sample statistic is not likely to be exactly equal to the
population parameter. However, we can place an interval around a
sample statistic that specifies the range within which the
population parameter is likely to fall. The confidence level,
expressed as a percentage, is the degree of confidence that the
interval contains the population parameter (e.g., the population
mean) that we are estimating from our sample.
Accuracy & precision = small standard error. The most efficient
way to achieve this is to increase N (the sample size).
With a 95% CI, about 5% of the intervals constructed this way over
repeated samples will erroneously exclude the population value.
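The "about 5% will exclude the population value" statement is about repeated sampling, which a simulation can illustrate. The sketch assumes a hypothetical normal population (mean 100, SD 15) with σ treated as known, so the interval is sample mean ± 1.96 × σ/√n. It counts how often 2,000 such intervals miss the true mean.

```python
import math
import random
from statistics import NormalDist, fmean

rng = random.Random(7)
POP_MEAN, POP_SD = 100.0, 15.0  # hypothetical population, sigma treated as known
n, trials = 30, 2_000
z = NormalDist().inv_cdf(0.975)  # about 1.96 for a 95% interval

misses = 0
for _ in range(trials):
    sample_mean = fmean(rng.gauss(POP_MEAN, POP_SD) for _ in range(n))
    margin = z * POP_SD / math.sqrt(n)  # z times the standard error
    if not (sample_mean - margin <= POP_MEAN <= sample_mean + margin):
        misses += 1

print(f"intervals that missed the population mean: {misses / trials:.1%}")
```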
Confidence Interval
Hypothesis Testing
Two hypotheses in the testing process: Null & Research …
Reject the null hypothesis = the evidence indicates that the null
hypothesis is likely false, which supports the alternative (research)
hypothesis as the correct state of affairs in the population.
Fail to reject the null hypothesis = the null hypothesis cannot be
rejected. There is insufficient evidence to support the argument
that you make in your research hypothesis.
You NEVER accept the null hypothesis – because you can
NEVER prove that the population means were equal/less/more…
We can NEVER be certain.
Level of significance (α) – the level of risk of falsely rejecting
a true null hypothesis that you are willing to accept. The p value
is compared against it: with α = .05, we reject the null hypothesis
only when p < .05, so the probability of falsely rejecting a true
null hypothesis is at most 5 in 100.
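The decision rule (compare the p value with α) can be sketched with a two-sided one-sample z test. The numbers are hypothetical: H0 says the population mean is 100, and a sample of 36 has mean 112 and SD 40, with the SD treated as known for the z approximation. The helper name `z_test_two_sided` is illustrative.

```python
import math
from statistics import NormalDist

def z_test_two_sided(sample_mean, mu_0, sd, n):
    """Two-sided one-sample z test; returns (z, p value)."""
    z = (sample_mean - mu_0) / (sd / math.sqrt(n))
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

alpha = 0.05
# Hypothetical numbers: H0 says the mean is 100; a sample of 36 has mean 112, SD 40.
z, p = z_test_two_sided(112, 100, 40, 36)
decision = "reject H0" if p < alpha else "fail to reject H0"
print(f"z = {z:.2f}, p = {p:.4f} -> {decision}")
```

Here p ≈ .072 is above .05, so the result is "fail to reject H0". That does not mean the null hypothesis is accepted as true.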
Hypothesis Testing