BUS308 – Week 5 Lecture 1
A Different View
Expected Outcomes
After reading this lecture, the student should be familiar with:
1. What a confidence interval for a statistic is.
2. What a confidence interval for differences is.
3. The difference between statistical and practical significance.
4. The meaning of an Effect Size measure.
Overview
Years ago, a comedy show used to introduce new skits with the phrase “and now for
something completely different.” That seems appropriate for this week’s material.
This week we will look at evaluating our data results in somewhat different ways. One of
the criticisms of the hypothesis testing procedure is that it only shows one value, when it is
reasonably clear that a number of different values would also cause us to reject or not reject a
null hypothesis of no difference. Many managers and researchers would like to see what these
values could be and, in particular, what the extreme values are, as an aid in making decisions.
Confidence intervals will help us here.
The other criticism of the hypothesis testing procedure is that we can “manage” the
results, or ensure that we will reject the null, by manipulating the sample size. For example, if
we have a difference in a customer preference between two products of only 1%, is this a big
deal? Given the uncertainty contained in sample results, we might tend to think that we can
safely ignore this result. However, if we were to use a sample of, say, 10,000, we would find
that this difference is statistically significant. This, for many, seems to fly in the face of
reasonableness. To help here, we will look at a measure of “practical significance” called the
effect size, which indicates whether the difference is worth paying any attention to.
Confidence Intervals
A confidence interval is a range of values that, based upon the sample results, most likely
contains the actual population parameter. The “most likely” element is the level of confidence
attached to the interval: a 95% confidence interval, a 90% confidence interval, a 99% confidence
interval, etc. Intervals can be created at any time, with or without performing a statistical test,
such as the t-test.
A confidence interval may be expressed as a range (45 to 51% of the town’s population
support the proposal) or as a mean or proportion with a margin of error (48% of the town
supports the proposal, with a margin of error of 3%). This last format is frequently seen with
opinion poll results, and simply means that you should add and subtract this margin of error from
the reported proportion to obtain the range. With either format, the confidence percent should
also be provided.
Confidence intervals for a single mean (or proportion) are fairly straightforward to
understand, and relate to t-test outcomes simply. Details on how to construct the interval will be
given in this week’s second lecture. In this discussion, we want to understand how to interpret
and use them.
In Week 2, we looked at how to test sample means against a constant, and we found that
the female compa-ratio mean was not equal to or less than 1.0. The related confidence interval
for the female compa-ratio mean would be 1.0397 to 1.0977, or 1.0687 +/- 0.0290 (all values
rounded to 4 decimal places). This result relates directly to possible t-test outcomes. If, again in
the one-sample situation, the standard/constant we are comparing our sample result against is
within this range, then we would NOT reject the null hypothesis of no difference. If the standard
is outside of this range, as with our 1.00 test in Week 2, then we reject the null and say we have a
significant difference. It is clear in this case that the female mean is not even close to the
standard value of 1.0 that we tested against.
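This decision rule can be sketched in a few lines of Python (a sketch for illustration, not part of the course materials; the mean, standard deviation, sample size, and t value are the female compa-ratio figures given in these lectures):

```python
from math import sqrt

def mean_ci(mean, stdev, n, t_crit):
    """Confidence interval for a single mean: mean +/- t * stdev/sqrt(n)."""
    margin = t_crit * stdev / sqrt(n)
    return (mean - margin, mean + margin)

# Female compa-ratio sample values from the lectures.
# 2.064 is the two-tail t value for alpha = 0.05 with df = 24.
low, high = mean_ci(1.0687, 0.070, 25, 2.064)

standard = 1.0                            # the constant tested against in Week 2
reject = not (low <= standard <= high)    # standard outside the interval -> reject the null
```

Because 1.0 falls below the lower endpoint, the interval reproduces the Week 2 rejection without rerunning the t-test.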
Confidence intervals allow us to make some informed “gut level” decisions when more
precise measures may not be needed. For example, if the means of two variables are fairly close,
the wider confidence interval will have more variation within the data and be less consistent.
We could test this with the F-test for variance that we covered in Week 2. While a hypothesis
result of “reject the null hypothesis” or “do not reject the null hypothesis” with an alpha of 0.05
is definite, it does not convey the “strength” of the rejection. Comparing the endpoints against
the standard used in our one-sample t-test would give a sense of how “close” we came to making
the other decision.
Confidence intervals can also be used to examine the difference between means. The
most direct way is by constructing a confidence interval for the difference. Again, the details on
how to develop one of these will be presented in the second lecture for this week. This result is
very similar to the intervals we constructed while doing the ANOVA comparisons. While we
use a different calculation formula when comparing only two means (rather than two means at a
time as in the ANOVA situation), the interpretation is the same. If the range contains 0, then
the population means could be identical and we would not reject the null hypothesis of no
difference.
If we have two single-mean confidence intervals, for example intervals for the male and
female compa-ratios, using them to determine if the means are significantly different is a bit
trickier than simply seeing if they might contain the same value within their range. If the top ¼
of one interval and the bottom ¼ of the other overlap, then we have a significant difference at the
alpha = 0.05 level. If the endpoints barely overlap, we have a significant difference around the
alpha = 0.01 level.
The natural question at this point is why an overlap shows a significant difference when
comparing two means but does not do so when comparing a mean against a constant. The
answer lies in how the intervals are constructed. Without getting too technical, the intervals use
a t-value to establish the level of confidence, and, as the sample size gets larger, the
corresponding t-value gets smaller for any specific alpha level. So, in our example of comparing
compa-ratio means, we had samples of 25 when constructing the individual intervals and used a
slightly larger t-value than we would use with our overall sample of 50 when comparing the two
groups together. This means the individual intervals are a bit longer than the larger-sample
result, which is why some overlap can still show a significant difference rather than the more
“logical” interpretation that only a complete lack of overlap means the means differ.
Effect Size – Practical Importance
A popular saying a few years ago was “if you torture data long enough, it will confess to
anything.” Unfortunately, many regard statistical analysis this same way. Some think that if
you do not get the rejection of the null hypothesis that you want, you can simply repeat the
sampling with a larger group; at some sample size, virtually all differences will be found to be
statistically significant. Note that this is somewhat unethical for professional researchers;
however, those who feel that proving their point is more important than following professional
guidelines have been known to do this.
But does statistical significance mean the findings should be used in decision making?
If, for example, we typically round salary to the nearest thousand dollars when making decisions,
does a significant difference based on a $500 difference have any practical importance?
Probably not, even if we could find a sample size large enough to make this difference
statistically significant.
So, how do we decide the practical importance of a statistically significant difference?
Once, and this is important, we have rejected the null hypothesis – and only if we have rejected
the null hypothesis – we calculate a new statistic called the effect size.
The name comes from the effect that changing a variable’s value would have on the
outcome. To understand this idea, let’s look at the male and female compa-ratios. We found in
Week 2 that the male and female compa-ratio means were not significantly different. So, the
“effect” of changing from male to female when doing an analysis with the compa-ratio mean
would not be very big. However, if we switched from the male to the female average salary, we
would expect to see a large effect, or difference, in the outcome since their salaries were so
different.
The effect size measure – however it is calculated for different statistical tests – can be
interpreted in a similar fashion. Effect sizes generally have their value translated into a “large,”
“moderate,” or “small” label. If we have a large effect, then we know that the variable
interaction caused the rejection of the null hypothesis, and that our results have strong practical
significance. If, however, we have a small effect, then we can be fairly sure that the sample size
caused the rejection of the null hypothesis and the results have little to no practical significance
for decision making or research results. A moderate outcome is less clear, and we might want to
redo the analysis with a different sample. (Note: since we have already rejected the null,
repeating the experiment with a different sample in this case is not manipulating the findings,
but rather studying the effect of the variables in more detail. This is done in research all the
time, providing evidence that the findings are replicable and correct.) Examples of different
effect size measures and how to determine what is large, medium, and small are presented in the
third lecture for this week.
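As a preview, one widely used effect size for comparing two means is Cohen’s d. Here is a minimal sketch using the compa-ratio sample values from these lectures; note that the 0.2/0.5/0.8 cutoffs are the conventional rule of thumb, an assumption here rather than something stated in this lecture:

```python
from math import sqrt

def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Cohen's d for two independent means, using the pooled standard deviation."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    return (mean1 - mean2) / sqrt(pooled_var)

def label(d):
    """Conventional rule-of-thumb labels (assumed cutoffs: 0.2 / 0.5 / 0.8)."""
    d = abs(d)
    if d >= 0.8:
        return "large"
    if d >= 0.5:
        return "moderate"
    return "small"

# Female vs. male compa-ratio values from the lectures (mean, stdev, n each).
d = cohens_d(1.069, 0.070, 25, 1.056, 0.084, 25)
size_label = label(d)
```

The small effect matches the lecture’s point: the compa-ratio means are close, so switching between male and female has little practical impact on that measure.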
Summary
Some are concerned about statistical outcomes because many different values can produce
a statistically significant outcome, so they are not really sure what the outcome means. This
concern is frequently addressed with the use of a confidence interval, a range of values that
could be the population parameter being estimated, based on the sample’s uncertainty.
Remember that since we have only a sample, we know the resulting estimates are a bit off, but
they are generally considered close enough to use for decisions. The confidence interval gives
us a range of values that could be used in decision-making to see if outcomes might differ with
the more extreme possibilities.
Just as we cannot use two single-sample t-test outcomes to determine if sample means
differ, we cannot use two individual sample CIs to make decisions about differences between
two samples. A difference confidence interval needs to be constructed for this outcome.
A second criticism about statistical significance outcomes involves the impact that the
sample size has on an outcome. Some think that statistically significant results are not always
significant for real-world practical decisions and situations. This is due to the impact of sample
sizes on the decision to reject the null. Almost any difference can be found to be statistically
significant if the sample size is large enough. So, we can have statistical significance of a
difference that has no practical importance at all.
This concern led to the development of the effect size measure. Used only when the
null hypothesis is rejected (meaning we have found a statistically significant difference), the
effect size tells us if the rejection was due to the variables changing (a large effect size and an
important difference) or due more to merely having a large sample size (a small effect size and
an unimportant difference for all practical purposes). For example, what impact would changing
from male to female have on mean salary? In this case, a big impact. The impact of the same
change on compa-ratio, however, is relatively small. Different statistical tools have different
calculations for this measure.
If you have any questions on this material, please ask your instructor.
After finishing with this lecture, please go to the first discussion for the week, and engage
in a discussion with others in the class over the first couple of days before reading the second
lecture.
BUS308 Week 5 – Lecture 2
A Different View: Confidence Intervals
Expected Outcomes
After reading this lecture, the student should be familiar with:
1. How to construct a confidence interval for a mean.
2. How to interpret a confidence interval for a mean.
3. How to construct a confidence interval for mean differences.
4. How to interpret a mean difference confidence interval.
Overview
This week we are changing focus again. From looking at differences and relationships,
we are going to consider what we can take from our findings – broadening, so to speak, our
interpretation of what we have found out.
Our focus will take us to two distinctly different views of the findings. The first topic
will be confidence intervals. We saw something similar to these when we found which pairs of
means differed with our ANOVA outcomes. One of the most commonly seen examples of
confidence intervals comes with opinion polling data. We often see a result such as 56% of the
population approves of the idea, with a margin of error of +/- 3%. This means that the real
percentage, if everyone were asked, would lie between 53% and 59% – the confidence interval.
The second will be something called Effect Size, a way of examining how practically significant
a finding is. Effect sizes will be discussed in the third lecture for this week.
Confidence Intervals
One thing we might have noticed during this class is that different samples from the same
population (technically, with replacement – putting previously selected individuals back into the
pool for a chance of being selected again) give us different results. The sample used in the
lectures was different from the sample used in the homework assignments, at least with the
salary and compa-ratio values. Yet, even with these sample differences, the outcomes of the
statistical tests were the same – we rejected, or failed to reject, the same hypothesis statements
with our slightly different values.
So, many want to know the range of values that could be a population’s mean based on
the sample results, particularly given the variation present in the sample. A confidence interval
can provide us with this information. It is basically a range of values that shows us the possible
sampling error in our results. When we construct one, we are able to specify the level of
confidence we desire that the interval will contain the actual population mean – 90%, 95%,
99%, etc.
Confidence intervals often provide the added information and comfort about estimates of
population parameter values that single point estimates lack. Since the one thing we do know
about a statistic generated from a sample is that it will not exactly equal the population
parameter, we can use a confidence interval to get a better feel for the range of values that might
be the actual population parameter. They also give us an indication of how much variation
exists in the data set – the larger the range (at the same confidence level), the more variation
within the sample data set, and the less representative the mean would be. We are going to look
at two different kinds of confidence intervals this week – intervals for a one-sample mean and
intervals for the differences between the means of two samples.
One Sample Confidence Interval for the mean
A confidence interval is simply a range of values that could contain the actual population
parameter of interest. It is centered on the sample mean, and uses the variation in the sample to
estimate a range of possible values. To construct a confidence interval, we use several pieces of
information from the sample, and the confidence level we want.
From the sample we use the mean, standard deviation, and size. To get the confidence
level – a desired probability, usually set at 95%, that the interval does, in fact, contain the
population mean – we use a related t value.
One-Sample Intervals
Example. Confidence intervals can be generated manually, or we can use the Excel
Confidence.T function found in the fx or Formulas statistical list. To compare the two, we will
construct a 95% confidence interval for the female compa-ratio. Our sample mean value is
1.069, the standard deviation is 0.070, and the sample size is 25 (from Week 1 material).
Our confidence interval formula is mean +/- t * standard error of the mean, which is the
same as: mean +/- t * stdev/sqrt(sample size).
Once we determine the confidence level we want, we use the associated 2-tail t value to
achieve it. The t-value is found with the fx function t.inv.2t(Prob, df). For a 95% confidence
interval, we would use t.inv.2t(0.05, 24), which equals 2.064 (rounded). Remember that the df
for a t is n-1, so for a sample of 25 it equals 24.
We now have all the information we need to construct a 95% confidence interval for the
female compa-ratio mean:
CI = mean +/- t * stdev/sqrt(sample size) = 1.069 +/- 2.064*0.070/sqrt(25) = 1.069 +/- 0.029.
This is typically written as 1.040 to 1.098. Note: the standard deviation divided by the square
root of the sample size is called the standard error of the mean, and is the variation measure of
the sample used in several statistical tests, including the t-test and confidence intervals.
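The manual construction above can be mirrored in a short Python sketch (illustrative only; the 2.064 t value is the lecture’s t.inv.2t(0.05, 24) result, and the margin plays the same role as Excel’s Confidence.T output):

```python
from math import sqrt

def margin_of_error(t_value, stdev, n):
    """The +/- term of a CI: t times the standard error of the mean."""
    return t_value * stdev / sqrt(n)

# Female compa-ratio values from the lecture: stdev 0.070, n = 25,
# t.inv.2t(0.05, 24) = 2.064 (rounded), mean 1.069.
moe = margin_of_error(2.064, 0.070, 25)
ci_low = 1.069 - moe
ci_high = 1.069 + moe
```

Rounding the margin to three places gives 0.029 and the interval 1.040 to 1.098, matching the hand calculation.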
Now, let’s look at how Excel can help us with this. Excel has two confidence interval
functions, one based on the z distribution (often used for large samples) and another based on
the t distribution. The t-based interval is becoming more and more the standard with the wide
availability of Excel and other statistical programs, so we will use it.
The fx (or Formulas) function Confidence.T will give us the margin of error (the +/- term)
for any desired interval. Here is a screenshot of this tool, with our female compa-ratio values
filled in. Note that our value from above (0.029 rounded) is shown even without hitting OK.
The associated 95% CI for males is 1.022 to 1.091. Note that the ranges overlap. The
female range of 1.040 to 1.098 (with higher endpoint values than the male range) and the male
range of 1.022 to 1.091 (with lower endpoint values) overlap quite a bit: 1.040 to 1.091 of each
range is common to both. How do we interpret this overlap? We cannot immediately say an
overlap means the two means might be equal. We will look at the reason for this after looking
at confidence intervals for differences below.
In the meantime, there are a couple of guidelines we can use.
• In general, the more overlap, the less significant the difference is.
• If the bottom quarter of the higher-value range overlaps with the top quarter of the
lower-value range, the means would be found to be significantly different at the alpha
= 0.05 level.
• If the bottom value of the higher-value range just touches the top value of the lower-value
range, the means would be found to be significantly different at the 0.01 alpha level.
• If the ranges do not overlap at all, the means are significantly different at an alpha
less than the 0.01 level.
• If the intervals overlap more than at the extreme quarters, the means are not
significantly different at the alpha = 0.05 level.
Since these CIs overlap more than a single quarter of their ranges, our interpretation
would be that they are not significantly different at the alpha = 0.05 level. This is what we found
in Week 2 when we performed the t-test for the mean difference.
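One rough way to apply the quarter-overlap guideline in code is to measure how much of the narrower interval is shared with the other. This is a sketch of the guideline only, not a formal test; the helper name and the use of the narrower interval as the denominator are my own choices:

```python
def overlap_fraction(ci_a, ci_b):
    """Fraction of the narrower interval that lies inside the other interval."""
    low = max(ci_a[0], ci_b[0])
    high = min(ci_a[1], ci_b[1])
    overlap = max(0.0, high - low)
    narrower = min(ci_a[1] - ci_a[0], ci_b[1] - ci_b[0])
    return overlap / narrower

# The 95% CIs from the lecture (female and male compa-ratio).
female = (1.040, 1.098)
male = (1.022, 1.091)
frac = overlap_fraction(female, male)

# Well over a quarter of each range is shared, so by the guideline the
# means are not significantly different at alpha = 0.05.
not_significant = frac > 0.25
```

Here the shared region (1.040 to 1.091) covers most of both intervals, echoing the lecture’s conclusion.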
The Confidence Interval for mean differences.
When comparing multiple samples, it is always best to use all the possible information in
a single test or procedure. The same is true for confidence intervals. Rather than calculating two
CIs to determine if means differ, we can use a single confidence interval for the difference. If
we are interested in seeing if sample means could be equal, we look to see if the interval for
their difference contains 0 or not. If so, then the means could be the same; if not, then the means
must be significantly different. This interpretation of the interval is the same as we used with
the ANOVA intervals in Week 3.
The formula for the mean difference confidence interval is mean difference +/- t *
standard error. The standard error for the difference of two populations is found by adding the
variance/sample size (which is the standard error squared) for each and taking the square root.
For our compa-ratio data set we have the following values:
Female mean = 1.069; Male mean = 1.056; t = t.inv.2t(0.05, 48) = 2.011
Female stdev = 0.070; Male stdev = 0.084; Sample size = 50, df = 48
Female variance = 0.070^2 = 0.0049; Male variance = 0.084^2 = 0.0071
Standard error = sqrt(Variance(female)/25 + Variance(male)/25)
= sqrt(0.0049/25 + 0.0071/25) = sqrt(0.000196 + 0.000284) = 0.0219.
This gives us a 95% confidence interval for the difference equaling:
(1.069 - 1.056) +/- 2.011 * 0.0219 = 0.013 +/- 0.044 = -0.031 to 0.057.
Since this confidence interval does contain 0, we are 95% confident that the male and
female compa-ratio means could be equal – which is the same result we got from our 2-sample
t-test in Week 2. We also now have a sense of how much variation exists in our measures. One
interpretation of this interval is that the averages are clearly similar rather than just barely the
same.
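The difference-interval calculation above can also be sketched in Python (illustrative only; all inputs are the lecture’s compa-ratio values, and the 2.011 t value is the lecture’s t.inv.2t(0.05, 48) result):

```python
from math import sqrt

def diff_ci(mean1, sd1, n1, mean2, sd2, n2, t_value):
    """CI for the difference of two means: (m1 - m2) +/- t * SE(difference)."""
    se = sqrt(sd1**2 / n1 + sd2**2 / n2)   # standard error of the difference
    diff = mean1 - mean2
    margin = t_value * se
    return (diff - margin, diff + margin)

# Female vs. male compa-ratio values from the lecture; t.inv.2t(0.05, 48) = 2.011.
low, high = diff_ci(1.069, 0.070, 25, 1.056, 0.084, 25, 2.011)
could_be_equal = low <= 0.0 <= high    # interval contains 0 -> means could be equal
```

Rounded to three places this reproduces the -0.031 to 0.057 interval, and since it contains 0 the means could be equal.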
Side note: The “+/- t * SE” term is often called the margin of error. We most often hear
this phrase in conjunction with opinion polls – particularly political polls: “candidate A has a
43% approval rating with a margin of error of 3.5%.” While we do not deal with proportions in
this class, they are calculated the same as an empirical probability – the number of positive
replies divided by the sample size. The construction of these margins or confidence intervals is
conceptually the same – a critical value and a standard error of the proportion based on the
sample size and results.
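The poll-style margin of error can be sketched the same way. In this sketch the 800-respondent sample size is a hypothetical value chosen for illustration (it is not from the lecture), and the large-sample z value 1.96 stands in for the t value:

```python
from math import sqrt

def proportion_margin(p, n, crit=1.96):
    """Margin of error for a sample proportion: crit * sqrt(p*(1-p)/n).
    1.96 is the large-sample (z) critical value for 95% confidence."""
    return crit * sqrt(p * (1 - p) / n)

# Hypothetical poll: 43% approval from an assumed sample of 800 respondents.
moe = proportion_margin(0.43, 800)
```

With these assumed numbers the margin works out to about 3.4%, close to the 3.5% figure quoted in the side note.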
Now, let’s go back and look at why two individual confidence intervals do not provide
the same information on differences as a single CI for differences. The reason lies in the statistic
used for our confidence level. The male and female individual mean confidence intervals used a
t value of 2.064, while the CI for the difference used the smaller value of 2.011, due to the larger
sample size. This difference in the t value between individual and difference intervals is the
reason why the individual intervals cannot be used for a direct interpretation of differences; they
are a bit too large for the combined data set.
Summary
Confidence intervals give us added insight into what our sample results are saying about
the population. By developing a confidence interval around a mean estimate, we can see the
range of values that are reasonable estimates for the actual parameter we are trying to estimate.
By looking at the spread within the interval, we can get a feeling for how much variation exists
in the population (by using the variation within the sample).
Confidence intervals can also give us a feeling about the differences between mean
estimates. Both are built the same way: our estimate (sample result or difference between
means) +/- t times the standard error (the standard deviation divided by the square root of the
sample size). The t value is obtained using our desired confidence level and the degrees of
freedom associated with our estimate.
Individual estimates of means can be used to get a feel for the difference between
population means, but they are harder to interpret precisely than using the mean difference
confidence interval.
Please ask your instructor if you have any questions about this material.
When you have finished with this lecture, please respond to Discussion Thread 2 for this
week with your initial response and responses to others over a couple of days before reading the
third lecture for the week.