BUS 308 Week 2 Lecture 1 Examining Differences

BUS 308 Week 2 Lecture 1
Examining Differences - overview
Expected Outcomes
After reading this lecture, the student should be familiar with:
1. The importance of random sampling.
2. The meaning of statistical significance.
3. The basic approach to determining statistical significance.
4. The meaning of the null and alternate hypothesis statements.
5. The hypothesis testing process.
6. The purpose of the F-test and the T-test.
Overview
Last week we collected clues and evidence to help us answer
our case question about
males and females getting equal pay for equal work. As we
looked at the clues presented by the
salary and compa-ratio measures of pay, things got a bit
confusing with results that did not seem to
be consistent. We found, among other things, that the male and
female compa-ratios were fairly
close together with the female mean being slightly larger. The
salary analysis showed a different
view; here we noticed that the averages were apparently quite
different with the males, on
average, earning more. Contradictory findings such as this are
not all that uncommon when
examining data in the “real world.”

One issue that we could not fully address last week was how
meaningful the
differences were. That is, would a different sample have results that
might be completely different, or
can we be fairly sure that the observed differences are real and
show up in the population as
well? This issue, often referred to as sampling error, deals with
the fact that random samples
taken from a population will generally be a bit different than the
actual population parameters,
but will be “close” enough to the actual values to be valuable in
decision making.
This week, our journey takes us to ways to explore differences,
and how significant these
differences are. Just as clues in mysteries are not all equally
useful, not all differences are
equally important; and one of the best things statistics will do
for us is tell us what differences
we should pay attention to and what we can safely ignore.
Side note: this is a skill that many managers could benefit from.
Not all differences in
performance from one period to another are caused by
intentional employee actions; some are
due to random variations that employees have no control over.
Knowing which differences to
react to would make managers much more effective.
In keeping with our detective theme, this week could be
considered the introduction of
the crime scene experts who help detectives interpret what the
physical evidence means and how
it can relate to the crime being looked at. We are getting into
the support being offered by

experts who interpret details. We need to know how to use
these experts to our fullest
advantage.
Differences
In general, differences exist in virtually everything we measure
that is man-made or
influenced. The underlying issue in statistical analysis is that at
times differences are important.
When measuring related or similar things, we have two types of
differences: differences in
consistency and differences in average values. Some examples
of things that should be the
“same” could be:
• The time it takes to drive to work in the morning.
• The quality of parts produced on the same manufacturing line.
• The time it takes to write a 3-page paper in a class.
• The weight of a 10-pound bag of potatoes.
• Etc.
All of these “should” be the same, as each relates to the same
outcome. Yet, they all differ. We
all experience differences in travel time, and the time it takes to
produce the same output on the
job or in school (such as a 3-page paper). Production standards
all recognize that outcomes
should be measured within a range rather than a single point.
For example, few of us would be
upset if a 10-pound bag of potatoes weighed 9.85 pounds or
would think we were getting a great
deal if the bag weighed 10.2 pounds. We realize that it is
virtually impossible for a given

number of potatoes to weigh exactly the same and we accept
this as normal.
One reason for our acceptance is that we know that variation
occurs. Variation is simply
the differences that occur in things that should be “the same.”
If we can measure things with
enough detail, everything we do in life has variation over time.
This includes when we get up in the morning,
how long it takes to get to work, how effective we are at doing
the same thing over and over, etc.
Except for physical constants, we can say that things differ and
we need to recognize this. A side
note: variation exists in virtually everything we study (we have
more than one language, word,
sentence, paragraph, past actions, financial transactions, etc.),
but only in statistics do we bring
this idea front and center for examination.
This suggests that any population that we are interested in will
consist of things that are
slightly different, even if the population contains only one kind of
“thing.” Males are not all the same,
neither are females. Manufactured parts differ in key
measurements; this is the reason we have
quality control checking to make sure the differences are not
too large. So, even if we measure
everything in our population we will have a mean that is
accompanied by a standard deviation
(or range). Managers and professionals need to manage this
variation, whether it is quantitative
(such as salary paid for similar work) or even qualitative (such
as interpersonal interactions with
customers).
The second reason that we are so concerned with differences is

that we rarely have all the
evidence, or all the possible measures of what we are looking
for. Having this would mean we
have access to the entire population (everything we are
interested in); rarely is this the case.
Generally, all decisions, analysis, research, etc. are done with
samples: selected subsets of the
population. And, with any sample we are not going to have all the
information needed, obviously;
but we also know that each sample we take is going to differ a
bit. (Remember, variation is
everywhere, including in the consistency of sample values.) If
you are not sure of this, try
flipping a coin 10 times in each of 10 trials. Do you expect or get the
exact same number of heads for
each trial? Variation!
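If you would rather not flip a real coin 100 times, the experiment can be sketched in a few lines of Python. This is only an illustration; the function name and the seed value are made up for the example, and a different seed will give different counts.

```python
import random

def heads_per_trial(num_trials=10, flips_per_trial=10, seed=None):
    """Return the number of heads seen in each trial of fair coin flips."""
    rng = random.Random(seed)
    return [
        # Each flip is heads with probability 0.5; sum() counts the heads.
        sum(rng.random() < 0.5 for _ in range(flips_per_trial))
        for _ in range(num_trials)
    ]

# 10 trials of 10 flips each; the seed only makes the run repeatable.
counts = heads_per_trial(seed=308)
print(counts)
```

Every trial uses the same fair coin and the same number of flips, yet the head counts will typically not all match. That is sampling variation at work.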
Since we are making decisions using samples, we have even
more variation to consider
than simply that with the population we are looking at. Each
sample will be slightly different
from its population and from others taken from the same
population.
How do we make informed decisions with all this variation and
our not being able to
know the “real” values of the measures we are using? This
question is much like how detectives
develop the “motive” for a crime – do they know exactly how
the guilty party felt/thought when
they say “he was jealous of the success the victim had”? This
could be true, but it is only an
approximation of the true feelings, but it is “close enough” to

say it was the reason. It is similar
with data samples: good ones are “close enough” to use the
results to make decisions with. The
question we have now focuses on how do we know what the
data results show?
The answer lies with statistical tests. They can use the
observed variation to provide
results that let us make decisions with a known chance of being
wrong! Most managers hope to
be right just over 50% of the time; a statistical decision can be
correct 95% or more of the time!
Quite an improvement.
Sampling. The use of samples brings us to a distinction in
summary statistics, between
descriptive and inferential statistics. With one minor exception
(discussed shortly), these two
appear to be the same: means, standard deviations, etc.
However, one very important distinction
exists in how we use these. Descriptive statistics, as we saw
last week, describe a data set. But,
that is all they do. We cannot use them to make claims or
inferences about any other larger
group.
Making inferences or judgements about a larger population is
the role of inferential
statistics and statistical tests. So, what makes descriptive
statistics sound enough to become
inferential statistics? The group they were taken from! If we
have a sample that is randomly
selected from the population (meaning that each member has the
same chance of being selected
at the start), then we have our best chance of having a sample
that accurately reflects the

population, and we can use the statistics developed from that
sample to make inferences back to
the population. (How we develop a randomly selected sample is
more of a research course issue,
and we will not go into these details. You are welcome to
search the web for approaches.)
Random Sampling. If we are not working with a random
sample, then our descriptive
statistics apply only to the group they are developed for. For
example, asking all of our friends
their opinion of Facebook only tells us what our friends feel; we
cannot say that their opinions
reflect all Facebook users, all Facebook users that fall in the
age range of our friends, or any
other group. Our friends are not a randomly selected group of
Facebook users, so they may not
be typical; and, if not typical users, cannot be considered to
reflect the typical users.
If our sample is random, then we know (or strongly suspect) a
few things. First, the
sample is unlikely to contain both the smallest and largest value
that exists in the larger
population, so an estimate of the population variation is likely
to be too small if based on the
sample. This is corrected by using a sample standard deviation
formula rather than a population
formula. We will look at what this means specifically in the
other lectures this week; but Excel
will do this for us easily.
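To see the size of this correction, here is a small Python sketch. The salary values are invented for illustration. The population formula divides the squared deviations by n, while the sample formula divides by n - 1, which makes the sample estimate slightly larger. (Excel's STDEV.P and STDEV.S functions make the same distinction.)

```python
import statistics

# Hypothetical salaries in $K, invented for this illustration.
salaries = [38.0, 41.5, 52.0, 47.2, 35.8, 60.1]

population_sd = statistics.pstdev(salaries)  # divides by n
sample_sd = statistics.stdev(salaries)       # divides by n - 1

print(f"population formula: {population_sd:.2f}")
print(f"sample formula:     {sample_sd:.2f}")
```

The sample formula always gives the larger of the two values, which offsets the tendency of a sample to understate the population's variation.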
Second, we know that our summary statistics are not the same

as the population’s
parameter values. We are dealing with some (generally small)
errors. This is where the new
statistics student often begins to be uncomfortable. How can we
make good judgements if our
information is wrong? This is a reasonable question, and one
that we, as data detectives, need to
be comfortable with.
The first part of the answer falls with the design of the sample,
by selecting the right
sample size (how many are in the sample), we can control the
relative size of the likely error.
For example, we can design a sample where the estimated error
for our average salary is about
plus or minus $1,000. Does knowing that our estimates could
be $1000 off change our view of
the data? If the female average was a thousand dollars more
and the male salary was a thousand
dollars less, would you really change your opinion about them
being different? Probably not
with the difference we see in our salary values (around 38K
versus 52K). If the actual averages
were closer together, this error range might impact our
conclusions, so we could select a sample
with a smaller error range. (Again, the technical details on how
to do this are found in research
courses. For our statistics class, we assume we have the correct
sample.)
Note, this error range is often called the margin of error. We
see this most often in
opinion polls. For example, if a poll said that the percent of
Americans who favored Federal
Government support for victims of natural disasters (hurricanes,
floods, etc.) was 65% with a

margin of error of +/- 3%, we would say that the true proportion
was somewhere between 62% and
68%, clearly a majority of the population. Where the margin of
error becomes important to
know is when results are closer together, such as when support
is 52% in favor versus 48%
opposed, with a margin of error of 3%. This means the actual
support could be as low as 49% or
as high as 55%; meaning the results are generally too close to
make a solid decision that the issue
is supported by a majority, the proverbial “too close to call.”
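The two poll examples above can be checked with a short Python sketch. The function names are invented for the illustration, and the 50% threshold is simply the definition of a majority.

```python
def interval(estimate_pct, margin_pct):
    """Return the (low, high) range implied by an estimate and its margin of error."""
    return (estimate_pct - margin_pct, estimate_pct + margin_pct)

def clear_majority(estimate_pct, margin_pct):
    """True only if even the low end of the range is above 50%."""
    low, _high = interval(estimate_pct, margin_pct)
    return low > 50

print(interval(65, 3), clear_majority(65, 3))  # disaster-support poll
print(interval(52, 3), clear_majority(52, 3))  # the "too close to call" poll
```

In the first poll the whole range sits above 50%, so the majority claim holds. In the second, the range includes values below 50%, so we cannot make that claim.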
The second part of answering the question of how do we make
good decisions introduces
the tools we will be looking at this week, decision making
statistical tests that focus on
examining the size of observed differences to see if they are
“meaningful” or not. The neat part
of these tools is we do not need to know what the sampling
error was, as the techniques will
automatically include this impact into our results!
The statistical tools we will be looking at for the next couple of
weeks all “work” due to a
couple of assumptions about the population. First, the data
needs to be at the interval or ratio
level; the differences between sequential values need to be
constant (such as in temperature or
money). Additionally, the data is assumed to come from a
population that is normally
distributed, the normal curve shape that we briefly looked at
last week. Note that many
statisticians feel that minor deviations from these strict
assumptions will not significantly impact
the outcomes of the tests.

The tools for this week and next use the same basic logic. If we
take a lot of samples
from the population and graph the mean for all of them, we will
get an approximately normal curve (even if the
population is not exactly normal), called the
sampling distribution of the mean.
This makes sense, as we are using sample means. This distribution
has an overall, or grand, mean
equal to that of the population. Its standard deviation equals
the standard deviation of the
population divided by the square root of the sample size. (Let’s
take this on faith for now, trust
me you do not want to see the math behind proving these. But
if you do, I invite you to look it
up on the web.) Now, knowing – in theory – what the mean
values will be from population
samples, we can look at how any given sample differs from
what we think the population mean
is. This difference can be translated into what is essentially a
z-score (although the specific
measure will vary depending upon the test we are using) that we
looked at last week. With this
statistic, we can determine how likely it is (the probability) that we would
get a difference as large or larger
than we have purely by chance (sampling error from the actual
population value) alone.
If we have a small likelihood of getting this large of a
difference, we say that our
difference is too large to have been purely a sampling error, and
we say a real difference exists or
that the mean of the population that the sample came from is not
what we thought.
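Rather than taking the sampling distribution entirely on faith, we can simulate it. The Python sketch below uses a made-up population that is deliberately not normal (uniform values between 20 and 80), draws 2,000 random samples of 25, and compares the mean and spread of the sample means with what the theory predicts. The population, sample size, and seed are all assumptions chosen for illustration.

```python
import math
import random
import statistics

rng = random.Random(308)  # fixed seed so the run is repeatable

# Hypothetical, deliberately non-normal population: 10,000 uniform values.
population = [rng.uniform(20, 80) for _ in range(10_000)]
population_mean = statistics.fmean(population)
population_sd = statistics.pstdev(population)

sample_size = 25
num_samples = 2_000

# Draw many random samples (with replacement) and record each sample mean.
sample_means = [
    statistics.fmean(rng.choices(population, k=sample_size))
    for _ in range(num_samples)
]

grand_mean = statistics.fmean(sample_means)
observed_sd_of_means = statistics.stdev(sample_means)
# Theory: population SD divided by the square root of the sample size.
predicted_sd_of_means = population_sd / math.sqrt(sample_size)

print(f"population mean {population_mean:.2f}, grand mean {grand_mean:.2f}")
print(f"predicted SD of means {predicted_sd_of_means:.2f}, "
      f"observed {observed_sd_of_means:.2f}")
```

The grand mean should land close to the population mean, and the observed spread of the sample means should land close to the predicted value.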

That is the basic logic of statistical testing. Of course, the
actual process is a bit more
structured, but the logic holds: if the probability of getting our
result is small (for example 4% or
0.04), we say the difference is significant. If the probability is
large (for example 37% or 0.37),
then we say there is not enough evidence to say the difference is
anything but a simple sampling
error difference from the actual population result.
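As a rough sketch of this logic, the Python below turns a difference into a z-score and then into a probability. It assumes the population standard deviation is known, which makes it a simple z-test rather than the t-test and F-test we will actually use. The compa-ratio numbers are invented for the illustration.

```python
import math
from statistics import NormalDist

def two_tail_p_value(sample_mean, hypothesized_mean, population_sd, n):
    """Return (z, p): the z-score of the sample mean and its two-tail p-value."""
    standard_error = population_sd / math.sqrt(n)
    z = (sample_mean - hypothesized_mean) / standard_error
    # Probability of a difference this large or larger, in either direction.
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

# Hypothetical: sample of 25 with mean 1.05, tested against a claimed mean of 1.0.
z, p = two_tail_p_value(sample_mean=1.05, hypothesized_mean=1.0,
                        population_sd=0.1, n=25)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p says a difference this large would rarely arise from sampling error alone. A large p says it easily could.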
The tools we will be adding to our bag of tricks this week will
allow us to examine
differences between data sets. One set of tools, called the t-
test, looks at means to see if the
observed difference is significant or merely a chance difference
due mostly to sampling error
rather than a true difference in the population. Knowing if
means differ is a critical issue in
examining groups and making decisions.
The other tool, the F-test for variance, does the same for the
data variation between
groups. Often ignored, the consistency within groups is an
important characteristic in
understanding whether groups having similar means can be said
to be similar or not. For
example, if a group of English majors all took two classes
together, one math and one English,
would you expect the grade distributions to be similar, or would
you expect one to show a larger
range (or variation) than the other?
We will see throughout the class that consistency and
differences are key elements to
understanding what the data is hiding from us, or trying to tell
us – depending on how you look

at it. In either case, as detectives our job is to ferret out the
information we need to answer our
questions.
Hypothesis Testing-Are Differences Meaningful
Here is where the crime scene experts come in. Detectives have
found something but are
not completely sure of how to interpret it. Now the training and
tools used by detectives and
analysts take over to examine what is found and make some
interpretations. The process or
standard approach that we will use is called the hypothesis
testing procedure. It consists of six
steps; the first four (4) set up the problem and how we will
make our decisions (and are done
before we do anything with the actual data), the fifth step
involves the analysis (done with
Excel), and the final and sixth step focuses on interpreting the
result.
The hypothesis testing procedure is a standardized decision-
making process that ensures
our decisions (on whether things are significantly
different or not) are based on the data,
and not some other factors. Many times, our results are more
conservative than individual
managerial judgements; that is, a statistical decision will call
fewer things significantly different
than many managerial judgement calls. This statistical
tendency is, at times, frustrating for
managers who want to show that things have changed. At other
times, it is a benefit such as if

we are hoping that things, such as error rates, have not changed.
While a lot of statistical texts have slightly different versions of
the hypothesis testing
procedure (fewer or more steps), they are essentially the same,
and are a spinoff of the scientific
method. For this class, we will use the following six steps:
1. State the null and alternate hypothesis
2. Select a level of significance
3. Identify the statistical test to use
4. State the decision rule. Steps 1 – 4 are done before we
examine the data
5. Perform the analysis
6. Interpret the result.
Step 1
A hypothesis is a claim about an outcome. It comes in two
forms. The first is the null
hypothesis – sometimes called the testable hypothesis, as it is
the claim we perform all of our
statistical tests on. It is termed the “Null” hypothesis, shown as
Ho, as it basically says “no
difference exists.” Even if we want to test for a difference,
such as males and females having a
different average compa-ratio; in statistics, we test to see if
they do not.
Why? It is easier to show that something differs from a fixed
point than it is to show that
the difference is meaningful – I mean how can we focus on
“different?” What does “different”
mean? So, we go with testing no difference. The key rule
about developing a null hypothesis is
that it always contains an equal claim, this could be equal (=),

equal to or less than (<=), or equal
to or more than (=>).
Here are some examples:
Ex 1: Question: Is the female compa-ratio mean = 1.0?
Ho: Female compa-ratio mean = 1.0.
Ex 2: Q: is the female compa-ratio mean = the male compa-
ratio mean?
Ho: Female compa-ratio mean = Male compa-ratio mean.
Ex. 3: Q: Is the female compa-ratio more than the male compa-
ratio? Note that this
question does not contain an equal condition. In this case, the
null is the opposite of what
the question asks:
Ho: Female compa-ratio <= Male compa-ratio.
We can see by testing this null, we can answer our initial
question of a directional
difference. This logic is key to developing the correct test
claim.
A null hypothesis is always coupled with an alternate
hypothesis. The alternate is the
opposite claim as the null. The alternate hypothesis is shown as
Ha. Between the two claims, all
possible outcomes must be covered. So, for our three examples,
the complete step 1 (state the
null and alternate hypothesis statements) would look like:

Ex 1: Question: Is the female compa-ratio mean = 1.0?
Ho: Female compa-ratio mean = 1.0.
Ha: Female compa-ratio mean =/= (not equal to) 1.0
Ex 2: Q: is the female compa-ratio mean = the male compa-
ratio mean?
Ho: Female compa-ratio mean = Male compa-ratio mean.
Ha: Female compa-ratio mean =/= Male compa-ratio mean.
Ex. 3: Q: Is the female compa-ratio more than the male compa-
ratio?
Ho: Female compa-ratio <= Male compa-ratio
Ha: Female compa-ratio > Male compa-ratio. (Note that in this
case, the alternate
hypothesis is the question being asked, but the null is what we
always use as the
test hypothesis.)
When developing the null and alternate hypothesis,
1. Look at the question being asked.
2. If the wording implies an equality could exist (equal to, at
least, no more than, etc.),
we have a null hypothesis and we write it exactly as the
question asks.
3. If the wording does not suggest an equality (less than, more
than, etc.), it refers to the

alternate hypothesis. Write the alternate first.
4. Then, for whichever hypothesis statement you wrote, develop
the other to contain all
of the other cases. An = null should have a =/= alternate, an =>
null should have a <
alternate; a <= null should have a > alternate, and vice versa.
5. The order the variables are listed in each hypothesis must be
the same, if we list
males first in the null, we need to list males first in the
alternate. This minimizes
confusion in interpreting results.
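The pairing rules in point 4 are mechanical enough to write down as a lookup table. The Python sketch below uses this lecture's own notation for the operators (=/=, <=, =>); the function name is invented for the illustration.

```python
# Each null operator and the alternate operator that covers all other cases.
ALTERNATE_FOR_NULL = {"=": "=/=", "<=": ">", "=>": "<"}

def hypothesis_pair(first, null_operator, second):
    """Build matching Ho and Ha statements, keeping the variables in the same order."""
    alternate_operator = ALTERNATE_FOR_NULL[null_operator]
    return (
        f"Ho: {first} {null_operator} {second}",
        f"Ha: {first} {alternate_operator} {second}",
    )

for statement in hypothesis_pair("Female compa-ratio mean", "<=",
                                 "Male compa-ratio mean"):
    print(statement)
```

Note that the null always carries the equal sign, and the variable order is the same in both statements, as point 5 requires.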
Note: the hypothesis statements are claims about the population
parameters/values, which we
then test using the sample results. So, when we develop our hypothesis
statements, we do not consider the
sample values. For
example, consider our desire to
determine if the compa-ratio and salary means for males and
females are different in the
population, based on our sample results. While the compa-ratio
means seemed fairly close
together, the salary means seemed to differ by quite a bit; in
both cases, we would test if the male
and female means were equal since that is the question we have
about the values in the
population.
If you look at the examples, you can notice two distinct kinds of
null hypothesis
statements. One has only an equal sign in it, while the other

contains an equal sign and an
inequality sign (<=, but it could be =>). These two types
correspond to two different research
questions and test results.
If we are only interested in whether something is equal or not,
such as if the male average
salary equals the female average salary; we do not really care
which is greater, just if they could
be the same in the population or not. For our equal salary
question, it is not important if we find
that the male’s mean is > (greater than) the female’s mean or if
the male’s mean is < (less than)
the female’s mean; we only care about a difference existing or
not in the population. This, by the
way, is considered a two-tail test (more on this later), as either
condition would cause us to say
the null’s claim of equality is wrong: a result of “rejecting the
null hypothesis.”
The other condition we might be interested in, and we need a
reason to select this
approach, occurs when we want to specifically know if one
mean exceeds the other. In this
situation, we care about the direction of the difference. For
example, we might ask only whether the male mean is
greater than the female mean, or only whether the male mean is less than the
female mean. This is considered a one-tail test.
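The difference between the two kinds of question shows up in how the probability is counted. The Python sketch below uses the normal curve as a stand-in for whatever test statistic is in play, and the z value of 1.8 is invented for the illustration.

```python
from statistics import NormalDist

def tail_p_values(z):
    """Return the p-values for the same z-score under three alternate hypotheses."""
    normal = NormalDist()
    return {
        "upper": 1 - normal.cdf(z),           # Ha: first mean > second mean
        "lower": normal.cdf(z),               # Ha: first mean < second mean
        "two": 2 * (1 - normal.cdf(abs(z))),  # Ha: means are not equal
    }

p = tail_p_values(1.8)  # hypothetical test statistic
for name, value in p.items():
    print(f"{name:>5}: {value:.4f}")
```

With the same data, a directional question in the observed direction gives half the probability of the equal/not-equal question, which is why we need a reason, stated in advance, for choosing the directional approach.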
Step 2
The level of significance is another concept that is critical in
statistics but is often not
used in typical business decisions. One senior manager told the
author that their role was to
ensure that the “boss’ decisions were right 50% +1 of the time

rather than 50% -1.” This
suggests that the level of confidence that the right decisions are
being made is around 50%. In
statistics, this would be completely unacceptable.
A typical statistical test has a level of confidence that the
right decision is being made of
about 95%, with a typical range from 90 to 99%. This is done
with our chosen level of
significance. For this class, we will always use the most
common level of 5%, or more
technically alpha = 0.05. This means we will live with a 5%
chance of saying a difference is
significant when it is not and we really have only a chance
sampling error.
Remember, no decision that does not involve all the possible
information that can be
collected will ever have a zero possibility of being wrong. So,
saying we are 95% sure we made
the right call is great. Marketing studies often will use an alpha
of .10, meaning that they are 90%
sure when they say the marketing campaign worked. Medical
studies will often use an alpha of
0.01 or even 0.001, meaning they are 99% or even 99.9% sure
that the difference is real and not
a chance sampling error.
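One way to see what alpha = 0.05 means is to simulate many studies in which the null hypothesis really is true and count how often we would wrongly call the result significant. The Python sketch below does this with a simple z-test. The population mean, standard deviation, sample size, and seed are all assumptions made for the illustration.

```python
import math
import random
from statistics import NormalDist, fmean

ALPHA = 0.05
TRUE_MEAN = 1.0        # hypothetical population compa-ratio mean
TRUE_SD = 0.1          # hypothetical, assumed known
SAMPLE_SIZE = 30
NUM_EXPERIMENTS = 4_000

def two_tail_p_value(sample, hypothesized_mean, population_sd):
    """Two-tail p-value for a z-test of the sample mean."""
    standard_error = population_sd / math.sqrt(len(sample))
    z = (fmean(sample) - hypothesized_mean) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))

rng = random.Random(308)  # fixed seed so the run is repeatable
false_alarms = 0
for _ in range(NUM_EXPERIMENTS):
    # The null is true here: every sample comes from a population with TRUE_MEAN.
    sample = [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(SAMPLE_SIZE)]
    if two_tail_p_value(sample, TRUE_MEAN, TRUE_SD) < ALPHA:
        false_alarms += 1

false_alarm_rate = false_alarms / NUM_EXPERIMENTS
print(f"false alarm rate: {false_alarm_rate:.3f}")
```

The false alarm rate should come out close to 0.05: that is the chance of error we agreed to live with when we picked alpha.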
Step 3
Choosing the statistical test and test statistic depends upon the
data we have and the
question we are asking. For this week, we will be using compa-
ratio data in the examples and

salary data in the homework – both are continuous and at least
interval level data. The questions
we will look at this week will focus on seeing if there is a
difference in the average pay (as
measured by either the compa-ratio or salary) between males
and females in the population,
based on our sample results. After all, if we cannot find a
difference in our sample, should we
even be working on the question?
In the quality improvement world, one of the strategies for
looking for and improving
performance of a process is to first look at and reduce the
variation in the data. If the data has a
lot of variation, we cannot really trust the mean to be very
reflective of the entire data set.
Our first statistical test is called the F-test. It is used when we
have at least interval level
data and we are interested in determining if the variances of two
groups are significantly
different or if the observed difference is merely chance
sampling error. The test statistic for this
is the F.
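The F statistic itself is just the ratio of the two sample variances. The Python sketch below computes it for two invented groups of compa-ratios. It stops at the statistic; the p-value comes from the F distribution, which we will leave to Excel.

```python
import statistics

def f_statistic(group_a, group_b):
    """Ratio of the sample variance of group_a to that of group_b."""
    return statistics.variance(group_a) / statistics.variance(group_b)

# Hypothetical compa-ratios, invented for this illustration.
males = [0.95, 1.02, 1.10, 0.98, 1.05, 1.12]
females = [1.01, 1.04, 1.06, 1.03, 1.08, 1.02]

print(f"F = {f_statistic(males, females):.2f}")
```

An F near 1 suggests the two groups have similar variation. The further F is from 1, the less likely the difference is due to sampling error alone.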
Once we know if the variances are the same or not, we can
move to looking for
differences between the group means. This is done with the T-
test and the t-statistic. Details on
these two tests will be given later; for now, we just need to
know what we are looking at and
what we will be using.
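As a preview, here is a Python sketch of the t-statistic for the difference between two group means, in two common forms: one that pools the variances (used when they can be treated as equal) and one that keeps them separate (used when they cannot). The function names are invented for the illustration, and turning t into a p-value is again left to Excel.

```python
import math
import statistics

def t_equal_variances(group_a, group_b):
    """t-statistic using a pooled variance (assumes equal population variances)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_variance = (
        (n_a - 1) * statistics.variance(group_a)
        + (n_b - 1) * statistics.variance(group_b)
    ) / (n_a + n_b - 2)
    standard_error = math.sqrt(pooled_variance * (1 / n_a + 1 / n_b))
    return (statistics.fmean(group_a) - statistics.fmean(group_b)) / standard_error

def t_unequal_variances(group_a, group_b):
    """t-statistic that keeps each group's variance separate."""
    standard_error = math.sqrt(
        statistics.variance(group_a) / len(group_a)
        + statistics.variance(group_b) / len(group_b)
    )
    return (statistics.fmean(group_a) - statistics.fmean(group_b)) / standard_error

# Hypothetical data, invented for this illustration.
group_a = [1, 2, 3, 4, 5]
group_b = [2, 4, 6, 8, 10]
print(f"pooled:   t = {t_equal_variances(group_a, group_b):.3f}")
print(f"separate: t = {t_unequal_variances(group_a, group_b):.3f}")
```

This is why the F-test comes first: its result tells us which form of the t-test is appropriate. When the two groups are the same size, as here, the two forms give the same t value, although they can still differ in the degrees of freedom used to get the p-value.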
Step 4
One of the rules in researching questions is that the decision

rule, how we are going to
make our decision once the analysis is done, should be stated
upfront and, technically, even
before we get to the data. This helps ensure that our
decision is data driven rather than
being made by emotional factors to get the outcome we want
rather than the outcome that fits the
data. (Much like making our detectives go after the suspect that
did the crime rather than the one
they do not like and want to arrest, at least when they are being
honest detectives.)
The decision rule for our class is very simple, and will always
be the same:
Reject the null hypothesis if the p-value is less than our alpha
of .05. (Note: this would
be the same as saying that if the p-value is not less than 0.05,
we would fail to reject the null
hypothesis.)
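Written as code, the decision rule is a single comparison. The Python sketch below follows the rule exactly as stated above (reject when the p-value is less than alpha); the function name is invented for the illustration.

```python
ALPHA = 0.05  # the level of significance chosen in step 2

def decide(p_value, alpha=ALPHA):
    """Apply the decision rule: reject the null only when p is less than alpha."""
    if p_value < alpha:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"

print(decide(0.04))  # a small probability
print(decide(0.37))  # a large probability
```

Because the rule is fixed before we see the data, the same p-value always leads to the same decision, whatever outcome we were hoping for.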
We introduced the p-value last week; it is the probability of our
outcome being as large or
larger than we have by pure chance alone, assuming the null hypothesis is true. The further from the
actual mean a sample mean is,
the less chance we have of getting a value that differs from the
mean that much or more; the
closer to the actual mean, the greater our chance would be of
getting that difference or more
purely by sampling error.
Our decision rule ties our criteria for significance of the
outcome, the step 2 choice of
alpha, with the results that the statistical tests will provide (and,
the Excel tests will give us the p-
values for us to use in making the decisions).

These four steps define our analysis, and are done before we do
any analysis of the data.
Step 5
Once we know how we will analyze and interpret the results, it
is time to get our sample
data and set it up for input into an Excel statistical function.
Some examples of how this data
input works will be discussed in the third lecture for this week.
This step is fairly easy: simply identify the statistical test we
want to use. The test to use
is based on our question and the related hypothesis claims. For
this week, if we are looking at
variance equality, we will use the F-test. If we are looking at
mean equality, we will use the T-
test.
Step 6
Here is where we bring everything together and interpret the
outcomes.
What is constant about this step is the need to:
1. Look at the appropriate p-value (indicated in the test outputs,
as we will see in lecture
2).
2. Compare the p-value with our value for alpha (0.05).
3. Make a decision: if the test p-value is less than or equal to
(<=) 0.05, we will reject

the null hypothesis. If the test p-value is more than (>) 0.05,
we will fail to reject
the null hypothesis.
Rejecting the null hypothesis means that we feel the alternate
hypothesis is the more
accurate statement about the populations we are testing. This is
the same for all of our statistical
tests.
Once we have made our decision to reject or fail to reject the
null hypothesis, we need to
close the loop, and go back and answer our original question.
We need to take the statistical
result of rejecting or failing to reject the null and turn it into an
“English” answer to the question.
Doing so depends on how the original question led to the
hypothesis statements. Examples of
this follow in Lecture 2.
Lectures 2 and 3 will show how to use this process in
conjunction with Excel and the F
and T tests. For now, focus on the logic of setting up the
testing instructions.
Summary
This week we begin our journey discovering ways to make
decisions on data, and more
specifically differences in data sets, based on generally agreed
upon approaches rather than by
“guess and by golly.” The process is called hypothesis testing
and is part of the scientific
method of research and decision making.

In this approach we always test a claim of no difference (the
null hypothesis) whether or
not we suspect or desire to see an actual difference. The
null hypothesis is paired with an
alternate hypothesis that is exactly the opposite claim.
Decisions are made based on a p-value
which is the probability that we would see a difference as large
as or larger than the one we got if the null
hypothesis is true. Small p-values mean we reject the null as
not being an accurate description of
the population we are looking at.
The hypothesis testing process (or procedure) has six steps.
The first four are completed
before we look at the data; the fifth step is the actual
calculation of the statistical test and the
final and sixth step is where the analysis of the results is done.
The steps are:
1. State the null and alternate hypothesis
2. Select a level of significance
3. Identify the statistical test to use
4. State the decision rule
5. Perform the analysis
6. Interpret the result
If you have any questions on this material, please ask your
instructor.
After finishing with this lecture, please go to the first
discussion for the week and engage
in a discussion with others in the class over the first couple of

days before reading the second
lecture.
Biases
Stereotype
A single attitude or belief about an entire group of people.
Oversimplification.
Not inherently negative.
Prejudice

A set of attitudes or beliefs that devalues an entire group of
people.
Attitudes, NOT behaviors.
Inherently negative.
Project Implicit Social Attitudes (implicit bias tests)
https://implicit.harvard.edu/implicit/
Discrimination
Behavior or set of behaviors that hurts an entire group of people
Oppression

A system of forces that presses down on an entire group of
people, preventing them from living the good life.
Privilege
A system of forces that lifts up an entire group of people,
helping them live the good life.
Paradox of Privilege

Privilege is assigned by others
Can be privileged without feeling privileged
Privilege doesn’t make you happy
