BUS 308 Week 2 Lecture 3
Setting up the F and T tests in Excel
After reading this lecture, the student should know:
1. How to set up data lists for the F and T tests.
2. How to set up and conduct the F test (both options) produced by Excel
3. How to set up and conduct the T-test produced by Excel
Overview
One of the nice characteristics of Excel is that setting up and running most functions and
tests is done in a very similar fashion, only having specific test related differences showing up in
the different functions and tests.
This lecture will cover setting up data ranges that will be used for all of our statistical
functions. It will then move into setting up the F and T tests specifically.
Setting up Data
While in the hypothesis testing procedure it was said to set up steps 1 – 4 before even
looking at the data, we can set up the data columns to be used at any time. The set-up is simple
and straightforward. But, we have a couple of questions to answer before we set things up.
Since this week needs us to compare male and female outcomes (and Degree outcomes in
Question 3), we need to decide how we want our data to look. Sticking strictly with the gender
related data (you can do similar things with the degree data when ready), we need to decide if we
want our key data (compa-ratios, salary, etc.) to be in a long column or in two columns. An
example of both is shown in the screen shot below.
Notice that Column S contains all of the compa-ratio values (all 50 if we could see the
entire range) and that they are grouped by gender, with the first 25 rows being female values and
the last 25 rows being male values. The other way to display the data values is to have them
listed in separate columns, such as shown in columns Q and R – each having a label heading.
Start by looking at what variables the questions are asking for. For week 2, we have
Questions 1 and 2 asking for the same variables – compa-ratio and gender1, so we can use the
same location for both questions. Question 3 asks for a different set of variables, compa-ratio
and degree, so we should set up a different area for that question. Remember, it is best to
NEVER sort the data on the data tab. An error in sorting that missed a column could mess up the
data set and make it unusable for other problems.
In either case, copy the entire data column of interest (for example, compa-ratio,
Gender1, Degree, etc.) from the Data Tab to the weekly worksheet. Highlight the entire data
range of interest including the label in row 1, then press Control + C at the same time. Go over
to the weekly worksheet and find a column to the right of the work area (generally columns Q or
higher will be OK) and press Control + V at the same time. Repeat this for all the variables you
need.
After pasting the variables, use the Sort function in the Data tab to arrange them in
whatever order you want. You can do multiple sorts at the same time with this function – for
example, you can sort the compa-ratios by gender1 first (to
group all male and female values
together) and then within each gender group sort the values
from high to low by adding a second
sort row.
If you would like, you can then create new columns of data by
copying and pasting
sections of the data range – for example, creating Male and
Female columns. The advantage to
this approach is that you can include the labels in the data entry
boxes and have the variable
labels included in the output tables as the examples showed in
Lecture 2.
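For readers who think in code, the same sort-then-split idea can be sketched outside Excel. The Python lines below are only an illustration: the records are made-up values, and the names records, working, and columns are assumptions, not part of the class workbook.

```python
# Hypothetical long-format records: (gender label, compa-ratio).
records = [
    ("F", 1.02), ("M", 1.10), ("F", 0.95),
    ("M", 0.98), ("F", 1.07), ("M", 1.04),
]

# Work on a sorted copy so the original list (the "Data tab") is never reordered.
# Sort by gender first, then from high to low within each gender.
working = sorted(records, key=lambda r: (r[0], -r[1]))

# Split the single long column into two labeled columns.
columns = {"Female": [], "Male": []}
for gender, value in working:
    columns["Female" if gender == "F" else "Male"].append(value)

print(columns["Female"])  # [1.07, 1.02, 0.95]
print(columns["Male"])    # [1.1, 1.04, 0.98]
```

Sorting a copy rather than the source mirrors the advice above about never sorting the data tab itself.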
The F-Test Set-up
In each question asking for an analysis of data using the
hypothesis testing process, step 5
requires that you place the results of a statistical test in a
certain cell. This is mostly for the
convenience of the instructors reviewing your work, but deciding
where to put the output is
required for every test you run.
The following shows the setting up of the hypothesis testing
steps and conducting of the
F-test to answer our question about the equality of male and
female compa-ratio variance. (Note:
again, you will perform these steps for salary variance in your
homework.)
Before even getting to the test itself, we have a couple of
questions to answer. Part a of
question 1 asks where the data range is for this question. We
always need to know where the
data is that we are using for tests, even if – as is true in this
case – the data is on the same
worksheet. So, list where the variables are located, such as in the
range S1:T51 or Q1:Q26 as in the
examples above. Either would be an appropriate entry for the
data shown. One reason for this
question is to allow instructors to see if a data copy or sorting
error occurred if the data results
are not correct.
The second question simply asks for you to decide if a one- or
two-tail test is required for
the question being asked. This is to help prepare you for the
actual hypothesis testing steps.
Now, the set-up concerns move to Step 5: Conduct the test.
Note that a cell location is
given for you to place your outputs. In most cases, the tests we
want to perform are located in
the Analysis ToolPak option found in the Analysis tab on the far
right of the Data Ribbon. Left
Click on the Data label on the green ribbon at the top of an
Excel page, then click on the
Analysis Tab or on the Data Analysis tool listed. Once the Data
Analysis list is shown, scroll
down to your desired tool.
Below is a screen shot of locating the F-test Two-Sample for
Variance in the Data Analysis list.
The F.TEST option for question 1 is found in the Fx (or
Formulas) Statistical list. Here
is a screenshot of where the F.Test is found in the fx Statistical
list.
Either test can be used for this question. After highlighting the
desired test, just select
OK at the bottom and a data entry box will open. Both are
somewhat similar, so only the F-Test
Two Sample for Variances data entry will be shown below.
Here is a screenshot of the data entry box for the F-Test Two-
Sample for Variance. Note
that the compa-ratios have been copied over to columns headed
by labels of Male and Female.
This lets our test results show the label for each group. Also
note, that for this screenshot, the
results are placed next to the data columns (AA2), while in your
assignment K10 should be listed
in the Output Range box.
Note, always enter the variables in the order listed in the null
hypothesis statement; since
the male values were entered in the Variable 1 range, the
hypothesis statements should list the
male variable first. This makes interpreting the test results
easier.
Entering cell values into any box is fairly simple. You can
simply type the data range
into the box, using a : between the starting and ending cells.
You can place the cursor in a box,
left click, and then move the cursor to the top cell in the data
range (include labels if present),
hold down the left button and drag the cursor to the end of the
data range and release the left
button. Or, you can click on the symbol at the right which opens
a box, then enter the data by
either technique just mentioned and click on the icon at the
right.
After entering the data ranges, click on the Labels box if, and
only if, you have included
labels in the data input range. An alpha of 0.05 is automatically
selected but can be changed
simply by entering another value. Finally, go to the Output
Options and click on the desired
location – for this class use Output Range and then enter the
cell location into the box. Click on
Ok and you are done.
The process is pretty straightforward, but once in a while an
error occurs. The most
common is when someone does not include labels in the input
range but checks the labels box.
This is fairly easy to spot – the data tables will have a data
value listed as a label, and – at least
for the questions this week – will show a data count of 24 rather
than the correct count of 25 per
group. If this occurs, simply go back and reenter the data with
the labels. Excel will tell you that
you are about to overwrite existing data, and that is what you
want to do, so check OK.
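The effect of this mistake can be imitated in a few lines of Python. This is only a sketch of the behavior described above, not Excel's internal logic; the function read_range and the 25 values are made up for the illustration.

```python
def read_range(cells, has_labels):
    """Mimic reading an input range: when has_labels is True the first
    cell is treated as the column header rather than as data."""
    if has_labels:
        return cells[0], list(cells[1:])
    return None, list(cells)

# 25 made-up data values with NO header cell at the top.
values_only = [1.0 + i / 100 for i in range(25)]

# The mistake: the Labels box is checked although no label was selected.
label, data = read_range(values_only, has_labels=True)
print(label, len(data))   # a data value appears as the label; the count is 24

# The correct setting for this range keeps all 25 values.
_, full_data = read_range(values_only, has_labels=False)
print(len(full_data))
```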
The F.Test is even simpler to set up. Going to Fx (or
Formulas), statistical list, and
selecting the F.Test will produce a data entry box that simply
asks for each data range – as with
the top entries in the F-Test shown above. Complete them in
the same way and select Ok. (Do
not include labels in these ranges.) The F.Test outcome shows
up in the cell your cursor was on
when you opened the Fx link.
VIDEO Link: Here is a video on the F-Test Two Sample for
Variances: https://screencast-o-matic.com/watch/cbQuFRIwDX
The T-Test Set-up
There are three versions of the T-Test done for us by Excel.
The first two are similar
except one version is done if the variances are equal and the
other if the variances are not equal.
(Now we see an important reason for performing the F-test
first.)
The third version of the T-test is for paired data, and is called
T-test Paired Two Sample
for Means. Paired data are two measures taken on the same
subject. Examples include a math
and English test score for each student, preference scores for
different drinks, and, in our data set
the salary and midpoint values. Note that paired data must be
measured in the same units, and be
from the same subjects. Students in the past have incorrectly
used the paired t-test on male and
female salaries. These are not paired, as the measures are taken
on different people and cannot
be paired together for analysis.
In many ways, setting up Excel’s T-tests, and virtually all the
functions we will study,
follows the same steps as we just went through:
1. Set up the data into distinct groups.
2. Select the test function from either the Fx or Analysis list
3. Input the data ranges and output ranges into the appropriate
entry boxes, checking
Labels if appropriate.
4. Click on OK to produce the output.
As with the F-test, the T-test has a couple of options depending
upon what you want your
output to look like. The Fx (or Formulas) option returns simply
the p-value for the selected
version of the test. The Data | Analysis selection provides
descriptive statistics that are useful for
additional analysis (some of which we will discuss later in the
course).
The t-test requires that we select between three versions, one
assuming equal variances
between the populations, one assuming unequal variances in the
populations, and one requiring
paired data (two measures on each element in the sample, such
as salary and midpoint for each
person in our data set.) All have the same data set-up approach,
so only one will be shown.
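The choice among the three versions can be written out as a small decision rule. The Python sketch below is an illustration, not an Excel feature: the function name choose_t_test and its arguments are assumptions, while the returned strings are the tool names as they appear in the Data Analysis list. It follows the rule used in this lecture, which is to use the unequal variance version when the F-test rejects equal variances.

```python
def choose_t_test(paired, f_test_p_value=None, alpha=0.05):
    """Pick the Data Analysis t-test that matches the data.

    paired          -- True when both measures come from the same subjects
    f_test_p_value  -- two-tail p-value from the F-test on the two groups
    alpha           -- decision criterion (0.05 in this class)
    """
    if paired:
        return "t-Test: Paired Two Sample for Means"
    if f_test_p_value is None:
        raise ValueError("Run the F-test first for unpaired groups.")
    if f_test_p_value < alpha:
        # Equal variances rejected.
        return "t-Test: Two-Sample Assuming Unequal Variances"
    return "t-Test: Two-Sample Assuming Equal Variances"

# Salary vs. midpoint for the same employees is paired data.
print(choose_t_test(paired=True))
# Male vs. female groups with a hypothetical F-test p-value of 0.40.
print(choose_t_test(paired=False, f_test_p_value=0.40))
```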
Setting up the data and test for question 2 about mean equality
is similar to the one for
the F-test question, and we can actually use the same data
columns as we used in question 1 on
variances. Again, after sorting the data into your comparison
groups (with labels as we did for
the F-test), select the appropriate test from either the Fx or
Analysis list. A completed T-test
Two-Sample Assuming Equal Variances input table is shown
below.
The input box looks a lot like the one we saw for the F-test, and
is completed in the same way.
Enter the data ranges in the same order you have them listed in
the hypothesis statements, check
the labels box if appropriate, and identify your output range top
left cell (this is given in the
homework problems for a consistent format for instructor
grading).
There is one input that differs and which we have not yet
discussed, Hypothesized Mean
Difference. For the most part, we do not use this. An example
of when we might want to is
when we have made a change and want to test its effectiveness.
For example, we might have a
pre- and post-test in a training course. In the original design,
the average improvement might be
10 points on the post-test. If we change the design of the
training, we would be interested not
only in showing a significant change between the two tests but
also a better change due to the
revision. In this case, the first 10-point difference in the tests
is a given, we want to know if the
additional score change is significant. So, we enter 10 in the
HMD box, and the analysis looks at
only the mean difference larger than 10, the marginal
improvement due to the design change.
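In formula terms, the hypothesized mean difference is simply subtracted from the observed difference before the t-value is formed. A minimal Python sketch of this idea follows, assuming the equal variance (pooled) form of the test; the function name pooled_t and the scores are invented for the illustration.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(sample1, sample2, hypothesized_diff=0.0):
    """t-value for (mean1 - mean2 - hypothesized_diff), pooled variance form."""
    n1, n2 = len(sample1), len(sample2)
    pooled_var = ((n1 - 1) * variance(sample1)
                  + (n2 - 1) * variance(sample2)) / (n1 + n2 - 2)
    std_err = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(sample1) - mean(sample2) - hypothesized_diff) / std_err

# Made-up scores: post-test and pre-test results, differing by 13 points on average.
post = [82, 85, 88, 90, 95]
pre = [70, 72, 75, 78, 80]

t_no_hmd = pooled_t(post, pre)          # tests the whole 13-point difference
t_hmd_10 = pooled_t(post, pre, 10)      # tests only the 3 points beyond 10
print(t_no_hmd, t_hmd_10)
```

With 10 entered as the hypothesized difference, only the extra 3 points remain in the numerator, so the t-value is much smaller.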
The input for the Fx T.Test contains 4 boxes, and produces the
p-value in the cell the
cursor is in. The first two boxes are the data range for each
variable, and these should not have a
label included. The third box asks whether you have a one or
two tail test. The fourth box asks
for the kind of test, paired, equal variance, or unequal variance.
Once we click OK for the T.Test, we get a single output, the p-value.
When we click OK on the
Analysis ToolPak function we get a more descriptive table,
much like the differences between the
two versions of the F-test.
There is no difference in setting up a Data Analysis test for a
one- or two-tail outcome;
these results are examined in the output, not in the input
screens.
Question 3
The only data entry difference for this question is the need to
copy, paste, and sort the
degree and gender1 variable columns. The rest of the set-up is
exactly the same as done for
either question 1 or question 2.
Special Case: The One-Sample T-test
Often, we may want to test the results of a sample against a
standard; for example, is the
weight of a production run of 8 ounces of canned pears actually
equal to the standard of 8.02 oz.?
(Note, most manufacturers will put in slightly more than the
label says to avoid being
underweight which could result in a fine.)
Excel is not set up to perform this test, but we can “trick” it to
do this for us. In the one-
sample case, we need two pieces of information, the sample
values and our comparison standard.
Set these up as if they were any two-sample data sets, have our
sample values (for example, 25
female compa-ratios in one column) and our comparison value
in another. The comparison data
column will only contain a single value equal to our comparison
value. For example, we might
want to test if the average female compa-ratio was greater than
the compa-ratio midpoint of 1.00.
The null would be H0: female compa-ratio mean <= 1.00 while
the alternate would be Ha:
Female compa-ratio mean > 1.00. The Compa-ratio data column
would contain the Female
compa-ratios and the other column (named for convenience as
Ho Data) would contain only the
value of 1.00, our standard value.
While we will leave the math for any interested student to
perform, if we take the T-test
unequal variance formulas for both the t-value and the df value
and have a variance of 0 for one
variable, both will reduce to the one-sample t-test formula and
df value. Knowing this, we can
use the unequal variance version of the t-test to perform what is
essentially a one-sample test for
us.
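For any student who would rather check this numerically than algebraically, the reduction can be sketched in Python. The code below is an illustration under stated assumptions: the sample values are made up, the function name welch_t_and_df is hypothetical, and the comparison column is modeled with a count of 2 and a variance of 0 so that its n - 1 term is defined.

```python
from math import sqrt
from statistics import mean, variance

def welch_t_and_df(mean1, var1, n1, mean2, var2, n2):
    """Unequal variance t-value and degrees of freedom from summary values."""
    a, b = var1 / n1, var2 / n2
    t = (mean1 - mean2) / sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df

sample = [1.05, 0.98, 1.12, 1.01, 1.07, 0.96, 1.10, 1.03]  # made-up compa-ratios
standard = 1.00                                            # comparison value
n = len(sample)

# Unequal variance test with a zero-variance comparison column (assumed count of 2).
t_welch, df_welch = welch_t_and_df(mean(sample), variance(sample), n,
                                   standard, 0.0, 2)

# Ordinary one-sample t-value and df for comparison.
t_one = (mean(sample) - standard) / sqrt(variance(sample) / n)
df_one = n - 1

print(t_welch, t_one)
print(df_welch, df_one)
```

Both pairs of values agree, which is the reduction described above.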
The output of this test will show a mean of 1.0 and a variance of
0 for the Ho Data
(comparison) value, and the correct values for the Female
compa-ratio variable, including the p-
values.
Here is a video on setting up and using the t-test in Excel:
https://screencast-o-matic.com/watch/cb6lYcImnn
Summary
Conducting an F or t test is fairly straightforward: set up the
data, select the appropriate
test from the Analysis Toolpak or Fx/Formulas list, enter the
data into the set-up box, and
identify the cell you want the result placed in.
Setting up the data for either test is the same. Label two
columns with the name of each
group and list all the related measures (for example, all Male
salaries in a column named Male)
vertically under the label. Each test has a set-up box that will
ask for the ranges for each group.
When entering the data in the Analysis Toolpak function, be
sure to include each label.
Labels cannot be included in the Fx version of either test.
Please ask your instructor if you have any questions about this
material.
When you have finished with this lecture, please respond to
Discussion Thread 3 for this
week with your initial response and responses to others over a
couple of days before reading the
third lecture for the week.
BUS 308 Week 2 Lecture 2
Statistical Testing for Differences – Part 1
After reading this lecture, the student should know:
1. How statistical distributions are used in hypothesis testing.
2. How to interpret the F test (both options) produced by Excel
3. How to interpret the T-test produced by Excel
Overview
Lecture 1 introduced the logic of statistical testing using the
hypothesis testing procedure.
It also mentioned that we will be looking at two different tests
this week. The t-test is used to
determine if means differ, from either a standard or claim or
from another group. The F-test is
used to examine variance differences between groups.
This lecture starts by looking at statistical distributions – they
underlie the entire
statistical testing approach. They are kind of like the
detective’s base belief that crimes are
committed for only a couple of reasons – money, vengeance, or
love. The statistical distribution
that underlies each test assumes that statistical measures (such
as the F value when comparing
variances and the t value when looking at means) follow a
particular pattern, and this can be used
to make decisions.
While the underlying distributions differ for the different tests
we will be looking at
throughout the course, they all have some basic similarities that
allow us to examine the t
distribution and extrapolate from it to interpreting results based
on other distributions.
Distributions. The basic logic for all statistical tests: If the null
hypothesis claim is
correct, then the distribution of the statistical outcome will be
distributed around a central value,
and larger and smaller values will be increasingly rare. At some
point (and we define this as our
alpha value), we can say that getting a
difference this large is extremely
unlikely, and we will say that our results do not seem to come
from a population that matches the
claims of the null hypothesis.
Note that this logic has several key elements:
1. The test is based on an assumption that the null hypothesis is
correct. This gives us a
starting point, even if later proven wrong.
2. All sample results are turned into a statistic that matches the
test selected (for
example, the F statistic when using the F-test, or the t-statistic
when using the T-test.)
3. The calculated statistic is compared to a related statistical
distribution to see how
likely an outcome we have.
4. The larger the test statistic, the more unlikely it is that the
result matches or comes
from the population described by the null hypothesis claim.
We will demonstrate these ideas by looking at the questions
being asked in this week’s
homework. We will show results of the related Excel tests, and
discuss how to interpret the
output.
We need to remember that seeing different values (mean,
variance, etc.) from different
samples does not tell us that the population parameters we are
estimating are, in fact, different.
The one thing we know about sampling is that each sample will
be a bit different. They
generally provide a “close enough” estimate to the population
values of concern for decision and
action purposes. But, they are not an exact match. This
difference is examined by the use of the
statistical tests, which tell us how much importance we should
attach to observed differences.
Lecture Examples
The lectures for each week will also look at our class question
of whether or not males
and females are paid equally for equal work. These additional
analyses provide some different
clues on what the data is trying to tell us about company pay
practices.
While your analysis will focus directly on the salary that males
and females are being
paid, the lecture examples will use an alternate method of
examining pay practices. Many
compensation professionals often use a relative pay measure
called the “comparison-ratio,” or
compa-ratio, to examine pay patterns within the company.
Some background on this measure. Many companies use grades
to group jobs of equal
value to the company into groups that have a similar pay range
– the values that a company is
willing to pay employees for the job. (As strong as a performer
a mail room clerk is, they will
rarely be paid the same as the CEO.) Many companies will set
the middle of this range, the
midpoint, as the average salary that the market pays to hire
someone into the job. This is how
companies remain competitive in their hiring.
Now, compensation professionals will generally want to analyze
how the company is
paying employees relative to these market rates (as summarized
by the midpoint). One approach
is to divide each employee’s salary by its related midpoint. The
outcome is the compa-ratio
which is considered an alternate measure of pay that eliminates
the impact of different grades.
The compa-ratio reports pay as a ratio of the actual salary
divided by the salary grade’s midpoint.
The compa-ratio shows if an employee is being paid more than
the midpoint (measure’s
value > 1.0) or less than the midpoint (< 1.0). This measure
allows us to look at salary
dispersion within a company without focusing on the exact
dollar values. It allows a comparison
between what the company is paying and what the outside
market is paying (which most
companies target as the midpoint of a salary range) for the jobs.
The compa-ratio shows if employees are paid above or below
the grade midpoint and it
can be used to see the dispersion pattern of pay. With equal
pay, we would expect to see similar
distributions (variances and means) for males and females
in this measure.
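The compa-ratio calculation itself is a single division. As a small illustration in Python, with employee records that are entirely made up for this sketch:

```python
# Hypothetical employees: salary and the midpoint of each one's salary grade.
employees = [
    {"name": "A", "salary": 57_000, "midpoint": 57_000},
    {"name": "B", "salary": 45_000, "midpoint": 48_000},
    {"name": "C", "salary": 70_000, "midpoint": 67_000},
]

for e in employees:
    # Compa-ratio = actual salary divided by the grade midpoint.
    e["compa_ratio"] = e["salary"] / e["midpoint"]
    if e["compa_ratio"] > 1.0:
        position = "above the midpoint"
    elif e["compa_ratio"] < 1.0:
        position = "below the midpoint"
    else:
        position = "at the midpoint"
    print(e["name"], round(e["compa_ratio"], 3), position)
```

Because each salary is scaled by its own grade midpoint, employees in different grades can be compared on the same footing.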
The lecture examples will cover the same statistical tests as the
homework assignments
but will focus on the compa-ratio pay measure rather than
salary. As such, the results presented
each week should be included and/or factored into your weekly
conclusions on what the data has
told us about the answer to our question.
The first step in looking at whether males and females are paid
equally would be to look
at the average pay of each. Given our sample is a random
sample of the population of employees
(and, therefore considered to be representative of the
population), the average salaries or average
compa-ratios (they are related but not identical measures
of pay) will give us an indication
of whether things are the same for each gender or not.
One issue in looking at averages is the variation within the
groups. If both groups have
the same or very similar variation across the salaries then we
test the averages for a difference
using one approach. If the group variances are significantly
different, we use a slightly
different approach. So, the first step is to examine group variances. This
is done with the F-test.
F-test
As noted, the F-test is used to compare variances to determine if
the differences noted
could be from simple sampling error (also known as pure chance
alone) or if the differences are
large enough to be considered statistically different. The F
statistic is simply one group’s
variance divided by the other group’s variance. (When done by
hand, it is traditional to have the
largest variance in the numerator, but this is not critical when
Excel performs the test for us.) So,
if the variances are equal, then the result of one variance
divided by the other would equal 1.0 –
this is the center of the F distribution. How about a situation
where one variance equaled 4 and
the other equaled 5 (randomly picked numbers for this
example)? If we divided the larger by the
smaller, we would get 5/4 = 1.25 while if we divided the
smaller by the larger, we would get 0.8.
Note that these values are on each side of the center value of
1.00. This is what is meant by “two
tails” with the F-test – one tail of the distribution has values
less than 1.0 while the other has
values greater than 1.0. Our value of F depends first on the
variances (of course) and then on
how we do the division. The likelihood of these two variances
coming from populations that
have the same variance does not depend upon which tail the
result is in, but rather how likely it is
to see a difference from 1.00. This is given to us by the F-test
p-value (probability value of
seeing a difference as large or larger than what we have if the
null hypothesis is true).
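The arithmetic of the F statistic is easy to check in code. The sketch below uses Python's standard statistics module; the function name f_statistic and the two small samples are assumptions made up for the illustration. It shows that swapping the numerator and denominator gives reciprocal F values on opposite sides of 1.0.

```python
from statistics import variance

def f_statistic(numerator_sample, denominator_sample):
    """Return one sample variance divided by the other."""
    return variance(numerator_sample) / variance(denominator_sample)

# The example from the text: variances of 5 and 4.
print(5 / 4)   # 1.25
print(4 / 5)   # 0.8

# Two made-up samples with sample variances of 2.5 and 10.
group_a = [1, 2, 3, 4, 5]
group_b = [2, 4, 6, 8, 10]

f_ab = f_statistic(group_a, group_b)   # falls in the tail below 1.0
f_ba = f_statistic(group_b, group_a)   # falls in the tail above 1.0
print(f_ab, f_ba)
```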
One new concept introduced with the F-test is the idea of
degrees of freedom (df). While
the technical explanation is somewhat tedious, we can
understand the concept with a simple
example. If we have 5 numbers, for example: 1, 2, 3, 4, and 5;
we also have a sum of them; in
this case 15. Now, assume we can change any of the numbers in
the data set with the only
requirement being that the total must remain the same. How
many of the numbers are we free to
change; or what is our degree of freedom in making changes?
In this case, we can change any 4
of the values, as soon as we do so we automatically get the fifth
value (whatever is needed to
make the sum equal to 15). Thus, to generalize, our df is the
count we have minus 1 (equaling
n - 1). The formula n - 1 gives the degrees of freedom for each
variable in the F-test. We will see this
idea in other statistical tests, each of which has its own formula
to calculate it. The nice thing is
that Excel will give us this outcome without our needing to
worry about it, and we rarely have to
actually use it in any of our work – but, it is technically part of
most statistical tests.
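The five-number example can be played out in a few lines of Python. This is only a sketch of the idea; the replacement values in free_choices are arbitrary numbers picked for the illustration.

```python
original = [1, 2, 3, 4, 5]
total = sum(original)                  # 15, the sum that must be preserved

# We are free to pick any four values we like...
free_choices = [10, -3, 7, 0]

# ...but the fifth is then forced: it must bring the sum back to 15.
forced_fifth = total - sum(free_choices)
new_values = free_choices + [forced_fifth]

degrees_of_freedom = len(original) - 1   # n - 1
print(new_values, sum(new_values), degrees_of_freedom)
```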
There are two versions of the F-test available for use. One is
located in the Data ribbon
under the Analysis block in the Data Analysis link and is called
F-Test Two-Sample for
Variances. The other is located in the Fx Statistical list (which
is duplicated in the Formulas
ribbon under the More Functions option and the Statistical list)
and is called simply F.test.
While both test variances, there is an important difference. The
F-Test Two Sample for
Variances option provides some additional summary statistics
(mean, variance, count) for each
sample, but only provides a one-tail test outcome. One-tail
results, whether with the F-Test or
the T-test are used to test a directional difference in variances,
when we want to know if one
variance is larger (or smaller) than the other. Since, in general
we are interested in the simpler
question of whether the variances are equal or not (without
regard to which is larger), when
using this form to test for equality or not, we need to double the
p-value to find the two-tail
p-value we need for our decision on rejecting the null hypothesis.
On the other hand, the F.test found in Fx or Formulas returns
only the two-tail p-value;
enough for a decision on rejecting or failing to reject the null
hypothesis of no difference but
nothing else. Technically, this is the version we should use
when conducting our two-tail
questions in the homework, but (as noted) either can be used if
we remember to double the
p-value for the one-tail outcome.
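The doubling step and the resulting decision can be sketched as follows. This Python snippet is an illustration only: the function names are hypothetical and the one-tail p-value of 0.03 is a made-up number, not a result from the class data.

```python
ALPHA = 0.05

def two_tail_from_one_tail(p_one_tail):
    """Double a one-tail p-value; a probability cannot exceed 1."""
    return min(1.0, 2 * p_one_tail)

def reject_null(p_value, alpha=ALPHA):
    """Decision rule: reject the null hypothesis when p-value < alpha."""
    return p_value < alpha

p_one = 0.03                        # hypothetical one-tail value from the table
p_two = two_tail_from_one_tail(p_one)

print(reject_null(p_one))   # using the one-tail value by mistake
print(reject_null(p_two))   # using the doubled, two-tail value
```

In this made-up case the one-tail value falls below 0.05 but the doubled value does not, which shows why forgetting to double can change the decision.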
Example: Testing for Variance Equality
As mentioned above, it is often beneficial to start with looking
at variance equality when
comparing groups. We need to start our analysis of equal pay
for equal work by seeing if there is
even an issue to be concerned with. So, we have selected our
random sample of 25 males and 25
females from our corporate population. (A couple of
assumptions; the company exists in only
one location, and all our employees in the sample are exempt
professionals or managers with at
least a bachelor’s college degree.)
Our initial question is: Are the male and female compa-ratio
variances equal? (Note, if
they are, this would mean that the standard deviations of both
groups are the same.) As with all
statistical tests, we will be using our samples to make
judgements or inferences about the
population values. While the sample result values will differ,
this difference may not be large
enough to show that the population values are not the same.
Question 1. One of the first things of interest to detectives is if
the behavior of the
suspects differs from what they normally do. That is, whose
behavior varies from the norm?
Relating this to our compa-ratio measures has us asking if the
compa-ratio variances for males
and females are equal within the population. (In the homework,
the question asks about salary
variances. The logic and approach for answering the salary-
based question is the same as shown
below.)
Variance equality is tested using the F-Test. There are two
versions of this test available
to us that we could use, and both will be shown below. Note
that equal standard deviations do
not automatically mean that the means are close, it just tells us
if the dispersion patterns are
similar. If similar, the means of each group can be considered
equally reflective of the data. The
following focuses on just setting up the data for and performing
the statistical test.
The following shows only the output for the six hypothesis
testing steps. How the Excel
F-tests are set up is covered in Lecture 3 for this week.
Step 1: The question asked is whether the variances for males
and females are equal. The
hypothesis statements for an equality test are shown below.
Ho: Male compa-ratio variance = Female compa-ratio variance
Ha: Male compa-ratio variance =/= Female compa-ratio variance
(Since the question asks about equality and not a directional
difference, this is a two-tail
test. The Null must contain the names of the two variables
involved (Male and Female),
the statistic being tested (variance), and the relationship sign
(=). The alternate provides
the opposite view so that between them all possible outcomes
are covered. We are only
concerned if the variances are equal, not whether one or the
other is larger (or smaller).)
Step 2: We state our decision-making criteria here. It is: Alpha
= 0.05 (This will be the
same for all statistical tests we perform in the class, and
therefore the same in all
hypothesis set-ups.)
Step 3: The test, test statistic, and the reason for selecting the
test are stated here. For this
example, we are using: F statistic and F-test for Variance. We
use these as they are
designed to test variance equality.
Step 4: Our decision-making rule is presented here: Reject the null hypothesis if the
p-value is < alpha = 0.05.
(This step is also the same for every statistical test we will perform; it says we will reject
the null hypothesis if the probability of getting a result as large as or larger than what we
see is less than 5%, or a probability of 0.05.)
Note that these steps are set up before we even look at the data. While we may have set
up the data columns, we should not have done any analysis yet. These steps tell us how
we will make a decision from the results we get.
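The step 4 rule can be written down as a small function before any data is examined. The following Python sketch is only an illustration and is not part of the Excel workflow; the function name decide and the two sample p-values are made up for the example.

```python
ALPHA = 0.05  # the course's constant decision criterion from step 2


def decide(p_value: float, alpha: float = ALPHA) -> str:
    """Apply the step 4 rule: reject the null hypothesis if the p-value is < alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    return "REJ" if p_value < alpha else "NOT Rej"


# Hypothetical p-values, chosen only to show both possible outcomes.
print(decide(0.03))  # below alpha, so the null is rejected
print(decide(0.40))  # above alpha, so the null is not rejected
```

Because the rule is fixed in advance, the same function applies unchanged to every test in the course; only the p-value fed into it changes.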
Step 5: Perform the analysis. This is the step where Excel performs the analysis and
produces output tables. The setting up of each Excel test is covered in Lecture 3; we are
primarily interested in how to interpret the results in this lecture.
Here is a screenshot of the results for both versions of the F-test. (Only one is needed for
the question.)
Step 6: Conclusions and Interpretation. This is where we
interpret what the data is trying
to tell us.
Before moving on to interpreting these results, let’s look at what we have. The F-Test
Two-Sample for Variances output clearly has more information than the F.TEST. We
have the labels identifying each group as well as the mean, variance, and count
(Observations) for each group. The df, equaling (sample size - 1), is shown as well as
the calculated F statistic, which equals the left group’s (Male in this case) variance
divided by the right group’s (Female) variance. Note that Excel divides the variances in the
order that they were entered into the data entry box; for this example the Males were
entered first.
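To make the arithmetic behind the F statistic and df concrete, here is a minimal Python sketch. It assumes the usual sample variance (n - 1 divisor); the function name f_statistic and the two short data lists are hypothetical and are not the lecture's data set.

```python
from statistics import variance


def f_statistic(first, second):
    """Return (F, df_first, df_second), where F = var(first) / var(second)."""
    f_value = variance(first) / variance(second)  # sample variances, n - 1 divisor
    return f_value, len(first) - 1, len(second) - 1


# Hypothetical compa-ratio values, NOT the lecture's data set.
males = [1.02, 1.10, 0.98, 1.15, 1.05]
females = [1.04, 1.06, 1.01, 1.09, 1.05]

f_mf, df_m, df_f = f_statistic(males, females)   # males entered first
f_fm, _, _ = f_statistic(females, males)         # females entered first
print(f_mf, df_m, df_f)
print(f_fm)
```

Swapping the entry order simply inverts the ratio, which is why it matters which group was entered first when reading the F value.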
The next two rows are critical for our decision making, but they are incomplete. They
show the one-tail results used in decision making. The P(F<=f) one-tail is the p-value,
or the probability of getting an F-value as large or larger than we have if the null
hypothesis is true. However, it is only a one-tail outcome, while we want a two-tail
outcome, since we only care about whether the variances are equal, not which one is
larger. So, the result as presented cannot be used directly. What we need to do will be
covered after we look at the F.TEST result.
The F.TEST gives less information, but it provides us with exactly what we need: the
two-tail p-value. For our data set, we have a 40% (rounded) chance of getting an F value
this large or larger purely by chance alone when we are looking at a two-tail outcome.
Note that this value, 0.39766, is twice the one-tail value of 0.19883 from the F-Test table.
This will always be true: the two-tail probability is twice the one-tail value.
So, if we want to use the F-Test Two-Sample for Variances tool, we need to double the
p-value before making our step 6 decisions.
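The doubling step is simple enough to check by hand, but a short Python sketch shows it using the one-tail value quoted above. The function name is hypothetical, and the cap at 1.0 is an added safeguard (a probability cannot exceed 1), not something Excel reports.

```python
def two_tail_from_one_tail(one_tail_p: float) -> float:
    """Double a one-tail p-value, capping the result at 1.0."""
    return min(1.0, 2.0 * one_tail_p)


one_tail = 0.19883  # one-tail value quoted from the F-Test table
two_tail = two_tail_from_one_tail(one_tail)
print(round(two_tail, 5))  # 0.39766, the F.TEST value quoted above
```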
We are now ready to move on to what step 6 asks for. This step has several parts.
• What is the p-value: 0.3977 (our compa-ratio example result). This value equals
EITHER the F.TEST outcome or 2 times the F-test one-tail result. (If the F-test p-value is
in cell K15, you could enter =2*K15 to get the value desired.)
• What is your decision: REJ or Not reject the null? (If our p-value is < (less than)
0.05, we say REJ; if the p-value is > (larger than) 0.05, we say NOT reject. This is
what our decision rule says to do.) Our answer for compa-ratio variances is:
NOT Rej. This means we do not reject the claim made by the null hypothesis
and accept it as the most likely description of the variances within the population.
• Why? This line asks us to explain why we made the decision we did. Our compa-ratio
response is: The p-value is > (greater than) 0.05, and the decision rule is to
reject if the p-value is < 0.05. (The answer here is simply why, based on the
reasoning shown above, you chose your REJ or NOT Rej choice.)
• What is your conclusion about the variances in the population for the male and
female salaries? This part asks us to translate the statistical decision into a clear
answer to the initial question (Are male and female compa-ratio variances equal?). Our
response: We do not have enough evidence to say that the variances differ in
the population. So, our conclusion is that the variances are equal in the population. Had
we rejected the null hypothesis, we would have said the population variances differed.
Note that this question did not tell us anything about actual pay
differences between the
genders. It did tell us that both groups are dispersed in a
similar manner, and thus supported
some of the conclusions we drew from looking at the data last
week.
Examples: Testing for Mean Equality
While we test for variance equality with an F-test, we use the t-test to test for mean
equality. The t-test also uses the degree of freedom (df) value in providing us with our
probability result; but again, Excel does the work for us. The t distribution is a bell-shaped
curve that is flatter and a bit more spread out than the normal curve we discussed last week.
The center is located at 0 (zero) and the tails (the negative and positive values) are
symmetrical.
The t statistic for the testing of two means is basically: (Mean1 – Mean2)/standard error
estimate. (The standard error formula varies according to which type of t-test we are
performing.) Note that the t will be either positive or negative depending upon which mean is
larger. So, if we are interested in simply equal or not equal, it does not matter if we have a
positive or negative t value; only the size of the difference matters. As with the two-tail
F-test, our alpha is split between the tails: half goes in the positive tail and half goes in
the negative tail when making our equality decision.
We have two questions about means this week.
Question 2. The second question for this week asks about salary mean equality between males
and females. Again, the set-up for this question is covered in Lecture 3; we are concerned here
with the interpretation of the results. (Note, the comments about each step made above apply to
this example as well, but they will not be repeated except for specific information related to
the t-test outcome.) Specific differences from the variance example of question 1 will be
highlighted with italics. Again, the results discussed with each step are shown in Lecture 2.
Since the question asks if male and female compa-ratios (salaries in the homework) are equal, we
have an equal versus non-equal hypothesis pair.
Step 1: Ho: Male compa-ratio mean = Female compa-ratio mean
Ha: Male compa-ratio mean =/= Female compa-ratio mean
Step 2: The decision criterion is constant: Alpha = 0.05
Step 3: t statistic and t-test, assuming equal variances. We use these as they are
designed to test mean equality, and we are assuming (and according to the F-test
have) equal variances.
Step 4: Again, our decision rule is the same: Reject the null hypothesis if the p-value is
< alpha = 0.05.
Step 5: Perform the analysis. Here is the screen shot for the results, using the same data
as with Question 1.
As with the F-test output, the t-test starts with the test name, the group names, and some
descriptive statistics. Line 4 starts with a new result, the Pooled Variance; this is a
weighted average of the sample variances since we are assuming that the related
population variances are equal. The next line, Hypothesized Mean Difference, shows up
only if a value was entered in the data input box setting up the test (discussed in Lecture
3).
Next comes our friend degrees of freedom, which equals the sum of both sample sizes
minus 2 (or (N1 - 1) + (N2 - 1)). The calculated t value (similar to the calculated F value in
the F-test output) comes next. Note that since we have a negative t Stat, it falls in the left
tail of the t distribution. This is important in one-tail tests but not in two-tail tests.
Following the calculated t value come the one- and two-tail decision points. The one-tail
p-value is found in the P(T<=t) one-tail row, followed by the t Critical one-tail value.
The two-tail results follow.
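For readers who want to see how the Pooled Variance, df, and t Stat rows relate to each other, the following Python sketch works through the standard equal-variance formulas. The function name pooled_t and both data lists are hypothetical (they are not the lecture's data); the groups are deliberately of unequal size so the weighting is visible.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(first, second):
    """Return (pooled variance, df, t) for the equal-variance two-sample t-test."""
    n1, n2 = len(first), len(second)
    df = (n1 - 1) + (n2 - 1)
    # Weighted average of the sample variances, weighted by each group's df.
    pooled = ((n1 - 1) * variance(first) + (n2 - 1) * variance(second)) / df
    std_error = sqrt(pooled * (1 / n1 + 1 / n2))
    t_stat = (mean(first) - mean(second)) / std_error
    return pooled, df, t_stat


# Hypothetical compa-ratio values, NOT the lecture's data set.
males = [1.02, 1.10, 0.98, 1.15, 1.05]
females = [1.04, 1.06, 1.01, 1.09, 1.05, 1.14]

pooled_var, df, t_mf = pooled_t(males, females)
_, _, t_fm = pooled_t(females, males)
print(pooled_var, df, t_mf)
print(t_fm)  # same size as t_mf, opposite sign
```

Reversing the order in which the groups are entered flips the sign of t but not its size, which is why the sign only matters for one-tail tests.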
Step 6: Conclusions and Interpretation. This step has several parts.
• What is the p-value? 0.571 (our rounded compa-ratio example result).
(Since we again have a two-tail test, we use the P(T<=t) two-tail result.)
• What is your decision: REJ or Not reject the null? NOT Rej (our compa-ratio result)
• Why? The p-value (0.571) is > (greater than) 0.05. (The compa-ratio result)
• What is your conclusion about the means in the population for the male and female
salaries? We do not have enough evidence to say that the means differ in the
population. So, our conclusion is that the means are equal in the population. (Our
compa-ratio result.)
Question 3
The third question for this week asks about salary differences based on educational level
rather than gender. Since education is a legitimate reason to pay someone more, it will be
helpful to see if a graduate degree results in a higher average pay. Note that this question
has a directional focus (do employees with an advanced degree (degree code = 1) have higher
average salaries?). This means we must develop a directional set of hypothesis statements. We
will use the terms UnderG (for undergraduate degree code 0) and Grad (for graduate degree
code 1) in these statements. Again, the results discussed with each step are shown in Lecture 2.
Step 1: Ho: UnderG mean compa-ratio >= Grad mean compa-ratio
Ha: UnderG mean compa-ratio < Grad mean compa-ratio
(Note the way the inequalities are set up; since the question asks whether degree 1 salary
means are higher (>), the question becomes the alternate hypothesis, as it does not contain
an = claim. These can be written with the Grad mean listed first, but the inequality sign
must open toward Grad (its point toward UnderG), showing an expectation that Grad means are
larger.)
Step 2: Alpha = 0.05 (Our constant decision criterion)
Step 3: t statistic and t-test assuming equal variances. We use
these as they are
designed to test mean equality. (The variance equality
assumption is part of the
question set-up.)
Step 4: Reject the null hypothesis if the p-value is < alpha =
0.05. (Our constant
decision rule.)
Step 5: Perform the analysis. Here is a screen print for a T-test on the question of
whether the graduate and undergraduate degree compa-ratio means in the population are
equal or not. Since we are assuming equal variances, we are using the t-Test: Two-Sample
Assuming Equal Variances form.
t-Test: Two-Sample Assuming Equal Variances

                                UnderG      Grad
Mean                            1.05172     1.07324
Variance                        0.00581     0.005999
Observations                    25          25
Pooled Variance                 0.005904
Hypothesized Mean Difference    0
df                              48
t Stat                          -0.99016
P(T<=t) one-tail                0.163529
t Critical one-tail             1.677224
P(T<=t) two-tail                0.327059
t Critical two-tail             2.010635

The table output is read exactly the same way as with the question 2 table with the
exception that we are interested in the one-tail outcome, so we use the (highlighted)
one-tail p-value row in our decision making.
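The p-value rows in this table can be checked outside Excel. The sketch below is one possible approach, not how Excel computes them: it integrates the standard Student's t density numerically (Simpson's rule) using only the t Stat and df reported in the table. The function names and the step count are choices made for this illustration.

```python
from math import exp, lgamma, log, pi


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_norm - (df + 1) / 2 * log(1 + x * x / df))


def t_tail_p_values(t_stat: float, df: int, steps: int = 2000):
    """Return (one-tail p, two-tail p) via Simpson's rule on [0, |t|]."""
    upper = abs(t_stat)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3   # area between 0 and |t|
    one_tail = 0.5 - central_area  # area beyond |t| in a single tail
    return one_tail, 2 * one_tail


# t Stat and df taken from the table above.
one_tail, two_tail = t_tail_p_values(-0.99016, 48)
print(one_tail, two_tail)
```

The two printed values should land very close to the table's 0.163529 and 0.327059; small differences are expected because the t Stat in the table is itself rounded.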
Step 6: Conclusions and Interpretation. This step has several parts.
• What is the p-value? 0.164 (our rounded compa-ratio example result).
(Since we have a one-tail test, we use the P(T<=t) one-tail result.)
• Is the t value in the t-distribution tail indicated by the arrow in the Ha claim? Yes.
The t-value is negative, and the Ha arrow points to the left (or negative) tail
of the t distribution. (Since we only care about a difference in one direction, the
result must be consistent with the desired direction. Only large negative values
are of interest in this case/set-up, since our difference is calculated by (UnderG –
Grad); large negative values show a larger Grad salary. If we had said Grad <=
UnderG in the Null, the alternate arrow would have pointed to the right or positive
tail, and a positive t would have been needed.)
• What is your decision: REJ or Not reject the null? NOT Rej (our compa-ratio result)
• Why? The p-value is > (greater than) 0.05. So, the sign does not matter in this
case, but it is in the correct or negative tail. (The compa-ratio result)
• What is your conclusion about the impact of education on average salaries? We
do not have enough information to suggest that graduate degree holders have a
higher average salary than undergraduate degree holders. (Our compa-ratio
result.)
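The two checks used in a one-tail decision (is the t value in the tail named by Ha, and is the one-tail p-value below alpha) can be combined in a few lines. This Python sketch is illustrative only; the function name and the "left"/"right" labels are invented for the example, and the inputs are the t Stat and one-tail p-value from the Question 3 table.

```python
def one_tail_decision(t_stat: float, one_tail_p: float, ha_tail: str,
                      alpha: float = 0.05):
    """Return (t is in the Ha tail?, decision) for a one-tail t-test.

    ha_tail is "left" when Ha reads first < second (negative t expected)
    and "right" when Ha reads first > second (positive t expected).
    """
    if ha_tail not in ("left", "right"):
        raise ValueError("ha_tail must be 'left' or 'right'")
    in_ha_tail = t_stat < 0 if ha_tail == "left" else t_stat > 0
    # Reject only when BOTH conditions hold: correct tail and p-value < alpha.
    reject = in_ha_tail and one_tail_p < alpha
    return in_ha_tail, "REJ" if reject else "NOT Rej"


# Ha: UnderG mean < Grad mean, so the left (negative) tail is the one of interest.
print(one_tail_decision(-0.99016, 0.163529, "left"))  # (True, 'NOT Rej')
```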
Question 4
While the week 1 salary results suggest that males and females
are not paid the same, this
week’s compa-ratio tests still do not suggest any inequality.
Gender Compa-ratio variances and
means are not significantly different. A somewhat surprising
result was that graduate degree
holders did not have higher compa-ratios.
We still cannot answer our equal pay for equal work question, however, as we have yet
to develop a measure of pay for equal work. Compa-ratios do remove the impact of grades, but
too many other work-related variables still need to be examined.
Summary
The F and t tests are used to determine if, based upon random sample results, the
population parameters can reasonably be said to differ. The F-test looks for differences in
population variances, while the t-test examines population mean differences. Both tests are
performed as part of the hypothesis testing procedure and are always done in step 5.
Differences in sample results can be transformed into statistical distributions that allow us
to determine the probability or likelihood of getting a difference as large or larger than we
found. It is this transformation that allows us to make our decisions about the differences we
see in the results.
When either test is set up using the Data | Analysis toolpak function, it will
provide summary sample descriptive statistics for the mean, variance, and count as well as the
calculated statistic, the critical value of the statistic, and the p-value. When set up using
Excel’s Fx or Formula functions, only the p-value is returned.
For both tests, if the appropriate p-value is less than the
specified alpha (always 0.05 in
this class), we reject the null hypothesis and say the alternate is
the more likely description of the
population.
We can test for a simple difference (called a two-tail test) where it does not matter which
group has the larger value, or we can use a directional test (called a one-tail test) where we
are concerned about which variable is larger (or smaller). The null and alternate hypotheses
define which difference we are looking for.
The t-test has three versions: equal variances, unequal
variances, and paired. The paired
test is used when we have two measures on each subject (such
as the salary and midpoint for
each employee). The F-test is used to help us decide if we need
to use the equal or unequal
variance form of the t-test.
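The choice among the three t-test versions described above can be summarized as a short decision sketch. This is only an illustration of the logic in this paragraph; the function name and parameter names are hypothetical, and the returned strings are the tool names as they appear in Excel's Data Analysis list.

```python
from typing import Optional


def choose_t_test(paired: bool, f_test_two_tail_p: Optional[float] = None,
                  alpha: float = 0.05) -> str:
    """Pick a t-test form from the study design and the F-test result."""
    if paired:
        # Two measures on each subject, e.g. salary and midpoint per employee.
        return "t-Test: Paired Two Sample for Means"
    if f_test_two_tail_p is None:
        raise ValueError("an F-test p-value is needed for independent groups")
    if f_test_two_tail_p < alpha:
        # Variance equality was rejected.
        return "t-Test: Two-Sample Assuming Unequal Variances"
    return "t-Test: Two-Sample Assuming Equal Variances"


# Two-tail F.TEST p-value from the compa-ratio variance example.
print(choose_t_test(paired=False, f_test_two_tail_p=0.39766))
```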
The Analysis toolpak F test defaults to a one-tail test so we need to double its p-value
when testing for simple variance differences. The Fx (or Formula) F.TEST function returns the
two-tail p-value directly, while the T.TEST function lets us select a one- or two-tail outcome.
Please ask your instructor if you have any questions about this
material.
When you have finished with this lecture, please respond to
Discussion Thread 2 for this
week with your initial response and responses to others over a
couple of days.