SlideShare a Scribd company logo
1 of 29
BUS 308 – Week 4 Lecture 2
Interpreting Relationships
Expected Outcomes
After reading this lecture, the student should be able to:
1. Interpret the strength of a correlation
2. Interpret a Correlation Table
3. Interpret a Linear Regression Equation
4. Interpret a Multiple Regression Equation
Overview
As in many detective stories, we will often find that when one
thing changes, we see that
something else has changed as well. Moving to correlation and
regression opens up new insights
into our data sets, but still lets us use what we have learned
about Excel tools in setting up and
generating our results.
The correlation between events is mirrored in data analysis
examinations with correlation
analysis. This week’s focus changes from detecting and
evaluating differences to looking at
relationships. As students often comment, finding significant
differences in gender-based
measures does not explain why these differences exist.
Correlation, while not always explaining
why things happen gives data detectives great clues on what to
examine more closely and helps
move us towards understanding why outcomes exist and what
impacts them. If we see
correlations in the real world, we often will spend time
examining what might underlie them;
finding out if they are spurious or causal.
Regression lets us use relationships between and among our
variables to predict or
explain outcomes based upon inputs, factors we think might be
related. In our quest to
understand what impacts the compa-ratio and salary outcomes
we see, we have often been
frustrated due to being basically limited to examining only two
variables at a time, when we felt
that we needed to include many other factors. Regression,
particularly multiple regression, is the
tool that allows us to do this.
Linear Correlation
When two things seem to move in a somewhat predictable way,
we say they are
correlated. This correlation could be direct or positive, both
move in the same direction, or it
could be inverse or negative, where when one increases the
other decreases. The Law of Supply
in economics is a common example of an inverse (or negative)
correlation, where the more
supply we have of something, the less we typically can charge
for it; the Law of Demand is an
example of a direct (or positive) correlation as the more demand
exists for something, the more
we can charge for it. Height and weight in young children is
another common example of a
direct correlation, as one increases so does the other measure.
Probably the most commonly used correlation is the Pearson
Correlation Coefficient,
symbolized by r. It measures the strength of the association –
the extent to which measures
change together – between interval or ratio level measures as
well as the direction of the
relationship (inverse or direct). Several measures in our
company data set could use the Pearson
Correlation to show relationships; salary and midpoint, salary
and years of service, salary and
performance rating, etc. The Pearson Correlation runs from -
1.0 (perfect negative or inverse
correlation) thru 0 (no correlation) to +1.0 (perfect positive or
direct correlation).
A perfect correlation means that if we graphed the values, they
would fall exactly on a
straight line, either increase from bottom left to top right
(positive) or from top left to bottom
right (negative). The stronger the absolute value (ignoring the
sign), the stronger the correlation
and the more the data points would form a straight line when
plotted on a graph. The Excel Fx
function Correl, and the Data Analysis function Correlation
both produce Pearson Correlations.
Question 1
When we have a data set with multiple variables, we would
want to see what
relationships exist – a detective’s sort of “who works with
whom” around the result we are
looking for. This question asks for us to identify the
statistically significant correlations between
the SALARY and the other variables.
Remember, while the assignment asks for correlations with
Salary, this example will use
correlations with the compa-ratio. The hypothesis testing steps
are
Step 1: Ho: Correlation with compa-ratio is not significant. Ha:
Correlation with compa-ratio is
significant. (A two-tail test.)
Technically we should set up the hypothesis testing steps for
each of the correlations
(which we will see shortly equals 15). This is a bit tedious, so
we approach the issue of
statistical significance as we did with the ANOVA coefficients
last week by using a
single Hypothesis test and apply it to each of the correlations.
Step 2: Alpha = 0.05
Step 3: Statistical Test and statistic: Pearson’s correlation (r), t,
and the correlation t-test to test a
correlation
The significance of the Pearson Correlation is tested with our
old friend the t-test.
Step 4: Reject the null hypothesis if the p-value < 0.05.
Step 5: Conduct the test.
Below is a screen shot of a correlation table produced by the
Analysis Toolpak
Correlation tool.
Reading the Table. The table only shows correlations below the
diagonal (which has a
1.00 correlation of each variable with itself). Values above the
line would simply duplicate those
below it. The diagonal is a “pivot-point,” so to speak. In
reading the correlations we would start
with a row such as Age. The correlation of Age and Compa-
ratio is 0.195 (rounded), the
correlation of Age and Midpoint is 0.567. Then we get to the
diagonal. Instead of continuing
horizontally, we start going vertical (down the Age column).
So, the correlation of Age with
Performance rating is 0.139, with Service is 0.565, and with
raise -0.180. All correlations,
except for the first one (in this case Compa-ratio) would be read
this way in Correlation tables.
Step 6: Conclusion and Interpretation.
To assess the statistical significance of the correlations we need
to calculate a t value for
each correlation, using the formula: t = r * sqrt(n-2)/sqrt(1-r^2),
df = n-2;
In this formula, n equals the number of data point pairs used in
the correlation, and r
equals the correlation that we are testing.
The associated p-value for a t-value is found using the Fx
function T.DIST.2T(t, df).
In any correlation table, the correlations are all developed with
the same number of pairs
of data points. (In our case, we have 50 pairs to use.) So, it
seems reasonable that each
of these correlations would have the same critical value that
cuts off our p-value of 0.05.
So, once we find a correlation value that is non-significant (that
is, we fail to reject the
null hypothesis), any correlations smaller than this would also
be non-significant.
For this lecture, please ignore part b. We will discuss this in
Lecture 3, as it is a short-cut
that depends on your understanding the approach described
below.
Part c asks us to use this information and identify the variables
significantly correlated to
salary in the homework and to compa-ratio for this example.
We need to calculate the t value and its associated p-value to
determine significance.
Starting with the largest correlation value with Compa-ratio, we
see this is with midpoint,
and is 0.50, rounded. So we have:
T= r * sqrt(n-2)/sqrt(1-r^2) = .5*sqrt(50-2)/sqrt(1-.5^2) = 4
(letting Excel do the math).
The two-tail p-value is T.DIST.2T(t,df) = T.DIST.2T(4,48) =
0.0002.
The null hypothesis is rejected for the Compa-Ratio and
Midpoint correlation.
The next largest Compa-ratio correlation is with Age at 0.20
(rounded). Using the same
Excel functions, we get t= 1.41 with a p-value of 0.164 (both
rounded). So, we do not
reject the null hypothesis for the Compa-ratio and age
correlation.
It makes sense that if a correlation of .20 is non-significant,
then any smaller correlations
would also be non-significant, so our testing is done.
Looking at the output table above, we can say that only
Midpoint is significantly
correlated to compa-ratio with a correlation of .50 (rounded).
Part d asks for any surprising results/correlations. This will
depend upon your table and
what you did or did not expect.
Part e asks if this information helps us answer our equal pay
question. The compa-ratio
correlations do not seem to be helpful as they do not shed any
insight on gender based
issues.
Multiple Correlation
As interesting as linear correlation is, multiple correlation is
even more so. It correlates
several independent (input) variables with a single dependent
(output) variable. For example, it
would show the shared variation (multiple R squared, or
Multiple Coefficient of Determination)
for compa-ratio with the other variables in the data set at the
same time rather than in pairs as we
did in question 1. While we can generate this value by itself, it
is a bit complicated and is rarely
found except in conjunction with a multiple regression equation.
So, having noted that this
exists, let’s move on to multiple regression.
Regression
Regression takes us the next step in the journey. We move from
knowing which
variables are correlated to finding out which variables can be
used to actually predict outcomes
or explain the influence of different variables on a result. As
we might suspect, linear regression
involves a single dependent (outcome) and single independent
(input) variable. Linear
regression uses at least interval level data for both the
dependent and independent variables.
The form of a linear regression equation is:
Y = a + b*X; where Y is the output, X is the input, a is the
intercept (the value of y when
X = 0) on a graph, and b is the coefficient (showing the change
in Y for every 1 change in the
value of X.
Earlier, we found that the correlation between raise and
performance rating was 0.674
(rounded). While we did not make note of this in our
correlation discussion, it was part of the
correlation table. This correlation relates to a coefficient of
determination (CD) of 0.674^2 or
0.45 (rounded). As mentioned, this is not a particularly strong
correlation, and we would not
expect the graph of these values to show much of a straight line.
For purposes of understanding
linear regression, let’s look at a graph showing performance
rating as an input (an X variable)
predicting raise (Y). An example of a regression equation and
its graph is:
Raise (Y) vs Performance Rating (X)
This is a Scatter Diagram graph produced by Excel. The
regression line, equation, and R-
squared values have been added. Note that the Coefficient of
Determination (R2) is the 45% we
found earlier, and that the data points are not all that close to
the regression (AKA trend) line.
Note the format of the regression equation Y = 0.5412 +
0.0512X, this is the same as saying
Raise = 0.5412 + 0.0512* Performance Rating when we
substitute the variable names for the
algebraic letters.
Let us look at the equation. Since we know that the correlation
is significant (it is larger
than our 0.50 value we found for compa-ratio and Midpoint),
the linear regression equation is
significant. The regression says for every single point increase
in the performance rating (our X
variable), the raise (The Y variable) increases, on average by
0.0512%. If we extended the line
towards the y (vertical axis), it would cross at Y = – 0.0512 and
X = 0, this is an example where
looking at the origin points is not particularly helpful as no one
has a performance rating of 0.
This graph does tend to reinforce our earlier comment that raise
and performance rating, even
y = 0.0512x + 0.5412
R² = 0.4538
0
1
2
3
4
5
6
7
0 20 40 60 80 100 120
though the strongest correlation, are not particularly good at
predicting each other’s value. We
see too much dispersion of data points around the best fit
regression line through the data points.
Most of us are probably not surprised, just as we feel compa-
ratio is not determined by a
single factor, we know raise is more complicated than simply
the performance rating. This is
where looking at multiple regression, the use of several factors,
might be more insightful.
Multiple Regression
Multiple Regression is probably the most powerful tool we will
look at in this course. It
allows us to examine the impact of multiple inputs (AKA
independent variables) on a single
output or result (AKA dependent variable). It also allows us to
include nominal and ordinal
variables in the results when they are used as dummy coded
variables.
Multiple regression has an interesting ability that we have not
been able to use before. It
can use nominal data variables as inputs to our outcomes, rather
than using them simply as
grouping labels. It does so by assigning either a 0 or 1 to the
variable value depending upon
whether some characteristic exists or not. For example, with
degree we essentially are looking to
see if a graduate degree has any impact, since everyone in the
sample has at least an
undergraduate degree. So, we code the existence of a graduate
degree with a 1, and the “non-
existence” with a 0. Similarly, with gender we are interested,
essentially, how females are being
treated, so we code them 1 (existence of being female). This
coding is called Dummy Coding,
and involves only using a 0 or 1 in specific situations where the
existence of a factor is
considered important. Note, other than some changes in the
value of the coefficients, the
outcomes would not differ if the codes were reversed. The
significance, or non-significance, of
degree or gender would remain the same regardless of the code
used. We will comment on this
more after we see our results.
Question 2
Question 2 for this week asks for a regression equation that
explains the impact of
various variables on our output of interest. Of course, in the
homework this is salary, while in
our lectures it is the compa-ratio.
Now that the data has been set up, let’s look at our hypothesis
testing process for the
question, first, of whether or not the regression equation is
helpful in explaining what impacts
compa-ratio outcomes.
Parts a and b. This part looks at the overall regression.
Step 1: Ha: The regression equation is not significant.
Ho: The regression equation is significant.
Step 2: Alpha = 0.05
Step 3: F stat and ANOVA-Regression, used to test regression
significance
Step 4: Decision Rule: Reject the null hypothesis if p-value <
0.05.
Step 5: Conduct the test.
Here is a screen shot of a multiple regression analysis for the
question of what factors influence
compa-ratio. Note: we will split the discussion of the output
into two screen shots.
Step 6: Conclusion and Interpretation.
The first table in the output provides some summary statistics.
Two are important for us
– the multiple correlation, shown as R, which equals 0.655, a
moderate value; and, the R
square or the multiple coefficient of determination showing that
about 43% of the
variation in compa-ratio values can be explained by the shared
variation in the variables
used in the analysis.
The second table shows the results of the actual statistical test
of the regression. Similar
to the ANOVA tables we looked at last week, it has two rows
that are used to generate
our F statistic (4.51) and the p-value which is labeled
“Significance F.”
What is the p-value? 0.0008
Decision: Rej or Not reject the null? Reject the null hypothesis.
Why? The p-value is less than (<) 0.05.
Conclusion about Compa-ratio factors? The input variables are
significantly related to
compa-ratio outcomes. Some of the compa-ratio outcomes can
be explained by the
selected variables. We used the phrase “some of” since the
equation only explains 43%
of the variance, less than half.
Part c
Once we reject the null hypothesis, our attention changes to the
actual equation, the
variables and their corresponding coefficients. The third table
provides all the details we need to
reach our conclusions.
As with the correlations in question 1, we will use the
hypothesis testing process, but will
write it only once and use the p-values to make decisions on
each of the possible equation
variables.
Step 1: Ha: The variable coefficient is not significant (b = 0).
Ho: The variable coefficient is significant (b =/= 0).
Step 2: Alpha = 0.05
Step 3: T stat and t-test for coefficients
Step 4: Decision Rule: Reject the null hypothesis if p-value <
0.05.
Step 5: Conduct the test. In this case, the test has already been
performed and is part of the
regression out. Here is a screen shot of the second half of the
Regression output.
Step 6: Conclusions and Interpretation
As with the correlations, we will use a single statement of the 6
steps to interpret the
outcomes in this part. You are asked to transfer values from the
ANOVA-Reg table to a
decision summary table. Here is the completed table. Note that
the variable names (the
X’s in our regression equation) come from column L. The t-
values are shown in column
O while the p-values are in Column P. The coefficients (the b’s
in our equation) are
listed in Column M.
The rejection decision is the same as we have been using.
Looking at the p-values, we
reject the null for all p-values less than 0.05. Note the unusual
looking p-value for the
intercept: 2.9E-13. This format is called exponential and is the
same as 2.9 * 10-13. This
value equals 0.00000000000029, we move the decimal point 13
places to the left. Any
E-xx Excel output will be less than our alpha value of 0.05. We
do not list the intercept in
our table as it is always included in the equation.
With our results, we reject the null hypothesis and find that 3
variables are significant
factors into determining compa-ratio, and these are Midpoint,
Perf. Rating and Gender.
The Multiple Regression equation is similar to the linear
regression example given above
except it has more independent terms: Y = a + b1*X1 + b2*X2
+ B3*X3 + …. The b’s stand for
the coefficients that are multiplied by the value of each variable
(represented by the X’s).
With these 3 variables and the intercept, the statistically
significant regression equation
is:
Compa-ratio = 0.954 + 0.003*midpoint -0.002*performance
rating + 0.056*gender.
So, what does this equation mean? How do we interpret it? The
intercept (0.9545) is
somewhat of a place holder – it centers the line in the middle of
the data points but has little
other meaning for us. The three variables, however tell us a lot.
Changes in each of them impact
the compa-ratio outcome independently of the others; it is as if
we can consider the other factors
being held constant as we examine each factor’s impact. So, all
other things the same, each
dollar increase in midpoint increases the compa-ratio value by
0.0034. This relates to what we
found last week that compa-ratio is not independent of grade.
At the same time, and possibly
surprisingly, every increase in an employee’s performance
rating causes the compa-rating to
decrease by .0024! Finally, the equation says that gender is an
important factor. This factor
alone means that the company is violating the equal pay act.
But, what might be surprising is
that for a change from male (coded 0) to female (coded 1) the
compa-ratio goes up by 0.0562!
Females get a higher compa-ratio (percent of midpoint) when all
other things are equal than
males do, since the female gender results in adding 0.056*1 to
the compa-ratio while the male
gender has 0.056 * 0 (or 0) added to their compa-ratio.
We did have one hint that this might be the case, when we
noticed in week 1 that the
female mean compa-ratio was higher than the male compa-ratio.
But, then some of the single
factor tests minimize this difference. This is one of multiple
regression’s greatest strengths, it
will show us the impact of a single variable by controlling for,
or keeping constant, the impact of
all other variables.
Parts d, e, and f
Gender is a significant element in the compa-ratio, as females
(coded 1) get a higher
value when all other variables are equal. We see this from the
significant positive coefficient to
the variable gender. If we had switched the coding and had
Females coded 0, the sign of the
gender variable would have changed causing Males to have part
of their compa-ratio reduced due
to being male.
Question 3
This answer will depend on what other factors you would like to
see.
Question 4
As of this point, we have some strong evidence in the compa-
regression equation and the t-test
on average compa-ratios, that females get more pay for equal
work than males. The company is
violating the Equal Pay Act, in favor of women.
Question 5
What you say here describes your understanding of regression
analysis versus the power of
inferential tests of 2 variables at a time.
Summary
Correlations show the direction and strength of a relationship
between two variables and
are fairly straight forward to understand. The Pearson
correlation and the Spearman Rank order
correlation are the two most generally used correlations.
Excel produces the Pearson Correlation in a single value or in a
table showing
correlations among three or more variables. Each cell shows
the correlation value for the
variable listed on the side row and top column. The statistical
significance of either the Pearson
or Spearman correlation is found by using the t-statistic. The
Spearman’s rank correlation is not
produced directly by Excel.
If we have variables that are related/correlated to an output,
such as salary, we can create
a regression equation. A regression equation is somewhat like a
recipe for your favorite food; it
tells how “much” of each ingredient (AKA variables) to add
into the mix to get your result.
The regression tables have a lot of information in them, and at
first glance they can
appear a bit overwhelming. However, as with the ANOVA
tables, only a few results are
important. Start with the first value, the Multiple R and R
squared (the multiple coefficient of
determination). These give us a sense of how well the
regression variables explain the outcome.
Then go to the next table and look at the significance of F value
at the right. This is our P-value
and should be less than 0.05 for our regression equation to
reject the null of no significance.
Assuming we rejected the null, the third table gives us our
details. The first two columns contain
the variable names and their respective coefficient. These get
multiplied together and added (or
subtracted depending on the coefficient sign) to create the
regression equation. A couple of
columns over is the p-value column, letting us know which of
the variables is significant to the
regression. If interested, we can build a confidence interval
using the final columns for each
variable.
Please ask your instructor if you have any questions about this
material.
When you have finished with this lecture, please respond to
Discussion Thread 2 for this
week with your initial response and responses to others over a
couple of days before reading the
third lecture for the week.
BUS 308 Week 4 Lecture 3
Developing Relationships in Excel
Expected Outcomes
After reading this lecture, the student should be able to:
1. Calculate the t-value for a correlation coefficient
2. Calculate the minimum statistically significant correlation
coefficient value.
3. Set-up and interpret a Linear Regression in Excel
4. Set-up and interpret a Multiple Regression in Excel
Overview
Setting up correlations and regressions in Excel is fairly
straightforward and follows the
approaches we have seen with our previous tools. This involves
setting up the data input table,
selecting the tools, and inputting information into the
appropriate parts of the input window.
Correlations
Question 1
Data set-up for a correlation is perhaps the simplest of any we
have seen. It involves
simply copying and pasting the variables from the Data tab to
the Week 4 worksheet. Again,
paste them to the right of the question area. The screenshot
below has the data for both the
question 1 correlation and the question 2 multiple regression
pasted them starting at column V.
You can paste all the data at once or add the multiple regression
variables later (as long as you
do not sort the original data).
Specifically, for Question 1, copy the salary data to column V
(for example). Then copy
the Midpoint thru Service columns and paste them next to
salary. Finally copy the Raise column
and paste it next to the service column. Notice that our data
input range for this question now
includes Salary in Column V and the other interval level
variables found in Columns W thru AA.
Question 1 asks for the correlation among the interval/ratio
level variables with salary
and says to exclude compa-ratio. For our example, we will
correlation compa-ratio with the
other interval/ratio level variables with the exclusion of salary.
Since compa-ratio equals the
salary divided by the midpoint, it does not seem reasonable to
use salary in predicting compa-
ratio or compa-ratio in predicting salary.
Pearson correlations can be performed in two ways within
Excel. If we have a single pair
of variables we are interested in, for example compa-ratio and
performance rating, we could use
the fx (or Formulas) function CORREL(array1, array2) (note
array means the same as range) to
give us the correlation.
However, if we have several variables we want to correlate at
the same time, it is more
effective to use the Correlation function found in the Analysis
ToolPak in the Data Analysis tab.
Set up of the input data for Correlation is simple. Just ensure
that all of the variables to be
correlated are listed together, and only include interval or ratio
level data. For our data set, this
would mean we cannot include gender or degree; even though
they look like numerical data the 0
and 1 are merely labels as far as correlation is concerned.
In the Correlation data input box shown below, list the entire
data range, indicate if your
data has labels or not (good idea to include these), select the
output cell, and click OK. Here is a
screen shot of the input box and some of the data.
The result will show up in K08 (in this case).
Statistical Significance
Part b. Normally, we would go thru our questions about the p-
value for each value. But
since you are familiar with the testing logic, for this question
we are going to “shortcut” the
process. Now, there is an easier way of determining which of
the correlations are statistically
significant. This is suggested by the question1 part b that we
skipped in lecture 2. We noted that
values smaller than the r = .20 that we tested could be assumed
to all be non-significant. We
could also have assumed that values larger than the tested 0.50
would be assumed to all be
significant. So, it would seem to make sense that there is a
specific value of r that exactly
matches the alpha = 0.05 criteria.
If we can find this value of r, we can compare each correlation
with this critical value;
correlations larger (absolute values) than this are significant;
while smaller correlations are not
significant. Having this critical value would give us a quick
decision point (much like how we
use the p-value).
The issue is now, what is this critical r value?
Technical Point. If you are interested in how we obtain the
formula for determining the
minimum r value, the approach is shown below. If you are not
interested in the math, you can
safely skip this paragraph, and go to The Result paragraph
below.
We know that t = r* sqrt(n-2)/sqrt(1-r2)
Multiplying both sides by sqrt(1-r2) gives us t *sqrt (1- r2) =
r*sqrt(n-2)
Squaring both sides gives us: t2 * (1- r2) = r2* (n-2)
Multiplying each side out gives us: t2– t2* r2 = n r2-2* r2
Adding t2* r2 both sides gives us: t2= n* r2-2*r2+ t2 *r2
Factoring gives us: t2= r2 *(n -2+ t2)
Dividing both sides by *(n -2+ t2) gives us: t2 / (n -2+ t2) = r2
Taking the square root gives us: r = sqrt (t2 / (n -2+ t2)
The Result. The formula to use in finding the minimum
correlation value that is
statistically significant is: r = sqrt(t^2/(t^2 + n-2)), where t is
the 2-tail value for any df count.
We would find the t value associated with a two-tail p-value of
0.05 and a df value of 48
by using the t.inv.2T(alpha, df) function with alpha = 0.05 and
df = n-2 or 48 (for our data set of
50 employees). Plugging these values into the gives us a t-
value of 2.0106 or 2.011(rounded).
t =t.inv.2T(alpha, df) =t.inv.2t(0.05,48) = 2.011
r = sqrt(t^2/(t^2 + n-2)) = sqrt(2.011^2/(2.011^2 + 50-2) =
0.278.
Therefore, in a correlation table based on 50 pairs, any
correlation greater than or equal to
0.278 would be statistically significant.
So, what does all this mean? If we find a correlation based on
50 pairs of data (such as
what our data set will produce), any correlation value that
exceeds an absolute value of 0.278
would be found to be statistically significant (p-value less than
0.05) and cause us to reject the
related null hypothesis of not significant.
So, when looking at a table of correlation values, we can
identify the significant
correlations immediately; these are any correlation above the
absolute value of 0.278 (that means
larger than + 0.278 (such as + .46) or less than -0.278 (such as -
0.53)). Knowing how to
interpret table results, we can proceed making our decisions on
what is significant.
So, for part b, the first question asked is what is the T value that
cuts off the two tails of
the distribution with an alpha of 0.05? We calculated this
above as 2.011. The second question
asks for the associated correlation value for this t-value. Again,
we found this above to be 0.278.
Spearman’s. Note that while the Spearman’s rank order
correlation is not asked for in
the assignment, you might want to use it at times. For example,
some could argue that
Performance Rating, since it is based mostly on human
judgement, it is really ordinal and
requires Spearman’s. The formula for Spearman’s (which needs
to be manually input into
Excel), is:
Rho = 1 – 6*(Sum of d^2)/(n*(n^2 – 1)); where d is the
difference in the rank score for
each of the paired variables, and n is the count of paired data
used. Remember that the ^ in an
Excel formula means take the number to that power, so d^2
means d squared (or d times d).
Regression
Question 2
Both linear and multiple regression are both set up in the same
fashion, so we will look at
only the multiple regression situation. For the data, put the
dependent variable, the output such
as salary or compa-ratio, in one column and then paste the
independent, input, variables in
sequential columns next to it. Make sure that none of the
columns contain letter characters. It is
also a good idea to include the variable labels for each data
column. The first screen shot above
shows the data input required for this question.
The Regression function is found in the Data | Analysis block
and is labeled Regression.
Here is a screen shot of a complete Regression set-up for a
regression equation for compa-ratio.
Note that unlike the correlation input, we have two ranges to
work with. The first is the output,
which for this example is compa-ratio (and would be salary for
the homework). The second is
for the inputs, which should include all of the numeric looking
variables, including the Degree
and Gender variables as shown below.
Data range entry for the Y (or outcome) and the X (or input)
variables are done separately
by either typing in the ranges or using dragging the cursor over
the data range after clicking on
the up arrow at the right end of the data entry boxes. The same
is done with the data entry box
after clicking the circle for Output range.
There are a number of options to consider. First, of course, is
the need to click the labels
box if your data ranges include labels. A second option is the
Constant is Zero equation. This
would force the regression equation to pass thru the X = 0 and
Y = 0 origin, even if this is not the
best fit. Use this with caution, even though it might make sense
to have Y = 0 when all the X
variables are 0, using this option may not give us the equation
that best fits the data.
The residuals box provides a way to see how well each of the
plotted data points fits with
the predicted results. This will often allow us to see outliers –
cases that do not fit with the rest
of the data set. Outliers are sometimes indications of data entry
errors or, in the case of salary,
they may be paid using a different approach. One such example
would be a commission
salesperson being included with employees that are paid on a
straight salary, the basis of pay is
so different these two should not be analyzed in the same study.
Other options here allow for the
results to be turned into Z-scores (Standardized Residuals),
plotted on a graph, or have linear
plots made for the output and each separate input. Normal
Probability Plots are rather
complicated to discuss, and it is left to the student to explore
this if desired. You are encouraged
to play around with some of these options, even though they are
not required for the assignment.
Here is a video on Regression: https://screencast-o-
matic.com/watch/cb6jfuIk8S
Summary
Pearson Correlations are fairly easy to produce in Excel with
either the Analysis ToolPak
Correlation function (best used for multiple correlations or
when you want the labels shown) or
the Fx (or Formulas) function CORREL, best used for a single
correlation outcome with no
labels. Both are used for the Pearson correlation only. The
Spearman’s correlation requires
setting up the data in rank order and providing ranks for each
variable separately and then
summing these and placing them into a cell formula to obtain
the correlation.
Setting up the data for a correlation is fairly simple. Just list the
variables in a column and
select the appropriate columns for the function being used. For
the correlation table, have all the
variable columns in a continuous range.
The statistical significance of either correlation is found using
the t formula t = r* sqrt(n-
2)/sqrt(1-r2), where r is the correlation value and n is the
number of data pairs used for the
correlation. Once we have a t-value we can use the t.dist.2t(t,
df) formula (df -= n-2) to find the
two tail p-value. The lecture presents an approach for finding a
minimum statistically significant
value when we have a table of correlations to look at;
correlations with an absolute value equal
to or greater than this value would be statistically significant.
Data set-up for a regression is similar to the correlation table.
Have the outcome variable
at one end of the range so it can be selected alone and have all
the other input variables listed in a
continuous range.
The regression function (for either a linear or multiple
regression) is located in the
Analysis ToolPak list and is called Regression. The set-up
within the data entry box is similar to
the other functions we have done (t-test, ANOVA, etc.) with a
data range, data output location,
and a label box to fill in.
Please ask your instructor if you have any questions about this
material.
When you have finished with this lecture, please respond to
Discussion Thread 3 for this
week with your initial response and responses to others over a
couple of days.
https://screencast-o-matic.com/watch/cb6jfuIk8S

More Related Content

Similar to BUS 308 – Week 4 Lecture 2 Interpreting Relationships .docx

Week 5 Lecture 14 The Chi Square Test Quite often, pat.docx
Week 5 Lecture 14 The Chi Square Test Quite often, pat.docxWeek 5 Lecture 14 The Chi Square Test Quite often, pat.docx
Week 5 Lecture 14 The Chi Square Test Quite often, pat.docxcockekeshia
 
Week 3 Lecture 11 Regression Analysis Regression analy.docx
Week 3 Lecture 11 Regression Analysis Regression analy.docxWeek 3 Lecture 11 Regression Analysis Regression analy.docx
Week 3 Lecture 11 Regression Analysis Regression analy.docxcockekeshia
 
Statistics- chapter4.pdf
Statistics- chapter4.pdfStatistics- chapter4.pdf
Statistics- chapter4.pdfPaulosMekuria
 
Covariance and correlation
Covariance and correlationCovariance and correlation
Covariance and correlationRashid Hussain
 
Multivariate Analysis Degree of association between two variable - Test of Ho...
Multivariate Analysis Degree of association between two variable- Test of Ho...Multivariate Analysis Degree of association between two variable- Test of Ho...
Multivariate Analysis Degree of association between two variable - Test of Ho...NiezelPertimos
 
Measures and Strengths of AssociationRemember that while w.docx
Measures and Strengths of AssociationRemember that while w.docxMeasures and Strengths of AssociationRemember that while w.docx
Measures and Strengths of AssociationRemember that while w.docxARIV4
 
Assessment 2 ContextIn many data analyses, it is desirable.docx
Assessment 2 ContextIn many data analyses, it is desirable.docxAssessment 2 ContextIn many data analyses, it is desirable.docx
Assessment 2 ContextIn many data analyses, it is desirable.docxfestockton
 
Assessment 2 ContextIn many data analyses, it is desirable.docx
Assessment 2 ContextIn many data analyses, it is desirable.docxAssessment 2 ContextIn many data analyses, it is desirable.docx
Assessment 2 ContextIn many data analyses, it is desirable.docxgalerussel59292
 
The management of a regional bus line thought the companys cost of .pdf
 The management of a regional bus line thought the companys cost of .pdf The management of a regional bus line thought the companys cost of .pdf
The management of a regional bus line thought the companys cost of .pdfnagaraj138348
 
Artificial Intelligence (Unit - 8).pdf
Artificial Intelligence   (Unit  -  8).pdfArtificial Intelligence   (Unit  -  8).pdf
Artificial Intelligence (Unit - 8).pdfSathyaNarayanan47813
 
The future is uncertain. Some events do have a very small probabil.docx
The future is uncertain. Some events do have a very small probabil.docxThe future is uncertain. Some events do have a very small probabil.docx
The future is uncertain. Some events do have a very small probabil.docxoreo10
 
correlation and regression
correlation and regressioncorrelation and regression
correlation and regressionKeyur Tejani
 
BUS 308 Week 2 Lecture 2 Statistical Testing for Differenc.docx
BUS 308 Week 2 Lecture 2 Statistical Testing for Differenc.docxBUS 308 Week 2 Lecture 2 Statistical Testing for Differenc.docx
BUS 308 Week 2 Lecture 2 Statistical Testing for Differenc.docxjasoninnes20
 
4DDBA 8307 Week 7 Assignment TemplateJohn DoeD.docx
4DDBA 8307 Week 7 Assignment TemplateJohn DoeD.docx4DDBA 8307 Week 7 Assignment TemplateJohn DoeD.docx
4DDBA 8307 Week 7 Assignment TemplateJohn DoeD.docxtroutmanboris
 
Research Methodology Module-06
Research Methodology Module-06Research Methodology Module-06
Research Methodology Module-06Kishor Ade
 
Correlation Example
Correlation ExampleCorrelation Example
Correlation ExampleOUM SAOKOSAL
 
PPT Correlation.pptx
PPT Correlation.pptxPPT Correlation.pptx
PPT Correlation.pptxMahamZeeshan5
 

Similar to BUS 308 – Week 4 Lecture 2 Interpreting Relationships .docx (20)

Measure of Association
Measure of AssociationMeasure of Association
Measure of Association
 
Week 5 Lecture 14 The Chi Square Test Quite often, pat.docx
Week 5 Lecture 14 The Chi Square Test Quite often, pat.docxWeek 5 Lecture 14 The Chi Square Test Quite often, pat.docx
Week 5 Lecture 14 The Chi Square Test Quite often, pat.docx
 
9. parametric regression
9. parametric regression9. parametric regression
9. parametric regression
 
Week 3 Lecture 11 Regression Analysis Regression analy.docx
Week 3 Lecture 11 Regression Analysis Regression analy.docxWeek 3 Lecture 11 Regression Analysis Regression analy.docx
Week 3 Lecture 11 Regression Analysis Regression analy.docx
 
Statistics- chapter4.pdf
Statistics- chapter4.pdfStatistics- chapter4.pdf
Statistics- chapter4.pdf
 
2-20-04.ppt
2-20-04.ppt2-20-04.ppt
2-20-04.ppt
 
Covariance and correlation
Covariance and correlationCovariance and correlation
Covariance and correlation
 
Multivariate Analysis Degree of association between two variable - Test of Ho...
Multivariate Analysis Degree of association between two variable- Test of Ho...Multivariate Analysis Degree of association between two variable- Test of Ho...
Multivariate Analysis Degree of association between two variable - Test of Ho...
 
Measures and Strengths of AssociationRemember that while w.docx
Measures and Strengths of AssociationRemember that while w.docxMeasures and Strengths of AssociationRemember that while w.docx
Measures and Strengths of AssociationRemember that while w.docx
 
Assessment 2 ContextIn many data analyses, it is desirable.docx
Assessment 2 ContextIn many data analyses, it is desirable.docxAssessment 2 ContextIn many data analyses, it is desirable.docx
Assessment 2 ContextIn many data analyses, it is desirable.docx
 
Assessment 2 ContextIn many data analyses, it is desirable.docx
Assessment 2 ContextIn many data analyses, it is desirable.docxAssessment 2 ContextIn many data analyses, it is desirable.docx
Assessment 2 ContextIn many data analyses, it is desirable.docx
 
The management of a regional bus line thought the companys cost of .pdf
 The management of a regional bus line thought the companys cost of .pdf The management of a regional bus line thought the companys cost of .pdf
The management of a regional bus line thought the companys cost of .pdf
 
Artificial Intelligence (Unit - 8).pdf
Artificial Intelligence   (Unit  -  8).pdfArtificial Intelligence   (Unit  -  8).pdf
Artificial Intelligence (Unit - 8).pdf
 
The future is uncertain. Some events do have a very small probabil.docx
The future is uncertain. Some events do have a very small probabil.docxThe future is uncertain. Some events do have a very small probabil.docx
The future is uncertain. Some events do have a very small probabil.docx
 
correlation and regression
correlation and regressioncorrelation and regression
correlation and regression
 
BUS 308 Week 2 Lecture 2 Statistical Testing for Differenc.docx
BUS 308 Week 2 Lecture 2 Statistical Testing for Differenc.docxBUS 308 Week 2 Lecture 2 Statistical Testing for Differenc.docx
BUS 308 Week 2 Lecture 2 Statistical Testing for Differenc.docx
 
4DDBA 8307 Week 7 Assignment TemplateJohn DoeD.docx
4DDBA 8307 Week 7 Assignment TemplateJohn DoeD.docx4DDBA 8307 Week 7 Assignment TemplateJohn DoeD.docx
4DDBA 8307 Week 7 Assignment TemplateJohn DoeD.docx
 
Research Methodology Module-06
Research Methodology Module-06Research Methodology Module-06
Research Methodology Module-06
 
Correlation Example
Correlation ExampleCorrelation Example
Correlation Example
 
PPT Correlation.pptx
PPT Correlation.pptxPPT Correlation.pptx
PPT Correlation.pptx
 

More from curwenmichaela

BUS310ASSIGNMENTImagine that you work for a company with an ag.docx
BUS310ASSIGNMENTImagine that you work for a company with an ag.docxBUS310ASSIGNMENTImagine that you work for a company with an ag.docx
BUS310ASSIGNMENTImagine that you work for a company with an ag.docxcurwenmichaela
 
BUS357 Copyright © 2020 Singapore University of Social Science.docx
BUS357 Copyright © 2020 Singapore University of Social Science.docxBUS357 Copyright © 2020 Singapore University of Social Science.docx
BUS357 Copyright © 2020 Singapore University of Social Science.docxcurwenmichaela
 
BUS308 – Week 1 Lecture 2 Describing Data Expected Out.docx
BUS308 – Week 1 Lecture 2 Describing Data Expected Out.docxBUS308 – Week 1 Lecture 2 Describing Data Expected Out.docx
BUS308 – Week 1 Lecture 2 Describing Data Expected Out.docxcurwenmichaela
 
BUS308 – Week 5 Lecture 1 A Different View Expected Ou.docx
BUS308 – Week 5 Lecture 1 A Different View Expected Ou.docxBUS308 – Week 5 Lecture 1 A Different View Expected Ou.docx
BUS308 – Week 5 Lecture 1 A Different View Expected Ou.docxcurwenmichaela
 
BUS308 – Week 1 Lecture 1 Statistics Expected Outcomes.docx
BUS308 – Week 1 Lecture 1 Statistics Expected Outcomes.docxBUS308 – Week 1 Lecture 1 Statistics Expected Outcomes.docx
BUS308 – Week 1 Lecture 1 Statistics Expected Outcomes.docxcurwenmichaela
 
BUS308 Statistics for ManagersDiscussions To participate in .docx
BUS308 Statistics for ManagersDiscussions To participate in .docxBUS308 Statistics for ManagersDiscussions To participate in .docx
BUS308 Statistics for ManagersDiscussions To participate in .docxcurwenmichaela
 
BUS308 Week 4 Lecture 1 Examining Relationships Expect.docx
BUS308 Week 4 Lecture 1 Examining Relationships Expect.docxBUS308 Week 4 Lecture 1 Examining Relationships Expect.docx
BUS308 Week 4 Lecture 1 Examining Relationships Expect.docxcurwenmichaela
 
BUS225 Group Assignment1. Service BlueprintCustomer acti.docx
BUS225 Group Assignment1. Service BlueprintCustomer acti.docxBUS225 Group Assignment1. Service BlueprintCustomer acti.docx
BUS225 Group Assignment1. Service BlueprintCustomer acti.docxcurwenmichaela
 
BUS301 Memo Rubric Spring 2020 - Student.docxBUS301 Writing Ru.docx
BUS301 Memo Rubric Spring 2020 - Student.docxBUS301 Writing Ru.docxBUS301 Memo Rubric Spring 2020 - Student.docxBUS301 Writing Ru.docx
BUS301 Memo Rubric Spring 2020 - Student.docxBUS301 Writing Ru.docxcurwenmichaela
 
BUS1431Introduction and PreferencesBUS143 Judgmen.docx
BUS1431Introduction and PreferencesBUS143 Judgmen.docxBUS1431Introduction and PreferencesBUS143 Judgmen.docx
BUS1431Introduction and PreferencesBUS143 Judgmen.docxcurwenmichaela
 
BUS210 analysis – open question codesQ7a01 Monthly OK02 Not .docx
BUS210 analysis – open question codesQ7a01 Monthly OK02 Not .docxBUS210 analysis – open question codesQ7a01 Monthly OK02 Not .docx
BUS210 analysis – open question codesQ7a01 Monthly OK02 Not .docxcurwenmichaela
 
Bus101 quiz (Business Organizations)The due time is in 1hrs1 .docx
Bus101 quiz (Business Organizations)The due time is in 1hrs1 .docxBus101 quiz (Business Organizations)The due time is in 1hrs1 .docx
Bus101 quiz (Business Organizations)The due time is in 1hrs1 .docxcurwenmichaela
 
BUS 625 Week 4 Response to Discussion 2Guided Response Your.docx
BUS 625 Week 4 Response to Discussion 2Guided Response Your.docxBUS 625 Week 4 Response to Discussion 2Guided Response Your.docx
BUS 625 Week 4 Response to Discussion 2Guided Response Your.docxcurwenmichaela
 
BUS 625 Week 2 Response for Discussion 1 & 2Week 2 Discussion 1 .docx
BUS 625 Week 2 Response for Discussion 1 & 2Week 2 Discussion 1 .docxBUS 625 Week 2 Response for Discussion 1 & 2Week 2 Discussion 1 .docx
BUS 625 Week 2 Response for Discussion 1 & 2Week 2 Discussion 1 .docxcurwenmichaela
 
Bus 626 Week 6 - Discussion Forum 1Guided Response Respon.docx
Bus 626 Week 6 - Discussion Forum 1Guided Response Respon.docxBus 626 Week 6 - Discussion Forum 1Guided Response Respon.docx
Bus 626 Week 6 - Discussion Forum 1Guided Response Respon.docxcurwenmichaela
 
BUS 499, Week 8 Corporate Governance Slide #TopicNarration.docx
BUS 499, Week 8 Corporate Governance Slide #TopicNarration.docxBUS 499, Week 8 Corporate Governance Slide #TopicNarration.docx
BUS 499, Week 8 Corporate Governance Slide #TopicNarration.docxcurwenmichaela
 
BUS 499, Week 6 Acquisition and Restructuring StrategiesSlide #.docx
BUS 499, Week 6 Acquisition and Restructuring StrategiesSlide #.docxBUS 499, Week 6 Acquisition and Restructuring StrategiesSlide #.docx
BUS 499, Week 6 Acquisition and Restructuring StrategiesSlide #.docxcurwenmichaela
 
BUS 499, Week 4 Business-Level Strategy, Competitive Rivalry, and.docx
BUS 499, Week 4 Business-Level Strategy, Competitive Rivalry, and.docxBUS 499, Week 4 Business-Level Strategy, Competitive Rivalry, and.docx
BUS 499, Week 4 Business-Level Strategy, Competitive Rivalry, and.docxcurwenmichaela
 
BUS 437 Project Procurement Management Discussion QuestionsWe.docx
BUS 437 Project Procurement Management  Discussion QuestionsWe.docxBUS 437 Project Procurement Management  Discussion QuestionsWe.docx
BUS 437 Project Procurement Management Discussion QuestionsWe.docxcurwenmichaela
 
BUS 480.01HY Case Study Assignment Instructions .docx
BUS 480.01HY Case Study Assignment Instructions     .docxBUS 480.01HY Case Study Assignment Instructions     .docx
BUS 480.01HY Case Study Assignment Instructions .docxcurwenmichaela
 

More from curwenmichaela (20)

BUS310ASSIGNMENTImagine that you work for a company with an ag.docx
BUS310ASSIGNMENTImagine that you work for a company with an ag.docxBUS310ASSIGNMENTImagine that you work for a company with an ag.docx
BUS310ASSIGNMENTImagine that you work for a company with an ag.docx
 
BUS357 Copyright © 2020 Singapore University of Social Science.docx
BUS357 Copyright © 2020 Singapore University of Social Science.docxBUS357 Copyright © 2020 Singapore University of Social Science.docx
BUS357 Copyright © 2020 Singapore University of Social Science.docx
 
BUS308 – Week 1 Lecture 2 Describing Data Expected Out.docx
BUS308 – Week 1 Lecture 2 Describing Data Expected Out.docxBUS308 – Week 1 Lecture 2 Describing Data Expected Out.docx
BUS308 – Week 1 Lecture 2 Describing Data Expected Out.docx
 
BUS308 – Week 5 Lecture 1 A Different View Expected Ou.docx
BUS308 – Week 5 Lecture 1 A Different View Expected Ou.docxBUS308 – Week 5 Lecture 1 A Different View Expected Ou.docx
BUS308 – Week 5 Lecture 1 A Different View Expected Ou.docx
 
BUS308 – Week 1 Lecture 1 Statistics Expected Outcomes.docx
BUS308 – Week 1 Lecture 1 Statistics Expected Outcomes.docxBUS308 – Week 1 Lecture 1 Statistics Expected Outcomes.docx
BUS308 – Week 1 Lecture 1 Statistics Expected Outcomes.docx
 
BUS308 Statistics for ManagersDiscussions To participate in .docx
BUS308 Statistics for ManagersDiscussions To participate in .docxBUS308 Statistics for ManagersDiscussions To participate in .docx
BUS308 Statistics for ManagersDiscussions To participate in .docx
 
BUS308 Week 4 Lecture 1 Examining Relationships Expect.docx
BUS308 Week 4 Lecture 1 Examining Relationships Expect.docxBUS308 Week 4 Lecture 1 Examining Relationships Expect.docx
BUS308 Week 4 Lecture 1 Examining Relationships Expect.docx
 
BUS225 Group Assignment1. Service BlueprintCustomer acti.docx
BUS225 Group Assignment1. Service BlueprintCustomer acti.docxBUS225 Group Assignment1. Service BlueprintCustomer acti.docx
BUS225 Group Assignment1. Service BlueprintCustomer acti.docx
 
BUS301 Memo Rubric Spring 2020 - Student.docxBUS301 Writing Ru.docx
BUS301 Memo Rubric Spring 2020 - Student.docxBUS301 Writing Ru.docxBUS301 Memo Rubric Spring 2020 - Student.docxBUS301 Writing Ru.docx
BUS301 Memo Rubric Spring 2020 - Student.docxBUS301 Writing Ru.docx
 
BUS1431Introduction and PreferencesBUS143 Judgmen.docx
BUS1431Introduction and PreferencesBUS143 Judgmen.docxBUS1431Introduction and PreferencesBUS143 Judgmen.docx
BUS1431Introduction and PreferencesBUS143 Judgmen.docx
 
BUS210 analysis – open question codesQ7a01 Monthly OK02 Not .docx
BUS210 analysis – open question codesQ7a01 Monthly OK02 Not .docxBUS210 analysis – open question codesQ7a01 Monthly OK02 Not .docx
BUS210 analysis – open question codesQ7a01 Monthly OK02 Not .docx
 
Bus101 quiz (Business Organizations)The due time is in 1hrs1 .docx
Bus101 quiz (Business Organizations)The due time is in 1hrs1 .docxBus101 quiz (Business Organizations)The due time is in 1hrs1 .docx
Bus101 quiz (Business Organizations)The due time is in 1hrs1 .docx
 
BUS 625 Week 4 Response to Discussion 2Guided Response Your.docx
BUS 625 Week 4 Response to Discussion 2Guided Response Your.docxBUS 625 Week 4 Response to Discussion 2Guided Response Your.docx
BUS 625 Week 4 Response to Discussion 2Guided Response Your.docx
 
BUS 625 Week 2 Response for Discussion 1 & 2Week 2 Discussion 1 .docx
BUS 625 Week 2 Response for Discussion 1 & 2Week 2 Discussion 1 .docxBUS 625 Week 2 Response for Discussion 1 & 2Week 2 Discussion 1 .docx
BUS 625 Week 2 Response for Discussion 1 & 2Week 2 Discussion 1 .docx
 
Bus 626 Week 6 - Discussion Forum 1Guided Response Respon.docx
Bus 626 Week 6 - Discussion Forum 1Guided Response Respon.docxBus 626 Week 6 - Discussion Forum 1Guided Response Respon.docx
Bus 626 Week 6 - Discussion Forum 1Guided Response Respon.docx
 
BUS 499, Week 8 Corporate Governance Slide #TopicNarration.docx
BUS 499, Week 8 Corporate Governance Slide #TopicNarration.docxBUS 499, Week 8 Corporate Governance Slide #TopicNarration.docx
BUS 499, Week 8 Corporate Governance Slide #TopicNarration.docx
 
BUS 499, Week 6 Acquisition and Restructuring StrategiesSlide #.docx
BUS 499, Week 6 Acquisition and Restructuring StrategiesSlide #.docxBUS 499, Week 6 Acquisition and Restructuring StrategiesSlide #.docx
BUS 499, Week 6 Acquisition and Restructuring StrategiesSlide #.docx
 
BUS 499, Week 4 Business-Level Strategy, Competitive Rivalry, and.docx
BUS 499, Week 4 Business-Level Strategy, Competitive Rivalry, and.docxBUS 499, Week 4 Business-Level Strategy, Competitive Rivalry, and.docx
BUS 499, Week 4 Business-Level Strategy, Competitive Rivalry, and.docx
 
BUS 437 Project Procurement Management Discussion QuestionsWe.docx
BUS 437 Project Procurement Management  Discussion QuestionsWe.docxBUS 437 Project Procurement Management  Discussion QuestionsWe.docx
BUS 437 Project Procurement Management Discussion QuestionsWe.docx
 
BUS 480.01HY Case Study Assignment Instructions .docx
BUS 480.01HY Case Study Assignment Instructions     .docxBUS 480.01HY Case Study Assignment Instructions     .docx
BUS 480.01HY Case Study Assignment Instructions .docx
 

Recently uploaded

Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for BeginnersSabitha Banu
 
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfFraming an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfUjwalaBharambe
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon AUnboundStockton
 
Pharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfPharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfMahmoud M. Sallam
 
Meghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media ComponentMeghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media ComponentInMediaRes1
 
Hierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementHierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementmkooblal
 
Types of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxTypes of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxEyham Joco
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxiammrhaywood
 
Biting mechanism of poisonous snakes.pdf
Biting mechanism of poisonous snakes.pdfBiting mechanism of poisonous snakes.pdf
Biting mechanism of poisonous snakes.pdfadityarao40181
 
Final demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxFinal demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxAvyJaneVismanos
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersSabitha Banu
 
Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...jaredbarbolino94
 
Painted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaPainted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaVirag Sontakke
 
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...M56BOOKSTORE PRODUCT/SERVICE
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 
CELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptxCELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptxJiesonDelaCerna
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxOH TEIK BIN
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17Celine George
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxthorishapillay1
 

Recently uploaded (20)

Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for Beginners
 
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfFraming an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon A
 
Pharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfPharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdf
 
Meghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media ComponentMeghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media Component
 
Hierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementHierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of management
 
Types of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxTypes of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptx
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
 
Biting mechanism of poisonous snakes.pdf
Biting mechanism of poisonous snakes.pdfBiting mechanism of poisonous snakes.pdf
Biting mechanism of poisonous snakes.pdf
 
Final demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxFinal demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptx
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginners
 
Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...
 
Painted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaPainted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of India
 
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
CELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptxCELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptx
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptx
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
 

BUS 308 – Week 4 Lecture 2 Interpreting Relationships .docx

  • 1. BUS 308 – Week 4 Lecture 2 Interpreting Relationships Expected Outcomes After reading this lecture, the student should be able to: 1. Interpret the strength of a correlation 2. Interpret a Correlation Table 3. Interpret a Linear Regression Equation 4. Interpret a Multiple Regression Equation Overview As in many detective stories, we will often find that when one thing changes, we see that something else has changed as well. Moving to correlation and regression opens up new insights into our data sets, but still lets us use what we have learned about Excel tools in setting up and generating our results. The correlation between events is mirrored in data analysis examinations with correlation analysis. This week’s focus changes from detecting and evaluating differences to looking at relationships. As students often comment, finding significant differences in gender-based measures does not explain why these differences exist. Correlation, while not always explaining why things happen gives data detectives great clues on what to
  • 2. examine more closely and helps move us towards understanding why outcomes exist and what impacts them. If we see correlations in the real world, we often will spend time examining what might underlie them; finding out if they are spurious or causal. Regression lets us use relationships between and among our variables to predict or explain outcomes based upon inputs, factors we think might be related. In our quest to understand what impacts the compa-ratio and salary outcomes we see, we have often been frustrated due to being basically limited to examining only two variables at a time, when we felt that we needed to include many other factors. Regression, particularly multiple regression, is the tool that allows us to do this. Linear Correlation When two things seem to move in a somewhat predictable way, we say they are correlated. This correlation could be direct or positive, both move in the same direction, or it could be inverse or negative, where when one increases the other decreases. The Law of Supply in economics is a common example of an inverse (or negative) correlation, where the more supply we have of something, the less we typically can charge for it; the Law of Demand is an example of a direct (or positive) correlation as the more demand exists for something, the more we can charge for it. Height and weight in young children is another common example of a direct correlation, as one increases so does the other measure.
  • 3. Probably the most commonly used correlation is the Pearson Correlation Coefficient, symbolized by r. It measures the strength of the association – the extent to which measures change together – between interval or ratio level measures as well as the direction of the relationship (inverse or direct). Several measures in our company data set could use the Pearson Correlation to show relationships; salary and midpoint, salary and years of service, salary and performance rating, etc. The Pearson Correlation runs from - 1.0 (perfect negative or inverse correlation) thru 0 (no correlation) to +1.0 (perfect positive or direct correlation). A perfect correlation means that if we graphed the values, they would fall exactly on a straight line, either increase from bottom left to top right (positive) or from top left to bottom right (negative). The stronger the absolute value (ignoring the sign), the stronger the correlation and the more the data points would form a straight line when plotted on a graph. The Excel Fx function Correl, and the Data Analysis function Correlation both produce Pearson Correlations. Question 1 When we have a data set with multiple variables, we would want to see what relationships exist – a detective’s sort of “who works with whom” around the result we are looking for. This question asks for us to identify the
  • 4. statistically significant correlations between the SALARY and the other variables. Remember, while the assignment asks for correlations with Salary, this example will use correlations with the compa-ratio. The hypothesis testing steps are Step 1: Ho: Correlation with compa-ratio is not significant. Ha: Correlation with compa-ratio is significant. (A two-tail test.) Technically we should set up the hypothesis testing steps for each of the correlations (which we will see shortly equals 15). This is a bit tedious, so we approach the issue of statistical significance as we did with the ANOVA coefficients last week by using a single Hypothesis test and apply it to each of the correlations. Step 2: Alpha = 0.05 Step 3: Statistical Test and statistic: Pearson’s correlation (r), t, and the correlation t-test to test a correlation The significance of the Pearson Correlation is tested with our old friend the t-test. Step 4: Reject the null hypothesis if the p-value < 0.05. Step 5: Conduct the test. Below is a screen shot of a correlation table produced by the Analysis Toolpak Correlation tool.
  • 5. Reading the Table. The table only shows correlations below the diagonal (which has a 1.00 correlation of each variable with itself). Values above the line would simply duplicate those below it. The diagonal is a “pivot-point,” so to speak. In reading the correlations we would start with a row such as Age. The correlation of Age and Compa- ratio is 0.195 (rounded), the correlation of Age and Midpoint is 0.567. Then we get to the diagonal. Instead of continuing horizontally, we start going vertical (down the Age column). So, the correlation of Age with Performance rating is 0.139, with Service is 0.565, and with raise -0.180. All correlations, except for the first one (in this case Compa-ratio) would be read this way in Correlation tables. Step 6: Conclusion and Interpretation. To assess the statistical significance of the correlations we need to calculate a t value for each correlation, using the formula: t = r * sqrt(n-2)/sqrt(1-r^2), df = n-2; In this formula, n equals the number of data point pairs used in the correlation, and r equals the correlation that we are testing. The associated p-value for a t-value is found using the Fx function T.DIST.2T(t, df). In any correlation table, the correlations are all developed with
  • 6. the same number of pairs of data points. (In our case, we have 50 pairs to use.) So, it seems reasonable that each of these correlations would have the same critical value that cuts off our p-value of 0.05. So, once we find a correlation value that is non-significant (that is, we fail to reject the null hypothesis), any correlations smaller than this would also be non-significant. For this lecture, please ignore part b. We will discuss this in Lecture 3, as it is a short-cut that depends on your understanding the approach described below. Part c asks us to use this information and identify the variables significantly correlated to salary in the homework and to compa-ratio for this example. We need to calculate the t value and its associated p-value to determine significance. Starting with the largest correlation value with Compa-ratio, we see this is with midpoint, and is 0.50, rounded. So we have: T= r * sqrt(n-2)/sqrt(1-r^2) = .5*sqrt(50-2)/sqrt(1-.5^2) = 4 (letting Excel do the math). The two-tail p-value is T.DIST.2T(t,df) = T.DIST.2T(4,48) = 0.0002. The null hypothesis is rejected for the Compa-Ratio and Midpoint correlation.
  • 7. The next largest Compa-ratio correlation is with Age at 0.20 (rounded). Using the same Excel functions, we get t= 1.41 with a p-value of 0.164 (both rounded). So, we do not reject the null hypothesis for the Compa-ratio and age correlation. It makes sense that if a correlation of .20 is non-significant, then any smaller correlations would also be non-significant, so our testing is done. Looking at the output table above, we can say that only Midpoint is significantly correlated to compa-ratio with a correlation of .50 (rounded). Part d asks for any surprising results/correlations. This will depend upon your table and what you did or did not expect. Part e asks if this information helps us answer our equal pay question. The compa-ratio correlations do not seem to be helpful as they do not shed any insight on gender based issues. Multiple Correlation As interesting as linear correlation is, multiple correlation is even more so. It correlates several independent (input) variables with a single dependent (output) variable. For example, it would show the shared variation (multiple R squared, or Multiple Coefficient of Determination) for compa-ratio with the other variables in the data set at the same time rather than in pairs as we did in question 1. While we can generate this value by itself, it
  • 8. is a bit complicated and is rarely found except in conjunction with a multiple regression equation. So, having noted that this exists, let’s move on to multiple regression. Regression Regression takes us the next step in the journey. We move from knowing which variables are correlated to finding out which variables can be used to actually predict outcomes or explain the influence of different variables on a result. As we might suspect, linear regression involves a single dependent (outcome) and single independent (input) variable. Linear regression uses at least interval level data for both the dependent and independent variables. The form of a linear regression equation is: Y = a + b*X; where Y is the output, X is the input, a is the intercept (the value of y when X = 0) on a graph, and b is the coefficient (showing the change in Y for every 1 change in the value of X. Earlier, we found that the correlation between raise and performance rating was 0.674 (rounded). While we did not make note of this in our correlation discussion, it was part of the correlation table. This correlation relates to a coefficient of determination (CD) of 0.674^2 or 0.45 (rounded). As mentioned, this is not a particularly strong correlation, and we would not
  • 9. expect the graph of these values to show much of a straight line. For purposes of understanding linear regression, let’s look at a graph showing performance rating as an input (an X variable) predicting raise (Y). An example of a regression equation and its graph is: Raise (Y) vs Performance Rating (X) This is a Scatter Diagram graph produced by Excel. The regression line, equation, and R- squared values have been added. Note that the Coefficient of Determination (R2) is the 45% we found earlier, and that the data points are not all that close to the regression (AKA trend) line. Note the format of the regression equation Y = 0.5412 + 0.0512X, this is the same as saying Raise = 0.5412 + 0.0512* Performance Rating when we substitute the variable names for the algebraic letters. Let us look at the equation. Since we know that the correlation is significant (it is larger than our 0.50 value we found for compa-ratio and Midpoint), the linear regression equation is significant. The regression says for every single point increase in the performance rating (our X variable), the raise (The Y variable) increases, on average by 0.0512%. If we extended the line towards the y (vertical axis), it would cross at Y = – 0.0512 and X = 0, this is an example where looking at the origin points is not particularly helpful as no one has a performance rating of 0.
  • 10. This graph does tend to reinforce our earlier comment that raise and performance rating, even y = 0.0512x + 0.5412 R² = 0.4538 0 1 2 3 4 5 6 7 0 20 40 60 80 100 120 though the strongest correlation, are not particularly good at predicting each other’s value. We see too much dispersion of data points around the best fit regression line through the data points. Most of us are probably not surprised, just as we feel compa- ratio is not determined by a single factor, we know raise is more complicated than simply the performance rating. This is where looking at multiple regression, the use of several factors,
  • 11. might be more insightful. Multiple Regression Multiple Regression is probably the most powerful tool we will look at in this course. It allows us to examine the impact of multiple inputs (AKA independent variables) on a single output or result (AKA dependent variable). It also allows us to include nominal and ordinal variables in the results when they are used as dummy coded variables. Multiple regression has an interesting ability that we have not been able to use before. It can use nominal data variables as inputs to our outcomes, rather than using them simply as grouping labels. It does so by assigning either a 0 or 1 to the variable value depending upon whether some characteristic exists or not. For example, with degree we essentially are looking to see if a graduate degree has any impact, since everyone in the sample has at least an undergraduate degree. So, we code the existence of a graduate degree with a 1, and the “non- existence” with a 0. Similarly, with gender we are interested, essentially, how females are being treated, so we code them 1 (existence of being female). This coding is called Dummy Coding, and involves only using a 0 or 1 in specific situations where the existence of a factor is considered important. Note, other than some changes in the value of the coefficients, the outcomes would not differ if the codes were reversed. The significance, or non-significance, of degree or gender would remain the same regardless of the code
  • 12. used. We will comment on this more after we see our results. Question 2 Question 2 for this week asks for a regression equation that explains the impact of various variables on our output of interest. Of course, in the homework this is salary, while in our lectures it is the compa-ratio. Now that the data has been set up, let’s look at our hypothesis testing process for the question, first, of whether or not the regression equation is helpful in explaining what impacts compa-ratio outcomes. Parts a and b. This part looks at the overall regression. Step 1: Ha: The regression equation is not significant. Ho: The regression equation is significant. Step 2: Alpha = 0.05 Step 3: F stat and ANOVA-Regression, used to test regression significance Step 4: Decision Rule: Reject the null hypothesis if p-value < 0.05. Step 5: Conduct the test. Here is a screen shot of a multiple regression analysis for the
  • 13. question of what factors influence compa-ratio. Note: we will split the discussion of the output into two screen shots. Step 6: Conclusion and Interpretation. The first table in the output provides some summary statistics. Two are important for us – the multiple correlation, shown as R, which equals 0.655, a moderate value; and, the R square or the multiple coefficient of determination showing that about 43% of the variation in compa-ratio values can be explained by the shared variation in the variables used in the analysis. The second table shows the results of the actual statistical test of the regression. Similar to the ANOVA tables we looked at last week, it has two rows that are used to generate our F statistic (4.51) and the p-value which is labeled “Significance F.” What is the p-value? 0.0008 Decision: Rej or Not reject the null? Reject the null hypothesis. Why? The p-value is less than (<) 0.05. Conclusion about Compa-ratio factors? The input variables are significantly related to compa-ratio outcomes. Some of the compa-ratio outcomes can be explained by the selected variables. We used the phrase “some of” since the equation only explains 43%
  • 14. of the variance, less than half. Part c Once we reject the null hypothesis, our attention changes to the actual equation, the variables and their corresponding coefficients. The third table provides all the details we need to reach our conclusions. As with the correlations in question 1, we will use the hypothesis testing process, but will write it only once and use the p-values to make decisions on each of the possible equation variables. Step 1: Ha: The variable coefficient is not significant (b = 0). Ho: The variable coefficient is significant (b =/= 0). Step 2: Alpha = 0.05 Step 3: T stat and t-test for coefficients Step 4: Decision Rule: Reject the null hypothesis if p-value < 0.05. Step 5: Conduct the test. In this case, the test has already been performed and is part of the regression out. Here is a screen shot of the second half of the Regression output. Step 6: Conclusions and Interpretation
  • 15. As with the correlations, we will use a single statement of the 6 steps to interpret the outcomes in this part. You are asked to transfer values from the ANOVA-Reg table to a decision summary table. Here is the completed table. Note that the variable names (the X’s in our regression equation) come from column L. The t- values are shown in column O while the p-values are in Column P. The coefficients (the b’s in our equation) are listed in Column M. The rejection decision is the same as we have been using. Looking at the p-values, we reject the null for all p-values less than 0.05. Note the unusual looking p-value for the intercept: 2.9E-13. This format is called exponential and is the same as 2.9 * 10-13. This value equals 0.00000000000029, we move the decimal point 13 places to the left. Any E-xx Excel output will be less than our alpha value of 0.05. We do not list the intercept in our table as it is always included in the equation. With our results, we reject the null hypothesis and find that 3 variables are significant factors into determining compa-ratio, and these are Midpoint, Perf. Rating and Gender. The Multiple Regression equation is similar to the linear
  • 16. regression example given above except it has more independent terms: Y = a + b1*X1 + b2*X2 + B3*X3 + …. The b’s stand for the coefficients that are multiplied by the value of each variable (represented by the X’s). With these 3 variables and the intercept, the statistically significant regression equation is: Compa-ratio = 0.954 + 0.003*midpoint -0.002*performance rating + 0.056*gender. So, what does this equation mean? How do we interpret it? The intercept (0.9545) is somewhat of a place holder – it centers the line in the middle of the data points but has little other meaning for us. The three variables, however tell us a lot. Changes in each of them impact the compa-ratio outcome independently of the others; it is as if we can consider the other factors being held constant as we examine each factor’s impact. So, all other things the same, each dollar increase in midpoint increases the compa-ratio value by 0.0034. This relates to what we found last week that compa-ratio is not independent of grade. At the same time, and possibly surprisingly, every increase in an employee’s performance rating causes the compa-rating to decrease by .0024! Finally, the equation says that gender is an important factor. This factor alone means that the company is violating the equal pay act. But, what might be surprising is that for a change from male (coded 0) to female (coded 1) the compa-ratio goes up by 0.0562! Females get a higher compa-ratio (percent of midpoint) when all
  • 17. other things are equal than males do, since the female gender results in adding 0.056*1 to the compa-ratio while the male gender has 0.056 * 0 (or 0) added to their compa-ratio. We did have one hint that this might be the case, when we noticed in week 1 that the female mean compa-ratio was higher than the male compa-ratio. But, then some of the single factor tests minimize this difference. This is one of multiple regression’s greatest strengths, it will show us the impact of a single variable by controlling for, or keeping constant, the impact of all other variables. Parts d, e, and f Gender is a significant element in the compa-ratio, as females (coded 1) get a higher value when all other variables are equal. We see this from the significant positive coefficient to the variable gender. If we had switched the coding and had Females coded 0, the sign of the gender variable would have changed causing Males to have part of their compa-ratio reduced due to being male. Question 3 This answer will depend on what other factors you would like to see. Question 4
  • 18. As of this point, we have some strong evidence in the compa- regression equation and the t-test on average compa-ratios, that females get more pay for equal work than males. The company is violating the Equal Pay Act, in favor of women. Question 5 What you say here describes your understanding of regression analysis versus the power of inferential tests of 2 variables at a time. Summary Correlations show the direction and strength of a relationship between two variables and are fairly straight forward to understand. The Pearson correlation and the Spearman Rank order correlation are the two most generally used correlations. Excel produces the Pearson Correlation in a single value or in a table showing correlations among three or more variables. Each cell shows the correlation value for the variable listed on the side row and top column. The statistical significance of either the Pearson or Spearman correlation is found by using the t-statistic. The Spearman’s rank correlation is not produced directly by Excel. If we have variables that are related/correlated to an output, such as salary, we can create a regression equation. A regression equation is somewhat like a recipe for your favorite food; it tells how “much” of each ingredient (AKA variables) to add
  • 19. into the mix to get your result. The regression tables have a lot of information in them, and at first glance they can appear a bit overwhelming. However, as with the ANOVA tables, only a few results are important. Start with the first value, the Multiple R and R squared (the multiple coefficient of determination). These give us a sense of how well the regression variables explain the outcome. Then go to the next table and look at the significance of F value at the right. This is our P-value and should be less than 0.05 for our regression equation to reject the null of no significance. Assuming we rejected the null, the third table gives us our details. The first two columns contain the variable names and their respective coefficient. These get multiplied together and added (or subtracted depending on the coefficient sign) to create the regression equation. A couple of columns over is the p-value column, letting us know which of the variables is significant to the regression. If interested, we can build a confidence interval using the final columns for each variable. Please ask your instructor if you have any questions about this material. When you have finished with this lecture, please respond to Discussion Thread 2 for this week with your initial response and responses to others over a couple of days before reading the
  • 20. third lecture for the week. BUS 308 Week 4 Lecture 3 Developing Relationships in Excel Expected Outcomes After reading this lecture, the student should be able to: 1. Calculate the t-value for a correlation coefficient 2. Calculate the minimum statistically significant correlation coefficient value. 3. Set-up and interpret a Linear Regression in Excel 4. Set-up and interpret a Multiple Regression in Excel Overview Setting up correlations and regressions in Excel is fairly straightforward and follows the approaches we have seen with our previous tools. This involves setting up the data input table, selecting the tools, and inputting information into the appropriate parts of the input window. Correlations Question 1 Data set-up for a correlation is perhaps the simplest of any we have seen. It involves simply copying and pasting the variables from the Data tab to the Week 4 worksheet. Again,
  • 21. paste them to the right of the question area. The screenshot below has the data for both the question 1 correlation and the question 2 multiple regression pasted them starting at column V. You can paste all the data at once or add the multiple regression variables later (as long as you do not sort the original data). Specifically, for Question 1, copy the salary data to column V (for example). Then copy the Midpoint thru Service columns and paste them next to salary. Finally copy the Raise column and paste it next to the service column. Notice that our data input range for this question now includes Salary in Column V and the other interval level variables found in Columns W thru AA. Question 1 asks for the correlation among the interval/ratio level variables with salary and says to exclude compa-ratio. For our example, we will correlation compa-ratio with the other interval/ratio level variables with the exclusion of salary. Since compa-ratio equals the salary divided by the midpoint, it does not seem reasonable to use salary in predicting compa- ratio or compa-ratio in predicting salary. Pearson correlations can be performed in two ways within Excel. If we have a single pair of variables we are interested in, for example compa-ratio and
  • 22. performance rating, we could use the fx (or Formulas) function CORREL(array1, array2) (note array means the same as range) to give us the correlation. However, if we have several variables we want to correlate at the same time, it is more effective to use the Correlation function found in the Analysis ToolPak in the Data Analysis tab. Set up of the input data for Correlation is simple. Just ensure that all of the variables to be correlated are listed together, and only include interval or ratio level data. For our data set, this would mean we cannot include gender or degree; even though they look like numerical data the 0 and 1 are merely labels as far as correlation is concerned. In the Correlation data input box shown below, list the entire data range, indicate if your data has labels or not (good idea to include these), select the output cell, and click OK. Here is a screen shot of the input box and some of the data. The result will show up in K08 (in this case). Statistical Significance Part b. Normally, we would go thru our questions about the p- value for each value. But since you are familiar with the testing logic, for this question we are going to “shortcut” the process. Now, there is an easier way of determining which of the correlations are statistically
  • 23. significant. This is suggested by the question1 part b that we skipped in lecture 2. We noted that values smaller than the r = .20 that we tested could be assumed to all be non-significant. We could also have assumed that values larger than the tested 0.50 would be assumed to all be significant. So, it would seem to make sense that there is a specific value of r that exactly matches the alpha = 0.05 criteria. If we can find this value of r, we can compare each correlation with this critical value; correlations larger (absolute values) than this are significant; while smaller correlations are not significant. Having this critical value would give us a quick decision point (much like how we use the p-value). The issue is now, what is this critical r value? Technical Point. If you are interested in how we obtain the formula for determining the minimum r value, the approach is shown below. If you are not interested in the math, you can safely skip this paragraph, and go to The Result paragraph below. We know that t = r* sqrt(n-2)/sqrt(1-r2) Multiplying both sides by sqrt(1-r2) gives us t *sqrt (1- r2) = r*sqrt(n-2) Squaring both sides gives us: t2 * (1- r2) = r2* (n-2) Multiplying each side out gives us: t2– t2* r2 = n r2-2* r2
  • 24. Adding t2* r2 both sides gives us: t2= n* r2-2*r2+ t2 *r2 Factoring gives us: t2= r2 *(n -2+ t2) Dividing both sides by *(n -2+ t2) gives us: t2 / (n -2+ t2) = r2 Taking the square root gives us: r = sqrt (t2 / (n -2+ t2) The Result. The formula to use in finding the minimum correlation value that is statistically significant is: r = sqrt(t^2/(t^2 + n-2)), where t is the 2-tail value for any df count. We would find the t value associated with a two-tail p-value of 0.05 and a df value of 48 by using the t.inv.2T(alpha, df) function with alpha = 0.05 and df = n-2 or 48 (for our data set of 50 employees). Plugging these values into the gives us a t- value of 2.0106 or 2.011(rounded). t =t.inv.2T(alpha, df) =t.inv.2t(0.05,48) = 2.011 r = sqrt(t^2/(t^2 + n-2)) = sqrt(2.011^2/(2.011^2 + 50-2) = 0.278. Therefore, in a correlation table based on 50 pairs, any correlation greater than or equal to 0.278 would be statistically significant. So, what does all this mean? If we find a correlation based on 50 pairs of data (such as what our data set will produce), any correlation value that exceeds an absolute value of 0.278 would be found to be statistically significant (p-value less than
  • 25. 0.05) and cause us to reject the related null hypothesis of not significant. So, when looking at a table of correlation values, we can identify the significant correlations immediately; these are any correlation above the absolute value of 0.278 (that means larger than + 0.278 (such as + .46) or less than -0.278 (such as - 0.53)). Knowing how to interpret table results, we can proceed making our decisions on what is significant. So, for part b, the first question asked is what is the T value that cuts off the two tails of the distribution with an alpha of 0.05? We calculated this above as 2.011. The second question asks for the associated correlation value for this t-value. Again, we found this above to be 0.278. Spearman’s. Note that while the Spearman’s rank order correlation is not asked for in the assignment, you might want to use it at times. For example, some could argue that Performance Rating, since it is based mostly on human judgement, it is really ordinal and requires Spearman’s. The formula for Spearman’s (which needs to be manually input into Excel), is: Rho = 1 – 6*(Sum of d^2)/(n*(n^2 – 1)); where d is the difference in the rank score for each of the paired variables, and n is the count of paired data used. Remember that the ^ in an Excel formula means take the number to that power, so d^2 means d squared (or d times d).
  • 26. Regression Question 2 Both linear and multiple regression are both set up in the same fashion, so we will look at only the multiple regression situation. For the data, put the dependent variable, the output such as salary or compa-ratio, in one column and then paste the independent, input, variables in sequential columns next to it. Make sure that none of the columns contain letter characters. It is also a good idea to include the variable labels for each data column. The first screen shot above shows the data input required for this question. The Regression function is found in the Data | Analysis block and is labeled Regression. Here is a screen shot of a complete Regression set-up for a regression equation for compa-ratio. Note that unlike the correlation input, we have two ranges to work with. The first is the output, which for this example is compa-ratio (and would be salary for the homework). The second is for the inputs, which should include all of the numeric looking variables, including the Degree and Gender variables as shown below. Data range entry for the Y (or outcome) and the X (or input) variables are done separately by either typing in the ranges or using dragging the cursor over the data range after clicking on
  • 27. the up arrow at the right end of the data entry boxes. The same is done with the data entry box after clicking the circle for Output range. There are a number of options to consider. First, of course, is the need to click the labels box if your data ranges include labels. A second option is the Constant is Zero equation. This would force the regression equation to pass thru the X = 0 and Y = 0 origin, even if this is not the best fit. Use this with caution, even though it might make sense to have Y = 0 when all the X variables are 0, using this option may not give us the equation that best fits the data. The residuals box provides a way to see how well each of the plotted data points fits with the predicted results. This will often allow us to see outliers – cases that do not fit with the rest of the data set. Outliers are sometimes indications of data entry errors or, in the case of salary, they may be paid using a different approach. One such example would be a commission salesperson being included with employees that are paid on a straight salary, the basis of pay is so different these two should not be analyzed in the same study. Other options here allow for the results to be turned into Z-scores (Standardized Residuals), plotted on a graph, or have linear plots made for the output and each separate input. Normal Probability Plots are rather complicated to discuss, and it is left to the student to explore this if desired. You are encouraged
  • 28. to play around with some of these options, even though they are not required for the assignment. Here is a video on Regression: https://screencast-o- matic.com/watch/cb6jfuIk8S Summary Pearson Correlations are fairly easy to produce in Excel with either the Analysis ToolPak Correlation function (best used for multiple correlations or when you want the labels shown) or the Fx (or Formulas) function CORREL, best used for a single correlation outcome with no labels. Both are used for the Pearson correlation only. The Spearman’s correlation requires setting up the data in rank order and providing ranks for each variable separately and then summing these and placing them into a cell formula to obtain the correlation. Setting up the data for a correlation is fairly simple. Just list the variables in a column and select the appropriate columns for the function being used. For the correlation table, have all the variable columns in a continuous range. The statistical significance of either correlation is found using the t formula t = r* sqrt(n- 2)/sqrt(1-r2), where r is the correlation value and n is the number of data pairs used for the correlation. Once we have a t-value we can use the t.dist.2t(t, df) formula (df -= n-2) to find the two tail p-value. The lecture presents an approach for finding a minimum statistically significant value when we have a table of correlations to look at;
  • 29. correlations with an absolute value equal to or greater than this value would be statistically significant. Data set-up for a regression is similar to the correlation table. Have the outcome variable at one end of the range so it can be selected alone and have all the other input variables listed in a continuous range. The regression function (for either a linear or multiple regression) is located in the Analysis ToolPak list and is called Regression. The set-up within the data entry box is similar to the other functions we have done (t-test, ANOVA, etc.) with a data range, data output location, and a label box to fill in. Please ask your instructor if you have any questions about this material. When you have finished with this lecture, please respond to Discussion Thread 3 for this week with your initial response and responses to others over a couple of days. https://screencast-o-matic.com/watch/cb6jfuIk8S