Week 4 Lecture 10
We have been examining the question of equal pay for equal
work for several weeks now; but have been somewhat frustrated
with the equal work part. We suspect that salary varies with
grade level, so that equal work is not done if we compare
salaries across grades. We found that we could control the
effect of grades with either of two techniques. The first is by
choosing a variable that does not include grade level variation
such as compa-ratios (the salary divided by midpoint). The
second by statistically removing the impact of grade level using
the ANOVA Two-factor without replication. Both of these gave
us different outcomes on the question of male and female pay
equality than examining salary only.
However, we still have not gotten a “clean” measure of equal
work as there are still other factors that may impact work done
such as performance levels (measured by the performance
appraisal rating), seniority, education, etc. There could also
be gender bias (and, for real-world companies, ethnic bias as
well; we will not cover this, but it can be dealt with in the
same way that we will examine gender). We need to find a way to
eliminate the impact of these variables on our pay measure as
well.
This week we will look at two techniques that are very good at
examining and explaining the influence of variables on
outcomes. These are correlation and regression techniques.
Linear Correlation
Correlation is a measure of how variables/things relate – that is,
if one variable changes does another variable change in a
predictable pattern as well? One very well-known example is
the correlation (or relationship) between length/height of
children and weight. As children become longer/taller their
weight also increases (Tanner & Youssef-Morgan, 2013). Using
this relationship, we can make predictions (using the technique
of regression discussed in Lecture 11 for this week) about how
heavy a child should be for any given height.
For variables that are at least interval in nature, two types of
correlation exist for a bivariate (two variables only)
relationship: linear and curvilinear. As the names suggest, linear
correlations show the extent to which the data points follow
a straight line. Curvilinear correlations, which we will not
cover, show the extent to which the data follow a curved line.
Scatter Diagrams
An effective way to see if the data do relate in predictable ways
is to generate a scatter diagram (AKA scatter chart): a
visual display of how the data points, each plotted as (variable 1
value, corresponding variable 2 value), relate to one another
(Lind, Marchel, & Wathen, 2008).
Example 1. One relationship we might expect to be positive
(both values increasing together) is that between salary and
performance rating, either for the entire salary range or at least
within grades. The following scatter diagram (made with the
Excel Insert Graph functions) shows this relationship, with
Performance Rating on the horizontal axis and Salary on the
vertical axis. If we put a straight line through the data
points, it shows only a very modest increase from the lower left
to the upper right.
Salary (Y-axis) and Performance Rating (X-axis)
Example 2. If we look at the same variables, but include Grade
as a factor, we get the second graph (below) and see the data
separated by grade. Each grade seems to show (again, if we
were to put a straight line through the data points for each
grade) a level line, indicating no correlation at all. Neither
graph gives us much hope that Performance Rating is related to
Salary, something HR would probably not be happy with.
Salary Grades (Y-axis) and Performance Appraisal Rating (X-axis)
Correlation
We will be focusing our efforts on the Pearson Correlation
Coefficient – a mathematical value that shows the strength of
the linear (straight line) relationship between two variables
(Lind, Marchel, & Wathen, 2008). The math formula is a bit
tedious, so we will not bother with it; if you are interested,
you can ask Excel to display it, using either Help or the “Tell
me what you want to do” box. With the latter, I typed “show help
on Pearson Correlation” and then selected the “show help…” line,
which gave a description and the math formula.
Pearson correlation ranges from a value of -1.00 to a +1.00.
Any value outside of this range indicates an error in the math or
setup. A perfect negative correlation (-1.00) means that the
data points all fit exactly on a line that runs from the upper left
corner to the lower right on a graph, a negative slope. A perfect
positive correlation (+1.00) has the line with a positive slope
and runs from the lower left to the upper right (Tanner &
Youssef-Morgan, 2013).
As the values move away from the perfect extremes, the data
points move away from a line to a spread around the line. If we
look at our first graph above, the overall Salary and Performance
Rating relationship, we have a correlation of +.15, considered
very low and not particularly impressive.
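To make the -1.00 to +1.00 range concrete, here is a minimal Python sketch (Python is not part of this course's Excel workflow). The function name pearson_r and the two small data lists are made up for illustration; the calculation follows the standard definition of the Pearson coefficient. It shows the two perfect extremes described above.

```python
import math

def pearson_r(x, y):
    """Pearson correlation: how x and y vary together, divided by
    the product of how much each varies on its own."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, chosen only to show the two extremes.
rating = [1, 2, 3, 4, 5]
rising = [10, 20, 30, 40, 50]    # every point sits on an upward line
falling = [50, 40, 30, 20, 10]   # every point sits on a downward line

print(round(pearson_r(rating, rising), 2))   # perfect positive correlation
print(round(pearson_r(rating, falling), 2))  # perfect negative correlation
```

Real data such as Salary and Performance Rating fall between these extremes, which is why the result is a fraction such as +.15.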
Pearson Correlation. Excel finds the Pearson Correlation
Coefficient using either the fx function Correl or the Data
Analysis function Correlation. The former is used for a single
data set with two variables and returns a single value, while the
latter can be used for a single or multiple data sets and returns
a table. The Data Analysis Correlation output for the
Performance Rating and Salary correlation is:

            Column 1    Column 2
Column 1    1
Column 2    0.151307    1

Note the variable names are not included, and we have three
correlations. Two will always show a perfect +1.00 correlation:
column 1 with column 1 and column 2 with column 2. This
diagonal convention makes more sense with the larger Correlation
table we will look at below. The third correlation is that of
column 1 with column 2. It does not matter which variable is
placed in column 1 or 2, as the result is the same when the
columns are switched.
We can use the Correlation function to identify correlations
between multiple data sets at the same time, much as
Descriptive Statistics could work with multiple variables at
once. In trying to identify what variables might be impacting
Salary, we could generate the following table. Remember that
Pearson’s Correlation requires at least interval level data, so
not all of our variables are used. In addition, since Salary
and Compa-ratio are two measures of the same thing (pay), we
do not want to include them in the same table.
            Sal      Mid      Age      Perf Rat  Service  Raise
Sal         1.000
Mid         0.989    1.000
Age         0.544    0.567    1.000
Perf Rat    0.151    0.192    0.139    1.000
Service     0.452    0.471    0.565    0.226     1.000
Raise      -0.041   -0.029   -0.180    0.674     0.103    1.000
To identify all of the correlations for a single variable, find
its name in the left column. Then go across the row until you
reach the 1.00 value, and then go down that column. For Age, we
find that the correlation with:
Sal = 0.544,
Mid = 0.567,
Age (itself) = 1.00,
Perf Rat = 0.139,
Service = 0.565, and
Raise = -0.180.
Side note: now we can see why the correlation of each variable
with itself is shown in the table: it provides the pivot point
for reading the table outcomes. The values above this diagonal of
1.00 values would be identical to those below, so they are left
out to make the table visually easier to read.
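The "across to the 1.00, then down" reading rule can be sketched in a few lines of Python. This is only an illustration: the names lower and correlations_for are made up here, and the numbers are the lower-triangle values of the lecture's table as best they can be read from the text (the Age row and column match the values listed above).

```python
# Lower triangle of the correlation table; the mirror-image values
# above the diagonal are not stored, just as in the Excel output.
names = ["Sal", "Mid", "Age", "Perf Rat", "Service", "Raise"]
lower = [
    [1.000],
    [0.989, 1.000],
    [0.544, 0.567, 1.000],
    [0.151, 0.192, 0.139, 1.000],
    [0.452, 0.471, 0.565, 0.226, 1.000],
    [-0.041, -0.029, -0.180, 0.674, 0.103, 1.000],
]

def correlations_for(variable):
    """Go across the variable's row to the 1.000, then down its column."""
    i = names.index(variable)
    result = {}
    for j, other in enumerate(names):
        # Each pair is stored once, in the row of whichever variable
        # comes later in the list.
        result[other] = lower[i][j] if j <= i else lower[j][i]
    return result

print(correlations_for("Age"))
```

Calling correlations_for("Age") returns the same six values listed above, including the 1.000 for Age with itself.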
Coefficient of Determination. We will look at determining the
statistical significance of correlations in the third lecture for
this week. In the meantime, we can consider the Coefficient of
Determination as a rough measure of usefulness (we will look at
the effect size measure in the third lecture as well). The
coefficient of determination is the square of the correlation, and
represents the percent of variation that the variables share in
common; that is, the amount of variation in one variable that is
explained by the variation in the other variable. So, for age and
salary, the coefficient equals 0.544^2 = 0.30 (rounded), or about
30%. As a rule of thumb, variable pairs with coefficients less
than (<) 70% are generally not very valuable for prediction
purposes.
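As a quick check of the arithmetic, the following Python sketch squares two correlations from the table and applies the 70% rule of thumb. The function names are invented for this illustration, and the 0.70 cut-off is simply the rule of thumb stated above, not a formal test.

```python
def coefficient_of_determination(r):
    """Square of the correlation: the share of variation in common."""
    return r ** 2

def useful_for_prediction(r, threshold=0.70):
    """Rule of thumb from the lecture: below 70% is of limited value."""
    return coefficient_of_determination(r) >= threshold

for pair, r in [("Age and Salary", 0.544), ("Midpoint and Salary", 0.989)]:
    r2 = coefficient_of_determination(r)
    print(pair, round(r2, 2), useful_for_prediction(r))
```

Under this rule, Age explains about 30% of the variation in Salary and falls short of the cut-off, while Midpoint comfortably exceeds it.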
References
Lind, D. A., Marchel, W. G., & Wathen, S. A. (2008).
Statistical Techniques in Business & Finance. (13th Ed.)
Boston: McGraw-Hill Irwin.
Tanner, D. E. & Youssef-Morgan, C. M. (2013). Statistics for
Managers. San Diego, CA: Bridgeport Education.
Week 4 Lecture 12
Significance
Earlier we discussed correlations without going into how we can
identify statistically significant values. Our approach to this
uses the t-test. Unfortunately, Excel does not automatically
produce this form of the t-test, but setting it up within an Excel
cell is fairly easy. And, with some slight algebra, we can
determine the minimum value that is statistically significant for
any table of correlations all of which have the same number of
pairs (for example, a Correlation table for our data set would
use 50 pairs of values, since we have 50 members in our
sample).
The t-test formula for a correlation (r) is
t = r * sqrt(n-2) / sqrt(1 - r^2);
the associated degrees of freedom are n-2 (number of pairs
minus 2) (Lind, Marchel, & Wathen, 2008). For some this might
look a bit off-putting, but remember that we can translate this
into Excel cells and functions and have Excel do the arithmetic
for us.
Excel Example
If we go back to our correlation table for salary, midpoint, Age,
Perf Rat, Service, and Raise, we have:
Entering the formula in Excel, with cell references for our key
values (r and n), lets us quickly produce the t value. The
T.DIST.2T function then gives us the two-tailed p-value easily.
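For readers who want to see the arithmetic outside of Excel, here is a minimal Python sketch of the same t formula. The function name t_for_correlation is made up for this example. Because the Python standard library has no t-distribution function, the sketch compares each t value with 2.0106, the two-tailed critical value for alpha = 0.05 and 48 degrees of freedom that this lecture obtains from Excel, instead of computing a p-value.

```python
import math

def t_for_correlation(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

n = 50                # pairs of values in the sample
t_critical = 2.0106   # two-tailed, alpha = 0.05, df = 48 (taken from the lecture)

for pair, r in [("Perf Rat and Salary", 0.151), ("Age and Salary", 0.544)]:
    t = t_for_correlation(r, n)
    verdict = "significant" if abs(t) >= t_critical else "not significant"
    print(pair, round(t, 3), verdict)
```

The weak Performance Rating correlation produces a t value well under the critical value, while the Age correlation produces one well above it.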
The formula to use in finding the minimum correlation value
that is statistically significant is r = sqrt(t^2/(t^2 + n-2)). We
find the appropriate t value by using
T.INV.2T(alpha, df) with alpha = 0.05 and df = n-2, or 48.
Plugging these values into the function gives us a t-value of
2.0106, or 2.011 (rounded).
Putting 2.011 and 48 (n-2) into our formula gives us an r value
of about 0.279; therefore, in a correlation table based on 50
pairs, any correlation with an absolute value of 0.279 or more
would be statistically significant.
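The minimum significant r can also be checked with a short Python sketch. The function name minimum_significant_r is invented for this example, and the critical t value of 2.0106 is the one the lecture obtains from Excel for alpha = 0.05 and df = 48.

```python
import math

def minimum_significant_r(t_critical, n):
    """r = sqrt(t^2 / (t^2 + n - 2)) for a table built on n pairs."""
    return math.sqrt(t_critical ** 2 / (t_critical ** 2 + n - 2))

r_min = minimum_significant_r(2.0106, 50)
print(round(r_min, 4))

# Any correlation in the 50-pair table can now be screened at a glance.
for name, r in [("Perf Rat", 0.151), ("Age", 0.544), ("Raise", -0.041)]:
    print(name, abs(r) >= r_min)
```

The result is roughly 0.28. Note the use of the absolute value: a strong negative correlation is just as significant as a strong positive one.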
Technical Point. If you are interested in how we obtained the
formula for determining the minimum r value, the approach is
shown below. If you are not interested in the math, you can
safely skip this paragraph.
t = r * sqrt(n-2) / sqrt(1 - r^2)
Multiplying gives us: t * sqrt(1 - r^2) = r * sqrt(n-2)
Squaring gives us: t^2 * (1 - r^2) = r^2 * (n-2)
Multiplying out gives us: t^2 - t^2 * r^2 = n * r^2 - 2 * r^2
Adding gives us: t^2 = n * r^2 - 2 * r^2 + t^2 * r^2
Factoring gives us: t^2 = r^2 * (n - 2 + t^2)
Dividing gives us: t^2 / (n - 2 + t^2) = r^2
Taking the square root gives us: r = sqrt(t^2 / (n - 2 + t^2))
Effect Size Measures
As we have discussed, there is a difference between statistical
and practical significance. Virtually any statistic can become
statistically significant if the sample is large enough. In
practical terms, a correlation of .30 and below is generally
considered too weak to be of any practical significance.
Additionally, the effect size measure for Pearson’s correlation
is simply the absolute value of the correlation; the outcome has
the same general interpretation as Cohen’s d for the t-test (0.8
is strong, and 0.2 is quite weak, for example) (Tanner &
Youssef-Morgan, 2013).
Spearman’s Rank Correlation
Another type of correlation is Spearman’s rank order
correlation, rho. This correlation, which is interpreted the same
way as Pearson’s Correlation, can be performed on ordinal or
any ranked data (Tanner & Youssef-Morgan, 2013). Using the same
data, but assuming at least one variable is ordinal, gives us the
following results. Note that in ranking from low to high, tied
values are each given the average rank of the places they occupy.
For example, in the table below the raise of 4.7 occurs twice
(the 3rd and 4th places), so each gets a rank of 3.5.
PR Rank  Performance  Raise  Raise  Difference  Difference
         Rating              Rank   in rank     squared
1        55           3      1       0           0
2        75           3.6    2       0           0
4        80           4.7    3.5     0.5         0.25
9        100          4.7    3.5     5.5        30.25
9        100          4.8    5       4          16
4        80           4.9    6      -2           4
4        80           5.6    7      -3           9
9        100          5.7    8       1           1
6.5      90           5.8    9      -2.5         6.25
6.5      90           6      10     -3.5        12.25
                                     Sum =      79
Spearman’s rank order correlation = 1 - 6 * (sum of differences
squared) / (n * (n^2 - 1))
For this data, the sum of the squared differences = 79, and
n = 10. This gives us a value of 1 - 6 * 79 / (10 * (10^2 - 1))
= 1 - 6 * 79 / (10 * 99) = 1 - 6 * (79/990) = 1 - 6 * 0.08 = 0.52.
For comparison purposes, the Pearson Correlation equals 0.686.
Note that we have less information about the data when we use
ranks, particularly with several ties in the data. This reduced
information results in a lower correlation value with
Spearman’s. This correlation is tested and interpreted the same
way as Pearson’s Coefficient is (Lind, Marchel, & Wathen,
2008).
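The ranking and the Spearman calculation above can be reproduced with a short Python sketch, shown here only as a cross-check on the hand calculation. The helper names average_ranks and pearson_r are made up for this example; the ten data pairs are the ones in the table.

```python
import math

perf_rating = [55, 75, 80, 100, 100, 80, 80, 100, 90, 90]
raise_pct = [3, 3.6, 4.7, 4.7, 4.8, 4.9, 5.6, 5.7, 5.8, 6]

def average_ranks(values):
    """Rank from low to high; tied values share the average of their places."""
    ordered = sorted(values)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1           # first place this value occupies
        count = ordered.count(v)               # number of tied occurrences
        ranks.append(first + (count - 1) / 2)  # average of the tied places
    return ranks

def pearson_r(x, y):
    """Pearson correlation on the raw (unranked) values, for comparison."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

pr_rank = average_ranks(perf_rating)
raise_rank = average_ranks(raise_pct)

# Sum of squared rank differences, then the rank order formula.
sum_d2 = sum((a - b) ** 2 for a, b in zip(pr_rank, raise_rank))
n = len(perf_rating)
rho = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

print(sum_d2, round(rho, 2), round(pearson_r(perf_rating, raise_pct), 3))
```

The sketch uses the same simplified formula as the lecture. With tied ranks that formula is an approximation, so statistical packages that adjust for ties may report a slightly different rho.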
References
Lind, D. A., Marchel, W. G., & Wathen, S. A. (2008).
Statistical Techniques in Business & Finance. (13th Ed.)
Boston: McGraw-Hill Irwin.
Tanner, D. E. & Youssef-Morgan, C. M. (2013). Statistics for
Managers. San Diego, CA: Bridgeport Education.
Week 4 Lecture 11
Regression Analysis
Regression analysis is the development of an equation that
shows the impact of the independent variables (the inputs we
can generally control) on the output result. While the
mathematical language may sound strange, most of you are
quite familiar with regression-like instructions and use them
quite regularly.
To make a cake, we take 1 box mix, add 1¼ cups of water, ½
cup of oil, and 3 eggs. All of this is combined and cooked. The
recipe is an example of a regression equation. The output (or
result or dependent variable) is the cake, the inputs (or
independent variables) are the inputs used. Each input is
accompanied by a coefficient (AKA weight or amount) that tells
us how “much” of the variable is “used” or weighted into the
outcome.
So, in an equation format, this cake recipe might look like:
Y = 1*X1 + 1.25*X2 + 0.5*X3 + 3*X4 where:
Y = cake
X1 = box mix
X2 = cups of water
X3 = cups of oil
X4 = eggs.
Of course, for the cake, the recipe needs to go through the
cooking process, while for other regression equations the
inputs need to go through whatever “process” turns them
into the output; this is often called “life.”
Example
With a regression analysis, we can identify what factors
influence an outcome. Our research question is whether males
and females get equal pay for equal work, so the natural
question to ask of our Salary data is: what factors influence
or explain an individual’s pay? This is a perfect question for
a multi-variate regression. Multi-variate simply means we have
multiple input variables with a single output variable (Lind,
Marchel, & Wathen, 2008).
Variables. A regression analysis uses two distinct types of data.
The first are variables that are at least interval level or better
(the same as the other techniques we have used so far). The
other is called a dummy variable, a variable that can be coded 0
or 1 indicating the presence of some characteristic. In our data
set, we have two variables that can be used as dummy coded
variables in a regression, Degree and Gender; both coded 0 or 1.
In the case of Degree, the 0 stands for having a bachelor’s
degree and the 1 stands for having an advanced degree. For
Gender, 0 means a male and 1 means a female. How these are
interpreted in a regression output will be discussed below. For
now, the significance of dummy coding is that it allows us to
include nominal or ordinal data in our analysis.
Excel Approach. For our question of what factors influence
pay, we will use Excel’s Regression function found in the Data
Analysis section. This function will produce two output tables
of interest. The first table tests to see if the entire regression
equation is statistically significant; that is, do the input
variables significantly impact the output variable. If so, we
would then examine the second table – the coefficients used in a
regression equation for each of the variables. For each variable
we would have a second set of hypothesis statements: the null
is that the coefficient equals 0, versus an alternate that the
coefficient is not equal to 0. Typically, we list these before we
start the analysis.
Step 1: For the regression equation:
Ho: The regression equation is not significant.
Ha: The regression equation is significant.
For the coefficients, if the regression equation is
significant:
Ho: The regression coefficient equals 0.
Ha: The regression coefficient is not equal to 0.
Note: We would write one pair of statements for each variable;
for space reasons, we include only one general statement that
should be applied to each variable.
Step 2: Reject each null hypothesis claim if the related p-value
< (is less than) alpha = .05.
Step 3: Regression Analysis
Step 4: Perform the test. Selecting the Regression option in
Data Analysis will open a familiar data entry box. The Input Y
Range would be the salary range including the label. The Input
X Range would be the labels and data for our input variables. In
this case we will use Midpoint, Age, Performance Rating,
Service, Raise, Degree, and Gender. Be sure to check the Labels
box and pick the upper left corner of an output range. This will
result in the following output (values rounded to three decimal
places):
Step 5: Conclusions and Interpretation. Let’s look at each table
separately.
The Regression Statistics table shows a Multiple R and an R
Square value. Multiple R is the multiple correlation value.
Similar to our Pearson coefficient, it shows the relationship
between the dependent variable (the output, or Salary in this
case) and all of the independent or input variables. R Square is
the multiple coefficient of determination; similar to the Pearson
coefficient of determination, it displays the percent of variation
in common between the dependent and all of the independent
variables.
The adjusted R square reduces the R square by a factor that
involves the number of variables and the sample size; a large
reduction would suggest that the design impacted the outcome
more than the variables did. Here the reduction is negligible.
The standard error is a measure of variation in the outcome used
for predictions. The count shows the number of cases used in the
regression.
The ANOVA table, sometimes called ANOR (analysis of
regression), provides us with our test of significance outcome.
Similar to the ANOVA covered in Week 3, we look at the
Significance F (AKA p-value) to see if we reject or fail to
reject the null hypothesis of no significance. In this case, the
p-value of 8.44E-36 (that is, 8.44 x 10^-36, a decimal point
followed by 35 zeros before the 844) is less than
.05, so we reject the null of no significance. The regression
equation explains a significant proportion of the variation in our
dependent variable of salary.
Now that we have a significant regression equation, we move on
to the final table that presents and tests the coefficients for each
variable. One of the important parts of a regression equation is
that it shows us the impact of each factor if all other factors are
held constant. A regression has the form:
Y = A + B1*X1 + B2*X2 + B3*X3 + …, where Y is the
output, A is the intercept (it places the line up or down on the
Y axis when all other values are 0), the B’s are the coefficient
values, and the X’s are the variables. Before considering
whether each coefficient is statistically significant or not, our
equation would be:
Salary = -4.009 + 1.22*Midpoint + 0.029*Age - 0.096*Perf Rat
- 0.074*Service + 0.834*Raise + 1.002*Degree + 2.552*Gender.
Whew!
What does this mean? The intercept is an adjustment factor,
one that we do not need to analyze. For midpoint, it means that
as midpoint goes up by a thousand dollars (remember salary and
midpoint are measured in thousands), the salary goes up by 1.22
thousand; higher graded employees are paid relatively more
compared to midpoint than others (all other things equal). For
Performance Rating, employees lose $96 (-0.096 thousand) for
every additional PR point they have, certainly not what HR would
like!
Now, let’s look at our dummy variables, Degree and Gender.
For Degree, an extra $1,002 is added for employees having a
Degree code of 1; if Degree = 0, then +1.002*0 = 0. So graduate
degree holders get an extra $1,002 per year. The same thing
applies to Gender: those coded 0 get nothing extra and those
coded 1 get $2,552 more per year (all other things equal). Since
females are coded 1, if this factor is significant, they would be
paid $2,552 more than males with all other factors equal (the
definition of equal work).
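To see how the equation and the dummy coding work together, here is a small Python sketch that plugs values into the regression equation above. The coefficients are the ones reported in this lecture. The employee values are hypothetical (not a row from the course data set), and the names COEFFICIENTS, INTERCEPT, and predicted_salary are made up for this illustration.

```python
# Coefficients from the lecture's regression output (salary in $ thousands).
INTERCEPT = -4.009
COEFFICIENTS = {
    "Midpoint": 1.22,
    "Age": 0.029,
    "Perf Rat": -0.096,
    "Service": -0.074,
    "Raise": 0.834,
    "Degree": 1.002,   # dummy: 0 = bachelor's, 1 = advanced degree
    "Gender": 2.552,   # dummy: 0 = male, 1 = female
}

def predicted_salary(employee):
    """Y = A + B1*X1 + B2*X2 + ...; result is in thousands of dollars."""
    return INTERCEPT + sum(
        COEFFICIENTS[name] * employee[name] for name in COEFFICIENTS
    )

# Hypothetical employee; only the Gender code differs between the two.
male = {"Midpoint": 40, "Age": 35, "Perf Rat": 80, "Service": 5,
        "Raise": 4.5, "Degree": 0, "Gender": 0}
female = dict(male, Gender=1)

print(round(predicted_salary(male), 3))
print(round(predicted_salary(female) - predicted_salary(male), 3))
```

Because every other input is identical, the difference between the two predictions is exactly the Gender coefficient. This is what "all other things equal" means in a regression.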
So, now let’s take a look at the statistical significance of each
of the variables. This is determined with the P-value column
(next to the t Stat value). This is read the same way as
in the t-test and ANOVA tables: if the value is less than 0.05 we
reject the null hypothesis of no significance.
While the intercept has a significance value, we tend to ignore
it and include the intercept in all equations. For the other
variables, the only significant variables are: Midpoint, Perf
Rating (unrounded it was 0.0497994…), and Gender. So, the
regression equation including only our statistically significant
factors is Sal = -4.009 + 1.22*Midpoint - 0.096*Perf Rat +
2.552*Gender.
So, we now have a clear answer to our question about males and
females getting equal pay for equal work. Not only is the
answer no (as gender is a significant factor in determining
salary), but females are also paid $2,552 more annually, all other
things equal!
This is certainly not the outcome most of us expected when we
began this journey. What we see is that variation within any
measure has some often unanticipated outcomes, and unless we
examine the inputs into our results, we often do not understand
them very well. Single measure tests such as the t and ANOVA
tests are quite valuable comparing similar results, but they do
not always get to the root of what causes differences.
Reference
Lind, D. A., Marchel, W. G., & Wathen, S. A. (2008).
Statistical Techniques in Business & Finance. (13th Ed.)
Boston: McGraw-Hill Irwin.

Assuming you have the results of the Business Impact Analysis and ri.docxAssuming you have the results of the Business Impact Analysis and ri.docx
Assuming you have the results of the Business Impact Analysis and ri.docxcockekeshia
 
Assuming you are hired by a corporation to assess the market potenti.docx
Assuming you are hired by a corporation to assess the market potenti.docxAssuming you are hired by a corporation to assess the market potenti.docx
Assuming you are hired by a corporation to assess the market potenti.docxcockekeshia
 
Assuming that you are in your chosen criminal justice professi.docx
Assuming that you are in your chosen criminal justice professi.docxAssuming that you are in your chosen criminal justice professi.docx
Assuming that you are in your chosen criminal justice professi.docxcockekeshia
 
assuming that Nietzsche is correct that conventional morality is aga.docx
assuming that Nietzsche is correct that conventional morality is aga.docxassuming that Nietzsche is correct that conventional morality is aga.docx
assuming that Nietzsche is correct that conventional morality is aga.docxcockekeshia
 

More from cockekeshia (20)

at least 2 references in each peer responses! I noticed .docx
at least 2 references in each peer responses! I noticed .docxat least 2 references in each peer responses! I noticed .docx
at least 2 references in each peer responses! I noticed .docx
 
At least 2 pages longMarilyn Lysohir, an internationally celebra.docx
At least 2 pages longMarilyn Lysohir, an internationally celebra.docxAt least 2 pages longMarilyn Lysohir, an internationally celebra.docx
At least 2 pages longMarilyn Lysohir, an internationally celebra.docx
 
At least 2 citations. APA 7TH EditionResponse 1. TITop.docx
At least 2 citations. APA 7TH EditionResponse 1. TITop.docxAt least 2 citations. APA 7TH EditionResponse 1. TITop.docx
At least 2 citations. APA 7TH EditionResponse 1. TITop.docx
 
At each decision point, you should evaluate all options before selec.docx
At each decision point, you should evaluate all options before selec.docxAt each decision point, you should evaluate all options before selec.docx
At each decision point, you should evaluate all options before selec.docx
 
At an elevation of nearly four thousand metres above sea.docx
At an elevation of nearly four thousand metres above sea.docxAt an elevation of nearly four thousand metres above sea.docx
At an elevation of nearly four thousand metres above sea.docx
 
At a minimum, your outline should include the followingIntroducti.docx
At a minimum, your outline should include the followingIntroducti.docxAt a minimum, your outline should include the followingIntroducti.docx
At a minimum, your outline should include the followingIntroducti.docx
 
At least 500 wordsPay attention to the required length of these.docx
At  least 500 wordsPay attention to the required length of these.docxAt  least 500 wordsPay attention to the required length of these.docx
At least 500 wordsPay attention to the required length of these.docx
 
At a generic level, innovation is a core business process concerned .docx
At a generic level, innovation is a core business process concerned .docxAt a generic level, innovation is a core business process concerned .docx
At a generic level, innovation is a core business process concerned .docx
 
Asymmetric Cryptography•Description of each algorithm•Types•Encrypt.docx
Asymmetric Cryptography•Description of each algorithm•Types•Encrypt.docxAsymmetric Cryptography•Description of each algorithm•Types•Encrypt.docx
Asymmetric Cryptography•Description of each algorithm•Types•Encrypt.docx
 
Astronomy HWIn 250-300 words,What was Aristarchus idea of the.docx
Astronomy HWIn 250-300 words,What was Aristarchus idea of the.docxAstronomy HWIn 250-300 words,What was Aristarchus idea of the.docx
Astronomy HWIn 250-300 words,What was Aristarchus idea of the.docx
 
Astronomy ASTA01The Sun and PlanetsDepartment of Physic.docx
Astronomy ASTA01The Sun and PlanetsDepartment of Physic.docxAstronomy ASTA01The Sun and PlanetsDepartment of Physic.docx
Astronomy ASTA01The Sun and PlanetsDepartment of Physic.docx
 
Astronomers have been reflecting laser beams off the Moon since refl.docx
Astronomers have been reflecting laser beams off the Moon since refl.docxAstronomers have been reflecting laser beams off the Moon since refl.docx
Astronomers have been reflecting laser beams off the Moon since refl.docx
 
Astrategicplantoinformemergingfashionretailers.docx
Astrategicplantoinformemergingfashionretailers.docxAstrategicplantoinformemergingfashionretailers.docx
Astrategicplantoinformemergingfashionretailers.docx
 
Asthma, Sleep, and Sun-SafetyPercentage of High School S.docx
Asthma, Sleep, and Sun-SafetyPercentage of High School S.docxAsthma, Sleep, and Sun-SafetyPercentage of High School S.docx
Asthma, Sleep, and Sun-SafetyPercentage of High School S.docx
 
Asthma DataSchoolNumStudentIDGenderZipDOBAsthmaRADBronchitisWheezi.docx
Asthma DataSchoolNumStudentIDGenderZipDOBAsthmaRADBronchitisWheezi.docxAsthma DataSchoolNumStudentIDGenderZipDOBAsthmaRADBronchitisWheezi.docx
Asthma DataSchoolNumStudentIDGenderZipDOBAsthmaRADBronchitisWheezi.docx
 
Assumption-Busting1. What assumption do you have that is in s.docx
Assumption-Busting1.  What assumption do you have that is in s.docxAssumption-Busting1.  What assumption do you have that is in s.docx
Assumption-Busting1. What assumption do you have that is in s.docx
 
Assuming you have the results of the Business Impact Analysis and ri.docx
Assuming you have the results of the Business Impact Analysis and ri.docxAssuming you have the results of the Business Impact Analysis and ri.docx
Assuming you have the results of the Business Impact Analysis and ri.docx
 
Assuming you are hired by a corporation to assess the market potenti.docx
Assuming you are hired by a corporation to assess the market potenti.docxAssuming you are hired by a corporation to assess the market potenti.docx
Assuming you are hired by a corporation to assess the market potenti.docx
 
Assuming that you are in your chosen criminal justice professi.docx
Assuming that you are in your chosen criminal justice professi.docxAssuming that you are in your chosen criminal justice professi.docx
Assuming that you are in your chosen criminal justice professi.docx
 
assuming that Nietzsche is correct that conventional morality is aga.docx
assuming that Nietzsche is correct that conventional morality is aga.docxassuming that Nietzsche is correct that conventional morality is aga.docx
assuming that Nietzsche is correct that conventional morality is aga.docx
 

Recently uploaded

भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,Virag Sontakke
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTiammrhaywood
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdfssuser54595a
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxOH TEIK BIN
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxRaymartEstabillo3
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxNirmalaLoungPoorunde1
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Celine George
 
Hierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementHierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementmkooblal
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Meghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media ComponentMeghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media ComponentInMediaRes1
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxGaneshChakor2
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Educationpboyjonauth
 
MARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupMARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupJonathanParaisoCruz
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for BeginnersSabitha Banu
 
Pharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfPharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfMahmoud M. Sallam
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
 
Capitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptxCapitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptxCapitolTechU
 

Recently uploaded (20)

भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptx
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
 
Hierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementHierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of management
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 
Meghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media ComponentMeghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media Component
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptx
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
MARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupMARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized Group
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for Beginners
 
Pharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfPharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdf
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
9953330565 Low Rate Call Girls In Rohini Delhi NCR
9953330565 Low Rate Call Girls In Rohini  Delhi NCR9953330565 Low Rate Call Girls In Rohini  Delhi NCR
9953330565 Low Rate Call Girls In Rohini Delhi NCR
 
OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...
 
Capitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptxCapitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptx
 

Week 4 Lecture 10 We have been examining the question of equal p.docx

For variables that are at least interval in nature, two types of correlation exist for a bivariate (two variables only) relationship: linear and curvilinear. As they sound, linear correlations show the extent to which the data variables move in a straight line. Curvilinear correlations, which we will not cover, show the extent to which variables move in curved lines.
Scatter Diagrams
An effective way to see if the data relate in predictable ways is to generate a scatter diagram (AKA scatter chart), a visual display of how the data points (variable 1 value, corresponding variable 2 value) relate to each other (Lind, Marchel, & Wathen, 2008).
Example 1. One relationship we might expect to be positive (both values increasing together) is that between salary and performance rating, either for the entire salary range or at least within grades. The following scatter diagram (made with the Excel Insert Graph functions) shows the relationship, with Performance Rating on the horizontal axis and Salary on the vertical axis. If we put a straight line through the data points, there is a very modest increase from the lower left to the upper right.
Figure: Salary (Y-axis) and Performance Rating (X-axis)
Example 2. If we look at the same variables, but include Grade as a factor, we get the second graph (below) and see the data separated by grade. Each grade seems to show (again, if we were to put a straight line through the data points for each grade) a level line, indicating no correlation at all. Neither graph gives us much hope that Performance Rating is related to Salary, something HR would probably not be happy with.
Figure: Salary Grades (Y-axis) and Performance Appraisal Rating (X-axis)
Correlation
We will be focusing our efforts on the Pearson Correlation
Coefficient, a mathematical value that shows the strength of the linear (straight line) relationship between two variables (Lind, Marchel, & Wathen, 2008). The math formula is a bit tedious, so we will not bother with it; if interested, you can ask Excel to display it, either with Help or with the “Tell me what you want to do” box. With the latter, I typed “show help on Pearson Correlation” and then selected the “show help…” line, getting a description and the math formula.
The Pearson correlation ranges from -1.00 to +1.00. Any value outside of this range indicates an error in the math or setup. A perfect negative correlation (-1.00) means that the data points all fit exactly on a line that runs from the upper left corner to the lower right on a graph, a negative slope. A perfect positive correlation (+1.00) has the line with a positive slope, running from the lower left to the upper right (Tanner & Youssef-Morgan, 2013). As the values move away from the perfect extremes, the data points move away from a line to a spread around the line. If we look at our first graph above, the overall Salary and Performance Rating relationship, we have a correlation of +0.15, considered very low and not particularly impressive.
Pearson Correlation. Excel finds the Pearson Correlation Coefficient using either the fx function Correl or the Data Analysis function Correlation. The former is used for a single data set with two variables and returns a single value, while the latter can be used for a single or multiple data sets and returns a table. The Data Analysis Correlation output for the Performance Rating and Salary correlation is:
            Column 1    Column 2
Column 1    1
Column 2    0.151307    1
Note the variable names are not included, and we have three correlations. Two will always show a perfect +1.00 correlation (column 1 with column 1 and column 2 with column 2); this
diagonal convention makes more sense with the Correlation table we will look at below. The third correlation is that of column 1 with column 2. It does not matter which variable is placed in column 1 or 2, as switching the columns gives the same result.
We can use the Correlation function to identify correlations between multiple data sets at the same time, much as Descriptive Statistics could work with multiple variables at once. In trying to identify what variables might be impacting Salary, we could generate the following table. Remember that Pearson’s Correlation requires at least interval level data, so not all of our variables are used. In addition, since Salary and Compa-ratio are two measures of the same thing (pay), we do not want to include them in the same table.
            Sal      Mid      Age      Perf Rat  Service  Raise
Sal         1.000
Mid         0.989    1.000
Age         0.544    0.567    1.000
Perf Rat    0.151    0.192    0.139    1.000
Service     0.452    0.471    0.565    0.226     1.000
Raise      -0.041   -0.029   -0.180    0.674     0.103    1.000
To identify all of the correlations for a single variable, find its name in the left column. Go across that row until you reach the 1.000 value, then go down the column. For Age, we find that the correlation with:
Sal = 0.544, Mid = 0.567, Age (itself) = 1.00, Perf Rat = 0.139, Service = 0.565, and Raise = -0.180.
Side note: now we can see why the correlation of a variable with itself is shown in the tables; it provides the pivot point for reading the table outcomes. The values above this diagonal of 1.00 values would be identical to those below, so they are omitted to make the table visually easier to read.
Coefficient of Determination. We will look at determining the statistical significance of correlations in lecture three for this week. In the meantime, we can consider the Coefficient of Determination as a rough measure of usefulness (we will look at the effect size measure in lecture three as well). The coefficient of determination is the square of the correlation, and represents the percent of variation that the variables share in common; that is, the amount of variation in one variable that is explained by the variation in the other variable. So, for Age and Salary, the coefficient equals 0.544^2 = 0.30 (rounded), or about 30%. As a rule of thumb, variable pairs with coefficients less than (<) 70% are generally not very valuable for prediction purposes.
References
Lind, D. A., Marchel, W. G., & Wathen, S. A. (2008). Statistical Techniques in Business & Finance. (13th Ed.) Boston: McGraw-Hill Irwin.
Tanner, D. E. & Youssef-Morgan, C. M. (2013). Statistics for Managers. San Diego, CA: Bridgeport Education.
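As a rough illustration of the calculation behind Excel's Correl function and the coefficient of determination discussed in this lecture, here is a minimal Python sketch. The function name pearson_r and the small height/weight data set are invented for illustration; they are not part of the course data set or of Excel.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length lists of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of the same length with at least 2 pairs")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products of the deviations from each mean
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    # Note: fails with a division by zero if either variable never changes
    return sxy / math.sqrt(sxx * syy)


# Hypothetical height (cm) and weight (kg) values, made up for this sketch
heights = [100, 110, 120, 130, 140]
weights = [16, 19, 22, 27, 33]

r = pearson_r(heights, weights)
print(f"r = {r:.3f}, coefficient of determination = {r ** 2:.3f}")

# Coefficient of determination for Age and Salary, using r = 0.544 from the table
age_salary_r2 = 0.544 ** 2
print(f"{age_salary_r2:.2f}")  # prints 0.30
```

Data that fall exactly on a rising straight line return +1.00 from this function, and data on a falling line return -1.00, matching the range described above.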
Week 4 Lecture 12
Significance
Earlier we discussed correlations without going into how we can identify statistically significant values. Our approach to this uses the t-test. Unfortunately, Excel does not automatically produce this form of the t-test, but setting it up within an Excel cell is fairly easy. And, with some slight algebra, we can determine the minimum value that is statistically significant for any table of correlations in which every correlation is based on the same number of pairs (for example, a Correlation table for our data set would use 50 pairs of values, since we have 50 members in our sample).
The t-test formula for a correlation (r) is t = r * sqrt(n-2) / sqrt(1 - r^2); the associated degrees of freedom are n-2 (number of pairs - 2) (Lind, Marchel, & Wathen, 2008). For some this might look a bit off-putting, but remember that we can translate this into Excel cells and functions and have Excel do the arithmetic for us.
Excel Example
If we go back to our correlation table for Salary, Midpoint, Age, Perf Rat, Service, and Raise, we have:
Using Excel to create the formula, with cell references for our key values, allows us to quickly create a result. The T.DIST.2T function gives us a p-value easily.
The formula to use in finding the minimum correlation value that is statistically significant is r = sqrt(t^2 / (t^2 + n-2)). We would find the appropriate t value by using T.INV.2T(alpha, df) with alpha = 0.05 and df = n-2, or 48. Plugging these values into the function gives us a t-value of 2.0106, or 2.011 (rounded). Putting 2.011 and 48 (n-2) into our formula gives us an r value of 0.279 (rounded); therefore, in a correlation table based on 50 pairs, any correlation with an absolute value of 0.279 or greater would be statistically significant.
Technical Point. If you are interested in how we obtained the
formula for determining the minimum r value, the approach is shown below. If you are not interested in the math, you can safely skip this paragraph.
t = r * sqrt(n-2) / sqrt(1 - r^2)
Multiplying both sides by sqrt(1 - r^2) gives us: t * sqrt(1 - r^2) = r * sqrt(n-2)
Squaring gives us: t^2 * (1 - r^2) = r^2 * (n-2)
Multiplying out gives us: t^2 - t^2 * r^2 = n * r^2 - 2 * r^2
Adding t^2 * r^2 to both sides gives us: t^2 = n * r^2 - 2 * r^2 + t^2 * r^2
Factoring gives us: t^2 = r^2 * (n - 2 + t^2)
Dividing gives us: t^2 / (n - 2 + t^2) = r^2
Taking the square root gives us: r = sqrt(t^2 / (n - 2 + t^2))
Effect Size Measures
As we have discussed, there is a difference between statistical and practical significance. Virtually any statistic can become statistically significant if the sample is large enough. In practical terms, a correlation of .30 or below is generally considered too weak to be of any practical significance. Additionally, the effect size measure for Pearson’s correlation is simply the absolute value of the correlation; the outcome has the same general interpretation as Cohen’s d for the t-test (0.8 is strong, and 0.2 is quite weak, for example) (Tanner & Youssef-Morgan, 2013).
Spearman’s Rank Correlation
Another type of correlation is Spearman’s rank order correlation. This correlation, which is interpreted the same way as Pearson’s Correlation, can be performed on ordinal or any ranked data. If the data used are ordinal (rankable), we use Spearman’s rank order correlation, rho (Tanner & Youssef-Morgan, 2013). Using the same data, but assuming at least one variable is ordinal, would give us the following results. Note that in ranking from low to high, tied values are each given the average of the ranks they occupy. For example, in the example below the raise of 4.7 occurs twice (the 3rd and 4th places), so it gets a rank of 3.5.
PR Rank   Performance Rating   Raise   Raise Rank   Difference in rank   Difference squared
1         55                   3       1            0                    0
2         75                   3.6     2            0                    0
4         80                   4.7     3.5          0.5                  0.25
Sum = 79
Spearman’s rank order correlation = 1 - 6 * (sum of differences squared) / (n * (n^2 - 1))
For this data, the sum of the squared differences = 79, and n = 10. This gives us a value of 1 - 6 * 79 / (10 * (10^2 - 1)) = 1 - 6 * (79 / (10 * 99)) = 1 - 6 * (79 / 990) = 1 - 6 * 0.0798 = 0.52 (rounded). For comparison purposes, the Pearson Correlation equals 0.686. Note that we have less information about the data when we use ranks, particularly with several ties in the data. Here, this reduced information results in a lower correlation value with Spearman’s. This correlation is tested and interpreted the same way as Pearson’s Coefficient is (Lind, Marchel, & Wathen, 2008).
References
Lind, D. A., Marchel, W. G., & Wathen, S. A. (2008). Statistical Techniques in Business & Finance. (13th Ed.) Boston: McGraw-Hill Irwin.
Tanner, D. E. & Youssef-Morgan, C. M. (2013). Statistics for Managers. San Diego, CA: Bridgeport Education.
Week 4 Lecture 11
Regression Analysis
Regression analysis is the development of an equation that shows the impact of the independent variables (the inputs we can generally control) on the output result. While the mathematical language may sound strange, most of you are
quite familiar with regression-like instructions and use them quite regularly. To make a cake, we take 1 box mix, add 1¼ cups of water, ½ cup of oil, and 3 eggs. All of this is combined and cooked. The recipe is an example of a regression equation. The output (or result or dependent variable) is the cake; the inputs (or independent variables) are the ingredients used. Each input is accompanied by a coefficient (AKA weight or amount) that tells us how “much” of the variable is “used” or weighted into the outcome. So, in an equation format, this cake recipe might look like:
Y = 1*X1 + 1.25*X2 + 0.5*X3 + 3*X4
where:
Y = cake
X1 = box mix
X2 = cups of water
X3 = cups of oil
X4 = an egg.
Of course, for the cake, the recipe needs to go through the cooking process; for other regression equations, the inputs need to go through whatever “process” turns them into the output. This is often called “life.”
Example
With a regression analysis, we can identify what factors influence an outcome. So, with our Salary issue, the natural question to help us answer our research question (do males and females get equal pay for equal work?) would be: what factors influence or explain an individual’s pay? This is a perfect question for a multi-variate regression. Multi-variate simply means we have multiple input variables with a single output variable (Lind, Marchel, & Wathen, 2008).
Variables. A regression analysis uses two distinct types of data. The first are variables that are at least interval level (the same as the other techniques we have used so far). The other is called a dummy variable, a variable that can be coded 0 or 1 indicating the presence of some characteristic. In our data set, we have two variables that can be used as dummy coded variables in a regression, Degree and Gender; both are coded 0 or 1. In the case of Degree, the 0 stands for having a bachelor’s
degree and the 1 stands for having an advanced degree. For Gender, 0 means a male and 1 means a female. How these are interpreted in a regression output will be discussed below. For now, the significance of dummy coding is that it allows us to include nominal or ordinal data in our analysis.
Excel Approach. For our question of what factors influence pay, we will use Excel’s Regression function found in the Data Analysis section. This function will produce two output tables of interest. The first table tests whether the entire regression equation is statistically significant; that is, do the input variables significantly impact the output variable? If so, we would then examine the second table, which lists the coefficients used in the regression equation for each of the variables. We would have a second set of hypothesis statements for each variable: the null would be that the coefficient equals 0, versus an alternate that the coefficient is not equal to 0. Typically, we list these before we start the analysis.
Step 1:
For the regression equation:
Ho: The regression equation is not significant.
Ha: The regression equation is significant.
For the coefficients, if the regression equation is significant:
Ho: The regression coefficient equals 0.
Ha: The regression coefficient is not equal to 0.
Note: We would write one pair of statements for each variable; for space reasons, we include only one general statement that should be applied to each variable.
Step 2: Reject each null hypothesis claim if the related p-value is less than (<) alpha = .05.
Step 3: The statistical test is Regression Analysis.
Step 4: Perform the test. Selecting the Regression option in Data Analysis will open a familiar data entry box. The Input Y Range would be the salary range including the label. The Input X Range would be the labels and data for our input variables. In this case we will use Midpoint, Age, Performance Rating, Service, Raise, Degree, and Gender. Be sure to check the labels
  • 15. box and pick a cell for the upper-left corner of the output range. This will result in the following output (values rounded to three decimal places):
Step 5: Conclusions and Interpretation. Let’s look at each table separately.
The Regression Statistics table shows a Multiple R and an R square value. Multiple R is the multiple correlation value. Similar to our Pearson correlation coefficient, it shows the strength of the relationship between the dependent variable (the output, Salary in this case) and all of the independent or input variables taken together. R square is the multiple coefficient of determination; similar to the Pearson coefficient of determination, it gives the percent of variation the dependent variable shares with all of the independent variables. The adjusted R square reduces the R square by a factor that involves the number of variables and the sample size; a large reduction would suggest that the design, rather than the variables, drove the outcome. Here the reduction is negligible. The standard error is a measure of variation in the outcome used for predictions. The count shows the number of cases used in the regression.
The ANOVA table, sometimes called ANOR (analysis of regression), provides us with our test of significance outcome. Similar to the ANOVA covered in Week 3, we look at the Significance of F (AKA P-value) to see if we reject or fail to reject the null hypothesis of no significance. In this case, the p-value of 8.44E-36 (scientific notation for 8.44 × 10^-36, a decimal with 35 zeros before the 844) is less than .05, so we reject the null of no significance. The regression equation explains a significant proportion of the variation in our dependent variable of salary.
Now that we have a significant regression equation, we move on to the final table, which presents and tests the coefficients for each variable. One of the important parts of a regression equation is that it shows us the impact of each factor if all other factors are
  • 16. held constant. A regression has the form:
Y = A + B1*X1 + B2*X2 + B3*X3 + ….
where Y is the output, A is the intercept (the value of Y when all X values are 0, which positions the line up or down on the Y axis), the B’s are the coefficient values, and the X’s are the variables. Before considering whether each coefficient is statistically significant or not, our equation would be:
Salary = -4.009 + 1.22*Midpoint + 0.029*Age - 0.096*Perf Rat - 0.074*Service + 0.834*Raise + 1.002*Degree + 2.552*Gender.
Whew! What does this mean? The intercept is an adjustment factor, one that we do not need to analyze. For Midpoint, it means that as midpoint goes up by a thousand dollars (remember, salary and midpoint are measured in thousands), the salary goes up by 1.22 thousand; higher-graded employees are paid relatively more compared to midpoint than others (all other things equal). For Performance Rating, employees lose $96 (-0.096) for every additional performance rating point they have. That is certainly not what HR would like!
Now, let’s look at our dummy variables, Degree and Gender. For Degree, an extra $1,002 is added for employees with a Degree code of 1; for those with a Degree code of 0, the term is 1.002*0 = 0. So graduate degree holders get an extra $1,002 per year. The same thing applies to Gender: those coded 0 get nothing extra and those coded 1 get $2,552 more per year (all other things equal). Since females are coded 1, if this factor is significant, they would be paid $2,552 more than males with all other factors equal (the definition of equal work).
So, now let’s take a look at the statistical significance of each of the variables. This is determined with the P-value column (next to the t Stat value). It is read the same way as in the t-test and ANOVA tables: if the value is less than 0.05, we reject the null hypothesis of no significance. While the intercept has a significance value, we tend to ignore it and include the intercept in all equations.
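As a minimal sketch (not part of the original lecture), the full equation above can be written as a small Python function to show what “all other things equal” means for a dummy variable. The coefficients are the rounded values quoted in the text; the employee values are hypothetical and chosen only for illustration.

```python
# Rounded coefficients quoted in the lecture (salary and midpoint in $ thousands).
INTERCEPT = -4.009
COEFFICIENTS = {
    "Midpoint": 1.22,
    "Age": 0.029,
    "Perf Rat": -0.096,
    "Service": -0.074,
    "Raise": 0.834,
    "Degree": 1.002,   # dummy: 0 = bachelor's, 1 = advanced degree
    "Gender": 2.552,   # dummy: 0 = male, 1 = female
}


def predict_salary(employee):
    """Return the predicted salary in $ thousands: A + B1*X1 + B2*X2 + ..."""
    return INTERCEPT + sum(
        coefficient * employee[name] for name, coefficient in COEFFICIENTS.items()
    )


# Hypothetical employee; these input values are assumptions, not course data.
male = {
    "Midpoint": 40, "Age": 35, "Perf Rat": 80, "Service": 5,
    "Raise": 5.0, "Degree": 0, "Gender": 0,
}
# Same employee with only the Gender code changed from 0 to 1.
female = dict(male, Gender=1)

gender_gap = predict_salary(female) - predict_salary(male)
print(f"Predicted gap, all other things equal: {gender_gap:.3f} thousand dollars")
```

Changing only the Gender code from 0 to 1 shifts the prediction by exactly the Gender coefficient, whatever the other inputs are. That is what the dummy variable's coefficient measures.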
For the other variables, the only statistically significant ones are Midpoint, Perf Rating (its unrounded p-value was 0.0497994…), and Gender. So, the
  • 17. regression equation including only our statistically significant factors is Salary = -4.009 + 1.22*Midpoint - 0.096*Perf Rat + 2.552*Gender.
So, we now have a clear answer to our question about males and females getting equal pay for equal work. Not only is the answer no (as gender is a significant factor in determining salary), but females are also paid $2,552 more annually, all other things equal! This is certainly not the outcome most of us expected when we began this journey. What we see is that variation within any measure often has unanticipated sources, and unless we examine the inputs into our results, we often do not understand them very well. Single measure tests such as the t and ANOVA tests are quite valuable for comparing similar results, but they do not always get to the root of what causes differences.
Reference
Lind, D. A., Marchel, W. G., & Wathen, S. A. (2008). Statistical Techniques in Business & Finance. (13th Ed.) Boston: McGraw-Hill Irwin.