CORRELATION
Correlation
• Correlation is a statistical measure of the degree (or strength) of association between two (or more) variables. If a change in one variable is accompanied by a change in the other variable, the variables are said to be correlated.
Correlation
The measure of correlation is called the correlation coefficient.
The correlation coefficient ranges from -1 to +1 (-1 ≤ r ≤ +1).
The direction of the relationship is indicated by the sign of the coefficient.
Correlation & Causation
Causation means cause & effect
relation.
Causation always implies correlation
but correlation does not necessarily
implies causation.
Correlation – basic assumptions
• r does not change when we change the units of measurement, for example from kg to pounds for weight. Why?
 r uses standardized values of the observations.
• r does not measure or describe curved (non-linear) association, no matter how strong it is.
• Like the mean and SD, r is not resistant to outliers.
 r is strongly affected by outlying observations.
Types of Correlation
Type I (by direction):
• Positive Correlation
• Negative Correlation
Types of Correlation
 Positive Correlation: The correlation is said to be positive if the values of the two variables change in the same direction.
 Ex.: number of hours spent on study and grade in an exam.
 Negative Correlation: The correlation is said to be negative when the values of the two variables change in opposite directions.
 Ex.: number of hours spent watching TV and grades in an exam.
Direction of the Correlation
• Positive relationship – Variables change in the
same direction.
 As X is increasing, Y is increasing
 As X is decreasing, Y is decreasing
▫ E.g., As study time increases, grades increase
• Negative relationship – Variables change in
opposite directions.
 As X is increasing, Y is decreasing
 As X is decreasing, Y is increasing
▫ E.g., As TV time increases, grades decrease
More examples
• Positive relationships:
▫ Number of vehicles and air pollution.
▫ Smoking and cancer.
▫ Cholesterol level and heart disease.
• Negative relationships:
▫ Alcohol consumption and driving ability.
A perfect positive correlation
[Figure: scatterplot of height vs weight for persons A, B, etc.; all points lie exactly on a straight line, i.e. a linear relationship.]
Degree of correlation
• Moderate Positive Correlation
[Figure: scatterplot of weight vs shoe size, r = +0.4]
Degree of correlation
• Perfect Negative Correlation
[Figure: scatterplot of TV watching per week vs exam score, r = -1.0]
Degree of correlation
• Weak negative Correlation
[Figure: scatterplot of number of friends vs number of books read in a month, r = -0.2]
Strong negative Correlation
[Figure: a strong, negative relationship, but non-linear!]
Degree of correlation (r)
[Figure: four scatterplots showing r = +.80, r = +.60, r = +.40, and r = +.20]
Types of Correlation
• Simple correlation: In a simple correlation problem, only two variables are studied.
• Multiple correlation: In multiple correlation, three or more variables are studied.
• Partial correlation: The analysis recognizes more than two variables but considers only two of them, keeping the others constant.
Methods of studying correlation
• Pearson product moment correlation
• Rank correlation
• Kendall's Tau
• Biserial correlation
• Point Biserial correlation
• Phi coefficient
• Tetrachoric correlation
Correlation Coefficient
 Pearson’s Product Moment Correlation
 Symbolized by r
 Covariance ÷ (product of the 2 SDs)
 Correlation is a standardized covariance:

r = Cov_XY / (s_X × s_Y)
Calculation for Example
• CovXY = 11.12
• sX = 2.33
• sY = 6.69
r = Cov_XY / (s_X × s_Y) = 11.12 / [(2.33)(6.69)] = 11.12 / 15.59 = 0.713
Other formulae
• Z-score method
• Computational (Raw Score) Method
Z-score method:

r = Σ(z_x · z_y) / N

Computational (raw score) method:

r = [N ΣXY − ΣX ΣY] / √{[N ΣX² − (ΣX)²] [N ΣY² − (ΣY)²]}
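Both formulas give the same value. Here is a minimal Python sketch of the two methods, run on the six X/Y pairs from the worked example a few slides below:

```python
import math

x = [43, 21, 25, 42, 57, 59]   # X values from the later worked example
y = [99, 65, 79, 75, 87, 81]   # Y values from the later worked example
n = len(x)

# Raw-score (computational) method
sxy = sum(a * b for a, b in zip(x, y))
sx, sy = sum(x), sum(y)
sx2 = sum(a * a for a in x)
sy2 = sum(b * b for b in y)
r_raw = (n * sxy - sx * sy) / math.sqrt((n * sx2 - sx**2) * (n * sy2 - sy**2))

# Z-score method: r = sum(zx * zy) / N, using population SDs
mx, my = sx / n, sy / n
sdx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
sdy = math.sqrt(sum((b - my) ** 2 for b in y) / n)
r_z = sum(((a - mx) / sdx) * ((b - my) / sdy) for a, b in zip(x, y)) / n

print(round(r_raw, 4), round(r_z, 4))  # both ≈ 0.5298
```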
Interpretation of Correlation
Coefficient (r)
• The value of correlation coefficient ‘r’ ranges
from -1 to +1
• If r = +1, then the correlation between the
two variables is said to be perfect and
positive
• If r = -1, then the correlation between the
two variables is said to be perfect and
negative
• If r = 0, then there exists no correlation
between the variables
Relation between regression and
correlation
• The coefficient of correlation is the geometric mean of the two regression coefficients:
r = √(bxy × byx)
Limitation of Pearson’s
Coefficient
• It always assumes a linear relationship.
• Interpreting the value of r is difficult.
• The value of the correlation coefficient is affected by extreme values.
Outliers are dangerous
Here we have a spurious correlation of r = 0.68;
without IBM, r = 0.48;
without IBM & GE, r = 0.21.
Coefficient of Determination
• It is the square of correlation coefficient (r²)
• It explains how much of the variability of a factor is
explained by its relationship to another factor.
• The maximum value of r² is 1, because it is possible to explain all of the variation in y, but it is not possible to explain more than all of it.
• Coefficient of Determination = Explained variation /
Total variation
Coefficient of Determination: An
example
 r = 0.60 versus r = 0.30
 It does not mean that the first correlation is twice as strong as the second.
 This can be understood by computing r²:
when r = 0.60, r² = 0.36; when r = 0.30, r² = 0.09.
 This implies that in the first case 36% of the total variation is explained (shared), whereas in the second case only 9% of the total variation is explained (shared).
Product moment correlation
• Direct formula
Example

Respondent | Variable X | Variable Y | XY | X² | Y²
1 | 43 | 99 | 4257 | 1849 | 9801
2 | 21 | 65 | 1365 | 441 | 4225
3 | 25 | 79 | 1975 | 625 | 6241
4 | 42 | 75 | 3150 | 1764 | 5625
5 | 57 | 87 | 4959 | 3249 | 7569
6 | 59 | 81 | 4779 | 3481 | 6561
Σ | 247 | 486 | 20485 | 11409 | 40022
Solution
• From our table:
• Σx = 247
• Σy = 486
• Σxy = 20,485
• Σx² = 11,409
• Σy² = 40,022
• n is the sample size, in our case = 6
Solution
• r = [6(20,485) – (247 × 486)] / √{[6(11,409) – 247²] × [6(40,022) – 486²]}
• = 2868 / √(7445 × 3936)
• = 0.5298
Another example
(x = number of absences, y = final grade; this data is used again in the regression section)

x | y | xy | x² | y²
8 | 78 | 624 | 64 | 6084
2 | 92 | 184 | 4 | 8464
5 | 90 | 450 | 25 | 8100
12 | 58 | 696 | 144 | 3364
15 | 43 | 645 | 225 | 1849
9 | 74 | 666 | 81 | 5476
6 | 81 | 486 | 36 | 6561

Σx = 57, Σy = 516, Σxy = 3751, Σx² = 579, Σy² = 39898
Spearman’s Rank
Coefficient of Correlation (Rho)
• When variables under study are arranged in serial
order Spearman Rank correlation can be used.
• Rho = 1 − (6 ΣD²) / [N(N² − 1)]
• Rho = rank correlation coefficient
• D = difference of ranks between paired items in the two series
• N = total number of observations
Rank Correlation Coefficient
(Rho)
a) Problems where actual rank are given.
1) Calculate the difference ‘D’ between the two ranks, i.e. (R1 – R2).
2) Square the differences & calculate their sum, i.e. ΣD².
3) Substitute the values obtained in the formula.
Example
• To calculate a Spearman rank-order correlation
on data without any ties
• English: 56 75 45 71 62 64 58 80 76 61
• Maths: 66 70 40 60 65 56 59 77 67 63
Example
Eng | Maths | Rank Eng | Rank Maths | d | d²
56 | 66 | 9 | 4 | 5 | 25
75 | 70 | 3 | 2 | 1 | 1
45 | 40 | 10 | 10 | 0 | 0
71 | 60 | 4 | 7 | 3 | 9
62 | 65 | 6 | 5 | 1 | 1
64 | 56 | 5 | 9 | 4 | 16
58 | 59 | 8 | 8 | 0 | 0
80 | 77 | 1 | 1 | 0 | 0
76 | 67 | 2 | 3 | 1 | 1
61 | 63 | 7 | 6 | 1 | 1
Solution
• Σd² = 25 + 1 + … = 54
• Applying the formula:
• Rho = 1 − (6 × 54) / [10(10² − 1)]
• = 1 − 324/990
• = 1 − 0.33
• = 0.67
Rank Correlation Coefficient
(Rho)
• Equal Ranks or tie in Ranks: In such cases
average ranks should be assigned to each individual
• Example (to be worked out)
Interpretation of Rank
Correlation Coefficient
• The value of rank correlation coefficient, R ranges
from -1 to +1
• If R = +1, then there is complete agreement in the
order of the ranks and the ranks are in the same
direction
• If R = -1, then there is complete agreement in the
order of the ranks and the ranks are in the
opposite direction
• If R = 0, then there is no correlation
Merits Spearman’s Rank
Correlation
• This method is simpler to understand and
easier to apply compared to Karl Pearson’s
correlation method.
• This method is useful for ordinal data.
• But it becomes difficult to apply when the data set is large.
Kendall's Tau
• Kendall's τ (tau) is a non-parametric
measure of correlation between two
ranked variables. It is similar to
Spearman's Rho and Pearson's Product
Moment Correlation Coefficient
Calculation of τ
• τ = (C − D) / (C + D)
• C = number of concordant pairs
• D = number of discordant pairs
• Order the cases by the ranks of the first variable. A pair of cases is concordant when the later case has the higher rank on the second variable as well (the two variables order the pair the same way).
• A pair is discordant when the later case has the lower rank on the second variable (the two variables order the pair in opposite ways).
Example

Rank variable 1: 1 2 3 4 5 6 7
Rank variable 2: 1 3 6 2 7 4 5
Counting Concordant and Discordant Values
Compare each case's rank on variable 2 (rows, in the order of variable 1) with that of every earlier case (C if larger, D if smaller):

R2 = 3: C
R2 = 6: C C
R2 = 2: C D D
R2 = 7: C C C C
R2 = 4: C C D C D
R2 = 5: C C D C D C

C = 15, D = 6
τ = (15 − 6) / (15 + 6) = 9/21 = 0.429
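A minimal Python sketch that counts concordant and discordant pairs directly (no tied ranks assumed):

```python
from itertools import combinations

r1 = [1, 2, 3, 4, 5, 6, 7]
r2 = [1, 3, 6, 2, 7, 4, 5]

c = d = 0
for i, j in combinations(range(len(r1)), 2):
    # a pair is concordant if both variables order the two cases the same way
    if (r1[i] - r1[j]) * (r2[i] - r2[j]) > 0:
        c += 1
    else:
        d += 1

tau = (c - d) / (c + d)
print(c, d, round(tau, 3))  # 15 6 0.429
```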
Example

Change in Testosterone | Display | Ranked Change in Testosterone | Ranked Display
1.16 | 5.40 | 1 | 1
1.07 | 3.80 | 2 | 4
1.06 | 3.60 | 3 | 5
1.01 | 4.80 | 4 | 2
.96 | 2.60 | 5 | 7
.90 | 4.60 | 6 | 3
.81 | 2.40 | 7 | 8
.23 | 3.20 | 8 | 6
Calculating the Kendall tau-a
Coefficient
Compare each case's ranked display score (rows, in the order of ranked change) with that of every earlier case (C if larger, D if smaller):

Display rank 4: C
Display rank 5: C C
Display rank 2: C D D
Display rank 7: C C C C
Display rank 3: C D D C D
Display rank 8: C C C C C C
Display rank 6: C C C C D C D

C = 21, D = 7, so τ = (21 − 7) / (21 + 7) = 0.5
solution
• Taking the first person, who is ranked 1 for change in
testosterone, how many people are ranked above that
person for display? These are concordant – and the answer
is 7 people, so C = 7. The number of discordant people,
who are ranked above, is zero, so D = 0.
• Take the second person (display rank 4). Of the six people below, four (display ranks 5, 7, 8 and 6) are ranked higher and are concordant, so C = 4; two (display ranks 2 and 3) are ranked lower and are discordant, so D = 2.
• We keep doing this for each person, but we can make our
lives easier by putting this into a table, which is shown in
Table 2. For each pair of people, we say whether the scores
are concordant, in which case we give them a C, or
discordant, in which case we give them a D.
Easier method
• Ranks on variable 1: 1 2 3 4 5 6 7 8
• Ranks on variable 2: 1 4 5 2 7 3 8 6
• Number of inversions I = 7
• τ = 1 − 2I / [n(n − 1)/2]
• I = number of inversions
• n = number of cases = 8
• τ = 1 − 14 / (8 × 7 / 2) = 1 − 14/28 = 1 − 0.5 = 0.5
An easier method
A B C D E F G H I J
• V1: 1 2 3 4 5 6 7 8 9 10
• V2: 2 1 5 3 4 6 10 8 7 9
• τ = 1 − 2I / [n(n − 1)/2]
• I = number of inversions = 7 (the out-of-order pairs 2–1, 5–3, 5–4, 10–8, 10–7, 10–9 and 8–7)
• n = number of cases = 10
• τ = 1 − 14/45 = 1 − 0.311 = 0.689
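A quick Python check of the inversion count and the resulting tau for both examples:

```python
def tau_by_inversions(v2):
    # v2 = ranks on the second variable, listed in the order of the first
    n = len(v2)
    inversions = sum(1 for i in range(n)
                     for j in range(i + 1, n) if v2[i] > v2[j])
    return inversions, 1 - 2 * inversions / (n * (n - 1) / 2)

print(tau_by_inversions([1, 4, 5, 2, 7, 3, 8, 6]))        # (7, 0.5)
print(tau_by_inversions([2, 1, 5, 3, 4, 6, 10, 8, 7, 9])) # (7, ≈0.689)
```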
Significance of tau
• z = τ / √[2(2n + 5) / (9n(n − 1))]
• = 0.689 / √(50/810)
• = 0.689 / 0.249 = 2.77
• Since 2.77 > 1.96, tau is significant.
Example 2
• A B C D E F G H I J
• V1 1 2 3 4 5 6 7 8 9 10
• V2 5 1 2 4 3 10 6 7 9 8
• Do it yourself
Where can you use it
• to understand whether there is an association
between exam grade and time spent revising (i.e.,
where there were six possible exam grades – A, B, C,
D, E and F – and revision time was split into five
categories: less than 5 hours, 5-9 hours, 10-14 hours,
15-19 hours, and 20 hours or more).
• to understand whether there is an association
between customer satisfaction and delivery time (i.e.,
where delivery time had four categories – next day, 2
working days, 3-5 working days, and more than 5
working days – and customer satisfaction was measured in terms of: highly satisfied, very satisfied, satisfied, dissatisfied, highly dissatisfied).
Comparison of tau and rank correlation

• In most situations, the interpretations of Kendall's tau and Spearman's rank correlation coefficient are very similar and thus invariably lead to the same inferences.

Kendall's tau:
 usually gives smaller values than Spearman's rho.
 calculations are based on concordant and discordant pairs.
 insensitive to error.
 p values are more accurate with smaller sample sizes.
 the distribution of Kendall's tau has better statistical properties.

Spearman's rho:
 usually has larger values than Kendall's tau.
 calculations are based on deviations.
 much more sensitive to error and discrepancies in data.
 it is the more widely used rank correlation coefficient.
Other Kinds of Correlation
• Point biserial correlation coefficient (rpb)
▫ used with one continuous scale and one nominal or ordinal or dichotomous scale.
▫ uses the same Pearson formula.

Attractiveness | Date?
3 | 0
4 | 0
1 | 1
2 | 1
5 | 1
6 | 0

rpb = -0.49
Point biserial
• Point biserial is used when one variable is continuous and the other is dichotomous (like gender).
• rpb = [(M1 − M2) / s_{n−1}] × √[n1 n2 / (n(n − 1))]
where M1 and M2 are the two group means, s_{n−1} is the sample SD, and n1 and n2 are the group sizes.
Computation of point biserial
• rpb = [(Mp − Mq) / σ] × √(pq)
• where rpb is the point biserial correlation
• Mp is the mean score of students answering correctly
• Mq is the mean score of students answering incorrectly
• σ is the standard deviation of the whole sample
• p is the proportion of students answering correctly
• q is 1 − p
Computation of point biserial
Student | Item 1 | Item 2 | Item 3 | Total
1 | 1 | 0 | 1 | 50
2 | 1 | 0 | 1 | 45
3 | 1 | 0 | 1 | 45
4 | 1 | 0 | 1 | 40
5 | 0 | 1 | 1 | 35
6 | 0 | 1 | 1 | 30
7 | 0 | 1 | 1 | 30
8 | 0 | 1 | 1 | 25

Mp | 45 | 30 | 37.5 (mean total score = 37.5)
Mq | 30 | 45 | – (SD = 8.29)
p | .50 | .50 | 1.00
q | .50 | .50 | 0
rpb | .91 | −.91 | 0
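A small Python sketch of this item analysis for item 1, using the population SD (matching the 8.29 in the table); the exact value is 0.905, which the slide rounds to .91:

```python
import math

totals = [50, 45, 45, 40, 35, 30, 30, 25]
item1 = [1, 1, 1, 1, 0, 0, 0, 0]

n = len(totals)
mean = sum(totals) / n
sd = math.sqrt(sum((t - mean) ** 2 for t in totals) / n)  # population SD ≈ 8.29

p = sum(item1) / n                                   # proportion correct
mp = sum(t for t, i in zip(totals, item1) if i == 1) / sum(item1)
mq = sum(t for t, i in zip(totals, item1) if i == 0) / (n - sum(item1))
r_pb = (mp - mq) / sd * math.sqrt(p * (1 - p))
print(round(r_pb, 2))  # ≈ 0.90
```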
Correlation and ‘t’
We can convert r to t and test for significance:

t = r √[(N − 2) / (1 − r²)], with df = N − 2
Tables of Significance
 Suppose r = 0.71 and N = 21.
 Start with H0: r = 0.
 df = N − 2 = 21 − 2 = 19
 t-crit(19) = 2.09

t = r √[(N − 2) / (1 − r²)] = 0.71 × √[19 / (1 − 0.71²)] = 0.71 × √(19 / 0.4959) = 4.39

 Since 4.39 is larger than 2.09, reject the null hypothesis that r = 0.
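The arithmetic as a quick Python check:

```python
import math

r, n = 0.71, 21
t = r * math.sqrt((n - 2) / (1 - r ** 2))
print(round(t, 2))  # 4.39, larger than t-crit(19) = 2.09, so reject H0
```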
Other Kinds of Correlation
• Phi coefficient (Φ)
▫ used with two dichotomous scales.
▫ uses the same Pearson formula.

Attractiveness | Date?
0 | 0
1 | 0
1 | 1
1 | 1
0 | 0
1 | 1

Φ = 0.71
Formula
Arrange the data in a 2×2 table:

A | B | A+B
C | D | C+D
A+C | B+D |

Φ = (AD − BC) / √[(A+B)(C+D)(A+C)(B+D)]
Solution

| Attractiveness yes | Attractiveness no | Total
Date yes | 3 | 0 | 3
Date no | 1 | 2 | 3
Total | 4 | 2 | 6

Φ = (3×2 − 0×1) / √(3 × 3 × 4 × 2)
= 6 / √72
= 6 / 8.48
= 0.707
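The same computation as a Python sketch:

```python
import math

a, b, c, d = 3, 0, 1, 2   # 2x2 table: rows = date yes/no, columns = attractiveness yes/no
phi = (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
print(round(phi, 3))  # 0.707
```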
Tetrachoric correlation
• If you have dichotomous data on two variables but are willing to assume that the underlying variables are normally distributed, you may use the tetrachoric correlation to estimate the size of the correlation between the underlying variables.
• rtet = cos{180° / [1 + √(ad/bc)]}
Tetrachoric correlation
• When you have continuous data but want to split it into a dichotomous form (a median split, for example), you use the tetrachoric correlation. Here you are artificially making continuous data dichotomous.
Example: attitude towards women (ATW) vs score on openness

Score on openness | ATW negative | ATW positive | Total
Above median | 68 (a) | 32 (b) | 100
Below median | 30 (c) | 70 (d) | 100
Total | 98 | 102 | 200

rtet = cos{180° / [1 + √(ad/bc)]}
= cos{180° / [1 + √((68 × 70) / (32 × 30))]}
= cos(55.78°) ≈ 0.56
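The cosine approximation as a Python sketch (math.cos works in radians, so the angle is converted):

```python
import math

a, b, c, d = 68, 32, 30, 70
angle_deg = 180 / (1 + math.sqrt((a * d) / (b * c)))
r_tet = math.cos(math.radians(angle_deg))
print(round(angle_deg, 2), round(r_tet, 2))  # 55.78 0.56
```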
On tetrachoric
• This formula works best only when
▫ N is large
▫ When the splits are as near the median as
possible
It is better to use the phi coefficient than the tetrachoric correlation.
Advantages of Correlation
studies
• Show the amount (strength) of relationship
present
• Can be used to make predictions about the
variables under study.
• Can be used in many settings, including natural settings, laboratories, etc.
• Correlational data is easier to collect.
Factors Affecting r
 Range restrictions (truncation)
 Looking at only a small portion of the total scatter plot (a smaller portion of the scores' variability) decreases r.
 Reducing variability reduces r.
 This matters especially when validating against an external criterion in a selection scenario.
 Nonlinearity
 The Pearson r (and its relatives) measures the degree of linear relationship between two variables.
 If a strong non-linear relationship exists, r will provide a low, or at least inaccurate, measure of the true relationship.
[Figures: scatterplots illustrating truncation, non-linearity, and heterogeneous samples]
Factors affecting correlation
• Reliability of measurement
▫ If you are not reliably distinguishing individuals on some measure, you will not adequately capture the covariance that measure may have with another.
• Heterogeneous subsamples
▫ Sub-samples may artificially increase or decrease the overall r.
▫ Solution: calculate r separately for the sub-samples and overall, and look for differences.
▫ Can be caused by lack of reliability.
• Outliers can artificially increase or decrease r.
Testing Correlations
 How do we find out if a correlation is big/large?
 In terms of magnitude, how big is big?
 Small correlations in large samples are “big.”
 Large correlations in small samples aren’t always “big.”
 It depends upon the magnitude of the correlation coefficient AND the size of your sample.
Correlation and effect size
Effect size | r (Pearson correlation coefficient)
Small | 0.10
Medium | 0.30
Large | 0.50
Partial correlation
What is a partial correlation
• Partialling is holding constant a third
variable via residuals
• It estimates what would happen if everyone had the same score on the third variable.
Partial correlation
• Two variables A and B are correlated. But
you feel that this relationship is influenced
by a third variable C. You want to remove
this influence and want to know the true
correlation between A and B.
• In this case you partial out the influence
of C.
Example
• You know that exam grades are correlated
with intelligence. You also know that exam
grades are influenced by exam anxiety.
You also know that intelligence scores are
moderated by anxiety. You want to know
the correlation between exam grade and
intelligence when controlled for anxiety.
Example
• Suppose you have the following data.
• Correlation between exam (A) grade and
intelligence (B) = .918
• Correlation between exam grade(A) and
anxiety (C) = -.369
• Correlation between anxiety (C) and
intelligence(B) = -.245
Solution
• The correlation between exam grade and intelligence controlled for anxiety is:
• r_AB.C = (r_AB − r_AC × r_BC) / √[(1 − r²_AC)(1 − r²_BC)]
• = [.918 − (−.369 × −.245)] / √[(1 − (−.369)²)(1 − (−.245)²)]
• = (.918 − .090) / √(.864 × .940)
• = .828 / .901
• = .919
• The true correlation between exam score and intelligence is .919. We can see that the correlation improved slightly after partialling out the effect of anxiety.
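The first-order partial correlation formula as a Python sketch (unrounded it comes to .918; the slide's rounded steps give .919):

```python
import math

def partial_r(r_ab, r_ac, r_bc):
    # correlation between A and B with C partialled out of both
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac**2) * (1 - r_bc**2))

print(round(partial_r(0.918, -0.369, -0.245), 3))  # ≈ 0.918
```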
Example 2

Attitude towards women | Openness | Education
2 | 7 | 14
4 | 10 | 13
8 | 14 | 11
7 | 13 | 9
8 | 9 | 5
9 | 10 | 14
1 | 9 | 5
0 | 9 | 6
6 | 12 | 11
5 | 10 | 12
Solution
• Find out correlation between ATW and
Openness
• Find out the correlation between ATW and Edu.
• Find out the correlation between Openness and Edu.
• Partial out the effect of edu in the
correlation between ATW and openness
Solution
• r between ATW and Openness: 0.662 (r_AB)
• r between ATW and Edu: 0.276 (r_AC)
• r between Openness and Edu: 0.250 (r_BC)
• Partialling out the effect of Edu:
• r_AB.C = (r_AB − r_AC × r_BC) / √[(1 − r²_AC)(1 − r²_BC)]
• = (.662 − .276 × .250) / √[(1 − .276²)(1 − .250²)] = .637
Partial and semi-partial correlation
• In partial correlation the effect of the third
variable is partialled out from both the
variables rAB.C
• In semi-partial correlation the effect of the third variable is partialled out of only one of the two variables: r_A(B.C)
Order of partialling
• If you partial 1 variable out of a correlation, the resulting
partial is called a first order partial correlation.
• If you partial 2 variables out of a correlation, the resulting
partial is called a second order partial correlation. Can
have 3rd, 4th, etc., order partials.
• Unpartialed (raw) correlations are called zero order
correlations because nothing is partialed out.
• Can use regression to find residuals and compute partial
correlations from the residuals, e.g. for r12.34, regress 1
and 2 on both 3 and 4, then compute correlation between
2 sets of residuals.
Solution
• In the above example, the relationship between exam grade and intelligence can be semi-partialled by removing the effect of anxiety only from intelligence:
• r_A(B.C) = (r_AB − r_AC × r_BC) / √(1 − r²_BC)
• = [.918 − (−.369 × −.245)] / √[1 − (−.245)²]
• = (.918 − .090) / .970 = .828 / .970
• = 0.854
• The correlation between exam grade and intelligence after removing the influence of anxiety on intelligence is 0.854. The effect of anxiety on exam grade is not removed.
The correlation coefficient of number of times absent and final grade is r = -0.975. The coefficient of determination is r² = (-0.975)² = 0.9506.
Interpretation: About 95% of the variation in final grades can be explained by the number of times a student is absent. The other 5% is unexplained and can be due to sampling error or other variables such as intelligence, amount of time studied, etc.

Strength of the Association
The coefficient of determination, r², measures the strength of the association and is the ratio of explained variation in y to the total variation in y.
Regression
Regression Analysis
• Regression Analysis is a very
powerful tool in the field of
statistical analysis in predicting
the value of one variable, given
the value of another variable,
when those variables are related
to each other.
Regression Analysis
• Regression takes us a step beyond correlation: not only are we concerned with the strength of the association, we also want to describe its nature with sufficient precision to be able to make predictions.
• To be able to make predictions, we need to be able to characterize one of the variables in the relationship as independent and the other as dependent.
Regression Analysis
• For example, in the relationship between male literacy and the percentage of people living in cities, the causal order seems pretty obvious: literacy rates are not likely to produce urbanization, but urbanization is probably causally prior to increases in literacy rates.
Regression and Prediction
• If you say that there is a correlation between the number of vehicles and air pollution, it does not by itself convey a causal relationship, though you know that vehicles can increase pollution and pollution cannot increase vehicles.
• In regression analysis you predict how much increase in pollution will result for each unit increase in the vehicle population.
In short
• Regression Analysis is mathematical
measure of average relationship between
two or more variables.
• Regression analysis is a statistical tool used
in prediction of value of unknown variable
from known variable.
Advantages of Regression
Analysis
• Regression analysis provides estimates of
values of the dependent variables from the
values of independent variables.
• Regression analysis also helps to obtain a
measure of the error involved in using the
regression line as a basis for estimations .
• Regression analysis helps in obtaining a measure of the degree of association or correlation that exists between the two variables.
Regression
What is regression?
• Fitting a line to the data using an equation in
order to describe and predict data
• Simple Regression
▫ Uses just 2 variables (X and Y)
▫ Other: Multiple Regression (one Y and many X’s)
• Linear Regression
▫ Fits data to a straight line
▫ Other: Curvilinear Regression (curved line)
Assumptions in Regression Analysis
• Existence of an actual linear relationship.
• The regression analysis is used to estimate values within the range for which it is valid.
• The dependent variable takes random values, but the values of the independent variables are fixed.
• In regression, we have only one dependent variable in our estimating equation. However, we can use more than one independent variable.
What is regression
• Regression indicates the degree to which
the variation in one variable X, is related
to or can be explained by the variation in
another variable Y
• Once you know there is a significant linear
correlation, you can write an equation
describing the relationship between the x
and y variables. This equation is called the
line of regression or least squares line.
Regression Equation
• Regression line of Y on X: gives the best estimate of the value of y for any specific given value of x:
• Y = ax + b
• a = slope of the line
• b = Y-intercept
• Y = dependent variable
• X = independent variable
Regression Equation:
We can predict a Y score from an X by plugging a value for X into the equation and calculating Y.
What would we expect a person to get on quiz #4 if they got a 12.5 on quiz #3?
Ŷ = 0.823X − 4.239
Ŷ = 0.823(12.5) − 4.239 = 6.049
Interpreting Regression : Basics
• Intercept
▫ Value of Y if X(s) is 0
▫ Often not meaningful, particularly if it’s practically impossible to
have an X of 0 (e.g. weight)
• Slope, the regression coefficient
▫ Amount of change in Y seen with 1 unit change in X
 Standardized regression coefficient
 Amount of change in Y seen in standard deviation units with 1 standard
deviation unit change in X
 In simple regression it is equivalent to the r for the two variables
• Standard error of estimate
▫ Gives a measure of the accuracy of prediction
• R²
▫ Proportion of variance in the outcome explained by the model
▫ Effect size
Example
[Figure: graph of the regression line y = 5x + 2, explained below]
Explanation
• In the above example, y = 5x + 2.
• 5 is the slope and 2 is the intercept.
• This means that the predicted Y = 5 × (value of x) + 2.
• Suppose you want to predict the value of Y corresponding to x = 5:
• then Y(pre) = (5 × 5) + 2 = 27.
• If x = 12, then Y(pre) = (5 × 12) + 2 = 62.
Explanation
• We should also know that what we calculate is the estimated value of Y for a given value of x.
• This need not be accurate; there is some error in prediction (because we assume the regression line to be a straight line, but the data points actually cluster around the line, not exactly on it).
• This predicted (estimated) value of Y is called Ŷ, and Y − Ŷ is the error.
• The regression line is fitted in such a way that this error is minimized.
The Explanation of Regression Line
• In case of perfect correlation ( positive or
negative ) the two line of regression
coincide.
• If the two Regression lines are far from
each other, then degree of correlation is
less, & vice versa.
• The mean values of X &Y can be obtained
as the point of intersection of the two
regression line.
• The higher degree of correlation between
the variables, the angle between the lines
is smaller & vice versa.
Regression Equation / Line
& Method of Least Squares
• Regression Equation of y on x
Y = ax + b
We have to obtain the values of a, b
• Regression Equation of x on y
X = cy + d
We have to obtain the values of c and d
How to calculate
• Regression equation: Ŷ = ax + b
• where ‘a’ is the slope and ‘b’ is the y-intercept.
• Slope: a = [nΣxy − ΣxΣy] / [nΣx² − (Σx)²]
• Y-intercept: b = Ȳ − aX̄
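These two formulas as a short Python sketch, using the absences/final-grade data from the earlier table:

```python
x = [8, 2, 5, 12, 15, 9, 6]        # number of absences
y = [78, 92, 90, 58, 43, 74, 81]   # final grade
n = len(x)

sx, sy = sum(x), sum(y)
sxy = sum(a * b for a, b in zip(x, y))
sx2 = sum(a * a for a in x)

a = (n * sxy - sx * sy) / (n * sx2 - sx ** 2)  # slope ≈ -3.924
b = sy / n - a * sx / n                        # intercept ≈ 105.667
print(round(a, 3), round(b, 3))
```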
Regression Equation / Line when
Deviation taken from Arithmetic Mean
• Regression equation of y on x:
Y − Ȳ = byx (X − X̄), where byx = Σxy / Σx² (deviations from the means) and byx = r (σy / σx)
• Regression equation of x on y:
X − X̄ = bxy (Y − Ȳ), where bxy = Σxy / Σy² and bxy = r (σx / σy)
Properties of the Regression Coefficients
• The coefficient of correlation is the geometric mean of the two regression coefficients: r = √(byx × bxy)
• If byx is positive then bxy must also be positive, & vice versa.
• If one regression coefficient is greater than one, the other must be less than one.
• The coefficient of correlation has the same sign as the regression coefficients.
• The arithmetic mean of byx & bxy is equal to or greater than the coefficient of correlation: (byx + bxy) / 2 ≥ r
• Regression coefficients are independent of origin but not of scale.
[Figure: scatterplot of ad spending (x, $) vs revenue (y) with the best-fitting straight line; each data point (xi, yi) has a residual, the vertical distance to the point on the line with the same x-value.]
Calculate a and b, and write the equation of the line of regression with x = number of absences and y = final grade.
The line of regression is: Ŷ = −3.924x + 105.667
x (absences) | y (grade) | xy | x² | y²
8 | 78 | 624 | 64 | 6084
2 | 92 | 184 | 4 | 8464
5 | 90 | 450 | 25 | 8100
12 | 58 | 696 | 144 | 3364
15 | 43 | 645 | 225 | 1849
9 | 74 | 666 | 81 | 5476
6 | 81 | 486 | 36 | 6561

Σx = 57, Σy = 516, Σxy = 3751, Σx² = 579, Σy² = 39898
Solution
• a = [nΣxy − ΣxΣy] / [nΣx² − (Σx)²]
• = [7 × 3751 − 57 × 516] / [7 × 579 − 57²]
• = −3155 / 804 = −3.924
• b = Ȳ − aX̄
• = 516/7 − (−3.924 × 57/7) = 73.714 + 31.953 = 105.667
• The line of regression is
• Ŷ = −3.924x + 105.667
[Figure: scatterplot of absences (x) vs final grade (y) with the regression line; m = −3.924 and b = 105.667. Note that the point (X̄, Ȳ) = (8.143, 73.714) is on the line.]
The Line of Regression
The regression line can be used to predict values of y for values of x falling within the range of the data.

Predicting y Values
The regression equation for number of times absent and final grade is:
Ŷ = −3.924x + 105.667
Use this equation to predict the expected grade for a student with
(a) 3 absences: Ŷ = −3.924(3) + 105.667 = 93.895
(b) 12 absences: Ŷ = −3.924(12) + 105.667 = 58.579
Estimating the error
• Here the estimated grade for 12 absences is 58.58.
• But from the original data you can find that for 12 absences the grade obtained is 58. So the error is Y − Ŷ = 58 − 58.58 = −0.58.
• Similarly, for 6 absences we can calculate:
Estimating the error
• Ŷ = −3.924(6) + 105.667 = 82.12
• But the obtained value is 81.
• So the estimated value and the obtained value differ somewhat.
Standard Error of Estimate.
• Standard Error of Estimate is the measure of variation
around the computed regression line.
• The standard error of estimate (SE) of Y measures the variability of the observed values of Y around the regression line.
• It gives us a measure of the scatter of the observations about the line of regression.
Estimating the error
• We can estimate this error by the following
formula
• SE_est of Y = σy √(1 − r²)
• σy = the SD of the y distribution
• r² = the squared correlation between x and y
Estimating the error
• In the above example, the correlation between number of absences (x) and grades (y) is −.975, so 1 − r² = 1 − .9506 = .049.
• σy is 17.61.
• Then SE = σy √(1 − r²)
• = 17.61 × √.049 = 17.61 × .222 ≈ 3.91
• This means the estimated y can be off by about ±3.91.
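A Python sketch of the standard error of estimate for this data (sample SD, matching the 17.61 above; r is taken from the earlier slide):

```python
import math

y = [78, 92, 90, 58, 43, 74, 81]   # final grades
n = len(y)

my = sum(y) / n
sdy = math.sqrt(sum((v - my) ** 2 for v in y) / (n - 1))  # sample SD ≈ 17.61

r = -0.975  # correlation between absences and grades
se_est = sdy * math.sqrt(1 - r ** 2)
print(round(sdy, 2), round(se_est, 2))  # ≈ 17.61 3.91
```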
Problem in Regression
• The following are the scores obtained on two variables X and Y by 10 individuals. Find the regression of Y on X and of X on Y.

Indl | X | X² | Y | Y² | XY
1 | 40 | 1600 | 2 | 4 | 80
2 | 43 | 1849 | 5 | 25 | 215
3 | 45 | 2025 | 4 | 16 | 180
4 | 46 | 2116 | 7 | 49 | 322
5 | 60 | 3600 | 9 | 81 | 540
6 | 63 | 3969 | 5 | 25 | 315
7 | 69 | 4761 | 2 | 4 | 138
8 | 54 | 2916 | 8 | 64 | 432
9 | 70 | 4900 | 6 | 36 | 420
10 | 62 | 3844 | 9 | 81 | 558
solution
• ΣX = 552
• ΣY = 57
• ΣX² = 31580
• ΣY²= 385
• ΣXY= 3200
Solution
• a = [nΣxy − ΣxΣy] / [nΣx² − (Σx)²]
• = [10 × 3200 − (552 × 57)] / [10 × 31580 − 552²]
• = (32000 − 31464) / (315800 − 304704)
• = 536 / 11096
• = 0.048
Solution
• b = Ȳ − aX̄
• = 5.7 − 0.048 × 55.2
• = 3.05
• The regression equation is
• Ŷ = 0.048x + 3.05
• For x = 60: Ŷ = 0.048 × 60 + 3.05 = 5.93
• The obtained score is 9,
• so the error Ŷ − Y = 5.93 − 9 = −3.07.
Solution
• For x = 70, what is the estimated Y?
• Estimated Ŷ = 0.048 × 70 + 3.05 = 6.41
• The obtained value for x = 70 is 6,
• so the error is 6.41 − 6 = 0.41.
Calculation of the regression of X on Y
• c = [nΣxy − ΣxΣy] / [nΣy² − (Σy)²]
• The numerator is the same as before: 536.
• Denominator: 10 × 385 − 57² = 601
• c = 536 / 601 = 0.892
• d = X̄ − cȲ = 55.2 − 0.892 × 5.7 = 50.116
Calculation of the regression of X on Y
• The regression equation is
• X̂ = cy + d = 0.892y + 50.116
• Verify:
• For y = 5: X̂ = 0.892 × 5 + 50.116 = 54.576 (obtained value = 43)
• For y = 9: X̂ = 58.144 (obtained value = 62)
Std error
• SE = σy √(1 − r²)
• σy = 2.584
• r = √(byx × bxy) = √(0.048 × 0.892) ≈ 0.208
• SE = 2.584 × √(1 − 0.208²) = 2.584 × 0.978 ≈ 2.53
• This means that for any given value of x, the estimated value of y may be within about ±2.53 of the true value.
Multiple Regression
Y = a + b1X1 + b2X2
Notation:
 a is the Y-intercept, where the regression line crosses the Y axis.
 b1 is the partial slope for X1 on Y: it indicates the change in Y for a one-unit change in X1, controlling for X2.
 b2 is the partial slope for X2 on Y: it indicates the change in Y for a one-unit change in X2, controlling for X1.
Partial Slopes
• The partial slopes = the effect of each independent
variable on Y while controlling for the effect of the
other independent variable(s).
• Show the effects of the X’s in their original units.
• These values can be used to predict scores on Y.
• Partial slopes must be computed before computing a
(the Y intercept).
[Figures: formulas for the y-intercept and the partial slopes]
Standardized Partial Slopes
(beta-weights)
• Partial slopes (b1 and b2) are in the original units of
the independent variables.
• To compare the relative effects of the independent
variables, compute beta-weights (b*).
• Beta-weights show the amount of change in the
standardized scores of Y for a one-unit change in
the standardized scores of each independent
variable while controlling for the effects of all other
independent variables.
Beta-weights
[Figures: formulas to calculate the beta-weights for X1 and X2]
Multiple Correlation (R²)
• The multiple correlation coefficient (R²) shows the combined effects of all independent variables on the dependent variable.
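As a sketch of how these pieces fit together, the following Python code estimates a, b1 and b2 by least squares, then computes R² and the beta-weights; the six data rows are invented purely for illustration:

```python
import numpy as np

# illustrative data: Y with two predictors X1 and X2
X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
X2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
Y = np.array([3.1, 4.2, 6.8, 7.1, 10.2, 10.9])

# design matrix with a column of ones for the intercept a
X = np.column_stack([np.ones_like(X1), X1, X2])
coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
a, b1, b2 = coef

# R^2 = 1 - (unexplained variation / total variation)
Y_hat = X @ coef
r2 = 1 - np.sum((Y - Y_hat) ** 2) / np.sum((Y - Y.mean()) ** 2)

# beta-weights: partial slopes rescaled to standard-deviation units
beta1 = b1 * X1.std() / Y.std()
beta2 = b2 * X2.std() / Y.std()
print(a, b1, b2, r2, beta1, beta2)
```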
Limitations
Multiple regression and correlation are among the
most powerful techniques available to researchers.
•These techniques require:
▫ Every variable is measured at the interval-ratio level
▫ Each independent variable has a linear relationship
with the dependent variable
▫ Independent variables do not interact with each
other
▫ Independent variables are uncorrelated with each
other
Limitations
When these requirements are violated (as they often
are), these techniques will produce biased and/or
inefficient estimates.
There are more advanced techniques available to
researchers that can correct for violations of these
requirements. Such techniques are beyond the scope
of this course
Step wise regression
Statement of problem
• A common problem is that there is a large set of
candidate predictor variables.
• (Note: The examples herein are really not that large.)
• The goal is to choose a small subset from the larger set so that the resulting regression model is simple yet has good predictive ability.
Example: Selection data
• You are trying to select the best candidates from a pool of applicants for a job, using a number of variables (and their tests):
• Cognitive ability
• Adjustment
• Integrity
• Leadership
• Stress tolerance
Your problem
• You want to select those variables which
together will predict the criterion (job success)
• You want to select only minimum variables
• Together their predictive efficiency must be
maximum
Two basic methods
of selecting predictors
• Stepwise regression: Enter and remove
predictors, in a stepwise manner, until
there is no justifiable reason to enter or
remove more.
• Best subsets regression: Select the subset of
predictors that do the best at meeting some well-
defined objective criterion.
What is step wise
• First include the test (variable) with the maximum predictive ability (predictive validity).
• Add a new test (the second best).
• See if it adds to the multiple correlation (R).
• If yes, add a third one.
• Go on adding tests until R no longer increases.
• When R no longer increases, you have reached your maximum predictive efficiency.
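A minimal sketch of that forward-selection loop in Python. It assumes a helper r_squared(predictors) that fits a regression on the chosen predictor set and returns R²; the helper and the test names are hypothetical:

```python
def forward_stepwise(candidates, r_squared, tol=1e-6):
    """Greedy forward selection: keep adding the predictor that raises R^2 most."""
    candidates = list(candidates)
    selected, best_r2 = [], 0.0
    while candidates:
        # try each remaining candidate and keep the best improvement
        gains = {c: r_squared(selected + [c]) for c in candidates}
        best = max(gains, key=gains.get)
        if gains[best] - best_r2 <= tol:   # R^2 no longer increases: stop
            break
        selected.append(best)
        candidates.remove(best)
        best_r2 = gains[best]
    return selected, best_r2

# usage sketch (hypothetical predictor names from the selection example):
# selected, r2 = forward_stepwise(
#     ["cognitive", "adjustment", "integrity", "leadership", "stress"], r_squared)
```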
More Related Content

What's hot

Correlation and regression
Correlation and regressionCorrelation and regression
Correlation and regressionMohit Asija
 
Skewness and kurtosis ppt
Skewness and kurtosis pptSkewness and kurtosis ppt
Skewness and kurtosis pptDrishti Rajput
 
correlation and regression
correlation and regressioncorrelation and regression
correlation and regressionUnsa Shakir
 
Correlation analysis
Correlation analysis Correlation analysis
Correlation analysis Anil Pokhrel
 
Spearman rank correlation coefficient
Spearman rank correlation coefficientSpearman rank correlation coefficient
Spearman rank correlation coefficientKarishma Chaudhary
 
Spearman Rank Correlation Presentation
Spearman Rank Correlation PresentationSpearman Rank Correlation Presentation
Spearman Rank Correlation Presentationcae_021
 
Variance & standard deviation
Variance & standard deviationVariance & standard deviation
Variance & standard deviationFaisal Hussain
 
Skewness and kurtosis
Skewness and kurtosisSkewness and kurtosis
Skewness and kurtosisKalimaniH
 
Correlation
CorrelationCorrelation
CorrelationTech_MX
 
Statistics: Probability
Statistics: ProbabilityStatistics: Probability
Statistics: ProbabilitySultan Mahmood
 
Phi Coefficient of Correlation - Thiyagu
Phi Coefficient of Correlation - ThiyaguPhi Coefficient of Correlation - Thiyagu
Phi Coefficient of Correlation - ThiyaguThiyagu K
 
Correlation analysis ppt
Correlation analysis pptCorrelation analysis ppt
Correlation analysis pptDavid Jaison
 
Measures of dispersion or variation
Measures of dispersion or variationMeasures of dispersion or variation
Measures of dispersion or variationRaj Teotia
 
Correlation coefficient
Correlation coefficientCorrelation coefficient
Correlation coefficientCarlo Magno
 

What's hot (20)

Correlation and regression
Correlation and regressionCorrelation and regression
Correlation and regression
 
Geometric Mean
Geometric MeanGeometric Mean
Geometric Mean
 
Skewness and kurtosis ppt
Skewness and kurtosis pptSkewness and kurtosis ppt
Skewness and kurtosis ppt
 
correlation and regression
correlation and regressioncorrelation and regression
correlation and regression
 
Correlation analysis
Correlation analysis Correlation analysis
Correlation analysis
 
Spearman rank correlation coefficient
Spearman rank correlation coefficientSpearman rank correlation coefficient
Spearman rank correlation coefficient
 
Spearman Rank Correlation Presentation
Spearman Rank Correlation PresentationSpearman Rank Correlation Presentation
Spearman Rank Correlation Presentation
 
Variance & standard deviation
Variance & standard deviationVariance & standard deviation
Variance & standard deviation
 
Skewness and kurtosis
Skewness and kurtosisSkewness and kurtosis
Skewness and kurtosis
 
PEARSON'CORRELATION
PEARSON'CORRELATION PEARSON'CORRELATION
PEARSON'CORRELATION
 
Correlation
CorrelationCorrelation
Correlation
 
Meaning and types of correlation
Meaning and types of correlationMeaning and types of correlation
Meaning and types of correlation
 
Statistics: Probability
Statistics: ProbabilityStatistics: Probability
Statistics: Probability
 
Phi Coefficient of Correlation - Thiyagu
Phi Coefficient of Correlation - ThiyaguPhi Coefficient of Correlation - Thiyagu
Phi Coefficient of Correlation - Thiyagu
 
MEAN.pptx
MEAN.pptxMEAN.pptx
MEAN.pptx
 
Correlation analysis ppt
Correlation analysis pptCorrelation analysis ppt
Correlation analysis ppt
 
Measures of dispersions
Measures of dispersionsMeasures of dispersions
Measures of dispersions
 
Measures of dispersion or variation
Measures of dispersion or variationMeasures of dispersion or variation
Measures of dispersion or variation
 
Correlation coefficient
Correlation coefficientCorrelation coefficient
Correlation coefficient
 
Correlation
CorrelationCorrelation
Correlation
 

Similar to CORRELATION COEFFICIENTS

Correlation and Regression Analysis.pptx
Correlation and Regression Analysis.pptxCorrelation and Regression Analysis.pptx
Correlation and Regression Analysis.pptxasemzkgmu
 
Educ eval ppt correlation
Educ eval ppt correlationEduc eval ppt correlation
Educ eval ppt correlationcampionjelmar
 
Correlation - Biostatistics
Correlation - BiostatisticsCorrelation - Biostatistics
Correlation - BiostatisticsFahmida Swati
 
Correlation and Regression
Correlation and RegressionCorrelation and Regression
Correlation and Regressionjasondroesch
 
Correlation analysis in Biostatistics .pptx
Correlation analysis in Biostatistics .pptxCorrelation analysis in Biostatistics .pptx
Correlation analysis in Biostatistics .pptxHamdiMichaelCC
 
Regression & correlation coefficient
Regression & correlation coefficientRegression & correlation coefficient
Regression & correlation coefficientMuhamamdZiaSamad
 
Correlation _ Regression Analysis statistics.pptx
Correlation _ Regression Analysis statistics.pptxCorrelation _ Regression Analysis statistics.pptx
Correlation _ Regression Analysis statistics.pptxkrunal soni
 
Correlation and Regression.pptx
Correlation and Regression.pptxCorrelation and Regression.pptx
Correlation and Regression.pptxJayaprakash985685
 
Unit 1 Correlation- BSRM.pdf
Unit 1 Correlation- BSRM.pdfUnit 1 Correlation- BSRM.pdf
Unit 1 Correlation- BSRM.pdfRavinandan A P
 
Biostatistics - Correlation explanation.pptx
Biostatistics - Correlation explanation.pptxBiostatistics - Correlation explanation.pptx
Biostatistics - Correlation explanation.pptxUVAS
 

Similar to CORRELATION COEFFICIENTS (20)

Correlation and Regression Analysis.pptx
Correlation and Regression Analysis.pptxCorrelation and Regression Analysis.pptx
Correlation and Regression Analysis.pptx
 
13943056.ppt
13943056.ppt13943056.ppt
13943056.ppt
 
Statistics ppt
Statistics pptStatistics ppt
Statistics ppt
 
12943625.ppt
12943625.ppt12943625.ppt
12943625.ppt
 
Correlation.pdf
Correlation.pdfCorrelation.pdf
Correlation.pdf
 
Educ eval ppt correlation
Educ eval ppt correlationEduc eval ppt correlation
Educ eval ppt correlation
 
correlation.ppt
correlation.pptcorrelation.ppt
correlation.ppt
 
Correlation - Biostatistics
Correlation - BiostatisticsCorrelation - Biostatistics
Correlation - Biostatistics
 
Correlation
CorrelationCorrelation
Correlation
 
Correlation and Regression
Correlation and RegressionCorrelation and Regression
Correlation and Regression
 
Correlation analysis in Biostatistics .pptx
Correlation analysis in Biostatistics .pptxCorrelation analysis in Biostatistics .pptx
Correlation analysis in Biostatistics .pptx
 
Regression & correlation coefficient
Regression & correlation coefficientRegression & correlation coefficient
Regression & correlation coefficient
 
Correlation _ Regression Analysis statistics.pptx
Correlation _ Regression Analysis statistics.pptxCorrelation _ Regression Analysis statistics.pptx
Correlation _ Regression Analysis statistics.pptx
 
Correlation and Regression.pptx
Correlation and Regression.pptxCorrelation and Regression.pptx
Correlation and Regression.pptx
 
Correlation analysis
Correlation analysisCorrelation analysis
Correlation analysis
 
Correlation continued
Correlation continuedCorrelation continued
Correlation continued
 
Correlations
CorrelationsCorrelations
Correlations
 
Unit 1 Correlation- BSRM.pdf
Unit 1 Correlation- BSRM.pdfUnit 1 Correlation- BSRM.pdf
Unit 1 Correlation- BSRM.pdf
 
Correlation
CorrelationCorrelation
Correlation
 
Biostatistics - Correlation explanation.pptx
Biostatistics - Correlation explanation.pptxBiostatistics - Correlation explanation.pptx
Biostatistics - Correlation explanation.pptx
 

More from ANCYBS

METHODS OF TEACHING: LECTURE METHOD AND STORY TELLING METHOD
METHODS OF TEACHING: LECTURE METHOD AND STORY TELLING METHODMETHODS OF TEACHING: LECTURE METHOD AND STORY TELLING METHOD
METHODS OF TEACHING: LECTURE METHOD AND STORY TELLING METHODANCYBS
 
RELEVANCE OF HEALTH PSYCHOLOGY
RELEVANCE OF HEALTH PSYCHOLOGYRELEVANCE OF HEALTH PSYCHOLOGY
RELEVANCE OF HEALTH PSYCHOLOGYANCYBS
 
THE RIGHT OF INFORMED CONSENT AND DIMENSIONS OF CONFIDENTIALITY.
THE RIGHT OF INFORMED CONSENT AND DIMENSIONS OF CONFIDENTIALITY.THE RIGHT OF INFORMED CONSENT AND DIMENSIONS OF CONFIDENTIALITY.
THE RIGHT OF INFORMED CONSENT AND DIMENSIONS OF CONFIDENTIALITY.ANCYBS
 
RECENT DEVELOPMENT IN FAMILY COUNSELLING
RECENT DEVELOPMENT IN FAMILY COUNSELLINGRECENT DEVELOPMENT IN FAMILY COUNSELLING
RECENT DEVELOPMENT IN FAMILY COUNSELLINGANCYBS
 
Ethics
EthicsEthics
EthicsANCYBS
 
Psychological Factors in Health and Disease
Psychological Factors in Health and DiseasePsychological Factors in Health and Disease
Psychological Factors in Health and DiseaseANCYBS
 
Family: Definition, Changing trends in family structure, Types of families, C...
Family: Definition, Changing trends in family structure, Types of families, C...Family: Definition, Changing trends in family structure, Types of families, C...
Family: Definition, Changing trends in family structure, Types of families, C...ANCYBS
 
Family life cycle
Family life cycleFamily life cycle
Family life cycleANCYBS
 
Legislation
Legislation Legislation
Legislation ANCYBS
 
Chronic illness
Chronic illness Chronic illness
Chronic illness ANCYBS
 
Family Dynamics
Family DynamicsFamily Dynamics
Family DynamicsANCYBS
 
CARING THE TERMINAL ILL
CARING THE TERMINAL ILLCARING THE TERMINAL ILL
CARING THE TERMINAL ILLANCYBS
 
Dual and multiple relationships in counselling
Dual and multiple relationships in counsellingDual and multiple relationships in counselling
Dual and multiple relationships in counsellingANCYBS
 
ROLE OF COLLEGE COUNSELLOR
ROLE OF COLLEGE COUNSELLORROLE OF COLLEGE COUNSELLOR
ROLE OF COLLEGE COUNSELLORANCYBS
 
consultation
consultationconsultation
consultationANCYBS
 
LEADERSHIP BEHAVIOUR DEFINATION AND THEORIES
LEADERSHIP BEHAVIOUR  DEFINATION AND THEORIESLEADERSHIP BEHAVIOUR  DEFINATION AND THEORIES
LEADERSHIP BEHAVIOUR DEFINATION AND THEORIESANCYBS
 
SOMATOFORM AND DISSOCIATIVE DISORDERS
SOMATOFORM AND DISSOCIATIVE DISORDERSSOMATOFORM AND DISSOCIATIVE DISORDERS
SOMATOFORM AND DISSOCIATIVE DISORDERSANCYBS
 
ETHICAL STANDARDS IN TESTING.
ETHICAL STANDARDS IN TESTING.ETHICAL STANDARDS IN TESTING.
ETHICAL STANDARDS IN TESTING.ANCYBS
 
BEHAVIOURAL COUNSELLING SPECIFIC TECHNIQUES
BEHAVIOURAL COUNSELLING  SPECIFIC TECHNIQUESBEHAVIOURAL COUNSELLING  SPECIFIC TECHNIQUES
BEHAVIOURAL COUNSELLING SPECIFIC TECHNIQUESANCYBS
 
Models of counselling
Models of counsellingModels of counselling
Models of counsellingANCYBS
 

More from ANCYBS (20)

METHODS OF TEACHING: LECTURE METHOD AND STORY TELLING METHOD
METHODS OF TEACHING: LECTURE METHOD AND STORY TELLING METHODMETHODS OF TEACHING: LECTURE METHOD AND STORY TELLING METHOD
METHODS OF TEACHING: LECTURE METHOD AND STORY TELLING METHOD
 
RELEVANCE OF HEALTH PSYCHOLOGY
RELEVANCE OF HEALTH PSYCHOLOGYRELEVANCE OF HEALTH PSYCHOLOGY
RELEVANCE OF HEALTH PSYCHOLOGY
 
THE RIGHT OF INFORMED CONSENT AND DIMENSIONS OF CONFIDENTIALITY.
THE RIGHT OF INFORMED CONSENT AND DIMENSIONS OF CONFIDENTIALITY.THE RIGHT OF INFORMED CONSENT AND DIMENSIONS OF CONFIDENTIALITY.
THE RIGHT OF INFORMED CONSENT AND DIMENSIONS OF CONFIDENTIALITY.
 
RECENT DEVELOPMENT IN FAMILY COUNSELLING
RECENT DEVELOPMENT IN FAMILY COUNSELLINGRECENT DEVELOPMENT IN FAMILY COUNSELLING
RECENT DEVELOPMENT IN FAMILY COUNSELLING
 
Ethics
EthicsEthics
Ethics
 
Psychological Factors in Health and Disease
Psychological Factors in Health and DiseasePsychological Factors in Health and Disease
Psychological Factors in Health and Disease
 
Family: Definition, Changing trends in family structure, Types of families, C...
Family: Definition, Changing trends in family structure, Types of families, C...Family: Definition, Changing trends in family structure, Types of families, C...
Family: Definition, Changing trends in family structure, Types of families, C...
 
Family life cycle
Family life cycleFamily life cycle
Family life cycle
 
Legislation
Legislation Legislation
Legislation
 
Chronic illness
Chronic illness Chronic illness
Chronic illness
 
Family Dynamics
Family DynamicsFamily Dynamics
Family Dynamics
 
CARING THE TERMINAL ILL
CARING THE TERMINAL ILLCARING THE TERMINAL ILL
CARING THE TERMINAL ILL
 
Dual and multiple relationships in counselling
Dual and multiple relationships in counsellingDual and multiple relationships in counselling
Dual and multiple relationships in counselling
 
ROLE OF COLLEGE COUNSELLOR
ROLE OF COLLEGE COUNSELLORROLE OF COLLEGE COUNSELLOR
ROLE OF COLLEGE COUNSELLOR
 
consultation
consultationconsultation
consultation
 
LEADERSHIP BEHAVIOUR DEFINATION AND THEORIES
LEADERSHIP BEHAVIOUR  DEFINATION AND THEORIESLEADERSHIP BEHAVIOUR  DEFINATION AND THEORIES
LEADERSHIP BEHAVIOUR DEFINATION AND THEORIES
 
SOMATOFORM AND DISSOCIATIVE DISORDERS
SOMATOFORM AND DISSOCIATIVE DISORDERSSOMATOFORM AND DISSOCIATIVE DISORDERS
SOMATOFORM AND DISSOCIATIVE DISORDERS
 
ETHICAL STANDARDS IN TESTING.
ETHICAL STANDARDS IN TESTING.ETHICAL STANDARDS IN TESTING.
ETHICAL STANDARDS IN TESTING.
 
BEHAVIOURAL COUNSELLING SPECIFIC TECHNIQUES
BEHAVIOURAL COUNSELLING  SPECIFIC TECHNIQUESBEHAVIOURAL COUNSELLING  SPECIFIC TECHNIQUES
BEHAVIOURAL COUNSELLING SPECIFIC TECHNIQUES
 
Models of counselling
Models of counsellingModels of counselling
Models of counselling
 

Recently uploaded

Evidences of Evolution General Biology 2
Evidences of Evolution General Biology 2Evidences of Evolution General Biology 2
Evidences of Evolution General Biology 2John Carlo Rollon
 
Transposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.pptTransposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.pptArshadWarsi13
 
Artificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PArtificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PPRINCE C P
 
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfBehavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfSELF-EXPLANATORY
 
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfAnalytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfSwapnil Therkar
 
‏‏VIRUS - 123455555555555555555555555555555555555555
‏‏VIRUS -  123455555555555555555555555555555555555555‏‏VIRUS -  123455555555555555555555555555555555555555
‏‏VIRUS - 123455555555555555555555555555555555555555kikilily0909
 
Pests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdfPests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdfPirithiRaju
 
Solution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutionsSolution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutionsHajira Mahmood
 
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPirithiRaju
 
Scheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docxScheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docxyaramohamed343013
 
Harmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms PresentationHarmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms Presentationtahreemzahra82
 
Twin's paradox experiment is a meassurement of the extra dimensions.pptx
Twin's paradox experiment is a meassurement of the extra dimensions.pptxTwin's paradox experiment is a meassurement of the extra dimensions.pptx
Twin's paradox experiment is a meassurement of the extra dimensions.pptxEran Akiva Sinbar
 
OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024innovationoecd
 
TOPIC 8 Temperature and Heat.pdf physics
TOPIC 8 Temperature and Heat.pdf physicsTOPIC 8 Temperature and Heat.pdf physics
TOPIC 8 Temperature and Heat.pdf physicsssuserddc89b
 
Grafana in space: Monitoring Japan's SLIM moon lander in real time
Grafana in space: Monitoring Japan's SLIM moon lander  in real timeGrafana in space: Monitoring Japan's SLIM moon lander  in real time
Grafana in space: Monitoring Japan's SLIM moon lander in real timeSatoshi NAKAHIRA
 
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxAnalytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxSwapnil Therkar
 
Neurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trNeurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trssuser06f238
 

Recently uploaded (20)

Hot Sexy call girls in Moti Nagar,🔝 9953056974 🔝 escort Service
Hot Sexy call girls in  Moti Nagar,🔝 9953056974 🔝 escort ServiceHot Sexy call girls in  Moti Nagar,🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Moti Nagar,🔝 9953056974 🔝 escort Service
 
Evidences of Evolution General Biology 2
Evidences of Evolution General Biology 2Evidences of Evolution General Biology 2
Evidences of Evolution General Biology 2
 
Transposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.pptTransposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.ppt
 
Artificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PArtificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C P
 
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfBehavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
 
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfAnalytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
 
‏‏VIRUS - 123455555555555555555555555555555555555555
‏‏VIRUS -  123455555555555555555555555555555555555555‏‏VIRUS -  123455555555555555555555555555555555555555
‏‏VIRUS - 123455555555555555555555555555555555555555
 
Pests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdfPests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdf
 
Solution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutionsSolution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutions
 
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
 
Scheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docxScheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docx
 
Harmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms PresentationHarmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms Presentation
 
Twin's paradox experiment is a meassurement of the extra dimensions.pptx
Twin's paradox experiment is a meassurement of the extra dimensions.pptxTwin's paradox experiment is a meassurement of the extra dimensions.pptx
Twin's paradox experiment is a meassurement of the extra dimensions.pptx
 
OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024
 
TOPIC 8 Temperature and Heat.pdf physics
TOPIC 8 Temperature and Heat.pdf physicsTOPIC 8 Temperature and Heat.pdf physics
TOPIC 8 Temperature and Heat.pdf physics
 
Grafana in space: Monitoring Japan's SLIM moon lander in real time
Grafana in space: Monitoring Japan's SLIM moon lander  in real timeGrafana in space: Monitoring Japan's SLIM moon lander  in real time
Grafana in space: Monitoring Japan's SLIM moon lander in real time
 
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
 
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxAnalytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
 
Volatile Oils Pharmacognosy And Phytochemistry -I
Volatile Oils Pharmacognosy And Phytochemistry -IVolatile Oils Pharmacognosy And Phytochemistry -I
Volatile Oils Pharmacognosy And Phytochemistry -I
 
Neurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trNeurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 tr
 

CORRELATION COEFFICIENTS

  • 20. Other formulae • Z-score method: r = Σ(zx · zy) / (N − 1) • Computational (raw score) method: r = [N ΣXY − ΣX ΣY] / √{[N ΣX² − (ΣX)²] [N ΣY² − (ΣY)²]}
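  Both formulas are easy to check in code. Below is a minimal Python sketch (ours, not from the slides; the function names are made up) of the two methods:

```python
import math

def pearson_r_zscores(x, y):
    """Z-score method: r = sum(zx * zy) / (N - 1), using sample SDs."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((v - mx) ** 2 for v in x) / (n - 1))
    sy = math.sqrt(sum((v - my) ** 2 for v in y) / (n - 1))
    return sum(((a - mx) / sx) * ((b - my) / sy) for a, b in zip(x, y)) / (n - 1)

def pearson_r_raw(x, y):
    """Computational (raw score) method: same r, no standardizing needed."""
    n = len(x)
    sxy = sum(a * b for a, b in zip(x, y))
    sx, sy = sum(x), sum(y)
    sx2, sy2 = sum(a * a for a in x), sum(b * b for b in y)
    num = n * sxy - sx * sy
    den = math.sqrt((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))
    return num / den
```

  Both functions return the same r for the same data, since the raw-score form is just an algebraic rearrangement of the z-score form.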
  • 21. Interpretation of Correlation Coefficient (r) • The value of the correlation coefficient r ranges from −1 to +1 • If r = +1, the correlation between the two variables is perfect and positive • If r = −1, the correlation between the two variables is perfect and negative • If r = 0, there is no linear correlation between the variables
  • 22. Relation between regression and correlation • The coefficient of correlation is the geometric mean of the two regression coefficients: r = √(bxy × byx)
  • 23. Limitations of Pearson’s Coefficient • It assumes a linear relationship • Interpreting the value of r is difficult • The value of the correlation coefficient is affected by extreme values (outliers)
  • 24. Outliers are dangerous • Here we have a spurious correlation: with all points, r = 0.68; without IBM, r = 0.48; without IBM & GE, r = 0.21
  • 25. Coefficient of Determination • It is the square of the correlation coefficient (r²) • It tells how much of the variability of one factor is explained by its relationship to the other factor • The maximum value of r² is 1: it is possible to explain all of the variation in y, but not more than all of it • Coefficient of Determination = Explained variation / Total variation
  • 26. Coefficient of Determination: an example • r = 0.60 vs r = 0.30 • It does not mean that the first correlation is twice as strong as the second • This can be understood by computing r²: when r = 0.60, r² = 0.36; when r = 0.30, r² = 0.09 • So in the first case 36% of the total variation is explained (shared), whereas in the second case only 9% of the total variation is explained (shared)
  • 28. Example
  Respondent    X     Y      XY      X²      Y²
  1             43    99     4257    1849    9801
  2             21    65     1365     441    4225
  3             25    79     1975     625    6241
  4             42    75     3150    1764    5625
  5             57    87     4959    3249    7569
  6             59    81     4779    3481    6561
  Σ            247   486    20485   11409   40022
  • 29. Solution • From our table: • Σx = 247 • Σy = 486 • Σxy = 20,485 • Σx2 = 11,409 • Σy2 = 40,022 • n is the sample size, in our case = 6
  • 30. Solution • r = [6(20,485) − (247 × 486)] / √{[6(11,409) − 247²] × [6(40,022) − 486²]} = 2868 / √(7445 × 3936) = 0.5298
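  As a check, the same arithmetic in Python (a sketch using the sums above; the variable names are ours):

```python
import math

# Sums from the worked example: n, sum(XY), sum(X), sum(Y), sum(X^2), sum(Y^2)
n, sxy, sx, sy, sx2, sy2 = 6, 20485, 247, 486, 11409, 40022
r = (n * sxy - sx * sy) / math.sqrt((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))
print(round(r, 4))  # 0.5298
```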
  • 31. Another example (x = number of absences, y = final grade)
        x     y     xy     x²     y²
  1     8    78    624     64   6084
  2     2    92    184      4   8464
  3     5    90    450     25   8100
  4    12    58    696    144   3364
  5    15    43    645    225   1849
  6     9    74    666     81   5476
  7     6    81    486     36   6561
  Σx = 57, Σy = 516, Σxy = 3751, Σx² = 579, Σy² = 39898
  • 32. Spearman’s Rank Coefficient of Correlation (Rho) • When the variables under study are arranged in serial order, the Spearman rank correlation can be used. • Rho = 1 − (6 ΣD²) / [N(N² − 1)] • Rho = rank correlation coefficient • D = difference of ranks between the paired items in the two series • N = total number of observations
  • 33. Rank Correlation Coefficient (Rho) a) Problems where actual ranks are given: 1) Calculate the difference D between the two ranks, i.e. (R1 − R2). 2) Square the differences and calculate their sum, i.e. ΣD². 3) Substitute the values obtained into the formula.
  • 34. Example • To calculate a Spearman rank-order correlation on data without any ties • English 56 75 45 71 62 64 58 80 76 61 • Maths 66 70 40 60 65 56 59 77 67 63
  • 35. Example
  Eng   Maths   Rank Eng   Rank Maths   d    d²
  56    66      9          4            5    25
  75    70      3          2            1    1
  45    40      10         10           0    0
  71    60      4          7            3    9
  62    65      6          5            1    1
  64    56      5          9            4    16
  58    59      8          8            0    0
  80    77      1          1            0    0
  76    67      2          3            1    1
  61    63      7          6            1    1
  • 36. Solution • Σd² = 25 + 1 + … = 54 • Applying the formula: Rho = 1 − (6 × 54) / [10(10² − 1)] = 1 − 324/990 = 1 − 0.327 = 0.673 ≈ 0.67
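  The whole Spearman calculation can be scripted. A sketch (our function names), assuming no tied scores and rank 1 = highest score, as in the slide:

```python
def spearman_rho(x, y):
    """Rho = 1 - 6*sum(d^2) / (N*(N^2 - 1)); assumes no tied scores."""
    def ranks(v):
        # Rank 1 goes to the highest score (matching the worked example).
        order = sorted(v, reverse=True)
        return [order.index(s) + 1 for s in v]
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

eng   = [56, 75, 45, 71, 62, 64, 58, 80, 76, 61]
maths = [66, 70, 40, 60, 65, 56, 59, 77, 67, 63]
print(round(spearman_rho(eng, maths), 3))  # 0.673
```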
  • 37. Rank Correlation Coefficient (Rho) • Equal ranks or ties in ranks: in such cases the average rank should be assigned to each tied individual • Example (to be worked out)
  • 38. Interpretation of Rank Correlation Coefficient • The value of rank correlation coefficient, R ranges from -1 to +1 • If R = +1, then there is complete agreement in the order of the ranks and the ranks are in the same direction • If R = -1, then there is complete agreement in the order of the ranks and the ranks are in the opposite direction • If R = 0, then there is no correlation
  • 39. Merits of Spearman’s Rank Correlation • This method is simpler to understand and easier to apply than Karl Pearson’s correlation method • This method is useful for ordinal data • But it is difficult to apply when the data set is large
  • 40. Kendall's Tau • Kendall's τ (tau) is a non-parametric measure of correlation between two ranked variables. It is similar to Spearman's Rho and Pearson's Product Moment Correlation Coefficient
  • 41. Calculation of τ • τ = (C − D) / (C + D) • C = number of concordant pairs • D = number of discordant pairs • Order the cases by their rank on the first variable; a pair is concordant when the later case also has the higher rank on the second variable • A pair is discordant when its rank on the second variable is equal to or lower than that of the earlier case
  • 42. Example • Rank variable 1: 1 2 3 4 5 6 7 • Rank variable 2: 1 3 6 2 7 4 5 • Counting concordant and discordant pairs gives C = 15 and D = 6 • τ = (15 − 6) / (15 + 6) = 9/21 = 0.429
  • 43. Example
  Change in testosterone   Display   Ranked change   Ranked display
  1.16                     5.40      1               1
  1.07                     3.80      2               4
  1.06                     3.60      3               5
  1.01                     4.80      4               2
  .96                      2.60      5               7
  .90                      4.60      6               3
  .81                      2.40      7               8
  .23                      3.20      8               6
  • 44. Calculating the Kendall tau-a coefficient • [Table 2: for each pair of cases, ordered by ranked change in testosterone, the ranked display scores are marked C (concordant) or D (discordant); in total the table holds 21 C’s and 7 D’s.]
  • 45. Solution • Taking the first person, who is ranked 1 for change in testosterone: how many of the people below are ranked higher for display? These pairs are concordant, and the answer is 7 people, so C = 7. The number of discordant people is zero, so D = 0. • Take the second person (display rank 4): of the 6 people below, the four with display ranks 5, 7, 8 and 6 are concordant (C = 4), and the two with ranks 2 and 3 are discordant (D = 2). • We keep doing this for each person, but we can make our lives easier by putting this into a table, as shown in Table 2: for each pair of people we say whether the scores are concordant (C) or discordant (D).
  • 46. Easier method • Ranked testosterone: 1 2 3 4 5 6 7 8 • Ranked display: 1 4 5 2 7 3 8 6 • Number of inversions r = 7 • τ = 1 − 2r / [n(n − 1)/2], where r = number of inversions and n = number of cases • τ = 1 − 14 / (8 × 7 / 2) = 1 − 14/28 = 1 − 0.5 = 0.5
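  The inversion-counting shortcut in Python (a sketch; it assumes the first variable is already in rank order and there are no ties):

```python
def kendall_tau_inversions(ranks2):
    """tau = 1 - 2*inversions / (n*(n-1)/2), variable 1 already in rank order."""
    n = len(ranks2)
    # An inversion is any pair that appears in the wrong order on variable 2.
    inversions = sum(1 for i in range(n) for j in range(i + 1, n)
                     if ranks2[i] > ranks2[j])
    return 1 - 2 * inversions / (n * (n - 1) / 2)

print(kendall_tau_inversions([1, 4, 5, 2, 7, 3, 8, 6]))  # 0.5
```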
  • 47. An easier method • Cases: A B C D E F G H I J • V1: 1 2 3 4 5 6 7 8 9 10 • V2: 2 1 5 3 4 6 10 8 7 9 • τ = 1 − 2r / [n(n − 1)/2], where r = number of inversions and n = number of cases • Here r = 7 (the pairs 2–1, 5–3, 5–4, 10–8, 10–7, 10–9 and 8–7 are out of order) • τ = 1 − 14/45 = 1 − 0.311 = 0.689
  • 48. Significance of tau • Z = τ / √[2(2n + 5) / (9n(n − 1))] • Here Z = 0.689 / √(50/810) = 0.689 / 0.248 = 2.77, which is > 1.96, so tau is significant at the .05 level
  • 49. Example 2 • A B C D E F G H I J • V1 1 2 3 4 5 6 7 8 9 10 • V2 5 1 2 4 3 10 6 7 9 8 • Do it yourself
  • 50. Where you can use it • To understand whether there is an association between exam grade and time spent revising (i.e., where there were six possible exam grades – A, B, C, D, E and F – and revision time was split into five categories: less than 5 hours, 5-9 hours, 10-14 hours, 15-19 hours, and 20 hours or more). • To understand whether there is an association between customer satisfaction and delivery time (i.e., where delivery time had four categories – next day, 2 working days, 3-5 working days, and more than 5 working days – and customer satisfaction was measured as highly satisfied, very satisfied, satisfied, dissatisfied, or highly dissatisfied).
  • 51. Comparison of tau and rank correlation • In most situations the interpretations of Kendall’s tau and Spearman’s rank correlation coefficient are very similar and thus invariably lead to the same inferences. • Kendall’s tau: usually gives smaller values than Spearman’s rho; calculations are based on concordant and discordant pairs; insensitive to error; p values are more accurate with smaller sample sizes; the distribution of Kendall’s tau has better statistical properties. • Spearman’s rho: usually has larger values than Kendall’s tau; calculations are based on deviations; much more sensitive to error and discrepancies in data; it is the more widely used rank correlation coefficient.
  • 52. Other Kinds of Correlation • Point biserial correlation coefficient (rpb) ▫ used with one continuous scale and one nominal, ordinal or dichotomous scale ▫ uses the same Pearson formula
  Attractiveness   Date?
  3                0
  4                0
  1                1
  2                1
  5                1
  6                0
  rpb = −0.49
  • 53. Point biserial • Point biserial is used when one variable is continuous and the other is dichotomous (like gender) • rpb = [(M1 − M0) / sn−1] × √[n1 n0 / (n(n − 1))]
  • 54. Computation of point biserial • rpb = [(Mp − Mq) / SD] × √(pq) • where rpb is the point biserial correlation • Mp is the mean score of students answering correctly • Mq is the mean score of students answering incorrectly • SD is the standard deviation of the whole sample • p is the proportion of students answering correctly • q is 1 − p
  • 55. Computation of point biserial
  Student   Item1   Item2   Item3   Total
  1         1       0       1       50
  2         1       0       1       45
  3         1       0       1       45
  4         1       0       1       40
  5         0       1       1       35
  6         0       1       1       30
  7         0       1       1       30
  8         0       1       1       25
  Mp        45      30      37.5
  Mq        30      45      0
  p         .50     .50     1.00
  q         .50     .50     0
  rpb       .91     −.91    0
  Mean total score = 37.5; SD of total scores = 8.29
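  A sketch of the same item analysis in Python (our function names; SD is the population SD of the totals, as in the table). The exact value for Item 1 is 0.905, which the slide rounds to .91:

```python
import math

def point_biserial(scores, correct):
    """r_pb = (Mp - Mq)/SD * sqrt(p*q); SD is the population SD of all scores."""
    n = len(scores)
    p = sum(correct) / n
    if p in (0.0, 1.0):          # no variance in the item (like Item 3) -> 0
        return 0.0
    mp = sum(s for s, c in zip(scores, correct) if c) / sum(correct)
    mq = sum(s for s, c in zip(scores, correct) if not c) / (n - sum(correct))
    mean = sum(scores) / n
    sd = math.sqrt(sum((s - mean) ** 2 for s in scores) / n)
    return (mp - mq) / sd * math.sqrt(p * (1 - p))

totals = [50, 45, 45, 40, 35, 30, 30, 25]
item1  = [1, 1, 1, 1, 0, 0, 0, 0]
print(round(point_biserial(totals, item1), 3))  # 0.905
```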
  • 56. Correlation and ‘t’ • We can convert r to t and test for significance: t = r √[(N − 2) / (1 − r²)], with df = N − 2
  • 57. Tables of Significance • Suppose r = 0.71 and n = 21 • Start with H0: r = 0 • df = N − 2 = 21 − 2 = 19 • t-crit(19) = 2.09 • t = r √[(N − 2)/(1 − r²)] = 0.71 × √(19/0.4959) = 0.71 × 6.19 = 4.39 • Since 4.39 is larger than 2.09, reject the null hypothesis that r = 0.
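  In code (a sketch; the function name is ours):

```python
import math

def r_to_t(r, n):
    """t = r * sqrt((N - 2) / (1 - r^2)), with df = N - 2."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

print(round(r_to_t(0.71, 21), 2))  # 4.39; exceeds t-crit(19) = 2.09, reject H0
```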
  • 58. Other Kinds of Correlation • Phi coefficient (Φ) ▫ used with two dichotomous scales ▫ uses the same Pearson formula
  Attractiveness   Date?
  0                0
  1                0
  1                1
  1                1
  0                0
  1                1
  Φ = 0.71
  • 59. Formula • For a 2×2 table with cells A, B (row total A+B), C, D (row total C+D) and column totals A+C and B+D: Φ = (AD − BC) / √[(A+B)(C+D)(A+C)(B+D)]
  • 60. Solution
             Attractiveness yes   Attractiveness no   Total
  Date yes   3                    0                   3
  Date no    1                    2                   3
  Total      4                    2                   6
  Φ = (3×2 − 0×1) / √(3 × 3 × 4 × 2) = 6/√72 = 6/8.49 = 0.707
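  The same 2×2 computation as a Python sketch (our function name; cells follow the layout above):

```python
import math

def phi(a, b, c, d):
    """phi = (AD - BC) / sqrt((A+B)(C+D)(A+C)(B+D)) for a 2x2 table."""
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

# Date yes/no (rows) x Attractiveness yes/no (columns) from the slide
print(round(phi(3, 0, 1, 2), 3))  # 0.707
```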
  • 62. Tetrachoric correlation • If you have dichotomous data on two variables but are willing to assume that the underlying variables are normally distributed, you may use the tetrachoric correlation to estimate the size of the correlation between the underlying variables. • rtet = cos {180° / (1 + √(ad/bc))}
  • 63. Tetrachoric correlation • When you have continuous data but want to split it into a dichotomous form (a median split, for example), you use the tetrachoric correlation. Here you are artificially making continuous data dichotomous.
  • 64. Example
                      Attitude towards women
  Score on openness   Negative   Positive   Total
  Above median        68 (a)     32 (b)     100
  Below median        30 (c)     70 (d)     100
  Total               98         102        200
  rtet = cos {180° / (1 + √(ad/bc))} = cos {180° / (1 + √[(68 × 70)/(30 × 32)])} = cos 55.78° = .56
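  The cosine approximation in Python (a sketch of the formula above; the function name is ours):

```python
import math

def tetrachoric_cos(a, b, c, d):
    """Cosine approximation: r_tet = cos(180 deg / (1 + sqrt(a*d / (b*c))))."""
    angle_deg = 180.0 / (1 + math.sqrt((a * d) / (b * c)))
    return math.cos(math.radians(angle_deg))

print(round(tetrachoric_cos(68, 32, 30, 70), 2))  # cos(55.78 deg) = 0.56
```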
  • 65. On tetrachoric • This formula works well only when ▫ N is large ▫ the splits are as near the median as possible • It is better to use the phi coefficient than the tetrachoric
  • 66. Advantages of correlation studies • Show the amount (strength) of relationship present • Can be used to make predictions about the variables under study • Can be used in many places, including natural settings, laboratories, etc. • Correlational data are easier to collect
  • 67. Factors Affecting r • Range restrictions (truncation) ▫ Looking at only a small portion of the total scatter plot (a smaller portion of the scores’ variability) decreases r ▫ Reducing variability reduces r ▫ This matters especially when validating against an external criterion in a selection scenario • Nonlinearity ▫ The Pearson r (and its relatives) measures the degree of linear relationship between two variables ▫ If a strong non-linear relationship exists, r will provide a low, or at least inaccurate, measure of the true relationship
  • 71. Factors affecting correlation • Reliability of measurement ▫ If you are not reliably distinguishing individuals on some measure, you will not adequately capture the covariance that measure may have with another • Heterogeneous subsamples ▫ Sub-samples may artificially increase or decrease the overall r ▫ Solution: calculate r separately for the sub-samples and overall, and look for differences ▫ Can be caused by lack of reliability • Outliers can artificially increase or decrease r
  • 72. Testing Correlations • How do we find out if a correlation is big? • In terms of magnitude, how big is big? ▫ Small correlations in large samples are “big” ▫ Large correlations in small samples aren’t always “big” • It depends upon the magnitude of the correlation coefficient AND the size of your sample
  • 73. Correlation and effect size (Pearson r or correlation coefficient)
  Effect size   r
  Small         0.10
  Medium        0.30
  Large         0.50
  • 75. What is a partial correlation • Partialling is holding a third variable constant via residuals • It estimates what would happen if everyone had the same score on the third variable
  • 76. Partial correlation • Two variables A and B are correlated. But you feel that this relationship is influenced by a third variable C. You want to remove this influence and want to know the true correlation between A and B. • In this case you partial out the influence of C.
  • 77. Example • You know that exam grades are correlated with intelligence. You also know that exam grades are influenced by exam anxiety. You also know that intelligence scores are moderated by anxiety. You want to know the correlation between exam grade and intelligence when controlled for anxiety.
  • 78. Example • Suppose you have the following data. • Correlation between exam grade (A) and intelligence (B) = .918 • Correlation between exam grade (A) and anxiety (C) = −.369 • Correlation between anxiety (C) and intelligence (B) = −.245
  • 79. Solution • The correlation between exam grade and intelligence controlled for anxiety is • rAB.C = (rAB − rAC × rBC) / √[(1 − r²AC)(1 − r²BC)] • = [.918 − (−.369 × −.245)] / √[(1 − .369²)(1 − .245²)] • = (.918 − .090) / √(.864 × .940) = .828 / .901 = .919 • The true correlation between exam score and intelligence is .919. We can see that the correlation improved slightly after partialling out the effect of anxiety.
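  The first-order partial in Python (a sketch; the function name is ours — note the unrounded answer is 0.918, and the slide’s .919 comes from rounding intermediate values):

```python
import math

def partial_r(r_ab, r_ac, r_bc):
    """First-order partial: r_AB.C = (r_AB - r_AC*r_BC) / sqrt((1-r_AC^2)(1-r_BC^2))."""
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))

print(round(partial_r(0.918, -0.369, -0.245), 3))  # 0.918
```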
  • 80. Example 2
  Attitude towards women   Openness   Education
  2                        7          14
  4                        10         13
  8                        14         11
  7                        13         9
  8                        9          5
  9                        10         14
  1                        9          5
  0                        9          6
  6                        12         11
  5                        10         12
  • 81. Solution • Find the correlation between ATW and Openness • Find the correlation between ATW and Education • Find the correlation between Openness and Education • Partial out the effect of education from the correlation between ATW and Openness
  • 82. Solution • r between ATW and Openness = 0.662 (rAB) • r between ATW and Education = 0.276 (rAC) • r between Openness and Education = 0.250 (rBC) • Partialling out the effect of education: • rAB.C = (rAB − rAC × rBC) / √[(1 − r²AC)(1 − r²BC)] • = (.662 − .276 × .250) / √[(1 − .276²)(1 − .250²)] = .637
  • 83. Partial and semi-partial correlation • In partial correlation the effect of the third variable is partialled out of both variables: rAB.C • In semi-partial correlation the effect of the third variable is partialled out of only one of the two variables: rA(B.C)
  • 84. Order of partialling • If you partial 1 variable out of a correlation, the result is called a first-order partial correlation. • If you partial 2 variables out of a correlation, the result is called a second-order partial correlation. There can be 3rd, 4th, etc., order partials. • Unpartialled (raw) correlations are called zero-order correlations because nothing is partialled out. • You can use regression to find residuals and compute partial correlations from the residuals, e.g. for r12.34, regress 1 and 2 on both 3 and 4, then compute the correlation between the 2 sets of residuals.
  • 85. Solution • In the above example the relationship between exam grade and intelligence can be semi-partialled by removing the effect of anxiety from intelligence only • rA(B.C) = (rAB − rAC × rBC) / √(1 − r²BC) • = [.918 − (−.369 × −.245)] / √[1 − (−.245)²] • = (.918 − .090) / .970 = .828 / .970 = .854 • The correlation between exam grade and intelligence after removing the influence of anxiety on intelligence is .854. The effect of anxiety on exam grade is not removed.
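  The semi-partial differs from the partial only in the denominator. A companion sketch (our function name):

```python
import math

def semipartial_r(r_ab, r_ac, r_bc):
    """Semi-partial: r_A(B.C) = (r_AB - r_AC*r_BC) / sqrt(1 - r_BC^2)."""
    return (r_ab - r_ac * r_bc) / math.sqrt(1 - r_bc ** 2)

print(round(semipartial_r(0.918, -0.369, -0.245), 3))  # 0.854
```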
  • 86. Strength of the Association • The coefficient of determination, r², measures the strength of the association and is the ratio of explained variation in y to the total variation in y. • The correlation coefficient of number of times absent and final grade is r = −0.975. The coefficient of determination is r² = (−0.975)² = 0.9506. • Interpretation: about 95% of the variation in final grades can be explained by the number of times a student is absent. The other 5% is unexplained and can be due to sampling error or other variables such as intelligence, amount of time studied, etc.
  • 88. Regression Analysis • Regression Analysis is a very powerful tool in the field of statistical analysis in predicting the value of one variable, given the value of another variable, when those variables are related to each other.
  • 89. Regression Analysis • Regression takes us a step beyond correlation: not only are we concerned with the strength of the association, we also want to describe its nature with sufficient precision to be able to make predictions • To be able to make predictions, we need to be able to characterize one of the variables in the relationship as independent and the other as dependent
  • 90. Regression Analysis • For example, in the relationship between male literacy and the % of people living in cities, the causal order seems pretty obvious. Literacy rates are not likely to produce urbanization, but urbanization is probably causally prior to increases in literacy rates.
  • 91. Regression and Prediction • Saying that the number of vehicles and air pollution are correlated does not convey a causal relationship, though you know that vehicles can increase pollution and pollution cannot increase vehicles • In regression analysis you predict, for each unit increase in the vehicle population, how much increase in pollution will result
  • 92. In short • Regression analysis is a mathematical measure of the average relationship between two or more variables. • Regression analysis is a statistical tool used to predict the value of an unknown variable from a known variable.
  • 93. Advantages of Regression Analysis • Regression analysis provides estimates of values of the dependent variable from values of the independent variables • It also helps to obtain a measure of the error involved in using the regression line as a basis for estimation • It helps in obtaining a measure of the degree of association or correlation that exists between the two variables
  • 95. What is regression? • Fitting a line to the data using an equation in order to describe and predict data • Simple Regression ▫ Uses just 2 variables (X and Y) ▫ Other: Multiple Regression (one Y and many X’s) • Linear Regression ▫ Fits data to a straight line ▫ Other: Curvilinear Regression (curved line)
  • 96. Assumptions in Regression Analysis • Existence of an actual linear relationship • The regression analysis is used to estimate values within the range for which it is valid • In regression we have only one dependent variable in our estimating equation; however, we can use more than one independent variable
  • 97. Assumptions in Regression Analysis • The dependent variable takes any random value, but the values of the independent variables are fixed
  • 98. What is regression • Regression indicates the degree to which the variation in one variable, X, is related to or can explain the variation in another variable, Y • Once you know there is a significant linear correlation, you can write an equation describing the relationship between the x and y variables. This equation is called the line of regression or least squares line.
  • 99. Regression Equation • Regression line of Y on X: gives the best estimate of y for any given value of x • Y = aX + b, where a = slope of the line, b = Y-intercept, Y = dependent variable, X = independent variable
  • 100. Regression Equation: we can predict a Y score from an X by plugging a value for X into the equation and calculating Y • What would we expect a person to get on quiz #4 if they got a 12.5 on quiz #3? • Ŷ = .823X − 4.239 • Ŷ = .823(12.5) − 4.239 = 6.049
  • 101. Interpreting Regression: Basics • Intercept ▫ Value of Y if X(s) is 0 ▫ Often not meaningful, particularly if it’s practically impossible to have an X of 0 (e.g., weight) • Slope, the regression coefficient ▫ Amount of change in Y seen with a 1-unit change in X ▫ Standardized regression coefficient: amount of change in Y, in standard deviation units, with a 1-SD change in X; in simple regression it is equivalent to the r for the two variables • Standard error of estimate ▫ Gives a measure of the accuracy of prediction • R² ▫ Proportion of variance in the outcome explained by the model ▫ Effect size
  • 104. Explanation • In the above example, y = 5x + 2 • 5 is the slope and 2 is the intercept • This means that the predicted Y = 5 × (value of x) + 2 • Suppose you want to predict the value corresponding to x = 5: then Y(pred) = (5 × 5) + 2 = 27 • If x = 12, then Y(pred) = (5 × 12) + 2 = 62
  • 105. Explanation • We should also note that what we calculate is the estimated value of Y for a given value of x • This need not be accurate; there is some error in prediction (because we assume the regression line to be a straight line, but the data points actually cluster around the line, not exactly on it)
  • 106. Explanation • This predicted (estimated) value of Y is written Ŷ • Y − Ŷ is the error • The regression line is fitted in such a way that this error is at a minimum
  • 107. The Explanation of the Regression Lines • In the case of perfect correlation (positive or negative) the two lines of regression coincide • If the two regression lines are far from each other, the degree of correlation is less, and vice versa • The mean values of X and Y can be obtained as the point of intersection of the two regression lines • The higher the degree of correlation between the variables, the smaller the angle between the lines, and vice versa
  • 108. Regression Equation / Line & Method of Least Squares • Regression equation of y on x: Y = aX + b (we have to obtain the values of a and b) • Regression equation of x on y: X = cY + d (we have to obtain the values of c and d)
  • 109. How to calculate • Regression equation: Ŷ = ax + b, where a is the slope and b is the y-intercept • Slope: a = [nΣxy − ΣxΣy] / [nΣx² − (Σx)²] • Y-intercept: b = Ȳ − aX̄
  • 110. Regression Equation / Line when Deviations are taken from the Arithmetic Mean • Regression equation of y on x: Y − Ȳ = byx(X − X̄), where byx = Σxy / Σx², i.e. byx = r(σy/σx) • Regression equation of x on y: X − X̄ = bxy(Y − Ȳ), where bxy = Σxy / Σy², i.e. bxy = r(σx/σy)
  • 111. Properties of the Regression Coefficients • The coefficient of correlation is the geometric mean of the two regression coefficients: r = √(byx × bxy) • If byx is positive then bxy should also be positive, and vice versa • If one regression coefficient is greater than one, the other must be less than one • The coefficient of correlation has the same sign as the regression coefficients • The arithmetic mean of byx and bxy is equal to or greater than the coefficient of correlation: (byx + bxy)/2 ≥ r • Regression coefficients are independent of origin but not of scale
  • 112. [Scatter plot: Ad $ (x-axis, 1.5–3.0) against revenue (y-axis, 180–260) with the best-fitting straight line; a residual is the vertical distance between a data point (xi, yi) and the point on the line with the same x-value.]
  • 113. Calculate a and b, and write the equation of the line of regression with x = number of absences and y = final grade.
  x     y     xy     x²     y²
  8    78    624     64   6084
  2    92    184      4   8464
  5    90    450     25   8100
  12   58    696    144   3364
  15   43    645    225   1849
  9    74    666     81   5476
  6    81    486     36   6561
  Σx = 57, Σy = 516, Σxy = 3751, Σx² = 579, Σy² = 39898
  The line of regression is: Ŷ = −3.924x + 105.667
  • 114. Solution • a = [nΣxy − ΣxΣy] / [nΣx² − (Σx)²] = [7 × 3751 − 57 × 516] / [7 × 579 − 57²] = −3155/804 = −3.924 • b = Ȳ − aX̄ = 516/7 − (−3.924 × 57/7) = 73.714 + 31.953 = 105.667 • The line of regression is Ŷ = −3.924x + 105.667
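  The least-squares fit is easy to reproduce. A Python sketch (our function name) using the absences/grades data:

```python
def fit_line(x, y):
    """a = (n*Sxy - Sx*Sy) / (n*Sx2 - Sx^2); b = mean(y) - a*mean(x)."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sx2 = sum(a * a for a in x)
    a = (n * sxy - sx * sy) / (n * sx2 - sx ** 2)
    b = sy / n - a * sx / n
    return a, b

absences = [8, 2, 5, 12, 15, 9, 6]
grades   = [78, 92, 90, 58, 43, 74, 81]
a, b = fit_line(absences, grades)
print(round(a, 3), round(b, 3))  # -3.924 105.668
```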
  • 115. The Line of Regression • [Scatter plot of absences (x, 0–16) against final grade (y, 40–95) with the fitted line; m = −3.924 and b = 105.667] • Note that the point (X̄, Ȳ) = (8.143, 73.714) is on the line.
  • 116. Predicting y values • The regression line can be used to predict values of y for values of x falling within the range of the data. • The regression equation for number of times absent and final grade is: Ŷ = −3.924x + 105.667 • Use this equation to predict the expected grade for a student with (a) 3 absences, (b) 12 absences • (a) Ŷ = −3.924(3) + 105.667 = 93.895 • (b) Ŷ = −3.924(12) + 105.667 = 58.579
  • 117. Estimating the error • Here the estimated grade for 12 absences is 58.58 • But from the original data you can find that for 12 absences the grade obtained is 58. So the error is Y − Ŷ = 58 − 58.58 = −0.58 • Similarly, we can calculate the error for 6 absences
  • 118. Estimating the error • Ŷ = −3.924(6) + 105.667 = 82.12 • But the obtained value is 81 • So the estimated value and the obtained value differ somewhat
  • 119. Standard Error of Estimate • The standard error of estimate is the measure of variation around the computed regression line • The standard error of estimate (SE) of Y measures the variability of the observed values of Y around the regression line • It gives us a measure of the scatter of the observations about the line of regression
  • 120. Estimating the error • We can estimate this error by the following formula: SEest of Y = σy √(1 − r²) • σy is the SD of the y distribution • r² is the squared correlation between x and y
  • 121. Estimating the error • In the above example the correlation between number of absences (x) and grades (y) is −.975, and the SD of y is 17.61 • SE = σy √(1 − r²) = 17.61 × √(1 − .9506) = 17.61 × 0.222 = 3.91 • This means the estimated y can be off by about ±3.91
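  The same computation in Python (a sketch; the SD here is the sample SD of the grades, which is where the 17.61 comes from):

```python
import math

grades = [78, 92, 90, 58, 43, 74, 81]
n = len(grades)
mean = sum(grades) / n
sd = math.sqrt(sum((g - mean) ** 2 for g in grades) / (n - 1))  # 17.61
r = -0.975
se = sd * math.sqrt(1 - r ** 2)
print(round(se, 2))  # 3.91
```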
  • 122. Problem in Regression • The following are the scores obtained on two variables X, Y by 10 individuals. Find the regression of Y on X and of X on Y.
  Indl   X     X²     Y    Y²    XY
  1      40    1600   2    4     80
  2      43    1849   5    25    215
  3      45    2025   4    16    180
  4      46    2116   7    49    322
  5      60    3600   9    81    540
  6      63    3969   5    25    315
  7      69    4761   2    4     138
  8      54    2916   8    64    432
  9      70    4900   6    36    420
  10     62    3844   9    81    558
  • 123. solution • ΣX = 552 • ΣY = 57 • ΣX² = 31580 • ΣY²= 385 • ΣXY= 3200
  • 124. Solution • a = [nΣxy − ΣxΣy] / [nΣx² − (Σx)²] • = [10 × 3200 − (552 × 57)] / [10 × 31580 − 552²] • = (32000 − 31464) / (315800 − 304704) • = 536/11096 = 0.048
  • 125. Solution • b = Ȳ − aX̄ = 5.7 − 0.048 × 55.2 = 3.05 • The regression equation is Ŷ = .048x + 3.05 • (for x = 60) Ŷ = .048 × 60 + 3.05 = 5.93 • The obtained score is 9, so the error is Y − Ŷ = 3.07
  • 126. Solution • For x = 70, what is the estimated Y? • Estimated Ŷ = .048 × 70 + 3.05 = 6.41 • The obtained value for x = 70 is 6 • So the error is Y − Ŷ = 6 − 6.41 = −.41
  • 127. Calculation of the regression of X on Y • c = [nΣxy − ΣxΣy] / [nΣy² − (Σy)²] • The numerator is the same; the denominator is 10 × 385 − 57² = 601 • c = 536/601 = 0.892 • d = X̄ − cȲ = 55.2 − 0.892 × 5.7 = 50.116
  • 128. Calculation of the regression of X on Y • The regression equation is X̂ = cY + d = .892y + 50.116 • Verify: for y = 5, X̂ = .892 × 5 + 50.116 = 54.576 (obtained value = 43) • For y = 9, X̂ = 58.144 (obtained value = 62)
  • 129. Standard error • SE = σy √(1 − r²) • SD of y = 2.584; r between X and Y = 536 / √(11096 × 601) = 0.208 • SE = 2.584 × √(1 − 0.208²) = 2.584 × 0.978 = 2.53 • This means that for any given value of x the estimated value of y may be about ±2.53 from the true value
  • 130. Multiple Regression Y = a + b1X1 + b2X2 Notation  a is the Y intercept, where the regression line crosses the Y axis  b1 is the partial slope for X1 on Y  b1 indicates the change in Y for one unit change in X1, controlling for X2  b2 is the partial slope for X2 on Y  b2 indicates the change in Y for one unit change in X2, controlling for X1
  • 131. Partial Slopes • The partial slopes = the effect of each independent variable on Y while controlling for the effect of the other independent variable(s). • Show the effects of the X’s in their original units. • These values can be used to predict scores on Y. • Partial slopes must be computed before computing a (the Y intercept).
  • 132. Formula for the Y intercept • The standard formula: a = Ȳ − b1X̄1 − b2X̄2 (the mean of Y minus each partial slope times the mean of its X)
  • 134. Standardized Partial Slopes (beta-weights) • Partial slopes (b1 and b2) are in the original units of the independent variables. • To compare the relative effects of the independent variables, compute beta-weights (b*). • Beta-weights show the amount of change in the standardized scores of Y for a one-unit change in the standardized scores of each independent variable while controlling for the effects of all other independent variables.
  • 135. Beta-weights • The standard formula for the beta-weight for X1: b*1 = b1(s1/sy) • For X2: b*2 = b2(s2/sy) • (s1, s2 and sy are the standard deviations of X1, X2 and Y)
  • 136. Multiple Correlation (R²) • The multiple correlation coefficient (R²) shows the combined effects of all independent variables on the dependent variable.
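  For the two-predictor case, the beta-weights and R² can be computed directly from the three pairwise correlations. A sketch (our function names), using the standard formulas:

```python
def two_predictor_weights(ry1, ry2, r12):
    """Beta-weights for Y on X1 and X2, and R^2 = b1*ry1 + b2*ry2."""
    beta1 = (ry1 - ry2 * r12) / (1 - r12 ** 2)
    beta2 = (ry2 - ry1 * r12) / (1 - r12 ** 2)
    r_squared = beta1 * ry1 + beta2 * ry2
    return beta1, beta2, r_squared

def unstandardize(beta, s_x, s_y):
    """Convert a beta-weight back to a raw-score partial slope: b = b* (s_y/s_x)."""
    return beta * (s_y / s_x)
```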
  • 137. Limitations • Multiple regression and correlation are among the most powerful techniques available to researchers, but they require: ▫ Every variable is measured at the interval-ratio level ▫ Each independent variable has a linear relationship with the dependent variable ▫ Independent variables do not interact with each other ▫ Independent variables are uncorrelated with each other
  • 138. Limitations When these requirements are violated (as they often are), these techniques will produce biased and/or inefficient estimates. There are more advanced techniques available to researchers that can correct for violations of these requirements. Such techniques are beyond the scope of this course
  • 140. Statement of the problem • A common problem is that there is a large set of candidate predictor variables. • (Note: the examples herein are really not that large.) • The goal is to choose a small subset from the larger set so that the resulting regression model is simple, yet has good predictive ability.
  • 141. Example: selection data • You are trying to select the best candidates from a pool of applicants for a job, using a number of variables (and their tests): • Cognitive ability • Adjustment • Integrity • Leadership • Stress tolerance
  • 142. Your problem • You want to select those variables which together will predict the criterion (job success) • You want to select as few variables as possible • Together, their predictive efficiency must be maximal
  • 143. Two basic methods of selecting predictors • Stepwise regression: Enter and remove predictors, in a stepwise manner, until there is no justifiable reason to enter or remove more. • Best subsets regression: Select the subset of predictors that do the best at meeting some well- defined objective criterion.
  • 144. What is stepwise • First include the test (variable) with the maximum predictive ability (predictive validity) • Add a new test (the second best) • See if it adds to the multiple correlation (R) • If yes, add a third one • Go on adding tests until R does not increase • When R no longer increases, you have reached your maximum predictive efficiency (a rough sketch of this loop follows below)
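  A rough Python sketch of the forward-stepwise loop described above (ours, not the presenter's; it assumes numpy, and min_gain — the stopping threshold — is a made-up parameter):

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least-squares fit with an intercept term."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

def forward_select(predictors, y, min_gain=0.01):
    """Greedily add the predictor that raises R^2 most; stop when the gain is small.

    predictors: dict mapping test name -> 1-D numpy array of scores.
    """
    chosen, best_r2 = [], 0.0
    while len(chosen) < len(predictors):
        gains = {}
        for name in predictors:
            if name in chosen:
                continue
            X = np.column_stack([predictors[k] for k in chosen + [name]])
            gains[name] = r_squared(X, y)
        name, r2 = max(gains.items(), key=lambda kv: kv[1])
        if r2 - best_r2 < min_gain:   # R no longer increases meaningfully
            break
        chosen.append(name)
        best_r2 = r2
    return chosen, best_r2
```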

Editor's Notes

  1. The value of r that is computed represents the correlation coefficient of the sample. Have students interpret this result. Since r is close to -1, there is a strong negative correlation. As the number of absences increases, grades tend to decrease. Since there are 7 ordered pairs, n = 7.
  2. The proof that the coefficient of determination is equal to the square of the correlation coefficient is beyond the scope of the text.
  3. The value of d can be positive, negative or 0. Discuss the circumstances for each. The sum of the values of d will be 0 for the regression line. Squaring d eliminates negative values. Criteria for the Best Fit Line: The sum of the squares of the distances will be minimized.
  4. The sums are repeated here, but they have already been calculated when determining the value of r. A TI-83 can also be used to compute the equation.
  5. To graph the line of regression, find two points that satisfy the equation. Use any x values within the range of the data. Remember that (x-bar, y-bar) can be used as a point. For someone absent no times, a predicted grade is 105.667 (about 106). Each time a person is absent, it is expected that their grade will decrease by close to 4 points. (-3.924)
  6. Prediction values are meaningful only for x-values in (or close to) the range of x values in the data. If x = 100, the prediction found by using the equation would be meaningless. A person who has been absent 3 times is predicted to have a final grade of about 94. A person who has been absent 12 times is predicted to have a grade of about 59.