FIZZA SARFARAZ
ABBOTTABAD UNIVERSITY OF
SCIENCE & TECHNOLOGY (A.U.S.T)
CORRELATION
HISTORY
GALTON:
Obsessed with measurement
Tried to measure everything
from the weather to female
beauty
Invented correlation and
regression
KARL PEARSON
Formalized Galton's method
Developed the product-moment correlation coefficient
CORRELATION
Measure of the degree to
which any two variables
vary together.
or
Simultaneous variation
of variables in some
direction.
e.g., iron bar
INTRODUCTION
Association b/w two variables
Nature & strength of
relationship b/w two variables
Both random variables
Lies b/w +1 & -1
0 = No relationship b/w
variables
-1 = Perfect Negative
correlation
+1 = Perfect positive
correlation
Ice-cream - Temperature
MATHEMATICALLY
$$r = \frac{\sum xy - \frac{(\sum x)(\sum y)}{n}}{\sqrt{\left(\sum x^2 - \frac{(\sum x)^2}{n}\right)\left(\sum y^2 - \frac{(\sum y)^2}{n}\right)}}$$
EXAMPLE
Anxiety (X)   Test score (Y)   X²          Y²          XY
10            2                100         4           20
8             3                64          9           24
2             9                4           81          18
1             7                1           49          7
5             6                25          36          30
6             5                36          25          30
∑X = 32       ∑Y = 32          ∑X² = 230   ∑Y² = 204   ∑XY = 129
Calculating Correlation Coefficient
$$r = \frac{(6)(129) - (32)(32)}{\sqrt{\left[6(230) - 32^2\right]\left[6(204) - 32^2\right]}} = \frac{774 - 1024}{\sqrt{(356)(200)}} = -0.94$$
r = -0.94
Strong indirect (negative) correlation
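The arithmetic of the anxiety/test-score example can be checked with a few lines of Python. This is a minimal sketch, not part of the original slides; the function name `pearson_r` and the variable names are illustrative assumptions.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r from raw sums: n, sum x, sum y, sum xy, sum x^2, sum y^2."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Data from the anxiety / test-score example
anxiety = [10, 8, 2, 1, 5, 6]
test_score = [2, 3, 9, 7, 6, 5]

r = pearson_r(anxiety, test_score)
print(round(r, 2))  # -0.94
```

The function works from the same column totals as the worked table, so each intermediate sum can be compared with the slide.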
METHODS OF STUDYING
CORRELATION
METHODS
SCATTER DIAGRAM
KARL PEARSON
COEFFICIENT
SPEARMAN'S RANK
SCATTER DIAGRAM
Rectangular coordinate
Two quantitative variables
1 variable: independent (X) & 2nd:
dependent (Y)
Points are not joined
KARL PEARSON
COEFFICIENT
• Statistic showing the degree of relationship
b/w two variables.
• represented by 'r'
• called Pearson's correlation
SPEARMAN'S RANK
Actual measurement of objects/individuals not available
Accurate assessment is not possible
Arranged in order
Ordered arrangement: Ranking
Order given to object: Ranks
Correlation b/w two sets X & Y: Rank correlation
PROCEDURE
• Rank values of X from 1 to n (n: number of pairs of values of X & Y)
• Rank Y from 1 to n
• Tied values share the average of the ranks they occupy
• Compute 'di' as the difference between the two ranks: di = rank(Xi) - rank(Yi)
• Square each di & compute ∑di²
• Apply formula:
$$r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}$$
EXAMPLE
In a study of the relationship between level of education and income, the following data were obtained. Find the relationship between them and comment.
Sample   Level of education (X)   Income (Y)
A        Preparatory              25
B        Primary                  10
C        University               8
D        Secondary                10
E        Secondary                15
F        Illiterate               50
G        University               60
Sample   X             Y    Rank X   Rank Y   di     di²
A        Preparatory   25   5        3        2      4
B        Primary       10   6        5.5      0.5    0.25
C        University    8    1.5      7        -5.5   30.25
D        Secondary     10   3.5      5.5      -2     4
E        Secondary     15   3.5      4        -0.5   0.25
F        Illiterate    50   7        2        5      25
G        University    60   1.5      1        0.5    0.25
∑di² = 64
$$r_s = 1 - \frac{6 \times 64}{7(48)} = 1 - \frac{384}{336} \approx -0.14$$
Conclusion:
There is a weak indirect correlation between level of education and income.
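The ranking procedure and the rank-correlation formula can be sketched in Python as follows. This is an illustrative sketch only: the helper names and the numeric coding of education levels (0 = illiterate up to 4 = university) are assumptions made here so that the levels can be ranked. The slides give only the labels.

```python
def average_ranks(values):
    """Rank from 1 (largest value) to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = ((i + 1) + (j + 1)) / 2
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rs(rank_x, rank_y):
    """Rank correlation: r_s = 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(rank_x)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Samples A..G; education coded 0..4 (hypothetical coding, higher = more education)
education = [2, 1, 4, 3, 3, 0, 4]
income = [25, 10, 8, 10, 15, 50, 60]

rank_education = average_ranks(education)
rank_income = average_ranks(income)
sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_education, rank_income))
rs = spearman_rs(rank_education, rank_income)
print(sum_d2, round(rs, 2))  # 64.0 -0.14
```

With this coding, the computed ranks reproduce the Rank X and Rank Y columns of the worked example, including the averaged ranks for ties.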
TYPES
Types
Type 1 Type 2 Type 3
TYPE 1
Positive, Negative, No, Perfect
• POSITIVE - both variables increase or decrease together
• NEGATIVE - one increases while the other decreases
• NO - the variables show no relationship
• PERFECT - both variables change in exactly the same proportion (r = +1 or -1)
EXAMPLES
+ive Relationships
• Water consumption
& temperature
• Study times &
grades
-ive Relationships
• Alcohol
consumption &
driving ability
• Price & Quantity
demanded
TYPE 2
Type 2
Linear
Non-linear
• LINEAR - Perfect
straight line on graph
• NON-LINEAR - Not
a perfect straight line
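To see the difference between a linear and a non-linear relationship numerically, the sketch below computes Pearson's r for two small data sets. The numbers are hypothetical and chosen only for illustration: production that doubles the number of workers (linear), and pressure that varies as 10 divided by volume (non-linear). The function name `pearson_r` is also an assumption.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r from deviations about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Linear: output is exactly proportional to the number of workers (hypothetical values)
workers = [1, 2, 3, 4, 5]
production = [2 * w for w in workers]

# Non-linear: pressure falls as volume rises, but along a curve (hypothetical values)
volume = [1, 2, 3, 4, 5]
pressure = [10 / v for v in volume]

r_linear = pearson_r(workers, production)
r_nonlinear = pearson_r(volume, pressure)
print(round(r_linear, 2), round(r_nonlinear, 2))
```

The linear pair gives r = 1 because every point lies on one straight line. The curved pair gives a negative r whose magnitude is below 1, even though pressure is completely determined by volume, because r measures only the straight-line part of a relationship.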
TYPE 3
Type 3
Simple Multiple Partial
• SIMPLE - 1 independent & 1 dependent
variable
• MULTIPLE - 1 dep & more than 1 indep
variable
• PARTIAL - 1 dep & more than 1 indep
variable, but only 1 indep variable is considered
while the others are held constant
COEFFICIENT OF
CORRELATION
Measure of the strength of linear relationship
b/w two variables.
Represented by 'r'
'r' lies b/w +1 & -1
-1 ≤ r ≤ +1
• +ive sign = +ive linear correlation
• -ive sign = -ive linear correlation
MATHEMATICALLY 'r'
$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}$$
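INTERPRETATION OF 'r'
-1 ≤ r ≤ +1
r = -1: perfect correlation (indirect)
-1 to -0.75: strong (indirect)
-0.75 to -0.25: intermediate (indirect)
-0.25 to 0: weak (indirect)
r = 0: no relation
0 to 0.25: weak (direct)
0.25 to 0.75: intermediate (direct)
0.75 to 1: strong (direct)
r = +1: perfect correlation (direct)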
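The interpretation scale can be written as a small lookup function. This is a sketch under one stated assumption: the slide does not say which band the boundary values 0.25 and 0.75 belong to, so here they are assigned to the stronger band. The function name `interpret_r` is illustrative.

```python
def interpret_r(r):
    """Describe r using the bands weak / intermediate / strong and direct / indirect."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and +1")
    if r == 0:
        return "no relation"
    direction = "direct" if r > 0 else "indirect"
    size = abs(r)
    if size == 1:
        return f"perfect {direction} correlation"
    # Assumption: boundary values 0.25 and 0.75 fall in the stronger band
    if size >= 0.75:
        strength = "strong"
    elif size >= 0.25:
        strength = "intermediate"
    else:
        strength = "weak"
    return f"{strength} {direction} correlation"

print(interpret_r(-0.94))  # strong indirect correlation
print(interpret_r(-0.1))   # weak indirect correlation
```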
CORRELATION: LINEAR RELATIONSHIPS
[Figure: two scatter plots of Symptom Index against dose in mg, one for Drug A and one for Drug B]
Strong Relationship → Good linear fit
Points clustered closely around a line show a
strong correlation. The line is a good
predictor (good fit) of the data. The more
spread out the points, the weaker the
correlation and the poorer the fit. The line
is a REGRESSION line (Y = bX + a).
r : shows whether the relationship b/w variables is
+ive or -ive
r² : shows the proportion of variation in Y explained by the
best-fit line (coefficient of determination)
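As a rough illustration of how the regression line Y = bX + a and r² relate to r, the sketch below fits a least-squares line to the anxiety/test-score data used earlier in the deck. The slides do not show this calculation; the least-squares formulas for b and a are the standard ones, and all variable names are illustrative.

```python
from math import sqrt

# Anxiety (X) and test score (Y) from the earlier worked example
anxiety = [10, 8, 2, 1, 5, 6]
test_score = [2, 3, 9, 7, 6, 5]

n = len(anxiety)
sum_x, sum_y = sum(anxiety), sum(test_score)
sum_xy = sum(x * y for x, y in zip(anxiety, test_score))
sum_x2 = sum(x * x for x in anxiety)
sum_y2 = sum(y * y for y in test_score)

sxy = n * sum_xy - sum_x * sum_y   # 6*129 - 32*32 = -250
sxx = n * sum_x2 - sum_x ** 2      # 6*230 - 32^2  = 356
syy = n * sum_y2 - sum_y ** 2      # 6*204 - 32^2  = 200

b = sxy / sxx                      # slope of the least-squares line
a = (sum_y - b * sum_x) / n        # intercept: mean(Y) - b * mean(X)
r = sxy / sqrt(sxx * syy)          # correlation coefficient
r_squared = r ** 2                 # coefficient of determination

print(round(b, 2), round(a, 2), round(r, 2), round(r_squared, 2))
```

The sign of b matches the sign of r, and r² here is about 0.88, which reads as roughly 88% of the variation in test score being accounted for by the fitted line.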
Example:
A sample of 6 children was selected, and data about their age in years and weight in kilograms were recorded as shown in the following table. It is required to find the correlation between age and weight.
Serial #   Age X, I.V. (years)   Weight Y, D.V. (kg)
1          7                     12
2          6                     8
3          8                     12
4          5                     10
5          6                     11
6          9                     13
Serial no.   Age (years) (x)   Weight (kg) (y)   xy          x²          y²
1            7                 12                84          49          144
2            6                 8                 48          36          64
3            8                 12                96          64          144
4            5                 10                50          25          100
5            6                 11                66          36          121
6            9                 13                117         81          169
Total        ∑x = 41           ∑y = 66           ∑xy = 461   ∑x² = 291   ∑y² = 742
$$r = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sqrt{\left(\sum x^2 - \frac{(\sum x)^2}{n}\right)\left(\sum y^2 - \frac{(\sum y)^2}{n}\right)}}$$
$$r = \frac{461 - \frac{41 \times 66}{6}}{\sqrt{\left(291 - \frac{(41)^2}{6}\right)\left(742 - \frac{(66)^2}{6}\right)}}$$
r ≈ 0.76
strong direct correlation
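The deck writes the formula for r in two ways: one with the sums divided by n, as in this example, and one with the sums multiplied by n. The sketch below applies both to the age/weight data and checks that they give the same value. Variable names are illustrative, not from the slides.

```python
from math import sqrt

# Age (x, years) and weight (y, kg) for the 6 children
age = [7, 6, 8, 5, 6, 9]
weight = [12, 8, 12, 10, 11, 13]

n = len(age)
sum_x, sum_y = sum(age), sum(weight)                 # 41, 66
sum_xy = sum(x * y for x, y in zip(age, weight))     # 461
sum_x2 = sum(x * x for x in age)                     # 291
sum_y2 = sum(y * y for y in weight)                  # 742

# Form 1: sums divided by n
r_divided = (sum_xy - sum_x * sum_y / n) / sqrt(
    (sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n)
)

# Form 2: sums multiplied by n
r_multiplied = (n * sum_xy - sum_x * sum_y) / sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)

print(round(r_divided, 2), round(r_multiplied, 2))  # 0.76 0.76
```

The two forms differ only by a factor of n in both numerator and denominator, so either can be used with the column totals from the table.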
APPLICATIONS
Estimating & improving:
Seasonal sales for department stores
Quantity demanded & production
Motivating tools for employees
Cost of products demanded
Accuracy of estimates of demand for sales
Inflation & real wage
Oil exploration
Moreover, radar systems are a field where
correlation is used to measure distance,
and it is also used
in communications, for instance in digital
receivers.
SPSS TUTORIAL
1. Analyze
2. Correlate
3. Bivariate
Points to be noted:
Confidence level
** Correlation is highly significant at the 0.01 level
* Correlation is significant at the 0.05 level

Editor's Notes

  • #25 Linear: e.g., heat wave; doubling the workers doubles production. Non-linear: e.g., pressure and volume.
  • #33 r2 = coefficient of determination
  • #39 Points to be noted: confidence level, n (sample) & Pearson's correlation