Unit: 1
Subtopic:4
Correlation
Ravinandan A P
Assistant Professor,
Sree Siddaganga College of Pharmacy
In association with
Siddaganga Hospital
Tumkur-02
Presentation Outlines……….
1. Definition
2. Karl Pearson’s coefficient of
correlation
3. Multiple correlation
Relationships between Variables
• Common aim of research is to try to relate a
variable of interest to one or more other
variables demonstrate an association
between variables (not necessarily causal)
• Ex: dose of drug and resulting systolic blood
pressure, advertising expenditure of a
company and end of year sales figures, etc.
• Establish a theoretical model to predict the
value of one variable from a number of
known factors
Dependent and Independent Variables
• The independent variable is the variable
which is under the investigator’s control
(denoted x)
• the dependent variable is the one which
the investigator is trying to estimate or
predict (denoted y)
• Can a relationship be used to predict what
happens to y as x changes (ie what
happens to the dependent variable as the
independent variable changes)?
Exploring Relationships
The relationship between x and y
• Correlation: is there a relationship between 2
variables?
• Regression: how well a certain independent variable
predict dependent variable?
• Regression: a measure of the relation between
the mean value of one variable (e.g. output) and
corresponding values of other variables (e.g.
time and cost).
• CORRELATION  CAUSATION
– In order to infer causality: manipulate independent variable
and observe effect on dependent variable
Correlation
• Correlation is the study of relationship between two or
variables
• Correlation methods are used to measure the association of
two or more variables
• Two values are related means, that one variable may be
predicted from the knowledge of the other.
• Ex: One predict the dissolution of tablets based on the
hardness
• Correlation is applied to the relationship of continuous
variables.
Correlation
A correlation is a relationship between two variables.
The data can be represented by the ordered pairs (x, y)
where x is the independent (or explanatory) variable,
and y is the dependent (or response) variable.
x
2 4
–2
– 4
y
2
6
x 1 2 3 4 5
y – 4 – 2 – 1 0 2
Example:
Correlation key concepts:
Types of correlation Methods of studying
correlation
a) Scatter diagram
b) Karl pearson’s coefficient of correlation
c) Spearman’s Rank correlation coefficient
d) Method of least square
Scatter Plot
• Visual representation of scores.
• Each individual score is represented by a single
point on the graph.
• Allows you to see any patterns of trends that exist
in the data.
• A scatter plot can be used to determine whether
a linear (straight line) correlation exists between
two variables.
Scatter Plots
Y
X
Y
X
Y
X
Y
Y Y
Positive correlation Negative correlation No correlation
Linear Correlation
x
y
Negative Linear Correlation
x
y
No Correlation
x
y
Positive Linear Correlation
x
y
Nonlinear Correlation
As x increases, y
tends to decrease.
As x increases, y
tends to increase.
Correlation Coefficient
Statistic showing the degree of relation between two
variables
The correlation coefficient is a measure of the strength and the
direction of a linear relationship between two variables.
The symbol r represents the sample correlation coefficient.
The formula for r is
( )( )
( ) ( )
2 2
2 2
.
n xy x y
r
n x x n y y
 −  
=
 −   − 
The range of the correlation coefficient is −1 to 1. If x and y have a
strong positive linear correlation, r is close to 1. If x and y have a
strong negative linear correlation, r is close to −1. If there is no linear
correlation or a weak linear correlation, r is close to 0.
Linear Correlation
x
y
Strong negative correlation
x
y
Weak positive correlation
x
y
Strong positive correlation
x
y
Nonlinear Correlation
r = −0.91 r = 0.88
r = 0.42
r = 0.07
Calculating a Correlation Coefficient
1. Find the sum of the x-values.
2. Find the sum of the y-values.
3. Multiply each x-value by its corresponding
y-value and find the sum.
4. Square each x-value and find the sum.
5. Square each y-value and find the sum.
6. Use these five sums to calculate the
correlation coefficient.
Continued.
Calculating a Correlation Coefficient
In Words In Symbols
x

y

xy

2
x

2
y

( )( )
( ) ( )
2 2
2 2
.
n xy x y
r
n x x n y y
 −  
=
 −   − 
Correlation Coefficient
Example:
Calculate the correlation coefficient r for the following data.
x y xy x2 y2
1 – 3 – 3 1 9
2 – 1 – 2 4 1
3 0 0 9 0
4 1 4 16 1
5 2 10 25 4
15
x
 = 1
y
 = − 9
xy
 = 2
55
x
 = 2
15
y
 =
( )( )
( ) ( )
2 2
2 2
n xy x y
r
n x x n y y
 −  
=
 −   − 
( )( )
( )2
2
5(9) 15 1
5(55) 15 5(15) 1
− −
=
− − −
60
50 74
= 0.986

There is a strong positive
linear correlation between x
and y.
Correlation Coefficient
Hours, x 0 1 2 3 3 5 5 5 6 7 7 10
Test score, y 96 85 82 74 95 68 76 84 58 65 75 50
Example:
The following data represents the number of hours 12
different students watched television during the
weekend and the scores of each student who took a test
the following Monday.
a.) Display the scatter plot.
b.) Calculate the correlation coefficient r.
Continued.
Correlation Coefficient
Hours, x 0 1 2 3 3 5 5 5 6 7 7 10
Test score, y 96 85 82 74 95 68 76 84 58 65 75 50
Example continued:
100
x
y
Hours watching TV
Test
score
80
60
40
20
2 4 6 8 10
Continued.
Correlation Coefficient
Hours, x 0 1 2 3 3 5 5 5 6 7 7 10
Test score, y 96 85 82 74 95 68 76 84 58 65 75 50
xy 0 85 164 222 285 340 380 420 348 455 525 500
x2 0 1 4 9 9 25 25 25 36 49 49 100
y2 9216 7225 6724 5476 9025 4624 5776 7056 3364 4225 5625 2500
Example continued:
( )( )
( ) ( )
2 2
2 2
n xy x y
r
n x x n y y
 −  
=
 −   − 
( )( )
( )2
2
12(3724) 54 908
12(332) 54 12(70836) 908
−
=
− −
0.831
 −
There is a strong negative linear correlation.
As the number of hours spent watching TV increases, the test
scores tend to decrease.
54
x
 = 908
y
 = 3724
xy
 = 2
332
x
 = 2
70836
y
 =
Simple Correlation coefficient (r)
➢It is also called Pearson's correlation or
product moment correlation coefficient.
➢It measures the nature and strength between
two variables of the quantitative type.
The sign of r denotes the nature of
association
while the value of r denotes the strength of
association.
➢If the sign is +ve this means the relation
is direct (an increase in one variable is
associated with an increase in the
other variable and a decrease in one
variable is associated with a
decrease in the other variable).
➢While if the sign is -ve this means an
inverse or indirect relationship (which
means an increase in one variable is
associated with a decrease in the other).
➢ The value of r ranges between ( -1) and ( +1)
➢ The value of r denotes the strength of the
association as illustrated by the following
diagram.
-1 1
0
-0.25
-0.75 0.75
0.25
strong strong
intermediate intermediate
weak weak
no relation
perfect
correlation
perfect
correlation
Direct
indirect
If r = Zero this means no association or
correlation between the two variables.
If 0 < r < 0.25 = weak correlation.
If 0.25 ≤ r < 0.75 = intermediate correlation.
If 0.75 ≤ r < 1 = strong correlation.
If r = l = perfect correlation.
Karl Pearson's correlation coefficient
• Karl Pearson's correlation coefficient measures
degree of linear relationship between two variables
• If the relationship between two variables X and Y is
to be ascertained through Karl Pearson method ,
then the following formula is used:
• The value of the coefficient of correlation always
lies between ±1
Multiple correlation
• Coefficient of multiple correlation is a
measure of how well a given variable can be
predicted using a linear function of a set of
other variables.
• It is the correlation between the variable's
values and the best predictions that can be
computed linearly from the predictive
variables.
• The coefficient of multiple correlation is known
as the square root of the coefficient of
determination,
Multiple correlation
• The coefficient of multiple correlation takes
values between 0 and 1.
• Higher values indicate higher predictability of
the dependent variable from the independent
variables, with a value of 1 indicating that the
predictions are exactly correct and a value of
0 indicating that no linear combination of the
independent variables is a better predictor
than is the fixed mean of the dependent
variable
Thank you

Unit 1 Correlation- BSRM.pdf

  • 1.
    Unit: 1 Subtopic:4 Correlation Ravinandan AP Assistant Professor, Sree Siddaganga College of Pharmacy In association with Siddaganga Hospital Tumkur-02
  • 2.
    Presentation Outlines………. 1. Definition 2.Karl Pearson’s coefficient of correlation 3. Multiple correlation
  • 3.
    Relationships between Variables •Common aim of research is to try to relate a variable of interest to one or more other variables demonstrate an association between variables (not necessarily causal) • Ex: dose of drug and resulting systolic blood pressure, advertising expenditure of a company and end of year sales figures, etc. • Establish a theoretical model to predict the value of one variable from a number of known factors
  • 4.
    Dependent and IndependentVariables • The independent variable is the variable which is under the investigator’s control (denoted x) • the dependent variable is the one which the investigator is trying to estimate or predict (denoted y) • Can a relationship be used to predict what happens to y as x changes (ie what happens to the dependent variable as the independent variable changes)?
  • 6.
  • 7.
    The relationship betweenx and y • Correlation: is there a relationship between 2 variables? • Regression: how well a certain independent variable predict dependent variable? • Regression: a measure of the relation between the mean value of one variable (e.g. output) and corresponding values of other variables (e.g. time and cost). • CORRELATION  CAUSATION – In order to infer causality: manipulate independent variable and observe effect on dependent variable
  • 8.
    Correlation • Correlation isthe study of relationship between two or variables • Correlation methods are used to measure the association of two or more variables • Two values are related means, that one variable may be predicted from the knowledge of the other. • Ex: One predict the dissolution of tablets based on the hardness • Correlation is applied to the relationship of continuous variables.
  • 9.
    Correlation A correlation isa relationship between two variables. The data can be represented by the ordered pairs (x, y) where x is the independent (or explanatory) variable, and y is the dependent (or response) variable. x 2 4 –2 – 4 y 2 6 x 1 2 3 4 5 y – 4 – 2 – 1 0 2 Example:
  • 11.
    Correlation key concepts: Typesof correlation Methods of studying correlation a) Scatter diagram b) Karl pearson’s coefficient of correlation c) Spearman’s Rank correlation coefficient d) Method of least square
  • 12.
    Scatter Plot • Visualrepresentation of scores. • Each individual score is represented by a single point on the graph. • Allows you to see any patterns of trends that exist in the data. • A scatter plot can be used to determine whether a linear (straight line) correlation exists between two variables.
  • 13.
    Scatter Plots Y X Y X Y X Y Y Y Positivecorrelation Negative correlation No correlation
  • 18.
    Linear Correlation x y Negative LinearCorrelation x y No Correlation x y Positive Linear Correlation x y Nonlinear Correlation As x increases, y tends to decrease. As x increases, y tends to increase.
  • 21.
    Correlation Coefficient Statistic showingthe degree of relation between two variables The correlation coefficient is a measure of the strength and the direction of a linear relationship between two variables. The symbol r represents the sample correlation coefficient. The formula for r is ( )( ) ( ) ( ) 2 2 2 2 . n xy x y r n x x n y y  −   =  −   −  The range of the correlation coefficient is −1 to 1. If x and y have a strong positive linear correlation, r is close to 1. If x and y have a strong negative linear correlation, r is close to −1. If there is no linear correlation or a weak linear correlation, r is close to 0.
  • 22.
    Linear Correlation x y Strong negativecorrelation x y Weak positive correlation x y Strong positive correlation x y Nonlinear Correlation r = −0.91 r = 0.88 r = 0.42 r = 0.07
  • 23.
    Calculating a CorrelationCoefficient 1. Find the sum of the x-values. 2. Find the sum of the y-values. 3. Multiply each x-value by its corresponding y-value and find the sum. 4. Square each x-value and find the sum. 5. Square each y-value and find the sum. 6. Use these five sums to calculate the correlation coefficient. Continued. Calculating a Correlation Coefficient In Words In Symbols x  y  xy  2 x  2 y  ( )( ) ( ) ( ) 2 2 2 2 . n xy x y r n x x n y y  −   =  −   − 
  • 24.
    Correlation Coefficient Example: Calculate thecorrelation coefficient r for the following data. x y xy x2 y2 1 – 3 – 3 1 9 2 – 1 – 2 4 1 3 0 0 9 0 4 1 4 16 1 5 2 10 25 4 15 x  = 1 y  = − 9 xy  = 2 55 x  = 2 15 y  = ( )( ) ( ) ( ) 2 2 2 2 n xy x y r n x x n y y  −   =  −   −  ( )( ) ( )2 2 5(9) 15 1 5(55) 15 5(15) 1 − − = − − − 60 50 74 = 0.986  There is a strong positive linear correlation between x and y.
  • 25.
    Correlation Coefficient Hours, x0 1 2 3 3 5 5 5 6 7 7 10 Test score, y 96 85 82 74 95 68 76 84 58 65 75 50 Example: The following data represents the number of hours 12 different students watched television during the weekend and the scores of each student who took a test the following Monday. a.) Display the scatter plot. b.) Calculate the correlation coefficient r. Continued.
  • 26.
    Correlation Coefficient Hours, x0 1 2 3 3 5 5 5 6 7 7 10 Test score, y 96 85 82 74 95 68 76 84 58 65 75 50 Example continued: 100 x y Hours watching TV Test score 80 60 40 20 2 4 6 8 10 Continued.
  • 27.
    Correlation Coefficient Hours, x0 1 2 3 3 5 5 5 6 7 7 10 Test score, y 96 85 82 74 95 68 76 84 58 65 75 50 xy 0 85 164 222 285 340 380 420 348 455 525 500 x2 0 1 4 9 9 25 25 25 36 49 49 100 y2 9216 7225 6724 5476 9025 4624 5776 7056 3364 4225 5625 2500 Example continued: ( )( ) ( ) ( ) 2 2 2 2 n xy x y r n x x n y y  −   =  −   −  ( )( ) ( )2 2 12(3724) 54 908 12(332) 54 12(70836) 908 − = − − 0.831  − There is a strong negative linear correlation. As the number of hours spent watching TV increases, the test scores tend to decrease. 54 x  = 908 y  = 3724 xy  = 2 332 x  = 2 70836 y  =
  • 28.
    Simple Correlation coefficient(r) ➢It is also called Pearson's correlation or product moment correlation coefficient. ➢It measures the nature and strength between two variables of the quantitative type. The sign of r denotes the nature of association while the value of r denotes the strength of association.
  • 29.
    ➢If the signis +ve this means the relation is direct (an increase in one variable is associated with an increase in the other variable and a decrease in one variable is associated with a decrease in the other variable). ➢While if the sign is -ve this means an inverse or indirect relationship (which means an increase in one variable is associated with a decrease in the other).
  • 30.
    ➢ The valueof r ranges between ( -1) and ( +1) ➢ The value of r denotes the strength of the association as illustrated by the following diagram. -1 1 0 -0.25 -0.75 0.75 0.25 strong strong intermediate intermediate weak weak no relation perfect correlation perfect correlation Direct indirect
  • 31.
    If r =Zero this means no association or correlation between the two variables. If 0 < r < 0.25 = weak correlation. If 0.25 ≤ r < 0.75 = intermediate correlation. If 0.75 ≤ r < 1 = strong correlation. If r = l = perfect correlation.
  • 33.
    Karl Pearson's correlationcoefficient • Karl Pearson's correlation coefficient measures degree of linear relationship between two variables • If the relationship between two variables X and Y is to be ascertained through Karl Pearson method , then the following formula is used: • The value of the coefficient of correlation always lies between ±1
  • 35.
    Multiple correlation • Coefficientof multiple correlation is a measure of how well a given variable can be predicted using a linear function of a set of other variables. • It is the correlation between the variable's values and the best predictions that can be computed linearly from the predictive variables. • The coefficient of multiple correlation is known as the square root of the coefficient of determination,
  • 36.
    Multiple correlation • Thecoefficient of multiple correlation takes values between 0 and 1. • Higher values indicate higher predictability of the dependent variable from the independent variables, with a value of 1 indicating that the predictions are exactly correct and a value of 0 indicating that no linear combination of the independent variables is a better predictor than is the fixed mean of the dependent variable
  • 39.