Bivariate Analysis
Dr. Shriram Dawkhar
SIBAR, Kondhwa.
So far the statistical methods we
have used only permit us to:
◼ Look at the frequency in which certain
numbers or categories occur.
◼ Look at measures of central tendency such
as means, modes, and medians for one
variable.
◼ Look at measures of dispersion such as
standard deviation and z scores for one
interval or ratio level variable.
Bivariate analysis allows us to:
◼ Look at associations/relationships between
two variables.
◼ Look at measures of the strength of the
relationship between two variables.
◼ Test hypotheses about relationships
between two nominal or ordinal level
variables.
For example, what does this table tell us about
opinions on welfare by gender?

Support cutting welfare
benefits for immigrants     Male    Female
Yes                          15       5
No                           10      20
Total                        25      25
Are frequencies sufficient to
allow us to make comparisons
about groups?
What other information do we
need?
Is this table more helpful?

Benefits for
Immigrants     Males         Females
Yes            15 (60%)       5 (20%)
No             10 (40%)      20 (80%)
Total          25 (100%)     25 (100%)
Rules for cross-tabulation
◼ Calculate either column or row percents.
◼ Each percentage is the number of frequencies in
a cell of the table divided by the total number of
frequencies in that column or row; for example,
20/25 = 80.0%.
◼ All percentages in a column or row should
total 100%.
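Following the rules above, a minimal Python sketch (using the welfare-benefits frequencies from the earlier table) computes column percentages by dividing each cell frequency by its column total:

```python
# Cell frequencies from the welfare example, keyed by column then row.
table = {
    "Male":   {"Yes": 15, "No": 10},
    "Female": {"Yes": 5,  "No": 20},
}

def column_percents(table):
    """Return {column: {row: percent}}; each column's percents total 100%."""
    result = {}
    for col, cells in table.items():
        total = sum(cells.values())
        result[col] = {row: 100.0 * freq / total for row, freq in cells.items()}
    return result

pcts = column_percents(table)
print(pcts["Male"]["Yes"])    # 60.0
print(pcts["Female"]["No"])   # 80.0
```

Interpreting column percentages (rather than raw frequencies) is what makes the male/female comparison meaningful when the groups have different sizes.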
Let’s look at another example –
social work degrees by gender
Social Work
Degree      Male            Female
BA          20 (33.3%)      20 (20.0%)
MSW         30 (50.0%)      70 (70.0%)
Ph.D.       10 (16.7%)      10 (10.0%)
Total       60 (100.0%)    100 (100.0%)
We use cross-tabulation when:
◼ We want to look at relationships among
two or three variables.
◼ We want a descriptive statistical measure
to tell us whether differences among
groups are large enough to indicate some
sort of relationship among variables.
Cross-tabs are not sufficient to:
◼ Tell us the strength or actual size of the relationship
between two or three variables.
◼ Test a hypothesis about the relationship between two or
three variables.
◼ Tell us the direction of the relationship among two or
more variables.
◼ Look at relationships between one nominal or ordinal
variable and one ratio or interval variable unless the range
of possible values for the ratio or interval variable is
small. What do you think a table with a large number of
ratio values would look like?
We can use cross-tabs to visually
assess whether independent and
dependent variables might be
related. In addition, we also use
cross-tabs to find out if
demographic variables such as
gender and ethnicity are related
to a second variable.
Because we use tables in these ways, we can
set up some decision rules about how to use
tables.
◼ Independent variables should be column
variables.
◼ If you are not looking at independent and
dependent variable relationships, use the variable
that can logically be said to influence the other as
your column variable.
◼ Using this rule, always calculate column
percentages rather than row percentages.
◼ Use the column percentages to interpret your
results.
CORRELATION
◼ Correlation is a much-abused word/term
◼ Correlation is a term which implies that there is
an association between the paired values of two
variables, where association means that the
fluctuations in the values of each variable are
sufficiently regular to make it unlikely that the
association has arisen by chance
◼ It assumes that independent random samples are
taken from a distribution in which the two
variables are jointly normally distributed
Correlation
◼ Correlation: The degree of relationship between the
variables under consideration is measured through
correlation analysis.
◼ The measure of correlation is called the correlation
coefficient.
◼ The degree of relationship is expressed by a coefficient
which ranges from -1 to +1 (-1 ≤ r ≤ +1).
◼ The direction of change is indicated by the sign.
◼ Correlation analysis enables us to have an idea about the
degree & direction of the relationship between the two
variables under study.
Correlation
◼ Correlation is a statistical tool that helps
to measure and analyze the degree of
relationship between two variables.
◼ Correlation analysis deals with the
association between two or more
variables.
Correlation & Causation
◼ Causation means a cause-&-effect relation.
◼ Correlation denotes interdependency among the variables.
For correlating two phenomena, it is essential that the two
phenomena have a cause-&-effect relationship; if such a
relationship does not exist, the two phenomena cannot be
meaningfully correlated.
◼ If two variables vary in such a way that movements in one
are accompanied by movements in the other, the variables
are said to stand in a cause-&-effect relationship.
◼ Causation always implies correlation, but correlation does
not necessarily imply causation.
Types of Correlation
Type I
Correlation
Positive Correlation Negative Correlation
Types of Correlation Type I
◼ Positive Correlation: The correlation is said to
be positive if the values of the two variables
change in the same direction.
Ex.: publicity/advertising expenditure & sales; height & weight.
◼ Negative Correlation: The correlation is said to
be negative when the values of the variables
change in opposite directions.
Ex.: price & quantity demanded.
Direction of the Correlation
◼ Positive relationship – Variables change in the
same direction.
◼ As X is increasing, Y is increasing
◼ As X is decreasing, Y is decreasing
◼ E.g., As height increases, so does weight.
◼ Negative relationship – Variables change in
opposite directions.
◼ As X is increasing, Y is decreasing
◼ As X is decreasing, Y is increasing
◼ E.g., As TV time increases, grades decrease
Indicated by
sign; (+) or (-).
More examples
◼ Positive relationships
◼ temperature and
water consumption .
◼ study time and
grades.
◼ Negative relationships:
◼ alcohol consumption
and driving ability.
◼ Price & quantity
demanded
Types of Correlation
Type II
Correlation
Simple Multiple
Partial Total
Types of Correlation Type II
◼ Simple correlation: In a simple correlation
problem, only two variables are studied.
◼ Multiple correlation: In multiple correlation,
three or more variables are studied.
Ex. Qd = f(P, PC, PS, t, y)
◼ Partial correlation: The analysis recognizes more
than two variables but considers only two
variables, keeping the others constant.
◼ Total correlation: Based on all the relevant
variables, which is normally not feasible.
Types of Correlation
Type III
Correlation
LINEAR NON LINEAR
Types of Correlation Type III
◼ Linear correlation: Correlation is said to be linear
when the amount of change in one variable tends to
bear a constant ratio to the amount of change in the
other. The graph of variables having a linear
relationship forms a straight line.
Ex.: X = 1, 2, 3, 4, 5, 6, 7, 8
     Y = 5, 7, 9, 11, 13, 15, 17, 19
     Y = 3 + 2X
◼ Non-linear correlation: The correlation is non-linear
if the amount of change in one variable does not bear
a constant ratio to the amount of change in the other
variable.
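The linear example above can be checked directly; a short Python sketch confirms that Y = 3 + 2X fits every pair and that the change in Y per unit change in X is constant, which is exactly what makes the graph a straight line:

```python
# The slide's linear example: Y = 3 + 2X for every pair.
X = [1, 2, 3, 4, 5, 6, 7, 8]
Y = [5, 7, 9, 11, 13, 15, 17, 19]

assert all(y == 3 + 2 * x for x, y in zip(X, Y))

# Change in Y per unit change in X between successive points: always 2.
diffs = [(Y[i + 1] - Y[i]) / (X[i + 1] - X[i]) for i in range(len(X) - 1)]
print(diffs)  # [2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0]
```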
Methods of Studying Correlation
◼ Scatter Diagram Method
◼ Karl Pearson’s Coefficient of
Correlation (r)
◼ Spearman’s Rank Correlation
Coefficient
Scatter Plot
◼ A scatter plot is a graph of a collection of ordered
pairs (x, y).
◼ The graph looks like a collection of dots, but the
dots may form a general shape or move in a general
direction.
Scatter Diagram Method
◼ A scatter diagram is a graph of observed
plotted points where each point represents
the values of X & Y as a coordinate. It
portrays the relationship between these two
variables graphically.
Is there a linear relationship from this plot?

[Scatter plots contrasting linear relationships with curvilinear relationships]
Linear Correlation
(Slide from: Statistics for Managers Using Microsoft® Excel, 4th Edition, 2004, Prentice-Hall)

[Scatter plots contrasting strong relationships with weak relationships]
[Scatter plots: no relationship]
Correlation
[Scatter plot: a perfect positive correlation between height and weight; the points, such as the height and weight of person A and of person B, all fall on a straight line, i.e. a linear relationship]
Degree of correlation
◼ Perfect Positive Correlation (r = +1.0): e.g., hours of study vs. grade obtained
◼ High Degree of Positive Correlation (r = +0.80): e.g., height vs. weight
◼ Moderate Positive Correlation (r = +0.4): e.g., shoe size vs. weight
◼ Perfect Negative Correlation (r = -1.0): e.g., TV watching per week vs. exam score
◼ Moderate Negative Correlation (r = -0.80): e.g., TV watching per week vs. exam score
◼ Weak Negative Correlation (r = -0.2): e.g., shoe size vs. weight
◼ No Correlation (r = 0.0; a horizontal line): e.g., height vs. IQ
Advantages of Scatter Diagram
◼ Simple & non-mathematical method
◼ Not influenced by the size of extreme items
◼ First step in investigating the relationship
between two variables

Disadvantage of Scatter Diagram
◼ Cannot establish the exact degree of
correlation
Karl Pearson's
Coefficient of Correlation
◼ Pearson’s ‘r’ is the most common
correlation coefficient.
◼ Karl Pearson’s Coefficient of Correlation is
denoted by ‘r’. The coefficient of
correlation ‘r’ measures the degree of
linear relationship between two variables,
say x & y.
Karl Pearson's
Coefficient of Correlation
◼ Karl Pearson’s Coefficient of
Correlation denoted by- r
-1 ≤ r ≤ +1
◼ Degree of Correlation is expressed by a
value of Coefficient
◼ Direction of change is Indicated by sign
( - ve) or ( + ve)
Karl Pearson's
Coefficient of Correlation
◼ When deviations are taken from the actual means:
r(x, y) = Σxy / √(Σx² · Σy²)
◼ When deviations are taken from an assumed mean:
r = [N Σdxdy - Σdx Σdy] / [√(N Σdx² - (Σdx)²) · √(N Σdy² - (Σdy)²)]
Procedure for computing the
correlation coefficient
◼ Calculate the means of the two series ‘x’ & ‘y’.
◼ Calculate the deviations of ‘x’ & ‘y’ in the two series from
their respective means.
◼ Square each deviation of ‘x’ & ‘y’, then obtain the sums of
the squared deviations, i.e. ∑x² & ∑y².
◼ Multiply each deviation under x by the corresponding
deviation under y & obtain the products ‘xy’. Then obtain
the sum of the products of x, y, i.e. ∑xy.
◼ Substitute the values in the formula.
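The procedure above can be sketched in Python. The helper name `pearson_r` is mine, not from the slides; it follows the deviations-from-actual-means formula r = Σxy / √(Σx²·Σy²):

```python
from math import sqrt

def pearson_r(xs, ys):
    """Karl Pearson's r via deviations from the actual means:
    r = Σxy / sqrt(Σx² · Σy²), where x and y are deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n          # step 1: means
    dx = [x - mx for x in xs]                  # step 2: deviations
    dy = [y - my for y in ys]
    sxy = sum(a * b for a, b in zip(dx, dy))   # step 4: Σxy
    sxx = sum(a * a for a in dx)               # step 3: Σx²
    syy = sum(b * b for b in dy)               # step 3: Σy²
    return sxy / sqrt(sxx * syy)               # step 5: substitute

# Perfectly linear data (Y = 3 + 2X) gives r = +1.
print(round(pearson_r([1, 2, 3, 4], [5, 7, 9, 11]), 4))  # 1.0
```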
Interpretation of Correlation
Coefficient (r)
◼ The value of correlation coefficient ‘r’ ranges
from -1 to +1
◼ If r = +1, then the correlation between the two
variables is said to be perfect and positive
◼ If r = -1, then the correlation between the two
variables is said to be perfect and negative
◼ If r = 0, then there exists no correlation between
the variables
Properties of Correlation coefficient
◼ The correlation coefficient lies between -1 & +1,
symbolically (-1 ≤ r ≤ 1).
◼ The correlation coefficient is independent of
change of origin & scale.
◼ The coefficient of correlation is the geometric mean of
the two regression coefficients:
r = √(bxy · byx)
If one regression coefficient is (+ve), the other regression
coefficient is also (+ve), and the correlation coefficient is then (+ve).
Assumptions of Pearson’s
Correlation Coefficient
◼ There is linear relationship between two
variables, i.e. when the two variables are
plotted on a scatter diagram a straight line
will be formed by the points.
◼ Cause and effect relation exists between
different forces operating on the item of
the two variable series.
Advantages of Pearson’s Coefficient
◼ It summarizes in one value, the
degree of correlation & direction
of correlation also.
Limitation of Pearson’s Coefficient
◼ Always assumes a linear relationship
◼ Interpreting the value of r is difficult
◼ The value of the correlation coefficient is
affected by extreme values
◼ It is a time-consuming method
Coefficient of Determination
◼ A convenient way of interpreting the value of the
correlation coefficient is to use the square of the
coefficient of correlation, which is called the
Coefficient of Determination.
◼ The Coefficient of Determination = r².
◼ Suppose r = 0.9; then r² = 0.81, which means that
81% of the variation in the dependent variable
has been explained by the independent variable.
Coefficient of Determination
◼ The maximum value of r2 is 1 because it is
possible to explain all of the variation in y but it
is not possible to explain more than all of it.
◼ Coefficient of Determination = Explained
variation / Total variation
Coefficient of Determination: An example
◼ Suppose r = 0.60 in one case and r = 0.30 in another.
It does not mean that the first correlation is twice as
strong as the second; ‘r’ can be better understood by
computing the value of r².
When r = 0.60, r² = 0.36 -----(1)
When r = 0.30, r² = 0.09 -----(2)
This implies that in the first case 36% of the total
variation is explained, whereas in the second case
only 9% of the total variation is explained.
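A quick Python check of the comparison above makes the point concrete: r = 0.60 explains four times as much variation as r = 0.30, not twice as much:

```python
# r² is the share of variation explained.
for r in (0.60, 0.30):
    print(f"r = {r}: r² = {r * r:.2f} -> {r * r:.0%} of variation explained")
# r = 0.6: r² = 0.36 -> 36% of variation explained
# r = 0.3: r² = 0.09 -> 9% of variation explained
```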
3) Spearman’s Rank Coefficient of
Correlation
◼ When the variables under study are not capable of
quantitative measurement but can be arranged in
serial order, Pearson’s correlation coefficient
cannot be used; in such cases Spearman’s rank
correlation can be used.
◼ ρ (rho) = 1 - (6 ∑D²) / (N(N² - 1))
◼ ρ = rank correlation coefficient
◼ D = difference of ranks between paired items in the two series
◼ N = total number of observations
Spearman Rho Correlation
◼ Spearman's rank correlation coefficient, or
Spearman's rho, is named after Charles Spearman
◼ Denoted by the Greek letter ρ (rho) or by rs; it is a
non-parametric measure of statistical dependence
between two variables
◼ Assesses how well the relationship between two
variables can be described using a monotonic
function
◼ A monotonic (monotone) function in mathematics
is one that preserves the given order
◼ If there are no repeated data values, a perfect
Spearman correlation of +1 or -1 occurs when
each of the variables is a perfect monotone
function of the other
Spearman Rho Correlation
◼ A correlation coefficient is a numerical measure or
index of the amount of association between two sets of
scores. It ranges in size from a maximum of +1.00
through 0.00 to -1.00
◼ The ‘+’ sign indicates a positive correlation (the scores on
one variable increase as the scores on the other
variable increase)
◼ The ‘-’ sign indicates a negative correlation (as the scores
on one variable increase, the scores on the other
variable decrease)
Spearman Rho Correlation Calculation
◼ Often thought of as the Pearson correlation coefficient between the
ranked variables
◼ The n raw scores Xi, Yi are converted to ranks xi, yi, and the differences
di = xi - yi between the ranks of each observation on the two variables are
calculated
◼ If there are no tied ranks, then ρ is given by:
ρ = 1 - (6 ∑di²) / (n(n² - 1))
Spearman Rho Correlation
Interpretation
◼ The sign of the Spearman correlation indicates the direction of association
between X (the independent variable) and Y (the dependent variable)
◼ If Y tends to increase when X increases, the Spearman correlation coefficient
is positive
◼ If Y tends to decrease when X increases, the Spearman correlation coefficient
is negative
◼ A Spearman correlation of zero indicates that there is no tendency for Y to
either increase or decrease when X increases
Spearman Rho Correlation
Interpretation cont…/
◼ An alternative name for the Spearman rank correlation is the "grade
correlation", in which the "rank" of an observation is replaced by its "grade"
◼ When X and Y are perfectly monotonically related, the Spearman correlation
coefficient becomes 1
◼ A perfect monotone increasing relationship implies that for any two pairs of
data values Xi, Yi and Xj, Yj, that Xi − Xj and Yi − Yj always have the same
sign
Spearman Rho Correlation
Example # 1
◼ Calculate the correlation between the IQ
of a person with the number of hours
spent in the class per week
◼ Find the value of the term d²i:
1. Sort the data by the first column (Xi).
Create a new column xi and assign it the
ranked values 1,2,3,...n.
2. Sort the data by the second column (Yi).
Create a fourth column yi and similarly
assign it the ranked values 1,2,3,...n.
3. Create a fifth column di to hold the
differences between the two rank
columns (xi and yi).
IQ (Xi)    Hours of class per week (Yi)
106         7
 86         0
100        27
101        50
 99        28
103        29
 97        20
113        12
112         6
110        17
Spearman Rho Correlation
Example # 1 cont…/
4. Create one final column to hold the value of column di squared.
IQ     Hours of class
(Xi)   per week (Yi)   rank xi   rank yi    di    d²i
 86         0             1         1        0      0
 97        20             2         6       -4     16
 99        28             3         8       -5     25
100        27             4         7       -3      9
101        50             5        10       -5     25
103        29             6         9       -3      9
106         7             7         3        4     16
110        17             8         5        3      9
112         6             9         2        7     49
113        12            10         4        6     36
Spearman Rho Correlation
Example # 1- Result
◼ With d²i found, we can add them to find ∑d²i = 194
◼ The value of n is 10, so:
ρ = 1 - (6 × 194) / (10(10² - 1))
ρ = -0.18
◼ The low value shows that the correlation between IQ and
hours spent in class is very low
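The whole Example #1 calculation can be sketched in Python. The helper name `spearman_rho` is mine; it is valid only for untied data, as in this example:

```python
def spearman_rho(xs, ys):
    """Spearman's rho for untied data: ρ = 1 - 6ΣD² / (N(N² - 1))."""
    def ranks(values):
        order = sorted(values)
        return [order.index(v) + 1 for v in values]  # rank 1 = smallest
    rx, ry = ranks(xs), ranks(ys)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))   # ΣD² (194 here)
    n = len(xs)
    return 1 - 6 * d2 / (n * (n * n - 1))

iq    = [106, 86, 100, 101, 99, 103, 97, 113, 112, 110]
hours = [7, 0, 27, 50, 28, 29, 20, 12, 6, 17]
print(round(spearman_rho(iq, hours), 2))  # -0.18
```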
Spearman Rho Correlation
Example # 2:
5 college students have the following rankings in the
Economics and BRM subjects. Is there an association
between the rankings in the BRM and Economics subjects?

Student     Jogi   Arpit   Ankush   Poonam   Akshay
Economics    1      2       3        4        5
BRM          5      3       1        4        2
Spearman Rho Correlation
Example # 2 cont…/
◼ Compute Spearman Rho
n= number of paired ranks
d= difference between the paired ranks
(when two or more observations of one variable are the
same, ranks are assigned by averaging positions
occupied in their rank order)
Spearman Rho Correlation
Example # 2 cont…/

Economics Rank (X)   BRM Rank (Y)   D = X - Y   d²
1                    5              -4          16
2                    3              -1           1
3                    1               2           4
4                    4               0           0
5                    2               3           9
                                         ∑d² = 30
Spearman Rho Correlation
Result: ρ = -0.5
There is a moderate negative correlation between the
BRM and Economics subject rankings of students.
Students who rank high in Economics relative to other
students generally have lower BRM ranks, and those with
low Economics rankings have higher BRM rankings.
Merits of Spearman’s Rank Correlation
◼ This method is simpler to understand and easier
to apply than Karl Pearson’s correlation method.
◼ This method is useful where we can give ranks
but not the actual data (qualitative data).
◼ This method is suited to cases where the initial
data are in the form of ranks.

Limitations of Spearman’s Correlation
◼ Cannot be used for finding correlation in a
grouped frequency distribution.
◼ This method should not be applied where N
exceeds 30, as ranking becomes laborious.
Advantages of Correlation studies
◼ Show the amount (strength) of relationship
present
◼ Can be used to make predictions about the
variables under study.
◼ Can be used in many places, including natural
settings, libraries, etc.
◼ Easier to collect correlational data
3 kinds of relationships between variables
◼ Association or Correlation or Covary
◼ Both variables tend to be high or low (positive
relationship) or one tends to be high when the other is low
(negative relationship). Variables do not have
independent & dependent roles.
◼ Prediction
◼ Variables are assigned independent and dependent roles.
Both variables are observed. There is a weak causal
implication that the independent predictor variable is the
cause and the dependent variable is the effect.
◼ Causal
◼ Variables are assigned independent and dependent roles.
The independent variable is manipulated and the
dependent variable is observed. Strong causal statements
are allowed.
General Overview of
Correlational Analysis
◼ The purpose is to measure the strength of a
linear relationship between 2 variables.
◼ A correlation coefficient does not ensure
“causation” (i.e. a change in X causes a
change in Y)
◼ X is typically the Input, Measured, or
Independent variable.
◼ Y is typically the Output, Predicted, or
Dependent variable.
◼ If, as X increases, there is a predictable shift
in the values of Y, a correlation exists.
General Properties of
Correlation Coefficients
◼ Values can range between +1 and -1
◼ The value of the correlation coefficient
represents the scatter of points on a
scatterplot
◼ You should be able to look at a scatterplot
and estimate what the correlation would be
◼ You should be able to look at a correlation
coefficient and visualize the scatterplot
Perfect Linear Correlation
◼ Occurs when all the points in a scatterplot
fall exactly along a straight line.
Positive Correlation
Direct Relationship
◼ As the value of X increases, the value of Y also
increases
◼ Larger values of X tend to be paired with larger
values of Y (and consequently, smaller values of X and Y
tend to be paired)
Negative Correlation
Inverse Relationship
• As the value of X
increases, the value of Y
decreases
• Small values of X tend to
be paired with large
values of Y (and vice
versa).
Non-Linear Correlation
• As the value of X increases, the value of Y
changes in a non-linear manner
No Correlation
• As the value of X
changes, Y does not
change in a predictable
manner.
• Large values of X seem
just as likely to be paired
with small values of Y as
with large values of Y
Correlation Strength
◼ .00 - .20 Weak or none
◼ .20 - .40 Weak
◼ .40 - .60 Moderate
◼ .60 - .80 Strong
◼ .80 - 1.00 Very strong
Interpretation
◼ Depends on what the purpose of the study
is… but here is a “general guideline”...
• Value = magnitude of the relationship
• Sign = direction of the relationship
Some of the many
Types of Correlation Coefficients
(there are lots more…)

Name            X variable                Y variable
Pearson r       Interval/Ratio            Interval/Ratio
Spearman rho    Ordinal                   Ordinal
Kendall's Tau   Ordinal                   Ordinal
Phi             Dichotomous               Dichotomous
Intraclass R    Interval/Ratio (test)     Interval/Ratio (retest)
Correlation Hypothesis
Testing
◼ Step 1. Identify the population, distribution,
and assumptions
◼ Step 2. State the null and research
hypotheses.
◼ Step 3. Determine the characteristics of the
comparison distribution.
◼ Step 4. Determine the critical values.
◼ Step 5. Calculate the test statistic
◼ Step 6. Make a decision.
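The slides do not spell out the test statistic for Step 5. A common choice for testing H0: ρ = 0 with Pearson's r (an assumption on my part, not stated in the slides) is t = r·√(n-2)/√(1-r²) on n-2 degrees of freedom:

```python
from math import sqrt

def r_to_t(r, n):
    """Test statistic for H0: rho = 0 (a standard choice, not from the
    slides): t = r * sqrt(n - 2) / sqrt(1 - r²), with df = n - 2."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)

# e.g. r = 0.60 from n = 27 pairs gives t = 3.75 on 25 df,
# which would then be compared against the critical t value (Step 4).
print(round(r_to_t(0.60, 27), 2))  # 3.75
```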
Regression Analysis
◼ Regression Analysis is a very
powerful tool in the field of statistical
analysis in predicting the value of one
variable, given the value of another
variable, when those variables are
related to each other.
Regression Analysis
◼ Regression Analysis is a mathematical measure of
the average relationship between two or more
variables.
◼ Regression analysis is a statistical tool used in
prediction of value of unknown variable from
known variable.
Advantages of Regression Analysis
◼ Regression analysis provides estimates of
values of the dependent variables from the
values of independent variables.
◼ Regression analysis also helps to obtain a
measure of the error involved in using the
regression line as a basis for estimations .
◼ Regression analysis helps in obtaining a
measure of the degree of association or
correlation that exists between the two variables.
Assumptions in Regression Analysis
◼ Existence of an actual linear relationship.
◼ The regression analysis is used to estimate values
within the range for which it is valid.
◼ The relationship between the dependent and
independent variables remains the same over the
period for which the regression equation is calculated.
◼ The dependent variable may take any random value, but
the values of the independent variables are fixed.
◼ In regression, we have only one dependent variable
in our estimating equation; however, we can use
more than one independent variable.
Regression line
◼ Regression line is the line which gives the best
estimate of one variable from the value of any
other given variable.
◼ The regression line gives the average
relationship between the two variables in
mathematical form.
◼ The regression line has the following
properties: a) ∑(Y – Yc) = 0 and
b) ∑(Y – Yc)² = minimum
Regression line
◼ For two variables X and Y, there are always two
lines of regression –
◼ Regression line of X on Y : gives the best
estimate for the value of X for any specific
given values of Y
◼ X = a + b Y a = X - intercept
◼ b = Slope of the line
◼ X = Dependent variable
◼ Y = Independent variable
Regression line
◼ For two variables X and Y, there are always two
lines of regression –
◼ Regression line of Y on X : gives the best
estimate for the value of Y for any specific given
values of X
◼ Y = a + bx a = Y - intercept
◼ b = Slope of the line
◼ Y = Dependent variable
◼ x= Independent variable
The Explanation of Regression Line
◼ In case of perfect correlation (positive or
negative) the two lines of regression coincide.
◼ If the two regression lines are far from each other,
the degree of correlation is less, & vice versa.
◼ The mean values of X & Y can be obtained as
the point of intersection of the two regression
lines.
◼ The higher the degree of correlation between the
variables, the smaller the angle between the
lines, & vice versa.
Regression Equation / Line
& Method of Least Squares
◼ Regression Equation of y on x
Y = a + bx
In order to obtain the values of ‘a’ & ‘b’
∑y = na + b∑x
∑xy = a∑x + b∑x2
◼ Regression Equation of x on y
X = c + dy
In order to obtain the values of ‘c’ & ‘d’
∑x = nc + d∑y
∑xy = c∑y + d∑y2
Regression Equation / Line when
Deviation taken from Arithmetic Mean
◼ Regression Equation of y on x:
Y = a + bX
In order to obtain the values of ‘a’ & ‘b’:
a = Ȳ – bX̄    b = ∑xy / ∑x²
◼ Regression Equation of x on y:
X = c + dY
c = X̄ – dȲ    d = ∑xy / ∑y²
(Ȳ and X̄ denote the means of Y and X; x and y denote deviations from those means.)
Regression Equation / Line when
Deviation taken from Arithmetic Mean
◼ Regression Equation of y on x:
Y – Ȳ = byx (X – X̄)
byx = ∑xy / ∑x²
byx = r (σy / σx)
◼ Regression Equation of x on y:
X – X̄ = bxy (Y – Ȳ)
bxy = ∑xy / ∑y²
bxy = r (σx / σy)
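These formulas can be verified numerically. A Python sketch (with illustrative, made-up data) computes byx and bxy from deviations and confirms that r is the geometric mean of the two regression coefficients:

```python
from math import sqrt

def regression_coefficients(X, Y):
    """byx = Σxy/Σx², bxy = Σxy/Σy² (x, y are deviations from the means),
    and r = sqrt(byx · bxy) with the common sign of the coefficients."""
    n = len(X)
    mx, my = sum(X) / n, sum(Y) / n
    dx = [x - mx for x in X]
    dy = [y - my for y in Y]
    sxy = sum(a * b for a, b in zip(dx, dy))
    byx = sxy / sum(a * a for a in dx)
    bxy = sxy / sum(b * b for b in dy)
    r = (1 if sxy >= 0 else -1) * sqrt(byx * bxy)
    return byx, bxy, r

# Illustrative data, not from the slides:
byx, bxy, r = regression_coefficients([1, 2, 3, 4], [2, 4, 5, 9])
print(round(byx, 2), round(bxy, 2), round(r, 3))  # 2.2 0.42 0.965
```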
Properties of the Regression Coefficients
◼ The coefficient of correlation is the geometric mean of the two
regression coefficients: r = √(byx · bxy)
◼ If byx is positive, then bxy must also be positive, & vice
versa.
◼ If one regression coefficient is greater than one, the other
must be less than one.
◼ The coefficient of correlation has the same sign as
the regression coefficients.
◼ The arithmetic mean of byx & bxy is equal to or greater than the
coefficient of correlation: (byx + bxy) / 2 ≥ r
◼ Regression coefficients are independent of change of origin but not
of scale.
Standard Error of Estimate.
◼ The Standard Error of Estimate is the measure of
variation around the computed regression line.
◼ The standard error of estimate (SE) of Y measures the
variability of the observed values of Y around the
regression line.
◼ The standard error of estimate gives us a measure of
the scatter of the observations about the line of
regression.
Standard Error of Estimate.
◼ The Standard Error of Estimate of Y on X is:
SEyx = √( ∑(Y – Ye)² / (n – 2) )
Y = observed value of Y
Ye = estimated value from the estimating equation corresponding
to each Y value
e = the error term (Y – Ye)
n = number of observations in the sample
◼ The convenient formula:
SEyx = √( (∑Y² – a∑Y – b∑XY) / (n – 2) )
X = value of the independent variable
Y = value of the dependent variable
a = Y-intercept
b = slope of the estimating equation
n = number of data points
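A minimal Python sketch of the first formula; the observed and estimated values below are illustrative numbers (from a hypothetical fitted line Ye = -0.5 + 2.2X), not from the slides:

```python
from math import sqrt

def std_error_of_estimate(Y, Ye):
    """SE = sqrt( Σ(Y - Ye)² / (n - 2) )."""
    n = len(Y)
    sse = sum((y - ye) ** 2 for y, ye in zip(Y, Ye))
    return sqrt(sse / (n - 2))

# Observed Y vs. values estimated from Ye = -0.5 + 2.2X for X = 1..4:
Y  = [2.0, 4.0, 5.0, 9.0]
Ye = [1.7, 3.9, 6.1, 8.3]
print(round(std_error_of_estimate(Y, Ye), 3))
```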
Correlation analysis vs.
Regression analysis.
◼ Regression is the average relationship between two
variables.
◼ Correlation need not imply a cause-&-effect
relationship between the variables under study,
whereas regression analysis clearly indicates the
cause-&-effect relationship between the variables.
◼ There may be nonsense correlation between two
variables, but there is no such thing as nonsense
regression.
What is regression?
◼ Fitting a line to the data using an equation in order
to describe and predict data
◼ Simple Regression
◼ Uses just 2 variables (X and Y)
◼ Other: Multiple Regression (one Y and many X’s)
◼ Linear Regression
◼ Fits data to a straight line
◼ Other: Curvilinear Regression (curved line)
We’re doing: Simple, Linear Regression
From Geometry:
◼ Any line can be described by an equation
◼ For any point on a line for X, there will be a
corresponding Y
◼ the equation for this is y = mx + b
◼ m is the slope, b is the Y-intercept (when X = 0)
◼ Slope = change in Y per unit change in X
◼ Y-intercept = where the line crosses the Y axis
(when X = 0)
Regression equation
◼ Find a line that fits the data the best, = find a
line that minimizes the distance from all the
data points to that line
◼ Regression Equation: Ŷ (Y-hat) = bX + a
◼ Ŷ is the predicted value of Y given a
certain X
◼ b is the slope
◼ a is the y-intercept
Regression Equation:
We can predict a Y score from an X by
plugging a value for X into the equation and
calculating Y
What would we expect a person to get on quiz
#4 if they got a 12.5 on quiz #3?
Ŷ = .823X – 4.239
Ŷ = .823(12.5) – 4.239 = 6.049
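The plug-in step can be sketched in a few lines of Python, using the fitted coefficients quoted above:

```python
def predict(x, b=0.823, a=-4.239):
    """Plug X into the fitted regression equation Y-hat = bX + a."""
    return b * x + a

print(predict(12.5))  # ≈ 6.0485, which the slide reports rounded as 6.049
```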
PPT Correlation.pptx
 
Multivariate Analysis Degree of association between two variable - Test of Ho...
Multivariate Analysis Degree of association between two variable- Test of Ho...Multivariate Analysis Degree of association between two variable- Test of Ho...
Multivariate Analysis Degree of association between two variable - Test of Ho...
 
2-20-04.ppt
2-20-04.ppt2-20-04.ppt
2-20-04.ppt
 
Correlation and Regression.pdf
Correlation and Regression.pdfCorrelation and Regression.pdf
Correlation and Regression.pdf
 
Correlation
CorrelationCorrelation
Correlation
 
correlationandregression1-200905162711.pdf
correlationandregression1-200905162711.pdfcorrelationandregression1-200905162711.pdf
correlationandregression1-200905162711.pdf
 
Correlation and Regression
Correlation and RegressionCorrelation and Regression
Correlation and Regression
 
Simple correlation
Simple correlationSimple correlation
Simple correlation
 
OUANTITATIVE METHODS report group 5.pptx
OUANTITATIVE METHODS report group 5.pptxOUANTITATIVE METHODS report group 5.pptx
OUANTITATIVE METHODS report group 5.pptx
 

More from Dr. Shriram Dawkhar, Sinhgad Institutes

More from Dr. Shriram Dawkhar, Sinhgad Institutes (14)

Brm chap-4 present-updated
Brm chap-4 present-updatedBrm chap-4 present-updated
Brm chap-4 present-updated
 
Brm unit 3-dr. shriram dawkhar
Brm unit 3-dr. shriram dawkharBrm unit 3-dr. shriram dawkhar
Brm unit 3-dr. shriram dawkhar
 
Business research methods_mba_unit-2
Business research methods_mba_unit-2Business research methods_mba_unit-2
Business research methods_mba_unit-2
 
Research methods nicholas walliman
Research methods nicholas wallimanResearch methods nicholas walliman
Research methods nicholas walliman
 
Business research methods_unit-1
Business research methods_unit-1Business research methods_unit-1
Business research methods_unit-1
 
Qualitative Research Methods
Qualitative Research MethodsQualitative Research Methods
Qualitative Research Methods
 
Qualitative research methods for student
Qualitative research methods for studentQualitative research methods for student
Qualitative research methods for student
 
Brm unit.5 data.analysis_interpretation_shriram.dawkhar.1
Brm unit.5 data.analysis_interpretation_shriram.dawkhar.1Brm unit.5 data.analysis_interpretation_shriram.dawkhar.1
Brm unit.5 data.analysis_interpretation_shriram.dawkhar.1
 
Shriram edpm chapter-3
Shriram edpm chapter-3Shriram edpm chapter-3
Shriram edpm chapter-3
 
Sampling brm chap-4
Sampling brm chap-4Sampling brm chap-4
Sampling brm chap-4
 
Qualitative and quantitative research
Qualitative and quantitative researchQualitative and quantitative research
Qualitative and quantitative research
 
Research design brm-chap-2..
Research design brm-chap-2..Research design brm-chap-2..
Research design brm-chap-2..
 
Entrepreneurship development unit-1_for_internal_distribution
Entrepreneurship development unit-1_for_internal_distributionEntrepreneurship development unit-1_for_internal_distribution
Entrepreneurship development unit-1_for_internal_distribution
 
Theories of entrepreneurship_shriram.dawkhar
Theories of entrepreneurship_shriram.dawkharTheories of entrepreneurship_shriram.dawkhar
Theories of entrepreneurship_shriram.dawkhar
 

Recently uploaded

Computer Presentation.pptx ecommerce advantage s
Computer Presentation.pptx ecommerce advantage sComputer Presentation.pptx ecommerce advantage s
Computer Presentation.pptx ecommerce advantage s
MAQIB18
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
ocavb
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
vcaxypu
 
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
nscud
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
benishzehra469
 
Introduction-to-Cybersecurit57hhfcbbcxxx
Introduction-to-Cybersecurit57hhfcbbcxxxIntroduction-to-Cybersecurit57hhfcbbcxxx
Introduction-to-Cybersecurit57hhfcbbcxxx
zahraomer517
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
ewymefz
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
ewymefz
 
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
yhkoc
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
ewymefz
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
ewymefz
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
ukgaet
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
ArpitMalhotra16
 

Recently uploaded (20)

How can I successfully sell my pi coins in Philippines?
How can I successfully sell my pi coins in Philippines?How can I successfully sell my pi coins in Philippines?
How can I successfully sell my pi coins in Philippines?
 
Tabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflowsTabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflows
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
 
Computer Presentation.pptx ecommerce advantage s
Computer Presentation.pptx ecommerce advantage sComputer Presentation.pptx ecommerce advantage s
Computer Presentation.pptx ecommerce advantage s
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
 
Introduction-to-Cybersecurit57hhfcbbcxxx
Introduction-to-Cybersecurit57hhfcbbcxxxIntroduction-to-Cybersecurit57hhfcbbcxxx
Introduction-to-Cybersecurit57hhfcbbcxxx
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
 
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
 
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
 
Webinar One View, Multiple Systems No-Code Integration of Salesforce and ERPs
Webinar One View, Multiple Systems No-Code Integration of Salesforce and ERPsWebinar One View, Multiple Systems No-Code Integration of Salesforce and ERPs
Webinar One View, Multiple Systems No-Code Integration of Salesforce and ERPs
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
 

Shriram correlation

  • 1. Bivariate Analysis Dr. Shriram Dawkhar SIBAR, Kondhwa.
  • 2. So far the statistical methods we have used only permit us to: ◼ Look at the frequency in which certain numbers or categories occur. ◼ Look at measures of central tendency such as means, modes, and medians for one variable. ◼ Look at measures of dispersion such as standard deviation and z scores for one interval or ratio level variable.
  • 3. Bivariate analysis allows us to: ◼ Look at associations/relationships among two variables. ◼ Look at measures of the strength of the relationship between two variables. ◼ Test hypotheses about relationships between two nominal or ordinal level variables.
  • 4. For example, what does this table tell us about opinions on welfare by gender? Support cutting welfare benefits for immigrants Male Female Yes 15 5 No 10 20 Total 25 25
  • 5. Are frequencies sufficient to allow us to make comparisons about groups? What other information do we need?
  • 6. Is this table more helpful? Benefits for Immigrants Male Female Yes 15 (60%) 5 (20%) No 10 (40%) 20 (80%) Total 25 (100%) 25 (100%)
  • 7. Rules for cross-tabulation ◼ Calculate either column or row percents. ◼ Calculations are the number of frequencies in a cell of a table divided by the total number of frequencies in that column or row, for example 20/25 = 80.0% ◼ All percentages in a column or row should total 100%.
  • 8. Let’s look at another example – social work degrees by gender Social Work Degree Male Female BA 20 (33.3%) 20 ( %) MSW 30 ( ) 70 (70.0%) Ph.D. 10 (16.7%) 10 (10.0%) Total 60 (100.0%) 100 (100.0%)
  • 9. We use cross-tabulation when: ◼ We want to look at relationships among two or three variables. ◼ We want a descriptive statistical measure to tell us whether differences among groups are large enough to indicate some sort of relationship among variables.
  • 10. Cross-tabs are not sufficient to: ◼ Tell us the strength or actual size of the relationships among two or three variables. ◼ Test a hypothesis about the relationship between two or three variables. ◼ Tell us the direction of the relationship among two or more variables. ◼ Look at relationships between one nominal or ordinal variable and one ratio or interval variable unless the range of possible values for the ratio or interval variable is small. What do you think a table with a large number of ratio values would look like?
  • 11. We can use cross-tabs to visually assess whether independent and dependent variables might be related. In addition, we also use cross-tabs to find out if demographic variables such as gender and ethnicity are related to the second variable.
  • 12. Because we use tables in these ways, we can set up some decision rules about how to use tables. ◼ Independent variables should be column variables. ◼ If you are not looking at independent and dependent variable relationships, use the variable that can logically be said to influence the other as your column variable. ◼ Using this rule, always calculate column percentages rather than row percentages. ◼ Use the column percentages to interpret your results.
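The column-percentage rule on the slides above can be sketched in a few lines of Python (the function name and table layout are illustrative, not from the original deck):

```python
def column_percents(table):
    """Divide each cell count by its column total, as the cross-tab rules describe."""
    return {col: {row: round(100.0 * n / sum(cells.values()), 1)
                  for row, n in cells.items()}
            for col, cells in table.items()}

# Welfare-benefits example from the earlier slides: columns are gender.
welfare = {"Male": {"Yes": 15, "No": 10}, "Female": {"Yes": 5, "No": 20}}
print(column_percents(welfare))
# {'Male': {'Yes': 60.0, 'No': 40.0}, 'Female': {'Yes': 20.0, 'No': 80.0}}
```

Each column sums to 100%, matching the rule that percentages run down the column of the influencing variable.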
  • 14. CORRELATION ◼ Correlation is a much-abused term ◼ Correlation implies that there is an association between the paired values of two variables, where association means that the fluctuations in the values of each variable are sufficiently regular to make it unlikely that the association has arisen by chance ◼ Assumes: independent random samples are taken from a distribution in which the two variables are jointly normally distributed
  • 15. Correlation ◼ Correlation: The degree of relationship between the variables under consideration is measured through correlation analysis. ◼ The measure of correlation is called the correlation coefficient ◼ The degree of relationship is expressed by a coefficient which ranges from −1 to +1 (−1 ≤ r ≤ +1) ◼ The direction of change is indicated by the sign. ◼ Correlation analysis enables us to have an idea about the degree & direction of the relationship between the two variables under study.
  • 16. Correlation ◼ Correlation is a statistical tool that helps to measure and analyze the degree of relationship between two variables. ◼ Correlation analysis deals with the association between two or more variables.
  • 17. Correlation & Causation ◼ Causation means a cause & effect relation. ◼ Correlation denotes interdependency among variables. For correlating two phenomena, it is essential that the two phenomena have a cause-effect relationship; if such a relationship does not exist, the two phenomena cannot be meaningfully correlated. ◼ If two variables vary in such a way that movements in one are accompanied by movements in the other, the variables are said to have a cause and effect relationship. ◼ Causation always implies correlation, but correlation does not necessarily imply causation.
  • 18. Types of Correlation Type I Correlation Positive Correlation Negative Correlation
  • 19. Types of Correlation Type I ◼ Positive Correlation: The correlation is said to be positive when the values of the two variables change in the same direction. Ex. Pub./Ad. Exp. & sales, Height & weight. ◼ Negative Correlation: The correlation is said to be negative when the values of the variables change in opposite directions. Ex. Price & qty. demanded.
  • 20. Direction of the Correlation ◼ Positive relationship – Variables change in the same direction. ◼ As X is increasing, Y is increasing ◼ As X is decreasing, Y is decreasing ◼ E.g., As height increases, so does weight. ◼ Negative relationship – Variables change in opposite directions. ◼ As X is increasing, Y is decreasing ◼ As X is decreasing, Y is increasing ◼ E.g., As TV time increases, grades decrease Indicated by sign; (+) or (-).
  • 21. More examples ◼ Positive relationships ◼ temperature and water consumption . ◼ study time and grades. ◼ Negative relationships: ◼ alcohol consumption and driving ability. ◼ Price & quantity demanded
  • 22. Types of Correlation Type II Correlation Simple Multiple Partial Total
  • 23. Types of Correlation Type II ◼ Simple correlation: In a simple correlation problem only two variables are studied. ◼ Multiple Correlation: In multiple correlation three or more variables are studied. Ex. Qd = f ( P, PC, PS, t, y ) ◼ Partial correlation: analysis recognizes more than two variables but considers only two variables, keeping the others constant. ◼ Total correlation: is based on all the relevant variables, which is normally not feasible.
  • 24. Types of Correlation Type III Correlation LINEAR NON LINEAR
  • 25. Types of Correlation Type III ◼ Linear correlation: Correlation is said to be linear when the amount of change in one variable tends to bear a constant ratio to the amount of change in the other. The graph of the variables having a linear relationship will form a straight line. Ex X = 1, 2, 3, 4, 5, 6, 7, 8, Y = 5, 7, 9, 11, 13, 15, 17, 19, Y = 3 + 2x ◼ Non Linear correlation: The correlation would be non linear if the amount of change in one variable does not bear a constant ratio to the amount of change in the other variable.
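A quick way to see the "constant ratio of change" that defines a linear correlation is to difference the Y series from the slide's example:

```python
# Linear example from the slide: Y = 3 + 2x.
X = [1, 2, 3, 4, 5, 6, 7, 8]
Y = [5, 7, 9, 11, 13, 15, 17, 19]

# Equal one-unit steps in X always produce the same step in Y for a linear relation.
steps = [Y[i + 1] - Y[i] for i in range(len(Y) - 1)]
print(steps)                                       # [2, 2, 2, 2, 2, 2, 2]
print(all(y == 3 + 2 * x for x, y in zip(X, Y)))   # True
```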
  • 26. Methods of Studying Correlation ◼ Scatter Diagram Method ◼ Karl Pearson’s Coefficient of Correlation (r) ◼ Spearman’s Rank Correlation Coefficient
  • 27. Scatter Plot ◼ A scatter plot is a graph of a collection of ordered pairs (x,y). ◼ The graph looks like a bunch of dots, but the dots may form a general shape or move in a general direction.
  • 28. Scatter Diagram Method ◼ A scatter diagram is a graph of observed plotted points where each point represents the values of X & Y as a coordinate. It portrays the relationship between these two variables graphically.
  • 31. Linear Correlation: scatter plots contrasting linear and curvilinear relationships. (Slide from: Statistics for Managers Using Microsoft® Excel, 4th Edition, 2004, Prentice-Hall)
  • 32. Linear Correlation: scatter plots contrasting strong and weak relationships. (Slide from: Statistics for Managers Using Microsoft® Excel, 4th Edition, 2004, Prentice-Hall)
  • 33. Linear Correlation: scatter plots showing no relationship. (Slide from: Statistics for Managers Using Microsoft® Excel, 4th Edition, 2004, Prentice-Hall)
  • 35. A perfect positive correlation: height vs. weight plotted for each person (Height of A with Weight of A, Height of B with Weight of B) forms a linear relationship.
  • 36. Degree of correlation ◼ Perfect Positive Correlation Grade obtained Hours of study r = +1.0
  • 37. High Degree of positive correlation ◼ Positive relationship Height Weight r = +.80
  • 38. Degree of correlation ◼ Moderate Positive Correlation Weight Shoe Size r = + 0.4
  • 39. Degree of correlation ◼ Perfect Negative Correlation Exam score TV watching per week r = -1.0
  • 40. Degree of correlation ◼ Moderate Negative Correlation Exam score TV watching per week r = -.80
  • 41. Degree of correlation ◼ Weak negative Correlation Weight Shoe Size r = - 0.2
  • 42. Degree of correlation ◼ No Correlation (horizontal line) Height IQ r = 0.0
  • 43. 2) Direction of the Relationship ◼ Positive relationship – Variables change in the same direction. ◼ As X is increasing, Y is increasing ◼ As X is decreasing, Y is decreasing ◼ E.g., As height increases, so does weight. ◼ Negative relationship – Variables change in opposite directions. ◼ As X is increasing, Y is decreasing ◼ As X is decreasing, Y is increasing ◼ E.g., As TV time increases, grades decrease Indicated by sign; (+) or (-).
  • 44. Advantages of Scatter Diagram ◼ Simple & Non Mathematical method ◼ Not influenced by the size of extreme item ◼ First step in investing the relationship between two variables
  • 45. Disadvantage of scatter diagram Cannot give an exact degree of correlation
  • 46. Karl Pearson's Coefficient of Correlation ◼ Pearson’s ‘r’ is the most common correlation coefficient. ◼ Karl Pearson’s Coefficient of Correlation is denoted by ‘r’. The coefficient of correlation ‘r’ measures the degree of linear relationship between two variables, say x & y.
  • 47. Karl Pearson's Coefficient of Correlation ◼ Karl Pearson’s Coefficient of Correlation is denoted by r, with −1 ≤ r ≤ +1 ◼ Degree of correlation is expressed by the value of the coefficient ◼ Direction of change is indicated by the sign ( −ve ) or ( +ve )
  • 48. Karl Pearson's Coefficient of Correlation ◼ When deviations are taken from the actual means: r(x, y) = Σxy / √(Σx² · Σy²) ◼ When deviations are taken from an assumed mean: r = [N Σdxdy − (Σdx)(Σdy)] / [√(N Σdx² − (Σdx)²) · √(N Σdy² − (Σdy)²)]
  • 49. Procedure for computing the correlation coefficient ◼ Calculate the means of the two series ‘x’ & ‘y’ ◼ Calculate the deviations of ‘x’ & ‘y’ in the two series from their respective means. ◼ Square each deviation of ‘x’ & ‘y’, then obtain the sums of the squared deviations, i.e. Σx² & Σy² ◼ Multiply each deviation under x with the corresponding deviation under y & obtain the products ‘xy’. Then obtain the sum of the products, i.e. Σxy ◼ Substitute the values in the formula.
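The procedure above translates directly into code; this sketch uses the actual-mean formula r = Σxy / √(Σx² · Σy²) from the previous slide (the function name is illustrative):

```python
import math

def pearson_r(x, y):
    """Pearson's r computed from deviations about the means,
    following the slide's procedure: r = Sum(xy) / sqrt(Sum(x^2) * Sum(y^2))."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n           # step 1: means of the two series
    dx = [xi - mx for xi in x]                # step 2: deviations from the means
    dy = [yi - my for yi in y]
    sxx = sum(a * a for a in dx)              # step 3: sums of squared deviations
    syy = sum(b * b for b in dy)
    sxy = sum(a * b for a, b in zip(dx, dy))  # step 4: sum of cross-products
    return sxy / math.sqrt(sxx * syy)         # step 5: substitute into the formula

# Perfectly linear data (Y = 3 + 2x, as on the earlier slide) gives r = +1.
r = pearson_r([1, 2, 3, 4, 5, 6, 7, 8], [5, 7, 9, 11, 13, 15, 17, 19])
print(round(r, 4), round(r * r, 4))   # r = 1.0, so r² (coefficient of determination) = 1.0
```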
  • 50. Interpretation of Correlation Coefficient (r) ◼ The value of correlation coefficient ‘r’ ranges from -1 to +1 ◼ If r = +1, then the correlation between the two variables is said to be perfect and positive ◼ If r = -1, then the correlation between the two variables is said to be perfect and negative ◼ If r = 0, then there exists no correlation between the variables
  • 51. Properties of Correlation coefficient ◼ The correlation coefficient lies between −1 & +1; symbolically, −1 ≤ r ≤ +1 ◼ The correlation coefficient is independent of the change of origin & scale. ◼ The coefficient of correlation is the geometric mean of the two regression coefficients: r = √( bxy × byx ). If one regression coefficient is (+ve), the other regression coefficient is also (+ve), and the correlation coefficient is (+ve)
  • 52. Assumptions of Pearson’s Correlation Coefficient ◼ There is linear relationship between two variables, i.e. when the two variables are plotted on a scatter diagram a straight line will be formed by the points. ◼ Cause and effect relation exists between different forces operating on the item of the two variable series.
  • 53. Advantages of Pearson’s Coefficient ◼ It summarizes in one value both the degree of correlation & the direction of correlation.
  • 54. Limitations of Pearson’s Coefficient ◼ Always assumes a linear relationship ◼ Interpreting the value of r is difficult. ◼ The value of the correlation coefficient is affected by extreme values. ◼ It is a time-consuming method
  • 55. Coefficient of Determination ◼ A convenient way of interpreting the value of the correlation coefficient is to use the square of the coefficient of correlation, which is called the Coefficient of Determination. ◼ The Coefficient of Determination = r². ◼ Suppose r = 0.9; then r² = 0.81. This would mean that 81% of the variation in the dependent variable has been explained by the independent variable.
  • 56. Coefficient of Determination ◼ The maximum value of r2 is 1 because it is possible to explain all of the variation in y but it is not possible to explain more than all of it. ◼ Coefficient of Determination = Explained variation / Total variation
  • 57. Coefficient of Determination: An example ◼ Suppose r = 0.60 in one case and r = 0.30 in another. It does not mean that the first correlation is twice as strong as the second; the ‘r’ values can be understood by computing r². When r = 0.60, r² = 0.36 -----(1); when r = 0.30, r² = 0.09 -----(2). This implies that in the first case 36% of the total variation is explained, whereas in the second case 9% of the total variation is explained.
  • 58. 3) Spearman’s Rank Coefficient of Correlation ◼ When the variables under study are not capable of quantitative measurement but can be arranged in serial order, Pearson’s correlation coefficient cannot be used; in such cases Spearman’s rank correlation can be used. ◼ ρ (rho) = 1 − (6 ΣD²) / [N (N² − 1)] ◼ ρ = Rank correlation coefficient ◼ D = Difference of ranks between paired items in the two series. ◼ N = Total number of observations.
  • 59. Spearman Rho Correlation ◼ Spearman's rank correlation coefficient or Spearman's rho is named after Charles Spearman ◼ Denoted by the Greek letter ρ (rho) or by rs (a non-parametric measure of statistical dependence between two variables) ◼ Assesses how well the relationship between two variables can be described using a monotonic function ◼ A monotonic (monotone) function in mathematics is one that preserves a given order ◼ If there are no repeated data values, a perfect Spearman correlation of +1 or −1 occurs when each of the variables is a perfect monotone function of the other
  • 60. Spearman Rho Correlation ◼ A correlation coefficient is a numerical measure or index of the amount of association between two sets of scores. It ranges in size from a maximum of +1.00 through 0.00 to −1.00 ◼ The ‘+’ sign indicates a positive correlation (the scores on one variable increase as the scores on the other variable increase) ◼ The ‘−’ sign indicates a negative correlation (as the scores on one variable increase, the scores on the other variable decrease)
  • 61. Spearman Rho Correlation Calculation ◼ Often thought of as the Pearson correlation coefficient between the ranked variables ◼ The n raw scores Xi, Yi are converted to ranks xi, yi, and the differences di = xi − yi between the ranks of each observation on the two variables are calculated ◼ If there are no tied ranks, then ρ is given by: ρ = 1 − (6 Σdi²) / [n (n² − 1)]
  • 63. Spearman Rho Correlation Interpretation ◼ The sign of the Spearman correlation indicates the direction of association between X (the independent variable) and Y (the dependent variable) ◼ If Y tends to increase when X increases, the Spearman correlation coefficient is positive ◼ If Y tends to decrease when X increases, the Spearman correlation coefficient is negative ◼ A Spearman correlation of zero indicates that there is no tendency for Y to either increase or decrease when X increases
  • 64. Spearman Rho Correlation Interpretation cont…/ ◼ An alternative name for the Spearman rank correlation is the "grade correlation”: the "rank" of an observation is replaced by its "grade" ◼ When X and Y are perfectly monotonically related, the Spearman correlation coefficient becomes 1 ◼ A perfect monotone increasing relationship implies that for any two pairs of data values Xi, Yi and Xj, Yj, Xi − Xj and Yi − Yj always have the same sign
  • 65. Spearman Rho Correlation Example # 1 ◼ Calculate the correlation between the IQ of a person and the number of hours spent in class per week ◼ Find the value of the term Σd²i: 1. Sort the data by the first column (Xi). Create a new column xi and assign it the ranked values 1, 2, 3, ... n. 2. Sort the data by the second column (Yi). Create a fourth column yi and similarly assign it the ranked values 1, 2, 3, ... n. 3. Create a fifth column di to hold the differences between the two rank columns (xi and yi). Data (IQ Xi, hours of class per week Yi): (106, 7), (86, 0), (100, 27), (101, 50), (99, 28), (103, 29), (97, 20), (113, 12), (112, 6), (110, 17)
  • 66. Spearman Rho Correlation Example # 1 cont…/ 4. Create one final column to hold the value of column di squared.
  IQ (Xi) | Hours (Yi) | rank xi | rank yi | di | d²i
  86 | 0 | 1 | 1 | 0 | 0
  97 | 20 | 2 | 6 | -4 | 16
  99 | 28 | 3 | 8 | -5 | 25
  100 | 27 | 4 | 7 | -3 | 9
  101 | 50 | 5 | 10 | -5 | 25
  103 | 29 | 6 | 9 | -3 | 9
  106 | 7 | 7 | 3 | 4 | 16
  110 | 17 | 8 | 5 | 3 | 9
  112 | 6 | 9 | 2 | 7 | 49
  113 | 12 | 10 | 4 | 6 | 36
  • 67. Spearman Rho Correlation Example # 1 - Result ◼ With d²i found, we can add them to find Σd²i = 194 ◼ The value of n is 10, so ρ = 1 − (6 × 194) / [10 (10² − 1)] = −0.18 ◼ The low value shows that the correlation between IQ and hours spent in class is very low
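A small script can reproduce the Σd²i = 194 and ρ ≈ −0.18 worked out above (helper names are illustrative; this version assumes no tied values, as the formula requires):

```python
def spearman_rho(x, y):
    """Spearman's rho = 1 - 6*Sum(d^2) / (n(n^2 - 1)); valid when there are no tied ranks."""
    def ranks(values):
        # Rank 1 for the smallest value, rank n for the largest.
        order = sorted(range(len(values)), key=lambda i: values[i])
        r = [0] * len(values)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r

    rx, ry = ranks(x), ranks(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    n = len(x)
    return 1 - 6 * d2 / (n * (n * n - 1))

# IQ vs. hours of class per week, from Example # 1 above.
iq    = [106, 86, 100, 101, 99, 103, 97, 113, 112, 110]
hours = [7, 0, 27, 50, 28, 29, 20, 12, 6, 17]
print(round(spearman_rho(iq, hours), 2))   # -0.18
```

The same function gives ρ = −0.5 for the Economics/BRM ranks in Example # 2 below.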
  • 68. Spearman Rho Correlation Example # 2: 5 college students have the following rankings in the Economics and BRM subjects. Is there an association between the rankings in the BRM and Economics subjects? Student Jogi Arpit Ankush Poonam Akshay Economics 1 2 3 4 5 BRM 5 3 1 4 2
  • 69. Spearman Rho Correlation Example # 2 cont…/ ◼ Compute Spearman rho: n = number of paired ranks; d = difference between the paired ranks (when two or more observations of one variable are the same, ranks are assigned by averaging the positions occupied in their rank order)
  • 70. Spearman Rho Correlation Example # 2 cont…/
  Economics Rank (X) | BRM Rank (Y) | D = X − Y | D²
  1 | 5 | -4 | 16
  2 | 3 | -1 | 1
  3 | 1 | 2 | 4
  4 | 4 | 0 | 0
  5 | 2 | 3 | 9
  ΣD² = 30
  • 71. Spearman Rho Correlation Result: ρ = −0.5. There is a moderate negative correlation between the BRM and Economics subject rankings of students. Students who rank high relative to other students in Economics generally have lower BRM ranks, and those with low Economics rankings have higher BRM rankings than those with high Economics rankings.
  • 72. Merits of Spearman’s Rank Correlation ◼ This method is simpler to understand and easier to apply compared to Karl Pearson’s correlation method. ◼ This method is useful where we can give the ranks and not the actual data (qualitative terms). ◼ This method can be used where the initial data are in the form of ranks.
  • 73. Limitations of Spearman’s Correlation ◼ Cannot be used for finding out correlation in a grouped frequency distribution. ◼ This method should not be applied where N exceeds 30.
  • 74. Advantages of Correlation studies ◼ Show the amount (strength) of relationship present ◼ Can be used to make predictions about the variables under study. ◼ Can be used in many places, including natural settings, libraries, etc. ◼ It is easier to collect correlational data
  • 75. 3 kinds of relationships between variables ◼ Association or Correlation or Covary ◼ Both variables tend to be high or low (positive relationship) or one tends to be high when the other is low (negative relationship). Variables do not have independent & dependent roles. ◼ Prediction ◼ Variables are assigned independent and dependent roles. Both variables are observed. There is a weak causal implication that the independent predictor variable is the cause and the dependent variable is the effect. ◼ Causal ◼ Variables are assigned independent and dependent roles. The independent variable is manipulated and the dependent variable is observed. Strong causal statements are allowed.
  • 76. General Overview of Correlational Analysis ◼ The purpose is to measure the strength of a linear relationship between 2 variables. ◼ A correlation coefficient does not ensure “causation” (i.e. a change in X causes a change in Y) ◼ X is typically the Input, Measured, or Independent variable. ◼ Y is typically the Output, Predicted, or Dependent variable. ◼ If, as X increases, there is a predictable shift in the values of Y, a correlation exists.
  • 77. General Properties of Correlation Coefficients ◼ Values can range between +1 and -1 ◼ The value of the correlation coefficient represents the scatter of points on a scatterplot ◼ You should be able to look at a scatterplot and estimate what the correlation would be ◼ You should be able to look at a correlation coefficient and visualize the scatterplot
  • 78. Perfect Linear Correlation ◼ Occurs when all the points in a scatterplot fall exactly along a straight line.
  • 79. Positive Correlation Direct Relationship ◼ As the value of X increases, the value of Y also increases ◼ Larger values of X tend to be paired with larger values of Y (and consequently, smaller values of X and Y tend to be paired)
  • 80. Negative Correlation Inverse Relationship • As the value of X increases, the value of Y decreases • Small values of X tend to be paired with large values of Y (and vice versa).
  • 81. Non-Linear Correlation • As the value of X increases, the value of Y changes in a non-linear manner
  • 82. No Correlation • As the value of X changes, Y does not change in a predictable manner. • Large values of X seem just as likely to be paired with small values of Y as with large values of Y
  • 83. Correlation Strength ◼ .00 - .20 Weak or none ◼ .20 - .40 Weak ◼ .40 - .60 Moderate ◼ .60 - .80 Strong ◼ .80 - 1.00 Very strong
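The strength guideline above can be sketched numerically. This is a minimal Python example with made-up scores; `strength_label` is our own helper encoding the slide's cut-offs, not a standard library function:

```python
import numpy as np

def strength_label(r):
    """Map |r| to the rough verbal guideline from the slide above."""
    a = abs(r)
    if a < 0.20: return "weak or none"
    if a < 0.40: return "weak"
    if a < 0.60: return "moderate"
    if a < 0.80: return "strong"
    return "very strong"

# Hypothetical paired scores, for illustration only
x = np.array([2, 4, 5, 7, 8, 10], dtype=float)
y = np.array([3, 5, 4, 8, 9, 11], dtype=float)

r = np.corrcoef(x, y)[0, 1]   # Pearson r
print(round(r, 3), strength_label(r))
```

The sign of `r` gives the direction, and its magnitude is what the label describes.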
  • 84. Interpretation ◼ Depends on what the purpose of the study is… but here is a “general guideline”... • Value = magnitude of the relationship • Sign = direction of the relationship
  • 85. Some of the many types of correlation coefficients (there are lots more…)
Name | X variable | Y variable
Pearson r | Interval/Ratio | Interval/Ratio
Spearman rho | Ordinal | Ordinal
Kendall's Tau | Ordinal | Ordinal
Phi | Dichotomous | Dichotomous
Intraclass R | Interval/Ratio (test) | Interval/Ratio (retest)
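A small sketch of how two of these coefficients differ, using plain NumPy and invented data. Spearman's rho is Pearson's r computed on the ranks; this hand-rolled version assumes no ties (for real work, a library routine such as `scipy.stats.spearmanr` handles ties for you):

```python
import numpy as np

def pearson(x, y):
    # Pearson r: linear association between interval/ratio variables
    return np.corrcoef(x, y)[0, 1]

def spearman(x, y):
    # Spearman rho: Pearson r computed on the ranks (assumes no ties)
    rank = lambda a: np.argsort(np.argsort(a)).astype(float)
    return pearson(rank(x), rank(y))

x = np.arange(1, 11, dtype=float)   # 1..10
y = x ** 3                          # perfectly monotonic but non-linear

# Spearman sees a perfect monotonic relationship; Pearson is pulled
# below 1 by the curvature.
print(spearman(x, y), pearson(x, y))
```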
  • 86. Correlation Hypothesis Testing ◼ Step 1. Identify the population, distribution, and assumptions ◼ Step 2. State the null and research hypotheses. ◼ Step 3. Determine the characteristics of the comparison distribution. ◼ Step 4. Determine the critical values. ◼ Step 5. Calculate the test statistic ◼ Step 6. Make a decision.
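Steps 3–5 can be sketched for Pearson's r: under the null hypothesis ρ = 0 (and assuming bivariate normality), the comparison distribution is t with df = n − 2, and the test statistic is t = r√(n − 2) / √(1 − r²). A minimal example with a hypothetical sample value:

```python
import math

def t_statistic(r, n):
    """t for H0: rho = 0, with df = n - 2 (assumes bivariate normality)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

t = t_statistic(0.60, 27)   # hypothetical r = .60 from a sample of 27 pairs
# df = 25; for step 6, compare |t| to the critical t from a table
print(t)
```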
  • 87. Regression Analysis ◼ Regression Analysis is a very powerful tool in the field of statistical analysis in predicting the value of one variable, given the value of another variable, when those variables are related to each other.
  • 88. Regression Analysis ◼ Regression analysis is a mathematical measure of the average relationship between two or more variables. ◼ Regression analysis is a statistical tool used to predict the value of an unknown variable from a known variable.
  • 89. Advantages of Regression Analysis ◼ Regression analysis provides estimates of values of the dependent variable from values of the independent variables. ◼ It also yields a measure of the error involved in using the regression line as a basis for estimation. ◼ It helps in obtaining a measure of the degree of association or correlation that exists between the two variables.
  • 90. Assumptions in Regression Analysis ◼ An actual linear relationship exists. ◼ The regression analysis is used to estimate values only within the range for which it is valid. ◼ The relationship between the dependent and independent variables remains the same over the period for which the regression equation is calculated. ◼ The dependent variable is random, but the values of the independent variables are fixed. ◼ In regression we have only one dependent variable in the estimating equation; however, we can use more than one independent variable.
  • 91. Regression line ◼ The regression line is the line which gives the best estimate of one variable from the value of the other given variable. ◼ The regression line gives the average relationship between the two variables in mathematical form. ◼ The regression line has the following properties: a) ∑(Y – Yc) = 0 and b) ∑(Y – Yc)² is a minimum.
  • 92. Regression line ◼ For two variables X and Y, there are always two lines of regression. ◼ Regression line of X on Y: gives the best estimate of X for any given value of Y ◼ X = a + bY, where a = X-intercept, b = slope of the line, X = dependent variable, Y = independent variable
  • 93. Regression line ◼ For two variables X and Y, there are always two lines of regression. ◼ Regression line of Y on X: gives the best estimate of Y for any given value of X ◼ Y = a + bX, where a = Y-intercept, b = slope of the line, Y = dependent variable, X = independent variable
  • 94. The Explanation of Regression Lines ◼ In case of perfect correlation (positive or negative) the two lines of regression coincide. ◼ If the two regression lines are far from each other, the degree of correlation is low, & vice versa. ◼ The mean values of X & Y can be obtained as the point of intersection of the two regression lines. ◼ The higher the degree of correlation between the variables, the smaller the angle between the lines, & vice versa.
  • 95. Regression Equation / Line & Method of Least Squares ◼ Regression equation of Y on X: Y = a + bX. To obtain the values of 'a' & 'b', solve the normal equations ∑y = na + b∑x and ∑xy = a∑x + b∑x² ◼ Regression equation of X on Y: X = c + dY. To obtain the values of 'c' & 'd', solve the normal equations ∑x = nc + d∑y and ∑xy = c∑y + d∑y²
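The pair of normal equations for Y on X is just a 2×2 linear system, so it can be solved directly. A sketch with invented data (NumPy assumed):

```python
import numpy as np

# Hypothetical (x, y) pairs for illustration
x = np.array([1., 2., 3., 4., 5.])
y = np.array([2., 4., 5., 4., 5.])

n = len(x)
# Normal equations:  sum(y)  = n*a      + b*sum(x)
#                    sum(xy) = a*sum(x) + b*sum(x^2)
A = np.array([[n,       x.sum()],
              [x.sum(), (x ** 2).sum()]])
rhs = np.array([y.sum(), (x * y).sum()])
a, b = np.linalg.solve(A, rhs)
print(a, b)   # intercept a and slope b of the least-squares line
```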
  • 96. Regression Equation / Line when Deviations are Taken from the Arithmetic Mean ◼ Regression equation of Y on X: Y = a + bX, where a = Ȳ – bX̄ and b = ∑xy / ∑x² (x and y being deviations from the means of X and Y) ◼ Regression equation of X on Y: X = c + dY, where c = X̄ – dȲ and d = ∑xy / ∑y²
  • 97. Regression Equation / Line when Deviations are Taken from the Arithmetic Mean ◼ Regression equation of Y on X: Y – Ȳ = byx (X – X̄), where byx = ∑xy / ∑x² = r (σy / σx) ◼ Regression equation of X on Y: X – X̄ = bxy (Y – Ȳ), where bxy = ∑xy / ∑y² = r (σx / σy)
  • 98. Properties of the Regression Coefficients ◼ The coefficient of correlation is the geometric mean of the two regression coefficients: r = √(byx × bxy) ◼ If byx is positive then bxy must also be positive, & vice versa. ◼ If one regression coefficient is greater than one, the other must be less than one. ◼ The coefficient of correlation has the same sign as the regression coefficients. ◼ The arithmetic mean of byx & bxy is greater than or equal to the coefficient of correlation: (byx + bxy) / 2 ≥ r ◼ Regression coefficients are independent of the change of origin but not of scale.
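The geometric-mean and arithmetic-mean properties can be checked numerically. A sketch with made-up data (NumPy assumed), computing both regression coefficients from deviations:

```python
import numpy as np

# Hypothetical (x, y) pairs for illustration
x = np.array([1., 2., 3., 4., 5.])
y = np.array([2., 4., 5., 4., 5.])

dx, dy = x - x.mean(), y - y.mean()       # deviations from the means
byx = (dx * dy).sum() / (dx ** 2).sum()   # slope of Y on X
bxy = (dx * dy).sum() / (dy ** 2).sum()   # slope of X on Y
r = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# r is the geometric mean of byx and bxy, and their arithmetic
# mean is at least r.
print(byx, bxy, r)
```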
  • 99. Standard Error of Estimate ◼ The standard error of estimate is the measure of variation around the computed regression line. ◼ The standard error of estimate (SE) of Y measures the variability of the observed values of Y around the regression line. ◼ It gives us a measure of the scatter of the observations about the line of regression.
  • 100. Standard Error of Estimate ◼ The standard error of estimate of Y on X is: SEyx = √[∑(Y – Ye)² / (n – 2)] where Y = observed value of Y, Ye = estimated value from the regression equation corresponding to each Y value, (Y – Ye) = the error term, and n = number of observations in the sample. ◼ The convenient formula: SEyx = √[(∑Y² – a∑Y – b∑XY) / (n – 2)] where X = value of the independent variable, Y = value of the dependent variable, a = Y-intercept, b = slope of the estimating equation, and n = number of data points.
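The direct formula and the convenient formula should agree. A quick check with invented data, letting NumPy's `polyfit` supply the fitted intercept a and slope b:

```python
import numpy as np

# Hypothetical (x, y) pairs for illustration
x = np.array([1., 2., 3., 4., 5.])
y = np.array([2., 4., 5., 4., 5.])
n = len(x)

b, a = np.polyfit(x, y, 1)          # polyfit returns [slope, intercept]
ye = a + b * x                      # estimated Y values on the line

# Direct formula: scatter of observed Y about the regression line
se_direct = np.sqrt(((y - ye) ** 2).sum() / (n - 2))
# Convenient formula, using only the raw sums
se_shortcut = np.sqrt(((y ** 2).sum() - a * y.sum() - b * (x * y).sum()) / (n - 2))
print(se_direct, se_shortcut)
```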
  • 101. Correlation analysis vs. Regression analysis ◼ Regression is the average relationship between two variables. ◼ Correlation need not imply a cause & effect relationship between the variables under study; regression analysis clearly indicates the cause-and-effect relationship between the variables. ◼ There may be nonsense correlation between two variables; there is no such thing as nonsense regression.
  • 103. What is regression? ◼ Fitting a line to the data using an equation in order to describe and predict data ◼ Simple Regression ◼ Uses just 2 variables (X and Y) ◼ Other: Multiple Regression (one Y and many X’s) ◼ Linear Regression ◼ Fits data to a straight line ◼ Other: Curvilinear Regression (curved line) We’re doing: Simple, Linear Regression
  • 104. From Geometry: ◼ Any line can be described by an equation ◼ For any point on a line for X, there will be a corresponding Y ◼ the equation for this is y = mx + b ◼ m is the slope, b is the Y-intercept (when X = 0) ◼ Slope = change in Y per unit change in X ◼ Y-intercept = where the line crosses the Y axis (when X = 0)
  • 105. Regression equation ◼ Find the line that fits the data best, i.e. the line that minimizes the distance from all the data points to that line ◼ Regression Equation: Ŷ = bX + a ◼ Ŷ (Y-hat) is the predicted value of Y given a certain X ◼ b is the slope ◼ a is the Y-intercept
  • 106. Regression Equation: We can predict a Y score from an X by plugging a value for X into the equation and calculating Ŷ. What would we expect a person to get on quiz #4 if they got a 12.5 on quiz #3? Ŷ = .823X – 4.239 Ŷ = .823(12.5) – 4.239 = 6.049
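The plug-in prediction from this slide, written as code. The coefficients .823 and −4.239 come from the slide's fitted line; the function name is our own:

```python
def predict_quiz4(quiz3):
    """Predicted quiz-4 score from a quiz-3 score, using the slide's fitted line."""
    return 0.823 * quiz3 - 4.239

print(predict_quiz4(12.5))   # about 6.049, matching the slide
```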