Bivariate analysis allows researchers to examine relationships between two variables: it provides measures of the strength of those relationships and supports hypothesis tests about relationships between nominal or ordinal variables. Cross-tabulation is used to visually assess whether independent and dependent variables may be related, while correlation analysis gives a statistical measure of the degree and direction of association. Correlation coefficients range from -1 to +1, with values farther from 0 indicating stronger linear relationships and the sign indicating positive or negative association.
2. So far the statistical methods we
have used only permit us to:
◼ Look at the frequency with which certain
numbers or categories occur.
◼ Look at measures of central tendency such
as means, modes, and medians for one
variable.
◼ Look at measures of dispersion such as
standard deviation and z scores for one
interval or ratio level variable.
3. Bivariate analysis allows us to:
◼ Look at associations/relationships between
two variables.
◼ Look at measures of the strength of the
relationship between two variables.
◼ Test hypotheses about relationships
between two nominal or ordinal level
variables.
4. For example, what does this table tell us about
opinions on welfare by gender?
Support cutting welfare
benefits for immigrants    Male    Female
Yes                         15       5
No                          10      20
Total                       25      25
5. Are frequencies sufficient to
allow us to make comparisons
about groups?
What other information do we
need?
6. Is this table more helpful?
Benefits for
Immigrants    Males         Females
Yes           15 (60%)       5 (20%)
No            10 (40%)      20 (80%)
Total         25 (100%)     25 (100%)
7. Rules for cross-tabulation
◼ Calculate either column or row percents.
◼ Each percentage is the number of frequencies
in a cell of the table divided by the total
number of frequencies in that column or
row, for example 20/25 = 80.0%.
◼ All percentages in a column or row should
total 100%.
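These rules can be sketched in code (Python is used here purely as an illustration; the frequencies are taken from the welfare table on the earlier slide):

```python
# Column percentages for a cross-tab: each cell frequency divided by its
# column total, so every column sums to 100%. Data from the welfare example.
table = {"Male": {"Yes": 15, "No": 10}, "Female": {"Yes": 5, "No": 20}}

def column_percents(table):
    out = {}
    for col, cells in table.items():
        total = sum(cells.values())  # column total, e.g. 25 males
        out[col] = {row: round(100.0 * f / total, 1) for row, f in cells.items()}
    return out

print(column_percents(table))
# {'Male': {'Yes': 60.0, 'No': 40.0}, 'Female': {'Yes': 20.0, 'No': 80.0}}
```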
8. Let’s look at another example –
social work degrees by gender
Social Work
Degree     Male          Female
BA         20 (33.3%)    20 (20.0%)
MSW        30 (50.0%)    70 (70.0%)
Ph.D.      10 (16.7%)    10 (10.0%)
Total      60 (100.0%)   100 (100.0%)
9. We use cross-tabulation when:
◼ We want to look at relationships between
two or three variables.
◼ We want a descriptive statistical measure
to tell us whether differences between
groups are large enough to indicate some
sort of relationship between the variables.
10. Cross-tabs are not sufficient to:
◼ Tell us the strength or actual size of the relationship
between two or three variables.
◼ Test a hypothesis about the relationship between two or
three variables.
◼ Tell us the direction of the relationship between two or
more variables.
◼ Look at relationships between one nominal or ordinal
variable and one ratio or interval variable unless the range
of possible values for the ratio or interval variable is
small. What do you think a table with a large number of
ratio values would look like?
11. We can use cross-tabs to visually
assess whether independent and
dependent variables might be
related. In addition, we also use
cross-tabs to find out if
demographic variables such as
gender and ethnicity are related
to the second variable.
12. Because we use tables in these ways, we can
set up some decision rules about how to use
tables.
◼ Independent variables should be column
variables.
◼ If you are not looking at independent and
dependent variable relationships, use the variable
that can logically be said to influence the other as
your column variable.
◼ Using this rule, always calculate column
percentages rather than row percentages.
◼ Use the column percentages to interpret your
results.
14. CORRELATION
◼ Correlation is a much abused word/term
◼ Correlation is a term which implies that there is
an association between the paired values of 2
variables, where association means that the
fluctuations in the values of each variable are
sufficiently regular to make it unlikely that the
association has arisen by chance
◼ Assumes: independent random samples are taken
from a distribution in which the 2 variables are
jointly normally distributed
15. Correlation
◼ Correlation: the degree of relationship between the
variables under consideration is measured through
correlation analysis.
◼ The measure of correlation is called the correlation coefficient.
◼ The degree of relationship is expressed by a coefficient
which ranges from −1 to +1 ( −1 ≤ r ≤ +1 ).
◼ The direction of change is indicated by the sign.
◼ Correlation analysis enables us to have an idea about the
degree & direction of the relationship between the two
variables under study.
16. Correlation
◼ Correlation is a statistical tool that helps
to measure and analyze the degree of
relationship between two variables.
◼ Correlation analysis deals with the
association between two or more
variables.
17. Correlation & Causation
◼ Causation means a cause & effect relation.
◼ Correlation denotes interdependence among the
variables: it shows that two phenomena move together,
but it does not by itself establish that one causes
the other.
◼ If two variables vary in such a way that movements in one
are accompanied by movements in the other, the variables
are correlated; only when one movement actually produces
the other is there a cause-and-effect relationship.
◼ Causation always implies correlation, but correlation does
not necessarily imply causation.
19. Types of Correlation Type I
◼ Positive Correlation: The correlation is said to
be positive when the values of the two
variables change in the same direction.
Ex. Pub./Ad. Exp. & sales, height & weight.
◼ Negative Correlation: The correlation is said to
be negative when the values of the
variables change in opposite directions.
Ex. Price & qty. demanded.
20. Direction of the Correlation
◼ Positive relationship – Variables change in the
same direction.
◼ As X is increasing, Y is increasing
◼ As X is decreasing, Y is decreasing
◼ E.g., As height increases, so does weight.
◼ Negative relationship – Variables change in
opposite directions.
◼ As X is increasing, Y is decreasing
◼ As X is decreasing, Y is increasing
◼ E.g., As TV time increases, grades decrease
Indicated by
sign; (+) or (-).
21. More examples
◼ Positive relationships
◼ temperature and
water consumption .
◼ study time and
grades.
◼ Negative relationships:
◼ alcohol consumption
and driving ability.
◼ Price & quantity
demanded
23. Types of Correlation Type II
◼ Simple correlation: In a simple correlation
problem only two variables are studied.
◼ Multiple Correlation: In multiple correlation
three or more variables are studied.
Ex. Qd = f ( P, PC, PS, t, y )
◼ Partial correlation: The analysis recognizes more
than two variables but considers only two
variables, keeping the others constant.
◼ Total correlation: Based on all the relevant
variables, which is normally not feasible.
25. Types of Correlation Type III
◼ Linear correlation: Correlation is said to be linear
when the amount of change in one variable tends to
bear a constant ratio to the amount of change in the
other. The graph of the variables having a linear
relationship will form a straight line.
Ex X = 1, 2, 3, 4, 5, 6, 7, 8,
Y = 5, 7, 9, 11, 13, 15, 17, 19,
Y = 3 + 2x
◼ Non Linear correlation: The correlation would be
non linear if the amount of change in one variable
does not bear a constant ratio to the amount of change
in the other variable.
26. Methods of Studying Correlation
◼ Scatter Diagram Method
◼ Karl Pearson’s Coefficient of
Correlation (r)
◼ Spearman’s Rank Correlation
Coefficient
27. Scatter Plot
◼ A scatter plot is a graph of a collection of ordered
pairs (x,y).
◼ The graph looks like a bunch of dots, but the dots
may form a general shape or move in a general
direction.
28. Scatter Diagram Method
◼ A Scatter Diagram is a graph of observed
plotted points where each point
represents the values of X & Y as a
coordinate. It portrays the relationship
between these two variables graphically.
32. Linear Correlation
[Scatter plots of Y against X contrasting strong linear relationships
(points tightly clustered around a line) with weak linear relationships
(points loosely scattered)]
◼Slide from: Statistics for Managers Using Microsoft® Excel 4th Edition, 2004 Prentice-Hall
44. Advantages of Scatter Diagram
◼ Simple & non-mathematical method
◼ Not influenced by the size of extreme
items
◼ First step in investigating the relationship
between two variables
46. Karl Pearson's
Coefficient of Correlation
◼ Pearson’s ‘r’ is the most common
correlation coefficient.
◼ Karl Pearson’s Coefficient of Correlation is
denoted by ‘r’. The coefficient of
correlation ‘r’ measures the degree of
linear relationship between two variables,
say x & y.
47. Karl Pearson's
Coefficient of Correlation
◼ Karl Pearson’s Coefficient of
Correlation is denoted by r
−1 ≤ r ≤ +1
◼ The degree of correlation is expressed by the
value of the coefficient
◼ The direction of change is indicated by the sign
( −ve ) or ( +ve )
48. Karl Pearson's
Coefficient of Correlation
◼ When deviations are taken from the actual mean:
r(x, y) = Σxy / √( Σx² · Σy² )
◼ When deviations are taken from an assumed
mean:
r = [ N Σdxdy − Σdx Σdy ] / [ √( N Σdx² − (Σdx)² ) · √( N Σdy² − (Σdy)² ) ]
49. Procedure for computing the
correlation coefficient
◼ Calculate the means of the two series ‘x’ & ‘y’.
◼ Calculate the deviations of ‘x’ & ‘y’ in the two series from their
respective means.
◼ Square each deviation of ‘x’ & ‘y’, then obtain the sums of
the squared deviations, i.e. ∑x² & ∑y².
◼ Multiply each deviation under x with the corresponding deviation under
y & obtain the products ‘xy’. Then obtain the sum of those
products, i.e. ∑xy.
◼ Substitute the values in the formula.
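The procedure above can be sketched as a short function (a minimal illustration; the data reuse the Y = 3 + 2X series from the linear-correlation slide):

```python
import math

def pearson_r(xs, ys):
    # Deviations of each series from its own mean
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    dx = [x - mx for x in xs]
    dy = [y - my for y in ys]
    # r = sum(xy) / sqrt(sum(x^2) * sum(y^2)), with x, y taken as deviations
    sxy = sum(a * b for a, b in zip(dx, dy))
    return sxy / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [5, 7, 9, 11, 13, 15, 17, 19]  # Y = 3 + 2X exactly
print(pearson_r(xs, ys))  # 1.0 (perfect positive linear correlation)
```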
50. Interpretation of Correlation
Coefficient (r)
◼ The value of correlation coefficient ‘r’ ranges
from -1 to +1
◼ If r = +1, then the correlation between the two
variables is said to be perfect and positive
◼ If r = -1, then the correlation between the two
variables is said to be perfect and negative
◼ If r = 0, then there exists no correlation between
the variables
51. Properties of Correlation coefficient
◼ The correlation coefficient lies between −1 & +1,
symbolically ( −1 ≤ r ≤ 1 ).
◼ The correlation coefficient is independent of the
change of origin & scale.
◼ The coefficient of correlation is the geometric mean of
the two regression coefficients:
r = √( bxy · byx )
If one regression coefficient is (+ve), the other regression
coefficient is also (+ve), and the correlation coefficient is (+ve).
52. Assumptions of Pearson’s
Correlation Coefficient
◼ There is a linear relationship between the two
variables, i.e. when the two variables are
plotted on a scatter diagram the points form
a straight line.
◼ A cause and effect relation exists between the
different forces operating on the items of
the two variable series.
53. Advantages of Pearson’s Coefficient
◼ It summarizes in one value, the
degree of correlation & direction
of correlation also.
54. Limitation of Pearson’s Coefficient
◼ It always assumes a linear relationship
◼ Interpreting the value of r is difficult
◼ The value of the correlation coefficient is
affected by extreme values
◼ It is a time-consuming method
55. Coefficient of Determination
◼ A convenient way of interpreting the value of the
correlation coefficient is to use the square of the
coefficient of correlation, which is called the
Coefficient of Determination.
◼ The Coefficient of Determination = r².
◼ Suppose r = 0.9; then r² = 0.81, which means that
81% of the variation in the dependent variable
has been explained by the independent variable.
56. Coefficient of Determination
◼ The maximum value of r2 is 1 because it is
possible to explain all of the variation in y but it
is not possible to explain more than all of it.
◼ Coefficient of Determination = Explained
variation / Total variation
57. Coefficient of Determination: An example
◼ Suppose r = 0.60 in one case and r = 0.30 in
another. It does not mean that the first
correlation is twice as strong as the second;
‘r’ is better understood by computing the value of r².
When r = 0.60, r² = 0.36 -----(1)
When r = 0.30, r² = 0.09 -----(2)
This implies that in the first case 36% of the total
variation is explained, whereas in the second case
only 9% of the total variation is explained.
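A quick numeric check of this comparison:

```python
# r does not scale linearly in explanatory power; compare via r^2 instead.
for r in (0.60, 0.30):
    print(f"r = {r:.2f} -> r^2 = {r * r:.2f} ({r * r:.0%} of variation explained)")
```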
58. 3) Spearman’s Rank Coefficient of
Correlation
◼ When the variables under study are not capable of
quantitative measurement but can be arranged in
serial order (ranked), Pearson’s correlation coefficient
cannot be used; in such cases Spearman’s rank
correlation can be used.
◼ ρ (rho) = 1 − (6 ∑D²) / [ N (N² − 1) ]
◼ ρ = rank correlation coefficient
◼ D = difference of rank between paired items in the two series
◼ N = total number of observations
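The formula can be written directly as a function (an illustrative sketch; the two calls show the extreme cases of identical and exactly reversed rankings):

```python
def spearman_rho(d, n):
    # rho = 1 - 6 * sum(D^2) / (N * (N^2 - 1)); assumes no tied ranks
    return 1 - 6 * sum(di * di for di in d) / (n * (n * n - 1))

# Identical rankings in both series: every D is 0, so rho = +1
print(spearman_rho([0, 0, 0, 0], 4))    # 1.0
# Exactly reversed rankings (1,2,3,4 vs 4,3,2,1 gives D = -3,-1,1,3): rho = -1
print(spearman_rho([-3, -1, 1, 3], 4))  # -1.0
```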
59. Spearman Rho Correlation
◼ Spearman's rank correlation coefficient or
Spearman's rho is named after Charles Spearman
◼ Denoted by the Greek letter ρ (rho) or by rs; a
non-parametric measure of statistical dependence
between two variables
◼ Assesses how well the relationship between two
variables can be described using a monotonic
function
◼ A monotonic (or monotone) function in mathematics
is one that preserves the given order.
◼ If there are no repeated data values, a perfect
Spearman correlation of +1 or −1 occurs when
each of the variables is a perfect monotone
function of the other
60. Spearman Rho Correlation
◼ A correlation coefficient is a numerical measure or
index of the amount of association between two sets of
scores. It ranges in size from a maximum of +1.00
through 0.00 to -1.00
◼ The ‘+’ sign indicates a positive correlation (the scores on
one variable increase as the scores on the other
variable increase)
◼ The ‘-’ sign indicates a negative correlation (as the scores on
one variable increase, the scores on the other variable
decrease)
61. Spearman Rho Correlation Calculation
◼ Often thought of as the Pearson correlation coefficient between the
ranked variables
◼ The n raw scores Xi, Yi are converted to ranks xi, yi, and the differences
di = xi − yi between the ranks of each observation on the two variables are
calculated
◼ If there are no tied ranks, then ρ is given by:
ρ = 1 − (6 ∑di²) / [ n (n² − 1) ]
63. Spearman Rho Correlation
Interpretation
◼ The sign of the Spearman correlation indicates the direction of association
between X (the independent variable) and Y (the dependent variable)
◼ If Y tends to increase when X increases, the Spearman correlation coefficient
is positive
◼ If Y tends to decrease when X increases, the Spearman correlation coefficient
is negative
◼ A Spearman correlation of zero indicates that there is no tendency for Y to
either increase or decrease when X increases
64. Spearman Rho Correlation
Interpretation cont…/
◼ An alternative name for the Spearman rank correlation is the "grade correlation",
in which the "rank" of an observation is replaced by its "grade"
◼ When X and Y are perfectly monotonically related, the Spearman correlation
coefficient becomes 1
◼ A perfect monotone increasing relationship implies that for any two pairs of
data values Xi, Yi and Xj, Yj, the differences Xi − Xj and Yi − Yj always have
the same sign
65. Spearman Rho Correlation
Example # 1
◼ Calculate the correlation between the IQ
of a person with the number of hours
spent in the class per week
◼ Find the value of the term d²i:
1. Sort the data by the first column (Xi).
Create a new column xi and assign it the
ranked values 1,2,3,...n.
2. Sort the data by the second column (Yi).
Create a fourth column yi and similarly
assign it the ranked values 1,2,3,...n.
3. Create a fifth column di to hold the
differences between the two rank
columns (xi and yi).
IQ, Xi   Hours of class per week, Yi
106      7
86       0
100      27
101      50
99       28
103      29
97       20
113      12
112      6
110      17
66. Spearman Rho Correlation
Example # 1 cont…/
4. Create one final column to hold the value of column di squared.
IQ (Xi)   Hours of class per week (Yi)   rank xi   rank yi   di   d²i
86 0 1 1 0 0
97 20 2 6 -4 16
99 28 3 8 -5 25
100 27 4 7 -3 9
101 50 5 10 -5 25
103 29 6 9 -3 9
106 7 7 3 4 16
110 17 8 5 3 9
112 6 9 2 7 49
113 12 10 4 6 36
67. Spearman Rho Correlation
Example # 1- Result
◼ With each d²i found, we can add them: ∑d²i = 194
◼ The value of n is 10, so:
ρ = 1 − (6 × 194) / [ 10 (10² − 1) ]
ρ = 1 − 1164 / 990 ≈ −0.18
◼ The low value shows that the correlation between IQ and
hours spent in the class is very low
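The whole computation can be replayed in code (the IQ and hours data come from the table above; the simple ranking helper assumes no tied values, as in this example):

```python
def ranks(values):
    # Rank 1 = smallest value; assumes no tied values, as in this example
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

iq    = [106, 86, 100, 101, 99, 103, 97, 113, 112, 110]
hours = [7, 0, 27, 50, 28, 29, 20, 12, 6, 17]

rx, ry = ranks(iq), ranks(hours)
d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
n = len(iq)
rho = 1 - 6 * d2 / (n * (n * n - 1))
print(d2, round(rho, 2))  # 194 -0.18
```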
68. Spearman Rho Correlation
Example # 2:
5 college students have the following rankings in the
Economics and BRM subjects. Is there an association
between the rankings in the BRM and Economics subjects?
Student     Jogi   Arpit   Ankush   Poonam   Akshay
Economics    1      2       3        4        5
BRM          5      3       1        4        2
69. Spearman Rho Correlation
Example # 2 cont…/
◼ Compute Spearman rho: ρ = 1 − (6 ∑d²) / [ n (n² − 1) ]
n = number of paired ranks
d = difference between the paired ranks
(when two or more observations of one variable are the
same, ranks are assigned by averaging the positions
occupied in their rank order)
71. Spearman Rho Correlation
Result: ρ = −0.5
There is a moderate negative correlation between the
BRM and Economics subject rankings of the students:
students who rank high in Economics relative to their
classmates generally have lower BRM ranks, and those
with low Economics rankings tend to have higher BRM
rankings.
72. Merits Spearman’s Rank Correlation
◼ This method is simpler to understand and easier
to apply compared with Karl Pearson’s correlation
method.
◼ This method is useful where we can give ranks
but not the actual data (qualitative terms).
◼ This method is suitable where the initial data are
in the form of ranks.
73. Limitations of Spearman’s Correlation
◼ Cannot be used for finding the correlation in a
grouped frequency distribution.
◼ This method should not be applied where N
exceeds 30, as ranking becomes tedious.
74. Advantages of Correlation studies
◼ Show the amount (strength) of relationship
present
◼ Can be used to make predictions about the
variables under study.
◼ Can be used in many places, including natural
settings, libraries, etc.
◼ Easier to collect correlational data
75. 3 kinds of relationships between variables
◼ Association or Correlation or Covary
◼ Both variables tend to be high or low (positive
relationship) or one tends to be high when the other is low
(negative relationship). Variables do not have
independent & dependent roles.
◼ Prediction
◼ Variables are assigned independent and dependent roles.
Both variables are observed. There is a weak causal
implication that the independent predictor variable is the
cause and the dependent variable is the effect.
◼ Causal
◼ Variables are assigned independent and dependent roles.
The independent variable is manipulated and the
dependent variable is observed. Strong causal statements
are allowed.
76. General Overview of
Correlational Analysis
◼ The purpose is to measure the strength of a
linear relationship between 2 variables.
◼ A correlation coefficient does not ensure
“causation” (i.e. a change in X causes a
change in Y)
◼ X is typically the Input, Measured, or
Independent variable.
◼ Y is typically the Output, Predicted, or
Dependent variable.
◼ If, as X increases, there is a predictable shift
in the values of Y, a correlation exists.
77. General Properties of
Correlation Coefficients
◼ Values can range between +1 and -1
◼ The value of the correlation coefficient
represents the scatter of points on a
scatterplot
◼ You should be able to look at a scatterplot
and estimate what the correlation would be
◼ You should be able to look at a correlation
coefficient and visualize the scatterplot
79. Positive Correlation
Direct Relationship
◼ As the value of X increases, the value of Y also
increases
◼ Larger values of X tend to be paired with larger
values of Y (and consequently, smaller values of X and Y
tend to be paired)
80. Negative Correlation
Inverse Relationship
• As the value of X
increases, the value of Y
decreases
• Small values of X tend to
be paired with large
value of Y (and vice
versa).
82. No Correlation
• As the value of X
changes, Y does not
change in a predictable
manner.
• Large values of X seem
just as likely to be paired
with small values of Y as
with large values of Y
84. Interpretation
◼ Depends on what the purpose of the study
is… but here is a “general guideline”...
• Value = magnitude of the relationship
• Sign = direction of the relationship
85. Some of the many
Types of Correlation Coefficients
(there are lots more…)
Name            X variable                Y variable
Pearson r       Interval/Ratio            Interval/Ratio
Spearman rho    Ordinal                   Ordinal
Kendall's Tau   Ordinal                   Ordinal
Phi             Dichotomous               Dichotomous
Intraclass R    Interval/Ratio (test)     Interval/Ratio (retest)
86. Correlation Hypothesis
Testing
◼ Step 1. Identify the population, distribution,
and assumptions
◼ Step 2. State the null and research
hypotheses.
◼ Step 3. Determine the characteristics of the
comparison distribution.
◼ Step 4. Determine the critical values.
◼ Step 5. Calculate the test statistic
◼ Step 6. Make a decision.
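Step 5 needs a test statistic. The slide does not name one, but a standard choice (an assumption here) for H0: population correlation = 0 with Pearson's r is a t statistic on n − 2 degrees of freedom:

```python
import math

# Common test statistic (not spelled out on the slide) for H0: population
# correlation is zero: t = r * sqrt(n - 2) / sqrt(1 - r^2), with df = n - 2.
def t_for_r(r, n):
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

# Example: r = 0.60 observed in n = 27 pairs gives t on 25 df,
# to be compared against the critical t value found in Step 4.
print(round(t_for_r(0.60, 27), 2))  # 3.75
```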
87. Regression Analysis
◼ Regression Analysis is a very
powerful tool in the field of statistical
analysis in predicting the value of one
variable, given the value of another
variable, when those variables are
related to each other.
88. Regression Analysis
◼ Regression Analysis is a mathematical measure of the
average relationship between two or more
variables.
◼ Regression analysis is a statistical tool used to
predict the value of an unknown variable from a
known variable.
89. Advantages of Regression Analysis
◼ Regression analysis provides estimates of
values of the dependent variables from the
values of independent variables.
◼ Regression analysis also helps to obtain a
measure of the error involved in using the
regression line as a basis for estimation.
◼ Regression analysis helps in obtaining a
measure of the degree of association or
correlation that exists between the two variables.
90. Assumptions in Regression Analysis
◼ Existence of an actual linear relationship.
◼ The regression analysis is used to estimate
values within the range for which it is valid.
◼ The relationship between the dependent and
independent variables remains the same over the
period for which the regression equation is calculated.
◼ The dependent variable may take any random value, but
the values of the independent variables are fixed.
◼ In regression, we have only one dependent variable
in our estimating equation. However, we can use
more than one independent variable.
91. Regression line
◼ Regression line is the line which gives the best
estimate of one variable from the value of any
other given variable.
◼ The regression line gives the average
relationship between the two variables in
mathematical form.
◼ The regression line has the following
properties: a) ∑( Y − Yc ) = 0 and
b) ∑( Y − Yc )² = minimum
92. Regression line
◼ For two variables X and Y, there are always two
lines of regression –
◼ Regression line of X on Y : gives the best
estimate for the value of X for any specific
given values of Y
◼ X = a + b Y a = X - intercept
◼ b = Slope of the line
◼ X = Dependent variable
◼ Y = Independent variable
93. Regression line
◼ For two variables X and Y, there are always two
lines of regression –
◼ Regression line of Y on X : gives the best
estimate for the value of Y for any specific given
values of X
◼ Y = a + bx a = Y - intercept
◼ b = Slope of the line
◼ Y = Dependent variable
◼ x= Independent variable
94. The Explanation of Regression Line
◼ In case of perfect correlation ( positive or
negative ) the two lines of regression coincide.
◼ If the two regression lines are far from each other,
the degree of correlation is less, & vice versa.
◼ The mean values of X & Y can be obtained as
the point of intersection of the two regression
lines.
◼ The higher the degree of correlation between the
variables, the smaller the angle between the lines,
& vice versa.
95. Regression Equation / Line
& Method of Least Squares
◼ Regression Equation of y on x:
Y = a + bX
In order to obtain the values of ‘a’ & ‘b’, solve the normal equations:
∑Y = na + b∑X
∑XY = a∑X + b∑X²
◼ Regression Equation of x on y:
X = c + dY
In order to obtain the values of ‘c’ & ‘d’, solve:
∑X = nc + d∑Y
∑XY = c∑Y + d∑Y²
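The two normal equations for y on x can be solved directly for 'a' and 'b'. A small sketch, with made-up data lying exactly on Y = 3 + 2X:

```python
# Solves the normal equations  sum(Y)  = n*a + b*sum(X)
#                              sum(XY) = a*sum(X) + b*sum(X^2)
def fit_y_on_x(xs, ys):
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)  # slope
    a = (sy - b * sx) / n                          # intercept
    return a, b

print(fit_y_on_x([1, 2, 3, 4, 5], [5, 7, 9, 11, 13]))  # (3.0, 2.0): Y = 3 + 2X
```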
96. Regression Equation / Line when
Deviation taken from Arithmetic Mean
◼ Regression Equation of y on x:
Y = a + bX
In order to obtain the values of ‘a’ & ‘b’
(with x, y the deviations from the means):
a = Ȳ − bX̄,  b = ∑xy / ∑x²
◼ Regression Equation of x on y:
X = c + dY
c = X̄ − dȲ,  d = ∑xy / ∑y²
97. Regression Equation / Line when
Deviation taken from Arithmetic Mean
◼ Regression Equation of y on x:
Y − Ȳ = byx ( X − X̄ )
byx = ∑xy / ∑x²  or  byx = r ( σy / σx )
◼ Regression Equation of x on y:
X − X̄ = bxy ( Y − Ȳ )
bxy = ∑xy / ∑y²  or  bxy = r ( σx / σy )
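The two regression coefficients and their geometric-mean relation to r can be checked numerically (the five data points are made up for illustration):

```python
import math

def regression_coeffs(xs, ys):
    # x, y below are deviations from the respective means
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    dx = [x - mx for x in xs]
    dy = [y - my for y in ys]
    sxy = sum(a * b for a, b in zip(dx, dy))
    byx = sxy / sum(a * a for a in dx)  # slope of Y on X
    bxy = sxy / sum(b * b for b in dy)  # slope of X on Y
    return byx, bxy

byx, bxy = regression_coeffs([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
r = math.copysign(math.sqrt(byx * bxy), byx)  # r = +/- sqrt(byx * bxy)
print(round(byx, 2), round(bxy, 2), round(r, 3))  # 0.6 1.0 0.775
```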
98. Properties of the Regression Coefficients
◼ The coefficient of correlation is the geometric mean of the two
regression coefficients: r = √( byx · bxy )
◼ If byx is positive then bxy is also positive, & vice
versa.
◼ If one regression coefficient is greater than one, the other
must be less than one.
◼ The coefficient of correlation has the same sign as
the regression coefficients.
◼ The arithmetic mean of byx & bxy is equal to or greater than the
coefficient of correlation: ( byx + bxy ) / 2 ≥ r
◼ Regression coefficients are independent of the origin but not of
the scale.
99. Standard Error of Estimate.
◼ Standard Error of Estimate is the measure of
variation around the computed regression line.
◼ Standard error of estimate (SE) of Y measure the
variability of the observed values of Y around the
regression line.
◼ The standard error of estimate gives us a measure
of the scatter of the observations about the line
of regression.
100. Standard Error of Estimate.
◼ Standard Error of Estimate of Y on X:
SEyx = √[ ∑( Y − Ye )² / ( n − 2 ) ]
Y = observed value of y
Ye = estimated value from the estimating equation corresponding
to each y value
e = the error term ( Y − Ye )
n = number of observations in the sample
◼ The convenient formula:
SEyx = √[ ( ∑Y² − a∑Y − b∑XY ) / ( n − 2 ) ]
X = Value of independent variable.
Y = Value of dependent variable.
a = Y intercept.
b = Slope of estimating equation.
n = Number of data points.
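A sketch of the first formula (the five data points and the fitted a = 2.2, b = 0.6 are made up for illustration, not from the slides):

```python
import math

# SE of estimate of Y on X: sqrt( sum((Y - Ye)^2) / (n - 2) ), Ye = a + b*X
def se_estimate(xs, ys, a, b):
    resid2 = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(resid2 / (len(xs) - 2))

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(round(se_estimate(xs, ys, a=2.2, b=0.6), 3))  # 0.894
```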
101. Correlation analysis vs.
Regression analysis.
◼ Correlation measures the degree of relationship between
two variables; regression gives the average relationship
between them in mathematical form.
◼ Correlation need not imply a cause & effect
relationship between the variables under study, whereas
regression analysis is built on an assumed cause-and-effect
relationship between the variables.
◼ There may be nonsense correlation between two
variables; there is no such thing as nonsense
regression.
103. What is regression?
◼ Fitting a line to the data using an equation in order
to describe and predict data
◼ Simple Regression
◼ Uses just 2 variables (X and Y)
◼ Other: Multiple Regression (one Y and many X’s)
◼ Linear Regression
◼ Fits data to a straight line
◼ Other: Curvilinear Regression (curved line)
We’re doing: Simple, Linear Regression
104. From Geometry:
◼ Any line can be described by an equation
◼ For any point on a line for X, there will be a
corresponding Y
◼ the equation for this is y = mx + b
◼ m is the slope, b is the Y-intercept (when X = 0)
◼ Slope = change in Y per unit change in X
◼ Y-intercept = where the line crosses the Y axis
(when X = 0)
105. Regression equation
◼ Find the line that fits the data best, i.e. the
line that minimizes the distance from all the
data points to that line
◼ Regression Equation: Ŷ (Y-hat) = bX + a
◼ Ŷ is the predicted value of Y given a
certain X
◼ b is the slope
◼ a is the y-intercept
106. Regression Equation:
We can predict a Y score from an X by
plugging a value for X into the equation and
calculating Y
What would we expect a person to get on quiz
#4 if they got a 12.5 on quiz #3?
Ŷ = .823X − 4.239
Ŷ = .823(12.5) − 4.239 = 6.049
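The same prediction as a one-liner (the coefficients come from the slide's fitted equation; the underlying quiz data are not shown here):

```python
# Y-hat = .823 * X - 4.239, the fitted quiz equation from the slide
def predict(x, b=0.823, a=-4.239):
    return b * x + a

print(round(predict(12.5), 2))  # 6.05 (exact value 6.0485)
```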