2.
IN THIS LECTURE
We learn how to
Draw a scatter diagram and label using SPSS.
Distinguish between positive, negative and no correlation.
Interpret the findings
We learn how to
Run a linear regression using SPSS
Generate a linear regression equation
Interpret the regression coefficients
Use the linear regression to make predictions.
3.
What is correlation analysis?
•Correlation analysis measures the level of association between two or more
variables. In this module, we only consider the association between two variables.
•If low (and high) values of one variable tend to occur with low (and high)
values of the other, the two variables are positively correlated. If low values
of one tend to occur with high values of the other, they are negatively correlated.
•Sometimes the association between variables is causal and sometimes it is
not. NOTE THAT STATISTICAL TESTS DO NOT PROVE CAUSALITY, they only
show that there is a relationship.
•A causal association is one in which the level of one variable causes (or forces)
the level of the other. For example, weight and body mass index among adults.
•A non-causal association is one in which two variables are related with no obvious
one-to-one causation. For example, ice cream sales and atmospheric temperature.
•A spurious correlation between two variables is an association that clearly
occurs by chance. For example, height and earnings among workers in London.
4.
THE FIRST OBSERVATION
The first descriptive observation of an association between two variables is made by
drawing a scatter diagram.
Here, we need to distinguish between independent and dependent variables.
The dependent variable is the variable of interest to us. It is the variable that is
seemingly affected by other factors.
Example:
Teachers often want to understand the attendance patterns of students. Does
attendance in class depend on the distance travelled to university? If so,
attendance is the dependent variable and distance travelled is the
independent variable.
The independent variable goes on the horizontal axis (x) and the dependent variable
on the vertical axis (y).
5.
SCATTER DIAGRAM
[Figure: three scatter diagrams. Positive correlation: upward trend. Negative correlation: downward trend. No correlation: no trend. Each trend is labelled from stronger to weaker.]
6.
An example
We want to know whether the amount of CO2
emitted is associated with the gross domestic
product (GDP, or growth in the economy)
across a set of countries.
In particular, we want to know whether increases
in CO2 emissions are explained by growth in the
economy (GDP), and vice versa.
9.
Strength of the correlation: Correlation coefficient
The correlation coefficient is a statistic that indicates the strength of the correlation.
It is a value between -1 and +1.
A value close to -1 suggests a strong negative correlation, whereas a value close to +1
suggests a strong positive correlation. A value close to 0 suggests no correlation.
There is no universal cut-off point for a strong correlation.
In this module, we accept a cut-off point of 0.7 for a strong correlation.
[Figure: scale of the correlation coefficient, labelled strong negative correlation, no correlation and strong positive correlation.]
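As a rough illustration of how such a coefficient can be computed outside SPSS, here is a minimal Python sketch of the Pearson correlation coefficient. The function names, the made-up data, and the choice to apply the 0.7 cut-off symmetrically (so that -0.7 or below counts as strong negative) are assumptions for illustration, not part of the lecture.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / sqrt(sxx * syy)

def describe(r, cutoff=0.7):
    """Label r using a 0.7 cut-off (assumed to apply to both signs)."""
    if r >= cutoff:
        return "strong positive"
    if r <= -cutoff:
        return "strong negative"
    return "not strong"

# Made-up illustrative data, not taken from the lecture's data set.
distance_km = [1, 2, 3, 4, 5]
attendance_pct = [95, 90, 80, 75, 60]

r = pearson_r(distance_km, attendance_pct)
print(round(r, 3), describe(r))
```

With these invented numbers the coefficient is close to -1, so the sketch labels the correlation as strong negative.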
10.
Linear regression
Regression analysis is concerned with finding a mathematical model for the
relationship between the variables.
A linear regression finds a mathematical model in the form of a straight
line. In the process of regression, we find the equation for the
dependent variable (variable of interest).
A linear equation is of the form
y = a + bx
The values of a and b are obtained from the data of the two variables.
y is the dependent variable and x the independent variable.
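To show how a and b can be obtained from the data, here is a minimal Python sketch that assumes the usual ordinary least squares method: b is the sum of products of deviations divided by the sum of squared deviations of x, and a is chosen so the line passes through the point of means. The function name `fit_line` and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least squares line y = a + b*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("cannot fit a line when x is constant")
    b = sxy / sxx            # gradient (slope)
    a = mean_y - b * mean_x  # intercept: the line passes through the means
    return a, b

# Made-up data lying exactly on the line y = 1 + 2x.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

a, b = fit_line(xs, ys)
print("y =", a, "+", b, "* x")
```

Because the invented points lie exactly on a straight line, the fit recovers an intercept of 1 and a gradient of 2.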
12.
Regression output
From the output above, we can write down the
regression equation as follows:
CO2 = 45.144 + 516.405 × GDP
The R Square value (0.521) is called the coefficient of
determination.
13.
INTERPRETING THE OUTPUT
The coefficient of determination (0.521) represents the proportion of the variance in
CO2 emissions that is explained by the variance in GDP. So, we can say that
52.1% of the variation in CO2 emissions is explained by variation in GDP. The rest
(47.9%) is explained by factors other than GDP.
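A minimal Python sketch of how a coefficient of determination can be computed is given here, assuming the standard definition R Square = 1 - (residual sum of squares / total sum of squares) for a least squares line. The function name and the data are made up for illustration and are not the lecture's CO2 and GDP data.

```python
def r_squared(xs, ys):
    """Coefficient of determination for the least squares line through the data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    # Variation of y around its mean (total) and around the fitted line (residual).
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    ss_resid = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return 1 - ss_resid / ss_total

# Made-up illustrative data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r2 = r_squared(xs, ys)
print(f"{r2:.1%} of the variation in y is explained by x")
```

For a regression with one independent variable, this value equals the square of the correlation coefficient.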
The regression equation
CO2 = 45.144 + 516.405 × GDP
is made up of two coefficients, namely 45.144 and 516.405. These are called the
coefficients of regression.
The coefficient 45.144 is the intercept: the predicted CO2 emission when GDP is zero.
The coefficient 516.405 is called the gradient and tells us that an increase in GDP of
$1bn is associated with a predicted increase of 516.405 tons of CO2.
14.
Predictions
From the output above, we can write down the
regression equation as follows:
CO2 = 45.144 + 516.405 × GDP
Example 1: Tonga
CO2 = 45.144 + 516.405*0.24
= 45.144 + 123.9372
= 169.0812
Example 2: Liberia
CO2 = 45.144 + 516.405*0.61
= 45.144 + 315.00705
= 360.151
The predictions based on the regression equation
are added in a new column.
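The same predictions can be reproduced with a short Python sketch. The coefficients and the GDP values for Tonga and Liberia are taken from the slides; the function name `predict_co2` is hypothetical, and GDP is assumed to be in $bn as in the interpretation of the gradient.

```python
INTERCEPT = 45.144   # coefficient a from the regression output
GRADIENT = 516.405   # coefficient b from the regression output

def predict_co2(gdp_bn):
    """Predicted CO2 emission for a given GDP (assumed to be in $bn)."""
    return INTERCEPT + GRADIENT * gdp_bn

# GDP values used in the two worked examples.
countries = [("Tonga", 0.24), ("Liberia", 0.61)]

for name, gdp in countries:
    print(name, round(predict_co2(gdp), 3))
```

Applying the function to every country's GDP gives the column of predicted values described above.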