# LT4011 week 21 slides

Published in: Education, Technology

### LT4011 week 21 slides

1. CORRELATION & REGRESSION Week 11
2. IN THIS LECTURE
   We learn how to:
   - Draw a scatter diagram and label it using SPSS.
   - Distinguish between positive, negative and no correlation.
   - Interpret the findings.

   We learn how to:
   - Use linear regression in SPSS.
   - Generate a linear regression equation.
   - Interpret the regression coefficients.
   - Use the linear regression to make predictions.
3. What is correlation analysis?
   - This is an analysis of the level of association between two or more variables. In this module, we only consider the level of association between two variables.
   - If two variables are related in the sense that low (and high) values of one variable are associated with low (and high) values of the other variable, then we say that the two variables are correlated.
   - Sometimes the association between variables is causal and sometimes it is not. NOTE THAT STATISTICAL TESTS DO NOT PROVE CAUSALITY; they only show that there is a relationship.
   - A causal association is when the levels of one variable cause (or force) the levels of the other variable. Example: weight and the body mass index among adults.
   - A non-causal association is when two variables are related with no obvious one-to-one causation. Example: sales of ice cream and atmospheric temperatures.
   - A spurious correlation between two variables is an association that clearly occurs by accident. Example: height and earnings among workers in London.
4. THE FIRST OBSERVATION
   The first descriptive observation of an association between two variables is made by drawing a scatter diagram. Here, we need to distinguish between independent and dependent variables. The dependent variable is the variable of interest to us. It is the variable that is seemingly affected by other factors.
   Example: Teachers often want to understand the attendance patterns of students. Is attendance in class affected by the distance travelled to university? If so, attendance is the dependent variable and distance travelled is the independent variable. The independent variable should be on the horizontal axis (x).
5. SCATTER DIAGRAM
   [Figure: three scatter diagrams showing positive correlation (upward trend), negative correlation (downward trend) and no correlation (no trend), with a scale running from stronger to weaker.]
6. An example
   We are interested to know whether the amount of CO2 emissions is associated with the gross domestic product (or growth in the economy) among some countries. In particular, we want to know if an increase in CO2 emissions is explained by growth in the economy (GDP) and vice versa.
7. Using SPSS to draw a scatter diagram
8. Output
   [Figure: SPSS scatter diagram with outliers labelled.]
9. Strength of the correlation: correlation coefficient
   The correlation coefficient is a statistic that indicates the strength of the correlation. It is a value between -1 and +1. A value close to -1 suggests a strong negative correlation, whereas a value close to +1 suggests a strong positive correlation; a value close to 0 suggests no correlation. There is no universal cut-off point for a strong correlation. In this module, we accept a cut-off point of 0.7 for strong correlation (that is, a coefficient of +0.7 or above, or -0.7 or below).
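
The slides obtain the coefficient from SPSS and do not say which coefficient is meant. As a rough sketch, assuming it is the Pearson product-moment coefficient, the calculation and the module's 0.7 cut-off could look like this in Python. The function names `pearson_r` and `describe` and the small data set are made up for illustration and are not taken from the lecture.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2 points)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)


def describe(r, cutoff=0.7):
    """Label a coefficient using the module's cut-off of 0.7."""
    if r >= cutoff:
        return "strong positive"
    if r <= -cutoff:
        return "strong negative"
    return "not strong"


# Hypothetical data, invented purely for this example.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
print(f"r = {r:.3f} ({describe(r)})")
```
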
10. Linear regression
    Regression analysis is concerned with finding a mathematical model for the relationship between the variables. Linear regression finds a model in the form of a straight line. In the process of regression, we find the equation for the dependent variable (the variable of interest). A linear equation is of the form $y = a + bx$, where y is the dependent variable and x the independent variable. The values of a and b are obtained from the data of the two variables.
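
The slides leave the calculation of a and b to SPSS. As an illustration only, assuming the usual ordinary least squares method, the two values can be obtained from the data as sketched below. The name `fit_line` and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Return (a, b) of the least-squares line y = a + b*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2 points)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x must take at least two different values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx            # gradient
    a = mean_y - b * mean_x  # intercept: the line passes through the means
    return a, b


# Hypothetical data, invented purely for this example.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b = fit_line(x, y)
print(f"y = {a:.3f} + {b:.3f}x")
```
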
11. USING SPSS
12. Regression output
    From the SPSS output, we can write down the regression equation as follows: $\text{CO}_2 = 45.144 + 516.405 \times \text{GDP}$. The R Square value (0.521) is called the coefficient of determination.
13. INTERPRETING THE OUTPUT
    The coefficient of determination (0.521) represents the proportion of the variance in CO2 emissions that is explained by the variance in GDP. So, we can say that 52.1% of the variation in CO2 emissions is explained by changes in GDP. The rest (47.9%) is explained by factors other than GDP.
    The regression equation $\text{CO}_2 = 45.144 + 516.405 \times \text{GDP}$ is made up of two coefficients, namely 45.144 and 516.405. These are called the coefficients of regression. The coefficient 45.144 is the intercept: the predicted CO2 emission when GDP is zero. The coefficient 516.405 is called the gradient and tells us that an increase in GDP of \$1bn is associated with a predicted increase of 516.405 tons of CO2.
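
To make the "proportion of variance explained" reading concrete, here is a minimal sketch that computes R Square as 1 minus the ratio of the residual sum of squares to the total sum of squares. It assumes an ordinary least squares fit and uses a small invented data set, not the lecture's CO2 and GDP data. The names `fit_line` and `r_squared` are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and gradient b of y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def r_squared(xs, ys, a, b):
    """Coefficient of determination of the line y = a + b*x."""
    mean_y = sum(ys) / len(ys)
    # Total variation of y around its mean.
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    # Variation left unexplained by the fitted line.
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return 1 - ss_res / ss_tot


# Hypothetical data, invented purely for this example.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b = fit_line(x, y)
r2 = r_squared(x, y, a, b)
print(f"explained: {r2:.1%}, unexplained: {1 - r2:.1%}")
```

For a regression with a single independent variable, this value equals the square of the Pearson correlation coefficient between the two variables.
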
14. Predictions
    From the SPSS output, we can write down the regression equation as follows: $\text{CO}_2 = 45.144 + 516.405 \times \text{GDP}$.
    - Example 1: Tonga. CO2 = 45.144 + 516.405 × 0.24 = 45.144 + 123.9372 = 169.0812
    - Example 2: Liberia. CO2 = 45.144 + 516.405 × 0.61 = 45.144 + 315.00705 = 360.151

    The predictions based on the regression equation are added in a new column.
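
The two worked examples can be reproduced with a few lines of Python. The coefficients and the GDP values for Tonga and Liberia are the ones given in the slides; the function name `predict_co2` is made up for illustration, and the sketch assumes GDP is measured in \$bn and CO2 in tons, as in the interpretation of the gradient.

```python
INTERCEPT = 45.144   # coefficient a from the regression output
GRADIENT = 516.405   # coefficient b from the regression output


def predict_co2(gdp):
    """Predicted CO2 emission for a given GDP, using CO2 = a + b * GDP."""
    return INTERCEPT + GRADIENT * gdp


# GDP values taken from the two examples in the slides.
examples = [("Tonga", 0.24), ("Liberia", 0.61)]

for country, gdp in examples:
    print(f"{country}: predicted CO2 = {predict_co2(gdp):.4f}")
```
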