Srinivasulu Rajendran, Centre for the Study of Regional Development (CSRD), Jawaharlal Nehru University (JNU), New Delhi, India. firstname.lastname@example.org
Objective of the session: To understand CORRELATION
1. What is the procedure to perform Correlation & Regression?
2. How do we interpret results?
Identify the relationship between the variables that we want to analyse. Produce a scatter plot to check for outliers and the type of relationship. Example: monthly HH food expenditure and HHSIZE.
Interpreting the Correlation Coefficient r
Strong correlation: r > .70 or r < -.70
Moderate correlation: r is between .30 and .70, or between -.30 and -.70
Weak correlation: r is between 0 and .30, or between 0 and -.30
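The cut-offs above can be sketched as a small Python helper. This is only an illustration: the function name `correlation_strength` is hypothetical, and because the slide does not say which label applies at exactly .30 or .70, the sketch assumes both boundary values count as moderate.

```python
def correlation_strength(r: float) -> str:
    """Label a correlation coefficient using the slide's rule of thumb."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    size = abs(r)  # strength depends on the absolute value only
    if size > 0.70:
        return "strong"
    if size >= 0.30:  # assumption: exactly .30 and .70 are treated as moderate
        return "moderate"
    return "weak"


# Example values; .608 is the coefficient reported later in this session.
for r in (0.608, -0.85, 0.12):
    print(r, correlation_strength(r))
```

The sign of r is ignored here because it describes the direction of the relationship, not its strength.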
GENERATE A SCATTERPLOT TO SEE THE RELATIONSHIPS
1. Go to Graphs → Legacy Dialogs → Scatter/Dot → Simple.
2. Click on the DEPENDENT variable “mfx” and move it to the Y-Axis.
3. Click on “hhsize” and move it to the X-Axis.
4. Click OK.
The scatterplot might not look promising at first. Double-click on the chart to open a Chart Editor window.
In the Chart Editor, use Options → Bin Element. Simply close the box that opens; bins are applied automatically.
BINS: Dot size now shows the number of cases with each pair of X, Y values. DO NOT CLOSE THE CHART EDITOR YET!
Add Fit Line (Regression): In the Chart Editor, choose Elements → Fit Line at Total. Close the dialog box that opens, then close the Chart Editor window.
Edited Scatterplot: the distribution of cases is shown by dots (bins); the trend is shown by the fit line.
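What the binned dots and the fit line represent can be sketched in Python. The example rests on two assumptions: the data are hypothetical (not the survey data), and the fit line is taken to be an ordinary least-squares straight line. The function name `fit_line` is also hypothetical.

```python
from collections import Counter

# Hypothetical (hhsize, mfx) cases for illustration only.
cases = [(2, 2100), (3, 3000), (3, 3000), (4, 4100), (5, 5000), (6, 5900)]

# "Bins": count how many cases share each exact (x, y) pair.
bin_counts = Counter(cases)
for (hhsize, mfx), count in sorted(bin_counts.items()):
    print(f"hhsize={hhsize}, mfx={mfx}: {count} case(s)")


def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long lists with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


xs = [x for x, _ in cases]
ys = [y for _, y in cases]
intercept, slope = fit_line(xs, ys)
print(f"fit line: mfx = {intercept:.1f} + {slope:.1f} * hhsize")
```

A positive slope corresponds to an upward-sloping fit line, which is what a positive correlation looks like on the chart.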
Types of Correlation: Bivariate Correlations, Partial Correlations, Distances
BIVARIATE CORRELATIONS In Bivariate Correlations, the relationship between two variables is measured. The relationship can be positive or negative, and its degree (how closely the variables are related) is summarised by the correlation coefficient, which ranges from -1 (perfect negative) to +1 (perfect positive). A zero correlation indicates no linear relationship. Remember to produce a scatter plot before running the correlation, to see whether the assumptions have been met.
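For readers who want to see how such a coefficient is obtained, here is a minimal Python sketch of the standard Pearson formula (the sum of cross-products divided by the square root of the product of the sums of squares). The function name `pearson_r` and the data are hypothetical, chosen only to show the two extremes.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long lists with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable has no variation")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: y rises in step with x, then falls in step with x.
print(pearson_r([1, 2, 3], [2, 4, 6]))  # perfect positive relationship
print(pearson_r([1, 2, 3], [6, 4, 2]))  # perfect negative relationship
```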
Objective: We are interested in whether monthly HH food expenditure is correlated with household size (hhsize).
Select the variables that you want to correlate by clicking on them in the left-hand pane of the Bivariate Correlations dialog box, i.e. mfx and hhsize. Check the type of correlation coefficient that you require (Pearson for parametric; Kendall’s tau-b and Spearman for non-parametric). Select the appropriate test: Pearson’s correlation coefficient assumes that each pair of variables is bivariate normal, and it is a measure of linear association. Two variables can be perfectly related, but if the relationship is not linear, Pearson’s correlation coefficient is not an appropriate statistic for measuring their association. Test of Significance: You can select two-tailed or one-tailed probabilities. If the direction of association is known in advance, select One-tailed. Otherwise, select Two-tailed.
Flag significant correlations: Correlation coefficients significant at the 0.05 level are identified with a single asterisk, and those significant at the 0.01 level are identified with two asterisks. Click on the Options… button to select statistics: select Means and standard deviations, and control the handling of missing values by selecting “Exclude cases pairwise”.
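The difference between the Pearson and the rank-based (Spearman) coefficient can be illustrated with a short Python sketch. It assumes the usual definition of Spearman's coefficient as the Pearson correlation of the ranks, with tied values given their average rank. The helper names and the data are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Rank values from 1 upward; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman coefficient: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# Hypothetical data: y always rises with x, but not along a straight line.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print("Pearson :", round(pearson_r(x, y), 3))
print("Spearman:", round(spearman_rho(x, y), 3))
```

Because y always increases with x, the rank-based coefficient is 1, while the Pearson coefficient falls below 1 because the points do not lie on a straight line.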
The Descriptive Statistics section gives the mean, standard deviation, and number of observations (N) for each of the variables that you specified.

Descriptive Statistics
                                      Mean      Std. Deviation   N
Household size                        4.34      1.919            1237
Monthly hh food expenditure (taka)    4411.25   2717.13          1237
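The three quantities in this table can be reproduced for any variable with Python's standard library. The sketch below uses hypothetical household sizes (not the survey data) and assumes the sample standard deviation with an n - 1 denominator, which is what `statistics.stdev` computes. The function name `describe` is hypothetical.

```python
from statistics import mean, stdev


def describe(values):
    """Return (mean, sample standard deviation, N) for a list of numbers."""
    return mean(values), stdev(values), len(values)


# Hypothetical household sizes for illustration only.
hhsize = [3, 4, 4, 5, 6]
m, sd, n = describe(hhsize)
print(f"Mean = {m:.2f}, Std. Deviation = {sd:.3f}, N = {n}")
```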
The correlations table displays Pearson correlation coefficients, significance values, and the number of cases with non-missing values (N). The values of the correlation coefficient range from -1 to 1. The sign of the correlation coefficient indicates the direction of the relationship (positive or negative). The absolute value of the correlation coefficient indicates the strength, with larger absolute values indicating stronger relationships. The correlation coefficients on the main diagonal are always 1, because each variable has a perfect positive linear relationship with itself.

Correlations
                                                           Household size   Monthly hh food expenditure (taka)
Household size                       Pearson Correlation   1                .608**
                                     Sig. (1-tailed)                        .000
                                     N                     1237             1237
Monthly hh food expenditure (taka)   Pearson Correlation   .608**           1
                                     Sig. (1-tailed)       .000
                                     N                     1237             1237
The significance of each correlation coefficient is also displayed in the correlation table. The significance level (or p-value) is the probability of obtaining a result at least as extreme as the one observed if the two variables were in fact uncorrelated. If the significance level is very small (less than 0.05), the correlation is significant and the two variables are linearly related. If the significance level is relatively large (for example, 0.50), the correlation is not significant and there is no evidence that the two variables are linearly related. In this output, Sig. (1-tailed) is .000 for the correlation of .608 between household size and monthly hh food expenditure (N = 1237), so the correlation is significant.
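One common way to test a Pearson correlation is to convert r into a t statistic with n - 2 degrees of freedom, t = r * sqrt((n - 2) / (1 - r^2)). The Python sketch below applies this to the reported r = .608 and N = 1237. It is an illustration under a stated assumption: because Python's standard library has no t distribution, the critical value is taken from the standard normal distribution, which is a close approximation only for large samples such as this one. The function name `correlation_t` is hypothetical, and the sketch does not claim to reproduce the exact computation behind the SPSS output.

```python
import math
from statistics import NormalDist


def correlation_t(r, n):
    """t statistic for testing whether a Pearson r differs from zero."""
    if n < 3 or not -1.0 < r < 1.0:
        raise ValueError("need n >= 3 and -1 < r < 1")
    return r * math.sqrt((n - 2) / (1 - r * r))


r, n = 0.608, 1237  # values reported in the Correlations table
t = correlation_t(r, n)

# Assumption: large-sample normal approximation to the t distribution.
critical_1pct_one_tailed = NormalDist().inv_cdf(0.99)

print(f"t = {t:.2f} with {n - 2} degrees of freedom")
print(f"approximate one-tailed 1% critical value = {critical_1pct_one_tailed:.3f}")
print("significant at the 1% level:", t > critical_1pct_one_tailed)
```

A t statistic far above the critical value is consistent with the very small significance value shown in the table.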
Partial Correlations The Partial Correlations procedure computes partial correlation coefficients that describe the linear relationship between two variables while controlling for the effects of one or more additional variables. Correlations are measures of linear association. Two variables can be perfectly related, but if the relationship is not linear, a correlation coefficient is not a proper statistic to measure their association.
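With a single control variable, the partial correlation can be written in terms of the three pairwise correlations: r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2)). The Python sketch below implements this first-order formula. The function name `partial_r` and the input correlations are hypothetical and are not taken from the survey data.

```python
import math


def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    denom = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    if denom == 0:
        raise ValueError("undefined when z is perfectly correlated with x or y")
    return (r_xy - r_xz * r_yz) / denom


# Hypothetical pairwise correlations for illustration only.
print(round(partial_r(r_xy=0.5, r_xz=0.3, r_yz=0.4), 3))

# If the control variable is unrelated to both x and y, nothing changes.
print(partial_r(r_xy=0.5, r_xz=0.0, r_yz=0.0))
```

When the control variable is only weakly related to both variables, the partial correlation stays close to the ordinary (zero-order) correlation.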
Select the variables that you want to correlate by clicking on them in the left-hand pane of the Partial Correlations dialog box, i.e. mfx and hhsize, and move the control variable into the Controlling for box. In this case, we can see the correlation between monthly HH food expenditure and household size when the education of the household head is held constant. Test of Significance: You can select two-tailed or one-tailed probabilities. If the direction of association is known in advance, select One-tailed. Otherwise, select Two-tailed.
Flag significant correlations. Correlation coefficients significant at the 0.05 level are identified with a single asterisk, and those significant at the 0.01 level are identified with two asterisks. Click OK to get results
As we can see, the positive correlation between mfx and hhsize when hh_edu is held constant is significant at the 1% level (reported significance .000, i.e. p < 0.01).

Correlations
Control variable: (sum) head_edu
                                                               Household size   Monthly hh food expenditure (taka)
Household size                       Correlation               1.000            .606
                                     Significance (1-tailed)   .                .000
                                     df                        0                1232
Monthly hh food expenditure (taka)   Correlation               .606             1.000
                                     Significance (1-tailed)   .000             .
                                     df                        1232             0
Hands-on Exercises
1. Find the correlation between per capita total monthly expenditure and household size. Identify the nature of the relationship and explain the reasons for it.
2. Find the correlation between per capita total monthly expenditure and household size, controlling for whether the village has adopted the technology or not.
3. Find the correlation between per capita food expenditure and non-food expenditure, controlling for the district effect. [Hint: it is a two-tailed test. Why?]
Distances This procedure calculates any of a wide variety of statistics measuring either similarities or dissimilarities (distances), either between pairs of variables or between pairs of cases. These similarity or distance measures can then be used with other procedures, such as factor analysis, cluster analysis, or multidimensional scaling, to help analyze complex data sets.
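As one concrete instance of a dissimilarity measure between pairs of cases, the Python sketch below computes the Euclidean distance. It is only an illustration of the idea: the function name `euclidean_distance` and the household values are hypothetical, and no claim is made here about which measures the procedure uses by default.

```python
import math


def euclidean_distance(case_a, case_b):
    """Euclidean distance between two cases measured on the same variables."""
    if len(case_a) != len(case_b):
        raise ValueError("both cases must have the same number of variables")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(case_a, case_b)))


# Hypothetical households described by (hhsize, mfx); illustration only.
households = {"A": (3, 2500), "B": (4, 4000), "C": (6, 4100)}
names = sorted(households)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        d = euclidean_distance(households[first], households[second])
        print(f"distance {first}-{second}: {d:.1f}")
```

Because expenditure is measured in much larger units than household size, it dominates these raw distances; variables are often standardised first when they are on very different scales.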