Correlation & Regression.pptx

Presenter: Faizan Fazal (abdalian459@gmail.com)
Final-year MBBS, Rawalpindi Medical University
Journal Club in-charge, RSRS
Author of 16 publications
Presenter at various national and international conferences
Fellow of the World Psychiatric Association

Today’s topics to be covered:
• Correlation
• Regression

Why are we even talking about it?
• Correlation and regression strengthen the level of analysis in your research.
• They can increase the chances of your article being accepted and published.
• They show that the authors know more than basic descriptive analytics.

What is correlation?
It is the relationship between 2 variables.
Correlation shows the strength and direction of the association between 2 studied variables.
Example: Is there any link between job satisfaction and burnout at the workplace?

2 possibilities in a correlation
• The correlation between two variables can be positive (i.e., higher levels of one variable are associated with higher levels of the other).
• The correlation between two variables can be negative (i.e., higher levels of one variable are associated with lower levels of the other).

What is a correlation coefficient?
• In correlation analysis, we estimate a sample correlation coefficient, denoted by r.
• The correlation coefficient (r) ranges between -1 and +1 and quantifies the direction and strength of the linear association between the two variables.
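To make the definition concrete, here is a minimal Python sketch that computes the sample (Pearson) coefficient straight from its formula, r = Σ(x - x̄)(y - ȳ) / √[Σ(x - x̄)² · Σ(y - ȳ)²]. The function name `pearson_r` and the data are hypothetical; in practice SPSS, or a library routine such as `scipy.stats.pearsonr`, gives the same number.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient r for two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2 values)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [xi - mean_x for xi in x]  # deviations from the mean of x
    dy = [yi - mean_y for yi in y]  # deviations from the mean of y
    sxy = sum(a * b for a, b in zip(dx, dy))  # how x and y vary together
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))   # 1.0  (perfect positive)
print(pearson_r(x, [10, 8, 6, 4, 2]))   # -1.0 (perfect negative)
```

Because the numerator can never exceed the denominator in absolute value, the result always lies between -1 and +1.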

How to interpret correlation coefficient
• The sign of the correlation coefficient indicates the direction of
the association.
• The magnitude of the correlation coefficient indicates the
strength of the association.
• For example, a correlation of r = 0.9 suggests a strong, positive association between two variables.
• A correlation of r = -0.2 suggests a weak, negative association.
• A correlation close to zero suggests no linear association between two continuous variables.
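The interpretation rules above can be written as a small helper. This is a sketch only: the cut-offs (below 0.1 treated as no linear association, below 0.3 weak, below 0.7 moderate, 0.7 and above strong) are an assumed rule of thumb that happens to match the labels used in this presentation. Other fields use different cut-offs, so state the ones you use in your paper.

```python
def describe_r(r, none_cutoff=0.1, weak_cutoff=0.3, strong_cutoff=0.7):
    """Describe the direction and strength of a correlation coefficient.

    The default cut-offs are an ASSUMED rule of thumb, not a fixed standard.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)  # magnitude gives the strength
    if size < none_cutoff:
        return "no linear association"
    direction = "positive" if r > 0 else "negative"  # sign gives the direction
    if size < weak_cutoff:
        strength = "weak"
    elif size < strong_cutoff:
        strength = "moderate"
    else:
        strength = "strong"
    return f"{strength}, {direction}"

for r in (0.9, -0.2, 0.02, -0.65):
    print(r, "->", describe_r(r))
```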

Graphical representation of correlation:
• One continuous variable is plotted along the X-axis and the other along the Y-axis, giving a scatter plot.

What are the types of correlation coefficient?
• Pearson’s correlation
• Spearman’s correlation
• Kendall’s tau

Why & when to use Pearson’s correlation in your research:
• It is the most common method for numerical variables.
• It assigns a value between -1 and +1.
• It measures the strength of the linear relationship between two variables.
• Both variables should be quantitative.
• Both variables should be (approximately) normally distributed.

PEARSON CORRELATION:
• Interpretation and reporting:
Pearson correlation between job satisfaction (JS) and burnout (BO) was found to be moderately negative and statistically significant (r = -0.65, p < 0.001). Thus, the null hypothesis of no correlation is rejected. This shows that higher BO is associated with lower JS at workplaces; correlation alone does not show that burnout causes the drop in satisfaction.
Correlations
                                         Job Satisfaction   Burnout
Job Satisfaction  Pearson Correlation          1            -.650**
                  Sig. (2-tailed)                             .000
                  N                           200             200
Burnout           Pearson Correlation       -.650**            1
                  Sig. (2-tailed)             .000
                  N                           200             200
**. Correlation is significant at the 0.01 level (2-tailed).
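The "Sig." value in this table comes from a t-test of the null hypothesis that the population correlation is zero: t = r · √((n - 2) / (1 - r²)) with n - 2 degrees of freedom. The sketch below plugs in r = -0.650 and N = 200 from the table. It approximates the p-value with the normal distribution (via `math.erfc`), an assumption that is reasonable only for large samples; SPSS uses the exact t distribution. The function name is hypothetical.

```python
import math

def r_significance(r, n):
    """t statistic (df = n - 2) for H0: population correlation = 0.

    The two-sided p-value is a NORMAL APPROXIMATION to the t distribution,
    acceptable only for large n.
    """
    if n < 3 or not -1 < r < 1:
        raise ValueError("need n >= 3 and -1 < r < 1")
    df = n - 2
    t = r * math.sqrt(df / (1 - r * r))
    p_approx = math.erfc(abs(t) / math.sqrt(2))  # two-sided tail area
    return t, df, p_approx

t, df, p = r_significance(-0.65, 200)
print(f"t({df}) = {t:.2f}, approximate two-sided p = {p:.1e}")
```

With these inputs it prints t(198) = -12.04, and the approximate p-value is far below 0.001, which is why SPSS displays .000.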

PEARSON CORRELATION:
Scatter plot

Why & when to use Spearman correlation in your research:
• It measures the strength and direction of a monotonic relationship between two variables, i.e. one that consistently increases or decreases, whether or not it is linear.
• Used when:
• There are ordinal variables present.
• The data are not normally distributed.
• Of the 2 variables, one is ordinal (qualitative but ranked) and the other is quantitative.
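Spearman's rho is simply Pearson's r applied to the ranks of the data, with tied values given the average of their ranks. The sketch below shows this with the hypothetical pair x and x², a relationship that is monotonic but not linear: rho is exactly 1 while Pearson's r is only about 0.981. The function names are illustrative; `scipy.stats.spearmanr` is a library alternative.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Rank values 1..n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho = Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5]
y = [v ** 2 for v in x]          # monotonic but not a straight line
print(round(pearson_r(x, y), 3))  # 0.981
print(spearman_rho(x, y))         # 1.0
```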

Correlations (Spearman's rho)
                                              Educational Level   Income Level
Educational Level  Correlation Coefficient          1.000             .716**
                   Sig. (2-tailed)                    .               .000
                   N                                  30               30
Income Level       Correlation Coefficient           .716**           1.000
                   Sig. (2-tailed)                   .000               .
                   N                                  30               30
**. Correlation is significant at the 0.01 level (2-tailed).
• Interpretation and reporting:
Spearman correlation between educational level (EL) and income level (IL) was found to be strongly positive and statistically significant (rho = 0.72, p < 0.001). Thus, the null hypothesis is rejected. This shows that a higher EL is associated with a higher IL.

Kendall’s tau:
• Used when there are 2 variables that are quantitative but not normally distributed,
• or when both variables are ordinal (qualitative but ranked).
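Kendall's tau compares every pair of observations: a pair is concordant if both variables move in the same direction and discordant if they move in opposite directions. The sketch below computes the tau-b variant, which adjusts for ties, by brute force over all pairs (fine for small samples). The function name and example ranks are hypothetical; `scipy.stats.kendalltau` computes tau-b by default.

```python
import math

def kendall_tau_b(x, y):
    """Kendall's tau-b: (concordant - discordant) pairs, adjusted for ties."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = discordant = ties_x = ties_y = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx = (x[i] > x[j]) - (x[i] < x[j])  # sign of the x difference
            dy = (y[i] > y[j]) - (y[i] < y[j])  # sign of the y difference
            if dx == 0:
                ties_x += 1
            if dy == 0:
                ties_y += 1
            if dx * dy > 0:
                concordant += 1
            elif dx * dy < 0:
                discordant += 1
    n0 = n * (n - 1) // 2  # total number of pairs
    # raises ZeroDivisionError if every value of one variable is tied
    return (concordant - discordant) / math.sqrt((n0 - ties_x) * (n0 - ties_y))

# Hypothetical ranks given to 4 candidates by two examiners
print(round(kendall_tau_b([1, 2, 3, 4], [1, 3, 2, 4]), 3))  # 0.667
```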

REGRESSION:
• Linear regression
• Logistic regression

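REGRESSION:
• We have to define an OUTCOME variable and a PREDICTOR variable.
• Proper reasoning comes into play: which variable is the outcome and which is the predictor must be justified by theory or study design.
• Regression has many types.
• The 3 most common are:
1. Linear regression
2. Logistic regression
3. Cox (proportional hazards) regression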

REGRESSION:
• The major uses of regression analysis are (1) determining the strength of predictors and (2) forecasting an effect.

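Diff b/w Regression & Correlation
• Correlation measures the degree of relationship between two variables (x and y); neither is treated as independent or dependent.
• In contrast, regression describes how one variable (y) changes with another (x), so that y can be predicted from x. A regression on its own does not prove that x causes y to happen.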

Diff b/w Regression & Correlation
• Regression has:
i. Independent variable (IV)
ii. Dependent variable (DV)
• Example: smoking (IV) as a predictor of cancer (DV)


Let's go into the types of regression
• Linear regression (simple regression):
It assumes a linear relationship between the independent variable and the dependent variable.
A change in one variable (independent variable) is associated with a predictable change in the other (dependent variable).
Example: Marks obtained in an exam depend on the number of hours studied.
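Simple linear regression fits the straight line y = a + b·x that minimises the sum of squared differences between observed and predicted y (ordinary least squares). A minimal sketch with made-up hours-studied and marks data follows; the numbers and the function name are hypothetical and only illustrate the calculation that SPSS performs.

```python
def fit_simple_linear(x, y):
    """Least-squares fit of y = a + b*x; returns (intercept a, slope b)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx            # slope: change in y per 1-unit change in x
    a = mean_y - b * mean_x  # intercept: predicted y when x = 0
    return a, b

# Hypothetical data: hours studied and marks obtained (invented for illustration)
hours = [2, 4, 6, 8, 10]
marks = [52, 60, 65, 74, 79]
a, b = fit_simple_linear(hours, marks)
print(f"marks = {a:.2f} + {b:.2f} * hours")
```

It prints `marks = 45.60 + 3.40 * hours`: in this invented data, each extra hour of study is associated with 3.4 more marks.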

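Linear Regression:
• The regression coefficient (slope) is denoted by b, shown as "B" in SPSS. The "R" in the SPSS Model Summary is the multiple correlation coefficient, not the regression coefficient.
• If you have applied Pearson correlation between two variables in your data, you can apply linear regression to the same pair.
• There should be a significant correlation between the two variables studied; in simple linear regression, the t-test of the slope gives the same p-value as the test of Pearson's r.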

Linear Regression:

There are two kinds of linear regression model:
• Simple linear regression: a linear regression model with one independent and one dependent variable.
• Multiple linear regression: a linear regression model with more than one independent variable and one dependent variable.

Linear regression on SPSS
• Ensure that there is a linear relationship between the 2 variables.
• You can check this by making a scatter plot of those variables in SPSS.
• R: the multiple correlation coefficient, i.e. the correlation between the observed and predicted values of the DV. With one predictor it equals the absolute value of Pearson's r.
• R Square: the proportion of variance in marks that can be predicted by hours studied (multiply by 100 for a percentage).
• Adjusted R Square: R Square corrected for the number of predictors, useful when a model contains several predictors.
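R Square compares the unexplained (residual) variation with the total variation in the DV: R² = 1 - SS_residual / SS_total. The sketch below reuses the hypothetical hours and marks data; with a single predictor, R equals the absolute value of Pearson's r, so R Square equals r². The function name and data are illustrative assumptions.

```python
import math

def r_squared(x, y):
    """R^2 = 1 - SS_residual / SS_total for a simple least-squares line."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    a = mean_y - b * mean_x
    predicted = [a + b * xi for xi in x]
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))  # unexplained
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)                  # total
    return 1 - ss_res / ss_tot

hours = [2, 4, 6, 8, 10]   # hypothetical
marks = [52, 60, 65, 74, 79]
r2 = r_squared(hours, marks)
print(f"R = {math.sqrt(r2):.3f}, R Square = {r2:.3f}")
```

It prints `R = 0.996, R Square = 0.992`: about 99% of the variance in these made-up marks is predicted by hours studied.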

Linear regression on SPSS
• Check the Sig. value in the ANOVA table to assess the level of significance. If it is < 0.05, the regression model predicts the DV significantly better than the mean alone.
• The unstandardized coefficient B shows how much the DV changes for a 1-unit change in the IV. (The standardized coefficient, Beta, expresses the same effect in standard-deviation units.)
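The ANOVA table splits the total sum of squares into a regression part and a residual part and tests their ratio with an F statistic; for one predictor, F = (SS_regression / 1) / (SS_residual / (n - 2)). The sketch below computes B and F for the same hypothetical hours and marks data. It stops short of the p-value, which needs the F distribution (SPSS reports it as Sig.); the function name is illustrative.

```python
def regression_anova(x, y):
    """Unstandardized slope B and the ANOVA F test for simple linear regression."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_reg = ss_tot - ss_res          # variation explained by the predictor
    df_reg, df_res = 1, n - 2         # one predictor
    f = (ss_reg / df_reg) / (ss_res / df_res)
    return {"B": b, "constant": a, "F": f, "df": (df_reg, df_res)}

res = regression_anova([2, 4, 6, 8, 10], [52, 60, 65, 74, 79])  # hypothetical
print(f"B = {res['B']:.2f}, F{res['df']} = {res['F']:.2f}")
```

It prints `B = 3.40, F(1, 3) = 385.33`. With a single predictor, F equals the square of the t statistic for B.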

Let's run linear regression on SPSS
• Here we will study two variables:
Independent variable (hours spent studying)
Dependent variable (marks obtained in exams)

Let's dive into logistic regression

Logistic Regression:
• Here we also have a dependent and an independent variable.
• BUT the dependent variable must be dichotomous (binary).
• Both categories should be mutually exclusive. (Example: you may code 1 for men and 2 for women; SPSS recodes them internally to 0 and 1.)
• The independent variable can be continuous or categorical.
• The relationship between the independent variable and the probability of the outcome is non-linear (S-shaped); it is linear only on the log-odds scale.
• The independent variable does not need to be normally distributed.
• As a rule of thumb, at least 50 observations should be present to run logistic regression (LR).
• LR calculates the probability of falling into one of the 2 categories, so the results of the analysis are reported as odds ratios.
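Logistic regression models the log-odds of the outcome as a straight line, log(p / (1 - p)) = B0 + B1·x, and converts it back to a probability with the S-shaped logistic function. The coefficients below are hypothetical; the sketch shows that a 1-unit increase in x multiplies the odds by exp(B1), which is why the results are reported as odds ratios (Exp(B) in SPSS).

```python
import math

def logistic(log_odds):
    """Convert log-odds (B0 + B1*x) into a probability between 0 and 1."""
    return 1 / (1 + math.exp(-log_odds))

def odds(p):
    """Odds = p / (1 - p)."""
    return p / (1 - p)

# Hypothetical coefficients: constant B0 = -1.0, slope B1 = 0.5 per unit of x
b0, b1 = -1.0, 0.5
for x in (0, 1, 2, 4):
    p = logistic(b0 + b1 * x)
    print(f"x = {x}: probability = {p:.3f}, odds = {odds(p):.3f}")

# Each 1-unit increase in x multiplies the odds by exp(B1): the odds ratio
print("odds ratio per unit of x:", round(math.exp(b1), 3))
```

Equal steps in x give equal multiplications of the odds but unequal changes in probability, which is the non-linearity described above.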

Logistic Regression:
Independent variable(s): continuous or categorical
Dependent variable: dichotomous (binary)

Logistic Regression Examples:
• You are the owner of a business and you want to assess which factors predict whether a customer returns or does not return.
• Whether a specific politician wins or loses an election, based on hours spent campaigning.
• Guess the dichotomous variables in the 2 examples above.

Example Data set on SPSS
• A sales director for a chain of appliance stores wants to find out what circumstances encourage customers to purchase extended warranties after a major appliance purchase. The dependent variable is whether or not a warranty is purchased. The predictor (independent) variables are:
i. Age of the customer
ii. Whether a gift is offered with the warranty
iii. Price of the appliance

Example Data set on SPSS
• In SPSS, open Analyze > Regression > Binary Logistic, put the dependent variable in the Dependent box, and add one or more independent variables to the Covariates box.
• Go to Options.
• Tick Hosmer-Lemeshow goodness-of-fit.
• Tick CI for exp(B).
• Exp(B) is the odds ratio.
• Click Continue, then OK, and look at the output.
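Behind the OK button, SPSS estimates B by maximum likelihood, iterating until the estimates stop changing (see the footnote on iterations in the Model Summary). A rough sketch of that idea for a single predictor, using Newton-Raphson iterations, is below. It is not SPSS's actual code, the data are hypothetical, and with several covariates you would need matrix algebra, or a library such as statsmodels, instead of the hand-written 2 × 2 solve.

```python
import math

def fit_logistic(x, y, max_iter=25, tol=1e-8):
    """Fit P(y=1) = 1 / (1 + exp(-(b0 + b1*x))) by maximum likelihood using
    Newton-Raphson. y must be coded 0/1 and the two groups must overlap in x."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1 / (1 + math.exp(-(b0 + b1 * xi)))
            w = p * (1 - p)
            g0 += yi - p           # score (gradient) for the constant
            g1 += (yi - p) * xi    # score for the slope
            h00 += w               # information-matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step = (information matrix)^-1 * score, for a 2x2 matrix
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        if max(abs(step0), abs(step1)) < tol:
            break
    return b0, b1

def minus_2_log_likelihood(x, y, b0, b1):
    """-2 Log likelihood, the quantity SPSS reports in the Model Summary."""
    total = 0.0
    for xi, yi in zip(x, y):
        p = 1 / (1 + math.exp(-(b0 + b1 * xi)))
        total += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return -2 * total

# Hypothetical data: one predictor and a 0/1 outcome (1 = bought)
x_demo = [1, 2, 3, 4, 5, 6, 7, 8]
y_demo = [0, 0, 1, 0, 1, 0, 1, 1]
b0, b1 = fit_logistic(x_demo, y_demo)
p_bar = sum(y_demo) / len(y_demo)
b0_null = math.log(p_bar / (1 - p_bar))  # constant-only model (Block 0)
chi_square = (minus_2_log_likelihood(x_demo, y_demo, b0_null, 0.0)
              - minus_2_log_likelihood(x_demo, y_demo, b0, b1))
print(f"B0 = {b0:.3f}, B1 = {b1:.3f}, Exp(B1) = {math.exp(b1):.3f}, "
      f"model chi-square = {chi_square:.3f}")
```

The difference in -2LL between the constant-only model and the fitted model is the model chi-square that SPSS reports in the omnibus test.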

Interpreting Output:
• Case Processing Summary: shows the number of respondents (cases) included.
• Dependent Variable Encoding: shows how SPSS has coded the dichotomous dependent variable.
• Block 0 contains no predictors (constant only), so it is not interpreted on its own; it is the baseline against which Block 1, where the predictor variables are added, is compared.
• Go to Block 1.

Dependent Variable Encoding
Original Value   Internal Value
No               0
Yes              1

• Goodness of fit: tells whether the model we have used describes the dichotomous variable adequately.
• Omnibus Tests of Model Coefficients: if significant (p < 0.05), the model with the predictors fits significantly better than the constant-only model in Block 0, i.e. the independent variables explain some of the variability seen in the dependent variable.

Omnibus Tests of Model Coefficients
                  Chi-square   df   Sig.
Step 1   Step       37.018      3   .000
         Block      37.018      3   .000
         Model      37.018      3   .000

• Hosmer and Lemeshow test: another check of whether the model fits. If Sig. > 0.05, there is no evidence of poor fit, so the model is considered a good fit.
• Contingency table for the Hosmer and Lemeshow test: no major difference between observed and expected counts means that the model fits well.

Hosmer and Lemeshow Test
Step   Chi-square   df   Sig.
1         1.792      8   .987

• Model Summary: Nagelkerke R Square (a pseudo R²) approximates the proportion of variation in the dependent variable explained by the independent variables in the model; here .753, roughly 75%.

Model Summary
Step   -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
1        22.278 (a)               .523                   .753
a. Estimation terminated at iteration number 8 because parameter estimates changed by less than .001.
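Both pseudo R² values can be recovered from other numbers in this output, which is a useful check when reporting. Assuming the standard definitions, Cox & Snell R² = 1 - exp(-χ²/n), where χ² is the omnibus model chi-square, and Nagelkerke R² divides this by its maximum possible value, 1 - exp(-(-2LL of the constant-only model)/n). The constant-only -2LL is the model's -2LL plus the chi-square. Taking n = 50, the total of the classification table in this output (12 + 2 + 1 + 35), the sketch reproduces SPSS's .523 and .753. The function name is hypothetical.

```python
import math

def pseudo_r2(minus2ll_model, chi_square, n):
    """Cox & Snell and Nagelkerke R^2 from -2LL, the model chi-square and n."""
    minus2ll_null = minus2ll_model + chi_square       # constant-only model (Block 0)
    cox_snell = 1 - math.exp(-chi_square / n)
    max_cox_snell = 1 - math.exp(-minus2ll_null / n)  # upper limit of Cox & Snell
    nagelkerke = cox_snell / max_cox_snell            # rescaled to reach 1
    return cox_snell, nagelkerke

cs, nk = pseudo_r2(minus2ll_model=22.278, chi_square=37.018, n=50)
print(round(cs, 3), round(nk, 3))   # 0.523 0.753
```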

• Classification table: the percentage at the end of the 1st row (observed No) is the specificity, the percentage at the end of the 2nd row (observed Yes) is the sensitivity, and the overall percentage shows the overall accuracy of the model.

Classification Table (a)
                                Predicted Bought    Percentage
Observed                          No      Yes        Correct
Step 1   Bought   No              12        2         85.7
                  Yes              1       35         97.2
         Overall Percentage                           94.0
a. The cut value is .500
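The percentages in the classification table are simple ratios of the counts. The sketch below recomputes them from the table, treating "No" as the negative class and "Yes" as the positive class; the function name is hypothetical.

```python
def classification_summary(tn, fp, fn, tp):
    """Specificity, sensitivity and overall accuracy (%) from a 2x2 table."""
    specificity = 100 * tn / (tn + fp)   # observed "No" correctly predicted
    sensitivity = 100 * tp / (tp + fn)   # observed "Yes" correctly predicted
    overall = 100 * (tn + tp) / (tn + fp + fn + tp)
    return round(specificity, 1), round(sensitivity, 1), round(overall, 1)

# Counts from the table: No row = 12 correct, 2 wrong; Yes row = 1 wrong, 35 correct
print(classification_summary(tn=12, fp=2, fn=1, tp=35))   # (85.7, 97.2, 94.0)
```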

• Variables in the Equation: shows the relationship between each predictor and the dependent variable.

Variables in the Equation
                                                                  95% C.I. for EXP(B)
                     B       S.E.    Wald    df   Sig.   Exp(B)     Lower     Upper
Step 1 (a) gift      2.339   1.131   4.273   1    .039   10.368     1.129    95.232
           Age        .064    .032   4.132   1    .042    1.066     1.002     1.134
           Price      .000    .000   6.165   1    .013    1.000     1.000     1.001
           Constant -6.096   2.142   8.096   1    .004     .002
a. Variable(s) entered on step 1: Gift, Age, Price.

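• Result: the odds that a customer who is offered a gift will purchase a warranty are about 10 times (Exp(B) for gift = 10.368) the corresponding odds for a customer not offered a gift, holding age and price constant.
• The OR for age, 1.066, means that each one-year increase in age (assuming age is recorded in years) multiplies the odds of purchasing a warranty by 1.066, about 6.6% higher odds per year, so older buyers are more likely to purchase a warranty.
• Check the 95% C.I.: for gift and age the interval lies entirely above 1, consistent with p < 0.05. The very wide interval for gift (1.129 to 95.232) shows that this estimate is imprecise.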
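Exp(B) and its confidence interval follow from B and its standard error: OR = exp(B), 95% CI = exp(B ± 1.96 × S.E.), and the Wald statistic is (B / S.E.)². The sketch below recomputes them for gift and age from the rounded values in the table, so the results differ slightly from SPSS's (for example 10.371 versus 10.368 for gift, and a Wald of 4.00 versus 4.132 for age). The function name is hypothetical.

```python
import math

def odds_ratio_ci(b, se, z=1.96):
    """Exp(B) and its 95% Wald confidence interval from B and its standard error."""
    return math.exp(b), math.exp(b - z * se), math.exp(b + z * se)

# B and S.E. copied (rounded) from the "Variables in the Equation" table
for name, b, se in [("gift", 2.339, 1.131), ("age", 0.064, 0.032)]:
    or_, lower, upper = odds_ratio_ci(b, se)
    wald = (b / se) ** 2
    print(f"{name}: Exp(B) = {or_:.3f}, 95% CI {lower:.3f} to {upper:.3f}, Wald = {wald:.2f}")
```

An interval that does not include 1 corresponds to a Wald test with p < 0.05.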

That’s all for Correlation and regression
