KNOWLEDGE FOR THE BENEFIT OF HUMANITY
BIOSTATISTICS (HFS3283)
REGRESSION
Dr. Mohd Razif Shahril
School of Nutrition & Dietetics
Faculty of Health Sciences
Universiti Sultan Zainal Abidin
SCHOOL OF NUTRITION AND DIETETICS • UNIVERSITI SULTAN ZAINAL ABIDIN
Topic Learning Outcomes
At the end of this lecture, students should be able to:
• identify the types of regression analysis and their uses.
• explain the assumptions to be met when using Simple Linear Regression.
• perform Simple Linear Regression analysis using SPSS.
• explain how to interpret the SPSS outputs from Simple Linear Regression analysis.
Regression
• Regression analysis estimates the linear relationship between a dependent variable and one or more independent variables (covariates).
• Regression is used to predict the value of the dependent variable when the value(s) of the independent variable(s) are known.
• Regression does not imply causality.
• Linear regression requires a numerical (interval- or ratio-level) dependent variable.
Scatter Plot
• To check whether your data suit a regression model, it is wise to draw a scatter plot first.
• The reason?
- Linear regression assumes a linear relationship. If the relationship is curvilinear, or there is no relationship, linear regression is of little use.
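As a rough illustration of why the scatter plot comes first, the sketch below fits a straight line by least squares to made-up data with a perfect U-shaped relationship (y = x²). Both the slope and R² come out as zero, even though Y is completely determined by X. The data and the helper name `fit_line` are hypothetical; this is plain Python, not SPSS.

```python
# Hypothetical data: a perfect curvilinear (U-shaped) relationship, y = x**2.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

def fit_line(xs, ys):
    """Least-squares intercept a, slope b and R^2 for the line y = a + b*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    syy = sum((y - y_bar) ** 2 for y in ys)
    b = sxy / sxx
    a = y_bar - b * x_bar
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, 1 - ss_res / syy

a, b, r2 = fit_line(xs, ys)
# The straight line is flat and explains none of the variation in y.
print(f"slope = {b:.3f}, R^2 = {r2:.3f}")
```

A linear model here would wrongly suggest "no relationship"; the scatter plot would reveal the curve immediately.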
Regression line
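• The straight line that best describes the plotted points; in ordinary least squares, it is the line that minimises the sum of squared vertical distances (residuals) between the points and the line.
• The regression line is used to describe the association between the variables.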
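A minimal sketch of how the least-squares line is obtained for one predictor, assuming ordinary least squares: the slope is b = Sxy / Sxx and the intercept is a = ȳ - b·x̄, so the line always passes through the point of means. The five (hours, score) pairs are invented for illustration and are not the lecture's dataset; `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Return (a, b) for the ordinary least-squares line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sxy: sum of cross-products of deviations; Sxx: sum of squared x deviations
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # intercept: the line passes through (x_bar, y_bar)
    return a, b

hours = [4, 5, 6, 7, 8]          # hypothetical sleeping hours
scores = [52, 55, 61, 62, 70]    # hypothetical exam scores
a, b = fit_line(hours, scores)
print(f"a = {a:.2f}, b = {b:.2f}")
```

For these made-up values the fitted line is score = 34.2 + 4.3 × hours.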
Beta (β) regression coefficient
• Gives the predicted change in the dependent variable for a one-unit increase in the explanatory (independent) variable.
[Figure: scatter plot of exam scores against sleeping hours (0 to 8 hours) with the regression line Y = a + βX, where β is the regression coefficient (change in Y when X increases by 1) and a is the intercept (value of Y when X = 0).]
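The one-unit interpretation of β can be checked directly: whatever value X takes, moving from X to X + 1 changes the predicted Y by exactly β, and the prediction at X = 0 is the intercept a. The values a = 40 and β = 3 below are hypothetical.

```python
# Hypothetical intercept and slope for the model Y = a + beta * X.
a, beta = 40.0, 3.0

def predict(x):
    """Predicted Y from the straight-line model."""
    return a + beta * x

print(predict(0))                # intercept: predicted Y when X = 0
print(predict(7) - predict(6))   # change in predicted Y for a one-unit increase in X
```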
Coefficient of determination, R²
• R² is the proportion of the variation in the dependent variable that is explained by the independent variable.
- R² = 1 indicates that the regression line fits the data perfectly.
- R² = 0 indicates that the line explains none of the variation in the dependent variable.
[Figure: example with R² = 0.75, where only 75% of the changes in Y are explained by X.]
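One way to compute R², sketched under the assumption of an ordinary least-squares fit with an intercept: R² = 1 - SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the total sum of squares of Y about its mean. In simple linear regression this equals the square of Pearson's correlation r, which the code also computes. The data are invented (hours, score) pairs, not the lecture's dataset, and `r_squared` is a hypothetical helper name.

```python
import math

def r_squared(xs, ys):
    """Return (R^2, r) for the least-squares line y = a + b*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)   # SS_tot
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    r = sxy / math.sqrt(sxx * syy)            # Pearson correlation
    return 1 - ss_res / syy, r

hours = [4, 5, 6, 7, 8]          # hypothetical
scores = [52, 55, 61, 62, 70]    # hypothetical
r2, r = r_squared(hours, scores)
print(f"R^2 = {r2:.3f}, r = {r:.3f}, r^2 = {r ** 2:.3f}")
```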
Types of regression analysis
• Simple Linear Regression
- 1 numerical variable (dependent) vs. 1 numerical variable (independent)
• Multiple Linear Regression
- 1 numerical variable (dependent) vs. more than 1 numerical variable (independent)
• Multivariable Linear Regression
- 1 numerical variable (dependent) vs. more than 1 numerical or categorical variable (independent)
• Multivariate Linear Regression
- More than 1 numerical variable (dependent) vs. more than 1 numerical or categorical variable (independent)
• Logistic Regression
- 1 categorical variable, usually binary (dependent) vs. 1 or more numerical or categorical variables (independent)
Research Question and Hypotheses
Example:
• Research Question
- Is the number of sleeping hours a predicting factor of exam scores?
• Null Hypothesis (H0: β = 0)
- There is no linear relationship between sleeping hours and exam scores.
• Alternative Hypothesis (Ha: β ≠ 0)
- There is a linear relationship between sleeping hours and exam scores.
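The test of H0: β = 0 is a t test on the slope. A minimal sketch, assuming the usual simple-regression formulas: t = b / SE(b), with SE(b) = √(MS_res / Sxx), MS_res = SS_res / (n - 2), and n - 2 degrees of freedom. The data and the helper name `slope_t_statistic` are hypothetical. Python's standard library has no t distribution, so the p value itself would come from statistical software such as SPSS or from a t table.

```python
import math

def slope_t_statistic(xs, ys):
    """Return (t, df) for testing H0: beta = 0 in y = a + b*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ms_res = ss_res / (n - 2)          # residual mean square
    se_b = math.sqrt(ms_res / sxx)     # standard error of the slope
    return b / se_b, n - 2

hours = [4, 5, 6, 7, 8]          # hypothetical
scores = [52, 55, 61, 62, 70]    # hypothetical
t, df = slope_t_statistic(hours, scores)
print(f"t = {t:.3f} with {df} degrees of freedom")
```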
Assumptions
• The data are drawn from a random sample of the population.
• The observations are independent of each other.
• The relationship between the two variables is linear.
• Y is normally distributed at every value of X.
• Y has equal variance at every value of X (homoscedasticity).
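The normality and equal-variance assumptions are usually assessed through the residuals (observed minus predicted Y), for example with a histogram of residuals or a plot of residuals against predicted values. The sketch below computes residuals and a crude equal-variance indicator: the ratio of residual variances above and below the median of X, where a ratio near 1 is consistent with equal variance. This indicator is an illustrative assumption, not a formal test and not an SPSS output. The data, built with errors of constant size, and the helper names `residuals` and `spread_ratio` are hypothetical.

```python
import statistics

def residuals(xs, ys):
    """Observed minus predicted y from the least-squares line."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return [y - (a + b * x) for x, y in zip(xs, ys)]

def spread_ratio(xs, res):
    """Crude check: residual variance above vs. below the median of x."""
    med = statistics.median(xs)
    low = [r for x, r in zip(xs, res) if x < med]
    high = [r for x, r in zip(xs, res) if x > med]
    return statistics.variance(high) / statistics.variance(low)

xs = list(range(1, 9))
noise = [1, -1, -1, 1, 1, -1, -1, 1]   # hypothetical errors with constant spread
ys = [10 + 2 * x + e for x, e in zip(xs, noise)]
res = residuals(xs, ys)
ratio = spread_ratio(xs, res)
print(res)
print(f"variance ratio (upper/lower half of x) = {ratio:.2f}")
```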
Assumption 3 - Linearity
[Figure: SPSS screenshots, steps 1 to 3]
Assumption 3 - Linearity (cont.)
[Figure: SPSS screenshots, steps 4 to 7]
Put the independent variable into the "X Axis" box.
Put the dependent variable into the "Y Axis" box.
Assumption 3 - Linearity (cont.)
To add the regression line, double-click on the plot.
[Figure: SPSS screenshot, step 8]
Assumption 3 - Linearity (cont.)
[Figure: SPSS screenshot, step 9]
The relationship between the two variables is linear.
Assumption 4 - Normal distribution
Assumption 4 - Normal distribution (cont.)
Assumption 5 - Equal variance
Assumption 5 - Equal variance (cont.)
Assumption 5 - Equal variance (cont.)
Simple Linear Regression in SPSS
[Figure: SPSS screenshots, steps 1 to 3]
Simple Linear Regression in SPSS
[Figure: SPSS screenshots, steps 4 and 5]
Put the dependent variable into the "Dependent" box.
Put the independent variable into the "Independent(s)" box.
Simple Linear Regression in SPSS
[Figure: SPSS screenshots, steps 6 to 8]
Simple Linear Regression in SPSS
[Figure: SPSS screenshot, step 9]
SPSS Output
[Figure: SPSS output, table 1]
• The table shows the method used in this analysis.
• No variable selection was carried out.
SPSS Output
[Figure: SPSS output, table 2 (Model Summary)]
The 'Model Summary' table shows the correlation coefficient (R) and the coefficient of determination (R²).
• The correlation coefficient (R) is 0.463, so there is a fair positive linear relationship between the two variables. SPSS reports R as non-negative in this table, so the positive direction is read from the sign of β in the Coefficients table.
• The coefficient of determination (R²) is 0.214.
• Thus, 21.4% of the variation in exam scores is explained by sleeping hours.
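Since R² is simply R squared in simple linear regression, the two Model Summary values can be cross-checked by hand. The sketch uses only the values reported above; any tiny mismatch reflects R itself being rounded to three decimals.

```python
r = 0.463                 # R from the Model Summary table
r_squared = r ** 2        # coefficient of determination
print(round(r_squared, 3))                               # compare with the reported R^2
print(f"{100 * r_squared:.1f}% of the variation explained")
```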
SPSS Output
[Figure: SPSS output, table 3 (ANOVA)]
The ANOVA table gives the p value for the overall regression model; in simple linear regression this equals the p value of the slope.
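In simple linear regression the ANOVA F statistic (regression mean square divided by residual mean square, with 1 and n - 2 degrees of freedom) equals the square of the slope's t statistic, which is why the ANOVA and Coefficients tables give the same p value. A sketch checking this identity on invented data; `anova_f_and_slope_t` is a hypothetical helper name.

```python
import math

def anova_f_and_slope_t(xs, ys):
    """Return (F, t) for the simple linear regression of y on x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)   # total sum of squares
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    ss_reg = b * sxy                          # regression sum of squares
    ss_res = syy - ss_reg                     # residual sum of squares
    ms_res = ss_res / (n - 2)
    f = (ss_reg / 1) / ms_res                 # regression has 1 df here
    t = b / math.sqrt(ms_res / sxx)
    return f, t

hours = [4, 5, 6, 7, 8]          # hypothetical
scores = [52, 55, 61, 62, 70]    # hypothetical
f, t = anova_f_and_slope_t(hours, scores)
print(f"F = {f:.3f}, t^2 = {t ** 2:.3f}")
```

Applied to the slope t of 23.354 reported later in this lecture, this identity implies an ANOVA F of roughly 23.354² ≈ 545.4, subject to the rounding of t.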
SPSS Output
[Figure: SPSS output, table 4 (Coefficients)]
The Coefficients table shows:
• the slope of the line (β),
• the intercept on the y axis (Constant),
• the p value of the relationship.
Y = a + βX
SPSS Output Interpretation
• The slope of the regression line (β) is 3.456, and the y-axis intercept is 39.151.
• Each 1-hour increase in sleeping hours is associated with an increase of 3.456 in exam score.
• The regression equation:
Exam score = 39.151 + 3.456 × (sleeping hours)
• The p value is < 0.05; therefore, reject the null hypothesis.
• There is a significant linear relationship between sleeping hours and exam scores (p < 0.001).
• Sleeping hours is a significant predicting factor for exam scores.
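The fitted equation can be used for prediction. The sketch plugs in the intercept and slope reported above; the 7-hour input is an arbitrary example, and predictions are only sensible within the range of sleeping hours actually observed in the data, which the slides do not report. `predict_exam_score` is a hypothetical helper name.

```python
INTERCEPT = 39.151   # constant from the Coefficients table
SLOPE = 3.456        # beta for sleeping hours

def predict_exam_score(sleep_hours):
    """Predicted exam score from the fitted simple linear regression."""
    return INTERCEPT + SLOPE * sleep_hours

print(round(predict_exam_score(7), 3))                           # prediction for 7 hours
print(round(predict_exam_score(8) - predict_exam_score(7), 3))   # one extra hour adds beta
```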
Results Presentation
                 β (95% CI)             t statistic   P value*   R²
Sleeping hours   3.456 (3.166, 3.746)   23.354        < 0.001    0.214
Table: Relationship between sleeping hours and exam scores
*Simple Linear Regression
There is a significant linear relationship between sleeping hours and exam scores (p < 0.001). It is observed that each 1-hour increase in sleeping hours is associated with an increase of 3.456 in exam score. Sleeping hours is a significant predicting factor for exam scores.
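The confidence interval in the table can be cross-checked from β and t: since t = β / SE, the standard error is SE = β / t, and a 95% CI is β ± (critical value) × SE. The sketch assumes a large sample, so the critical value is approximated by 1.96; the exact interval uses the t critical value with n - 2 degrees of freedom, and n is not given on the slides.

```python
beta, t_stat = 3.456, 23.354   # values from the results table
se = beta / t_stat             # t = beta / SE, so SE = beta / t
z = 1.96                       # large-sample approximation to the t critical value
lower, upper = beta - z * se, beta + z * se
print(f"SE = {se:.3f}; 95% CI = ({lower:.3f}, {upper:.3f})")
```

With SE ≈ 0.148, this reproduces the reported interval (3.166, 3.746) to three decimals.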
Thank You
