Statistics - Simple Linear and Multiple Linear Regression

PRESENTED BY |
Advanced Quantitative Research in the Designed and Built Environment
Simple Linear and Multiple Linear Regression
UNIVERSITY OF THE PHILIPPINES – DILIMAN | INTEGRATED GRADUATE PROGRAM (IGP) | URBAN DESIGN STUDIO LAB
BRYLL EDISON C. PAR
A. INTRODUCTION TO SIMPLE LINEAR REGRESSION
B. HOW TO PERFORM LINEAR REGRESSION
C. MULTIPLE REGRESSION

PART 1 – Introduction to Simple Linear Regression

SOURCE: https://www.scribbr.com/statistics/simple-linear-regression/
IMAGE REFERENCES: From left-analyticsvidhya.com; Nwaogazie (2017)
Regression models describe the relationship between variables by fitting a line to the observed data.
Linear regression models use a straight line, while logistic and nonlinear regression models use a
curved line. Regression allows you to estimate how a dependent variable changes as the independent
variable(s) change.
FIGURE 1 – LINEAR REGRESSION MODEL
FIGURE 2 – LOGISTIC AND NONLINEAR REGRESSION MODEL

Linear regression attempts to model the relationship between two variables by fitting a linear equation to
observed data. One variable is considered to be an explanatory variable, and the other is considered to be
a dependent variable.
SOURCE: stat.yale.edu
IMAGE REFERENCES: From left-Scribbr; Auerkari et al. (2017); sphweb.bumc.bu.edu

SOURCE: Introduction to Simple Linear Regression by dataminingincae found at YouTube.com
Objectives of Linear Regression
To establish if there is a relationship between two
variables:
- More specifically, establish if there is a statistically
significant relationship between the two.
- Examples: Income and spending, wage and gender,
and student height and exam scores.
Forecast new observations:
- Can we use what we know about the relationship to
forecast unobserved values?
- Examples: What will our sales be for the next quarter?
What will the ROI of a new store opening be, given the
store's attributes?
Variable Roles
• Dependent Variable – Denoted by “y”
• Independent Variable – Denoted by “x”
The Magic
Slope-intercept form
• y = a+bx
• y = mx+b
Linear Equation in Statistics
y = β0 + β1x
where:
β0 = Intercept/constant value
β1 = slope of x

Note: We call it a “linear equation” because the equation represents a
straight line in a two-dimensional plot
FIGURES – CHANGE IN INTERCEPT; CHANGE IN SLOPE
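The effect of changing the intercept or the slope can be checked numerically. The sketch below is only an illustration and is not part of the original slides; the function name `line` and all parameter values are assumptions chosen for demonstration.

```python
def line(x, b0, b1):
    """Return y = b0 + b1 * x for a single x value."""
    return b0 + b1 * x


xs = [0, 1, 2, 3]

# Baseline line with illustrative (assumed) parameters.
base = [line(x, b0=1.0, b1=2.0) for x in xs]

# A change in intercept shifts every point by the same amount.
shifted = [line(x, b0=3.0, b1=2.0) for x in xs]

# A change in slope changes how fast y grows with x;
# the value at x = 0 (the intercept) stays the same.
steeper = [line(x, b0=1.0, b1=4.0) for x in xs]

print("baseline:        ", base)
print("higher intercept:", shifted)
print("steeper slope:   ", steeper)
```

Comparing the three lists shows that the intercept moves the line up or down, while the slope tilts it.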

Simple Linear Regression: Error Term
Linear Regression Model
y = β0 + β1x + ε
where:
y = Dependent Variable
x = Independent Variable
β0 = Intercept/constant value
β1 = Coefficient/slope of x
ε = error term
Linear Regression Equation
y = β0 + β1x
Note: There is no error term because the error is assumed to have an expected value of zero

PART 2 – How to Perform Linear Regression

Example: Family’s consumption of a given product (Relationship between the family’s income and
the consumption)
Step 1. Compile the observations/true values in a table in Microsoft Excel and
save it as a CSV file.

Step 2. Review the linear regression model and identify the independent as well as the
dependent variable.

Linear Regression Model
y = β0 + β1x + ε
Explanation
CONSUMPTION = β0 + β1 INCOME + ε
where:
y = Consumption
x = Income
ε = error term
Assumption: Income explains consumption

SOURCE: Introduction to Simple Linear Regression by dataminingincae found at YouTube.com; Gretl Tutorial 1: Simple Linear Regression by dataminingincae Found at YouTube.com
Step 3. Find the coefficients of the constant and the independent variable.
In this case, an open-source statistical package (Gretl) will be used.
The software may be downloaded from this link: http://gretl.sourceforge.net/win32/
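For readers who want to see what "finding the coefficients" involves, the sketch below computes ordinary least-squares estimates for a simple linear regression by hand. It is a hedged illustration, not the procedure Gretl uses internally and not the author's method; the function name `fit_simple_ols` and the small data set are hypothetical and are NOT the data behind the slides.

```python
def fit_simple_ols(x, y):
    """Least-squares estimates (b0, b1) for the line y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx               # slope
    b0 = mean_y - b1 * mean_x    # intercept
    return b0, b1


# Hypothetical observations (assumed for illustration only).
income = [100, 150, 200, 250, 300]
consumption = [130, 180, 215, 262, 300]

b0, b1 = fit_simple_ols(income, consumption)
print(f"consumption = {b0:.2f} + {b1:.3f} * income")
```

The slope is the ratio of the co-variation of x and y to the variation of x, and the intercept is chosen so that the fitted line passes through the point of means.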

Open the Gretl software.
Click the File tab, hover over Open data, and select User file.

A window will pop up. Locate your CSV file on your PC and
then click Open.
After you click Open, you will be redirected to the window shown in the screenshot.

Select both the dependent and independent variables (consumption
and income) and then click the Beta icon below.
Choose consumption as the dependent variable by clicking the blue arrow pointing right.

Choose income as the independent variable by clicking the green arrow
pointing right, below the blue arrow, then click OK.
A window will pop up showing the summary of the data you
need.

Gretl is an open-source statistical
package, mainly for econometrics.
Econometrics is the field of
knowledge that analyzes data with
statistical models to test hypotheses and
reach conclusions.
y = 48.77 + 0.85 x + ε
Consumption = 48.77 + 0.85 income + ε

Step 4. Forecast using Gretl. Please follow the instructions below.

Hover over the Analysis tab, click Forecast, and wait for
another window to pop up.
Check the value of the forecast range and click OK.

A summary of the predictions, standard errors, and intervals will open, as well as the “Forecast evaluation statistics using 40
observations” (not shown in this figure). The graph will also pop up, as shown on the next slide.
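As a rough companion to the Gretl forecast, the sketch below produces point forecasts from the estimated equation Consumption = 48.77 + 0.85 income. It is a simplified illustration under stated assumptions: the function name `forecast_consumption` and the income values are hypothetical, and the standard errors and intervals that Gretl reports are not reproduced here.

```python
# Estimated coefficients taken from the fitted model in the slides.
B0 = 48.77  # intercept
B1 = 0.85   # slope of income


def forecast_consumption(income):
    """Point forecast of consumption for a given income."""
    return B0 + B1 * income


# Hypothetical income values (assumed for illustration only).
for income in (100, 200, 300):
    print(income, round(forecast_consumption(income), 2))
```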

Estimated vs. Actual Values

Step 5. Finalize result and proceed with conclusion

y = 48.77 + 0.85 x + ε
Consumption = 48.77 + 0.85 income + ε
Estimated model of consumption
48.77 = Interpreted consumption of a family with 0 income
0.85 = Marginal effect of one unit increase of income on consumption
x = It doesn’t have an intuitive interpretation meaning that in most cases we will actually be ignoring it.
Conclusion:
Consumption will grow by 0.85 for every unit increase in income. Ex: A family's income is 50 dollars more.
0.85x
(0.85)(50 dollars) = 42.5 dollars
This means that for every additional 50 dollars of income a family earns per week, consumption is expected to grow
by 42.5 dollars on average.
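The arithmetic in this conclusion can be checked with a few lines of code. The sketch below is an illustration only; the function names and the starting income of 400 dollars are assumptions. It shows that the expected change in consumption depends only on the slope, because the intercept cancels when two predictions are subtracted.

```python
INTERCEPT = 48.77  # estimated constant from the slides
SLOPE = 0.85       # estimated marginal effect of income


def expected_consumption(income):
    """Expected consumption from the estimated model."""
    return INTERCEPT + SLOPE * income


def consumption_change(income_change):
    """Expected change in consumption for a given change in income."""
    return SLOPE * income_change


# Direct calculation for an income increase of 50 dollars.
print(consumption_change(50))

# Same result as the difference of two predictions; 400 is a hypothetical income.
difference = expected_consumption(450) - expected_consumption(400)
print(round(difference, 2))
```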

PART 3 – Multiple Regression

SOURCE: https://www.scribbr.com/statistics/multiple-linear-regression/
Multiple linear regression is used to estimate the relationship between two or more independent
variables and one dependent variable.
IMAGE REFERENCES: From left – Jacome (2016); scribbr

Objectives of Multiple Linear Regression
You can use multiple linear regression when you want to
know:
• How strong the relationship is between two or more
independent variables and one dependent variable (e.g.
how rainfall, temperature, and amount of fertilizer added
affect crop growth).
• The value of the dependent variable at a certain value
of the independent variables (e.g. the expected yield of
a crop at certain levels of rainfall, temperature, and
fertilizer addition).
Variable Roles
• Dependent Variable – Denoted by “y”
• Independent Variable – Denoted by “x”
The Magic
• y = a+bx
• y = mx+b
Multiple Linear Regression Model
y = β0 + β1X1 + β2X2 + … + βpXp + ε
where:
β0 = Intercept/constant value
β1, β2, … βp = slopes (coefficients) of X1, X2, … Xp

Multiple Regression Key Concept
Simple linear regression (one-to-one relationship): one IV, one DV
Multiple regression (many-to-one relationship): two or more IVs, one DV
Note: Adding more independent variables to a multiple regression procedure does not mean the regression will be
“better” or offer better predictions; in fact, it can make things worse. This is called “overfitting”.
The addition of more independent variables creates more relationships among them. So not only are the independent
variables potentially related to the dependent variable, they are also potentially related to each other. When this
happens, it is called “multicollinearity”.

Multicollinearity – the independent variables are correlated with each other.
The ideal is for all the independent variables to be correlated with the dependent variable but not with each other.
Multiple regression (many-to-one relationship): four IVs, one DV
Check the relationship between each
independent variable and the dependent variable.
Consider all the relationships among the
independent variables.
In this example, 10 relationships should be considered:
4 relationships between an IV and the DV, and another 6
relationships between pairs of IVs.
Note: The more independent variables are added, the more
numerous the relationships become. Some independent
variables, or sets of independent variables, are better at
predicting the dependent variable than others. Some
contribute nothing.
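The count of relationships to examine grows quickly with the number of independent variables. The sketch below is a small illustration, not part of the original slides, and the function name `count_relationships` is hypothetical. It counts one IV-to-DV relationship per independent variable plus one IV-to-IV relationship per pair of independent variables.

```python
from math import comb


def count_relationships(n_iv):
    """Return (IV-DV count, IV-IV count, total) for n_iv independent variables."""
    iv_dv = n_iv           # each IV paired with the single DV
    iv_iv = comb(n_iv, 2)  # every unordered pair of IVs
    return iv_dv, iv_iv, iv_dv + iv_iv


for n in (1, 2, 3, 4, 5):
    iv_dv, iv_iv, total = count_relationships(n)
    print(f"{n} IVs: {iv_dv} IV-DV + {iv_iv} IV-IV = {total} relationships")
```

With four independent variables this gives 4 + 6 = 10 relationships, and with three it gives 3 + 3 = 6.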

Facts
• Multiple regression is an extension of simple linear
regression.
• Two or more independent variables are used to
predict/explain the variance in one dependent variable.
• Two problems may arise:
• 1. Overfitting: caused by adding too many
independent variables; they account for more variance
but add nothing to the model.
• 2. Multicollinearity: happens when some or all of the
independent variables are correlated with each other.
• In multiple regression, each coefficient is interpreted as
the estimated change in y corresponding to a one unit
change in a variable, when all other variables are held
constant.
Multiple Linear Regression Model
y = β0 + β1X1 + β2X2 + … + βpXp + ε
(β0 + β1X1 + β2X2 + … + βpXp is the sum of linear parameters; ε is the error term)
Multiple Linear Regression Equation
y = β0 + β1X1 + β2X2 + … + βpXp
(The error term is assumed to be zero)
Estimated Multiple Linear Regression Equation
ŷ = b0 + b1X1 + b2X2 + … + bpXp
b0, b1, b2, … bp are the estimates of β0, β1, β2, … βp

EXAMPLE 1: ESTIMATED MULTIPLE REGRESSION EQUATION
Ŷ = 5.344 + 0.025 X1 + 0.234 X2 – 0.529 X3
(Standard form of a multiple regression equation)
ŷ = b0 + b1X1 + b2X2 + … + bpXp
(Estimated multiple regression equation)
Estimates of a multiple regression model
Variables: X1, X2, and X3
Coefficients: 0.025, 0.234, and -0.529
Intercept: 5.344
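An estimated multiple regression equation is simply the intercept plus a weighted sum of the variable values. The sketch below evaluates the Example 1 equation; it is a minimal illustration, the function name `predict` is hypothetical, and the input values for X1, X2, and X3 are assumptions, since the slides give no data for this example.

```python
def predict(intercept, coefficients, values):
    """Evaluate y_hat = b0 + b1*X1 + ... + bp*Xp."""
    if len(coefficients) != len(values):
        raise ValueError("need exactly one value per coefficient")
    return intercept + sum(b * x for b, x in zip(coefficients, values))


# Estimates from Example 1 in the slides.
INTERCEPT = 5.344
COEFFICIENTS = [0.025, 0.234, -0.529]

# Hypothetical values for X1, X2, X3 (assumed for illustration only).
values = [10, 2, 1]
y_hat = predict(INTERCEPT, COEFFICIENTS, values)
print(round(y_hat, 3))
```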
EXAMPLE 2: INTERPRETING COEFFICIENTS
Ŷ = 27 + 9 X1 + 12 X2
(Standard form of a multiple regression equation)
X1 = Capital Investments (1000 USD)
X2 = Marketing Expenditures (1000 USD)
Ŷ = Predicted Sales (1000 USD)
Note: In multiple regression, each coefficient is
interpreted as the estimated change in y
corresponding to a one unit change in a variable, when
all other variables are held constant.
In this example, 9000 USD is an estimate of the
expected increase in sales y, corresponding to a 1000
USD increase in capital investment (X1) when
marketing expenditure (X2) is held constant.
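The "held constant" interpretation can be verified directly from the Example 2 equation. In the sketch below, which is an illustration rather than part of the original slides, the function name `predicted_sales` and the starting values of 10 and 5 are assumptions; all amounts are in thousands of USD.

```python
def predicted_sales(capital, marketing):
    """Example 2 equation: y_hat = 27 + 9*X1 + 12*X2 (all in 1000 USD)."""
    return 27 + 9 * capital + 12 * marketing


# Hypothetical starting point: X1 = 10, X2 = 5 (assumed for illustration).
base = predicted_sales(10, 5)

# Raise capital investment by one unit while marketing is held constant.
capital_effect = predicted_sales(11, 5) - base

# Raise marketing expenditure by one unit while capital is held constant.
marketing_effect = predicted_sales(10, 6) - base

print(capital_effect, marketing_effect)
```

Each difference equals the coefficient of the variable that was changed, regardless of the starting point chosen.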

Multiple Regression Pre-work/Data Preparation
1. Generate a list of potential variables; independent(s) and dependent.
2. Collect data on the variables.
3. Check the relationships between each independent variable and the dependent variable using scatterplots and
correlations.
4. Check the relationships between independent variables using scatterplots and correlations.
5. (Optional) Conduct simple linear regression for each independent and dependent pair.
6. Use the non-redundant independent variables in the analysis to find the best fitting model.
7. Use the best fitting model to make predictions about the dependent variable.

Sample Problem: Regional Delivery Service
RDS Data and Variable Naming
To conduct your analysis, you take a random sample of 10 past trips and record four pieces of information for each
trip: 1) total miles traveled, 2) number of deliveries, 3) the daily gas price, and 4) total travel time in hours.
Miles Traveled (X1)   Number of Deliveries (X2)   Gas Price (X3)   Travel Time in Hours (y)
89                    4                           3.84             7
66                    1                           3.19             5.4
78                    3                           3.78             6.6
111                   6                           3.89             7.4
44                    1                           3.57             4.8
77                    3                           3.57             6.4
80                    3                           3.03             7
66                    2                           3.51             5.6
109                   5                           3.54             7.3
76                    3                           3.25             6.4

Sample Problem: Regional Delivery Service (Sketching out relationships)
Independent Variables: Miles Traveled (X1), Number of Deliveries (X2), Gas Price (X3)
Dependent Variable: Travel Time (y)
6 Relationships Should be Analyzed

Sample Problem: Regional Delivery Service – Dependent Variable vs Independent Variable Scatterplots (Using
Gretl application)
X1 vs. y: R squared = 0.862, P Value (F) = 0.000
X2 vs. y: R squared = 0.840, P Value (F) = 0.000
X3 vs. y: R squared = 0.071, P Value (F) = 0.455

Sample Problem: Regional Delivery Service – Multicollinearity Scatterplots (Using Gretl application)
X1 vs. X2: R squared = 0.914, P Value (F) = 0.000
X1 vs. X3: R squared = 0.127, P Value (F) = 0.313
X2 vs. X3: R squared = 0.248, P Value (F) = 0.143
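The R squared values for all six relationships can be recomputed from the sample data without Gretl. The sketch below is a hedged illustration using only the Python standard library; the helper name `pearson_r` is hypothetical, and the P values reported by Gretl are not reproduced. For a simple regression with one independent variable, R squared equals the squared Pearson correlation between the two variables.

```python
from itertools import combinations
from math import sqrt

# Sample data from the Regional Delivery Service problem.
data = {
    "X1": [89, 66, 78, 111, 44, 77, 80, 66, 109, 76],   # miles traveled
    "X2": [4, 1, 3, 6, 1, 3, 3, 2, 5, 3],               # number of deliveries
    "X3": [3.84, 3.19, 3.78, 3.89, 3.57, 3.57, 3.03, 3.51, 3.54, 3.25],  # gas price
    "y": [7, 5.4, 6.6, 7.4, 4.8, 6.4, 7, 5.6, 7.3, 6.4],  # travel time (hours)
}


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    sab = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    saa = sum((p - mean_a) ** 2 for p in a)
    sbb = sum((q - mean_b) ** 2 for q in b)
    return sab / sqrt(saa * sbb)


# R squared for every pair of variables (3 IV-DV pairs and 3 IV-IV pairs).
r_squared = {}
for first, second in combinations(data, 2):
    r = pearson_r(data[first], data[second])
    r_squared[(first, second)] = r * r
    print(f"{first} vs. {second}: r = {r:.3f}, R squared = {r * r:.3f}")
```

A high R squared between two independent variables, as with X1 and X2 here, is the signal of multicollinearity discussed earlier.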

Sample Problem: Regional Delivery Service – Correlation Summary
Correlation analysis confirms the conclusion reached by visual examination of the scatterplots
Redundant multicollinear variables
• Miles Travelled and Number of Deliveries are highly correlated with each other and are therefore redundant;
only one should be used in the multiple regression analysis.
Non-contributing variables
• Gas price is not significantly correlated with the dependent variable (R squared = 0.071, P value = 0.455) and should be excluded.
Note: For educational purposes, all three relationships will be retained.

Sample Problem: Regional Delivery Service – Individual Summary Output
Travel Time (y) vs. Miles Travelled (X1)
Ŷ = 3.186 + 0.0403 (Miles Travelled)
Ŷ = 3.186 + 0.0403 X1
An increase of 1 mile is estimated to increase
travel time by 0.0403 hours

Travel Time (y) vs. Number of Deliveries (X2)
Ŷ = 4.845 + 0.498 (Number of Deliveries)
Ŷ = 4.845 + 0.498 X2
An increase of 1 delivery is estimated to increase
travel time by 0.498 hours

Travel Time (y) vs. Gas Price (X3)
Ŷ = 3.536 + 0.811 (Gas Price)
Ŷ = 3.536 + 0.811 X3
Gas Price is not a variable that contributes
to travel time. No need to explore this value.
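The three individual regression equations above can be reproduced from the sample data with a hand-written least-squares fit. The sketch below is an illustration only and is not the Gretl output itself; the function name `fit_simple_ols` is hypothetical, and only the intercept and slope are computed, not the full summary table.

```python
# Sample data from the Regional Delivery Service problem.
miles = [89, 66, 78, 111, 44, 77, 80, 66, 109, 76]
deliveries = [4, 1, 3, 6, 1, 3, 3, 2, 5, 3]
gas_price = [3.84, 3.19, 3.78, 3.89, 3.57, 3.57, 3.03, 3.51, 3.54, 3.25]
travel_time = [7, 5.4, 6.6, 7.4, 4.8, 6.4, 7, 5.6, 7.3, 6.4]


def fit_simple_ols(x, y):
    """Least-squares estimates (b0, b1) for the line y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Fit travel time against each independent variable separately.
fits = {}
for name, x in (("Miles Travelled", miles),
                ("Number of Deliveries", deliveries),
                ("Gas Price", gas_price)):
    b0, b1 = fit_simple_ols(x, travel_time)
    fits[name] = (b0, b1)
    print(f"y_hat = {b0:.3f} + {b1:.4f} ({name})")
```

Up to rounding, the printed intercepts and slopes should agree with the equations shown in the individual summary outputs.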
