Linear regression theory

https://in.linkedin.com/in/sauravmukherjee

What & Why
What is Regression?
The formulation of a functional relationship between a set of independent (explanatory) variables (X's) and a dependent (response) variable (Y).
Y = f(X)
Why Regression?
Knowledge of Y is crucial for decision making.
• Will he/she buy or not?
• Shall I offer him/her the loan or not?
• ………
X is available at the time of decision making and is related to Y, which makes it possible to predict Y.

Types of Regression
• Y continuous (e.g., Sales Volume, Claim Amount, % of sales growth, etc.): Ordinary Least Squares (OLS) Regression
• Y binary (0/1) (e.g., Buy/No-Buy, Survive/Not-Survive, Win/Loss, etc.): Logistic Regression

Intro to Regression Analysis
• Regression analysis is used to:
  • Predict the value of a dependent variable based on the value of at least one independent variable
  • Explain the impact of changes in an independent variable on the dependent variable
• Dependent variable: the variable we wish to explain, usually denoted by Y.
• Independent variable: the variable used to explain the dependent variable, usually denoted by X.

Regression Example
Predict the fitness of a person based on one or more parameters.

Simple Linear Regression Model
• Only one independent variable, x
• The relationship between x and y is described by a linear function
• Changes in y are assumed to be caused by changes in x

Assumptions for Simple Linear Regression
E(ε) = 0

Assumptions for Multiple Regression

$\operatorname{Var}(\varepsilon_i) = \sigma^2$ for all i

$E(\varepsilon_i \varepsilon_j) = 0, \quad i \neq j$


The Multiple Linear Regression Model

Types of Regression Relationships
• Positive linear relationship
• Negative linear relationship
• No relationship
• Relationship not linear
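To make the four cases concrete, here is a minimal Python sketch, assuming NumPy is available, that simulates each kind of relationship and computes the Pearson correlation coefficient r. The data, slopes, and noise level are made up for illustration. Note that the non-linear case can give r close to zero even though y depends strongly on x, which is why it is worth plotting the data before fitting a straight line.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
x = np.linspace(-3, 3, 200)
noise = rng.normal(scale=0.5, size=x.size)

# Hypothetical data for each relationship type (not from the slides)
relationships = {
    "positive linear": 2.0 * x + noise,
    "negative linear": -1.5 * x + noise,
    "no relationship": rng.normal(size=x.size),
    "not linear": x ** 2 + noise,
}

for name, y in relationships.items():
    r = np.corrcoef(x, y)[0, 1]  # Pearson correlation between x and y
    print(f"{name:>16}: r = {r:+.2f}")
```

The quadratic case is the one to watch: a linear model would miss a relationship that is clearly there.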

Population & Sample Regression Models
[Figure: a population whose relationship $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$ is unknown, and a random sample drawn from it.]
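The distinction can be simulated. The sketch below assumes NumPy and invents a population with known parameters (β0 = 5, β1 = 2, chosen arbitrarily), draws a random sample of 50 observations, and estimates the line from the sample alone. The sample estimates b0 and b1 land near, but not exactly on, the population values; a different sample would give slightly different estimates.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Hypothetical population: Y_i = beta0 + beta1 * X_i + eps_i
beta0, beta1 = 5.0, 2.0
N = 10_000
X_pop = rng.uniform(0, 10, size=N)
Y_pop = beta0 + beta1 * X_pop + rng.normal(scale=3.0, size=N)

# Draw a random sample (without replacement) from the population
idx = rng.choice(N, size=50, replace=False)
x_s, y_s = X_pop[idx], Y_pop[idx]

# np.polyfit with deg=1 returns [slope, intercept]
b1, b0 = np.polyfit(x_s, y_s, deg=1)
print(f"population: beta0 = {beta0}, beta1 = {beta1}")
print(f"sample:     b0 = {b0:.2f}, b1 = {b1:.2f}")
```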

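Population Linear Regression
$$Y = \beta_0 + \beta_1 X + u$$
[Figure: the population regression line plotted against X, with intercept β0 and slope β1. For a given x_i, the line gives the predicted value of Y, and u_i is the random error for that x value, i.e., the gap between an individual person's marks and the predicted value.]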

Population Regression Function
$$Y = \beta_0 + \beta_1 X + u$$
where:
Y = dependent variable
X = independent variable
β0 = population y-intercept
β1 = population slope coefficient
β0 + β1X = linear component
u = random error term (random error component)
But can we actually get this equation? If so, what information would we need?

Sample Regression Function
$$y = b_0 + b_1 x + e$$
[Figure: the sample regression line with intercept b0 and slope b1. For a given x_i, e_i is the difference between the observed value of y and the predicted value of y.]

$$y_i = b_0 + b_1 x_i + e_i$$
where:
b0 = estimate of the regression intercept
b1 = estimate of the regression slope
xi = independent variable
ei = error term
Notice the similarity with the Population Regression Function.
Can we do something about the error term?

The Error Term (Residual)
• Represents the influence of all the variables that we have not accounted for in the equation
• Represents the difference between the actual y values and the y values predicted by the Sample Regression Line
• Wouldn't it be good if we were able to reduce this error term?
• By the way, what are we trying to achieve with Sample Regression?
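Reducing the error term is what least squares does: it chooses b0 and b1 to minimize the sum of squared residuals. For simple regression this has a closed form, b1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b0 = ȳ − b1·x̄. A minimal pure-Python sketch follows; the function name fit_simple_ols and the toy data are hypothetical, chosen only for illustration.

```python
def fit_simple_ols(x, y):
    """Hypothetical helper: return (b0, b1) minimizing sum((y - (b0 + b1*x))**2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx            # slope estimate
    b0 = y_bar - b1 * x_bar     # intercept estimate
    return b0, b1


# Hypothetical toy data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]

b0, b1 = fit_simple_ols(x, y)
y_hat = [b0 + b1 * xi for xi in x]
residuals = [yi - yhi for yi, yhi in zip(y, y_hat)]  # e_i = y_i - y_hat_i
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
print("residuals:", [round(e, 3) for e in residuals])
```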

How Well a Model Fits the Data

Comparing the Regression Model to a Baseline Model
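The content of these slides did not survive extraction. One common way to make the comparison concrete, assumed here rather than taken from the slides, is the coefficient of determination R² = 1 − SSE/SST. The baseline model predicts ȳ for every observation (its squared error is SST), and the regression model's squared error is SSE. A minimal sketch assuming NumPy, with a hypothetical helper r_squared and made-up data:

```python
import numpy as np

def r_squared(y, y_hat):
    """Hypothetical helper: R^2 = 1 - SSE/SST, the gain over the mean-only baseline."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    sse = np.sum((y - y_hat) ** 2)        # squared error of the given model
    sst = np.sum((y - y.mean()) ** 2)     # squared error of the baseline (always predict y-bar)
    return 1.0 - sse / sst

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])

b1, b0 = np.polyfit(x, y, deg=1)          # least squares line
baseline = np.full_like(y, y.mean())      # baseline model: predict the mean

print("R^2 of regression line:", round(r_squared(y, b0 + b1 * x), 4))
print("R^2 of baseline model: ", r_squared(y, baseline))
```

By construction the baseline scores R² = 0, so any positive R² measures how much the regression line improves on simply predicting the average.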

OLS Regression Properties
• The sum of the residuals from the least squares regression line (fitted with an intercept) is zero: $\sum (y - \hat{y}) = 0$
• The sum of the squared residuals is a minimum: least squares minimizes $\sum (y - \hat{y})^2$
• The simple regression line always passes through the mean of the x variable and the mean of the y variable, i.e., the point $(\bar{x}, \bar{y})$
• Under the regression assumptions, the least squares coefficients are unbiased estimates of β0 and β1
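The first three properties can be checked numerically. The sketch below assumes NumPy and uses simulated data (the true intercept 3.0 and slope 0.8 are arbitrary). Because floating-point arithmetic is inexact, "zero" shows up as a tiny number rather than exactly 0. It also shows that nudging either fitted coefficient can only increase the sum of squared residuals; ssr is a hypothetical helper name.

```python
import numpy as np

rng = np.random.default_rng(seed=42)
x = rng.uniform(0, 10, size=100)
y = 3.0 + 0.8 * x + rng.normal(scale=1.0, size=100)  # hypothetical data

b1, b0 = np.polyfit(x, y, deg=1)   # OLS fit with an intercept
resid = y - (b0 + b1 * x)

def ssr(c0, c1):
    """Hypothetical helper: sum of squared residuals for the line y = c0 + c1 * x."""
    return np.sum((y - (c0 + c1 * x)) ** 2)

# Property 1: residuals sum to (numerically) zero
print("sum of residuals:", resid.sum())
# Property 2: moving away from the OLS coefficients increases the squared error
print("SSR at OLS fit:  ", ssr(b0, b1))
print("SSR nudged:      ", ssr(b0 + 0.1, b1), ssr(b0, b1 + 0.01))
# Property 3: the line passes through (x_bar, y_bar)
print("line at x_bar:", b0 + b1 * x.mean(), " y_bar:", y.mean())
```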

Limitations of Regression Analysis
• Parameter instability: this happens in situations where correlations change over a period of time. This is very common in financial markets, where economic, tax, regulatory, and political factors change frequently.
• Public knowledge of a specific regression relation may cause a large number of people to react in a similar fashion towards the variables, negating its future usefulness.
• If any of the regression assumptions are violated, predictions of the dependent variable and hypothesis tests may not be valid.

General Multiple Linear Regression Model
• In simple linear regression, the dependent variable was assumed to depend on only one independent variable.
• In the general multiple linear regression model, the dependent variable derives its value from two or more independent variables.
• The general multiple linear regression model takes the following form:
$$Y_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + \dots + b_k X_{ki} + \varepsilon_i, \quad i = 1, \dots, n$$
where:
Yi = ith observation of the dependent variable Y
Xki = ith observation of the kth independent variable X
b0 = intercept term
bk = slope coefficient of the kth independent variable
εi = error term of the ith observation
n = number of observations
k = total number of independent variables

Estimated Regression Equation
• Just as we calculated the intercept and slope coefficient in simple linear regression by minimizing the sum of squared errors, we estimate the intercept and slope coefficients in multiple linear regression.
• The sum of squared errors, $\sum_{i=1}^{n} \varepsilon_i^2$, is minimized, and the intercept and slope coefficients are estimated.
• The resultant estimated equation becomes:
$$\hat{Y}_i = \hat{b}_0 + \hat{b}_1 X_{1i} + \hat{b}_2 X_{2i} + \dots + \hat{b}_k X_{ki}$$
• Now the error in the ith observation can be written as:
$$\varepsilon_i = Y_i - \hat{Y}_i = Y_i - \left(\hat{b}_0 + \hat{b}_1 X_{1i} + \hat{b}_2 X_{2i} + \dots + \hat{b}_k X_{ki}\right)$$
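In practice the minimization is usually done in matrix form: a column of ones (for b0) is placed next to the k predictors to form an n × (k + 1) design matrix X, and the estimates are the coefficients that minimize the sum of squared errors between Y and X times the coefficient vector. A minimal sketch assuming NumPy's np.linalg.lstsq, with simulated data whose "true" coefficients are hypothetical values picked for illustration:

```python
import numpy as np

rng = np.random.default_rng(seed=7)
n, k = 200, 3

# Hypothetical predictors X_1..X_k and true coefficients b_0..b_k
X_vars = rng.normal(size=(n, k))
b_true = np.array([1.0, 2.0, -0.5, 0.3])   # [b0, b1, b2, b3]

# Design matrix: a column of ones for the intercept, then the predictors
X = np.column_stack([np.ones(n), X_vars])
Y = X @ b_true + rng.normal(scale=0.5, size=n)

# Least squares: minimize the sum of squared errors ||Y - X b||^2
b_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)

Y_hat = X @ b_hat       # estimated regression equation
eps_hat = Y - Y_hat     # error in each observation
print("estimated coefficients:", np.round(b_hat, 3))
print("sum of squared errors:", round(float(eps_hat @ eps_hat), 3))
```

lstsq is used here instead of explicitly inverting XᵀX because it solves the same least squares problem in a numerically more stable way.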

Assumptions of Multiple Regression Model
• There exists a linear relationship between the dependent and independent variables.
• The expected value of the error term, conditional on the independent variables, is zero.
• The error terms are homoskedastic, i.e., the variance of the error term is constant across all observations.
• The expected value of the product of any two different error terms is zero, $E(\varepsilon_i \varepsilon_j) = 0$ for $i \neq j$, which implies that the error terms are uncorrelated with each other.
• The error term is normally distributed.
• There is no exact linear relationship among the independent variables (no perfect multicollinearity).
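Several of these assumptions can be checked informally from a fitted model. The sketch below assumes NumPy and simulated data. The specific checks are common diagnostics chosen for illustration, not procedures from the slides: the residual mean (zero-mean errors), the ratio of residual variances in the upper and lower halves of the fitted values (a rough homoskedasticity check), the Durbin-Watson statistic (near 2 when errors show no first-order autocorrelation), and the rank of the design matrix (no exact linear relationship among predictors). Normality is not checked here.

```python
import numpy as np

rng = np.random.default_rng(seed=3)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])            # intercept + 2 predictors
Y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)   # hypothetical data

b_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
fitted = X @ b_hat
resid = Y - fitted

# Zero-mean errors: residual mean should be close to 0
mean_resid = resid.mean()

# Homoskedasticity (rough check): residual variance in the lower and
# upper halves of the fitted values should be similar
order = np.argsort(fitted)
low, high = resid[order[: n // 2]], resid[order[n // 2:]]
var_ratio = high.var() / low.var()

# Uncorrelated errors: Durbin-Watson statistic, near 2 when there is
# no first-order autocorrelation in the residuals
dw = np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

# No exact linear relationship among predictors: full column rank
full_rank = np.linalg.matrix_rank(X) == X.shape[1]

# A deliberately violated case: adding 2 * x1 as a column makes X rank deficient
collinear = np.column_stack([X, 2.0 * x1])

print(f"mean residual = {mean_resid:.2e}")
print(f"variance ratio (high/low fitted) = {var_ratio:.2f}")
print(f"Durbin-Watson = {dw:.2f}")
print(f"full column rank: {full_rank}")
print("rank with collinear column:", np.linalg.matrix_rank(collinear), "of", collinear.shape[1])
```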
