2. REGRESSION ANALYSIS
• Regression was developed in the field of statistics and is studied as a
model for understanding the relationship between input and output
numerical variables
• In machine learning, regression is used as a supervised
learning technique to forecast numeric data and to quantify the
size and strength of the relationship between an outcome and its
predictors.
Applications:
• Predicting a stock market index from economic indicators
• Projecting a company's revenue from the amount spent on
advertising
3. Regression Models
• Regression techniques mostly differ based on the number of
independent variables and the type of relationship between the
independent and dependent variables.
5. Classification Based on the Relationship between Variables
• A linear relationship exists when two quantities change in
proportion to each other: if you increase one of the quantities,
the other increases or decreases at a constant rate.
• A nonlinear relationship is one in which a change in one
quantity does not correspond to a constant change in the other.
• The graph of a linear relation forms a straight line, whereas the
graph of a nonlinear relation is curved.
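As a tiny illustration in Python (the formulas y = 2x + 3 and y = x² are chosen purely for illustration):

# Linear relation: successive differences are constant.
linear = [2 * x + 3 for x in range(5)]     # [3, 5, 7, 9, 11] -> steps of 2, 2, 2, 2
# Nonlinear relation: successive differences keep changing.
nonlinear = [x ** 2 for x in range(5)]     # [0, 1, 4, 9, 16] -> steps of 1, 3, 5, 7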
6. Classification Based on the Number of Independent Variables
• Simple regression: a single independent variable is used to
predict the value of a dependent variable.
• Multiple regression: two or more independent variables are
used to predict the value of a dependent variable.
• The difference between the two is the number of independent
variables.
• Both of these models assume that the dependent variable is
continuous.
7. Simple Linear Regression
• Simple linear regression predicts the value of a
dependent variable (y) from a given independent
variable (x).
• This technique finds a linear relationship between
x (the input) and y (the output); hence the name
linear regression.
• The relationship between the target variable and the input
variable is established by fitting a line, known as the
regression line.
8. Simple Linear Regression
• A line can be represented by the linear equation
y = mx + c,
where y is the dependent variable, x is the independent variable,
m is the slope, and c is the intercept.
• The slope m indicates the change in y for each one-unit increase in x.
• The value c indicates the value of y when x = 0.
• It is known as the intercept because it specifies where the
line crosses the vertical axis.
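As a quick sketch in Python (the values m = 2 and c = 3 are chosen purely for illustration):

def line(x):
    # y = mx + c with illustrative values m = 2, c = 3
    return 2 * x + 3

print(line(0))            # 3 -> the intercept: the value of y when x = 0
print(line(5) - line(4))  # 2 -> the slope: the change in y per unit increase in x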
9. Simple Linear Regression
• In machine learning, we rewrite our equation as
y = w0 + w1x,
where the w's are the parameters of the model, x is the input,
and y is the target variable.
• Different values of w0 and w1 give us different lines.
• Performing a regression analysis involves finding
parameter estimates for w0 and w1.
10. Simple Linear Regression
• Once we find the best values of w0 and w1, we get the
best-fit straight line that establishes the relationship between
these variables.
• The best values of w0 and w1 are those that give the
minimum error for the given dataset.
• When we finally use the model for prediction, it
predicts the value of y for a given input value of x.
12. Least Squares Estimation
• To determine the optimal estimates of w0 and
w1, an estimation method known as ordinary least
squares (OLS) is used.
• In OLS regression, the slope and intercept are chosen so
that they minimize the sum of the squared errors,
• i.e., the vertical distances between the predicted y values
and the actual y values.
• These errors are known as residuals.
13. Least Squares Estimation
• In mathematical terms, the goal of OLS regression can be expressed
as the task of minimizing the following equation:

Σ ei² = Σ (yi - ŷi)²

• This equation defines e (the error) as the difference between the
actual y value and the predicted y value. The error values are
squared and summed across all points in the data.
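A minimal sketch of this objective in Python (the function name sum_squared_errors is just illustrative):

def sum_squared_errors(w0, w1, xs, ys):
    # e_i = y_i - (w0 + w1 * x_i); OLS chooses w0 and w1 to minimize the sum of the e_i squared.
    return sum((y - (w0 + w1 * x)) ** 2 for x, y in zip(xs, ys))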
14. Least Squares Estimation
• The following equations give the values of w0 and w1 for the
regression line that minimizes the sum of the squares of the
offsets ("the residuals") of the points from the line:

w1 = Σ (xi - x̄)(yi - ȳ) / Σ (xi - x̄)²   (sums taken over i = 1, …, D)
w0 = ȳ - w1 x̄
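A minimal sketch of these formulas in plain Python (the function name fit_simple_ols is just illustrative):

def fit_simple_ols(xs, ys):
    # Closed-form OLS estimates for the line y = w0 + w1 * x.
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # w1 = sum of (xi - x_bar)(yi - y_bar) over sum of (xi - x_bar) squared
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    w1 = num / den
    w0 = y_bar - w1 * x_bar
    return w0, w1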
15. • Obtain a linear regression for the data given in the table
below, assuming that y is the dependent variable.
16. Mean of x: x̄ = 66; mean of y: ȳ = 56.4

xi    xi - x̄    yi    yi - ȳ    (xi - x̄)(yi - ȳ)    (xi - x̄)²
55    -11        52    -4.4      48.4                 121
60    -6         54    -2.4      14.4                 36
65    -1         56    -0.4      0.4                  1
70    4          58    1.6       6.4                  16
80    14         62    5.6       78.4                 196
Sum                              148                  370

Applying w1 = Σ (xi - x̄)(yi - ȳ) / Σ (xi - x̄)²:
w1 = 148/370 = 0.4
w0 = 56.4 - 0.4 × 66 = 30
y = 0.4x + 30
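Running the fit_simple_ols sketch from above on this data reproduces the hand calculation:

xs = [55, 60, 65, 70, 80]
ys = [52, 54, 56, 58, 62]
w0, w1 = fit_simple_ols(xs, ys)
print(w0, w1)  # 30.0 0.4  ->  y = 0.4x + 30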
17. Regressing x on y instead (treating x as the dependent variable), with x̄ = 66 and ȳ = 56.4:

xi    xi - x̄    yi    yi - ȳ    (xi - x̄)(yi - ȳ)    (yi - ȳ)²
55    -11        52    -4.4      48.4                 19.36
60    -6         54    -2.4      14.4                 5.76
65    -1         56    -0.4      0.4                  0.16
70    4          58    1.6       6.4                  2.56
80    14         62    5.6       78.4                 31.36
Sum                              148                  59.2

w1 = 148/59.2 = 2.5
w0 = 66 - 2.5 × 56.4 = 66 - 141 = -75
x = 2.5y - 75
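Swapping the arguments in the same fit_simple_ols sketch reproduces this second line:

w0, w1 = fit_simple_ols(ys, xs)  # regress x on y instead
print(w0, w1)  # -75.0 2.5  ->  x = 2.5y - 75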
18. Multiple Linear Regression
• Most real-world analyses have more than one
independent variable
• Multiple regression is an extension of simple linear
regression.
• The goal in both cases is similar: find values of
coefficients that minimize the prediction error of a linear
equation.
• For simple linear regression: y = w0 + w1x
19. Multiple Linear Regression
• The key difference is that there are additional terms for
the additional independent variables.
• It is represented using the following equation:

y = α + β₁x₁ + β₂x₂ + … + βᵢxᵢ + ε

• y is specified as the sum of an intercept term α plus the
product of the estimated β value and the x value for
each of the i features, with ε as the error term.
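As a small sketch in Python (the names alpha, betas, and predict are purely illustrative), the prediction is just an intercept plus a weighted sum of the features:

def predict(alpha, betas, xs):
    # y = alpha + beta_1 * x_1 + ... + beta_i * x_i
    return alpha + sum(b * x for b, x in zip(betas, xs))

print(predict(1.0, [2.0, 3.0], [10.0, 20.0]))  # 1 + 2*10 + 3*20 = 81.0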
20. Multiple Linear Regression
• This can be re-expressed using a condensed matrix formulation:

Y = Xβ + ε
• The dependent variable is now a vector, Y, with a row
for every example.
• The independent variables have been combined into a
matrix, X, with a column for each feature plus an
additional column of '1' values for the intercept term.
21. Multiple Linear Regression
• The goal now is to solve for the vector β that minimizes the sum of the
squared errors between the predicted and actual y values.
• The best estimate of the vector β can be computed as:

β̂ = (XᵀX)⁻¹XᵀY

Example: The yield of rice per acre depends on the quality of seed, fertility of
soil, fertilizer used, temperature, and rainfall.
22. Multiple Linear Regression
• The regression coefficients β and errors ε are also now
vectors
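A minimal sketch of this computation with NumPy (the data values are toy numbers chosen purely for illustration):

import numpy as np

# Toy data: 5 examples, 2 features.
x1 = np.array([55.0, 60.0, 65.0, 70.0, 80.0])
x2 = np.array([1.0, 2.0, 2.0, 3.0, 4.0])
Y = np.array([52.0, 54.0, 56.0, 58.0, 62.0])

# Design matrix X: a column of 1s for the intercept, then one column per feature.
X = np.column_stack([np.ones_like(x1), x1, x2])

# Normal equations: beta = (X^T X)^(-1) X^T Y
beta = np.linalg.inv(X.T @ X) @ X.T @ Y
print(beta)  # [intercept, coefficient of x1, coefficient of x2]

In practice, np.linalg.lstsq(X, Y, rcond=None) is usually preferred over forming the inverse explicitly, for numerical stability.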