Regression
Linear Regression
• Simple Linear Regression
• Multiple Linear Regression
• Polynomial Regression
Non-Linear Regression
• Support Vector Regression (SVR)
• Decision Tree Regression
• Random Forest Regression
1. Simple Linear Regression
Simple linear regression fits a straight line y = b0 + b1*x through the data: one independent variable x and one dependent variable y.
Ordinary Least Squares
Ordinary Least Squares (OLS) chooses b0 and b1 so that the sum of squared residuals, Σ(yᵢ − ŷᵢ)², is as small as possible.
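A minimal sketch of the closed-form OLS fit, assuming 1-D NumPy arrays x and y (names are illustrative):
import numpy as np
# OLS estimates for y ≈ b0 + b1 * x
def fit_ols(x, y):
    # slope: covariance of x and y divided by the variance of x
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    # intercept: the fitted line passes through the mean point (x̄, ȳ)
    b0 = y.mean() - b1 * x.mean()
    return b0, b1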
2. Multiple Linear Regression
Multiple linear regression fits y = b0 + b1x1 + b2x2 + … + bnxn: many independent variables, one dependent variable.
A Caveat
Dummy Variables
Categorical predictors (e.g. State) are encoded as 0/1 dummy columns, one per category.
Dummy Variable Trap
Never include every dummy column together with the intercept: the dummy columns then sum to the intercept column (perfect multicollinearity). Always omit one dummy column.
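A minimal sketch of dummy encoding with pandas; the DataFrame and column names are illustrative:
import pandas as pd
df = pd.DataFrame({'State': ['New York', 'California', 'Florida', 'New York']})
# drop_first=True omits one category, avoiding the dummy variable trap
dummies = pd.get_dummies(df['State'], drop_first=True)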
Statistical Significance
Tossing a coin
H0: the coin is fair. H1: the coin is unfair.
If H0 is true, the probability of tossing heads every single time halves with each toss:
1 head in a row: 0.5
2 in a row: 0.25
3 in a row: 0.12
4 in a row: 0.06
5 in a row: 0.03
6 in a row: 0.01
P-Value (Probability Value)
The p-value is the probability of observing a result at least this extreme if H0 is true. Once it falls below the chosen significance level α = 0.05 (here, by the fifth consecutive head), we reject H0 and call the result statistically significant.
Building an ML Model Step-by-Step
Simple linear regression: only 1 independent and 1 dependent variable.
Multiple linear regression: many independent variables and 1 dependent variable.
Which variables should we include, which should we exclude, and why?
Selecting the Right Variables
5 Methods of Building Models
1. All-in
2. Backward Elimination
3. Forward Selection
4. Bidirectional Elimination
5. All Possible Models
Methods 2–4 are collectively referred to as Stepwise Regression.
All-in Model
Include all the variables in the model when:
• You have prior knowledge, from domain expertise or because a similar model has been built before.
• You have to, because of an existing framework or requirement in the organization.
Backward Elimination
• Step 1: Select a significance level to stay in the model (e.g. SL = 0.05).
• Step 2: Fit the model with all possible predictors.
• Step 3: Consider the predictor with the highest p-value. If P > SL, go to Step 4; otherwise the model is ready.
• Step 4: Remove that predictor.
• Step 5: Re-fit the model without it, then go back to Step 3.
Forward Selection
Start with no predictors; repeatedly add the single predictor whose addition is most statistically significant, and stop when no remaining predictor has P < SL.
Bidirectional Elimination
Combine the two: after each forward-selection step, run backward-elimination steps, so variables can both enter and leave the model.
All Possible Models
Fit every possible combination of predictors (2^n − 1 models for n predictors), score each with a goodness-of-fit criterion, and keep the best one. Thorough, but computationally very expensive.
Multiple Regression – Python Code
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Importing the dataset
dataset = pd.read_csv('50_Startups.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 4].values
# Encoding categorical data (the 'State' column, index 3)
# Note: OneHotEncoder(categorical_features=...) was removed from scikit-learn;
# ColumnTransformer is the current idiom. The encoded dummy columns come first.
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
ct = ColumnTransformer(transformers = [('encoder', OneHotEncoder(), [3])],
                       remainder = 'passthrough')
X = np.array(ct.fit_transform(X))
# Avoiding the Dummy Variable Trap
X = X[:, 1:]
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2,
random_state = 0)
# Fitting Multiple Linear Regression to the Training set
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)
# Predicting the Test set results
y_pred = regressor.predict(X_test)
Backward Elimination
# Building an optimal model using Backward Elimination
import statsmodels.api as sm
# statsmodels' OLS does not add an intercept, so prepend a column of ones for b0
X = np.append(arr = np.ones((50, 1)).astype(float), values = X, axis = 1)
# Step 2: fit the model with all possible predictors
X_opt = np.array(X[:, [0, 1, 2, 3, 4, 5]], dtype = float)
regressor_OLS = sm.OLS(endog = y, exog = X_opt).fit()
regressor_OLS.summary()  # Step 3: inspect p-values, find the highest one
# After repeatedly removing the predictor with the highest p-value > 0.05:
X_opt = np.array(X[:, [0, 3, 4, 5]], dtype = float)
regressor_OLS = sm.OLS(endog = y, exog = X_opt).fit()
regressor_OLS.summary()
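A minimal sketch of automating the same loop, assuming X already has the intercept column prepended as above; the function name and significance level are illustrative:
def backward_elimination(X, y, sl = 0.05):
    X_opt = np.array(X, dtype = float)
    while True:
        model = sm.OLS(endog = y, exog = X_opt).fit()
        if model.pvalues.max() <= sl:
            return model, X_opt
        # drop the predictor with the highest p-value and re-fit
        X_opt = np.delete(X_opt, model.pvalues.argmax(), axis = 1)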
3. Polynomial Linear Regression
Polynomial regression fits y = b0 + b1x + b2x² + … + bnxⁿ; it is still called linear because the model is linear in the coefficients.
Examine whether the data points follow a straight line or a parabolic curve.
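A minimal sketch of polynomial regression with scikit-learn, assuming a feature matrix X of shape (n, 1) and a target y; degree 2 fits a parabola:
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
poly = PolynomialFeatures(degree = 2)
X_poly = poly.fit_transform(X)  # adds the columns 1, x, x^2
poly_regressor = LinearRegression()
poly_regressor.fit(X_poly, y)
y_pred = poly_regressor.predict(poly.transform(X))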
4. Support Vector Regression
Nonlinear regression involves curves.
A Support Vector Machine (SVM) is a classifier that predicts discrete categorical labels; Support Vector Regression (SVR) is a regressor that predicts continuous ordered values. Both use very similar algorithms but predict different types of variables.
In simple regression we try to minimize the error, while in SVR we try to fit as many points as possible within a certain error threshold.
Support Vector Regression
• Kernel: the function used to map lower dimensional data into a higher dimensional space.
• Hyperplane: in SVM, this is the separating line between the data classes; in SVR, it is the line that helps us predict the continuous target value.
• Boundary Lines: in SVM, the two lines on either side of the hyperplane that create the margin between the classes; support vectors can lie on the boundary lines or outside them. In SVR the concept is the same.
• Support Vectors: the data points closest to the boundary; their distance to it is the smallest.
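A minimal sketch of SVR with scikit-learn, assuming a feature matrix X and a 1-D target y; the RBF kernel maps the data into a higher dimensional space:
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
# SVR is not scale-invariant, so both features and target are standardized
sc_X, sc_y = StandardScaler(), StandardScaler()
X_scaled = sc_X.fit_transform(X)
y_scaled = sc_y.fit_transform(y.reshape(-1, 1)).ravel()
svr_regressor = SVR(kernel = 'rbf')  # epsilon (default 0.1) sets the error tube
svr_regressor.fit(X_scaled, y_scaled)
# predictions come back on the scaled axis, so invert the target scaling
y_pred = sc_y.inverse_transform(svr_regressor.predict(X_scaled).reshape(-1, 1))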
5. Decision Tree Regression
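A decision tree regressor splits the feature space into regions and predicts the average of the training targets falling in each leaf, giving a piecewise-constant prediction. A minimal sketch with scikit-learn, assuming X and y:
from sklearn.tree import DecisionTreeRegressor
tree_regressor = DecisionTreeRegressor(random_state = 0)
tree_regressor.fit(X, y)
y_pred = tree_regressor.predict(X)  # the mean of the leaf each point falls into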
6. Random Forest Regression
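Random forest regression is an ensemble method: it trains many decision trees on bootstrap samples of the data and averages their predictions. A minimal sketch with scikit-learn, assuming X and y; the number of trees is illustrative:
from sklearn.ensemble import RandomForestRegressor
forest_regressor = RandomForestRegressor(n_estimators = 100, random_state = 0)
forest_regressor.fit(X, y)
y_pred = forest_regressor.predict(X)  # average of the individual trees' predictions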
Evaluating Regression Models Performance
R Squared – Simple Linear Regression
R² = 1 − SS_res / SS_tot, where SS_res = Σ(yᵢ − ŷᵢ)² is the sum of squared residuals and SS_tot = Σ(yᵢ − ȳ)² is the total sum of squares; the closer R² is to 1, the better the fit.
Adjusted R²
Adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1), with n observations and p predictors. Unlike R², it penalizes adding predictors that do not improve the model.
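A minimal sketch of computing both metrics on the earlier test split, assuming y_test, y_pred, and X_test from the multiple regression code above:
from sklearn.metrics import r2_score
r2 = r2_score(y_test, y_pred)
n, p = X_test.shape  # observations, predictors
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)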
Interpreting Linear Regression Coefficients
A coefficient bⱼ is read in the units of its variable: holding all other predictors fixed, a one-unit increase in xⱼ changes the prediction by bⱼ units. The sign gives the direction of the effect; magnitudes are only comparable across predictors after scaling.
Regression Model Pros and Cons