Mrs. Harsha Patil, Dr. D.Y. Patil ACS College, Pimpri, Pune
What is Regression:
 Regression analysis is a statistical method for modelling the relationship between a dependent (target) variable and one or more independent (predictor) variables.
 It helps us understand how the value of the dependent variable changes with respect to an independent variable when the other independent variables are held fixed. It predicts continuous/real values such as temperature, age, salary, price, etc.
 Regression is a supervised learning technique which helps in finding the correlation between variables and enables us to predict a continuous output variable based on one or more predictor variables. It is mainly used for prediction, forecasting, time-series modelling, and determining cause-and-effect relationships between variables.
 In Regression, we plot a graph between the variables which best fits the given data points; using this plot, the machine learning model can make predictions about the data. In simple words, "Regression shows a line or curve that passes through all the data points on the target-predictor graph in such a way that the vertical distance between the data points and the regression line is minimum." The distance between the data points and the line tells whether the model has captured a strong relationship or not.
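To make the idea of minimizing the vertical distances concrete, the following minimal sketch (with a small made-up dataset; the numbers are illustrative only, not from the slides) fits a least-squares line with NumPy and prints the residuals:
# Minimal sketch: fit a least-squares line and measure the vertical distances
# (residuals) between the data points and the line. Toy data assumed.
import numpy as np

x = np.array([1, 2, 3, 4, 5])        # predictor, e.g. years of experience
y = np.array([30, 35, 50, 55, 70])   # target, e.g. salary in thousands

slope, intercept = np.polyfit(x, y, deg=1)   # least-squares straight line
y_line = slope * x + intercept               # points on the fitted line

residuals = y - y_line                       # vertical distances to the line
print("slope:", slope, "intercept:", intercept)
print("sum of squared residuals:", np.sum(residuals ** 2))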
 Some examples of regression are:
 Prediction of rain using temperature and other factors
 Determining Market trends
 Prediction of road accidents due to rash driving.
 Terminologies Related to Regression:
 Dependent Variable: The main factor in regression analysis which we want to predict or understand is called the dependent variable. It is also called the target variable.
 Independent Variable: The factors which affect the dependent variable, or which are used to predict its values, are called independent variables, also called predictors.
 Outliers: An outlier is an observation which contains either a very low value or a very high value in comparison to other observed values. An outlier may hamper the results, so it should be avoided.
 Multicollinearity: If the independent variables are highly correlated with each other, then such a condition is called multicollinearity. It should not be present in the dataset, because it creates problems while ranking the most influential variable.
 Underfitting and Overfitting: If our algorithm works well with the training dataset but not with the test dataset, then such a problem is called overfitting. And if our algorithm does not perform well even with the training dataset, then such a problem is called underfitting.
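A quick, practical way to spot these two problems is to compare the model's score on the training set with its score on the test set. The following minimal sketch uses a synthetic dataset and a decision tree purely for illustration (the data and model are assumptions, not part of the slides):
# Hedged sketch: detecting overfitting/underfitting by comparing train and test scores.
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=200, n_features=5, noise=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)

train_score = model.score(X_train, y_train)   # R^2 on data the model has seen
test_score = model.score(X_test, y_test)      # R^2 on unseen data
print("train R^2:", train_score, "test R^2:", test_score)
# A high train score with a much lower test score suggests overfitting;
# low scores on both suggest underfitting.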
 Types of Regression:
There are various types of regression used in data science and machine learning.
 Linear Regression
 Logistic Regression
 Polynomial Regression
 Support Vector Regression
 Decision Tree Regression
 Random Forest Regression
 Ridge Regression
 Lasso Regression
Linear Regression:
 Linear regression is a statistical regression method which is used for
predictive analysis.
 It is one of the simplest and easiest algorithms; it works on regression and shows the relationship between continuous variables.
 It is used for solving the regression problem in machine learning.
 Linear regression shows the linear relationship between the independent
variable (X-axis) and the dependent variable (Y-axis), hence called linear
regression.
 If there is only one input variable (x), then such linear regression is
called simple linear regression. And if there is more than one input
variable, then such linear regression is called multiple linear regression.
 The relationship between variables in the linear regression model can be explained using the below image. Here we are predicting the salary of an employee on the basis of years of experience.
 Some popular applications of linear regression are:
 Analyzing trends and sales estimates
 Salary forecasting
 Real estate prediction
 Arriving at ETAs in traffic.
1. Simple Linear Regression:
 Simple Linear Regression is a type of regression algorithm that models the relationship between a dependent variable and a single independent variable. The relationship shown by a Simple Linear Regression model is linear, or a sloped straight line, hence it is called Simple Linear Regression.
 The key point in Simple Linear Regression is that the dependent variable must
be a continuous/real value. However, the independent variable can be
measured on continuous or categorical values.
 Simple Linear regression algorithm has mainly two objectives:
 Model the relationship between the two variables. Such as the
relationship between Income and expenditure, experience and Salary, etc.
 Forecasting new observations. Such as Weather forecasting according to
temperature, Revenue of a company according to the investments in a year,
etc.
 Recall the geometry lesson from high school. What is the equation of a line?
y = mx + c
Linear regression is nothing but a manifestation of this simple equation.
Where,
 y is the dependent variable i.e. the variable that needs to be estimated and
predicted.
 x is the independent variable i.e. the variable that is controllable. It is the
input.
 m is the slope. It determines what will be the angle of the line. It is the
parameter denoted as β.
 c is the intercept. A constant that determines the value of y when x is 0.
 We may recognize the equation for simple linear regression as the equation
for a sloped line on an x and y axis.
y = b0 + b1 * x1
Where,
 b0 is the constant (intercept).
 y is the dependent variable.
 b1 is a coefficient that can be thought of as a multiplier connecting the independent and dependent variables. It translates how much y will be affected by a unit change in x. In other words, a change in x does not usually mean an equal change in y.
 x1 is an independent variable.
 Simple Linear Regression in Python :
#importing libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Importing the dataset
dataset = pd.read_csv('salary_data.csv')
X = dataset.iloc[:, :-1].values   # feature matrix (years of experience)
y = dataset.iloc[:, 1].values     # target (salary)
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1/3,
random_state=0)
# Fitting Simple Linear Regression to the Training set
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)
# Visualizing the Training set results
viz_train = plt
viz_train.scatter(X_train, y_train, color='red')
viz_train.plot(X_train, regressor.predict(X_train), color='blue')
viz_train.title('Salary VS Experience (Training set)')
viz_train.xlabel('Year of Experience')
viz_train.ylabel('Salary')
viz_train.show()
# Visualizing the Test set results
viz_test = plt
viz_test.scatter(X_test, y_test, color='red')
viz_test.plot(X_train, regressor.predict(X_train), color='blue')
viz_test.title('Salary VS Experience (Test set)')
viz_test.xlabel('Year of Experience')
viz_test.ylabel('Salary')
viz_test.show()
 After running the above code (excluding the explanatory text), you will see two plots in the console window as shown below:
 One plot is from the training set and the other from the test set. The blue lines point in the same direction, so our model is good to use now.
 Now we can use it to predict the value of y for any given X. This can be done by using the predict() function as follows:
# Predicting the result of 5 Years Experience
y_pred = regressor.predict(np.array([5]).reshape(1, 1))
Output :
The value of y_pred with X = 5 (5 Years Experience) is 73545.90
You can offer to your candidate the salary of ₹ 73,545.90
and this is the best salary for him!
 In conclusion, with Simple Linear Regression, we have to do 5 steps as per below:
 Importing the dataset.
 Splitting the dataset into a training set and a testing set (X and y for each set). Normally, the testing set should be 5% to 30% of the dataset.
 Visualize the training set and testing set to double check (you can bypass
this step if you want).
 Initializing the regression model and fitting it using training set (both X and
y).
 Let’s predict!!
We can also pass an array of X (of test set):
 # Predicting the Test set results
y_pred = regressor.predict(X_test)
Predict y_pred using array of X_test
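To judge how close these test-set predictions are to the actual salaries, standard regression metrics can be applied. The short sketch below is a hedged addition that continues the snippet above, reusing its y_test and y_pred arrays:
# Evaluating the test-set predictions with common regression metrics
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

print("MAE:", mean_absolute_error(y_test, y_pred))   # average absolute error
print("MSE:", mean_squared_error(y_test, y_pred))    # average squared error
print("R^2:", r2_score(y_test, y_pred))              # proportion of variance explained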
2. Multiple Linear Regression:
 We have seen the concept of simple linear regression where a single predictor variable x (years of experience) was used to model the response variable y (salary). In many applications, there is more than one factor that affects the response. Multiple regression models describe how a single response variable y depends linearly on a number of predictor variables.
 For Examples:
 The selling price of a house can depend on the desirability of the location,
the number of bedrooms, the number of bathrooms, the year the house was
built, the square footage of the plot and a number of other factors.
 The height of a child can depend on the height of the mother, the height of the father, nutrition, and environmental factors.
 Multiple linear regression works the same way as that of simple linear
regression, except for the introduction of more independent variables and
their corresponding coefficients.
 In Simple Linear Regression we dealt with the equation:
y = b0 + b1 * x1
Correspondingly, the Multiple Linear Regression equation becomes:
y = b0 + b1 * x1 + b2 * x2 + b3 * x3 + ... + bn * xn
or, equivalently,
y = b0 + Σ (i = 1 to n) bi * xi
 In other words, the predicted value y is the sum of all features multiplied by their coefficients, plus the base coefficient b0.
Where,
 y is dependent variable/ predicted value.
 xi – features / independent variable / explanatory variable / observed
variable
 b0 is constant
 bi are coefficients that can be thought of as multipliers connecting the independent and dependent variables. They translate how much y will be affected by a unit change in the corresponding x. In other words, a change in x does not usually mean an equal change in y.
 So, simplified: we are predicting what the value of y will be depending on the features xi, and with the coefficients bi we are deciding how much each feature affects the predicted value.
Multiple Linear Regression in Python :
#Importing libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
# Importing the dataset
dataset = pd.read_csv('salary_data.csv')
X = dataset.iloc[:, :-1].values   # all feature columns
y = dataset.iloc[:, 4].values     # target (salary) in the fifth column
#Splitting the dataset into the Training set and Test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)
# Fitting Multiple Linear Regression to the Training set.
regressor = LinearRegression()
regressor.fit(X_train, y_train)
 Now that we have the Multiple Linear Regression model, we can use it to calculate (predict) the value of y for any new values of x. This is how we do it:
'''Predicting the salary of a new employee with 5 years of total experience,
2 years as team lead, 1 year as project manager and 2 certifications'''
x_new = [[5],[2],[1],[2]]
y_pred = regressor.predict(np.array(x_new).reshape(1, 4))
print(y_pred)
accuracy = (regressor.score(X_test,y_test))
print(accuracy)
Output :
The value of y_pred with x_new = [[5],[2],[1],[2]](5 Years of total Experience,
2 years as team lead, one year as project manager and 2 Certifications) is ₹
48017.20
You can offer to your candidate the salary of ₹48017.20 and this is the
best salary for him!
3. Polynomial Linear Regression:
 Polynomial regression is a form of regression analysis in which the relationship between the independent variable x and the dependent variable y is modelled as an nth-degree polynomial in x.
 Polynomial regression fits a nonlinear relationship between the value of x and
the corresponding conditional mean of y, denoted E(y |x).
 Although polynomial regression fits a nonlinear model to the data, as
a statistical estimation problem it is linear, in the sense that the regression
function E(y | x) is linear in the unknown parameters that are estimated from
the data.
 For this reason, polynomial regression is considered to be a special case
of multiple linear regression.
 For Example: The increment of salary of employees per year is often non-linear. We may express it in terms of a polynomial equation as
y = b0 + b1x + b2x^2 + b3x^3 + ...... + bnx^n
where,
 b0 is the constant.
 y is the dependent variable.
 bi are coefficients that can be thought of as multipliers connecting the independent and dependent variables. They translate how much y will be affected by a change in a power of x. In other words, a change in x does not usually mean an equal change in y.
 x is an independent variable.
 Let us consider a dataset for this kind of example that represents a polynomial shape.
 To get an overview of the increment of salary, let's visualize the dataset in a chart:
 Let's think about our candidate. He has 5.5 years of experience. What if we use Linear Regression in this example?
Polynomial Linear Regression in Python :
#Importing libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Importing the dataset
dataset = pd.read_csv('position_salaries.csv')
X = dataset.iloc[:, 1:2].values
y = dataset.iloc[:, 2].values
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=0)
# Fitting Polynomial Regression to the dataset
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
poly_reg = PolynomialFeatures(degree=4)
X_poly = poly_reg.fit_transform(X)
pol_reg = LinearRegression()
pol_reg.fit(X_poly, y)
# Visualizing the Polynomial Regression results
def viz_polynomial():
    plt.scatter(X, y, color='red')
    plt.plot(X, pol_reg.predict(poly_reg.fit_transform(X)), color='blue')
    plt.title('Truth or Bluff (Polynomial Regression)')
    plt.xlabel('Position level')
    plt.ylabel('Salary')
    plt.show()
    return
viz_polynomial()
# Additional feature
# Making the plot line (blue one) smoother
def viz_polynomial_smooth():
    X_grid = np.arange(min(X), max(X), 0.1)
    X_grid = X_grid.reshape(len(X_grid), 1)
    # Visualizing the Polynomial Regression results
    plt.scatter(X, y, color='red')
    plt.plot(X_grid, pol_reg.predict(poly_reg.fit_transform(X_grid)), color='blue')
    plt.title('Truth or Bluff (Polynomial Regression)')
    plt.xlabel('Position level')
    plt.ylabel('Salary')
    plt.show()
    return
viz_polynomial_smooth()
 After calling the viz_polynomial() function, you can see a plot as per below:
Last step, let's predict the value for our candidate (with 5.5 years of experience) using the Polynomial Regression model:
# Predicting a new result with Polymonial Regression
print(pol_reg.predict(poly_reg.fit_transform([[5.5]])))
Output:
It's time to let our candidate know that we will offer him a best-in-class salary of ₹ 132,148!
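To see why a straight line is a poor choice here, the following hedged sketch (reusing the X, y, poly_reg and pol_reg objects defined in the earlier snippets) also fits a plain LinearRegression on the same data and compares both predictions for 5.5 years of experience; the exact numbers depend on the dataset:
# Sketch: comparing a straight-line fit with the degree-4 polynomial fit above
from sklearn.linear_model import LinearRegression

lin_reg = LinearRegression()
lin_reg.fit(X, y)                     # plain straight-line fit on the same data

linear_pred = lin_reg.predict([[5.5]])
poly_pred = pol_reg.predict(poly_reg.fit_transform([[5.5]]))

print("Linear prediction for 5.5:", linear_pred)
print("Polynomial prediction for 5.5:", poly_pred)
# On clearly non-linear salary data the two predictions can differ substantially,
# which is why the polynomial model is preferred here.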
Support Vector Regression (SVR):
 As we know, linear regression models a line to depict the data points, while support vector regression (SVR) models a hyperplane to cover the data points.
 The hyperplane is an area with a margin of tolerance, set for the approximation of data points. The training instances within the hyperplane that help to define the margin are called Support Vectors.
 SVR tries to have as many support vectors as possible within the boundary lines without much margin violation, thus keeping the error within the threshold decided by the boundary lines of the hyperplane.
 For Example: Stock price prediction as shown below
 Intuitively, support vectors contribute to the error ε made by the SVR, called the threshold, and thus we want most of the support vectors to be within that threshold. We can model SVR through kernels that indicate the similarity measure between the test data point and the support vectors.
 A kernel is a set of mathematical functions which takes data as input and transforms it into the form required by the output.
 Support Vector Regression supports linear and non-linear regression. As seen in the above graph, the mission is to fit as many instances as possible between the lines while limiting the margin violations. The margin violation in this example is represented as ε (epsilon).
 In regression problems, we generally try to find a line that best fits the data provided. The equation of the line in its simplest form is y = mx + c.
 In the case of support vector regression, we do something similar but with a slight change. Here we define a small error value e (error = prediction - actual).
 The value of e determines the width of the error tube (also called insensitive
tube or hyper plane). The value of e determines the number of support
vectors, and a smaller e value indicates a lower tolerance for error.
 Thus, we try to find the line’s best fit in such a way that:
(mx+c)-y ≤ e and y-(mx+c) ≤ e
 Also, we do not care about errors as long as they are less than e.
 For example, if we're dealing with stock trading and want to minimize the trading loss, we do not care about losses as long as they are less than a certain value (e).
 Hence, the support vector regression model depends only on a subset of the
training data points, as the cost function of the model ignores any training
data close to the model prediction when the error is less than e.
 In the field of machine learning, a support vector regression algorithm can, in
some cases, be more suitable for regression problems than other common
and popular algorithms. Below are the cases where a support vector
regression is advantageous over other regression algorithms:
 SVR is memory efficient, which means it takes a relatively lower amount of computational resources to train the model. This is because presenting the solution by means of a small subset of training points gives enormous computational advantages.
 There are non-linear or complex relationships between features and labels.
This is because we have the option to convert non-linear relationships to
higher-dimensional problems in the case of support vector regression.
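Although the slides do not include SVR code, a minimal sketch with scikit-learn's SVR class is shown below; the synthetic data, the RBF kernel and the C and epsilon values are illustrative assumptions only:
# Minimal sketch of Support Vector Regression with scikit-learn
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
X = np.sort(5 * rng.rand(80, 1), axis=0)        # single feature
y = np.sin(X).ravel() + 0.1 * rng.randn(80)     # noisy non-linear target

scaler = StandardScaler()                       # feature scaling usually matters for SVR
X_scaled = scaler.fit_transform(X)

# epsilon sets the width of the error tube; C controls how much margin violation is tolerated
svr = SVR(kernel='rbf', C=1.0, epsilon=0.1)
svr.fit(X_scaled, y)

print("Number of support vectors:", len(svr.support_))
print("Prediction at x = 2.5:", svr.predict(scaler.transform([[2.5]])))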
Decision Tree Regression:
 Decision trees are supervised learning algorithms used for both classification and regression.
 Decision trees belong to the information-based learning algorithms, which use different measures of information gain for learning. We can use decision trees for problems where we have continuous as well as categorical input and target features.
 The main idea of decision trees is to find the descriptive features which contain the most "information" regarding the target feature, and then split the dataset along the values of these features such that the target feature values of the resulting sub-datasets are as pure as possible.
 The descriptive feature which leaves the target feature values purest is said to be the most informative one.
 This process of finding the "most informative" feature is repeated until we reach a stopping criterion, where we finally end up in so-called leaf nodes.
 The leaf nodes contain the predictions we will make for new query instances presented to our trained model.
 This is possible since the model has, in a sense, learned the underlying structure of the training data and hence can, given some assumptions, make predictions about the target feature value (class) of unseen query instances.
 A decision tree mainly consists of a root node, interior nodes, and leaf nodes which are connected by branches.
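As a concrete, hedged illustration, the sketch below fits scikit-learn's DecisionTreeRegressor on a small made-up dataset and prints the learned root, interior and leaf nodes as text; the data and the max_depth value are assumptions for demonstration:
# Sketch of Decision Tree Regression on a tiny made-up dataset
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

X = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])   # e.g. position level
y = np.array([20, 25, 30, 45, 60, 80, 110, 150])         # e.g. salary (thousands)

# max_depth limits how far the tree keeps splitting (a simple stopping criterion)
tree = DecisionTreeRegressor(max_depth=3, random_state=0)
tree.fit(X, y)

print(export_text(tree, feature_names=['level']))   # shows root, interior and leaf nodes
print("Prediction for level 6.5:", tree.predict([[6.5]]))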
 Decision trees are sensitive to the specific data on which they are trained. If
the training data is changed the resulting decision tree can be quite different
and in turn the predictions can be quite different.
 Also, decision trees are computationally expensive to train and carry a big risk of overfitting (the learning system fits the given training data so tightly that it becomes inaccurate in predicting the outcomes of untrained data; in decision trees, overfitting occurs when the tree is designed so as to perfectly fit all samples in the training data set). They also tend to find local optima because they can't go back after they have made a split.
 To solve these weaknesses, we use Random Forest which illustrates the
power of combining many decision trees into one model.
Random Forest Regression:
 Random forest is a supervised learning algorithm which uses the ensemble learning method for classification and regression.
 An ensemble method is a technique that combines the predictions from multiple machine learning algorithms to make more accurate predictions than any individual model. A model comprised of many models is called an ensemble model.
Types of Ensemble Learning:
 Boosting.
 Bootstrap Aggregation (Bagging).
1. Boosting
Boosting refers to a group of algorithms that utilize weighted averages to turn weak learners into stronger learners. Boosting is all about “teamwork”: each model that runs dictates which features the next model will focus on. In boosting, as the name suggests, one learner learns from another, which in turn boosts the learning.
2. Bootstrap Aggregation (Bagging)
Bootstrapping allows us to better understand the bias and the variance within the dataset; it involves random sampling of small subsets of data from the dataset. Bagging makes each model run independently and then aggregates the outputs at the end without preference to any model.
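Putting bagging into practice, the hedged sketch below trains scikit-learn's RandomForestRegressor, i.e. a bagged ensemble of decision trees, on a synthetic dataset; the data and parameter choices are illustrative assumptions:
# Minimal sketch of Random Forest Regression: many trees on bootstrap samples, averaged
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=300, n_features=4, noise=15, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# n_estimators is the number of trees; each tree sees a bootstrap sample of the data
forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

print("Test R^2:", forest.score(X_test, y_test))
print("Prediction for the first test row:", forest.predict(X_test[:1]))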
Thanks !!!
