Course – Big Data Analytics (Professional Elective-II)
Course code-IT314B
Unit-II- ADVANCED ANALYTICAL THEORY AND METHODS USING PYTHON
Sanjivani Rural Education Society’s
Sanjivani College of Engineering, Kopargaon-423603
(An Autonomous Institute Affiliated to Savitribai Phule Pune University, Pune)
NAAC ‘A’ Grade Accredited, ISO 9001:2015 Certified
Department of Information Technology
(NBA Accredited)
Mr. Rajendra N Kankrale
Asst. Prof.
BDA- Unit-II Regression Department of IT
Unit-II- ADVANCED ANALYTICAL THEORY AND METHODS USING PYTHON
• Syllabus
ADVANCED ANALYTICAL THEORY AND METHODS USING PYTHON
Introduction to Scikit-learn,
Installations, Dataset, matplotlib, filling missing values,
Regression and Classification using Scikit-learn
Association Rules: FP growth,
Regression: Linear Regression, Logistic Regression,
Classification: Naïve Bayes classifier
Unit-II- Regression
• Motivation
• Regression estimates the relationship between a target (dependent) variable and one or more independent variables.
• It is used to find trends in data.
• It helps to predict real-valued (continuous) quantities.
• Regression helps identify which factors have the strongest and the weakest effect on the target, and how each factor affects it.
Linear Regression
• Linear regression is a supervised machine learning algorithm.
• It finds the linear relationship that best describes the data.
• It assumes that a linear relationship exists between a dependent variable and one or more independent variables.
• The dependent variable in a linear regression model is continuous, i.e. it takes real-number values.
Representing Linear Regression Model
• A linear regression model represents the linear relationship between a dependent variable and independent variable(s) as a sloped straight line.
• The straight line that best fits the given data is called the regression line.
• It is also called the line of best fit.
Types of Linear Regression
1. Simple Linear Regression
2. Multiple Linear Regression
Simple Linear Regression
For simple linear regression, the model has the form:
Y = β0 + β1X
Here,
Y is a dependent variable.
X is an independent variable.
β0 and β1 are the regression coefficients.
β0 is the intercept (or bias). It sets the offset of the line, i.e. the value of Y when X = 0.
β1 is the slope (or weight). It specifies how much Y changes for a one-unit increase in X.
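The slides do not show how β0 and β1 are obtained. Below is a minimal sketch that assumes the ordinary least-squares criterion, which minimizes the sum of squared errors and is what scikit-learn's LinearRegression uses. Under that criterion the slope is β1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)² and the intercept is β0 = ȳ − β1·x̄. The helper name `fit_simple_ols` is hypothetical. The sample data are the six points used in the scikit-learn walkthrough later in this unit.

```python
import numpy as np


def fit_simple_ols(x, y):
    """Hypothetical helper: return (beta0, beta1) for Y = beta0 + beta1*X by least squares."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: covariance of x and y divided by the variance of x (the 1/m factors cancel).
    beta1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: the least-squares line always passes through (x_mean, y_mean).
    beta0 = y_mean - beta1 * x_mean
    return beta0, beta1


x = [5, 15, 25, 35, 45, 55]
y = [5, 20, 14, 32, 22, 38]
beta0, beta1 = fit_simple_ols(x, y)
print(f"beta0 = {beta0:.4f}, beta1 = {beta1:.4f}")  # beta0 = 5.6333, beta1 = 0.5400
```

These values agree with the intercept and slope that scikit-learn reports for the same data.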
Three cases are possible, depending on the sign of β1:
Case-01: β1 < 0
X has a negative effect on Y.
As X increases, Y decreases, and vice versa.
Case-02: β1 = 0
• X has no (linear) effect on Y.
• Changing X does not change Y.
Case-03: β1 > 0
X has a positive effect on Y.
As X increases, Y increases, and vice versa.
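This sketch shows the three cases numerically. It fits a straight line to three small synthetic datasets, made up for illustration, using NumPy's `np.polyfit`. With degree 1, `np.polyfit` returns the slope first and the intercept second. The script then reports which case the fitted β1 falls into. The zero case uses a tolerance because floating-point estimates are rarely exactly 0. The helper `slope_case` is hypothetical.

```python
import numpy as np

x = np.arange(10, dtype=float)
datasets = {
    "decreasing": 3.0 - 2.0 * x,   # true beta1 = -2
    "flat": np.full_like(x, 4.0),  # true beta1 = 0
    "increasing": 1.0 + 2.0 * x,   # true beta1 = +2
}


def slope_case(beta1, tol=1e-9):
    """Hypothetical helper: map a fitted slope onto Case-01/02/03."""
    if np.isclose(beta1, 0.0, atol=tol):
        return "Case-02: X has no effect on Y"
    return "Case-03: positive effect" if beta1 > 0 else "Case-01: negative effect"


results = {}
for name, y in datasets.items():
    beta1, beta0 = np.polyfit(x, y, 1)  # coefficients, highest degree first
    results[name] = slope_case(beta1)
    print(f"{name:>10}: beta1 = {beta1:+.3f} -> {results[name]}")
```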
Multiple Linear Regression
In multiple linear regression, the dependent variable depends on more than one independent variable.
For multiple linear regression, the model has the form:
Y = β0 + β1X1 + β2X2 + β3X3 + … + βnXn
Here,
Y is the dependent variable.
X1, X2, …, Xn are the independent variables.
β0, β1, …, βn are the regression coefficients.
βj (1 ≤ j ≤ n) is the slope (or weight) of Xj. It is the change in Y for a one-unit increase in Xj when the other independent variables are held constant.
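This is a minimal sketch of fitting the multiple-regression form with scikit-learn's LinearRegression, assuming two independent variables. The data are synthetic and are generated exactly from Y = 1 + 2·X1 + 3·X2, with made-up coefficients. The fitted `intercept_` should therefore recover β0 = 1 and `coef_` should recover (β1, β2) = (2, 3), up to floating-point error.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Each row is one observation; the columns are X1 and X2.
X = np.array(
    [[0, 1], [5, 1], [15, 2], [25, 5], [35, 11], [45, 15], [55, 34], [60, 35]],
    dtype=float,
)
# Synthetic target generated without noise from Y = 1 + 2*X1 + 3*X2.
y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1]

model = LinearRegression().fit(X, y)
print("beta0:", model.intercept_)    # close to 1
print("beta1, beta2:", model.coef_)  # close to [2, 3]
```

With real, noisy data the recovered coefficients are estimates and will not match any "true" values exactly.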
Evaluation metrics for a linear regression model
Mean Squared Error (MSE)
MSE is the most common metric for regression tasks. It is the average of the squared differences between the predicted and actual values.
For linear regression, MSE is differentiable and a convex function of the coefficients, which makes it easy to optimize.
Because the errors are squared, MSE penalizes large errors heavily.
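This uses the notation defined for R² in this unit (yi observed, ŷi predicted, m observations). In that notation, MSE = (1/m) Σ (yi − ŷi)². The minimal sketch below computes MSE directly with NumPy and checks it against scikit-learn's `mean_squared_error`. The four target/prediction pairs are made up for illustration.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

y_true = np.array([3.0, -0.5, 2.0, 7.0])  # observed values y_i
y_pred = np.array([2.5, 0.0, 2.0, 8.0])   # predicted values ŷ_i

# Average of the squared residuals: (0.25 + 0.25 + 0 + 1) / 4
mse_manual = np.mean((y_true - y_pred) ** 2)
mse_sklearn = mean_squared_error(y_true, y_pred)
print(mse_manual, mse_sklearn)  # 0.375 0.375
```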
Mean Absolute Error (MAE)
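MAE is the average of the absolute differences between the target values and the values predicted by the model.
Because the errors are not squared, MAE does not penalize large errors as heavily as MSE does. This makes MAE more robust to outliers. It also makes MAE less suitable when large errors must be strongly penalized.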
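In the same notation, MAE = (1/m) Σ |yi − ŷi|. This minimal sketch computes MAE by hand and with scikit-learn's `mean_absolute_error`, using four made-up target/prediction pairs.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

# Average of the absolute residuals: (0.5 + 0.5 + 0 + 1) / 4
mae_manual = np.mean(np.abs(y_true - y_pred))
mae_sklearn = mean_absolute_error(y_true, y_pred)
print(mae_manual, mae_sklearn)  # 0.5 0.5
```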
Root Mean Squared Error (RMSE)
As the name suggests, RMSE is the square root of the mean squared error. Unlike MSE, it is expressed in the same units as the target variable, which makes it easier to interpret.
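RMSE = √MSE. This minimal sketch takes the square root of `mean_squared_error`, which works across scikit-learn versions. Recent releases also provide a dedicated `root_mean_squared_error` function. The values are made up for illustration.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

# RMSE = sqrt(MSE); taking the root puts the error back in the units of y.
rmse_manual = np.sqrt(np.mean((y_true - y_pred) ** 2))
rmse_sklearn = np.sqrt(mean_squared_error(y_true, y_pred))
print(round(rmse_manual, 4), round(rmse_sklearn, 4))  # 0.6124 0.6124
```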
R-squared (R²)
R-squared, also called the coefficient of determination, measures the proportion of the variance in the dependent variable that is explained by the independent variable(s).
R² is a popular measure of how well a regression model fits the data. It indicates how close the data points are to the fitted regression line. A larger R² indicates a better fit, and therefore a stronger relationship between the independent variable(s) and the dependent variable.
R² = 1 − SSE/SST, where:
• SSE (sum of squared errors) is the sum of the squared differences between the actual and predicted values: SSE = Σ (yi − ŷi)².
• SST (total sum of squares) is the sum of the squared differences between the actual values and their mean: SST = Σ (yi − ȳ)².
• yi is the observed target value, ŷi is the predicted value, ȳ (y-bar) is the mean of the observed values, and the sums run over all m observations.
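This minimal sketch implements R² = 1 − SSE/SST directly and checks the result against scikit-learn's `r2_score`. It also shows two reference points. Always predicting ȳ gives R² = 0, and predictions worse than that baseline give a negative R². The helper `r_squared` and all the numbers are made up for illustration.

```python
import numpy as np
from sklearn.metrics import r2_score


def r_squared(y_true, y_pred):
    """Hypothetical helper: R^2 = 1 - SSE/SST."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    sse = np.sum((y_true - y_pred) ** 2)          # unexplained variation
    sst = np.sum((y_true - y_true.mean()) ** 2)   # total variation around the mean
    return 1.0 - sse / sst


y_true = np.array([3.0, -0.5, 2.0, 7.0])
good_pred = np.array([2.5, 0.0, 2.0, 8.0])
mean_pred = np.full_like(y_true, y_true.mean())  # baseline: always predict y-bar
bad_pred = y_true[::-1]                           # deliberately poor predictions

for name, pred in [("good", good_pred), ("mean", mean_pred), ("bad", bad_pred)]:
    print(f"{name:>4}: manual={r_squared(y_true, pred):.4f}  sklearn={r2_score(y_true, pred):.4f}")
```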
• For a least-squares model with an intercept, R² lies between 0 and 1 when evaluated on the data it was fitted to. On new data it can be negative. The closer R² is to 1, the better the model fits. R² = 0 means the model does no better than always predicting the mean of the target. A negative R² means it does worse than that baseline.
• A small MAE (relative to the scale of the target) indicates accurate predictions. A large MAE suggests the model performs poorly on at least some observations. An MAE of 0 means the model predicts every output exactly.
• Because MSE squares the errors, outliers in the dataset are penalized most heavily and inflate the MSE. In short, MSE is not robust to outliers, whereas MAE is comparatively robust.
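The outlier behaviour can be checked with a small made-up example. In the first case, every prediction is off by 0.5. In the second case, one of those predictions is replaced by one that is off by 10.5. Both metrics come from scikit-learn.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_true = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
pred_clean = y_true + 0.5          # every prediction off by 0.5
pred_outlier = pred_clean.copy()
pred_outlier[-1] = 15.5            # one prediction now off by 10.5

for name, pred in [("clean", pred_clean), ("one outlier", pred_outlier)]:
    mae = mean_absolute_error(y_true, pred)
    mse = mean_squared_error(y_true, pred)
    print(f"{name:>11}: MAE = {mae:.2f}, MSE = {mse:.2f}")
# MAE grows from 0.50 to 2.50 (5x), MSE from 0.25 to 22.25 (89x)
```

A single bad prediction multiplies MAE by 5 but MSE by 89. This is the practical meaning of "MSE is not robust to outliers".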
Simple Linear Regression With scikit-learn
• You’ll start with the simplest case, which is simple linear regression. There
are five basic steps when you’re implementing linear regression:
1. Import the packages and classes that you need.
2. Provide data to work with and, if needed, apply appropriate transformations.
3. Create a regression model and fit it with existing data.
4. Check the results of model fitting to know whether the model is
satisfactory.
5. Apply the model for predictions.
• Step 1: Import packages and classes
• The first step is to import the package numpy and the class LinearRegression from sklearn.linear_model:
    >>> import numpy as np
    >>> from sklearn.linear_model import LinearRegression
• Step 2: Provide data
• The second step is defining data to work with. The inputs (regressors, 𝑥) and output (response, 𝑦) should be arrays or similar objects. This is the simplest way of providing data for regression:
    >>> x = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))
    >>> y = np.array([5, 20, 14, 32, 22, 38])
• Now you have two arrays: the input, x, and the output, y. You call .reshape() on x because scikit-learn expects the input to be two-dimensional, with one row per observation and one column per feature. Here there is a single feature, so x must have one column and as many rows as necessary. That is exactly what the argument (-1, 1) of .reshape() specifies: -1 tells NumPy to infer the number of rows.
• This is how x and y look now:
    >>> x
    array([[ 5],
           [15],
           [25],
           [35],
           [45],
           [55]])
    >>> y
    array([ 5, 20, 14, 32, 22, 38])
• As you can see, x has two dimensions, and x.shape is (6, 1), while y has a single dimension, and y.shape is (6,).
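This sketch shows why the reshape matters. It passes the original one-dimensional array to .fit(), and scikit-learn rejects it with a ValueError. The reshaped (6, 1) array is accepted. The exact wording of the error message varies between versions, so the sketch relies only on the exception type.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x_1d = np.array([5, 15, 25, 35, 45, 55])  # shape (6,): not a valid feature matrix
x_2d = x_1d.reshape((-1, 1))              # shape (6, 1): 6 samples, 1 feature
y = np.array([5, 20, 14, 32, 22, 38])

try:
    LinearRegression().fit(x_1d, y)
    rejected = False
except ValueError as err:
    rejected = True
    print("1-D input rejected:", str(err).splitlines()[0])

model = LinearRegression().fit(x_2d, y)  # works once x has one column
print(x_1d.shape, x_2d.shape, rejected)
```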
• Step 3: Create a model and fit it
• The next step is to create a linear regression model and fit it using the existing data.
• Create an instance of the class LinearRegression, which will represent the regression model:
    >>> model = LinearRegression()
• This statement creates the variable model as an instance of LinearRegression. You can provide several optional parameters to LinearRegression:
• fit_intercept is a Boolean. If True, the model calculates the intercept 𝑏₀; if False, the intercept is set to zero. It defaults to True.
• normalize was a Boolean that, if True, normalized the input variables before fitting. It was deprecated in scikit-learn 1.0 and removed in 1.2. In current versions, scale the inputs with a preprocessing step such as StandardScaler in a pipeline.
• copy_X is a Boolean that decides whether to copy (True) or overwrite (False) the input variables. It’s True by default.
• n_jobs is either an integer or None. It represents the number of jobs used in parallel computation. It defaults to None, which usually means one job; -1 means to use all available processors.
• Your model as defined above uses the default values of all parameters.
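Because `normalize` is no longer available, scaling has to be done as a separate preprocessing step. The minimal sketch below assumes a StandardScaler in a pipeline is an acceptable substitute, as scikit-learn's deprecation notice suggested. It is not numerically identical to the old `normalize=True`. For least squares with an intercept, rescaling the input changes the fitted slope but not the predictions, and the last line checks this.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

x = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))
y = np.array([5, 20, 14, 32, 22, 38])

# Scale x to zero mean and unit variance, then fit ordinary least squares.
scaled_model = make_pipeline(StandardScaler(), LinearRegression()).fit(x, y)
plain_model = LinearRegression().fit(x, y)

# In the scaled model the slope is per standard deviation of x, not per unit of x...
print("scaled slope:", scaled_model[-1].coef_)
print("plain slope: ", plain_model.coef_)
# ...but both models make the same predictions.
print(np.allclose(scaled_model.predict(x), plain_model.predict(x)))  # True
```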
• It’s time to start using the model. First, call .fit() on model:
    >>> model.fit(x, y)
    LinearRegression()
• With .fit(), you calculate the optimal values of the intercept 𝑏₀ and the slope 𝑏₁ (β0 and β1 in the notation used earlier), using the existing input and output, x and y, as the arguments. In other words, .fit() fits the model. It returns self, which is the variable model itself. That’s why you can replace the last two statements with this one:
    >>> model = LinearRegression().fit(x, y)
• This statement does the same thing as the previous two; it is just shorter.
• Step 4: Get results
• Once you have your model fitted, you can get the results to check whether the model works satisfactorily and to interpret it.
• You can obtain the coefficient of determination, 𝑅², with .score() called on model:
    >>> r_sq = model.score(x, y)
    >>> print(f"coefficient of determination: {r_sq}")
    coefficient of determination: 0.7158756137479542
• When you’re applying .score(), the arguments are also the predictor x and response y, and the return value is 𝑅².
• The attributes of model are .intercept_, which represents the coefficient 𝑏₀, and .coef_, which represents 𝑏₁:
    >>> print(f"intercept: {model.intercept_}")
    intercept: 5.633333333333329
    >>> print(f"slope: {model.coef_}")
    slope: [0.54]
• The code above illustrates how to get 𝑏₀ and 𝑏₁. Note that .intercept_ is a scalar, while .coef_ is an array with one coefficient per input feature.
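Here is a sketch of Step 5 (applying the model for predictions), continuing the same example with scikit-learn's `.predict()` method. It checks that `.predict()` matches 𝑏₀ + 𝑏₁𝑥 computed by hand, and that `.score()` equals `r2_score` on those predictions. The new inputs `x_new` are arbitrary values chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

x = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))
y = np.array([5, 20, 14, 32, 22, 38])
model = LinearRegression().fit(x, y)

# Predictions for the training inputs: y_pred = b0 + b1 * x
y_pred = model.predict(x)
y_manual = model.intercept_ + model.coef_[0] * x.ravel()
print(y_pred)
print(np.allclose(y_pred, y_manual))  # True

# .score() is R^2 computed from these predictions
print(np.isclose(model.score(x, y), r2_score(y, y_pred)))  # True

# Predictions for new, previously unseen inputs (these must also be 2-D)
x_new = np.arange(5).reshape((-1, 1))
print(model.predict(x_new))
```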

More Related Content

Similar to Unit2_Regression, ADVANCED ANALYTICAL THEORY AND METHODS USING PYTHON

Study on Evaluation of Venture Capital Based onInteractive Projection Algorithm
	Study on Evaluation of Venture Capital Based onInteractive Projection Algorithm	Study on Evaluation of Venture Capital Based onInteractive Projection Algorithm
Study on Evaluation of Venture Capital Based onInteractive Projection Algorithm
inventionjournals
 
Week 2 Individual Assignment 2 Quantitative Analysis of Credit - .docx
Week 2 Individual Assignment 2 Quantitative Analysis of Credit - .docxWeek 2 Individual Assignment 2 Quantitative Analysis of Credit - .docx
Week 2 Individual Assignment 2 Quantitative Analysis of Credit - .docx
jessiehampson
 
FREE- REFERENCE IMAGE QUALITY ASSESSMENT FRAMEWORK USING METRICS FUSION AND D...
FREE- REFERENCE IMAGE QUALITY ASSESSMENT FRAMEWORK USING METRICS FUSION AND D...FREE- REFERENCE IMAGE QUALITY ASSESSMENT FRAMEWORK USING METRICS FUSION AND D...
FREE- REFERENCE IMAGE QUALITY ASSESSMENT FRAMEWORK USING METRICS FUSION AND D...
sipij
 
Itab innovative assessement tool
Itab innovative assessement toolItab innovative assessement tool
Itab innovative assessement tool
Martin J Ippel
 
Exploring Support Vector Regression - Signals and Systems Project
Exploring Support Vector Regression - Signals and Systems ProjectExploring Support Vector Regression - Signals and Systems Project
Exploring Support Vector Regression - Signals and Systems Project
Surya Chandra
 
SupportVectorRegression
SupportVectorRegressionSupportVectorRegression
SupportVectorRegression
Daniel K
 
An application of artificial intelligent neural network and discriminant anal...
An application of artificial intelligent neural network and discriminant anal...An application of artificial intelligent neural network and discriminant anal...
An application of artificial intelligent neural network and discriminant anal...
Alexander Decker
 
Boost Your Data Expertise - What's New in Minitab 19.2020.1
Boost Your Data Expertise -  What's New in Minitab 19.2020.1Boost Your Data Expertise -  What's New in Minitab 19.2020.1
Boost Your Data Expertise - What's New in Minitab 19.2020.1
Minitab, LLC
 
IRJET - A Survey on Machine Learning Intelligence Techniques for Medical ...
IRJET -  	  A Survey on Machine Learning Intelligence Techniques for Medical ...IRJET -  	  A Survey on Machine Learning Intelligence Techniques for Medical ...
IRJET - A Survey on Machine Learning Intelligence Techniques for Medical ...
IRJET Journal
 
sarisus hdyses can create targeted .pptx
sarisus hdyses can create targeted .pptxsarisus hdyses can create targeted .pptx
sarisus hdyses can create targeted .pptx
13DikshaDatir
 
Software Analytics In Action: A Hands-on Tutorial on Mining, Analyzing, Model...
Software Analytics In Action: A Hands-on Tutorial on Mining, Analyzing, Model...Software Analytics In Action: A Hands-on Tutorial on Mining, Analyzing, Model...
Software Analytics In Action: A Hands-on Tutorial on Mining, Analyzing, Model...
Chakkrit (Kla) Tantithamthavorn
 
New Directions in Mahout's Recommenders
New Directions in Mahout's RecommendersNew Directions in Mahout's Recommenders
New Directions in Mahout's Recommenders
sscdotopen
 
Qwertyui
QwertyuiQwertyui
Qwertyui
Jamie Boyd
 
Six sigma pedagogy
Six sigma pedagogySix sigma pedagogy
Six sigma pedagogy
MallikarjunRaoPanaba
 
Six sigma
Six sigma Six sigma
Prediction research: perspectives on performance Stanford 19May22.pptx
Prediction research: perspectives on performance Stanford 19May22.pptxPrediction research: perspectives on performance Stanford 19May22.pptx
Prediction research: perspectives on performance Stanford 19May22.pptx
Ewout Steyerberg
 
IRJET- Supervised Learning Classification Algorithms Comparison
IRJET- Supervised Learning Classification Algorithms ComparisonIRJET- Supervised Learning Classification Algorithms Comparison
IRJET- Supervised Learning Classification Algorithms Comparison
IRJET Journal
 
IRJET- Supervised Learning Classification Algorithms Comparison
IRJET- Supervised Learning Classification Algorithms ComparisonIRJET- Supervised Learning Classification Algorithms Comparison
IRJET- Supervised Learning Classification Algorithms Comparison
IRJET Journal
 
Boosting conversion rates on ecommerce using deep learning algorithms
Boosting conversion rates on ecommerce using deep learning algorithmsBoosting conversion rates on ecommerce using deep learning algorithms
Boosting conversion rates on ecommerce using deep learning algorithms
Armando Vieira
 
IRJET-Handwritten Digit Classification using Machine Learning Models
IRJET-Handwritten Digit Classification using Machine Learning ModelsIRJET-Handwritten Digit Classification using Machine Learning Models
IRJET-Handwritten Digit Classification using Machine Learning Models
IRJET Journal
 

Similar to Unit2_Regression, ADVANCED ANALYTICAL THEORY AND METHODS USING PYTHON (20)

Study on Evaluation of Venture Capital Based onInteractive Projection Algorithm
	Study on Evaluation of Venture Capital Based onInteractive Projection Algorithm	Study on Evaluation of Venture Capital Based onInteractive Projection Algorithm
Study on Evaluation of Venture Capital Based onInteractive Projection Algorithm
 
Week 2 Individual Assignment 2 Quantitative Analysis of Credit - .docx
Week 2 Individual Assignment 2 Quantitative Analysis of Credit - .docxWeek 2 Individual Assignment 2 Quantitative Analysis of Credit - .docx
Week 2 Individual Assignment 2 Quantitative Analysis of Credit - .docx
 
FREE- REFERENCE IMAGE QUALITY ASSESSMENT FRAMEWORK USING METRICS FUSION AND D...
FREE- REFERENCE IMAGE QUALITY ASSESSMENT FRAMEWORK USING METRICS FUSION AND D...FREE- REFERENCE IMAGE QUALITY ASSESSMENT FRAMEWORK USING METRICS FUSION AND D...
FREE- REFERENCE IMAGE QUALITY ASSESSMENT FRAMEWORK USING METRICS FUSION AND D...
 
Itab innovative assessement tool
Itab innovative assessement toolItab innovative assessement tool
Itab innovative assessement tool
 
Exploring Support Vector Regression - Signals and Systems Project
Exploring Support Vector Regression - Signals and Systems ProjectExploring Support Vector Regression - Signals and Systems Project
Exploring Support Vector Regression - Signals and Systems Project
 
SupportVectorRegression
SupportVectorRegressionSupportVectorRegression
SupportVectorRegression
 
An application of artificial intelligent neural network and discriminant anal...
An application of artificial intelligent neural network and discriminant anal...An application of artificial intelligent neural network and discriminant anal...
An application of artificial intelligent neural network and discriminant anal...
 
Boost Your Data Expertise - What's New in Minitab 19.2020.1
Boost Your Data Expertise -  What's New in Minitab 19.2020.1Boost Your Data Expertise -  What's New in Minitab 19.2020.1
Boost Your Data Expertise - What's New in Minitab 19.2020.1
 
IRJET - A Survey on Machine Learning Intelligence Techniques for Medical ...
IRJET -  	  A Survey on Machine Learning Intelligence Techniques for Medical ...IRJET -  	  A Survey on Machine Learning Intelligence Techniques for Medical ...
IRJET - A Survey on Machine Learning Intelligence Techniques for Medical ...
 
sarisus hdyses can create targeted .pptx
sarisus hdyses can create targeted .pptxsarisus hdyses can create targeted .pptx
sarisus hdyses can create targeted .pptx
 
Software Analytics In Action: A Hands-on Tutorial on Mining, Analyzing, Model...
Software Analytics In Action: A Hands-on Tutorial on Mining, Analyzing, Model...Software Analytics In Action: A Hands-on Tutorial on Mining, Analyzing, Model...
Software Analytics In Action: A Hands-on Tutorial on Mining, Analyzing, Model...
 
New Directions in Mahout's Recommenders
New Directions in Mahout's RecommendersNew Directions in Mahout's Recommenders
New Directions in Mahout's Recommenders
 
Qwertyui
QwertyuiQwertyui
Qwertyui
 
Six sigma pedagogy
Six sigma pedagogySix sigma pedagogy
Six sigma pedagogy
 
Six sigma
Six sigma Six sigma
Six sigma
 
Prediction research: perspectives on performance Stanford 19May22.pptx
Prediction research: perspectives on performance Stanford 19May22.pptxPrediction research: perspectives on performance Stanford 19May22.pptx
Prediction research: perspectives on performance Stanford 19May22.pptx
 
IRJET- Supervised Learning Classification Algorithms Comparison
IRJET- Supervised Learning Classification Algorithms ComparisonIRJET- Supervised Learning Classification Algorithms Comparison
IRJET- Supervised Learning Classification Algorithms Comparison
 
IRJET- Supervised Learning Classification Algorithms Comparison
IRJET- Supervised Learning Classification Algorithms ComparisonIRJET- Supervised Learning Classification Algorithms Comparison
IRJET- Supervised Learning Classification Algorithms Comparison
 
Boosting conversion rates on ecommerce using deep learning algorithms
Boosting conversion rates on ecommerce using deep learning algorithmsBoosting conversion rates on ecommerce using deep learning algorithms
Boosting conversion rates on ecommerce using deep learning algorithms
 
IRJET-Handwritten Digit Classification using Machine Learning Models
IRJET-Handwritten Digit Classification using Machine Learning ModelsIRJET-Handwritten Digit Classification using Machine Learning Models
IRJET-Handwritten Digit Classification using Machine Learning Models
 

Recently uploaded

Object Oriented Analysis and Design - OOAD
Object Oriented Analysis and Design - OOADObject Oriented Analysis and Design - OOAD
Object Oriented Analysis and Design - OOAD
PreethaV16
 
smart pill dispenser is designed to improve medication adherence and safety f...
smart pill dispenser is designed to improve medication adherence and safety f...smart pill dispenser is designed to improve medication adherence and safety f...
smart pill dispenser is designed to improve medication adherence and safety f...
um7474492
 
Accident detection system project report.pdf
Accident detection system project report.pdfAccident detection system project report.pdf
Accident detection system project report.pdf
Kamal Acharya
 
NATURAL DEEP EUTECTIC SOLVENTS AS ANTI-FREEZING AGENT
NATURAL DEEP EUTECTIC SOLVENTS AS ANTI-FREEZING AGENTNATURAL DEEP EUTECTIC SOLVENTS AS ANTI-FREEZING AGENT
NATURAL DEEP EUTECTIC SOLVENTS AS ANTI-FREEZING AGENT
Addu25809
 
Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...
Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...
Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...
shadow0702a
 
一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理
一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理
一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理
ecqow
 
一比一原版(USF毕业证)旧金山大学毕业证如何办理
一比一原版(USF毕业证)旧金山大学毕业证如何办理一比一原版(USF毕业证)旧金山大学毕业证如何办理
一比一原版(USF毕业证)旧金山大学毕业证如何办理
uqyfuc
 
Digital Twins Computer Networking Paper Presentation.pptx
Digital Twins Computer Networking Paper Presentation.pptxDigital Twins Computer Networking Paper Presentation.pptx
Digital Twins Computer Networking Paper Presentation.pptx
aryanpankaj78
 
P5 Working Drawings.pdf floor plan, civil
P5 Working Drawings.pdf floor plan, civilP5 Working Drawings.pdf floor plan, civil
P5 Working Drawings.pdf floor plan, civil
AnasAhmadNoor
 
ITSM Integration with MuleSoft.pptx
ITSM  Integration with MuleSoft.pptxITSM  Integration with MuleSoft.pptx
ITSM Integration with MuleSoft.pptx
VANDANAMOHANGOUDA
 
原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样
原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样
原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样
ydzowc
 
AI-Based Home Security System : Home security
AI-Based Home Security System : Home securityAI-Based Home Security System : Home security
AI-Based Home Security System : Home security
AIRCC Publishing Corporation
 
Generative AI Use cases applications solutions and implementation.pdf
Generative AI Use cases applications solutions and implementation.pdfGenerative AI Use cases applications solutions and implementation.pdf
Generative AI Use cases applications solutions and implementation.pdf
mahaffeycheryld
 
LLM Fine Tuning with QLoRA Cassandra Lunch 4, presented by Anant
LLM Fine Tuning with QLoRA Cassandra Lunch 4, presented by AnantLLM Fine Tuning with QLoRA Cassandra Lunch 4, presented by Anant
LLM Fine Tuning with QLoRA Cassandra Lunch 4, presented by Anant
Anant Corporation
 
Prediction of Electrical Energy Efficiency Using Information on Consumer's Ac...
Prediction of Electrical Energy Efficiency Using Information on Consumer's Ac...Prediction of Electrical Energy Efficiency Using Information on Consumer's Ac...
Prediction of Electrical Energy Efficiency Using Information on Consumer's Ac...
PriyankaKilaniya
 
4. Mosca vol I -Fisica-Tipler-5ta-Edicion-Vol-1.pdf
4. Mosca vol I -Fisica-Tipler-5ta-Edicion-Vol-1.pdf4. Mosca vol I -Fisica-Tipler-5ta-Edicion-Vol-1.pdf
4. Mosca vol I -Fisica-Tipler-5ta-Edicion-Vol-1.pdf
Gino153088
 
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
Sinan KOZAK
 
TIME TABLE MANAGEMENT SYSTEM testing.pptx
TIME TABLE MANAGEMENT SYSTEM testing.pptxTIME TABLE MANAGEMENT SYSTEM testing.pptx
TIME TABLE MANAGEMENT SYSTEM testing.pptx
CVCSOfficial
 
Data Driven Maintenance | UReason Webinar
Data Driven Maintenance | UReason WebinarData Driven Maintenance | UReason Webinar
Data Driven Maintenance | UReason Webinar
UReason
 
2. protection of river banks and bed erosion protection works.ppt
2. protection of river banks and bed erosion protection works.ppt2. protection of river banks and bed erosion protection works.ppt
2. protection of river banks and bed erosion protection works.ppt
abdatawakjira
 

Recently uploaded (20)

Object Oriented Analysis and Design - OOAD
Object Oriented Analysis and Design - OOADObject Oriented Analysis and Design - OOAD
Object Oriented Analysis and Design - OOAD
 
smart pill dispenser is designed to improve medication adherence and safety f...
smart pill dispenser is designed to improve medication adherence and safety f...smart pill dispenser is designed to improve medication adherence and safety f...
smart pill dispenser is designed to improve medication adherence and safety f...
 
Accident detection system project report.pdf
Accident detection system project report.pdfAccident detection system project report.pdf
Accident detection system project report.pdf
 
NATURAL DEEP EUTECTIC SOLVENTS AS ANTI-FREEZING AGENT
NATURAL DEEP EUTECTIC SOLVENTS AS ANTI-FREEZING AGENTNATURAL DEEP EUTECTIC SOLVENTS AS ANTI-FREEZING AGENT
NATURAL DEEP EUTECTIC SOLVENTS AS ANTI-FREEZING AGENT
 
Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...
Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...
Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...
 
一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理
一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理
一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理
 
一比一原版(USF毕业证)旧金山大学毕业证如何办理
一比一原版(USF毕业证)旧金山大学毕业证如何办理一比一原版(USF毕业证)旧金山大学毕业证如何办理
一比一原版(USF毕业证)旧金山大学毕业证如何办理
 
Digital Twins Computer Networking Paper Presentation.pptx
Digital Twins Computer Networking Paper Presentation.pptxDigital Twins Computer Networking Paper Presentation.pptx
Digital Twins Computer Networking Paper Presentation.pptx
 
P5 Working Drawings.pdf floor plan, civil
P5 Working Drawings.pdf floor plan, civilP5 Working Drawings.pdf floor plan, civil
P5 Working Drawings.pdf floor plan, civil
 
ITSM Integration with MuleSoft.pptx
ITSM  Integration with MuleSoft.pptxITSM  Integration with MuleSoft.pptx
ITSM Integration with MuleSoft.pptx
 
原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样
原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样
原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样
 
AI-Based Home Security System : Home security
AI-Based Home Security System : Home securityAI-Based Home Security System : Home security
AI-Based Home Security System : Home security
 
Generative AI Use cases applications solutions and implementation.pdf
Generative AI Use cases applications solutions and implementation.pdfGenerative AI Use cases applications solutions and implementation.pdf
Generative AI Use cases applications solutions and implementation.pdf
 
LLM Fine Tuning with QLoRA Cassandra Lunch 4, presented by Anant
LLM Fine Tuning with QLoRA Cassandra Lunch 4, presented by AnantLLM Fine Tuning with QLoRA Cassandra Lunch 4, presented by Anant
LLM Fine Tuning with QLoRA Cassandra Lunch 4, presented by Anant
 
Prediction of Electrical Energy Efficiency Using Information on Consumer's Ac...
Prediction of Electrical Energy Efficiency Using Information on Consumer's Ac...Prediction of Electrical Energy Efficiency Using Information on Consumer's Ac...
Prediction of Electrical Energy Efficiency Using Information on Consumer's Ac...
 
4. Mosca vol I -Fisica-Tipler-5ta-Edicion-Vol-1.pdf
4. Mosca vol I -Fisica-Tipler-5ta-Edicion-Vol-1.pdf4. Mosca vol I -Fisica-Tipler-5ta-Edicion-Vol-1.pdf
4. Mosca vol I -Fisica-Tipler-5ta-Edicion-Vol-1.pdf
 
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
 
TIME TABLE MANAGEMENT SYSTEM testing.pptx
TIME TABLE MANAGEMENT SYSTEM testing.pptxTIME TABLE MANAGEMENT SYSTEM testing.pptx
TIME TABLE MANAGEMENT SYSTEM testing.pptx
 
Data Driven Maintenance | UReason Webinar
Data Driven Maintenance | UReason WebinarData Driven Maintenance | UReason Webinar
Data Driven Maintenance | UReason Webinar
 
2. protection of river banks and bed erosion protection works.ppt
2. protection of river banks and bed erosion protection works.ppt2. protection of river banks and bed erosion protection works.ppt
2. protection of river banks and bed erosion protection works.ppt
 

Unit2_Regression, ADVANCED ANALYTICAL THEORY AND METHODS USING PYTHON

  • 1. Course – Big Data Analytics (Professional Elective-II) Course code-IT314B Unit-II- ADVANCED ANALYTICAL THEORY AND METHODS USING PYTHON Sanjivani Rural Education Society’s Sanjivani College of Engineering, Kopargaon-423603 (An Autonomous Institute Affiliated to Savitribai Phule Pune University, Pune) NAAC ‘A’ Grade Accredited, ISO 9001:2015 Certified Department of Information Technology (NBA Accredited) Mr. Rajendra N Kankrale Asst. Prof. 1
  • 2. BDA- Unit-II Regression Department of IT Unit-II- ADVANCED ANALYTICAL THEORY AND METHODS USING PYTHON • Syllabus 2 ADVANCED ANALYTICAL THEORY AND METHODS USING PYTHON Introduction to Scikit-learn, Installations, Dataset, matplotlib, filling missing values, Regression and Classification using Scikit-learn Association Rules: FP growth, Regression: Linear Regression, Logistic Regression, Classification: Naïve Bayes classifier
  • 3. BDA- Unit-II Regression Department of IT Unit-II- Regression 3
  • 4. BDA- Unit-II Regression Department of IT Unit-II- Regression • Motivation • Regression estimates the relationship between the target and the independent variable. • It is used to find the trends in data. • It helps to predict real/continuous values. • By performing the regression, we can confidently determine the most important factor, the least important factor, and how each factor is affecting the other factors. 4
  • 5. BDA- Unit-II Regression Department of IT Linear Regression • Linear Regression is a supervised machine learning algorithm. • It tries to find out the best linear relationship that describes the data you have. • It assumes that there exists a linear relationship between a dependent variable and independent variable(s). • The value of the dependent variable of a linear regression model is a continuous value i.e. real numbers. 5
  • 6. BDA- Unit-II Regression Department of IT Representing Linear Regression Model • Linear regression model represents the linear relationship between a dependent variable and independent variable(s) via a sloped straight line • The sloped straight line representing the linear relationship that fits the given data best is called as a regression line. • It is also called as best fit line. 6
• 7. Types of Linear Regression: 1. Simple Linear Regression 2. Multiple Linear Regression
• 8. Simple Linear Regression. For simple linear regression, the form of the model is $Y = \beta_0 + \beta_1 X$. Here, Y is the dependent variable, X is the independent variable, and $\beta_0$ and $\beta_1$ are the regression coefficients. $\beta_0$ is the intercept (or bias), which sets the offset of the line: it is the value of Y when X = 0. $\beta_1$ is the slope (or weight): it gives the change in Y for a one-unit increase in X.
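The least-squares formulas are not on the slides, so this is an added illustration. The coefficients can be computed in closed form as $\beta_1 = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}$ and $\beta_0 = \bar{y} - \beta_1 \bar{x}$. The sketch below uses plain NumPy and borrows the sample data from the scikit-learn walkthrough later in this unit. The helper name `fit_simple_ols` is hypothetical.

```python
import numpy as np


def fit_simple_ols(x, y):
    """Hypothetical helper: least-squares estimates (b0, b1) for Y = b0 + b1*X."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: sum of cross-deviations divided by sum of squared x-deviations
    b1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: the least-squares line passes through (x_mean, y_mean)
    b0 = y_mean - b1 * x_mean
    return b0, b1


x = [5, 15, 25, 35, 45, 55]
y = [5, 20, 14, 32, 22, 38]
b0, b1 = fit_simple_ols(x, y)
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")  # b0 = 5.6333, b1 = 0.5400
```

These values agree with the intercept and slope that scikit-learn reports for the same data.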
• 9. Simple Linear Regression. Three cases are possible. Case-01: $\beta_1 < 0$. X has a negative relationship with Y: as X increases, Y decreases, and vice versa.
• 10. Simple Linear Regression. Case-02: $\beta_1 = 0$. • X has no (linear) effect on Y. • If X changes, the predicted value of Y does not change.
• 11. Simple Linear Regression. Case-03: $\beta_1 > 0$. X has a positive relationship with Y: as X increases, Y increases, and vice versa.
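You can see the three cases by fitting a line to small synthetic datasets whose true slopes are negative, zero, and positive. The data are made up for illustration. This sketch uses NumPy's `np.polyfit` with degree 1, which returns the slope followed by the intercept.

```python
import numpy as np

x = np.arange(10, dtype=float)

# Synthetic data with a known slope in each case (illustrative values only)
datasets = {
    "negative (beta1 < 0)": 10.0 - 2.0 * x,
    "zero (beta1 = 0)": np.full_like(x, 4.0),
    "positive (beta1 > 0)": 1.0 + 3.0 * x,
}

slopes = {}
for name, y in datasets.items():
    slope, intercept = np.polyfit(x, y, 1)  # degree-1 fit returns [slope, intercept]
    slopes[name] = slope
    print(f"{name}: slope = {slope:.2f}, intercept = {intercept:.2f}")
```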
• 12. Multiple Linear Regression. In multiple linear regression, the dependent variable depends on more than one independent variable. The form of the model is $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots + \beta_n X_n$. Here, Y is the dependent variable, $X_1, X_2, \dots, X_n$ are the independent variables, and $\beta_0, \beta_1, \dots, \beta_n$ are the regression coefficients. Each $\beta_j$ ($1 \le j \le n$) is the slope (or weight) of $X_j$: it gives the change in Y for a one-unit increase in $X_j$ when the other independent variables are held fixed.
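Here is a minimal scikit-learn sketch of multiple linear regression with two independent variables. The data are hypothetical and generated exactly from $Y = 1 + 2X_1 + 3X_2$. The fitted model should therefore recover $\beta_0 = 1$, $\beta_1 = 2$ and $\beta_2 = 3$, up to floating-point error.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Each row is one observation; the columns are X1 and X2 (hypothetical data)
X = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]], dtype=float)
y = 1 + 2 * X[:, 0] + 3 * X[:, 1]  # exact linear relationship, no noise

model = LinearRegression().fit(X, y)
print("beta0 (intercept_):", model.intercept_)
print("beta1, beta2 (coef_):", model.coef_)
```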
• 13. Evaluation metrics for a linear regression model. Mean Squared Error (MSE): MSE is the most common metric for regression tasks. It is the average of the squared differences between the predicted and actual values. For linear regression, MSE is a differentiable, convex function of the coefficients, which makes it easy to optimize. Because the errors are squared, MSE penalizes large errors heavily.
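Written out, $MSE = \frac{1}{m}\sum_{i=1}^{m}(y_i - \hat{y}_i)^2$. It uses the same symbols ($y_i$, $\hat{y}_i$, m) that this section defines for R². The sketch computes MSE by hand and with `sklearn.metrics.mean_squared_error`. The four values are made up for illustration.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

y_true = np.array([3.0, -0.5, 2.0, 7.0])  # actual values (illustrative)
y_pred = np.array([2.5, 0.0, 2.0, 8.0])   # model predictions (illustrative)

# Average of the squared differences between actual and predicted values
mse_manual = np.mean((y_true - y_pred) ** 2)
mse_sklearn = mean_squared_error(y_true, y_pred)
print(mse_manual, mse_sklearn)  # 0.375 0.375
```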
• 15. Evaluation metrics for a linear regression model. Mean Absolute Error (MAE): MAE is the average of the absolute differences between the target values and the values predicted by the model. Because the errors are not squared, MAE does not penalize large errors more heavily than small ones. This makes it less sensitive to outliers than MSE. For the same reason, it is a poor choice when large errors (such as those on outliers) should be penalized more strongly.
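The sketch below illustrates the difference in outlier sensitivity with made-up numbers. It compares two sets of predictions that have the same total absolute error. One set spreads a small error over every point, and the other puts one large error on a single point. Both sets have the same MAE, but MSE is five times larger when the error is concentrated on one point.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_true = np.array([10, 20, 30, 40, 50], dtype=float)

# Same total absolute error (10), distributed in two different ways (illustrative)
pred_spread = np.array([12, 22, 32, 42, 52], dtype=float)   # error of 2 on every point
pred_outlier = np.array([10, 20, 30, 40, 60], dtype=float)  # error of 10 on one point

results = {}
for name, y_pred in [("spread", pred_spread), ("outlier", pred_outlier)]:
    mae = mean_absolute_error(y_true, y_pred)
    mse = mean_squared_error(y_true, y_pred)
    results[name] = (mae, mse)
    print(f"{name:8s} MAE = {mae:.1f}  MSE = {mse:.1f}")
# spread   MAE = 2.0  MSE = 4.0
# outlier  MAE = 2.0  MSE = 20.0
```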
• 17. Evaluation metrics for a linear regression model. Root Mean Squared Error (RMSE): as the name suggests, RMSE is the square root of the mean squared error. Taking the square root expresses the error in the same units as the target variable.
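Here is a short sketch that computes RMSE as the square root of MSE with NumPy. scikit-learn 1.4 and later also provide `root_mean_squared_error`, but taking `np.sqrt` of `mean_squared_error` works across versions. The values are illustrative.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

y_true = np.array([2.0, 4.0, 6.0])  # actual values, e.g. in metres (illustrative)
y_pred = np.array([3.0, 4.0, 8.0])  # predictions in the same units

mse = mean_squared_error(y_true, y_pred)  # (1 + 0 + 4) / 3, in squared units
rmse = np.sqrt(mse)                        # back in the units of y
print(f"MSE = {mse:.4f}, RMSE = {rmse:.4f}")
```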
• 19. Evaluation metrics for a linear regression model. R-squared (R², the coefficient of determination) measures the proportion of the variance of the dependent variable that is explained by the independent variable(s). R² is a popular measure of goodness of fit: it indicates how close the data points are to the fitted line produced by the regression algorithm. A larger R² indicates a better fit, i.e. the independent variable(s) account for more of the variation in the dependent variable.
• 20. Evaluation metrics for a linear regression model. R² is computed as $R^2 = 1 - \frac{SSE}{SST}$, where: • SSE is the sum of the squared differences between the actual and predicted values, $SSE = \sum_{i=1}^{m} (y_i - \hat{y}_i)^2$. • SST is the total sum of the squared differences between the actual values and their mean, $SST = \sum_{i=1}^{m} (y_i - \bar{y})^2$. • $y_i$ is the observed target value, $\hat{y}_i$ is the predicted value, $\bar{y}$ is the mean of the observed values, and m is the total number of observations.
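The sketch below applies $R^2 = 1 - SSE/SST$ to the sample data from the scikit-learn walkthrough in this unit. It uses predictions from the fitted line reported there (slope 0.54, intercept about 5.6333) and cross-checks the result against `sklearn.metrics.r2_score`.

```python
import numpy as np
from sklearn.metrics import r2_score

x = np.array([5, 15, 25, 35, 45, 55], dtype=float)
y = np.array([5, 20, 14, 32, 22, 38], dtype=float)

# Fitted line from the scikit-learn example: slope 0.54, intercept = mean(y) - 0.54 * mean(x)
b1 = 0.54
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

sse = np.sum((y - y_hat) ** 2)     # unexplained (residual) variation
sst = np.sum((y - y.mean()) ** 2)  # total variation around the mean
r2 = 1 - sse / sst
print(r2, r2_score(y, y_hat))  # both about 0.7159
```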
• 22. Evaluation metrics for a linear regression model • For a least-squares model with an intercept, R² on the training data lies between 0 and 1, and the closer R² is to 1, the better the model fits. An R² of 0 means the model does no better than always predicting the mean of y. On new data, R² can be negative, which means the model performs worse than predicting the mean. • A small MAE (relative to the scale of the target) suggests the model predicts well, while a large MAE suggests the model may have trouble in certain areas. An MAE of 0 means the model predicts the outputs perfectly. • Because MSE squares the errors, outliers dominate it and inflate its value. In short, MSE is not robust to outliers, whereas MAE is more robust.
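This quick check uses `r2_score` with illustrative values to show the R² boundary cases. Perfect predictions give 1, always predicting the mean gives 0, and predictions worse than the mean give a negative score.

```python
from sklearn.metrics import r2_score

y_true = [1.0, 2.0, 3.0]

perfect = r2_score(y_true, [1.0, 2.0, 3.0])    # predictions equal the targets
mean_only = r2_score(y_true, [2.0, 2.0, 2.0])  # always predict the mean (2.0)
reversed_ = r2_score(y_true, [3.0, 2.0, 1.0])  # SSE = 8, SST = 2, so R^2 = 1 - 4

print(perfect, mean_only, reversed_)  # 1.0 0.0 -3.0
```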
• 23. Simple Linear Regression With scikit-learn • You’ll start with the simplest case, simple linear regression. Implementing linear regression involves five basic steps: 1. Import the packages and classes you need. 2. Provide data to work with and, if needed, apply appropriate transformations. 3. Create a regression model and fit it with existing data. 4. Check the results of model fitting to know whether the model is satisfactory. 5. Apply the model for predictions.
• 24. Simple Linear Regression With scikit-learn • Step 1: Import packages and classes • The first step is to import the package numpy and the class LinearRegression from sklearn.linear_model:
    >>> import numpy as np
    >>> from sklearn.linear_model import LinearRegression
• 25. Simple Linear Regression With scikit-learn • Step 2: Provide data • The second step is to define the data to work with. The inputs (regressors, 𝑥) and output (response, 𝑦) should be arrays or similar objects. This is the simplest way of providing data for regression:
    >>> x = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))
    >>> y = np.array([5, 20, 14, 32, 22, 38])
• Now you have two arrays: the input, x, and the output, y. You call .reshape() on x because scikit-learn expects the input to be two-dimensional, with one column per feature and one row per observation. The argument (-1, 1) of .reshape() specifies exactly that: one column, with the number of rows (-1) inferred from the data.
• 26. Simple Linear Regression With scikit-learn • This is how x and y look now:
    >>> x
    array([[ 5],
           [15],
           [25],
           [35],
           [45],
           [55]])
    >>> y
    array([ 5, 20, 14, 32, 22, 38])
• As you can see, x has two dimensions and x.shape is (6, 1), while y has a single dimension and y.shape is (6,).
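This short sketch shows why the reshape matters. `LinearRegression.fit` rejects a one-dimensional input with a `ValueError`, while it accepts the `(n, 1)` column form. The exact wording of the error message depends on the scikit-learn version, so the sketch only relies on the exception type.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x_1d = np.array([5, 15, 25, 35, 45, 55])  # shape (6,): a flat 1-D array
y = np.array([5, 20, 14, 32, 22, 38])

try:
    LinearRegression().fit(x_1d, y)  # scikit-learn expects a 2-D input here
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", str(exc).splitlines()[0])

x_2d = x_1d.reshape((-1, 1))  # one column, rows inferred: shape (6, 1)
print(x_1d.shape, x_2d.shape)  # (6,) (6, 1)
model = LinearRegression().fit(x_2d, y)  # the 2-D column form is accepted
```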
• 27. Simple Linear Regression With scikit-learn • Step 3: Create a model and fit it • The next step is to create a linear regression model and fit it using the existing data. • Create an instance of the class LinearRegression, which will represent the regression model:
    >>> model = LinearRegression()
• This statement creates the variable model as an instance of LinearRegression. You can provide several optional parameters to LinearRegression:
• 28. Simple Linear Regression With scikit-learn • fit_intercept is a Boolean that, if True, tells the model to calculate the intercept 𝑏₀ or, if False, sets it to zero. It defaults to True. • normalize was a Boolean that, if True, normalized the input variables before fitting (default False). It was deprecated in scikit-learn 1.0 and removed in 1.2; use a preprocessing step such as StandardScaler instead. • copy_X is a Boolean that decides whether to copy (True) or overwrite (False) the input variables. It’s True by default. • n_jobs is either an integer or None. It sets the number of jobs used in parallel computation. It defaults to None, which usually means one job; -1 means to use all available processors. • Your model as defined above uses the default values of all parameters.
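Current scikit-learn releases (1.2 and later) no longer have the `normalize` parameter. A common replacement is to put a `StandardScaler` in front of the model with `make_pipeline`. This sketch uses the slide's sample data. `StandardScaler` standardizes each input to zero mean and unit variance, which is not identical to what the old `normalize=True` did. For ordinary least squares with an intercept, however, rescaling the inputs does not change the predictions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

x = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))
y = np.array([5, 20, 14, 32, 22, 38])

# Plain model with the default parameters
plain = LinearRegression().fit(x, y)

# Scale the input first, then fit; the pipeline reapplies the same scaling at predict time
scaled = make_pipeline(StandardScaler(), LinearRegression()).fit(x, y)

print(plain.predict(x))
print(scaled.predict(x))  # same predictions: scaling x does not change the OLS fit
```

The pipeline's coefficient (`scaled[-1].coef_`) refers to the standardized input. It is therefore read as the change in y per standard deviation of x, not per unit of x.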
• 29. Simple Linear Regression With scikit-learn • It’s time to start using the model. First, call .fit() on model:
    >>> model.fit(x, y)
    LinearRegression()
• With .fit(), you calculate the optimal values of the weights 𝑏₀ and 𝑏₁ from the existing input and output, x and y, passed as the arguments. In other words, .fit() fits the model. It returns self, which is the variable model itself. That’s why you can replace the last two statements with this one:
    >>> model = LinearRegression().fit(x, y)
• This statement does the same thing as the previous two; it’s just shorter.
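This small check confirms that `.fit()` returns the same object it was called on, which is what makes the one-line form work. It uses the data from Step 2.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))
y = np.array([5, 20, 14, 32, 22, 38])

model = LinearRegression()
returned = model.fit(x, y)
print(returned is model)  # True: fit() returns self

# Method chaining therefore yields a fitted model in one statement
chained = LinearRegression().fit(x, y)
print(chained.coef_, model.coef_)
```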
• 30. Simple Linear Regression With scikit-learn • Step 4: Get results • Once your model is fitted, you can get the results to check whether the model works satisfactorily and to interpret it. • You can obtain the coefficient of determination, 𝑅², with .score() called on model:
    >>> r_sq = model.score(x, y)
    >>> print(f"coefficient of determination: {r_sq}")
    coefficient of determination: 0.7158756137479542
• 31. Simple Linear Regression With scikit-learn • When you apply .score(), the arguments are again the predictor x and the response y, and the return value is 𝑅². • The fitted model's attributes include .intercept_, which represents the coefficient 𝑏₀, and .coef_, which represents 𝑏₁:
    >>> print(f"intercept: {model.intercept_}")
    intercept: 5.633333333333329
    >>> print(f"slope: {model.coef_}")
    slope: [0.54]
• The code above illustrates how to get 𝑏₀ and 𝑏₁. Notice that .intercept_ is a scalar, while .coef_ is an array.
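Combining Steps 1 to 4 and adding Step 5 (prediction) with `model.predict` gives a self-contained script. The new inputs in `x_new` are hypothetical values chosen for illustration. The script also cross-checks `score()` against `r2_score` on the same predictions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Step 2: data (the input must be 2-D: one column, one row per observation)
x = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))
y = np.array([5, 20, 14, 32, 22, 38])

# Step 3: create and fit the model
model = LinearRegression().fit(x, y)

# Step 4: inspect the results
r_sq = model.score(x, y)
print(f"coefficient of determination: {r_sq}")
print(f"intercept: {model.intercept_}")
print(f"slope: {model.coef_}")

# Step 5: predict; on the training inputs this equals intercept + slope * x
y_pred = model.predict(x)
print(f"R^2 from r2_score: {r2_score(y, y_pred)}")

# Hypothetical new inputs, also shaped as a single column
x_new = np.arange(5).reshape((-1, 1))
print(model.predict(x_new))
```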