Linear Regression
Regression analysis is a widely used
statistical tool to establish a relationship model
between two variables.
One of these variables is called the predictor variable,
whose value is gathered through experiments.
The other variable is called the response variable,
whose value is derived from the predictor
variable.
• In linear regression these two variables are related
through an equation in which the exponent (power) of both
variables is 1.
• Mathematically a linear relationship represents a straight
line when plotted as a graph.
• A non-linear relationship where the exponent of any
variable is not equal to 1 creates a curve.
• The general mathematical equation for a linear
regression is −
• y = ax + b
• Following is the description of the parameters used −
• y is the response variable.
• x is the predictor variable.
• a and b are constants which are called the coefficients.
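• The equation can be evaluated directly. As a quick numeric illustration of y = ax + b (the coefficient values below are made up for this sketch):

```r
# Hypothetical coefficients for illustration: a (slope) and b (intercept)
a <- 0.6746
b <- -38.4551

# Predict the response y for a given predictor value x using y = ax + b
x <- 170
y <- a * x + b
print(y)  # approximately 76.23
```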
• Steps to Establish a Regression
• A simple example of regression is predicting the weight of a
person when their height is known. To do this we need
the relationship between the height and weight of a
person.
• The steps to create the relationship are −
• Carry out the experiment of gathering a sample of
observed values of height and corresponding weight.
• Create a relationship model using the lm() function in R.
• Find the coefficients from the model created and create
the mathematical equation using them.
• Get a summary of the relationship model to know the
average error in prediction; these errors are also called residuals.
• To predict the weight of new persons, use
the predict() function in R.
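• The steps above can be sketched end-to-end as follows (using the sample height/weight data from this example; the variable names are ours):

```r
# Step 1: sample observations of height (x) and weight (y)
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)

# Step 2: create the relationship model
relation <- lm(y ~ x)

# Step 3: find the coefficients of the fitted line
print(coef(relation))

# Step 4: summary of the model, including the residuals
print(summary(relation))

# Step 5: predict the weight of a new person with height 170 cm
print(predict(relation, data.frame(x = 170)))
```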
• Input Data
• Below is the sample data representing the observations −
• # Values of height
• 151, 174, 138, 186, 128, 136, 179, 163, 152, 131
• # Values of weight.
• 63, 81, 56, 91, 47, 57, 76, 72, 62, 48
• lm() Function
• This function creates the relationship model between the
predictor and the response variable.
• Syntax
• The basic syntax for lm() function in linear regression is −
• lm(formula,data)
• Following is the description of the parameters used −
• formula is a symbolic description of the relation between x and y.
• data is the data frame (or environment) in which the variables of the formula are found.
• x <- c(151, 174, 138, 186, 128, 136, 179, 163,
152, 131)
• y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
• # Apply the lm() function.
• relation <- lm(y~x)
• print(relation)
• Output −
• Call:
lm(formula = y ~ x)

Coefficients:
(Intercept)            x
   -38.4551       0.6746
• Get the Summary of the Relationship
• x <- c(151, 174, 138, 186, 128, 136, 179, 163,
152, 131)
• y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
• # Apply the lm() function.
• relation <- lm(y~x)
• print(summary(relation))
• predict() Function
• Syntax
• The basic syntax for predict() in linear regression
is −
• predict(object, newdata)
• Following is the
description of the parameters used −
• object is the model which was already created
using the lm() function.
• newdata is the data frame containing the new values
for the predictor variable.
• Predict the weight of new persons
• # The predictor vector.
• x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
• # The response vector.
• y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
• # Apply the lm() function.
• relation <- lm(y~x)
• # Find weight of a person with height 170.
• a <- data.frame(x = 170)
• result <- predict(relation,a)
• print(result)
Visualize the Regression Graphically
• Create the predictor and response variable.
• x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
• y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
• relation <- lm(y~x)
• # Give the chart file a name.
• png(file = "linearregression.png")
• # Plot the chart with the fitted regression line.
• plot(y, x, col = "blue", main = "Height & Weight Regression",
abline(lm(x~y)), cex = 1.3, pch = 16,
xlab = "Weight in Kg", ylab = "Height in cm")
• # Save the file.
• dev.off()
Regression assumptions
• Linear regression makes several assumptions
about the data, such as :
• Linearity of the data. The relationship between
the predictor (x) and the outcome (y) is assumed
to be linear.
• Normality of residuals. The residual errors are
assumed to be normally distributed.
• Homogeneity of residuals variance. The residuals
are assumed to have a constant variance
(homoscedasticity)
• Independence of the residual error terms.
• Assumptions about the form of the model
• Assumptions about the errors
• Assumptions about the predictors
• The predictor variables x1, x2, ..., xn are
assumed to be linearly independent of each
other. If this assumption is violated, the
problem is called collinearity.
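• A simple way to screen for collinearity is to inspect the pairwise correlations among the predictors. The sketch below is illustrative: the built-in mtcars data set, the choice of predictors, and the 0.9 cutoff are our own assumptions, not part of the original example:

```r
# Illustrative predictors from the built-in mtcars data set
predictors <- mtcars[, c("cyl", "disp", "hp")]

# Pairwise correlation matrix of the predictors
cor_matrix <- cor(predictors)
print(cor_matrix)

# Flag predictor pairs whose |correlation| exceeds an illustrative 0.9 cutoff;
# such pairs suggest a collinearity problem
high <- abs(cor_matrix) > 0.9 & upper.tri(cor_matrix)
print(which(high, arr.ind = TRUE))
```

Here cyl and disp are highly correlated, so keeping both in one model would raise a collinearity concern.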
Validating linear assumptions
• Step 1 - Install the necessary libraries
• install.packages("ggplot2")
• install.packages("dplyr")
• library(ggplot2)
• library(dplyr)
• Step 2 - Read a csv file and explore the data
• data <- read.csv("/content/Data_1.csv")
• head(data) # head() returns the top 6 rows of the
dataframe
• summary(data) # returns the statistical summary of the
data columns
• plot(data$Width, data$Cost) # plot() gives a visual
representation of the relation between the variables Width
and Cost
• cor(data$Width, data$Cost) # correlation between the two
variables
Using Scatter Plot
• The linearity of the relationship between the dependent and
predictor variables of the model can be studied using scatter plots
• No of hours    freshmen_score
• 2              55
• 2.5            62
• 3              65
• 3.5            70
• 4              77
• 4.5            82
• 5              75
• 5.5            83
• 6              85
• 6.5            88
• Plot the study time (HS$noofhours) against the freshmen score
(freshmen_score).
• It can be observed that the study time exhibits a
linear relationship with the freshmen score.
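• The table above can be reproduced and checked in R with a minimal sketch (the vector names below are our own):

```r
# Study hours and corresponding freshmen scores from the table above
noofhours <- c(2, 2.5, 3, 3.5, 4, 4.5, 5, 5.5, 6, 6.5)
freshmen_score <- c(55, 62, 65, 70, 77, 82, 75, 83, 85, 88)

# Scatter plot of score against study time to eyeball linearity
plot(noofhours, freshmen_score,
     xlab = "No of hours", ylab = "Freshmen score", pch = 16)

# A correlation close to 1 supports a linear relationship
print(cor(noofhours, freshmen_score))
```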
• Using R
• x <- 1:20
• y <- x^2
• plot(lm(y~x)) # Residuals vs fitted plots
• plot(lm(dist~speed, data = cars))
Quantile-Quantile Plot
• A Quantile-Quantile plot (Q-Q plot) plots the
quantiles of two variables against each other to
check whether the distributions of the two variables are
similar with respect to their
locations. The qqline() function in R is
used to draw a Q-Q line on the plot.
• R – Quantile-Quantile Plot
• Syntax: qqline(x, y, col)
• Parameters:
• x, y: X and Y coordinates of plot
• col: It defines color
• Returns: A QQ Line plot of the coordinates provided
• # Set seed for reproducibility
• set.seed(500)
•
• # Create random normally distributed values
• x <- rnorm(1200)
•
• # QQplot of normally distributed values
• qqnorm(x)
•
• # Add qqline to plot
• qqline(x, col = "darkgreen")
Implementation of QQplot of
Logistically Distributed Values
• # Set seed for reproducibility
• set.seed(500)
•
• # Random values according to the logistic distribution
• y <- rlogis(800)
•
• # QQplot of logistically distributed values against normal quantiles
• qqnorm(y)
•
• # Add qqline to plot
• qqline(y, col = "darkgreen")
The Scale Location Plot
• The scale-location plot is very similar to residuals vs
fitted, but simplifies analysis of the homoskedasticity
assumption.
• It takes the square root of the absolute value of
standardized residuals instead of plotting the residuals
themselves.
• Recall that homoskedasticity means constant variance
in linear regression.
• More formally, in linear regression you have
y = Xβ + ε, where X is your design matrix, y is your vector of
responses, and ε is your vector of errors; homoskedasticity
means Var(ε) = σ²I.
plot(lm(dist~speed,data=cars))
• We want to check two things:
• That the red line is approximately horizontal.
Then the average magnitude of the
standardized residuals isn’t changing much as a
function of the fitted values.
• That the spread around the red line doesn’t
vary with the fitted values. Then the variability
of magnitudes doesn’t vary much as a function
of the fitted values.
Residuals vs fitted values plots
• The fitted vs residuals plot is mainly useful for investigating:
• whether the linearity assumption holds: this is indicated by the mean
residual value being close to 0 for every fitted-value region, which
is shown when the red line stays close to the dashed horizontal line in
the graph.
• whether the data contain outliers: this is indicated by some 'extreme'
residuals that lie far from the other residual points.
• If we can see a pattern in the graph, it indicates a
violation of linearity; here the y equation is a 3rd-order polynomial
function.
• If the relationship between x and y is non-linear, the residuals
will be a non-linear function of the fitted values.
• data("cars")
• model <- lm(dist~speed, data = cars)
• plot(model, which = 1)
• The Scale-Location Plot
• The scale-location plot is very similar to residuals vs
fitted, but plots the square root of the standardized residuals
against the fitted values to verify the homoskedasticity assumption. We
want to look at:
• the red line: it represents the average of the
standardized residuals and must be approximately
horizontal. If the line is approximately horizontal
without large fluctuations,
the average magnitude of the standardized residuals is
approximately the same across fitted values.
• the variance around the line: the spread of standardized
residuals around the red line should not vary with
the fitted values; that is, the variance of the standardized
residuals at each fitted value is approximately the
same, without large fluctuations.
• modelmt <- lm(disp ~ cyl + hp, data = mtcars)
• plot(modelmt, which = 3)
More Related Content

What's hot

Linear models for classification
Linear models for classificationLinear models for classification
Linear models for classification
Sung Yub Kim
 
Data Analysis and Programming in R
Data Analysis and Programming in RData Analysis and Programming in R
Data Analysis and Programming in REshwar Sai
 
ML - Simple Linear Regression
ML - Simple Linear RegressionML - Simple Linear Regression
ML - Simple Linear Regression
Andrew Ferlitsch
 
GMM
GMMGMM
Cross validation
Cross validationCross validation
Cross validation
RidhaAfrawe
 
Vectormaths and Matrix in R.pptx
Vectormaths and Matrix in R.pptxVectormaths and Matrix in R.pptx
Vectormaths and Matrix in R.pptx
Ramakrishna Reddy Bijjam
 
Decision Tree.pptx
Decision Tree.pptxDecision Tree.pptx
Decision Tree.pptx
Ramakrishna Reddy Bijjam
 
K-Nearest Neighbor Classifier
K-Nearest Neighbor ClassifierK-Nearest Neighbor Classifier
K-Nearest Neighbor Classifier
Neha Kulkarni
 
Machine Learning With Logistic Regression
Machine Learning  With Logistic RegressionMachine Learning  With Logistic Regression
Machine Learning With Logistic Regression
Knoldus Inc.
 
Lect6 Association rule & Apriori algorithm
Lect6 Association rule & Apriori algorithmLect6 Association rule & Apriori algorithm
Lect6 Association rule & Apriori algorithm
hktripathy
 
K-Folds Cross Validation Method
K-Folds Cross Validation MethodK-Folds Cross Validation Method
K-Folds Cross Validation Method
SHUBHAM GUPTA
 
k medoid clustering.pptx
k medoid clustering.pptxk medoid clustering.pptx
k medoid clustering.pptx
Roshan86572
 
Hierarchical Clustering
Hierarchical ClusteringHierarchical Clustering
Hierarchical Clustering
Carlos Castillo (ChaTo)
 
Unsupervised learning clustering
Unsupervised learning clusteringUnsupervised learning clustering
Unsupervised learning clustering
Arshad Farhad
 
Logistic regression
Logistic regressionLogistic regression
Logistic regression
YashwantGahlot1
 
Data Science - Part III - EDA & Model Selection
Data Science - Part III - EDA & Model SelectionData Science - Part III - EDA & Model Selection
Data Science - Part III - EDA & Model Selection
Derek Kane
 
Bruteforce algorithm
Bruteforce algorithmBruteforce algorithm
Bruteforce algorithm
Rezwan Siam
 
Decision Tree Learning
Decision Tree LearningDecision Tree Learning
Decision Tree Learning
Milind Gokhale
 

What's hot (20)

Linear models for classification
Linear models for classificationLinear models for classification
Linear models for classification
 
Data Analysis and Programming in R
Data Analysis and Programming in RData Analysis and Programming in R
Data Analysis and Programming in R
 
ML - Simple Linear Regression
ML - Simple Linear RegressionML - Simple Linear Regression
ML - Simple Linear Regression
 
GMM
GMMGMM
GMM
 
K Nearest Neighbors
K Nearest NeighborsK Nearest Neighbors
K Nearest Neighbors
 
Cross validation
Cross validationCross validation
Cross validation
 
Vectormaths and Matrix in R.pptx
Vectormaths and Matrix in R.pptxVectormaths and Matrix in R.pptx
Vectormaths and Matrix in R.pptx
 
Decision Tree.pptx
Decision Tree.pptxDecision Tree.pptx
Decision Tree.pptx
 
K-Nearest Neighbor Classifier
K-Nearest Neighbor ClassifierK-Nearest Neighbor Classifier
K-Nearest Neighbor Classifier
 
Machine Learning With Logistic Regression
Machine Learning  With Logistic RegressionMachine Learning  With Logistic Regression
Machine Learning With Logistic Regression
 
Lect6 Association rule & Apriori algorithm
Lect6 Association rule & Apriori algorithmLect6 Association rule & Apriori algorithm
Lect6 Association rule & Apriori algorithm
 
K-Folds Cross Validation Method
K-Folds Cross Validation MethodK-Folds Cross Validation Method
K-Folds Cross Validation Method
 
k medoid clustering.pptx
k medoid clustering.pptxk medoid clustering.pptx
k medoid clustering.pptx
 
Hierarchical Clustering
Hierarchical ClusteringHierarchical Clustering
Hierarchical Clustering
 
Unsupervised learning clustering
Unsupervised learning clusteringUnsupervised learning clustering
Unsupervised learning clustering
 
Logistic regression
Logistic regressionLogistic regression
Logistic regression
 
Data Science - Part III - EDA & Model Selection
Data Science - Part III - EDA & Model SelectionData Science - Part III - EDA & Model Selection
Data Science - Part III - EDA & Model Selection
 
Data preprocessing ng
Data preprocessing   ngData preprocessing   ng
Data preprocessing ng
 
Bruteforce algorithm
Bruteforce algorithmBruteforce algorithm
Bruteforce algorithm
 
Decision Tree Learning
Decision Tree LearningDecision Tree Learning
Decision Tree Learning
 

Similar to Linear Regression.pptx

Linear regression by Kodebay
Linear regression by KodebayLinear regression by Kodebay
Linear regression by Kodebay
Kodebay
 
[Paper reading] L-SHAPLEY AND C-SHAPLEY: EFFICIENT MODEL INTERPRETATION FOR S...
[Paper reading] L-SHAPLEY AND C-SHAPLEY: EFFICIENT MODEL INTERPRETATION FOR S...[Paper reading] L-SHAPLEY AND C-SHAPLEY: EFFICIENT MODEL INTERPRETATION FOR S...
[Paper reading] L-SHAPLEY AND C-SHAPLEY: EFFICIENT MODEL INTERPRETATION FOR S...
Daiki Tanaka
 
Data Mining Lecture_9.pptx
Data Mining Lecture_9.pptxData Mining Lecture_9.pptx
Data Mining Lecture_9.pptx
Subrata Kumer Paul
 
Linear Regression
Linear RegressionLinear Regression
Linear Regression
SourajitMaity1
 
Exploratory data analysis using r
Exploratory data analysis using rExploratory data analysis using r
Exploratory data analysis using r
Tahera Shaikh
 
Simple linear regression
Simple linear regressionSimple linear regression
Simple linear regression
Avjinder (Avi) Kaler
 
R nonlinear least square
R   nonlinear least squareR   nonlinear least square
R nonlinear least square
Learnbay Datascience
 
Curve Fitting
Curve FittingCurve Fitting
Curve Fitting
Sachin Kumar
 
Machine learning Introduction
Machine learning IntroductionMachine learning Introduction
Machine learning Introduction
Kuppusamy P
 
working with python
working with pythonworking with python
working with python
bhavesh lande
 
슬로우캠퍼스: scikit-learn & 머신러닝 (강박사)
슬로우캠퍼스:  scikit-learn & 머신러닝 (강박사)슬로우캠퍼스:  scikit-learn & 머신러닝 (강박사)
슬로우캠퍼스: scikit-learn & 머신러닝 (강박사)
마이캠퍼스
 
Linear regression [Theory and Application (In physics point of view) using py...
Linear regression [Theory and Application (In physics point of view) using py...Linear regression [Theory and Application (In physics point of view) using py...
Linear regression [Theory and Application (In physics point of view) using py...
ANIRBANMAJUMDAR18
 
Regression diagnostics - Checking if linear regression assumptions are violat...
Regression diagnostics - Checking if linear regression assumptions are violat...Regression diagnostics - Checking if linear regression assumptions are violat...
Regression diagnostics - Checking if linear regression assumptions are violat...
Jerome Gomes
 
02-alignment.pdf
02-alignment.pdf02-alignment.pdf
02-alignment.pdf
SivaAyyappan2
 
Unit-III Correlation and Regression.pptx
Unit-III Correlation and Regression.pptxUnit-III Correlation and Regression.pptx
Unit-III Correlation and Regression.pptx
Anusuya123
 
Different Types of Machine Learning Algorithms
Different Types of Machine Learning AlgorithmsDifferent Types of Machine Learning Algorithms
Different Types of Machine Learning Algorithms
rahmedraj93
 
rugs koco.pptx
rugs koco.pptxrugs koco.pptx
rugs koco.pptx
AbdalrahmanTahaJaya
 
regression.pptx
regression.pptxregression.pptx
regression.pptx
Rashi Agarwal
 

Similar to Linear Regression.pptx (20)

Linear regression by Kodebay
Linear regression by KodebayLinear regression by Kodebay
Linear regression by Kodebay
 
[Paper reading] L-SHAPLEY AND C-SHAPLEY: EFFICIENT MODEL INTERPRETATION FOR S...
[Paper reading] L-SHAPLEY AND C-SHAPLEY: EFFICIENT MODEL INTERPRETATION FOR S...[Paper reading] L-SHAPLEY AND C-SHAPLEY: EFFICIENT MODEL INTERPRETATION FOR S...
[Paper reading] L-SHAPLEY AND C-SHAPLEY: EFFICIENT MODEL INTERPRETATION FOR S...
 
Data Mining Lecture_9.pptx
Data Mining Lecture_9.pptxData Mining Lecture_9.pptx
Data Mining Lecture_9.pptx
 
Linear Regression
Linear RegressionLinear Regression
Linear Regression
 
Exploratory data analysis using r
Exploratory data analysis using rExploratory data analysis using r
Exploratory data analysis using r
 
Simple linear regression
Simple linear regressionSimple linear regression
Simple linear regression
 
R nonlinear least square
R   nonlinear least squareR   nonlinear least square
R nonlinear least square
 
Curve Fitting
Curve FittingCurve Fitting
Curve Fitting
 
Machine learning Introduction
Machine learning IntroductionMachine learning Introduction
Machine learning Introduction
 
working with python
working with pythonworking with python
working with python
 
슬로우캠퍼스: scikit-learn & 머신러닝 (강박사)
슬로우캠퍼스:  scikit-learn & 머신러닝 (강박사)슬로우캠퍼스:  scikit-learn & 머신러닝 (강박사)
슬로우캠퍼스: scikit-learn & 머신러닝 (강박사)
 
Linear regression [Theory and Application (In physics point of view) using py...
Linear regression [Theory and Application (In physics point of view) using py...Linear regression [Theory and Application (In physics point of view) using py...
Linear regression [Theory and Application (In physics point of view) using py...
 
Regression diagnostics - Checking if linear regression assumptions are violat...
Regression diagnostics - Checking if linear regression assumptions are violat...Regression diagnostics - Checking if linear regression assumptions are violat...
Regression diagnostics - Checking if linear regression assumptions are violat...
 
02-alignment.pdf
02-alignment.pdf02-alignment.pdf
02-alignment.pdf
 
Unit-III Correlation and Regression.pptx
Unit-III Correlation and Regression.pptxUnit-III Correlation and Regression.pptx
Unit-III Correlation and Regression.pptx
 
Different Types of Machine Learning Algorithms
Different Types of Machine Learning AlgorithmsDifferent Types of Machine Learning Algorithms
Different Types of Machine Learning Algorithms
 
Chapter05
Chapter05Chapter05
Chapter05
 
An introduction to R
An introduction to RAn introduction to R
An introduction to R
 
rugs koco.pptx
rugs koco.pptxrugs koco.pptx
rugs koco.pptx
 
regression.pptx
regression.pptxregression.pptx
regression.pptx
 

More from Ramakrishna Reddy Bijjam

Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docx
Ramakrishna Reddy Bijjam
 
Arrays to arrays and pointers with arrays.pptx
Arrays to arrays and pointers with arrays.pptxArrays to arrays and pointers with arrays.pptx
Arrays to arrays and pointers with arrays.pptx
Ramakrishna Reddy Bijjam
 
Auxiliary, Cache and Virtual memory.pptx
Auxiliary, Cache and Virtual memory.pptxAuxiliary, Cache and Virtual memory.pptx
Auxiliary, Cache and Virtual memory.pptx
Ramakrishna Reddy Bijjam
 
Python With MongoDB in advanced Python.pptx
Python With MongoDB in advanced Python.pptxPython With MongoDB in advanced Python.pptx
Python With MongoDB in advanced Python.pptx
Ramakrishna Reddy Bijjam
 
Pointers and single &multi dimentionalarrays.pptx
Pointers and single &multi dimentionalarrays.pptxPointers and single &multi dimentionalarrays.pptx
Pointers and single &multi dimentionalarrays.pptx
Ramakrishna Reddy Bijjam
 
Certinity Factor and Dempster-shafer theory .pptx
Certinity Factor and Dempster-shafer theory .pptxCertinity Factor and Dempster-shafer theory .pptx
Certinity Factor and Dempster-shafer theory .pptx
Ramakrishna Reddy Bijjam
 
Auxiliary Memory in computer Architecture.pptx
Auxiliary Memory in computer Architecture.pptxAuxiliary Memory in computer Architecture.pptx
Auxiliary Memory in computer Architecture.pptx
Ramakrishna Reddy Bijjam
 
Random Forest Decision Tree.pptx
Random Forest Decision Tree.pptxRandom Forest Decision Tree.pptx
Random Forest Decision Tree.pptx
Ramakrishna Reddy Bijjam
 
K Means Clustering in ML.pptx
K Means Clustering in ML.pptxK Means Clustering in ML.pptx
K Means Clustering in ML.pptx
Ramakrishna Reddy Bijjam
 
Pandas.pptx
Pandas.pptxPandas.pptx
Python With MongoDB.pptx
Python With MongoDB.pptxPython With MongoDB.pptx
Python With MongoDB.pptx
Ramakrishna Reddy Bijjam
 
Python with MySql.pptx
Python with MySql.pptxPython with MySql.pptx
Python with MySql.pptx
Ramakrishna Reddy Bijjam
 
PYTHON PROGRAMMING NOTES RKREDDY.pdf
PYTHON PROGRAMMING NOTES RKREDDY.pdfPYTHON PROGRAMMING NOTES RKREDDY.pdf
PYTHON PROGRAMMING NOTES RKREDDY.pdf
Ramakrishna Reddy Bijjam
 
BInary file Operations.pptx
BInary file Operations.pptxBInary file Operations.pptx
BInary file Operations.pptx
Ramakrishna Reddy Bijjam
 
Data Science in Python.pptx
Data Science in Python.pptxData Science in Python.pptx
Data Science in Python.pptx
Ramakrishna Reddy Bijjam
 
CSV JSON and XML files in Python.pptx
CSV JSON and XML files in Python.pptxCSV JSON and XML files in Python.pptx
CSV JSON and XML files in Python.pptx
Ramakrishna Reddy Bijjam
 
HTML files in python.pptx
HTML files in python.pptxHTML files in python.pptx
HTML files in python.pptx
Ramakrishna Reddy Bijjam
 
Regular Expressions in Python.pptx
Regular Expressions in Python.pptxRegular Expressions in Python.pptx
Regular Expressions in Python.pptx
Ramakrishna Reddy Bijjam
 
datareprersentation 1.pptx
datareprersentation 1.pptxdatareprersentation 1.pptx
datareprersentation 1.pptx
Ramakrishna Reddy Bijjam
 
Apriori.pptx
Apriori.pptxApriori.pptx

More from Ramakrishna Reddy Bijjam (20)

Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docx
 
Arrays to arrays and pointers with arrays.pptx
Arrays to arrays and pointers with arrays.pptxArrays to arrays and pointers with arrays.pptx
Arrays to arrays and pointers with arrays.pptx
 
Auxiliary, Cache and Virtual memory.pptx
Auxiliary, Cache and Virtual memory.pptxAuxiliary, Cache and Virtual memory.pptx
Auxiliary, Cache and Virtual memory.pptx
 
Python With MongoDB in advanced Python.pptx
Python With MongoDB in advanced Python.pptxPython With MongoDB in advanced Python.pptx
Python With MongoDB in advanced Python.pptx
 
Pointers and single &multi dimentionalarrays.pptx
Pointers and single &multi dimentionalarrays.pptxPointers and single &multi dimentionalarrays.pptx
Pointers and single &multi dimentionalarrays.pptx
 
Certinity Factor and Dempster-shafer theory .pptx
Certinity Factor and Dempster-shafer theory .pptxCertinity Factor and Dempster-shafer theory .pptx
Certinity Factor and Dempster-shafer theory .pptx
 
Auxiliary Memory in computer Architecture.pptx
Auxiliary Memory in computer Architecture.pptxAuxiliary Memory in computer Architecture.pptx
Auxiliary Memory in computer Architecture.pptx
 
Random Forest Decision Tree.pptx
Random Forest Decision Tree.pptxRandom Forest Decision Tree.pptx
Random Forest Decision Tree.pptx
 
K Means Clustering in ML.pptx
K Means Clustering in ML.pptxK Means Clustering in ML.pptx
K Means Clustering in ML.pptx
 
Pandas.pptx
Pandas.pptxPandas.pptx
Pandas.pptx
 
Python With MongoDB.pptx
Python With MongoDB.pptxPython With MongoDB.pptx
Python With MongoDB.pptx
 
Python with MySql.pptx
Python with MySql.pptxPython with MySql.pptx
Python with MySql.pptx
 
PYTHON PROGRAMMING NOTES RKREDDY.pdf
PYTHON PROGRAMMING NOTES RKREDDY.pdfPYTHON PROGRAMMING NOTES RKREDDY.pdf
PYTHON PROGRAMMING NOTES RKREDDY.pdf
 
BInary file Operations.pptx
BInary file Operations.pptxBInary file Operations.pptx
BInary file Operations.pptx
 
Data Science in Python.pptx
Data Science in Python.pptxData Science in Python.pptx
Data Science in Python.pptx
 
CSV JSON and XML files in Python.pptx
CSV JSON and XML files in Python.pptxCSV JSON and XML files in Python.pptx
CSV JSON and XML files in Python.pptx
 
HTML files in python.pptx
HTML files in python.pptxHTML files in python.pptx
HTML files in python.pptx
 
Regular Expressions in Python.pptx
Regular Expressions in Python.pptxRegular Expressions in Python.pptx
Regular Expressions in Python.pptx
 
datareprersentation 1.pptx
datareprersentation 1.pptxdatareprersentation 1.pptx
datareprersentation 1.pptx
 
Apriori.pptx
Apriori.pptxApriori.pptx
Apriori.pptx
 

Recently uploaded

一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
ewymefz
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Subhajit Sahu
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
TravisMalana
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
ArpitMalhotra16
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
ewymefz
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
axoqas
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
vcaxypu
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Subhajit Sahu
 
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
correoyaya
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
John Andrews
 
Jpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization SampleJpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization Sample
James Polillo
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
Opendatabay
 
Tabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflowsTabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflows
alex933524
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
ewymefz
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
NABLAS株式会社
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
enxupq
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
ukgaet
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
MaleehaSheikh2
 
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
Tiktokethiodaily
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
ewymefz
 

Recently uploaded (20)

一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Linear Regression.pptx

• 1. Linear Regression
Regression analysis is a widely used statistical tool to establish a relationship model between two variables.
One of these variables is called the predictor variable, whose value is gathered through experiments.
The other variable is called the response variable, whose value is derived from the predictor variable.
• 2.
• In linear regression these two variables are related through an equation in which the exponent (power) of both variables is 1.
• Mathematically, a linear relationship represents a straight line when plotted as a graph.
• A non-linear relationship, where the exponent of a variable is not equal to 1, creates a curve.
• The general mathematical equation for a linear regression is:
y = ax + b
• Following is the description of the parameters used:
• y is the response variable.
• x is the predictor variable.
• a and b are constants called the coefficients.
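The straight-line equation above can be checked with a few lines of R. The slope and intercept below are illustrative values only (they happen to match the coefficients fitted from the height/weight data later in this deck):

```r
# Illustrative coefficients (these match the fit obtained later with lm()).
a <- 0.6746    # slope
b <- -38.4551  # intercept

x <- 170        # a predictor value (height in cm)
y <- a * x + b  # response predicted by the straight line
print(y)        # 76.2269
```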
• 3. Steps to Establish a Regression
• A simple example of regression is predicting the weight of a person when his height is known. To do this we need the relationship between the height and weight of a person.
• The steps to create the relationship are:
• Carry out the experiment of gathering a sample of observed values of height and corresponding weight.
• Create a relationship model using the lm() function in R.
• Find the coefficients from the model created and create the mathematical equation using these coefficients.
• Get a summary of the relationship model to know the average error in prediction (the residuals).
• To predict the weight of new persons, use the predict() function in R.
• 4. Input Data
• Below is the sample data representing the observations:
# Values of height
151, 174, 138, 186, 128, 136, 179, 163, 152, 131
# Values of weight
63, 81, 56, 91, 47, 57, 76, 72, 62, 48
• lm() Function
• This function creates the relationship model between the predictor and the response variable.
• Syntax
• The basic syntax for the lm() function in linear regression is:
lm(formula, data)
• Following is the description of the parameters used:
• formula is a symbol presenting the relation between x and y.
• data is the data frame in which the variables of the formula are looked up; it can be omitted when the variables already exist in the workspace, as in the examples below.
• 5.
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
# Apply the lm() function.
relation <- lm(y ~ x)
print(relation)
• Output:
Call:
lm(formula = y ~ x)

Coefficients:
(Intercept)            x
   -38.4551       0.6746
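The coefficients printed above can also be extracted programmatically with coef(), which is handy for assembling the y = ax + b equation; a minimal sketch:

```r
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
relation <- lm(y ~ x)

# coef() returns the fitted coefficients as a named vector.
b <- coef(relation)[["(Intercept)"]]
a <- coef(relation)[["x"]]
cat(sprintf("y = %.4f * x + (%.4f)\n", a, b))  # y = 0.6746 * x + (-38.4551)
```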
• 6. Get the Summary of the Relationship
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
# Apply the lm() function.
relation <- lm(y ~ x)
print(summary(relation))
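summary() returns an object whose components can also be inspected individually; a minimal sketch of the most commonly used ones (residuals, R-squared, and the residual standard error):

```r
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
relation <- lm(y ~ x)
s <- summary(relation)

print(residuals(relation))  # observed minus fitted values
print(s$r.squared)          # fraction of variance in y explained by x (~0.95 here)
print(s$sigma)              # residual standard error, the typical prediction error
```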
• 7. predict() Function
• Syntax
• The basic syntax for predict() in linear regression is:
predict(object, newdata)
• Following is the description of the parameters used:
• object is the model already created using the lm() function.
• newdata is the data frame containing the new values for the predictor variable.
• 8. Predict the weight of new persons
# The predictor vector.
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
# The response vector.
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
# Apply the lm() function.
relation <- lm(y ~ x)
# Find the weight of a person with height 170.
a <- data.frame(x = 170)
result <- predict(relation, a)
print(result)
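predict() also accepts several new values at once, as long as newdata is a data frame whose column name matches the predictor in the formula; a minimal sketch:

```r
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
relation <- lm(y ~ x)

# The column name must match the predictor used in the formula (x here).
new_heights <- data.frame(x = c(160, 170, 180))
result <- predict(relation, new_heights)
print(result)  # roughly 69.48, 76.23, 82.97
```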
• 9. Visualize the Regression Graphically
# Create the predictor and response variable.
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
relation <- lm(y ~ x)
# Give the chart file a name.
png(file = "linearregression.png")
# Plot the chart.
plot(y, x, col = "blue", main = "Height & Weight Regression",
     abline(lm(x ~ y)), cex = 1.3, pch = 16,
     xlab = "Weight in Kg", ylab = "Height in cm")
dev.off()
• 10. Regression assumptions
• Linear regression makes several assumptions about the data, such as:
• Linearity of the data. The relationship between the predictor (x) and the outcome (y) is assumed to be linear.
• Normality of residuals. The residual errors are assumed to be normally distributed.
• Homogeneity of residual variance. The residuals are assumed to have a constant variance (homoscedasticity).
• Independence of residual error terms.
• 11.
• Assumptions about the form of the model
• Assumptions about the errors
• Assumptions about the predictors
• The predictor variables x1, x2, ..., xn are assumed to be linearly independent of each other. If this assumption is violated, the problem is called the collinearity problem.
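One simple way to screen for collinearity is to look at the pairwise correlations between candidate predictors; a minimal sketch using the built-in mtcars data set (chosen here purely for illustration):

```r
# Pairwise correlations among three candidate predictors.
# Values close to +1 or -1 warn of a possible collinearity problem.
predictors <- mtcars[, c("cyl", "disp", "hp")]
print(cor(predictors))  # cyl and disp correlate at about 0.90
```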
• 12. Validating linear assumptions
• Step 1 - Install the necessary libraries
install.packages("ggplot2")
install.packages("dplyr")
library(ggplot2)
library(dplyr)
• Step 2 - Read a csv file and explore the data
data <- read.csv("/content/Data_1.csv")
head(data)    # head() returns the top 6 rows of the data frame
summary(data) # returns the statistical summary of the data columns
plot(data$Width, data$Cost) # plot() gives a visual representation of the relation between Width and Cost
cor(data$Width, data$Cost)  # correlation between the two variables
• 13. Using Scatter Plot
• The linearity of the relationship between the dependent and predictor variables of the model can be studied using scatter plots.
No of hours   freshmen_score
2             55
2.5           62
3             65
3.5           70
4             77
4.5           82
5             75
5.5           83
6             85
6.5           88
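The table above can be fitted and drawn directly in R; a minimal sketch (the variable names are chosen to mirror the table headers):

```r
# Study-time data from the table above.
noofhours <- c(2, 2.5, 3, 3.5, 4, 4.5, 5, 5.5, 6, 6.5)
freshmen_score <- c(55, 62, 65, 70, 77, 82, 75, 83, 85, 88)

model <- lm(freshmen_score ~ noofhours)

# Scatter plot with the fitted regression line overlaid.
plot(noofhours, freshmen_score,
     xlab = "No of hours", ylab = "freshmen_score")
abline(model, col = "red")

print(coef(model))  # positive slope: scores rise with study time
```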
• 14.
• Plotting study hours (HS$noofhours) against the freshmen score (freshmen_score) shows that study time exhibits a linear relationship with the freshmen score.
• Using R:
x = 1:20
y = x^2
plot(lm(y ~ x)) # Residuals vs fitted plots
plot(lm(dist ~ speed, data = cars))
• 15. Quantile-Quantile Plot
• The quantile-quantile plot (Q-Q plot) in R plots the quantiles of two variables against each other to check whether the two distributions are similar with respect to their locations. The qqline() function in R is used to draw a Q-Q line on the plot.
• 16. R - Quantile-Quantile Plot
• Syntax: qqline(x, y, col)
• Parameters:
• x, y: X and Y coordinates of the plot
• col: defines the color
• Returns: a Q-Q line on the plot of the coordinates provided
# Set seed for reproducibility
set.seed(500)
# Create random normally distributed values
x <- rnorm(1200)
# QQ plot of normally distributed values
qqnorm(x)
# Add qqline to plot
qqline(x, col = "darkgreen")
• 17. Implementation of QQ plot of Logistically Distributed Values
# Set seed for reproducibility
set.seed(500)
# Random values according to the logistic distribution
y <- rlogis(800)
# QQ plot of the logistically distributed values
qqnorm(y)
# Add qqline to plot
qqline(y, col = "darkgreen")
• 18. The Scale Location Plot
• The scale-location plot is very similar to residuals vs fitted, but simplifies analysis of the homoskedasticity assumption.
• It plots the square root of the absolute value of the standardized residuals instead of the residuals themselves.
• Recall that homoskedasticity means constant variance in linear regression.
• More formally, in linear regression you have y = Xβ + ε, where X is your design matrix, y is your vector of responses, and ε is your vector of errors; homoskedasticity means Var(ε_i) = σ² for every observation.
plot(lm(dist ~ speed, data = cars))
• 19.
• We want to check two things:
• That the red line is approximately horizontal. Then the average magnitude of the standardized residuals isn't changing much as a function of the fitted values.
• That the spread around the red line doesn't vary with the fitted values. Then the variability of magnitudes doesn't vary much as a function of the fitted values.
• 20. Residuals vs fitted values plots
• The fitted vs residuals plot is mainly useful for investigating:
• Whether the linearity assumption holds. This is indicated by the mean residual value being close to 0 for every fitted-value region, i.e. the red line staying close to the dashed line in the graph.
• Whether the data contain outliers. This is indicated by some 'extreme' residuals that are far from the other residual points.
• If we can see a pattern in the graph, it indicates a violation of linearity; for example, when y is a 3rd-order polynomial function of x, the relationship between x and y is non-linear and the residuals will be a non-linear function of the fitted values.
data("cars")
model <- lm(dist ~ speed, data = cars)
plot(model, which = 1)
• 21. The Scale Location Plot
• The scale-location plot is very similar to residuals vs fitted, but plots the square root of the standardized residuals against the fitted values to verify the homoskedasticity assumption. We want to look at:
• The red line: the red line represents the average of the standardized residuals and must be approximately horizontal. If the line is approximately horizontal without much fluctuation, the average of the standardized residuals is approximately the same across fitted values.
• The variance around the line: the spread of the standardized residuals around the red line should not vary with the fitted values; this means the variance of the standardized residuals is approximately the same for each fitted value.
modelmt <- lm(disp ~ cyl + hp, data = mtcars)
plot(modelmt, which = 3)