AUTO MPG REGRESSION ANALYSIS
INTRODUCTION
• The objective of this project is to study the effect of Horsepower, Displacement, Cylinders, Acceleration and Weight on Miles Per Gallon (MPG). The dataset was obtained from the UCI website and a regression analysis was conducted.
• We chose this dataset because of its practical applications: miles per gallon (mpg) is a useful figure to consider when purchasing a car.
METHODOLOGY
The model used for the regression analysis has more than one independent variable, and therefore Multiple Regression Analysis is conducted. The variable to be predicted is called the dependent variable. The variables used to predict the dependent variable are called the independent variables.
Data Sourcing
The data is taken from the University of California, Irvine (UCI) website. It has been used extensively by students, educators and researchers all over the world and is a widely used source of datasets for regression analysis.
Link to the Dataset - http://archive.ics.uci.edu/ml/datasets/Auto+MPG
VARIABLES
DEPENDENT VARIABLE:
Miles per Gallon (mpg) - Continuous
INDEPENDENT VARIABLES:
Cylinders - Multi-valued discrete - Number of cylinders in the car (3, 4, 5, 6, 8)
Displacement - Continuous - Engine displacement, i.e. the volume swept by the pistons
Horsepower - Continuous - Power of the car's engine
Weight - Continuous - Weight of the car in lbs
Acceleration - Continuous - Acceleration of the car
MODEL 1 - Multiple Regression Analysis
Miles Per Gallon (MPG) is regressed on the four continuous independent variables (Displacement, Horsepower, Acceleration and Weight); this is the first model of our Regression Analysis. Its R-Squared shows that the model explains 70.70% of the variation in the dependent variable (MPG).
MODEL 2 - DEPENDENT VARIABLE TRANSFORMATION
• After applying a log transformation to the dependent variable (MPG), we found that R-Squared improved from 70.70% to 78.98%. The log transformation also made the data closer to normally distributed, as can be seen from the histogram. The formula is given below, where L_mpg is the log of MPG.
• L_mpg = β0 + β1Displacement + β2Horsepower + β3Acceleration + β4Weight
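The comparison between the lin-lin and log-lin fits can be sketched as follows. This is a minimal illustration on synthetic data, not the Auto MPG data and not the authors' code; the helper `fit_ols`, the two regressors and all numeric values are assumptions made for the example. Note that R-squared values computed for `mpg` and `log(mpg)` describe different dependent variables, so the comparison is only indicative.

```python
import numpy as np

def fit_ols(X, y):
    """Fit OLS with an intercept; return (coefficients, R-squared)."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return beta, 1.0 - ss_res / ss_tot

# Synthetic stand-in data (assumption): log(mpg) is linear in weight and horsepower
rng = np.random.default_rng(0)
n = 392
weight = rng.uniform(1600, 5200, n)
horsepower = rng.uniform(45, 230, n)
log_mpg = 4.2 - 0.0002 * weight - 0.003 * horsepower + rng.normal(0, 0.1, n)
mpg = np.exp(log_mpg)

X = np.column_stack([weight, horsepower])
beta_lin, r2_lin = fit_ols(X, mpg)           # lin-lin model: mpg on regressors
beta_log, r2_log = fit_ols(X, np.log(mpg))   # log-lin model: log(mpg) on regressors
print(f"lin-lin R^2 = {r2_lin:.4f}, log-lin R^2 = {r2_log:.4f}")
```

In the log-lin fit each coefficient is read as an approximate proportional change in mpg per unit change in the regressor.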
CORRELATION ANALYSIS
• We found a high correlation between the following pairs of variables:
1) Displacement and Horsepower
2) Weight and Horsepower
3) Weight and Displacement
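A correlation check of this kind could look like the sketch below. The data is synthetic and built so that the three engine-related columns move together; the column values and the 0.8 cut-off are assumptions for illustration, not figures from the slides.

```python
import numpy as np

# Synthetic stand-in columns (assumption): heavier cars get larger, more powerful engines
rng = np.random.default_rng(1)
n = 392
weight = rng.uniform(1600, 5200, n)
displacement = 0.08 * weight - 50 + rng.normal(0, 25, n)
horsepower = 0.4 * displacement + 25 + rng.normal(0, 12, n)
acceleration = rng.uniform(8, 25, n)  # unrelated to the others in this toy data

names = ["displacement", "horsepower", "weight", "acceleration"]
data = np.column_stack([displacement, horsepower, weight, acceleration])
corr = np.corrcoef(data, rowvar=False)  # 4x4 Pearson correlation matrix

# Report the pairs whose absolute correlation exceeds a hypothetical threshold
threshold = 0.8
high_pairs = [
    (names[i], names[j], round(float(corr[i, j]), 2))
    for i in range(len(names))
    for j in range(i + 1, len(names))
    if abs(corr[i, j]) > threshold
]
print(high_pairs)
```

Highly correlated regressors are natural candidates for an interaction term, and they also make individual coefficient estimates less stable.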
HISTOGRAM & SCATTER PLOT FOR LIN-LIN MODEL
[Figures: scatter plot and histogram]
HISTOGRAM & SCATTER PLOT FOR LOG-LIN MODEL
[Figures: scatter plot and histogram]
As the graphs show, the Log-Lin Model appears to be the better model because its histogram is closer to a normal distribution.
Hypothesis Testing - Paired sample t test
A hypothesis test is performed to check whether the coefficients of two variables are equal.
MODEL 3 - DUMMY VARIABLE ANALYSIS (STEP 1)
The first step in creating the dummy variables is to count the number of categories in the variable. As seen from the table, the Cylinders variable has 5 categories, with Eight having the highest frequency.
STEP 2 - DUMMY VARIABLE ANALYSIS
Multiple Regression is performed using the dummy-encoded Cylinders variable, with cylinder category 5 as the base category. Category 5 is Three cylinders, which has a frequency of 3.
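The two steps (count the categories, then encode all but one of them) can be sketched in plain Python. The function `dummy_encode`, the `cyl_` column names and the sample values are hypothetical; the slides use their own variable names.

```python
from collections import Counter

def dummy_encode(values, base):
    """Return one 0/1 column per category except `base`, the omitted reference category."""
    categories = sorted(set(values))
    if base not in categories:
        raise ValueError(f"base category {base!r} not present")
    kept = [c for c in categories if c != base]
    return {f"cyl_{c}": [1 if v == c else 0 for v in values] for c in kept}

# Hypothetical cylinder counts for a handful of cars (not the real data)
cylinders = [8, 4, 4, 6, 3, 8, 5, 4]

frequencies = Counter(cylinders)             # step 1: frequency of each category
dummies = dummy_encode(cylinders, base=3)    # step 2: encode, dropping the base category

print(frequencies)
for name, column in dummies.items():
    print(name, column)
```

Dropping one category avoids perfect collinearity with the intercept; each remaining dummy coefficient is then read relative to the omitted category.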
MODEL 4 - INTERACTION TERMS & REGRESSION ANALYSIS
The regression includes an interaction term (Displacement × Horsepower) along with the other independent variables. Displacement and Horsepower were chosen because of their high correlation.
MODEL 5 - Regression Analysis on Dummy Variables & Interaction Terms
A regression including both the dummy variables and the interaction term is run to check whether R-Squared increases. The equation for the model is given below.
L_mpg = β0 + β1Displacement + β2Horsepower + β3Acceleration + β4Weight + β5CYLINDER_COUNT4 + β6CYLINDER_COUNT2 + β7CYLINDER_COUNT3 + β8CYLINDER_COUNT5 + β9disp_horse
OBSERVATIONS FROM MODEL 5
• Here CYLINDER_COUNT1 is kept as the base category, so the coefficients of the other cylinder dummies are measured relative to it.
• We can see that CYLINDER_COUNT4 has 3.3% less mpg than CYLINDER_COUNT1.
• CYLINDER_COUNT2 is predicted to have 11.3 – (-3.3) = 14.6% more mpg than CYLINDER_COUNT4.
• To check whether this difference is significant, we estimated another model with CYLINDER_COUNT4 as the base category.
Test For Significant Difference
Here CYLINDER_COUNT4 is kept as the base category, and the regression shows that CYLINDER_COUNT2 has 14.6% more mpg than CYLINDER_COUNT4 (which agrees with our previous model).
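The arithmetic behind this comparison can be written out as below. The coefficient values are assumptions read off the slide text (11.3% taken as 0.113 and -3.3% as -0.033 in log points). In a log-lin model, multiplying a coefficient by 100 gives only an approximate percentage; the exact figure uses the exponential.

```python
import math

# Dummy coefficients in log points (assumed from the percentages quoted in the slides)
b_count2 = 0.113
b_count4 = -0.033

diff = b_count2 - b_count4               # the common base category cancels out
approx_pct = 100 * diff                  # usual "coefficient x 100" reading
exact_pct = 100 * (math.exp(diff) - 1)   # exact percentage difference in mpg

print(f"approximate: {approx_pct:.1f}%, exact: {exact_pct:.1f}%")
```

The gap between the two readings grows with the size of the coefficient, so the approximation is best kept for small differences.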
Testing Differences Between Groups (F-Test)
L_mpg = β0 + 𝛿0 CYLINDER_COUNT1 + β1 displacement + 𝛿1 c1_disp + β2 horsepower + 𝛿2 c1_horse + β3 weight + 𝛿3 c1_weight + β4 acceleration + 𝛿4 c1_acc

Null hypothesis:
𝛿0 = 𝛿1 = 𝛿2 = 𝛿3 = 𝛿4 = 0, i.e. there is no difference between the groups
Alternative hypothesis:
The null hypothesis is false, i.e. there is a difference between the groups
An F-statistic comparing the restricted and unrestricted models is used to test for a difference between the groups.
UNRESTRICTED MODEL
The unrestricted model contains the independent variables, the dummy variable (Cylinder Count 1) and the products of the dummy variable with each independent variable.
RESTRICTED MODEL
The restricted model is the regression on the base model, without the dummy variable and its products.
F-Test to Determine Difference between Groups
\[
F = \frac{(R_u^2 - R_r^2)/q}{(1 - R_u^2)/(n - k - 1)} = \frac{(0.8154 - 0.7898)/5}{(1 - 0.8154)/382} = 10.59
\]
Since 10.59 is greater than the F-table value for (5, 382), which is 2.2141, we reject the null hypothesis and conclude that there are differences between the groups.
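The same calculation as a short Python sketch, using only the numbers quoted in the slides. The function name `f_statistic` and its parameter names are hypothetical.

```python
def f_statistic(r2_unrestricted, r2_restricted, q, df_resid):
    """R-squared form of the F-statistic for q restrictions."""
    numerator = (r2_unrestricted - r2_restricted) / q
    denominator = (1.0 - r2_unrestricted) / df_resid
    return numerator / denominator

# Values from the slides: R2_u = 0.8154, R2_r = 0.7898, q = 5, n - k - 1 = 382
f_groups = f_statistic(0.8154, 0.7898, q=5, df_resid=382)
critical_value = 2.2141  # table value quoted in the slides

print(f"F = {f_groups:.2f}, reject null: {f_groups > critical_value}")
```

Here `q` is the number of restrictions under the null (the five 𝛿 coefficients), and `df_resid` is the residual degrees of freedom of the unrestricted model.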
Test for Heteroskedasticity - Breusch Pagan Test
Multiple Regression is done using the Log-Lin Model to check for heteroskedasticity.
As seen from the table, the error term (residual) is predicted, and its square is regressed on the regressors.
Hypothesis Testing for Heteroskedasticity
Continued..
Null Hypothesis - βdisplacement = βhorsepower = βweight = βacceleration = 0
Alternate Hypothesis - There is heteroskedasticity
\[
F = \frac{R_u^2/k}{(1 - R_u^2)/(n - k - 1)} = \frac{0.05/4}{(1 - 0.05)/387} = 5.092
\]
Since 5.092 is greater than the F-table value for (4, 387), which is 2.3719, the null is rejected. So our model exhibits heteroskedasticity.
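The procedure (fit the model, square the residuals, regress them on the regressors, form the F-statistic from the auxiliary R-squared) can be sketched as follows. The data is synthetic with error variance that grows with the first regressor; the helper `ols` and all values other than the slide figures are assumptions.

```python
import numpy as np

def ols(X, y):
    """Fit OLS with an intercept; return (residuals, R-squared)."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    r2 = 1.0 - float(resid @ resid) / float(((y - y.mean()) ** 2).sum())
    return resid, r2

# Synthetic heteroskedastic data (assumption): error spread grows with the first regressor
rng = np.random.default_rng(2)
n, k = 392, 2
X = rng.uniform(1, 10, size=(n, k))
y = 1.0 + X @ np.array([0.5, -0.3]) + rng.normal(0, 1, n) * X[:, 0]

resid, _ = ols(X, y)               # step 1: fit the model and keep the residuals
_, r2_aux = ols(X, resid ** 2)     # step 2: regress squared residuals on the regressors
f_bp = (r2_aux / k) / ((1.0 - r2_aux) / (n - k - 1))
print(f"auxiliary R^2 = {r2_aux:.3f}, F = {f_bp:.2f}")

# The figure quoted in the slides, recomputed from R^2 = 0.05, k = 4, n - k - 1 = 387
f_slide = (0.05 / 4) / ((1 - 0.05) / 387)
print(f"slide F = {f_slide:.3f}")
```

A large F here means the regressors explain the squared residuals, which is evidence against constant error variance.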
White Test for Heteroskedasticity
Multiple Regression is done using the Log-Lin Model.
Regression on the squares and cross products of the regressors
gen disp2 = displacement^2
gen horsepower2 = horsepower^2
gen Acceleration2 = acceleration^2
gen Weight2 = weight^2
gen disp_acceleration = displacement * acceleration
gen horse_acc = horsepower * acceleration
gen weight_acc = weight * acceleration
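Building the auxiliary regressors can also be sketched in Python. The slides list the four squares and three cross products with acceleration; the standard White test uses all pairwise cross products, so this sketch generates all of them. The function `white_terms`, the naming scheme and the sample values are hypothetical.

```python
from itertools import combinations

def white_terms(columns):
    """Return the squares and all pairwise cross products of the named columns."""
    terms = {}
    for name, values in columns.items():
        terms[f"{name}_sq"] = [v * v for v in values]
    for (a, xs), (b, ys) in combinations(columns.items(), 2):
        terms[f"{a}_x_{b}"] = [x * y for x, y in zip(xs, ys)]
    return terms

# Two hypothetical observations per regressor, for illustration only
columns = {
    "displacement": [307.0, 97.0],
    "horsepower": [130.0, 88.0],
    "weight": [3504.0, 2130.0],
    "acceleration": [12.0, 14.5],
}
terms = white_terms(columns)
print(len(terms), sorted(terms))
```

With four regressors this gives four squares and six cross products; the squared residuals are then regressed on these terms together with the original regressors.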
Contd..
Hypothesis Testing
Null Hypothesis - βdisplacement = βhorsepower = βweight = βacceleration = 0
Alternate Hypothesis - There is heteroskedasticity
The F statistic (90.44482) is greater than the F-table value (8.08); therefore we reject the null and confirm that there is heteroskedasticity.
Conclusion for Heteroskedasticity
As seen from the graph and the two tests, we can determine that there is
heteroskedasticity.
HETEROSKEDASTICITY ROBUST STANDARD ERRORS (HRSE)
Due to the presence of heteroskedasticity, the usual OLS variance and standard error estimates are not valid. Therefore we need to find heteroskedasticity-robust standard errors.
When a model exhibits heteroskedasticity, it is better to look at the robust standard errors than the OLS standard errors.
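One common way to compute robust standard errors is the sandwich estimator. The sketch below uses the HC1 variant on synthetic data; the slides do not state which variant their software applied, so that choice, the function name `ols_with_robust_se` and the data are all assumptions.

```python
import numpy as np

def ols_with_robust_se(X, y):
    """Return OLS coefficients, classical standard errors and HC1 robust standard errors."""
    design = np.column_stack([np.ones(len(y)), X])
    n, p = design.shape
    xtx_inv = np.linalg.inv(design.T @ design)
    beta = xtx_inv @ design.T @ y
    resid = y - design @ beta
    # Classical standard errors assume a constant error variance
    sigma2 = float(resid @ resid) / (n - p)
    se_ols = np.sqrt(np.diag(sigma2 * xtx_inv))
    # Sandwich: (X'X)^-1 X' diag(e^2) X (X'X)^-1, scaled by n / (n - p) for HC1
    meat = design.T @ (design * (resid ** 2)[:, None])
    cov_hc1 = (n / (n - p)) * xtx_inv @ meat @ xtx_inv
    se_robust = np.sqrt(np.diag(cov_hc1))
    return beta, se_ols, se_robust

# Synthetic heteroskedastic data (assumption): error spread grows with the first regressor
rng = np.random.default_rng(3)
n = 392
X = rng.uniform(1, 10, size=(n, 2))
y = 2.0 + X @ np.array([0.5, -0.3]) + rng.normal(0, 1, n) * X[:, 0]

beta, se_ols, se_robust = ols_with_robust_se(X, y)
for name, b, s1, s2 in zip(["const", "x1", "x2"], beta, se_ols, se_robust):
    print(f"{name}: coef={b:.3f} ols_se={s1:.3f} robust_se={s2:.3f}")
```

The coefficients are identical under both approaches; only the standard errors, and therefore the t-statistics and confidence intervals, change.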
Regression on Log-Lin Model
Robust Standard Errors
Summary
Model No    R-Squared    Adjusted R-Squared
Model 1     0.7070       0.7040
Model 2     0.7898       0.7876
Model 3     0.8112       0.8073
Model 4     0.8134       0.8110
Model 5     0.8286       0.8284
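The adjusted R-squared column can be cross-checked from R-squared with the standard formula 1 - (1 - R²)(n - 1)/(n - k - 1). The sample size n = 392 and the regressor counts k below are assumptions inferred from the degrees of freedom quoted in the slides, not values stated in them.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for a model with k regressors plus an intercept."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

n = 392  # assumed: implied by n - k - 1 = 387 with k = 4 in the Breusch Pagan slide

# (R-squared from the summary table, assumed number of regressors k)
models = {
    "Model 1": (0.7070, 4),
    "Model 2": (0.7898, 4),
    "Model 3": (0.8112, 8),  # 4 regressors + 4 cylinder dummies
    "Model 4": (0.8134, 5),  # 4 regressors + 1 interaction term
}
adjusted = {name: round(adjusted_r2(r2, n, k), 4) for name, (r2, k) in models.items()}
print(adjusted)
```

Model 5 is left out of the sketch: with k = 9 the same formula gives roughly 0.8246 rather than the tabulated 0.8284, so either the regressor count assumed here or a table entry differs.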