How Do US Drivers Choose the Cars They Buy
SHIH-WEN HUANG, SHEN YAN, LIYAN WANG
Kelley Blue Book. © Kelley Blue Book Co, Inc. All Rights Reserved.
Survey
Source of data
• This is a consumer study undertaken across the US by Kelley Blue Book (KBB), a vehicle valuation and automotive research company recognized by both consumers and the automotive industry.
• The purpose of this analysis is to find out how US drivers choose the cars they buy.
Data Cleansing
Missing values removed
Sorted in alphabetical order
Dataset Variables
• Dependent variable: OCRAT1 (Consumer Reports numerical score)
• Number of observations: n = 170
1. Mcode (manufacturer code)
2. CRREC (recommended by Consumer Reports = 1)
3. OREL1 (Consumer Reports reliability rating, 5-pt. scale)
4. OLCAP1 (luggage capacity, cu. ft.)
5. OGAS1 (gas mileage, mpg)
6. ORLGRM1 (rear leg room, inches)
7. OACCEL1 (acceleration, 0-60 mph)
8. OFSEAT1 (Consumer Reports rating of front seat comfort)
9. OSAF1 (Consumer Reports rating of safety)
10. OHAND1 (Consumer Reports rating of handling)
11. ORIDE1 (Consumer Reports rating of ride)
Descriptive Statistics
Variable Name     Mean      Standard Deviation
CRREC              3.112     1.0793283
OREL1             31.048    16.5318098
OLCAP1            18.647     3.4287537
OGAS1             28.644     1.8149920
ORLGRM1            9.077     1.8248783
OACCEL1            3.900     0.5509540
OFSEAT1            4.323     0.7672302
OSAF1              3.288     0.7333012
OHAND1             3.224     0.6772110
Variable Correlation
• All pairwise correlations between independent variables are below 0.5 in absolute value, except for the following:
  OLCAP1 with OGAS1 = -0.559
  OHAND1 with OGAS1 = 0.504
Variable Correlation
Transformation
Variable Selection
• Maximum R-squared
• Stepwise selection
• GLMSELECT
• AIC selection
Selected Model
OCRAT1 = -107.53353 + 0.88959*Mcode + 1.52307*OLCAP1 + 3.59932*OGAS1
         - 8.34379*OACCEL1 + 17.88028*OFSEAT1 + 15.66684*OSAF1
         + 39.36924*OHAND1 + 27.44342*ORIDE1
Regression Conclusion
• R-squared: 0.5754
• Adjusted R-squared: 0.5543
• Number of influence points: 17
• Overall F-statistic: 27.27
• P-value for overall F-test: <0.0001
• VIF = PRESS/SSR = 1.1433
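As a quick consistency check, the adjusted R-squared and the overall F-statistic reported here can be reproduced from R-squared alone. This is a sketch that assumes the n = 170 observations listed under Dataset Variables, p = 8 predictors (those in the selected model), and a model with an intercept:

```latex
\[
R^2_{\mathrm{adj}} = 1 - (1 - R^2)\,\frac{n-1}{n-p-1}
                   = 1 - (1 - 0.5754)\,\frac{169}{161} \approx 0.5543
\]
\[
F = \frac{R^2 / p}{(1 - R^2)/(n-p-1)}
  = \frac{0.5754 / 8}{0.4246 / 161} \approx 27.27
\]
```

Both values agree with the reported figures to the precision shown.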
PROC REG Source Code
PROC REG data=car_new;
  MODEL OCRAT1 = Mcode OLCAP1 OGAS1 OACCEL1 OFSEAT1 OSAF1 OHAND1 ORIDE1
        / r p influence vif;
  PLOT r.*p. r.*nqq.;
RUN;
Selected Model
Parameter Estimates
Variable     DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
---------------------------------------------------------------------------
Intercept     1       -210.36703          36.06039       -5.83     <.0001
Mcode         1          0.65258           0.28073        2.32     0.0215
OLCAP1        1          1.59184           0.21803        7.30     <.0001
OGAS1         1          5.58281           1.11826        4.99     <.0001
OACCEL1       1         -6.79223           1.60769       -4.22     <.0001
OFSEAT1       1         18.78770           5.61708        3.34     0.0011
OSAF1         1         21.18837           3.99055        5.31     <.0001
OHAND1        1         35.98280           4.59454        7.83     <.0001
ORIDE1        1         38.90925           4.57810        8.50     <.0001
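Written out, the parameter estimates correspond to the fitted equation below. The hat notation is added here for illustration and is not in the original slides. Each t value is the estimate divided by its standard error, shown for OLCAP1 as an example:

```latex
\[
\begin{aligned}
\widehat{\mathrm{OCRAT1}} = {} & -210.36703 + 0.65258\,\mathrm{Mcode} + 1.59184\,\mathrm{OLCAP1}
  + 5.58281\,\mathrm{OGAS1} \\
 & - 6.79223\,\mathrm{OACCEL1} + 18.78770\,\mathrm{OFSEAT1} + 21.18837\,\mathrm{OSAF1} \\
 & + 35.98280\,\mathrm{OHAND1} + 38.90925\,\mathrm{ORIDE1}
\end{aligned}
\]
\[
t_{\mathrm{OLCAP1}} = \frac{1.59184}{0.21803} \approx 7.30
\]
```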
Selected Model
Cross-validation Estimates
--------------Cross Validation Estimates---------------
Parameter         1          2          3          4          5
Intercept    -215.22   -221.657   -240.976    -190.66    -174.67
Mcode           0.78      0.471      0.884       0.51       0.64
OLCAP1          1.54      1.563      1.470       1.79       1.61
OGAS1           5.35      5.461      5.613       6.15       5.39
OACCEL1        -6.38     -6.281     -5.785      -7.92      -8.46
OFSEAT1        20.39     19.546     22.426      15.83      15.27
OSAF1          22.50     22.200     22.450      17.85      21.28
OHAND1         34.11     36.681     33.626      39.38      35.78
ORIDE1         38.68     40.296     42.117      35.96      37.25
Analysis of Variance for Model
• R-squared: 0.7782
• Adjusted R-squared: 0.7659
• Overall F-statistic: 63.17
• P-value for overall F-test: <0.0001
• VIF = PRESS/SSR = 1.1715
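The slides do not state how many observations remain in the rebuilt model, but the reported R-squared and adjusted R-squared imply a sample size. The sketch below assumes p = 8 predictors and a model with an intercept; the result is only approximate because the inputs are rounded to four decimals:

```latex
\[
\frac{1 - R^2_{\mathrm{adj}}}{1 - R^2} = \frac{n-1}{n-p-1}
\quad\Longrightarrow\quad
r = \frac{1 - 0.7659}{1 - 0.7782} \approx 1.0555,
\qquad
n = \frac{r\,(p+1) - 1}{r - 1} \approx 153
\]
\[
F = \frac{R^2 / p}{(1 - R^2)/(n-p-1)}
  = \frac{0.7782 / 8}{0.2218 / 144} \approx 63.2
\]
```

An n of about 153 would be consistent with dropping the 17 influence points of the initial model from the original 170 observations, although the slides do not confirm this. The F value computed this way matches the reported 63.17 up to rounding.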
PROC REG -- Influence Points and Multicollinearity
• Inspection of $z_i^*$ and $h_{ii}$ for all observations shows that there are 10 influence points.
• Here are the VIF values:

Variable     VIF
-------------------------------
Intercept    0
Mcode        1.27290
OLCAP1       1.77977
OGAS1        1.95318
OACCEL1      1.24724
OFSEAT1      1.19959
OSAF1        1.16904
OHAND1       1.62568
ORIDE1       1.17256
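To read the per-variable VIF values, recall the standard definition: the VIF of predictor j depends on the R-squared obtained when that predictor is regressed on all the other predictors. The symbol $R_j^2$ is introduced here for illustration. Inverting the definition for the largest value in the table (OGAS1) gives:

```latex
\[
\mathrm{VIF}_j = \frac{1}{1 - R_j^2}
\quad\Longrightarrow\quad
R_j^2 = 1 - \frac{1}{\mathrm{VIF}_j},
\qquad
R_{\mathrm{OGAS1}}^2 = 1 - \frac{1}{1.95318} \approx 0.488
\]
```

So even the most collinear predictor has less than half of its variance explained by the others, well below the common rule-of-thumb thresholds of VIF = 5 or 10.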
Residual Plot
Normal Plot of Residuals
Conclusions
• The final model holds up under cross-validation.
• The R-squared value is relatively high: $R^2 = 0.7782$.
• There are 10 influence points, which is acceptable given the sample size.
• There is no evidence of multicollinearity: all VIF values are below 2.
• The residual plot satisfies the assumptions: the residuals are unbiased and homoscedastic.
• The residuals are normally distributed.
Follow-up Analysis
• Is it possible to have more observations in the sample?
• Are there other factors influencing consumer choice that are not included in the original survey?
• Is this analysis too general? Should we break it down into several groups, e.g., used cars vs. new cars, SUVs vs. sedans?
Appendix
/* Import data and create new dataset called car */
PROC IMPORT datafile="C:/datasets/cars.csv"
    OUT=car
    DBMS=csv
    REPLACE;
  getnames=yes;
RUN;
PROC PRINT;
RUN;
Appendix
/* Descriptive statistics about each variable */
PROC MEANS data=car mean min max stddev p25 p75;
  VAR OCRAT1 OREL1 OLCAP1 OGAS1 ORLGRM1 OACCEL1 OFSEAT1 OSAF1 OHAND1 ORIDE1;
RUN;
PROC SGSCATTER data=car;
  MATRIX OCRAT1 Mcode OREL1 OLCAP1 OGAS1 ORLGRM1 OACCEL1 OFSEAT1 OSAF1 OHAND1 ORIDE1;
RUN;
/* Test the correlation between each independent variable */
PROC CORR data=car;
  VAR Mcode OREL1 OLCAP1 OGAS1 ORLGRM1 OACCEL1 OFSEAT1 OSAF1 OHAND1 ORIDE1;
RUN;
Appendix
/* Build Linear Regression Model for car dataset */
PROC REG data=car;
  MODEL OCRAT1 = Mcode CRREC OREL1 OLCAP1 OGAS1 ORLGRM1 OACCEL1
        OFSEAT1 OSAF1 OHAND1 ORIDE1 / r p vif;
  PLOT r.*p. r.*nqq.;
RUN;
* Model 1 Using Maximum R-squared Selection ;
PROC REG data=car;
  MODEL OCRAT1 = Mcode CRREC OREL1 OLCAP1 OGAS1 ORLGRM1 OACCEL1
        OFSEAT1 OSAF1 OHAND1 ORIDE1 / selection=maxr r p influence vif;
  PLOT r.*p. r.*nqq.;
RUN;
Appendix
* Model 2 Using Stepwise Selection ;
PROC REG data=car;
  MODEL OCRAT1 = Mcode CRREC OREL1 OLCAP1 OGAS1 ORLGRM1 OACCEL1 OFSEAT1 OSAF1
        OHAND1 ORIDE1 / selection=stepwise r p influence vif;
  PLOT r.*p. r.*nqq.;
RUN;
* Model 3 Using AIC Selection ;
PROC RSQUARE AIC;
  MODEL OCRAT1 = Mcode CRREC OREL1 OLCAP1 OGAS1 ORLGRM1 OACCEL1 OFSEAT1 OSAF1
        OHAND1 ORIDE1 / select=2;
RUN;
* Model 4 Using GLMSelection ;
PROC GLMSELECT data=car;
  MODEL OCRAT1 = Mcode CRREC OREL1 OLCAP1 OGAS1 ORLGRM1 OACCEL1 OFSEAT1 OSAF1
        OHAND1 ORIDE1;
RUN;
Appendix
* Initial final model ;
PROC REG data=car;
  MODEL OCRAT1 = Mcode OLCAP1 OGAS1 OACCEL1 OFSEAT1 OSAF1 OHAND1 ORIDE1
        / r p influence vif;
  PLOT r.*p. r.*nqq.;
RUN;
/* Remove the influence points and rebuild the model */
* Import new data and create new dataset called car_new ;
PROC IMPORT datafile='C:/datasets/cars_new.csv'
    OUT=car_new
    DBMS=csv
    REPLACE;
  getnames=yes;
RUN;
PROC PRINT;
RUN;
Appendix
/* Build the best regression model */
PROC REG data=car_new;
  MODEL OCRAT1 = Mcode OLCAP1 OGAS1 OACCEL1 OFSEAT1 OSAF1 OHAND1 ORIDE1
        / r p influence vif;
  PLOT r.*p. r.*nqq.;
RUN;
/* Cross validation (data= made explicit; car_new was the most recently created dataset) */
PROC GLMSELECT data=car_new seed=4530;
  MODEL OCRAT1 = Mcode OREL1 OLCAP1 OGAS1 ORLGRM1 OACCEL1 OFSEAT1
        OSAF1 OHAND1 ORIDE1
        / stats=all cvdetails=all details=summary
          selection=stepwise(select=cv drop=competitive) cvmethod=random(5);
RUN;
QUIT;

Editor's Notes

  1. Data transformation: The exploratory data analysis suggests that a linear model is adequate for fitting the data. The correlation graph (see Appendix A) shows linear relationships, so no data transformation is necessary in this case.
  2. Maximum R-squared selection: We look for the model with the largest R-squared while still taking the parameter p-values into account, and choose the regression model on that basis. As R-squared increases from step to step, all parameter p-values remain significant, but once 9 independent variables are included, predictors start to lose significance: in the 9-variable model, the p-value of OREL1 is 0.2, which is not significant.
  3. Stepwise selection is convenient for model builders because it produces a summary of the selection steps, as the slide shows.
  4. AIC selection: We choose the model with the smallest AIC, which is 1331.2492; this determines the variables in the model.
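
For reference, one common form of AIC for a least-squares regression is sketched below, where n is the number of observations, p the number of estimated parameters (including the intercept), and SSE the error sum of squares. These symbols are introduced here for illustration, and whether the reported value of 1331.2492 was computed with exactly this form is an assumption:

```latex
\[
\mathrm{AIC} = n \,\ln\!\left(\frac{\mathrm{SSE}}{n}\right) + 2p
\]
```

Smaller values are better: adding a variable lowers AIC only if it reduces SSE enough to offset the penalty of 2 per added parameter.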