SlideShare a Scribd company logo
1 of 10
Download to read offline
See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/344196551
Machine-learning analysis for automobile dataset
Article · May 2020
CITATIONS
0
READS
498
1 author:
Some of the authors of this publication are also working on these related projects:
Spring Multi-tenant View project
Services base for Microservice View project
Ahmed Ali
University of the Cumberlands
34 PUBLICATIONS 2 CITATIONS
SEE PROFILE
All content following this page was uploaded by Ahmed Ali on 10 September 2020.
The user has requested enhancement of the downloaded file.
AHMED ALI 1
Ahmed Ali
Ph.D. Student
University of Cumberlands
Automobile Data Analysis
AHMED ALI 2
Introduction
The automobile data analysis includes a dataset introduced from the Univerisity of
California Irvine Machine Learning Repository UCI and refined from Kaggle. According
to UCI (1985), the attributes consist of three different types of entities: (a) the model and
specification of an auto, which includes the characteristics, (b) the personal insurance, (c)
its normalized losses in use as compared to other cars. The data set source for this model
collected from Insurance collision reports, personal insurance, and car models. According
to Kaggle, There are 26 data attributes in this model descript the data set model from
different angles. The objective of this report is to perform exploratory data analysis to find
the primary relationships between features, which include univariate analysis, which
includes finding the maximum and minimum, such as the weight, engine size,
horsepower, and price.
Moreover, perform a regression to predict the car prices. The first step after determining
the attributes is performing the data clean-up to find out the missing data and replace the
unwanted data with context data; for example, the UCI indicates the missing attribute
donated by “?” NAN will replace with zero. The data downloaded from the source website
UCI. Figure 2 shows the data set attributes in which two columns the first one include 26
attributes, while the second columns include the attributes ranges.
Figure 1 - Sample Orignal data from (https://archive.ics.uci.edu/ml/datasets/Automobile)
AHMED ALI 3
Figure2 Orignal data attaributes from (https://archive.ics.uci.edu/ml/datasets/Automobile)
The data preparation includes constructing one excel sheet to combine the two data set
from Figure 1 and Figure 1; the header row includes the 26 columns attributes and the rest
of rows imported from the file in Figure 1. Figure 3 shows a snapshot of the final excel
sheet after importing the data.
AHMED ALI 4
Figure 3 Data set after importing
Exploratory Data Analysis
In this section, we will start by importing the data into the google Colab exploratory data
analysis to find the primary relationship between attributes. The next step is to clean the
unwanted data such as “?” with NAN value if it is a character or mean value
Figure 4 data set imported from the Automobile datasheet
AHMED ALI 5
The next step fills the missing numerical data with the mean this done by calculating the
mean for every numerical column after excluding the nan-numerical data and then
replace the missing data with the mean; figure 4 show snap code of this steps, this steps
will be repeated for columns (price, curb-weight, engine-size, horsepower,peak-rpm)
Figure 4 – Replace the missing data with the mean
Figure 5 – Replace the missing data with the mean for performing the Univariate
analysis
AHMED ALI 6
Figure 6 – Univariate analysis
The univariate analysis shows that most car price between 5000 – 8000, for the engine
size is in range 60 to 170. Most of the car has a Curb Weight in range 1800 to 3100. Also,
the graph shows the peak rpm distributed in the range between 4600 to 5100. The most
vehicle has horsepower 50 to 120.
Figure 6 shows the correlation between the different attributes, the heat map ranging
from zero to one, and it also goes under zero. For example, car prices highly correlated
with engine size and horsepower; another observation is the highway-mpg and city-mpg
negatively correlated if they are higher in the price.
Figure 7 shows the linear regression model relation between the price and engine-size
variables by using a machine learning library and splitting the dataset into two groups,
the training data with 75% and test data 25%. The linear regression shows that the price
tends to go hight when the engine size becomes higher (Mediasittich, 2019).
AHMED ALI 7
Figure 6 – Heat Maps
AHMED ALI 8
Figure 7 – linear regression
Figure 8– linear regression
AHMED ALI 9
References
Mediasittich. (2019). Linear regression for car price prediction. Retrieved from
https://www.kaggle.com/mediasittich/linear-regression-for-car-price-prediction
Automobile Data Set. (n.d.). Retrieved from
https://archive.ics.uci.edu/ml/datasets/Automobile
Shubhamsinghgharsele. (2018). Exploratory Data Analysis on Automobile Dataset.
Retrieved from https://www.kaggle.com/shubhamsinghgharsele/exploratory-
data-analysis-on-automobile-dataset
View publication stats
View publication stats

More Related Content

Similar to AutomobileDataAnalysis.pdf

Tesla's technological innovations by ghanesh kulkarni
Tesla's technological innovations by ghanesh kulkarniTesla's technological innovations by ghanesh kulkarni
Tesla's technological innovations by ghanesh kulkarni
Ghanesh Kulkarni, PMP
 
CarStream: An Industrial System of Big Data Processing for Internet of Vehicles
CarStream: An Industrial System of Big Data Processing for Internet of VehiclesCarStream: An Industrial System of Big Data Processing for Internet of Vehicles
CarStream: An Industrial System of Big Data Processing for Internet of Vehicles
ijtsrd
 
INNOVATIVE BI APPROACHES AND METHODOLOGIES IMPLEMENTING A MULTILEVEL ANALYTIC...
INNOVATIVE BI APPROACHES AND METHODOLOGIES IMPLEMENTING A MULTILEVEL ANALYTIC...INNOVATIVE BI APPROACHES AND METHODOLOGIES IMPLEMENTING A MULTILEVEL ANALYTIC...
INNOVATIVE BI APPROACHES AND METHODOLOGIES IMPLEMENTING A MULTILEVEL ANALYTIC...
ijscai
 

Similar to AutomobileDataAnalysis.pdf (20)

Tesla's technological innovations by ghanesh kulkarni
Tesla's technological innovations by ghanesh kulkarniTesla's technological innovations by ghanesh kulkarni
Tesla's technological innovations by ghanesh kulkarni
 
Scenaria Perspective #1 - 2025 CAFE Compliance Costs
Scenaria Perspective #1 - 2025 CAFE Compliance CostsScenaria Perspective #1 - 2025 CAFE Compliance Costs
Scenaria Perspective #1 - 2025 CAFE Compliance Costs
 
CarStream: An Industrial System of Big Data Processing for Internet of Vehicles
CarStream: An Industrial System of Big Data Processing for Internet of VehiclesCarStream: An Industrial System of Big Data Processing for Internet of Vehicles
CarStream: An Industrial System of Big Data Processing for Internet of Vehicles
 
Data mining-for-prediction-of-aircraft-component-replacement
Data mining-for-prediction-of-aircraft-component-replacementData mining-for-prediction-of-aircraft-component-replacement
Data mining-for-prediction-of-aircraft-component-replacement
 
Azure IoT Hub on a Toradex Colibri VF61 – Part 3: Using Cloud Services to col...
Azure IoT Hub on a Toradex Colibri VF61 – Part 3: Using Cloud Services to col...Azure IoT Hub on a Toradex Colibri VF61 – Part 3: Using Cloud Services to col...
Azure IoT Hub on a Toradex Colibri VF61 – Part 3: Using Cloud Services to col...
 
Effect of Rack Friction, Column Friction and Vehicle Speed on Electric Power ...
Effect of Rack Friction, Column Friction and Vehicle Speed on Electric Power ...Effect of Rack Friction, Column Friction and Vehicle Speed on Electric Power ...
Effect of Rack Friction, Column Friction and Vehicle Speed on Electric Power ...
 
Prediction of Car Price using Linear Regression
Prediction of Car Price using Linear RegressionPrediction of Car Price using Linear Regression
Prediction of Car Price using Linear Regression
 
Multiple Linear Regression Applications Automobile Pricing
Multiple Linear Regression Applications Automobile PricingMultiple Linear Regression Applications Automobile Pricing
Multiple Linear Regression Applications Automobile Pricing
 
AD VANCED TRAFFIC ENGINEERING ASSIGNMENT CIV 8331
AD VANCED TRAFFIC ENGINEERING ASSIGNMENT CIV 8331 AD VANCED TRAFFIC ENGINEERING ASSIGNMENT CIV 8331
AD VANCED TRAFFIC ENGINEERING ASSIGNMENT CIV 8331
 
Modelling and simulation of driving cycle using simulink
Modelling and simulation of driving cycle using simulinkModelling and simulation of driving cycle using simulink
Modelling and simulation of driving cycle using simulink
 
INNOVATIVE BI APPROACHES AND METHODOLOGIES IMPLEMENTING A MULTILEVEL ANALYTIC...
INNOVATIVE BI APPROACHES AND METHODOLOGIES IMPLEMENTING A MULTILEVEL ANALYTIC...INNOVATIVE BI APPROACHES AND METHODOLOGIES IMPLEMENTING A MULTILEVEL ANALYTIC...
INNOVATIVE BI APPROACHES AND METHODOLOGIES IMPLEMENTING A MULTILEVEL ANALYTIC...
 
Marketplace affiliates potential analysis using cosine similarity and vision-...
Marketplace affiliates potential analysis using cosine similarity and vision-...Marketplace affiliates potential analysis using cosine similarity and vision-...
Marketplace affiliates potential analysis using cosine similarity and vision-...
 
RELIABILITY OF MECHANICAL SYSTEM OF SYSTEMS
RELIABILITY OF MECHANICAL SYSTEM OF SYSTEMSRELIABILITY OF MECHANICAL SYSTEM OF SYSTEMS
RELIABILITY OF MECHANICAL SYSTEM OF SYSTEMS
 
How IoT helps Firms Optimize their Digital Performance. Economic Impact of IoT
How IoT helps Firms Optimize their Digital Performance. Economic Impact of IoTHow IoT helps Firms Optimize their Digital Performance. Economic Impact of IoT
How IoT helps Firms Optimize their Digital Performance. Economic Impact of IoT
 
A Comparative Study Of Machine Learning Algorithms For Industry-Specific Frei...
A Comparative Study Of Machine Learning Algorithms For Industry-Specific Frei...A Comparative Study Of Machine Learning Algorithms For Industry-Specific Frei...
A Comparative Study Of Machine Learning Algorithms For Industry-Specific Frei...
 
CAR PRICE PREDICTION.pptx
CAR PRICE PREDICTION.pptxCAR PRICE PREDICTION.pptx
CAR PRICE PREDICTION.pptx
 
[000008]
[000008][000008]
[000008]
 
IRJET- Damage Assessment for Car Insurance
IRJET-  	  Damage Assessment for Car InsuranceIRJET-  	  Damage Assessment for Car Insurance
IRJET- Damage Assessment for Car Insurance
 
Information system infrastructure
Information system infrastructureInformation system infrastructure
Information system infrastructure
 
IFSM 201 Exceptional Education - snaptutorial.com
IFSM 201  Exceptional Education - snaptutorial.comIFSM 201  Exceptional Education - snaptutorial.com
IFSM 201 Exceptional Education - snaptutorial.com
 

Recently uploaded

Powerpoint showing results from tik tok metrics
Powerpoint showing results from tik tok metricsPowerpoint showing results from tik tok metrics
Powerpoint showing results from tik tok metrics
CaitlinCummins3
 
Abortion pills in Muscut<Oman(+27737758557) Cytotec available.inn Kuwait City.
Abortion pills in Muscut<Oman(+27737758557) Cytotec available.inn Kuwait City.Abortion pills in Muscut<Oman(+27737758557) Cytotec available.inn Kuwait City.
Abortion pills in Muscut<Oman(+27737758557) Cytotec available.inn Kuwait City.
daisycvs
 
如何办理(SUT毕业证书)斯威本科技大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(SUT毕业证书)斯威本科技大学毕业证成绩单本科硕士学位证留信学历认证如何办理(SUT毕业证书)斯威本科技大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(SUT毕业证书)斯威本科技大学毕业证成绩单本科硕士学位证留信学历认证
ogawka
 
#Mtp-Kit Prices » Qatar. Doha (+27737758557) Abortion Pills For Sale In Doha,...
#Mtp-Kit Prices » Qatar. Doha (+27737758557) Abortion Pills For Sale In Doha,...#Mtp-Kit Prices » Qatar. Doha (+27737758557) Abortion Pills For Sale In Doha,...
#Mtp-Kit Prices » Qatar. Doha (+27737758557) Abortion Pills For Sale In Doha,...
drm1699
 
Shots fired Budget Presentation.pdf12312
Shots fired Budget Presentation.pdf12312Shots fired Budget Presentation.pdf12312
Shots fired Budget Presentation.pdf12312
LR1709MUSIC
 

Recently uploaded (20)

stock price prediction using machine learning
stock price prediction using machine learningstock price prediction using machine learning
stock price prediction using machine learning
 
MichaelStarkes_UncutGemsProjectSummary.pdf
MichaelStarkes_UncutGemsProjectSummary.pdfMichaelStarkes_UncutGemsProjectSummary.pdf
MichaelStarkes_UncutGemsProjectSummary.pdf
 
WAM Corporate Presentation May 2024_w.pdf
WAM Corporate Presentation May 2024_w.pdfWAM Corporate Presentation May 2024_w.pdf
WAM Corporate Presentation May 2024_w.pdf
 
HAL Financial Performance Analysis and Future Prospects
HAL Financial Performance Analysis and Future ProspectsHAL Financial Performance Analysis and Future Prospects
HAL Financial Performance Analysis and Future Prospects
 
Powerpoint showing results from tik tok metrics
Powerpoint showing results from tik tok metricsPowerpoint showing results from tik tok metrics
Powerpoint showing results from tik tok metrics
 
Toyota Kata Coaching for Agile Teams & Transformations
Toyota Kata Coaching for Agile Teams & TransformationsToyota Kata Coaching for Agile Teams & Transformations
Toyota Kata Coaching for Agile Teams & Transformations
 
hyundai capital 2023 consolidated financial statements
hyundai capital 2023 consolidated financial statementshyundai capital 2023 consolidated financial statements
hyundai capital 2023 consolidated financial statements
 
Progress Report - UKG Analyst Summit 2024 - A lot to do - Good Progress1-1.pdf
Progress Report - UKG Analyst Summit 2024 - A lot to do - Good Progress1-1.pdfProgress Report - UKG Analyst Summit 2024 - A lot to do - Good Progress1-1.pdf
Progress Report - UKG Analyst Summit 2024 - A lot to do - Good Progress1-1.pdf
 
Abortion pills in Muscut<Oman(+27737758557) Cytotec available.inn Kuwait City.
Abortion pills in Muscut<Oman(+27737758557) Cytotec available.inn Kuwait City.Abortion pills in Muscut<Oman(+27737758557) Cytotec available.inn Kuwait City.
Abortion pills in Muscut<Oman(+27737758557) Cytotec available.inn Kuwait City.
 
Elevate Your Online Presence with SEO Services
Elevate Your Online Presence with SEO ServicesElevate Your Online Presence with SEO Services
Elevate Your Online Presence with SEO Services
 
Most Visionary Leaders in Cloud Revolution, Shaping Tech’s Next Era - 2024 (2...
Most Visionary Leaders in Cloud Revolution, Shaping Tech’s Next Era - 2024 (2...Most Visionary Leaders in Cloud Revolution, Shaping Tech’s Next Era - 2024 (2...
Most Visionary Leaders in Cloud Revolution, Shaping Tech’s Next Era - 2024 (2...
 
Blinkit: Revolutionizing the On-Demand Grocery Delivery Service.pptx
Blinkit: Revolutionizing the On-Demand Grocery Delivery Service.pptxBlinkit: Revolutionizing the On-Demand Grocery Delivery Service.pptx
Blinkit: Revolutionizing the On-Demand Grocery Delivery Service.pptx
 
如何办理(SUT毕业证书)斯威本科技大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(SUT毕业证书)斯威本科技大学毕业证成绩单本科硕士学位证留信学历认证如何办理(SUT毕业证书)斯威本科技大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(SUT毕业证书)斯威本科技大学毕业证成绩单本科硕士学位证留信学历认证
 
Moradia Isolada com Logradouro; Detached house with patio in Penacova
Moradia Isolada com Logradouro; Detached house with patio in PenacovaMoradia Isolada com Logradouro; Detached house with patio in Penacova
Moradia Isolada com Logradouro; Detached house with patio in Penacova
 
#Mtp-Kit Prices » Qatar. Doha (+27737758557) Abortion Pills For Sale In Doha,...
#Mtp-Kit Prices » Qatar. Doha (+27737758557) Abortion Pills For Sale In Doha,...#Mtp-Kit Prices » Qatar. Doha (+27737758557) Abortion Pills For Sale In Doha,...
#Mtp-Kit Prices » Qatar. Doha (+27737758557) Abortion Pills For Sale In Doha,...
 
Pay after result spell caster (,$+27834335081)@ bring back lost lover same da...
Pay after result spell caster (,$+27834335081)@ bring back lost lover same da...Pay after result spell caster (,$+27834335081)@ bring back lost lover same da...
Pay after result spell caster (,$+27834335081)@ bring back lost lover same da...
 
Pitch Deck Teardown: Goodcarbon's $5.5m Seed deck
Pitch Deck Teardown: Goodcarbon's $5.5m Seed deckPitch Deck Teardown: Goodcarbon's $5.5m Seed deck
Pitch Deck Teardown: Goodcarbon's $5.5m Seed deck
 
tekAura | Desktop Procedure Template (2016)
tekAura | Desktop Procedure Template (2016)tekAura | Desktop Procedure Template (2016)
tekAura | Desktop Procedure Template (2016)
 
Shots fired Budget Presentation.pdf12312
Shots fired Budget Presentation.pdf12312Shots fired Budget Presentation.pdf12312
Shots fired Budget Presentation.pdf12312
 
Unlocking Growth The Power of Outsourcing for CPA Firms
Unlocking Growth The Power of Outsourcing for CPA FirmsUnlocking Growth The Power of Outsourcing for CPA Firms
Unlocking Growth The Power of Outsourcing for CPA Firms
 

AutomobileDataAnalysis.pdf

  • 1. See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/344196551 Machine-learning analysis for automobile dataset Article · May 2020 CITATIONS 0 READS 498 1 author: Some of the authors of this publication are also working on these related projects: Spring Multi-tenant View project Services base for Microservice View project Ahmed Ali University of the Cumberlands 34 PUBLICATIONS 2 CITATIONS SEE PROFILE All content following this page was uploaded by Ahmed Ali on 10 September 2020. The user has requested enhancement of the downloaded file.
  • 2. AHMED ALI 1 Ahmed Ali Ph.D. Student University of Cumberlands Automobile Data Analysis
  • 3. AHMED ALI 2 Introduction The automobile data analysis includes a dataset introduced from the Univerisity of California Irvine Machine Learning Repository UCI and refined from Kaggle. According to UCI (1985), the attributes consist of three different types of entities: (a) the model and specification of an auto, which includes the characteristics, (b) the personal insurance, (c) its normalized losses in use as compared to other cars. The data set source for this model collected from Insurance collision reports, personal insurance, and car models. According to Kaggle, There are 26 data attributes in this model descript the data set model from different angles. The objective of this report is to perform exploratory data analysis to find the primary relationships between features, which include univariate analysis, which includes finding the maximum and minimum, such as the weight, engine size, horsepower, and price. Moreover, perform a regression to predict the car prices. The first step after determining the attributes is performing the data clean-up to find out the missing data and replace the unwanted data with context data; for example, the UCI indicates the missing attribute donated by “?” NAN will replace with zero. The data downloaded from the source website UCI. Figure 2 shows the data set attributes in which two columns the first one include 26 attributes, while the second columns include the attributes ranges. Figure 1 - Sample Orignal data from (https://archive.ics.uci.edu/ml/datasets/Automobile)
  • 4. AHMED ALI 3 Figure2 Orignal data attaributes from (https://archive.ics.uci.edu/ml/datasets/Automobile) The data preparation includes constructing one excel sheet to combine the two data set from Figure 1 and Figure 1; the header row includes the 26 columns attributes and the rest of rows imported from the file in Figure 1. Figure 3 shows a snapshot of the final excel sheet after importing the data.
  • 5. AHMED ALI 4 Figure 3 Data set after importing Exploratory Data Analysis In this section, we will start by importing the data into the google Colab exploratory data analysis to find the primary relationship between attributes. The next step is to clean the unwanted data such as “?” with NAN value if it is a character or mean value Figure 4 data set imported from the Automobile datasheet
  • 6. AHMED ALI 5 The next step fills the missing numerical data with the mean this done by calculating the mean for every numerical column after excluding the nan-numerical data and then replace the missing data with the mean; figure 4 show snap code of this steps, this steps will be repeated for columns (price, curb-weight, engine-size, horsepower,peak-rpm) Figure 4 – Replace the missing data with the mean Figure 5 – Replace the missing data with the mean for performing the Univariate analysis
  • 7. AHMED ALI 6 Figure 6 – Univariate analysis The univariate analysis shows that most car price between 5000 – 8000, for the engine size is in range 60 to 170. Most of the car has a Curb Weight in range 1800 to 3100. Also, the graph shows the peak rpm distributed in the range between 4600 to 5100. The most vehicle has horsepower 50 to 120. Figure 6 shows the correlation between the different attributes, the heat map ranging from zero to one, and it also goes under zero. For example, car prices highly correlated with engine size and horsepower; another observation is the highway-mpg and city-mpg negatively correlated if they are higher in the price. Figure 7 shows the linear regression model relation between the price and engine-size variables by using a machine learning library and splitting the dataset into two groups, the training data with 75% and test data 25%. The linear regression shows that the price tends to go hight when the engine size becomes higher (Mediasittich, 2019).
  • 8. AHMED ALI 7 Figure 6 – Heat Maps
  • 9. AHMED ALI 8 Figure 7 – linear regression Figure 8– linear regression
  • 10. AHMED ALI 9 References Mediasittich. (2019). Linear regression for car price prediction. Retrieved from https://www.kaggle.com/mediasittich/linear-regression-for-car-price-prediction Automobile Data Set. (n.d.). Retrieved from https://archive.ics.uci.edu/ml/datasets/Automobile Shubhamsinghgharsele. (2018). Exploratory Data Analysis on Automobile Dataset. Retrieved from https://www.kaggle.com/shubhamsinghgharsele/exploratory- data-analysis-on-automobile-dataset View publication stats View publication stats