Santosh Kumaravel Sundaravadivelu S3729461
Practical Data Science
Assignment – 1
Task 1: Data Preparation:
Step 1.1 – Importing Libraries and Data
Libraries such as Pandas, NumPy and Matplotlib are imported to make it easy to use data structures and
perform data analysis in Python. The data set is imported into a Jupyter notebook so that it can be
analysed and useful insights drawn from it. It is read with the function “read_csv”, using “#” as the
separator and supplying the corresponding column names. There are 238 observations and 26 variables.
The imported data is checked with the head function to confirm that the source data and the imported
data set my_data_automobile are the same.
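A minimal sketch of the import step, assuming the file has no header row and that its 26 columns follow the attribute names of the UCI Automobile data set. The report does not list its column names, so `COLUMNS`, the helper `load_automobile` and the file name in the usage comment are hypothetical:

```python
import pandas as pd

# Assumed column names (UCI Automobile attributes); hypothetical for this report.
COLUMNS = [
    "symboling", "normalized-losses", "make", "fuel-type", "aspiration",
    "num-of-doors", "body-style", "drive-wheels", "engine-location",
    "wheel-base", "length", "width", "height", "curb-weight", "engine-type",
    "num-of-cylinders", "engine-size", "fuel-system", "bore", "stroke",
    "compression-ratio", "horsepower", "peak-rpm", "city-mpg",
    "highway-mpg", "price",
]


def load_automobile(source):
    """Read a '#'-separated file that has no header row and attach column names."""
    # When names= is given, pandas treats the first line as data, not as a header.
    return pd.read_csv(source, sep="#", names=COLUMNS)


# Usage (file name is hypothetical):
# my_data_automobile = load_automobile("automobile.csv")
# print(my_data_automobile.shape)   # the report states 238 rows and 26 columns
# print(my_data_automobile.head())  # compare with the first lines of the source file
```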
Step 1.2 – Removing White Space
White space is present in my_data_automobile. The remove_whitespace function checks every column for
surrounding white space and removes it with the strip function. All the data is then converted to lower
case so that values differing only in capitalisation are treated as the same. For example, the
“fuel-type” column contains gas, Gas, diesel and Diesel, which reduce to gas and diesel once
everything is in lower case.
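One possible version of this clean-up, written as a hypothetical helper `strip_and_lower` (the report's own remove_whitespace function is not shown, so this is a sketch, not that function). It trims leading and trailing white space from every text value and lower-cases it, leaving numbers and missing values unchanged:

```python
import pandas as pd


def strip_and_lower(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: trim and lower-case every string cell, leave others as-is."""
    out = df.copy()
    for col in out.columns:
        # The isinstance check skips numbers and NaN, so numeric columns are unaffected.
        out[col] = out[col].map(
            lambda v: v.strip().lower() if isinstance(v, str) else v
        )
    return out


# Usage: my_data_automobile = strip_and_lower(my_data_automobile)
# " Gas", "gas " and "GAS" all become "gas".
```

Checking every cell rather than only columns of one dtype keeps the helper safe for columns that mix text and numbers.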
Step 1.3 – Typo Errors
While analysing the column “symboling”, some values fall outside the range of -3 to 3; we treat these
as typos (explained in Step 1.4). There are more typos in the columns make, aspiration and
num-of-doors, which are handled with the str.replace function by substituting the correct spelling.
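The corrections can be expressed as a mapping from misspelling to correct spelling for each column. The report does not list the actual typos, so the entries in `CORRECTIONS` and the helper name `fix_typos` are invented placeholders:

```python
import pandas as pd

# Hypothetical misspellings; replace with the ones found in the real data.
CORRECTIONS = {
    "make": {"toyouta": "toyota"},
    "aspiration": {"turrbo": "turbo"},
    "num-of-doors": {"fourr": "four"},
}


def fix_typos(df, corrections):
    """Apply literal str.replace substitutions column by column."""
    out = df.copy()
    for col, mapping in corrections.items():
        for wrong, right in mapping.items():
            # regex=False treats the misspelling as plain text, not as a pattern.
            out[col] = out[col].str.replace(wrong, right, regex=False)
    return out


# Usage: my_data_automobile = fix_typos(my_data_automobile, CORRECTIONS)
```

Because `str.replace` matches substrings, a short misspelling could also alter longer values; `Series.replace(mapping)` replaces whole cell values only and may be the safer choice when each typo is an entire value.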
Step 1.4 – Sanity Checks for Impossible Values
Impossible values (outliers) in this data set are unexpected values that do not fall within the given
range. In the data set “my_data_automobile”, the symboling column has a value that is not between -3
and 3. We treat it as a typo and replace it with the nearest valid value, 3, rather than removing it,
because symboling plays a vital role in the relationship with the other data.
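Replacing an out-of-range value with the nearest valid one is exactly what `Series.clip` does. A minimal sketch, assuming symboling has already been read as integers; the helper name `clamp_symboling` is hypothetical:

```python
import pandas as pd


def clamp_symboling(df, low=-3, high=3):
    """Replace symboling values outside [low, high] with the nearest bound.

    Returns the cleaned frame and the number of values that were out of range.
    """
    out = df.copy()
    # between() is inclusive on both ends, so True marks a valid value.
    out_of_range = ~out["symboling"].between(low, high)
    out["symboling"] = out["symboling"].clip(lower=low, upper=high)
    return out, int(out_of_range.sum())


# Usage: my_data_automobile, n_fixed = clamp_symboling(my_data_automobile)
```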
Step 1.5 – Checking Missing Values
Missing values are checked using isnull().sum(). The price column contains values of 0, and on
inspection only Volvo has prices of 0, so the zero prices for Volvo are replaced with the mean Volvo
price. Volvo's own mean is used because the price of a BMW and a Holden cannot be assumed to be the
same, given the brand name. If a missing value is found for a different make of car, the mean of the
whole column is used instead; this mean is calculated over the entire data set “my_data_automobile”
and filled in with the fillna() function with axis=0. When the data set is checked at this stage,
there are still genuinely missing values in the “num-of-doors” column, so these are removed using the
dropna function with the how=all attribute.
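A sketch of this step, assuming that a price of 0 means “missing”, that the make names are already lower-cased (so Volvo appears as "volvo"), and that the Volvo mean is taken over the non-zero Volvo prices. The helper names are hypothetical. Note that `dropna(how="all")` only drops rows in which every column is missing; dropping rows where just num-of-doors is missing needs `subset=["num-of-doors"]`, which is what this sketch uses:

```python
import numpy as np
import pandas as pd


def impute_zero_prices(df, make="volvo"):
    """Zero prices of `make` get that make's mean; other gaps get the column mean."""
    out = df.copy()
    # Treat 0 as missing so it does not pull the means down.
    out["price"] = out["price"].replace(0, np.nan)
    is_make = out["make"] == make
    make_prices = out.loc[is_make, "price"]
    out.loc[is_make, "price"] = make_prices.fillna(make_prices.mean())
    # Any remaining missing price is filled with the mean of the whole column.
    out["price"] = out["price"].fillna(out["price"].mean())
    return out


def drop_missing_doors(df):
    """Drop only the rows whose num-of-doors value is missing."""
    return df.dropna(subset=["num-of-doors"])


# Usage:
# print(my_data_automobile.isnull().sum())  # missing values per column
# my_data_automobile = drop_missing_doors(impute_zero_prices(my_data_automobile))
```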
Task 2: Data Exploration:
Step 2.1.1 – Safety First (Considering One Ordinal Value)
Safety should be a high priority when an end consumer buys a car, so the symboling (safety level)
column is examined first. A histogram is chosen for this ordinal variable because it gives a clear
picture of how often each safety level occurs. There are no cars at level -3, which suggests it is
much harder to make a perfect car. A few cars are at level -2; these are the closest to perfect cars
in the data set. Level 0 has the highest frequency, which suggests that most manufacturers are not
compromising on safety by going above 0. A few cars are at level 3, which suggests they offer very
little safety.
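A sketch of the frequency table and histogram, assuming the ordinal column is symboling with integer levels from -3 to 3. Half-integer bin edges centre one bar on each level, and reindexing the counts makes empty levels such as -3 show up explicitly as 0. Helper names are hypothetical:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend for scripts; omit inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd


def symboling_counts(df):
    """Number of cars at every level from -3 to 3, including levels with no cars."""
    return df["symboling"].value_counts().reindex(range(-3, 4), fill_value=0)


def plot_symboling_histogram(df):
    fig, ax = plt.subplots()
    # Edges -3.5, -2.5, ..., 3.5 give one bin per integer level.
    ax.hist(df["symboling"], bins=[x - 0.5 for x in range(-3, 5)], edgecolor="black")
    ax.set_xticks(range(-3, 4))
    ax.set_xlabel("symboling (safety level)")
    ax.set_ylabel("number of cars")
    return fig
```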
Step 2.1.2 – Type Matters (Considering One Nominal Value)
Body shape plays a vital role in choosing a car for a specific audience. For example, a sedan (52.97%)
suits a family of four, a hatchback (29.66%) suits a family that needs extra boot space, a wagon
(11.44%) suits more than four people, a hardtop (3.39%) suits people who like camping, and a
convertible (2.54%) suits people who are very fond of the sun. Manufacturers are striving to satisfy
this wide variety of audiences. According to the data set, sedans are manufactured in greater
numbers than any other body type. A pie chart is chosen because it is good for comparing the
percentage share of each body type.
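The percentages above are the kind of shares produced by `value_counts(normalize=True)`. A minimal sketch of the table and the pie chart, assuming the column is named `body-style` (a hypothetical name taken from the UCI data set):

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend for scripts; omit inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd


def body_style_shares(df):
    """Percentage of cars per body style, largest share first."""
    return df["body-style"].value_counts(normalize=True).mul(100).round(2)


def plot_body_style_pie(df):
    counts = df["body-style"].value_counts()
    fig, ax = plt.subplots()
    # autopct prints each wedge's percentage with two decimals.
    ax.pie(counts.values, labels=list(counts.index), autopct="%1.2f%%")
    ax.set_title("Share of body styles")
    return fig
```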
Step 2.1.3 – Mileage in City (Considering One Numeric Value)
Mileage is inversely proportional to the maintenance of the car, except for newer cars. There is a
threshold beyond which mileage cannot be improved for certain types of cars. For example, a car built
for maximum performance will not achieve the expected mileage, because a compromise is made according
to the preference of the buyer. In the data set, the maximum city mileage is 49 and the minimum is 13.
The average mileage is around 24.6, and according to the figure the 75th percentile is 28.7.
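The figures quoted above (minimum, maximum, mean and 75th percentile) are the kind of summary that `Series.describe` reports. A minimal sketch, assuming the column is named `city-mpg` as in the UCI data set; the helper names are hypothetical:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend for scripts; omit inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd


def city_mpg_summary(df):
    """count, mean, std, min, 25%, 50%, 75% and max of city mileage."""
    return df["city-mpg"].describe()


def plot_city_mpg(df, bins=10):
    fig, ax = plt.subplots()
    ax.hist(df["city-mpg"], bins=bins, edgecolor="black")
    ax.set_xlabel("city-mpg")
    ax.set_ylabel("number of cars")
    return fig


# Usage: print(city_mpg_summary(my_data_automobile)[["min", "mean", "75%", "max"]])
```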
Step 2.2.1 – Horsepower vs Highway Mileage (Hypothesis 1)
The main aim of this hypothesis is to analyse the relationship between two columns in the data set.
Comparing horsepower with highway mileage, cars with high horsepower have low highway mileage. This
observation is based on the pattern in the scatter plot, which was chosen because it makes such a
pattern visible. In Fig 2.2.1 the points run from high horsepower with low mileage to low horsepower
with high mileage. High horsepower is mostly found in high-cost cars. So, to achieve the best highway
mileage, cars with lower horsepower should be considered.
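A sketch of the scatter plot, with Pearson's correlation coefficient added as a numeric check on the visual pattern (the report itself relies on the plot alone, so the coefficient is an addition). It assumes `horsepower` may contain non-numeric placeholders such as "?", which are coerced to missing values; the helper name is hypothetical:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend for scripts; omit inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd


def horsepower_vs_highway(df):
    """Scatter highway-mpg against horsepower and return (figure, Pearson's r)."""
    # Non-numeric entries become NaN instead of raising an error.
    hp = pd.to_numeric(df["horsepower"], errors="coerce")
    mpg = pd.to_numeric(df["highway-mpg"], errors="coerce")
    fig, ax = plt.subplots()
    ax.scatter(hp, mpg)
    ax.set_xlabel("horsepower")
    ax.set_ylabel("highway-mpg")
    # Series.corr defaults to Pearson and skips pairs with a missing value.
    return fig, hp.corr(mpg)
```

A value of r close to -1 would support the hypothesis that higher horsepower goes with lower highway mileage.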
Step 2.2.2 – Price Wars (Hypothesis 2)
The hypothesis is that the make of a car determines its price. Manufacturers with a long history and
a strong brand name lead the market. According to the data set, Mercedes-Benz prices its cars higher
than any other brand, and BMW has the second-highest prices. Each price range has its own audience,
which shapes what manufacturers produce; budget cars, for example, are aimed at a specific set of
buyers. Car prices range from 5118 to 45400. Based on this analysis, the hypothesis is plausible: the
price increases with the make of the car.
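A sketch of the per-make comparison, assuming `price` is numeric after the cleaning in Task 1. Grouping by make and sorting by mean price ranks the brands; the bar chart is one reasonable way to show it and is not necessarily the report's own figure. Helper names are hypothetical:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend for scripts; omit inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd


def price_by_make(df):
    """Mean, min and max price per make, most expensive make first."""
    stats = df.groupby("make")["price"].agg(["mean", "min", "max"])
    return stats.sort_values("mean", ascending=False)


def plot_price_by_make(df):
    stats = price_by_make(df)
    fig, ax = plt.subplots(figsize=(10, 4))
    ax.bar(list(stats.index), stats["mean"])
    ax.set_ylabel("mean price")
    ax.tick_params(axis="x", labelrotation=90)
    return fig
```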
Step 2.2.3 – Mileage and Cylinders (Hypothesis 3)
The hypothesis is that having fewer cylinders in the engine is a reason for good city mileage: with
fewer cylinders it is easier for the engine to process the petrol or gas, which in turn helps it
achieve better mileage. A scatter plot is used because it makes the comparison clear and easy to
read. The data set supports the hypothesis: cars with four cylinders have higher mileage, while cars
with twelve cylinders have very low mileage. The pattern suggests that the fewer the cylinders, the
higher the city mileage, and vice versa.
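A sketch of this comparison, assuming the cylinder counts are stored as words ("four", "twelve", …) as in the UCI data set, so they are mapped to integers before plotting. `WORD_TO_INT` and the helper name are hypothetical. Alongside the scatter plot it returns the mean city mileage per cylinder count:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend for scripts; omit inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Assumption: cylinder counts are spelled out as words in the raw data.
WORD_TO_INT = {"two": 2, "three": 3, "four": 4, "five": 5,
               "six": 6, "eight": 8, "twelve": 12}


def mpg_by_cylinders(df):
    """Return (mean city-mpg per cylinder count, scatter-plot figure)."""
    cyl = df["num-of-cylinders"].map(WORD_TO_INT)
    fig, ax = plt.subplots()
    ax.scatter(cyl, df["city-mpg"])
    ax.set_xlabel("number of cylinders")
    ax.set_ylabel("city-mpg")
    # Group the mileage by the numeric cylinder count (sorted ascending).
    means = df["city-mpg"].groupby(cyl).mean()
    return means, fig
```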
Step 2.3 – Scatter Matrix
The scatter matrix is plotted by grouping all the numeric columns, with a figure size of 18.
Observations can be made from the plots along the diagonal: the diagonal of the matrix contains a
histogram of each of the 15 numeric variables, and the other cells contain scatter plots for every
pair of the 15 numeric variables.
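A scatter matrix like the one described can be drawn with `pandas.plotting.scatter_matrix`. A sketch, assuming the numeric columns are those with a numeric dtype after cleaning and that “figure size of 18” means an 18 by 18 inch figure; the helper name is hypothetical:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend for scripts; omit inside Jupyter
import pandas as pd
from pandas.plotting import scatter_matrix


def plot_scatter_matrix(df, size=18):
    """Pairwise scatter plots of all numeric columns, with histograms on the diagonal."""
    numeric = df.select_dtypes(include="number")
    # Returns a k x k NumPy array of Axes, where k is the number of numeric columns.
    return scatter_matrix(numeric, figsize=(size, size), diagonal="hist")
```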
