SlideShare a Scribd company logo
Dataset
• Data Acquisition
• There are various formats for a dataset, .csv, .json, .xlsx etc. The dataset can be stored in different
places, on your local machine or sometimes online.
In this section, you will learn how to load a dataset into our Jupyter Notebook.
In our case, the Automobile Dataset is an online source, and it is in CSV (comma separated value)
format. Let's use this dataset as an example to practice data reading.
• data source: https://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.data
• data type: csv
• The Pandas Library is a useful tool that enables us to read various datasets into a data frame; our
Jupyter notebook platforms have a built-in Pandas Library so that all we need to do is import
Pandas without installing.
Background
• Link - https://colab.research.google.com/drive/1S3hZ99_c-
eCQmU3mDyjSHle4KX1j_tMW?usp=sharing
• Information and background
• UCI Machine Learning Repository: Automobile Data Set
Objectives
• Handle missing values
• Bring the columns into correct datatypes
• Obtain mean, median, mode
• Convert miles per gallon (mpg) values into liters per 100 km using The formula
for unit conversion is L/100km = 235 / mpg
• Bring the original column range to uniform range 0-1
• In our dataset, "horsepower" is a real valued variable ranging from 48 to
288, it has 57 unique values. What if we only care about the price
difference between cars with high horsepower, medium horsepower, and
little horsepower (3 types)? Can we rearrange them into three ‘bins' to
simplify analysis?
Objectives
• We see the column "fuel-type" has two unique values, "gas" or
"diesel". Regression doesn't understand words, only numbers.
To use this attribute in regression analysis, we convert "fuel-
type" into indicator variables. Also do this for any other
columns?
• Find the correlation between the following columns: bore,
stroke,compression-ratio , and horsepower.
• find the scatterplot of "engine-size" and "price“
• Find which variables are suitable predictors of price by using
correlation – regression plot or scatterplot
Objectives
• Draw a heatmap of correlation between different predictors and check
which are correlated
• look at the relationship between "body-style" and "price“ using
suitable visualization plot – box plot
• examine engine "engine-location" and "price“ using suitable
visualization plot – box plot
• examine "drive-wheels" and "price" using suitable visualization plot
– box plot
• Use the "groupby" function to find the average "price" of each car
based on "body-style" ?
Objectives
• Use the "groupby" function to find other interesting knowledge?
• calculate the Pearson Correlation Coefficient and P-value of
'wheel-base' and 'price’.
• calculate the Pearson Correlation Coefficient and P-value of
'horsepower' and 'price’.
• calculate the Pearson Correlation Coefficient and P-value of
'length' and 'price’.
• calculate the Pearson Correlation Coefficient and P-value of 'width'
and 'price':
Objectives
• calculate the Pearson Correlation Coefficient and P-value of
other predictors and 'price’ to find strong evidence of correlation
• Use machine/deep learning for prediction of price
Next class
• Discussion next class
• Upload on Github
Thank you

More Related Content

Similar to DS.pptx

project ppt.pdf
project ppt.pdfproject ppt.pdf
project ppt.pdf
sumayyabeegam1
 
Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...
Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...
Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...
Spark Summit
 
Azure machine learning
Azure machine learningAzure machine learning
Azure machine learning
Simone Caldaro
 
INTERNSHIP PPT.pptx
INTERNSHIP PPT.pptxINTERNSHIP PPT.pptx
INTERNSHIP PPT.pptx
Mahalakshmi752536
 
APN Partner Webinar - Having Effective and Critical TCO Conversations
APN Partner Webinar - Having Effective and Critical TCO ConversationsAPN Partner Webinar - Having Effective and Critical TCO Conversations
APN Partner Webinar - Having Effective and Critical TCO Conversations
Amazon Web Services
 
Chap7-Multidimensional data modeling.pptx
Chap7-Multidimensional data modeling.pptxChap7-Multidimensional data modeling.pptx
Chap7-Multidimensional data modeling.pptx
MOHDAIMANFARHANBINMO
 
When We Spark and When We Don’t: Developing Data and ML Pipelines
When We Spark and When We Don’t: Developing Data and ML PipelinesWhen We Spark and When We Don’t: Developing Data and ML Pipelines
When We Spark and When We Don’t: Developing Data and ML Pipelines
Stitch Fix Algorithms
 
Nose Dive into Apache Spark ML
Nose Dive into Apache Spark MLNose Dive into Apache Spark ML
Nose Dive into Apache Spark ML
Ahmet Bulut
 
Beginners guide to_optimizer
Beginners guide to_optimizerBeginners guide to_optimizer
Beginners guide to_optimizer
Maria Colgan
 
2018 data warehouse features in spark
2018   data warehouse features in spark2018   data warehouse features in spark
2018 data warehouse features in spark
Chester Chen
 
Evolving The Optimal Relevancy Scoring Model at Dice.com: Presented by Simon ...
Evolving The Optimal Relevancy Scoring Model at Dice.com: Presented by Simon ...Evolving The Optimal Relevancy Scoring Model at Dice.com: Presented by Simon ...
Evolving The Optimal Relevancy Scoring Model at Dice.com: Presented by Simon ...
Lucidworks
 
AILABS - Lecture Series - Is AI the New Electricity. Topic- Role of AI in Log...
AILABS - Lecture Series - Is AI the New Electricity. Topic- Role of AI in Log...AILABS - Lecture Series - Is AI the New Electricity. Topic- Role of AI in Log...
AILABS - Lecture Series - Is AI the New Electricity. Topic- Role of AI in Log...
AILABS Academy
 
Auto-Pilot for Apache Spark Using Machine Learning
Auto-Pilot for Apache Spark Using Machine LearningAuto-Pilot for Apache Spark Using Machine Learning
Auto-Pilot for Apache Spark Using Machine Learning
Databricks
 
Running with Elephants: Predictive Analytics with HDInsight
Running with Elephants: Predictive Analytics with HDInsightRunning with Elephants: Predictive Analytics with HDInsight
Running with Elephants: Predictive Analytics with HDInsight
Chris Price
 
Bn1038 demo pega
Bn1038 demo  pegaBn1038 demo  pega
Bn1038 demo pega
conline training
 
UNIT_5_Data Wrangling.pptx
UNIT_5_Data Wrangling.pptxUNIT_5_Data Wrangling.pptx
UNIT_5_Data Wrangling.pptx
BhagyasriPatel2
 
Procedural vs. object oriented programming
Procedural vs. object oriented programmingProcedural vs. object oriented programming
Procedural vs. object oriented programming
Haris Bin Zahid
 
How to Build a Data Closed-loop Platform for Autonomous Driving?
How to Build a Data Closed-loop Platform for Autonomous Driving?How to Build a Data Closed-loop Platform for Autonomous Driving?
How to Build a Data Closed-loop Platform for Autonomous Driving?
Yu Huang
 
Switchable Map APIs with Drupal
Switchable Map APIs with DrupalSwitchable Map APIs with Drupal
Switchable Map APIs with Drupal
Ranel Padon
 
Learning to Optimize
Learning to OptimizeLearning to Optimize
Learning to Optimize
Pramit Choudhary
 

Similar to DS.pptx (20)

project ppt.pdf
project ppt.pdfproject ppt.pdf
project ppt.pdf
 
Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...
Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...
Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...
 
Azure machine learning
Azure machine learningAzure machine learning
Azure machine learning
 
INTERNSHIP PPT.pptx
INTERNSHIP PPT.pptxINTERNSHIP PPT.pptx
INTERNSHIP PPT.pptx
 
APN Partner Webinar - Having Effective and Critical TCO Conversations
APN Partner Webinar - Having Effective and Critical TCO ConversationsAPN Partner Webinar - Having Effective and Critical TCO Conversations
APN Partner Webinar - Having Effective and Critical TCO Conversations
 
Chap7-Multidimensional data modeling.pptx
Chap7-Multidimensional data modeling.pptxChap7-Multidimensional data modeling.pptx
Chap7-Multidimensional data modeling.pptx
 
When We Spark and When We Don’t: Developing Data and ML Pipelines
When We Spark and When We Don’t: Developing Data and ML PipelinesWhen We Spark and When We Don’t: Developing Data and ML Pipelines
When We Spark and When We Don’t: Developing Data and ML Pipelines
 
Nose Dive into Apache Spark ML
Nose Dive into Apache Spark MLNose Dive into Apache Spark ML
Nose Dive into Apache Spark ML
 
Beginners guide to_optimizer
Beginners guide to_optimizerBeginners guide to_optimizer
Beginners guide to_optimizer
 
2018 data warehouse features in spark
2018   data warehouse features in spark2018   data warehouse features in spark
2018 data warehouse features in spark
 
Evolving The Optimal Relevancy Scoring Model at Dice.com: Presented by Simon ...
Evolving The Optimal Relevancy Scoring Model at Dice.com: Presented by Simon ...Evolving The Optimal Relevancy Scoring Model at Dice.com: Presented by Simon ...
Evolving The Optimal Relevancy Scoring Model at Dice.com: Presented by Simon ...
 
AILABS - Lecture Series - Is AI the New Electricity. Topic- Role of AI in Log...
AILABS - Lecture Series - Is AI the New Electricity. Topic- Role of AI in Log...AILABS - Lecture Series - Is AI the New Electricity. Topic- Role of AI in Log...
AILABS - Lecture Series - Is AI the New Electricity. Topic- Role of AI in Log...
 
Auto-Pilot for Apache Spark Using Machine Learning
Auto-Pilot for Apache Spark Using Machine LearningAuto-Pilot for Apache Spark Using Machine Learning
Auto-Pilot for Apache Spark Using Machine Learning
 
Running with Elephants: Predictive Analytics with HDInsight
Running with Elephants: Predictive Analytics with HDInsightRunning with Elephants: Predictive Analytics with HDInsight
Running with Elephants: Predictive Analytics with HDInsight
 
Bn1038 demo pega
Bn1038 demo  pegaBn1038 demo  pega
Bn1038 demo pega
 
UNIT_5_Data Wrangling.pptx
UNIT_5_Data Wrangling.pptxUNIT_5_Data Wrangling.pptx
UNIT_5_Data Wrangling.pptx
 
Procedural vs. object oriented programming
Procedural vs. object oriented programmingProcedural vs. object oriented programming
Procedural vs. object oriented programming
 
How to Build a Data Closed-loop Platform for Autonomous Driving?
How to Build a Data Closed-loop Platform for Autonomous Driving?How to Build a Data Closed-loop Platform for Autonomous Driving?
How to Build a Data Closed-loop Platform for Autonomous Driving?
 
Switchable Map APIs with Drupal
Switchable Map APIs with DrupalSwitchable Map APIs with Drupal
Switchable Map APIs with Drupal
 
Learning to Optimize
Learning to OptimizeLearning to Optimize
Learning to Optimize
 

Recently uploaded

“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”
Claudio Di Ciccio
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
Quotidiano Piemontese
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
Zilliz
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Safe Software
 
Infrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI modelsInfrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI models
Zilliz
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
SOFTTECHHUB
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
Neo4j
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
mikeeftimakis1
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
名前 です男
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems S.M.S.A.
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
KAMESHS29
 
Mariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceXMariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceX
Mariano Tinti
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
shyamraj55
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
danishmna97
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
panagenda
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
Kari Kakkonen
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
Edge AI and Vision Alliance
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
Matthew Sinclair
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
panagenda
 

Recently uploaded (20)

“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
 
Infrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI modelsInfrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI models
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
 
Mariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceXMariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceX
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
 

DS.pptx

  • 1. Dataset • Data Acquisition • There are various formats for a dataset, .csv, .json, .xlsx etc. The dataset can be stored in different places, on your local machine or sometimes online. In this section, you will learn how to load a dataset into our Jupyter Notebook. In our case, the Automobile Dataset is an online source, and it is in CSV (comma separated value) format. Let's use this dataset as an example to practice data reading. • data source: https://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.data • data type: csv • The Pandas Library is a useful tool that enables us to read various datasets into a data frame; our Jupyter notebook platforms have a built-in Pandas Library so that all we need to do is import Pandas without installing.
  • 2. Background • Link - https://colab.research.google.com/drive/1S3hZ99_c- eCQmU3mDyjSHle4KX1j_tMW?usp=sharing • Information and background • UCI Machine Learning Repository: Automobile Data Set
  • 3. Objectives • Handle missing values • Bring the columns into correct datatypes • Obtain mean, median, mode • Convert miles per gallon (mpg) values into liters per 100 km using The formula for unit conversion is L/100km = 235 / mpg • Bring the original column range to uniform range 0-1 • In our dataset, "horsepower" is a real valued variable ranging from 48 to 288, it has 57 unique values. What if we only care about the price difference between cars with high horsepower, medium horsepower, and little horsepower (3 types)? Can we rearrange them into three ‘bins' to simplify analysis?
  • 4. Objectives • We see the column "fuel-type" has two unique values, "gas" or "diesel". Regression doesn't understand words, only numbers. To use this attribute in regression analysis, we convert "fuel- type" into indicator variables. Also do this for any other columns? • Find the correlation between the following columns: bore, stroke,compression-ratio , and horsepower. • find the scatterplot of "engine-size" and "price“ • Find which variables are suitable predictors of price by using correlation – regression plot or scatterplot
  • 5. Objectives • Draw a heatmap of correlation between different predictors and check which are correlated • look at the relationship between "body-style" and "price“ using suitable visualization plot – box plot • examine engine "engine-location" and "price“ using suitable visualization plot – box plot • examine "drive-wheels" and "price" using suitable visualization plot – box plot • Use the "groupby" function to find the average "price" of each car based on "body-style" ?
  • 6. Objectives • Use the "groupby" function to find other interesting knowledge? • calculate the Pearson Correlation Coefficient and P-value of 'wheel-base' and 'price’. • calculate the Pearson Correlation Coefficient and P-value of 'horsepower' and 'price’. • calculate the Pearson Correlation Coefficient and P-value of 'length' and 'price’. • calculate the Pearson Correlation Coefficient and P-value of 'width' and 'price':
  • 7. Objectives • calculate the Pearson Correlation Coefficient and P-value of other predictors and 'price’ to find strong evidence of correlation • Use machine/deep learning for prediction of price
  • 8. Next class • Discussion next class • Upload on Github