SlideShare a Scribd company logo
1 of 12
Regression: Predicting
House Prices
SRUTI JAIN
MACHINE LEARNING SPECIALIZATION
UNIVERSITY OF WASHINGTON
Problem Statement
Determine the housing prices of California properties for new sellers and also for buyers to
estimate the profitability of the deal.
Question: How much is my house worth?
Solution: Involves looking at recent sales
in the neighborhood
Dataset Details
1. The data is taken from California census data
with 20,640 instances & 10 attributes
2. Converted the text attribute (ocean_proximity)
into categorical data types using one hot
encoding scheme using Scikit package.
3. Attributes like latitude, longitude were used
during exploratory analysis. Not used in further
model building.
4. Feature standardization was performed on all
numeric data variables.
5. The dataset was split into Train-Validate-Test
samples using Stratified sampling.
Correlation Plot
Exploratory Analysis Plot
Plot to visualize role of latitude, longitude & population on the price of the house
Training-Testing Models
1. Linear Regression
2. Decision Tree Regressor
3. Random Forest Regressor
4. Support Vector Regressor
5. Fine Tuning the Hyperparameters for Random Forest Regressor using Grid Search and
Randomized Search
Note: Random seed values were picked to develop training, validation & testing sets in the ratio
60:20:20
Linear Regression
Linear regression helped understand which variable are significant & which not. Also since many
of our attributes are continuous, linear regression is a good approach to use as a starting step.
Decision Tree Regressor
Random Forest Regressor
Support Vector Regressor
Comparative Analysis
1. In multiple linear regression, the best R-Squared 0.6002, correlation of prediction and test is
0.7748672 and RMSE- 68321.70.
2. In Decision Tree, the best regression model comes from random forest with correlation
0.876914 and RMSE- 70269.57.
3. In SVM model, model with linear kernel performs best with correlation 0.82014 & RMSE-
110914.79.
4. Of the four models, random forest performs better than the others with least RMSE-
49261.28 obtained by tuning the Hyperparameters using Randomized Search.
Thank You !!

More Related Content

What's hot

House Price Estimates Based on Machine Learning Algorithm
House Price Estimates Based on Machine Learning AlgorithmHouse Price Estimates Based on Machine Learning Algorithm
House Price Estimates Based on Machine Learning Algorithm
ijtsrd
 
housepriceprediction-ml.pptx
housepriceprediction-ml.pptxhousepriceprediction-ml.pptx
housepriceprediction-ml.pptx
tommychauhan
 
Machine Learning Project
Machine Learning ProjectMachine Learning Project
Machine Learning Project
Abhishek Singh
 

What's hot (20)

House Price Prediction An AI Approach.
House Price Prediction An AI Approach.House Price Prediction An AI Approach.
House Price Prediction An AI Approach.
 
House Price Estimates Based on Machine Learning Algorithm
House Price Estimates Based on Machine Learning AlgorithmHouse Price Estimates Based on Machine Learning Algorithm
House Price Estimates Based on Machine Learning Algorithm
 
Predicting house price
Predicting house pricePredicting house price
Predicting house price
 
Predicting Moscow Real Estate Prices with Azure Machine Learning
Predicting Moscow Real Estate Prices with Azure Machine LearningPredicting Moscow Real Estate Prices with Azure Machine Learning
Predicting Moscow Real Estate Prices with Azure Machine Learning
 
IRJET- House Rent Price Prediction
IRJET- House Rent Price PredictionIRJET- House Rent Price Prediction
IRJET- House Rent Price Prediction
 
Housing price prediction
Housing price predictionHousing price prediction
Housing price prediction
 
House Price Prediction Using Machine Learning
House Price Prediction Using Machine LearningHouse Price Prediction Using Machine Learning
House Price Prediction Using Machine Learning
 
Machine learning
Machine learningMachine learning
Machine learning
 
housepriceprediction-ml.pptx
housepriceprediction-ml.pptxhousepriceprediction-ml.pptx
housepriceprediction-ml.pptx
 
Data analytics with python introductory
Data analytics with python introductoryData analytics with python introductory
Data analytics with python introductory
 
Machine Learning - Splitting Datasets
Machine Learning - Splitting DatasetsMachine Learning - Splitting Datasets
Machine Learning - Splitting Datasets
 
House Price Prediction Using Machine Learning Via Data Analysis
House Price Prediction Using Machine Learning Via Data AnalysisHouse Price Prediction Using Machine Learning Via Data Analysis
House Price Prediction Using Machine Learning Via Data Analysis
 
Linear regression in machine learning
Linear regression in machine learningLinear regression in machine learning
Linear regression in machine learning
 
Ames housing
Ames housingAmes housing
Ames housing
 
Machine Learning Project
Machine Learning ProjectMachine Learning Project
Machine Learning Project
 
Machine Learning and Real-World Applications
Machine Learning and Real-World ApplicationsMachine Learning and Real-World Applications
Machine Learning and Real-World Applications
 
Predictive Analytics - An Overview
Predictive Analytics - An OverviewPredictive Analytics - An Overview
Predictive Analytics - An Overview
 
Machine Learning - Accuracy and Confusion Matrix
Machine Learning - Accuracy and Confusion MatrixMachine Learning - Accuracy and Confusion Matrix
Machine Learning - Accuracy and Confusion Matrix
 
Decision Trees
Decision TreesDecision Trees
Decision Trees
 
laptop price prediction presentation
laptop price prediction presentationlaptop price prediction presentation
laptop price prediction presentation
 

Similar to Predicting house prices_Regression

Practical Data Science: Data Modelling and Presentation
Practical Data Science: Data Modelling and PresentationPractical Data Science: Data Modelling and Presentation
Practical Data Science: Data Modelling and Presentation
HariniMS1
 
Parameter Optimisation for Automated Feature Point Detection
Parameter Optimisation for Automated Feature Point DetectionParameter Optimisation for Automated Feature Point Detection
Parameter Optimisation for Automated Feature Point Detection
Dario Panada
 
RBHF_SDM_2011_Jie
RBHF_SDM_2011_JieRBHF_SDM_2011_Jie
RBHF_SDM_2011_Jie
MDO_Lab
 

Similar to Predicting house prices_Regression (20)

The Beginnings Of A Search Engine
The Beginnings Of A Search EngineThe Beginnings Of A Search Engine
The Beginnings Of A Search Engine
 
The Beginnings of a Search Engine
The Beginnings of a Search EngineThe Beginnings of a Search Engine
The Beginnings of a Search Engine
 
Unveiling the Market: Predicting House Prices with Data Science
Unveiling the Market: Predicting House Prices with Data ScienceUnveiling the Market: Predicting House Prices with Data Science
Unveiling the Market: Predicting House Prices with Data Science
 
Predicting House Prices: A Machine Learning Approach
Predicting House Prices: A Machine Learning ApproachPredicting House Prices: A Machine Learning Approach
Predicting House Prices: A Machine Learning Approach
 
Comparison of Segmentation Algorithms and Estimation of Optimal Segmentation ...
Comparison of Segmentation Algorithms and Estimation of Optimal Segmentation ...Comparison of Segmentation Algorithms and Estimation of Optimal Segmentation ...
Comparison of Segmentation Algorithms and Estimation of Optimal Segmentation ...
 
Practical Data Science: Data Modelling and Presentation
Practical Data Science: Data Modelling and PresentationPractical Data Science: Data Modelling and Presentation
Practical Data Science: Data Modelling and Presentation
 
Competition16
Competition16Competition16
Competition16
 
Predicting Moscow Real Estate Prices with Azure Machine Learning
Predicting Moscow Real Estate Prices with Azure Machine LearningPredicting Moscow Real Estate Prices with Azure Machine Learning
Predicting Moscow Real Estate Prices with Azure Machine Learning
 
Predicting Moscow Real Estate Prices with Azure Machine Learning
Predicting Moscow Real Estate Prices with Azure Machine LearningPredicting Moscow Real Estate Prices with Azure Machine Learning
Predicting Moscow Real Estate Prices with Azure Machine Learning
 
Parameter Optimisation for Automated Feature Point Detection
Parameter Optimisation for Automated Feature Point DetectionParameter Optimisation for Automated Feature Point Detection
Parameter Optimisation for Automated Feature Point Detection
 
Study of relevancy, diversity, and novelty in recommender systems
Study of relevancy, diversity, and novelty in recommender systemsStudy of relevancy, diversity, and novelty in recommender systems
Study of relevancy, diversity, and novelty in recommender systems
 
forest-cover-type
forest-cover-typeforest-cover-type
forest-cover-type
 
Classification modelling review
Classification modelling reviewClassification modelling review
Classification modelling review
 
Predicting rainfall using ensemble of ensembles
Predicting rainfall using ensemble of ensemblesPredicting rainfall using ensemble of ensembles
Predicting rainfall using ensemble of ensembles
 
RBHF_SDM_2011_Jie
RBHF_SDM_2011_JieRBHF_SDM_2011_Jie
RBHF_SDM_2011_Jie
 
Dmml report final
Dmml report finalDmml report final
Dmml report final
 
Python Code for Classification Supervised Machine Learning.pdf
Python Code for Classification Supervised Machine Learning.pdfPython Code for Classification Supervised Machine Learning.pdf
Python Code for Classification Supervised Machine Learning.pdf
 
casestudy_important.pptx
casestudy_important.pptxcasestudy_important.pptx
casestudy_important.pptx
 
Predict Backorder on a supply chain data for an Organization
Predict Backorder on a supply chain data for an OrganizationPredict Backorder on a supply chain data for an Organization
Predict Backorder on a supply chain data for an Organization
 
Study of Parametric Kernels for LSSVM Models in NIR Determination of Fishmeal...
Study of Parametric Kernels for LSSVM Models in NIR Determination of Fishmeal...Study of Parametric Kernels for LSSVM Models in NIR Determination of Fishmeal...
Study of Parametric Kernels for LSSVM Models in NIR Determination of Fishmeal...
 

Recently uploaded

一比一原版(Monash毕业证书)莫纳什大学毕业证原件一模一样
一比一原版(Monash毕业证书)莫纳什大学毕业证原件一模一样一比一原版(Monash毕业证书)莫纳什大学毕业证原件一模一样
一比一原版(Monash毕业证书)莫纳什大学毕业证原件一模一样
yhavx
 
Simplify hybrid data integration at an enterprise scale. Integrate all your d...
Simplify hybrid data integration at an enterprise scale. Integrate all your d...Simplify hybrid data integration at an enterprise scale. Integrate all your d...
Simplify hybrid data integration at an enterprise scale. Integrate all your d...
varanasisatyanvesh
 
obat aborsi Bontang wa 082135199655 jual obat aborsi cytotec asli di Bontang
obat aborsi Bontang wa 082135199655 jual obat aborsi cytotec asli di  Bontangobat aborsi Bontang wa 082135199655 jual obat aborsi cytotec asli di  Bontang
obat aborsi Bontang wa 082135199655 jual obat aborsi cytotec asli di Bontang
siskavia95
 
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
acoha1
 
原件一样伦敦国王学院毕业证成绩单留信学历认证
原件一样伦敦国王学院毕业证成绩单留信学历认证原件一样伦敦国王学院毕业证成绩单留信学历认证
原件一样伦敦国王学院毕业证成绩单留信学历认证
pwgnohujw
 
Abortion Clinic in Kempton Park +27791653574 WhatsApp Abortion Clinic Service...
Abortion Clinic in Kempton Park +27791653574 WhatsApp Abortion Clinic Service...Abortion Clinic in Kempton Park +27791653574 WhatsApp Abortion Clinic Service...
Abortion Clinic in Kempton Park +27791653574 WhatsApp Abortion Clinic Service...
mikehavy0
 
Audience Researchndfhcvnfgvgbhujhgfv.pptx
Audience Researchndfhcvnfgvgbhujhgfv.pptxAudience Researchndfhcvnfgvgbhujhgfv.pptx
Audience Researchndfhcvnfgvgbhujhgfv.pptx
Stephen266013
 
一比一原版(ucla文凭证书)加州大学洛杉矶分校毕业证学历认证官方成绩单
一比一原版(ucla文凭证书)加州大学洛杉矶分校毕业证学历认证官方成绩单一比一原版(ucla文凭证书)加州大学洛杉矶分校毕业证学历认证官方成绩单
一比一原版(ucla文凭证书)加州大学洛杉矶分校毕业证学历认证官方成绩单
aqpto5bt
 
原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证
原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证
原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证
pwgnohujw
 
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
q6pzkpark
 

Recently uploaded (20)

一比一原版(Monash毕业证书)莫纳什大学毕业证原件一模一样
一比一原版(Monash毕业证书)莫纳什大学毕业证原件一模一样一比一原版(Monash毕业证书)莫纳什大学毕业证原件一模一样
一比一原版(Monash毕业证书)莫纳什大学毕业证原件一模一样
 
Genuine love spell caster )! ,+27834335081) Ex lover back permanently in At...
Genuine love spell caster )! ,+27834335081)   Ex lover back permanently in At...Genuine love spell caster )! ,+27834335081)   Ex lover back permanently in At...
Genuine love spell caster )! ,+27834335081) Ex lover back permanently in At...
 
Simplify hybrid data integration at an enterprise scale. Integrate all your d...
Simplify hybrid data integration at an enterprise scale. Integrate all your d...Simplify hybrid data integration at an enterprise scale. Integrate all your d...
Simplify hybrid data integration at an enterprise scale. Integrate all your d...
 
Chapter 1 - Introduction to Data Mining Concepts and Techniques.pptx
Chapter 1 - Introduction to Data Mining Concepts and Techniques.pptxChapter 1 - Introduction to Data Mining Concepts and Techniques.pptx
Chapter 1 - Introduction to Data Mining Concepts and Techniques.pptx
 
Formulas dax para power bI de microsoft.pdf
Formulas dax para power bI de microsoft.pdfFormulas dax para power bI de microsoft.pdf
Formulas dax para power bI de microsoft.pdf
 
obat aborsi Bontang wa 082135199655 jual obat aborsi cytotec asli di Bontang
obat aborsi Bontang wa 082135199655 jual obat aborsi cytotec asli di  Bontangobat aborsi Bontang wa 082135199655 jual obat aborsi cytotec asli di  Bontang
obat aborsi Bontang wa 082135199655 jual obat aborsi cytotec asli di Bontang
 
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
 
Predictive Precipitation: Advanced Rain Forecasting Techniques
Predictive Precipitation: Advanced Rain Forecasting TechniquesPredictive Precipitation: Advanced Rain Forecasting Techniques
Predictive Precipitation: Advanced Rain Forecasting Techniques
 
原件一样伦敦国王学院毕业证成绩单留信学历认证
原件一样伦敦国王学院毕业证成绩单留信学历认证原件一样伦敦国王学院毕业证成绩单留信学历认证
原件一样伦敦国王学院毕业证成绩单留信学历认证
 
Abortion Clinic in Kempton Park +27791653574 WhatsApp Abortion Clinic Service...
Abortion Clinic in Kempton Park +27791653574 WhatsApp Abortion Clinic Service...Abortion Clinic in Kempton Park +27791653574 WhatsApp Abortion Clinic Service...
Abortion Clinic in Kempton Park +27791653574 WhatsApp Abortion Clinic Service...
 
MATERI MANAJEMEN OF PENYAKIT TETANUS.ppt
MATERI  MANAJEMEN OF PENYAKIT TETANUS.pptMATERI  MANAJEMEN OF PENYAKIT TETANUS.ppt
MATERI MANAJEMEN OF PENYAKIT TETANUS.ppt
 
Credit Card Fraud Detection: Safeguarding Transactions in the Digital Age
Credit Card Fraud Detection: Safeguarding Transactions in the Digital AgeCredit Card Fraud Detection: Safeguarding Transactions in the Digital Age
Credit Card Fraud Detection: Safeguarding Transactions in the Digital Age
 
NOAM AAUG Adobe Summit 2024: Summit Slam Dunks
NOAM AAUG Adobe Summit 2024: Summit Slam DunksNOAM AAUG Adobe Summit 2024: Summit Slam Dunks
NOAM AAUG Adobe Summit 2024: Summit Slam Dunks
 
The Significance of Transliteration Enhancing
The Significance of Transliteration EnhancingThe Significance of Transliteration Enhancing
The Significance of Transliteration Enhancing
 
Audience Researchndfhcvnfgvgbhujhgfv.pptx
Audience Researchndfhcvnfgvgbhujhgfv.pptxAudience Researchndfhcvnfgvgbhujhgfv.pptx
Audience Researchndfhcvnfgvgbhujhgfv.pptx
 
一比一原版(ucla文凭证书)加州大学洛杉矶分校毕业证学历认证官方成绩单
一比一原版(ucla文凭证书)加州大学洛杉矶分校毕业证学历认证官方成绩单一比一原版(ucla文凭证书)加州大学洛杉矶分校毕业证学历认证官方成绩单
一比一原版(ucla文凭证书)加州大学洛杉矶分校毕业证学历认证官方成绩单
 
Northern New England Tableau User Group (TUG) May 2024
Northern New England Tableau User Group (TUG) May 2024Northern New England Tableau User Group (TUG) May 2024
Northern New England Tableau User Group (TUG) May 2024
 
原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证
原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证
原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证
 
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
 
How to Transform Clinical Trial Management with Advanced Data Analytics
How to Transform Clinical Trial Management with Advanced Data AnalyticsHow to Transform Clinical Trial Management with Advanced Data Analytics
How to Transform Clinical Trial Management with Advanced Data Analytics
 

Predicting house prices_Regression

  • 1. Regression: Predicting House Prices SRUTI JAIN MACHINE LEARNING SPECIALIZATION UNIVERSITY OF WASHINGTON
  • 2. Problem Statement Determine the housing prices of California properties for new sellers and also for buyers to estimate the profitability of the deal. Question: How much is my house worth? Solution: Involves looking at recent sales in the neighborhood
  • 3. Dataset Details 1. The data is taken from California census data with 20,640 instances & 10 attributes 2. Converted the text attribute (ocean_proximity) into categorical data types using one hot encoding scheme using Scikit package. 3. Attributes like latitude, longitude were used during exploratory analysis. Not used in further model building. 4. Feature standardization was performed on all numeric data variables. 5. The dataset was split into Train-Validate-Test samples using Stratified sampling.
  • 5. Exploratory Analysis Plot Plot to visualize role of latitude, longitude & population on the price of the house
  • 6. Training-Testing Models 1. Linear Regression 2. Decision Tree Regressor 3. Random Forest Regressor 4. Support Vector Regressor 5. Fine Tuning the Hyperparameters for Random Forest Regressor using Grid Search and Randomized Search Note: Random seed values were picked to develop training, validation & testing sets in the ratio 60:20:20
  • 7. Linear Regression Linear regression helped understand which variable are significant & which not. Also since many of our attributes are continuous, linear regression is a good approach to use as a starting step.
  • 11. Comparative Analysis 1. In multiple linear regression, the best R-Squared 0.6002, correlation of prediction and test is 0.7748672 and RMSE- 68321.70. 2. In Decision Tree, the best regression model comes from random forest with correlation 0.876914 and RMSE- 70269.57. 3. In SVM model, model with linear kernel performs best with correlation 0.82014 & RMSE- 110914.79. 4. Of the four models, random forest performs better than the others with least RMSE- 49261.28 obtained by tuning the Hyperparameters using Randomized Search.