Regression: Predicting
House Prices
SRUTI JAIN
MACHINE LEARNING SPECIALIZATION
UNIVERSITY OF WASHINGTON
Problem Statement
Estimate the prices of California properties so that new sellers can price a home and buyers can
estimate the profitability of a deal.
Question: How much is my house worth?
Solution: Look at recent sales
in the neighborhood.
Dataset Details
1. The data comes from California census data,
with 20,640 instances and 10 attributes.
2. The text attribute (ocean_proximity) was converted
into categorical indicator columns using one-hot
encoding with the scikit-learn package.
3. Latitude and longitude were used during
exploratory analysis but not in model
building.
4. Feature standardization was performed on all
numeric variables.
5. The dataset was split into train, validation, and test
samples using stratified sampling.
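The encoding and standardization steps listed above can be sketched in plain Python. This is only an illustration under assumptions: the slides name the Scikit package but do not show code, and the toy values and the `median_income` column name below are not taken from the slides.

```python
from statistics import mean, pstdev

def one_hot(values):
    """Return the sorted categories and one 0/1 indicator row per input value."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows

def standardize(column):
    """Rescale a numeric column to zero mean and unit standard deviation."""
    mu, sigma = mean(column), pstdev(column)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]

# Toy rows for illustration only (assumed values, not the census data).
ocean_proximity = ["INLAND", "NEAR BAY", "INLAND", "NEAR OCEAN"]
median_income = [2.0, 4.0, 6.0, 8.0]

categories, encoded = one_hot(ocean_proximity)
scaled_income = standardize(median_income)
```

In practice the scaling statistics would be computed on the training sample only and then reused for the validation and test samples, so that no information leaks from held-out data.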
Correlation Plot
Exploratory Analysis Plot
Plot visualizing the role of latitude, longitude, and population in the price of a house
Training-Testing Models
1. Linear Regression
2. Decision Tree Regressor
3. Random Forest Regressor
4. Support Vector Regressor
5. Fine-tuning the hyperparameters of the Random Forest Regressor using grid search and
randomized search
Note: Random seed values were chosen to generate the training, validation, and test sets in the ratio
60:20:20.
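A seeded, stratified 60:20:20 split might look like the following sketch. The slides do not say which variable the strata were built from or which seed was used, so the `income_band` field and `seed=42` are assumptions made purely for illustration.

```python
import random
from collections import defaultdict

def stratified_split(rows, stratum_of, ratios=(0.6, 0.2, 0.2), seed=42):
    """Split rows into train/validate/test, preserving stratum proportions."""
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    groups = defaultdict(list)
    for row in rows:
        groups[stratum_of(row)].append(row)

    train, validate, test = [], [], []
    for key in sorted(groups):
        members = groups[key]
        rng.shuffle(members)
        n_train = round(len(members) * ratios[0])
        n_val = round(len(members) * ratios[1])
        train += members[:n_train]
        validate += members[n_train:n_train + n_val]
        test += members[n_train + n_val:]  # remainder goes to the test set
    return train, validate, test

# Hypothetical records: 100 rows spread evenly over 5 strata.
rows = [{"id": i, "income_band": i % 5} for i in range(100)]
train, validate, test = stratified_split(rows, lambda r: r["income_band"])
```

Splitting within each stratum keeps the share of every band roughly equal across the three samples, which is the point of stratified sampling.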
Linear Regression
Linear regression helped identify which variables are significant and which are not. Because many
of our attributes are continuous, linear regression is also a good approach to use as a starting step.
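As a reminder of what the fit computes, here is a minimal single-feature least-squares sketch. The deck uses multiple linear regression; this one-variable version and its toy numbers are assumptions chosen to keep the arithmetic visible.

```python
def fit_simple_linear(xs, ys):
    """Closed-form ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Toy data for illustration only (not taken from the census dataset).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [52.0, 71.0, 88.0, 109.0]
slope, intercept = fit_simple_linear(xs, ys)
predictions = [slope * x + intercept for x in xs]
```

With standardized features, the size of each fitted coefficient gives a rough first indication of how strongly that variable moves the prediction.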
Decision Tree Regressor
Random Forest Regressor
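The two tuning strategies named for this model differ only in which hyperparameter combinations they try. The sketch below shows that difference in plain Python; `toy_score` is a hypothetical stand-in for training a random forest and measuring validation error, and the grid values are assumed, not taken from the slides.

```python
import itertools
import random

# Assumed search space for illustration.
param_grid = {"n_estimators": [10, 30, 100], "max_features": [2, 4, 6, 8]}

def toy_score(params):
    """Hypothetical error surface (lower is better); a real run would fit a model."""
    return (params["n_estimators"] - 30) ** 2 + (params["max_features"] - 6) ** 2

def grid_search(grid, score):
    """Evaluate every combination in the grid."""
    names = sorted(grid)
    combos = itertools.product(*(grid[n] for n in names))
    candidates = [dict(zip(names, combo)) for combo in combos]
    return min(candidates, key=score), len(candidates)

def randomized_search(grid, score, n_iter, seed=0):
    """Evaluate only n_iter randomly sampled combinations."""
    rng = random.Random(seed)
    names = sorted(grid)
    candidates = [{n: rng.choice(grid[n]) for n in names} for _ in range(n_iter)]
    return min(candidates, key=score), len(candidates)

best_grid, n_grid = grid_search(param_grid, toy_score)
best_rand, n_rand = randomized_search(param_grid, toy_score, n_iter=5)
```

Grid search is exhaustive, so its cost grows with the product of the list lengths; randomized search caps the cost at `n_iter` evaluations and may or may not land on the same optimum.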
Support Vector Regressor
Comparative Analysis
1. For multiple linear regression, the best R-squared is 0.6002, the correlation between predictions and
test values is 0.7748672, and the RMSE is 68321.70.
2. Among the tree-based models, the best regression model comes from the random forest, with correlation
0.876914 and RMSE 70269.57.
3. For the SVM model, the linear kernel performs best, with correlation 0.82014 and RMSE
110914.79.
4. Of the four models, the random forest performs best, with the lowest RMSE of
49261.28, obtained by tuning the hyperparameters using randomized search.
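The three metrics quoted above (RMSE, correlation between predictions and test values, and R-squared) can be computed as in this sketch. The function names and toy numbers are assumptions for illustration; they do not reproduce the figures in the comparison.

```python
from math import sqrt

def rmse(actual, predicted):
    """Root mean squared error, in the same units as the target."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def pearson(actual, predicted):
    """Pearson correlation between actual and predicted values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / sqrt(var_a * var_p)

def r_squared(actual, predicted):
    """Share of the target's variance explained by the predictions."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Toy values for illustration only.
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]
scores = (rmse(actual, predicted), pearson(actual, predicted), r_squared(actual, predicted))
```

Correlation ignores systematic offsets in the predictions while RMSE does not, so a model can rank well on one metric and poorly on the other.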
Thank You !!
