Machine Learning Project - House Price Prediction

House On Sale - Prediction of Sales Price
Kexin
liu.kexin22@gmail.com
Module II
ML - Algorithm - Regression

Inspect Problems
Avg Price: $180k
Top Sale: June 2007
Top Drop: 2009 - 2010, ↓50%
Best Style: One Story
Best Building Type: Single-family Detached

Dataset Info
Total Rows: 1460
Total Columns: 81
Null: 6965
Duplicates: 0
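
The four figures above can be read off a DataFrame in a few lines. This is a minimal sketch on a toy frame, assuming the data is loaded with pandas; the values are made up for illustration and are not taken from the real dataset.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the housing data (illustrative values only).
df = pd.DataFrame({
    "LotFrontage": [65.0, np.nan, 80.0, 65.0],
    "Alley": ["Pave", "Grvl", np.nan, "Pave"],
    "SalePrice": [208500, 181500, 223500, 208500],
})

n_rows, n_cols = df.shape               # total rows, total columns
n_null = int(df.isna().sum().sum())     # NaN cells summed over all columns
n_dup = int(df.duplicated().sum())      # rows identical to an earlier row

print(f"rows={n_rows} cols={n_cols} null={n_null} duplicates={n_dup}")
```

Note that the null count is a count of cells, not of rows, which is how a 1460-row table can report several thousand nulls.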

Variable Type
Cat (Nominal, Ordinal):
'MSZoning', 'Street', 'Alley', 'LotShape', 'LandContour',
'Utilities','LotConfig', 'LandSlope', 'Neighborhood', 'BldgType',
'HouseStyle', 'RoofStyle','RoofMatl', 'Exterior1st', 'Exterior2nd',
'MasVnrType', 'Foundation', 'BsmtExposure','BsmtFinType1',
'BsmtFinType2', 'Heating', 'CentralAir','Electrical', 'BsmtFullBath',
'BsmtHalfBath', 'FullBath', 'HalfBath','BedroomAbvGr',
'KitchenAbvGr','TotRmsAbvGrd',
'Functional', 'Fireplaces', 'GarageType', 'GarageFinish',
'GarageCars', 'PavedDrive', 'Fence', 'MiscFeature', 'SaleType',
'Condition1', 'Condition2','OverallQual', 'OverallCond',
'ExterQual','ExterCond', 'BsmtQual', 'BsmtCond', 'HeatingQC',
'KitchenQual', 'FireplaceQu', 'GarageQual', 'GarageCond',
'PoolQC','SaleCondition'
Num (Continuous, Discrete):
'YearBuilt','YearRemodAdd', 'GarageYrBlt', 'MoSold', 'YrSold',
'MSSubClass', 'LotFrontage', 'LotArea', 'MasVnrArea', 'BsmtFinSF1',
'BsmtFinSF2', 'BsmtUnfSF','TotalBsmtSF', '1stFlrSF', '2ndFlrSF',
'LowQualFinSF', 'GrLivArea','GarageArea', 'WoodDeckSF',
'OpenPorchSF','EnclosedPorch', '3SsnPorch', 'ScreenPorch','PoolArea',
'MiscVal','SalePrice'

Missing Value
Cat: Fill NaN by String 'NO' ✔
Num - LotFrontage:
Fill NaN by Prediction?
Fill NaN by Mean? ✔
Drop NaN column?
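
For the categorical side, filling NaN with the string 'NO' might look like the sketch below. It assumes every non-numeric column is categorical and that a missing value means the feature is absent; the toy frame is illustrative only.

```python
import numpy as np
import pandas as pd

# Toy frame: two categorical columns and one numeric column (illustrative values).
df = pd.DataFrame({
    "Alley": [np.nan, "Grvl", "Pave"],
    "Fence": ["MnPrv", np.nan, np.nan],
    "LotFrontage": [65.0, np.nan, 80.0],
})

# Assumption: every non-numeric column is categorical.
cat_cols = df.select_dtypes(exclude="number").columns

# Replace NaN with the literal string 'NO'; numeric columns are left untouched.
df[cat_cols] = df[cat_cols].fillna("NO")

print(df)
```

The numeric column keeps its NaN here, so it can be handled separately.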

Missing Value
Num - LotFrontage
Drop NaN column? NO
LotFrontage: Linear feet of street connected to property
Image: https://www.concordma.gov/DocumentCenter/View/1385/Section-6-PDF?bidId=

Missing Value
Num - LotFrontage
Fill NaN by Prediction? NO
EDA - Correlation: '1stFlrSF', 'LotArea', 'GrLivArea', 'TotalBsmtSF', 'GarageArea', 'MSSubClass'
Training Scores from models:
Predict 'LotFrontage' with KNN - 0.52
Predict 'LotFrontage' with Linear - 0.40
Predict 'LotFrontage' with RandomForest - 0.59
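
A comparison of this kind can be sketched as follows. It assumes the reported numbers are R² values as returned by scikit-learn's `score` method, which the slides do not state. The data is synthetic, the feature subset is arbitrary, and the hyperparameters are placeholders.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in data; the real columns would come from the housing dataset.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "LotArea": rng.uniform(2000, 20000, n),
    "GrLivArea": rng.uniform(500, 3000, n),
    "1stFlrSF": rng.uniform(400, 2000, n),
})
df["LotFrontage"] = 20 + 0.004 * df["LotArea"] + rng.normal(0, 10, n)
df.loc[df.index[::10], "LotFrontage"] = np.nan   # knock out every 10th value

features = ["LotArea", "GrLivArea", "1stFlrSF"]
known = df[df["LotFrontage"].notna()]            # rows usable for training

models = {
    "KNN": KNeighborsRegressor(n_neighbors=5),
    "Linear": LinearRegression(),
    "RandomForest": RandomForestRegressor(n_estimators=50, random_state=0),
}

# Training R^2 of each candidate model on the rows with a known LotFrontage.
scores = {}
for name, model in models.items():
    model.fit(known[features], known["LotFrontage"])
    scores[name] = model.score(known[features], known["LotFrontage"])

for name, r2 in scores.items():
    print(f"{name}: {r2:.2f}")
```

If none of the candidates reaches a satisfactory score, as on the slide, a simpler fill strategy is a reasonable fallback.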

Missing Value
Num - LotFrontage
Fill NaN by Mean?
Groupby Mean on 'Neighborhood', 'YearBuilt'
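
A group-wise mean fill could be sketched like this with pandas. The toy rows are illustrative. The final fallback to the overall mean for groups with no observed value is an assumption, not something the slides specify.

```python
import numpy as np
import pandas as pd

# Toy rows (illustrative values only).
df = pd.DataFrame({
    "Neighborhood": ["CollgCr", "CollgCr", "CollgCr", "Veenker", "Veenker", "NAmes"],
    "YearBuilt": [2003, 2003, 2001, 1976, 1976, 1960],
    "LotFrontage": [65.0, np.nan, 68.0, 80.0, np.nan, np.nan],
})

# Mean LotFrontage of each (Neighborhood, YearBuilt) group, aligned to the rows.
group_mean = df.groupby(["Neighborhood", "YearBuilt"])["LotFrontage"].transform("mean")
df["LotFrontage"] = df["LotFrontage"].fillna(group_mean)

# Assumption: groups with no observed value fall back to the overall mean.
df["LotFrontage"] = df["LotFrontage"].fillna(df["LotFrontage"].mean())

print(df)
```

Grouping on two keys keeps the fill local, but it also produces many small groups, so some fallback for empty groups is usually needed.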

Target - SalePrice
Top 3 Corr: 'GrLivArea', 'GarageArea', 'TotalBsmtSF'
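
Ranking numeric features by their correlation with the target might be done as in the sketch below. It assumes Pearson correlation (the pandas default) and uses synthetic data in which the relationships are constructed by hand.

```python
import numpy as np
import pandas as pd

# Synthetic data: SalePrice is built from two features plus noise (illustrative only).
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "GrLivArea": rng.uniform(500, 3000, n),
    "GarageArea": rng.uniform(0, 900, n),
    "MoSold": rng.integers(1, 13, n).astype(float),
})
df["SalePrice"] = 100 * df["GrLivArea"] + 150 * df["GarageArea"] + rng.normal(0, 20000, n)

# Pearson correlation of every feature with the target, target itself removed.
corr = df.corr()["SalePrice"].drop("SalePrice")

# Rank by absolute value so strong negative correlations are not overlooked.
top = corr.abs().sort_values(ascending=False).head(2)
print(top)
```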

Feature Selection -
All Numeric Variables
'GrLivArea',
'GarageArea',
'TotalBsmtSF',
'1stFlrSF',
'YearBuilt',
'YearRemodAdd',
'OpenPorchSF',
'LotArea',
'LotFrontage'

Feature Selection -
Categorical Variables

TrainTest Info
Total Rows: 1426
Total Columns: 80
Models: Linear Regression, LR Scaled, LR Lasso
Scores: Train, Test

TrainTest Info
Total Rows: 1426
Total Columns: 80
Models: KNN, Random Forest, Decision Tree
Scores: Train, Test

TrainTest Info
Total Rows: 1426
Total Columns: 48
Models: Linear Regression, LR Scaled, LR Lasso
Scores: Train, Test

TrainTest Info
Total Rows: 1426
Total Columns: 48
Models: KNN, Random Forest, Decision Tree
Scores: Train, Test

TrainTest Info
Total Rows: 1426
Total Columns: 48
Model              Train  Test
Linear Regression  0.88   0.85
LR Lasso           0.38   0.84
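
Train and test scores for the linear models can be produced along these lines. This is a sketch under several assumptions: the scores are R², the split is 80/20, and the Lasso `alpha` is a placeholder. The data is synthetic, generated with `make_regression`, so the printed numbers will not match the slide.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression data standing in for the prepared housing features.
X, y = make_regression(n_samples=300, n_features=10, noise=20.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "Linear Regression": LinearRegression(),
    # Scaling inside a pipeline so the scaler is fitted on the training split only.
    "LR Scaled": make_pipeline(StandardScaler(), LinearRegression()),
    "LR Lasso": make_pipeline(StandardScaler(), Lasso(alpha=1.0)),
}

# (train R^2, test R^2) for each model.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = (model.score(X_train, y_train), model.score(X_test, y_test))

for name, (train_r2, test_r2) in scores.items():
    print(f"{name:18s} train={train_r2:.2f} test={test_r2:.2f}")
```

Plain least squares is unaffected by feature scaling, so the first two rows should agree. Scaling matters for Lasso because its penalty depends on the size of the coefficients.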

TrainTest Info
Total Rows: 1426
Total Columns: 48
Model          Train  Test
KNN            0.74   0.74
Random Forest  0.75   0.71
Decision Tree  0.83   0.71
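
The non-linear models can be scored the same way. The helper `train_test_scores` below is hypothetical, the hyperparameters are placeholders, and the data is again synthetic, so only the pattern is meaningful, not the numbers.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor


def train_test_scores(model, X_train, X_test, y_train, y_test):
    """Fit the model and return (train R^2, test R^2). Hypothetical helper."""
    model.fit(X_train, y_train)
    return model.score(X_train, y_train), model.score(X_test, y_test)


# Synthetic regression data standing in for the prepared housing features.
X, y = make_regression(n_samples=300, n_features=10, noise=20.0, random_state=0)
split = train_test_split(X, y, test_size=0.2, random_state=0)  # X_train, X_test, y_train, y_test

models = {
    "KNN": KNeighborsRegressor(n_neighbors=5),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "Decision Tree": DecisionTreeRegressor(random_state=0),
}

scores = {name: train_test_scores(model, *split) for name, model in models.items()}

for name, (train_r2, test_r2) in scores.items():
    print(f"{name:14s} train={train_r2:.2f} test={test_r2:.2f}")
```

A train score well above the test score, as for the tree-based rows on the slide, usually points to overfitting. Limiting tree depth or tuning the number of neighbours is the usual next step.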

Model              Train  Test
Linear Regression  0.88   0.85
KNN                0.74   0.74
