1. Neural Network Experiments on House Prices
CENK BIRCANOĞLU
COMPUTER ENGINEERING, BAHCESEHIR UNIVERSITY
MACHINE LEARNING AND PATTERN RECOGNITION (CMP5130) 1
3. Problem Definition
◦ Regression: estimating a numerical value from the available data
◦ In this study, house prices are predicted from 79 explanatory variables
4. Previous Works
◦ Preprocessing: normalization, standard scaling and simple anomaly detection algorithms
◦ Random Forest [10,11]
◦ Gradient Boosting [3,4]
◦ Support Vector Regression (SVR) [9]
◦ PCA combined with a regression algorithm [7]
◦ Deep learning [8]
◦ Ensembles: several machine learning algorithms are trained and their predictions are averaged [1,2,5,6]
◦ Reported scores range from 0.11 to 0.23
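The slide does not name the metric behind these scores. Assuming it is the competition's official metric, the root mean squared error between the logarithms of the predicted and observed sale prices (lower is better), a minimal NumPy sketch follows; the function name `rmse_log` is hypothetical.

```python
import numpy as np


def rmse_log(y_true, y_pred):
    """RMSE between log(observed) and log(predicted) prices (assumed metric)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Working in log space makes an error of 10% weigh the same for cheap and expensive houses.
    return float(np.sqrt(np.mean((np.log(y_pred) - np.log(y_true)) ** 2)))
```

Because the error is taken on logarithms, a score of 0.11 corresponds roughly to a typical relative error of about 11% on the price.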
5. Dataset
◦ Source: the Kaggle competition "House Prices: Advanced Regression Techniques"
◦ Number of columns: 81 (Id and SalePrice, 52 categorical, 2 date, the rest float/int)
◦ Training set size: 1,460 samples
◦ Test set size: 1,459 samples
6. Dataset
◦ SalePrice: The property's sale price in dollars
◦ MSSubClass: The building class
◦ MSZoning: The general zoning classification
◦ LotFrontage: Linear feet of street connected to the property
◦ LotArea: Lot size in square feet
◦ Street: Type of road access
◦ Alley: Type of alley access
◦ LotShape: General shape of the property
◦ LandContour: Flatness of the property
◦ Utilities: Type of utilities available
◦ LotConfig: Lot configuration
◦ LandSlope: Slope of the property
◦ Neighborhood: Physical locations within Ames city limits
◦ Condition1: Proximity to main road or railroad
◦ Condition2: Proximity to main road or railroad (if a second condition is present)
◦ BldgType: Type of dwelling
◦ HouseStyle: Style of dwelling
◦ OverallQual: Overall material and finish quality
◦ OverallCond: Overall condition rating
◦ YearBuilt: Original construction date
◦ YearRemodAdd: Remodel date
◦ RoofStyle: Type of roof
◦ RoofMatl: Roof material
7. Dataset
◦ Exterior1st: Exterior covering on the house
◦ Exterior2nd: Exterior covering on the house (if more than one material)
◦ MasVnrType: Masonry veneer type
◦ MasVnrArea: Masonry veneer area in square feet
◦ ExterQual: Exterior material quality
◦ ExterCond: Present condition of the material on the exterior
◦ Foundation: Type of foundation
◦ BsmtQual: Height of the basement
◦ BsmtCond: General condition of the basement
◦ BsmtExposure: Walkout or garden level basement walls
◦ BsmtFinType1: Quality of the basement finished area
◦ BsmtFinSF1: Type 1 finished square feet
◦ BsmtFinType2: Quality of the second basement finished area
◦ BsmtFinSF2: Type 2 finished square feet
◦ BsmtUnfSF: Unfinished square feet of basement area
◦ TotalBsmtSF: Total square feet of basement area
◦ Heating: Type of heating
◦ HeatingQC: Heating quality and condition
◦ CentralAir: Central air conditioning
◦ Electrical: Electrical system
◦ 1stFlrSF: First floor square feet
◦ 2ndFlrSF: Second floor square feet
8. Dataset
◦ LowQualFinSF: Low quality finished square feet
◦ GrLivArea: Above grade living area square feet
◦ BsmtFullBath: Basement full bathrooms
◦ BsmtHalfBath: Basement half bathrooms
◦ FullBath: Full bathrooms above grade
◦ HalfBath: Half bathrooms above grade
◦ Bedroom: Number of bedrooms above basement level
◦ Kitchen: Number of kitchens
◦ KitchenQual: Kitchen quality
◦ TotRmsAbvGrd: Total rooms above grade
◦ Functional: Home functionality rating
◦ Fireplaces: Number of fireplaces
◦ FireplaceQu: Fireplace quality
◦ GarageType: Garage location
◦ GarageYrBlt: Year garage was built
◦ GarageFinish: Interior finish of the garage
◦ GarageCars: Size of garage in car capacity
◦ GarageArea: Size of garage in square feet
◦ GarageQual: Garage quality
◦ GarageCond: Garage condition
9. Dataset
◦ PavedDrive: Paved driveway
◦ WoodDeckSF: Wood deck area in square feet
◦ OpenPorchSF: Open porch area in square feet
◦ EnclosedPorch: Enclosed porch area in square feet
◦ 3SsnPorch: Three season porch area in square feet
◦ ScreenPorch: Screen porch area in square feet
◦ PoolArea: Pool area in square feet
◦ PoolQC: Pool quality
◦ Fence: Fence quality
◦ MiscFeature: Miscellaneous feature not covered in other categories
◦ MiscVal: Value of miscellaneous feature
◦ MoSold: Month sold
◦ YrSold: Year sold
◦ SaleType: Type of sale
◦ SaleCondition: Condition of sale
10. Proposed Architecture
◦ All network models take the same inputs
◦ The output is the predicted house price
◦ The Adam optimizer is used
◦ Mean squared error (MSE) is used as the loss function
◦ Each network model is trained with linear, tanh, ReLU and SELU activation functions
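A minimal, framework-free NumPy sketch of the four activation functions and the MSE loss listed above. The deck itself uses Keras; this sketch only mirrors the standard definitions of `linear`, `tanh`, `relu` and `selu`, and the names `ACTIVATIONS` and `mse` are hypothetical. The SELU constants are the standard published values.

```python
import numpy as np

# Standard SELU constants (self-normalizing networks).
SELU_ALPHA = 1.6732632423543772848170429916717
SELU_SCALE = 1.0507009873554804934193349852946

ACTIVATIONS = {
    "linear": lambda z: z,
    "tanh": np.tanh,
    "relu": lambda z: np.maximum(z, 0.0),
    # np.minimum keeps expm1 from overflowing on large positive inputs,
    # whose branch is discarded by np.where anyway.
    "selu": lambda z: SELU_SCALE * np.where(z > 0, z, SELU_ALPHA * np.expm1(np.minimum(z, 0.0))),
}


def mse(y_true, y_pred):
    """Mean squared error between targets and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))
```

Since the target is log(SalePrice), this MSE is measured in log space, which matches the log-RMSE style of score reported by previous works.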
11. Single-Layer Perceptron, Multi-Layer Perceptron
18. Single-Layer Network Model
◦ Single-layer perceptron
◦ Serves as a baseline to gauge how a network performs on the House Prices dataset
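With a linear activation, the single-layer model reduces to a linear regression on the inputs. The sketch below, assuming full-batch training and a linear activation, fits it with the Adam update rule (default β1 = 0.9, β2 = 0.999) on the MSE loss. `train_single_layer`, its learning rate and its epoch count are hypothetical choices, not values from the deck.

```python
import numpy as np


def train_single_layer(X, y, lr=0.05, epochs=500, beta1=0.9, beta2=0.999, eps=1e-8):
    """Fit y ~ X @ w + b (one dense unit, linear activation) with Adam on MSE."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, d = X.shape
    Xb = np.hstack([X, np.ones((n, 1))])  # constant column carries the bias
    params = np.zeros(d + 1)              # weights followed by the bias
    m = np.zeros_like(params)             # first-moment (mean) estimate
    v = np.zeros_like(params)             # second-moment (uncentered variance) estimate
    for t in range(1, epochs + 1):
        residual = Xb @ params - y
        grad = 2.0 * Xb.T @ residual / n  # gradient of the mean squared error
        m = beta1 * m + (1 - beta1) * grad
        v = beta2 * v + (1 - beta2) * grad ** 2
        m_hat = m / (1 - beta1 ** t)      # bias-corrected moments
        v_hat = v / (1 - beta2 ** t)
        params -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return params[:-1], params[-1]
```

Usage: `w, b = train_single_layer(X_train, np.log(sale_price))` returns one weight per input column and a bias.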
19. Model 1
◦ Multi-Layer Perceptron (1 hidden layer)
20. Model 2
◦ Multi-Layer Perceptron (1 hidden layer, wider than Model 1)
21. Model 3
◦ Multi-Layer Perceptron (3 hidden layers)
22. Model 4
◦ Multi-Layer Perceptron (3 hidden layers, wider than Model 3)
23. Model 5
◦ Multi-Layer Perceptron (3 hidden layers, dropout after each hidden layer)
24. Model 6
◦ Multi-Layer Perceptron (3 hidden layers, dropout after each hidden layer, wider than Model 5)
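The seven architectures differ only in depth, width and dropout. The NumPy sketch below shows a forward pass covering all of them: dense, activation and (optionally) dropout for each hidden layer, then one linear output unit. The deck does not give layer widths or dropout rates, so the sizes 64 and 256 and the rate 0.2 in `MODEL_CONFIGS` are hypothetical placeholders; only the input size of 262 comes from the deck. The function names are likewise hypothetical.

```python
import numpy as np

# Hypothetical configurations: (layer sizes, dropout rate). Widths and rates are placeholders.
MODEL_CONFIGS = {
    "single_layer": ([262, 1], 0.0),
    "model_1": ([262, 64, 1], 0.0),
    "model_2": ([262, 256, 1], 0.0),
    "model_3": ([262, 64, 64, 64, 1], 0.0),
    "model_4": ([262, 256, 256, 256, 1], 0.0),
    "model_5": ([262, 64, 64, 64, 1], 0.2),
    "model_6": ([262, 256, 256, 256, 1], 0.2),
}


def init_mlp(layer_sizes, seed=0):
    """Create one (W, b) pair per dense layer with small random weights."""
    rng = np.random.default_rng(seed)
    params = []
    for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        W = rng.normal(0.0, np.sqrt(1.0 / fan_in), size=(fan_in, fan_out))
        params.append((W, np.zeros(fan_out)))
    return params


def mlp_forward(X, params, activation=np.tanh, dropout_rate=0.0, training=False, rng=None):
    """Dense -> activation -> dropout per hidden layer, then a linear output unit."""
    h = np.asarray(X, dtype=float)
    for i, (W, b) in enumerate(params):
        h = h @ W + b
        if i == len(params) - 1:
            break  # output layer stays linear: it predicts the (log) price
        h = activation(h)
        if training and dropout_rate > 0.0:
            # Inverted dropout: zero units at random and rescale the survivors,
            # so nothing has to change at inference time.
            keep = rng.random(h.shape) >= dropout_rate
            h = h * keep / (1.0 - dropout_rate)
    return h
```

At inference (`training=False`) dropout is disabled, so predictions are deterministic.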
25. Experiments and Results
◦ Data Cleaning/Preprocessing
◦ Training Network Model
◦ Results
26. Data Cleaning/Preprocessing
◦ 79 explanatory columns in total
◦ A label encoder is applied to every categorical column
◦ Missing values in int/float columns are replaced with the column mean
◦ VarianceThreshold and Normalizer are applied
◦ The IsolationForest algorithm is also applied to detect outliers; 139 outliers are removed from the training set
◦ The logarithm of SalePrice is used as the target (y)
◦ The number of input columns grows from 79 to 262
◦ Environment: Python 3.6.3, scikit-learn, pandas
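The label encoding, mean imputation and log-target steps can be sketched in plain NumPy as follows. This is an illustration under assumptions, not the deck's pipeline: the deck uses scikit-learn and pandas, and `label_encode` here only mimics the sorted-class behaviour of scikit-learn's `LabelEncoder`. All function names are hypothetical.

```python
import numpy as np


def label_encode(column):
    """Map each distinct category to an integer code (classes in sorted order)."""
    classes, codes = np.unique(np.asarray(column), return_inverse=True)
    return codes, classes


def impute_mean(X):
    """Replace NaNs in each numeric column with that column's mean."""
    X = np.array(X, dtype=float)  # copy, so the caller's array is untouched
    col_means = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_means[cols]
    return X


def log_target(sale_price):
    """Train on log(SalePrice) instead of the raw dollar amount."""
    return np.log(np.asarray(sale_price, dtype=float))
```

For example, `label_encode(["RL", "RM", "RL", "FV"])` yields the codes `[1, 2, 1, 0]` with classes `["FV", "RL", "RM"]`.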
28. Variance Threshold
◦ Feature selector that removes all low-variance features
◦ Unsupervised: it looks only at the features, not at the target
◦ 3 features removed
Normalizer
◦ Normalizes each sample (row) individually to unit norm
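Both transforms are simple enough to sketch in NumPy. The sketch assumes scikit-learn's default behaviour: `VarianceThreshold` with threshold 0 keeps only features whose variance is strictly above 0 (it drops constant columns), and `Normalizer` scales each row to unit L2 norm, leaving all-zero rows unchanged. The function names below are hypothetical stand-ins, not the scikit-learn API.

```python
import numpy as np


def variance_threshold(X, threshold=0.0):
    """Keep columns whose variance is strictly above `threshold`."""
    X = np.asarray(X, dtype=float)
    keep = X.var(axis=0) > threshold  # boolean mask of retained features
    return X[:, keep], keep


def normalize_rows(X):
    """Scale each sample (row) to unit L2 norm; all-zero rows stay zero."""
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0.0] = 1.0  # avoid division by zero
    return X / norms
```

Note that the Normalizer works per sample, not per feature: unlike standard scaling, it rescales each house's feature vector as a whole.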
29. Isolation Forest
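◦ Scores each sample by how likely it is to be an anomaly
◦ Isolates observations by randomly selecting a feature and then randomly selecting a split value between the maximum and minimum values of the selected feature
◦ Anomalies need fewer random splits to be isolated, so a short average path length in the trees indicates an outlier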
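A compact NumPy sketch of the Isolation Forest idea described above. Each tree is grown on a random subsample, and each node uses a random feature and a random split between that feature's minimum and maximum. The score is 2^(-E[h(x)] / c(ψ)), where h is the path length and c normalises by the expected path length for ψ points. This is an illustration only: the deck uses scikit-learn's `IsolationForest` (whose `fit_predict` marks outliers with -1), and the function names and defaults here are hypothetical.

```python
import numpy as np


def avg_path_length(n):
    """c(n): average path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (np.log(n - 1) + np.euler_gamma) - 2.0 * (n - 1) / n


def build_tree(X, depth, max_depth, rng):
    """Grow one isolation tree with a random feature and a random split value."""
    n = len(X)
    if depth >= max_depth or n <= 1:
        return ("leaf", n)
    f = rng.integers(X.shape[1])                 # random feature
    lo, hi = X[:, f].min(), X[:, f].max()
    if lo == hi:                                 # a constant feature cannot be split
        return ("leaf", n)
    s = rng.uniform(lo, hi)                      # random split between min and max
    left = X[:, f] < s
    return ("node", f, s,
            build_tree(X[left], depth + 1, max_depth, rng),
            build_tree(X[~left], depth + 1, max_depth, rng))


def path_length(x, node, depth=0):
    """Edges from the root to x's leaf, plus c(size) for points left unisolated."""
    if node[0] == "leaf":
        return depth + avg_path_length(node[1])
    _, f, s, left, right = node
    return path_length(x, left if x[f] < s else right, depth + 1)


def isolation_scores(X, n_trees=100, sample_size=256, seed=0):
    """Anomaly score in (0, 1); values close to 1 mark likely outliers."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    psi = min(sample_size, len(X))
    max_depth = int(np.ceil(np.log2(psi)))       # height limit from the subsample size
    trees = []
    for _ in range(n_trees):
        idx = rng.choice(len(X), size=psi, replace=False)
        trees.append(build_tree(X[idx], 0, max_depth, rng))
    mean_depth = np.array([np.mean([path_length(x, t) for t in trees]) for x in X])
    return 2.0 ** (-mean_depth / avg_path_length(psi))
```

Removing the highest-scoring training rows is one way to realise the "139 outliers removed" step. How the deck chose that cut-off is not stated.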
30. Training
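◦ Input size: 262
◦ EarlyStopping callback added to training
◦ Batch size: 8
◦ Validation split: 0.1
◦ Keras with the TensorFlow backend
◦ TensorBoard for visualizing training
◦ The exponential of the network output is used as the final prediction, undoing the log transform applied to SalePrice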
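The training settings above can be illustrated with a small NumPy loop. It holds out the last 10% of samples for validation (as Keras' `validation_split` does, before shuffling), trains in mini-batches of 8, stops early when the validation loss stops improving, and exponentiates the log-price output. To keep the sketch short, a linear model trained with plain SGD stands in for the networks and for Adam. The learning rate, the patience of 10 epochs, restoring the best weights, and the function names are hypothetical choices.

```python
import numpy as np


def fit_with_early_stopping(X, y, batch_size=8, validation_split=0.1, lr=0.01,
                            max_epochs=200, patience=10, seed=0):
    """Mini-batch SGD on a linear model, with a hold-out split and early stopping."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    n_val = max(1, int(len(X) * validation_split))
    # The last samples are held out for validation, before any shuffling.
    X_tr, y_tr = X[:-n_val], y[:-n_val]
    X_val, y_val = X[-n_val:], y[-n_val:]
    rng = np.random.default_rng(seed)
    w, b = np.zeros(X.shape[1]), 0.0
    best_loss, best_w, best_b = np.inf, w.copy(), b
    wait = 0
    for _ in range(max_epochs):
        order = rng.permutation(len(X_tr))          # reshuffle training rows every epoch
        for start in range(0, len(order), batch_size):
            idx = order[start:start + batch_size]
            err = X_tr[idx] @ w + b - y_tr[idx]     # residuals on this mini-batch
            w = w - lr * 2.0 * X_tr[idx].T @ err / len(idx)
            b = b - lr * 2.0 * err.mean()
        val_loss = np.mean((X_val @ w + b - y_val) ** 2)
        if val_loss < best_loss:                    # improvement: remember these weights
            best_loss, best_w, best_b, wait = val_loss, w.copy(), b, 0
        else:
            wait += 1
            if wait >= patience:                    # no improvement for `patience` epochs
                break
    return best_w, best_b


def predict_price(X, w, b):
    """The model predicts log(SalePrice); exponentiate to get dollars back."""
    return np.exp(np.asarray(X, dtype=float) @ w + b)
```

Because the loss is minimised on log prices, `predict_price` maps the output back to dollars only at the very end.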
33. Conclusion and Future Work
◦ Seven different networks were implemented and evaluated
◦ Deeper and wider models give better results, but they overfit unless regularization is used
◦ Future work: deeper and wider models, as well as new studies combining traditional machine learning algorithms with deep learning algorithms
◦ Batch Normalization layers and regularizers in the fully connected layers
◦ Autoencoders combined with traditional regression algorithms such as Lasso, Ridge and Huber regression