Neural Network Experiments
on House Prices
CENK BIRCANOĞLU
COMPUTER ENGINEERING, BAHCESEHIR UNIVERSITY
MACHINE LEARNING AND PATTERN RECOGNITION (CMP5130) 1
Content
◦Problem Definition
◦Previous Works
◦Dataset
◦Proposed Architecture
◦Experiments and Results
◦Conclusion and Future Works
Problem Definition
◦ Estimate a numerical value from the available data (regression)
◦ In this study: predict house prices from 79 explanatory variables
Previous Works
◦ Preprocessing
◦ Normalization
◦ Standard Scaling
◦ Simple Anomaly Detection algorithms
◦ Random Forest algorithm [10,11].
◦ Gradient Boosting algorithm [3,4]
◦ Regression form of Support Vector Machine (SVR) algorithm [9]
◦ PCA and regression algorithm [7]
◦ Deep Learning application [8]
◦ Different machine learning algorithms are applied together and the averages of their results are taken
[1,2,5,6]
◦ Reported results range from 0.11 to 0.23.
Dataset
◦ House Prices: Advanced Regression Techniques
◦ Number of columns: 81 (Id and SalePrice, 52 categorical, 2 date, the rest float/int)
◦ Train size: 1460
◦ Test Size: 1459
Dataset
◦ SalePrice: property's sale price
in dollars
◦ MSSubClass: The building class
◦ MSZoning: The general zoning
classification
◦ LotFrontage: Linear feet of
street connected to property
◦ LotArea: Lot size in square feet
◦ Street: Type of road access
◦ Alley: Type of alley access
◦ LotShape: General shape
◦ LandContour: Flatness
◦ Utilities: Type of utilities
available
◦ LotConfig: Lot configuration
◦ LandSlope: Slope
◦ Neighborhood: Physical
locations within Ames city
limits
◦ Condition1: Proximity to main
road or railroad
◦ Condition2: Proximity to main
road or railroad
◦ BldgType: Type of dwelling
◦ HouseStyle: Style of dwelling
◦ OverallQual: Overall material
and finish quality
◦ OverallCond: Overall condition
rating
◦ YearBuilt: Original construction
date
◦ YearRemodAdd: Remodel date
◦ RoofStyle: Type of roof
◦ RoofMatl: Roof material
Dataset
◦ Exterior1st: Exterior covering
on house
◦ Exterior2nd: Exterior covering
on house (if more than one
material)
◦ MasVnrType: Masonry veneer
type
◦ MasVnrArea: Masonry veneer
area in square feet
◦ ExterQual: Exterior material
quality
◦ ExterCond: Present condition of
the material on the exterior
◦ Foundation: Type of foundation
◦ BsmtQual: Height of the
basement
◦ BsmtCond: General condition
of the basement
◦ BsmtExposure: Walkout or
garden level basement walls
◦ BsmtFinType1: Quality of
basement finished area
◦ BsmtFinSF1: Type 1 finished
square feet
◦ BsmtFinType2: Quality of
second finished area
◦ BsmtFinSF2: Type 2 finished
square feet
◦ BsmtUnfSF: Unfinished square
feet of basement area
◦ TotalBsmtSF: Total square feet
of basement area
◦ Heating: Type of heating
◦ HeatingQC: Heating quality and
condition
◦ CentralAir: Central air
conditioning
◦ Electrical: Electrical system
◦ 1stFlrSF: First Floor square feet
◦ 2ndFlrSF: Second floor square
feet
Dataset
◦ LowQualFinSF: Low quality
finished square feet
◦ GrLivArea: Above grade living
area square feet
◦ BsmtFullBath: Basement full
bathrooms
◦ BsmtHalfBath: Basement half
bathrooms
◦ FullBath: Full bathrooms above
grade
◦ HalfBath: Half baths above
grade
◦ Bedroom: Number of
bedrooms above basement
level
◦ Kitchen: Number of kitchens
◦ KitchenQual: Kitchen quality
◦ TotRmsAbvGrd: Total rooms
above grade
◦ Functional: Home functionality
rating
◦ Fireplaces: Number of
fireplaces
◦ FireplaceQu: Fireplace quality
◦ GarageType: Garage location
◦ GarageYrBlt: Year garage was
built
◦ GarageFinish: Interior finish of
the garage
◦ GarageCars: Size of garage in
car capacity
◦ GarageArea: Size of garage in
square feet
◦ GarageQual: Garage quality
◦ GarageCond: Garage condition
Dataset
◦ PavedDrive: Paved driveway
◦ WoodDeckSF: Wood deck area
in square feet
◦ OpenPorchSF: Open porch area
in square feet
◦ EnclosedPorch: Enclosed porch
area in square feet
◦ 3SsnPorch: Three season porch
area in square feet
◦ ScreenPorch: Screen porch area
in square feet
◦ PoolArea: Pool area in square
feet
◦ PoolQC: Pool quality
◦ Fence: Fence quality
◦ MiscFeature: Miscellaneous
feature not covered in other
categories
◦ MiscVal: Value of miscellaneous
feature
◦ MoSold: Month Sold
◦ YrSold: Year Sold
◦ SaleType: Type of sale
◦ SaleCondition: Condition of sale
Proposed Architecture
◦ Inputs are the same for all neural network models.
◦ Output is the predicted house price.
◦ The Adam optimizer is used.
◦ Mean squared error is used as the loss function.
◦ Each network model is trained with linear, tanh, ReLU, and SELU
activation functions.
Single Layer Perceptron, Multi Layer
Perceptron
Fully Connected Layer
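A minimal NumPy sketch of what a fully connected layer computes: each output unit is a weighted sum of all inputs plus a bias, optionally passed through an activation. The input size of 262 is taken from the Training slide; the layer width of 16 and the function name `dense_forward` are hypothetical and only serve the illustration.

```python
import numpy as np

def dense_forward(x, weights, bias, activation=None):
    """Fully connected layer: every input feeds every output unit."""
    z = x @ weights + bias              # shape: (batch, n_out)
    return z if activation is None else activation(z)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 262))                 # 4 samples, 262 input features
W = rng.normal(scale=0.05, size=(262, 16))    # 16 units is a hypothetical width
b = np.zeros(16)

h = dense_forward(x, W, b, activation=np.tanh)
print(h.shape)
```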
Activation Functions
Linear (Identity) Function Hyperbolic tangent (Tanh)
Activation Functions
Rectified Linear Unit (ReLU) Scaled Exponential Linear Unit (Selu)
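The four activation functions named on these slides can be written in a few lines of NumPy. This is an illustrative sketch rather than the code used in the study; the SELU constants are the standard published values.

```python
import numpy as np

SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805

def linear(z):
    """Identity: the output equals the input."""
    return z

def tanh(z):
    """Hyperbolic tangent: squashes values into (-1, 1)."""
    return np.tanh(z)

def relu(z):
    """Rectified Linear Unit: zero for negative inputs, identity otherwise."""
    return np.maximum(0.0, z)

def selu(z):
    """Scaled Exponential Linear Unit."""
    negative_part = SELU_ALPHA * np.expm1(np.minimum(z, 0.0))
    return SELU_SCALE * np.where(z > 0, z, negative_part)

z = np.array([-1.0, 0.0, 2.0])
for fn in (linear, tanh, relu, selu):
    print(fn.__name__, fn(z))
```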
Dropout
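As a rough illustration of dropout, the sketch below implements the common "inverted dropout" variant in NumPy: during training a random subset of activations is zeroed and the survivors are rescaled so the expected value is unchanged. The rate of 0.5 is a hypothetical value; the slides do not state the rate that was used.

```python
import numpy as np

def dropout(h, rate, rng, training=True):
    """Inverted dropout: zero a fraction `rate` of units and rescale the rest."""
    if not training or rate == 0.0:
        return h                         # no-op at inference time
    keep = 1.0 - rate
    mask = rng.random(h.shape) < keep    # True where a unit is kept
    return h * mask / keep

rng = np.random.default_rng(0)
h = np.ones((1000, 10))
out = dropout(h, rate=0.5, rng=rng)      # 0.5 is a hypothetical rate
print(out.mean())                        # stays close to 1.0 on average
```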
Cost (Loss) Function
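The deck states that mean squared error is the loss and that the logarithm of the sale price is the target. A small sketch of that combination, using made-up prices purely for illustration:

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error between two equally shaped arrays."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

# Hypothetical prices, only to show the loss evaluated on log-prices.
prices = np.array([200000.0, 150000.0])
predicted = np.array([210000.0, 140000.0])
loss = mse(np.log(prices), np.log(predicted))
print(loss)
```

Working on the log scale means an error of 10% costs roughly the same for a cheap house as for an expensive one.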
Adam
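A sketch of a single Adam update in NumPy, followed by a toy minimization of f(w) = (w - 3)^2. The default hyperparameters below are the ones commonly quoted for Adam; the slides do not say which values were used in the experiments, so treat them as assumptions.

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update. `t` is the 1-based step counter."""
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2     # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

# Toy problem: minimize (w - 3)^2, whose gradient is 2 * (w - 3).
w, m, v = 0.0, 0.0, 0.0
for t in range(1, 1001):
    grad = 2.0 * (w - 3.0)
    w, m, v = adam_step(w, grad, m, v, t, lr=0.1)
print(w)
```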
Single Layer Network Model
◦ Single Layer Perceptron
◦ Used as a baseline to gauge how a network performs on the House Prices dataset
Model 1
◦ Multi-Layer Perceptron (1 hidden layer)
Model 2
◦ Multi-Layer Perceptron (1 hidden layer and wider)
Model 3
◦ Multi-Layer Perceptron (3 hidden layers)
Model 4
◦ Multi-Layer Perceptron (3 hidden layers and wider)
Model 5
◦ Multi-Layer Perceptron (3 hidden layers, with dropout after each hidden layer)
Model 6
◦ Multi-Layer Perceptron (3 hidden layers, with dropout after each hidden layer,
and wider)
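The model family on the preceding slides (single layer, 1 or 3 hidden layers, narrower or wider, with or without dropout) could be expressed with one Keras builder, as in the sketch below. The layer widths and the dropout rate are not given in the slides, so the values here are hypothetical placeholders, and `build_mlp` is an illustrative name rather than the author's code. The Adam optimizer, MSE loss, and input size of 262 come from the deck.

```python
from tensorflow import keras

def build_mlp(input_dim=262, hidden_units=(64, 64, 64),
              activation="selu", dropout_rate=0.0):
    """Build a regression MLP; widths and dropout rate are hypothetical."""
    model = keras.Sequential()
    model.add(keras.Input(shape=(input_dim,)))
    for units in hidden_units:
        model.add(keras.layers.Dense(units, activation=activation))
        if dropout_rate > 0:
            # Dropout after each hidden layer, as in Models 5 and 6.
            model.add(keras.layers.Dropout(dropout_rate))
    model.add(keras.layers.Dense(1, activation="linear"))   # predicted log-price
    model.compile(optimizer="adam", loss="mse")
    return model

single_layer = build_mlp(hidden_units=())                    # single-layer baseline
model_1 = build_mlp(hidden_units=(64,))                      # 1 hidden layer
model_5 = build_mlp(hidden_units=(64, 64, 64), dropout_rate=0.2)
```

Passing `activation="linear"`, `"tanh"`, `"relu"`, or `"selu"` covers the four activation settings compared in the results.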
Experiments and Results
◦ Data Cleaning/Preprocessing
◦ Training Network Model
◦ Results
Data Cleaning/Preprocessing
◦ Total number of columns is 79
◦ A label encoder is used for every categorical column
◦ Missing values in int/float columns are set to the column mean
◦ VarianceThreshold and Normalizer are applied
◦ IsolationForest is also applied to find outliers; 139 outliers are removed
from the training set
◦ The logarithm of SalePrice is used as the y value
◦ Number of input columns grows from 79 to 262
◦ Environment: Python 3.6.3, Scikit-Learn, Pandas
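A small sketch of the first cleaning steps listed above (label encoding, mean imputation, log target) on a toy frame. The helper name `basic_clean` is hypothetical, and filling missing categorical values with the string "Missing" is an assumption, since the slides only describe how numeric gaps were handled.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

def basic_clean(df, target="SalePrice"):
    """Label-encode categoricals, mean-fill numerics, log-transform the target."""
    df = df.copy()
    y = np.log(df.pop(target))
    for col in df.columns:
        if pd.api.types.is_numeric_dtype(df[col]):
            df[col] = df[col].fillna(df[col].mean())
        else:
            # Assumption: treat a missing category as its own label.
            values = df[col].fillna("Missing").astype(str)
            df[col] = LabelEncoder().fit_transform(values)
    return df, y

toy = pd.DataFrame({
    "LotArea": [8450.0, 9600.0, np.nan],
    "Street": ["Pave", "Grvl", None],
    "SalePrice": [208500, 181500, 223500],
})
X, y = basic_clean(toy)
print(X)
```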
Variance Threshold
◦ Feature selector which removes
all low-variance features
◦ Unsupervised Approach
◦ 3 features removed
Normalizer
◦ Normalizes each sample individually
to unit norm
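To illustrate both transformers, the sketch below applies scikit-learn's `VarianceThreshold` and `Normalizer` to a tiny array. The default threshold of 0.0 (drop constant columns) and the L2 norm are assumptions; the slides do not state which settings were used.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold
from sklearn.preprocessing import Normalizer

X = np.array([[1.0, 0.0, 3.0],
              [1.0, 4.0, 0.0],
              [1.0, 0.0, 4.0]])

selector = VarianceThreshold()          # default threshold=0.0: drops constant columns
X_sel = selector.fit_transform(X)       # the first column is constant and is removed

normalizer = Normalizer(norm="l2")      # scales each row (sample) to unit L2 norm
X_norm = normalizer.fit_transform(X_sel)
print(X_norm)
```

Note that `Normalizer` works row by row, unlike scalers such as `StandardScaler`, which work column by column.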
Isolation Forest
◦ Scores each sample by how likely it is
to be an anomaly
◦ Isolates observations by randomly
selecting a feature and then
randomly selecting a split value
between the maximum and
minimum values of the selected
feature.
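A hedged sketch of outlier removal with scikit-learn's `IsolationForest` on synthetic data. The `contamination` value and the number of trees are assumptions chosen for the toy example, not settings reported in the slides.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
inliers = rng.normal(0.0, 1.0, size=(200, 2))
outliers = np.array([[8.0, 8.0], [-9.0, 7.0]])    # two obvious anomalies
X = np.vstack([inliers, outliers])

# contamination=0.05 is a hypothetical setting for this toy data.
forest = IsolationForest(n_estimators=100, contamination=0.05, random_state=0)
labels = forest.fit_predict(X)          # +1 for inliers, -1 for outliers

X_clean = X[labels == 1]                # keep only the rows flagged as inliers
print(X.shape[0] - X_clean.shape[0], "rows removed")
```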
Training
◦ Input size: 262
◦ EarlyStopping is added to training
◦ Batch size: 8
◦ Validation split: 0.1
◦ Keras with a TensorFlow backend
◦ TensorBoard
◦ The exponential of the network output is used as the final prediction
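The training settings above map onto Keras roughly as in the following sketch. The batch size, validation split, and the final exponential come from the slide; the `patience` value, the epoch limit, the helper names, and the synthetic data are hypothetical. A `keras.callbacks.TensorBoard` callback could be appended to the callback list in the same way.

```python
import numpy as np
from tensorflow import keras

def train(model, X, y_log, max_epochs=200, patience=10):
    """Fit on log-prices with early stopping on the validation loss."""
    stopper = keras.callbacks.EarlyStopping(
        monitor="val_loss", patience=patience, restore_best_weights=True)
    return model.fit(X, y_log, batch_size=8, validation_split=0.1,
                     epochs=max_epochs, callbacks=[stopper], verbose=0)

def predict_prices(model, X):
    """Undo the log transform: exponentiate the network output."""
    return np.exp(model.predict(X, verbose=0).ravel())

# Tiny synthetic demo; the data are random and carry no meaning.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(40, 262)).astype("float32")
y_demo = np.log(rng.uniform(100000, 300000, size=40)).astype("float32")

demo_model = keras.Sequential([keras.Input(shape=(262,)), keras.layers.Dense(1)])
demo_model.compile(optimizer="adam", loss="mse")
history = train(demo_model, X_demo, y_demo, max_epochs=3, patience=1)
prices = predict_prices(demo_model, X_demo)
```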
Results
◦Single-Layer perceptron
◦ Training: 0.0268
◦ Validation: 0.1639
◦ Test: 0.1814
Results
Activation  Dataset  MLP 1   MLP 2 (wider)  MLP 3   MLP 4 (wider)  MLP 5   MLP 6 (wider)
Linear      Train    0.0179  0.0188         0.0182  0.0172         0.0181  0.0187
Linear      Val      0.1338  0.1374         0.1350  0.1313         0.1345  0.1368
Linear      Test     0.1904  0.2009         0.1895  0.1882         0.1910  0.1935
Tanh        Train    0.0142  0.0138         0.0112  0.1547         0.0173  0.1558
Tanh        Val      0.1195  0.1177         0.1059  0.3934         0.1318  0.3948
Tanh        Test     0.1811  0.2305         0.1614  0.4184         0.1669  0.4199
ReLU        Train    0.0171  0.0128         0.0134  0.0150         0.0657  0.0300
ReLU        Val      0.1310  0.1133         0.1160  0.1227         0.2564  0.1734
ReLU        Test     0.1906  0.1891         0.1891  0.3049         0.2883  0.2036
SELU        Train    0.0145  0.0117         0.0105  0.0088         0.0269  0.0134
SELU        Val      0.1204  0.1081         0.1026  0.0939         0.1642  0.1161
SELU        Test     0.1814  0.2012         0.1468  0.1936         0.1909  0.1390
Conclusion and Future Works
◦ Seven different networks were implemented and evaluated
◦ Deeper and wider models give better results, but they overfit
if regularization is not used.
◦ Future work: deeper and wider models, as well as new studies combining
traditional machine learning algorithms with deep learning
algorithms
◦ Future work: Batch Normalization layers and regularizers in the fully connected layers
◦ Future work: AutoEncoders with traditional regression algorithms such as Lasso,
Ridge, and Huber regression
References
1. https://www.kaggle.com/iamprateek/my-submission-to-predict-sale-price
2. https://www.kaggle.com/apapiu/regularized-linear-models
3. https://www.kaggle.com/johnnymedhanie/house-prices-gradient-boosting
4. https://www.kaggle.com/browooro/simple-feature-engineering-selection-notebook
5. https://www.kaggle.com/jimthompson/ensemble-model-stacked-model-example
6. https://www.kaggle.com/humananalog/xgboost-lasso
7. https://www.kaggle.com/miguelangelnieto/pca-and-regression
8. https://www.kaggle.com/zoupet/neural-network-model-for-house-prices-tensorflow
9. https://www.kaggle.com/tilii7/svr-sparse-matrix-bayesian-optimization
10. https://www.kaggle.com/dfitzgerald3/randomforestregressor
11. https://www.kaggle.com/dansbecker/random-forests
12. https://www.kaggle.com/
13. https://www.kaggle.com/c/house-prices-advanced-regression-techniques
Thanks for Your Patience
CENK BIRCANOĞLU

More Related Content

Similar to Kaggles House Prices Competition Study

L1_Introduction.ppt
L1_Introduction.pptL1_Introduction.ppt
L1_Introduction.pptVarsha506533
 
learning phase final meeting.pdf
learning phase final meeting.pdflearning phase final meeting.pdf
learning phase final meeting.pdfkarimsamhy2
 
Global C4IR-1 Masterclass Bowyer - McLaren 2017
Global C4IR-1 Masterclass Bowyer - McLaren 2017Global C4IR-1 Masterclass Bowyer - McLaren 2017
Global C4IR-1 Masterclass Bowyer - McLaren 2017Justin Hayward
 
Design-for-Test (Testing of VLSI Design)
Design-for-Test (Testing of VLSI Design)Design-for-Test (Testing of VLSI Design)
Design-for-Test (Testing of VLSI Design)Usha Mehta
 
Total Control Eddyfi Tank Inspections workshop 2022
Total Control Eddyfi Tank Inspections workshop 2022Total Control Eddyfi Tank Inspections workshop 2022
Total Control Eddyfi Tank Inspections workshop 2022argebit
 
진동데이터 활용 충돌체 탐지 AI 경진대회 1등
진동데이터 활용 충돌체 탐지 AI 경진대회 1등진동데이터 활용 충돌체 탐지 AI 경진대회 1등
진동데이터 활용 충돌체 탐지 AI 경진대회 1등DACON AI 데이콘
 
Design of pulse jet engine for UAV - 2
Design of pulse jet engine for UAV - 2Design of pulse jet engine for UAV - 2
Design of pulse jet engine for UAV - 2ROSHAN SAH
 
19R002-SDN-MJB-021-SUP-01.pdf
19R002-SDN-MJB-021-SUP-01.pdf19R002-SDN-MJB-021-SUP-01.pdf
19R002-SDN-MJB-021-SUP-01.pdfChandan Sharma
 
Laptop Repairing Course 5 Months Syllabus
Laptop Repairing Course 5 Months SyllabusLaptop Repairing Course 5 Months Syllabus
Laptop Repairing Course 5 Months SyllabusChiptroniks Inst
 
Farzad Mirshams, Mechanical / Thermal Engineer
Farzad Mirshams, Mechanical / Thermal EngineerFarzad Mirshams, Mechanical / Thermal Engineer
Farzad Mirshams, Mechanical / Thermal EngineerFarzad Mirshams
 
Performance Optimization of CGYRO for Multiscale Turbulence Simulations
Performance Optimization of CGYRO for Multiscale Turbulence SimulationsPerformance Optimization of CGYRO for Multiscale Turbulence Simulations
Performance Optimization of CGYRO for Multiscale Turbulence SimulationsIgor Sfiligoi
 
층류 익형의 설계 최적화
층류 익형의 설계 최적화층류 익형의 설계 최적화
층류 익형의 설계 최적화HyunJoon Kim
 
OPAL-RT HYPERSIM Features applied for Relay Testing
OPAL-RT HYPERSIM Features applied for Relay TestingOPAL-RT HYPERSIM Features applied for Relay Testing
OPAL-RT HYPERSIM Features applied for Relay TestingOPAL-RT TECHNOLOGIES
 
Interchangeable skids provide plug and-play capabilities, improve uptime
Interchangeable skids provide plug and-play capabilities, improve uptimeInterchangeable skids provide plug and-play capabilities, improve uptime
Interchangeable skids provide plug and-play capabilities, improve uptimeIntelligentManufacturingInstitute
 

Similar to Kaggles House Prices Competition Study (20)

L1_Introduction.ppt
L1_Introduction.pptL1_Introduction.ppt
L1_Introduction.ppt
 
Report
ReportReport
Report
 
learning phase final meeting.pdf
learning phase final meeting.pdflearning phase final meeting.pdf
learning phase final meeting.pdf
 
lecture 2 parametric yield.pdf
lecture 2 parametric yield.pdflecture 2 parametric yield.pdf
lecture 2 parametric yield.pdf
 
SPM.pptx
SPM.pptxSPM.pptx
SPM.pptx
 
Global C4IR-1 Masterclass Bowyer - McLaren 2017
Global C4IR-1 Masterclass Bowyer - McLaren 2017Global C4IR-1 Masterclass Bowyer - McLaren 2017
Global C4IR-1 Masterclass Bowyer - McLaren 2017
 
Design-for-Test (Testing of VLSI Design)
Design-for-Test (Testing of VLSI Design)Design-for-Test (Testing of VLSI Design)
Design-for-Test (Testing of VLSI Design)
 
presentation 2.pptx
presentation 2.pptxpresentation 2.pptx
presentation 2.pptx
 
Total Control Eddyfi Tank Inspections workshop 2022
Total Control Eddyfi Tank Inspections workshop 2022Total Control Eddyfi Tank Inspections workshop 2022
Total Control Eddyfi Tank Inspections workshop 2022
 
진동데이터 활용 충돌체 탐지 AI 경진대회 1등
진동데이터 활용 충돌체 탐지 AI 경진대회 1등진동데이터 활용 충돌체 탐지 AI 경진대회 1등
진동데이터 활용 충돌체 탐지 AI 경진대회 1등
 
2014 PV Performance Modeling Workshop: Optimization strategies with Pvsyst fo...
2014 PV Performance Modeling Workshop: Optimization strategies with Pvsyst fo...2014 PV Performance Modeling Workshop: Optimization strategies with Pvsyst fo...
2014 PV Performance Modeling Workshop: Optimization strategies with Pvsyst fo...
 
Design of pulse jet engine for UAV - 2
Design of pulse jet engine for UAV - 2Design of pulse jet engine for UAV - 2
Design of pulse jet engine for UAV - 2
 
19R002-SDN-MJB-021-SUP-01.pdf
19R002-SDN-MJB-021-SUP-01.pdf19R002-SDN-MJB-021-SUP-01.pdf
19R002-SDN-MJB-021-SUP-01.pdf
 
Laptop Repairing Course 5 Months Syllabus
Laptop Repairing Course 5 Months SyllabusLaptop Repairing Course 5 Months Syllabus
Laptop Repairing Course 5 Months Syllabus
 
Farzad Mirshams, Mechanical / Thermal Engineer
Farzad Mirshams, Mechanical / Thermal EngineerFarzad Mirshams, Mechanical / Thermal Engineer
Farzad Mirshams, Mechanical / Thermal Engineer
 
Nexmark with beam
Nexmark with beamNexmark with beam
Nexmark with beam
 
Performance Optimization of CGYRO for Multiscale Turbulence Simulations
Performance Optimization of CGYRO for Multiscale Turbulence SimulationsPerformance Optimization of CGYRO for Multiscale Turbulence Simulations
Performance Optimization of CGYRO for Multiscale Turbulence Simulations
 
층류 익형의 설계 최적화
층류 익형의 설계 최적화층류 익형의 설계 최적화
층류 익형의 설계 최적화
 
OPAL-RT HYPERSIM Features applied for Relay Testing
OPAL-RT HYPERSIM Features applied for Relay TestingOPAL-RT HYPERSIM Features applied for Relay Testing
OPAL-RT HYPERSIM Features applied for Relay Testing
 
Interchangeable skids provide plug and-play capabilities, improve uptime
Interchangeable skids provide plug and-play capabilities, improve uptimeInterchangeable skids provide plug and-play capabilities, improve uptime
Interchangeable skids provide plug and-play capabilities, improve uptime
 

More from Cenk Bircanoğlu

Yapay Sinir Ağlarında Aktivasyon Fonksiyonlarının Karşılaştırılması
Yapay Sinir Ağlarında Aktivasyon Fonksiyonlarının KarşılaştırılmasıYapay Sinir Ağlarında Aktivasyon Fonksiyonlarının Karşılaştırılması
Yapay Sinir Ağlarında Aktivasyon Fonksiyonlarının KarşılaştırılmasıCenk Bircanoğlu
 
Image Generation with Tensorflow
Image Generation with TensorflowImage Generation with Tensorflow
Image Generation with TensorflowCenk Bircanoğlu
 
Routing or Computing? The Paradigm Shift Towards Intelligent Computer Network...
Routing or Computing? The Paradigm Shift Towards Intelligent Computer Network...Routing or Computing? The Paradigm Shift Towards Intelligent Computer Network...
Routing or Computing? The Paradigm Shift Towards Intelligent Computer Network...Cenk Bircanoğlu
 
Facial Emotion Classification Using Deep Embedding with Triplet Loss Function
Facial Emotion Classification Using Deep Embedding with Triplet Loss FunctionFacial Emotion Classification Using Deep Embedding with Triplet Loss Function
Facial Emotion Classification Using Deep Embedding with Triplet Loss FunctionCenk Bircanoğlu
 
A Comparison of Loss Function on Deep Embedding
A Comparison of Loss Function on Deep EmbeddingA Comparison of Loss Function on Deep Embedding
A Comparison of Loss Function on Deep EmbeddingCenk Bircanoğlu
 
Autoencoders for image_classification
Autoencoders for image_classificationAutoencoders for image_classification
Autoencoders for image_classificationCenk Bircanoğlu
 

More from Cenk Bircanoğlu (7)

Yapay Sinir Ağlarında Aktivasyon Fonksiyonlarının Karşılaştırılması
Yapay Sinir Ağlarında Aktivasyon Fonksiyonlarının KarşılaştırılmasıYapay Sinir Ağlarında Aktivasyon Fonksiyonlarının Karşılaştırılması
Yapay Sinir Ağlarında Aktivasyon Fonksiyonlarının Karşılaştırılması
 
Image Generation with Tensorflow
Image Generation with TensorflowImage Generation with Tensorflow
Image Generation with Tensorflow
 
Routing or Computing? The Paradigm Shift Towards Intelligent Computer Network...
Routing or Computing? The Paradigm Shift Towards Intelligent Computer Network...Routing or Computing? The Paradigm Shift Towards Intelligent Computer Network...
Routing or Computing? The Paradigm Shift Towards Intelligent Computer Network...
 
Facial Emotion Classification Using Deep Embedding with Triplet Loss Function
Facial Emotion Classification Using Deep Embedding with Triplet Loss FunctionFacial Emotion Classification Using Deep Embedding with Triplet Loss Function
Facial Emotion Classification Using Deep Embedding with Triplet Loss Function
 
Recommendation Systems
Recommendation SystemsRecommendation Systems
Recommendation Systems
 
A Comparison of Loss Function on Deep Embedding
A Comparison of Loss Function on Deep EmbeddingA Comparison of Loss Function on Deep Embedding
A Comparison of Loss Function on Deep Embedding
 
Autoencoders for image_classification
Autoencoders for image_classificationAutoencoders for image_classification
Autoencoders for image_classification
 

Recently uploaded

dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一F La
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptxthyngster
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024thyngster
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubaihf8803863
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一fhwihughh
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts ServiceSapana Sha
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceSapana Sha
 

Recently uploaded (20)

dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
Decoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in ActionDecoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in Action
 
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 
Call Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort ServiceCall Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort Service
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts Service
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts Service
 

Kaggles House Prices Competition Study

  • 1. Neural Network Experiments on House Prices CENK BIRCANOĞLU COMPUTER ENGINEERING, BAHCESEHIR UNIVERSITY MACHINE LEARNING AND PATTERN RECOGNITION (CMP5130) 1
  • 2. Content ◦Problem Definition ◦Previous Works ◦Dataset ◦Proposed Architecture ◦Experiments and Results ◦Conclusion and Future Works MACHINE LEARNING AND PATTERN RECOGNITION (CMP5130) 2
  • 3. Problem Definition ◦ Estimation of a numerical value by using the obtained data ◦ In this study, predict the house prices with 79 explanatory variables MACHINE LEARNING AND PATTERN RECOGNITION (CMP5130) 3
  • 4. Previous Works ◦ Preprocessing ◦ Normalization ◦ Standard Scaling ◦ Simple Anomaly Detection algorithms ◦ Random Forest algorithm [10,11]. ◦ Gradient Boosting algorithm [3,4] ◦ Regression form of Support Vector Machine (SVR) algorithm [9] ◦ PCA and regression algorithm [7] ◦ Deep Learning application [8] ◦ Different machine learning algorithms are applied together and the averages of their results are taken [1,2,5,6] ◦ Results are between the 0,11 and 0,23. MACHINE LEARNING AND PATTERN RECOGNITION (CMP5130) 4
  • 5. Dataset ◦ House Prices: Advanced Regression Techniques ◦ Feature Size: 81 (id and price, 52 categorical, 2 date, others float/int) ◦ Train size: 1460 ◦ Test Size: 1459 MACHINE LEARNING AND PATTERN RECOGNITION (CMP5130) 5
  • 6. Dataset ◦ SalePrice: property's sale price in dollars ◦ MSSubClass: The building class ◦ MSZoning: The general zoning classification ◦ LotFrontage: Linear feet of street connected to property ◦ LotArea: Lot size in square feet ◦ Street: Type of road access ◦ Alley: Type of alley access ◦ LotShape: General shape ◦ LandContour: Flatness ◦ Utilities: Type of utilities available ◦ LotConfig: Lot configuration ◦ LandSlope: Slope ◦ Neighborhood: Physical locations within Ames city limits ◦ Condition1: Proximity to main road or railroad ◦ Condition2: Proximity to main road or railroad ◦ BldgType: Type of dwelling ◦ HouseStyle: Style of dwelling ◦ OverallQual: Overall material and finish quality ◦ OverallCond: Overall condition rating ◦ YearBuilt: Original construction date ◦ YearRemodAdd: Remodel date ◦ RoofStyle: Type of roof ◦ RoofMatl: Roof material MACHINE LEARNING AND PATTERN RECOGNITION (CMP5130) 6
  • 7. Dataset ◦ Exterior1st: Exterior covering on house ◦ Exterior2nd: Exterior covering on house (if more than one material) ◦ MasVnrType: Masonry veneer type ◦ MasVnrArea: Masonry veneer area in square feet ◦ ExterQual: Exterior material quality ◦ ExterCond: Present condition of the material on the exterior ◦ Foundation: Type of foundation ◦ BsmtQual: Height of the basement ◦ BsmtCond: General condition of the basement ◦ BsmtExposure: Walkout or garden level basement walls ◦ BsmtFinType1: Quality of basement finished area ◦ BsmtFinSF1: Type 1 finished square feet ◦ BsmtFinType2: Quality of second finished area ◦ BsmtFinSF2: Type 2 finished square feet ◦ BsmtUnfSF: Unfinished square feet of basement area ◦ TotalBsmtSF: Total square feet of basement area ◦ Heating: Type of heating ◦ HeatingQC: Heating quality and condition ◦ CentralAir: Central air conditioning ◦ Electrical: Electrical system ◦ 1stFlrSF: First Floor square feet ◦ 2ndFlrSF: Second floor square feet MACHINE LEARNING AND PATTERN RECOGNITION (CMP5130) 7
  • 8. Dataset ◦ LowQualFinSF: Low quality finished square feet ◦ GrLivArea: Above grade living area square feet ◦ BsmtFullBath: Basement full bathrooms ◦ BsmtHalfBath: Basement half bathrooms ◦ FullBath: Full bathrooms above grade ◦ HalfBath: Half baths above grade ◦ Bedroom: Number of bedrooms above basement level ◦ Kitchen: Number of kitchens ◦ KitchenQual: Kitchen quality ◦ TotRmsAbvGrd: Total rooms above grade ◦ Functional: Home functionality rating ◦ Fireplaces: Number of fireplaces ◦ FireplaceQu: Fireplace quality ◦ GarageType: Garage location ◦ GarageYrBlt: Year garage was built ◦ GarageFinish: Interior finish of the garage ◦ GarageCars: Size of garage in car capacity ◦ GarageArea: Size of garage in square feet ◦ GarageQual: Garage quality ◦ GarageCond: Garage condition MACHINE LEARNING AND PATTERN RECOGNITION (CMP5130) 8
  • 9. Dataset ◦ GarageCond: Garage condition ◦ PavedDrive: Paved driveway ◦ WoodDeckSF: Wood deck area in square feet ◦ OpenPorchSF: Open porch area in square feet ◦ EnclosedPorch: Enclosed porch area in square feet ◦ 3SsnPorch: Three season porch area in square feet ◦ ScreenPorch: Screen porch area in square feet ◦ PoolArea: Pool area in square feet ◦ PoolQC: Pool quality ◦ Fence: Fence quality ◦ MiscFeature: Miscellaneous feature not covered in other categories ◦ MiscVal: Value of miscellaneous feature ◦ MoSold: Month Sold ◦ YrSold: Year Sold ◦ SaleType: Type of sale ◦ SaleCondition: Condition of sale MACHINE LEARNING AND PATTERN RECOGNITION (CMP5130) 9
  • 10. Proposed Architecture
    ◦ Inputs are the same for all neural network models
    ◦ Output is the predicted house price
    ◦ Adam optimizer is used
    ◦ Mean Squared Error loss function is used
    ◦ Each network model is trained with linear, tanh, ReLU and SELU activation functions
  • 11. Single Layer Perceptron, Multi Layer Perceptron
  • 12. Fully Connected Layer
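A fully connected layer computes, for each output unit, a weighted sum of all inputs plus a bias. The sketch below is a minimal pure-Python illustration of that forward pass, not the Keras code behind the slides; the function name `dense_forward` and the toy weights are assumptions made for the example.

```python
def dense_forward(inputs, weights, biases):
    """Compute y[j] = sum_i inputs[i] * weights[i][j] + biases[j].

    `weights` has one row per input and one column per output unit.
    """
    n_out = len(biases)
    return [
        sum(x * w_row[j] for x, w_row in zip(inputs, weights)) + biases[j]
        for j in range(n_out)
    ]


# Toy example (hypothetical values): 3 inputs, 2 output units.
x = [1.0, 2.0, 3.0]
W = [[0.5, -1.0],
     [0.25, 0.0],
     [0.0, 1.0]]
b = [0.5, -0.5]
y = dense_forward(x, W, b)
```

An activation function is then applied element-wise to `y` to produce the layer output.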
  • 13. Activation Functions
    ◦ Linear (Identity) Function
    ◦ Hyperbolic tangent (Tanh)
  • 14. Activation Functions
    ◦ Rectified Linear Unit (ReLU)
    ◦ Scaled Exponential Linear Unit (SELU)
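The four activation functions compared in the experiments can be written out as scalar functions. This is a sketch using only the standard library; the SELU constants are the standard published values, and the function names are chosen here for illustration.

```python
import math

# Standard SELU constants (alpha and scale).
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805


def linear(x):
    """Identity activation: returns the input unchanged."""
    return x


def tanh(x):
    """Hyperbolic tangent: squashes the input into (-1, 1)."""
    return math.tanh(x)


def relu(x):
    """Rectified linear unit: zero for negative inputs."""
    return max(0.0, x)


def selu(x):
    """Scaled exponential linear unit."""
    if x > 0:
        return SELU_SCALE * x
    return SELU_SCALE * SELU_ALPHA * (math.exp(x) - 1.0)
```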
  • 15. Dropout
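Dropout randomly zeroes a fraction of the activations during training. The sketch below shows the "inverted" variant, which rescales the kept units so the expected activation is unchanged and no rescaling is needed at inference time. The rate of 0.5 is an assumption for illustration; the slides do not state the rate used.

```python
import random


def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero each unit with probability `rate`,
    scale the survivors by 1 / (1 - rate). Identity at inference."""
    if not training or rate == 0.0:
        return list(activations)
    keep_prob = 1.0 - rate
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]


rng = random.Random(0)          # fixed seed so the sketch is repeatable
hidden = [1.0] * 10             # toy activations (hypothetical)
dropped = dropout(hidden, rate=0.5, rng=rng)
at_inference = dropout(hidden, rate=0.5, rng=rng, training=False)
```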
  • 16. Cost (Loss) Function
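The loss named in the proposed architecture is mean squared error. A minimal sketch, with toy log-price values that are made up for the example:

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical log-price targets and predictions.
y_true = [12.0, 12.5, 11.5]
y_pred = [12.0, 12.0, 12.0]
mse = mean_squared_error(y_true, y_pred)   # (0 + 0.25 + 0.25) / 3
```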
  • 17. Adam
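Adam keeps exponential moving averages of the gradient and of its square, corrects both for their initial bias toward zero, and scales each step by the ratio of the two. The sketch below applies the update rule to a single scalar parameter. The default hyperparameters are the commonly used ones; the learning rate of 0.05 and the quadratic objective are assumptions for the example, not settings from the slides.

```python
import math


def adam_minimize(grad_fn, theta, steps, lr=0.001,
                  beta1=0.9, beta2=0.999, eps=1e-8):
    """Run `steps` Adam updates on a scalar parameter `theta`."""
    m = 0.0  # first-moment (mean) estimate of the gradient
    v = 0.0  # second-moment (uncentered variance) estimate
    for t in range(1, steps + 1):
        g = grad_fn(theta)
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g
        m_hat = m / (1.0 - beta1 ** t)   # bias correction
        v_hat = v / (1.0 - beta2 ** t)
        theta -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta


# Minimize f(theta) = (theta - 3)^2, whose gradient is 2 * (theta - 3).
def grad(theta):
    return 2.0 * (theta - 3.0)


first_step = adam_minimize(grad, theta=0.0, steps=1, lr=0.05)
theta = adam_minimize(grad, theta=0.0, steps=2000, lr=0.05)
```

Note that the first step has size close to `lr` regardless of the gradient magnitude, because the bias-corrected ratio is the sign of the gradient.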
  • 18. Single Layer Network Model
    ◦ Single Layer Perceptron
    ◦ Used to get an idea of the network's performance on the House Prices dataset
  • 19. Model 1
    ◦ Multi-Layer Perceptron (1 hidden layer)
  • 20. Model 2
    ◦ Multi-Layer Perceptron (1 hidden layer, wider)
  • 21. Model 3
    ◦ Multi-Layer Perceptron (3 hidden layers)
  • 22. Model 4
    ◦ Multi-Layer Perceptron (3 hidden layers, wider)
  • 23. Model 5
    ◦ Multi-Layer Perceptron (3 hidden layers, dropout after each hidden layer)
  • 24. Model 6
    ◦ Multi-Layer Perceptron (3 hidden layers, dropout after each hidden layer, wider)
  • 25. Experiments and Results
    ◦ Data Cleaning/Preprocessing
    ◦ Training Network Model
    ◦ Results
  • 26. Data Cleaning/Preprocessing
    ◦ Total number of columns is 79
    ◦ Label encoder is used for every categorical column
    ◦ Missing values in int/float columns are set to the column mean
    ◦ VarianceThreshold and Normalizer are applied
    ◦ IsolationForest algorithm is also applied to find outliers; 139 outliers are removed from the training set
    ◦ Logarithm of the SalePrice values is used as the y value
    ◦ Number of input columns increases from 79 to 262
    ◦ Environment: Python 3.6.3, Scikit-Learn, Pandas
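Three of the preprocessing steps above (label encoding, mean imputation, log-transforming the target) can be sketched without any library. This is a pure-Python illustration, not the Scikit-Learn/Pandas code behind the slides; the helper names and the toy rows are assumptions. Sorting the categories before numbering them mirrors how Scikit-Learn's `LabelEncoder` orders its classes.

```python
import math


def label_encode(values):
    """Map each distinct category to an integer, in sorted order."""
    classes = sorted(set(values))
    index = {c: i for i, c in enumerate(classes)}
    return [index[v] for v in values], classes


def impute_mean(values):
    """Replace missing entries (None) with the mean of the present ones."""
    present = [v for v in values if v is not None]
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in values]


# Toy rows (hypothetical values, column names from the dataset).
street = ["Pave", "Grvl", "Pave"]
lot_frontage = [65.0, None, 85.0]
sale_price = [200000.0, 150000.0, 250000.0]

street_codes, street_classes = label_encode(street)
lot_frontage_filled = impute_mean(lot_frontage)
y = [math.log(p) for p in sale_price]   # the network is trained on log prices
```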
  • 28. Variance Threshold
    ◦ Feature selector which removes all low-variance features
    ◦ Unsupervised approach
    ◦ 3 features removed
    Normalizer
    ◦ Normalizes samples individually to unit norm
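Both steps are simple enough to sketch directly. The version below drops columns whose population variance does not exceed a threshold, then rescales each row to unit L2 norm. The threshold of 0.0 and the L2 norm match the Scikit-Learn defaults for `VarianceThreshold` and `Normalizer`; whether the experiments used those defaults is not stated in the slides, so treat them as assumptions.

```python
import math


def variance_threshold(rows, threshold=0.0):
    """Keep only the columns whose variance is strictly above `threshold`."""
    n = len(rows)
    keep = []
    for j in range(len(rows[0])):
        col = [r[j] for r in rows]
        mean = sum(col) / n
        var = sum((v - mean) ** 2 for v in col) / n
        if var > threshold:
            keep.append(j)
    return [[r[j] for j in keep] for r in rows], keep


def l2_normalize(rows):
    """Scale each row (sample) to unit L2 norm; all-zero rows stay as-is."""
    out = []
    for r in rows:
        norm = math.sqrt(sum(v * v for v in r))
        out.append([v / norm for v in r] if norm > 0 else list(r))
    return out


# Toy data (hypothetical): the middle column is constant and gets removed.
rows = [[3.0, 1.0, 4.0],
        [0.0, 1.0, 0.0],
        [6.0, 1.0, 8.0]]
selected, kept_columns = variance_threshold(rows)
normalized = l2_normalize(selected)
```

Note that the variance filter works per column (feature), while the normalizer works per row (sample).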
  • 29. Isolation Forest
    ◦ Scores each sample on whether it is an anomaly or not
    ◦ Isolates observations by randomly selecting a feature and then randomly selecting a split value between the maximum and minimum values of the selected feature
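The isolation idea can be illustrated by counting how many random splits it takes to separate a point from the rest of the data: anomalies tend to be isolated after fewer splits. The sketch below is a simplified illustration, not Scikit-Learn's `IsolationForest` (it does no subsampling and does not compute the normalized anomaly score); the function names, the depth cap and the toy data are assumptions.

```python
import random


def isolation_depth(point, data, rng, depth=0, max_depth=12):
    """Number of random splits needed to isolate `point` within `data`."""
    if len(data) <= 1 or depth >= max_depth:
        return depth
    feature = rng.randrange(len(point))          # pick a random feature
    lo = min(r[feature] for r in data)
    hi = max(r[feature] for r in data)
    if lo == hi:                                 # cannot split on this feature
        return depth
    split = rng.uniform(lo, hi)                  # pick a random split value
    if point[feature] < split:
        side = [r for r in data if r[feature] < split]
    else:
        side = [r for r in data if r[feature] >= split]
    return isolation_depth(point, side, rng, depth + 1, max_depth)


def mean_depth(point, data, n_trees=200, seed=0):
    """Average isolation depth over `n_trees` random split sequences."""
    rng = random.Random(seed)
    total = sum(isolation_depth(point, data, rng) for _ in range(n_trees))
    return total / n_trees


# Toy data: a 5 x 5 grid of inliers plus one far-away outlier.
outlier = [50.0, 50.0]
inlier = [2.0, 2.0]
data = [[float(i % 5), float(i // 5)] for i in range(25)] + [outlier]

outlier_depth = mean_depth(outlier, data)
inlier_depth = mean_depth(inlier, data)
```

The outlier is usually cut off by the very first split, so its average depth is much lower than that of a point in the middle of the grid.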
  • 30. Training
    ◦ Input size 262
    ◦ EarlyStopping added to the training step
    ◦ Batch size 8
    ◦ Validation split 0.1
    ◦ Keras with a TensorFlow backend
    ◦ TensorBoard
    ◦ Exponential of the predictions used as the final results
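Two of the points above lend themselves to a short sketch: early stopping on the validation loss, and converting log-price predictions back to prices with the exponential. The stopping rule below (stop after `patience` consecutive epochs without improvement) is the usual one; the patience value and the loss curve are made up for the example, since the slides do not give them.

```python
import math


def early_stopping_epoch(val_losses, patience=3):
    """Return the index of the epoch at which training would stop."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss      # new best validation loss, reset the counter
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return len(val_losses) - 1   # never triggered: ran all epochs


# Hypothetical validation losses per epoch.
val_losses = [0.40, 0.25, 0.18, 0.19, 0.20, 0.21, 0.17]
stop_epoch = early_stopping_epoch(val_losses, patience=3)

# The network predicts log(SalePrice); exponentiate to get a price.
log_prediction = 12.0
price = math.exp(log_prediction)
```

In this toy curve the loss would have improved again at the last epoch, which shows the trade-off in choosing the patience value.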
  • 31. Results
    ◦ Single-layer perceptron
      ◦ Training: 0.0268
      ◦ Validation: 0.1639
      ◦ Test: 0.1814
  • 32. Results
    Activation  Dataset  MLP 1   MLP 2 (wider)  MLP 3   MLP 4 (wider)  MLP 5   MLP 6 (wider)
    Lin         Train    0.0179  0.0188         0.0182  0.0172         0.0181  0.0187
                Val      0.1338  0.1374         0.1350  0.1313         0.1345  0.1368
                Test     0.1904  0.2009         0.1895  0.1882         0.1910  0.1935
    Tanh        Train    0.0142  0.0138         0.0112  0.1547         0.0173  0.1558
                Val      0.1195  0.1177         0.1059  0.3934         0.1318  0.3948
                Test     0.1811  0.2305         0.1614  0.4184         0.1669  0.4199
    Relu        Train    0.0171  0.0128         0.0134  0.0150         0.0657  0.0300
                Val      0.1310  0.1133         0.1160  0.1227         0.2564  0.1734
                Test     0.1906  0.1891         0.1891  0.3049         0.2883  0.2036
    Selu        Train    0.0145  0.0117         0.0105  0.0088         0.0269  0.0134
                Val      0.1204  0.1081         0.1026  0.0939         0.1642  0.1161
                Test     0.1814  0.2012         0.1468  0.1936         0.1909  0.1390
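To make the table easier to read, the short script below finds the configuration with the lowest test score and counts how many configurations beat the single-layer perceptron's test score of 0.1814. The numbers are copied from the results slides; the variable names are chosen here for illustration.

```python
models = ["MLP 1", "MLP 2", "MLP 3", "MLP 4", "MLP 5", "MLP 6"]

# Test scores per activation, in the column order of the results table.
test_scores = {
    "Lin":  [0.1904, 0.2009, 0.1895, 0.1882, 0.1910, 0.1935],
    "Tanh": [0.1811, 0.2305, 0.1614, 0.4184, 0.1669, 0.4199],
    "Relu": [0.1906, 0.1891, 0.1891, 0.3049, 0.2883, 0.2036],
    "Selu": [0.1814, 0.2012, 0.1468, 0.1936, 0.1909, 0.1390],
}
single_layer_test = 0.1814

# Flatten to (score, activation, model) and take the smallest score.
flat = [(score, act, models[i])
        for act, row in test_scores.items()
        for i, score in enumerate(row)]
best_score, best_activation, best_model = min(flat)

# Configurations that strictly improve on the single-layer perceptron.
better = [(act, model) for score, act, model in flat
          if score < single_layer_test]
```

The lowest test score is 0.1390, for SELU with MLP 6, and five of the 24 configurations score strictly below the single-layer perceptron on the test set.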
  • 33. Conclusion and Future Works
    ◦ 7 different networks were implemented and evaluated
    ◦ Deeper and wider models give better results, but they overfit if regularization is not used
    ◦ Future work: deeper and wider models, as well as new studies combining traditional machine learning algorithms with deep learning algorithms
    ◦ Batch Normalization layers and regularizers in fully connected layers
    ◦ AutoEncoders with traditional regression algorithms such as Lasso, Ridge and Huber regression
  • 34. References
    1. https://www.kaggle.com/iamprateek/my-submission-to-predict-sale-price
    2. https://www.kaggle.com/apapiu/regularized-linear-models
    3. https://www.kaggle.com/johnnymedhanie/house-prices-gradient-boosting
    4. https://www.kaggle.com/browooro/simple-feature-engineering-selection-notebook
    5. https://www.kaggle.com/jimthompson/ensemble-model-stacked-model-example
    6. https://www.kaggle.com/humananalog/xgboost-lasso
    7. https://www.kaggle.com/miguelangelnieto/pca-and-regression
    8. https://www.kaggle.com/zoupet/neural-network-model-for-house-prices-tensorflow
    9. https://www.kaggle.com/tilii7/svr-sparse-matrix-bayesian-optimization
    10. https://www.kaggle.com/dfitzgerald3/randomforestregressor
    11. https://www.kaggle.com/dansbecker/random-forests
    12. https://www.kaggle.com/
    13. https://www.kaggle.com/c/house-prices-advanced-regression-techniques
  • 35. Thanks For Your Patience CENK BIRCANOĞLU