Is this house ‘worthy’ to be your home?
Using Regression Analysis to Predict HDB Resale Prices
Valerie Lim
4 Feb 2020
Table of Contents
Introduction
Methodology: Outline, Feature Selection
Results: Model Performance, Feature Importance
Conclusion
Future Work
Introduction
• For most households, a house represents the biggest investment
• Yet, first-time home buyers are often unsure about
1. the factors that influence prices
2. the ‘true’ value of a property
Introduction
Goal:
Use property listings to predict prices and give buyers more power to determine whether their desired property is priced at its ‘true’ value
Methodology
Data collection → Data preprocessing → Feature engineering → Feature selection → Train & evaluate model
Data preprocessing:
- Dropped duplicated listings
- Imputed ‘missing’ information as 0
- Converted categorical variables (e.g. property type, model type) to dummy variables
Feature engineering:
- Age of flat
- HDB towns → HDB region
- Log-transformed price
Train & evaluate model:
- Tools: Lasso regression model
- Metrics: mean absolute error, root mean squared error
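The preprocessing and feature-engineering steps above can be sketched in plain Python. This is a minimal illustration, not the author's code: the field names (`property_type`, `model_type`, `hdb_town`, `year_built`, `price`, `bedrooms`, `balcony`), the partial `TOWN_TO_REGION` lookup, the reference year 2020 and the listings are all assumptions made for the example.

```python
import math

# Illustrative subset of a town -> region lookup (URA planning regions); not the author's mapping.
TOWN_TO_REGION = {
    "Bedok": "East",
    "Tampines": "East",
    "Woodlands": "North",
    "Jurong West": "West",
    "Queenstown": "Central",
    "Punggol": "North-East",
}

def preprocess(listings, categorical=("property_type", "model_type", "hdb_region"), current_year=2020):
    """Clean scraped listings (a list of dicts) into model-ready numeric rows."""
    # Drop duplicated listings: rows whose fields are all identical.
    seen, rows = set(), []
    for row in listings:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            # Feature engineering: aggregate HDB town into a coarser HDB region.
            rows.append({**row, "hdb_region": TOWN_TO_REGION.get(row["hdb_town"], "Unknown")})

    skip = set(categorical) | {"hdb_town", "year_built", "price"}
    numeric = sorted({f for row in rows for f in row} - skip)
    levels = {c: sorted({row[c] for row in rows}) for c in categorical}

    out = []
    for row in rows:
        # Impute 'missing' information (e.g. an unlisted facility) as 0.
        rec = {f: (row.get(f) if row.get(f) is not None else 0) for f in numeric}
        # Convert each categorical variable to one 0/1 dummy column per level.
        for c in categorical:
            for level in levels[c]:
                rec[f"{c}={level}"] = int(row[c] == level)
        # Feature engineering: age of flat from year built.
        rec["age"] = current_year - row["year_built"]
        # Log-transform the target (asking price).
        rec["log_price"] = math.log(row["price"])
        out.append(rec)
    return out

# Made-up listings; the second is a duplicate of the first.
listings = [
    {"property_type": "3-room", "model_type": "Model A", "hdb_town": "Bedok",
     "year_built": 1985, "price": 350_000, "bedrooms": 2, "balcony": None},
    {"property_type": "3-room", "model_type": "Model A", "hdb_town": "Bedok",
     "year_built": 1985, "price": 350_000, "bedrooms": 2, "balcony": None},
    {"property_type": "5-room", "model_type": "Improved", "hdb_town": "Punggol",
     "year_built": 2015, "price": 520_000, "bedrooms": 3, "balcony": 1},
]
rows = preprocess(listings)
```

For an OLS fit, one dummy per categorical variable (for example the 3-room level) would normally be dropped and used as the reference category, which matches the "compared to 3-room flats" framing of the results.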
Feature Selection
Columns: type, variable, reason for removal (high correlation; zero correlation with target; backward stepwise; Lasso regression), in final model?
Target: Asking price (log) ✓
Core: Property type ✓
Model type (certain model types) ✓
Bedrooms ✓
Per square foot ✓
Area ✓
Furnish ✓
Land tenure
HDB town ✓
HDB region ✓
Age
Year built ✓
Facilities: Jacuzzi, meeting rooms ✓
Private pool, garage ✓
Air conditioning, renovated, corner unit, water heater, balcony, private garden, outdoor patio, original condition, hairdryer, bathtub, maid's room, colonial building, private lift, cooker hob/hood, city view, park/greenery view, sea view, swimming pool view, bomb shelter, walk-in wardrobe, roof terrace ✓
Results: Model Performance
• Mean absolute error: 0.10 (~$52K)
• Root mean squared error: 0.16 (~$74K)
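Both metrics take only a few lines to write out. The deck reports them on the log scale and in dollars, but says only that the dollar figures come from "reversing the log transformation"; computing the errors after exponentiating both actual and predicted prices, as the appendix's "(exponential)" columns suggest, is an assumption here. The prices below are made up.

```python
import math

def mae(actual, predicted):
    """Mean absolute error: the average size of a miss."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error: errors are squared before averaging, so large misses weigh more."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def to_dollars(log_prices):
    """Reverse the natural-log transform of the target."""
    return [math.exp(v) for v in log_prices]

# Made-up log prices for two listings.
actual_log = [math.log(400_000), math.log(500_000)]
predicted_log = [math.log(440_000), math.log(450_000)]

log_mae, log_rmse = mae(actual_log, predicted_log), rmse(actual_log, predicted_log)
dollar_mae = mae(to_dollars(actual_log), to_dollars(predicted_log))  # ~45,000
dollar_rmse = rmse(to_dollars(actual_log), to_dollars(predicted_log))
```

Because the target is log price, a log-scale MAE of 0.10 roughly means a typical miss of about 10% of the price (e^0.10 ≈ 1.105). Exponentiating the log-scale MAE itself would not give the dollar MAE.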
Results: Feature Importance
Increase Price (log):
1. Per square foot (PSF) is the strongest predictor of price
- Every 1-unit ↑ in PSF, prices ↑ by $61k
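In a model of log price, a coefficient β multiplies price by e^β per unit increase, so a fixed dollar figure only holds at a particular baseline price. One plausible reading of the $61k, $55k, $51k and $54k figures (an assumption; the deck does not show the calculation) is that they are evaluated at the baseline e^13.12, using the intercept quoted in the speaker notes. The helper names are hypothetical.

```python
import math

def dollar_effect(beta, intercept):
    """Dollar change from a 1-unit increase in a feature of a log(price) model,
    evaluated at the baseline price exp(intercept)."""
    return math.exp(intercept) * (math.exp(beta) - 1)

def implied_beta(dollar_change, intercept):
    """Invert dollar_effect: the log-scale coefficient giving dollar_change at the baseline."""
    return math.log1p(dollar_change / math.exp(intercept))

def percent_effect(beta):
    """Proportional price change per unit increase; the same for every baseline."""
    return math.exp(beta) - 1

INTERCEPT = 13.12  # model intercept quoted in the speaker notes
baseline = math.exp(INTERCEPT)  # baseline price implied by the intercept
beta_psf = implied_beta(61_000, INTERCEPT)  # assumes the $61k was computed at the baseline
share = percent_effect(beta_psf)
```

Under that assumption, $61k is roughly a 12% rise over a baseline of about $499k. The same coefficient implies a larger dollar change for a more expensive flat, so percentage effects are the more portable reading. If PSF was standardized before fitting Lasso, "1 unit" would mean one standard deviation; the deck does not say.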
Results: Feature Importance
Increase Price (log):
2. Property type
Compared to 3-room flats:
• Executive flats ($55k more)
• 5-room flats ($55k more)
• Jumbo flats ($51k more)
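"Compared to 3-room flats" reflects dummy coding with 3-room flats as the reference category. Writing the log-price model with illustrative coefficient symbols (the deck does not report the fitted β values), with D denoting 0/1 property-type dummies and other terms elided:

```latex
\log(\text{price}) = \beta_0 + \beta_{\text{PSF}}\,\text{PSF} + \beta_{\text{bed}}\,\text{bedrooms}
  + \beta_{\text{exec}}\,D_{\text{exec}} + \beta_{\text{5rm}}\,D_{\text{5rm}}
  + \beta_{\text{jumbo}}\,D_{\text{jumbo}} + \cdots + \varepsilon

% Executive vs 3-room flat (all D = 0), other features held fixed:
\frac{\widehat{\text{price}}_{\text{exec}}}{\widehat{\text{price}}_{\text{3rm}}} = e^{\beta_{\text{exec}}},
\qquad
\widehat{\text{price}}_{\text{exec}} - \widehat{\text{price}}_{\text{3rm}}
  = \widehat{\text{price}}_{\text{3rm}}\left(e^{\beta_{\text{exec}}} - 1\right)
```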
Results: Feature Importance
Increase Price (log):
3. Number of bedrooms
- Every additional bedroom ↑ prices by $54k
Conclusion
There are different types of HDB houses you can call home
1. For a relatively cheaper house: a 3-room flat
• Lower price per square foot (PSF)
2. If you need a bigger living space and have more budget: a 5-room, jumbo or executive flat
3. Peripheral factors don’t influence prices as much as core features
But they could matter based on individuals’ preferences
Future Work
• Collect more data from multiple sources
• Retrieve actual purchase prices from ERA
• Create additional features, e.g. distance from business districts: Central Business District, Mapletree Business City, Jurong Lake District
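A distance-to-district feature could be engineered from each listing's coordinates with the haversine great-circle formula. This sketch assumes listings carry latitude and longitude, which the deck does not say; the landmark coordinates are approximate and illustrative only, and further districts can be added to the dictionary the same way.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# Hypothetical landmark coordinates: approximate, for illustration only; verify before use.
BUSINESS_DISTRICTS = {
    "Central Business District": (1.2840, 103.8510),
    "Jurong Lake District": (1.3330, 103.7420),
}

def distance_features(lat, lon, districts=BUSINESS_DISTRICTS):
    """One 'km_to_<district>' feature per business district for a listing at (lat, lon)."""
    return {
        f"km_to_{name}": haversine_km(lat, lon, dlat, dlon)
        for name, (dlat, dlon) in districts.items()
    }

# A hypothetical listing located exactly at the CBD reference point.
feats = distance_features(1.2840, 103.8510)
```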
Thank you!
github.com/valerielimyh
medium.com/@valerieeelimyh
valerieeelimyh@gmail.com
Valerie Lim
Questions?
Appendix
Model Comparison
Linear Regression Model Performance Comparison
Model                    Split       Adjusted R²  RMSE  MAE   RMSE (exponential)  MAE (exponential)
Lasso Regression         Validation  84%          0.12  0.10  60k                 46k
Lasso Regression         Test        77%          0.15  0.10  74k                 52k
Ordinary Least Squares   Validation  84%          0.12  0.10  61k                 45k
Ordinary Least Squares   Test        84%          0.15  0.10  72k                 50k
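The practical difference between the two models above is Lasso's L1 penalty, which can shrink a coefficient to exactly zero; the speaker notes rely on this when removing features whose Lasso coefficients were 0. Below is a from-scratch sketch of cyclic coordinate descent with soft-thresholding that minimises (1/2n)·||y - b - Xw||² + α·||w||₁, the same objective form scikit-learn's `Lasso` uses. The deck does not say which implementation or α was used; the toy data and α values are assumptions.

```python
def soft_threshold(z, gamma):
    """Shrink z towards zero by gamma; values within [-gamma, gamma] become exactly 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0

def lasso_cd(X, y, alpha, n_iter=200):
    """Fit Lasso by cyclic coordinate descent.

    X: list of rows (n samples x p features); y: list of targets.
    Returns (intercept, weights); the intercept is not penalised.
    """
    n, p = len(X), len(X[0])
    # Centre features and target so the intercept can be recovered at the end.
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    residual = [v - y_mean for v in y]  # centred y minus Xc @ w, with w = 0 initially
    col_sq = [sum(r[j] ** 2 for r in Xc) / n for j in range(p)]
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue  # a constant column carries no information
            # Correlation of feature j with the residual, with its own contribution added back.
            rho = sum(Xc[i][j] * (residual[i] + Xc[i][j] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_wj - w[j]
            if delta:
                for i in range(n):
                    residual[i] -= Xc[i][j] * delta
            w[j] = new_wj
    intercept = y_mean - sum(x_mean[j] * w[j] for j in range(p))
    return intercept, w

X = [[1, 0], [2, 1], [3, 0], [4, 1], [5, 0], [6, 1]]  # toy features
y = [2 * row[0] + 1 for row in X]  # target depends on the first feature only
b_lasso, w_lasso = lasso_cd(X, y, alpha=0.5)
b_ols, w_ols = lasso_cd(X, y, alpha=0.0)
```

With α = 0.5 the second coefficient is exactly zero and the first is shrunk below 2; with α = 0 the sketch reduces to least squares on this toy data. Features are usually standardized first so that a single α penalises them on a comparable scale.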
OLS summary statistics
Feature engineering: log transform of the target variable
Assumption 1: the regression is linear in its parameters and correctly specified
Assumption 2: residuals should be normally distributed with zero mean
Assumption 3: error terms must have constant variance
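Assumptions 2 and 3 can be probed numerically as well as with plots. This sketch, using a hypothetical `residual_diagnostics` helper and made-up numbers, reports the mean residual, the sample skewness as a rough symmetry check (a stand-in for a formal normality test, not the author's method), and the ratio of residual variance for high versus low fitted values, in the spirit of a Goldfeld-Quandt test. It does not check assumption 1, for which a residuals-versus-fitted plot is the usual tool.

```python
from statistics import fmean, pvariance

def residual_diagnostics(actual, fitted):
    """Rough numeric checks of regression assumptions 2 and 3 (hypothetical helper)."""
    residuals = [a - f for a, f in zip(actual, fitted)]
    # Assumption 2, zero mean: the average residual should be close to 0.
    mean_resid = fmean(residuals)
    # Assumption 2, normality (rough): symmetric errors give a sample skewness near 0.
    sd = pvariance(residuals) ** 0.5
    skewness = fmean(((r - mean_resid) / sd) ** 3 for r in residuals) if sd else 0.0
    # Assumption 3, constant variance: compare residual spread for the lowest and highest
    # fitted values; a ratio near 1 is consistent with homoscedasticity.
    order = sorted(range(len(fitted)), key=lambda i: fitted[i])
    half = len(order) // 2
    low = [residuals[i] for i in order[:half]]
    high = [residuals[i] for i in order[-half:]]
    variance_ratio = pvariance(high) / pvariance(low)
    return {"mean": mean_resid, "skewness": skewness, "variance_ratio": variance_ratio}

# Made-up fitted values whose residuals fan out as the fitted value grows.
fitted = [1, 2, 3, 4, 5, 6, 7, 8]
actual = [f + r for f, r in zip(fitted, [0.1, -0.1, 0.1, -0.1, 0.3, -0.3, 0.3, -0.3])]
report = residual_diagnostics(actual, fitted)
```

Here the mean and skewness are essentially 0, but the variance ratio of about 9 flags non-constant variance, which is the pattern assumption 3 rules out.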

Editor's Notes

  1. Good morning everyone. Today, I’ll be sharing my project on using regression analysis to predict HDB resale prices.
  2. I’ll begin by setting up the context and problem statement, before walking you through my methodology. Next, I’ll dive into the results as well as some interesting insights and recommendations. Lastly, I’ll wrap up with some future directions for this project.
  3. For most households, a house represents the biggest investment. Couples who are planning to settle down but cannot secure a Build-To-Order (BTO) flat have to seek alternatives, such as HDB resale flats. Yet, as first-time home buyers, they are often unsure of the market rate for house prices, partly because so many factors can influence them. Unless buyers are well read about the property market or have done extensive research, they often have imperfect information. As a result, property agents are likely to lead the conversation. Can buyers be given actionable insights so that the balance of information is more symmetrical?
  4. Yes! That’s the aim of this project: to provide data-driven guidance to first-time home buyers in determining whether their desired property is priced at its ‘true’ value.
  5. I used Beautiful Soup to scrape HDB resale flat listings from SRX. Each listing has various information, from core details such as property type, name, location and model type to more granular details such as the facilities available and whether the unit is a corner unit or renovated. Since the target audience is couples searching for a flat, I focused on listings of 3-room flats and above. Among the selected listings, there were some duplicates, which I dropped. Some listings omitted information that other listings provided, and I imputed these missing values as 0. I converted categorical variables (e.g. property type, model type) to dummy variables. I created new features, such as age from year built for ease of interpretation, and aggregated HDB towns into regions to investigate whether different areas in Singapore have an impact on prices. Feature selection was conducted at various stages of the model-building process, which I’ll share in greater detail in the next slide. Finally, I built the model using Lasso regression and used mean absolute error (the average absolute difference between predicted and actual prices) as my main evaluation metric.
  6. Before building the models, features that were highly correlated with each other were removed to avoid multicollinearity, and features with zero correlation with the target variable were also removed. After that, features were selected using the backward stepwise method: based on an OLS model, features with p-values above 0.05 were removed. Lasso regression was also used for feature selection, and features with coefficients of 0 were removed. The final features are property type, price per square foot and number of bedrooms.
  7. Using this model, the mean absolute error is about 0.1. After reversing the log transformation, this error is equivalent to about $52k, which means buyers using this tool can expect roughly that much wiggle room when assessing housing prices based on the features mentioned earlier. RMSE: since the errors are squared before they are averaged, the RMSE gives relatively high weight to large errors, which makes it more useful when large errors are particularly undesirable. The plot shows actual log prices against predicted log prices. The red line represents a perfect model, where actual prices equal predicted prices. The linear regression model appears suitable for the data, except for some outliers where the model predicted a lower price than the actual price. For these data points, the model treated property types larger than 5-room flats and PSF as significant predictors, but other features may be better suited to predicting their prices, so the model cannot predict these flats accurately.
  8. To understand how the model works, let’s dive into the features. The plot below shows the relative importance of each feature. The strongest positive predictor is price per square foot (PSF), followed by property type. The model intercept is 13.12. For a 1-unit increase in PSF, price increases by $61k.
  9. Next is property type: an executive flat or a 5-room flat is priced about $55k more than a 3-room flat, and a jumbo flat about $51k more.
  10. Lastly, bedrooms: for every additional bedroom, price increases by $54k.
  11. In conclusion, there are different types of HDB houses you can call home. These insights suggest that if you are looking for something relatively cheaper, you can consider a simple 3-room flat with a lower PSF. If you need a bigger living space and have a larger budget, consider a 5-room, jumbo or executive flat.
  12. Lastly, aesthetic factors (e.g. sea view, renovated, corner unit) are not as significant in determining prices as core features (e.g. property or model type), but they could matter depending on individual preferences.
  13. The current model still has a lot of room for improvement. To increase prediction accuracy, I can collect more data, as these results are based on 1,000 data points; more listings would likely lower the MAE. The current data source is limited to SRX. For a more comprehensive analysis, I could scrape multiple property websites such as PropertyGuru and 99.co. However, relying on data from property portals may introduce bias, as agents may inflate asking prices to earn a larger share of the pie. For an even more accurate analysis, I could gather actual purchase prices from ERA.