After going through the pain of buying and selling my HDB flat recently, I believe a predictive model of resale price is needed. I wrote a model using multiple linear regression (MLR).
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \beta_5 X_5 + \dots + \beta_p X_p + \varepsilon$$
This model includes variables such as west sun and corridor location. It would be a great help to buyers and sellers if this model, supported by real data from the authorities, were made publicly available.
Predictive modeling for resale HDB evaluation price
2017
Loh Kah Huey
Business Analytics Plan
MKT654 Business Analytics Assignment
4-August-2017
Predictive Model of HDB Resale Price
Business Analytics Individual Assignment: Loh Kah Huey
Background: HDB Resale Transaction
More than 80% of Singaporeans live in HDB flats. There are more than 1 million HDB units in Singapore
and an average of 20,000 HDB resale transactions per year [i].
Under the latest HDB resale policy, buyers and sellers must agree on the resale price before an
official valuation of the unit can be requested. Without an official valuation as a reference,
negotiation is generally based on the recent HDB transaction prices published on the HDB InfoWEB
e-Service [ii]. However, the information available from this data is limited. Each transaction price
is provided with only a few basic variables, i.e. HDB Town, Block, Floor Area (sqm), Flat Model,
Lease Commence Date, Resale Price and Resale Registration Date.
The challenges of using the current reference dataset are as follows:
1. Transaction prices can differ significantly, but insufficient information is provided to
explain the factors driving the difference.
a. Example 1: the transaction prices of two executive apartments at 542 Jelapang Road are
$628,000 and $725,000. The difference is close to $100,000 even though both flats are
the same size, the same flat type and located in the same block. Their floor levels
differ marginally, but that should not be the only factor behind the price difference.
b. Example 2: a comparison of transaction prices of 5-room flats at blocks 534, 536 and
540. The price difference is close to $60,000, which is more than 10% of the price,
even though these flats are in a similar location and are the same type and size. The
general belief is that a higher floor commands a higher price, but this belief does
not hold in this case. Other parameters may drive the price difference, such as the
interior condition of the flat, its orientation, its location within the block and
whether it receives direct west sun.
2. Sometimes no HDB flat with a similar profile has been transacted in the past 12 months in
the specific town, so no relevant data can be used as a reference.
3. This dataset cannot predict prices for new and unobserved data.
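To put Example 1 in figures, the gap between the two Jelapang Road transactions can be computed directly. This is only an illustrative calculation using the two prices quoted above; the variable names are chosen here for the example and do not come from the HDB data.

```python
# Prices quoted in Example 1 (two executive apartments at 542 Jelapang Road).
lower_price = 628_000
higher_price = 725_000

# Absolute gap in SGD and the gap as a percentage of the lower price.
gap = higher_price - lower_price
pct_of_lower = gap / lower_price * 100

print(f"Gap: ${gap:,} ({pct_of_lower:.1f}% of the lower price)")
```

A gap of this size between two otherwise similar units is what the proposed additional variables are meant to explain.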
Business Objectives
To give both buyers and sellers a better reference price for negotiation, a predictive model can
be developed. Several models have been attempted and published [iii] [iv] [v], but none of them
includes the important lifestyle parameters that are known to have a significant impact on price,
such as distance from the respective town centre, availability of amenities such as elite schools
and exposure to direct west sun.
A business analytics plan is proposed to achieve these objectives.
Data Management
The data required for this model are the resale transaction data and the other parameters of the
respective flats. All of this information is currently captured for every HDB transaction and is
readily available from data.gov.sg and HDB, so no extra cost is required to collect it. Data
storage, processing and management should be done in collaboration with data.gov.sg and HDB.
The variables for this predictive model are listed below:
Variable | Description | Data type
ResalePrice | Resale price of the HDB flat in SGD (output variable) | Numeric
Storey | Floor level of the unit | Numeric
Floor_Area | Floor area in square metres | Numeric
Flat_Type | 1-room to 5-room, executive maisonette (EM), executive apartment (EA), multi-generation (MG). A 0/1 dummy variable is required for each category except the last, i.e. 1-room=0/1, 2-room=0/1, 3-room=0/1, 4-room=0/1, 5-room=0/1, EM=0/1, EA=0/1. The last variable, MG, is redundant because it is implied when all the other dummies are 0; including it would cause a multicollinearity error. | Binary
Age | Age of the flat in years as of 2017 (calculated from the lease commencement date) | Numeric
DisCBD | Distance from the CBD in km | Numeric
DisTownCentre | Distance from the respective/nearest town centre in km | Numeric
DisMRT | Distance from the nearest MRT station in km | Numeric
West_Sun | Whether the flat's orientation exposes it to direct west sun. Dummy variable: 1=no west sun, 0=west sun | Binary
Amenities | Amenities available within 1 km (neighbourhood schools, elite schools, tertiary institutions, parks, shops/shopping centres, eating/F&B places, library, community centre, sports facilities, wet market). Each amenity is a separate dummy variable: 1=the respective amenity is available, 0=the respective amenity is absent. There are 10 variables under this category. | Binary
Interior | Interior condition and state of maintenance of the flat (on a scale of 1 to 5, where 1 is poor and 5 is excellent) | Numeric
Transportation | Transport connectivity, measured as the total number of transit bus/LRT services available at the bus stops/LRT stations within 0.5 km | Numeric
Corridor | Whether the unit is located right beside the corridor; non-corridor units are better in terms of privacy and theft protection (see Exhibition 2 for illustration). Dummy variable: 1=not a corridor unit, 0=corridor unit | Binary
UnblockView | Whether the view of the flat is blocked by a nearby flat or is unblocked, which provides greater privacy, natural sunlight and better ventilation. Dummy variable: 1=unblocked view, 0=blocked view | Binary
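As an illustration of the dummy coding described for Flat_Type, the short Python sketch below creates seven 0/1 variables and leaves MG as the omitted reference category. The function name `encode_flat_type` and the `Flat_Type_` prefix are assumptions made for this example; they are not names from any HDB dataset.

```python
# The eight flat-type categories listed in the variable table.
FLAT_TYPES = ["1-room", "2-room", "3-room", "4-room", "5-room", "EM", "EA", "MG"]
REFERENCE = "MG"  # omitted category, so only seven dummies are created


def encode_flat_type(flat_type):
    """Return a dict of 0/1 dummy variables for one flat (hypothetical helper)."""
    if flat_type not in FLAT_TYPES:
        raise ValueError(f"Unknown flat type: {flat_type}")
    return {
        f"Flat_Type_{t}": int(flat_type == t)
        for t in FLAT_TYPES
        if t != REFERENCE
    }


four_room = encode_flat_type("4-room")
multi_gen = encode_flat_type("MG")
print(four_room)
print(multi_gen)
```

An MG flat is represented by all seven dummies being 0, which is why an eighth dummy would add no information and would make the predictors perfectly collinear with the intercept.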
Analysis Plan
1. A separate predictive model is built for each HDB town [vi]. There are a total of 23 HDB
towns in Singapore, namely Ang Mo Kio, Bedok, Bukit Panjang, Bukit Timah etc. Multiple
linear regression is used.
2. Data will be partitioned into training and validation sets. The training set [vii] is used to
estimate the predictive model and the validation set is used to assess its performance.
3. Examine the relationships among the variables. The number of variables can be reduced first
through domain knowledge, i.e. understanding what type of variables are measured and how
they should be prioritised. Secondly, computational power and statistical significance are
taken into consideration to select the relevant variables. Some examples of variable
reduction methods are:
a. Examine the p-value of each variable. Variables can be eliminated selectively when
their p-value is greater than a cut-off, e.g. 0.05. This cut-off can be adjusted
according to the business needs of the model.
b. Run a correlation table to find highly correlated pairs of variables, then remove the
redundant variable(s).
c. Exhaustive search or subset selection algorithms (forward selection, backward
elimination, stepwise regression).
d. Principal components and regression trees.
4. A multiple linear regression model of ResalePrice is generated by fitting the above
variables. ResalePrice is the output variable and the other variables are predictors.
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \beta_5 X_5 + \dots + \beta_p X_p + \varepsilon$$
where $\beta_0$ is the intercept, $\beta_1, \dots, \beta_p$ are the coefficients and $\varepsilon$ is the noise.
5. The performance of the model is measured by its predictive accuracy on the validation set.
The average error, error percentiles and a boxplot of residuals can be examined to assess
the predictive performance of the model.
6. As a general rule, the number of observations should be larger than 5(p+2), where p is
the total number of predictors in a parsimonious model [viii], i.e. a model with the minimal
number of predictor variables [ix]. With an average of 20,000 HDB transactions per year
across these 23 towns, there may be an average of about 870 cases per town per year (see
Exhibition 3 for recent HDB transaction volumes). Using the most recent 2 to 3 years of data
is recommended, provided there have been no major changes or developments in the respective
town and no major fluctuation in the general inflation rate. Therefore, about 1,000 to 2,000
observations are recommended per HDB town.
7. R² is examined to determine how closely the data fit the regression line, on a range of
0 to 1. The higher it is, the better the model fits the data. The problem with R² is that
it never decreases when predictors are added, so a model may always appear better as more
predictors are included. In addition, a model with too many predictors may overfit by
fitting random noise. Adjusted R² will be used to overcome these problems because it does
not increase simply by adding more predictors.
8. This is a predictive model; it is not intended to be an explanatory model. The goal is to
predict new individual observations. The coefficients are not the key focus, but they do
provide some understanding of the association between the output variable (ResalePrice)
and the predictors, i.e. whether each predictor is positively or negatively associated
with the output variable. The key focus of a predictive model is the prediction (Y).
9. Since the data and the respective variables are readily available and all information is
captured for each transaction, handling of missing data is unlikely to be an issue for this
model. The variable with the largest scale is age. Normalising this variable may be
considered.
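Steps 2, 4, 5 and 7 of the analysis plan can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the model itself: the data are synthetic (generated from made-up coefficients for Floor_Area, Storey and Age only), the 60/40 split is an arbitrary choice, and all function names are hypothetical. The fit uses ordinary least squares via the normal equations.

```python
import random


def solve_linear_system(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def fit_mlr(rows, y):
    """Least-squares estimate of [b0, b1, ..., bp] from the normal equations."""
    design = [[1.0] + list(r) for r in rows]  # leading 1 for the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(beta, row):
    return beta[0] + sum(b * x for b, x in zip(beta[1:], row))


def adjusted_r2(r2, n, p):
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Synthetic data only: coefficients and noise level are invented for the demo.
random.seed(0)
true_beta = [200_000.0, 3_000.0, 2_000.0, -1_500.0]
rows, prices = [], []
for _ in range(300):
    row = (random.uniform(60, 150), random.randint(1, 25), random.uniform(0, 40))
    rows.append(row)  # (Floor_Area, Storey, Age)
    prices.append(predict(true_beta, row) + random.gauss(0, 10_000))

# Step 2: partition into training (60%) and validation (40%) sets.
idx = list(range(len(rows)))
random.shuffle(idx)
cut = int(0.6 * len(idx))
train, valid = idx[:cut], idx[cut:]

# Step 4: estimate the coefficients on the training set.
beta = fit_mlr([rows[i] for i in train], [prices[i] for i in train])

# Step 5: prediction errors on the validation set.
residuals = [prices[i] - predict(beta, rows[i]) for i in valid]
avg_error = sum(residuals) / len(residuals)
rmse = (sum(e * e for e in residuals) / len(residuals)) ** 0.5

# Step 7: R-squared and adjusted R-squared on the training set.
train_y = [prices[i] for i in train]
mean_y = sum(train_y) / len(train_y)
ss_tot = sum((y - mean_y) ** 2 for y in train_y)
ss_res = sum((prices[i] - predict(beta, rows[i])) ** 2 for i in train)
r2 = 1 - ss_res / ss_tot
adj_r2 = adjusted_r2(r2, n=len(train), p=len(beta) - 1)

print("coefficients:", [round(b, 1) for b in beta])
print(f"validation average error: {avg_error:.0f}, RMSE: {rmse:.0f}")
print(f"R2: {r2:.4f}, adjusted R2: {adj_r2:.4f}")
```

In practice a statistics package would replace the hand-written solver and would also report the p-values needed for step 3; the plain-Python version is used here only to keep the example self-contained.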
Conclusion
This predictive model provides better reference points for HDB resale prices, helping both buyers
and sellers to gauge the valuation of a unit. A more accurate predicted price saves time and cost
by supporting the right decision from the beginning and better financial planning for both
parties. The model should also be updated periodically, ideally on a weekly basis, to ensure that
it is always up to date, relevant to users and aligned with the market direction.
Limitations and future research
This business analytics plan is based on knowledge acquired during a 4-day executive programme
for the MSc Marketing and Consumer Insight on 18-22 July 2017. Other data mining techniques,
e.g. a hedonic price model, may be considered to refine the predictive model in addition to the
multiple linear regression used here. Future research should also cover time factors,
transaction dates/months and the resale price index. This predictive model can also be adapted
for private residential properties, which could benefit a wider group: the general public,
foreigners who live in Singapore, foreign investors and the real estate industry overall.
Appendix
Exhibition 1: Illustration of the limitations of the current reference prices on HDB InfoWEB:
Transaction prices of two executive apartments at 542 Jelapang Road are $628,000 and $725,000. The
difference between the two prices is close to $100,000 even though both flats are the same size, the same flat
type and located in the same block.
Comparison of transaction prices of 5-room flats at blocks 534, 536 and 540. The difference is close to $60,000,
which is more than 10% of the price, even though these flats are in a similar location and are the same type and size.
Generally, a higher floor commands a higher price. However, this hypothesis does not hold here. Other
parameters may drive the price difference.
Exhibition 2: HDB unit beside the corridor, where the privacy of the living room and common bedroom
is a concern
Exhibition 3: Resale applications registered (quarterly record) [x]
2017      1-Room  2-Room  3-Room  4-Room  5-Room  Executive*   Total
Q1             2      64   1,210   1,857   1,055         342   4,530
Q2             1     100   1,519   2,529   1,407         445   6,001
Total          3     164   2,729   4,386   2,462         787  10,531

Quarter   1-Room  2-Room  3-Room  4-Room  5-Room  Executive*   Total
Q1             1      51   1,220   1,809   1,023         345   4,449
Q2             3      72   1,500   2,473   1,369         421   5,838
Q3             1      74   1,473   2,250   1,283         433   5,514
Q4             1      69   1,326   2,071   1,186         359   5,012
Total          6     266   5,519   8,603   4,861       1,558  20,813
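The sample-size rule of thumb from the analysis plan, more than 5(p+2) observations, can be checked against these volumes with a short calculation. The sketch below is illustrative only: the predictor tally is one reading of the variable list in the Data Management section (the plan does not state p explicitly), and the annual figure is the full-year total of 20,813 from the second table above.

```python
NUM_TOWNS = 23
ANNUAL_TRANSACTIONS = 20_813  # full-year total in the second table above

# Assumed tally of predictors, counting each dummy variable separately.
predictor_counts = {
    "Storey": 1,
    "Floor_Area": 1,
    "Flat_Type dummies (MG omitted)": 7,
    "Age": 1,
    "DisCBD": 1,
    "DisTownCentre": 1,
    "DisMRT": 1,
    "West_Sun": 1,
    "Amenities dummies": 10,
    "Interior": 1,
    "Transportation": 1,
    "Corridor": 1,
    "UnblockView": 1,
}

p = sum(predictor_counts.values())
min_observations = 5 * (p + 2)  # rule of thumb: n should exceed this
per_town_per_year = ANNUAL_TRANSACTIONS / NUM_TOWNS

print(f"p = {p}, minimum observations = {min_observations}")
print(f"average transactions per town per year = {per_town_per_year:.0f}")
```

Under these assumptions, an average town clears the threshold with a single year of data, although towns with few transactions would need to be checked individually because the average hides large differences between towns.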
About the author:
Loh Kah Huey
Business owner of CalebREO, which provides market research and
supporting expertise to deliver successful research insights to
clients.
With 17 years in the market research industry, she started her career with
Ipsos (formerly known as Synovate), spent a significant number of
years with TNS/Kantar and left in 2012 to set up CalebREO.
Kah Huey graduated with a Bachelor of Science, majoring in Chemistry, in 1999 from the University of
Malaya. She is currently pursuing a Master of Science in Marketing and Consumer
Insight at Nanyang Business School, Nanyang Technological University, Singapore.
Publication: Understanding the Chinese healthcare consumer.
http://www.ephmra.org/user_uploads/11.%20asia%20parallel%203%20research%20partnership%20yates%20and%20loh.pdf
[i] http://www.straitstimes.com/singapore/housing/hdb-resale-flat-transactions-rose-78-per-cent-in-2016-even-as-prices-remained-flat
[ii] https://services2.hdb.gov.sg/webapp/BB33RTIS/
[iii] https://yuzukixx.github.io/2017/02/23/quick-analysis-and-prediction-of-hdb-resale-flat-prices/
[iv] Muhammad Faishal Ibrahim, Fook Jam Cheng, Kheng How Eng (Department of Real Estate, National University of Singapore, Singapore), Automated valuation model: an application to the public housing resale market in Singapore. http://www.emeraldinsight.com/doi/abs/10.1108/02637470510631492?mobileUi=0&journalCode=pm
[v] Peter C. B. Phillips (Singapore Management University) and Jun Yu, A New Hedonic Regression for Real Estate Prices Applied to the Singapore Residential Market. https://ink.library.smu.edu.sg/cgi/viewcontent.cgi?article=2602&context=soe_research
[vi] http://www.hdb.gov.sg/cs/infoweb/about-us/history/hdb-towns-your-home
[vii] Galit Shmueli, Nitin R. Patel and Peter C. Bruce, Data Mining for Business Analytics: Concepts, Techniques, and Applications with XLMiner, 2016.
[viii] Galit Shmueli, Nitin R. Patel and Peter C. Bruce, Data Mining for Business Analytics: Concepts, Techniques, and Applications with XLMiner, 2016.
[ix] http://www.statisticshowto.com/parsimonious-model/
[x] http://www.hdb.gov.sg/cs/infoweb/residential/buying-a-flat/resale/resale-statistics
Source of photo: http://www.asiaone.com/singapore/new-hdb-flats-go-all-green; google image search