Analyze price determinants and forecast Seoul apartment prices
(2022)
The 2nd KAIST Digital Finance Mastership Program: Cloud computing and Bigdata Analysis
Kyungrok Park
Analysis purpose
Examine how various factors influence the Seoul apartment sales index and forecast future sales price trends.
Topic selection criteria:
- Analyzable (get data): choose a topic for which the analysis data are accessible
- Generate interest: analyze apartment prices, which have fluctuated sharply in recent years, to engage the audience
- Align with learning: make the most of what was learned in the KAIST CBA course
Examine existing research
Study 1: regional apartment transaction price index prediction with LIME (Cho et al., 2020)
- Used 8 explanatory variables to predict apartment prices by region: Consumer Price Index, term deposit rate, money supply (M2), Apartment Transaction Price Index, Index, apartment sales status, mortgage interest rate, and real estate search index.
- Among ARIMA, Random Forest, and LSTM, the LSTM model was the most accurate; the machine learning models outperformed the traditional time-series model.
- After selecting a predictive model, the study used the LIME algorithm to identify the variables that explain major inflection points in prices.
- Data from four months earlier had no effect on price changes; data from one to two months earlier were the main explanatory inputs for apartment price forecasting.
- If the mortgage rate falls while the term deposit rate rises, a price drop can be expected about two months later.
- In Seoul and Busan, the real estate-related search index (Google Trends) was a key explanatory variable.
Study 2: apartment price estimation in Gangnam-gu, Seoul (Bae and Yoo, 2018)
- Rather than only comparing the predictive power of different models, the study verified the feasibility of applying machine learning to the calculation of officially disclosed housing prices by analyzing the gap from actual transaction prices.
- Analyzed 4,791 apartment transactions in Gangnam-gu, Seoul, from January 1 to December 31, 2016, excluding outliers.
- Explanatory variables included floor, land area, exclusive-use area, building age, distance to a subway station, number of units, transaction month, and building.
- 70% of all cases were randomly assigned to training data and 30% to test data.
- The machine learning methods SVM, RF, GBRT, and DNN showed better predictive power than multiple regression analysis (MRA).
- Given the growing budget of the official housing price disclosure program, estimating housing prices with machine learning is expected to improve work efficiency.
Data to analyze
Data on 13 variables related to Seoul apartments were collected for each area and preprocessed.
Sources: Statistics Korea; Ministry of Land, Infrastructure and Transport; Bank of Korea; Korea Housing Finance Corporation; Korea Real Estate Agency; e-National Indicators
Data range: January 2006 to April 2022 (196 months total)
Variables (column name: description)
Dwelling type data
- apt_prc: Sales (trading) index
- lease_prc: Jeonse* index
- lease_rto: Jeonse rate (ratio of the Jeonse deposit to the sales price)
- rent_trans_rto: Rent conversion rate
Supply data
- approved_home: Authorized (permitted) housing volume (60 m² or more)
- unsold_home: Unsold housing volume
- year30_home: Homes 30 years or older (eligible for redevelopment/reconstruction)
Financial data
- family_tot: Total household debt
- M2_mon: M2 money supply
- base_int: Base rate
- loan_int: Mortgage rates
Other data
- date (index): Date (monthly)
- marriage: Number of newlyweds (new demand)
- growth: Real economic growth rate
Why use indices?
- To understand macro market trends
- Indices are the statistics accumulated over the longest period without missing data
- Regulation distorts the market, for example by increasing special-case transactions
- As trading volume falls, individual listings may be over-represented in a transaction-level sample
* Jeonse: a rental arrangement unique to South Korea in which the tenant pays a large lump-sum deposit instead of monthly rent; the deposit is returned at the end of the lease, usually after two years. Because the deposit can be a significant share of the property's value, it gives landlords a lump sum to invest.
Analysis methods
1. Correlation: analyze the correlation between the apartment sales index and each variable, and select variables
2. Regression: build regression models with high explanatory power and examine the influence of each variable
3. Time series: forecast future trends in the apartment sales index with time-series analysis
Tools: data modeling and forecasting with Python and Brightics AI; collaboration with Google tools (Drive, Colab, Sheets, and Slides)
Workflows
Correlation
Correlation & Variable Selection

Correlation (Theory)
Correlation is a descriptive analysis technique that determines whether a linear relationship exists between two variables and, if so, how strong it is. The Pearson correlation coefficient is typically used, but the technique should match the nature of the data.
[Figure: approximate scatter of two continuous variables for different values of the correlation coefficient. The closer the points lie to a straight line, the closer the coefficient is to 1 or -1.]
The coefficient measures how strong the linear relationship between two variables is, if one exists, and applies when both variables are continuous quantitative variables.
It is derived from the covariance: it equals the covariance divided by the product of the two standard deviations, and its square equals the coefficient of determination (R²) of a simple linear regression.
The coefficient ranges from -1 to 1. A positive value indicates a positive correlation and a negative value a negative correlation.
Pearson's correlation: calculations
$$r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}}$$
Test statistic: $$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$$
T follows a t distribution with n - 2 degrees of freedom, and the two-sided p-value is 2×P(T > |t|).
Null hypothesis (𝐻0): there is no linear relationship between the two variables (r = 0).
Alternative hypothesis (𝐻1): there is a linear relationship between the two variables (r ≠ 0).
Use Spearman or Kendall rank correlation for ordinal variables.
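The Pearson coefficient and its t statistic can be computed directly from the definitions. The sketch below is a plain-Python illustration with made-up numbers, not the code used in this analysis; the function names `pearson_r` and `t_statistic` are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of the same length, at least 3 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))  # co-deviation
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


def t_statistic(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2); t-distributed with n - 2 df under H0: r = 0."""
    if abs(r) >= 1.0:
        return math.copysign(math.inf, r)  # perfect correlation: statistic is unbounded
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


# Toy example (hypothetical numbers, not the deck's data)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
t = t_statistic(r, len(x))
print(f"r = {r:.4f}, t = {t:.4f}")  # r = 0.7746, t = 2.1213
```

If SciPy is installed, `scipy.stats.pearsonr(x, y)` returns r together with the two-sided p-value, which equals `2 * scipy.stats.t.sf(abs(t), n - 2)`.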
Cross-correlation basics (Theory)
The cross-correlation function (CCF) is a signal-processing technique that overlays different time series to analyze their similarity in shape and the time lag between them.
It analyzes correlations and lags between time series that are stationary. A series that is not stationary must first be made stationary through differencing or a logarithmic transformation.
[Figure: series x and y are highly similar at a lag of 4; shifting x by 4 steps reproduces y.]
Differencing computes the difference between consecutive observations, removing changes in the level of the series and stabilizing its mean. As a result, trends or seasonality are removed (or reduced).
First difference: $$y'_t = y_t - y_{t-1}$$
Second difference: $$y''_t = y'_t - y'_{t-1} = y_t - 2y_{t-1} + y_{t-2}$$
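A minimal NumPy sketch of the difference-then-CCF procedure, on synthetic random walks where y is x delayed by 4 steps (mirroring the lag-4 illustration). The helper names are hypothetical; the CCF normalizes by n, as the common biased estimator does, and a positive lag k means x leads y. Calling `difference(..., order=2)` would give the second difference used in the permit-volume analysis.

```python
import numpy as np


def difference(x, order=1):
    """Apply first differencing `order` times (order=2 gives the second difference)."""
    z = np.asarray(x, dtype=float)
    for _ in range(order):
        z = np.diff(z)
    return z


def ccf(x, y, max_lag):
    """Sample cross-correlation of x[t] with y[t + k] for k = 0..max_lag (x leads y by k)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    xs = (x - x.mean()) / x.std()
    ys = (y - y.mean()) / y.std()
    return np.array([np.sum(xs[: n - k] * ys[k:]) / n for k in range(max_lag + 1)])


# Synthetic example: y repeats x with a 4-step delay (both are random walks).
rng = np.random.default_rng(0)
x = np.cumsum(rng.normal(size=200))
y = np.concatenate([np.zeros(4), x[:-4]])

dx, dy = difference(x), difference(y)  # difference first so the series are stationary
corr = ccf(dx, dy, max_lag=12)
print(int(np.argmax(corr)))            # lag with the highest cross-correlation
```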
Correlation (Analytics): permit volume
Permit volume shows no significant positive or negative correlation with the apartment sales index or with the other variables.
Without understanding the nature of the variable, it is easy to conclude that it is irrelevant. However, permits turn into actual move-ins (new supply) only after construction, which in turn affects prices, so the effect appears with a time lag. When such a lag exists, ordinary (contemporaneous) correlation analysis can miss the relationship.
CCF analysis on second-differenced data confirmed that a decrease in permit volume leads to an increase in the Seoul apartment sales index about six years later.
Correlation (Analytics): correlations between variables
[Figure: correlation matrix of all variables]
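One way to implement the correlation-based variable selection is to rank candidate variables by their absolute Pearson correlation with the sales index. In this sketch the column names follow the data table, but the values and the 0.5 cut-off are hypothetical, since the deck does not state its selection rule. It captures only contemporaneous relationships, so a lagged variable such as permit volume still needs the CCF check.

```python
import numpy as np


def select_by_correlation(data, target, threshold=0.5):
    """Rank variables by |Pearson r| with the target; keep those at or above the threshold."""
    names = [name for name in data if name != target]
    rows = np.array([data[target]] + [data[name] for name in names], dtype=float)
    matrix = np.corrcoef(rows)  # rows are variables, columns are observations
    r_target = dict(zip(names, matrix[0, 1:]))
    ranked = sorted(r_target.items(), key=lambda item: abs(item[1]), reverse=True)
    return [(name, round(float(r), 3)) for name, r in ranked if abs(r) >= threshold]


# Hypothetical toy columns (names follow the deck; values are made up)
data = {
    "apt_prc": [1, 2, 3, 4, 5, 6],
    "M2_mon": [2, 4, 6, 8, 10, 12],
    "base_int": [3, 1, 4, 1, 5, 9],
    "growth": [1, -1, 1, -1, 1, -1],
}
result = select_by_correlation(data, "apt_prc")
print(result)  # [('M2_mon', 1.0), ('base_int', 0.696)]
```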
Regression
Regression & Variable Influence

Regression: data split
The data were split into a training set and a test set in an 85:15 ratio.
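The deck does not say whether the 85:15 split was random or chronological, so this sketch shows both. For time-series data a chronological split avoids training on months that come after the test period. The function names and the 150-row toy data are hypothetical; in scikit-learn, `train_test_split(X, y, test_size=0.15, shuffle=False)` gives the chronological version.

```python
import numpy as np


def chronological_split(X, y, test_ratio=0.15):
    """Keep time order: earlier rows train the model, the last test_ratio of rows test it."""
    X, y = np.asarray(X), np.asarray(y)
    n_test = int(np.ceil(len(y) * test_ratio))
    cut = len(y) - n_test
    return X[:cut], X[cut:], y[:cut], y[cut:]


def random_split(X, y, test_ratio=0.15, seed=42):
    """Shuffle rows before splitting, as in a conventional random hold-out."""
    X, y = np.asarray(X), np.asarray(y)
    idx = np.random.default_rng(seed).permutation(len(y))
    n_test = int(np.ceil(len(y) * test_ratio))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]


# 150 hypothetical rows with 13 feature columns
X = np.arange(150 * 13).reshape(150, 13)
y = np.arange(150)
X_tr, X_te, y_tr, y_te = chronological_split(X, y)
print(len(y_tr), len(y_te))  # 127 23
```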
Regression: performance evaluation metrics
- MSE (closer to 0 is better): $$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i-\hat{y}_i)^2$$ The key loss function for regression models, defined as the mean of the squared errors (the differences between predicted and actual values). Because the errors are squared, it is sensitive to outliers.
- MAE (closer to 0 is better): $$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\lvert y_i-\hat{y}_i\rvert$$ The mean of the absolute errors. Less sensitive to outliers than MSE.
- RMSE (closer to 0 is better): $$\mathrm{RMSE} = \sqrt{\mathrm{MSE}}$$ Taking the square root returns the error to the units of the actual values, which makes it easier to interpret.
- R-squared (closer to 1 is better): $$R^2 = 1 - \frac{\sum_{i}(y_i-\hat{y}_i)^2}{\sum_{i}(y_i-\bar{y})^2}$$ Evaluates prediction performance as the share of variance explained. Because it does not depend on the scale of the data, relative performance can be judged intuitively.
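The four metrics can be computed from test-set predictions in a few lines of NumPy. This is an illustrative sketch with toy numbers; `regression_metrics` is a hypothetical helper, and scikit-learn's `mean_squared_error`, `mean_absolute_error`, and `r2_score` compute the same quantities.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return MSE, MAE, RMSE and R^2 for a set of predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mse = float(np.mean(err ** 2))                            # squared errors: outlier-sensitive
    mae = float(np.mean(np.abs(err)))                         # absolute errors: more robust
    rmse = float(np.sqrt(mse))                                # back in the target's units
    ss_res = float(np.sum(err ** 2))                          # unexplained variation
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))     # total variation
    return {"MSE": mse, "MAE": mae, "RMSE": rmse, "R2": 1.0 - ss_res / ss_tot}


metrics = regression_metrics([3, -0.5, 2, 7], [2.5, 0.0, 2, 8])
print(metrics)
```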
Linear Regression (Theory)
Linear regression uses one continuous dependent variable and two or more independent variables to estimate the relationship between the independent and dependent variables, and uses the resulting regression model to predict the value of the dependent variable.
- Coefficient (coef): the slope of the regression for each variable. Because it depends on the unit of measurement, its size alone does not indicate the strength of the relationship.
- Standard error (std_err): the estimated standard deviation of the coefficient estimate. It grows with the spread of the residuals (the differences between fitted and actual values); the smaller it is, the more precise the estimate and the better the fit.
- t-statistic: the coefficient estimate divided by its standard error. The larger its absolute value, the smaller the p-value and the more likely the null hypothesis is rejected.
- p-value: indicates whether the relationship between the variables is statistically significant. The smaller it is, the stronger the evidence for rejecting the null hypothesis in favor of the alternative hypothesis.
Null hypothesis (𝐻0): there is no linear relationship between the two variables (coefficient = 0).
Alternative hypothesis (𝐻1): there is a linear relationship between the two variables (coefficient ≠ 0).
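To show where coef, std_err, and the t-statistic in a regression table come from, here is a NumPy sketch of ordinary least squares on synthetic data (`ols_summary` and the data are hypothetical). The p-value would then be the two-sided tail probability of a t distribution with n - k degrees of freedom, for example via `scipy.stats.t.sf`; it is omitted to keep the example NumPy-only.

```python
import numpy as np


def ols_summary(X, y):
    """OLS with an intercept; returns coef, std_err and t for [intercept, x1, x2, ...]."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])        # design matrix with an intercept column
    k = A.shape[1]                              # number of estimated parameters
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef                        # actual minus fitted values
    sigma2 = resid @ resid / (n - k)            # unbiased residual variance
    cov = sigma2 * np.linalg.inv(A.T @ A)       # covariance of the coefficient estimates
    std_err = np.sqrt(np.diag(cov))
    t = coef / std_err                          # t-statistic with n - k degrees of freedom
    return coef, std_err, t


# Synthetic data: y = 1 + 2*x1 - 3*x2 + noise (hypothetical, not the deck's data)
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y = 1 + 2 * X[:, 0] - 3 * X[:, 1] + rng.normal(scale=0.1, size=100)
coef, std_err, t = ols_summary(X, y)
print(np.round(coef, 2), np.round(t, 1))
```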
Linear Regression (Analytics)
Create a linear regression prediction model and evaluate it on the test set.
Test set validation results: R² is high, but MSE, RMSE, and MAE are all relatively poor.
According to the linear regression, total household loans, real economic growth, and the number of newlyweds are not statistically significantly related to the Seoul apartment sales index.
Decision Tree Regression (Theory)
A decision tree is a model that makes predictions from rules. The training data are partitioned step by step using the independent variables and various splitting criteria, and the resulting model can be visualized, which makes it intuitive and much easier to interpret than most other machine learning models.
- Builds the regression model from if-then-else rules
- A regression tree branches on a continuous target variable
Pros
- Results are visualized as a tree structure that is easy to understand and interpret
- Provides information about the explanatory variables (importance, interactions)
- Non-parametric model that is relatively insensitive to variable type and scale and to outliers
- Flexible in handling missing or raw data
Cons
- Risk of overfitting, addressed with pruning and ensemble methods such as bagging and boosting
- Does not guarantee an optimal tree
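The core step of a regression tree is choosing the split that minimizes the squared error of the two resulting leaves. This sketch finds the best single if-then-else split on one feature; a full tree would repeat the search over every feature and recursively on each side until a stopping rule (such as a maximum depth) is met. The function names and data are hypothetical.

```python
import numpy as np


def best_split(x, y):
    """Threshold on one feature minimizing the summed squared error of the two leaves."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    best = (np.inf, None, None, None)
    for i in range(1, len(xs)):
        if xs[i] == xs[i - 1]:
            continue  # cannot split between identical values
        left, right = ys[:i], ys[i:]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best[0]:
            threshold = (xs[i - 1] + xs[i]) / 2
            best = (sse, threshold, left.mean(), right.mean())
    return best  # (sse, threshold, left-leaf prediction, right-leaf prediction)


def predict_stump(x, threshold, left_value, right_value):
    """If x <= threshold then left_value else right_value."""
    return np.where(np.asarray(x, dtype=float) <= threshold, left_value, right_value)


sse, thr, lv, rv = best_split([1, 2, 3, 10, 11, 12], [5, 5, 5, 20, 20, 20])
print(thr, lv, rv)  # 6.5 5.0 20.0
```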
Decision Tree Regression (Analytics)
Create a decision tree regression prediction model and evaluate it on the test set.
Test set validation results: R² is high, but RMSE is somewhat poor.
According to the decision tree regression, the Seoul apartment sales index is closely related to total household loans.
Random Forest Regression (Theory)
Random forest compensates for the tendency of a single decision tree to overfit. It generates multiple trees and generalizes them with bagging, an ensemble method, outputting either a classification or the average prediction of the decision trees built during training.
For classification, each tree predicts a class and voting selects the class with the most votes; for regression, the trees' predictions are averaged.
(Figure source: YouTube, Udacity)
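A minimal sketch of bagging for regression: each model is fit to a bootstrap sample (drawn with replacement) and the predictions are averaged. To stay short, the base learner is a one-split tree on a single feature, so this shows only the bagging half of a random forest; a real random forest also samples a random subset of features at each split. All names and data are hypothetical.

```python
import numpy as np


def fit_stump(x, y):
    """Best single split on one feature (minimum summed squared error); falls back to the mean."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    best = (((ys - ys.mean()) ** 2).sum(), np.inf, ys.mean(), ys.mean())
    for i in range(1, len(xs)):
        if xs[i] == xs[i - 1]:
            continue
        left, right = ys[:i], ys[i:]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best[0]:
            best = (sse, (xs[i - 1] + xs[i]) / 2, left.mean(), right.mean())
    _, thr, lv, rv = best
    return lambda q: np.where(q <= thr, lv, rv)


def bagged_predict(x, y, x_new, n_estimators=50, seed=0):
    """Average the predictions of stumps fit to bootstrap samples."""
    rng = np.random.default_rng(seed)
    x, y, x_new = (np.asarray(a, dtype=float) for a in (x, y, x_new))
    preds = []
    for _ in range(n_estimators):
        idx = rng.integers(0, len(x), size=len(x))  # bootstrap sample, with replacement
        model = fit_stump(x[idx], y[idx])
        preds.append(model(x_new))
    return np.mean(preds, axis=0)                   # regression: average over the trees


x = np.arange(1, 21)
y = np.where(x <= 10, 0.0, 10.0)  # hypothetical step-shaped target
pred = bagged_predict(x, y, [1, 20])
print(pred)
```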
Random Forest Regression (Analytics)
Create a random forest regression predictive model and evaluate it on the test set.
Test set validation results: generally good performance, but RMSE is somewhat poor.
According to the random forest regression, the Seoul apartment sales index is closely linked to financial variables such as M2 money supply and total household loans, suggesting that market liquidity affects the sales index.
XGB Regression (Theory)
XGB regression applies boosting, an ensemble method, to decision trees to improve prediction performance.
Boosting combines weak learners into an accurate, strong learner: a low-accuracy model is built first, its weaknesses (prediction errors) are compensated for by the next model, and so on, and the models are then combined. Each new model concentrates on what the previous models predicted poorly; in gradient boosting, on which XGBoost is based, each new tree is fit to the negative gradient of the loss, which for squared error is simply the residual.
(Figure source: YouTube, Udacity)
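The boosting idea can be sketched as least-squares gradient boosting: start from the mean, then repeatedly fit a one-split tree to the current residuals and add a shrunken copy of it. This is a simplified illustration, not XGBoost itself, which additionally uses second-order gradient information and regularizes the trees; the names, learning rate, and data are hypothetical.

```python
import numpy as np


def fit_stump(x, y):
    """Best single split on one feature (minimum summed squared error); falls back to the mean."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    best = (((ys - ys.mean()) ** 2).sum(), np.inf, ys.mean(), ys.mean())
    for i in range(1, len(xs)):
        if xs[i] == xs[i - 1]:
            continue
        left, right = ys[:i], ys[i:]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best[0]:
            best = (sse, (xs[i - 1] + xs[i]) / 2, left.mean(), right.mean())
    _, thr, lv, rv = best
    return lambda q: np.where(q <= thr, lv, rv)


def gradient_boost(x, y, n_rounds=100, learning_rate=0.1):
    """Each new stump is fit to the residuals of the ensemble built so far."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    base = y.mean()
    pred = np.full_like(y, base)
    stumps = []
    for _ in range(n_rounds):
        residual = y - pred                     # negative gradient of the squared-error loss
        stump = fit_stump(x, residual)
        pred = pred + learning_rate * stump(x)  # weak learner corrects previous errors
        stumps.append(stump)

    def predict(x_new):
        x_new = np.asarray(x_new, dtype=float)
        return base + learning_rate * sum(s(x_new) for s in stumps)

    return predict


x = np.arange(1, 21, dtype=float)
y = x ** 2  # hypothetical non-linear target
model = gradient_boost(x, y)
mse_boost = float(np.mean((model(x) - y) ** 2))
mse_base = float(np.mean((y - y.mean()) ** 2))
print(mse_boost, mse_base)
```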
XGB Regression (Analytics)
Create an XGB regression predictive model and evaluate it on the test set.
Test set validation results: best results across all metrics.
According to the XGB regression, the Seoul apartment sales index is influenced, in decreasing order of importance, by the Jeonse index, unsold inventory, permit volume, and sublease rate, suggesting that supply affects the sales index.
How are non-capital areas different?
Regress non-capital-area data to compare the influence of the variables.
Linear Regression: performance is generally acceptable. According to the linear regression, interest rates, the number of newlyweds, permit volume, and total household loans are not statistically significant for the non-capital apartment sales index.
Decision Tree Regression: R² is high, but MSE, RMSE, and MAE are all relatively poor. According to the decision tree regression, the non-capital apartment sales index is related to total household loans and the Jeonse index.
Defining the variables
- Non-capital areas: areas outside the capital region (Seoul, Gyeonggi, Incheon)
- Non-capital area data: sales index, Jeonse index, rental rate, rent conversion rate, permits, unsold inventory, homes 30 years or older, newlyweds
- National common data: total household debt, M2 money supply, base rate, mortgage interest rate
How are non-capital areas different? (continued)
Random Forest Regression: good results across all metrics. According to the random forest regression, the non-capital apartment sales index is related to M2 money supply, the Jeonse index, total household loans, and the rent conversion rate; it shows little relationship with interest rates or homes 30 years or older.
XGB Regression: best results across all metrics. According to the XGB regression, the non-capital apartment sales index is related to the Jeonse index, unsold inventory, and mortgage rates; it shows little relationship with interest rates or homes 30 years or older.
Time-series
Time Series Analytics & Forecasting

Auto ARIMA (Theory)
Autoregressive integrated moving average (ARIMA) is a model that combines the autoregressive (AR) model and the moving average (MA) model, applied to differenced data.
ARIMA assumes that the time series is stationary, so the data should be log-transformed if the variance is not constant and differenced if a trend or seasonality is present.
- AR: predicts the future value of a variable as a linear combination of its own past observations.
- MA: predicts the future using past forecast errors.
- ARIMA: the ARIMA(p,d,q) model combines an AR(p) model and an MA(q) model on data made stationary by differencing d times.
- Auto ARIMA: functions that automatically select the orders p, d, q and estimate the coefficients of an ARIMA model.
[Figure: logarithmic transformation and differencing for stationarity. Non-stationary series → log transformation (constant variance) → differencing (constant mean) → stationary series.]
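The deck does not say which Auto ARIMA implementation was used (pmdarima's `auto_arima` and statsmodels' `ARIMA` are common choices). As a NumPy-only illustration of the same pipeline (log transform, differencing, order selection, forecast, back-transform), this sketch fits an ARIMA(p,1,0) model, i.e. an AR(p) model on the differenced log series estimated by least squares, and picks p by AIC. It omits MA terms, so it is a simplification rather than full Auto ARIMA; names and data are hypothetical.

```python
import numpy as np


def fit_ar(z, p, start):
    """OLS fit of z[t] = c + a1*z[t-1] + ... + ap*z[t-p] + e[t] for t >= start."""
    rows = [z[t - p:t][::-1] for t in range(start, len(z))]  # lags, most recent first
    A = np.column_stack([np.ones(len(rows)), np.array(rows)])
    target = z[start:]
    params, *_ = np.linalg.lstsq(A, target, rcond=None)
    resid = target - A @ params
    return params, float(resid @ resid / len(resid))


def arima_p_1_0_forecast(series, horizon=24, max_p=3):
    """ARIMA(p,1,0) on log(series): pick p by AIC, forecast differences, undo the transforms."""
    y = np.log(np.asarray(series, dtype=float))  # log transform to stabilize the variance
    z = np.diff(y)                               # d = 1: first difference removes the trend
    n = len(z) - max_p                           # same estimation sample for every p
    best = None
    for p in range(1, max_p + 1):
        params, sigma2 = fit_ar(z, p, start=max_p)
        aic = n * np.log(sigma2) + 2 * (p + 1)   # Gaussian AIC up to a constant
        if best is None or aic < best[0]:
            best = (aic, p, params)
    _, p, params = best
    history = list(z)
    for _ in range(horizon):
        lags = history[-1:-p - 1:-1]             # last p differences, most recent first
        history.append(params[0] + float(np.dot(params[1:], lags)))
    future_diffs = np.array(history[len(z):])
    return np.exp(y[-1] + np.cumsum(future_diffs)), p


# Hypothetical monthly index with steady log-growth plus noise
rng = np.random.default_rng(2)
steps = 0.01 + 0.002 * rng.normal(size=150)
series = 100 * np.exp(np.cumsum(steps))
forecast, p = arima_p_1_0_forecast(series, horizon=24)
print(p, np.round(forecast[:3], 2))
```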
Auto ARIMA (Analytics)
Create an Auto ARIMA predictive model and forecast the next 24 months.
Influenced by the decline in the index over the last two months, the model forecasts a bearish market for the next 24 months.
Holt-Winters (Theory)
The Holt-Winters seasonal method consists of a forecast equation and three smoothing equations: one for the level $\ell_t$, one for the trend $b_t$, and one for the seasonal component $s_t$, with smoothing parameters $\alpha$, $\beta^*$, and $\gamma$, respectively. $m$ is the seasonal period: $m = 4$ for quarterly data and $m = 12$ for monthly data.
Additive method: the trend, seasonal, and irregular components are added together; it is typically used when the seasonal amplitude is roughly constant over time. In component form:
$$\hat{y}_{t+h|t} = \ell_t + h b_t + s_{t+h-m(k+1)}$$
$$\ell_t = \alpha(y_t - s_{t-m}) + (1-\alpha)(\ell_{t-1} + b_{t-1})$$
$$b_t = \beta^*(\ell_t - \ell_{t-1}) + (1-\beta^*)b_{t-1}$$
$$s_t = \gamma(y_t - \ell_{t-1} - b_{t-1}) + (1-\gamma)s_{t-m}$$
Here $k$ is the integer part of $(h-1)/m$, which ensures that the seasonal indices used for forecasting come from the last year of the sample.
The level equation is a weighted average of the seasonally adjusted observation $(y_t - s_{t-m})$ and the non-seasonal forecast $(\ell_{t-1} + b_{t-1})$ for time $t$. The seasonal equation is a weighted average of the current seasonal index $(y_t - \ell_{t-1} - b_{t-1})$ and the seasonal index of the same season in the previous year ($m$ periods earlier).
The seasonal equation is sometimes written as $s_t = \gamma^*(y_t - \ell_t) + (1-\gamma^*)s_{t-m}$. Substituting $\ell_t$ from the level equation above gives
$$s_t = \gamma^*(1-\alpha)(y_t - \ell_{t-1} - b_{t-1}) + [1-\gamma^*(1-\alpha)]s_{t-m}$$
which is the seasonal smoothing equation above with $\gamma = \gamma^*(1-\alpha)$. The usual parameter constraint $0 \le \gamma^* \le 1$ therefore becomes $0 \le \gamma \le 1-\alpha$.
Multiplicative method: the trend, seasonal, and irregular components are multiplied together; it is typically used when the seasonal amplitude gradually increases or decreases over time.
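The additive method's level, trend, and seasonal updates translate almost line by line into code. This sketch uses fixed smoothing parameters and a simple first-two-seasons initialization, both assumptions (libraries estimate these by optimization, e.g. statsmodels' `ExponentialSmoothing` with `trend="add", seasonal="add", seasonal_periods=12` for monthly data); the function name and toy data are hypothetical.

```python
import numpy as np


def holt_winters_additive(y, m, alpha, beta, gamma, horizon):
    """Additive Holt-Winters in component form; `beta` plays the role of beta*."""
    y = np.asarray(y, dtype=float)
    if len(y) < 2 * m:
        raise ValueError("need at least two full seasons to initialize")
    # Simple initialization (an assumption; libraries use other schemes)
    level = y[:m].mean()
    trend = (y[m:2 * m].mean() - y[:m].mean()) / m
    season = list(y[:m] - level)         # season[i] holds the seasonal index for time i
    for t in range(m, len(y)):
        s_prev = season[t - m]           # s_{t-m}
        new_level = alpha * (y[t] - s_prev) + (1 - alpha) * (level + trend)
        new_trend = beta * (new_level - level) + (1 - beta) * trend
        # seasonal update uses the previous level and trend (l_{t-1}, b_{t-1})
        season.append(gamma * (y[t] - level - trend) + (1 - gamma) * s_prev)
        level, trend = new_level, new_trend
    n = len(y)
    forecasts = []
    for h in range(1, horizon + 1):
        k = (h - 1) // m                 # integer part of (h - 1) / m
        forecasts.append(level + h * trend + season[n - 1 + h - m * (k + 1)])
    return np.array(forecasts)


# Hypothetical quarterly series: constant level 10 plus a fixed seasonal pattern
pattern = [1.0, -1.0, 2.0, -2.0]
y = [10 + pattern[t % 4] for t in range(48)]
fc = holt_winters_additive(y, m=4, alpha=0.3, beta=0.1, gamma=0.2, horizon=8)
print(np.round(fc, 6))
```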
Holt-Winters (Analytics)
Holt-Winters model forecasts for the next 24 months, by variable:
- Sales index: expect a sustained bull market.
- Number of newlyweds and real growth rate (forecast panels)
- Jeonse index, Jeonse rate (▼), and rent conversion rate: the Jeonse rate remains weak despite a rising Jeonse index and rent conversion rate ☞ Rising Jeonse index?
- Permit volume (▲), unsold inventory, and number of homes 30 years or older: unsold inventory increases slightly, but the number of homes 30 years or older increases significantly.
- Total household loans and M2 money supply vs. the base rate and mortgage interest rates: will increased liquidity collide with tightening through rate hikes?
Conclusions and limitations
1. Correlation
- Correlation analysis was used to check the correlations between variables, and CCF to identify time lags for weakly correlated variables.
2. Regression
- For the Seoul apartment sales index, the random forest and XGB regression models performed well.
- In the random forest regression, the index is most influenced by market liquidity variables such as M2 money supply and total household loans.
- In the XGB regression, it is most influenced by supply factors such as the Jeonse index, unsold inventory, permit volume, and sublease rate.
- The random forest and XGB regression models also performed best for the non-capital apartment sales index.
- In non-capital areas, the Jeonse index tends to have a greater influence than in Seoul.
- The influence of the base rate and of homes reaching 30 years of age is much lower than in Seoul.
3. Time series
- Auto ARIMA predicted a bearish market with low volatility.
- Holt-Winters, on the other hand, predicted a sustained bull market.
- Because the variables in the model are changing rapidly, more data must be accumulated to refine the model.
Limitations
1. Insufficient data volume and delays in updating recent data
2. Variables have different collection periods, so adding variables shrinks the analyzable time range to their intersection
3. Limited support for analytical models in Brightics AI ☞ compensated with Python
Sources
Reference papers
- Bo Geun Cho, Kyung Bae Park, Sung Ho Ha, "Comparison of Apartment Transaction Price Index Prediction Models by Region Using Machine Learning Algorithms: Verification of LIME Analysis," Korean Society of Information Systems, September 2020.
- Sungwan Bae, Jungseok Yoo, "Estimating apartment prices using machine learning: The case of Gangnam-gu, Seoul," Journal of Real Estate Research, Vol. 24, No. 1, March 2018.
Data sources
- Statistics Korea (KOSIS) housing statistics: https://kosis.kr/index/index.do
- Ministry of Land, Infrastructure and Transport statistics: https://stat.molit.go.kr/
- Bank of Korea Economic Statistics System: https://ecos.bok.or.kr/
- Korea Housing Finance Corporation (HF) Housing Finance Statistics System: https://www.hf.go.kr/research/portal/stat/
- Korea Real Estate Agency real estate statistics: https://www.reb.or.kr/r-one/
- e-National Indicators: https://www.index.go.kr/potal/main/
Analytic model theory
- Brightics AI analysis model descriptions (correlation, etc.): https://datadoctorblog.com/
- Regression model performance evaluation: https://inistory.tistory.com/111
- Cross-correlation: https://brique-analytics.tistory.com/23
- Stationarity and differencing: https://otexts.com/fppkr/stationarity.html
- Linear regression: https://soohee410.github.io/stat4
- Bagging: https://www.youtube.com/watch?v=sVriC_Ys2cw
- Boosting: https://www.youtube.com/watch?v=GM3CDQfQ4sw
- ARIMA model: https://leedakyeong.tistory.com/
- Holt-Winters technique: https://otexts.com/fppkr/holt-winters.html
Thank you
The 2nd KAIST DFMP CBA
Kyungrok Park
Should you have further questions, don’t hesitate to contact me: jarvis@krx.co.kr
Copyrightⓒ 2022. Kyungrok Park. All rights reserved.

More Related Content

Similar to [KAIST DFMP CBA] Analyze price determinants and forecast Seoul apartment prices_Kyungrok Park.pdf

The future is uncertain. Some events do have a very small probabil.docx
The future is uncertain. Some events do have a very small probabil.docxThe future is uncertain. Some events do have a very small probabil.docx
The future is uncertain. Some events do have a very small probabil.docx
oreo10
 
Linear functions and modeling
Linear functions and modelingLinear functions and modeling
Linear functions and modeling
IVY SOLIS
 

Similar to [KAIST DFMP CBA] Analyze price determinants and forecast Seoul apartment prices_Kyungrok Park.pdf (20)

Qt unit i
Qt unit   iQt unit   i
Qt unit i
 
Data Science - Part IV - Regression Analysis & ANOVA
Data Science - Part IV - Regression Analysis & ANOVAData Science - Part IV - Regression Analysis & ANOVA
Data Science - Part IV - Regression Analysis & ANOVA
 
A Critique of Factor Analysis of Interest Rates
A Critique of Factor Analysis of Interest RatesA Critique of Factor Analysis of Interest Rates
A Critique of Factor Analysis of Interest Rates
 
Modelling Inflation using Generalized Additive Mixed Models (GAMM)
Modelling Inflation using Generalized Additive Mixed Models (GAMM)Modelling Inflation using Generalized Additive Mixed Models (GAMM)
Modelling Inflation using Generalized Additive Mixed Models (GAMM)
 
Managerial Economics (Chapter 5 - Demand Estimation)
 Managerial Economics (Chapter 5 - Demand Estimation) Managerial Economics (Chapter 5 - Demand Estimation)
Managerial Economics (Chapter 5 - Demand Estimation)
 
bhagat.pdf
bhagat.pdfbhagat.pdf
bhagat.pdf
 
Multiple Linear Regression Applications in Real Estate Pricing
Multiple Linear Regression Applications in Real Estate PricingMultiple Linear Regression Applications in Real Estate Pricing
Multiple Linear Regression Applications in Real Estate Pricing
 
Multiple Linear Regression Applications in Real Estate Pricing
Multiple Linear Regression Applications in Real Estate PricingMultiple Linear Regression Applications in Real Estate Pricing
Multiple Linear Regression Applications in Real Estate Pricing
 
Prediction of house price using multiple regression
Prediction of house price using multiple regressionPrediction of house price using multiple regression
Prediction of house price using multiple regression
 
Demand forcasting
Demand forcastingDemand forcasting
Demand forcasting
 
Presentation 4
Presentation 4Presentation 4
Presentation 4
 
Research on the Trading Strategy Based On Interest Rate Term Structure Change...
Research on the Trading Strategy Based On Interest Rate Term Structure Change...Research on the Trading Strategy Based On Interest Rate Term Structure Change...
Research on the Trading Strategy Based On Interest Rate Term Structure Change...
 
Forecasting Methodology Used in Restructured Electricity Market: A Review
Forecasting Methodology Used in Restructured Electricity Market: A ReviewForecasting Methodology Used in Restructured Electricity Market: A Review
Forecasting Methodology Used in Restructured Electricity Market: A Review
 
Principal Component Analysis and Clustering
Principal Component Analysis and ClusteringPrincipal Component Analysis and Clustering
Principal Component Analysis and Clustering
 
PERFORMANCE ANALYSIS OF HYBRID FORECASTING MODEL IN STOCK MARKET FORECASTING
PERFORMANCE ANALYSIS OF HYBRID FORECASTING MODEL IN STOCK MARKET FORECASTINGPERFORMANCE ANALYSIS OF HYBRID FORECASTING MODEL IN STOCK MARKET FORECASTING
PERFORMANCE ANALYSIS OF HYBRID FORECASTING MODEL IN STOCK MARKET FORECASTING
 
The future is uncertain. Some events do have a very small probabil.docx
The future is uncertain. Some events do have a very small probabil.docxThe future is uncertain. Some events do have a very small probabil.docx
The future is uncertain. Some events do have a very small probabil.docx
 
Real Estate Investment Advising Using Machine Learning
Real Estate Investment Advising Using Machine LearningReal Estate Investment Advising Using Machine Learning
Real Estate Investment Advising Using Machine Learning
 
House Price Prediction Using Machine Learning
House Price Prediction Using Machine LearningHouse Price Prediction Using Machine Learning
House Price Prediction Using Machine Learning
 
Linear functions and modeling
Linear functions and modelingLinear functions and modeling
Linear functions and modeling
 
Multiple Regression.ppt
Multiple Regression.pptMultiple Regression.ppt
Multiple Regression.ppt
 

Recently uploaded

Land as a Resource for urban finanace- 24-1-23.ppt
Land as a Resource  for urban finanace- 24-1-23.pptLand as a Resource  for urban finanace- 24-1-23.ppt
Land as a Resource for urban finanace- 24-1-23.ppt
JIT KUMAR GUPTA
 
Acibadem Konaklari Uskudar - Listin Turkey
Acibadem Konaklari Uskudar - Listin TurkeyAcibadem Konaklari Uskudar - Listin Turkey
Acibadem Konaklari Uskudar - Listin Turkey
Listing Turkey
 
MEQ Mainstreet Equity Corp Q2 2024 Investor Presentation
MEQ Mainstreet Equity Corp Q2 2024 Investor PresentationMEQ Mainstreet Equity Corp Q2 2024 Investor Presentation
MEQ Mainstreet Equity Corp Q2 2024 Investor Presentation
MEQ - Mainstreet Equity Corp.
 
Mtp kit Available in Kuwait City +919101817206)) Get Mifty kit in Kuwait City
Mtp kit Available in Kuwait City +919101817206)) Get Mifty kit in Kuwait CityMtp kit Available in Kuwait City +919101817206)) Get Mifty kit in Kuwait City
Mtp kit Available in Kuwait City +919101817206)) Get Mifty kit in Kuwait City
ahmedjiabur940
 
Listing Turkey - 2024 - May Featured Portfolio
Listing Turkey - 2024 - May Featured PortfolioListing Turkey - 2024 - May Featured Portfolio
Listing Turkey - 2024 - May Featured Portfolio
Listing Turkey
 

Recently uploaded (20)

M3M Sector 72 Noida E-Brochure.pdf NEW.
M3M Sector 72 Noida  E-Brochure.pdf NEW.M3M Sector 72 Noida  E-Brochure.pdf NEW.
M3M Sector 72 Noida E-Brochure.pdf NEW.
 
Improvise, Adapt, Overcome - Sales Meeting, May '24
Improvise, Adapt, Overcome - Sales Meeting, May '24Improvise, Adapt, Overcome - Sales Meeting, May '24
Improvise, Adapt, Overcome - Sales Meeting, May '24
 
San Francisco Market Update -February 2024
San Francisco Market Update -February 2024San Francisco Market Update -February 2024
San Francisco Market Update -February 2024
 
Land as a Resource for urban finanace- 24-1-23.ppt
Land as a Resource  for urban finanace- 24-1-23.pptLand as a Resource  for urban finanace- 24-1-23.ppt
Land as a Resource for urban finanace- 24-1-23.ppt
 
Explore Dual Citizenship in Africa | Citizenship Benefits & Requirements
Explore Dual Citizenship in Africa | Citizenship Benefits & RequirementsExplore Dual Citizenship in Africa | Citizenship Benefits & Requirements
Explore Dual Citizenship in Africa | Citizenship Benefits & Requirements
 
Acibadem Konaklari Uskudar - Listin Turkey
Acibadem Konaklari Uskudar - Listin TurkeyAcibadem Konaklari Uskudar - Listin Turkey
Acibadem Konaklari Uskudar - Listin Turkey
 
Yashone Eternitee Mann-Hinjawadi Pune | E-Brochure
Yashone Eternitee Mann-Hinjawadi Pune | E-BrochureYashone Eternitee Mann-Hinjawadi Pune | E-Brochure
Yashone Eternitee Mann-Hinjawadi Pune | E-Brochure
 
MEQ Mainstreet Equity Corp Q2 2024 Investor Presentation
MEQ Mainstreet Equity Corp Q2 2024 Investor PresentationMEQ Mainstreet Equity Corp Q2 2024 Investor Presentation
MEQ Mainstreet Equity Corp Q2 2024 Investor Presentation
 
construction material procurement in India
construction material procurement in Indiaconstruction material procurement in India
construction material procurement in India
 
Lodha Baner Flat In Pune E-Brochure.pdf
Lodha Baner Flat In Pune  E-Brochure.pdfLodha Baner Flat In Pune  E-Brochure.pdf
Lodha Baner Flat In Pune E-Brochure.pdf
 
Dholera A Blueprint for Future Cities Massive Infrastructure Investments Tran...
Dholera A Blueprint for Future Cities Massive Infrastructure Investments Tran...Dholera A Blueprint for Future Cities Massive Infrastructure Investments Tran...
Dholera A Blueprint for Future Cities Massive Infrastructure Investments Tran...
 
Mtp kit Available in Kuwait City +919101817206)) Get Mifty kit in Kuwait City
Mtp kit Available in Kuwait City +919101817206)) Get Mifty kit in Kuwait CityMtp kit Available in Kuwait City +919101817206)) Get Mifty kit in Kuwait City
Mtp kit Available in Kuwait City +919101817206)) Get Mifty kit in Kuwait City
 
Kohinoor Courtyard One Wakad Pune | Elegant Living Spaces
Kohinoor Courtyard One Wakad Pune | Elegant Living SpacesKohinoor Courtyard One Wakad Pune | Elegant Living Spaces
Kohinoor Courtyard One Wakad Pune | Elegant Living Spaces
 
Unveiling the Veil: The Top Challenges with Estate Agents?
Unveiling the Veil: The Top Challenges with Estate Agents?Unveiling the Veil: The Top Challenges with Estate Agents?
Unveiling the Veil: The Top Challenges with Estate Agents?
 
Yedi Mavi TOBB Zeytinburnu - Listing Turkey
Yedi Mavi TOBB Zeytinburnu - Listing TurkeyYedi Mavi TOBB Zeytinburnu - Listing Turkey
Yedi Mavi TOBB Zeytinburnu - Listing Turkey
 
Retail Space for Lease - 1221 W. Main St., Sun Prairie, WI
Retail Space for Lease - 1221 W. Main St., Sun Prairie, WIRetail Space for Lease - 1221 W. Main St., Sun Prairie, WI
Retail Space for Lease - 1221 W. Main St., Sun Prairie, WI
 
Unique NIBM Flat In Pune E-Brochure.pdf
Unique NIBM Flat In Pune  E-Brochure.pdfUnique NIBM Flat In Pune  E-Brochure.pdf
Unique NIBM Flat In Pune E-Brochure.pdf
 
Listing Turkey - 2024 - May Featured Portfolio
Listing Turkey - 2024 - May Featured PortfolioListing Turkey - 2024 - May Featured Portfolio
Listing Turkey - 2024 - May Featured Portfolio
 
Dynamic Grandeur Undri Pune | A Space For You To Find Your Space
Dynamic Grandeur Undri Pune | A Space For You To Find Your SpaceDynamic Grandeur Undri Pune | A Space For You To Find Your Space
Dynamic Grandeur Undri Pune | A Space For You To Find Your Space
 
Are You Thinking About Selling Your House Soon? | KM Realty Group LLC
Are You Thinking About Selling Your House Soon?  | KM Realty Group LLCAre You Thinking About Selling Your House Soon?  | KM Realty Group LLC
Are You Thinking About Selling Your House Soon? | KM Realty Group LLC
 

[KAIST DFMP CBA] Analyze price determinants and forecast Seoul apartment prices_Kyungrok Park.pdf

  • 1. Analyze price determinants and forecast Seoul apartment prices (Year of 2022) The 2nd KAIST Digital Finance Mastership Program: Cloud computing and Bigdata Analysis Kyungrok Park
  • 2. Analysis purpose
    Check the influence of the various factors that determine the Seoul apartment sales index, and predict future sales price trends.
    Analyzable? Get data: choose a topic whose data is accessible for analysis.
    Generate interest: analyze apartment prices, which have been fluctuating rapidly, to engage the audience.
    Align with learning: make the most of what was learned in the KAIST CBA course.
  • 3. Examine existing research
    Study 1 (Cho, Park, and Ha, 2020): predicts apartment prices by region using 8 explanatory variables: Consumer Price Index, term deposit rate, money supply (M2), Apartment Transaction Price Index, Index, apartment sales status, mortgage interest rate, and real estate search index.
    - Among ARIMA, Random Forest, and LSTM, the LSTM model was the most accurate; the machine learning models predicted better than the traditional time series model.
    - After selecting a predictive model, the study used the LIME algorithm to identify the variables that explain the major inflection points in prices.
    - Data from 4 months earlier had no effect on price changes; data from 1-2 months earlier were the main explanatory variables for apartment price forecasting.
    - If the mortgage interest rate falls and the time deposit rate rises, a price drop can be expected in about two months.
    - In Seoul and Busan, the real estate search index (Google Trends) was a key explanatory variable.
    Study 2 (Bae and Yoo, 2018): rather than comparing the predictive power of different models, verifies whether machine learning can be applied to official housing price assessment by analyzing the gap from actual transaction prices.
    - Analyzed 4,791 apartment transactions in Gangnam-gu, Seoul, from January 1 to December 31, 2016, excluding outliers.
    - Explanatory variables include floor, land area, exclusive-use area, building age, distance to a subway station, number of units, transaction month, and building.
    - 70% of all cases were randomly assigned to training data and 30% to test data.
    - The machine learning methods SVM, RF, GBRT, and DNN predicted better than multiple regression analysis (MRA).
    - Given the growing budget for the public announcement of multi-family housing prices, estimating those prices with machine learning is expected to improve work efficiency.
  • 4. Data to analyze
    Collected data on 13 variables related to Seoul apartments in each area, then preprocessed them.
    Sources: Statistics Korea, Ministry of Land, Infrastructure, and Transport, Bank of Korea, Housing Finance Corporation, Korea Real Estate Agency, e-National Indicators
    Data range: January 2006 to April 2022 (198 months total)
    Dwelling type data
      apt_prc: sales (trading) index
      lease_prc: Jeonse* index
      lease_rto: Jeonse rate
      rent_trans_rto: rent conversion rate
    Supply data
      approved_home: permit (authorized) volume, 60 m² or more
      unsold_home: unsold volume
      year30_home: homes 30 years old (eligible for redevelopment or rebuilding)
    Financial data
      family_tot: total household debt
      M2_mon: M2 money supply
      base_int: base rate
      loan_int: mortgage rate
    Other data
      date (index): date, monthly
      marriage: newlyweds (new demand)
      growth: real economic growth
    Why indices?
    - To understand macro market trends
    - They are the statistics accumulated over the longest period without missing data
    - Regulation distorts the market, for example by increasing special-case transactions
    - With reduced trading volume, individual listings could be over-represented in the sample
    * Jeonse: a rental arrangement unique to South Korea in which the tenant pays a large deposit instead of monthly rent. The deposit is returned at the end of the lease, usually after two years. It can be a significant percentage of the property's value, giving the landlord a lump sum to invest.
  • 5. Analysis methods
    1. Correlation: analyze the correlation between the apartment sales index and each variable, and select variables.
    2. Regression: build regression models with high explanatory power and examine the impact of each variable.
    3. Time series: predict future apartment sales index trends with time series analysis.
    4. Tools: data modeling and forecasting with Python and Brightics AI, using Google collaboration tools (Drive, Colab, Sheets, and Slides).
  • 8. Correlation (theory)
    Correlation is a descriptive analysis technique that determines whether a linear relationship exists between two variables and, if so, how strong it is. Pearson's correlation coefficient is typically used. However, the technique should match the nature of the data: use Spearman or Kendall for ordinal variables.
    Pearson's correlation coefficient measures the strength of a linear relationship. It applies when both variables are continuous quantitative variables. It is the covariance divided by the product of the two standard deviations, and its square equals the coefficient of determination (R²) of a simple linear regression. It ranges from -1 to 1: a positive sign indicates a positive correlation, and a negative sign indicates a negative correlation. The slide figure shows scatter plots for different values of r: the closer the points lie to a straight line, the closer r is to 1 or -1.
    $$r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}}$$
    Test statistic: $t = r\sqrt{n-2}/\sqrt{1-r^2}$. The p-value is $2 \times P(T > |t|)$, where $T$ follows a t distribution with $n-2$ degrees of freedom.
    Null hypothesis ($H_0$): there is no linear relationship between the two variables ($\rho = 0$).
    Alternative hypothesis ($H_1$): there is a linear relationship between the two variables ($\rho \neq 0$).
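The formula and test statistic can be sketched in plain Python. This is a minimal sketch assuming two equal-length numeric lists; the helper names `pearson_r` and `t_statistic` are illustrative only. It stops at the t statistic, because the p-value needs the CDF of the t distribution, which the standard library does not provide. SciPy's `scipy.stats.pearsonr` returns both r and the p-value.

```python
import math


def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of the standard deviations."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("x and y need the same length, with at least 3 points")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def t_statistic(r, n):
    """Statistic for H0: rho = 0; t-distributed with n - 2 degrees of freedom under H0."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
r = pearson_r(x, y)
t = t_statistic(r, len(x))
print(round(r, 3), round(t, 3))  # 0.8 2.309
```

The statistic is undefined when r is exactly ±1, because its denominator becomes zero.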
  • 9. Correlation (theory)
    Cross-correlation function (CCF): a signal processing technique that overlays two time series to analyze how similar their shapes are and how far apart in time they are. It analyzes correlations and time lags between time series that satisfy stationarity. A series that is not stationary must first be made stationary through differencing or a logarithmic transformation. (Slide figure: series y closely matches series x shifted by 4 periods, so the two are highly similar at a lag of 4.)
    Differencing: computes the difference between consecutive observations to remove changes in the level of a time series and stabilize its mean. As a result, trend and seasonality are removed or reduced.
    First difference: $y'_t = y_t - y_{t-1}$
    Second-order difference: $y''_t = y'_t - y'_{t-1} = y_t - 2y_{t-1} + y_{t-2}$
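To illustrate differencing followed by a lag search, the plain-Python sketch below differences two non-stationary random walks and scans lags 0 to 8 for the highest cross-correlation. Several choices here are assumptions for illustration: the lag convention (correlation between x_t and y_{t+lag}, so a positive lag means x leads y), the simulated data, and the helper names. Libraries differ in how they sign and normalize the lag.

```python
import random
from itertools import accumulate


def difference(series, order=1):
    """Apply the first difference y'_t = y_t - y_{t-1} `order` times."""
    for _ in range(order):
        series = [b - a for a, b in zip(series, series[1:])]
    return series


def correlation(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def cross_correlation(x, y, lag):
    """Correlation between x_t and y_{t+lag} for lag >= 0 (x leads y by `lag`)."""
    return correlation(x[: len(x) - lag], y[lag:])


rng = random.Random(0)
x_level = list(accumulate(rng.gauss(0, 1) for _ in range(120)))  # random walk: non-stationary
y_level = [0.0] * 4 + x_level[:-4]                               # y repeats x 4 periods later

dx, dy = difference(x_level), difference(y_level)  # first differences are stationary here
ccf = {lag: cross_correlation(dx, dy, lag) for lag in range(9)}
best_lag = max(ccf, key=ccf.get)
print(best_lag)  # 4
```

The permit-volume analysis used a second difference (`order=2`) and a much longer lag range. The toy data here only shows the mechanics.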
  • 10. Correlation (analytics)
    Permit volume shows no significant positive or negative correlation with the apartment sales index or with the other variables. Without knowing the nature of the variable, it is easy to conclude that the data are irrelevant.
    However, permit volume leads to actual occupancy, which in turn affects prices, so its effect appears with a time lag. When there is a time lag, ordinary same-period correlation analysis is not meaningful.
    CCF analysis on second-differenced data confirmed that a decrease in permit volume leads to an increase in the Seoul apartment sales index about 6 years later.
  • 11. Correlation (analytics): see the correlations between variables.
  • 13. Regression: split the data into a training set and a test set at a ratio of 85:15.
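The ratio alone leaves open whether the split was random or chronological. The minimal sketch below supports both, using a hypothetical `split_train_test` helper. Defaulting to a chronological split (holding out the most recent 15% of months) is an assumption; the slide does not say which was used. scikit-learn's `train_test_split(..., test_size=0.15, shuffle=False)` behaves similarly.

```python
import random


def split_train_test(rows, test_ratio=0.15, shuffle=False, seed=0):
    """Split rows into (train, test).

    shuffle=False keeps time order and holds out the most recent rows, so the
    model is never trained on months that come after the test period.
    shuffle=True gives a random split instead.
    """
    rows = list(rows)
    if shuffle:
        random.Random(seed).shuffle(rows)
    n_test = round(len(rows) * test_ratio)
    n_train = len(rows) - n_test
    return rows[:n_train], rows[n_train:]


months = list(range(198))  # stand-in for the 198 monthly observations
train, test = split_train_test(months)
print(len(train), len(test))  # 168 30
```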
  • 14. Regression (theory): performance evaluation metrics for regression models
    - MSE: the mean of the squared errors (differences between predicted and actual values); the key loss function for regression models. Because errors are squared, it is sensitive to outliers. $\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i-\hat{y}_i)^2$. Closer to 0 is better.
    - MAE: the mean of the absolute errors; less sensitive to outliers than MSE. $\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}|y_i-\hat{y}_i|$. Closer to 0 is better.
    - RMSE: the square root of MSE. Converting the error back to the units of the actual values makes it easier to interpret. $\mathrm{RMSE} = \sqrt{\mathrm{MSE}}$. Closer to 0 is better.
    - R-squared: the proportion of the variance in the actual values explained by the predictions. It allows relative performance to be judged intuitively regardless of the scale of the data. $R^2 = 1 - \frac{\sum_i (y_i-\hat{y}_i)^2}{\sum_i (y_i-\bar{y})^2}$. Closer to 1 is better.
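The four metrics can be written directly in plain Python. This is a minimal sketch: `regression_metrics` is a hypothetical helper, and the sample values are made up for illustration rather than taken from the analysis.

```python
import math


def regression_metrics(actual, predicted):
    """Return MSE, MAE, RMSE, and R^2 for paired actual/predicted values."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    sse = sum(e ** 2 for e in errors)            # sum of squared errors
    mse = sse / n
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(mse)                        # back in the units of the data
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1 - sse / ss_tot                        # share of variance explained
    return {"MSE": mse, "MAE": mae, "RMSE": rmse, "R2": r2}


m = regression_metrics([3, -0.5, 2, 7], [2.5, 0.0, 2, 8])
print({k: round(v, 4) for k, v in m.items()})
# {'MSE': 0.375, 'MAE': 0.5, 'RMSE': 0.6124, 'R2': 0.9486}
```

The example reproduces the pattern seen in the results slides: R² can be high while the absolute error metrics still look weak, because R² is relative to the variance of the data.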
  • 15. Linear Regression (theory)
    Linear regression estimates the relationship between one continuous dependent variable and one or more independent variables (two or more in multiple regression). It then uses the fitted model to predict the value of the dependent variable. Key outputs of a regression table:
    - coef (coefficient): the estimated slope for each variable. Because it depends on the unit of measurement, its size alone does not indicate the strength of the effect.
    - std_err (standard error): the standard error of the coefficient estimate. The smaller it is, the more precisely the coefficient is estimated.
    - t-statistic: the estimated coefficient divided by its standard error. The larger its absolute value, the smaller the p-value and the more likely the null hypothesis is rejected.
    - p-value: indicates whether the relationship between the variables is statistically significant. The smaller it is, the more likely the null hypothesis is rejected in favor of the alternative hypothesis.
    Null hypothesis ($H_0$): there is no linear relationship between the variables (coefficient = 0).
    Alternative hypothesis ($H_1$): there is a linear relationship between the variables (coefficient ≠ 0).
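To show where coef, std_err, and t come from in an ordinary least squares fit, here is a minimal NumPy sketch. It assumes an intercept term and homoskedastic errors; `ols_summary` and the simulated data are hypothetical. p-values would additionally need the t distribution with n − k degrees of freedom. statsmodels' `OLS(...).fit().summary()` reports coef, std err, t, and P>|t| columns.

```python
import numpy as np


def ols_summary(X, y):
    """OLS with an intercept; returns coef, std_err, and t statistic for each term."""
    X = np.column_stack([np.ones(len(X)), X])   # prepend the intercept column
    n, k = X.shape
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ coef
    sigma2 = residuals @ residuals / (n - k)     # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)        # covariance matrix of the estimates
    std_err = np.sqrt(np.diag(cov))
    t_stat = coef / std_err                      # estimate / standard error
    return coef, std_err, t_stat


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(scale=0.1, size=200)
coef, std_err, t_stat = ols_summary(X, y)
print(np.round(coef, 2), np.round(t_stat, 1))
```

A coefficient with a large |t| (a small standard error relative to its size) is the case in which the null hypothesis above gets rejected.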
  • 16. Linear Regression (analytics)
    Built a linear regression prediction model and evaluated it on the test set: R² is high, but MSE, RMSE, and MAE are all weak.
    According to the linear regression, some variables have a significant relationship with the Seoul apartment sales index, but total household loans, real economic growth, and the number of newlyweds are not statistically significant.
  • 17. Decision Tree Regression (theory)
    A decision tree makes predictions from rules. The training data are partitioned step by step using the independent variables and various splitting criteria. The visualized model is intuitive and much easier to interpret than other machine learning models. The regression model is built from if-then-else rules, and a regression tree branches on a continuous target variable.
    Pros:
    - Results are visualized as a tree structure that is easy to understand and interpret
    - Provides information about the explanatory variables (importance, interactions)
    - Non-parametric; less sensitive to the type and scale of variables and to outliers
    - Flexible in handling missing or raw data
    Cons:
    - Risk of overfitting, addressed with pruning and ensemble methods (bagging, boosting)
    - Does not guarantee an optimal tree
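The core step of a regression tree is finding the single split that minimizes squared error, and it fits in a few lines of plain Python. This is a minimal one-feature sketch with a hypothetical `best_split` helper. A full tree repeats the step recursively on each side, over every feature, until a stopping rule such as a maximum depth is reached. Pruning is not shown.

```python
def best_split(x, y):
    """Best single threshold on one feature for a regression tree.

    Each side predicts its own mean; the threshold with the smallest total
    squared error (SSE) wins. Returns (sse, threshold, left_mean, right_mean),
    or None when all x values are equal and no split exists.
    """
    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal feature values
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        mean_l, mean_r = sum(left) / len(left), sum(right) / len(right)
        sse = sum((v - mean_l) ** 2 for v in left) + sum((v - mean_r) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, (pairs[i - 1][0] + pairs[i][0]) / 2, mean_l, mean_r)
    return best


sse, threshold, left_mean, right_mean = best_split([1, 2, 3, 10, 11, 12], [1, 1, 1, 5, 5, 5])
print(threshold, left_mean, right_mean)  # 6.5 1.0 5.0
```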
  • 18. Decision Tree Regression (analytics)
    Built a decision tree regression prediction model and evaluated it on the test set: R² is high, but RMSE is somewhat weak.
    According to the decision tree regression, the Seoul apartment sales index is closely related to the total amount of household loans.
  • 19. Random Forest Regression (theory)
    Random forest compensates for the overfitting of a single decision tree: it generates many trees and generalizes them with bagging, an ensemble method. It outputs the class chosen by most of the trees built during training (classification) or the average of their predictions (regression). In classification, each tree casts a vote and the class with the most votes is selected. (Source: YouTube, Udacity)
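The sketch below shows only the bagging half of a random forest. Each model is fit on a bootstrap sample, and the regression predictions are averaged (the regression counterpart of voting). It is a simplification for illustration: a real random forest grows deeper trees and also samples a random subset of features at each split. The one-split trees (stumps), the toy data, and the helper names are hypothetical.

```python
import random


def fit_stump(x, y):
    """One-split regression tree; falls back to the overall mean if no split exists."""
    pairs = sorted(zip(x, y))
    mean_all = sum(y) / len(y)
    best = (sum((v - mean_all) ** 2 for v in y), float("inf"), mean_all, mean_all)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal feature values
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
        if sse < best[0]:
            best = (sse, (pairs[i - 1][0] + pairs[i][0]) / 2, ml, mr)
    _, threshold, ml, mr = best
    return lambda q: ml if q <= threshold else mr


def bagged_stumps(x, y, n_models=50, seed=0):
    """Fit each stump on a bootstrap sample and average their predictions."""
    rng = random.Random(seed)
    n = len(x)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample n rows with replacement
        models.append(fit_stump([x[i] for i in idx], [y[i] for i in idx]))
    return lambda q: sum(m(q) for m in models) / len(models)


x = list(range(20))
y = [1.0 if v < 10 else 5.0 for v in x]
predict = bagged_stumps(x, y)
print(round(predict(1.5), 2), round(predict(18.5), 2))
```

Because each bootstrap sample sees slightly different data, the individual stumps disagree near the boundary. Averaging them smooths out that variance, which is the overfitting remedy the slide describes.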
  • 20. Random Forest Regression (analytics)
    Built a random forest regression prediction model and evaluated it on the test set: performance is generally good, but RMSE is a bit weak.
    According to the random forest regression, the Seoul apartment sales index is closely linked to financial variables such as M2 money supply and total household loans. The conclusion is that market liquidity affects the sales index.
  • 21. XGB Regression (theory)
    XGB applies boosting, an ensemble method, to decision trees to improve prediction performance. Each new model increases accuracy by placing more weight on the data that previous models predicted incorrectly, so weak learners are combined into an accurate, strong learner. A low-accuracy model is built first, its weaknesses (prediction errors) are compensated for by the next model, and so on, and the models are then combined. (Source: YouTube, Udacity)
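For squared error, "compensating for the previous model's errors" amounts to fitting each new weak learner to the current residuals. The minimal plain-Python sketch below shows this with stumps and a learning rate. It is plain gradient boosting, not XGBoost's algorithm, which adds a regularized objective and a second-order (gradient and Hessian) approximation. The helper names and the three-level toy target are hypothetical.

```python
def fit_stump(x, y):
    """One-split regression tree; falls back to the overall mean if no split exists."""
    pairs = sorted(zip(x, y))
    mean_all = sum(y) / len(y)
    best = (sum((v - mean_all) ** 2 for v in y), float("inf"), mean_all, mean_all)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
        if sse < best[0]:
            best = (sse, (pairs[i - 1][0] + pairs[i][0]) / 2, ml, mr)
    _, threshold, ml, mr = best
    return lambda q: ml if q <= threshold else mr


def gradient_boost(x, y, n_rounds=200, learning_rate=0.1):
    """Squared-error gradient boosting with stumps; returns (predict, mse_history)."""
    base = sum(y) / len(y)                          # start from the mean
    pred = [base] * len(y)
    stumps = []
    history = [sum((t - p) ** 2 for t, p in zip(y, pred)) / len(y)]
    for _ in range(n_rounds):
        residuals = [t - p for t, p in zip(y, pred)]  # what the ensemble still gets wrong
        stump = fit_stump(x, residuals)                # next weak learner targets those errors
        stumps.append(stump)
        pred = [p + learning_rate * stump(v) for p, v in zip(pred, x)]
        history.append(sum((t - p) ** 2 for t, p in zip(y, pred)) / len(y))

    def predict(q):
        return base + learning_rate * sum(s(q) for s in stumps)

    return predict, history


x = list(range(20))
y = [1.0 if v < 7 else 3.0 if v < 14 else 6.0 for v in x]
predict, history = gradient_boost(x, y)
print(round(history[0], 2), round(history[-1], 4))
```

No single stump can represent the three-level target, but the sum of many small, shrunken stumps can. This is the sense in which weak learners combine into a strong one.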
  • 22. XGB Regression (analytics)
    Built an XGB regression prediction model and evaluated it on the test set: it gives the best results across all metrics.
    According to the XGB regression, the Seoul apartment sales index is affected by the Jeonse index, unsold inventory, permit volume, and sublease rates, in that order. The conclusion is that supply affects the sales index.
  • 23. How are non-capital areas different? (analytics)
    Regressed data from non-capital areas to compare the impact of the variables.
    - Linear regression: the non-capital apartment sales index is related to interest rates, the number of newlyweds, and permit volume; total household loans are not statistically significant. Performance is generally acceptable.
    - Decision tree regression: the non-capital apartment sales index is related to total household loans and the Jeonse index. R² is high, but MSE, RMSE, and MAE are all weak.
    Defining the variables:
    - Non-capital areas: areas outside the metropolitan area (Seoul, Gyeonggi, Incheon)
    - Non-capital area data: sales index, Jeonse index, rental rate, rent conversion rate, permits, unsold inventory, 30-year-old homes, newlyweds
    - National common data: gross household debt, M2 money supply, base rate, mortgage interest rate
  • 24. How are non-capital areas different? (analytics)
    - Random forest regression: good results across all metrics. The non-capital apartment sales index is related to M2 money supply, the Jeonse index, gross household lending, and the rent conversion rate; it has nothing to do with interest rates or 30-year-old homes.
    - XGB regression: best results across all metrics. The non-capital apartment sales index is related to the Jeonse index, unsold volume, and mortgage rates; it has nothing to do with interest rates or 30-year-old homes.
  • 26. Auto ARIMA (theory)
    The Autoregressive Integrated Moving Average (ARIMA) model combines the autoregressive (AR) model and the moving average (MA) model. ARIMA assumes a stationary time series. The data must therefore be log-transformed if the variance is not constant, and differenced if trend or seasonality is present.
    - AR: predicts the future value of a variable as a linear combination of its own past observations: $y_t = c + \phi_1 y_{t-1} + \dots + \phi_p y_{t-p} + \varepsilon_t$
    - MA: uses past forecast errors to predict the future: $y_t = c + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \dots + \theta_q \varepsilon_{t-q}$
    - ARIMA(p,d,q): combines an AR(p) model and an MA(q) model, applied to data made stationary by differencing d times.
    - Auto ARIMA: functions that automatically estimate the orders p, d, q and the coefficients of an ARIMA model.
    Logarithmic transformations and differencing for stationarity: a log transformation stabilizes the variance and differencing stabilizes the mean. Together they turn a non-stationary time series into a stationary one (constant mean, constant variance).
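Auto ARIMA's order search is not reproduced here. This is a minimal sketch of what a fitted ARIMA(1,1,0) does once the orders are chosen: difference once, fit AR(1) to the differences by least squares, forecast the differences, and undo the differencing. It assumes noise-free toy data and no MA terms, and the helper names are hypothetical. Full ARIMA implementations usually estimate the coefficients by maximum likelihood. In Python, packages such as pmdarima (`auto_arima`) search p, d, and q automatically using information criteria.

```python
from itertools import accumulate


def fit_ar1(series):
    """Least-squares fit of y_t = c + phi * y_{t-1} + e_t; returns (c, phi)."""
    lagged, current = series[:-1], series[1:]
    n = len(lagged)
    mx, my = sum(lagged) / n, sum(current) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(lagged, current))
    sxx = sum((a - mx) ** 2 for a in lagged)
    phi = sxy / sxx
    return my - phi * mx, phi


def forecast_arima_110(levels, horizon):
    """ARIMA(1,1,0): difference once (d=1), fit AR(1) to the differences (p=1, q=0),
    forecast the differences recursively, then add them back onto the last level."""
    diffs = [b - a for a, b in zip(levels, levels[1:])]
    c, phi = fit_ar1(diffs)
    level, diff = levels[-1], diffs[-1]
    forecasts = []
    for _ in range(horizon):
        diff = c + phi * diff   # next forecast difference
        level += diff           # undo the differencing (integrate)
        forecasts.append(level)
    return forecasts, (c, phi)


# Toy index whose monthly changes follow d_t = 1 + 0.5 * d_{t-1} exactly (no noise).
changes = [0.0]
for _ in range(29):
    changes.append(1.0 + 0.5 * changes[-1])
levels = list(accumulate(changes, initial=100.0))
forecasts, (c, phi) = forecast_arima_110(levels, horizon=24)
print(round(c, 3), round(phi, 3))  # 1.0 0.5
```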
  • 27. Auto ARIMA (analytics)
    Built an Auto ARIMA prediction model and forecast the next 24 months. Reflecting the decline in the index over the last 2 months, the model expects a bearish market over the next 24 months.
  • 28. Holt-Winters (theory)
    The Holt-Winters seasonal method consists of a forecast equation and three smoothing equations: one for the level $\ell_t$, one for the trend $b_t$, and one for the seasonal component $s_t$. Their smoothing parameters are $\alpha$, $\beta^*$, and $\gamma$, respectively. $m$ is the frequency of the seasonality: $m = 4$ for quarterly data and $m = 12$ for monthly data.
    Additive method: trend, seasonality, and variation are added together. It is typically used when the seasonal amplitude is roughly constant over time.
    $$\hat{y}_{t+h|t} = \ell_t + h b_t + s_{t+h-m(k+1)}$$
    $$\ell_t = \alpha (y_t - s_{t-m}) + (1-\alpha)(\ell_{t-1} + b_{t-1})$$
    $$b_t = \beta^* (\ell_t - \ell_{t-1}) + (1-\beta^*) b_{t-1}$$
    $$s_t = \gamma (y_t - \ell_{t-1} - b_{t-1}) + (1-\gamma) s_{t-m}$$
    Here $k$ is the integer part of $(h-1)/m$, which ensures that the seasonal indices used for forecasting come from the last year of the sample. The level equation is a weighted average of the seasonally adjusted observation $(y_t - s_{t-m})$ and the non-seasonal forecast $(\ell_{t-1} + b_{t-1})$ for time $t$. The seasonal equation is a weighted average of the current seasonal index $(y_t - \ell_{t-1} - b_{t-1})$ and the seasonal index of the same season in the previous year ($m$ periods earlier).
    The seasonal component can also be written as $s_t = \gamma^* (y_t - \ell_t) + (1-\gamma^*) s_{t-m}$. Substituting the level equation for $\ell_t$ gives
    $$s_t = \gamma^*(1-\alpha)(y_t - \ell_{t-1} - b_{t-1}) + [1 - \gamma^*(1-\alpha)] s_{t-m},$$
    which is identical to the smoothing equation above with $\gamma = \gamma^*(1-\alpha)$. The usual constraint $0 \le \gamma^* \le 1$ therefore becomes $0 \le \gamma \le 1-\alpha$.
    Multiplicative method: trend, seasonality, and variation are multiplied together. It is typically used when the seasonal amplitude gradually increases or decreases over time.
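The additive equations can be implemented as a minimal plain-Python sketch using the same symbols (`alpha`, `beta` for β*, `gamma`, `m`). The initialization from the first two seasonal cycles is an assumption chosen for simplicity. The smoothing parameters are taken as fixed inputs rather than estimated. Libraries such as statsmodels' `ExponentialSmoothing(..., trend="add", seasonal="add", seasonal_periods=12)` offer other initialization methods and estimate the parameters when fitted.

```python
def holt_winters_additive(y, m, alpha, beta, gamma, horizon):
    """Additive Holt-Winters: level, trend, and seasonal smoothing, then h-step forecasts.

    Needs at least 2*m observations. The first m observations only initialize the
    state (a simple heuristic assumed here, not the only possible choice).
    """
    level = sum(y[:m]) / m
    trend = (sum(y[m:2 * m]) - sum(y[:m])) / (m * m)   # change in cycle means per period
    seasonals = [y[i] - level for i in range(m)]        # seasonals[t] holds s_t
    for t in range(m, len(y)):
        prev_level, prev_trend = level, trend
        level = alpha * (y[t] - seasonals[t - m]) + (1 - alpha) * (prev_level + prev_trend)
        trend = beta * (level - prev_level) + (1 - beta) * prev_trend
        seasonals.append(gamma * (y[t] - prev_level - prev_trend) + (1 - gamma) * seasonals[t - m])
    last = len(y) - 1
    forecasts = []
    for h in range(1, horizon + 1):
        k = (h - 1) // m                                # whole seasonal cycles ahead
        forecasts.append(level + h * trend + seasonals[last + h - m * (k + 1)])
    return forecasts


pattern = [1.0, -1.0, 2.0, -2.0]                     # quarterly seasonal effect (m = 4)
y = [10.0 + pattern[t % 4] for t in range(24)]       # flat level, constant seasonality
print([round(v, 6) for v in holt_winters_additive(y, 4, 0.3, 0.1, 0.2, horizon=5)])
# [11.0, 9.0, 12.0, 8.0, 11.0]
```

The fifth forecast (h = 5, so k = 1) reuses the seasonal index from the last observed cycle, which is what the $s_{t+h-m(k+1)}$ term guarantees.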
  • 29. Holt-Winters (analytics)
    Holt-Winters model forecast results for the next 24 months, by variable (figure panels: number of newlyweds, real growth rate, sales index, Jeonse index, Jeonse rate, rent conversion rate, permit volume, unsold inventory, homes 30 years old, total household loans, M2 money supply, base rate, mortgage interest rate).
    - Jeonse rate ▼: rental rates remain weak despite a rising rental index and rent conversion rate (☞ rising rental index?)
    - Permit volume ▲
    - Unsold inventory increases slightly, but homes 30 years old increase significantly.
    - Total household loans and M2 money supply vs. base rate and mortgage interest rates: increased liquidity and tightening through rate hikes collide?
    Overall, the model expects a sustained bull market.
  • 30. Conclusions and limitations
    1. Correlation: correlation analysis checks the correlation between variables, and CCF checks the time lag between weakly correlated variables.
    2. Regression: for the Seoul apartment sales index, the random forest and XGB regression models performed well.
    - Random forest regression: strongly influenced by market liquidity, such as M2 money supply and total household loans.
    - XGB regression: strongly influenced by supply factors, such as the Jeonse index, unsold inventory, permit volume, and sublease rates.
    - For the non-capital apartment sales index, the random forest and XGB regression models also outperformed the other models. There, the Jeonse index tends to have a larger impact than in Seoul, while the base rate and older homes reaching 30 years of age have much less impact.
    3. Time series: Auto ARIMA predicted a bearish market with low volatility, whereas Holt-Winters predicted a sustained bull market. Because the variables in the model are changing rapidly, future data must be accumulated to refine the model.
    Limitations:
    1. Insufficient data volume and delays in updating recent data
    2. Variables are collected over different periods, so adding variables shrinks the analyzable time range to their intersection
    3. Limited support for analytical models in Brightics AI ☞ compensated for with Python
  • 31. Sources
    Reference papers
    - Bo Geun Cho, Park Kyung Bae, Ha Sung Ho, "Comparison of Apartment Transaction Price Index Prediction Models by Region Using Machine Learning Algorithms: Verification of LIME Analysis," Korean Society of Information Systems, September 2020.
    - Sungwan Bae, Jungseok Yoo, "Estimating apartment prices using machine learning: The case of Gangnam-gu, Seoul," Journal of Real Estate Research, Vol. 24, No. 1, March 2018.
    Data sources
    - Statistics Korea (KOSIS) housing statistics: https://kosis.kr/index/index.do
    - Ministry of Land, Infrastructure, and Transport statistics: https://stat.molit.go.kr/
    - Bank of Korea Economic Statistics System: https://ecos.bok.or.kr/
    - HF Housing Finance Statistics System: https://www.hf.go.kr/research/portal/stat/
    - Real estate statistics from the Korea Real Estate Agency: https://www.reb.or.kr/r-one/
    - e-National Indicators: https://www.index.go.kr/potal/main/
    Analytic model theory
    - Brightics AI analysis model descriptions (correlation, etc.): https://datadoctorblog.com/
    - Regression model performance evaluation: https://inistory.tistory.com/111
    - Cross-correlation: https://brique-analytics.tistory.com/23
    - Stationarity and differencing: https://otexts.com/fppkr/stationarity.html
    - Linear regression: https://soohee410.github.io/stat4
    - Bagging: https://www.youtube.com/watch?v=sVriC_Ys2cw
    - Boosting: https://www.youtube.com/watch?v=GM3CDQfQ4sw
    - ARIMA model: https://leedakyeong.tistory.com/
    - Holt-Winters technique: https://otexts.com/fppkr/holt-winters.html
  • 32. Thank you The 2nd KAIST DFMP CBA Kyungrok Park Should you have further questions, don’t hesitate to contact me: jarvis@krx.co.kr Copyrightⓒ 2022. Kyungrok Park. All rights reserved.