Webinar from the Neper agency on the recommended approaches for creating SEO-optimized content in 2022.
Presented and hosted by Philippe Yonnet.
E-mail Restart 2023: Jakub Malý - Gamification, omnichannel communication and kineti... — Taste
Practical examples with concrete numbers showing how, through gamification and follow-up omnichannel communication, we managed to kick-start our clients' e-mailing, differentiate it from the competition, and achieve better results and deeper customer relationships.
Top ten reasons to build out your Google My Business page:
1. It's free
2. It helps customers find you
…
7. Insights, stats, and analytics
8. You can also track traffic
and more.
Getting Started In SEO: 10 Things Every SEO Strategy Needs To Succeed — Search Engine Journal
Are you looking to start an SEO strategy for your business?
Don’t know where to start?
Creating and executing your first SEO strategy can seem daunting and complicated.
But, with the right resources, businesses of any size can see the growth they are looking for.
Watch now and learn the ten things every SEO campaign needs to succeed.
You'll discover:
- The 10 things you need to succeed with SEO.
- How to optimize with quality content.
- What a successful SEO campaign looks like.
Matt Salzl, VP of Sales at Manta, a Boostability company, will help you find the best path forward with your budget and resources.
Small businesses often know the importance of SEO but they don't always know where to start.
In this insightful webinar, you'll learn the important factors needed to start improving your business's online presence and grow through SEO.
GMB - Google My Business Setup & Optimization — GetFoundLocal
When set up and marketed correctly, GMB can deliver branding, engagement, and new customers to your business. If ignored or marketed poorly, Google My Business becomes just another waste of time. Google makes this excellent service available for free. Take advantage and build your business.
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange — ThinkInnovation
Context
1. A housing agent collected resale prices on HDB apartments in Singapore.
Objective
2. To predict resale prices to advise his potential clients.
Strategies
3. Explore and clean the data for analysis.
4. Perform K-Means clustering, in Orange, to find possible segments in the customer data.
5. Tune the model to improve its performance.
6. Visualise the findings, share conclusions, and give insight-driven recommendations.
Author: Anthony Mok Date: 18 Nov 2023
Email: xxiaohao@yahoo.com
Airbnb Price Estimation in Major US Metropolitan Areas — RavitejChilukuri1
An analysis of Airbnb listing prices to estimate a target price range based on various characteristics, including but not limited to zip code, adjectives in the description, bedrooms, and bathrooms.
Predictive modeling for resale HDB evaluation price — kahhuey
After going through the pain of buying and selling my HDB flat recently, I believe a predictive model of resale prices is needed. I wrote a model using multiple linear regression (MLR).
Y = β0 + β1X1 + β2X2 + β3X3 + β4X4 + β5X5 + … + βpXp + ε
This model includes variables like west sun, corridor, etc. It would be a great help for buyers and sellers if this model, supported by real data from the authority, were publicly available.
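As a sketch of what fitting such an MLR looks like in practice, here is a minimal example on synthetic data; the predictors and coefficients below are invented for illustration, not fitted on real HDB transactions:

```python
import numpy as np

# Toy illustration of the MLR form Y = b0 + b1*X1 + ... + bp*Xp + e.
# The data is synthetic and the coefficients are invented.
rng = np.random.default_rng(0)
n = 200
X = rng.uniform(0.0, 1.0, size=(n, 3))          # three hypothetical predictors
true_beta = np.array([300.0, 2.0, 1.5, -0.5])   # b0, b1, b2, b3
noise = rng.normal(0.0, 0.1, n)                 # the epsilon term
y = true_beta[0] + X @ true_beta[1:] + noise

# Ordinary least squares fit: prepend a column of ones for the intercept.
A = np.column_stack([np.ones(n), X])
beta_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
print(np.round(beta_hat, 2))                    # close to [300, 2, 1.5, -0.5]
```

With enough data relative to the noise, the recovered coefficients land close to the true ones, which is exactly the property the author is relying on for price prediction.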
Improving classification accuracy for customer contact transcriptions — Maria Vechtomova
Reducing the number of customer contacts by making processes more efficient is an everlasting target for KPN. Business users can get insights into why people call and spot areas for improvement using call classification. Currently KPN has a classification tree in place, and calls and chats are classified manually. We try to use this input to classify calls and chats using supervised machine learning techniques (CNN, LSTM, hybrid models). The dataset is challenging in many ways, and human accuracy on such a dataset is low; poorly separated classes in the classification tree play a big role in this. Unfortunately, creating a new classification tree does not seem feasible because of the high costs involved, and the 'cheaper' unsupervised and semi-supervised techniques we tried were not successful. We discuss the reasons for failure and possible next steps.
Commercial valuation of property is the prime requirement for investments. Real estate values depend on many elements, such as the present cost of the land, taxation on the land, depreciation rates, and others. It is essential that a feasibility analysis be conducted before the land is invested in. Given this context, this report attempts a feasibility analysis for a property using the Estate Master feasibility analysis tool. A number of inputs are given based on a case scenario for the property. The report evaluates and critically discusses the real-world scenario presented by Estate Master for each of these sets of inputs. A feasibility analysis based on commercial valuation methodology is carried out first, followed by a valuation of the site as is; the residual value is calculated here. The report then calculates the project returns based on the residual values using Estate Master, and in the second part of the report, sensitivity and risk analyses are covered.
In this keynote I will give you a business understanding of ML by going through key concepts and concrete use cases that illustrate its possibilities. I'll present new technology that makes ML more accessible, and I'll explain in simple terms the limitations to what can be achieved. Finally, I'll discuss pragmatic considerations of real-world applications and I'll give a sneak peek at the Machine Learning Canvas — a framework for describing a predictive system that uses ML to provide value to its end user.
The objective of the project was to develop an analytical model to predict house prices in King County. This whitepaper describes in detail the data preprocessing, the predictive models developed, recommendations, and future plans for improvement.
DirectionsThis exam consists of seven problems and is an open-boo.docx — duketjoy27252
Directions: This exam consists of seven problems and is an open-book exam with no time limit. All work should be done individually. Word-process your solutions within this template and show all steps used in arriving at the final answers. Incomplete solutions will receive partial credit. Copy and paste all necessary data from Excel into this document and create tables as needed.
Problem 1
Suppose a manufacturing company makes a certain item. The time to produce each item is normally distributed around a mean of 27 minutes with a standard deviation of 2.5 minutes. Thus, the population of production times is normal in shape. Find the mean and standard deviation of the sample.
Problem 2
The average prices for a product in 12 stores in a city are shown below.
$2.99, $2.85, $3.25, $3.55, $3.00, $2.99, $2.76, $3.50, $3.20, $2.85, $3.75, $3.85
Test the hypothesis that the average price is higher than $2.87. Use a level of significance of 0.05.
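One way to carry out this test is a one-sample, one-tailed t-test; the sketch below is a worked illustration, not an official answer key:

```python
import math
import statistics

# Worked sketch of Problem 2: test H0: mu = 2.87 vs H1: mu > 2.87
# at alpha = 0.05 with df = 11.
prices = [2.99, 2.85, 3.25, 3.55, 3.00, 2.99, 2.76,
          3.50, 3.20, 2.85, 3.75, 3.85]
n = len(prices)
xbar = statistics.mean(prices)
s = statistics.stdev(prices)              # sample standard deviation
t = (xbar - 2.87) / (s / math.sqrt(n))    # test statistic, about 3.20
t_crit = 1.796                            # t(0.05, df=11), one-tailed
print(f"t = {t:.2f}; reject H0: {t > t_crit}")
```

Since the statistic exceeds the one-tailed critical value, the data support the claim that the average price is higher than $2.87 at the 0.05 level.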
Problem 3
A store wishes to predict net profit as a function of sales for the next year. The following table gives data for the years 1998 to 2005.
Year | Sales (thousands of dollars) | Net Profit
1998 | 51 | 5
1999 | 55 | 10.2
2000 | 65 | 9.6
2001 | 82 | -3
2002 | 75 | 2.8
2003 | 71 | 3.2
2004 | 82 | -2.3
2005 | 81 | -2.6
(a) Graph the points from 1998 through 2005 on a scatter diagram using Sales as the independent variable and Net Profit as the dependent variable.
(b) Draw the regression line on the graph you constructed in Part (a).
(c) What is the value of the coefficient of determination for this regression model? Comment on the strength of the regression line for this model.
(d) What is the predicted net profit for 2006 if sales are expected to be 125?
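The regression line, the coefficient of determination, and the 2006 prediction can be computed directly; this is a worked sketch, not an official solution key:

```python
# Worked sketch of Problem 3: simple linear regression of Net Profit
# on Sales, the coefficient of determination, and the 2006 prediction.
sales = [51, 55, 65, 82, 75, 71, 82, 81]
profit = [5, 10.2, 9.6, -3, 2.8, 3.2, -2.3, -2.6]
n = len(sales)
mx = sum(sales) / n
my = sum(profit) / n
sxy = sum((x - mx) * (y - my) for x, y in zip(sales, profit))
sxx = sum((x - mx) ** 2 for x in sales)
syy = sum((y - my) ** 2 for y in profit)

slope = sxy / sxx                      # about -0.360
intercept = my - slope * mx            # about 28.18
r2 = sxy ** 2 / (sxx * syy)            # about 0.70: a moderately strong, negative relationship
pred_2006 = intercept + slope * 125    # about -16.9 (thousand dollars)
print(slope, r2, pred_2006)
```

Note that predicting at sales of 125 extrapolates well beyond the observed range (51 to 82), so the point estimate should be treated with caution.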
Problem 4
Last week’s sales of iMac computers at an Apple Store in Oklahoma City, OK, are shown in the following table:
Day | Sales (Dollars)
1 | 180
2 | 150
3 | 210
4 | 225
5 | 195
6 | 190
7 | 230
(a) Use the 3-day moving average method for forecasting days 4–7.
(b) Use the 3-day weighted moving average method for forecasting days 4–7. Use Weight 1 day ago = 2, Weight 2 days ago = 4, and Weight 3 days ago = 3.
(c) Compare the techniques using the mean absolute deviation (MAD).
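The two forecasting schemes and the MAD comparison can be sketched directly; this is a worked illustration, not an official solution key:

```python
# Worked sketch of Problem 4: 3-day moving average vs 3-day weighted
# moving average (weights: 2 for 1 day ago, 4 for 2 days ago,
# 3 for 3 days ago), compared by mean absolute deviation on days 4-7.
sales = [180, 150, 210, 225, 195, 190, 230]   # days 1..7

ma = [sum(sales[t - 3:t]) / 3 for t in range(3, 7)]
wma = [(2 * sales[t - 1] + 4 * sales[t - 2] + 3 * sales[t - 3]) / 9
       for t in range(3, 7)]
actual = sales[3:7]

def mad(forecasts):
    return sum(abs(a - f) for a, f in zip(actual, forecasts)) / len(forecasts)

print(ma, round(mad(ma), 2))     # MAD about 22.92
print(wma, round(mad(wma), 2))   # MAD about 25.69
```

On this data the simple moving average edges out the weighted variant, since the weights happen to emphasise older observations.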
Problem 5
The following table shows six years of average annual cost-of-living index data:

Year | Annual Cost-of-Living Index
2008 | 105.8
2009 | 111.4
2010 | 121.9
2011 | 134.3
2012 | 128.6
2013 | 125.2
(a) Forecast the average annual cost-of-living index for all years from 2008 to 2013. Use a 3-year weighted moving average with weights of 0.5, 0.3, and 0.2, placing the largest weight on the most recent data.
(b) Forecast the average annual cost-of-living index using exponential smoothing with α = 0.7 for all years from 2008 to 2014. Use the 2008 index as the starting forecast for 2008.
(c) Which of the methods in parts (a) and (b) produces better forecasts for the 3 years from 2011 to 2013? Answer on the basis of mean squared error (MSE).
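The two forecast series can be generated directly; this is a worked sketch of the mechanics of parts (a) and (b), not an official answer key:

```python
# Worked sketch of Problem 5: exponential smoothing with alpha = 0.7,
# seeded with the 2008 index as the 2008 forecast, plus the first
# 3-year weighted moving average forecast (weights 0.5 / 0.3 / 0.2).
index = {2008: 105.8, 2009: 111.4, 2010: 121.9,
         2011: 134.3, 2012: 128.6, 2013: 125.2}
alpha = 0.7

# F(t) = alpha * A(t-1) + (1 - alpha) * F(t-1)
forecast = {2008: index[2008]}
for year in range(2009, 2015):
    forecast[year] = alpha * index[year - 1] + (1 - alpha) * forecast[year - 1]

# Weighted moving average forecast for 2011
# (0.5 on 2010, 0.3 on 2009, 0.2 on 2008).
wma_2011 = 0.5 * index[2010] + 0.3 * index[2009] + 0.2 * index[2008]
print(round(forecast[2014], 2), round(wma_2011, 2))
```

The smoothing recursion carries each year's forecast forward, so the 2014 forecast depends on the whole history, while each weighted-moving-average forecast uses only the three preceding years.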
Problem 6
A company manufactures two products, Product A and Product B. The wholesale price and manufacturing cost.
Learn SQL from basic queries to advanced queries — manishkhaire30
Dive into the world of data analysis with our comprehensive guide on mastering SQL! This presentation offers a practical approach to learning SQL, focusing on real-world applications and hands-on practice. Whether you're a beginner or looking to sharpen your skills, this guide provides the tools you need to extract, analyze, and interpret data effectively.
Key Highlights:
Foundations of SQL: Understand the basics of SQL, including data retrieval, filtering, and aggregation.
Advanced Queries: Learn to craft complex queries to uncover deep insights from your data.
Data Trends and Patterns: Discover how to identify and interpret trends and patterns in your datasets.
Practical Examples: Follow step-by-step examples to apply SQL techniques in real-world scenarios.
Actionable Insights: Gain the skills to derive actionable insights that drive informed decision-making.
Join us on this journey to enhance your data analysis capabilities and unlock the full potential of SQL. Perfect for data enthusiasts, analysts, and anyone eager to harness the power of data!
#DataAnalysis #SQL #LearningSQL #DataInsights #DataScience #Analytics
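A compact illustration of the basics-to-advanced progression described above, run through Python's built-in sqlite3 module; the table and rows are invented for the demo:

```python
import sqlite3

# In-memory demo database with a made-up orders table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER, region TEXT, amount REAL);
    INSERT INTO orders VALUES
        (1, 'north', 120.0), (2, 'north', 80.0),
        (3, 'south', 200.0), (4, 'south', 50.0), (5, 'east', 90.0);
""")

# Filtering + aggregation: total sales per region, largest first,
# keeping only regions whose total cleared 100.
rows = conn.execute("""
    SELECT region, SUM(amount) AS total
    FROM orders
    GROUP BY region
    HAVING SUM(amount) > 100
    ORDER BY total DESC
""").fetchall()
print(rows)   # [('south', 250.0), ('north', 200.0)]
conn.close()
```

The same GROUP BY / HAVING / ORDER BY pattern carries over unchanged to production engines such as PostgreSQL or MySQL.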
Adjusting OpenMP PageRank : SHORT REPORT / NOTES — Subhajit Sahu
For massive graphs that fit in RAM, but not in GPU memory, it is possible to take advantage of a shared-memory system with multiple CPUs, each with multiple cores, to accelerate PageRank computation. If the NUMA architecture of the system is properly taken into account with good vertex partitioning, the speedup can be significant. To take steps in this direction, experiments are conducted to implement PageRank in OpenMP using two different approaches, uniform and hybrid. The uniform approach runs all primitives required for PageRank in OpenMP mode (with multiple threads). The hybrid approach, on the other hand, runs certain primitives (i.e., sumAt, multiply) in sequential mode.
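For orientation, the computation being parallelised here can be sketched as a plain sequential power iteration; the tiny graph and damping factor below are illustrative, and the report's OpenMP/NUMA details are not reproduced:

```python
# Plain sequential power-iteration PageRank. Every vertex in the demo
# graph has out-links, so no dangling-node handling is needed.
def pagerank(adj, n, d=0.85, iters=100):
    """adj maps each vertex to its list of out-neighbours."""
    rank = [1.0 / n] * n
    for _ in range(iters):
        nxt = [(1.0 - d) / n] * n           # teleport mass
        for u, outs in adj.items():
            share = d * rank[u] / len(outs)
            for v in outs:                   # distribute u's rank along its edges
                nxt[v] += share
        rank = nxt
    return rank

ranks = pagerank({0: [1, 2], 1: [2], 2: [0]}, 3)
print([round(r, 3) for r in ranks])
```

The inner loops over vertices and edges are exactly the primitives (multiply, sum) that the report maps onto OpenMP threads in its uniform and hybrid configurations.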
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf — Enterprise Wired
In this guide, we'll explore the key considerations and features to look for when choosing a trusted analytics platform that meets your organization's needs and delivers actionable intelligence you can trust.
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Round table discussion of vector databases, unstructured data, ai, big data, real-time, robots and Milvus.
A lively discussion with NJ Gen AI Meetup Lead, Prasad and Procure.FYI's Co-Found
Adjusting primitives for graph : SHORT REPORT / NOTES — Subhajit Sahu
Graph algorithms like PageRank often operate on Compressed Sparse Row (CSR), an adjacency-list based graph representation.
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
The Building Blocks of QuestDB, a Time Series Database — javier ramirez
Talk Delivered at Valencia Codes Meetup 2024-06.
Traditionally, databases have treated timestamps as just another data type. However, when performing real-time analytics, timestamps should be first-class citizens, and we need rich time semantics to get the most out of our data. We also need to deal with ever-growing datasets while staying performant, which is as fun as it sounds.
It is no wonder time-series databases are now more popular than ever before. Join me in this session to learn about the internal architecture and building blocks of QuestDB, an open source time-series database designed for speed. We will also review some of the changes we have made over the past two years to deal with late and unordered data, non-blocking writes, read replicas, and faster batch ingestion.
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Discussion on Vector Databases, Unstructured Data and AI
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working in unstructured data. Speakers will present related topics such as vector databases, LLMs, and managing data at scale. The intended audience includes roles like machine learning engineers, data scientists, data engineers, software engineers, and PMs. This meetup was formerly the Milvus Meetup, and is sponsored by Zilliz, maintainers of Milvus.
1. Is this house ‘worthy’ to be your home?
Using Regression Analysis to Predict HDB Resale Prices
Valerie Lim
4 Feb 2020
2. Table of Contents
Outline: Introduction, Methodology (Feature Selection), Results (Model Performance, Feature Importance), Conclusion, Future Work
3. Introduction
• A house represents the biggest investment
• Yet, first-time home buyers are often unsure about:
1. the factors that influence prices
2. the 'true' value of property prices
Introduction Methodology Results Conclusion Future Work
4. Introduction
Goal: Use property listings to predict prices and grant buyers more power in determining if their desired property is priced at its 'true' value
5. Methodology
Pipeline: Data Collection → Data Preprocessing → Feature Engineering → Feature Selection → Train & Evaluate Model
Data Preprocessing:
- Dropped duplicated listings
- Imputed 'missing' information as 0
- Converted categorical variables (e.g. property type, model type) to dummy variables
Feature Engineering:
- Age of flat
- HDB Towns → HDB Region
- Log-transformed price
Train & Evaluate Model:
- Tools: Lasso regression model
- Metrics: Mean absolute error, Root mean squared error
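A minimal sketch of this pipeline on synthetic data; plain OLS stands in for the deck's Lasso model here, and all numbers and categories are invented for illustration:

```python
import numpy as np

# Sketch of the deck's preprocessing: dummy-encode a categorical feature,
# log-transform the price target, fit a linear model, and report
# MAE / RMSE on the log scale. Synthetic data, illustrative coefficients.
rng = np.random.default_rng(1)
n = 300
types = rng.integers(0, 3, n)                    # 0: 3-room, 1: 5-room, 2: executive
psf = rng.uniform(350, 550, n)                   # per square foot
type_uplift = np.array([0.0, 0.12, 0.15])[types] # log-price uplift vs 3-room
log_price = 12.5 + 0.0008 * psf + type_uplift + rng.normal(0, 0.05, n)

# Dummy variables: 3-room is the baseline category.
X = np.column_stack([
    np.ones(n),
    psf,
    (types == 1).astype(float),   # 5-room dummy
    (types == 2).astype(float),   # executive dummy
])
beta, *_ = np.linalg.lstsq(X, log_price, rcond=None)
pred = X @ beta
mae = np.mean(np.abs(log_price - pred))
rmse = np.sqrt(np.mean((log_price - pred) ** 2))
print(f"MAE={mae:.3f}  RMSE={rmse:.3f}")
```

Lasso differs from this OLS sketch by adding an L1 penalty that can shrink uninformative coefficients exactly to zero, which is why the deck also uses it for feature selection.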
6. FEATURE SELECTION
Removal criteria considered (the original table's columns): high correlation, zero correlation with target, backward stepwise, Lasso regression.

Type | Variable | In final model?
Target | Asking price (log) | ✓
Core | Property Type | ✓
Core | Model type (certain model types) | ✓
Core | Bedrooms | ✓
Core | Per square foot | ✓
Core | Area | ✓
Core | Furnish | ✓
Core | Land Tenure |
Core | HDB Town | ✓
Core | HDB Region | ✓
Age | Age |
Age | Year built | ✓
Facilities | Jacuzzi, Meeting Rooms | ✓
Facilities | Private pool, Garage | ✓
Facilities | Air Conditioning, Renovated, Corner Unit, Water Heater, Balcony, Private Garden, Outdoor Patio, Original Condition, Hairdryer, Bathtub, Maidsroom, Colonial Building, Private Lift, Cooker Hob/hood, City View, Park/greenery View, Sea View, Swimming Pool View, Bombshelter, Walk-in-wardrobe, Roof Terrace | ✓
7. Results
MODEL PERFORMANCE
Mean absolute error: 0.10 (~$52K)
Root mean squared error: 0.16 (~$74K)
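To see how a log-scale MAE of 0.10 maps onto a dollar figure like the ~$52K above: an absolute error of 0.10 in log price is a multiplicative factor of exp(0.10), roughly a 10.5% error. The $500K reference price below is a hypothetical round number for illustration, not a figure from the deck:

```python
import math

# Back-transforming a log-scale error into a percentage and a dollar amount.
log_mae = 0.10
factor = math.exp(log_mae)            # about 1.105
pct_error = factor - 1                # about 10.5%
dollar_error = 500_000 * pct_error    # on a hypothetical $500K flat
print(f"{pct_error:.1%} -> ${dollar_error:,.0f}")
```

This is why log-scale errors translate into larger dollar errors for more expensive flats: the error is proportional, not additive.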
8. Results
FEATURE IMPORTANCE
Increase in Price (log):
1. Per square foot is the strongest predictor of price
- For every 1-unit ↑ in PSF, prices ↑ by $61k
9. Results
FEATURE IMPORTANCE
Increase in Price (log):
2. Property type: compared to 3-room flats,
• Executive flats ($55k more)
• 5-room flats ($55k more)
• Jumbo flats ($51k more)
10. Results
FEATURE IMPORTANCE
Increase in Price (log):
3. Number of bedrooms
- Every additional bedroom ↑ prices by $54k
11. Conclusion
There are different types of HDB houses you can call home
1. For a relatively cheaper house: a 3-room flat with a smaller per square foot
2. If you need a bigger living space and have more budget: a 5-room, jumbo or executive flat
12. Conclusion
There are different types of HDB houses you can call home
3. Peripheral factors don't influence prices as much as core features, but they could matter based on individuals' preferences
13. Future Work
• Collect more data, from multiple sources
• Retrieve actual purchasing prices from ERA
• Create additional features, e.g. distance from business districts (Central Business District, Mapletree Business District, Jurong Lake District)
18. Model Comparison
Linear Regression Model Performance Comparison

Model Type | Split | Adjusted R² | RMSE | MAE | RMSE (exponential) | MAE (exponential)
Lasso Regression | Validation | 84% | 0.12 | 0.10 | 60k | 46k
Lasso Regression | Test | 77% | 0.15 | 0.10 | 74k | 52k
Ordinary Least Squares | Validation | 84% | 0.12 | 0.10 | 61k | 45k
Ordinary Least Squares | Test | 84% | 0.15 | 0.10 | 72k | 50k
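As a reading aid for the Adjusted R² column, here is the adjustment formula in code; n = 1000 reflects the speaker's remark about the dataset size, p = 3 matches the final feature count, and the 0.85 input is illustrative rather than taken from the table:

```python
# Adjusted R^2 penalises R^2 for the number of predictors p,
# so adding useless features cannot inflate the score.
def adjusted_r2(r2, n, p):
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(round(adjusted_r2(0.85, 1000, 3), 4))
```

With only 3 predictors and ~1000 observations, the adjustment is tiny; it grows as p approaches n, which is when plain R² becomes most misleading.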
Good morning everyone. Today, I'll be sharing with you my project on using regression analysis to predict HDB resale prices.
I'll begin by setting up the context and problem statement before I walk you through my methodology. Next, I'll dive into the results as well as some interesting insights and recommendations. Lastly, I'll wrap up with some future directions for this project.
For most households, a house represents the biggest investment.
Couples who are planning to settle down but couldn't secure a Built-To-Order flat have to seek alternatives, such as HDB resale flats.
Yet, as first-time home buyers, they are often clueless about the market rate for house prices. This could be because many factors influence house prices. Unless buyers are sufficiently well-read about the property market or have done extensive research, they often have imperfect information. As a result, property agents are likely to lead this conversation.
Can buyers be granted actionable insights so that the balance of information can be more symmetrical?
Yes! That's the aim of this project: to provide data-driven and valuable guidance to first-time home buyers in determining if their desired property is priced at its 'true' value.
- I used Beautiful Soup to scrape HDB resale flat listings from SRX. Each listing carries various information, from core details such as property type, name, location, and model type to more granular details such as the facilities available, or whether the unit is a corner unit or renovated.
Since the target audience is couples searching for a flat, I focused on listings that are 3-room and above.
Among the pool of selected listings, there were some duplicates, which I dropped.
Some listings omitted information that other listings provided, and I imputed these missing values as 0.
I converted categorical variables (e.g. property type, model type) to dummy variables.
I created new features, such as age from year built for ease of interpretation, and aggregated HDB towns into regions to investigate whether different areas in Singapore have an impact on prices.
Feature selection was conducted at various stages of the model-building process, which I'll share in greater detail in the next slide.
Finally, I built the model using Lasso regression and used mean absolute error (an average error) as my main metric for evaluation.
Before building the models:
To avoid multicollinearity, features that were highly correlated were removed. Features that had zero correlation with the target variable were also removed.
After that, features were selected using the backward stepwise method: based on the OLS model, features with p-values above 0.05 were removed.
Lasso regression was also used for feature selection: features with coefficients of 0 were removed.
In the end, the final features are property type, per square foot, and number of bedrooms.
Using this model, the mean absolute error is about 0.1. After reversing the log transformation, this error is equivalent to about $52k, which means buyers using this tool can expect roughly that much wiggle room in determining housing prices based on the features mentioned earlier.
RMSE: since the errors are squared before they are averaged, the RMSE gives a relatively high weight to large errors. This means the RMSE is more useful when large errors are particularly undesirable.
I plotted actual log prices against predicted log prices. The red line represents a perfect model where actual prices equal predicted prices. My linear regression model appears suitable for the data, except for some outliers where the model predicted a lower price than the actual one.
Based on these data points, the model suggested property types bigger than 5-room flats and PSF as significant predictors. But other features could be more suitable for predicting housing prices for these data points, so the model can't predict these flats accurately.
To best understand how the model works, let's dive into the features. The plot below shows the relative importance of each feature.
The strongest positive predictor is per square foot.
Model intercept = 13.12
For a 1-unit increase in PSF, price increases by $61k.
This is followed by property type:
- An executive flat or a 5-room flat increases price by $55k more than a 3-room flat
- A jumbo flat increases price by $51k more than a 3-room flat
Lastly, bedrooms: for every additional bedroom, price increases by $54k.
In conclusion, there are different types of HDB houses you can call home.
With the above insights, if you are looking for something relatively cheaper, you can consider a simple 3-room flat with a smaller PSF.
If you need a bigger living space and have more budget, consider a 5-room, jumbo, or executive flat.
Lastly, aesthetic factors (e.g. sea view, renovated, corner unit) are not as significant in determining prices as core features (e.g. property or model type), but they could matter based on individuals' preferences.
The current model still has a lot of room for improvement. To increase prediction accuracy, I can collect more data, as these results are based on 1,000 data points. Having more listings would lower the MAE.
The current source of data is limited to SRX. For a more comprehensive analysis, I could scrape multiple property market websites such as PropertyGuru and 99.co.
However, relying on data from property portals may be biased, as agents may hike up prices to earn a larger share of the pie. For an even more accurate analysis, I could gather the actual purchasing prices from ERA.