SlideShare a Scribd company logo
1 of 12
ChennaiEstate’s Price
Prediction Model
Presented by :
Soumyadeep Chowdhury
Praxis Business School
Business Case
• Mr Satyanarayanan is a young and dynamic engineer, working in one of the
top IT MNC in Chennai.
• He’s planning to start his family soon so he decides to buy a new house in
Chennai and comes to know about ‘ChennaiEstate’ – a real estate
‘Aggregator’ website
• He then checks the website of ‘ChennaiEstate’ and learn more about the
real estate properties and it’s price
• We, the ChennaiEstate Analytics team, have deployed a ‘Price Prediction
model’ for helping our customers
• Mr Satyanarayanan enters various features of the property he wants to buy
and clicks the ‘Predict’ button
• Our model shows him the Price of the property along with visualizations
Consumer Decision Dilemma
Price per Square feet heat map [Dark Red: High
Price , Dark Yellow: Low Price]
Commission per Square feet heat map [Dark Red:
High Commission, Dark Yellow: Low Commission]
Our Approach
• Exploratory Data Analysis – Checking of Missing values and other
irregular values in the train & test dataset.
• Visualisation – various kinds of plots, e.g.- density plot, bar plot, heat
map, contour plots, bubble charts etc.
• Missing value imputation
• Feature Engineering
• Model building
• Model validation
Exploratory Data Analysis
• First we checked for Missing values/Blank values/other irregular values
• We found missing values for the columns – Bedroom, Bathroom and
Overall score in train and test set
• We used the ‘missForest’ package to replace the missing values in both
train and test set.
• This package uses tree-based Random Forest to replace the categorical
values and Random Forest Regression to replace the numerical values
• Then we used ‘RLOF’ package to detect multivariate outlier. We found a
very few outliers (and removed them for linear regression model)
Advanced Visualisations – Bubble Charts
This chart shows the conditions of the properties being
sold in various Locations with different colour bubbles This chart shows average Property Registration Fee
in various Locations with different colour bubbles
Feature Engineering & Model Building
• We start off with Linear model because a linear model has higher interpretability in a real
life scenario than a non-linear model
• We first created a ‘Vanilla Linear Regression’ model which had an adjusted 𝑅2
= 0.965 &
RMSE submission score = 660,000
• Then we did a QQ plot & Variance plot and found non-linearity in it
• So we started adding Interaction term and polynomial terms
• We added a total of 105 features, out of many have real life interpretability and many are
purely mathematical in nature
Non-Linear model – Deep Learning
• The RMSE achieved after adding 105 features = 225,000 and we hit a
plateau after that point
• At this point the QQ plot still showed presence of high non-linearity at
some portions
• So we decided to try non-linear model – Artificial Neural Network
• We have used the H2o package in R to model the data with Single Layer
Neural Network and Deep Neural Network
• H2o package divides the core into 4 parts and distributes the computation
so it is considerably fast compared to other DL packages available in R
• We have used ‘ReLU’ as the activation function as it performs better
compared to ‘Tanh’ activation in Multiple hidden layer architecture
• First we tried to tune single hidden layer but later moved on to double
hidden layer to obtain improved RMSE score
RMSE: 50000
RMSE: 9600
RMSE: 20000
RMSE: 100000
RMSE: 150000
Deep Learning Architecture & Tuning
• We started of with a single hidden layer
model with 20 hidden nodes and we got
an RMSE around 150,000. Then we used
the ‘Random Discrete Search’ feature to
find the optimal 2 hidden layer
architecture
• The 2 layer architecture came out to be –
(20,25) with RMSE around 100,000
• The ‘Random Discrete Search’ does not
always give the optimal parameters. So we
tried many more 2 layer architectures –
(25,30) with RMSE around 50000
• (30,35) with RMSE around 20000
• And finally, it plateaued at (32,39) with
RMSE around 9642
Benefit Analysis – Business & Consumer
• This model will benefit ‘Chennaiestate’ in various ways –
• Increase transparency between ‘chennaiestate’ & it’s customers
• Enable customers with no/minimal knowledge to understand about the actual
price
• Help chennaiestate to establish a trust relationship with it’s customers
• Increase customer satisfaction
• Positive word of mouth due to increased ‘CSAT’ and consequently higher
market share
• Customers will be benefited in following ways –
• A transparent system to know about the actual real estate prices
• An unbiased automated system for price prediction without human influence
• Compare with the various real estate builder prices and decide from whom to
buy
• Fast, convenient and flexible system for price understanding
THANK YOU

More Related Content

Similar to Greak lakes - Beyond Infinity

AWS Stockholm Meetup June 2019 - Cybercom DeepRacer story
AWS Stockholm Meetup June 2019 - Cybercom DeepRacer storyAWS Stockholm Meetup June 2019 - Cybercom DeepRacer story
AWS Stockholm Meetup June 2019 - Cybercom DeepRacer storyRolf Koski
 
AWS Finland Meetup June 2019 - DeepRacer story
AWS Finland Meetup June 2019 - DeepRacer storyAWS Finland Meetup June 2019 - DeepRacer story
AWS Finland Meetup June 2019 - DeepRacer storyJouni Luoma
 
06 image features
06 image features06 image features
06 image featuresankit_ppt
 
Big Data in the Cloud: How the RISElab Enables Computers to Make Intelligent ...
Big Data in the Cloud: How the RISElab Enables Computers to Make Intelligent ...Big Data in the Cloud: How the RISElab Enables Computers to Make Intelligent ...
Big Data in the Cloud: How the RISElab Enables Computers to Make Intelligent ...Amazon Web Services
 
(ARC310) Solving Amazon's Catalog Contention With Amazon Kinesis
(ARC310) Solving Amazon's Catalog Contention With Amazon Kinesis(ARC310) Solving Amazon's Catalog Contention With Amazon Kinesis
(ARC310) Solving Amazon's Catalog Contention With Amazon KinesisAmazon Web Services
 
A Fully Progressive approach to Single image super-resolution
A Fully Progressive approach to Single image super-resolution A Fully Progressive approach to Single image super-resolution
A Fully Progressive approach to Single image super-resolution Mohammed Ashour
 
Predicting Azure Churn with Deep Learning and Explaining Predictions with LIME
Predicting Azure Churn with Deep Learning and Explaining Predictions with LIMEPredicting Azure Churn with Deep Learning and Explaining Predictions with LIME
Predicting Azure Churn with Deep Learning and Explaining Predictions with LIMEFeng Zhu
 
The challenges of live events scalability
The challenges of live events scalabilityThe challenges of live events scalability
The challenges of live events scalabilityGuy Tomer
 
151106 Sketch-based 3D Shape Retrievals using Convolutional Neural Networks
151106 Sketch-based 3D Shape Retrievals using Convolutional Neural Networks151106 Sketch-based 3D Shape Retrievals using Convolutional Neural Networks
151106 Sketch-based 3D Shape Retrievals using Convolutional Neural NetworksJunho Cho
 
DQN (Deep Q-Network)
DQN (Deep Q-Network)DQN (Deep Q-Network)
DQN (Deep Q-Network)Dong Guo
 
General Tips for participating Kaggle Competitions
General Tips for participating Kaggle CompetitionsGeneral Tips for participating Kaggle Competitions
General Tips for participating Kaggle CompetitionsMark Peng
 
10 -- Overfitting and Underfitting.pptx
10 -- Overfitting and Underfitting.pptx10 -- Overfitting and Underfitting.pptx
10 -- Overfitting and Underfitting.pptxkpcp
 
Scale Saliency: Applications in Visual Matching,Tracking and View-Based Objec...
Scale Saliency: Applications in Visual Matching,Tracking and View-Based Objec...Scale Saliency: Applications in Visual Matching,Tracking and View-Based Objec...
Scale Saliency: Applications in Visual Matching,Tracking and View-Based Objec...Jonathon Hare
 
Practicing DDD & CQRS
Practicing DDD & CQRSPracticing DDD & CQRS
Practicing DDD & CQRSMuhammad Ali
 
Machine Learning.pptx
Machine Learning.pptxMachine Learning.pptx
Machine Learning.pptxJasonTuran2
 
Nearest Neighbor Customer Insight
Nearest Neighbor Customer InsightNearest Neighbor Customer Insight
Nearest Neighbor Customer InsightMapR Technologies
 
EMOD_Optimization_Presentation.pptx
EMOD_Optimization_Presentation.pptxEMOD_Optimization_Presentation.pptx
EMOD_Optimization_Presentation.pptxAliElMoselhy
 
Deep-Dive into Deep Learning Pipelines with Sue Ann Hong and Tim Hunter
Deep-Dive into Deep Learning Pipelines with Sue Ann Hong and Tim HunterDeep-Dive into Deep Learning Pipelines with Sue Ann Hong and Tim Hunter
Deep-Dive into Deep Learning Pipelines with Sue Ann Hong and Tim HunterDatabricks
 

Similar to Greak lakes - Beyond Infinity (20)

AWS Stockholm Meetup June 2019 - Cybercom DeepRacer story
AWS Stockholm Meetup June 2019 - Cybercom DeepRacer storyAWS Stockholm Meetup June 2019 - Cybercom DeepRacer story
AWS Stockholm Meetup June 2019 - Cybercom DeepRacer story
 
AWS Finland Meetup June 2019 - DeepRacer story
AWS Finland Meetup June 2019 - DeepRacer storyAWS Finland Meetup June 2019 - DeepRacer story
AWS Finland Meetup June 2019 - DeepRacer story
 
06 image features
06 image features06 image features
06 image features
 
Big Data in the Cloud: How the RISElab Enables Computers to Make Intelligent ...
Big Data in the Cloud: How the RISElab Enables Computers to Make Intelligent ...Big Data in the Cloud: How the RISElab Enables Computers to Make Intelligent ...
Big Data in the Cloud: How the RISElab Enables Computers to Make Intelligent ...
 
(ARC310) Solving Amazon's Catalog Contention With Amazon Kinesis
(ARC310) Solving Amazon's Catalog Contention With Amazon Kinesis(ARC310) Solving Amazon's Catalog Contention With Amazon Kinesis
(ARC310) Solving Amazon's Catalog Contention With Amazon Kinesis
 
A Fully Progressive approach to Single image super-resolution
A Fully Progressive approach to Single image super-resolution A Fully Progressive approach to Single image super-resolution
A Fully Progressive approach to Single image super-resolution
 
Predicting Azure Churn with Deep Learning and Explaining Predictions with LIME
Predicting Azure Churn with Deep Learning and Explaining Predictions with LIMEPredicting Azure Churn with Deep Learning and Explaining Predictions with LIME
Predicting Azure Churn with Deep Learning and Explaining Predictions with LIME
 
SVM
SVMSVM
SVM
 
The challenges of live events scalability
The challenges of live events scalabilityThe challenges of live events scalability
The challenges of live events scalability
 
151106 Sketch-based 3D Shape Retrievals using Convolutional Neural Networks
151106 Sketch-based 3D Shape Retrievals using Convolutional Neural Networks151106 Sketch-based 3D Shape Retrievals using Convolutional Neural Networks
151106 Sketch-based 3D Shape Retrievals using Convolutional Neural Networks
 
DQN (Deep Q-Network)
DQN (Deep Q-Network)DQN (Deep Q-Network)
DQN (Deep Q-Network)
 
General Tips for participating Kaggle Competitions
General Tips for participating Kaggle CompetitionsGeneral Tips for participating Kaggle Competitions
General Tips for participating Kaggle Competitions
 
Design pattern
Design patternDesign pattern
Design pattern
 
10 -- Overfitting and Underfitting.pptx
10 -- Overfitting and Underfitting.pptx10 -- Overfitting and Underfitting.pptx
10 -- Overfitting and Underfitting.pptx
 
Scale Saliency: Applications in Visual Matching,Tracking and View-Based Objec...
Scale Saliency: Applications in Visual Matching,Tracking and View-Based Objec...Scale Saliency: Applications in Visual Matching,Tracking and View-Based Objec...
Scale Saliency: Applications in Visual Matching,Tracking and View-Based Objec...
 
Practicing DDD & CQRS
Practicing DDD & CQRSPracticing DDD & CQRS
Practicing DDD & CQRS
 
Machine Learning.pptx
Machine Learning.pptxMachine Learning.pptx
Machine Learning.pptx
 
Nearest Neighbor Customer Insight
Nearest Neighbor Customer InsightNearest Neighbor Customer Insight
Nearest Neighbor Customer Insight
 
EMOD_Optimization_Presentation.pptx
EMOD_Optimization_Presentation.pptxEMOD_Optimization_Presentation.pptx
EMOD_Optimization_Presentation.pptx
 
Deep-Dive into Deep Learning Pipelines with Sue Ann Hong and Tim Hunter
Deep-Dive into Deep Learning Pipelines with Sue Ann Hong and Tim HunterDeep-Dive into Deep Learning Pipelines with Sue Ann Hong and Tim Hunter
Deep-Dive into Deep Learning Pipelines with Sue Ann Hong and Tim Hunter
 

Recently uploaded

Credit Card Fraud Detection: Safeguarding Transactions in the Digital Age
Credit Card Fraud Detection: Safeguarding Transactions in the Digital AgeCredit Card Fraud Detection: Safeguarding Transactions in the Digital Age
Credit Card Fraud Detection: Safeguarding Transactions in the Digital AgeBoston Institute of Analytics
 
Bios of leading Astrologers & Researchers
Bios of leading Astrologers & ResearchersBios of leading Astrologers & Researchers
Bios of leading Astrologers & Researchersdarmandersingh4580
 
Formulas dax para power bI de microsoft.pdf
Formulas dax para power bI de microsoft.pdfFormulas dax para power bI de microsoft.pdf
Formulas dax para power bI de microsoft.pdfRobertoOcampo24
 
如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证acoha1
 
Northern New England Tableau User Group (TUG) May 2024
Northern New England Tableau User Group (TUG) May 2024Northern New England Tableau User Group (TUG) May 2024
Northern New England Tableau User Group (TUG) May 2024patrickdtherriault
 
Predictive Precipitation: Advanced Rain Forecasting Techniques
Predictive Precipitation: Advanced Rain Forecasting TechniquesPredictive Precipitation: Advanced Rain Forecasting Techniques
Predictive Precipitation: Advanced Rain Forecasting TechniquesBoston Institute of Analytics
 
如何办理英国卡迪夫大学毕业证(Cardiff毕业证书)成绩单留信学历认证
如何办理英国卡迪夫大学毕业证(Cardiff毕业证书)成绩单留信学历认证如何办理英国卡迪夫大学毕业证(Cardiff毕业证书)成绩单留信学历认证
如何办理英国卡迪夫大学毕业证(Cardiff毕业证书)成绩单留信学历认证ju0dztxtn
 
Aggregations - The Elasticsearch "GROUP BY"
Aggregations - The Elasticsearch "GROUP BY"Aggregations - The Elasticsearch "GROUP BY"
Aggregations - The Elasticsearch "GROUP BY"John Sobanski
 
Data Analysis Project Presentation : NYC Shooting Cluster Analysis
Data Analysis Project Presentation : NYC Shooting Cluster AnalysisData Analysis Project Presentation : NYC Shooting Cluster Analysis
Data Analysis Project Presentation : NYC Shooting Cluster AnalysisBoston Institute of Analytics
 
NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...
NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...
NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...Amil baba
 
edited gordis ebook sixth edition david d.pdf
edited gordis ebook sixth edition david d.pdfedited gordis ebook sixth edition david d.pdf
edited gordis ebook sixth edition david d.pdfgreat91
 
Seven tools of quality control.slideshare
Seven tools of quality control.slideshareSeven tools of quality control.slideshare
Seven tools of quality control.slideshareraiaryan448
 
obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...
obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...
obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...yulianti213969
 
How to Transform Clinical Trial Management with Advanced Data Analytics
How to Transform Clinical Trial Management with Advanced Data AnalyticsHow to Transform Clinical Trial Management with Advanced Data Analytics
How to Transform Clinical Trial Management with Advanced Data AnalyticsBrainSell Technologies
 
Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...
Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...
Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...Klinik Aborsi
 
Statistics Informed Decisions Using Data 5th edition by Michael Sullivan solu...
Statistics Informed Decisions Using Data 5th edition by Michael Sullivan solu...Statistics Informed Decisions Using Data 5th edition by Michael Sullivan solu...
Statistics Informed Decisions Using Data 5th edition by Michael Sullivan solu...ssuserf63bd7
 
Displacement, Velocity, Acceleration, and Second Derivatives
Displacement, Velocity, Acceleration, and Second DerivativesDisplacement, Velocity, Acceleration, and Second Derivatives
Displacement, Velocity, Acceleration, and Second Derivatives23050636
 
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证zifhagzkk
 
1:1原版定制利物浦大学毕业证(Liverpool毕业证)成绩单学位证书留信学历认证
1:1原版定制利物浦大学毕业证(Liverpool毕业证)成绩单学位证书留信学历认证1:1原版定制利物浦大学毕业证(Liverpool毕业证)成绩单学位证书留信学历认证
1:1原版定制利物浦大学毕业证(Liverpool毕业证)成绩单学位证书留信学历认证ppy8zfkfm
 
如何办理加州大学伯克利分校毕业证(UCB毕业证)成绩单留信学历认证
如何办理加州大学伯克利分校毕业证(UCB毕业证)成绩单留信学历认证如何办理加州大学伯克利分校毕业证(UCB毕业证)成绩单留信学历认证
如何办理加州大学伯克利分校毕业证(UCB毕业证)成绩单留信学历认证a8om7o51
 

Recently uploaded (20)

Credit Card Fraud Detection: Safeguarding Transactions in the Digital Age
Credit Card Fraud Detection: Safeguarding Transactions in the Digital AgeCredit Card Fraud Detection: Safeguarding Transactions in the Digital Age
Credit Card Fraud Detection: Safeguarding Transactions in the Digital Age
 
Bios of leading Astrologers & Researchers
Bios of leading Astrologers & ResearchersBios of leading Astrologers & Researchers
Bios of leading Astrologers & Researchers
 
Formulas dax para power bI de microsoft.pdf
Formulas dax para power bI de microsoft.pdfFormulas dax para power bI de microsoft.pdf
Formulas dax para power bI de microsoft.pdf
 
如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证
 
Northern New England Tableau User Group (TUG) May 2024
Northern New England Tableau User Group (TUG) May 2024Northern New England Tableau User Group (TUG) May 2024
Northern New England Tableau User Group (TUG) May 2024
 
Predictive Precipitation: Advanced Rain Forecasting Techniques
Predictive Precipitation: Advanced Rain Forecasting TechniquesPredictive Precipitation: Advanced Rain Forecasting Techniques
Predictive Precipitation: Advanced Rain Forecasting Techniques
 
如何办理英国卡迪夫大学毕业证(Cardiff毕业证书)成绩单留信学历认证
如何办理英国卡迪夫大学毕业证(Cardiff毕业证书)成绩单留信学历认证如何办理英国卡迪夫大学毕业证(Cardiff毕业证书)成绩单留信学历认证
如何办理英国卡迪夫大学毕业证(Cardiff毕业证书)成绩单留信学历认证
 
Aggregations - The Elasticsearch "GROUP BY"
Aggregations - The Elasticsearch "GROUP BY"Aggregations - The Elasticsearch "GROUP BY"
Aggregations - The Elasticsearch "GROUP BY"
 
Data Analysis Project Presentation : NYC Shooting Cluster Analysis
Data Analysis Project Presentation : NYC Shooting Cluster AnalysisData Analysis Project Presentation : NYC Shooting Cluster Analysis
Data Analysis Project Presentation : NYC Shooting Cluster Analysis
 
NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...
NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...
NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...
 
edited gordis ebook sixth edition david d.pdf
edited gordis ebook sixth edition david d.pdfedited gordis ebook sixth edition david d.pdf
edited gordis ebook sixth edition david d.pdf
 
Seven tools of quality control.slideshare
Seven tools of quality control.slideshareSeven tools of quality control.slideshare
Seven tools of quality control.slideshare
 
obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...
obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...
obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...
 
How to Transform Clinical Trial Management with Advanced Data Analytics
How to Transform Clinical Trial Management with Advanced Data AnalyticsHow to Transform Clinical Trial Management with Advanced Data Analytics
How to Transform Clinical Trial Management with Advanced Data Analytics
 
Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...
Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...
Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...
 
Statistics Informed Decisions Using Data 5th edition by Michael Sullivan solu...
Statistics Informed Decisions Using Data 5th edition by Michael Sullivan solu...Statistics Informed Decisions Using Data 5th edition by Michael Sullivan solu...
Statistics Informed Decisions Using Data 5th edition by Michael Sullivan solu...
 
Displacement, Velocity, Acceleration, and Second Derivatives
Displacement, Velocity, Acceleration, and Second DerivativesDisplacement, Velocity, Acceleration, and Second Derivatives
Displacement, Velocity, Acceleration, and Second Derivatives
 
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证
 
1:1原版定制利物浦大学毕业证(Liverpool毕业证)成绩单学位证书留信学历认证
1:1原版定制利物浦大学毕业证(Liverpool毕业证)成绩单学位证书留信学历认证1:1原版定制利物浦大学毕业证(Liverpool毕业证)成绩单学位证书留信学历认证
1:1原版定制利物浦大学毕业证(Liverpool毕业证)成绩单学位证书留信学历认证
 
如何办理加州大学伯克利分校毕业证(UCB毕业证)成绩单留信学历认证
如何办理加州大学伯克利分校毕业证(UCB毕业证)成绩单留信学历认证如何办理加州大学伯克利分校毕业证(UCB毕业证)成绩单留信学历认证
如何办理加州大学伯克利分校毕业证(UCB毕业证)成绩单留信学历认证
 

Greak lakes - Beyond Infinity

  • 1. ChennaiEstate’s Price Prediction Model Presented by : Soumyadeep Chowdhury Praxis Business School
  • 2. Business Case • Mr Satyanarayanan is a young and dynamic engineer, working in one of the top IT MNC in Chennai. • He’s planning to start his family soon so he decides to buy a new house in Chennai and comes to know about ‘ChennaiEstate’ – a real estate ‘Aggregator’ website • He then checks the website of ‘ChennaiEstate’ and learn more about the real estate properties and it’s price • We, the ChennaiEstate Analytics team, have deployed a ‘Price Prediction model’ for helping our customers • Mr Satyanarayanan enters various features of the property he wants to buy and clicks the ‘Predict’ button • Our model shows him the Price of the property along with visualizations
  • 3. Consumer Decision Dilemma Price per Square feet heat map [Dark Red: High Price , Dark Yellow: Low Price] Commission per Square feet heat map [Dark Red: High Commission, Dark Yellow: Low Commission]
  • 4. Our Approach • Exploratory Data Analysis – Checking of Missing values and other irregular values in the train & test dataset. • Visualisation – various kinds of plots, e.g.- density plot, bar plot, heat map, contour plots, bubble charts etc. • Missing value imputation • Feature Engineering • Model building • Model validation
  • 5. Exploratory Data Analysis • First we checked for Missing values/Blank values/other irregular values • We found missing values for the columns – Bedroom, Bathroom and Overall score in train and test set • We used the ‘missForest’ package to replace the missing values in both train and test set. • This package uses tree-based Random Forest to replace the categorical values and Random Forest Regression to replace the numerical values • Then we used ‘RLOF’ package to detect multivariate outlier. We found a very few outliers (and removed them for linear regression model)
  • 6. Advanced Visualisations – Bubble Charts This chart shows the conditions of the properties being sold in various Locations with different colour bubbles This chart shows average Property Registration Fee in various Locations with different colour bubbles
  • 7.
  • 8. Feature Engineering & Model Building • We start off with Linear model because a linear model has higher interpretability in a real life scenario than a non-linear model • We first created a ‘Vanilla Linear Regression’ model which had an adjusted 𝑅2 = 0.965 & RMSE submission score = 660,000 • Then we did a QQ plot & Variance plot and found non-linearity in it • So we started adding Interaction term and polynomial terms • We added a total of 105 features, out of many have real life interpretability and many are purely mathematical in nature
  • 9. Non-Linear model – Deep Learning • The RMSE achieved after adding 105 features = 225,000 and we hit a plateau after that point • At this point the QQ plot still showed presence of high non-linearity at some portions • So we decided to try non-linear model – Artificial Neural Network • We have used the H2o package in R to model the data with Single Layer Neural Network and Deep Neural Network • H2o package divides the core into 4 parts and distributes the computation so it is considerably fast compared to other DL packages available in R • We have used ‘ReLU’ as the activation function as it performs better compared to ‘Tanh’ activation in Multiple hidden layer architecture • First we tried to tune single hidden layer but later moved on to double hidden layer to obtain improved RMSE score
  • 10. RMSE: 50000 RMSE: 9600 RMSE: 20000 RMSE: 100000 RMSE: 150000 Deep Learning Architecture & Tuning • We started of with a single hidden layer model with 20 hidden nodes and we got an RMSE around 150,000. Then we used the ‘Random Discrete Search’ feature to find the optimal 2 hidden layer architecture • The 2 layer architecture came out to be – (20,25) with RMSE around 100,000 • The ‘Random Discrete Search’ does not always give the optimal parameters. So we tried many more 2 layer architectures – (25,30) with RMSE around 50000 • (30,35) with RMSE around 20000 • And finally, it plateaued at (32,39) with RMSE around 9642
  • 11. Benefit Analysis – Business & Consumer • This model will benefit ‘Chennaiestate’ in various ways – • Increase transparency between ‘chennaiestate’ & it’s customers • Enable customers with no/minimal knowledge to understand about the actual price • Help chennaiestate to establish a trust relationship with it’s customers • Increase customer satisfaction • Positive word of mouth due to increased ‘CSAT’ and consequently higher market share • Customers will be benefited in following ways – • A transparent system to know about the actual real estate prices • An unbiased automated system for price prediction without human influence • Compare with the various real estate builder prices and decide from whom to buy • Fast, convenient and flexible system for price understanding