SlideShare a Scribd company logo
1 of 26
Predicting House Price on
Ames Dataset
Evaluating Linear and Bayesian Regression
Supervisor :
Anirban Basu
Prepared By:-
Archit Vora (2015HT12480)
Data Description
• 80 features
• 23 Nominal
• Near by Area - Agriculture,
Commercial, Ordinal
• 23 Ordinal
• Service satisfaction
• Unsatisfied, neutral, satisfied
• Kitchen Quality - Bad, Good,
Excellent
• 14 Discrete
• 20 Continuous
• Example Features
• Nearby Area
• No of bedrooms
• Parking Facility
• Kitchen Quality
• Swimming Pool
• No of Floors
Filling Up Missing Values
• Common
• Encoding for Ordinal variables
• Leveling for nominal variables
• Specific
• Replace with most frequent value for discrete and nominal
• When no of missing values are less
• Mostly the case with our data
• Sometimes null value implies feature is not available
• Basement area null gets replaced with zero
• Inferring from related variables
• When garage year is null, we say house is without garage
• Allows us to infer garage area, no of cars, garage condition
• Linear Regression : Predict Lot Frontage from lot area (after sqrt)
• Improvements:
• Creating Levels blindly : 288 features, 0.157122 rmse
• After logically filling missing values : 263 features, 0.155965
Error Metric
• RMSE of log
• 5 fold cross validation
Linear Regression
Linear Least Squares
• P is Project matrix
• Instead of solving for Y we solve for
𝑌∗
• A = (𝑋 𝑇 𝑋) is the matrix of interest
• In our case it is not invertible
• We find Pseudo inverse with SVD
LLS with pseudo inverse
• SVD is fast
• U and V are orthogonal hence is transpose
• D is diagonal, so inverse each of them
• SVD allows pseudo inverse
• Just inverse the non zero entries of D
• Accuracy
• 0.16096
Gradient Descent
• Importance of Normalizing
• Iterations (step size 0.05)
• 100 – rmse 1.93
• 5k – rmse 0.48566
• 10k – rmse 0.38121
• 100k – rmse 0.18964
Shrinkage Intuition
• Ridge
• Lasso
• Assumption
• Two parameters
• Without shrinkage entire plane is
available
Ridge
• Matrix Inverse
• Unique inverse Exists
• Accuracy - 0.1395
• Matrix SVD
• Accuracy - 0.13927
• SAG
• Accuracy - 0.230815
• Tuning
• Alpha = 15
Lasso
• Indicating not lo use Lasso
• Generally used when we have
too many features say in the
range of 1000
• Accuracy - 0.19638
• Non zero coefficients – 13
• Tuning
• Alpha = 1
PCR
• SVD
• U reduced
• First 100 component
• Var explained = 0.9797
• rmse 0.14949
Linear Regression - Summary
• Baseline accuracy (average of training data)
• 0.399
• Simple Least Square
• LLS with Pseudo Inverse - rmse 0.16096
• Gradient descent – step size 0.005 (Did not help), diverging
• After Normalizing
• Iter 1,00,000 rmse 0.18964
• Iter 10, 000 rmse 0.38121
• Iter 5000 rmse 0.48566
• PCR
• 100 component
• Rmse 0.14949
• Ridge
• Matrix - alpha 15, rmse 0.1395
• Svd instead of finding inverse – 0.13927
• Sag – 0.230815
• Lasso
• Alpha 1, rmse 0.19638, non zero 13
Bayesian Regression
Bayes Theorem
• Role of prior
• Expert Advise
• What is fixed ?
• Data
• Parameter
• Denominator as complex multiple
integration over parameters
Markov Chain Monte Carlo
• Markov Chain
• Next state depends only on
current steps
• Application
• Hidden Markov models in speech
recognition
• Allows to draw independent samples
• Monte Carlo
• Simulation when simple
mathematical equation is simple
• Application
• Uncertainty in market is random,
predict sales
• Solving complex Integration
MCMC Steps – Metropolis Hasting
1. muProposal = norm(muCurrent, StdDev).rvs()
2. likelihoodCurrent = norm(muCurrent, 1).pdf(data).prod()
3. likelihoodProposal = norm(muProposal, 1).pdf(data).prod()
4. priorCurrent = norm(muPriorMu, muPriorSd).pdf(muCurrent)
5. priorProposal = norm(muPriorMu, muPriorSd).pdf(muProposal)
6. probCurrent = likelihoodCurrent * priorCurrent
7. probProposal = likelihoodProposal * priorProposal
8. probAccept = probProposal / probCurrent
9. accept = np.random.rand() < probAccept
10. if (accept == True) then muCurrent = muProposal
Visualization
Pymc3 style
1. Model
2. Describe 3. Simulate
4. You get the trace
Trace
Predicting for new sample
• Predict outcome for beta, alpha,
sigma of each iteration
• Draw random sample from given
mu and sigma
Ridge
• Accuracy = 0.33020
Lasso
• Change Prior of parameter to
strictly closer to zero distribution
• Laplace Distribution
• Accuracy : 0.25288
Some Kaggle Results
• Without Missing Data Script
• Accuracy – 0.12727
• Rank - 968
• With Missing Data Script
• Accuracy – 0.12664
• Rank - 932
Thank you !
• BITS Wilp
• Two years of amazing learning
• Adobe for work life balance
• My mentor Anirban Basu
• Mathematics will carry along, subjective knowledge will not
• Teachers, professors and seniors
• I think I am luck enough
• Parents and sisters
• I think I am blessed
Questions and Suggestions !

More Related Content

What's hot

Linear models and multiclass classification
Linear models and multiclass classificationLinear models and multiclass classification
Linear models and multiclass classificationNdSv94
 
Anomaly detection Workshop slides
Anomaly detection Workshop slidesAnomaly detection Workshop slides
Anomaly detection Workshop slidesQuantUniversity
 
Predicting Moscow Real Estate Prices with Azure Machine Learning
Predicting Moscow Real Estate Prices with Azure Machine LearningPredicting Moscow Real Estate Prices with Azure Machine Learning
Predicting Moscow Real Estate Prices with Azure Machine LearningLeo Salemann
 
Principal Component Analysis(PCA) understanding document
Principal Component Analysis(PCA) understanding documentPrincipal Component Analysis(PCA) understanding document
Principal Component Analysis(PCA) understanding documentNaveen Kumar
 
Biostatistics Workshop: Missing Data
Biostatistics Workshop: Missing DataBiostatistics Workshop: Missing Data
Biostatistics Workshop: Missing DataHopkinsCFAR
 
Cross-validation Tutorial: What, how and which?
Cross-validation Tutorial: What, how and which?Cross-validation Tutorial: What, how and which?
Cross-validation Tutorial: What, how and which?Pradeep Redddy Raamana
 
House price prediction
House price predictionHouse price prediction
House price predictionSabahBegum
 
CART Classification and Regression Trees Experienced User Guide
CART Classification and Regression Trees Experienced User GuideCART Classification and Regression Trees Experienced User Guide
CART Classification and Regression Trees Experienced User GuideSalford Systems
 
Dimension Reduction: What? Why? and How?
Dimension Reduction: What? Why? and How?Dimension Reduction: What? Why? and How?
Dimension Reduction: What? Why? and How?Kazi Toufiq Wadud
 
Introduction of Xgboost
Introduction of XgboostIntroduction of Xgboost
Introduction of Xgboostmichiaki ito
 
Feature selection concepts and methods
Feature selection concepts and methodsFeature selection concepts and methods
Feature selection concepts and methodsReza Ramezani
 
04 Classification in Data Mining
04 Classification in Data Mining04 Classification in Data Mining
04 Classification in Data MiningValerii Klymchuk
 
K Means Clustering Algorithm | K Means Clustering Example | Machine Learning ...
K Means Clustering Algorithm | K Means Clustering Example | Machine Learning ...K Means Clustering Algorithm | K Means Clustering Example | Machine Learning ...
K Means Clustering Algorithm | K Means Clustering Example | Machine Learning ...Simplilearn
 
Feature selection
Feature selectionFeature selection
Feature selectiondkpawar
 
K mean-clustering algorithm
K mean-clustering algorithmK mean-clustering algorithm
K mean-clustering algorithmparry prabhu
 
Support vector machines (svm)
Support vector machines (svm)Support vector machines (svm)
Support vector machines (svm)Sharayu Patil
 

What's hot (20)

My8clst
My8clstMy8clst
My8clst
 
Linear models and multiclass classification
Linear models and multiclass classificationLinear models and multiclass classification
Linear models and multiclass classification
 
Anomaly detection Workshop slides
Anomaly detection Workshop slidesAnomaly detection Workshop slides
Anomaly detection Workshop slides
 
Predicting Moscow Real Estate Prices with Azure Machine Learning
Predicting Moscow Real Estate Prices with Azure Machine LearningPredicting Moscow Real Estate Prices with Azure Machine Learning
Predicting Moscow Real Estate Prices with Azure Machine Learning
 
Principal Component Analysis(PCA) understanding document
Principal Component Analysis(PCA) understanding documentPrincipal Component Analysis(PCA) understanding document
Principal Component Analysis(PCA) understanding document
 
Biostatistics Workshop: Missing Data
Biostatistics Workshop: Missing DataBiostatistics Workshop: Missing Data
Biostatistics Workshop: Missing Data
 
Cross-validation Tutorial: What, how and which?
Cross-validation Tutorial: What, how and which?Cross-validation Tutorial: What, how and which?
Cross-validation Tutorial: What, how and which?
 
House price prediction
House price predictionHouse price prediction
House price prediction
 
CART Classification and Regression Trees Experienced User Guide
CART Classification and Regression Trees Experienced User GuideCART Classification and Regression Trees Experienced User Guide
CART Classification and Regression Trees Experienced User Guide
 
Pca
PcaPca
Pca
 
Dimension Reduction: What? Why? and How?
Dimension Reduction: What? Why? and How?Dimension Reduction: What? Why? and How?
Dimension Reduction: What? Why? and How?
 
Support vector machine
Support vector machineSupport vector machine
Support vector machine
 
Introduction of Xgboost
Introduction of XgboostIntroduction of Xgboost
Introduction of Xgboost
 
Feature selection concepts and methods
Feature selection concepts and methodsFeature selection concepts and methods
Feature selection concepts and methods
 
Logistic regression
Logistic regressionLogistic regression
Logistic regression
 
04 Classification in Data Mining
04 Classification in Data Mining04 Classification in Data Mining
04 Classification in Data Mining
 
K Means Clustering Algorithm | K Means Clustering Example | Machine Learning ...
K Means Clustering Algorithm | K Means Clustering Example | Machine Learning ...K Means Clustering Algorithm | K Means Clustering Example | Machine Learning ...
K Means Clustering Algorithm | K Means Clustering Example | Machine Learning ...
 
Feature selection
Feature selectionFeature selection
Feature selection
 
K mean-clustering algorithm
K mean-clustering algorithmK mean-clustering algorithm
K mean-clustering algorithm
 
Support vector machines (svm)
Support vector machines (svm)Support vector machines (svm)
Support vector machines (svm)
 

Similar to Ames housing

Intro to Reinforcement learning - part III
Intro to Reinforcement learning - part IIIIntro to Reinforcement learning - part III
Intro to Reinforcement learning - part IIIMikko Mäkipää
 
General Tips for participating Kaggle Competitions
General Tips for participating Kaggle CompetitionsGeneral Tips for participating Kaggle Competitions
General Tips for participating Kaggle CompetitionsMark Peng
 
Why quants love NAG & NAG from Excel
Why quants love NAG & NAG from ExcelWhy quants love NAG & NAG from Excel
Why quants love NAG & NAG from ExcelMarcin Krzysztofik
 
Digital Timing and Carrier Synchronization.ppt
Digital Timing and Carrier Synchronization.pptDigital Timing and Carrier Synchronization.ppt
Digital Timing and Carrier Synchronization.pptmohamadfarzansabahi1
 
Gradient descent method
Gradient descent methodGradient descent method
Gradient descent methodSanghyuk Chun
 
Like-for-Like Comparisons of Machine Learning Algorithms - Dominik Dahlem, Bo...
Like-for-Like Comparisons of Machine Learning Algorithms - Dominik Dahlem, Bo...Like-for-Like Comparisons of Machine Learning Algorithms - Dominik Dahlem, Bo...
Like-for-Like Comparisons of Machine Learning Algorithms - Dominik Dahlem, Bo...WithTheBest
 
Gradient Steepest method application on Griewank Function
Gradient Steepest method application  on Griewank Function Gradient Steepest method application  on Griewank Function
Gradient Steepest method application on Griewank Function Imane Haf
 
Biratnagar Robotics Club, Nepal
Biratnagar Robotics Club, NepalBiratnagar Robotics Club, Nepal
Biratnagar Robotics Club, NepalSarwar Alam Ansari
 
[Q-tangled 22] Deconstructing Quantum Machine Learning Algorithms - Sasha Laz...
[Q-tangled 22] Deconstructing Quantum Machine Learning Algorithms - Sasha Laz...[Q-tangled 22] Deconstructing Quantum Machine Learning Algorithms - Sasha Laz...
[Q-tangled 22] Deconstructing Quantum Machine Learning Algorithms - Sasha Laz...DataScienceConferenc1
 
controltrix - we make control solutions easier
controltrix - we make control solutions easiercontroltrix - we make control solutions easier
controltrix - we make control solutions easieranusheel nahar
 
Quantum Machine Learning for IBM AI
Quantum Machine Learning for IBM AIQuantum Machine Learning for IBM AI
Quantum Machine Learning for IBM AISasha Lazarevic
 
Array antenna and LMS algorithm
Array antenna and LMS algorithmArray antenna and LMS algorithm
Array antenna and LMS algorithmSHIVI JAIN
 
Topology hiding Multipath Routing Protocol in MANET
Topology hiding Multipath Routing Protocol in MANETTopology hiding Multipath Routing Protocol in MANET
Topology hiding Multipath Routing Protocol in MANETAkshay Phalke
 
Lines and planes in space
Lines and planes in spaceLines and planes in space
Lines and planes in spaceFaizan Shabbir
 
PR-305: Exploring Simple Siamese Representation Learning
PR-305: Exploring Simple Siamese Representation LearningPR-305: Exploring Simple Siamese Representation Learning
PR-305: Exploring Simple Siamese Representation LearningSungchul Kim
 
진동데이터 활용 충돌체 탐지 AI 경진대회 1등
진동데이터 활용 충돌체 탐지 AI 경진대회 1등진동데이터 활용 충돌체 탐지 AI 경진대회 1등
진동데이터 활용 충돌체 탐지 AI 경진대회 1등DACON AI 데이콘
 
Wayfair-Data Science Project
Wayfair-Data Science ProjectWayfair-Data Science Project
Wayfair-Data Science ProjectMehnaz Maharin
 

Similar to Ames housing (20)

Intro to Reinforcement learning - part III
Intro to Reinforcement learning - part IIIIntro to Reinforcement learning - part III
Intro to Reinforcement learning - part III
 
General Tips for participating Kaggle Competitions
General Tips for participating Kaggle CompetitionsGeneral Tips for participating Kaggle Competitions
General Tips for participating Kaggle Competitions
 
Why quants love NAG & NAG from Excel
Why quants love NAG & NAG from ExcelWhy quants love NAG & NAG from Excel
Why quants love NAG & NAG from Excel
 
Digital Timing and Carrier Synchronization.ppt
Digital Timing and Carrier Synchronization.pptDigital Timing and Carrier Synchronization.ppt
Digital Timing and Carrier Synchronization.ppt
 
Gradient descent method
Gradient descent methodGradient descent method
Gradient descent method
 
Like-for-Like Comparisons of Machine Learning Algorithms - Dominik Dahlem, Bo...
Like-for-Like Comparisons of Machine Learning Algorithms - Dominik Dahlem, Bo...Like-for-Like Comparisons of Machine Learning Algorithms - Dominik Dahlem, Bo...
Like-for-Like Comparisons of Machine Learning Algorithms - Dominik Dahlem, Bo...
 
Gradient Steepest method application on Griewank Function
Gradient Steepest method application  on Griewank Function Gradient Steepest method application  on Griewank Function
Gradient Steepest method application on Griewank Function
 
Approximation Algorithms TSP
Approximation Algorithms   TSPApproximation Algorithms   TSP
Approximation Algorithms TSP
 
Biratnagar Robotics Club, Nepal
Biratnagar Robotics Club, NepalBiratnagar Robotics Club, Nepal
Biratnagar Robotics Club, Nepal
 
[Q-tangled 22] Deconstructing Quantum Machine Learning Algorithms - Sasha Laz...
[Q-tangled 22] Deconstructing Quantum Machine Learning Algorithms - Sasha Laz...[Q-tangled 22] Deconstructing Quantum Machine Learning Algorithms - Sasha Laz...
[Q-tangled 22] Deconstructing Quantum Machine Learning Algorithms - Sasha Laz...
 
controltrix - we make control solutions easier
controltrix - we make control solutions easiercontroltrix - we make control solutions easier
controltrix - we make control solutions easier
 
Quantum Machine Learning for IBM AI
Quantum Machine Learning for IBM AIQuantum Machine Learning for IBM AI
Quantum Machine Learning for IBM AI
 
Array antenna and LMS algorithm
Array antenna and LMS algorithmArray antenna and LMS algorithm
Array antenna and LMS algorithm
 
Topology hiding Multipath Routing Protocol in MANET
Topology hiding Multipath Routing Protocol in MANETTopology hiding Multipath Routing Protocol in MANET
Topology hiding Multipath Routing Protocol in MANET
 
Lines and planes in space
Lines and planes in spaceLines and planes in space
Lines and planes in space
 
Making Complex Decisions(Artificial Intelligence)
Making Complex Decisions(Artificial Intelligence)Making Complex Decisions(Artificial Intelligence)
Making Complex Decisions(Artificial Intelligence)
 
PR-305: Exploring Simple Siamese Representation Learning
PR-305: Exploring Simple Siamese Representation LearningPR-305: Exploring Simple Siamese Representation Learning
PR-305: Exploring Simple Siamese Representation Learning
 
진동데이터 활용 충돌체 탐지 AI 경진대회 1등
진동데이터 활용 충돌체 탐지 AI 경진대회 1등진동데이터 활용 충돌체 탐지 AI 경진대회 1등
진동데이터 활용 충돌체 탐지 AI 경진대회 1등
 
MixedSignal UVM Demo CDNLive
MixedSignal UVM Demo CDNLiveMixedSignal UVM Demo CDNLive
MixedSignal UVM Demo CDNLive
 
Wayfair-Data Science Project
Wayfair-Data Science ProjectWayfair-Data Science Project
Wayfair-Data Science Project
 

Recently uploaded

Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Sapana Sha
 
Digi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptxDigi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptxTanveerAhmed817946
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiSuhani Kapoor
 
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...Suhani Kapoor
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystSamantha Rae Coolbeth
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service LucknowAminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknowmakika9823
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...Pooja Nehwal
 
Predicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationPredicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationBoston Institute of Analytics
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 

Recently uploaded (20)

Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
Digi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptxDigi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptx
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
 
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data Analyst
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
Decoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in ActionDecoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in Action
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service LucknowAminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 
Predicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationPredicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project Presentation
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 

Ames housing

  • 1. Predicting House Price on Ames Dataset Evaluating Linear and Bayesian Regression Supervisor : Anirban Basu Prepared By:- Archit Vora (2015HT12480)
  • 2. Data Description • 80 features • 23 Nominal • Near by Area - Agriculture, Commercial, Ordinal • 23 Ordinal • Service satisfaction • Unsatisfied, neutral, satisfied • Kitchen Quality - Bad, Good, Excellent • 14 Discrete • 20 Continuous • Example Features • Nearby Area • No of bedrooms • Parking Facility • Kitchen Quality • Swimming Pool • No of Floors
  • 3. Filling Up Missing Values • Common • Encoding for Ordinal variables • Leveling for nominal variables • Specific • Replace with most frequent value for discrete and nominal • When no of missing values are less • Mostly the case with our data • Sometimes null value implies feature is not available • Basement area null gets replaced with zero • Inferring from related variables • When garage year is null, we say house is without garage • Allows us to infer garage area, no of cars, garage condition • Linear Regression : Predict Lot Frontage from lot area (after sqrt) • Improvements: • Creating Levels blindly : 288 features, 0.157122 rmse • After logically filling missing values : 263 features, 0.155965
  • 4. Error Metric • RMSE of log • 5 fold cross validation
  • 6. Linear Least Squares • P is Project matrix • Instead of solving for Y we solve for 𝑌∗ • A = (𝑋 𝑇 𝑋) is the matrix of interest • In our case it is not invertible • We find Pseudo inverse with SVD
  • 7. LLS with pseudo inverse • SVD is fast • U and V are orthogonal hence is transpose • D is diagonal, so inverse each of them • SVD allows pseudo inverse • Just inverse the non zero entries of D • Accuracy • 0.16096
  • 8. Gradient Descent • Importance of Normalizing • Iterations (step size 0.05) • 100 – rmse 1.93 • 5k – rmse 0.48566 • 10k – rmse 0.38121 • 100k – rmse 0.18964
  • 9. Shrinkage Intuition • Ridge • Lasso • Assumption • Two parameters • Without shrinkage entire plane is available
  • 10. Ridge • Matrix Inverse • Unique inverse Exists • Accuracy - 0.1395 • Matrix SVD • Accuracy - 0.13927 • SAG • Accuracy - 0.230815 • Tuning • Alpha = 15
  • 11. Lasso • Indicating not lo use Lasso • Generally used when we have too many features say in the range of 1000 • Accuracy - 0.19638 • Non zero coefficients – 13 • Tuning • Alpha = 1
  • 12. PCR • SVD • U reduced • First 100 component • Var explained = 0.9797 • rmse 0.14949
  • 13. Linear Regression - Summary • Baseline accuracy (average of training data) • 0.399 • Simple Least Square • LLS with Pseudo Inverse - rmse 0.16096 • Gradient descent – step size 0.005 (Did not help), diverging • After Normalizing • Iter 1,00,000 rmse 0.18964 • Iter 10, 000 rmse 0.38121 • Iter 5000 rmse 0.48566 • PCR • 100 component • Rmse 0.14949 • Ridge • Matrix - alpha 15, rmse 0.1395 • Svd instead of finding inverse – 0.13927 • Sag – 0.230815 • Lasso • Alpha 1, rmse 0.19638, non zero 13
  • 15. Bayes Theorem • Role of prior • Expert Advise • What is fixed ? • Data • Parameter • Denominator as complex multiple integration over parameters
  • 16. Markov Chain Monte Carlo • Markov Chain • Next state depends only on current steps • Application • Hidden Markov models in speech recognition • Allows to draw independent samples • Monte Carlo • Simulation when simple mathematical equation is simple • Application • Uncertainty in market is random, predict sales • Solving complex Integration
  • 17. MCMC Steps – Metropolis Hasting 1. muProposal = norm(muCurrent, StdDev).rvs() 2. likelihoodCurrent = norm(muCurrent, 1).pdf(data).prod() 3. likelihoodProposal = norm(muProposal, 1).pdf(data).prod() 4. priorCurrent = norm(muPriorMu, muPriorSd).pdf(muCurrent) 5. priorProposal = norm(muPriorMu, muPriorSd).pdf(muProposal) 6. probCurrent = likelihoodCurrent * priorCurrent 7. probProposal = likelihoodProposal * priorProposal 8. probAccept = probProposal / probCurrent 9. accept = np.random.rand() < probAccept 10. if (accept == True) then muCurrent = muProposal
  • 19. Pymc3 style 1. Model 2. Describe 3. Simulate 4. You get the trace
  • 20. Trace
  • 21. Predicting for new sample • Predict outcome for beta, alpha, sigma of each iteration • Draw random sample from given mu and sigma
  • 23. Lasso • Change Prior of parameter to strictly closer to zero distribution • Laplace Distribution • Accuracy : 0.25288
  • 24. Some Kaggle Results • Without Missing Data Script • Accuracy – 0.12727 • Rank - 968 • With Missing Data Script • Accuracy – 0.12664 • Rank - 932
  • 25. Thank you ! • BITS Wilp • Two years of amazing learning • Adobe for work life balance • My mentor Anirban Basu • Mathematics will carry along, subjective knowledge will not • Teachers, professors and seniors • I think I am luck enough • Parents and sisters • I think I am blessed